Identity and reputation are important primitives in the Web 3 world. In this post, I’ll give an overview of the problem space and outline some approaches.
Why identity & reputation matter
Let’s start with how identity and reputation unlock value at the protocol layer.
Transform one-off games into iterated ones
Since blockchain users are pseudonymous and the cost of spinning up a new address is negligible, protocols are limited in how they can enforce good behavior. Today, the common approach is staking with “slashing conditions”: in essence, punishing bad behavior through loss of something participants value (usually ETH or an ERC20 token).
Now suppose there was a way to create a digitally-scarce notion of identity and reputation. If this were the case, we could modify existing staking techniques so that misbehavior results in loss of reputation, as well as capital. Protocols could also reward high-reputation participants; for instance, by giving them higher income from the network (analogous to the “work token” model used in Cosmos and Keep). Participants will be incentivized to optimize for reputation as opposed to short-term wins; one-off prisoner’s dilemmas become iterated ones.
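The dynamic above can be sketched as a toy model (all names and numbers here are mine, purely illustrative): misbehavior slashes both capital and reputation, while honest participation compounds reputation and with it, income.

```python
# Toy model of reputation-augmented staking: misbehavior costs both
# capital and reputation; reputation scales future income.

class Participant:
    def __init__(self, stake):
        self.stake = stake
        self.reputation = 1.0  # newcomers start at a baseline

def epoch(p, behaved_well, base_reward=10.0, slash_fraction=0.5):
    """One protocol round: reward honest work, slash misbehavior."""
    if behaved_well:
        p.stake += base_reward * p.reputation  # income scales with reputation
        p.reputation += 0.1                    # reputation accrues slowly
    else:
        p.stake -= p.stake * slash_fraction    # lose capital...
        p.reputation *= 0.5                    # ...and hard-won reputation

honest, cheater = Participant(100.0), Participant(100.0)
for _ in range(10):
    epoch(honest, True)
    epoch(cheater, False)
```

Because reputation accrues slowly and compounds into income, defecting in any single round forfeits a growing stream of future rewards — exactly the structure of an iterated game.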
Sampling
Many protocols need to randomly select participants for a given task. Perhaps the most important application of sampling is in consensus itself (Proof-of-Work and Proof-of-Stake are stand-ins for Sybil-resistant identity). This can extend to other use-cases, like selecting who stores a file or performs a computation in resource marketplaces.
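A reputation-weighted sampler can be sketched as follows (the identities and weights are illustrative; in a real protocol the seed would come from a shared randomness beacon such as a VRF, not a plain integer):

```python
# Hypothetical sketch: sample a worker (proposer, storage provider, solver)
# with probability proportional to a reputation-derived weight.
import random

def sample_worker(weights, seed):
    """Pick one identity; weights might be stake * reputation."""
    rng = random.Random(seed)  # stand-in for a shared randomness beacon
    total = sum(weights.values())
    r = rng.uniform(0, total)
    cumulative = 0.0
    for identity, w in sorted(weights.items()):
        cumulative += w
        if r <= cumulative:
            return identity
    return identity  # float edge case: fall back to the last identity

weights = {"0xAlice": 40.0, "0xBob": 10.0, "0xCarol": 50.0}
picks = [sample_worker(weights, seed) for seed in range(1000)]
```

Over many rounds, higher-reputation identities are selected proportionally more often, which is what makes reputation worth accumulating.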
Rate-limiting and pricing strategies
Web services make extensive use of rate limiting and pricing strategies to onboard and cater to different tiers of users. Dropbox, for instance, uses a freemium model (anyone can use the service for free up to a level). By relying on an identity system, decentralized protocols could implement similar features. Filecoin, for instance, could use a freemium model or offer lower prices for power users.
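A freemium pricing rule keyed to identity might look like the following sketch (the tiers, prices, and the power-user discount are all made-up parameters, not anything Filecoin has proposed):

```python
# Hypothetical identity-keyed freemium pricing for a storage network.
FREE_QUOTA_GB = 2            # anyone can store this much for free
PAID_RATE = 0.10             # price per GB beyond the free tier
POWER_USER_DISCOUNT = 0.5    # discount for identified high-usage users

def storage_price(usage_gb, is_power_user=False):
    """Price a single identity's usage; only works if identities are scarce."""
    billable = max(0, usage_gb - FREE_QUOTA_GB)
    rate = PAID_RATE * (POWER_USER_DISCOUNT if is_power_user else 1.0)
    return billable * rate
```

Note that the free tier is only viable with Sybil-resistant identity — otherwise users would spin up fresh addresses to reset their quota.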
Peering beyond the protocol layer, we also see many applications that require identity.
Voting
Existing blockchain voting systems are based on coin-voting. Coins are a scarce resource and provide Sybil-resistance, but they also lead to plutocracy, where those with wealth have more say. Schemes like quadratic voting, which lower the returns to wealth, are currently gameable: one can simply spread their coins across multiple addresses. Identity makes these schemes viable.
Reputation, as derived from past behavior within the protocol or through trust by the community, could be used to determine the weight of one’s vote.
It’s worth noting that secure onchain voting has requirements beyond the above. To prevent bribery, for instance, participants should not be able to prove how they voted (and, more broadly, the property of “coercion resistance,” as outlined in this post by Daian, Kell, Miers, and Juels).
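The arithmetic behind the quadratic-voting attack is worth making explicit (a toy calculation, with an illustrative budget): casting v votes costs v² tokens, so splitting a budget across many addresses buys far more total votes.

```python
# Why quadratic voting is gameable without Sybil-resistant identity:
# v votes cost v**2 tokens, so an identity with budget b can cast sqrt(b)
# votes -- and k identities with budget b/k each can cast k * sqrt(b/k).
import math

def votes_for_budget(budget, n_identities):
    per_identity = budget / n_identities
    return n_identities * math.sqrt(per_identity)

honest = votes_for_budget(100, 1)    # one identity:  sqrt(100) = 10 votes
sybil = votes_for_budget(100, 100)   # 100 identities: 100 * sqrt(1) = 100 votes
```

With the same 100-token budget, the Sybil attacker gets ten times the voting power — which is exactly why these schemes need identity.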
Lending
Onchain lending protocols cannot trust participants, and therefore require full collateralization. This high cost of capital inhibits adoption.
Identity can change this by surfacing the borrower's real-world attributes (their credit score, record of home ownership, or income). Reputation, built up through borrowing history or vouching by others, can also act as a stand-in for collateral.
Security tokens
Tokenization of real-world securities such as real estate, equity, and art provides concrete value-adds: fractional ownership, immediate settlement, and 24/7 global markets. Issuance and trade of these tokens require compliance with securities laws (by performing KYC, AML, accreditation, and other checks on the trading parties). These restrictions could be baked into the token itself (Harbor’s approach with R-Token) or enforced at the exchange level (e.g. in a DEX, prior to calling the ERC20 token’s transfer function).
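A claims-gated transfer check can be sketched as follows. This is a minimal illustration of the general pattern, not Harbor's actual R-Token implementation; all class and identity names are mine:

```python
# Sketch: a token consults a claim registry before moving funds.

class ClaimRegistry:
    def __init__(self):
        self._claims = set()  # (claimant, subject, claim_type) triples

    def add(self, claimant, subject, claim_type):
        self._claims.add((claimant, subject, claim_type))

    def has(self, claimant, subject, claim_type):
        return (claimant, subject, claim_type) in self._claims

class PermissionedToken:
    def __init__(self, registry, trusted_kyc_provider):
        self.registry = registry
        self.kyc = trusted_kyc_provider
        self.balances = {}

    def transfer(self, sender, recipient, amount):
        # Both parties must hold a KYC claim from a claimant we trust.
        for party in (sender, recipient):
            if not self.registry.has(self.kyc, party, "KYC_PASSED"):
                raise PermissionError(f"{party} lacks a trusted KYC claim")
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

reg = ClaimRegistry()
reg.add("0xKYCInc", "0xAlice", "KYC_PASSED")
reg.add("0xKYCInc", "0xBob", "KYC_PASSED")
token = PermissionedToken(reg, trusted_kyc_provider="0xKYCInc")
token.balances["0xAlice"] = 100
token.transfer("0xAlice", "0xBob", 40)  # both sides hold a trusted claim
```

The same check could equally live at the exchange level rather than inside the token, as noted above.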
Token sales & airdrops
Airdrops incentivize participants in token networks. Examples of recent schemes are Livepeer’s Merkle Mine and Handshake’s airdrop based on open-source contributions. With identity and reputation, one would be able to design more nuanced distribution techniques.
Other use-cases for identity and reputation include: building social features in onchain games (e.g. the popularity of in-game items impacting their price); login and access-control; DPoS; universal basic income; and aggregating inputs within decentralized oracles.
Now let’s look at what exactly we mean by identity and reputation.
What do we mean by identity and reputation?
An identity is the atomic actor within the system. On Ethereum, these would be Ethereum addresses.
Identities can attest to facts about themselves and others. These facts are known as claims.
Identities can also build up reputation over time.
The protocols for identity, claims, and reputation should interact with each other through well-defined interfaces. The system should be modular, extensible, and have these properties:
Decentralization: the rules should be defined and enforced by participants within the network, not by a central authority.
Self-sovereignty: users should own their own identity, claims, and reputation.
This stands in contrast to Web 2 systems. A merchant amassing reputation for their products on Amazon will lose this data if Amazon goes out of business or decides to remove them.
Portability & interoperability: the system should not lock users in. It should allow users to move their data into other systems.
Sybil resistance: participants should not gain an advantage by issuing multiple identities. Moreover, the protocol must disincentivize participants from leaving an identity to regain newcomer status.
Building decentralized Sybil-resistant identity is a high-impact problem. This could involve using a function that is easy to compute once but hard to compute multiple times, or an action which can be performed by humans but not by machines. Proof-of-Work and Proof-of-Stake are current approaches.
Privacy: participants should be able to selectively share data with those they desire. Moreover, participants should be identified by opaque identifiers by default.
We will now look into each component of this system: identity, claims, and reputation.
Identity
The identity can either be an externally-owned account (controlled by a private key) or a smart contract (a multisig, a DAO, or something else). It can sign messages, encrypt data, and make claims about itself and others.
Key management is an important part of identity management. Using a smart contract for identity, as opposed to a private key, allows one to build features for key recovery, access control, and dynamic spending limits.
An identity contract should have a key recovery mechanism that handles the following cases:
- Loss of keys (i.e. the user has lost access).
- Theft of keys (i.e. the user has access + a thief has access).
- Theft and loss of keys (i.e. the user has lost access + a thief has access).
Various mechanisms can be used for this: social recovery, paralysis proofs, or threshold schemes using Shamir Secret Sharing or Schnorr signatures.
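To make the threshold idea concrete, here is a toy Shamir Secret Sharing implementation over a prime field. It is illustrative only — no side-channel hardening, a hardcoded prime — not something to guard real keys with:

```python
# Toy Shamir Secret Sharing: split a secret into n shares such that any
# `threshold` of them recover it, but fewer reveal nothing.
import random

PRIME = 2**127 - 1  # a Mersenne prime, large enough for this demo

def make_shares(secret, threshold, n_shares, rng=random.SystemRandom()):
    """Evaluate a random degree-(threshold-1) polynomial with f(0)=secret."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n_shares + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, PRIME - 2, PRIME) is the modular inverse (Fermat)
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

In a recovery scheme, the shares would be distributed to friends or devices; any threshold-sized subset restores access after key loss.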
Separation of concerns amongst keys allows them to be held with different levels of security. ERC 725, for instance, proposes having separate keys for purposes of management, action, making claims, and encryption.
In order to maintain privacy, the user should create and manage separate identities for different use-cases. Otherwise, using the same identity to borrow, delegate to a staker, and pay friends will result in the user's real-world identity being triangulated. Hierarchical Deterministic (HD) schemes as specified in BIP 32/39/44 can be used here.
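The idea of deriving many unlinkable identities from one seed can be sketched as follows. To be clear, this is a greatly simplified illustration of hierarchical derivation — it mirrors the spirit of BIP 32 hardened derivation but is not the actual BIP 32 construction:

```python
# Simplified hierarchical key derivation: one seed, many unlinkable
# per-use-case secrets. NOT the real BIP 32 scheme.
import hashlib
import hmac

def derive_child(parent_key: bytes, index: int) -> bytes:
    """Derive a child secret from a parent secret and an index."""
    msg = b"child:" + index.to_bytes(4, "big")
    return hmac.new(parent_key, msg, hashlib.sha256).digest()

seed = hashlib.sha256(b"example seed - never use a hardcoded seed").digest()
borrow_key = derive_child(seed, 0)    # identity used for borrowing
staking_key = derive_child(seed, 1)   # identity used for delegation
payment_key = derive_child(seed, 2)   # identity used for paying friends
```

An observer who sees the three derived identities on chain cannot link them to each other without the seed, which is what preserves privacy across use-cases.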
Lastly, the identity scheme should be interoperable with other systems and adopt the W3C DID (Decentralized Identifier) specification.
Claims
A claim is a fact stated by one identity about another.
Claims follow the syntax “an entity makes a claim about a subject”, and can express many things:
- “A claims that B has passed KYC” – useful for Sybil-resistance.
- “A claims that B is an accredited investor” – useful for compliance (token sales and trade of security tokens).
- “A claims that B is over 21” – useful for products requiring age of majority.
- “A claims that B is a resident of China” – useful for compliance.
- “A claims that B has x income” – useful for underwriting credit risk.
The claim is signed by the claimant, so its authenticity and integrity are cryptographically verifiable.
One can even make claims about oneself (e.g. about one’s name, citizenship, or date of birth).
Some design principles to keep in mind:
Claims should be encrypted: claims can contain sensitive information. As such, they should always be stored in an encrypted format. The owner can use a selective disclosure flow to share with specific identities (for instance, by using a Diffie-Hellman key exchange).
Store as little data as possible onchain: unless required for onchain compliance (for instance, a permissioned ERC20 trade), claims should be stored offchain with a cryptographic commitment (e.g. a Merkle root) stored onchain. Moreover, due to the likely future prevalence of quantum computers and the susceptibility of current cryptosystems, even encrypted data should not be stored on the blockchain.
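The commit-offchain pattern can be sketched with a small Merkle tree (the helper names are mine, not from any particular standard):

```python
# Sketch: keep the claims themselves offchain and commit only a 32-byte
# Merkle root onchain.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash leaves pairwise up to a single root commitment."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:               # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

claims = [b"A claims B passed KYC", b"A claims B is over 21"]
onchain_commitment = merkle_root(claims)  # only this goes onchain
```

Later, the claim owner can reveal a single claim plus a Merkle proof against the onchain root, without ever publishing the full claim set.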
Only the claim creator should be able to remove their claim: this enables the framework to be used for negative claims without the subject removing them. For instance, one can claim that an Ethereum address is part of a phishing scam (a persistent problem in the crypto space from fake ICOs to Twitter giveaways) or that a seller did not uphold their end of a commercial transaction.
Continuing with this train of thought, one can imagine that subjects will accumulate claims made by irrelevant or malicious third parties. This is not problematic since no claim is authoritative. Claims are simply facts stated by an identity about another identity. Determining which claims are “true” is left entirely to the verifier.
Claims can expire: claims do not last forever. For instance, accreditation status needs to be renewed every 90 days in some jurisdictions. By including the block number at which a claim was issued, verifiers can interpret the validity of the claim according to their own judgement.
Now we will consider some proposed designs.
ERC 725 is a smart contract identity, which implements ERC 735, a standard interface for managing that identity’s claims. The user has to approve all claims made about them and can at any point remove them. Since the system does not require coordination between different parties, it is upgradable.
There are some problems with this approach. Since each identity deploys its own claims contract, there is no guarantee that it adheres to the spec in ERC 735. Therefore, verifiers interacting with the identity need to first verify its source code. This adds friction. Moreover, allowing the user to delete claims made about them precludes this system from use for negative claims.
Lastly, to protect user privacy, ERC 725 should be paired with a system for offchain claims as it stores all claims onchain.
ERC 780 proposes a global registry to store all claims on Ethereum. uPort is using the registry as the basis of a decentralized PKI system, with the aim of keeping most claims offchain. The registry makes no statement about whether the claimants or subjects are accounts or contracts. And since there is only one contract, verifiers know that they can trust its logic.
One problem with this approach is a lack of expressibility. Requiring that all claims conform to the same data structure is inherently limiting. Upgrading the central registry will also be challenging, as it needs buy-in from a wide range of stakeholders.
The Zeppelin TPL (Transaction Permission Layer) project is designed for permissioned trading of ERC20 tokens. In this scheme, a different contract is deployed for each so-called jurisdiction. The jurisdiction’s governance collective elects certificate authorities, who can then write to the jurisdiction’s claim registry.
This is a practical design. Verifiers can readily trust claims within a jurisdiction since they know all claimants are certified. On the other hand, allowing only certain users to create claims leads to inflexibility.
The system might also run into complexities at scale. No jurisdiction will provide a conclusive list of relevant claims (due to differences in governance, geography, or use-case). As a result, verifiers will have to check multiple registries, or subjects will need to replicate their claims across jurisdictions.
One can encode permissions as an NFT (Non-Fungible Token), which a group of authorities can mint for users after verification. The problem here is that users can trade their NFTs, and the design does not provide a perceptible benefit over using claims.
A potential design to bring together the benefits of the above approaches is to use a system of federated claim registries.
Registries specific to each claim type are created; they inherit from a top-level contract and implement the exact rules by which claims can be added or removed. This adds expressibility, while ensuring that there are no redundant claims across registries.
The registry would, moreover, allow anyone to create claims. Verifiers have the responsibility of filtering which claims they trust. They can implement rules which depend on their specific use-case, geography, or time. Or they can rely on whitelists of certified claimants as provided by a centralized party (e.g. the government) or a decentralized system (e.g. a DAO).
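The division of labor above — open writes, verifier-side filtering, creator-only removal, expiry left to the verifier's judgement — can be sketched as follows (the interfaces are hypothetical, not a proposed standard):

```python
# Sketch of a federated claim registry: anyone may write claims; each
# verifier decides for itself whose claims to trust and how fresh they
# must be; only a claim's creator may remove it.
import time

class ClaimRegistry:
    """One registry per claim type."""
    def __init__(self, claim_type):
        self.claim_type = claim_type
        self.claims = []  # (claimant, subject, issued_at) tuples

    def add_claim(self, claimant, subject):
        self.claims.append((claimant, subject, time.time()))

    def remove_claim(self, caller, claimant, subject):
        if caller != claimant:
            raise PermissionError("only the claim creator may remove it")
        self.claims = [c for c in self.claims
                       if not (c[0] == claimant and c[1] == subject)]

def verify(registry, subject, trusted_claimants, max_age_seconds):
    """Verifier-side filtering: trust is decided here, not in the registry."""
    now = time.time()
    return any(claimant in trusted_claimants
               and now - issued < max_age_seconds
               for claimant, subj, issued in registry.claims
               if subj == subject)

kyc = ClaimRegistry("KYC_PASSED")
kyc.add_claim("0xGov", "0xAlice")     # claimant the verifier trusts
kyc.add_claim("0xRando", "0xBob")     # claimant the verifier ignores
```

Different verifiers can pass different `trusted_claimants` sets and freshness windows to the same registry, which is what keeps the registry itself neutral.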
Now let’s move onto reputation.
One can think of reputation in two broad categories: reputation based on the trust graph, or reputation based on behavior.
Reputation based on the trust graph
This scheme borrows ideas from social networks and PageRank, and has similarities to liquid democracy.
A thought experiment
Imagine that prior to a large vote, each Ethereum address is airdropped 100 trust tokens. The instructions are to delegate these to people whose opinion they value in the vote.
During a campaigning period, users advertise their Ethereum addresses online (on Twitter, Reddit, personal websites) and ask for others to delegate to them. Depending on their reputation for expertise and thoughtfulness within the community, they receive allocations.
After the campaigning period, each user casts a vote. The weight of each vote is derived from the number of delegated tokens that the user received, either as equal to the total number, or based on a metric like PageRank.
In our example, token allocations could be seen as edges in a graph. PageRank is recursive and incorporates the wisdom of crowds. If many users allocate tokens to person A, then the tokens that person A allocates to others are worth more. The tokens that these users allocate are, in turn, worth more. Iterating in this manner, reputation scores converge to their true values.
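A minimal power-iteration PageRank over a toy trust graph looks like this (the graph, damping factor, and iteration count are illustrative; this version assumes every node has at least one outgoing edge):

```python
# Power-iteration PageRank over a trust graph: edges: {node: [trusted nodes]}.

def pagerank(edges, nodes, damping=0.85, iterations=50):
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}
        for u, outs in edges.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:       # u splits its rank among those it trusts
                    new[v] += share
        rank = new
    return rank

nodes = ["A", "B", "C"]
edges = {"A": ["B"], "B": ["C"], "C": ["B"]}  # B is the widely-trusted node
scores = pagerank(edges, nodes)
```

Note the recursive structure described above at work: B's high score flows onward to C, whom B trusts, while A, whom nobody trusts, stays at the baseline.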
Note that this scheme can also be implemented on-top of a claims protocol. Instead of allocating trust tokens, one can use claims from one identity to another to express edges.
As we will see, there are some problems with reputation based on the trust-graph.
An attacker can spin up multiple identities and have them trust each other, artificially propping up their reputation scores. There are a few ways to solve this problem:
The original PageRank algorithm includes a term E(u), which makes the crawler jump to a pre-selected website with some probability. This prevents the crawler from getting stuck in Sybil sub-graphs.
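For reference, the ranking formula from the original PageRank paper (reconstructed here; $B_u$ is the set of pages linking to $u$, $N_v$ the number of outgoing links of page $v$, and $c$ a normalization factor):

```latex
R'(u) = c \sum_{v \in B_u} \frac{R'(v)}{N_v} + c\,E(u)
```

The $E(u)$ term is the source of rank for pre-selected pages — the escape hatch from Sybil sub-graphs discussed above.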
Likewise, in our use-case, trusted identities could be decided up front. Or they could be sampled from a set of identities who have verified real-world attributes.
One can compute reputation scores locally. In lending, for instance, an underwriter can assess a borrower’s risk by only incorporating the borrower's interactions with the underwriter and with other identities the underwriter trusts.
Interestingly, if a user is trusted by large portions of the network, the global view can converge on their local view. This is similar to how IOUs from a reputable party come to represent money in the debt theory of money.
Require a scarce resource to create an edge
Another valid approach to making trust graphs Sybil-resistant is to require a scarce resource for creating edges. This could be stake, or a kind of Proof-of-Work. Sourcecred, for instance, relies on Git commits and code dependencies to build a graph. The edges in this graph take real work to set up, and are therefore resistant to Sybil attacks (barring machine learning algorithms that can code effectively!).
One could also analyze the trust graph manually and decide to filter results post-hoc. For instance, a cluster of addresses that trust each other but are not trusted by anyone else could be seen as suspicious, and therefore removed.
How are users incentivized to create correct edges within the trust graph?
One potential answer lies in work around p2p lending and insurance (e.g. TrustDavis), whereby nodes provide collateral for edges. They earn interest when the edge is “in use,” but also stand to lose their stake if the other party misbehaves.
Another potential answer is that edges have to be created as a byproduct of existing behavior. For instance, a publisher adds high-quality links to their website in order to attract a large audience; as such, they give PageRank the data it needs. Likewise, a developer trying to build a good codebase which attracts other contributors is diligent to rely only on well-built dependencies; as such, they create the edges needed by Sourcecred (shoutout to Dandelion for this note).
The other kind of reputation is based on behavior.
Reputation based on behavior
In this scheme, instead of relying on graphs to determine reputation, one judges a node based on their historical behavior. For instance, the reputation of borrowers in Dharma’s lending protocol could be calculated from their historical payments and timeliness. Or the reputation of solvers in Truebit could be calculated from their historical correctness when solving tasks.
One could bake this kind of reputation into the protocol; as the user's reputation increases, they receive discounts or are given a higher income from the network.
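As a concrete sketch (the scoring rule, discount cap, and fee numbers are all hypothetical — this is not how Dharma or Truebit computes anything), a lending protocol might derive reputation from repayment history and translate it into a fee discount:

```python
# Hypothetical behavior-based reputation: on-time repayment rate maps to
# a protocol-fee discount.

def reputation(history):
    """history: list of booleans, one per past loan (True = repaid on time)."""
    if not history:
        return 0.0  # newcomers start with no reputation
    return sum(history) / len(history)

def fee(base_fee, history, max_discount=0.3):
    """Fee shrinks as the on-time repayment rate grows."""
    return base_fee * (1 - max_discount * reputation(history))

perfect_borrower = [True] * 10
new_borrower = []
```

Note that the discount is capped: unbounded rewards for reputation would sharpen the exit-scam problem discussed below, where reputation is built up precisely in order to be cashed out.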
These schemes rely on Sybil-resistant identity to work; otherwise, they can be gamed. One can increase their reputation by doing fake work for Sybil identities.
Another problem to be wary of is that one can build up reputation only to execute an exit scam. This has happened before with exchanges and Bitcoin lending platforms. It’s therefore important for a reputation scheme to include a rigorous analysis of the economic cost of Byzantine behavior.
Other notes on reputation
Reputation is contextual. Just because someone is a good borrower does not mean they will be a good Casper staker, nor does it mean that they will be a good Livepeer transcoder. Or as Stefan George said, someone could have a good reputation within the mafia, but it doesn’t mean you should trust them.
Finally, the idea of “negative reputation” needs to be explored. If an identity can have a below-zero score after misbehavior, they would be incentivized to leave their identity to regain newcomer status.
Identity and reputation are important problems to work on. They turn protocols from one-off games to iterated ones, and are a necessary building block for many blockchain applications like sampling, pricing strategies, governance, lending, security tokens, and airdrops.
Thanks to Yondon Fu, Liam Horne, Robbie Bent, and Melisa Smith for feedback on this blog post. Thanks to Ali Yahya, Nicola Greco, Dandelion Mane, Andy Bromberg, Faraaz Nishtar, Sunny Aggarwal, Sid Ramesh, Nadav Hollander, Dan Finlay, Ryan Sepassi, Ben Fisch, Dieter Shirley, Martin Köppelmann, Stefan George, Kei Kreutler, Fred Ehrsam, Bjorn Wagner, Yondon Fu, Robbie Bent, and others for engaging conversations which contributed to these ideas.