Suvarna Garge (Editor)

Conflict free replicated data type

Updated on
Edit
Like
Comment
Share on FacebookTweet on TwitterShare on LinkedInShare on Reddit

In distributed computing, a conflict-free replicated data type (abbreviated CRDT) is a type of specially-designed data structure used to achieve strong eventual consistency (SEC) and monotonicity (absence of rollbacks) . As their name indicates, a CRDT instance is distributed into several replicas; each replica can be mutated promptly and concurrently; the potential divergence between replicas is however guaranteed to be eventually reconciled through downstream synchronisation (off the critical path); consequently CRDTs are known to be highly available.

Contents

There are two alternative routes to ensure SEC: operation-based CRDTs and state-based CRDTs . The two alternatives are equivalent, as one can emulate the other, but there is a tradeoff: operation-based CRDTs require additional guarantees from the communication middleware whereas state-based CRDTs have a high dissemination overhead, as the entire state must be disseminated. Delta state CRDTs (or simply Delta CRDTs) are optimised state-based CRDTs where only recently applied mutations to a state are disseminated instead of the entire state. Pure operation-based CRDTs are also an improved variant of operation-based CRDTs that reduce the meta-data size through exploiting the causality information of the middleware.

CRDTs are used to replicate data across multiple computers in a network, executing updates without the need for remote synchronization. This would lead to merge conflicts in systems using conventional eventual consistency technology, but CRDTs are designed such that conflicts are mathematically impossible. Under the constraints of the CAP theorem they provide the strongest consistency guarantees for available/partition-tolerant (AP) settings. In contrast, consensus protocols such as Paxos are required for strongly-consistent/partition-tolerant (CP) settings.

The CRDT concept was first formally defined in 2007 by Marc Shapiro and Nuno Preguiça in terms of operation commutativity, and development was initially motivated by collaborative text editing. The concept of semilattice evolution of replicated states was first defined by Baquero and Moura in 1997, and development was initially motivated by mobile computing. The two concepts were later unified in 2011. Precursor ideas can be traced back as far as 1990

Eventual consistency

Informally, eventual consistency means that replicas eventually reach the same value if clients stop submitting updates. Eventually consistent systems accept local updates without remote synchronization, improving performance and scalability by sacrificing strong consistency. Without remote synchronization, replicas concurrently hold different values which are expected to converge over time. Convergence is complicated by conflicts which arise when merging values between replicas. A conflict is a combination of concurrent updates which may be individually correct, but taken together violate some system invariant. Conventional conflict-resolution schemes involve state roll-back, full consensus, or even user interaction.

Strong eventual consistency

Strong eventual consistency is a property of some eventually-consistent systems: replicas that have received and applied the same set of updates must immediately have equivalent state. There is no conflict arbitration process, because conflicts do not exist in strong eventually consistent systems. CRDTs are used to achieve strong eventual consistency in a distributed system.

Mathematical properties

If the system is monotonically increasing in state, clients never observe state rolling back. The set of system states is partially ordered, and the merge operation being commutative, associative and idempotent, the set of all system states is a semilattice, and the merge operation is the semilattice join.

CRDT classes

Two general classes of CRDTs are known to exist. Although any CRDT of one class has an other-class equivalent, the classes differ in assumptions and performance characteristics.

Operation-based CRDTs

Operation-based CRDTs are called commutative replicated data types, or CmRDTs. CmRDT replicas propagate state by broadcasting the state update operation itself, which must be commutative. For example, a CmRDT of a single integer might broadcast the operations (+10) or (−20). Replicas receive the updates and apply them locally. The operations are commutative, so can be received and applied in any order; however, they are not idempotent, and additional network protocol guarantees are required to ensure unique delivery.

State-based CRDTs

State-based CRDTs are called convergent replicated data types, or CvRDTs. In contrast to CmRDTs, CvRDTs send their full local state to other replicas. CvRDTs have the following local interface:

query
reads the state of the replica, with no side effects.
update
writes to the replica state in accordance with certain restrictions.
merge
merges local state with the state of some remote replica.

The merge function must be commutative, associative, and idempotent. It provides a join for any pair of replica states, so the set of all states forms a semilattice. The update function must monotonically increase the internal state, according to the same partial order rules as the semilattice.

Comparison

While CmRDTs require additional guarantees from the network protocol, they use less bandwidth than CvRDTs when the number of transactions is small in comparison to the size of internal state. However, since the CvRDT merge function is associative, merging with the state of some replica yields all previous updates to that replica; gossip protocols work well for propagating CvRDT state to other replicas while reducing network use and handling topology changes.

Some lower bounds on the storage complexity of state-based CRDTs are known.

State-based increment-only counter

payload integer[n] P initial [0,0,...,0] update increment() let g = myId() P[g] := P[g] + 1 query value() : integer v let v = i P[i] compare (X, Y) : boolean b let b = ( i [0, n - 1] : X.P[i] Y.P[i]) merge (X, Y) : payload Z let i [0, n - 1] : Z.P[i] = max(X.P[i], Y.P[i])

This CvRDT implements a counter for a cluster of n nodes. Each node in the cluster is assigned an ID from 0 to n - 1, which is retrieved with a call to myId(). Thus each node is assigned its own slot in the array P, which it increments locally. Updates are propagated in the background, and merged by taking the max() of every element in P. The compare function is included to illustrate a partial order on the states. The merge function is commutative, associative, and idempotent. The update function monotonically increases the internal state according to the compare function. This is thus a correctly-defined CvRDT and will provide strong eventual consistency. The CmRDT equivalent broadcasts increment operations as they are received.

State-based PN-counter

payload integer[n] P, integer[n] N initial [0,0,...,0], [0,0,...,0] update increment() let g = myId() P[g] := P[g] + 1 update decrement() let g = myId() N[g] := N[g] + 1 query value() : integer v let v = i P[i] - i N[i] compare (X, Y) : boolean b let b = ( i [0, n - 1] : X.P[i] Y.P[i] i [0, n - 1] : X.N[i] Y.N[i]) merge (X, Y) : payload Z let i [0, n - 1] : Z.P[i] = max(X.P[i], Y.P[i]) let i [0, n - 1] : Z.N[i] = max(X.N[i], Y.N[i])

A common strategy in CRDT development is to stick multiple primitive CRDTs together to make a more complex CRDT. In this case, two increment-only counters were combined to create a CvRDT supporting both increment and decrement operations. Note that the CvRDT's internal state must increase monotonically, even though its external state as exposed through query can return to previous values.

State-based grow-only set

payload set A initial update add(element e) A := A {e} query lookup(element e) : boolean b let b = (e A) compare (S, T) : boolean b let b = (S.A T.A) merge (S, T) : payload U let U.A = S.A T.A

The grow-only set is a CvRDT implementing a set which only allows adds. Since it is impossible for adds and removes to commute (one must take precedence over the other), any CvRDT supporting both add and remove operations must pick and choose its semantics.

State-based 2P-set

payload set A, set R initial , query lookup(element e) : boolean b let b = (e A e R) update add(element e) A := A {e} update remove(element e) pre lookup(e) R := R {e} compare (S, T) : boolean b let b = (S.A T.A S.R T.R) merge (S, T) : payload U let U.A = S.A T.A let U.R = S.R T.R

Two grow-only set CvRDTs are combined to create the 2P-set CvRDT. With the addition of a "tombstone" set, elements can be added and also removed. Once removed, an element cannot be re-added; that is, once an element e is in the tombstone set, query will never again return True for that element. The 2P-set uses "remove-wins" semantics, so remove(e) takes precedence over add(e).

Sequence CRDT

A sequence (a.k.a. list, ordered set) CRDT can be used to build a Collaborative real-time editor, as an alternative to Operational transformation (OT).

Some known Sequence CRDTs are Treedoc, RGA, Woot, Logoot, LSEQ. CRATE is a decentralized real-time editor built on top of LSEQ and runnable on a network of browsers thanks to WebRTC.

Others

  • OR-Set: Add-wins set with tombstones, supporting multiple adds and removes by storing elements with GUIDs.
  • LWW-element-set: Set with LWW timestamped adds and removes
  • AWORSet: Add-wins optimized observed-remove set that allows adds and removes (a.k.a. ORSWOT)
  • RWORSet: Remove-wins optimized observed-remove set that allows adds and removes
  • MVRegister: Optimized multi-value register (Dynamo shopping cart)
  • Graphs: Using set CRDTs for the sets of vertices and edges
  • [SU-Set ] is a CRDT for the RDF-Graph type with SPARQL 1.1 Update operations.
  • Industry use

    Support for CRDTs is implemented in Riak. League of Legends uses the Riak CRDT implementation for its in-game chat system, which handles 7.5 million concurrent users and 11,000 messages per second. Bet365 (the largest European on-line betting company with 2.5 million simultaneous users peak), store hundreds of megabytes of data in the Riak implementation of OR-Set.

    SoundCloud open-sourced Roshi, a LWW-element-set CRDT for the SoundCloud stream implemented on top of Redis.

    TomTom employs CRDTs to synchronize navigation data between the devices of a user.

    Phoenix, a web framework written in Elixir, uses CRDTs to support real time multi-node information sharing in version 1.2.

    References

    Conflict-free replicated data type Wikipedia