Rahul Sharma (Editor)

Version vector

A version vector is a mechanism for tracking changes to data in a distributed system, where multiple agents might update the data at different times. The version vector allows the participants to determine if one update preceded another (happened-before), followed it, or if the two updates happened concurrently (and therefore might conflict with each other). In this way, version vectors enable causality tracking among data replicas and are a basic mechanism for optimistic replication. In mathematical terms, the version vector generates a preorder that tracks the events that precede, and may therefore influence, later updates.

Version vectors maintain the same state as a vector clock, but the update rules differ slightly. In the scheme described here, a replica can either experience a local update (e.g., the user editing a file on the local node) or synchronize with another replica:

  • Initially all vector counters are zero.
  • Each time a replica experiences a local update event, it increments its own counter in the vector by one.
  • Each time two replicas $a$ and $b$ synchronize, they both set each element in their copy of the vector to the maximum of the corresponding elements in the two vectors: $V_a[x] = V_b[x] = \max(V_a[x], V_b[x])$. After synchronization, the two replicas have identical version vectors.
  • Two replicas, $a$ and $b$, can be compared by inspecting their version vectors and are determined to be identical ($a = b$), concurrent ($a \parallel b$), or ordered ($a < b$ or $b < a$). The ordered relation is defined as follows: $a < b$ if and only if every element of $V_a$ is less than or equal to its corresponding element in $V_b$, and at least one element is strictly less. If neither $a < b$ nor $b < a$ holds and the vectors are not identical, the two vectors are concurrent.
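
The rules above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the names `Replica`, `Order`, and `compare` are hypothetical, and the sketch assumes a fixed set of replica identifiers known to every replica in advance.

```python
from enum import Enum


class Order(Enum):
    IDENTICAL = "identical"    # a = b
    BEFORE = "before"          # a < b
    AFTER = "after"            # b < a
    CONCURRENT = "concurrent"  # neither precedes the other


class Replica:
    """A replica tracked by a version vector (hypothetical helper class)."""

    def __init__(self, replica_id, all_ids):
        self.replica_id = replica_id
        # Initially all vector counters are zero.
        self.vector = {rid: 0 for rid in all_ids}

    def local_update(self):
        # A local update increments only this replica's own counter.
        self.vector[self.replica_id] += 1

    def synchronize(self, other):
        # Both replicas take the element-wise maximum, so they end up
        # with identical version vectors.
        merged = {rid: max(self.vector[rid], other.vector[rid])
                  for rid in self.vector}
        self.vector = dict(merged)
        other.vector = dict(merged)


def compare(va, vb):
    """Classify two version vectors defined over the same replica ids."""
    a_le_b = all(va[k] <= vb[k] for k in va)
    b_le_a = all(vb[k] <= va[k] for k in va)
    if a_le_b and b_le_a:
        return Order.IDENTICAL
    if a_le_b:
        return Order.BEFORE   # every element <=, at least one strictly less
    if b_le_a:
        return Order.AFTER
    return Order.CONCURRENT


# Usage: two replicas update independently, then synchronize.
ids = ["a", "b"]
a, b = Replica("a", ids), Replica("b", ids)
a.local_update()                       # a: {a: 1, b: 0}, b: {a: 0, b: 0}
first = compare(a.vector, b.vector)    # Order.AFTER (b < a)
b.local_update()                       # b: {a: 0, b: 1}
second = compare(a.vector, b.vector)   # Order.CONCURRENT
a.synchronize(b)                       # both: {a: 1, b: 1}
third = compare(a.vector, b.vector)    # Order.IDENTICAL
```

In this run the two independent local updates are detected as concurrent, which is the case where an application would need to check for a conflict before or during synchronization.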

Version vectors or variants are used to track updates in many distributed file systems, such as Coda and Ficus, and are the main data structure behind optimistic replication.

Other mechanisms

  • Hash Histories avoid the use of counters by keeping a set of hashes of each updated version and comparing those sets by set inclusion. However, because distinct versions can in principle hash to the same value, this mechanism gives only probabilistic guarantees.
  • Concise Version Vectors allow significant space savings when handling multiple replicated items, such as in directory structures in filesystems.
  • Version Stamps allow tracking of a variable number of replicas and do not resort to counters. This mechanism can exhibit scalability problems in some settings, but can be replaced by Interval Tree Clocks.
  • Interval Tree Clocks generalize version vectors and vector clocks and allow dynamic numbers of replicas/processes.
  • Bounded Version Vectors allow a bounded implementation, with bounded-size counters, as long as replica pairs can be atomically synchronized.
  • Dotted Version Vectors address scalability with a small set of servers mediating replica access by a large number of concurrent clients.
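
As a rough illustration of the set-inclusion idea behind Hash Histories in the list above, the following Python sketch compares two histories as sets of hashes. It is a simplification under stated assumptions, not the published algorithm: the function names are hypothetical, SHA-256 is an arbitrary choice of hash, and each new hash is chained over the existing history so that reverting to earlier content still produces a new entry.

```python
import hashlib


def version_digest(history, content):
    """Hash the new content together with the (sorted) prior history."""
    h = hashlib.sha256()
    for prior in sorted(history):
        h.update(prior.encode("utf-8"))
    h.update(content.encode("utf-8"))
    return h.hexdigest()


def record_update(history, content):
    """Return a new history that also contains the hash of this update."""
    return frozenset(history) | {version_digest(history, content)}


def compare_histories(ha, hb):
    """Compare two hash histories by set inclusion."""
    if ha == hb:
        return "identical"
    if ha < hb:          # proper subset: everything in ha is known to hb
        return "before"
    if hb < ha:
        return "after"
    return "concurrent"


# Usage: two replicas diverge from a common base version.
base = record_update(frozenset(), "v1")
left = record_update(base, "v2 edited on replica a")
right = record_update(base, "v2 edited on replica b")
merged = left | right    # after synchronization both sides hold the union
```

Here `compare_histories(base, left)` yields "before", `compare_histories(left, right)` yields "concurrent", and `compare_histories(left, merged)` yields "before". The guarantee is only probabilistic because a hash collision could make two distinct versions look the same.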