Trisha Shetty (Editor)

Distributed data store

A distributed data store is a computer network in which information is stored on more than one node, often in a replicated fashion. The term usually refers specifically to either a distributed database, where users store information on a number of nodes, or a computer network in which users store information on a number of peer network nodes.

Distributed databases

Distributed databases are usually non-relational databases that enable quick access to data spread over a large number of nodes. Some distributed databases expose rich query abilities, while others are limited to key-value store semantics. Examples of such limited distributed databases are Google's BigTable (which is much more than a distributed file system or a peer-to-peer network), Amazon's Dynamo, and Windows Azure Storage.
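A key-value store spread over many nodes needs a rule for deciding which nodes hold which key. Amazon's Dynamo is described as using consistent hashing for this. The following is a minimal sketch of a consistent-hash ring, assuming MD5 as the ring hash, 16 virtual nodes per server, and three replicas per key; the class and method names (`HashRing`, `nodes_for`) are hypothetical and not taken from any of the systems listed here.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer) using MD5."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring assigning each key to `replicas` distinct nodes."""

    def __init__(self, nodes, replicas=3, vnodes=16):
        self.replicas = replicas
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            # Each node owns several "virtual" positions to spread load evenly.
            for i in range(vnodes):
                point = _hash(f"{node}#{i}")
                self._owner[point] = node
                bisect.insort(self._points, point)

    def nodes_for(self, key: str):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        if not self._points:
            return []
        start = bisect.bisect(self._points, _hash(key))
        result = []
        for offset in range(len(self._points)):
            point = self._points[(start + offset) % len(self._points)]
            node = self._owner[point]
            if node not in result:
                result.append(node)
                if len(result) == self.replicas:
                    break
        return result
```

For example, `HashRing(["node-a", "node-b", "node-c", "node-d"]).nodes_for("user:42")` returns three distinct node names, the first acting as the key's primary. Because each server owns many small arcs of the ring, adding a server moves only the keys that fall on its new arcs, rather than reshuffling almost every key as `hash(key) % len(nodes)` would.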

Because the ability to run arbitrary queries is less important than availability, designers of distributed data stores have increased availability at the expense of consistency. High-speed read/write access therefore comes with reduced consistency: as the CAP theorem proves, a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance, so when the network is partitioned it must give up either consistency or availability.
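One common way Dynamo-style stores tune this trade-off is with quorums: with N replicas, a write must be accepted by W of them and a read consults R of them, and if R + W > N every read set overlaps the latest write set. The sketch below is a simplified in-memory model, assuming integer versions from a single counter and replicas that are simply flagged up or down; `Replica`, `QuorumStore`, `put` and `get` are hypothetical names, not any product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """An in-memory replica mapping key -> (version, value)."""
    up: bool = True
    data: dict = field(default_factory=dict)


class QuorumStore:
    """Toy N-replica store with configurable read (R) and write (W) quorums."""

    def __init__(self, n=3, r=2, w=2):
        self.replicas = [Replica() for _ in range(n)]
        self.r, self.w = r, w
        # Hypothetical single version counter; real systems use timestamps or vector clocks.
        self.version = 0

    def _reachable(self):
        return [rep for rep in self.replicas if rep.up]

    def put(self, key, value):
        """Reject the write unless at least W replicas are reachable, then write to all of them."""
        reachable = self._reachable()
        if len(reachable) < self.w:
            raise RuntimeError(f"write quorum not met ({len(reachable)} < {self.w})")
        self.version += 1
        for rep in reachable:
            rep.data[key] = (self.version, value)

    def get(self, key):
        """Ask the first R reachable replicas and return the value with the highest version."""
        reachable = self._reachable()
        if len(reachable) < self.r:
            raise RuntimeError(f"read quorum not met ({len(reachable)} < {self.r})")
        answers = [rep.data.get(key, (0, None)) for rep in reachable[: self.r]]
        return max(answers, key=lambda pair: pair[0])[1]
```

With `n=3, r=2, w=2` (R + W > N), a replica that missed the latest write is always outvoted on read, but reads fail once two replicas are down. With `r=1` the same scenario can return a stale value, yet reads keep working with only one replica reachable: availability gained at the cost of consistency.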

Peer network node data stores

In peer network data stores, the user can usually reciprocate and allow other users to use their computer as a storage node as well. Information may or may not be accessible to other users depending on the design of the network.

Most peer-to-peer networks do not have distributed data stores, in the sense that a user's data is available only while their node is on the network. However, this distinction is somewhat blurred in a system such as BitTorrent, where the originating node can go offline while the content continues to be served. Still, this holds only for individual files that the redistributors have requested, in contrast to a network such as Freenet, where all computers are made available to serve all files.

Distributed data stores typically use error detection and correction techniques. Some distributed data stores (such as Parchive over NNTP) use forward error correction to recover the original file when parts of that file are damaged or unavailable. Others retry the download from a different mirror.
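Both strategies can be sketched in a few lines. Parchive's recovery files use Reed-Solomon codes; this sketch swaps in the simplest erasure code, a single XOR parity block (as in RAID 5), which can rebuild any one missing or damaged block, and uses SHA-256 digests to detect which block is bad. The block layout and the functions `encode`, `decode` and `fetch_verified` are hypothetical illustrations, not Parchive's format; mirrors are modelled as zero-argument callables.

```python
import hashlib


def xor_blocks(blocks, size):
    """XOR a list of equal-length byte blocks into one block of `size` bytes."""
    out = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(data, block_size):
    """Split data into zero-padded blocks, hash each one, and add one parity block."""
    padded = data + b"\0" * (-len(data) % block_size)
    blocks = [padded[i:i + block_size] for i in range(0, len(padded), block_size)]
    digests = [hashlib.sha256(b).hexdigest() for b in blocks]
    return blocks, digests, xor_blocks(blocks, block_size), len(data)


def decode(blocks, digests, parity, length):
    """Detect missing/damaged blocks by digest and rebuild at most one from parity.

    Assumes the parity block itself arrived intact.
    """
    bad = [i for i, b in enumerate(blocks)
           if b is None or hashlib.sha256(b).hexdigest() != digests[i]]
    if len(bad) > 1:
        raise ValueError(f"{len(bad)} blocks damaged; one parity block repairs only one")
    if bad:
        i = bad[0]
        survivors = [b for j, b in enumerate(blocks) if j != i]
        # XOR of all surviving blocks and the parity block equals the lost block.
        rebuilt = xor_blocks(survivors + [parity], len(parity))
        blocks = blocks[:i] + [rebuilt] + blocks[i + 1:]
    return b"".join(blocks)[:length]


def fetch_verified(mirrors, digest):
    """Try each mirror (a zero-argument callable) until one returns an intact copy."""
    for fetch in mirrors:
        try:
            data = fetch()
        except OSError:
            continue  # mirror unreachable: move on to the next one
        if hashlib.sha256(data).hexdigest() == digest:
            return data
    raise LookupError("no mirror returned an intact copy")
```

A single parity block costs one extra block of storage and survives one loss; codes such as Reed-Solomon generalize this so that k recovery blocks can repair up to k lost blocks, which is why they are preferred when several parts of a file may go missing.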

Distributed non-relational databases

  • Aerospike
  • Apache Cassandra, former data store of Facebook
  • BigTable, the data store of Google
  • CrateIO
  • Druid (open-source data store), used by Netflix, Yahoo and others
  • Dynamo of Amazon
  • Hazelcast
  • HBase, current data store of Facebook's Messaging Platform
  • Couchbase, data store used by LinkedIn, PayPal, eBay and others
  • MongoDB
  • Riak
  • Hypertable, from Baidu
  • Voldemort, data store used by LinkedIn

Peer network node data stores

  • BitTorrent
  • Blockchain (database)
  • Chord project
  • GNUnet
  • Freenet
  • Unity, of the software Perfect Dark
  • Mnet
  • NNTP (the distributed data storage protocol used for Usenet news)
  • Storage@home
  • Tahoe-LAFS

    Distributed data store Wikipedia