Paper Reading Notes: Dynamo: Amazon's Highly Available Key-value Store

Dynamo is a key-value NoSQL database service that is designed for high availability and elastic scalability at the cost of lenient consistency. It offers an alternative to relational databases who are constrained by their requirement to support ACID properties which do not scale well horizontally.

The paper goes over a number of concepts that are useful to generic large-scale distributed systems. It is interesting even if you are not interested in using Dynamo because it explains a lot of distributed computing recurring themes.

Link to paper: http://dl.acm.org/citation.cfm?id=1294281, full pdf.

Keywords: key-value stores, large scale distributed systems.

Advantages:

Drawbacks:

Key Concepts:

Gossip-Based Protocol

In a network of N nodes that want to know each other state, a naive approach is for all nodes to broadcast its status to all node for all status changes. However, as the number of nodes grows, this approach does not scale.

A gossip approach uses statistical propagation; for example, every second, every node selects another node randomly and sends its state. There are thorough mathematical underpinnings to consider (see [8]), but intuitively this generates a finite amount of traffic as N increases (N communications every second, instead of N square for the previous algorithm). Also intuitively, if the rate of change is slow enough, all nodes end-up with up-to date information on all nodes. Of course, this is another case of eventual consistency as it can take a lot of time for information to propagate to other nodes.

Unclear

It was not clear to me at first the difference between Dynamo and Amazon S3. Dynamo is for storing/retrieving small objects at very high throughput rates whereas Amazon S3 is used for huge objects that are not queried very frequently.

How to structure data for a service when relational queries are not available. Dynamo does support relational and non-relational DB for storage, when it uses SQL does it offer queries? Aren't SQL queries fundamentally incompatible with the consistent-hashing-node-partition used in Dynamo?