Paper Reading Notes: Scaling Memcache At Facebook

Facebook uses memcache as a key-value cache between frontend instances and MySQL databases. It is optimized for read-heavy workloads where hitting the SQL databases directly would be prohibitively expensive. Memcache itself (the original open-source project) is an in-memory, single-machine key-value cache; this paper explains how Facebook scaled it out to a multi-machine, multi-region key-value cache backed by a SQL permanent store. It also goes over how it is used in their server-side architecture.

Typically, a client connects to a frontend through a load balancer; the frontend requests the data needed to render the client's request (say, a friend's feed) from many different memcache instances, aggregates the results, and returns the response to the client. This is an all-to-all communication pattern, where frontends fetch data from many memcache instances (in theory, one frontend instance could be connected to all memcache instances; memcache instances do not communicate with each other). A sketch of this fan-out follows below; see the read / write section for what happens on a cache miss or on a delete and the subsequent cache invalidation.
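
A minimal sketch of that fan-out-and-aggregate read path from a frontend's point of view (the memcache_client, db, and aggregate names are placeholders for illustration, not the actual client library):

```python
# Illustrative only: how a frontend fans a request out to many memcache
# instances and aggregates the results. All names are hypothetical.

def render_feed(user_id, memcache_client, db):
    # Compute the keys this request needs (e.g. one item per friend).
    friend_ids = db.query_friend_ids(user_id)
    keys = [f"feed_item:{fid}" for fid in friend_ids]

    # The client library batches keys by destination memcache instance,
    # so one request can touch many instances (the all-to-all pattern).
    cached = memcache_client.multi_get(keys)

    # Misses are handled by the look-aside path described below.
    items = [v for v in (cached.get(k) for k in keys) if v is not None]
    return aggregate(items)
```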

Keywords: key-value stores, memcache, large scale distributed systems, distributed cache.

Link to paper: https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/nishtala (the full PDF is available on that page).

Key Concepts

How Reads / Writes Are Done

Read:

When a get issued to a memcache instance is successful (the {key, value} pair is in memory), the value is simply returned to the client. On a cache miss, the client (the frontend web server) is responsible for fetching the data from the MySQL database and then populating the missed key in memcache (see figure 1).
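
In pseudo-Python, the look-aside read path looks roughly like this (cache and db are placeholders; beyond memcache's basic get/set, everything here is illustrative):

```python
def read(key, cache, db):
    value = cache.get(key)        # try memcache first
    if value is None:             # cache miss
        value = db.fetch(key)     # the web server queries MySQL itself
        cache.set(key, value)     # and repopulates the cache for later reads
    return value
```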

Write:

On a write (or delete), the data is first updated (or deleted) in the MySQL database, then the memcache instances holding the corresponding key are sent an invalidation (a delete). Of course, this means it is possible for a frontend server to get a stale value from a memcache instance after the data has already changed in the database.
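
And the corresponding write path, again as an illustrative sketch (deletes are idempotent, which is one reason invalidation is preferred over pushing the new value into the cache):

```python
def write(key, new_value, cache, db):
    db.update(key, new_value)     # commit the change to MySQL first
    cache.delete(key)             # then invalidate; the next read repopulates
```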

This is eventual consistency: it is possible to read stale data, and your application just has to deal with it.

Overarching Architecture

Facebook is organized into regions (machines geographically located together) and clusters (sets of machines within a region). Data is organized into one master MySQL database and a number of replicas that can be in different regions / clusters (5.0). A cluster contains a set of frontends (web servers that the client directly connects to), memcache instances, and MySQL database instances. Each frontend is connected to a number (up to all) of memcache instances through a proxy named mcrouter. The memcache instances do not communicate with each other.
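
Within a cluster, keys are spread across the memcache instances by hashing (the paper mentions consistent hashing). A naive sketch of instance selection, ignoring the remapping properties that consistent hashing actually provides:

```python
import hashlib

def pick_instance(key, instances):
    # Naive modulo hashing, for illustration only. The real system uses
    # consistent hashing so that adding or removing an instance only
    # remaps a small fraction of the keys.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return instances[h % len(instances)]
```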

Keys can be replicated to multiple clusters, but I can't find any details on the criteria, or on how a frontend instance gets more than one cache location from hashing a key.

When a write / delete occurs on a database (MySQL), an invalidation daemon (called "mcsqueal") is informed and notifies each memcache instance holding a key whose data has changed. The master / replica MySQL servers are kept in sync using MySQL's replication mechanism.
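
Conceptually, the invalidation pipeline looks something like the sketch below; the commit-log tailing and batching details are my assumptions about the mechanism, not code from the paper:

```python
def mcsqueal_loop(commit_log, cluster_mcrouters):
    # Tail the MySQL commit log for committed statements that carry
    # the memcache keys to invalidate.
    for statement in commit_log.tail():
        keys = extract_keys(statement)            # hypothetical parser
        # Send batched deletes to mcrouter instances in each frontend
        # cluster; they route each delete to the memcache instance
        # that owns the key.
        for router in cluster_mcrouters:
            router.send_deletes(keys)
```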

Details

Performance

Gets are performed over UDP, while sets and deletes use long-lived TCP connections through a proxy (mcrouter) that is shared between all threads on a machine.

The paper discusses TCP vs. UDP: UDP gets show roughly a 20% latency benefit, with a model where the client treats a dropped or failed response as a cache miss and implements any retry / fallback logic itself (3.1).
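
A sketch of that client-side behavior for UDP gets, assuming a hypothetical text-protocol helper (the real memcached UDP protocol adds a frame header, omitted here); the treat-failure-as-miss behavior is what section 3.1 describes:

```python
import socket

def udp_get(key, addr, timeout=0.05):
    # Send a get over UDP; if the reply is dropped or late, return None
    # (a cache miss) rather than blocking on retries. The normal miss
    # path (query MySQL, then set over TCP) then takes over.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(f"get {key}\r\n".encode(), addr)
        data, _ = sock.recvfrom(4096)
        return parse_value(data)   # hypothetical response parser
    except socket.timeout:
        return None                # treated as a cache miss
    finally:
        sock.close()
```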

Unanswered Questions / Things Not Clear From Paper