ScalabilitySystem Design

An Overview of Distributed Caching

This is the 4th post in a series on System Design.

Nearly 3% of infrastructure at Twitter is dedicated to application-level caching Link

In an application, caches can be found in many places. CPUs that run your applications have fast, multilevel hardware caches to reduce main memory access times.

Scalable systems require distributed caching. By caching the results of expensive queries and computations, subsequent requests can reuse them at a low cost. The system can scale to handle more workloads because it does not have to reconstruct the cached results for every request.

Application Caching

By caching the results of queries and computations in memory, application caching improves request responsiveness. As many queries can be served directly from the cache, caching relieves databases of heavy read traffic. Additionally, it reduces the computation costs of expensive objects.

Storage of cached results requires additional resources and therefore costs. When compared to upgrading database and service nodes, well-designed caching schemes are low cost.

Dedicated distributed cache engines are used for application-level caching. Memcached and Redis are the two most prevalent technologies in this area. Essentially, both are distributed in-memory hash tables for arbitrary data, such as results of database queries or API calls to downstream services

Data from user sessions and the results of database queries are commonly stored in caches. Application services see the cache as a single store, and objects are assigned to individual cache servers based on their key.

First, the service checks the cache to see if the data it needs is there. A cache hit occurs when the cached contents are returned as the results. Cache misses occur when the data is not in the cache, so the service retrieves the requested data from the database and writes the query results to the cache for future client requests.

The results of cache access must be associated with a key. Cache contents can be invalidated by using an expiry time, such as the TTL. This prevents clients from receiving stale, out-of-date results from a service. Additionally, it allows the system to control cache contents, which are typically limited. The cache will fill up if cached items are not flushed periodically. To free up space for more current, timely results, a cache will adopt a policy such as least recently used or least accessed.

Application caching can boost throughput, reduce latencies, and improve client application responsiveness. In order to achieve these desirable qualities, the cache must satisfy as many requests as possible. As a general principle, the cache hit rate should be maximized and the cache miss rate minimized. In the event of a cache miss, the request must be fulfilled by querying databases or downstream services. The results of the request can then be written to the cache and hence be available for further access.

The cost of cache misses can negate the cache’s benefits when items are updated regularly. In order to construct caching mechanisms that yield the most benefit, service designers must carefully examine the query and update patterns an application experiences. Once a service is in production, it is crucial to monitor cache usage to ensure the hit and miss rates are within design expectations. To monitor cache usage characteristics, caches will provide both management utilities and APIs.

Application-level caching is also known as cache-aside caching. When the required results are available in the cache, the application code effectively bypasses the data storage systems.

In contrast, other caching patterns require the application to read from and write to the cache continuously. The following are defined as follows:

  • Read-through Cache—By accessing the cache, the application fulfills all requests. If the data required is not in the cache, a loader is invoked to access the data systems and load the results into the cache.
  • Write-through Cache— Updates are always written to the cache by the application. As soon as the cache is updated, a writer is invoked to write the new cache values to the database. The application can complete the request once the database has been updated.
  • Write-behind Cache — The application does not wait for the cached value to be written to the database like a write-through. Caching increases request responsiveness at the expense of possible lost updates if the cache server crashes before a database update is completed. It is also called a write-back cache and is the strategy most database engines use internally.

The read-through, write-through, and write-behind strategies require caching technology augmented with an application-specific handler to perform database reads and write when the application accesses the cache.

The cache-aside strategy has the advantage of being resilient to cache failure. When the cache is unavailable, all requests are treated as cache misses. The performance will suffer, but services will still be able to satisfy requests. As a result of their simple, distributed hash table model, cache-aside platforms like Redis and Memcached are easy to scale. Massively scalable systems primarily use the cache-aside pattern because of these reasons.

Web Caching

Due to the abundance of web caches on the internet, websites are incredibly responsive. A copy of a given resource is stored in a web cache.

Typically, caches store only the results of GET requests, and the cache key is the associated URI. A client’s GET request may be intercepted by one or more caches along the request path. Any cache that has a fresh copy of the requested resource can respond to the request. The origin server serves the request if no cached content is found.

HTTP caching directives allow services to control what results are cached and for how long. These directives are set in various HTTP response headers.


Client requests and service responses can use the Cache-Control HTTP header to specify how caching should be used for the resources in question. The following values are possible:

no-store — Request-response resources should not be cached. Typically, this is used for sensitive data that needs to be retrieved from the origin servers.

no-cache — Before using a cached resource, it must be revalidated with an origin server

private — Specifies that a resource can only be cached by a user-specific device, such as a web browser.

public — Provides a resource that can be cached by any proxy server.

max-age — Sets the amount of time in seconds that a cached copy of a resource should be retained. After expiration, a cache must refresh the resource by sending a request to the origin server.

Expires and Last-Modified

To control how long cached data is retained, the Expires and Last-Modified HTTP headers interact with the max-age directive.

Since caches have limited storage resources, they must periodically evict items from memory to free up space. Services can specify how long cached resources should remain valid or fresh in order to influence cache eviction. In response to a fresh resource request, the cache serves the locally stored results without contacting the origin server. When a cached resource’s retention period expires, it becomes stale and is eligible for eviction.

A combination of header values is used to calculate freshness. As the primary directive, the “Cache-Control: max-age=Value” header specifies the freshness period in seconds. The Expires header is checked next if max-age is not specified. This header is used to calculate the freshness period if it exists. The Expires header specifies a date and time after which the resource should be considered stale. Resource retention periods can be calculated using the Last-Modified header as the last resort.


Another HTTP directive can be used to control cache item freshness. An Etag is what we call it. Using Etag values, web caches can determine if cached resources are still valid.

The origin server responds with a maximum age that defines the cache freshness, as well as an Etag that represents the last version of the response.

When a cache expires, the resource becomes stale. In response to a request for a stale resource, the cache forwards the request to the origin server with an If-None-Match directive and the Etag to determine if it is still valid. It is called revalidation.

There are two possible responses to this request:

  • When the Etag in the request matches the value associated with the resource in the service, the cached value remains valid. It is therefore possible for the origin server to return a 304 (Not Modified) response. Since the cached value is still current, no response body is required, saving bandwidth, especially for large resources. Additionally, the response may include new cache directives to update the freshness of the cached resource.
  • An origin server may ignore the revalidation request and respond with a 200 OK response code, a response body, and an Etag representing the latest version of the resource.

The use of web caching can significantly reduce latencies and save network bandwidth when used effectively. Especially for large items like images and documents, this is true. Additionally, web caches handle requests rather than application services, reducing the load on origin servers.

Proxy caches such as Squid and Varnish are extensively deployed on the internet.


Scalable distributions require caching. Caching stores information that is requested by many clients in memory and serves it as requested by clients. The information can be served potentially millions of times without having to recreate it.

The most common method of caching in scalable systems is through the use of a distributed cache. When a client request arrives, the application logic checks for cached values and returns them if they exist. It is possible to significantly reduce the load on backend services and databases if the cache hit rate is high.

A multilevel caching infrastructure is also built into the internet. HTTP headers contain cache directives that can be exploited by applications. Caching directives allow services to specify what information can be cached and for how long, and to check if stale cache entries are still valid using a protocol. When used wisely, HTTP caching can significantly reduce the load on downstream services and databases.


Book: Memcached A Complete Guide
Book: Foundations of Scalable Systems

Leave a Reply

Your email address will not be published.