HTTP requests caching
Behavior
We currently have AMO behind a CDN, which caches HTTP requests. It works like a standard reverse proxy: when a request comes in, a cache key is generated from various information sent by the client, a cached response is returned if one is found, otherwise the request is forwarded to the origin and returned to the client while being put in cache if appropriate.
For the cache key, we take into account the following parameters:
- Country corresponding to the client IP using GeoIP
- Value of the
frontend_active_experiments
cookie (parsed from theCookie
request header) - its absence being a separate value as far as the cache key is concerned - The following HTTP headers:
Accept-Encoding
User-Agent
DNT
- The following cookies, extracted from the
Cookie
HTTP header: frontend_active_experiments
sessionid
If a response is found in the cache with the key, it's returned and the request never reaches the origin server.
If a response is not found in the cache, the request is forwarded to the origin server, and if the response returned by the origin server contains a Cache-Control: s-maxage=<value>
header, it's cached using the same logic to determine the key described above. The duration of the cache is the value of that header.
Behind the scenes the cache key is generated with a mix of hardcoded CDN configuration and HTTP headers returned in the Vary
header(s) in the response. It might include more headers depending on the page, for instance pages doing Accept-Language
detection add that header to the key automatically by adding it to the Vary
header in the response).
The origin will send a Cache-Control: s-maxage=<value>
header (causing the CDN to cache the response) on all responses unless the request came in with a sessionId
or the response being generated is a 40x or 50x. On top of that, a Cache-Control: max-age=0
is sent by default so browsers themselves never cache the responses, to deal with authentication and back/forward cache interaction.
Additional considerations
Cookies in requests
We can't Vary
on a specific cookie, only the whole header, which would include all cookies ever set on the AMO domain, including analytics - so we would likely see an extremely poor cache hit ratio if we did that. Therefore, cookies that affect the CDN cache are hardcoded in the CDN configuration for a given path pattern. This allows us to cache differently based on the value of frontend_active_experiments
for instance, but any other cookie not specified in that configuration will be ignored for caching purposes. If a request comes in with a foo=bar
cookie, it could be served the same response from cache as someone coming in without it.
Cache duration
We currently return 180
as the number of seconds to cache responses. The CDN might potentially serve stale responses while it's populating the cache, so sometimes clients might see a cached response that is a bit older than that.
API
The API also has a similar caching layer, using a different set of parameters for the cache key: User-Agent
and DNT
are ignored, Origin
is used instead, frontend_active_experiments
cookie is ignored, and the cache is bypassed for requests coming in with a sessionid
cookie or Authorization
header instead of the sessionId
cookie.