Caching Strategies for Federated GraphQL
A federated GraphQL deployment is not a single cache — it is a stack of independent cache layers, each with its own keys, scopes, and invalidation semantics, that the Apollo Router and your subgraphs must keep coherent. Treating caching as an afterthought produces stale entity reads, leaked private data, and unpredictable hit rates; treating it as a deliberate, layered architecture turns the same supergraph into a system that serves the overwhelming majority of reads without ever touching a datasource. This guide maps every cache tier in a production federated graph, from the CDN edge down to per-request DataLoader memoisation, and explains how cache-control hints propagate across the boundary. It is part of Federated GraphQL Operations in Production, the parent guide covering router deployment, observability, and lifecycle concerns.
Problem Statement
In a monolithic GraphQL server you have one place to cache: the response, the resolver, or the datasource. In a federated graph the request fans out across the router and N subgraphs, and a single client query may assemble data from three services with wildly different freshness requirements — a product catalog that changes hourly, inventory that changes per second, and a user profile that is private to the caller. A naive whole-response cache is wrong because it cannot express “cache the catalog fields publicly for an hour but never cache the viewer’s cart.” The federated answer is a set of cooperating caches keyed at different granularities, governed by @cacheControl hints that every subgraph emits and the router aggregates into the strictest applicable policy. Get the propagation rules wrong and you either cache nothing (no benefit) or cache a private field in a shared tier (a security incident).
Prerequisites
The Cache Hierarchy: What Lives Where
Caching in a federated graph is best understood as a descent through layers, each catching what the layer above it missed. A read that misses every tier walks all the way to the datasource; a read that hits the CDN never reaches the router at all.
The strictest layer wins. A response is cacheable at the CDN only if every contributing field is PUBLIC and the operation arrived as a cacheable GET; the moment any field is PRIVATE or the operation is a mutation, the edge passes it through. The router cache sits one layer in, operating at entity granularity so that the PUBLIC catalog half of a mixed query can be cached even when the PRIVATE cart half cannot. Subgraph-local caches and DataLoader catch the remainder, collapsing the per-entity _entities fan-out into a small number of datasource calls.
Layer 1 — HTTP and CDN Caching
The cheapest cache is the one that answers before your infrastructure wakes up. HTTP caching at a CDN or reverse proxy works for GraphQL only under two conditions: the operation is sent as a GET request (so intermediaries treat it as cacheable), and the response carries a Cache-Control header the edge can honour. POST bodies are not cacheable by standard HTTP caches, which is why CDN caching is almost always paired with automatic persisted queries — APQ replaces a large POST body with a short hashed GET, and the hash becomes a clean cache key. See Implementing Automatic Persisted Queries in Federation for the full handshake.
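As a rough illustration of the hashed GET described above, the sketch below builds a persisted-query URL in the shape the APQ protocol uses: an extensions parameter carrying persistedQuery.version and sha256Hash. The buildApqGetUrl helper, the endpoint, the operation name, and the hash value are all placeholders invented for this example; the hash is assumed to be the SHA-256 hex digest of the query text, computed elsewhere.

```typescript
// Hypothetical helper: builds the GET form of an automatic persisted query.
// sha256Hash is assumed to be precomputed from the query text.
interface ApqRequest {
  endpoint: string;
  operationName: string;
  sha256Hash: string;
  variables?: Record<string, unknown>;
}

function buildApqGetUrl(req: ApqRequest): string {
  const extensions = {
    persistedQuery: { version: 1, sha256Hash: req.sha256Hash },
  };
  // Fixed parameter order keeps the URL stable for identical requests.
  const params: Array<[string, string]> = [
    ["operationName", req.operationName],
    ["variables", JSON.stringify(req.variables ?? {})],
    ["extensions", JSON.stringify(extensions)],
  ];
  const query = params
    .map(([name, value]) => `${name}=${encodeURIComponent(value)}`)
    .join("&");
  return `${req.endpoint}?${query}`;
}

const exampleUrl = buildApqGetUrl({
  endpoint: "https://router.example.com/graphql", // placeholder endpoint
  operationName: "ProductPage",                    // placeholder operation
  sha256Hash: "abc123",                            // placeholder, not a real digest
  variables: { id: "42" },
});
```

Because the same operation and variables always produce the same URL, an intermediary can use the URL itself as the cache key. Clients that serialize variables with a different property order would produce different URLs, so a stable serialization matters in practice.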
The router computes the response Cache-Control header by taking the most restrictive hint across every field in the response. If any field is no-store or private, the whole response becomes uncacheable at shared HTTP caches. The header the router emits looks like:
# A query whose fields all carry PUBLIC max-age >= 60
Cache-Control: max-age=60, public
# A query that touched any PRIVATE field
Cache-Control: max-age=0, private
Edge caching is therefore an all-or-nothing layer per response: it is enormously effective for read-only, fully-public operations (a marketing storefront, public documentation graph, anonymous product browsing) and silently inert for anything user-specific. Do not try to force it; let it serve the operations it can and rely on the entity cache for mixed and private traffic.
Layer 2 — The Apollo Router Entity/Response Cache
The router’s entity cache is the workhorse of federated caching because it operates below the response and above the subgraphs. Rather than caching whole responses, it caches the result of each subgraph _entities fetch keyed by entity type, key fields, the requested field set, and (for PRIVATE data) a session identifier. When a query plan dispatches a fetch to a subgraph, the router first checks Redis; on a hit it skips the subgraph call entirely and splices the cached entity into the response. This is what lets a single query cache its catalog portion while re-fetching its inventory portion on every request.
The cache is Redis-backed for horizontal scale: every router replica shares one store, so a warm entity benefits all of them. Each subgraph declares its own TTL and scope through cache-control hints, and the router records those per subgraph — a 1-hour TTL on the products subgraph and a 5-second TTL on inventory coexist in the same cache. The deep-dive on key structure, TTL tuning, PRIVATE vs PUBLIC partitioning, and invalidation lives in Entity Caching with the Apollo Router Response Cache.
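To make those key dimensions concrete, here is a sketch of how such a key could be composed. The entityCacheKey helper and the key layout are invented for illustration and are not the router's actual format; the point is only that the entity type, its key fields, the requested field set, and (for PRIVATE data) a session identifier all take part in the key.

```typescript
// Illustrative only: NOT the Apollo Router's real key layout.
interface EntityKeyParts {
  subgraph: string;
  typename: string;
  keyFields: Record<string, string | number>; // e.g. { id: "42" }
  selection: string;                          // requested fields, e.g. "name price"
  privateId?: string;                         // set only for PRIVATE entries
}

function entityCacheKey(parts: EntityKeyParts): string {
  // Sort key-field names so { a, b } and { b, a } map to the same entry.
  const keys = Object.keys(parts.keyFields)
    .sort()
    .map((name) => `${name}=${String(parts.keyFields[name])}`)
    .join(",");
  // Sort selected field names so selection order does not fragment the cache.
  const fields = parts.selection
    .split(/\s+/)
    .filter((field) => field.length > 0)
    .sort()
    .join(" ");
  // PRIVATE entries get a per-session partition; PUBLIC entries share one.
  const scope =
    parts.privateId !== undefined ? `private:${parts.privateId}` : "public";
  return [parts.subgraph, parts.typename, keys, fields, scope].join("|");
}
```

With this layout, two users requesting the same PRIVATE entity produce different keys, while every caller requesting the same PUBLIC entity and field set shares one entry.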
A minimal router configuration enabling the entity cache:
# router.yaml: enable the Redis-backed entity cache
preview_entity_cache:
  enabled: true
  redis:
    urls:
      - "redis://redis-cache.internal:6379"
    timeout: 5ms # fail fast: a slow cache must not slow the request
  subgraphs:
    products:
      enabled: true
      ttl: 3600s # catalog changes slowly; cache aggressively
    inventory:
      enabled: true
      ttl: 5s # near-real-time stock; tiny TTL still cuts load
    users:
      enabled: true
      private_id: "sub" # partition PRIVATE entries by the JWT 'sub' claim
Layer 3 — @cacheControl and Cache Hints
Cache hints are the contract that makes the upper layers safe. Every subgraph annotates its types and fields with a maxAge (TTL in seconds) and a scope (PUBLIC or PRIVATE), and the router aggregates these into the policy it applies at the response and entity layers. A subgraph emits hints either through static @cacheControl directives in SDL or dynamically from a resolver via info.cacheControl.setCacheHint(...).
# products subgraph SDL: static cache hints
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.9", import: ["@key"])
  @link(url: "https://specs.apollo.dev/cache-control/v0.1", import: ["@cacheControl"])

type Product @key(fields: "id") @cacheControl(maxAge: 3600, scope: PUBLIC) {
  id: ID!
  name: String!
  description: String!
  # Field-level hint overrides the type default for this field only
  price: Money! @cacheControl(maxAge: 60)
}

type Query {
  product(id: ID!): Product
}
The aggregation rule is “minimum maxAge, strictest scope.” If a query selects Product.name (maxAge 3600) and Product.price (maxAge 60), the effective maxAge for that response is 60. If any selected field is PRIVATE, the whole response is PRIVATE. Fields with no hint are treated as uncacheable unless you set a defaultMaxAge. Leaving them uncacheable is the safe default: a field you forgot to annotate should never silently land in a shared cache.
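The rule can be sketched in a few lines. This is a simplified model of "minimum maxAge, strictest scope", not the router's implementation; the CacheHint and Policy types and both function names are made up for the example, and an unannotated field is modelled as falling back to a defaultMaxAge of 0.

```typescript
type Scope = "PUBLIC" | "PRIVATE";

interface CacheHint {
  maxAge?: number; // seconds; undefined means the field carries no hint
  scope?: Scope;   // undefined is treated as PUBLIC
}

interface Policy {
  maxAge: number;
  scope: Scope;
}

// Fold per-field hints into one policy: smallest maxAge, strictest scope.
function aggregateHints(hints: CacheHint[], defaultMaxAge = 0): Policy {
  if (hints.length === 0) {
    return { maxAge: 0, scope: "PUBLIC" }; // nothing selected, nothing to cache
  }
  let maxAge = Number.POSITIVE_INFINITY;
  let scope: Scope = "PUBLIC";
  for (const hint of hints) {
    maxAge = Math.min(maxAge, hint.maxAge ?? defaultMaxAge);
    if (hint.scope === "PRIVATE") {
      scope = "PRIVATE"; // one PRIVATE field makes the whole result PRIVATE
    }
  }
  return { maxAge, scope };
}

// Render the policy in the same shape as the header examples in this guide.
function toCacheControlHeader(policy: Policy): string {
  return `max-age=${policy.maxAge}, ${policy.scope.toLowerCase()}`;
}

// Product.name (3600, PUBLIC) and Product.price (60, PUBLIC)
const catalogPolicy = aggregateHints([
  { maxAge: 3600, scope: "PUBLIC" },
  { maxAge: 60, scope: "PUBLIC" },
]);
const catalogHeader = toCacheControlHeader(catalogPolicy);
```

Adding a single hint-less field to the list drops maxAge to 0 under this model, which is the behaviour to look for when a response unexpectedly stops caching.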
// Dynamic hint from a resolver: useful when TTL depends on data
const resolvers = {
  Query: {
    product: async (_p, { id }, ctx, info) => {
      const product = await ctx.loaders.product.load(id);
      // Discontinued products never change again: cache them for a day
      if (product?.discontinued) {
        info.cacheControl.setCacheHint({ maxAge: 86400, scope: 'PUBLIC' });
      }
      return product;
    },
  },
};
Layer 4 — DataLoader Caching Inside Subgraphs
The lowest layer is per-request memoisation. DataLoader does two jobs at once: it batches the per-entity _entities calls the router dispatches into a single datasource query, and it deduplicates repeated loads of the same key within one request. Crucially, DataLoader’s cache is request-scoped — it must be instantiated in the context factory, never at module scope, or it leaks data across requests. This is the same discipline detailed in Optimizing Reference Resolvers for Performance.
import { ApolloServer } from '@apollo/server';
import { startStandaloneServer } from '@apollo/server/standalone';
import DataLoader from 'dataloader';

// Db, Product, db, and schema are assumed to be defined elsewhere in the subgraph.
interface Context {
  loaders: { product: DataLoader<string, Product | null> };
}

// Per-request: created fresh for every incoming operation
const createLoaders = (db: Db) => ({
  product: new DataLoader<string, Product | null>(async (ids) => {
    const rows = await db.products.findMany({ where: { id: { in: [...ids] } } });
    const byId = new Map(rows.map((r) => [r.id, r]));
    return ids.map((id) => byId.get(id) ?? null); // preserve key order
  }),
});

const server = new ApolloServer<Context>({ schema });
const { url } = await startStandaloneServer(server, {
  context: async () => ({ loaders: createLoaders(db) }),
});
DataLoader caching is not a substitute for the router or Redis cache — it lives only for the duration of one request and vanishes afterward. Its value is collapsing the N-key _entities batch the router sends into one datasource round trip, which is the single most important defence against the N+1 pattern in entity resolution. For a cross-request entity cache inside a subgraph, layer an L1 (in-memory LRU) and L2 (Redis) cache behind the loader’s batch function, as shown in the caching section of the resolver performance guide.
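A minimal sketch of that L1/L2 layering, under stated assumptions: LruCache, SharedStore, InMemoryStore, and cachedBatchFn are hypothetical names, the L2 store is a Map-backed stand-in so the example runs without Redis, and TTLs and invalidation are left out entirely. The returned function has the shape of a DataLoader batch function and could be passed to one.

```typescript
// L1: tiny in-memory LRU. Map keeps insertion order, so the first key is the oldest.
class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

// L2: hypothetical interface; a real implementation would wrap a Redis client.
interface SharedStore<V> {
  getMany(keys: readonly string[]): Promise<Array<V | undefined>>;
  setMany(entries: ReadonlyArray<[string, V]>): Promise<void>;
}

// Map-backed stand-in for Redis so the sketch is runnable on its own.
class InMemoryStore<V> implements SharedStore<V> {
  private readonly data = new Map<string, V>();

  async getMany(keys: readonly string[]): Promise<Array<V | undefined>> {
    return keys.map((key) => this.data.get(key));
  }

  async setMany(entries: ReadonlyArray<[string, V]>): Promise<void> {
    entries.forEach(([key, value]) => this.data.set(key, value));
  }
}

// Builds a batch function that checks L1, then L2, then the datasource.
function cachedBatchFn<V>(
  l1: LruCache<V>,
  l2: SharedStore<V>,
  fetchFromDb: (ids: readonly string[]) => Promise<Map<string, V>>,
): (ids: readonly string[]) => Promise<Array<V | null>> {
  return async (ids) => {
    const found = new Map<string, V>();
    let missing: string[] = [];
    for (const id of ids) {
      const hit = l1.get(id);
      if (hit !== undefined) found.set(id, hit);
      else missing.push(id);
    }
    if (missing.length > 0) {
      const fromL2 = await l2.getMany(missing);
      const stillMissing: string[] = [];
      missing.forEach((id, index) => {
        const hit = fromL2[index];
        if (hit !== undefined) {
          found.set(id, hit);
          l1.set(id, hit); // promote to L1
        } else {
          stillMissing.push(id);
        }
      });
      missing = stillMissing;
    }
    if (missing.length > 0) {
      const rows = await fetchFromDb(missing); // one round trip for all misses
      const fresh: Array<[string, V]> = [];
      rows.forEach((row, id) => {
        found.set(id, row);
        l1.set(id, row);
        fresh.push([id, row]);
      });
      await l2.setMany(fresh);
    }
    return ids.map((id) => found.get(id) ?? null); // preserve key order
  };
}
```

Keys the datasource does not return are reported as null and are not cached here, so a repeated lookup of a missing entity still reaches the datasource; whether to cache negative results is a separate design decision.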
Cache Hint and Config Reference
| Knob | Where it lives | Valid values | Applied at | Effect |
|---|---|---|---|---|
| @cacheControl(maxAge:) | Subgraph SDL / resolver | integer seconds | composition + runtime | TTL for the field/type |
| @cacheControl(scope:) | Subgraph SDL / resolver | PUBLIC or PRIVATE | runtime | shared vs per-session |
| info.cacheControl.setCacheHint | Subgraph resolver | { maxAge, scope } | runtime | data-dependent TTL |
| defaultMaxAge | Subgraph cache plugin | integer seconds | runtime | fallback for unannotated fields |
| preview_entity_cache.enabled | router.yaml | bool | runtime | turns on Redis entity cache |
| subgraphs.<name>.ttl | router.yaml | duration | runtime | per-subgraph entity TTL |
| subgraphs.<name>.private_id | router.yaml | claim name | runtime | PRIVATE partition key |
| apq.router.cache | router.yaml | redis config | runtime | APQ hash → query store |
Step-by-Step: Wiring the Layers Together
- Classify every field. Tag each type and field as PUBLIC or PRIVATE in subgraph SDL with @cacheControl. Set a conservative defaultMaxAge (or none) so unannotated fields are never cached.
- Enable DataLoader in each subgraph. Instantiate loaders in the context factory and route __resolveReference through them so the router’s _entities batch collapses to one datasource call.
- Turn on the router entity cache. Add preview_entity_cache with Redis URLs and per-subgraph TTLs. Start with short TTLs and widen them as you observe hit rates.
- Partition PRIVATE data. For subgraphs that emit PRIVATE hints, set private_id to a stable session/user claim so per-user entries never collide.
- Add APQ + CDN for public read traffic. Enable APQ on the router and switch read clients to GET; let a CDN cache fully-public responses by the operation hash.
- Verify aggregation. Run a mixed query and inspect the response Cache-Control header. It should reflect the minimum maxAge and strictest scope across all selected fields.
Composition and CI Integration
Cache-control directives are part of subgraph SDL and therefore flow through normal composition. The @cacheControl directive must be imported via @link in every subgraph that uses it, and rover supergraph compose will surface a composition error if a directive is referenced but not linked. Run rover subgraph check in CI before publishing so a change to a cache hint — say, widening a TTL or flipping a field to PUBLIC — is reviewed like any other schema change.
# CI: validate the subgraph (including cache-control directive usage) before publish
rover subgraph check my-supergraph@prod \
  --name products \
  --schema ./products.graphql

rover subgraph publish my-supergraph@prod \
  --name products \
  --schema ./products.graphql \
  --routing-url https://products.internal/graphql
Treat a field flipping from PRIVATE to PUBLIC as a security-sensitive change in review: it moves data from a per-session partition into a shared cache. A schema check diff makes that visible to reviewers.
Performance and Scale Considerations
The single biggest lever is the entity cache hit rate on your slowest subgraphs. A 90% hit rate on a subgraph that takes 80ms at the datasource turns its effective contribution to p50 latency into single-digit milliseconds. Keep the router’s Redis timeout aggressive (single-digit milliseconds) — a cache lookup that is slower than the datasource is worse than no cache, and the router must fall through to the subgraph quickly on a slow Redis. Size Redis for your working set of hot entities, not your entire dataset; the cold remainder of entities will miss and that is fine.
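The arithmetic behind that claim can be checked with a small helper. The 90% hit rate and 80 ms datasource come from the paragraph above; the 1 ms Redis lookup is an assumption chosen for illustration, and expectedLatencyMs is a made-up name. The helper computes the mean, not the median: at a 90% hit rate the median request is itself a cache hit, so p50 sits near the lookup time while the mean lands at roughly 9 ms.

```typescript
// Mean latency of a read-through cache.
// A hit pays only the cache lookup; a miss pays the lookup and then the origin.
function expectedLatencyMs(
  hitRate: number,
  cacheMs: number,
  originMs: number,
): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be between 0 and 1");
  }
  return hitRate * cacheMs + (1 - hitRate) * (cacheMs + originMs);
}

// 90% hits, 1 ms cache lookup (assumed), 80 ms datasource.
const withCache = expectedLatencyMs(0.9, 1, 80);

// Same subgraph with no cache in front of it: every read pays the datasource.
const withoutCache = expectedLatencyMs(0, 0, 80);
```

The same function shows why a slow cache hurts: raising cacheMs adds to every request, hits and misses alike, which is the reason for the aggressive timeout.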
PRIVATE caching multiplies cardinality by the number of distinct sessions, so it pays off only for entities a single user reads repeatedly within a TTL window. For data that is both private and rarely re-read, skip the shared cache and rely on DataLoader within the request. Finally, remember that every cache layer adds an invalidation obligation: the wider the TTL, the staler the worst-case read, so pair aggressive TTLs with explicit invalidation on mutation where correctness demands it.
Failure Modes and Debugging
Private data served from a shared cache. The classic and most dangerous bug: a field that varies per user was annotated (or defaulted) as PUBLIC, so user A’s entity is served to user B. The fix is to mark the field scope: PRIVATE and set a private_id. Audit for this by checking that every resolver returning user-specific data emits a PRIVATE hint.
Everything is uncacheable. If the response Cache-Control is always max-age=0, private, one field is dragging the aggregation down — usually an unannotated field with no defaultMaxAge, or one stray PRIVATE field. Inspect the per-field hints in the trace and find the minimum.
Stale reads after a mutation. A wide TTL with no invalidation means a write is invisible until the entry expires. Either shorten the TTL for write-heavy entities or issue an explicit cache delete on mutation, as covered in the entity cache guide.
Cache lookups slower than the datasource. A misconfigured or overloaded Redis with a generous timeout makes cached requests slower than uncached ones. Lower the router’s Redis timeout so it fails fast and falls through to the subgraph.
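The fail-fast behaviour can be pictured as a race between the cache lookup and a timer. The sketch below illustrates the pattern in general terms and is not how the router is implemented; lookupWithTimeout and readWithFailFastCache are hypothetical names, and a cache error is treated the same as a miss.

```typescript
// Resolve with the cached value, or with undefined if the lookup is too slow or fails.
async function lookupWithTimeout<V>(
  lookup: () => Promise<V | undefined>,
  timeoutMs: number,
): Promise<V | undefined> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<undefined>((resolve) => {
    timer = setTimeout(() => resolve(undefined), timeoutMs);
  });
  try {
    // Whichever settles first wins; a cache error counts as a miss.
    return await Promise.race([lookup().catch(() => undefined), timedOut]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Read-through: use the cache when it answers in time, otherwise go to the origin.
async function readWithFailFastCache<V>(
  cacheLookup: () => Promise<V | undefined>,
  origin: () => Promise<V>,
  timeoutMs: number,
): Promise<V> {
  const cached = await lookupWithTimeout(cacheLookup, timeoutMs);
  return cached !== undefined ? cached : origin();
}
```

With a timeout of a few milliseconds, the worst case for a request is the timeout plus the origin latency, instead of an unbounded wait on an overloaded cache.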
Frequently Asked Questions
Does the Apollo Router cache whole responses or individual entities?
Both layers exist, but the entity/response cache the router ships operates at entity granularity — it caches the result of each subgraph _entities fetch independently, keyed by type, key fields, and field set. This is what lets a mixed query cache its PUBLIC entities while re-fetching its PRIVATE ones. Whole-response caching happens above the router at the CDN/HTTP layer, and only applies when every field in the response is PUBLIC.
How do cache hints from different subgraphs combine?
The router takes the minimum maxAge and the strictest scope across every field contributing to a response or entity. If one subgraph says maxAge: 3600, PUBLIC and another says maxAge: 5, PUBLIC, the response gets maxAge: 5. If any field is PRIVATE, the whole response is PRIVATE. A field with no hint and no defaultMaxAge is treated as uncacheable, which pulls the effective maxAge to zero.
Can I cache data that is different for each user?
Yes, with PRIVATE scope. Mark the field or type scope: PRIVATE and configure private_id on the router pointing to a stable per-user claim (commonly the JWT sub). The router then partitions cache entries by that identifier so users never see each other’s data. PRIVATE caching only pays off when a single user re-reads the same entity within the TTL; otherwise the cardinality cost outweighs the benefit.
Do I still need DataLoader if the router has an entity cache?
Yes. The router cache prevents cross-request datasource hits, but on a cache miss the subgraph still receives a batch of entity keys in one _entities call. Without DataLoader, that batch becomes N separate datasource queries (the N+1 pattern). DataLoader collapses the batch into one query and also deduplicates repeated keys within a single request — work the router cache cannot do.
Why is CDN caching tied to persisted queries?
Standard HTTP caches do not cache POST requests, and full GraphQL operations are normally sent as POSTs because the query string is large. Automatic persisted queries let the client send a short hash as a GET request, which intermediaries treat as cacheable, with the hash serving as a clean cache key. A full query can also be sent as a plain GET, but large operations quickly run into URL length limits, so in practice APQ is the most practical way to get edge caching for GraphQL reads.