Federated GraphQL Operations in Production

Composing a supergraph is only half the work; running it reliably under production load is the other half. This guide covers the operational disciplines that keep a federated graph healthy in production — deploying and tuning the Apollo Router, instrumenting distributed traces across subgraphs, caching at every layer, and versioning the supergraph so schema changes ship without downtime.

Federation moves the complexity of a large API out of a single codebase and into a runtime topology: a router fans every operation out across independently deployed subgraphs, then stitches the partial responses back together. That topology has to be configured, observed, cached, and evolved like any other distributed system. The pages in this section assume you have already designed your graph — see GraphQL Federation Architecture & Design for boundary and composition decisions — and have working subgraphs as covered in Subgraph Implementation & Entity Resolution. Here we focus exclusively on the production runtime.

Core Operational Concepts

Four concerns dominate the operational lifecycle of a federated graph, and each has a dedicated guide in this section. They are not independent: caching decisions depend on what observability tells you about hot paths, schema migrations depend on the router config that propagates the new fields, and router tuning depends on what the trace data reveals about subgraph behaviour. Treat them as one feedback loop — observe, tune, cache, evolve — rather than four separate checklists.

The router is the single ingress for every client operation, so its configuration and deployment footprint determine your tail latency, your blast radius during incidents, and your scaling behaviour. Apollo Router Configuration and Deployment covers router.yaml structure, supergraph SDL loading, header propagation, traffic shaping, and container/Kubernetes topologies.

Once traffic flows through the router, you cannot debug what you cannot see. A single client query may touch five subgraphs, and a latency regression in any one of them surfaces as a slow request with no obvious cause unless trace context is propagated end to end. Observability and Distributed Tracing in Federation covers OpenTelemetry spans, trace propagation, and metrics that map query plans to subgraph timings.

Federation multiplies network hops — every entity boundary is a potential round trip — so caching is not optional at scale. Caching Strategies for Federated GraphQL covers the layered cache model: query plan caching, entity/response caching at the router, automatic persisted queries, and per-subgraph caching with correct invalidation semantics.

Finally, the supergraph is a living contract. Subgraphs deploy on independent schedules, and a careless field removal can break clients you have never met. Migrating and Versioning Federated Schemas covers safe schema evolution, the @deprecated and @override migration path, contract variants, and the rover-driven publish/check loop that gates breaking changes.

Underlying all four is a shift in how you think about “the API”. In a monolith the API is a deployable unit you version and ship atomically. In a federated graph the API is an emergent property of many independently deployed services plus a composition step; no single team owns the whole surface, and no single deploy changes all of it. That is liberating for development velocity and unforgiving operationally — there is no single rollback button for the graph. Production discipline replaces that button with checks, observability, and ordered rollouts, which is exactly what the guides in this section establish.

Production Topology

The diagram below shows the runtime components of a federated graph in production and how an operation flows through them. A client sends a single GraphQL document to the router. The router, which has already loaded the supergraph SDL from the schema registry (Apollo’s Uplink or a local file), consults its query plan cache, executes the plan across subgraphs, checks its entity caches along the way, and emits traces and metrics to your telemetry backend.

Figure: Federated GraphQL production topology. A client sends one operation to the Apollo Router, which loads supergraph SDL from a schema registry, consults its caches (plan, entity), fans the query plan out across three subgraphs (accounts, products, reviews), and emits traces and metrics to a telemetry backend.

Two properties of this topology drive every operational decision. First, the router is stateful only in its caches and its loaded SDL — the SDL comes from the registry and the caches are rebuildable — so router pods are effectively disposable and scale horizontally. Second, the latency a client experiences is the critical path through the query plan, not the sum of all subgraph calls; the planner parallelises independent fetches, so observability has to attribute time to the plan node, not just the subgraph. Both points recur throughout the guides below.

Read the flow left to right. The client sends one GraphQL document; it does not know or care how many services back the graph. The router parses and validates that document, then either reuses a cached query plan or computes a new one. Planning is typically the most CPU-intensive work the router does itself, which is why the plan cache sits on the hot path; planning also depends on the supergraph SDL that the schema registry supplies. With a plan in hand the router executes its fetches (three subgraphs here, in whatever parallel or serial shape the plan dictates), propagating the configured client headers to each. As partial responses return, the router merges them into the shape the original operation requested and returns the result. Throughout, it emits spans and metrics to the telemetry backend and consults its caches, so observability and caching are not bolt-ons but properties of the request path itself.

The registry edge deserves emphasis. In managed federation the schema registry is the source of truth for what graph the router serves: a subgraph publish recomposes the supergraph, and every router pod picks up the new SDL from Uplink (which it polls) without a restart. In local mode that same SDL is a build artifact you compose with rover and ship alongside the router. Either way, the registry-to-router relationship is what lets independent teams evolve their subgraphs without coordinating a global deploy, which is the central promise of federation, realised at runtime.

Key Configuration Reference

These are the router.yaml keys and rover commands you will reach for most often when operating the graph. Each is covered in depth on the relevant child page. The split is worth internalising: router.yaml keys are runtime configuration that takes effect when the router loads or hot-reloads its config, while the rover commands act on the supergraph schema itself at composition time. Mixing the two mental models — for example, expecting a router.yaml change to fix a composition error, or a rover publish to change a timeout — is a common early confusion. Runtime behaviour lives in the config file; the shape of the graph lives in the schema artifacts that rover produces and publishes.

| Key / Command | Scope | Effect |
| --- | --- | --- |
| supergraph.listen | Router config | Bind address/port for the public GraphQL endpoint |
| headers.all.request.propagate | Router config | Forward client headers (auth, locale) to every subgraph |
| traffic_shaping.all.timeout | Router config | Per-subgraph request deadline before the fetch is failed |
| telemetry.instrumentation.spans | Router config | Enable OpenTelemetry spans for router and subgraph fetches |
| supergraph.query_planning.cache | Router config | In-memory and optional Redis query plan cache |
| health_check.enabled | Router config | Expose /health for liveness/readiness probes |
| rover supergraph compose | CLI | Merge subgraph SDLs into a supergraph SDL artifact |
| rover subgraph check | CLI | Validate a subgraph change against the registered graph + traffic |
| rover subgraph publish | CLI | Publish a subgraph schema, triggering recomposition |

Canonical Production Pattern

The smallest complete production setup is a composed supergraph, a router.yaml, and a launch command. Compose the supergraph from a supergraph.yaml that lists each subgraph and its routing URL.

# supergraph.yaml — input to `rover supergraph compose`
federation_version: =2.9.0
subgraphs:
  accounts:
    routing_url: http://accounts.svc.cluster.local:4001/graphql
    schema:
      subgraph_url: http://accounts.svc.cluster.local:4001/graphql
  products:
    routing_url: http://products.svc.cluster.local:4002/graphql
    schema:
      subgraph_url: http://products.svc.cluster.local:4002/graphql

A production-representative router.yaml enables the operational features each child guide expands on:

# router.yaml
supergraph:
  listen: 0.0.0.0:4000
  query_planning:
    cache:
      in_memory:
        limit: 512          # number of cached query plans held in memory
health_check:
  enabled: true
  listen: 0.0.0.0:8088
headers:
  all:
    request:
      - propagate:
          named: authorization   # forward client auth to subgraphs
traffic_shaping:
  all:
    timeout: 5s                 # fail a slow subgraph fetch rather than hang the request
    deduplicate_query: true
telemetry:
  instrumentation:
    spans:
      mode: spec_compliant
  exporters:
    tracing:
      otlp:
        enabled: true
        endpoint: http://otel-collector:4317

Compose, then launch the router against the composed artifact:

rover supergraph compose --config supergraph.yaml > supergraph.graphql
./router --config router.yaml --supergraph supergraph.graphql

In managed federation you omit the local --supergraph flag and instead point the router at Apollo’s Uplink with an APOLLO_KEY and APOLLO_GRAPH_REF; the router then polls Uplink and hot-reloads the new supergraph SDL after each publish. Both modes are detailed in Apollo Router Configuration and Deployment.
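In Kubernetes, managed mode usually comes down to two environment variables on the router container. The fragment below is a sketch rather than a complete Deployment: the Secret name apollo-credentials, its key, and the image tag placeholder are assumptions for illustration, and the config volume mount is omitted.

```yaml
# Fragment of a router Deployment pod spec (hypothetical names throughout)
containers:
  - name: router
    image: "ghcr.io/apollographql/router:<version>"  # pin a version you have tested
    env:
      - name: APOLLO_KEY
        valueFrom:
          secretKeyRef:
            name: apollo-credentials   # assumed Secret holding the graph API key
            key: apollo-key            # assumed key name inside that Secret
      - name: APOLLO_GRAPH_REF
        value: my-graph@prod           # same graph ref used in the rover examples
```

Keeping the key in a Secret rather than in router.yaml means the same config file can be reused across variants by changing only APOLLO_GRAPH_REF.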

This three-part pattern — supergraph.yaml, router.yaml, launch command — is the irreducible core of a production federated runtime, and almost every operational concern is a refinement of one of the three. Observability is a telemetry block in router.yaml. Caching is a query_planning.cache block plus optional response-cache config in the same file. Header propagation, timeouts, retries, rate limits, and CORS are all router.yaml sections. Schema versioning is the discipline that governs how supergraph.yaml and the published subgraph schemas change over time. Keep the mental model simple: you are operating one binary, configured by one file, serving one composed schema, fanning out to many services. The complexity is in the topology, not the runtime.

Note that each subgraph pairs a __resolveReference resolver with the entity’s @key. That is the contract the router relies on to stitch entities across services: when a plan needs to hydrate a Product that another subgraph referenced by id, it issues an _entities query to the owning subgraph, which resolves the reference. Production performance therefore depends as much on how cheaply each subgraph resolves references as on anything in router.yaml, a point the caching and observability guides return to repeatedly.

Cross-Section Integration Points

Production operations inherit constraints from the two design-time sections. The way you drew subgraph boundaries dictates your runtime fan-out: deep cross-service traversals defined in Designing Cross-Service Type References translate directly into query-plan depth and therefore into tail latency, which is why query-plan caching and observability matter most for graphs with many entity boundaries.

Entity resolution performance is the other side of the same coin. The reference resolvers you wrote following Optimizing Reference Resolvers for Performance are where most subgraph-side latency lives; the router’s _entities batching only helps if the subgraph itself batches its data fetches. Entity response caching at the router, covered under Caching Strategies for Federated GraphQL, is the runtime complement to those resolver optimisations.

Authorization spans both worlds too. Header propagation configured in the router is what carries the claims that subgraph directives depend on (see Directive Patterns for Cross-Service Authorization). A misconfigured headers block in router.yaml is a common cause of 401 errors that appear to originate in the subgraph.

Schema evolution is the final integration point, and it reaches back into the architecture section. The conflict-resolution rules you adopted following Resolving Schema Conflicts in Apollo Federation — when to use @shareable, @override, and @inaccessible — are precisely the tools you reach for when migrating a field’s ownership between subgraphs at runtime without breaking clients. The operations job is to sequence those changes safely across independently deploying services, which is why versioning sits in this section even though the directives are defined design-side. A breaking change that composes cleanly can still break production if it ships before clients are ready; the runtime’s job is to make sure it does not.

Common Failure Modes

Stale supergraph on rolling deploy. When a subgraph ships a schema change before the router has the recomposed supergraph, the router plans against the old schema and fails fetches for new fields. The fix is ordering: publish the subgraph schema and let recomposition complete (managed federation) or recompose and roll the router (local mode) before clients use the new fields. This is the core of Migrating and Versioning Federated Schemas.

Subgraph timeout cascade. When traffic_shaping.all.timeout is unset or too generous, one slow subgraph holds router connections open and every query that touches it stacks up, exhausting the router’s connection pool and degrading unrelated operations. Set a per-subgraph timeout below your client-facing SLA and treat the failed fetch as partial data.
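A per-subgraph override lets you keep one default deadline and tighten it where a slow dependency hurts most. The fragment below is a sketch; the 2s value is illustrative, and reviews is simply the subgraph name taken from the topology diagram.

```yaml
# router.yaml (fragment)
traffic_shaping:
  all:
    timeout: 5s          # default deadline for every subgraph fetch
  subgraphs:
    reviews:             # subgraph name as it appears in the supergraph
      timeout: 2s        # tighter deadline for a non-critical subgraph (illustrative)
```

Settings under subgraphs take precedence over all for the named subgraph, so the default stays in force everywhere else.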

Query plan cache thrash. A graph that receives a high volume of distinct operations (common with unbounded client-generated queries) can evict useful plans faster than it reuses them, sending planner CPU through the roof. Persisted queries and a sized plan cache address this — see Configuring Query Plan Caching in the Apollo Router.
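One way to bound operation cardinality is to accept only operations registered in a persisted query list. The fragment below is a sketch that assumes a recent router release connected to a graph with persisted query lists enabled, which may depend on your plan. Check the key names against your version's configuration reference before relying on them.

```yaml
# router.yaml (fragment, assumes persisted query lists are available for your graph)
persisted_queries:
  enabled: true
  safelist:
    enabled: true        # reject operations that are not in the published list
    require_id: true     # clients must send the operation ID, not the full document
apq:
  enabled: false         # safelisting and automatic persisted queries are not combined here
supergraph:
  query_planning:
    cache:
      in_memory:
        limit: 512       # sized to the number of distinct registered operations
```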

Lost trace context. If the router emits spans but subgraphs do not continue the trace, you get a router-only flame graph that cannot localise a regression to a specific service. The remedy is propagating traceparent headers and instrumenting subgraph servers, covered in Observability and Distributed Tracing in Federation.
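On the router side, continuing the trace means enabling a propagation format alongside the exporter. A minimal sketch follows, reusing the collector endpoint from the canonical router.yaml; subgraph servers still need their own instrumentation to read the header and create child spans.

```yaml
# router.yaml (fragment)
telemetry:
  exporters:
    tracing:
      propagation:
        trace_context: true   # send W3C traceparent/tracestate headers to subgraphs
      otlp:
        enabled: true
        endpoint: http://otel-collector:4317
```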

Header leakage across tenants. A propagate: { matching: ".*" } rule forwards every client header to every subgraph, which can leak one tenant’s headers into a shared cache key or downstream log. Propagate named headers explicitly rather than wildcarding.
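An explicit allowlist can be sketched as follows. The header names other than authorization are assumptions chosen for illustration (x-tenant-id in particular is hypothetical), and the per-subgraph rule shows how to scope a header to the one service that needs it.

```yaml
# router.yaml (fragment)
headers:
  all:
    request:
      - propagate:
          named: authorization     # forwarded to every subgraph
      - propagate:
          named: accept-language   # locale, forwarded to every subgraph
  subgraphs:
    accounts:
      request:
        - propagate:
            named: x-tenant-id     # hypothetical header, forwarded to accounts only
```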

Cold-pod planner CPU spike. When an autoscaler adds router pods, each new pod starts with an empty plan cache and re-plans every operation it sees, driving a CPU spike and a burst of slow first-requests right when you scaled up to handle load. A shared Redis plan cache and configured warm-up turn cold pods into warm ones; without them, scaling out can briefly make latency worse before it gets better.
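A shared plan cache with warm-up might be configured along these lines. This is a sketch under assumptions: the Redis URL is hypothetical, the distributed cache may be gated by your router licence, and warmed_up_queries pre-plans the most-used cached operations when the router reloads a schema, so it complements the shared cache rather than replacing it.

```yaml
# router.yaml (fragment)
supergraph:
  query_planning:
    warmed_up_queries: 100         # pre-plan this many hot operations on schema reload
    cache:
      in_memory:
        limit: 512                 # per-pod plan cache
      redis:
        urls: ["redis://redis.svc.cluster.local:6379"]  # hypothetical shared cache
```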

Unbounded query cost. Federation makes it easy for a client to request a deeply nested traversal that fans out across every subgraph. Without depth, alias, and complexity limits in router.yaml, a single expensive operation can saturate the graph. Set query limits as a guardrail and pair them with the cost-analysis patterns from the authorization guides.
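Structural limits can be sketched in router.yaml as below. The numbers are illustrative starting points rather than recommendations, and operation limits may be gated by your router licence. Cost or complexity scoring is configured separately and is not shown here.

```yaml
# router.yaml (fragment, values are illustrative)
limits:
  max_depth: 15        # deepest allowed field nesting
  max_height: 200      # total number of fields in an operation
  max_aliases: 30      # total aliased fields
  max_root_fields: 20  # fields selected at the operation root
```

Deriving the values from observed traffic, then setting limits a comfortable margin above the largest legitimate operation, avoids rejecting real clients on day one.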

CI/CD & Tooling Integration

The router runtime is downstream of a composition pipeline. Every subgraph change should pass rover subgraph check against the registered graph variant before merge; the check validates composition and, when connected to a graph with metrics, flags operations that the change would break based on real traffic.

# .github/workflows/subgraph-check.yml (excerpt)
- name: Check subgraph against the graph
  run: |
    rover subgraph check my-graph@prod \
      --name products \
      --schema ./products/schema.graphql
  env:
    APOLLO_KEY: ${{ secrets.APOLLO_KEY }}

On merge to main, publish the subgraph so the registry recomposes and (in managed mode) the router fleet picks up the new supergraph:

rover subgraph publish my-graph@prod \
  --name products \
  --schema ./products/schema.graphql \
  --routing-url http://products.svc.cluster.local:4002/graphql
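Wrapped in a workflow, the publish step might look like the sketch below. The file name and job layout are assumptions; the rover arguments are the ones shown above.

```yaml
# .github/workflows/subgraph-publish.yml (hypothetical file name)
name: Publish products subgraph
on:
  push:
    branches: [main]
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install rover
        run: |
          curl -sSL https://rover.apollo.dev/nix/latest | sh
          echo "$HOME/.rover/bin" >> "$GITHUB_PATH"
      - name: Publish subgraph schema
        run: |
          rover subgraph publish my-graph@prod \
            --name products \
            --schema ./products/schema.graphql \
            --routing-url http://products.svc.cluster.local:4002/graphql
        env:
          APOLLO_KEY: ${{ secrets.APOLLO_KEY }}
```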

This check-then-publish loop is what makes independent subgraph deployment safe; it is the operational backbone of every page in this section and is expanded in the architecture section’s Schema Validation in CI/CD Pipelines.

The check is more than a composition test. When the graph is connected to Apollo Studio and accumulating operation metrics, rover subgraph check evaluates the proposed change against real recent traffic and reports which live operations a field removal or type change would break, and how often each is called. That turns an abstract “is this a breaking change?” question into a concrete “this change breaks 412 calls/day to GetCheckout” answer, which is the data a reviewer needs to decide whether to proceed, deprecate first, or coordinate with a client team. Wiring this check as a required status on every subgraph pull request is the highest-leverage governance step a platform team can take; it moves breakage from production incidents to code review, where it is cheap to fix. Publishing then becomes a routine post-merge step rather than a coordinated event, because the check has already proven the change is safe to compose and serve.

Rolling Out a Production Federated Graph

A first production rollout follows a predictable order, and skipping steps is where most early incidents come from. Begin by connecting the graph to a registry and establishing the check-then-publish loop, even if you start in local mode — getting schema governance in place before traffic arrives is far cheaper than retrofitting it after a breaking change has already shipped. Next, stand up the router with conservative router.yaml defaults: explicit header propagation, per-subgraph timeouts below your SLA, health checks wired to probes, and query limits as a guardrail. Only then point real clients at it.
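Wiring health checks to probes can be sketched as a fragment of the router container spec. It assumes the health_check listener on port 8088 from the canonical router.yaml; the probe intervals are illustrative.

```yaml
# Fragment of the router container spec in a Kubernetes Deployment
livenessProbe:
  httpGet:
    path: /health?live     # restart the pod if the router process is unhealthy
    port: 8088
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health?ready    # keep the pod out of rotation until it can serve
    port: 8088
  periodSeconds: 5
```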

With traffic flowing, turn on observability before you turn on caching. You cannot size a cache or set a sensible timeout without knowing your operation cardinality, your hot paths, and your per-subgraph latency distribution, and all of that comes from traces and metrics. Once you can see the graph, enable the plan cache sized to the cardinality you observed, add persisted queries to bound that cardinality, and layer in entity/response caching for the subgraphs whose data is genuinely cacheable. Treat each cache as an optimisation you can prove worked by watching the hit ratio and the latency it was meant to improve, not as a setting you flip and forget.

Finally, rehearse a schema migration before you need one. Deprecate a field, watch the operation metrics confirm clients have moved off it, then remove it — and do this on a non-critical field first so the team learns the rhythm of safe evolution while the stakes are low. A graph whose team has practised migrations evolves fearlessly; one whose team has not tends to freeze its schema out of caution, which defeats the point of federation.

Decision Guide: Router Runtime Choices

| Decision | Choose A when… | Choose B when… |
| --- | --- | --- |
| Router runtime | Apollo Router (Rust) for any production traffic: lower latency, lower memory | @apollo/gateway (Node) only for legacy migrations or tight Node plugin coupling |
| Supergraph delivery | Managed federation (Uplink) for multi-team graphs needing zero-touch updates | Local --supergraph file for air-gapped or fully GitOps-controlled deploys |
| Plan cache | In-memory only for single-pod or low-cardinality query sets | In-memory + Redis for large fleets sharing a warm plan cache |
| Caching strategy | APQ + plan cache as the baseline everywhere | + entity/response cache when subgraphs serve cacheable, non-personalised data |

The router-versus-gateway choice is the most consequential and is treated in full in Apollo Router vs @apollo/gateway: Production Trade-offs.

Frequently Asked Questions

Do I need managed federation to run the Apollo Router in production?

No. The router runs equally well against a local supergraph SDL file passed with --supergraph, which suits GitOps and air-gapped environments. Managed federation adds zero-touch supergraph updates and metrics-aware schema checks, which most multi-team organisations find worth the dependency, but it is optional.

Where does query latency actually come from in a federated graph?

The client sees the critical path through the query plan, not the sum of every subgraph call, because the planner parallelises independent fetches. Latency is therefore dominated by the slowest serial chain of entity resolutions plus router planning time. Distributed tracing that attributes time to plan nodes is the only reliable way to find it.

How do subgraph deploys avoid breaking the running router?

Run rover subgraph check in CI to reject breaking changes, then publish on merge. In managed federation the registry recomposes and the router picks up the new supergraph SDL from Uplink without a restart. The ordering rule is that schema additions can ship ahead of clients, but field removals must follow a deprecation window so no in-flight client breaks.

Is the Apollo Router stateless enough to autoscale?

Effectively yes. Its only state is the loaded supergraph SDL (sourced from the registry or a mounted file) and its caches (rebuildable). Router pods scale horizontally behind a load balancer; a shared Redis plan cache lets new pods start warm rather than cold.
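Horizontal scaling can be expressed with a standard Kubernetes HorizontalPodAutoscaler. The sketch below assumes a Deployment named router; the replica bounds and CPU target are illustrative and should come from your own load tests.

```yaml
# HorizontalPodAutoscaler for the router (hypothetical names and thresholds)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: router
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: router
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale out before planner CPU saturates
```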

Should authorization run in the router or the subgraphs?

Coarse-grained checks (authentication, tenant isolation) belong at the router via a coprocessor or header validation; fine-grained, data-dependent checks belong in subgraphs. The router’s job in either case is to propagate the validated claims downstream — see the authorization directive patterns in the subgraph section.

What is the minimum observability needed before going live?

At minimum: router-level request rate, error rate, and p50/p95/p99 latency; per-subgraph fetch latency and error rate; and trace propagation so a slow request can be drilled into. Operation-level metrics in Apollo Studio close the loop by tying latency back to specific client operations.
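For the router-level numbers, a Prometheus scrape endpoint is one common starting point. A minimal sketch follows; the listen address and port are assumptions, and an OTLP metrics exporter would serve equally well if you already run a collector.

```yaml
# router.yaml (fragment)
telemetry:
  exporters:
    metrics:
      prometheus:
        enabled: true
        listen: 0.0.0.0:9090   # assumed port, kept separate from the GraphQL endpoint
        path: /metrics
```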

How big should the query plan cache be?

Size it to your distinct-operation cardinality, not your request volume. If clients send a bounded set of operations (ideally via persisted queries), a few hundred cached plans covers nearly all traffic. Unbounded ad-hoc queries need either a larger cache or, better, persisted query enforcement.