Schema Registry and Managed Federation

Managed federation moves the supergraph out of your build artifacts and into a hosted control plane: subgraphs publish their SDL to a schema registry, the registry composes and validates the supergraph, and the Apollo Router fetches the composed schema at runtime from Uplink rather than reading a static file. This guide covers the full managed-federation loop end to end — the registry, graph variants, Uplink, rover subgraph publish, schema checks, proposals, and launches — as the operational backbone of your GraphQL Federation Architecture & Design.

The core shift is decoupling. In unmanaged federation you run rover supergraph compose in CI, bake the resulting supergraph SDL into a file, and ship it alongside the router. Every schema change requires a router redeploy. Managed federation inverts this: the router holds no schema at all on disk. It polls Apollo Uplink, receives the latest validated supergraph, and hot-swaps it in place — so a subgraph team can publish a new schema and have it live across the fleet within seconds, with no router deployment and no coordinated release train.

Prerequisites

An Apollo Studio (GraphOS) organization with a graph created and a graph ref in graph-id@variant form.
Rover installed, for example via the official install script (curl -sSL https://rover.apollo.dev/nix/latest | sh); it is also published on npm as @apollo/rover.
A graph API key exported as APOLLO_KEY, with a role that allows publishing schemas.
Each subgraph reachable at a stable routing URL the router can call at query time.
Federation v2 subgraph SDL using extend schema @link(url: "https://specs.apollo.dev/federation/v2.9", import: [...]).
An Apollo Router (or Gateway) configured to read its supergraph from Uplink, not from a local file.
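Every Rover command and every router instance is addressed by a graph ref, so a CI script may want to validate one before calling Rover. The helper below is a hypothetical sketch (`GraphRef` and `parse_graph_ref` are not part of Rover). Its fallback to a `current` variant when no `@variant` is given mirrors Rover's default variant name, which is worth confirming against the Rover version you run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphRef:
    """A parsed graph ref (hypothetical helper, not part of Rover)."""
    graph_id: str
    variant: str

    def __str__(self) -> str:
        return f"{self.graph_id}@{self.variant}"


def parse_graph_ref(ref: str, default_variant: str = "current") -> GraphRef:
    """Split 'graph-id@variant'; a ref without '@' falls back to default_variant."""
    graph_id, sep, variant = ref.strip().partition("@")
    if not graph_id:
        raise ValueError(f"graph ref {ref!r} is missing a graph id")
    if sep and not variant:
        raise ValueError(f"graph ref {ref!r} has an empty variant after '@'")
    if "@" in variant:
        raise ValueError(f"graph ref {ref!r} contains more than one '@'")
    return GraphRef(graph_id, variant if sep else default_variant)


if __name__ == "__main__":
    print(parse_graph_ref("my-graph@production"))  # my-graph@production
    print(parse_graph_ref("my-graph"))             # my-graph@current
```

Failing fast on a malformed ref in CI gives a clearer error than a registry round trip that reports a missing variant.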

Core Concepts Overview

Managed federation is a small set of components that hand work to each other in a fixed order. Understanding each one in isolation makes the whole loop predictable.

The schema registry is the source of truth. It stores every subgraph’s published SDL per variant, the composed supergraph for each variant, the full history of launches, and the operation/usage data reported by routers. When you ask “what schema is production actually serving?”, the answer lives in the registry, not in any repo.

Graph variants are independent environments of the same graph — my-graph@staging, my-graph@production, my-graph@dev. Each variant has its own set of published subgraph schemas, its own composed supergraph, its own checks configuration, and its own usage metrics. A subgraph published to @staging never affects @production. Variants are how you promote a schema through environments and how you keep checks honest (a staging check should diff against the staging baseline, never production).

Uplink is the delivery channel. It is the endpoint the router polls to fetch the current supergraph SDL and its runtime configuration. When the registry composes a new supergraph for a variant, Uplink begins serving that artifact; routers configured for that variant pick it up on their next poll. Uplink is what makes schema rollout independent of binary deployment.

Rover is the CLI that talks to the registry. rover subgraph publish pushes a subgraph’s SDL into a variant; rover subgraph check validates a proposed change against a variant before publishing. Detailed publishing mechanics live in Publishing Subgraph Schemas with the Rover CLI.

Schema checks run composition and breaking-change analysis inside the registry, scored against recorded production traffic. They are the gate that stops a breaking publish. See Apollo Studio Schema Checks for Managed Federation for the full configuration surface.
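To make "scored against recorded traffic" concrete, here is a simplified model of the idea: usage reports are summed per field over a time window, and a removed field only blocks the publish if that window shows traffic for it. Everything here (`UsageReport`, `field_usage`, `check_field_removals`, fields as `Type.field` strings, a flat request threshold) is a hypothetical illustration. The real GraphOS check also analyzes type and argument changes and is configured in Studio.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class UsageReport:
    """Hypothetical, simplified usage report: which fields one operation touched."""
    timestamp: datetime
    fields_used: tuple[str, ...]   # "Type.field" coordinates
    request_count: int


def field_usage(reports: list[UsageReport], now: datetime, window: timedelta) -> Counter:
    """Sum request counts per field over the trailing time window."""
    cutoff = now - window
    totals: Counter = Counter()
    for report in reports:
        if report.timestamp >= cutoff:
            for coordinate in report.fields_used:
                totals[coordinate] += report.request_count
    return totals


def check_field_removals(
    baseline: set[str], proposed: set[str], usage: Counter, min_requests: int = 1
) -> tuple[list[str], list[str]]:
    """Return (removed, blocking): every removed field, and those with real traffic."""
    removed = sorted(baseline - proposed)
    # Counter returns 0 for fields never seen in the window, so they never block.
    blocking = [f for f in removed if usage[f] >= min_requests]
    return removed, blocking


if __name__ == "__main__":
    now = datetime(2024, 1, 8)
    reports = [
        UsageReport(datetime(2024, 1, 7), ("Product.id", "Product.name"), 100),
        UsageReport(datetime(2023, 12, 1), ("Product.legacySku",), 7),  # outside window
    ]
    usage = field_usage(reports, now, timedelta(days=7))
    baseline = {"Product.id", "Product.name", "Product.legacySku"}
    removed, blocking = check_field_removals(baseline, {"Product.id", "Product.name"}, usage)
    print(removed, blocking)  # ['Product.legacySku'] []
```

The design point is the split between "breaking in principle" and "breaking for someone": only the second fails the check.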

Schema proposals are a governance layer on top of checks — a review workflow for schema changes before they are implemented, covered in Schema Proposals and Approval Workflows.

Architecture Diagram

The diagram below traces the two flows that define managed federation: the publish flow (left to right, subgraphs into the registry) and the fetch flow (registry out to the router via Uplink). These flows are asynchronous and decoupled — that decoupling is the entire point.

Walking the diagram: each subgraph team runs rover subgraph publish against a variant (solid purple). The registry recomposes the supergraph and, if composition succeeds, hands the new artifact to Uplink (blue). The router polls Uplink on a fixed interval, fetches the validated supergraph, and hot-reloads it. At query time the router resolves the query plan and fetches data directly from subgraphs over their routing URLs (dashed). Clients only ever talk to the router. The registry is never on the request path — it is a control plane, not a data plane.

The Registry, Variants, and Uplink in Depth

A graph in the registry is a logical container; variants are the units you actually operate against. A typical setup runs @dev for local integration, @staging for pre-production verification, and @production for live traffic. Each variant accumulates its own subgraph schemas independently, which is what lets you publish a risky change to @staging, run checks and synthetic traffic against it, then promote the identical SDL to @production only once it is proven.
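The isolation and promotion behavior can be modeled as a map from variant to subgraph SDL. `InMemoryRegistry` is a hypothetical toy, not an Apollo API; in practice, promotion means running `rover subgraph publish` against the target variant with the same schema file that was proven in staging.

```python
from collections import defaultdict


class InMemoryRegistry:
    """Toy model of per-variant subgraph storage (hypothetical, not an Apollo API)."""

    def __init__(self) -> None:
        # variant name -> {subgraph name -> SDL}
        self._variants: dict[str, dict[str, str]] = defaultdict(dict)

    def publish(self, variant: str, subgraph: str, sdl: str) -> None:
        # Only the named variant changes; the first publish creates the variant.
        self._variants[variant][subgraph] = sdl

    def subgraphs(self, variant: str) -> dict[str, str]:
        # .get() does not create the variant, so reads have no side effects.
        return dict(self._variants.get(variant, {}))

    def promote(self, subgraph: str, source: str, target: str) -> None:
        """Copy the exact SDL proven in `source` into `target`."""
        sdl = self._variants.get(source, {}).get(subgraph)
        if sdl is None:
            raise KeyError(f"{subgraph!r} has never been published to {source!r}")
        self.publish(target, subgraph, sdl)
```

Promoting by copying the stored SDL, rather than rebuilding it, is what guarantees production receives byte-for-byte the schema that staging validated.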

Uplink serves the router its supergraph schema (the composed SDL the router uses to build query plans), along with other GraphOS-managed artifacts such as the router's license and, if you use them, persisted query manifests. The router authenticates to Uplink with its own APOLLO_KEY and is pinned to a single APOLLO_GRAPH_REF. This pinning is critical: a router configured for @production will only ever fetch the production supergraph, so a staging publish can never leak into production traffic, even by accident.

Configure the router for managed federation by giving it the graph ref and key, and crucially not giving it a --supergraph file:

# router.yaml: managed federation, so the supergraph comes from Uplink, not disk
supergraph:
  listen: 0.0.0.0:4000
  # `path` is the HTTP path the GraphQL endpoint is served on (default "/"), not a schema file.
  path: /

# The router needs only the graph ref and a key: no schema artifact, no --supergraph flag.
# It ships with default Uplink endpoints and fails over between them on error.
export APOLLO_KEY="service:my-graph:xxxxxxxxxxxx"
export APOLLO_GRAPH_REF="my-graph@production"
# Uplink poll cadence (default 10s). Lower = faster rollout, more Uplink calls.
export APOLLO_UPLINK_POLL_INTERVAL=10s
./router --config router.yaml

If Uplink is briefly unreachable, the router keeps serving the last supergraph it successfully fetched — schema delivery degrades gracefully and never takes down live traffic.
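The polling and graceful-degradation behavior can be sketched as follows, assuming a fetch function that either returns the latest supergraph for a graph ref or raises on failure. `Supergraph`, `UplinkError`, `FakeUplink`, and `ManagedRouter` are hypothetical stand-ins; this illustrates the contract described above, not the router's internal implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Supergraph:
    launch_id: str
    sdl: str


class UplinkError(Exception):
    """Raised when the (hypothetical) Uplink fetch fails."""


class FakeUplink:
    """In-memory stand-in for Uplink: the latest supergraph per graph ref."""

    def __init__(self) -> None:
        self.latest: dict[str, Supergraph] = {}
        self.down = False

    def fetch(self, graph_ref: str) -> Supergraph:
        if self.down:
            raise UplinkError("Uplink unreachable")
        if graph_ref not in self.latest:
            raise UplinkError(f"no supergraph for {graph_ref}")
        return self.latest[graph_ref]


class ManagedRouter:
    """Toy router pinned to one graph ref that polls Uplink."""

    def __init__(self, graph_ref: str, fetch: Callable[[str], Supergraph]) -> None:
        self.graph_ref = graph_ref   # pinned: the router only ever asks for this ref
        self._fetch = fetch
        self.active: Optional[Supergraph] = None

    def poll_once(self) -> bool:
        """One poll tick. Returns True if a new supergraph was swapped in."""
        try:
            latest = self._fetch(self.graph_ref)
        except UplinkError:
            return False             # degrade gracefully: keep the last good supergraph
        if self.active is None or latest.launch_id != self.active.launch_id:
            # Swap by rebinding one reference; work already holding the old
            # object keeps using it until it finishes.
            self.active = latest
            return True
        return False
```

Because the router asks only for its own graph ref, a supergraph published to `my-graph@staging` can sit in `FakeUplink` without ever reaching a router pinned to `my-graph@production`.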

Config & Command Spec Table

Element	Where it lives	Composition-time vs runtime	Notes
`APOLLO_GRAPH_REF`	env var (`graph-id@variant`)	both	Pins the router to one variant; Rover commands take the same `graph-id@variant` value as their first argument.
`APOLLO_KEY`	env var (secret)	both	A graph API key; Rover's key needs a role that allows publishing, and the router authenticates to Uplink with its own key.
`rover subgraph publish`	CI / shell	composition-time	Triggers recomposition + a launch in the registry.
`rover subgraph check`	CI / shell	composition-time	Validates composition + usage impact before publish.
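`--supergraph` / `APOLLO_ROUTER_SUPERGRAPH_PATH`	router CLI flag / env var	runtime	Omit for managed federation; set only for unmanaged/local. (`supergraph.path` in `router.yaml` is the HTTP endpoint path, not a schema file.)
`APOLLO_UPLINK_POLL_INTERVAL` (`--apollo-uplink-poll-interval`)	router env var / CLI flag	runtime	Controls rollout latency vs Uplink request volume (default 10s).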
`--routing-url`	publish flag	composition-time	Stored in the supergraph; the router calls this URL at query time.

Step-by-Step: The Managed Federation Loop

Step 1 — Publish each subgraph to a variant. Each subgraph registers its SDL and routing URL. This is the only step that mutates the registry.

rover subgraph publish my-graph@production \
  --name products \
  --schema ./products/schema.graphql \
  --routing-url https://products.internal.svc:4001/graphql

Step 2 — The registry recomposes. On publish, the registry merges all current subgraph schemas for that variant into a new supergraph and validates it. If composition fails, no new supergraph is produced and the previous supergraph stays live, so a broken artifact is never served.

Step 3 — A launch is created. Every successful composition creates a launch: a timestamped, attributable record of which subgraph publish produced which supergraph, with full build logs. Launches are how you audit and roll back.

Step 4 — Uplink serves the new supergraph. Once the launch completes, Uplink begins offering the new artifact for that variant.

Step 5 — Routers poll and hot-reload. Every router pinned to the variant fetches the new supergraph on its next poll and swaps it in atomically, with zero downtime and no redeploy.

Step 6 — Routers report usage. Routers stream operation and field-usage data back to the registry, which feeds the breaking-change analysis used by future checks. The loop closes: today’s traffic informs tomorrow’s gate.
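Steps 2 through 4 hinge on one rule: a new supergraph reaches Uplink only if composition succeeds. The sketch below models that with a single simplified Federation 2 rule (a field resolved by more than one subgraph must be `@shareable` in each of them), which the real composer reports as `INVALID_FIELD_SHARING`. `SubgraphModel`, `composition_errors`, and `recompose` are hypothetical names, and the model deliberately ignores `@key`, `@external`, `@override`, and every other composition rule.

```python
from dataclasses import dataclass, field


@dataclass
class SubgraphModel:
    """Hypothetical flattened subgraph: the object fields it resolves."""
    name: str
    fields: set[str]                                   # "Type.field" coordinates
    shareable: set[str] = field(default_factory=set)   # fields marked @shareable


def composition_errors(subgraphs: list[SubgraphModel]) -> list[str]:
    """Simplified rule: a field resolved by 2+ subgraphs must be @shareable in each."""
    resolvers: dict[str, list[SubgraphModel]] = {}
    for sg in subgraphs:
        for coordinate in sg.fields:
            resolvers.setdefault(coordinate, []).append(sg)
    errors = []
    for coordinate, sgs in sorted(resolvers.items()):
        if len(sgs) > 1 and not all(coordinate in sg.shareable for sg in sgs):
            names = ", ".join(sorted(sg.name for sg in sgs))
            errors.append(f"{coordinate} is resolved by {names} but is not @shareable in all of them")
    return errors


def recompose(previous: str, subgraphs: list[SubgraphModel]) -> tuple[str, list[str]]:
    """Return (supergraph to serve, errors). On failure the previous supergraph stays live."""
    errors = composition_errors(subgraphs)
    if errors:
        return previous, errors
    # Stand-in artifact: the sorted union of all resolvable fields.
    return "\n".join(sorted({c for sg in subgraphs for c in sg.fields})), []
```

Returning the previous artifact on failure, rather than nothing, is the property that keeps routers serving traffic while a team fixes its schema.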

Composition Pipeline Integration

In a managed setup, CI runs checks on pull requests and publishes on merge. The check is the gate; the publish is the release. A minimal GitHub Actions flow:

name: Federation Managed Pipeline
on:
  pull_request:
    paths: ['subgraphs/products/**']
  push:
    branches: [main]
    paths: ['subgraphs/products/**']

jobs:
  check:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: curl -sSL https://rover.apollo.dev/nix/latest | sh
      - run: echo "$HOME/.rover/bin" >> $GITHUB_PATH
      - name: Check against production baseline
        run: |
          rover subgraph check my-graph@production \
            --name products \
            --schema ./subgraphs/products/schema.graphql
        env:
          APOLLO_KEY: ${{ secrets.APOLLO_KEY }}

  publish:
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: curl -sSL https://rover.apollo.dev/nix/latest | sh
      - run: echo "$HOME/.rover/bin" >> $GITHUB_PATH
      - name: Publish on merge (triggers recompose + launch)
        run: |
          rover subgraph publish my-graph@production \
            --name products \
            --schema ./subgraphs/products/schema.graphql \
            --routing-url https://products.internal.svc:4001/graphql
        env:
          APOLLO_KEY: ${{ secrets.APOLLO_KEY }}

This mirrors the patterns in Schema Validation in CI/CD Pipelines, but with the publish step wired to a real registry rather than only running local composition. For the full publish surface and rollback procedure, see Publishing Subgraph Schemas with the Rover CLI.

Schema Proposals & Approval

Checks tell you whether a change is safe to compose; they do not tell you whether it is the right change for a shared type. That is the job of schema proposals — a review workflow where a draft schema change is opened, routed to the owning team for approval, and only then implemented via publish. Proposals are essential when several teams contribute to one entity, because they give the type owner a veto before the change ever reaches composition. The full lifecycle, reviewer assignment, and how to enforce that production changes are proposal-backed are covered in Schema Proposals and Approval Workflows. Pair this with clear Type Ownership and Shared Schema Contracts so every shared type has a named owner who reviews proposals against it.
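The approval gate can be sketched as a set comparison: collect the owning team of every type a proposal touches and require an approval from each before publishing. `Proposal`, `required_approvers`, and `ready_to_publish` are hypothetical and do not reflect the GraphOS proposals data model; the sketch fails closed when a changed type has no recorded owner.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """Hypothetical proposal record (not the GraphOS data model)."""
    title: str
    changed_types: set[str]
    approvals: set[str] = field(default_factory=set)   # teams that have approved


def required_approvers(proposal: Proposal, type_owners: dict[str, str]) -> set[str]:
    """Owning team of every changed type; raises if any type has no owner."""
    unowned = proposal.changed_types - set(type_owners)
    if unowned:
        # Fail closed: a shared type without a named owner is a governance gap.
        raise ValueError(f"no owner recorded for: {', '.join(sorted(unowned))}")
    return {type_owners[t] for t in proposal.changed_types}


def ready_to_publish(proposal: Proposal, type_owners: dict[str, str]) -> bool:
    """True once every owning team of a changed type has approved."""
    return required_approvers(proposal, type_owners) <= proposal.approvals
```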

Launches & Rollout

A launch is the atomic unit of change in managed federation. Each launch bundles the triggering publish, the composition build, the resulting supergraph, and the rollout to Uplink. Because every launch is recorded with its supergraph artifact, rollback is simply re-promoting a prior known-good schema. The fastest rollback is to re-publish the last good subgraph SDL, which creates a fresh launch with the old shape; the registry recomposes and Uplink serves it within a poll interval.
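A launch history that supports this kind of rollback can be modeled as an append-only log: rolling back never deletes the bad launch, it records a new one carrying the last good SDL. `Launch` and `LaunchLog` are hypothetical single-variant toys, not the GraphOS launch API.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Launch:
    launch_id: int
    subgraph: str
    sdl: str          # the SDL published in this launch
    succeeded: bool


class LaunchLog:
    """Toy append-only launch history for one variant (hypothetical)."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.launches: list[Launch] = []

    def record(self, subgraph: str, sdl: str, succeeded: bool = True) -> Launch:
        launch = Launch(next(self._ids), subgraph, sdl, succeeded)
        self.launches.append(launch)
        return launch

    def rollback(self, subgraph: str, bad_launch_id: int) -> Launch:
        """Re-publish the newest successful SDL from before the bad launch as a fresh launch."""
        for launch in reversed(self.launches):
            if (launch.launch_id < bad_launch_id
                    and launch.subgraph == subgraph
                    and launch.succeeded):
                return self.record(subgraph, launch.sdl)
        raise LookupError(f"no known-good launch of {subgraph!r} before {bad_launch_id}")
```

Rolling forward with the old SDL keeps the audit trail intact: the bad launch and its reversal both stay visible in the history.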

Rollout latency is governed largely by the Uplink poll interval. With the default 10-second interval, a publish reaches the entire router fleet within about 10 seconds of its launch completing, without touching a single deployment pipeline. This is the operational superpower of managed federation, and it is also why schema discipline matters: a bad publish propagates just as fast as a good one, which is exactly why the check gate and proposal workflow exist upstream of publish. For longer-lived schema evolution across versions, coordinate launches with Migrating and Versioning Federated Schemas.
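As a worked example of the latency arithmetic, assume each launch completes at a random point in a router's poll cycle, so the wait for the next poll is uniform between zero and one poll interval. Total latency is then the composition and launch time plus that wait. The function name and the 5-second composition time in the usage note are illustrative assumptions, not measured figures.

```python
def rollout_latency_seconds(compose_s: float, poll_interval_s: float) -> tuple[float, float]:
    """Return (expected, worst_case) publish-to-fleet latency in seconds.

    Assumes the wait for the next poll is uniform on [0, poll_interval_s),
    so its mean is half an interval and its maximum is a full interval.
    """
    if compose_s < 0 or poll_interval_s <= 0:
        raise ValueError("compose time must be >= 0 and poll interval > 0")
    expected = compose_s + poll_interval_s / 2
    worst = compose_s + poll_interval_s
    return expected, worst
```

For example, `rollout_latency_seconds(5, 10)` gives `(10.0, 15)`: about 10 seconds expected and 15 seconds worst case. Halving the poll interval trims only the wait term, which is why lowering it has diminishing returns once composition time dominates.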

Failure Modes & Debugging

Encountered X build errors while trying to build the supergraph. The subgraph SDL was valid on its own, but the supergraph failed to compose against the other subgraphs (for example, a field resolved by two subgraphs without @shareable). The previous supergraph stays live. Reproduce locally with rover supergraph compose and fix before re-publishing.

Router serving a stale schema. The router did not pick up a launch. Confirm the router’s APOLLO_GRAPH_REF matches the variant you published to, check the router logs for Uplink fetch errors, and verify that the Uplink poll interval (APOLLO_UPLINK_POLL_INTERVAL) has not been set unusually high. A router pinned to @staging will never see a @production publish.

error[E029]: Encountered an error while running checks: This graph variant does not exist. The graph ref passed to Rover is wrong, or the variant has never received a publish. Variants are created on first publish; check the spelling of both the graph id and the variant.
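The stale-router checklist above can be folded into a small triage helper. `diagnose_stale_router` and its 60-second "unusually high" threshold are hypothetical choices for illustration; feed it the graph ref from the router's environment, the ref you published to, the configured poll interval, and the time since the launch completed.

```python
def diagnose_stale_router(
    router_graph_ref: str,
    published_graph_ref: str,
    poll_interval_s: float,
    seconds_since_launch: float,
    max_reasonable_poll_s: float = 60.0,   # hypothetical threshold
) -> list[str]:
    """Return likely causes of a stale router, most decisive first."""
    causes: list[str] = []
    if router_graph_ref.strip() != published_graph_ref.strip():
        causes.append(
            f"router is pinned to {router_graph_ref!r} but the publish went to "
            f"{published_graph_ref!r}"
        )
        return causes  # a ref mismatch explains the symptom on its own
    if poll_interval_s > max_reasonable_poll_s:
        causes.append(f"poll interval of {poll_interval_s}s is unusually high")
    if seconds_since_launch > poll_interval_s:
        causes.append("launch is older than one poll interval: check router logs for Uplink fetch errors")
    return causes
```

An empty result means the router simply has not reached its next poll yet.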

Frequently Asked Questions

Does the router need the supergraph SDL on disk in managed federation?

No. In managed federation the router fetches the composed supergraph from Apollo Uplink at runtime and hot-reloads it. You omit the --supergraph flag (and APOLLO_ROUTER_SUPERGRAPH_PATH) and provide only APOLLO_KEY and APOLLO_GRAPH_REF. A static supergraph file is only used for unmanaged or local development.

What is the difference between a graph variant and a separate graph?

A variant is an isolated environment of the same graph — its own published subgraphs, composed supergraph, checks config, and metrics — typically @dev, @staging, @production. Using variants lets you promote the identical SDL through environments and run checks against the correct baseline. Separate graphs are fully unrelated and share nothing.

What exactly happens when I run rover subgraph publish?

The registry stores the new subgraph SDL for that variant, recomposes the supergraph from all current subgraphs, and validates it. If composition succeeds, it creates a launch and serves the new supergraph through Uplink, and routers pick it up on their next poll. If composition fails, no new supergraph is produced and the live supergraph is unchanged.

How fast does a published schema reach production?

Rollout latency is roughly the registry’s composition and launch time plus up to one Uplink poll interval (10 seconds by default). No router redeploy is involved, so a schema change can be live fleet-wide in seconds.

How do I roll back a bad schema in managed federation?

Re-publish the last known-good subgraph SDL. This creates a new launch with the prior shape; the registry recomposes and Uplink serves it within one poll interval. Because every launch records its supergraph artifact, the launch history is your rollback log.
