Migrating and Versioning Federated Schemas
A live supergraph is never finished. Subgraph teams add fields, rename concepts, move ownership of an entity from one service to another, and occasionally need to remove something that clients still query. Doing this without breaking in-flight operations is the central operational discipline of running federation in production. This guide is part of Federated GraphQL Operations in Production and focuses on safe schema evolution: distinguishing additive from breaking changes, deprecating fields gracefully, exposing curated slices of the graph through contract variants, transferring field ownership incrementally with @override, rolling subgraphs out blue/green, and coordinating versioning across teams that deploy independently.
The hard part of schema evolution in a federated graph is that there is no single deploy. Each subgraph ships on its own cadence, the router composes whatever is currently published, and clients — web, mobile, partner integrations — query the supergraph on their own schedule, often months behind. A change that looks trivial inside one service can break a mobile build that is still in app-store review. Treating the published schema as a contract, and the registry as the system of record for that contract, is what makes independent deployment safe.
Additive vs Breaking Changes
Every schema change falls into one of two buckets, and the router enforces a third constraint on top: composition must still succeed.
Additive (safe) changes never invalidate an existing valid operation. Adding a new field to a type, adding a new type, adding an optional (nullable) argument, adding a new value to an enum that is used only in input positions, or adding a new optional input field are all backward compatible. A client that never asked for the new field is unaffected.
Breaking changes invalidate operations that were valid before. Removing a field or type, renaming a field, changing a field’s type (String → Int), tightening nullability on an input (String → String!), adding a required argument, or removing an enum value will break someone. The subtle cases bite hardest: making an output field nullable (String! → String) is breaking for clients that fed it into a non-null position, and adding a value to an enum that appears in output positions can break clients with exhaustive switch logic that do not handle the unknown case.
In federation there is a third category: composition-breaking changes that are fine in isolation but fail when the supergraph is recomposed. The classic example is two subgraphs disagreeing on a shared field’s type or @shareable status. These surface at rover subgraph check / rover supergraph compose, not at runtime. Treat a composition failure as a breaking change for the team that introduced it.
A few cases trip up engineers most often. Changing a field’s arguments is asymmetric: adding an optional argument is additive, but adding a required one (or making an existing optional argument required) breaks every operation that omitted it. Default values matter too — adding a default to a previously required argument is additive, since old operations that supplied the value still work and new ones may omit it. Interface and union evolution is another trap: adding a member to a union is additive for clients that already handle the default case in their inline fragments, but removing a member, or removing a type from the set of implementers of an interface, breaks any operation that selected fields specific to that member. The safest mental model is to ask “could a single previously valid, previously executing operation now fail or return a differently typed value?” If yes, it is breaking, regardless of how small the SDL diff looks.
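The argument rules above can be written down as a small decision function. The sketch below is illustrative only: the Arg record and classify_argument_change are hypothetical names, not part of rover or any Apollo library, and the function looks only at whether an argument is present and required. It does not cover type changes or changes to a default's value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Arg:
    """Hypothetical summary of one field argument (illustration only)."""
    non_null: bool      # declared with a trailing "!"
    has_default: bool   # declared with a default value

    @property
    def required(self) -> bool:
        # A caller must supply the argument only if it is non-null
        # and the schema provides no default.
        return self.non_null and not self.has_default


def classify_argument_change(old: Optional[Arg], new: Optional[Arg]) -> str:
    """Return "additive" or "breaking" for a change to one argument.

    None means the argument is absent on that side of the diff.
    """
    if old is None and new is None:
        raise ValueError("no argument on either side of the diff")
    if old is None:
        # New argument: safe only if existing operations may keep omitting it.
        return "breaking" if new.required else "additive"
    if new is None:
        # Removed argument: operations that still pass it fail validation.
        return "breaking"
    if new.required and not old.required:
        # Optional -> required breaks every operation that omitted it.
        return "breaking"
    return "additive"
```

For example, classify_argument_change(Arg(True, False), Arg(True, True)) models adding a default to a previously required argument and returns "additive".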
The discipline is simple: additive changes flow continuously; breaking changes go through deprecation. The registry’s check command classifies changes automatically against real traffic, which is the only reliable way to know whether a “breaking” change actually affects any live operation.
# Classify a proposed change against the last 7 days of real operations.
# Exits non-zero on a breaking change that affects observed traffic.
rover subgraph check my-graph@prod \
  --name products \
  --schema ./products.graphql
Deprecating Fields with @deprecated
@deprecated is the primary tool for retiring a field without an immediate break. It is a built-in GraphQL directive — no @link import needed — and it composes into the supergraph so the deprecation reason reaches every client’s introspection and IDE tooling.
type Product @key(fields: "id") {
  id: ID!
  # Old field kept alive while clients migrate.
  price: Float! @deprecated(reason: "Use priceV2, which carries currency. Removal after 2026-09-01.")
  priceV2: Money!
}

type Money {
  amount: Int! # minor units, avoids float rounding
  currencyCode: String!
}
The lifecycle is: ship the replacement field additively, mark the old field @deprecated with a reason and a target removal date, watch operation metrics until usage of the deprecated field drops to zero (or to a known set of legacy clients you can chase), then remove it. The registry tells you exactly which operations and which clients still touch the field, so removal becomes a data-driven decision rather than a hopeful one. A deprecation reason that names both the replacement and a date does far more to move teams than a bare @deprecated.
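The "watch usage, then remove" step of this lifecycle can be reduced to a simple gate. The sketch below assumes you can export per-client operation counts for the deprecated field from your registry's metrics; the data shape, the client names, and the function clients_blocking_removal are all hypothetical.

```python
from typing import Dict, Iterable, List


def clients_blocking_removal(
    usage_by_client: Dict[str, int],
    tolerated_legacy_clients: Iterable[str] = (),
) -> List[str]:
    """Return clients that still query the deprecated field and must migrate.

    usage_by_client maps a client name to the number of operations in the
    observation window that selected the deprecated field (assumed shape).
    tolerated_legacy_clients is the known set you have decided to chase
    separately rather than wait for.
    """
    tolerated = set(tolerated_legacy_clients)
    return sorted(
        client
        for client, count in usage_by_client.items()
        if count > 0 and client not in tolerated
    )


# Hypothetical usage numbers for Product.price over the observation window.
usage = {"web": 0, "ios-4.2": 1830, "partner-acme": 12}
blockers = clients_blocking_removal(usage, tolerated_legacy_clients=["ios-4.2"])
print(blockers)  # ['partner-acme']
```

Removal proceeds only when the returned list is empty, which keeps the decision tied to observed traffic rather than to the date in the deprecation reason alone.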
Curating the Graph with @tag and Contract Variants
Not every client should see every field. A public partner API and an internal admin console often want different slices of the same supergraph. Schema contracts let you derive a filtered variant — a contract variant — from a source variant by including or excluding fields based on @tag directives, without forking schemas or running separate graphs.
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.9", import: ["@key", "@tag"])

type Product @key(fields: "id") {
  id: ID!
  name: String! @tag(name: "public")
  margin: Float! @tag(name: "internal") # excluded from the public contract
}
You define @tag labels in subgraph SDL, then configure a contract in the registry that, for example, includes only public-tagged fields. The registry composes a separate contract supergraph and hands it to a dedicated router instance. Clients of that router physically cannot query margin — it is absent from their schema, not merely access-controlled. Contracts are powerful for partner programs, free-vs-paid tiers, and keeping internal operational fields out of an externally documented API. They pair naturally with the ownership rules covered in Type Ownership and Shared Schema Contracts.
A key subtlety: removing a @tag that a contract depends on, or excluding a field that the contract’s clients still query, is a breaking change for that contract’s clients even if the source variant is unaffected. Run checks against every contract variant, not just the source.
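To make the filtering and its failure mode concrete, here is a deliberately simplified model of tag-based contract filtering. It is an assumption-laden sketch, not the registry's actual algorithm: it works on individual fields only, ignores type-level tags and key handling, and the function contract_fields and the sample data are hypothetical.

```python
from typing import Dict, Set


def contract_fields(
    tags_by_field: Dict[str, Set[str]],
    include: Set[str],
    exclude: Set[str],
) -> Set[str]:
    """Return the fields visible in a contract (simplified model).

    A field is kept if it carries at least one included tag (or no include
    list is configured) and carries none of the excluded tags.
    """
    visible = set()
    for field, tags in tags_by_field.items():
        included = not include or bool(tags & include)
        excluded = bool(tags & exclude)
        if included and not excluded:
            visible.add(field)
    return visible


# Hypothetical tagging before and after someone removes a "public" tag.
before = {
    "Product.id": {"public"},
    "Product.name": {"public"},
    "Product.margin": {"internal"},
}
after = {**before, "Product.name": set()}

# Fields that partner operations are observed to select (assumed data).
queried_by_partners = {"Product.id", "Product.name"}

broken = queried_by_partners - contract_fields(after, include={"public"}, exclude=set())
print(sorted(broken))  # ['Product.name']
```

The source variant still contains Product.name in both cases; only the contract's clients lose it, which is why the check has to run per contract.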
Contracts also change how you reason about deprecation and removal. A field can be safely absent from the public contract while still live and heavily used internally, which means “is this field used?” must be answered per variant. The registry tracks operation metrics per variant, so a removal that is safe against the internal variant may still break the public contract’s partner clients, who move slowest of all. Sequence contract changes from the most-controlled audience outward: validate against internal first, then partner, then any fully public contract, and give external consumers the longest deprecation windows because you have the least visibility into their release cadence. When two contracts need genuinely different shapes of the same field — not just presence or absence — that is a signal the underlying type is doing too much, and the cleaner fix is usually to split the field rather than to diverge the contracts further.
Migrating Field Ownership with @override
The most delicate evolution is moving a field from one subgraph to another — for instance, pulling Product.inventoryStatus out of a legacy monolith and into a dedicated inventory service. @override makes this incremental and reversible. The new subgraph declares the field with @override(from: "<old-subgraph>"), and the query planner routes the field to the new owner while the old definition stays in place as a safety net.
Federation 2.7 and later support progressive @override via the label argument, which lets you shift a percentage of traffic for the field to the new subgraph. This turns ownership migration into a graduated rollout rather than an atomic flip.
# inventory subgraph (new owner): progressive override at 10%
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.9",
        import: ["@key", "@external", "@override"])

type Product @key(fields: "id") {
  id: ID! @external
  # 10% of resolutions go here; 90% still hit the legacy subgraph.
  inventoryStatus: String! @override(from: "legacy-monolith", label: "percent(10)")
}
The safe path is a staged sequence: both subgraphs define the field, you raise the override percentage in stages while comparing results, and once you are at 100% you remove the field from the old subgraph in a separate deploy.
The override is only complete once the field is removed from the old subgraph. Until then, the old resolver remains the fallback for the un-migrated traffic share, and a single change to label: "percent(N)" adjusts the split. If the new subgraph returns inconsistent values, lowering the percentage and republishing the subgraph restores the legacy behaviour without redeploying any consumer. The ownership-transfer mechanics also appear in Resolving Schema Conflicts in Apollo Federation, which covers the conflict cases that arise when two subgraphs claim the same field.
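The staged progression can be sketched as a small policy function that decides the next percentage from a measured mismatch rate. Everything here is an assumption for illustration: the stage list, the tolerance, and the idea that you measure a mismatch rate by comparing results from the two subgraphs. The function only computes the value for the label argument; applying it still means editing the SDL and republishing.

```python
from typing import Sequence

# Hypothetical rollout stages, in percent of traffic sent to the new owner.
STAGES: Sequence[int] = (1, 10, 50, 100)


def next_override_percent(
    current: int,
    mismatch_rate: float,
    tolerance: float = 0.001,
    stages: Sequence[int] = STAGES,
) -> int:
    """Advance one stage when results agree, step back one stage otherwise."""
    if mismatch_rate > tolerance:
        # Back off to the previous stage, or to 0 if already at the first one.
        lower = [s for s in stages if s < current]
        return lower[-1] if lower else 0
    higher = [s for s in stages if s > current]
    # Stay put once the final stage has been reached.
    return higher[0] if higher else current


def override_label(percent: int) -> str:
    """Format the value used in @override(label: ...)."""
    return f"percent({percent})"


print(override_label(next_override_percent(10, mismatch_rate=0.0)))   # percent(50)
print(override_label(next_override_percent(10, mismatch_rate=0.02)))  # percent(1)
```

Stepping back a single stage rather than straight to zero is a design choice; a stricter policy could return 0 on any mismatch.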
Directive & Mechanism Reference
| Mechanism | Syntax | Effect | Composition-time vs runtime |
|---|---|---|---|
| @deprecated | field: T @deprecated(reason: "...") | Marks a field/enum value as deprecated in introspection | Composition-time annotation; clients see it at runtime |
| @tag | field: T @tag(name: "public") | Labels schema elements for contract filtering | Composition-time; consumed by contract builds |
| @override | field: T @override(from: "subgraphA") | Transfers field ownership to the declaring subgraph | Composition-time routing decision |
| Progressive @override | @override(from: "subgraphA", label: "percent(10)") | Splits field resolution by percentage | Composition-time config, runtime traffic split |
| @inaccessible | field: T @inaccessible | Removes element from the API schema while keeping it for internal joins | Composition-time; element absent from public supergraph |
| Contract variant | Registry config (include/exclude @tag) | Derives a filtered supergraph for a client segment | Composition-time; separate supergraph per contract |
@inaccessible deserves a note alongside @deprecated: it is the right tool when a field must exist in a subgraph for entity resolution but should not be queryable by any client. Adding @inaccessible to a field that clients currently use is a breaking change; removing it (exposing a field) is additive.
Step-by-Step: A Safe Breaking-Change Rollout
The following sequence retires a field across teams without a hard break.
- Introduce the replacement additively. Ship the new field (e.g. priceV2: Money!) in the owning subgraph. This passes checks because it adds nothing required.
  rover subgraph check my-graph@prod --name products --schema ./products.graphql
  rover subgraph publish my-graph@prod --name products \
    --schema ./products.graphql --routing-url https://products.svc/graphql
- Deprecate the old field with a dated reason. Publish again with @deprecated(reason: "Use priceV2. Removal after 2026-09-01."). Clients now see the warning in their IDE and codegen.
- Drive consumer migration with data. Use operation metrics from the registry to identify every client still selecting the deprecated field. Open tickets against those teams with the exact operation names.
- Confirm zero traffic, then remove. When checks report no observed operations using the field, remove it and publish. Because the registry validates against real traffic, the removal check passes only when it is genuinely safe.
  # This now passes only because no live operation selects the removed field.
  rover subgraph check my-graph@prod --name products --schema ./products.graphql
- Watch the router after publish. Managed federation hot-reloads the supergraph; confirm error rates stay flat and no client begins emitting Cannot query field errors.
Composition Pipeline Integration
Schema evolution belongs in CI, gated by checks before any publish. A minimal pipeline runs a check on pull requests and a publish on merge to the main branch. The check command compares the proposed subgraph schema against the published supergraph and against recent operation traffic, so it catches both composition failures and client-breaking changes.
# .github/workflows/subgraph.yml (excerpt)
jobs:
  schema-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: rover check
        run: |
          rover subgraph check "$APOLLO_GRAPH_REF" \
            --name "$SUBGRAPH_NAME" \
            --schema ./schema.graphql
        env:
          APOLLO_KEY: ${{ secrets.APOLLO_KEY }}
          APOLLO_GRAPH_REF: my-graph@prod
          SUBGRAPH_NAME: products
For the full pipeline pattern, including contract-variant checks and proposal workflows, see Schema Validation in CI/CD Pipelines and the new Schema Registry and Managed Federation section, which covers how publishes propagate to the router.
Blue/Green Subgraph Rollout
Schema evolution and deployment evolution interact. When a subgraph change includes a resolver or runtime change — not just SDL — you want to roll the new binary out without a window where the published schema and the running service disagree. The order matters in both directions:
- Additive change (new field): Deploy the new subgraph binary first (green), confirm it is healthy behind the old schema, then publish the schema. If you publish first, the router may plan queries for a field the deployed pods can’t yet resolve.
- Removal: Publish the schema first (so the router stops planning for the field), then deploy the binary that drops the resolver. Reversing this leaves the router routing to a field the new pods no longer serve.
A blue/green rollout keeps both versions live behind a load balancer or service mesh. Shift traffic gradually, and keep the routing URL stable so the router does not need a republish to find the new pods. Progressive @override complements this for the special case of moving a field between different subgraphs rather than versions of the same one.
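The two ordering rules above are easy to invert by accident, so it can help to encode them once and have deploy tooling consult that encoding. This is a minimal sketch with hypothetical names (Change, rollout_steps) and free-text step labels; it does not call any real deploy or registry API.

```python
from enum import Enum
from typing import List


class Change(Enum):
    ADD_FIELD = "add_field"
    REMOVE_FIELD = "remove_field"


def rollout_steps(change: Change) -> List[str]:
    """Return the safe order of deploy and publish for a schema change."""
    if change is Change.ADD_FIELD:
        # The pods must be able to resolve the field before the router plans for it.
        return [
            "deploy new subgraph binary",
            "confirm it is healthy behind the old schema",
            "publish schema",
        ]
    if change is Change.REMOVE_FIELD:
        # The router must stop planning for the field before the pods drop it.
        return [
            "publish schema",
            "confirm the router has loaded the new supergraph",
            "deploy binary without the resolver",
        ]
    raise ValueError(f"unsupported change: {change}")


print(rollout_steps(Change.REMOVE_FIELD)[0])  # publish schema
```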
Performance & Scale Considerations
Versioning choices have runtime cost. Keeping a deprecated field alongside its replacement doubles the surface a subgraph must resolve, and if both fields hit the same backend you may double the load during the migration window — cache the underlying read so the deprecated and new fields share a fetch. Progressive @override adds a small query-planning cost because the planner must account for two possible owners of the field; this is negligible compared to the safety it buys, and disappears once the legacy field is removed. Contract variants each produce a separate supergraph that a separate router serves, so each contract is an independent operational surface with its own query-plan cache; do not spin up contracts you don’t need. Finally, the schema registry’s check against operation traffic depends on representative sampling — if your metrics only cover a narrow client mix, a “safe” removal can still surprise a low-traffic partner, so widen sampling before high-stakes removals.
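The advice to let the deprecated and new fields share a fetch can be sketched with a per-request memo. The names below (RequestCache, resolve_price, resolve_price_v2) are hypothetical and not tied to any GraphQL server library, and the division by 100 assumes a currency with two decimal places.

```python
from typing import Callable, Dict, Tuple

# (amount in minor units, currency code), matching the Money type's fields.
PriceRecord = Tuple[int, str]


class RequestCache:
    """Per-request memo so both price fields share one backend read."""

    def __init__(self, fetch: Callable[[str], PriceRecord]) -> None:
        self._fetch = fetch
        self._records: Dict[str, PriceRecord] = {}

    def price_record(self, product_id: str) -> PriceRecord:
        # Hit the backend only the first time a product is requested.
        if product_id not in self._records:
            self._records[product_id] = self._fetch(product_id)
        return self._records[product_id]


def resolve_price(cache: RequestCache, product_id: str) -> float:
    """Deprecated field: major units as a float."""
    amount, _currency = cache.price_record(product_id)
    return amount / 100


def resolve_price_v2(cache: RequestCache, product_id: str) -> Dict[str, object]:
    """Replacement field: minor units plus currency code."""
    amount, currency = cache.price_record(product_id)
    return {"amount": amount, "currencyCode": currency}


# Stand-in backend that records how often it is called.
calls = []


def fake_fetch(product_id: str) -> PriceRecord:
    calls.append(product_id)
    return (1999, "USD")


cache = RequestCache(fake_fetch)
old_value = resolve_price(cache, "p1")
new_value = resolve_price_v2(cache, "p1")
print(len(calls))  # 1
```

Creating one RequestCache per incoming request keeps the memo from serving stale prices across requests while still collapsing the duplicate read inside a single operation.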
Failure Modes & Debugging
Cannot query field "X" on type "Y" appears at the client after you remove a field that someone still queried. The fix is to roll the supergraph back to the prior published version (with managed federation the router picks it up on its next poll), restore the field, re-deprecate it, and chase the remaining client before trying again. This error is precisely what traffic-aware checks exist to prevent.
A @shareable field must be marked @shareable in all subgraphs that define it — a composition error that surfaces when an evolution makes a field overlap between subgraphs without consistent @shareable. It blocks publish, so it cannot reach production, but it can block an unrelated team’s deploy if your schema is the one that broke composition.
Field "X" is already defined and cannot be overridden indicates a malformed @override, usually a wrong from: subgraph name or two subgraphs both trying to override the same field. The from: value must exactly match the registered subgraph name.
Progressive override percentage “stuck” — if traffic does not shift after you change label: "percent(N)", confirm you actually republished the subgraph; the percentage is part of the SDL and only takes effect when composed. Check the active supergraph version in the registry against your latest publish.
Frequently Asked Questions
Is adding a field to a federated subgraph ever a breaking change?
Adding an output field is additive and safe. The exceptions are subtle: adding a required argument to an existing field is breaking, and adding a field that collides with another subgraph’s definition can break composition. Run rover subgraph check — it classifies the change against both the supergraph and real traffic, so you do not have to reason about every edge case manually.
How do I version a federated graph — do I cut a v1 and v2 endpoint?
Generally no. The federation idiom is continuous evolution of a single graph: add fields additively, deprecate the old ones, and remove them once traffic reaches zero. Hard version endpoints duplicate the entire supergraph and double operational cost. If you must isolate a client segment, use a contract variant rather than a parallel graph.
What happens to in-flight queries when I publish a schema change with managed federation?
The router hot-reloads the new supergraph and applies it to subsequent requests; queries already in flight complete against the plan they were built with. Because additive changes don’t invalidate existing operations, clients see no disruption. For removals, sequence the schema publish ahead of the resolver removal so the router stops planning for the field before the pods stop serving it.
Can I roll back a schema publish?
Yes. The registry retains every published supergraph version, and with managed federation you can repoint the variant at a previous composition, which the router picks up on its next poll. Treat rollback as a first-class step in any risky migration rather than a last resort.