When to Use Schema Stitching vs Apollo Federation

Engineering teams evaluating GraphQL composition strategies must weigh operational complexity against architectural scalability. Both patterns unify disparate services into a single graph, but their routing mechanics, type resolution models, and failure modes diverge significantly. Schema stitching relies on runtime resolver delegation and manual type merging at the gateway layer. Apollo Federation moves composition to build time: subgraph schemas are composed into a supergraph that the router executes, with declarative contracts enforced via @key directives and distributed _entities resolution. Selecting the wrong pattern can lead to cascading routing failures, unmanageable type collisions, and unpredictable query planning latency. Understanding the foundational principles of GraphQL Federation Architecture & Design is critical before committing to a composition strategy. This guide provides diagnostic workflows, minimal viable configurations, and step-by-step migration paths to help you decide when to use schema stitching vs Apollo Federation.

Core Architectural Differences: Manual vs Declarative Composition

| Dimension | Schema Stitching | Apollo Federation |
|---|---|---|
| Composition Phase | Runtime (gateway merges schemas on startup) | Build-time (subgraph schemas are composed and validated into a supergraph before deployment) |
| Routing Logic | Explicit delegateToSchema resolver chains | Query planner computes and caches an execution path across subgraphs for each operation |
| Type Resolution | Manual type mapping and field merging | @key directives + _entities query for distributed ownership |
| Failure Surface | Runtime resolver errors, silent field drops | Composition failures, directive mismatches, planning timeouts |

Diagnostic Workflow: Identify Your Current Composition Model

  1. Run an introspection query against each backing service, not only the gateway.
  2. Check the root Query type for _service and _entities fields. If present, that service is a Federation subgraph; a federated router does not expose these fields to clients.
  3. Inspect the gateway source code for stitchSchemas, delegateToSchema, or legacy mergeSchemas calls. If present, you are using stitching.
  4. Query a shared type (e.g., User) whose fields span two services. If the gateway returns Cannot return null for non-nullable field, stitching is likely failing to merge resolvers correctly.
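
The introspection check in step 2 can be scripted. The following is a minimal sketch, not a definitive tool: `classifyService`, `ServiceClassification`, and `ROOT_FIELDS_QUERY` are hypothetical names introduced for illustration, and the function only inspects the root Query field names that an introspection response would contain. It treats `_service` as the subgraph marker because a subgraph with no entity types exposes `_service` without `_entities`.

```typescript
// Hypothetical helper names; nothing here comes from a library.
interface ServiceClassification {
  isSubgraph: boolean;       // root Query exposes _service
  resolvesEntities: boolean; // root Query also exposes _entities
}

// Introspection query that lists only the root Query field names.
const ROOT_FIELDS_QUERY = '{ __schema { queryType { fields { name } } } }';

function classifyService(rootQueryFields: string[]): ServiceClassification {
  const fields = new Set(rootQueryFields);
  const isSubgraph = fields.has('_service');
  return {
    isSubgraph,
    // _entities is only meaningful alongside _service.
    resolvesEntities: isSubgraph && fields.has('_entities'),
  };
}
```

To use it, POST `ROOT_FIELDS_QUERY` to a service, collect the returned field names, and pass them to `classifyService`.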

Minimal Viable Configurations

Schema Stitching Gateway (Node.js / @graphql-tools/stitch)

import { stitchSchemas } from '@graphql-tools/stitch';
import { schemaFromExecutor } from '@graphql-tools/wrap';
import { buildHTTPExecutor } from '@graphql-tools/executor-http';

// serviceAEndpoint and serviceBEndpoint are the services' GraphQL URLs, defined elsewhere.
const serviceAExecutor = buildHTTPExecutor({ endpoint: serviceAEndpoint });
const serviceBExecutor = buildHTTPExecutor({ endpoint: serviceBEndpoint });

const gatewaySchema = stitchSchemas({
  subschemas: [
    // Service A returns User objects that carry at least an id.
    { schema: await schemaFromExecutor(serviceAExecutor), executor: serviceAExecutor },
    {
      schema: await schemaFromExecutor(serviceBExecutor),
      executor: serviceBExecutor,
      // Manual type merging required for shared types (subschema-level merge
      // config, graphql-tools v7 and later): tells the gateway how to fetch
      // Service B's portion of User from a partial result.
      merge: {
        User: {
          selectionSet: '{ id }',
          fieldName: 'userById',
          args: (originalResult) => ({ id: originalResult.id }),
        },
      },
    },
  ],
});
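
Conceptually, type merging combines partial objects for the same key that come from different services. The sketch below illustrates only that idea with plain data; it is not how graphql-tools implements merging internally, and `PartialUser` and `mergeByKey` are hypothetical names.

```typescript
// A partial record for a shared type: always has the key, plus whatever
// fields the originating service knows about.
type PartialUser = { id: string } & Record<string, unknown>;

// Combine partial records from two services into one record per id.
// On a field conflict, the later record (from `b`) wins.
function mergeByKey(a: PartialUser[], b: PartialUser[]): PartialUser[] {
  const merged = new Map<string, PartialUser>();
  for (const record of a.concat(b)) {
    const existing = merged.get(record.id);
    merged.set(record.id, existing ? { ...existing, ...record } : { ...record });
  }
  return Array.from(merged.values());
}
```

A gateway does the equivalent per request: it fetches one service's fields, then uses the key (here `id`) to fetch and merge the remaining fields from the other service.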

Apollo Federation Subgraph Definition (GraphQL SDL)

type Query {
 product(id: ID!): Product
}

type Product @key(fields: "id") {
 id: ID!
 name: String!
 price: Float!
}

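# Inventory subgraph (a separate schema): contributes fields to Product.
# This is Federation 1 syntax; in Federation 2, `extend` and `@external` on the key field are optional.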
extend type Product @key(fields: "id") {
 id: ID! @external
 inStock: Boolean!
}
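
On the resolver side, a subgraph that declares `@key` typically provides a reference resolver so that `_entities` fetches can be answered. A minimal sketch for the Inventory subgraph follows, assuming a hypothetical in-memory `stockByProductId` map in place of a real datastore; `ProductRepresentation` and `inventoryResolvers` are illustrative names.

```typescript
// Representation sent for Product: __typename plus the @key fields.
interface ProductRepresentation {
  __typename: 'Product';
  id: string;
}

// Hypothetical in-memory stock data standing in for a real datastore.
const stockByProductId = new Map<string, boolean>([
  ['p1', true],
  ['p2', false],
]);

// Resolver map in the shape accepted by buildSubgraphSchema from @apollo/subgraph.
const inventoryResolvers = {
  Product: {
    // Called once per representation in an _entities request.
    __resolveReference(ref: ProductRepresentation) {
      const inStock = stockByProductId.get(ref.id);
      // Returning null marks the entity as not found in this subgraph.
      return inStock === undefined ? null : { id: ref.id, inStock };
    },
  },
};
```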

Federated Subgraph Server Initialization (TypeScript / @apollo/server)

import { ApolloServer } from '@apollo/server';
import { startStandaloneServer } from '@apollo/server/standalone';
import { buildSubgraphSchema } from '@apollo/subgraph';

// productTypeDefs (a parsed SDL document) and productResolvers are defined elsewhere.
// This starts one subgraph; the router runs as a separate process in front of it.
const server = new ApolloServer({
  schema: buildSubgraphSchema([{ typeDefs: productTypeDefs, resolvers: productResolvers }]),
});

const { url } = await startStandaloneServer(server, {
  listen: { port: 4001 },
  // Forward the Authorization header so resolvers can read it from context.
  context: async ({ req }) => ({ token: req.headers.authorization }),
});

Decision Thresholds

When Schema Stitching Still Makes Sense

  • Legacy Monolith Decomposition: You are wrapping REST endpoints with a lightweight GraphQL facade and need to merge schemas without refactoring downstream services.
  • Centralized Ownership: A single platform team controls all services, making manual resolver mapping manageable.
  • Low Query Volume / Rapid Prototyping: CI/CD pipelines lack schema validation gates, and query planning overhead is negligible.
  • Operational Constraint: You cannot enforce strict subgraph contracts or deploy a dedicated router layer.

When Apollo Federation Is the Better Fit

  • Cross-Team Domain Ownership: Multiple squads own distinct bounded contexts and require independent deployment cycles.
  • High-Traffic Microservices Ecosystems: You need deterministic query planning, automatic batching, and predictable latency SLAs.
  • Automated Contract Testing: Your CI/CD requires strict schema diffing, directive validation, and breaking-change prevention.
  • Shared Type Complexity: You are struggling with overlapping definitions and need a single source of truth enforced by the router. For teams hitting overlapping type definitions, refer to Resolving Schema Conflicts in Apollo Federation to implement strict contract validation before scaling.

Migration Path & Step-by-Step Resolution

Transitioning from stitching to Federation requires a phased, zero-downtime approach. Do not attempt a big-bang rewrite.

  1. Isolate Domain Boundaries: Map existing types to owning services. Choose the @key fields for each shared type and identify which services must resolve it through _entities.
  2. Deploy Parallel Router: Spin up an Apollo Router alongside your existing stitching gateway. Route 1% of traffic via feature flag or header injection.
  3. Shift Resolver Logic: Move delegateToSchema chains into subgraph resolvers, typically as reference resolvers (__resolveReference) for each entity. Replace manual typeMerging with @key and @external directives.
  4. Enable Schema Validation Gates: Integrate rover subgraph check into CI. Block merges that introduce breaking changes or directive conflicts.
  5. Cutover & Decommission: Gradually increase router traffic to 100%. Monitor _entities resolution latency and query planner cache hit rates. Retire the stitching gateway.

Gateway Routing Overlap During Transition

Running both gateways simultaneously introduces routing complexity. Use a reverse proxy (Envoy/Nginx) to split traffic based on X-GraphQL-Router: federation headers. Ensure both gateways resolve identical query shapes to prevent client-side cache fragmentation.
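
The split logic itself is simple, whichever proxy enforces it. The sketch below shows one way to combine the header override with a percentage rollout; in practice this decision would normally live in the Envoy or Nginx configuration. `chooseGateway`, `bucketOf`, and the client-id input are hypothetical, and header names are assumed to be lowercased already.

```typescript
type Gateway = 'stitching' | 'federation';

// Deterministic 32-bit FNV-1a hash, so a given client always lands in the same bucket.
function bucketOf(clientId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < clientId.length; i++) {
    hash ^= clientId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % 100; // bucket in 0..99
}

function chooseGateway(
  headers: Record<string, string | undefined>,
  clientId: string,
  federationPercent: number,
): Gateway {
  // Explicit opt-in via header always wins.
  if (headers['x-graphql-router'] === 'federation') return 'federation';
  // Otherwise route a fixed share of clients to the new router.
  return bucketOf(clientId) < federationPercent ? 'federation' : 'stitching';
}
```

Hashing on a stable client identifier rather than choosing randomly per request keeps each client on one gateway, which helps avoid the cache fragmentation mentioned above.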

Troubleshooting & Diagnostic Workflows

| Symptom | Representative Error Payload | Root Cause | Resolution Path |
|---|---|---|---|
| Duplicate Type Definition | GRAPHQL_VALIDATION_FAILED: Type "User" was defined multiple times with different fields. | Stitching is merging conflicting SDLs, or a Federation subgraph redefines the type without extend or @extends | In Federation, designate one subgraph as the base owner. Use extend type User @key(fields: "id") in others. |
| Missing _entities Resolution | ApolloRouterError: Subgraph "inventory" failed to resolve _entities for type Product | Missing @key directive, or no reference resolver (__resolveReference) returning the entity | Add a reference resolver for the type and make sure the returned object resolves to the correct __typename. Verify that @key matches the exact field path. |
| Query Planner Timeout | ApolloRouterError: Query planning timed out after 5000ms | Circular @key dependencies or unbounded field selection | Flatten subgraph boundaries and break circular entity references. Audit @requires usage, since each @requires can add a cross-service fetch. Review query plan caching (cited here as queryPlanner.experimental_cache; confirm the option name for your router version). |
| Stitching Resolver Drop | [Stitching] Field "User.email" dropped: no resolver mapped in subschema | The merge configuration has no mapping for the field | Explicitly map the field in the type merging configuration or migrate to Federation's declarative resolution. |

Diagnostic Checklist for Production Incidents

  1. Run rover subgraph introspect <endpoint> to verify schema shape.
  2. Check router logs for query_planner.execution_time_ms spikes.
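  3. Validate _entities payloads by querying the owning subgraph directly, since the router does not expose _entities to clients: curl -X POST -H "Content-Type: application/json" -d '{"query": "query { _entities(representations: [{__typename: \"User\", id: \"123\"}]) { ... on User { id name } } }"}' <subgraph-url>
  4. If latency exceeds SLA, audit subgraph network hops and review the router's query planning settings. The option cited here, queryPlanner.experimental_parallelism: true, is experimental; confirm its exact name and accepted values for your router version.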
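
For repeated checks, the `_entities` request body from step 3 can be built programmatically, passing representations as variables instead of inlining them in the query string. This is a sketch under stated assumptions: `buildEntitiesRequest` and `EntitiesRequest` are hypothetical names, and the request is meant for a subgraph endpoint, since `_entities` is part of the subgraph contract.

```typescript
interface EntitiesRequest {
  query: string;
  variables: { representations: Array<Record<string, unknown>> };
}

// Build a POST body that asks a subgraph to resolve one entity by its key fields.
function buildEntitiesRequest(
  typename: string,
  key: Record<string, unknown>,
  fields: string[],
): EntitiesRequest {
  return {
    // _Any is the scalar the Federation subgraph spec uses for representations.
    query:
      'query ($representations: [_Any!]!) { ' +
      '_entities(representations: $representations) { ' +
      `... on ${typename} { ${fields.join(' ')} } } }`,
    variables: { representations: [{ __typename: typename, ...key }] },
  };
}
```

For example, `JSON.stringify(buildEntitiesRequest('User', { id: '123' }, ['id', 'name']))` produces a body equivalent to the curl payload above.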

FAQ

Can I run schema stitching and Apollo Federation in the same architecture?

Technically yes, but it introduces severe routing complexity and query planning conflicts. It is only recommended as a temporary migration bridge. Long-term, standardize on one composition model to maintain predictable performance and clear ownership boundaries.

Does Apollo Federation replace all schema stitching use cases?

No. Federation excels in distributed, multi-team environments with strict CI/CD validation. Schema stitching remains practical for lightweight internal APIs, legacy monolith decomposition, or scenarios where a single team controls all services and prefers manual resolver mapping.

How do I handle shared types like User or Product across subgraphs?

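In Federation 1, one subgraph owns the base type definition and others extend it using extend type and @external. In Federation 2, each subgraph that contributes fields declares the type with the same @key, and extend is optional. Either way, the shared key prevents conflicting definitions and gives the router a single way to identify the entity. Proper boundary mapping is essential to avoid circular dependencies.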

What is the performance impact of switching from stitching to Federation?

Federation moves composition to build time and caches a query plan per operation, so planning cost is not paid on every request. However, it introduces additional network hops for _entities resolution. Proper subgraph design and query batching mitigate latency, whereas stitching can suffer from N+1 resolver chains at runtime unless batching is configured.
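
The N+1 difference can be illustrated with plain functions. The sketch below counts backend calls for a one-by-one resolver chain versus a single batched lookup, which is the shape an `_entities` request with several representations takes. All names and data (`fetchUsersByIds`, `resolveOneByOne`, `resolveBatched`, `userNames`) are hypothetical.

```typescript
// Hypothetical backend: counts how many times it is called.
let backendCalls = 0;
const userNames = new Map<string, string>([
  ['1', 'Ada'],
  ['2', 'Lin'],
  ['3', 'Sam'],
]);

function fetchUsersByIds(ids: string[]): Array<{ id: string; name: string | null }> {
  backendCalls += 1;
  return ids.map((id) => ({ id, name: userNames.get(id) ?? null }));
}

// N+1 pattern: one backend call per parent object.
function resolveOneByOne(ids: string[]) {
  return ids.map((id) => fetchUsersByIds([id])[0]);
}

// Batched pattern: all keys go out in a single call.
function resolveBatched(ids: string[]) {
  return fetchUsersByIds(ids);
}
```

Both functions return the same data; only the number of backend round trips differs, growing with the list size in the first case and staying at one in the second.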