Entity Resolution Fallback Strategies for Partial Data

This guide shows how to keep a federated query returning a complete, type-safe response when a subgraph hands back partial data for a referenced entity, using fallback identifiers, default injection, and stale-cache recovery. It belongs to the guide on optimizing reference resolvers for performance, within Subgraph Implementation & Entity Resolution.

When to Use This Pattern

A downstream subgraph intermittently drops non-nullable fields under load or version drift, collapsing parent objects to null.
Some entities are addressable by more than one identifier (a new id and a legacy key) and representations may arrive with only one populated.
You need the graph to stay available during a partial outage rather than failing the whole operation.

If a subgraph is fully down rather than degraded, this is a circuit-breaking problem first; fallbacks handle the partial case where some data is recoverable.

Prerequisites

Apollo Federation v2.9 subgraphs with entities keyed via @key directives.
dataloader, opossum, and a Redis client (ioredis) installed in the owning subgraph.
A way to inspect _entities payloads — Apollo Studio traces or APOLLO_ROUTER_LOG=debug.
Knowledge of which fields are non-nullable in the composed supergraph schema.

Root Cause: Identifying Partial Data

Partial payloads rarely surface as HTTP errors. They appear as gateway merge failures or silent null propagation. When the router cannot satisfy a non-nullable field, it nulls the affected entity and may attach an error:

{
  "errors": [
    {
      "message": "Cannot return null for non-nullable field Product.sku",
      "path": ["catalog", "product", "sku"],
      "extensions": { "code": "INTERNAL_SERVER_ERROR" }
    }
  ],
  "data": { "catalog": { "product": null } }
}

Diagnose before patching: open the trace, filter to the _entities fetch phase, and identify which subgraph drops fields. Cross-reference field nullability in the supergraph schema against the resolver path — a field marked ! in SDL that returns null at runtime is a contract violation at the source, which a fallback masks rather than fixes.
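As a rough aid for that diagnosis, the sketch below scans a response body for non-null violations and reports which type and field each one names. The function and interface names are hypothetical, and the regular expression assumes the message format shown in the example error above; other servers may word it differently.

```typescript
// Only the parts of a GraphQL response that this sketch reads.
interface GraphQLErrorLike { message: string; path?: (string | number)[] }
interface GraphQLResponseLike { errors?: GraphQLErrorLike[] }

interface NullViolation { typeName: string; fieldName: string; path: string }

// Assumed message format: "Cannot return null for non-nullable field Product.sku"
const NON_NULL_VIOLATION = /^Cannot return null for non-nullable field (\w+)\.(\w+)/;

// Collect every non-null violation so they can be grouped by type and field.
function findNullViolations(response: GraphQLResponseLike): NullViolation[] {
  const violations: NullViolation[] = [];
  for (const error of response.errors ?? []) {
    const match = NON_NULL_VIOLATION.exec(error.message);
    if (!match) continue; // unrelated error, e.g. a timeout
    violations.push({
      typeName: match[1],
      fieldName: match[2],
      path: (error.path ?? []).join('.'),
    });
  }
  return violations;
}
```

Running this over sampled responses gives a quick count of which fields collapse most often, which helps decide where a fallback is worth adding and which source contract needs fixing first.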

Implementation Walkthrough

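A robust pattern combines three layers in the reference resolver: accept an alternative identifier, inject schema-compliant defaults for missing non-key fields, and fall back to a stale cache snapshot when the primary fetch degrades. The SDL declares a secondary key as non-resolvable, which tells the router not to plan _entities fetches keyed on legacyId. Because the router builds representations from the keys it plans on, expect legacyId-only representations to come from callers that invoke _entities directly (tests, migration tooling) rather than from router-planned fetches; drop resolvable: false if the router itself must route on the legacy key.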

# catalog subgraph — schema.graphql
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.9", import: ["@key"])

# resolvable: false means the router won't plan fetches keyed on legacyId,
# but representations carrying legacyId are still accepted by this subgraph.
type Product
  @key(fields: "id")
  @key(fields: "legacyId", resolvable: false) {
  id: ID!
  legacyId: String
  sku: String!
  name: String!
  warehouseId: String!
}

// catalog subgraph — resolvers.ts
import DataLoader from 'dataloader';
import CircuitBreaker from 'opossum';
import { redisClient } from './cache';

interface Product { id: string; legacyId?: string; sku: string; name: string; warehouseId: string; }
type ProductKey = { id?: string; legacyId?: string };
interface Context {
  db: { products: { batchGet(keys: ProductKey[]): Promise<(Product | null)[]> } };
  config: { defaultWarehouseId: string };
  // Assumption: built once per request in the server's context function, e.g.
  //   new DataLoader(keys => loadProducts(db, keys), { cacheKeyFn: cacheKey })
  productLoader: DataLoader<ProductKey, Product | null, string>;
}

// Shared key format for the DataLoader cache and the Redis snapshots.
const cacheKey = (k: ProductKey) => `entity:product:${k.id ?? k.legacyId}`;

// Primary batched fetcher, guarded by a circuit breaker. The db handle is an
// argument because the request context is not in scope at module level.
const fetcher = (db: Context['db'], keys: ProductKey[]) => db.products.batchGet(keys);

const breaker = new CircuitBreaker(fetcher, {
  timeout: 2000, errorThresholdPercentage: 50, resetTimeout: 30000,
});

// opossum invokes the fallback with the arguments passed to fire() whenever the
// call fails, times out, or the breaker is open: serve the last known good
// entities from Redis in a single MGET.
breaker.fallback(async (_db: Context['db'], keys: ProductKey[]) => {
  const cached = await redisClient.mget(keys.map(cacheKey));
  return cached.map(v => (v ? (JSON.parse(v) as Product) : null));
});

// Batch function for the per-request DataLoader.
export const loadProducts = (db: Context['db'], keys: readonly ProductKey[]) =>
  breaker.fire(db, [...keys]);

export const resolvers = {
  Product: {
    __resolveReference: async (
      ref: ProductKey,
      context: Context,
    ): Promise<Product | null> => {
      // 1. Reject a truly empty representation early.
      if (!ref.id && !ref.legacyId) {
        throw new Error('ENTITY_KEY_MISSING: no primary or fallback identifier');
      }

      // 2. Fetch via the loader (DataLoader batches; breaker handles degradation).
      const product = await context.productLoader.load({ id: ref.id, legacyId: ref.legacyId });
      if (!product) return null;

      // 3. Inject type-safe defaults for missing non-key fields.
      //    Rows are typed as Product, but a degraded source may omit these at runtime.
      //    Never inject a value of the wrong scalar type into the schema.
      return {
        ...product,
        sku: product.sku ?? `LEGACY-${ref.legacyId ?? 'UNKNOWN'}`,
        warehouseId: product.warehouseId ?? context.config.defaultWarehouseId,
      };
    },
  },
};

Three things make this safe. The secondary @key(... resolvable: false) declares legacyId as an identifier without letting the router plan fetches on it, so legacyId-keyed representations are handled only when a caller supplies them. The default injection substitutes values that match the SDL scalar types exactly, so the merge phase never sees a type mismatch. And the circuit breaker’s fallback serves a Redis snapshot when the database fetch fails or times out, keeping the entity resolvable during a partial outage. Keep the batching here consistent with the guidance in optimizing reference resolvers for performance so the fallback path does not reintroduce N+1 fetches.

The order of the three layers is not arbitrary. Identifier resolution comes first because everything downstream needs some key to work with — there is no point caching or defaulting an entity you cannot address. Default injection comes last, after the fetch, because it operates on whatever the primary or fallback path returned; injecting defaults before the fetch would mask a successful read with placeholder data. The circuit breaker wraps the fetch itself, sitting between the two, so that a degraded database transparently swaps to the cache without the identifier logic above or the defaulting logic below needing to know which path produced the row. This layering means each concern stays independent: you can change the cache backend without touching identifier handling, or tighten the breaker thresholds without revisiting the defaults.

Be deliberate about what the stale snapshot contains. The Redis fallback is only as useful as the freshness of the last successful write, so populate the cache on the success path of the breaker — write the resolved entity back to entity:product:<key> whenever the primary fetch succeeds. A cache that is only written on the fallback path is empty exactly when you need it. Pair that with a TTL long enough to outlast a typical outage but short enough that a recovered subgraph’s fresh data wins quickly, and add jittered expiry so a fleet of pods does not all refill the same key at the same instant when the breaker closes.
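The write-back described above could look roughly like the following. This is a minimal sketch, not the guide's implementation: SnapshotStore, writeSnapshots, and the TTL constants are hypothetical names and values, and the store interface is intended to match the set(key, value, 'EX', seconds) form that ioredis exposes. Each entity is written under both id and legacyId so that the fallback lookup finds it whichever identifier a later representation carries.

```typescript
// Minimal slice of a Redis client used by this sketch.
interface SnapshotStore {
  set(key: string, value: string, mode: 'EX', ttlSeconds: number): Promise<unknown>;
}

interface CacheableProduct { id: string; legacyId?: string }

// Hypothetical tuning values: a 15 minute base TTL with +/-20% jitter.
const BASE_TTL_SECONDS = 900;
const JITTER_FRACTION = 0.2;

// Spread expiry over [base - spread, base + spread) so pods do not refill in lockstep.
function jitteredTtl(
  base: number = BASE_TTL_SECONDS,
  fraction: number = JITTER_FRACTION,
  random: () => number = Math.random,
): number {
  const spread = base * fraction;
  return Math.max(1, Math.round(base - spread + random() * 2 * spread));
}

// Call this on the success path of the primary fetch, never only on the fallback path.
async function writeSnapshots(
  store: SnapshotStore,
  entities: CacheableProduct[],
  random: () => number = Math.random,
): Promise<void> {
  const writes: Promise<unknown>[] = [];
  for (const entity of entities) {
    const payload = JSON.stringify(entity);
    for (const key of [entity.id, entity.legacyId]) {
      if (!key) continue; // legacyId is optional
      const ttl = jitteredTtl(BASE_TTL_SECONDS, JITTER_FRACTION, random);
      writes.push(store.set(`entity:product:${key}`, payload, 'EX', ttl));
    }
  }
  await Promise.all(writes);
}
```

One way to wire this in is to call writeSnapshots after the primary fetch resolves and log, rather than propagate, any cache write failure, so a Redis problem never fails an otherwise healthy read.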

Verification Steps

Confirm the SDL composes with the secondary key:

rover subgraph check "$APOLLO_GRAPH_REF" --name catalog --schema ./catalog/schema.graphql

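Then send a synthetic representation directly to the catalog subgraph for a product whose backing row lacks a non-key field, and verify the default is injected rather than a null collapsing the parent. Use _entities for this: a plain product(id: ...) query goes through the Query resolver and never exercises __resolveReference.

query VerifyFallback($reps: [_Any!]!) {
  _entities(representations: $reps) { ... on Product { id sku warehouseId } }
}
# variables: { "reps": [{ "__typename": "Product", "id": "prod_404" }] }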

A healthy fallback returns a complete object, e.g. { "id": "prod_404", "sku": "LEGACY-UNKNOWN", "warehouseId": "wh-default" }, with no top-level errors. To exercise the stale-cache path, trip the breaker (force the database fetch to time out) and confirm the resolver returns the cached snapshot instead of null.

Common Mistakes & Gotchas

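Injecting a wrong-typed default. Returning a non-numeric string such as "N/A" for an Int! field gets past a loosely typed resolver but fails serialization with a coercion error, which nulls the field and every non-nullable parent above it. The reverse can fail silently: graphql-js-based servers coerce a numeric 0 returned for a String! field into "0", so clients receive a meaningless value instead of an error. Match the SDL scalar exactly: a string placeholder for String!, a sane numeric default for Int!.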
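One way to make a wrong-typed default hard to write is to derive the defaults type from the entity type, so the compiler rejects a mismatch before it reaches the schema. The sketch below assumes the Product shape from the walkthrough; injectDefaults and the helper types are hypothetical names. It also returns the list of fields it had to fill, which can feed the partial-data logging recommended in this section.

```typescript
interface Product { id: string; legacyId?: string; sku: string; name: string; warehouseId: string }

type DefaultableField = 'sku' | 'warehouseId';
// Defaults must have exactly the types the entity declares for those fields.
type ProductDefaults = Pick<Product, DefaultableField>;
// A degraded row: the key is present, everything else may be missing.
type PartialRow = Pick<Product, 'id'> & Partial<Omit<Product, 'id'>>;

function injectDefaults(row: PartialRow, defaults: ProductDefaults) {
  const defaulted: DefaultableField[] = [];
  const entity = { ...row } as PartialRow & ProductDefaults;
  for (const field of Object.keys(defaults) as DefaultableField[]) {
    if (row[field] == null) {
      entity[field] = defaults[field];
      defaulted.push(field); // record what was filled so the incident can be logged
    }
  }
  return { entity, defaulted };
}

// Does not compile: number is not assignable to string.
// injectDefaults({ id: 'prod_404' }, { sku: 0, warehouseId: 'wh-default' });
```

The type check only covers fields listed in DefaultableField. Here name has no default, so a row missing name would still need separate handling.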

Ignoring @requires dependencies in the fallback. If a downstream field uses @requires, a fallback entity that omits the required source fields will make that downstream resolver fail. Populate the fields named in any @requires contract, even in the degraded path.

Synchronous or unbatched fallback fetches. A blocking cache read or a per-key fallback negates the performance work upstream and inflates tail latency. Keep the fallback asynchronous and batched, and always log partial-data incidents so the underlying contract violation gets fixed at the source.

Frequently Asked Questions

How does federation handle partial entity payloads during query planning?

The router expects each referenced entity to resolve per its @key. If a subgraph returns null for a required key or non-nullable field, that null propagates up and collapses the parent object. Fallback strategies intercept this before the merge by supplying an alternative identifier, a cached snapshot, or a typed default.

Can I use @requires with fallback resolvers?

Yes, but the fallback must still satisfy the @requires contract. If the required source fields are absent from the degraded payload, fetch them from a secondary source or inject defaults of the correct scalar type — otherwise the dependent resolver fails downstream.

Should a fallback return partial data or block the query?

Return a complete, type-safe shape using cached or default values. Blocking the operation on partial data degrades the client experience and defeats federation’s fault tolerance. Always log the incident so the source subgraph’s contract violation is remediated.
