Validating Custom Scalar Inputs Across Subgraphs

A custom scalar is only as trustworthy as its weakest implementation, and in a federated graph the same scalar name can be defined independently in every subgraph that uses it. This page shows how to make serialize, parseValue, and parseLiteral behave identically everywhere, so invalid input is rejected the same way regardless of which subgraph receives it.

The failure mode is subtle. Federation composes a scalar by name: if two subgraphs both declare scalar EmailAddress, the supergraph exposes one EmailAddress, but each subgraph keeps its own validation code. One subgraph might lowercase and trim, another might accept anything that contains an @, a third might reject the empty string. Clients then see a scalar that means different things on different paths — the same value accepted here and rejected there. The fix is to centralise the scalar’s behaviour in a shared library and import the identical GraphQLScalarType into every subgraph. If you have not yet shared a scalar across services, start with Sharing Custom Scalars Across Multiple Subgraphs, then return here for the validation discipline. The parent guide, Custom Scalars in Federated GraphQL Schemas, covers the broader serialization concerns.
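The divergence described above can be made concrete with a small, dependency-free sketch. The three validators and the names subgraphA, subgraphB, and subgraphC are hypothetical stand-ins for independently written scalar code; they are not taken from any real service.

```typescript
// Three hypothetical per-subgraph validators, each plausible in isolation.
type Validator = (value: string) => string;

// Subgraph A: trims and lowercases, then requires a basic email shape.
const subgraphA: Validator = (value) => {
  const canonical = value.trim().toLowerCase();
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(canonical)) {
    throw new Error("invalid email");
  }
  return canonical;
};

// Subgraph B: accepts anything that contains an "@", unchanged.
const subgraphB: Validator = (value) => {
  if (!value.includes("@")) throw new Error("invalid email");
  return value;
};

// Subgraph C: only rejects the empty string.
const subgraphC: Validator = (value) => {
  if (value === "") throw new Error("invalid email");
  return value;
};

const validators: Array<[string, Validator]> = [
  ["subgraphA", subgraphA],
  ["subgraphB", subgraphB],
  ["subgraphC", subgraphC],
];

// Run one input through every validator and record what each path does.
function probe(value: string): Record<string, string> {
  const outcome: Record<string, string> = {};
  for (const [label, validate] of validators) {
    try {
      outcome[label] = `accepted as ${JSON.stringify(validate(value))}`;
    } catch {
      outcome[label] = "rejected";
    }
  }
  return outcome;
}

console.log(probe(" Foo@Bar.COM ")); // A normalises; B and C keep the raw value
console.log(probe("not-an-email"));  // A and B reject; C accepts
```

All three declarations would compose under one scalar name, yet the same input yields three different results depending on the path it takes.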

When to use this pattern

Use it when a custom scalar accepts input (appears in arguments or input types) in more than one subgraph, because every entry point must reject malformed values identically.
Use it when the scalar carries semantic constraints beyond its wire type — an email, a positive money amount, an ISO-8601 instant, a bounded enum-like string — where “is it a string” is not the same as “is it valid”.
Skip a shared library only for an output-only scalar used by a single subgraph, where there is no input to validate and no second definition to diverge.

Prerequisites

Apollo Federation v2 subgraphs built with @apollo/subgraph (buildSubgraphSchema, ^2.7). The walkthrough links federation v2.9, so the installed release must support that spec version.
graphql ^16 (provides GraphQLScalarType, GraphQLError, Kind).
A shared internal package (e.g. @acme/graphql-scalars) that every subgraph can depend on at a pinned version.
Agreement on the scalar’s canonical form — the single normalised representation parseValue and parseLiteral both produce.
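One property worth agreeing on alongside the canonical form is that it is stable: because serialize re-validates values that the input functions have already normalised, normalising a canonical value again should change nothing. The sketch below assumes the trim-then-lowercase rule used on this page; canonicaliseEmail and isIdempotent are illustrative names, not part of any library.

```typescript
// Canonicalisation rule assumed from this page: trim, then lowercase.
function canonicaliseEmail(raw: string): string {
  return raw.trim().toLowerCase();
}

// A canonical form is stable: normalising it a second time changes nothing.
function isIdempotent(
  normalise: (s: string) => string,
  inputs: string[],
): boolean {
  return inputs.every((s) => normalise(normalise(s)) === normalise(s));
}

const emailInputs = [" Foo@Bar.COM ", "a@b.co", "\tMixed.Case@Example.org\n"];
console.log(isIdempotent(canonicaliseEmail, emailInputs)); // true

// Counter-example: stripping a single leading space is not stable, because
// "  x" becomes " x" on the first pass and "x" on the second.
const stripOneSpace = (s: string): string => s.replace(/^ /, "");
console.log(isIdempotent(stripOneSpace, ["  x"])); // false
```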

The three coordinates that must agree

A GraphQLScalarType defines three functions, and divergence in any one breaks uniformity:

serialize(internal) runs on the way out, converting your internal value to a JSON-safe response value. It should also defend against an internal value that was never validated (data written before validation existed, for instance).
parseValue(jsonInput) runs on the way in for values supplied through query variables. This is the most common input path for clients.
parseLiteral(astNode) runs on the way in for values written inline in the query document. It receives an AST node, not a plain value, so it must check the node kind before reading it.

The two input paths — variables and inline literals — must apply the same validation and produce the same canonical output. A frequent bug is validating in parseValue but not parseLiteral, so inline literals slip past the check. Route both through one shared validator.
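A parity check is one way to catch the parseValue/parseLiteral gap before it ships. The sketch below is dependency-free: ScalarLike and LiteralNode are simplified structural stand-ins for the graphql package's types, the "StringValue" kind mirrors the value of Kind.STRING, and findParityGaps is a hypothetical helper name.

```typescript
// Minimal structural stand-ins; the graphql package supplies richer types.
interface LiteralNode { kind: string; value?: string }
interface ScalarLike {
  parseValue(input: unknown): unknown;
  parseLiteral(ast: LiteralNode): unknown;
}

type Outcome = { ok: true; value: unknown } | { ok: false; message: string };

// Capture either the returned value or the thrown message.
function attempt(fn: () => unknown): Outcome {
  try {
    return { ok: true, value: fn() };
  } catch (err) {
    return { ok: false, message: err instanceof Error ? err.message : String(err) };
  }
}

// Returns the inputs for which the variable and literal paths disagree.
function findParityGaps(scalar: ScalarLike, inputs: string[]): string[] {
  return inputs.filter((input) => {
    const viaVariable = attempt(() => scalar.parseValue(input));
    const viaLiteral = attempt(() =>
      scalar.parseLiteral({ kind: "StringValue", value: input }),
    );
    return JSON.stringify(viaVariable) !== JSON.stringify(viaLiteral);
  });
}

// Deliberately simplified rule, just enough to show the two paths diverging.
function validate(value: unknown): string {
  if (typeof value !== "string" || !value.includes("@")) throw new Error("invalid");
  return value.trim().toLowerCase();
}

// Buggy: parseLiteral returns the raw literal without validating it.
const drifting: ScalarLike = {
  parseValue: (input) => validate(input),
  parseLiteral: (ast) => ast.value,
};

// Fixed: both paths delegate to the same validator.
const uniform: ScalarLike = {
  parseValue: (input) => validate(input),
  parseLiteral: (ast) => validate(ast.value),
};

const parityInputs = ["NOT AN EMAIL", " Foo@Bar.COM ", "ok@example.com"];
console.log(findParityGaps(drifting, parityInputs)); // [ 'NOT AN EMAIL', ' Foo@Bar.COM ' ]
console.log(findParityGaps(uniform, parityInputs));  // []
```

Run against the shared scalar in the package's own test suite, a check like this fails as soon as one input path stops delegating to the common validator.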

Implementation walkthrough

Define the scalar once in the shared package. A single validate function holds the rules; all three coordinates call it so they can never drift. parseLiteral first rejects any AST node that is not a string, then delegates to the same validator as parseValue. Every rejection throws a GraphQLError, which the framework surfaces as a structured input error rather than a 500.

// @acme/graphql-scalars/src/email-address.ts
import { GraphQLScalarType, GraphQLError, Kind, ValueNode } from "graphql";

const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Single source of truth. Returns the canonical (trimmed, lowercased) form
// or throws. Every coordinate of the scalar funnels through this.
function validateEmail(value: unknown): string {
  if (typeof value !== "string") {
    throw new GraphQLError(`EmailAddress must be a string, got ${typeof value}`);
  }
  const canonical = value.trim().toLowerCase();
  if (!EMAIL_RE.test(canonical)) {
    throw new GraphQLError(`EmailAddress is not a valid email: "${value}"`);
  }
  return canonical;
}

export const EmailAddress = new GraphQLScalarType<string, string>({
  name: "EmailAddress",
  description: "RFC-shaped email address, normalised to trimmed lowercase.",

  // OUT: validate even on serialize so legacy/unvalidated internal values
  // cannot leak a malformed address into a response.
  serialize(internal): string {
    return validateEmail(internal);
  },

  // IN via variables: same validator, same canonical output.
  parseValue(input): string {
    return validateEmail(input);
  },

  // IN via inline literals: check the AST kind first, then the SAME validator,
  // so literals and variables can never diverge.
  parseLiteral(ast: ValueNode): string {
    if (ast.kind !== Kind.STRING) {
      throw new GraphQLError(
        `EmailAddress must be a string literal, got ${ast.kind}`,
        { nodes: ast },
      );
    }
    return validateEmail(ast.value);
  },
});

Each subgraph imports that exact instance and wires it into both its SDL and its resolver map. The SDL declaration is just scalar EmailAddress; the behaviour comes entirely from the imported type.

// users subgraph
import { buildSubgraphSchema } from "@apollo/subgraph";
import gql from "graphql-tag";
import { EmailAddress } from "@acme/graphql-scalars";

const typeDefs = gql`
  extend schema
    @link(url: "https://specs.apollo.dev/federation/v2.9", import: ["@key"])

  scalar EmailAddress

  type User @key(fields: "id") {
    id: ID!
    email: EmailAddress!
  }

  type Mutation {
    inviteUser(email: EmailAddress!): User!   # input path -> parseValue/parseLiteral
  }
`;

export const schema = buildSubgraphSchema({
  typeDefs,
  // Map the scalar NAME to the shared instance. Do this in every subgraph
  // that declares `scalar EmailAddress` so validation is identical everywhere.
  resolvers: { EmailAddress },
});

Because every other subgraph (billing and notify in this example) imports the same EmailAddress from the same pinned package version, inviteUser(email: ...) in users, a billing contact field, and a notification recipient all reject "NOT AN EMAIL" with the same message and all normalise " Foo@Bar.COM " to foo@bar.com. There is no second implementation to drift.

Composition implications of divergent definitions

Federation will compose two subgraphs that both declare scalar EmailAddress even if their resolver code differs: composition matches scalars by name and does not inspect JavaScript. That is the trap: the supergraph looks healthy while validation silently diverges at runtime. Two guards keep this honest. First, never define the scalar’s logic inline per subgraph; import the shared instance so there is physically one implementation. Second, pin the shared package to one version across subgraphs and enforce that pin in CI. Run rover subgraph check in the same pipeline, but remember that it validates the schema only: it catches a removed or renamed scalar, not a subgraph that resolved an older package version, so the version comparison has to be its own build step. If a subgraph genuinely needs different rules, that is a different scalar and deserves a different name, not a quietly divergent EmailAddress.
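Composition cannot see which package version a subgraph resolved, so the pin has to be enforced by the build. A minimal sketch of such a guard follows; it assumes CI has already collected each subgraph's resolved version of the shared package (from its lockfile, for instance). The function name findVersionDrift and the version numbers are hypothetical.

```typescript
// Hypothetical input: subgraph name -> resolved version of the shared package.
type ResolvedVersions = Record<string, string>;

// Returns the subgraphs whose resolved version differs from the expected pin.
function findVersionDrift(expected: string, resolved: ResolvedVersions): string[] {
  return Object.keys(resolved)
    .filter((subgraph) => resolved[subgraph] !== expected)
    .sort();
}

// Illustrative data only; real values would be read from each lockfile.
const resolvedVersions: ResolvedVersions = {
  users: "3.2.0",
  billing: "3.2.0",
  notify: "3.1.4",
};

const drift = findVersionDrift("3.2.0", resolvedVersions);
if (drift.length > 0) {
  // In CI, exit non-zero at this point so the publish step is blocked.
  console.error(`@acme/graphql-scalars version drift in: ${drift.join(", ")}`);
}
```

Comparing exact resolved versions, rather than the semver ranges in package.json, is the stricter choice: two subgraphs can declare the same range and still install different releases.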

Verification steps

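Compose and check the subgraphs to confirm the scalar declaration composes cleanly into the supergraph. These commands validate the schema only; the runtime probes after them are what verify behaviour: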

rover supergraph compose --config ./supergraph.yaml > supergraph.graphql
rover subgraph check my-graph@current \
  --schema ./users/schema.graphql --name users

Then exercise both input paths against a running router. A valid value should be accepted and normalised; an invalid one should be rejected with the shared message — identically through a variable and through an inline literal.

# Inline literal path -> parseLiteral
mutation { inviteUser(email: "NOT AN EMAIL") { id } }

{
  "errors": [
    { "message": "EmailAddress is not a valid email: \"NOT AN EMAIL\"",
      "extensions": { "code": "BAD_USER_INPUT" } }
  ]
}

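Repeat the same value via a variable (mutation($e: EmailAddress!){ inviteUser(email:$e){id} }) and confirm it is rejected for the same reason. graphql-js reports variable coercion failures with a prefix that names the variable (Variable "$e" got invalid value ...), so compare the trailing reason rather than the whole message, and expect the extensions code to depend on your server. Then send " Foo@Bar.COM " and confirm the resolver receives foo@bar.com, proving normalisation is shared. Run the same two probes against the billing and notify subgraphs; matching responses prove uniform validation.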
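The two probes can be generated from one value so that the literal and variable paths are always exercised with identical input. This is a sketch only: literalProbe, variableProbe, and sameReason are hypothetical helpers, and using JSON.stringify to quote the inline literal is an assumption that holds for simple strings like the ones on this page.

```typescript
interface GraphQLRequestBody {
  query: string;
  variables?: Record<string, unknown>;
}

// Inline literal path: the value is written into the document itself,
// so the subgraph handles it in parseLiteral.
function literalProbe(email: string): GraphQLRequestBody {
  return {
    query: `mutation { inviteUser(email: ${JSON.stringify(email)}) { id } }`,
  };
}

// Variable path: the value travels in the variables map,
// so the subgraph handles it in parseValue.
function variableProbe(email: string): GraphQLRequestBody {
  return {
    query: "mutation ($e: EmailAddress!) { inviteUser(email: $e) { id } }",
    variables: { e: email },
  };
}

// Two error messages match if both end with the shared validator's reason,
// which tolerates a server-added prefix on the variable path.
function sameReason(literalMsg: string, variableMsg: string, reason: string): boolean {
  return literalMsg.endsWith(reason) && variableMsg.endsWith(reason);
}

console.log(JSON.stringify(literalProbe("NOT AN EMAIL")));
console.log(JSON.stringify(variableProbe("NOT AN EMAIL")));
```

Each body can be sent as the JSON payload of a POST to the router's GraphQL endpoint, and the first error message from each response passed to sameReason.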

Common mistakes & gotchas

Validating in parseValue but not parseLiteral. Inline literals then bypass validation entirely. Always route both through one validate function and check the AST kind in parseLiteral before reading the value.

Re-implementing the scalar per subgraph instead of importing it. Copy-pasted scalar code drifts the moment one team tweaks a regex. Composition will not catch it because it matches by name. Import a single pinned instance from a shared package.

Throwing a plain Error instead of GraphQLError. A plain Error is wrapped by the framework and can surface as an opaque INTERNAL_SERVER_ERROR, particularly on the serialize path; a GraphQLError carries your message through unchanged and produces a clean input-validation error. Always throw GraphQLError from the scalar.

Frequently Asked Questions

Will federation reject subgraphs whose custom scalar logic differs?

No. Composition matches scalars by name and never inspects the resolver implementation, so two subgraphs with the same scalar EmailAddress declaration compose cleanly even when their validation diverges. That is precisely why you import one shared instance and pin its version — the safety has to come from your build, not from composition.

Should serialize validate too, or only the input functions?

Validate in serialize as well. It defends against internal values that predate the scalar or were written by a path that skipped validation, so a malformed address cannot leak into a response. The cost is one regex test on output, which is negligible against the risk of emitting bad data.

How do I evolve the scalar’s rules without breaking subgraphs?

Change the rule in the shared package, bump its version, and roll it out to every subgraph together, gating the rollout with rover subgraph check. If only some consumers can adopt the stricter rule, that signals two distinct scalars; give the new one its own name rather than letting EmailAddress mean different things on different paths.
