Add operation normalizer v2#97

Merged
swalkinshaw merged 12 commits into main from add-operation-normalizer-v2 on Mar 27, 2026
Conversation

@swalkinshaw swalkinshaw (Contributor) commented Mar 25, 2026

Successor to #92; this implements a more "aggressive" static normalization algorithm for producing stable observability signatures.

The normalizer now produces canonical structural representations by:

  • expanding named fragments into inline fragments (merging spread + definition directives)
  • removing all aliases
  • dropping all argument values
  • flattening bare inline fragments (no type condition, no directives)
  • deep merging inline fragments with matching type condition and directive set
  • dropping variable definitions and operation names
  • sorting fields first (alphabetically), then inline fragments (by type condition, then directives)

note: the normalized operation document is not required to be valid GraphQL
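As a sketch of the ordering rule only (a toy selection type with illustrative names; the crate's real IR types and comparator differ): fields sort before inline fragments, and each group sorts alphabetically.

```rust
use std::cmp::Ordering;

#[derive(Debug, PartialEq, Eq)]
enum Sel {
    Field(&'static str),          // field name
    InlineFragment(&'static str), // type condition
}

// Fields first (alphabetical), then inline fragments (by type condition).
fn sort_selections(sels: &mut [Sel]) {
    sels.sort_by(|a, b| match (a, b) {
        (Sel::Field(_), Sel::InlineFragment(_)) => Ordering::Less,
        (Sel::InlineFragment(_), Sel::Field(_)) => Ordering::Greater,
        (Sel::Field(x), Sel::Field(y)) => x.cmp(y),
        (Sel::InlineFragment(x), Sel::InlineFragment(y)) => x.cmp(y),
    });
}

fn main() {
    let mut sels = vec![
        Sel::InlineFragment("B"),
        Sel::Field("z"),
        Sel::InlineFragment("A"),
        Sel::Field("a"),
    ];
    sort_selections(&mut sels);
    println!("{:?}", sels);
}
```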

The architecture changes from single-pass visit-and-write to a 3-phase pipeline: build IR → normalize → serialize. The IR holds borrowed references into the original AST with owned structural nesting.
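A toy illustration of the 3-phase shape, under heavy simplifying assumptions (single flat selection level, invented helper names — this is not the crate's API): the IR borrows `&str` slices from the original document while owning the `Vec` nesting.

```rust
// Hypothetical miniature of build IR -> normalize -> serialize.
struct Node<'a> {
    name: &'a str,           // borrowed from the original document text
    children: Vec<Node<'a>>, // owned structural nesting
}

// Phase 1: build a borrowed IR (toy parser: one flat level of fields).
fn build(src: &str) -> Node<'_> {
    let inner = src.trim_start_matches("query{").trim_end_matches('}');
    Node {
        name: "query",
        children: inner
            .split_whitespace()
            .map(|f| Node { name: f, children: Vec::new() })
            .collect(),
    }
}

// Phase 2: normalize in place (here: just the alphabetical field sort).
fn normalize(node: &mut Node<'_>) {
    node.children.sort_unstable_by(|a, b| a.name.cmp(b.name));
}

// Phase 3: serialize the canonical form.
fn serialize(node: &Node<'_>) -> String {
    let fields: Vec<&str> = node.children.iter().map(|c| c.name).collect();
    format!("{}{{{}}}", node.name, fields.join(" "))
}

fn main() {
    let doc = String::from("query{c a b}");
    let mut ir = build(&doc);
    normalize(&mut ir);
    println!("{}", serialize(&ir));
}
```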

Since the normalization design is a little complex, I've purposely included a lot of comments and inline examples within the code.

Performance: I've already done an /autoresearch pass on this and included the performance optimizations here. See 6843650 for details.

Running benches/normalize.rs

normalize_small                time:   [97.066 ns 97.297 ns 97.599 ns]
signature_small                time:   [212.54 ns 213.34 ns 214.14 ns]
normalize_medium               time:   [443.75 ns 445.53 ns 447.49 ns]
signature_medium               time:   [676.01 ns 677.86 ns 679.72 ns]
normalize_complex              time:   [1.9462 µs 1.9524 µs 1.9595 µs]
signature_complex              time:   [2.5373 µs 2.5479 µs 2.5625 µs]
normalize_fragment_colocation  time:   [2.9908 µs 3.0127 µs 3.0365 µs]
signature_fragment_colocation  time:   [4.0905 µs 4.1078 µs 4.1279 µs]
normalize_wide_reverse         time:   [1.0499 µs 1.0536 µs 1.0577 µs]
signature_wide_reverse         time:   [1.4475 µs 1.4502 µs 1.4542 µs]

@ravangen ravangen (Contributor) left a comment


My Rust skills are nonexistent, but leaving some comments around GraphQL aspects

swalkinshaw and others added 6 commits March 25, 2026 17:29
Replace Apollo-style document signatures with a more aggressive
normalization algorithm for observability signatures. The normalizer
now produces canonical structural representations by:

- Expanding named fragments into inline fragments (merging spread +
  definition directives)
- Removing all aliases
- Replacing all argument values with a single placeholder ($_)
- Flattening bare inline fragments (no type condition, no directives)
- Deep merging inline fragments with matching type condition and
  directive set
- Dropping variable definitions and operation names
- Sorting fields first (alphabetically), then inline fragments
  (by type condition, then directives)

Architecture changes from single-pass visit-and-write to a 3-phase
pipeline: build IR → normalize → serialize. The IR holds borrowed
references into the original AST with owned structural nesting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Key changes:
- Add bumpalo arena allocator for all IR allocations
- Replace HashMap/HashSet with Vec for fragment lookup and cycle detection
- Single-pass build+normalize (merge flatten/sort into build, remove normalize_ir.rs)
- Use swap_remove instead of remove in inline fragment merge (O(1) vs O(n))
- Skip re-sort of merged IF children when no merge occurred
- Replace write! macro with direct push_str in serializer
Add crate-level doc comment to lib.rs describing the full normalization
algorithm in numbered steps (1-4). Each module and key function references
back to the specific algorithm step it implements (e.g. 'step 2a', 'step 2d-2e')
so the high-level algorithm can be traced through the code.
Each algorithm step now shows a concrete input → normalized example
so the transformations are immediately obvious without reading the code.
@swalkinshaw swalkinshaw force-pushed the add-operation-normalizer-v2 branch from ea89296 to 9f6c997 on March 25, 2026 22:42
@swalkinshaw swalkinshaw (Contributor, Author) commented:

@ravangen updated those tests and did another pass on them to add ~7 more edge cases

@swalkinshaw swalkinshaw force-pushed the add-operation-normalizer-v2 branch from 9f6c997 to a6f790b on March 25, 2026 22:47
@ravangen ravangen (Contributor) left a comment


Couple more thoughts. I'm excited for this!

#[test]
fn duplicate_fields_preserved() {
    let doc = parse("{ a a a }");
    assert_eq!(normalize(&doc, None).unwrap(), "query{a a a}");
}
Contributor:

This one is interesting in that we could decide to further merge them into "query{a}" due to same name/arguments/directives, but I imagine it isn't worth the compute time.

swalkinshaw (Contributor, Author):

Yeah, it would get much more complex due to child selection merging, so I think this is fine and stable enough.

@Ravinlathigra:

Some AI feedback Mr Claude shared:

Correctness Issues

  1. swap_remove in merge breaks determinism (bug)

In build.rs lines ~781-789, normalize_in_place uses swap_remove to merge
inline fragments. swap_remove moves the last element into the removed
position, which changes the relative order of unprocessed elements. While the
final sort_unstable_by at the end of the function rescues the top-level order,
the merged children from swap_removed fragments are appended in a
non-deterministic order (depending on which fragments were at the end of the
vec). After merging, the children are re-sorted, so this is actually safe for
the final output. However, swap_remove could cause the loop to skip an
element: after swapping index j with the last element, the new element at j is
never checked for merging with i. This is a correctness bug — it could miss a
merge if there are 3+ fragments with the same (type_condition, directives).

  2. Transitive fragments don't merge — this may be intentional but worth considering

The test transitive_fragments_expanded shows:
query{...on T{a ...on T{b ...on T{c}}}}
Fragment A on T contains ...B (also on T), which contains ...C (also on T).
The result nests them rather than merging into a single ...on T{a b c}. This
happens because merging only occurs at a single level. Since all three share
the same type condition, a fully-normalized form could merge them. Whether
this matters depends on your use case — if two queries use different fragment
structures but resolve to the same fields on the same type, they'd get
different signatures. If the goal is purely "same structure = same signature",
this is a gap. If the goal is "same query text modulo cosmetics = same
signature", this is fine.

  3. Duplicate fields are preserved (intentional but worth noting)

{ a a a } normalizes to query{a a a}. This means a query selecting a once vs.
three times gets different signatures. For most GraphQL servers this
distinction is meaningless (field resolution is idempotent), but for your
"same footprint" goal this seems correct — the request structure IS different.

@swalkinshaw any risks you see here?

@swalkinshaw swalkinshaw (Contributor, Author) replied:

@Ravinlathigra

  1. Valid concern, but not a bug in practice. Added a test case (three_same_type_fragments_with_interleaved_other) to confirm this explicitly.
  2. We could theoretically flatten in that case, but it would require cross-level merging, which gets very complex. It's hard to draw a hard boundary for which normalizations we will and won't do; in some cases it's just too complex and not worth it, imo. Basically: normalize away the most common sources of differences with the least implementation complexity.
  3. Replied to @ravangen above with the same question; the answer above applies here too ^
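The swap_remove concern above comes down to loop-index bookkeeping. A minimal sketch of the pattern on toy data (invented names, not the crate's actual build.rs code): after `swap_remove(j)`, the last element is moved into slot j, so j only advances when nothing was removed — otherwise the swapped-in element would be skipped and a merge among 3+ matching entries could be missed.

```rust
// Merge entries that share a key, using O(1) swap_remove safely.
fn merge_by_key(items: &mut Vec<(u32, Vec<u32>)>) {
    let mut i = 0;
    while i < items.len() {
        let mut j = i + 1;
        while j < items.len() {
            if items[j].0 == items[i].0 {
                // O(1) removal: the last element is swapped into slot j,
                // so do NOT advance j here — re-examine slot j next pass.
                let (_, mut vals) = items.swap_remove(j);
                items[i].1.append(&mut vals);
            } else {
                j += 1;
            }
        }
        i += 1;
    }
}

fn main() {
    // Three entries with key 1, interleaved with key 2, all merge into one.
    let mut items = vec![(1u32, vec![1u32]), (2, vec![2]), (1, vec![10]), (1, vec![100])];
    merge_by_key(&mut items);
    println!("{:?}", items);
}
```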

swalkinshaw and others added 2 commits March 26, 2026 10:20
- Add arguments to more DashboardQuery benchmark fields for realism
- Add tests for 3+ inline fragment merging with swap_remove
- Add test for repeated directives on fields
- Extend variable renaming test with different types/defaults
- Replace assert! with assert_eq! throughout
- Inline let bindings into assert_eq! for directness

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace "semantically equivalent" framing with explicit description of
what produces the same signature and what intentionally does not (field
dedup, cross-level fragment merging).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@swalkinshaw swalkinshaw force-pushed the add-operation-normalizer-v2 branch from 66319e1 to c535fc7 on March 26, 2026 15:20
Argument values are now omitted entirely, keeping only `name:` per
argument. Aligns with the GraphQL ArgumentCoordinate spec format,
saves bytes, and removes a fake variable reference that added no
information.

Before: query{field(a:$_,b:$_)@dir(x:$_)}
After:  query{field(a:,b:)@dir(x:)}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@swalkinshaw swalkinshaw force-pushed the add-operation-normalizer-v2 branch from c8f53cc to 7f39e23 on March 26, 2026 16:52
@ravangen ravangen (Contributor) left a comment


The GraphQL aspects look good. Would appreciate a review from a Rust developer.

@Ravinlathigra Ravinlathigra left a comment


Thanks for tackling this Scott!

@adampetro adampetro (Contributor) left a comment


Very cool!

Did you want to add a line in the README with this new crate?

pub(crate) fn build_selections<'a, 'bump, E: ExecutableDocument + 'a>(
    selection_set: &'a E::SelectionSet,
    fragment_defs: &[(&'a str, &'a E::FragmentDefinition)],
    expanding: &mut Vec<&'a str>,
Contributor:


I'm guessing you benchmarked and vec is faster than HashSet given the list is usually small?

Contributor:


And similarly is that why this doesn't go through the bump allocator?

swalkinshaw (Contributor, Author):


Yep, exactly: the stack is tiny here, plus Vec matches the push/pop semantics better. That's also why it doesn't use the bump allocator: the stack is re-used and grows/shrinks, and more technically, the bump allocator can't free individual items.
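For context, a minimal sketch of the Vec-as-stack approach described here (hypothetical names and toy dependency data, not the crate's actual code): a linear `contains`-style check on the small expansion stack beats hashing, and push/pop mirrors the recursion depth exactly.

```rust
// Expand fragment names depth-first, skipping cycles via a Vec stack.
fn expand(
    name: &str,
    deps: &[(&str, Vec<&str>)], // fragment -> fragments it spreads
    expanding: &mut Vec<String>, // stack of fragments currently being expanded
    out: &mut Vec<String>,       // expansion order
) {
    if expanding.iter().any(|n| n == name) {
        return; // cycle detected: this fragment is already on the stack
    }
    expanding.push(name.to_string());
    out.push(name.to_string());
    if let Some((_, children)) = deps.iter().find(|(n, _)| *n == name) {
        for child in children {
            expand(child, deps, expanding, out);
        }
    }
    expanding.pop(); // done with this fragment; pop mirrors recursion exit
}

fn main() {
    // A spreads B, B spreads A: the cycle is broken instead of recursing forever.
    let deps = vec![("A", vec!["B"]), ("B", vec!["A"])];
    let mut expanding = Vec::new();
    let mut out = Vec::new();
    expand("A", &deps, &mut expanding, &mut out);
    println!("{:?}", out);
}
```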

// Step 2b: expand fragment spreads into inline fragments
SelectionReference::FragmentSpread(spread) => {
    let name = spread.name();
    // Cycle detection: skip if this fragment is already being expanded
Contributor:


is this needed for spec-compliant queries? or do we just have this for safety in case the query wasn't validated in advance?

swalkinshaw (Contributor, Author):


Yeah, this would be caught by static validation rather than parsing, so keeping it does make the normalizer a little more flexible. I'm not 100% sure how this will be implemented in our application yet. It might be bundled into the full "analyze" workflow, which would likely run static validation first, but it might be beneficial to compute and log the signature as early as possible (even for invalid documents), so I'm inclined to keep it.

Contributor:


Ideally we can derive this pre-static-validation and gracefully fail, or even get a hash without the document being fully compliant.

expanding.push(name);

let mut directives = build_directives::<false, E>(spread.directives(), bump);
directives.extend(build_directives::<false, E>(frag_def.directives(), bump));
Contributor:


is this because we have no way of knowing if the directives on the fragment definition have an impact on what gets spread so we're being conservative and including them?

swalkinshaw (Contributor, Author):


Being conservative here, yeah, which helps distinguish between ...F @skip and ...F, for example.

Comment on lines +141 to +143
if let NormalizedSelection::InlineFragment(inf) = removed {
if let NormalizedSelection::InlineFragment(ref mut target) =
selections[i]
Contributor:


just to confirm my understanding, we expect this to always be the case given how should_merge was evaluated?

swalkinshaw (Contributor, Author):


Good point, in fact we might as well encode this to make it unreachable!?

swalkinshaw (Contributor, Author):


Added. I think it simplifies things and makes the invariant clearer.
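For illustration, the invariant-encoding being discussed might look like this let-else shape (type and function names are hypothetical stand-ins, not the crate's actual definitions): since `should_merge` only pairs inline fragments, any other variant is genuinely unreachable.

```rust
#[derive(Debug, PartialEq)]
enum NormalizedSelection {
    Field(String),
    InlineFragment(Vec<String>), // children, simplified to plain strings
}

// Merge `removed`'s children into `target`; both are known to be
// inline fragments because should_merge already matched them.
fn merge_inline_fragments(target: &mut NormalizedSelection, removed: NormalizedSelection) {
    let NormalizedSelection::InlineFragment(children) = removed else {
        unreachable!("should_merge only pairs inline fragments");
    };
    let NormalizedSelection::InlineFragment(target_children) = target else {
        unreachable!("should_merge only pairs inline fragments");
    };
    target_children.extend(children);
}

fn main() {
    let mut target = NormalizedSelection::InlineFragment(vec!["a".to_string()]);
    let removed = NormalizedSelection::InlineFragment(vec!["b".to_string()]);
    merge_inline_fragments(&mut target, removed);
    println!("{:?}", target);
}
```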

Comment on lines +116 to +128
impl std::fmt::Display for SignatureError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::OperationNotFound(name) => write!(f, "operation not found: {name}"),
            Self::AmbiguousOperation => {
                write!(f, "multiple operations found; specify operation name")
            }
            Self::NoOperations => write!(f, "no operations in document"),
        }
    }
}

impl std::error::Error for SignatureError {}
Contributor:


Maybe using the thiserror crate would simplify this via its derive macro?

swalkinshaw (Contributor, Author):


I would if we already had the dependency 🤷‍♂️ It looks cool, but there are only 3 variants right now.

@swalkinshaw swalkinshaw merged commit 6e950c8 into main Mar 27, 2026
4 checks passed
@swalkinshaw swalkinshaw deleted the add-operation-normalizer-v2 branch March 27, 2026 17:21

4 participants