Conversation
ravangen
left a comment
There was a problem hiding this comment.
My Rust skills are nonexistent, but leaving some comments around GraphQL aspects
Replace Apollo-style document signatures with a more aggressive normalization algorithm for observability signatures. The normalizer now produces canonical structural representations by: - Expanding named fragments into inline fragments (merging spread + definition directives) - Removing all aliases - Replacing all argument values with a single placeholder ($\_) - Flattening bare inline fragments (no type condition, no directives) - Deep merging inline fragments with matching type condition and directive set - Dropping variable definitions and operation names - Sorting fields first (alphabetically), then inline fragments (by type condition, then directives) Architecture changes from single-pass visit-and-write to a 3-phase pipeline: build IR → normalize → serialize. The IR holds borrowed references into the original AST with owned structural nesting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Key changes: - Add bumpalo arena allocator for all IR allocations - Replace HashMap/HashSet with Vec for fragment lookup and cycle detection - Single-pass build+normalize (merge flatten/sort into build, remove normalize_ir.rs) - Use swap_remove instead of remove in inline fragment merge (O(1) vs O(n)) - Skip re-sort of merged IF children when no merge occurred - Replace write! macro with direct push_str in serializer
Add crate-level doc comment to lib.rs describing the full normalization algorithm in numbered steps (1-4). Each module and key function references back to the specific algorithm step it implements (e.g. 'step 2a', 'step 2d-2e') so the high-level algorithm can be traced through the code.
Each algorithm step now shows a concrete input → normalized example so the transformations are immediately obvious without reading the code.
ea89296 to
9f6c997
Compare
|
@ravangen updated those tests and do another pass on them to add about ~7 more edge cases |
9f6c997 to
a6f790b
Compare
ravangen
left a comment
There was a problem hiding this comment.
Couple more thoughts. I'm excited for this!
| #[test] | ||
| fn duplicate_fields_preserved() { | ||
| let doc = parse("{ a a a }"); | ||
| assert_eq!(normalize(&doc, None).unwrap(), "query{a a a}"); |
There was a problem hiding this comment.
This one is interesting in that we could decide to further merge them into "query{a}" due to same name/arguments/directives, but I imagine it isn't worth the compute time.
There was a problem hiding this comment.
Yeah it would get much more complex due to child selection merging so I think this is fine and stable enough
|
Some AI feedback Mr Claude shared --
In build.rs lines ~781-789, normalize_in_place uses swap_remove to merge
The test transitive_fragments_expanded shows:
{ a a a } normalizes to query{a a a}. This means a query selecting a once vs. @swalkinshaw any risks you see here? |
|
- Add arguments to more DashboardQuery benchmark fields for realism - Add tests for 3+ inline fragment merging with swap_remove - Add test for repeated directives on fields - Extend variable renaming test with different types/defaults - Replace assert! with assert_eq! throughout - Inline let bindings into assert_eq! for directness Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace "semantically equivalent" framing with explicit description of what produces the same signature and what intentionally does not (field dedup, cross-level fragment merging). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
66319e1 to
c535fc7
Compare
Argument values are now omitted entirely, keeping only `name:` per
argument. Aligns with the GraphQL ArgumentCoordinate spec format,
saves bytes, and removes a fake variable reference that added no
information.
Before: query{field(a:$_,b:$_)@dir(x:$_)}
After: query{field(a:,b:)@dir(x:)}
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c8f53cc to
7f39e23
Compare
ravangen
left a comment
There was a problem hiding this comment.
GraphQL aspect looks good. Would appreciate a review from a Rust developer
Ravinlathigra
left a comment
There was a problem hiding this comment.
Thanks for tackling this Scott!
adampetro
left a comment
There was a problem hiding this comment.
Very cool!
Did you want to add a line in the README with this new crate?
| pub(crate) fn build_selections<'a, 'bump, E: ExecutableDocument + 'a>( | ||
| selection_set: &'a E::SelectionSet, | ||
| fragment_defs: &[(&'a str, &'a E::FragmentDefinition)], | ||
| expanding: &mut Vec<&'a str>, |
There was a problem hiding this comment.
I'm guessing you benchmarked and vec is faster than HashSet given the list is usually small?
There was a problem hiding this comment.
And similarly is that why this doesn't go through the bump allocator?
There was a problem hiding this comment.
Yep exactly since the stack is tiny here, plus vec matches the push/pop semantics better. That's also why it doesn't use bump allocator because it's re-used and growing/shrinking. And more technically, the bump allocator can't free individual items.
| // Step 2b: expand fragment spreads into inline fragments | ||
| SelectionReference::FragmentSpread(spread) => { | ||
| let name = spread.name(); | ||
| // Cycle detection: skip if this fragment is already being expanded |
There was a problem hiding this comment.
is this needed for spec-compliant queries? or do we just have this for safety in case the query wasn't validated in advance?
There was a problem hiding this comment.
Yeah this would be caught by static validation and not parsing so it does make it a little more flexible. I'm not 100% sure how this will be implemented in our application yet. It might be bundled into the full "analyze" workflow which would likely run static validation first but it might be beneficial to compute and log the signature as early as possible (even for invalid documents) so I'm inclined to keep it.
There was a problem hiding this comment.
Ideally we can derive this pre static validation and gracefully fail or even get an hash without being fully compliant.
| expanding.push(name); | ||
|
|
||
| let mut directives = build_directives::<false, E>(spread.directives(), bump); | ||
| directives.extend(build_directives::<false, E>(frag_def.directives(), bump)); |
There was a problem hiding this comment.
is this because we have no way of knowing if the directives on the fragment definition have an impact on what gets spread so we're being conservative and including them?
There was a problem hiding this comment.
being conservative here yeah which helps to distinguish between ...F @skip and ...F for example
| if let NormalizedSelection::InlineFragment(inf) = removed { | ||
| if let NormalizedSelection::InlineFragment(ref mut target) = | ||
| selections[i] |
There was a problem hiding this comment.
just to confirm my understanding, we expect this to always be the case given how should_merge was evaluated?
There was a problem hiding this comment.
Good point, in fact we might as well encode this to make it unreachable!?
There was a problem hiding this comment.
Added. I think it simplifies and makes it clearer
| impl std::fmt::Display for SignatureError { | ||
| fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
| match self { | ||
| Self::OperationNotFound(name) => write!(f, "operation not found: {name}"), | ||
| Self::AmbiguousOperation => { | ||
| write!(f, "multiple operations found; specify operation name") | ||
| } | ||
| Self::NoOperations => write!(f, "no operations in document"), | ||
| } | ||
| } | ||
| } | ||
|
|
||
| impl std::error::Error for SignatureError {} |
There was a problem hiding this comment.
maybe using the thiserror crate would simplify doing this via macro?
There was a problem hiding this comment.
I would if we already had the dependency 🤷♂️ It looks cool, but only 3 variants right now
Successor to #92 which implements a more "aggressive" static normalization algorithm for the purpose of stable observability signatures.
The normalizer now produces canonical structural representations by:
note: the normalized operation document is not required to be valid GraphQL
Architecture changes from single-pass visit-and-write to a 3-phase pipeline: build IR → normalize → serialize. The IR holds borrowed references into the original AST with owned structural nesting.
Since the normalization design is a little complex, I've purposely included a lot of comments and inline examples within the code.
Performance: I've already done an
/autoresearchpass on this and included the performance optimizations here. See 6843650 for details.