Fix issue #1029: Support indexing through abstract types#1054
Fix issue #1029: Support indexing through abstract types#1054marcdaniels-toast wants to merge 22 commits intoblock:mainfrom
Conversation
…SON schema type enum When an abstract type (union or interface) has an index defined, its concrete subtypes should be indexable into that index. However, ElasticGraph was excluding those subtypes from the JSON schema type enum, causing validation errors when trying to index events for those types. Changes: 1. Modified `build_public_json_schema` in results.rb to include subtypes of indexed abstract types by using `recursively_resolve_subtypes` to flatten nested abstract types to their concrete implementations. 2. Added `indexed?` override in object_type.rb to recognize transitive indexing. A type is now considered indexed if it has its own index OR if it's a subtype of an indexed abstract type. 3. Fixed `self_update_target` in has_indices.rb to check `index_def.nil?` instead of `!indexed?`, ensuring update targets are only generated for types with direct indices. 4. Added 5 comprehensive test cases covering: - Union with mixed indexing (one subtype indexed, one not) - Interface with root_query_fields and non-indexed implementations - Deduplication when types have multiple indexing paths - Nested abstract types (unions containing interfaces) - Alphabetical ordering preservation This enables schemas where a union has an index and its subtypes can be indexed into that union's index, even if they don't have their own indices. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/json_schema_spec.rb
Outdated
Show resolved
Hide resolved
elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/json_schema_spec.rb
Outdated
Show resolved
Hide resolved
This commit reverts the changes to the `indexed?` method and adopts a more conservative fix that only modifies the JSON schema type enum generation. Changes: 1. Removed `indexed?` override in object_type.rb that made transitively indexed types return true for `indexed?`. This was causing unwanted root query fields to be generated for types that shouldn't have them. 2. Reverted has_indices.rb `self_update_target` to check `!indexed?` instead of `index_def.nil?`, restoring the original behavior. 3. Updated test case to not have transitively indexed types implement indexed interfaces, as the conservative approach enforces strict consistency: all subtypes of an indexed abstract type must either all be directly indexed or all be non-indexed. The core fix (transitive indexing in JSON schema type enum) remains in results.rb from the previous commit, enabling types to be indexed into their parent union/interface index even without their own index. With this conservative approach: - Transitively indexed types CAN appear in JSON schema type enum ✓ - Transitively indexed types CANNOT implement indexed interfaces ✓ - Root query fields only generated for directly indexed types ✓ All 1635 tests pass with 100% code and branch coverage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
myronmarston
left a comment
There was a problem hiding this comment.
This is coming along nicely!
I tried applying the diff from #1029, ran boot_locally and queried parts...and I got no results. So while it no longer errors, it doesn't quite work yet.
I'd also like us to include acceptance testing that this works end-to-end (so we don't have to do the kind of manual testing I did to discover it didn't work):
- Add some types to https://github.com/block/elasticgraph/tree/main/config/schema that are defined this way
- Add factories for them in https://github.com/block/elasticgraph/tree/main/spec_support/lib/elastic_graph/spec_support/factories
- Use those factories in the Rakefile so that
rake boot_locallyincludes some factory-generated data. - Add some tests to https://github.com/block/elasticgraph/blob/main/elasticgraph-graphql/spec/acceptance/search_spec.rb for this case so that our acceptance tests cover it end-to-end.
...and of course, fix any issues that process surfaces :).
Besides proving that it works end to end, it'll also be useful when it comes time to test the implementation of #1024 -- we want _typename filtering to work for this new case as well as the when each subtype has its own index.
elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/json_schema_spec.rb
Outdated
Show resolved
Hide resolved
elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/json_schema_spec.rb
Outdated
Show resolved
Hide resolved
elasticgraph-schema_definition/lib/elastic_graph/schema_definition/results.rb
Outdated
Show resolved
Hide resolved
elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/json_schema_spec.rb
Outdated
Show resolved
Hide resolved
elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/json_schema_spec.rb
Outdated
Show resolved
Hide resolved
elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/json_schema_spec.rb
Outdated
Show resolved
Hide resolved
Add resolved_index_def method that returns either a type's own index_def or its single parent abstract type's index_def. This ensures each concrete type resolves to at most one index: either directly declared or inherited from a parent union/interface. - Concrete type with t.index uses only its own index (parent ignored) - Concrete type without t.index uses parent's index (with __typename) - Validation error if type has no index and multiple indexed parents - Inherit routing/rollover config from resolved index Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changes: - Reordered tests: pure transitive indexing first (simpler concept) - Improved interface test to include nested interface inheritance - Removed redundant tests (deduplication, alphabetical ordering) - Removed redundant .sort from results.rb (sorting handled in event_envelope.rb) - Verified .uniq is properly tested by mixed indexing scenario Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…c_json_schema. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ting test. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove unnecessary fields and use type_named() in expectation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements comprehensive acceptance tests for transitive indexing feature where object types can be indexed through a parent interface without their own index definition. Adds Store interface with OnlineStore and PhysicalStore implementations, factories for test data generation, and acceptance tests covering querying, filtering, sorting, and pagination across mixed types. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes boot_locally by adding stores index definition to both development.yaml and development_with_apollo.yaml. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…resolved-index-def Fix transitive indexing, improve test coverage
myronmarston
left a comment
There was a problem hiding this comment.
Thanks @marcdaniels-toast. This has gotten really complicated! I'm not done with my review but I'm out of time for the day and want to submit what I have so far.
| t.field "established_on", "Date" | ||
|
|
||
| t.index "stores" do |i| | ||
| i.default_sort "id", :desc |
There was a problem hiding this comment.
| i.default_sort "id", :desc | |
| i.default_sort "established_on", :desc |
/nit id is always used as the tie breaker for sorting already. Usually it's most natural to sort descending by the date so that recent data shows up first.
| t.field "address", "String!" | ||
| t.field "square_footage", "Int" | ||
| t.field "established_on", "Date" | ||
| end |
There was a problem hiding this comment.
I think it might be worth having one of these subtypes define t.index. That way, it not only covers the fact that the index can be defined on the abstract type, but also that it can be overridden on a subtype.
| """ | ||
| Fetches `Store`s based on the provided arguments. | ||
| """ | ||
| stores( |
There was a problem hiding this comment.
It's interesting reading the produced GraphQL schema. I think what this generates (Query.stores and Query.store_aggregations) is correct with your current schema definition since there's only one stores index, but if we defined index on one of the subtypes (say, PhysicalStore) then we should also get Query.physical_stores/Query.physical_store_aggregations.
| test_widget_search_highlighting(widget1, widget2, widget3) | ||
| end | ||
|
|
||
| it "queries union types with transitive indexing", :expect_search_routing do |
There was a problem hiding this comment.
| it "queries union types with transitive indexing", :expect_search_routing do | |
| it "queries an abstract type that defines its own index", :expect_search_routing do |
Technically, you have an interface type here, not a union type. I don't think that distinction matters, though--we can just say "abstract types".
Also, I think "transitive indexing" could be a confusing term to future readers who don't have the context of this PR in their heads. "that defines its own index" is more clear, I think.
| # Subtypes without their own index should require __typename for mixed-type index resolution | ||
| expect(json_schema.dig("$defs", "MechanicalPart", "required")).to include("__typename") | ||
| expect(json_schema.dig("$defs", "ElectricalPart", "required")).to include("__typename") | ||
| end |
There was a problem hiding this comment.
This test makes a point to say that __typename is only required for subtypes that don't override the index. But it doesn't show any examples of subtypes that override index to demonstrate that those subtypes don't need __typename.
Can you update this example to cover that case as well? You could override index on one of the subtypes.
|
|
||
| required_field_names = required_fields.map(&:name) | ||
| # For types that are indexed into a parent union/interface's mixed-type index (an index shared | ||
| # with other concrete types), we need __typename to resolve the concrete type at query time, |
There was a problem hiding this comment.
While we need __typename at query time, that doesn't mean that we need it in the JSON schema. The ElasticGraphEventEnvelope already includes the typename in it as the type:
It feels duplicative to also require __typename within record (and in theory it could allow them to disagree!). I think we can instead solve this in elasticgraph-indexer via changes to record_preparer.rb -- that's the abstraction that "prepares" the record for indexing. It already has some logic related to __typename -- we could expand it to also handle this case.
There was a problem hiding this comment.
I think record_preparer.rb is a more appropriate place to handle this. It already has logic related to __typename handling and it'll simplify things to keep all the __typename logic in one spot:
|
|
||
| indexed_parents = schema_def_state.types_by_name.values.filter_map do |type| | ||
| next unless type.respond_to?(:abstract?) && type.abstract? | ||
| next unless type.respond_to?(:index_def) && type.index_def |
There was a problem hiding this comment.
obj.respond_to?(:some_method) && obj.some_method is a bit of a code smell in Ruby. (It's similar to using instanceof in java or kotlin--sometimes it has to be done but it's a good idea to consider if it can be avoided).
In this case, it can be avoided:
diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/has_indices.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/has_indices.rb
index 380ee11d..55be6314 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/has_indices.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/has_indices.rb
@@ -262,9 +262,9 @@ module ElasticGraph
def resolved_index_def
return index_def if index_def
- indexed_parents = schema_def_state.types_by_name.values.filter_map do |type|
- next unless type.respond_to?(:abstract?) && type.abstract?
- next unless type.respond_to?(:index_def) && type.index_def
+ indexed_parents = schema_def_state.object_types_by_name.values.filter_map do |type|
+ next unless type.abstract?
+ next unless type.index_def
type if type.recursively_resolve_subtypes.any? { |subtype| subtype.name == name }
endNote that object_types_by_name is a bit of a misnomer--it's not just types defined via object_type but also types defined via union_type and interface_type. (Put another way, schema.object_type defines a concrete object type, but union_type and interface_type define abstract object types). OTOH, types_by_name includes scalar types, enum types, etc, and we don't want to consider those here. Once we iterate over the right collection we no longer need to check respond_to?.
| # index_def (if defined), or the single parent abstract type's index_def (if this | ||
| # type is a subtype of exactly one indexed abstract type). Raises an error if this | ||
| # type has no index and is a subtype of multiple indexed abstract types. | ||
| def resolved_index_def |
There was a problem hiding this comment.
I'm concerned about the performance implications of this method. It essentially linearly scans the entire universe of types to try to find possible parents, and then for each possible parent it asks them to recursively resolve subtypes.
We should be able to implement a much more efficient way to handle this. And also, I think we can do this in a way that provides a nicer internal abstraction that makes it way simpler to implement this method.
I think we can follow the lead of recursively_resolve_subtypes and provide a recursively_resolve_supertypes API so that any time we need to get the supertypes we have an API to do so.
- For
union_type,recursively_resolve_supertypeswould be empty because union types can't themselves have any supertypes--they can't be a member of another type union or implement an interface (at least according to my read of the GraphQL spec). - For
interface_type, it can only be a member of a supertype by implementing an interface. The list of implemented interfaces is directly available viaMixins::ImplementsInterfaces#implemented_interfaces(no need to iterate over the full list of types onschema_def_state). - For
object_type, there are two ways for it to be a member of a supertype: it can implement an interface (and useMixins::ImplementsInterfaces#implemented_interfaces). In addition, it can be a member of a type union. In that case there's no direct list of types available because the relationship is defined on the union type. But we can implement a map that provides a direct lookup, using a technique we use elsewhere. Take a look at implementations_by_interface_ref -- this is used whenimplementsis called to provide a reverse lookup, which we then use inInterfaceType#resolve_subtypesto quickly return the list of subtypes with no need to iterate over the universe of types. We can do a similar thing by trackingunion_supertypes_by_subtype_ref.
I'm not yet sure if this logic should be defined with a new HasSupertypes mixin, or what but there's a seed of an idea here that would simplify things and make things more efficient.
There was a problem hiding this comment.
The direction this takes us results in an API that's misleading:
- Some types that return false from
indexed?wind up being indexed into a parent index definition. - It's not very clear from the names how
resolved_index_defandindex_defdiffer. - More generally, there's a lack of precision around what it means for a type to be indexed.
I think it would help things if we did a prep refactoring before this PR which just renamed things to pave the way for more intuitive, precise naming. Some ideas:
- The existing
index_defcould be renamed toown_index_defwhich more clearly communicates that it'snilwhen a type is inheriting an index definition from an abstract parent. indexed?could be split- For cases which only care about
own_index_def, we can offerhas_own_index_def? - For cases which care about whether the type winds up being a "root" document type vs a type embedded in other object types, we can offer
root_document_type?. For example,Mixins::HasSubtypes#indexed?currently is implemented according to theroot_document_type?semantic (it's true if it lives at the document root in OpenSearch), where some spots in the code that checkindexed?probably only care abouthas_own_index_def?.
- For cases which only care about
Resolving this naming ambiguity/confusion by preemptively renaming things in a new PR would help make this PR easier to reason about.
Fixes #1029
Summary
Enables object types to inherit the index configuration from their parent abstract type (union/interface) without requiring the object type's own index definition. Includes an end-to-end acceptance test demonstrating querying, filtering, sorting, and pagination across a mixed-type index.
Implementation
Core Changes:
Type Resolution (
has_indices.rb): Addedresolved_index_defmethod that allows types to inherit their parent abstract type's index. Validates that a type cannot be a subtype of multiple indexed abstract types.JSON Schema Generation (
results.rb): Modifiedbuild_public_json_schemato include concrete subtypes of indexed abstract types usingrecursively_resolve_subtypes, enabling validation of transitively indexed events.Index Uniqueness Validation (
schema.rb): Updatedindexed_document_types_by_index_definition_nameto allow subtypes to share their parent's index without raising duplicate index errors.__typename Handling:
object_type.rb): Adds__typenamefield definition in for types using transitive indexing (field is marked required in generated JSON schema)factory.rb): Injects__typenamevalue from event type into record at indexing time when required but missing, allowing clients to omit it from payloads while ensuring it's present in indexed documents for query-time type resolutionRuntime Metadata (
has_indices.rb): Updated to useresolved_index_deffor generating update targets and index routing.Testing
Unit tests verify core logic (JSON schema generation, type resolution,
__typenameinjection). Acceptance tests validate end-to-end functionality using a new Store interface with OnlineStore/PhysicalStore subtypes in the stock test schema.Schema Artifacts
config/schema/artifacts/andconfig/schema/artifacts_with_apollo/Testing
Unit Tests:
__typenameinjection at schema and indexing levelsAcceptance Tests: