Skip to content

Fix issue #1029: Support indexing through abstract types#1054

Draft
marcdaniels-toast wants to merge 22 commits intoblock:mainfrom
marcdaniels-toast:fix-transitive-indexing-issue-1029
Draft

Fix issue #1029: Support indexing through abstract types#1054
marcdaniels-toast wants to merge 22 commits intoblock:mainfrom
marcdaniels-toast:fix-transitive-indexing-issue-1029

Conversation

@marcdaniels-toast
Copy link
Contributor

@marcdaniels-toast marcdaniels-toast commented Feb 26, 2026

Fixes #1029

Summary

Enables object types to inherit the index configuration from their parent abstract type (union/interface) without requiring the object type's own index definition. Includes an end-to-end acceptance test demonstrating querying, filtering, sorting, and pagination across a mixed-type index.

Implementation

Core Changes:

  1. Type Resolution (has_indices.rb): Added resolved_index_def method that allows types to inherit their parent abstract type's index. Validates that a type cannot be a subtype of multiple indexed abstract types.

  2. JSON Schema Generation (results.rb): Modified build_public_json_schema to include concrete subtypes of indexed abstract types using recursively_resolve_subtypes, enabling validation of transitively indexed events.

  3. Index Uniqueness Validation (schema.rb): Updated indexed_document_types_by_index_definition_name to allow subtypes to share their parent's index without raising duplicate index errors.

  4. __typename Handling:

    • Schema (object_type.rb): Adds __typename field definition in for types using transitive indexing (field is marked required in generated JSON schema)
    • Indexer (factory.rb): Injects __typename value from event type into record at indexing time when required but missing, allowing clients to omit it from payloads while ensuring it's present in indexed documents for query-time type resolution
  5. Runtime Metadata (has_indices.rb): Updated to use resolved_index_def for generating update targets and index routing.

Testing

Unit tests verify core logic (JSON schema generation, type resolution, __typename injection). Acceptance tests validate end-to-end functionality using a new Store interface with OnlineStore/PhysicalStore subtypes in the stock test schema.

Schema Artifacts

  • Regenerated all artifacts in config/schema/artifacts/ and config/schema/artifacts_with_apollo/

Testing

Unit Tests:

  • JSON schema generation with mixed indexing scenarios
  • Type resolution and validation
  • __typename injection at schema and indexing levels
  • Nested abstract types (interfaces) and index name deduplication

Acceptance Tests:

  • Query all stores through interface
  • Filter by common fields across subtypes
  • Filter by ID with interface-level filtering
  • Sort by common fields across types
  • Cursor-based pagination on interface queries
  • NOT filter negation

…SON schema type enum

When an abstract type (union or interface) has an index defined, its
concrete subtypes should be indexable into that index. However, ElasticGraph
was excluding those subtypes from the JSON schema type enum, causing
validation errors when trying to index events for those types.

Changes:

1. Modified `build_public_json_schema` in results.rb to include subtypes
   of indexed abstract types by using `recursively_resolve_subtypes` to
   flatten nested abstract types to their concrete implementations.

2. Added `indexed?` override in object_type.rb to recognize transitive
   indexing. A type is now considered indexed if it has its own index OR
   if it's a subtype of an indexed abstract type.

3. Fixed `self_update_target` in has_indices.rb to check `index_def.nil?`
   instead of `!indexed?`, ensuring update targets are only generated for
   types with direct indices.

4. Added 5 comprehensive test cases covering:
   - Union with mixed indexing (one subtype indexed, one not)
   - Interface with root_query_fields and non-indexed implementations
   - Deduplication when types have multiple indexing paths
   - Nested abstract types (unions containing interfaces)
   - Alphabetical ordering preservation

This enables schemas where a union has an index and its subtypes can be
indexed into that union's index, even if they don't have their own indices.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@CLAassistant
Copy link

CLAassistant commented Feb 26, 2026

CLA assistant check
All committers have signed the CLA.

marcdaniels-toast and others added 2 commits February 26, 2026 20:02
This commit reverts the changes to the `indexed?` method and adopts a more
conservative fix that only modifies the JSON schema type enum generation.

Changes:
1. Removed `indexed?` override in object_type.rb that made transitively
   indexed types return true for `indexed?`. This was causing unwanted
   root query fields to be generated for types that shouldn't have them.

2. Reverted has_indices.rb `self_update_target` to check `!indexed?`
   instead of `index_def.nil?`, restoring the original behavior.

3. Updated test case to not have transitively indexed types implement
   indexed interfaces, as the conservative approach enforces strict
   consistency: all subtypes of an indexed abstract type must either
   all be directly indexed or all be non-indexed.

The core fix (transitive indexing in JSON schema type enum) remains
in results.rb from the previous commit, enabling types to be indexed
into their parent union/interface index even without their own index.

With this conservative approach:
- Transitively indexed types CAN appear in JSON schema type enum ✓
- Transitively indexed types CANNOT implement indexed interfaces ✓
- Root query fields only generated for directly indexed types ✓

All 1635 tests pass with 100% code and branch coverage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Collaborator

@myronmarston myronmarston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is coming along nicely!

I tried applying the diff from #1029, ran boot_locally and queried parts...and I got no results. So while it no longer errors, it doesn't quite work yet.

I'd also like us to include acceptance testing that this works end-to-end (so we don't have to do the kind of manual testing I did to discover it didn't work):

...and of course, fix any issues that process surfaces :).

Besides proving that it works end to end, it'll also be useful when it comes time to test the implementation of #1024 -- we want _typename filtering to work for this new case as well as the when each subtype has its own index.

@marcdaniels-toast marcdaniels-toast marked this pull request as draft February 27, 2026 13:26
marcdaniels-toast and others added 12 commits February 27, 2026 19:52
Add resolved_index_def method that returns either a type's own index_def
or its single parent abstract type's index_def. This ensures each concrete
type resolves to at most one index: either directly declared or inherited
from a parent union/interface.

- Concrete type with t.index uses only its own index (parent ignored)
- Concrete type without t.index uses parent's index (with __typename)
- Validation error if type has no index and multiple indexed parents
- Inherit routing/rollover config from resolved index

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changes:
- Reordered tests: pure transitive indexing first (simpler concept)
- Improved interface test to include nested interface inheritance
- Removed redundant tests (deduplication, alphabetical ordering)
- Removed redundant .sort from results.rb (sorting handled in event_envelope.rb)
- Verified .uniq is properly tested by mixed indexing scenario

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…c_json_schema.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ting test.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove unnecessary fields and use type_named() in expectation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
marcdaniels-toast and others added 6 commits March 1, 2026 11:35
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements comprehensive acceptance tests for transitive indexing feature
where object types can be indexed through a parent interface without their
own index definition. Adds Store interface with OnlineStore and PhysicalStore
implementations, factories for test data generation, and acceptance tests
covering querying, filtering, sorting, and pagination across mixed types.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes boot_locally by adding stores index definition to both
development.yaml and development_with_apollo.yaml.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…resolved-index-def

Fix transitive indexing, improve test coverage
@marcdaniels-toast marcdaniels-toast marked this pull request as ready for review March 2, 2026 13:39
@marcdaniels-toast marcdaniels-toast changed the title Fix issue #1029: Include subtypes of indexed abstract types in JSON schema type enum Fix issue #1029: Support transitive indexing through abstract types Mar 2, 2026
Copy link
Collaborator

@myronmarston myronmarston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @marcdaniels-toast. This has gotten really complicated! I'm not done with my review but I'm out of time for the day and want to submit what I have so far.

t.field "established_on", "Date"

t.index "stores" do |i|
i.default_sort "id", :desc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
i.default_sort "id", :desc
i.default_sort "established_on", :desc

/nit id is always used as the tie breaker for sorting already. Usually it's most natural to sort descending by the date so that recent data shows up first.

t.field "address", "String!"
t.field "square_footage", "Int"
t.field "established_on", "Date"
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be worth having one of these subtypes define t.index. That way, it not only covers the fact that the index can be defined on the abstract type, but also that it can be overridden on a subtype.

"""
Fetches `Store`s based on the provided arguments.
"""
stores(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting reading the produced GraphQL schema. I think what this generates (Query.stores and Query.store_aggregations) is correct with your current schema definition since there's only one stores index, but if we defined index on one of the subtypes (say, PhysicalStore) then we should also get Query.physical_stores/Query.physical_store_aggregations.

test_widget_search_highlighting(widget1, widget2, widget3)
end

it "queries union types with transitive indexing", :expect_search_routing do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
it "queries union types with transitive indexing", :expect_search_routing do
it "queries an abstract type that defines its own index", :expect_search_routing do

Technically, you have an interface type here, not a union type. I don't think that distinction matters, though--we can just say "abstract types".

Also, I think "transitive indexing" could be a confusing term to future readers who don't have the context of this PR in their heads. "that defines its own index" is more clear, I think.

# Subtypes without their own index should require __typename for mixed-type index resolution
expect(json_schema.dig("$defs", "MechanicalPart", "required")).to include("__typename")
expect(json_schema.dig("$defs", "ElectricalPart", "required")).to include("__typename")
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test makes a point to say that __typename is only required for subtypes that don't override the index. But it doesn't show any examples of subtypes that override index to demonstrate that those subtypes don't need __typename.

Can you update this example to cover that case as well? You could override index on one of the subtypes.


required_field_names = required_fields.map(&:name)
# For types that are indexed into a parent union/interface's mixed-type index (an index shared
# with other concrete types), we need __typename to resolve the concrete type at query time,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we need __typename at query time, that doesn't mean that we need it in the JSON schema. The ElasticGraphEventEnvelope already includes the typename in it as the type:

"type" => {
"description" => "The type of object present in `record`.",
"type" => "string",
# Sorting doesn't really matter here, but it's nice for the output in the schema artifact to be consistent.
"enum" => indexed_type_names.sort
},

It feels duplicative to also require __typename within record (and in theory it could allow them to disagree!). I think we can instead solve this in elasticgraph-indexer via changes to record_preparer.rb -- that's the abstraction that "prepares" the record for indexing. It already has some logic related to __typename -- we could expand it to also handle this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think record_preparer.rb is a more appropriate place to handle this. It already has logic related to __typename handling and it'll simplify things to keep all the __typename logic in one spot:

https://github.com/block/elasticgraph/blob/304731d5fe58839afbca0a710f4c10a522eea346/elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb


indexed_parents = schema_def_state.types_by_name.values.filter_map do |type|
next unless type.respond_to?(:abstract?) && type.abstract?
next unless type.respond_to?(:index_def) && type.index_def
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obj.respond_to?(:some_method) && obj.some_method is a bit of a code smell in Ruby. (It's similar to using instanceof in java or kotlin--sometimes it has to be done but it's a good idea to consider if it can be avoided).

In this case, it can be avoided:

diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/has_indices.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/has_indices.rb
index 380ee11d..55be6314 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/has_indices.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/has_indices.rb
@@ -262,9 +262,9 @@ module ElasticGraph
         def resolved_index_def
           return index_def if index_def
 
-          indexed_parents = schema_def_state.types_by_name.values.filter_map do |type|
-            next unless type.respond_to?(:abstract?) && type.abstract?
-            next unless type.respond_to?(:index_def) && type.index_def
+          indexed_parents = schema_def_state.object_types_by_name.values.filter_map do |type|
+            next unless type.abstract?
+            next unless type.index_def
 
             type if type.recursively_resolve_subtypes.any? { |subtype| subtype.name == name }
           end

Note that object_types_by_name is a bit of a misnomer--it's not just types defined via object_type but also types defined via union_type and interface_type. (Put another way, schema.object_type defines a concrete object type, but union_type and interface_type define abstract object types). OTOH, types_by_name includes scalar types, enum types, etc, and we don't want to consider those here. Once we iterate over the right collection we no longer need to check respond_to?.

# index_def (if defined), or the single parent abstract type's index_def (if this
# type is a subtype of exactly one indexed abstract type). Raises an error if this
# type has no index and is a subtype of multiple indexed abstract types.
def resolved_index_def
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about the performance implications of this method. It essentially linearly scans the entire universe of types to try to find possible parents, and then for each possible parent it asks them to recursively resolve subtypes.

We should be able to implement a much more efficient way to handle this. And also, I think we can do this in a way that provides a nicer internal abstraction that makes it way simpler to implement this method.

I think we can follow the lead of recursively_resolve_subtypes and provide a recursively_resolve_supertypes API so that any time we need to get the supertypes we have an API to do so.

  • For union_type, recursively_resolve_supertypes would be empty because union types can't themselves have any supertypes--they can't be a member of another type union or implement an interface (at least according to my read of the GraphQL spec).
  • For interface_type, it can only be a member of a supertype by implementing an interface. The list of implemented interfaces is directly available via Mixins::ImplementsInterfaces#implemented_interfaces (no need to iterate over the full list of types on schema_def_state).
  • For object_type, there are two ways for it to be a member of a supertype: it can implement an interface (and use Mixins::ImplementsInterfaces#implemented_interfaces). In addition, it can be a member of a type union. In that case there's no direct list of types available because the relationship is defined on the union type. But we can implement a map that provides a direct lookup, using a technique we use elsewhere. Take a look at implementations_by_interface_ref -- this is used when implements is called to provide a reverse lookup, which we then use in InterfaceType#resolve_subtypes to quickly return the list of subtypes with no need to iterate over the universe of types. We can do a similar thing by tracking union_supertypes_by_subtype_ref.

I'm not yet sure if this logic should be defined with a new HasSupertypes mixin, or what but there's a seed of an idea here that would simplify things and make things more efficient.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direction this takes us results in an API that's misleading:

  • Some types that return false from indexed? wind up being indexed into a parent index definition.
  • It's not very clear from the names how resolved_index_def and index_def differ.
  • More generally, there's a lack of precision around what it means for a type to be indexed.

I think it would help things if we did a prep refactoring before this PR which just renamed things to pave the way for more intuitive, precise naming. Some ideas:

  • The existing index_def could be renamed to own_index_def which more clearly communicates that it's nil when a type is inheriting an index definition from an abstract parent.
  • indexed? could be split
    • For cases which only care about own_index_def, we can offer has_own_index_def?
    • For cases which care about whether the type winds up being a "root" document type vs a type embedded in other object types, we can offer root_document_type?. For example, Mixins::HasSubtypes#indexed? currently is implemented according to the root_document_type? semantic (it's true if it lives at the document root in OpenSearch), where some spots in the code that check indexed? probably only care about has_own_index_def?.

Resolving this naming ambiguity/confusion by preemptively renaming things in a new PR would help make this PR easier to reason about.

@marcdaniels-toast marcdaniels-toast changed the title Fix issue #1029: Support transitive indexing through abstract types Fix issue #1029: Support indexing through abstract types Mar 3, 2026
@marcdaniels-toast marcdaniels-toast marked this pull request as draft March 3, 2026 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Defining an index on an abstract type doesn't work correctly

3 participants