Schema support for inherited indexes#1067
Schema support for inherited indexes#1067marcdaniels-toast wants to merge 2 commits intoblock:mainfrom
Conversation
Allow concrete types to inherit index definitions from parent abstract types (unions/interfaces). Types can now be indexed without defining their own index by inheriting from an indexed parent, enabling mixed-type indices where multiple types share a single datastore index. Key changes: - Add own_or_inherited_index_def() to resolve inherited indices from parent types - Add inherits_index?() predicate for types in mixed-type indices - Add recursively_resolve_supertypes() to traverse type hierarchy - Add requires_typename_for_mixed_index runtime metadata field - Inject __typename into data_params for mixed-type indices - Add comprehensive test coverage including transitive inheritance Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When an abstract type (union/interface) has an index, concrete subtypes inherit that index. The event envelope type enum should include these concrete types (which will actually be indexed) rather than just types with their own direct index definition. Changes: - Use own_or_inherited_index_def instead of root_document_type? to determine which types belong in the JSON schema type enum - Rename variable from root_document_type_names to indexed_type_names for clarity - Add test coverage for event envelope type enum with transitive indexing, including a mixed scenario where some subtypes inherit an index and others have their own index
myronmarston
left a comment
There was a problem hiding this comment.
I'm not done with my review (notably, I haven't reviewed the specs yet) but I wanted to submit what I have so far.
| # | ||
| # @return [Indexing::Index, nil] the index definition, or nil if this type has no index | ||
| # @raise [Errors::SchemaError] if this type is a subtype of multiple indexed abstract types | ||
| def own_or_inherited_index_def |
There was a problem hiding this comment.
Can we just call this index_def? I don't think the name needs to reveal the possible sources of where the index could be defined. To the caller, the fact that the index definition may be inherrited from the supertype should be an implementation detail not hinted at by the API, similar to an override fun in kotlin.
Also, part of the motivation for my prior suggestion to rename the old index_def to own_index_def was to free up index_def as the name we could use for this.
There was a problem hiding this comment.
Ah, I see. Makes sense!
| indexed_parents = recursively_resolve_supertypes.select do |supertype| | ||
| supertype.own_index_def | ||
| end |
There was a problem hiding this comment.
| indexed_parents = recursively_resolve_supertypes.select do |supertype| | |
| supertype.own_index_def | |
| end | |
| indexed_parents = recursively_resolve_supertypes.select(&:has_own_index_def?) |
- I like the readability/terseness of the
(&:method_name)form for this kind of thing. (And, IIRC, there are some optimizations in Ruby for it, where it avoids creating the full block context since it doesn't need it). - While it works to filter on
own_index_defdue to Ruby's truthiness semantics,has_own_index_def?is a bit more explicit and exists to provide a predicate we can use for cases like this.
There was a problem hiding this comment.
makes sense. I'll make this change.
| # @return [Boolean] true if this type inherits an index from a parent abstract type (union/interface). | ||
| # When true, the type may share an index with other types (a "mixed-type index"). | ||
| # @private | ||
| def inherits_index? |
There was a problem hiding this comment.
This method name feels a bit like a codesmell to me. In a statically typed language like kotlin, this would be a bit like:
abstract class Part {
val color: String = "white" // default color
abstract val inheritsColor: Boolean
}
class MechanicalPart {
override val color: String = "red"
override val inheritsColor: Boolean = false
}
class ElectricalPart {
override val inheritsColor: Boolean = true
}It doesn't feel right to expose whether or not the implementation of color is inherited. Likewise, it feels wrong to me to expose a predicate here indicating if the index is inherited.
WDYT about calling it needs_typename? instead?
There was a problem hiding this comment.
Yeah I agree with your suggestion to use needs_typename now that you point out this awkwardness. I think I had that initially but changed it because at the time I thought it was wrong that this schema API's method name indicated a concern of indexing (adding typename). But maybe the "needing typename" concept is more of a universal concept.
| # For types with `own_index_def`, returns true. For abstract types with indexed subtypes, overridden in {HasSubtypes}. | ||
| # @return [Boolean] true if this type is queryable at the root level of the GraphQL schema (i.e., has direct `Query` fields). | ||
| def root_document_type? | ||
| has_own_index_def? |
There was a problem hiding this comment.
I think this should be updated so that it returns true for a concrete type that inherits an index from an abstract supertype--such a type is still a root document type, right?
| has_own_index_def? | |
| own_or_inherited_index_def != nil |
...or this could be index_def != nil if you follow my suggested renaming on that method :).
There was a problem hiding this comment.
Here's where I landed on root_document_type? semantics:
Example schema:
interface_type "Store" do |t|
t.field "id", "ID!"
t.field "name", "String!"
t.index "stores"
end
object_type "OnlineStore" do |t|
t.implements "Store"
# Inherits "stores" index
end
object_type "PhysicalStore" do |t|
t.implements "Store"
t.index "physical_stores" # Own index
endI'm treating root_document_type? to specifically mean "has GraphQL Query fields," not "is indexed in the datastore." OnlineStore is indexed (via the inherited stores index) but is not queryable at the root level; it's only accessible through the polymorphic stores root document type. Different parts of the system need to distinguish between these concerns, which is why we have root_document_type? for Query field generation and own_or_inherited_index_def? (which I'll rename to has_index_def? per your suggestion above) for determining what's actually indexed.
This is why I simplified the doc for root_document_type? to this:
# @return [Boolean] true if this type is queryable at the root level of the GraphQL schema (i.e., has direct `Query` fields).
def root_document_type?If we changed root_document_type? to mean "is indexed", we'd need to add filtering logic in four places to distinguish types that should have Query fields from types that shouldn't:
- Main Query field generation
- Sort order enum generation
- Apollo
@keydirective - Apollo _entities field exposure
The current semantic keeps Query-related logic simple (just check root_document_type?) at the cost of slightly more complex JSON schema type enum logic (one place). The alternative would be simple JSON schema logic but complex Query logic in multiple places.
There was a problem hiding this comment.
I'm treating
root_document_type?to specifically mean "has GraphQL Query fields," not "is indexed in the datastore."
Ah, the root (no pun intended) of our misunderstanding is that I haven't been understanding root_document_type? in that way (even though you documented it to mean that--somehow my brain skipped over your docs...). I've been understanding root_document_type? to mean: does this type live at the root of an Elasticsearch/OpenSearch index?
It's worth noting that the term "root document" shows up in the Elasticsearch docs:
Effectively this aggregation can break out of the nested block structure and link to other nested structures or the root document...
...and in the OpenSearch docs:
A Boolean value that specifies whether all fields in the child nested object should also be added to the root document in flattened form.
...whereas it does not show up anywhere in the GraphQL spec.
All that's to say: if we're going to have a method named root_document_type? I think it's semantics should be "does this type live at the root of an index?" and if we want a predicate for "is this type queryable off of the GraphQL Query type?", I think we should call it something different--maybe directly_queryable? (how does that sound?).
Apollo @key directive
Apollo _entities field exposure
I'd argue that any type which is a root document type in an Elasticsearch/OpenSearch index should be in _entities and have a @key directive, even if it's not directly queryable off of Query. Defining a type as an Apollo entity merely requires that it has a unique id and can be efficiently fetched. There's no requirement that Apollo entity types also be available off of Query (although they usually are!).
Tangentially: thanks for pushing back on my suggestion--I feel like we're getting to a better, more precise solution as a result!
There was a problem hiding this comment.
I see! I'll put this PR back in draft while I fix this. It may change a fair amount so I don't want to waste anyone's time reviewing it in this state.
| :graphql_only_return_type, | ||
| # Indicates if this type requires __typename to be added during indexing for query-time type resolution. | ||
| # True for types in mixed-type indices (types that inherit an index from a parent union/interface). | ||
| :requires_typename_for_mixed_index |
There was a problem hiding this comment.
We shouldn't need this on runtime metadata. There's a more straightforward way to determine if __typename is needed when ingesting data--we can just look at the mapping, and if it has __typename as a property at the root, then record_preparer.rb should populate it.
I had claude work up a basic plan for how that would work:
Remove requires_typename_for_mixed_index from runtime metadata
Context
This PR adds requires_typename_for_mixed_index to runtime metadata so record_preparer.rb can include __typename in indexed payloads for concrete types in mixed-type indices. We can avoid this new field by inferring the need from the index mapping: if __typename is a property in a type's index mapping, it should be populated during ingestion.
Plan
1. Derive types needing __typename from index mappings in RecordPreparer::Factory
File: elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb
In Factory#initialize, check each type's index mapping for a __typename property:
mappings = schema_artifacts.index_mappings_by_index_def_name
types_with_typename_in_mapping = schema_artifacts.runtime_metadata.object_types_by_name
.filter_map do |type_name, meta|
type_name if meta.index_definition_names.any? { |idx| mappings.dig(idx, "properties", "__typename") }
end.to_setPass types_with_typename_in_mapping to RecordPreparer.new alongside existing args.
2. Merge into RecordPreparer's @types_requiring_typename
File: elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb
In RecordPreparer#initialize, merge with existing JSON-schema-derived set:
def initialize(indexing_preparer_by_scalar_type_name, type_metas, types_with_typename_in_mapping)
# ...
@types_requiring_typename = type_metas.filter_map { |meta|
meta.name if meta.requires_typename
}.to_set | types_with_typename_in_mapping
endThis preserves existing behavior for union/interface types (from JSON schema required) while adding types whose index mapping includes __typename.
| # @private | ||
| def recursively_resolve_supertypes | ||
| schema_def_state.union_types_by_member_ref[type_ref].to_a + resolve_interface_supertypes | ||
| end |
There was a problem hiding this comment.
Would it make sense for recursively_resolve_supertypes to return a set instead of an array? Then you wouldn't have to coerce it to an array (with .to_a) and conceptually, it's a set (order doesn't matter and we don't want duplicates).
| union_types = schema_def_state.union_types_by_member_ref[type_ref] # : ::Set[UnionType] | ||
| union_types << self |
There was a problem hiding this comment.
| union_types = schema_def_state.union_types_by_member_ref[type_ref] # : ::Set[UnionType] | |
| union_types << self | |
| schema_def_state.union_types_by_member_ref[type_ref] << self |
| # @return [Array<InterfaceType>] list of interface types this type implements | ||
| def resolve_interface_supertypes | ||
| implemented_interfaces.flat_map do |interface_ref| | ||
| interface = schema_def_state.types_by_name[interface_ref.name] |
There was a problem hiding this comment.
| interface = schema_def_state.types_by_name[interface_ref.name] | |
| interface = interface_ref.resolved |
Type refs offer a .resolved API for resolving them. (Internally it does the exact same lookup as you've done here!).
| # Add __typename to data_params for types in mixed-type indices. | ||
| # RecordPreparer will add __typename during indexing and we need to include it in data_params so it gets indexed. | ||
| if inherits_index? | ||
| data_params["__typename"] = SchemaArtifacts::RuntimeMetadata::DynamicParam.new(source_path: "__typename", cardinality: :one) |
There was a problem hiding this comment.
As I commented elsewhere I think you're missing the inclusion of __typename in indexing_fields_by_name. Since data_params is built from indexing_fields_by_name then data_params will pick it up automatically.
| # @private | ||
| def recursively_resolve_supertypes | ||
| resolve_interface_supertypes | ||
| end |
There was a problem hiding this comment.
It's a bit confusing that recursively_resolve_supertypes and resolve_interface_supertypes are paralleled like this. We can leverage inheritance and super to simplify:
diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb
index e87b5b74..b6cb2659 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb
@@ -118,14 +118,12 @@ module ElasticGraph
generate_sdl(name_section: name_section, &field_arg_selector)
end
- private
-
# Returns all interface types that this type implements, including ancestor interfaces.
#
# @return [Array<InterfaceType>] list of interface types this type implements
- def resolve_interface_supertypes
+ def recursively_resolve_supertypes
implemented_interfaces.flat_map do |interface_ref|
- interface = schema_def_state.types_by_name[interface_ref.name]
+ interface = interface_ref.resolved
[interface] + interface.recursively_resolve_supertypes
end
end
diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb
index 47129b2b..7e314979 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb
@@ -61,14 +61,6 @@ module ElasticGraph
__getobj__.graphql_fields_by_name
end
- # Returns all interface types that this interface implements, including ancestor interfaces.
- #
- # @return [Array<InterfaceType>] list of interface types this interface implements
- # @private
- def recursively_resolve_supertypes
- resolve_interface_supertypes
- end
-
private
def resolve_subtypes
diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb
index a9ae64d3..4cea0fd6 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb
@@ -44,7 +44,7 @@ module ElasticGraph
# @return [Array<UnionType, InterfaceType>] list of supertypes
# @private
def recursively_resolve_supertypes
- schema_def_state.union_types_by_member_ref[type_ref].to_a + resolve_interface_supertypes
+ schema_def_state.union_types_by_member_ref[type_ref].to_a + super
end
# @private...although I'd actually take it a step further and just fully implement it in ImplementsInterfaces:
diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb
index e87b5b74..1e1eb3b6 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb
@@ -118,16 +118,14 @@ module ElasticGraph
generate_sdl(name_section: name_section, &field_arg_selector)
end
- private
-
# Returns all interface types that this type implements, including ancestor interfaces.
#
# @return [Array<InterfaceType>] list of interface types this type implements
- def resolve_interface_supertypes
+ def recursively_resolve_supertypes
implemented_interfaces.flat_map do |interface_ref|
- interface = schema_def_state.types_by_name[interface_ref.name]
+ interface = interface_ref.resolved
[interface] + interface.recursively_resolve_supertypes
- end
+ end + schema_def_state.union_types_by_member_ref[type_ref].to_a
end
end
end
diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb
index 47129b2b..7e314979 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb
@@ -61,14 +61,6 @@ module ElasticGraph
__getobj__.graphql_fields_by_name
end
- # Returns all interface types that this interface implements, including ancestor interfaces.
- #
- # @return [Array<InterfaceType>] list of interface types this interface implements
- # @private
- def recursively_resolve_supertypes
- resolve_interface_supertypes
- end
-
private
def resolve_subtypes
diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb
index a9ae64d3..c920ab6b 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb
@@ -38,15 +38,6 @@ module ElasticGraph
include Mixins::ImplementsInterfaces
include Mixins::HasReadableToSAndInspect.new { |t| t.name }
- # Returns all supertypes of this object type, including union memberships
- # and interface ancestors.
- #
- # @return [Array<UnionType, InterfaceType>] list of supertypes
- # @private
- def recursively_resolve_supertypes
- schema_def_state.union_types_by_member_ref[type_ref].to_a + resolve_interface_supertypes
- end
-
# @private
def initialize(schema_def_state, name)
field_factory = schema_def_state.factory.method(:new_field)While schema_def_state.union_types_by_member_ref[type_ref] should always return an empty set when type_ref is an interface type (since interfaces cannot be members of type unions), it causes no harm to include that, and simplifies things. Plus, I view "interfaces cannot be members of type unions" simply as a current restriction based on the latest GraphQL spec. That might change in the future, and conceptually...if a interface was a member of a type union, we'd want this to include it.
Overview
First of three PRs implementing index inheritance (Issue #1029). Adds schema definition infrastructure to allow concrete types to inherit index definitions from parent abstract types (unions/interfaces), enabling mixed-type indices where multiple types share a single datastore index.
Problem
Currently every indexed type must define its own index. This PR enables patterns like:
Both
OnlineStoreandMobileStoredocuments live in the samestoresindex, requiring__typenamefor query-time type resolution.Key Changes
Schema Definition:
own_or_inherited_index_def()- resolves inherited indices from parent typesinherits_index?()- predicate for types in mixed-type indicesrecursively_resolve_supertypes()- traverses type hierarchy (including transitive interface inheritance)self_update_target()- injects__typenameinto data_params for mixed-type indicesown_or_inherited_index_definstead ofroot_document_type?to ensure transitively indexed types appear in event envelopeSchema Artifacts:
requires_typename_for_mixed_indexfield to runtime metadata ObjectType__typename(next PR)Validation:
Testing
index_inheritance_spec.rbwith comprehensive Store types example__typenameinjection, transitive inheritance, validationWhat's Not Included
This PR is infrastructure only:
Deferred to PR2 (Indexer) and PR3 (Schema + GraphQL + Acceptance).
Impact
Related