Skip to content

Schema support for inherited indexes#1067

Draft
marcdaniels-toast wants to merge 2 commits intoblock:mainfrom
marcdaniels-toast:mdaniels/issue-1029/schema-definition-artifacts-infrastructure
Draft

Schema support for inherited indexes#1067
marcdaniels-toast wants to merge 2 commits intoblock:mainfrom
marcdaniels-toast:mdaniels/issue-1029/schema-definition-artifacts-infrastructure

Conversation

@marcdaniels-toast
Copy link
Contributor

Overview

First of three PRs implementing index inheritance (Issue #1029). Adds schema definition infrastructure to allow concrete types to inherit index definitions from parent abstract types (unions/interfaces), enabling mixed-type indices where multiple types share a single datastore index.

Problem

Currently every indexed type must define its own index. This PR enables patterns like:

schema.interface_type "Store" do |t|
  t.field "id", "ID!"
  t.index "stores"  # Shared index
end

schema.object_type "OnlineStore" do |t|
  t.implements "Store"
  # Inherits "stores" index from Store
end

schema.object_type "MobileStore" do |t|
  t.implements "Store"
  # Also inherits "stores" index
end

Both OnlineStore and MobileStore documents live in the same stores index, requiring __typename for query-time type resolution.

Key Changes

Schema Definition:

  • Added own_or_inherited_index_def() - resolves inherited indices from parent types
  • Added inherits_index?() - predicate for types in mixed-type indices
  • Added recursively_resolve_supertypes() - traverses type hierarchy (including transitive interface inheritance)
  • Updated self_update_target() - injects __typename into data_params for mixed-type indices
  • Fixed JSON schema type enum generation - uses own_or_inherited_index_def instead of root_document_type? to ensure transitively indexed types appear in event envelope

Schema Artifacts:

  • Added requires_typename_for_mixed_index field to runtime metadata ObjectType
  • This will be consumed by indexer to determine when to inject __typename (next PR)

Validation:

  • Concrete types can implement multiple abstract types, but at most one may have an index (prevents ambiguity)

Testing

  • New index_inheritance_spec.rb with comprehensive Store types example
  • Tests cover: index resolution, __typename injection, transitive inheritance, validation

What's Not Included

This PR is infrastructure only:

  • ❌ No Store types added to widgets.rb yet
  • ❌ No schema artifacts regenerated
  • ❌ No indexer changes (RecordPreparer)
  • ❌ No GraphQL changes
  • ❌ No acceptance tests

Deferred to PR2 (Indexer) and PR3 (Schema + GraphQL + Acceptance).

Impact

  • ✅ 100% backward compatible
  • ✅ No visible behavior changes
  • ✅ Ready for PR2

Related

marcdaniels-toast and others added 2 commits March 8, 2026 16:53
Allow concrete types to inherit index definitions from parent abstract types (unions/interfaces). Types can now be indexed without defining their own index by inheriting from an indexed parent, enabling mixed-type indices where multiple types share a single datastore index.

Key changes:
- Add own_or_inherited_index_def() to resolve inherited indices from parent types
- Add inherits_index?() predicate for types in mixed-type indices
- Add recursively_resolve_supertypes() to traverse type hierarchy
- Add requires_typename_for_mixed_index runtime metadata field
- Inject __typename into data_params for mixed-type indices
- Add comprehensive test coverage including transitive inheritance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When an abstract type (union/interface) has an index, concrete subtypes
inherit that index. The event envelope type enum should include these
concrete types (which will actually be indexed) rather than just types
with their own direct index definition.

Changes:
- Use own_or_inherited_index_def instead of root_document_type? to
  determine which types belong in the JSON schema type enum
- Rename variable from root_document_type_names to indexed_type_names
  for clarity
- Add test coverage for event envelope type enum with transitive indexing,
  including a mixed scenario where some subtypes inherit an index and
  others have their own index
Copy link
Collaborator

@myronmarston myronmarston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not done with my review (notably, I haven't reviewed the specs yet) but I wanted to submit what I have so far.

#
# @return [Indexing::Index, nil] the index definition, or nil if this type has no index
# @raise [Errors::SchemaError] if this type is a subtype of multiple indexed abstract types
def own_or_inherited_index_def
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just call this index_def? I don't think the name needs to reveal the possible sources of where the index could be defined. To the caller, the fact that the index definition may be inherrited from the supertype should be an implementation detail not hinted at by the API, similar to an override fun in kotlin.

Also, part of the motivation for my prior suggestion to rename the old index_def to own_index_def was to free up index_def as the name we could use for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Makes sense!

Comment on lines +285 to +287
indexed_parents = recursively_resolve_supertypes.select do |supertype|
supertype.own_index_def
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
indexed_parents = recursively_resolve_supertypes.select do |supertype|
supertype.own_index_def
end
indexed_parents = recursively_resolve_supertypes.select(&:has_own_index_def?)
  • I like the readability/terseness of the (&:method_name) form for this kind of thing. (And, IIRC, there are some optimizations in Ruby for it, where it avoids creating the full block context since it doesn't need it).
  • While it works to filter on own_index_def due to Ruby's truthiness semantics, has_own_index_def? is a bit more explicit and exists to provide a predicate we can use for cases like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. I'll make this change.

# @return [Boolean] true if this type inherits an index from a parent abstract type (union/interface).
# When true, the type may share an index with other types (a "mixed-type index").
# @private
def inherits_index?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method name feels a bit like a codesmell to me. In a statically typed language like kotlin, this would be a bit like:

abstract class Part {
  val color: String = "white" // default color

  abstract val inheritsColor: Boolean
}

class MechanicalPart {
  override val color: String = "red"
  override val inheritsColor: Boolean = false
}

class ElectricalPart {
  override val inheritsColor: Boolean = true
}

It doesn't feel right to expose whether or not the implementation of color is inherited. Likewise, it feels wrong to me to expose a predicate here indicating if the index is inherited.

WDYT about calling it needs_typename? instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree with your suggestion to use needs_typename now that you point out this awkwardness. I think I had that initially but changed it because at the time I thought it was wrong that this schema API's method name indicated a concern of indexing (adding typename). But maybe the "needing typename" concept is more of a universal concept.

# For types with `own_index_def`, returns true. For abstract types with indexed subtypes, overridden in {HasSubtypes}.
# @return [Boolean] true if this type is queryable at the root level of the GraphQL schema (i.e., has direct `Query` fields).
def root_document_type?
has_own_index_def?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be updated so that it returns true for a concrete type that inherits an index from an abstract supertype--such a type is still a root document type, right?

Suggested change
has_own_index_def?
own_or_inherited_index_def != nil

...or this could be index_def != nil if you follow my suggested renaming on that method :).

Copy link
Contributor Author

@marcdaniels-toast marcdaniels-toast Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's where I landed on root_document_type? semantics:

Example schema:

  interface_type "Store" do |t|
    t.field "id", "ID!"
    t.field "name", "String!"
    t.index "stores"
  end

  object_type "OnlineStore" do |t|
    t.implements "Store"
    # Inherits "stores" index
  end

  object_type "PhysicalStore" do |t|
    t.implements "Store"
    t.index "physical_stores"  # Own index
  end

I'm treating root_document_type? to specifically mean "has GraphQL Query fields," not "is indexed in the datastore." OnlineStore is indexed (via the inherited stores index) but is not queryable at the root level; it's only accessible through the polymorphic stores root document type. Different parts of the system need to distinguish between these concerns, which is why we have root_document_type? for Query field generation and own_or_inherited_index_def? (which I'll rename to has_index_def? per your suggestion above) for determining what's actually indexed.

This is why I simplified the doc for root_document_type? to this:

  # @return [Boolean] true if this type is queryable at the root level of the GraphQL schema (i.e., has direct `Query` fields).
  def root_document_type?

If we changed root_document_type? to mean "is indexed", we'd need to add filtering logic in four places to distinguish types that should have Query fields from types that shouldn't:

  1. Main Query field generation
  2. Sort order enum generation
  3. Apollo @key directive
  4. Apollo _entities field exposure

The current semantic keeps Query-related logic simple (just check root_document_type?) at the cost of slightly more complex JSON schema type enum logic (one place). The alternative would be simple JSON schema logic but complex Query logic in multiple places.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm treating root_document_type? to specifically mean "has GraphQL Query fields," not "is indexed in the datastore."

Ah, the root (no pun intended) of our misunderstanding is that I haven't been understanding root_document_type? in that way (even though you documented it to mean that--somehow my brain skipped over your docs...). I've been understanding root_document_type? to mean: does this type live at the root of an Elasticsearch/OpenSearch index?

It's worth noting that the term "root document" shows up in the Elasticsearch docs:

Effectively this aggregation can break out of the nested block structure and link to other nested structures or the root document...

...and in the OpenSearch docs:

A Boolean value that specifies whether all fields in the child nested object should also be added to the root document in flattened form.

...whereas it does not show up anywhere in the GraphQL spec.

All that's to say: if we're going to have a method named root_document_type? I think it's semantics should be "does this type live at the root of an index?" and if we want a predicate for "is this type queryable off of the GraphQL Query type?", I think we should call it something different--maybe directly_queryable? (how does that sound?).

Apollo @key directive
Apollo _entities field exposure

I'd argue that any type which is a root document type in an Elasticsearch/OpenSearch index should be in _entities and have a @key directive, even if it's not directly queryable off of Query. Defining a type as an Apollo entity merely requires that it has a unique id and can be efficiently fetched. There's no requirement that Apollo entity types also be available off of Query (although they usually are!).

Tangentially: thanks for pushing back on my suggestion--I feel like we're getting to a better, more precise solution as a result!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! I'll put this PR back in draft while I fix this. It may change a fair amount so I don't want to waste anyone's time reviewing it in this state.

:graphql_only_return_type,
# Indicates if this type requires __typename to be added during indexing for query-time type resolution.
# True for types in mixed-type indices (types that inherit an index from a parent union/interface).
:requires_typename_for_mixed_index
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need this on runtime metadata. There's a more straightforward way to determine if __typename is needed when ingesting data--we can just look at the mapping, and if it has __typename as a property at the root, then record_preparer.rb should populate it.

I had claude work up a basic plan for how that would work:


Remove requires_typename_for_mixed_index from runtime metadata

Context

This PR adds requires_typename_for_mixed_index to runtime metadata so record_preparer.rb can include __typename in indexed payloads for concrete types in mixed-type indices. We can avoid this new field by inferring the need from the index mapping: if __typename is a property in a type's index mapping, it should be populated during ingestion.

Plan

1. Derive types needing __typename from index mappings in RecordPreparer::Factory

File: elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb

In Factory#initialize, check each type's index mapping for a __typename property:

mappings = schema_artifacts.index_mappings_by_index_def_name
types_with_typename_in_mapping = schema_artifacts.runtime_metadata.object_types_by_name
  .filter_map do |type_name, meta|
    type_name if meta.index_definition_names.any? { |idx| mappings.dig(idx, "properties", "__typename") }
  end.to_set

Pass types_with_typename_in_mapping to RecordPreparer.new alongside existing args.

2. Merge into RecordPreparer's @types_requiring_typename

File: elasticgraph-indexer/lib/elastic_graph/indexer/record_preparer.rb

In RecordPreparer#initialize, merge with existing JSON-schema-derived set:

def initialize(indexing_preparer_by_scalar_type_name, type_metas, types_with_typename_in_mapping)
  # ...
  @types_requiring_typename = type_metas.filter_map { |meta|
    meta.name if meta.requires_typename
  }.to_set | types_with_typename_in_mapping
end

This preserves existing behavior for union/interface types (from JSON schema required) while adding types whose index mapping includes __typename.

# @private
def recursively_resolve_supertypes
schema_def_state.union_types_by_member_ref[type_ref].to_a + resolve_interface_supertypes
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for recursively_resolve_supertypes to return a set instead of an array? Then you wouldn't have to coerce it to an array (with .to_a) and conceptually, it's a set (order doesn't matter and we don't want duplicates).

Comment on lines +99 to +100
union_types = schema_def_state.union_types_by_member_ref[type_ref] # : ::Set[UnionType]
union_types << self
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
union_types = schema_def_state.union_types_by_member_ref[type_ref] # : ::Set[UnionType]
union_types << self
schema_def_state.union_types_by_member_ref[type_ref] << self

# @return [Array<InterfaceType>] list of interface types this type implements
def resolve_interface_supertypes
implemented_interfaces.flat_map do |interface_ref|
interface = schema_def_state.types_by_name[interface_ref.name]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
interface = schema_def_state.types_by_name[interface_ref.name]
interface = interface_ref.resolved

Type refs offer a .resolved API for resolving them. (Internally it does the exact same lookup as you've done here!).

# Add __typename to data_params for types in mixed-type indices.
# RecordPreparer will add __typename during indexing and we need to include it in data_params so it gets indexed.
if inherits_index?
data_params["__typename"] = SchemaArtifacts::RuntimeMetadata::DynamicParam.new(source_path: "__typename", cardinality: :one)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I commented elsewhere I think you're missing the inclusion of __typename in indexing_fields_by_name. Since data_params is built from indexing_fields_by_name then data_params will pick it up automatically.

# @private
def recursively_resolve_supertypes
resolve_interface_supertypes
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit confusing that recursively_resolve_supertypes and resolve_interface_supertypes are paralleled like this. We can leverage inheritance and super to simplify:

diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb
index e87b5b74..b6cb2659 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb
@@ -118,14 +118,12 @@ module ElasticGraph
           generate_sdl(name_section: name_section, &field_arg_selector)
         end

-        private
-
         # Returns all interface types that this type implements, including ancestor interfaces.
         #
         # @return [Array<InterfaceType>] list of interface types this type implements
-        def resolve_interface_supertypes
+        def recursively_resolve_supertypes
           implemented_interfaces.flat_map do |interface_ref|
-            interface = schema_def_state.types_by_name[interface_ref.name]
+            interface = interface_ref.resolved
             [interface] + interface.recursively_resolve_supertypes
           end
         end
diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb
index 47129b2b..7e314979 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb
@@ -61,14 +61,6 @@ module ElasticGraph
           __getobj__.graphql_fields_by_name
         end

-        # Returns all interface types that this interface implements, including ancestor interfaces.
-        #
-        # @return [Array<InterfaceType>] list of interface types this interface implements
-        # @private
-        def recursively_resolve_supertypes
-          resolve_interface_supertypes
-        end
-
         private

         def resolve_subtypes
diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb
index a9ae64d3..4cea0fd6 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb
@@ -44,7 +44,7 @@ module ElasticGraph
         # @return [Array<UnionType, InterfaceType>] list of supertypes
         # @private
         def recursively_resolve_supertypes
-          schema_def_state.union_types_by_member_ref[type_ref].to_a + resolve_interface_supertypes
+          schema_def_state.union_types_by_member_ref[type_ref].to_a + super
         end

         # @private

...although I'd actually take it a step further and just fully implement it in ImplementsInterfaces:

diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb
index e87b5b74..1e1eb3b6 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/mixins/implements_interfaces.rb
@@ -118,16 +118,14 @@ module ElasticGraph
           generate_sdl(name_section: name_section, &field_arg_selector)
         end

-        private
-
         # Returns all interface types that this type implements, including ancestor interfaces.
         #
         # @return [Array<InterfaceType>] list of interface types this type implements
-        def resolve_interface_supertypes
+        def recursively_resolve_supertypes
           implemented_interfaces.flat_map do |interface_ref|
-            interface = schema_def_state.types_by_name[interface_ref.name]
+            interface = interface_ref.resolved
             [interface] + interface.recursively_resolve_supertypes
-          end
+          end + schema_def_state.union_types_by_member_ref[type_ref].to_a
         end
       end
     end
diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb
index 47129b2b..7e314979 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/interface_type.rb
@@ -61,14 +61,6 @@ module ElasticGraph
           __getobj__.graphql_fields_by_name
         end

-        # Returns all interface types that this interface implements, including ancestor interfaces.
-        #
-        # @return [Array<InterfaceType>] list of interface types this interface implements
-        # @private
-        def recursively_resolve_supertypes
-          resolve_interface_supertypes
-        end
-
         private

         def resolve_subtypes
diff --git a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb
index a9ae64d3..c920ab6b 100644
--- a/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb
+++ b/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/object_type.rb
@@ -38,15 +38,6 @@ module ElasticGraph
         include Mixins::ImplementsInterfaces
         include Mixins::HasReadableToSAndInspect.new { |t| t.name }

-        # Returns all supertypes of this object type, including union memberships
-        # and interface ancestors.
-        #
-        # @return [Array<UnionType, InterfaceType>] list of supertypes
-        # @private
-        def recursively_resolve_supertypes
-          schema_def_state.union_types_by_member_ref[type_ref].to_a + resolve_interface_supertypes
-        end
-
         # @private
         def initialize(schema_def_state, name)
           field_factory = schema_def_state.factory.method(:new_field)

While schema_def_state.union_types_by_member_ref[type_ref] should always return an empty set when type_ref is an interface type (since interfaces cannot be members of type unions), it causes no harm to include that, and simplifies things. Plus, I view "interfaces cannot be members of type unions" simply as a current restriction based on the latest GraphQL spec. That might change in the future, and conceptually...if a interface was a member of a type union, we'd want this to include it.

@marcdaniels-toast marcdaniels-toast marked this pull request as draft March 10, 2026 16:11
@marcdaniels-toast marcdaniels-toast marked this pull request as draft March 10, 2026 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants