feat(graphile-search): @searchConfig, chunk querying, validation, integration tests & codegen docs (Phases D-E, H-J) #851
Merged
pyramation merged 3 commits into `main` on Mar 19, 2026
…t chunk querying

Phase D: Per-table search score customization via @searchConfig smart tag
- Read `@searchConfig` from `codec.extensions.tags` (written by `DataSearch`/`DataFullTextSearch`/`DataBm25`)
- Per-table weights override global `searchScoreWeights`
- Configurable normalization strategy (linear vs sigmoid)
- Recency boost with configurable field and decay rate
- Extracted `normalizeScore()` and `applyRecencyBoost()` as reusable helpers

Phase E: Transparent chunk querying in pgvector adapter
- Detect `@hasChunks` smart tag on codecs with chunk table metadata
- Generate chunk-aware SQL using `LEAST(parent_distance, closest_chunk_distance)`
- Added `includeChunks` field to `VectorNearbyInput` (default true for chunk tables)
- `enableChunkQuerying` option on `PgvectorAdapterOptions` (default true)

Tests: 13 new unit tests covering chunk detection, filter generation, and config parsing
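The following is a minimal sketch of the Phase D scoring flow described above, written as standalone TypeScript. It is not the plugin's code. Several parts are assumptions:
- the `KnownRange` shape;
- the `resolveWeights()` helper;
- the rule that per-table weights replace the global ones rather than merging with them;
- using the bounded `1 / (1 + Math.abs(score))` mapping for the `sigmoid` path, chosen because that is the formula the review checklist quotes for BM25.

```typescript
// Hypothetical sketch of the Phase D scoring helpers; not the plugin's source.
type NormalizationStrategy = "linear" | "sigmoid";

// Assumed metadata for adapters whose raw score range is known (e.g. cosine distance).
interface KnownRange {
  min: number;
  max: number;
  lowerIsBetter: boolean; // true for distances, false for similarities
}

/**
 * Maps a raw adapter score into [0, 1], where higher means a better match.
 * "sigmoid" always uses the bounded 1 / (1 + |score|) mapping; "linear" rescales
 * known-range scores and falls back to that mapping for unbounded adapters.
 */
function normalizeScore(
  score: number,
  strategy: NormalizationStrategy,
  range?: KnownRange,
): number {
  if (strategy === "linear" && range !== undefined) {
    const span = range.max - range.min;
    if (span <= 0) return 0; // degenerate range carries no ranking information
    const t = Math.min(1, Math.max(0, (score - range.min) / span));
    return range.lowerIsBetter ? 1 - t : t;
  }
  return 1 / (1 + Math.abs(score));
}

/** Per-table weights, assumed here to replace (not merge with) the global ones. */
function resolveWeights(
  globalWeights: Record<string, number>,
  tableWeights?: Record<string, number>,
): Record<string, number> {
  return tableWeights ?? globalWeights;
}

/** Single weighted-average path: adapters without an explicit weight count as 1. */
function combineScores(
  normalized: Record<string, number>,
  weights: Record<string, number> = {},
): number {
  let total = 0;
  let weightSum = 0;
  for (const [adapter, value] of Object.entries(normalized)) {
    const w = weights[adapter] ?? 1;
    total += w * value;
    weightSum += w;
  }
  return weightSum > 0 ? total / weightSum : 0;
}
```

For example, take a cosine distance of 0.5 on an assumed 0 to 2 range, which normalizes to 0.75, and an unbounded BM25 score of 3, which normalizes to 0.25. A per-table weight of 3 on `pgvector` then yields (3 * 0.75 + 1 * 0.25) / 4 = 0.625. Treating the unweighted case as "every weight is 1" is what lets one loop replace the two former code paths.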
…docs (Phases I+J+H)

Phase I: Validate `boost_recency_field` exists on codec before injecting into SQL. Gracefully disables recency boost with `console.warn` if field is missing.

Phase J: Integration tests for `@searchConfig` and `@hasChunks` smart tags. Custom `makeTestSmartTagsPlugin` injects tags programmatically during schema build. Tests cover per-table weights, recency boost, chunk-aware queries, and validation.

Phase H: Enhance codegen docs for chunk-aware embedding tables. Detects `VectorNearbyInput.includeChunks` in `TypeRegistry` and adds chunk-aware search documentation to embedding field groups in generated docs.
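The Phase I check can be pictured as a small guard that runs at schema build time. In this sketch, `CodecLike`, `resolveRecencySettings()` and the injectable `warn` callback are hypothetical. Only three things come from this PR's description: the `boost_recency_field` key, the `console.warn` fallback, and the `boostRecent = false` outcome.

```typescript
// Hypothetical shapes; the real codec type and @searchConfig parsing may differ.
interface CodecLike {
  name: string;
  attributes: Record<string, unknown>; // column name -> attribute metadata
}

interface SearchConfigTag {
  boost_recency_field?: string;
}

interface RecencySettings {
  boostRecent: boolean;
  boostRecencyField?: string;
}

/** Runs at schema build time so an unknown column never reaches the generated SQL. */
function resolveRecencySettings(
  codec: CodecLike,
  config: SearchConfigTag | undefined,
  warn: (message: string) => void = console.warn,
): RecencySettings {
  const field = config?.boost_recency_field;
  if (!field) return { boostRecent: false };
  // hasOwnProperty avoids matching inherited names such as "toString".
  if (!Object.prototype.hasOwnProperty.call(codec.attributes, field)) {
    warn(
      `@searchConfig on "${codec.name}": boost_recency_field "${field}" is not a column; recency boost disabled`,
    );
    return { boostRecent: false };
  }
  return { boostRecent: true, boostRecencyField: field };
}
```

Passing `warn` in as a parameter keeps the guard testable without spying on `console`.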
Summary
Adds several features to `graphile-search` and enhances codegen docs, spanning Phases D, E, H, I, and J:

Phase D (per-table `@searchConfig` smart tag): The composite `searchScore` field now reads per-table configuration from `codec.extensions.tags.searchConfig` (written by `DataSearch`/`DataFullTextSearch`/`DataBm25` in constructive-db PR #622). Per-table config overrides the global `searchScoreWeights`. Supports a configurable normalization strategy (`linear` vs `sigmoid`) and a recency boost (exponential decay). The previously duplicated normalization logic was extracted into reusable `normalizeScore()` and `applyRecencyBoost()` helpers, and the dual unweighted/weighted code paths were unified into a single weighted-average path (weights default to 1).

Phase E (transparent chunk querying): The pgvector adapter now reads `@hasChunks` smart tags from codec extensions. When a tag is present, `buildFilterApply` generates a `LEAST(parent_distance, closest_chunk_distance)` subquery that transparently searches both the parent and chunks tables. A new `includeChunks` boolean field is added to `VectorNearbyInput` (defaults to true when chunks exist; set it to false to skip chunk search).

Phase I (schema-time validation of recency field): If `@searchConfig` specifies `boost_recency_field` but that column doesn't exist on the table's codec attributes, recency boost is now disabled gracefully with a `console.warn` instead of crashing at query time with a column-not-found error.

Phase J (integration tests for `@searchConfig` and `@hasChunks`): New `search-config-integration.test.ts` with tests against a real PostgreSQL database. A custom `makeTestSmartTagsPlugin` injects JSON smart tags programmatically during schema build (since `@searchConfig` and `@hasChunks` are JSON objects that can't be set via SQL `COMMENT ON`). Tests cover per-table weights, recency boost, combined search, sigmoid normalization, chunk-aware querying (`LEAST` of parent + chunks), the `includeChunks: false` toggle, and validation of missing recency fields.

Phase H (codegen docs for chunk-aware tables): `categorizeSpecialFields()` in `docs-utils.ts` now checks the `TypeRegistry` for a `VectorNearbyInput` type with an `includeChunks` field. When present, generated docs for tables with embedding fields mention chunk-aware search capability and the `includeChunks` option.

Updates since last revision (auto-review feedback)
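The first fix in this round passes the recency value into `applyRecencyBoost` instead of reading `row[fieldName]`. It can be sketched as follows. Three parts are assumptions: the per-day decay unit, the multiplicative `score * exp(-rate * age)` form, and the `scoreFromRow()` helper. The PR only states that the boost is an exponential decay and that values are read by the numeric index returned from `$select.selectAndReturnIndex()`.

```typescript
// Hypothetical sketch: the day-based decay and multiplicative form are assumptions.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

/**
 * Applies an exponential recency decay to a score. The recency value is passed in
 * directly (already selected by SQL) instead of being looked up on the row by name.
 */
function applyRecencyBoost(
  score: number,
  recencyValue: unknown,
  decayRatePerDay: number,
  now: Date = new Date(),
): number {
  if (recencyValue === null || recencyValue === undefined) return score;
  const ts = recencyValue instanceof Date ? recencyValue : new Date(String(recencyValue));
  const ms = ts.getTime();
  if (Number.isNaN(ms)) return score; // unparseable timestamp: leave the score alone
  const ageDays = Math.max(0, (now.getTime() - ms) / MS_PER_DAY); // future dates count as age 0
  return score * Math.exp(-decayRatePerDay * ageDays);
}

/**
 * Reads the score and the recency value by numeric index. The indexes stand in
 * for whatever $select.selectAndReturnIndex() returned at plan time.
 */
function scoreFromRow(
  row: readonly unknown[],
  scoreIndex: number,
  recencyIndex: number,
  decayRatePerDay: number,
  now?: Date,
): number {
  return applyRecencyBoost(Number(row[scoreIndex]), row[recencyIndex], decayRatePerDay, now);
}
```

Because row values arrive by position, the helper never needs the column's name, which removes the dependency on `row[fieldName]`.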
Addressed all auto-review feedback from initial Phases D-E:
- `applyRecencyBoost` now accepts the recency value directly instead of trying to read from `row[fieldName]`. The recency field is injected into the SQL SELECT via `$select.selectAndReturnIndex()` and accessed by numeric index at runtime (matching how score values are retrieved).
- Added `parentPkField` to `ChunksInfo` (read from the `@hasChunks` tag's `parentPk`, defaults to `'id'`). It is no longer hardcoded.
- Added `chunksSchema` to `ChunksInfo`. It resolves via: explicit `chunksSchema` in the tag → parent codec's `pg.schemaName` → unqualified. The chunk query builds a `schema.table` reference when a schema is present.
- The `sigmoid` strategy forces sigmoid for ALL adapters (bounded and unbounded). `linear` (default) uses linear for known-range adapters and sigmoid as a fallback for unbounded ones. Documented in the docstring.
- Added `sql` to the destructuring from `build` in the `GraphQLObjectType_fields` hook (its absence was causing a TS2304 compilation error).

Updates since last revision (Phases H, I, J)
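The Phase H detection described in the summary reduces to a lookup on one schema-wide type. This sketch models the `TypeRegistry` as a plain `Map`. That shape, `RegistryType`, `describeEmbeddingGroup()` and its wording are hypothetical stand-ins for the codegen internals.

```typescript
// Hypothetical registry shape; the real codegen TypeRegistry API may differ.
interface RegistryInputField {
  name: string;
}

interface RegistryType {
  kind: string; // e.g. "INPUT_OBJECT"
  inputFields?: RegistryInputField[];
}

type TypeRegistry = Map<string, RegistryType>;

/** True when the schema's VectorNearbyInput exposes an includeChunks field. */
function hasIncludeChunksCapability(registry: TypeRegistry): boolean {
  const input = registry.get("VectorNearbyInput");
  return input?.inputFields?.some((field) => field.name === "includeChunks") ?? false;
}

/** Appends chunk-aware wording to an embedding field group's description (wording is illustrative). */
function describeEmbeddingGroup(base: string, registry: TypeRegistry): string {
  if (!hasIncludeChunksCapability(registry)) return base;
  return (
    `${base} Nearby searches also consider chunk embeddings when the table has chunks; ` +
    "pass includeChunks: false to compare against the parent embedding only."
  );
}
```

The check only inspects `VectorNearbyInput`, so every embedding table in a schema gets the same answer. That is the per-table precision concern raised in the review checklist.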
- Phase I: `plugin.ts` now checks `codec.attributes[boostRecencyField]` at schema build time. If the field doesn't exist, it emits `console.warn` and sets `boostRecent = false`.
- Phase J: Added `search-config-integration.test.ts` (~630 lines) with 6 test suites running against real PostgreSQL. It uses a custom Graphile plugin to inject JSON smart tags on test codecs during the `init` hook.
- Phase H: Added a `hasIncludeChunksCapability()` helper that inspects the `TypeRegistry` for `VectorNearbyInput.includeChunks`. When detected, the embedding `SpecialFieldGroup` description includes chunk-aware search documentation in generated README/AGENTS/skills docs.

Review & Testing Checklist for Human
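To make the per-row subquery concern in this checklist concrete, here is a hypothetical string-building version of the Phase E distance expression. The real adapter builds SQL through Graphile's `sql` tag inside `buildFilterApply`. The name `chunkAwareDistanceSql()`, the `c` alias and the `$1` parameter are illustrative. The `ChunksInfo` fields mirror the `@hasChunks` and `ChunksInfo` names used in this PR.

```typescript
// Hypothetical sketch; the adapter composes SQL with Graphile's sql tag, not strings.
interface ChunksInfo {
  chunksTable: string;
  chunksSchema?: string;
  parentFk: string;
  parentPkField: string;
  embeddingField: string;
}

/** Double-quotes a Postgres identifier, escaping embedded quotes. */
const ident = (name: string): string => `"${name.replace(/"/g, '""')}"`;

/**
 * Distance for a parent row: its own distance, or the closest chunk's distance
 * if smaller. The MIN(...) subquery is correlated, so it runs per candidate parent row.
 */
function chunkAwareDistanceSql(
  parentAlias: string,
  parentEmbeddingField: string,
  vectorParam: string, // a bound query-vector placeholder such as "$1"
  chunks: ChunksInfo,
  includeChunks = true,
): string {
  const parentDistance = `${ident(parentAlias)}.${ident(parentEmbeddingField)} <=> ${vectorParam}`;
  if (!includeChunks) return parentDistance;
  const table = chunks.chunksSchema
    ? `${ident(chunks.chunksSchema)}.${ident(chunks.chunksTable)}`
    : ident(chunks.chunksTable);
  const closestChunk =
    `(SELECT MIN(c.${ident(chunks.embeddingField)} <=> ${vectorParam}) ` +
    `FROM ${table} AS c ` +
    `WHERE c.${ident(chunks.parentFk)} = ${ident(parentAlias)}.${ident(chunks.parentPkField)})`;
  // PostgreSQL's LEAST ignores NULLs, so a parent with no chunks keeps its own distance.
  return `LEAST(${parentDistance}, ${closestChunk})`;
}
```

Each correlated subquery filters on the chunks table's foreign-key column. An index on that column, in addition to any vector index, therefore bounds the per-parent work. Passing `includeChunks = false` returns the plain parent distance.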
- The codegen docs check whether `VectorNearbyInput` has `includeChunks` in the `TypeRegistry`, but this is a schema-wide type: ALL tables with embedding fields will show chunk-aware documentation, including tables that don't actually have a `@hasChunks` tag. Verify whether this is acceptable or if per-table detection is needed.
- The `SELECT MIN(chunk.embedding <=> vector) FROM chunks WHERE chunks.parent_fk = parent.pk` subquery runs per parent row. On tables with many chunks per parent, this relies entirely on proper indexing (HNSW/IVFFlat on the chunks table). Confirm that `embedding_chunks` auto-creation in PR #619 creates the needed index.
- For BM25, the `1 / (1 + Math.abs(score))` formula maps larger absolute values to lower normalized scores. If BM25 actually returns positive distance-like scores (lower = better), the math is correct but the comment is wrong. Verify the actual score polarity from the BM25 adapter.
- `makeTestSmartTagsPlugin` mutates codec extensions in the `init` hook. This works but is a non-standard way to apply smart tags; in production, tags come from SQL `COMMENT ON` or `metaschema_public.table.tags`. Verify the injection happens early enough that all downstream plugins see the tags consistently.

Suggested test plan:
- Run `pnpm test` in `graphile/graphile-search` to execute both unit and integration tests.
- With a database using `DataSearch` + `DataEmbedding` + `embedding_chunks` from PRs #619/#622, run codegen and inspect the generated README for embedding fields. Confirm the chunk-aware description appears when `includeChunks` is present.
- Configure `@searchConfig` with `boost_recency_field: "nonexistent"` and verify that `console.warn` fires and no query-time crash occurs.

Notes
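Here is a minimal sketch of resolving the `@hasChunks` tag listed in these notes into chunk metadata. It applies the documented defaults and the schema fallback order: explicit `chunksSchema`, then the parent codec's `pg.schemaName`, then unqualified. The function name, and the choice to return `null` for a malformed tag, are assumptions.

```typescript
// Hypothetical sketch; field names follow the @hasChunks structure in these notes.
interface HasChunksTag {
  chunksTable: string;
  chunksSchema?: string;
  parentFk?: string;
  parentPk?: string;
  embeddingField?: string;
}

interface ResolvedChunksInfo {
  chunksTable: string;
  chunksSchema?: string; // undefined means "use an unqualified table name"
  parentFk: string;
  parentPkField: string;
  embeddingField: string;
}

function resolveChunksInfo(
  tag: unknown,
  parentSchemaName?: string, // the parent codec's pg.schemaName, if known
): ResolvedChunksInfo | null {
  if (typeof tag !== "object" || tag === null) return null;
  const t = tag as Partial<HasChunksTag>;
  if (typeof t.chunksTable !== "string" || t.chunksTable.length === 0) return null;
  return {
    chunksTable: t.chunksTable,
    chunksSchema: t.chunksSchema ?? parentSchemaName,
    parentFk: t.parentFk ?? "parent_id",
    parentPkField: t.parentPk ?? "id",
    embeddingField: t.embeddingField ?? "embedding",
  };
}
```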
- The `DataFullTextSearch`, `DataBm25`, and `DataSearch` node types that write the `@searchConfig` smart tag come from constructive-db PR #622.
- Expected `@hasChunks` smart tag structure: `{ chunksTable, chunksSchema?, parentFk?, parentPk?, embeddingField? }`, with defaults for `parentFk` (`parent_id`), `parentPk` (`id`), and `embeddingField` (`embedding`). The schema defaults to the parent codec's `pg.schemaName` if not explicit.
- … (`DataPostGIS` node type) is in a separate constructive-db PR #623.

Link to Devin session: https://app.devin.ai/sessions/e57604f3fc7c4e3d87e78c75a00cca23
Requested by: @pyramation