Implementation Plan
Analysis
Issue 108 adds a semantic codebase search tool. Eight design decisions are already finalized: local-only embeddings (ONNX), AST-aware chunking (tree-sitter), project-scoped index (
Key discovery (zero native addons possible): The original assessment flagged native addon friction as a top risk (
This keeps dreb's zero native addon status intact. The search tool is feature-gated on
Deliverables
Acceptance Criteria
Files to Create
Search subsystem (
Tool definition:
Files to Modify
Testing Approach
Test files in
Test infrastructure:
Risks and Open Questions
Plan created by mach6
Add built-in `search` tool using embeddings + FTS5 for natural language queries over the codebase.

Zero native addons: uses node:sqlite (Node 22), web-tree-sitter (WASM), and @huggingface/transformers (WASM/native).

Search subsystem (src/core/search/):
- AST-aware code chunking via tree-sitter (TS/JS/Python/Go/Rust/Java/C/C++)
- Format-aware text chunking (markdown/YAML/JSON/TOML/plaintext)
- nomic-embed-text-v1.5 embeddings with auto-download and caching
- SQLite-backed index with FTS5 + vector storage
- POEM/TFPR multi-metric ranking (6 metrics, column duplication weighting)
- Incremental mtime-based re-indexing
- File scanner respecting .gitignore with memory file inclusion

Metrics: FTS5 BM25, vector cosine, path match, symbol match, import graph proximity, git recency. Query classifier biases ranking toward relevant metrics (identifiers -> BM25+symbol, natural language -> cosine, paths -> path).

Feature-gated on node:sqlite: tool not registered on Node <22, no crash.

Tests: 109 new tests across 6 test files (poem, db, vector-store, tree-sitter-chunker, search-tool, text-chunker).
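The query classifier mentioned in the commit message above routes each query toward the metrics most likely to help. A minimal sketch of that idea follows; the regexes, the `classifyQuery` and `metricWeights` names, and the weight values are illustrative assumptions, not the classifier shipped in this PR.

```typescript
// Hypothetical sketch: heuristics and weights are assumptions for illustration.
type QueryKind = "identifier" | "path" | "natural";
type MetricName = "bm25" | "cosine" | "path" | "symbol" | "importGraph" | "gitRecency";

function classifyQuery(query: string): QueryKind {
  const q = query.trim();
  // Paths: a single token that contains a slash or ends in a short file extension.
  if (!/\s/.test(q) && (q.includes("/") || /\.[a-z0-9]{1,5}$/i.test(q))) {
    return "path";
  }
  // Identifiers: a single token with an inner capital or underscore
  // (camelCase, PascalCase, snake_case).
  if (/^[A-Za-z_$][\w$]*$/.test(q) && /[A-Z_]/.test(q.slice(1))) {
    return "identifier";
  }
  return "natural";
}

function metricWeights(kind: QueryKind): Record<MetricName, number> {
  const base: Record<MetricName, number> = {
    bm25: 1, cosine: 1, path: 1, symbol: 1, importGraph: 1, gitRecency: 1,
  };
  switch (kind) {
    case "identifier":
      return { ...base, bm25: 2, symbol: 2 }; // identifiers -> BM25 + symbol
    case "path":
      return { ...base, path: 2 }; // paths -> path match
    case "natural":
      return { ...base, cosine: 2 }; // natural language -> cosine
  }
}
```

Under these assumed heuristics, `classifyQuery("createAllToolDefinitions")` yields `"identifier"`, and a plain lowercase word falls through to `"natural"`.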
Progress Update
Implemented: Full semantic search subsystem
29 files changed: 13,545 additions across 21 source files and 6 test files.
Search subsystem (
nomic-embed-text-v1.5 in fp32 was 522MB and caused OOM on indexing. all-MiniLM-L6-v2 is 23MB, 384-dim, fast native inference via onnxruntime-node. First-index time for dreb repo: ~3 min (671 files, 6474 chunks). Subsequent queries: ~6s (includes model load; would be instant with warm model). Also made prefix handling model-aware (nomic needs search_document:/search_query: prefixes, most models don't).
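The model-aware prefix handling described above could look roughly like the following. This is a sketch under assumptions: the `PREFIXES` table, the `withModelPrefix` name, and the trailing space after each prefix are illustrative; only the `search_document:` and `search_query:` prefixes for nomic come from the commit message.

```typescript
// Hypothetical sketch: table layout and function name are assumptions.
type EmbedRole = "document" | "query";

interface ModelPrefixes {
  document: string;
  query: string;
}

// Only models that need task prefixes are listed; all others pass through unchanged.
const PREFIXES: Record<string, ModelPrefixes | undefined> = {
  "nomic-embed-text-v1.5": { document: "search_document: ", query: "search_query: " },
};

function withModelPrefix(modelId: string, role: EmbedRole, text: string): string {
  const prefixes = PREFIXES[modelId];
  return prefixes ? prefixes[role] + text : text;
}
```

With this shape, switching to all-MiniLM-L6-v2 needs no call-site changes, because an unlisted model simply gets the raw text.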
The search tool was registered via createAllToolDefinitions() but wasn't in the hardcoded defaultActiveToolNames array in agent-session.ts. Now conditionally included when the tool is registered (node:sqlite available).
The actual default tool list is in sdk.ts, not agent-session.ts. sdk.ts builds initialActiveToolNames from defaultActiveToolNames + alwaysActiveBuiltins, which is what gets passed to AgentSession. The agent-session.ts list is a fallback that's only used when no initialActiveToolNames is provided (SDK bypass).
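The two commit messages above describe building the active tool list from `defaultActiveToolNames` plus `alwaysActiveBuiltins`, with `search` added only when it was registered. A rough sketch of that logic, assuming a hypothetical `buildInitialActiveToolNames` helper and a set of registered tool names (neither is confirmed to exist in sdk.ts):

```typescript
// Hypothetical sketch: the helper name and the registeredToolNames parameter are
// assumptions; only defaultActiveToolNames and alwaysActiveBuiltins are named in the PR.
function buildInitialActiveToolNames(
  defaultActiveToolNames: readonly string[],
  alwaysActiveBuiltins: readonly string[],
  registeredToolNames: ReadonlySet<string>,
): string[] {
  const names = [...defaultActiveToolNames, ...alwaysActiveBuiltins];
  // Activate "search" only when it was actually registered (node:sqlite available).
  if (registeredToolNames.has("search") && !names.includes("search")) {
    names.push("search");
  }
  return names;
}
```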
Progress Update
Fixes since initial implementation
Model switch: Replaced nomic-embed-text-v1.5 (522MB fp32, OOM on indexing) with all-MiniLM-L6-v2 (23MB, 384-dim). First-index time for dreb repo: ~3 min (671 files, 6474 chunks). Subsequent queries: ~6s including model load.
Tool registration fix: The search tool was being created by
Verified with
Commits:
Progress tracked by mach6
Code Review
Critical
Finding 1: Non-atomic per-file indexing
Finding 2: Failed tree-sitter WASM init caches the rejected promise; AST chunking silently disabled forever
Finding 3: SQLite connection leaked per search invocation
Finding 4:
Finding 5:
Important
Finding 6: Model cached per-project instead of shared
Finding 7:
Finding 8: Tree-sitter parser WASM memory leaked when
Finding 9:
Finding 10: Fragile
Finding 11: Home dir scanning would be a recursive nightmare; needs shallow scan mode
Finding 12: GDScript not supported despite being a listed acceptance criterion
Finding 13: Major test coverage gaps: 4 critical modules entirely untested
Suggestions
Finding 14: Stale comments in
Finding 15: Dead YAML preamble code block in
Finding 16:
Finding 17: Redundant
Finding 18:
Finding 19:
Finding 20: Duplicated comment-lookback logic in TOML chunker
Strengths
Agents run: code-reviewer, error-auditor, test-reviewer, completeness-checker, simplifier
Reviewed by mach6
Review Assessment
Classifications
Action Plan
Assessment by mach6
… coverage

Critical fixes:
- Wrap per-file indexing in db.transaction() for atomicity (finding 1)
- Reset tree-sitter initPromise on failure to allow retries (finding 2)
- Fix SQLite connection leak in search tool stats (finding 3)
- Replace per-file execSync with single git log call (finding 4)
- Batch getChunksById to avoid SQLite bind variable limit (finding 5)

Metric fixes (all 6 now functional):
- BM25: use OR between terms + stopword removal (was implicit AND)
- Import graph: fix path matching with extension stripping (0/1253 matched)
- Import graph: add self-boost for connected seeds, threshold seed set
- Show all non-zero metrics in results (was limited to top 3)

Important fixes:
- Model cache uses ~/.dreb/agent/models/ (shared, not per-project)
- Fix parser WASM leak on null tree
- Delete dead metrics/cosine.ts
- Add shallow scan mode for home directory
- Add .gitignore entries for .dreb/index/ and .dreb/agent/

Nitpick cleanup:
- Fix stale embedder comments (nomic→MiniLM, 768→384)
- Remove dead YAML preamble block, deduplicate TOML lookback
- Replace blobToFloat32 with shared unpackVector
- Use db.transaction() in batchUpsertEmbeddings
- Remove redundant targetNode variable, deduplicate toPosix call

Tests: +125 new tests (scanner 43, metrics 29, chunker 29, index-manager 24)
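The finding 5 fix in the commit message above batches ID lookups because SQLite caps the number of bound parameters per statement (`SQLITE_MAX_VARIABLE_NUMBER`). A minimal sketch of the batching pattern follows; `runQuery` stands in for the real database call, and the `chunks` table name and batch size of 500 are assumptions rather than values taken from the PR.

```typescript
// Hypothetical sketch: runQuery, the table name, and the batch size are assumptions.
function chunkArray<T>(items: readonly T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

function getRowsByIdBatched<Row>(
  ids: readonly number[],
  runQuery: (sql: string, params: number[]) => Row[],
  batchSize = 500,
): Row[] {
  const rows: Row[] = [];
  for (const batch of chunkArray(ids, batchSize)) {
    // One "?" placeholder per id keeps each statement under the bind-variable limit.
    const placeholders = batch.map(() => "?").join(", ");
    rows.push(...runQuery(`SELECT * FROM chunks WHERE id IN (${placeholders})`, batch));
  }
  return rows;
}
```

Passing the query function in as a parameter keeps the sketch independent of any particular SQLite binding.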
Progress Update
Review findings fixed + all 6 POEM metrics now functional
16 files changed: 1,812 additions, 220 deletions across 12 source files and 4 new test files.
Critical fixes (findings 1-5)
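Among the critical fixes, finding 2 concerned a cached rejected promise that disabled AST chunking permanently after one failed tree-sitter WASM init. One way to make such an initializer retryable is sketched below; `createRetryableInit` is a hypothetical helper written for illustration, and `init` stands in for the actual initialization call.

```typescript
// Hypothetical sketch: createRetryableInit is not the PR's code.
function createRetryableInit<T>(init: () => Promise<T>): () => Promise<T> {
  let initPromise: Promise<T> | null = null;
  return () => {
    // Reuse the in-flight or successful attempt.
    if (initPromise !== null) return initPromise;
    const attempt = init();
    initPromise = attempt;
    // On failure, drop the cached promise so the next call starts a fresh attempt.
    attempt.catch(() => {
      if (initPromise === attempt) initPromise = null;
    });
    // The caller still receives the original promise, including its rejection.
    return attempt;
  };
}
```

Concurrent callers share a single attempt, a success stays cached, and only a failure clears the cache.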
Metric fixes — all 6 metrics now producing scores
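The BM25 fix replaced FTS5's implicit AND between terms with an explicit OR and dropped stopwords, so a natural-language query no longer requires every word to appear in a chunk. A sketch of such a query builder, assuming a hypothetical `buildFtsOrQuery` function with an illustrative stopword list and tokenizer:

```typescript
// Hypothetical sketch: the stopword list and tokenizer are illustrative assumptions.
const STOPWORDS = new Set([
  "a", "an", "the", "is", "are", "of", "in", "to", "for",
  "how", "does", "do", "what", "where",
]);

function buildFtsOrQuery(query: string): string {
  const terms = query
    .toLowerCase()
    .split(/[^a-z0-9_]+/)
    .filter((term) => term.length > 0 && !STOPWORDS.has(term));
  // Quote each term so FTS5 treats it as a string, then join with OR;
  // bare terms separated by spaces would be combined with an implicit AND.
  return Array.from(new Set(terms))
    .map((term) => `"${term}"`)
    .join(" OR ");
}
```

If every term is a stopword the result is an empty string, so a caller would need to skip the FTS5 lookup in that case rather than run an empty MATCH.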
Important fixes (findings 6-12)
Nitpick cleanup (findings 14-20)
New tests (+125)
Total test count: 210 search tests (85 existing + 125 new), 512 project-wide
Commit:
Progress tracked by mach6
Code Review
Critical
No critical findings.
Important
Finding 1:
Finding 2: Two separate
Finding 3:
Finding 4: Bare
Finding 5:
Finding 6: Documentation not updated for the new
Finding 7: Major test coverage gaps: orchestration layer and 2 of 6 metrics untested
Suggestions
Finding 8:
Finding 9: Global memory index architecture diverges from spec: no
Finding 10:
Finding 11: Dynamic
Finding 12: Dead parameter
Finding 13:
Finding 14:
Finding 15:
Finding 16:
Strengths
Agents run: code-reviewer, error-auditor, test-reviewer, completeness-checker, simplifier
Reviewed by mach6
Review Assessment
Classifications
Action Plan
Must fix before merge:
Should fix in this PR:
Defer (create issues):
Assessment by mach6
- Update docs: 10→11 built-in tools, add search to all tool lists (README.md, extensions.md, sdk.ts JSDoc)
- Cap git log with --max-count=10000 in git-recency metric
- Replace dynamic import with static import for vector-store in search.ts
- Add tests for computeImportGraphScores (8), computeBm25Scores (5), getChunksById batching (7), SearchEngine.search() integration (11)
Progress Update
Review round 2 fixes: docs, tests, git-recency cap, static import
8 files changed: 678 additions, 10 deletions.
Documentation (finding 6)
Bug fixes
New tests — 31 tests added (241 total search tests, 512 project-wide)
Commit:
Progress tracked by mach6
Code Review (Round 3)
Critical
No critical findings.
Important
Finding 1:
Finding 2:
Finding 3:
Finding 4:
Finding 5:
Finding 6: Broken
Finding 7: Search tool
Finding 8: Six of nine tree-sitter languages completely untested
Finding 9:
Finding 10: File update path does not verify stale chunk/embedding elimination
Finding 11:
Suggestions
Finding 12: Gap chunks (
Finding 13:
Finding 14:
Finding 15: Dead
Finding 16: Redundant
Finding 17:
Finding 18:
Finding 19: Module-level
Strengths
Agents run: code-reviewer, error-auditor, test-reviewer, completeness-checker, simplifier
Reviewed by mach6
Review Assessment
Classifications
Action Plan
Correctness bugs (fix first):
Resource/performance:
Documentation:
Test coverage:
Assessment by mach6
… scanner paths, error handling, docs, tests
Progress Update
Review round 3 fixes: 5 bugs, docs, 35 new tests
12 files changed: 770 additions, 85 deletions across 7 source files and 5 test files.
Bug fixes (findings 1-5)
Documentation (finding 6)
New tests — 35 tests added (276 search tests, 547 project-wide)
Commit:
Progress tracked by mach6
Closes #108
Adds a built-in `search` tool that uses embeddings + FTS5 to support natural language queries over the codebase. Uses POEM-based multi-metric ranking (FTS5 BM25, vector cosine, path match, symbol match, import graph, git recency) with AST-aware code chunking via tree-sitter.
Key design: zero native addons. Uses `node:sqlite` (built-in Node 22), `web-tree-sitter` (WASM), and `@huggingface/transformers` (WASM).
Implementation plan posted as a comment below.