Semantic code discovery for AI agents -- find code by meaning, not just name.
Ask natural language questions about your codebase -- clewdex indexes your code with AST-aware chunking, embeds it with Voyage AI, stores it in Qdrant, and serves results through both a CLI and an MCP server that Claude Code can call directly.
pip install clewdex
Or with pipx, Homebrew, or npx:
pipx install clewdex # isolated install
brew install ruminaider/tap/clewdex # macOS
npx clewdex # run without installing
docker run -d -p 6333:6333 qdrant/qdrant:v1.16.1
export VOYAGE_API_KEY=pa-xxxxxxxxxxxxxxxxxxxx
Get a key at dash.voyageai.com.
clew index /path/to/your/project --full
clew search "how do we handle authentication"
Clew and grep are complementary tools that handle different types of code discovery:
Use clew when you:
- Need to find code by concept, not by identifier name ("where is error handling for the pharmacy API")
- Are exploring an unfamiliar codebase and don't know what to search for
- Want to trace structural relationships (call chains, inheritance, imports)
- Need vocabulary bridging -- business language to code identifiers
- Want to understand how a feature is implemented across multiple files
Use grep when you:
- Know the exact pattern you're looking for (raise ValidationError, @celery_app.task)
- Need exhaustive enumeration -- every instance of a pattern, guaranteed complete
- Are matching literal text in comments, strings, or config files
- Need structural completeness (grep finds things in places BM25 cannot reach)
Use both (via agent skills) when you:
- Need to discover a concept AND find all its instances
- Are debugging and need both semantic context and exact pattern locations
- Want to verify that clew found everything relevant
- Hybrid search -- Dense embeddings (Voyage voyage-code-3) + BM25 keyword matching fused with Reciprocal Rank Fusion, optionally re-ranked with Voyage rerank-2.5
- Multi-vector architecture -- Three named vectors (signature, semantic, body) with intent-adaptive routing for precise retrieval
- AST-aware chunking -- tree-sitter parses Python, TypeScript, and JavaScript into semantic units (functions, classes, components) with token-aware fallback splitting
- Code relationship tracing -- Extracts imports, calls, inheritance, decorators, JSX renders, test mappings, and API boundaries; traversable via BFS graph queries
- Incremental indexing -- Git-aware change detection (with file-hash fallback) so re-indexing only touches what changed
- NL descriptions -- LLM-generated descriptions for undocumented code, prepended before embedding to improve search quality
- Compact MCP responses -- ~20x token reduction by default; returns signatures + docstring previews instead of full source
- Multi-collection -- Separate code and docs collections with intent-driven routing
- Confidence self-assessment -- Z-score based confidence scoring included in results as informational metadata
- Explicit exhaustive mode -- --mode exhaustive runs grep alongside semantic search for completeness when needed
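The hybrid search feature above fuses a dense ranking and a BM25 ranking with Reciprocal Rank Fusion. A minimal sketch of that fusion step follows. The function name, the chunk IDs, and the constant k=60 (a commonly used default) are assumptions for illustration, not clew's actual code or settings.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list, best first.

    Each list contributes 1 / (k + rank) to every ID it contains,
    so items ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists from the two retrievers.
dense = ["auth.py::login", "auth.py::verify_token", "users.py::User"]
bm25 = ["auth.py::verify_token", "settings.py::AUTH_BACKENDS", "auth.py::login"]

fused = reciprocal_rank_fusion([dense, bm25])
print(fused)
```

Because RRF only looks at ranks, the dense similarity scores and BM25 scores never need to be put on a common scale.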
| Capability | clew | grepai | CodeSight | CodeGrok | Cursor |
|---|---|---|---|---|---|
| Multi-vector search (3 vectors) | Yes | No | No | No | No |
| BM25 hybrid + RRF fusion | Yes | No | Yes | No | No |
| Reranking (calibrated scores) | Yes | No | No | No | ? |
| Intent-adaptive routing | Yes | No | No | No | No |
| Relationship graph + trace | 7 types | 2 types | No | No | No |
| NL descriptions for code | Yes | No | No | No | No |
| Confidence self-assessment | Yes | No | No | No | No |
| MCP server | 5 tools | 3 tools | Yes | 4 tools | N/A |
| Agent skills / cookbooks | Yes | No | No | No | N/A |
| Compact responses (token-aware) | 20x | Yes | ? | Yes | N/A |
| Fully offline | Yes* | Yes | Yes | Yes | N/A |
| Open source | Yes | Yes | Yes | Yes | No |
* Requires Voyage AI API for embeddings and reranking. Qdrant runs locally.
Index a codebase for search.
# Incremental -- only re-index changed files
clew index /path/to/project
# Full reindex
clew index /path/to/project --full
# Generate NL descriptions for undocumented code (requires ANTHROPIC_API_KEY)
clew index /path/to/project --nl-descriptions
# Index specific files
clew index --files src/auth.py --files src/models.py
Search the indexed codebase.
# Natural language query
clew search "where is the rate limiter configured"
# Explicit exhaustive mode -- runs grep alongside semantic search
clew search "all error handlers" --mode exhaustive
# Filter by language
clew search "database models" --language python
# Filter by chunk type
clew search "API endpoints" --chunk-type function
# Set intent explicitly (code, docs, debug, location)
clew search "why does login fail" --intent debug
# JSON output
clew search "user authentication" --raw
Trace code relationships via BFS graph traversal.
# Show all relationships for an entity
clew trace "src/auth/models.py::User"
# Only inbound (what depends on this)
clew trace "src/auth/models.py::User" --direction inbound
# Limit depth and filter types
clew trace "src/api/views.py::handle_request" --depth 3 --type calls --type imports
# JSON output
clew trace "src/auth/models.py::User" --raw
Relationship types: imports, calls, inherits, decorates, renders, tests, calls_api
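To make the traversal concrete, here is a minimal sketch of a depth-limited breadth-first walk over relationship edges. The in-memory edge list, the entity names beyond those in the examples above, and the "outbound" direction value are assumptions for illustration; clew's real storage and implementation may differ.

```python
from collections import deque

# Hypothetical edge list: (source, relationship_type, target).
EDGES = [
    ("src/api/views.py::handle_request", "calls", "src/auth/service.py::authenticate"),
    ("src/auth/service.py::authenticate", "calls", "src/auth/models.py::User"),
    ("src/auth/models.py::AdminUser", "inherits", "src/auth/models.py::User"),
    ("tests/test_auth.py::test_login", "tests", "src/auth/service.py::authenticate"),
]


def trace(entity, direction="both", max_depth=2, relationship_types=None):
    """Breadth-first walk returning (depth, source, type, target) tuples."""
    seen_nodes = {entity}
    seen_edges = set()
    queue = deque([(entity, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand past the depth limit
        for edge in EDGES:
            src, rel, dst = edge
            if relationship_types and rel not in relationship_types:
                continue
            if direction in ("outbound", "both") and src == node:
                neighbor = dst
            elif direction in ("inbound", "both") and dst == node:
                neighbor = src
            else:
                continue
            if edge in seen_edges:
                continue  # report each edge once, even when reached from both ends
            seen_edges.add(edge)
            found.append((depth + 1, src, rel, dst))
            if neighbor not in seen_nodes:
                seen_nodes.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found


for hop in trace("src/auth/models.py::User", direction="inbound", max_depth=2):
    print(hop)
```

The queue guarantees that every edge at depth 1 is reported before any edge at depth 2, which is what makes a depth limit meaningful.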
Show system health and index statistics.
clew status
Start the MCP server (stdio transport) for Claude Code integration.
clew serve
Add clewdex to Claude Code's .mcp.json:
{
  "mcpServers": {
    "clew": {
      "command": "clew",
      "args": ["serve"],
      "env": {
        "VOYAGE_API_KEY": "pa-xxxxxxxxxxxxxxxxxxxx",
        "QDRANT_URL": "http://localhost:6333"
      }
    }
  }
}
Semantic search over the indexed codebase.
search(query, limit=5, collection="code", active_file=None,
intent=None, filters=None, detail="compact", mode=None)
- detail="compact" (default) -- returns signature + docstring snippet
- detail="full" -- returns complete source content
- mode="exhaustive" -- runs grep alongside semantic search for completeness
- filters -- metadata filters: language, chunk_type, app_name, layer, is_test
Read file content with optional related code chunks.
get_context(file_path, line_start=None, line_end=None, include_related=False)
Search for context about a symbol or question in a file.
explain(file_path, symbol=None, question=None, detail="compact")
Traverse code relationships (imports, calls, inheritance, etc.).
trace(entity, direction="both", max_depth=2, relationship_types=None)
Check health or trigger re-indexing.
index_status(action="status", project_root=None)
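MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests sent over the stdio transport. The sketch below builds two such request payloads using the parameter names from the signatures above. The helper name `tool_call` is hypothetical, and the shape of `filters` (a dict keyed by filter name) is an assumption; Claude Code builds these messages itself, so this is only useful for understanding or testing the wire format.

```python
import json


def tool_call(request_id, name, arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


search_request = tool_call(
    1,
    "search",
    {
        "query": "where is the rate limiter configured",
        "limit": 5,
        "detail": "compact",
        # Assumed shape: a mapping from filter name to value.
        "filters": {"language": "python", "is_test": False},
    },
)

trace_request = tool_call(
    2,
    "trace",
    {"entity": "src/auth/models.py::User", "direction": "inbound", "max_depth": 2},
)

print(json.dumps(search_request, indent=2))
```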
| Variable | Required | Default | Description |
|---|---|---|---|
| VOYAGE_API_KEY | Yes | -- | Voyage AI API key for embeddings and re-ranking |
| QDRANT_URL | No | http://localhost:6333 | Qdrant server endpoint |
| QDRANT_API_KEY | No | -- | Qdrant API key (if auth is enabled) |
| CLEW_CACHE_DIR | No | Auto-detected from git root | SQLite cache directory (.clew/) |
| CLEW_LOG_LEVEL | No | INFO | Logging verbosity |
| ANTHROPIC_API_KEY | No | -- | Required for NL description generation |
The cache directory resolves in order: CLEW_CACHE_DIR env var, then {git_root}/.clew/, then .clew/ relative to the working directory. This ensures the MCP server and CLI share the same cache.
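The resolution order above can be sketched as a small pure function. The function name and its parameters are hypothetical; passing the environment, git root, and working directory in as arguments is a choice made here so the precedence is easy to test, not a description of clew's internals.

```python
from pathlib import Path


def resolve_cache_dir(env, git_root, cwd):
    """Return the cache directory following the documented precedence."""
    override = env.get("CLEW_CACHE_DIR")
    if override:
        return Path(override)          # 1. explicit environment variable
    if git_root is not None:
        return Path(git_root) / ".clew"  # 2. {git_root}/.clew/
    return Path(cwd) / ".clew"         # 3. .clew/ under the working directory


# The MCP server may start in a different working directory than the CLI,
# but both land on the same path as long as they share a git root.
print(resolve_cache_dir({}, "/repo", "/repo/src"))
print(resolve_cache_dir({}, "/repo", "/tmp"))
```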
Create a config.yaml in your project root for fine-grained control:
project:
  name: "my-project"
  root: "."
collections:
  code:
    include:
      - "src/**/*.py"
      - "frontend/**/*.tsx"
    exclude:
      - "**/migrations/*.py"
      - "**/__pycache__/**"
  docs:
    include:
      - "**/*.md"
chunking:
  default_max_tokens: 3000
  overlap_tokens: 200
terminology_file: indexer/terminology.yaml

                  +==============+
                  |  Claude Code |
                  | (MCP client) |
                  +------+-------+
                         | stdio
                  +------v-------+
                  |  MCP Server  |  search, get_context, explain, trace, index_status
                  +------+-------+
                         |
            +------------v------------+
            |     Search Pipeline     |
            | enhance -> classify ->  |
            | hybrid search -> rerank |
            +------------+------------+
                         |
            +------------v------------+
            |   Qdrant Collections    |
            | code: py/ts/tsx/js/jsx  |
            | docs: markdown          |
            +------------+------------+
                         |
            +------------v------------+
            |    Indexing Pipeline    |
            | discover -> chunk ->    |
            | enrich -> embed ->      |
            | upsert + relationships  |
            +------------+------------+
                         |
      +----------+-------+-------+----------+
      v          v       v       v          v
 tree-sitter  Voyage  SQLite    git     Anthropic
 (AST parse)  (embed) (cache)  (diff)   (NL desc)
- Query enhancement -- Terminology expansion via YAML (abbreviations, synonyms)
- Intent classification -- Heuristic routing: CODE, DOCS, DEBUG, LOCATION
- Hybrid search -- Dense + BM25 multi-prefetch with structural boosting (same-module, test files for debug intent)
- Re-ranking -- Voyage rerank-2.5 for final ordering
- Confidence assessment -- Z-score based, informational only (included in result metadata)
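As an illustration of the confidence step, one plausible reading of "Z-score based" is to measure how far the top result's score stands above the rest of the result set. The sketch below does that. The function name, the thresholds, the labels, and the choice of population standard deviation are all assumptions; the README does not specify how clew computes the value.

```python
from statistics import mean, pstdev


def top_result_confidence(scores, high=1.5, low=0.75):
    """Label how much the best score stands out, using its z-score.

    Thresholds and labels are illustrative assumptions.
    """
    if len(scores) < 2:
        return {"z_score": 0.0, "label": "low"}
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: nothing stands out.
        return {"z_score": 0.0, "label": "low"}
    z = (max(scores) - mu) / sigma
    label = "high" if z >= high else "medium" if z >= low else "low"
    return {"z_score": round(z, 2), "label": label}


# One clear winner versus a flat, evenly spaced result list.
print(top_result_confidence([0.91, 0.42, 0.40, 0.39, 0.38]))
print(top_result_confidence([0.50, 0.49, 0.48, 0.47, 0.46]))
```

Since the score is informational only, a caller would typically surface it alongside the results and leave the results themselves unfiltered.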
When --mode exhaustive is specified, grep runs in parallel with the semantic pipeline and results are merged and deduplicated before returning.
| File pattern | Strategy | Token range |
|---|---|---|
| models.py | Class + fields as unit | 1,500 - 3,000 |
| views.py | Class as unit; split large actions | 2,000 - 4,000 |
| tasks.py | Function with decorators | 1,000 - 2,000 |
| *.tsx, *.jsx | Component boundaries | 1,500 - 3,000 |
| *.md | Section-level by headers | 1,000 - 2,000 |
| Migrations | Skipped | -- |
Fallback chain: tree-sitter AST -> token-recursive splitting -> line-based splitting.
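The fallback chain can be sketched as a list of strategies tried in order, each one signalling failure by raising. This sketch makes several substitutions, all of them assumptions: Python's built-in `ast` module stands in for tree-sitter, whitespace word counts stand in for a real tokenizer, and every function name is hypothetical.

```python
import ast


def count_tokens(text):
    # Crude stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def ast_chunks(source):
    """One chunk per top-level function or class; raises if unparsable."""
    tree = ast.parse(source)
    chunks = [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not chunks:
        raise ValueError("no functions or classes found")
    return chunks


def token_recursive_chunks(text, max_tokens):
    """Split on blank lines, halving recursively until each piece fits."""
    if count_tokens(text) <= max_tokens:
        return [text]
    parts = [p for p in text.split("\n\n") if p.strip()]
    if len(parts) < 2:
        raise ValueError("no blank-line boundary to split on")
    mid = len(parts) // 2
    left, right = "\n\n".join(parts[:mid]), "\n\n".join(parts[mid:])
    return token_recursive_chunks(left, max_tokens) + token_recursive_chunks(right, max_tokens)


def line_chunks(text, lines_per_chunk=40):
    """Last resort: fixed-size windows of lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk]) for i in range(0, len(lines), lines_per_chunk)]


def chunk(source, max_tokens=3000):
    """Try AST chunking, then token-recursive splitting, then line-based."""
    strategies = (ast_chunks, lambda s: token_recursive_chunks(s, max_tokens))
    for strategy in strategies:
        try:
            return strategy(source)
        except (SyntaxError, ValueError):
            continue  # fall through to the next, cruder strategy
    return line_chunks(source)


print(chunk("def a():\n    return 1\n\nclass B:\n    pass\n"))
```

Ordering the strategies from most to least structure-aware means a file only loses semantic boundaries when parsing genuinely fails.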
git clone https://github.com/ruminaider/clew.git
cd clew
pip install -e ".[dev]"
# All tests with coverage
pytest --cov=clew -v
# Integration tests (requires running Qdrant)
pytest -m integration
# Single test file
pytest tests/search/test_hybrid.py -v
ruff format . # Format
ruff check . # Lint
mypy clew/ # Type check (strict mode)
clew/
+-- chunker/ # AST parsing, language strategies, token counting
+-- clients/ # External service wrappers (Voyage, Qdrant, Anthropic)
+-- indexer/ # Pipeline, caching, change detection, relationship extraction
| +-- extractors/ # Pluggable per-language relationship extractors
+-- search/ # Engine, hybrid retrieval, intent classification, re-ranking
+-- cli.py # Typer CLI
+-- mcp_server.py # FastMCP server (5 tools)
+-- config.py # Environment variable loading
+-- factory.py # Component wiring (no global state)
+-- models.py # Pydantic v2 config models
+-- exceptions.py # Error hierarchy with fix hints
+-- discovery.py # File discovery with ignore patterns and safety checks
+-- safety.py # File size, chunk count, collection limits
| Problem | Fix |
|---|---|
| Qdrant not running | docker compose up -d qdrant or docker run -d -p 6333:6333 qdrant/qdrant:v1.16.1 |
| VOYAGE_API_KEY not set | export VOYAGE_API_KEY=pa-... |
| No search results | Run clew index --full to reindex |
| MCP server can't find cache | Set CLEW_CACHE_DIR to an absolute path, or run from within the git repo |
| Stale results after code changes | Run clew index (incremental) to pick up changes |
MIT