clew

Semantic code discovery for AI agents -- find code by meaning, not just name.

Ask natural-language questions about your codebase. clewdex indexes your code with AST-aware chunking, embeds it with Voyage AI, stores the vectors in Qdrant, and serves results through both a CLI (`clew`) and an MCP server that Claude Code can call directly.

Quick start

1. Install clewdex

pip install clewdex

Or with pipx, Homebrew, or npx:

pipx install clewdex                          # isolated install
brew install ruminaider/tap/clewdex           # macOS
npx clewdex                                   # run without installing

2. Start Qdrant

docker run -d -p 6333:6333 qdrant/qdrant:v1.16.1

3. Set your API key

export VOYAGE_API_KEY=pa-xxxxxxxxxxxxxxxxxxxx

Get a key at dash.voyageai.com.

4. Index and search

clew index /path/to/your/project --full
clew search "how do we handle authentication"

When to use clew vs grep

Clew and grep are complementary tools that handle different types of code discovery:

Use clew when you:

Need to find code by concept, not by identifier name ("where is error handling for the pharmacy API")
Are exploring an unfamiliar codebase and don't know what to search for
Want to trace structural relationships (call chains, inheritance, imports)
Need vocabulary bridging -- business language to code identifiers
Want to understand how a feature is implemented across multiple files

Use grep when you:

Know the exact pattern you're looking for (raise ValidationError, @celery_app.task)
Need exhaustive enumeration -- every instance of a pattern, guaranteed complete
Are matching literal text in comments, strings, or config files
Need completeness beyond the index (grep scans raw files directly, including ones clew skips, such as migrations, so it reaches places BM25 cannot)

Use both (via agent skills) when you:

Need to discover a concept AND find all its instances
Are debugging and need both semantic context and exact pattern locations
Want to verify that clew found everything relevant

Features

Hybrid search -- Dense embeddings (Voyage voyage-code-3) + BM25 keyword matching fused with Reciprocal Rank Fusion, optionally re-ranked with Voyage rerank-2.5
Multi-vector architecture -- Three named vectors (signature, semantic, body) with intent-adaptive routing for precise retrieval
AST-aware chunking -- tree-sitter parses Python, TypeScript, and JavaScript into semantic units (functions, classes, components) with token-aware fallback splitting
Code relationship tracing -- Extracts imports, calls, inheritance, decorators, JSX renders, test mappings, and API boundaries; traversable via BFS graph queries
Incremental indexing -- Git-aware change detection (with file-hash fallback) so re-indexing only touches what changed
NL descriptions -- LLM-generated descriptions for undocumented code, prepended before embedding to improve search quality
Compact MCP responses -- ~20x token reduction by default; returns signatures + docstring previews instead of full source
Multi-collection -- Separate code and docs collections with intent-driven routing
Confidence self-assessment -- Z-score based confidence scoring included in results as informational metadata
Explicit exhaustive mode -- --mode exhaustive runs grep alongside semantic search for completeness when needed
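Reciprocal Rank Fusion merges the dense and BM25 result lists by rank rather than by raw score, so the two scoring scales never need to be calibrated against each other. Here is a minimal sketch of the standard formula. It assumes each retriever returns an ordered list of chunk IDs. The function name, the chunk IDs, and `k = 60` (the constant commonly used in the RRF literature) are illustrative, not clew's actual code:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked lists of chunk IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two retrievers.
dense = ["auth.py::login", "auth.py::logout", "models.py::User"]
bm25 = ["models.py::User", "auth.py::login", "views.py::LoginView"]
for chunk_id, score in reciprocal_rank_fusion([dense, bm25]):
    print(f"{score:.4f}  {chunk_id}")
```

In this example, `models.py::User` appears in both lists and outranks `auth.py::logout`, which only the dense retriever returned. Rewarding agreement between retrievers is what makes RRF a good fit for mixing semantic and keyword signals.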

Competitive comparison

Capability	clew	grepai	CodeSight	CodeGrok	Cursor
Multi-vector search (3 vectors)	Yes	No	No	No	No
BM25 hybrid + RRF fusion	Yes	No	Yes	No	No
Reranking (calibrated scores)	Yes	No	No	No	?
Intent-adaptive routing	Yes	No	No	No	No
Relationship graph + trace	7 types	2 types	No	No	No
NL descriptions for code	Yes	No	No	No	No
Confidence self-assessment	Yes	No	No	No	No
MCP server	5 tools	3 tools	Yes	4 tools	N/A
Agent skills / cookbooks	Yes	No	No	No	N/A
Compact responses (token-aware)	~20x	Yes	?	Yes	N/A
Fully offline	Partial*	Yes	Yes	Yes	N/A
Open source	Yes	Yes	Yes	Yes	No

* Requires Voyage AI API for embeddings and reranking. Qdrant runs locally.

CLI usage

`clew index`

Index a codebase for search.

# Incremental -- only re-index changed files
clew index /path/to/project

# Full reindex
clew index /path/to/project --full

# Generate NL descriptions for undocumented code (requires ANTHROPIC_API_KEY)
clew index /path/to/project --nl-descriptions

# Index specific files
clew index --files src/auth.py --files src/models.py
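Incremental indexing uses git to detect changes and falls back to file hashes. The sketch below covers only the hash fallback. It assumes the previous run's hashes were saved somewhere: clew keeps its cache in SQLite, and a plain dict stands in for it here. `detect_changes` and its return shape are hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def file_hash(path: Path) -> str:
    """SHA-256 of the file's bytes; any content edit changes the digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def detect_changes(
    files: List[Path], previous: Dict[str, str]
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Compare current hashes with the last run's.

    Returns (new_or_modified, deleted, current_hashes).
    """
    current = {str(p): file_hash(p) for p in files}
    changed = [p for p, digest in current.items() if previous.get(p) != digest]
    deleted = [p for p in previous if p not in current]
    return changed, deleted, current


with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp, "a.py"), Path(tmp, "b.py")
    a.write_text("x = 1\n")
    b.write_text("y = 2\n")
    a_path, b_path = str(a), str(b)

    # First run: nothing cached yet, so both files count as changed.
    first_changed, _, state = detect_changes([a, b], {})

    # Edit a.py and delete b.py, then run again against the saved state.
    a.write_text("x = 42\n")
    b.unlink()
    second_changed, second_deleted, _ = detect_changes([a], state)

print(second_changed == [a_path], second_deleted == [b_path])  # prints: True True
```

In this scheme, only the first list would be re-chunked and re-embedded, and the entries in the second list would be removed from the index.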

`clew search`

Search the indexed codebase.

# Natural language query
clew search "where is the rate limiter configured"

# Explicit exhaustive mode -- runs grep alongside semantic search
clew search "all error handlers" --mode exhaustive

# Filter by language
clew search "database models" --language python

# Filter by chunk type
clew search "API endpoints" --chunk-type function

# Set intent explicitly (code, docs, debug, location)
clew search "why does login fail" --intent debug

# JSON output
clew search "user authentication" --raw
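If `--intent` is omitted, clew classifies the query heuristically as CODE, DOCS, DEBUG, or LOCATION. The keyword rules below are a hypothetical illustration of such a classifier, not clew's actual rules. `classify_intent` and the word lists are invented for this sketch:

```python
import re

# Hypothetical keyword sets; clew's real heuristics are not documented here.
DEBUG_WORDS = {"why", "fail", "fails", "failing", "error", "bug", "crash", "traceback", "exception"}
LOCATION_WORDS = {"where", "located", "defined"}
DOCS_WORDS = {"docs", "documentation", "readme", "guide", "overview"}


def classify_intent(query: str) -> str:
    """Return one of "debug", "location", "docs", or "code" (the fallback)."""
    words = set(re.findall(r"[a-z_]+", query.lower()))
    # Checked in priority order: a debugging question wins over a location one.
    if words & DEBUG_WORDS:
        return "debug"
    if words & LOCATION_WORDS:
        return "location"
    if words & DOCS_WORDS:
        return "docs"
    return "code"


for q in ["why does login fail", "where is the rate limiter configured", "database models"]:
    print(f"{q!r} -> {classify_intent(q)}")
```

Passing `--intent` explicitly skips this step, which helps when a query's wording misleads the heuristic.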

`clew trace`

Trace code relationships via BFS graph traversal.

# Show all relationships for an entity
clew trace "src/auth/models.py::User"

# Only inbound (what depends on this)
clew trace "src/auth/models.py::User" --direction inbound

# Limit depth and filter types
clew trace "src/api/views.py::handle_request" --depth 3 --type calls --type imports

# JSON output
clew trace "src/auth/models.py::User" --raw

Relationship types: imports, calls, inherits, decorates, renders, tests, calls_api
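`clew trace` walks the relationship graph breadth-first from the given entity. Below is a minimal sketch over an in-memory edge list. It assumes edges are `(source, relationship, target)` triples and that "outbound" is the counterpart of `--direction inbound`. The `bfs_trace` name and the sample edges are hypothetical:

```python
from collections import deque
from typing import List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (source, relationship, target)


def bfs_trace(
    edges: List[Edge],
    entity: str,
    direction: str = "both",
    max_depth: int = 2,
    relationship_types: Optional[List[str]] = None,
) -> List[Tuple[int, Edge]]:
    """Return (depth, edge) pairs reachable from `entity` within `max_depth` hops."""
    found: List[Tuple[int, Edge]] = []
    reported: Set[Edge] = set()
    visited = {entity}
    queue = deque([(entity, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for edge in edges:
            source, rel, target = edge
            if relationship_types and rel not in relationship_types:
                continue
            if direction in ("outbound", "both") and source == node:
                neighbor = target
            elif direction in ("inbound", "both") and target == node:
                neighbor = source
            else:
                continue
            if edge not in reported:  # an edge can be reached from both ends
                reported.add(edge)
                found.append((depth + 1, edge))
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found


EDGES = [
    ("views.py::handle_request", "calls", "auth.py::authenticate"),
    ("auth.py::authenticate", "calls", "models.py::User"),
    ("models.py::User", "inherits", "models.py::BaseModel"),
    ("tests/test_auth.py::test_login", "tests", "auth.py::authenticate"),
]

# What depends on User, directly or one hop further out?
for depth, (src, rel, dst) in bfs_trace(EDGES, "models.py::User", direction="inbound"):
    print(depth, src, rel, dst)
```

Breadth-first order reports direct relationships before indirect ones. This means a small `--depth` still returns the closest and usually most relevant neighbors.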

`clew status`

Show system health and index statistics.

clew status

`clew serve`

Start the MCP server (stdio transport) for Claude Code integration.

clew serve

MCP integration

Add clewdex to Claude Code's .mcp.json:

{
  "mcpServers": {
    "clew": {
      "command": "clew",
      "args": ["serve"],
      "env": {
        "VOYAGE_API_KEY": "pa-xxxxxxxxxxxxxxxxxxxx",
        "QDRANT_URL": "http://localhost:6333"
      }
    }
  }
}

MCP tools

`search`

Semantic search over the indexed codebase.

search(query, limit=5, collection="code", active_file=None,
       intent=None, filters=None, detail="compact", mode=None)

detail="compact" (default) -- returns signature + docstring snippet
detail="full" -- returns complete source content
mode="exhaustive" -- runs grep alongside semantic search for completeness
filters -- metadata filters: language, chunk_type, app_name, layer, is_test
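In compact mode, each result carries only a signature and a docstring snippet instead of the full chunk. The sketch below approximates that reduction for Python source. It uses the standard-library `ast` module rather than tree-sitter, which clew uses. `compact_view` and the 80-character preview length are hypothetical:

```python
import ast
from typing import List


def compact_view(source: str, preview_chars: int = 80) -> List[str]:
    """Summarize top-level functions and classes as 'signature  # docstring preview'."""
    lines = source.splitlines()
    summary = []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        # node.lineno points at the def/class line itself, not at any decorators.
        signature = lines[node.lineno - 1].strip()
        docstring = ast.get_docstring(node) or ""
        preview = docstring.splitlines()[0][:preview_chars] if docstring else ""
        summary.append(f"{signature}  # {preview}" if preview else signature)
    return summary


SOURCE = '''
def login(user, password):
    """Authenticate a user and start a session.

    Raises AuthError if the credentials are invalid.
    """
    return True


class User:
    pass
'''

for line in compact_view(SOURCE):
    print(line)
```

Only the first line of a multi-line signature is kept, which is acceptable for a preview. Callers that need the body can request `detail="full"`.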

`get_context`

Read file content with optional related code chunks.

get_context(file_path, line_start=None, line_end=None, include_related=False)

`explain`

Search for context that explains a symbol in a file, or that answers a question about the file.

explain(file_path, symbol=None, question=None, detail="compact")

`trace`

Traverse code relationships (imports, calls, inheritance, etc.).

trace(entity, direction="both", max_depth=2, relationship_types=None)

`index_status`

Check health or trigger re-indexing.

index_status(action="status", project_root=None)

Configuration

Environment variables

Variable	Required	Default	Description
`VOYAGE_API_KEY`	Yes	--	Voyage AI API key for embeddings and re-ranking
`QDRANT_URL`	No	`http://localhost:6333`	Qdrant server endpoint
`QDRANT_API_KEY`	No	--	Qdrant API key (if auth is enabled)
`CLEW_CACHE_DIR`	No	Auto-detected from git root	SQLite cache directory (`.clew/`)
`CLEW_LOG_LEVEL`	No	`INFO`	Logging verbosity
`ANTHROPIC_API_KEY`	No	--	Required for NL description generation

The cache directory resolves in order: CLEW_CACHE_DIR env var, then {git_root}/.clew/, then .clew/ relative to the working directory. This ensures the MCP server and CLI share the same cache.
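The lookup order can be sketched as follows. Finding the git root by walking up to the nearest directory that contains `.git` is an assumption, since clew may ask git directly. The function names are hypothetical:

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def find_git_root(start: Path) -> Optional[Path]:
    """Walk up from `start` until a directory containing `.git` is found."""
    for directory in (start, *start.parents):
        if (directory / ".git").exists():
            return directory
    return None


def resolve_cache_dir(cwd: Optional[Path] = None) -> Path:
    cwd = (cwd or Path.cwd()).resolve()
    # 1. An explicit override always wins.
    override = os.environ.get("CLEW_CACHE_DIR")
    if override:
        return Path(override)
    # 2. Otherwise use .clew/ at the git root, so every subdirectory shares one cache.
    root = find_git_root(cwd)
    if root is not None:
        return root / ".clew"
    # 3. Last resort: .clew/ relative to the working directory.
    return cwd / ".clew"


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp).resolve()
    (repo / ".git").mkdir()
    nested = repo / "src" / "auth"
    nested.mkdir(parents=True)

    os.environ.pop("CLEW_CACHE_DIR", None)
    nested_result = resolve_cache_dir(nested)  # resolves to <repo>/.clew
    expected_repo_cache = repo / ".clew"

    os.environ["CLEW_CACHE_DIR"] = "/opt/clew-cache"
    override_result = resolve_cache_dir(nested)  # the env var takes precedence
    del os.environ["CLEW_CACHE_DIR"]
```

The CLI and the MCP server both resolve to `{git_root}/.clew/` when launched anywhere inside the repository. An MCP server started outside the repo falls through to step 3 instead, which is why the troubleshooting table suggests setting `CLEW_CACHE_DIR` to an absolute path.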

Project configuration (optional)

Create a config.yaml in your project root for fine-grained control:

project:
  name: "my-project"
  root: "."

collections:
  code:
    include:
      - "src/**/*.py"
      - "frontend/**/*.tsx"
    exclude:
      - "**/migrations/*.py"
      - "**/__pycache__/**"
  docs:
    include:
      - "**/*.md"

chunking:
  default_max_tokens: 3000
  overlap_tokens: 200

terminology_file: indexer/terminology.yaml
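`terminology_file` points at the YAML file used for query enhancement (abbreviation and synonym expansion). The file's schema is not documented here. The sketch below assumes a hypothetical shape with `abbreviations` and `synonyms` keys and uses an inline dict in place of the parsed YAML. `expand_query` is also hypothetical:

```python
import re
from typing import Dict, List

# Hypothetical stand-in for a parsed indexer/terminology.yaml.
TERMINOLOGY: Dict[str, dict] = {
    "abbreviations": {"auth": "authentication", "rx": "prescription"},
    "synonyms": {"pharmacy": ["rx", "dispensing"]},
}


def expand_query(query: str, terminology: Dict[str, dict]) -> str:
    """Append expansions for abbreviations and synonyms found in the query."""
    words = re.findall(r"\w+", query.lower())
    extra: List[str] = []
    for word in words:
        candidates = []
        full_form = terminology.get("abbreviations", {}).get(word)
        if full_form:
            candidates.append(full_form)
        candidates.extend(terminology.get("synonyms", {}).get(word, []))
        for term in candidates:
            # Skip terms already present so the query does not balloon.
            if term not in words and term not in extra:
                extra.append(term)
    return " ".join([query, *extra])


print(expand_query("where is auth for the pharmacy API", TERMINOLOGY))
```

Appending terms rather than replacing them keeps the user's original wording in the query while adding the vocabulary the code is likely to use.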

Architecture

                    +==============+
                    | Claude Code  |
                    | (MCP client) |
                    +------+-------+
                           | stdio
                    +------v-------+
                    |  MCP Server  |  search, get_context, explain, trace, index_status
                    +------+-------+
                           |
              +------------v------------+
              |     Search Pipeline     |
              |  enhance -> classify -> |
              |  hybrid search -> rerank|
              +------------+------------+
                           |
              +------------v------------+
              |    Qdrant Collections   |
              |  code: py/ts/tsx/js/jsx |
              |  docs: markdown         |
              +------------+------------+
                           |
              +------------v------------+
              |   Indexing Pipeline     |
              |  discover -> chunk ->   |
              |  enrich -> embed ->     |
              |  upsert + relationships |
              +------------+------------+
                           |
        +----------+-------+-------+----------+
        v          v       v       v          v
   tree-sitter  Voyage   SQLite   git     Anthropic
   (AST parse)  (embed)  (cache)  (diff)  (NL desc)

Search pipeline

Query enhancement -- Terminology expansion via YAML (abbreviations, synonyms)
Intent classification -- Heuristic routing: CODE, DOCS, DEBUG, LOCATION
Hybrid search -- Dense + BM25 multi-prefetch with structural boosting (same-module, test files for debug intent)
Re-ranking -- Voyage rerank-2.5 for final ordering
Confidence assessment -- Z-score based, informational only (included in result metadata)
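A Z-score measures how far the top result's score stands out from the rest of the result list. Below is a minimal sketch with hypothetical thresholds and labels; clew's actual formula and cut-offs are not documented here. As in clew, the label is only reported and never used to filter results. `top_result_confidence` is hypothetical:

```python
from statistics import mean, pstdev
from typing import List


def top_result_confidence(scores: List[float], high: float = 1.5, low: float = 1.0) -> str:
    """Label how far the best score stands out from the rest of the result set."""
    if len(scores) < 2:
        return "low"  # nothing to compare against
    spread = pstdev(scores)
    if spread == 0:
        return "low"  # every result scored the same
    z = (max(scores) - mean(scores)) / spread
    if z >= high:
        return "high"
    if z >= low:
        return "medium"
    return "low"


print(top_result_confidence([0.9, 0.3, 0.28, 0.25, 0.22]))  # one clear winner
print(top_result_confidence([0.5, 0.5, 0.5, 0.1]))          # several equally good matches
```

A low label does not mean the results are wrong. It signals that several candidates are roughly tied, which is a cue for an agent to refine the query or fall back to grep.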

When --mode exhaustive is specified, grep runs in parallel with the semantic pipeline and results are merged and deduplicated before returning.
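The merge-and-deduplicate step can be sketched as follows. This assumes each result carries a file path and a line range, and treats a grep hit as redundant when it falls inside a semantic chunk already returned. The `Hit` type and `merge_results` are hypothetical, and the parallel execution itself is not shown:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    file_path: str
    line_start: int
    line_end: int
    source: str  # "semantic" or "grep"


def merge_results(semantic: List[Hit], grep: List[Hit]) -> List[Hit]:
    """Keep semantic hits in rank order, then append grep hits they do not already cover."""
    merged = list(semantic)
    for hit in grep:
        inside_chunk = any(
            s.file_path == hit.file_path
            and s.line_start <= hit.line_start
            and hit.line_end <= s.line_end
            for s in semantic
        )
        duplicate = any(
            m.file_path == hit.file_path and m.line_start == hit.line_start
            for m in merged
        )
        if not inside_chunk and not duplicate:
            merged.append(hit)
    return merged


semantic_hits = [Hit("api/errors.py", 10, 40, "semantic")]
grep_hits = [
    Hit("api/errors.py", 15, 15, "grep"),  # inside the semantic chunk: dropped
    Hit("api/errors.py", 80, 80, "grep"),  # new location: kept
    Hit("api/errors.py", 80, 80, "grep"),  # exact repeat: dropped
    Hit("tasks.py", 3, 3, "grep"),         # different file: kept
]
for hit in merge_results(semantic_hits, grep_hits):
    print(hit.file_path, hit.line_start, hit.source)
```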

Chunking strategy

File pattern	Strategy	Token range
`models.py`	Class + fields as unit	1,500 - 3,000
`views.py`	Class as unit; split large actions	2,000 - 4,000
`tasks.py`	Function with decorators	1,000 - 2,000
`.tsx`, `.jsx`	Component boundaries	1,500 - 3,000
`*.md`	Section-level by headers	1,000 - 2,000
Migrations	Skipped	--

Fallback chain: tree-sitter AST -> token-recursive splitting -> line-based splitting.
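The fallback chain tries the most structure-aware splitter first and moves on only when that splitter fails. Below is a minimal sketch with two stand-ins chosen for illustration: Python's standard-library `ast` replaces tree-sitter, and a whitespace word count replaces a real tokenizer. The middle token-recursive stage is omitted. The `max_tokens`/`overlap_tokens` defaults mirror the `chunking` settings in the example `config.yaml`. All function names are hypothetical:

```python
import ast
from typing import Callable, List, Optional

Splitter = Callable[[str], Optional[List[str]]]


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def ast_split(text: str) -> Optional[List[str]]:
    """One chunk per top-level function or class, decorators included."""
    lines = text.splitlines()
    chunks = []
    for node in ast.parse(text).body:  # raises SyntaxError on unparsable input
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            chunks.append("\n".join(lines[start - 1 : node.end_lineno]))
    return chunks or None  # None means "no structure found, try the next splitter"


def line_split(text: str, max_tokens: int = 3000, overlap_tokens: int = 200) -> List[str]:
    """Pack whole lines into windows, repeating trailing lines as overlap."""
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for line in text.splitlines():
        n = approx_tokens(line)
        if current and current_tokens + n > max_tokens:
            chunks.append("\n".join(current))
            # Carry the tail of the finished window into the next one, but never
            # the whole window, so each new window starts at least one line later.
            carry: List[str] = []
            carry_tokens = 0
            for prev in reversed(current[1:]):
                t = approx_tokens(prev)
                if carry_tokens + t > overlap_tokens:
                    break
                carry.insert(0, prev)
                carry_tokens += t
            current, current_tokens = carry, carry_tokens
        current.append(line)
        current_tokens += n
    if current:
        chunks.append("\n".join(current))
    return chunks


def chunk_with_fallback(text: str, splitters: List[Splitter]) -> List[str]:
    for splitter in splitters:
        try:
            chunks = splitter(text)
        except (SyntaxError, ValueError):
            continue  # this splitter cannot handle the input; fall through
        if chunks:
            return chunks
    return [text]


SPLITTERS: List[Splitter] = [ast_split, line_split]
print(chunk_with_fallback("@cache\ndef a():\n    return 1\n\n\nclass B:\n    pass\n", SPLITTERS))
print(chunk_with_fallback("def broken(:\n    pass\n", SPLITTERS))
```

The second call contains a syntax error, so `ast_split` raises and the line-based splitter takes over. This is how a chunker can degrade gracefully on files the parser rejects.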

Development

Setup

git clone https://github.com/ruminaider/clew.git
cd clew
pip install -e ".[dev]"

Tests

# All tests with coverage
pytest --cov=clew -v

# Integration tests (requires running Qdrant)
pytest -m integration

# Single test file
pytest tests/search/test_hybrid.py -v

Linting and type checking

ruff format .           # Format
ruff check .            # Lint
mypy clew/              # Type check (strict mode)

Project structure

clew/
+-- chunker/             # AST parsing, language strategies, token counting
+-- clients/             # External service wrappers (Voyage, Qdrant, Anthropic)
+-- indexer/             # Pipeline, caching, change detection, relationship extraction
|   +-- extractors/      # Pluggable per-language relationship extractors
+-- search/              # Engine, hybrid retrieval, intent classification, re-ranking
+-- cli.py               # Typer CLI
+-- mcp_server.py        # FastMCP server (5 tools)
+-- config.py            # Environment variable loading
+-- factory.py           # Component wiring (no global state)
+-- models.py            # Pydantic v2 config models
+-- exceptions.py        # Error hierarchy with fix hints
+-- discovery.py         # File discovery with ignore patterns and safety checks
+-- safety.py            # File size, chunk count, collection limits

Troubleshooting

Problem	Fix
Qdrant not running	`docker compose up -d qdrant` or `docker run -d -p 6333:6333 qdrant/qdrant:v1.16.1`
`VOYAGE_API_KEY not set`	`export VOYAGE_API_KEY=pa-...`
No search results	Run `clew index --full` to reindex
MCP server can't find cache	Set `CLEW_CACHE_DIR` to an absolute path, or run from within the git repo
Stale results after code changes	Run `clew index` (incremental) to pick up changes

License

MIT
