🧬 Helix Context

Genome-based context compression for local LLMs. Scale-Invariant Knowledge Engine (SIKE) — 10/10 retrieval from 0.6B to 26B parameters.

Treats context like a genome instead of a flat text buffer. A 7,200-gene SQLite database (44MB raw knowledge) compresses to ~15K tokens of expressed context per turn, a 769x inference compression ratio. Retrieval is scale-invariant in the benchmark below: the same genome delivers 10/10 needle retrieval to qwen3:0.6b and Claude Opus alike, while answer accuracy still depends on the Reader model. The Librarian does the work; the Reader just extracts.

📖 Quick glossary — If the biological metaphor is new to you: gene = one knowledge chunk (content + metadata) · genome = the full SQLite store · ribosome = small model that packs/ranks/splices context · promoter = retrieval tags · expression = selecting + formatting genes for one query · chromatin = gene accessibility tier (open / euchromatin / heterochromatin) · replication = packing conversations back into the genome.

📑 Table of Contents

Benchmark Highlights
Quick Start
What You'll See
How It Works
Key Features
HTTP Endpoints
Continue IDE Integration
Python API
ScoreRift Integration
Configuration
Testing
Benchmarks
Architecture
Origin
License

  Client (Continue, Cursor, any OpenAI client)
         |
         v
  +--------------------------+
  |  Helix Proxy (FastAPI)   |  Port 11437
  |  /v1/chat/completions    |  OpenAI-compatible
  |                          |
  |  1. Extract query        |
  |  2. Express pipeline     |  <-- Genome (SQLite)
  |  3. Inject context       |  <-- Ribosome (CPU model)
  |  4. Forward to Ollama    |  --> localhost:11434
  |  5. Stream tee response  |
  |  6. Background replicate |
  +--------------------------+

Instead of stuffing your entire codebase into the prompt, Helix compresses it into a persistent SQLite genome and expresses only the relevant genes per turn. The model sees compressed context, not raw text. Conversations replicate back into the genome automatically, building institutional memory over time.
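
As a rough illustration of step 3 in the diagram above ("Inject context"), the sketch below prepends expressed context to an OpenAI-style message list. The function name `inject_context` and the choice of a leading system message are assumptions made for illustration; the proxy's actual injection format may differ.

```python
from typing import Dict, List

Message = Dict[str, str]


def inject_context(messages: List[Message], expressed_context: str) -> List[Message]:
    """Return a new message list with the expressed context as the first system message."""
    context_message = {"role": "system", "content": expressed_context}
    # Copy rather than mutate, so the client's original request is left untouched.
    return [context_message, *messages]


original = [{"role": "user", "content": "What port does the Helix proxy use?"}]
injected = inject_context(original, "<expressed_context>...</expressed_context>")
print([m["role"] for m in injected])
```

The client keeps sending ordinary chat messages; only the forwarded copy carries the extra context.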

Benchmark Highlights

🎯 10/10 needle retrieval from 0.6B to 26B parameters (43x range)
🚀 769x inference compression (11.6M-token genome → 15K expressed per turn)
💎 Claude Haiku + Helix matches Opus: all three API tiers hit 10/10 accuracy
🧠 Local 4B model beats blind Opus 2.25x on domain-specific extraction

Test Corpus Composition

The benchmark genome is a real developer's working data, not a curated eval set. 65.8% of the corpus is pure noise — game data, subtitles, blueprints — and Helix still hits 10/10 on project-specific needles hidden in the remaining 34%.

Source Category	Genes	Tokens	%	Repo Visibility
🎮 Steam / game data (Hades subtitles, BeamNG configs, Dyson Sphere blueprints, Factorio saves)	2,905	~7.7M	65.8%	—
🌐 `SwiftWing21/BigEd` — BigEd fleet (Education dir)	2,405	~1.8M	15.4%	public (private worktree ahead by 2 commits)
🔒 `CosmicTasha/CosmicTasha`	944	~1.6M	13.9%	private
🔒 Project Tally (private financial ledger — repo URL withheld)	242	~0.2M	2.0%	private
🌐 `SwiftWing21/helix-context` — this repo	161	~0.1M	1.2%	public
🌐 `SwiftWing21/scorerift` — ScoreRift / two-brain-audit	110	~0.1M	0.7%	public
Unclassified / session memory	497	~0.1M	1.0%	—
Total	7,264	~11.6M	100%

Source breakdown (software only, excluding game noise):

🌐 Public GitHub repos (BigEd, helix-context, scorerift): ~2.0M tokens (50.0%)
🔒 Private GitHub repos (CosmicTasha, Project Tally): ~1.8M tokens (45.6%)
🔄 Unclassified / session memory: ~0.2M tokens (4.4%)

Signal-to-noise: Only ~34% of the 11.6M-token corpus is relevant software knowledge. The other ~66% is game data the Agentome had to learn to ignore via chromatin state (HETEROCHROMATIN tier) and promoter-tag discrimination. The 10/10 retrieval holds despite the noise. That is the more meaningful test, since real-world retrieval systems have to survive mixed-domain corpora.
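
The corpus figures can be cross-checked with a few lines of arithmetic. The sketch below copies the gene and token counts from the table above; the short dictionary keys are labels chosen here for brevity. Because the token counts in the table are rounded to 0.1M, the recomputed noise share lands near 66% rather than exactly on the table's percentage.

```python
# (genes, tokens in millions), copied from the corpus composition table.
corpus = {
    "game data": (2905, 7.7),
    "BigEd": (2405, 1.8),
    "CosmicTasha": (944, 1.6),
    "Project Tally": (242, 0.2),
    "helix-context": (161, 0.1),
    "scorerift": (110, 0.1),
    "unclassified": (497, 0.1),
}

total_genes = sum(genes for genes, _ in corpus.values())
total_tokens = sum(tokens for _, tokens in corpus.values())
noise_share = corpus["game data"][1] / total_tokens

print(total_genes)
print(f"{total_tokens:.1f}M tokens, noise share {noise_share:.1%}")
```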

💡 How this table was measured: Claude (co-authoring this repo) had workspace access to the user's local project directories during the benchmark session, including private repos that never leave the machine. The genome file itself is gitignored — only aggregate counts and the benchmark queries are public. This demonstrates a real use case for Helix: your proprietary code participates in retrieval without being uploaded anywhere. Even the Education directory is split — the bulk lives in the public BigEd repo, with a private worktree ahead by 2 unreleased commits.

Database Storage Breakdown (post-VACUUM)

The on-disk genome.db is 523 MB for 7,264 genes (~46 MB of raw content). Why the ~12x gap between raw content and DB file? Because the genome isn't just storage — it's a 4-tier retrieval engine (promoter tags → FTS5 → SPLADE → ΣĒMA semantic), and each tier carries its own index.

Component	Size	% of DB	Purpose
FTS5 posting lists (`genes_fts_data`)	187.3 MB	35.8%	Full-text inverted index for keyword retrieval
Raw content (`gene.content`)	44.5 MB	8.5%	Original source text, verbatim
SPLADE sparse index (`splade_terms`)	35.7 MB	6.8%	1.73M term weights for lexical expansion
Ribosome complements (`gene.complement`)	16.5 MB	3.2%	Small-model compressed summaries (2.69x storage ratio)
Gene relations (NLI)	6.6 MB	1.3%	108K typed logical relations between genes
Entity graph	5.6 MB	1.1%	117K entity-to-gene edges for co-activation
Promoter index (retrieval tags)	3.8 MB	0.7%	73,815 domain/entity tags across all genes
Codons + metadata JSON	8.2 MB	1.6%	Semantic tags, promoter JSON, epigenetics
ΣĒMA embeddings (20D vectors)	0.34 MB	0.1%	Semantic primes — 80 bytes per gene
Key-value facts (pre-extracted)	1.4 MB	0.3%	Pre-parsed `key=value` pairs for answer slate
Accounted payload subtotal	310.0 MB	59.3%	Actual data across all indexes
SQLite B-tree + page overhead	212.7 MB	40.7%	Index structure, not fragmentation
Total file size	522.7 MB	100%

💾 VACUUM impact: This table reflects post-VACUUM state. Before VACUUM, the database was 752 MB — the extra 229 MB (30.4%) was free pages from thinning 11,529 genes down to 7,264 during tuning. SQLite holds deleted pages until a VACUUM reclaims them. The ~213 MB of "B-tree overhead" that remains is structural: page headers, cell pointers, interior nodes of the index B-trees. That's not reclaimable without changing the indexing strategy.
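
The free-page behaviour described above is standard SQLite and can be reproduced on a throwaway database. This is a minimal sketch using only the Python standard library; the table name, row counts, and payload size are arbitrary choices for the demo, not Helix's schema.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def vacuum_demo(rows: int = 3000) -> Tuple[int, int, int]:
    """Insert rows, delete most of them, then VACUUM. Returns (free_pages, size_before, size_after)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
        try:
            conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, content TEXT)")
            conn.execute("BEGIN")
            conn.executemany(
                "INSERT INTO gene (content) VALUES (?)",
                (("x" * 1000,) for _ in range(rows)),
            )
            conn.execute("COMMIT")
            # "Thin" the table: the emptied pages go to the freelist, the file does not shrink.
            conn.execute("DELETE FROM gene WHERE id > ?", (rows // 3,))
            free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
            size_before = os.path.getsize(path)
            conn.execute("VACUUM")  # rewrites the file without the free pages
            size_after = os.path.getsize(path)
        finally:
            conn.close()
    return free_pages, size_before, size_after


free_pages, size_before, size_after = vacuum_demo()
print(free_pages, size_before, size_after)
```

The file only shrinks at the `VACUUM` step, which is why a thinned genome stays large until it is vacuumed.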

Observations:

FTS5 dominates storage (35.8% of the file). The full-text index holds position data for every token across all 7K genes — it's what enables the sub-5ms content queries that make the ~1s total retrieval latency possible.
Raw content is only 8.5% of the file. The rest is indexes. This is the expected tradeoff for a retrieval-optimized database vs a flat text archive.
Accounted payload is 310 MB (59.3%). The remaining 213 MB (40.7%) is legitimate B-tree structure overhead — page headers, cell pointers, and internal index nodes. SQLite can't compress this further without sacrificing query speed.
ΣĒMA embeddings are essentially free — 20 floats per gene = 80 bytes. A 1M-gene genome would cost only 80 MB for the semantic tier.
Inference cost is unchanged by DB size: the LLM only ever sees ~15K tokens per turn regardless of whether the genome is 50 MB or 50 GB.
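
The 80-bytes-per-gene figure for the semantic tier follows from 20 values at 4 bytes each. The sketch below assumes the vectors are stored as 32-bit floats, which is what the arithmetic implies; `pack_embedding` is a hypothetical helper, not a Helix function.

```python
import struct
from typing import Sequence

EMBEDDING_DIMS = 20  # from the storage table: 20D vectors


def pack_embedding(vector: Sequence[float]) -> bytes:
    """Serialize a 20-dimensional vector as little-endian float32 values."""
    if len(vector) != EMBEDDING_DIMS:
        raise ValueError(f"expected {EMBEDDING_DIMS} values, got {len(vector)}")
    return struct.pack(f"<{EMBEDDING_DIMS}f", *vector)


blob = pack_embedding([0.0] * EMBEDDING_DIMS)
bytes_per_gene = len(blob)
megabytes_for_1m_genes = bytes_per_gene * 1_000_000 / 1_000_000
print(bytes_per_gene, megabytes_for_1m_genes)
```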

Compression summary:

Metric	Ratio	Meaning
Storage (raw → complement)	2.69x	How much the ribosome compresses each gene's summary
Expression (full corpus → single turn)	769x	How much of the genome the LLM sees per query
DB file / raw content	11.76x (post-VACUUM)	Index overhead for 4-tier retrieval
DB file / raw content	16.90x (pre-VACUUM)	With fragmentation from thinning
vs 128K-stuffed context	8.5x fewer tokens	Baseline "dump everything" approach
vs chunked RAG (25K tokens)	1.7x fewer tokens	Standard vector-search RAG

The headline number, 769x inference compression, is what matters for cost and latency. Everything else is a bookkeeping detail of how the Librarian files its books.
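
The ratios in the summary can be recomputed from figures quoted elsewhere in this README. The inputs below are the rounded values from the storage table and the token budget, so results may differ slightly from the table. With these rounded inputs the expression ratio comes out near 773x; the published figure was presumably computed from unrounded token counts.

```python
raw_mb, complement_mb = 44.5, 16.5
db_post_mb, db_pre_mb = 522.7, 752.0
genome_tokens, expressed_tokens = 11_600_000, 15_000

ratios = {
    "storage": raw_mb / complement_mb,
    "db_post_vacuum": db_post_mb / raw_mb,
    "db_pre_vacuum": db_pre_mb / raw_mb,
    "expression": genome_tokens / expressed_tokens,
    "vs_128k_stuffed": 128_000 / expressed_tokens,
    "vs_rag_25k": 25_000 / expressed_tokens,
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}x")
```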

Needle-in-a-haystack on this 7,264-gene genome (~46MB raw knowledge):

Model	Params	VRAM	Retrieval	Accuracy
qwen3:0.6b	0.6B	0.5 GB	10/10	2/10
qwen3:1.7b	1.7B	1.4 GB	10/10	3/10
qwen3:4b	4B	2.5 GB	10/10	9/10
gemma4:e4b (MoE)	8B / 4B active	9.6 GB	10/10	9/10
qwen3:8b	8B	5.2 GB	10/10	9/10
gemma4:26b-a4b (MoE + DDR4 offload)	26B / 4B active	8 GB + 13 GB RAM	10/10	6/10
Claude Haiku + Helix	—	API	10/10	10/10
Claude Sonnet + Helix	—	API	10/10	10/10
Claude Opus + Helix	—	API	10/10	10/10

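Without Helix, the same Claude models score 3-4/10 (hand-curated reference only). Retrieval is a uniform uplift: every model in the table receives the same 10/10 context regardless of price tier or parameter count, and the remaining spread in accuracy reflects how well each Reader extracts the answer. See docs/RESEARCH.md for the full SIKE analysis.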

Quick Start

# Install from PyPI (beta)
pip install helix-context --pre

# Pull a small model for the ribosome (context codec)
ollama pull gemma4:e2b

# Start the proxy
helix
# or: python -m uvicorn helix_context.server:app --host 127.0.0.1 --port 11437

# Seed the genome with your own project files
python examples/seed_genome.py path/to/your/project/

# Check genome health
curl http://127.0.0.1:11437/stats

Point any OpenAI-compatible client at http://127.0.0.1:11437/v1 and start chatting. Context compression happens transparently.
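
For a client without an OpenAI SDK, a request can be built with the standard library alone. This is a sketch, assuming the proxy is running at the Quick Start address and that the response follows the usual OpenAI chat-completion shape; `build_chat_request` and `ask` are illustrative names, not part of Helix.

```python
import json
import urllib.request

HELIX_BASE_URL = "http://127.0.0.1:11437/v1"  # proxy address from the Quick Start


def build_chat_request(prompt: str, model: str = "qwen3:4b") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{HELIX_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "qwen3:4b") -> str:
    """Send the request. Requires a running Helix proxy and Ollama."""
    with urllib.request.urlopen(build_chat_request(prompt, model), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What port does the Helix proxy use?")
print(request.get_method(), request.full_url)
```

Calling `ask(...)` performs the actual round trip; building the request separately makes it easy to inspect what will be sent.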

What You'll See

After seeding the genome, /stats shows the state of your knowledge base:

$ curl -s http://127.0.0.1:11437/stats | jq
{
  "total_genes": 7264,
  "open": 7264,
  "compression_ratio": 2.69,
  "health": {
    "total_queries": 503,
    "avg_ellipticity": 0.62,
    "status_counts": {"aligned": 143, "sparse": 267, "denatured": 93}
  }
}

A /context query returns the expressed context window — exactly what gets injected into the downstream LLM:

$ curl -s http://127.0.0.1:11437/context \
    -H "Content-Type: application/json" \
    -d '{"query":"What port does the Helix proxy listen on?"}' | jq '.[0]'
{
  "name": "Helix Genome Context",
  "description": "12 genes expressed, 3.1x compression, health=aligned (Δε=0.66)",
  "content": "<expressed_context>\n<GENE src=\"helix-context/README.md\" facts=\"port=11437\">\n# Helix Context\n...",
  "context_health": {
    "ellipticity": 0.66,
    "coverage": 0.85,
    "density": 0.42,
    "freshness": 1.0,
    "genes_expressed": 12,
    "status": "aligned"
  }
}
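
A client that calls /context directly will usually want the health status alongside the content. The sketch below parses a trimmed copy of the sample response above; `summarize_context` is a hypothetical helper, and the field names are taken from that sample.

```python
import json

# Trimmed copy of the sample /context response shown above.
sample_response = json.loads("""
[{
  "name": "Helix Genome Context",
  "content": "<expressed_context>...</expressed_context>",
  "context_health": {
    "ellipticity": 0.66, "coverage": 0.85, "density": 0.42,
    "freshness": 1.0, "genes_expressed": 12, "status": "aligned"
  }
}]
""")


def summarize_context(items):
    """Return (status, genes_expressed, content) from the first context item."""
    first = items[0]
    health = first["context_health"]
    return health["status"], health["genes_expressed"], first["content"]


status, genes_expressed, content = summarize_context(sample_response)
print(status, genes_expressed)
```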

A chat request through the proxy gets the context injected automatically — your client doesn't need to know Helix exists:

$ curl -s http://127.0.0.1:11437/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "qwen3:4b",
      "messages": [{"role":"user","content":"What port does the Helix proxy use?"}]
    }' | jq -r '.choices[0].message.content'

The Helix proxy server listens on **port 11437**, as specified in helix.toml
under [server]. This is configured in the repository at helix-context/README.md.

The model answered from the retrieved genes, not its training data — which doesn't contain your project.

How It Works

6-step expression pipeline per turn:

Step	What	Cost	Blocking?
1. Extract	Heuristic keyword extraction from query	0 tokens	No
2. Express	SQLite promoter lookup + synonym expansion + co-activation	0 tokens	No
3. Re-rank	Small CPU model scores candidates by relevance	~300 tokens	Yes
4. Splice	Small CPU model trims introns, keeps exons (batched)	~600 tokens	Yes
5. Assemble	Join spliced parts, enforce token budget, wrap in tags	0 tokens	No
6. Replicate	Pack query+response exchange back into genome	~300 tokens	No (background)
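
Step 1 costs zero tokens because it needs no model call. The sketch below shows one plausible form of heuristic keyword extraction: tokenizing and filtering stopwords. The stopword list and the function name are assumptions for illustration; Helix's actual heuristics are not documented here.

```python
import re
from typing import List, Set

# A tiny illustrative stopword list; a real extractor would use a fuller one.
STOPWORDS = {
    "a", "an", "the", "is", "are", "does", "do", "what", "how",
    "on", "in", "of", "to", "for", "and", "or", "it", "this", "that",
}


def extract_keywords(query: str) -> List[str]:
    """Lowercase, tokenize, drop stopwords, and de-duplicate while keeping order."""
    tokens = re.findall(r"[a-z0-9_]+", query.lower())
    seen: Set[str] = set()
    keywords: List[str] = []
    for token in tokens:
        if token in STOPWORDS or token in seen:
            continue
        seen.add(token)
        keywords.append(token)
    return keywords


keywords = extract_keywords("What port does the Helix proxy listen on?")
print(keywords)
```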

Token budget:

3k tokens: ribosome decoder prompt (fixed, tells the big model how to read codons)
12k tokens: expressed context (dense XML gene format, 12 genes per turn)
11M+ tokens: genome cold storage (SQLite, ~46MB raw on a mature project)
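
The budget above implies that the Assemble step must stop adding genes once the expression budget is spent. A minimal sketch of that idea follows, assuming greedy selection by score and a crude whitespace token estimate; both are stand-ins chosen for illustration, and the defaults mirror `expression_tokens` and `max_genes_per_turn` from helix.toml.

```python
from typing import List, Tuple


def approx_tokens(text: str) -> int:
    """Crude token estimate (whitespace-separated words); a real tokenizer would differ."""
    return len(text.split())


def assemble(genes: List[Tuple[float, str]], budget: int = 12_000, max_genes: int = 12) -> List[str]:
    """Keep the highest-scoring genes that fit within the token budget."""
    selected: List[str] = []
    used = 0
    for _score, text in sorted(genes, key=lambda g: g[0], reverse=True):
        if len(selected) >= max_genes:
            break
        cost = approx_tokens(text)
        if used + cost > budget:
            continue  # skip genes that would overflow; smaller ones may still fit
        selected.append(text)
        used += cost
    return selected


candidates = [
    (0.9, "port = 11437"),
    (0.2, "word " * 20),
    (0.7, "upstream is localhost:11434"),
]
selected = assemble(candidates, budget=10)
print(selected)
```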

Compression metrics:

Storage: 2.7x (raw content → ribosome complements)
Expression: 769x (full genome → what the LLM sees per turn)
vs naive RAG at 25K tokens: 1.7x fewer tokens, 10/10 vs ~6/10 accuracy

Key Features

Context Health Monitor (Delta-Epsilon)

Every query computes a health signal measuring how well the genome served it:

{
  "context_health": {
    "ellipticity": 0.82,
    "coverage": 0.75,
    "density": 0.68,
    "freshness": 1.0,
    "genes_expressed": 3,
    "genes_available": 42,
    "status": "aligned"
  }
}

Status	Ellipticity	Meaning
`aligned`	>= 0.7	Genome is well-grounded, model is informed
`sparse`	0.3 to < 0.7	Gaps exist, model may guess on some topics
`stale`	any	Expressed genes are outdated (low freshness)
`denatured`	< 0.3	Context is unreliable, high hallucination risk
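
The status table can be read as a small decision function. In the sketch below the ellipticity thresholds come from the table, while the freshness cutoff for `stale` and the order in which the checks run are assumptions, since the table only says "low freshness".

```python
STALE_FRESHNESS_THRESHOLD = 0.5  # assumed cutoff; the table only says "low freshness"


def classify_health(ellipticity: float, freshness: float) -> str:
    """Map a health signal to one of the four status labels."""
    if freshness < STALE_FRESHNESS_THRESHOLD:
        return "stale"  # applies at any ellipticity (assumed to take precedence)
    if ellipticity >= 0.7:
        return "aligned"
    if ellipticity >= 0.3:
        return "sparse"
    return "denatured"


print(classify_health(0.82, 1.0))
```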

Horizontal Gene Transfer (HGT)

Export a genome and import it into another Helix instance:

# Export
python examples/hgt_transfer.py export -d "Project knowledge snapshot"

# Preview what an import would change
python examples/hgt_transfer.py diff genome_export.helix

# Import into another instance
python examples/hgt_transfer.py import genome_export.helix

Three merge strategies: skip_existing (safe default), overwrite, newest. Content-addressed gene IDs ensure deduplication across instances.
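
To make the merge strategies concrete, here is a minimal sketch of content-addressed IDs and the three strategies. The hash function (SHA-256 truncated to 16 hex characters), the `updated` field, and the dictionary layout are all assumptions for illustration, not the `.helix` export format.

```python
import hashlib
from typing import Dict

STRATEGIES = {"skip_existing", "overwrite", "newest"}


def gene_id(content: str) -> str:
    """Content-addressed ID: identical content always hashes to the same ID."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]


def merge(local: Dict[str, dict], incoming: Dict[str, dict], strategy: str = "skip_existing") -> Dict[str, dict]:
    """Merge {gene_id: {"content": ..., "updated": ...}} maps under a named strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    merged = dict(local)
    for gid, gene in incoming.items():
        existing = merged.get(gid)
        if existing is None or strategy == "overwrite":
            merged[gid] = gene
        elif strategy == "newest" and gene["updated"] > existing["updated"]:
            merged[gid] = gene
        # skip_existing: keep the local copy
    return merged


gid = gene_id("port = 11437")
local = {gid: {"content": "port = 11437", "updated": 1}}
incoming = {gid: {"content": "port = 11437", "updated": 2}}
print(merge(local, incoming)[gid]["updated"], merge(local, incoming, "newest")[gid]["updated"])
```

Under this scheme a colliding ID means the content already matches, so the strategies mainly decide whose metadata wins.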

Associative Memory

Genes that are frequently expressed together build co-activation links. When you query for topic A, the genome also pulls in topic B if they've been co-expressed before. This creates an organic associative memory that grows smarter over time.
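
One simple way to model co-activation is a counter over pairs of genes expressed in the same turn. The class below is a sketch under that assumption; the `min_count` threshold and the in-memory storage are illustrative choices rather than Helix's implementation.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


class CoActivation:
    """Counts how often pairs of genes are expressed together."""

    def __init__(self) -> None:
        self.links: Counter = Counter()

    def record(self, expressed_gene_ids: Iterable[str]) -> None:
        # Sort so that (a, b) and (b, a) share one counter entry.
        for a, b in combinations(sorted(set(expressed_gene_ids)), 2):
            self.links[(a, b)] += 1

    def neighbors(self, gene_id: str, min_count: int = 2) -> List[str]:
        """Genes co-expressed with gene_id at least min_count times."""
        found = []
        for (a, b), count in self.links.items():
            if count >= min_count and gene_id in (a, b):
                found.append(b if a == gene_id else a)
        return sorted(found)


memory = CoActivation()
memory.record(["auth", "jwt", "cache"])
memory.record(["auth", "jwt"])
print(memory.neighbors("auth"))
```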

Tissue-Specific Expression (MoE + Small Models)

MoE models (Gemma 4) and sub-3.2B models can't reliably "look back" across a 15K context window. Helix auto-detects these architectures and switches to a tissue-specific expression mode inspired by how cell types selectively express genes from the same genome:

Answer slate — pre-extracted key=value facts front-loaded in the first ~200 tokens, inside every sliding-window attention layer (Gemma 4's 5:1 SWA ratio means 5 of 6 layers only see 1,024-token windows).
Relevance-first gene ordering — highest-scoring gene at position 0, not sorted by source sequence. Guarantees the best match lands inside every attention window.
Think suppression — /no_think injection + temp=0 for small models that otherwise waste their output budget on reasoning loops.
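
The first two techniques above amount to reordering the prompt. The sketch below front-loads key=value facts and places the highest-scoring gene first. The `FACTS:` header, the dictionary fields, and the `max_facts` limit are hypothetical; only the ordering idea comes from the description above.

```python
from typing import Dict, List


def tissue_express(genes: List[Dict], max_facts: int = 10) -> str:
    """Build a prompt with an answer slate first, then genes in relevance order."""
    ranked = sorted(genes, key=lambda g: g["score"], reverse=True)
    slate: List[str] = []
    for gene in ranked:
        for key, value in gene["facts"].items():
            if len(slate) < max_facts:
                slate.append(f"{key}={value}")
    header = "FACTS: " + "; ".join(slate)
    body = "\n\n".join(gene["content"] for gene in ranked)
    return f"{header}\n\n{body}"


genes = [
    {"score": 0.4, "content": "Upstream is Ollama.", "facts": {"upstream": "localhost:11434"}},
    {"score": 0.9, "content": "The proxy listens on 11437.", "facts": {"port": "11437"}},
]
prompt = tissue_express(genes)
print(prompt)
```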

Measured impact on gemma4:e4b:

Mode	Retrieval	Accuracy
Standard expression	10/10	5/10
MoE tissue expression	10/10	9/10

Dense models above the small-model threshold (for example qwen3:4b and qwen3:8b) automatically use the standard expression path and are unaffected. Detection is per-request based on the downstream model name, so the same server can handle mixed clients.

Synonym Expansion

Configure lightweight query expansion in helix.toml:

[synonyms]
cache = ["redis", "ttl", "invalidation", "cdn"]
auth = ["jwt", "login", "security", "token"]

When a user asks about "cache", the genome also searches for "redis", "ttl", etc.
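
In code, the expansion is a dictionary lookup per query term. The sketch below hard-codes the two entries from the example above instead of reading helix.toml, and `expand_terms` is an illustrative name.

```python
from typing import Dict, List

SYNONYMS: Dict[str, List[str]] = {
    "cache": ["redis", "ttl", "invalidation", "cdn"],
    "auth": ["jwt", "login", "security", "token"],
}


def expand_terms(keywords: List[str], synonyms: Dict[str, List[str]] = SYNONYMS) -> List[str]:
    """Append configured synonyms after each keyword, without duplicates."""
    expanded: List[str] = []
    for term in keywords:
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


expanded = expand_terms(["cache", "latency"])
print(expanded)
```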

HTTP Endpoints

Core endpoints

Endpoint	Method	Description
`/v1/chat/completions`	POST	OpenAI-compatible proxy (primary integration)
`/ingest`	POST	Ingest content into genome: `{content, content_type, metadata?}`
`/context`	POST	Query genome for context: `{query}` (Continue format)
`/consolidate`	POST	Distill session buffer into knowledge genes
`/stats`	GET	Genome metrics, compression ratio, health
`/health`	GET	Server status, ribosome model, gene count
`/health/history`	GET	Recent query health signals (`?limit=N`)

Admin / maintenance endpoints

Endpoint	Method	Description
`/admin/refresh`	POST	Reopen the genome connection to see external writes
`/admin/vacuum`	POST	Reclaim free SQLite pages after thinning (returns before/after size)
`/admin/kv-backfill`	POST	Run CPU regex KV extraction on genes missing `key_values`
`/replicas`	GET	List replica status (sync lag, paths)
`/replicas/sync`	POST	Force-sync all replicas from the master genome
`/bridge/status`	GET	Shared-memory bridge status (inbox, signals)
`/bridge/collect`	POST	Ingest pending files from the shared bridge inbox
`/bridge/signal`	POST	Write a named signal to the shared bridge

Four operations that sound similar — but do different things

These are the most commonly confused operations in the admin surface. Know which one to reach for:

Operation	What it does	When to use
`checkpoint(mode)`	Flush the write-ahead log (WAL) into the main DB file. No file size change.	During/after bulk ingest, to make sure data is durable if the process crashes. Runs automatically every 50 inserts.
`refresh()` / `/admin/refresh`	Close and reopen the long-lived DB connection so it picks up writes made by external processes.	After running a thinning script, ingest worker, or any out-of-band write. Cheap, non-destructive.
`compact()`	Scan every gene's `source_id`, mtime-check the file, mark source-changed genes as `AGING`. Does not delete or shrink anything.	Periodic source-staleness detection (runs automatically every `compact_interval` seconds).
`vacuum()` / `/admin/vacuum`	Rewrite the SQLite file to reclaim free pages from previous deletions. Shrinks the file.	After large thinning operations. Blocking — run during maintenance windows only. Our 7.2K-gene genome reclaimed 229 MB (30%) on first VACUUM.

Rule of thumb:

If you care about durability → checkpoint()
If you care about visibility (seeing external writes) → refresh()
If you care about staleness (detecting changed sources) → compact()
If you care about disk space → vacuum()
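
Under the hood, a checkpoint is a plain SQLite operation. The sketch below demonstrates it on a temporary database using the standard `PRAGMA wal_checkpoint`; it illustrates the SQLite mechanism only, and the table name is arbitrary rather than taken from Helix's schema.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def checkpoint_demo() -> Tuple[str, int]:
    """Write one row in WAL mode, then checkpoint. Returns (journal_mode, busy_flag)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "genome_demo.db")
        conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
        try:
            mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
            conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, content TEXT)")
            conn.execute("INSERT INTO gene (content) VALUES ('port = 11437')")
            # Flush the write-ahead log into the main database file.
            busy, _log_pages, _checkpointed = conn.execute(
                "PRAGMA wal_checkpoint(TRUNCATE)"
            ).fetchone()
        finally:
            conn.close()
    return mode, busy


mode, busy = checkpoint_demo()
print(mode, busy)
```

A `busy` value of 0 means the checkpoint was not blocked by another connection.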

Continue IDE Integration

Add to ~/.continue/config.yaml:

models:
  - name: Helix (Local)
    provider: openai
    model: gemma4:e4b
    apiBase: http://127.0.0.1:11437/v1
    apiKey: EMPTY
    roles: [chat]
    defaultCompletionOptions:
      contextLength: 128000
      maxTokens: 4096

Use Chat mode (not Agent mode). Set contextLength high so Continue sends the full message; Helix handles compression downstream.

Python API

from pathlib import Path

from helix_context import HelixContextManager, load_config

config = load_config()
helix = HelixContextManager(config)

# Ingest content
helix.ingest("Your document text here", content_type="text")
helix.ingest(Path("src/main.py").read_text(encoding="utf-8"), content_type="code")

# Build context for a query
window = helix.build_context("How does auth work?")
print(window.expressed_context)
print(window.context_health.status)  # "aligned" / "sparse" / "stale" / "denatured"

# Learn from an exchange
helix.learn("How does auth work?", "JWT middleware validates tokens...")

# Export genome
from helix_context.hgt import export_genome
export_genome(helix.genome, "project.helix", description="Auth system knowledge")

ScoreRift Integration

Helix includes a bridge to ScoreRift for divergence-based context health monitoring:

from helix_context.integrations.scorerift import (
    GenomeHealthProbe,
    cd_signal,
    make_genome_dimensions,
    resolution_to_gene,
)

# Probe genome health
probe = GenomeHealthProbe("http://127.0.0.1:11437")
report = probe.full_scan()

# Register as ScoreRift dimensions
# (`engine` is your existing ScoreRift engine instance, created elsewhere)
engine.register_many(make_genome_dimensions())

# Feed divergence resolutions back into the genome
resolution_to_gene("security", auto_score=0.85, manual_score=1.0,
                   resolution="False positives in auth module scanner rules")

Configuration

All config in helix.toml:

[ribosome]
model = "gemma4:e4b"        # context codec for pack/re_rank/splice
backend = "ollama"          # or "deberta" for faster CPU-only ribosome
timeout = 30                # seconds before fallback
keep_alive = "30m"          # keep model loaded (eliminates swap latency)
warmup = true               # pre-load model on server start

[budget]
ribosome_tokens = 3000
expression_tokens = 12000   # 15K total per turn (decoder + expression)
max_genes_per_turn = 12
splice_aggressiveness = 0.3
decoder_mode = "condensed"  # full | condensed | minimal | none

[genome]
path = "genome.db"
cold_start_threshold = 10
replicas = ["C:/helix-cache/genome.db", "E:/helix-cache/genome.db"]
replica_sync_interval = 100

[ingestion]
backend = "cpu"             # "cpu" (spaCy+regex, fast) | "ollama" (LLM, slow)
splade_enabled = true       # SPLADE sparse expansion at index time
entity_graph = true         # entity-based co-activation links

[server]
host = "127.0.0.1"
port = 11437
upstream = "http://localhost:11434"

[synonyms]
cache = ["redis", "ttl", "invalidation", "cdn"]
auth = ["jwt", "login", "security", "token"]

Environment variables:

OLLAMA_KV_CACHE_TYPE=q4_0: 4-bit KV cache quantization (recommended). In our tests q8_0 produced worse accuracy, possibly because it gave models more room to hallucinate in think mode, while q4_0 was faster, more accurate, and used less VRAM.
HELIX_CONFIG=/path/to/helix.toml: override config file location
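
An override variable like this is typically resolved with a one-line lookup. The sketch below shows the pattern, assuming the fallback is a helix.toml in the working directory; `resolve_config_path` is a hypothetical helper, not the loader in config.py.

```python
import os
from pathlib import Path
from typing import Mapping

DEFAULT_CONFIG = Path("helix.toml")  # assumed default: helix.toml in the working directory


def resolve_config_path(environ: Mapping[str, str] = os.environ) -> Path:
    """Prefer HELIX_CONFIG when it is set and non-empty, else fall back to the default."""
    override = environ.get("HELIX_CONFIG")
    return Path(override).expanduser() if override else DEFAULT_CONFIG


print(resolve_config_path({"HELIX_CONFIG": "/etc/helix/helix.toml"}))
```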

Testing

# Mock tests only (no Ollama needed, ~8s)
pytest tests/ -m "not live"

# Live tests (requires Ollama)
pytest tests/ -m live -v -s

# Full suite
pytest tests/ -v

Benchmarks

# Needle-in-a-haystack (single model)
HELIX_MODEL=qwen3:4b python benchmarks/bench_needle.py

# Full sweep across all local models
python benchmarks/bench_sweep.py

See docs/RESEARCH.md for full SIKE analysis and results across 7 local models + 3 Claude API tiers.

Architecture

Module	Role
`schemas.py`	Gene, ContextWindow, ContextHealth, ChromatinState
`codons.py`	CodonChunker (text/code splitting) + CodonEncoder (serialization)
`genome.py`	SQLite genome with promoter-tag retrieval + co-activation
`ribosome.py`	Small-model codec: pack, re_rank, splice, replicate
`context_manager.py`	6-step pipeline orchestrator + pending replication buffer
`server.py`	FastAPI proxy + standalone endpoints
`config.py`	TOML config loader with synonym map
`hgt.py`	Genome export/import (Horizontal Gene Transfer)
`integrations/scorerift.py`	CD spectroscope bridge to ScoreRift

Origin

Built as a standalone package extracted from BigEd CC. Implements the "Ribosome Hypothesis" for local LLM context management.

License

Apache 2.0
