🧬 Helix Context

License: Apache 2.0 Python 3.11+ PyPI version SIKE: 10/10 Compression: 769x Paper: Agentome

Genome-based context compression for local LLMs. Scale-Invariant Knowledge Engine (SIKE) — 10/10 retrieval from 0.6B to 26B parameters.

Treats context like a genome instead of a flat text buffer. A 7,200-gene SQLite database (44MB raw knowledge) compresses to ~15K tokens of expressed context per turn — a 769x inference compression ratio. Retrieval is perfectly scale-invariant: the same genome delivers 10/10 needle accuracy to qwen3:0.6b and Claude Opus alike. The Librarian does the work; the Reader just extracts.

📖 Quick glossary — If the biological metaphor is new to you: gene = one knowledge chunk (content + metadata) · genome = the full SQLite store · ribosome = small model that packs/ranks/splices context · promoter = retrieval tags · expression = selecting + formatting genes for one query · chromatin = gene accessibility tier (open / euchromatin / heterochromatin) · replication = packing conversations back into the genome.
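The glossary maps onto a tiny data model. A minimal sketch of the gene/expression idea (field and function names here are illustrative, not the actual schema in schemas.py):

```python
from dataclasses import dataclass, field

@dataclass
class Gene:
    """One knowledge chunk: content plus retrieval metadata (illustrative only)."""
    gene_id: str                                     # content-addressed ID
    content: str                                     # raw source text
    promoters: list = field(default_factory=list)    # retrieval tags
    chromatin: str = "open"                          # open | euchromatin | heterochromatin

genome = [
    Gene("g1", "Helix proxy listens on port 11437", ["helix", "port", "server"]),
]

def express(query_tags, genes):
    """Expression = select accessible genes whose promoters overlap the query tags."""
    return [g for g in genes
            if g.chromatin != "heterochromatin"
            and set(query_tags) & set(g.promoters)]

print([g.gene_id for g in express(["port"], genome)])  # ['g1']
```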

  Client (Continue, Cursor, any OpenAI client)
         |
         v
  +--------------------------+
  |  Helix Proxy (FastAPI)   |  Port 11437
  |  /v1/chat/completions    |  OpenAI-compatible
  |                          |
  |  1. Extract query        |
  |  2. Express pipeline     |  <-- Genome (SQLite)
  |  3. Inject context       |  <-- Ribosome (CPU model)
  |  4. Forward to Ollama    |  --> localhost:11434
  |  5. Stream tee response  |
  |  6. Background replicate |
  +--------------------------+

Instead of stuffing your entire codebase into the prompt, Helix compresses it into a persistent SQLite genome and expresses only the relevant genes per turn. The model sees compressed context, not raw text. Conversations replicate back into the genome automatically, building institutional memory over time.

Benchmark Highlights

  • 🎯 10/10 needle retrieval from 0.6B to 26B parameters (43x range)
  • 🚀 769x inference compression (11.6M-token genome → 15K expressed per turn)
  • 💎 Claude Haiku + Helix matches Opus — all three API tiers hit 10/10 accuracy
  • 🧠 Local 4B model beats blind Opus 2.25x on domain-specific extraction

Test Corpus Composition

The benchmark genome is a real developer's working data, not a curated eval set. 65.8% of the corpus is pure noise — game data, subtitles, blueprints — and Helix still hits 10/10 on project-specific needles hidden in the remaining 34.2%.

| Source Category | Genes | Tokens | % | Repo Visibility |
|---|---|---|---|---|
| 🎮 Steam / game data (Hades subtitles, BeamNG configs, Dyson Sphere blueprints, Factorio saves) | 2,905 | ~7.7M | 65.8% | |
| 🌐 SwiftWing21/BigEd — BigEd fleet (Education dir) | 2,405 | ~1.8M | 15.4% | public (private worktree ahead by 2 commits) |
| 🔒 CosmicTasha/CosmicTasha | 944 | ~1.6M | 13.9% | private |
| 🔒 Project Tally (private financial ledger — repo URL withheld) | 242 | ~0.2M | 2.0% | private |
| 🌐 SwiftWing21/helix-context — this repo | 161 | ~0.1M | 1.2% | public |
| 🌐 SwiftWing21/scorerift — ScoreRift / two-brain-audit | 110 | ~0.1M | 0.7% | public |
| Unclassified / session memory | 497 | ~0.1M | 1.0% | |
| **Total** | **7,264** | **~11.6M** | **100%** | |

Source breakdown (software only, excluding game noise):

  • 🌐 Public GitHub repos: ~2.0M tokens (50.0%) — BigEd, helix-context, scorerift
  • 🔒 Private GitHub repos: ~1.8M tokens (45.6%) — CosmicTasha, BookKeeper
  • 🔄 Unclassified / session memory: ~0.2M tokens (4.4%)

Signal-to-noise: Only ~34% of the 11.6M-token corpus is relevant software knowledge. The other ~66% is game data the Agentome had to learn to ignore via chromatin state (HETEROCHROMATIN tier) and promoter-tag discrimination. The 10/10 retrieval holds despite the noise — arguably because of it, since real-world retrieval systems have to survive mixed-domain corpora.

💡 How this table was measured: Claude (co-authoring this repo) had workspace access to the user's local project directories during the benchmark session, including private repos that never leave the machine. The genome file itself is gitignored — only aggregate counts and the benchmark queries are public. This demonstrates a real use case for Helix: your proprietary code participates in retrieval without being uploaded anywhere. Even the Education directory is split — the bulk lives in the public BigEd repo, with a private worktree ahead by 2 unreleased commits.

Database Storage Breakdown (post-VACUUM)

The on-disk genome.db is 523 MB for 7,264 genes (~46 MB of raw content). Why the ~12x gap between raw content and DB file? Because the genome isn't just storage — it's a 4-tier retrieval engine (promoter tags → FTS5 → SPLADE → ΣĒMA semantic), and each tier carries its own index.

| Component | Size | % of DB | Purpose |
|---|---|---|---|
| FTS5 posting lists (genes_fts_data) | 187.3 MB | 35.8% | Full-text inverted index for keyword retrieval |
| Raw content (gene.content) | 44.5 MB | 8.5% | Original source text, verbatim |
| SPLADE sparse index (splade_terms) | 35.7 MB | 6.8% | 1.73M term weights for lexical expansion |
| Ribosome complements (gene.complement) | 16.5 MB | 3.2% | Small-model compressed summaries (2.69x storage ratio) |
| Gene relations (NLI) | 6.6 MB | 1.3% | 108K typed logical relations between genes |
| Entity graph | 5.6 MB | 1.1% | 117K entity-to-gene edges for co-activation |
| Promoter index (retrieval tags) | 3.8 MB | 0.7% | 73,815 domain/entity tags across all genes |
| Codons + metadata JSON | 8.2 MB | 1.6% | Semantic tags, promoter JSON, epigenetics |
| ΣĒMA embeddings (20D vectors) | 0.34 MB | 0.1% | Semantic primes — 80 bytes per gene |
| Key-value facts (pre-extracted) | 1.4 MB | 0.3% | Pre-parsed key=value pairs for answer slate |
| **Accounted payload subtotal** | **310.0 MB** | **59.3%** | Actual data across all indexes |
| SQLite B-tree + page overhead | 212.7 MB | 40.7% | Index structure, not fragmentation |
| **Total file size** | **522.7 MB** | **100%** | |

💾 VACUUM impact: This table reflects post-VACUUM state. Before VACUUM, the database was 752 MB — the extra 229 MB (30.4%) was free pages from thinning 11,529 genes down to 7,264 during tuning. SQLite holds deleted pages until a VACUUM reclaims them. The ~213 MB of "B-tree overhead" that remains is structural: page headers, cell pointers, interior nodes of the index B-trees. That's not reclaimable without changing the indexing strategy.
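The reclamation step itself is plain SQLite. A sketch of what /admin/vacuum does under the hood, using the stdlib sqlite3 module against a throwaway database (the real endpoint wraps this with before/after reporting):

```python
import os
import sqlite3
import tempfile

def vacuum(db_path):
    """Rewrite the DB file in place, reclaiming pages freed by deleted rows."""
    before = os.path.getsize(db_path)
    con = sqlite3.connect(db_path)
    con.execute("VACUUM")            # blocking: the whole file is rewritten
    con.close()
    return before, os.path.getsize(db_path)

# Demo: bulk-insert, thin, then reclaim — a miniature of the 752 MB → 523 MB story
path = os.path.join(tempfile.mkdtemp(), "genome_demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE gene (content TEXT)")
con.executemany("INSERT INTO gene VALUES (?)", [("x" * 1000,) for _ in range(2000)])
con.execute("DELETE FROM gene WHERE rowid % 2 = 0")   # "thinning" leaves free pages
con.commit()
con.close()
before, after = vacuum(path)
print(f"{before} -> {after} bytes")   # the file shrinks once free pages are reclaimed
```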

Observations:

  • FTS5 dominates storage (35.8% of the file). The full-text index holds position data for every token across all 7K genes — it's what enables the sub-5ms content queries that make the ~1s total retrieval latency possible.
  • Raw content is only 8.5% of the file. The rest is indexes. This is the expected tradeoff for a retrieval-optimized database vs a flat text archive.
  • Accounted payload is 310 MB (59.3%). The remaining 213 MB (40.7%) is legitimate B-tree structure overhead — page headers, cell pointers, and internal index nodes. SQLite can't compress this further without sacrificing query speed.
  • ΣĒMA embeddings are essentially free — 20 floats per gene = 80 bytes. A 1M-gene genome would cost only 80 MB for the semantic tier.
  • Inference cost is unchanged by DB size: the LLM only ever sees ~15K tokens per turn regardless of whether the genome is 50 MB or 50 GB.
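The last two observations are simple arithmetic; a quick back-of-envelope check (assuming 4-byte float32 components for the 20D vectors):

```python
# ΣĒMA semantic tier: 20 floats per gene, float32 = 4 bytes each
bytes_per_gene = 20 * 4                      # 80 bytes per gene
genes = 1_000_000
semantic_tier_mb = genes * bytes_per_gene / 1e6
print(semantic_tier_mb)                      # 80.0 — MB for a 1M-gene genome
```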

Compression summary:

| Metric | Ratio | Meaning |
|---|---|---|
| Storage (raw → complement) | 2.69x | How much the ribosome compresses each gene's summary |
| Expression (full corpus → single turn) | 776x | How much of the genome the LLM sees per query |
| DB file / raw content | 11.76x (post-VACUUM) | Index overhead for 4-tier retrieval |
| DB file / raw content | 16.90x (pre-VACUUM) | With fragmentation from thinning |
| vs 128K-stuffed context | 8.5x fewer tokens | Baseline "dump everything" approach |
| vs chunked RAG (25K tokens) | 1.7x fewer tokens | Standard vector-search RAG |

The headline number — 776x inference compression — is what matters for cost and latency. Everything else is a bookkeeping detail of how the Librarian files its books.

Needle-in-a-haystack on this 7,264-gene genome (~46MB raw knowledge):

| Model | Params | VRAM | Retrieval | Accuracy |
|---|---|---|---|---|
| qwen3:0.6b | 0.6B | 0.5 GB | 10/10 | 2/10 |
| qwen3:1.7b | 1.7B | 1.4 GB | 10/10 | 3/10 |
| qwen3:4b | 4B | 2.5 GB | 10/10 | 9/10 |
| gemma4:e4b (MoE) | 8B / 4B active | 9.6 GB | 10/10 | 9/10 |
| qwen3:8b | 8B | 5.2 GB | 10/10 | 9/10 |
| gemma4:26b-a4b (MoE + DDR4 offload) | 26B / 4B active | 8 GB + 13 GB RAM | 10/10 | 6/10 |
| Claude Haiku + Helix | API | — | 10/10 | 10/10 |
| Claude Sonnet + Helix | API | — | 10/10 | 10/10 |
| Claude Opus + Helix | API | — | 10/10 | 10/10 |

Without Helix, the same Claude models score 3-4/10 (hand-curated reference only). The genome is a universal uplift: identical gains at every price tier and parameter count. See docs/RESEARCH.md for the full SIKE analysis.

Quick Start

# Install from PyPI (beta)
pip install helix-context --pre

# Pull a small model for the ribosome (context codec)
ollama pull gemma4:e2b

# Start the proxy
helix
# or: python -m uvicorn helix_context.server:app --host 127.0.0.1 --port 11437

# Seed the genome with your own project files
python examples/seed_genome.py path/to/your/project/

# Check genome health
curl http://127.0.0.1:11437/stats

Point any OpenAI-compatible client at http://127.0.0.1:11437/v1 and start chatting. Context compression happens transparently.
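Any HTTP client works, since the proxy speaks the standard OpenAI chat format. A minimal stdlib sketch — no openai package needed; the commented call assumes the proxy and Ollama are already running:

```python
import json
import urllib.request

def build_payload(prompt, model="qwen3:4b"):
    """Standard OpenAI chat-completions body; the proxy accepts it unchanged."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt, model="qwen3:4b", base="http://127.0.0.1:11437"):
    """Send one chat turn through the Helix proxy (requires a running server)."""
    req = urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("What port does the Helix proxy use?")  # needs the proxy + Ollama running
```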

What You'll See

After seeding the genome, /stats shows the state of your knowledge base:

$ curl -s http://127.0.0.1:11437/stats | jq
{
  "total_genes": 7264,
  "open": 7264,
  "compression_ratio": 2.69,
  "health": {
    "total_queries": 503,
    "avg_ellipticity": 0.62,
    "status_counts": {"aligned": 143, "sparse": 267, "denatured": 93}
  }
}

A /context query returns the expressed context window — exactly what gets injected into the downstream LLM:

$ curl -s http://127.0.0.1:11437/context \
    -H "Content-Type: application/json" \
    -d '{"query":"What port does the Helix proxy listen on?"}' | jq '.[0]'
{
  "name": "Helix Genome Context",
  "description": "12 genes expressed, 3.1x compression, health=aligned (Δε=0.66)",
  "content": "<expressed_context>\n<GENE src=\"helix-context/README.md\" facts=\"port=11437\">\n# Helix Context\n...",
  "context_health": {
    "ellipticity": 0.66,
    "coverage": 0.85,
    "density": 0.42,
    "freshness": 1.0,
    "genes_expressed": 12,
    "status": "aligned"
  }
}

A chat request through the proxy gets the context injected automatically — your client doesn't need to know Helix exists:

$ curl -s http://127.0.0.1:11437/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "qwen3:4b",
      "messages": [{"role":"user","content":"What port does the Helix proxy use?"}]
    }' | jq -r '.choices[0].message.content'

The Helix proxy server listens on **port 11437**, as specified in helix.toml
under [server]. This is configured in the repository at helix-context/README.md.

The model answered from the retrieved genes, not its training data — which doesn't contain your project.

How It Works

6-step expression pipeline per turn:

| Step | What | Cost | Blocking? |
|---|---|---|---|
| 1. Extract | Heuristic keyword extraction from query | 0 tokens | No |
| 2. Express | SQLite promoter lookup + synonym expansion + co-activation | 0 tokens | No |
| 3. Re-rank | Small CPU model scores candidates by relevance | ~300 tokens | Yes |
| 4. Splice | Small CPU model trims introns, keeps exons (batched) | ~600 tokens | Yes |
| 5. Assemble | Join spliced parts, enforce token budget, wrap in tags | 0 tokens | No |
| 6. Replicate | Pack query+response exchange back into genome | ~300 tokens | No (background) |
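In pseudocode, the six steps reduce to a short function. A simplified, self-contained sketch with the ribosome stubbed out as callables — the real orchestration lives in context_manager.py, and every name below is illustrative:

```python
def extract_keywords(query):
    """Step 1 — heuristic extraction: no LLM tokens spent."""
    return [w.lower().strip("?.,") for w in query.split() if len(w) > 3]

def express_pipeline(query, genes, re_rank, splice, budget_chars=400):
    """Sketch of steps 1-5; replication (step 6) is a background task in the server."""
    kws = extract_keywords(query)
    # Step 2 — promoter lookup, reduced here to naive keyword matching
    candidates = [g for g in genes if any(k in g.lower() for k in kws)]
    # Step 3 — small-model re-rank (blocking); injected as a callable
    ranked = re_rank(query, candidates)
    # Step 4 — splice each gene down to its exons (blocking, batched in reality)
    spliced = [splice(query, g) for g in ranked]
    # Step 5 — assemble under a hard budget, wrap in tags
    out, used = [], 0
    for s in spliced:
        if used + len(s) > budget_chars:
            break
        out.append(s)
        used += len(s)
    return "<expressed_context>" + "\n".join(out) + "</expressed_context>"

ctx = express_pipeline(
    "Which port does helix use?",
    ["Helix proxy listens on port 11437", "Hades subtitle dump, chapter 3"],
    re_rank=lambda q, c: c,          # identity stand-ins for the ribosome
    splice=lambda q, g: g,
)
print(ctx)
```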

Token budget:

  • 3k tokens: ribosome decoder prompt (fixed, tells the big model how to read codons)
  • 12k tokens: expressed context (dense XML gene format, 12 genes per turn)
  • 11M+ tokens: genome cold storage (SQLite, ~46MB raw on a mature project)

Compression metrics:

  • Storage: 2.7x (raw content → ribosome complements)
  • Expression: 769x (full genome → what the LLM sees per turn)
  • vs naive RAG at 25K tokens: 1.7x fewer tokens, 10/10 vs ~6/10 accuracy

Key Features

Context Health Monitor (Delta-Epsilon)

Every query computes a health signal measuring how well the genome served it:

{
  "context_health": {
    "ellipticity": 0.82,
    "coverage": 0.75,
    "density": 0.68,
    "freshness": 1.0,
    "genes_expressed": 3,
    "genes_available": 42,
    "status": "aligned"
  }
}
| Status | Ellipticity | Meaning |
|---|---|---|
| aligned | >= 0.7 | Genome is well-grounded, model is informed |
| sparse | >= 0.3 | Gaps exist, model may guess on some topics |
| stale | any | Expressed genes are outdated (low freshness) |
| denatured | < 0.3 | Context is unreliable, high hallucination risk |
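The thresholds are simple to apply client-side. A sketch of the classification (the staleness cutoff is an assumption — the table only says "low freshness"):

```python
def health_status(ellipticity, freshness, stale_threshold=0.5):
    """Classify a context-health signal per the Delta-Epsilon status table."""
    if freshness < stale_threshold:   # stale overrides ellipticity (threshold assumed)
        return "stale"
    if ellipticity >= 0.7:
        return "aligned"
    if ellipticity >= 0.3:
        return "sparse"
    return "denatured"

print(health_status(0.82, 1.0))   # aligned
print(health_status(0.25, 1.0))   # denatured
```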

Horizontal Gene Transfer (HGT)

Export a genome and import it into another Helix instance:

# Export
python examples/hgt_transfer.py export -d "Project knowledge snapshot"

# Preview what an import would change
python examples/hgt_transfer.py diff genome_export.helix

# Import into another instance
python examples/hgt_transfer.py import genome_export.helix

Three merge strategies: skip_existing (safe default), overwrite, newest. Content-addressed gene IDs ensure deduplication across instances.
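Content addressing makes deduplication mechanical: identical content always hashes to the same gene ID on any instance, so re-importing is a no-op under skip_existing. A sketch (hashing scheme and merge shape assumed, not the actual implementation; newest would additionally compare timestamps):

```python
import hashlib

def gene_id(content: str) -> str:
    """Content-addressed ID: same content → same ID, on any Helix instance."""
    return hashlib.sha256(content.encode()).hexdigest()[:16]

def merge(local: dict, imported: dict, strategy="skip_existing"):
    """Merge imported genes, keyed by content-addressed ID."""
    for gid, gene in imported.items():
        if gid in local and strategy == "skip_existing":
            continue                 # safe default: keep the local copy
        local[gid] = gene            # overwrite existing / add new gene
    return local
```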

Associative Memory

Genes that are frequently expressed together build co-activation links. When you query for topic A, the genome also pulls in topic B if they've been co-expressed before. This creates an organic associative memory that grows smarter over time.
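A minimal sketch of how co-activation links could accumulate (illustrative data structures, not the genome's actual entity-graph tables):

```python
from collections import Counter
from itertools import combinations

coactivation = Counter()          # (gene_a, gene_b) → times expressed together

def record_expression(gene_ids):
    """Called after each turn with the genes that were expressed together."""
    for pair in combinations(sorted(gene_ids), 2):
        coactivation[pair] += 1

def co_expressed_with(gene, min_count=2):
    """Genes that have appeared alongside `gene` at least min_count times."""
    return {b if a == gene else a
            for (a, b), n in coactivation.items()
            if gene in (a, b) and n >= min_count}

record_expression(["auth", "jwt", "session"])
record_expression(["auth", "jwt"])
print(co_expressed_with("auth"))   # {'jwt'} — the repeated co-expression
```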

Tissue-Specific Expression (MoE + Small Models)

MoE models (Gemma 4) and sub-3.2B models can't reliably "look back" across a 15K context window. Helix auto-detects these architectures and switches to a tissue-specific expression mode inspired by how cell types selectively express genes from the same genome:

  1. Answer slate — pre-extracted key=value facts front-loaded in the first ~200 tokens, inside every sliding-window attention layer (Gemma 4's 5:1 SWA ratio means 5 of 6 layers only see 1,024-token windows).
  2. Relevance-first gene ordering — highest-scoring gene at position 0, not sorted by source sequence. Guarantees the best match lands inside every attention window.
  3. Think suppression/no_think injection + temp=0 for small models that otherwise waste their output budget on reasoning loops.

Measured impact on gemma4:e4b:

| Mode | Retrieval | Accuracy |
|---|---|---|
| Standard expression | 10/10 | 5/10 |
| MoE tissue expression | 10/10 | 9/10 |

Dense models (qwen3 family) automatically use the standard expression path and are unaffected. Detection is per-request based on the downstream model name, so the same server can handle mixed clients.
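Per-request detection can be as simple as matching the downstream model name. A sketch of the routing decision (the name patterns are assumptions for illustration; the real detection lives in the server):

```python
def needs_tissue_expression(model: str) -> bool:
    """True for MoE or sub-3.2B models that struggle with long look-back."""
    model = model.lower()
    moe_markers = ("e2b", "e4b", "a4b")   # MoE variant suffixes (assumed naming)
    small = ("0.6b", "1.7b", "3b")        # sub-3.2B dense sizes
    return any(m in model for m in moe_markers + small)

print(needs_tissue_expression("gemma4:e4b"))   # True  → tissue expression
print(needs_tissue_expression("qwen3:8b"))     # False → standard path
```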

Synonym Expansion

Configure lightweight query expansion in helix.toml:

[synonyms]
cache = ["redis", "ttl", "invalidation", "cdn"]
auth = ["jwt", "login", "security", "token"]

When a user asks about "cache", the genome also searches for "redis", "ttl", etc.
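The expansion itself is just a dictionary lookup at query time. A sketch mirroring the [synonyms] table above (function name is illustrative):

```python
SYNONYMS = {
    "cache": ["redis", "ttl", "invalidation", "cdn"],
    "auth":  ["jwt", "login", "security", "token"],
}

def expand_query(keywords):
    """Return the original keywords plus any configured synonyms, deduplicated."""
    expanded = list(keywords)
    for kw in keywords:
        for syn in SYNONYMS.get(kw.lower(), []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded

print(expand_query(["cache", "eviction"]))
# ['cache', 'eviction', 'redis', 'ttl', 'invalidation', 'cdn']
```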

HTTP Endpoints

Core endpoints

| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | OpenAI-compatible proxy (primary integration) |
| /ingest | POST | Ingest content into genome: {content, content_type, metadata?} |
| /context | POST | Query genome for context: {query} (Continue format) |
| /consolidate | POST | Distill session buffer into knowledge genes |
| /stats | GET | Genome metrics, compression ratio, health |
| /health | GET | Server status, ribosome model, gene count |
| /health/history | GET | Recent query health signals (?limit=N) |

Admin / maintenance endpoints

| Endpoint | Method | Description |
|---|---|---|
| /admin/refresh | POST | Reopen the genome connection to see external writes |
| /admin/vacuum | POST | Reclaim free SQLite pages after thinning (returns before/after size) |
| /admin/kv-backfill | POST | Run CPU regex KV extraction on genes missing key_values |
| /replicas | GET | List replica status (sync lag, paths) |
| /replicas/sync | POST | Force-sync all replicas from the master genome |
| /bridge/status | GET | Shared-memory bridge status (inbox, signals) |
| /bridge/collect | POST | Ingest pending files from the shared bridge inbox |
| /bridge/signal | POST | Write a named signal to the shared bridge |

Four operations that sound similar — but do different things

These are the most confused operations in the admin surface. Know which one to reach for:

| Operation | What it does | When to use |
|---|---|---|
| checkpoint(mode) | Flush WAL log into the main DB file. No file size change. | During/after bulk ingest, to guarantee data is durable before a crash. Automatic every 50 inserts. |
| refresh() / /admin/refresh | Close and reopen the long-lived DB connection so it picks up writes made by external processes. | After running a thinning script, ingest worker, or any out-of-band write. Cheap, non-destructive. |
| compact() | Scan every gene's source_id, mtime-check the file, mark source-changed genes as AGING. Does not delete or shrink anything. | Periodic source-staleness detection (runs automatically every compact_interval seconds). |
| vacuum() / /admin/vacuum | Rewrite the SQLite file to reclaim free pages from previous deletions. Shrinks the file. | After large thinning operations. Blocking — run during maintenance windows only. Our 7.2K-gene genome reclaimed 229 MB (30%) on first VACUUM. |

Rule of thumb:

  • If you care about durability → checkpoint()
  • If you care about visibility (seeing external writes) → refresh()
  • If you care about staleness (detecting changed sources) → compact()
  • If you care about disk space → vacuum()
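The HTTP-exposed subset maps one-to-one onto that rule of thumb. A tiny stdlib dispatch sketch (checkpoint and compact run automatically server-side, so only the three admin endpoints appear; the live call is left commented because it needs a running proxy):

```python
import urllib.request

ADMIN_OPS = {
    "visibility": "/admin/refresh",     # see external writes
    "disk":       "/admin/vacuum",      # reclaim free pages (blocking!)
    "kv":         "/admin/kv-backfill", # regex KV extraction backfill
}

def admin_request(concern, base="http://127.0.0.1:11437"):
    """Build the POST request for a maintenance concern (does not send it)."""
    return urllib.request.Request(base + ADMIN_OPS[concern], method="POST")

req = admin_request("disk")
print(req.full_url, req.get_method())
# urllib.request.urlopen(req)  # uncomment with a live server; vacuum blocks
```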

Continue IDE Integration

Add to ~/.continue/config.yaml:

models:
  - name: Helix (Local)
    provider: openai
    model: gemma4:e4b
    apiBase: http://127.0.0.1:11437/v1
    apiKey: EMPTY
    roles: [chat]
    defaultCompletionOptions:
      contextLength: 128000
      maxTokens: 4096

Use Chat mode (not Agent mode). Set contextLength high so Continue sends the full message; Helix handles compression downstream.

Python API

from helix_context import HelixContextManager, load_config

config = load_config()
helix = HelixContextManager(config)

# Ingest content
helix.ingest("Your document text here", content_type="text")
helix.ingest(open("src/main.py").read(), content_type="code")

# Build context for a query
window = helix.build_context("How does auth work?")
print(window.expressed_context)
print(window.context_health.status)  # "aligned" / "sparse" / "denatured"

# Learn from an exchange
helix.learn("How does auth work?", "JWT middleware validates tokens...")

# Export genome
from helix_context.hgt import export_genome
export_genome(helix.genome, "project.helix", description="Auth system knowledge")

ScoreRift Integration

Helix includes a bridge to ScoreRift for divergence-based context health monitoring:

from helix_context.integrations.scorerift import GenomeHealthProbe, cd_signal

# Probe genome health
probe = GenomeHealthProbe("http://127.0.0.1:11437")
report = probe.full_scan()

# Register as ScoreRift dimensions
from helix_context.integrations.scorerift import make_genome_dimensions
engine.register_many(make_genome_dimensions())

# Feed divergence resolutions back into the genome
from helix_context.integrations.scorerift import resolution_to_gene
resolution_to_gene("security", auto_score=0.85, manual_score=1.0,
                   resolution="False positives in auth module scanner rules")

Configuration

All config in helix.toml:

[ribosome]
model = "gemma4:e4b"        # context codec for pack/re_rank/splice
backend = "ollama"          # or "deberta" for faster CPU-only ribosome
timeout = 30                # seconds before fallback
keep_alive = "30m"          # keep model loaded (eliminates swap latency)
warmup = true               # pre-load model on server start

[budget]
ribosome_tokens = 3000
expression_tokens = 12000   # 15K total per turn (decoder + expression)
max_genes_per_turn = 12
splice_aggressiveness = 0.3
decoder_mode = "condensed"  # full | condensed | minimal | none

[genome]
path = "genome.db"
cold_start_threshold = 10
replicas = ["C:/helix-cache/genome.db", "E:/helix-cache/genome.db"]
replica_sync_interval = 100

[ingestion]
backend = "cpu"             # "cpu" (spaCy+regex, fast) | "ollama" (LLM, slow)
splade_enabled = true       # SPLADE sparse expansion at index time
entity_graph = true         # entity-based co-activation links

[server]
host = "127.0.0.1"
port = 11437
upstream = "http://localhost:11434"

[synonyms]
cache = ["redis", "ttl", "invalidation", "cdn"]
auth = ["jwt", "login", "security", "token"]

Environment variables:

  • OLLAMA_KV_CACHE_TYPE=q4_0 — INT4 KV cache quantization (recommended). q8_0 tested but produced WORSE accuracy (gave models more room to hallucinate in think mode). q4_0 is faster, more accurate, and uses less VRAM.
  • HELIX_CONFIG=/path/to/helix.toml — override config file location

Testing

# Mock tests only (no Ollama needed, ~8s)
pytest tests/ -m "not live"

# Live tests (requires Ollama)
pytest tests/ -m live -v -s

# Full suite
pytest tests/ -v

Benchmarks

# Needle-in-a-haystack (single model)
HELIX_MODEL=qwen3:4b python benchmarks/bench_needle.py

# Full sweep across all local models
python benchmarks/bench_sweep.py

See docs/RESEARCH.md for full SIKE analysis and results across 7 local models + 3 Claude API tiers.

Architecture

| Module | Role |
|---|---|
| schemas.py | Gene, ContextWindow, ContextHealth, ChromatinState |
| codons.py | CodonChunker (text/code splitting) + CodonEncoder (serialization) |
| genome.py | SQLite genome with promoter-tag retrieval + co-activation |
| ribosome.py | Small-model codec: pack, re_rank, splice, replicate |
| context_manager.py | 6-step pipeline orchestrator + pending replication buffer |
| server.py | FastAPI proxy + standalone endpoints |
| config.py | TOML config loader with synonym map |
| hgt.py | Genome export/import (Horizontal Gene Transfer) |
| integrations/scorerift.py | CD spectroscope bridge to ScoreRift |

Origin

Built as a standalone package extracted from BigEd CC. Implements the "Ribosome Hypothesis" for local LLM context management.

License

Apache 2.0
