⚠️ Early Development: API may change.
Vectorless is a Rust library for querying structured documents using natural language, without vector databases or embedding models.
Instead of chunking documents into vectors, Vectorless preserves the document's tree structure and uses a hybrid algorithm + LLM approach to navigate it, much as a human reads a table of contents:
- Algorithm handles "how to walk": BM25 scoring, tree traversal (fast, deterministic)
- Pilot (LLM) handles "where to go": semantic understanding, ambiguity resolution
Analogy: Traditional RAG is like searching every word in a book. Vectorless is like reading the table of contents, then going to the right chapter.
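To make the "how to walk" half concrete, here is a minimal sketch of BM25 scoring over a handful of section texts. It is not Vectorless's internal code: the function names (`tokenize`, `bm25_scores`), the tokenizer, and the parameter values `K1 = 1.2` and `B = 0.75` are assumptions chosen for illustration.

```rust
use std::collections::HashMap;

// Common BM25 defaults (assumed here, not taken from Vectorless).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Lowercase a text and split it on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Score every document against the query with BM25.
/// Returns one score per document, in input order.
pub fn bm25_scores(query: &str, docs: &[&str]) -> Vec<f64> {
    if docs.is_empty() {
        return Vec::new();
    }
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n = tokenized.len() as f64;
    let avg_len = tokenized.iter().map(|d| d.len()).sum::<usize>() as f64 / n;

    // Document frequency: how many documents contain each term.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in &tokenized {
        let mut seen: Vec<&str> = doc.iter().map(|s| s.as_str()).collect();
        seen.sort_unstable();
        seen.dedup();
        for term in seen {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    let query_terms = tokenize(query);
    tokenized
        .iter()
        .map(|doc| {
            let len = doc.len() as f64;
            // Length normalization; guard against an all-empty corpus.
            let norm = if avg_len > 0.0 { len / avg_len } else { 0.0 };
            query_terms
                .iter()
                .map(|q| {
                    let tf = doc.iter().filter(|t| *t == q).count() as f64;
                    let n_q = *df.get(q.as_str()).unwrap_or(&0) as f64;
                    // Smoothed IDF; the "+ 1.0" keeps it non-negative.
                    let idf = ((n - n_q + 0.5) / (n_q + 0.5) + 1.0).ln();
                    idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * norm))
                })
                .sum::<f64>()
        })
        .collect()
}
```

Calling `bm25_scores("reset the device", &docs)` ranks a section that mentions "reset" above sections that only share a common word such as "the". Scoring like this is cheap and deterministic, which fits the role the list above gives to the algorithm; semantic judgment is left to the Pilot.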
Technical Manual (root)
├── Chapter 1: Introduction
├── Chapter 2: Architecture
│   ├── 2.1 System Design
│   └── 2.2 Implementation
└── Chapter 3: API Reference
Each node gets an AI-generated summary, enabling fast navigation.
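A tree like the one above could be represented roughly as follows. This is a sketch only: the `Node` type, its fields, and the helper `example_manual` are hypothetical names, not Vectorless's actual types, and the summaries are written by hand rather than generated by an LLM.

```rust
/// One section of a document tree (hypothetical type for illustration).
/// `summary` stands in for the AI-generated summary.
#[derive(Debug, Clone)]
pub struct Node {
    pub title: String,
    pub summary: String,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(title: &str, summary: &str, children: Vec<Node>) -> Self {
        Node {
            title: title.to_string(),
            summary: summary.to_string(),
            children,
        }
    }

    /// Total number of nodes in this subtree, including `self`.
    pub fn count(&self) -> usize {
        1 + self.children.iter().map(Node::count).sum::<usize>()
    }

    /// Depth-first search by exact title. Returns the path from this node,
    /// formatted like "Chapter 2 > Section 2.1".
    pub fn path_to(&self, title: &str) -> Option<String> {
        if self.title == title {
            return Some(self.title.clone());
        }
        self.children
            .iter()
            .find_map(|child| child.path_to(title))
            .map(|rest| format!("{} > {}", self.title, rest))
    }
}

/// Build the example tree shown above, with hand-written summaries.
pub fn example_manual() -> Node {
    Node::new(
        "Technical Manual",
        "Complete manual for the product",
        vec![
            Node::new("Chapter 1: Introduction", "What the product is", vec![]),
            Node::new(
                "Chapter 2: Architecture",
                "How the system is built",
                vec![
                    Node::new("2.1 System Design", "High-level design", vec![]),
                    Node::new("2.2 Implementation", "Implementation details", vec![]),
                ],
            ),
            Node::new("Chapter 3: API Reference", "Public API listing", vec![]),
        ],
    )
}
```

Because each node carries its own summary, a navigator only has to read the summaries of one node's children at a time instead of the whole document.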
When you ask "How do I reset the device?":
- Analyze: Understand query intent and complexity
- Navigate: LLM guides tree traversal (like reading a TOC)
- Retrieve: Return the exact section with context
- Verify: Check if more information is needed (backtracking)
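The four steps above can be sketched as a best-first descent with backtracking. This is an illustration under loud assumptions: a simple keyword match stands in for the Pilot (no LLM is called), and `Node`, `keywords`, `navigate`, `find_section`, and `example_manual` are hypothetical names, not part of the Vectorless API.

```rust
/// Hypothetical section type: a title, a summary, body text, and children.
pub struct Node {
    pub title: String,
    pub summary: String,
    pub content: String,
    pub children: Vec<Node>,
}

fn node(title: &str, summary: &str, content: &str, children: Vec<Node>) -> Node {
    Node {
        title: title.into(),
        summary: summary.into(),
        content: content.into(),
        children,
    }
}

/// Analyze (stand-in): keep lowercase query words longer than 3 characters.
fn keywords(query: &str) -> Vec<String> {
    query
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| w.len() > 3)
        .map(|w| w.to_lowercase())
        .collect()
}

/// Number of keywords that occur (as substrings) in `text`.
fn score(words: &[String], text: &str) -> usize {
    let text = text.to_lowercase();
    words.iter().filter(|w| text.contains(w.as_str())).count()
}

/// Navigate + Verify: visit children best-first by summary score and
/// backtrack when a leaf fails `verify`. On success, `path` holds the
/// titles from the root down to the accepted section.
fn navigate<'a>(
    n: &'a Node,
    words: &[String],
    verify: &dyn Fn(&Node) -> bool,
    path: &mut Vec<&'a str>,
) -> bool {
    path.push(n.title.as_str());
    if n.children.is_empty() {
        if verify(n) {
            return true;
        }
    } else {
        let mut ranked: Vec<&'a Node> = n.children.iter().collect();
        // Stable sort: ties keep document order.
        ranked.sort_by_key(|c| std::cmp::Reverse(score(words, &c.summary)));
        for child in ranked {
            if navigate(child, words, verify, path) {
                return true;
            }
        }
    }
    path.pop(); // dead end: backtrack
    false
}

/// Retrieve: return the path of the first leaf whose content mentions
/// every keyword, or `None` if no section qualifies.
pub fn find_section(root: &Node, query: &str) -> Option<String> {
    let words = keywords(query);
    let verify = |n: &Node| score(&words, &n.content) == words.len();
    let mut path = Vec::new();
    if navigate(root, &words, &verify, &mut path) {
        Some(path.join(" > "))
    } else {
        None
    }
}

/// Small hand-built manual used to exercise the sketch.
pub fn example_manual() -> Node {
    node("Technical Manual", "Full product manual", "", vec![
        node("Chapter 1: Introduction", "Overview of the device and its features",
             "The device ships with a power adapter and a quick-start card.", vec![]),
        node("Chapter 4: Maintenance", "Maintenance tasks, including cleaning and reset", "", vec![
            node("4.1 Cleaning", "How to clean the device housing",
                 "Wipe the housing with a dry cloth.", vec![]),
            node("4.2 Reset Procedure", "Steps to reset the device to factory settings",
                 "To reset the device, hold the power button for 10 seconds.", vec![]),
        ]),
    ])
}
```

With this tree, the query "How do I reset the device?" first tries Chapter 1 (its summary ties with Chapter 4 and comes first in document order), fails verification there, backtracks, and ends at section 4.2. In a real system the ranking and the sufficiency check would be the Pilot's job; the keyword match is only a placeholder for that judgment.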
| Aspect | Traditional RAG | Vectorless |
|---|---|---|
| Infrastructure | Vector DB + Embedding Model | Just an LLM API |
| Document Structure | Lost in chunking | Preserved |
| Context | Fragment only | Section + surrounding context |
| Setup Time | Hours to Days | Minutes |
| Best For | Unstructured text | Structured documents |
Input:
Document: 100-page technical manual (PDF)
Query: "How do I reset the device?"
Output:
Answer: "To reset the device, hold the power button for 10 seconds
until the LED flashes blue, then release..."
Source: Chapter 4 > Section 4.2 > Reset Procedure
✅ Good fit:
- Technical documentation
- Manuals and guides
- Structured reports
- Policy documents
- Any document with clear hierarchy
❌ Not ideal:
- Unstructured text (tweets, chat logs)
- Very short documents (< 1 page)
- Pure Q&A datasets without structure
[dependencies]
vectorless = "0.1"
# Needed for the #[tokio::main] attribute used in the example
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }

cp vectorless.example.toml ./vectorless.toml

use vectorless::Engine;

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    // Create client
    let client = Engine::builder()
        .with_workspace("./workspace")
        .build()?;

    // Index a document (PDF, Markdown, DOCX, HTML)
    let doc_id = client.index("./document.pdf").await?;

    // Query with natural language
    let result = client.query(&doc_id, "What are the system requirements?").await?;
    println!("Answer: {}", result.content);
    println!("Source: {}", result.path); // e.g., "Chapter 2 > Section 2.1"

    Ok(())
}

| Feature | Description |
|---|---|
| Zero Infrastructure | No vector DB, no embedding model; just an LLM API |
| Multi-format Support | PDF, Markdown, DOCX, HTML out of the box |
| Incremental Updates | Add/remove documents without full re-index |
| Traceable Results | See the exact navigation path taken |
| Feedback Learning | Improves from user feedback over time |
| Multi-turn Queries | Handles complex questions with decomposition |
- Index Pipeline: Parses documents, builds tree, generates summaries
- Retrieval Pipeline: Analyzes query, navigates tree, returns results
- Pilot: LLM-powered navigator that guides retrieval decisions
- Metrics Hub: Unified observability for LLM calls, retrieval, and feedback
See the examples/ directory.
Contributions welcome! If you find this useful, please ⭐ the repo; it helps others discover it.
Apache License 2.0