documentdb · xgerman · Apr 2, 2026
@@ -0,0 +1,230 @@
+# LightRAG with DocumentDB
+
+This playground deploys [LightRAG](https://github.com/HKUDS/LightRAG) — a graph-based Retrieval-Augmented Generation (RAG) engine — using DocumentDB as its MongoDB-compatible storage backend.
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────┐
+│                  Kubernetes Cluster                  │
+│                                                     │
+│  ┌──────────────┐    ┌──────────────┐               │
+│  │   LightRAG   │───▶│   Ollama     │               │
+│  │  (RAG Engine)│    │ (LLM + Embed)│               │
+│  └──────┬───────┘    └──────────────┘               │
+│         │                                           │
+│         │ MongoDB wire protocol                     │
+│         ▼                                           │
+│  ┌──────────────┐    ┌──────────────┐               │
+│  │  DocumentDB  │───▶│  PostgreSQL  │               │
+│  │  (Gateway)   │    │   (CNPG)     │               │
+│  └──────────────┘    └──────────────┘               │
+│                                                     │
+│  Storage mapping:                                   │
+│  ├─ KV storage      → MongoKVStorage (DocumentDB)   │
+│  ├─ Graph storage   → MongoGraphStorage (DocumentDB) │
+│  ├─ Doc status      → MongoDocStatusStorage (DocDB)  │
+│  └─ Vector storage  → NanoVectorDBStorage (local)    │
+└─────────────────────────────────────────────────────┘
+```
+
+LightRAG stores knowledge graph nodes, edges, document metadata, and LLM response caches in DocumentDB collections. Vector embeddings use local file-based storage because DocumentDB does not support the Atlas `$vectorSearch` operator.
+
+## Prerequisites
+
+- A running Kubernetes cluster with the DocumentDB operator installed
+- A healthy DocumentDB instance (see [Quick Start](../../docs/operator-public-documentation/preview/index.md))
+- [Helm](https://helm.sh/docs/intro/install/) v3.0+
+- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) configured for your cluster
+
+## Quick Start
+
+```bash
+# Deploy everything (Ollama + LightRAG) with default settings
+./scripts/deploy.sh
+
+# Clean up
+./scripts/cleanup.sh
+```
+
+## Step-by-Step Deployment
+
+### 1. Verify DocumentDB is Running
+
+```bash
+kubectl get documentdb --all-namespaces
+# Expected: STATUS = "Cluster in healthy state"
+```
+
+### 2. Deploy Ollama (LLM Backend)
+
+```bash
+kubectl apply -f helm/ollama.yaml
+kubectl wait --for=condition=ready pod -l app=ollama -n lightrag --timeout=120s
+
+# Pull models (required on first run)
+OLLAMA_POD=$(kubectl get pod -l app=ollama -n lightrag -o jsonpath='{.items[0].metadata.name}')
+kubectl exec -n lightrag "$OLLAMA_POD" -- ollama pull nomic-embed-text
+kubectl exec -n lightrag "$OLLAMA_POD" -- ollama pull qwen2.5:3b
+```
+
+### 3. Deploy LightRAG
+
+Edit `helm/lightrag-values.yaml` to set your DocumentDB connection string, then:
+
+```bash
+helm upgrade --install lightrag helm/lightrag \
+  -n lightrag \
+  -f helm/lightrag-values.yaml
+```
+
+### 4. Access the WebUI
+
+```bash
+kubectl port-forward svc/lightrag 9621:9621 -n lightrag
+# Open http://localhost:9621
+```
+
+### 5. Test Document Ingestion
+
+```bash
+# Insert a document
+curl -X POST http://localhost:9621/documents/text \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Your document text here..."}'
+
+# Query with graph-enhanced RAG
+curl -X POST http://localhost:9621/query \
+  -H "Content-Type: application/json" \
+  -d '{"query": "What is this document about?", "mode": "hybrid"}'
+```
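
The two `curl` calls above can also be issued from a script. The following is a minimal sketch using only the Python standard library. The helper names (`build_post`, `insert_text`, `query`, `send`) are hypothetical, the base URL assumes the port-forward from step 4 is active, and `send` assumes the server replies with a JSON body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9621"  # assumption: port-forward from step 4 is running


def build_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for a LightRAG endpoint without sending it."""
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def insert_text(text: str) -> urllib.request.Request:
    # Same body as the first curl command above.
    return build_post("/documents/text", {"text": text})


def query(question: str, mode: str = "hybrid") -> urllib.request.Request:
    # Same body as the second curl command above.
    return build_post("/query", {"query": question, "mode": mode})


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a prepared request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payloads easy to inspect: `send(insert_text("..."))` followed by `send(query("..."))` mirrors the two commands above.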
+
+## Configuration
+
+### DocumentDB Connection
+
+Update the `MONGO_URI` in `helm/lightrag-values.yaml`:
+
+```yaml
+env:
+  MONGO_URI: "mongodb://<user>:<password>@<service>.<namespace>.svc.cluster.local:<port>/?directConnection=true&authMechanism=SCRAM-SHA-256&tls=true&tlsAllowInvalidCertificates=true"
+  MONGO_DATABASE: "LightRAG"
+```
+
+To get your connection string from the DocumentDB resource status:
+
+```bash
+# The connection string contains embedded kubectl commands for credentials.
+# Use eval to resolve them into a usable URI.
+RAW_CONN=$(kubectl get documentdb <cluster-name> -n <documentdb-namespace> -o jsonpath='{.status.connectionString}')
+CONNECTION_STRING=$(eval "echo \"$RAW_CONN\"")
+echo "$CONNECTION_STRING"
+```
+
+> **Note:** The `eval` command executes shell expansions in the connection string. This is safe when the string comes from your own DocumentDB resource, but never pipe untrusted input through `eval`.
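
If you assemble the URI by hand instead, note that reserved characters in the username or password (such as `@`, `:` or `/`) must be percent-encoded, otherwise the URI is parsed incorrectly. A minimal sketch of that step follows. `build_mongo_uri` is a hypothetical helper, and the query options are copied from the `MONGO_URI` template above.

```python
from urllib.parse import quote

# Options copied from the MONGO_URI template in this README.
URI_OPTIONS = (
    "directConnection=true&authMechanism=SCRAM-SHA-256"
    "&tls=true&tlsAllowInvalidCertificates=true"
)


def build_mongo_uri(user: str, password: str, host: str, port: int) -> str:
    """Assemble a connection URI, percent-encoding the credentials."""
    # safe="" makes quote() escape every reserved character, including "/".
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"mongodb://{safe_user}:{safe_password}@{host}:{port}/?{URI_OPTIONS}"
```

The result can be pasted into `MONGO_URI` in `helm/lightrag-values.yaml`.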
+
+### LLM Configuration
+
+The default configuration uses [Ollama](https://ollama.com) with `qwen2.5:3b` for text generation and `nomic-embed-text` for embeddings. To use OpenAI instead:
+
+```yaml
+env:
+  LLM_BINDING: openai
+  LLM_MODEL: gpt-4o-mini
+  LLM_BINDING_API_KEY: "sk-..."
+  EMBEDDING_BINDING: openai
+  EMBEDDING_MODEL: text-embedding-3-small
+  EMBEDDING_DIM: "1536"
+  EMBEDDING_BINDING_API_KEY: "sk-..."
+```
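
When switching embedding models, `EMBEDDING_DIM` has to change with the model, because it must match the size of the vectors the model produces. The sketch below checks a set of `env` values against the two model/dimension pairs that appear in this playground (768 for `nomic-embed-text`, 1536 for `text-embedding-3-small`). The function name and the lookup table are illustrative and not part of LightRAG.

```python
from typing import Optional

# Model/dimension pairs taken from this README and the default values file.
KNOWN_EMBEDDING_DIMS = {
    "nomic-embed-text": 768,
    "text-embedding-3-small": 1536,
}


def embedding_dim_problem(env: dict) -> Optional[str]:
    """Return a description of a model/dimension mismatch, or None if none is found."""
    model = env.get("EMBEDDING_MODEL", "")
    expected = KNOWN_EMBEDDING_DIMS.get(model)
    if expected is None:
        return None  # unknown model: nothing to compare against
    configured = int(env.get("EMBEDDING_DIM", "0"))
    if configured != expected:
        return (
            f"{model} produces {expected}-dimensional vectors, "
            f"but EMBEDDING_DIM is {configured}"
        )
    return None
```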
+
+### Storage Configuration
+
+| Storage Type | Backend | Notes |
+|---|---|---|
+| KV Storage | `MongoKVStorage` | Documents, chunks, entities, relations |
+| Graph Storage | `MongoGraphStorage` | Knowledge graph nodes and edges |
+| Doc Status | `MongoDocStatusStorage` | Document processing state |
+| Vector Storage | `NanoVectorDBStorage` | Local file-based (PVC) |
+
+> **Why not MongoVectorDBStorage?** DocumentDB does not support the MongoDB Atlas `$vectorSearch` aggregation operator required by `MongoVectorDBStorage`. The file-based `NanoVectorDBStorage` keeps vectors on the pod's persistent volume, so it needs no vector search support from the database.
+
+## DocumentDB Compatibility
+
+LightRAG's MongoDB storage assumes MongoDB Atlas features. This playground includes an init container that patches the LightRAG code for DocumentDB compatibility:
+
+| Feature | MongoDB Atlas | DocumentDB | Workaround |
+|---|---|---|---|
+| `$vectorSearch` | ✅ | ❌ | Use NanoVectorDBStorage |
+| `$listSearchIndexes` | ✅ | ❌ | Graceful fallback to regex |
+| `createIndex` with collation | ✅ | ❌ | Skip collation indexes |
+| `createIndex` (secondary) | ✅ | Hangs | Skip via init-container patch |
+| Basic CRUD operations | ✅ | ✅ | Works natively |
+| Aggregation pipelines | ✅ | ✅ | `$group`, `$match`, `$sort` work |
+
+### How the Init-Container Patches Work
+
+The Helm chart's `deployment.yaml` includes an init container (`patch-for-documentdb`) that runs before the main LightRAG container starts. It modifies LightRAG's MongoDB storage layer in place to skip operations that are incompatible with DocumentDB.
+
+**What gets patched:**
+
+Three async methods in `lightrag/kg/mongo_impl.py` are stubbed out with an early `return`:
+
+| Method | Why it's patched |
+|---|---|
+| `create_and_migrate_indexes_if_not_exists` | Calls `createIndex` with collation and secondary indexes. DocumentDB rejects collation (`"not implemented yet"`) and hangs indefinitely on secondary index creation. |
+| `create_search_index_if_not_exists` | Calls `$listSearchIndexes`, which DocumentDB doesn't support. LightRAG catches the resulting `PyMongoError` gracefully, but skipping the call avoids unnecessary error logs. |
+| `create_vector_index_if_not_exists` | Creates Atlas `$vectorSearch` indexes. Not applicable because this playground uses `NanoVectorDBStorage` (local) instead of `MongoVectorDBStorage`. |
+
+**How it works:**
+
+1. The init container uses the same image as the main LightRAG container.
+2. A Python script inserts `return` statements at the top of each method, effectively making them no-ops.
+3. Both code locations are patched — `/app/lightrag/` (dev install) and `/app/.venv/lib/python3.12/site-packages/lightrag/` (venv install) — because the LightRAG Docker image includes two copies.
+4. Bytecode caches (`__pycache__/mongo_impl*.pyc`) are cleared to prevent stale compiled code from being loaded.
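
The steps above can be sketched as follows. This is not the chart's actual patch script, which is not shown in this README. It is a minimal illustration of the same idea, using `ast` to find each method and inserting a bare `return` as its first statement. The function names `stub_async_methods` and `clear_bytecode` are hypothetical, and the sketch assumes each method body starts on its own line, not on the `def` line.

```python
import ast
import asyncio
from pathlib import Path

METHODS_TO_STUB = {
    "create_and_migrate_indexes_if_not_exists",
    "create_search_index_if_not_exists",
    "create_vector_index_if_not_exists",
}


def stub_async_methods(source: str, names=METHODS_TO_STUB) -> str:
    """Insert an early `return` as the first statement of each named async method."""
    lines = source.splitlines(keepends=True)
    targets = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.AsyncFunctionDef) and node.name in names:
            first = node.body[0]
            targets.append((first.lineno - 1, first.col_offset))
    # Insert from the bottom up so earlier line indexes stay valid.
    for index, col in sorted(targets, reverse=True):
        indent = lines[index][:col]  # reuse the body's own indentation
        lines.insert(index, f"{indent}return  # skipped on DocumentDB\n")
    return "".join(lines)


def clear_bytecode(package_dir: Path) -> int:
    """Delete cached bytecode for mongo_impl so the patched source is recompiled."""
    removed = 0
    for cache_dir in package_dir.rglob("__pycache__"):
        for pyc in cache_dir.glob("mongo_impl*.pyc"):
            pyc.unlink()
            removed += 1
    return removed


# Small demonstration on a stand-in class (not LightRAG's real code).
SAMPLE = '''
class Store:
    async def create_search_index_if_not_exists(self):
        """Would call $listSearchIndexes."""
        raise RuntimeError("unsupported on DocumentDB")

    async def untouched(self):
        return 1
'''
namespace = {}
exec(stub_async_methods(SAMPLE), namespace)
store = namespace["Store"]()
stubbed_result = asyncio.run(store.create_search_index_if_not_exists())
untouched_result = asyncio.run(store.untouched())
```

In the demonstration, the stubbed method returns before reaching its `raise`, while the method that is not in `METHODS_TO_STUB` behaves as before.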
+
+**Impact:** LightRAG operates normally without indexes. All CRUD, aggregation, and graph traversal operations work correctly. The only trade-off is that queries on large datasets may be slower without secondary indexes, which is acceptable for a playground.
+
+The patches are applied automatically — no manual configuration is needed.
+
+## Verified Operations
+
+The following LightRAG operations have been tested with DocumentDB:
+
+- ✅ Document ingestion and chunking
+- ✅ Entity and relationship extraction (via LLM)
+- ✅ Knowledge graph storage and traversal
+- ✅ LLM response caching
+- ✅ Naive, local, global, and hybrid RAG queries
+- ✅ Document status tracking
+- ✅ WebUI for graph visualization
+
+## Troubleshooting
+
+### LightRAG pod stuck in `Running` but not `Ready`
+
+The most common cause is `createIndex` hanging on DocumentDB. Verify the init container patch applied correctly:
+
+```bash
+POD=$(kubectl get pod -l app.kubernetes.io/name=lightrag -n lightrag -o jsonpath='{.items[0].metadata.name}')
+kubectl logs -n lightrag "$POD" -c patch-for-documentdb
+# Should show: "DocumentDB compatibility patches applied"
+```
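
If the log line is present but the pod still hangs, it may help to inspect the patched source directly. The sketch below reports which of the three methods begin with a bare `return`. It is a hypothetical helper, and it assumes the patch places the `return` as the very first statement of each method. The source text would come from the pod, for example by reading `/app/lightrag/kg/mongo_impl.py` with `kubectl exec ... -- cat`.

```python
import ast

EXPECTED_STUBS = (
    "create_and_migrate_indexes_if_not_exists",
    "create_search_index_if_not_exists",
    "create_vector_index_if_not_exists",
)


def stubbed_methods(source: str) -> set:
    """Return the names of expected methods whose first statement is a bare `return`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.AsyncFunctionDef) and node.name in EXPECTED_STUBS:
            first = node.body[0]
            if isinstance(first, ast.Return) and first.value is None:
                found.add(node.name)
    return found


def missing_stubs(source: str) -> list:
    """Return the expected methods that do not appear to be patched."""
    return sorted(set(EXPECTED_STUBS) - stubbed_methods(source))
```

An empty list from `missing_stubs` suggests all three methods were patched in that copy of the file. Both copies of the package listed in the patch description would need to be checked.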
+
+### Cannot connect to DocumentDB
+
+Verify the gateway service is reachable from the lightrag namespace:
+
+```bash
+kubectl run mongo-test --rm -it --restart=Never -n lightrag --image=mongo:7 \
+  --command -- mongosh "<your-connection-string>" --eval 'db.adminCommand({ping:1})'
+```
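
Before debugging credentials or TLS, it can help to confirm that the gateway port accepts TCP connections at all. The following is a minimal sketch of such a check. `can_connect` is a hypothetical helper, and it would need to run from a pod inside the cluster (for example in the `lightrag` namespace) for the service DNS name to resolve.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

With the default values file, the host would be `documentdb-service-my-cluster.documentdb-demo.svc.cluster.local` and the port `10260`. A `False` result points at networking or DNS, not at authentication.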
+
+### LLM errors during document processing
+
+Check that Ollama has the models pulled:
+
+```bash
+OLLAMA_POD=$(kubectl get pod -l app=ollama -n lightrag -o jsonpath='{.items[0].metadata.name}')
+kubectl exec -n lightrag "$OLLAMA_POD" -- ollama list
+```
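
The same check can be scripted against Ollama's HTTP API, whose `/api/tags` endpoint lists the locally available models. This is a sketch with hypothetical helper names. The base URL assumes you have port-forwarded the Ollama service to `localhost:11434`, and names without a tag are treated as `:latest`.

```python
import json
import urllib.request

# Models this playground pulls in step 2.
REQUIRED_MODELS = ("nomic-embed-text", "qwen2.5:3b")


def normalize(name: str) -> str:
    """Treat an untagged model name as the ':latest' tag."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_payload: dict, required=REQUIRED_MODELS) -> list:
    """Return the required models that are absent from an /api/tags response."""
    installed = {normalize(m["name"]) for m in tags_payload.get("models", [])}
    return [name for name in required if normalize(name) not in installed]


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 10.0) -> dict:
    """Fetch the model list from a reachable Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `missing_models(fetch_tags())` returns an empty list when both models are present. Any name it returns needs an `ollama pull`.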
@@ -0,0 +1,58 @@
+# LightRAG configuration for DocumentDB backend.
+# Update MONGO_URI with your DocumentDB connection string before deploying.
+replicaCount: 1
+
+image:
+  repository: ghcr.io/hkuds/lightrag
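+  # Floating tag. Consider pinning a release: the init-container patch targets fixed file paths inside the image.
+  tag: latest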
+  imagePullSecrets: []
+
+updateStrategy:
+  type: Recreate
+
+service:
+  type: ClusterIP
+  port: 9621
+
+resources:
+  limits:
+    cpu: 1000m
+    memory: 2Gi
+  requests:
+    cpu: 500m
+    memory: 1Gi
+
+persistence:
+  enabled: true
+  ragStorage:
+    size: 5Gi
+  inputs:
+    size: 2Gi
+
+env:
+  HOST: "0.0.0.0"
+  PORT: "9621"
+  WEBUI_TITLE: "LightRAG + DocumentDB"
+  WEBUI_DESCRIPTION: "Graph RAG backed by DocumentDB Kubernetes Operator"
+  # LLM - Ollama (in-cluster)
+  LLM_BINDING: ollama
+  LLM_MODEL: qwen2.5:3b
+  LLM_BINDING_HOST: "http://ollama.lightrag.svc.cluster.local:11434"
+  LLM_BINDING_API_KEY: ""
+  # Embedding - Ollama (in-cluster)
+  EMBEDDING_BINDING: ollama
+  EMBEDDING_MODEL: nomic-embed-text
+  EMBEDDING_DIM: "768"
+  EMBEDDING_BINDING_API_KEY: ""
+  # Storage - DocumentDB for KV/Graph/DocStatus, local for vectors
+  LIGHTRAG_KV_STORAGE: MongoKVStorage
+  LIGHTRAG_VECTOR_STORAGE: NanoVectorDBStorage
+  LIGHTRAG_GRAPH_STORAGE: MongoGraphStorage
+  LIGHTRAG_DOC_STATUS_STORAGE: MongoDocStatusStorage
+  # DocumentDB connection: replace the example credentials and host below with the values for your cluster
+  MONGO_URI: "mongodb://admin:MySecurePassword123@documentdb-service-my-cluster.documentdb-demo.svc.cluster.local:10260/?directConnection=true&authMechanism=SCRAM-SHA-256&tls=true&tlsAllowInvalidCertificates=true"
+  MONGO_DATABASE: "LightRAG"
+
+envFrom:
+  configmaps: []
+  secrets: []
@@ -0,0 +1,6 @@
+apiVersion: v2
+name: lightrag
+description: LightRAG graph-based RAG engine with a DocumentDB backend
+type: application
+version: 0.1.0
+appVersion: "1.4.13"
@@ -0,0 +1,20 @@
+LightRAG has been deployed with a DocumentDB backend.
+
+1. Get the application URL:
+{{- if eq .Values.service.type "ClusterIP" }}
+  kubectl port-forward svc/{{ include "lightrag.fullname" . }} -n {{ .Release.Namespace }} 9621:{{ .Values.service.port }}
+  Then visit: http://localhost:9621
+{{- end }}
+
+2. Check health:
+  curl http://localhost:9621/health
+
+3. Insert a document:
+  curl -X POST http://localhost:9621/documents/text \
+    -H "Content-Type: application/json" \
+    -d '{"text": "Your text here"}'
+
+4. Query the knowledge graph:
+  curl -X POST http://localhost:9621/query \
+    -H "Content-Type: application/json" \
+    -d '{"query": "Your question", "mode": "hybrid"}'
@@ -0,0 +1,27 @@
+{{- define "lightrag.name" -}}
+{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
+{{- end }}
+
+{{- define "lightrag.fullname" -}}
+{{- default .Release.Name .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
+{{- end }}
+
+{{- define "lightrag.labels" -}}
+app.kubernetes.io/name: {{ include "lightrag.name" . }}
+app.kubernetes.io/instance: {{ .Release.Name }}
+app.kubernetes.io/managed-by: {{ .Release.Service }}
+{{- end }}
+
+{{- define "lightrag.selectorLabels" -}}
+app.kubernetes.io/name: {{ include "lightrag.name" . }}
+app.kubernetes.io/instance: {{ .Release.Name }}
+{{- end }}
+
+{{- define "lightrag.envContent" -}}
+{{- $first := true -}}
+{{- range $key, $val := .Values.env -}}
+{{- if not $first -}}{{- "\n" -}}{{- end -}}
+{{- $first = false -}}
+{{ $key }}={{ $val }}
+{{- end -}}
+{{- end -}}
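
For readers less familiar with Go templates, the `lightrag.envContent` helper above renders `.Values.env` as `KEY=value` lines joined by newlines, with no trailing newline. A rough Python equivalent is sketched below. `render_env` is a hypothetical name, and the sort reflects the fact that Go's `range` visits map keys in sorted order.

```python
def render_env(env: dict) -> str:
    """Render a mapping as newline-separated KEY=value lines, keys sorted."""
    return "\n".join(f"{key}={value}" for key, value in sorted(env.items()))
```

Values are emitted verbatim without quoting, so a value containing a newline would break the one-line-per-variable layout.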