Open
Labels
enhancement (New feature or request)
Description
Summary
Design and propose a production-ready index schema migrator for RedisVL that supports drop-and-recreate migrations, data transformation/backfill, multi-index orchestration, and cluster-safe execution.
Level: Advanced
Current State
- RedisVL supports index create/delete/clear/load flows, but there is no first-class schema migration workflow.
- Schema changes that affect field shape (especially vector datatype/precision changes) can require scanning existing keys and rewriting payload fields before or during reindexing.
- Teams currently need ad hoc scripts for:
  - dropping/recreating indices,
  - transforming indexed documents,
  - handling large datasets in Redis Cluster environments.
Problem to Solve
How should RedisVL support production migrations when:
- one index needs multiple sequential schema updates,
- several related indices must migrate together,
- vector fields need precision/type conversion and data rewrite,
- migration must scale with large keyspaces and cluster slot constraints?
Proposed Change
Create a research + design issue that delivers a concrete migration proposal and implementation plan (not full implementation in this task):
- Migration model
  - Define migration spec format (source schema, target schema, transforms, batch size, safety options).
  - Support a baseline strategy: `drop -> transform/backfill -> recreate -> validate`.
  - Evaluate optional low-downtime strategy (`shadow index + cutover`) and document tradeoffs.
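To make the spec discussion concrete, here is a minimal sketch of what a migration spec object could look like. Everything in it (`MigrationSpec`, `BASELINE_STEPS`, `changed_fields`, the option names) is hypothetical and not part of RedisVL today; the schema dicts are only assumed to follow a `{"index": ..., "fields": [...]}` layout.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical names for illustration only; none of this is an existing RedisVL API.
Transform = Callable[[Dict[str, Any]], Dict[str, Any]]

# Baseline lifecycle named in this issue.
BASELINE_STEPS = ["drop", "transform/backfill", "recreate", "validate"]


@dataclass
class MigrationSpec:
    index_name: str
    source_schema: Dict[str, Any]
    target_schema: Dict[str, Any]
    transforms: List[Transform] = field(default_factory=list)
    batch_size: int = 500
    dry_run: bool = True  # safety option: preview only unless explicitly disabled

    def changed_fields(self) -> List[str]:
        """Names of fields that were added, removed, or redefined."""
        src = {f["name"]: f for f in self.source_schema.get("fields", [])}
        tgt = {f["name"]: f for f in self.target_schema.get("fields", [])}
        return sorted(n for n in src.keys() | tgt.keys() if src.get(n) != tgt.get(n))

    def plan(self) -> List[str]:
        """Ordered steps for the baseline strategy."""
        return list(BASELINE_STEPS)


def vector_field(datatype: str) -> Dict[str, Any]:
    # Assumed field layout: name, type, attrs.
    return {"name": "embedding", "type": "vector",
            "attrs": {"dims": 3, "algorithm": "flat", "datatype": datatype}}


spec = MigrationSpec(
    index_name="products",
    source_schema={"index": {"name": "products"}, "fields": [vector_field("float32")]},
    target_schema={"index": {"name": "products"}, "fields": [vector_field("float16")]},
)
```

Defaulting `dry_run` to `True` is one possible design choice: a spec would then never mutate data unless the caller opts in.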
- Data transformation/backfill
  - Define transform hooks for field-level rewrites (e.g., vector `float32 -> float16` conversions).
  - Specify how documents are scanned, transformed, and rewritten safely.
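As an illustration of a field-level transform hook, the sketch below re-encodes a vector blob from float32 to float16 using only the standard library. It assumes vectors are stored as raw little-endian bytes in a field called `embedding`; the function names and the field name are hypothetical.

```python
import struct
from typing import Dict


def float32_to_float16(blob: bytes) -> bytes:
    """Re-encode a little-endian float32 vector blob as float16."""
    if len(blob) % 4 != 0:
        raise ValueError("blob length is not a multiple of 4 bytes")
    n = len(blob) // 4
    values = struct.unpack(f"<{n}f", blob)
    # "e" is IEEE 754 half precision; struct raises OverflowError for values
    # too large to represent, so lossy overflow is surfaced, not hidden.
    return struct.pack(f"<{n}e", *values)


def convert_vector_field(doc: Dict[str, bytes], vector_field: str = "embedding") -> Dict[str, bytes]:
    """Hypothetical transform hook: return a copy of doc with the vector converted."""
    out = dict(doc)
    if vector_field in out:
        out[vector_field] = float32_to_float16(out[vector_field])
    return out


doc = {"title": b"widget", "embedding": struct.pack("<3f", 1.0, -2.5, 0.5)}
converted = convert_vector_field(doc)
```

The hook returns a new dict instead of mutating its input, so a migrator could compare the old and new payloads before writing anything back.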
- Multi-index + multi-step orchestration
  - Propose dependency-aware orchestration for N indices and ordered updates.
  - Include checkpointing/resume semantics for long-running jobs.
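One possible shape for dependency-aware ordering with resume is sketched below. It uses a topological sort over a hypothetical dependency map and a JSON checkpoint file that records completed indices; the function names, index names, and checkpoint format are all assumptions for illustration.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+
from pathlib import Path
from typing import Callable, Dict, List, Set


def migration_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order indices so each one runs after the indices it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def run_with_checkpoint(
    order: List[str],
    migrate_one: Callable[[str], None],
    checkpoint_path: str,
) -> List[str]:
    """Run migrations in order, skipping indices already recorded as done."""
    path = Path(checkpoint_path)
    done = set(json.loads(path.read_text())) if path.exists() else set()
    ran = []
    for name in order:
        if name in done:
            continue  # completed in a previous run
        migrate_one(name)  # an exception here leaves the checkpoint untouched
        done.add(name)
        path.write_text(json.dumps(sorted(done)))  # persist after each index
        ran.append(name)
    return ran


# Hypothetical indices: each key maps to the set of indices it depends on.
order = migration_order({
    "products": set(),
    "reviews": {"products"},
    "recommendations": {"products", "reviews"},
})
```

This checkpoint is per index only. Resuming inside a single large index would need a finer-grained marker as well, for example the last processed key or scan position.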
- Production/cluster scalability
  - Batch execution and bounded memory guarantees.
  - Slot-aware operations and cross-slot-safe deletion/update behavior.
  - Throughput controls (concurrency limits, backpressure, retry policy).
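To illustrate slot-aware batching, the sketch below computes Redis Cluster hash slots client-side (CRC16 of the key, or of its hash tag, modulo 16384) and groups keys so that each batch touches a single slot. The helper names are hypothetical; a real migrator might instead rely on the cluster client's own slot routing.

```python
import binascii
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

NUM_SLOTS = 16384


def key_slot(key: str) -> int:
    """Hash slot for a key, honouring the {hash tag} rule."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag replaces the full key
            data = data[start + 1:end]
    # crc_hqx with initial value 0 is the CRC16 variant (XMODEM) used by Redis Cluster.
    return binascii.crc_hqx(data, 0) % NUM_SLOTS


def batched_by_slot(keys: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield batches in which every key maps to the same slot."""
    pending: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        slot = key_slot(key)
        pending[slot].append(key)
        if len(pending[slot]) >= batch_size:
            yield pending.pop(slot)  # flush a full single-slot batch
    for bucket in pending.values():
        yield bucket  # flush partially filled batches at the end


keys = ["{user1}:a", "{user1}:b", "{user1}:c", "doc:1", "doc:2"]
batches = list(batched_by_slot(keys, batch_size=2))
```

Buffered memory here is bounded by roughly `NUM_SLOTS * batch_size` keys in the worst case. A tighter bound would require flushing partial batches on a timer or a total-size cap.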
- Safety and observability
  - Dry-run mode with migration plan preview and impact estimates.
  - Validation checks before/after migration (doc counts, schema checks, sample query checks).
  - Structured progress metrics/logging and failure recovery guidance.
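The before/after validation checks could be expressed as a pure function over two snapshots, as in the rough sketch below. `IndexSnapshot` and `validate_migration` are hypothetical names; in practice the document count might come from the `num_docs` value reported by `FT.INFO`, and sample query checks would be added alongside.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexSnapshot:
    """Hypothetical summary of an index taken before or after a migration."""
    num_docs: int
    field_types: Dict[str, str]  # field name -> datatype, e.g. "float16"


def validate_migration(
    before: IndexSnapshot,
    after: IndexSnapshot,
    expected_field_types: Dict[str, str],
) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if after.num_docs != before.num_docs:
        problems.append(f"doc count changed: {before.num_docs} -> {after.num_docs}")
    for name, expected in expected_field_types.items():
        actual = after.field_types.get(name)
        if actual != expected:
            problems.append(f"field {name!r}: expected {expected}, found {actual}")
    return problems


before = IndexSnapshot(num_docs=1000, field_types={"embedding": "float32"})
after = IndexSnapshot(num_docs=998, field_types={"embedding": "float16"})
problems = validate_migration(before, after, {"embedding": "float16"})
```

Returning a list of problems instead of raising on the first failure lets a dry run or a post-migration report show every discrepancy at once.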
Definition of Done
- A design doc (or RFC-style markdown) exists in-repo with:
  - architecture and migration lifecycle,
  - execution strategies (baseline + optional low-downtime),
  - multi-index orchestration approach,
  - cluster-scaling strategy,
  - failure/rollback recommendations.
- Includes a proposed API surface (Python + optional CLI hooks) and phased implementation plan.
- Includes a breakdown into follow-up implementation issues sized for hackathon teams.
Out of Scope
- Full end-to-end migrator implementation in this issue.
- Guaranteeing zero-downtime for all migration types.