Status: Open
Labels: priority:medium (should be done soon), size:s (small, 1 to 4 hours), status:ready (refined and ready for sprint selection), type:spike (research or investigation, timeboxed)
Description
Part of #69
Investigate two approaches for handling large CSV files that don't fit in memory:
- SQLite virtual table — streaming CSV input via a virtual table interface
- Disk-backed temp storage (PRAGMA temp_store = FILE) — configure SQLite to spill temporary data to disk automatically
Produce a written recommendation (implementation notes, trade-offs, estimated effort) to guide sub-issues 2–5.
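The disk-backed approach can be prototyped with nothing but the Python standard library. A minimal sketch for the spike — the table name, chunk size, and cache cap below are illustrative assumptions, not something specified in this issue:

```python
import csv
import itertools
import sqlite3

def import_csv_streaming(db_path, csv_path, table="rows", chunk_size=10_000):
    """Stream a large CSV into SQLite in fixed-size chunks.

    Only `chunk_size` rows are held in memory at a time; temp_store=FILE
    makes SQLite spill sorts and temp indexes to disk instead of RAM.
    """
    conn = sqlite3.connect(db_path)
    # Spill temporary structures to disk rather than keeping them in memory.
    conn.execute("PRAGMA temp_store = FILE")
    # Cap the page cache (negative value = size in KiB, so roughly 16 MB here).
    conn.execute("PRAGMA cache_size = -16000")
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        while True:
            chunk = list(itertools.islice(reader, chunk_size))
            if not chunk:
                break
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', chunk
            )
        conn.commit()
    return conn
```

Because both PRAGMAs are per-connection settings, queries run against the resulting database are unchanged — which is the "no query-semantic changes" property the notes below call out.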
Acceptance Criteria
- Both approaches prototyped or evaluated with a 1GB+ test file
- Recommendation written as a comment on this issue: which approach to implement first and why
- Memory usage measured for each approach
- Known limitations documented (e.g. which SQL operations won't work)
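For the "memory usage measured" criterion, peak resident set size is a simple proxy that also captures SQLite's C-level allocations, which Python-level tools like tracemalloc miss. A sketch using the POSIX-only resource module, assuming the spike runs on Linux or macOS:

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process, in MiB.

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS,
    so normalize per platform before comparing runs.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return rss / (1024 * 1024)
    return rss / 1024
```

Record the value once before and once after each prototype's import of the 1GB+ test file; the difference (plus the absolute peak) gives a comparable per-approach number for the recommendation comment.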
Notes
- The disk-backed approach may deliver 80% of the value with 20% of the complexity
- Start with PRAGMA temp_store = FILE since it requires no query-semantic changes
- Timebox to 4 hours max