Chunked CSV reading for streaming mode #92

@vmvarela

Description

Part of #69
Depends on #90 (streaming spike)

Implement a streaming CSV reader that feeds data to SQLite in chunks rather than loading the entire file into memory at once. Add --stream and --chunk-size flags.

Acceptance Criteria

  • --stream flag enables chunked processing mode
  • --chunk-size <size> configures chunk size (default: 64MB, e.g. --chunk-size 128MB)
  • Simple queries (SELECT, WHERE, LIMIT) produce identical results to non-streaming mode
  • Memory usage stays bounded by chunk size, not input file size
  • Works correctly with piped stdin as well as file input
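A sketch of how the `--chunk-size` value might be parsed. The function name, accepted suffixes, and 64MB default are assumptions for illustration, not part of this issue's spec:

```python
import re

# Hypothetical helper for parsing --chunk-size values such as "64MB" or
# "128MB" into a byte count. Units and defaults here are assumptions.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

def parse_chunk_size(text, default=64 * 1024**2):
    """Parse a size string like '128MB' into bytes; fall back to the default."""
    if not text:
        return default
    m = re.fullmatch(r"(\d+)\s*([KMG]?B)?", text.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"invalid chunk size: {text!r}")
    number, unit = int(m.group(1)), (m.group(2) or "B").upper()
    return number * _UNITS[unit]
```

With this shape, `parse_chunk_size("128MB")` yields 128 × 1024² bytes, and a bare number is treated as bytes.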

Notes

  • Chunked reading means importing rows in batches and using SQLite transactions per chunk
  • Result correctness for aggregates/GROUP BY depends on whether all data fits in temp storage (see Disk-backed large dataset support via SQLite temp storage #91)
  • Start with a single-pass chunked insert, not a full virtual table implementation
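The single-pass chunked insert from the notes above could look roughly like this. This is a minimal sketch, not the implementation: the function name, table handling, and row-count chunking are assumptions (the real reader would bound chunks by bytes per `--chunk-size`, not rows):

```python
import csv
import sqlite3

# Sketch of a single-pass chunked CSV import: rows are buffered into
# batches and each batch is inserted inside one SQLite transaction,
# so memory usage is bounded by the chunk, not the input size.
def import_csv_chunked(conn, source, table="data", rows_per_chunk=10_000):
    reader = csv.reader(source)  # `source` can be a file object or sys.stdin
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= rows_per_chunk:
            with conn:  # one transaction per chunk
                conn.executemany(insert_sql, batch)
            batch.clear()
    if batch:  # flush the final partial chunk
        with conn:
            conn.executemany(insert_sql, batch)
```

Because `csv.reader` only needs an iterable of lines, the same path would cover piped stdin as well as file input.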

Metadata

Metadata

Assignees

No one assigned

Labels

  • priority:medium (Should be done soon)
  • size:m (Medium, 4 to 8 hours)
  • status:ready (Refined and ready for sprint selection)
  • type:feature (New functionality)