Open
Labels
- priority:low (Nice to have, do when possible)
- size:l (Large, 1 to 2 days)
- status:ready (Refined and ready for sprint selection)
- type:feature (New functionality)
Description
Part of #68
Depends on #95 (format plugin architecture)
Add Apache Parquet as an input and output format. This requires finding a Zig-compatible Parquet library or C bindings (e.g., Apache Arrow C Data Interface or nanoarrow).
This is a size:l issue due to the library integration complexity. A spike may be needed first to confirm feasibility.
Acceptance Criteria
- `--input-format parquet` reads a Parquet file from stdin or the `--input` flag
- `--output-format parquet` writes results as Parquet to stdout or the `--output` flag
- Column types are preserved (integers, floats, strings, timestamps)
- Parquet schema is inferred from query result column types
- Clear error message when Parquet is requested but the Parquet library was not available at build time (optional feature flag)
- Tested with files generated by pandas, DuckDB, and Apache Spark
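
The type-preservation and schema-inference criteria could be pinned down as an explicit mapping before any library work starts. Below is a minimal sketch in Python, used only for illustration and not as the project's implementation language. The result-type names (`integer`, `float`, `string`, `timestamp`) and the helper `infer_parquet_schema` are hypothetical. The physical and logical type names follow the Parquet format specification, but the chosen widths (INT64, DOUBLE) and the microsecond timestamp unit are assumptions, and `TIMESTAMP(MICROS)` is just a shorthand label.

```python
# Hypothetical mapping from query result column types to Parquet types.
# Type names follow the Parquet format spec; the widths (INT64, DOUBLE)
# and the microsecond timestamp unit are assumptions made for this sketch.
PARQUET_TYPES = {
    "integer": ("INT64", None),
    "float": ("DOUBLE", None),
    "string": ("BYTE_ARRAY", "STRING"),
    "timestamp": ("INT64", "TIMESTAMP(MICROS)"),
}


def infer_parquet_schema(columns):
    """Map (name, result_type) pairs to (name, physical, logical) triples."""
    schema = []
    for name, result_type in columns:
        if result_type not in PARQUET_TYPES:
            # Fail loudly instead of silently coercing to a string column.
            raise ValueError(
                f"no Parquet mapping for column {name!r}: {result_type}"
            )
        physical, logical = PARQUET_TYPES[result_type]
        schema.append((name, physical, logical))
    return schema
```

Rejecting unmapped types keeps the "column types are preserved" criterion testable: a round trip either keeps the type or reports an error.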
Notes
- Investigate: nanoarrow (C library, small, permissive license), parquet-go (not relevant), or building from scratch
- Parquet is columnar — reading row-by-row may be inefficient; batch reads preferred
- May want to gate this behind a compile-time feature flag (`-Dparquet=true`) to avoid a mandatory C dependency
- Consider de-scoping to just Parquet input first (output is harder)
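
To illustrate the note on batch reads, here is a rough sketch of how a reader might hand decoded columns to a row-oriented consumer in chunks rather than one row at a time. It is written in Python for illustration only. The function `iter_row_batches` and the dict-of-lists column representation are hypothetical and stand in for whatever a Parquet library would return for a row group.

```python
def iter_row_batches(columns, batch_size):
    """Yield lists of row tuples from equal-length column arrays.

    `columns` maps column name to a list of values (a hypothetical
    stand-in for one decoded Parquet row group).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    names = list(columns)
    lengths = {len(columns[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("columns differ in length")
    total = lengths.pop() if lengths else 0
    for start in range(0, total, batch_size):
        stop = min(start + batch_size, total)
        # Slice each column once per batch, then transpose into rows.
        slices = [columns[name][start:stop] for name in names]
        yield list(zip(*slices))


if __name__ == "__main__":
    cols = {"id": [1, 2, 3], "name": ["a", "b", "c"]}
    for batch in iter_row_batches(cols, 2):
        print(batch)
```

Each column is sliced once per batch, so the per-row cost is limited to the final transpose.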