Open
Labels
priority:medium (Should be done soon), size:s (Small, 1 to 4 hours), status:ready (Refined and ready for sprint selection), type:feature (New functionality)
Description
When exploring an unfamiliar dataset, users typically want to see a few rows alongside the inferred schema. Currently this requires two separate steps: one run with --columns for the schema and one running SELECT * FROM t LIMIT n for the rows. A single --sample <n> flag would combine both into one invocation designed for exploration.
Example
$ cat sales.csv | sql-pipe --sample 3
# Schema (5 columns, 42,317 rows estimated):
# id INTEGER
# region TEXT
# amount REAL
# date TEXT
# status TEXT
id,region,amount,date,status
1,North,1250.00,2024-01-15,paid
2,South,875.50,2024-01-16,pending
3,East,2100.75,2024-01-16,paid
Acceptance Criteria
- --sample <n> (default n=10 if the flag is given without a value, or require an explicit value; to be decided) prints a schema comment block to stderr followed by the first n data rows to stdout as CSV
- Schema block lists each column name and its inferred type, prefixed with # so it is ignored by downstream CSV parsers
- --sample implies --header (column names printed as first CSV row)
- --sample is mutually exclusive with --json; compatible with --delimiter / --tsv
- Exits after emitting n rows; does not need to read the entire input
- Documented in --help, README.md, and docs/sql-pipe.1.scd
- Tests: correct number of rows output, schema block present on stderr, early exit confirmed
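The flag rules in the criteria above could be modelled roughly as follows. This is only a Python sketch using argparse to illustrate the intended semantics, not the tool's actual argument parser. The names `build_parser`, `parse_args`, and the `sql-pipe-sketch` program name are hypothetical, and it assumes the "default n=10 when no value is given" option is the one chosen.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flag rules from the acceptance criteria."""
    parser = argparse.ArgumentParser(prog="sql-pipe-sketch")

    # --sample and --json cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    # nargs="?" with const=10 models "n defaults to 10 if the flag has no value".
    mode.add_argument("--sample", type=int, nargs="?", const=10,
                      default=None, metavar="n")
    mode.add_argument("--json", action="store_true")

    # These stay compatible with --sample.
    parser.add_argument("--header", action="store_true")
    parser.add_argument("--delimiter", default=",")
    parser.add_argument("--tsv", action="store_true")
    return parser


def parse_args(argv):
    args = build_parser().parse_args(argv)
    if args.sample is not None:
        # --sample implies --header.
        args.header = True
    return args


if __name__ == "__main__":
    print(parse_args(["--sample", "3", "--tsv"]))
```

One consideration for the open "decide" item: with an optional value, argparse treats the next non-flag argument as the value of --sample, so a positional argument placed directly after a bare --sample would be read as n. Requiring an explicit value avoids that ambiguity.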
Notes
- Type inference still reads up to 100 rows (or n, whichever is larger) before emitting output
- --sample is a read/explore mode, not a query mode; no SQL query argument required
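As a rough illustration of how the inference window and the early exit fit together, here is a minimal Python sketch. It is not the project's implementation: `infer_type`, `sample`, and `INFERENCE_ROWS` are hypothetical names, the INTEGER/REAL/TEXT rule is deliberately naive, and the estimated row count shown in the example header is omitted.

```python
import csv
import itertools
import sys

INFERENCE_ROWS = 100  # window size taken from the note above


def _parses(cast, values):
    """True if every value converts with `cast` (int or float)."""
    try:
        for v in values:
            cast(v)
    except ValueError:
        return False
    return True


def infer_type(values):
    """Naive inference; Python's int()/float() accept some forms a real tool may reject."""
    values = [v for v in values if v != ""]  # assumption: empty cells are ignored
    if values and _parses(int, values):
        return "INTEGER"
    if values and _parses(float, values):
        return "REAL"
    return "TEXT"


def sample(lines, n, out=sys.stdout, err=sys.stderr):
    """Write the schema block to `err` and the header plus first n rows to `out`."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input, nothing to show

    # Read only the inference window; the rest of the input is never consumed.
    window = list(itertools.islice(reader, max(INFERENCE_ROWS, n)))

    err.write(f"# Schema ({len(header)} columns):\n")
    for i, name in enumerate(header):
        column = [row[i] for row in window if i < len(row)]
        err.write(f"# {name} {infer_type(column)}\n")

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(window[:n])


if __name__ == "__main__":
    sample(sys.stdin, 3)
```

Because the reader is only advanced through `islice`, at most max(100, n) data rows are pulled from the input, which matches the "does not need to read the entire input" criterion while still giving inference a reasonable sample.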
Dependencies
Depends on #85 (--columns flag must exist so --sample can extend it to show sample rows alongside schema)