Add --sample <n> flag for quick data preview with schema #89

@vmvarela

Description

When exploring an unfamiliar dataset, users typically want to see a few rows alongside the inferred schema. Currently this takes two separate steps: --columns for the schema, then SELECT * FROM t LIMIT n for the rows. A single --sample <n> flag would combine both into one invocation designed for exploration.

Example

$ cat sales.csv | sql-pipe --sample 3
# Schema (5 columns, 42,317 rows estimated):
#   id       INTEGER
#   region   TEXT
#   amount   REAL
#   date     TEXT
#   status   TEXT

id,region,amount,date,status
1,North,1250.00,2024-01-15,paid
2,South,875.50,2024-01-16,pending
3,East,2100.75,2024-01-16,paid

Acceptance Criteria

  • --sample <n> prints a schema comment block to stderr, followed by the first n data rows to stdout as CSV. Open decision: default to n=10 when the flag is given without a value, or require an explicit value
  • Schema block lists each column name and its inferred type, prefixed with # so it is ignored by downstream CSV parsers
  • --sample implies --header (column names printed as first CSV row)
  • --sample is mutually exclusive with --json; compatible with --delimiter / --tsv
  • Exits after emitting n rows — does not need to read the entire input
  • Documented in --help, README.md, and docs/sql-pipe.1.scd
  • Tests: correct number of rows output, schema block present on stderr, early exit confirmed
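
The flag rules in the criteria above can be sketched with Python's argparse. This is only an illustration of the intended semantics, not the project's actual argument parser. The `build_parser` and `parse` names are hypothetical, and the sketch assumes the "default n=10 when no value is given" option is the one chosen.

```python
import argparse


def build_parser():
    """Hypothetical parser modelling the --sample rules from this issue."""
    parser = argparse.ArgumentParser(prog="sql-pipe")
    # The query is optional here because sample mode does not need one.
    parser.add_argument("query", nargs="?")
    parser.add_argument("--header", action="store_true")
    parser.add_argument("--delimiter", default=",")
    parser.add_argument("--tsv", action="store_true")
    # argparse rejects any command line that combines --sample with --json.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--json", action="store_true")
    # nargs="?" with const=10: a bare --sample means n=10 (assumed choice).
    mode.add_argument("--sample", nargs="?", type=int, const=10,
                      default=None, metavar="n")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.sample is not None:
        if args.sample < 0:
            parser.error("--sample expects a non-negative row count")
        args.header = True  # --sample implies --header
    elif args.query is None:
        parser.error("a SQL query is required unless --sample is given")
    if args.tsv:
        args.delimiter = "\t"  # --tsv stays compatible with --sample
    return args
```

With this shape, `parse(["--sample", "3"])` yields `sample=3` with `header` switched on, and `parse(["--sample", "3", "--json"])` exits with a usage error.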

Notes

  • Type inference still reads up to 100 rows (or n, whichever is larger) before emitting output
  • --sample is a read/explore mode, not a query mode — no SQL query argument required
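
A minimal Python sketch of the behaviour described above: buffer at most max(n, 100) rows, infer a type per column from that window, write the schema block to stderr and the header plus n rows to stdout. The `sample` and `infer_type` names are hypothetical, and the inference rules (INTEGER, then REAL, else TEXT) are a simplifying assumption rather than sql-pipe's actual logic. The row-count estimate from the example is omitted because this sketch never reads past the window.

```python
import csv
import io
import itertools
import sys

INFERENCE_ROWS = 100  # inference window size, taken from the note above


def infer_type(values):
    """Return INTEGER, REAL, or TEXT for a column's sampled string values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "TEXT"
    for name, cast in (("INTEGER", int), ("REAL", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "TEXT"


def sample(lines, n, out, err):
    """Write the schema block to err and the header plus n rows to out."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if not header:
        return
    # Pull at most max(n, 100) rows; the rest of the input is never consumed.
    window = list(itertools.islice(reader, max(n, INFERENCE_ROWS)))
    err.write(f"# Schema ({len(header)} columns):\n")
    width = max(len(name) for name in header)
    for i, name in enumerate(header):
        column = [row[i] for row in window if i < len(row)]
        err.write(f"#   {name.ljust(width)}   {infer_type(column)}\n")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(window[:n])


if __name__ == "__main__":
    data = io.StringIO("id,region,amount\n1,North,1250.00\n2,South,875.50\n")
    sample(data, 1, sys.stdout, sys.stderr)
```

Because `itertools.islice` stops pulling from the reader once the window is full, the early-exit criterion can be tested by feeding a generator and counting how many lines were consumed.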

Dependencies

Depends on #85 (--columns flag must exist so --sample can extend it to show sample rows alongside schema)

Metadata

Labels

  • priority:medium (should be done soon)
  • size:s (small, 1 to 4 hours)
  • status:ready (refined and ready for sprint selection)
  • type:feature (new functionality)
