Structural comparability gate — know whether two CSV datasets can be compared before you waste time trying.
No AI. No inference. Pure deterministic checks.
```bash
brew install cmdrvl/tap/shape
```

The Problem: Before you can compare two CSV exports, you need to know if comparison is even meaningful. Do the columns match? Is the key unique? Did the schema drift? Finding out mid-analysis wastes time and produces misleading results.
The Solution: One structural gate. shape checks schema overlap, key viability, row granularity, and type consistency — then gives a deterministic verdict before you run any analysis.
| Feature | What It Does |
|---|---|
| Four structural checks | Schema overlap, key viability, row granularity, type consistency — all at once |
| Three clear outcomes | COMPATIBLE, INCOMPATIBLE, or REFUSAL — never ambiguous |
| Concrete reasons | When incompatible, tells you exactly what broke and why |
| Machine-readable | --json output for pipelines and CI gates |
| Pairs with rvl | Run shape first to validate structure, then rvl to explain numeric changes |
| Deterministic | Same inputs always produce the same output — no models, no heuristics |
| Ambient witness ledger | Every comparison is recorded for audit trails (opt-out with --no-witness) |
```text
$ shape nov.csv dec.csv --key loan_id

SHAPE
COMPATIBLE
Compared: nov.csv -> dec.csv
Key: loan_id (unique in both files)
Dialect(old): delimiter=, quote=" escape=none
Dialect(new): delimiter=, quote=" escape=none
Schema: 22 common / 22 total (100% overlap)
Key: loan_id — unique in both, coverage=1.0
Rows: 3,214 old / 3,201 new (13 removed, 0 added, 3,201 overlap)
Types: 12 numeric columns, 0 type shifts
```

All four checks pass. These files are structurally compatible — safe to proceed with rvl, compare, or verify.
```bash
# Gate a pipeline (shape before rvl):
$ shape nov.csv dec.csv --key loan_id --json > shape.json \
    && rvl nov.csv dec.csv --key loan_id --json > rvl.json

# Exit code only (for scripts):
$ shape old.csv new.csv > /dev/null 2>&1
$ echo $?  # 0 = compatible, 1 = incompatible, 2 = refused

# Machine-readable:
$ shape old.csv new.csv --json | jq '.checks.schema_overlap'
```

shape always produces exactly one of three outcomes. There are no partial results.
All structural checks pass. These datasets can be meaningfully compared.
```text
SHAPE
COMPATIBLE
Compared: nov.csv -> dec.csv
Key: loan_id (unique in both files)
Dialect(old): delimiter=, quote=" escape=none
Dialect(new): delimiter=, quote=" escape=none
Schema: 22 common / 22 total (100% overlap)
Key: loan_id — unique in both, coverage=1.0
Rows: 3,214 old / 3,201 new (13 removed, 0 added, 3,201 overlap)
Types: 12 numeric columns, 0 type shifts
```
How to read this:
- Schema — how many columns are shared between the two files.
- Key — whether the key column is unique and non-null in both files.
- Rows — row counts and key overlap (how many keys appear in both files).
- Types — whether any columns changed from numeric to non-numeric or vice versa.
One or more structural checks failed. The reasons field explains exactly what broke.
```text
SHAPE
INCOMPATIBLE
Compared: nov.csv -> dec.csv
Key: loan_id (unique in both files)
Dialect(old): delimiter=, quote=" escape=none
Dialect(new): delimiter=, quote=" escape=none
Schema: 15 common / 17 total (88% overlap)
  old_only: [retired_field]
  new_only: [new_field]
Key: loan_id — unique in both, coverage=1.0
Rows: 4,183 old / 4,201 new (33 removed, 51 added, 4,150 overlap)
Types: 12 numeric columns, 1 type shift
  balance: numeric -> non-numeric
Reasons:
  1. Type shift: balance changed from numeric to non-numeric
```
A refusal occurs when shape cannot parse or read the inputs. It always includes a concrete next step.
```text
SHAPE ERROR (E_EMPTY)
Compared: nov.csv -> dec.csv
Dialect(old): delimiter=, quote=" escape=none
One or both files empty (no data rows after header)
Next: provide non-empty datasets.
```
shape runs four independent structural checks. All must pass for COMPATIBLE.
Measures how many columns are shared between the two files.
- Pass condition: at least 1 common column (`overlap_ratio > 0`)
- Reports: `columns_common`, `columns_old_only`, `columns_new_only`, `overlap_ratio`
Checks whether the key column is suitable for row alignment.
- Pass condition: key is unique in both files with no nulls
- Only checked when `--key` is provided
- Reports: `key_column`, `unique_old`, `unique_new`, `coverage`
Reports row counts and key overlap. Does not gate — agents and policies interpret the counts.
- Always passes — informational only
- Reports: `rows_old`, `rows_new`, `key_overlap`, `keys_old_only`, `keys_new_only`
Checks whether any common columns changed type between files.
- Pass condition: no columns changed from numeric to non-numeric or vice versa
- Only checked on columns common to both files
- Reports: `numeric_columns`, `type_shifts`
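The documented pass conditions can be sketched in a few lines of Python. This is an illustrative re-implementation on in-memory rows, not shape's actual code; the function names are ours, and treating "parses as a float" as the numeric test is an assumption.

```python
# Illustrative sketch of shape's documented pass conditions, written
# against in-memory rows (lists of dicts). Not shape's implementation.

def schema_overlap(cols_old, cols_new):
    common = set(cols_old) & set(cols_new)
    total = set(cols_old) | set(cols_new)
    ratio = len(common) / len(total) if total else 0.0
    # Pass condition: at least one common column (overlap_ratio > 0).
    return {"overlap_ratio": ratio, "status": "pass" if ratio > 0 else "fail"}

def key_viability(rows_old, rows_new, key):
    def unique_nonnull(rows):
        vals = [r.get(key) for r in rows]
        return None not in vals and len(vals) == len(set(vals))
    # Pass condition: key is unique in both files with no nulls.
    ok = unique_nonnull(rows_old) and unique_nonnull(rows_new)
    return {"status": "pass" if ok else "fail"}

def is_numeric(value):
    # Assumption: "numeric" means the cell parses as a float.
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

def type_consistency(rows_old, rows_new, common_cols):
    shifts = []
    for col in common_cols:
        old_numeric = all(is_numeric(r[col]) for r in rows_old)
        new_numeric = all(is_numeric(r[col]) for r in rows_new)
        if old_numeric != new_numeric:  # numeric -> non-numeric or vice versa
            shifts.append(col)
    # Pass condition: no type shifts among common columns.
    return {"type_shifts": shifts, "status": "pass" if not shifts else "fail"}
```

For example, a `balance` column holding `"100.0"` in the old file and `"#REF!"` in the new one is flagged as a type shift, matching the INCOMPATIBLE sample above.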
| Capability | shape | Manual inspection | csvkit | pandas profiling |
|---|---|---|---|---|
| Schema overlap check | ✅ Automated | ❌ Eyeball headers | csvstat per-file | |
| Key uniqueness validation | ✅ Both files | ❌ Manual | | |
| Type shift detection | ✅ Cross-file | ❌ | ❌ | |
| Single deterministic verdict | ✅ | ❌ | ❌ | ❌ |
| Machine-readable output | ✅ `--json` | ❌ | ✅ | |
| Audit trail (witness ledger) | ✅ Built-in | ❌ | ❌ | ❌ |
| Setup time | ✅ `brew install` | N/A | | |
When to use shape:

- Before running `rvl` — validate structure first, then explain numeric changes
- Monthly reconciliation pipelines — catch schema drift before it corrupts results
- CI gate — fail fast if upstream changed the export format

When shape might not be ideal:

- You need content comparison (use `rvl` for that)
- You need data profiling (distributions, outliers) — use pandas or Great Expectations
- You're comparing non-CSV formats
```bash
# Homebrew
brew install cmdrvl/tap/shape
```

```bash
# Shell installer (Linux/macOS)
curl -fsSL https://raw.githubusercontent.com/cmdrvl/shape/main/scripts/install.sh | bash
```

```powershell
# PowerShell (Windows)
Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/cmdrvl/shape/main/scripts/install.ps1'))
```

```bash
# Build from source
cargo build --release
./target/release/shape --help
```

Prebuilt binaries are available for x86_64 and ARM64 on Linux, macOS, and Windows (x86_64). Each release includes SHA256 checksums, cosign signatures, and an SBOM.
```text
shape <old.csv> <new.csv> [OPTIONS]
```
| Flag | Type | Default | Description |
|---|---|---|---|
| `--key <column>` | string | (none) | Key column to check for alignment viability (uniqueness, coverage). |
| `--delimiter <delim>` | string | (auto-detect) | Force CSV delimiter for both files. See Delimiter. |
| `--json` | flag | `false` | Emit a single JSON object on stdout instead of human-readable output. |
| `--no-witness` | flag | `false` | Suppress ambient witness ledger recording for this compare run. |
| `--capsule-dir <path>` | path | (none) | Write deterministic repro capsule artifacts (manifest.json, copied inputs, rendered output) to this directory. |
| `--describe` | flag | `false` | Print the compiled-in operator.json to stdout and exit 0 without positional args. |
Reserved v0 flags (parsed for schema stability, not yet enforced at runtime)
| Flag | Type | Default | Description |
|---|---|---|---|
| `--profile <path>` | path | (none) | Profile for check scoping. |
| `--profile-id <id>` | string | (none) | Echoed as `profile_id` in JSON output. |
| `--lock <lockfile>` | path | (none) | Lock verification for inputs. |
| `--max-rows <n>` | integer | (unlimited) | Row-limit refusal. |
| `--max-bytes <n>` | integer | (unlimited) | Byte-limit refusal. |
| Code | Meaning |
|---|---|
| `0` | COMPATIBLE |
| `1` | INCOMPATIBLE |
| `2` | REFUSAL or CLI error |
| Mode | COMPATIBLE | INCOMPATIBLE | REFUSAL |
|---|---|---|---|
| Human (default) | stdout | stdout | stderr |
| `--json` | stdout | stdout | stdout |
In --json mode, stderr is reserved for process-level failures only (CLI parse errors, panics).
Use --capsule-dir to emit deterministic replay artifacts for a run without changing standard output behavior.
```bash
shape old.csv new.csv --key loan_id --json --no-witness --capsule-dir capsules/run-001
```

Generated layout:

```text
capsules/run-001/
  manifest.json
  inputs/old.csv
  inputs/new.csv
  outputs/report.txt
```

Replay from the capsule directory:

```bash
cd capsules/run-001
shape inputs/old.csv inputs/new.csv --key loan_id --json --no-witness
```

manifest.json also stores replay args and a shell command under `replay.argv` and `replay.shell`.
Each file's delimiter is detected independently. Candidate delimiters are evaluated in order: `,`, `\t`, `;`, `|`, `^`.
If detection is ambiguous (or the winner yields a single-column parse), shape refuses with `E_DIALECT` and provides an actionable `next_command`.

If the first line is exactly `sep=<char>`, that delimiter is used for that file and the `sep=` line is consumed (not treated as header data). `--delimiter` still overrides `sep=` when both are present.
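For intuition, a naive detector might score each candidate by whether it splits every sampled line into the same field count greater than one. This sketch is ours, ignores quoting entirely, and is not shape's actual algorithm; it only shows how ambiguity and single-column refusals arise.

```python
# Naive delimiter detection sketch (ignores quoting). shape's real
# detector is internal; this only illustrates how ambiguity can arise.
CANDIDATES = [",", "\t", ";", "|", "^"]

def detect_delimiter(sample_lines):
    viable = []
    for delim in CANDIDATES:
        field_counts = {line.count(delim) + 1 for line in sample_lines}
        # Consistent multi-column split across all sampled lines.
        if len(field_counts) == 1 and field_counts.pop() > 1:
            viable.append(delim)
    if len(viable) != 1:
        return None  # ambiguous or undetectable; shape would refuse with E_DIALECT
    return viable[0]
```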
Accepted values:
| Format | Examples |
|---|---|
| Named | `comma`, `tab`, `semicolon`, `pipe`, `caret` (case-insensitive) |
| Hex | `0x2c` (comma), `0x09` (tab) |
| Single ASCII char | `,`, `;`, `\|` |
Rules:

- Hex form must be exactly two digits after `0x`.
- Allowed bytes are ASCII, excluding `"` (0x22), `\r`, `\n`, NUL (0x00), and DEL (0x7f).
- Invalid values fail as CLI argument errors (exit `2`).
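A hypothetical re-implementation of these parsing rules, for illustration only; the names are ours, and only the rules come from this README.

```python
# Hypothetical re-implementation of the documented --delimiter rules.
NAMED = {"comma": ",", "tab": "\t", "semicolon": ";", "pipe": "|", "caret": "^"}
FORBIDDEN = {0x22, 0x0D, 0x0A, 0x00, 0x7F}  # "  \r  \n  NUL  DEL

def parse_delimiter(value):
    if value.lower() in NAMED:
        return NAMED[value.lower()]
    if value.lower().startswith("0x"):
        if len(value) != 4:
            raise ValueError("hex form must be exactly two digits after 0x")
        byte = int(value[2:], 16)
    elif len(value) == 1:
        byte = ord(value)
    else:
        raise ValueError(f"invalid delimiter: {value!r}")
    if byte > 0x7F or byte in FORBIDDEN:
        raise ValueError(f"byte 0x{byte:02x} is not an allowed delimiter")
    return chr(byte)
```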
Both shape and rvl are designed to be consumed by agents and pipelines, not just humans.
An agent can learn how to invoke shape without reading docs:
```bash
$ shape --describe | jq '.exit_codes'
{
  "0": { "meaning": "COMPATIBLE", "domain": "positive" },
  "1": { "meaning": "INCOMPATIBLE", "domain": "negative" },
  "2": { "meaning": "REFUSAL / CLI error", "domain": "error" }
}

$ shape --describe | jq '.pipeline'
{
  "upstream": [],
  "downstream": ["rvl", "compare", "verify", "assess"]
}
```

```bash
# 1. Structural gate
shape old.csv new.csv --key id --json > shape.json
if [ $? -ne 0 ]; then
  # INCOMPATIBLE or REFUSAL — read .reasons or .refusal for why
  jq '.reasons // .refusal' shape.json
  exit 1
fi

# 2. Numeric explanation (only if structurally compatible)
rvl old.csv new.csv --key id --json > rvl.json

# 3. Agent extracts the verdict
outcome=$(jq -r '.outcome' rvl.json)
if [ "$outcome" = "REAL_CHANGE" ]; then
  jq '.contributors[] | "\(.row_id).\(.column): \(.delta)"' rvl.json
fi
```

Everything an agent needs is in --json output: structured verdicts, exit codes for branching, and --describe for tool discovery.
Check if files are compatible (exit code only):

```bash
shape old.csv new.csv > /dev/null 2>&1
echo $?  # 0 = compatible, 1 = incompatible, 2 = refused
```

Extract schema overlap from JSON:

```bash
shape old.csv new.csv --json | jq '.checks.schema_overlap'
```

Get incompatibility reasons:

```bash
shape old.csv new.csv --json | jq '.reasons'
```

Gate a pipeline (shape before rvl):

```bash
shape nov.csv dec.csv --key loan_id --json > shape.json \
    && rvl nov.csv dec.csv --key loan_id --json > rvl.json
```

Every refusal includes the error code and a concrete next step.
| Code | Meaning | Next Step |
|---|---|---|
| `E_IO` | File read error | Check file path and permissions |
| `E_ENCODING` | Unsupported encoding (UTF-16/32 BOM or NUL bytes) | Convert/re-export as UTF-8 |
| `E_CSV_PARSE` | CSV parse failure | Re-export as standard RFC 4180 CSV |
| `E_EMPTY` | One or both files empty | Provide non-empty datasets |
| `E_HEADERS` | Missing header or duplicate headers | Fix headers or re-export |
| `E_DIALECT` | Delimiter ambiguous or undetectable | Use `--delimiter <delim>` |
Reserved refusal codes (defined for schema stability, not emitted in v0)
| Code | Meaning | Next Step |
|---|---|---|
| `E_AMBIGUOUS_PROFILE` | Both `--profile` and `--profile-id` provided | Provide exactly one profile selector |
| `E_INPUT_NOT_LOCKED` | Input not in any provided lockfile | Re-run with correct `--lock` or lock inputs first |
| `E_INPUT_DRIFT` | Input hash doesn't match locked member | Use the locked file; regenerate lock if expected |
| `E_TOO_LARGE` | Input exceeds `--max-rows` or `--max-bytes` | Increase limit or split input |
Your file has a header row but no data rows. Check that the export actually produced data:

```bash
wc -l old.csv new.csv
```

Your file uses an uncommon delimiter or has inconsistent field counts. Force the delimiter:

```bash
shape old.csv new.csv --delimiter pipe       # for |
shape old.csv new.csv --delimiter 0x09       # for tab
shape old.csv new.csv --delimiter semicolon  # for ;
```

Two or more columns share the same header name. Fix at the source, or rename duplicates before running shape.
Check for trailing whitespace, invisible characters, or encoding issues in key values. shape trims ASCII whitespace, but non-ASCII whitespace (e.g., NBSP) is preserved.
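A quick pre-flight scan for such characters can save a confusing debugging session. This helper is illustrative; the name and the list of invisible code points are ours.

```python
# Flag key-value characters that shape won't trim: non-ASCII
# whitespace (e.g. NBSP) and common invisible code points.
ASCII_WS = " \t\r\n\v\f"
INVISIBLES = {"\u200b", "\ufeff"}  # zero-width space, byte-order mark

def hidden_characters(value):
    return [ch for ch in value
            if (ch.isspace() and ch not in ASCII_WS) or ch in INVISIBLES]
```

Run it over the key column of both files before calling shape; any non-empty result explains a key mismatch between values that look identical on screen.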
A cell in the new file has a value that can't be parsed as a number (e.g., #REF!, a stray string, or locale-specific formatting). The type_shifts field in JSON shows exactly which columns changed.
| Limitation | Detail |
|---|---|
| Structural only | shape checks whether comparison is possible, not what changed. Use rvl for content diffs. |
| Two files only | No multi-file or directory comparison. |
| In-memory | Both files are loaded fully into memory. No streaming mode yet. |
| No column filtering | All common columns are checked. You can't exclude specific columns in v0. |
| No content sampling | shape doesn't look at data distributions or outliers — it checks structure only. |
| Profile/lock not enforced | --profile, --lock, --max-rows, --max-bytes are parsed but have no runtime effect in v0. |
It checks the shape of your data — schema, keys, row counts, types — before you compare content. If the shapes don't match, comparison is meaningless.
shape validates structure. rvl explains numeric changes. Run shape first to confirm the files are comparable, then rvl to see what actually changed. They share delimiter detection and refusal patterns.
Every shape comparison is appended to a local JSONL file (~/.epistemic/witness.jsonl, or $EPISTEMIC_WITNESS). This gives you an audit trail of every structural check. Suppress with --no-witness.
Yes, using witness subcommands. See Witness Subcommands below.
Yes. Exit codes (0/1/2) and --json output are designed for automation. Gate on exit code, or parse the JSON for richer assertions.
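As a sketch, a CI step might parse the JSON report and fail the build with the recorded reasons. The field semantics (outcome, reasons, refusal) follow the JSON output reference in this README; the function itself is ours.

```python
# Sketch of a CI gate over shape's JSON report.
import sys

def gate(report):
    if report["outcome"] == "COMPATIBLE":
        return True
    detail = report["reasons"] or report["refusal"]
    print(f"structural gate failed: {detail}", file=sys.stderr)
    return False

# Typical use:
#   import json
#   with open("shape.json") as f:
#       sys.exit(0 if gate(json.load(f)) else 1)
```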
Not supported. Convert to CSV first.
Witness Subcommands
shape records every comparison to an ambient witness ledger. You can query this ledger:
```bash
# Query by tool, date range, or outcome
shape witness query --tool shape --since 2026-01-01 --outcome COMPATIBLE --json

# Get the most recent comparison
shape witness last --json

# Count comparisons matching a filter
shape witness count --since 2026-02-01
```

```text
shape witness query [--tool <name>] [--since <iso8601>] [--until <iso8601>] \
    [--outcome <COMPATIBLE|INCOMPATIBLE|REFUSAL>] [--input-hash <substring>] \
    [--limit <n>] [--json]
shape witness last [--json]
shape witness count [--tool <name>] [--since <iso8601>] [--until <iso8601>] \
    [--outcome <COMPATIBLE|INCOMPATIBLE|REFUSAL>] [--input-hash <substring>] [--json]
```

| Code | Meaning |
|---|---|
| `0` | One or more matching records returned |
| `1` | No matches (or empty ledger for `last`) |
| `2` | CLI parse error or witness internal error |
- Default: `~/.epistemic/witness.jsonl`
- Override: set the `EPISTEMIC_WITNESS` environment variable
- Malformed ledger lines are skipped; valid lines continue to be processed.
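Because the ledger is plain JSONL, it can also be read directly. This sketch mirrors the documented skip-malformed-lines behavior and assumes nothing about the record fields; inspect your own ledger for the actual schema.

```python
# Read a witness ledger, skipping malformed lines as shape's own
# query commands do.
import json

def load_ledger(lines):
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # malformed line: skip it, keep processing
    return records

# Typical use:
#   import os
#   with open(os.path.expanduser("~/.epistemic/witness.jsonl")) as f:
#       records = load_ledger(f)
```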
JSON Output Reference
A single JSON object on stdout. If the process fails before domain evaluation (e.g., invalid CLI args), JSON may not be emitted.
- `checks` is `null` for REFUSAL.
- `reasons` is `[]` for COMPATIBLE, non-empty for INCOMPATIBLE, and `null` for REFUSAL.
- `refusal` is `null` unless outcome is REFUSAL.
- `profile_id` echoes `--profile-id` when provided, otherwise `null`.
- `profile_sha256` and `input_verification` are reserved v0 contract fields and remain `null` in current runtime behavior.
- `key_viability` is `null` when `--key` is not provided.
- `key_viability.unique_old`/`unique_new` are `null` if the key column is missing in that file.
- `key_viability.coverage` is `null` when key overlap is not computable.
- `row_granularity.key_overlap`/`keys_old_only`/`keys_new_only` are `null` when key metrics are unavailable.
Column names in JSON use unambiguous encoding:
- `u8:<string>` — valid UTF-8 with no ASCII control bytes (e.g., `u8:loan_id`)
- `hex:<hex-bytes>` — anything else (e.g., `hex:ff00ab`)
Same convention as rvl.
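Decoding the convention back to raw bytes is mechanical. The helper name here is ours; the prefixes are as documented.

```python
# Decode shape's column-name encoding back to raw bytes.
def decode_column_name(encoded):
    if encoded.startswith("u8:"):
        return encoded[3:].encode("utf-8")
    if encoded.startswith("hex:"):
        return bytes.fromhex(encoded[4:])
    raise ValueError(f"unknown column-name encoding: {encoded!r}")
```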
NTM Auto-Proceed (for multi-agent sessions)
If you run multi-agent sessions and want periodic proceed nudges:
```bash
scripts/ntm_proceed_ctl.sh start --session codex53-high
```

This feature is off by default. When started with defaults, it:
- Runs every `10m`
- Sends only during overnight hours (`20:00` to `08:00`, local time)
- Sends only if there are open or in-progress beads
Check/stop it:
```bash
scripts/ntm_proceed_ctl.sh status
scripts/ntm_proceed_ctl.sh stop
```

Useful overrides:
```bash
# Enable during daytime too
scripts/ntm_proceed_ctl.sh start --session codex53-high --mode always

# Custom overnight window and interval
scripts/ntm_proceed_ctl.sh start --session codex53-high --overnight-start 21 --overnight-end 7 --interval 15m
```

The full specification is docs/PLAN.md. This README covers everything needed to use the tool; the spec adds implementation details, edge-case definitions, and testing requirements.
For canonical release/signoff docs, start at docs/README.md.
```bash
cargo fmt --check
cargo clippy --all-targets -- -D warnings
cargo test
```
```jsonc
{
  "version": "shape.v0",
  "outcome": "COMPATIBLE",          // "COMPATIBLE" | "INCOMPATIBLE" | "REFUSAL"
  "profile_id": null,               // echoes --profile-id when provided
  "profile_sha256": null,           // reserved in v0 (currently null)
  "input_verification": null,       // reserved in v0 (currently null)
  "files": { "old": "nov.csv", "new": "dec.csv" },
  "checks": {
    "schema_overlap": {
      "status": "pass",             // "pass" | "fail"
      "columns_common": 15,
      "columns_old_only": ["retired_field"],
      "columns_new_only": ["new_field"],
      "overlap_ratio": 0.88
    },
    "key_viability": {
      "status": "pass",
      "key_column": "u8:loan_id",
      "found_old": true,
      "found_new": true,
      "unique_old": true,
      "unique_new": true,
      "coverage": 1.0
    },
    "row_granularity": {
      "status": "pass",
      "rows_old": 4183,
      "rows_new": 4201,
      "key_overlap": 4150,
      "keys_old_only": 33,
      "keys_new_only": 51
    },
    "type_consistency": {
      "status": "pass",
      "numeric_columns": 12,
      "type_shifts": []
    }
  },
  "reasons": [],                    // non-empty when INCOMPATIBLE
  "refusal": null                   // non-null when REFUSAL
}
```