Add bt datasets command for managing remote datasets#104

Open
Parker Henderson (parkerhendo) wants to merge 4 commits into main from parkerhendo/datasets-e2e
Conversation

Contributor

Parker Henderson (parkerhendo) commented Apr 9, 2026

TL;DR

Added a new bt datasets command for managing remote Braintrust datasets with full CRUD operations and data synchronization capabilities.

What changed?

Added comprehensive dataset management functionality:

  • New bt datasets command with subcommands: list, create, upload/add/append, refresh, view, and delete
  • Dataset operations support multiple input methods: files (--file), inline JSON (--rows), or stdin
  • Refresh functionality with deterministic upsert by record ID and optional pruning of stale records
  • Record processing handles nested ID fields (e.g., --id-field metadata.case_id) and validates dataset row structure
  • API integration with BTQL queries for fetching dataset rows and Logs3 batch uploader for efficient data submission
  • Interactive features including dataset selection, confirmation prompts, and browser opening
  • Project context resolution extracted into reusable module for consistent project handling across commands
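The refresh behaviour described in the list above (upsert keyed by record ID, nested ID fields, optional pruning) can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR: `get_by_path` and `plan_refresh` are hypothetical names, and the assumption is that a refresh compares local record IDs against the IDs already present in the remote dataset.

```python
from typing import Any, Optional


def get_by_path(record: dict, dotted_path: str) -> Optional[Any]:
    """Follow a dotted path such as 'metadata.case_id' through nested dicts."""
    current: Any = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def plan_refresh(local_rows, remote_ids, id_field="id", prune=False):
    """Return (rows_to_upsert, ids_to_delete) for a refresh.

    Hypothetical helper: rows are keyed by the value found at id_field, so
    running it twice on the same input produces the same plan.
    """
    upserts = {}
    for row in local_rows:
        record_id = get_by_path(row, id_field)
        if record_id is None:
            raise ValueError(f"row is missing id field {id_field!r}: {row}")
        # A later row with the same ID replaces an earlier one.
        upserts[str(record_id)] = row
    # Only remote IDs that are absent from the local input count as stale.
    stale = sorted(set(remote_ids) - set(upserts)) if prune else []
    return upserts, stale
```

With `prune=False` the stale list is always empty, so remote records that are missing from the input are left alone.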

How to test?

Run the dataset commands:

# List datasets
bt datasets list

# Create and seed a dataset
bt datasets create my-dataset --file records.jsonl
cat records.jsonl | bt datasets create my-dataset
bt datasets create my-dataset --rows '[{"id":"case-1","input":{"text":"hi"},"expected":"hello"}]'

# Add more records
bt datasets add my-dataset --file more-records.jsonl

# Refresh with pruning
bt datasets refresh my-dataset --file records.jsonl --id-field metadata.case_id --prune

# View dataset
bt datasets view my-dataset --verbose

# Delete dataset
bt datasets delete my-dataset

The test suite includes comprehensive fixtures testing various input methods and operations.
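The commands above accept rows either as JSONL (one object per line, from a file or stdin) or as an inline JSON array. A minimal Python sketch of how the two shapes could be normalized into one list of rows is shown here; `parse_rows` is a hypothetical helper, and detecting the format from a leading `[` is an assumption for illustration, not necessarily what the CLI does.

```python
import json


def parse_rows(text: str) -> list:
    """Parse rows from either a JSON array or JSONL text."""
    stripped = text.strip()
    if not stripped:
        return []
    if stripped.startswith("["):
        # Inline form, e.g. the value passed to --rows.
        rows = json.loads(stripped)
    else:
        # JSONL form: one JSON object per non-empty line.
        rows = [json.loads(line) for line in stripped.splitlines() if line.strip()]
    for row in rows:
        if not isinstance(row, dict):
            raise ValueError(f"each row must be a JSON object, got: {row!r}")
    return rows
```

For the stdin case, the same function could be fed `sys.stdin.read()`.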

Why make this change?

This lets users manage Braintrust datasets directly from the CLI without requiring local sync artifacts. It provides a streamlined workflow for creating, updating, and maintaining datasets, with deterministic refresh operations and flexible input formats.

Parker Henderson (parkerhendo) changed the title from "feat: add remote dataset management commands" to "Add bt datasets command for managing remote datasets" on Apr 9, 2026

github-actions bot commented Apr 9, 2026

Latest downloadable build artifacts for this PR commit d29c4bc762a0:

Available artifact names
  • artifacts-build-global
  • artifacts-build-local-x86_64-pc-windows-msvc
  • artifacts-build-local-x86_64-apple-darwin
  • artifacts-build-local-x86_64-unknown-linux-gnu
  • artifacts-build-local-aarch64-apple-darwin
  • artifacts-build-local-x86_64-unknown-linux-musl
  • artifacts-build-local-aarch64-unknown-linux-musl
  • artifacts-build-local-aarch64-unknown-linux-gnu
  • artifacts-plan-dist-manifest
  • cargo-dist-cache

@ankrgyl
Contributor

  • What if I want to upload records without an id?
  • If I upload something which contains extraneous fields, they get silently ignored:
Ankurs-MacBook-Pro:~/projects/braintrust-cli ankur$ cat test.json
{"foo": "bar", "id": 1}

just uploads the id

I think the PR description is a bit out of date. There's no refresh command, I think. But I also like that (I was about to ask, why would we have one!?)
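One way to make the extraneous-field case above visible rather than silent is to separate recognized fields from unrecognized ones and report the latter. The sketch below is in Python for illustration only; `split_row` is a hypothetical helper, and `KNOWN_FIELDS` is an assumed list based on the fields that appear in this PR's examples (`id`, `input`, `expected`, `metadata`), which may not match the real schema.

```python
# Assumed set of recognized top-level row fields; the real list may differ.
KNOWN_FIELDS = {"id", "input", "expected", "metadata"}


def split_row(row: dict):
    """Return (known_fields, extra_keys) so a caller can warn or fail on extras."""
    known = {k: v for k, v in row.items() if k in KNOWN_FIELDS}
    extra = sorted(k for k in row if k not in KNOWN_FIELDS)
    return known, extra


# The row from the comment above: "foo" is reported instead of dropped quietly.
known, extra = split_row({"foo": "bar", "id": 1})
```

A caller could then print a warning listing `extra`, or reject the upload outright; which is preferable is a design decision for the PR.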
