Add --missing-file-content option to handle broken symlinks in validate #1834

Open
yarikoptic-gitmate wants to merge 4 commits into master from claude/fix-dandi-cli-1606-zbG3e

@yarikoptic-gitmate
Collaborator

Summary

  • Adds --missing-file-content CLI option (error/skip/only-non-data) to dandi validate for graceful handling of broken symlinks in datalad datasets without fetched data
  • Default error policy replaces verbose exception tracebacks with a concise single-line error per broken symlink
  • skip policy skips broken symlink files entirely, emitting a WARNING per skipped file
  • only-non-data policy skips content-dependent validators (pynwb, nwbinspector) but still validates path layout

Closes #1606

Problem

Running dandi validate on a datalad-cloned dataset without fetched data floods the screen with a full Python traceback for every file. The tracebacks come from pynwb, nwbinspector, and other content-reading validators, all of which fail on broken symlinks.

Solution

A MissingFileContent enum with three policies controls how broken symlinks (files whose content is unavailable) are handled:

| Policy | Behavior | Severity |
| --- | --- | --- |
| error (default) | Emits a concise error with symlink target info | ERROR |
| skip | Skips the file, emits a warning | WARNING |
| only-non-data | Skips content validators, still checks path layout | WARNING |

The parameter is threaded from the CLI through validate() to DandiFile.get_validation_errors(). Broken symlinks are detected via _is_broken_symlink() before any validator is invoked.

Changes

  • dandi/validate/_types.py: New MissingFileContent enum
  • dandi/validate/_core.py: Broken symlink detection and _handle_missing_content() helper
  • dandi/files/bases.py: Updated get_validation_errors() signatures to accept missing_file_content; NWBAsset skips pynwb/nwbinspector for only-non-data
  • dandi/files/bids.py, dandi/files/zarr.py: Updated signatures for consistency
  • dandi/cli/cmd_validate.py: New --missing-file-content click option
  • 7 new tests covering all three policies (both core and CLI)

Test plan

  • 3 core tests: test_validate_broken_symlink_{error_default,skip,only_non_data}
  • 4 CLI tests: test_validate_missing_file_content_{error_default,skip,only_non_data,no_broken_symlinks}
  • All 110 existing validate tests pass (no regressions)
  • All 22 file tests pass (no regressions)
  • Pre-commit hooks (black, isort, flake8, codespell) pass
  • Can be manually tested with datalad clone https://github.com/dandisets/000027 && dandi validate --missing-file-content=skip 000027
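
For local experiments without cloning a dandiset, a fixture along these lines could reproduce the situation. This is a sketch under assumptions: the function name, directory layout, and link target are hypothetical, and the "real" file here is only a placeholder rather than a valid NWB file.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def make_dataset_with_broken_symlink(root: Path) -> tuple[Path, Path]:
    """Create one regular file and one dangling symlink under `root`.

    The dangling link imitates an annexed file whose content has not
    been fetched. All names are illustrative.
    """
    real = root / "sub-003" / "sub-003.nwb"
    real.parent.mkdir(parents=True)
    real.write_bytes(b"placeholder, not a valid NWB file")

    broken = root / "sub-001" / "sub-001.nwb"
    broken.parent.mkdir(parents=True)
    # The target is never created, so the link dangles.
    os.symlink(root / ".git" / "annex" / "objects" / "not-fetched", broken)
    return real, broken


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        real, broken = make_dataset_with_broken_symlink(Path(tmp))
        print(real.exists(), broken.is_symlink(), broken.exists())
```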

https://claude.ai/code/session_01CLi49c7QcJx11b7UfshbvE

When running `dandi validate` on a datalad dataset without fetched data,
broken symlinks cause verbose exception tracebacks for every file (issue #1606).

This adds a --missing-file-content option with three policies:
- error (default): emit a concise single-line error per broken symlink
- skip: skip the file entirely with a WARNING
- only-non-data: skip content-dependent validators (pynwb, nwbinspector)
  but still validate path layout

The MissingFileContent enum is threaded from the CLI through the validate
pipeline to individual DandiFile.get_validation_errors() implementations.

Closes #1606

https://claude.ai/code/session_01CLi49c7QcJx11b7UfshbvE

@codecov

codecov bot commented Apr 11, 2026

Codecov Report

❌ Patch coverage is 91.39073% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.46%. Comparing base (0af950f) to head (1809ce6).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| dandi/validate/_types.py | 0.00% | 7 Missing ⚠️ |
| dandi/files/bases.py | 77.27% | 5 Missing ⚠️ |
| dandi/validate/_core.py | 96.15% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1834      +/-   ##
==========================================
+ Coverage   76.27%   76.46%   +0.18%     
==========================================
  Files          87       87              
  Lines       12484    12617     +133     
==========================================
+ Hits         9522     9647     +125     
- Misses       2962     2970       +8     

| Flag | Coverage Δ |
| --- | --- |
| unittests | 76.46% <91.39%> (+0.18%) ⬆️ |

claude added 2 commits April 11, 2026 17:38
When --missing-file-content=only-non-data is active, pass
ignore_nifti_headers=True to bids_validate() so that the deno BIDS
validator runs with --ignoreNiftiHeaders, skipping content-dependent
NIfTI header checks while still validating BIDS layout and naming.

https://claude.ai/code/session_01CLi49c7QcJx11b7UfshbvE

Add explicit `assert message is not None` before calling `.lower()` on
ValidationResult.message, which is typed as `str | None`.

https://claude.ai/code/session_01CLi49c7QcJx11b7UfshbvE

yarikoptic added the minor label (Increment the minor version when merged) on Apr 11, 2026

Instead of passing --ignoreNiftiHeaders to bids-validator-deno (which
would suppress header checks even for real files), run the validator in
full and filter out content-dependent BIDS error codes
(NIFTI_HEADER_UNREADABLE, EMPTY_FILE) only for broken-symlink files.
This way real files still get full BIDS validation.

Empirical testing confirms bids-validator-deno reports
NIFTI_HEADER_UNREADABLE (error) for broken symlinks while still
validating path layout, sidecar JSON, and metadata correctly.
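
The filtering step described above could be sketched as follows. The two error codes are the ones named in this commit message; the Issue class and the drop_content_issues function are hypothetical stand-ins, since the actual result type and helper used in dandi are not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

# Codes named in the commit message as content-dependent.
CONTENT_DEPENDENT_CODES = frozenset({"NIFTI_HEADER_UNREADABLE", "EMPTY_FILE"})


@dataclass(frozen=True)
class Issue:
    """Hypothetical stand-in for one validator result."""

    code: str
    path: Path


def drop_content_issues(issues: list[Issue], broken: set[Path]) -> list[Issue]:
    """Drop content-dependent issues, but only for broken-symlink paths.

    Issues on real files, and layout issues on any file, are kept.
    """
    return [
        issue
        for issue in issues
        if not (issue.code in CONTENT_DEPENDENT_CODES and issue.path in broken)
    ]
```

Filtering after the run, rather than disabling the checks up front, is what lets a real file with an unreadable header still be reported.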

Also improves tests:
- All test dandisets now include at least one real NWB file alongside
  the broken symlinks
- New test_validate_broken_symlink_real_file_still_validated verifies
  that pynwb/nwbinspector results exist for the real file and are
  absent for the broken symlink under both skip and only-non-data
- CLI tests verify "sub-003" (real file) appears in output

https://claude.ai/code/session_01CLi49c7QcJx11b7UfshbvE

Labels

minor Increment the minor version when merged

Development

Successfully merging this pull request may close these issues.

validate on datalad dataset without data fetched -- floods the screen

3 participants