Add --missing-file-content option to handle broken symlinks in validate #1834
Open
yarikoptic-gitmate wants to merge 4 commits into master
When running `dandi validate` on a datalad dataset without fetched data, broken symlinks cause verbose exception tracebacks for every file (issue #1606). This adds a `--missing-file-content` option with three policies:

- error (default): emit a concise single-line error per broken symlink
- skip: skip the file entirely with a WARNING
- only-non-data: skip content-dependent validators (pynwb, nwbinspector) but still validate path layout

The `MissingFileContent` enum is threaded from the CLI through the validate pipeline to individual `DandiFile.get_validation_errors()` implementations.

Closes #1606

https://claude.ai/code/session_01CLi49c7QcJx11b7UfshbvE
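The three policies can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the policy strings `error`, `skip`, and `only-non-data` come from the description above, while the enum member names, the `is_broken_symlink` helper, and `plan_validation` with its action strings are assumptions made for the example.

```python
from __future__ import annotations

from enum import Enum
from pathlib import Path


class MissingFileContent(Enum):
    """Policies for files whose content is unavailable (values from the PR text)."""

    ERROR = "error"
    SKIP = "skip"
    ONLY_NON_DATA = "only-non-data"


def is_broken_symlink(path: Path) -> bool:
    """Return True if `path` is a symlink whose target does not exist."""
    # Path.exists() follows symlinks, so it is False for a dangling link.
    return path.is_symlink() and not path.exists()


def plan_validation(
    path: Path, policy: MissingFileContent
) -> tuple[str, str | None]:
    """Decide how to treat one file (hypothetical helper).

    Returns (action, message), where action is one of
    "validate-all", "error", "skip", or "layout-only".
    """
    if not is_broken_symlink(path):
        # Content is present: every validator can run.
        return "validate-all", None
    if policy is MissingFileContent.ERROR:
        # One concise line instead of a traceback per validator.
        return "error", f"{path}: file content is not available (broken symlink)"
    if policy is MissingFileContent.SKIP:
        return "skip", f"WARNING: skipping {path}: broken symlink"
    # ONLY_NON_DATA: run path-layout checks, skip content-reading validators.
    return "layout-only", None
```

Checking for the broken symlink first, before any validator runs, is what keeps content-reading validators from being invoked on a file they cannot open.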
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1834      +/-   ##
==========================================
+ Coverage   76.27%   76.46%   +0.18%
==========================================
  Files          87       87
  Lines       12484    12617     +133
==========================================
+ Hits         9522     9647     +125
- Misses       2962     2970       +8
When --missing-file-content=only-non-data is active, pass ignore_nifti_headers=True to bids_validate() so that the deno BIDS validator runs with --ignoreNiftiHeaders, skipping content-dependent NIfTI header checks while still validating BIDS layout and naming. https://claude.ai/code/session_01CLi49c7QcJx11b7UfshbvE
Add explicit `assert message is not None` before calling `.lower()` on ValidationResult.message, which is typed as `str | None`. https://claude.ai/code/session_01CLi49c7QcJx11b7UfshbvE
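The narrowing pattern described in this commit looks roughly like the sketch below. The `ValidationResult` class here is a stand-in dataclass with a single field, not dandi's actual class, and `mentions` is a hypothetical helper. Only the `message: str | None` typing and the `assert message is not None` check come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ValidationResult:
    """Stand-in for illustration; the real class has more fields."""

    message: str | None = None


def mentions(result: ValidationResult, needle: str) -> bool:
    """Case-insensitive check of whether the message mentions `needle`."""
    message = result.message
    # Narrow `str | None` to `str` so type checkers accept `.lower()`;
    # at runtime this fails loudly if a result unexpectedly has no message.
    assert message is not None
    return needle.lower() in message.lower()
```

Binding `result.message` to a local name before the assert matters for some type checkers, which narrow local variables more reliably than attribute accesses.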
Instead of passing --ignoreNiftiHeaders to bids-validator-deno (which would suppress header checks even for real files), run the validator in full and filter out content-dependent BIDS error codes (NIFTI_HEADER_UNREADABLE, EMPTY_FILE) only for broken-symlink files. This way real files still get full BIDS validation.

Empirical testing confirms bids-validator-deno reports NIFTI_HEADER_UNREADABLE (error) for broken symlinks while still validating path layout, sidecar JSON, and metadata correctly.

Also improves tests:

- All test dandisets now include at least one real NWB file alongside the broken symlinks
- New test_validate_broken_symlink_real_file_still_validated verifies that pynwb/nwbinspector results exist for the real file and are absent for the broken symlink under both skip and only-non-data
- CLI tests verify "sub-003" (real file) appears in output

https://claude.ai/code/session_01CLi49c7QcJx11b7UfshbvE
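The post-filtering idea can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: `BidsIssue` is a hypothetical, simplified record of one validator finding, and `filter_issues` is a hypothetical helper. Only the two error codes are taken from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Iterable

# Codes named in the commit message as content-dependent.
CONTENT_DEPENDENT_CODES = frozenset({"NIFTI_HEADER_UNREADABLE", "EMPTY_FILE"})


@dataclass(frozen=True)
class BidsIssue:
    """Hypothetical record of a single validator finding."""

    code: str
    path: Path


def filter_issues(
    issues: Iterable[BidsIssue], broken_symlinks: Iterable[Path]
) -> list[BidsIssue]:
    """Drop content-dependent findings, but only for broken-symlink files."""
    broken = set(broken_symlinks)
    return [
        issue
        for issue in issues
        # Keep everything except (content-dependent code AND broken symlink).
        if not (issue.code in CONTENT_DEPENDENT_CODES and issue.path in broken)
    ]
```

Because the filter requires both conditions, a real file with an unreadable NIfTI header is still reported, and a broken symlink with a naming or layout problem is still reported too.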
Summary

- `--missing-file-content` CLI option (error/skip/only-non-data) to `dandi validate` for graceful handling of broken symlinks in datalad datasets without fetched data
- `error` policy replaces verbose exception tracebacks with a concise single-line error per broken symlink
- `skip` policy skips broken symlink files entirely, emitting a WARNING per skipped file
- `only-non-data` policy skips content-dependent validators (pynwb, nwbinspector) but still validates path layout

Closes #1606

Problem

Running `dandi validate` on a datalad-cloned dataset without fetched data floods the screen with full Python tracebacks for every file. The tracebacks come from pynwb, nwbinspector, and other content-reading validators that fail on broken symlinks.

Solution

A `MissingFileContent` enum with three policies controls how broken symlinks (files whose content is unavailable) are handled:

- `error` (default)
- `skip`
- `only-non-data`

The parameter is threaded from the CLI through `validate()` → `DandiFile.get_validation_errors()`. Broken symlinks are detected via `_is_broken_symlink()` before any validator is invoked.

Changes

- `dandi/validate/_types.py`: New `MissingFileContent` enum
- `dandi/validate/_core.py`: Broken symlink detection and `_handle_missing_content()` helper
- `dandi/files/bases.py`: Updated `get_validation_errors()` signatures to accept `missing_file_content`; NWBAsset skips pynwb/nwbinspector for `only-non-data`
- `dandi/files/bids.py`, `dandi/files/zarr.py`: Updated signatures for consistency
- `dandi/cli/cmd_validate.py`: New `--missing-file-content` click option

Test plan

- `test_validate_broken_symlink_{error_default,skip,only_non_data}`
- `test_validate_missing_file_content_{error_default,skip,only_non_data,no_broken_symlinks}`
- `datalad clone https://github.com/dandisets/000027 && dandi validate --missing-file-content=skip 000027`

https://claude.ai/code/session_01CLi49c7QcJx11b7UfshbvE
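For readers who want to see how the option's string value could map onto the enum, here is a small dependency-free sketch. The PR wires the option up with click; this example uses the standard library's `argparse` instead so it runs without extra packages. The program name `validate-sketch`, the enum member names, and the `build_parser` and `parse_policy` helpers are all assumptions made for the illustration.

```python
from __future__ import annotations

import argparse
from enum import Enum


class MissingFileContent(Enum):
    """Policy values as listed in the PR description."""

    ERROR = "error"
    SKIP = "skip"
    ONLY_NON_DATA = "only-non-data"


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a --missing-file-content option (hypothetical)."""
    parser = argparse.ArgumentParser(prog="validate-sketch")
    parser.add_argument("paths", nargs="*", help="paths to validate")
    parser.add_argument(
        "--missing-file-content",
        choices=[policy.value for policy in MissingFileContent],
        default=MissingFileContent.ERROR.value,  # `error` is the default policy
        help="how to handle files whose content is unavailable",
    )
    return parser


def parse_policy(argv: list[str]) -> tuple[MissingFileContent, list[str]]:
    """Parse argv and convert the option string into the enum member."""
    args = build_parser().parse_args(argv)
    return MissingFileContent(args.missing_file_content), args.paths
```

Calling `parse_policy(["--missing-file-content=skip", "000027"])` mirrors the manual command in the test plan and yields the `SKIP` member together with the path list.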