Fix/full znh5md support by PythonFZ · Pull Request #16 · zincware/asebytes

PythonFZ · 2026-03-10T19:08:09Z

Summary by CodeRabbit

New Features
- Better detection and support for H5MD content inside .h5 files for more reliable format selection.
- Improved per-atom data detection for richer atom attribute handling.
Performance
- Faster and more reliable path resolution via a new path-caching mechanism.
Tests
- Added comprehensive round-trip interoperability tests to validate cross-format fidelity.

Parametrized tests over all s22_* fixtures verify: - znh5md write -> asebytes read (catches H5MD path mapping bug with per-atom calc results like energies, stresses, magmoms) - asebytes write -> znh5md read - Full bidirectional round-trip - *.h5 extension with H5MD content (catches registry routing bug) 9 tests fail RED, exposing two bugs: 1. H5MDStore path mapping loses actual H5MD paths on round-trip 2. Registry sends *.h5 H5MD files to RaggedColumnarBackend Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add _is_h5md_file() helper that checks for "h5md" group in HDF5 files - Insert sniffing logic in resolve_backend to redirect *.h5 files containing H5MD metadata to the H5MD backend Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…roperties Properties like energies, stresses, magmoms placed by znh5md in /particles/ were lost on read because _column_to_h5() statically mapped them to /observables/. Now _walk_elements() caches the actual H5 path during discovery, and _get_ds() checks the cache first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

In _discover(), columns from list_arrays() are known to exist, so the has_array() check was redundant and broke per-atom detection for columns whose _column_to_h5() mapping disagreed with the actual H5 path (e.g. per-atom energies placed in /particles/ by znh5md). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-03-10T19:08:24Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d1d0871b-1b05-436d-b81d-f35bc0350c08

📥 Commits

Reviewing files that changed from the base of the PR and between 5d95065 and 20ce0a0.

📒 Files selected for processing (1)

tests/contract/test_znh5md_roundtrip.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/contract/test_znh5md_roundtrip.py

📝 Walkthrough

Walkthrough

Detect H5MD metadata inside .h5 files to select H5MD-specific backends, add a path cache mapping column names to HDF5 paths, change per-atom detection in the H5MD backend discovery to use HDF5 path/ndim, and add bidirectional znh5md ↔ asebytes round-trip tests.

Changes

Cohort / File(s)	Summary
Registry: H5MD sniffing `src/asebytes/_registry.py`	Added internal `_is_h5md_file(path)` that tries to open the file with `h5py` and checks for an `h5md` key. `resolve_backend()` now, for object-layer no-scheme URIs with `.h5` candidates, uses this sniff to switch candidates to `.h5md` when H5MD content is detected.
H5MD backend discovery `src/asebytes/h5md/_backend.py`	Adjusted `_discover()` per-atom detection: infer per-atom status from HDF5 path prefix (`/particles/`) and `ndim >= 2` (excluding `cell` and `pbc`) instead of relying on `has_array()` and shape[1] checks.
H5MD store path caching & resolution `src/asebytes/h5md/_store.py`	Added `path_cache` mapping column ↔ HDF5 path, introduced `_resolve_h5_path(key)` to consult the cache before `_column_to_h5`, updated `_get_ds()` and `has_array()` to use the resolver, and populate `path_cache` during `_walk_elements()` when discovering datasets.
Contract tests: znh5md interoperability `tests/contract/test_znh5md_roundtrip.py`	New tests exercising znh5md ↔ asebytes round-trips across fixtures: read znh5md with ASEIO, write with ASEIO and verify via znh5md.IO, and full bidirectional roundtrip checks (positions, counts, numbers) across frames.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Registry as resolve_backend()
    participant Detector as _is_h5md_file()
    participant h5py
    participant Backend as BackendCandidates

    User->>Registry: resolve(path, layer, patterns)
    activate Registry
    note over Registry: If no URI scheme and layer == "object"\nand pattern candidates include *.h5
    Registry->>Detector: _is_h5md_file(path)
    activate Detector
    Detector->>h5py: open(path)
    alt HDF5 with h5md key
        h5py-->>Detector: returns file with 'h5md'
        Detector-->>Registry: True
    else Not HDF5 or no h5md / error
        h5py-->>Detector: exception or no key
        Detector-->>Registry: False
    end
    deactivate Detector

    alt H5MD detected
        Registry->>Backend: select *.h5md candidates
        Backend-->>Registry: H5MD-specific candidates
    else keep original
        Backend-->>Registry: original candidates
    end
    Registry-->>User: resolved backend
    deactivate Registry

sequenceDiagram
    participant Discovery as _walk_elements()
    participant PathCache as path_cache
    participant Store as _get_ds()
    participant Resolver as _column_to_h5()

    Discovery->>Discovery: traverse HDF5 groups
    loop for each discovered dataset group
        Discovery->>Store: compute h5_path
        Store->>PathCache: cache mapping (column ← h5_path)
    end

    Store->>PathCache: lookup(column)
    alt cache hit
        PathCache-->>Store: return cached h5_path
    else cache miss
        PathCache-->>Store: miss
        Store->>Resolver: _column_to_h5(column)
        Resolver-->>Store: h5_path
    end
    Store-->>Caller: dataset at h5_path

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I sniffed the H5 with eager nose,

found h5md where the metadata grows,
cached the paths and mapped each name,
round-trips hop back—numbers match the same,
a little rabbit cheers the interoperable game!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Fix/full znh5md support' accurately reflects the main objective of the pull request, which adds comprehensive H5MD support including H5MD detection, improved per-atom column detection, path caching, and interoperability tests with znh5md.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/full-znh5md-support

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/asebytes/h5md/_store.py`:
- Line 111: The path cache (_path_cache) is only used by _get_ds(), causing
has_array() and other probes using _column_to_h5() to miss previously discovered
foreign paths; centralize path resolution into a single helper (e.g.,
_resolve_column_path or update _column_to_h5) that looks up and populates
self._path_cache for a given column name and always returns the actual HDF5
path, then change _get_ds(), has_array(), and any other callers (references
around _get_ds, has_array, and the blocks at lines ~218-225 and ~452-456) to
call this resolved-path helper instead of directly using the old _column_to_h5()
logic so existence checks and reads use the same cached path. Ensure the helper
is thread-safe relative to _path_cache mutation if applicable.

In `@tests/contract/test_znh5md_roundtrip.py`:
- Around line 75-83: The test currently slices result to n before comparing,
which masks leaks of padded atoms; change the checks to first assert len(result)
== n (e.g., using assert len(result) == n or
np.testing.assert_equal(len(result), n)) and then compare the full arrays (call
np.testing.assert_allclose(result.positions, expected.positions, ...) and
np.testing.assert_array_equal(result.numbers, expected.numbers, ...)) instead of
result[:n]; make the same update for the analogous block later in the file where
positions/numbers are compared.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 28b39aca-79f9-476b-8536-f9948807373d

📥 Commits

Reviewing files that changed from the base of the PR and between cd81592 and 229b6ba.

📒 Files selected for processing (4)

src/asebytes/_registry.py
src/asebytes/h5md/_backend.py
src/asebytes/h5md/_store.py
tests/contract/test_znh5md_roundtrip.py

src/asebytes/h5md/_backend.py

src/asebytes/h5md/_store.py

tests/contract/test_znh5md_roundtrip.py

- Use H5 path (/particles/ vs /observables/) for per-atom detection instead of shape heuristic that misclassified frame-level vectors - Centralize path resolution with _resolve_h5_path helper used by both _get_ds and has_array to fix foreign column lookups - Remove test slicing ([:n]) that masked potential padding leaks and add explicit atom count assertions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace weak positions/numbers-only assertions in TestAsebytesToZnH5MD and TestBidirectionalRoundtrip with assert_atoms_equal to catch regressions in cell, pbc, info, arrays, and calc properties. Merge TestH5ExtensionWithH5MDContent into TestZnH5MDToAsebytes as a parametrized extension (.h5md/.h5) to remove duplication. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

PythonFZ and others added 4 commits March 10, 2026 19:30

coderabbitai bot reviewed Mar 10, 2026

View reviewed changes

src/asebytes/h5md/_backend.py Outdated Show resolved Hide resolved

src/asebytes/h5md/_store.py Show resolved Hide resolved

tests/contract/test_znh5md_roundtrip.py Outdated Show resolved Hide resolved

PythonFZ and others added 2 commits March 10, 2026 21:02

PythonFZ merged commit 79c1b63 into main Mar 10, 2026
5 checks passed

PythonFZ deleted the fix/full-znh5md-support branch March 10, 2026 21:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/full znh5md support#16

Fix/full znh5md support#16
PythonFZ merged 6 commits intomainfrom
fix/full-znh5md-support

PythonFZ commented Mar 10, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 10, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

PythonFZ commented Mar 10, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

PythonFZ commented Mar 10, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 10, 2026 •

edited

Loading