Reproducibility is the hallmark of the scientific method.
Modern science has become so complex that many projects rely on multiple software packages working in unison, producing networks of interdependent data products over the course of an analysis. Versioning and managing these data products is essential to making modern data- and computation-intensive science reproducible.
Motivated by the Event Horizon Telescope (EHT)'s
observational data calibration pipelines and theory data analysis
tools, hallmark is a lightweight package designed to version-control
and manage data products in complex workflows.
It provides a simple abstraction and a uniform Application Programming
Interface (API) on top of different backend technologies such as
POSIX file systems, object storage, Globus, iRODS, and streams.
By using hallmark with other packages such as yukon and
banyan in Project Laniakea, researchers can utilize
computing infrastructure on a global scale to accelerate their
science.
ParaFrame is a specialized subclass of pandas.DataFrame that
automatically extracts parameters encoded in file paths. When performing
large-scale parameter surveys or building simulation libraries, parameters
are often encoded directly in file naming schemes (e.g.,
Ma+0.94_i70/sed_Rh160.h5).
Features:
- Decodes file paths back to structured parameters using Python format strings
- Builds DataFrames with parsed parameters as columns
- Supports custom encodings via YAML configuration for complex parsing rules
- Provides intuitive filtering for parameter selection, which is easier than with pure pandas
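
The path-decoding idea in the list above can be illustrated with a minimal standard-library sketch. This is not hallmark's implementation or its API: the helpers `compile_pattern` and `decode` are hypothetical names introduced only for illustration, and the format string is an assumed naming scheme chosen to reproduce the example path Ma+0.94_i70/sed_Rh160.h5. The sketch turns a Python format string into a regular expression, recovers typed parameters from each matching path, and returns rows that could be passed to a DataFrame constructor.

```python
import re
from string import Formatter

# Hypothetical helpers for illustration only; not part of hallmark.
_CASTS = {"d": int, "f": float, "e": float, "g": float}


def compile_pattern(fmt):
    """Translate a Python format string into a regex plus per-field casts."""
    regex, casts = "", {}
    for literal, field, spec, _ in Formatter().parse(fmt):
        regex += re.escape(literal)
        if field is None:  # trailing literal text with no replacement field
            continue
        kind = spec[-1] if spec else "s"
        casts[field] = _CASTS.get(kind, str)
        if kind == "d":
            group = r"[+-]?\d+"
        elif kind in ("f", "e", "g"):
            group = r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?"
        else:
            group = r"[^/]+"  # strings: anything within one path component
        regex += f"(?P<{field}>{group})"
    return re.compile(regex + r"\Z"), casts


def decode(fmt, paths):
    """Return one dict per path that matches fmt, with typed parameters."""
    pattern, casts = compile_pattern(fmt)
    rows = []
    for path in paths:
        match = pattern.match(path)
        if match is None:
            continue  # skip paths that do not follow the naming scheme
        row = {"path": path}
        row.update({k: casts[k](v) for k, v in match.groupdict().items()})
        rows.append(row)
    return rows


# Assumed naming scheme, consistent with the example path above.
fmt = "Ma{a:+.2f}_i{i:d}/sed_Rh{Rh:d}.h5"
paths = [
    fmt.format(a=a, i=i, Rh=Rh)
    for a in (0.94, -0.5)
    for i in (30, 70)
    for Rh in (1, 160)
]
rows = decode(fmt, paths)

# Plain-Python stand-in for parameter filtering.
selected = [r for r in rows if r["a"] == 0.94 and r["i"] == 70]
```

Each entry of `rows` holds the original path plus one key per parsed parameter, so the list maps directly onto DataFrame columns. The sketch handles only simple field names and the `d`, `f`, `e`, `g`, and string format types.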
Examples of using ParaFrame with the Python API or the Command Line Interface (CLI)
can be found in the Jupyter Notebook tutorials in the demos folder.
Install hallmark from PyPI:
pip install hallmark
Or install from source for development purposes:
git clone https://github.com/l6a/hallmark.git
cd hallmark
pip install -e .