l6a/hallmark

Reproducibility is the hallmark of the scientific method.

Modern science has become so complex that many projects rely on multiple software packages working in unison, producing networks of interdependent data products over the course of an analysis. Versioning and managing these data products is essential to making modern data- and computation-intensive science reproducible.

Motivated by the Event Horizon Telescope (EHT)'s observational data calibration pipelines and theory data analysis tools, hallmark is a lightweight package designed to version-control and manage data products in a complex workflow. It provides a simple abstraction and a uniform Application Programming Interface (API) on top of different backend technologies such as POSIX file systems, object storage, Globus, iRODS, streams, etc. By using hallmark with other packages such as yukon and banyan in Project Laniakea, researchers can utilize computing infrastructure on a global scale to accelerate their science.

ParaFrame

ParaFrame is a specialized subclass of pandas.DataFrame that automatically extracts parameters encoded in file paths. When performing large-scale parameter surveys or building simulation libraries, parameters are often encoded directly in file naming schemes (e.g., Ma+0.94_i70/sed_Rh160.h5).

Features:

  • Decodes file paths back to structured parameters using Python format strings
  • Builds DataFrames with parsed parameters as columns
  • Supports custom encodings via YAML configuration for complex parsing rules
  • Provides intuitive filtering for parameter selection, easier than with pure pandas
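
To make the decoding idea concrete, here is a minimal standard-library sketch of how a Python format string can be inverted to recover parameters from file paths. It is not hallmark's implementation and does not use the ParaFrame API: the helpers `compile_format` and `decode_paths`, the format string, and the parameter names `a`, `i`, and `Rh` are all assumptions made up for illustration, guessed from the example path above.

```python
import re
from string import Formatter

# Hypothetical mapping from a format type character to (regex, converter).
# Only a few common cases are covered; anything else is kept as a string.
TYPE_PATTERNS = {
    "d": (r"[+-]?\d+", int),
    "f": (r"[+-]?\d+(?:\.\d+)?", float),
    "": (r"[^/]+", str),
}


def compile_format(fmt):
    """Turn a format string into a regex with one named group per field."""
    pattern = ""
    casts = {}
    for literal, name, spec, _ in Formatter().parse(fmt):
        pattern += re.escape(literal)  # literal text must match exactly
        if name is not None:
            kind = spec[-1:] if spec else ""  # trailing type character
            regex, cast = TYPE_PATTERNS.get(kind, TYPE_PATTERNS[""])
            pattern += f"(?P<{name}>{regex})"
            casts[name] = cast
    return re.compile(pattern), casts


def decode_paths(fmt, paths):
    """Return one dict per matching path, holding the path and its parameters."""
    regex, casts = compile_format(fmt)
    rows = []
    for path in paths:
        match = regex.fullmatch(path)
        if match is None:
            continue  # skip paths that do not follow the naming scheme
        row = {"path": path}
        for name, text in match.groupdict().items():
            row[name] = casts[name](text)
        rows.append(row)
    return rows


# Assumed format string and parameter names, for illustration only.
fmt = "Ma{a:+.2f}_i{i:d}/sed_Rh{Rh:d}.h5"
paths = [
    "Ma+0.94_i70/sed_Rh160.h5",
    "Ma-0.50_i30/sed_Rh40.h5",
    "README.md",
]
rows = decode_paths(fmt, paths)

# Plain-Python selection on a decoded parameter.
selected = [row for row in rows if row["i"] == 70]
```

Passing `rows` to `pandas.DataFrame(rows)` would give a table with one column per decoded parameter, which is the kind of structure the feature list describes. The same format string also encodes parameters back into a path via `fmt.format(a=0.94, i=70, Rh=160)`, which is why a single string can serve both directions.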

Tutorial

Examples of using ParaFrame through the Python API or the Command Line Interface (CLI) can be found in the Jupyter Notebook tutorials in the demos folder.

Installation

Install hallmark from PyPI:

pip install hallmark

Or install from source for development purposes:

git clone https://github.com/l6a/hallmark.git
cd hallmark
pip install -e .
