First, create an isolated environment. Replace yourname with your preferred environment name (e.g., sim_env).
python -m venv yourname
Activate the environment:
Bash:
source ./yourname/bin/activate
Install the required packages using the provided requirements files. Ensure you provide the correct path to your files.
For Cascade Computing (CC):
pip install -r ./cascade_computing/requirements_cc.txt
Create a kernel that you can use inside JupyterLab with your environment:
python -m ipykernel install --user --name=yourname --display-name "cc_kernel"
JupyterLab runs in your local browser:
jupyter lab &
or you can direct it to a port on your HPC system:
nohup jupyter notebook --no-browser --port=8008 &
In the second case you need to log onto your HPC system with the same port forwarded: ssh -L 8008:localhost:8008 yourHPC
Check if it's running using:
jupyter notebook list
To get an idea of our contact evaluation, you can play around with a small contact table and plot lifetimes as well as frequencies. The dataset does not include a full system, so keep in mind that some amino-acid pairs are not included.
Download the corresponding dataset (MUT16_FFR_atm_analysis_part.zip) from https://zenodo.org/uploads/19239689 and unpack the folder into the examples directory, then open and run:
examples/minimal_example_MUT16_FFR.ipynb
You might have to set the path to your cascade_computing directory. The example shows you what the results of the analysis can look like:
- what the contact record includes
- how the contact frequencies and lifetimes are saved
- how to evaluate the corresponding distributions
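As a rough illustration of what such an evaluation involves, here is a minimal sketch that computes contact frequencies and lifetimes from a tiny hand-made contact table. The column names (`resA`, `resB`, `frame`) and the table layout are assumptions for this sketch, not the actual record format produced by cascade_computing:

```python
import pandas as pd

# Hypothetical contact table: one row per observed contact per frame.
# The real contact record may use different column names and layout.
contacts = pd.DataFrame({
    'resA':  ['ARG5', 'ARG5', 'ARG5', 'GLU9', 'GLU9'],
    'resB':  ['ASP12', 'ASP12', 'ASP12', 'LYS3', 'LYS3'],
    'frame': [0, 1, 2, 0, 2],
})
n_frames = 3

# Contact frequency: fraction of frames in which a pair is in contact
freq = (contacts.groupby(['resA', 'resB'])['frame']
                .nunique()
                .div(n_frames)
                .rename('frequency'))

def lifetime(frames):
    """Longest run of consecutive frames a contact persists."""
    frames = sorted(frames)
    best = run = 1
    for a, b in zip(frames, frames[1:]):
        run = run + 1 if b == a + 1 else 1
        best = max(best, run)
    return best

lifetimes = (contacts.groupby(['resA', 'resB'])['frame']
                     .apply(lambda s: lifetime(s.tolist()))
                     .rename('lifetime'))
print(freq)
print(lifetimes)
```

From series like these, the distributions shown in the example notebook can be plotted directly, e.g. with `freq.plot.hist()`.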
Download the corresponding dataset (data.zip) from https://zenodo.org/uploads/19239689 and unpack the folder into the examples directory. There are two test scripts included that you can use to get insight into the contact calculations. The first one,
tests/unit_test.ipynb, shows contact calculations and runs a small example that also calculates specific bond interactions. The second one,
tests/debug_test.ipynb, allows you to verify against pre-computed data in case you modify or extend your version of the code.
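The general pattern behind such a verification is a regression check against stored reference data. The sketch below is illustrative only; the array contents, file names, and tolerances are assumptions, not the actual contents of `tests/debug_test.ipynb`:

```python
import numpy as np

# Hypothetical reference data, e.g. loaded via np.load("reference_contacts.npy")
reference = np.array([12, 0, 3, 7])
# Output of your (possibly modified) pipeline for the same input
recomputed = np.array([12, 0, 3, 7])

# Exact match for integer counts; for float quantities use
# np.allclose(reference, recomputed, atol=1e-8) instead
assert np.array_equal(reference, recomputed), "contact counts diverge from reference"
print("regression check passed")
```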
Open `setup_analysis.ipynb`, then start by configuring your local paths; the notebook will guide you through the setup.
- Repository Path: Set `path_git` to your local clone of the `cascade_computing` repository.
- Project Name: Set `p_name` to a unique identifier (e.g., `'MUT16_MUT8'`). This creates a dedicated workspace for your project.
To analyze your system, you must specify the proteins and their structural properties.
For every protein in your system, create a configuration dictionary including:
- `prot`: The name of the protein.
- `file`: Path to the full-length PDB file.
- `orig_na`: Total amino acids in the original full-length protein.
- `cut_na`: Number of amino acids in the simulated fragment.
- `min`/`max`: The residue indices in the full-length protein defining your simulated fragment.
If your proteins contain specific domains (e.g., NTERM), define them using boundary columns. Use DOM_MIN and DOM_MAX to define the start and end residues relative to the full sequence.
Example Configuration:
domains = ['NTERM', 'FULL']
mut16 = {
'prot': "MUT16",
'orig_na': 2204,
'cut_na': 140,
'min': 633,
'max': 772,
'NTERM_MIN': 633,
'NTERM_MAX': 700,
'file': f"{path_input}/{filenameA}.pdb"
}
The notebook aggregates these dictionaries into a df_domains table. This serves as the primary metadata source, allowing the notebook to automatically extract amino-acid sequences and map analysis results to specific domains.
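The following sketch shows how such dictionaries could be aggregated into a `df_domains`-style table and how the `*_MIN`/`*_MAX` boundary columns let you map a full-length residue index to its domains. The helper `residue_domain` is a hypothetical illustration, not a function from the notebook:

```python
import pandas as pd

domains = ['NTERM', 'FULL']

# Example protein dictionary from above (the 'file' entry is omitted here)
mut16 = {
    'prot': "MUT16", 'orig_na': 2204, 'cut_na': 140,
    'min': 633, 'max': 772,
    'NTERM_MIN': 633, 'NTERM_MAX': 700,
}

# Aggregate all per-protein dictionaries into one metadata table
df_domains = pd.DataFrame([mut16])

def residue_domain(df, prot, resid):
    """Return the domains a full-length residue index falls into (hypothetical helper)."""
    row = df[df['prot'] == prot].iloc[0]
    hits = []
    for dom in domains:
        # 'FULL' falls back to the fragment boundaries min/max
        lo = row.get(f'{dom}_MIN', row['min'] if dom == 'FULL' else None)
        hi = row.get(f'{dom}_MAX', row['max'] if dom == 'FULL' else None)
        if lo is not None and lo <= resid <= hi:
            hits.append(dom)
    return hits

print(residue_domain(df_domains, 'MUT16', 650))  # → ['NTERM', 'FULL']
print(residue_domain(df_domains, 'MUT16', 750))  # → ['FULL']
```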
The notebook connects to the database created during setup. It looks for a signac.rc file in your directory to identify the project.
import signac
project = signac.get_project()
You can filter for specific proteins or concentrations to limit the scope of your operations:
# Select all jobs
for job in project:
    pass

# Filter for a specific protein
for job in project.find_jobs({'prot': 'MUT16'}):
    pass

Use the `project.run()` method. This is the preferred execution method, as it handles environment variables and logging automatically.
Example: Running a transformation
project.run(
names=['transform'],
jobs=[job],
progress=True,
num_passes=1,
order="by-job"
)
Parameter Breakdown:
- `names`: List of operation names to execute (e.g., `['transform', 'contacts']`).
- `jobs`: A list containing the specific job object(s) to run.
- `progress`: Set to `True` for a progress bar.
- `order="by-job"`: Executes all operations for one job before moving to the next.
The project is configured with the following operations:
- `post_processing`: Sets up the analysis framework.
- `transform`: Trajectory transformation (centering, PBC wrapping).
- `contacts`: Computing residue-residue contact maps.
- `eval_contacts`: Contact evaluation and creation of the contact record.
- `analysis`: Downstream analysis, e.g. pivot tables from the contact record.
- `visualization`: Generating plots.
Note on Modifications: To modify these functions, edit the source file located at:
`cascade_computing/src/compute/signac/sgnc.py`