Code repo for the paper ProSec: Fortifying Code LLMs with Proactive Security Alignment.
The pipeline follows these stages:
```
Synthesize CWE-Inducing Instructions
                |
                v
     Generate Vulnerable Code          Generate Benign Code
                |                               |
                v                               |
Detect Vulnerabilities (Purple Llama)           |
                |                               |
                v                               |
     Generate Fixes & Re-detect                 |
                |                               |
                v                               v
          Mix Fixed Code with Benign Code
                |
                v
          Final Training Dataset
```
Clone the repository with its submodule:
```sh
git clone --recurse-submodules https://github.com/PurCL/ProSec.git
```

If you have already cloned without `--recurse-submodules`, fetch the submodule separately:

```sh
git submodule update --init --recursive
```

Requirements:

- Python 3
- The tested model must be hosted via vLLM behind an OpenAI-compatible API endpoint.
- PurCL's Purple Llama is included as a git submodule under `PurpleLlama/`.
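Since every generation step talks to the tested model through an OpenAI-compatible endpoint, a request is just a standard chat-completion JSON body. The sketch below only builds that body; the endpoint URL and model name are placeholders, and the real addresses live in `src/gen_inferences.py`.

```python
import json

# Hypothetical endpoint; the repo's scripts hold the real vLLM addresses.
VLLM_URL = "http://localhost:8000/v1/chat/completions"

def build_request(model: str, instruction: str, temperature: float = 0.6) -> str:
    """Serialize one inference request as the JSON body an
    OpenAI-compatible server (such as vLLM) expects."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": instruction}],
        "temperature": temperature,
    }
    return json.dumps(body)

# The resulting payload could be POSTed to VLLM_URL with any HTTP client.
payload = build_request("tested-model", "Write a C function that parses a URL.")
```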
Synthesize instructions for a single CWE-language pair:
```sh
./synth_claude.sh <CWE_ID> <LANG>
```

This generates instructions and clusters them to select 2000 per pair.
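The cluster-then-select step keeps the 2000 retained instructions diverse rather than taking the first 2000 generated. As a rough sketch of that idea, the function below round-robins across crude "clusters" (grouping by first word is only a stand-in; the actual pipeline clusters with embeddings):

```python
import itertools
from collections import defaultdict

def select_diverse(instructions, target, key=lambda s: s.split()[0].lower()):
    """Pick up to `target` instructions, round-robin across clusters.

    `key` is a toy clustering function for illustration; the real
    pipeline groups semantically similar instructions."""
    clusters = defaultdict(list)
    for inst in instructions:
        clusters[key(inst)].append(inst)
    selected = []
    # zip_longest interleaves clusters so no single cluster dominates
    for batch in itertools.zip_longest(*clusters.values()):
        for inst in batch:
            if inst is not None and len(selected) < target:
                selected.append(inst)
    return selected
```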
To synthesize for all CWE-language pairs at once:
```sh
./synth_all.sh
```

Note: Set the `HF_USER` environment variable to your HuggingFace username before running any scripts (e.g., `export HF_USER=your-hf-username`). Make sure to `mkdir` the output directory before running the script.
Generate vulnerable code for all CWE-language pairs using the tested model:
```sh
./infer_all_claude.sh
```

Note: Modify `src/gen_inferences.py` to specify the addresses of the hosted vLLM model.
Generate normal (non-vulnerable) code with the original instructions:
```sh
./infer_all_claude_ori_task.sh
```

Note: Host the tested model via vLLM and modify `src/gen_inferences.py` accordingly.
This step detects vulnerabilities, generates fixes, and pairs them up. It uses scripts from both this repo and the PurpleLlama/ submodule.
Create a symlink from the output directory of `infer_all_claude` to the `PurpleLlama/` directory, then merge the inference results:

```sh
python3 PurpleLlama/prosec_scripts/merge_multiple_infer_rets.py
```

Also merge the benign inference results to produce `infer-ret-original.jsonl`.

Note: You need to manually modify the merge script before running it.
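The merge amounts to concatenating per-(CWE, language) jsonl shards into one file. A minimal sketch of that operation (the `source_shard` field is an illustrative addition, not the merge script's actual schema):

```python
import json
from pathlib import Path

def merge_jsonl(shard_paths, out_path):
    """Concatenate jsonl shards into one file, tagging each record
    with the shard it came from."""
    with open(out_path, "w") as fout:
        for path in shard_paths:
            with open(path) as fin:
                for line in fin:
                    rec = json.loads(line)
                    rec["source_shard"] = Path(path).stem
                    fout.write(json.dumps(rec) + "\n")
```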
```sh
python3 PurpleLlama/prosec_scripts/detect_all.py
```

This produces `detection-ret.jsonl`.
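Downstream steps only need fix prompts for the generations that were actually flagged, so the detection output has to be partitioned. A minimal sketch (the `is_vulnerable` field name is an assumption; check `detection-ret.jsonl` for the schema the detector actually emits):

```python
def split_by_detection(records, flag_key="is_vulnerable"):
    """Partition detection records into flagged and clean groups."""
    flagged, clean = [], []
    for rec in records:
        (flagged if rec.get(flag_key) else clean).append(rec)
    return flagged, clean
```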
```sh
python3 src/gen_fix_inference_prompts.py \
    --fin detection-ret.jsonl \
    --fout-stats detection-ret.stats.json \
    --fout detection-ret.fix-prompt.jsonl
python3 src/gen_fix_inference.py \
    --prompts_in detection-ret.fix-prompt.jsonl \
    --fout detection-ret.fixed.jsonl
```

Note: Host the tested model and modify `src/gen_fix_inference.py`.
```sh
python3 PurpleLlama/prosec_scripts/detect_all_from_fixed.py
```

This produces `detection-ret-fixed.jsonl`.
```sh
python3 src/collect_and_upload_fixed_batch.py \
    --detection_ret detection-ret.jsonl \
    --fixed_detected_ret detection-ret-fixed.jsonl \
    --ds_name <name-of-the-dataset> \
    --fout <intermediate-results>
```

This produces a fix-pair dataset (e.g., `purcl/fix-dataset`).
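Conceptually, building the fix-pair dataset is a join: a snippet that was flagged in the first detection pass is paired with its fix only if the re-detection pass comes back clean. The field names (`id`, `code`, `is_vulnerable`) and the chosen/rejected framing below are illustrative assumptions; see `src/collect_and_upload_fixed_batch.py` for the real schema:

```python
import json

def build_fix_pairs(detection_path, fixed_detection_path):
    """Pair each flagged snippet with its fix that passed re-detection."""
    def load(path):
        recs = {}
        with open(path) as f:
            for line in f:
                rec = json.loads(line)
                recs[rec["id"]] = rec
        return recs

    originals = load(detection_path)
    fixed = load(fixed_detection_path)
    pairs = []
    for rid, orig in originals.items():
        fix = fixed.get(rid)
        # keep a pair only when the original was flagged and the fix came back clean
        if orig.get("is_vulnerable") and fix and not fix.get("is_vulnerable"):
            pairs.append({"rejected": orig["code"], "chosen": fix["code"]})
    return pairs
```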
Concatenate multiple CWE-inducing instruction datasets:
```sh
python3 src/concat_dataset.py
```

Note: You will need to manually modify this file. Suppose the output is `purcl/concat-dataset`.
Clean the benign data and mix with the fixed code:
```sh
python3 src/clean_benign_data.py --fin infer-ret-original.jsonl
python3 src/mix_and_upload_original_w_fixed_batch.py \
    --inst_ds_name purcl/concat-dataset \
    --fix_pair_ds_name purcl/fix-dataset \
    --infer_ori_in infer-ret-original-filtered.jsonl \
    --out_ds_name <output-dataset-name>
```

The `influence_score/` module provides tools for computing training dynamics and influence scores over synthesized datasets. These scores measure how individual training samples contribute to security alignment, enabling better data selection strategies. More detailed instructions will be published soon.
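To give a feel for what a training-dynamics signal looks like, the sketch below summarizes how a single sample's log-probability evolves across checkpoints: curves that rise steadily indicate the sample is being learned, while flat or falling curves suggest low influence. This is only the intuition behind the module, not the repo's actual metric:

```python
def training_dynamics(logprobs_per_checkpoint):
    """Summarize one sample's log-probability trajectory over checkpoints.

    Illustrative only; influence_score/ computes richer statistics."""
    deltas = [b - a for a, b in zip(logprobs_per_checkpoint,
                                    logprobs_per_checkpoint[1:])]
    return {
        "total_gain": logprobs_per_checkpoint[-1] - logprobs_per_checkpoint[0],
        "mean_delta": sum(deltas) / len(deltas),
    }
```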
Key components:
| Module | Description |
|---|---|
| `data_utils_refactored.py` | Entry point: prepares selection datasets from instruction, fix-pair, and benign data |
| `training_dynamics_refactored.py` | Collects log-probabilities and accuracy across training checkpoints |
| `sample_refactored.py` | Data selection strategies based on training-dynamics correlations |
| `scores.py` | Computes sequence-level log-probabilities and normalized scores |
| `collator.py` | Data collator for pairwise training data |
| `collect_grad_reps.py` | Gradient representation collection using TRAK for influence estimation |
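As a small illustration of the kind of quantity `scores.py` computes (not its exact formula), a sequence-level log-probability is the sum of per-token log-probabilities, and length-normalizing it makes long and short samples comparable:

```python
import math

def sequence_logprob(token_logprobs, length_normalize=True):
    """Sum per-token log-probabilities; optionally divide by length."""
    total = sum(token_logprobs)
    if length_normalize and token_logprobs:
        return total / len(token_logprobs)
    return total

def perplexity(token_logprobs):
    """Perplexity follows directly from the length-normalized score."""
    return math.exp(-sequence_logprob(token_logprobs))
```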