Large language models produce repetitive output when prompted independently across many batches — a phenomenon we term cross-batch mode collapse. We introduce Dynamic Context Evolution (DCE), comprising three mechanisms: (1) verbalized tail sampling, which filters high-probability candidates via model self-assessment; (2) semantic memory, which maintains a persistent embedding index to reject near-duplicates across batches; and (3) adaptive prompt evolution, which reconstructs the generation prompt each batch using memory state and rotating diversity strategies. DCE achieves 0.0% collapse versus 5.6% for naive prompting across three domains and two model families, at ~$0.50 per 1,000 candidates using only standard API calls.
Each batch: the generator produces candidates → verbalized tail sampling (VTS) filters out obvious ideas (self-assessed probability > 0.10) → semantic memory rejects near-duplicates (cosine similarity > 0.85 against previously accepted items) → prompt evolution rewrites the next prompt using memory state and a rotating diversity strategy.
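The per-batch flow above can be sketched in a few lines of Python. This is an illustrative sketch only, not the repository's implementation: the names `SemanticMemory`, `filter_batch`, `build_next_prompt`, and the entries in `STRATEGIES` are hypothetical, and candidates are assumed to arrive already paired with a self-assessed probability and an embedding vector. Only the two thresholds (0.10 and 0.85) come from the description above.

```python
import math
from dataclasses import dataclass, field

VTS_MAX_PROBABILITY = 0.10   # from the text: drop candidates self-assessed above this
DEDUP_MAX_SIMILARITY = 0.85  # from the text: drop candidates more similar than this

# Hypothetical diversity strategies; the real list is not given in this README.
STRATEGIES = [
    "Approach the task from an unusual perspective.",
    "Combine two unrelated sub-topics.",
    "Target an audience that is rarely considered.",
]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class SemanticMemory:
    """Persistent store of accepted items; a linear scan stands in for a real index."""
    texts: list = field(default_factory=list)
    embeddings: list = field(default_factory=list)

    def is_near_duplicate(self, embedding):
        return any(
            cosine_similarity(embedding, seen) > DEDUP_MAX_SIMILARITY
            for seen in self.embeddings
        )

    def add(self, text, embedding):
        self.texts.append(text)
        self.embeddings.append(embedding)


def filter_batch(candidates, memory):
    """Apply the VTS filter, then the semantic-memory filter.

    `candidates` is an iterable of (text, self_assessed_probability, embedding).
    Accepted items are added to `memory` so later batches are checked against them.
    """
    accepted = []
    for text, probability, embedding in candidates:
        if probability > VTS_MAX_PROBABILITY:
            continue  # too obvious according to the model's own estimate
        if memory.is_near_duplicate(embedding):
            continue  # too close to something already accepted
        memory.add(text, embedding)
        accepted.append(text)
    return accepted


def build_next_prompt(base_prompt, memory, batch_index, max_examples=3):
    """Rebuild the prompt from memory state and a strategy chosen by rotation."""
    strategy = STRATEGIES[batch_index % len(STRATEGIES)]
    recent = memory.texts[-max_examples:]
    avoid = "\n".join(f"- {t}" for t in recent) if recent else "- (nothing yet)"
    return (
        f"{base_prompt}\n\n"
        f"Strategy for this batch: {strategy}\n"
        f"Avoid ideas similar to these already-accepted items:\n{avoid}"
    )
```

A linear scan over stored embeddings is fine for a sketch; at the scale of thousands of candidates, a vector index would be the more natural choice for the near-duplicate check.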
git clone https://github.com/ryanlingo/dynamic-context-evolution.git
cd dynamic-context-evolution
pip install -e .
# Or with uv:
uv sync
For downstream evaluation (DeBERTa classifier):
pip install -e ".[downstream]"
- Copy the environment template and add your API keys:
cp .env.example .env
# Edit .env with your OPENAI_API_KEY and ANTHROPIC_API_KEY
- Run a DCE generation session:
python experiments/run_exp2_comparison.py
- Configuration is in config.yaml: adjust domain, batch count, thresholds, etc.
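Before launching a session, it can help to confirm that both API keys are actually set. The following is a minimal preflight sketch using only the standard library. The helper names `read_env_file` and `missing_keys` are hypothetical and not part of this repository, and the parser handles only simple KEY=value lines. How the experiment scripts themselves load .env is not stated here, so treat this as a standalone check.

```python
import os
from pathlib import Path

# Key names taken from the setup step above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def read_env_file(path):
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_path=".env", environ=None):
    """Return required keys that are empty or absent in both the environment and the file."""
    environ = os.environ if environ is None else environ
    file_values = read_env_file(env_path)
    return [
        key for key in REQUIRED_KEYS
        if not (environ.get(key) or file_values.get(key))
    ]
```

Calling `missing_keys()` from the repository root returns an empty list when both keys are available, and otherwise the names that still need a value.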
Experiment 1 (cross-batch mode collapse):
python experiments/run_exp1_collapse.py
Experiment 2 (DCE vs. baselines, multi-seed):
python experiments/run_multi_seed.py
Sensitivity analysis:
python experiments/run_sensitivity.py
python experiments/run_sensitivity_thresholds.py
Downstream evaluation:
python experiments/run_downstream.py
Analysis scripts in analysis/ generate all paper figures and tables.
Experiment data (raw generation logs and processed embeddings) is available on the GitHub Releases page.
@article{lingo2026dynamic,
title={Dynamic Context Evolution for Scalable Synthetic Data Generation},
author={Lingo, Ryan and Chhajer, Rajeev},
journal={arXiv preprint arXiv:XXXX.XXXXX},
year={2026}
}
This project is licensed under the Apache License 2.0; see LICENSE for details.
Copyright 2026 Honda Research Institute, USA, Inc.
