Fix evalbuff signal quality and add edit history to doc writer #483

Open
jahooma wants to merge 1 commit into main from jahooma/improve-evalbuff


jahooma commented Mar 27, 2026

Summary

  • Commit pre-copied docs in test repos so they don't leak into the agent's diff. Previously, judges penalized agents for "creating unnecessary documentation" that was actually pre-loaded context
  • Isolate Claude calls (prompt generator + doc writer) by setting cwd=tmpDir so they don't read the repo's CLAUDE.md/AGENTS.md and get confused
  • Filter lockfiles (bun.lock, package-lock.json, etc.) from diffs and file lists to reduce noise
  • Add a score comparison threshold (0.3-point minimum) so docs aren't accepted on the basis of scoring noise
  • Pass edit history to the doc writer so it knows which prior docs were accepted/rejected (with scores), avoiding repeated failures and building on successes
  • Cap improvement loop at 5 iterations to prevent runaway cost
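
For illustration, the threshold, iteration cap, and edit history described above could fit together roughly as follows. This is a minimal sketch, not the code in this PR: `EditAttempt`, `improveDocs`, `writeDoc`, and `scoreDoc` are hypothetical names, the callbacks are synchronous for brevity (real Claude calls would presumably be async), and the small epsilon is an added assumption to keep the 0.3 boundary stable under floating-point rounding.

```typescript
// Hypothetical shape of one entry in the edit history passed to the doc writer.
interface EditAttempt {
  doc: string;
  score: number;
  accepted: boolean;
}

const MIN_IMPROVEMENT = 0.3; // minimum score gain (value from the PR summary)
const MAX_ITERATIONS = 5; // loop cap (value from the PR summary)
const EPSILON = 1e-9; // assumption: tolerance for floating-point rounding

// A candidate only counts as better if it beats the current best by the threshold.
function isMeaningfulImprovement(best: number, candidate: number): boolean {
  return candidate - best >= MIN_IMPROVEMENT - EPSILON;
}

// Hypothetical callbacks standing in for the doc writer and the judge.
type WriteDoc = (history: readonly EditAttempt[]) => string;
type ScoreDoc = (doc: string) => number;

function improveDocs(
  baselineScore: number,
  writeDoc: WriteDoc,
  scoreDoc: ScoreDoc,
): { best: number; history: EditAttempt[] } {
  const history: EditAttempt[] = [];
  let best = baselineScore;

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    // The writer sees every prior attempt, including rejected ones and their scores.
    const doc = writeDoc(history);
    const score = scoreDoc(doc);
    const accepted = isMeaningfulImprovement(best, score);
    history.push({ doc, score, accepted });
    if (accepted) best = score;
  }

  return { best, history };
}
```

Recording rejected attempts alongside accepted ones is what would let the writer avoid repeating an approach that already failed to clear the threshold.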

Test plan

  • Tested on freebuff commits: the rate-limit commit went from 7/10 to 9/10, and the login-URL commit now scores correctly
  • Run full evalbuff loop on a batch of commits to verify end-to-end

🤖 Generated with Claude Code

…e calls, filter lockfiles

- Commit pre-copied docs in test repos so they don't appear in the agent's
  diff — fixes corrupted diff attribution where judges penalized agents for
  docs they didn't create
- Run prompt generator and doc writer Claude calls with cwd=tmpDir to prevent
  them from reading the repo's CLAUDE.md/AGENTS.md
- Filter lockfiles (bun.lock, package-lock.json, etc.) from diffs and file lists
- Add 0.3-point minimum threshold for score comparisons to reduce noise
- Cap improvement loop at 5 iterations
- Pass edit history (accepted/rejected docs with scores) to the doc writer
  so it can avoid repeating rejected approaches and build on what worked
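
As a rough sketch of the lockfile filtering mentioned in this commit message, the helpers below drop lockfiles from a file list and from a unified `git diff`. The function names are hypothetical, and the lockfile list is an assumption: the message names only `bun.lock` and `package-lock.json`, so `yarn.lock` and `pnpm-lock.yaml` stand in for the "etc.".

```typescript
// Assumed lockfile names; only the first two appear in the commit message.
const LOCKFILE_NAMES = new Set([
  "bun.lock",
  "package-lock.json",
  "yarn.lock",
  "pnpm-lock.yaml",
]);

// Match on the final path segment so nested lockfiles are caught too.
function isLockfile(path: string): boolean {
  const name = path.split("/").pop() ?? path;
  return LOCKFILE_NAMES.has(name);
}

// Remove lockfiles from a list of changed file paths.
function filterFileList(paths: string[]): string[] {
  return paths.filter((p) => !isLockfile(p));
}

// Remove per-file sections of a unified git diff whose target path is a lockfile.
function filterDiff(diff: string): string {
  // Split before each "diff --git" header, keeping the header with its section.
  const sections = diff.split(/^(?=diff --git )/m);
  return sections
    .filter((section) => {
      const match = /^diff --git a\/(.+?) b\/(.+)$/m.exec(section);
      // Keep anything that is not a recognizable per-file section.
      return !match || !isLockfile(match[2] ?? "");
    })
    .join("");
}
```

Usage would look like `filterFileList(["src/a.ts", "bun.lock"])`, which keeps only `src/a.ts`.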

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>