🦀 PinchBench

Real-world benchmarks for AI coding agents

PinchBench measures how well LLMs perform as the brain of an OpenClaw agent. Instead of synthetic tests, we give agents real tasks: scheduling meetings, writing code, triaging email, researching topics, and managing files.


Repositories

Repo | Description
skill | Benchmark runner and task definitions (run it yourself)
leaderboard | The pinchbench.com leaderboard frontend
api | The public PinchBench API at api.pinchbench.com
scripts | Official PinchBench run automation, including default_models.yml

Run the Benchmark

git clone https://github.com/pinchbench/skill.git
cd skill
./scripts/run.sh --model anthropic/claude-sonnet-4

Results upload to the public leaderboard. Get started →
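
To benchmark more than one model, the run command above can be wrapped in a small loop. The following is a minimal sketch, not part of PinchBench: the run_models function and the RUNNER variable are hypothetical names introduced here, and the only PinchBench interface it relies on is the ./scripts/run.sh --model invocation shown above. It assumes the working directory is the cloned skill directory.

```shell
# Hypothetical helper: run the benchmark once per model identifier.
# RUNNER defaults to the run script shown above; override it for a dry run.
RUNNER="${RUNNER:-./scripts/run.sh}"

run_models() {
  failed=0
  for model in "$@"; do
    echo "running: $model"
    # Keep going if one model fails, but remember that it did.
    if ! "$RUNNER" --model "$model"; then
      echo "failed: $model" >&2
      failed=$((failed + 1))
    fi
  done
  # The exit status is the number of failed runs (0 means all succeeded).
  return "$failed"
}
```

Calling run_models anthropic/claude-sonnet-4 runs a single model; further model identifiers can be appended as extra arguments. Setting RUNNER=echo prints the arguments that would be passed instead of running the benchmark.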


Claw-some AI agent testing. Made with 🦀 by the humans at https://kilo.ai 🦞

Repositories

  • leaderboard (Public)
    PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
    TypeScript, 18 stars, 8 forks, 7 open issues, 2 open pull requests, updated Mar 16, 2026
  • .github (Public)
    PinchBench organization profile and community health files
    0 stars, 0 forks, 0 open issues, 0 open pull requests, updated Mar 16, 2026
  • skill (Public)
    PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
    Python, MIT license, 593 stars, 46 forks, 12 open issues, 9 open pull requests, updated Mar 16, 2026
  • scripts (Public)
    Shell, 0 stars, 4 forks, 0 open issues, 1 open pull request, updated Mar 16, 2026
  • api (Public)
    TypeScript, MIT license, 1 star, 4 forks, 1 open issue, 0 open pull requests, updated Mar 15, 2026
