RALPHBench

An ultra-long-horizon benchmark designed to evaluate coding agents on realistic, high-complexity software engineering tasks.

NeurIPS 2026 Submission Deadline · May 7, 2026
Open Contribution

Contribute a task. Become a NeurIPS 2026 co-author.

Step 01

Get Access

Join the RALPHBench GitHub repo and review the task guidelines.

Step 02

Build a Task

Tasks follow the Harbor format, with deterministic unit tests and a full reference solution.
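As a rough illustration of the "deterministic unit tests" requirement, here is a minimal sketch of a pytest-style check. The function names (`build_index`) and the toy task are purely hypothetical, not part of the Harbor spec; the point is that the test fixes all inputs and tolerates no nondeterminism, so a grader can rerun it and always get the same verdict.

```python
# Hypothetical sketch of a deterministic unit test for a RALPHBench task.
# build_index is an illustrative stand-in for the system the agent builds.

def build_index(docs):
    # Toy target system: map each word to the sorted list of ids
    # of the documents that contain it.
    index = {}
    for doc_id, text in enumerate(docs):
        for word in set(text.split()):
            index.setdefault(word, []).append(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def test_index_is_deterministic():
    docs = ["the cat sat", "the dog ran", "cat and dog"]
    # Fixed inputs, repeated runs: a valid task test must pass
    # identically every time, with no randomness or wall-clock dependence.
    assert build_index(docs) == build_index(docs)
    assert build_index(docs)["cat"] == [0, 2]
    assert build_index(docs)["the"] == [0, 1]
```

Real Harbor tasks wrap tests like this in a containerized environment alongside a task description and a known-good solution; consult the repo's task guidelines for the exact layout.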

Step 03

Submit and Get Merged

Open a PR. One approved task earns co-authorship on the NeurIPS 2026 paper.

Every capability leap needs a new benchmark.
2021 · HumanEval — write a single function (~1 min) → unlocked code LLMs
2023 · SWE-bench — fix one GitHub issue (~15 min) → unlocked coding agents
2025 · TerminalBench — run multi-step terminal tasks (~1 hour) → unlocked terminal agents
2026 · RALPHBench — build entire systems from scratch (1–5+ hours · 25M+ tokens) → unlocking autonomous agents

Build RALPHBench with us

abundant.ai benchflow.ai
rishi@abundant.ai xiangyi@benchflow.ai