Contribute
Help build the benchmark that will define how long-horizon agents are evaluated.
Ready to contribute?
Get in touch to join the project. Weekly syncs are Mon & Thu at 5 PM PT; office hours are Tue/Thu/Sat at 9:30 AM PT.
Every capability leap needs a new benchmark.
How to Contribute
Get access
Join the RALPHBench Slack and introduce yourself. Add your name, email, and affiliation to the RALPHBench Workspace and we'll add you to meeting invites. Schedule a quick call to brainstorm ideas if you'd like.
Get started
Read CONTRIBUTING.md on GitHub, then browse past meeting notes to find open ideas. AI-assisted coding is encouraged, but task ideas, instruction.md, and task.toml must be written by humans.
Submit & get merged
Pick a hard, real-world coding problem that would take a skilled human multiple hours to solve, then submit a PR; tasks are reviewed weekly. You can also contribute engineering work, experiment runs, or paper writing.
Authorship
Contributors with three or more merged tasks receive authorship on the RALPHBench paper. Engineering contributions, experiment runs, and paper writing are also considered for authorship.
Related Projects
SkillsBench
Benchmark for evaluating agent skill acquisition and transfer
Terminal Bench
Multi-step terminal task benchmark for CLI agents
Harbor
Standardized agent-environment interface specification
SWE-bench
GitHub issue resolution benchmark for coding agents