Agent Leaderboard

Coming Soon

Performance benchmarks of autonomous coding agents on RALPHBench long-horizon SWE tasks. Results will be published after initial benchmark runs.

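Agent        Model           No Skills (%)  Self-Gen (%)  With Skills (%)
Gemini CLI   Gemini 3 Flash  31.3           -             48.7
Claude Code  Opus 4.5        22.0           21.6          45.3
Codex        GPT-5.2         30.6           25.0          44.7
Claude Code  Opus 4.6        30.6           32.0          44.5
Gemini CLI   Gemini 3 Pro    27.6           -             41.2
Claude Code  Sonnet 4.5      17.3           15.2          31.8
Claude Code  Haiku 4.5       11.0           11.0          27.7

Entries are listed in descending order of the headline percentage, which matches the With Skills value. A dash marks a condition with no value listed. The chart legend distinguishes agent families (Claude, Gemini, Codex) and skill conditions (No Skills, Self-Gen, With Skills); its axis runs from 0% to 50%.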