Job Description

We are looking for a technical researcher to own how we evaluate frontier models on the ARC-AGI benchmarks. This person will run new models end-to-end, mine the data exhaust from every run, and translate what we learn into reports and public communication that shape the conversation on where model capability is heading. This is a remote, full-time role.

What You'll Do:

Own our model benchmarking and testing process, and run new frontier models against ARC-AGI-1, ARC-AGI-2, and ARC-AGI-3 as they ship
Build and own the ARC Prize Analysis Package - a repeatable report produced for every new frontier model, turning raw logs into insight on capability, failure modes, and gaps
Own the official and community leaderboards end-to-end - from scoring pipeline to public page
Serve as primary contact for new labs testing on ARC-AGI, and communicate findings externally via Twitter, newsletter, and policy and partner briefings

What We're Looking For:

Research background with hands-on model evaluation experience - you've run evals before and know how to read the results (model training experience not required)
Deep understanding of how modern models work and fail, and comfort building your own tooling and analysis to answer the questions you care about
Strong ownership instinct and clear technical communication

Example outputs this role would produce: a model score announcement and a model analysis blog post.



Benchmark Testing and Analysis Lead
