Benchmark Testing and Analysis Lead

Compensation
$150,000–$250,000/year

Job Description

We are looking for a technical researcher to own how we evaluate frontier models on the ARC-AGI benchmarks. This person will run new models end-to-end, mine the data exhaust from every run, and translate what we learn into reports and public communication that shape the conversation on where model capability is heading. This is a remote, full-time role.

What You'll Do:

  • Own our model benchmarking and testing process, and run new frontier models against ARC-AGI-1, ARC-AGI-2, and ARC-AGI-3 as they ship
  • Build and own the ARC Prize Analysis Package - a repeatable report produced for every new frontier model, turning raw logs into insight on capability, failure modes, and gaps
  • Own the official and community leaderboards end-to-end - from scoring pipeline to public page
  • Serve as primary contact for new labs testing on ARC-AGI, and communicate findings externally via Twitter, newsletter, and policy and partner briefings

What We're Looking For:

  • Research background with hands-on model evaluation experience - you've run evals before and know how to read the results (model training experience not required)
  • Deep understanding of how modern models work and fail, and comfort building your own tooling and analysis to answer the questions you care about
  • Strong ownership instinct and clear technical communication

Example outputs this role would produce: a model score announcement and a model analysis blog post.

Job Details

Category
Business & Finance
Employment Type
Full Time
Location
Remote (US)

About ARC Prize Foundation

AI benchmarks that measure general intelligence and inspire new ideas
