Tracking AI Progress - AGI When?

Progress Legend:

Emoji	Meaning
✅	Completed
🚧	In Progress
⏳	Awaiting Progress
❌	Deadline Missed

Leopold Aschenbrenner - Situational Awareness Timeline

Leopold Aschenbrenner Base Scale Up

Yearly Predictions

2025/2026:

AI will drive $100B+ annual revenues for big tech companies
AI will outcompete PhDs in raw problem-solving smarts
We’ll have $10T companies (for reference, OpenAI was valued at $157 billion in October 2024)

2027/2028:

We’ll have models trained on the $100B+ cluster
Full-fledged AI agents/drop-in remote workers will start to widely automate software engineering and other cognitive jobs
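The $10T prediction can be compared with OpenAI's $157 billion valuation in the unit this document uses elsewhere, orders of magnitude (OOMs). Below is a minimal Python sketch that uses only those two figures. The variable names are illustrative.

```python
import math

# The two figures from the 2025/2026 predictions: the predicted $10T company
# and OpenAI's $157B valuation.
TARGET_VALUATION = 10e12
OPENAI_VALUATION = 157e9

multiple = TARGET_VALUATION / OPENAI_VALUATION  # how many times larger the target is
ooms = math.log10(multiple)                     # the same gap expressed in OOMs

print(f"{multiple:.1f}x, or {ooms:.2f} OOMs")
```

Running it prints `63.7x, or 1.80 OOMs`. By this measure, reaching the predicted $10T would take a valuation roughly two OOMs above OpenAI's.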

Test Time Compute

Number of tokens	Equivalent to a human working on something for…	OOMs	Progress
100s	A few minutes	Baseline (ChatGPT)	✅
1,000s	Half an hour	+1 OOM of test-time compute	✅ (OpenAI's o1-preview thinks for several minutes)
10,000s	Half a workday	+2 OOMs	⏳
100,000s	A workweek	+3 OOMs	⏳
Millions	Multiple months	+4 OOMs	⏳
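Each row of the table adds one order of magnitude (10x) of test-time compute. The sketch below reproduces that progression in Python and converts each token budget into human-equivalent working time. Two inputs are assumptions for illustration, not figures from the table: a 100-token baseline and a human "thinking rate" of about 100 tokens per minute.

```python
# Assumptions (ours, not from the table): the "100s" row starts at 100 tokens,
# and a human thinks at ~100 tokens per minute, with 8-hour workdays and
# 40-hour workweeks.
BASE_TOKENS = 100
TOKENS_PER_MINUTE = 100


def tokens_for_ooms(ooms: int) -> int:
    """Token budget after adding `ooms` orders of magnitude of test-time compute."""
    return BASE_TOKENS * 10 ** ooms


def human_hours(tokens: int) -> float:
    """Human-equivalent working time, in hours, for a given token budget."""
    return tokens / TOKENS_PER_MINUTE / 60


for ooms in range(5):
    tokens = tokens_for_ooms(ooms)
    hours = human_hours(tokens)
    print(f"+{ooms} OOM: {tokens:>9,} tokens ~ {hours:8.2f} h "
          f"({hours / 8:.2f} workdays, {hours / 40:.2f} workweeks)")
```

At this assumed rate, the round numbers fall at the low end of each row. For example, 10,000 tokens comes to under two hours. The table's labels correspond to a few multiples of each number: half an hour is about 3,000 tokens, and a 40-hour workweek is about 240,000.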

Training Compute

Tracking the growth of the largest training clusters (in H100-equivalents, cost, and power draw) as a proxy for progress in AI capability.

Year	OOMs	H100-equivalents	Cost	Power	Power reference class	Progress
2022	~GPT-4 cluster	~10k	~$500M	~10 MW	~10,000 average homes	✅
~2024	+1 OOM	~100k	$billions	~100 MW	~100,000 homes	✅ (xAI Colossus datacenter in Memphis, 2024)
~2026	+2 OOMs	~1M	$10s of billions	~1 GW	The Hoover Dam, or a large nuclear reactor	🚧 (OpenAI Abilene datacenter, ETA mid-2026)
~2028	+3 OOMs	~10M	$100s of billions	~10 GW	A small/medium US state	🚧 (OpenAI + Microsoft, ETA 2028)
~2030	+4 OOMs	~100M	$1T+	~100 GW	>20% of US electricity production	⏳
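The table follows a simple rule: roughly +1 OOM of cluster size every two years, starting from about 10k H100s in 2022. Below is a minimal Python sketch of that arithmetic. It relies on three assumptions of ours, not figures stated by the source. The first is about 1 kW of all-in power per H100-equivalent, which is the ratio the table's rounded figures imply. The second is about 1 kW of average draw per home. The third is US electricity generation of roughly 4,200 TWh per year.

```python
# Assumptions (ours): ~1 kW all-in per H100-equivalent (chip plus cooling,
# networking, etc.), ~1 kW average draw per home, ~4,200 TWh/yr of US generation.
BASE_YEAR = 2022
BASE_H100S = 10_000  # ~GPT-4 cluster
KW_PER_H100 = 1.0
KW_PER_HOME = 1.0
US_TWH_PER_YEAR = 4_200
HOURS_PER_YEAR = 8_760


def cluster(ooms: int) -> dict:
    """Projected cluster after `ooms` orders of magnitude of scale-up."""
    h100s = BASE_H100S * 10 ** ooms
    power_gw = h100s * KW_PER_H100 / 1e6                  # kW -> GW
    us_avg_gw = US_TWH_PER_YEAR * 1_000 / HOURS_PER_YEAR  # TWh/yr -> average GW
    return {
        "year": BASE_YEAR + 2 * ooms,  # ~+1 OOM every two years
        "h100s": h100s,
        "power_gw": power_gw,
        "homes": h100s * KW_PER_H100 / KW_PER_HOME,
        "us_share": power_gw / us_avg_gw,
    }


for ooms in range(5):
    c = cluster(ooms)
    print(f"~{c['year']}: {c['h100s']:>11,} H100s, {c['power_gw']:g} GW, "
          f"~{c['homes']:,.0f} homes, {c['us_share']:.1%} of US generation")
```

Under these assumptions, the sketch reproduces the Power and Power reference class columns. That includes the 2030 row, which comes out at about 21% of average US generation.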

Source: Situational Awareness

OpenAI Levels

OpenAI has a five-level system for tracking progress toward AGI.

Level	Description	Progress
Chatbots	AI with conversational language	✅
Reasoners	Human-level problem-solving	✅ (OpenAI's o1)
Agents	Systems that can take actions	🚧 (OpenAI targeting January 2025)
Innovators	AI that can aid in invention	⏳
Organizations	AI that can do the work of an organization	⏳

Source: Axios

Benchmark Saturation

ARC Prize (ARC-AGI): 87.5% (on 12/20/2024 by OpenAI's o3)

MATH: 94.8% (on 9/12/2024 by OpenAI's o1)

GPQA Diamond: 87.7% (on 12/20/2024 by OpenAI's o3)

MMLU: 92.3% (on 9/12/2024 by OpenAI's o1)

AIME: 96.7% (on 12/20/2024 by OpenAI's o3)

Epoch AI FrontierMath: 25.2% (on 12/20/2024 by OpenAI's o3)

SWE-bench Verified: 71.7% (on 12/20/2024 by OpenAI's o3)

Note: We assume the labs are reporting their scores honestly.
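The benchmark scores above can be ranked by remaining headroom (100 minus the score) to show which benchmarks are closest to saturation. Below is a small Python sketch. The dictionary keys are shortened benchmark names chosen for illustration.

```python
# Best reported scores from the list above (percent). "Headroom" here is simply
# 100 minus the score; it ignores label noise, which can make 100% unreachable
# on some benchmarks.
SCORES = {
    "ARC Prize": 87.5,
    "MATH": 94.8,
    "GPQA Diamond": 87.7,
    "MMLU": 92.3,
    "AIME": 96.7,
    "FrontierMath": 25.2,
    "SWE-bench Verified": 71.7,
}


def headroom(score: float) -> float:
    """Percentage points left before a benchmark is maxed out."""
    return 100.0 - score


# Closest to saturation first.
for name, score in sorted(SCORES.items(), key=lambda item: headroom(item[1])):
    print(f"{name:<20} {score:5.1f}%  headroom {headroom(score):5.1f} pts")
```

AIME, MATH, and MMLU each have fewer than 10 points of headroom left, while FrontierMath still has about 75.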