AI League — Game Day 2: Claude Holds Court, DeepSeek Clock Ticking

Anthropic's counter-pressing safety squad locks up the #1 slot again. One day from DeepSeek's promo expiry. Grok surprises with speed. May 30, 2026. 1

Final standings — AI Index leaderboard

Rank	Franchise	Model (Best Variant)	AI Index	Speed (t/s)	Price Blend
🥇 1	Anthropic	Claude Opus 4.8 (Max)	61	57.8	$4.10/M
🥈 2	OpenAI	GPT-5.5 (xhigh)	60	54.7	$4.35/M
🥉 3	OpenAI	GPT-5.5 (high)	59	—	—
4	Anthropic	Claude Opus 4.7 (Max)	57	—	—
5	Google	Gemini 3.1 Pro Preview	57	112.9	$1.74/M
—	Google	Gemini 3.5 Flash (high)	55	175.6	$1.31/M
—	xAI	Grok 4.3 (high)	53	157.9	$0.64/M
—	DeepSeek	V4 Pro (Reasoning Max)	52	46.2	$0.18/M
—	Meta	Llama 4 Scout	14	105.9	$0.22/M

Sources: Artificial Analysis Intelligence Index v4.0 — 10-eval composite (GDPval-AA, Terminal-Bench Hard, GPQA Diamond, HLE, and others). 1

No lineup changes at the top. Claude Opus 4.8 (Max) and GPT-5.5 (xhigh) are separated by a single point — a margin thin enough that one benchmark reweight could flip it. 2

Game of the night — Gemini vs Grok: best value in the top bracket

Google's Gemini 3.5 Flash (high) dropped to AI Index 55 against Grok 4.3 (high) at 53, a two-point gap, but the pricing edge belongs to xAI: at $0.64/M blended, Grok costs about 51% less than Flash at $1.31/M. The real story is intelligence per dollar.

Gemini 3.5 Flash: 55 points of intelligence at $1.31/M. Grok 4.3 (high): 53 points at $0.64/M. Grok's cost efficiency wins on pure ratio — but Google's Flash still offers the highest raw intelligence at speed in its tier: 175.6 t/s, against Grok's 157.9 t/s. For applications where you need both throughput and brains, Flash's position is harder to attack than the headline score suggests. 3 4
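The "pure ratio" comparison can be sketched in a few lines. This is only an illustration: it assumes "intelligence per dollar" means AI Index points divided by the blended price per million tokens, which the article does not define formally, and the function name `index_per_dollar` is made up for this example. The figures are copied from the standings table above.

```python
# (AI Index, blended $ per 1M tokens), copied from the standings table.
# Rows without a published price blend are left out.
models = {
    "Claude Opus 4.8 (Max)": (61, 4.10),
    "GPT-5.5 (xhigh)": (60, 4.35),
    "Gemini 3.1 Pro Preview": (57, 1.74),
    "Gemini 3.5 Flash (high)": (55, 1.31),
    "Grok 4.3 (high)": (53, 0.64),
    "DeepSeek V4 Pro (Reasoning Max)": (52, 0.18),
    "Llama 4 Scout": (14, 0.22),
}


def index_per_dollar(index, blend):
    """Index points per blended dollar (assumed metric, not an official one)."""
    return index / blend


# Sort from best to worst value under this assumed metric.
ranked = sorted(
    models.items(),
    key=lambda item: index_per_dollar(*item[1]),
    reverse=True,
)

for name, (index, blend) in ranked:
    print(f"{name:34s} {index_per_dollar(index, blend):7.1f} index points per $")
```

Under this reading Grok lands near 83 points per dollar against roughly 42 for Flash, which matches the "Grok wins on pure ratio" call. The metric ignores speed entirely, so it is one lens on value rather than a verdict.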

Speed panel — who's running the floor

Tier	Model	Speed (t/s)	Notes
League-wide	Mercury 2	742	Inception Labs — not in core 6
Fast tier	Gemini 3.5 Flash (high)	175.6	Speed leader in top 10 intelligence
Fast tier	Grok 4.3 (high)	157.9	Surprise: posted 158 t/s at AI Index 53
Mid tier	Gemini 3.1 Pro Preview	112.9	Strong dual: 57 index + speed
Mid tier	Llama 4 Scout	105.9	10M context window
Slow tier	Claude Opus 4.8 (Max)	57.8	Speed below average for its price class
Slow tier	GPT-5.5 (xhigh)	54.7	Also below average; TTFT 67s
Slow tier	DeepSeek V4 Pro (Max)	46.2	Cheapest; slowest of the group

Mercury 2 retains the league-wide speed record at 742 t/s, down from the 824.7 t/s reported earlier this week — likely provider fluctuation rather than a model change. 1

The xAI call-up: Grok 4.3 (high) at 157.9 t/s was a quiet standout. Elon's franchise has the loudest ownership box and middling regular-season results, but 158 t/s at an AI Index of 53 — delivered at $0.64/M blended — is a legitimate three-and-D role that shouldn't be dismissed.

Image: AI model benchmarking, tracking speed, intelligence, and price across the field. 1

Pricing war breakdown

Franchise	Input ($/1M)	Output ($/1M)	Blend (7:2:1)	Promo?
Anthropic Claude Opus 4.8	$6.25	$25.00	$4.10	—
OpenAI GPT-5.5 (xhigh)	$5.00	$30.00	$4.35	—
Google Gemini 3.1 Pro	$2.00	$12.00	$1.74	—
Google Gemini 3.5 Flash	$1.50	$9.00	$1.31	—
xAI Grok 4.3 (high)	$1.25	$2.50	$0.64	—
Kimi K2.6	$0.95	$4.00	$0.70	—
DeepSeek V4 Pro (Max)	$0.435	$0.87	$0.18	⚠️ Ends May 31
Meta Llama 4 Scout	~$0.17	~$0.66	$0.22	—

DeepSeek 75% promo: 24 hours remaining. The discount on V4 Pro pricing runs through May 31, 2026. Current blended rate: $0.18/M. Post-promo permanent rate: ~$0.25/M (based on published input/output pricing of $0.435/$0.87, which already reflects the permanent post-promo structure per DeepSeek's announcements). Anyone evaluating DeepSeek's API cost should be budgeting on that $0.25/M floor — the $0.18 is an expiring bonus, not the baseline. 5
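For budgeting, the gap between the two quoted rates is easy to put in dollar terms. A minimal sketch, assuming a hypothetical workload of 500M tokens per month; the volume and the helper name `monthly_cost` are illustrative, and only the two blended rates come from the article.

```python
PROMO_BLEND = 0.18       # $ per 1M tokens, current blended rate quoted above
POST_PROMO_BLEND = 0.25  # $ per 1M tokens, approximate post-promo rate quoted above


def monthly_cost(tokens_millions, rate_per_million):
    """Cost in dollars for a monthly volume given in millions of tokens."""
    return tokens_millions * rate_per_million


volume = 500  # hypothetical monthly volume, in millions of tokens

promo = monthly_cost(volume, PROMO_BLEND)
post = monthly_cost(volume, POST_PROMO_BLEND)
increase_pct = (post - promo) / promo * 100

print(f"Promo rate:      ${promo:,.2f}/month")
print(f"Post-promo rate: ${post:,.2f}/month")
print(f"Increase:        {increase_pct:.1f}%")
```

At these two rates the post-promo bill is about 39% higher whatever the volume, so a forecast built on $0.18/M would understate spend by that margin.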

OpenAI's output premium. GPT-5.5 (xhigh) charges $30.00/M output — highest in the dataset. Claude Opus 4.8 is close at $25.00/M. Neither franchise has signaled a price move this cycle. Both appear comfortable playing the premium tier while Gemini 3.5 Flash absorbs the mid-range.

Context window chart

Model	Context Window	Use case edge
Llama 4 Scout	10M tokens	League-widest; RAG pipelines, bulk doc ingestion
Claude Opus 4.8	1M tokens	Multimodal; complex instruction stacking
GPT-5.5 (xhigh)	~922K tokens	Near-parity with Claude
Gemini 3.1 Pro / 3.5 Flash	1M tokens	Strong multimodal + video input
Grok 4.3	1M tokens	Competitive parity
DeepSeek V4 Pro	1M tokens	Text-only; no image input

Image: context window and cost-efficiency comparison across all 6 core teams, tracked by Artificial Analysis. 1

Meta's Llama 4 Scout holds the context window record at 10 million tokens, ten times the 1M tier below it. An AI Index score of 14 keeps it outside the top bracket, but no other model comes close for tasks that require ingesting entire codebases or lengthy document archives. 6

Image: Artificial Analysis Intelligence Index v4.0, an independent benchmark tracking 390+ models across 10 evaluations. 1

Challenger watch

Kimi K2.6 (Moonshot AI): AI Index 54, speed 43.7 t/s, blend $0.70/M. Still sitting two points above DeepSeek V4 Pro (52) and clear of every xAI variant in raw intelligence, with Grok 4.3 (high) the closest at 53. The slow output speed (43.7 t/s vs 56 t/s median for its size class) is a recurring knock, but the intelligence-per-dollar ratio at $0.70/M blended competes directly with Grok. Worth tracking as a serious mid-field challenger. 7

New signings — xAI rookie watch:

Grok Code Fast 1: evaluated May 25, 2026. Coding specialist, early access. Designed to complement Grok 4.3 in developer workflows. Pricing and benchmark scores are not yet in the Artificial Analysis index. 8

Postgame summary

Claude Opus 4.8 holds the #1 slot at AI Index 61: same as yesterday, no movement at the top. The real action is at mid-table, where Grok's speed and price efficiency made it the most underrated stat line on the board. DeepSeek's 75% promo expires in under 24 hours; the permanent rate lands around $0.25/M, which still undercuts every other model on the pricing board except Llama 4 Scout at $0.22/M. Gemini 3.5 Flash is playing a hybrid guard role, pairing mid-range scoring (AI Index 55) with elite speed (175.6 t/s), and no one in its price tier is both faster and smarter. Watch the Kimi K2.6 box score next week.
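The "both faster and smarter" claim is a dominance check, and it can be run against the standings table directly. A rough sketch: it treats a model as dominated only when another model beats it on both AI Index and speed, and it covers only rows that list both figures (Mercury 2 is left out because no AI Index is given for it). The helper name `dominated_by` is illustrative.

```python
# (AI Index, speed in tokens/s), copied from the standings table.
# Rows without a published speed figure are omitted.
models = {
    "Claude Opus 4.8 (Max)": (61, 57.8),
    "GPT-5.5 (xhigh)": (60, 54.7),
    "Gemini 3.1 Pro Preview": (57, 112.9),
    "Gemini 3.5 Flash (high)": (55, 175.6),
    "Grok 4.3 (high)": (53, 157.9),
    "DeepSeek V4 Pro (Reasoning Max)": (52, 46.2),
    "Llama 4 Scout": (14, 105.9),
}


def dominated_by(name):
    """Return the models that are strictly smarter AND strictly faster than `name`."""
    index, speed = models[name]
    return [
        other
        for other, (other_index, other_speed) in models.items()
        if other != name and other_index > index and other_speed > speed
    ]


# Models that nobody beats on both axes at once.
frontier = [name for name in models if not dominated_by(name)]

print("Undominated on index and speed:", frontier)
print("Beats Grok on both:", dominated_by("Grok 4.3 (high)"))
```

On these figures nothing in the table is both faster and smarter than Flash, while Flash itself beats Grok on both axes. Price is not part of this check, which is where Grok makes its case.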

Competitive position as of May 30, 2026: Top 2 locked in. Mid-bracket (Google/xAI/Kimi) actively contested by speed and price. DeepSeek promo deadline tomorrow. Llama plays a specialist role no one else can fill.

Data sourced from Artificial Analysis Intelligence Index v4.0; performance figures are rolling 72-hour measurements. 1

#AILeague