AI League — Game Day 2: Claude Holds Court, DeepSeek Clock Ticking

Claude Opus 4.8 holds AI Index #1 (61). DeepSeek promo expires in 24h. Grok posts 158 t/s. Full May 30 post-game stats panel. #AILeague

AIL·Stats Board
May 30, 2026 · 8:04 AM
Anthropic's counter-pressing safety squad locks up the #1 slot again. One day from DeepSeek's promo expiry. Grok surprises with speed. May 30, 2026. 1

Final standings — AI Index leaderboard

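| Rank | Franchise | Model (Best Variant) | AI Index | Speed (t/s) | Price Blend |
|---|---|---|---|---|---|
| 🥇 1 | Anthropic | Claude Opus 4.8 (Max) | 61 | 57.8 | $4.10/M |
| 🥈 2 | OpenAI | GPT-5.5 (xhigh) | 60 | 54.7 | $4.35/M |
| 🥉 3 | OpenAI | GPT-5.5 (high) | 59 | | |
| 4 | Anthropic | Claude Opus 4.7 (Max) | 57 | | |
| 5 | Google | Gemini 3.1 Pro Preview | 57 | 112.9 | $1.74/M |
| | Google | Gemini 3.5 Flash (high) | 55 | 175.6 | $1.31/M |
| | xAI | Grok 4.3 (high) | 53 | 157.9 | $0.64/M |
| | DeepSeek | V4 Pro (Reasoning Max) | 52 | 46.2 | $0.18/M |
| | Meta | Llama 4 Scout | 14 | 105.9 | $0.22/M |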
Sources: Artificial Analysis Intelligence Index v4.0 — 10-eval composite (GDPval-AA, Terminal-Bench Hard, GPQA Diamond, HLE, and others). 1
No lineup changes at the top. Claude Opus 4.8 (Max) and GPT-5.5 (xhigh) are separated by a single point — a margin thin enough that one benchmark reweight could flip it. 2

Game of the night — Gemini vs Grok: best value in the top bracket

Google's Gemini 3.5 Flash (high) dropped to AI Index 55 against Grok 4.3 (high) at 53, a two-point gap. On price, the advantage runs the other way: Grok at $0.64/M blended is about 51% cheaper than Flash at $1.31/M. The real story is intelligence per dollar.
Gemini 3.5 Flash: 55 points of intelligence at $1.31/M. Grok 4.3 (high): 53 points at $0.64/M. Grok's cost efficiency wins on pure ratio — but Google's Flash still offers the highest raw intelligence at speed in its tier: 175.6 t/s, against Grok's 157.9 t/s. For applications where you need both throughput and brains, Flash's position is harder to attack than the headline score suggests. 3 4
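The intelligence-per-dollar comparison above can be checked with a few lines of arithmetic. This is a minimal sketch: the ratio "AI Index points per blended dollar" is an illustrative metric rather than one published by Artificial Analysis, and the function name is hypothetical.

```python
# Figures copied from the standings table above (AI Index, blended $ per 1M tokens).
models = {
    "Gemini 3.5 Flash (high)": {"ai_index": 55, "blend_usd_per_m": 1.31},
    "Grok 4.3 (high)": {"ai_index": 53, "blend_usd_per_m": 0.64},
}


def index_points_per_dollar(ai_index: float, blend_usd_per_m: float) -> float:
    """AI Index points bought per blended dollar (illustrative metric, not official)."""
    return ai_index / blend_usd_per_m


ratios = {name: index_points_per_dollar(**stats) for name, stats in models.items()}

# Print the most cost-efficient model first.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.1f} index points per $/M")
```

On these numbers Grok lands at roughly 83 index points per blended dollar against roughly 42 for Flash, which is the "pure ratio" win described above.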

Speed panel — who's running the floor

| Tier | Model | Speed (t/s) | Notes |
|---|---|---|---|
| League-wide | Mercury 2 | 742 | Inception Labs; not in core 6 |
| Fast tier | Gemini 3.5 Flash (high) | 175.6 | Speed leader in top 10 intelligence |
| Fast tier | Grok 4.3 (high) | 157.9 | Surprise: posted 158 t/s at AI Index 53 |
| Mid tier | Gemini 3.1 Pro Preview | 112.9 | Strong dual: 57 index + speed |
| Mid tier | Llama 4 Scout | 105.9 | 10M context window |
| Slow tier | Claude Opus 4.8 (Max) | 57.8 | Speed below average for its price class |
| Slow tier | GPT-5.5 (xhigh) | 54.7 | Also below average; TTFT 67s |
| Slow tier | DeepSeek V4 Pro (Max) | 46.2 | Cheapest; slowest of the group |
Mercury 2 retains the league-wide speed record at 742 t/s, down from the 824.7 t/s reported earlier this week — likely provider fluctuation rather than a model change. 1
The xAI call-up: Grok 4.3 (high) at 157.9 t/s was a quiet standout. Elon's franchise has the loudest ownership box and middling regular-season results, but 158 t/s at an AI Index of 53 — delivered at $0.64/M blended — is a legitimate three-and-D role that shouldn't be dismissed.
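To put the speed panel in wall-clock terms, the sketch below converts tokens per second into streaming time for a fixed response length. The 1,000-token response is a hypothetical figure chosen for illustration, and time to first token (for example the 67s noted for GPT-5.5) is deliberately left out.

```python
# Output speeds (tokens/second) copied from the speed panel above.
speeds_tps = {
    "Mercury 2": 742.0,
    "Gemini 3.5 Flash (high)": 175.6,
    "Grok 4.3 (high)": 157.9,
    "Claude Opus 4.8 (Max)": 57.8,
    "DeepSeek V4 Pro (Max)": 46.2,
}

OUTPUT_TOKENS = 1_000  # hypothetical response length, not from the source


def seconds_to_generate(n_tokens: int, tokens_per_second: float) -> float:
    """Streaming time only; time to first token is excluded."""
    return n_tokens / tokens_per_second


gen_seconds = {
    name: seconds_to_generate(OUTPUT_TOKENS, tps) for name, tps in speeds_tps.items()
}

# Fastest model first.
for name, secs in sorted(gen_seconds.items(), key=lambda kv: kv[1]):
    print(f"{name}: {secs:.1f} s per {OUTPUT_TOKENS} tokens")
```

Under that assumption, Flash streams the response in under 6 seconds while Claude Opus 4.8 (Max) needs about 17.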

AI model benchmarking — tracking speed, intelligence, and price across the field 1

Pricing war breakdown

| Franchise | Input ($/1M) | Output ($/1M) | Blend (7:2:1) | Promo? |
|---|---|---|---|---|
| Anthropic Claude Opus 4.8 | $6.25 | $25.00 | $4.10 | |
| OpenAI GPT-5.5 (xhigh) | $5.00 | $30.00 | $4.35 | |
| Google Gemini 3.1 Pro | $2.00 | $12.00 | $1.74 | |
| Google Gemini 3.5 Flash | $1.50 | $9.00 | $1.31 | |
| xAI Grok 4.3 (high) | $1.25 | $2.50 | $0.64 | |
| Kimi K2.6 | $0.95 | $4.00 | $0.70 | |
| DeepSeek V4 Pro (Max) | $0.435 | $0.87 | $0.18 | ⚠️ Ends May 31 |
| Meta Llama 4 Scout | ~$0.17 | ~$0.66 | $0.22 | |
DeepSeek 75% promo: 24 hours remaining. The discount on V4 Pro pricing runs through May 31, 2026. Current blended rate: $0.18/M. Post-promo permanent rate: ~$0.25/M (based on published input/output pricing of $0.435/$0.87, which already reflects the permanent post-promo structure per DeepSeek's announcements). Anyone evaluating DeepSeek's API cost should be budgeting on that $0.25/M floor — the $0.18 is an expiring bonus, not the baseline. 5
OpenAI's output premium. GPT-5.5 (xhigh) charges $30.00/M output — highest in the dataset. Claude Opus 4.8 is close at $25.00/M. Neither franchise has signaled a price move this cycle. Both appear comfortable playing the premium tier while Gemini 3.5 Flash absorbs the mid-range.
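The output premium is easiest to see as a per-request cost. The sketch below uses only the input and output list prices from the table. The token counts are a hypothetical workload, and caching discounts are ignored, so it will not reproduce the 7:2:1 blend column.

```python
# List prices in USD per 1M tokens (input, output), copied from the pricing table above.
prices = {
    "Claude Opus 4.8": (6.25, 25.00),
    "GPT-5.5 (xhigh)": (5.00, 30.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Grok 4.3 (high)": (1.25, 2.50),
}

# Hypothetical request shape, chosen only for illustration.
INPUT_TOKENS, OUTPUT_TOKENS = 10_000, 2_000


def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one request at per-million-token list prices (no caching)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


costs = {
    name: request_cost(INPUT_TOKENS, OUTPUT_TOKENS, inp, out)
    for name, (inp, out) in prices.items()
}
# How many times more an output token costs than an input token.
output_premium = {name: out / inp for name, (inp, out) in prices.items()}

for name in prices:
    print(f"{name}: ${costs[name]:.4f} per request, output premium {output_premium[name]:.0f}x")
```

On this input-heavy mix GPT-5.5 and Claude land within a fraction of a cent of each other; the 6x versus 4x output multiplier matters more as responses get longer.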

Context window chart

| Model | Context Window | Use case edge |
|---|---|---|
| Llama 4 Scout | 10M tokens | League-widest; RAG pipelines, bulk doc ingestion |
| Claude Opus 4.8 | 1M tokens | Multimodal; complex instruction stacking |
| GPT-5.5 (xhigh) | ~922K tokens | Near-parity with Claude |
| Gemini 3.1/3.5 Flash | 1M tokens | Strong multimodal + video input |
| Grok 4.3 | 1M tokens | Competitive parity |
| DeepSeek V4 Pro | 1M tokens | Text-only; no image input |
Context window and cost-efficiency comparison across all 6 core teams, tracked by Artificial Analysis 1
Meta's Llama 4 Scout holds the context window record at 10 million tokens, ten times the next tier. Its AI Index score of 14 keeps it outside the top bracket, but no other model comes close for tasks that require ingesting entire codebases or lengthy document archives. 6

Artificial Analysis Intelligence Index v4.0 — independent benchmark tracking 390+ models across 10 evaluations 1

Challenger watch

Kimi K2.6 (Moonshot AI): AI Index 54, Speed 43.7 t/s, Blend $0.70/M. Still sitting two points above DeepSeek V4 Pro (52) and clear of every xAI variant in raw intelligence. The slow output speed (43.7 t/s vs a 56 t/s median for its size class) is a recurring knock, but the intelligence-per-dollar ratio at $0.70/M blended competes directly with Grok. Worth tracking as a serious mid-field challenger. 7
New signings — xAI rookie watch:
  • Grok Code Fast 1 — evaluated May 25, 2026. Coding specialist, early access. Designed to complement Grok 4.3 in developer workflows. Pricing and benchmark scores not yet in Artificial Analysis index. 8

Postgame summary

Claude Opus 4.8 holds the #1 slot at AI Index 61 — same as yesterday, no movement at the top. The real action is at the mid-table: Grok's speed and price efficiency made it the most underrated stat line on the board. DeepSeek's 75% promo expires in under 24 hours; the permanent rate lands around $0.25/M, which still clears everything from Google up. Gemini 3.5 Flash is doing a kind of hybrid guard role — mid-range scoring (55 AI Index) with elite speed (175.6 t/s) — and no one in the price tier is both faster and smarter. Watch the Kimi K2.6 box score next week.
Competitive position as of May 30, 2026: Top 2 locked in. Mid-bracket (Google/xAI/Kimi) actively contested by speed and price. DeepSeek promo deadline tomorrow. Llama plays a specialist role no one else can fill.
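The "both faster and smarter" claim about Gemini 3.5 Flash is a dominance check, and it can be run against the figures reported in this Drop. This is a minimal sketch: the helper name is hypothetical, the check spans the whole board rather than one price tier, and Mercury 2 is left out because no AI Index is given for it here.

```python
# (model, AI Index, speed in t/s), copied from the standings table and Challenger watch.
board = [
    ("Claude Opus 4.8 (Max)", 61, 57.8),
    ("GPT-5.5 (xhigh)", 60, 54.7),
    ("Gemini 3.1 Pro Preview", 57, 112.9),
    ("Gemini 3.5 Flash (high)", 55, 175.6),
    ("Kimi K2.6", 54, 43.7),
    ("Grok 4.3 (high)", 53, 157.9),
    ("DeepSeek V4 Pro (Reasoning Max)", 52, 46.2),
    ("Llama 4 Scout", 14, 105.9),
]


def dominated_by(target_name, rows):
    """Return models that beat the target on BOTH AI Index and speed."""
    stats = {name: (idx, tps) for name, idx, tps in rows}
    target_idx, target_tps = stats[target_name]
    return [
        name
        for name, (idx, tps) in stats.items()
        if idx > target_idx and tps > target_tps
    ]


for model in ("Gemini 3.5 Flash (high)", "Grok 4.3 (high)"):
    print(f"{model} is dominated by: {dominated_by(model, board)}")
```

On this data nothing dominates Flash, while Grok 4.3 (high) is dominated only by Flash, which is why the Gemini vs Grok matchup comes down to price.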
Data sourced from Artificial Analysis Intelligence Index v4.0, updated rolling 72-hour performance measurements. 1
#AILeague
