SemiAnalysis: AI Value Capture — The Shift to Model Labs

Source: SemiAnalysis (Dylan Patel et al., 2026-05-01)
Link: Post
Tier: 1 (industry / hardware / model lab economics; load-bearing)
Raw: ../../raw/rss/2026-05-01-semianalysis-ai-value-capture-the-shift-to-model-labs.md

TL;DR

The AI profit pool has structurally shifted in three months. 2023–2025: hardware captured almost all the value (Nvidia, then power utilities, then memory vendors). 2026: model labs are capturing it. Anthropic's ARR went from $9B → $44B+ YTD with inference gross margin from 38% → 70%. Meanwhile, Nvidia and TSMC have not raised prices to match — both deliberately leaving value on the table to manage antitrust optics and ecosystem stability. The piece introduces "One Chart to Rule Them All" — a pricing framework that maps the gap between Nvidia's cost-based pricing and the value-based ceiling, and argues there is room for ~40%+ Nvidia system price increases without breaking the economics.

Why this is Tier 1

This piece is the clearest articulation yet of three things Amit tracks:

Token economics. Blended price for Opus 4.7 is ~$0.99/MTok despite the $5/$25 sticker, because agentic workloads have ~300:1 input/output ratios and 90%+ cache hit rates. This explains how Anthropic margins exploded even after they cut the Opus sticker price 3× (from $15/$75 to $5/$25). KV cache and prompt caching are the unit economics of frontier models in 2026.
Hardware throughput trends. B300 running DeepSeek R1 at 8K input → 1K output: ~1k tokens/sec/GPU baseline, ~8k with wideEP+disagg, ~14k with wideEP+disagg+MTP. That is 14× from software alone, and 32× for FP4 on GB300 versus FP8 on H100. These figures set the upper bound for the inference-efficiency thread the wiki has been tracking (KV-Packet 04-17, Turbo-Quant 04-22, NeMo-RL spec dec 04-30, Tide 04-30).
The Nvidia restraint puzzle. Capex per watt from GB300 → VR NVL72 only crept from $37.4/W to $38.1/W despite TDP nearly doubling. The piece's hypothesis — Nvidia is acting as the "central bank of AI" rather than a margin-maximizer — has direct implications for the 2027 pricing landscape and where AI-chip startups (Cerebras, Groq, Tenstorrent) can survive.
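The blended-price claim under "Token economics" can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, not SemiAnalysis's model: it assumes cache reads are billed at 10% of the input sticker price and ignores cache-write surcharges, neither of which is stated in this note. The function name and parameters are illustrative.

```python
def blended_price_per_mtok(
    input_price: float,
    output_price: float,
    input_output_ratio: float,
    cache_hit_rate: float,
    cache_read_multiplier: float = 0.1,  # ASSUMPTION: not given in the source
) -> float:
    """Average price in $ per million tokens across input and output.

    input_output_ratio is input tokens per output token (e.g. 300 for 300:1).
    Cache-write surcharges are ignored (ASSUMPTION).
    """
    # Cached input tokens are billed at a discounted rate.
    cached_input = cache_hit_rate * input_price * cache_read_multiplier
    # Uncached input tokens are billed at the full sticker price.
    fresh_input = (1.0 - cache_hit_rate) * input_price
    effective_input_price = cached_input + fresh_input

    # Weight input and output prices by their share of total tokens.
    total_cost = input_output_ratio * effective_input_price + output_price
    return total_cost / (input_output_ratio + 1.0)


if __name__ == "__main__":
    # Sticker prices and the 300:1 ratio come from the note above.
    for hit_rate in (0.0, 0.90, 0.91, 0.95):
        price = blended_price_per_mtok(5.0, 25.0, 300.0, hit_rate)
        print(f"cache hit rate {hit_rate:.0%}: ${price:.2f}/MTok")
```

Under these assumptions a hit rate just above 90% lands close to the ~$0.99/MTok figure, while the same workload with no caching would cost roughly $5/MTok. The result is far more sensitive to the cache hit rate than to the output sticker price, which is why caching dominates the unit economics.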

Three load-bearing claims

1. Agentic AI changed the unit economics in December 2025

SemiAnalysis's own usage: a $10.95M annual run-rate on Anthropic, ~30% of employee compensation, at ~5B tokens/month/employee (5× more than Meta), with power-law distributed users running 100B+ tokens/month. The argument: agentic AI has hit an ROI threshold where customers no longer optimize the marginal cost of compute; they optimize for access to compute. This is the demand-curve shift that lets labs and infrastructure providers reprice without losing customers.

2. Frontier labs have durable pricing power, contra the open-source narrative

Two reasons the piece argues this won't be competed away:

Quality gap persists. Kimi K2.6 ($0.95/$4) exerts very little downward pressure on Opus pricing. The gap between best closed and best open is not closing for real knowledge work.
Compute supply is the binding constraint. No single lab can serve the entire market. Anthropic is already alienating swathes of the market by locking Claude Code behind $100+/mo and blocking third-party harnesses (e.g., OpenClaw). Demand will outstrip supply for the foreseeable future, so any lab capable of frontier quality charges based on value delivered, not marginal cost.

This is the core anti-bubble argument and the single strongest counter to the Marcus "greatest capital misallocation" framing (04-30, same wiki batch). They cannot both be right.

3. Nvidia is the central bank of AI

The capex-per-watt evidence is striking: GB300 → VR NVL72 comes with roughly double the TDP and materially better performance/W, yet capex/watt creeps up by only about 2%. SemiAnalysis attributes this to:

Antitrust optics. Aggressive repricing in a supply-constrained market draws regulatory attention.
Ecosystem stability. Nvidia wants long-term demand expansion, not short-term margin extraction. Frontier labs benefit from Nvidia's software-driven efficiency gains, but those gains are not fully monetized at the hardware level — incremental value flows downstream to AI labs, hyperscalers, neoclouds, memory vendors.
Memory as the next pricing lever. SOCAMM2 in Rubin is a socketed module (vs GB300's soldered LPDDR5X), which lets Nvidia explicitly price memory as its own line item with a 60% margin justification. This is the next margin lever Nvidia has not fully pulled.

The "One Chart to Rule Them All" framework combines a cost-based floor (the neocloud IRR hurdle, ~$4.92/hr/GPU for a 5-year VR NVL72 at 15.6% IRR) with a value-based ceiling (~$12.25/hr/GPU at $0.70/PFLOP parity with GB300) and reveals a 2.5× gap. That gap is the structural under-pricing argument.
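The two headline numbers in this section, the capex-per-watt creep and the floor-to-ceiling gap, can be re-derived from the figures quoted above. The sketch below only redoes that arithmetic; the constant and function names are illustrative and nothing here reproduces SemiAnalysis's underlying IRR or PFLOP-parity models.

```python
# Figures quoted in this note (all from the SemiAnalysis piece).
GB300_CAPEX_PER_WATT = 37.4      # $/W
VR_NVL72_CAPEX_PER_WATT = 38.1   # $/W
COST_FLOOR_PER_GPU_HOUR = 4.92   # $/hr/GPU, neocloud IRR hurdle
VALUE_CEILING_PER_GPU_HOUR = 12.25  # $/hr/GPU, $0.70/PFLOP parity with GB300


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def gap_multiple(floor: float, ceiling: float) -> float:
    """How many times larger the ceiling is than the floor."""
    return ceiling / floor


if __name__ == "__main__":
    creep = percent_change(GB300_CAPEX_PER_WATT, VR_NVL72_CAPEX_PER_WATT)
    gap = gap_multiple(COST_FLOOR_PER_GPU_HOUR, VALUE_CEILING_PER_GPU_HOUR)
    headroom = VALUE_CEILING_PER_GPU_HOUR - COST_FLOOR_PER_GPU_HOUR
    print(f"capex/W creep: {creep:.1f}%")
    print(f"floor-to-ceiling gap: {gap:.2f}x (${headroom:.2f}/hr/GPU of headroom)")
```

The creep works out to just under 2% and the gap to just under 2.5×, consistent with the rounded figures in the text.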

Connection to prior wiki

Anthropic $900B valuation (04-30 Decoder) is the equity-market companion to this piece. $44B of ARR at 5–8% margins would not support that figure; at 70% margins it justifies a ~$900B valuation, if the margin holds.
DeepSeek V4 cache costs -90% (04-24) sits inside SemiAnalysis's "tokens are getting cheaper to produce" thesis. The $0.99 effective Opus price is downstream of architectural efficiency improvements like CSA/HCA.
Bun-Anthropic acquisition + Zig anti-LLM policy (04-30) is consistent with SemiAnalysis's "labs are capturing all the value" frame — Anthropic is now a major industrial buyer of compiler tooling, which is a marker of the broader cost shift.
Cerebras IPO (LWiAI 242, 04-30) — Cerebras is betting that the "AI compute is too expensive" thesis is right and Nvidia's restraint is unsustainable. SemiAnalysis is betting the opposite — that Nvidia has room and the demand curve doesn't break. These are fundamentally inconsistent.
Marcus "greatest capital misallocation" (04-30): the explicit antithesis of the SemiAnalysis demand thesis. SemiAnalysis says token demand is compounding and ROI-justified; Marcus says the capex is a $700B bet without comparable returns. Either tokens deliver real economic value or they don't. Both views appearing on consecutive days, in the same wiki batch, is the cleanest signal that the AI investment debate is now bipolar.

Why it matters

If the SemiAnalysis read is right, three things follow inside Amit's research interests:

KV cache and prompt caching research becomes more, not less, valuable. The unit economics that justify Anthropic's $900B valuation depend on >90% cache hit rates. Cache compression and reuse research (Stochastic KV Routing 04-28, Turbo-Quant 04-22, KV-Packet 04-17) is now financial-impact-driven, not academic.
Inference efficiency = lab margin = market cap. Every paper on speculative decoding, MoE routing, KV compression directly maps to lab margin. Inference-efficiency research has become a competitive strategy lever, not just an engineering optimization.
The Nvidia repricing risk is real. If Nvidia eventually moves to value-based pricing, model lab margins compress unless inference efficiency keeps doubling. This is the explicit pressure on labs to keep funding inference-efficiency research at current intensity.
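The repricing risk can be made concrete with a toy margin model. This is a rough sketch under strong assumptions that are not in the source: all cost of revenue is treated as compute, token prices are held fixed, and a hardware price increase passes through one-for-one to the lab's compute cost. The function name and the multipliers used in the demo are illustrative.

```python
def gross_margin_after_repricing(
    current_margin: float,
    compute_price_multiplier: float,
    efficiency_multiplier: float,
) -> float:
    """Gross margin after a compute price change and an efficiency gain.

    ASSUMPTIONS (not from the source): cost of revenue is entirely compute,
    token prices stay fixed, and cost per token scales with
    compute price / tokens produced per unit of compute.
    """
    if efficiency_multiplier <= 0:
        raise ValueError("efficiency_multiplier must be positive")
    cost_share = 1.0 - current_margin
    new_cost_share = cost_share * compute_price_multiplier / efficiency_multiplier
    return 1.0 - new_cost_share


if __name__ == "__main__":
    base = 0.70  # inference gross margin quoted in the TL;DR
    scenarios = [
        ("+40% compute price, no efficiency gain", 1.4, 1.0),
        ("2.5x compute price, no efficiency gain", 2.5, 1.0),
        ("+40% compute price, efficiency doubles", 1.4, 2.0),
        ("2.5x compute price, efficiency up 2.5x", 2.5, 2.5),
    ]
    for label, price_mult, eff_mult in scenarios:
        margin = gross_margin_after_repricing(base, price_mult, eff_mult)
        print(f"{label}: {margin:.0%}")
```

In this simplified model, margin is preserved only when the efficiency gain matches the price increase. A move all the way to the value-based ceiling with flat efficiency would take a 70% margin down to about 25%, which is the pressure the point above describes.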

Open question

The single biggest unresolved tension: does the demand curve break? SemiAnalysis assumes demand keeps compounding and the only constraint is supply. Marcus and the bears assume the marginal use case is not real productivity but speculative spending. Whichever side is right, the wiki will be able to resolve it within 90 days from the same data sources (Anthropic ARR run-rate, hyperscaler capex outlooks, neocloud rental price trends).