cere-bro | 2026-05-10
A consolidation day. HuggingFace re-ran yesterday's 38 papers. The new signal arrives elsewhere: a Fields Medalist says ChatGPT 5.5 Pro produced original research-grade math in under an hour, and Jiayi Weng publishes a quiet bear case for RL itself.
TL;DR
- Gowers + ChatGPT 5.5 Pro — Fields Medalist Timothy Gowers gave the model an open number-theory problem. Inside one hour it improved an exponential bound to a polynomial one. MIT collaborator called the central idea "completely original." Gowers' framing: human contribution now means proving something LLMs cannot. (The Decoder)
- Jiayi Weng — Learning Beyond Gradients — Codex iterates a NumPy + cv2 heuristic policy for VizDoom D3 with no neural network training. Blog frames it as "the next paradigm after pretraining and RL/RLVR." @MillionInt amplified it as "mostly a bearish take on RL."
- HuggingFace re-listed yesterday's batch — Today's 38-paper page is a re-publication of 2026-05-09's set. No new arxiv IDs landed. The MoE / skill-curation / kernel-benchmark threads from yesterday are still the live story; see yesterday's digest.
- Broadcom / OpenAI / Microsoft chip wall — Broadcom won't fund the OpenAI custom chip without Microsoft pre-committing 40% of orders. Phase 1 alone is $18B. (The Decoder)
- Emotion AI in the workplace — Atlantic feature surfaces pseudoscientific emotion-detection AI as a quiet workplace fixture. (The Decoder)
The Big Picture
Yesterday's batch was the largest of the month. Today is the consolidation. The HF Daily Papers page was re-run with the same 38 IDs, the only new RSS items of the day are five dated 05-09 that the farmer pulled overnight, and Twitter contributed two tweets total (one of them, Tesla saying goodbye to Model S and Model X production, is off-topic). The honest read is that the live story is still the MoE convergence (UniPool + EMO), the skill-curation cluster (StraTA + Skill1 + SkillOS), KernelBench-X, and Anthropic's NLAs. Yesterday's Connecting the Dots is the right starting point this morning.
What today does add is a second, harder-to-rank thread: two pieces written from very different angles that both push back on the RL substrate. Gowers' note is the empirical pull: a frontier chat model produced a structurally novel result on a real research problem with zero scaffolding (no skill-curator, no agentic workbench, no closed-loop iteration). Jiayi Weng's "Learning Beyond Gradients" is the theoretical pull: code edits in a Codex-driven loop may be a viable alternative to gradient-based learning for at least one well-structured task. Neither piece is a paper that the wiki can grade. Together they suggest the field's next 90-day argument is going to be about whether the RL apparatus the wiki has been tracking (TIP, LongAct, PreRL, VGF, ResRL, Balanced Aggregation) is the right substrate for agentic capability or just the most accessible one.
Industry-side, the Broadcom / Microsoft / OpenAI chip story keeps the capacity-binding-constraint thread alive. Pre-commit anchoring has now propagated from the cloud-vendor side to the silicon-vendor side. Frontier labs are now financing through two-sided pre-commits: the lab needs a cloud anchor, and the silicon vendor will not start production without that cloud anchor either.
Deep Dives
Fields Medalist + ChatGPT 5.5 Pro: a structural improvement on an open problem
Timothy Gowers gave ChatGPT 5.5 Pro an open number-theory problem. In under an hour it improved an exponential bound to a polynomial one. An MIT collaborator called the key idea "completely original."
Source: The Decoder Links: The Decoder article · Wiki Tier: 2 — Active learning (intersects research-agent thread)
Yesterday's research-agent stack:          Today's datapoint:
AI Co-Mathematician (workbench, agentic)   Gowers + ChatGPT 5.5 Pro
Auto Research (closed-loop iteration)      single-shot inference
Skill curation cluster (memory)            no scaffold
benchmark: FrontierMath Tier 4 48%         expert reviewer attests
                                           "completely original"
The reason this is worth a Deep Dive instead of a Quick Hit is the combination of three things rarely seen together in prior anecdotes of this type. The problem is a real open problem in analytic number theory, not a benchmark. The reviewer is a Fields Medalist actively working on the same problem. The improvement is structural (exponential to polynomial), not a constant tightening. Any one of those three could be wrong: the bound could fail to clear refereeing, the "originality" claim could be a literature-search miss, the problem could be smaller than its framing suggests. But all three lining up in one run is a different category of evidence from the previous "GPT-X solved a Putnam problem" stories.
The contrast with yesterday's AI Co-Mathematician (HF, FrontierMath Tier 4 48%) is informative. Co-Mathematician spends its compute on a stateful workbench, tracked failed-hypothesis memory, and asynchronous specialist agents. Gowers' run uses none of that. It is a chat session. The implication is not that the workbench is unnecessary; the workbench is what gets you predictable performance across many problems. The implication is that single-shot capability has crossed a threshold where a workbench is no longer required for occasional success. That is the regime where the marginal value of agentic infrastructure starts to be measured against what a vanilla chat session can already do.
Gowers' meta-claim is the load-bearing line: the bar for human mathematical contribution is now defined by what LLMs cannot do. Any researcher reading this in May 2026 has to make a portfolio decision about which problems are still worth their time. That decision will reshape the next 12 months of research mathematics regardless of whether any individual ChatGPT result holds up.
Why it matters: This is the cleanest single-shot evidence so far that frontier chat models, with no scaffolding, can produce research-grade structural improvements on real open problems. If it reproduces, the agentic-workbench papers move from "necessary for capability" to "necessary for reliability." Two different value propositions.
Jiayi Weng — Learning Beyond Gradients
Codex iterated a pure NumPy + cv2 closed-loop heuristic policy for VizDoom D3. No neural net, no map, no object coordinates. The policy is code, the agent edits it, it works.
Source: Personal blog (Jiayi Weng, ex-OpenAI / Tianshou author), amplified via @MillionInt Links: Blog post · @MillionInt tweet · Wiki Tier: 2 — Agentic systems / RL critique
Standard RL stack:                  Iterated heuristic learning:
policy = neural net θ               policy = Python program P
update = ∂L/∂θ                      update = AI edits P after failure
forgetting = parameter overwrite    forgetting = code revert (rare)
state = replay buffer               state = source control + tests
The argument compresses to one observation. Heuristic policies (handcrafted rules, programmatic policies) were never bad on capability grounds; they were bad on maintenance grounds. A handwritten rule got you 80% of the behavior, but maintaining it across edge cases required a dedicated engineer. The maintenance cost was the entire reason RL won. Weng's claim is that this cost equation has flipped. Code edits are now cheap because a coding agent can read a failure trace, write a test, edit the rule, and re-run, end to end, without human supervision. The artifact lives in source control, which means continual learning is a git history rather than a parameter drift.
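The read-failure, write-test, edit-rule, re-run cycle can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Weng's implementation: the one-row "observation", the action names, the brightness threshold, and both policy versions are hypothetical, and the candidate edits are handwritten here where the blog describes a coding agent producing them.

```python
from typing import Callable, List, Tuple

# Hypothetical observation: one row of pixel brightness values (0-255).
Observation = List[int]
Policy = Callable[[Observation], str]
TestCase = Tuple[Observation, str]  # (observation, expected action)


def policy_v1(obs: Observation) -> str:
    """First handwritten rule: always steer toward the brightest pixel."""
    target = obs.index(max(obs))
    center = len(obs) // 2
    if target < center:
        return "LEFT"
    if target > center:
        return "RIGHT"
    return "SHOOT"


def policy_v2(obs: Observation) -> str:
    """Revision after a failure: wait when the frame holds no bright target."""
    if max(obs) < 128:  # assumed brightness threshold, chosen for illustration
        return "WAIT"
    return policy_v1(obs)


def passes(policy: Policy, tests: List[TestCase]) -> bool:
    """A policy is acceptable only if it matches every recorded expectation."""
    return all(policy(obs) == expected for obs, expected in tests)


def iterate(candidates: List[Policy], tests: List[TestCase]) -> List[str]:
    """Accept a candidate edit only if the full regression suite passes.

    The returned list plays the role of a commit log.
    """
    history: List[str] = []
    for candidate in candidates:
        verdict = "accept" if passes(candidate, tests) else "reject"
        history.append(f"{verdict} {candidate.__name__}")
    return history


# Regression suite; the last case stands in for a test written after
# policy_v1 failed on a frame with no target in view.
TESTS: List[TestCase] = [
    ([0, 0, 255, 0, 0], "SHOOT"),
    ([255, 0, 0, 0, 0], "LEFT"),
    ([0, 0, 0, 0, 200], "RIGHT"),
    ([10, 10, 10, 10, 10], "WAIT"),
]

LOG = iterate([policy_v1, policy_v2], TESTS)
print(LOG)
```

The point of the sketch is where the learning signal lives: in the growing test suite and the accept/reject log, not in any parameter vector.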
The bear case for RL embedded in this is sharp but narrow. Weng is not claiming gradients are obsolete for frontier model training. He is claiming that one of the strongest motivations for end-to-end neural policies, "everything else is too expensive to maintain," may not hold in the agentic era. For tasks where state is structured and action choices are discrete, you can write the policy as code and have the agent maintain it. The tweet thread amplifying this called it "mostly a bearish take on RL." That is the right read, but the bearishness is on the substrate, not on the math.
The composition with the skill-curation cluster (StraTA / Skill1 / SkillOS, 05-09) is direct. SkillOS already separates a frozen executor from a trainable curator that maintains an external SkillRepo. Weng's framing is the next step: the SkillRepo is just code, the curator is the coding agent, and the whole loop runs without weight updates. The skill-curation thread arrived at this architecture from the agent side. Weng arrives at it from the RL-critique side. Two independent paths, same architectural target.
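A rough sketch of that composition, assuming the simplest possible shapes: a skill is a plain function, the repo is a dictionary with a change log, and the curator installs a candidate only when it passes input/output checks. None of these names come from SkillOS or from Weng's post; they are hypothetical stand-ins for the frozen-executor / curator / code-only-repo split.

```python
from typing import Callable, Dict, List

Skill = Callable[[str], str]


class SkillRepo:
    """Hypothetical code-only skill store: names mapped to plain functions."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}
        self.log: List[str] = []  # stands in for version-control history

    def put(self, name: str, skill: Skill) -> None:
        verb = "update" if name in self._skills else "add"
        self._skills[name] = skill
        self.log.append(f"{verb} {name}")

    def get(self, name: str) -> Skill:
        return self._skills[name]


def execute(repo: SkillRepo, name: str, arg: str) -> str:
    """Frozen executor: it only dispatches to the repo and never changes."""
    return repo.get(name)(arg)


def curate(repo: SkillRepo, name: str, candidate: Skill,
           checks: Dict[str, str]) -> bool:
    """Curator: install a candidate only if it passes every input -> output check."""
    if all(candidate(arg) == want for arg, want in checks.items()):
        repo.put(name, candidate)
        return True
    return False


repo = SkillRepo()
# Initial skill, then a revision that also lowercases, then a bad edit.
curate(repo, "normalize_title", lambda s: s.strip(), {" a ": "a"})
ACCEPTED = curate(repo, "normalize_title", lambda s: s.strip().lower(),
                  {" A ": "a", "b": "b"})
REJECTED = curate(repo, "normalize_title", lambda s: s.upper(), {" A ": "a"})
RESULT = execute(repo, "normalize_title", "  Learning Beyond Gradients ")
print(RESULT, repo.log)
```

Behavior changes only through `curate`, so every improvement is an inspectable entry in `repo.log` and the executor stays untouched, which is the "no weight updates" property in miniature.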
The honest weak point is the demo. VizDoom D3 has dense observable state and discrete actions. The leap from this to "frontier agentic tasks like multi-document research" is not justified by the demo alone. Anyone betting on the iterated-heuristic frame should expect at least one large-scale paper from a major lab in the next 90 days, or the framing remains a thoughtful blog post.
Why it matters: The wiki has been logging RL-improvement papers (TIP, LongAct, PreRL, VGF, ResRL, Balanced Aggregation) under the assumption that RL is the substrate to fix. Weng's piece is the first credible argument in 2026 that for some classes of agentic learning, the substrate may be the wrong choice altogether rather than something to repair.
Research angle: What does iterated heuristic learning look like at frontier agentic scale? The natural test is multi-document research or web browsing, not VizDoom. The skill-curator architecture (SkillOS) plus a code-only SkillRepo is the obvious composition. If this becomes a paper from a frontier lab within Q3 2026, the framing-shift becomes load-bearing for the wiki's RL coverage going forward.
Industry Pulse
- Broadcom won't build OpenAI's custom chip without Microsoft pre-committing 40 percent (The Decoder). Phase 1 alone is $18B. OpenAI's Sachin Katti called the dependency "financially unattractive" internally. The capacity-binding-constraint thread (Amazon-Anthropic 04-22, Anthropic-Colossus 05-08, SoftBank loan cut 05-08) now has chip-vendor financing as a new failure mode. → Wiki
- Pseudoscientific emotion AI in the workplace (The Decoder covering Ellen Cushing's Atlantic feature). Emotion-detection AI is being deployed as workplace surveillance ahead of any rigorous validation. The EU AI Act restricts this in workplace contexts; the US has no equivalent. → Wiki
- Google "Preferred Sources" in AI search (The Decoder). Google frames it as a quality-control feature; The Decoder's read is that it shifts responsibility to a manual setting almost no user will configure, giving Google a regulator-facing argument while the open web continues to be deprioritized. Standard "policy laundering through user-toggle" pattern.
- Tesla ends Model S and Model X production at Fremont (@Tesla tweet). Off-topic for AI directly, included only because it lit up the @Tesla feed. Skip.
Connecting the Dots
Across days — the RL-substrate question is now contested from two directions. Yesterday's Balanced Aggregation was the latest in a long thread of careful structural fixes to GRPO (TIP 04-16, LongAct 04-18, PreRL 04-16, VGF 04-19, ResRL 05-08). All of those papers assume the gradient is the right substrate; the work is in fixing the gradient signal. Today's two non-paper inputs both push back. Gowers shows that single-shot inference from a chat model can produce research-grade output without any RL scaffold. Weng argues that for at least one class of structured task, the substrate itself can be replaced with a coding-agent-edits-Python loop. Neither argument settles the question, but two independent pushbacks within 24 hours is a thread to start tracking. The falsifiable claim: by Q3 2026, expect at least one frontier-scale paper that either (a) eliminates gradient-based policy updates from an agentic stack, or (b) shows that gradient-based RL strictly dominates iterated heuristic learning on a task with dense, structured state.
Across days — the research-agent stack consolidates. AI Co-Mathematician (05-09) scored 48% on FrontierMath Tier 4 using a full agentic workbench. Today's Gowers result is in the same domain (research math) without any of that scaffolding. Together they suggest the workbench buys reliability across many problems rather than capability on any single problem. That is a more useful framing than the typical "agents beat chat" headline because it gives a falsifiable test: an agentic system should beat a vanilla chat session at median problem performance even if both can clear individual problems.
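The median-versus-best distinction is easy to state in code. The per-problem success rates below are invented purely to illustrate the shape of the test; they are not measurements of any system, and the problem labels are placeholders.

```python
from statistics import median
from typing import Dict

# HYPOTHETICAL per-problem success rates over repeated attempts.
# Illustrative numbers only, not results from any real evaluation.
chat: Dict[str, float] = {"p1": 0.0, "p2": 0.1, "p3": 0.0, "p4": 0.9, "p5": 0.0}
workbench: Dict[str, float] = {"p1": 0.4, "p2": 0.5, "p3": 0.3, "p4": 0.9, "p5": 0.2}


def best_rate(rates: Dict[str, float]) -> float:
    """Capability on the single most favorable problem."""
    return max(rates.values())


def median_rate(rates: Dict[str, float]) -> float:
    """Reliability: success rate on the typical problem."""
    return median(rates.values())


def solved_any(rates: Dict[str, float]) -> int:
    """Number of problems cleared at least occasionally."""
    return sum(1 for r in rates.values() if r > 0)


for name, rates in (("chat", chat), ("workbench", workbench)):
    print(name, best_rate(rates), median_rate(rates), solved_any(rates))
```

In this toy setup both systems tie on the best problem, so a headline anecdote cannot separate them; only the median does. The test in the paragraph above would be falsified if a real evaluation showed the scaffolded system with no median advantage.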
Cross-source HF + Twitter (continued from yesterday). Yesterday DCI was the only paper amplified from both HF and the @bayesiansapien retweet feed. Today there are no @bayesiansapien retweets; the curated-signal channel was quiet. The only Twitter signal is @MillionInt's repost of Jiayi Weng. That amplification pattern makes Weng's blog post the single most important social-stream item today, which matches the analysis above.
HF vs Kurate. No exact HF / Kurate top-20 overlap today (typical: Kurate lags HF by 1-2 weeks). The new Kurate weekly leaderboard (cs.AI + cs.LG, 40 papers) is mostly the same papers as last week's run with minor reranking. Two Kurate items are worth flagging in light of today's themes:
- cs.AI #5 — "AI scientists produce results without reasoning scientifically" (8.5/10 ai_rating, win_rate 91%) is the explicit counter-frame to the Gowers story. Methodology critique of LLM-as-scientist claims. Worth reading both pieces side by side.
- cs.LG #9 — "LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking" (6.8/10) is the methodological counter to today's "fix GRPO biases" thread. If RLVR rewards are gameable, structural fixes downstream do not help.
Worth Watching
- Does Gowers' result clear refereeing? Falsifiable: by July 2026, an arxiv writeup of the polynomial-bound improvement should exist with explicit attribution to ChatGPT 5.5 Pro. If no writeup appears, the "completely original" claim should be discounted heavily.
- Iterated heuristic learning at frontier scale. Falsifiable: by Q3 2026, expect at least one paper from a major lab applying Weng's framing to a non-game agentic task (multi-document research, web browsing, or codebase navigation), with explicit comparison to a gradient-trained baseline.
- Microsoft-Broadcom-OpenAI deal. Falsifiable: by end of June 2026, expect either (a) a public announcement that Microsoft has agreed to anchor 40% of the chip order, or (b) public reporting that the project has been restructured or paused.
- Emotion AI workplace regulation. Falsifiable: by end of 2026, expect at least one US state-level (likely California or NY) restriction on workplace emotion AI mirroring the EU AI Act's existing carve-out.
- Kurate-rated underrated (carryover from 05-09):
  - "AI scientists produce results without reasoning scientifically" (cs.AI #5, 8.5/10, missing from HF) — the methodology counter to today's Gowers thread.
  - "LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking" (cs.LG #9, 6.8/10, missing from HF) — direct methodological counter to the "fix GRPO biases" thread.
  - "Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation" (cs.AI #19, the only Kurate Tier 1 paper this week) — still missing from HF, still worth reading.
- Rising authors from Kurate — None this run. Author-tracking state has 90 authors, no one crossed threshold this week.
Quick Hits
- OncoAgent (HF blog) — Dual-tier multi-agent framework for privacy-preserving oncology decision support, hackathon-origin (lablab.ai + AMD developer track). Read for the dual-agent architecture pattern (sensitive data isolated from agent reasoning), which is the same pattern Google DeepMind described for their AI co-clinician initiative on 05-08. Two examples are not yet a pattern; three would warrant a concept page. → Wiki
- HF Daily Papers re-listed yesterday's batch — All 38 today are the same arxiv IDs as 2026-05-09. Yesterday's Deep Dives on UniPool, EMO, TIDE, KernelBench-X, MiA-Signature, DCI, the skill-curation cluster, and the NLA story remain the live reads. No re-coverage today.
- Gmail starred — Empty. Zero starred emails since the 2026-05-09 morning fetch.
- Twitter morning slot — Two tweets total. @MillionInt amplifying Jiayi Weng (covered above as a Tier 2 Deep Dive). @Tesla announcing the end of Model S / Model X production at Fremont (off-topic for AI, skip).
Sources ingested today: HF (38 papers, identical to 2026-05-09 batch), RSS (5 items dated 2026-05-09 + 1 Simon Willison item already covered yesterday), Gmail (0 starred), Twitter (2 tweets, 0 curated retweets), Kurate (cs.AI top-20 + cs.LG top-20 + rising-authors, weekly snapshot mostly unchanged from last week). Wiki pages updated: 5.