social-stream · 2026-05-11

Summary

The day splits into two substantive signals and a long tail of low-value content. The morning slot is anchored by @burkov's retweet of the Sakana Conductor paper, a 7B RL orchestrator that routes among GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro at roughly 3 calls per question and beats every individual frontier model on GPQA-D, LiveCodeBench, and AIME25. The same paper is independently surfaced in today's DAIR.AI weekly Gmail, which makes it cross-source confirmed and the load-bearing item of the day for routing. The evening slot's only real signal is an Apple paper, surfaced via @omarsar0, on moving tool-call evaluation inside the execution loop with a reviewer agent and Helpfulness-Harmfulness metrics. The morning also carries a weak 4-post cluster on Jensen Huang's CMU commencement and six opaque x.com/i/article/ reposts from @bayesiansapien that the pipeline cannot resolve. The afternoon slot is empty.

Posts

  • Sakana Conductor (cross-source confirmed, HF + DAIR.AI Gmail) (@burkov · arXiv 2512.04388 · ChapterPal summary) [morning]. A 7B RL policy orchestrates GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro by writing natural-language subtasks and assigning them to workers, and beats every individual model on GPQA-D, LiveCodeBench, and AIME25 at ~3 calls per question. See Conductor summary.
  • In-loop reviewer agent for tool-calling (Apple) (@omarsar0 · arXiv 2604.27233) [evening]. A reviewer agent inspects each provisional tool call before execution and injects feedback; the paper proposes Helpfulness and Harmfulness metrics to measure whether the reviewer fixes more errors than it creates. Reports +5.5% on BFCL irrelevance detection and +7.1% on Tau2-Bench multi-turn. Tracks against tool-calling.
  • Clarity-of-writing regression in long-form AI text (@MillionInt) [morning]. Argues clarity may have regressed from the o1/o3 era despite intelligence gains. Solo signal, no paper. Worth tracking as a falsifiable claim if anyone benchmarks it.
  • LLM Wikis + HTML artifacts as a workflow (@omarsar0) [evening]. Personal-workflow opinion piece arguing an LLM wiki is the durable state for agents, with HTML artifacts as the interactive surface. Directionally relevant because cere-bro is exactly this pattern.
  • Jensen Huang at CMU commencement (cluster of 4) (@nvidia 23:29 UTC, @nvidia 23:25 UTC, @magicsilicon 20:31 UTC, @magicsilicon 19:16 UTC · NVIDIA blog · CMU news) [morning]. Huang received an honorary Doctor of Science and Technology; the keynote framed the AI revolution alongside the PC revolution. PR cycle, included as cluster.
  • Opaque x.com/i/article/ reposts (click through to read) (@neural_avb, @eng_khairallah1, @_avichawla, @akshay_pachaar, @ashwingop, @addyosmani) [morning]. All retweeted by @bayesiansapien; pipeline cannot resolve the embedded article IDs. Treated as curated click-through items.
  • DAIR.AI Vibe Coding Claude Code course (landing page) [evening]. Promo bundled into the Apple-paper tweet. Skip.
  • Tesla Smart Summon FSD v14.3.2 clip (@Tesla) [evening]. Product demo, no model or training detail. Skip.
  • @bcherny — Clawd + umeshu (@bcherny) [morning]. Off-topic personal post from the Anthropic feed. Skip.
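
Note on the in-loop reviewer item above: the idea of measuring whether a reviewer fixes more errors than it creates can be sketched as follows. This is a minimal illustration, not the Apple paper's definition. The names Episode and reviewer_metrics, and the exact formulas, are assumptions made for this note: helpfulness is taken as the share of initially wrong tool calls that end up correct after review, and harmfulness as the share of initially correct calls that end up wrong.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One tool call, scored before and after reviewer feedback (hypothetical schema)."""
    correct_before: bool  # provisional call was correct before review
    correct_after: bool   # final call was correct after review


def reviewer_metrics(episodes):
    """Return assumed helpfulness, harmfulness, and net gain for a reviewer."""
    wrong_before = [e for e in episodes if not e.correct_before]
    right_before = [e for e in episodes if e.correct_before]

    fixed = sum(1 for e in wrong_before if e.correct_after)       # errors the reviewer repaired
    broken = sum(1 for e in right_before if not e.correct_after)  # errors the reviewer introduced

    helpfulness = fixed / len(wrong_before) if wrong_before else 0.0
    harmfulness = broken / len(right_before) if right_before else 0.0
    # Net change in accuracy across all episodes; positive means the reviewer helps overall.
    net_gain = (fixed - broken) / len(episodes) if episodes else 0.0
    return {"helpfulness": helpfulness, "harmfulness": harmfulness, "net_gain": net_gain}


# Toy data: 4 calls wrong before review (2 fixed), 6 right before review (1 broken).
sample = (
    [Episode(False, True)] * 2
    + [Episode(False, False)] * 2
    + [Episode(True, True)] * 5
    + [Episode(True, False)]
)
metrics = reviewer_metrics(sample)
```

On the toy data the reviewer repairs two calls and breaks one, so the net gain is positive even though harmfulness is non-zero. Reporting both numbers separately, rather than only the net, is what makes a regression introduced by the reviewer visible.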