Media Live
Twitter/X polled four times a day. Every retweet from @bayesiansapien plus AI-relevant tweets from 9+ tracked accounts. Less curated than the daily digest. Closer to the raw stream.
Summary
Two real signals tonight. First, Karpathy has joined Anthropic, announced via @ClaudeDevs reposting his personal note about returning to R&D at "the frontier of LLMs." That is the single most consequential AI-people move of the week and reshapes Anthropic's research bench. Second, Anthropic shipped a substantial Claude Managed Agents update (cluster of 5 @ClaudeDevs posts): self-hosted sandboxes, MCP tunnels into private networks, hot-swappable tools/MCP/vaults on a live session, and automatic offloading of MCP outputs over 100k tokens to sandbox files. Beyond that, Scoble's Google I/O note flags a new Gemini Omni "world model" pitched as the foundation for the Holodeck, and Cursor reports Composer 2.5 is now the most-chosen model on the platform, with everyone getting 10x usage for the rest of the day. The @brivael run (10+ posts) is French-language political commentary and gets one skip bullet.
Posts
- Karpathy joins Anthropic (@ClaudeDevs reposting @karpathy). Karpathy: "I've joined Anthropic... I think the next few years at the frontier of LLMs will be especially formative... I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time." This is a senior research hire of the highest order. Anthropic's frontier roadmap, post-training methodology, and (eventually) educational outputs all move because of this. Watch for what part of the stack he ends up touching first.
- Claude Managed Agents: self-hosted sandboxes + MCP tunnels (cluster of 5, @ClaudeDevs · launch blog · docs · cookbooks · claude-api skill). Two security-grade additions: self-hosted sandboxes (agent execution stays in your infra or a managed sandbox provider) and MCP tunnels (agent reaches services inside your security perimeter). Operationally: tools, MCP servers, and vault IDs can now be swapped on a live session without restart, and MCP tool outputs over 100k tokens auto-offload to sandbox files (relevant for KV cache accounting on long-running agents). Onboarding cookbooks shipped for Cloudflare, Daytona, Docker, Modal, and Vercel. This is the enterprise-deployment story for Claude Code architecture and friends.
- Google I/O: Gemini Omni "world model" (@Scobleizer reposting @googleaidevs). Scoble's framing: "Create anything from everything... the foundation for the Holodeck is set." No technical detail in the post itself; need to read the keynote write-up before judging whether this is a real generative-world-model claim or a multimodal-Gemini rebrand. Tracking.
- Cursor: Composer 2.5 now the most-chosen model, 10x usage bump (@mntruell reposting @cursor_ai). Cursor co-founder claims Composer 2.5 has become the default pick on the platform, with everyone getting 10x usage for the rest of the day. Pairs with this afternoon's @karankendre claim that 2.5 hits near-Opus-4.7 benchmarks at 10x lower cost; usage-share telemetry is a softer signal but consistent with the price-performance story.
- PolyAI opens its Agentic Dialog Platform (@Scobleizer · poly.ai). Voice-agent platform going GA to enterprise builders, citing FedEx, Unicredit, PG&E, Marriott, Foot Locker deployments totaling 1B+ resolved conversations. Scoble's prediction: "In 24 months every customer-service line you call will be a voice agent." Worth watching for whether the build-an-agent-in-10-minutes claim survives contact with enterprise compliance.
- AI work-intelligence platforms newsletter (@Scobleizer · unaligned.io). Scoble + Cronin newsletter on enterprise productivity-tracking AI (he names Timeglass as current daily driver). Frames the segment's real bottleneck as employee-surveillance trust rather than capability. Click through to read.
- Code with Claude London (@ClaudeDevs reposting @bcherny). Anthropic developer event photo op. Skip.
- @imagine creator Q&A with @heavypulp (@imagine). xAI's video tool promo for a cinematic-scene Q&A. Skip.
- Ginko Protocol / Aichoes (cluster of 2, @brivael reposting @fao_z_fab · aichoes.lovable.app). Lightweight portable "memory protocol" for human-AI conversation: paste a text spec into any LLM, the model emits a "Gink" snapshot you store anywhere and re-inject later. Cute design as a no-account artifact, but it is a prompt-template convention, not an architectural memory system. Nothing to integrate against.
- @brivael French political and personal feed (cluster of 10, @brivael). Long thread on entrepreneurship, Asch/Milgram conformity studies, French taxation, and personal jabs at French commentators. No AI content. Skip.
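The auto-offload behavior in the Managed Agents item above (MCP tool outputs over 100k tokens go to sandbox files instead of the context window) can be sketched roughly as follows. This is a minimal illustration, not Anthropic's implementation: the function names, the pointer-message format, and the 4-characters-per-token estimate are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

TOKEN_LIMIT = 100_000  # threshold quoted in the post


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. A real system would
    # count with the model's own tokenizer.
    return (len(text) + 3) // 4


def maybe_offload(tool_output: str, sandbox_dir: Path, name: str,
                  limit: int = TOKEN_LIMIT) -> str:
    """Return small outputs inline; write large ones to a file and
    return a short pointer message instead (hypothetical format)."""
    n_tokens = estimate_tokens(tool_output)
    if n_tokens <= limit:
        return tool_output
    path = sandbox_dir / f"{name}.txt"
    path.write_text(tool_output, encoding="utf-8")
    return f"[output offloaded: ~{n_tokens} tokens written to {path}]"


if __name__ == "__main__":
    sandbox = Path(tempfile.mkdtemp())
    print(maybe_offload("small result", sandbox, "call_1"))
    print(maybe_offload("x" * 500_000, sandbox, "call_2"))
```

The point of the design is that the agent's context only ever holds the short pointer, so a single oversized tool result does not inflate every subsequent turn of a long-running session.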
Raw feed · 26 tweets
Article snippet (unaligned.io)
Unaligned Newsletter Unaligned Newsletter Login Subscribe Unaligned Newsletter Industry-wide analysis of the hottest companies and people in AI / Spatial Computing Archive 3 hours ago AI Work Intelligence Platforms The Future of Productivity Robert Scoble, +1 May 12, 2026 AI-Powered Cyberattacks A New Security Threat Robert Scoble, +1 May 05, 2026 AI Agents and the New Enterprise Workforce Layer Robert Scoble, +1 Apr 28, 2026 The Rise of AI Native Companies Robert Scoble, +1 Apr 21, 2026 Why AI Competition Is Becoming a Platform War Robert Scoble, +1 Apr 14, 2026 …
Article snippet (aichoes.lovable.app)
Ginko Protocol — Aichoes The Protocol Ginko Protocol · v1.7 A quiet gesture for human–AI memory. The Ginko Protocol is a lightweight, portable framework for sovereign cognition between you and any AI. A small, intentional shape — carried like a leaf, opened like a notebook. Read the protocol How it works Utilisation Four steps. No accounts. No app. Only a small ritual you can carry between any AI and any notebook. step 01 Generate a Gink Copy-paste the Ginko Protocol into any AI to generate your first Gink. ginko-protocol.txt v1.7 Copy Available at the bottom of this page step 02 Your AI now p…
Article snippet (poly.ai)
PolyAI : The world's most lifelike voice AI agents Customers Product Agent Studio Technology Pricing Strategic Partners Languages Security Integrations Data and Insights Developers Agent Studio Explore Industries Consumer services Financial services Healthcare Hotels Insurance Restaurants Retail Telecom Travel Utilities Healthcare Explore Use cases Account management Authentication Call routing Billing payments Booking reservations FAQ Order management Troubleshooting Booking reservations Explore Resources Resources Customers Call recordings Blog Guides Events Podcasts Webinars ADK Docs Co…
Article snippet (studio.poly.ai)
PolyAI Agent Studio
Article snippet (platform.claude.com)
Self-hosted sandboxes - Claude API Docs Messages Managed Agents Admin Resources API reference English Console Log in Search... ⌘K First steps Overview Quickstart Prototype in Console Define your agent Agent setup Tools MCP connector Permission policies Agent Skills Configure agent…
Article snippet (github.com)
claude-cookbooks/managed_agents/self_hosted_sandboxes at main · anthropics/claude-cookbooks · GitHub Skip to content Navigation Menu Toggle navigation Sign in Appearance settings Platform AI CODE CREATION GitHub Copilot Write better code with AI GitHub Spark Build and deploy intelligent apps GitHub Models Manage and compare prompts MCP Registry New Integrate external tools DEVELOPER WORKFLOWS Actions Automate any workflow Codespaces Instant dev environments Issues Plan and track work Code Review Manage code changes APPLICATION SECURITY GitHub Advanced Security Find and fix vulnerabilities Code…
Summary
The dominant story is the Musk v. Altman verdict aftermath. @ns123abc carries a cluster of five posts covering the jury finding (all three charitable-trust claims dismissed as time-barred, not on merits) and Musk's same-day appeal to the 9th Circuit, with the lawyer's "appeal" sound bite and a three-stage timeline of OpenAI's for-profit conversion. The second real signal is hardware: NVIDIA VP Ian Buck personally hand-delivering the first Vera CPUs to Anthropic, OpenAI, SpaceXAI, and Oracle, with Anthropic publicly calling Vera "a promising part of the ecosystem when solving for agentic workloads." Two smaller items round out the slot: a claim that Demis Hassabis was a previously undisclosed angel investor in Anthropic at founding, and a @brivael repost of @karankendre claiming Cursor's Composer 2.5 hits near Opus-4.7 benchmark scores at 10x lower cost (no source paper, treat as unverified). The rest of @brivael's 15-post run is French-language political and personal commentary, no technical content.
Posts
- Musk v. Altman: jury verdict and 9th Circuit appeal (cluster of 5, @ns123abc). After roughly 90 minutes the jury unanimously found all three Musk claims (breach of charitable trust, aiding and abetting, unjust enrichment) time-barred; Judge Gonzalez Rogers accepted the findings, dismissing the only counts that could have produced disgorgement, a constructive trust on Brockman's ~$30B equity, or nonprofit control of the for-profit. Musk announced an appeal to the 9th Circuit the same day. His lawyer Marc Toberoff argued the limitations clock should not have started in 2020 because the for-profit conversion happened in three stages: 2015 pure 501(c)(3), 2019 capped-profit subsidiary at 100x, 2025 PBC conversion with caps removed and Microsoft at 27%. The jury never weighed the underlying breach evidence. Adjacent to the "Anthropic overtakes OpenAI B2B" item on the broader OpenAI corporate-form question.
- NVIDIA Vera CPU hand-delivery tour: Anthropic, OpenAI, SpaceXAI, Oracle (@ns123abc). Ian Buck personally drove the first Vera CPUs across the Bay Area on Friday. Anthropic called Vera "a promising part of the ecosystem when solving for agentic workloads," OpenAI's Katti thanked Buck, Musk asked technical questions about cores, memory layout, and cooling for SpaceXAI evaluation, and Oracle is publicly committed to hundreds of thousands of Vera CPUs starting in 2026. CPU rollout matters because Vera pairs with Rubin and Blackwell at the host side of agent workloads where context loading and dispatch dominate.
- Undisclosed: Demis Hassabis was an angel investor in Anthropic at founding (@ns123abc · ft.com). Claim is that Dario Amodei views Hassabis as a role model and Hassabis put personal money into Anthropic at the start. If accurate, it complicates the standard DeepMind/Anthropic-as-rivals framing. Source is a referenced FT piece; the URL in the tweet is truncated, click through to read.
- Cursor Composer 2.5 claimed near-Opus-4.7 at 10x lower cost (@brivael reposting @karankendre). Brivael frames it as "Cursor did a collab with SpaceX to destroy Anthropic," which reads as overheated, and the claim itself is unsourced in the post (no eval card, no benchmark table). Worth tracking if Cursor publishes numbers, but treat the price-performance claim as marketing until then.
- Hello from Code with Claude London (@bcherny). Anthropic Code with Claude developer event. Skip.
- @brivael French-language political and personal feed (cluster of 14, @brivael). Mix of jabs at French intellectuals, Sarah Knafo on a 15-year-old hacking the French state, Elon Musk gossip from Ashley St. Clair, retirement-by-capitalisation politics, and personal lifestyle posts. No technical content, no AI claims worth checking. Skip.
Raw feed · 23 tweets
Summary
No @bayesiansapien retweets in the morning slot, so all signal comes from the AI account feed. The dominant story is Cursor Composer 2.5, a six-post launch thread from @cursor_ai detailing a Kimi K2.5 base, RL with textual feedback for credit assignment across hundred-thousand-token rollouts, a Sharded Muon optimizer plus dual mesh HSDP training stack, and a follow-on training run with SpaceXAI at 10x more compute on Colossus 2's million H100-equivalent cluster. Second signal is NVIDIA Vera, where Ian Buck's hand-delivery to Anthropic, OpenAI, SpaceXAI, and Oracle yesterday gets the official NVIDIA blog post framing Vera as the agentic-AI CPU. Third is Anthropic shipping Claude Code Fast Mode by default on Opus 4.7 and adding cache diagnostics in the console, plus a rename of "extra usage" to "usage credits." Tesla pushes three FSD v14.3.3 testimonial reposts. Scoble runs three eclectic medical-AI demos. @brivael returns with another long French-language run on capitalism, philosophy, and AI as merit catalyst, with one technically interesting repost of Cedric Lion's "Open Collider" creativity-engine project for LLMs.
Posts
- Cursor Composer 2.5: Kimi K2.5 base, RL with textual feedback, Sharded Muon, dual-mesh HSDP, 10x compute follow-up with SpaceXAI (cluster of 6, @cursor_ai · blog). Cursor ships Composer 2.5 as a substantial step up over Composer 2 on long-running task quality and complex instruction following. The training stack is the interesting part. The base is Moonshot's Kimi K2.5 open-source checkpoint. Training adds three things: targeted RL with textual feedback (the model receives natural-language critique during rollout, used to assign credit to specific decisions inside spans of hundreds of thousands of tokens, which the post argues is the only viable credit-assignment scheme at agentic rollout length); synthetic data generation for more complex RL environments; and a Sharded Muon optimizer paired with dual-mesh Hybrid Sharded Data Parallel (HSDP), which the post says was necessary to scale Muon to their training-data parallelism layout. Cursor claims Composer 2.5 is "up to 10x more efficient than similarly capable models," matched against unnamed peers. Separately, Michael Truell (@mntruell) confirms the SpaceXAI collaboration is a from-scratch much larger model with 10x total compute on Colossus 2's million H100-equivalents, framed as "the very start of our work with SpaceXAI." This is the second consecutive week the wiki has tracked Cursor pushing models built on a Chinese open-weight base (Composer 2 on Kimi K2, now Composer 2.5 on Kimi K2.5). The base-model dependency on Chinese open weights for a frontier US coding agent is now a stable, not transitional, pattern.
- NVIDIA Vera CPU hand-delivery: Anthropic, OpenAI, SpaceXAI, Oracle Cloud get first units (cluster of 3, @nvidia · NVIDIA blog). NVIDIA VP Ian Buck hand-delivered the first Vera CPUs Friday May 15 to Anthropic San Francisco, OpenAI Mission Bay, SpaceXAI Palo Alto, then Oracle Cloud Santa Clara on Monday. Buck quote: "Agentic AI is creating a new CPU moment in the AI factory. As models move from answering to acting, Vera is purpose-built to keep that work moving at scale." Vera was announced at GTC San Jose in March as NVIDIA's first custom CPU, positioned as a multi-billion-dollar standalone business. The strategic claim is that agentic AI demands a different CPU architecture than the x86 / ARM hosts that pair with current Hopper / Blackwell datacenter GPUs because the host-side dispatch, context-loading, and tool-invocation patterns are different in agentic vs request-response workloads. No public Vera specs in this thread; the blog page promises a roadmap to Vera-powered systems. Matched against yesterday afternoon's social-stream Vera item (Ian Buck's same-day photo tour), today's blog post is the official narrative.
- Anthropic ships Fast Mode default on Opus 4.7 in Claude Code, plus prompt-cache diagnostics, plus usage-credit rename (cluster of 6, @ClaudeDevs · fast-mode docs). Three product changes in one push. Fast Mode is a high-speed Opus configuration: identical model quality at roughly 2.5x response speed, billed at a higher per-token rate, toggleable via /fast. Anthropic frames it as a latency-versus-cost tradeoff for rapid iteration, live debugging, and time-sensitive work. The docs note it stays in research preview, with rate-limiting handled at the org level and per-session opt-in optional. Second change: prompt cache diagnostics are now visible in the Claude Console, showing exactly which part of a prompt caused a cache miss and the token cost (cache diagnostics docs). This is one of the most-requested missing features for high-cache-rate workloads where small invisible prefix drift can silently halve effective margins. Third: "extra usage" is renamed to "usage credits" across Claude products, with the framing that credits now power features beyond plan overflow (specifically fast mode and other premium routes). Existing spending limits, auto-reload, and pre-purchased credits carry over unchanged. The three together describe an Anthropic moving toward a metered-throughput pricing model where speed and cache-hit quality are first-class billable axes.
- How Claude Code works in large codebases (@ClaudeDevs · blog). Anthropic publishes a best-practices post drawn from teams running Claude Code across multi-million-line monorepos, decades-old legacy systems, and distributed microservices. The fetched article body is navigation chrome only, so the substantive recommendations require clicking through; but the framing suggests Anthropic is starting to publish prescriptive playbooks for enterprise rollouts rather than just docs.
- Tesla FSD v14.3.3 testimonials (cluster of 3, @Tesla). Three Tesla reposts of community videos showing v14.3.3 catching potential side-swipe and merge collisions, plus a Herbert Ong summary claiming the version brings much less driver-monitor nagging (users reportedly going over a minute with no nag), smoother and more human-like driving, faster Smart Summon to 8 mph, "Hey Grok" voice support, and better visualizations. Vision-based collision avoidance is real safety signal; the nagging reduction is the deployment-comfort lever Tesla has been pushing for two quarters. Tier 3 for this wiki (robotics adjacent, not core research).
- Open Collider: an LLM creativity engine that mechanically improves idea diversity (@brivael reposting @cdriclion). Brivael flags Cedric Lion's "Open Collider" open-source project. Lion's framing: LLMs collapse on the same ideas when sampled many times from the same brief, a phenomenon Jiang et al. (2025) call "Artificial Hivemind." Telling the model "be more creative" moves the output set sideways without expanding it. Open Collider is positioned as a sampling-and-recombination engine that mechanically forces non-trivial idea diversity. The tweet doesn't expose method details, so the wiki cannot verify the claim, but the diagnosis is consistent with the wiki's prior coverage of LLM-creativity-bottleneck papers (the 2026-05-12 social-stream item on neuron pruning for novelty and the recurring open question of whether RLVR contracts the policy distribution).
- Robert Scoble field reports: AR brain surgery, vein-and-skeleton biometrics, AI travel-planning LobeHub (cluster of 3, @Scobleizer). First is a meeting with Cam Rooahmed, who runs an AR + AI brain-tumor visualization system used in 100+ surgeries with mistake-rate reductions Scoble claims save material money at $100K+ per surgery; uses Magic Leap. Second is Scoble plugging an interview with Robert Adams, whose company globaledentity.com is selling vein and skeleton biometrics now used in new TSA airport scanners. Third is LobeHub plugging a multi-agent system that planned his entire Google I/O week. All three are vendor-positive demos with no technical depth. Track only the brain-tumor AR system, which sits at the AI / medical robotics intersection.
- @brivael French-language run on capitalism, philosophy, AI as merit catalyst (cluster of 10, @brivael). Posts include a "Jensen Huang is a true good guy" Dell-event signing, an Estado Mínimo repost on Venezuela and socialism, a Peter Thiel quote on physical-world stagnation since 1970, an Elon-Musk-on-child-experiments line, an essay on free trade and minimal government, and a top-50-entrepreneurs analysis claiming early precocity plus genuine personal financial risk plus 10-15 year pain tolerance are the only common traits. No AI research content. Skip.
- One-line / promo (@Scobleizer on Robert Adams biometrics, @magicsilicon on White Castle in New Jersey, @WHFraudTF reposting VP Vance on fraud). Skip.
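The "prefix drift" problem behind the cache-diagnostics item above can be illustrated with a short sketch. It assumes a cache that reuses only an exact, unbroken prefix of the previous prompt, and it models a prompt as a list of text blocks; the helper names and the report fields are hypothetical and are not the Console's actual output.

```python
from typing import Dict, Optional, Sequence, Union


def first_divergence(prev: Sequence[str], curr: Sequence[str]) -> int:
    """Length of the shared prefix, i.e. the index of the first block
    where the two prompts differ."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def cache_report(prev: Sequence[str],
                 curr: Sequence[str]) -> Dict[str, Union[int, Optional[str]]]:
    """Summarize how much of `curr` could be served from a prefix cache
    populated by `prev` (assumption: exact-prefix matching only)."""
    hit = first_divergence(prev, curr)
    return {
        "reusable_blocks": hit,
        "recomputed_blocks": len(curr) - hit,
        "first_changed": curr[hit] if hit < len(curr) else None,
    }


if __name__ == "__main__":
    prev = ["system v1", "tools", "doc A", "question 1"]
    # A single trailing space in the second block invalidates everything after it.
    curr = ["system v1", "tools ", "doc A", "question 2"]
    print(cache_report(prev, curr))
```

In the usage example, an invisible whitespace change early in the prompt forces three of four blocks to be recomputed, which is the kind of silent miss a per-block diagnostic would surface.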
Raw feed · 35 tweets
Article snippet (nvda.ws)
Vera Arrives: NVIDIA’s First CPU Built for Agents Lands at Top AI Labs | NVIDIA Blog Skip to content Vera Arrives: NVIDIA’s First CPU Built for Agents Lands at Top AI Labs Ian Buck hand-delivers the first NVIDIA Vera CPU systems to Anthropic, OpenAI, Oracle Cloud Infrastructure and SpaceXAI — marking the moment agentic CPUs move from announcement to production. May 18, 2026 by Ian Finder Share X Facebook LinkedIn Email 0 Agentic AI has always called for a different kind of CPU. NVIDIA CEO and founder Jensen Huang introduced the answer — the standalone Vera CPU — at GTC San Jose in March as NVI…
Article snippet (unaligned.io)
Unaligned Newsletter Unaligned Newsletter Login Subscribe Unaligned Newsletter Industry-wide analysis of the hottest companies and people in AI / Spatial Computing Archive May 12, 2026 AI-Powered Cyberattacks A New Security Threat Robert Scoble, +1 May 05, 2026 AI Agents and the New Enterprise Workforce Layer Robert Scoble, +1 Apr 28, 2026 The Rise of AI Native Companies Robert Scoble, +1 Apr 21, 2026 Why AI Competition Is Becoming a Platform War Robert Scoble, +1 Apr 14, 2026 The Rise of AI Middle Managers Robert Scoble, +1 Apr 07, 2026 The New AI Divide Op…
Article snippet (code.claude.com)
Speed up responses with fast mode - Claude Code Docs Skip to main content Claude Code Docs home page English Search... ⌘ K Ask AI Claude Developer Platform Claude Code on the Web Claude Code on the Web Search... Navigation Model and responses Speed up responses with fast mode Getting started Build with Claude Code Administration Configuration Reference Agent SDK What's New Resources Settings and permissions Settings Permissions Sandboxing Model and responses Model configuration Speed up responses with fast mode Output styles Interface Terminal configuration Fullscreen rendering Voice dict…
Article snippet (platform.claude.com)
Caching | Claude Platform
Article snippet (cursor.com)
Introducing Composer 2.5 · Cursor Skip to content Cursor Product ↓ Agents Code Review Cloud Tab CLI Marketplace ↗ Enterprise Pricing Resources ↓ Changelog Blog Docs Community Help ↗ Workshops Forum ↗ Careers Product → Enterprise Pricing Resources → Sign in Contact Contact sales Download Blog / research May 18, 2026 · research Introducing Composer 2.5 7 min read Table of Contents ↑ Training Composer 2.5 Targeted RL with textual feedback Synthetic data Sharded Muon and dual mesh HSDP Try Composer 2.5 Composer 2.5 is now available in Cursor. It's a substantial improvement in intelligence and…
Monday, May 18
Summary
The day is clearly evening-heavy. Morning was empty (zero retweets, zero articles, second consecutive Monday like that) and afternoon offered only one Tier 1 nugget. The evening slot carried the substance: five of six @bayesiansapien retweets are worth reading, anchored by Meta's SP-KV (Self-Pruned KV Attention), which trains a per-head utility predictor for KV eviction and claims 3 to 10x cache reduction. Two field-shaping arguments land alongside it: a Stanford Data Processing Inequality paper arguing a single LLM beats coordinated multi-agent systems under equal reasoning budgets, and dair.ai's Epistematics paper claiming most agent leaderboards do not measure what they advertise. The afternoon's single nugget is Atlas Inference clocking Qwen3.6-35B at 200+ tok/s on a DGX Spark (GB10), roughly 3x what Codex and Claude pipelines hit on the same hardware class. Everything from @brivael (30+ tweets across slots, mostly French-language polemics) and the @nvidia / @magicsilicon Dell-Tech-World promo cluster is noise.
Posts
- SP-KV: Self-Pruned KV Attention from Meta, 3 to 10x KV cache reduction (@TheTuringPost) [evening]. Per-token per-head 2-layer MLP predicts utility, old tokens are pruned while a local sliding window stays full. Composes with Make Each Token Count's eviction policy and the KV-sharing / MHC line Raschka surveyed yesterday. Cleanest Tier 1 KV-cache story of the week.
- Single LLM beats coordinated multi-agent under equal reasoning budgets (Stanford) (@rohanpaul_ai) [evening]. Formalizes the handoff-as-compression argument via the Data Processing Inequality. Reads as a coherent counter to the multi-agent default, alongside the LIFE survey on multi-agent collaboration failure and the multi-agent-systems concept page.
- The Evaluation Trap / Epistematics: most agent leaderboards do not measure what you think (@dair_ai · arxiv 2605.14167) [evening]. Audit procedure that derives evaluation criteria from a benchmark's capability claim and checks whether the test discriminates the claim from proxy behaviors. Worked example shows Dupoux et al. (2026) reproducing the assumption it claims to revise.
- SFT memorizes, RL generalizes (ICML 2025) (@burkov · arxiv 2501.17161) [evening]. Comparative study across rule-based textual and visual tasks. Empirical companion to GFT: SFT as degenerate RL, the theoretical version of the same claim.
- Detecting overfitting during long-horizon grokking via Random Matrix Theory (@burkov · arxiv 2605.12394) [evening]. RMT spectra of weight matrices alone discriminate generalizing vs memorizing basins. Practitioner setting with no training history, no test set. Model-card-grade diagnostic if it holds up.
- Atlas Inference clocks Qwen3.6-35B at 200+ tok/s on DGX Spark (GB10) (@Scobleizer reposting @AtlasInference) [afternoon]. Claim is roughly 3x Codex / Claude on the same hardware class. No paper or kernel detail in the post. Worth a follow-up if Atlas publishes methodology.
- Hermes Agent Kanban: orchestrator auto-decomposition on triage (@Scobleizer · docs · PR #27572) [afternoon]. Orchestrator decomposes a triage prompt into subtasks and routes by specialization description, durable board in ~/.hermes/kanban.db, every worker an OS process. Adjacent to Claude Code vs Hermes permissions coverage.
- Sholto Douglas reposts "How to land a frontier lab job" by Vlad Feinberg (@_sholtodouglas · vladfeinberg.com) [evening]. Anthropic-insider endorsement, career-side reading rather than research, high signal for the audience.
- Claude Code at scale: best practices for monorepos, legacy systems, microservices (@ClaudeDevs · claude.com blog) [evening]. Anthropic's own write-up for million-line repos. Practitioner reading.
- Grok Build beta first impressions (@brivael) [evening]. Hands-on note, speed is "genuinely cool", quality-at-speed claim would be a real IDE-agent shift if it lands near Opus 4.7. Anecdotal, no benchmarks.
- Opaque x.com/i/article reposts (@nyk_builderz via bayesiansapien) [evening]. Content not fetchable. Click through to read.
- @MillionInt productivity / math aphorisms (cluster of 2, @MillionInt) [morning]. Inspirational, no AI content. Skip.
- @brivael French-language polemics on AI meritocracy, copycats, politics, Twitter drama (cluster of 30 across slots, @brivael) [afternoon + evening]. No links to research, no falsifiable claim. Skip.
- Scoble personal feed and consumer biometric / BCI plugs (cluster of 6, @Scobleizer) [afternoon + evening]. Bill Gates time-value, Big Sur sunset, "AI is taking my job" essay, globaledentity.com vein-and-skeletal biometrics, Mave Health consumer BCI. No technical content. Skip.
- NVIDIA at Dell Technologies World, Jensen on stage with Michael Dell (@nvidia keynote · @nvidia AI-and-routine-work clip) (cluster of 2) [evening]. Promotional. Skip.
- INTC on the NYSE floor, Lip-Bu Tan on Mad Money (@magicsilicon) [evening]. Stock promo. Skip.
- @BrettRatner Instagram reel (@BrettRatner) [afternoon]. Opaque link, no preview. Click through to read.
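The SP-KV item above describes pruning old tokens by predicted utility while a local sliding window stays full. A minimal sketch of that eviction rule for a single attention head is below. It takes the per-token utility scores as given (in the paper they come from the learned 2-layer MLP, which is not reproduced here), and the fixed threshold is an assumption for illustration; the post does not say how the cutoff is chosen.

```python
from typing import List, Sequence


def keep_mask(utilities: Sequence[float], window: int,
              threshold: float) -> List[bool]:
    """Decide which cached tokens one head keeps.

    The most recent `window` tokens are always kept; older tokens are
    kept only if their predicted utility reaches `threshold`
    (hypothetical rule, not the paper's exact criterion).
    """
    n = len(utilities)
    return [i >= n - window or utilities[i] >= threshold for i in range(n)]


def cache_reduction(mask: Sequence[bool]) -> float:
    """Ratio of original cache size to kept size, e.g. 3.0 means 3x smaller."""
    kept = sum(mask)
    return len(mask) / kept if kept else float("inf")


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.2, 0.8, 0.05, 0.3]  # made-up utilities for one head
    mask = keep_mask(scores, window=2, threshold=0.5)
    print(mask, cache_reduction(mask))
```

Because the mask is computed per head, different heads can retain different old tokens, which is where a per-head predictor would differ from a single shared eviction policy.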
Sunday, May 17
Summary
A near-empty social-stream day. Zero curated retweets from @bayesiansapien across all three slots, and the only AI-feed signal is a single Anthropic-researcher tweet in the morning. No cross-slot clusters because afternoon and evening were both empty. The one substantive item is @_sholtodouglas opening DMs for Claude frustrations, which is an operational signal that Anthropic is in active feedback-gathering mode ahead of the next model, not a research signal. The @magicsilicon post is a bare x.com/i/article/ quote-retweet the farmer cannot extract without X cookies. The day's actual research substance landed via Gmail (Interconnects Open Artifacts, Raschka architecture survey), Kurate (cs.LG #13 MoE-muP), and Reddit (MTP merge in llama.cpp, Cutile-rs beta), not via Twitter. Treat this as expected Sunday-after-a-heavy-Saturday variance.
Posts
- @_sholtodouglas asks when users reach for other models over Claude (@_sholtodouglas) [morning]. Anthropic researcher opens DMs for transcript-level frustrations with Claude. Operational feedback-gathering signal stacked on this week's Anthropic context ($900B valuation, $200M Gates partnership, the 2028 US-China policy paper). Worth tracking whether the next Claude release targets the harness-spread failure modes flagged by WildClawBench.
- @magicsilicon bare x.com/i/article/ quote-retweet (@magicsilicon) [morning]. Inaccessible native-long-form link, same shape that produced unscrapable @0xblacklight and @petradonka posts yesterday. Likely a hardware-analysis essay tied to the open-model wave, but unreadable without X cookies. Skip.
- No @bayesiansapien retweets in any slot today. The curated stream that carried the entire 2026-05-16 day (fourteen retweets including Lighthouse Attention, Is Grep All You Need, the circuits-non-unique cluster) is empty across morning, afternoon, and evening. Operationally normal for a Sunday IST scrape window after a heavy Saturday batch.
- No NVIDIA, xAI, Google Research, Cursor, or @ClaudeDevs activity in the AI feed today. A strict subset of yesterday's already-thin feed. Nothing to read.
Links
- Slot detail: morning · afternoon · evening
- Today's digest: 2026-05-17.md
- Comparable prior day with normal cadence: 2026-05-16.md
Saturday, May 16
Summary
The morning slot carried the entire day. Fourteen curated retweets from @bayesiansapien, with three landing inside today's HuggingFace Tier 1 batch and a clean two-paper interpretability cluster forming around the claim that LLM circuits are not unique. Headline standalone is the Nous Research announcement of Lighthouse Attention (the source of today's paper reporting a 1.4-1.7x speedup at 98K and ~17x at 512K), followed by "Is Grep All You Need?", which pairs directly with yesterday's WildClawBench harness-spread result. The Sylph AI plus LIFE pairing makes harness construction itself the next automatable layer, a third entry in the three-day "the wrapper is the thing being trained" thread alongside ATESD and EvolveMem. Industry signal is concentrated in Anthropic's 2028 US-China policy paper and Bill Gurley's open-source-as-corporate-strategy essay, which sit on opposite sides of the same compute-vs-openness axis. Afternoon and evening were near-empty (one founder-tone macro-take, then zero posts), so this roll-up is effectively the morning batch.
Posts
- Lighthouse Attention release from Nous Research (@NousResearch · Lighthouse Attention deep dive) [morning]. Direct announcement of today's HF Tier 1 paper: symmetric Q/K/V pyramid pooling plus top-k cascade wrapped around standard FlashAttention, 1.4-1.7x at 98K and ~17x at 512K on a B200. No custom kernel, no straight-through estimator, and the wrapper removes itself near the end of training.
- Circuits-are-non-unique cluster (@DakingRai, @fnruji316625 · arXiv 2605.12671) (cluster of 2) [morning]. Two papers in 24 hours arguing the Functional Anisotropy Hypothesis is empirically false: multiple structurally distinct, equally faithful, equally sparse circuits coexist for the same task. Implication: a large slice of mechanistic-interpretability claims that treat a discovered circuit as the explanation are over-trusting uniqueness.
- Is Grep All You Need? (@omarsar0 · arXiv 2605.15184) [morning]. Empirical comparison across Chronos, Claude Code, Codex, and Gemini CLI finds grep-style search matches or beats vector retrieval when the harness is well-designed. Direct fit for the harness-as-load-bearing thread from WildClawBench.
- Automating AI R&D interview paper (@ZabihullahAtal) [morning]. Stanford/OpenAI/DeepMind/Anthropic researchers report a shift toward seeing automated AI research as realistic on a tighter timeline than expected. Pairs with the r/MLScaling Prime Intellect auto-nanoGPT post (14K GPU-hours, beat human SoTA, no novel ideas proposed).
- Semi-Formal Reasoning for patch verification (@IntuitMachine) [morning]. Structured-reasoning prompt format that forces preconditions, postconditions, and interprocedural dependencies, pushing patch-verification accuracy to 93% without running tests. Candidate intervention to shrink the 10.7% Lucky Pass rate flagged by AgentLens.
- Anthropic 2028 US-China policy paper (@rohanpaul_ai · Anthropic post) [morning]. Argues compute export controls give the US a 12-24 month frontier lead by 2028, conditional on closing China's access to chips, model outputs, and distillation. Pairs with this week's $900B valuation news and the $200M Gates Foundation partnership as a coordinated civic-infrastructure framing.
- Bill Gurley on open source as corporate strategy (@bgurley via @rohanpaul_ai · Substack essay) [morning]. New 65-paragraph essay reframing 27 years of open source as a strategic mechanism executives use to break monopoly power, with the headline prediction that Chinese open models become the global default by 2030. Sits on the opposite side of the same axis as Anthropic's compute-export argument.
- LIFE multi-agent survey (@dair_ai · arXiv 2605.14892) [morning]. 200+ papers mapped along Lay → Integrate → Find faults → Evolve. Reference work for the agentic-systems concept page, with the self-evolution chapter flagged as the cleanest existing field map.
- Sylph AI: harness evolution loop (@IntuitMachine) [morning]. Three-agent loop (Worker, Evaluator, Evolution) automates prompts, tools, and orchestration end-to-end. Structural cousin to today's ATESD and yesterday's EvolveMem, making three papers in three days where the wrapper is the trained object.
- Generative AI depletes the innovation commons (@kyronis_talks) [morning]. Position paper arguing aggressive automation of creative and knowledge work erodes the human substrate future models need. Same axis as Andrew Ng's "no jobpocalypse" argument from Algorithmic Bridge Weekly Picks #121, opposite framing.
- Claude Code in large codebases guide (@charmaine_klee · Anthropic blog) [morning]. Anthropic's own best-practices guide for Claude Code at scale. Shipped the same week Microsoft pulled Claude Code licenses internally, making the harness-as-product framing explicit on both sides.
- Opaque x.com/i/article reposts (@0xblacklight, @petradonka) [morning]. Bare native-article quote-retweets with no readable body. Inaccessible to the farmer until X cookies are wired in.
- @ClaudeDevs weekly rate-limit reset (@ClaudeDevs) [morning]. Operational. Skip.
- NVIDIA Catalyst series and Net Zero 2026 (Azure E5, Helfie, Josh Parker Net Zero, Energy transition) (cluster of 4) [morning]. Brand-marketing video drops on healthcare access and sustainability. Skip.
- @WHFraudTF political content (Dr. Oz CMS, SBA Maine PPP) (cluster of 2) [morning]. US administration anti-fraud political content, off-topic. Skip.
- AI progress underestimated (@MillionInt) [afternoon]. Founder-tone macro-take with no specific claim, benchmark, or link. Skip.
Friday, May 15
Summary
The strongest cross-slot cluster is xAI's Grok Build CLI launch in the morning (five tweets across @xai, @JasonBud, @milichab), the fourth-frontier-lab entry in the agent CLI category and a direct pair with today's WildClawBench harness-as-load-bearing finding. The single-slot standout is the evening's Anthropic CFO podcast tease from @nottombrown, with run-rate revenue stats ($9B to $30B in one quarter, NDR over 500%, 90% of internal code written by Claude Code) that, if half-true, are the most aggressive enterprise numbers ever publicly attributed to an AI lab. Secondary signal: @ClaudeDevs documents a concrete prompt-cache pre-warming trick (system prompt before user prompt) for time-to-first-token wins, and AWS announces a $110M Build on Trainium program targeting Berkeley/MIT/CMU researchers, replaying NVIDIA's CUDA-subsidy playbook on Neuron. Everything else is noise: Lex Fridman hitchhiking in China, NVIDIA commencement video, Cybertruck FSD marketing, two opaque x.com/i/article reposts, political content. Three slots ingested (morning, afternoon, evening); no night slot today.
Posts
- xAI Grok Build CLI for SuperGrok Heavy (cluster of 5) (@xai · @JasonBud demo · @JasonBud subagents · @milichab /imagine · @milichab plan mode · x.ai/cli) [morning]. Early-beta agentic CLI with subagents, plan mode, clickable interface, and /imagine plus /imagine-video in-terminal. Five-lab race in agent CLIs now (Claude Code, OpenAI Codex, Hermes, Grok Build, plus internal Anthropic stack); pair with WildClawBench and treat harness as a first-class routing variable.
- Anthropic CFO podcast tease: $9B to $30B run-rate, 90% internal code via Claude Code (cluster of 2) (@nottombrown intro · @nottombrown stats) [evening]. Krishna Rao's first podcast: NDR over 500% annualized, first revenue in March 2023, on pace for $50B run-rate next month, ~$75B raised, Cowork outpacing early Claude Code adoption. Reinforces the Anthropic-overtakes-OpenAI B2B thread; the Trainium/TPU/GPU allocation discussion is routing-adjacent infrastructure signal.
- Anthropic prompt-cache pre-warming pattern (cluster of 2) (@ClaudeDevs · @ClaudeDevs docs · docs) [afternoon]. Send the system prompt before the user prompt so Claude writes it to cache without generating output, then the real request lands on a warm cache. Practitioner KV-cache adjacency for long-prompt API workloads, fits the kv-cache thread.
- Position paper: agentic AI as the only foreseeable route to AGI (@omarsar0 via @bayesiansapien · arxiv 2605.12966) [afternoon]. Argues monolithic scaling is insufficient and agent DAGs achieve exponentially better generalization and sample efficiency, with explicit connections to MoE and routing topologies. Tier-2 read for the agentic-systems direction.
- AWS $110M Build on Trainium program for university researchers (@mattsgarman · aboutamazon.com) [evening]. Berkeley, MIT, CMU and others get dedicated Trainium access, open-source contributions flow back to Neuron. NVIDIA-style CUDA-subsidy playbook for the non-NVIDIA stack; sits next to Amazon-Anthropic capital concentration and the broader gpu-kernels GPU-alternative push.
- HBR: "AI brain fry" hits high performers hardest (@rohanpaul_ai via @bayesiansapien) [afternoon]. Survey of 1,500 workers reports AI intensifies workloads, forcing constant task-switching. Opinion/HR-research framing, soft responsible-AI signal on deployment-side cognitive load.
- JasonBud calibration tweet on Grok Build (@JasonBud) [morning]. Sets early-beta expectations; surrounding cluster covers the substance. Skim.
- "Vibes are hard to replicate in silicon" (@MillionInt) [evening]. One-liner reflection, no data. Skim.
- Opaque x.com/i/article reposts (@ashwingop) [afternoon]. Article body not fetched; click through if curious.
- NVIDIA Jensen Huang CMU commencement (@nvidia) [morning]. PR/motivational. Skip.
- Lex Fridman hitchhiking-in-China announcement (cluster of 2) (@lexfridman trip plan · @lexfridman form) [morning]. Off-topic except as a downstream podcast signal on China AI engineers. Skip until episodes drop.
- DAIR.AI Academy "Vibe Coding AI Apps with Claude Code" course (academy.dair.ai) [afternoon]. Course-catalogue side article. Skip.
- Magicsilicon: Intel x McLaren F1 compute partnership (@magicsilicon) [morning]. Sponsorship PR. Skip.
- Tesla Cybertruck FSD marketing (@Tesla) [morning]. Demo-drive promo. Skip.
- @BrettRatner AF1/Starlink/Jensen photo (@BrettRatner) [afternoon]. Celebrity tweet. Skip.
- WHFraudTF political posts (cluster of 2) (article · livestream) [afternoon]. Off-topic political content. Skip.
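The prompt-cache pre-warming item above describes sending the system prompt ahead of the real request so that the request lands on a warm cache. A toy simulation of that ordering is sketched below. It does not call the Anthropic API: `ToyPrefixCache`, `prewarm`, and `handle_request` are hypothetical names, and hashing the prompt is only a stand-in for whatever the provider actually caches.

```python
import hashlib


class ToyPrefixCache:
    """Hypothetical in-memory stand-in for a provider-side prompt cache."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def process_prefix(self, prefix):
        # Key the cache on a hash of the shared prefix (the system prompt).
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = len(prefix)  # placeholder for cached state
        return key


def prewarm(cache, system_prompt):
    """Process only the system prompt so the cache is populated early."""
    cache.process_prefix(system_prompt)


def handle_request(cache, system_prompt, user_prompt):
    """Return True if the system-prompt prefix was already cached.

    user_prompt is accepted for shape only; the toy never generates output.
    """
    hits_before = cache.hits
    cache.process_prefix(system_prompt)
    return cache.hits > hits_before


cache = ToyPrefixCache()
system_prompt = "You are a careful code reviewer."

prewarm(cache, system_prompt)  # pays the one cache miss up front
warm = handle_request(cache, system_prompt, "Review this diff.")
print(warm, cache.hits, cache.misses)
```

In this toy the only cache miss is paid by the pre-warm call, which is the effect the item attributes to the pattern. Real time-to-first-token gains depend on the provider's cache behavior, which this sketch does not model.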
Thursday, May 14
Summary
The day is morning-driven: afternoon was effectively empty (one Tesla FSD marketing clip) and evening was three vendor tweets (NVIDIA pushing Nemotron Labs "claws" twice and AWS reopening Kiro credits). The strongest morning cluster is δ-mem, the lightweight 8x8 frozen-backbone associative memory paper, picked up by both @HuggingPapers and @dair_ai. A second mini-cluster forms around the AutoTTS agentic test-time scaling paper (two reposts) and a third asymmetric-training-identical-inference cluster ties Lighthouse Attention and Nous Token Superposition Training together. Standout single-slot signals worth pulling out of the morning stream: Anthropic's Mythos / Glasswing cyber-range result from @bcherny with AISI confirmation, the multi-agent Bystander Effect paper, and the refusal-neurons MLP-level alignment bypass. Everything else (NVIDIA SAP OpenShell, NVIDIA Snap GPU pipeline, opaque article reposts, Tesla, Kiro) is industry positioning or noise.
Posts
- δ-mem: efficient online memory for frozen LLMs (cluster of 2) (@HuggingPapers · @dair_ai · paper) [morning]. 8x8 associative memory with delta-rule learning gives 1.31x on MemoryAgentBench and 1.20x on LoCoMo without fine-tuning the backbone. @dair_ai calls it one of the most elegant memory mechanisms of the month.
- AutoTTS: agentic discovery of test-time scaling controllers (cluster of 2) (@zhengtoong · @ihtesham2005 · paper · wiki) [morning]. Claude Code proposes its own TTS controllers, tests, and refines over 5 rounds. Total discovery cost $39.9.
- Multi-agent Bystander Effect / Sovereignty Gap (@dair_ai · paper) [morning]. 22,500 deterministic trajectories show agents often compute the right answer internally then suppress it to agree with the swarm. Formalizes an Interaction Depth Limit and a lead-anchor non-commutativity finding.
- Lighthouse Attention: removable subquadratic wrapper (@omarsar0 · paper) [morning]. Nous Research. Wraps SDPA with a hierarchical gradient-free selection layer that gets removed at end of training so deployed inference runs vanilla attention.
- Token Superposition Training (TST) (@NousResearch) [morning]. 2-3x wall-clock pretraining speedup at matched FLOPs without changing architecture, optimizer, tokenizer, or data. Bag-of-tokens prediction in the first third of training then standard NTP. Third paper this week on the asymmetric-training-identical-inference frame.
- Microsoft KV-Cache compression repost (@AiwithYasir) [morning]. Inflated framing ("Microsoft just solved the context window problem") of a real KV-cache compression paper for long chain-of-thought. Click through to the underlying paper.
- Refusal-neurons: single MLP neuron bypasses safety alignment (@hamid_kazemi22) [morning]. Across 7 models, 2 families, 1.7B to 70B scale, suppressing one MLP neuron disables refusal behavior. No fine-tuning, no prompt engineering.
- "Attention Is All You Need V2" / Nested Learning framing (@HowToAI_) [morning]. Hype repost of the Google HOPE / Nested Learning architecture from 2026-04-28. Skip the tweet narrative; the paper itself is real.
- omarsar0 on HTML Artifacts + agents (@omarsar0 · DAIR.AI event) [morning]. Demo of an HTML+JS artifact backed by Obsidian markdown that agents read and modify. Practitioner signal on where agent UI is heading.
- Anthropic Mythos / Glasswing cyber results (@bcherny · XBOW evaluation · AISI report) [morning]. UK AISI confirms Mythos Preview is the first model to solve both cyber ranges end-to-end, including Cooling Tower which no prior model solved. AISI reports autonomous AI cyber-task length doubling every few months.
- NVIDIA + SAP OpenShell agent runtime (@nvidia · NVIDIA blog) [morning]. SAP embeds NVIDIA OpenShell (open-source secure agent runtime) into SAP Business AI Platform with isolated execution and infra-level containment.
- NVIDIA + Snap GPU petabyte A/B testing (@nvidia · NVIDIA AI Podcast Ep 298) [morning]. Snap moved 10+ PB/day of A/B-test data to GPU-accelerated Google Cloud: 76% cost cut, 80% memory cut, zero code changes.
- Nemotron Labs "claws" / long-running agents (cluster of 2) (@nvidia tweet 1 · @nvidia tweet 2 · blog) [evening]. NVIDIA pushes self-hosted persistent agents that run 24/7, citing OpenClaw's 250k GitHub stars in 60 days. Market-positioning signal: agent narrative shifting from prompt-triggered to always-on.
- Opaque article-only reposts (group, click through to read) (@akshay_pachaar · @oneill_c · @amitiitbhu · @AnatoliKopadze · @mem0ai) [morning]. Five x.com/i/article/ reposts with no extractable text. mem0ai likely on agent memory; the rest unclear without click-through.
- Tesla marketing posts (@Tesla EV · @Tesla FSD glare) [morning + afternoon]. EV marketing and a vendor FSD-through-glare clip. Skip.
- Kiro startup credits reopen (@mattsgarman · blog) [evening]. AWS reopens Kiro Pro+ credits for pre-seed to Series A startups. Pure promo. Skip.
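The δ-mem item above mentions an 8x8 associative memory updated with delta-rule learning. The sketch below illustrates only the classic delta rule on a small matrix. It is not the paper's method: the learning rate, the one-hot key, the target values, and all function names are assumptions chosen for illustration.

```python
# Toy delta-rule associative memory in pure Python.
# Only the 8x8 size mirrors the digest item; everything else is illustrative.

DIM = 8


def read(memory, key):
    """Return memory @ key for a DIM x DIM matrix stored as nested lists."""
    return [sum(row[j] * key[j] for j in range(DIM)) for row in memory]


def delta_update(memory, key, value, lr=1.0):
    """Delta rule: M <- M + lr * (value - M @ key) key^T, applied in place."""
    prediction = read(memory, key)
    error = [v - p for v, p in zip(value, prediction)]
    for i in range(DIM):
        for j in range(DIM):
            memory[i][j] += lr * error[i] * key[j]
    return memory


memory = [[0.0] * DIM for _ in range(DIM)]
key = [1.0] + [0.0] * (DIM - 1)         # unit-norm (one-hot) key
value = [0.5 * i for i in range(DIM)]   # arbitrary target vector

delta_update(memory, key, value)
recalled = read(memory, key)
print(recalled)
```

With a unit-norm key and `lr=1.0`, a single update makes the memory return the stored value for that key. Writing a new value under the same key corrects only the prediction error, which is what distinguishes the delta rule from a plain additive (Hebbian) write.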
Wednesday, May 13
Summary
A very thin day with effectively one real signal across both available slots. The afternoon was near-empty, just a single snarky line from @ns123abc about token-burning as performative work. The evening carried the only substantive post, NVIDIA announcing an engineering co-design partnership with David Silver's new lab Ineffable Intelligence, framed explicitly as infrastructure for continuously-learning RL "superlearners" beyond pretrained LLMs. The other evening posts are travel chatter, Jensen and Elon both landing in Beijing on Air Force One the same day, which is hard to read as unrelated to the RL-infra announcement given how much of the story turns on chip access and training compute. No curated reposts from @bayesiansapien in either slot, no morning or night pull. Treat the day as quiet, the Ineffable post as the only item worth carrying into the digest.
Posts
- NVIDIA + Ineffable Intelligence partner on large-scale RL infra (@nvidia · NVIDIA blog) [evening]. Jensen and David Silver co-designing the training pipeline for continuously-learning RL agents, positioned as the next paradigm after pretrained LLMs. Silver's quote frames imitation-trained LLMs as "the easier problem" and experience-driven RL as the real frontier. Worth tracking against the recent training-efficiency and RL-from-experience wave.
- Jensen and Elon land in Beijing on AF1 (cluster of 2) (@ns123abc a, @ns123abc b) [evening]. Two posts inside 15 minutes flagging the same arrival, with Elon's own tweet quoted ("Just Jensen and I are on AF1"). Industry-political signal more than technical, but the timing alongside the Ineffable announcement is notable.
- "Wait, people are just burning tokens to look busy?" (@ns123abc) [afternoon]. One-line jab at agent and coding-assistant users running long generations for optics rather than output. Mildly amusing, not substantive.
Tuesday, May 12
Summary
The day's strongest cross-slot cluster is Claude Code's internal architecture. The morning surfaced @bcherny's launch of the Claude Code agent view (a single list of all in-flight sessions), the afternoon carried a public 5-layer field guide thread, and the evening closed with Gary Marcus reading Claude Code's 53 symbolic tools and ~500k lines of scaffolding as vindication for neurosymbolic AI. Three independent angles on the same codebase in one day is a real signal, not noise. The standout single-slot post is AutoTTS in the afternoon, with both the authors' thread and an analytical writeup arriving inside a few hours: $39.9 and ~160 minutes of agent search beats hand-crafted test-time scaling baselines, and the wiki already has a page on it from yesterday so this is cross-source confirmation. The rest of the afternoon is unusually dense for one slot, with PwC's clarification-timing paper, the tool-calling steerability probe, Thinking Machines' always-on Interaction Models, and the Microsoft + Salesforce 200K-conversation drift study all landing in the same window. Morning and evening are otherwise thin, with the @bcherny Cowork-books-flights cluster as the only other workflow-grade post, plus AWS Claude Platform GA in industry news. Several opaque x.com/i/article reposts go uncollapsed because the synthesis cannot expand them inline.
Posts
- Claude Code agent view (research preview) (@bcherny · @claudeai launch) [morning]. Unified list of all in-flight Claude Code sessions instead of cycling between terminal tabs. Productizes the many-agents-per-user pattern.
- Claude Code's 5 architectural layers (@NainsiDwiv50980 · wiki) [afternoon]. Field guide thread on CLAUDE.md as memory layer plus four further layers that go beyond prompting. Clean public summary of material already in the wiki's Claude Code pages.
- Gary Marcus: Claude Code is the most neurosymbolic system he has ever seen (@GaryMarcus · ccunpacked.dev · wiki) [evening]. Reads Claude Code's 53 tools plus ~500k lines of orchestration around a frontier LLM as proof that progress is coming from classical-AI scaffolding, not pure scaling. The linked site is a source-level dissection of the agent loop and tool registry.
- @bcherny Cowork + Opus 4.7 one-shots 8 flights and 5 hotels (cluster of 2) (@bcherny a, @bcherny b) [morning]. Flight preferences go into Cowork instructions, Opus opens a browser, navigates sites, books everything in parallel while the user does other Claude Code work. Frontier-agent browser use is crossing into real workflows.
- AutoTTS, frontier LLMs design their own test-time scaling (cluster of 2) (@zhengtoong, @omarsar0 · arxiv · wiki) [afternoon]. Environment-driven discovery framework: humans design the search environment, coding agents discover the width-depth TTS controller. Discovery cost $39.9 and ~160 minutes, results generalize across held-out benchmarks and model scales. Second day of independent signal.
- Clarification timing in long-horizon agents (PwC) (@dair_ai · arxiv) [afternoon]. Forced-injection framework across 4 frontier models, 84 tasks, 6,000+ runs. Goal clarification loses almost all value after 10% of execution. Deferring past mid-trajectory is worse than never asking. Empirical brake on the "always ask early" prior.
- Tool calling is linearly readable and steerable (@tldr_ai_papers · arxiv · wiki) [afternoon]. Probes 12 instruction-tuned models (270M to 27B). Adding the mean activation difference between two tools flips the chosen tool with 77-100% accuracy and the JSON arguments autoregressively conform to the new schema. Small set of mid- and late-layer attention heads localized via patching.
- Thinking Machines Interaction Models (TML-Interaction-Small) (@rohanpaul_ai · blog) [afternoon]. 276B MoE, 12B active. Replaces walkie-talkie turn-taking with always-present AI: audio, video, and text sliced into 200ms micro-turns, model listens, watches, speaks, acts, and tool-calls while the interaction is still happening. Trained from scratch with a multi-stream micro-turn design.
- RAO: Recursive Agent Optimization (@apurvasgandhi) [afternoon]. End-to-end RL for training LLMs to spawn, delegate to, and coordinate with recursive copies of themselves. Sub-agents as inference-time scaling primitives. Adjacent to the Sakana Conductor and AutoTTS thread on learned orchestration.
- Anthropic memory plus "Dreaming" continual-learning preview (@daniel_mac8) [afternoon]. Reports on a recent talk framing memory as the next first-class agent primitive after MCP, Skills, and harnesses: writable shared context, provenance, review, background consolidation. "Dreaming" is described as recursive self-improvement at the agent-system level.
- GEPA explainer on long-horizon agent RL (@blc_16) [afternoon]. Walkthrough of why sparse rewards throw away trajectory information and how GEPA learns from the trajectory itself via textual critiques, prompt edits, and Pareto-frontier selection.
- Microsoft + Salesforce: 200K conversations, 39% average accuracy degradation (@HowToAI_) [afternoon]. ChatGPT 96.6% to 72.6%, Gemini 97.4% to 68.1% as conversations lengthen. Attributed to an anchoring trap. Mechanism overlaps with the PwC clarification-timing paper in the same slot.
- Curved geometry of LLM activations (@che_shr_cat) [afternoon]. Argues the Linear Representation Hypothesis is a useful lie that breaks down fast: straight-line steering produces teleportation and diversity collapse. Conceptually opposite to the tool-steerability probe in the same slot.
- Nature Neuroscience: brains do not predict every word uniformly (@ValerioCapraro) [afternoon]. Zou, Poeppel, Ding: brain activity tracks word surprisal LLM-style inside phrases but the match weakens across major phrase boundaries. Counterweight to the "humans are just next-word predictors" frame.
- Claude Platform on AWS GA (@mattsgarman · AWS blog) [morning]. Anthropic's native Claude Platform, including Managed Agents, Agent Skills, MCP connector, code execution, and files API, accessible directly from AWS accounts. AWS is the first cloud provider to offer it natively. Also in today's Industry Pulse.
- NVIDIA at Dell Technologies World (cluster of 2) (@nvidia a, @nvidia b · event) [morning]. Jensen Huang and Michael Dell co-keynote on AI-accelerated enterprise compute, May 18-21 Las Vegas. PR-cycle event.
- Opaque x.com/i/article reposts (click through) (@AmarSVS, @AlphaSignalAI, @neural_avb, @ns123abc) [afternoon + evening]. Bare X-native long-form article links the synthesis cannot expand inline.
- @magicsilicon "Whoa" (@magicsilicon) [afternoon]. Reaction post, no content. Skip.
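The tool-steerability item above describes adding the mean activation difference between two tools to flip which tool is chosen. The sketch below shows that arithmetic on hand-made 2-D vectors with a hypothetical linear readout. The activations, the tool names, and the `choose_tool` readout are all assumptions, not the paper's models or probes.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def add(a, b, scale=1.0):
    """Return a + scale * b, element-wise."""
    return [x + scale * y for x, y in zip(a, b)]


def choose_tool(activation, readout):
    """Pick the tool whose readout weights score highest on the activation."""
    scores = {
        tool: sum(w * x for w, x in zip(weights, activation))
        for tool, weights in readout.items()
    }
    return max(scores, key=scores.get)


# Hypothetical activations recorded while the model selected each tool.
acts_search = [[1.0, 0.1], [0.9, -0.1], [1.1, 0.0]]
acts_calculator = [[-1.0, 0.1], [-0.9, 0.0], [-1.1, -0.1]]

# Steering vector: mean(calculator activations) - mean(search activations).
steer = add(mean_vector(acts_calculator), mean_vector(acts_search), scale=-1.0)

readout = {"search": [1.0, 0.0], "calculator": [-1.0, 0.0]}
activation = [0.8, 0.05]  # unsteered, this scores highest for "search"

before = choose_tool(activation, readout)
after = choose_tool(add(activation, steer), readout)
print(before, after)
```

The toy only shows why a difference-of-means vector can move an activation across a linear decision boundary. It says nothing about the reported 77-100% flip rates or the argument-schema behavior, which depend on the real models.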