Saturday, May 16, 2026 · social stream

Media Live

daily roll-up

Summary

The morning slot carried the entire day: fourteen curated retweets from @bayesiansapien, three of which land inside today's HuggingFace Tier 1 batch, plus a clean two-paper interpretability cluster forming around the claim that LLM circuits are not unique. The headline standalone is the Nous Research announcement of Lighthouse Attention (the source of today's Tier 1 paper, which reports a 1.4-1.7x speedup at 98K context and ~17x at 512K), followed by "Is Grep All You Need?", which pairs directly with yesterday's WildClawBench harness-spread result. The Sylph AI plus LIFE pairing makes harness construction itself the next automatable layer, a third entry in the three-day "the wrapper is the thing being trained" thread alongside ATESD and EvolveMem. Industry signal is concentrated in Anthropic's 2028 US-China policy paper and Bill Gurley's open-source-as-corporate-strategy essay, which sit on opposite sides of the same compute-vs-openness axis. Afternoon and evening were near-empty (one founder-tone macro-take, then zero posts), so this roll-up is effectively the morning batch.

Posts

  • Lighthouse Attention release from Nous Research (@NousResearch · Lighthouse Attention deep dive) [morning]. Direct announcement of today's HF Tier 1 paper: symmetric Q/K/V pyramid pooling plus top-k cascade wrapped around standard FlashAttention, 1.4-1.7x at 98K and ~17x at 512K on a B200. No custom kernel, no straight-through estimator, and the wrapper removes itself near the end of training.
  • Circuits-are-non-unique cluster (@DakingRai, @fnruji316625 · arXiv 2605.12671) (cluster of 2) [morning]. Two papers in 24 hours arguing the Functional Anisotropy Hypothesis is empirically false: multiple structurally distinct, equally faithful, equally sparse circuits coexist for the same task. Implication: a large slice of mechanistic-interpretability claims that treat a discovered circuit as the explanation are over-trusting uniqueness.
  • Is Grep All You Need? (@omarsar0 · arXiv 2605.15184) [morning]. Empirical comparison across Chronos, Claude Code, Codex, and Gemini CLI finds grep-style search matches or beats vector retrieval when the harness is well-designed. Direct fit for the harness-as-load-bearing thread from WildClawBench.
  • Automating AI R&D interview paper (@ZabihullahAtal) [morning]. Stanford/OpenAI/DeepMind/Anthropic researchers report a shift toward seeing automated AI research as realistic on a tighter timeline than expected. Pairs with the r/MLScaling Prime Intellect auto-nanoGPT post (14K GPU-hours, beat human SoTA, no novel ideas proposed).
  • Semi-Formal Reasoning for patch verification (@IntuitMachine) [morning]. Structured-reasoning prompt format that forces preconditions, postconditions, and interprocedural dependencies, pushing patch-verification accuracy to 93% without running tests. Candidate intervention to shrink the 10.7% Lucky Pass rate flagged by AgentLens.
  • Anthropic 2028 US-China policy paper (@rohanpaul_ai · Anthropic post) [morning]. Argues compute export controls give the US a 12-24 month frontier lead by 2028, conditional on closing China's access to chips, model outputs, and distillation. Pairs with this week's $900B valuation news and the $200M Gates Foundation partnership as a coordinated civic-infrastructure framing.
  • Bill Gurley on open source as corporate strategy (@bgurley via @rohanpaul_ai · Substack essay) [morning]. New 65-paragraph essay reframing 27 years of open source as a strategic mechanism executives use to break monopoly power, with the headline prediction that Chinese open models may become the global default by 2030. Sits on the opposite side of the same axis as Anthropic's compute-export argument.
  • LIFE multi-agent survey (@dair_ai · arXiv 2605.14892) [morning]. 200+ papers mapped along Lay → Integrate → Find faults → Evolve. Reference work for the agentic-systems concept page, with the self-evolution chapter flagged as the cleanest existing field map.
  • Sylph AI: harness evolution loop (@IntuitMachine) [morning]. Three-agent loop (Worker, Evaluator, Evolution) automates prompts, tools, and orchestration end-to-end. Structural cousin to today's ATESD and yesterday's EvolveMem, making three papers in three days where the wrapper is the trained object.
  • Generative AI depletes the innovation commons (@kyronis_talks) [morning]. Position paper arguing aggressive automation of creative and knowledge work erodes the human substrate future models need. Same axis as Andrew Ng's "no jobpocalypse" argument from Algorithmic Bridge Weekly Picks #121, opposite framing.
  • Claude Code in large codebases guide (@charmaine_klee · Anthropic blog) [morning]. Anthropic's own best-practices guide for Claude Code at scale. Shipped the same week Microsoft pulled Claude Code licenses internally, making the harness-as-product framing explicit on both sides.
  • Opaque x.com/i/article reposts (@0xblacklight, @petradonka) [morning]. Bare native-article quote-retweets with no readable body. Inaccessible to the farmer until X cookies are wired in.
  • @ClaudeDevs weekly rate-limit reset (@ClaudeDevs) [morning]. Operational. Skip.
  • NVIDIA Catalyst series and Net Zero 2026 (Azure E5, Helfie, Josh Parker Net Zero, Energy transition) (cluster of 4) [morning]. Brand-marketing video drops on healthcare access and sustainability. Skip.
  • @WHFraudTF political content (Dr. Oz CMS, SBA Maine PPP) (cluster of 2) [morning]. US administration anti-fraud political content, off-topic. Skip.
  • AI progress underestimated (@MillionInt) [afternoon]. Founder-tone macro-take with no specific claim, benchmark, or link. Skip.

slot detail

Evening

scraped 2026-05-16 22:00 IST · 0 tweets

Summary

Empty slot. Zero tweets in the 24h lookback from either @bayesiansapien's curated reposts or the tracked AI handle feed. Saturday-evening quiet is typical for this window; the morning slot already carried the day's signal, and the afternoon slot added only a single macro-take.

Posts

No posts in this window.

Afternoon

scraped 2026-05-16 15:00 IST · 1 tweet

Summary

A near-empty slot. No curated retweets from @bayesiansapien in the past 24 hours, and the AI account feed surfaced a single non-technical macro-take from Core Automation's @MillionInt about AI's pace of progress. Nothing to drill into. Signal is essentially zero for the afternoon window.

Posts

  • AI progress is underestimated by almost everyone (@MillionInt). Generic founder-tone post claiming the trend line is clear and most people are still thinking too small. No specific claim, no benchmark, no link. Skip.

Morning

scraped 2026-05-16 09:00 IST · 21 tweets · 14 curated

Summary

This morning's @bayesiansapien retweet batch is the strongest in a week: fourteen curated retweets, three of which land directly inside today's HuggingFace Tier 1 batch. The Nous Research announcement of Lighthouse Attention is the original source of today's Tier 1 paper, which reports 1.4-1.7x faster training at 98K context and roughly 17x faster forward+backward at 512K on a single B200, using only a training-time wrapper around standard FlashAttention. A two-paper mechanistic-interpretability cluster argues that the long-held assumption of a unique canonical "circuit" inside an LLM for each task is wrong: multiple structurally distinct, equally faithful, equally sparse circuits coexist for the same task, which would undercut a large chunk of the mechanistic-interpretability literature that treats discovered circuits as the explanation. A two-paper harness cluster (Sylph AI's three-agent harness evolution loop, plus the LIFE multi-agent survey) makes harness construction itself an automatable problem. Standalone strong signals from this batch: a paper finding that grep-style text search inside the right coding-agent harness matches or beats embedding-based retrieval (a direct fit for the harness-as-load-bearing thread from WildClawBench on 05-15); a Stanford/OpenAI/DeepMind/Anthropic interview paper on automating AI research itself (which matches the Prime Intellect auto-nanoGPT Reddit post that ran 14K GPU-hours of autonomous research and surpassed human SoTA without proposing novel ideas); Anthropic's 2028 US-China policy paper framing compute export controls as a 12-24 month frontier lead; and a Bill Gurley essay reframing 27 years of open-source software as "Open Source Strategy" used to break monopoly power. The AI handle feed is thin: @ClaudeDevs reset rate limits, NVIDIA pushed a Catalyst series of brand-marketing videos, and @WHFraudTF posted US administration political content that is off-topic for the wiki.

Posts

  • Lighthouse Attention release from Nous Research (@bayesiansapien retweet of @NousResearch). Direct announcement of today's HuggingFace Tier 1 paper. The mechanism: during training, queries, keys, and values are pooled symmetrically into a multi-resolution pyramid; a top-k cascade selects a small hierarchical dense sub-sequence; after sorting to preserve left-to-right order, the short selected sub-sequence runs through standard FlashAttention. No custom sparse kernel, no straight-through estimator (the trick where a non-differentiable forward op is treated as differentiable during backprop), no auxiliary loss. The wrapper removes itself near the end of training so the model ships as standard dense attention. Reported numbers: 1.4-1.7x training speedup at 98K context and ~17x forward+backward at 512K on a single B200, both measured wall-clock. The symmetric pooling across Q, K, and V is the structural move that earlier selective-attention work declined to make, and it is what lets the gradient-free top-k cascade learn structural routing rather than just memory indexing. → Lighthouse Attention deep dive

  • Mechanistic-interpretability cluster: circuits are non-unique (cluster of 2) (@DakingRai on data-driven circuit discovery and @fnruji316625 on "All Circuits Lead to Rome" ICML 2026, paper at arXiv 2605.12671). Two retweets in 24 hours arguing the implicit Functional Anisotropy Hypothesis is wrong. The hypothesis says that for a given LLM task, the model's behavior is explained by one sparse, near-unique circuit. Both papers find that this is empirically false. For the same task, multiple structurally distinct circuits or sheaves coexist that are simultaneously faithful (reproduce the task behavior), sparse (use only a small fraction of model components), and low-overlap (the structurally distinct circuits use different components). DakingRai's paper finds that existing circuit-discovery methods often discover dataset-specific circuits or mixed-mechanism circuits rather than general task circuits, meaning the "circuit" reported in a paper may be an artifact of the dataset used to discover it. The ICML paper introduces Overlap-Aware Sheaf Repulsion, which augments the discovery objective with an explicit penalty on structural overlap across multiple discovery runs and, under the resulting "Distributive Dense Circuit Hypothesis", recovers the competing mechanisms that a single-shot discovery misses. Combined implication: a large body of mechanistic-interpretability work that treats a discovered circuit as the explanation over-trusts circuit uniqueness, and the right object to study is the distribution of competing mechanisms.

  • Is Grep All You Need? (@bayesiansapien retweet of @omarsar0, paper at arXiv 2605.15184). The paper runs an empirical comparison of grep-style text search versus embedding-based vector retrieval inside several coding-agent harnesses (Chronos, Claude Code, Codex, Gemini CLI) on real coding tasks. Finding: grep, when wrapped in the right harness, matches or beats vector retrieval. The retweet's framing is that the bottleneck was never the retrieval algorithm; it was the harness design around the primitive tools. Direct fit for the harness-as-load-bearing thread from WildClawBench (which on 05-15 measured an 18-point spread between the worst and best agent harness running the same model on the same 60 long-horizon tasks). Practical consequence: any coding-agent stack that depends on a vector database for code retrieval is worth re-evaluating against a well-instrumented grep harness.

  • Automating AI R&D, leading-lab interview paper (@bayesiansapien retweet of @ZabihullahAtal). Stanford, OpenAI, Google DeepMind, and Anthropic researchers were interviewed for a paper titled "AI Researchers' Views on Automating AI R&D and Intelligence Explosions." Three shifts are named: AI is rapidly improving at coding and math; systems are moving from assistants to autonomous developers; and many researchers now see automated AI research as a realistic possibility on a tighter timeline than they expected a year ago. Matches the r/MLScaling Prime Intellect auto-nanoGPT post in today's Reddit batch (14K GPU-hours of autonomous research surpassed human state-of-the-art on the nanoGPT speedrun, but with no novel ideas proposed). Track as a Worth Watching candidate: the next milestone is an autonomous run that proposes a novel idea, not just a faster reimplementation.

  • Semi-Formal Reasoning for code patch verification (@bayesiansapien retweet of @IntuitMachine). Prompting framework that pushes patch-verification accuracy to 93% without running any tests. When standard LLMs verify patches, they tend to skip interprocedural traces and make unsupported claims (the example in the thread: assuming Python's format() is a builtin when it is actually shadowed by Django's custom implementation). The framework is a structured-reasoning prompt format that sits between chain-of-thought and full formal verification, forcing the model to lay out preconditions, postconditions, and interprocedural dependencies before drawing a conclusion. Direct fit for AgentLens (the 05-14 process-aware labeling system that found 10.7% of passing SWE-bench Verified trajectories are Lucky Passes, where the right answer came out for the wrong reasons): semi-formal reasoning is one candidate intervention to shrink that 10.7% number.

  • Anthropic 2028 US-China policy paper (@bayesiansapien retweet of @rohanpaul_ai, Anthropic post). Anthropic argues that US compute export controls give the US and its allies a 12-24 month frontier-AI lead by 2028, conditional on closing China's access to advanced compute and copied model outputs. The paper alleges China is using loopholes, smuggled chips, offshore data centers, and distillation attacks to stay close. Compute is framed as the central bottleneck: not just one input to AI but the gatekeeper for training, deployment, revenue, experimentation, and future capabilities. Pairs directly with The Decoder's coverage and with today's Anthropic $900B valuation news and the $200M Gates Foundation partnership. Three coordinated moves in one week framing Anthropic as the civic-infrastructure-tier AI lab.

  • Bill Gurley on open source as corporate strategy (@bayesiansapien retweet of @rohanpaul_ai's retweet of @bgurley, Substack essay). New 65-paragraph essay reframing 27 years of open-source software as "Open Source Strategy": sophisticated executives use open-source mechanisms (free distribution, permissive licensing, community contribution) to optimize corporate strategy and break monopoly power. Gurley revisits his 1999 "Rising Impact of Open Source" piece and updates it for the LLM era. Headline prediction: "Chinese open models may become the global default by 2030." Contrasts with today's Industry Pulse item on Microsoft pulling Claude Code licenses internally (a closed-stack vendor lock-in move), and with Anthropic's compute-export-control argument (which depends on China-side openness being insufficient to close the frontier gap).

  • LIFE survey for multi-agent systems (@bayesiansapien retweet of @dair_ai, arXiv 2605.14892). 200+ papers mapped along four causally-linked stages: Lay (individual capability) → Integrate (collaboration) → Find faults (failure attribution) → Evolve (self-improvement). The self-evolution chapter is described by the retweet as "the cleanest field map of where memory, meta-learning, and procedure-editing approaches actually intersect." The paper itself flags the under-examined risk: in multi-agent systems, errors propagate across agents and interaction rounds, producing failures that are hard to diagnose and rarely translate into structural self-improvement. Reference work, not a deep dive; foundation reference for the agentic-systems concept page.

  • Sylph AI: harness evolution loop (@bayesiansapien retweet of @IntuitMachine). Three-agent loop (Worker executes the task, Evaluator adversarially finds failures, Evolution rewrites the harness) automates agent harness construction end-to-end: prompts, tools, orchestration logic. The retweet's framing is that manual agent engineering is dead because the next generation of agents builds their own harnesses. Structural cousin to today's ATESD (which makes teacher exposure a learnable control variable via a Beta-policy controller) and yesterday's EvolveMem (which self-evolves retrieval configuration via AutoResearch on the system's own architecture). Three papers in three days where the model's wrapper, not the model, is the thing being trained.

  • Generative AI depletes the innovation commons (@bayesiansapien retweet of @kyronis_talks). Position paper: companies aggressively automating creative and knowledge work deplete the shared "innovation commons" of fresh, diverse, high-quality human creativity that future models need. The argument runs in two timescales. Short term: individual firms win with cost cuts and productivity spikes. Long term: the substrate (new ideas, art, code, research, designs) erodes. Tracks alongside Karpathy's prior concerns about model collapse on synthetic data and against the Algorithmic Bridge's Weekly Picks #121 "no AI jobpocalypse" thread (Andrew Ng arguing the labor-market disruption is overstated). Same axis, opposite framings.

  • Claude Code in large codebases blog (@bayesiansapien retweet of @charmaine_klee, Anthropic blog). Anthropic ships its own guidance on Claude Code best practices in large codebases. Direct relevance to the harness-as-load-bearing thread (WildClawBench's 18-point harness spread on 05-15). Paired tension: Anthropic is shipping harness-design guidance the same week Microsoft pulls Claude Code licenses internally. The harness-as-product framing is now explicit on both sides.

  • Opaque article reposts (group) (@0xblacklight, @petradonka). Bare x.com/i/article/... quote-retweets with no readable body. The bypass-Twitter URL format makes the content inaccessible to the farmer (X cookies are not yet configured to fetch native long-form articles). Click through if curious; the wiki cannot extract them.

  • @ClaudeDevs rate limit reset (@ClaudeDevs). Friday weekly rate-limit reset announcement. Operational only. Skip.

  • NVIDIA Catalyst series and Net Zero 2026 (cluster of 4) (Microsoft Azure + NVIDIA Catalyst E5, Helfie Australia health access, Josh Parker Net Zero, AI as catalyst for energy transition). NVIDIA brand-marketing series on healthcare access and sustainability. Two YouTube video drops. Skip.

  • @WHFraudTF political content (cluster of 2) (Dr. Oz CMS, SBA Maine PPP fraud). US administration anti-fraud political content. Off-topic for the wiki. Skip.
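
Sketch for the Lighthouse Attention entry above. The selection path described there (symmetric pooling into a pyramid, a top-k cascade, then a sort back into left-to-right order) can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the helper names (avg_pool, build_pyramid, topk_cascade), the pooling width of 2, and especially the scoring rule (dot product of the pooled query and pooled key at the same block) are hypothetical stand-ins, since the entry does not specify them. Values would be pooled the same way and are omitted for brevity.

```python
from typing import List

Vector = List[float]


def avg_pool(seq: List[Vector], width: int = 2) -> List[Vector]:
    """Average adjacent groups of `width` vectors (the last group may be shorter)."""
    pooled = []
    for start in range(0, len(seq), width):
        group = seq[start:start + width]
        dim = len(group[0])
        pooled.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return pooled


def build_pyramid(seq: List[Vector], levels: int, width: int = 2) -> List[List[Vector]]:
    """Level 0 is the full-resolution sequence; each higher level is coarser."""
    pyramid = [seq]
    for _ in range(levels):
        pyramid.append(avg_pool(pyramid[-1], width))
    return pyramid


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def topk_cascade(q_pyr, k_pyr, k: int, width: int = 2) -> List[int]:
    """Coarse-to-fine selection: keep the top-k blocks per level, then expand them."""
    top = len(k_pyr) - 1
    candidates = list(range(len(k_pyr[top])))
    for level in range(top, -1, -1):
        # ASSUMPTION: block score = pooled query . pooled key at the same position.
        ranked = sorted(
            candidates,
            key=lambda i: dot(q_pyr[level][i], k_pyr[level][i]),
            reverse=True,
        )
        kept = ranked[:k]
        if level == 0:
            # Sort so the selected sub-sequence keeps left-to-right order.
            return sorted(kept)
        finer_len = len(k_pyr[level - 1])
        candidates = [
            child
            for block in kept
            for child in range(block * width, min((block + 1) * width, finer_len))
        ]
    return []  # unreachable: the loop always returns at level 0


# Toy 1-dimensional sequence: positions 1 and 6 carry most of the signal.
values = [0.1, 3.0, 0.2, 0.1, 0.1, 0.1, 2.0, 0.1]
queries = [[v] for v in values]
keys = [[v] for v in values]

q_pyr = build_pyramid(queries, levels=2)
k_pyr = build_pyramid(keys, levels=2)
selected = topk_cascade(q_pyr, k_pyr, k=2)
print(selected)  # [1, 6]
```

In a real training setup the selected positions would be gathered from Q, K, and V and handed to an ordinary dense attention call; nothing in the selection step itself needs a gradient, which is consistent with the "no straight-through estimator" point in the entry.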
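
Sketch for the circuits-are-non-unique entry above. The "low-overlap" property and the idea of penalizing structural overlap across discovery runs can be made concrete with a simple set-overlap measure. This is only an illustration: treating a circuit as a set of component names and using Jaccard similarity are assumptions made here, the component labels are invented, and the papers may define overlap differently.

```python
from itertools import combinations
from typing import FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Shared components divided by all components used by either circuit."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(circuits: List[FrozenSet[str]]) -> float:
    """Average Jaccard overlap over all circuit pairs (0.0 means fully disjoint)."""
    pairs = list(combinations(circuits, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical circuits from three discovery runs on the same task.
circuit_a = frozenset({"L0.H1", "L2.H3", "L5.MLP"})
circuit_b = frozenset({"L0.H1", "L3.H0", "L6.MLP"})
circuit_c = frozenset({"L1.H2", "L4.H4", "L7.MLP"})

overlap_ab = jaccard(circuit_a, circuit_b)
overlap_all = mean_pairwise_overlap([circuit_a, circuit_b, circuit_c])
print(round(overlap_ab, 3), round(overlap_all, 3))
```

A quantity like mean_pairwise_overlap is the kind of term an overlap penalty could add to a discovery objective: if every run is equally faithful and sparse but the overlap stays low, the runs are finding different mechanisms rather than rediscovering one circuit.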
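
Sketch for the "Is Grep All You Need?" entry above. For readers comparing against a vector-database setup, a grep-style primitive is small enough to show in full. This is a hypothetical baseline, not the paper's harness: the function name grep, the in-memory file mapping, and the toy files are assumptions chosen so the example runs without touching the filesystem.

```python
import re
from typing import Dict, List, Tuple

Hit = Tuple[str, int, str]  # (path, 1-based line number, stripped line text)


def grep(pattern: str, files: Dict[str, str]) -> List[Hit]:
    """Return every line matching the regex, in file order then line order."""
    rx = re.compile(pattern)
    hits: List[Hit] = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


# Toy corpus standing in for a repository checkout.
files = {
    "app/models.py": "class User:\n    def format(self, spec):\n        return spec\n",
    "app/views.py": "from app.models import User\nprint(format(3.14, '.1f'))\n",
}

hits = grep(r"\bformat\(", files)
for path, lineno, line in hits:
    print(f"{path}:{lineno}: {line}")
```

The primitive returns exact locations rather than similarity-ranked chunks, so the quality of the result depends on what patterns the surrounding harness chooses to issue and how it follows up on the hits, which is the part the entry identifies as the real bottleneck.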