social-stream · 2026-05-18

Summary

The day was clearly evening-heavy. The morning slot was empty (zero retweets and zero articles, for the second consecutive Monday), and the afternoon offered only one Tier 1 nugget. The evening carried the substance: five of six @bayesiansapien retweets are worth reading, anchored by Meta's SP-KV (Self-Pruned KV Attention), which trains a per-head utility predictor for KV eviction and claims a 3 to 10x cache reduction. Two field-shaping arguments land alongside it: a Stanford paper that uses the Data Processing Inequality to argue that a single LLM beats coordinated multi-agent systems under equal reasoning budgets, and dair.ai's Epistematics paper, which claims that most agent leaderboards do not measure what they advertise. The afternoon's single nugget is Atlas Inference clocking Qwen3.6-35B at 200+ tok/s on a DGX Spark (GB10), reportedly about 3x what Codex and Claude pipelines reach on the same hardware class. Apart from one Grok Build hands-on note, the @brivael output (30+ tweets across slots, mostly French-language polemics) is noise, as is the @nvidia / @magicsilicon promo cluster (Dell Technologies World keynote clips, INTC stock promo).
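
For readers who want the Data Processing Inequality argument in one line, here is a minimal sketch. The symbols are illustrative labels, not the paper's notation: X is the task context, M is the message one agent hands to the next, and Y is whatever the receiving agent produces from M alone.

```latex
X \to M \to Y
\quad\Longrightarrow\quad
I(X;Y) \;\le\; I(X;M) \;\le\; H(M)
```

Under that Markov-chain assumption, the receiving agent can never hold more information about the task than the handoff message carried, which is the sense in which a handoff acts as lossy compression. Whether the paper's formalization takes exactly this form is not confirmed by the post.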

Posts

  • SP-KV: Self-Pruned KV Attention from Meta, 3 to 10x KV cache reduction (@TheTuringPost) [evening]. A per-token, per-head 2-layer MLP predicts utility; old tokens are pruned while a local sliding window is kept in full. Composes with Make Each Token Count's eviction policy and the KV-sharing / MHC line Raschka surveyed yesterday. Cleanest Tier 1 KV-cache story of the week.
  • Single LLM beats coordinated multi-agent under equal reasoning budgets (Stanford) (@rohanpaul_ai) [evening]. Formalizes the handoff-as-compression argument via the Data Processing Inequality. Reads as a coherent counter to the multi-agent default, alongside the LIFE survey on multi-agent collaboration failure and the multi-agent-systems concept page.
  • The Evaluation Trap / Epistematics: most agent leaderboards do not measure what you think (@dair_ai · arxiv 2605.14167) [evening]. Audit procedure that derives evaluation criteria from a benchmark's capability claim and checks whether the test discriminates the claim from proxy behaviors. Worked example shows Dupoux et al. (2026) reproducing the assumption it claims to revise.
  • SFT memorizes, RL generalizes (ICML 2025) (@burkov · arxiv 2501.17161) [evening]. Comparative study across rule-based textual and visual tasks. Empirical companion to GFT: SFT as degenerate RL, the theoretical version of the same claim.
  • Detecting overfitting during long-horizon grokking via Random Matrix Theory (@burkov · arxiv 2605.12394) [evening]. RMT spectra of weight matrices alone discriminate generalizing vs memorizing basins. Practitioner setting with no training history, no test set. Model-card-grade diagnostic if it holds up.
  • Atlas Inference clocks Qwen3.6-35B at 200+ tok/s on DGX Spark (GB10) (@Scobleizer reposting @AtlasInference) [afternoon]. Claim is roughly 3x Codex / Claude on the same hardware class. No paper or kernel detail in the post. Worth a follow-up if Atlas publishes methodology.
  • Hermes Agent Kanban: orchestrator auto-decomposition on triage (@Scobleizer · docs · PR #27572) [afternoon]. The orchestrator decomposes a triage prompt into subtasks and routes them by specialization description. The board is durable (~/.hermes/kanban.db) and every worker is an OS process. Adjacent to Claude Code vs Hermes permissions coverage.
  • Sholto Douglas reposts "How to land a frontier lab job" by Vlad Feinberg (@_sholtodouglas · vladfeinberg.com) [evening]. Anthropic-insider endorsement, career-side reading rather than research, high signal for the audience.
  • Claude Code at scale: best practices for monorepos, legacy systems, microservices (@ClaudeDevs · claude.com blog) [evening]. Anthropic's own write-up for million-line repos. Practitioner reading.
  • Grok Build beta first impressions (@brivael) [evening]. Hands-on note: speed is "genuinely cool", and the quality-at-speed claim would mark a real IDE-agent shift if it lands near Opus 4.7. Anecdotal, no benchmarks.
  • Opaque x.com/i/article reposts (@nyk_builderz via bayesiansapien) [evening]. Content not fetchable. Click through to read.
  • @MillionInt productivity / math aphorisms (cluster of 2, @MillionInt) [morning]. Inspirational, no AI content. Skip.
  • @brivael French-language polemics on AI meritocracy, copycats, politics, Twitter drama (cluster of 30 across slots, @brivael) [afternoon + evening]. No links to research, no falsifiable claim. Skip.
  • Scoble personal feed and consumer biometric / BCI plugs (cluster of 6, @Scobleizer) [afternoon + evening]. Bill Gates time-value, Big Sur sunset, "AI is taking my job" essay, globaledentity.com vein-and-skeletal biometrics, Mave Health consumer BCI. No technical content. Skip.
  • NVIDIA at Dell Technologies World, Jensen on stage with Michael Dell (@nvidia keynote · @nvidia AI-and-routine-work clip) (cluster of 2) [evening]. Promotional. Skip.
  • INTC on the NYSE floor, Lip-Bu Tan on Mad Money (@magicsilicon) [evening]. Stock promo. Skip.
  • @BrettRatner Instagram reel (@BrettRatner) [afternoon]. Opaque link, no preview. Click through to read.
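
Sketch for the SP-KV entry above. The post gives only the outline (a 2-layer MLP scores each token per head, old tokens are pruned, a local sliding window stays full), so everything else here is an assumption for illustration: the predictor takes the key vector as input, weights are random rather than trained, pruning uses a fixed `threshold`, and all function names are hypothetical. It is not Meta's implementation.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_mlp(d_in, d_hidden, rng):
    # Hypothetical random weights; in SP-KV the predictor is trained.
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_hidden)]
    b1 = [0.0] * d_hidden
    w2 = [rng.gauss(0.0, 1.0) for _ in range(d_hidden)]
    return w1, b1, w2, 0.0


def utility(vec, mlp):
    # 2-layer MLP: ReLU hidden layer, then a sigmoid score between 0 and 1.
    w1, b1, w2, b2 = mlp
    hidden = [max(0.0, sum(w * x for w, x in zip(row, vec)) + b)
              for row, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


def keep_mask(keys, mlp, window, threshold):
    # keys: per-token vectors for ONE head, oldest first.
    # The last `window` tokens are always kept; older tokens survive
    # only if their predicted utility reaches `threshold` (assumed rule).
    n = len(keys)
    return [pos >= n - window or utility(k, mlp) >= threshold
            for pos, k in enumerate(keys)]


def prune_heads(keys_per_head, mlps, window, threshold):
    # Each head has its own predictor and ends up with its own mask.
    return [keep_mask(keys, mlp, window, threshold)
            for keys, mlp in zip(keys_per_head, mlps)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_heads, n_tokens, d = 2, 32, 8
    keys = [[[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_tokens)]
            for _ in range(n_heads)]
    mlps = [init_mlp(d, 16, rng) for _ in range(n_heads)]
    for h, mask in enumerate(prune_heads(keys, mlps, window=8, threshold=0.5)):
        print(f"head {h}: kept {sum(mask)} of {n_tokens} tokens")
```

The per-head masks can differ, which is the point of a per-head predictor: one head may retain a distant token that another drops. The cache reduction then depends on how many old tokens fall below the cutoff, which this toy does not attempt to reproduce.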