2026-05-18 — cere-bro

Summary

The day is clearly evening-heavy. Morning was empty (zero retweets and zero articles, for the second consecutive Monday) and the afternoon offered only one Tier 1 nugget. The evening slot carried the substance: five of six @bayesiansapien retweets are worth reading, anchored by Meta's SP-KV (Self-Pruned KV Attention), which trains a per-head utility predictor for KV eviction and claims a 3 to 10x cache reduction. Two field-shaping arguments land alongside it: a Stanford paper that uses the Data Processing Inequality to argue a single LLM beats coordinated multi-agent systems under equal reasoning budgets, and dair.ai's Epistematics paper, which claims most agent leaderboards do not measure what they advertise. The afternoon's single nugget is Atlas Inference clocking Qwen3.6-35B at 200+ tok/s on a DGX Spark (GB10), roughly 3x what Codex and Claude pipelines hit on the same hardware class. Everything from @brivael (30+ tweets across slots, mostly French-language polemics) and the @nvidia / @magicsilicon Dell-Tech-World promo cluster is noise.

Posts

SP-KV: Self-Pruned KV Attention from Meta, 3 to 10x KV cache reduction (@TheTuringPost) [evening]. A per-token, per-head 2-layer MLP predicts utility; old tokens are pruned while a local sliding window stays full. Composes with Make Each Token Count's eviction policy and the KV-sharing / MHC line Raschka surveyed yesterday. Cleanest Tier 1 KV-cache story of the week.
Single LLM beats coordinated multi-agent under equal reasoning budgets (Stanford) (@rohanpaul_ai) [evening]. Formalizes the handoff-as-compression argument via the Data Processing Inequality. Reads as a coherent counter to the multi-agent default, alongside the LIFE survey on multi-agent collaboration failure and the multi-agent-systems concept page.
The Evaluation Trap / Epistematics: most agent leaderboards do not measure what you think (@dair_ai · arxiv 2605.14167) [evening]. Audit procedure that derives evaluation criteria from a benchmark's capability claim and checks whether the test discriminates the claim from proxy behaviors. Worked example shows Dupoux et al. (2026) reproducing the assumption it claims to revise.
SFT memorizes, RL generalizes (ICML 2025) (@burkov · arxiv 2501.17161) [evening]. Comparative study across rule-based textual and visual tasks. Empirical companion to GFT: SFT as degenerate RL, the theoretical version of the same claim.
Detecting overfitting during long-horizon grokking via Random Matrix Theory (@burkov · arxiv 2605.12394) [evening]. RMT spectra of weight matrices alone discriminate generalizing vs memorizing basins. Targets the practitioner setting: no training history and no test set. Model-card-grade diagnostic if it holds up.
Atlas Inference clocks Qwen3.6-35B at 200+ tok/s on DGX Spark (GB10) (@Scobleizer reposting @AtlasInference) [afternoon]. Claim is roughly 3x Codex / Claude on the same hardware class. No paper or kernel detail in the post. Worth a follow-up if Atlas publishes methodology.
Hermes Agent Kanban: orchestrator auto-decomposition on triage (@Scobleizer · docs · PR #27572) [afternoon]. The orchestrator decomposes a triage prompt into subtasks and routes them by specialization description. The board is durable (~/.hermes/kanban.db) and every worker is an OS process. Adjacent to Claude Code vs Hermes permissions coverage.
Sholto Douglas reposts "How to land a frontier lab job" by Vlad Feinberg (@_sholtodouglas · vladfeinberg.com) [evening]. Anthropic-insider endorsement, career-side reading rather than research, high signal for the audience.
Claude Code at scale: best practices for monorepos, legacy systems, microservices (@ClaudeDevs · claude.com blog) [evening]. Anthropic's own write-up for million-line repos. Practitioner reading.
Grok Build beta first impressions (@brivael) [evening]. Hands-on note: speed is "genuinely cool", and the quality-at-speed claim would be a real IDE-agent shift if it lands near Opus 4.7. Anecdotal, no benchmarks.
Opaque x.com/i/article reposts (@nyk_builderz via bayesiansapien) [evening]. Content not fetchable. Click through to read.
@MillionInt productivity / math aphorisms (cluster of 2, @MillionInt) [morning]. Inspirational, no AI content. Skip.
@brivael French-language polemics on AI meritocracy, copycats, politics, Twitter drama (cluster of 30 across slots, @brivael) [afternoon + evening]. No links to research, no falsifiable claim. Skip.
Scoble personal feed and consumer biometric / BCI plugs (cluster of 6, @Scobleizer) [afternoon + evening]. Bill Gates time-value, Big Sur sunset, "AI is taking my job" essay, globaledentity.com vein-and-skeletal biometrics, Mave Health consumer BCI. No technical content. Skip.
NVIDIA at Dell Technologies World, Jensen on stage with Michael Dell (@nvidia keynote · @nvidia AI-and-routine-work clip) (cluster of 2) [evening]. Promotional. Skip.
INTC on the NYSE floor, Lip-Bu Tan on Mad Money (@magicsilicon) [evening]. Stock promo. Skip.
@BrettRatner Instagram reel (@BrettRatner) [afternoon]. Opaque link, no preview. Click through to read.
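
Sketch for the SP-KV entry. The post describes a per-token, per-head 2-layer MLP that scores utility, with old tokens pruned while a local sliding window stays full. The Python below is a minimal illustration of that inference-time rule for one attention head, not Meta's implementation. Assumptions not in the post: the predictor reads the key vector, uses a ReLU hidden layer and a sigmoid output, and eviction is a fixed threshold on the score. The names mlp_utility and prune_head_cache, the weights, and the threshold value are all hypothetical. How the predictor is trained is not shown.

```python
import math


def mlp_utility(key, w1, b1, w2, b2):
    """Hypothetical 2-layer MLP: ReLU hidden layer, sigmoid score in (0, 1)."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, key)) + b)
        for row, b in zip(w1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def prune_head_cache(keys, params, window, threshold):
    """Return the cache positions kept for a single head.

    The most recent `window` positions are always kept (the local sliding
    window stays full). Older positions survive only if their predicted
    utility reaches `threshold` (an assumed eviction rule).
    """
    n = len(keys)
    first_local = max(0, n - window)
    kept = []
    for pos, key in enumerate(keys):
        if pos >= first_local or mlp_utility(key, *params) >= threshold:
            kept.append(pos)
    return kept


# Toy weights chosen by hand so the scores are easy to follow:
# utility = sigmoid(relu(k0) + relu(k1) - 2).
params = (
    [[1.0, 0.0], [0.0, 1.0]],  # w1: hidden x key_dim
    [0.0, 0.0],                # b1
    [1.0, 1.0],                # w2
    -2.0,                      # b2
)
keys = [[4.0, 0.0], [0.0, 0.0], [0.5, 0.0], [0.0, 3.0], [0.0, 0.0], [0.0, 0.0]]

kept = prune_head_cache(keys, params, window=2, threshold=0.5)
print(kept, f"cache reduced {len(keys) / len(kept):.1f}x")
```

On the toy input, positions 1 and 2 are old and score below the threshold, so they are evicted. Positions 4 and 5 score just as low but sit inside the window and stay. In a real model each head would run its own predictor, so different heads could keep different positions.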