2026-05-10-afternoon

Summary

Quiet slot dominated by one substantive thread: Jiayi Weng's "Learning Beyond Gradients" essay, surfaced by @MillionInt with a bearish framing on RL. A coding-agent-iterated NumPy+cv2 heuristic policy beats neural RL on VizDoom D3 Battle, prompting the claim that coding agents may turn maintainable heuristics into a serious post-RLVR paradigm. @MillionInt's follow-up sharpens the point: RL only "wins" where simple heuristics already get you most of the way, so the heuristic-policy result is more an indictment of the benchmark than a refutation of RL. Tesla supplies three operational notes (final Model S/X off the Fremont line, photon-count image reconstruction for FSD night vision, crash-data-driven airbag deployment timing). The last two are interesting as deployed-AI engineering but only adjacent to wiki priorities. Grok Imagine and an architecture aside round out the noise.

Posts

Learning Beyond Gradients: heuristics as the next paradigm after RLVR? (@MillionInt · blog). Jiayi Weng had Codex iterate on a closed-loop, pure NumPy+cv2 heuristic policy for VizDoom D3 Battle that uses only screen pixels and public game variables: no network, no map, no seeds. It works. The argument: hand-written rules were never useless; they were just too expensive to maintain, and coding agents flatten that maintenance curve enough that programmatic policies become continual-learning vehicles without weight updates. @MillionInt frames this as bearish for RL.
Follow-up: where RL actually fails to generalize (@MillionInt). Clarifies the bearish post: if a heuristic simple enough for Codex solves the game, RL will find and overfit to that heuristic and won't generalize. The bottleneck is environments where simple heuristics get you far. A useful caveat: it reframes the VizDoom result less as "RL is dead" and more as "this benchmark was always heuristic-shaped."
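To make "closed-loop heuristic policy from screen pixels" concrete, here is a minimal hypothetical sketch in Python using NumPy only. It is not Weng's policy. The red-pixel "enemy" mask, the thresholds, the `ammo` input standing in for a public game variable, and the string action names are all assumptions for illustration, and the frame is assumed to be RGB in height x width x 3 layout. The function is called once per frame, and the chosen action changes the next frame it sees, which is what closes the loop.

```python
import numpy as np


def heuristic_action(frame: np.ndarray, ammo: int = 1,
                     red_min: int = 150, other_max: int = 90,
                     min_pixels: int = 20, center_tol: float = 0.05) -> str:
    """Map one RGB frame (H x W x 3, uint8) to a discrete action string.

    Hypothetical rule set: treat strongly red pixels as "enemy" pixels,
    steer toward their horizontal centroid, and attack when it is centered.
    """
    # Boolean mask of pixels that are bright red and low in green/blue.
    mask = ((frame[..., 0] >= red_min)
            & (frame[..., 1] <= other_max)
            & (frame[..., 2] <= other_max))

    # Too few candidate pixels: nothing visible, so keep turning to scan.
    if mask.sum() < min_pixels:
        return "TURN_RIGHT"

    # Horizontal centroid of the mask, mapped to an offset in [-0.5, 0.5].
    cols = np.nonzero(mask)[1]
    offset = cols.mean() / (frame.shape[1] - 1) - 0.5

    if abs(offset) <= center_tol:
        # Target centered: fire if the (assumed) ammo variable allows it,
        # otherwise fall back to scanning.
        return "ATTACK" if ammo > 0 else "TURN_RIGHT"
    return "TURN_LEFT" if offset < 0 else "TURN_RIGHT"


# Usage on a synthetic frame: a red blob left of center triggers a left turn.
frame = np.zeros((120, 160, 3), dtype=np.uint8)
frame[50:60, 20:30] = (200, 30, 30)
print(heuristic_action(frame))
```

Every rule here is a readable line that can be tweaked and re-tested, which is the maintenance burden the essay argues coding agents now make cheap to carry.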
Tesla photon-count image reconstruction for FSD (@Tesla). Tesla shows the human-perceived RGB next to its photon-count reconstruction, claimed to explain why FSD sees through extreme glare and at night. Interesting as a sensor-fusion / low-light vision pipeline detail, though the post is marketing-shaped and gives no method specifics.
Tesla crash-data-driven restraint deployment timing (@Tesla). Wes Morrill describes replaying real fleet crash data in simulation, sweeping airbag and seat-belt-pretension timing, and finding that earlier deployment improves occupant kinematics. Strong example of a mature data flywheel turning fleet telemetry into a control-policy improvement, even if not directly an AI-modeling advance.
End of Model S and Model X production at Fremont (@Tesla). Industrial milestone. No AI content.
Laws-of-physics-of-companies aphorism (@MillionInt). One-line take that outsized returns require bending industry-specific "laws" via tech innovation. Skip.
Sam-Altman-firing texts as a musical (@MillionInt). Repost of @dgrreen turning the Altman/Murati 2023 firing texts (now Musk v. OpenAI trial evidence) into a Hamilton-style number. Curio, not signal. Skip.
Grok Imagine weekend-plans promo (@imagine). Generated-content showcase. Skip.
Cathedral of Learning aside (@magicsilicon). Off-topic architecture musing. Skip.