Wiki Pages

Concept pages are listed first, then summaries newest-to-oldest.

Agent Evaluation & Benchmarks

Multi-Agent Systems

Tool Use & Function Calling

2026-05-19 · Tier 2

AI for Auto-Research: Roadmap & User Guide

2026-05-18 · Tier 2

Look Before You Leap: Autonomous Exploration for LLM Agents

2026-05-18 · Tier 2

MMSkills: Multimodal Skills for General Visual Agents

2026-05-18 · Tier 2

PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control

2026-05-18 · Tier 2

Solvita: Agentic Evolution for Competitive Programming

2026-05-17 · Tier 2

LIFE Survey: Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

2026-05-16 · Tier 2

FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale

2026-05-16 · Tier 2

SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks

2026-05-15 · Tier 2

Agent Memory Cluster: STALE + Preping + EvolveMem + MemEye + MemLens + BOOKMARKS

2026-05-15 · Tier 2

EvoEnv: Self-Evolving Reasoning RL via Verifiable Environment Synthesis

2026-05-15 · Tier 2

Orchard: Open-Source Agentic Modeling Framework — 67.5% SWE-bench Verified at 30B

2026-05-15 · Tier 2

SDAR: Self-Distilled Agentic Reinforcement Learning

2026-05-15 · Tier 2

WildClawBench: Native-Runtime Long-Horizon Agent Benchmark — Claude Opus 4.7 Tops Out at 62.2%

2026-05-14 · Tier 2

AgentLens: the Lucky Pass problem in SWE-agent evaluation

2026-05-14 · Tier 2

Context Training with Active Information Seeking

2026-05-14 · Tier 2

Revisiting DAgger in the era of LLM agents

2026-05-14 · Tier 2

MAP: a Map-then-Act paradigm for long-horizon interactive agents

2026-05-13 · Tier 2

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks

2026-05-13 · Tier 2

LLM Agents Already Know When to Call Tools — Even Without Reasoning (Probe&Prefill)

2026-05-13 · Tier 2

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

2026-05-13 · Tier 2

Useful Memories Become Faulty When Continuously Updated by LLMs

2026-05-12 · Tier 2

X-OmniClaw: Unified Mobile Agent for Multimodal Understanding and Interaction

2026-05-11 · Tier 2

AutoTTS: Agentic Discovery for Test-Time Scaling

2026-05-10 · Tier 2

Jiayi Weng: Learning Beyond Gradients

2026-05-10 · Tier 3

OncoAgent: Dual-tier multi-agent framework for privacy-preserving oncology decision support

2026-05-09 · Tier 2

AI Co-Mathematician

2026-05-09 · Tier 2

Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes

2026-05-09 · Tier 2

Beyond Semantic Similarity: Direct Corpus Interaction (DCI)

2026-05-09 · Tier 2

Skill Curation Cluster: StraTA, Skill1, SkillOS

2026-05-07 · Tier 2

BRIGHT-Pro and RTriever-4B: Reasoning-Intensive Retrieval for Agentic Search

2026-05-07 · Tier 2

MedSkillAudit: Domain-Specific Audit Framework for Medical Research Agent Skills

2026-05-07 · Tier 2

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

2026-05-05 · Tier 2

AcademiClaw: When Students Set Challenges for AI Agents

2026-05-05 · Tier 2

Ctx2Skill: From Context to Skills — Self-Evolving Multi-Agent Skill Extraction

2026-05-05 · Tier 2

PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments

2026-05-05 · Tier 2

T^2PO: Token- and Turn-Level Policy Optimization for Stable Multi-Turn Agentic RL

2026-05-04 · Tier 1

Why Your Agentic AI Pentester Is Probably Just a Fancy Scanner — Ken Huang

2026-05-04 · Tier 3

LWD — Learning While Deploying: Fleet-Scale RL for Generalist Robot Policies

2026-05-02 · Tier 1

Ken Huang Ch 15 — Structured Output and Schema-Constrained Generation

2026-05-01 · Tier 2

Ara: Agent-Native Research Artifacts

2026-05-01 · Tier 2

Claw-Eval-Live: Live Agent Benchmark for Evolving Real-World Workflows

2026-05-01 · Tier 2

Eywa: Heterogeneous Scientific Foundation Model Collaboration

2026-05-01 · Tier 2

InteractWeb-Bench: Multimodal Agents under Non-Expert User Instructions

2026-05-01 · Tier 2

Intern-Atlas: A Methodological Evolution Graph

2026-05-01 · Tier 2

MCP Integration: Claude Code vs Hermes (Ken Huang Ch 13)

2026-05-01 · Tier 2

Synthetic Computers at Scale: Long-Horizon Productivity Simulation

2026-04-30 · Tier 2

ClawGym: A Scalable Framework for Building Effective Claw Agents

2026-04-25 · Tier 2

Claude Code Memory Systems: Chapter 8 Analysis

2026-04-23 · Tier 2

Claude Code vs. Hermes Agent: Permission System Architectures

2026-04-23 · Tier 2

Persistent Agent Infrastructure: Kimi K2.6, OpenAI Agent Studio, Anthropic Conway

2026-04-22 · Tier 2

AgentSPEX: An Agent Specification and Execution Language

2026-04-22 · Tier 2

SimpleTES: Evaluation-Driven Scaling for Scientific Discovery

2026-04-22 · Tier 2

HuggingFace ml-intern: Open-Source Agentic Post-Training Loop

2026-04-21 · Tier 2

Precise Debugging Benchmark: Models Regenerate, They Don't Debug

2026-04-21 · Tier 2

Reward-Free Self-Evolution: Agents That Learn Without Being Told What to Learn

GTA-2: Benchmarking General Tool Agents from Atomic Use to Open-Ended Workflows

PRL-Bench: LLMs on Frontier Physics Research

Chapter 3: The Query/Agent Loop — Claude Code vs. Hermes Agent

2026-04-19 · Tier 2

Claude Code Architecture: A Deep Reading

2026-04-19 · Tier 2

UniDoc-RL: RL-Based Visual RAG with Hierarchical Actions

2026-04-18 · Tier 2

Corpus2Skill: Don't Retrieve, Navigate

2026-04-18 · Tier 2

DR3-Eval: Realistic Benchmark for Deep Research Agents

2026-04-17 · Tier 2

Dive into Claude Code: Architecture Analysis

2026-04-17 · Tier 2

SuperLocalMemory V3.3: Biologically-Inspired Agent Memory

2026-04-16 · Tier 2

DefenseClaw, MAESTRO, and the Security Boundary Agentic AI Has Been Missing

2026-04-16 · Tier 2

Do AI Coding Agents Log Like Humans? An Empirical Study

2026-04-16 · Tier 2

Exploration and Exploitation Errors Are Measurable for Language Model Agents

2026-04-16 · Tier 2

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models

2026-04-16 · Tier 2

TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

2026-04-16 · Tier 2

UI-Copilot: Long-Horizon GUI Automation via Tool-Integrated Policy Optimization

2026-04-16 · Tier 2

Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents

2026-05-17 · Tier 2

Open Artifacts #21: The May 2026 Open-Model Wave and the CAISI / ECI Gap

2026-05-13 · Tier 2

Anthropic overtakes OpenAI in B2B adoption for the first time (Ramp data)

2026-05-13 · Tier 2

Recursive emerges from stealth with $650M for self-improving AI

2026-05-10 · Tier 3

Broadcom won't build OpenAI's custom chip without Microsoft buying 40 percent

2026-05-08 · Tier 1

Anthropic ↔ Colossus 1 Deal: Capacity Crunch + Brand Risk

2026-05-08 · Tier 1

GitHub Reliability Crisis: AI Load Breaks the Platform

2026-05-08 · Tier 2

Lambert: Notes from inside China's AI labs

2026-05-04 · Tier 3

Anthropic + OpenAI Both Build Services Companies Around Their AI

2026-05-03 · Tier 3

Microsoft VS Code Auto-Inserts "Co-Authored-by Copilot" Even With AI Off

2026-05-03 · Tier 1

Xiaomi MiMo-V2.5-Pro — Open-Weight Long-Horizon Coding at 40-60% Fewer Tokens

2026-05-02 · Tier 3

ChatGPT Tracks Free Users for Ads by Default

2026-05-01 · Tier 3

AISN #72 — CAIS AI-Wellbeing Research, Public-Sentiment Decline, OpenAI Releases

2026-05-01 · Tier 3

Anthropic Launches Claude Security — Defensive Cyber Productization

2026-05-01 · Tier 3

Chinese AI Startups Onshoring: Moonshot, StepFun Dissolving Offshore Structures

2026-05-01 · Tier 3

UK AISI: GPT-5.5 Matches Claude Mythos on Full Network Attack Simulation

2026-05-01 · Tier 2

Marcus: "The Greatest Capital Misallocation in History?"

2026-05-01 · Tier 3

Pentagon Signs Eight Tech Giants for AI-First Fighting Force; Anthropic Excluded

2026-05-01 · Tier 2

Pragmatic Engineer: AI Load Breaks GitHub; Anthropic's Trust Speedrun

2026-05-01 · Tier 1

SemiAnalysis: AI Value Capture — The Shift to Model Labs

2026-04-30 · Tier 3

Zig's Anti-LLM Contribution Policy and the "Contributor Poker" Argument

2026-04-29 · Tier 3

Claude for Creative Work: MCP Connectors for Blender, Adobe, Ableton, Autodesk

2026-04-29 · Tier 3

UCP Wins the Agentic Commerce Governance Layer

2026-04-22 · Tier 3

Amazon-Anthropic $33B Deal and AI Capital Concentration Week

2026-04-18 · Tier 3

OpenAI Executive Departures and Product Restructuring (April 2026)

2026-04-17 · Tier 3

Anthropic's Mythos Model: Government Access and the Trust Debate

2026-04-17 · Tier 3

Claude's Explosive Market Share Surge (April 2026)

2026-05-17 · Tier 1

MoE-muP: How to Scale Mixture-of-Experts (From muP to the Maximally Scale-Stable Parameterization)

2026-05-16 · Tier 1

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

2026-05-15 · Tier 1

Dynamic Latent Routing: Joint Latent-Code and Routing Policy for LM Post-Training

2026-05-15 · Tier 1

RouteProfile: Elucidating the Design Space of LLM Profiles for Routing

2026-05-11 · Tier 1

CaRE: Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts

2026-05-11 · Tier 1

Conductor: Learning to Orchestrate Agents in Natural Language

2026-05-08 · Tier 1

State of Routing in Model Serving (Netflix Tech Blog)

2026-05-02 · Tier 1

Step-level Optimization for Efficient Computer-use Agents

2026-05-01 · Tier 1

Ken Huang Ch 14: Model Routing and Provider Abstraction (Claude Code vs Hermes)

2026-04-17 · Tier 1

TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification

GPU Kernels and Accelerator Optimization

2026-05-19 · Tier 1

AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents

2026-05-14 · Tier 1

Inference as energy-to-token production: a position paper

2026-05-13 · Tier 1

SemiAnalysis: Cerebras — Faster Tokens Please

2026-05-09 · Tier 1

KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

2026-05-04 · Tier 1

Cerebras Targets $40B Valuation in Second IPO Attempt

2026-04-27 · Tier 1

Semiconductor Week 17, 2026: AI Memory Supercycle and Agentic EDA

2026-04-21 · Tier 1

SemiAnalysis: GPU Cluster Economics and the Goodput Reckoning

Knowledge Distillation

Speculative Decoding

2026-05-19 · Tier 1

CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

2026-05-19 · Tier 1

EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

2026-05-19 · Tier 1

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

2026-05-19 · Tier 2

Measuring Maximum Activations in Open Large Language Models

2026-05-19 · Tier 1

PUMA: Semantic-Preserving Early Exit for Reasoning Models

2026-05-19 · Tier 1

SNLP: Layer-Parallel Inference via Structured Newton Corrections

2026-05-19 · Tier 1

ZEDA: Post-Trained MoE Can Skip Half Experts via Self-Distillation

2026-05-18 · Tier 1

FashionChameleon: Training-Free KV Cache Rescheduling for Interactive Video Customization

2026-05-18 · Tier 1

HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts

2026-05-17 · Tier 1

MTP support merged into llama.cpp: Strix Halo benchmarks confirm a 2x decode speedup at 27B, mixed result at 35B

2026-05-17 · Tier 1

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention (Raschka)

2026-05-16 · Tier 1

ATESD: Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

2026-05-16 · Tier 1

Lighthouse Attention: Long-Context Pre-Training as a Detachable Wrapper

2026-05-15 · Tier 1

Asynchronous Continuous Batching: CPU-GPU Overlap via Dual Buffer Slots

2026-05-15 · Tier 1

Forcing-KV: Hybrid KV Cache Compression for Autoregressive Video Diffusion

2026-05-14 · Tier 1

The Extrapolation Cliff: a closed-form clip-safety threshold for on-policy distillation

2026-05-14 · Tier 1

MinT: managed infrastructure for million-scale LoRA training and serving

2026-05-14 · Tier 2

MMProLong: training long-context vision-language models with generalization beyond 128K

2026-05-14 · Tier 1

Orthrus: dual-view diffusion + autoregressive on a shared KV cache

2026-05-13 · Tier 1

δ-mem: Efficient Online Memory for Large Language Models

2026-05-13 · Tier 1

FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning

2026-05-13 · Tier 1

The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

2026-05-13 · Tier 1

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle

2026-05-13 · Tier 1

Token Superposition Training (TST): Efficient Pre-Training with Token Superposition

2026-05-12 · Tier 1

Make Each Token Count: Improving Long-Context Performance with Learned KV Eviction

2026-05-11 · Tier 1

MDN: Parallelizing Stepwise Momentum for Delta Linear Attention

2026-05-11 · Tier 1

MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference

2026-05-11 · Tier 1

UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification

2026-05-09 · Tier 1

EMO: Pretraining Mixture of Experts for Emergent Modularity

2026-05-09 · Tier 1

MiA-Signature: Approximating Global Activation for Long-Context Understanding

2026-05-09 · Tier 1

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

2026-05-07 · Tier 1

D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

2026-05-07 · Tier 1

LIVEditor: Lightning Unified Video Editing via In-Context Sparse Attention (ISA)

2026-05-07 · Tier 1

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

2026-05-07 · Tier 1

Stream-T1: Test-Time Scaling for Streaming Video Generation

2026-05-05 · Tier 1

MotionCache: Motion-Aware Caching for Efficient Autoregressive Video Generation

2026-05-04 · Tier 1

The Distillation Panic — Nathan Lambert (Interconnects AI)

2026-05-02 · Tier 1

FlashRT: Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

2026-05-02 · Tier 1

Nemotron 3 Nano Omni: Efficient Open Multimodal Intelligence

2026-05-01 · Tier 1

LenVM: Token-Level Length Value Model

2026-05-01 · Tier 1

RoundPipe: Efficient Training on Multiple Consumer GPUs

2026-04-30 · Tier 1

Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding

2026-04-30 · Tier 1

Tide: Cross-Architecture Distillation for Diffusion Large Language Models

2026-04-22 · Tier 1

PrfaaS: Prefill-as-a-Service via Cross-Datacenter KV Cache Transfer

2026-04-22 · Tier 1

SDVG: Speculative Decoding for Autoregressive Video Generation

2026-04-22 · Tier 1

ShadowPEFT: Centralized Layer-Space Parameter-Efficient Fine-Tuning

2026-04-22 · Tier 1

TurboQuant: Online Vector Quantization for KV Cache Compression

2026-04-21 · Tier 1

Nemotron 3 Super: Hybrid Mamba-Attention MoE at NVFP4

1D Ordered Tokens Enable Efficient Test-Time Search

AccelOpt: Self-Improving LLM Agent for AI Accelerator Kernel Optimization

AVR: Adaptive Visual Reasoning for Efficient VRMs

Maximal Brain Damage: Disrupting Neural Networks via Sign-Bit Flips

STOP: Super Token for Path Pruning in Parallel Reasoning

W-RAC: Web Retrieval-Aware Chunking for Cost-Efficient RAG

2026-04-18 · Tier 1

LongAct: Harnessing Intrinsic Activation Patterns for Long-Context RL

2026-04-18 · Tier 1

Switch-KD: Visual-Switch Knowledge Distillation for VLMs

2026-04-18 · Tier 1

TESSY: Teacher-Student Cooperation Framework for SFT Data Synthesis

2026-04-17 · Tier 1

Cross-Tokenizer LLM Distillation via Byte-Level Interface

2026-04-17 · Tier 1

KV Packet: Recomputation-Free Context-Independent KV Caching

2026-04-17 · Tier 1

Model Capability Dominates: Lessons from AIMO 3 Inference-Time Optimization

2026-04-16 · Tier 1

TIP: Token Importance in On-Policy Distillation

Reinforcement Learning for LLMs

2026-05-19 · Tier 2

DiHAL: Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement

2026-05-19 · Tier 2

MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

2026-05-19 · Tier 2

NGM: A Plug-and-Play Training-Free Memory Module for LLMs

2026-05-18 · Tier 2

AIRA-Compose and AIRA-Design: Agentic Discovery of Neural Architectures

2026-05-18 · Tier 2

CIPO: Correction-Oriented Policy Optimization with Verifiable Rewards

2026-05-18 · Tier 2

NudgeRL: Strategy-Guided Exploration for RLVR

2026-05-15 · Tier 2

Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Reasoning

2026-05-15 · Tier 2

SU-01: Gold-Medal Olympiad Reasoning at 30B via Simple and Unified Scaling

2026-05-14 · Tier 2

Many-Shot CoT-ICL: long context as structured curriculum, not retrieval buffer

2026-05-13 · Tier 2

Reward Hacking in Rubric-Based Reinforcement Learning

2026-05-12 · Tier 2

G-Zero: Self-Play for Open-Ended Generation from Zero Data

2026-05-12 · Tier 2

Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

2026-05-12 · Tier 2

Model Merging Scaling Laws in Large Language Models

2026-05-12 · Tier 2

Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR

2026-05-12 · Tier 2

Soohak: Mathematician-Curated Research-Level Math Benchmark

2026-05-10 · Tier 2

Gowers + ChatGPT 5.5 Pro: PhD-level math research in under two hours

2026-05-09 · Tier 2

Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO

2026-05-09 · Tier 2

Prescriptive Scaling Laws for Data Constrained Training

2026-05-09 · Tier 1

TIDE: Every Layer Knows the Token Beneath the Context

2026-05-08 · Tier 2

ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning

2026-05-08 · Tier 2

When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning

2026-05-04 · Tier 1

Import AI 455: AI Systems Are About to Start Building Themselves — Jack Clark

2026-05-04 · Tier 2

Themis — Robust Multilingual Code Reward Models for Multi-Criteria Scoring

2026-05-03 · Tier 1

Ken Huang — World Models, Architectures, and the Next Phase of AI

2026-05-03 · Tier 2

Marcus — Have LLMs Improved Patient Outcomes?

2026-05-03 · Tier 2

MIT Study — Superposition Explains Why Scaling Language Models Works So Reliably

2026-05-03 · Tier 2

Philosophy-Bench — Frontier Models Diverge on 100 Everyday Ethical Scenarios

2026-05-02 · Tier 2

ARC-AGI-3 — Three Systematic Reasoning Errors in Frontier Models

2026-05-02 · Tier 2

Compliance vs Sensibility: Reasoning Controllability in LLMs

2026-05-02 · Tier 1

The Defense Trilemma + NP-Hardness of Reward Hacking Detection

2026-05-02 · Tier 2

Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains

2026-05-01 · Tier 2

CoPD: Co-Evolving Policy Distillation

2026-04-30 · Tier 2

A Survey on LLM-Based Conversational User Simulation

2026-04-28 · Tier 2

Hope Architecture: Nested Learning and Continuously Adapting LLMs

2026-04-24 · Tier 2

DeepSeek V4: Architecture and Industry Impact

2026-04-24 · Tier 2

GPT-5.5: Launch Analysis and System Card Deep Dive

2026-04-22 · Tier 2

Chain-of-Thought Degrades Visual Spatial Reasoning

2026-04-22 · Tier 2

Target-Oriented Pretraining via Neuron-Activated Graph (NAG)

2026-04-22 · Tier 2

TEMPO: Scaling Test-Time Training for Large Reasoning Models

2026-04-22 · Tier 2

Weight Disentanglement and Task Arithmetic: OrthoReg

2026-04-21 · Tier 2

Geometric Canary: Steerability and Drift Detection from Representational Geometry

2026-04-21 · Tier 2

GFT: SFT is Degenerate Policy Gradient — and Group Fine-Tuning Fixes It

2026-04-21 · Tier 2

When Does RLVR Generalize? Reward Saturation and Reasoning Faithfulness

2026-04-19 · Tier 2

ASGuard: Mechanistic Defense Against Targeted Jailbreaking

2026-04-19 · Tier 2

Value Gradient Flow: RL as Optimal Transport

2026-04-18 · Tier 2

C2: Cooperative-Critical Rubric-Augmented Reward Modeling

2026-04-16 · Tier 2

InfiniteScienceGym: Procedurally-Generated Benchmark for Scientific Analysis

2026-04-16 · Tier 2

My Bets on Open Models, Mid-2026

2026-04-16 · Tier 2

From P(y|x) to P(y): Reinforcement Learning in Pre-train Space

2026-05-19 · Tier 2

Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics

2026-05-18 · Tier 2

DiagnosticIQ: LLM Deployment Calibration Bottleneck in Industrial Maintenance

2026-05-17 · Tier 2

LLM-based Detection of Manipulative Political Narratives

2026-05-16 · Tier 2

LiSA: Lifelong Safety Adaptation via Conservative Policy Induction

2026-05-14 · Tier 2

WriteSAE: sparse autoencoders for the recurrent matrix cache write

2026-05-13 · Tier 2

A Single Layer to Explain Them All: Understanding Massive Activations in LLMs (ME Layer)

2026-05-10 · Tier 2

Pseudoscientific emotion AI in the workplace (Atlantic via The Decoder)

2026-05-09 · Tier 2

Anthropic Natural Language Autoencoders (NLAs)

2026-05-08 · Tier 2

The First Token Knows: Single-Decode Confidence for Hallucination Detection

2026-05-19-afternoon

2026-05-19-evening

2026-05-19-morning

2026-05-18-afternoon

2026-05-18-evening

Media Live | 2026-05-18 morning slot

2026-05-17-afternoon

2026-05-17-evening

Media Live | 2026-05-17 morning

2026-05-16-afternoon

2026-05-16-evening

2026-05-16-morning

2026-05-15-afternoon

2026-05-15-evening

2026-05-15-morning

2026-05-14-afternoon

2026-05-14-evening

2026-05-14-morning

2026-05-13-afternoon

2026-05-13-evening

2026-05-12-afternoon

2026-05-12-evening

2026-05-12-morning

2026-05-11-afternoon

2026-05-11-evening

2026-05-11-morning

2026-05-10-afternoon

2026-05-10-evening

2026-05-10-morning

2026-05-09-morning

2026-05-08-night

2026-05-07-afternoon

2026-05-06-morning

2026-05-06-night

2026-05-17 · Tier 2

CurveBench: Hierarchical Topological Reasoning from Visual Input

2026-05-12 · Tier 2

Auto-Rubric as Reward (ARR): From Implicit Preferences to Explicit Multimodal Generative Criteria

2026-05-12 · Tier 2

DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification

2026-05-12 · Tier 2

ROMA: Reinforcing Multimodal Reasoning Against Visual Degradation

2026-05-07 · Tier 3

APEX: Aesthetic-Informed Popularity Prediction for AI-Generated Music

2026-05-07 · Tier 3

HERMES++: Unified Driving World Model for 3D Scene Understanding and Generation

2026-05-07 · Tier 3

JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

2026-05-07 · Tier 3

Parameter-Efficient Multi-View Proficiency Estimation

2026-05-07 · Tier 3

PhysForge: Physics-Grounded 3D Asset Generation

2026-05-07 · Tier 3

RLDX-1: VLA Robotic Policy for Dexterous Humanoid Manipulation

2026-05-04 · Tier 4

AnalogRetriever — Cross-Modal Representations for Analog Circuit Retrieval

2026-05-04 · Tier 3

End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

2026-05-04 · Tier 3

GenLIP — Generative Language-Image Pre-training for ViTs

2026-05-04 · Tier 4

Map2World — Segment Map Conditioned Text-to-3D World Generation

2026-05-04 · Tier 3

UniVidX — Unified Multimodal Framework for Versatile Video Generation

2026-05-02 · Tier 3

Nemotron 3 Nano Omni — Efficient Open Multimodal Intelligence (NVIDIA)

2026-05-02 · Tier 3

Semi-DPO: Learning from Noisy Preferences via Semi-Supervised DPO

2026-05-02 · Tier 3

ViPO: Visual Preference Optimization at Scale

2026-05-01 · Tier 3

Edit-R1: Verifier-Based RL for Image Editing

2026-05-01 · Tier 3

FD-loss: Representation Fréchet Loss for Visual Generation

2026-05-01 · Tier 3

PhyCo: Controllable Physical Priors for Generative Motion

2026-05-01 · Tier 3

Visual Generation in the New Era: Atomic to Agentic World Modeling

2026-04-30 · Tier 3

Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion

2026-04-30 · Tier 3

FASH-iCNN: Editorial Fashion Identity via Multimodal CNN Probing

2026-04-30 · Tier 3

GLM-5V-Turbo: Native Foundation Model for Multimodal Agents

2026-04-30 · Tier 3

X-WAM: Unified 4D World Action Modeling with Asynchronous Denoising

Qwen3.5-Omni Technical Report

2026-04-16 · Tier 3

GameWorld: Standardized and Verifiable Evaluation of Multimodal Game Agents

2026-04-16 · Tier 3

MERRIN: Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments

2026-04-16 · Tier 3

RationalRewards: Reasoning Rewards Scale Visual Generation at Training and Test Time

2026-04-16 · Tier 3

Seedance 2.0: Advancing Video Generation for World Complexity