inference-efficiency · 2026-05-09 · Tier 1

MiA-Signature: Approximating Global Activation for Long-Context Understanding

TL;DR

Long-context RAG and agentic systems share one problem: the model can't actually use the full activated context, so quality degrades. MiA-Signature borrows from cognitive science (the "global ignition" idea over distributed memory) and builds a compressed representation of the global activation pattern induced by a query. Concretely: submodular selection picks high-level concepts that cover the activated context space, with optional iterative working-memory refinement. The signature acts as a conditioning signal that approximates the full activation state at tractable cost. It drops into RAG and agentic systems, with consistent gains reported across multiple long-context tasks.

Why this matters

This is a different framing of the long-context bottleneck. The standard story is "models forget the middle, fix attention." MiA-Signature's story is "models can't compress the global activation pattern into a usable conditioning signal, so give them the signal directly." If the framing holds, every long-context system can plug in a signature module at low cost, with no architectural change to the base model.

Mechanism

Standard long-context RAG:
  query ──► retrieve top-k chunks ──► prepend to context ──► generate
                                          ↓
                            attention degrades on long context

MiA-Signature:
  query ──► identify activated concept space
        ──► submodular select concepts that cover the space
              (high coverage, low redundancy)
        ──► optionally refine via working-memory iteration
        ──► emit MiA-Signature s
  
  generate(query, context, conditioning=s)

The submodular objective is the technical core. Submodularity gives diminishing-returns coverage selection, so you get a small set of concepts that span the activated space without redundancy. The cognitive-science framing ("global ignition") is a metaphor, but the math is standard combinatorial optimization. Cost is dominated by the concept-space identification step, which is upper-bounded by a small number of LLM calls.
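
The diminishing-returns selection described above can be sketched with a greedy loop. This is a minimal illustration, not the paper's implementation: the note does not state the exact objective, so the sketch assumes a facility-location coverage function over a hypothetical similarity matrix `sim` (rows are activated context items, columns are candidate concepts, values are non-negative). The names `coverage` and `greedy_select` are illustrative. For a monotone submodular objective under a cardinality budget, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def coverage(sim: Matrix, selected: Sequence[int]) -> float:
    """Facility-location value (assumed objective): each activated item
    is credited with its best-matching selected concept."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim: Matrix, k: int) -> List[int]:
    """Pick up to k concept indices by largest marginal coverage gain."""
    n_concepts = len(sim[0]) if sim else 0
    selected: List[int] = []
    # Best similarity each item already receives from the selected set.
    best_cover = [0.0] * len(sim)
    for _ in range(min(k, n_concepts)):
        best_gain, best_j = 0.0, None
        for j in range(n_concepts):
            if j in selected:
                continue
            # Marginal gain: only improvement over current coverage counts,
            # which is what penalizes redundant concepts.
            gain = sum(max(row[j] - c, 0.0) for row, c in zip(sim, best_cover))
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None:  # no remaining concept adds coverage
            break
        selected.append(best_j)
        best_cover = [max(c, row[best_j]) for row, c in zip(sim, best_cover)]
    return selected


# Hypothetical toy data: concepts 0 and 1 are near-duplicates,
# concept 2 covers the items the other two miss.
sim = [
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.0],
    [0.2, 0.0, 0.7],
    [0.0, 0.1, 0.6],
]
picked = greedy_select(sim, k=2)
print(picked)  # [0, 2]
```

In the toy data, concept 1 has the second-highest standalone score, but once concept 0 is selected its marginal gain collapses, so the loop picks concept 2 instead. That is the "high coverage, low redundancy" behavior in miniature.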

Connections to prior wiki

Direct intersection with KV cache and long-context concept pages. MiA-Signature is orthogonal to all the KV-side work the wiki has tracked: KV Packet (04-17), TurboQuant (04-22), PrfaaS (04-22), Stream-T1 (05-07). Those compress what attention reads. MiA-Signature compresses the conditioning signal that gates what attention attends to. Both can compose. A KV-Packet-quantized cache plus a MiA-Signature conditioning vector is a natural deployment stack.

Connection to today's DCI (2605.05242). DCI replaces the retriever with grep on raw corpus, putting the burden of corpus interaction on the agent. MiA-Signature is the upstream complement: the agent that does the grepping needs a global view of what concepts to look for, and the signature is that global view. DCI plus MiA-Signature is a candidate for the post-RAG retrieval stack.

Cognitive-science framing has a track record. The "global workspace" / "global ignition" frame has surfaced elsewhere in wiki ingest as a research-angle aside (Granularity Axis, 05-09, also today, on social-role representation in LLMs). MiA-Signature is the most concrete cognitive-science-inspired primitive in the wiki to date.

Research angle

  1. What is the signature dimensionality? A 256-dim conditioning vector behaves very differently from a 16K-dim vector. The paper's headline gain depends on the size and structure of the signature.
  2. Submodular vs neural selection. The paper uses submodular selection. Could a small learned selector (e.g., a 1B-parameter classifier that proposes concepts) outperform the submodular selector? If yes, this becomes a learnable interface.
  3. Composition with EMO (also today). EMO's expert pool selection happens at document boundaries. MiA-Signature's concept selection happens at query time. Both are sparse-conditioning primitives. The composition would be: a query produces a signature, which selects an expert pool, which restricts which model slice runs. That's a candidate end-to-end deployment stack for cost-bounded long-context inference.

Source