RouteProfile: Elucidating the Design Space of LLM Profiles for Routing

Source: HuggingFace Daily Papers · arXiv 2605.00180
Date ingested: 2026-05-15
Tier: 1. LLM routing, profile design, generalization to new models
Raw: farmer file

TL;DR

The routing literature has focused on the router (the dispatch function) and largely ignored the LLM profile (the structured description of what each candidate model is good at). RouteProfile treats LLM profiling as a heterogeneous-data integration problem and lays out a four-dimensional design space: organizational form (per-domain bucket vs structured tree), representation type (text vs embeddings vs scalar scores), aggregation depth (raw, summary, deep abstract), and learning configuration (frozen vs trainable). Across three representative routers, evaluated under both standard and new-LLM-generalization conditions, three findings hold: structured profiles beat flat ones, query-level signals beat domain-level signals, and generalization to newly added models benefits most from structured profiles with trainable configurations.
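As a rough sketch of the design space described above, the four dimensions can be written down as enumerations and crossed. All class and field names here are hypothetical, and the options are only the ones this note lists; the paper may consider others.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class OrganizationalForm(Enum):
    DOMAIN_BUCKET = "per-domain bucket"
    STRUCTURED_TREE = "structured tree"


class RepresentationType(Enum):
    TEXT = "text"
    EMBEDDING = "embeddings"
    SCALAR_SCORE = "scalar scores"


class AggregationDepth(Enum):
    RAW = "raw"
    SUMMARY = "summary"
    DEEP_ABSTRACT = "deep abstract"


class LearningConfiguration(Enum):
    FROZEN = "frozen"
    TRAINABLE = "trainable"


@dataclass(frozen=True)
class ProfileDesign:
    """One point in the design space (hypothetical container, not from the paper)."""
    form: OrganizationalForm
    representation: RepresentationType
    depth: AggregationDepth
    learning: LearningConfiguration


def enumerate_designs():
    """Cross every option of every dimension listed in this note."""
    dims = (OrganizationalForm, RepresentationType,
            AggregationDepth, LearningConfiguration)
    return [ProfileDesign(*combo) for combo in product(*dims)]


designs = enumerate_designs()
print(len(designs))  # 2 * 3 * 3 * 2 = 36
```

With the options as listed, the grid has 36 cells, which is small enough to sweep exhaustively per router.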

Why this matters for the wiki

The wiki has tracked five papers on the routing-decision axis in the last four weeks: TraceR (query-level features), CARE (MoE bi-level), Sakana Conductor (frontier orchestration), Netflix State of Routing, and yesterday's MinT (adapter catalog). All five focused on the dispatch function. None of them studied the profile the dispatcher uses to decide where to send a query.

RouteProfile is the first paper in the wiki to treat the profile as an independent design surface. The implication is that the same router can be a strong or weak system depending entirely on how its candidate models are described to it. The wiki's existing routing pages should be re-read with this in mind: a router that scores well at evaluation time may collapse under a profile shift even though the dispatch code is unchanged.

The new-LLM-generalization setting is the production-relevant one. Routing fleets add models monthly. If the profile representation is flat or domain-bucketed, every new model needs a fresh trial period to populate its profile before traffic can be sent to it. Structured profiles with trainable configurations let the router cold-start a new model from a small description, then refine. This composes directly with MinT's million-adapter catalog: the routing decision for a 10^6-adapter fleet cannot wait for empirical trials on every new adapter.

Key findings

Structured > flat. Profiles that encode taxonomic hierarchy (skill family → sub-skill → query type) consistently outperform per-domain flat profiles across all three routers tested. The ablation isolates organizational form from representation type; the gain is from structure, not from richer features.
Query-level > domain-level. A profile that summarizes a model's behavior on specific query patterns generalizes better than one that aggregates by domain (math, code, reasoning). Domain labels are too coarse to predict routing quality.
Trainable configurations close the new-model gap. When a new LLM is added without empirical traces, structured + trainable profiles maintain accuracy; frozen or flat profiles collapse.
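To make the skill family → sub-skill → query type hierarchy concrete, here is a minimal sketch of a structured profile with back-off lookup: a score is read from the deepest level that has one. The back-off rule, the class and function names, and all scores are illustrative assumptions, not the paper's method or data.

```python
from typing import Dict, Optional, Tuple

# A path through the hierarchy: (skill family, sub-skill, query type).
Path = Tuple[str, ...]


class StructuredProfile:
    """Scores stored at any depth of the skill tree (hypothetical layout)."""

    def __init__(self) -> None:
        self._scores: Dict[Path, float] = {}

    def set_score(self, path: Path, score: float) -> None:
        self._scores[tuple(path)] = score

    def lookup(self, path: Path) -> Optional[float]:
        """Return the score at the deepest recorded prefix of `path`, else None."""
        path = tuple(path)
        for depth in range(len(path), 0, -1):
            prefix = path[:depth]
            if prefix in self._scores:
                return self._scores[prefix]
        return None


def route(profiles: Dict[str, StructuredProfile], path: Path) -> Optional[str]:
    """Pick the model whose backed-off score for this query path is highest."""
    scored = {name: p.lookup(path) for name, p in profiles.items()}
    known = {name: s for name, s in scored.items() if s is not None}
    return max(known, key=known.get) if known else None


# Made-up scores: an established model with query-level detail...
established = StructuredProfile()
established.set_score(("math",), 0.70)
established.set_score(("math", "algebra", "word-problem"), 0.62)

# ...and a newly added model described only at the skill-family level.
new_model = StructuredProfile()
new_model.set_score(("math",), 0.80)

query = ("math", "algebra", "word-problem")
print(established.lookup(query))   # 0.62 (query-level entry)
print(new_model.lookup(query))     # 0.8  (backs off to the family level)
print(route({"established": established, "new_model": new_model}, query))  # new_model
```

A flat per-domain profile would only ever hold the `("math",)` entries; the tree lets a new model start from that coarse entry and gain deeper entries as traces arrive.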

Connections to prior wiki pages

TraceR: used query-level features for the router but not for the profile. RouteProfile says the profile should also be query-level. The natural composition is a TraceR-style router using a RouteProfile-style profile.
Netflix State of Routing: Netflix's production routing pipeline is largely domain-bucketed. RouteProfile suggests this is the wrong default; query-level profiles are the production target.
MinT: a 10^6-adapter catalog is exactly the regime where domain-bucketed profiles fail. RouteProfile gives MinT the missing addressing layer.
llm-routing.md: the concept page should add "profile design" as a first-class research axis alongside router design.

Research angle

Three threads worth pulling.

Profile-router co-training. RouteProfile keeps the router fixed and studies the profile. The natural extension is to co-train both: the profile changes the router's input distribution, and the router's gradient should flow through the profile representation. Joint optimization is the cleaner formulation.
Profile compression. A naive query-level profile runs to hundreds of MB per adapter, which across a 10^6-adapter catalog adds up to hundreds of TB. The compression problem (what is the minimum profile that supports correct routing?) is unstudied.
Profiles for adapter routing, not model routing. RouteProfile's experiments are on model fleets. Whether the same design choices transfer to adapter fleets (where the base is shared) is open. The four design dimensions may have different optima when each candidate's policy delta is under 1% of the base rather than an entirely different model.
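For the profile compression thread, a back-of-the-envelope calculation shows the scale involved. The 200 MB per-adapter figure and the 100 GB storage budget are assumptions chosen for illustration (the note only says "hundreds of MB"); decimal units are used throughout.

```python
ADAPTERS = 10**6
NAIVE_PROFILE_MB = 200   # assumed value within "hundreds of MB"
BUDGET_GB = 100          # assumed total storage budget for all profiles


def catalog_size_tb(adapters: int, mb_per_adapter: float) -> float:
    """Total profile storage in TB (1 TB = 10^6 MB)."""
    return adapters * mb_per_adapter / 1_000_000


def per_adapter_budget_kb(budget_gb: float, adapters: int) -> float:
    """Per-adapter profile size in KB that fits the budget (1 GB = 10^6 KB)."""
    return budget_gb * 1_000_000 / adapters


naive_tb = catalog_size_tb(ADAPTERS, NAIVE_PROFILE_MB)
budget_kb = per_adapter_budget_kb(BUDGET_GB, ADAPTERS)
ratio = NAIVE_PROFILE_MB * 1_000 / budget_kb  # naive KB / budgeted KB

print(naive_tb)   # 200.0 TB for the naive catalog
print(budget_kb)  # 100.0 KB per adapter under the budget
print(ratio)      # 2000.0x compression needed
```

Under these assumed numbers the profile would need to shrink by roughly three orders of magnitude, which is why the minimum-sufficient-profile question matters at catalog scale.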
