agentic-systems · 2026-04-16 · Tier 2

UI-Copilot: Long-Horizon GUI Automation via Tool-Integrated Policy Optimization


TL;DR: UI-Copilot splits the GUI agent into two roles: a policy agent that executes the task and a lightweight copilot that provides memory retrieval and numerical computation on demand. This separation breaks long-horizon GUI tasks into manageable pieces and yields a 17.1% absolute improvement on AndroidWorld over the base Qwen model.
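The policy/copilot split can be pictured as a dispatcher. Each policy output is either a GUI action or a call to one of the copilot's tools. The following is a minimal sketch under stated assumptions, not the paper's interface: the tool names `retrieve` and `calculate`, the dict-based action format, and the `Copilot`, `step`, and `safe_calculate` helpers are all hypothetical. Arithmetic is handled by a small AST-based evaluator instead of the model, which illustrates why offloading computation avoids math hallucination.

```python
import ast
import operator
from dataclasses import dataclass, field

# Hypothetical sketch: tool names and the action format are assumptions,
# not UI-Copilot's actual interface.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def safe_calculate(expr: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")

    return _eval(ast.parse(expr, mode="eval"))


@dataclass
class Copilot:
    """Hypothetical lightweight helper: stores observations, does arithmetic."""
    memory: dict[str, str] = field(default_factory=dict)

    def remember(self, key: str, value: str) -> None:
        self.memory[key] = value

    def handle(self, tool: str, arg: str) -> str:
        if tool == "retrieve":
            return self.memory.get(arg, "<not found>")
        if tool == "calculate":
            return str(safe_calculate(arg))
        raise ValueError(f"unknown tool: {tool}")


def step(policy_output: dict, copilot: Copilot) -> str:
    """Route one policy decision to the copilot or to the GUI."""
    if policy_output["type"] == "tool":
        return copilot.handle(policy_output["tool"], policy_output["arg"])
    # A real agent would execute the GUI action; here we only describe it.
    return f"GUI:{policy_output['action']}"


if __name__ == "__main__":
    copilot = Copilot()
    copilot.remember("cart_total", "42.50")
    print(step({"type": "tool", "tool": "retrieve", "arg": "cart_total"}, copilot))  # 42.50
    print(step({"type": "tool", "tool": "calculate", "arg": "42.50 * 3"}, copilot))  # 127.5
    print(step({"type": "gui", "action": "tap(120, 640)"}, copilot))  # GUI:tap(120, 640)
```

In this sketch the policy only decides *whether* to call a tool and with what argument; the copilot returns an exact answer, so the policy never has to recall a value from many screens ago or multiply numbers in its head.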

Key Findings

  • Memory decoupling separates persistent observations (what happened before) from transient execution context (what's happening now), reducing memory degradation in long sessions.
  • Tool-Integrated Policy Optimization (TIPO) trains tool selection and task execution separately: single-turn prediction for routing decisions and on-policy multi-turn rollouts for execution quality.
  • Outperforms GUI-Owl-7B and UI-TARS-1.5-7B on MemGUI-Bench at the 7B scale.
  • The key failure modes in long-horizon GUI automation are memory degradation, progress confusion, and math hallucination; all three are addressed by offloading work to the copilot.
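The memory-decoupling finding can be illustrated with two stores: a persistent key-value store for facts that must survive the whole episode, and a bounded window of recent steps for the current execution context. This is a hypothetical sketch; the `DecoupledMemory` class, the window size, and the context string format are assumptions, not the paper's design.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DecoupledMemory:
    """Hypothetical sketch: persistent facts vs. a short transient window."""
    window: int = 3
    persistent: dict[str, str] = field(default_factory=dict)
    transient: deque = field(default_factory=deque)

    def __post_init__(self) -> None:
        # Bounded window: appending beyond `window` evicts the oldest step.
        self.transient = deque(maxlen=self.window)

    def record_step(self, action: str, observation: str) -> None:
        # Transient execution context: what is happening now.
        self.transient.append((action, observation))

    def save_fact(self, key: str, value: str) -> None:
        # Persistent observations: what happened before and is still needed.
        self.persistent[key] = value

    def build_context(self) -> str:
        notes = "; ".join(f"{k}={v}" for k, v in sorted(self.persistent.items()))
        recent = " | ".join(f"{a}->{o}" for a, o in self.transient)
        return f"NOTES[{notes}] RECENT[{recent}]"


if __name__ == "__main__":
    mem = DecoupledMemory(window=2)
    mem.save_fact("hotel_price", "180")
    for i in range(4):
        mem.record_step(f"tap{i}", f"screen{i}")
    print(mem.build_context())
    # NOTES[hotel_price=180] RECENT[tap2->screen2 | tap3->screen3]
```

The design point is that the step history can stay short without losing the one number the task depends on: old screens fall out of the window, but the saved fact is still in every context the policy sees.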

Related Pages

Raw source: ../../raw/huggingface/2026-04-16-ui-copilot-advancing-long-horizon-gui-automation-via-tool-in.md