Agent JIT Compilation — ICML 2026 Paper Shows 10.4× Speedup by Compiling Web Agent Tasks to Executable Code

Agent JIT Compilation — ICML 2026 Paper Shows 10.4× Speedup by Compiling Web Agent Tasks to Executable Code
📑 Table of Contents

Agent JIT Compilation — ICML 2026 Paper Shows 10.4× Speedup by Compiling Web Agent Tasks to Executable Code

A team from Stanford and Google Research has reimagined how computer-use agents should plan and execute tasks on the web — and the results could reshape an entire generation of agent architectures.

Every computer-use agent (CUA) today works roughly the same way: it looks at a screenshot, decides what to do, does it, looks again, and repeats. This sequential loop — fetch-screenshot-execute, fetch-screenshot-execute — costs one large-language-model (LLM) call per step. For a task like ordering a pizza or filling out a multi-page application, that’s dozens of LLM calls, each adding seconds of latency and compounding the chance of error.

In a paper accepted at ICML 2026, researchers Caleb Winston, Ron Yifeng Wang, Azalia Mirhoseini, and Christos Kozyrakis propose a radical alternative: Agent JIT Compilation. Instead of reasoning step-by-step in the loop, the agent compiles the task description into executable code — code that can contain LLM calls, tool invocations, and parallelization — all validated and cost-optimized before a single action is taken.

The results speak for themselves: 10.4× speedup and +28% accuracy over the Browser-Use framework, and 2.4× speedup and +9% accuracy over OpenAI’s own CUA.


The Core Insight: Treat Agent Loops Like Compilers

The name is deliberate. Just-in-time (JIT) compilation in programming languages translates high-level bytecode into optimized machine code at runtime, selecting among many valid translations the one with the lowest estimated cost. Agent JIT Compilation does the same — but the “source language” is a natural-language task description, and the “machine code” is a plan of executable tool calls, LLM evaluations, and parallel branches.

The system comprises three components working in concert:

Component Purpose
Invariant-Enforcing Tool Protocol Pre/postcondition state invariants for compositional verification at compile time
JIT-Planner Generates multiple code plans, validates them via CFG traversal against tool contracts, selects the minimum-cost candidate
JIT-Scheduler Chooses the optimal parallelization strategy (serial/parallel/hedge) via Monte Carlo cost estimation from learned latency distributions

“Like a JIT compiler, which at run time translates a higher-level program into optimized lower-level code, our system translates a higher-level natural-language instruction into lower-level code at plan synthesis time.” — From the paper

The Invariant-Enforcing Tool Protocol

A critical finding in the paper: 45-50% of web automation errors arise from incorrect action sequences — calling a tool in the wrong order, or calling a tool when the page state doesn’t support it.

The tool protocol eliminates this entire class of errors by attaching preconditions and postconditions to every tool, much like a formal verification system:

{
  "name": "add_to_cart",
  "pre": {"page_type": "store"},
  "post": {"page_type": "store"},
  "pre_check": "return document.body.textContent.includes('Full Menu') ? true : [false, 'Not on a store page'];"
}

State invariants are composable: two tool calls can be chained if the first’s post satisfies the second’s pre. Static verification at plan synthesis time eliminates tool-ordering errors — cutting the failure rate from 59% to 25%, and boosting overall valid-plan rate from 77.1% to 90.6% across all tested models (p ≪ 0.001).

JIT-Planner: Why Cost-Aware Selection Matters

The JIT-Planner generates multiple code plans in parallel (up to K candidates), builds a control-flow graph (CFG) for each, validates every tool call against the current state invariants, computes an estimated cost using learned latency distributions, and returns the cheapest valid plan.

The critical finding: the difference between the best-cost and worst-cost plan candidates is 5.3× on average (11.7s vs 61.7s), validating that naive plan generation is leaving massive performance on the table.

Performance across models:

Method GPT-4.1 Gemini 2.5 Flash Gemini 2.5 Pro
Browser-Use 150.1s, 61% 100.3s, 59% 115.9s, 77%
Browser-Use + cache 105.2s, 88% 69.3s, 81% 65.8s, 86%
JIT-Planner 15.4s, 90% 7.2s, 94% 12.6s, 97%
Speedup vs Browser-Use 9.7× 14× 9.2×

With Gemini 2.5 Flash, JIT-Planner achieves 94% accuracy in 7.2 seconds — down from 100.3 seconds with standard Browser-Use. A 14× improvement.

JIT-Scheduler: Parallelization via Monte Carlo

Once a plan is selected, the JIT-Scheduler decides how to execute it. It simulates three strategies — serial execution, parallel execution of independent branches, and “hedge” execution (running multiple strategies simultaneously and taking the first result) — using Monte Carlo sampling over learned latency distributions.

The scheduler also leverages a persistent cache of past tool results. When identical tool calls appear in different plans, previous results can be reused without re-execution, further reducing latency.

Why This Matters

The paper addresses what may be the single biggest bottleneck in production agent deployments today: latency. Web agents that take 2-3 minutes per task are not practical for end users. Agents that complete the same task in 7-15 seconds — with higher accuracy — cross a critical usability threshold. As the State of Agent Engineering 2026 report confirms, infrastructure bottlenecks like rate limits and latency are rapidly overtaking model capability as the primary barrier to production deployment.

The implications extend beyond web automation:

  • Tool-calling agents in any domain can benefit from cost-optimized plan synthesis with invariant enforcement
  • The JIT compilation metaphor provides a principled framework for thinking about agent planning as optimization rather than sequential reasoning
  • Parallel execution of independent agent sub-tasks is an under-explored lever for performance — the paper shows it can nearly double throughput without sacrificing accuracy
  • The invariant-enforcing protocol is model-agnostic — any frontier LLM benefits from the same static validation

The Road Ahead

Agent JIT Compilation is accepted at ICML 2026, one of the premier machine learning conferences. The authors have released the paper under a CC BY 4.0 license, and code is expected to follow.

For the agent ecosystem, this work represents a shift in how we think about agent architecture. Instead of optimizing the loop, maybe we should compile the loop away entirely. This paradigm is one of several architectural innovations reshaping the agent landscape, alongside the frameworks and patterns catalogued in our Complete Guide to AI Agents.


Sources: arXiv:2605.21470 — “Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling” (ICML 2026), StartupHub.ai analysis