A new open-source project called Cua (from trycua/cua) is tackling one of the most awkward problems in the AI agent space: how to let agents interact with desktop applications without physically hijacking your mouse and keyboard.
Released on April 28 and quickly climbing to over 15,000 GitHub stars, Cua provides “open-source infrastructure for Computer-Use Agents” — sandboxes, SDKs, and benchmarks that let AI agents control full desktops on macOS, Linux, and Windows. The headline feature? Agents can drive applications without stealing the cursor from the human user.
The Computer-Use Bottleneck
The idea of letting AI agents control desktop applications is not new. Anthropic’s Computer Use feature (released in late 2025) demonstrated that Claude could navigate GUIs, click buttons, and fill out forms. OpenAI’s Operator and Google’s Project Mariner followed similar paths. But all of them shared a fundamental limitation: they required screen-level interaction, often taking over the display or fighting the user for cursor control.
This creates a terrible user experience. You cannot work alongside an agent that is clicking around your screen. The agent occupies the entire interaction channel, making multitasking impossible.
Cua’s approach is different: instead of controlling the user’s primary display, it spins up headless or background macOS VMs that agents can interact with programmatically. The agent sees a virtual desktop, clicks virtual buttons, and reads virtual screens — all without touching the user’s actual workspace.
How It Works
Cua builds on Apple’s Virtualization framework to create lightweight macOS VMs that agents can control through a REST API and SDK:
# Example: Spawn a sandboxed macOS VM and let an agent use it
cua run --image macos-sequoia --headless
The system supports:
- Background execution — Agents drive VMs without the user’s display being affected
- Lume integration — The companion Lume project provides unattended macOS VM setup with automated installation
- Programmatic access — SDK support for Python, TypeScript, and REST API consumers
- Cursor independence — The human user retains full control of their machine while agents work in parallel
This pattern mirrors what CI/CD pipelines did for software deployment — abstract away the environment so that automation runs cleanly without interfering with the developer’s workflow.
Why This Matters
The implications go beyond convenience. Background desktop control unlocks several critical use cases:
Parallel Human-Agent Work
A developer can keep writing code while an AI agent tests their application in a separate sandboxed macOS instance. A QA engineer can review test results while agents run regression suites. This is the agentic equivalent of multitasking — something that screen-grabbing approaches make impossible.
Safer Automation
Because agents run in disposable VMs, any damage they cause is contained. A rogue rm -rf / inside a sandboxed VM is a learning opportunity, not a production incident. This addresses exactly the kind of safety failure highlighted by this week’s database deletion incident.
Benchmarking and Training
Cua includes benchmark infrastructure for evaluating computer-use agents in realistic desktop environments. This is a meaningful step beyond the synthetic, simplified environments that most agent benchmarks currently use. If you want to train an agent to use Photoshop, Excel, or Xcode, you need a real desktop environment to do it — and Cua provides exactly that.
The Growing Ecosystem
Cua isn’t alone in this space. The computer-use agent infrastructure ecosystem is expanding rapidly:
| Project | Focus |
|---|---|
| Cua | Sandboxed desktop VMs, background execution |
| Lume | macOS VM creation with unattended setup |
| Agent Browser Protocol | Browser-level agent interaction |
| Matchlock | Linux sandbox for agent workloads |
| Vercel Agent Browser | Browser automation CLI for agents |
What distinguishes Cua is its focus on desktop-grade interaction — not just browser automation, but full application control across the operating system. This is essential for agents that need to interact with native macOS, Windows, or Linux applications that have no web equivalent.
Getting Started
Cua is Apache 2.0 licensed and runs on macOS (with Linux and Windows support in development). For more on the computer-use agent landscape, see our deep dive on Claude’s computer use and the complete guide to AI agents. The quick start is refreshingly straightforward:
git clone https://github.com/trycua/cua
cd cua
make install
cua run --image macos-sequoia
The project’s documentation covers integration patterns, API references, and benchmark methodology. For developers building computer-use agents, it’s worth a close look.
The Bigger Picture
Cua represents a maturing of the agent infrastructure layer. Six months ago, computer-use meant “let the agent see your screen.” Today, it means “give the agent its own screen.” That shift — from shared to isolated, from synchronous to parallel, from fragile to sandboxed — is the direction the entire agent ecosystem needs to follow.
As agents become more capable, the quality of their environment matters as much as the quality of their models. Cua is a bet on that principle, and at 15,000 stars and climbing, it’s a bet the developer community is buying into. For more on computer-use agents, see our complete guide to AI agents.