Overview
Computer Use enables agents to interact with graphical applications via screenshot analysis and mouse/keyboard input.
Requirements
Desktop Templates
Use one of our desktop-enabled templates:

| Template | Description |
|---|---|
| ubuntu-desktop | Ubuntu 24.04 with XFCE, Chrome, Firefox |
| desktop-dev | Ubuntu + VS Code, Node.js, Python |
| desktop-browser | Minimal desktop with Chrome only |

Desktop-enabled templates are marked with supportsDesktop: true.
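Because deploying the computer_use tool to a non-desktop template is rejected, a client-side pre-check can surface the mismatch before deployment. The sketch below is illustrative only: the `Template` class, its `supports_desktop` field (mirroring supportsDesktop), and `check_deployable` are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical local description of a sandbox template."""
    name: str
    supports_desktop: bool  # mirrors the supportsDesktop flag


# The three desktop-enabled templates from the table above.
DESKTOP_TEMPLATES = {
    name: Template(name, supports_desktop=True)
    for name in ("ubuntu-desktop", "desktop-dev", "desktop-browser")
}


def check_deployable(template: Template, tools: list[str]) -> None:
    """Raise ValueError if computer_use is requested without desktop support."""
    if "computer_use" in tools and not template.supports_desktop:
        raise ValueError(
            f"template {template.name!r} does not support desktop; "
            "computer_use requires a desktop-enabled template"
        )


# Passes silently: ubuntu-desktop is desktop-enabled.
check_deployable(DESKTOP_TEMPLATES["ubuntu-desktop"], ["computer_use"])
```

A check like this only mirrors the server-side validation; the deployment itself remains the source of truth.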
Enabling Computer Use
If you try to deploy a runtime with the computer_use tool to a non-desktop template, deployment will fail with a validation error.
Computer Use Tools
When enabled, agents have access to:

| Tool | Description |
|---|---|
| screenshot | Capture screen and analyze |
| click | Click at coordinates |
| type | Type text |
| scroll | Scroll up/down |
| key | Press keyboard keys |
| mouse_move | Move mouse cursor |

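To make the tool list concrete, the sketch below models tool calls as plain data and checks them before execution. The argument names (`x`, `y`, `text`, `direction`, `keys`) are assumptions chosen for illustration; the actual tool schemas may differ.

```python
# Assumed argument names per tool; the real schemas may differ.
REQUIRED_ARGS = {
    "screenshot": set(),
    "click": {"x", "y"},
    "type": {"text"},
    "scroll": {"direction"},
    "key": {"keys"},
    "mouse_move": {"x", "y"},
}


def validate_action(tool: str, args: dict, width: int, height: int) -> None:
    """Reject unknown tools, missing arguments, and off-screen coordinates."""
    if tool not in REQUIRED_ARGS:
        raise ValueError(f"unknown tool: {tool}")
    missing = REQUIRED_ARGS[tool] - set(args)
    if missing:
        raise ValueError(f"{tool} is missing arguments: {sorted(missing)}")
    # Pointer tools must target a pixel inside the display.
    if {"x", "y"} <= REQUIRED_ARGS[tool]:
        if not (0 <= args["x"] < width and 0 <= args["y"] < height):
            raise ValueError(f"{tool} target ({args['x']}, {args['y']}) is off-screen")
    if tool == "scroll" and args["direction"] not in ("up", "down"):
        raise ValueError("scroll direction must be 'up' or 'down'")


# Passes silently: the point lies inside a 1280x720 display.
validate_action("click", {"x": 640, "y": 360}, width=1280, height=720)
```

Validating before execution keeps a hallucinated coordinate or misspelled tool name from reaching the sandbox.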
Example: Browser Automation
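A minimal sketch of a form-filling run, expressed with the tools listed above. Everything here is an assumption for illustration: `FakeDesktop` stands in for a real sandbox, the argument names and coordinates are made up, and a real agent would choose each action from the latest screenshot rather than from a fixed script.

```python
class FakeDesktop:
    """Stand-in for a sandbox desktop; records actions instead of performing them."""

    def __init__(self):
        self.log = []

    def run(self, tool: str, **args):
        self.log.append((tool, args))
        # A real screenshot tool would return image data; this stub returns a label.
        return "screen" if tool == "screenshot" else None


# Scripted steps: focus the address bar, open a page, fill and submit a field.
STEPS = [
    ("screenshot", {}),
    ("key", {"keys": "ctrl+l"}),
    ("type", {"text": "https://example.com/login"}),
    ("key", {"keys": "Return"}),
    ("screenshot", {}),  # re-observe after navigation
    ("click", {"x": 400, "y": 300}),
    ("type", {"text": "alice"}),
    ("key", {"keys": "Return"}),
]

desktop = FakeDesktop()
for tool, args in STEPS:
    desktop.run(tool, **args)

print(len(desktop.log), "actions recorded")
```

The second screenshot matters: after navigation, coordinates from the earlier observation may no longer point at the intended element.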
VNC Access
Access the sandbox desktop via VNC:
Display Configuration
Use Cases
Browser Automation
Navigate websites, fill forms, extract data
Desktop Apps
Use applications without APIs
Testing
UI testing and screenshot comparisons
Legacy Systems
Automate systems without modern APIs
Supported Applications
Computer Use works with any GUI application:

- Web browsers (Chrome, Firefox)
- Office applications
- IDEs (VS Code, JetBrains)
- Design tools
- Any X11 application
Best Practices
Use appropriate resolution
Higher resolutions produce larger screenshots, which consume more tokens. Balance image quality against cost.
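One way to act on this trade-off is to downscale screenshots before sending them to the model, then map the model's click coordinates back to the real display. The sketch below shows a generic technique, not a documented feature; it treats pixel count as a rough proxy for cost, since exact token accounting depends on the model.

```python
def fit_within(width: int, height: int, max_edge: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer edge is at most max_edge."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def to_screen(
    x: int, y: int, shot: tuple[int, int], screen: tuple[int, int]
) -> tuple[int, int]:
    """Map a point in the downscaled screenshot back to display coordinates."""
    sx = screen[0] / shot[0]
    sy = screen[1] / shot[1]
    # Clamp so rounding never produces an off-screen coordinate.
    return min(screen[0] - 1, round(x * sx)), min(screen[1] - 1, round(y * sy))


screen = (1920, 1080)
shot = fit_within(*screen, max_edge=1280)
print(shot, to_screen(640, 360, shot, screen))
```

Going from 1920x1080 to 1280x720 cuts the pixel count by a factor of 2.25, at the price of smaller text that may be harder for the model to read.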
Be specific about UI elements
Tell the agent exactly what to look for, for example: “Click the blue ‘Submit’ button.”
Handle loading states
Instruct the agent to wait for pages and apps to finish loading before interacting with them.
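Besides prompting, the orchestrating code can wait until the screen stops changing before handing control back to the agent. A minimal sketch, assuming a hypothetical `capture` callable that returns the current screenshot as bytes:

```python
import hashlib
import time
from typing import Callable


def wait_until_stable(
    capture: Callable[[], bytes],
    interval: float = 0.5,
    timeout: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until two consecutive screenshots are identical.

    Returns True once the screen is stable, False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    previous = hashlib.sha256(capture()).digest()
    while time.monotonic() < deadline:
        sleep(interval)
        current = hashlib.sha256(capture()).digest()
        if current == previous:
            return True
        previous = current
    return False
```

Byte-exact comparison is deliberately strict: a blinking cursor or an on-screen clock will keep the screen from ever looking stable, so a real setup might compare only a cropped region or tolerate small differences.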
Prefer APIs when available
Computer Use is slower and less reliable than direct API calls. Use it only when no suitable API exists.