Skip to main content

Overview

Computer Use enables agents to interact with graphical applications via screenshot analysis and mouse/keyboard input.

Requirements

Computer Use has two requirements:
  1. Vision model — Must support image input (e.g., claude-opus-4-5, gpt-5.2, gemini-3-pro)
  2. Desktop template — Must use a template with display server (X11/Wayland)

Desktop Templates

Use one of our desktop-enabled templates:
TemplateDescription
ubuntu-desktopUbuntu 24.04 with XFCE, Chrome, Firefox
desktop-devUbuntu + VS Code, Node.js, Python
desktop-browserMinimal desktop with Chrome only
Or create a custom template with supportsDesktop: true.

Enabling Computer Use

const runtime = await rt.runtimes.create({
  slug: 'browser-agent',
  model: 'claude-opus-4-5',  // Must support vision
  tools: [
    'bash',
    'read_file',
    'edit_file',
    'computer_use',  // Enable computer use
  ],
});

// Deploy with a desktop template
const deployment = await rt.deployments.create({
  runtimeSlug: 'browser-agent',
  templateSlug: 'ubuntu-desktop',  // Must be desktop-enabled
  apiSlug: 'my-browser-bot',
});
If you try to deploy a runtime with computer_use tool to a non-desktop template, deployment will fail with a validation error.

Computer Use Tools

When enabled, agents have access to:
ToolDescription
screenshotCapture screen and analyze
clickClick at coordinates
typeType text
scrollScroll up/down
keyPress keyboard keys
mouse_moveMove mouse cursor

Example: Browser Automation

const run = await rt.agents.run('browser-agent', {
  message: 'Go to google.com, search for "RunTools", and screenshot the results',
});

for await (const event of run) {
  if (event.type === 'tool_call' && event.tool === 'screenshot') {
    console.log('Agent took a screenshot');
  }
}

VNC Access

Access the sandbox desktop via VNC:
const vnc = await sandbox.getVNC();
console.log(vnc.url);      // wss://sandbox-abc123.vnc.runtools.ai
console.log(vnc.password); // Connection password
Use any VNC client or our web viewer in the dashboard.

Display Configuration

const sandbox = await rt.sandboxes.create({
  template: 'desktop',  // Template with GUI support
  display: {
    width: 1920,
    height: 1080,
    depth: 24,
  },
});

Use Cases

Browser Automation

Navigate websites, fill forms, extract data

Desktop Apps

Use applications without APIs

Testing

UI testing and screenshot comparisons

Legacy Systems

Automate systems without modern APIs

Supported Applications

Computer Use works with any GUI application:
  • Web browsers (Chrome, Firefox)
  • Office applications
  • IDEs (VS Code, JetBrains)
  • Design tools
  • Any X11 application

Best Practices

Higher resolution = more tokens for screenshots. Balance quality vs cost.
Tell the agent exactly what to look for: “Click the blue ‘Submit’ button”
Instruct agent to wait for pages/apps to load before interacting.
Computer Use is slower and less reliable than direct APIs. Use only when necessary.