Google open-sourced Scion this week, describing it as a “hypervisor for agents.” The framing is specific: Scion manages multiple AI agents running concurrently in isolated containers, each with its own git worktree, credentials, and execution environment. You can run Claude Code, Gemini CLI, and Codex simultaneously on the same project, with each agent operating in isolation and coordination happening through shared workspaces and a task graph. Agents can run locally, on remote VMs, or across Kubernetes clusters. Scion handles the scheduling and communication layer.
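Scion's own task-graph format is not shown in the announcement, so as an illustrative sketch only (the `deps` mapping and `schedule_waves` helper below are hypothetical, not Scion's API), here is what scheduling concurrent agents from a dependency graph might look like: tasks with no unmet dependencies form a wave that can run in parallel, and each completed wave unblocks the next.

```python
# Hypothetical task graph: task names mapped to the tasks they
# depend on. Illustrative only; not Scion's actual format.
deps = {
    "write-code":  [],
    "write-tests": [],
    "audit":       ["write-code"],
    "merge":       ["write-code", "write-tests", "audit"],
}

def schedule_waves(deps):
    """Group tasks into waves: every task in a wave can run in
    parallel, because all of its dependencies finished in an
    earlier wave."""
    remaining = {task: set(d) for task, d in deps.items()}
    waves = []
    while remaining:
        # Tasks whose dependencies have all been scheduled.
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle in task graph")
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)
    return waves

print(schedule_waves(deps))
# Code-writing and test-writing can start immediately; the audit
# waits on the code; the merge waits on everything.
```

The useful property of this framing is that parallelism falls out of the graph structure rather than being hand-scheduled: adding an agent means adding a node and its edges.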

Scion’s central design decision is “isolation over constraints.” Most current approaches to multi-agent safety embed behavioural rules into the agent’s system prompt or context: “do not delete files,” “only modify files within your designated scope.” Scion takes the opposite position. Rather than relying on model compliance with natural language constraints, it uses containerisation and infrastructure-level access controls to enforce boundaries directly. An agent that cannot see a directory cannot accidentally or deceptively modify it, regardless of what its context window says. This aligns with the finding from research on AI agent cover-ups: a model with permissions will use them; the safer approach is not to grant the permissions in the first place.
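In Scion's case the boundary is the container's filesystem namespace; the same principle can be sketched in plain Python (this `WorkspaceSandbox` class is my own illustration, not Scion code). The point is that the check happens in the mediating layer, so the agent's prompt is irrelevant to whether the write succeeds:

```python
from pathlib import Path

class WorkspaceSandbox:
    """Expose only one directory tree to an agent.

    Stands in for what a container bind mount enforces: paths
    outside the workspace are structurally unreachable, not
    merely forbidden by a natural-language instruction.
    """

    def __init__(self, root):
        self.root = Path(root).resolve()

    def _resolve(self, relative):
        # Resolve '..' and symlinks before the containment check,
        # so 'workspace/../secrets.txt' cannot escape.
        candidate = (self.root / relative).resolve()
        if candidate != self.root and self.root not in candidate.parents:
            raise PermissionError(f"{relative!r} is outside the workspace")
        return candidate

    def write_text(self, relative, text):
        path = self._resolve(relative)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)

    def read_text(self, relative):
        return self._resolve(relative).read_text()
```

A containerised setup gets this for free from the kernel: a bind mount of only the agent's worktree means there is no path the agent can construct that reaches outside it, which is a stronger guarantee than any wrapper written in the agent's own process.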

A realistic use case is running a code-writing agent, an audit agent, and a test-writing agent on the same codebase simultaneously. Each works in its own worktree branch. Their outputs are then merged and evaluated. This is different from sequencing these agents serially (write, then audit, then test) and different from running a single agent that does all three. Parallel specialised agents can surface conflicts earlier, work faster on large codebases, and produce outputs that are more coherent within each specialisation rather than compromised by context-switching. Whether this holds up in practice across a range of codebases and task types is what Scion is meant to help researchers test.

Scion is explicitly experimental. Google describes it as a testbed and research platform, not production infrastructure. The gap between “research testbed” and “production deployment” for agent infrastructure is large: error handling, observability, cost controls, and rollback mechanisms are all concerns that a testbed can defer but a production deployment cannot. Treat it as a way to think through what the right architecture looks like, not as a ready-to-deploy system.

The agent coordination problem is now being treated as an infrastructure problem by a major lab. When the dominant framing was that coordination happens through natural language in a shared context window, the responsibility for correctness sat with the model. When coordination happens through containers, git worktrees, and credential isolation, the responsibility shifts to the infrastructure layer, where it is easier to audit, constrain, and reason about formally. That framing is more durable than any specific tool.