Introducing HyperMachine: an agentic hypervisor for autonomous AI systems
RELEASE


Adam Erickson · June 5, 2026 · 5 min read

AI agents are starting to run real workloads (building software, executing tools, driving long-horizon tasks), and every one of those workloads needs somewhere safe to run. Today that somewhere is usually a tangle of containers, cloud VMs, and shell scripts glued together with brittle CLI scraping. None of it was designed for an agent that wants to discover what it can do, reserve a GPU, spawn a sandbox, and clean up after itself.

HyperMachine is our answer: a high-performance hypervisor framework written in Rust, with first-class AI agent support built into the core rather than bolted on. It runs as both a Type-1 bare-metal hypervisor and a Type-2 hosted one from a single codebase, and every VM is an addressable resource an agent can reason about.


Agentic-first virtualization

HyperMachine is the first hypervisor designed from the ground up for AI agent workloads. Every VM is an MCP-addressable resource: agents discover capabilities through ontology endpoints, invoke typed tools (vm.create, vm.exec, gpu.reserve), and receive structured results — no shell scraping, no brittle CLI wrappers. Multi-LLM tool schemas ship built-in for OpenAI, Anthropic, and Google formats, so the same VM fleet is reachable from GPT, Claude, and Gemini without translation glue.

# Discover available tools
curl http://localhost:8080/mcp/tools

# LLM-specific tool formats
curl http://localhost:8080/agentic/tools/openai     # GPT-4o, o1, o3
curl http://localhost:8080/agentic/tools/anthropic  # Claude 4, Sonnet
curl http://localhost:8080/agentic/tools/gemini     # Gemini 2.5

# Execute an operation
curl -X POST http://localhost:8080/mcp/call \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"tool": "vm.create", "arguments": {"name": "ai-sandbox", "cpu_cores": 4}}'
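
To make the multi-LLM point concrete, here is a minimal sketch of how one tool description could be rendered into the three vendor envelopes, using only the Rust standard library. The ToolSpec type, the render function, and the description text are hypothetical and are not HyperMachine's types; the envelope shapes follow each vendor's public tool-calling documentation. OpenAI and Anthropic restrict tool names to letters, digits, underscores, and hyphens, so the sketch rewrites vm.create as vm_create for every vendor; whether HyperMachine does the same is an assumption.

```rust
/// Hypothetical vendor-neutral description of one tool (not a HyperMachine type).
pub struct ToolSpec {
    pub name: &'static str,
    pub description: &'static str,
    /// JSON Schema for the arguments, already serialized.
    pub parameters_json: &'static str,
}

pub enum Vendor {
    OpenAi,
    Anthropic,
    Gemini,
}

/// Renders one tool in the envelope a given vendor expects.
/// Assumes the name and description need no JSON escaping.
pub fn render(tool: &ToolSpec, vendor: Vendor) -> String {
    // Dots are not accepted in tool names by every vendor, so replace them.
    let name = tool.name.replace('.', "_");
    match vendor {
        Vendor::OpenAi => format!(
            r#"{{"type":"function","function":{{"name":"{}","description":"{}","parameters":{}}}}}"#,
            name, tool.description, tool.parameters_json
        ),
        Vendor::Anthropic => format!(
            r#"{{"name":"{}","description":"{}","input_schema":{}}}"#,
            name, tool.description, tool.parameters_json
        ),
        Vendor::Gemini => format!(
            r#"{{"functionDeclarations":[{{"name":"{}","description":"{}","parameters":{}}}]}}"#,
            name, tool.description, tool.parameters_json
        ),
    }
}
```

The point of the sketch is that the argument schema is written once and only the wrapper differs per vendor, which is what removes the translation glue from the agent side.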

O(1) agent spawn: fleet density as a first principle

The defining number in HyperMachine is O(1) agent spawn. A built-in agent runtime spawns agents as copy-on-write clones of a warm baseline, so 100 idle agents cost roughly one baseline's memory rather than 100 times that. A clone takes about 9 ns regardless of fleet size, and the figure stays constant at 1, 16, or 64 baseline units, whereas a full copy starts at hundreds of microseconds and grows with size.

Measured with Criterion on an AMD Ryzen 9 9900X:

  • Agent spawn (CoW clone) — ~9 ns, O(1); the foundation of fleet density.
  • CoW first-write fault — 73 ns; the per-page copy paid once, when a clone first dirties a page.
  • Guest memory read / write — 27–96 ns / 9–19 ns for 64 B–4 KiB zero-copy accesses.
  • MCP tool dispatch — 547 ns for a single agent tool call end-to-end; ~0.7 µs/call at 64-way concurrency.
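
The copy-on-write idea behind these numbers can be sketched in a few lines of standard-library Rust. This is an illustration only: Baseline, AgentMemory, and the overlay map are hypothetical names, and a real hypervisor would rely on page-table permissions and faults rather than a HashMap. What the sketch does show is why spawn cost is independent of baseline size (a clone is a reference-count bump) and why the copy is paid per page on first write.

```rust
use std::collections::HashMap;
use std::sync::Arc;

pub const PAGE_SIZE: usize = 4096;
pub type Page = [u8; PAGE_SIZE];

/// Warm baseline image shared read-only by every agent (hypothetical type).
pub struct Baseline {
    pages: Vec<Page>,
}

impl Baseline {
    pub fn new(num_pages: usize) -> Arc<Self> {
        Arc::new(Self { pages: vec![[0u8; PAGE_SIZE]; num_pages] })
    }
}

/// One agent's view: the shared baseline plus privately dirtied pages.
pub struct AgentMemory {
    base: Arc<Baseline>,
    dirty: HashMap<usize, Box<Page>>,
}

impl AgentMemory {
    /// O(1) spawn: bumps a reference count and copies no pages.
    pub fn spawn(base: &Arc<Baseline>) -> Self {
        Self { base: Arc::clone(base), dirty: HashMap::new() }
    }

    /// Reads the private copy if the page was dirtied, else the baseline.
    pub fn read(&self, addr: usize) -> u8 {
        let (page, off) = (addr / PAGE_SIZE, addr % PAGE_SIZE);
        match self.dirty.get(&page) {
            Some(private) => private[off],
            None => self.base.pages[page][off],
        }
    }

    /// First write to a page copies that one page; later writes reuse it.
    pub fn write(&mut self, addr: usize, value: u8) {
        let (page, off) = (addr / PAGE_SIZE, addr % PAGE_SIZE);
        let base = &self.base;
        let private = self
            .dirty
            .entry(page)
            .or_insert_with(|| Box::new(base.pages[page]));
        private[off] = value;
    }

    /// Number of pages this agent has privately copied.
    pub fn private_pages(&self) -> usize {
        self.dirty.len()
    }
}
```

Spawning 100 agents from one baseline in this model leaves the baseline pages allocated exactly once; an agent's memory footprint only grows with the pages it actually writes.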

Agents run tool-calling loops over a fast MCP dispatch path, and their sessions and memory are reclaimed automatically. The runtime is exposed over a tenant-scoped, optionally authenticated REST API.

# Spawn a tenant-scoped agent (O(1) CoW sandbox from the warm baseline)
curl -X POST http://localhost:8080/api/v1/agents \
  -H "X-Tenant-Id: acme" -H "Content-Type: application/json" \
  -d '{"agent_id": "researcher", "capabilities": "operator"}'

curl http://localhost:8080/api/v1/agents/fleet -H "X-Tenant-Id: acme"  # fleet memory
# Reap agents that have been idle for more than 10 minutes
curl -X POST http://localhost:8080/api/v1/agents/reap \
  -H "Content-Type: application/json" \
  -d '{"max_idle_secs": 600}'
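
The semantics of a reap call with max_idle_secs can be illustrated with a small session table. The Sessions type and its methods are hypothetical and use caller-supplied timestamps so the behavior is easy to follow; how HyperMachine tracks activity internally is not described here.

```rust
use std::collections::HashMap;

/// Hypothetical session table: agent id mapped to last-activity time in seconds.
pub struct Sessions {
    last_seen: HashMap<String, u64>,
}

impl Sessions {
    pub fn new() -> Self {
        Self { last_seen: HashMap::new() }
    }

    /// Records activity for an agent at the given time.
    pub fn touch(&mut self, agent_id: &str, now_secs: u64) {
        self.last_seen.insert(agent_id.to_string(), now_secs);
    }

    /// Removes every agent idle for longer than `max_idle_secs`
    /// and returns the removed ids in sorted order.
    pub fn reap(&mut self, now_secs: u64, max_idle_secs: u64) -> Vec<String> {
        let mut reaped = Vec::new();
        self.last_seen.retain(|id, seen| {
            let keep = now_secs.saturating_sub(*seen) <= max_idle_secs;
            if !keep {
                reaped.push(id.clone());
            }
            keep
        });
        reaped.sort();
        reaped
    }

    pub fn is_active(&self, agent_id: &str) -> bool {
        self.last_seen.contains_key(agent_id)
    }
}
```

With a threshold of 600 seconds, an agent last seen 700 seconds ago is reclaimed while one seen 200 seconds ago survives.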

One codebase, Type-1 and Type-2

A single codebase runs as both a Type-2 hosted hypervisor (KVM on Linux, WHPX on Windows, HVF on macOS) and a Type-1 bare-metal hypervisor (Intel VMX, AMD SVM) with no code duplication. The same VM definitions, device models, and API surface work in both modes: develop on your laptop in hosted mode, then deploy the identical stack bare-metal in production.

# Create and run a VM (Type-2 hosted mode)
hm t2 create --name myvm --cpu 4 --memory 8G --gpu
hm t2 start myvm

# Start the MCP server for AI agents
hm mcp serve --api-key "your-key"
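
One common way to get two deployment modes out of a single codebase is to hide the mode-specific part behind a trait and keep everything else generic. The sketch below shows that pattern; Backend, Hosted, BareMetal, VmConfig, and boot are all hypothetical names chosen for illustration, and the source does not say that HyperMachine is structured this way.

```rust
/// Hypothetical VM definition shared by both modes.
pub struct VmConfig {
    pub name: String,
    pub cpu_cores: u32,
    pub memory_mb: u64,
}

/// The only mode-specific surface in this sketch.
pub trait Backend {
    fn label(&self) -> &'static str;
    fn create_vcpus(&mut self, count: u32) -> Result<u32, String>;
}

/// Stand-in for a hosted backend (KVM, WHPX, or HVF in the article).
#[derive(Default)]
pub struct Hosted {
    pub vcpus: u32,
}

/// Stand-in for a bare-metal backend (VMX or SVM in the article).
#[derive(Default)]
pub struct BareMetal {
    pub vcpus: u32,
}

impl Backend for Hosted {
    fn label(&self) -> &'static str {
        "hosted"
    }
    fn create_vcpus(&mut self, count: u32) -> Result<u32, String> {
        // A real backend would ask the host OS for vCPUs here.
        self.vcpus = count;
        Ok(count)
    }
}

impl Backend for BareMetal {
    fn label(&self) -> &'static str {
        "bare-metal"
    }
    fn create_vcpus(&mut self, count: u32) -> Result<u32, String> {
        // A real backend would program the CPU's virtualization extensions here.
        self.vcpus = count;
        Ok(count)
    }
}

/// Mode-independent boot logic, written once against the trait.
pub fn boot<B: Backend>(backend: &mut B, cfg: &VmConfig) -> Result<String, String> {
    if cfg.cpu_cores == 0 {
        return Err("a VM needs at least one vCPU".to_string());
    }
    let n = backend.create_vcpus(cfg.cpu_cores)?;
    Ok(format!(
        "{} booted on {} with {} vCPUs, {} MiB",
        cfg.name,
        backend.label(),
        n,
        cfg.memory_mb
    ))
}
```

The same VmConfig value passes through boot unchanged in either mode, which mirrors the develop-hosted, deploy-bare-metal workflow described above.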

A GPU fabric agents can reserve

HyperMachine models GPU interconnect topology — NVLink, NVSwitch, PCIe — and makes placement decisions based on real bandwidth and latency rather than a flat device count. Capacity reservations carry SLA tiers (platinum/gold/silver/bronze) that prevent noisy-neighbor contention, and fleet-wide health monitoring tracks utilization, temperature, and ECC errors across hosts. An agent can check capacity, reserve it with a tier and a duration, and release it — all over the same REST surface.

# Check capacity for a GPU workload
curl -X POST http://localhost:8080/api/v1/gpu-fabric/capacity/check \
  -H "Content-Type: application/json" \
  -d '{"gpu_count": 4, "min_vram_mb": 40960, "interconnect": "NvLink"}'

# Reserve GPU capacity with an SLA tier
curl -X POST http://localhost:8080/api/v1/gpu-fabric/capacity/reserve \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"gpu_count": 8, "sla_tier": "Platinum", "duration_secs": 3600}'
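
To show what placement by real bandwidth rather than a flat device count can mean, the sketch below picks the group of GPUs whose slowest pairwise link is fastest. The best_placement function and the bandwidth matrix are hypothetical, the numbers used with it are illustrative rather than measured, and the exhaustive search is only practical for a handful of devices; HyperMachine's actual scheduler is not described in this post.

```rust
/// Chooses `count` GPUs maximizing the minimum pairwise bandwidth.
/// `bw[a][b]` is the link bandwidth between GPU a and GPU b (symmetric).
/// Returns the chosen GPU indices and the bottleneck bandwidth.
/// Exhaustive search over subsets, so limited to 16 GPUs in this sketch.
pub fn best_placement(bw: &[Vec<u32>], count: usize) -> Option<(Vec<usize>, u32)> {
    let n = bw.len();
    if count < 2 || count > n || n > 16 {
        return None;
    }
    let mut best: Option<(Vec<usize>, u32)> = None;
    for mask in 0u32..(1u32 << n) {
        if mask.count_ones() as usize != count {
            continue;
        }
        // Decode the bitmask into the list of selected GPU indices.
        let set: Vec<usize> = (0..n).filter(|&i| mask & (1u32 << i) != 0).collect();
        // The slowest link inside the group limits collective throughput.
        let mut bottleneck = u32::MAX;
        for (k, &a) in set.iter().enumerate() {
            for &b in &set[k + 1..] {
                bottleneck = bottleneck.min(bw[a][b]);
            }
        }
        // Keep the first group with a strictly better bottleneck.
        if best.as_ref().map_or(true, |(_, b)| bottleneck > *b) {
            best = Some((set, bottleneck));
        }
    }
    best
}
```

Given four GPUs arranged as two fast pairs joined by a slower link, a two-GPU request lands inside one pair, while a three-GPU request is forced across the slow link, which a flat device count would never reveal.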

Semantic GUI automation, not screen scraping

HyperMachine ships a desktop GUI with a typed automation API, so an agent drives the interface through semantic commands (gui.navigate, gui.dialog.open, gui.form.set_field) rather than pixel-based screen capture. This approach is deterministic, resolution-independent, and orders of magnitude faster than vision-based automation such as Computer Use: the agent sets a field by name instead of hunting for it in a screenshot.

use hm_gui::{AutomationHandle, GuiCommand, DialogType, FormType};

let (handle, receiver) = AutomationHandle::new();

handle.open_dialog(DialogType::CreateVm)?;
handle.set_field(FormType::CreateVm, "name", "ai-sandbox")?;
handle.set_field(FormType::CreateVm, "cpus", 4)?;
handle.set_field(FormType::CreateVm, "memory_mb", 8192)?;
handle.execute(GuiCommand::SubmitDialog(DialogType::CreateVm))?;
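
The handle and receiver pair above suggests a command channel: the agent sends typed commands and the GUI loop consumes them in order. The following sketch shows that general pattern with std::sync::mpsc. Command, Handle, and drain are hypothetical stand-ins written for this illustration; they are not the hm_gui API and make no claim about how hm_gui works internally.

```rust
use std::sync::mpsc::{channel, Receiver, SendError, Sender};

/// Hypothetical typed commands (stand-ins, not hm_gui types).
#[derive(Debug, PartialEq)]
pub enum Command {
    OpenDialog(&'static str),
    SetField { form: &'static str, field: &'static str, value: String },
    Submit(&'static str),
}

/// Agent-side handle that only knows how to send commands.
pub struct Handle {
    tx: Sender<Command>,
}

impl Handle {
    /// Returns the sending handle and the receiver the GUI loop would own.
    pub fn new() -> (Handle, Receiver<Command>) {
        let (tx, rx) = channel();
        (Handle { tx }, rx)
    }

    pub fn send(&self, cmd: Command) -> Result<(), SendError<Command>> {
        self.tx.send(cmd)
    }
}

/// What a GUI event loop might do each frame: apply queued commands in order.
/// Here each command is just turned into a log line.
pub fn drain(rx: &Receiver<Command>) -> Vec<String> {
    rx.try_iter()
        .map(|cmd| match cmd {
            Command::OpenDialog(d) => format!("open {d}"),
            Command::SetField { form, field, value } => format!("{form}.{field} = {value}"),
            Command::Submit(d) => format!("submit {d}"),
        })
        .collect()
}
```

Because every command names its target explicitly, replaying the same sequence yields the same result at any window size, which is the determinism that pixel-based automation lacks.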

Post-quantum crypto and an enterprise middleware stack

Alongside classical FIPS-approved algorithms (AES-GCM, RSA, ECDSA), HyperMachine ships the NIST-standardized post-quantum schemes: ML-KEM (FIPS 203, derived from Kyber) for key encapsulation, ML-DSA (FIPS 204, derived from Dilithium) for digital signatures, and SLH-DSA (FIPS 205, derived from SPHINCS+) for hash-based signatures. All three are backed by the audited pure-Rust RustCrypto implementations, not placeholders. AES-GCM runs on the validated ring backend with AES-NI acceleration at roughly 9–10 GiB/s.

The REST API arrives with 28 composable middleware layers out of the box: rate limiting, circuit breakers, request-replay protection, tenant isolation, geo-IP enrichment, W3C trace propagation, HMAC payload signing, schema validation, slow-request detection, maintenance mode, and more — all configurable, all tested.
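
Composable here means each layer wraps the next handler and can either short-circuit or pass the request on. A minimal sketch of that pattern with boxed closures follows. Request, Handler, Layer, and the two example layers are hypothetical and far simpler than a production stack; the status codes and messages are illustrative, not HyperMachine's actual responses.

```rust
/// Hypothetical request carrying only what the example layers inspect.
pub struct Request {
    pub tenant: Option<String>,
    pub path: String,
}

/// Ok(body) or Err((status code, message)).
pub type Response = Result<String, (u16, String)>;
pub type Handler = Box<dyn Fn(&Request) -> Response>;
/// A layer takes the next handler and returns a wrapped handler.
pub type Layer = Box<dyn Fn(Handler) -> Handler>;

/// Rejects requests that carry no tenant id.
pub fn require_tenant() -> Layer {
    Box::new(|next: Handler| -> Handler {
        Box::new(move |req: &Request| -> Response {
            if req.tenant.is_none() {
                return Err((400, "missing X-Tenant-Id".to_string()));
            }
            next(req)
        })
    })
}

/// Short-circuits every request while maintenance is enabled.
pub fn maintenance_mode(enabled: bool) -> Layer {
    Box::new(move |next: Handler| -> Handler {
        Box::new(move |req: &Request| -> Response {
            if enabled {
                Err((503, "maintenance".to_string()))
            } else {
                next(req)
            }
        })
    })
}

/// Example endpoint that echoes the request path.
pub fn echo_path() -> Handler {
    Box::new(|req: &Request| -> Response { Ok(format!("ok {}", req.path)) })
}

/// Wraps `endpoint` so that the first layer in the list runs first.
pub fn compose(layers: Vec<Layer>, endpoint: Handler) -> Handler {
    layers.into_iter().rev().fold(endpoint, |next, layer| layer(next))
}
```

Ordering matters in this design: with maintenance_mode listed before require_tenant, a request without a tenant id during maintenance gets the 503 rather than the 400, because the outermost layer decides first.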


Pure Rust, zero placeholder stubs

HyperMachine is roughly 238,000 lines of Rust across 13 crates with zero todo!(), unimplemented!(), or placeholder stubs. The full stack, from the bare-metal boot sequence to REST middleware to GPU scheduling, is implemented in safe Rust with zero unsafe in business logic. It is covered by 4,600+ tests and passes cargo clippy -- -D warnings cleanly, with dependency advisories triaged in deny.toml.

git clone https://github.com/nervosys/HyperMachine && cd HyperMachine

# Build (excludes the nightly-only Type-1 crates)
cargo build --release --workspace --exclude hv1-core --exclude hv1-boot

Try it

HyperMachine 1.1 is open source under AGPL-3.0, with a commercial license available for use without the copyleft obligations. If you are building autonomous AI systems that need real isolation, real GPUs, and an interface an agent can actually reason about, we would love your feedback.

Check out HyperMachine on GitHub: https://github.com/nervosys/HyperMachine

Per aspera ad astra.
