Announcing SPINE: an agentic-first web stack — and an honest way to measure one

Adam Erickson · June 9, 2026 · 5 min read

The web was built for people reading documents. Agents aren't people, and they aren't reading. They stream tokens, call tools, advertise capabilities, ship embeddings, and coordinate in swarms — and they do all of it over a transport stack designed for browsers fetching HTML.

Today we're announcing SPINE (Synaptic Pathways INterconnecting Entities), an agentic-first web stack that treats the things modern LLM agents actually need as first-class wire primitives rather than JSON glued onto HTTP after the fact. Alongside it, we're shipping a major update to agentic-eval, our open-source benchmark for scoring how well a web stack fits agentic use.

Both are open source. SPINE is dual-licensed under AGPL-3.0-or-later with a commercial option; agentic-eval is AGPL-3.0-or-later.


What SPINE is

SPINE is a Rust workspace (28 crates) that makes agent-native operations first-class Message variants at the wire level:

  • Token streaming: StreamStart → StreamToken { seq, data, usage? } → StreamEnd, where StreamData carries Text | Bytes | ToolCall | Encoded(EncodedFrame). Latents and mid-stream function calls fall out of the same frame. The multiplex-aware StreamCancel cancels one stream by id instead of bluntly closing the socket like SSE.
  • Tool calling (MCP-shaped): ToolCall → ToolResult, mapping cleanly to Anthropic MCP and OpenAI function calling.
  • Capability discovery: a native CapabilityQuery (exact / prefix / semantic-by-embedding / all) → CapabilityAdvertisement with JSON Schema per capability.
  • Distributed tracing: W3C TraceContext attached inline on tool calls, results, and stream starts.
  • A neural encoder-decoder protocol: self-describing EncodedFrames that carry a codec id, modality, shape, and dtype inline, so a latent is its own schema.
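
To make the frame list above concrete, here is a minimal Rust sketch of how such messages could be modelled and demultiplexed on one connection. It is an illustration only, not SPINE's actual types: the `stream_id` field, the field types, the `EncodedFrame` layout, and the `Demux` helper are all assumptions; only the variant names come from the list above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a self-describing latent frame (layout assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct EncodedFrame {
    pub codec_id: String,
    pub modality: String,
    pub shape: Vec<usize>,
    pub dtype: String,
    pub payload: Vec<u8>,
}

/// Payload kinds named in the list above; inner field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamData {
    Text(String),
    Bytes(Vec<u8>),
    ToolCall { name: String, arguments: String },
    Encoded(EncodedFrame),
}

/// Streaming subset of the message set; `stream_id` is an assumed field.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    StreamStart { stream_id: u64 },
    StreamToken { stream_id: u64, seq: u64, data: StreamData, usage: Option<u32> },
    StreamEnd { stream_id: u64 },
    StreamCancel { stream_id: u64 },
}

/// Toy demultiplexer: accumulates text per stream on a single connection.
#[derive(Default)]
pub struct Demux {
    open: HashMap<u64, String>,
}

impl Demux {
    /// Returns the finished text when a stream ends, otherwise `None`.
    pub fn handle(&mut self, msg: Message) -> Option<(u64, String)> {
        match msg {
            Message::StreamStart { stream_id } => {
                self.open.insert(stream_id, String::new());
                None
            }
            Message::StreamToken { stream_id, data: StreamData::Text(text), .. } => {
                if let Some(buf) = self.open.get_mut(&stream_id) {
                    buf.push_str(&text);
                }
                None
            }
            // Non-text payloads are ignored in this toy example.
            Message::StreamToken { .. } => None,
            Message::StreamEnd { stream_id } => {
                self.open.remove(&stream_id).map(|text| (stream_id, text))
            }
            // Cancelling drops one stream; every other stream stays open.
            Message::StreamCancel { stream_id } => {
                self.open.remove(&stream_id);
                None
            }
        }
    }

    pub fn open_streams(&self) -> usize {
        self.open.len()
    }
}
```

Feeding two interleaved streams into `Demux` and cancelling one by id leaves the other untouched, which is the behavior the StreamCancel bullet contrasts with closing an SSE socket.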

And because the rest of the world speaks other protocols, SPINE ships three deployable bridges: a runnable MCP stdio server, an OpenAI-compatible /v1/chat/completions + /v1/embeddings gateway, and a reflection-enabled gRPC AgentService. A standard MCP, OpenAI, or gRPC client can drive a SPINE agent today with stock stubs.
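
As a rough illustration of what an MCP-facing bridge has to produce, the sketch below renders a tool call as the JSON-RPC 2.0 `tools/call` request that MCP defines. The `ToolCall` struct, its fields, and both function names are hypothetical, and the source does not show how SPINE's own bridge does this; only the target request shape comes from MCP.

```rust
/// Hypothetical SPINE-side tool call. `arguments_json` is assumed to
/// already hold a valid JSON object, so it is inserted verbatim.
pub struct ToolCall {
    pub id: u64,
    pub name: String,
    pub arguments_json: String,
}

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render the call as an MCP `tools/call` JSON-RPC 2.0 request.
pub fn to_mcp_request(call: &ToolCall) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{}}}}}"#,
        call.id,
        json_escape(&call.name),
        call.arguments_json
    )
}
```

A real bridge would use a JSON library and validate the arguments against the tool's schema; hand-formatting is used here only to keep the example dependency-free.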


agentic-eval: scoring a stack for agents, not browsers

agentic-eval ranks seven web stacks — SPINE, the OpenAI API, the Anthropic API, MCP, gRPC, plain HTTP+JSON, and GraphQL — on five agent-native axes:

  1. streaming — is LLM-shaped output a first-class frame, or a bolt-on?
  2. tool-discoverability — can an agent introspect capabilities from the protocol, or must it read prose?
  3. encoding-efficiency — wire compactness for the LLM/tool-call workload.
  4. interop — does the agent ecosystem actually speak this? (Network effects are real and we score them honestly.)
  5. security-primitives — auth, tracing, integrity, and per-message identity carried by the protocol itself.

Composite fitness is the unweighted mean of the five. Here's where the stacks land:

Stack          Fitness  Streaming  Tools  Encoding  Interop  Security
SPINE          0.90     0.98       0.95   0.95      0.67     0.95
gRPC           0.83     0.70       0.85   0.95      0.85     0.80
openai-api     0.69     0.85       0.70   0.35      1.00     0.55
anthropic-api  0.66     0.85       0.70   0.35      0.85     0.55
graphql        0.60     0.50       0.95   0.35      0.75     0.45
mcp            0.56     0.40       0.95   0.40      0.65     0.40
http-json      0.54     0.55       0.40   0.30      1.00     0.45

SPINE leads the composite, edging gRPC by +0.07 and the OpenAI API by +0.21. It is strongest on the axes it was designed for and at protobuf-parity on encoding.
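
The composite is simple enough to check by hand. The sketch below recomputes it as the unweighted mean of the five axis scores from the table; the `AxisScores` type and the `round2` helper are illustrative names, not agentic-eval's API.

```rust
/// Five per-axis scores for one stack (values taken from the table above).
pub struct AxisScores {
    pub streaming: f64,
    pub tools: f64,
    pub encoding: f64,
    pub interop: f64,
    pub security: f64,
}

impl AxisScores {
    /// Composite fitness: the unweighted mean of the five axes.
    pub fn composite(&self) -> f64 {
        (self.streaming + self.tools + self.encoding + self.interop + self.security) / 5.0
    }
}

/// Round to two decimals, matching the precision used in the table.
pub fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}

pub const SPINE: AxisScores = AxisScores {
    streaming: 0.98, tools: 0.95, encoding: 0.95, interop: 0.67, security: 0.95,
};
pub const GRPC: AxisScores = AxisScores {
    streaming: 0.70, tools: 0.85, encoding: 0.95, interop: 0.85, security: 0.80,
};
pub const OPENAI_API: AxisScores = AxisScores {
    streaming: 0.85, tools: 0.70, encoding: 0.35, interop: 1.00, security: 0.55,
};
```

With these inputs, `round2(SPINE.composite())` gives 0.90, gRPC gives 0.83, and the OpenAI API gives 0.69, which reproduces the +0.07 and +0.21 margins quoted above.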

And here's the axis we did not dress up: interop, at 0.67 — SPINE's weakest score. The bridges map the agentic surface, not SPINE's native binary frames, and a brand-new protocol has ~zero native install base. The OpenAI API scores a perfect 1.00 there for a reason. Closing that gap is a publish-and-get-users problem that no amount of code in the repo can fake.


The numbers

SPINE is fast, and we can show it against a real modern protocol. All figures below were re-measured on 2026-06-08 against the actual h2 HTTP/2 crate and real serde_json, on TCP loopback — not hand-rolled baselines.

vs real HTTP/2:

  • Single-stream latency: 1.6–2.4× faster
  • Single-stream throughput: 1.8–2.3× higher
  • N=64 pipelined multiplexing: ~32× (≈1.3M requests/sec on one connection)

Agentic workloads (vs HTTP/2 + JSON):

  • Embedding batches (1536-dim, RAG / fleet broadcast): ~6–25×
  • LLM token streaming: hundreds of millions of tokens/sec, 9–15× over HTTP/2+binary at large batches. OpenAI-style JSON-SSE, by contrast, caps near 10M tok/s and collapses on big batches.

These are loopback medians: the direction and order of magnitude reproduce run-to-run, but absolute peaks are bandwidth- and scheduler-bound and vary by machine. We say so, in the README and in the paper.
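
For readers who want to sanity-check a "median of runs" figure on their own machine, here is a small std-only Rust sketch of taking a median over repeated timings and turning two medians into a speedup ratio. It is not how the repository's benches are written (the source does not show them), and the function names are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Median of a set of timing samples; sorts the slice in place.
pub fn median(samples: &mut [Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    samples.sort();
    let mid = samples.len() / 2;
    if samples.len() % 2 == 1 {
        Some(samples[mid])
    } else {
        // Even count: average the two middle samples.
        Some((samples[mid - 1] + samples[mid]) / 2)
    }
}

/// Run `op` `runs` times and return the median wall-clock duration.
pub fn time_median<F: FnMut()>(runs: usize, mut op: F) -> Option<Duration> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        op();
        samples.push(start.elapsed());
    }
    median(&mut samples)
}

/// Speedup of `candidate` over `baseline` (values above 1.0 mean faster).
pub fn speedup(baseline: Duration, candidate: Duration) -> f64 {
    baseline.as_secs_f64() / candidate.as_secs_f64()
}
```

The median is less sensitive than the mean to the occasional scheduler stall, which is one reason it suits loopback measurements whose peaks vary by machine.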


The neural codec, and what it actually costs

SPINE makes the latent form a first-class payload. TitansLatentCodec (a real Titans Neural Long-Term Memory projector, not a stub) turns text into a fixed-width latent and frames it as a self-describing EncodedFrame. We benchmarked it for agentic use:

  • On the wire, it's compact: the frame is 66–71% smaller than its JSON form (dim 256: 1241 B vs 3942 B; dim 1024: 4314 B vs 14803 B), because the latent rides as a CBOR byte string instead of a JSON float array.
  • The encode is a real forward pass. Cost is superlinear in width — ~94 µs at dim 128, ~847 µs at 256, ~3.1 ms at 512, ~26 ms at 1024. That's the honest price of a semantic projection, paid once at the sender. It is not a memcpy, and we don't pretend it is.
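
The figures in the two bullets above can be checked with a few lines of arithmetic. In the sketch below, the 4-byte element size (f32) is an assumption, since the post does not state the latent dtype, and the function names are illustrative.

```rust
/// Percentage by which `compact` is smaller than `baseline`.
pub fn percent_smaller(compact: usize, baseline: usize) -> f64 {
    (1.0 - compact as f64 / baseline as f64) * 100.0
}

/// Raw size of a latent vector with `dim` elements.
pub fn raw_latent_bytes(dim: usize, bytes_per_elem: usize) -> usize {
    dim * bytes_per_elem
}

/// Bytes in the frame beyond the raw latent payload (metadata and framing).
pub fn framing_overhead(frame_bytes: usize, dim: usize, bytes_per_elem: usize) -> usize {
    frame_bytes.saturating_sub(raw_latent_bytes(dim, bytes_per_elem))
}

/// How many times more expensive one encode is than another.
pub fn cost_ratio(slower_us: f64, faster_us: f64) -> f64 {
    slower_us / faster_us
}

/// Reported encode costs in microseconds, as (dim, cost) pairs from the text.
pub const ENCODE_COST_US: [(usize, f64); 4] =
    [(128, 94.0), (256, 847.0), (512, 3100.0), (1024, 26000.0)];

/// True if every doubling of width more than doubles the cost.
pub fn is_superlinear(costs: &[(usize, f64)]) -> bool {
    costs.windows(2).all(|w| cost_ratio(w[1].1, w[0].1) > 2.0)
}
```

With the reported sizes, `percent_smaller(1241, 3942)` is about 68.5 and `percent_smaller(4314, 14803)` is about 70.9, both inside the quoted 66–71% range. If the latent is f32, the frame carries roughly 217–218 bytes beyond the raw vector at both widths. Each doubling of width in `ENCODE_COST_US` more than doubles the cost, which is what "superlinear" means here.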

It is also inherently encrypted. Because the latent is the output of a specific trained projector, the model weights are the key. Encoding text into an EncodedFrame and decoding it back both require the exact TitansLatentCodec weights used to produce it; without them, the frame is not recoverable to text or to a usable embedding. There is no separate cipher to negotiate and no key to ship out of band — a captured stream is opaque to anyone who does not hold the trained weights, and rotating the weights rotates the key.

That cost/benefit lives in agentic-eval's evidence too: the wire-size win backs SPINE's 0.95 encoding score; the encode latency is recorded right next to it.


What changed in agentic-eval

This release (v0.14.x) re-scored SPINE after recent protocol work and, just as importantly, anchored every SPINE evidence string in a runnable benchmark:

  • the transport head-to-head (spine_vs_http2, agentic_ai_workload, llm_tok_per_sec),
  • the neural codec (neural_codec_bench),
  • the wire-size measurements (wire_sizes).

Scores moved only where real capability moved (the gRPC bridge maturing lifted interop 0.15 → 0.67 across releases). The benchmarks substantiate the scores; they were not used to inflate them. Directional tests assert the judgments that should hold — e.g. gRPC's install base beats SPINE's on interop, and SPINE's per-message Ed25519 signatures beat channel-only mTLS on security — and they still pass.
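
A directional test of the kind described above can be very small. The sketch below checks two orderings using the interop and security values from the score table. The type and function names are hypothetical rather than agentic-eval's own, and comparing SPINE with gRPC on security is an assumption about which stack stands in for "channel-only mTLS".

```rust
/// The two axes used by the directional checks below.
#[derive(Debug, Clone, Copy)]
pub struct StackScores {
    pub name: &'static str,
    pub interop: f64,
    pub security: f64,
}

// Values copied from the score table earlier in the post.
pub const SPINE: StackScores = StackScores { name: "SPINE", interop: 0.67, security: 0.95 };
pub const GRPC: StackScores = StackScores { name: "gRPC", interop: 0.85, security: 0.80 };

/// Returns a description of every directional judgment that fails to hold.
pub fn directional_violations(spine: &StackScores, grpc: &StackScores) -> Vec<String> {
    let mut violations = Vec::new();
    // The incumbent's install base should outrank a brand-new protocol.
    if grpc.interop <= spine.interop {
        violations.push(format!(
            "expected {} interop > {} interop", grpc.name, spine.name
        ));
    }
    // Per-message signatures should outrank channel-only protection.
    if spine.security <= grpc.security {
        violations.push(format!(
            "expected {} security > {} security", spine.name, grpc.name
        ));
    }
    violations
}
```

The point of such checks is that they constrain the ordering rather than the exact values, so a re-score that accidentally put SPINE ahead of gRPC on interop would fail loudly.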


Try it

# Clone and build
git clone https://github.com/nervosys/SPINE
cd SPINE
cargo build --release

# Reproduce the benchmarks yourself
cargo bench -p spine-transport --bench spine_vs_http2
cargo bench -p spine-transport --bench agentic_ai_workload
cargo bench -p spine-transport --bench llm_tok_per_sec
cargo bench -p spine-protocol  --bench neural_codec_bench

# Score the web stacks
cargo run -p agentic-eval --example web_benchmark

For commercial licensing (closed-source SaaS, on-prem, embedded), contact opensource@nervosys.ai.


What's next

The honest read on SPINE today: it wins the agent-native axes it was built for, it's at protobuf-parity on encoding with a latent data plane nothing else has natively, and its one real weakness is adoption. The transport is never the bottleneck — an LLM generates 50–200 tokens/sec/user; SPINE moves them by the hundreds of millions. The work that matters now isn't another order of magnitude on a microbenchmark. It's getting the bridges into real agent runtimes and turning a 0.67 interop score into an earned one.

We'll keep measuring it the same way: in public, reproducibly, and with the numbers we can actually stand behind.

Per aspera ad astra.
