Production LLMs don't fit on a single gaming GPU. We shard them across a coalition of consumer cards, layer-pipelined over P2P — and pay every operator by the second.
The datacenter cards that can host them cost $30k each and are perpetually back-ordered, so inference funnels to three hyperscalers while hundreds of millions of capable consumer GPUs sit idle. Each one alone is too small to host a real model.
ComputePool splits a model layer-wise across two or more consumer GPUs. The entry shard runs the first half and streams hidden states over a P2P transport to the exit shard, which finishes the forward pass and samples the next token. No single card has to fit the whole model — the coalition does.
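
To make the split concrete, here is a minimal sketch in plain PyTorch, with a toy decoder stack standing in for a real model like Qwen3-4B. The class names (EntryShard, ExitShard), the encoder-layer blocks, and the halfway split point are illustrative only, not the actual loader.

```python
# Toy sketch of the layer-wise split. nn.TransformerEncoderLayer is a
# non-causal stand-in for a real decoder block; names are illustrative.
import torch
import torch.nn as nn

VOCAB, DIM, N_LAYERS = 32_000, 512, 8

def make_blocks(n):
    return nn.ModuleList(
        nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True)
        for _ in range(n)
    )

class EntryShard(nn.Module):
    """First half of the model: embeddings + lower layers. Emits hidden states."""
    def __init__(self, blocks):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.blocks = blocks

    def forward(self, token_ids):
        h = self.embed(token_ids)
        for blk in self.blocks:
            h = blk(h)
        return h  # hidden states, shipped to the exit shard over the transport

class ExitShard(nn.Module):
    """Second half: upper layers + head. Consumes hidden states, samples a token."""
    def __init__(self, blocks):
        super().__init__()
        self.blocks = blocks
        self.norm = nn.LayerNorm(DIM)
        self.head = nn.Linear(DIM, VOCAB, bias=False)

    def forward(self, hidden):
        for blk in self.blocks:
            hidden = blk(hidden)
        logits = self.head(self.norm(hidden))[:, -1, :]   # last position only
        return torch.multinomial(torch.softmax(logits, dim=-1), 1)

split_at = N_LAYERS // 2
blocks = make_blocks(N_LAYERS)
entry = EntryShard(blocks[:split_at])     # loaded on GPU A
exit_ = ExitShard(blocks[split_at:])      # loaded on GPU B

prompt = torch.randint(0, VOCAB, (1, 5))
hidden = entry(prompt)                    # this tensor crosses the P2P link
next_token = exit_(hidden)
```

Neither card ever holds the full parameter set; the only thing that moves between them is the hidden-state tensor at the split point.
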
The orchestrator picks a coalition and tells each node which layer slice to load. Hidden states travel from entry to exit over a P2P transport (AXL); the sampled token comes back the same way. Payments ride alongside on x402 + Superfluid.
Each token requires a hidden-state hop forward and a sampled-token hop back. The orchestrator never touches activations — the P2P transport keeps the loop tight even on consumer hardware.
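
The per-token loop then looks roughly like the sketch below, reusing the EntryShard/ExitShard toys from the previous block. AXL's send/receive API isn't reproduced here, so the transport is mocked with in-process queues and tensors cross the mock wire as serialized bytes; a real deployment would also keep KV caches on both shards rather than re-running the prefix each step.

```python
# Sketch of the two-hop-per-token loop with a mocked transport.
import io
import queue
import torch

forward_link = queue.Queue()   # entry -> exit: hidden states
return_link = queue.Queue()    # exit -> entry: sampled token id

def to_wire(tensor: torch.Tensor) -> bytes:
    buf = io.BytesIO()
    torch.save(tensor.cpu(), buf)
    return buf.getvalue()

def from_wire(payload: bytes) -> torch.Tensor:
    return torch.load(io.BytesIO(payload))

def generate(entry, exit_, prompt_ids: torch.Tensor, max_new_tokens: int):
    """Drive the entry/exit shards from the previous sketch for a few tokens."""
    ids = prompt_ids
    for _ in range(max_new_tokens):
        # Hop 1: entry shard computes hidden states and ships them forward.
        forward_link.put(to_wire(entry(ids)))

        # Exit shard finishes the forward pass and samples the next token.
        hidden = from_wire(forward_link.get())
        return_link.put(to_wire(exit_(hidden)))

        # Hop 2: the sampled token comes back and is appended to the context.
        next_id = from_wire(return_link.get())
        ids = torch.cat([ids, next_id], dim=1)

        # NOTE: this naive loop re-runs the whole prefix every step; a real
        # deployment keeps KV caches on both shards so each step only needs
        # the newest token.
    return ids
```
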

KeeperHub orchestrates on-chain workflows but had no native way to handle continuous payouts or multi-party operator commitments. We upstreamed both — Superfluid streams and a Coalition plugin with slashing — and unified them with x402 into one workflow primitive.
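
KeeperHub's actual plugin interface isn't reproduced here; the sketch below only illustrates the kind of state a Coalition plugin with slashing has to track. Every name, bond size, and penalty fraction is an assumption for illustration, not the upstreamed implementation.

```python
# Illustrative coalition state: each operator bonds stake for a request, and a
# missed liveness window slashes a fixed fraction of that bond. All values are
# hypothetical.
import time
from dataclasses import dataclass, field

SLASH_FRACTION = 0.10          # assumed: 10% of bond per missed window
LIVENESS_WINDOW_S = 30         # assumed: operators must check in every 30s

@dataclass
class OperatorCommitment:
    address: str               # operator's on-chain address
    bond_wei: int              # stake locked for this coalition
    last_heartbeat: float = field(default_factory=time.monotonic)

@dataclass
class Coalition:
    request_id: str
    members: list[OperatorCommitment]

    def check_liveness(self) -> list[str]:
        """Slash any member whose heartbeat is older than the liveness window."""
        now, slashed = time.monotonic(), []
        for m in self.members:
            if now - m.last_heartbeat > LIVENESS_WINDOW_S:
                penalty = int(m.bond_wei * SLASH_FRACTION)
                m.bond_wei -= penalty
                slashed.append(m.address)
        return slashed
```
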

0G has high-throughput compute, but only datacenter-class GPUs qualify — and there's no native streaming-money primitive. We shipped both: deterministic Superfluid contracts via CREATE2, and an SDK that fuses consumer cards into one virtual compute target while preserving 0G's signing model.
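
The CREATE2 part follows the standard EIP-1014 derivation, which is what makes the contract addresses deterministic: anyone can precompute where a Superfluid contract will land before it is deployed. The sketch below shows that derivation with placeholder deployer, salt, and init code, not the project's actual deployment values.

```python
# address = keccak256(0xff ++ deployer ++ salt ++ keccak256(init_code))[12:]
from eth_utils import keccak, to_checksum_address

def create2_address(deployer: str, salt: bytes, init_code: bytes) -> str:
    """Compute the EIP-1014 (CREATE2) contract address."""
    assert len(salt) == 32
    preimage = (
        b"\xff"
        + bytes.fromhex(deployer.removeprefix("0x"))
        + salt
        + keccak(init_code)
    )
    return to_checksum_address(keccak(preimage)[12:])

# Placeholder inputs: a well-known CREATE2 factory address, a trivial salt,
# and arbitrary init code. Same inputs always yield the same address on any chain.
addr = create2_address(
    deployer="0x4e59b44847b379578588920cA78FbF26c0B4956C",
    salt=b"\x00" * 31 + b"\x01",
    init_code=bytes.fromhex("600a600c600039600a6000f3602a60005260206000f3"),
)
print(addr)
```
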

AXL is Gensyn's P2P compute mesh. We turned it into a turnkey sharding fabric for AI: layer-pipelined inference over AXL, packaged in prebuilt Docker images, and meshed with Tailscale so operators never expose a public port.
Live on 0G testnet: Qwen3-4B running across two RTX 4090s. Hidden states cross the AXL transport every ~90ms; the Superfluid meter ticks every 50ms. Both shards earn while the request is open.
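
For context on the meter: a Superfluid stream is defined by an integer flow rate in wei of the Super Token per second, and balances accrue continuously from that rate, so a 50ms tick is a client-side read of the accrued amount rather than an on-chain event. The price and the even operator split in the sketch below are hypothetical, not ComputePool's actual rates.

```python
# Flow-rate arithmetic for a streamed payout split across two shard operators.
from decimal import Decimal

WEI_PER_TOKEN = 10**18
SECONDS_PER_MONTH = 30 * 24 * 3600

def flow_rate_wei_per_s(monthly_tokens: Decimal) -> int:
    """Convert a monthly payout (in whole Super Tokens) to a per-second flow rate."""
    return int(monthly_tokens * WEI_PER_TOKEN / SECONDS_PER_MONTH)

# Hypothetical: a request streaming 10 Super Tokens/month, split evenly
# between the entry and exit shard operators.
total = flow_rate_wei_per_s(Decimal(10))
per_shard = total // 2

# What a 50ms client-side tick would show accruing per shard:
accrued_per_tick_wei = per_shard * Decimal("0.05")
print(total, per_shard, int(accrued_per_tick_wei))
```
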
A 4B–8B open model in 2024 matches GPT-3.5 quality from 2023. Two prosumer GPUs can serve it together. The crypto crash left a glut of cards looking for a job. The pieces only just lined up.
The cloud sells you the only GPU big enough to fit your model. We turn the GPUs you already have into one big enough — together. The TAM is every consumer GPU not currently running production AI.
Every other "decentralized GPU" product still requires the model to fit on one host. Together and Bedrock just resell big-iron capacity. We're the only ones letting two consumer cards behave like one production GPU.
Push the sharded-inference frontier to 70B-class models on consumer rigs, scale to 250 operators across three regions, and grow streaming GMV to $40M/yr.