ComputePool

Production inference
on the GPUs
you already own.

Production LLMs don't fit on a single gaming GPU. We shard them across a coalition of consumer cards, layer-pipelined over P2P — and pay every operator by the second.

Sharded inference · x402 payments · Superfluid streams
SERIES SEED · Q2 2026
Problem

A 70B model needs 140 GB.
A 4090 has 24.

Production LLMs don't fit on consumer GPUs — and datacenter cards that do cost $30k each and are perpetually back-ordered. So inference funnels to three hyperscalers, while hundreds of millions of capable consumer GPUs sit idle. Each one alone is too small to host a real model.

140 GB
VRAM to serve a 70B model in fp16
24 GB
VRAM on a flagship RTX 4090
$30k+
price tag of a single H100
~400M
consumer GPUs sitting idle worldwide
Diagram · the hyperscaler status quo · monthly billing · long contracts · opaque pricing · vendor lock-in
Solution

Shard the model. Cooperate. Serve.

ComputePool splits a model layer-wise across two or more consumer GPUs. The entry shard runs the first half and streams hidden states over a P2P transport to the exit shard, which finishes the forward pass and samples the next token. No single card has to fit the whole model — the coalition does.

01
Layer-wise sharding
A 12B model splits cleanly across two 24GB GPUs. Each operator only loads its slice — embeddings + early layers, or late layers + lm_head.
02
P2P hidden-state transport
Activations move directly between operators over an authenticated mesh — no orchestrator round-trip per token.
03
Pay per second of compute
x402 opens the session; Superfluid streams USDCx to every operator in the coalition while inference runs.
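
A minimal sketch of the split itself, using a toy PyTorch decoder rather than ComputePool's worker code: the entry shard owns the embeddings plus the first half of the blocks, the exit shard owns the remaining blocks plus the lm_head, and one forward pass hops between them. Every size and name below is illustrative.

```python
# Toy layer-wise split (illustrative sizes, not Qwen3-4B; not production code).
# In production the hand-off between the two shards crosses the P2P transport
# instead of staying in-process.
import torch
import torch.nn as nn

VOCAB, D_MODEL, N_LAYERS, MID = 1000, 256, 8, 4

def make_block() -> nn.Module:
    # Plain transformer block as a stand-in for a real decoder layer.
    return nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)

class EntryShard(nn.Module):
    """Embeddings + layers 0..MID-1; outputs hidden states."""
    def __init__(self) -> None:
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.blocks = nn.ModuleList(make_block() for _ in range(MID))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(token_ids)
        for block in self.blocks:
            h = block(h)
        return h  # this tensor is what ships to the exit shard

class ExitShard(nn.Module):
    """Layers MID..N-1 + lm_head; samples the next token."""
    def __init__(self) -> None:
        super().__init__()
        self.blocks = nn.ModuleList(make_block() for _ in range(N_LAYERS - MID))
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            hidden = block(hidden)
        logits = self.lm_head(hidden[:, -1, :])
        return torch.argmax(logits, dim=-1)  # greedy sampling, for brevity

entry, exit_shard = EntryShard(), ExitShard()
prompt = torch.randint(0, VOCAB, (1, 16))
next_token = exit_shard(entry(prompt))  # one forward pass, two shards
```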
Architecture

One model, split in two.

The orchestrator picks a coalition and tells each node which layer slice to load. Hidden states travel from entry to exit over a P2P transport (AXL); the sampled token comes back the same way. Payments ride alongside on x402 + Superfluid.

entry shard · layers 0..mid
Holds embeddings + first half of transformer blocks. Outputs hidden states.
exit shard · layers mid..N
Finishes the forward pass + lm_head, samples the next token, ships it back.
x402 opens session · Superfluid streams payouts
Diagram · QWEN3-4B-INSTRUCT · 36 LAYERS · 8.0 GB
node-a · entry shard · RTX 4090 · 24 GB · embed + layers 0..17 · VRAM 4 / 24 GB
node-b · exit shard · RTX 4090 · 24 GB · layers 18..35 + lm_head · VRAM 4 / 24 GB
hidden states → / sampled token ← over AXL P2P
orchestrator · routes · authenticates · settles · x402 voucher + Superfluid stream
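
How the orchestrator might pick the split point is sketched below; the Node fields, the VRAM-proportional rule, and assign_slices are hypothetical, not the production scheduler.

```python
# Hypothetical slice assignment for a two-node coalition. The real scheduler
# would also weigh bandwidth, attestation state, and KV-cache headroom.
from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    vram_gb: float

def assign_slices(nodes: list[Node], n_layers: int) -> dict[str, range]:
    """Split n_layers across two nodes in proportion to their VRAM."""
    a, b = nodes
    mid = round(n_layers * a.vram_gb / (a.vram_gb + b.vram_gb))
    mid = min(max(mid, 1), n_layers - 1)   # every shard gets at least one layer
    return {
        a.node_id: range(0, mid),          # entry: embeddings + layers 0..mid-1
        b.node_id: range(mid, n_layers),   # exit: layers mid..N-1 + lm_head
    }

coalition = [Node("node-a", 24.0), Node("node-b", 24.0)]
print(assign_slices(coalition, n_layers=36))
# {'node-a': range(0, 18), 'node-b': range(18, 36)}: the Qwen3-4B split above
```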
Request lifecycle

One token, end to end.

Each token requires a hidden-state hop forward and a sampled-token hop back. The orchestrator never touches activations — the P2P transport keeps the loop tight even on consumer hardware.

Sequence (Client · Orchestrator · Entry shard · Exit shard · Payment rails):
POST /infer + x402 voucher → open Superfluid flow → load layers 0..mid / mid..N → hidden states · AXL → sampled token · AXL → next hidden states (loop) → EOS · final tokens → 200 OK · stream
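
The same loop in miniature, with placeholder stand-ins for the two shards (the real hops cross AXL and real sampling happens on the exit shard's lm_head): one hidden-state hop out, one sampled token back, repeated until EOS.

```python
# Illustrative per-token loop; entry_forward and exit_sample are placeholders,
# not the AXL wire protocol. The orchestrator brokers the session but never
# sees the activations moving between the shards.
import random

EOS = 2  # hypothetical end-of-sequence token id

def entry_forward(tokens: list[int]) -> list[float]:
    """Stand-in for the entry shard: embeddings + early layers -> hidden states."""
    return [float(t) for t in tokens]

def exit_sample(hidden: list[float]) -> int:
    """Stand-in for the exit shard: late layers + lm_head -> next token id."""
    return random.choice([5, 7, 11, EOS])

def generate(prompt: list[int], max_new_tokens: int = 32) -> list[int]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        hidden = entry_forward(tokens)    # hop 1: hidden states, entry -> exit
        next_token = exit_sample(hidden)  # hop 2: sampled token, exit -> entry
        tokens.append(next_token)
        if next_token == EOS:             # the Superfluid meter stops here
            break
    return tokens

print(generate([1, 3, 4]))
```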
Innovation · Keeperhub
Upstream contributions

We brought streaming money
to the workflow layer.

KeeperHub orchestrates on-chain workflows but had no native way to handle continuous payouts or multi-party operator commitments. We upstreamed both — Superfluid streams and a Coalition plugin with slashing — and unified them with x402 into one workflow primitive.

Superfluid plugin
Native Superfluid actions — open, update, close streams — callable from any Keeperhub workflow.
Coalition plugin
Multi-party on-chain commitments with slashing — N operators commit to serve a model; the keeper enforces and slashes any that breach.
x402 + streams
Atomic onboarding plus per-second metering, packaged as one workflow primitive. The shape every API economy lands on.
x402 · Superfluid
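
A back-of-envelope sketch of the per-second metering: turning a USDCx-per-second price into Superfluid-style wei-per-second flow rates split across the coalition. The 18-decimal assumption matches Superfluid super tokens; the 50/50 split, the rate (taken from the live-demo slide later in this deck), and the helper names are illustrative, not the plugin's API.

```python
# Convert a human-readable USDCx/s session price into per-operator flow rates.
# Superfluid meters flows as wei per second on an 18-decimal super token.
from decimal import Decimal

DECIMALS = 10**18  # USDCx (a Superfluid super token) uses 18 decimals on-chain

def to_flow_rate(usdcx_per_second: Decimal) -> int:
    """USDCx per second -> integer wei-per-second flow rate."""
    return int(usdcx_per_second * DECIMALS)

def split_flow(total: Decimal, shares: dict[str, Decimal]) -> dict[str, int]:
    """Split the user's outflow across coalition operators by share (assumed 50/50)."""
    return {op: to_flow_rate(total * share) for op, share in shares.items()}

session_rate = Decimal("0.0084")  # USDCx per second while inference runs
payouts = split_flow(session_rate, {"node-a": Decimal("0.5"), "node-b": Decimal("0.5")})
print(payouts)  # {'node-a': 4200000000000000, 'node-b': 4200000000000000} wei/s
```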
Innovation · 0G
Protocol contributions

Superfluid live on 0G.
Consumer GPUs unlocked.

0G has high-throughput compute, but only datacenter-class GPUs qualify — and there's no native streaming-money primitive. We shipped both: deterministic Superfluid contracts via CREATE2, and an SDK that fuses consumer cards into one virtual compute target while preserving 0G's signing model.
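
Why CREATE2 makes the deployment deterministic: per EIP-1014 the contract address is a pure function of the deployer, a 32-byte salt, and the init code, so the same Superfluid contracts land at the same addresses on any chain that hosts the same factory. The sketch below uses a widely used CREATE2 factory address plus placeholder salt and bytecode for illustration; it is not our deployment script.

```python
# EIP-1014 (CREATE2) address derivation:
#   address = keccak256(0xff ++ deployer ++ salt ++ keccak256(init_code))[12:]
from eth_utils import keccak, to_bytes, to_checksum_address

def create2_address(deployer: str, salt: bytes, init_code: bytes) -> str:
    """Compute the deterministic address a CREATE2 deploy will land on."""
    assert len(salt) == 32
    preimage = b"\xff" + to_bytes(hexstr=deployer) + salt + keccak(init_code)
    return to_checksum_address("0x" + keccak(preimage)[12:].hex())

deployer = "0x4e59b44847b379578588920cA78FbF26c0B4956C"  # common CREATE2 factory
salt = (0).to_bytes(32, "big")                            # placeholder salt
init_code = bytes.fromhex("600a600c600039600a6000f3")     # placeholder init code
print(create2_address(deployer, salt, init_code))
```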

Superfluid on 0G
CREATE2-deployed, source-verified Superfluid contracts on 0G testnet — first per-second money streams the chain has ever had. Public, callable by anyone.
Pooled-GPU SDK
0G Compute mandates high-end GPUs; consumer cards are excluded. Our SDK pools them into one logical accelerator — small cards qualify together.
4× RTX 3090 → 1 virtual H100-class target
TEE orchestrator
The orchestrator runs inside a Trusted Execution Environment, so 0G's native signing & attestation flow stays intact end-to-end. No protocol downgrade for distributed inference.
SGX · attested
Innovation · AXL (Gensyn)
Transport + deployment

What Gensyn drew on the whiteboard,
we put in production.

AXL is Gensyn's P2P compute mesh. We turned it into a turnkey sharding fabric for AI: layer-pipelined inference over AXL, packaged in prebuilt Docker images, and meshed with Tailscale so operators never expose a public port.
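
AXL's wire API isn't reproduced here. As a stand-in, the sketch below shows the shape of the hop: a length-prefixed frame of fp16 activations out to the exit shard and a token id back, demonstrated on loopback instead of the Tailscale mesh.

```python
# Shape of the hidden-state hop (NOT the AXL API): length-prefixed fp16 frame
# out, sampled token id back. Runs on loopback here; in production the peer
# address would be the exit shard's private mesh address.
import socket
import struct
import threading
import numpy as np

HOST, PORT = "127.0.0.1", 7000

def exit_shard_stub(srv: socket.socket) -> None:
    """Toy exit shard: accept one activation frame, reply with a fake token id."""
    conn, _ = srv.accept()
    with conn:
        (length,) = struct.unpack("!I", conn.recv(4))
        buf = b""
        while len(buf) < length:
            buf += conn.recv(length - len(buf))
        hidden = np.frombuffer(buf, dtype=np.float16)
        conn.sendall(struct.pack("!I", int(hidden.argmax())))  # stand-in for sampling

def send_hidden_states(hidden: np.ndarray) -> int:
    """Entry-shard side: ship activations, wait for the sampled token id."""
    payload = hidden.astype(np.float16).tobytes()
    with socket.create_connection((HOST, PORT)) as conn:
        conn.sendall(struct.pack("!I", len(payload)) + payload)
        (token_id,) = struct.unpack("!I", conn.recv(4))
    return token_id

server = socket.create_server((HOST, PORT))
threading.Thread(target=exit_shard_stub, args=(server,), daemon=True).start()
print(send_hidden_states(np.random.randn(1, 16, 256).astype(np.float16)))
```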

Sharded inference over AXL
First production deployment of layer-pipelined LLM inference on the AXL transport. Hidden states cross AXL frames; sampled tokens come back the same way.
live · qwen-pool-1
One-line deploy
Prebuilt NVIDIA + CPU images bundle AXL, the worker, and CUDA. Operators run a single command — no toolchain.
$ docker compose up dis-com
Tailscale-native
AXL traffic rides a Tailscale mesh. Zero exposed ports, zero firewall edits — operators stay invisible to the public internet.
0 ports open · WireGuard mesh
Live · running on 0G testnet

A 4B model,
two consumer cards.

Live on 0G testnet: Qwen3-4B running across two RTX 4090s. Hidden states cross the AXL transport every ~90ms; the Superfluid meter ticks every 50ms. Both shards earn while the request is open.

1. Entry shard loads layers 0..17 (≈4 GB VRAM)
2. Exit shard loads layers 18..35 + lm_head
3. Each token: one forward hop, one token back
4. 11 tok/s sustained · meter stops on EOS
Live stream · 0G testnet · streaming
user · 0x7a4f…c19e · 0.5000 · −0.0084 USDCx/s
node-a · 0xaaa…1234 · +0.0000 (accruing)
node-b · 0xbbb…5678 · +0.0000 (accruing)
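
Back-of-envelope from the demo figures above (11 tok/s sustained, 0.0084 USDCx/s outflow): what the stream works out to per token and per minute.

```python
# Arithmetic only; both inputs are the figures shown on this slide.
tok_per_s = 11                # sustained throughput on the two-4090 coalition
rate_usdcx_per_s = 0.0084     # user's outbound Superfluid flow while streaming

ms_per_token = 1000 / tok_per_s                  # ~91 ms, consistent with the ~90 ms hop
usdcx_per_token = rate_usdcx_per_s / tok_per_s   # ~0.00076 USDCx per generated token
usdcx_per_minute = rate_usdcx_per_s * 60         # ~0.50 USDCx per minute of inference

print(f"{ms_per_token:.0f} ms/token · {usdcx_per_token:.5f} USDCx/token · "
      f"{usdcx_per_minute:.3f} USDCx/min")
```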
Why now

Open models small enough to shard.
Consumer GPUs idle enough to host them.

A 4B–8B open model in 2024 matches GPT-3.5 quality from 2023. Two prosumer GPUs can serve it together. The crypto crash left a glut of cards looking for a job. The pieces only just lined up.

2022 · Crypto bust → idle gaming GPUs
2024 · Open 4B/8B models match GPT-3.5
2024 · x402 standard ships
2025 · Superfluid live on 0G
2026 · ComputePool
Open models hit production quality
Llama-3.2, Qwen3 — 4B/8B params clear the bar that needed 175B in 2022.
Idle prosumer GPUs everywhere
Post-mining 30/40-series cards and gaming rigs sit at <10% utilization.
Payment + transport rails ship
x402 (HTTP 402) and Superfluid streams on 0G — settle and meter at second resolution.
Market

Hundreds of millions
of stranded GPUs.

The cloud sells you the only GPU big enough to fit your model. We turn the GPUs you already have into one big enough — together. The TAM is every consumer GPU not currently running production AI.

$6.4B
serviceable obtainable market
Decentralized inference on consumer GPUs, 2027 (a16z, Multicoin)
Sharded inference on consumer GPUs (SAM) · $6.4B
AI inference market (TAM) · $42.1B
Idle prosumer GPU compute (asset value) · $220B
Competition

The only network that shards.

Every other "decentralized GPU" product still requires the model to fit on one host. Together and Bedrock just resell big-iron capacity. We're the only ones letting two consumer cards behave like one production GPU.

ComputePool | Akash | Together AI | AWS Bedrock
Shards across consumer GPUs: Yes — layer-pipelined | No — one container, one host | No — hosted-only | No — single big GPU
Min hardware per operator: 24 GB consumer card | Whole-host VM | n/a | n/a
Pricing model: Per-second streaming | Per-second | Per-token | Hourly + contracts
Settlement: On-chain (0G testnet) | On-chain | Off-chain | Net-30 invoice
Operators: Permissionless | Permissionless | Hosted | Single vendor
Roadmap

From two-card splits to the full model frontier.

shipping
Q2 2026
2-way shards
Llama-3.2 / Qwen3-4B across pairs of 24 GB consumer cards
Q3 2026
N-way shards
4- and 8-way coalitions unlock 30B–70B class models on prosumer rigs
Q4 2026
Heterogeneous coalitions
Mix 3090s, 4090s, M-series Macs; orchestrator balances by VRAM + bandwidth
Q1 2027
Verifiable inference
zk-proofs of computation per shard — slashing for incorrect activations
The ask

$4M seed.
18 months.

Push the sharded-inference frontier to 70B-class models on consumer rigs, scale to 250 operators across three regions, and grow streaming GMV to $40M/yr.

55%
Sharding R&D · larger N-way splits
25%
Operator onboarding + GTM
20%
On-chain liquidity + audits
Contact
Founders
Philo & Freedan
hello@philotheephilix.in
On-chain
0G testnet · 0xCp00…1ED
Try the live product →