ComputePool

Production inference
on the GPUs
you already own.

Production LLMs don't fit on a single gaming GPU. We shard them across a coalition of consumer cards, layer-pipelined over P2P — and pay every operator by the second.

Sharded inference · x402 payments · Superfluid streams
SERIES SEED · Q2 2026
Problem

A 70B model needs 140 GB.
A 4090 has 24.

Production LLMs don't fit on consumer GPUs — and datacenter cards that do cost $30k each and are perpetually back-ordered. So inference funnels to three hyperscalers, while hundreds of millions of capable consumer GPUs sit idle. Each one alone is too small to host a real model.

140 GB
VRAM to serve a 70B model in fp16
24 GB
VRAM on a flagship RTX 4090
$30k+
price tag of a single H100
~400M
consumer GPUs sitting idle worldwide
Diagram · the hyperscaler status quo · monthly billing · long contracts · opaque pricing · vendor lock-in
Solution

Shard the model. Cooperate. Serve.

ComputePool splits a model layer-wise across two or more consumer GPUs. The entry shard runs the first half and streams hidden states over a P2P transport to the exit shard, which finishes the forward pass and samples the next token. No single card has to fit the whole model — the coalition does.

01
Layer-wise sharding
A 12B model splits cleanly across two 24GB GPUs. Each operator only loads its slice — embeddings + early layers, or late layers + lm_head.
02
P2P hidden-state transport
Activations move directly between operators over an authenticated mesh — no orchestrator round-trip per token.
03
Pay per second of compute
x402 opens the session; Superfluid streams USDCx to every operator in the coalition while inference runs.
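
A minimal sketch of the split itself, using a toy PyTorch decoder rather than ComputePool's worker code: the entry shard owns the embeddings plus the first half of the blocks, the exit shard owns the remaining blocks plus the lm_head, and one forward pass hops between them. Every size and name below is illustrative.

```python
# Toy layer-wise split (illustrative sizes, not Qwen3-4B; not production code).
# In production the hand-off between the two shards crosses the P2P transport
# instead of staying in-process.
import torch
import torch.nn as nn

VOCAB, D_MODEL, N_LAYERS, MID = 1000, 256, 8, 4

def make_block() -> nn.Module:
    # Plain transformer block as a stand-in for a real decoder layer.
    return nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)

class EntryShard(nn.Module):
    """Embeddings + layers 0..MID-1; outputs hidden states."""
    def __init__(self) -> None:
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.blocks = nn.ModuleList(make_block() for _ in range(MID))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(token_ids)
        for block in self.blocks:
            h = block(h)
        return h  # this tensor is what ships to the exit shard

class ExitShard(nn.Module):
    """Layers MID..N-1 + lm_head; samples the next token."""
    def __init__(self) -> None:
        super().__init__()
        self.blocks = nn.ModuleList(make_block() for _ in range(N_LAYERS - MID))
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            hidden = block(hidden)
        logits = self.lm_head(hidden[:, -1, :])
        return torch.argmax(logits, dim=-1)  # greedy sampling, for brevity

entry, exit_shard = EntryShard(), ExitShard()
prompt = torch.randint(0, VOCAB, (1, 16))
next_token = exit_shard(entry(prompt))  # one forward pass, two shards
```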
Architecture

One model, split in two.

The orchestrator picks a coalition and tells each node which layer slice to load. Hidden states travel from entry to exit over a P2P transport (AXL); the sampled token comes back the same way. Payments ride alongside on x402 + Superfluid.

entry shard · layers 0..mid
Holds embeddings + first half of transformer blocks. Outputs hidden states.
exit shard · layers mid..N
Finishes the forward pass + lm_head, samples the next token, ships it back.
x402 opens session · Superfluid streams payouts
Diagram · QWEN3-4B-INSTRUCT · 36 LAYERS · 8.0 GB
node-a · entry shard · RTX 4090 · 24 GB · embed + layers 0..17 · VRAM 4 / 24 GB
node-b · exit shard · RTX 4090 · 24 GB · layers 18..35 + lm_head · VRAM 4 / 24 GB
hidden states → / sampled token ← over AXL P2P
orchestrator · routes · authenticates · settles · x402 voucher + Superfluid stream
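
How the orchestrator might pick the split point is sketched below; the Node fields, the VRAM-proportional rule, and assign_slices are hypothetical, not the production scheduler.

```python
# Hypothetical slice assignment for a two-node coalition. The real scheduler
# would also weigh bandwidth, attestation state, and KV-cache headroom.
from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    vram_gb: float

def assign_slices(nodes: list[Node], n_layers: int) -> dict[str, range]:
    """Split n_layers across two nodes in proportion to their VRAM."""
    a, b = nodes
    mid = round(n_layers * a.vram_gb / (a.vram_gb + b.vram_gb))
    mid = min(max(mid, 1), n_layers - 1)   # every shard gets at least one layer
    return {
        a.node_id: range(0, mid),          # entry: embeddings + layers 0..mid-1
        b.node_id: range(mid, n_layers),   # exit: layers mid..N-1 + lm_head
    }

coalition = [Node("node-a", 24.0), Node("node-b", 24.0)]
print(assign_slices(coalition, n_layers=36))
# {'node-a': range(0, 18), 'node-b': range(18, 36)}: the Qwen3-4B split above
```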
Request lifecycle

One token, end to end.

Each token requires a hidden-state hop forward and a sampled-token hop back. The orchestrator never touches activations — the P2P transport keeps the loop tight even on consumer hardware.

Sequence (Client · Orchestrator · Entry shard · Exit shard · Payment rails):
POST /infer + x402 voucher → open Superfluid flow → load layers 0..mid / mid..N → hidden states · AXL → sampled token · AXL → next hidden states (loop) → EOS · final tokens → 200 OK · stream
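
The same loop in miniature, with placeholder stand-ins for the two shards (the real hops cross AXL and real sampling happens on the exit shard's lm_head): one hidden-state hop out, one sampled token back, repeated until EOS.

```python
# Illustrative per-token loop; entry_forward and exit_sample are placeholders,
# not the AXL wire protocol. The orchestrator brokers the session but never
# sees the activations moving between the shards.
import random

EOS = 2  # hypothetical end-of-sequence token id

def entry_forward(tokens: list[int]) -> list[float]:
    """Stand-in for the entry shard: embeddings + early layers -> hidden states."""
    return [float(t) for t in tokens]

def exit_sample(hidden: list[float]) -> int:
    """Stand-in for the exit shard: late layers + lm_head -> next token id."""
    return random.choice([5, 7, 11, EOS])

def generate(prompt: list[int], max_new_tokens: int = 32) -> list[int]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        hidden = entry_forward(tokens)    # hop 1: hidden states, entry -> exit
        next_token = exit_sample(hidden)  # hop 2: sampled token, exit -> entry
        tokens.append(next_token)
        if next_token == EOS:             # the Superfluid meter stops here
            break
    return tokens

print(generate([1, 3, 4]))
```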
Innovation · Keeperhub
Upstream contributions

We brought streaming money
to the workflow layer.

KeeperHub orchestrates on-chain workflows but had no native way to handle continuous payouts or multi-party operator commitments. We upstreamed both — Superfluid streams and a Coalition plugin with slashing — and unified them with x402 into one workflow primitive.

Superfluid plugin
Native Superfluid actions — open, update, close streams — callable from any Keeperhub workflow.
Coalition plugin
Multi-party on-chain commitments with slashing — N operators commit to serve a model; the keeper enforces and slashes any that breach.
x402 + streams
Atomic onboarding plus per-second metering, packaged as one workflow primitive. The shape every API economy lands on.
x402 · Superfluid
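
A back-of-envelope sketch of the per-second metering: turning a USDCx-per-second price into Superfluid-style wei-per-second flow rates split across the coalition. The 18-decimal assumption matches Superfluid super tokens; the 50/50 split, the rate (taken from the live-demo slide later in this deck), and the helper names are illustrative, not the plugin's API.

```python
# Convert a human-readable USDCx/s session price into per-operator flow rates.
# Superfluid meters flows as wei per second on an 18-decimal super token.
from decimal import Decimal

DECIMALS = 10**18  # USDCx (a Superfluid super token) uses 18 decimals on-chain

def to_flow_rate(usdcx_per_second: Decimal) -> int:
    """USDCx per second -> integer wei-per-second flow rate."""
    return int(usdcx_per_second * DECIMALS)

def split_flow(total: Decimal, shares: dict[str, Decimal]) -> dict[str, int]:
    """Split the user's outflow across coalition operators by share (assumed 50/50)."""
    return {op: to_flow_rate(total * share) for op, share in shares.items()}

session_rate = Decimal("0.0084")  # USDCx per second while inference runs
payouts = split_flow(session_rate, {"node-a": Decimal("0.5"), "node-b": Decimal("0.5")})
print(payouts)  # {'node-a': 4200000000000000, 'node-b': 4200000000000000} wei/s
```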
Innovation · 0G
Protocol contributions

Superfluid live on 0G.
Consumer GPUs unlocked.

0G has high-throughput compute, but only datacenter-class GPUs qualify — and there's no native streaming-money primitive. We shipped both: deterministic Superfluid contracts via CREATE2, and an SDK that fuses consumer cards into one virtual compute target while preserving 0G's signing model.
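
Why CREATE2 makes the deployment deterministic: per EIP-1014 the contract address is a pure function of the deployer, a 32-byte salt, and the init code, so the same Superfluid contracts land at the same addresses on any chain that hosts the same factory. The sketch below uses a widely used CREATE2 factory address plus placeholder salt and bytecode for illustration; it is not our deployment script.

```python
# EIP-1014 (CREATE2) address derivation:
#   address = keccak256(0xff ++ deployer ++ salt ++ keccak256(init_code))[12:]
from eth_utils import keccak, to_bytes, to_checksum_address

def create2_address(deployer: str, salt: bytes, init_code: bytes) -> str:
    """Compute the deterministic address a CREATE2 deploy will land on."""
    assert len(salt) == 32
    preimage = b"\xff" + to_bytes(hexstr=deployer) + salt + keccak(init_code)
    return to_checksum_address("0x" + keccak(preimage)[12:].hex())

deployer = "0x4e59b44847b379578588920cA78FbF26c0B4956C"  # common CREATE2 factory
salt = (0).to_bytes(32, "big")                            # placeholder salt
init_code = bytes.fromhex("600a600c600039600a6000f3")     # placeholder init code
print(create2_address(deployer, salt, init_code))
```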

Superfluid on 0G
CREATE2-deployed, source-verified Superfluid contracts on 0G testnet — first per-second money streams the chain has ever had. Public, callable by anyone.
Pooled-GPU SDK
0G Compute mandates high-end GPUs; consumer cards are excluded. Our SDK pools them into one logical accelerator — small cards qualify together.
4× RTX 3090 → 1 virtual H100-class target
TEE orchestrator
The orchestrator runs inside a Trusted Execution Environment, so 0G's native signing & attestation flow stays intact end-to-end. No protocol downgrade for distributed inference.
SGX · attested
Innovation · AXL (Gensyn)
Transport + deployment

What Gensyn drew on the whiteboard,
we put in production.

AXL is Gensyn's P2P compute mesh. We turned it into a turnkey sharding fabric for AI: layer-pipelined inference over AXL, packaged in prebuilt Docker images, and meshed with Tailscale so operators never expose a public port.
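
AXL's wire API isn't reproduced here. As a stand-in, the sketch below shows the shape of the hop: a length-prefixed frame of fp16 activations out to the exit shard and a token id back, demonstrated on loopback instead of the Tailscale mesh.

```python
# Shape of the hidden-state hop (NOT the AXL API): length-prefixed fp16 frame
# out, sampled token id back. Runs on loopback here; in production the peer
# address would be the exit shard's private mesh address.
import socket
import struct
import threading
import numpy as np

HOST, PORT = "127.0.0.1", 7000

def exit_shard_stub(srv: socket.socket) -> None:
    """Toy exit shard: accept one activation frame, reply with a fake token id."""
    conn, _ = srv.accept()
    with conn:
        (length,) = struct.unpack("!I", conn.recv(4))
        buf = b""
        while len(buf) < length:
            buf += conn.recv(length - len(buf))
        hidden = np.frombuffer(buf, dtype=np.float16)
        conn.sendall(struct.pack("!I", int(hidden.argmax())))  # stand-in for sampling

def send_hidden_states(hidden: np.ndarray) -> int:
    """Entry-shard side: ship activations, wait for the sampled token id."""
    payload = hidden.astype(np.float16).tobytes()
    with socket.create_connection((HOST, PORT)) as conn:
        conn.sendall(struct.pack("!I", len(payload)) + payload)
        (token_id,) = struct.unpack("!I", conn.recv(4))
    return token_id

server = socket.create_server((HOST, PORT))
threading.Thread(target=exit_shard_stub, args=(server,), daemon=True).start()
print(send_hidden_states(np.random.randn(1, 16, 256).astype(np.float16)))
```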

Sharded inference over AXL
First production deployment of layer-pipelined LLM inference on the AXL transport. Hidden states cross AXL frames; sampled tokens come back the same way.
live · qwen-pool-1
One-line deploy
Prebuilt NVIDIA + CPU images bundle AXL, the worker, and CUDA. Operators run a single command — no toolchain.
$ docker compose up dis-com
Tailscale-native
AXL traffic rides a Tailscale mesh. Zero exposed ports, zero firewall edits — operators stay invisible to the public internet.
0 ports open · WireGuard mesh
Live · running on 0G testnet

A 4B model,
two consumer cards.

Live on 0G testnet: Qwen3-4B running across two RTX 4090s. Hidden states cross the AXL transport every ~90ms; the Superfluid meter ticks every 50ms. Both shards earn while the request is open.

1. Entry shard loads layers 0..17 (≈4 GB VRAM)
2. Exit shard loads layers 18..35 + lm_head
3. Each token: one forward hop, one token back
4. 11 tok/s sustained · meter stops on EOS
Live stream · 0G testnet · streaming
user · 0x7a4f…c19e · 0.5000 · −0.0084 USDCx/s
node-a · 0xaaa…1234 · +0.0000 (accruing)
node-b · 0xbbb…5678 · +0.0000 (accruing)
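
Back-of-envelope from the demo figures above (11 tok/s sustained, 0.0084 USDCx/s outflow): what the stream works out to per token and per minute.

```python
# Arithmetic only; both inputs are the figures shown on this slide.
tok_per_s = 11                # sustained throughput on the two-4090 coalition
rate_usdcx_per_s = 0.0084     # user's outbound Superfluid flow while streaming

ms_per_token = 1000 / tok_per_s                  # ~91 ms, consistent with the ~90 ms hop
usdcx_per_token = rate_usdcx_per_s / tok_per_s   # ~0.00076 USDCx per generated token
usdcx_per_minute = rate_usdcx_per_s * 60         # ~0.50 USDCx per minute of inference

print(f"{ms_per_token:.0f} ms/token · {usdcx_per_token:.5f} USDCx/token · "
      f"{usdcx_per_minute:.3f} USDCx/min")
```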
Why now

Open models small enough to shard.
Consumer GPUs idle enough to host them.

A 4B–8B open model in 2024 matches GPT-3.5 quality from 2023. Two prosumer GPUs can serve it together. The crypto crash left a glut of cards looking for a job. The pieces only just lined up.

2022 · Crypto bust → idle gaming GPUs
2024 · Open 4B/8B models match GPT-3.5
2024 · x402 standard ships
2025 · Superfluid live on 0G
2026 · ComputePool
Open models hit production quality
Llama-3.2, Qwen3 — 4B/8B params clear the bar that needed 175B in 2022.
Idle prosumer GPUs everywhere
Post-mining 30/40-series cards and gaming rigs sit at <10% utilization.
Payment + transport rails ship
x402 (HTTP 402) and Superfluid streams on 0G — settle and meter at second resolution.
Market

Hundreds of millions
of stranded GPUs.

The cloud sells you the only GPU big enough to fit your model. We turn the GPUs you already have into one big enough — together. The TAM is every consumer GPU not currently running production AI.

$6.4B
serviceable obtainable market
Decentralized inference on consumer GPUs, 2027 (a16z, Multicoin)
Sharded inference on consumer GPUs (SAM) · $6.4B
AI inference market (TAM) · $42.1B
Idle prosumer GPU compute (asset value) · $220B
Competition

The only network that shards.

Every other "decentralized GPU" product still requires the model to fit on one host. Together and Bedrock just resell big-iron capacity. We're the only ones letting two consumer cards behave like one production GPU.

ComputePool | Akash | Together AI | AWS Bedrock
Shards across consumer GPUs: Yes — layer-pipelined | No — one container, one host | No — hosted-only | No — single big GPU
Min hardware per operator: 24 GB consumer card | Whole-host VM | n/a | n/a
Pricing model: Per-second streaming | Per-second | Per-token | Hourly + contracts
Settlement: On-chain (0G testnet) | On-chain | Off-chain | Net-30 invoice
Operators: Permissionless | Permissionless | Hosted | Single vendor
Roadmap

From two-card splits to the full model frontier.

shipping
Q2 2026
2-way shards
Llama-3.2 / Qwen3-4B across pairs of 24 GB consumer cards
Q3 2026
N-way shards
4- and 8-way coalitions unlock 30B–70B class models on prosumer rigs
Q4 2026
Heterogeneous coalitions
Mix 3090s, 4090s, M-series Macs; orchestrator balances by VRAM + bandwidth
Q1 2027
Verifiable inference
zk-proofs of computation per shard — slashing for incorrect activations
The ask

$4M seed.
18 months.

Push the sharded-inference frontier to 70B-class models on consumer rigs, scale to 250 operators across three regions, and grow streaming GMV to $40M/yr.

55%
Sharding R&D · larger N-way splits
25%
Operator onboarding + GTM
20%
On-chain liquidity + audits
Contact
Founders
Philo & Freedan
hello@philotheephilix.in
On-chain
0G testnet · 0xCp00…1ED
Try the live product →