Pricing

Pay for the seconds you actually stream.

Billing is per GPU-second of streaming inference, metered second by second. Sessions that idle do not bill. Sessions that burst do not get throttled. There are no seats, no per-model surcharges, and no commitment minimums unless you ask for one.

Build something

Free

credit included

Enough GPU-seconds to ship a real prototype. No card on file, no rate-limit traps, no expiring credits.

10,000 GPU-seconds / month
Up to 2 concurrent sessions
All models in the public catalog
Community support

Start building

Ship to users

Build

$0.00041

per H100-second

Production-grade infrastructure with per-second billing. No committed minimums, no seat fees, no surprise overages.

Unlimited concurrent sessions
Session affinity across 12 regions
99.95% streaming SLO
Same-day engineering support

Get an API key

Custom infrastructure

Scale

Custom

annual contract

Reserved capacity, custom regions, private model deployments, and a direct line to the team that wrote the kernels.

Reserved H100 / B200 pools
Private model fine-tunes
Custom region deployment
Dedicated solutions engineer

Talk to sales

Rates shown are for the H100 80GB pool. The B200 180GB pool runs at a 2.4× multiplier with proportional throughput. Per-second billing is metered server-side and reconciled against GPU power telemetry, so what you see on your invoice is what the silicon actually did.
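
For a rough sense of what these rates mean in practice, here is a minimal sketch that multiplies GPU-seconds by the listed per-second price. The function name `estimate_cost` and the pool keys `"h100"` and `"b200"` are illustrative assumptions, not part of any official SDK, and the sketch ignores free-tier credit, taxes, and contract pricing.

```python
from decimal import Decimal

# Listed Build-tier rate per H100-second (from the pricing above).
H100_RATE_USD = Decimal("0.00041")

# B200 pool multiplier relative to the H100 rate (from the note above).
B200_MULTIPLIER = Decimal("2.4")

# Hypothetical pool keys, used here for illustration only.
RATES_USD_PER_SECOND = {
    "h100": H100_RATE_USD,
    "b200": H100_RATE_USD * B200_MULTIPLIER,
}


def estimate_cost(gpu_seconds: int, pool: str = "h100") -> Decimal:
    """Estimate the USD cost of streaming for `gpu_seconds` on a given pool."""
    if gpu_seconds < 0:
        raise ValueError("gpu_seconds must be non-negative")
    if pool not in RATES_USD_PER_SECOND:
        raise ValueError(f"unknown pool: {pool!r}")
    # Decimal keeps the tiny per-second rate exact instead of accumulating
    # binary floating-point error.
    return RATES_USD_PER_SECOND[pool] * Decimal(gpu_seconds)


if __name__ == "__main__":
    one_hour = 3600  # seconds of active streaming; idle time does not bill
    print("H100, 1 hour:", estimate_cost(one_hour, "h100"))
    print("B200, 1 hour:", estimate_cost(one_hour, "b200"))
```

Under these assumptions, an hour of active streaming comes to about $1.48 on the H100 pool and about $3.54 on the B200 pool.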