Pricing

Pay for the seconds you actually stream.

Billing is per GPU-second of streaming inference, metered second by second on the wire. Idle sessions do not bill, and bursting sessions are not throttled. There are no seats, no per-model surcharges, and no committed minimum unless you ask for one.

Build something

Free

$0
credit included

Enough GPU-seconds to ship a real prototype. No card on file, no rate-limit traps, no expiring credits.

  • 10,000 GPU-seconds / month
  • Up to 2 concurrent sessions
  • All models in the public catalog
  • Community support
Start building

Ship to users

Build

$0.00041
per H100-second

Production-grade infrastructure with per-second billing. No committed minimums, no seat fees, no surprise overages.

  • Unlimited concurrent sessions
  • Session affinity across 12 regions
  • 99.95% streaming SLO
  • Same-day engineering support
Get an API key

Custom infrastructure

Scale

Custom
annual contract

Reserved capacity, custom regions, private model deployments, and a direct line to the team that wrote the kernels.

  • Reserved H100 / B200 pools
  • Private model fine-tunes
  • Custom region deployment
  • Dedicated solutions engineer
Talk to sales

Rates shown are for the H100 80GB pool. The B200 180GB pool runs at a 2.4× multiplier with proportional throughput. Per-second billing is metered server-side and reconciled against GPU power telemetry, so what you see on your invoice is what the silicon actually did.
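
As a rough illustration of the arithmetic, the sketch below estimates a charge from billed GPU-seconds using the Build rate listed above. It assumes the 2.4× B200 multiplier applies directly to the per-second H100 rate. The function name `estimate_cost` and the pool labels `"h100"` and `"b200"` are hypothetical and are not part of any published API. It does not model rounding, taxes, or the Free tier allowance.

```python
from decimal import Decimal

# Figures taken from this page. Everything else is an illustrative assumption.
H100_RATE_USD = Decimal("0.00041")  # Build tier price per H100-second
B200_MULTIPLIER = Decimal("2.4")    # assumed to scale the per-second rate


def estimate_cost(gpu_seconds: int, pool: str = "h100") -> Decimal:
    """Return an estimated charge in USD for the given billed GPU-seconds.

    `pool` is a hypothetical label: "h100" or "b200".
    """
    if gpu_seconds < 0:
        raise ValueError("gpu_seconds must be non-negative")
    if pool == "h100":
        rate = H100_RATE_USD
    elif pool == "b200":
        rate = H100_RATE_USD * B200_MULTIPLIER
    else:
        raise ValueError(f"unknown pool: {pool!r}")
    return rate * gpu_seconds


if __name__ == "__main__":
    one_hour = 3600  # seconds of active streaming; idle time is not billed
    print("H100, one hour:", estimate_cost(one_hour, "h100"))
    print("B200, one hour:", estimate_cost(one_hour, "b200"))
```

Under these assumptions, one hour of continuous streaming comes to $1.476 on the H100 pool and $3.5424 on the B200 pool. Decimal arithmetic is used here so that small per-second rates do not pick up floating-point error when summed over long sessions.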