Free
Enough GPU-seconds to ship a real prototype. No card on file, no rate-limit traps, no expiring credits.
- 10,000 GPU-seconds / month
- Up to 2 concurrent sessions
- All models in the public catalog
- Community support
Billing is per GPU-second of streaming inference, metered second-by-second on the wire. Sessions that idle do not bill. Sessions that burst do not get throttled. There are no seats, no per-model surcharges, and no committed minimums unless you ask for one.
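To make the metering rule concrete, here is a minimal sketch of how billable usage could be tallied against the Free allowance. The `Session` type, its field names, and both helper functions are hypothetical and exist only for illustration; they are not part of any published API. The sketch assumes that only streaming seconds count as GPU-seconds and that idle time is excluded, as described above.

```python
from dataclasses import dataclass

# Monthly allowance taken from the Free plan list above.
FREE_GPU_SECONDS_PER_MONTH = 10_000


@dataclass
class Session:
    """Hypothetical usage record for one session."""
    streaming_seconds: int  # seconds spent streaming inference (billable)
    idle_seconds: int       # seconds spent idle (assumed not billable)


def billable_gpu_seconds(sessions):
    """Sum only the streaming time; idle time is ignored."""
    return sum(s.streaming_seconds for s in sessions)


def free_allowance_remaining(sessions, allowance=FREE_GPU_SECONDS_PER_MONTH):
    """GPU-seconds left in the monthly allowance, never below zero."""
    return max(0, allowance - billable_gpu_seconds(sessions))


if __name__ == "__main__":
    month = [
        Session(streaming_seconds=1_200, idle_seconds=3_000),
        Session(streaming_seconds=800, idle_seconds=0),
    ]
    print(billable_gpu_seconds(month))
    print(free_allowance_remaining(month))
```

In this example the first session sits idle for 3,000 seconds, yet only its 1,200 streaming seconds count toward usage.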
Production-grade infrastructure with per-second billing. No committed minimums, no seat fees, no surprise overages.
Reserved capacity, custom regions, private model deployments, and a direct line to the team that wrote the kernels.
Rates shown are for the H100 80GB pool. The B200 180GB pool runs at a 2.4× multiplier with proportional throughput. Per-second billing is metered server-side and reconciled against GPU power telemetry, so what you see on your invoice is what the silicon actually did.
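As a rough illustration of the pool multiplier, the sketch below estimates a charge from GPU-seconds and a per-second rate. It assumes the 2.4× multiplier applies to the H100 per-second rate; the pool keys, the `session_cost` function, and the rate used in the example are hypothetical placeholders, not published prices. Throughput differences between pools are not modeled.

```python
from decimal import Decimal

# Multipliers relative to the H100 80GB rate. The 2.4 figure comes from the
# note above; the dictionary keys are illustrative labels only.
POOL_MULTIPLIER = {
    "H100-80GB": Decimal("1.0"),
    "B200-180GB": Decimal("2.4"),
}


def session_cost(gpu_seconds, h100_rate_per_second, pool="H100-80GB"):
    """Estimated charge: GPU-seconds x H100 rate x pool multiplier."""
    return Decimal(gpu_seconds) * h100_rate_per_second * POOL_MULTIPLIER[pool]


if __name__ == "__main__":
    # Placeholder rate chosen for easy arithmetic; not a real price.
    rate = Decimal("0.001")
    print(session_cost(1_000, rate))
    print(session_cost(1_000, rate, pool="B200-180GB"))
```

`Decimal` is used instead of floats so that per-second amounts add up without binary rounding drift.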