Genie 3
Real-time world model with persistent state across minutes of interaction.
Windflow is the inference platform for real-time video and world models — persistent session state, bidirectional streaming, sub-50 ms frame latency, delivered as a single API. The runtime is the boring part; we'd rather you spend your time on the application.
The first generation of generative video gave us beautiful frozen frames. The second gave us pre-rendered clips — motion implied, but the loop fixed. Both were cameras: capture, render, ship.
What is arriving now is a different instrument entirely: models that respond mid-generation, hold state across thousands of frames, and answer a control signal within a few refreshes of a monitor. The shift is not faster video; it is video that reacts.
We think of it as the difference between a photograph of an aircraft and the wind tunnel that proved it could fly. One records; the other lets the world push back. The runtime for that medium is what we're building.
The last eighteen months produced a wave of such models, but they are research artifacts, not APIs. Windflow exists to make this class of model something developers can actually build on.
Causal diffusion transformer that produces video as a continuous stream, not a clip.
Autoregressive video diffusion proving sub-second control-to-frame loops are tractable.
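To make the control-to-frame loop concrete, here is a toy sketch of causal autoregressive rollout, assuming each frame can be produced from a bounded window of earlier frames plus the latest control input. `toy_denoise` is a hypothetical stand-in for a real diffusion step and frames are plain floats, so this shows only the data flow: a new input shapes the very next frame instead of a future clip.

```python
from collections import deque
from typing import Callable, Iterable, List

Frame = float  # stand-in for a real frame tensor


def toy_denoise(context: List[Frame], control: float) -> Frame:
    """Hypothetical stand-in for one causal diffusion step:
    the next frame is the mean of the context window nudged by the control."""
    base = sum(context) / len(context) if context else 0.0
    return base + control


def rollout(controls: Iterable[float],
            window: int = 4,
            step: Callable[[List[Frame], float], Frame] = toy_denoise) -> List[Frame]:
    """Generate one frame per control input, conditioning only on past frames."""
    context: deque = deque(maxlen=window)  # bounded causal context; oldest frames fall off
    frames: List[Frame] = []
    for control in controls:
        frame = step(list(context), control)  # the newest input affects this frame immediately
        context.append(frame)
        frames.append(frame)
    return frames
```

The bounded window keeps per-frame work constant however long the session runs.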
A single uninterrupted session per clip: no pre-renders, no cherry-picked seeds, no offline upscaling.
An infinite open world streamed frame-by-frame from a short prompt. Terrain, foliage and lighting hold across thousands of frames because the session's KV cache lives on the GPU between requests.
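Keeping the cache resident matters because a stateless server would have to re-encode the whole history on every request, so per-request cost would grow with session length. The sketch below counts that work under stated assumptions: `SessionStore` and `KVCache` are hypothetical names, a Python dict stands in for GPU memory, and one cache entry per frame stands in for real attention keys and values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KVCache:
    """Hypothetical per-session cache: one entry per processed frame."""
    entries: List[str] = field(default_factory=list)


class SessionStore:
    """Keeps each session's cache alive between requests (in-process stand-in for GPU memory)."""

    def __init__(self) -> None:
        self._caches: Dict[str, KVCache] = {}
        self.frames_processed = 0  # total encoding work done across all requests

    def step(self, session_id: str, new_frames: List[str]) -> int:
        cache = self._caches.setdefault(session_id, KVCache())
        # Only the new frames are encoded; history is read from the resident cache.
        for frame in new_frames:
            cache.entries.append(f"kv({frame})")
            self.frames_processed += 1
        return len(cache.entries)  # context length now available to the model


def stateless_cost(frames_per_request: List[int]) -> int:
    """Frames encoded if every request re-processes the entire history from scratch."""
    total, history = 0, 0
    for n in frames_per_request:
        history += n
        total += history
    return total
```

With a resident cache, three one-frame requests cost three encodes; rebuilding from scratch each time costs 1 + 2 + 3 = 6, and the gap widens quadratically over a long session.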
Inference platforms today are optimized for chat completions — fire a prompt, return a response, end the session. Real-time video models invert almost every assumption these systems were built on.
| Axis | LLM inference | Batch diffusion | Windflow |
|---|---|---|---|
| Workload shape | Single prompt → single response | Batch of discrete generations | Persistent bidirectional session |
| Latency target | ~300 ms to first token | Tens of seconds per image | Sub-50 ms per frame, sustained |
| State model | Stateless or short context window | Stateless | Consistent across 1000s of frames |
| Input channel | Text prompt at start | Prompt + seed | Continuous control: pose, action, audio |
| Optimization surface | KV cache, speculative decoding | Throughput batching | Custom CUDA + streaming schedulers |
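One way to read "sustained" is as a tail-latency bound over a whole session rather than an average. A minimal sketch, assuming a nearest-rank p99 is the statistic of interest (the table does not name one) and that per-frame latencies are recorded in milliseconds; `meets_target` is a hypothetical helper, not a Windflow benchmark tool.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def meets_target(latencies_ms: Sequence[float],
                 budget_ms: float = 50.0, q: float = 99.0) -> bool:
    """True if the q-th percentile frame latency stays under the budget."""
    return percentile(latencies_ms, q) < budget_ms
```

For scale: if frames were produced strictly one after another, a 50 ms per-frame latency would cap output at 1000 / 50 = 20 frames per second; pipelining can raise throughput without lowering per-frame latency.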
Each layer is exposed as a primitive. Use them together as a managed runtime, or pull just the one you need into an existing stack.
A bidirectional WebSocket and WebRTC protocol that pipelines control inputs and generated frames in the same session. No polling, no awkward chunked HTTP, no batching latency tax.
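The point of a bidirectional session is that sending and receiving are independent: the client never waits for a frame before sending its next input. A minimal in-process sketch, assuming `asyncio` queues as a stand-in for a WebSocket or WebRTC data channel; `fake_model_server` and the message shapes are hypothetical, not the Windflow wire protocol.

```python
import asyncio
from typing import List


async def fake_model_server(controls: asyncio.Queue, frames: asyncio.Queue) -> None:
    """Hypothetical server: emits one frame per control message until it sees None."""
    index = 0
    while True:
        control = await controls.get()
        if control is None:  # end-of-session sentinel
            await frames.put(None)
            return
        await frames.put({"frame": index, "applied": control})
        index += 1


async def client(inputs: List[str]) -> List[dict]:
    controls: asyncio.Queue = asyncio.Queue()
    frames: asyncio.Queue = asyncio.Queue()
    server = asyncio.create_task(fake_model_server(controls, frames))

    async def send() -> None:
        # Inputs are pushed without waiting for frames: no request/response lockstep.
        for item in inputs:
            await controls.put(item)
        await controls.put(None)

    async def receive() -> List[dict]:
        received = []
        while (msg := await frames.get()) is not None:
            received.append(msg)
        return received

    # Both directions run concurrently over the same "session".
    _, received = await asyncio.gather(send(), receive())
    await server
    return received
```

Over a real connection the two coroutines would share one socket; the structural point is that `send` never awaits a frame.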
A persistence layer purpose-built for spatial and temporal coherence. Models hold a consistent world across thousands of frames; you can branch, fork, and replay sessions.
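Branching and replay fall out naturally if a session is modeled as an append-only log of control inputs from a fixed starting state. A minimal sketch of that idea, assuming generation is deterministic (the same inputs from the same state reproduce the same world); `SessionLog` and `toy_world_step` are hypothetical illustrations, not Windflow's persistence layer.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

State = Tuple[int, int]  # toy world state: an (x, y) position

_MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}


def toy_world_step(state: State, action: str) -> State:
    """Hypothetical deterministic transition standing in for the model."""
    dx, dy = _MOVES.get(action, (0, 0))
    return (state[0] + dx, state[1] + dy)


@dataclass
class SessionLog:
    initial: State = (0, 0)
    actions: List[str] = field(default_factory=list)

    def append(self, action: str) -> None:
        self.actions.append(action)

    def fork(self, at: int) -> "SessionLog":
        """New branch sharing the first `at` actions; later edits don't touch the parent."""
        return SessionLog(self.initial, self.actions[:at])

    def replay(self, upto: Optional[int] = None) -> State:
        """Re-derive the state after `upto` actions (all of them by default)."""
        steps = self.actions if upto is None else self.actions[:upto]
        state = self.initial
        for action in steps:
            state = toy_world_step(state, action)
        return state
```

A production system would presumably pair such a log with periodic state snapshots so that replay need not start from frame zero.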
Research-grade open-source models, rewritten to production. Custom CUDA kernels, torch.compile pipelines, quantization, and scheduler tricks that make the difference between a demo and a product.
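Quantization is one of the techniques listed. As an illustration of the basic idea only (the page does not say which scheme Windflow uses), here is symmetric per-tensor int8 quantization: weights are divided by a scale of max|w| / 127, rounded to integers, and multiplied back at use time.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w ~= q * scale, with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clip to guard against float edge cases.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]
```

Stored as int8, each weight takes one byte instead of two for fp16, and the rounding error per weight is at most half of `scale`.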
Designed for a geographically distributed fleet of H100- and B200-class GPUs with session affinity. Users connect to the GPU closest to them; sessions migrate live without dropping a frame.
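Session affinity means a session is placed once, near the user, and stays put until it is deliberately moved. A minimal placement sketch, assuming per-region round-trip-time measurements and a free-slot count are available; the region names, `Router` class, and `migrate` bookkeeping are hypothetical, and moving the GPU-resident state itself is outside its scope.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Region:
    name: str
    free_slots: int


class Router:
    def __init__(self, regions: Dict[str, Region]) -> None:
        self.regions = regions
        self.placement: Dict[str, str] = {}  # session_id -> region name (the affinity table)

    def place(self, session_id: str, rtt_ms: Dict[str, float]) -> Optional[str]:
        """Return the session's region, picking the lowest-RTT region with capacity on first contact."""
        if session_id in self.placement:  # sticky: reconnects land on the same pool
            return self.placement[session_id]
        candidates = [r for r in self.regions.values()
                      if r.free_slots > 0 and r.name in rtt_ms]
        if not candidates:
            return None
        best = min(candidates, key=lambda r: rtt_ms[r.name])
        best.free_slots -= 1
        self.placement[session_id] = best.name
        return best.name

    def migrate(self, session_id: str, target: str) -> None:
        """Update affinity once state has been moved: take a slot in target, free the old one."""
        current = self.placement[session_id]
        dest = self.regions[target]
        if dest.free_slots <= 0:
            raise RuntimeError(f"no capacity in {target}")
        dest.free_slots -= 1
        self.regions[current].free_slots += 1
        self.placement[session_id] = target
```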
We're pre-launch, so these are commitments, not retroactive marketing. They're the numbers the architecture, the kernels, and the SDK were designed around. When the platform ships, our public benchmarks will be published against this same list, in the open.
The full SDK is a thin wrapper around our streaming protocol: explicit about what is happening, opinionated about what should be invisible. There is no GPU to choose, no model server to deploy, no retry logic to glue together.
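"No retry logic" implies the client wrapper absorbs transient disconnects on the developer's behalf. For readers curious what that logic typically looks like, here is a generic sketch of capped exponential backoff with full jitter; `connect` is a hypothetical callable, and nothing here is the Windflow SDK's actual interface.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 2.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter delays: the n-th delay is uniform in [0, min(cap, base * 2**n)] seconds."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]


def with_retries(connect: Callable[[], T], attempts: int = 5,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: Optional[random.Random] = None) -> T:
    """Call `connect` until it succeeds or attempts run out, sleeping only between tries."""
    delays = backoff_delays(attempts - 1, rng=rng)
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as err:
            last_error = err
            if attempt < attempts - 1:
                sleep(delays[attempt])
    raise ConnectionError("giving up after retries") from last_error
```

Injecting `sleep` and `rng` keeps the loop testable without real waits.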
Interactive real-time video generation with infinite streaming. The Helios family is our target workload for game-studio integrations on Windflow.
High-fidelity environments with action-conditioned, real-time interactive output. Open weights, optimized end-to-end on Windflow.
Multi-shot interactive video generation. Hold narrative state across scene cuts while the user keeps steering.
Autoregressive video diffusion: low-latency text-to-video plus frame-accurate restyling, background replacement, and virtual production.
Designed for studios prototyping titles where environments — terrain, weather, NPC behaviour — are generated frame-by-frame in response to player action. Windflow is the runtime we're building underneath.
A target workload for robotics and autonomy teams who want to train and evaluate policies inside generated worlds that respond to the same control signals as the physical fleet — in real time, in the loop.
For creative-software teams who want cursor-speed video transforms — restyle, replace, regenerate — behind their canvas, by streaming a Windflow session instead of operating a model server.
A small, senior team across streaming infrastructure, GPU systems, and developer platforms. Full bios go live with our public launch; in the meantime, the shape of the team we are assembling:
Previously built low-latency video infrastructure at hyperscale; deep WebRTC and CUDA background.
Shipped diffusion and DiT inference work in production; custom CUDA kernels and torch.compile pipelines.
Built and operated developer platforms used by tens of thousands of engineers.
We're opening the platform to a small number of design partners: teams building games, simulators, and creative tools where sub-50 ms is the difference between magical and unusable. Pricing will be per GPU-second, with no committed minimums.