CPU Machine Types
For lightweight workloads that don't require GPU acceleration, such as routing, preprocessing, and API proxies.

| Machine Type | RAM | CPU Cores |
|---|---|---|
| XS | 512 MB | 0.5 |
| S | 1 GB | 1 |
| M | 2 GB | 2 |
| L | 15 GB | 4 |
| XL | 30 GB | 8 |
GPU Machine Types
| Machine Type | VRAM | RAM | CPU | Bandwidth | Video Enc / Dec |
|---|---|---|---|---|---|
| GPU-RTX4090 | 24 GB | 48 GB | 12 | 1.0 TB/s | 2 / 2 |
| GPU-RTX5090 | 32 GB | 60 GB | 14 | 1.8 TB/s | 3 / 3 (AV1) |
| GPU-A100 | 40 GB | 60 GB | 12 | 2.0 TB/s | — / 5 |
| GPU-L40 | 48 GB | 100 GB | 6 | 0.9 TB/s | 3 / 3 (AV1) |
| GPU-H100 | 80 GB | 112 GB | 12 | 3.4 TB/s | — / 7 |
| GPU-H200 | 141 GB | 112 GB | 12 | 4.8 TB/s | — / 7 |
| GPU-B200 | 192 GB | 210 GB | 19 | 8.0 TB/s | — / 7 |
GPUs showing "—" in the encode column have no hardware encoder and require software encoding on the CPU.
Choosing a GPU
By VRAM requirement, pick the smallest GPU that fits your model:

- 24 GB (RTX 4090): SDXL, Flux, most LoRA fine-tunes
- 32 GB (RTX 5090): Larger diffusion models, video generation with AV1 hardware encode
- 40 GB (A100): General-purpose training and inference at a lower price point than Hopper GPUs
- 48 GB (L40): AI inference combined with video transcoding and graphics rendering
- 80 GB (H100): LLM inference and training, NVLink 4.0 at 900 GB/s for multi-GPU scaling
- 141 GB (H200): Large models and long-context workloads on a single GPU — 76% more memory and 43% more bandwidth than H100
- 192 GB (B200): Maximum memory and compute for the largest models, FP4/FP6/FP8 precision support
By use case:

- Image generation: RTX 4090 or RTX 5090 (cost-effective, good throughput)
- Video generation: RTX 5090 (3 NVENC encoders, AV1 encode) or L40 (3 NVENC + 3 NVDEC)
- LLM inference: H100 or H200 (high bandwidth, large VRAM)
- Training: A100, H100, or H200 (depending on model size)
- Largest models: B200 or multi-GPU H100/H200
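The VRAM-first selection rule above can be sketched as a small helper built from the VRAM column of the GPU table. The function itself is illustrative, not a platform API, and VRAM fit is only one axis; bandwidth, encoders, and price also matter, as the use-case list shows.

```python
# VRAM per machine type, taken from the GPU table above.
GPU_VRAM_GB = {
    "GPU-RTX4090": 24,
    "GPU-RTX5090": 32,
    "GPU-A100": 40,
    "GPU-L40": 48,
    "GPU-H100": 80,
    "GPU-H200": 141,
    "GPU-B200": 192,
}

def smallest_gpu_for(vram_needed_gb: float) -> str:
    """Return the smallest machine type whose VRAM fits the model."""
    for name, vram in sorted(GPU_VRAM_GB.items(), key=lambda kv: kv[1]):
        if vram >= vram_needed_gb:
            return name
    raise ValueError(
        f"No single GPU has {vram_needed_gb} GB of VRAM; consider multi-GPU"
    )

print(smallest_gpu_for(30))   # a 30 GB model lands on the RTX 5090
print(smallest_gpu_for(100))  # a 100 GB model needs the H200
```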
Configuration
Set the machine type in your application.

Multiple Machine Types
Allow your app to use multiple machine types for a larger pool of available machines.

Multi-GPU
For models that need more than one GPU.

Multi-GPU Workloads
Learn how to distribute inference across multiple GPUs
Changing Machine Types
Via Code: Update `machine_type` and redeploy.
`machine_type` is a code-specific parameter; it always comes from your code and resets on every deploy. See Scaling Configuration for details.
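Because the parameter lives in code, changing it is just an edit plus a redeploy. A minimal sketch, where the `App` base class is a stub standing in for the real SDK:

```python
# Illustrative only: `App` is a stub; `machine_type` is the code-level
# parameter described above, which resets to this value on every deploy.
class App:
    """Stub base class standing in for the real SDK's app class."""

class MyApp(App):
    # machine_type = "GPU-A100"  # previous value
    machine_type = "GPU-H100"    # new value takes effect on the next deploy
```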