CPU Machine Types

For lightweight workloads that don't require GPU acceleration, such as routing, preprocessing, and API proxies.
| Machine Type | RAM | CPU Cores |
| --- | --- | --- |
| XS | 512 MB | 0.5 |
| S | 1 GB | 1 |
| M | 2 GB | 2 |
| L | 15 GB | 4 |
| XL | 30 GB | 8 |
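For example, a lightweight API proxy might request a small CPU machine. This is a sketch using the `fal.App` pattern from the Configuration section below; `ProxyApp` is an illustrative name, not part of the fal SDK:

```python
import fal

class ProxyApp(fal.App):
    # Small CPU-only machine: 1 GB RAM, 1 core (see the table above)
    machine_type = "S"
```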

GPU Machine Types

| Machine Type | VRAM | RAM | CPU | Bandwidth | Video Enc / Dec |
| --- | --- | --- | --- | --- | --- |
| GPU-RTX4090 | 24 GB | 48 GB | 12 | 1.0 TB/s | 2 / 2 |
| GPU-RTX5090 | 32 GB | 60 GB | 14 | 1.8 TB/s | 3 / 3 (AV1) |
| GPU-A100 | 40 GB | 60 GB | 12 | 2.0 TB/s | — / 5 |
| GPU-L40 | 48 GB | 100 GB | 6 | 0.9 TB/s | 3 / 3 (AV1) |
| GPU-H100 | 80 GB | 112 GB | 12 | 3.4 TB/s | — / 7 |
| GPU-H200 | 141 GB | 112 GB | 12 | 4.8 TB/s | — / 7 |
| GPU-B200 | 192 GB | 210 GB | 19 | 8.0 TB/s | — / 7 |
Video encode/decode counts refer to hardware NVENC/NVDEC engines: dedicated hardware units that encode or decode video independently of the GPU's compute cores. GPUs with hardware encoders (e.g. the RTX 4090, RTX 5090, and L40) can output video frames without using GPU compute time. GPUs marked — for encode have no hardware encoder and require software encoding on the CPU.
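As a rough illustration, the encode/decode column can be expressed as a lookup. The counts below simply mirror the table; `ENGINES` and `has_hw_encode` are hypothetical helpers, not part of the fal SDK:

```python
# NVENC / NVDEC engine counts per machine type, mirroring the table above.
# 0 NVENC engines means video must be encoded in software on the CPU.
ENGINES = {
    "GPU-RTX4090": (2, 2),
    "GPU-RTX5090": (3, 3),
    "GPU-A100":    (0, 5),
    "GPU-L40":     (3, 3),
    "GPU-H100":    (0, 7),
    "GPU-H200":    (0, 7),
    "GPU-B200":    (0, 7),
}

def has_hw_encode(machine_type: str) -> bool:
    """True if the GPU has at least one hardware NVENC engine."""
    nvenc, _nvdec = ENGINES[machine_type]
    return nvenc > 0
```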

Choosing a GPU

By VRAM requirement: pick the smallest GPU that fits your model.

By workload type:
  • Image generation: RTX 4090 or RTX 5090 (cost-effective, good throughput)
  • Video generation: RTX 5090 (3 NVENC encoders, AV1 encode) or L40 (3 NVENC + 3 NVDEC)
  • LLM inference: H100 or H200 (high bandwidth, large VRAM)
  • Training: A100, H100, or H200 (depending on model size)
  • Largest models: B200 or multi-GPU H100/H200
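The VRAM rule above can be sketched as a small helper that walks the table from smallest to largest. Names like `pick_gpu` are illustrative, not part of the fal SDK, and the numbers are copied from the table:

```python
# VRAM per GPU machine type in GB, ordered smallest to largest (from the table above).
VRAM_GB = [
    ("GPU-RTX4090", 24),
    ("GPU-RTX5090", 32),
    ("GPU-A100", 40),
    ("GPU-L40", 48),
    ("GPU-H100", 80),
    ("GPU-H200", 141),
    ("GPU-B200", 192),
]

def pick_gpu(required_vram_gb: float) -> str:
    """Return the smallest machine type whose VRAM fits the model."""
    for name, vram in VRAM_GB:
        if vram >= required_vram_gb:
            return name
    raise ValueError("Model does not fit on a single GPU; consider multi-GPU.")
```

Note this ranks by VRAM only; for bandwidth-bound workloads like LLM inference the workload-type guidance above may point to a larger card than the minimum fit.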

Configuration

Set the machine type in your application:
```python
import fal

class MyApp(fal.App):
    machine_type = "GPU-H100"
    num_gpus = 1
```

Multiple Machine Types

Allow your app to use multiple machine types for a larger pool of available machines:
```python
import fal

class MyApp(fal.App):
    machine_type = ["GPU-H100", "GPU-A100"]
```
Machine types are tried in order. If the first type has no available capacity, the next is used.

Multi-GPU

For models that need more than one GPU:
```python
import fal

class MyApp(fal.App):
    machine_type = "GPU-H100"
    num_gpus = 2
```

Multi-GPU Workloads

Learn how to distribute inference across multiple GPUs

Changing Machine Types

Via Code: Update machine_type and redeploy:
```python
import fal

class MyApp(fal.App):
    machine_type = "GPU-A100"
```

```shell
fal deploy
```
machine_type is a code-specific parameter — it always comes from your code and resets on every deploy. See Scaling Configuration for details.