fal Compute gives you dedicated GPU instances that run continuously under your control. Unlike Serverless, where runners scale automatically and you pay per second of execution, Compute instances stay running for as long as you need them. You SSH in, install your stack, and use the GPU directly. This makes Compute the right choice for training runs, long-running fine-tuning jobs, batch processing, and any workload that needs sustained, predictable access to hardware rather than on-demand scaling.

Compute runs on NVIDIA H100 SXM GPUs with high-speed SSD storage. For distributed workloads, 8-GPU instances can be provisioned in the same sector and connected over InfiniBand for low-latency multi-node communication. Instances are billed per hour at fixed rates, so your costs are predictable regardless of how you use the GPU. For workloads better served by autoscaling and pay-per-use, see Serverless instead.

Instance Types

Two instance types are available, both built on H100 SXM GPUs:
| Instance Type | GPUs | vCPU | RAM | VRAM | Storage |
|---|---|---|---|---|---|
| 1xH100-SXM | 1 | 16 | 200 GB | 80 GB | 1 TB SSD |
| 8xH100-SXM | 8 | 128 | 1,600 GB | 640 GB | 8 TB SSD |
The 1xH100 is suited for development, single-GPU fine-tuning, and inference workloads. The 8xH100 is designed for large-scale training, multi-GPU inference, and distributed computing. Resources scale proportionally: the 8-GPU instance has 8x the CPU cores, memory, VRAM, and storage of the single-GPU instance.
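To make the proportional scaling concrete, here is an illustrative sketch that encodes the spec table above as plain data, with a hypothetical helper (not a fal API) that picks the smallest instance whose VRAM fits a workload:

```python
# The spec table above as plain data. The helper below is a hypothetical
# illustration of instance selection, not part of any fal SDK.
INSTANCE_SPECS = {
    "1xH100-SXM": {"gpus": 1, "vcpu": 16, "ram_gb": 200, "vram_gb": 80, "storage_tb": 1},
    "8xH100-SXM": {"gpus": 8, "vcpu": 128, "ram_gb": 1600, "vram_gb": 640, "storage_tb": 8},
}

def pick_instance(required_vram_gb: float) -> str:
    """Return the smallest instance type with enough total VRAM."""
    for name, spec in sorted(INSTANCE_SPECS.items(), key=lambda kv: kv[1]["vram_gb"]):
        if spec["vram_gb"] >= required_vram_gb:
            return name
    raise ValueError(f"No instance type has {required_vram_gb} GB of VRAM")

print(pick_instance(60))   # a ~60 GB model fits on a single H100
print(pick_instance(300))  # ~300 GB of weights needs the 8-GPU instance
```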

Multi-Node and InfiniBand

When you need to distribute a workload across multiple machines, provision 8xH100 instances in the same sector. Instances within a sector are connected over InfiniBand, providing ultra-low latency and high bandwidth for frameworks like PyTorch DDP, DeepSpeed, and Horovod.
InfiniBand and sector placement are only available on 8xH100 instances. 1xH100 instances run as standalone machines without inter-node connectivity.
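Frameworks like PyTorch DDP identify each process across the sector by a global rank. As a minimal sketch (the names here are illustrative, not fal APIs), the standard bookkeeping for a sector of 8xH100 instances looks like this:

```python
# A minimal sketch of the rank arithmetic multi-node frameworks such as
# PyTorch DDP rely on. Function names here are illustrative only.
GPUS_PER_NODE = 8  # one 8xH100-SXM instance per node

def global_rank(node_rank: int, local_rank: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Map (node index, local GPU index) to a unique process rank."""
    return node_rank * gpus_per_node + local_rank

def world_size(num_nodes: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Total number of processes across all nodes in the sector."""
    return num_nodes * gpus_per_node

# Two 8xH100 instances in the same sector:
print(world_size(2))      # 16 processes in total
print(global_rank(1, 3))  # GPU 3 on the second node -> global rank 11
```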

When to Use Compute vs Serverless

The two products serve different workload profiles:
| | Compute | Serverless |
|---|---|---|
| Billing | Per-hour, fixed rate | Per-second of runner lifetime |
| Scaling | Manual (you manage instances) | Automatic (runners scale with traffic) |
| Access | Full SSH access to the machine | Code runs inside managed runners |
| Best for | Training, fine-tuning, batch jobs, research | API endpoints, on-demand inference, autoscaling |
| Cold starts | None (instance is always running) | Yes (new runners need startup time) |
Use Compute when you need sustained GPU access for hours or days at a time. Use Serverless when you need an API that scales to zero and handles traffic spikes automatically.
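A back-of-the-envelope comparison can make the billing trade-off concrete. The rates below are hypothetical placeholders, not fal's actual pricing; only the billing shapes (per-hour vs. per-second) come from this page:

```python
# HYPOTHETICAL rates for illustration only -- check fal's pricing page
# for real numbers. Only the billing models themselves are from the docs.
COMPUTE_RATE_PER_HOUR = 3.00         # placeholder $/hour for an instance
SERVERLESS_RATE_PER_SECOND = 0.0011  # placeholder $/second of runner lifetime

def compute_cost(hours: float) -> float:
    """Per-hour, fixed rate: cost depends only on how long the instance runs."""
    return hours * COMPUTE_RATE_PER_HOUR

def serverless_cost(runner_seconds: float) -> float:
    """Per-second of runner lifetime: cost tracks actual usage."""
    return runner_seconds * SERVERLESS_RATE_PER_SECOND

# A 10-hour fine-tuning run vs. 2,000 one-second inference requests:
print(f"${compute_cost(10):.2f}")
print(f"${serverless_cost(2000):.2f}")
```

Under these placeholder rates, a sustained 10-hour run is cheaper as a Compute instance, while sparse one-second requests are cheaper on Serverless because you pay nothing between them.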

Getting Started

Provisioning an instance takes about 2-3 minutes. You choose an instance type, select a sector (for multi-node setups), paste your SSH public key, and click create. Once the instance is ready, you SSH in and have full control.