When you deploy your own applications on fal Serverless, you are billed for the total time your runners are alive, measured per-second by machine type.

Billing by runner state

Every runner transitions through the states below during its lifecycle. You are billed for the states marked Yes at the per-second rate for your machine type.
| State | Billed | Description |
| --- | --- | --- |
| PENDING | No | Waiting to be scheduled on available hardware |
| DOCKER_PULL | No | Pulling your container image from the registry |
| SETUP | Yes | Running your setup() method: loading models, initializing resources |
| IDLE | Yes | Runner is ready but waiting for requests (includes keep_alive time) |
| RUNNING | Yes | Actively processing one or more requests |
| DRAINING | Yes | Finishing in-flight requests before shutdown |
| TERMINATING | Yes | Running your teardown() method |
| TERMINATED | No | Runner has stopped and resources are released |
Requests that fail with a 5xx error (HTTP status 500 or higher) are also not charged. See Runners for full details on each state and its transitions.
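
To make the billed and unbilled states concrete, here is a minimal sketch that sums the billable seconds for one runner. The set of billed states is taken from the state list above. The function name `billable_seconds`, the `(state, seconds)` timeline shape, and the sample durations are all hypothetical illustrations, not part of the fal API.

```python
# States marked "Yes" in the runner state list above.
BILLED_STATES = {"SETUP", "IDLE", "RUNNING", "DRAINING", "TERMINATING"}


def billable_seconds(timeline):
    """Sum the seconds a runner spent in billed states.

    `timeline` is a hypothetical list of (state, seconds) pairs;
    fal does not necessarily report usage in this shape.
    """
    return sum(seconds for state, seconds in timeline if state in BILLED_STATES)


# Made-up lifecycle for a single runner.
example_timeline = [
    ("PENDING", 5),       # not billed
    ("DOCKER_PULL", 20),  # not billed
    ("SETUP", 30),
    ("IDLE", 10),
    ("RUNNING", 45),
    ("IDLE", 60),         # e.g. keep_alive time after the last request
    ("TERMINATING", 2),
    ("TERMINATED", 0),    # not billed
]

print(billable_seconds(example_timeline))
```

In this made-up timeline, the 25 seconds spent in PENDING and DOCKER_PULL are excluded, while idle time, including keep_alive, still counts toward the total.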

GPU count multiplier

Multi-GPU instances are billed as gpu_count × duration. For example, a runner using 2x A100 GPUs for 60 seconds is billed as 120 GPU-seconds.
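
The multiplier can be sketched in a few lines of Python. The helper names and the `rate_per_gpu_second` parameter are hypothetical; the sketch assumes the per-second rate for a machine type is quoted per GPU, which you should confirm against the actual pricing for your machine type.

```python
def billed_gpu_seconds(gpu_count, duration_seconds):
    """Apply the GPU count multiplier: gpu_count x duration."""
    return gpu_count * duration_seconds


def estimated_cost(gpu_count, duration_seconds, rate_per_gpu_second):
    """Estimate cost, assuming a (hypothetical) price per GPU-second."""
    return billed_gpu_seconds(gpu_count, duration_seconds) * rate_per_gpu_second


# The example from the text: 2 GPUs for 60 seconds is 120 GPU-seconds.
print(billed_gpu_seconds(2, 60))

# Placeholder rate for illustration only, not a real fal price.
print(estimated_cost(2, 60, rate_per_gpu_second=0.001))
```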

Monitoring your usage