A runner is a compute instance of your application running on fal’s infrastructure. When you deploy an app, fal creates runners on demand to handle incoming requests. Each runner is tied to a specific machine type that determines its hardware resources (CPU cores, RAM, GPU type and count). Runners automatically start when requests arrive and shut down when idle, so you only pay for compute you actually use.

Understanding the runner lifecycle is essential for making good decisions about scaling, cold start optimization, and cost management. Every scaling parameter, every caching strategy, and every startup optimization ultimately affects how runners behave.

This page covers what happens from the moment a runner is scheduled to the moment it shuts down. For how requests flow through the queue and interact with runners, see Understanding Requests.

Runner Lifecycle and States

Runners transition through a sequence of states during their lifetime. The startup flow determines your cold start latency, and the shutdown flow determines how gracefully your app handles scale-down events.
| State | Description |
| --- | --- |
| PENDING | Runner is waiting to be scheduled on available hardware |
| DOCKER_PULL | Pulling Docker image from registry (skipped when image is cached) |
| SETUP | Running setup() method, loading model and initializing resources |
| FAILURE_DELAY | Previous runner startup failed; delaying this runner’s start before retrying |
| IDLE | Ready and waiting for work, no active requests |
| RUNNING | Actively processing one or more requests |
| DRAINING | Finishing current requests, won’t accept new ones |
| TERMINATING | Shutting down, running teardown() if defined |
| TERMINATED | Runner has stopped and resources are released |
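
The states and the transitions described on this page can be written down as a small lookup table. The following is a minimal sketch, not fal's implementation: the names RunnerState, TRANSITIONS, and is_valid_path are hypothetical, and the failure paths (crashes, entry into FAILURE_DELAY) are left out because this page does not spell out their exact source states.

```python
from enum import Enum


class RunnerState(Enum):
    PENDING = "PENDING"
    DOCKER_PULL = "DOCKER_PULL"
    SETUP = "SETUP"
    FAILURE_DELAY = "FAILURE_DELAY"
    IDLE = "IDLE"
    RUNNING = "RUNNING"
    DRAINING = "DRAINING"
    TERMINATING = "TERMINATING"
    TERMINATED = "TERMINATED"


S = RunnerState

# Only transitions that this page describes explicitly (assumption: the list
# is not exhaustive; crash and failure paths are intentionally omitted).
TRANSITIONS = {
    S.FAILURE_DELAY: {S.PENDING},             # retry after the delay
    S.PENDING: {S.DOCKER_PULL, S.SETUP},      # pull is skipped when cached
    S.DOCKER_PULL: {S.SETUP},
    S.SETUP: {S.IDLE},
    S.IDLE: {S.RUNNING, S.TERMINATING},
    S.RUNNING: {S.IDLE, S.DRAINING},
    S.DRAINING: {S.TERMINATING},
    S.TERMINATING: {S.TERMINATED},
    S.TERMINATED: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair in `path` is a listed transition."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))
```

For example, is_valid_path([S.PENDING, S.SETUP, S.IDLE]) holds for a runner whose image is already cached, while a path that jumps straight from PENDING to IDLE is rejected.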

Startup

When demand increases, fal schedules a new runner. If the runner’s Docker image isn’t already cached, it’s pulled first (DOCKER_PULL). Then your setup() method runs to load models and initialize resources. Once setup completes and the health check passes, the runner enters IDLE and is ready to serve requests. The time from PENDING to IDLE is your cold start latency.
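
To see where cold start time goes, it can help to break the PENDING-to-IDLE interval into phases. The sketch below assumes you have a chronological list of (state, timestamp) pairs for one runner; the event list and the function names are hypothetical and do not reflect any fal log format.

```python
from datetime import datetime


def cold_start_seconds(events):
    """Seconds from the first PENDING event to the first IDLE event.

    `events` is a chronological list of (state_name, datetime) tuples.
    """
    pending_at = next(ts for state, ts in events if state == "PENDING")
    idle_at = next(ts for state, ts in events if state == "IDLE")
    return (idle_at - pending_at).total_seconds()


def phase_durations(events):
    """Seconds spent in each state before the next state began."""
    return {
        state: (next_ts - ts).total_seconds()
        for (state, ts), (_, next_ts) in zip(events, events[1:])
    }


# Hypothetical timeline for illustration only; not real fal output.
events = [
    ("PENDING", datetime(2024, 1, 1, 12, 0, 0)),
    ("DOCKER_PULL", datetime(2024, 1, 1, 12, 0, 2)),
    ("SETUP", datetime(2024, 1, 1, 12, 0, 20)),
    ("IDLE", datetime(2024, 1, 1, 12, 0, 45)),
]

print(cold_start_seconds(events))  # 45.0
print(phase_durations(events))
# {'PENDING': 2.0, 'DOCKER_PULL': 18.0, 'SETUP': 25.0}
```

In this made-up timeline the image pull and setup() account for nearly all of the 45 seconds, which is why image caching and setup() optimization are the usual levers for reducing cold starts.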

Startup Failure

If a runner crashes during startup, or if setup() times out, the runner is terminated. To prevent a tight crash loop, the system applies an incremental backoff to all subsequent runner starts for the same app. When a new runner is about to start and previous runners have failed, it enters FAILURE_DELAY, a holding state in which the runner waits before attempting allocation. The backoff works as follows:
  • Each subsequent runner start is delayed by an additional 30 seconds, so the delays grow as 30s, 60s, 90s, and so on
  • The delay is capped at 10 minutes
  • If any runner succeeds during the delay, all waiting runners are woken up immediately and the backoff resets
  • Scheduling failures (when hardware isn’t available) also trigger a delay, but use a fixed 20-second wait instead of incremental backoff
After the delay, the runner transitions back to PENDING and retries the full startup flow. When you see runners in FAILURE_DELAY, check your setup() method and runner logs for errors. Common causes include missing model files, out-of-memory errors during model loading, and dependency issues.
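
The backoff rules above reduce to a few lines of arithmetic. This is a minimal sketch under stated assumptions, not fal's scheduler code: it assumes the delay grows linearly with the number of consecutive failed starts and that a successful start resets that count to zero. The function and constant names are hypothetical.

```python
STARTUP_BACKOFF_STEP_S = 30       # each additional failed start adds 30 seconds
STARTUP_BACKOFF_CAP_S = 10 * 60   # the delay is capped at 10 minutes
SCHEDULING_DELAY_S = 20           # fixed wait when hardware isn't available


def failure_delay_seconds(consecutive_failures, scheduling_failure=False):
    """Return how long a runner would wait in FAILURE_DELAY, in seconds.

    Assumption: `consecutive_failures` counts failed starts since the last
    success, and a success resets it to 0.
    """
    if scheduling_failure:
        # Scheduling failures use a fixed wait, not incremental backoff.
        return SCHEDULING_DELAY_S
    if consecutive_failures <= 0:
        return 0
    return min(consecutive_failures * STARTUP_BACKOFF_STEP_S, STARTUP_BACKOFF_CAP_S)


print([failure_delay_seconds(n) for n in (1, 2, 3, 20, 25)])
# [30, 60, 90, 600, 600]
```

Under these assumptions the cap is reached at the 20th consecutive failure (20 × 30s = 600s), after which every further start waits the full 10 minutes until one runner succeeds.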

Request Processing

When a request arrives, an IDLE runner transitions to RUNNING. Once all of its in-flight requests have completed, it returns to IDLE. A runner can process multiple requests concurrently when max_multiplexing is set above 1.
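
The relationship between in-flight requests, max_multiplexing, and the IDLE/RUNNING states can be modeled with a simple counter. The class below is a toy model, not fal's implementation: RunnerSlots and its methods are hypothetical, and it assumes a runner declines work once it already holds max_multiplexing requests.

```python
class RunnerSlots:
    """Toy model of one runner's request slots."""

    def __init__(self, max_multiplexing=1):
        self.max_multiplexing = max_multiplexing
        self.in_flight = 0

    @property
    def state(self):
        # RUNNING while at least one request is being processed, else IDLE.
        return "RUNNING" if self.in_flight > 0 else "IDLE"

    def try_accept(self):
        """Take one request if a slot is free; return whether it was taken."""
        if self.in_flight >= self.max_multiplexing:
            return False  # assumption: full runners take no more requests
        self.in_flight += 1
        return True

    def complete(self):
        """Mark one in-flight request as finished."""
        if self.in_flight == 0:
            raise RuntimeError("no in-flight request to complete")
        self.in_flight -= 1


runner = RunnerSlots(max_multiplexing=2)
print(runner.state)         # IDLE
print(runner.try_accept())  # True
print(runner.try_accept())  # True
print(runner.try_accept())  # False
runner.complete()
print(runner.state)         # RUNNING
runner.complete()
print(runner.state)         # IDLE
```

Note that the runner stays in RUNNING until the last in-flight request finishes, which matches the rule that it returns to IDLE only after completing all requests.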

Shutdown

Runners shut down through different paths depending on their state:
  • keep_alive expiration: An idle runner with no in-flight requests is terminated directly. The system sends SIGTERM, runs your teardown() method for cleanup, and then releases resources.
  • Scale-down with in-flight requests: The runner enters DRAINING. No new requests are routed to it, but existing requests continue processing. After those requests complete, the runner enters TERMINATING, runs teardown(), and is terminated.
  • Scale-down with no in-flight requests: The runner skips DRAINING and enters TERMINATING immediately.
  • Manual stop/kill: You can terminate runners manually using fal runners stop or fal runners kill, or from the dashboard.
  • Host maintenance: The runner may be terminated if the underlying host is being drained due to a maintenance event (GPU errors, networking issues, or scheduled infrastructure updates).
For details on startup and shutdown hooks (setup(), handle_exit(), teardown()), see App Lifecycle.
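
The keep_alive and scale-down paths above can be summarized as a function from the trigger and the number of in-flight requests to the sequence of states the runner passes through. This is a sketch under stated assumptions: the trigger names and shutdown_path are hypothetical, and it assumes the keep_alive path passes through TERMINATING because that is the state in which teardown() runs. Manual stops and host maintenance are not modeled, since this page does not specify their state sequence.

```python
def shutdown_path(trigger, in_flight):
    """Return the states a runner passes through while shutting down.

    `trigger` is "keep_alive_expired" or "scale_down" (hypothetical labels).
    `in_flight` is the number of requests the runner is still processing.
    """
    if trigger == "keep_alive_expired":
        if in_flight != 0:
            # keep_alive only applies to idle runners with no in-flight work.
            raise ValueError("keep_alive expiry requires an idle runner")
        return ["TERMINATING", "TERMINATED"]
    if trigger == "scale_down":
        if in_flight > 0:
            # Finish existing requests first; no new requests are routed.
            return ["DRAINING", "TERMINATING", "TERMINATED"]
        return ["TERMINATING", "TERMINATED"]
    raise ValueError(f"unmodeled trigger: {trigger!r}")


print(shutdown_path("scale_down", in_flight=3))
# ['DRAINING', 'TERMINATING', 'TERMINATED']
print(shutdown_path("scale_down", in_flight=0))
# ['TERMINATING', 'TERMINATED']
```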