Runner Lifecycle and States
Runners transition through a sequence of states during their lifetime. The startup flow determines your cold start latency, and the shutdown flow determines how gracefully your app handles scale-down events.

| State | Description |
|---|---|
| PENDING | Runner is waiting to be scheduled on available hardware |
| DOCKER_PULL | Pulling Docker image from registry (skipped when image is cached) |
| SETUP | Running setup() method, loading model and initializing resources |
| FAILURE_DELAY | Previous runner startup failed; delaying this runner’s start before retrying |
| IDLE | Ready and waiting for work, no active requests |
| RUNNING | Actively processing one or more requests |
| DRAINING | Finishing current requests, won’t accept new ones |
| TERMINATING | Shutting down, running teardown() if defined |
| TERMINATED | Runner has stopped and resources are released |
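
The states above can be read as a small state machine. The sketch below is an illustrative model only, not part of the fal SDK: the names RunnerState, ALLOWED_TRANSITIONS, and can_transition are hypothetical, and the transition map is inferred from the prose on this page, so failure paths and forced terminations are left out.

```python
from enum import Enum


class RunnerState(Enum):
    PENDING = "PENDING"
    DOCKER_PULL = "DOCKER_PULL"
    SETUP = "SETUP"
    FAILURE_DELAY = "FAILURE_DELAY"
    IDLE = "IDLE"
    RUNNING = "RUNNING"
    DRAINING = "DRAINING"
    TERMINATING = "TERMINATING"
    TERMINATED = "TERMINATED"


# Assumption: happy-path transitions inferred from this page.
# Crash, timeout, and manual-kill paths are deliberately not modeled.
ALLOWED_TRANSITIONS = {
    RunnerState.FAILURE_DELAY: {RunnerState.PENDING},
    RunnerState.PENDING: {RunnerState.DOCKER_PULL, RunnerState.SETUP},
    RunnerState.DOCKER_PULL: {RunnerState.SETUP},
    RunnerState.SETUP: {RunnerState.IDLE},
    RunnerState.IDLE: {RunnerState.RUNNING, RunnerState.TERMINATING},
    RunnerState.RUNNING: {RunnerState.IDLE, RunnerState.DRAINING},
    RunnerState.DRAINING: {RunnerState.TERMINATING},
    RunnerState.TERMINATING: {RunnerState.TERMINATED},
    RunnerState.TERMINATED: set(),
}


def can_transition(current: RunnerState, target: RunnerState) -> bool:
    """Return True if the modeled lifecycle allows current -> target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

A model like this can be handy when validating a sequence of states taken from runner history, for example to flag a runner that went from SETUP straight to TERMINATING.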
Startup
When demand increases, fal schedules a new runner. If the runner’s Docker image isn’t already cached, it’s pulled first (DOCKER_PULL). Then your setup() method runs to load models and initialize resources. Once setup completes and the health check passes, the runner enters IDLE and is ready to serve requests. The time from PENDING to IDLE is your cold start latency.
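To make the cold start definition concrete, here is a minimal sketch that measures the time from PENDING to IDLE from a list of state transitions. The function name cold_start_latency and the (state, timestamp) record shape are assumptions for illustration, and the timestamps are invented sample values, not measurements from a real runner.

```python
from datetime import datetime, timedelta


def cold_start_latency(transitions):
    """Return the time between first entering PENDING and first entering IDLE.

    transitions: list of (state_name, entered_at) tuples in chronological order.
    This record shape is hypothetical, not a fal API response format.
    """
    pending_at = next(ts for state, ts in transitions if state == "PENDING")
    idle_at = next(ts for state, ts in transitions if state == "IDLE")
    return idle_at - pending_at


# Invented sample timeline for illustration only.
start = datetime(2024, 1, 1, 12, 0, 0)
example = [
    ("PENDING", start),
    ("DOCKER_PULL", start + timedelta(seconds=2)),
    ("SETUP", start + timedelta(seconds=20)),
    ("IDLE", start + timedelta(seconds=65)),
]

print(cold_start_latency(example).total_seconds())  # 65.0
```

In this sample, most of the latency sits in SETUP (45 seconds), which is why a cached image alone would not remove the bulk of the cold start.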
Startup Failure
If a runner crashes during startup or setup() times out, the runner is terminated. To prevent a tight crash loop, the system applies an incremental backoff to all subsequent runner starts for the same app. When a new runner is about to start and previous runners have failed, it enters FAILURE_DELAY, a holding state where the runner waits before attempting allocation.
The backoff works as follows:
- Each subsequent runner start is delayed by an additional 30 seconds (e.g. 30s, 60s, 90s)
- The delay is capped at 10 minutes
- If any runner succeeds during the delay, all waiting runners are woken up immediately and the backoff resets
- Scheduling failures (when hardware isn’t available) also trigger a delay, but use a fixed 20-second wait instead of incremental backoff
After the delay expires, the runner transitions to PENDING and retries the full startup flow.
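The backoff rules above can be sketched as a small function. This is an illustrative reading of the documented numbers, not fal's scheduler code: the function and constant names are hypothetical, and it assumes the delay grows linearly by 30 seconds for each runner start since the last success.

```python
BACKOFF_STEP_SECONDS = 30          # each subsequent start waits 30s longer
BACKOFF_CAP_SECONDS = 10 * 60      # delay is capped at 10 minutes
SCHEDULING_DELAY_SECONDS = 20      # fixed wait when hardware isn't available


def failure_delay_seconds(starts_since_last_success: int) -> int:
    """Delay before the Nth runner start after startup failures (assumed linear)."""
    if starts_since_last_success < 1:
        # A successful runner resets the backoff, so there is no delay.
        return 0
    return min(BACKOFF_STEP_SECONDS * starts_since_last_success, BACKOFF_CAP_SECONDS)


def scheduling_delay_seconds() -> int:
    """Scheduling failures use a fixed wait rather than incremental backoff."""
    return SCHEDULING_DELAY_SECONDS


print([failure_delay_seconds(n) for n in (1, 2, 3, 20, 25)])
# [30, 60, 90, 600, 600]
```

Under this reading, the cap is reached at the 20th delayed start, and every start after that waits the full 10 minutes until a runner succeeds.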
When you see runners in FAILURE_DELAY, check your setup() method and runner logs for errors. Common causes include missing model files, out-of-memory errors during model loading, and dependency issues.
Request Processing
When a request arrives, an IDLE runner transitions to RUNNING. After completing all requests, it returns to IDLE. Runners can handle multiple concurrent requests if max_multiplexing is set above 1.
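The relationship between in-flight requests, max_multiplexing, and the IDLE and RUNNING states can be modeled in a few lines. RunnerModel is a hypothetical class written for this illustration, and the behavior of refusing a request once the runner is at capacity is an assumption of the model, not a description of how fal routes requests.

```python
class RunnerModel:
    """Toy model of one runner's request-processing state."""

    def __init__(self, max_multiplexing: int = 1):
        self.max_multiplexing = max_multiplexing
        self.in_flight = 0

    @property
    def state(self) -> str:
        # RUNNING while at least one request is active, otherwise IDLE.
        return "RUNNING" if self.in_flight > 0 else "IDLE"

    def try_accept(self) -> bool:
        """Take one more request if below the concurrency limit (assumed behavior)."""
        if self.in_flight >= self.max_multiplexing:
            return False
        self.in_flight += 1
        return True

    def complete(self) -> None:
        """Mark one in-flight request as finished."""
        if self.in_flight == 0:
            raise RuntimeError("no in-flight request to complete")
        self.in_flight -= 1


runner = RunnerModel(max_multiplexing=2)
runner.try_accept()
runner.try_accept()
print(runner.state, runner.try_accept())  # RUNNING False
```

The runner only returns to IDLE once complete() has been called for every accepted request, matching the "after completing all requests" wording above.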
Shutdown
Runners shut down through different paths depending on their state:
- keep_alive expiration: An idle runner with no in-flight requests is terminated directly. The system sends SIGTERM, runs your teardown() method for cleanup, and then releases resources.
- Scale-down with in-flight requests: The runner enters DRAINING. No new requests are routed, but existing requests continue processing. After requests complete, the runner enters TERMINATING, runs teardown(), and is terminated.
- Scale-down with no in-flight requests: The runner skips DRAINING and enters TERMINATING immediately.
- Manual stop/kill: You can terminate runners manually using fal runners stop or fal runners kill, or from the dashboard.
- Host maintenance: The runner may be terminated if the underlying host is being drained due to a maintenance event (GPU errors, networking issues, or scheduled infrastructure updates).
For details on the lifecycle methods (setup(), handle_exit(), teardown()), see App Lifecycle.
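As a rough sketch of the scale-down paths described in this section, the function below returns the sequence of states a runner passes through depending on whether it still has requests in flight. The name shutdown_path is hypothetical, and the sketch covers only the two scale-down cases; manual kills and host maintenance may follow different paths that this page does not spell out.

```python
def shutdown_path(in_flight_requests: int) -> list[str]:
    """Return the modeled state sequence for a scale-down event."""
    if in_flight_requests < 0:
        raise ValueError("in_flight_requests must be non-negative")
    if in_flight_requests > 0:
        # Existing requests finish first; no new requests are routed.
        return ["DRAINING", "TERMINATING", "TERMINATED"]
    # Nothing to drain, so the runner goes straight to teardown.
    return ["TERMINATING", "TERMINATED"]


print(shutdown_path(3))  # ['DRAINING', 'TERMINATING', 'TERMINATED']
print(shutdown_path(0))  # ['TERMINATING', 'TERMINATED']
```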
Understanding Requests
Request lifecycle, retry interaction, and how requests flow through the queue to runners
Caching
How fal’s multi-layer cache reduces cold start times
CLI: fal runners
List runners, filter by state, view history, and get runner details
App Analytics
Dashboard metrics for requests, runners, and performance