submit(), subscribe()), your request enters the queue and benefits from automatic retries, status tracking, and durability. Direct methods (run(), stream()) bypass the queue and connect straight to a runner, which is faster but means no retries and no status polling. Both paths share the same runner and scaling infrastructure underneath.
Request Lifecycle (Queue-Based)
When using submit() or subscribe(), a request moves through three states visible to callers via the queue status API. Direct calls via run() and stream() bypass the queue entirely and do not have these states.
| State | What is happening | Caller sees |
|---|---|---|
| IN_QUEUE | Request is waiting in the queue for an available runner | queue_position indicating how many requests are ahead |
| IN_PROGRESS | A runner is executing your endpoint handler | logs from your code (when enabled) |
| COMPLETED | Result is stored and ready for retrieval | Full response payload |
Cancelling a request returns CANCELLATION_REQUESTED (202) or ALREADY_COMPLETED (400) rather than transitioning the request to a pollable status.
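The three queue states can be followed with a simple polling loop. The sketch below is only an illustration: it does not call the fal SDK, and `poll_until_completed`, `fake_status_source`, and the shape of the status dictionaries are hypothetical stand-ins for whatever the queue status API actually returns.

```python
from enum import Enum
from typing import Callable


class QueueState(Enum):
    IN_QUEUE = "IN_QUEUE"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"


def poll_until_completed(fetch_status: Callable[[], dict], max_polls: int = 100) -> list:
    """Call fetch_status repeatedly and collect every update until COMPLETED."""
    seen = []
    for _ in range(max_polls):
        update = fetch_status()
        seen.append(update)
        if QueueState(update["status"]) is QueueState.COMPLETED:
            return seen
    raise TimeoutError("request did not complete within max_polls")


def fake_status_source() -> Callable[[], dict]:
    """Hypothetical stand-in for a status endpoint: replays a fixed sequence."""
    updates = iter([
        {"status": "IN_QUEUE", "queue_position": 2},
        {"status": "IN_QUEUE", "queue_position": 0},
        {"status": "IN_PROGRESS", "logs": ["loading model"]},
        {"status": "COMPLETED"},
    ])
    return lambda: next(updates)


history = poll_until_completed(fake_status_source())
states = [update["status"] for update in history]
print(states)
```

A real client would sleep between polls or stream status updates instead of looping as fast as possible.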
How Requests Flow
Submission
The caller submits a request via the SDK or REST API. fal assigns a request_id and places the request in the persistent queue. The request enters IN_QUEUE state. By default there is no queue size limit and requests are never dropped. Callers can optionally set fal_max_queue_length to reject requests with 429 if the queue exceeds a threshold.
Dispatch
The dispatcher checks for available IDLE runners. If a runner is free, the request is routed immediately and enters IN_PROGRESS. If all runners are busy, the request waits in the queue while fal scales up new runners. Runners with matching routing hints are preferred when available.
Processing
The runner receives the request as a standard HTTP call. Your endpoint handler runs, processes the input, and returns a response. The runner transitions from RUNNING back to IDLE. If the runner fails, the request is retried automatically.
Result
The response is stored and the request enters COMPLETED. The caller retrieves the result by polling or streaming status, or receives it via webhook. For direct run() calls, the response is returned in the same HTTP connection.
Your handler code is the same whether the caller uses run(), submit(), or stream(). The queue and dispatch layer are transparent to your app code.
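The submission and dispatch steps above can be modelled with a small in-memory queue. This is a toy sketch, not fal's dispatcher: `ToyQueue`, `Runner`, the `hint` field, and the exact comparison used for the length limit are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Runner:
    name: str
    state: str = "IDLE"         # IDLE or RUNNING
    hint: Optional[str] = None  # hypothetical routing hint label


@dataclass
class ToyQueue:
    runners: list
    max_queue_length: Optional[int] = None  # models fal_max_queue_length
    pending: deque = field(default_factory=deque)

    def submit(self, request_id: str, hint: Optional[str] = None) -> str:
        """Enqueue the request, or reject it when the queue is at the limit."""
        # Assumption: a request is rejected once the queue already holds the limit.
        if self.max_queue_length is not None and len(self.pending) >= self.max_queue_length:
            return "REJECTED_429"
        self.pending.append((request_id, hint))
        return "IN_QUEUE"

    def dispatch(self) -> Optional[tuple]:
        """Route the oldest pending request to an idle runner, if one exists."""
        if not self.pending:
            return None
        idle = [r for r in self.runners if r.state == "IDLE"]
        if not idle:
            return None  # request stays IN_QUEUE until a runner frees up
        request_id, hint = self.pending[0]
        # Prefer an idle runner whose hint matches; otherwise take any idle runner.
        chosen = next((r for r in idle if hint is not None and r.hint == hint), idle[0])
        self.pending.popleft()
        chosen.state = "RUNNING"
        return request_id, chosen.name


queue = ToyQueue(
    runners=[Runner("runner-a"), Runner("runner-b", hint="model-x")],
    max_queue_length=2,
)
accepted = [queue.submit("r1", hint="model-x"), queue.submit("r2"), queue.submit("r3")]
routed = [queue.dispatch(), queue.dispatch(), queue.dispatch()]
print(accepted)
print(routed)
```

In this run the third submission is rejected because two requests are already waiting, and `r1` goes to `runner-b` because its hint matches even though `runner-a` is listed first.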
Requests and Retries
Retries only apply to queue-based requests. Direct calls via run() and stream() return errors immediately with no retry.
When a runner fails while processing a queued request, the request is placed in a scheduled requeue with a backoff delay, then re-enters the queue and is dispatched to a different runner. This happens automatically for server errors (503), timeouts (504), and connection failures, up to 10 attempts. The retry is transparent to the caller, who continues polling the same request_id and eventually gets a result or a final failure.
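The retry conditions described above reduce to a small decision function. This is a minimal sketch under stated assumptions: `should_retry` is a hypothetical helper, and "up to 10 attempts" is read here as 10 attempts in total rather than 10 retries after the first try. The backoff delay is left out because its schedule is not specified.

```python
from typing import Optional

RETRYABLE_STATUS = {503, 504}  # server error and timeout, per the text above
MAX_ATTEMPTS = 10              # assumption: counts the first attempt too


def should_retry(attempt: int, status: Optional[int], connection_failed: bool = False) -> bool:
    """Decide whether a failed attempt should be requeued.

    attempt is the 1-based number of the attempt that just failed.
    status is the HTTP status from the runner, or None if no response arrived.
    """
    if attempt >= MAX_ATTEMPTS:
        return False  # attempt budget exhausted: report a final failure
    return connection_failed or status in RETRYABLE_STATUS


print(should_retry(1, 503))                          # True
print(should_retry(1, 500))                          # False
print(should_retry(3, None, connection_failed=True)) # True
print(should_retry(10, 504))                         # False
```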
The start_timeout clock runs continuously across all retry attempts. If you set start_timeout=30 and the first attempt fails after 20 seconds, the second attempt only has 10 seconds left before the server returns 504. This prevents retries from running indefinitely.
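The arithmetic in that example can be checked with a few lines. `remaining_budget` is a hypothetical helper, not part of any fal API; it assumes every second spent so far, including time waiting in the requeue, counts against start_timeout.

```python
def remaining_budget(start_timeout: float, elapsed_seconds: list) -> float:
    """Seconds left on the shared start_timeout clock after the given spans."""
    return max(0.0, start_timeout - sum(elapsed_seconds))


# First attempt fails after 20 s of a 30 s budget: 10 s remain for the second.
after_first = remaining_budget(30, [20])
# If the second attempt then uses 15 s, the budget is exhausted.
after_second = remaining_budget(30, [20, 15])
print(after_first, after_second)
```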
You can control retry behavior at three levels: app-level with skip_retry_conditions, per-response with the X-Fal-Needs-Retry header, and per-request with the X-Fal-No-Retry header from the caller.