How the Queue Works
When you submit a request to queue.fal.run, it enters a persistent, durable queue. The request then moves through three states:
Request Lifecycle
| Status | SDK Type (Python / JS) | What is happening |
|---|---|---|
| IN_QUEUE | Queued(position) / "IN_QUEUE" | Request is received and stored. Waiting for an available runner. |
| IN_PROGRESS | InProgress(logs) / "IN_PROGRESS" | fal’s dispatcher has routed the request to a runner. |
| COMPLETED | Completed(logs, metrics) / "COMPLETED" | Result is stored and available for retrieval, or sent to your webhook. |
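The lifecycle can be modeled as a small state machine. The sketch below is illustrative only: the `QueueStatus` enum and `TRANSITIONS` table are hypothetical names, and the model covers just the normal path plus the re-queue after a runner failure described under Key Guarantees. Cancellation and timeouts are left out.

```python
from enum import Enum


class QueueStatus(Enum):
    """The three queue states from the lifecycle table."""
    IN_QUEUE = "IN_QUEUE"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"


# Toy transition table. IN_PROGRESS -> IN_QUEUE models the automatic
# re-queue after a runner failure (503, 504, or a connection error).
TRANSITIONS = {
    QueueStatus.IN_QUEUE: {QueueStatus.IN_PROGRESS},
    QueueStatus.IN_PROGRESS: {QueueStatus.IN_QUEUE, QueueStatus.COMPLETED},
    QueueStatus.COMPLETED: set(),  # terminal: result stored or sent to webhook
}


def can_transition(current: QueueStatus, nxt: QueueStatus) -> bool:
    return nxt in TRANSITIONS[current]


def is_terminal(status: QueueStatus) -> bool:
    return not TRANSITIONS[status]
```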
Key Guarantees
Requests in the queue are never dropped. If no runners are available, your request waits while fal scales up new runners automatically. There is no queue size limit. If a runner fails during processing (503, 504, or a connection error), the request is automatically re-queued and retried up to 10 times. As demand grows, runners scale up to match. When demand drops, they scale back down, so you only pay for compute you use.
Submit a Request
Use submit() to send a request to the queue and return immediately. In Python, submit() returns a SyncRequestHandle object with methods for status, result, and cancel (submit_async() returns an AsyncRequestHandle). In JavaScript, submit() returns an object containing the request_id, which you then pass to the separate fal.queue.* functions. In the REST API, the response includes URLs for each operation.
The REST response includes the request_id and convenience URLs for tracking the request. Store the request_id if you need to check status or retrieve results later, even from a different process.
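As a minimal sketch of keeping that information around, the helpers below parse a REST submit response and serialize the tracking data so another process can resume. The field names `status_url`, `response_url`, and `cancel_url` are assumptions about the response body, and `QueuedRequest`, `to_json`, and `from_json` are hypothetical helpers; verify the fields against a real response.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QueuedRequest:
    request_id: str
    status_url: str
    response_url: str
    cancel_url: str


def parse_submit_response(body: str) -> QueuedRequest:
    """Keep the request_id and tracking URLs from a REST submit response.

    The URL field names are assumed; verify them against a real response.
    """
    data = json.loads(body)
    return QueuedRequest(
        request_id=data["request_id"],
        status_url=data["status_url"],
        response_url=data["response_url"],
        cancel_url=data["cancel_url"],
    )


def to_json(req: QueuedRequest) -> str:
    # Serialize for a database row or file so another process can resume.
    return json.dumps(asdict(req))


def from_json(text: str) -> QueuedRequest:
    return QueuedRequest(**json.loads(text))
```

Storing the whole record rather than only the ID avoids having to rebuild the URLs later.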
Check Status
Poll for the current state of the request. Pass with_logs=True (Python), logs: true (JS), or ?logs=1 (REST) to include runner log output.
| Field | Description |
|---|---|
| queue_position | Number of requests ahead of yours (only when IN_QUEUE) |
| logs | Array of log messages from the runner (when logs are enabled) |
| metrics.inference_time | Seconds the runner spent processing (only when COMPLETED) |
| error | A human-readable error message, present only if the request failed (only when COMPLETED) |
| error_type | A machine-readable error type string, present only if the request failed. See Request Error Types for the full list of values. |
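The fields above can be turned into a one-line progress message. This is a sketch: `describe_status` is a hypothetical helper, and it assumes each log entry is an object with a `message` key, which you should check against real log output.

```python
from typing import Any


def describe_status(status: dict[str, Any]) -> str:
    """Summarize a status JSON object using the fields from the table above."""
    state = status.get("status")
    if state == "IN_QUEUE":
        return f"queued, {status.get('queue_position', '?')} ahead"
    if state == "IN_PROGRESS":
        # logs is only present when they were requested.
        logs = status.get("logs") or []
        last = logs[-1].get("message", "") if logs else ""  # assumed entry shape
        return f"running: {last}" if last else "running"
    if state == "COMPLETED":
        err = status.get("error")
        if err:
            return f"failed ({status.get('error_type', 'unknown')}): {err}"
        secs = (status.get("metrics") or {}).get("inference_time")
        return f"done in {secs:.2f}s" if secs is not None else "done"
    raise ValueError(f"unexpected status: {state!r}")
```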
Stream Status Updates
Instead of polling manually, you can stream status updates continuously until the request completes. In Python, iter_events() polls and yields each status. In JavaScript, streamStatus() opens an SSE connection.
Over REST, the stream endpoint responds with Content-Type text/event-stream. Each SSE event is a JSON status object in the same format as the polling endpoint, and the connection stays open until the status reaches COMPLETED.
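If you consume the REST stream without an SDK, you need a small SSE parser. The sketch below handles only basic SSE framing: `data:` lines accumulate, and a blank line ends an event. It ignores other fields and comments and stops at the first COMPLETED status. `iter_sse_statuses` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_statuses(lines: Iterable[str]) -> Iterator[dict]:
    """Parse text/event-stream lines into status objects until COMPLETED."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event accumulated so far.
            if data_lines:
                status = json.loads("\n".join(data_lines))
                data_lines = []
                yield status
                if status.get("status") == "COMPLETED":
                    return
            continue
        if line.startswith("data:"):
            value = line[5:]
            # SSE strips one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Comments (":...") and event:/id:/retry: fields are ignored here.
    if data_lines:  # stream ended without a trailing blank line
        yield json.loads("\n".join(data_lines))
```

With urllib, you could feed it `(b.decode("utf-8") for b in response)` from the object returned by `urllib.request.urlopen`.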
Get the Result
Retrieve the final output once the request is complete. In Python, get() polls internally until Completed, then fetches the response. In JavaScript, call fal.queue.result() after the status is COMPLETED.
The output schema depends on the model. For example, video models return a video object, and audio/speech models return an audio_url or audio object. Check the model’s API page (e.g., FLUX.1 schnell API) for the exact output schema.
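When a script has to handle several models, a schema-agnostic fallback can collect media URLs before they expire. This sketch assumes media is exposed as string values under a `url` key or a key ending in `_url`. Prefer each model's documented fields when you know them. `iter_media_urls` is a hypothetical helper, and the URLs in the test are placeholders.

```python
from typing import Any, Iterator


def iter_media_urls(output: Any) -> Iterator[str]:
    """Yield strings stored under 'url' or '*_url' keys anywhere in the output."""
    if isinstance(output, dict):
        for key, value in output.items():
            if isinstance(value, str) and (key == "url" or key.endswith("_url")):
                yield value
            else:
                yield from iter_media_urls(value)  # descend into nested objects
    elif isinstance(output, list):
        for item in output:
            yield from iter_media_urls(item)
```

Each URL can then be saved locally, for example with `urllib.request.urlretrieve(url, filename)`.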
All media URLs in responses (https://v3.fal.media/...) are publicly accessible and subject to your media expiration settings. Download files you need to keep before they expire.
Cancel a Request
Cancel a request. What happens depends on the request’s state:
- Still in the queue (IN_QUEUE): The request is removed immediately and is never processed.
- Already being processed (IN_PROGRESS): fal sends a cancellation signal to the runner. Whether the running code actually stops depends on whether the app implements a cancel endpoint. If it does not, the request may still run to completion.
| HTTP Status | JSON Body | Meaning |
|---|---|---|
| 202 Accepted | {"status": "CANCELLATION_REQUESTED"} | Cancel accepted. The request may still complete if it was already mid-processing. |
| 400 Bad Request | {"status": "ALREADY_COMPLETED"} | The request already finished before the cancel arrived. |
| 404 Not Found | {"status": "NOT_FOUND"} | No request exists with that ID. |
In Python, handler.cancel() returns silently on 202 and raises an exception on 400 or 404.
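When you cancel over REST, the response table above maps to three distinct outcomes. This sketch follows that table directly. `cancel_outcome` and its return labels are hypothetical names.

```python
import json


def cancel_outcome(http_status: int, body: str) -> str:
    """Map a REST cancel response to an outcome, following the table above."""
    status = json.loads(body).get("status") if body else None
    if http_status == 202 and status == "CANCELLATION_REQUESTED":
        # Accepted, but an IN_PROGRESS request may still complete.
        return "requested"
    if http_status == 400 and status == "ALREADY_COMPLETED":
        return "too_late"  # fetch the result instead
    if http_status == 404 and status == "NOT_FOUND":
        return "unknown_id"
    raise RuntimeError(f"unexpected cancel response {http_status}: {body!r}")
```

With urllib, 400 and 404 surface as `urllib.error.HTTPError`. Pass `err.code` and `err.read().decode()` into the function.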
Webhooks
Instead of polling, configure fal to POST results directly to your server when a request completes. The payload's status field is "OK" for successful responses (HTTP 200) or "ERROR" for failures. These values differ from the queue status values (IN_QUEUE, IN_PROGRESS, COMPLETED).
Return 200 quickly to acknowledge the webhook. fal may retry failed deliveries, so use request_id to make your handler idempotent. See Webhooks for full details on payload format, retries, and signature verification.
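An idempotent handler processes each request_id at most once. This minimal sketch keeps deduplication state in memory, while a real service would persist it. It does no signature verification. `make_webhook_handler` and the callbacks are hypothetical. The request_id is recorded only after processing succeeds, so a redelivery can still retry a failed attempt.

```python
from typing import Any, Callable


def make_webhook_handler(on_result: Callable[[str, dict], None],
                         on_error: Callable[[str, dict], None]):
    """Return a handler that processes each request_id at most once."""
    seen: set[str] = set()

    def handle(payload: dict[str, Any]) -> int:
        request_id = payload["request_id"]
        if request_id in seen:
            return 200  # duplicate delivery: acknowledge, do nothing
        if payload.get("status") == "OK":
            on_result(request_id, payload)
        else:  # "ERROR"
            on_error(request_id, payload)
        seen.add(request_id)  # mark done only after processing succeeded
        return 200  # acknowledge quickly; defer heavy work elsewhere

    return handle
```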
submit() Parameters
path
Endpoint path appended to the model ID. Most models expose a single root endpoint, so you can leave this empty. Use it when a model or your own app defines additional endpoints at sub-paths.
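For REST callers, the effect of path can be sketched as URL construction. This assumes the path is joined to the model ID with a single slash on the queue.fal.run host. `queue_url` is a hypothetical helper, and `my-user/my-app` and `/upscale` in the test are placeholder names.

```python
def queue_url(model_id: str, path: str = "") -> str:
    """Build a queue submit URL, appending an optional endpoint sub-path."""
    base = "https://queue.fal.run/" + model_id.strip("/")
    path = path.strip("/")  # tolerate leading/trailing slashes
    return f"{base}/{path}" if path else base
```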
start_timeout
Server-side deadline in seconds, sent as the X-Fal-Request-Timeout header. Despite the header name, this is a time-to-start deadline, not a total request deadline. The clock starts when the request is submitted and covers queue wait, runner acquisition, and failed retry attempts. Once a runner successfully begins processing your request, the timeout stops and inference can run as long as it needs.
If a runner picks up the request but fails (e.g., returns 503 or crashes), the request goes back to the queue for a retry — and the clock keeps ticking. All queue waits, runner acquisition, and failed attempts count against the same single deadline. If the deadline is reached before any runner successfully starts processing, the server returns 504 Gateway Timeout with the header X-Fal-Request-Timeout-Type: user and no further retries occur.
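The single-deadline rule can be illustrated with a toy simulation. It is not fal's scheduler. Each hypothetical `Attempt` records when a runner picked up the request and whether it started successfully, and every attempt is measured from the original submission time.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    starts_at: float   # seconds after submission when a runner picks it up
    succeeded: bool    # False models a 503, crash, or connection error


def start_outcome(start_timeout: float, attempts: list[Attempt]) -> str:
    """Toy model of the time-to-start deadline.

    Queue waits and failed attempts share one budget; the first successful
    start before the deadline stops the clock.
    """
    for attempt in attempts:
        if attempt.starts_at >= start_timeout:
            break  # deadline reached before this runner could start
        if attempt.succeeded:
            return "running"  # clock stops; inference may outlast the deadline
    # No successful start in time: the deadline is eventually reached.
    return "504 Gateway Timeout (X-Fal-Request-Timeout-Type: user)"
```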
This timeout does not limit how long inference takes. Once a runner starts processing, the request runs to completion. The maximum inference time is controlled by the app’s request_timeout setting (default 3600s), which is configured by the app developer, not the caller. If you need a total client-side deadline that includes processing time, use client_timeout on subscribe().
hint
Routing hint sent as the X-Fal-Runner-Hint header. When you pass a hint string, fal tries to route the request to the same runner that handled a previous request with the same hint. This is useful for session affinity — for example, keeping requests pinned to a runner that already has a specific model or adapter loaded in memory. For serverless apps that serve multiple models, your app can implement provide_hints() on the server side to tell fal what each runner is specialized for. See Optimize Routing Behavior for the full pattern.
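For session affinity, every request that should share a runner needs the same hint string. One option is to derive the hint deterministically from whatever the runner keeps in memory, such as an adapter URL. Hashing it into a short hex string is a choice made here to keep the header value compact and ASCII, not a fal requirement. `adapter_hint` is a hypothetical helper.

```python
import hashlib


def adapter_hint(adapter_url: str) -> str:
    """Derive a stable, header-safe hint string from an adapter identifier.

    Identical inputs always produce identical hints, so requests for the
    same adapter become candidates for the same runner.
    """
    return hashlib.sha256(adapter_url.encode("utf-8")).hexdigest()[:16]
```

The result is passed as the hint parameter, which is sent as X-Fal-Runner-Hint.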
priority
Queue priority for the request, sent as the X-Fal-Queue-Priority header. Accepts "normal" (default) or "low".
Priority applies to the per-endpoint queue — every request to the same endpoint shares one queue, regardless of who sent it. A low-priority request sits behind all normal-priority requests in that queue and is only processed once no normal requests are waiting. This means setting priority="low" on a shared model API (like fal-ai/flux/dev) deprioritizes your request relative to all other users of that model.
Low priority is most useful for your own deployed serverless apps where you control all traffic. For example, you might submit user-facing requests at normal priority and background batch jobs at low priority so interactive requests are always served first.
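The ordering rule can be modeled as one per-endpoint priority queue. Normal-priority requests always come before low-priority ones, and arrival order breaks ties within each class. This toy `EndpointQueue` is hypothetical and ignores runner scaling. It also makes the starvation risk visible, because a low request waits for as long as normal requests keep arriving.

```python
import heapq
import itertools

_PRIORITY_RANK = {"normal": 0, "low": 1}


class EndpointQueue:
    """Toy per-endpoint queue: normal before low, FIFO within each class."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._counter = itertools.count()  # arrival order breaks ties

    def push(self, request_id: str, priority: str = "normal") -> None:
        rank = _PRIORITY_RANK[priority]
        heapq.heappush(self._heap, (rank, next(self._counter), request_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]
```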
webhook_url
URL where fal sends a POST request with the result when processing completes. When set, you don’t need to poll for status — the result arrives at your server automatically. The webhook payload includes the request_id, status, and the full model output. See Webhooks for payload format, retries, and signature verification.
headers
Additional HTTP headers passed with the request. Use this to set platform-level headers like X-Fal-Store-IO (disable payload storage), X-Fal-No-Retry (disable retries), or X-Fal-Object-Lifecycle-Preference (control media expiration).
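REST callers set the same options directly as headers. This sketch maps submit()-style options onto the headers named in this section and builds an unsent urllib request. The `Authorization: Key <key>` scheme is an assumption to confirm against the authentication docs. `queue_headers` and `build_submit_request` are hypothetical helpers. Values for X-Fal-Store-IO and X-Fal-Object-Lifecycle-Preference are not covered here, so they pass through `extra` unchanged.

```python
import json
import urllib.request
from typing import Optional


def queue_headers(api_key: str, *, start_timeout: Optional[float] = None,
                  hint: Optional[str] = None, priority: Optional[str] = None,
                  no_retry: bool = False,
                  extra: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Translate submit()-style options into X-Fal-* request headers."""
    headers = {"Authorization": f"Key {api_key}",  # assumed auth scheme
               "Content-Type": "application/json"}
    if start_timeout is not None:
        headers["X-Fal-Request-Timeout"] = str(start_timeout)
    if hint is not None:
        headers["X-Fal-Runner-Hint"] = hint
    if priority is not None:
        if priority not in ("normal", "low"):
            raise ValueError("priority must be 'normal' or 'low'")
        headers["X-Fal-Queue-Priority"] = priority
    if no_retry:
        headers["X-Fal-No-Retry"] = "1"
    headers.update(extra or {})  # caller-supplied platform headers
    return headers


def build_submit_request(url: str, payload: dict,
                         headers: dict[str, str]) -> urllib.request.Request:
    # Not sent here; pass the result to urllib.request.urlopen to submit.
    return urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                  headers=headers, method="POST")
```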
subscribe() Parameters
subscribe() accepts all of the submit() parameters above (path, start_timeout, hint, priority, webhook_url, headers), plus the following:
client_timeout
Client-side deadline in seconds (Python SDK only). Limits the total time the client waits for the result, including queue wait and processing. When exceeded, the client stops polling and raises a FalClientTimeoutError. The request may still be processing on the server.
Interaction with start_timeout: If you set client_timeout without setting start_timeout, the SDK automatically sets start_timeout = client_timeout so the server also respects your deadline. If you explicitly set start_timeout to a value larger than client_timeout, the SDK emits a warning because the server-side timeout would never be reached before the client gives up.
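The defaulting rule above can be restated in a few lines. This is a re-statement of the documented behavior and not the SDK's source. `resolve_start_timeout` is a hypothetical name, and the warning text is illustrative.

```python
import warnings
from typing import Optional


def resolve_start_timeout(client_timeout: Optional[float],
                          start_timeout: Optional[float]) -> Optional[float]:
    """Return the effective start_timeout given both settings."""
    if client_timeout is None:
        return start_timeout
    if start_timeout is None:
        return client_timeout  # server enforces the same deadline
    if start_timeout > client_timeout:
        warnings.warn("start_timeout exceeds client_timeout; the server-side "
                      "deadline will never be reached before the client gives up")
    return start_timeout
```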
| Timeout | Enforced by | Affects server | Use when |
|---|---|---|---|
| start_timeout / X-Fal-Request-Timeout | Server | Yes (returns 504, stops retries) | You want the server to cancel the request |
| client_timeout | Client | No (request may continue) | You want a local deadline without affecting the server |
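A client-side deadline only changes what your process does. This sketch polls a status function until COMPLETED and gives up locally once the deadline passes, leaving the server-side request untouched. `poll` stands in for any status call, and `ClientDeadlineExceeded` is a hypothetical stand-in for the SDK's FalClientTimeoutError.

```python
import time
from typing import Callable


class ClientDeadlineExceeded(Exception):
    """Hypothetical stand-in for the SDK's client-side timeout error."""


def wait_for_completion(poll: Callable[[], dict], client_timeout: float,
                        interval: float = 0.5) -> dict:
    """Poll until COMPLETED or until the local deadline passes.

    Only the client gives up; the request may keep running on the server.
    """
    deadline = time.monotonic() + client_timeout
    while True:
        status = poll()
        if status.get("status") == "COMPLETED":
            return status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise ClientDeadlineExceeded(f"no result within {client_timeout}s")
        time.sleep(min(interval, remaining))  # never sleep past the deadline
```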
Disabling Retries
By default, fal automatically retries queue requests that fail due to server errors, timeouts, or rate limits. To disable retries for a specific request, pass the X-Fal-No-Retry header when submitting. When the header is set to 1, true, or yes, fal will not retry the request even if it fails due to a retryable error.
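The interaction between automatic retries and X-Fal-No-Retry can be modeled as a loop. This toy `run_with_retries` is hypothetical and is not fal's retry engine. It treats 503, 504, and connection errors (represented as `None`) as retryable, as listed under Key Guarantees. The exact retryable set, including rate limits, is decided by fal. The header values are matched exactly as documented.

```python
from typing import Callable, Optional

MAX_RETRIES = 10                  # "retried up to 10 times"
RETRYABLE_STATUSES = {503, 504}   # None stands for a connection error
NO_RETRY_VALUES = {"1", "true", "yes"}


def run_with_retries(attempt: Callable[[], Optional[int]],
                     headers: dict[str, str]) -> tuple[Optional[int], int]:
    """Toy model of platform-side retries; returns (final status, retries used)."""
    no_retry = headers.get("X-Fal-No-Retry", "").strip() in NO_RETRY_VALUES
    retries = 0
    while True:
        status = attempt()
        retryable = status is None or status in RETRYABLE_STATUSES
        if not retryable or no_retry or retries >= MAX_RETRIES:
            return status, retries
        retries += 1  # re-queue and try again
```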