When you call a model or your own deployed app on fal, you can pass platform-level HTTP headers that control how the request is handled. These headers are separate from the model’s input arguments (like prompt or image_size) and from SDK method parameters (like start_timeout or client_timeout). They apply at the infrastructure level: they control retries, payload storage, media expiration, and routing.

Some of these headers have dedicated SDK parameters that set them automatically. For example, passing start_timeout=30 in the SDK sets X-Fal-Request-Timeout: 30 under the hood. Others, like X-Fal-Store-IO, can only be set via the headers dict. This page documents all platform headers in one place. For headers that have SDK parameters, the corresponding method pages are linked.

X-Fal-Request-Timeout (start_timeout)

Server-side time-to-start deadline in seconds. Despite the header name, this does not limit total request time. The clock starts when the request is submitted and covers queue wait, runner acquisition, and failed retry attempts. Once a runner successfully begins processing, the timeout stops and inference can run as long as it needs. If the deadline is reached before any runner starts processing, the server returns 504 Gateway Timeout with X-Fal-Request-Timeout-Type: user. To limit total client-side wait time (including processing), use client_timeout on subscribe() instead.
Header: X-Fal-Request-Timeout
Default: No timeout
Minimum: > 0.1 seconds
SDK parameter: start_timeout on submit(), subscribe(), and run()
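
As a minimal sketch of the constraints above, the helper below builds the header by hand and rejects values at or below the documented minimum. The function name start_timeout_header is hypothetical and not part of the fal SDK, and the exact string format the server accepts for fractional seconds is an assumption.

```python
def start_timeout_header(seconds: float) -> dict:
    """Build the X-Fal-Request-Timeout header for a time-to-start deadline.

    Hypothetical helper for illustration; not part of the fal SDK.
    """
    # The documented minimum is "> 0.1 seconds", so 0.1 itself is rejected.
    if seconds <= 0.1:
        raise ValueError("start timeout must be greater than 0.1 seconds")
    # Render with up to three decimals, then trim trailing zeros and the dot,
    # so 30 becomes "30" and 0.5 becomes "0.5".
    value = f"{seconds:.3f}".rstrip("0").rstrip(".")
    return {"X-Fal-Request-Timeout": value}


print(start_timeout_header(30))  # {'X-Fal-Request-Timeout': '30'}
```

When you use the SDK, passing start_timeout directly is the simpler route; a helper like this is only useful when you assemble requests yourself.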

X-Fal-Runner-Hint (hint)

Routing hint that tells fal to try to route the request to a specific runner. Useful for session affinity — for example, keeping requests pinned to a runner that already has a LoRA adapter or conversation state loaded in memory. If the hinted runner is unavailable, fal routes to any available runner.
Header: X-Fal-Runner-Hint
Default: Automatic routing
SDK parameter: hint on submit(), subscribe(), and run()

X-Fal-Queue-Priority (priority)

Queue priority for the request. Priority applies to the per-endpoint queue — every request to the same endpoint shares one queue, regardless of who sent it. A low-priority request sits behind all normal-priority requests. This means setting "low" on a shared model API deprioritizes your request relative to all other users of that model.
Header: X-Fal-Queue-Priority
Default: "normal"
Values: "normal", "low"
SDK parameter: priority on submit() and subscribe()
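
One way to apply this is to send background work at "low" priority and leave interactive requests at the default. The sketch below assumes that split; the helper name priority_header and the interactive flag are hypothetical and not part of the fal SDK.

```python
VALID_PRIORITIES = ("normal", "low")  # the two values documented above


def priority_header(interactive: bool = True) -> dict:
    """Pick a queue priority based on whether a user is waiting on the result.

    Hypothetical helper for illustration; not part of the fal SDK.
    """
    # Background or batch jobs can tolerate sitting behind normal traffic.
    priority = "normal" if interactive else "low"
    assert priority in VALID_PRIORITIES
    return {"X-Fal-Queue-Priority": priority}


print(priority_header(interactive=False))  # {'X-Fal-Queue-Priority': 'low'}
```

Because the queue is shared per endpoint, "low" on a public model means waiting behind every other caller's normal-priority requests, so it suits only work with no latency requirement.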

X-Fal-Object-Lifecycle-Preference

Control how long generated files (images, videos, audio) are stored on fal’s CDN.
Header: X-Fal-Object-Lifecycle-Preference
Default: Your account setting (forever if not configured)
Format: JSON: {"expiration_duration_seconds": <seconds>}
import json

import fal_client

# Ask fal to expire the generated files after one hour (3600 seconds).
result = fal_client.subscribe(
    "fal-ai/nano-banana-2",
    arguments={"prompt": "a sunset"},
    headers={
        "X-Fal-Object-Lifecycle-Preference": json.dumps({
            "expiration_duration_seconds": 3600
        })
    }
)
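
If you prefer to express the retention period as a duration rather than a raw number of seconds, a small wrapper can do the conversion. This is a sketch under that assumption; lifecycle_header is a hypothetical name, not an SDK function.

```python
import json
from datetime import timedelta


def lifecycle_header(ttl: timedelta) -> dict:
    """Encode a retention period as the X-Fal-Object-Lifecycle-Preference header.

    Hypothetical helper for illustration; not part of the fal SDK.
    """
    seconds = int(ttl.total_seconds())
    if seconds <= 0:
        raise ValueError("retention period must be positive")
    # The header value is a JSON object, as described in the format above.
    payload = {"expiration_duration_seconds": seconds}
    return {"X-Fal-Object-Lifecycle-Preference": json.dumps(payload)}


headers = lifecycle_header(timedelta(hours=1))
print(headers["X-Fal-Object-Lifecycle-Preference"])
# {"expiration_duration_seconds": 3600}
```

The resulting dict can be passed as the headers argument in the same way as the hand-written JSON.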

Data Retention & Storage

Full guide to media expiration, payload retention, and the delete API

X-Fal-Store-IO

Prevent fal from storing request payloads (JSON inputs and outputs). Payloads are stored for 30 days by default and power the request history in your dashboard.
Header: X-Fal-Store-IO
Default: "1" (stored for 30 days)
Values: "0" to disable storage
import fal_client

# "0" tells fal not to store the JSON input and output of this request.
result = fal_client.subscribe(
    "fal-ai/nano-banana-2",
    arguments={"prompt": "a sunset"},
    headers={"X-Fal-Store-IO": "0"}
)
This only prevents storage of the JSON payloads. CDN files generated during processing are still accessible (subject to media expiration settings).

X-Fal-No-Retry

Disable automatic retries for this request. By default, queue-based requests are retried for up to 10 total attempts on server errors (503, 504, connection errors).
Header: X-Fal-No-Retry
Default: Retries enabled
Values: "1", "true", "yes" to disable
import fal_client

# "1" disables automatic retries for this request only.
result = fal_client.subscribe(
    "fal-ai/nano-banana-2",
    arguments={"prompt": "a sunset"},
    headers={"X-Fal-No-Retry": "1"}
)

Reliability & Retries

Learn more about automatic retries, fallbacks, and error handling

x-app-fal-disable-fallback

Disable automatic model fallbacks for this request. By default, fal may reroute requests to equivalent alternative endpoints if the primary is unavailable.
Header: x-app-fal-disable-fallback
Default: Fallbacks enabled
import fal_client

# "true" keeps the request on the primary endpoint instead of a fallback.
result = fal_client.subscribe(
    "fal-ai/nano-banana-2",
    arguments={"prompt": "a sunset"},
    headers={"x-app-fal-disable-fallback": "true"}
)
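
Applications that want a request to run exactly once on the primary endpoint may set this header together with X-Fal-No-Retry. The sketch below merges the two with any other headers you already send; strict_headers is a hypothetical helper name, and combining the headers this way is an illustration rather than a documented recipe.

```python
from typing import Optional


def strict_headers(extra: Optional[dict] = None) -> dict:
    """Return headers that disable both automatic retries and model fallbacks.

    Hypothetical helper for illustration; not part of the fal SDK.
    """
    headers = {
        "X-Fal-No-Retry": "1",                 # no automatic retries
        "x-app-fal-disable-fallback": "true",  # no rerouting to alternatives
    }
    # Caller-supplied headers are merged last so they can add further options.
    if extra:
        headers.update(extra)
    return headers


print(strict_headers({"X-Fal-Store-IO": "0"}))
```

With both features off, a transient server error surfaces to your code immediately, so the caller becomes responsible for any retry policy.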

Reliability & Retries

Learn more about model fallbacks

fal_max_queue_length

Reject the request with 429 if the endpoint’s queue already has more than this many requests waiting (across all callers). Useful for latency-sensitive applications that prefer to fail fast rather than wait in a long queue.
Query param: fal_max_queue_length
Default: No limit
Type: Integer
This parameter is passed as a query parameter on the URL, not as a header. The SDKs do not currently expose it as a named parameter; use the raw URL approach or pass it via headers.
cURL
curl -X POST "https://queue.fal.run/fal-ai/nano-banana-2?fal_max_queue_length=10" \
  -H "Authorization: Key $FAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a sunset"}'
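
For callers building the request from Python, the same URL can be assembled with the standard library. This is a minimal sketch: queue_url is a hypothetical helper, and the base URL is taken from the cURL example above rather than from an SDK constant.

```python
from typing import Optional
from urllib.parse import urlencode

QUEUE_BASE_URL = "https://queue.fal.run"  # as used in the cURL example


def queue_url(endpoint: str, max_queue_length: Optional[int] = None) -> str:
    """Build a queue submission URL, optionally capping the queue length.

    Hypothetical helper for illustration; not part of the fal SDK.
    """
    url = f"{QUEUE_BASE_URL}/{endpoint.strip('/')}"
    if max_queue_length is None:
        return url  # no limit, matching the documented default
    if max_queue_length < 0:
        raise ValueError("max_queue_length must be a non-negative integer")
    query = urlencode({"fal_max_queue_length": max_queue_length})
    return f"{url}?{query}"


print(queue_url("fal-ai/nano-banana-2", max_queue_length=10))
# https://queue.fal.run/fal-ai/nano-banana-2?fal_max_queue_length=10
```

A 429 response to a request sent to this URL means the queue was already longer than the limit, so the caller can fail fast or try again later.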

Response Headers

These headers are returned by fal in the response. They are informational; you don’t set them.
Header: Description
x-fal-request-id: Unique identifier for the request. Use this when contacting support or correlating logs.
X-Fal-Billable-Units: Billing units charged for this request. See Pricing for how units map to cost.
X-Fal-Served-From: Internal identifier of the runner that served the request.
X-Fal-Request-Timeout-Type: Set to user when your start_timeout deadline triggered the 504. See Timeouts and Retries.
X-Fal-Error-Type: Error category on failure responses (e.g., request_timeout, startup_timeout, runner_disconnected). See Request Error Types.
x-fal-runner-hints: Routing hints returned by the runner for sticky session routing. See Optimize Routing Behavior.
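
When handling a failed response yourself, these headers let you tell a 504 caused by your own start_timeout apart from other gateway timeouts. The sketch below assumes you already have the status code and a plain mapping of response headers from your HTTP client; is_user_start_timeout is a hypothetical helper name.

```python
from typing import Mapping


def is_user_start_timeout(status_code: int, headers: Mapping[str, str]) -> bool:
    """Return True when a 504 was triggered by the caller's start_timeout.

    Hypothetical helper for illustration; not part of the fal SDK.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    timeout_type = lowered.get("x-fal-request-timeout-type")
    return status_code == 504 and timeout_type == "user"


response_headers = {
    "x-fal-request-id": "example-request-id",  # placeholder value
    "X-Fal-Request-Timeout-Type": "user",
}
print(is_user_start_timeout(504, response_headers))  # True
```

A True result means no runner started processing before your deadline, so raising start_timeout or retrying later are the natural responses.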

Common Model Arguments

Common input parameters like seed, image_size, and safety checker that appear across many models