Synchronous Inference

There are two ways to make a simple blocking call to a model on fal. `run` sends a direct HTTP request to fal.run and returns the result immediately, with no queue involved. `subscribe` uses the queue under the hood but handles polling automatically, so it feels just as simple while giving you automatic retries and reliability. Both are good starting points for scripts, prototyping, and any situation where you just want the output without managing the request lifecycle yourself.

Use `run` when you want the fastest possible path with no overhead, and `subscribe` when you want queue-backed reliability. For production workloads that need parallel processing or webhooks, use async inference instead.

Using `run` (Direct)

`run` sends a direct HTTP request to fal.run. There is no queue, no retries, and no status polling. The response comes back in the same HTTP connection.

import fal_client

result = fal_client.run("fal-ai/flux/schnell", arguments={
    "prompt": "a sunset over mountains"
})

print(result["images"][0]["url"])

Using `subscribe` (Queue-backed)

`subscribe` submits to the queue and polls until the result is ready. You get automatic retries, timeout handling, and scaling, with the same simple blocking interface as `run`.

import fal_client

result = fal_client.subscribe("fal-ai/flux/schnell", arguments={
    "prompt": "a sunset over mountains"
})

print(result["images"][0]["url"])

With Progress Updates

Track progress while waiting for results:

import fal_client

def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "fal-ai/flux/schnell",
    arguments={"prompt": "a sunset over mountains"},
    with_logs=True,
    on_queue_update=on_queue_update
)

`run()` Parameters

`path`

Endpoint path appended to the model ID. Leave empty for the default root endpoint. See path reference.

`timeout`

Client-side HTTP timeout in seconds. This is a standard HTTP connection timeout — how long the client library waits for the server to send back a complete response. If the connection is idle for longer than this, the client raises a timeout error. It does not affect the server.

result = fal_client.run("fal-ai/nano-banana-2", arguments={...}, timeout=120)

`start_timeout`

Server-side deadline covering total elapsed time until a runner successfully starts processing. Returns 504 if exceeded. See start_timeout reference for full details on how the clock behaves across retries.

`hint`

Routing hint for session affinity — routes requests to the same runner. See hint reference.

`headers`

Additional HTTP headers for platform-level controls like disabling retries or payload storage. See headers reference.

`subscribe()` Parameters

`subscribe()` accepts all the parameters above except `timeout`, plus the queue-specific options below. For shared parameters (`path`, `start_timeout`, `hint`, `headers`, `priority`), see the `submit()` reference.

`client_timeout`

Client-side total deadline in seconds (Python SDK only). This limits the total time `subscribe()` blocks, including queue wait and processing. When exceeded, the client stops polling and raises a `FalClientTimeoutError`. The request may still be processing on the server after the client gives up.

If you set `client_timeout` without setting `start_timeout`, the SDK automatically sets `start_timeout = client_timeout` so the server also respects your deadline. If you set `start_timeout` larger than `client_timeout`, the SDK emits a warning.

result = fal_client.subscribe(
    "fal-ai/nano-banana-2",
    arguments={"prompt": "a sunset"},
    client_timeout=60,
)

For a comparison of all timeout mechanisms, see Timeouts and Retries.
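The interaction between `client_timeout` and `start_timeout` described above can be sketched as a small pure function. This is only an illustration of the stated rule, not the SDK's actual code; `resolve_timeouts` is a hypothetical name, and the warning text is made up.

```python
import warnings
from typing import Optional, Tuple


def resolve_timeouts(
    client_timeout: Optional[float] = None,
    start_timeout: Optional[float] = None,
) -> Tuple[Optional[float], Optional[float]]:
    """Hypothetical helper mirroring the documented rule; not SDK code."""
    if client_timeout is not None and start_timeout is None:
        # Only the client deadline was given, so the server deadline follows it.
        start_timeout = client_timeout
    elif (
        client_timeout is not None
        and start_timeout is not None
        and start_timeout > client_timeout
    ):
        # The client would stop polling before the server deadline expires.
        warnings.warn(
            f"start_timeout ({start_timeout}s) exceeds "
            f"client_timeout ({client_timeout}s)"
        )
    return client_timeout, start_timeout


print(resolve_timeouts(client_timeout=60))                    # (60, 60)
print(resolve_timeouts(client_timeout=60, start_timeout=30))  # (60, 30)
```

Setting both values explicitly, with `start_timeout` no larger than `client_timeout`, avoids the warning case entirely.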

`with_logs` / `logs`

When enabled, status updates include runner logs (print statements from your model’s code). Without this, the `logs` field in progress updates is empty. Useful for debugging or showing generation progress to users. The parameter is named `with_logs` in Python and `logs` in JavaScript.

`on_enqueue`

A callback function that fires once, immediately after the request enters the queue. It receives the `request_id` as its argument. Use this to store the request ID for later reference. For example, display it in a UI or save it to a database so you can retrieve the result later even if the client disconnects.

def save_request_id(request_id):
    print(f"Enqueued: {request_id}")

result = fal_client.subscribe(
    "fal-ai/nano-banana-2",
    arguments={"prompt": "a sunset"},
    on_enqueue=save_request_id,
)
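For the "save it to a database" case, a minimal sketch using the standard-library `sqlite3` module might look like the following. The `make_enqueue_recorder` helper, the `fal_requests` table, and the sample ID are all hypothetical names chosen for illustration. The callback is invoked by hand here so the snippet runs without a network call.

```python
import sqlite3
from typing import Callable


def make_enqueue_recorder(conn: sqlite3.Connection) -> Callable[[str], None]:
    """Build an on_enqueue callback that saves each request ID to SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fal_requests (request_id TEXT PRIMARY KEY)"
    )

    def on_enqueue(request_id: str) -> None:
        # INSERT OR IGNORE keeps the callback safe if the same ID arrives twice.
        conn.execute(
            "INSERT OR IGNORE INTO fal_requests (request_id) VALUES (?)",
            (request_id,),
        )
        conn.commit()

    return on_enqueue


conn = sqlite3.connect(":memory:")  # use a file path to persist across runs
record = make_enqueue_recorder(conn)

# In real use: fal_client.subscribe(..., on_enqueue=record)
record("example-request-id")
print(conn.execute("SELECT request_id FROM fal_requests").fetchall())
```

Committing inside the callback means the ID is durable before polling begins, which is the point of recording it at enqueue time.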

`on_queue_update` / `onQueueUpdate`

A callback function that fires each time the client polls for status. It receives a status object with a `status` field indicating the current state. In Python, the status is one of three types: `Queued` (has `position`), `InProgress` (has `logs`), or `Completed` (has `logs` and `metrics`). In JavaScript, check the `status` string field: `"IN_QUEUE"`, `"IN_PROGRESS"`, or `"COMPLETED"`.

def on_update(update):
    if isinstance(update, fal_client.Queued):
        print(f"Queue position: {update.position}")
    elif isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "fal-ai/nano-banana-2",
    arguments={"prompt": "a sunset"},
    with_logs=True,
    on_queue_update=on_update,
)
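To make the three Python status types concrete, here is a minimal sketch that maps each one to a one-line message. The `Queued`, `InProgress`, and `Completed` classes below are local stand-ins carrying only the fields named above, so the snippet runs without the SDK. `describe` is a hypothetical helper, not part of `fal_client`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Stand-ins for the status types described above. They hold only the fields
# this page mentions; the real fal_client classes may differ.
@dataclass
class Queued:
    position: int


@dataclass
class InProgress:
    logs: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class Completed:
    logs: List[Dict[str, Any]] = field(default_factory=list)
    metrics: Dict[str, Any] = field(default_factory=dict)


def describe(update: object) -> str:
    """Turn one status update into a short message for a UI or log line."""
    if isinstance(update, Queued):
        return f"queued at position {update.position}"
    if isinstance(update, InProgress):
        # Guard against a missing logs value when logs are not enabled.
        return f"in progress ({len(update.logs or [])} log lines)"
    if isinstance(update, Completed):
        return "completed"
    return f"unknown status: {type(update).__name__}"


for update in [Queued(position=2), InProgress(logs=[{"message": "step 1"}]), Completed()]:
    print(describe(update))
```

In real code, drop the stand-ins, check against the `fal_client` status classes instead, and call `describe` from the function passed as `on_queue_update`.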

When to Use

Synchronous methods are best for simple scripts, prototyping, and any situation where you just want the result without managing the request lifecycle. Use `run` when you want the fastest path with no overhead. Use `subscribe` when you want queue-backed reliability with automatic retries. For production workloads that need parallel request processing, webhooks, or full visibility into queue status, use async inference instead.

Error Responses

When a request fails due to infrastructure issues (timeouts, runner errors, etc.), the response includes a JSON body with detail and error_type fields, along with an X-Fal-Error-Type header. For the full list of error types and their meanings, see Request Error Types. For model-level validation errors (invalid inputs, content policy violations, etc.), see Model Errors.
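As a rough sketch of reading those fields from a raw HTTP error response, the helper below extracts `error_type` and `detail`, falling back to the `X-Fal-Error-Type` header when the body cannot be parsed. `parse_fal_error` is a hypothetical name, and the sample values are placeholders rather than real error types.

```python
import json
from typing import Any, Mapping, Optional, Tuple


def parse_fal_error(
    headers: Mapping[str, str], body: str
) -> Tuple[Optional[str], Any]:
    """Return (error_type, detail) from an error response's headers and body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}

    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    error_type = payload.get("error_type") or lowered.get("x-fal-error-type")
    return error_type, payload.get("detail")


# Placeholder values for illustration only.
sample_headers = {"X-Fal-Error-Type": "example_type"}
sample_body = '{"detail": "something failed", "error_type": "example_type"}'
print(parse_fal_error(sample_headers, sample_body))
```

Preferring the body and falling back to the header keeps the helper useful when a proxy or client library has already consumed or truncated the body.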
