Streaming allows you to receive output progressively as the model generates it. Instead of waiting for the full result, you process each chunk as it arrives. This is useful for LLMs that produce tokens incrementally, models that generate intermediate previews, or any situation where you want to show progress to the user.

Under the hood, stream() sends a direct HTTP request to fal.run using Server-Sent Events (SSE). The SDK wraps the SSE connection into an iterator, so each event arrives as a parsed object. Streaming does not use the queue, so there are no automatic retries.
Streaming is only supported by models that have a /stream endpoint. Check the model’s API page to confirm support before using stream().

Using stream()

import fal_client

# Each event is yielded as soon as it arrives; the loop ends
# when the model closes the stream.
for event in fal_client.stream("fal-ai/flux/schnell", arguments={
    "prompt": "a sunset over mountains"
}):
    print(event)
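Because the final payload typically arrives as the last event in the stream, a small helper can separate progress updates from the result. This is a sketch: collect_final is a hypothetical name, not part of the SDK, and the commented usage assumes fal_client.stream as shown above.

```python
def collect_final(events):
    """Drain an event iterator, returning (progress_events, final_event).

    Every event except the last is treated as a progress update; the
    last event is assumed to carry the model's final output.
    """
    seen = list(events)
    if not seen:
        return [], None
    return seen[:-1], seen[-1]

# With a live stream (requires credentials):
# progress, result = collect_final(
#     fal_client.stream("fal-ai/flux/schnell",
#                       arguments={"prompt": "a sunset over mountains"}))
```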
Each event is a dictionary/object whose shape depends on the model. The REST API returns SSE-formatted events (each line prefixed with data:). The SDKs parse these automatically into objects. A model might stream progress updates followed by the final result:
{"progress": 0.25, "message": "Generating..."}
{"progress": 0.50, "message": "Generating..."}
{"progress": 0.75, "message": "Generating..."}
{"images": [{"url": "https://v3.fal.media/files/..."}], "seed": 42}
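On the wire, each of the events above is a data:-prefixed line. The SDKs do this parsing for you; the sketch below only illustrates the shape of the work they handle, and deliberately ignores SSE features like multi-line data and the event: and id: fields.

```python
import json

def parse_sse(raw: str):
    """Parse SSE-formatted text into a list of decoded JSON events.

    Lines prefixed with "data:" carry payloads; blank lines
    separate events in the real protocol.
    """
    events = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                events.append(json.loads(payload))
    return events
```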

stream() Parameters

path

Endpoint path appended to the model ID. Defaults to "/stream" for streaming endpoints. See path reference.

timeout

Client-side HTTP timeout in seconds — how long the client waits for the SSE connection. See timeout reference.
stream() does not support hint, priority, start_timeout, client_timeout, or headers because it bypasses the queue and sends a direct HTTP request. There are no retries. If you need queue-backed reliability, use submit() and poll for status with with_logs=True to track progress.
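Because stream() never retries, callers that want best-effort resilience without switching to the queue can wrap the call themselves. A minimal sketch: retry_stream and its open_stream callable are illustrative, not part of the SDK; in practice you would pass something like lambda: fal_client.stream(...).

```python
import time

def retry_stream(open_stream, attempts=3, backoff=1.0):
    """Re-open a stream on connection errors, yielding its events.

    open_stream: zero-argument callable returning a fresh event
    iterator. A retry restarts the request from scratch, so events
    already yielded before a mid-stream failure may repeat.
    """
    for attempt in range(attempts):
        try:
            yield from open_stream()
            return
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (2 ** attempt))
```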

When to Use Streaming

Streaming is best for LLMs, chat models, showing real-time progress to users, and reducing perceived latency in interactive applications. It is not needed for models that return a single result with no intermediate output, or backend-to-backend integrations where you just need the final response. In those cases, run() or subscribe() is simpler.