| Method | How it works |
|---|---|
| `run()` | Direct HTTP call, no queue |
| `subscribe()` | Queue-based, blocks until result |
| `submit()` | Queue-based, returns immediately (recommended) |
| `stream()` | Progressive output via SSE |
| `realtime()` | WebSocket, persistent connection |
## Direct (run)
The simplest way to call a model. Sends a direct HTTP request to fal.run and returns the result. No queue, no retries, no polling.
Use `run` for quick scripts, prototyping, or any model with fast response times where you want the lowest overhead. Because no queue is involved, the call goes straight to a runner and the response comes back directly. The tradeoff is that direct calls are not retried on failure: if the runner returns an error or times out, you get the error immediately.
Learn more
Direct and queue-backed synchronous calls
## Subscribe (queue-backed synchronous)
Like `run`, but uses the queue under the hood. Submits a request, polls automatically, and blocks until the result is ready. You get automatic retries and reliability with a simple interface.
subscribe is a good choice when you want the simplicity of a blocking call combined with queue-backed reliability. It handles polling for you, so the code looks almost identical to run, but your request is durable and will be retried if a runner fails. Reach for it in simple integrations, backend scripts, or anywhere you do not need to manage the request lifecycle yourself.
Learn more
Direct and queue-backed synchronous calls
## Asynchronous (submit)
The recommended approach for production. Submit a request to the queue and return immediately, then poll for status or receive results via webhook.
Polling:
### Status types
The `handler.status()` method returns one of three types. Pass `with_logs=True` to include runner logs.
| Type | Fields | Meaning |
|---|---|---|
| `Queued` | `position` (int) | Waiting in queue. `position` is how many requests are ahead. |
| `InProgress` | `logs` (list or None) | A runner is processing the request. `logs` contains messages if `with_logs=True`. |
| `Completed` | `logs` (list or None), `metrics` (dict) | Result is ready. `metrics` includes `inference_time` in seconds. |
In the JavaScript SDK, the corresponding types are `InQueueQueueStatus`, `InProgressQueueStatus`, and `CompletedQueueStatus`. See the full Python SDK reference and JavaScript SDK reference for details.
Webhook (no polling needed):
Learn Async Inference
The recommended way to call models at scale
## Streaming (stream)
For models that produce output progressively. Each event arrives as it is generated, so you can display partial results without waiting for the full response. This is useful for showing image generation previews or streaming LLM tokens.
The `stream()` method connects to the `/stream` path on the model endpoint. Not all models support streaming; check the model's API documentation for availability.

Learn Streaming
Receive output as it’s generated
## Real-time (realtime)
For interactive applications that need the lowest possible latency. Opens a persistent WebSocket connection to a warm runner, enabling back-to-back requests without reconnection overhead. Only available for models with an explicit real-time endpoint.
Learn Real-time
WebSocket connections for interactive apps