Once your app is deployed, you can call it using the same client SDKs and patterns used for any model on fal. Your app’s endpoint ID is your-username/your-app-name, and all of the inference methods work identically whether you are calling a marketplace model or your own deployed app.

This page shows quick examples of each calling pattern. subscribe is the simplest option since it handles polling for you and blocks until the result is ready. For production workloads where you need to manage many requests in parallel, submit gives you full control over the request lifecycle. For full details on parameters, response shapes, status polling, and cancellation, see the Inference documentation.

Subscribe

Submits a request to the queue, polls automatically, and returns the result when ready. This is the simplest calling pattern since it handles the request lifecycle for you. Optionally receive progress updates via callbacks.
import fal_client

result = fal_client.subscribe("your-username/your-app-name", arguments={
    "prompt": "a sunset over mountains"
})
print(result)
With progress updates:
import fal_client

def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "your-username/your-app-name",
    arguments={"prompt": "a sunset over mountains"},
    with_logs=True,
    on_queue_update=on_queue_update,
)
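Because subscribe blocks until the request finishes, transient failures (a dropped connection, a timeout) surface as exceptions in the caller. A small retry wrapper can harden that path; this is a generic sketch, and call_with_retry is an illustrative helper, not part of fal_client:

```python
import time

def call_with_retry(fn, attempts=3, delay=1.0,
                    retry_on=(ConnectionError, TimeoutError)):
    """Call fn(), retrying up to `attempts` times on the given exception types."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
```

You would then wrap the call itself, e.g. `call_with_retry(lambda: fal_client.subscribe("your-username/your-app-name", arguments={...}))`.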

Synchronous Inference

Full details on subscribe, progress updates, and timeout handling

Queue (Async)

For fire-and-forget workflows. Submit a request, get a request ID, and retrieve the result later by polling or webhook.
import fal_client

handler = fal_client.submit("your-username/your-app-name", arguments={
    "prompt": "a sunset over mountains"
})

print(f"Request ID: {handler.request_id}")

# Check status
status = handler.status()
print(status)

# Get result when ready
result = handler.get()
print(result)
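In production you typically wrap the status check in a backoff loop rather than calling it once. Here is a minimal sketch, where get_status stands in for handler.status and is_done for whatever completion check your SDK version provides (both names are illustrative):

```python
import time

def poll_until_done(get_status, is_done, initial=0.5, factor=2.0,
                    max_delay=10.0, timeout=120.0):
    """Call get_status() with exponential backoff until is_done(status) or timeout."""
    delay = initial
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("request did not complete in time")
        time.sleep(delay)
        delay = min(delay * factor, max_delay)  # back off, capped at max_delay
```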

Asynchronous Inference

Full details on the queue system, status polling, and REST API reference

Streaming

For apps that produce progressive output via Server-Sent Events (SSE). Your app must define a streaming endpoint at /stream using @fal.endpoint("/stream").
import fal_client

for event in fal_client.stream("your-username/your-app-name", arguments={
    "prompt": "a sunset over mountains"
}):
    print(event)
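The shape of each event is defined by your app, not by the SDK. If your /stream endpoint emits partial results as dicts with a delta field (here assumed to be "text" purely for illustration), you can accumulate them into a final value with a helper like this sketch:

```python
def collect_stream(events, key="text"):
    """Concatenate the app-defined delta field from a sequence of stream events."""
    parts = []
    for event in events:
        chunk = event.get(key)
        if chunk:
            parts.append(chunk)
    return "".join(parts)
```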

Real-Time (WebSocket)

For bidirectional, low-latency communication over a persistent connection. Your app must define a @fal.realtime("/realtime") endpoint.
import fal_client

with fal_client.realtime("your-username/your-app-name") as connection:
    connection.send({"prompt": "Hello, world!"})
    result = connection.recv()
    print(result)
Async version:
import asyncio
import fal_client

async def main():
    async with fal_client.realtime_async("your-username/your-app-name") as connection:
        await connection.send({"prompt": "Hello, world!"})
        result = await connection.recv()
        print(result)

asyncio.run(main())
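Since the connection is persistent, a realtime session usually sends many messages over one socket rather than reconnecting per request. The sketch below shows that pattern against any object with the send/recv interface above; EchoConnection is a hypothetical stand-in used only so the loop is self-contained:

```python
class EchoConnection:
    """Stand-in for a realtime connection exposing send() and recv()."""
    def __init__(self):
        self._queue = []
    def send(self, message):
        self._queue.append({"echo": message["prompt"]})
    def recv(self):
        return self._queue.pop(0)

def run_session(connection, prompts):
    """Send each prompt in turn over one connection and collect one reply each."""
    replies = []
    for prompt in prompts:
        connection.send({"prompt": prompt})
        replies.append(connection.recv())
    return replies
```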

Webhooks

Submit a request and receive the result at a URL you specify, instead of polling.
import fal_client

handler = fal_client.submit(
    "your-username/your-app-name",
    arguments={"prompt": "a sunset over mountains"},
    webhook_url="https://your-server.com/api/webhook",
)

print(f"Request ID: {handler.request_id}")
When the request completes, fal sends a POST to your webhook URL with the result:
{
  "request_id": "abc-123",
  "status": "OK",
  "payload": { ... }
}
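On your server, the handler needs to parse that body and branch on status before touching the payload. A minimal sketch, assuming the payload shape shown above (framework-agnostic; body is the raw POST bytes):

```python
import json

def handle_webhook(body: bytes):
    """Parse a webhook POST body and return (request_id, payload) on success."""
    data = json.loads(body)
    request_id = data["request_id"]
    if data.get("status") == "OK":
        return request_id, data.get("payload")
    # Non-OK statuses: surface the payload for logging/inspection
    raise RuntimeError(f"request {request_id} failed: {data.get('payload')}")
```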

Webhooks API

Full details on webhook payloads, retries, verification, and IP allowlisting

Passing Headers

You can pass custom headers with any calling method to control platform behavior:
import fal_client

result = fal_client.subscribe(
    "your-username/your-app-name",
    arguments={"prompt": "a sunset"},
    headers={
        "x-fal-no-retry": "1",
    },
)
See Platform Headers for all available headers, and Retries for retry control. Each inference method page also documents its available SDK parameters.

Next Steps