HTTP over WebSockets API | fal.ai Reference

For applications that require real-time interaction or handle streaming, fal offers a WebSocket-based integration. This allows you to establish a persistent connection and stream data back and forth between your client and the fal API using the same format as the HTTP endpoints.

WebSocket Endpoint

To utilize the WebSocket functionality, use the wss protocol with the the ws.fal.run domain:

wss://ws.fal.run/{model_id}

Communication Protocol

Once connected, the communication follows a specific protocol with JSON messages for control flow and raw data for the actual response stream:

Payload Message: Send a JSON message containing the payload for your application. This is equivalent to the request body you would send to the HTTP endpoint.
Start Metadata: Receive a JSON message containing the HTTP response headers from your application. This allows you to understand the type and structure of the incoming response stream.
Response Stream: Receive the actual response data as a sequence of messages. These can be binary chunks for media content or a JSON object for structured data, depending on the Content-Type header.
End Metadata: Receive a final JSON message indicating the end of the response stream. This signals that the request has been fully processed and the next payload will be processed.

Example Interaction

Here’s an example of a typical interaction with the WebSocket API:

Client Sends (Payload Message):

{"prompt": "generate a 10-second audio clip of a cat purring"}

Server Responds (Start Metadata):

{
  "type": "start",
  "request_id": "5d76da89-5d75-4887-a715-4302bf435614",
  "status": 200,
  "headers": {
    "Content-Type": "text/event-stream; charset=utf-8",
    "Transfer-Encoding": "chunked",
    // ...
  }
}

Server Sends (Response Stream):

<binary audio data chunk 1>
<binary audio data chunk 2>
...
<binary audio data chunk N>

Server Sends (Completion Message):

{
  "type": "end",
  "request_id": "5d76da89-5d75-4887-a715-4302bf435614",
  "status": 200,
  "time_to_first_byte_seconds": 0.577083
}

This WebSocket integration provides a powerful mechanism for building dynamic and responsive AI applications on the fal platform. By leveraging the streaming capabilities, you can unlock new possibilities for creative and interactive user experiences.

Example Program

For instance, should you want to make fast prompts to any LLM, you can use fal-ai/any-llm.

import fal.apps

with fal.apps.ws("fal-ai/any-llm") as connection:
    for i in range(3):
        connection.send(
            {
                "model": "google/gemini-flash-1.5",
                "prompt": f"What is the meaning of life? Respond in {i} words.",
            }
        )

    # they should be in order
    for i in range(3):
        import json

        response = json.loads(connection.recv())
        print(response)

And running this program would output:

{'output': '(Silence)\n', 'partial': False, 'error': None}
{'output': 'Growth\n', 'partial': False, 'error': None}
{'output': 'Personal fulfillment.\n', 'partial': False, 'error': None}

Example Program with Stream

The fal-ai/any-llm/stream model is a streaming model that can generate text in real-time. Here’s an example of how you can use it:

with fal.apps.ws("fal-ai/any-llm/stream") as connection:
    # NOTE: this app responds in 'text/event-stream' format
    # For example:
    #
    #    event: event
    #    data: {"output": "Growth", "partial": true, "error": null}

    for i in range(3):
        connection.send(
            {
                "model": "google/gemini-flash-1.5",
                "prompt": f"What is the meaning of life? Respond in {i+1} words.",
            }
        )

    for i in range(3):
        for bs in connection.stream():
            lines = bs.decode().replace("\r\n", "\n").split("\n")

            event = {}
            for line in lines:
                if not line:
                    continue
                key, value = line.split(":", 1)
                event[key] = value.strip()

            print(event["data"])

        print("----")

And running this program would output:

{"output": "Perspective", "partial": true, "error": null}
{"output": "Perspective.\n", "partial": true, "error": null}
{"output": "Perspective.\n", "partial": true, "error": null}
{"output": "Perspective.\n", "partial": false, "error": null}
----
{"output": "Find", "partial": true, "error": null}
{"output": "Find meaning.\n", "partial": true, "error": null}
{"output": "Find meaning.\n", "partial": true, "error": null}
{"output": "Find meaning.\n", "partial": false, "error": null}
----
{"output": "Be", "partial": true, "error": null}
{"output": "Be, love, grow.\n", "partial": true, "error": null}
{"output": "Be, love, grow.\n", "partial": true, "error": null}
{"output": "Be, love, grow.\n", "partial": false, "error": null}
----