For applications that require real-time interaction or streaming data, fal offers a WebSocket-based integration. This allows you to establish a persistent connection and stream data back and forth between your client and the fal API.
WebSocket Endpoint
To use the WebSocket functionality, connect to the endpoint of the application you want to run, but through the ws.fal.run host instead of the regular HTTP host:
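As a sketch, the WebSocket URL is formed by pointing the application's path at the ws.fal.run host (the app id below is just an example):

```python
# Sketch: derive the WebSocket URL for an application (app id is illustrative).
# The HTTP endpoint https://fal.run/<app-id> is reached over WebSocket
# at wss://ws.fal.run/<app-id>.
app_id = "fal-ai/any-llm"
ws_url = f"wss://ws.fal.run/{app_id}"
print(ws_url)  # wss://ws.fal.run/fal-ai/any-llm
```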
Communication Protocol
Once connected, the communication follows a specific protocol with JSON messages for control flow and raw data for the actual response stream:
Payload Message: Send a JSON message containing the payload for your application. This is equivalent to the request body you would send to the HTTP endpoint.
Start Metadata: Receive a JSON message containing the HTTP response headers from your application. This allows you to understand the type and structure of the incoming response stream.
Response Stream: Receive the actual response data as a sequence of messages. These can be binary chunks for media content or a JSON object for structured data, depending on the Content-Type header.
End Metadata: Receive a final JSON message indicating the end of the response stream. This signals that the request has been fully processed and that the connection is ready to accept the next payload.
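The four phases above can be sketched as a small client-side dispatcher. This is a minimal illustration based on the protocol description, not the exact wire format: binary frames are treated as response-stream chunks, and the `"type"` marker field used to distinguish start from end metadata is an assumption.

```python
import json

def classify(message):
    """Classify an incoming WebSocket message into a protocol phase.

    Binary frames carry response-stream chunks (e.g. media content);
    control messages arrive as JSON text. The "type" field used to tell
    start metadata from end metadata is a hypothetical marker for
    illustration.
    """
    if isinstance(message, (bytes, bytearray)):
        return "response_chunk", message      # raw data from the stream
    data = json.loads(message)
    if data.get("type") == "start":           # hypothetical marker field
        return "start_metadata", data         # HTTP response headers
    if data.get("type") == "end":             # hypothetical marker field
        return "end_metadata", data           # request fully processed
    return "response_json", data              # structured JSON response data
```

A client loop would send the payload, then feed every received message through `classify` until it sees `end_metadata`.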
Example Interaction
Here’s an example of a typical interaction with the WebSocket API:
Client Sends (Payload Message):
Server Responds (Start Metadata):
Server Sends (Response Stream):
Server Sends (Completion Message):
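Putting the four labels above together, one round trip might look like the following simulated sequence. All field names and values here are hypothetical, chosen only to illustrate the shape of the exchange:

```python
# Client sends (payload message) -- same shape as the HTTP request body:
payload = {"prompt": "Write a haiku about the ocean."}

# Server responds (start metadata) -- HTTP response headers as JSON:
start_metadata = {"type": "start", "headers": {"content-type": "text/event-stream"}}

# Server sends (response stream) -- the actual response, here as binary chunks:
chunks = [b"Waves fold ", b"into foam..."]

# Server sends (completion message) -- end of this request's stream:
end_metadata = {"type": "end"}

# A client would send the payload as JSON, then read messages until it
# sees the end metadata, concatenating the chunks received in between:
print(b"".join(chunks).decode())  # Waves fold into foam...
```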
This WebSocket integration provides a powerful mechanism for building dynamic and responsive AI applications on the fal platform. By leveraging the streaming capabilities, you can unlock new possibilities for creative and interactive user experiences.
Example Program
For instance, if you want to send fast prompts to any LLM, you can use fal-ai/any-llm.
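As a minimal sketch, such a program could use the fal_client Python package (installed with pip install fal-client, with the FAL_KEY environment variable set). The argument names "model" and "prompt", and the model id below, are assumptions for illustration:

```python
# Sketch of a minimal program calling fal-ai/any-llm via the fal_client
# package. The payload field names ("model", "prompt") and the model id
# are assumptions, not a confirmed schema.
def build_arguments(prompt, model="google/gemini-flash-1.5"):
    """Build the request payload for fal-ai/any-llm (field names assumed)."""
    return {"model": model, "prompt": prompt}

if __name__ == "__main__":
    import fal_client  # third-party client; requires FAL_KEY to be set

    result = fal_client.subscribe(
        "fal-ai/any-llm",
        arguments=build_arguments("What is the capital of France?"),
    )
    print(result)
```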
And running this program would output:
Example Program with Stream
The fal-ai/any-llm/stream endpoint streams generated text back in real time. Here’s an example of how you can use it:
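A minimal sketch of such a program, again using the fal_client package (pip install fal-client, FAL_KEY set). The stream() helper and the "output" field on each event are assumptions for illustration:

```python
# Sketch: stream tokens from fal-ai/any-llm/stream with the fal_client
# package. The stream() helper and the "output" event field are
# assumptions, not a confirmed schema.
def extract_text(event):
    """Pull the generated text out of a streamed event (field name assumed)."""
    return event.get("output", "")

if __name__ == "__main__":
    import fal_client  # third-party client; requires FAL_KEY to be set

    events = fal_client.stream(
        "fal-ai/any-llm/stream",
        arguments={"prompt": "Tell me a short story."},
    )
    for event in events:
        # Print each partial result as soon as it arrives.
        print(extract_text(event), end="", flush=True)
```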