fal is designed for production workloads and includes several built-in mechanisms to ensure your requests succeed.

Queue-Based Processing

The queue system handles traffic surges gracefully and provides request tracking. When you submit a request, it enters a managed queue that ensures reliable processing even during peak demand.

Automatic Retries

When using the queue, fal automatically retries requests that fail due to:
  • Server errors (503): The model endpoint was temporarily unavailable
  • Timeouts (504): The request took too long due to transient issues
  • Connection errors: Network issues between components of fal's infrastructure
  • Rate limits (429): The request waits and is retried automatically when you temporarily exceed your concurrent request limit
Requests are retried up to 10 times with intelligent backoff.
No charge for server errors: Failed requests that return 5xx status codes are not billed.
Automatic retries only apply to queue-based requests. Direct synchronous requests return errors immediately without retry.
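Because direct synchronous requests are not retried for you, a caller that wants similar behavior has to add it on the client side. The following is a minimal sketch of one way to do that, not fal's own retry logic: the backoff algorithm fal uses is not documented here, so exponential backoff with jitter is an assumption, and `HTTPStatusError`, `call_with_retries`, and the `send` callable are hypothetical names introduced only for illustration.

```python
import random
import time

# Status codes the section lists as retried on the queue: 429, 503, 504.
RETRYABLE_STATUS = {429, 503, 504}


class HTTPStatusError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed request."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retries(send, max_attempts=10, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it succeeds, retrying only transient failures.

    Assumption: exponential backoff with full jitter. This is a common
    client-side pattern, not a description of fal's internal backoff.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except HTTPStatusError as exc:
            # Non-retryable status, or no attempts left: surface the error.
            if exc.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
        except ConnectionError:
            if attempt == max_attempts:
                raise
        # Double the delay ceiling on each attempt, capped at max_delay,
        # then sleep for a random duration below that ceiling.
        ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, ceiling))
```

Here `send` would be any zero-argument function that performs one synchronous request and raises `HTTPStatusError` or `ConnectionError` on failure. Passing `sleep` as a parameter keeps the helper easy to test without real delays.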

Model Fallbacks

For supported models, fal can automatically reroute a request to an equivalent alternative endpoint when the primary endpoint is temporarily unavailable. Rerouting happens only after the request has been retried up to five times against the primary endpoint without success. This mechanism improves overall reliability and reduces the likelihood of failed requests.
Fallbacks are enabled by default for all accounts. To disable them for your whole account, contact your account team. To disable them for a single request, pass the x-fal-disable-fallbacks header. For any questions, contact our sales team.
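As a rough illustration of the per-request opt-out, the sketch below builds (but does not send) a queue request that carries the x-fal-disable-fallbacks header. Only the header name comes from the text above. The header value "true", the queue URL shape, the "Key" authorization scheme, and the `build_queue_request` helper are assumptions made for this example and should be checked against the API reference before use.

```python
import json
import urllib.request


def build_queue_request(model_id, payload, api_key, disable_fallbacks=False):
    """Build a POST request for a queue submission without sending it.

    Assumptions (not confirmed by this page): the URL pattern, the
    Authorization scheme, and the header value "true".
    """
    headers = {
        "Authorization": f"Key {api_key}",
        "Content-Type": "application/json",
    }
    if disable_fallbacks:
        # Header name is from the documentation; the value is assumed.
        headers["x-fal-disable-fallbacks"] = "true"
    return urllib.request.Request(
        url=f"https://queue.fal.run/{model_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholder model id and key, for illustration only.
request = build_queue_request(
    "example-owner/example-model",
    {"prompt": "a lighthouse at dusk"},
    api_key="YOUR_FAL_KEY",
    disable_fallbacks=True,
)
```

Setting the header per request leaves the account-level default untouched, which is useful when only some calls must run on one specific endpoint.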