App
An App is a Python class that wraps your AI model for deployment. Your app defines what packages it needs, how to load your model, and how users interact with it.
Where Code Runs: Local vs Remote Execution
What runs locally
- Module import / top level: When Python imports your file, all top-level code executes on your machine.
- This is where you typically define helpers, constants, and construct objects you might pass to the app.
- Building the app object: The class body for your app is defined locally (like any Python class).
- Serialization boundary preparation: When you reference local objects from the app (e.g., myobj), we attempt to pickle them locally to ship to the remote runtime.
What runs remotely
- Class transfer & instantiation: Your fal.App subclass is pickled locally, then unpickled and instantiated remotely in the runtime you configured (e.g. requirements, container image, etc.).
- App methods / entrypoints: Methods of your app class (e.g. setup(), @fal.endpoint handlers, etc.) execute on the remote machine.
- Referenced symbols:
- Pickled objects (closures, small data, simple classes) are shipped as part of the app payload.
- Importable code (installed packages or modules present in the remote image) is imported remotely instead of being shipped.
Example
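As a minimal sketch of the serialization boundary described above, the snippet below checks whether a local object survives a pickle round trip before you reference it from your app. It uses the standard-library pickle module as a stand-in for the platform's serializer, which is an assumption, and can_cross_boundary is a hypothetical helper rather than part of fal.

```python
import pickle
import threading


def can_cross_boundary(obj):
    """Return True if obj survives a pickle round trip (hypothetical helper)."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        # Locks, sockets, open handles and similar objects cannot be pickled.
        return False


# Small, plain data: a good candidate for shipping with the app payload.
config = {"model_name": "my-model", "max_tokens": 128}

# Process-bound resource: create it remotely (for example in setup()) instead.
lock = threading.Lock()

print(can_cross_boundary(config))
print(can_cross_boundary(lock))
```

Objects that fail this check are usually better constructed on the remote side than referenced from top-level code.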
Lifespan
Startup
setup()
When a runner starts, it first initializes the application and calls the setup() method. This is where you should load your model, initialize connections, and prepare any resources needed to serve requests. The runner is not considered ready until setup() completes successfully.
Shutdown
Runners can be terminated for several reasons:
- Expiration: Runners will be terminated when they reach their expiration time
- Manual stop/kill: You can manually terminate runners using the CLI or dashboard
- Scaling activity: Runners may be terminated when scaling down due to reduced demand
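When a runner is terminated, it receives a SIGTERM signal. After receiving SIGTERM, no new requests are routed to the runner, and it has 5 seconds to:
- Run handle_exit() (if defined)
- Finish processing ongoing requests
- Run teardown() for cleanup
If the runner has not exited by the end of this grace period, it is forcibly terminated with SIGKILL.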
handle_exit()
The handle_exit() method is called immediately when a SIGTERM is received. Use this to signal your request handlers to stop processing early, so there’s enough time remaining for cleanup in teardown().
Without handle_exit(), long-running requests may consume the entire grace period, causing teardown() to be skipped when SIGKILL arrives.
teardown()
The teardown() method is called after all ongoing requests have finished. Use this to clean up resources, close connections, or perform any final operations before the runner terminates.
While it’s possible to add your own signal handlers, we recommend using handle_exit() instead.
The setup(), handle_exit(), and teardown() methods provide a clean and predictable way to manage your application’s lifecycle without the complexity of custom signal handling.
Example
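The following is a minimal simulation of the shutdown order described above, written in plain Python so it runs without a runner. SketchApp, generate, and simulate_sigterm are hypothetical names for illustration: the class is not a fal.App subclass, and the driver only replays the documented sequence instead of handling real signals.

```python
import threading


class SketchApp:
    """Plain-Python stand-in for an app class (hypothetical, not fal.App)."""

    def setup(self):
        # Load models and open connections here; the runner is not ready
        # until this returns.
        self.stop_requested = threading.Event()
        self.log = ["setup"]

    def generate(self, steps, on_step=None):
        """Long-running handler that checks the stop flag between steps."""
        completed = 0
        for i in range(steps):
            if on_step is not None:
                on_step(i)  # hook used below to inject a simulated SIGTERM
            if self.stop_requested.is_set():
                self.log.append("request stopped early")
                break
            completed += 1
        return completed

    def handle_exit(self):
        # Called as soon as SIGTERM arrives: only flip the flag, so that
        # in-flight requests wind down and leave time for teardown().
        self.log.append("handle_exit")
        self.stop_requested.set()

    def teardown(self):
        # Called after ongoing requests have finished.
        self.log.append("teardown")


def simulate_sigterm(app, steps, sigterm_at):
    """Hypothetical driver that replays the documented shutdown order."""
    app.setup()

    def on_step(i):
        if i == sigterm_at:
            app.handle_exit()

    completed = app.generate(steps, on_step)
    app.teardown()
    return completed


app = SketchApp()
completed = simulate_sigterm(app, steps=10, sigterm_at=3)
print(completed)
print(app.log)
```

Keeping handle_exit() down to setting a flag means the request loop decides where it is safe to stop, which leaves the rest of the grace period for teardown().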
Retry Policy
When using the queue, fal automatically retries requests in the following scenarios:
- Server Error: The connection with the app broke or the runner returned a 503 status code
- Timeout: The app took longer to respond than the request timeout or the runner returned a 504 status code
Control Retry Behavior
You can configure your app to skip retries for specific conditions using the skip_retry_conditions option.
Available conditions are "server_error" and "timeout".
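To make the retry rules above concrete, here is a small model of the decision in plain Python. It is a sketch of the rules as stated in this section, not fal's implementation; classify_failure and should_retry are hypothetical helpers, and only the condition names "server_error" and "timeout" come from the documentation.

```python
def classify_failure(status_code=None, connection_broken=False, timed_out=False):
    """Map a failed attempt to a retry condition name, or None if not retryable."""
    if connection_broken or status_code == 503:
        return "server_error"
    if timed_out or status_code == 504:
        return "timeout"
    return None


def should_retry(condition, skip_retry_conditions=()):
    """Retry only retryable failures that the app has not opted out of."""
    return condition is not None and condition not in skip_retry_conditions


# An app that opts out of retrying timeouts but still retries server errors.
skip = ["timeout"]
print(should_retry(classify_failure(status_code=504), skip))
print(should_retry(classify_failure(status_code=503), skip))
```

Skipping "timeout" retries can make sense for long, expensive requests, where a second attempt would likely hit the same limit.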
Machine Type
Machine Type specifies the hardware (CPU or GPU) your app runs on. Choose based on your model’s needs: "CPU" for lightweight models, "GPU-H100" for most AI models, or "GPU-B200" for large models.
Runner
A Runner is a compute instance that executes your app using your chosen machine type. Runners automatically start when requests arrive and shut down when idle to save costs.
Endpoint
An Endpoint is a function in your app that users can call via API. It defines how your model processes inputs and returns outputs.
Playground
Each endpoint gets an automatic Playground - a web interface where you can test your model with different inputs before integrating it into your application.
fal run vs fal deploy
- fal run: Test your app on a single cloud GPU during development. Creates a temporary URL that disappears when you stop the command. Defaults to public auth (no authentication required).
- fal deploy: Deploy your app to production. Creates a permanent URL that stays available until you delete it. Defaults to private auth (API key required).
Use the --auth flag to control access:
- --auth public: Anyone can call your app without authentication (you pay for usage)
- --auth private: Requires API key authentication
Use fal run while building and testing, then fal deploy when ready for production use.