While fal run gives you a temporary URL for testing, fal deploy creates a persistent deployment that stays available until you explicitly remove it. Your app gets a stable endpoint ID (e.g., your-username/my-model) that callers use with the fal client SDKs or REST API, along with automatic runner scaling and built-in reliability through the queue system.
Before deploying, make sure you have installed the CLI, written your fal.App, and tested it locally with fal run. This page walks through the deployment process from the initial command through build, configuration, and what happens behind the scenes. For managing deployments after they are live, see Manage Deployments.
Development vs Production
During development, use fal run to get a temporary endpoint for testing. It creates an ephemeral deployment that is destroyed when you stop the process. By default, fal run uses public auth mode so you can test without an API key.
For production, use fal deploy. This creates a persistent deployment with a permanent URL. By default, fal deploy uses private auth mode, requiring an API key to call your app.
Deploying Your App
The simplest deployment requires just the path to your app file and class. Once deployed, the app is addressed by an endpoint ID of the form your-username/app-name. Callers use this ID with fal_client.subscribe("your-username/app-name", ...) or via the REST API at https://queue.fal.run/your-username/app-name. You do not need Docker installed locally. The build happens entirely in the cloud.
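As a rough illustration of how an endpoint ID maps onto the REST URL above, the sketch below builds (but does not send) a POST request using only the Python standard library. The `Authorization: Key ...` header format, the `prompt` input field, and the helper name `build_queue_request` are assumptions made for this example; check the fal REST documentation for the exact request shape your app expects.

```python
import json
import urllib.request

QUEUE_BASE = "https://queue.fal.run"  # base URL taken from the text above


def build_queue_request(endpoint_id: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request for an endpoint ID such as 'your-username/app-name'.

    Nothing is sent over the network; the caller decides when to submit it.
    """
    url = f"{QUEUE_BASE}/{endpoint_id.strip('/')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed header format for API-key auth; verify against the fal docs.
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical input payload; the real fields depend on your app's input model.
req = build_queue_request("your-username/app-name", {"prompt": "a red fox"}, "YOUR_API_KEY")
print(req.full_url)
```

In most cases the fal client SDK is the more convenient option, since it handles queue submission and polling for you.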
App Name
By default, your app name is derived from the class name (e.g., TextToImage becomes text-to-image). You can override it in your code or via the CLI. The app name is the second part of your endpoint ID (your-username/app-name).
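To make the naming rule concrete, here is a minimal sketch of a class-name-to-kebab-case conversion that reproduces the TextToImage example. It is an illustration only: the function name is hypothetical, and fal's actual conversion rule (for instance, how it treats acronyms or digits) is not specified here and may differ.

```python
import re


def class_name_to_app_name(class_name: str) -> str:
    """Convert a CamelCase class name to a kebab-case app name (illustrative only)."""
    # Insert a hyphen at every boundary between a lowercase letter or digit
    # and a following uppercase letter, then lowercase the whole string.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", class_name).lower()


print(class_name_to_app_name("TextToImage"))  # text-to-image
```

If the derived name is not what you want, setting the app name explicitly avoids depending on the conversion at all.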
Machine Type
Set the hardware your app runs on using the machine_type class attribute. For GPU workloads, you can also specify num_gpus for multi-GPU configurations. See Machine Types for the full list of available hardware and guidance on choosing the right option.
Regions
By default, your app can run in any available region. If you need data residency compliance or want to minimize latency for a specific geography, restrict the app to specific regions.
Environments
You can deploy to different environments (e.g., staging, production) to isolate configurations and secrets.
Authentication Modes
Authentication controls who can call your app and who pays for compute. You set it with the --auth flag or the app_auth class attribute.
Private is the default for fal deploy. Only the account owner and their team members can call the app, and they must include an API key. All compute is billed to the owner.
Public is the default for fal run. Anyone with the URL can call the app without authentication. All compute is billed to the owner. Use this for open demos or internal tools where you control access at another layer.
Shared allows anyone with the URL to call the app, but callers must authenticate with their own API key. Each caller pays for their own usage. This is how all apps in the Model Gallery work. Shared mode requires admin enablement on your account. See Publishing to the Marketplace for the full process of making your app available as a marketplace model.
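The three modes differ along two axes: whether a caller needs an API key, and whose account is billed. The sketch below encodes the descriptions above as a small lookup table. The `AuthMode` dataclass and `effective_auth` helper are hypothetical names used for illustration and are not part of the fal SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthMode:
    who_can_call: str
    api_key_required: bool
    billed_to: str


# Summary of the three modes as described in the text above.
AUTH_MODES = {
    "private": AuthMode("owner and team members", True, "owner"),
    "public": AuthMode("anyone with the URL", False, "owner"),
    "shared": AuthMode("anyone with the URL", True, "caller"),
}

# Default mode per command, as stated above.
DEFAULT_AUTH = {"fal deploy": "private", "fal run": "public"}


def effective_auth(command: str, override: Optional[str] = None) -> AuthMode:
    """Return the auth mode in effect: an explicit override wins over the default."""
    return AUTH_MODES[override or DEFAULT_AUTH[command]]


print(effective_auth("fal deploy"))
print(effective_auth("fal deploy", override="shared").billed_to)  # caller
```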
Rollout Strategies
When you redeploy an app that is already running, fal needs to transition from the old revision to the new one. You can choose between two strategies.
Rolling (default) spins up a runner on the new revision before switching traffic. fal waits for this runner to complete setup() and become healthy. Only after the new runner is confirmed ready does fal switch the alias. This ensures zero downtime but takes longer to complete. If the new runner fails to start (e.g., setup() crashes), the deployment is aborted and traffic stays on the old revision.
Recreate instantly switches the app alias to the new revision. No runners are started proactively. The first request after redeployment triggers a cold start on the new revision. If your app has min_concurrency > 0, the scaling system will eventually bring runners up to meet that minimum, but this happens through normal scaling, not as part of the deploy itself.
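The difference between the two strategies can be modeled as a small state transition. The following is a toy simulation of the behavior described above, not fal's implementation: `AppState`, `redeploy`, and the `new_runner_healthy` flag are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppState:
    live_revision: str  # revision the app alias currently points to
    warm_runners: int   # healthy runners already up on the live revision


def redeploy(state: AppState, new_revision: str, strategy: str = "rolling",
             new_runner_healthy: bool = True) -> AppState:
    """Model the outcome of redeploying under each rollout strategy."""
    if strategy == "recreate":
        # Alias switches immediately and no runner is started proactively,
        # so the first request on the new revision pays a cold start.
        return AppState(live_revision=new_revision, warm_runners=0)
    if strategy == "rolling":
        if not new_runner_healthy:
            # setup() failed on the new revision: deploy is aborted and
            # traffic stays on the old revision.
            return state
        # One runner on the new revision is verified healthy before the switch.
        return AppState(live_revision=new_revision, warm_runners=1)
    raise ValueError(f"unknown strategy: {strategy}")


old = AppState(live_revision="rev-1", warm_runners=2)
print(redeploy(old, "rev-2", "rolling"))
print(redeploy(old, "rev-2", "rolling", new_runner_healthy=False))
print(redeploy(old, "rev-2", "recreate"))
```

The model makes the trade-off visible: rolling never leaves the alias pointing at a revision with no verified runner, while recreate finishes immediately at the cost of a cold start.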
With rolling deployments, you will see a runner spin up during fal deploy even if there is no traffic. This is the health verification step. The deploy command stays open until the runner is confirmed healthy or fails.
How Builds Work
When you run fal deploy, your code is built into a container image remotely. You do not need Docker installed locally.
Upload
Your app code is sent to fal’s build service. For non-container apps, files listed in app_files are included. For container apps, files referenced by COPY/ADD in your Dockerfile are included automatically.
Build
fal builds your container image in the cloud using Depot, a remote Docker build service.
Cache check
A hash is computed from your Dockerfile, dependencies, build args, and container files. If an identical image already exists, the build is skipped entirely.
Ready
Runners pull the image when they start up. Image size directly affects pull time and cold start duration.
Since builds happen remotely, your local machine does not need Docker, GPU drivers, or large amounts of disk space. You only need the fal CLI and your code.
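The cache check step described above is a form of content-addressed caching: identical build inputs produce an identical key, so the build can be skipped. The sketch below shows one way such a key could be computed. The use of SHA-256, the order-insensitive handling of dependencies, and the function name `image_cache_key` are all assumptions for illustration; fal's actual hashing scheme is not documented here.

```python
import hashlib
import json


def image_cache_key(dockerfile: str, dependencies: list[str],
                    build_args: dict[str, str],
                    container_files: dict[str, bytes]) -> str:
    """Derive a deterministic key from the four build inputs named above."""
    h = hashlib.sha256()
    h.update(dockerfile.encode("utf-8"))
    # Sorting makes the key independent of listing order (a design choice
    # of this sketch, not a documented fal behavior).
    h.update(json.dumps(sorted(dependencies)).encode("utf-8"))
    h.update(json.dumps(build_args, sort_keys=True).encode("utf-8"))
    for path in sorted(container_files):
        h.update(path.encode("utf-8"))
        # Hash each file's content so any byte change alters the key.
        h.update(hashlib.sha256(container_files[path]).digest())
    return h.hexdigest()


base = dict(
    dockerfile="FROM python:3.11-slim\n",
    dependencies=["torch", "numpy"],
    build_args={"VARIANT": "gpu"},
    container_files={"app.py": b"print('hello')\n"},
)
key_a = image_cache_key(**base)
key_b = image_cache_key(**{**base, "dependencies": ["numpy", "torch"]})
key_c = image_cache_key(**{**base, "build_args": {"VARIANT": "cpu"}})

print(key_a == key_b)  # True: same inputs, so the build could be skipped
print(key_a == key_c)  # False: a changed build arg forces a rebuild
```

A practical consequence is that keeping the Dockerfile, dependency list, and copied files stable between deploys lets redeploys reuse the existing image.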