This guide shows how to migrate a RunPod serverless endpoint to a fal.App with automatic container builds.
For a broader overview of deploying existing Docker containers on fal (regardless of where they came from), see Deploy an Existing Server. If you are comparing fal to other platforms, see Migrate from Replicate or Migrate from Modal.
Concept Mapping
| RunPod | fal | Notes |
|---|---|---|
| `handler(job)` | `@fal.endpoint("/")` | Request handler function |
| `runpod.serverless.start({"handler": handler})` | `class MyApp(fal.App)` | App entrypoint |
| `job["input"]` | Pydantic `Input` model | fal validates and types inputs automatically |
| `return result` | `return Output(...)` | fal validates outputs with Pydantic |
| `yield result` (streaming) | `StreamingResponse` or `@fal.realtime()` | See Streaming |
| Model loading at module level | `def setup(self)` | Runs once per runner, not per request |
| `refresh_worker: True` | Return HTTP 503 | Terminates the runner and spins up a fresh one |
| `runpod.serverless.progress_update()` | `print()` (logs visible via SDK) | Or use streaming for real-time updates |
| Dockerfile + Docker Hub | `requirements = [...]` or `ContainerImage` | fal builds containers for you, or bring your own |
| Docker Hub deployment | `fal deploy` | Single CLI command |
| `/run` (async) | `fal_client.submit()` | Queue-based async |
| `/runsync` (sync) | `fal_client.subscribe()` | Blocks until result |
| `/stream` | `fal_client.stream()` | Progressive output |
| Max workers | `max_concurrency` | Maximum runners to scale to |
| Min workers | `min_concurrency` | Minimum runners kept warm |
| Idle timeout | `keep_alive` | Seconds before idle runner shuts down |
| Concurrency per worker | `max_multiplexing` | Concurrent requests per runner |
| Network volumes | `/data` persistent storage | Mounted automatically on all runners |
| Environment variables | `fal secrets set` | Secrets exposed as env vars |
Migration Path: Handler to fal.App
The most common pattern on RunPod is a handler function that loads a model at module level and processes requests. On fal, this maps to a fal.App class where model loading moves into setup() and the handler becomes an endpoint method.
The key differences: model loading moves into setup(), which runs once per runner rather than once per container build. Inputs are validated through a Pydantic model instead of being extracted manually from job["input"]. Outputs are also typed, and images are automatically uploaded to the fal CDN rather than saved to a local path. You do not need to write a Dockerfile or push to Docker Hub, since fal builds the container from your requirements list.
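As a sketch of the two styles side by side (the `load_model` helper, package list, and `keep_alive` value are illustrative placeholders, not prescribed by this guide):

```python
# RunPod style: model loads at import time, handler pulls from job["input"].
import runpod

model = load_model()  # hypothetical loader

def handler(job):
    prompt = job["input"]["prompt"]
    return {"image_url": model.generate(prompt)}

runpod.serverless.start({"handler": handler})
```

```python
# fal style: typed inputs/outputs, model loading in setup().
import fal
from pydantic import BaseModel

class Input(BaseModel):
    prompt: str

class Output(BaseModel):
    image_url: str

class MyApp(fal.App, keep_alive=300):
    requirements = ["torch", "diffusers"]

    def setup(self):
        # Runs once per runner, not per request.
        self.model = load_model()  # hypothetical loader

    @fal.endpoint("/")
    def generate(self, input: Input) -> Output:
        return Output(image_url=self.model.generate(input.prompt))
```

The scaling knobs from the concept-mapping table (`min_concurrency`, `max_concurrency`, `max_multiplexing`) are passed the same way as `keep_alive`, as class keyword arguments.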
Calling Your Deployed App
RunPod exposes /run (async), /runsync (sync), and /stream endpoints. fal provides equivalent patterns through the client SDK.
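A sketch of the three call patterns, assuming the fal_client Python SDK; the app id `your-username/my-app` and the `arguments` payload are placeholders:

```python
import fal_client

# Queue-based async, like RunPod's /run: returns a handle immediately.
handle = fal_client.submit("your-username/my-app", arguments={"prompt": "a cat"})
result = handle.get()  # poll/block for the result later

# Blocking call, like /runsync: waits until the result is ready.
result = fal_client.subscribe("your-username/my-app", arguments={"prompt": "a cat"})

# Progressive output, like /stream.
for event in fal_client.stream("your-username/my-app", arguments={"prompt": "a cat"}):
    print(event)
```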
Deployment Workflow
On RunPod, the deployment unit is a Docker image. Your handler code lives inside the image, and the workflow requires several manual steps: write a Dockerfile that copies your handler and installs dependencies, build the image locally, push it to Docker Hub, then deploy through RunPod’s console by pointing it at the image URL.
On fal, deployment is a single command: run fal deploy and fal handles the container build, image storage, and deployment. Your code and your environment definition live in the same Python file.
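The two workflows, sketched as shell commands (image tag and file name are placeholders):

```shell
# RunPod workflow: manual container steps.
docker build -t yourname/my-worker:v1 .
docker push yourname/my-worker:v1
# ...then point RunPod's console at the pushed image URL.

# fal workflow: one command, no Dockerfile required.
fal deploy my_app.py
```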
Environment and Dependencies
Because fal eliminates the Docker build step for most use cases, you have two options for defining your environment.
Option 1: pip requirements (recommended). List your packages in the requirements attribute and fal builds the container automatically. This is the simplest migration path, since you can copy the pip install lines from your RunPod Dockerfile directly into the list.
Option 2: bring your own container. If your environment needs more than pip packages, supply a ContainerImage built from your existing Dockerfile. You can reuse the Dockerfile largely as-is, dropping the COPY handler.py and CMD lines since fal handles those.
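A sketch of both options (package versions and the Dockerfile contents are illustrative, and the exact ContainerImage import path and constructor may differ across fal SDK versions):

```python
import fal
from fal.container import ContainerImage

# Option 1: fal builds the container from pip requirements.
class AppFromRequirements(fal.App):
    requirements = ["torch", "diffusers", "transformers"]

# Option 2: bring your own Dockerfile, minus COPY/CMD lines.
dockerfile = """
FROM python:3.11-slim
RUN pip install torch diffusers transformers
"""

class AppFromContainer(fal.App, image=ContainerImage.from_dockerfile_str(dockerfile)):
    ...
```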
Model Storage
RunPod offers three approaches for model weights: the Hugging Face cache, baking them into the Docker image, or network volumes. fal’s equivalent is persistent storage at /data, which is mounted on every runner and shared across your account. Models downloaded to /data are cached automatically and survive runner restarts, similar to RunPod’s network volumes but without explicit volume configuration.
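A common pattern is to download weights into /data only on a cache miss, so later runners reuse them. A minimal sketch; `fetch` stands in for whatever downloader you use, and the `/data/models` default is an assumed convention, not a fal requirement (the root is parameterized so the helper also works outside fal):

```python
import os

def cached_model_path(name, fetch, root="/data/models"):
    """Return the local path for `name`, calling `fetch(path)` only if it is missing."""
    path = os.path.join(root, name)
    if not os.path.exists(path):  # first runner to need the weights downloads them
        os.makedirs(root, exist_ok=True)
        fetch(path)  # e.g. download weights from Hugging Face into `path`
    return path
```

Inside setup(), call it with your downloader; subsequent runner starts skip the download entirely.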