If you have been running serverless GPU workloads on RunPod, this guide maps RunPod concepts to their fal equivalents and shows how to convert your code. The core ideas are similar: both platforms run your code on GPU machines that scale based on demand. The main difference is that RunPod uses a handler function pattern with explicit Docker builds, while fal uses a class-based fal.App with automatic container builds. For a broader overview of deploying existing Docker containers on fal (regardless of where they came from), see Deploy an Existing Server. If you are comparing fal to other platforms, see Migrate from Replicate or Migrate from Modal.

Concept Mapping

| RunPod | fal | Notes |
| --- | --- | --- |
| `handler(job)` | `@fal.endpoint("/")` | Request handler function |
| `runpod.serverless.start({"handler": handler})` | `class MyApp(fal.App)` | App entrypoint |
| `job["input"]` | Pydantic `Input` model | fal validates and types inputs automatically |
| `return result` | `return Output(...)` | fal validates outputs with Pydantic |
| `yield result` (streaming) | `StreamingResponse` or `@fal.realtime()` | See Streaming |
| Model loading at module level | `def setup(self)` | Runs once per runner, not per request |
| `refresh_worker: True` | Return HTTP 503 | Terminates the runner and spins up a fresh one |
| `runpod.serverless.progress_update()` | `print()` (logs visible via SDK) | Or use streaming for real-time updates |
| Dockerfile + Docker Hub | `requirements = [...]` or `ContainerImage` | fal builds containers for you, or bring your own |
| Docker Hub deployment | `fal deploy` | Single CLI command |
| `/run` (async) | `fal_client.submit()` | Queue-based async |
| `/runsync` (sync) | `fal_client.subscribe()` | Blocks until result |
| `/stream` | `fal_client.stream()` | Progressive output |
| Max workers | `max_concurrency` | Maximum runners to scale to |
| Min workers | `min_concurrency` | Minimum runners kept warm |
| Idle timeout | `keep_alive` | Seconds before idle runner shuts down |
| Concurrency per worker | `max_multiplexing` | Concurrent requests per runner |
| Network volumes | `/data` persistent storage | Mounted automatically on all runners |
| Environment variables | `fal secrets set` | Secrets exposed as env vars |

Migration Path: Handler to fal.App

The most common pattern on RunPod is a handler function that loads a model at module level and processes requests. On fal, this maps to a fal.App class where model loading moves into setup() and the handler becomes an endpoint method.
```python
import runpod
import torch
from diffusers import StableDiffusionXLPipeline

# Model loads at module level, once per container start
model = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

def handler(job):
    prompt = job["input"]["prompt"]
    image = model(prompt).images[0]
    image.save("/tmp/output.png")
    return {"image_path": "/tmp/output.png"}

runpod.serverless.start({"handler": handler})
```
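The same workload rewritten as a fal.App might look like the sketch below. The class name, endpoint path, GPU tier, and pinned package versions are illustrative choices, not requirements:

```python
import fal
from fal.toolkit import Image
from pydantic import BaseModel

class Input(BaseModel):
    prompt: str

class Output(BaseModel):
    image: Image

class MyApp(fal.App, keep_alive=300):
    machine_type = "GPU-H100"  # pick the GPU tier your model needs
    requirements = ["torch==2.1.0", "diffusers==0.30.0", "transformers"]

    def setup(self):
        # Runs once per runner, not per request
        import torch
        from diffusers import StableDiffusionXLPipeline

        self.model = StableDiffusionXLPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            torch_dtype=torch.float16,
        ).to("cuda")

    @fal.endpoint("/")
    def generate(self, input: Input) -> Output:
        image = self.model(input.prompt).images[0]
        # Uploaded to the fal CDN instead of saved to a local path
        return Output(image=Image.from_pil(image))
```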
Key differences in the fal version: The model loading moves from module-level into setup(), which runs once per runner rather than once per container build. Inputs are validated through a Pydantic model instead of manually extracting from job["input"]. Outputs are also typed, and images are automatically uploaded to the fal CDN rather than saved to a local path. You do not need to write a Dockerfile or push to Docker Hub since fal builds the container from your requirements list.

Calling Your Deployed App

RunPod exposes /run (async), /runsync (sync), and /stream endpoints. fal provides equivalent patterns through the client SDK.
```python
import runpod

runpod.api_key = "your_api_key"
endpoint = runpod.Endpoint("your_endpoint_id")

# Sync
result = endpoint.run_sync({"prompt": "a sunset"})

# Async
run = endpoint.run({"prompt": "a sunset"})
status = run.status()
result = run.output()
```
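The fal equivalents use the fal_client SDK. The app id `username/my-app` below is a placeholder for the id your deployment prints:

```python
import fal_client

# Sync: blocks until the result is ready (equivalent of /runsync)
result = fal_client.subscribe(
    "username/my-app",
    arguments={"prompt": "a sunset"},
)

# Async: queue-based submission (equivalent of /run)
handle = fal_client.submit("username/my-app", arguments={"prompt": "a sunset"})
status = handle.status()
result = handle.get()
```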
For the full range of calling patterns including streaming, real-time WebSockets, and webhooks, see Calling Your Endpoints.

Deployment Workflow

On RunPod, the deployment unit is a Docker image. Your handler code lives inside the image, and the workflow requires several manual steps: write a Dockerfile that copies your handler and installs dependencies, build the image locally, push it to Docker Hub, then deploy through RunPod’s console by pointing it at the image URL.
```dockerfile
# 1. Write a Dockerfile
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
RUN pip install runpod diffusers transformers
COPY handler.py /handler.py
CMD ["python", "/handler.py"]
```

```shell
# 2. Build locally
docker build -t myuser/my-model:latest .

# 3. Push to Docker Hub
docker push myuser/my-model:latest

# 4. Deploy via RunPod console (manual step in the UI)
```
On fal, there is no manual Docker build, no Docker Hub, and no console-based deployment. You run fal deploy and fal handles the container build, image storage, and deployment. Your code and your environment definition live in the same Python file.
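The whole workflow collapses into one command. As a sketch, assuming your app class `MyApp` lives in a file named `app.py` (both names are placeholders for your own):

```shell
# Builds the container, stores the image, and deploys in one step
fal deploy app.py::MyApp
```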

Environment and Dependencies

Because fal eliminates the Docker build step for most use cases, you have two options for defining your environment.

Option 1: pip requirements (recommended). List your packages in the requirements attribute and fal builds the container automatically. This is the simplest migration path, since you can copy the pip install lines from your RunPod Dockerfile directly into the list.
```python
import fal

class MyApp(fal.App):
    requirements = ["torch==2.1.0", "diffusers==0.30.0", "transformers"]
```
Option 2: Custom Docker container. If you need system packages, a specific CUDA version, or want to reuse your existing RunPod Dockerfile with minimal changes, use ContainerImage. You can paste your RunPod Dockerfile almost as-is, just remove the COPY handler.py and CMD lines since fal handles those.
```python
import fal
from fal.container import ContainerImage

class MyApp(fal.App):
    image = ContainerImage.from_dockerfile_str("""
        FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
        RUN apt-get update && apt-get install -y ffmpeg
        RUN pip install diffusers transformers
    """)
```

Model Storage

RunPod offers three approaches for model weights: Hugging Face cache, baked into the Docker image, or network volumes. fal’s equivalent is persistent storage at /data, which is mounted on every runner and shared across your account. Models downloaded to /data are cached automatically and survive runner restarts, similar to RunPod’s network volumes but without explicit volume configuration.
```python
def setup(self):
    import os
    # Point the Hugging Face cache at /data before importing diffusers,
    # so downloaded weights persist across runner restarts
    os.environ["HF_HOME"] = "/data/.cache/huggingface"

    from diffusers import StableDiffusionXLPipeline
    self.model = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0"
    ).to("cuda")
```

Next Steps

Once you have migrated your handler, the App Lifecycle page explains how the full lifecycle works on fal, from code serialization to runner shutdown. For scaling configuration, see Scale Your Application. For monitoring your deployed app, see App Analytics.