If you have been running serverless GPU workloads on RunPod, this guide maps RunPod concepts to their fal equivalents and shows how to convert your code. The core ideas are similar: both platforms run your code on GPU machines that scale based on demand. The main difference is that RunPod uses a handler function pattern with explicit Docker builds, while fal uses a class-based fal.App with automatic container builds. For a broader overview of deploying existing Docker containers on fal (regardless of where they came from), see Deploy an Existing Server. If you are comparing fal to other platforms, see Migrate from Replicate or Migrate from Modal.

Concept Mapping

| RunPod | fal | Notes |
| --- | --- | --- |
| `handler(job)` | `@fal.endpoint("/")` | Request handler function |
| `runpod.serverless.start({"handler": handler})` | `class MyApp(fal.App)` | App entrypoint |
| `job["input"]` | Pydantic `Input` model | fal validates and types inputs automatically |
| `return result` | `return Output(...)` | fal validates outputs with Pydantic |
| `yield result` (streaming) | `StreamingResponse` or `@fal.realtime()` | See Streaming |
| Model loading at module level | `def setup(self)` | Runs once per runner, not per request |
| `refresh_worker: True` | Return HTTP 503 | Terminates the runner and spins up a fresh one |
| `runpod.serverless.progress_update()` | `print()` (logs visible via SDK) | Or use streaming for real-time updates |
| Dockerfile + Docker Hub | `requirements = [...]` or `ContainerImage` | fal builds containers for you, or bring your own |
| Docker Hub deployment | `fal deploy` | Single CLI command |
| `/run` (async) | `fal_client.submit()` | Queue-based async |
| `/runsync` (sync) | `fal_client.subscribe()` | Blocks until result |
| `/stream` | `fal_client.stream()` | Progressive output |
| Max workers | `max_concurrency` | Maximum runners to scale to |
| Min workers | `min_concurrency` | Minimum runners kept warm |
| Idle timeout | `keep_alive` | Seconds before idle runner shuts down |
| Concurrency per worker | `max_multiplexing` | Concurrent requests per runner |
| Network volumes | `/data` persistent storage | Mounted automatically on all runners |
| Environment variables | `fal secrets set` | Secrets exposed as env vars |

Migration Path: Handler to fal.App

The most common pattern on RunPod is a handler function that loads a model at module level and processes requests. On fal, this maps to a fal.App class where model loading moves into setup() and the handler becomes an endpoint method.
```python
import runpod
import torch
from diffusers import StableDiffusionXLPipeline

# Model loads at module level, once per container start
model = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

def handler(job):
    prompt = job["input"]["prompt"]
    image = model(prompt).images[0]
    image.save("/tmp/output.png")
    return {"image_path": "/tmp/output.png"}

runpod.serverless.start({"handler": handler})
```
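The same workload rewritten as a fal.App might look like the sketch below. The class name, endpoint path, GPU tier, and pinned package versions are illustrative choices, not requirements:

```python
import fal
from fal.toolkit import Image
from pydantic import BaseModel

class Input(BaseModel):
    prompt: str

class Output(BaseModel):
    image: Image

class MyApp(fal.App, keep_alive=300):
    machine_type = "GPU-H100"  # pick the GPU tier your model needs
    requirements = ["torch==2.1.0", "diffusers==0.30.0", "transformers"]

    def setup(self):
        # Runs once per runner, not per request
        import torch
        from diffusers import StableDiffusionXLPipeline

        self.model = StableDiffusionXLPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            torch_dtype=torch.float16,
        ).to("cuda")

    @fal.endpoint("/")
    def generate(self, input: Input) -> Output:
        image = self.model(input.prompt).images[0]
        # Uploaded to the fal CDN instead of saved to a local path
        return Output(image=Image.from_pil(image))
```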
Key differences in the fal version: The model loading moves from module-level into setup(), which runs once per runner rather than once per container build. Inputs are validated through a Pydantic model instead of manually extracting from job["input"]. Outputs are also typed, and images are automatically uploaded to the fal CDN rather than saved to a local path. You do not need to write a Dockerfile or push to Docker Hub since fal builds the container from your requirements list.

Calling Your Deployed App

RunPod exposes /run (async), /runsync (sync), and /stream endpoints. fal provides equivalent patterns through the client SDK.
```python
import runpod

runpod.api_key = "your_api_key"
endpoint = runpod.Endpoint("your_endpoint_id")

# Sync
result = endpoint.run_sync({"prompt": "a sunset"})

# Async
run = endpoint.run({"prompt": "a sunset"})
status = run.status()
result = run.output()
```
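The fal equivalents use the fal_client SDK. The app id `username/my-app` below is a placeholder for the id your deployment prints:

```python
import fal_client

# Sync: blocks until the result is ready (equivalent of /runsync)
result = fal_client.subscribe(
    "username/my-app",
    arguments={"prompt": "a sunset"},
)

# Async: queue-based submission (equivalent of /run)
handle = fal_client.submit("username/my-app", arguments={"prompt": "a sunset"})
status = handle.status()
result = handle.get()
```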
For the full range of calling patterns including streaming, real-time WebSockets, and webhooks, see Calling Your Endpoints.

Deployment Workflow

On RunPod, the deployment unit is a Docker image. Your handler code lives inside the image, and the workflow requires several manual steps: write a Dockerfile that copies your handler and installs dependencies, build the image locally, push it to Docker Hub, then deploy through RunPod’s console by pointing it at the image URL.
```dockerfile
# 1. Write a Dockerfile
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
RUN pip install runpod diffusers transformers
COPY handler.py /handler.py
CMD ["python", "/handler.py"]
```

```shell
# 2. Build locally
docker build -t myuser/my-model:latest .

# 3. Push to Docker Hub
docker push myuser/my-model:latest

# 4. Deploy via RunPod console (manual step in the UI)
```
On fal, there is no manual Docker build, no Docker Hub, and no console-based deployment. You run fal deploy and fal handles the container build, image storage, and deployment. Your code and your environment definition live in the same Python file.
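The whole workflow collapses into one command. As a sketch, assuming your app class `MyApp` lives in a file named `app.py` (both names are placeholders for your own):

```shell
# Builds the container, stores the image, and deploys in one step
fal deploy app.py::MyApp
```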

Environment and Dependencies

Because fal eliminates the Docker build step for most use cases, you have two options for defining your environment.

Option 1: pip requirements (recommended). List your packages in the requirements attribute and fal builds the container automatically. This is the simplest migration path, since you can copy the pip install lines from your RunPod Dockerfile directly into the list.
```python
import fal

class MyApp(fal.App):
    requirements = ["torch==2.1.0", "diffusers==0.30.0", "transformers"]
```
Option 2: Custom Docker container. If you need system packages, a specific CUDA version, or want to reuse your existing RunPod Dockerfile with minimal changes, use ContainerImage. You can paste your RunPod Dockerfile almost as-is, just remove the COPY handler.py and CMD lines since fal handles those.
```python
import fal
from fal.container import ContainerImage

class MyApp(fal.App):
    image = ContainerImage.from_dockerfile_str("""
        FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
        RUN apt-get update && apt-get install -y ffmpeg
        RUN pip install diffusers transformers
    """)
```

Model Storage

RunPod offers three approaches for model weights: Hugging Face cache, baked into the Docker image, or network volumes. fal’s equivalent is persistent storage at /data, which is mounted on every runner and shared across your account. Models downloaded to /data are cached automatically and survive runner restarts, similar to RunPod’s network volumes but without explicit volume configuration.
```python
def setup(self):
    import os
    # Point the Hugging Face cache at /data before importing diffusers,
    # so downloaded weights persist across runner restarts
    os.environ["HF_HOME"] = "/data/.cache/huggingface"

    from diffusers import StableDiffusionXLPipeline
    self.model = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0"
    ).to("cuda")
```

Next Steps

Once you have migrated your handler, the App Lifecycle page explains how the full lifecycle works on fal, from code serialization to runner shutdown. For scaling configuration, see Scale Your Application. For monitoring your deployed app, see App Analytics.