If you have been running models on Replicate using Cog, this guide shows how to convert your Cog model to a fal.App. The core idea is similar: both platforms package a model with its dependencies and expose a predict/generate interface. The main differences are that fal uses a Python class instead of a cog.yaml + predict.py pattern, and fal builds containers from a requirements list or Dockerfile rather than relying on Cog’s build system.
For a broader overview of deploying existing Docker containers on fal (regardless of where they came from), see Deploy an Existing Server. If you are comparing fal to other platforms, see Migrate from Modal or Migrate from RunPod.
Concept Mapping
| Replicate (Cog) | fal | Notes |
|---|---|---|
| `cog.yaml` | `requirements = [...]` or `ContainerImage` | Environment definition |
| `class Predictor(BasePredictor)` | `class MyApp(fal.App)` | App class |
| `def setup(self)` | `def setup(self)` | One-time model loading |
| `def predict(self, ...)` | `@fal.endpoint("/")` | Request handler |
| `cog.Path` (file output) | `fal.toolkit.Image` / `fal.toolkit.File` | Media outputs uploaded to the CDN automatically |
| `Input(...)` type hints | Pydantic `BaseModel` | fal uses standard Pydantic for input validation |
| `cog push` | `fal deploy` | Single CLI command |
| Replicate API client | `fal_client.subscribe(...)` | HTTP + queue based |
| Webhooks | `webhook_url` parameter | Both support webhook delivery |
Migration Path: Cog Predictor to fal.App
The most common Cog pattern is a Predictor class with setup() and predict() methods. On fal, setup() stays the same, and predict() becomes an @fal.endpoint method with Pydantic input/output models.
```yaml
# cog.yaml
build:
  python_version: "3.11"
  python_packages:
    - torch==2.1.0
    - diffusers==0.30.0
    - transformers
    - accelerate
  gpu: true
predict: "predict.py:Predictor"
```
```python
# predict.py
from cog import BasePredictor, Input, Path
import torch
from diffusers import StableDiffusionXLPipeline


class Predictor(BasePredictor):
    def setup(self):
        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            torch_dtype=torch.float16,
        ).to("cuda")

    def predict(self, prompt: str = Input(description="Text prompt")) -> Path:
        image = self.pipe(prompt).images[0]
        output_path = "/tmp/output.png"
        image.save(output_path)
        return Path(output_path)
```
The same model as a fal app:

```python
# my_app.py
import fal
from fal.toolkit import Image
from pydantic import BaseModel, Field


class Input(BaseModel):
    prompt: str = Field(description="Text prompt")


class Output(BaseModel):
    image: Image


class MyApp(fal.App):
    machine_type = "GPU-A100"
    requirements = [
        "torch==2.1.0",
        "diffusers==0.30.0",
        "transformers",
        "accelerate",
    ]

    def setup(self):
        import torch
        from diffusers import StableDiffusionXLPipeline

        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            torch_dtype=torch.float16,
        ).to("cuda")

    @fal.endpoint("/")
    def predict(self, input: Input) -> Output:
        image = self.pipe(input.prompt).images[0]
        return Output(image=Image.from_pil(image))
```
Key differences in the fal version:
- The `cog.yaml` is replaced by class attributes (`machine_type`, `requirements`).
- The `cog.Path` output is replaced by `fal.toolkit.Image`, which automatically uploads the image to the fal CDN and returns a URL.
- Inputs use standard Pydantic models instead of Cog's `Input()` type hints.
- Imports happen inside `setup()` so they run on the remote runner, not on your local machine (see Serialization and Build for why).
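Before deploying, it can help to test the endpoint on an ephemeral runner. A sketch, assuming the app above is saved as `my_app.py` (use your own file path):

```shell
# Spin up a temporary instance of the app for interactive testing;
# the runner is torn down when you stop the command.
fal run my_app.py::MyApp
```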
Using Your Existing Cog Dockerfile
If you have a complex cog.yaml with system packages, CUDA configuration, or custom build steps, you can extract the Dockerfile that Cog generates and use it directly with fal. Run cog debug to output the generated Dockerfile:
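A sketch of that step, assuming `cog debug` prints the generated Dockerfile to stdout (behavior may vary between Cog versions):

```shell
# Capture the Dockerfile Cog would build, so it can be edited for fal
cog debug > Dockerfile
```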
You will need to make a few modifications to the generated Dockerfile:
- Remove the `COPY . /src`, `EXPOSE`, and `CMD` lines at the end; fal handles these.
- Remove the Cog wheel installation (`cog-0.0.1.dev-py3-none-any.whl`), since fal does not use the Cog runtime.
- Replace the Cog requirements with your actual pip packages.
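For illustration, the tail of a Cog-generated Dockerfile looks roughly like the hypothetical fragment below (the exact port and entrypoint vary by Cog version); these are the lines to delete, since fal injects its own entrypoint:

```dockerfile
# Hypothetical tail of a Cog-generated Dockerfile -- remove all three lines:
COPY . /src
EXPOSE 5000
CMD ["python", "-m", "cog.server.http"]
```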
Then reference the Dockerfile in your fal app:
```python
import fal
from fal.container import ContainerImage
from fal.toolkit import Image


class MyApp(fal.App):
    machine_type = "GPU-A100"
    image = ContainerImage.from_dockerfile("Dockerfile")

    def setup(self):
        import torch
        from diffusers import StableDiffusionXLPipeline

        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            torch_dtype=torch.float16,
        ).to("cuda")

    @fal.endpoint("/")
    def predict(self, input: dict) -> dict:
        image = self.pipe(input["prompt"]).images[0]
        # Upload via fal.toolkit so the response carries a URL rather than
        # a raw PIL object, which is not JSON-serializable.
        return {"image": Image.from_pil(image)}
```
For most migrations, the requirements list approach is simpler and avoids dealing with Cog’s generated Dockerfile. Use the Dockerfile approach only when you have system-level dependencies or a specific CUDA version that cannot be expressed through pip packages. See Custom Container Images for the full guide.
cog debug is a hidden debugging command with no stability guarantees from the Cog team. The generated Dockerfile format may change between Cog versions.
Deploying and Calling
```shell
# Deploy
fal deploy my_app.py::MyApp
```

```python
# Call your deployed app
import fal_client

result = fal_client.subscribe(
    "your-username/my-app",
    arguments={"prompt": "a sunset over mountains"},
)
print(result["image"]["url"])
```
For the full range of calling patterns including async queue, streaming, and webhooks, see Calling Your Endpoints.
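As one example of the queue pattern, a sketch using `fal_client.submit` (the `webhook_url` receiver is hypothetical; check the fal_client documentation for exact signatures):

```python
import fal_client

# Enqueue the request without blocking; fal can also POST the finished
# result to a webhook instead of you polling for it.
handle = fal_client.submit(
    "your-username/my-app",
    arguments={"prompt": "a sunset over mountains"},
    webhook_url="https://example.com/fal-webhook",  # hypothetical endpoint
)

result = handle.get()  # block until the job completes
print(result["image"]["url"])
```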
Next Steps
Once you have migrated your model, the App Lifecycle page explains how the full lifecycle works on fal, from code serialization to runner shutdown. For scaling configuration, see Scale Your Application. For monitoring your deployed app, see App Analytics.