If you have been running models on Replicate using Cog, this guide shows how to convert your Cog model to a fal.App. The core idea is similar: both platforms package a model with its dependencies and expose a predict/generate interface. The main differences are that fal uses a Python class instead of a cog.yaml + predict.py pattern, and fal builds containers from a requirements list or Dockerfile rather than relying on Cog’s build system. For a broader overview of deploying existing Docker containers on fal (regardless of where they came from), see Deploy an Existing Server. If you are comparing fal to other platforms, see Migrate from Modal or Migrate from RunPod.

Concept Mapping

| Replicate (Cog) | fal | Notes |
| --- | --- | --- |
| `cog.yaml` | `requirements = [...]` or `ContainerImage` | Environment definition |
| `class Predictor(BasePredictor)` | `class MyApp(fal.App)` | App class |
| `def setup(self)` | `def setup(self)` | One-time model loading |
| `def predict(self, ...)` | `@fal.endpoint("/")` | Request handler |
| `cog.Path` (file output) | `fal.toolkit.Image` / `fal.toolkit.File` | Media outputs uploaded to CDN automatically |
| `Input(...)` type hints | Pydantic `BaseModel` | fal uses standard Pydantic for input validation |
| `cog push` | `fal deploy` | Single CLI command |
| Replicate API client | `fal_client.subscribe(...)` | HTTP + queue based |
| Webhooks | `webhook_url` parameter | Both support webhook delivery |

Migration Path: Cog Predictor to fal.App

The most common Cog pattern is a Predictor class with setup() and predict() methods. On fal, setup() stays the same, and predict() becomes an @fal.endpoint method with Pydantic input/output models.
```yaml
# cog.yaml
build:
  python_version: "3.11"
  python_packages:
    - torch==2.1.0
    - diffusers==0.30.0
    - transformers
    - accelerate
  gpu: true
predict: "predict.py:Predictor"
```
```python
# predict.py
from cog import BasePredictor, Input, Path
import torch
from diffusers import StableDiffusionXLPipeline

class Predictor(BasePredictor):
    def setup(self):
        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            torch_dtype=torch.float16,
        ).to("cuda")

    def predict(self, prompt: str = Input(description="Text prompt")) -> Path:
        image = self.pipe(prompt).images[0]
        output_path = "/tmp/output.png"
        image.save(output_path)
        return Path(output_path)
```
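Translated to fal, the same predictor might look like the following sketch. The `machine_type` value is illustrative and the `requirements` list mirrors the `cog.yaml` package list; adjust both to your model:

```python
import fal
from fal.toolkit import Image
from pydantic import BaseModel

class TextToImageInput(BaseModel):
    prompt: str

class TextToImageOutput(BaseModel):
    image: Image

class MyApp(fal.App):
    # Replaces the build section of cog.yaml; GPU type is illustrative
    machine_type = "GPU-A100"
    requirements = [
        "torch==2.1.0",
        "diffusers==0.30.0",
        "transformers",
        "accelerate",
    ]

    def setup(self):
        # Imports live here so they run on the remote runner, not locally
        import torch
        from diffusers import StableDiffusionXLPipeline

        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            torch_dtype=torch.float16,
        ).to("cuda")

    @fal.endpoint("/")
    def generate(self, input: TextToImageInput) -> TextToImageOutput:
        image = self.pipe(input.prompt).images[0]
        # Image.from_pil uploads the result to the fal CDN and returns a URL
        return TextToImageOutput(image=Image.from_pil(image))
```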
Key differences in the fal version:
  - The `cog.yaml` is replaced by class attributes (`machine_type`, `requirements`).
  - The `cog.Path` output is replaced by `fal.toolkit.Image`, which automatically uploads the image to the fal CDN and returns a URL.
  - Inputs use standard Pydantic models instead of Cog's `Input()` type hints.
  - Imports happen inside `setup()` so they run on the remote runner, not on your local machine (see Serialization and Build for why).

Using Your Existing Cog Dockerfile

If you have a complex cog.yaml with system packages, CUDA configuration, or custom build steps, you can extract the Dockerfile that Cog generates and use it directly with fal. Run cog debug to output the generated Dockerfile:
```shell
cog debug > Dockerfile
```
You will need to make a few modifications to the generated Dockerfile:
  1. Remove the `COPY . /src`, `EXPOSE`, and `CMD` lines at the end, since fal handles these itself
  2. Remove the Cog wheel installation (`cog-0.0.1.dev-py3-none-any.whl`), since fal does not use the Cog runtime
  3. Replace the Cog requirements with your actual pip packages
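As a rough illustration of steps 1–3 (the exact lines vary by Cog version, so treat this as a sketch rather than the literal output of `cog debug`), the tail of the generated Dockerfile gets trimmed and the package installation replaced:

```dockerfile
# Before: lines like these at the end of the generated Dockerfile get removed
#   COPY . /src
#   EXPOSE 5000
#   CMD ["python", "-m", "cog.server.http"]
# along with the RUN line that installs the cog-*.whl wheel.

# After: keep the base image and system-level setup, then install
# your actual pip packages directly
RUN pip install torch==2.1.0 diffusers==0.30.0 transformers accelerate
```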
Then reference the Dockerfile in your fal app:
```python
import fal
from fal.container import ContainerImage

class MyApp(fal.App):
    machine_type = "GPU-A100"
    image = ContainerImage.from_dockerfile("Dockerfile")

    def setup(self):
        import torch
        from diffusers import StableDiffusionXLPipeline

        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            torch_dtype=torch.float16,
        ).to("cuda")

    @fal.endpoint("/")
    def predict(self, input: dict) -> dict:
        from fal.toolkit import Image

        image = self.pipe(input["prompt"]).images[0]
        # Wrap the PIL image so it is uploaded to the fal CDN and
        # serialized as a URL, rather than returned as a raw object
        return {"image": Image.from_pil(image)}
```
For most migrations, the requirements list approach is simpler and avoids dealing with Cog’s generated Dockerfile. Use the Dockerfile approach only when you have system-level dependencies or a specific CUDA version that cannot be expressed through pip packages. See Custom Container Images for the full guide.
cog debug is a hidden debugging command with no stability guarantees from the Cog team. The generated Dockerfile format may change between Cog versions.

Deploying and Calling

```shell
# Deploy
fal deploy my_app.py::MyApp
```

```python
# Call your deployed app
import fal_client

result = fal_client.subscribe(
    "your-username/my-app",
    arguments={"prompt": "a sunset over mountains"},
)
print(result["image"]["url"])
```
For the full range of calling patterns including async queue, streaming, and webhooks, see Calling Your Endpoints.
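As one example of the queue pattern, a non-blocking submission with webhook delivery might look like this sketch (the app name and webhook URL are placeholders; check the fal_client documentation for the exact parameters your client version supports):

```python
import fal_client

# Submit without blocking; the request goes onto fal's queue
handler = fal_client.submit(
    "your-username/my-app",
    arguments={"prompt": "a sunset over mountains"},
    # Optional: fal POSTs the result to this URL when the job finishes
    webhook_url="https://example.com/fal-webhook",
)

# Block until the result is ready, or poll handler.status() instead
result = handler.get()
print(result["image"]["url"])
```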

Next Steps

Once you have migrated your model, the App Lifecycle page explains how the full lifecycle works on fal, from code serialization to runner shutdown. For scaling configuration, see Scale Your Application. For monitoring your deployed app, see App Analytics.