This tutorial builds on the Quick Start by deploying a real AI model. You will create a text-to-image API powered by Stable Diffusion XL that runs on GPU infrastructure, loads model weights from Hugging Face, and returns generated images through a typed Pydantic schema. The result is a production-ready endpoint with a Playground for browser-based testing. Along the way, you will encounter several concepts that are central to building Serverless apps: the setup() hook for one-time model loading, the keep_alive parameter for controlling runner lifetime, pip requirements for environment setup, and the fal.toolkit.Image class for returning media outputs. If any of these are new to you, the Serverless Introduction provides a quick overview.

Before You Start

You’ll need:
  • Python - we recommend 3.11
  • A fal account (sign up is free)
  • Basic familiarity with Python and AI models

Step 1: Install the CLI

If you haven’t already:
pip install fal

Step 2: Authenticate

Get your API key from the fal dashboard and authenticate:
fal auth login
This opens your browser to authenticate with fal. Once complete, your credentials are saved locally.

Step 3: Create Your Image Generator

Create a file called image_generator.py with this Stable Diffusion XL text-to-image model:
import fal
from pydantic import BaseModel, Field
from fal.toolkit import Image

class Input(BaseModel):
    prompt: str = Field(
        description="The prompt to generate an image from",
        examples=["A beautiful image of a cat"],
    )

class Output(BaseModel):
    image: Image

class MyApp(fal.App):
    keep_alive = 300
    app_name = "my-demo-app"
    machine_type = "GPU-H100"
    requirements = [
        "hf-transfer==0.1.9",
        "diffusers[torch]==0.32.2",
        "transformers[sentencepiece]==4.51.0",
        "accelerate==1.6.0",
    ]

    def setup(self):
        # Enable HF Transfer for faster downloads
        import os

        os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

        import torch
        from diffusers import StableDiffusionXLPipeline

        # Load any model you want; we'll use stabilityai/stable-diffusion-xl-base-1.0.
        # Hugging Face models are downloaded automatically to your account's
        # persistent storage (/data) and cached across runs.
        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            torch_dtype=torch.float16,
            variant="fp16",
            use_safetensors=True,
        ).to("cuda")

        # Warmup the model before the first request
        self.warmup()

    def warmup(self):
        self.pipe("A beautiful image of a cat")

    @fal.endpoint("/")
    def run(self, request: Input) -> Output:
        result = self.pipe(request.prompt)
        image = Image.from_pil(result.images[0])
        return Output(image=image)
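Because Input is a plain Pydantic model, you can sanity-check the request schema locally before starting anything on fal. A minimal sketch, needing only pydantic installed:

```python
from pydantic import BaseModel, Field, ValidationError

class Input(BaseModel):
    prompt: str = Field(
        description="The prompt to generate an image from",
        examples=["A beautiful image of a cat"],
    )

# Valid payloads parse cleanly...
req = Input(prompt="A cyberpunk cityscape at night with neon lights")
print(req.prompt)

# ...and malformed payloads are rejected before they ever reach the model
try:
    Input()  # missing the required "prompt" field
except ValidationError as e:
    print("rejected:", e.error_count(), "error(s)")
```

This is the same validation your deployed endpoint performs on every incoming request, so a bad payload fails fast with a structured error instead of crashing mid-generation.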

Step 4: Test Your Image Generator

Run your model to test it:
fal run image_generator.py::MyApp
This starts your app, downloads the Stable Diffusion XL model weights (cached after the first run), and prints two URLs: a direct HTTP endpoint and a web playground. The first run takes a couple of minutes for the model download. Subsequent runs are faster. Once you see Application startup complete, test it with curl using the URL from the output:
curl $FAL_RUN_URL -H 'Content-Type: application/json' \
  -d '{"prompt":"A cyberpunk cityscape at night with neon lights"}'
You can also use the playground URL to test through a browser interface.
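If you prefer testing from Python rather than the shell, the same request can be built with the standard library. This sketch only constructs the request object; the URL argument stands in for the endpoint that fal run prints:

```python
import json
import urllib.request

def build_request(url: str, prompt: str) -> urllib.request.Request:
    """Build the same JSON POST that the curl command above sends."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it against your running app:
#   with urllib.request.urlopen(build_request(run_url, "A cat")) as resp:
#       print(json.load(resp))
```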

Step 5: Deploy Your Model

Once you are satisfied with the results, deploy your app to create a persistent URL. Deployed apps scale automatically, with runners managed by fal’s infrastructure. You can configure scaling behavior through parameters like keep_alive, min_concurrency, and max_concurrency. See Scale Your Application to learn more.
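As a sketch of what that configuration looks like on the app class (attribute names from the paragraph above; the values here are illustrative, not recommendations):

```python
import fal

class MyApp(fal.App):
    app_name = "my-demo-app"
    machine_type = "GPU-H100"
    keep_alive = 300        # keep an idle runner warm for 300 seconds
    min_concurrency = 0     # scale to zero when there is no traffic
    max_concurrency = 2     # cap how many runners fal will start
```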
fal deploy image_generator.py::MyApp

Step 6: Call Your Deployed App

Once deployed, call your image generator from any Python or JavaScript application using the fal client SDK.
pip install fal-client
import fal_client

result = fal_client.subscribe(
    "your-username/my-demo-app",
    arguments={"prompt": "A cyberpunk cityscape at night with neon lights"},
)
print(result["image"]["url"])
Replace your-username/my-demo-app with the endpoint ID shown after deploying. See Calling Your Endpoints for all calling patterns including async queue, streaming, real-time, and webhooks.
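The returned dict mirrors the Output schema, so result["image"]["url"] is a hosted URL you can download like any other file. A small helper, assuming the Output schema above and using only the standard library:

```python
import urllib.request

def image_url(result: dict) -> str:
    # Output(image=Image(...)) serializes as {"image": {"url": ..., ...}}
    return result["image"]["url"]

def save_image(result: dict, path: str) -> None:
    """Download the generated image to a local file."""
    urllib.request.urlretrieve(image_url(result), path)
```

For example, save_image(result, "cityscape.png") after the subscribe call above writes the generated image to disk.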

Next Steps

The App Lifecycle page explains how apps are structured, where code runs, and how runners start up and shut down. To define richer input and output schemas (sliders, image uploads, multiple outputs), see Handle Inputs and Outputs. For all the ways to call your deployed app from client code, including async queue, streaming, and real-time patterns, see Calling Your Endpoints.