Publishing to the Marketplace

Every model on the fal Marketplace is a fal.App running on Serverless. You can publish your own app to the marketplace, making it callable by any fal user who authenticates with their own API key and pays for their own usage. Publishing involves deploying your app in shared auth mode, defining billable units so callers are charged correctly, and working with the fal team to get listed. Before publishing, your app should be deployed and stable, scaled for your expected traffic, and have health checks and analytics enabled.

Set your app to shared mode

Shared mode means callers must authenticate with their own fal API key. Each caller is billed for their own usage — the app owner is not charged for caller compute.

class MyModel(fal.App):
    app_auth = "shared"

Or via the CLI:

fal deploy my_app.py::MyModel --auth shared

Shared mode requires admin enablement on your account. Contact the fal team to get it enabled before deploying with app_auth = "shared".

Define billable units

When your app runs in shared mode, you control how callers are charged by setting the x-fal-billable-units response header. This tells the platform how many billing units to charge for each request.

response.headers["x-fal-billable-units"] = str(units)

The cost per unit is configured separately by the fal team in the billing system — your code only reports how many units each request consumed. If you don’t set the header, the platform falls back to per-second billing based on the GPU machine type used.

Common patterns

Per image (flat rate) Charge one unit per generated image, regardless of resolution:

response.headers["x-fal-billable-units"] = str(input.num_images)

Per megapixel Scale the charge with output resolution. Higher resolutions cost proportionally more:

resolution_factor = math.ceil(
    (image_size.width * image_size.height) / (1024 * 1024)
)
response.headers["x-fal-billable-units"] = str(
    resolution_factor * input.num_images
)

Per video second Charge based on the duration of generated video:

response.headers["x-fal-billable-units"] = str(video_duration_seconds)

Per text chunk For text-based models (TTS, LLMs), scale with input length:

response.headers["x-fal-billable-units"] = str(max(1, len(prompt) // 1000))

Flat per request Charge a fixed amount regardless of input or output:

response.headers["x-fal-billable-units"] = "1"

Get listed on the Marketplace

Once your app is deployed in shared mode with billable units configured, contact the fal team to get listed on fal.ai/models. The team will configure:

Model card — name, description, category, and example inputs/outputs
Pricing — the cost per billable unit, visible on your model’s page and at fal.ai/pricing
Visibility — when and how your model appears in the marketplace

After listing, callers see your model’s pricing on its page and the X-Fal-Billable-Units header in every response, so they can track their usage programmatically.

Deploy to Production

Deployment strategies and authentication modes.

Model APIs Pricing

How marketplace model pricing works from the caller’s perspective.

Setting Up

Model APIs

Serverless

Compute

Organizations

Publishing to the Marketplace

Set your app to shared mode

Define billable units

Common patterns

Get listed on the Marketplace

Deploy to Production

Model APIs Pricing

Setting Up

Model APIs

Serverless

Compute

Organizations

​Set your app to shared mode

​Define billable units

​Common patterns

​Get listed on the Marketplace

Deploy to Production

Model APIs Pricing

Set your app to shared mode

Define billable units

Common patterns

Get listed on the Marketplace