When you call a pre-trained model through fal’s Model APIs, you are billed based on the output you generate. Each model on fal has its own pricing and billing unit, visible on the model’s page in the gallery and at fal.ai/pricing. You pay only for successful outputs, and you are never charged for server errors or time spent waiting in the queue.

fal uses a prepaid credit model: you purchase credits in advance, and they are drawn down as you use the platform. Credits fund both Model API usage and your concurrency limit, which scales with your purchase history.

If you deploy your own applications on Serverless, billing works differently and is covered on the Serverless pricing page.

Per-Model Pricing

The billing unit varies by model type. Image generation models typically charge per image or per megapixel of output, where higher resolutions cost proportionally more. Video generation models charge per second of generated video or a flat rate per video. Other models, such as LLMs or audio models, charge per request or per output unit specific to the model. Models that do not have a fixed per-output price fall back to per-second billing based on the GPU machine type used to run the request. This fallback applies to some less common models and to your own Serverless endpoints.
| Model type | Billing unit | How it works |
| --- | --- | --- |
| Image generation | Per image or per megapixel | Flat rate per image, or proportional to output resolution |
| Video generation | Per second or per video | Per second of generated video, or flat rate per video |
| Other models | Per request or compute seconds | Flat rate per request, or per-second billing by GPU type |
Prices vary by model and may change. Check the model’s page or the pricing page for current rates. You can also query prices programmatically through the Platform APIs.
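For megapixel-billed image models, the charge scales with output resolution. A minimal sketch of that arithmetic, assuming a hypothetical rate of $0.01 per megapixel (real rates are on each model's page):

```python
def megapixel_cost(width: int, height: int, price_per_megapixel: float) -> float:
    """Cost of one image billed per megapixel of output resolution."""
    megapixels = (width * height) / 1_000_000
    return megapixels * price_per_megapixel

# A 1024x1024 image is ~1.05 MP; at a hypothetical $0.01/MP that is ~$0.0105.
cost = megapixel_cost(1024, 1024, 0.01)
```

A 2048x2048 image has four times the pixels, so under this scheme it costs four times as much.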

What You Pay For

You are billed for successfully generated outputs. The billing unit (image, megapixel, video second, etc.) is defined per model. When a model does not have a fixed output price, billing falls back to per-second pricing based on the GPU machine type that processed your request.
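When the per-second fallback applies, the charge is simply the GPU machine rate multiplied by inference time. A sketch with a hypothetical per-second rate (check the pricing page for actual GPU rates):

```python
def compute_seconds_cost(gpu_rate_per_second: float, inference_seconds: float) -> float:
    """Fallback billing: GPU machine rate times actual inference time.

    Queue wait time is excluded -- only the inference work itself is billed.
    """
    return gpu_rate_per_second * inference_seconds

# Hypothetical example: a $0.00111/s machine running 12.5 s of inference.
cost = compute_seconds_cost(0.00111, 12.5)
```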

What You Are Not Charged For

Server errors are never billed. If a request fails with an HTTP 500 or higher status code, no charge is incurred. Time spent waiting in the queue before a runner starts processing your request is also free. Only the actual inference work counts toward your bill.
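In billing terms, the rule above reduces to a predicate: only successfully completed requests are billable, and anything failing with a 5xx server error is not. A sketch (the helper name is ours, not part of any fal SDK):

```python
def is_billable(status_code: int) -> bool:
    """True only for successful responses; server errors (>= 500) are never billed."""
    if status_code >= 500:
        return False  # server error: no charge
    return 200 <= status_code < 300  # only successful outputs are billed
```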

Checking Prices Programmatically

You can retrieve pricing information for any model endpoint through the Platform APIs. This is useful for building cost estimation into your application or comparing rates across models.
```bash
curl "https://api.fal.ai/v1/models/pricing?endpoint_id=fal-ai/flux/dev" \
  -H "Authorization: Key $FAL_KEY"
```
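The same request can be issued from Python. A standard-library sketch mirroring the curl call above (it reads the key from the `FAL_KEY` environment variable):

```python
import os
import urllib.parse
import urllib.request

PRICING_URL = "https://api.fal.ai/v1/models/pricing"

def build_pricing_request(endpoint_id: str, api_key: str) -> urllib.request.Request:
    """Build the pricing query for one endpoint, as in the curl example."""
    query = urllib.parse.urlencode({"endpoint_id": endpoint_id})
    return urllib.request.Request(
        f"{PRICING_URL}?{query}",
        headers={"Authorization": f"Key {api_key}"},
    )

req = build_pricing_request("fal-ai/flux/dev", os.environ.get("FAL_KEY", ""))
# urllib.request.urlopen(req) would execute the call and return the JSON payload.
```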
The response includes the billing unit and unit price for each endpoint:
```json
{
  "prices": [
    {
      "endpoint_id": "fal-ai/flux/dev",
      "unit_price": 0.025,
      "unit": "image",
      "currency": "USD"
    }
  ],
  "next_cursor": null,
  "has_more": false
}
```
You can also estimate costs before running a request, query usage line items with unit quantities and prices, and access time-bucketed analytics for spend tracking.
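Combining the pricing response with an expected output count gives a pre-flight cost estimate. A sketch that parses a payload shaped like the JSON above (the helper is illustrative, not an SDK function):

```python
import json

def estimate_cost(pricing_payload: str, endpoint_id: str, quantity: int) -> float:
    """Multiply an endpoint's unit price by the number of units to be generated."""
    prices = json.loads(pricing_payload)["prices"]
    entry = next(p for p in prices if p["endpoint_id"] == endpoint_id)
    return entry["unit_price"] * quantity

payload = (
    '{"prices": [{"endpoint_id": "fal-ai/flux/dev", "unit_price": 0.025,'
    ' "unit": "image", "currency": "USD"}], "next_cursor": null, "has_more": false}'
)
# Four images at $0.025 each is an estimated $0.10.
estimate = estimate_cost(payload, "fal-ai/flux/dev", 4)
```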

Platform APIs for Models

Full reference for pricing, usage, and analytics APIs

Enterprise Pricing

Enterprise customers can receive custom per-endpoint pricing and volume discounts. These are configured per account and apply automatically to all requests. Contact the sales team for details.

Monitoring Your Usage

The billing dashboard shows your overall spend, invoices, and payment history. For per-model cost breakdowns and request-level detail, use the usage and analytics Platform APIs, or check the billing reports available in the dashboard.
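Because usage line items carry unit quantities and prices, a per-model spend breakdown can also be aggregated client-side. A sketch over an assumed line-item shape (the field names and endpoint IDs here are illustrative; consult the Platform APIs reference for the actual schema):

```python
from collections import defaultdict

def spend_by_endpoint(line_items: list[dict]) -> dict[str, float]:
    """Aggregate usage line items into total spend per model endpoint."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        # Assumed fields: endpoint_id, quantity, unit_price.
        totals[item["endpoint_id"]] += item["quantity"] * item["unit_price"]
    return dict(totals)

items = [
    {"endpoint_id": "fal-ai/flux/dev", "quantity": 10, "unit_price": 0.025},
    {"endpoint_id": "fal-ai/flux/dev", "quantity": 2, "unit_price": 0.025},
    {"endpoint_id": "fal-ai/example-video", "quantity": 8, "unit_price": 0.50},
]
breakdown = spend_by_endpoint(items)
```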