Introduction to Serverless

Enterprise Feature: Please visit the Serverless Get Started page to request access.

fal Serverless provides on-demand GPUs that scale from zero to thousands instantly. This section covers our serverless deployment system, which lets you host your custom models, apps, and workflows on the same infrastructure that powers our own models.

Quick Start

Deploy your first model in minutes with our step-by-step guide.

Migrate in 5 Minutes

Already have a Docker server? Migrate it to fal with minimal changes.

Examples

Step-by-step tutorials for deploying text-to-image, video, speech, and more.

CLI Reference

Complete reference for fal deploy, fal apps, fal runners, and more.

Key Features

Instant scaling: Go from zero to thousands of GPUs instantly
Pay-per-use: Pay only for the compute you use with auto-scaling and high availability
Unified framework: Complete solution for running, deploying, and productionizing your AI apps
GPU access: Access to thousands of H100s, H200s, and other high-performance GPUs
Full observability: Complete visibility into requests, responses, and latencies (including custom metrics)
Native clients: HTTP and WebSocket clients that work with both fal-provided models and your own apps
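
To make the scaling and pay-per-use ideas above concrete, here is a minimal, purely illustrative sketch in plain Python. It is not fal's scheduler or billing logic: the function names, the per-runner concurrency parameter, and the price are all hypothetical, chosen only to show how "scale from zero" and "pay only for what you use" could be reasoned about.

```python
import math


def desired_runners(pending_requests: int, per_runner_concurrency: int, max_runners: int) -> int:
    """Toy scale-to-zero rule (hypothetical, not fal's actual algorithm).

    Returns 0 when there is no work, otherwise enough runners to cover the
    pending requests, capped at max_runners.
    """
    if per_runner_concurrency <= 0:
        raise ValueError("per_runner_concurrency must be positive")
    if pending_requests <= 0:
        return 0  # no traffic: scale all the way down
    needed = math.ceil(pending_requests / per_runner_concurrency)
    return min(needed, max_runners)


def usage_cost(runner_seconds: float, price_per_second: float) -> float:
    """Toy pay-per-use bill: cost grows only with the compute time consumed."""
    return runner_seconds * price_per_second


if __name__ == "__main__":
    # Idle, light traffic, and a burst that hits the (assumed) runner cap.
    for pending in (0, 10, 10_000):
        print(pending, "pending ->", desired_runners(pending, 4, 1000), "runners")
    # 120 runner-seconds at an assumed price of 0.5 per second.
    print("cost:", usage_cost(120, 0.5))
```

With no pending requests the rule asks for zero runners, so the toy bill is driven entirely by the runner-seconds actually used.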

What’s New

A recap of the biggest Serverless updates: smarter scaling, observability tools, cold start optimizations, multi-GPU support, and more.
