Introduction to Serverless
fal Serverless provides on-demand GPUs that scale from zero to thousands instantly. This section covers our serverless deployment system, which lets you host your custom models, apps, and workflows on the same infrastructure that powers our own models.
Key Features
- Instant scaling: Scale from zero to thousands of GPUs instantly
- Pay-per-use: Pay only for the compute you use, with auto-scaling and high availability built in
- Unified framework: Complete solution for running, deploying, and productionizing your AI apps
- GPU access: Access to thousands of H100s, H200s, and other high-performance GPUs
- Full observability: Complete visibility into requests, responses, and latencies (including custom metrics)
- Native clients: HTTP and WebSocket clients that work with both fal-provided models and your own apps
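
To illustrate the last point, here is a minimal sketch of how one HTTP call shape could serve both a fal-provided model and your own deployed app. It only builds the request and does not send it. The base URL, the `Key` authorization scheme, the app IDs, and the `build_request` helper are assumptions made for illustration, so check the client documentation for the exact endpoint and header format.

```python
import json
import urllib.request

# Assumption: illustrative base URL, not confirmed by this page.
BASE_URL = "https://queue.fal.run"


def build_request(app_id: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an app endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{app_id}",
        data=body,
        method="POST",
        headers={
            # Assumption: key-based auth header; the scheme name is hypothetical here.
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical app IDs: one fal-provided model and one custom app.
# The same helper covers both; only the ID changes.
hosted = build_request("fal-ai/example-model", {"prompt": "a red fox"}, "YOUR_API_KEY")
custom = build_request("your-username/your-app", {"prompt": "a red fox"}, "YOUR_API_KEY")
```

Passing either object to `urllib.request.urlopen` would send the request. In practice the native clients take care of this plumbing for you.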