What’s New in Serverless
Scaling, observability, cold starts, multi-GPU & more
Machine Type in Runner Output
- fal runners list and fal app runners <app> now display the machine type for each runner (e.g. GPU-A100, GPU-H100)
- JSON output (--output json) also includes the machine_type field for each runner
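As a rough illustration of consuming the JSON output, the sketch below groups runners by machine_type. The payload shape (a list of objects with a runner_id key) is an assumption made for the example; only the machine_type field name comes from the notes above.

```python
import json
from collections import defaultdict

# Hypothetical sample of what `fal runners list --output json` might print.
# Only "machine_type" is named in the release notes; "runner_id" is assumed.
SAMPLE_OUTPUT = """
[
  {"runner_id": "r-1", "machine_type": "GPU-A100"},
  {"runner_id": "r-2", "machine_type": "GPU-H100"},
  {"runner_id": "r-3", "machine_type": "GPU-H100"}
]
"""


def group_by_machine_type(raw_json: str) -> dict[str, list[str]]:
    """Map each machine type to the IDs of the runners that use it."""
    groups = defaultdict(list)
    for runner in json.loads(raw_json):
        groups[runner.get("machine_type", "unknown")].append(runner["runner_id"])
    return dict(groups)


if __name__ == "__main__":
    for machine, ids in group_by_machine_type(SAMPLE_OUTPUT).items():
        print(machine, len(ids))
```

In practice the raw string would come from the CLI's standard output rather than a constant.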
Runner FAILURE_DELAY Status
- Runners that fail during setup() now show a FAILURE_DELAY status, making it easier to identify runners that are in a cooldown period before retrying initialization
- You can filter runners by this state using fal runners list --state failure_delay
- See Understanding Runners for details
Interactive Log Histogram
- The logs page now features an interactive histogram that visualizes log volume over time, broken down by severity level
- Click and drag to select a time range on the histogram to zoom into that window and filter your logs instantly
- Zoom in and out to explore log patterns at different time granularities
- Color-coded bars show the distribution of stderr, error, warning, info, and trace logs at a glance
Jump to Context
- When viewing a specific log entry, use the new Jump to Context button to instantly scroll to surrounding log lines
- Quickly see what happened before and after any log entry without manually searching through timestamps
- Especially useful when navigating to a log from a shared link or alert
Switch Log Timezones Between UTC and Local
- You can now toggle between UTC and your local timezone directly from the datepicker in the logs page
- All log timestamps, filters, and the histogram update instantly when switching timezones
Serverless App Cards with Error Rates and Graphs
- The app listing page now displays rich cards for each serverless app with at-a-glance performance metrics
- Error rate indicators show the health of each app directly on the card
- Inline sparkline graphs visualize request volume and error trends over time
- Quickly identify apps that need attention without clicking into each one individually
Share Logs with Your Team
- You can now click on any log entry and share the link directly with the rest of your team
- Shared log links preserve the full context of the log, making it easy to collaborate on debugging and troubleshooting
Runner Side Sheet
- Click into any runner on the Runners page to open a detailed side sheet with telemetry, logs, and the ability to connect to the runner
- Makes it much easier to debug and observe your runners without leaving the page
--auth flag for fal run
You can now specify the authentication mode when running your app with fal run using the --auth flag. Supported values are public and private, giving you control over who can access your app during development and testing.
- public - no authentication required, app owner pays
- private - only you or your team can access
Full-Screen Logs
- The logs page now opens in a full-screen view, giving you significantly more vertical and horizontal space to work with
- See more log lines at once and reduce scrolling when debugging complex issues
FalBaseModel for better input/output definitions
Define your API inputs and outputs with FalBaseModel, a Pydantic base class with built-in support for hidden fields, field ordering, and media type hints.
- Hidden fields - Use Hidden(Field(...)) to mark parameters as API-only, hiding them from the playground UI while keeping them accessible via API
- Field ordering - Control the order of fields in your API schema with FIELD_ORDERS
- Media field helpers - Use ImageField, AudioField, VideoField, and FileField for better playground rendering
Switch environments from app pages
You can now quickly switch between environments directly from any app page using the new environment dropdown in the dashboard.
- Environment dropdown - Click the environment badge on any app page to see all environments where the app is deployed
- One-click switching - Select an environment to navigate to the same app in that environment
- Quick access - View environment secrets or create new environments directly from the dropdown
Track when runners are idle and waiting for work
Understand runner utilization with the new IDLE state that shows when runners are ready but not actively processing requests.
- IDLE state visibility - see when runners finish processing and are waiting for new work
- Better resource monitoring - distinguish between actively processing requests and waiting states
- Improved observability - track idle time to optimize scaling and resource allocation
Control queue wait time with start_timeout
You can now set a start_timeout on requests, which ensures that when queue time is too long, the request is aborted without starting.
See Client Libraries → Start Timeout for details.
Environments for isolated deployments
Organize your applications, secrets, and configurations across different stages of your workflow with the new environments feature.
- Create isolated environments for development, staging, and production
- Environment-scoped secrets - use different API keys and credentials per environment
- Deploy to specific environments using the --env flag
Handle graceful shutdown with handle_exit() and teardown()
You can now define handle_exit() and teardown() methods in your app to handle graceful shutdown.
- handle_exit() - Called when the runner is requested to terminate, to signal handlers to stop early.
- teardown() - Called when the runner is shutting down, to clean up resources.
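The division of labor between the two hooks can be sketched with a plain Python class. This is a stand-in, not the real fal.App base class: only the method names handle_exit() and teardown() come from the notes above, while SketchApp, run_steps, and the on_step callback are hypothetical names used for illustration.

```python
import threading


class SketchApp:
    """Stand-in for an app class; it does not inherit from fal.App."""

    def __init__(self):
        self._stop_requested = threading.Event()
        self.processed = []
        self.resources_open = True

    def handle_exit(self):
        # Termination was requested: tell long-running handlers to stop early.
        self._stop_requested.set()

    def teardown(self):
        # The runner is shutting down: release whatever setup() acquired.
        self.resources_open = False

    def run_steps(self, steps, on_step=None):
        """Hypothetical handler that checks the stop flag between steps."""
        for step in range(steps):
            if self._stop_requested.is_set():
                return "stopped early"
            self.processed.append(step)
            if on_step is not None:
                on_step(step)
        return "completed"


if __name__ == "__main__":
    app = SketchApp()
    # Simulate a termination request arriving while step 2 is being processed.
    outcome = app.run_steps(10, on_step=lambda s: app.handle_exit() if s == 2 else None)
    app.teardown()
    print(outcome, app.processed, app.resources_open)
```

The point of the flag is that the handler gives up its remaining work voluntarily, leaving the grace period for teardown().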
Include local files in container builds with COPY and ADD
You can now use standard Docker COPY and ADD commands to include local files in your container builds.
- Automatic file parsing - fal parses your Dockerfile to find COPY/ADD commands and collects referenced files
- Hash-based deduplication - Files are uploaded to fal’s storage with content-addressable deduplication (files from app_files are reused automatically)
- .dockerignore support - Create a .dockerignore file or use add_dockerignore() to exclude files
- Multi-stage build support - COPY --from=... commands are correctly handled (only local files are collected)
- Smart rebuilds - Changes to your fal.App file don’t trigger rebuilds (it’s pickled separately); only changes to COPY/ADD referenced files trigger rebuilds
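To make the parsing and deduplication ideas concrete, here is a minimal sketch. It is not fal's implementation: the function names are hypothetical, SHA-256 is an assumed choice of content hash, and the parser ignores line continuations and the JSON array form of COPY/ADD.

```python
import hashlib


def local_copy_sources(dockerfile_text: str) -> list[str]:
    """Collect local source paths referenced by COPY/ADD instructions."""
    sources = []
    for raw_line in dockerfile_text.splitlines():
        parts = raw_line.split()
        if not parts or parts[0].upper() not in ("COPY", "ADD"):
            continue
        flags = [p for p in parts[1:] if p.startswith("--")]
        args = [p for p in parts[1:] if not p.startswith("--")]
        # COPY --from=<stage> reads from another build stage, not from disk.
        if any(flag.startswith("--from=") for flag in flags):
            continue
        # The last argument is the destination; everything before it is a source.
        for src in args[:-1]:
            if not src.startswith(("http://", "https://")):
                sources.append(src)
    return sources


def content_key(data: bytes) -> str:
    """Content-addressable key: identical bytes always map to the same key."""
    return hashlib.sha256(data).hexdigest()


def plan_uploads(files: dict[str, bytes], already_stored: set[str]) -> list[str]:
    """Return the paths whose content is not yet stored, skipping duplicates."""
    seen = set(already_stored)
    to_upload = []
    for path, data in files.items():
        key = content_key(data)
        if key not in seen:
            to_upload.append(path)
            seen.add(key)
    return to_upload
```

Because the key depends only on file content, an unchanged file produces the same key on every build, which is what lets rebuilds be skipped when nothing referenced by COPY/ADD has changed.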
Skip retries for specific conditions
You can now skip retries for specific conditions, such as "server_error" and "timeout", using the skip_retry_conditions option.
See retry policy docs for details.
Graceful shutdown of fal apps
Starting with fal>=1.61.0, runners now receive a SIGTERM signal when terminated and are given a 5-second grace period to complete ongoing requests before being forcefully terminated with SIGKILL.
This applies to all termination scenarios: expiration, manual stop/kill, and scaling down. Use the teardown() method to handle cleanup during this grace period.
See lifespan docs for details.
Add a health check endpoint to your application
Add a health check endpoint to your application to automatically replace unhealthy runners.
- Health check endpoint - Pass the health_check parameter to the @fal.endpoint() decorator to configure an endpoint as your health check
- Periodic checks and recovery - fal periodically (every 15 seconds) calls this endpoint and replaces unhealthy runners if it fails for a few consecutive calls
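The "few consecutive failures" rule can be sketched as a small counter. This models the behavior described above rather than fal's actual code: HealthTracker is a hypothetical name, and the threshold of 3 is an assumption, since the notes only say "a few".

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    # Assumed value; the release notes do not state the exact threshold.
    failure_threshold: int = 3
    consecutive_failures: int = 0

    def record(self, healthy: bool) -> bool:
        """Record one check result; return True once the runner should be replaced."""
        if healthy:
            # Any successful check resets the streak.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold


if __name__ == "__main__":
    tracker = HealthTracker()
    for result in [False, False, True, False, False, False]:
        print(result, tracker.record(result))
```

Counting consecutive failures rather than total failures means a single slow response does not cost you a warm runner.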
Disable environment build cache
You can now disable the environment build cache by passing the --no-cache flag to the fal deploy or fal run command.
See custom container image docs.
Scale your application with the new scaling delay feature
Scale your application with the new scaling delay feature.
- Scaling delay - the number of seconds the system waits for a request to be picked up by a runner before triggering a scale-up
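One way to picture the rule: a request only counts toward scaling up once it has waited in the queue longer than the scaling delay. The helper below is a hypothetical model of that rule, not the scheduler's real logic, and its name and arguments are made up for the example.

```python
def requests_triggering_scale_up(
    enqueued_at: list[float], now: float, scaling_delay: float
) -> list[int]:
    """Return indices of queued requests that have waited longer than the delay."""
    return [
        index
        for index, queued_time in enumerate(enqueued_at)
        if now - queued_time > scaling_delay
    ]


if __name__ == "__main__":
    # Three requests queued at t=0s, t=8s and t=9.5s, inspected at t=10s.
    print(requests_triggering_scale_up([0.0, 8.0, 9.5], now=10.0, scaling_delay=5.0))
```

A longer delay tolerates short bursts without adding runners; a shorter one reacts faster at the cost of more scale-ups.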
Reduce cold start times with shared compiled PyTorch caches
Dramatically reduce cold start times for torch.compile() models with the new inductor cache utilities.
- Load pre-compiled CUDA kernels in ~2 seconds instead of recompiling for 20-30 seconds on each worker
- GPU-specific caching automatically organized by GPU type (H100, H200, A100)
- Two usage patterns: Manual control with load_inductor_cache() / sync_inductor_cache() or automatic with the synchronized_inductor_cache() context manager
- Persistent shared storage at /data/inductor-caches/<GPU_TYPE>/<cache_key>.zip
- First worker compiles and shares, subsequent workers load instantly
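The "first worker compiles, later workers load" pattern can be sketched with the standard library alone. This does not call the real inductor cache utilities: load_or_compile, cache_archive_path, and the kernels.bin entry name are hypothetical, and only the <GPU_TYPE>/<cache_key>.zip layout is taken from the notes above.

```python
import tempfile
import zipfile
from pathlib import Path


def cache_archive_path(root: Path, gpu_type: str, cache_key: str) -> Path:
    """Mirror the documented layout: <root>/<GPU_TYPE>/<cache_key>.zip."""
    return root / gpu_type / f"{cache_key}.zip"


def load_or_compile(root: Path, gpu_type: str, cache_key: str, compile_fn):
    """Load a shared artifact if present; otherwise compile it and publish it."""
    archive = cache_archive_path(root, gpu_type, cache_key)
    if archive.exists():
        with zipfile.ZipFile(archive) as zf:
            return "loaded", zf.read("kernels.bin")
    artifact = compile_fn()  # the slow path, paid once by the first worker
    archive.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("kernels.bin", artifact)
    return "compiled", artifact


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(load_or_compile(root, "H100", "my-model", lambda: b"kernels")[0])
        print(load_or_compile(root, "H100", "my-model", lambda: b"kernels")[0])
```

Keying the path by GPU type matters because kernels compiled for one architecture are generally not reusable on another.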
Get Slack notifications for serverless app failures
Never miss critical issues with instant Slack alerts for your serverless applications.
- Connect your workspace with one-click OAuth installation
- Choose notification channel from a dropdown of your Slack channels
- Instant alerts for:
- App startup failures and timeouts
- Critical platform issues
- Real-time error notifications
- Team visibility - everyone in the channel sees important updates
- Configure at https://fal.ai/dashboard/notifications/settings
Stop and kill runners directly from the dashboard
No more switching to the CLI to manage your runners. You now have full lifecycle control right from the dashboard.
- Graceful shutdown or force kill runners with a single click
- Access at https://fal.ai/dashboard/apps/{username}/{appname}/runners
Stream platform logs to your own endpoint with drains
Integrate fal’s logging with your existing observability stack using the new Serverless Drains feature.
- Automatic log forwarding from apps, runners, and file operations in NDJSON format
- Works with Datadog, Splunk, Elasticsearch, or any HTTP endpoint
- Configure at https://fal.ai/dashboard/drains
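On the receiving side, an NDJSON body is one JSON object per line. The sketch below parses such a body; the level and message field names are assumptions for the example, since the notes above do not describe the record schema.

```python
import json


def parse_ndjson(body: bytes) -> list[dict]:
    """Parse a newline-delimited JSON payload into a list of records."""
    records = []
    for line in body.decode("utf-8").splitlines():
        line = line.strip()
        if line:  # tolerate blank lines between records
            records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Hypothetical payload; real drain records may use different fields.
    sample = b'{"level": "error", "message": "boom"}\n{"level": "info", "message": "ok"}\n'
    errors = [r for r in parse_ndjson(sample) if r.get("level") == "error"]
    print(len(errors))
```

Parsing line by line also means a receiver can process records as they stream in instead of buffering the whole request.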
Upload larger files with improved timeout handling
We’ve significantly improved the reliability of file uploads from URLs, especially for large datasets and model files.
- Extended timeout to 10 minutes for fal files upload and fal files upload-url
- Upload multi-GB files without timeout errors
- See fal files docs
Restart all runners without redeploying
Apply environment changes or recover from bad states instantly with the new fal apps rollout command.
- Restart all runners for an app without creating a new deployment
- Graceful by default (runners finish current requests) or use --force for immediate restart
- Pick up new secrets, environment variables, or clear memory issues
- See fal apps rollout docs
Stop specific runners without affecting others
Target individual runners for maintenance with graceful shutdown via fal runners stop.
- Stop specific runners without affecting others, useful for targeted maintenance
- See fal runners docs
Debug production runners with interactive shell access
Jump directly into any running container to troubleshoot issues in real-time with fal runners shell.
- SSH-like access to inspect files, environment variables, and dependencies
- Debug production issues without redeploying
- See fal runners shell docs
See everything happening in your app with the events timeline
Complete activity history for runners, deployments, and config changes in one place.
- Unified timeline of runner events, deployments, and config changes
- Access at https://fal.ai/dashboard/apps/{username}/{appname}/events
Get from zero to deployed in minutes with in-app onboarding
New interactive guide walks you through your first serverless deployment step-by-step.
- Step-by-step walkthrough from installation to deployment with copy-paste examples
- Access at https://fal.ai/dashboard/serverless-get-started
Delete files from fal storage
Remove files and directories with the new fal files rm command.
- Recursive deletion: fal files rm path/to/file-or-directory
- See fal files docs
Platform APIs v1 officially released
Programmatically manage your model deployments with the new Platform APIs.
- Model discovery - search and metadata retrieval for 600+ models
- Pricing and cost estimation - real-time pricing information
- Usage tracking - detailed line items with quantities and prices
- Analytics - request counts, error rates, and latency percentiles
- Available at https://api.fal.ai/v1 - see docs
Get notified when you hit concurrent requests limits
Never wonder why requests are queuing: we now send notifications when you reach your concurrency limit.
- Email and dashboard notifications with smart throttling (immediate, 1h, 1d, weekly)
- Limit value included in 429 responses for programmatic handling
Debug errors faster with the new errors page
Comprehensive error analytics to identify and resolve issues quickly.
- Server vs client error rates with 4xx/5xx breakdown and sparklines
- Error timeline with status code distribution and endpoint-level breakdown
- Access at https://fal.ai/dashboard/apps/{username}/{appname}/errors
Stop or kill individual runners from the command line
Precise control over each runner’s lifecycle without touching the dashboard.
- fal runners stop - gracefully stop a runner, allowing in-flight requests to complete
- fal runners kill - immediately terminate a runner without waiting
- See fal runners docs
See exactly how long runners spend starting up
Identify GPU availability bottlenecks and optimize cold start times.
- Pending uptime metrics show how long runners wait before becoming active
- Track PENDING, DOCKER_PULL, and SETUP state durations separately
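If you export state transitions for a runner, per-state durations fall out of the differences between consecutive timestamps. The sketch below assumes a hypothetical input format, an ordered list of (state, ISO timestamp) pairs; the dashboard computes these metrics for you, so this is only to show what the numbers mean.

```python
from datetime import datetime


def state_durations(transitions: list[tuple[str, str]]) -> dict[str, float]:
    """Seconds spent in each state, given (state, ISO timestamp) pairs in time order.

    The final state is still open, so it gets no duration.
    """
    durations: dict[str, float] = {}
    for (state, started), (_, ended) in zip(transitions, transitions[1:]):
        elapsed = datetime.fromisoformat(ended) - datetime.fromisoformat(started)
        durations[state] = durations.get(state, 0.0) + elapsed.total_seconds()
    return durations


if __name__ == "__main__":
    timeline = [
        ("PENDING", "2025-01-01T00:00:00"),
        ("DOCKER_PULL", "2025-01-01T00:00:40"),
        ("SETUP", "2025-01-01T00:01:10"),
        ("RUNNING", "2025-01-01T00:01:25"),
    ]
    print(state_durations(timeline))
```

A long PENDING duration points at capacity, a long DOCKER_PULL at image size, and a long SETUP at your own initialization code.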
Connect fal docs to Cursor with MCP
Access the complete fal documentation directly in Cursor using Model Context Protocol.
- Complete documentation in your IDE with AI-powered suggestions
- Simple setup: add fal MCP server to your mcp.json - see guide
Personalized dashboard with creator and developer views
The dashboard now adapts to your workflow with two distinct experiences.
- Creator view - gallery-focused with favorite models and visual generation history
- Developer view - metrics-driven with usage stats, error tracking, and API analytics
- Quick stats showing credits, requests, and errors with sparklines
Add custom headers to your API requests
Integrate seamlessly with analytics, auth, and middleware by passing custom HTTP headers.
- Add custom headers for analytics, authentication, or middleware integration
- Works with all client libraries
Multi-GPU inference and training with fal.distributed
Scale AI workloads across multiple GPUs with the new fal.distributed module.
- Data parallelism - generate multiple outputs simultaneously (e.g., 4 images on 4 GPUs)
- Model parallelism - split large models across GPUs for faster generation
- Distributed training - synchronized gradient updates with DDP
- Supports 2, 4, or 8 GPU configurations on H100s and A100s
- See distributed docs
Dedicated pages for Analytics, Runners, Logs, and Versions
Complete app details redesign gives each deployment aspect its own focused view.
- New Analytics page - runner-focused metrics with date range filtering
- New Runners page - app-scoped runner view with enhanced filters
- New Logs page - dedicated log viewer for debugging
- New Versions page - manage and view app revisions
- Enhanced Overview - endpoint stats and performance metrics at a glance
Compare models side-by-side in the new Sandbox
Find the perfect model by testing multiple options in parallel with the same prompt.
- Run multiple models simultaneously with the same prompt
- Available at https://fal.ai/sandbox
Manage deployments from Python without async/await
New synchronous client makes serverless management feel just like the CLI.
- Manage apps, runners, and deployments programmatically without async/await
- Same API as CLI: client.apps.*, client.runners.*, client.deploy()
- See Python client docs
Bring your own container to any deployment
Full control over your runtime environment with custom Docker images.
- Use ContainerImage.from_dockerfile_str() or ContainerImage.from_dockerfile()
- Install any dependencies, tools, or system packages you need
- See custom containers guide
Dynamic auto-scaling with percentage-based buffers
Scale more intelligently by setting concurrency buffers as percentages instead of fixed numbers.
- Configure buffer as a percentage of current concurrency for dynamic scaling
- See scaling docs
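As a back-of-the-envelope illustration, a percentage buffer grows and shrinks with load, where a fixed buffer stays constant. The helper below is a hypothetical model; in particular, rounding up to a whole runner is an assumption, and the scaling docs are the authority on the actual rule.

```python
import math


def buffer_runners(current_concurrency: int, buffer_percent: float) -> int:
    """Extra runners to keep warm, as a percentage of current concurrency.

    Rounding up is an assumption made for this sketch.
    """
    return math.ceil(current_concurrency * buffer_percent / 100)


if __name__ == "__main__":
    for concurrency in (3, 20, 100):
        print(concurrency, buffer_runners(concurrency, 25))
```

With a 25% buffer, 20 concurrent requests keep 5 spare runners warm, while 100 concurrent requests keep 25.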
Runner logs with streaming and filtering
Real-time log streaming and powerful filtering for faster debugging.
- Stream logs in real-time with fal runners logs --follow
- Filter by time range with --since and --until
- Search logs with the --search parameter
- Scrollable and searchable in the dashboard with SSE-powered updates
- See fal runners logs docs
Include local files in your deployments automatically
Bring configs, utilities, and code from your local machine into serverless apps.
- Specify files with relative or absolute paths to include at runtime
- Works with fal run and fal deploy
- See app files docs
Find what you need faster with reorganized navigation
Clearer dashboard structure groups features by workflow: Generate, Serverless, and Manage.
- Generate group: Sandbox, Model Gallery
- Serverless group: Apps, Logs, Files, Runners
- Manage group: Usage, Billing, API Keys, Webhooks, Team Members
Know exactly which version each runner is running
Track deployments better with revision IDs shown on every runner.
- Revision ID displayed on runners to track which version is running
- State renamed: “DEAD” → “TERMINATED” for clarity
Filter logs with custom labels and powerful queries
Find what you need instantly with EXACT/CONTAINS matching and multi-condition filters.
- EXACT or CONTAINS matching for label values
- Multiple conditions with OR logic (e.g., status IN ["error", "warning"])
- Available in dashboard and API
- Examples: error_type = "ValidationError", endpoint CONTAINS "/api/v2/"
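The matching semantics can be modeled in a few lines. This is a sketch of the behavior described above, not the query engine itself: the (label, operator, value) tuple format and the matches_any name are hypothetical, and the real query syntax is whatever the dashboard and API accept.

```python
def matches_any(labels: dict[str, str], conditions: list[tuple]) -> bool:
    """OR logic: a log matches if at least one condition holds."""
    for label, operator, expected in conditions:
        actual = labels.get(label)
        if actual is None:
            continue  # a missing label cannot satisfy a condition
        if operator == "EXACT" and actual == expected:
            return True
        if operator == "CONTAINS" and expected in actual:
            return True
        if operator == "IN" and actual in expected:
            return True
    return False


if __name__ == "__main__":
    conditions = [
        ("error_type", "EXACT", "ValidationError"),
        ("endpoint", "CONTAINS", "/api/v2/"),
        ("status", "IN", ["error", "warning"]),
    ]
    print(matches_any({"endpoint": "/api/v2/generate", "status": "info"}, conditions))
```

EXACT suits labels with a fixed set of values such as error types, while CONTAINS is the better fit for paths and other free-form strings.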
See what runners are doing during startup
Track exactly where runners are in the startup process: pending, pulling images, or setting up.
- fal runners list now shows PENDING, DOCKER_PULL, and SETUP states
- Understand deployment progress in real-time
View all app endpoints and config at a glance
Redesigned app details page surfaces the information you need most.
- Endpoints, configuration, and status all in one place
Monitor and clear your request queue from the CLI
Check how many requests are queued and flush them when needed.
- fal queue size app_name - check queue size for an app
- fal queue flush app_name - flush all pending requests
- See fal queue docs
View runner history with time-based filtering
See terminated runners and filter by state to debug failures.
- fal runners list --since "1h" - view runners from the last hour (max 24h)
- fal runners list --state dead - filter by state (running, pending, setup, dead)
- Helpful for debugging failed deployments and understanding runner lifecycle
- See fal runners list docs
Reorganize files in fal storage without re-uploading
Move and rename files instantly with the new fal files mv command.
- Rename or move files in fal storage: fal files mv source destination
- See fal files docs