Define a health check by using `@fal.endpoint` with the `health_check` parameter. The endpoint should be lightweight and should raise an exception if the runner is in an unrecoverable state. For more on how runners start up and shut down, see App Lifecycle. For readiness and liveness probes at the platform level, see Readiness and Liveness.
Basic Usage
Use the `health_check` keyword argument in the `@fal.endpoint()` decorator to designate an endpoint as your health check.
Only one endpoint can be designated as the health check endpoint per app.
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| `start_period_seconds` | int | 30 | Minimum time the runner must have been running before health check failures count. Replaced by `startup_timeout` if that value is higher. |
| `timeout_seconds` | int | 5 | Timeout in seconds for the health check request. |
| `failure_threshold` | int | 3 | Number of consecutive failures before the runner is considered unhealthy and terminated. |
| `call_regularly` | bool | True | If true, fal calls the health check every 15 seconds. If false, health checks only run when manually triggered via the `x-fal-runner-health-check` header. |
How It Works
By default, fal calls your health check endpoint every 15 seconds. During the `start_period_seconds` window after a runner starts, failures are ignored to give your app time to initialize. After that, if the health check fails or times out for `failure_threshold` consecutive calls, the runner is terminated and replaced.

To signal an unhealthy state, raise an exception inside the health check method. Returning a normal response means the runner is healthy.
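The counting rules above can be sketched as a small simulation. This is not fal's implementation: `HealthCheckPolicy`, `termination_time`, and the `(elapsed_seconds, ok)` probe tuples are hypothetical names used only for illustration, and counting a probe that lands exactly on the start-period boundary is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class HealthCheckPolicy:
    """Hypothetical container; defaults mirror the Configuration table."""
    start_period_seconds: int = 30
    failure_threshold: int = 3
    startup_timeout: int = 0  # assumed to be expressed in seconds

    @property
    def effective_start_period(self) -> int:
        # start_period_seconds is replaced by startup_timeout when that is higher.
        return max(self.start_period_seconds, self.startup_timeout)


def termination_time(
    policy: HealthCheckPolicy,
    probes: Iterable[Tuple[int, bool]],
) -> Optional[int]:
    """Return the elapsed time at which the runner would be terminated.

    Each probe is (seconds since runner start, ok). ok=False stands for
    either a raised exception or a timed-out health check request.
    Returns None if the failure threshold is never reached.
    """
    consecutive_failures = 0
    for elapsed, ok in probes:
        if ok:
            consecutive_failures = 0  # a healthy response resets the streak
            continue
        if elapsed < policy.effective_start_period:
            continue  # failures inside the start period are ignored
        consecutive_failures += 1
        if consecutive_failures >= policy.failure_threshold:
            return elapsed
    return None


# Probes 15 seconds apart, all failing: the first two fall inside the
# 30-second start period, so the third counted failure arrives at t=70.
always_failing = [(10, False), (25, False), (40, False), (55, False), (70, False)]
print(termination_time(HealthCheckPolicy(), always_failing))  # 70
```

A single healthy response between failures resets the streak, so only an uninterrupted run of `failure_threshold` failures after the start period leads to termination in this model.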
Manual Health Checks
You can trigger a health check from within a regular endpoint by setting the `x-fal-runner-health-check` header on the response. This is useful when your endpoint detects a degraded state and wants the platform to verify runner health immediately.
Manually triggered health checks bypass `failure_threshold` and `start_period_seconds`. If the subsequent health check fails, the runner is terminated immediately.
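As a rough illustration of the manual trigger, the sketch below sets the header when an endpoint notices a problem. It uses a plain dictionary in place of a real response object so it runs on its own. The header name comes from the text above; the value `"true"`, the `generate` function, and the `run_model` callable are assumptions made for this example, so check the fal reference for the exact value and response type.

```python
from typing import Callable, Dict, MutableMapping

HEALTH_CHECK_HEADER = "x-fal-runner-health-check"


def generate(
    prompt: str,
    response_headers: MutableMapping[str, str],
    run_model: Callable[[str], str],
) -> Dict[str, str]:
    """Hypothetical endpoint body that requests a health check on failure."""
    try:
        return {"output": run_model(prompt)}
    except RuntimeError as exc:
        # The endpoint detected a degraded state: ask the platform to verify
        # runner health now. The value "true" is an assumption.
        response_headers[HEALTH_CHECK_HEADER] = "true"
        return {"error": str(exc)}


def broken_model(prompt: str) -> str:
    # Stand-in for inference code that hits a device error.
    raise RuntimeError("device error")


headers: Dict[str, str] = {}
result = generate("a prompt", headers, broken_model)
```

In this sketch the header is only set on the failure path, so healthy requests do not trigger extra health checks.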
Writing Good Health Checks
Keep your health check lightweight. It runs concurrently with normal requests, so heavy GPU work could contend with or slow down in-flight inference. For most apps, checking external dependencies (database connections, upstream APIs) is sufficient. If you do need to verify GPU health, keep the operation minimal (e.g., a small tensor allocation) to avoid interfering with active requests.

Only raise exceptions for truly unrecoverable states that require a fresh runner. For transient issues, try to recover within the health check before failing.

You can also set `call_regularly=False` and trigger checks manually from your endpoints when you detect a problem.
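The advice to recover from transient issues before failing might look like the following sketch. `FakeConnection` is a hypothetical stand-in for a database or upstream API client, and `health_check` is written as a plain function rather than a fal endpoint so that the example is self-contained. Any recovery attempt also needs to finish within `timeout_seconds`, so the number of retries is kept small.

```python
from typing import Dict


class FakeConnection:
    """Hypothetical dependency client used only for this example."""

    def __init__(self, alive: bool, reconnect_works: bool) -> None:
        self.alive = alive
        self.reconnect_works = reconnect_works

    def ping(self) -> bool:
        return self.alive

    def reconnect(self) -> None:
        if not self.reconnect_works:
            raise ConnectionError("reconnect failed")
        self.alive = True


def health_check(conn: FakeConnection, max_reconnects: int = 1) -> Dict[str, object]:
    """Return normally when healthy; raise only when recovery fails."""
    if conn.ping():
        return {"status": "ok"}

    # Transient issue: try a cheap recovery before reporting failure.
    for _ in range(max_reconnects):
        try:
            conn.reconnect()
        except ConnectionError:
            continue
        if conn.ping():
            return {"status": "ok", "recovered": True}

    # Unrecoverable: raising signals that the runner is unhealthy.
    raise RuntimeError("dependency unreachable after reconnect attempts")
```

Raising only after recovery has failed keeps a brief network blip from counting toward `failure_threshold`.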