Every deployed app on fal comes with built-in analytics accessible from the dashboard. You can see request volume, latency, error rates, and runner utilization at a glance, then drill into individual requests or runners to understand what happened.

The analytics are split into two views. The requests view answers questions about traffic and performance: how many requests are coming in, how fast they complete, which endpoints are slowest, and which requests failed. The runners view answers questions about infrastructure: how runner time is split between setup, inference, and idle, how deep the queue is, and whether you have the right scaling parameters.

Requests

Navigate to Dashboard > Apps > [your-app] > Analytics to see request metrics. You can filter by endpoint, date range, and view mode (your requests vs all requests to your app). The key metrics to watch are request throughput, latency percentiles (P50, P90), and error rate. If latency is higher than expected, drill into individual requests to see a full breakdown from submission to completion, including inputs, outputs, and logs. If errors are spiking, use Error Analytics for a detailed breakdown by status code and error type.
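The dashboard computes these figures for you; as a rough illustration of what the status-code breakdown in Error Analytics means, here is a minimal sketch (a hypothetical helper in plain Python, not part of the fal SDK), treating any 4XX or 5XX response as an error:

```python
from collections import Counter

def error_breakdown(status_codes):
    """Summarize a list of HTTP status codes into an overall error
    rate plus a per-code count, like an Error Analytics view."""
    total = len(status_codes)
    errors = [code for code in status_codes if code >= 400]
    return {
        "error_rate": len(errors) / total if total else 0.0,
        "by_status": dict(Counter(errors)),
    }

# 8 successes, one 422 validation error, one 500 runner failure:
stats = error_breakdown([200] * 8 + [422, 500])
print(stats["error_rate"])  # 0.2
print(stats["by_status"])   # {422: 1, 500: 1}
```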

Runners

The runners view shows how your infrastructure is performing. Use it to understand whether your scaling parameters are tuned correctly. If runners are spending most of their time in setup, your cold starts are too long. Look at Optimizing Cold Starts for strategies. If runners are mostly idle, your keep_alive or min_concurrency may be too high. If the queue is consistently deep, you may need to increase max_concurrency or add a concurrency_buffer. You can also inspect individual runners to see their CPU, memory, and VRAM usage, and jump directly to their logs for debugging.
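To build intuition for how keep_alive turns into idle time, consider this toy model (illustrative only, not how fal's autoscaler is implemented) of a single runner that stays warm for keep_alive seconds after each request:

```python
def idle_time(request_windows, keep_alive):
    """Toy model: one runner stays alive for `keep_alive` seconds
    after each request. Gaps between requests shorter than
    keep_alive accrue as idle time; longer gaps would terminate
    the runner, capping idle at keep_alive per gap."""
    idle = 0.0
    for (_, end), (next_start, _) in zip(request_windows, request_windows[1:]):
        idle += min(next_start - end, keep_alive)
    # The trailing keep_alive window after the last request is also idle.
    idle += keep_alive
    return idle

# Requests at t=0-2s, 5-7s, and 30-32s with keep_alive=10s:
print(idle_time([(0, 2), (5, 7), (30, 32)], keep_alive=10))  # 23.0
```

Raising keep_alive trades idle time for fewer cold starts; the runners view shows which side of that trade you are on.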

Metrics Definitions

Request Timing

Every request passes through several stages. The dashboard breaks these into two key charts: Request Startup (time before processing begins) and Request Duration (time spent processing).
| Metric | What it measures |
| --- | --- |
| Request Startup | Time from submission until the request starts processing on a runner. For queued requests, this includes queue wait time and runner acquisition. If a cold start is needed, the startup time includes waiting for the new runner's setup() to complete. This is the metric most affected by scaling parameters. |
| Request Duration | Time the request spent processing once it reached a runner. This is the actual execution time of your endpoint handler. Does not include queue wait or cold start time. |
| Time to First Byte | Time from when the request was sent to the runner until the first byte of the response was received. Useful for streaming endpoints where you want to measure how quickly the user sees initial output. |
| Latency Percentiles (P50, P75, P90) | Both Request Startup and Request Duration are shown at the 50th, 75th, and 90th percentile. P90 means 90% of requests completed faster than this value. Use P90 to understand worst-case performance for most users. |
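The percentile definition above can be sketched with a nearest-rank calculation (a simplified stand-in; the dashboard's exact estimator may differ):

```python
def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p% of samples are less than or equal to it."""
    ordered = sorted(values)
    rank = -(-len(ordered) * p // 100)  # ceil(n * p / 100)
    return ordered[max(0, rank - 1)]

durations_ms = [120, 95, 110, 250, 105, 100, 130, 90, 115, 400]
print(percentile(durations_ms, 50))  # 110 (median request)
print(percentile(durations_ms, 90))  # 250 (one slow outlier above it)
```

A single 400 ms outlier barely moves P50 but is exactly what P90 surfaces, which is why P90 is the better proxy for worst-case experience.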

Runner Timing

Runner time is split into three categories. The dashboard shows these as a stacked area chart over time.
| Metric | What it measures |
| --- | --- |
| Startup Time | Time spent in the SETUP state, running your setup() method. This is your cold start cost. Appears once per runner lifecycle, not per request. Reduce it with FlashPack, compiled caches, or by optimizing your setup() code. |
| Execution Time | Time the runner spent actively processing requests (in the RUNNING state). This is the productive time you want to maximize. |
| Idle Time | Time the runner was alive but not processing any requests (in the IDLE state). Several scaling parameters affect idle time. keep_alive controls how long a runner stays alive after its last request before being terminated. min_concurrency keeps runners permanently warm even with no traffic, which means they accumulate idle time continuously. concurrency_buffer maintains extra runners above current demand, so those buffer runners are idle until traffic increases. scaling_delay makes the autoscaler wait before spinning up new runners, which can reduce idle time by reusing existing runners instead of creating new ones that sit idle. |
| Runner Utilization | Execution time as a percentage of total runner time (startup + execution + idle). Higher utilization means your runners are spending more time doing useful work. If utilization is low, look at which idle-time factor is the largest contributor: reduce keep_alive if runners linger too long after traffic stops, reduce min_concurrency if you have more warm runners than you need, or increase scaling_delay to encourage runner reuse. |
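Runner utilization is a simple ratio that you can reproduce from the three components of the stacked chart (hypothetical helper for illustration):

```python
def runner_utilization(startup, execution, idle):
    """Execution time as a fraction of total runner time
    (startup + execution + idle)."""
    total = startup + execution + idle
    return execution / total if total else 0.0

# A 45s cold start, 600s spent serving requests, 300s idle
# accumulated from keep_alive between bursts:
u = runner_utilization(startup=45, execution=600, idle=300)
print(f"{u:.0%}")  # 63%
```

In this example the idle slice dominates the waste, so tightening keep_alive would raise utilization more than shaving the cold start would.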

Queue and Concurrency

| Metric | What it measures |
| --- | --- |
| Queue Size | Number of requests currently waiting in the queue for an available runner. A consistently growing queue means you need more runners (increase max_concurrency or concurrency_buffer). |
| Concurrent Requests | Number of requests being actively processed across all runners at a given moment. |
| Error Rate | Percentage of requests that returned a 4XX or 5XX status code over the selected time period. |
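The queue-size guidance above can be sketched as a toy heuristic (illustrative only, not the platform's autoscaler): a queue that grows across every sample while all allowed runners are busy points at max_concurrency as the bottleneck.

```python
def scaling_hint(queue_sizes, max_concurrency, active_runners):
    """Toy heuristic over sampled queue depths: if the queue grows
    monotonically and every allowed runner is already busy, the
    concurrency ceiling is the likely bottleneck."""
    growing = all(b > a for a, b in zip(queue_sizes, queue_sizes[1:]))
    if growing and active_runners >= max_concurrency:
        return "increase max_concurrency or concurrency_buffer"
    if growing:
        return "queue growing; runners still scaling up"
    return "queue stable"

print(scaling_hint([2, 5, 9, 14], max_concurrency=4, active_runners=4))
# increase max_concurrency or concurrency_buffer
```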