Every deployed app on fal comes with built-in analytics accessible from the dashboard. You can see request volume, latency, error rates, and runner utilization at a glance, then drill into individual requests or runners to understand what happened.

The analytics are split into two views. The requests view answers questions about traffic and performance: how many requests are coming in, how fast they complete, which endpoints are slowest, and which requests failed. The runners view answers questions about infrastructure: how runner time is split between setup, inference, and idle, how deep the queue is, and whether you have the right scaling parameters.

Requests

Navigate to Dashboard > Apps > [your-app] > Analytics to see request metrics. You can filter by endpoint, date range, and view mode (your requests vs all requests to your app). The key metrics to watch are request throughput, latency percentiles (P50, P90), and error rate. If latency is higher than expected, drill into individual requests to see a full breakdown from submission to completion, including inputs, outputs, and logs. If errors are spiking, use Error Analytics for a detailed breakdown by status code and error type.
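The dashboard computes these figures for you; as a rough illustration of what the status-code breakdown in Error Analytics means, here is a minimal sketch (a hypothetical helper in plain Python, not part of the fal SDK), treating any 4XX or 5XX response as an error:

```python
from collections import Counter

def error_breakdown(status_codes):
    """Summarize a list of HTTP status codes into an overall error
    rate plus a per-code count, like an Error Analytics view."""
    total = len(status_codes)
    errors = [code for code in status_codes if code >= 400]
    return {
        "error_rate": len(errors) / total if total else 0.0,
        "by_status": dict(Counter(errors)),
    }

# 8 successes, one 422 validation error, one 500 runner failure:
stats = error_breakdown([200] * 8 + [422, 500])
print(stats["error_rate"])  # 0.2
print(stats["by_status"])   # {422: 1, 500: 1}
```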

Runners

The runners view shows how your infrastructure is performing. Use it to understand whether your scaling parameters are tuned correctly. If runners are spending most of their time in setup, your cold starts are too long. Look at Optimizing Cold Starts for strategies. If runners are mostly idle, your keep_alive or min_concurrency may be too high. If the queue is consistently deep, you may need to increase max_concurrency or add a concurrency_buffer. You can also inspect individual runners to see their CPU, memory, and VRAM usage, and jump directly to their logs for debugging.
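To build intuition for how keep_alive turns into idle time, consider this toy model (illustrative only, not how fal's autoscaler is implemented) of a single runner that stays warm for keep_alive seconds after each request:

```python
def idle_time(request_windows, keep_alive):
    """Toy model: one runner stays alive for `keep_alive` seconds
    after each request. Gaps between requests shorter than
    keep_alive accrue as idle time; longer gaps would terminate
    the runner, capping idle at keep_alive per gap."""
    idle = 0.0
    for (_, end), (next_start, _) in zip(request_windows, request_windows[1:]):
        idle += min(next_start - end, keep_alive)
    # The trailing keep_alive window after the last request is also idle.
    idle += keep_alive
    return idle

# Requests at t=0-2s, 5-7s, and 30-32s with keep_alive=10s:
print(idle_time([(0, 2), (5, 7), (30, 32)], keep_alive=10))  # 23.0
```

Raising keep_alive trades idle time for fewer cold starts; the runners view shows which side of that trade you are on.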

Metrics Definitions

Request Timing

Every request passes through several stages. The dashboard breaks these into two key charts: Request Startup (time before processing begins) and Request Duration (time spent processing).
| Metric | What it measures |
| --- | --- |
| Request Startup | Time from submission until the request starts processing on a runner. For queued requests, this includes queue wait time and runner acquisition. If a cold start is needed, the startup time includes waiting for the new runner's setup() to complete. This is the metric most affected by scaling parameters. |
| Request Duration | Time the request spent processing once it reached a runner. This is the actual execution time of your endpoint handler. Does not include queue wait or cold start time. |
| Time to First Byte | Time from when the request was sent to the runner until the first byte of the response was received. Useful for streaming endpoints where you want to measure how quickly the user sees initial output. |
| Latency Percentiles (P50, P75, P90) | Both Request Startup and Request Duration are shown at the 50th, 75th, and 90th percentile. P90 means 90% of requests completed faster than this value. Use P90 to understand worst-case performance for most users. |
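The percentile definition above can be sketched with a nearest-rank calculation (a simplified stand-in; the dashboard's exact estimator may differ):

```python
def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p% of samples are less than or equal to it."""
    ordered = sorted(values)
    rank = -(-len(ordered) * p // 100)  # ceil(n * p / 100)
    return ordered[max(0, rank - 1)]

durations_ms = [120, 95, 110, 250, 105, 100, 130, 90, 115, 400]
print(percentile(durations_ms, 50))  # 110 (median request)
print(percentile(durations_ms, 90))  # 250 (one slow outlier above it)
```

A single 400 ms outlier barely moves P50 but is exactly what P90 surfaces, which is why P90 is the better proxy for worst-case experience.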

Runner Timing

Runner time is split into three categories. The dashboard shows these as a stacked area chart over time.
| Metric | What it measures |
| --- | --- |
| Startup Time | Time spent in the SETUP state, running your setup() method. This is your cold start cost. Appears once per runner lifecycle, not per request. Reduce it with FlashPack, compiled caches, or by optimizing your setup() code. |
| Execution Time | Time the runner spent actively processing requests (in the RUNNING state). This is the productive time you want to maximize. |
| Idle Time | Time the runner was alive but not processing any requests (in the IDLE state). Several scaling parameters affect idle time. keep_alive controls how long a runner stays alive after its last request before being terminated. min_concurrency keeps runners permanently warm even with no traffic, which means they accumulate idle time continuously. concurrency_buffer maintains extra runners above current demand, so those buffer runners are idle until traffic increases. scaling_delay makes the autoscaler wait before spinning up new runners, which can reduce idle time by reusing existing runners instead of creating new ones that sit idle. |
| Runner Utilization | Execution time as a percentage of total runner time (startup + execution + idle). Higher utilization means your runners are spending more time doing useful work. If utilization is low, look at which idle-time factor is the largest contributor: reduce keep_alive if runners linger too long after traffic stops, reduce min_concurrency if you have more warm runners than you need, or increase scaling_delay to encourage runner reuse. |
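Runner utilization is a simple ratio that you can reproduce from the three components of the stacked chart (hypothetical helper for illustration):

```python
def runner_utilization(startup, execution, idle):
    """Execution time as a fraction of total runner time
    (startup + execution + idle)."""
    total = startup + execution + idle
    return execution / total if total else 0.0

# A 45s cold start, 600s spent serving requests, 300s idle
# accumulated from keep_alive between bursts:
u = runner_utilization(startup=45, execution=600, idle=300)
print(f"{u:.0%}")  # 63%
```

In this example the idle slice dominates the waste, so tightening keep_alive would raise utilization more than shaving the cold start would.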

Queue and Concurrency

| Metric | What it measures |
| --- | --- |
| Queue Size | Number of requests currently waiting in the queue for an available runner. A consistently growing queue means you need more runners (increase max_concurrency or concurrency_buffer). |
| Concurrent Requests | Number of requests being actively processed across all runners at a given moment. |
| Error Rate | Percentage of requests that returned a 4XX or 5XX status code over the selected time period. |
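The queue-size guidance above can be sketched as a toy heuristic (illustrative only, not the platform's autoscaler): a queue that grows across every sample while all allowed runners are busy points at max_concurrency as the bottleneck.

```python
def scaling_hint(queue_sizes, max_concurrency, active_runners):
    """Toy heuristic over sampled queue depths: if the queue grows
    monotonically and every allowed runner is already busy, the
    concurrency ceiling is the likely bottleneck."""
    growing = all(b > a for a, b in zip(queue_sizes, queue_sizes[1:]))
    if growing and active_runners >= max_concurrency:
        return "increase max_concurrency or concurrency_buffer"
    if growing:
        return "queue growing; runners still scaling up"
    return "queue stable"

print(scaling_hint([2, 5, 9, 14], max_concurrency=4, active_runners=4))
# increase max_concurrency or concurrency_buffer
```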