Optimizing Cold Starts

Cold Starts vs Warm Starts

A cold start occurs when a new runner needs to be created from scratch. The runner goes through PENDING → SETUP → IDLE (or PENDING → DOCKER_PULL → SETUP → IDLE if the Docker image isn’t cached) before it can serve requests. A warm start occurs when an existing IDLE runner is reused to handle a new request: IDLE → RUNNING.
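The two paths above can be written down as a small state table. This is a minimal sketch for illustration only: the state names come from the description above, while `RunnerState`, `cold_start_path`, and `WARM_START_PATH` are hypothetical names and not part of any platform API.

```python
from enum import Enum


class RunnerState(Enum):
    PENDING = "PENDING"
    DOCKER_PULL = "DOCKER_PULL"
    SETUP = "SETUP"
    IDLE = "IDLE"
    RUNNING = "RUNNING"


# Warm start: an existing IDLE runner is reused for the new request.
WARM_START_PATH = [RunnerState.IDLE, RunnerState.RUNNING]


def cold_start_path(image_cached: bool) -> list:
    """Return the states a new runner passes through before it can serve requests."""
    path = [RunnerState.PENDING]
    if not image_cached:
        # The Docker image must be pulled before setup can begin.
        path.append(RunnerState.DOCKER_PULL)
    path += [RunnerState.SETUP, RunnerState.IDLE]
    return path
```

The only difference between the two cold paths is the extra DOCKER_PULL step, which is why image caching shortens a cold start without removing it.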

What Triggers Cold Starts

No warm runners available (all busy or expired)
Traffic spike exceeds warm runner capacity
First deployment
Runners expired during low-traffic periods

Factors Affecting Cold Start Duration

Image size: Larger Docker images take longer to pull
Model size: Larger models take longer to download and load
Setup complexity: Complex initialization in setup() adds time
Cache state: First runs are slower; subsequent runs benefit from caching
Hardware availability: GPU availability varies by region and time
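To see how these factors combine, a rough back-of-the-envelope model can help. The sketch below is an assumption-laden illustration, not a measurement: the `ColdStartEstimate` class, its fields, and the throughput figures are all hypothetical, and it ignores time spent waiting for hardware to become available.

```python
from dataclasses import dataclass


@dataclass
class ColdStartEstimate:
    image_gb: float           # Docker image size
    model_gb: float           # model weights to download
    pull_gb_per_s: float      # assumed image pull throughput
    download_gb_per_s: float  # assumed model download throughput
    setup_s: float            # assumed time in setup() once files are local
    image_cached: bool = False
    model_cached: bool = False

    def seconds(self) -> float:
        """Sum the phases that are not already covered by a cache."""
        pull = 0.0 if self.image_cached else self.image_gb / self.pull_gb_per_s
        download = 0.0 if self.model_cached else self.model_gb / self.download_gb_per_s
        return pull + download + self.setup_s


# Illustrative numbers only: the same runner with and without warm caches.
first_run = ColdStartEstimate(8, 10, 0.5, 0.5, 20)
cached_run = ColdStartEstimate(8, 10, 0.5, 0.5, 20, image_cached=True, model_cached=True)
print(first_run.seconds(), cached_run.seconds())
```

With these made-up inputs, the pull and download phases dominate the first run, which is why the strategies below focus on image size and caching.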

How to Reduce Cold Starts

Each of these strategies targets a different phase of the cold start:

Scaling Parameters

Keep warm runners available with keep_alive, min_concurrency, and buffers

Container Images

Reduce image size for faster pulls with multi-stage builds and smaller base images

Compiled Caches

Cache compiled kernels to speed up setup() across runners

Persistent Storage

Download models to /data for automatic caching between runners
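As an illustration of the persistent storage strategy, the following sketch shows a download-if-missing helper that could be called from setup(). It assumes /data is persistent storage shared between runners, as described above. The `ensure_cached` helper, the `/data/models` subdirectory, and the `fetch` parameter are hypothetical names introduced for this example.

```python
import urllib.request
import uuid
from pathlib import Path


def _download(url: str, dest: Path) -> None:
    # Plain HTTP download using the standard library.
    urllib.request.urlretrieve(url, str(dest))


def ensure_cached(url, filename, cache_dir=Path("/data/models"), fetch=_download) -> Path:
    """Return the local path to `filename`, downloading it only if it is missing."""
    target = cache_dir / filename
    if target.exists():
        # Another runner (or an earlier run) already fetched the file.
        return target
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Write to a unique temporary name so a partial download is never
    # mistaken for a complete file by another runner.
    tmp = target.with_name(f"{target.name}.{uuid.uuid4().hex}.part")
    fetch(url, tmp)
    tmp.replace(target)  # rename into place once the download has finished
    return target
```

The first runner to call the helper pays the download cost; later runners find the file and skip straight to loading it. Two runners that start at the same moment may both download the file, which wastes bandwidth but leaves a valid file in place.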
