Cold Starts vs Warm Starts
A cold start occurs when a new runner must be created from scratch. The runner goes through PENDING → SETUP → IDLE (or PENDING → DOCKER_PULL → SETUP → IDLE if the Docker image isn’t cached) before it can serve requests.
A warm start occurs when an existing IDLE runner is reused to handle a new request: IDLE → RUNNING.
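The two paths can be modeled as a small state machine. The sketch below is illustrative only. `RunnerState`, `cold_start_path`, and `warm_start_path` are hypothetical names, not a platform API. It shows that a cold start passes through several states before IDLE, and DOCKER_PULL appears only when the image is not cached. A warm start is a single transition.

```python
from enum import Enum
from typing import List


class RunnerState(Enum):
    PENDING = "PENDING"
    DOCKER_PULL = "DOCKER_PULL"
    SETUP = "SETUP"
    IDLE = "IDLE"
    RUNNING = "RUNNING"


def cold_start_path(image_cached: bool) -> List[RunnerState]:
    """States a brand-new (hypothetical) runner passes through before it can serve requests."""
    path = [RunnerState.PENDING]
    if not image_cached:
        # The image must be pulled before setup can begin.
        path.append(RunnerState.DOCKER_PULL)
    path += [RunnerState.SETUP, RunnerState.IDLE]
    return path


def warm_start_path() -> List[RunnerState]:
    """An existing IDLE runner moves straight to RUNNING."""
    return [RunnerState.IDLE, RunnerState.RUNNING]
```

Each extra state on the cold path is time spent before the first request is served. This is why the strategies that reduce cold starts target individual phases.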
What Triggers Cold Starts
- No warm runners available (all busy or expired)
- Traffic spike exceeds warm runner capacity
- First deployment
- Runners expired during low-traffic periods
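The triggers above reduce to one question: is there an IDLE runner that has not yet expired? The following is a minimal sketch under an assumed model in which an idle runner expires after a fixed idle timeout. `Runner`, `idle_timeout_s`, and the helper functions are hypothetical and do not describe how the platform tracks expiry.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Runner:
    busy: bool
    idle_since: float  # time (seconds) the runner last became idle; ignored while busy


def warm_available(runners: List[Runner], now: float, idle_timeout_s: float) -> int:
    """Count IDLE runners that have not yet expired under the assumed timeout."""
    return sum(1 for r in runners if not r.busy and now - r.idle_since <= idle_timeout_s)


def needs_cold_start(runners: List[Runner], now: float, idle_timeout_s: float) -> bool:
    """True when no warm runner exists: all busy, all expired, or none deployed yet."""
    return warm_available(runners, now, idle_timeout_s) == 0


def cold_starts_for_burst(runners: List[Runner], now: float, idle_timeout_s: float, requests: int) -> int:
    """Requests in a spike beyond the warm capacity each need a new runner."""
    return max(0, requests - warm_available(runners, now, idle_timeout_s))
```

For example, with one busy runner, one runner idle for 10 s, and one idle for 100 s, a 30 s timeout leaves a single warm runner. A burst of 5 requests would then cause 4 cold starts. An empty list (first deployment) always needs a cold start.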
Factors Affecting Cold Start Duration
- Image size: Larger Docker images take longer to pull
- Model size: Larger models take longer to download and load
- Setup complexity: Complex initialization in setup() adds time
- Cache state: First runs are slower; subsequent runs benefit from caching
- Hardware availability: GPU availability varies by region and time
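One way to see how these factors combine is a rough additive estimate. This sketch rests on an assumption: the phases run sequentially and each scales simply with its input. All parameters (sizes, throughputs, setup time, time waiting for hardware) are hypothetical inputs you would measure yourself, not platform figures.

```python
def estimate_cold_start_seconds(
    *,
    pending_s: float,          # time waiting for hardware (e.g. a free GPU)
    image_gb: float,
    image_cached: bool,
    pull_gb_per_s: float,
    model_gb: float,
    model_cached: bool,
    download_gb_per_s: float,
    setup_s: float,            # remaining init work in setup(): loading, compiling, etc.
) -> float:
    """Rough, assumed-additive estimate of cold start duration."""
    total = pending_s
    if not image_cached:
        total += image_gb / pull_gb_per_s        # DOCKER_PULL scales with image size
    if not model_cached:
        total += model_gb / download_gb_per_s    # model download scales with model size
    total += setup_s
    return total
```

With both the image and the model cached, only hardware wait and setup work remain. This is why caching and smaller images pay off mainly on first runs.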
How to Reduce Cold Starts
Each of these strategies targets a different phase of the cold start:
- Scaling Parameters: Keep warm runners available with keep_alive, min_concurrency, and buffers
- Container Images: Reduce image size for faster pulls with multi-stage builds and smaller base images
- Compiled Caches: Cache compiled kernels to speed up setup() across runners
- Persistent Storage: Download models to /data for automatic caching between runners
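The persistent-storage strategy follows a download-once pattern: check the shared directory first, and download only on a miss. Below is a minimal sketch of that idea. It assumes `/data` is a shared, writable filesystem. The `models/` subdirectory, `ensure_cached`, and the `download` callback are hypothetical placeholders for your own download logic.

```python
import os
from pathlib import Path
from typing import Callable


def ensure_cached(name: str, download: Callable[[Path], None], cache_root: str = "/data") -> Path:
    """Return the cached file for `name`, downloading it only if it is missing."""
    target = Path(cache_root) / "models" / name
    if target.exists():
        return target  # cache hit: a previous runner already downloaded it
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download never
    # leaves a partial file that a later runner would mistake for a hit.
    tmp = target.with_name(target.name + ".partial")
    download(tmp)
    os.replace(tmp, target)  # atomic rename within the same filesystem
    return target
```

Calling `ensure_cached` from `setup()` means only the first runner pays the download cost. Later runners find the file and skip straight to loading it.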