Cold Starts vs Warm Starts
A cold start occurs when a new runner needs to be created from scratch. The runner goes through PENDING → DOCKER_PULL → SETUP → IDLE before it can serve requests.
A warm start occurs when an existing IDLE runner is reused to handle a new request: IDLE → RUNNING.
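The two paths can be compared side by side in a small sketch (the State enum and path lists are illustrative models of the lifecycle above, not part of any real API):

```python
from enum import Enum

class State(Enum):
    PENDING = "PENDING"
    DOCKER_PULL = "DOCKER_PULL"
    SETUP = "SETUP"
    IDLE = "IDLE"
    RUNNING = "RUNNING"

# Cold start: a brand-new runner walks the full provisioning path.
COLD_START_PATH = [State.PENDING, State.DOCKER_PULL, State.SETUP,
                   State.IDLE, State.RUNNING]

# Warm start: an existing IDLE runner is reused directly.
WARM_START_PATH = [State.IDLE, State.RUNNING]
```

A warm start skips everything before IDLE, which is why it is so much faster: the image is already pulled and setup() has already run.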
What Triggers Cold Starts
- No warm runners available (all busy or expired)
- Traffic spike exceeds warm runner capacity
- First deployment
- Runners expired during low traffic periods
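All four triggers reduce to the same condition: incoming demand exceeds the number of IDLE runners. A minimal sketch (the function name and the one-request-per-runner assumption are mine; multiplexed runners change the arithmetic):

```python
def cold_starts_triggered(idle_runners: int, incoming_requests: int) -> int:
    """How many requests cannot be served warm and therefore trigger
    cold starts, assuming each runner handles one request at a time."""
    return max(0, incoming_requests - idle_runners)

cold_starts_triggered(0, 3)   # first deployment / all expired: 3 cold starts
cold_starts_triggered(2, 5)   # traffic spike beyond warm capacity: 3 cold starts
cold_starts_triggered(4, 2)   # enough warm runners: 0 cold starts
```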
Factors Affecting Cold Start Duration
- Image size: Larger Docker images take longer to pull
- Model size: Larger models take longer to download and load
- Setup complexity: Complex initialization in setup() adds time
- Cache state: First runs are slower; subsequent runs benefit from caching
- Hardware availability: GPU availability varies by region and time
Scaling Parameters
The most effective way to reduce cold starts is maintaining warm runners using scaling parameters.
keep_alive
Default: 10 seconds
Keep runners alive after their last request completes.
min_concurrency
Default: 0
Maintain a minimum number of runners alive at all times, regardless of traffic.
concurrency_buffer
Default: 0
Maintain extra runners beyond current demand. If min_concurrency is higher, that value applies instead.
concurrency_buffer_perc
Default: 0
Set buffer as a percentage of current request volume.
The effective buffer is the larger of concurrency_buffer and concurrency_buffer_perc / 100 * request volume.
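A sketch of how the two buffer settings might combine (the function name and the rounding-up to a whole runner are my assumptions):

```python
import math

def effective_buffer(concurrency_buffer: int,
                     concurrency_buffer_perc: float,
                     request_volume: int) -> int:
    """Larger of the absolute buffer and the percentage-based buffer,
    rounded up to a whole runner."""
    perc_buffer = math.ceil(concurrency_buffer_perc / 100 * request_volume)
    return max(concurrency_buffer, perc_buffer)

effective_buffer(2, 10, 50)   # max(2, 5) = 5 extra runners
effective_buffer(4, 10, 20)   # max(4, 2) = 4 extra runners
```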
max_multiplexing
Default: 1 (code-specific parameter)
Number of concurrent requests each runner handles simultaneously.
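With multiplexing, the number of runners needed for a given load drops accordingly (a sketch; the function name is mine):

```python
import math

def runners_needed(concurrent_requests: int, max_multiplexing: int) -> int:
    """Runners required to serve a burst of concurrent requests."""
    return math.ceil(concurrent_requests / max_multiplexing)

runners_needed(10, 1)   # default: 10 runners, one request each
runners_needed(10, 4)   # 3 runners, up to 4 requests each
```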
scaling_delay
Default: 0 seconds
Wait time before scaling up when a request is queued.
startup_timeout
Default: Varies (code-specific parameter)
Maximum time allowed for setup() to complete.
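The defaults above, collected in one place (an illustrative dict, not a real configuration API; pass these values however your framework expects):

```python
SCALING_DEFAULTS = {
    "keep_alive": 10,              # seconds IDLE runners survive after their last request
    "min_concurrency": 0,          # runners kept alive regardless of traffic
    "concurrency_buffer": 0,       # extra runners beyond current demand
    "concurrency_buffer_perc": 0,  # buffer as a percentage of request volume
    "max_multiplexing": 1,         # concurrent requests per runner
    "scaling_delay": 0,            # seconds to wait before scaling up
}
```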
Other Optimization Strategies
Image optimization: Use smaller base images, multi-stage builds. See Optimize Container Images.
Persistent storage: Download models to /data for automatic caching. See Use Persistent Storage.
Compiled caches: Share compilation artifacts across runners. See Optimize Startup with Compiled Caches.
Cost Considerations
More warm runners = lower latency but higher cost. Balance based on your needs:
- Latency-critical apps: Accept higher cost for warm runners
- Cost-sensitive apps: Optimize cold starts, accept some latency
- Variable traffic: Use buffers and scaling delays
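As a rough way to reason about the tradeoff, the standing cost of warm capacity is just runners × time × rate (the function and the $1.50/hour rate are hypothetical; substitute your actual hardware pricing):

```python
def warm_capacity_cost(min_concurrency: int, hours: float,
                       hourly_rate: float) -> float:
    """Cost of keeping min_concurrency runners alive for a period,
    independent of whether they serve any traffic."""
    return min_concurrency * hours * hourly_rate

# e.g. 2 always-on runners for a 24-hour day at a hypothetical $1.50/hour
warm_capacity_cost(2, 24, 1.50)   # 72.0
```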
Related Resources
- Understanding Runners - Runner lifecycle and states
- Scale Your Application - Complete scaling parameter reference
- Monitor Performance - Performance monitoring and metrics