Containers autoscaling reference

Reviewed on February 18, 2025

Benefits of autoscaling

Autoscaling offers several benefits, including improved responsiveness and cost efficiency. By automatically adjusting the number of instances of your resource based on current demand, you ensure that your applications can handle varying loads without manual intervention. This not only enhances user experience by maintaining performance levels but also helps in reducing costs by only using resources when necessary. Additionally, autoscaling helps in optimal resource utilization, minimizing the risk of performance degradation during peak times.

Autoscaling can be based on exactly one of the following parameters:

Request concurrency
CPU percentage
RAM percentage

Request concurrency

Minimum and maximum scales

Minimum scale (min-scale)

This parameter sets the minimum number of instances your resource is allowed to scale down to:

If you set a value of 0, all instances of your resource will be terminated after 15 minutes of inactivity.
If you set a value of 1 or more, the corresponding number of instances of your resource will remain available at all times.

Customizing the minimum scale for Serverless can help ensure that an instance remains pre-allocated and ready to handle requests, reducing delays associated with cold starts. However, this setting also impacts the costs of your Serverless Container.

Maximum scale (max-scale)

This parameter sets the maximum number of instances of your resource. You should adjust it based on your resource's traffic spikes, keeping in mind that you may wish to limit the max scale to manage costs effectively.

When the maximum scale is reached, new requests are queued for processing. When the queue is full, the resource returns 503 errors for requests received beyond this point.

Autoscaler behavior

The autoscaler decides to add new instances (scale up) when the number of concurrent requests defined (default is 80) is reached.

The same autoscaler decides to remove instances (scale down) down to 1 when no more requests are received for 30 seconds.

Scaling down to zero (if min-scale is set to 0) happens after 15 minutes of inactivity.

Note

Redeploying your resource does not entail downtime. Instances are gradually replaced with new ones.

Old instances remain running to handle traffic, while new instances are brought up and verified before fully replacing the old ones. This method helps maintain application availability and service continuity throughout the update process.

CPU and RAM percentage

Minimum and maximum scales

Minimum scale (min-scale)

This parameter sets the minimum number of instances your resource is allowed to scale down to.

Note

For technical reasons, the minimum scale for CPU/RAM-based autoscaling is 1.

Maximum scale (max-scale)

This parameter sets the maximum number of instances of your resource. You should adjust it based on your resource's CPU or RAM usage spikes, keeping in mind that you may wish to limit the max scale to manage costs effectively.

Autoscaler behavior

The autoscaler decides to start new instances when the existing instances' CPU or RAM usage exceeds the threshold you defined for a certain amount of time.

The same autoscaler decides to remove existing instances when the CPU or RAM usage of certain instances is reduced, and the remaining instances' usage does not exceed the threshold.

Still need help?

Create a support ticket