Considerations to configure event retention for SQS trigger inputs
This feature only allows a maximum of 10 inflight requests. Containers that are configured with more than 10 replicas will not be used at full capacity. This limitation will be removed in the future.
Triggers get events from an input, such as an SQS queue, and forward them to a container, which will scale up according to its settings to accommodate the workload. This process uses back pressure, so that no new events are read if the container instances have not yet finished processing the previous ones.
Triggers only keep a buffer of the messages that are inflight, they do not drain all the messages of the input in advance.
As a result, in some scenarios such as event bursts or slow computations, events may stay in the input buffer for a while before being consumed. If the input messages in the queue are set with a timeout, such as the MessageRetentionPeriod in SQS queues, events may be deleted before triggering the container.
The implementation of the core trigger behavior is input-agnostic. It is therefore your responsibility to configure the input buffers according to your use case to avoid losing events.
The following formulas may help you choose the retention values in scenarios where every message must be processed.
The first parameter to calculate is the throughput, which corresponds to the amount of messages that can be consumed by the container every second. This value can be obtained from the container’s processing time when invoked and the maximum number of instances it can scale up to.
Container throughput = ( 1 / processing time ) * maximum instances
For example, a container that takes 0.1 seconds to complete with a maximum of 10 replicas has a throughput of
( 1 / 0.1 ) * 10 = 100 messages per second.
As long as the number of events sent to the input per second is lower than the container throughput, the events will be consumed almost immediately. However, if there is a burst of events higher than this value, they will be throttled and buffered in the input while they are consumed.
The time required for all the burst messages to be consumed can be calculated with the following formula:
Delay = burst size / container throughput
For example, a container with a throughput of 100 messages per second that receives a burst of 200 messages will take
200 / 100 = 2 seconds to process them.
The previous formulas assume that the container is completely scaled up at the moment of receiving the events, but it may be only partially scaled or must be scaled from zero. This can be the case if the workload is not constant over time. In these scenarios, the cold-start latency of the new instances must be taken into consideration.
As a result, the minimum retention period in the input buffer can be estimated with the following formula:
Minimum retention = max expected burst size / ( ( 1 / processing time ) * max instances ) + cold start latency
For example, a container that takes 30 seconds to cold-start, takes 5 seconds to complete, has a maximum of 10 replicas and expects a maximum burst of 200 messages should have an input retention of at least
200 / ( ( 1 / 5 ) * 10 ) + 30 = 130 seconds.
This input retention value corresponds to the minimum time required to process a single message burst, the actual value used in production should therefore be higher.