
Time slices error budget calculation method


The time slices method of error budget calculation measures how many good minutes a system achieved during a time window, where a good minute is one in which the system operated within defined boundaries. Time slices are currently hard-coded to 1-minute evaluation intervals. A minute counts as good when the time slice allowance, described in the Time slice allowance section, is not violated.

Image 1: Defining error budget in the UI


No matter which budgeting method you choose, the target is used to calculate your error budget. If you want something to be reliable 95% of the time (in other words, to have at most a 5% failure rate over the defined time window), set your target to 95%. This is the reliability target for the service level objective (SLO).
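As a rough illustration of how a target translates into a budget under the time slices method, the sketch below converts a target into the number of minutes that may be bad within a time window. The function name `error_budget_minutes` and the 24-hour window are assumptions chosen for illustration and are not part of the product.

```python
def error_budget_minutes(window_minutes: int, target: float) -> float:
    """Return how many minutes in the window may be bad before the target is missed.

    Hypothetical helper: `target` is a fraction such as 0.95 for a 95% target.
    """
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in the range (0, 1]")
    return window_minutes * (1.0 - target)


# A 24-hour window contains 1440 one-minute time slices.
budget = error_budget_minutes(24 * 60, 0.95)
print(f"Error budget: about {budget:.0f} bad minutes out of {24 * 60}")
```

With a 95% target over 24 hours, the budget works out to roughly 72 minutes, since 5% of 1440 is 72.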

Image 2: Defining time slice target in the UI

Time slice allowance

The time slice allowance is used to evaluate each time slice and can be considered a micro-objective. It determines whether a time slice is good or bad, and this evaluation is separate from the error budget. Each time slice is evaluated independently to determine whether it fell within the defined allowance. If it did, it is considered a good minute. If it did not, it is considered a bad minute, and some of the error budget is burned.

If you decide that a good minute is one with a 90% success rate, then your time slice allowance should be set at 90%. Your target will then be for 95% of minutes to have at minimum a 90% success rate.
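The per-minute check described above can be sketched as follows. This is a minimal illustration, assuming each minute is summarized as a count of good events and a count of total events. The name `is_good_minute` is hypothetical, and minutes without any data are rejected here because the handling of sparse metrics is outside the scope of this sketch.

```python
def is_good_minute(good_events: int, total_events: int, allowance: float) -> bool:
    """Return True when the minute's success rate is at least the allowance.

    Hypothetical helper: `allowance` is a fraction such as 0.90 for 90%.
    """
    if total_events <= 0:
        # Assumption: empty minutes are not evaluated by this sketch.
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    return good_events / total_events >= allowance


# With a 90% allowance, 90 good events out of 100 is a good minute,
# while 89 out of 100 is a bad minute that burns error budget.
print(is_good_minute(90, 100, 0.90))
print(is_good_minute(89, 100, 0.90))
```

The comparison uses "at least" because the text describes a good minute as one with at minimum a 90% success rate.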

Image 3: Defining objectives in the UI

Use case example

Let’s say you are told that over a 24-hour time period a given SLI should have fewer than 10% slow responses (over 750 ms), 95% of the time. Another way of viewing this is that 90% of the data points must be good (in this case, with a value less than 750 ms), 95% of the time. You’ll need to use the time slices error budget calculation method for this SLO. You have been given a time slice allowance of 90% (a maximum 10% error/failure rate per 1-minute time slice), and this allowance must be met in 95% of the time slices in a 24-hour time period.
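To make the arithmetic of this use case concrete, here is a minimal sketch that evaluates one day of synthetic latency data. Everything in it is an assumption made for illustration: the name `good_minute_ratio`, the ten samples per minute, and the choice to reject minutes with no data rather than model sparse metrics. It treats a minute as good when at least 90% of its responses are faster than 750 ms.

```python
from typing import Sequence

SLOW_THRESHOLD_MS = 750.0  # responses below this value count as good
ALLOWANCE = 0.90           # share of good responses a minute needs
TARGET = 0.95              # share of good minutes the window needs


def good_minute_ratio(minutes: Sequence[Sequence[float]]) -> float:
    """Return the fraction of one-minute slices that meet the allowance."""
    if not minutes:
        raise ValueError("at least one minute of data is required")
    good = 0
    for latencies in minutes:
        if not latencies:
            # Assumption: sparse or empty minutes are out of scope here.
            raise ValueError("every minute must contain at least one sample")
        fast = sum(1 for ms in latencies if ms < SLOW_THRESHOLD_MS)
        if fast / len(latencies) >= ALLOWANCE:
            good += 1
    return good / len(minutes)


# Synthetic day: 1440 minutes, 72 of which have 20% slow responses.
healthy = [100.0] * 10
degraded = [100.0] * 8 + [900.0] * 2
day = [degraded] * 72 + [healthy] * (1440 - 72)

ratio = good_minute_ratio(day)
slo_met = ratio >= TARGET
print(f"good minutes: {ratio:.2%}, SLO met: {slo_met}")
```

In this synthetic day, 72 bad minutes use up the whole budget (5% of 1440) without exceeding it, so the target is still met. One more bad minute would push the share of good minutes below 95%.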

Image 4: Defining objectives in the UI - example
Time slices and sparse metrics

For the impact of the time slices calculation method for sparse metrics, see the SLO calculations guide.