Timeslices error budget calculation method
The timeslices method of error budget calculation measures how many good minutes (minutes in which a system operated within defined boundaries) were achieved during a time window. Timeslices are currently hard-coded to 1-minute evaluation intervals. A good minute is one in which the time slice allowance is not violated. The sections below explain the target and the time slice allowance, followed by a use case example.
[Image: define error budget]
Target
No matter what budgeting method you choose, target is used to calculate your error budget. If you want something to be reliable 95% of the time (or in other words, have at most a 5% failure rate over the defined time window), then your target should be set at 95%. This is the reliability target for the service level objective (SLO).
[Image: define error budget]
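The relationship between the target and the error budget can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function name `error_budget_minutes` and its parameters are hypothetical, and it assumes the budget is simply the share of 1-minute time slices in the window that are allowed to be bad.

```python
def error_budget_minutes(target_percent: float, window_minutes: int) -> float:
    """Minutes in the window that may be bad before the target is missed.

    Hypothetical helper: assumes the budget is (100 - target)% of the
    1-minute time slices in the window.
    """
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    return window_minutes * (100 - target_percent) / 100


# A 95% target over a 24-hour window of 1-minute time slices.
budget = error_budget_minutes(95, 24 * 60)
print(budget)  # 72.0
```

Under these assumptions, a 95% target over 24 hours leaves room for 72 bad minutes.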
Timeslice allowance
The time slice allowance is used to evaluate each time slice and can be considered a micro-objective. It determines whether a time slice is good or bad, and it is a separate evaluation from the error budget. Each time slice is evaluated independently to determine whether it fell within the defined allowance. If it did, it is counted as a good minute. If it did not, it is counted as a bad minute, and some of the error budget is burned.
If you decide that a good minute is one with a 90% success rate, then your time slice allowance should be set at 90%. Your target will then be for 95% of minutes to have at minimum a 90% success rate.
[Image: define time slice objective 1]
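The per-slice evaluation described above might look like the following sketch. The names `slice_is_good` and `good_minute_percent` are hypothetical, and each time slice is assumed to be summarized as a pair of good and total event counts. Time slices with no data are deliberately not handled here, since their treatment is covered in the SLO calculations guide.

```python
from typing import Iterable, Tuple


def slice_is_good(good_events: int, total_events: int,
                  allowance_percent: float) -> bool:
    """Return True if the slice meets the time slice allowance.

    A slice is good when its success rate is at least the allowance.
    Integer-friendly comparison avoids floating-point ratio issues.
    """
    if total_events <= 0:
        # Assumption: empty slices are out of scope for this sketch.
        raise ValueError("this sketch only handles time slices that contain data")
    return good_events * 100 >= allowance_percent * total_events


def good_minute_percent(slices: Iterable[Tuple[int, int]],
                        allowance_percent: float) -> float:
    """Percentage of slices that were good, each judged independently."""
    verdicts = [slice_is_good(good, total, allowance_percent)
                for good, total in slices]
    return 100 * sum(verdicts) / len(verdicts)


# Four 1-minute slices as (good events, total events), 90% allowance.
minutes = [(100, 100), (95, 100), (90, 100), (89, 100)]
print(good_minute_percent(minutes, 90))  # 75.0
```

Only the last minute falls below the 90% allowance, so three of the four minutes are good. The result would then be compared against the target, which is a separate evaluation.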
Use case example
Let's say that over a 24-hour time period, a given SLI should have fewer than 10% slow responses (over 750 ms), 95% of the time. Another way of viewing this is that 90% of the data points must be good (in this case, with a value less than 750 ms), 95% of the time. You'll need to use the timeslices error budget calculation method for this SLO: you have been given a time slice allowance of 90% (a maximum 10% error/failure rate per 1-minute time slice), and this must be achieved in 95% of the timeslices in the 24-hour time period. With 1,440 one-minute time slices in 24 hours, that leaves room for at most 72 bad minutes.
[Image: define time slice example]
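To make the use case concrete, the sketch below walks raw latency samples through the three steps: threshold check, per-minute verdict, and budget consumption. It is an illustration under stated assumptions, not the product's calculation. The constants mirror the example, the function names are hypothetical, the latency data is synthetic, and every minute is assumed to contain at least one data point.

```python
from typing import Dict, Sequence

THRESHOLD_MS = 750        # a data point is good if its value is below 750 ms
ALLOWANCE_PERCENT = 90    # a minute is good if at least 90% of its points are good
TARGET_PERCENT = 95       # 95% of the minutes in the window must be good
WINDOW_MINUTES = 24 * 60  # 24-hour window of 1-minute time slices


def minute_is_good(latencies_ms: Sequence[float]) -> bool:
    """Judge one time slice against the time slice allowance."""
    fast = sum(1 for value in latencies_ms if value < THRESHOLD_MS)
    return fast * 100 >= ALLOWANCE_PERCENT * len(latencies_ms)


def summarize(window: Sequence[Sequence[float]]) -> Dict[str, object]:
    """Count good and bad minutes and compare bad minutes with the budget."""
    good = sum(1 for minute in window if minute_is_good(minute))
    bad = len(window) - good
    budget = len(window) * (100 - TARGET_PERCENT) / 100
    return {
        "good_minutes": good,
        "bad_minutes": bad,
        "budget_minutes": budget,
        "remaining_minutes": budget - bad,
        "target_met": bad <= budget,
    }


# Synthetic data: 10 samples per minute.
healthy = [200] * 10              # 100% fast, good minute
degraded = [200] * 8 + [900] * 2  # 80% fast, bad minute
window = [healthy] * (WINDOW_MINUTES - 60) + [degraded] * 60

print(summarize(window))
```

With 60 degraded minutes out of 1,440, this synthetic window burns 60 of the 72 budgeted minutes and still meets the 95% target.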
For the impact of the timeslices calculation method for sparse metrics, see the SLO calculations guide.