SLOs error budget


To understand how the composite SLO budget is calculated, it helps to first look at how a standard SLO budget is determined. For standard SLOs, reliability is the ratio of successful events to total events (or of good minutes to total minutes) over a specific time window. If an SLO has only successful events, its reliability over time looks like this:

regular-slo-budget — Error budget of a standard SLO

Reliability

Reliability is a ratio, and the number of successful events can never exceed the total number of events. As a result, its value always falls between 0 and 1, which can also be expressed as a percentage from 0% to 100%.
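
A minimal Python sketch of this ratio, assuming you already have the counts for one time window. The function name `reliability` is a hypothetical illustration, not part of Nobl9's API. The same ratio covers both cases mentioned above: good events over total events, or good minutes over total minutes.

```python
def reliability(good: int, total: int) -> float:
    """Ratio of good events (or good minutes) to total events (or total minutes)."""
    if total <= 0:
        raise ValueError("reliability is undefined for an empty window")
    if not 0 <= good <= total:
        # Good events can never exceed total events, so the ratio stays in [0, 1].
        raise ValueError("good must be between 0 and total")
    return good / total


# Hypothetical window: 997 of 1,000 requests succeeded.
print(f"{reliability(997, 1000):.1%}")  # 99.7%
```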

A realistic SLO that has some bad events (some errors) will have reliability somewhere below 100%:

regular-slo-reliability — Reliability of a standard SLO

Error budget target

Once reliability has been measured, we can set a budget target, which represents the minimum acceptable level of reliability. This target acts as a threshold on the Y-axis of the reliability chart. For example, suppose we set our SLO's target at 75%.

Reliability above the target is acceptable, while reliability below the target is not.

This means our error budget is the remaining 25% (100% - 75%): the largest share of bad events (or bad minutes) we are willing to tolerate. To track how much of the error budget we've consumed, we rescale the Y-axis of the reliability chart so that 75% reliability corresponds to 0% of the budget and 100% reliability corresponds to 100% of the budget.

reliability-below-target — Reliability below target

It's the same graph as before, but with different Y-axis labels. Here we can see that the entire budget has been consumed: the remaining budget turns negative at the point where reliability falls below the target. Viewed this way, choosing a target for a standard SLO becomes straightforward.
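
The rescaling of the Y-axis is a linear map from reliability to remaining budget. A minimal sketch, assuming reliability and target are fractions between 0 and 1; `remaining_budget` is a hypothetical name used only for illustration:

```python
def remaining_budget(reliability: float, target: float) -> float:
    """Map reliability onto the budget scale: target -> 0.0, 100% -> 1.0."""
    if not 0 <= target < 1:
        raise ValueError("target must be in [0, 1)")
    # Values below the target come out negative: the budget is overspent.
    return (reliability - target) / (1 - target)


for r in (1.0, 0.9, 0.75, 0.6):
    print(f"{r:.0%} reliability -> {remaining_budget(r, 0.75):.0%} budget left")
# 100% reliability -> 100% budget left
# 90% reliability -> 60% budget left
# 75% reliability -> 0% budget left
# 60% reliability -> -60% budget left
```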

Use-case example 1

Suppose you're tracking HTTP requests over a 28-day rolling period. You might set a reliability target of 99.9% for that service, meaning you expect at least 99.9% of all requests to be successful within any 28-day period.
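
To make the 99.9% target concrete, a small sketch that converts it into the number of failed requests one window can absorb. The request volume is a hypothetical figure, and `allowed_bad_events` is an illustrative helper, not a Nobl9 function. `Decimal` is used because `1 - 0.999` is not exactly `0.001` in binary floating point.

```python
from decimal import Decimal


def allowed_bad_events(total_events: int, target: str) -> int:
    """Largest number of failed events that still meets the reliability target."""
    # Decimal keeps 1 - 0.999 exactly equal to 0.001.
    budget_fraction = 1 - Decimal(target)
    # int() truncates toward zero, i.e. rounds down for non-negative values:
    # a partial failure cannot be spent.
    return int(total_events * budget_fraction)


# Hypothetical volume: 10 million requests in one 28-day window.
print(allowed_bad_events(10_000_000, "0.999"))  # 10000
```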

Use-case example 2

Imagine you ping a service every minute and care about how long the service is available to clients during a calendar month. In this case, you would use the Timeslices method with a 1-month, calendar-aligned time window. After consulting with your stakeholders, you agree to accept up to 1 hour of downtime per calendar month. Assuming a 30-day month (720 hours), this translates to a reliability target of (720h - 1h) / 720h ≈ 99.86%.
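
The same arithmetic as a sketch, using exact fractions so the only rounding is in the displayed percentage (the helper name `timeslices_target` is hypothetical):

```python
from fractions import Fraction


def timeslices_target(window_hours: int, allowed_downtime_hours: int) -> Fraction:
    """Reliability target that tolerates the given downtime within the window."""
    if not 0 <= allowed_downtime_hours < window_hours:
        raise ValueError("downtime must be shorter than the window")
    return Fraction(window_hours - allowed_downtime_hours, window_hours)


target = timeslices_target(30 * 24, 1)  # 30-day month = 720 hours, 1 hour of downtime
print(target)                  # 719/720
print(f"{float(target):.2%}")  # 99.86%
```

Rounding the exact value 719/720 (99.8611...%) down to 99.86% leaves a slightly larger budget: 720h × 0.14% = 1.008h, about 29 seconds more than one hour.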

Error budget of a composite SLO

Now that we know how the budget is calculated for standard SLOs, let’s look at composite SLOs. A composite SLO's budget is calculated from the reliability of multiple component SLOs over a given time window.

Let’s take a look at several standard SLOs with 100% reliability:

reliability-below-target-2 — standard SLOs with 100% reliability

The reliability of every component is expressed as a percentage on a scale from 0% to 100%.

Reliability is always calculated over a specific time period. When assessing the reliability of component SLOs aggregated into a composite SLO, the relevant time period is the composite’s time window, which may differ from the time windows configured for individual SLOs.

The reliability of a composite SLO is also measured on a scale from 0% to 100%, but it reflects the combined reliability of all its components. This can be visualized as a stacked area chart:

reliability-100-samples — Reliability 100% samples

Now, let’s take a look at a more realistic example where each component has some error and reliability below 100%:

composite-reliability-below-100 — Composite reliability below 100

The reliability of a composite SLO built from these SLOs would look like this:

composite-reliability-sample — Composite reliability sample

The components' reliabilities are “stacked” and normalized to 100%. Nobl9 presents the result without coloring individual components:

normalized-reliability — Normalized reliability chart for a composite SLO
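
A minimal sketch of the stacking, under our assumption that the composite reliability at a point in time is the sum of each component's reliability multiplied by its normalized weight (one band per component in the chart). The function name is hypothetical and this is not Nobl9's published implementation; with equal weights, as in this example, every component gets a band of the same height.

```python
def composite_reliability(reliabilities: list[float], weights: list[float]) -> float:
    """Weighted, normalized sum of component reliabilities (the stacked chart)."""
    if not weights or len(reliabilities) != len(weights):
        raise ValueError("need exactly one weight per component")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    total = sum(weights)
    # Each component owns a band as tall as its normalized weight,
    # and its own reliability decides how much of that band is filled.
    return sum((w / total) * r for r, w in zip(reliabilities, weights))


# Hypothetical reliabilities for four equally weighted components.
print(f"{composite_reliability([0.98, 0.90, 0.95, 0.85], [1, 1, 1, 1]):.0%}")  # 92%
```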

The composite SLO’s target is also just a reliability threshold. It’s a point selected on the Y-axis that indicates the lowest acceptable reliability.

Let’s assume the target for our example composite SLO is set to 75%:

composite-target — Composite target at 75%

The remaining budget of a composite SLO is the same as its reliability, but with the Y-axis rescaled so that the target sits at 0%. This reflects that, by definition, we accept reliability below 100% but not below the target.

composite-remaining-budget — Composite SLO remaining budget

This is the same chart as above, but with the Y-axis scaled. The peaks and valleys appear steeper, but that's simply a result of stretching the diagram vertically.
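
The steeper look comes from a constant stretch factor: every change in reliability is multiplied by 1 / (1 - target) on the budget chart, which is 4 for a 75% target. A minimal sketch with hypothetical composite reliability samples; `remaining_budget` is an illustrative helper name:

```python
def remaining_budget(reliability: float, target: float) -> float:
    """Same linear rescale as for a standard SLO: target -> 0.0, 100% -> 1.0."""
    return (reliability - target) / (1 - target)


target = 0.75
samples = [0.92, 0.90, 0.93, 0.88]  # hypothetical composite reliability over time
budget = [remaining_budget(r, target) for r in samples]
print([f"{b:.0%}" for b in budget])  # ['68%', '60%', '72%', '52%']

# A 2-point dip in reliability (92% -> 90%) shows up as an 8-point dip in budget.
stretch = 1 / (1 - target)
print(stretch)  # 4.0
```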

Error budget of a composite SLO with weighted components

So far we've considered a scenario where all components were weighted equally:

Component SLO	Absolute weight (set by user)	Normalized weight	Normalized weight calculation
SLO A	1	25%	`1 / (1 + 1 + 1 + 1)`
SLO B	1	25%	`1 / (1 + 1 + 1 + 1)`
SLO C	1	25%	`1 / (1 + 1 + 1 + 1)`
SLO D	1	25%	`1 / (1 + 1 + 1 + 1)`

All normalized weights in our example are 25%, so in the chart of the composite’s reliability without errors, each component contributes 25% of the composite’s overall reliability:

composite-slo-reliability — Composite SLO reliability

Important insight

The maximum share of a composite SLO’s reliability that a given component can burn equals that component's normalized weight.

This implies that a single component SLO cannot bring the composite SLO’s reliability down to 0% unless it is the only component in the composite SLO.
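
A quick check of this property, assuming the stacked model can be written as a weighted sum of component reliabilities (the helper name is hypothetical): drive one component to 0% while the others stay at 100% and see how far the composite falls.

```python
def composite_reliability(reliabilities: list[float], weights: list[float]) -> float:
    """Sum of each component's reliability times its normalized weight."""
    total = sum(weights)
    return sum((w / total) * r for r, w in zip(reliabilities, weights))


# Worst case for SLO A alone: A at 0%, B, C, and D at 100%, equal weights.
worst = composite_reliability([0.0, 1.0, 1.0, 1.0], [1, 1, 1, 1])
print(f"{worst:.0%}")  # 75%: SLO A burned at most its 25% normalized weight

# Only a component that makes up the whole composite can drive it to 0%.
print(composite_reliability([0.0], [1]))  # 0.0
```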

Let’s look at how our example changes when we assign different weights to the components:

Component SLO	Absolute weight (set by user)	Normalized weight	Normalized weight calculation
SLO A	8	53%	`8 / (8 + 4 + 1 + 2)`
SLO B	4	27%	`4 / (8 + 4 + 1 + 2)`
SLO C	1	7%	`1 / (8 + 4 + 1 + 2)`
SLO D	2	13%	`2 / (8 + 4 + 1 + 2)`
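
The normalized-weight column can be reproduced with a short sketch (the helper name `normalized_weights` is hypothetical): each weight is divided by the sum of all weights and rounded to a whole percent for display.

```python
def normalized_weights(weights: list[float]) -> list[float]:
    """Divide each absolute weight by the sum of all weights."""
    total = sum(weights)
    return [w / total for w in weights]


weights = {"SLO A": 8, "SLO B": 4, "SLO C": 1, "SLO D": 2}
shares = normalized_weights(list(weights.values()))
for name, share in zip(weights, shares):
    print(f"{name}: {share:.0%}")
# SLO A: 53%
# SLO B: 27%
# SLO C: 7%
# SLO D: 13%
```

The rounded percentages happen to add up to 100% here; in general, rounded shares can sum to 99% or 101% even though the exact shares always sum to 1.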

With all component SLOs at 100% reliability but with different weights, the composite SLO’s reliability now looks like this:

composite-slo-reliability-2 — Composite SLO reliability - 100% reliability for all weights

As you can see, the thickness of different component bands corresponds to their normalized weight.

Component weights are determined by their ratio to one another, not their absolute values. For example, weights of 2 and 4 are functionally the same as weights of 50 and 100. While there's no limit to the values you can use, maintaining the correct proportion is key:

Component SLO	Absolute weight (set by user)	Normalized weight	Normalized weight calculation
SLO A	24	53%	`24 / (24 + 12 + 3 + 6)`
SLO B	12	27%	`12 / (24 + 12 + 3 + 6)`
SLO C	3	7%	`3 / (24 + 12 + 3 + 6)`
SLO D	6	13%	`6 / (24 + 12 + 3 + 6)`
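
Because only the proportions matter, scaling every weight by the same factor leaves the normalized weights unchanged. A sketch using exact fractions so the comparison is not affected by floating-point rounding (the helper name is hypothetical):

```python
from fractions import Fraction


def normalized_weights(weights: list[int]) -> list[Fraction]:
    """Exact normalized weights: each weight divided by the sum of all weights."""
    exact = [Fraction(w) for w in weights]
    total = sum(exact)
    return [w / total for w in exact]


# Multiplying every weight by 3 gives the same normalized weights.
print(normalized_weights([8, 4, 1, 2]) == normalized_weights([24, 12, 3, 6]))  # True
# Weights of 2 and 4 behave exactly like 50 and 100.
print(normalized_weights([2, 4]) == normalized_weights([50, 100]))  # True
```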

If we take the same component SLO data as before (with some errors) but apply the new weights, our composite SLO’s reliability looks like this:

composite-slo-reliability-new-weights — Composite SLO reliability overview - weight change

Now, let’s set a 75% target over that composite SLO:

composite-slo-75-target — Composite SLO with a 75% target

The top 25% is our error budget, so the remaining budget chart looks like this:

composite-slo-75-target-remaining-budget — Composite SLO remaining budget for a 75% target

A few things stand out about our newly calculated composite budget:

It differs from the case where all weights were equal.
The shape of the budget roughly resembles the shape of SLO A’s budget. That is because SLO A has significantly more weight than the other components. The other components also contribute to burning the composite SLO’s budget, but much less.
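
To see why SLO A dominates, one can split the consumed budget by component. This decomposition is our own illustration derived from the weighted-sum view of the stacked chart, not a documented Nobl9 metric, and the reliabilities below are hypothetical: every component loses the same 10%, yet each one burns budget in proportion to its weight.

```python
def budget_burn_by_component(
    reliabilities: list[float], weights: list[float], target: float
) -> list[float]:
    """Share of the composite error budget consumed by each component."""
    total = sum(weights)
    budget = 1 - target
    # A component's lost reliability, scaled by its normalized weight,
    # is the part of the composite's lost reliability it is responsible for.
    return [(w / total) * (1 - r) / budget for r, w in zip(reliabilities, weights)]


weights = [8, 4, 1, 2]  # SLO A, B, C, D from the weighted example
reliabilities = [0.90, 0.90, 0.90, 0.90]  # hypothetical: 10% error everywhere
burn = budget_burn_by_component(reliabilities, weights, target=0.75)
print([f"{b:.0%}" for b in burn])  # ['21%', '11%', '3%', '5%']
print(f"remaining: {1 - sum(burn):.0%}")  # remaining: 60%
```

Subtracting the total burn from 100% gives the same remaining budget as rescaling the composite reliability directly: (90% - 75%) / 25% = 60%.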
