SLOs error budget

It's helpful to first examine how a standard SLO budget is determined to understand how the composite SLO budget is calculated. For standard SLOs, reliability is the ratio of successful events to total events or good minutes to total minutes over a specific time window. If an SLO has only successful events, its reliability over time would appear as follows:

regular-slo-budget
Error budget of a standard SLO

Reliability

Reliability is a ratio, and the number of successful events can never exceed the total number of events. As a result, its value always falls between 0 and 1 or, expressed as a percentage, between 0% and 100%.

A realistic SLO with some bad events (that is, some errors) will have a reliability somewhere below 100%:

regular-slo-reliability
Reliability of a standard SLO

Error budget target

Once reliability has been measured, we can set a budget target, which represents the minimum acceptable level of reliability. This target acts as a threshold on the Y-axis of the reliability chart. For example, let's say we set our SLO's target at 75%:

slo-target-75
Reliability target 75%

Reliability above the target is acceptable, while reliability below the target is not.

This means our error budget is the remaining 25% (calculated as 100% - 75%), representing the maximum share of errors we are willing to tolerate. To track how much of the error budget remains, we rescale the Y-axis of the reliability chart. In this context, 75% reliability corresponds to 0% of the budget remaining, while 100% reliability corresponds to 100% of the budget remaining:

reliability-below-target
Reliability below target

It's the same graph as before, but with different Y-axis labels. Here, we can see that the entire budget has been consumed, and it turns negative at the point where reliability falls below the target. When viewing the reliability graph, selecting a target for your standard SLO becomes straightforward.
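The Y-axis rescaling described above can be sketched as a small function. This is a minimal illustration that assumes reliability and target are expressed as fractions between 0 and 1; the name `remaining_budget` is hypothetical and not part of any Nobl9 API.

```python
def remaining_budget(reliability: float, target: float) -> float:
    """Rescale reliability so that `target` maps to 0.0 and 1.0 maps to 1.0.

    A negative result means reliability has dropped below the target,
    i.e. the error budget is exhausted.
    """
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in the range [0, 1)")
    return (reliability - target) / (1.0 - target)


# With a 75% target: 100% reliability leaves the full budget,
# 75% reliability leaves none, and anything lower goes negative.
print(remaining_budget(1.00, 0.75))   # 1.0
print(remaining_budget(0.875, 0.75))  # 0.5
print(remaining_budget(0.75, 0.75))   # 0.0
print(remaining_budget(0.70, 0.75) < 0)  # True
```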

Use-case example 1

Suppose you're tracking HTTP requests over a 28-day rolling period. You might set a reliability target of 99.9% for that service, meaning you expect at least 99.9% of all requests to be successful within any 28-day period.

Use-case example 2

Imagine you ping a service every minute and are focused on the time the service is available to clients during a calendar month. In this case, you would use the Timeslices method with a 1-month, calendar-aligned time window. After consulting with your stakeholders, you agree to accept up to 1 hour of downtime per calendar month. This expectation translates to a reliability target of about 99.86%, calculated as (720h - 1h) / 720h ≈ 99.86% (assuming a 30-day month, which equals 720 hours, with a 1-hour error budget).
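The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch of the calculation above; the function name and its parameters are hypothetical, and the 30-day month is the same simplification used in the text.

```python
def target_from_allowed_downtime(window_hours: float, allowed_downtime_hours: float) -> float:
    """Return the reliability target implied by an allowed amount of downtime."""
    return (window_hours - allowed_downtime_hours) / window_hours


# 30-day month (720 hours) with 1 hour of acceptable downtime.
target = target_from_allowed_downtime(window_hours=30 * 24, allowed_downtime_hours=1)
print(f"{target:.2%}")  # 99.86%
```

Because calendar months differ in length, the same 1 hour of downtime maps to a slightly different percentage in a 28-day or 31-day month.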

Error budget of a composite SLO

Knowing how the budget is calculated for standard SLOs, let’s look at how it’s calculated for composite SLOs. The composite budget is calculated based on the reliability over a given time window of multiple other SLOs.

Let’s take a look at several standard SLOs with 100% reliability:

reliability-below-target-2
Standard SLOs with 100% reliability

Each component's reliability is expressed as a percentage on a scale from 0% to 100%.

Reliability is always calculated over a specific time period. When assessing the reliability of component SLOs aggregated into a composite SLO, the relevant time period is the composite’s time window, which may differ from the time windows configured for individual SLOs.

The reliability of a composite SLO is also measured on a scale from 0% to 100%, but it reflects the combined reliability of all its components. This can be visualized as a stacked area chart:

reliability-100-samples
Reliability 100% samples

Now, let’s take a look at a more realistic example where each component has some error and reliability below 100%:

composite-reliability-below-100
Composite reliability below 100

A composite SLO’s reliability composed of these SLOs would look like this:

composite-reliability-sample
Composite reliability sample

The components' reliabilities are “stacked” and normalized to 100%. In Nobl9, this result is presented without coloring of individual components:

normalized-reliability
Normalized reliability chart for a composite SLO
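To make the “stack and normalize” step concrete, here is a minimal sketch for equally weighted components. It is a simplified model that follows the charts above, not necessarily the exact computation Nobl9 performs; the function name and the sample reliability values are made up for illustration.

```python
def stack_equal_weights(components: dict[str, list[float]]) -> tuple[dict[str, list[float]], list[float]]:
    """Scale each component's reliability series to an equal share of 100%
    and sum the resulting bands into a composite series."""
    n = len(components)
    # Each band is the component's reliability shrunk to 1/n of the chart height.
    bands = {name: [r / n for r in series] for name, series in components.items()}
    # The composite at each point in time is the total height of the stack.
    composite = [sum(values) for values in zip(*bands.values())]
    return bands, composite


# Hypothetical reliability samples (fractions of 1) at three points in time.
components = {
    "SLO A": [1.00, 0.90, 0.80],
    "SLO B": [1.00, 1.00, 0.96],
    "SLO C": [1.00, 0.98, 0.98],
    "SLO D": [1.00, 1.00, 0.90],
}
bands, composite = stack_equal_weights(components)
print([round(c, 4) for c in composite])  # [1.0, 0.97, 0.91]
```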

The composite SLO’s target is also just a reliability threshold. It’s a point selected on the Y-axis that indicates the lowest acceptable reliability.

Let’s assume the target for our example composite is set to 75%:

composite-target
Composite target at 75%

The remaining budget of a composite SLO is the same as its reliability, but with the Y-axis rescaled so that the target sits at 0%. This reflects that, by definition, we accept reliability below 100%, but not below the target.

composite-remaining-budget
Composite SLO remaining budget

This is the same chart as above, but with the Y-axis scaled. The peaks and valleys appear steeper, but that's simply a result of stretching the diagram vertically.

Error budget of a composite SLO with weighted components

So far we've considered a scenario where all components were weighted equally:

SLO | SLO A | SLO B | SLO C | SLO D
Weights | 1 | 1 | 1 | 1
Normalized weights | 1 / (1 + 1 + 1 + 1) = 25% | 1 / (1 + 1 + 1 + 1) = 25% | 1 / (1 + 1 + 1 + 1) = 25% | 1 / (1 + 1 + 1 + 1) = 25%

All normalized weights in our example are equal to 25%, which indicates that in the chart of the “composite’s reliability without errors,” each component contributes 25% to the composite’s overall reliability:

composite-slo-reliability
Composite SLO reliability

Important insight

The maximum share of a composite SLO’s reliability that a given component can burn is equal to that component's normalized weight.

This implies that a single component SLO, unless it is the only component in the composite SLO, cannot bring the reliability of the composite SLO down to 0%.
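This cap can be checked with a short sketch. It assumes the simplified model suggested by the stacked charts, where composite reliability is the sum of each component's reliability multiplied by its normalized weight; the function names are hypothetical and this is not presented as Nobl9's exact algorithm.

```python
def normalized_weights(weights: dict[str, float]) -> dict[str, float]:
    """Divide each weight by the sum of all weights."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def composite_reliability(weights: dict[str, float], reliabilities: dict[str, float]) -> float:
    """Weighted sum of component reliabilities (assumed simplified model)."""
    nw = normalized_weights(weights)
    return sum(nw[name] * reliabilities[name] for name in weights)


weights = {"SLO A": 1, "SLO B": 1, "SLO C": 1, "SLO D": 1}
all_good = {name: 1.0 for name in weights}
# Worst case for a single component: SLO A fails completely.
worst_a = {**all_good, "SLO A": 0.0}

print(composite_reliability(weights, all_good))  # 1.0
print(composite_reliability(weights, worst_a))   # 0.75
```

Even with SLO A at 0% reliability, the composite only drops by SLO A's normalized weight of 25%.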

Let’s take a look at how our example changes when we assign different weights to different components:

SLO | SLO A | SLO B | SLO C | SLO D
Weights | 8 | 4 | 1 | 2
Normalized weights | 8 / (8 + 4 + 1 + 2) ≈ 53% | 4 / (8 + 4 + 1 + 2) ≈ 27% | 1 / (8 + 4 + 1 + 2) ≈ 7% | 2 / (8 + 4 + 1 + 2) ≈ 13%

When all component SLOs have 100% reliability but different weights, the reliability of the composite SLO looks like this:

composite-slo-reliability-2
Composite SLO reliability - 100% reliability for all weights

Note how the thickness of different bands corresponds to their component's normalized weight. The larger the normalized weight, the thicker the band.

An important thing to note is that we can assign different weights and still get the same values of normalized weights:

SLO | SLO A | SLO B | SLO C | SLO D
Weights | 24 | 12 | 3 | 6
Normalized weights | 24 / (24 + 12 + 3 + 6) ≈ 53% | 12 / (24 + 12 + 3 + 6) ≈ 27% | 3 / (24 + 12 + 3 + 6) ≈ 7% | 6 / (24 + 12 + 3 + 6) ≈ 13%

Important insight

The absolute weight of a single component doesn’t matter. What matters is the ratio of that weight to the weights of other components.
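A quick way to convince yourself of this is to normalize both weight sets from the tables above and compare the results. The sketch below uses exact fractions to avoid rounding noise; `normalized_weights` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction


def normalized_weights(weights: list[int]) -> list[Fraction]:
    """Return each weight as an exact fraction of the total weight."""
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


first = normalized_weights([8, 4, 1, 2])
second = normalized_weights([24, 12, 3, 6])

# Multiplying every weight by the same factor leaves the ratios unchanged.
print(first == second)                      # True
print([f"{float(x):.0%}" for x in first])   # ['53%', '27%', '7%', '13%']
```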

If we take the same component SLO data, with some errors as before but with the new weights, then our composite SLO’s reliability would look like this:

composite-slo-reliability-new-weights
Composite SLO reliability overview - weight change

Now, let’s set a 75% target for that composite SLO:

composite-slo-75-target
Composite SLO with a 75% target

The top 25% is our error budget, so the remaining budget chart will look like this:

composite-slo-75-target-remaining-budget
Composite SLO remaining budget for a 75% target

We can notice a few things about our newly calculated composite budget:

  • It differs from the case where all weights were equal.
  • The shape of the budget somewhat resembles the shape of SLO A’s budget. That is because SLO A has significantly more weight than the other components. The other components also contribute to the burn of the composite SLO’s budget, but not as much.
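Both observations can be reproduced with a small sketch. It assumes the simplified model used throughout this page (composite reliability as a weighted sum of component reliabilities, then rescaled against the target); the function name and the reliability values are hypothetical, chosen so that only SLO A has errors.

```python
def composite_remaining_budget(
    weights: dict[str, float],
    reliabilities: dict[str, float],
    target: float,
) -> float:
    """Weighted composite reliability, rescaled so that `target` maps to 0
    and 100% reliability maps to 1 (assumed simplified model)."""
    total = sum(weights.values())
    composite = sum(weights[name] / total * reliabilities[name] for name in weights)
    return (composite - target) / (1.0 - target)


# Hypothetical snapshot: only SLO A has errors (80% reliability).
reliabilities = {"SLO A": 0.80, "SLO B": 1.00, "SLO C": 1.00, "SLO D": 1.00}

equal = {name: 1 for name in reliabilities}
weighted = {"SLO A": 8, "SLO B": 4, "SLO C": 1, "SLO D": 2}

print(f"{composite_remaining_budget(equal, reliabilities, 0.75):.1%}")     # 80.0%
print(f"{composite_remaining_budget(weighted, reliabilities, 0.75):.1%}")  # 57.3%
```

With the heavier weight on SLO A, the same errors in SLO A consume noticeably more of the composite budget.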