Choosing target for a Composite SLO
Setting high vs. low SLO targets
An argument for selecting a high target—and consequently a smaller error budget—is that it establishes a more ambitious reliability goal for your service. It’s important to clarify that simply setting a high target for an SLO won’t inherently make your service more reliable. Instead, it acts as an early warning system, alerting you more quickly when the reliability of your service requires attention. This can lead to observing higher burn rates and potentially receiving more frequent alerts.
While aiming for high reliability is a worthwhile goal, achieving perfection is often impractical and unnecessary. SLOs provide a way to optimize resources by defining what it means for your service to be “good enough,” allowing you to focus efforts more strategically. By setting a target that is “low enough to be acceptable,” you may save time and money while still maintaining sufficient reliability. This decision usually involves discussions and consensus within teams or across stakeholders, but that’s another advantage of using SLOs—they offer a common framework for discussing and agreeing on reliability goals.
Choosing targets for regular vs. composite SLOs
Setting a target for a standard SLO is typically more straightforward compared to a Composite SLO, as it can be easily expressed in natural language. For instance: “99% of requests in any 28-day period are successful” or “99% of the time in any calendar month the service is available.” These targets are easier to define because the SLI (Service Level Indicator) in standard SLOs usually measures a tangible metric with a clear unit of measurement.
In contrast, Composite SLOs aggregate the weighted reliabilities of multiple underlying SLOs. For a Composite SLO, a target of 99% means that “the weighted reliability of a set of component SLOs over a given time window must be at least 99%.” Composite SLOs let you define an error budget across diverse types of SLIs, such as HTTP request success rates, response time latencies, batch job success rates, or service uptime, which may not be naturally compatible with one another.
A Composite SLO’s target doesn’t refer to any tangible measurement with units, but rather to the reliability, budgets, and weights of other SLOs.
Assessing target set at random
Let’s get back to the example of weighted component SLOs:
As a reminder, normalized weights for SLOs A, B, C, and D were 53%, 27%, 7%, and 13% respectively:
SLO | SLO A | SLO B | SLO C | SLO D |
---|---|---|---|---|
Weights | 8 | 4 | 1 | 2 |
Normalized Weights | 53% | 27% | 7% | 13% |
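The normalized weights in the table can be reproduced by dividing each raw weight by the sum of all weights. The snippet below is a minimal sketch of that arithmetic only; the function name `normalize_weights` and the dictionary layout are illustrative assumptions, not part of any product API.

```python
def normalize_weights(weights):
    """Return each raw weight as a fraction of the total weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: w / total for name, w in weights.items()}


# Raw weights from the table above.
weights = {"SLO A": 8, "SLO B": 4, "SLO C": 1, "SLO D": 2}
normalized = normalize_weights(weights)

for name, share in normalized.items():
    # Rounded to whole percents, as in the table.
    print(f"{name}: {share:.0%}")
```

The fractions sum to 1, so each one can be read as the share of the composite error budget that the component is able to consume.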
Our task is to select the appropriate point on the Y-axis of this stacked area chart, which will serve as the threshold for our composite error budget. It’s important to remember that each component SLO can consume a portion of the error budget proportional to its normalized reliability weight. This gives us insight into how the composite error budget will behave and respond to errors across different components.
For example, if we set the composite target at 75%, only SLOs A and B would be capable of consuming the entire composite error budget individually. However, this would only happen if their reliability dropped below specific thresholds: 53% for SLO A, calculated as 53% ≈ 100% - (100% - 75%) / 53%, and 7% for SLO B, calculated as 7% ≈ 100% - (100% - 75%) / 27%. These reliability thresholds are extremely low, leading us to conclude that a 75% target for this composite SLO is too low to be meaningful.
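The threshold arithmetic above can be checked for any candidate target with a few lines of code. This is a sketch under one stated assumption: all other components stay at 100% reliability while a single component degrades. The helper name `exhaustion_threshold` is hypothetical.

```python
def exhaustion_threshold(composite_target, normalized_weight):
    """Reliability (in %) at which a single component alone uses up the
    whole composite error budget, assuming every other component stays
    at 100%. Returns None if the component cannot do so even at 0%.
    """
    composite_budget = 100.0 - composite_target
    threshold = 100.0 - composite_budget / normalized_weight
    return threshold if threshold >= 0 else None


# Normalized weights from the example, expressed as fractions.
normalized = {"SLO A": 0.53, "SLO B": 0.27, "SLO C": 0.07, "SLO D": 0.13}

for name, weight in normalized.items():
    threshold = exhaustion_threshold(75.0, weight)
    if threshold is None:
        print(f"{name}: cannot exhaust the composite budget on its own")
    else:
        print(f"{name}: exhausts the composite budget below {threshold:.0f}%")
```

With a 75% target, only SLO A and SLO B return a threshold, which matches the reasoning in the paragraph above. Rerunning the loop with a higher target shows how quickly the thresholds move toward realistic reliability values.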
Setting pessimistic target
Usually, a healthy and valuable SLO has reliability in the high 90s, with a target set close to 100%. On a bad day, when the reliability of every component SLO drops to, let’s say, 98%, the Composite SLO’s reliability drops to the same value. That suggests that we would typically set a Composite SLO’s target somewhere within the range of its component targets. Let’s look at it more closely and assume that our component SLOs have the following targets:
SLO | SLO A | SLO B | SLO C | SLO D |
---|---|---|---|---|
Target | 99% | 97.5% | 99.99% | 99.9% |
Normalized Weights | 53% | 27% | 7% | 13% |
If every component SLO has exhausted its error budget (0% remaining), so that each one sits exactly at its target, the reliability of the Composite SLO would be:
99%*53% + 97.5%*27% + 99.99%*7% + 99.9%*13% = 98.78%
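The weighted sum above is simple to reproduce in code, which helps when the component list or the weights change. The following is a minimal sketch; `composite_reliability` is an illustrative name, and the inputs are the targets and rounded normalized weights from the table.

```python
def composite_reliability(reliabilities, weights):
    """Weighted average of component reliabilities (all values in %)."""
    return sum(reliabilities[name] * weights[name] for name in weights)


# Component targets (%) and normalized weights (fractions) from the table.
targets = {"SLO A": 99.0, "SLO B": 97.5, "SLO C": 99.99, "SLO D": 99.9}
weights = {"SLO A": 0.53, "SLO B": 0.27, "SLO C": 0.07, "SLO D": 0.13}

# Every component sits exactly at its target, i.e. all budgets are exhausted.
pessimistic = composite_reliability(targets, weights)
print(f"Pessimistic composite reliability: {pessimistic:.2f}%")
```

Passing a different set of reliabilities to the same function, for example 100% for the components that are not degraded, gives the composite reliability for any other scenario.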
Refining target based on component SLO volatility
The pessimistic estimate assumes that every component SLO exhausts its budget at the same time. In practice, components differ in how volatile they are. If, for example, only SLO B burns its whole budget and sits at its 97.5% target while the other components stay at 100%, the reliability of the Composite SLO would be:
100%*53% + 97.5%*27% + 100%*7% + 100%*13% = 99.33%
In a scenario where all component SLOs occasionally burn their error budgets, you may notice that these burn periods don’t overlap frequently. By setting a composite target midway between the lowest expected reliability, 98.78%, and 100%, which would be 99.39% (99.39% = (98.78% + 100%) / 2), the Composite SLO becomes more sensitive to situations where multiple component SLOs degrade in performance simultaneously, even if none have completely exhausted their individual budgets.
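To see what the midpoint target means in practice, the sketch below computes it and then evaluates one hypothetical scenario: every component has burned 60% of its own error budget at the same time. The 60% figure and all names are assumptions chosen for illustration.

```python
targets = {"SLO A": 99.0, "SLO B": 97.5, "SLO C": 99.99, "SLO D": 99.9}
weights = {"SLO A": 0.53, "SLO B": 0.27, "SLO C": 0.07, "SLO D": 0.13}


def composite_reliability(reliabilities, weights):
    """Weighted average of component reliabilities (all values in %)."""
    return sum(reliabilities[name] * weights[name] for name in weights)


# Lowest expected reliability: every component exactly at its target.
pessimistic = composite_reliability(targets, weights)
# Composite target midway between the pessimistic value and 100%.
composite_target = (pessimistic + 100.0) / 2

# Hypothetical scenario: each component has burned 60% of its own budget,
# so none of them has breached its individual target yet.
burned = 0.6
degraded = {name: 100.0 - burned * (100.0 - t) for name, t in targets.items()}
observed = composite_reliability(degraded, weights)

print(f"Composite target:   {composite_target:.2f}%")
print(f"Observed composite: {observed:.2f}%")
print("Composite target breached" if observed < composite_target
      else "Composite target met")
```

In this scenario every component is still above its own target, yet the composite falls below the midpoint target, which is the kind of simultaneous degradation the midpoint is meant to surface.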
Editing an SLO's target resets its budget. SLOs that support Replay can work around this by reimporting historical data and backfilling budget calculations based on the updated configuration. As of 2024-08-21, Composite SLOs are still in beta and don’t support Replay yet.
Further refining target
Fine-tuning SLOs to ensure they remain beneficial for you and your organization is an ongoing process. This is well described in The SLO Development Lifecycle (SLODLC), which can get you up and running by setting up a process for continuously reviewing and updating your SLOs.
Ultimately, SLOs should be useful for you and your specific case, so there is no “one-size-fits-all” solution. Experimenting, observing, and reviewing your SLOs is the key to finding the correct configuration that works.