Choosing weights

After understanding how a composite SLO's error budget is calculated, the next crucial step is to balance component weights. Proper weighting ensures the composite SLO accurately reflects each component's priority, aligning reliability goals with business objectives.

Selecting the optimal configuration is an iterative process. There is no universal rule for using equal or different weights; instead, treat the process as a series of tests to discover the best fit. The optimal approach depends on your system's context and needs. Embracing experimentation encourages teams to adapt based on outcomes.

With these principles in mind, let’s explore the main considerations for balancing components in a composite SLO:

  • Weights represent the relative importance of components
  • Weights act as a reliability proxy of the observed service operation

Assign greater weight to more critical services to ensure correct prioritization.

As you start the weighting process, consider these guiding approaches:

  • SLI specification (availability, latency, etc.)
  • High-level non-functional business requirements
  • Component SLO targets
  • Component data consolidation

In any approach, it is a good practice to regularly review and fine-tune component weights.

SLI specification approach

When managing both availability and latency components, it's crucial to define which matters most to the business functionality. For example, a payment service can tolerate being slower, but it must be as available as possible. In the high-frequency trading domain, by contrast, a fast execution pipeline is a must-have for business competitiveness.

This prioritization can be represented by assigning a higher weight to the availability of the payment service in the first case. In the second case, assigning similar or equal weights to both the latency and availability components is a reasonable balance.

However, be aware that heavily prioritizing one aspect (such as availability) can obscure issues in another (such as latency). This could lead to an unnoticed error budget burn for reliability signals that are less critical but still important overall. Being mindful of this trade-off helps prevent long-term risks and encourages resilience by keeping teams alert to potential problems.
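
The masking effect described above can be illustrated with a minimal sketch. It assumes the composite burn is a plain weighted average of per-component error budget burn, which is a simplification for illustration and may not match the exact formula your platform uses. The function name, component names, and burn values are hypothetical.

```python
def composite_burn(burn_by_component, weights):
    """Weighted average of per-component error budget burn (0.0 to 1.0).

    Assumption: the composite is a simple weighted average. This is an
    illustrative simplification, not a platform formula.
    """
    total_weight = sum(weights.values())
    weighted = sum(burn_by_component[name] * weights[name] for name in weights)
    return weighted / total_weight


# Hypothetical state: availability is healthy, latency has burned its whole budget.
burn = {"availability": 0.05, "latency": 1.00}

skewed = composite_burn(burn, {"availability": 9, "latency": 1})
balanced = composite_burn(burn, {"availability": 1, "latency": 1})

print(f"9:1 weights -> composite burn {skewed:.1%}")
print(f"1:1 weights -> composite burn {balanced:.1%}")
```

Under these assumptions, the 9:1 weighting reports a far smaller composite burn than the 1:1 weighting even though the latency component has exhausted its budget. This is why a heavily skewed composite is best paired with a review of the individual components.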

For a practical application of this approach, see the composite use case.

Business requirement approach

For complex composite SLOs that reflect multistep user journeys, consider focusing on the relative business importance of each step.

For example, in a food-delivery app, users perform several steps after opening the application. The table below provides example steps, their business importance, and the relative weight each can be assigned.

| Step | Importance | Potential weight | Reasoning |
| --- | --- | --- | --- |
| Browsing restaurants | Important | 3 | Represents the primary discovery and selection phase. If this functionality is broken, users have nothing to buy. |
| Viewing menus | Important | 5 | Represents the primary discovery and selection phase. If this functionality is broken, users have nothing to buy. |
| Adding items to the cart | Important | 5 | Represents the primary discovery and selection phase. If this functionality is broken, users have nothing to buy. |
| Placing the order | Critical | 7 | If this step is broken, the application is non-functional. |
| Paying | Critical | 7 | If this step is broken, the application is non-functional. |
| Tracking the order | Minor | 2 | This represents the added value feature. Users can still get their food without tracking the order. |
| Providing the feedback after delivery | Minor | 1 | This represents the added value feature. Users can still get their food without providing the feedback. |

Availability vs. latency

If the order placement and payment services include both availability and latency components, assign the higher weight to the availability components. However, the latency components' weights should not be much lower, because a hungry user is unlikely to tolerate long delays.

An example of this approach is also used in a sample use case.
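
To see what the potential weights from the food-delivery table mean in relative terms, the short sketch below converts them into shares of the total weight. The weights are taken from the table. The variable names are illustrative, and the sketch only normalizes the numbers. It does not model how a composite SLO consumes them.

```python
# Potential weights from the food-delivery example table.
step_weights = {
    "Browsing restaurants": 3,
    "Viewing menus": 5,
    "Adding items to the cart": 5,
    "Placing the order": 7,
    "Paying": 7,
    "Tracking the order": 2,
    "Providing the feedback after delivery": 1,
}

total = sum(step_weights.values())

# Share of the total weight carried by each step.
shares = {step: weight / total for step, weight in step_weights.items()}

for step, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{step:<40} {share:6.1%}")
```

With these numbers, the two critical steps together carry 14 of the 30 weight units, a little under half of the total, while the two minor steps carry 3 of 30.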

Component target approach

Component targets reflect their reliability expectations and can hint at the composite's sensitivity to reliability changes for each objective. The component target approach considers the inverted value of component error budgets. This ensures that higher reliability expectations translate into greater impact on the composite SLO.

The table below provides calculations and resulting suggested component weights. The following formulas are used:

100% – component target = error budget
100% / component error budget = component raw weight
Component raw weight / sum of all raw weights = component normalized weight

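| Value | Component A | Component B | Component C | Component D |
| --- | --- | --- | --- | --- |
| Component target | 99.00% | 97.50% | 99.50% | 75.00% |
| Error budget | 1.00% | 2.50% | 0.50% | 25.00% |
| Raw weight (inverted error budget) | 100 | 40 | 200 | 4 |
| Component normalized weight | 29% | 12% | 58% | 1% |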

This approach yields mathematically derived weights. The higher the component target, the more it weighs in the composite. For example, Component C with a tight error budget can quickly burn half of the composite error budget, making any errors immediately noticeable. Component D, with a substantially larger error budget, would require far more errors to have the same impact. This method ensures equal error budget treatment for each component.
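
The three formulas of the component target approach can be sketched in a few lines. The function name `target_based_weights` is hypothetical. The inputs are the component targets from the example above, expressed as percentages.

```python
def target_based_weights(targets_pct):
    """Derive normalized weights from component targets given in percent.

    Follows the formulas above:
      error budget      = 100% - target
      raw weight        = 100% / error budget
      normalized weight = raw weight / sum of all raw weights
    """
    raw = {}
    for name, target in targets_pct.items():
        error_budget = 100.0 - target
        if error_budget <= 0:
            # A 100% target leaves no error budget, so the inverse is undefined.
            raise ValueError(f"{name}: target must be below 100%")
        raw[name] = 100.0 / error_budget
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


targets = {"A": 99.0, "B": 97.5, "C": 99.5, "D": 75.0}
weights = target_based_weights(targets)

for name, weight in weights.items():
    print(f"Component {name}: {weight:.0%}")
```

Rounded to whole percentages, the result matches the normalized weights in the example: 29%, 12%, 58%, and 1%. The guard against a 100% target is an addition in this sketch, since such a target would leave no error budget to invert.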

Key points to consider
  • Changing weights doesn't reset the budget.
  • If you’re unsure how to start assigning weights, assign identical weights to all components first. Later, once you have assessed component importance, adjust the weights accordingly.
  • The key criterion is how useful the component's signal is to you.

Data consolidation approach

To specify how component data is consolidated in your composite SLO, select an aggregation metric that determines whether the composite reports on gradual reliability changes or binary error budget status. This aligns the weight assignment with the chosen data consolidation metric.

  • Reliability aggregation provides a continuous signal reflecting gradual changes.
    • Higher component weights increase their impact on composite reliability, increasing the composite's sensitivity to partial performance degradations.
    • Assign higher weights to critical components whose degradations require greater visibility and immediate action.
    • Avoid overweighting non-critical components to reduce noise.
  • Error budget state aggregation provides less sensitivity to minor variations and focuses on threshold-based reporting.
    • Component weights can scale up the criticality of components in terms of staying within their error budgets. Higher-weighted components with exhausted error budgets exert a greater impact, providing a focused view of critical compliance breaches.
    • Assign higher weights to components that contribute significantly to compliance goals or have strict no-failure requirements.
    • Ensure weights reflect true priorities without overshadowing minor but important contributions.

The table below highlights the key differences between the aggregation metrics.

| Metric | Sensitivity to fluctuations | Use case | Weight role |
| --- | --- | --- | --- |
| Reliability | High | Operational monitoring | Scale impact of components based on importance and contribution to overall service health. |
| Error budget state | Low | High-level compliance reporting | Scale impact of components to reflect critical compliance priorities. |
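
The difference in sensitivity between the two aggregation metrics can be illustrated with a simplified sketch. It assumes that a continuous signal is a weighted average of each component's remaining error budget, and that a state-based signal is the weighted share of components that still have budget left. Both are illustrative assumptions rather than the product's actual formulas, and the component names, weights, and values are hypothetical.

```python
def weighted_mean(values, weights):
    """Return the weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def continuous_signal(remaining_budget, weights):
    """Assumed continuous view: weighted average of remaining budget."""
    return weighted_mean(remaining_budget, weights)


def state_signal(remaining_budget, weights):
    """Assumed binary view: weighted share of components with budget left."""
    states = [1.0 if remaining > 0 else 0.0 for remaining in remaining_budget]
    return weighted_mean(states, weights)


weights = [7, 3]  # hypothetical components: checkout, search

# Remaining error budget per component over three observations.
# Checkout degrades step by step; a negative value means the budget is exhausted.
timeline = [
    [0.40, 0.90],
    [0.10, 0.90],
    [-0.05, 0.90],
]

continuous = [continuous_signal(row, weights) for row in timeline]
states = [state_signal(row, weights) for row in timeline]

for step, (c, s) in enumerate(zip(continuous, states), start=1):
    print(f"observation {step}: continuous={c:.3f} state={s:.1f}")
```

In this sketch the continuous signal drops at every observation, while the state-based signal stays flat until the heavily weighted component exhausts its budget and then falls sharply. The size of that fall is set by the component's weight.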