Service level objectives inputs and outputs
Service Level Objective (SLO) is a broad term that covers several components of a complex system. This document clarifies the inputs and outputs of reliability approaches based on SLOs and how these values are presented in Nobl9. Outputs can be compared meaningfully only when they are calculated from the same inputs.
SLO inputs
To make sure that the SLO works correctly, we need to define the following characteristics:
- Service Level Indicator (SLI): A metric or set of metrics used to determine whether a system is currently performing in an acceptable state.
- Service Level Objective target: A percentage indicating the desired good state based on SLI status for events or time.
- Error budget time window: The period over which we evaluate compliance with the SLO target.
- Calculation method: To determine a system's SLO status, we can look at either the total number of good events or the time the system functions correctly. The result is evaluated over the predetermined error budget time window.
Remember that it's crucial to define all the inputs accurately and have meaningful and easy-to-understand outputs.
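As a minimal sketch, the four inputs can be grouped into one record so that two SLOs are easy to check for matching inputs. The names `SloInputs`, `CalculationMethod`, and `comparable` are hypothetical and are not part of Nobl9; the sketch also assumes the target is expressed as a fraction strictly between 0 and 1.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class CalculationMethod(Enum):
    # Hypothetical labels for the two approaches described above.
    GOOD_EVENTS = "good_events"  # count good events against total events
    GOOD_TIME = "good_time"      # count time spent in a good state


@dataclass(frozen=True)
class SloInputs:
    sli_name: str            # hypothetical identifier for the SLI metric(s)
    target: float            # desired good fraction, e.g. 0.90 for 90%
    time_window: timedelta   # error budget time window
    method: CalculationMethod

    def __post_init__(self) -> None:
        # Assumption of this sketch: a target of exactly 0 or 1 is rejected.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be a fraction strictly between 0 and 1")
        if self.time_window <= timedelta(0):
            raise ValueError("time_window must be positive")


def comparable(a: SloInputs, b: SloInputs) -> bool:
    """Return True only when all four inputs are identical."""
    return a == b
```

Two SLOs with the same SLI, time window, and method but different targets would return `False` from `comparable`, which signals that their outputs should not be compared directly.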
SLO outputs
Using a calculation pipeline, based on the four inputs above, Nobl9 produces the following measures:
- Reliability burn down: Indicates the percentage of the time a system has been in a good state, evaluated against the error budget time window.
- Remaining error budget: An alternate representation of reliability burn down that uses the space between 100% and the SLO target as its field of operation. For example, if your SLO target is 90% and your reliability burn down is 95%, your remaining error budget would be 50%.
- Burn rate: The number of observed bad events divided by the number of allowed bad events, as defined by your error budget time window. When your burn rate is 1, you are burning through your budget at an acceptable rate. Below 1, you are on track to retain the excess budget, and above 1, you are on track to exhaust all established error budget.
You can compare these outputs to understand your system's performance only if they are calculated using the same inputs.
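The arithmetic behind the three outputs can be sketched as follows. This is an illustration of the definitions above, not Nobl9's calculation pipeline. The function names are hypothetical, values are fractions rather than percentages, and the number of allowed bad events is assumed to be (1 - target) multiplied by the total events in the interval being evaluated.

```python
def _budget(target: float) -> float:
    """Error budget as a fraction: the space between 100% and the target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be a fraction strictly between 0 and 1")
    return 1.0 - target


def reliability(good: float, total: float) -> float:
    """Fraction of events (or time) in a good state within the window."""
    if total <= 0:
        raise ValueError("total must be positive")
    return good / total


def remaining_error_budget(reliability_value: float, target: float) -> float:
    """Share of the error budget still unspent; negative when overspent."""
    budget = _budget(target)
    spent = 1.0 - reliability_value  # observed bad fraction
    return (budget - spent) / budget


def burn_rate(bad: float, total: float, target: float) -> float:
    """Observed bad events divided by allowed bad events."""
    allowed_bad = _budget(target) * total
    return bad / allowed_bad
```

With the figures from the example above (target 90%, reliability 95%), `round(remaining_error_budget(0.95, 0.90), 6)` evaluates to `0.5`, which matches the 50% remaining error budget.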
Nobl9 outputs
On the grid view, Nobl9 displays:
- The reliability burn down as a line chart
- The remaining error budget in both time and a percentage remaining, and
- The burn rate as a natural number
In the details view, Nobl9 displays the incoming SLI data as a line chart of raw data. You can hover over this line to see the exact value at any point in time.
In the details view, Nobl9 displays the reliability burn down as a line chart that you can hover over to see the exact value at any point in time. The SLO target is represented on the line chart as a dotted line (- - -) so you can see how the burn down tracks against the target over the configured time window.
For the lte (≤) and gte (≥) operators, the SLO target is represented as a solid line on the SLI charts.
In the SLO details view, Nobl9 displays the burn rate as a line chart that you can hover over to see the exact value at any point in time. The value 1 is represented on the line chart as a dotted line (- - -) so you can quickly see if you’ve been burning too much of your budget.
When the burn rate is calculated over your error budget time window, it shows when you experienced periods of unreliability and how severe they were.
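To illustrate how a burn rate series can reveal periods of unreliability, the sketch below computes a burn rate for each bucket of events and flags the buckets above 1. The bucket layout, the sample counts, and the name `burn_rate_series` are hypothetical, and this is not a description of how Nobl9 builds its chart.

```python
def burn_rate_series(buckets, target):
    """Burn rate per bucket: bad events divided by the bad events the target allows."""
    allowed_fraction = 1.0 - target
    series = []
    for good, bad in buckets:
        total = good + bad
        # An empty bucket has nothing to burn; report 0.0 instead of dividing by zero.
        series.append(bad / (allowed_fraction * total) if total else 0.0)
    return series


# Hypothetical (good, bad) event counts for four consecutive buckets.
buckets = [(99, 1), (85, 15), (100, 0), (70, 30)]
rates = burn_rate_series(buckets, target=0.90)

# Buckets where the budget was burning faster than the target allows.
unreliable = [index for index, rate in enumerate(rates) if rate > 1.0]
print(unreliable)  # [1, 3]
```

The size of each flagged value indicates severity: the last bucket burns budget at roughly three times the allowed rate, while the second burns at roughly one and a half times.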
When you zoom in and out of an SLI chart, the chart might display different values, depending on the zoom level.
This can happen when the sum aggregation is set for a non-incremental ratio SLI.
For this SLI type, Nobl9 adds every next data point to the previous point, so the SLI chart displays the sum of these time series.
Zooming in on the chart narrows the timespan for the displayed data, so it covers less data, which reduces the values you see.
When you zoom the chart out, the timespan widens (and captures more data), so the displayed values grow.
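The effect can be reproduced with a simple model: a running sum over only the points that fall inside the visible timespan. This is an assumption made for illustration, not Nobl9's charting code, and the function `displayed_values` and the sample series are hypothetical.

```python
from itertools import accumulate


def displayed_values(points, start, end):
    """Running sum of the raw points inside the visible range [start, end)."""
    visible = [value for timestamp, value in points if start <= timestamp < end]
    return list(accumulate(visible))


# Hypothetical raw series: (timestamp in minutes, count reported for that point).
points = [(0, 4), (1, 6), (2, 5), (3, 5), (4, 10)]

zoomed_out = displayed_values(points, 0, 5)
zoomed_in = displayed_values(points, 3, 5)

print(zoomed_out)  # [4, 10, 15, 20, 30]
print(zoomed_in)   # [5, 15]
```

The point at minute 4 is displayed as 30 when the chart is zoomed out and as 15 when it is zoomed in, even though the underlying raw value (10) is the same in both cases.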
Learn more about discrepancies between Nobl9 SLI charts and the values from a data source.