Service level objectives
Our services may be small or incredibly deep and complex, but almost without fail these services can no longer be properly understood via the logs or stack traces we have depended on in the past. With this shift, we need not just new types of telemetry, but also new approaches for using that telemetry.
A Practical Guide to SLIs, SLOs & Error Budgets by Alex Hidalgo
The core concept of performance tracking is a service level objective (an SLO). It refers to the desired performance of your service, that is, the level you consider acceptable. In other words, you use SLOs to measure the reliability of your service.
SLOs exist along with two other concepts:
- Service level indicators (SLIs or objectives): quantifiable metrics that measure a specific aspect of your service's performance level.
  When you assess your service performance, you monitor whether your SLIs satisfy your SLO.
- Error budget: the acceptable number of failures you can have while still achieving your desired performance target.
  It shows how close your reliability is to your SLO over some period of time.
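As a rough illustration of how a target turns into an error budget, the sketch below works through a count-based example. The function names, the 99.5% target, and the event counts are hypothetical, and the formulas follow the common good/total convention rather than Nobl9's exact calculation.

```python
def error_budget(target: float, total_events: int) -> float:
    """Number of failed events allowed while still meeting the target."""
    return (1.0 - target) * total_events


def budget_remaining(target: float, good: int, total: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed = error_budget(target, total)
    if allowed == 0:
        raise ValueError("a 100% target (or zero events) leaves no error budget")
    bad = total - good
    return 1.0 - bad / allowed


# Hypothetical numbers: a 99.5% target over 10,000 events, 20 of which failed.
allowed = error_budget(0.995, 10_000)
remaining = budget_remaining(0.995, 9_980, 10_000)
print(f"allowed failures: {allowed:.0f}, budget remaining: {remaining:.0%}")
```

With these assumed numbers, 50 failures are allowed and 20 have occurred, so about 60% of the budget is left.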
Given the above, an SLO unit refers to the number of unique error budgets Nobl9 calculates using the following:
- Data received from your data source
- Target you configure, according to your expectations
This means every SLO is connected to a data source and has at least one error budget. Every additional SLO target is considered an additional error budget.
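The counting rule above can be sketched in a few lines. The SLO names and targets below are made up for illustration, and the dictionary is only a stand-in for real SLO definitions.

```python
# Hypothetical SLOs, each mapped to the list of targets configured for it.
slos = {
    "checkout-latency": [0.99, 0.95],
    "login-availability": [0.999],
}


def slo_units(slos: dict[str, list[float]]) -> int:
    """Count one unit per target, since every target has its own error budget."""
    return sum(len(targets) for targets in slos.values())


print(slo_units(slos))  # 3
```

Here two SLOs yield three units because the first SLO carries an additional target.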
Nobl9 simplifies SLO development and management with its comprehensive features, from integrating your preferred data source, through SLO creation, to alerting and reports.
Using SLOs, you measure individual aspects of your service: the latency of authorization, the number of successful registrations, or anything else you need to monitor.
When you need to monitor the reliability of your complex system end-to-end, you can assemble multiple SLOs into a single composite.
Create an SLO
Nobl9 lets you create SLOs in the following ways:
- On the Nobl9 Web:
  - Following the steps in the SLO wizard accessible from the SLO grid
  - Using SLI Analyzer
- Using the SLOs as Code tools
SLO name (in contrast to its display name) is a unique identifier of your SLO. While you can edit an SLO's display name at any time, its name cannot be edited on the Nobl9 Web once you save the SLO. The only way to modify it is the `sloctl get slos` command in `sloctl`.
When configuring an SLO, you specify an SLI: a metric for Nobl9 to pull from your data source. Depending on your metrics source, it is specified as a query or a set of parameters.
For example, you can query the following:
Service type | Ask for |
---|---|
A web service or API | HTTPS responses with 2xx and 3xx status codes |
A queue consumer | Successful processing of a message |
Serverless and function-based architectures | Successful completion of an invocation |
A batch | Normal exit (for example, rc == 0) of the driving process or script |
A browser application | Completion of a user action without JavaScript errors |
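To make the first row of the table concrete, here is a minimal sketch of a ratio-style SLI for a web service. It assumes you already have a list of response status codes; the function names and sample data are hypothetical, and in practice the counting is done by the query you run against your data source.

```python
def is_good(status: int) -> bool:
    """Treat 2xx and 3xx responses as successful, as in the table above."""
    return 200 <= status < 400


def success_ratio(statuses: list[int]) -> float:
    """Share of responses that count as good events."""
    if not statuses:
        raise ValueError("no responses to evaluate")
    good = sum(1 for status in statuses if is_good(status))
    return good / len(statuses)


# Hypothetical sample: six good responses out of eight.
statuses = [200, 201, 302, 404, 500, 200, 204, 301]
print(success_ratio(statuses))  # 0.75
```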
SLO grid
Once created, your SLO appears in the SLO grid, in the Service Level Objectives section on the Nobl9 Web.
This is a central board of your SLOs. You can do the following:
- View all SLOs in the organization.
  You can see SLOs in projects you have access to.
- View SLO live graphs, rewind them, fast-forward up to the current time, and pause.
- Search and filter SLOs.
- View SLO charts from the perspective of different time windows and time zones.
  Any changes to the time zone made on the SLO grid also apply to the SLO details page.
If you suspect an issue with an SLO, first verify its underlying query. For queries that seem accurate, the problem might lie with the data source itself. In that case, activate event logs for your data source to pinpoint errors and identify the number of impacted SLOs.
SLO details
Click the required SLO to open its details. Here you can manage and assess your SLO.
- SLO management options:
  - Replay past SLO data to backfill your SLO report.
    It's active when the maximum period for historical data retrieval in your data source is greater than 0.
  - Silence and activate alerts for this SLO
  - Manage SLO annotations
  - Access the SLO YAML configuration
  - Edit and delete the SLO
  - Export SLI data (Enterprise)
- SLO assessment options:
  - Zoom in and out of SLO charts to access SLI raw data
  - Shift the SLO time window
  - Change the time zone.
    The default time zone matches the time zone set in the SLO grid.
SLO details 2.0 guide
Nobl9 launched a brand-new interface for the SLO details page.
The revamped design aims to make your familiar functionality more intuitive and readily accessible.
This update affects the user interface only. All calculations remain the same.
To switch to the redesigned SLO details, click Try SLO details 2.0 next to your SLO name.
To switch back, click Previous design next to SLO more options in the SLO header row.
To ensure a smooth transition, we'll be rolling out the new design in three phases:
Phase 2: SLO details 2.0 takes center stage (Current)
You still have access to both interfaces, but SLO details 2.0 now opens by default, so the new functionality and layout are easier to reach.
Phase 3: Streamlining the experience
Finally, we'll transition to offering SLO details 2.0 exclusively. The previous design will be phased out to ensure a consistent and improved user experience.
We'll keep you informed as we move through each phase.
Highlights
This article outlines the key sections and concepts of the SLO details view.
The primary objective is an objective of your SLO that takes center stage on the SLO details page.
You can access its detailed information immediately upon opening your SLO details.
The primary objective is labeled with
When no primary objective is set, Nobl9 displays the lowest-target objective in this place.
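The fallback rule described above can be sketched as follows. The `Objective` class and its fields are hypothetical stand-ins, not Nobl9's data model; the code only illustrates the selection logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Objective:
    name: str
    target: float
    primary: bool = False  # hypothetical flag marking the primary objective


def displayed_objective(objectives: list[Objective]) -> Optional[Objective]:
    """Return the primary objective, or the lowest-target one if none is set."""
    for objective in objectives:
        if objective.primary:
            return objective
    return min(objectives, key=lambda o: o.target, default=None)


objectives = [
    Objective("fast", 0.99),
    Objective("acceptable", 0.95),
    Objective("strict", 0.999),
]
print(displayed_objective(objectives).name)  # acceptable
```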
The tabs organize the focus areas of your SLO:
- Overview, with the focus on the primary objective
- Objectives, with the charts for all objectives in your SLO
- Alerts, comprising your SLO-related alerts
The tiles under the SLO metadata display the reliability values of your primary objective (under Overview) or of the other SLO objectives (under Objectives):
- The error budget remaining (in percent)
- The burn rate
- The reliability target, along with the current reliability value
- The number of active alerts within the current time window
The reliability target always shows the actual value, regardless of the time window selected.
The charts visualize the reliability parameters of the objective currently in focus:
- The error budget remaining
- The reliability burn down
- The service level indicator
- The error budget burn rate
The tiles provide a snapshot of the most recent data, focusing on the last seven days of the chosen time window. The charts offer a broader view, visualizing the SLO objective's status over the entire selected time window.
For time windows shorter than seven days, the tiles and charts capture the entire time window.
Values in both tiles are calculated based on the most recent data within the time window selected.
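One way to picture the difference between the tile range and the chart range is the sketch below. The function name and the sample dates are hypothetical; only the seven-day rule comes from the text above.

```python
from datetime import datetime, timedelta

TILE_SPAN = timedelta(days=7)  # tiles focus on the last seven days


def tile_range(window_start: datetime, window_end: datetime) -> tuple[datetime, datetime]:
    """Range summarized by the tiles; charts cover the whole window instead."""
    # For windows shorter than seven days, the tiles cover the entire window.
    return max(window_start, window_end - TILE_SPAN), window_end


end = datetime(2024, 5, 29)
long_window = tile_range(end - timedelta(days=28), end)
short_window = tile_range(end - timedelta(days=3), end)
print(long_window[0].date(), short_window[0].date())  # 2024-05-22 2024-05-26
```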
Because the Target value always shows the current target, raising it can briefly turn the Reliability tile red even when the reliability value is sufficient and enough error budget remains. The reverse also holds: the reliability can be too low and the remaining error budget very small, yet the Reliability tile turns green if you lower the target far enough.
Nobl9 recalculates the values in both tiles against the new target once the next data arrives.
This time range remains in SLO history, so when you rewind the time window, you will still see it unless you change the target again or replay¹ the SLO.
¹ The per-data-source limit on the maximum period for historical data retrieval applies.
SLO management options are now placed next to the SLO name at the top of the page. Unfold More options to run Replay on this SLO, handle alerts, view YAML, and delete the SLO.
General SLO metadata, including project, service, and data source, is now located directly beneath the SLO name. To access it, click (unfold) before the SLO name.
Additionally, the redesigned view provides details on the SLO history, such as who created this SLO, when, and the date of its last update:
Overview tab
The Overview tab focuses on your primary objective.
A summary of your other SLO objectives appears at the bottom of the page, while their charts, other details, and available options are on the Objectives tab.
To set or change your primary objective, click Set/Change primary objective.
The focus area now includes your primary objective's target, value, and type, along with the options to view metric settings and handle the time window and time zone.
The tiles display the primary objective's info.
The active alerts tile always shows the real-time number of active alerts.
Pausing the SLO also pauses the live updates of active alerts. In this case, the tile shows alerts that were active at the moment of pausing.
SLO annotations are now located below the tiles, on the right:
Your primary objective charts are displayed below:
Under the charts, the Objectives list summarizes the rest of your SLO's objectives:
Objectives tab
This tab provides the details for all your SLO objectives, including the primary objective.
The familiar tiles appear under every objective. To view an objective's charts, click (unfold) before its name.
You can also select the objectives to display, show and hide annotations, and access their reference info: targets, values, types, and metric settings.
Alerts tab
Open this tab to access alert policies linked to your SLO and check triggered alerts, if any.
The number next to the tab indicates how many alerts are currently active.
Tiles display the alert policies linked to your SLO.
Every alert policy tile comprises a short summary:
- The alert policy name and severity
- Whether it triggered any alerts and, if so, when
- The option to silence or resume alerts
Depending on alert status, alert policies are marked as follows:
Icon | Status |
---|---|
(icon) | Currently alerting |
(icon) | Alert resolved |
(icon) | Alert silenced |
No icon | No alert triggered |
Click the required alert policy name to open its details.
To silence an alert, click Silence in its tile. The alert you originally intended to silence is marked for silencing. Select the silence duration and click Silence to confirm.
Under the Silenced alerts section, the currently silenced alerts are listed. You can resume currently silenced alerts, silence any other alerts, or resume all alerts at once.
Under the tiles, the Alerts list shows alerts triggered per SLO objective. Nobl9 displays only the 1000 most recent alert events.
By default, you see alerts that have been active within your current time window. The newest alerts are displayed first.
You can filter the list by the following criteria:
- Alert status: All alerts, Triggered, Resolved
- SLO objective name
- Alert policy name
- Time window
When you filter by two or more criteria, the results satisfy all of them: Nobl9 applies the `AND` logical operator.
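The combined filtering can be pictured with the sketch below. The `Alert` class, its field names, and the sample data are hypothetical and do not reflect Nobl9's API; the code only shows how several optional criteria combine with `AND`, with the newest alerts listed first.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Alert:
    status: str        # for example "Triggered" or "Resolved"
    objective: str     # SLO objective name
    policy: str        # alert policy name
    triggered_at: datetime


def filter_alerts(
    alerts: list[Alert],
    status: Optional[str] = None,
    objective: Optional[str] = None,
    policy: Optional[str] = None,
) -> list[Alert]:
    """Keep alerts matching every criterion that is set, newest first."""
    matches = [
        a for a in alerts
        if (status is None or a.status == status)
        and (objective is None or a.objective == objective)
        and (policy is None or a.policy == policy)
    ]
    return sorted(matches, key=lambda a: a.triggered_at, reverse=True)


alerts = [
    Alert("Triggered", "latency", "fast-burn", datetime(2024, 5, 1, 10)),
    Alert("Resolved", "latency", "fast-burn", datetime(2024, 5, 1, 9)),
    Alert("Triggered", "availability", "slow-burn", datetime(2024, 5, 2, 8)),
    Alert("Triggered", "latency", "slow-burn", datetime(2024, 5, 3, 7)),
]
result = filter_alerts(alerts, status="Triggered", objective="latency")
print([a.policy for a in result])  # ['slow-burn', 'fast-burn']
```

Leaving a criterion unset corresponds to not filtering on it, so calling `filter_alerts(alerts)` returns every alert in newest-first order.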
Click the required alert to check its details.
Nobl9 returns alerts only for existing alert policies, SLOs, services, and objectives. So, Nobl9 won't return alerts in the following situations:
- If you delete an SLO, alert policy, or service
- If you delete an SLO, alert policy, or service and recreate it with the same name
- If you unlink an alert policy from an SLO