Service level objectives


Our services may be small or incredibly deep and complex, but almost without fail these services can no longer be properly understood via the logs or stack traces we have depended on in the past. With this shift, we need not just new types of telemetry, but also new approaches for using that telemetry.

from Implementing Service Level Objectives
A Practical Guide to SLIs, SLOs & Error Budgets by Alex Hidalgo

The core concept of performance tracking is a service level objective (an SLO). It refers to the desired performance of your service—the level you consider acceptable. In other words, you use SLOs to measure the reliability of your service.

SLOs exist along with two other concepts:

Service level indicators (SLIs): quantifiable metrics that measure a specific aspect of your service's performance level.
To assess your service performance, you monitor whether your SLIs satisfy your SLO.
Error budget: the acceptable number of failures you can have while still achieving your desired performance target.
It shows how close your reliability is to your SLO over a given period of time.
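
As a rough illustration of the error budget idea, the sketch below computes a budget from counts of total and bad events. The function name `error_budget` and the sample numbers are assumptions made for this example; Nobl9's own budgeting methods may calculate the value differently.

```python
def error_budget(target: float, total_events: int, bad_events: int) -> dict:
    """Return the allowed failures and the share of the budget still left.

    target is the desired reliability as a fraction, e.g. 0.99 for 99%.
    """
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1 (exclusive)")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    # Failures the target tolerates for this volume of events.
    allowed_bad = (1 - target) * total_events
    # Fraction of the budget not yet consumed (negative when overspent).
    remaining = 1 - bad_events / allowed_bad
    return {"allowed_bad": allowed_bad, "remaining_fraction": remaining}


budget = error_budget(target=0.99, total_events=100_000, bad_events=250)
print(budget)
```

With a 99% target and 100,000 events, roughly 1,000 failures are tolerated, so 250 failures leave about 75% of the budget.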

Building on these concepts, SLO units are the number of unique error budgets that Nobl9 calculates using the following:

Data received from your data source
Your configured target

This means every SLO is connected to a data source and has at least one error budget. Every additional SLO target is considered an additional error budget.
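
The counting rule can be sketched as follows. The SLO names and targets are hypothetical, and `count_slo_units` is an illustrative helper rather than anything Nobl9 provides; it only assumes that each configured target corresponds to one error budget.

```python
# Hypothetical inventory: each SLO maps to the targets of its objectives.
slos = {
    "checkout-latency": [0.99, 0.95],
    "login-availability": [0.999],
}


def count_slo_units(slo_targets: dict[str, list[float]]) -> int:
    """Count one unit (one error budget) per configured target."""
    return sum(len(targets) for targets in slo_targets.values())


print(count_slo_units(slos))  # 3
```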

Nobl9 simplifies SLO development and management with its comprehensive features, from integrating your preferred data source, through SLO creation, to alerting and reports.

Using SLOs, you measure individual aspects of your service—the latency of authorization, the number of successful registrations, or anything else you need to monitor.

When you need to monitor the reliability of your complex system end-to-end, you can assemble multiple SLOs into a single composite.

Create an SLO

You can create SLOs in Nobl9 using two main methods:

The Nobl9 web application:
- Follow the steps in the SLO wizard, accessible from the SLO grid.
- Use the SLI Analyzer.
SLOs as code:
- sloctl
- Nobl9 Terraform provider

SLO name vs. display name

An SLO's name is its unique identifier and cannot be changed in the Nobl9 web application after it is created. This is different from the display name, which you can edit at any time.
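
Because the name cannot be changed later, it can help to check it before creating the SLO. The sketch below assumes that names follow RFC 1123 DNS label rules (lowercase alphanumerics and hyphens, at most 63 characters); this is an assumption for illustration, so confirm the exact rules in the Nobl9 documentation. `is_valid_name` is a hypothetical helper.

```python
import re

# Assumption: names follow RFC 1123 DNS label rules.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_name(name: str) -> bool:
    """Check a candidate name against the assumed DNS label rules."""
    return len(name) <= 63 and _DNS_LABEL.fullmatch(name) is not None


print(is_valid_name("checkout-latency"))   # True
print(is_valid_name("Checkout Latency"))   # False
```

A free-form label such as "Checkout Latency" fits the display name instead.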

When you configure an SLO, you must specify an SLI—a metric for Nobl9 to pull from your data source. Depending on the source, the SLI is defined either as a query or through a set of parameters.

The table below provides examples of common SLIs for different types of services.

Service type	Example SLI (what to measure)
A web service or API	HTTPS responses with 2xx and 3xx status codes
A queue consumer	Successful processing of a message
Serverless and function-based architectures	Successful completion of an invocation
A batch process	Normal exit of the driving process or script (e.g., exit code 0)

Example SLIs by service type
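
To make the first row of the table concrete, here is a minimal sketch of a ratio SLI for a web service, counting 2xx and 3xx responses as good. The function `good_ratio` and the sample status codes are made up for illustration; in practice the data source computes this through its own query.

```python
def good_ratio(status_codes: list[int]) -> float:
    """Share of responses counted as good (2xx and 3xx status codes)."""
    if not status_codes:
        raise ValueError("no responses to evaluate")
    good = sum(1 for code in status_codes if 200 <= code < 400)
    return good / len(status_codes)


sample = [200, 201, 301, 404, 500, 200, 204, 302]
print(good_ratio(sample))  # 0.75
```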

Create a composite SLO

You can assemble existing SLOs into a composite SLO to gain a unified view of your system's reliability and performance. This approach streamlines reliability monitoring by assessing individual components from a single entry point. It provides a consolidated perspective, even for complex systems with diverse components, data sources, reporting frequencies, and criticality levels.

Refer to the Composite SLOs guide for step-by-step instructions.

The Nobl9 Terraform provider documentation and YAML guide provide the details on how to create composite SLOs leveraging the SLOs-as-code approach.

SLOs section

Once created, your SLO appears under the Service level objectives section in the Nobl9 web application.

From here, you can view and organize SLOs as follows:

See all SLOs contained in projects you can access.
Switch between the grid and list views.
Search and filter SLOs.

Service level objectives section — Switching between the grid and list layouts in the SLOs section

The available actions depend on the section layout.

Grid view:

Analyze charts
- View SLOs from the perspective of different time windows and time zones.
  The default current time window option is SLO-specific. When you select another time window, all SLO cards display charts for your selected range.
- Pause the time window live update.
  When the time window is paused, the Error budget remaining and Burn rate values are hidden. Return to the live view to see them again.
Maintain context
- By default, SLO states are shown for your current time zone.
  When you select a different time zone on this page, it will remain active when you navigate to an individual SLO's details page.

List view:

Add, remove, and rearrange columns.
For this, click (customize columns) next to the search bar.

Customize columns — Customizing columns on the list layout

To arrange columns, move them up or down.

Troubleshooting SLOs

If you suspect an issue with an SLO, first verify its underlying query. If the query is correct, the problem may lie with the data source itself. In this case, you can activate event logs if the SLO's data source is connected using the direct connection method. For data sources connected with the agent method, check the agent's metrics.

Read more about the data source connection methods.

SLO details

Click the required SLO to open its details. Here you can manage and assess your SLO.

The following options are available in your SLO header row:

Button	Action	Notes
	Expand SLO metadata	SLO metadata includes the following: SLO name and history; Time window parameters; Error budget calculation method (budgeting method); Data source, its configuration method, and the query parameters; No data anomaly alert details, if any; Any additional info like labels or links
	Copy link to the SLO	The link is copied along with the time window and time zone
	Edit the SLO	The SLO wizard opens
	Open the Options menu	Charts¹: hide and show chart types and change their display order; Annotations: show and hide system and user annotations
	Open the More actions menu	Run Replay²: replay the SLO for the required time range; Silence alerts: silence alerts for the required time range, or resume all alerts if any are silenced; View YAML: access the SLO YAML configuration (click Edit to modify it); Copy SLO: create a copy of the SLO, optionally in a different project or service; Delete: delete the SLO

SLO handling options table

¹Charts settings apply to all SLOs per user. Once you set the chart visibility for one SLO, you'll see the same for all SLOs. When no chart is selected, the message Select at least one chart to see data appears in place of charts.
²Run Replay is unavailable while the SLO is being replayed or when the maximum period for historical data retrieval for its data source is set to 0.

You can also handle the time window parameters:

Shift the SLO time window, change the time zone, and copy the time window
The default time zone matches the time zone set in the SLO grid
Zoom in on SLO charts to access raw SLI data

Zooming in

To zoom in on a specific time range, click and drag on the desired area of the chart. You can drag in both directions.

Highlights

The SLO details page comprises five tabs. They organize the focus areas of your SLO and are as follows:

Overview focuses on the primary objective
Objectives contains summaries and charts for all objectives in your SLO
Alerts comprises your SLO-related alerts
Change history shows the history of changes made to your SLO
Annotations displays annotations added to your SLO

Once you open the SLO details, you land on the Overview tab. It highlights the reliability values and charts of the primary objective of your SLO.

The primary objective is an objective that takes center stage on the SLO details page. You can access its detailed information immediately upon opening your SLO details.
The primary objective is labeled Primary objective and can be changed at any time.

The tiles on the Overview and Objectives tabs display the reliability values of the objective currently in focus. The tiles provide a snapshot of the most recent data, focusing on the last seven days of the chosen time window. Here, you can find the following:

The error budget remaining (in percent)
The burn rate
The reliability target, along with the current reliability value
The number of active alerts within the current time window

Below the tiles, the charts visualize the reliability parameters of the objective currently in focus. The charts cover the entire time window selected and include the following:

The error budget remaining
The reliability burn down
The service level indicator
The error budget burn rate
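
For readers new to burn rate, the sketch below uses the commonly cited definition: the observed error rate divided by the error rate the target allows. This is an assumption about the general concept, not a statement of how Nobl9 computes the chart values, and `burn_rate` is a hypothetical helper.

```python
def burn_rate(target: float, good_events: int, total_events: int) -> float:
    """Observed error rate divided by the error rate the target allows."""
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1 (exclusive)")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_error_rate = 1 - target
    observed_error_rate = 1 - good_events / total_events
    return observed_error_rate / allowed_error_rate


# 2% of requests failed against a 99% target (1% of failures allowed).
rate = burn_rate(target=0.99, good_events=980, total_events=1000)
print(round(rate, 2))  # 2.0
```

Under this definition, a value of 1 means the budget would be used up exactly at the end of the time window, and a value above 1 means it would run out sooner.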

Service level indicator (SLI) chart

The SLI chart visualizes the data points received. Its appearance varies depending on the metric type (threshold or ratio). For ratio metrics, the chart's appearance also depends on the data count method. Non-incremental ratio metric charts feature two key elements:

The option to select the display mode
Counts of good (or bad) and total events received over the selected time window

Wider time windows show aggregated (downsampled) data, where the aggregation method is also determined by the metric configuration.
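
A minimal sketch of what downsampling means in general: raw points are grouped into fixed-width buckets and each bucket is reduced to one value. Averaging is assumed here purely for illustration; the method Nobl9 applies depends on the metric configuration, and `downsample` is a hypothetical helper.

```python
from statistics import mean


def downsample(
    points: list[tuple[int, float]], bucket_seconds: int
) -> list[tuple[int, float]]:
    """Group (timestamp, value) points into fixed buckets and average each."""
    buckets: dict[int, list[float]] = {}
    for ts, value in points:
        start = ts - ts % bucket_seconds  # start of the bucket holding ts
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
print(downsample(raw, 60))  # [(0, 2.0), (60, 6.0)]
```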

SLO alerts

This tab provides the details of alert policies linked to your SLO along with any triggered alerts.

The number next to the tab name indicates how many alerts are currently active.

Tiles display the alert policies linked to your SLO and a summary of each:

The alert policy name and severity
Whether it has triggered any alerts for this SLO and, if so, when
The option to silence or resume alerts

Depending on alert status, alert policies are marked as follows:

Status	Description
	Currently alerting
	Alert resolved
	Alert silenced
No icon	No alert triggered

To silence an alert policy, click Silence in its tile. The policy you clicked is preselected for silencing. Select the silence duration and click Silence to confirm.

Under the Silenced alerts section, the currently silenced alerts are listed. You can resume currently silenced alerts, silence any other alerts, or resume all alerts at once.

Under the tiles, the Alerts list shows alerts triggered per SLO objective. Nobl9 displays only the 1,000 most recent alert events. Click the required alert to open its details.

By default, you see alerts that have been active within the SLO's current time window. The newest alerts are displayed first.

You can filter the list by the following criteria:

Alert status: All alerts, Triggered, Resolved
SLO objective name
Alert policy name
Time window

When you filter by two or more criteria, the results satisfy all of them—Nobl9 applies the AND logical operator.
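
The AND behavior can be sketched as follows. The `Alert` fields, the sample alerts, and `filter_alerts` are hypothetical and only model the filtering logic described above, not Nobl9's data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    status: str      # "Triggered" or "Resolved"
    objective: str   # SLO objective name
    policy: str      # alert policy name


def filter_alerts(
    alerts: list[Alert],
    status: Optional[str] = None,
    objective: Optional[str] = None,
    policy: Optional[str] = None,
) -> list[Alert]:
    """Keep only alerts that match every criterion that was provided."""
    return [
        a for a in alerts
        if (status is None or a.status == status)
        and (objective is None or a.objective == objective)
        and (policy is None or a.policy == policy)
    ]


alerts = [
    Alert("Triggered", "p99-latency", "fast-burn"),
    Alert("Resolved", "p99-latency", "fast-burn"),
    Alert("Triggered", "availability", "slow-burn"),
]
matches = filter_alerts(alerts, status="Triggered", objective="p99-latency")
print(len(matches))  # 1
```

Leaving a criterion unset keeps it out of the comparison, so each added criterion can only narrow the result.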

Alerts for existing resources only

Nobl9 only shows alerts for SLOs, services, and alert policies that currently exist. Alerts will no longer be available if:

The associated SLO, service, or alert policy is deleted (even if it's recreated later with the same name).
The alert policy is unlinked from the SLO.

Managing SLOs

SLO management options in Nobl9 include copying existing SLOs. To copy an SLO:

Open the details of the SLO you want to copy.
Click (more actions) > Copy SLO.

Copying SLO
Make the necessary changes: modify the name of the copy and select the target project and service.
Copying the SLO to a different project requires selecting a service, so make sure the target project contains at least one service.
Click Copy SLO to confirm.

Configuration from the original SLO	Is copied?	Notes
Data source; Replay settings; Metric settings and query; Error budget calculation settings; Labels, links, descriptions	Always copied	Data collection mirrors the original SLO.
Collected data	Never copied	Starts new data collection upon creation.
Alert policies	Depends on copy's destination	Retained if copied within the same project (any service); otherwise, unlinked.
No data anomaly alert methods	Depends on your permissions to alert methods' projects	Preserved if you have access to these projects; otherwise, unlinked.

SLO copy configuration persistence table
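
The rules in the table can be read as a small decision function. The sketch below is illustrative only: `copied_settings` and its parameter names are hypothetical and simply restate the table.

```python
def copied_settings(
    same_project: bool, can_access_alert_method_projects: bool
) -> dict[str, bool]:
    """Report which parts of the original SLO carry over to the copy."""
    return {
        # Data source, Replay settings, metric settings and query,
        # error budget calculation settings, labels, links, descriptions.
        "configuration": True,
        # The copy always starts collecting data from scratch.
        "collected_data": False,
        # Alert policies stay linked only within the same project.
        "alert_policies": same_project,
        # Depends on access to the projects of the alert methods.
        "no_data_anomaly_alert_methods": can_access_alert_method_projects,
    }


result = copied_settings(
    same_project=False, can_access_alert_method_projects=True
)
print(result["alert_policies"])  # False
```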

Useful links

For a more in-depth look, consult additional resources:

Posts about service level objectives (Nobl9 blog)
The key to reliability (Nobl9 blog)
SLO inputs and outputs (SLOs)
Use case of SLO configuration (Use cases)
Creating SLOs (Data sources in Nobl9)
SLO Terraform configuration (Nobl9 Terraform provider)
SLO creation assistance (SLI Analyzer)
Composite SLOs (Composites)
SLO annotations (Nobl9 features)
SLI aggregations (SLO guides)
Error budget (Glossary)
Alerting (Alerting)
Alert conditions (Alerting)
