
Service level objectives


Our services may be small or incredibly deep and complex, but almost without fail these services can no longer be properly understood via the logs or stack traces we have depended on in the past. With this shift, we need not just new types of telemetry, but also new approaches for using that telemetry.

from Implementing Service Level Objectives
A Practical Guide to SLIs, SLOs & Error Budgets by Alex Hidalgo

The core concept of performance tracking is a service level objective (an SLO). It refers to the desired performance of your service—the level you consider acceptable. In other words, you use SLOs to measure the reliability of your service.

SLOs exist along with two other concepts:

  • Service level indicators (SLIs): quantifiable metrics that measure a specific aspect of your service's performance.
    To assess your service's performance, you monitor whether your SLIs satisfy your SLO.
  • Error budget: the acceptable number of failures you can have while still achieving your desired performance target.
    It shows how close your reliability is to your SLO over a given period of time.
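To make the error budget idea concrete, here is a minimal Python sketch. It assumes the budget is counted in events (good versus bad), which is only one possible budgeting method; the function name and the sample numbers are illustrative and not part of Nobl9.

```python
def error_budget(target: float, total_events: int, bad_events: int) -> dict:
    """Return the allowed failures and the remaining budget for an event-based budget.

    target is a fraction, for example 0.99 for a 99% objective.
    """
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1 (exclusive)")
    # The budget is the share of events that may fail without missing the target.
    allowed_bad = (1 - target) * total_events
    remaining = allowed_bad - bad_events
    remaining_pct = 100 * remaining / allowed_bad if allowed_bad else 0.0
    return {
        "allowed_bad": allowed_bad,
        "remaining": remaining,
        "remaining_pct": remaining_pct,
    }


# Hypothetical numbers: a 99% target over 10,000 events, 25 of which failed.
budget = error_budget(target=0.99, total_events=10_000, bad_events=25)
print(budget)
```

With these sample numbers, about 100 failures are allowed, so roughly three quarters of the budget is still available.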

Given these concepts, an SLO unit refers to the number of unique error budgets Nobl9 calculates using the following:

  • Data received from your data source
  • Your configured target

This means every SLO is connected to a data source and has at least one error budget. Every additional SLO target is considered an additional error budget.
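As a rough illustration of this counting rule, the sketch below treats each configured target as one error budget and sums them across SLOs. The SLO names and targets are made up for the example.

```python
# Hypothetical SLOs mapped to their configured targets.
slos = {
    "login-latency": [0.99, 0.995],  # two targets, so two error budgets
    "signup-success": [0.999],       # one target, so one error budget
}


def count_slo_units(slos: dict[str, list[float]]) -> int:
    """Count one unit per target (error budget) across all SLOs."""
    return sum(len(targets) for targets in slos.values())


units = count_slo_units(slos)
print(units)
```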

Nobl9 simplifies SLO development and management with its comprehensive features, from integrating your preferred data source, through SLO creation, to alerting and reports.

Using SLOs, you measure individual aspects of your service—the latency of authorization, the number of successful registrations, or anything else you need to monitor.

When you need to monitor the reliability of your complex system end-to-end, you can assemble multiple SLOs into a single composite.

Create an SLO

You can create SLOs in Nobl9 using two main methods:

SLO name vs. display name

An SLO's name is its unique identifier and cannot be changed in the Nobl9 web application after it is created. This is different from the display name, which you can edit at any time.

When you configure an SLO, you must specify an SLI—a metric for Nobl9 to pull from your data source. Depending on the source, the SLI is defined either as a query or through a set of parameters.

The table below provides examples of common SLIs for different types of services.

Service type | Example SLI (what to measure)
A web service or API | HTTPS responses with 2xx and 3xx status codes
A queue consumer | Successful processing of a message
Serverless and function-based architectures | Successful completion of an invocation
A batch process | Normal exit of the driving process or script (e.g., exit code 0)
Example SLIs by service type
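For the web service row, a ratio-style SLI could be computed as in the following sketch. It assumes you already have a list of response status codes; in practice the data source query would return this. The function name and sample data are hypothetical.

```python
def http_success_ratio(status_codes: list[int]) -> float:
    """Share of responses with a 2xx or 3xx status code."""
    if not status_codes:
        raise ValueError("no responses to evaluate")
    good = sum(1 for code in status_codes if 200 <= code < 400)
    return good / len(status_codes)


# Hypothetical sample: six good responses out of eight.
responses = [200, 201, 301, 404, 500, 200, 204, 302]
sli = http_success_ratio(responses)
print(f"{sli:.2%}")
```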

Create a composite SLO

You can assemble existing SLOs into a composite SLO to gain a unified view of your system's reliability and performance. This approach streamlines reliability monitoring by assessing individual components from a single entry point. It provides a consolidated perspective, even for complex systems with diverse components, data sources, reporting frequencies, and criticality levels.

Refer to the Composite SLOs guide for step-by-step instructions.

The Nobl9 Terraform provider documentation and YAML guide provide the details on how to create composite SLOs leveraging the SLOs-as-code approach.

SLOs section

Once created, your SLO appears under the Service level objectives section in the Nobl9 web application.

From here, you can view and organize SLOs as follows:

Switching between the grid and list layouts in the SLOs section

The available actions depend on the section layout.

Grid view:

  • Analyze charts
    • View SLOs from the perspective of different time windows and time zones.
      The default current time window option is SLO-specific. When you select another time window, all SLO cards display charts for your selected range.
    • Pause the time window live update.
      When the time window is paused, the Error budget remaining and Burn rate values are hidden. Return to the live view to see them again.
  • Maintain context
    • By default, SLO states are shown for your current time zone.
      When you select a different time zone on this page, it will remain active when you navigate to an individual SLO's details page.

List view:

  • Add, remove, and rearrange columns.
    To do this, click the customize columns icon next to the search bar.
Customizing columns on the list layout

To arrange columns, move them up or down:

Moving a column up

Troubleshooting SLOs

If you suspect an issue with an SLO, first verify its underlying query. If the query is correct, the problem may lie with the data source itself. In this case, you can activate event logs if the SLO's data source is connected using the direct connection method. For data sources connected with the agent method, check the agent's metrics.

Read more about the data source connection methods.

SLO details

Click the required SLO to open its details. Here you can manage and assess your SLO.

The following options are available in your SLO header row:

  • Expand SLO metadata. The metadata includes the following:
    • SLO name and history
    • Time window parameters
    • Error budget calculation method (budgeting method)
    • Data source, its configuration method, and the query parameters
    • No data anomaly alert details, if any
    • Any additional information, such as labels or links
  • Copy link to the SLO. The link is copied along with the time window and time zone.
  • Edit the SLO. The SLO wizard opens.
  • Open the Options menu:
    • Charts [1]: hide and show chart types and change their display order
    • Annotations: show and hide system and user annotations
  • Open the More actions menu:
    • Run Replay [2]: replay the SLO for the required time range
    • Silence alerts: silence alerts for the required time range, or resume all alerts if any are silenced
    • View YAML: access the SLO YAML configuration. Click Edit to modify it
    • Copy SLO: create a copy of the SLO, optionally placing it in a different project or service
    • Delete: delete the SLO
SLO handling options

    [1] Chart settings apply to all SLOs per user: once you set the chart visibility for one SLO, the same setting applies to all SLOs you view. When no chart is selected, the message Select at least one chart to see data appears in place of the charts.
    [2] Run Replay is inactive while the SLO is being replayed or when the maximum period for historical data retrieval for its data source is set to 0.

    You can also handle the time window parameters:

    • Shift the SLO time window, change the time zone, and copy the time window.
      The default time zone matches the time zone set in the SLO grid view.
    • Zoom in on SLO charts to access raw SLI data.

    Zooming in

    To zoom in on a specific time range, click and drag on the desired area of the chart. You can drag in both directions.

    Zooming in chart left to right

    Highlights

    The SLO details page comprises five tabs. They organize the focus areas of your SLO and are as follows:

    • Overview focuses on the primary objective
    • Objectives contains summaries and charts for all objectives in your SLO
    • Alerts comprises your SLO-related alerts
    • Change history shows the history of changes made to your SLO
    • Annotations displays annotations added to your SLO

    Once you open the SLO details, you land on the Overview tab. It highlights the reliability values and charts of the primary objective of your SLO.

    Overview

    The primary objective is an objective that takes center stage on the SLO details page. You can access its detailed information immediately upon opening your SLO details.
    The primary objective is labeled with the Primary objective badge and can be changed at any time.

    The tiles on the Overview and Objectives tabs display the reliability values of the objective currently in focus. The tiles provide a snapshot of the most recent data, focusing on the last seven days of the chosen time window. Here, you can find the following:

    • The error budget remaining (in percent)
    • The burn rate
    • The reliability target, along with the current reliability value
    • The number of active alerts within the current time window

    Below the tiles, the charts visualize the reliability parameters of the objective currently in focus. The charts cover the entire time window selected and include the following:

    • The error budget remaining
    • The reliability burn down
    • The service level indicator
    • The error budget burn rate
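As a rough guide to reading the burn rate, the sketch below uses a common definition: the observed error rate divided by the error rate the target allows. This is an assumption for illustration; the exact calculation Nobl9 applies may differ, and the function name and numbers are hypothetical.

```python
def burn_rate(target: float, good_events: int, total_events: int) -> float:
    """Observed error rate divided by the error rate allowed by the target.

    A value of 1 means the budget is being spent exactly at the allowed pace;
    values above 1 mean it will run out before the time window ends.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1 (exclusive)")
    observed_error_rate = 1 - good_events / total_events
    allowed_error_rate = 1 - target
    return observed_error_rate / allowed_error_rate


# Hypothetical numbers: 3% of events failed against a 99% target.
rate = burn_rate(target=0.99, good_events=970, total_events=1000)
print(round(rate, 2))
```

In this example the budget is being consumed about three times faster than the target allows.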

    Service level indicator (SLI) chart

    The SLI chart visualizes the data points received. Its appearance varies depending on the metric type (threshold or ratio). For ratio metrics, the chart's appearance also depends on the data count method. Non-incremental ratio metric charts feature two key elements:

    • The option to select the display mode
    • Counts of good (or bad) and total events received over the selected time window

    Wider time windows show aggregated (downsampled) data, where the aggregation method is also determined by the metric configuration.

    Read more about SLI aggregations.
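To illustrate what downsampling means here, the following sketch groups raw data points into fixed-size time buckets and averages each bucket. Averaging is an assumption made for the example; the actual aggregation method depends on the metric configuration, and all names are hypothetical.

```python
from statistics import mean


def downsample(points: list[tuple[int, float]], bucket_seconds: int) -> list[tuple[int, float]]:
    """Average (timestamp, value) points into buckets of bucket_seconds each."""
    buckets: dict[int, list[float]] = {}
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = ts - ts % bucket_seconds
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical raw points every 30 seconds, reduced to one point per minute.
raw = [(0, 100.0), (30, 200.0), (60, 300.0), (90, 500.0)]
coarse = downsample(raw, bucket_seconds=60)
print(coarse)
```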

    Non-incremental ratio metric SLI chart

    Under every objective name, you can find its summary:

    • Target: the percentage of acceptable performance you're aiming for.
    • Total error budget: how much error budget this objective has within the time window.
    • Value: the threshold you consider acceptable for this objective, compared using one of the operators: less than, less than or equal to, greater than, greater than or equal to.
    • Type: the metric type, either ratio or threshold.
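The Value setting for a threshold metric can be pictured with the sketch below, which checks each data point against the value using the chosen operator. The short operator keys (lt, lte, gt, gte) and the sample latencies are illustrative labels for this example only.

```python
import operator

# Illustrative keys for the four comparison operators listed above.
OPERATORS = {
    "lt": operator.lt,    # less than
    "lte": operator.le,   # less than or equal to
    "gt": operator.gt,    # greater than
    "gte": operator.ge,   # greater than or equal to
}


def good_fraction(samples: list[float], op: str, value: float) -> float:
    """Share of samples that satisfy `sample <op> value`."""
    if not samples:
        raise ValueError("no samples to evaluate")
    compare = OPERATORS[op]
    good = sum(1 for sample in samples if compare(sample, value))
    return good / len(samples)


# Hypothetical latencies in milliseconds; a point is good when it is <= 200 ms.
latencies_ms = [120.0, 180.0, 200.0, 250.0]
fraction = good_fraction(latencies_ms, "lte", 200.0)
print(fraction)
```

Comparing the resulting fraction with the Target tells you whether the objective is being met for the evaluated data.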

    You can also view the objective's underlying metric settings.

    To access general SLO metadata, click the unfold icon before the SLO name. The metadata block includes the following:

    • Parent project, service, and data source
    • SLO history: who created this SLO and the dates of creation and last update

    Understanding the visuals
    1. For newly created SLOs, and when no primary objective is set for an SLO, Nobl9 displays the lowest-target objective under the Overview tab.

    2. The reliability target always shows the current value, regardless of the time window selected.

    3. The Active alerts tile always shows the real-time number of active alerts.
      Pausing the SLO also pauses the live updates of active alerts. In this case, the tile shows alerts that were active at the moment of pausing.

    4. For time windows shorter than seven days, the tiles and charts capture the entire time window.

    5. Reliability target changes.
      Values in both tiles are calculated based on the most recent data within the selected time window.
      Because the Target value always shows the current target, raising it can briefly turn the Reliability tile red even when the reliability value is sufficient and enough error budget remains. The reverse also applies: when reliability is low and little error budget remains, the Reliability tile can turn green if you lower the target far enough.
      This can happen for a relatively short time after you modify the reliability target, because Nobl9 recalculates the values in both tiles against the new target only once the next data arrives.
      This time range remains in the SLO history, so when you rewind the time window, you will still see it unless you change the target again or replay the SLO [1].
      [1] The per-data-source limit on the maximum period for historical data retrieval applies.

    SLO alerts

    This tab provides the details of alert policies linked to your SLO along with any triggered alerts.

    The number next to the tab name indicates how many alerts are currently active.

    The Alerts tab

    Tiles display the alert policies linked to your SLO, each with a summary:

    • The alert policy name and severity
    • Whether it has triggered any alerts for this SLO and, if so, when
    • The option to silence or resume alerts

    Depending on alert status, alert policies are marked as follows:

    • Currently alerting
    • Alert resolved
    • Alert silenced
    • No alert triggered (shown with no icon)

    To silence an alert policy, click Silence in its tile. The policy you originally intended to silence is marked for silencing. Select the silence duration and click Silence to confirm.

    Under the Silenced alerts section, the currently silenced alerts are listed. You can resume currently silenced alerts, silence any other alerts, or resume all alerts at once.

    Silence alerts

    Under the tiles, the Alerts list shows alerts triggered per SLO objective. Nobl9 displays only the 1000 most recent alert events. Click the required alert to open its details.

    By default, you see alerts that have been active within the SLO's current time window. The newest alerts are displayed first.

    You can filter the list by the following criteria:

    • Alert status: All alerts, Triggered, Resolved
    • SLO objective name
    • Alert policy name
    • Time window

    When you filter by two or more criteria, the results satisfy all of them—Nobl9 applies the AND logical operator.
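The AND behavior can be illustrated with a small sketch: an alert is kept only when it matches every criterion supplied. The Alert class, its fields, and the sample values are hypothetical and do not reflect the Nobl9 data model.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    status: str
    objective: str
    policy: str


def filter_alerts(alerts: list[Alert], **criteria: str) -> list[Alert]:
    """Keep alerts that match every supplied criterion (logical AND)."""
    return [
        alert
        for alert in alerts
        if all(getattr(alert, field) == wanted for field, wanted in criteria.items())
    ]


# Hypothetical alerts for two objectives and two alert policies.
alerts = [
    Alert("Triggered", "latency-p95", "fast-burn"),
    Alert("Resolved", "latency-p95", "fast-burn"),
    Alert("Triggered", "availability", "slow-burn"),
]
matches = filter_alerts(alerts, status="Triggered", objective="latency-p95")
print(matches)
```

Adding a criterion can only narrow the result, never widen it.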

    Alerts for existing resources only

    Nobl9 only shows alerts for SLOs, services, and alert policies that currently exist. Alerts will no longer be available if:

    • The associated SLO, service, or alert policy is deleted (even if it's recreated later with the same name).
    • The alert policy is unlinked from the SLO.

    Managing SLOs

    SLO management options in Nobl9 include copying existing SLOs. To copy an SLO:

    1. Open the details of the SLO you want to copy.

    2. Click the More actions icon > Copy SLO.

      Copying SLO
    3. Make the necessary amendments: modify your copy name, select the required project and service.
      Copying the SLO to a different project requires service selection. Make sure your required project contains at least one service.

    4. Click Copy SLO to confirm.

    Configuration from the original SLO | Is copied? | Notes
    Data source; Replay settings; Metric settings and query; Error budget calculation settings; Labels, links, descriptions | Always copied | Data collection mirrors the original SLO.
    Collected data | Never copied | Starts new data collection upon creation.
    Alert policies | Depends on copy's destination | Retained if copied within the same project (any service); otherwise, unlinked.
    No data anomaly alert methods | Depends on your permissions to alert methods' projects | Preserved if you have access to these projects; otherwise, unlinked.
    SLO copy configuration persistence table
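The copy rules in the table above can be summarized as a small decision function. This is only a sketch of the documented rules; the function name, parameters, and result keys are hypothetical.

```python
def what_gets_copied(
    source_project: str,
    target_project: str,
    can_access_alert_method_projects: bool,
) -> dict[str, bool]:
    """Return which parts of an SLO carry over to its copy."""
    return {
        # Data source, replay, metric and budget settings, labels, links, descriptions.
        "configuration": True,
        # The copy always starts collecting data from scratch.
        "collected_data": False,
        # Alert policies stay linked only within the same project.
        "alert_policies": source_project == target_project,
        # Alert methods are preserved only if you can access their projects.
        "no_data_anomaly_alert_methods": can_access_alert_method_projects,
    }


# Hypothetical copy into a different project without access to the alert methods' projects.
result = what_gets_copied("payments", "checkout", can_access_alert_method_projects=False)
print(result)
```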
    For a more in-depth look, consult additional resources: