System Health Review report


The System Health Review Report provides a simplified, accessible tool for reporting reliability and performance data. This report bridges the gap between basic and highly configurable reporting options, making it easy to present critical information about the overall health of the system. Designed for recurring reliability check-in meetings, it provides an efficient way to monitor and present service reliability and performance over time.

The report targets system administrators, technical managers, and upper management. It provides clear, concise data to monitor and manage system reliability and performance, offering a straightforward view of service metrics and key performance indicators to support informed decision-making.

Report overview

The System Health Review report supports recurring reliability check-ins. It groups your Nobl9 SLOs by projects or services (rows) and by labels of your choice (columns), and presents their remaining error budget in a table.

The report gives both aggregated and detailed views of the overall system health for SLOs in the selected projects or services. The four color-coded categories reflect the remaining error budget of the SLOs:

Healthy SLOs

This icon and color indicate SLOs with enough error budget left within their time window.

Exhausted SLOs

This icon and color indicate SLOs whose error budget is fully exhausted within their time window.

SLOs at risk

This icon and color indicate SLOs whose error budget is at risk of being fully exhausted within their time window.

No data SLOs

This icon and color indicate SLOs that haven't collected any data.

The report also aggregates system health information from the selected SLOs to provide quick glimpses into the performance of your system.

Learn more about aggregation in the report.

Depending on the grouping option set for the report, the identifier column contains project or service names, with the grouping indicator in its header cell.

Additionally, you can sort the table by the SLOs' remaining error budget. To do so, click the required column header.

SLOs in each column meet two criteria:

They reside in the corresponding project or service.
They are marked with the labels selected for that column.

To view the labels assigned to a column, hover the cursor over its header.

Depending on the reporting time frame, the report can be:

Real-time

Real-time reports reflect the state of selected SLOs at the latest data point received by Nobl9 within the last hour. Choose this option to review the most up-to-date state of your system.

Retrospective

Retrospective reports show the historic state of selected SLOs. You can also define a recurrence rule (rrule) to update the historic data in the report regularly. Retrospective reports also display SLO health trends.

SLO health trends

SLO health trends represent the remaining error budget metric trends for the latest reporting period.

For example, in reports generated every Monday (available with an rrule), the trend reflects the change between the previous Monday and the current one.

Aggregated columns and rows

The table displays aggregated information about system health based on the remaining error budget of the SLOs you selected.

Each cell contains the percentage of SLOs per status: healthy, at risk, exhausted, and without data. In some cases, cells may show no matching SLOs, indicating that a given project or service doesn't include SLOs marked with a given column's labels.

The At risk and SLOs without data categories are optional; you set their visibility when creating the report.

The table also shows the percentage of SLOs in each status at three levels:

Overall: The percentage of SLOs in the report
Horizontally: The percentage of SLOs in a particular row (project or service) across all columns
Vertically: The percentage of SLOs in a particular column across all rows (projects or services)

Example

The yellow cell indicates that 27% of SLOs in this report are exhausted, 61% are healthy, and there's no data for 11% of SLOs.
The turquoise row indicates that, for example, the United Kingdom has 31% exhausted SLOs, 56% healthy SLOs, and 6% of SLOs that report no data over the reporting period.
The purple cell in the first column indicates that the responses project includes 25% exhausted SLOs, 50% healthy, and 25% with no data.

SLO aggregation in columns and rows
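The percentage calculation can be sketched in Python as follows. This is a minimal illustration, not Nobl9's implementation: it assumes each SLO placed in a cell is a hypothetical CellEntry record whose status is already known, counts an SLO once per cell it appears in, and rounds to whole percentages, so totals can be off by a point.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CellEntry:
    """Hypothetical record: one SLO placed in one table cell."""
    row: str     # project or service name, depending on the row grouping
    column: str  # column display name
    status: str  # "healthy", "at_risk", "exhausted", or "no_data"


def percentages(entries):
    """Percentage of entries per status, rounded to whole numbers.

    An empty result corresponds to a cell with no matching SLOs.
    """
    counts = Counter(entry.status for entry in entries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: round(100 * n / total) for status, n in counts.items()}


def aggregate(entries):
    """Compute overall, per-row, per-column, and per-cell percentages."""
    rows = sorted({e.row for e in entries})  # rows are alphabetical by default
    columns = sorted({e.column for e in entries})
    return {
        "overall": percentages(entries),
        "rows": {r: percentages([e for e in entries if e.row == r]) for r in rows},
        "columns": {c: percentages([e for e in entries if e.column == c]) for c in columns},
        "cells": {
            (r, c): percentages([e for e in entries if e.row == r and e.column == c])
            for r in rows
            for c in columns
        },
    }
```

A row with four SLOs (one exhausted, two healthy, one without data) yields 25% exhausted, 50% healthy, and 25% no data, the same split as the purple cell described above.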

Key points and limitations

The rows are sorted in alphabetical order by default.
The report doesn't display empty projects or services (with no child SLOs).
When you assign more than one label to a column and the assigned labels have different keys, the report applies the AND logical operator to SLOs, meaning it displays SLOs marked with all assigned labels.
When several labels in a column have the same key, the report applies the OR logical operator to SLOs, meaning it displays SLOs marked with any of the assigned labels.
Cells display no matching SLOs when no SLO matches column filters, meaning a given project or service doesn't hold SLOs marked with the labels you assigned to a given column.
Adding labels to SLOs or removing them dynamically updates the reports that include these labels, as long as the labeled SLOs belong to services and projects within the report scope.
sloctl only: Every resource defined at the filters level must have its parent resource specified. For example, to add a service, specify its parent project in the service entry (filters.services[n].project).
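The sloctl filter rules (at least one filtering level, and a parent project for every service and SLO) can be checked locally before you apply a file. A minimal sketch, assuming the filters section has already been loaded into a Python dict, for example with a YAML parser; validate_filters is a hypothetical helper, not part of sloctl:

```python
def validate_filters(filters: dict) -> list:
    """Return problems found in a spec.filters mapping (hypothetical helper)."""
    problems = []
    # At least one filtering level is required: projects, services, or slos.
    if not any(filters.get(level) for level in ("projects", "services", "slos")):
        problems.append("define at least one of: projects, services, slos")
    # Every service and SLO entry must name itself and its parent project.
    for level in ("services", "slos"):
        for i, item in enumerate(filters.get(level) or []):
            if not item.get("name"):
                problems.append(f"filters.{level}[{i}] is missing name")
            if not item.get("project"):
                problems.append(f"filters.{level}[{i}] is missing its parent project")
    return problems
```

An empty list means the filters section follows these rules; it doesn't check whether the named projects, services, or SLOs exist in your organization.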

Matching SLOs example

Assume your report includes:

Projects: Latency and Responses
Columns: Location and Shopping cart with labels
- Labels in Location:
  - region: europe
  - region: usa
- Labels in Shopping cart:
  - delivery: fast
  - subscribe: yes

The report applies the filters as follows:

Projects	Location	Shopping cart
Latency	SLOs from the Latency project marked with either `region: europe` or `region: usa`	SLOs from the Latency project marked with `delivery: fast` and `subscribe: yes`
Responses	SLOs from the Responses project marked with either `region: europe` or `region: usa`	SLOs from the Responses project marked with `delivery: fast` and `subscribe: yes`
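The matching rules from this example can be sketched in Python. This is an illustration under stated assumptions: SLO labels are modeled as a map from a key to a list of values, and the project names and SLO names are hypothetical. The helper applies OR across values of the same key and AND across different keys:

```python
def matches_column(slo_labels: dict, column_labels: dict) -> bool:
    """True if the SLO carries the column's labels.

    Values under the same key are ORed; different keys are ANDed.
    """
    return all(
        any(value in slo_labels.get(key, []) for value in values)
        for key, values in column_labels.items()
    )


def cell_slos(slos: list, row_name: str, column_labels: dict, row_key: str = "project") -> list:
    """Names of SLOs in a given row (project or service) that match a column."""
    return [
        slo["name"]
        for slo in slos
        if slo[row_key] == row_name and matches_column(slo["labels"], column_labels)
    ]


# Columns from the example above.
location = {"region": ["europe", "usa"]}
shopping_cart = {"delivery": ["fast"], "subscribe": ["yes"]}

# Hypothetical SLOs in the Latency project.
slos = [
    {"name": "api-latency", "project": "latency", "labels": {"region": ["europe"]}},
    {"name": "checkout-latency", "project": "latency",
     "labels": {"delivery": ["fast"], "subscribe": ["yes"]}},
    {"name": "cart-latency", "project": "latency", "labels": {"delivery": ["fast"]}},
]
```

Here cell_slos(slos, "latency", shopping_cart) returns only checkout-latency: cart-latency has `delivery: fast` but lacks `subscribe: yes`, so the AND condition fails.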

Numbers to remember:

The maximum allowed number of columns is 30.
Retrospective reports can provide historical data for up to two years.
The maximum allowed frequency for recurring reports is DAILY.

Impact of query delay

If any of your SLOs use a data source with an extended query delay, real-time (latest) reports reflect the state of your system delayed by the query delay configured for that data source.

Creating the System Health Review report

You can create the System Health Review report on the Nobl9 Web or by applying a YAML configuration with sloctl.

Nobl9 Web
YAML configuration for sloctl

On the Nobl9 Web, go to Reports.
Click the button to add a new report.

Step 1: Name report and choose its type

Enter the display name for your report.
You can edit it at any time.
Select the System Health Review report type.

Step 2: Filter resources

The resources you select define the scope of your report.

Select at least one project, service, or service level objective to be included in your report.
These fields are interdependent: selecting a project defines the list of available services and SLOs, and selected SLOs narrow down the list of available services and projects.
Optional: Select labels to add more resources to your report.

Step 3: Define report layout

Your report layout depends on the row grouping and what's included in the columns. The table in this step is a visual guide only. It doesn't reflect the specific projects or services you've chosen. It shows only the structure your report will follow once you've customized the columns.

Set Row grouping:
- by project to break down SLOs per project
- by service to organize SLOs in the table per service
Add columns to the table.
- You can add up to 30 columns per report
- To delete a column, hover over it.
  
  Adding and removing columns
Click each column to enter its name and add labels.

Adding a name and labels to a column
- Labels define which SLOs appear in a given column
- You can select all labels available in your organization
- Specify as many labels as you need
- SLOs appear in the report as follows:
  - Labels with the same key and different values: SLOs that appear in this column are marked with any of these labels.
  - Labels with different keys: SLOs that appear in this column are marked with all the assigned labels.

Step 4: Configure thresholds

Thresholds define report categories: Exhausted, At risk, and Healthy.

Specify the remaining error budget values that define Exhausted and Healthy SLOs.
- SLOs with remaining error budget between these values fall into the At risk category.
- You can reset the thresholds to their default values set by your organization admin for the Service Health Dashboard.
Set the visibility of the At risk category and SLOs that report no data.
- Hide At risk and set the same values for Exhausted and Healthy, so your report includes only two categories.
- Select SLOs without data to include these SLOs in your report; deselect it to hide them.

Step 5: Select reporting time

You can create a one-time report based on the latest data, or a retrospective report. Retrospective reports can be one-time or recurring.

To create the report based on the latest data, select Real-time and specify the time zone.

For Retrospective, do the following:

Set the date, time, and time zone: your report will show the state of your selected SLOs as of the moment you specified.
Specify the Repeat rule for your report:
- With any option except for Don't repeat, Nobl9 will update the report as frequently as you select.
- Select Custom when no repeat option fits your needs:
  - Enter your custom recurrence rule in the iCalendar format or use the rrule generator.
  - Don't specify the date, time, and time zone in the rrule; you've already set them.
  - The highest allowed recurrence frequency is DAILY.

Learn more about RRULE

To create the report with sloctl, apply the following YAML definition:

General YAML sample for the System Health Review report
apiVersion: n9/v1alpha
kind: Report
metadata:
  name: string # Mandatory
  displayName: string # Optional
spec:
  shared: true # Optional, boolean, defaults to false
  filters: # You must define at least one level of filtering from these three: projects, services, slos
    projects:
      - project-1 # Mandatory if filtering by projects
    services:
      - name: string # Mandatory if filtering by services
        project: string # Mandatory if filtering by services
    slos:
      - name: string # Mandatory if filtering by slos
        project: string # Mandatory if filtering by slos
    labels:
      "key_1":
        - "value_1"
        - "value_2"
      "key_2":
        - "value_1"
        - "value_2"
  systemHealthReview:
    timeFrame:
      snapshot:
        point: enum # Mandatory. One of: past | latest
        dateTime: YYYY-MM-DDThh:mm:ssZ # Mandatory if point is past
        rrule: string # Optional recurrence rule definition. Use only if point is past
      timeZone: "America/New_York" # Mandatory
    rowGroupBy: enum # Mandatory. One of: project | service
    columns:
    - displayName: string # Mandatory
      labels: # Mandatory
        "key_1":
          - "value_1"
          - "value_2"
        "key_2":
          - "value_1"
          - "value_2"
    - displayName: string # Mandatory
      labels: # Mandatory
        "key_3":
        - "value_1"
    thresholds:
      redLte: float # Mandatory
      greenGt: float # Mandatory
      showNoData: false # Optional, boolean
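Before applying the definition, you might check the systemHealthReview section against the rules in the field descriptions. The sketch below is a hypothetical local check, not a Nobl9 or sloctl feature. It assumes the section is loaded into a Python dict and, following the YAML sample, that timeZone sits directly under timeFrame:

```python
MAX_COLUMNS = 30  # maximum number of columns per report


def validate_system_health_review(shr: dict) -> list:
    """Return problems found in a spec.systemHealthReview mapping (hypothetical helper)."""
    problems = []
    time_frame = shr.get("timeFrame") or {}
    snapshot = time_frame.get("snapshot") or {}
    point = snapshot.get("point")

    # Time frame: latest or past; dateTime and rrule only make sense for past.
    if point not in ("latest", "past"):
        problems.append("timeFrame.snapshot.point must be 'latest' or 'past'")
    if point == "past" and not snapshot.get("dateTime"):
        problems.append("timeFrame.snapshot.dateTime is required when point is 'past'")
    if point != "past" and snapshot.get("rrule"):
        problems.append("timeFrame.snapshot.rrule is allowed only when point is 'past'")
    if not time_frame.get("timeZone"):
        problems.append("timeFrame.timeZone is required")

    if shr.get("rowGroupBy") not in ("project", "service"):
        problems.append("rowGroupBy must be 'project' or 'service'")

    # Columns: at least one, at most MAX_COLUMNS, each with a name and labels.
    columns = shr.get("columns") or []
    if not 1 <= len(columns) <= MAX_COLUMNS:
        problems.append(f"define between 1 and {MAX_COLUMNS} columns")
    for i, column in enumerate(columns):
        for field in ("displayName", "labels"):
            if not column.get(field):
                problems.append(f"columns[{i}].{field} is required")

    thresholds = shr.get("thresholds") or {}
    for field in ("redLte", "greenGt"):
        if not isinstance(thresholds.get(field), (int, float)):
            problems.append(f"thresholds.{field} must be a number")
    return problems
```

The check is deliberately shallow: it doesn't validate the RFC 3339 date, the IANA time zone name, or the rrule syntax.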


Field

Required

Type

Description

metadata.name

mandatory

string

The name identifier of the report.

metadata.displayName

optional

string

User-friendly display name of the report that will be displayed as its title on the Nobl9 web.

spec.shared

optional

boolean

Defaults to false. Set to true if you want other Nobl9 users to see your report.

spec.filters

mandatory

n/a

spec.filters defines the Nobl9 resources by which you want to filter the report.

You must specify at least one filtering level: projects, services, or slos.

filters.projects[n]

optional

list

A list of project names you want to filter your report by.

filters.services[n]

optional

list

A list of services you want to filter your report by.

filters.services[n].name

mandatory

string

The name identifier of the service.

filters.services[n].project

mandatory

string

The project where the service resides.

filters.slos[n]

optional

list

A list of SLOs you want to filter your report by.

filters.slos[n].name

mandatory

string

The name identifier of the SLO.

filters.slos[n].project

mandatory

string

The project where the SLO resides.

filters.labels

optional

map

A map of labels you want to filter your report by.

filters.labels.key

mandatory

string

See overall YAML schema for more details.

filters.labels.value

mandatory

list of strings

See overall YAML schema for more details.

spec.systemHealthReview

mandatory

n/a

An object that defines properties specific to the System Health Review report.

snapshot.point

mandatory

enum

Defines the reporting time frame.

Two options are available:
• latest – shows data from the last received data point within the last hour
• past – shows data for a defined past data point

snapshot.dateTime

mandatory

string

Required only when point is set to past.

The past point in time for which you want to generate the report, as a string in the RFC 3339 date-time format.

snapshot.rrule

optional

string

The iCalendar recurrence rule for the System Health Review report for past timeframes.

Use only when snapshot.point is set to past.

timeFrame.timeZone

mandatory

string

Time zone name from the IANA Time Zone Database, for example, America/New_York.

systemHealthReview.rowGroupBy

mandatory

enum

Grouping methods of report table rows. Two methods are available:
• project
• service

systemHealthReview.columns[n]

mandatory

list

A list of column groupings to be generated in the report, grouped by labels.

Defining at least one column is required. The maximum number of columns per System Health Review report is 30.

columns[n].displayName

mandatory

string

The name of the column that will be displayed as its header.

columns[n].labels

mandatory

map

A map of labels you want to filter your report by: all SLOs with labels defined here will be grouped under this column. For labels with the same key, SLOs are filtered using the OR logical operator. For labels with different keys, SLOs are filtered using the AND operator.

columns[n].labels.key

mandatory

string

See overall YAML schema for more details.

columns[n].labels.value

mandatory

list of strings

See overall YAML schema for more details.

systemHealthReview.thresholds

mandatory

n/a

The thresholds used to define the report categories. Set the same values for redLte and greenGt to remove the At risk category from the report.

thresholds.redLte

mandatory

float

The threshold used to define the Exhausted category. The range is inclusive: less than or equal to.

thresholds.greenGt

mandatory

float

The threshold used to define the Healthy category. The range is exclusive: greater than.

thresholds.showNoData

optional

boolean

The option to show SLOs with no data in the report. Default: false. Note: it doesn't affect the display of resources with no matching SLOs.
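The threshold semantics above can be summarized in a short Python function. It's a sketch assuming the remaining error budget is expressed on the same scale as redLte and greenGt, and that None stands for an SLO without data; categorize is a hypothetical name:

```python
from typing import Optional


def categorize(remaining: Optional[float], red_lte: float, green_gt: float) -> str:
    """Map an SLO's remaining error budget to a report category.

    remaining is None for an SLO without data (hypothetical convention).
    """
    if remaining is None:
        return "no data"
    if remaining <= red_lte:   # redLte: inclusive, less than or equal to
        return "exhausted"
    if remaining > green_gt:   # greenGt: exclusive, greater than
        return "healthy"
    return "at risk"           # between the two thresholds
```

Setting redLte and greenGt to the same value makes the final branch unreachable, which is why equal thresholds leave only the Exhausted and Healthy categories.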

RRULE

Use the spec.systemHealthReview.timeFrame.snapshot.rrule field to create a rule that generates the System Health Review report regularly. The rrule field follows the iCalendar specification.

The format of the rrule field consists of key-value pairs separated by semicolons (;). Each key-value pair specifies a parameter of the recurrence rule. Nobl9 supports the iCalendar rules outlined in the iCalendar documentation, provided the report recurs no more often than DAILY.

Example:

The following System Health Review report repeats every week on Monday at 10:00:00 AM:
apiVersion: n9/v1alpha
kind: Report
...
spec:
  systemHealthReview:
    timeFrame:
      snapshot:
        point: past
        rrule: FREQ=WEEKLY;BYDAY=MO;BYHOUR=10;BYMINUTE=0;BYSECOND=0
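Because rrule values are plain key-value pairs separated by semicolons, you can sanity-check them locally. The following Python sketch uses a hypothetical helper, check_report_rrule (not part of Nobl9 or sloctl), that parses the pairs and rejects FREQ values more frequent than DAILY. It checks FREQ only, so a rule such as FREQ=DAILY;BYHOUR=9,17, which would fire twice a day, still passes; for full rule expansion, a dedicated library such as python-dateutil's rrulestr is a better fit.

```python
# iCalendar FREQ values, ordered from most to least frequent (RFC 5545).
FREQUENCIES = ["SECONDLY", "MINUTELY", "HOURLY", "DAILY", "WEEKLY", "MONTHLY", "YEARLY"]


def parse_rrule(rrule: str) -> dict:
    """Split an rrule string into its KEY=VALUE parameters."""
    if rrule.upper().startswith("RRULE:"):
        rrule = rrule[len("RRULE:"):]
    params = {}
    for pair in rrule.split(";"):
        if not pair.strip():
            continue  # tolerate a trailing semicolon
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed rrule part: {pair!r}")
        params[key.strip().upper()] = value.strip()
    return params


def check_report_rrule(rrule: str) -> dict:
    """Reject rules whose FREQ is more frequent than DAILY (hypothetical helper)."""
    params = parse_rrule(rrule)
    freq = params.get("FREQ", "").upper()
    if freq not in FREQUENCIES:
        raise ValueError(f"missing or unknown FREQ: {freq!r}")
    if FREQUENCIES.index(freq) < FREQUENCIES.index("DAILY"):
        raise ValueError("reports can recur at most once a day (FREQ=DAILY)")
    return params
```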

Tip

Use the rrule generator to create a recurrence rule suited to your needs.
