System Health Review report

The System Health Review report is a simple, accessible tool for reporting reliability and performance data. It bridges the gap between basic and highly configurable reporting options, making it easy to present critical information about the overall health of your system. Designed for recurring reliability check-in meetings, it offers an efficient way to monitor and present service reliability and performance over time.

The report targets system administrators, technical managers, and upper management. It provides clear, concise data to monitor and manage system reliability and performance, offering a straightforward view of service metrics and key performance indicators to support informed decision-making.

Report overview

The System Health Review report supports recurring reliability check-ins by presenting the remaining error budget of your Nobl9 SLOs in a table: rows group SLOs by project or service, and columns group them by labels of your choice.

The report gives both aggregated and per-cell views of the overall system health for SLOs in the selected projects or services. Four color-coded categories reflect the remaining error budget of the SLOs:

Healthy SLOs
This icon and color indicate SLOs with enough error budget left within their time window.
Exhausted SLOs
This icon and color indicate SLOs whose error budget is fully exhausted within their time window.
SLOs at risk
This icon and color indicate SLOs whose error budget is at risk of being fully exhausted within their time window.
No data SLOs
This icon and color indicate SLOs that haven't gathered any data.

The report also aggregates system health information from the selected SLOs to provide quick glimpses into the performance of your system.

Learn more about aggregation in the report.

Report aggregations

Depending on the grouping option set for the report, the identifier column contains project or service names, with the grouping indicator in its header cell:

Grouping by project

SLOs in a column meet two criteria:

  • They reside in the corresponding project or service (row)
  • They are marked with the labels selected for that column
    While labels define which SLOs appear in columns, the column names are specified independently.
Column groupings
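The two column criteria above can be sketched in Python to show how a single cell is populated. This is a minimal illustration, not Nobl9 code: the `Slo` record and the `slos_in_cell` helper are hypothetical, and the sketch assumes an SLO matches a column when it carries at least one of the column's labels, which this page does not specify.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Slo:
    # Hypothetical SLO record holding only the fields needed for cell membership.
    name: str
    project: str
    service: str
    labels: frozenset = field(default_factory=frozenset)


def slos_in_cell(slos, row_key, column_labels, group_by="project"):
    """Return SLOs that belong to the row AND match the column's labels.

    group_by is "project" or "service", mirroring the row grouping option.
    Assumption: a column matches an SLO that carries ANY of its labels;
    the real matching semantics may differ.
    """
    wanted = set(column_labels)
    return [
        slo
        for slo in slos
        if getattr(slo, group_by) == row_key and wanted & slo.labels
    ]


slos = [
    Slo("latency", "responses", "api", frozenset({"region:uk"})),
    Slo("errors", "responses", "api", frozenset({"region:us"})),
]
print([s.name for s in slos_in_cell(slos, "responses", ["region:uk"])])  # ['latency']
```

An empty result corresponds to a cell that shows no matching SLOs.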

Depending on the reporting time frame, the report can be:

Real-time
Real-time reports reflect the state of selected SLOs at the latest data point received by Nobl9 within the last hour. Choose this option to review the most up-to-date state of your system.
Retrospective
Retrospective reports show the historic state of selected SLOs. You can additionally define a recurrence rule (rrule) to update historic data in the report regularly [1]. Retrospective reports also display SLO health trends [2].
System Health Review report for the past timepoint with added recurrence rule (no. 1) and SLO health trends (no. 2)
SLO health trends

SLO health trends represent the remaining error budget metric trends for the latest reporting period.

For example, in a report generated every Monday (possible with an rrule), the trend compares the state on the current Monday with the state on the previous Monday.
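As an illustration only, a trend for one SLO could be derived by comparing its remaining error budget at the current and previous report times. The `trend` helper and its `tolerance` dead band are hypothetical. This page doesn't document how Nobl9 computes or displays trends.

```python
def trend(previous_remaining, current_remaining, tolerance=0.001):
    """Classify the change in remaining error budget between two report runs.

    Both arguments are fractions (1.0 means 100% of the budget is left).
    The tolerance is an assumed dead band that ignores negligible changes.
    """
    delta = current_remaining - previous_remaining
    if delta > tolerance:
        return "up"
    if delta < -tolerance:
        return "down"
    return "flat"


# Previous Monday: 80% of the budget left; current Monday: 65% left.
print(trend(0.80, 0.65))  # down
```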

Aggregated columns and rows

The table displays aggregated information about system health based on the remaining error budget of the SLOs you selected.

Each cell contains the percentage of SLOs per status: healthy, at risk, exhausted, and without data. In some cases, cells may show no matching SLOs, indicating that a given project or service doesn't include SLOs marked with a given column's labels.

Displaying the At risk and SLOs without data categories is optional. You set their visibility when you create the report.

The table also shows the percentage of SLOs in each status at three levels:

  • Overall: The percentage of SLOs in the report
  • Horizontally: The percentage of SLOs in a particular row—project or service—across all columns
  • Vertically: The percentage of SLOs in a particular column across all rows—projects or services
Example
  • The yellow cell indicates that 27% of SLOs in this report are exhausted, 61% are healthy, and there's no data for 11% of SLOs.
  • The turquoise row indicates that for, say, the United Kingdom, 31% of SLOs are exhausted, 56% are healthy, and 6% report no data over the reporting period.
  • The purple cell in the first column: the responses project includes 25% exhausted SLOs, 50% healthy, and 25% with no data.
SLO aggregation in columns and rows
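The per-cell, per-row, per-column, and overall percentages can be sketched as status shares over a grid of cells. This is a hypothetical model, not Nobl9's implementation. Each cell holds a list of status strings (one per SLO), and shares are rounded to whole numbers with Python's `round()`, so they may not add up to exactly 100%. The sketch counts an SLO once per cell it appears in. Whether Nobl9 deduplicates SLOs that match several columns isn't stated here.

```python
from collections import Counter

STATUSES = ("healthy", "at_risk", "exhausted", "no_data")


def shares(statuses):
    """Return the whole-number percentage of each status, or None if empty."""
    total = len(statuses)
    if total == 0:
        return None  # Corresponds to a cell that shows no matching SLOs.
    counts = Counter(statuses)
    return {s: round(100 * counts[s] / total) for s in STATUSES}


def aggregate(grid):
    """grid maps (row, column) -> list of statuses of the SLOs in that cell."""
    rows = list(dict.fromkeys(r for r, _ in grid))  # Keep first-seen order.
    cols = list(dict.fromkeys(c for _, c in grid))
    per_cell = {key: shares(vals) for key, vals in grid.items()}
    per_row = {
        r: shares([s for (rr, _), v in grid.items() if rr == r for s in v])
        for r in rows
    }
    per_col = {
        c: shares([s for (_, cc), v in grid.items() if cc == c for s in v])
        for c in cols
    }
    overall = shares([s for v in grid.values() for s in v])
    return per_cell, per_row, per_col, overall


grid = {
    ("responses", "Latency"): ["exhausted", "healthy", "healthy", "no_data"],
    ("responses", "Errors"): [],
}
per_cell, per_row, per_col, overall = aggregate(grid)
print(per_cell[("responses", "Latency")])
# {'healthy': 50, 'at_risk': 0, 'exhausted': 25, 'no_data': 25}
print(per_cell[("responses", "Errors")])  # None
```

The first cell reproduces the 25% / 50% / 25% split from the responses example above.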

Key points and limitations

  • The rows are sorted in alphabetical order by default.
  • The report doesn't display empty projects or services (with no child SLOs).
  • Cells display no matching SLOs when no SLO matches column filters (i.e. a given project or service doesn't hold SLOs marked with selected labels).
  • Adding the selected labels to SLOs or removing them updates the report dynamically, as long as these SLOs belong to projects and services within the report scope.
  • sloctl only: Every resource defined on the filters level must have a parent resource specified. For example, to add a service, define its parent project under filters.project.
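The first two points can be illustrated with a small sketch that derives the row list from a hypothetical mapping of projects (or services) to their SLOs. Empty groups are dropped and the rest are sorted alphabetically. The page doesn't state whether the sort is case-sensitive, so the case-insensitive ordering in `report_rows` is an assumption.

```python
def report_rows(groups):
    """groups: hypothetical dict of project or service name -> list of SLO names.

    Returns the row names the report would show. Groups without child SLOs
    are skipped, and the rest are sorted alphabetically (case-insensitively,
    an assumption of this sketch).
    """
    return sorted((name for name, slos in groups.items() if slos), key=str.casefold)


print(report_rows({"payments": ["latency"], "Auth": ["errors"], "empty": []}))
# ['Auth', 'payments']
```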

Numbers to remember:

  • The maximum allowed number of columns is 30.
  • Retrospective reports can provide historical data for up to two years.
  • The maximum allowed frequency for recurrent reports is DAILY.
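These limits can be expressed as a pre-flight check. This is a hypothetical helper, not part of sloctl or the Nobl9 API. It approximates "two years" as 730 days and assumes `freq` is an uppercase iCalendar FREQ value.

```python
from datetime import datetime, timedelta

MAX_COLUMNS = 30
MAX_LOOKBACK = timedelta(days=2 * 365)  # "Up to two years", approximated as 730 days.
# iCalendar FREQ values ordered from least to most frequent.
FREQ_ORDER = ["YEARLY", "MONTHLY", "WEEKLY", "DAILY", "HOURLY", "MINUTELY", "SECONDLY"]


def check_report_limits(num_columns, snapshot_time, now, freq=None):
    """Return a list of problems; an empty list means the settings fit the limits."""
    problems = []
    if num_columns > MAX_COLUMNS:
        problems.append(f"{num_columns} columns exceeds the maximum of {MAX_COLUMNS}")
    if now - snapshot_time > MAX_LOOKBACK:
        problems.append("retrospective point is more than two years in the past")
    if freq is not None and FREQ_ORDER.index(freq) > FREQ_ORDER.index("DAILY"):
        problems.append(f"FREQ={freq} is more frequent than DAILY")
    return problems


now = datetime(2024, 6, 1)
print(check_report_limits(30, datetime(2023, 1, 1), now, "DAILY"))  # []
print(len(check_report_limits(31, datetime(2024, 1, 1), now, "HOURLY")))  # 2
```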
impact of query delay

If any of your SLOs use a data source with an extended query delay, a Real-time report (based on the latest data) reflects the state of your system delayed by the configured query delay.
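As a simplified sketch of this arithmetic, the moment a Real-time report effectively reflects is the current time minus the query delay. The `effective_state_time` helper is hypothetical and ignores the additional lag between the latest data point and the time the report is generated.

```python
from datetime import datetime, timedelta, timezone


def effective_state_time(now, query_delay):
    """Approximate moment a Real-time report reflects when a data source
    has a query delay: the latest data lags real time by that delay."""
    return now - query_delay


now = datetime(2024, 6, 3, 12, 0, tzinfo=timezone.utc)
print(effective_state_time(now, timedelta(minutes=15)))
# 2024-06-03 11:45:00+00:00
```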

Creating the System Health Review report

You can create the System Health Review report on the Nobl9 Web or by applying a YAML configuration with sloctl.

  1. On the Nobl9 Web, go to Reports.
  2. Click .
Step 1: Name report and choose its type
  1. Enter the display name for your report.
    You can edit it at any time.
  2. Select the System Health Review report type.
Step 2: Filter resources

The resources you select define the scope of your report.

  1. Select at least one project, service, or service level objective to be included in your report.
    These fields are interdependent: selecting a project defines the list of available services and SLOs, and selected SLOs narrow down the list of available services and projects.

  2. Optional: Select labels to add more resources to your report.

Step 3: Define report layout

Your report layout depends on the row grouping and on what's included in the columns. The table shown in this step is only a visual guide: it doesn't reflect the specific projects or services you've chosen, only the structure your report will follow once you've customized the columns.

  1. Set Row grouping:

    • by project to break down SLOs per project
    • by service to organize SLOs in the table per service
  2. Add columns to the table.

    • You can add up to 30 columns per report

    • To delete a column, hover over it:

      Adding and removing columns
  3. Click each column to enter its name and add labels:

    Adding name and labels to the column
    • Labels define which SLOs appear in a given column
    • You can select all labels available in your organization
    • Specify as many labels as you need
Step 4: Configure thresholds

Thresholds define report categories: Exhausted, At risk, and Healthy.

  1. Specify the remaining error budget values that define Exhausted and Healthy SLOs.
    • SLOs with the error budget remaining between these values fall into the At risk category.
    • You can reset the thresholds to their default values set by your organization admin for the Service Health Dashboard.
  2. Set the visibility of the At risk category and SLOs that report no data.
    • Hide At risk and set the same values for Exhausted and Healthy, so your report includes only two categories.
    • Deselect SLOs without data to exclude these SLOs from your report.
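The threshold logic from this step can be sketched as follows. Thresholds are expressed as fractions of the error budget remaining. The `categorize` name and the inclusive comparisons at the boundaries are assumptions, because this page doesn't say which category an SLO exactly at a threshold falls into.

```python
def categorize(remaining, exhausted_threshold, healthy_threshold):
    """Map an SLO's remaining error budget (fraction, or None if no data) to a category.

    Assumption: values at or below the exhausted threshold are Exhausted,
    values at or above the healthy threshold are Healthy, and anything in
    between is At risk.
    """
    if remaining is None:
        return "No data"
    if remaining <= exhausted_threshold:
        return "Exhausted"
    if remaining >= healthy_threshold:
        return "Healthy"
    return "At risk"


print(categorize(0.10, exhausted_threshold=0.0, healthy_threshold=0.2))  # At risk
```

When both thresholds are equal, no value can fall between them, so the sketch yields only Exhausted and Healthy, matching the two-category setup described above.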
Step 5: Select reporting time

You can create a one-time Real-time report based on the latest data, or a Retrospective report. Retrospective reports can be one-time or recurring.

To create the report based on the latest data, select Real-time and specify the time zone.

For Retrospective, do the following:

  1. Set the date, time, and time zone: your report will show the state of your selected SLOs as of the moment you specified.
  2. Specify the Repeat rule for your report:
    • With any option except for Don't repeat, Nobl9 will update the report as frequently as you select.
    • Select Custom when no repeat option fits your needs:
      • Enter your custom recurrence rule in the iCalendar format or use the rrule generator.
      • Omit specifying the date, time, and time zone in the rrule—you already have them set.
      • The minimum recurrence frequency is DAILY.

RRULE

Use spec.rrule to define a recurrence rule that regenerates the System Health Review report regularly. The spec.rrule field follows the iCalendar specification (RFC 5545).

The rrule value consists of key-value pairs separated by semicolons (;). Each key-value pair specifies a parameter of the recurrence rule. Nobl9 supports the iCalendar rules outlined in the iCalendar documentation, provided the report recurs no more often than DAILY.
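To illustrate the key=value;key=value format, here's a minimal parser sketch. The `parse_rrule` helper is hypothetical and performs no validation of parameter names or values. For real use, a full RFC 5545 implementation such as `dateutil.rrule.rrulestr` from python-dateutil is a better fit.

```python
def parse_rrule(rule):
    """Split an iCalendar RRULE value into a dict of parameters.

    Multi-valued parameters such as BYDAY=MO,WE stay comma-separated strings.
    This only illustrates the format; it does not validate the rule.
    """
    params = {}
    for part in rule.split(";"):
        if not part:
            continue  # Tolerate a trailing semicolon.
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed rule part: {part!r}")
        params[key.upper()] = value
    return params


print(parse_rrule("FREQ=WEEKLY;BYDAY=MO;BYHOUR=10;BYMINUTE=0;BYSECOND=0"))
# {'FREQ': 'WEEKLY', 'BYDAY': 'MO', 'BYHOUR': '10', 'BYMINUTE': '0', 'BYSECOND': '0'}
```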

Example:

The following System Health Review report repeats every week on Monday at 10:00:00 AM:
```yaml
apiVersion: n9/v1alpha
kind: Report
# ...
spec:
  serviceHealthReport:
    timeFrame:
      snapshot:
        point: past
        rrule: FREQ=WEEKLY;BYDAY=MO;BYHOUR=10;BYMINUTE=0;BYSECOND=0
```
tip

Use the rrule generator to create a recurrence rule suited to your needs.
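To make the example rule concrete, here's a stdlib-only sketch that computes the next run of FREQ=WEEKLY;BYDAY=MO;BYHOUR=10;BYMINUTE=0;BYSECOND=0 after a given moment. The `next_monday_10am` helper is hypothetical, handles only this one rule shape, and uses naive datetimes on the assumption that the report's time zone is applied elsewhere. It isn't how Nobl9 evaluates rules.

```python
from datetime import datetime, time, timedelta


def next_monday_10am(after):
    """Return the first Monday 10:00:00 strictly after `after` (naive datetime)."""
    days_ahead = (0 - after.weekday()) % 7  # Monday is weekday() == 0.
    candidate = datetime.combine(after.date() + timedelta(days=days_ahead), time(10, 0, 0))
    if candidate <= after:
        candidate += timedelta(days=7)
    return candidate


print(next_monday_10am(datetime(2024, 6, 3, 9, 0)))   # 2024-06-03 10:00:00
print(next_monday_10am(datetime(2024, 6, 3, 10, 0)))  # 2024-06-10 10:00:00
```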