System Health Review report
The System Health Review report is a simplified, accessible tool for reporting reliability and performance data. It bridges the gap between basic and highly configurable reporting options, making it easy to present critical information about the overall health of the system. Designed for recurring reliability check-in meetings, it provides an efficient way to monitor and present service reliability and performance over time.
The report targets system administrators, technical managers, and upper management. It provides clear, concise data to monitor and manage system reliability and performance, offering a straightforward view of service metrics and key performance indicators to support informed decision-making.
Report overview
The System Health Review report facilitates recurring reliability check-ins. It groups your Nobl9 SLOs by project or service and by labels of your choice, and presents their remaining error budget metric in a table.
The report gives both aggregated and dispersed views of the overall system health for SLOs residing in selected projects or services. Four color-coded categories reflect the available error budget of the SLOs: healthy, at risk, exhausted, and without data.
The report also aggregates system health information from the selected SLOs to provide quick glimpses into the performance of your system.
Learn more about aggregation in the report.
Depending on the grouping option set for the report, the identifier column contains project or service names, with the grouping indicator in its header cell.
You can also sort the table by the SLOs' remaining error budget. To do this, click the required column header.
SLOs in the columns match two criteria:
- Reside in corresponding projects / services
- Marked with labels selected for each column
You can view the labels assigned to each column. To do this, hover the cursor over the required column header.
From the perspective of the reporting time frame, the report can be based on the latest data or retrospective. Retrospective reports can be one-time or recurring.
SLO health trends represent the remaining error budget metric trends for the latest reporting period.
For example, in reports generated every Monday (available with rrule), the trend will reflect the comparison between the previous and current Mondays.
Aggregated columns and rows
The table displays aggregated information about system health based on the remaining error budget of the SLOs you selected.
Each cell contains the percentage of SLOs per status: healthy, at risk, exhausted, and without data. In some cases, cells may show no matching SLOs, indicating that a given project or service doesn't include SLOs marked with a given column's labels.
The At risk and SLOs without data categories are optional for display—their visibility is set at report creation.
It also indicates the percentage of resources in a particular status per report as follows:
- Overall: The percentage of SLOs in the report
- Horizontally: The percentage of SLOs in a particular row—project or service—across all columns
- Vertically: The percentage of SLOs in a particular column across all rows—projects or services
For example:
- The yellow cell indicates that 27% of SLOs in this report are exhausted, 61% are healthy, and there's no data for 11% of SLOs.
- The turquoise row indicates that for, say, the United Kingdom, we have 31% exhausted SLOs, 56% healthy, and 6% of SLOs report no data over the reporting period.
- The purple cell in the first column: the responses project includes 25% exhausted SLOs, 50% healthy, and 25% with no data.
Key points and limitations
- The rows are sorted in alphabetical order by default.
- The report doesn't display empty projects or services (with no child SLOs).
- When you assign more than one label to a column and the assigned labels have different keys, the report applies the AND logical operator to SLOs, meaning it displays SLOs marked with all assigned labels.
- When several labels in a column have the same key, the report applies the OR logical operator to SLOs, so it displays SLOs marked with any of the assigned labels.
- Cells display no matching SLOs when no SLO matches column filters, meaning a given project or service doesn't hold SLOs marked with the labels you assigned to a given column.
- Adding labels to SLOs or removing them dynamically updates the reports that include these labels, as long as the labeled SLOs belong to services and projects within the report scope.
- sloctl only: Every resource defined on the filters level must have a parent resource specified. For example, to add a service, define its parent project under filters.project.
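The parent-resource rule can be sketched as the following fragment, which follows the filters structure shown in the YAML definition in this document. The service, SLO, and project names are hypothetical placeholders.

```yaml
# Fragment of a Report definition; all resource names are hypothetical.
spec:
  filters:
    services:
      - name: checkout           # hypothetical service name
        project: latency         # parent project of the service
    slos:
      - name: checkout-latency   # hypothetical SLO name
        project: latency         # parent project of the SLO
```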
Matching SLOs example
Assume your report includes:
- Projects: Latency and Responses
- Columns: Location and Shopping cart, with the following labels:
  - Labels in Location:
    - region: europe
    - region: usa
  - Labels in Shopping cart:
    - delivery: fast
    - subscribe: yes
The report applies the filters as follows:
Projects | Location | Shopping cart |
---|---|---|
Latency | SLOs from the Latency project marked with either region: europe or region: usa | SLOs from the Latency project marked with both delivery: fast and subscribe: yes |
Responses | SLOs from the Responses project marked with either region: europe or region: usa | SLOs from the Responses project marked with both delivery: fast and subscribe: yes |
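For sloctl users, the same example might look like the fragment below. It is a sketch based on the YAML definition given in this document, not a complete report: metadata, timeFrame, and thresholds are omitted, and the lowercase project names are assumed resource names.

```yaml
# Fragment only; project names are assumed resource names.
spec:
  filters:
    projects:
      - latency
      - responses
  systemHealthReview:
    rowGroupBy: project
    columns:
      - displayName: Location
        labels:
          region:        # one key, two values: matched with OR
            - europe
            - usa
      - displayName: Shopping cart
        labels:
          delivery:      # two different keys: matched with AND
            - fast
          subscribe:
            - "yes"      # quoted so YAML parsers don't read it as a boolean
```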
Numbers to remember:
- The maximum allowed number of columns is 30.
- Retrospective reports can provide historical data for up to two years.
- The maximum allowed frequency for recurrent reports is DAILY.
If any of your SLOs use a data source with an extended query delay, the "latest" type of report will reflect the state of your system delayed by the period configured in the query delay.
Creating the System Health Review report
You can create the System Health Review report in the Nobl9 Web or by applying a YAML configuration with sloctl.
- On the Nobl9 Web, go to Reports.
-
Click .
Step 1: Name report and choose its type
- Enter the display name for your report. You can edit it at any time.
- Select the System Health Review report type.
Step 2: Filter resources
The resources you select define the scope of your report.
- Select at least one project, service, or service level objective to be included in your report. These fields are interdependent: selecting a project defines the list of available services and SLOs, and selected SLOs narrow down the list of available services and projects.
- Optional: Select labels to add more resources to your report.
Step 3: Define report layout
Your report layout depends on the row grouping and what's included in the columns. The table in this step is a visual guide only. It doesn't reflect the specific projects or services you've chosen. It shows only the structure your report will follow once you've customized the columns.
- Set Row grouping:
  - by project to break down SLOs per project
  - by service to organize SLOs in the table per service
- Add columns to the table:
  - You can add up to 30 columns per report.
  - To delete a column, hover over it.
- Click every column to enter its name and add labels:
  - Labels define which SLOs appear in a given column.
  - You can select all labels available in your organization.
  - Specify as many labels as you need.
  - SLOs appear in the report as follows:
    - Labels with the same key and different values: SLOs that appear in this column are marked with either label.
    - Labels with different keys: SLOs that appear in this column are marked with all the assigned labels.
Step 4: Configure thresholds
Thresholds define report categories: Exhausted, At risk, and Healthy.
- Specify how much of the SLOs' remaining error budget defines exhausted and healthy SLOs.
- SLOs with the error budget remaining between these values fall into the At risk category.
- You can reset the thresholds to their default values set by your organization admin for the Service Health Dashboard.
- Set the visibility of the At risk category and SLOs that report no data.
- Hide At risk and set the same values for Exhausted and Healthy, so your report includes only two categories.
- Deselect SLOs without data to have these SLOs in your report.
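In a sloctl definition, these settings correspond to the thresholds object. The fragment below is a sketch under stated assumptions: the source only says that redLte and greenGt are floats, so the reading of redLte as "Exhausted when the remaining budget is less than or equal to this value" and greenGt as "Healthy when it is greater than this value" is inferred from the field names, and the values assume fractions rather than percentages.

```yaml
# Fragment only. Values are illustrative and assume thresholds are
# fractions of the error budget (0.2 = 20%); verify the expected unit.
spec:
  systemHealthReview:
    thresholds:
      redLte: 0.0        # assumed: remaining budget <= 0 counts as Exhausted
      greenGt: 0.2       # assumed: remaining budget > 0.2 counts as Healthy
      showNoData: true   # assumed: lists SLOs that report no data
```

Under the same assumptions, setting redLte and greenGt to the same value would leave no range for the At risk category.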
Step 5: Select reporting time
You can create a one-time report based on the latest data or a retrospective report. Retrospective reports can be one-time or recurring.
To create the report based on the latest data, select Real-time and specify the time zone.
For Retrospective, do the following:
- Set the date, time, and time zone: your report will show the state of your selected SLOs as of the moment you specified.
- Specify the Repeat rule for your report:
  - With any option except for Don't repeat, Nobl9 will update the report as frequently as you select.
  - Select Custom when no repeat option fits your needs:
    - Enter your custom recurrence rule in the iCalendar format or use the rrule generator.
    - Don't specify the date, time, or time zone in the rrule: you already have them set.
    - The highest supported recurrence frequency is DAILY, so a report can't repeat more often than once a day.
Learn more about RRULE
Apply the following YAML definition to create a System Health Review report:
```yaml
apiVersion: n9/v1alpha
kind: Report
metadata:
  name: string # Mandatory
  displayName: string # Optional
spec:
  shared: true # Optional, boolean, defaults to false
  filters: # You must define at least one level of filtering from these three: projects, services, slos
    projects:
      - project-1 # Mandatory if filtering by projects
    services:
      - name: string # Mandatory if filtering by services
        project: string # Mandatory if filtering by services
    slos:
      - name: string # Mandatory if filtering by slos
        project: string # Mandatory if filtering by slos
    labels:
      "key_1":
        - "value_1"
        - "value_2"
      "key_2":
        - "value_1"
        - "value_2"
  systemHealthReview:
    timeFrame:
      snapshot:
        point: enum # Mandatory. One of: past | latest
        dateTime: YYYY-MM-DDThh:mm:ssZ # Mandatory if point.past is defined
        rrule: string # Recurrence rule definition. Use only if point.past is defined
      timeZone: "America/New_York" # Mandatory
    rowGroupBy: enum # Mandatory. One of: project | service
    columns:
      - displayName: string # Mandatory
        labels: # Mandatory
          "key_1":
            - "value_1"
            - "value_2"
          "key_2":
            - "value_1"
            - "value_2"
      - displayName: string # Mandatory
        labels: # Mandatory
          "key_3":
            - "value_1"
    thresholds:
      redLte: float # Mandatory
      greenGt: float # Mandatory
      showNoData: false # Optional, boolean
```
Detailed descriptions of the objects in the definition:
- metadata.name: Mandatory string.
- metadata.displayName: Optional string.
- spec.shared: Optional boolean, defaults to false. Set to true if you want other Nobl9 users to see your report.
- spec.filters: The Nobl9 resources by which you want to filter the report. You must specify at least one method of filtering: by projects, services, or slos.
- filters.projects[n]: The names of the projects you want to filter your report by.
- filters.services[n]: The services you want to filter your report by.
- filters.services[n].name: The name of the service.
- filters.services[n].project: The project where the service resides.
- filters.slos[n]: The SLOs you want to filter your report by.
- filters.slos[n].name: The name of the SLO.
- filters.slos[n].project: The project where the SLO resides.
- filters.labels: The labels you want to filter your report by.
- snapshot.point: Mandatory. The point in time the timeFrame refers to. Two options are available:
  - latest: shows data from the last received data point within the last hour
  - past: shows data for a defined past data point
- snapshot.dateTime: Mandatory if point.past is selected. The past point in time from which you want to generate the report. The expected value is a string representing the date and time in the RFC 3339 format.
- snapshot.rrule: The recurrence rule for past timeframes. Use only if snapshot.point.past is selected.
- systemHealthReview.rowGroupBy: Mandatory. One of:
  - project
  - service
- systemHealthReview.columns[n]: Defining at least one column is required. The maximum number of columns per System Health Review report is 30.
- columns[n].displayName: Mandatory. The name of the column.
- columns[n].labels: Mandatory. The labels that define which SLOs appear in the column. For labels with the same key, SLOs are filtered using the OR logical operator. For labels with different keys, SLOs are filtered using the AND operator.
- systemHealthReview.thresholds: The thresholds that define the report categories.
- thresholds.redLte: Mandatory float.
- thresholds.greenGt: Mandatory float.
- thresholds.showNoData: Optional boolean.
RRULE
Using the snapshot.rrule field, you can create a rule that generates System Health Review reports on a regular schedule. The rrule field follows the iCalendar specification.
The format of the rrule field consists of key-value pairs separated by semicolons (;). Each key-value pair specifies a parameter of the recurrence rule. Nobl9 supports all iCalendar rules outlined in the iCalendar documentation.
Example:
```yaml
apiVersion: n9/v1alpha
kind: Report
# ...
spec:
  systemHealthReview:
    timeFrame:
      snapshot:
        point: past
        # Repeat every Monday at 10:00:00
        rrule: FREQ=WEEKLY;BYDAY=MO;BYHOUR=10;BYMINUTE=0;BYSECOND=0
```
Use the rrule generator to create a recurrence rule suited to your needs.
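As a further sketch, a daily recurring report might combine the rule with the other timeFrame fields as shown below. The dateTime and timeZone values are illustrative, and placing timeZone directly under timeFrame (next to snapshot) is an assumption based on the field list in this document.

```yaml
# Fragment only; dateTime and timeZone values are illustrative.
spec:
  systemHealthReview:
    timeFrame:
      snapshot:
        point: past
        dateTime: "2024-01-01T09:00:00Z"   # RFC 3339, required for point: past
        # FREQ=DAILY is the highest frequency allowed for recurring reports.
        rrule: FREQ=DAILY;BYHOUR=9;BYMINUTE=0;BYSECOND=0
      timeZone: "Europe/Warsaw"            # assumed position under timeFrame
```

Each semicolon-separated pair sets one rule parameter: FREQ sets how often the report repeats, and the BY* parts pin the time of day.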