Reliability Roll-up report

The Reliability Roll-up report lets you define the significance and relevance of your SLOs within the context of your system. It aggregates reliability measures from multiple SLOs so you can tailor the data to your needs. You can choose from predefined filters or design a custom structure for a deeper understanding of your system's reliability.

The essence of this report lies in its ability to condense the complexity of reliability into a single, easy-to-understand metric: the reliability score, which provides an assessment of your system's overall health. In the Reliability Roll-up report, each SLO is assigned a reliability score, which is calculated based on its performance against target objectives and time windows. These individual scores are then consolidated within the report structure, resulting in an aggregated reliability score at each level of the report's structure.

With the Reliability Roll-up report, you can quickly assess the overall reliability of services in your organization.

The Reliability Roll-up report is useful for:

  • Drawing a high-level overview of organization-wide reliability for a specific period
  • Measuring reliability tailored to your system and needs
  • Making informed decisions quicker based on reliability
  • Driving organization-wide adoption of SLOs based on easy-to-digest data

Creating a reliability roll-up report

Step 1: Name the report and choose its type

For detailed instructions on Step 1 of the Report Wizard, check the main Reports documentation.

Step 2: Create reliability score layers

You can create an auto-generated or custom-made structure for your report to organize your Nobl9 resources (services, projects, and SLOs) and create layers for the reliability score.

You can change the type of structure and layers of your existing Reliability Roll-up reports.

Auto-generated structure

Using an auto-generated structure, you can choose filters to create reliability score layers that mirror your organization's project-service-SLO dependencies. Check the main Reports documentation to learn more.

Image 2: Step 2 – Create auto-generated structure

Custom structure

Using a custom structure, you can adapt the Reliability Roll-up report to your requirements, creating custom layers for the overall reliability score. Using this option, you can add single resources, folders, or subfolders that contain your resources:

Image 3: Step 2 – Create custom structure

Folders are useful for creating a custom structure for your Nobl9 resources (services, projects, and SLOs). Here are several things to know:

  • Folders can contain individual resources and other folders
  • Folders and child folders create layers that aggregate the reliability scores of the resources they contain
  • The reliability score of a parent folder is calculated as the average of the reliability scores of all the resources and child folders it contains
  • For child folders, the maximum level of nesting is 8
  • Projects and services that don't contain SLOs don't affect the reliability score calculations. As a result, a folder or child folder that contains only "empty" projects and services displays an N/A value in the reliability drill-down section
tip

You can rearrange the resources and folders you add in this step. To do so, hover over the six dots next to a folder or resource tile and click the up or down arrow.

You can also rename your folders: click a folder's display name and edit it. A name can be up to 63 characters long and can contain diacritics and special characters.

Image 4: Creating and organizing folders-resources structure

Reliability Roll-up report and RBAC

You can make your report available to others by sharing it. The Reliability Roll-up report will be visible to all users with access to your report's SLOs and projects.

For a custom structure, if the report contains only folders with no SLOs or projects and you share it, everyone will be able to access it.

This happens even if the folders originated from your existing projects or services. When empty, such folders lose their RBAC properties and become standalone entities. Once you add SLOs to the report, it is hidden from users who don't have access to those SLOs through their RBAC permissions.

Step 3: Select time range

  • Check the main Reports documentation for details on this step

  • Currently, all time ranges in the Reliability Roll-up report are calculated in the UTC time zone

tip

You can edit the time range of your existing Reliability Roll-up report.

To do that, go to the Reports list and click the pencil icon next to the report that you'd like to change. Then, go to Step 3 of the Report wizard.

Report overview

Reliability report
Image 5: Overview of the Reliability Roll-up report (auto-generated structure)
  • Reliability score: Displays the score for the parent folder (see the sections below for details about the reliability score calculations).

    note

    If there's data for both the current and the previous report range, Nobl9 displays the calculated trend below the reliability score. The trend is hidden unless both ranges have data.

    Image 6: Trend values for reliability score. From left to right: positive, negative, no data.
  • The least reliable: This tile displays up to 10 folders with the lowest reliability scores. Folders are sorted according to the reliability score value.

    note

    The reliability score value is always displayed for each folder within the report's structure. If a parent folder contains only one child folder, their reliability score will be the same, so both folders will be on the Top 10 list with the same value.

  • The most reliable: This tile displays up to 10 folders with the highest reliability scores. Folders are sorted according to the reliability score value.

  • Total SLOs: The total number of SLOs in each report.

    note

    This value might change for an auto-generated structure. For example, if you select a project as the only filter and a new SLO appears within that project, the report will be updated, and the total SLO count will change.

  • Total folders: The total number of folders in the report. Note that the main parent folder is not counted here; it's a virtual folder that displays the overall calculated reliability score.

    note

    This value might change in an auto-generated structure if you add new services to your projects.

  • SLOs within budget: Expressed as a count (the number of SLOs) and as a percentage of total SLOs.

  • SLOs over budget: Expressed as a count (the number of SLOs) and as a percentage of total SLOs. An SLO is considered over budget if its error budget is depleted for even a minute within the reporting time range. Any SLO with a reliability score below 100% is deemed over budget.

  • Report drill-down: Allows you to examine your reliability score data and the structure more closely.

  • Reliability score & trend: If an SLO includes only one objective, these values are shown once, for both the objective and the entire SLO. If an SLO contains more than one objective, the values are shown for each objective and separately for the entire SLO (the SLO value is the average calculated from its objectives).
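
The budget tiles and the two Top 10 tiles described above can be sketched in a few lines of Python. This is an illustrative sketch, not Nobl9's implementation: the function `overview_tiles`, its parameters, and the sample scores are hypothetical, and it assumes scores are percentages where any value below 100% counts as over budget.

```python
def overview_tiles(folder_scores, slo_scores, top_n=10):
    """Summarize a report the way the overview tiles do (hypothetical helper).

    folder_scores: dict of folder name -> reliability score in percent
    slo_scores:    dict of SLO name -> reliability score in percent
    """
    # Folders sorted by score; ties keep their insertion order.
    ascending = sorted(folder_scores.items(), key=lambda item: item[1])
    descending = sorted(folder_scores.items(), key=lambda item: item[1], reverse=True)

    total = len(slo_scores)
    # Assumption taken from the text: a score below 100% means over budget.
    over = sum(1 for score in slo_scores.values() if score < 100.0)
    within = total - over

    def share(count):
        # Percentage of total SLOs; 0.0 for an empty report.
        return 100.0 * count / total if total else 0.0

    return {
        "least_reliable": ascending[:top_n],
        "most_reliable": descending[:top_n],
        "total_slos": total,
        "within_budget": (within, share(within)),
        "over_budget": (over, share(over)),
    }


tiles = overview_tiles(
    {"checkout": 90.0, "search": 99.5, "login": 100.0},
    {"slo-1": 100.0, "slo-2": 97.0, "slo-3": 100.0, "slo-4": 80.0},
)
print(tiles["least_reliable"][0])  # ('checkout', 90.0)
print(tiles["over_budget"])        # (2, 50.0)
```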

What is the reliability score?

The reliability score measures your system's health based on how often your SLOs meet their targets. If an SLO consistently meets its target and never exceeds its error budget, the score will be 100%. If an SLO falls below its target for 10% of the measured period, the score will be 90%.

Reliability score calculations

The method for calculating the reliability score varies based on the type of the time window associated with a service level objective (SLO).

For SLOs using rolling time windows, where data points are consistently added and dropped as the window moves forward, the reliability score is computed by considering every data point's adherence to the SLO target and calculating a daily target adherence percentage. See section below for details.

In the case of calendar-aligned SLOs, the primary focus is on how the SLO adheres to its target at the end of its calendar-aligned windows, calculating the score based on the final measurements. This approach ensures that the reliability score accurately reflects the health of such SLOs. See section below for details.

SLOs with rolling time windows

For the rolling-type time windows, the reliability score is calculated as the ratio of values within budget to the sum of values within budget and the values that exceeded budget. Nobl9 uses the metric for the Remaining error budget and categorizes returned data points as:

  • within budget if the remaining budget is greater than or equal to 0
  • over budget if the remaining budget is less than 0

The counts for each SLO's objectives above and below the error budget are aggregated daily. Effectively, the reliability score for an objective in the reporting time window is an average daily result.
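
As a rough illustration of the daily aggregation described above, the following Python sketch classifies remaining-budget data points, computes a within-budget ratio per UTC day, and averages the daily ratios. The function name `rolling_reliability_score`, the input shape, and the sample data are hypothetical, and weighting every day equally is an assumption about how the "average daily result" is formed.

```python
from collections import defaultdict
from datetime import datetime, timezone


def rolling_reliability_score(points):
    """Return a score in percent, or None when there is no data.

    points: iterable of (timezone-aware datetime, remaining error budget).
    """
    per_day = defaultdict(lambda: [0, 0])  # UTC date -> [within, over]
    for timestamp, remaining_budget in points:
        day = timestamp.astimezone(timezone.utc).date()
        if remaining_budget >= 0:
            per_day[day][0] += 1  # within budget
        else:
            per_day[day][1] += 1  # over budget

    if not per_day:
        return None

    # Daily ratio: within / (within + over), then the mean across days.
    daily = [within / (within + over) for within, over in per_day.values()]
    return 100 * sum(daily) / len(daily)


points = [
    (datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc), 0.20),
    (datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc), 0.05),
    (datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), 0.0),
    (datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc), -0.10),
    (datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc), 0.15),
    (datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc), 0.30),
]
print(rolling_reliability_score(points))  # 87.5
```

In this sample, the first day has three of four points within budget (75%) and the second day has all points within budget (100%), so the average daily result is 87.5%.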

Example 1: Burn down chart and reliability score

The following image shows a burn-down chart for an SLO with a rolling time window with two objectives, a and b:

Reliability report
Image 7: Burn-down chart for an SLO with a rolling time window

Based on these values, the reliability score for the displayed time range will be as follows:

burn down rolling
Image 8: Burn-down chart for an SLO with a rolling time window

As we can see, the reliability score for objective a is 0%, since the objective was consistently below the target throughout the reporting time range. We can also see that objective b reached a reliability score of 48.73%. The total reliability score for this SLO is 24.36%, which is the average score of this SLO's objectives ((48.73% + 0%) / 2 = 24.365%).

SLOs with calendar-aligned time windows

For objectives in SLOs with calendar-aligned time windows, Nobl9 calculates the reliability score at the end of the day and at the end of the calendar-aligned time window. The reliability score is calculated by dividing the last calculated data point from the burn-down chart (called good-to-total-ratio) by the target. The following logic applies:

  • If the value for good-to-total-ratio is greater than or equal to the target, the reliability score equals 100%

  • If the value for good-to-total-ratio is less than the target, the reliability score is less than 100% and equals good-to-total-ratio/target

For calendar-aligned objectives, Nobl9 uses the final data points of the SLO time windows that were completed within the reporting time range of the Reliability Roll-up report. Nobl9 also includes the daily reliability score from the end of the reporting time window if that day isn't already among the final data points of completed SLO time windows.

The reliability score is the average of those results. For example:

  • The reliability score for the time window that ended during the reporting time window is 94%, and the daily reliability score at the end of this window is 100%. The reliability score of this objective would be 97%:
calculation for the reliability score
Image 9: Calculation for the reliability score
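
The calendar-aligned logic can be sketched in Python as follows. This is a hedged illustration rather than Nobl9's code: both function names and their parameters are hypothetical, and the sketch assumes that the good-to-total-ratio is compared against the target and that the resulting scores are averaged with equal weight.

```python
def calendar_window_score(good_to_total_ratio, target):
    """Score in percent for one final data point (hypothetical helper)."""
    if good_to_total_ratio >= target:
        return 100.0
    # Below target: the score is the ratio relative to the target.
    return 100.0 * good_to_total_ratio / target


def calendar_objective_score(window_end_scores, final_daily_score=None):
    """Average the scores of completed windows and, if supplied, the daily
    score from the end of the reporting range. Returns None without data."""
    scores = list(window_end_scores)
    if final_daily_score is not None:
        scores.append(final_daily_score)
    if not scores:
        return None
    return sum(scores) / len(scores)


# One completed window scored 94%, and the last day of the range scored 100%.
print(calendar_objective_score([94.0], 100.0))  # 97.0
```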

Example 2: Burn down chart and reliability score

The following image shows a burn-down chart for an SLO with a calendar time window with one objective:

RS burn down for a calendar-aligned slo
Image 10: Burn down chart for an SLO with a calendar time window

Based on the value marked as a red dot in the burn down chart (the last value in the time range), the reliability score for the SLO is as follows:

RS for calendar-aligned report
Image 11: Reliability score and trend for the calendar-aligned SLO

Aggregation of reliability score values

The reliability score at each aggregation level is the average of the reliability score values from its child level.

note

Mathematically, the score layer structure is a calculation formula where SLOs are the variables to calculate, and these are grouped into score layers for calculation purposes. The calculation starts at the lowest layer.

At each subsequent higher layer, the score is derived by averaging the values of all the immediately underlying layers. This recursive calculation continues upwards through the layers.

  • For example:
    • In the hierarchy, there is a folder called Producers. Inside are folders named Data Intake with a reliability score of 95.5% and Data Processor with a reliability score of 90%. The reliability score for the parent folder (Producers) will be 92.75%:
reliability score folders
Image 12: Calculations for Reliability Roll-up report folders
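
The recursive roll-up can be sketched in Python as below. The `SLO` and `Folder` classes and the `reliability_score` function are hypothetical names introduced for illustration. The sketch assumes an SLO's score is the plain average of its objectives, a folder's score is the plain average of its scored children, and folders without any SLOs are skipped (shown as N/A).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class SLO:
    name: str
    objective_scores: List[float]  # one score in percent per objective


@dataclass
class Folder:
    name: str
    children: List[Union["Folder", SLO]] = field(default_factory=list)


def reliability_score(node) -> Optional[float]:
    """Average scores bottom-up; None stands for N/A (no SLO data)."""
    if isinstance(node, SLO):
        values = node.objective_scores
    else:
        # Score each child first, then drop children without any data.
        child_scores = (reliability_score(child) for child in node.children)
        values = [score for score in child_scores if score is not None]
    if not values:
        return None
    return sum(values) / len(values)


producers = Folder("Producers", [
    Folder("Data Intake", [SLO("intake-latency", [95.5])]),
    Folder("Data Processor", [SLO("processor-errors", [90.0])]),
])
print(reliability_score(producers))  # 92.75
```

Adding an empty folder next to Data Intake and Data Processor leaves the result unchanged in this sketch, since children without a score are excluded from the average.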

Reliability score calculations and Replay

If you run Replay for any SLO included in your Reliability Roll-up report, once the process for reimporting historical data has been completed, Nobl9 will recalculate and update the reliability score in the background.

Other notes

  • All time ranges in the Reliability Roll-up report are calculated in the UTC time zone

  • Composite SLOs aren't added as separate objectives in the Reliability Roll-up report