
Explore real-world SLOs


Let's create an SLO to monitor the stability of pod readiness transitions over a specific time period. To achieve this, we need to track how often pods in our Kubernetes cluster transition between "ready" and "not ready" states within the past 5 minutes.

SLO summary
Metric type: Threshold
Time window: Rolling, 7 days
Error budget calculation method: Occurrences
Target: 99.5%
Operator and value: < 7

In this case, we are monitoring readiness transitions, which behave monotonically. The total number of transitions either increases or remains the same during a given time window, as each state change is counted incrementally. This counter resets only when a new monitoring time window begins. Considering these characteristics, we’ve selected the threshold metric type.

To put this into practice, we can use the following PromQL query to measure state changes in pod readiness:

sum(changes(kube_pod_status_ready{condition="true"}[5m]))
Metric type and query
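
To build intuition for what the query counts, here is a minimal Python sketch. It is not how Prometheus implements `changes()`; it only mirrors the idea. It assumes a hypothetical set of samples per pod (1 for ready, 0 for not ready) collected inside one 5-minute range, counts how often consecutive samples differ, and sums the counts across pods. The pod names and sample values are made up for illustration.

```python
from typing import Dict, Sequence


def count_changes(samples: Sequence[float]) -> int:
    """Count how many times a value differs from the sample before it."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def total_transitions(series_by_pod: Dict[str, Sequence[float]]) -> int:
    """Sum per-series change counts, mirroring sum(changes(...[5m]))."""
    return sum(count_changes(s) for s in series_by_pod.values())


# Hypothetical samples scraped during one 5-minute range.
window = {
    "pod-a": [1, 1, 0, 1, 1],  # ready -> not ready -> ready: 2 changes
    "pod-b": [1, 1, 1, 1, 1],  # stable: 0 changes
    "pod-c": [0, 1, 0, 1, 0],  # flapping: 4 changes
}

transitions = total_transitions(window)
print(transitions)  # 6
```

With these sample values the result is 6, which stays below the objective's threshold of 7.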

Looking for an alternative metric type? Check out the ratio metric type, designed to compare good responses to total responses.

The rolling time window is ideal here because it allows for continuously updating metrics in real-time. In this scenario, we’re focusing on a week-long view, setting the time window duration to seven days. This means our error budget will be calculated dynamically across this period, continuously incorporating fresh data as older points are replaced.

Time window selection
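
The way a rolling window replaces old points with new ones can be sketched as follows. This is an illustration only, not Nobl9's internal implementation: the `RollingWindow` class and the one-point-per-day data are hypothetical, chosen to keep the example short.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingWindow:
    """Keep only the points that fall within `duration` of the newest point."""

    def __init__(self, duration: timedelta = timedelta(days=7)):
        self.duration = duration
        self.points = deque()  # (timestamp, value) pairs, oldest first

    def add(self, ts: datetime, value: float) -> None:
        self.points.append((ts, value))
        cutoff = ts - self.duration
        # Drop points that have aged out of the window.
        while self.points and self.points[0][0] <= cutoff:
            self.points.popleft()

    def values(self) -> list:
        return [v for _, v in self.points]


start = datetime(2024, 1, 1)
window = RollingWindow()
for day in range(10):  # hypothetical: one data point per day for 10 days
    window.add(start + timedelta(days=day), day)

print(window.values())  # [3, 4, 5, 6, 7, 8, 9]
```

After ten daily points, only the most recent seven remain, so any budget calculation over the window always reflects the latest seven days.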

Curious about alternative time windows? Explore the benefits of calendar-aligned time windows for your use case.

To track and control the frequency of readiness state changes, we use the occurrences error budget calculation method. With this budgeting method, we monitor how frequently state changes occur within 5-minute intervals (set in the query).

Some state transitions are expected during normal operation. We treat any data point showing 7 or more state transitions in its 5-minute interval as a bad occurrence that consumes error budget, since this frequency may indicate underlying pod stability issues. Combined with the 99.5% target, this threshold means that at least 99.5% of the data points evaluated in the rolling window must show fewer than 7 transitions.
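
As a worked example of the occurrences arithmetic, the sketch below counts good and bad data points against the `< 7` threshold and compares the result with the 99.5% target. The function name, the one-point-per-minute collection rate, and the sample values are assumptions made for illustration; the actual number of data points depends on your data source.

```python
THRESHOLD = 7   # objective: value must be < 7
TARGET = 0.995  # 99.5% of data points must be good


def occurrences_budget(values, threshold=THRESHOLD, target=TARGET):
    """Summarize reliability and error budget for a list of data points."""
    if not values:
        raise ValueError("need at least one data point")
    total = len(values)
    good = sum(1 for v in values if v < threshold)
    bad = total - good
    allowed_bad = (1 - target) * total  # error budget, in data points
    return {
        "reliability": good / total,
        "bad": bad,
        "allowed_bad": allowed_bad,
        "budget_remaining": 1 - bad / allowed_bad,
    }


# Assumed: one point per minute over 7 days = 10,080 points, 30 of them bad.
values = [2] * 10_050 + [9] * 30
result = occurrences_budget(values)

print(result["bad"])                         # 30
print(round(result["allowed_bad"], 1))       # 50.4
print(round(result["budget_remaining"], 3))  # 0.405
```

Under these assumptions the budget allows about 50 bad data points per 7-day window, and 30 bad points leave roughly 40% of the budget unspent.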

For clarity, we name this SLO objective Acceptable readiness transitions. In this example, it is the only objective in our SLO.

Defining budgeting method and objectives

Interested in alternative methods? Explore the timeslices error budget method.

The final step is to make the SLO user-friendly and easy to collaborate on. This includes giving it a clear name and applying optional settings for streamlined management.

Additional settings
  • As you're typing a display name for your Nobl9 resource (here, the SLO and its objective), Nobl9 automatically generates the name for it.
    name is a unique identifier for all resources in Nobl9. You can manually set it only once, when creating a resource. After you save your resource, the name becomes read-only. You will need your resource name when working with sloctl to specify the names of required resources in a YAML definition.
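
To illustrate how a display name can map to a name, here is a hypothetical Python sketch. It is not Nobl9's actual generation logic. It assumes that names follow a DNS-label style (lowercase alphanumerics and hyphens, at most 63 characters); check the Nobl9 documentation for the exact rules. The `suggest_name` helper is invented for this example.

```python
import re


def suggest_name(display_name: str, max_len: int = 63) -> str:
    """Derive a lowercase, hyphen-separated identifier from a display name."""
    # Replace every run of characters outside a-z and 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", display_name.lower()).strip("-")
    # Truncate, then make sure the result does not end with a hyphen.
    return slug[:max_len].rstrip("-")


print(suggest_name("Acceptable readiness transitions"))
# acceptable-readiness-transitions
```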