Most teams set up service level objectives (SLOs) based on a rolling window because it's more actionable, especially when trying to get ahead of incidents. Many engineering teams work in sprints, and rolling time windows match that model well.
To answer management questions, you may be tempted to extrapolate from rolling time window SLOs and repurpose them for calendar reporting. Doing so can create a skewed picture of the state of your service and lead to over- or under-reporting of uptime. If you’re using rolling time windows and trying to extrapolate calendar data from them, a calendar-aligned time window is the better basis for calculating your error budget.
Calendar-aligned reporting: overview
Defining SLOs to align directly to a calendar makes it much easier to provide reporting for uptime. As the name implies, calendar-aligned SLOs are tied to a specific window of time on the calendar with a clear start and stop date. The error budget resets at the end of each window, and any budget spent within a window is never regained before that reset.
Rolling time window SLOs, on the other hand, never reset but “earn” the error budget back as old “bad” occurrences drop off the back end of the time window. You can easily repurpose the SLIs from your existing rolling-time-window SLOs for calendar-aligned ones (see the case study below).
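The difference between the two budget behaviors described above can be sketched in a few lines of Python. This is a minimal illustration, not how Nobl9 or any other tool computes budgets: it assumes a time-based budget measured in "bad minutes", a 99.9% target, and a hypothetical `bad_by_day` mapping with a single 30-minute incident on March 10, 2024. All function names are made up for the example.

```python
import calendar
from datetime import date, timedelta

MINUTES_PER_DAY = 24 * 60


def budget_minutes(target, days):
    """Total allowed bad minutes for a window of `days` days."""
    return (1 - target) * days * MINUTES_PER_DAY


def bad_minutes_between(bad_by_day, start, end):
    """Sum bad minutes for all days in [start, end], inclusive."""
    return sum(m for d, m in bad_by_day.items() if start <= d <= end)


def rolling_remaining(bad_by_day, target, as_of, days=28):
    """Budget left in a rolling window of `days` days ending on `as_of`."""
    start = as_of - timedelta(days=days - 1)
    spent = bad_minutes_between(bad_by_day, start, as_of)
    return budget_minutes(target, days) - spent


def calendar_remaining(bad_by_day, target, as_of):
    """Budget left in the calendar month containing `as_of` (month to date)."""
    start = as_of.replace(day=1)
    days_in_month = calendar.monthrange(as_of.year, as_of.month)[1]
    spent = bad_minutes_between(bad_by_day, start, as_of)
    return budget_minutes(target, days_in_month) - spent


# Hypothetical data: one 30-minute incident on March 10, 2024.
bad_by_day = {date(2024, 3, 10): 30.0}
target = 0.999

for as_of in (date(2024, 3, 31), date(2024, 4, 1), date(2024, 4, 6), date(2024, 4, 7)):
    rolling = rolling_remaining(bad_by_day, target, as_of)
    monthly = calendar_remaining(bad_by_day, target, as_of)
    print(f"{as_of}  rolling: {rolling:6.2f} min  calendar: {monthly:6.2f} min")
```

Under these assumptions, the calendar-aligned budget stays reduced through March 31 and jumps back to a full month's allowance on April 1, while the rolling budget stays reduced until April 7, the first day on which the incident is more than 28 days old.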
Operational alerts vs. management reports
Different audiences have slightly different needs when comparing actual results to goals. Platform teams, reliability engineers, application developers, and operations teams need to know what’s happening now and how it is trending, regardless of arbitrary month-end boundaries. Using 28-day rolling windows for SLOs is a convenient and consistent way to ensure that minor blips don't trigger a reaction while significant trends can be dealt with proactively.
On the other hand, management and business-focused users like customer support, procurement, and executive teams need to see how well the services (including third-party vendors) are operating over calendar periods. Your organization can use this calendar-based view for planning, compliance reporting, and holding vendors accountable to SLAs.
Case study: are we achieving our uptime goals?
Overview of the issue
A Nobl9 customer asked us to help them create a management report using their existing SLOs. They ran a large e-commerce platform and used SLOs to ensure critical user journeys like catalog display or checkout were working correctly for their customers. These SLOs were set up with rolling time windows.
They tried to use their existing SLO data to answer questions from management about overall uptime. Still, they were stuck merely estimating, because the rolling-window data had to be converted to calendar dates after the fact.
Configuring uptime reporting with calendar-aligned windows
To simplify things, they created calendar-aligned SLOs using the same SLI queries, which made the configuration trivial and the reporting very clear. Because the customer already had error budget alerts on their rolling-time-window SLOs, the new calendar-aligned SLOs needed no error budget alerting policies of their own.
They could provide up-to-the-minute uptime reports without data manipulation or extra reporting steps. The combination of SLOs gave all audiences what they needed: the platform, reliability engineering, operations, and application development teams got proactive alerts when their services started seeing elevated errors, and the management and customer support teams could see precisely how well the service worked in any given month, quarter, or year.
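To make the calendar view concrete, here is a rough Python sketch of rolling daily SLI counts up into monthly, quarterly, or yearly uptime figures. It is an illustration only and not the customer's or Nobl9's implementation; the `uptime_by_period` helper and the sample `daily_counts` data (good events, total events per day) are hypothetical.

```python
from collections import defaultdict
from datetime import date


def uptime_by_period(daily_counts, period="month"):
    """Aggregate {date: (good, total)} into {period_label: good / total}."""
    totals = defaultdict(lambda: [0, 0])
    for day, (good, total) in daily_counts.items():
        if period == "month":
            key = f"{day.year}-{day.month:02d}"
        elif period == "quarter":
            key = f"{day.year}-Q{(day.month - 1) // 3 + 1}"
        elif period == "year":
            key = str(day.year)
        else:
            raise ValueError(f"unsupported period: {period}")
        totals[key][0] += good
        totals[key][1] += total
    # Skip periods with no traffic to avoid dividing by zero.
    return {k: g / t for k, (g, t) in sorted(totals.items()) if t}


# Hypothetical sample data: (good events, total events) per day.
daily_counts = {
    date(2024, 1, 15): (990, 1000),
    date(2024, 2, 1): (1000, 1000),
    date(2024, 4, 2): (950, 1000),
}

for period in ("month", "quarter", "year"):
    print(period, uptime_by_period(daily_counts, period))
```

Because each figure is computed from events that fall inside fixed calendar boundaries, the numbers for a closed month or quarter do not change afterward, which is the property management reporting relies on.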
Without SLOs, the organization had trouble understanding whether they were meeting SLAs and were constantly reacting to incidents and outages. Now, they have a much clearer picture of what is happening. They can adjust resources and make data-driven decisions around contractual SLAs, user experience, and the velocity of features.