Rolling time windows

Rolling time windows excel at offering a near real-time view of system health, making them a vital tool for reliability engineers, application developers, and operations teams. They are particularly effective when organizations prioritize reactive and proactive operations over historical reporting, because the SLOs refresh continuously to capture the most recent data.

Rolling SLOs reflect service reliability based on rolling intervals that continuously move forward, providing a live view of service performance and enabling engineering teams to react to recent changes quickly.

This guide explores when to use rolling time windows, how they complement calendar-aligned SLOs, and provides real-world examples to help teams implement them for effective system reliability tracking and decision-making.

In a nutshell

Rolling time windows keep a fixed duration while their start and end points continuously shift forward. As the window moves, the oldest data points drop out of it and newer ones are added.
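The sliding behavior can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the `Event` and `RollingWindow` names are hypothetical, the window is assumed to cover the interval (now - duration, now], events are assumed to arrive in timestamp order, and reliability is taken as the share of good events in the window.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    timestamp: datetime
    good: bool  # True if the data point met the SLO threshold (assumption)


class RollingWindow:
    """Hypothetical fixed-length window covering (now - duration, now]."""

    def __init__(self, duration: timedelta) -> None:
        self.duration = duration
        self.events: deque[Event] = deque()

    def add(self, event: Event) -> None:
        # Assumes events arrive in timestamp order, so the oldest is always on the left.
        self.events.append(event)
        self._evict(event.timestamp)

    def _evict(self, now: datetime) -> None:
        # Drop every event at or before the window start: it has rolled out.
        start = now - self.duration
        while self.events and self.events[0].timestamp <= start:
            self.events.popleft()

    def reliability(self, now: datetime) -> float:
        """Share of good events currently inside the window (1.0 if it is empty)."""
        self._evict(now)
        if not self.events:
            return 1.0
        good = sum(1 for e in self.events if e.good)
        return good / len(self.events)


t0 = datetime(2024, 1, 1, 12, 0)
window = RollingWindow(timedelta(hours=1))
window.add(Event(t0, good=False))
window.add(Event(t0 + timedelta(minutes=30), good=True))
print(window.reliability(t0 + timedelta(minutes=30)))  # 0.5
print(window.reliability(t0 + timedelta(minutes=61)))  # 1.0; the bad event has rolled out
```

Using a deque keeps eviction cheap: because events are ordered, only the left end ever needs to be inspected as the window moves forward.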

Consequently, configuring an SLO with a rolling time window only requires selecting the window duration; the start time is set automatically to the moment the SLO is saved. The duration can be set to at most 31 days.
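The configuration rule above (the user picks only the duration, the start is stamped on save, and the duration is capped at 31 days) can be modeled as a small validated object. This is a hypothetical Python sketch for illustration only; `RollingTimeWindowConfig` and `bounds_at` are invented names and do not represent any real product API or configuration format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Upper bound on the rolling window duration, as stated in this guide.
MAX_ROLLING_DURATION = timedelta(days=31)


@dataclass(frozen=True)
class RollingTimeWindowConfig:
    """Hypothetical model of a rolling time window setting (not a real API)."""

    duration: timedelta  # the only value the user chooses
    # Not accepted by the constructor: stamped with the current UTC time on "save".
    start: datetime = field(init=False, default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if not timedelta(0) < self.duration <= MAX_ROLLING_DURATION:
            raise ValueError("duration must be greater than 0 and at most 31 days")

    def bounds_at(self, now: datetime) -> tuple[datetime, datetime]:
        """Return (window_start, window_end) at `now`: same length, shifted forward."""
        return now - self.duration, now


cfg = RollingTimeWindowConfig(duration=timedelta(days=28))
now = datetime(2024, 5, 1, tzinfo=timezone.utc)
window_start, window_end = cfg.bounds_at(now)
print(window_start.isoformat(), window_end.isoformat())
# 2024-04-03T00:00:00+00:00 2024-05-01T00:00:00+00:00
```

Making `start` an `init=False` field mirrors the point that the start time is not something the user configures.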

For reliability to return to 100%, both of the following conditions must be met:

  • Every bad data point must roll out of the window and be replaced with good ones
  • Good performance must last long enough for the reliability to fully recover, that is, for at least one full window duration after the last bad data point
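Taken together, these conditions mean the earliest possible full recovery is the timestamp of the last bad data point plus the window duration. The Python sketch below computes that moment; the function name is hypothetical, and it assumes the window covers (now - window, now] and that every data point after the last bad one is good.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def full_recovery_time(bad_timestamps: Iterable[datetime],
                       window: timedelta) -> Optional[datetime]:
    """Earliest time a rolling window of length `window` holds no bad data points.

    Assumes no further bad data points occur. Returns None if there were none.
    """
    last_bad = max(bad_timestamps, default=None)
    if last_bad is None:
        return None
    # The last bad point rolls out once (now - window) reaches its timestamp.
    return last_bad + window


incident = [datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 45)]
print(full_recovery_time(incident, timedelta(days=7)))  # 2024-03-08 09:45:00
```

Only the last bad data point matters: any earlier ones have already rolled out by the time it does.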

Practical use

What goals do rolling time windows serve best?

Rolling SLOs are typically most useful for delivering timely operational alerts. With this approach, reliability engineers, application developers, and operations teams gain near real-time insight into system performance and can act promptly to maintain the desired level of reliability.

| Goal | Business domain | Practical use |
|---|---|---|
| Real-time monitoring | SRE / DevOps | Monitor the health of services in real time to detect and respond to issues promptly |
| Incident management | SRE / DevOps | Provide a dynamic view of the error budget and allow for recovery after incidents as old data rolls out |
| Trend analysis | SRE / DevOps | Identify patterns or anomalies in service reliability without arbitrary calendar boundaries |
| Feature impact monitoring | Application development | Track the impact of feature rollouts on reliability after deployments |
| Stability validation | Application development | Identify whether the service stabilizes after issues |
| Granular operation oversight | Platform | Predict reliability by continuously highlighting trends before they escalate into incidents |
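The incident management row above relies on the error budget recovering as old data rolls out. The Python sketch below shows that effect using one common ratio-based definition of remaining error budget; the formula and function name are assumptions for illustration, since this guide does not specify how the budget is calculated. The counts passed in are assumed to come from the current rolling window.

```python
def error_budget_remaining(good: int, total: int, target: float) -> float:
    """Fraction of the error budget left in the current rolling window (assumed formula).

    `target` is the SLO objective, e.g. 0.99. 1.0 means the budget is untouched,
    0.0 means it is exhausted, and negative values mean it is overspent.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if total == 0:
        return 1.0  # no data in the window: nothing has been spent
    allowed_bad = (1.0 - target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


# Window that still contains an incident: 50 bad events out of 10,000 at a 99% target.
print(round(error_budget_remaining(9_950, 10_000, 0.99), 2))  # 0.5
# Later window, after most incident data has rolled out: 10 bad out of 10,000.
print(round(error_budget_remaining(9_990, 10_000, 0.99), 2))  # 0.9
```

The target is unchanged between the two calls; only the window contents differ, which is what lets the budget recover without a calendar reset.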

Case study: feature rollout impact assessment

A technology scale-up was rapidly launching new features in its SaaS platform. The engineering and product management teams struggled to measure the immediate reliability impact of each deployment. Their existing monthly or quarterly reporting windows obscured short-term trends, causing them to miss critical signals after new features went live.

They needed a way to gauge changes in user-facing reliability in near real-time, especially post-release.

Solution with rolling time windows

The organization configured SLOs with rolling time windows, keeping the rest of the SLO settings the same. By evaluating the error budget and service performance across constantly updating windows, teams received rapid feedback on how each deployment affected the overall user experience.

Because rolling time windows continuously excluded the oldest data while incorporating the latest, service disruptions or performance dips caused by new features became quickly visible, even if they were resolved swiftly. This enabled teams to course-correct, roll back, or fast-follow with fixes without waiting until the end of the reporting period.

End result

The transition to rolling time windows led to:

  • Faster identification and remediation of incident spikes caused by recent changes
  • Real-time monitoring of the impact caused by implemented changes
  • A more stable user experience, thanks to continuous, actionable insights rather than delayed, high-level retrospectives
For a more in-depth look, consult additional resources: