Composites use case

This document walks through an example Composites 2.0 configuration that emulates a real-life scenario.

User journey

The chart below illustrates a user journey defined for an online store:

User journey

First, the user interacts with a website and selects products they want to buy. When finalizing the purchase, they pay via an external payment provider integrated with the website.

After an order is placed, several things happen. An order with a request to dispatch a package is sent to the warehouse, which uses its own software hosted outside the shop's IT infrastructure and integrated via an API. An invoice is issued and processed by accounting. An email with the order confirmation, delivery details from the warehouse, and the attached invoice is sent to the user.

The entire scenario is divided into two distinct phases:

  • Before the user places an order
  • After the user places an order

System architecture

Both phases mix steps fulfilled by software hosted in the store's IT infrastructure (store website, email server, invoicing software) and by external providers, such as the payment and warehouse services.

The following chart illustrates the store's IT architecture:

Store IT architecture

The company uses Prometheus to monitor all its self-hosted services. Due to the limitations imposed by external service providers, metrics regarding warehouse operations and payment services are only available via Datadog integration.

Nobl9 SLO configuration

The company has already configured a set of SLOs for each of the services using both available data sources. Because different teams are responsible for self-hosted applications and external integrations, these configurations were organized into separate Nobl9 projects. The reliability of most services is measured with two SLOs: one SLO measuring availability and one SLO measuring the latency of the service.

The company now wants to measure the overall reliability of the services that contribute to the core user flow of placing orders and making purchases. The following observations have been made when defining reliability requirements:

  • In general, the availability of a service is much more important than its latency. While latency affects a user's purchase experience, it is far more important for the service to work at all than for it to respond quickly.
  • Services in the pre-purchase phase are more important than services in the post-purchase phase. Once the user has paid and placed an order, the most important part of the process is done.
  • The latency of the warehouse integration and the invoicing system will not be taken into consideration. The company has decided that the benefit of monitoring and improving the latency of these services is not worth the investment.
  • In the pre-purchase phase, website availability is slightly more important than payment availability.
  • In the post-purchase phase, the warehouse and invoicing services are slightly more important than email.

Based on these criteria, the company introduces the following SLOs:

The entire Nobl9 configuration looks like this:

Nobl9 Composite configuration

SLO configuration

The following section includes the configuration of all SLOs defined for the online store, along with the specific services connected to them.

Component SLOs

Store website
```yaml
apiVersion: n9/v1alpha
kind: Service
metadata:
  name: store-web
  displayName: Store Website
  project: e-commerce
spec:
  description: User facing store website.
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: store-web-availability
  displayName: Store Website Availability
  project: e-commerce
spec:
  budgetingMethod: Occurrences
  indicator:
    metricSource:
      kind: Agent
      name: prometheus
      project: e-commerce
  objectives:
    - displayName: Availability
      name: availability
      op: lt
      primary: true
      rawMetric:
        query:
          prometheus:
            promql: (time()*1+10)%(minute(vector(time()*1+10*60))+1)
      target: 0.95
      value: 50
  service: store-web
  timeWindows:
    - count: 28
      isRolling: true
      unit: Day
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: store-web-latency
  displayName: Store Website Latency
  project: e-commerce
spec:
  budgetingMethod: Occurrences
  indicator:
    metricSource:
      kind: Agent
      name: prometheus
      project: e-commerce
  objectives:
    - displayName: Latency
      name: latency
      op: lt
      primary: true
      rawMetric:
        query:
          prometheus:
            promql: (time()*1+17)%(minute(vector(time()*1+17*60))+1)
      target: 0.9
      value: 40
  service: store-web
  timeWindows:
    - count: 14
      isRolling: true
      unit: Day
```
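As a rough illustration of what the component objectives express, the following Python sketch evaluates an Occurrences-based raw-metric objective such as `store-web-availability`. It assumes that a data point counts as good when it satisfies `op` against `value` (here, below 50) and that the objective is met when the share of good points in the time window reaches `target` (0.95). The `OPS` mapping, the function names, and the sample data are hypothetical and not part of the Nobl9 API.

```python
# Hypothetical sketch of an Occurrences-based raw-metric objective.
# Assumption: a point is good when `point <op> value` holds, and the
# objective is met when the good ratio reaches `target`.
import operator
from typing import Sequence

# Hypothetical mapping of `op` strings to Python comparison functions.
OPS = {
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def good_ratio(points: Sequence[float], op: str, value: float) -> float:
    """Return the fraction of data points that satisfy the threshold."""
    if not points:
        raise ValueError("no data points in the time window")
    compare = OPS[op]
    good = sum(1 for p in points if compare(p, value))
    return good / len(points)


def objective_met(points: Sequence[float], op: str, value: float, target: float) -> bool:
    """Return True when the good ratio meets or exceeds the target."""
    return good_ratio(points, op, value) >= target


if __name__ == "__main__":
    # Made-up samples: 19 of 20 points are below the threshold of 50.
    samples = [10.0] * 19 + [55.0]
    print(good_ratio(samples, "lt", 50))  # 0.95
    print(objective_met(samples, "lt", 50, 0.95))  # True
```

Under the same simplified model, the latency objective (`op: lt`, `value: 40`, `target: 0.9`) would require at least 90% of data points below 40 over its 14-day rolling window.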

Composite SLOs

Pre-purchase user experience
```yaml
apiVersion: n9/v1alpha
kind: Service
metadata:
  name: user-experience
  displayName: User experience
  project: e-commerce
spec:
  description: Service for grouping all user experience based SLOs.
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: pre-purchase-user-experience
  displayName: Pre-purchase user experience
  project: e-commerce
spec:
  description: Measures experience before user makes a purchase.
  budgetingMethod: Occurrences
  objectives:
    - displayName: Pre-purchase user experience
      name: pre-purchase-user-experience
      target: 0.9
      composite:
        maxDelay: 45m
        components:
          objectives:
            - project: e-commerce
              slo: store-web-availability
              objective: availability
              weight: 4
              whenDelayed: CountAsBad
            - project: e-commerce
              slo: store-web-latency
              objective: latency
              weight: 1
              whenDelayed: CountAsGood
            - project: external-services
              slo: payments-availability
              objective: availability
              weight: 3
              whenDelayed: CountAsBad
            - project: external-services
              slo: payments-latency
              objective: latency
              weight: 1
              whenDelayed: CountAsGood
  service: user-experience
  timeWindows:
    - unit: Day
      count: 28
      isRolling: true
  alertPolicies:
    - slow-budget-drop
    - slow-burn
    - fast-budget-drop
```
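To see how the component weights of the pre-purchase composite reflect the company's criteria, the following Python sketch normalizes them and applies `whenDelayed` to a component whose data arrives late. It is a simplified, hypothetical model for intuition only: a component's influence is taken to be its weight divided by the sum of all weights, and a delayed component simply counts as fully good or fully bad. Nobl9's actual composite computation may differ. The `Component` class and the helper functions are illustrative names, not part of any Nobl9 API.

```python
# Simplified, hypothetical model of the pre-purchase composite's weights.
# Assumption: a component's influence is weight / sum(weights); this may not
# match Nobl9's internal composite computation.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Component:
    slo: str
    weight: float
    when_delayed: str  # "CountAsBad" or "CountAsGood", as in the YAML


# Values copied from the pre-purchase-user-experience objective.
COMPONENTS = [
    Component("store-web-availability", 4, "CountAsBad"),
    Component("store-web-latency", 1, "CountAsGood"),
    Component("payments-availability", 3, "CountAsBad"),
    Component("payments-latency", 1, "CountAsGood"),
]


def shares(components: List[Component]) -> Dict[str, float]:
    """Normalize weights so they sum to 1."""
    total = sum(c.weight for c in components)
    return {c.slo: c.weight / total for c in components}


def weighted_good(components: List[Component], states: Dict[str, Optional[bool]]) -> float:
    """Weighted share of good components at one evaluation point.

    None marks a component whose data arrived later than maxDelay; it is
    counted as good or bad according to its whenDelayed setting.
    """
    weights = shares(components)
    score = 0.0
    for c in components:
        state = states[c.slo]
        if state is None:
            state = c.when_delayed == "CountAsGood"
        if state:
            score += weights[c.slo]
    return score


if __name__ == "__main__":
    s = shares(COMPONENTS)
    availability = s["store-web-availability"] + s["payments-availability"]
    latency = s["store-web-latency"] + s["payments-latency"]
    # Criteria check: availability dominates latency; website beats payments.
    print(round(availability, 3), round(latency, 3))  # 0.778 0.222
    print(s["store-web-availability"] > s["payments-availability"])  # True

    # Payment availability data is delayed, so it is counted as bad.
    states = {c.slo: True for c in COMPONENTS}
    states["payments-availability"] = None
    print(round(weighted_good(COMPONENTS, states), 3))  # 0.667
```

With weights of 4, 1, 3, and 1, the availability components carry 7/9 of the total weight, in line with the requirement that availability matter far more than latency. Because only the availability components use `CountAsBad`, missing availability data lowers the composite, while missing latency data does not.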
Note: The queries in all SLOs are entirely hypothetical. The data does not reflect the actual availability or latency of any real system.