Graphite
Graphite is a monitoring tool used to track the performance of websites, applications, business services, and networked servers.
Graphite parameters and supported features in Nobl9
- General support:
  - Release channel: Stable, Beta
  - Connection method: Agent
  - Replay and SLI Analyzer: Historical data limit 30 days
  - Event logs: Not supported
  - Query checker: Not supported
  - Query parameters retrieval: Not supported
  - Timestamp cache persistence: Not supported
- Query parameters:
  - Query interval: 1 min
  - Query delay: 1 min
  - Jitter: 15 sec
  - Timeout: 30 sec
- Agent details and minimum required versions for supported features:
  - Plugin name: n9graphite
  - Query delay environment variable: GRAPHITE_QUERY_DELAY
  - Replay and SLI Analyzer: 0.65.0
Creating SLOs with Graphite
Nobl9 Web
Follow the instructions below to create your SLOs with Graphite in the UI:
-
Navigate to Service Level Objectives.
-
Click the button to create a new SLO.
-
In step 2, select Graphite as the Data Source for your SLO, then specify the Metric. You can choose either a Threshold Metric, where a single time series is evaluated against a threshold, or a Ratio Metric, which allows you to enter two time series to compare (for example, a count of good requests and total requests).
- Choose the Data Count Method for your ratio metric:
  - Non-incremental: counts incoming metric values one by one, so the resulting SLO graph is spike-shaped.
  - Incremental: counts incoming metric values incrementally, adding each new value to the previous ones, which results in a constantly increasing SLO graph.
-
Enter a Query for a threshold metric, or a Good query and a Total query for a ratio metric. The following are query examples:
- Threshold metric for Graphite:
  Query: carbon.agents.9b365cce.cpuUsage
- Ratio metric for Graphite:
  Good query: stats_counts.response.200
  Total query: stats_counts.response.all
SLI values for good and total
When choosing the query for the ratio SLI (countMetrics), keep in mind that the values resulting from that query for both good and total:
- Must be positive.
- Should preferably be integers, although fractions are also acceptable.
  - If you use fractions, we recommend that they be larger than 1e-4 = 0.0001.
- Shouldn't be larger than 1e+20.
-
In step 3, define a Time Window for the SLO.
-
Rolling time windows are better for tracking the recent user experience of a service.
-
Calendar-aligned windows are best suited for SLOs that are intended to map to business metrics measured on a calendar-aligned basis, such as every calendar month or every quarter.
-
In step 4, specify the Error Budget Calculation Method and your Objective(s).
- Occurrences method counts good attempts against the count of total attempts.
- Time Slices method measures how many good minutes were achieved (when a system operates within defined boundaries) during a time window.
- You can define up to 12 objectives for an SLO.
See the use case example and the SLO calculations guide for more information on the error budget calculation methods.
-
In step 5, add the Display name, Name, and other settings for your SLO:
- Set no data anomaly detection.
  When activated, Nobl9 notifies you if your SLO hasn't received data for the period you set.
- Add alert policies, labels, and links, if required.
  You can add up to 20 links per SLO.
-
Click Create SLO.
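The ratio-metric choices from steps 2 and 4 can be illustrated with a small Python sketch. It is not Nobl9's implementation: the function names are hypothetical, the value checks only mirror the recommendations listed for good and total values, and the incremental case assumes the counters start at zero at the beginning of the window.

```python
from itertools import accumulate
from typing import List, Sequence


def check_sli_value(value: float) -> List[str]:
    """Return the recommendations from step 2 that a good/total value violates."""
    problems: List[str] = []
    if value <= 0:
        problems.append("must be positive")
    elif not float(value).is_integer() and value <= 1e-4:
        problems.append("fractions should be larger than 1e-4")
    if value > 1e20:
        problems.append("should not be larger than 1e+20")
    return problems


def to_incremental(per_interval: Sequence[float]) -> List[float]:
    """Turn per-interval counts into a running total (a constantly increasing series)."""
    return list(accumulate(per_interval))


def occurrences_ratio(good: Sequence[float], total: Sequence[float], incremental: bool) -> float:
    """Good attempts divided by total attempts (Occurrences method).

    Non-incremental: every point is a separate count, so the points are summed.
    Incremental: every point is already a running total, so the last point is used
    (assumption: both counters start at zero when the window starts).
    """
    if incremental:
        good_count, total_count = good[-1], total[-1]
    else:
        good_count, total_count = sum(good), sum(total)
    return good_count / total_count


# Per-minute counts, for example from stats_counts.response.200 and stats_counts.response.all.
good = [95, 90, 100]
total = [100, 100, 100]

print(occurrences_ratio(good, total, incremental=False))
print(occurrences_ratio(to_incremental(good), to_incremental(total), incremental=True))
```

Both calls describe the same traffic, so they yield the same ratio. Only the shape of the underlying series differs: separate spikes in the non-incremental case, a growing total in the incremental one.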
YAML
- Threshold (rawMetric)
- Ratio (countMetric)
# Threshold (rawMetric)
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: api-server-slo
  displayName: API Server SLO
  project: default
  labels:
    area:
      - latency
      - slow-check
    env:
      - prod
      - dev
    region:
      - us
      - eu
    team:
      - green
      - sales
  annotations:
    area: latency
    env: prod
    region: us
    team: sales
spec:
  description: Example Graphite SLO
  indicator:
    metricSource:
      name: graphite
      project: default
      kind: Agent
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good response (200)
      value: 200
      name: ok
      target: 0.95
      rawMetric:
        query:
          graphite:
            metricPath: carbon.agents.9b365cce.cpuUsage
      op: lte
      primary: true
  service: api-server
  timeWindows:
    - unit: Month
      count: 1
      isRolling: false
      calendar:
        startTime: '2022-12-01 00:00:00'
        timeZone: UTC
  alertPolicies:
    - fast-burn-5x-for-last-10m
  attachments:
    - url: https://docs.nobl9.com
      displayName: Nobl9 Documentation
  anomalyConfig:
    noData:
      alertMethods:
        - name: slack-notification
          project: default
      alertAfter: 1h
# Ratio (countMetric)
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: api-server-slo
  displayName: API Server SLO
  project: default
  labels:
    area:
      - latency
      - slow-check
    env:
      - prod
      - dev
    region:
      - us
      - eu
    team:
      - green
      - sales
  annotations:
    area: latency
    env: prod
    region: us
    team: sales
spec:
  description: Example Graphite SLO
  indicator:
    metricSource:
      name: graphite
      project: default
      kind: Agent
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good response (200)
      value: 1
      name: ok
      target: 0.95
      countMetrics:
        incremental: true
        good:
          graphite:
            metricPath: stats_counts.response.200
        total:
          graphite:
            metricPath: stats_counts.response.all
      primary: true
  service: api-server
  timeWindows:
    - unit: Month
      count: 1
      isRolling: false
      calendar:
        startTime: '2022-12-01 00:00:00'
        timeZone: UTC
  alertPolicies:
    - fast-burn-5x-for-last-10m
  attachments:
    - url: https://docs.nobl9.com
      displayName: Nobl9 Documentation
  anomalyConfig:
    noData:
      alertMethods:
        - name: slack-notification
          project: default
      alertAfter: 1h
Metric specification for Graphite has only one mandatory field:
metricPath: a string field that specifies Graphite's metric path, such as servers.cpu.total.
See Paths and Wildcards in the Graphite documentation for details on metric paths.
The Graphite documentation describes the wildcard characters *, [, ], {, and }, but Nobl9 does not support them. Using any of these characters in metricPath results in a validation error.
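To catch an unsupported metricPath before applying an SLO, a local pre-check along the following lines may help. It is a minimal sketch based only on the rules stated above. The function name and error messages are hypothetical, and Nobl9's own validation may apply further rules.

```python
# Characters that Graphite uses for wildcards and that Nobl9 rejects in metricPath.
UNSUPPORTED_CHARS = set("*[]{}")


def validate_metric_path(metric_path: str) -> None:
    """Raise ValueError if metricPath is empty or contains a wildcard character."""
    if not metric_path:
        raise ValueError("metricPath is mandatory")
    found = sorted(UNSUPPORTED_CHARS.intersection(metric_path))
    if found:
        raise ValueError(
            f"metricPath {metric_path!r} contains unsupported characters: {' '.join(found)}"
        )


validate_metric_path("servers.cpu.total")  # passes silently

try:
    validate_metric_path("servers.*.cpu.total")
except ValueError as err:
    print(err)
```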
Querying the Graphite server
Metrics are retrieved using the from and until parameters once per minute. The API returns a half-open interval (from, until], which includes the end date but not the start date.
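The half-open interval can be sketched in Python as follows. Consecutive one-minute windows share a boundary, and because each window is (from, until], a data point on that boundary belongs only to the earlier window, so no point is counted twice. The helper names and the base URL are hypothetical. The URL uses the target, from, until, and format parameters of Graphite's render API, which is an assumption about how such a request could look, not a description of the Nobl9 agent's internals.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple
from urllib.parse import urlencode

Window = Tuple[datetime, datetime]  # (from, until]


def query_windows(start: datetime, count: int,
                  interval: timedelta = timedelta(minutes=1)) -> List[Window]:
    """Build consecutive windows; each window starts where the previous one ends."""
    return [(start + i * interval, start + (i + 1) * interval) for i in range(count)]


def in_window(ts: datetime, window: Window) -> bool:
    """Half-open check: the start is excluded, the end is included."""
    frm, until = window
    return frm < ts <= until


def render_url(base_url: str, metric_path: str, window: Window) -> str:
    """Compose a Graphite render API request for one window (epoch seconds)."""
    frm, until = window
    params = {
        "target": metric_path,
        "from": int(frm.timestamp()),
        "until": int(until.timestamp()),
        "format": "json",
    }
    return f"{base_url.rstrip('/')}/render?{urlencode(params)}"


start = datetime.fromtimestamp(1700000000, tz=timezone.utc)
windows = query_windows(start, count=2)
boundary = windows[0][1]

print(in_window(boundary, windows[0]), in_window(boundary, windows[1]))
print(render_url("http://graphite.example:8080", "servers.cpu.total", windows[0]))
```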