Amazon CloudWatch
Amazon CloudWatch is a monitoring and observability service and a repository that aggregates data from more than 70 AWS data sources. CloudWatch also allows users to publish custom metrics from their services. Creating SLOs using this data is a powerful tool to monitor large portfolios of products.
Nobl9 integration with CloudWatch supports CloudWatch Metrics Insights. Leveraging Metrics Insights, Nobl9 users can retrieve metrics even faster and gain added flexibility in querying raw service level indicator (SLI) data to use for their SLOs.
Using CloudWatch as a Source in Nobl9, users can configure their SLOs by leveraging data in CloudWatch-specific groupings, that is, by region, namespaces, and dimensions.
Amazon CloudWatch parameters and supported features in Nobl9
- General support:
  - Release channel: Stable, Beta
  - Connection method: Agent, Direct
  - Replay and SLI Analyzer: Historical data limit 15 days
  - Event logs: Supported
  - Query checker: Not supported
  - Query parameters retrieval: Supported
  - Timestamp cache persistence: Supported
- Query parameters:
  - Query interval: 1 min
  - Query delay: 1 min
  - Jitter: 15 sec
  - Timeout: 30 sec
- Agent details and minimum required versions for supported features:
  - Plugin name: n9cloudwatch
  - Query delay environment variable: CW_QUERY_DELAY
  - Replay and SLI Analyzer: 0.65.0
  - Query parameters retrieval: 0.73.2
  - Timestamp cache persistence: 0.65.0
- Additional notes:
  - Support for AWS cross-account observability
  - No support for high-resolution metrics and metrics that use more than one Unit
Creating SLOs with CloudWatch
Using Amazon CloudWatch, you can create SLOs by:
- Entering standard threshold and ratio metrics
- Entering an SQL query
- Entering multiple queries through JSON
All three methods are available both in the UI and through applying YAML (see the Creating CloudWatch SLOs - YAML section).
Nobl9 Web
Follow the instructions below to create your SLOs with CloudWatch in the UI:
- Navigate to Service Level Objectives.
- Click .
- Select a Service.
  It will be the location for your SLO in Nobl9.
- Select your Amazon CloudWatch data source.
- Modify Period for Historical Data Retrieval, when necessary.
- This value defines how far back in the past your data will be retrieved when replaying your SLO based on Amazon CloudWatch.
- A longer period can extend the data loading time for your SLO.
- Must be a positive whole number up to the maximum period value you've set when adding the Amazon CloudWatch data source.
- Select the Metric type:
  - Threshold metric: a single time series is evaluated against a threshold.
  - Ratio metric: two time series, good events and total events, are compared.
    For ratio metrics, select the Data count method: incremental or non-incremental.
  When choosing a query for a ratio metric (countMetrics), keep in mind that the values resulting from that query for both good and total:
  - Must be positive.
  - While we recommend using integers, fractions are also acceptable.
  - If using fractions, we recommend them to be larger than 1e-4 = 0.0001.
  - Shouldn't be larger than 1e+20.
- Configure the metric.
  CloudWatch allows you to create your query in the following ways:
  - Enter standard threshold and ratio metrics (click Configurations)
  - Enter an SQL query
  - Enter multiple queries through JSON
  Read Entering CloudWatch Query for detailed instructions.
- Define the Time window for your SLO:
- Rolling time windows constantly move forward as time passes. This type can help track the most recent events.
- Calendar-aligned time windows are usable for SLOs intended to map to business metrics measured on a calendar-aligned basis.
- Configure the Error budget calculation method and Objectives:
- Occurrences method counts good attempts against the count of total attempts.
- Time Slices method measures how many good minutes were achieved (when a system operates within defined boundaries) during a time window.
- You can define up to 12 objectives for an SLO.
  Similar threshold values for objectives: To use similar threshold values for different objectives in your SLO, we recommend differentiating them by setting varying decimal points for each objective.
  For example, if you want to use threshold value 1 for two objectives, set it to 1.0000001 for the first objective and to 1.0000002 for the second one.
  Learn more about threshold value uniqueness.
- Add the Display name, Name, and other settings for your SLO:
  - Name identifies your SLO in Nobl9. After you save the SLO, its name becomes read-only.
    Use only lowercase letters, numbers, and dashes.
  - Create composite SLO: with this option selected, you create a composite SLO 1.0. Composite SLOs 1.0 are deprecated. They're fully operable; however, we encourage you to create new composite SLOs 2.0.
    You can create composite SLOs 2.0 with sloctl using the provided template. Alternatively, you can create a composite SLO 2.0 with the Nobl9 Terraform provider.
  - Set Notifications on data. With it, Nobl9 will notify you when your SLO hasn't reported data for more than 15 minutes.
  - Add alert policies, labels, and links, if required.
    Up to 20 items of each type per SLO are allowed.
- Click CREATE SLO
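The guidance above on good and total values for ratio metrics (positive, preferably integers, fractions larger than 1e-4, nothing larger than 1e+20) can be checked before a query is used in an SLO. The following is a minimal sketch, not part of Nobl9 or sloctl; the function name check_count_value and the message strings are hypothetical.

```python
import math

# Limits taken from the guidance above; the constant names are illustrative.
RECOMMENDED_MIN_FRACTION = 1e-4
MAX_VALUE = 1e20


def check_count_value(value: float) -> list[str]:
    """Return the problems found for a single good or total data point."""
    if math.isnan(value) or value <= 0:
        return ["value must be positive"]
    problems = []
    if value > MAX_VALUE:
        problems.append("value should not be larger than 1e+20")
    # Fractions are acceptable, but values larger than 1e-4 are recommended.
    is_fraction = not float(value).is_integer()
    if is_fraction and value <= RECOMMENDED_MIN_FRACTION:
        problems.append("fractional value is not larger than 1e-4")
    return problems
```

An empty list means the data point fits the recommendations; each string describes one rule that the value does not meet.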
Entering CloudWatch query
Both Ratio and Threshold metrics for a standard CloudWatch metric use the same parameters. For the Ratio Metric, choose one of the following metric types and define the parameters separately:
- Good Metric, meaning a ratio of good requests to total requests
- Bad Metric, meaning a ratio of bad requests to total requests
Standard Configuration
- Enter an Account ID (optional). Use Account ID to access your SLO data from multiple accounts within a Region. An AWS account ID is a 12-digit identification number of your AWS account. Check AWS Documentation to learn more.
  AWS cross-account observability is available for configuration-type metrics only. The Account ID field is supported only through the Beta release channel.
- Add a Region. It is a region code in AWS. Use one of the regional codes that are listed here.
- Add a Namespace (mandatory, max. number of characters 255). A Namespace is a container for CloudWatch metrics. It can contain alphanumeric characters, period, hyphen, underscore, forward slash, hash, or colon. For further details, see CloudWatch Concepts | Amazon CloudWatch Documentation.
- Add a Metric Name (mandatory, max. number of characters 255).
- Add a Statistic function. Statistic functions are aggregations of metric data over specified periods. For example, you can use Maximum, Minimum, Sum, or Average. To see all statistics supported by CloudWatch for metrics, go to Statistics Definition | Amazon CloudWatch Documentation.
- Add Dimensions (optional, list). A dimension is a name/value pair that is part of the identity of a metric. Users can assign a max. of 10 dimensions to a metric.
  - Add a Name (mandatory, max. number of characters 255, whitespace is not trimmed). It is the name of the dimension. Dimension names must contain only ASCII characters and must include at least one non-whitespace character.
  - Add a Value (mandatory, max. number of characters 255). It is the value of the dimension. Dimension values must contain only ASCII characters and must include at least one non-whitespace character.
SQL Query
- Select SQL in the feature toggle.
- Select a Region.
- Select a type of Metric, and enter a Query. Sample SQL queries for CloudWatch:
  - SQL Threshold metric for CloudWatch:
    - Query:
      SELECT AVG(CPUUtilization) FROM "AWS/EC2"
  - SQL Ratio metric for CloudWatch:
    - Good Query:
      SELECT AVG(CPUUtilization) FROM "AWS/EC2"
    - Total Query:
      SELECT MAX(CPUUtilization) FROM "AWS/EC2"
JSON
CloudWatch integration lets you query multiple CloudWatch metrics and use math expressions to create new time series based on these metrics. You can do this by entering multiple JSON queries:
- Choose JSON in the feature toggle.
- Choose a Region.
- Select a type of Metric, and enter a Query.
- Enter your JSON query.
  - For samples of multiple JSON queries, refer to the Amazon CloudWatch JSON Queries section in the Nobl9 Documentation.
  - For further details on CloudWatch metric math functions, go to Using Metric Math | Amazon CloudWatch Documentation.
Your query must be valid JSON and must contain an array of metrics. Refer to the CloudWatch Metrics Insights Queries | Amazon CloudWatch Documentation for more detailed information.
sloctl
Standard configuration
- Threshold (rawMetric)
- Ratio (countMetric) good over total
- Ratio (countMetric) bad over total
- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: api-server-slo
    displayName: API Server SLO
    project: default
    labels:
      area:
        - latency
        - slow-check
      env:
        - prod
        - dev
      region:
        - us
        - eu
      team:
        - green
        - sales
    annotations:
      area: latency
      env: prod
      region: us
      team: sales
  spec:
    description: Example CloudWatch SLO
    indicator:
      metricSource:
        name: cloud-watch
        project: default
        kind: Agent
    budgetingMethod: Occurrences
    objectives:
      - displayName: Good response (200)
        value: 200.0
        name: ok
        target: 0.95
        rawMetric:
          query:
            cloudWatch:
              region: us-west-2
              namespace: AWS/RDS
              metricName: ReadLatency
              stat: Average
              dimensions:
                - name: LoadBalancer
                  value: app/api-server
              accountId: "123456789012"
        op: lte
        primary: true
    service: api-server
    timeWindows:
      - unit: Month
        count: 1
        isRolling: false
        calendar:
          startTime: 2022-12-01 00:00:00
          timeZone: UTC
    alertPolicies:
      - fast-burn-5x-for-last-10m
    attachments:
      - url: https://docs.nobl9.com
        displayName: Nobl9 Documentation
    anomalyConfig:
      noData:
        alertMethods:
          - name: slack-notification
            project: default
- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: api-server-slo
    displayName: API Server SLO
    project: default
    labels:
      area:
        - latency
        - slow-check
      env:
        - prod
        - dev
      region:
        - us
        - eu
      team:
        - green
        - sales
    annotations:
      area: latency
      env: prod
      region: us
      team: sales
  spec:
    description: Example CloudWatch SLO
    indicator:
      metricSource:
        name: cloud-watch
        project: default
        kind: Agent
    budgetingMethod: Occurrences
    objectives:
      - displayName: Good response (200)
        value: 1.0
        name: ok
        target: 0.95
        countMetrics:
          incremental: true
          good:
            cloudWatch:
              region: us-west-2
              namespace: AWS/ApplicationELB
              metricName: HTTPCode_Target_2XX_Count
              stat: SampleCount
              dimensions:
                - name: LoadBalancer
                  value: app/api-server
              accountId: "123456789012"
          total:
            cloudWatch:
              region: us-west-2
              namespace: AWS/ApplicationELB
              metricName: RequestCount
              stat: SampleCount
              dimensions:
                - name: LoadBalancer
                  value: app/api-server
              accountId: "123456789012"
        primary: true
    service: api-server
    timeWindows:
      - unit: Month
        count: 1
        isRolling: false
        calendar:
          startTime: 2022-12-01 00:00:00
          timeZone: UTC
    alertPolicies:
      - fast-burn-5x-for-last-10m
    attachments:
      - url: https://docs.nobl9.com
        displayName: Nobl9 Documentation
    anomalyConfig:
      noData:
        alertMethods:
          - name: slack-notification
            project: default
- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: api-server-slo
    displayName: API Server SLO
    project: default
    labels:
      area:
        - latency
        - slow-check
      env:
        - prod
        - dev
      region:
        - us
        - eu
      team:
        - green
        - sales
    annotations:
      area: latency
      env: prod
      region: us
      team: sales
  spec:
    description: Example CloudWatch SLO
    indicator:
      metricSource:
        name: cloud-watch
        project: default
        kind: Agent
    budgetingMethod: Occurrences
    objectives:
      - displayName: Good response (200)
        value: 1.0
        name: ok
        target: 0.95
        countMetrics:
          incremental: true
          bad:
            cloudWatch:
              region: us-west-2
              namespace: AWS/ApplicationELB
              metricName: HTTPCode_Target_5XX_Count
              stat: SampleCount
              dimensions:
                - name: LoadBalancer
                  value: app/api-server
              accountId: "123456789012"
          total:
            cloudWatch:
              region: us-west-2
              namespace: AWS/ApplicationELB
              metricName: RequestCount
              stat: SampleCount
              dimensions:
                - name: LoadBalancer
                  value: app/api-server
              accountId: "123456789012"
        primary: true
    service: api-server
    timeWindows:
      - unit: Month
        count: 1
        isRolling: false
        calendar:
          startTime: 2022-12-01 00:00:00
          timeZone: UTC
    alertPolicies:
      - fast-burn-5x-for-last-10m
    attachments:
      - url: https://docs.nobl9.com
        displayName: Nobl9 Documentation
    anomalyConfig:
      noData:
        alertMethods:
          - name: slack-notification
            project: default
Important notes:
Both ratio and threshold metrics for CloudWatch use the same parameters.
For ratio metric, define these parameters separately for the good/bad metric and total metric.
- region is required. It is a region code in AWS. Use one of the regional codes listed here.
- namespace is required (string, max. number of characters 255). It can contain alphanumeric characters, period (.), hyphen (-), underscore (_), forward slash (/), hash (#), or colon (:). A namespace is a container for CloudWatch metrics. For further details, see CloudWatch Concepts | Amazon CloudWatch documentation. Example: AWS/ApplicationELB.
- metricName is required (string, max. number of characters 255).
- stat is required. Statistics are aggregations of metric data over specified periods of time. To see which statistics CloudWatch supports for metrics, go to Statistics Definitions | Amazon CloudWatch documentation. Examples: Sum, Average, p95, TC(0.005:0.030).
- dimensions is optional (list). A dimension is a name/value pair that is part of the identity of a metric. Users can assign a max. of 10 dimensions to a metric.
  - name is required (string, max. number of characters 255). Dimension names must contain only ASCII characters and must include at least one non-whitespace character.
  - value is required (string, max. number of characters 255). Dimension values must contain only ASCII characters and must include at least one non-whitespace character.
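The field rules listed above can be checked locally before applying an SLO definition. The sketch below is only an illustration under stated assumptions: validate_standard_query is a hypothetical helper (not a Nobl9 or AWS function), the query is assumed to be the cloudWatch mapping already loaded into a Python dict, and "alphanumeric" is interpreted as ASCII letters and digits.

```python
import re

# Allowed namespace characters per the notes above: alphanumerics, period,
# hyphen, underscore, forward slash, hash, and colon (ASCII is an assumption).
NAMESPACE_RE = re.compile(r"[A-Za-z0-9.\-_/#:]+")
MAX_LEN = 255
MAX_DIMENSIONS = 10


def validate_standard_query(query: dict) -> list[str]:
    """Return a list of rule violations for a standard-configuration query."""
    errors = []
    for field in ("region", "namespace", "metricName", "stat"):
        if not query.get(field):
            errors.append(f"{field} is required")
    namespace = query.get("namespace", "")
    if namespace and (len(namespace) > MAX_LEN or not NAMESPACE_RE.fullmatch(namespace)):
        errors.append("namespace has invalid length or characters")
    if len(query.get("metricName", "")) > MAX_LEN:
        errors.append("metricName is longer than 255 characters")
    dimensions = query.get("dimensions", [])
    if len(dimensions) > MAX_DIMENSIONS:
        errors.append("at most 10 dimensions are allowed")
    for dim in dimensions:
        for key in ("name", "value"):
            text = dim.get(key, "")
            # Must be non-empty ASCII with at least one non-whitespace character.
            if not text.strip() or len(text) > MAX_LEN or not text.isascii():
                errors.append(f"dimension {key} is invalid: {text!r}")
    return errors
```

Passing the cloudWatch mapping from the threshold example above should return an empty list; each returned string names one violated rule.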
Using CloudWatch SQL query
- Threshold (rawMetric) SQL
- Ratio (countMetric) good over total
- Ratio (countMetric) bad over total
- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: api-server-slo
    displayName: API Server SLO
    project: default
    labels:
      area:
        - latency
        - slow-check
      env:
        - prod
        - dev
      region:
        - us
        - eu
      team:
        - green
        - sales
    annotations:
      area: latency
      env: prod
      region: us
      team: sales
  spec:
    description: Example CloudWatch SLO
    indicator:
      metricSource:
        name: cloud-watch
        project: default
        kind: Agent
    budgetingMethod: Occurrences
    objectives:
      - displayName: Good response (200)
        value: 200.0
        name: ok
        target: 0.95
        rawMetric:
          query:
            cloudWatch:
              region: us-west-2
              sql: SELECT AVG(CPUUtilization) FROM "AWS/EC2"
        op: lte
        primary: true
    service: api-server
    timeWindows:
      - unit: Month
        count: 1
        isRolling: false
        calendar:
          startTime: 2022-12-01 00:00:00
          timeZone: UTC
    alertPolicies:
      - fast-burn-5x-for-last-10m
    attachments:
      - url: https://docs.nobl9.com
        displayName: Nobl9 Documentation
    anomalyConfig:
      noData:
        alertMethods:
          - name: slack-notification
            project: default
- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: api-server-slo
    displayName: API Server SLO
    project: default
    labels:
      area:
        - latency
        - slow-check
      env:
        - prod
        - dev
      region:
        - us
        - eu
      team:
        - green
        - sales
    annotations:
      area: latency
      env: prod
      region: us
      team: sales
  spec:
    description: Example CloudWatch SLO
    indicator:
      metricSource:
        name: cloud-watch
        project: default
        kind: Agent
    budgetingMethod: Occurrences
    objectives:
      - displayName: Good response (200)
        value: 1.0
        name: ok
        target: 0.95
        countMetrics:
          incremental: true
          good:
            cloudWatch:
              region: us-west-2
              sql: SELECT AVG(CPUUtilization) FROM "AWS/EC2"
          total:
            cloudWatch:
              region: us-west-2
              sql: SELECT MAX(CPUUtilization) FROM "AWS/EC2"
        primary: true
    service: api-server
    timeWindows:
      - unit: Month
        count: 1
        isRolling: false
        calendar:
          startTime: 2022-12-01 00:00:00
          timeZone: UTC
    alertPolicies:
      - fast-burn-5x-for-last-10m
    attachments:
      - url: https://docs.nobl9.com
        displayName: Nobl9 Documentation
    anomalyConfig:
      noData:
        alertMethods:
          - name: slack-notification
            project: default
- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: api-server-slo
    displayName: API Server SLO
    project: default
    labels:
      area:
        - latency
        - slow-check
      env:
        - prod
        - dev
      region:
        - us
        - eu
      team:
        - green
        - sales
    annotations:
      area: latency
      env: prod
      region: us
      team: sales
  spec:
    description: Example CloudWatch SLO
    indicator:
      metricSource:
        name: cloud-watch
        project: default
        kind: Agent
    budgetingMethod: Occurrences
    objectives:
      - displayName: Good response (200)
        value: 1.0
        name: ok
        target: 0.95
        countMetrics:
          incremental: true
          bad:
            cloudWatch:
              region: us-west-2
              sql: SELECT AVG(CPUUtilization) FROM "AWS/EC2"
          total:
            cloudWatch:
              region: us-west-2
              sql: SELECT MAX(CPUUtilization) FROM "AWS/EC2"
        primary: true
    service: api-server
    timeWindows:
      - unit: Month
        count: 1
        isRolling: false
        calendar:
          startTime: 2022-12-01 00:00:00
          timeZone: UTC
    alertPolicies:
      - fast-burn-5x-for-last-10m
    attachments:
      - url: https://docs.nobl9.com
        displayName: Nobl9 Documentation
    anomalyConfig:
      noData:
        alertMethods:
          - name: slack-notification
            project: default
Important notes:
Both ratio and threshold metrics for CloudWatch use the same parameters.
For ratio metric, define these parameters separately for the good/bad metric and total metric.
When using an SQL query, only these fields are required:
- region is mandatory. It is a region code in AWS. Use one of the regional codes listed here. Note: CloudWatch SQL query is available in all AWS Regions, except China.
- sql is mandatory. It is an SQL query to compare, aggregate, and group metrics by labels to gain real-time operational insights.
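Because each way of defining a CloudWatch query has its own set of required fields, it can help to check a definition against the set that applies to it. The following is a rough sketch based only on the field lists in this document (standard configuration, SQL, and JSON); the mode labels and the helper names detect_mode and missing_fields are hypothetical and not part of Nobl9 or sloctl.

```python
# Required fields for each way of defining a CloudWatch query, as listed in
# the notes of this document. The mode names are illustrative labels.
REQUIRED_FIELDS = {
    "standard": {"region", "namespace", "metricName", "stat"},
    "sql": {"region", "sql"},
    "json": {"region", "json"},
}


def detect_mode(query: dict) -> str:
    """Guess the query mode from the keys present in the cloudWatch mapping."""
    if "sql" in query:
        return "sql"
    if "json" in query:
        return "json"
    return "standard"


def missing_fields(query: dict) -> list[str]:
    """Return the required fields that are absent or empty, sorted by name."""
    mode = detect_mode(query)
    return sorted(f for f in REQUIRED_FIELDS[mode] if not query.get(f))
```

For an SQL query, only region and sql are inspected; a mapping without either sql or json is treated as a standard configuration.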
Multiple-metric SLO with JSON
- Threshold (rawMetric) JSON
- Ratio (countMetric)
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: cloudwatch-rawmetric-via-json
  project: cloudwatch
spec:
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      kind: Agent
      name: cloudwatch
      project: cloudwatch
  objectives:
    - displayName: ""
      op: lte
      rawMetric:
        query:
          cloudWatch:
            json: |-
              [
                {
                  "Id": "e1",
                  "Expression": "m1 / m2",
                  "Period": 60
                },
                {
                  "Id": "m1",
                  "MetricStat": {
                    "Metric": {
                      "Namespace": "AWS/ApplicationELB",
                      "MetricName": "HTTPCode_Target_2XX_Count",
                      "Dimensions": [
                        {
                          "Name": "LoadBalancer",
                          "Value": "app/main-default-appingress-350b/904311bedb964754"
                        }
                      ]
                    },
                    "Period": 60,
                    "Stat": "Sum"
                  },
                  "ReturnData": false
                },
                {
                  "Id": "m2",
                  "MetricStat": {
                    "Metric": {
                      "Namespace": "AWS/ApplicationELB",
                      "MetricName": "RequestCount",
                      "Dimensions": [
                        {
                          "Name": "LoadBalancer",
                          "Value": "app/main-default-appingress-350b/904311bedb964754"
                        }
                      ]
                    },
                    "Period": 60,
                    "Stat": "Sum"
                  },
                  "ReturnData": false
                }
              ]
            region: eu-central-1
      target: 0.8
      value: 0.9
  service: cloudwatch-service
  timeWindows:
    - count: 1
      isRolling: true
      period:
        begin: "2021-11-10T14:49:37Z"
        end: "2021-11-10T15:49:37Z"
      unit: Hour
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: cloudwatch-timeslices-json
  project: cloudwatch
spec:
  budgetingMethod: Timeslices
  description: ""
  indicator:
    metricSource:
      name: cloudwatch
  objectives:
    - countMetrics:
        good:
          cloudWatch:
            json: |
              [
                {
                  "Id": "e1",
                  "MetricStat": {
                    "Metric": {
                      "Namespace": "AWS/ApplicationELB",
                      "MetricName": "HTTPCode_Target_2XX_Count",
                      "Dimensions": [
                        {
                          "Name": "LoadBalancer",
                          "Value": "app/main-default-appingress-350b/123456789"
                        }
                      ]
                    },
                    "Period": 60,
                    "Stat": "Sum"
                  }
                }
              ]
            region: eu-central-1
        incremental: false
        total:
          cloudWatch:
            json: |
              [
                {
                  "Id": "e2",
                  "MetricStat": {
                    "Metric": {
                      "Namespace": "AWS/ApplicationELB",
                      "MetricName": "RequestCount",
                      "Dimensions": [
                        {
                          "Name": "LoadBalancer",
                          "Value": "app/main-default-appingress-350b/123456789"
                        }
                      ]
                    },
                    "Period": 60,
                    "Stat": "Sum"
                  }
                }
              ]
            region: eu-central-1
      displayName: ""
      target: 0.5
      timeSliceTarget: 0.5
      value: 1
  service: cloudwatch-service
  timeWindows:
    - count: 1
      isRolling: true
      period:
        begin: "2021-11-10T12:19:58Z"
        end: "2021-11-10T13:19:58Z"
      unit: Hour
Important notes:
Both ratio and threshold metrics for CloudWatch use the same parameters.
For ratio metric, define these parameters separately for the good/bad metric and total metric.
When using multiple queries (JSON), keep the following in mind:
- The region field is mandatory. It is a region code in AWS. Use one of the regional codes listed here.
- The json field is mandatory. It is a JSON query that lets you query multiple CloudWatch metrics and use math expressions to create new time series based on these metrics.
The following JSON validation applies:
- The JSON query must be valid.
- The JSON query should be an array of metrics.
- Only one ReturnData field can be set to true (when it is not set, it defaults to true). The ReturnData fields in all other metrics must be set explicitly to false.
- The Period field in MetricStat is required and must be equal to 60. If MetricStat does not exist, the Period field should be set to 60 in the base object.
For further details on CloudWatch metric math functions, go to Using Metric Math | Amazon CloudWatch documentation.
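The JSON validation rules listed above can be approximated locally before applying an SLO. This is a minimal sketch, not Nobl9's actual validator: validate_cloudwatch_json is a hypothetical helper, and treating a query in which no metric returns data as an error is an assumption that goes slightly beyond the "only one" wording.

```python
import json


def validate_cloudwatch_json(raw: str) -> list[str]:
    """Check a CloudWatch JSON query against the rules described above."""
    try:
        metrics = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(metrics, list) or not all(isinstance(m, dict) for m in metrics):
        return ["query must be an array of metric objects"]
    errors = []
    # ReturnData defaults to true when it is not set.
    returning = [m for m in metrics if m.get("ReturnData", True)]
    if len(returning) != 1:
        errors.append(f"exactly one metric may return data, found {len(returning)}")
    for m in metrics:
        # Period lives in MetricStat when present, otherwise in the base object.
        holder = m["MetricStat"] if "MetricStat" in m else m
        if holder.get("Period") != 60:
            errors.append(f"{m.get('Id', '?')}: Period must be 60")
    return errors
```

The threshold example above (expression e1 returning data, with m1 and m2 set to ReturnData false and every Period equal to 60) should produce an empty list.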
Querying the CloudWatch server
Once the SLO is set up, Nobl9 queries the CloudWatch server every 60 seconds.
CloudWatch API rate limits
For the GetMetricData API, CloudWatch sets a default limit of 50 TPS (transactions per second) per Region. This is the maximum number of operation requests you can make per second. For more information, refer to the CloudWatch service quotas | CloudWatch documentation.
The minimum query and storage period in CloudWatch is one second. By default, CloudWatch stores data with a 1-minute period.
CloudWatch retains metric data for different lengths of time depending on the storage period. For more information, refer to the GetMetricData | CloudWatch documentation.
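To get a rough feel for how the 60-second query interval relates to the default 50 TPS limit, the arithmetic can be sketched as follows. This is a back-of-the-envelope estimate only: it assumes that every query results in exactly one GetMetricData request per interval and that requests are spread evenly over the minute, neither of which is stated in this document. The function names are hypothetical.

```python
DEFAULT_TPS_LIMIT = 50       # default GetMetricData limit per Region
QUERY_INTERVAL_SECONDS = 60  # Nobl9 queries CloudWatch every 60 seconds


def average_tps(queries_per_interval: int) -> float:
    """Average requests per second if queries are spread evenly (assumption)."""
    return queries_per_interval / QUERY_INTERVAL_SECONDS


def max_queries_per_interval(tps_limit: int = DEFAULT_TPS_LIMIT) -> int:
    """Upper bound on requests per interval under the same assumption."""
    return tps_limit * QUERY_INTERVAL_SECONDS
```

Under these assumptions, 600 queries per interval average 10 requests per second, and the default limit corresponds to at most 3000 requests per 60-second interval in a Region. Other consumers of the same quota in the account are not taken into account.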
Known limitations
CloudWatch SQL query is available in all AWS Regions, except China.