Sumo Logic
Sumo Logic is an observability platform that provides visibility into AWS, Azure, and GCP cloud applications and infrastructure.
Sumo Logic parameters and supported features in Nobl9
- General support:
- Release channel: Stable, Beta
- Connection method: Agent, Direct
- Replay and SLI Analyzer: Not supported
- Event logs: Supported
- Query checker: Not supported
- Query parameters retrieval: Supported
- Timestamp cache persistence: Supported
- Query parameters:
- Query interval: 2 min
- Query delay: 4 min
- Jitter: 30 sec
- Timeout: 30 sec
- Agent details and minimum required versions for supported features:
- Plugin name: n9sumologic
- Query delay environment variable: SUMOLOGIC_QUERY_DELAY
- Query parameters retrieval: 0.73.2
- Timestamp cache persistence: 0.65.0
- Additional notes:
- Supported authentication using <accessId>:<accessKey>
Creating SLOs with Sumo Logic
Sumo Logic allows you to create SLOs for both types of metrics by:
- Entering logs
- Entering metrics
See the instructions in the following sections for more details.
Nobl9 Web
- Threshold - Metrics
- Threshold - Logs
- Ratio - Metrics
- Ratio - Logs
Follow the instructions below to create a Sumo Logic threshold metric using the Metrics type:
- Navigate to Service Level Objectives.
- Click the button.
- In step 1 of the SLO wizard, select the Service the SLO will be associated with.
- In step 2, select Sumo Logic as the data source for your SLO, then specify the Metric.
- Select Threshold metric > Metrics.
- Select value and units for Quantization.
- In Sumo Logic, quantization is the process of aggregating metric data points for time series over an interval of time. The minimum value for this field is 15s.
- For more details, refer to the Sumo Logic documentation.
- Select value for Rollup. Rollup is an aggregation function Sumo Logic uses when quantizing metrics.
- Select one of the following values: avg, sum, min, max, count, none.
- Default value is none.
- Enter a Query.
- Sample query for Sumo Logic Threshold metric (Metrics type):
metric=CPU_usage
- In step 3, define a Time Window for the SLO.
- In step 4, specify the Error Budget Calculation Method and your Objective(s).
- In step 5, add a Name, Description, and other details about your SLO. You can also select Alert policies and Labels on this screen.
- When you're done, click Create SLO.
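The Quantization and Rollup settings in the steps above can be pictured with a small sketch. It is only an illustration of the idea of bucketing data points and aggregating each bucket, not Sumo Logic's actual implementation; the `quantize` function, the `ROLLUPS` table, and the sample points are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rollup table mirroring the values listed above.
# "none" is left out because it means no aggregation is applied.
ROLLUPS = {
    "avg": mean,
    "sum": sum,
    "min": min,
    "max": max,
    "count": len,
}


def quantize(points, interval_s, rollup):
    """Group (unix_seconds, value) points into fixed buckets and aggregate each."""
    if interval_s < 15:
        raise ValueError("quantization must be at least 15s")
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % interval_s  # align to the start of the bucket
        buckets[bucket_start].append(value)
    aggregate = ROLLUPS[rollup]
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]


# Made-up data points: three fall into the first 15s bucket, two into the second.
points = [(0, 10.0), (5, 20.0), (14, 30.0), (15, 40.0), (29, 60.0)]
print(quantize(points, 15, "avg"))  # [(0, 20.0), (15, 50.0)]
print(quantize(points, 15, "max"))  # [(0, 30.0), (15, 60.0)]
```

With a 15s quantization, the choice of rollup decides which single value represents each bucket.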
Follow the instructions below to create a Sumo Logic threshold metric using the Logs type:
- Navigate to Service Level Objectives.
- Click the button.
- In step 1 of the SLO wizard, select the Service the SLO will be associated with.
- In step 2, select Sumo Logic as the data source for your SLO, then specify the Metric.
- Select Threshold metric > Logs.
- Enter a Query.
- The query must contain the keyword timeslice.
- Sample query for Sumo Logic threshold metric (Logs type):
_sourceCategory=uploads/nginx
| timeslice 1m as n9_time
| parse "HTTP/1.1\" * * *" as (status_code, size, tail)
| if (status_code matches "20*" or status_code matches "30*",1,0) as resp_ok
| sum(resp_ok) as n9_value by n9_time
| sort by n9_time asc
- In step 3, define a Time Window for the SLO.
- In step 4, specify the Error Budget Calculation Method and your Objective(s).
- In step 5, add a Name, Description, and other details about your SLO. You can also select Alert policies and Labels on this screen.
- When you're done, click Create SLO.
Follow the instructions below to create a Sumo Logic ratio metric using the Metrics type:
- Navigate to Service Level Objectives.
- Click the button.
- In step 1 of the SLO wizard, select the Service the SLO will be associated with.
- In step 2, select Sumo Logic as the data source for your SLO, then specify the Metric.
- Select Ratio metric > Metrics.
- Choose the Data Count Method.
- Non-incremental: counts incoming metric values one by one, so the resulting SLO graph is spike-shaped.
- Incremental: counts the incoming metric values incrementally, adding every next value to previous values. It results in a constantly increasing SLO graph.
- Enter a Query:
- Good query for the ratio metric (Metrics type):
quantization: 15s
rollup: Avg
query: metric=Mem_Used
- Total query for the ratio metric (Metrics type):
quantization: 15s
rollup: Avg
query: metric=Mem_Total
- In step 3, define a Time Window for the SLO.
- In step 4, specify the Error Budget Calculation Method and your Objective(s).
- In step 5, add a Name, Description, and other details about your SLO. You can also select Alert policies and Labels on this screen.
- When you're done, click Create SLO.
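The two Data Count Methods described in the steps above can be illustrated with a short sketch. It only mirrors the textual description (standalone values versus a running total); the sample counts are made up, and the code does not claim to reproduce how Nobl9 processes data internally.

```python
from itertools import accumulate

# Hypothetical per-interval good counts, for illustration only.
per_interval = [3, 0, 7, 2, 0, 5]

# Non-incremental: each point stands on its own, so the plot rises and falls.
non_incremental = list(per_interval)

# Incremental: each point is the running total of everything seen so far,
# so the plot never decreases.
incremental = list(accumulate(per_interval))

print(non_incremental)  # [3, 0, 7, 2, 0, 5]
print(incremental)      # [3, 3, 10, 12, 12, 17]
```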
Follow the instructions below to create a Sumo Logic ratio metric using the Logs type:
- Navigate to Service Level Objectives.
- Click the button.
- In step 1 of the SLO wizard, select the Service the SLO will be associated with.
- In step 2, select Sumo Logic as the data source for your SLO, then specify the Metric.
- Select Ratio metric > Logs.
- Choose the Data Count Method.
- Non-incremental: counts incoming metric values one by one, so the resulting SLO graph is spike-shaped.
- Incremental: counts the incoming metric values incrementally, adding every next value to previous values. It results in a constantly increasing SLO graph.
- Enter a Query. The query must contain the keyword timeslice.
- Good query for the ratio metric (Logs type):
_sourceCategory=uploads/nginx
| timeslice 1m as n9_time
| parse "HTTP/1.1\" * * *" as (status_code, size, tail)
| if (status_code matches "20*" or status_code matches "30*",1,0) as resp_ok
| sum(resp_ok) as n9_value by n9_time
| sort by n9_time asc
- Total query for the ratio metric (Logs type):
_sourceCategory=uploads/nginx
| timeslice 1m as n9_time
| parse "HTTP/1.1\" * * *" as (status_code, size, tail)
| count(*) as n9_value by n9_time
| sort by n9_time asc
- In step 3, define a Time Window for the SLO.
- In step 4, specify the Error Budget Calculation Method and your Objective(s).
- In step 5, add a Name, Description, and other details about your SLO. You can also select Alert policies and Labels on this screen.
- When you're done, click Create SLO.
For ratio metrics (countMetrics), keep in mind that the values resulting from the queries for both good and total:
- Must be positive.
- Should preferably be integers, although fractions are also acceptable.
- If fractions are used, should preferably be larger than 1e-4 = 0.0001.
- Shouldn't be larger than 1e+20.
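The constraints above can be turned into a quick sanity check for values returned by the good and total queries. This is a minimal sketch: `check_ratio_value` is a hypothetical helper, and it assumes that "positive" rules out negative values only, so a zero count is accepted. Adjust the first condition if that assumption does not hold for your setup.

```python
def check_ratio_value(value):
    """Return a list of problems for one good/total data point."""
    problems = []
    if value < 0:
        # Assumption: zero counts are allowed, only negatives are rejected.
        problems.append("must be positive")
    elif value % 1 != 0 and value < 1e-4:
        problems.append("fractions below 1e-4 are not recommended")
    if value > 1e20:
        problems.append("should not be larger than 1e+20")
    return problems


print(check_ratio_value(58))      # []
print(check_ratio_value(-2))      # ['must be positive']
print(check_ratio_value(0.00001)) # ['fractions below 1e-4 are not recommended']
```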
sloctl
Sumo Logic metrics
- Threshold (rawMetric)
- Ratio (countMetric)
- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: api-server-slo
    displayName: API Server SLO
    project: default
    labels:
      area:
      - latency
      - slow-check
      env:
      - prod
      - dev
      region:
      - us
      - eu
      team:
      - green
      - sales
    annotations:
      area: latency
      env: prod
      region: us
      team: sales
  spec:
    description: Example Sumo Logic SLO
    indicator:
      metricSource:
        name: sumo-logic
        project: default
        kind: Agent
    budgetingMethod: Occurrences
    objectives:
    - displayName: Good response (200)
      value: 200.0
      name: ok
      target: 0.95
      rawMetric:
        query:
          sumoLogic:
            type: metrics
            query: metric=CPU_Usage
            quantization: 15s
            rollup: Avg
      op: lte
      primary: true
    service: api-server
    timeWindows:
    - unit: Month
      count: 1
      isRolling: false
      calendar:
        startTime: 2022-12-01 00:00:00
        timeZone: UTC
    alertPolicies:
    - fast-burn-5x-for-last-10m
    attachments:
    - url: https://docs.nobl9.com
      displayName: Nobl9 Documentation
    anomalyConfig:
      noData:
        alertMethods:
        - name: slack-notification
          project: default
- apiVersion: n9/v1alpha
  kind: SLO
  metadata:
    name: api-server-slo
    displayName: API Server SLO
    project: default
    labels:
      area:
      - latency
      - slow-check
      env:
      - prod
      - dev
      region:
      - us
      - eu
      team:
      - green
      - sales
    annotations:
      area: latency
      env: prod
      region: us
      team: sales
  spec:
    description: Example Sumo Logic SLO
    indicator:
      metricSource:
        name: sumo-logic
        project: default
        kind: Agent
    budgetingMethod: Occurrences
    objectives:
    - displayName: Good response (200)
      value: 1.0
      name: ok
      target: 0.95
      countMetrics:
        incremental: true
        good:
          sumoLogic:
            type: metrics
            query: metric=Mem_Used
            quantization: 15s
            rollup: Avg
        total:
          sumoLogic:
            type: metrics
            query: metric=Mem_Total
            quantization: 15s
            rollup: Avg
      primary: true
    service: api-server
    timeWindows:
    - unit: Month
      count: 1
      isRolling: false
      calendar:
        startTime: 2022-12-01 00:00:00
        timeZone: UTC
    alertPolicies:
    - fast-burn-5x-for-last-10m
    attachments:
    - url: https://docs.nobl9.com
      displayName: Nobl9 Documentation
    anomalyConfig:
      noData:
        alertMethods:
        - name: slack-notification
          project: default
Mandatory requirements for Sumo Logic metrics SLOs
Specification for Sumo Logic metrics has the following mandatory fields:
- sumoLogic
  - type - string field. Select only one of the following values: metrics or logs.
  - quantization - integer field for the period of data aggregation.
    - In Sumo Logic, quantization is the process of aggregating metric data points for time series over an interval of time, given with a unit (e.g., s, h). The minimum value for this field is 15s.
    - For more details, refer to the Metric Quantization | Sumo Logic documentation.
  - rollup - string field. Rollup is an aggregation function Sumo Logic uses when quantizing metrics. Choose one of the following values (default is none):
    - avg, sum, min, max, count, none.
    - For more details, refer to the Rollup Types | Sumo Logic documentation.
  - query - string field. Your custom query. Example: metric=CPU_usage
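The mandatory fields above can be checked before a definition is applied. The sketch below is a hypothetical pre-flight validator, not part of sloctl or Nobl9. It assumes the `sumoLogic` block has already been loaded into a Python dict, that durations use the units `s`, `m`, or `h`, and that rollup names are compared case-insensitively (the YAML samples use `Avg` while the list of values uses `avg`).

```python
import re

ALLOWED_TYPES = {"metrics", "logs"}
ALLOWED_ROLLUPS = {"avg", "sum", "min", "max", "count", "none"}
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}  # assumed set of units


def duration_seconds(text):
    """Convert a duration such as '15s' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", str(text))
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def validate_metrics_spec(spec):
    """Return a list of problems found in a sumoLogic metrics block."""
    problems = []
    if spec.get("type") not in ALLOWED_TYPES:
        problems.append("type must be 'metrics' or 'logs'")
    try:
        if duration_seconds(spec.get("quantization", "")) < 15:
            problems.append("quantization must be at least 15s")
    except ValueError as err:
        problems.append(str(err))
    # A missing rollup falls back to the documented default, none.
    rollup = str(spec.get("rollup", "none")).lower()
    if rollup not in ALLOWED_ROLLUPS:
        problems.append(f"unsupported rollup: {rollup}")
    if not spec.get("query"):
        problems.append("query must be a non-empty string")
    return problems


spec = {
    "type": "metrics",
    "query": "metric=CPU_Usage",
    "quantization": "15s",
    "rollup": "Avg",
}
print(validate_metrics_spec(spec))                            # []
print(validate_metrics_spec({**spec, "quantization": "5s"}))  # ['quantization must be at least 15s']
```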
Sumo Logic logs
- Threshold (rawMetric)
- Ratio (countMetric)
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: api-server-slo
  displayName: API Server SLO
  project: default
  labels:
    area:
    - latency
    - slow-check
    env:
    - prod
    - dev
    region:
    - us
    - eu
    team:
    - green
    - sales
  annotations:
    area: latency
    env: prod
    region: us
    team: sales
spec:
  description: Example Sumo Logic SLO
  indicator:
    metricSource:
      name: sumo-logic
      project: default
      kind: Agent
  budgetingMethod: Occurrences
  objectives:
  - displayName: Good response (200)
    value: 200
    name: ok
    target: 0.95
    rawMetric:
      query:
        sumoLogic:
          type: logs
          query: >-
            _sourceCategory=uploads/nginx
            | timeslice 1m as n9_time
            | parse "HTTP/1.1\" * * *" as (status_code, size, tail)
            | if (status_code matches "20*" or status_code matches "30*",1,0)
            as resp_ok
            | sum(resp_ok) as n9_value by n9_time
            | sort by n9_time asc
    op: lte
    primary: true
  service: api-server
  timeWindows:
  - unit: Month
    count: 1
    isRolling: false
    calendar:
      startTime: 2022-12-01T00:00:00.000Z
      timeZone: UTC
  alertPolicies:
  - fast-burn-5x-for-last-10m
  attachments:
  - url: https://docs.nobl9.com
    displayName: Nobl9 Documentation
  anomalyConfig:
    noData:
      alertMethods:
      - name: slack-notification
        project: default
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: api-server-slo
  displayName: API Server SLO
  project: default
  labels:
    area:
    - latency
    - slow-check
    env:
    - prod
    - dev
    region:
    - us
    - eu
    team:
    - green
    - sales
  annotations:
    area: latency
    env: prod
    region: us
    team: sales
spec:
  description: Example Sumo Logic SLO
  indicator:
    metricSource:
      name: sumo-logic
      project: default
      kind: Agent
  budgetingMethod: Occurrences
  objectives:
  - displayName: Good response (200)
    value: 1
    name: ok
    target: 0.95
    countMetrics:
      incremental: true
      good:
        sumoLogic:
          type: logs
          query: |-
            _collector="app-cluster" _source="logs"
            | json "log"
            | timeslice 15s as n9_time
            | parse "level=* *" as (log_level, tail)
            | if (log_level matches "error" ,0,1) as log_level_not_error
            | sum(log_level_not_error) as n9_value by n9_time
            | sort by n9_time asc
      total:
        sumoLogic:
          type: logs
          query: |-
            _collector="app-cluster" _source="logs"
            | json "log"
            | timeslice 15s as n9_time
            | parse "level=* *" as (log_level, tail)
            | count(*) as n9_value by n9_time
            | sort by n9_time asc
    primary: true
  service: api-server
  timeWindows:
  - unit: Month
    count: 1
    isRolling: false
    calendar:
      startTime: 2022-12-01T00:00:00.000Z
      timeZone: UTC
  alertPolicies:
  - fast-burn-5x-for-last-10m
  attachments:
  - url: https://docs.nobl9.com
    displayName: Nobl9 Documentation
  anomalyConfig:
    noData:
      alertMethods:
      - name: slack-notification
        project: default
Mandatory requirements for Sumo Logic Logs queries
- query:
  - Must contain the keyword timeslice:
    - Sumo Logic supports only integer values (15s, 1m, 1050ms).
    - The minimum value for timeslice is 15 sec.
  - Must contain n9_time and n9_value: n9_time is the actual time, and n9_value is the metric value. n9_time must be a Unix timestamp, and n9_value must be a float value.
  - Must contain an aggregation keyword, such as count(*) as n9_value by n9_time.
  - Alias fields in your query with the as operator to ensure that n9_time and n9_value are returned. For details on the as operator, refer to the Sumo Logic documentation.
For more details on constructing Sumo Logic queries, see the Querying for logs section below.
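The requirements above lend themselves to a lightweight text check before a query is pasted into an SLO. The following is a rough sketch built on regular expressions, not a Sumo Logic parser: `check_logs_query` is a hypothetical helper, it recognizes only the `ms`, `s`, `m`, and `h` units, and it cannot tell whether the query is otherwise valid.

```python
import re

UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}  # assumed units


def check_logs_query(query):
    """Return a list of problems found in a Sumo Logic logs query string."""
    problems = []
    match = re.search(r"\btimeslice\s+(\d+)(ms|s|m|h)\b", query)
    if match is None:
        problems.append("missing timeslice with an integer interval")
    elif int(match.group(1)) * UNIT_MS[match.group(2)] < 15_000:
        problems.append("timeslice is below 15s")
    for alias in ("n9_time", "n9_value"):
        if not re.search(rf"\bas\s+{alias}\b", query):
            problems.append(f"no field aliased as {alias}")
    return problems


TOTAL_QUERY = r'''_sourceCategory=uploads/nginx
| timeslice 1m as n9_time
| parse "HTTP/1.1\" * * *" as (status_code, size, tail)
| count(*) as n9_value by n9_time
| sort by n9_time asc'''

print(check_logs_query(TOTAL_QUERY))                               # []
print(check_logs_query(TOTAL_QUERY.replace("1m", "5s")))           # ['timeslice is below 15s']
print(check_logs_query(TOTAL_QUERY.replace("as n9_value ", "")))   # ['no field aliased as n9_value']
```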
Querying for logs
Sumo Logic Search Syntax is based on Pipelines. Queries work similarly to Pipelines in Unix-like operating systems:
operator1 | operator2 | operator3
Operators are separated by the | sign. Each operator passes its result to the next one, so the data is progressively filtered until you get the desired result.
All queries begin with a keyword or string search. Special characters:
- * - a wildcard, matching zero or more characters.
- ? - a question mark, matching a single character.
An example Sumo Logic query looks like this:
_sourceCategory=uploads/nginx
| parse "HTTP/1.1\" * * *" as (status_code, size, tail)
In the example above, the first wildcard is extracted as status_code, the second as size, and the third stores the remainder of the message in tail.
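To see what the wildcards in that parse expression capture, the behavior can be approximated with a regular expression. This is only an emulation under simplifying assumptions (each wildcard matches as little as possible, except a trailing one, which takes the rest of the line); it is not how Sumo Logic implements parse. The `parse_anchor` helper and the sample log line are hypothetical.

```python
import re


def parse_anchor(pattern, field_names, line):
    """Approximate a wildcard parse pattern with a regular expression."""
    parts = [re.escape(part) for part in pattern.split("*")]
    regex = "(.*?)".join(parts)
    if pattern.endswith("*"):
        # Let a trailing wildcard capture the remainder of the line.
        regex = regex[: -len("(.*?)")] + "(.*)"
    match = re.search(regex, line)
    if match is None:
        return None
    return dict(zip(field_names, match.groups()))


# Made-up nginx access log line, for illustration only.
line = '127.0.0.1 - - [20/Feb/2022:15:46:00 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl/7.68.0"'

fields = parse_anchor('HTTP/1.1" * * *', ("status_code", "size", "tail"), line)
print(fields["status_code"])  # 200
print(fields["size"])         # 612
print(fields["tail"])         # "-" "curl/7.68.0"
```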
An example good query for count metrics (SLO based on HTTP status codes) looks like this:
_sourceCategory=uploads/nginx
| timeslice 1m as n9_time
| parse "HTTP/1.1\" * * *" as (status_code, size, tail)
| if (status_code matches "20*" or status_code matches "30*",1,0) as resp_ok
| sum(resp_ok) as n9_value by n9_time
| sort by n9_time asc
That will produce the following output:
"n9_time","n9_value"
"1645371960000","2.0"
"1645372020000","58.0"
"1645372080000","46.0"
"1645372140000","12.0"
"1645372200000","12.0"
"1645372260000","12.0"
"1645372320000","14.0"
"1645372380000","22.0"
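The output above can be read back as a time series. The sketch below parses it with the standard `csv` module, assuming the 13-digit n9_time values are Unix timestamps in milliseconds (an interpretation of the sample, not something the output states), and confirms that consecutive points are one timeslice (1m) apart.

```python
import csv
import io
from datetime import datetime, timezone

OUTPUT = '''"n9_time","n9_value"
"1645371960000","2.0"
"1645372020000","58.0"
"1645372080000","46.0"
"1645372140000","12.0"
"1645372200000","12.0"
"1645372260000","12.0"
"1645372320000","14.0"
"1645372380000","22.0"
'''

rows = csv.DictReader(io.StringIO(OUTPUT))
points = [(int(row["n9_time"]), float(row["n9_value"])) for row in rows]

# Assumption: n9_time is expressed in milliseconds since the Unix epoch.
first_point = datetime.fromtimestamp(points[0][0] / 1000, tz=timezone.utc)

# Distinct gaps between consecutive timestamps, in milliseconds.
steps_ms = {later[0] - earlier[0] for earlier, later in zip(points, points[1:])}

print(first_point.isoformat())           # 2022-02-20T15:46:00+00:00
print(steps_ms)                          # {60000}
print(sum(value for _, value in points)) # 178.0
```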
A similar query, but for Total instead of Good:
_sourceCategory=uploads/nginx
| timeslice 1m as n9_time
| parse "HTTP/1.1\" * * *" as (status_code, size, tail)
| count(*) as n9_value by n9_time
| sort by n9_time asc
For the full specification on Sumo Logic queries, refer to the official documentation.
Querying the Sumo Logic server
Nobl9 queries Sumo Logic leveraging the Search Job API or Metrics Query API every two minutes with a query delay of four minutes. The maximum resolution of the response must be 4 data points.
The query's time range spans from the beginning to the end of the 2-minute time window being queried.
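As a rough illustration of how the query interval and query delay could combine into a time range, the sketch below shifts the window back by the delay and makes it one interval long. This is an assumption about the arithmetic, not a description of the agent's actual scheduling (which also involves jitter); `query_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

QUERY_INTERVAL = timedelta(minutes=2)  # from the parameters listed for Sumo Logic
QUERY_DELAY = timedelta(minutes=4)


def query_window(now):
    """Return (start, end) of the window queried at time `now`.

    Assumption: the window ends QUERY_DELAY before `now` and spans one interval.
    """
    end = now - QUERY_DELAY
    start = end - QUERY_INTERVAL
    return start, end


now = datetime(2022, 2, 20, 16, 0, tzinfo=timezone.utc)
start, end = query_window(now)
print(start.isoformat())  # 2022-02-20T15:54:00+00:00
print(end.isoformat())    # 2022-02-20T15:56:00+00:00
```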
Sumo Logic API rate limits
Sumo Logic's Search Job API requests are rate limited (see Rate limit throttling | Sumo Logic documentation).
The Nobl9 agent requests several endpoints to gather data points according to the Process Flow described in the documentation. The Nobl9 agent distributes the required requests within the two-minute interval to reduce the number of requests per second.
To avoid Sumo Logic rate limit issues:
- Prefer metrics queries over logs queries. Logs queries are at least 4 times more expensive than metrics queries (see how to convert your logs to metrics).
- Logs queries should take at most two minutes. Sumo Logic partitions and Sumo Logic scheduled views help considerably here.
- If you're using the Nobl9 agent for Sumo Logic, stick to a single agent as your data source so that Nobl9 can orchestrate querying the Sumo Logic API. This does not apply to direct connections: having several of them doesn't affect rate limiting orchestration.
- Keep the number of Sumo Logic logs objectives in check with your API limits (see Number of directed objectives).
- Contact Sumo Logic customer support to increase your rate limits and prevent conflicts.
Number of directed objectives
Sumo Logic allows a combined total of 240 requests per minute across its APIs. The Nobl9 agent for Sumo Logic has a 2-minute query interval, so Nobl9 can make up to 480 API requests to Sumo Logic per query interval.
Querying for metrics
Querying metrics is synchronous: you query, and the API responds with data.
This means you can run at most 480 unique metrics queries against the Sumo Logic API per query interval.
Querying for logs
Querying logs is more complicated. The following shows the lifecycle of obtaining the data:
- Create a search logs job.
- Wait 20 seconds and query if the job is completed (repeat until the process is completed).
- Fetch data for the finished job.
- Delete the job.
Each step uses one request to the Sumo Logic API, so the optimistic count for a single logs query is 4. Step 2 (listed above) may, and most probably will, be repeated, because logs queries usually need more processing time. In the pessimistic case, step 2 is repeated 6 times, which uses up to 9 API requests for a single logs query.
This means that you can have anywhere from 54 to 120 logs queries.
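The request budget behind these figures can be worked through in a few lines. The sketch uses only numbers stated in this section (240 requests per minute, a 2-minute interval, 4 to 9 requests per logs query); the function and variable names are illustrative. Note that rounding down gives 53 for the pessimistic case, since 480 / 9 is about 53.3, so the lower bound of 54 quoted above appears to be rounded up.

```python
REQUESTS_PER_MINUTE = 240
QUERY_INTERVAL_MINUTES = 2


def requests_per_logs_query(status_polls):
    """Create the job, poll its status N times, fetch the data, delete the job."""
    return 1 + status_polls + 1 + 1


budget = REQUESTS_PER_MINUTE * QUERY_INTERVAL_MINUTES

max_metrics_queries = budget  # one synchronous request per metrics query
optimistic_logs = budget // requests_per_logs_query(1)   # job done on first poll
pessimistic_logs = budget // requests_per_logs_query(6)  # six polls needed

print(budget)               # 480
print(max_metrics_queries)  # 480
print(optimistic_logs)      # 120
print(pessimistic_logs)     # 53
```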
Limitations
For direct connections, we only support orchestration of querying Sumo Logic within the same release channel. Having direct connections in both the Stable and Beta release channels desynchronizes querying and may result in failures.