Query customization

By default, Nobl9 queries a data source every minute for the last-minute data.
However, data sources can have limitations related to data consistency, rate limits, resource constraints on the platform, or network overhead.

For smooth operations and consistent data flow to Nobl9, you can configure querying behavior with the following parameters:

  • Query interval
  • Query delay
  • Jitter
  • Timeout

This article explains how these parameters work and how they affect other Nobl9 features. It also highlights how they depend on each other, so that you can configure them correctly and avoid multiple queries running simultaneously and other unexpected effects.

note

Currently, only query delay configuration is available in Nobl9 Web, sloctl, and Terraform. To modify the other parameters, contact Nobl9 support.

Query interval

The query interval sets how often Nobl9 queries a data source. It also defines the length of the data retrieval period covered by each query.

Suppose it's 15:00 now.
The query interval = 2 min. Nobl9 sends a query every 2 minutes for the data collected by the data source over the latest 2-minute interval.

| Query sending time | Data retrieval period |
|--------------------|-----------------------|
| 15:00              | 14:58-15:00           |
| 15:02              | 15:00-15:02           |
| 15:04              | 15:02-15:04           |
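
The schedule in the table above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `query_schedule` and the calendar date are assumptions made for the example, not part of Nobl9.

```python
from datetime import datetime, timedelta

def query_schedule(first_query: datetime, interval: timedelta, count: int):
    """Return (send_time, period_start, period_end) for `count` consecutive queries."""
    schedule = []
    for i in range(count):
        send_time = first_query + i * interval
        # Each query covers the interval that has just ended.
        schedule.append((send_time, send_time - interval, send_time))
    return schedule

# Hypothetical date; only the time of day matters for the example.
rows = query_schedule(datetime(2024, 1, 1, 15, 0), timedelta(minutes=2), 3)
for send_time, start, end in rows:
    print(f"{send_time:%H:%M}  {start:%H:%M}-{end:%H:%M}")
```

With a 2-minute interval, the loop prints the same three rows as the table.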

↘ To retrieve fresher data, decrease the query interval.
This helps, for example, when you need to get alerts as soon as possible.
However, more frequent queries can hit the data source's rate limits.

↗ To reduce the number of requests while covering the same data retrieval period, increase the query interval.
For example, with a 1-minute query interval, Nobl9 sends five queries to retrieve 5 minutes of data. With a 5-minute query interval, one query retrieves the same data.
The higher the value, the more outdated the retrieved data is.

Image 1: Query interval

Interaction with other parameters

  • Jitter must be less than or equal to the query interval
    Otherwise, data can arrive out of order, which can lead to data being rejected.
  • Timeout must be less than or equal to the query interval
    A timeout longer than the query interval can cause multiple simultaneous queries and unexpected effects.

Impact on Nobl9 features

  • Alerting: high values can delay important alerts
    For example, the query interval is 30 minutes and the query delay is 0. An incident occurred 20 minutes ago, and Nobl9 sent the last query 25 minutes ago. The agent will query the data for this period in 5 minutes, so the notification will lag behind the incident by at least 25 minutes.
  • Service Health dashboards: can cause services to fall into the No data category
    When the query interval is greater than or equal to the dashboard's time window, Nobl9 doesn't have enough time to accumulate data for categorization.

note

Nobl9 doesn't send alerts on incidents that took place 1 hour ago or earlier.

Query delay

Query delay is the amount of time that Nobl9 waits before sending a query. The delay is counted from the moment set by the query interval. Although the query delay postpones sending a query, it doesn't affect the data retrieval period: regardless of the query delay value, Nobl9 always queries for the period set by the query interval.

In short, the query delay helps solve issues with data that is inconsistent or not yet available in the data source.

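Suppose it's 15:00 now.
The query interval = 2 min.

| Query sent at, query delay = 0 min | Query sent at, query delay = 1 min | Data retrieval period |
|------------------------------------|------------------------------------|-----------------------|
| 15:00                              | 15:01                              | 14:58-15:00           |
| 15:02                              | 15:03                              | 15:00-15:02           |
| 15:04                              | 15:05                              | 15:02-15:04           |
| 15:06                              | 15:07                              | 15:04-15:06           |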

The above example shows that the data retrieval period is the same whether the query is sent with a delay or without it. The delay gives the data source time to index the data properly across all its partitions. This ensures the retrieved data is fresh and consistent.
The particular value of the query delay required to ensure data consistency depends on the data source.

Read more about data consistency models.

Image 2: Query delay

↘ To obtain insights and alerts faster, reduce the query delay.

↗ To ensure the retrieved data is already aggregated and consolidated, increase the query delay.
This helps when the data in Nobl9 differs from the data in the data source: a query for the last minute can return different results than a query for the same period sent 10 minutes later.

Impact on Nobl9 features

  • Replay: can affect results
    For instance, if the query delay is set to 15 minutes or more and the Replay time window is narrow (less than 3 hours, depending on the data source), Replay can import data faster than the query delay elapses and overwrite a small portion of already collected data. This can cause inconsistencies and short periods of incorrect results.
  • Alerting: can delay important alerts
    For example, the query interval is set to 1 minute and the query delay is 10 minutes. An incident occurred 10 minutes ago, and the last query was sent 1 minute ago. The notification can lag by 10-11 minutes, because it is sent only once the agent requests the relevant time range.
  • Service Health dashboards: can cause services to fall into the No data category
    When the query delay is greater than or equal to the dashboard's time window, Nobl9 doesn't have enough time to accumulate data for categorization.
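
The alerting examples in this article follow the same arithmetic, which the sketch below spells out. It rests on simplifying assumptions that are not stated by Nobl9: the incident is picked up by the first query for the data retrieval period that contains it, and jitter, query duration, and alert policy conditions are ignored. The function name `notification_lag_min` is hypothetical.

```python
import math

def notification_lag_min(incident_offset_min, interval_min, delay_min=0.0):
    """Minutes between an incident and the first query for the period containing it.

    incident_offset_min: minutes between an interval boundary (taken as minute 0)
    and the incident. Assumption: the first query for that period detects it.
    """
    # End of the data retrieval period that contains the incident.
    period_end = (math.floor(incident_offset_min / interval_min) + 1) * interval_min
    # The query for that period is sent once the query delay has also passed.
    return period_end + delay_min - incident_offset_min

# 30-minute interval, no delay, incident 5 minutes after the last query.
print(notification_lag_min(5, 30))       # 25.0
# 1-minute interval, 10-minute delay, incident halfway through a period.
print(notification_lag_min(0.5, 1, 10))  # 10.5
```

The first call matches the 25-minute lag from the query interval example, and the second falls inside the 10-11 minute range given above.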

Read more about query delay.

Jitter

Jitter is the time window within which Nobl9 sends a query at a random moment.
The agent starts counting the jitter immediately after the query interval or the query delay (if any) elapses and then sends the query at a random point within the defined range.

Suppose it's 15:00.
The query interval = 2 min.

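| Jitter value | Query sending time                               | Data retrieval period |
|--------------|--------------------------------------------------|-----------------------|
| 0            | 15:00                                            | 14:58-15:00           |
| 30 sec       | At a random point between 15:00:00 and 15:00:30  | 14:58-15:00           |
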
Image 3: Jitter
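
As a rough illustration of the 30-second row in the example above, the sketch below picks a send time inside the jitter window. The uniform distribution and the name `jittered_send_time` are assumptions for the example; this article only says that the moment is random.

```python
import random
from datetime import datetime, timedelta

def jittered_send_time(scheduled: datetime, jitter: timedelta, rng: random.Random) -> datetime:
    """Pick a send time at random in [scheduled, scheduled + jitter] (uniform, by assumption)."""
    offset_s = rng.uniform(0, jitter.total_seconds())
    return scheduled + timedelta(seconds=offset_s)

scheduled = datetime(2024, 1, 1, 15, 0)  # hypothetical date
rng = random.Random(42)                  # seeded only to make the sketch repeatable
send_time = jittered_send_time(scheduled, timedelta(seconds=30), rng)
print(send_time.time())  # some moment between 15:00:00 and 15:00:30
```

With a jitter of 0, the function returns the scheduled time unchanged, which corresponds to the first row of the table.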

↘ The smaller the jitter value, the higher the query density for SLOs with more than one SLI.
When your SLO covers a large number of SLIs from a single data source, small jitter values can overload the data source.

↗ Increasing the collection jitter can help when a data source has significant spiky loads.
A higher collection jitter spreads the load over the query interval.

Query density and jitter

The lowest query density is when the collection jitter value is equal to the query interval.

Image 4: Collection jitter vs query density.
When you have several SLIs based on the same data source, setting the collection jitter equal to the query interval distributes the load on the data source.
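
To make the relationship between jitter and query density concrete, here is a minimal sketch. It assumes that every SLI sends one query per query interval and that the queries are spread evenly over the jitter window; both the model and the name `average_query_density` are illustrative assumptions.

```python
def average_query_density(sli_count: int, jitter_s: float) -> float:
    """Average queries per second inside the jitter window for one query cycle."""
    if jitter_s <= 0:
        # No jitter: all queries go out at the same moment.
        return float("inf")
    return sli_count / jitter_s

interval_s = 60  # 1-minute query interval
for jitter_s in (15, 30, interval_s):
    print(jitter_s, average_query_density(30, jitter_s))
```

For 30 SLIs, the density drops from 2.0 queries per second at a 15-second jitter to 0.5 when the jitter equals the 60-second query interval.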

Interaction with other parameters

  • Query interval + query delay: must be greater than or equal to the collection jitter
    Otherwise, data can arrive out of order, which can lead to some data being rejected.

Direct connection and jitter

For SLOs based on data sources connected with the direct method, we recommend jitter values < 1 min.

Impact on Nobl9 features

  • SLOs: when your SLO covers several SLIs based on the same data source, low collection jitter values can exhaust the data source rate limits.

Timeout

Timeout refers to the duration Nobl9 will wait for a response after sending a request to a data source.
If the data source responds at any point within the timeout range, the query is considered successful.
If the data source fails to respond by the end of the timeout period, the query is considered a failure.
In either case, subsequent queries run as set by the other parameters, regardless of when the previous query finished.

Suppose it's 15:00.
The query interval = 2 min.
The timeout = 15 sec.

| Query sending time | Waiting till | Next query sending time |
|--------------------|--------------|-------------------------|
| 15:00:00           | 15:00:15     | 15:02:00                |
| 15:02:00           | 15:02:15     | 15:04:00                |

Image 5: Timeout

↗ Increase the timeout when the agent reports timeouts, especially for long-running complex queries.

Interaction with other parameters

  • Query interval + query delay: must be greater than or equal to the timeout
    The timeout doesn't affect the querying schedule: the subsequent request starts when the interval passes, regardless of when the previous query finished. However, a timeout longer than the query interval + query delay can lead to multiple queries running simultaneously, which can have unexpected effects.
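
The dependency rules listed in the "Interaction with other parameters" sections can be collected into a single check. The sketch below is a hypothetical helper, not a Nobl9 or sloctl feature; the function name and the warning texts are made up for illustration.

```python
from datetime import timedelta

def check_query_parameters(interval, delay, jitter, timeout):
    """Return warnings for combinations that the rules in this article advise against."""
    warnings = []
    if jitter > interval:
        warnings.append("jitter > query interval: data can arrive out of order")
    if timeout > interval:
        warnings.append("timeout > query interval: queries can overlap")
    if jitter > interval + delay:
        warnings.append("jitter > query interval + query delay")
    if timeout > interval + delay:
        warnings.append("timeout > query interval + query delay")
    return warnings

# Interval 2 min, delay 4 min, jitter 30 s, timeout 30 s: no rule is violated.
print(check_query_parameters(timedelta(minutes=2), timedelta(minutes=4),
                             timedelta(seconds=30), timedelta(seconds=30)))
# A 90-second timeout with a 1-minute interval and no delay breaks both timeout rules.
print(check_query_parameters(timedelta(minutes=1), timedelta(0),
                             timedelta(seconds=15), timedelta(seconds=90)))
```

The first call returns an empty list, and the second returns the two timeout warnings.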

Impact on Nobl9 features

  • Replay: timeout values that are too low cause Replay to fail, because the data source can't return data in time.

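Default query parameters

| Data source                    | Query interval | Query delay | Jitter | Timeout |
|--------------------------------|----------------|-------------|--------|---------|
| Generic                        | 1min           | 0           | 15s    | 30s     |
| Amazon CloudWatch              | 1min           | 1min        | 15s    | 30s     |
| Amazon Prometheus              | 1min           | 0           | 15s    | 30s     |
| Amazon Redshift                | 1min           | 30s         | 15s    | 30s     |
| AppDynamics                    | 1min           | 1min        | 15s    | 30s     |
| Azure Monitor (beta)           | 1min           | 5min        | 15s    | 60s     |
| BigQuery                       | 1min           | 0           | 15s    | 30s     |
| Datadog                        | 2min           | 1min        | 15s    | 30s     |
| Dynatrace                      | 1min           | 2min        | 15s    | 30s     |
| Elasticsearch                  | 1min           | 1min        | 15s    | 30s     |
| Google Cloud Monitoring        | 1min           | 2min        | 15s    | 50s     |
| Grafana Loki                   | 1min           | 1min        | 15s    | 30s     |
| Graphite                       | 1min           | 1min        | 15s    | 30s     |
| Honeycomb                      | 1min           | 5min        | 15s    | 10s     |
| InfluxDB                       | 1min           | 1min        | 15s    | 60s     |
| Instana                        | 1min           | 1min        | 15s    | 30s     |
| New Relic                      | 1min           | 1min        | 15s    | 30s     |
| OpenTSDB                       | 1min           | 1min        | 15s    | 30s     |
| Pingdom                        | 1min           | 1min        | 15s    | 30s     |
| Prometheus                     | 1min           | 0           | 15s    | 30s     |
| ServiceNow Cloud Observability | 1min           | 2min        | 15s    | 30s     |
| Splunk                         | 1min           | 5min        | 20s    | 30s     |
| Splunk Observability           | 1min           | 5min        | 15s    | 30s     |
| Sumo Logic                     | 2min           | 4min        | 30s    | 30s     |
| ThousandEyes                   | 1min           | 1min        | 15s    | 60s     |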

Query parameters retrieval

You can retrieve the jitter, timeout, and query interval values for agents and directs with the sloctl get [directs | agents] command, for sources that support these parameters. See the table below for minimum agent versions.

note

Currently, these values are read-only. If you change and apply them through a YAML definition of an agent or direct, it won't make any changes to the source.

apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: my-source
  project: default
spec:
  ...
  sumoLogic:
    ...
  releaseChannel: beta
  interval:
    unit: Minute
    value: 2
  jitter:
    unit: Second
    value: 30
  queryDelay:
    minimumAgentVersion: 0.65.0-beta09
    unit: Minute
    value: 5
  timeout:
    unit: Second
    value: 30

Scope of support

The following table shows minimum versions for agents supporting retrieval of query parameters.

note

If you want to change values for the query interval, jitter and timeout, contact Nobl9 support. Keep in mind that it's not possible for sources without the agent version listed in the table.

| Data source         | Query interval | Jitter        | Timeout       |
|---------------------|----------------|---------------|---------------|
| AmazonPrometheus    | 0.71.0-beta    | 0.71.0-beta   | 0.71.0-beta   |
| AppDynamics         | 0.70.0-beta04  | 0.70.0-beta04 | 0.70.0-beta04 |
| Azure Monitor       | 0.71.0-beta    | 0.71.0-beta   | 0.71.0-beta   |
| BigQuery            | 0.71.0-beta    | 0.71.0-beta   | 0.71.0-beta   |
| CloudWatch          | 0.71.0-beta    | 0.71.0-beta   | 0.71.0-beta   |
| Datadog             | 0.70.0-beta04  | 0.70.0-beta04 | 0.70.0-beta04 |
| Dynatrace           | 0.70.0-beta04  | 0.70.0-beta04 | 0.70.0-beta04 |
| Elasticsearch       | 0.70.0-beta04  | 0.70.0-beta04 | 0.70.0-beta04 |
| GCM                 | 0.70.0-beta04  | 0.70.0-beta04 | 0.70.0-beta04 |
| GrafanaLoki         | 0.70.0-beta04  | 0.70.0-beta04 | 0.70.0-beta04 |
| Graphite            | -              | -             | -             |
| InfluxDB            | -              | -             | -             |
| Instana             | -              | -             | -             |
| Lightstep           | 0.70.0-beta04  | 0.70.0-beta04 | 0.70.0-beta04 |
| NewRelic            | 0.70.0-beta04  | 0.70.0-beta04 | 0.70.0-beta04 |
| OpenTSDB            | -              | -             | -             |
| Pingdom             | 0.71.0-beta    | 0.71.0-beta   | 0.71.0-beta   |
| Prometheus          | 0.70.0-beta04  | 0.70.0-beta04 | 0.70.0-beta04 |
| Redshift            | -              | -             | -             |
| Splunk              | 0.70.0-beta04  | 0.70.0-beta04 | 0.70.0-beta04 |
| SplunkObservability | -              | -             | -             |
| SumoLogic           | 0.73.0-beta    | 0.73.0-beta   | 0.73.0-beta   |
| ThousandEyes        | 0.71.0-beta    | 0.71.0-beta   | 0.71.0-beta   |

Key takeaways

| Parameter | When to modify? | Dependency on other parameters | Impact on Nobl9 features |
|-----------|-----------------|--------------------------------|--------------------------|
| Query interval: frequency | ↘ Retrieve fresh data; ↗ Reduce the number of requests to the data source | Jitter, timeout ≤ query interval | Alerting: long query intervals can delay important alerts |
| Query delay: lag | ↘ Get insights and alerts faster; ↗ Get aggregated and consolidated data | Jitter, timeout ≤ query interval + query delay | Replay: long query delay can affect results; Alerting: long query delay can postpone important alerts |
| Jitter: randomness | ↗ Spread the load over the query interval duration | Query interval + query delay ≥ jitter | SLOs: short jitter can exhaust rate limits for SLOs with a large number of SLIs |
| Timeout: wait time | ↗ The agent reports timeouts | Query interval + query delay ≥ timeout | Replay, SLI Analyzer, SLOs: a short timeout makes it impossible to receive a response |

Image 6: Parameter interactions with query delay, collection jitter, and timeout set to zero
Image 7: Parameter interactions with non-zero query delay, collection jitter, and timeout