Data sources
If data retrieval fails for any data source other than Splunk Observability, you can check that data source's event logs, provided the following conditions are met:
- The data source must be added with the direct connection method
- To collect event logs for Replay and SLI Analyzer, add your required data source in the Beta channel
- Event logs must be activated in advance for your data source
Data collection after outages
After an outage, data collection resumes from the last successfully fetched data point. This is ensured by the timestamp cache persistence feature and applies to data sources regardless of the connection method used.
By default, timestamp cache persistence is inactive for data sources connected using the agent method, so you may want to activate it for your data source.
Permissions for adding the Dynatrace data source
To connect the Nobl9 agent to Dynatrace, you need an access token with the metrics.read scope activated.
Read more about the Dynatrace integration with Nobl9.
Errors upon setting up an integration with Splunk Observability
Clear the Nobl9 cache in your browser and set up the integration again.
Nobl9 integration with new data sources
If your required data source is missing from the list of available integrations, let us know through the contact form, so we can assess the possibility of adding it.
No data from a data source
Depending on the budgeting method, "no data" is treated as follows:
- Occurrences: no good and no total events, so there is no effect on the ratio of good to total events.
- Time slices: minutes with no data are considered good.
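The difference between the two budgeting methods can be illustrated with a minimal sketch. It is not Nobl9's implementation: the per-minute `(good, total)` representation, the function names, and the `slice_target` parameter are assumptions made for the example, and `None` stands for a minute with no data.

```python
from typing import Optional, Sequence, Tuple

# Each minute holds (good, total) event counts, or None when no data arrived.
Minute = Optional[Tuple[int, int]]


def occurrences_reliability(minutes: Sequence[Minute]) -> Optional[float]:
    """Ratio of good to total events; no-data minutes add to neither count."""
    good = sum(m[0] for m in minutes if m is not None)
    total = sum(m[1] for m in minutes if m is not None)
    return good / total if total else None


def time_slices_reliability(minutes: Sequence[Minute], slice_target: float) -> float:
    """Share of good minutes; a minute with no data is counted as good."""
    def is_good(m: Minute) -> bool:
        if m is None or m[1] == 0:
            return True  # no data: treated as a good minute
        return m[0] / m[1] >= slice_target

    return sum(is_good(m) for m in minutes) / len(minutes)


# Hypothetical data: two minutes with events, two minutes with no data.
minutes = [(99, 100), None, (50, 100), None]

print(occurrences_reliability(minutes))        # 0.745 (149 good of 200 total)
print(time_slices_reliability(minutes, 0.95))  # 0.75 (3 good minutes of 4)
```

In this sketch, the no-data minutes leave the occurrences ratio untouched, while they raise the share of good minutes under time slices.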
You can troubleshoot this situation in the following ways:
- Verify your authentication credentials.
  To verify credentials on the Nobl9 Web, go to Integrations and click Edit on your required data source.
- Verify the query.
  It's a good practice to first apply the query you wrote in a test environment to make sure it returns valid metrics.
- Make sure your query returns only one metric and one time series.
- If you apply any filters in the query, they must point to the exact entity you want to observe.
  Nobl9 processes only one dataset; there is no aggregation on the Nobl9 side.
When all the above steps are verified but the SLO still doesn't show any data, review the data source integration log for any errors:
- For the Agent connection, examine agent logs
- For the Direct connection, check event logs
Changing streams in ServiceNow Cloud Observability
You can update the stream ID for SLOs based on ServiceNow Cloud Observability without losing the history.
Discrepancy between Nobl9 SLI chart and values from my data source
A discrepancy between Nobl9 and Grafana may occur when creating ratio SLOs with Prometheus as the data source. In the Nobl9 Web, a ratio SLO is presented on the graphs with a fixed 1-minute resolution. The SLI value at these rounded minutes differs from the Cortex values presented in Grafana.
Why is there a discrepancy?
Nobl9 interprets every data point of a non-incremental ratio SLI as an increase from the previous point. Such a time series is then downsampled by summing its individual points and is displayed on the chart. That's why the chart in Nobl9 might show higher values than Prometheus: the more the chart is zoomed out, the higher the values will be. This is intended behavior.
However, it is important to understand the exact semantics of a user query and how it can affect calculations. When querying time series in Prometheus, there are three parameters that affect the result:
- Query
- Time range
- Step
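The effect of downsampling by summing can be shown with a small sketch. The `downsample_by_sum` helper and the sample values are hypothetical and only illustrate why a zoomed-out chart shows higher values while the overall total stays the same.

```python
from typing import List


def downsample_by_sum(points: List[float], factor: int) -> List[float]:
    """Merge every `factor` consecutive points into one by summing them."""
    return [sum(points[i:i + factor]) for i in range(0, len(points), factor)]


# Hypothetical per-minute increases: 5 events every minute for an hour.
per_minute = [5.0] * 60

print(max(downsample_by_sum(per_minute, 1)))   # 5.0  (1-minute resolution)
print(max(downsample_by_sum(per_minute, 5)))   # 25.0 (5-minute resolution)
print(max(downsample_by_sum(per_minute, 15)))  # 75.0 (15-minute resolution)

# The total number of events is unchanged by the resolution.
print(sum(downsample_by_sum(per_minute, 15)) == sum(per_minute))  # True
```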
How can time range change the result?
In general, the time range only cuts out a section of the result time series; however, certain PromQL functions, such as increase, extrapolate values to the end of the requested range.
That means that the value for a given timestamp can differ depending on whether it falls at the beginning, in the middle, or at the end of the requested time range. Nobl9 adjusts for that by aligning the ends of the queried time range to full minutes.
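As an illustration only, adjusting a queried time range to full minutes could look like the sketch below. The exact rounding Nobl9 applies is not described here; this example assumes the start is rounded down and the end is rounded up, and `align_to_full_minutes` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def align_to_full_minutes(start: datetime, end: datetime) -> Tuple[datetime, datetime]:
    """Round `start` down and `end` up to the nearest full minute (assumed rule)."""
    aligned_start = start.replace(second=0, microsecond=0)
    aligned_end = end.replace(second=0, microsecond=0)
    if aligned_end < end:
        aligned_end += timedelta(minutes=1)
    return aligned_start, aligned_end


start = datetime(2024, 1, 1, 12, 0, 17, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 12, 5, 42, tzinfo=timezone.utc)

aligned_start, aligned_end = align_to_full_minutes(start, end)
print(aligned_start.time(), aligned_end.time())  # 12:00:00 12:06:00
```

With both ends on minute boundaries, a given timestamp sits at the same position relative to the range ends across repeated queries.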
How can step change the result?
increase is a function that accepts a range vector as an argument and returns an instant vector.
To produce a range vector, you can use the [] operator in PromQL with a fixed time duration, for example, [2m].
The value at a given point in time then shows how much a certain counter increased over the two minutes leading up to that point.
If the step used to query the time series is smaller than the range vector duration, the result is based on a series of overlapping windows.
Because Nobl9 interprets every data point of a non-incremental ratio SLI as an increase from the previous point and uses the query_range API with a 15s step, a 2m range vector causes each HTTP request to be counted eight times (2m / 15s = 8).
This logic has an impact on error budget calculations because it affects error distribution.
What used to be a single erroneous event is counted proportionally as 1/8 every 15 seconds over two minutes.
As a result, the reliability burn down is smoothed out compared to an SLO that uses a raw counter value.
Error budgets are calculated based on the proportion of errors, so higher absolute values don't affect them.
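The eightfold counting can be reproduced with a simplified model. This sketch is an assumption-laden illustration: `windowed_increase` only subtracts the sample one window earlier from the current sample and ignores the extrapolation and counter-reset handling that the real increase function performs. The counter data is hypothetical.

```python
from typing import List

STEP_S = 15      # query_range step, in seconds
WINDOW_S = 120   # range vector duration, [2m]


def windowed_increase(counter: List[int], step_s: int, window_s: int) -> List[int]:
    """Simplified increase(): current sample minus the sample one window earlier."""
    lookback = window_s // step_s  # number of steps covered by one window
    return [counter[i] - counter[max(i - lookback, 0)] for i in range(len(counter))]


# Hypothetical counter sampled every 15 s; a single event occurs at sample 10.
counter = [0] * 10 + [1] * 30

increases = windowed_increase(counter, STEP_S, WINDOW_S)

print(sum(1 for v in increases if v > 0))  # 8: overlapping windows that contain the event
print(sum(increases))                      # 8: the single event counted eight times when summed
```

Each of the eight points carries 1/8 of the summed value, which is how one event ends up spread over two minutes.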
What is the possible workaround?
Following the Prometheus documentation on the increase function:
Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.
If this is not required for your counter, that is, your counter doesn't get reset, you can instead use the raw counter value in your query along with the incremental ratio SLI for the most accurate results.
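A rough sketch of why a raw counter suits this case follows. It assumes that an incremental SLI is evaluated from the differences between consecutive raw counter values; the `consecutive_deltas` helper and the sample data are hypothetical.

```python
from typing import List


def consecutive_deltas(counter: List[int]) -> List[int]:
    """Difference between each raw counter value and the previous one."""
    return [b - a for a, b in zip(counter, counter[1:])]


# Hypothetical raw counter sampled every 15 s; a single event occurs at sample 10.
steady_counter = [0] * 10 + [1] * 30
print(sum(consecutive_deltas(steady_counter)))  # 1: the event is counted once

# A counter that resets produces a negative delta, which is why this
# approach only fits counters that do not get reset.
resetting_counter = [5, 6, 0, 1]
print(consecutive_deltas(resetting_counter))  # [1, -6, 1]
```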
Other notes regarding PromQL
Using offset in the query is a useful way to account for delays in data availability or eventual consistency in monitoring systems.
However, it also shifts the alert time, so consider this for alerts with time-sensitive policies.
Moreover, sum is an aggregation operator. In the context of time-series data, sum aggregates values across dimensions, not across time.
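To make the distinction concrete, the following sketch mimics aggregation across dimensions on plain Python lists. The series labels and values are hypothetical, and `sum_across_dimensions` is an illustrative helper, not a Prometheus API.

```python
from typing import Dict, List

# Hypothetical series keyed by label; each list holds one value per timestamp.
series: Dict[str, List[float]] = {
    "instance=a": [1.0, 2.0, 3.0],
    "instance=b": [10.0, 20.0, 30.0],
}


def sum_across_dimensions(all_series: Dict[str, List[float]]) -> List[float]:
    """Add up all series at each timestamp; the number of timestamps is unchanged."""
    return [sum(values) for values in zip(*all_series.values())]


print(sum_across_dimensions(series))  # [11.0, 22.0, 33.0]
```

The result still has one value per timestamp: the two series were merged, but no values were added up along the time axis.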
Useful links
Check Nobl9 features that will help you troubleshoot:
- Event logs
- Metrics health notifier
- Query checker
- Nobl9 agent logging