Sources

What permission is needed to enable Dynatrace integration?


I get an error after setting up an integration with Splunk Observability.

  • There are two possible workarounds to solve this issue:
  1. Open a browser window in incognito mode and set up the integration again.
  2. Clear the Nobl9 cache in your browser and set up the integration again.

Are there any plans to expand the list of data sources Nobl9 supports?

  • If there's a data source that you would like us to add, log a suggestion using our contact form and we'll take a look.

I'm not receiving any data from my data source. How is that treated by Nobl9? What can I do to troubleshoot this?

When there is no data, no results are produced. Currently, Nobl9 does not notify you when the data inflow stops. How missing data is treated depends on the budgeting method:
  • Occurrences method - no data means that there were no good and no total events, so there is no effect on the ratio of good to total events.
  • Time Slices method - minutes without any data are counted as good minutes.
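The difference between the two budgeting methods can be illustrated with a minimal sketch. This is a simplified model, not Nobl9's implementation: the function names, the per-minute `(good, total)` tuples, and the `threshold` parameter are assumptions made for illustration, with `None` standing for a minute without data.

```python
from typing import List, Optional, Tuple

Minute = Optional[Tuple[int, int]]  # (good, total) events, or None for "no data"

def occurrences_ratio(minutes: List[Minute]) -> Optional[float]:
    """Ratio of good to total events; minutes without data add nothing to either sum."""
    good = sum(m[0] for m in minutes if m is not None)
    total = sum(m[1] for m in minutes if m is not None)
    return good / total if total else None

def time_slices_ratio(minutes: List[Minute], threshold: float) -> float:
    """Share of good minutes; a minute without data is counted as good."""
    good_minutes = 0
    for m in minutes:
        if m is None or m[1] == 0 or m[0] / m[1] >= threshold:
            good_minutes += 1
    return good_minutes / len(minutes)

with_gaps = [(99, 100), None, (90, 100), None]
without_gaps = [(99, 100), (90, 100)]

# Occurrences: the gaps change nothing (0.945 in both cases).
print(occurrences_ratio(with_gaps), occurrences_ratio(without_gaps))
# Time Slices: the two empty minutes count as good (0.75 versus 0.5).
print(time_slices_ratio(with_gaps, 0.95), time_slices_ratio(without_gaps, 0.95))
```

In this model, missing minutes leave the Occurrences ratio unchanged but raise the Time Slices ratio, which is why a silent data outage can make a Time Slices SLO look healthier than it is.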

Here are some useful tips on how you can troubleshoot the no data received situation on your own:
  • Verify that your authentication credentials (for example, the API key) are valid. You can check them in the Nobl9 UI: go to the Integrations tab, find the relevant Source on the list, and click the Edit button.
  • Verify the query syntax and double-check that it contains no errors. It’s good practice to first run the query in the integration's own environment to make sure it returns valid metrics.
  • Make sure that your query returns only one metric and one time series, as Nobl9 cannot process multiple metrics.
  • If you apply any filters in the query, they must point to the exact entity you want to observe. Nobl9 can only process a single dataset: there is no aggregation on the Nobl9 side.
  • If all of the above steps were verified but the SLO still doesn't show any data, review the data source integration log to look for any errors:
    • For the Agent connection method, you have access to the Agent logs. For more details on how to access Agent logs, refer to the Agent Troubleshooting section of the documentation.
    • For the Direct connection method, Nobl9 can look at these logs for you. Contact Nobl9 support at support@nobl9.com and provide the SLO name, the name of the project that contains the SLO, and your organization name (if you have more than one).
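As an illustration of the "one metric, one time series" check above, here is a minimal sketch that assumes the data source is Prometheus and that you have saved the JSON body returned by its HTTP API for your query. The helper names `count_series` and `is_single_series` and the sample bodies are hypothetical; other data sources return different formats.

```python
import json

def count_series(response_body: str) -> int:
    """Count the time series in a Prometheus HTTP API query response body."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return len(payload["data"]["result"])

def is_single_series(response_body: str) -> bool:
    """True when the query returned exactly one time series."""
    return count_series(response_body) == 1

# Hypothetical response bodies, shaped like Prometheus range query results.
one_series = json.dumps({
    "status": "success",
    "data": {"resultType": "matrix", "result": [
        {"metric": {"job": "api"}, "values": [[1700000000, "1"], [1700000015, "2"]]},
    ]},
})
two_series = json.dumps({
    "status": "success",
    "data": {"resultType": "matrix", "result": [
        {"metric": {"job": "api", "pod": "a"}, "values": [[1700000000, "1"]]},
        {"metric": {"job": "api", "pod": "b"}, "values": [[1700000000, "3"]]},
    ]},
})

print(is_single_series(one_series))  # True
print(is_single_series(two_series))  # False: aggregate or filter the query further
```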

When I’m adding a SumoLogic Ratio Metric and changing the timeslice value in the query, I can see the “Values for timeslices must be the same in good and total query” error, even though the values are the same.

  • When this happens, click the Next step button twice.

If I define an SLO with Lightstep Stream X, and then update it to use Lightstep Stream Y, will it retain all the history?

  • Yes, you should be able to update the StreamID without losing the history.

I deactivated my Pingdom API key and then re-enabled the integration with a new key. I do not see a gap in the SLO metrics. Does it mean it was able to import the missing historical data?

  • Our Agents do their best to import data for the periods when collection was missed. How far back they can go is limited by the memory allocated to the container. Since Pingdom data is not extremely granular, the Agent can go back relatively far compared to other Agents.

Why is there a discrepancy between Nobl9’s SLI chart and Grafana Cortex values for an SLO using the ratio metric type?

A discrepancy between Nobl9 and Grafana may occur when you create SLOs using the ratio metric type with Prometheus as the data source. In the Nobl9 UI, an SLO using the ratio metric type is presented on the graphs with a fixed 1-minute resolution. The SLI value at these rounded minutes differs from the Cortex values presented in Grafana.

Why is there a discrepancy?

  • Nobl9 interprets every data point of a non-incremental ratio SLI as an increase from the previous point. Such a time series is then downsampled by summing its individual points and is displayed on the chart. That’s why the chart in Nobl9 might show higher values than Prometheus: the more the chart is zoomed out, the higher the values will be. This is intended behavior.

  • However, it is important to understand the exact semantics of a user query and how it can affect calculations. When querying time series in Prometheus, three parameters affect the result: the query itself, the time range, and the step.
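The effect of downsampling by summation described above can be sketched as follows. This is an illustration only: `downsample_by_sum` and the bucket sizes are hypothetical and do not reflect how Nobl9 picks chart resolution.

```python
from typing import List

def downsample_by_sum(points: List[float], bucket_size: int) -> List[float]:
    """Merge every `bucket_size` consecutive points into one point by summing them."""
    return [sum(points[i:i + bucket_size]) for i in range(0, len(points), bucket_size)]

# One hour of data where every 1-minute point represents an increase of 5 events.
per_minute = [5] * 60

print(downsample_by_sum(per_minute, 1)[:4])  # [5, 5, 5, 5]   zoomed in
print(downsample_by_sum(per_minute, 15))     # [75, 75, 75, 75]   zoomed out
```

The total number of events is identical at both zoom levels; only the height of the individual points on the chart changes.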

How can time range change the result?

  • In general, the time range only cuts out a section of the resulting time series. However, certain PromQL functions, such as increase, extrapolate values to the ends of the requested range.

  • That means that the value for a given timestamp can differ depending on whether it is at the beginning, in the middle, or at the end of the requested time range. Nobl9 adjusts for that by extending the ends of the queried time range to full minutes.
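A minimal sketch of aligning a queried time range to full minutes is shown below. It assumes that the start is rounded down and the end is rounded up; the source does not state the exact direction Nobl9 uses, and `align_to_full_minutes` is a hypothetical helper.

```python
from typing import Tuple

def align_to_full_minutes(start: int, end: int) -> Tuple[int, int]:
    """Widen a [start, end] range of epoch seconds so both ends fall on full minutes."""
    aligned_start = start - start % 60  # round down to the previous full minute
    aligned_end = end + (-end % 60)     # round up to the next full minute
    return aligned_start, aligned_end

print(align_to_full_minutes(125, 245))  # (120, 300)
print(align_to_full_minutes(120, 300))  # (120, 300), already aligned
```

With aligned ends, a given timestamp always sits at the same position relative to the range boundaries, so repeated queries are less likely to return different extrapolated values for it.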

How can step change the result?

  • increase is a function that accepts a range vector as an argument and returns an instant vector. To produce a range vector, you can use the [] range selector in PromQL with a fixed time duration, for example, [2m].
  • The value at a given point in time is then the amount by which a counter increased in the 2 minutes leading up to that point. If the step used to query the time series is smaller than the range vector duration, the result is based on a series of overlapping windows.
  • Nobl9 interprets every data point of a non-incremental ratio SLI as an increase from the previous point. Nobl9 uses the query_range API with a 15s step, so with a 2m range vector, each HTTP request is counted 8 times.
  • This logic has an impact on error budget calculations because it affects error distribution. What used to be a single erroneous event is counted proportionally as 1/8 every 15s across 2 minutes. As a result, the reliability burn down is smoothed out compared to an SLO that uses a raw counter value.
  • Error budgets are calculated based on a proportion of errors, so higher absolute values don’t affect it.
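The factor of 8 and its effect on the ratio can be reproduced with a small simulation. It is a simplified model: it counts events inside each 2-minute window exactly and ignores the extrapolation that Prometheus applies, and the function names and event timestamps are made up for illustration.

```python
from typing import List

RANGE_SECONDS = 120  # the [2m] range vector
STEP_SECONDS = 15    # the query step

def windowed_increase(event_times: List[int], sample_time: int) -> int:
    """Number of events in the window (sample_time - 2m, sample_time]."""
    return sum(1 for t in event_times if sample_time - RANGE_SECONDS < t <= sample_time)

def sample_series(event_times: List[int], start: int, end: int) -> List[int]:
    """Evaluate the windowed increase at every step between start and end."""
    return [windowed_increase(event_times, ts) for ts in range(start, end + 1, STEP_SECONDS)]

total_events = [100, 200, 300, 310]  # four requests (timestamps in seconds)
error_events = [300]                 # one of them failed

total_samples = sample_series(total_events, 0, 600)
error_samples = sample_series(error_events, 0, 600)

print(sum(error_samples))                       # 8: the single error appears in 8 windows
print(sum(total_samples))                       # 32: each of the 4 requests is counted 8 times
print(sum(error_samples) / sum(total_samples))  # 0.25, same as 1 error out of 4 requests
```

Both the numerator and the denominator are inflated by the same factor, so the proportion of errors stays the same, while the single error is spread over eight consecutive samples.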

What is the possible workaround?

  • The Prometheus documentation on the increase function states: "Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for."
  • If you don't need this adjustment, that is, your counter never gets reset, you can use the raw counter value in your query together with an incremental ratio SLI for the most accurate results.
  • Currently, Nobl9 doesn’t automatically adjust for breaks in monotonicity.
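To see why counter resets matter when using raw counter values, consider this simplified sketch. It models a reset the way Prometheus treats it (a decrease means the counter restarted from zero), but it is not the actual Prometheus or Nobl9 code, and the sample values are invented.

```python
from typing import List

def raw_deltas(samples: List[int]) -> List[int]:
    """Differences between consecutive counter samples, with no reset handling."""
    return [curr - prev for prev, curr in zip(samples, samples[1:])]

def reset_adjusted_deltas(samples: List[int]) -> List[int]:
    """Treat any decrease as a restart from zero, so the delta is the new value."""
    return [curr - prev if curr >= prev else curr for prev, curr in zip(samples, samples[1:])]

counter = [100, 110, 120, 5, 15]  # the target restarted between the 3rd and 4th sample

print(raw_deltas(counter))             # [10, 10, -115, 10]
print(reset_adjusted_deltas(counter))  # [10, 10, 5, 10]
```

If your counter can reset, the unadjusted deltas contain a large negative jump, which is why the raw counter approach is only suitable for counters that never restart.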

Other notes regarding PromQL

  • Using offset in the query is a good way to adjust for delays due to data availability in the instrumentation and delays due to eventual consistency. However, it also shifts the time at which you receive alerts. Keep that in mind if your alert policies are very time-sensitive.
  • It is also worth noting that sum is an aggregation operator: it aggregates values across dimensions, not across time.
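The distinction between aggregating across dimensions and across time can be sketched in a few lines. The label values and numbers are hypothetical, and the function only mimics what the sum operator does conceptually.

```python
from typing import Dict, List

def sum_across_dimensions(series: Dict[str, List[float]]) -> List[float]:
    """Add up all series at each timestamp, keeping one value per timestamp."""
    return [sum(values) for values in zip(*series.values())]

# Two series that differ by a label (dimension), sampled at the same three timestamps.
series = {
    "pod-a": [1, 2, 3],
    "pod-b": [10, 20, 30],
}

print(sum_across_dimensions(series))  # [11, 22, 33]
```

The result still has one value per timestamp: the two series were merged into one, but nothing was summed over time.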