Grafana Loki
Grafana Loki (or Loki) is a horizontally-scalable, multi-tenant log aggregation system that is extremely easy to operate. Loki does not index the contents of the logs, but rather a set of labels for each log stream. Nobl9 users can leverage Loki to query and build metrics on top of their logs.
Scope of support
Currently, the Grafana Loki integration with Nobl9 supports only the agent connection method; the direct connection method is not available.
Authentication
Loki does not provide an authentication layer, so authentication is up to the customer. Users are expected to run an authenticating reverse proxy in front of their services, such as NGINX using basic auth or an OAuth2 proxy.
Nobl9 collects only the URL for the Loki integration definition and calls the GET /loki/api/v1/query_range endpoint. For details, refer to HTTP API Grafana Loki | Grafana Loki documentation.
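As an illustration of the request shape, the sketch below builds a query_range URL with Python's standard library. The endpoint path comes from the text above, and `query`, `start`, `end`, and `step` are parameters of the Loki HTTP API. The exact values Nobl9 sends are not stated here, so the helper name `build_query_range_url`, the 15-second step, and the sample query are assumptions made for the example.

```python
from urllib.parse import urlencode


def build_query_range_url(base_url, logql, start_s, end_s, step_s=15):
    """Return a GET URL for Loki's /loki/api/v1/query_range endpoint.

    start_s and end_s are Unix timestamps in seconds; step_s is the
    resolution in seconds (15 is an illustrative value, not a Nobl9 default).
    """
    params = {"query": logql, "start": start_s, "end": end_s, "step": step_s}
    # urlencode percent-encodes the braces and quotes in the LogQL query.
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


# Hypothetical values: a placeholder Loki host and a one-minute window.
url = build_query_range_url(
    "http://loki.example.com",
    'sum(count_over_time({app="nobl9"}[1m]))',
    start_s=1700000000,
    end_s=1700000060,
)
print(url)
```

Sending this URL through the authenticating proxy with any HTTP client is a quick way to confirm that the Loki entry point is reachable before configuring the agent.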
Authenticating Grafana Loki agent with the basic_auth proxy
If you run NGINX with basic_auth as the authenticating reverse proxy in front of Loki, the Nobl9 agent in version 0.40.0 or higher can send an additional Authorization request header with the basic_auth credentials. Refer to the section on deploying the agent with the basic_auth method below for more details.
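For reference, the standard HTTP Basic scheme (RFC 7617) encodes `username:password` in Base64 and sends it in the Authorization header. The sketch below shows that encoding only; how the Nobl9 agent builds the header internally is not described here, and the helper name `basic_auth_header` is hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header for HTTP Basic authentication (RFC 7617)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Sample credentials from the RFC 7617 example, not real Nobl9 or Loki values.
print(basic_auth_header("Aladdin", "open sesame"))
```

Because Base64 is an encoding and not encryption, the proxy should be reached over HTTPS when this method is used.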
Adding Grafana Loki as a data source
To add Grafana Loki as a data source in Nobl9 using the agent connection method, follow these steps:
- Navigate to Integrations > Sources.
- Click the button to add a new source.
- Click the relevant Source icon.
- Choose the Agent connection method (the Direct method is not supported for Grafana Loki), then configure the source as described below.
Grafana Loki agent
Agent configuration in the UI
Follow the instructions below to configure your Grafana Loki agent:
- Select one of the following Release Channels:
  - The stable channel is fully tested by the Nobl9 team. It represents the final product; however, this channel does not contain all the new features of a beta release. Use it to avoid crashes and other limitations.
  - The beta channel is under active development. Here, you can check out new features and improvements without the risk of affecting any viable SLOs. Remember that features in this channel may be subject to change.
- Add the URL (mandatory).
  The url is an entry point to Grafana Loki. It depends on the configuration of your Loki instance. For more details, refer to the Configuration | Grafana Loki documentation section of the Grafana Loki technical documentation.
- Select a Project.
  Specifying a project is helpful when multiple users are spread across multiple teams or projects. When the Project field is left blank, the object is assigned to the default project.
- Enter a Display Name.
  You can enter a friendly name with spaces in this field.
- Enter a Name.
  The name is mandatory and can only contain lowercase alphanumeric characters and dashes (for example, my-project-name). This field is populated automatically when you enter a display name, but you can edit the result.
- Enter a Description.
  Here you can add details such as who is responsible for the integration (team/owner) and the purpose of creating it.
- Specify the Query delay to set a customized delay for queries when pulling the data from the data source.
  - The default value in Grafana Loki integration for Query delay is 1 minute.
  Info: Changing the Query delay may affect your SLI data. For more details, check the Query delay documentation.
- Click Add Data Source.
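The Name rule from the steps above (lowercase alphanumeric characters and dashes) can be checked before submitting the form or a YAML file. This is a minimal sketch: the pattern encodes only the rule as worded here, the helper `is_valid_name` is hypothetical, and Nobl9 may enforce additional constraints (such as length limits) that are not described in this document.

```python
import re

# Assumed pattern: one or more lowercase letters, digits, or dashes.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_name(name):
    """Return True if the name uses only lowercase alphanumerics and dashes."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("my-project-name"))
print(is_valid_name("My Project Name"))
```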
Agent using CLI - YAML
The YAML for setting up an agent connection to Grafana Loki looks like this:
apiVersion: n9/v1alpha
kind: Agent
metadata:
  name: grafana-loki-agent
  displayName: Grafana Loki Agent # optional
  project: default
spec:
  description: Agent settings for Grafana Loki datasource # optional
  sourceOf:
    - Metrics
    - Services
  releaseChannel: beta # string, one of: beta || stable
  queryDelay:
    unit: Minute # string, one of: Second || Minute
    value: 720 # numeric, must be a number less than 1440 minutes (24 hours)
  grafanaLoki:
    url: http://loki.example.com
Important notes:
- url is an entry point to Grafana Loki. It depends on the configuration of your Loki instance. For more details, refer to the Configuration | Grafana Loki documentation section of the Grafana Loki technical documentation.
- You can deploy only one agent per YAML file when using the sloctl apply command.
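The queryDelay comment in the YAML above states that the delay must be less than 1440 minutes (24 hours) and that the unit is either Second or Minute. A small pre-check of those two rules might look like the sketch below. The function name `query_delay_is_valid` is hypothetical, and rejecting negative values is an assumption that the document does not state.

```python
SECONDS_PER_UNIT = {"Second": 1, "Minute": 60}
MAX_DELAY_SECONDS = 1440 * 60  # 24 hours, the upper bound from the YAML comment


def query_delay_is_valid(value, unit):
    """Check a queryDelay value/unit pair against the documented upper bound."""
    if unit not in SECONDS_PER_UNIT:
        return False
    delay_seconds = value * SECONDS_PER_UNIT[unit]
    # Assumption: negative delays are rejected; the document gives only the upper bound.
    return 0 <= delay_seconds < MAX_DELAY_SECONDS


print(query_delay_is_valid(720, "Minute"))
print(query_delay_is_valid(1440, "Minute"))
```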
Deploying Grafana Loki agent
When you add the data source, Nobl9 automatically generates a Kubernetes configuration and a Docker command line for you to use to deploy the agent. Both of these are available in the web UI, under the Agent Configuration section. Be sure to swap in your credentials.
If you use Kubernetes, you can apply the supplied YAML config file to a Kubernetes cluster to deploy the agent. It will look something like this:
# DISCLAIMER: This deployment description contains only the fields necessary for the purpose of this demo.
# It is not a ready-to-apply k8s deployment description, and the client_id and client_secret are only exemplary values.
apiVersion: v1
kind: Secret
metadata:
  name: nobl9-agent-nobl9-dev-grafana-loki-name
  namespace: default
type: Opaque
stringData:
  client_id: "unique_client_id"
  client_secret: "unique_client_secret"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nobl9-agent-nobl9-dev-grafana-loki-lokiagent
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      nobl9-agent-name: "lokiagent"
      nobl9-agent-project: "grafana-loki"
      nobl9-agent-organization: "nobl9-dev"
  template:
    metadata:
      labels:
        nobl9-agent-name: "lokiagent"
        nobl9-agent-project: "grafana-loki"
        nobl9-agent-organization: "nobl9-dev"
    spec:
      containers:
        - name: agent-container
          image: nobl9/agent:latest
          resources:
            requests:
              memory: "350Mi"
              cpu: "0.1"
          env:
            - name: N9_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  key: client_id
                  # Must match the name of the Secret defined above.
                  name: nobl9-agent-nobl9-dev-grafana-loki-name
            - name: N9_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  key: client_secret
                  name: nobl9-agent-nobl9-dev-grafana-loki-name
            # The N9_METRICS_PORT is a variable specifying the port to which the /metrics and /health endpoints are exposed.
            # The 9090 is the default value and can be changed.
            # If you don't want the metrics to be exposed, comment out or delete the N9_METRICS_PORT variable.
            - name: N9_METRICS_PORT
              value: "9090"
If you use Docker, you can run the Docker command to deploy the agent. It will look something like this:
# DISCLAIMER: This Docker command contains only the fields necessary for the purpose of this demo.
# It is not a ready-to-apply command, and you will need to replace the placeholder values with your own values.
# The N9_METRICS_PORT is a variable specifying the port to which the /metrics and /health endpoints are exposed.
# The 9090 is the default value and can be changed.
# If you don't want the metrics to be exposed, delete the N9_METRICS_PORT line.
docker run -d --restart on-failure \
  --name nobl9-agent-nobl9-dev-grafana-loki-lokiagent \
  -e N9_CLIENT_ID="unique_client_id" \
  -e N9_CLIENT_SECRET="unique_client_secret" \
  -e N9_METRICS_PORT=9090 \
  nobl9/agent:latest
Deploying Grafana Loki agent with basic_auth method
To activate basic_auth for the agent, pass the following optional environment variables to the agent:
- AUTH_METHOD: basic_auth is a fixed value, but it must be passed to let the agent know that basic_auth will be used.
- USERNAME: REDACTED is the username for basic_auth.
- PASSWORD: REDACTED is the password for basic_auth.
If you use Kubernetes, you can apply the supplied YAML config file to a Kubernetes cluster to deploy the agent using the basic_auth method. It will look something like this:
# DISCLAIMER: This deployment description contains only the fields necessary for the purpose of this demo.
# It is not a ready-to-apply k8s deployment description, and the client_id and client_secret are only exemplary values.
apiVersion: v1
kind: Secret
metadata:
  name: nobl9-agent-nobl9-dev-stable-grafana-loki
  namespace: default
type: Opaque
stringData:
  client_id: "REDACTED"
  client_secret: "REDACTED"
  basic_auth_username: "REDACTED"
  basic_auth_password: "REDACTED"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nobl9-agent-nobl9-dev-stable-grafana-loki
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      nobl9-agent-name: "grafana-loki"
      nobl9-agent-project: "grafana-loki"
      nobl9-agent-organization: "nobl9-dev-stable"
  template:
    metadata:
      labels:
        nobl9-agent-name: "grafana-loki"
        nobl9-agent-project: "grafana-loki"
        nobl9-agent-organization: "nobl9-dev-stable"
    spec:
      containers:
        - name: agent-container
          image: nobl9/agent:latest
          resources:
            requests:
              memory: "350Mi"
              cpu: "0.1"
          env:
            - name: N9_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  key: client_id
                  name: nobl9-agent-nobl9-dev-stable-grafana-loki
            - name: N9_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  key: client_secret
                  name: nobl9-agent-nobl9-dev-stable-grafana-loki
            - name: AUTH_METHOD
              value: "basic_auth"
            - name: USERNAME
              valueFrom:
                secretKeyRef:
                  key: basic_auth_username
                  # Must match the Secret above, which holds the basic_auth keys.
                  name: nobl9-agent-nobl9-dev-stable-grafana-loki
            - name: PASSWORD
              valueFrom:
                secretKeyRef:
                  key: basic_auth_password
                  name: nobl9-agent-nobl9-dev-stable-grafana-loki
            # The N9_METRICS_PORT is a variable specifying the port to which the /metrics and /health endpoints are exposed.
            # The 9090 is the default value and can be changed.
            # If you don't want the metrics to be exposed, comment out or delete the N9_METRICS_PORT variable.
            - name: N9_METRICS_PORT
              value: "9090"
If you use Docker, you can run the Docker command to deploy the agent with the basic_auth method. It will look something like this:
# DISCLAIMER: This Docker command contains only the fields necessary for the purpose of this demo.
# It is not a ready-to-apply command, and you will need to replace the placeholder values with your own values.
# The N9_METRICS_PORT is a variable specifying the port to which the /metrics and /health endpoints are exposed.
# The 9090 is the default value and can be changed.
# If you don't want the metrics to be exposed, delete the N9_METRICS_PORT line.
docker run -d --restart on-failure \
  --name nobl9-agent-nobl9-dev-stable-grafana-loki-grafana-loki \
  -e N9_CLIENT_SECRET="REDACTED" \
  -e N9_CLIENT_ID="REDACTED" \
  -e N9_METRICS_PORT=9090 \
  -e AUTH_METHOD="basic_auth" \
  -e USERNAME="REDACTED" \
  -e PASSWORD="REDACTED" \
  nobl9/agent:latest
Creating SLOs with Grafana Loki
Creating SLOs in the UI
Follow the instructions below to create your SLOs with Grafana Loki in the UI:
- Navigate to Service Level Objectives.
- Click the button to create a new SLO.
- In step 1 of the SLO wizard, select the Service the SLO will be associated with.
- In step 2, select Grafana Loki as the data source for your SLO.
- Specify the Metric. You can choose either a Threshold Metric, where a single time series is evaluated against a threshold, or a Ratio Metric, which allows you to enter two time series to compare (for example, a count of good requests and total requests).
  - Choose the Data Count Method for your ratio metric:
    - Non-incremental: counts incoming metric values one by one, so the resulting SLO graph is spike-shaped.
    - Incremental: counts the incoming metric values incrementally, adding every next value to the previous values. It results in a constantly increasing SLO graph.
- Enter a Query, or Good Query and Total Query, for the metric you selected. Refer to the Query examples section below for more details.
- In step 3, define a Time Window for the SLO.
- In step 4, specify the Error Budget Calculation Method and your Objective(s).
- In step 5, add a Name, Description, and other details about your SLO. You can also select Alert policies and Labels on this screen.
- When you're done, click Create SLO.
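The difference between the two Data Count Methods in the steps above can be pictured with a toy series. The sketch follows the wording of this section only (each value counted on its own versus each value added to the previous ones); the sample numbers are hypothetical and the code is not Nobl9's implementation.

```python
from itertools import accumulate

# Hypothetical counts of good events returned for four consecutive minutes.
per_minute = [3, 0, 2, 5]

# Non-incremental: every point is counted on its own, so the series rises and falls.
non_incremental = list(per_minute)

# Incremental: every point is added to the previous ones, so the series never decreases.
incremental = list(accumulate(per_minute))

print(non_incremental)  # [3, 0, 2, 5]
print(incremental)      # [3, 3, 5, 10]
```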
SLOs using Grafana Loki - YAML samples
Here's an example of Grafana Loki using a rawMetric (threshold metric):
apiVersion: n9/v1alpha
kind: SLO
metadata:
  name: n9-kafka-main-cluster-alerts-error-budgets-out-lag-threshold
  project: grafana-loki
spec:
  description: Example of Loki Metric query
  service: grafana-loki-service
  indicator:
    metricSource:
      name: grafana-loki
  timeWindows:
    - unit: Day
      count: 1
      isRolling: true
  budgetingMethod: Occurrences
  objectives:
    - displayName: Good
      op: lte
      rawMetric:
        query:
          grafanaLoki:
            logql: sum(sum_over_time({topic="error-budgets-out", consumergroup="alerts", cluster="main"} |= "kafka_consumergroup_lag" | logfmt | line_format "{{.kafka_consumergroup_lag}}" | unwrap kafka_consumergroup_lag [1m]))
      value: 5
      target: 0.50
    - displayName: Moderate
      op: lte
      rawMetric:
        query:
          grafanaLoki:
            logql: sum(sum_over_time({topic="error-budgets-out", consumergroup="alerts", cluster="main"} |= "kafka_consumergroup_lag" | logfmt | line_format "{{.kafka_consumergroup_lag}}" | unwrap kafka_consumergroup_lag [1m]))
      value: 10
      target: 0.75
Here's an example of Grafana Loki as a countMetric (ratio metric):
apiVersion: n9/v1alpha
kind: SLO
metadata:
  displayName: intake-response-duration
  name: response-duration
  project: grafana-loki
spec:
  alertPolicies: []
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      kind: Agent
      name: grafana-loki
      project: grafana-loki
    rawMetric:
      grafanaLoki:
        logql: avg(avg_over_time({app="nobl9", component="intake"} |= "duration" |= "main.nobl9.dev" | json | line_format "{{.log}}" | json | http_useragent != "ELB-HealthChecker/2.0" | unwrap duration [1m]))
  objectives:
    - displayName: Perfect
      op: lte
      target: 0.95
      value: 59000000
  service: grafana-loki-service
  timeWindows:
    - count: 1
      isRolling: true
      unit: Hour
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  displayName: intake-correct-responses-ratio
  name: intake-correct-responses-ratio
  project: grafana-loki
spec:
  alertPolicies: []
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      kind: Agent
      name: grafana-loki
      project: grafana-loki
  objectives:
    - countMetrics:
        good:
          grafanaLoki:
            logql: count(count_over_time(({app="nobl9", component="intake"} | json | line_format "{{.log}}" | json | http_useragent != "ELB-HealthChecker/2.0" | http_status_code >= 200 and http_status_code < 300)[1m]))
        incremental: false
        total:
          grafanaLoki:
            logql: count(count_over_time(({app="nobl9", component="intake"} | json | line_format "{{.log}}" | json | http_useragent != "ELB-HealthChecker/2.0" | http_status_code > 0)[1m]))
      displayName: Perfect
      target: 0.99
      value: 1
  service: grafana-loki-service
  timeWindows:
    - count: 1
      isRolling: true
      unit: Hour
---
apiVersion: n9/v1alpha
kind: SLO
metadata:
  displayName: ingest-correct-responses-ratio
  name: ingest-correct-responses-ratio
  project: grafana-loki
spec:
  alertPolicies: []
  budgetingMethod: Occurrences
  description: ""
  indicator:
    metricSource:
      kind: Agent
      name: grafana-loki
      project: grafana-loki
  objectives:
    - countMetrics:
        good:
          grafanaLoki:
            logql: count(count_over_time(({app="nobl9", component="ingest", container="ingest-container"} | json | line_format "{{.log}}" | json | http_useragent != "ELB-HealthChecker/2.0" | http_status_code >= 200 and http_status_code < 300)[1m]))
        incremental: false
        total:
          grafanaLoki:
            logql: count(count_over_time(({app="nobl9", component="ingest", container="ingest-container"} | json | line_format "{{.log}}" | json | http_useragent != "ELB-HealthChecker/2.0" | http_status_code > 0)[1m]))
      displayName: Stable
      target: 0.99
      value: 1
  service: grafana-loki-service
  timeWindows:
    - count: 1
      isRolling: false
      unit: Day
      calendar:
        startTime: 2021-09-20 12:30:00 # date with time in 24h format
        timeZone: America/New_York
Metrics for Grafana Loki have one mandatory field:
- logql is a query written in LogQL, Loki's query language, which is modeled on PromQL (Prometheus Query Language). For more details, refer to Introduction to PromQL | Grafana documentation. You can see working examples of Grafana Loki queries in the Query examples section below.
Query examples
- Ratio metric for Grafana Loki:
  Good Query: count(count_over_time(({app="nobl9", component="ingest", container="ingest-container"} | json | line_format "{{.log}}" | json | http_useragent != "ELB-HealthChecker/2.0" | http_status_code >= 200 and http_status_code < 300)[1m]))
  Total Query: count(count_over_time(({app="nobl9", component="ingest", container="ingest-container"} | json | line_format "{{.log}}" | json | http_useragent != "ELB-HealthChecker/2.0" | http_status_code > 0)[1m]))
Querying the Grafana Loki server
Nobl9 calls the Loki API every minute to retrieve the log data from the previous minute. Nobl9 aggregates the total number of points to 4 per minute.
Users should refrain from adding a duration to the query; Nobl9 will append [1m] to it.
Useful links
Grafana HTTP API | Grafana Loki documentation