
DevOps | Global Monitoring using Prometheus and Thanos
Prometheus
Prometheus was originally conceived at SoundCloud. Since its inception in 2012, many companies and organisations have adopted it.
Prometheus has become a de facto standard for monitoring and alerting in the cloud and container world.
Prometheus uses a time series data model for metrics and events. Its key features are:
- a multi-dimensional data model (time series defined by metric name and set of key/value dimensions)
- a flexible query language to leverage this dimensionality
- no dependency on distributed storage; single server nodes are autonomous
- time series collection happens via a pull model over HTTP
- pushing time series is supported via an intermediary gateway
- targets are discovered via service discovery or static configuration
- multiple modes of graphing and dashboarding support
- support for hierarchical and horizontal federation
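To make the pull model and target discovery concrete, here is a minimal sketch of a hand-written prometheus.yml. The job names, target address and label are hypothetical; when the Prometheus Operator is used, as later in this post, the equivalent configuration is generated for you rather than written by hand.

```yaml
# prometheus.yml (illustrative sketch; job names, targets and labels are hypothetical)
global:
  scrape_interval: 30s          # how often Prometheus pulls metrics over HTTP

scrape_configs:
  # A statically configured target
  - job_name: example-app
    metrics_path: /metrics
    static_configs:
      - targets: ["example-app.default.svc:8080"]
        labels:
          env: dev              # extra key/value dimension attached to every scraped series

  # Targets found through Kubernetes service discovery
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
```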

Thanos
Thanos is a set of components that can be composed into a highly available metric system with unlimited storage capacity, which can be added seamlessly on top of existing Prometheus deployments.
Thanos leverages the Prometheus 2.0 storage format to cost-efficiently store historical metric data in any object storage while retaining fast query latencies. Additionally, it provides a global query view across all Prometheus installations and can merge data from Prometheus HA pairs on the fly.
The key features of Thanos are:
- Global query view of metrics.
- Unlimited retention of metrics.
- High availability of components, including Prometheus.

Thanos components
Thanos is made up of a set of components, each filling a specific role.
- Sidecar: connects to Prometheus, reads its data for queries and/or uploads it to cloud storage
- Store Gateway: exposes the content of a cloud storage bucket
- Compactor: compacts and downsamples data stored in remote storage
- Receiver: receives data from Prometheus’ remote-write WAL, exposes it and/or uploads it to cloud storage
- Ruler: evaluates recording and alerting rules against data in Thanos for exposition and/or upload
- Query Gateway: implements Prometheus’ v1 API to aggregate data from the underlying components
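As a rough illustration of how these components are wired together, the query component is started with a list of endpoints to fan out to. The fragment below is a sketch only: the hostnames are hypothetical, and it assumes the --store flag of Thanos releases from the v0.5.0 era (newer releases may prefer a different flag for registering endpoints) and the prometheus_replica label that the Prometheus Operator adds to each replica by default.

```yaml
# Fragment of a container spec for the Thanos query component (illustrative only)
args:
  - query
  - --http-address=0.0.0.0:10902                    # serves the Prometheus-compatible v1 HTTP API
  - --grpc-address=0.0.0.0:10901
  - --query.replica-label=prometheus_replica        # assumed label; used to merge data from HA pairs
  - --store=thanos-sidecar.example.internal:10901   # hypothetical Sidecar endpoint
  - --store=thanos-store.example.internal:10901     # hypothetical Store Gateway endpoint
```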
Sidecar
Thanos integrates with existing Prometheus servers through a sidecar process which runs in the same pod as the Prometheus server.
The purpose of the Sidecar is to back up Prometheus data to an object storage bucket and to give other Thanos components access, via a gRPC API, to the Prometheus instance it is attached to.
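When the Prometheus Operator is used, the Sidecar container is added to the pod for you. For orientation, a hand-written equivalent might look roughly like the fragment below. This is a sketch under assumptions: the image is omitted, and the mount path of the object storage configuration is hypothetical.

```yaml
# Fragment of a Prometheus pod spec showing a Thanos Sidecar container (illustrative only)
containers:
  - name: thanos-sidecar
    # image omitted; use the Thanos image and tag that match your deployment
    args:
      - sidecar
      - --prometheus.url=http://127.0.0.1:9090           # Prometheus running in the same pod
      - --tsdb.path=/prometheus                          # shared volume holding the TSDB blocks
      - --objstore.config-file=/etc/thanos/thanos.yaml   # hypothetical mount path of the storage config
      - --grpc-address=0.0.0.0:10901                     # gRPC endpoint used by other Thanos components
    ports:
      - name: grpc
        containerPort: 10901
```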
Application Kubernetes Clusters
To get a global view of all environments, from dev through to prod, we install and configure the Prometheus Operator, the Prometheus components, the Thanos Sidecar and an Ingress in each cluster.
Prometheus Operator
Configuring Thanos Object Storage
Thanos expects a Kubernetes Secret containing the Thanos configuration. Inside this secret you describe how Thanos connects to your object storage.
Once you have written your configuration, save it to a file called thanos-storage-config.yaml.
Here are examples for the major cloud providers (use the one that matches your storage):
# AWS S3
type: s3
config:
  bucket: thanos
  endpoint: aws.polarpoint.io
  access_key: XXX
  secret_key: XXX

# Google Cloud Storage
type: GCS
config:
  bucket: ""
  service_account: ""

# Azure Blob Storage
type: AZURE
config:
  storage_account: "XXX"
  storage_account_key: "XXX"
  container: "thanos"
kubectl create secret generic thanos-storage-config --from-file=thanos.yaml=thanos-storage-config.yaml --namespace default
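If you prefer to keep the secret in version-controlled manifests rather than creating it imperatively, a roughly equivalent declarative form is sketched below. It reuses the secret name and the thanos.yaml key from the command above and the S3 example values; treat it as an illustration rather than the exact manifest used here, and avoid committing real credentials.

```yaml
# Declarative sketch of the same secret (values are the placeholder S3 example)
apiVersion: v1
kind: Secret
metadata:
  name: thanos-storage-config
  namespace: default
type: Opaque
stringData:
  # The key name must match the one referenced by objectStorageConfig
  thanos.yaml: |
    type: s3
    config:
      bucket: thanos
      endpoint: aws.polarpoint.io
      access_key: XXX
      secret_key: XXX
```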
In addition to the object storage configuration, we want all communication to be secured with mTLS. To do this we create a TLS secret holding a certificate signed by the same CA that the ingress controller will use, and a second secret holding the CA certificate itself.
kubectl create secret tls -n default thanos-ingress-secret --key dev-client.key --cert dev-client.cert
kubectl create secret generic -n default thanos-ca-secret --from-file=ca.crt=cacerts.cer
We install the Prometheus Operator using its Helm chart and the following values file (prometheus-operator-thanos-values.yaml):
prometheus:
  prometheusSpec:
    replicas: 2
    retention: 12h                      # we only need a few hours of retention, since the rest is uploaded to blob
    image:
      tag: v2.10.0
    serviceMonitorNamespaceSelector:    # find target config from multiple namespaces
      any: true
    thanos:                             # add Thanos Sidecar
      tag: v0.5.0
      objectStorageConfig:              # blob storage to upload metrics
        key: thanos.yaml
        name: thanos-storage-config
grafana:
  enabled: false
helm install --name dev-prom stable/prometheus-operator -f prometheus-operator-thanos-values.yaml --tiller-namespace=default
We now have the Prometheus Operator installed in the application cluster.
kubectl get svc -n default -o wide
dev-prom-kube-state-metrics              ClusterIP  xxxx  8080/TCP   29d  app=kube-state-metrics,release=int-prom
dev-prom-prometheus-node-exporter        ClusterIP  xxxx  9100/TCP   29d  app=prometheus-node-exporter,release=dev-prom
dev-prom-prometheus-operat-alertmanager  ClusterIP  xxxx  9093/TCP   29d  alertmanager=dev-prom-prometheus-operat-alertmanager,app=alertmanager
dev-prom-prometheus-operat-operator      ClusterIP  xxxx  8080/TCP   29d  app=prometheus-operator-operator,release=dev-prom
dev-prom-prometheus-operat-prometheus    ClusterIP  xxxx  9090/TCP   29d  app=prometheus,prometheus=dev-prom-prometheus-operat-prometheus
kubernetes                               ClusterIP  xxxx  443/TCP    29d
prometheus-operated                      ClusterIP  None  9090/TCP   11d  app=prometheus
thanos-sidecar-0                         ClusterIP  xxxx  10901/TCP  29d  statefulset.kubernetes.io/pod-name=prometheus-dev-prom-prometheus-operat-prometheus-0
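The thanos-sidecar-0 Service in this listing does not appear to come from the values file shown earlier, and its definition is not included in this post. One way to define a per-pod Service like it, using the name, port and selector visible in the listing, is sketched below. The port name is an assumption.

```yaml
# Sketch of a Service that targets the Sidecar of a single Prometheus replica
apiVersion: v1
kind: Service
metadata:
  name: thanos-sidecar-0
  namespace: default
spec:
  type: ClusterIP
  selector:
    # Label set automatically on each StatefulSet pod; selects replica 0 only
    statefulset.kubernetes.io/pod-name: prometheus-dev-prom-prometheus-operat-prometheus-0
  ports:
    - name: grpc            # assumed port name
      port: 10901           # Thanos Sidecar gRPC port
      targetPort: 10901
```

With two replicas, a second Service selecting the pod whose name ends in -1 would follow the same pattern.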
For the central Thanos deployment to query each application cluster's metrics, its query component needs access to the Thanos Sidecars running in each cluster. We therefore expose each Sidecar via an Ingress secured with mTLS, using the TLS and CA certificate secrets defined above.
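The contents of thanos-ingress-rules.yaml are not listed in this post. A minimal sketch of what such a file might contain is shown below, assuming the NGINX ingress controller and the networking.k8s.io/v1 Ingress API; older clusters use an earlier API version with a different backend syntax. The host, Service name, port and secret names are taken from elsewhere in this post, while the ingress class name is an assumption.

```yaml
# thanos-ingress-rules.yaml (illustrative sketch, assuming the NGINX ingress controller)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: thanos-sidecar-0
  namespace: default
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"                      # the Sidecar speaks gRPC
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"                  # require a client certificate
    nginx.ingress.kubernetes.io/auth-tls-secret: "default/thanos-ca-secret"   # CA used to verify clients
spec:
  ingressClassName: nginx                # assumed ingress class name
  tls:
    - hosts:
        - prom.dev-polarpoint.local
      secretName: thanos-ingress-secret  # server certificate and key
  rules:
    - host: prom.dev-polarpoint.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: thanos-sidecar-0
                port:
                  number: 10901
```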
kubectl apply -f thanos-ingress-rules.yaml -n default
kubectl get ingress
NAME              HOSTS                      ADDRESS  PORTS    AGE
thanos-sidecar-0  prom.dev-polarpoint.local           80, 443  29d