Sensu plugin to compose complex Prometheus queries and execute result-set evaluation

Sensu Kubernetes Prometheus Plugin

Description

Sensu plugin that queries Prometheus for the metrics exported by node-exporter.

Usage

check_prometheus.rb /path/to/config.yml

# Debug mode: output all JSON and blacklisted checks
PROM_DEBUG=true check_prometheus.rb /path/to/config.yml

Development and testing

Dependencies: docker, docker-compose

To spin up a development stack and run the integration tests:

ruby test.rb

Afterwards, run rspec directly to rerun the tests.

To run the dockerized version (the one GitLab CI uses):

bash test.sh

Environment variables

| Name | Example | Default | Description |
| --- | --- | --- | --- |
| PROM_DEBUG | true | false | Debug output instead of sending checks to Sensu |
| PROMETHEUS_ENDPOINT | hostname:9090 | localhost:9090 | Connection string in the format address:port |
| SENSU_SOCKET_ADDRESS | hostname | localhost | Address used to connect to the Sensu socket |
| SENSU_SOCKET_PORT | 1234 | 3030 | Port used to connect to the Sensu socket |
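
The defaults in the table can be mirrored with `fetch` and a fallback value. This is a minimal sketch, not the plugin's own code: the `settings_from_env` helper and its hash keys are hypothetical, and treating only the string `true` as enabling debug mode is an assumption.

```ruby
# Hypothetical helper: resolve the documented environment variables,
# falling back to the defaults listed in the table above.
# Assumption: PROM_DEBUG enables debug mode only when set to "true".
def settings_from_env(env = ENV)
  {
    debug: env.fetch('PROM_DEBUG', 'false') == 'true',
    prometheus_endpoint: env.fetch('PROMETHEUS_ENDPOINT', 'localhost:9090'),
    sensu_address: env.fetch('SENSU_SOCKET_ADDRESS', 'localhost'),
    sensu_port: Integer(env.fetch('SENSU_SOCKET_PORT', '3030'))
  }
end

# A plain hash stands in for ENV here so the example is reproducible.
settings = settings_from_env({ 'PROMETHEUS_ENDPOINT' => 'hostname:9090' })
puts settings[:prometheus_endpoint]
puts settings[:sensu_port]
```

Passing a hash instead of `ENV` keeps the lookup easy to test without touching the process environment.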

Config.yml

Built-in checks are configured in config.yml under the checks key, and checks based on custom Prometheus queries go under custom. Global options go under config. Example:

config:
  reported_by: sbppapik8s
  occurrences: 3
  domain: example.com
  whitelist: sbppapik8s.*
  use_default_source: false
checks:
  - service:
    name: kube-controller-manager.service
  - check: load_per_cluster
    host: sbppapik8s
    cfg:
      cluster: prometheus
      warn: 1.0
      crit: 2.0
      source: sbppapik8s
custom:
  - name: heartbeat
    query: up
    check:
      type: equals
      value: 1
    msg:
      0: 'OK: Endpoint is alive and kicking'
      2: 'CRIT: Endpoints not reachable!'
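
To illustrate how the three top-level keys fit together, the sketch below parses a trimmed copy of the example with Ruby's standard `yaml` library. The `parse_config` helper and the `EXAMPLE_CONFIG` constant are hypothetical names used for illustration only; the plugin's own loading code may differ.

```ruby
require 'yaml'

# Trimmed copy of the example above, inlined so the sketch is self-contained.
EXAMPLE_CONFIG = <<~CONFIG
  config:
    reported_by: sbppapik8s
    occurrences: 3
  checks:
    - check: load_per_cluster
      cfg:
        cluster: prometheus
        warn: 1.0
        crit: 2.0
        source: sbppapik8s
  custom:
    - name: heartbeat
      query: up
      check:
        type: equals
        value: 1
      msg:
        0: 'OK: Endpoint is alive and kicking'
        2: 'CRIT: Endpoints not reachable!'
CONFIG

# Hypothetical helper: split the parsed document into its three sections,
# defaulting to empty collections when a section is missing.
def parse_config(text)
  doc = YAML.safe_load(text) || {}
  {
    global: doc.fetch('config', {}),
    checks: doc.fetch('checks', []),
    custom: doc.fetch('custom', [])
  }
end

parsed = parse_config(EXAMPLE_CONFIG)
puts "reported_by: #{parsed[:global]['reported_by']}"
parsed[:checks].each { |c| puts "check:  #{c['check']}" }
parsed[:custom].each { |c| puts "custom: #{c['name']} -> #{c['query']}" }
```

Note that the `msg` keys are parsed as integers (`0` and `2`), so they line up with Sensu status codes without conversion.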

Checks

| Name | Description |
| --- | --- |
| service | Checks if a systemd service is active |
| memory | Checks memory usage as a percentage |
| load_per_cpu | Checks CPU load divided by the number of CPUs |
| load_per_cluster | Checks CPU load of the entire cluster divided by the total number of CPUs |
| load_per_cluster_minus_n | Checks CPU load of the entire cluster divided by the total number of CPUs, minus n failures |
| inode | Checks inode usage as a percentage per mountpoint |
| disk | Checks filesystem usage as a percentage per mountpoint |
| disk_all | Checks filesystem and inode usage of all mountpoints |
| predict_disk_all | Predicts if any of the disks in Prometheus will be full in x days |

Custom

| Name | Example | Description |
| --- | --- | --- |
| name | heartbeat | Custom check's name |
| query | up | Prometheus query |
| check.type | (equals\|below\|above) | Type of evaluation applied against value. Available: `equals`, `below` and `above` |
| check.value | 1 | Value to be compared against query results, using the `check.type` evaluation |
| cfg.warn | 33.00 | Warning threshold level |
| cfg.crit | 37.00 | Critical threshold level |
| msg.0 | OK: heartbeat is up | Message used when the `value` evaluation is successful |
| msg.2 | CRITICAL: heartbeat is down | Message used when the evaluation is not successful |
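
The `check.type` evaluation can be pictured as a lookup from type name to comparison. The sketch below is an illustration under stated assumptions, not the plugin's implementation: `COMPARATORS` and `evaluate_custom` are hypothetical names, and it assumes the check is OK (status 0) when the comparison holds and critical (status 2) otherwise, in line with the `msg.0` and `msg.2` keys.

```ruby
# Hypothetical evaluator for the documented `check.type` values.
# Assumption: status 0 when the comparison holds, status 2 otherwise.
COMPARATORS = {
  'equals' => ->(result, value) { result == value },
  'below'  => ->(result, value) { result < value },
  'above'  => ->(result, value) { result > value }
}.freeze

def evaluate_custom(check, msg, result)
  comparator = COMPARATORS.fetch(check['type']) do
    raise ArgumentError, "unknown check.type: #{check['type']}"
  end
  # Prometheus returns sample values as strings, so both sides are
  # converted to floats before comparing.
  status = comparator.call(result.to_f, check['value'].to_f) ? 0 : 2
  { status: status, output: msg[status] }
end

check = { 'type' => 'equals', 'value' => 1 }
msg = { 0 => 'OK: Endpoint is alive and kicking',
        2 => 'CRIT: Endpoints not reachable!' }

puts evaluate_custom(check, msg, '1')[:output]
puts evaluate_custom(check, msg, '0')[:output]
```

The `cfg.warn` and `cfg.crit` options are left out of this sketch because the text does not say how they interact with `check.type`.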

Global Configuration Options

| Name | Example | Description |
| --- | --- | --- |
| reported_by | sbppapik8s | Hostname that shows up in the Sensu reported_by field |
| occurrences | 3 | Number of failures before Sensu sends an alert |
| whitelist | sbppapik8s.* | Regex used as a safety whitelist to make sure the source names are correct |
| ttl | 300 | Override the Sensu TTL in seconds |
| ttl_status | 1 | Override the status code for an expiring Sensu TTL |
| use_default_source | false | When `true`, the source of the events will be the Sensu client's |
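
As an illustration of the `whitelist` option, the sketch below checks a source name against the configured regex. The `source_allowed?` helper is hypothetical, and anchoring the pattern to the whole name is an assumption of this sketch; the plugin may match differently.

```ruby
# Hypothetical guard: accept a source name only if it matches the
# configured whitelist regex.
# Assumption: the pattern must match the entire source name.
def source_allowed?(source, whitelist)
  Regexp.new("\\A(?:#{whitelist})\\z").match?(source)
end

puts source_allowed?('sbppapik8s01', 'sbppapik8s.*') # true
puts source_allowed?('otherhost', 'sbppapik8s.*')    # false
```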

Check Configuration Options

| Name | Config | Example |
| --- | --- | --- |
| service | name: servicename<br>state: active\|deactivating\|failed\|inactive (default: active)<br>state_required: 0\|1 (default: 1) | name: test-service.service |
| memory | warn: warning percentage<br>crit: critical percentage | warn: 90<br>crit: 95 |
| load_per_cpu | warn: warning percentage<br>crit: critical percentage | warn: 90<br>crit: 95 |
| load_per_cluster | cluster: cluster name<br>warn: warning percentage<br>crit: critical percentage<br>source: name that shows in Sensu | cluster: nodes<br>warn: 90<br>crit: 95<br>source: sbppapik8s |
| load_per_cluster_minus_n | cluster: cluster name<br>minus_n: number of member failures<br>warn: warning percentage<br>crit: critical percentage<br>source: name that shows in Sensu | cluster: nodes<br>minus_n: 1<br>warn: 90<br>crit: 95<br>source: sbppapik8s |
| inode | mount: mountpoint<br>name: human-readable name<br>warn: warning percentage<br>crit: critical percentage | mount: /var/lib/docker<br>name: docker<br>warn: 90<br>crit: 95 |
| disk | mount: mountpoint<br>name: human-readable name<br>warn: warning percentage<br>crit: critical percentage | mount: /var/lib/docker<br>name: docker<br>warn: 90<br>crit: 95 |
| disk_all | ignore_fs: regex of filesystems<br>warn: warning percentage<br>crit: critical percentage | ignore_fs: tmpfs<br>warn: 90<br>crit: 95 |
| predict_disk_all | range_vector: Prometheus range vector used for sample size of prediction<br>filter: Prometheus filter to include/exclude disks<br>days: prediction days<br>source: Sensu name | range_vector: 24h<br>filter: {mountpoint="/"}<br>days: 14<br>source: sbppapik8s |
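
Most checks above share the `warn` and `crit` options. A minimal sketch of how such thresholds could map to Sensu status codes (0 = OK, 1 = WARNING, 2 = CRITICAL) follows. The `threshold_status` helper is hypothetical, and treating a value at or above a threshold as triggering it is an assumption; the plugin's exact comparison may differ.

```ruby
# Hypothetical mapping from a measured value to a Sensu status code.
# Assumption: a value at or above a threshold triggers that threshold.
def threshold_status(value, warn:, crit:)
  return 2 if value >= crit
  return 1 if value >= warn

  0
end

cfg = { warn: 90, crit: 95 }
[50, 92, 97].each do |usage|
  puts "usage #{usage}% -> status #{threshold_status(usage, **cfg)}"
end
```

Checking `crit` before `warn` matters: a value above both thresholds must report the more severe status.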