Sensu Kubernetes Prometheus Plugin

Description

A Sensu plugin that queries Prometheus for the metrics exported by node-exporter.

Usage

check_prometheus.rb /path/to/config.yml

# Debug mode: print all JSON and blacklisted checks
PROM_DEBUG=true check_prometheus.rb /path/to/config.yml

Development and testing

Dependencies: docker, docker-compose

To spin up a development stack and run the integration tests:

ruby test.rb

Afterwards, you can run rspec directly to rerun the tests.

To run the dockerized version (the one gitlab-ci uses):

bash test.sh

Environment variables

Name	Example	Default	Description
PROM_DEBUG	true	false	Print debug output instead of sending checks to Sensu
PROMETHEUS_ENDPOINT	hostname:9090	localhost:9090	Connection string in the format address:port
SENSU_SOCKET_ADDRESS	hostname	localhost	Address used to connect to the sensu socket
SENSU_SOCKET_PORT	1234	3030	Port used to connect to the sensu socket
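
As an illustration of how the defaults in the table could be applied, here is a minimal Ruby sketch. The helper name `settings_from_env` is hypothetical and not taken from the plugin, and treating only the literal string `true` as enabling debug mode is an assumption.

```ruby
# Hypothetical helper (not part of the plugin): resolve settings from the
# environment, falling back to the defaults listed in the table.
def settings_from_env(env = ENV)
  {
    # Assumption: only the literal string "true" enables debug mode.
    debug: env.fetch('PROM_DEBUG', 'false') == 'true',
    prometheus_endpoint: env.fetch('PROMETHEUS_ENDPOINT', 'localhost:9090'),
    sensu_socket_address: env.fetch('SENSU_SOCKET_ADDRESS', 'localhost'),
    sensu_socket_port: Integer(env.fetch('SENSU_SOCKET_PORT', '3030'))
  }
end

# A plain Hash stands in for ENV here so the example is reproducible.
puts settings_from_env({ 'PROMETHEUS_ENDPOINT' => 'hostname:9090' })
```

Accepting the environment as an argument keeps the lookup easy to exercise in tests without mutating the real `ENV`.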

Config.yml

Check configuration is defined in the config.yml file under the key checks, and checks based on custom Prometheus queries are under custom. Example:

config:
  reported_by: sbppapik8s
  occurrences: 3
  domain: example.com
  whitelist: sbppapik8s.*
  use_default_source: false
checks:
  - service:
    name: kube-controller-manager.service
  - check: load_per_cluster
    host: sbppapik8s
    cfg:
      cluster: prometheus
      warn: 1.0
      crit: 2.0
      source: sbppapik8s
custom:
  - name: heartbeat
    query: up
    check:
      type: equals
      value: 1
    msg:
      0: 'OK: Endpoint is alive and kicking'
      2: 'CRIT: Endpoints not reachable!'
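
To show how a file with this layout parses, here is a small Ruby sketch that loads a trimmed copy of the example and walks the `checks` and `custom` sections. The `load_config` helper and the `EXAMPLE_CONFIG` constant are hypothetical names used for illustration only; the plugin's own loading code may differ.

```ruby
require 'yaml'

# Trimmed copy of the example config, embedded so the sketch runs on its own.
EXAMPLE_CONFIG = <<~YML
  config:
    reported_by: sbppapik8s
    occurrences: 3
  checks:
    - check: load_per_cluster
      cfg:
        cluster: prometheus
        warn: 1.0
        crit: 2.0
        source: sbppapik8s
  custom:
    - name: heartbeat
      query: up
      check:
        type: equals
        value: 1
      msg:
        0: 'OK: Endpoint is alive and kicking'
        2: 'CRIT: Endpoints not reachable!'
YML

# Hypothetical loader: returns the three top-level sections, using empty
# collections when a section is missing or empty.
def load_config(text)
  parsed = YAML.safe_load(text)
  {
    config: parsed.fetch('config', {}) || {},
    checks: parsed.fetch('checks', []) || [],
    custom: parsed.fetch('custom', []) || []
  }
end

cfg = load_config(EXAMPLE_CONFIG)
cfg[:checks].each do |c|
  puts "#{c['check']}: warn=#{c.dig('cfg', 'warn')} crit=#{c.dig('cfg', 'crit')}"
end
cfg[:custom].each do |c|
  # The msg keys parse as Integers (0 and 2), not Strings.
  puts "#{c['name']}: query=#{c['query']} ok=#{c['msg'][0]}"
end
```

Note that YAML parses the `msg` keys as integers, so they can be looked up directly by a numeric status code.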

Checks

Name	Description
service	Checks if a systemd service is active
memory	Checks memory usage as a percentage
load_per_cpu	Checks CPU load divided by the number of CPUs
load_per_cluster	Checks CPU load of the entire cluster divided by the total number of CPUs
load_per_cluster_minus_n	Checks CPU load of the entire cluster divided by the total number of CPUs, minus n failures
inode	Checks inode usage as a percentage per mountpoint
disk	Checks filesystem usage as a percentage per mountpoint
disk_all	Checks filesystem and inode usage of all mountpoints
predict_disk_all	Predicts if any of the disks in Prometheus will be full in x days

Custom

Name	Example	Description
name	heartbeat	Custom check's name
query	up	Prometheus query
check.type	(equals|below|above)	Type of evaluation applied against value. Available: `equals`, `below` and `above`
check.value	1	Value to be compared against query results, using `check.type` evaluation
cfg.warn	33.00	Warning threshold level
cfg.crit	37.00	Critical threshold level
msg.0	OK: heartbeat is up	Message to be used when the `value` evaluation is successful
msg.2	CRITICAL: heartbeat is down	Message to be used when the evaluation is not successful
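
The following Ruby sketch illustrates one way the `check.type` evaluation and the `msg` lookup could fit together. It is not the plugin's implementation: `evaluate_custom` and `custom_event` are hypothetical names, and it assumes that `below` and `above` pass when the query result is strictly below or above `check.value`. It ignores `cfg.warn` and `cfg.crit`.

```ruby
# Hypothetical evaluator: returns Sensu status 0 when the result satisfies
# the check and 2 otherwise.
# Assumption: "below"/"above" pass when the result is strictly below/above
# check.value.
def evaluate_custom(check, result)
  type  = check.fetch('type')
  value = check.fetch('value')
  passed =
    case type
    when 'equals' then result == value
    when 'below'  then result < value
    when 'above'  then result > value
    else raise ArgumentError, "unknown check.type: #{type}"
    end
  passed ? 0 : 2
end

# Hypothetical helper: pick the message that belongs to the resulting status.
def custom_event(custom, result)
  status = evaluate_custom(custom.fetch('check'), result)
  {
    'name' => custom.fetch('name'),
    'status' => status,
    'output' => custom.fetch('msg').fetch(status)
  }
end

HEARTBEAT = {
  'name' => 'heartbeat',
  'query' => 'up',
  'check' => { 'type' => 'equals', 'value' => 1 },
  'msg' => { 0 => 'OK: heartbeat is up', 2 => 'CRITICAL: heartbeat is down' }
}.freeze

puts custom_event(HEARTBEAT, 1)
puts custom_event(HEARTBEAT, 0)
```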

Global Configuration Options

Name	Example	Description
reported_by	sbppapik8s	Hostname that shows up in the Sensu reported_by field
occurrences	3	Number of failures before Sensu sends an alert
whitelist	sbppapik8s.*	Regex used as a safety whitelist to make sure the source names are correct
ttl	300	Override the Sensu TTL in seconds
ttl_status	1	Override the status code for an expiring Sensu TTL
use_default_source	false	When `true`, the source of the events will be the Sensu client's own
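
As a rough sketch of how these options might shape an outgoing event, the Ruby below builds an event hash from the global configuration. The `build_event` helper is hypothetical, and several behaviors are assumptions rather than documented plugin behavior: the whitelist is applied as an unanchored regex match, a non-matching source raises an error, and `source` is simply omitted when `use_default_source` is `true`.

```ruby
require 'json'

# Hypothetical helper (not the plugin's code): merge global options into an
# event hash.
def build_event(global, name:, status:, output:, source:)
  whitelist = global['whitelist']
  # Assumption: the whitelist is an unanchored regex and a mismatch is fatal.
  if whitelist && !Regexp.new(whitelist).match?(source)
    raise ArgumentError, "source #{source.inspect} does not match whitelist"
  end

  event = {
    'name' => name,
    'status' => status,
    'output' => output,
    'reported_by' => global['reported_by'],
    'occurrences' => global['occurrences']
  }
  # Assumption: omitting `source` lets the Sensu client report its own name.
  event['source'] = source unless global['use_default_source']
  event['ttl'] = global['ttl'] if global['ttl']
  event['ttl_status'] = global['ttl_status'] if global['ttl_status']
  event
end

GLOBAL_EXAMPLE = {
  'reported_by' => 'sbppapik8s',
  'occurrences' => 3,
  'whitelist' => 'sbppapik8s.*',
  'ttl' => 300,
  'use_default_source' => false
}.freeze

event = build_event(GLOBAL_EXAMPLE, name: 'heartbeat', status: 0,
                                    output: 'OK: heartbeat is up',
                                    source: 'sbppapik8s')
puts JSON.generate(event)
```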

Check Configuration Options

Name	Config	Example
service	name: servicename; state: active|deactivating|failed|inactive (default: active); state_required: 0|1 (default: 1)	name: test-service.service
memory	warn: warning percentage; crit: critical percentage	warn: 90; crit: 95
load_per_cpu	warn: warning percentage; crit: critical percentage	warn: 90; crit: 95
load_per_cluster	cluster: cluster name; warn: warning percentage; crit: critical percentage; source: name that shows in Sensu	cluster: nodes; warn: 90; crit: 95; source: sbppapik8s
load_per_cluster_minus_n	cluster: cluster name; minus_n: number of member failures; warn: warning percentage; crit: critical percentage; source: name that shows in Sensu	cluster: nodes; minus_n: 1; warn: 90; crit: 95; source: sbppapik8s
inode	mount: mountpoint; name: human-readable name; warn: warning percentage; crit: critical percentage	mount: /var/lib/docker; name: docker; warn: 90; crit: 95
disk	mount: mountpoint; name: human-readable name; warn: warning percentage; crit: critical percentage	mount: /var/lib/docker; name: docker; warn: 90; crit: 95
disk_all	ignore_fs: regex of filesystems; warn: warning percentage; crit: critical percentage	ignore_fs: tmpfs; warn: 90; crit: 95
predict_disk_all	range_vector: Prometheus range vector used for sample size of prediction; filter: Prometheus filter to include/exclude disks; days: prediction days; source: Sensu name	range_vector: 24h; filter: {mountpoint="/"}; days: 14; source: sbppapik8s
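
The `warn` and `crit` options shared by most checks can be pictured as a simple threshold mapping onto the usual Sensu status codes (0 OK, 1 warning, 2 critical). The Ruby sketch below is illustrative only: `threshold_status` is a hypothetical name, and it assumes that a value at or above a threshold triggers it and that `crit` is not lower than `warn`.

```ruby
# Names for the standard Sensu status codes used in this sketch.
STATUS_NAMES = { 0 => 'OK', 1 => 'WARNING', 2 => 'CRITICAL' }.freeze

# Hypothetical mapping from a measured value to a Sensu status code.
# Assumption: a value at or above a threshold triggers that threshold.
def threshold_status(value, warn:, crit:)
  return 2 if value >= crit
  return 1 if value >= warn

  0
end

# Example cfg for a `disk` check, taken from the table's example column.
DISK_CFG = { 'mount' => '/var/lib/docker', 'name' => 'docker',
             'warn' => 90, 'crit' => 95 }.freeze

[42.0, 91.5, 97.0].each do |used_pct|
  status = threshold_status(used_pct, warn: DISK_CFG['warn'], crit: DISK_CFG['crit'])
  puts "#{DISK_CFG['name']} #{used_pct}% -> #{STATUS_NAMES[status]}"
end
```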

sensu-plugins-prometheus-checks

Development