Sensu Kubernetes Prometheus Plugin
Description
A Sensu plugin that queries Prometheus for the metrics exported by node-exporter and evaluates them as Sensu checks.
Usage
check_prometheus.rb /path/to/config.yml
# Debug mode: print all JSON and blacklisted checks
PROM_DEBUG=true check_prometheus.rb /path/to/config.yml
Development and testing
Dependencies: docker, docker-compose
To spin up a development stack and run the integration tests:
ruby test.rb
Afterwards you can run `rspec` directly to rerun the tests.
To run the dockerized version (the one gitlab-ci uses):
bash test.sh
Environment variables
Name | Example | Default | Description |
---|---|---|---|
PROM_DEBUG | true | false | Print debug output instead of sending checks to Sensu |
PROMETHEUS_ENDPOINT | hostname:9090 | localhost:9090 | Connection string in the format address:port |
SENSU_SOCKET_ADDRESS | hostname | localhost | Address used to connect to the Sensu socket |
SENSU_SOCKET_PORT | 1234 | 3030 | Port used to connect to the Sensu socket |
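
As an illustration of how these variables and their defaults fit together, here is a minimal Ruby sketch. The helper names `settings` and `query_uri` are hypothetical and not taken from the plugin, and treating only the literal string `true` as enabling debug mode is an assumption. The `/api/v1/query` path is the standard Prometheus HTTP API instant-query endpoint.

```ruby
require 'uri'

# Hypothetical helper: read the documented environment variables,
# falling back to the documented defaults. Accepts any Hash-like object
# so it can be exercised without touching the real environment.
def settings(env = ENV)
  {
    # Assumption: only the literal string "true" enables debug mode.
    debug: env.fetch('PROM_DEBUG', 'false') == 'true',
    prometheus_endpoint: env.fetch('PROMETHEUS_ENDPOINT', 'localhost:9090'),
    sensu_address: env.fetch('SENSU_SOCKET_ADDRESS', 'localhost'),
    sensu_port: Integer(env.fetch('SENSU_SOCKET_PORT', '3030'))
  }
end

# Hypothetical helper: build an instant-query URL for the Prometheus
# HTTP API from an address:port endpoint and a PromQL expression.
def query_uri(endpoint, promql)
  URI("http://#{endpoint}/api/v1/query?#{URI.encode_www_form(query: promql)}")
end
```

For example, `query_uri(settings[:prometheus_endpoint], 'up')` yields a URI that can be passed to `Net::HTTP.get` to run the query.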
Config.yml
Check configuration is defined in the `config.yml` file under the key `checks`, and checks based on custom Prometheus queries are under `custom`. Example:
config:
  reported_by: sbppapik8s
  occurrences: 3
  domain: example.com
  whitelist: sbppapik8s.*
  use_default_source: false
checks:
  - service:
      name: kube-controller-manager.service
  - check: load_per_cluster
    host: sbppapik8s
    cfg:
      cluster: prometheus
      warn: 1.0
      crit: 2.0
      source: sbppapik8s
custom:
  - name: heartbeat
    query: up
    check:
      type: equals
      value: 1
    msg:
      0: 'OK: Endpoint is alive and kicking'
      2: 'CRIT: Endpoints not reachable!'
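
To make the layout concrete, the following Ruby sketch parses a trimmed copy of such a file and lists the configured checks. It assumes `config`, `checks` and `custom` are top-level keys and that each standard check names itself under `check`; the helper `check_names` is hypothetical and not part of the plugin.

```ruby
require 'yaml'

# Trimmed sample, assuming the nesting described in the lead-in.
SAMPLE = <<~YAML
  config:
    reported_by: sbppapik8s
    occurrences: 3
    whitelist: sbppapik8s.*
  checks:
    - check: load_per_cluster
      cfg:
        cluster: prometheus
        warn: 1.0
        crit: 2.0
        source: sbppapik8s
  custom:
    - name: heartbeat
      query: up
      check:
        type: equals
        value: 1
      msg:
        0: 'OK: Endpoint is alive and kicking'
        2: 'CRIT: Endpoints not reachable!'
YAML

# Hypothetical helper: collect the names of standard and custom checks.
# Missing sections are treated as empty lists.
def check_names(doc)
  {
    standard: (doc['checks'] || []).map { |c| c['check'] },
    custom: (doc['custom'] || []).map { |c| c['name'] }
  }
end

doc = YAML.safe_load(SAMPLE)
puts check_names(doc).inspect
```

Note that YAML parses the `msg` keys `0` and `2` as integers, so the messages are looked up with `msg[0]` and `msg[2]` rather than string keys.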
Checks
Name | Description |
---|---|
service | Checks if a systemd service is active |
memory | Checks memory usage as a percentage |
load_per_cpu | Checks cpu load divided by cpus |
load_per_cluster | Checks cpu load of entire cluster divided by total cpus |
load_per_cluster_minus_n | Checks cpu load of entire cluster divided by total cpus minus n failures |
inode | Checks inode usage as a percentage per mountpoint |
disk | Checks filesystem usage as a percentage per mountpoint |
disk_all | Checks filesystem and inode usage of all mountpoints |
predict_disk_all | Predicts if any of the disks in Prometheus will be full in x days |
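
A prediction like `predict_disk_all` can be expressed with PromQL's `predict_linear` function. The Ruby sketch below only builds such a query string. The metric name `node_filesystem_avail_bytes` is an assumption (it depends on the node-exporter version), the helper `predict_disk_query` is hypothetical, and the plugin's actual query may differ.

```ruby
# Hypothetical helper: build a PromQL expression that matches filesystems
# whose available space is predicted to drop below zero within `days` days.
#
# range_vector - sample window for the linear regression, e.g. "24h"
# filter       - label matcher including braces, e.g. '{mountpoint="/"}'
# days         - how many days ahead to predict
# metric       - ASSUMPTION: node-exporter metric for available bytes
def predict_disk_query(range_vector:, filter:, days:,
                       metric: 'node_filesystem_avail_bytes')
  seconds = days * 86_400 # predict_linear takes its horizon in seconds
  "predict_linear(#{metric}#{filter}[#{range_vector}], #{seconds}) < 0"
end

puts predict_disk_query(range_vector: '24h', filter: '{mountpoint="/"}', days: 14)
```

Any series returned by this expression would be a filesystem expected to fill up within the chosen window.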
Custom
Name | Example | Description |
---|---|---|
name | heartbeat | Custom check's name |
query | up | Prometheus query |
check.type | (equals\|below\|above) | Type of evaluation applied against value. Available: `equals`, `below` and `above` |
check.value | 1 | Value to be compared against query results, using `check.type` evaluation |
cfg.warn | 33.00 | Warning threshold level |
cfg.crit | 37.00 | Critical threshold level |
msg.0 | OK: heartbeat is up | Message to be used when the `check.value` evaluation is successful |
msg.2 | CRITICAL: heartbeat is down | Message to be used when the evaluation is not successful |
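
The evaluation described by `check.type`, `check.value` and `msg` can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the plugin's code: `evaluate_custom` is a hypothetical helper, and it assumes that `below` and `above` mean the query result must be strictly below or above `check.value` for the check to pass.

```ruby
# Hypothetical helper: compare one query result against a custom check
# definition and pick the matching message.
#
# check  - Hash with 'type' (equals/below/above) and 'value'
# msg    - Hash mapping status code 0 (OK) and 2 (critical) to messages
# result - numeric value returned by the Prometheus query
def evaluate_custom(check, msg, result)
  ok = case check['type']
       when 'equals' then result == check['value']
       # ASSUMPTION: strict comparisons for below/above.
       when 'below'  then result < check['value']
       when 'above'  then result > check['value']
       else raise ArgumentError, "unknown check type: #{check['type']}"
       end
  status = ok ? 0 : 2
  { status: status, output: msg[status] }
end

check = { 'type' => 'equals', 'value' => 1 }
msg = { 0 => 'OK: heartbeat is up', 2 => 'CRITICAL: heartbeat is down' }
puts evaluate_custom(check, msg, 1).inspect
```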
Global Configuration Options
Name | Example | Description |
---|---|---|
reported_by | sbppapik8s | Hostname that shows up in the Sensu reported_by field |
occurrences | 3 | Number of failures before Sensu sends an alert |
whitelist | sbppapik8s.* | Regex used as a safety whitelist to make sure the source names are correct |
ttl | 300 | Override the Sensu TTL in seconds |
ttl_status | 1 | Override the status code for an expiring Sensu TTL |
use_default_source | false | When `true`, the source of the events will be the Sensu client's own |
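
One way these options could map onto an event sent to the Sensu client socket is sketched below. The helpers `build_event` and `send_event` are hypothetical, and the exact set of fields the plugin emits is an assumption. The sketch only shows the whitelist acting as a guard and the optional fields being added when configured.

```ruby
require 'json'
require 'socket'

# Hypothetical helper: build a Sensu event Hash from the global options.
def build_event(config, name:, output:, status:, source:)
  # Safety net: refuse sources that do not match the whitelist regex.
  unless Regexp.new(config.fetch('whitelist', '.*')).match?(source)
    raise ArgumentError, "source #{source.inspect} does not match whitelist"
  end

  event = {
    'name' => name,
    'output' => output,
    'status' => status,
    'reported_by' => config['reported_by'],
    'occurrences' => config['occurrences']
  }
  # With use_default_source the Sensu client's own name is used instead.
  event['source'] = source unless config['use_default_source']
  event['ttl'] = config['ttl'] if config['ttl']
  event['ttl_status'] = config['ttl_status'] if config['ttl_status']
  event
end

# Hypothetical helper: write the event as JSON to the Sensu client socket.
def send_event(event,
               address: ENV.fetch('SENSU_SOCKET_ADDRESS', 'localhost'),
               port: Integer(ENV.fetch('SENSU_SOCKET_PORT', '3030')))
  TCPSocket.open(address, port) { |socket| socket.write(JSON.generate(event)) }
end
```

Keeping event construction separate from the socket write makes the whitelist and override logic easy to test without a running Sensu client.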
Check Configuration Options
Name | Config | Example |
---|---|---|
service | name: servicename; state: active, deactivating, failed or inactive (default: active); state_required: 0 or 1 (default: 1) | name: test-service.service |
memory | warn: warning percentage; crit: critical percentage | warn: 90; crit: 95 |
load_per_cpu | warn: warning percentage; crit: critical percentage | warn: 90; crit: 95 |
load_per_cluster | cluster: cluster name; warn: warning percentage; crit: critical percentage; source: name that shows in Sensu | cluster: nodes; warn: 90; crit: 95; source: sbppapik8s |
load_per_cluster_minus_n | cluster: cluster name; minus_n: number of member failures; warn: warning percentage; crit: critical percentage; source: name that shows in Sensu | cluster: nodes; minus_n: 1; warn: 90; crit: 95; source: sbppapik8s |
inode | mount: mountpoint; name: human-readable name; warn: warning percentage; crit: critical percentage | mount: /var/lib/docker; name: docker; warn: 90; crit: 95 |
disk | mount: mountpoint; name: human-readable name; warn: warning percentage; crit: critical percentage | mount: /var/lib/docker; name: docker; warn: 90; crit: 95 |
disk_all | ignore_fs: regex of filesystems; warn: warning percentage; crit: critical percentage | ignore_fs: tmpfs; warn: 90; crit: 95 |
predict_disk_all | range_vector: Prometheus range vector used for sample size of prediction; filter: Prometheus filter to include/exclude disks; days: prediction days; source: Sensu name | range_vector: 24h; filter: {mountpoint="/"}; days: 14; source: sbppapik8s |
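
Most of these checks share the same `warn`/`crit` pattern. The Ruby sketch below shows one way a measured value could be mapped to a Sensu status code (0 OK, 1 warning, 2 critical). `threshold_status` is a hypothetical helper, and treating a value equal to a threshold as breaching it is an assumption.

```ruby
# Hypothetical helper: map a measured value to a Sensu status code.
# ASSUMPTION: thresholds are inclusive (value >= threshold breaches it).
def threshold_status(value, warn:, crit:)
  return 2 if value >= crit # critical takes precedence over warning
  return 1 if value >= warn
  0
end

# Using the memory example from the table: warn: 90, crit: 95
[42.0, 91.5, 97.0].each do |usage|
  puts "#{usage}% -> status #{threshold_status(usage, warn: 90, crit: 95)}"
end
```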