Sensu Kubernetes Prometheus Plugin
Description
Sensu plugin that queries Prometheus metrics exported by node-exporter and reports the results to Sensu as checks.
Usage
check_prometheus.rb /path/to/config.yml
# Debug mode to output all json and blacklisted checks
PROM_DEBUG=true check_prometheus.rb /path/to/config.yml
Development and testing
Dependencies: docker, docker-compose
To spin up a development stack and run the integration tests:
ruby test.rb
Afterwards, you can run the tests directly with rspec.
To run the dockerized version (the one gitlab-ci uses):
bash test.sh
Environment variables
| Name | Example | Default | Description |
|---|---|---|---|
| PROM_DEBUG | true | false | Debug output instead of sending checks to sensu |
| PROMETHEUS_ENDPOINT | hostname:9090 | localhost:9090 | Connection string in the format address:port |
| SENSU_SOCKET_ADDRESS | hostname | localhost | Address used to connect to the sensu socket |
| SENSU_SOCKET_PORT | 1234 | 3030 | Port used to connect to the sensu socket |
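As a rough illustration of how the defaults in the table above could be applied, here is a minimal sketch. The `load_settings` helper and the `DEFAULTS` constant are hypothetical names chosen for this example, not part of the plugin; only the variable names and default values come from the table.

```ruby
# Hypothetical helper: resolves the plugin's environment variables,
# falling back to the defaults documented in the table above.
DEFAULTS = {
  'PROM_DEBUG' => 'false',
  'PROMETHEUS_ENDPOINT' => 'localhost:9090',
  'SENSU_SOCKET_ADDRESS' => 'localhost',
  'SENSU_SOCKET_PORT' => '3030'
}.freeze

# `env` defaults to the real process environment, but any Hash-like
# object responding to #fetch works, which keeps the helper testable.
def load_settings(env = ENV)
  {
    debug: env.fetch('PROM_DEBUG', DEFAULTS['PROM_DEBUG']) == 'true',
    prometheus_endpoint: env.fetch('PROMETHEUS_ENDPOINT', DEFAULTS['PROMETHEUS_ENDPOINT']),
    sensu_address: env.fetch('SENSU_SOCKET_ADDRESS', DEFAULTS['SENSU_SOCKET_ADDRESS']),
    sensu_port: Integer(env.fetch('SENSU_SOCKET_PORT', DEFAULTS['SENSU_SOCKET_PORT']))
  }
end

settings = load_settings
puts "Querying #{settings[:prometheus_endpoint]}, debug=#{settings[:debug]}"
```

Passing the environment in as an argument is a design choice for this sketch: it lets the defaults be exercised without mutating `ENV`.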
Config.yml
Built-in checks are defined in the config.yml file under the key `checks`; checks based on custom Prometheus queries are defined under the key `custom`. Example:
config:
  reported_by: sbppapik8s
  occurrences: 3
  domain: example.com
  whitelist: sbppapik8s.*
  use_default_source: false
checks:
  - service:
      name: kube-controller-manager.service
  - check: load_per_cluster
    host: sbppapik8s
    cfg:
      cluster: prometheus
      warn: 1.0
      crit: 2.0
      source: sbppapik8s
custom:
  - name: heartbeat
    query: up
    check:
      type: equals
      value: 1
    msg:
      0: 'OK: Endpoint is alive and kicking'
      2: 'CRIT: Endpoints not reachable!'
Checks
| Name | Description |
|---|---|
| service | Checks if a systemd service is active |
| memory | Checks memory usage as a percentage |
| load_per_cpu | Checks cpu load divided by cpus |
| load_per_cluster | Checks cpu load of entire cluster divided by total cpus |
| load_per_cluster_minus_n | Checks cpu load of entire cluster divided by total cpus minus n failures |
| inode | Checks inode usage as a percentage per mountpoint |
| disk | Checks filesystem usage as a percentage per mountpoint |
| disk_all | Checks filesystem and inode usage of all mountpoints |
| predict_disk_all | Predicts if any of the disks in prometheus will be full in x days |
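To make the `load_per_cluster_minus_n` description concrete, the arithmetic can be sketched as below. This is an illustration only: the helper name is hypothetical, and it assumes every node has the same number of CPUs, which the plugin itself may not assume.

```ruby
# Hypothetical helper illustrating "cluster load divided by total CPUs
# minus n failures". ASSUMPTION: all nodes have the same CPU count, so
# losing n nodes removes n * cpus_per_node CPUs from the divisor.
def load_per_cluster_minus_n(total_load, cpus_per_node, nodes, minus_n)
  surviving_cpus = cpus_per_node * (nodes - minus_n)
  raise ArgumentError, 'no CPUs left after removing n nodes' unless surviving_cpus.positive?

  total_load.to_f / surviving_cpus
end

# Three 4-CPU nodes carrying a combined load of 6.0:
puts load_per_cluster_minus_n(6.0, 4, 3, 0) # plain load_per_cluster: 6.0 / 12
puts load_per_cluster_minus_n(6.0, 4, 3, 1) # one node lost: 6.0 / 8
```

The second call shows why this check is stricter than `load_per_cluster`: the same load is spread over fewer CPUs, so thresholds trip earlier.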
Custom
| Name | Example | Description |
|---|---|---|
| name | heartbeat | Custom check's name |
| query | up | Prometheus query |
| check.type | equals | Type of evaluation applied against `check.value`. Available: `equals`, `below` and `above` |
| check.value | 1 | Value to be compared against query results, using `check.type` evaluation |
| cfg.warn | 33.00 | Warning threshold level |
| cfg.crit | 37.00 | Critical threshold level. |
| msg.0 | OK: heartbeat is up | Message to be used when the `value` evaluation is successful. |
| msg.2 | CRITICAL: heartbeat is down | Message to be used when the evaluation is not successful. |
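The way `check.type`, `check.value` and `msg` fit together can be sketched as follows. This is a hedged illustration, not the plugin's code: `COMPARATORS` and `evaluate_custom` are hypothetical names, and the reading that `below` and `above` succeed when the query result is below or above `check.value` is an assumption.

```ruby
# Hypothetical comparators for the three documented check types.
# ASSUMPTION: "below"/"above" succeed when the query result is
# strictly below/above check.value.
COMPARATORS = {
  'equals' => ->(result, value) { result == value },
  'below'  => ->(result, value) { result < value },
  'above'  => ->(result, value) { result > value }
}.freeze

# Returns a status (0 = OK, 2 = critical) and the matching message.
def evaluate_custom(check, result)
  comparator = COMPARATORS.fetch(check['check']['type'])
  status = comparator.call(result, check['check']['value']) ? 0 : 2
  { 'status' => status, 'output' => check['msg'][status] }
end

# The heartbeat example from config.yml, as YAML would parse it
# (the msg keys 0 and 2 become integers).
heartbeat = {
  'name' => 'heartbeat',
  'query' => 'up',
  'check' => { 'type' => 'equals', 'value' => 1 },
  'msg' => { 0 => 'OK: Endpoint is alive and kicking',
             2 => 'CRIT: Endpoints not reachable!' }
}

puts evaluate_custom(heartbeat, 1)['output']
puts evaluate_custom(heartbeat, 0)['output']
```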
Global Configuration Options
| Name | Example | Description |
|---|---|---|
| reported_by | sbppapik8s | Hostname that shows up in the sensu reported_by field |
| occurrences | 3 | Number of failures before sensu will send an alert |
| whitelist | sbppapik8s.* | Regex used as a safety whitelist to make sure the source names are correct |
| ttl | 300 | Override the Sensu TTL in seconds |
| ttl_status | 1 | Override the status code for an expiring Sensu TTL |
| use_default_source | false | When `true`, events use the Sensu client's own name as their source |
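One way these global options might shape an event before it is sent to the Sensu socket is sketched below. `build_event` is a hypothetical helper; the assumptions are that the whitelist is only applied when an explicit source is attached, and that omitting `source` is how the Sensu client's own name ends up being used.

```ruby
require 'json'

# Hypothetical helper: merges the global config options into one event.
# ASSUMPTIONS: the whitelist guards only explicitly set sources, and
# leaving out 'source' makes Sensu fall back to the client's name.
def build_event(config, name:, status:, output:, source:)
  event = {
    'name' => name,
    'status' => status,
    'output' => output,
    'reported_by' => config['reported_by'],
    'occurrences' => config['occurrences']
  }

  unless config['use_default_source']
    whitelist = Regexp.new(config.fetch('whitelist', '.*'))
    raise ArgumentError, "source #{source.inspect} does not match whitelist" unless whitelist.match?(source)

    event['source'] = source
  end

  # Optional overrides are only included when configured.
  %w[ttl ttl_status].each { |key| event[key] = config[key] if config.key?(key) }
  event
end

config = {
  'reported_by' => 'sbppapik8s', 'occurrences' => 3,
  'whitelist' => 'sbppapik8s.*', 'use_default_source' => false, 'ttl' => 300
}
event = build_event(config, name: 'heartbeat', status: 0,
                            output: 'OK: Endpoint is alive and kicking',
                            source: 'sbppapik8s')
puts JSON.generate(event)
```

In debug mode the JSON would simply be printed as above; otherwise it would be written to the address and port given by `SENSU_SOCKET_ADDRESS` and `SENSU_SOCKET_PORT`.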
Check Configuration Options
| Name | Config | Example |
|---|---|---|
| service | name: servicename; state: active, deactivating, failed or inactive (default: active); state_required: 0 or 1 (default: 1) | name: test-service.service |
| memory | warn: warning percentage; crit: critical percentage | warn: 90; crit: 95 |
| load_per_cpu | warn: warning percentage; crit: critical percentage | warn: 90; crit: 95 |
| load_per_cluster | cluster: cluster name; warn: warning percentage; crit: critical percentage; source: name that shows in sensu | cluster: nodes; warn: 90; crit: 95; source: sbppapik8s |
| load_per_cluster_minus_n | cluster: cluster name; minus_n: amount of member failures; warn: warning percentage; crit: critical percentage; source: name that shows in sensu | cluster: nodes; minus_n: 1; warn: 90; crit: 95; source: sbppapik8s |
| inode | mount: mountpoint; name: human readable name; warn: warning percentage; crit: critical percentage | mount: /var/lib/docker; name: docker; warn: 90; crit: 95 |
| disk | mount: mountpoint; name: human readable name; warn: warning percentage; crit: critical percentage | mount: /var/lib/docker; name: docker; warn: 90; crit: 95 |
| disk_all | ignore_fs: regex of filesystems; warn: warning percentage; crit: critical percentage | ignore_fs: tmpfs; warn: 90; crit: 95 |
| predict_disk_all | range_vector: Prometheus range vector used for sample size of prediction; filter: Prometheus filter to include/exclude disks; days: prediction days; source: sensu name | range_vector: 24h; filter: {mountpoint="/"}; days: 14; source: sbppapik8s |
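
The `warn` and `crit` options shared by most checks can be read as a simple threshold ladder. The sketch below is an assumption about how such thresholds are typically evaluated, not the plugin's confirmed logic: `threshold_status` is a hypothetical name, the comparison is assumed to be inclusive (`>=`), and the status codes follow the usual Sensu convention of 0 = OK, 1 = warning, 2 = critical.

```ruby
# Hypothetical helper mapping a measured value onto a status code.
# ASSUMPTIONS: thresholds are inclusive, and crit takes precedence
# over warn when both are exceeded.
def threshold_status(value, warn:, crit:)
  return 2 if value >= crit
  return 1 if value >= warn

  0
end

# Using the example thresholds from the table (warn: 90, crit: 95):
[42.0, 91.5, 97.0].each do |usage|
  puts "#{usage}% -> status #{threshold_status(usage, warn: 90, crit: 95)}"
end
```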