Sidekiq::InfluxDB
Sidekiq server middleware that writes job lifecycle events as points to an InfluxDB database. Also includes classes that write global Sidekiq metrics and queue metrics.
Installation
Add this gem to your application's Gemfile:
bundle add sidekiq-influxdb
Usage
Add the included middleware to your application's Sidekiq middleware stack.
The following examples assume that you already have an InfluxDB client object in the influxdb variable.
This will create a middleware with all defaults (suitable for most deployments):
# config/initializers/sidekiq.rb
require "sidekiq/middleware/server/influxdb"
Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add Sidekiq::Middleware::Server::InfluxDB, influxdb_client: influxdb
  end
end
You can customize the middleware by passing more options:
# config/initializers/sidekiq.rb
require "sidekiq/middleware/server/influxdb"
Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add Sidekiq::Middleware::Server::InfluxDB,
              influxdb_client: influxdb,
              series_name: 'sidekiq_jobs', # This is the default one.
              retention_policy: 'rp_name', # In case you want to write metrics to a non-default RP.
              start_events: true, # Whether or not you want to know when jobs started. See `event` tag description below.
              tags: {application: 'MyApp'}, # Anything you need on top. **Make sure that tag values have low cardinality!**
              except: [UnimportantJob] # These job classes will be executed without sending any metrics.
  end
end
This library assumes that you already have an InfluxDB client object set up the way you like. It does not try to create one for you. If that is not the case, you can learn how to create a client in the InfluxDB client documentation.
Warning: This middleware is going to write a lot of metrics. Set up your InfluxDB client accordingly:
- either set async: true in the client's options to use its built-in batching feature,
- or install Telegraf, set up aggregation inside it, and set up the InfluxDB client to send metrics to it,
- or both.
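As a rough sketch of the first option, the snippet below builds an options hash with async: true. The helper name influxdb_client_options and the environment variable names are hypothetical. The option keys (host, database, async, time_precision) are assumed to match what InfluxDB::Client.new from the influxdb gem accepts, so check them against the client version you actually use.

```ruby
# Hypothetical helper: builds options for the InfluxDB client.
# The environment variable names are assumptions made for this example.
def influxdb_client_options(env = ENV)
  {
    host: env.fetch("INFLUXDB_HOST", "localhost"),
    database: env.fetch("INFLUXDB_DATABASE", "sidekiq"),
    async: true,          # let the client batch writes in the background
    time_precision: "ms"  # the middleware respects the client's precision
  }
end

# With the influxdb gem loaded, the client could then be created like this:
#   influxdb = InfluxDB::Client.new(**influxdb_client_options)
```

Keeping the options in one place makes it easier to switch the same client over to Telegraf later without touching the middleware setup.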
When you deploy this code, you will have the following series in your InfluxDB database:
> select * from sidekiq_jobs
name: sidekiq_jobs
time                application class  creation_time      error         event  jid                      queue   total             waited            worked
----                ----------- -----  -------------      -----         -----  ---                      -----   -----             ------            ------
1511707465061000000 MyApp       FooJob 1511707459.0186539               start  51cc82fe75fbeba37b1ff18f default                   6.042410135269165
1511707465061000000 MyApp       FooJob 1511707459.0186539               finish 51cc82fe75fbeba37b1ff18f default 8.046684265136719 6.042410135269165 2.0042741298675537
1511707467068000000 MyApp       BarJob 1511707461.019835                start  3891f241ab84d3aba728822e default                   6.049134016036987
1511707467068000000 MyApp       BarJob 1511707461.019835  NoMethodError error  3891f241ab84d3aba728822e default 8.056788206100464 6.049134016036987 2.0076541900634766
Tags (repetitive indexed data; for filtering and grouping by):
- time: standard InfluxDB timestamp. Precision of the supplied client is respected.
- queue: queue name.
- class: job class name. Classes listed in the except: keyword argument are skipped (no data is sent to InfluxDB).
- event: what happened to the job at the specified time (start, finish, or error). If you initialize the middleware with start_events: false, there will be no start events.
- error: if event=error, this tag contains the exception class name.
- Your own tags from the initializer.
Values (unique non-indexed data; for aggregation):
- jid: unique job ID.
- creation_time: job creation time.
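To make the split between tags and values concrete, here is a minimal sketch of what one error point could look like before it is written. This is not the gem's actual implementation: job_point and RecordingClient are hypothetical names, the write_point(name, data, precision, retention_policy) shape is assumed from the influxdb gem, and the "created_at" key is assumed from the Sidekiq job payload.

```ruby
# Stand-in for an InfluxDB client that only records what it is asked to write.
# The write_point signature is an assumption based on the influxdb gem.
class RecordingClient
  attr_reader :points

  def initialize
    @points = []
  end

  def write_point(name, data, precision = nil, retention_policy = nil)
    @points << { name: name, data: data, retention_policy: retention_policy }
  end
end

# Hypothetical helper, not part of this gem: builds one job event point.
def job_point(event:, job:, queue:, extra_tags: {}, error: nil)
  tags = { class: job["class"], queue: queue, event: event }
  tags[:error] = error.class.name if error # only present on error events
  {
    tags: tags.merge(extra_tags),
    values: { jid: job["jid"], creation_time: job["created_at"] }
  }
end

client = RecordingClient.new
job = { "class" => "BarJob", "jid" => "3891f241ab84d3aba728822e", "created_at" => 1511707461.019835 }
point = job_point(event: "error", job: job, queue: "default",
                  extra_tags: { application: "MyApp" },
                  error: NoMethodError.new("boom"))
client.write_point("sidekiq_jobs", point)
```

Low-cardinality data (class, queue, event, error) goes into tags, while per-job data such as jid stays in values so that it does not inflate the index.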
Values calculated by this gem (in seconds):
- waited: how long the job waited in the queue until Sidekiq got around to starting it.
- worked: how long it took to perform the job from start to finish or to an exception.
- total: how much time passed from job creation to finish, that is, how long the job took in total.
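The relationship between the three durations can be sketched in a few lines. The helper job_durations and its parameter names are hypothetical and only illustrate the arithmetic; timestamps are assumed to be epoch seconds.

```ruby
# Hypothetical helper: derives the three durations (in seconds) from
# the creation, start, and finish timestamps of a job.
def job_durations(created_at:, started_at:, finished_at:)
  {
    waited: started_at - created_at,  # time spent in the queue
    worked: finished_at - started_at, # time spent performing the job
    total: finished_at - created_at   # creation to finish (waited + worked)
  }
end

durations = job_durations(created_at: 100.0, started_at: 106.0, finished_at: 108.0)
# waited is 6.0, worked is 2.0, total is 8.0
```

The sample rows above follow the same pattern: for FooJob, waited (about 6.04) plus worked (about 2.00) adds up to total (about 8.05).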
This schema allows querying various job metrics effectively.
For example, how many reports have been generated in the last day:
SELECT COUNT(jid) FROM sidekiq_jobs WHERE class = 'ReportGeneration' AND time > now() - 1d
How many jobs failed in the last day, broken down by job class:
SELECT COUNT(jid) FROM sidekiq_jobs WHERE event = 'error' AND time > now() - 1d GROUP BY class
Et cetera.
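If you build such queries from application code, a small helper can keep them consistent. The sketch below is only an illustration: count_jobs_query is a hypothetical name, and it does no escaping, so it assumes that tag names and values come from trusted code rather than user input.

```ruby
# Hypothetical helper: builds an InfluxQL COUNT query for the job series.
# No escaping is performed; pass only trusted tag names and values.
def count_jobs_query(series: "sidekiq_jobs", conditions: {}, since: "1d", group_by: nil)
  where = conditions.map { |tag, value| "#{tag} = '#{value}'" }
  where << "time > now() - #{since}"
  query = "SELECT COUNT(jid) FROM #{series} WHERE #{where.join(' AND ')}"
  query += " GROUP BY #{group_by}" if group_by
  query
end

# Reproduces the two queries shown above.
puts count_jobs_query(conditions: { class: "ReportGeneration" })
puts count_jobs_query(conditions: { event: "error" }, group_by: "class")
```

The resulting string can then be handed to whatever query method your InfluxDB client provides.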
Stats and Queues metrics
To collect metrics for task stats and queues, you need to run the following code periodically.
For example, you can use Clockwork for that.
You can add settings like this to clock.rb:
require "sidekiq/metrics/stats"
require "sidekiq/metrics/queues"
influx = InfluxDB::Client.new(options)
sidekiq_global_metrics = Sidekiq::Metrics::Stats.new(influxdb_client: influx)
sidekiq_queues_metrics = Sidekiq::Metrics::Queues.new(influxdb_client: influx)
every(1.minute, 'sidekiq_metrics') do
  sidekiq_global_metrics.publish
  sidekiq_queues_metrics.publish
end
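If you would rather not add a scheduler dependency, the same idea can be sketched as a plain Ruby loop. Everything here is an assumption for illustration: publish_periodically and CountingPublisher are hypothetical names, and the only thing taken from this gem is that the metrics objects respond to publish.

```ruby
# Hypothetical helper: calls publish on each publisher, then sleeps.
# `iterations` bounds the loop (useful in tests); nil means run forever.
def publish_periodically(publishers, interval:, iterations: nil)
  count = 0
  until iterations && count >= iterations
    publishers.each(&:publish)
    count += 1
    sleep(interval)
  end
  count
end

# Stand-in for Sidekiq::Metrics::Stats or Sidekiq::Metrics::Queues.
class CountingPublisher
  attr_reader :published

  def initialize
    @published = 0
  end

  def publish
    @published += 1
  end
end

publisher = CountingPublisher.new
publish_periodically([publisher], interval: 0, iterations: 3)
```

In a real process you would pass the two metrics objects and an interval of 60 seconds, typically from a dedicated thread or a separate process.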
For stats metrics:
require "sidekiq/metrics/stats"
Sidekiq::Metrics::Stats.new(
  influxdb_client: InfluxDB::Client.new(options), # REQUIRED
  series_name: 'sidekiq_stats', # optional, default shown
  retention_policy: nil, # optional, default nil
  tags: {}, # optional, default {}
).publish
For queues metrics:
require "sidekiq/metrics/queues"
Sidekiq::Metrics::Queues.new(
  influxdb_client: InfluxDB::Client.new(options), # REQUIRED
  series_name: 'sidekiq_queues', # optional, default shown
  retention_policy: nil, # optional, default nil
  tags: {}, # optional, default {}
).publish
When you run the code, you will have the following series in your InfluxDB database:
> select * from sidekiq_stats
name: sidekiq_stats
time size stat
---- ---- ----
1582502419000000000 9999 dead
1582502419000000000 0 workers
1582502419000000000 0 enqueued
1582502419000000000 23020182 processed
> select * from sidekiq_queues
name: sidekiq_queues
time queue size
---- ----- ----
1582502418000000000 default 0
1582502418000000000 queue_name_1 0
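The shape of these two series can be sketched as a mapping from plain hashes to points. This is a guess at the structure based on the output above, not the gem's actual code: stats_points and queue_points are hypothetical helpers, and treating stat and queue as tags with size as the value is an assumption.

```ruby
# Hypothetical helper: one point per global stat, tagged with the stat name.
def stats_points(stats, tags: {})
  stats.map do |stat, size|
    { tags: tags.merge(stat: stat.to_s), values: { size: size } }
  end
end

# Hypothetical helper: one point per queue, tagged with the queue name.
def queue_points(queue_sizes, tags: {})
  queue_sizes.map do |queue, size|
    { tags: tags.merge(queue: queue.to_s), values: { size: size } }
  end
end

# Sample data mirroring the rows shown above.
stats = { dead: 9999, workers: 0, enqueued: 0, processed: 23020182 }
queues = { "default" => 0, "queue_name_1" => 0 }

stats_points(stats).each { |point| puts point.inspect }
queue_points(queues).each { |point| puts point.inspect }
```

In a running application the input hashes would come from Sidekiq's own statistics API rather than from literals.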
Visualization
Grafana
You can import a ready-made dashboard from grafana_dashboard.json.