Nightwing is a Sidekiq middleware for capturing worker metrics including number processed, number of failures, timing, etc.
Installation
Inside your Gemfile, add the following line:
gem 'nightwing'
Configuration
You will need to add the code below to your app. In a typical Rails app, this would go into an initializer.
Please note that you must require your own librato-rack gem and supply it to Nightwing
# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
config.server_middleware do |chain|
chain.add Nightwing::Sidekiq::Stats, client: Librato
chain.add Nightwing::Sidekiq::QueueStats, client: Librato
chain.add Nightwing::Sidekiq::WorkerStats, client: Librato
chain.add Nightwing::Sidekiq::Profiler, client: Librato
end
end
To gather database metrics:
# config/initializers/instrumentation.rb
Nightwing.client = Librato
ActiveSupport::Notifications.subscribe('sql.active_record', Nightwing::Instrumentation::ActiveRecord.new)
To gather Redis and memcache metrics:
# config/initializers/instrumentation.rb
Nightwing.client = Librato
require 'nightwing/extensions/dalli' # dalli gem required
require 'nightwing/extensions/redis' # redis gem required
Available options
Name | Description | Required? | Default |
---|---|---|---|
client | Librato or statsd client | yes | N/A |
namespace | Prefix for each metric | no | "sidekiq" |
debug | Enable for verbose logging | no | false |
logger | Logger instance for debug mode | no | Nightwing::Logger |
disabled_metrics | Metrics that are disabled | no | Empty array |
When debug mode is turned on, Nightwing will output the metrics into a parsable format. The output destination is determined by the logger. If no logger is given then we send the debugging output to STDOUT.
Disabling automatic metrics
The current approach of this gem is to report metrics in a custom sidekiq middleware. For some metrics this can be less than ideal because you won't report the metric if no jobs are being processed. This is especially true for queue depth metrics (size and latency).
If you want to use queue size and latency metrics to monitor the health of your sidekiq queues (i.e: set automatic alerts
when the queue is not being processed) you will have to manually report that metric outside of the sidekiq middleware.
One way of doing that is to run a clock process that reports the metrics every certain time interval. For that purpose
you can call Nightwing::Sidekiq::QueueMonitoring#report_depth_metrics_for_queues
passing in a collection of
Sidekiq::Queue
objects.
i.e:
Nightwing::Sidekiq::QueueMonitoring.new(metrics_collector: Librato, namespace: "sidekiq").report_depth_metrics_for_queues(Sidekiq::Queue.all)
To disable the report of queue depth metrics inside the middleware you can use disabled_metrics
option:
chain.add Nightwing::Sidekiq::QueueStats, client: Librato, disabled_metrics: [:queue_depth]
For now this only works for :queue_depth
metrics, because is the most common case where you need to disable the
automatic reporting.
Instrumentation Metrics
Below are the metrics reported to Librato from instrumentation classes
-
sql.<table>.<action>.time
: how long the database query took to complete
Extensions Metrics
Below are the metrics reported to Librato from instrumentation classes
-
redis.command.processed
: number of times overall command was called -
redis.command.time
: response time (in ms) for all commands -
redis.command.<command>.processed
: number of times the command was called -
redis.command.<command>.time
: response time (in ms) for command -
memcache.command.processed
: number of times overall command was called -
memcache.command.time
: response time (in ms) for all commands -
memcache.command.<command>.processed
: number of times the command was called -
memcache.command.<command>.time
: response time (in ms) for command
Sidekiq Metrics
Below are the metrics reported to Librato from the Sidekiq middleware
-
sidekiq.retries
: number of jobs to be retried -
sidekiq.scheduled
: number of jobs scheduled to run -
sidekiq.processed
: number of times middleware called -
sidekiq.failed
: number of jobs that raised an error
Queue specific
-
sidekiq.<queue>.size
: depth for a given queue -
sidekiq.<queue>.latency
: latency for given queue¹ -
sidekiq.<queue>.processed
: number of times middleware called for given queue -
sidekiq.<queue>.failed
: number of jobs in given queue that raised an error -
sidekiq.<queue>.time
: how long jobs took to process (in milliseconds) -
sidekiq.<queue>.gc.count
: number of times the Ruby GC kicked off -
sidekiq.<queue>.memory.delta
: the different in the process memory after jobs were processed (in bytes)
¹: the difference between now and when the oldest job was enqueued (given in seconds)
Worker specific
-
sidekiq.<queue>.<worker>.processed
: number of times middleware called for given worker -
sidekiq.<queue>.<worker>.failed
: number of jobs in given worker that raised an error -
sidekiq.<queue>.<worker>.finished
: number of successful worker jobs -
sidekiq.<queue>.<worker>.time
: how long given worker took to process (in milliseconds) -
sidekiq.<queue>.<worker>.retried
: number of times a given worker retried -
sidekiq.<queue>.<worker>.gc.count
: number of times the Ruby GC kicked off -
sidekiq.<queue>.<worker>.memory.delta
: the different in the process memory after jobs were processed (in bytes)