em-monitor

Event loops are awesome unless you're doing a lot of blocking CPU stuff, at which point they become useless. This gem lets you easily graph the lengths of CPU-blocked spans so that you can take action to make your EventMachine server faster.

em-monitor is a gem that lets you monitor your EventMachine reactor.

Introduction

As we all know, event loops are an awesome programming model. You can (mostly) forget about thread-safety, but you can still do a bazillion IO-things in parallel.

They do have one significant downside though: you can only run one CPU-thing at a time.

This means that if you accidentally spend 30 seconds running a bad regex, everything in your loop is going to get stuck for 30 seconds (that's about a million years in computer terms). This is particularly bad because one user who triggers a bad regex slows down all your other users for all 30 seconds.

EM::Monitor can't fix your code for you, but it can let you know you have a problem.

Usage

em-monitor wraps every CPU-span of code in your program and measures how long each one takes to execute. You can then extract this data periodically in two ways. EM::monitor_spans calls a block at a regular interval (by default every 60 seconds) with an array of raw measurements. EM::monitor_histogram sorts the measurements into buckets by length and sums each bucket. This lets you plot the amount of time that your event loop spends running short CPU-spans against the amount of time it spends running long CPU-spans.
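To make the idea of a CPU-span concrete, here is a toy single-threaded loop in plain Ruby. It is only an illustration, assuming a span is the time one callback holds the loop before handing control back; it is not how em-monitor hooks into EventMachine, and the `ToyLoop` class and `busy_for` helper are hypothetical.

```ruby
# Toy illustration only, not em-monitor's implementation: a single-threaded
# loop that runs queued callbacks one at a time and records how long each
# one kept the loop busy (the quantity this README calls a CPU-span).
class ToyLoop
  attr_reader :spans

  def initialize
    @queue = []
    @spans = []
  end

  # Queue a callback to run on a later pass through the loop.
  def next_tick(&block)
    @queue << block
  end

  # Run callbacks until none are left, timing each with a monotonic clock
  # so that wall-clock adjustments cannot distort the measured spans.
  def run
    until @queue.empty?
      callback = @queue.shift
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      callback.call
      @spans << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    end
  end
end

# Burn CPU for roughly `seconds`, standing in for a slow regex or similar.
def busy_for(seconds)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  nil while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
end

reactor = ToyLoop.new
reactor.next_tick { (1..1_000).sum }  # a short CPU-span
reactor.next_tick { busy_for(0.05) }  # a ~50 ms CPU-span that blocks everything queued behind it
reactor.run
p reactor.spans
```

While the second callback runs, nothing else queued on the loop can make progress.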

EM::monitor_spans(interval: 1) do |spans, from, to|
  puts "Between #{from} and #{to} (#{'%.2f' % (to - from)} seconds) there were #{spans.size} CPU-spans:"
  puts spans.inspect
end
#=> Between 2013-02-07 02:19:37 and 2013-02-07 02:19:38 (1.00 seconds) there were 7 CPU-spans:
#=> [0.000565469, 0.000564702, 0.000568218, 0.000564348, 0.005066146, 0.050109482, 0.050113617]

EM::monitor_histogram(interval: 1) do |histogram, from, to|
  puts "In the last #{'%.2f' % (to - from)} real seconds, we used #{'%.4f' % histogram.values.inject(:+)} CPU-seconds"
  histogram.each do |key, value|
    puts "#{value} CPU-seconds in spans shorter than #{key} seconds"
  end
end
#=> In the last 1.00 real seconds, we used 0.1572 CPU-seconds
#=> 0.0452 CPU-seconds in spans shorter than 0.001 seconds
#=> 0.0619 CPU-seconds in spans shorter than 0.01 seconds
#=> 0.0500 CPU-seconds in spans shorter than 0.1 seconds
#=> 0 CPU-seconds in spans shorter than 1 seconds
#=> 0 CPU-seconds in spans shorter than 10 seconds
#=> 0 CPU-seconds in spans shorter than Infinity seconds
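The histogram is a summary of the same measurements that EM::monitor_spans reports. As a rough illustration, the hypothetical `bucket_spans` helper below (not part of em-monitor) sums the raw spans from the EM::monitor_spans example into the bucket bounds that appear in the EM::monitor_histogram output. It assumes each span is counted in the first bucket whose bound it is strictly shorter than, which matches the "shorter than" wording of that output but is not taken from the gem's source.

```ruby
# Bucket bounds copied from the example output above (assumed, not confirmed
# to be em-monitor's defaults).
BUCKETS = [0.001, 0.01, 0.1, 1, 10, Float::INFINITY].freeze

# Hypothetical helper: sum raw span lengths into buckets keyed by upper bound.
# Each span lands in exactly one bucket, so buckets do not overlap.
def bucket_spans(spans, buckets = BUCKETS)
  histogram = buckets.to_h { |bound| [bound, 0] }
  spans.each do |span|
    bound = buckets.find { |b| span < b }
    histogram[bound] += span
  end
  histogram
end

spans = [0.000565469, 0.000564702, 0.000568218, 0.000564348,
         0.005066146, 0.050109482, 0.050113617]
bucket_spans(spans).each do |bound, total|
  puts "#{total.round(4)} CPU-seconds in spans shorter than #{bound} seconds"
end
```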

Plotting results

The easiest way to plot the histogram data is as a stacked chart. If your tool of choice can't stack charts directly, you can call EM::monitor_histogram(stacked: true); this makes each larger bucket include the sum of all the smaller buckets in addition to the CPU-spans that fell into that bucket directly.
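The stacking described above amounts to a running sum over the buckets in ascending order. A minimal sketch, for instance if you would rather stack on the plotting side: `stack` is a hypothetical helper, not part of em-monitor, and the input reuses the rounded numbers from the histogram example.

```ruby
# Hypothetical helper: replace each bucket's total with its own total plus
# the totals of every smaller bucket (a running sum in ascending order).
def stack(histogram)
  running = 0
  histogram.sort_by { |bound, _| bound }.to_h do |bound, total|
    running += total
    [bound, running]
  end
end

histogram = { 0.001 => 0.0452, 0.01 => 0.0619, 0.1 => 0.05,
              1 => 0, 10 => 0, Float::INFINITY => 0 }
stack(histogram).each { |bound, total| puts "#{bound}: #{total.round(4)}" }
```

The largest bucket ends up holding the total CPU time for the interval.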

This gives you a graph of absolute CPU time used per interval (per minute with the default 60-second interval), which you can normalize to a utilization percentage in two ways:

# Dividing by the real elapsed time (to - from) gives lines whose absolute
# magnitude is correct; however, a stacked area graph will under-estimate the
# impact of CPU-spans of a similar order of magnitude to `interval`.
histogram.map { |key, value| [key, value * 100 / (to - from)] }

# Dividing by `interval` (the value you passed to EM::monitor_histogram)
# over-estimates the magnitude of CPU-spans in the short term; however, the
# area of a stacked area graph will be more accurate.
histogram.map { |key, value| [key, value * 100 / interval] }
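A small worked example shows how far apart the two normalizations can be. The numbers are hypothetical: they assume a 60-second interval whose report arrived 75 seconds after the previous one because a single 15-second CPU-span delayed the timer. `transform_values` (Ruby 2.4+) is used so that the bucket bounds are kept alongside the percentages.

```ruby
# Hypothetical sample: nominal 60-second interval, but a 15-second CPU-span
# held up the timer, so this report covers 75 real seconds.
interval = 60
from = Time.at(0)
to = Time.at(75)
histogram = { 0.1 => 3.0, 10 => 0.0, Float::INFINITY => 15.0 }

# Normalize by the real elapsed time (Time#- returns a Float, here 75.0).
by_elapsed = histogram.transform_values { |value| value * 100 / (to - from) }
# Normalize by the nominal interval.
by_interval = histogram.transform_values { |value| value * 100 / interval }

puts "by elapsed:  #{by_elapsed.values.sum}%"
puts "by interval: #{by_interval.values.sum}%"
```

Dividing by `to - from` reports 18 CPU-seconds over 75 real seconds as 24% utilization, while dividing by `interval` reports 30%; the gap grows with how much a long span stretched the reporting period.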

If you need to combine the results from multiple machines, you should instead use EM::monitor_histogram(cumulative: true) and keep a central running total of CPU time. Summing across machines first and then plotting the derivative gives you a stable plot that makes sense when averaged.
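The central bookkeeping might look like the sketch below. It assumes that with cumulative: true each machine reports running totals of CPU-seconds per bucket since it started; the report format, the `sum_histograms` and `rate` helpers, and the sample numbers are all hypothetical, and a real collector would also need to handle a machine restarting and resetting its totals.

```ruby
# Hypothetical aggregator: add up the cumulative histograms reported by
# several machines at the same point in time.
def sum_histograms(reports)
  reports.each_with_object(Hash.new(0)) do |histogram, total|
    histogram.each { |bound, seconds| total[bound] += seconds }
  end
end

# Derivative between two summed samples taken `elapsed` seconds apart,
# expressed as a utilization percentage per bucket.
def rate(previous, current, elapsed)
  current.to_h { |bound, seconds| [bound, (seconds - previous[bound]) * 100 / elapsed] }
end

t0 = sum_histograms([{ 0.01 => 10.0, 1 => 2.0 }, { 0.01 => 4.0, 1 => 0.0 }])
t1 = sum_histograms([{ 0.01 => 13.0, 1 => 2.0 }, { 0.01 => 7.0, 1 => 6.0 }])
p rate(t0, t1, 60.0)
```

Because only the sums are differentiated, a machine that reports slightly late does not distort the combined rate as much as averaging per-machine percentages would.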

To get a feel for how this works, look at example/gnuplot.rb or example/librato.rb.

Meta-fu

There's API documentation if you'd like it.

Everything is licensed under the MIT license, see LICENSE.MIT for details.

Pull requests and bug reports are very welcome.