ruby-prof-flamegraph

Easily find the bottleneck in your Ruby app.

RubyProf::FlameGraphPrinter is a ruby-prof printer that outputs a fold stack file that's compatible with FlameGraph. It is created based on RubyProf::CallStackPrinter.

The result can be passed to FlameGraph to generate an interactive stack trace visualization (click on the image to see the demo).

Awesomeness

FlameGraph is a way to visualize stack trace, making it very obvious where in the program takes the longest time. It is a Perl script takes a "fold stack" file and generates a nice, interactive SVG. The fold stack is usually generated from DTrace or Prof data using stackcollapse.pl, which is included with FlameGraph.

I created this gem because I want to find out where the bottleneck is in SlimWiki's specs, but I don't know DTrace and just want the result quick.

To learn more about Flame Graphs, check these out:

Official Flame Graphs Website by Brendan Gregg
Node.js in Flames, which is the article that introduced me to flame graph (via Node Weekly Issue #62)
Blazing Performance with Flame Graphs talk at USENIX/LISA13 (slideshare) (video)

Installation

gem 'ruby-prof-flamegraph'

Usage

Obtaining the Fold Stack Trace

Just require 'ruby-prof-flamegraph' and use RubyProf::FlameGraphPrinter as your printer for ruby-prof. For vanilla ruby-prof, see example.rb.

For rspec-prof, RSpecProf.printer_class = RubyProf::FlameGraphPrinter

Generating the Image

Then, to generate an image, pipe the output to FlameGraph's flamegraph.pl. Here's how the example SVG is generated:

bundle exec ruby example.rb | \
    ~/GitHub/FlameGraph/flamegraph.pl --countname=ms --width=728 > example.svg

Output Format

Taken from [flamegraph.pl]:

The input is stack frames and sample counts formatted as single lines. Each frame in the stack is semicolon separated, with a space and count at the end of the line.

Each line in the output looks like this:

Thread:ID;Fiber:ID;Class#method1 (call_count) time_taken_in_milliseconds

Here's an example:

Thread:123;Fiber:123;Capybara::Server#boot (1) 0.163681
Thread:123;Fiber:123;Capybara::Server#boot (1);<Module::Capybara>#server (1) 0.099445
Thread:123;Fiber:123;Capybara::Server#boot (1);<Module::Capybara>#server (1);Kernel#block_given? (1) 0.044838
Thread:123;Fiber:123;Capybara::Server#boot (1);Proc#call (1) 0.019682
Thread:123;Fiber:123;Capybara::Server#boot (1);Proc#call (1);<Module::Capybara>#run_default_server (1) 0.147611

The call count is included so that the number of calls is shown in the graph.

Post-Processing

Sometimes, you may want to post process the data before passing it to flamegraph.pl, such as...

Removing the Thread and Fiber ID

Capybara spawns new WEBrick thread each time, putting each HTTP server instance in a different stack. We can strip the Thread ID and Fiber ID from the output first:

processed = 0

while data = gets
  data.gsub! %r{^Thread:\d+;Fiber:\d+;}, ''
  puts data
  processed += 1
  if processed % 1000 == 0
    $stderr.puts "Processed #{processed} lines"
  end
end

$stderr.puts "Finished processing #{processed} lines"

ruby remove-thread-id.rb < profile.txt > profile2.txt

Removing call counts

Sometimes, you may want to concatenate multiple profiles. This causes the stack to separate based on the call count. To put calls to same method on the same stack, the call count should be removed.

processed = 0

while data = gets
  data.gsub! %r{\s\(\d+\);}, ';'
  puts data
  processed += 1
  if processed % 1000 == 0
    $stderr.puts "Processed #{processed} lines"
  end
end

$stderr.puts "Finished processing #{processed} lines"

ruby remove-call-count.rb < profile2.txt > profile3.txt

I am relying on external scripts to do this processing for maximum flexibility (so that I always have the raw data to adjust it as needed).