Better Benchmark

Statistically correct benchmarking for Ruby.

Dependencies

Installation

# Linux:
gem install better-benchmark -- --with-R-dir=/usr/lib/R
# OSX:
gem install better-benchmark -- --with-R-dir=/Library/Frameworks/R.framework/Resources

Change the argument of --with-R-dir to whatever is appropriate for your system if either of the above don't work.

With Bundler

Bundler needs to be configured to use the build option:

# Linux:
bundle config build.rsruby --with-R-dir=/usr/lib/R
# OSX:
bundle config build.rsruby --with-R-dir=/Library/Frameworks/R.framework/Resources

Usage

Comparing code blocks

result = Benchmark.compare_realtime {
  do_something_one_way
}.with {
  do_it_another_way
}
Benchmark.report_on result

See also example.rb for a more comprehensive example.

Comparing git revisions

With a test script (recommended)

To test two revisions of a library, create a simple runner script:

# runner.rb
require 'mylib'

class TestQuick
  def initialize
    # initialization...
  end

  def run
    Benchmark.write_realtime( '/home/pistos/tmp' ) do
      5000.times do
        # do something with your lib
      end
    end
  end
end

t = TestQuick.new
t.run

Then run the bbench script, passing two git revisions:

bbench -r 6e84dd5 -r ed1e7c6 -d ~/tmp -- -Ilib runner.rb

Without altering or writing new code

You can also test two revisions by running some already-existing script, such as a file in your test suite:

bbench -r 6e84dd5 -r ed1e7c6 -- -Itest -Ilib test/test_something.rb

Be aware, however, that this may produce unnecessarily variant timings due to wide variance in the startup time of the Ruby interpreter and script.

Comparing git working copy

You can also compare the current branch tip to the current (dirty) working copy:

bbench -w -d ~/tmp -- -Ilib runner.rb

This lets you experiment without committing anything, and then only commit when you are confident that your changes result in a performance improvement.

Interpretation

Considering two "things under test", U1 and U2:

Example 1

Set 1 mean: 0.216 s
Set 1 std dev: 0.023
Set 2 mean: 0.187 s
Set 2 std dev: 0.020
p.value: 0.00287947346770876
W: 88.0
The difference (-13.5%) IS statistically significant.

This means that the results permit us to conclude that U2 performed 13.5% faster than U1.

Example 2

Set 1 mean: 10.968 s
Set 1 std dev: 4.294
Set 2 mean: 9.036 s
Set 2 std dev: 3.581
p.value: 0.217562623135379
W: 67.0
The difference (-17.6%) IS NOT statistically significant.

This means that the results do not permit us to conclude that the performance of U1 and U2 differed.

Not just Ruby

Technically, the bbench script can work with any script or program that writes a run time (in seconds) to the file bbench-run-time in the data dir. Use the -e option to specify a different executable than "ruby". e.g. perl, python, java, etc.

Help, etc.

irc.freenode.net#mathetes or http://webchat.freenode.net?channels=mathetes .

Repository

git clone git://github.com/Pistos/better-benchmark.git