Rector

Rector coordinates a group of jobs spawned through a mechanism like Resque (though any job manager will do). If you can parallelize the processing of a task, but the resulting jobs each produce metrics, statistics, or other data that must be combined at the end, Rector might be for you.
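The kind of combination step Rector is meant to handle can be sketched in plain Ruby: each worker produces its own partial result, and the master merges them. This sketch assumes the partial results are hashes of numeric counts that should be summed, as in the word-count example later in this README. `worker_results` and `merge_counts` are hypothetical names, and Rector's own merge rules may differ.

```ruby
# Hypothetical per-worker results: each hash is one worker's word counts.
worker_results = [
  { "rector" => 2, "redis" => 1 },
  { "redis" => 3, "resque" => 1 },
  { "rector" => 1 }
]

# Merge a list of count hashes, summing the values of keys that appear
# in more than one hash (an assumed merge rule, used for illustration).
def merge_counts(results)
  results.reduce({}) do |acc, counts|
    acc.merge(counts) { |_key, total, n| total + n }
  end
end

merge_counts(worker_results).each do |word, count|
  puts "#{word}: #{count}"
end
```

Running it prints `rector: 3`, `redis: 4`, and `resque: 1`, one per line.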

Requirements

Ruby >= 1.9.2 (or 1.9 mode of JRuby or Rubinius)
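If you want a process to fail fast on an unsupported interpreter, a version guard can be written with `Gem::Version`. This is a minimal sketch; `rector_ruby_supported?` is a hypothetical helper, not part of Rector.

```ruby
require "rubygems" # provides Gem::Version (already loaded on most modern Rubies)

# Hypothetical helper: true when the given Ruby version meets the
# stated minimum of 1.9.2.
def rector_ruby_supported?(version = RUBY_VERSION)
  Gem::Version.new(version) >= Gem::Version.new("1.9.2")
end

abort "Rector needs Ruby >= 1.9.2 (running #{RUBY_VERSION})" unless rector_ruby_supported?
```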

Configuration

Rector currently supports Redis as a backend for job coordination and data storage.

Redis Server

Rector.configure do |c|
  c.redis = Redis.new(:host => "10.0.1.1", :port => 6380)
end
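`Rector.configure` follows a common Ruby idiom: the library yields a shared configuration object to your block, and you set options on it. The following is a minimal sketch of that idiom, not Rector's implementation. `MiniRector` and `FakeRedis` are hypothetical names; `FakeRedis` stands in for `Redis.new(...)` so the code runs without a Redis server.

```ruby
# Hypothetical stand-in for a gem-level configuration hook.
module MiniRector
  class Configuration
    attr_accessor :redis
  end

  # One shared configuration object, created on first use.
  def self.configuration
    @configuration ||= Configuration.new
  end

  # Yield the shared configuration so callers can set options on it.
  def self.configure
    yield configuration
    configuration
  end
end

# A plain struct stands in for a Redis connection in this sketch.
FakeRedis = Struct.new(:host, :port)

MiniRector.configure do |c|
  c.redis = FakeRedis.new("10.0.1.1", 6380)
end

puts MiniRector.configuration.redis.port # prints 6380
```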

Job Creation (Master)

Rector requires one process to be designated as the "master". This is usually the same process that spawns the worker jobs.

job = Rector::Job.new

# e.g., processing files in parallel (files is an array of file paths)
files.each do |file|
  worker = job.workers.create

  # e.g., using Resque for job management; Rector doesn't really care
  Resque.enqueue(WordCounterJob, worker.id, file)
end

# wait for all the workers to complete
job.join

# get aggregated data from all the jobs
job.data.each do |word, count|
  puts "#{word} was seen #{count} times across all files"
end

job.cleanup
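To make the master/worker handshake concrete, here is an in-process sketch of what creating workers, calling `finish`, `job.join`, and `job.data` amount to conceptually. It rests on several assumptions: `LocalJob`, `LocalWorker`, and `create_worker` are hypothetical, threads replace Resque, and everything runs in one process, whereas Rector coordinates through Redis so workers can run anywhere. The `data` method sums counts, which fits the word-count example but may not match Rector's actual merge rules.

```ruby
require "thread"

# Hypothetical worker: holds its own partial counts.
class LocalWorker
  attr_reader :id, :data

  def initialize(id, job)
    @id = id
    @job = job
    @data = Hash.new(0)
  end

  # Mark this worker as done so the master's join can return.
  def finish
    @job.worker_finished
  end
end

# Hypothetical job: tracks workers and how many have finished.
class LocalJob
  def initialize
    @workers = []
    @finished = 0
    @lock = Mutex.new
    @done = ConditionVariable.new
  end

  def create_worker
    @lock.synchronize do
      worker = LocalWorker.new(@workers.size, self)
      @workers << worker
      worker
    end
  end

  def worker_finished
    @lock.synchronize do
      @finished += 1
      @done.broadcast
    end
  end

  # Block until every created worker has called finish.
  def join
    @lock.synchronize do
      @done.wait(@lock) while @finished < @workers.size
    end
  end

  # Sum every worker's counts into one hash.
  def data
    @workers.each_with_object(Hash.new(0)) do |worker, totals|
      worker.data.each { |word, count| totals[word] += count }
    end
  end
end

texts = ["the cat sat", "the dog sat on the mat"]
job = LocalJob.new

threads = texts.map do |text|
  worker = job.create_worker
  Thread.new do
    text.split(/\W+/).reject(&:empty?).each { |word| worker.data[word] += 1 }
    worker.finish
  end
end

job.join
threads.each(&:join)
p job.data
```

The mutex and condition variable play the role Redis plays for Rector: a shared place where workers record completion and where the master waits until the finished count reaches the number of workers created.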

Job Processing (Workers)

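class WordCounterJob
  # Resque reads the queue name from @queue; the name itself is arbitrary
  @queue = :word_counter

  def self.perform(worker_id, file)
    # attach to the worker record the master created
    worker = Rector::Worker.new(worker_id)

    # split on non-word characters and skip the empty strings that produces
    words = File.read(file).split(/\W/)
    words.reject(&:empty?).each do |word|
      worker.data[word] ||= 0
      worker.data[word]  += 1
    end

    # tell the master this worker is done
    worker.finish
  end
end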
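Because the counting loop in the worker only needs the file's text, it can be pulled out into a plain method and tested without Redis or Resque. `count_words` is a hypothetical helper. It keeps the worker's rules (split on non-word characters, drop empty strings, case-sensitive counts) but splits on `/\W+/`, which yields the same non-empty words with fewer empty pieces to discard.

```ruby
# Hypothetical helper: count how often each word occurs in a string.
# Words are runs of word characters; counts are case-sensitive.
def count_words(text)
  text.split(/\W+/).reject(&:empty?).each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
end

counts = count_words("to be, or not to be")
puts counts["to"] # prints 2
```

A job class could then call `count_words(File.read(file))` and copy the totals into `worker.data`.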
