reduceable

No commit activity in last 3 years
No release in over 3 years
Reduceable makes map/reduce in MongoDB easy
Dependencies

Development

  • >= 1.0.0
  • >= 2.3.0

Runtime

  • ~> 2.1.0
 Project Readme

Reduceable

This is a module for MongoMapper that provides an easy way to add simple map/reduce functions to your models. If you have time-series data and you want to show some sort of counter per date or time, this should do it.
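
For instance, a minimal sketch of that time-series case (the PageView model and its posted_on key below are hypothetical, purely for illustration):

class PageView
  include MongoMapper::Document
  include Reduceable

  key :posted_on, Date
end

# Count how many page views were recorded on each date
PageView.count_by(:posted_on).to_a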

Concept

You have a bunch of objects in your MongoDB, and you need some basic information about them, such as:

  • Simple aggregation of documents per key
  • Finding the average of a value
  • Counting the number of documents that contain a key

You've probably read that you can do this sort of stuff with MongoDB's map/reduce functionality; maybe you already know exactly how that works, or maybe you don't really have a clue. Every guide I've seen for MongoMapper recommends you execute the map/reduce calculation every single time it's accessed, and they all demand that you write your own map and reduce functions.
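
For comparison, the hand-rolled approach looks roughly like this (a sketch against the 1.x Ruby driver that MongoMapper wrapped, not Reduceable's internals; the JavaScript and the output collection name are illustrative):

# Map/reduce functions you'd otherwise have to write yourself
map_js = <<-JS
  function() { emit(this.tag, 1); }
JS
reduce_js = <<-JS
  function(key, values) {
    var total = 0;
    values.forEach(function(v) { total += v; });
    return total;
  }
JS

# Re-runs the full map/reduce on every call
Model.collection.map_reduce(map_js, reduce_js, :out => 'tag_counts')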

Here are some use cases:

# Count how many times each tag is used
Model.count_by(:tag, query = {})
# Sum all the weights of the different types of wrestlers 
Model.sum_of(:weight, :wrestler_type, query = {})
# Find Average Weight of the different types of wrestlers
Model.average_of(:weight, :wrestler_type, query = {})

Coming Soon

  • Mongoid support
  • Sum by composite index
  • More Unit Tests :(

Installation

gem install reduceable
# or
sudo gem install reduceable

Usage

require 'mongo_mapper'
require 'reduceable'

MongoMapper.database = 'my_database_name'

class BlogPost
  include MongoMapper::Document
  include Reduceable

  key :article_body, String
  key :categories, Array
  key :time_posted, Time
  key :article_length, Integer
end

# Insert some data
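# (hypothetical sample documents; the values are only illustrative)
BlogPost.create(
  :article_body   => 'Why I like fantasy novels...',
  :categories     => ['book', 'fantasy'],
  :time_posted    => Time.now,
  :article_length => 1200
)
BlogPost.create(
  :article_body   => 'My favorite rock albums...',
  :categories     => ['music', 'rock'],
  :time_posted    => Time.now,
  :article_length => 800
)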

BlogPost.count_by(:categories).to_a.each do |x| 
  puts "You have posted #{x['value']} posts from catefory #{x['_id']}"
end
BlogPost.sum_of(:article_length, :categories).to_a.each do |x|
  puts "You have written #{x['value']} characters in category #{x['_id']}"
end
BlogPost.average_of(:article_length, :categories).to_a.each do |x|
  puts "An article in category #{x['_id']} has an average of #{x['value']} characters"
end

See example.rb

# require the example model
require './example.rb'  #=> true
# setup some base data
setup #=> #<Test _id: BSON: ...... 
#
# Calculate how many times each tag is used
# You will use a similar map/reduce for a tag cloud
Test.count_by(:tags).to_a
#=> [{"_id"=>"alternative", "value"=>1.0}, {"_id"=>"book", "value"=>5.0}, {"_id"=>"classical", "value"=>1.0}, {"_id"=>"fantasy", "value"=>2.0}, {"_id"=>"fiction", "value"=>2.0}, {"_id"=>"music", "value"=>4.0}, {"_id"=>"non-fiction", "value"=>1.0}, {"_id"=>"pop", "value"=>1.0}, {"_id"=>"rock", "value"=>1.0}]

# Sum up the sale_amounts per tag
Test.sum_of(:sale_amount, :tags).to_a

# Find the average sale_amounts per tag
Test.average_of(:sale_amount, :tags).to_a

# You can optionally pass in a mongo query that limits the initial
# dataset being fed to the map function:

# Sum up the sale_amounts per tag where tags contains 'book'
Test.sum_of(:sale_amount, :tags, {:tags => 'book'}).to_a

# Find the average of sale_amounts per tag where tags contains 'book'
Test.average_of(:sale_amount, :tags, {:tags => 'book'}).to_a

For such a small collection the speed benefits aren't noticeable, but once you get to several hundred thousand records, recreating the map/reduce result collection on every call really slows things down. Reduceable solves that problem.
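
To see the difference yourself, here's a rough sketch using Ruby's Benchmark module (this assumes, per the above, that Reduceable reuses the result collection between calls):

require 'benchmark'

# The first call builds the map/reduce result collection;
# if it's reused, the second call should come back much faster.
first  = Benchmark.realtime { Test.count_by(:tags).to_a }
second = Benchmark.realtime { Test.count_by(:tags).to_a }
puts "first call:  #{first}s"
puts "second call: #{second}s"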