Count-Min Sketch
A Ruby implementation of Graham Cormode and S. Muthu Muthukrishnan's probabilistic sub-linear space streaming algorithm. Relies on nashby's CityHash gem. To read more about count-min sketches, see the Wikipedia article.
Install
Add this line to your application's Gemfile:
gem 'count_min_sketch'
And then execute:
$ bundle
Or install it yourself as:
$ gem install count_min_sketch
Usage
Accepts two optional arguments, k
, the number of hash functions, and m
, the column size. These default to 10 and 100000 respectively.
Without Arguments
require 'count_min_sketch'
sketch = CountMinSketch::Counter.new
sketch.k
# => 10
sketch.m
# => 100000
With Arguments
require 'count_min_sketch'
k = 30
m = 128
sketch = CountMinSketch::Counter.new(k, m)
sketch.k
# => 30
sketch.m
# => 128
There are two methods: insert
, which adds data and returns the count, and get_count
. The former takes one required argument, the data which you want to add to the sketch, and an optional second argument, the number of times you would like to add it. This defaults to 1
.
require 'count_min_sketch'
k = 30
m = 128
sketch = CountMinSketch::Counter.new(k, m)
name = "Guy Fieri"
restaurant = "Guy's American Kitchen and Bar"
sketch.insert(name)
sketch.insert(restaurant, 2)
sketch.get_count(name)
# => 1
sketch.get_count(restaurant)
# => 2
Contributing
- Fork it ( https://github.com/kthffmn/count_min_sketch/fork )
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create a new Pull Request