DataSketches Ruby
DataSketches - sketch data structures - for Ruby
Installation
Add this line to your application’s Gemfile:
gem "datasketches"
Sketch Families
Distinct counting
- CPC sketch
- HyperLogLog sketch
- Theta sketch
Most frequent
- Frequent item sketch
Quantiles and histograms
- KLL sketch
Sampling
- VarOpt sketch
CPC Sketch
Create a sketch
sketch = DataSketches::CpcSketch.new
Add data
sketch.update(1)
sketch.update(2.0)
sketch.update("three")
Estimate the count
sketch.estimate
Save a sketch
data = sketch.serialize
Load a sketch
sketch = DataSketches::CpcSketch.deserialize(data)
Get the union
u = DataSketches::CpcUnion.new(14) # 14 is lg_k (log2 of the sketch size k)
u.update(sketch1)
u.update(sketch2)
u.result
HyperLogLog Sketch
Create a sketch
sketch = DataSketches::HllSketch.new(14) # 14 is lg_k (log2 of the number of buckets)
Add data
sketch.update(1)
sketch.update(2.0)
sketch.update("three")
Estimate the count
sketch.estimate
Save a sketch
data = sketch.serialize_updatable
# or
data = sketch.serialize_compact
Load a sketch
sketch = DataSketches::HllSketch.deserialize(data)
Get the union
u = DataSketches::HllUnion.new(14) # 14 is the maximum lg_k of the union
u.update(sketch1)
u.update(sketch2)
u.result
Theta Sketch
Create a sketch
sketch = DataSketches::UpdateThetaSketch.new
Add data
sketch.update(1)
sketch.update(2.0)
sketch.update("three")
Estimate the count
sketch.estimate
Save a sketch
data = sketch.serialize
Load a sketch
sketch = DataSketches::UpdateThetaSketch.deserialize(data)
Get the union
u = DataSketches::ThetaUnion.new
u.update(sketch1)
u.update(sketch2)
u.result
Get the intersection
i = DataSketches::ThetaIntersection.new
i.update(sketch1)
i.update(sketch2)
i.result
Compute A not B (items in sketch a that are not in sketch b)
d = DataSketches::ThetaANotB.new
d.compute(a, b)
Frequent Item Sketch
Create a sketch
sketch = DataSketches::FrequentStringsSketch.new(64)
Add data
sketch.update("a")
sketch.update("b")
sketch.update("c")
Estimate the frequency of an item
sketch.estimate("a")
Save a sketch
data = sketch.serialize
Load a sketch
sketch = DataSketches::FrequentStringsSketch.deserialize(data)
KLL Sketch
Create a sketch
sketch = DataSketches::KllIntsSketch.new
# or
sketch = DataSketches::KllFloatsSketch.new
Add data
sketch.update(1)
sketch.update(2)
sketch.update(3)
Get quantiles
sketch.quantile(0.5)
Get the minimum and maximum values from the stream
sketch.min_value
sketch.max_value
Save a sketch
data = sketch.serialize
Load a sketch
sketch = DataSketches::KllIntsSketch.deserialize(data)
Merge sketches
sketch.merge(sketch2)
VarOpt Sketch
Create a sketch
sketch = DataSketches::VarOptSketch.new(14) # 14 is k, the maximum number of samples kept
Add data
sketch.update(1)
sketch.update(2.0)
sketch.update("three")
Sample data
sketch.samples
Credits
This library is modeled after the DataSketches Python API.
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone --recursive https://github.com/ankane/datasketches-ruby.git
cd datasketches-ruby
bundle install
bundle exec rake compile
bundle exec rake test