Bloomed
Troy Hunt's brilliant haveibeenpwned.com lets you download SHA1s of 517,238,891 real-world passwords previously exposed in data breaches. This list is comprehensive but huge: 11GB compressed. Using a bloom filter, we can reduce it to files measured in MBs.
You can even keep the bloom filter in memory in your web app or API. This is great if you're afraid to send the passwords that your users enter to an external service for lookup.
This gem lets you control the trade-off between memory size and precision. False positives will occur (that's the nature of bloom filters), but you control their frequency and how many of the pwned passwords you want in your filter, starting from the most pwned at the top.
Installation
Add this line to your application's Gemfile:
gem 'bloomed'
And then execute:
$ bundle
Or install it yourself as:
$ gem install bloomed
Usage
Quick start
require 'bloomed'
pw = Bloomed::PW.new
pw.pwned? "password123"
=> true
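Since the filter lives in memory, the check can be wired straight into sign-up or password-change validation. The following is a minimal sketch, not part of the gem: PasswordPolicy is a hypothetical helper that accepts any object responding to pwned?, such as a Bloomed::PW instance. Because false positives are possible, the message is worded as "may have appeared" rather than stated as fact.

```ruby
# Hypothetical helper, not part of the bloomed gem.
# It wraps any object that responds to `pwned?(password)`.
class PasswordPolicy
  def initialize(checker)
    @checker = checker
  end

  # Returns an array of human-readable problems; empty means acceptable.
  def errors_for(password)
    errors = []
    if @checker.pwned?(password)
      # Bloom filters can report false positives, so hedge the wording.
      errors << "may have appeared in a known data breach"
    end
    errors
  end

  def acceptable?(password)
    errors_for(password).empty?
  end
end

# In an application you would pass the real filter, created once at boot:
#   policy = PasswordPolicy.new(Bloomed::PW.new)
#   policy.errors_for(params[:password])
```

Creating the filter once and reusing it avoids reloading the dump on every request.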
Using lower precision / lower memory consumption
There are two parameters that can be varied: top and false_positive_probability.
require 'bloomed'
pw = Bloomed::PW.new(top: 100000, false_positive_probability: 0.01) # 136 kb memory
pw.pwned? "password123"
=> true
Using higher precision / higher memory consumption
To keep the gem size small, it only ships with dumps up to 253 kb in size.
To generate all combinations of top and false_positive_probability bloom filters for pwned passwords, run:
rake bloomed:seed
This will download the source 7zip file with pwned passwords, unpack it to the current dir, and write the generated bloom filters to the lib/dump dir relative to the installation path of the gem.
Note: You'll need to brew install curl p7zip on macOS and apt-get install curl p7zip on Linux.
Sometimes you will want to have more precise control of the placement of the cache files. To seed all variants in the current dir, run:
rake bloomed:seed\[all\]
But be aware that it will take a long time!
Once you have the massive 22GB text files available, you can generate binary cache files using the exact precision you want.
require 'bloomed'
pw = Bloomed::PW.new(top:1E9, false_positive_probability: 0.0001)
Warning! This seeds all the passwords and will take a loooong time the first time. Even with a binary cache file in place, loading it will take a lot of time and memory.
For deployment scenarios where you don't want the server to rake seed, you can override the directory used for caching by giving a cache_dir argument to the constructor:
require 'bloomed'
pw = Bloomed::PW.new(top: 1E8, false_positive_probability: 0.0001, cache_dir: '/var/lib/bloomed')
Size of the in memory bloom filter
The size of the filter varies widely with the parameters you choose. Use Bloomed::PW#memory_size_bytes to get the exact size.
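For a rough estimate before generating or loading a filter, the textbook bloom filter sizing formula can be sketched as below. This is the theoretical optimum for a given item count and false positive probability, not necessarily what the gem allocates, so treat memory_size_bytes as the authoritative number. optimal_bloom_size is a hypothetical helper and not part of the gem.

```ruby
# Hypothetical helper, not part of the bloomed gem.
# Textbook sizing for a bloom filter holding `items` entries:
#   bits   = -n * ln(p) / (ln 2)^2
#   hashes = (bits / n) * ln 2
def optimal_bloom_size(items:, false_positive_probability:)
  bits = (-items * Math.log(false_positive_probability) / (Math.log(2)**2)).ceil
  hashes = (bits.to_f / items * Math.log(2)).round
  { bits: bits, bytes: (bits / 8.0).ceil, hashes: hashes }
end

# Same parameters as the lower precision example above.
estimate = optimal_bloom_size(items: 100_000, false_positive_probability: 0.01)
puts "about #{estimate[:bytes] / 1024} KiB with #{estimate[:hashes]} hash functions"
```

Note how the size grows linearly with top but only logarithmically as false_positive_probability shrinks, so tightening precision is much cheaper than adding more passwords.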
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/skovsboll/bloomed. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the Bloomed project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.