StringMetric
A simple library of string metric algorithms. If you want to read more about string metric algorithms, please read here.
This library aims to support MRI (1.9.3, 2.0.0, 2.1.0), JRuby and Rubinius.
Installation
Add this line to your application's Gemfile:
gem 'string_metric'
And then execute:
$ bundle
Or install it yourself as:
$ gem install string_metric
Usage
Levenshtein Distance
The public API for Levenshtein distance is the method StringMetric::Levenshtein.distance.
Options
- :max_distance: Sets an upper limit for the calculated distance. Can be Fixnum or Float.
- :insertion_cost: Overrides the default insertion penalty (1). Can be Fixnum or Float.
- :deletion_cost: Overrides the default deletion penalty (1). Can be Fixnum or Float.
- :substitution_cost: Overrides the default substitution penalty (1). Can be Fixnum or Float.
- :strategy: The desired strategy for Levenshtein distance. Supported strategies are :recursive, :two_matrix_rows, :two_matrix_rows_v2, :two_matrix_rows_ext, :full_matrix and :experiment. The default strategy is :two_matrix_rows_v2 for MRI and :two_matrix_rows for other platforms. Do not depend on the :experiment strategy.
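To make the options above concrete, here is a minimal pure-Ruby sketch of a two-row Levenshtein computation with configurable costs and an upper limit. It is an illustration only and is not the gem's implementation of any strategy: the method name two_row_distance and its keyword arguments are assumptions made for this example, and it needs Ruby 2.0 or later for keyword arguments.

```ruby
# Illustrative two-row Levenshtein distance; NOT the gem's actual code.
# The method name and keyword defaults are assumptions for this sketch.
def two_row_distance(from, to, insertion_cost: 1, deletion_cost: 1,
                     substitution_cost: 1, max_distance: nil)
  to_chars = to.chars

  # previous[j] is the cost of turning an empty prefix of `from` into the
  # first j characters of `to`, which takes j insertions.
  previous = (0..to_chars.length).map { |j| j * insertion_cost }

  from.chars.each do |from_char|
    # Turning one more character of `from` into "" costs one more deletion.
    current = [previous[0] + deletion_cost]

    to_chars.each_with_index do |to_char, j|
      substitution = previous[j] + (from_char == to_char ? 0 : substitution_cost)
      insertion    = current[j] + insertion_cost
      deletion     = previous[j + 1] + deletion_cost
      current << [substitution, insertion, deletion].min
    end

    # Only the last completed row is kept, hence "two rows".
    previous = current
  end

  distance = previous.last
  # Cap the result when an upper limit is given.
  max_distance ? [distance, max_distance].min : distance
end

puts two_row_distance("kitten", "sitting")
puts two_row_distance("kitten", "sitting", max_distance: 2)
```

Keeping only two rows means memory grows with the length of one string rather than with the product of both lengths, which is presumably the motivation for the two-row strategies listed above.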
Examples
require 'string_metric'
StringMetric::Levenshtein.distance("kitten", "sitting")
# Generates: 3
# Trim distance to :max_distance
StringMetric::Levenshtein.distance("kitten", "sitting",
max_distance: 2)
# Generates: 2
# Pass different costs for insert, delete or substitute actions
StringMetric::Levenshtein.distance("kitten", "sitting",
  insertion_cost: 2,
  deletion_cost: 2,
  substitution_cost: 2)
# Generates: 6
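The results in the examples above can be checked by hand. The sketch below spells out one minimal edit script for "kitten" to "sitting" (two substitutions and one insertion) and sums its cost under different penalties. The constant EDIT_SCRIPT and the helper script_cost are hypothetical names used only for this illustration; they are not part of the library.

```ruby
# One minimal edit script for "kitten" -> "sitting" (illustration only).
EDIT_SCRIPT = [
  [:substitution, "k", "s"], # kitten -> sitten
  [:substitution, "e", "i"], # sitten -> sittin
  [:insertion,    nil, "g"]  # sittin -> sitting
].freeze

# Hypothetical helper: sums the cost of every operation in the script.
def script_cost(script, costs)
  script.inject(0) { |total, (operation, _from, _to)| total + costs.fetch(operation) }
end

unit_costs   = { substitution: 1, insertion: 1, deletion: 1 }
double_costs = { substitution: 2, insertion: 2, deletion: 2 }

puts script_cost(EDIT_SCRIPT, unit_costs)          # 2 * 1 + 1 * 1 = 3
puts script_cost(EDIT_SCRIPT, double_costs)        # 2 * 2 + 1 * 2 = 6
puts [script_cost(EDIT_SCRIPT, unit_costs), 2].min # capped at a limit of 2
```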
Benchmarks
You can run benchmarks with
$ bundle exec ruby benchmarks/*
or you can choose to benchmark a specific algorithm like:
$ bundle exec ruby benchmarks/levenshtein.rb
Current Benchmarks status
Levenshtein
Implementation | User (s) | Real (s) |
---|---|---|
Levenshtein::IterativeWithFullMatrix | 2.320000 | 2.343141 |
Levenshtein::IterativeWithTwoMatrixRows | 2.020000 | 2.044638 |
Levenshtein::Experiment | 1.750000 | 1.779868 |
Levenshtein::IterativeWithTwoMatrixRowsOptimized | 1.320000 | 1.343095 |
Levenshtein::IterativeWithTwoMatrixRowsExt | 0.220000 | 0.228965 |
Text::Levenshtein (from gem text) | 2.240000 | 2.308803 |
The set of fixtures is currently very small. Ruby 2.1.0 was used for these benchmarks.
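The contents of the benchmark scripts are not shown here. Assuming the User and Real columns come from Ruby's standard Benchmark module (user CPU time and wall-clock time, both in seconds), figures of this kind could be collected roughly as follows. The helper measure_implementation and the stand-in workload are hypothetical; a real run would call one of the distance implementations instead.

```ruby
require 'benchmark'

# Hypothetical helper: runs the given block `iterations` times and returns
# the user CPU time and the wall-clock time, both in seconds.
def measure_implementation(iterations)
  timing = Benchmark.measure do
    iterations.times { yield }
  end
  { user: timing.utime, real: timing.real }
end

# Stand-in workload; replace the block body with a real distance call.
result = measure_implementation(10_000) { "kitten".chars.zip("sitting".chars) }

# Print one row in the same shape as the table above.
printf("%s | %f | %f |\n", "StandIn", result[:user], result[:real])
```

Timings vary between machines and Ruby versions, so only the relative order of implementations measured in the same run is meaningful.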
Other implementations
Levenshtein
- this beautiful gem, text
- ffi implementations, like this or check The Ruby Toolbox
Various
- Approximate String matching library
Tools
- Try to use SemVer
Contributing
- Fork it ( http://github.com//string_metric/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
Licence
string_metric is licensed under MIT. See License