0.02
No commit activity in last 3 years
No release in over 3 years
An algorithm that allows searching for members of a known set of strings appearing as substrings of a larger string in time linear to both the size of the string and the size of the set
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
 Dependencies

Development

~> 3.1
 Project Readme

Aho-Corasick string matching

The Aho-Corasick string matching algorithm will find all instances of a set of terms that are substrings of a certain string in constant time with respect to the length of the string. It does this by pre-building a NFA-like structure which it uses to efficiently parse the string for all the terms in the set to be matched against.

Our implementation is written in pure Ruby, and is used like so:

require 'aho_corasick'
matcher = AhoCorasick.new("woodchuck", "chuck", "could")
matcher.match("How much wood would a woodchuck chuck if a woodchuck could chuck wood.")
#=> ["woodchuck", "chuck", "woodchuck", "could", "chuck"]

You can insert additional terms into the matcher after instantiation, however each call to insert takes linear time with respect to the total number of terms to be matched against. You can call insert passing multiple terms to mitigate against this:

matcher.insert("would", "wood")
matcher.match("How much wood would a woodchuck chuck if a woodchuck could chuck wood.")
#=> ["wood", "would", "wood", "woodchuck", "chuck", "wood", "woodchuck", "could", "chuck", "wood"] 

To install: gem install aho_corasick or add gem "aho_corasick" to your Gemfile. We've tested it with Ruby 1.9.2, but there's nothing in there that should stop it working with any other Ruby. If you find any bugs, let us know - Feature requests and pull requests also welcome!