A simple vector space search engine with tf*idf ranking.
More info, and details of how it works.
Installation
Just install the gem:
gem install vss
Or add to your Gemfile, if you're using Bundler:
gem 'vss'
Usage
To perform a search on a collection of documents:
require "vss"
docs = ["hello", "goodbye", "hello and goodbye", "hello, hello!"]
engine = VSS::Engine.new(docs)
engine.search("hello") #=> ["hello", "hello, hello!", "hello and goodbye"]
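For readers curious about what tf*idf ranking in a vector space involves, here is a minimal plain-Ruby sketch. It is not the gem's source code: the helper names (tokenize, tfidf_vector, cosine, rank), the tokenizer regex, the "+ 1" idf smoothing, and the tie-breaking by original position are all assumptions made for illustration, and the gem's actual weighting may differ.

```ruby
# Illustrative sketch only; all names here are hypothetical, not part of vss.

# Split text into lowercase word tokens.
def tokenize(text)
  text.downcase.scan(/\w+/)
end

# Build a sparse tf*idf vector (term => weight) from a token list.
def tfidf_vector(tokens, idf)
  counts = Hash.new(0)
  tokens.each { |t| counts[t] += 1 }
  vector = {}
  counts.each do |term, count|
    tf = count.to_f / tokens.size
    vector[term] = tf * idf.fetch(term, 0.0) # terms unseen in the corpus weigh 0
  end
  vector
end

# Cosine similarity between two sparse vectors.
def cosine(a, b)
  dot = 0.0
  a.each { |term, w| dot += w * b.fetch(term, 0.0) }
  norm_a = Math.sqrt(a.values.inject(0.0) { |s, w| s + w * w })
  norm_b = Math.sqrt(b.values.inject(0.0) { |s, w| s + w * w })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Return the documents matching the query, best match first.
def rank(docs, query)
  tokenized = docs.map { |d| tokenize(d) }

  # Document frequency: in how many documents each term appears.
  df = Hash.new(0)
  tokenized.each { |tokens| tokens.uniq.each { |t| df[t] += 1 } }

  # Assumed smoothing: log(N / df) + 1, so common terms keep a nonzero weight.
  idf = {}
  df.each { |term, n| idf[term] = Math.log(docs.size.to_f / n) + 1.0 }

  query_vec = tfidf_vector(tokenize(query), idf)
  scored = docs.each_with_index.map do |doc, i|
    [doc, cosine(query_vec, tfidf_vector(tokenized[i], idf)), i]
  end

  # Drop non-matches; sort by score descending, ties broken by original position.
  scored.select { |_, score, _| score > 0 }
        .sort_by { |_, score, i| [-score, i] }
        .map(&:first)
end

docs = ["hello", "goodbye", "hello and goodbye", "hello, hello!"]
ranked = rank(docs, "hello")
```

In this sketch, documents that share no terms with the query score zero and are left out of the results, which is why "goodbye" does not appear when searching for "hello".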
Rails/ActiveRecord
If you want to search a collection of ActiveRecord objects, you need to pass a documentizer Proc when initializing VSS::Engine. The Proc converts each object into a document (which is simply a string). For example:
class Page < ActiveRecord::Base
#attrs: title, content
end
docs = Page.all
documentizer = lambda { |record| record.title + " " + record.content }
engine = VSS::Engine.new(docs, documentizer)
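One thing to watch for: string concatenation with + raises an error when an attribute is nil, which is common for optional database columns. The sketch below shows a nil-tolerant documentizer. It uses a Struct as a stand-in for the Page model so it runs without Rails; the sample records are made up for illustration.

```ruby
# Struct stand-in for the ActiveRecord Page model (assumption, for illustration).
Page = Struct.new(:title, :content)

# Join title and content with a space, skipping nil or empty attributes.
documentizer = lambda do |record|
  [record.title, record.content].compact.map(&:to_s).reject(&:empty?).join(" ")
end

pages = [
  Page.new("Greetings", "hello and goodbye"),
  Page.new("Untitled", nil) # content is missing
]

# Each record becomes a plain string, ready to be treated as a document.
documents = pages.map { |page| documentizer.call(page) }
```

A lambda like this can be passed as the second argument to VSS::Engine.new in the same way as the one above.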
Notes
This isn't designed to be used on huge collections of records. The original use case was ranking a smallish set of ActiveRecord results obtained via a query (using SearchLogic). So, essentially, the search consisted of two stages: getting the corpus via a SQL query, then running VSS on that.
Ruby
Tested with the following Ruby versions:
- MRI 1.9.2
- MRI 1.8.7
Probably works on JRuby ~> 1.6 too, but not actively tested.
Credits
Heavily inspired by Joseph Wilk's article on building a vector space search engine in Python.
Written by Mark Dodwell (@madeofcode)