Enygma: a Sphinx toolset
NOTE: Enygma is currently in a state of disarray, since I hacked it together just enough to work with ActiveRecord. The specs shouldn't run and coverage is inexcusable, which I feel bad about. I'll clean things up incrementally. Until then, consider it in alpha.
ANOTHER NOTE: This documentation is unfinished and a little wanky. I'll improve it when time allows. The best way right now to figure out how to use Enygma is to look through the code.
Sphinx is awesome, but it's sometimes kind of unwieldly to use, requiring a bunch of moving parts just to see what a certain Sphinx query would yank out of your database. Some solutions for working with Sphinx exist, but it's hard to justify spinning up an entire new Rails project just to search through some HTML documents you've got lying on your hard drive.
For this reason, Enygma exists to be an awesome little Sphinx toolset usable just about anywhere.
Requirements
Eyngma requires the following things:
- Sphinx v.0.9.9rc1 or higher
For some types of geospatial searching, Enygma requires GeoRuby.
The Enygma database adapters require the related libraries. For example, the Enygma::Adapters::ActiveRecordAdapter
requires the active_record gem, and the Enygma::Adapters::SequelAdapter
requires the sequel gem.
Indexing
For the time being, Enygma doesn't build a conf file or help set up your indexes for you.
This is actually not that hard to do yourself, and you'll find you've got much more control over what gets indexed than a Sphinx solution such as, say, thinking_sphinx offers. Read the Sphinx documentation for information on writing your own conf file. Once you've done it yourself, you'll feel like a genius.
That being said, Enygma plans to eventually support guided Sphinx configuration.
Usage
Configuration
Take your favorite class and include Enygma
, then give it class-level configuration by calling configure_enygma
. This will save a constant in the class called <class_name>_ENYGMA_CONFIGURATION
holding the configuration
Enygma::Configuration.global do
adapter :sequel
datastore "postgres://user@localhost/db"
sphinx.host 'localhost'
sphinx.port 3312
end
class SearchyThing
include Enygma
configure_enygma do
table :posts, :indexes => [ :posts, :comments ]
end
end
This appends the search
method to the included class.
Searching
To search, call the search
method on a class that included Enygma. The way searching works depends a lot on both the kind of class you've included Enygma in and the type of datastore adapter you're using. The fundamentals are the same, though.
For a plain ol' Ruby class, like a controller, call search
and tell it where to go find the actual objects after it's retrieved the pointers from Sphinx. An example using the active_record
adapter:
class PostsController < ApplicationController
include Enygma
configure_enygma do
adapter :active_record
end
def index
@posts = search(Post).for(params[:term]).using_index(:posts)
end
end
An example using the sequel
adapter:
include Enygma
configure_enygma do
adapter :sequel
end
get '/posts' do
@posts = search(:posts).for(params[:term]).using_index(:posts)
end
Adding filters:
SearchyThing.search(:comments).for("funtimes").filter("post_id", 1..10)
SearchyThing.search(:comments).for("funtimes").exclude("post_id", 50..100)
Returning only certain attributes from the matches records:
SearchyThing.search(:comments).for("funtimes").return(:author_id)
Iterating over the records:
SearchyThing.search(:posts).for("funtimes").each do |post|
post.tag!(Tag.new("funtimes!"))
end
Geospatial Searching
Enygma's geospatial searching abilities are naive at present.
To search in a given radius of a latitude/longitude pair:
SearchyThing.search(:places).within(500).of(40.747778, -73.985556)
Where 500
means 500 meters. You can set different units like so:
SearchyThing.search(:places).within(1000).feet.of(40.747778, -73.985556)
Latitudes and longitudes should be given in degrees, which will be converted to the radians required by Sphinx.
Instead of a lat/lng pair, you can pass a GeoRuby::SimpleFeatures::Point.
point = GeoRuby::SimpleFeatures::Point.from_lon_lat(-73.985556, 40.747778)
SearchyThing.search(:places).within(500).of(point)
Or you can pass any object which responds to coordinates
, which should in turn respond to both lat
and lng
(good for, say, ActiveRecord models with a point attribute).
arbys = Place.filter(:name => "Arby's").first
SearchyThing.search(:places).within(500).of(arbys)
Should you want to, you can search within an annulus (the area between two concentric circles) by passing a range as the radius argument.
SearchyThing.search(:places).around(point, 500..1000)
Kicker Methods
An Engyma::Search instance (the things that's returned when you call search
) will delay execution of the Sphinx query and database query until you call the 'kicker' method run
. If you send the Search object a missing method, it will run the search and then pass the method on the the return value of the query (what's returned differs slightly based on your database adapter, see below).
For example:
Model.search.for("Arby's") # => returns an Enygma::Search object
Model.search.for("Arby's").run # => returns an array of Models
Model.search.for("Arby's").first # => returns the first Model found
A gotcha: if somehow you have, for example, an ActiveRecord::Base named_scope that matches the name of a Search method, say filter
, you should explicitly call run
on the Search object.
Model.search.for("Arby's").filter(...) # => sets a Sphinx filter
Model.search.for("Arby's").run.filter(...) # => runs the filter named_scope
Resources
Subclasses of ActiveRecord::Base and Sequel::Model and classes including Datamapper::Resource can be extended using Enygma::Resource to give them searching superpowers.
Including Enygma::Resource in one of the above types of classes will relfect on the table associated with the class and automatically scope Enygma searches to that table and its related indexes.
class Post < ActiveRecord::Base
include Enygma::Resource
end
Post.search.for("turkey").each { |post| puts post.title }
If a class includes Enygma::Resource, it can only search on one table at a time (meaning that it can only return records of a single type), but can still search for those records using multiple indexes. For example:
class Post < ActiveRecord::Base
include Enygma::Resource
configure_enygma do
index :posts
index :posts_delta
end
end
More in-depth per-adapter documentation detailed below:
ActiveRecord::Base
An ActiveRecord::Base subclass that includes Enygma::Resource will, instead of returning the actual results of the database query, return an anonymous scope searching for record ids in the set of ids returned by Sphinx. This helps ease integration with will_paginate, as well as allowing the appending of additional named_scopes.
For example, to get all Models and their Associations in one go:
Model.search.for("turkey").all(:include => :association)
Sequel::Model
Like above, a Sequel::Model with Enygma::Resource will not automatically kick off the query, but just return a prepared Sequel::Query object for further filtering.
Datamapper
Nothing special about Datamapper so far.
Using Enygma in a controller
What follows is an example of using Enygma in an ActionController::Base subclass in a Rails project, but it should apply to most any controller (or any class, for that matter).
class PostsController < ApplicationController
include Enygma
configure_enygma do
adapter :active_record
database Post
table :posts
index :posts
index :posts_delta
end
def index
@posts = search(:posts).for(params[:search]).all(:include => :comments)
end
end
Non-relational Database Stores
Enygma is so awesome that you can use it to hook Sphinx up to non-relational database stores and other data-storing structures, such as Memcache, Tokyo Cabinet, and BerkeleyDB (not currently implemented).
Of course, Sphinx can't index content from one of these database types, so it's assumed that the data has been prepopulated and that you have already set up a system to keep the original data source (the one Sphinx indexed from) and the data store in sync.
For example, assume you've taken a large chunk of mostly-static data from your database and put it as marshalled hashes into a Tokyo Cabinet. You can tell Enygma to to query Sphinx for a term, then get the records from the Tokyo Cabinet.
Let's assume that, nightly, you reindex your users table and stuff a bunch of hashes structured like { :id => <id>, :username => <username>, :email => <email> }
into a Tokyo Cabinet file called 'usernames.tch', each under the key user:<id>
. You want to set up a controller to autocomplete the users' names, and you want it to be fast. You can tell Enygma to look for these hashes in the (lightning-fast) Tokyo Cabinet instead of your (glacially-slow) database like so:
class UserNamesAutocompletionsController < ApplicationController
include Enygma
configure_enygma do
adapter :tokyo_cabinet
database 'usernames.tch'
key_prefix 'user:'
index :users
end
def index
@usernames = search.for(params[:search]).run
end
end