Mongoid layer for the Sphinx fulltext search server that supports block fields and dynamic indexes
Installation
Add this line to your application's Gemfile:
gem "mongoid-giza"
And then execute:
$ bundle
Or install it yourself as:
$ gem install mongoid-giza
Usage
⚠️ Before proceeding is extremely recommended to read the Sphinx documentation if you are not yet familiar with it. Reading up to chapter 5 is enought to get you going.
Configuration file
A YAML configuration file is needed to configure the gem, the Sphinx searchd daemon, optionally the Sphinx indexer and set the default options for the sources and indexes.
The minimum configuration file must have the sphinx.conf output path, the address and port of the searchd daemon, paths to its pid and log files. It's also a good idea to define a default path for very index.
The xmlpipe_command
is set to a default when using rails, otherwise you need to set it for each index or a default on the YAML file.
String settings accept ERB, and you have access to the Mongoid::Giza::Index
from index and source section settings.
The configuration file is automatically loaded when using Rails from config/giza.yml
, otherwise you will need to call Mongoid::Giza::Configuration.instance.load
to load it.
Example: (the xmlpipe_command
used here is already the one used in rails automatically so it's not needed, just for illustration)
development:
file:
output_path: "/tmp/sphinx/sphinx.conf"
searchd:
address: "localhost"
port: 9312
pid_file: "/tmp/sphinx/searchd.pid"
log: "/tmp/sphinx/searchd.log"
index:
path: "/tmp/sphinx"
source:
xmlpipe_command: "rails r '<%= index.klass %>.sphinx_indexes[:<%= index.name %>].xmlpipe2(STDOUT)'"
Setting up indexes on models
Use a sphinx_index
block to create a new index.
The sphinx_index
method may receive optional settings that will be set in this index's section or in its source section on the generated sphinx configuration file.
These settings take precedence to the defaults defined in the configuration file.
A model may have more than one index, but they need to have different names. If two or more indexes have the same name the last one to be defined is the one which will exist.
An index name is the name of the class it's defined on unless overwritten by the name
method inside the index definition block.
Besides name
, field
, attribute
and criteria
are the methods avaible inside the index definition block.
Both field
and attribute
take a name as first parameter that may match with a Mongoid field. In this case the value of the field will be used when indexing the objects.
The attribute
method may receive a second paramenter that defines the type of the attribute. If it is ommited, than the type of the Mongoid field will be used.
At last, both methods may take an block with the object as parameter. The return of the block will be used as the value of the field or attribute when indexing.
The criteria
method receives a Mongoid::Criteria
that will be used to select the objects that will be indexed.
It's Class.all
by default.
Example: Creating a index on the person model
class Person
include Mongoid::Document
include Mongoid::Giza
field :name
field :age, type: Integer
sphinx_index(enable_star: 1) do
field :name
field :bio do |person|
"#{person.name.capitalize} was born #{person.age.years.days} ago"
end
attribute :age
end
end
Dynamic Indexes
Because of the schemaless nature of MongoDB, sometimes you may find problems mapping your mongo models to sphinx indexes. To circunvent this limitation Mongoid::Giza supports dynamic indexes.
When you define a dynamic index, it will generate a regular index based on your definition for each object of the class. This allows the creation of different indexes for objects of the same model that have different dynamic fields.
Although it's not necessary, dynamic indexes are better used together with a criteria
,
so it's possible to control which objects of the class will be indexed on each determined index.
To create a dynamic index all that needs to be done is pass the object to the sphin_index
block.
Example: Creating a dynamic index on the person model. This dynamic index will generate one index for each job that is associated to a person. On each index only the people that have that job will be indexed. Finally each dynamic attribute of the job will be a field on its index.
class Job
include Mongoid::Document
field :name
# each job object has specific dynamic fields
has_many :people
end
class Person
include Mongoid::Document
include Mongoid::Giza
field :name
field :age, type: Integer
belongs_to :job
sphinx_index do |person|
name person.job.name
criteria Person.where(job: person.job)
person.job.attributes.except("name").each do |attr, val|
field attr.to_sym
end
end
end
Manipulating Dynamic Indexes
If your objects changed and this changes must be reflected on your dynamic indexes, call Mongoid::Giza::regenerate_sphinx_indexes
to clear the existing ones and regenerate all again.
If you need finer control, you may use Mongoid::Giza#generate_sphinx_indexes
to generate all dynamic indexes for the object (which will replace any existing one with the same name), and Mongoid::Giza::remove_generated_sphinx_indexes
with the name of the generated indexes which you want to remove.
Difference Between Dynamic Index and Generated Index
A dynamic index is just the skeleton that creates indexes from every object of the class.
A generated index is the actual index that was dynamically created based on the object which the dynamic index used for evaluation.
Indexing
There are 3 ways to populate the Sphinx index: use the model class' sphinx_indexer!
method, Mongoid::Giza::Indexer.instance.index!
or Mongoid::Giza::Indexer.instance.full_index!
- sphinx_indexer!: Will execute the indexer program only on the indexes of the class. Does not regenerate dynamic indexes.
- index!: Will execute the indexer. Does not regenerate dynamic indexes.
- full_index!: Will regenerate dynamic indexes, render the configuration file and execute the indexer program on all indexes.
This gem does not execute none of those automatically to let the you define what is the best reindexing strategy for your software.
Searching
Use the search
block on the class that have the indexes where the search should run.
It returns a result array, where each position of the array is a riddle result hash, plus a key with the class name, that has the Mongoid::Criteria
that selects the matching objects from the mongo database.
Inside the search
block use the fulltext
method to perform a fulltext search.
To filter your search using the attributes defined on the index creation, use the with
and without
methods, that accept the name of the attribute and the value or range.
To order the results, use the order_by
method, that receives the attribute used for sorting and a Symbol, that can be either :asc
or :desc
.
Every other Riddle::Client setter is avaible without the =, to maintain the DSL syntax consistent.
Example: Searching on the person class
result = Person.search do
fulltext "john"
with :age, 18..40
sort_mode :extended
order_by age: :asc, :@id => :desc
end
result[:Person].each do |person|
puts "#{person.name} is #{person.age} years old"
end
Sorting MongoDB results
MongoDB doesn't return the documents on the "arbitrary" order defined by Sphinx. To maintain the Sphinx ordering you can do:
giza_ids = result[:matches].map { |match| match[:doc] }
people = result[:Person].sort_by { |person| giza_ids.index(person._giza_id) }
TODO
- Support delta indexing
- Support RT indexes
- Support distributed indexes
Contributing
- Fork it
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create new Pull Request