Sequel::Elasticsearch
Sequel::Elasticsearch allows you to transparently mirror your database, or specific tables, to Elasticsearch. It's especially useful if you want the power of search through Elasticsearch but want to keep the sanity and structure of a relational database.
Installation
Add this line to your application's Gemfile:
gem 'sequel-elasticsearch'
And then execute:
$ bundle
Or install it yourself as:
$ gem install sequel-elasticsearch
Usage
Require the gem with:
require 'sequel/plugins/elasticsearch'
You'll need an Elasticsearch cluster to sync your data to. By default the gem will try to connect to http://localhost:9200. Set the ELASTICSEARCH_URL ENV variable to the URL of your cluster.
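As a minimal sketch, the variable can also be set from Ruby before the plugin is loaded. This assumes the gem reads ELASTICSEARCH_URL when it creates its client; the fallback value is the default mentioned above and should be replaced with your own cluster URL.

```ruby
# Fall back to the documented default when nothing is configured.
# Replace the URL with that of your own cluster.
ENV['ELASTICSEARCH_URL'] ||= 'http://localhost:9200'

elasticsearch_url = ENV.fetch('ELASTICSEARCH_URL')
puts "Syncing to #{elasticsearch_url}"
```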
This is a Sequel plugin, so you can enable it DB wide:
Sequel::Model.plugin :elasticsearch
Or per model:
Document.plugin :elasticsearch
# or
class Document < Sequel::Model
  plugin :elasticsearch
end
There are a couple of options you can set:
Sequel::Model.plugin :elasticsearch,
  elasticsearch: { log: true }, # Options to pass to the Elasticsearch Ruby client
  index: 'all-my-data', # The index in which the data should be stored. Defaults to the table name associated with the model
  type: 'is-mine' # The type in which the data should be stored.
And that's it! Just transact as you normally would, and your records will be created and updated in the Elasticsearch cluster.
Indexing
Ensure that you create the index mappings for your data before using this plugin. Without explicit mappings, Elasticsearch infers field types from the first documents it receives, which can lead to unexpected search results.
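As a rough illustration, a mapping definition can be built as a plain Ruby hash and serialized to JSON before it is sent to the cluster. The field names and types below are hypothetical, for a table with id, title and body columns. The exact layout also depends on your Elasticsearch version: older versions nest properties under a type name and use different field type names.

```ruby
require 'json'

# Hypothetical field mappings for a table with id, title and body columns.
properties = {
  id:    { type: 'integer' },
  title: { type: 'text' },
  body:  { type: 'text' }
}

# Assumption: properties sit directly under mappings. Older Elasticsearch
# versions expect them under a type name inside mappings instead.
index_body = { mappings: { properties: properties } }

puts JSON.pretty_generate(index_body)
```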
By default, records are indexed using the values call of the model. Should you need to customize what's indexed, you can define an indexed_values method (or an as_indexed_json method if you prefer the Rails way).
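A minimal sketch of such a customization follows. To keep it runnable without a database, Document here is a plain Ruby stand-in exposing values; in a real application it would be a Sequel::Model subclass with the plugin enabled. The internal_notes column is hypothetical.

```ruby
# Stand-in for a Sequel::Model subclass so the sketch runs without a database.
# In a real application this would be `class Document < Sequel::Model`.
class Document
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # Index every column except the hypothetical :internal_notes column.
  def indexed_values
    values.reject { |column, _value| column == :internal_notes }
  end
end

doc = Document.new(id: 1, title: 'Sequel', internal_notes: 'do not index')
p doc.indexed_values
```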
Searching
Your model is now searchable through Elasticsearch. Just pass down a string that's parsable as a query string query.
Document.es('title:Sequel')
Document.es('title:Sequel AND body:Elasticsearch')
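When the search terms come from a hash rather than a hand-written string, a small helper can assemble the query string. The query_string_for helper below is hypothetical and not part of the gem. It quotes values containing whitespace so they match as phrases, but does not escape other reserved characters.

```ruby
# Hypothetical helper: build a query string query from field/value pairs.
# Values containing whitespace are wrapped in double quotes so they are
# treated as phrases; other reserved characters are not escaped.
def query_string_for(terms, operator = 'AND')
  terms.map do |field, value|
    text = value.to_s
    text = %("#{text}") if text.match?(/\s/)
    "#{field}:#{text}"
  end.join(" #{operator} ")
end

terms = { title: 'Sequel', body: 'Elasticsearch' }
query = query_string_for(terms)
puts query
# The resulting string could then be passed to Document.es(query).
```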
The result from the es method is an enumerable containing Sequel::Model instances of your model:
results = Document.es('title:Sequel')
results.each { |e| p e }
# Outputs
# #<Document @values={:id=>1, :title=>"Sequel", :body=>"Document 1"}>
# #<Document @values={:id=>2, :title=>"Sequel", :body=>"Document 2"}>
The result also contains the meta info about the Elasticsearch query result:
results = Document.es('title:Sequel')
p results.count # The number of documents included in this result
p results.total # The total number of documents in the index that match the search
p results.timed_out # Whether or not the search timed out
p results.took # How long, in milliseconds, the search took
You can also use the scroll API to search and fetch large datasets:
# Get a dataset that will stay consistent for 5 minutes, and extend that time by 1 minute on every iteration
scroll = Document.es('test', scroll: '5m')
p scroll.scroll_id # Outputs the scroll_id for this specific scrolling snapshot
puts "Found #{scroll.count} of #{scroll.total} documents"
scroll.each { |e| p e }
while (scroll = Document.es(scroll, scroll: '1m')) && !scroll.empty?
  puts "Found #{scroll.count} of #{scroll.total} documents"
  scroll.each { |e| p e }
end
Import
You can import the whole dataset, or specify a dataset to be imported. This will create a new, timestamped index for your dataset, and import all the records from that dataset into the index. An alias will be created (or updated) to point to the newly created index.
Document.import! # Import all the Document records. Use the default settings.
Document.import!(dataset: Document.where(active: true)) # Import all the active Document records
Document.import!(
  index: 'active-documents', # Use the active-documents index
  dataset: Document.where(active: true), # Only index active documents
  batch_size: 20 # Send documents to Elasticsearch in batches of 20 records
)
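The idea behind a timestamped index and batched import can be sketched in plain Ruby. This is not the gem's implementation: the timestamped_index helper, its naming format, and the stand-in records are assumptions made for illustration only.

```ruby
# Hypothetical naming scheme: the alias name plus a UTC timestamp suffix.
# The gem's actual format may differ.
def timestamped_index(alias_name, now = Time.now.utc)
  "#{alias_name}-#{now.strftime('%Y%m%d.%H%M%S')}"
end

# Stand-in rows; a real import would iterate over a Sequel dataset.
records = (1..45).map { |id| { id: id, title: "Document #{id}" } }

index = timestamped_index('active-documents')

# Split the records into batches of 20, mirroring the batch_size option.
batches = records.each_slice(20).to_a

puts "Importing #{records.size} records into #{index} in #{batches.size} batches"
```

Because every import writes to a fresh index and the alias is only repointed afterwards, searches against the alias keep hitting the previous index until the new one is ready.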
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/jrgns/sequel-elasticsearch.
Features that need to be built:
- An es method to search through the data on the cluster.
- Let es return an enumerator of Sequel::Model instances.
- A rake task to create or suggest mappings for a table.
License
The gem is available as open source under the terms of the MIT License.