ActiveTriples::Solrizer
Provides a default solr implementation under the ActiveTriples framework.
Installation
Add this line to your application's Gemfile:
gem 'active_triples-solrizer'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install active_triples-solrizer
Usage
Property definitions for ActiveTriples resources can be extended by adding a block to define indexing data type and modifiers (see table of supported values below).
property :title, :predicate => RDF::SCHEMA.title do |index|
index.data_type = :text # specify the data type of the field in solr. See (https://github.com/elrayle/active_triples-solrizer/blob/master/solr/schema.xml)[solr/schema.xml] for field type definitions.
index.as :indexed, :sortable # specify modifiers for the solr field
end
data_type | Notes |
---|---|
:text | tokenized text |
:text_en | tokenized English text |
:string | non-tokenized string |
:integer | |
:long | |
:double | |
:float | |
:boolean | |
:date | format for this date field is of the form 1995-12-31T23:59:59Z; Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z |
:coordinate | TBA - used to index the lat and long components for the "location" |
:location | TBA - latitude/longitude |
:guess | allow guessing of the type based on the type of the property value; NOTE: only checks the type of the first value when multiple values |
index.as modifiers | works with types | Notes |
---|---|---|
:indexed | all types except :coordinate | searchable, but not returned in solr doc unless also has :stored modifier |
:stored | all types except :coordinate | returned in solr doc, but not searchable unless also has :indexed modifier |
:multiValued | all types except :boolean, :coordinate | NOTE: if not specified and multiple values exist, only the first value is included in the solr doc |
:sortable | all types except :boolean, :coordinate, :location | numbers are stored as trie version of numeric type; :string, :text, :text_XX have an extra alphaSort field |
:range | all numeric types including :integer, :long, :float, :double, :date | optimize for range queries |
:vectored | valid for :text, :text_XX only |
NOTE: Modifiers placed on types that do not support the modifier are ignored.
Examples
Common prep code for all examples:
require 'active_triples'
require 'active_triples/solrizer'
# create an in-memory repository for ad-hoc testing
ActiveTriples::Repositories.add_repository :default, RDF::Repository.new
# configure the solr url
ActiveTriples::Solrizer.configure do |config|
config.solr_uri = "http://localhost:8983/solr/#/~cores/active_triples"
end
# create a DummyResource for ad-hoc testing
class DummyResource < ActiveTriples::Resource
configure :type => RDF::URI('http://example.org/SomeClass')
property :title, :predicate => RDF::SCHEMA.title do |index|
index.data_type = :text
index.as :indexed, :sortable
end
property :description_si, :predicate => RDF::SCHEMA.description do |index|
index.data_type = :text
index.as :stored, :indexed
end
property :borrower_uri_i, :predicate => RDF::SCHEMA.borrower do |index|
index.data_type = :string
index.as :indexed
end
property :clip_number_simr, :predicate => RDF::SCHEMA.clipNumber do |index|
index.data_type = :integer
index.as :stored, :indexed, :multiValued, :range
end
property :price_s, :predicate => RDF::SCHEMA.price do |index|
index.data_type = :float
index.as :stored
end
property :bookEdition, :predicate => RDF::SCHEMA.bookEdition # non-indexed property
end
# initialize solr service with defaults
ActiveTriples::Solrizer::SolrService.register
Example: Indexing Service to create solr document
# create a new resource
dr = DummyResource.new('http://www.example.org/dr')
dr.title = 'Test Title'
dr.description_si = 'Test text description stored and indexed.'
dr.borrower_uri_i = 'http://example.org/i/b2'
dr.clip_number_simr = [7,8,9,10]
dr.price_s = 789.01
dr.bookEdition = 'Ed. 2'
dr
# get solr doc
doc = ActiveTriples::Solrizer::IndexingService.new(dr).generate_solr_document
# => {
# :id=>"http://www.example.org/dr",
# :at_model_ssi=>"DummyResource",
# :object_profile_ss=>expected_object_profile_short_all_values,
# :title_ti=>"Test Title",
# :title_ssort=>"Test Title",
# :description_si_tsi=>"Test text description stored and indexed.",
# :borrower_uri_i_si=>"http://example.org/i/b2",
# :clip_number_simr_itsim=>[7,8,9,10],
# :price_s_fs=>789.01
# }
# persist doc to solr
ActiveTriples::Solrizer::SolrService.add(doc)
ActiveTriples::Solrizer::SolrService.commit
Example: Profile Indexing Service to serialize/deserialize resource
# create a new resource with all properties having values
dr1 = DummyResource.new('http://www.example.org/dr1')
dr1.title = 'Test Title'
dr1.description_si = 'Test text description stored and indexed.'
dr1.borrower_uri_i = 'http://example.org/i/b2'
dr1.clip_number_simr = [7,8,9,10]
dr1.price_s = 789.01
dr1.bookEdition = 'Ed. 2'
dr1
# serialize resource into object profile
object_profile1 = ActiveTriples::Solrizer::ProfileIndexingService.new(dr1).export
# => '{"id":"http://www.example.org/dr1",'\
# '"title":["Test Title"],'\
# '"description_si":["Test text description stored and indexed."],'\
# '"borrower_uri_i":["http://example.org/i/b2"],'\
# '"clip_number_simr":[7,8,9,10],'\
# '"price_s":[789.01],'\
# '"bookEdition":["Ed. 2"]}'
# deserialize resource from object profile
dr1_filled = ActiveTriples::Solrizer::ProfileIndexingService.new().import( object_profile1, DummyResource )
dr1_filled.attributes
# => {"id"=>"http://www.example.org/dr2",
# "title"=>["Test Title"],
# "description_si"=>["Test text description stored and indexed."],
# "borrower_uri_i"=>["http://example.org/i/b2"],
# "clip_number_simr"=>[7, 8, 9, 10],
# "borrower_uri_i"=>[],
# "clip_number_simr"=>[],
# "price_s"=>[789.01],
# "bookEdition"=>["Ed. 2"]}
# create a new resource with some properties with unset values
dr2 = DummyResource.new('http://www.example.org/dr2')
dr2.title = 'Test Title'
dr2.description_si = 'Test text description stored and indexed.'
dr2.price_s = 789.01
dr2.bookEdition = 'Ed. 2'
dr2
# serialize resource into object profile
object_profile2 = ActiveTriples::Solrizer::ProfileIndexingService.new(dr2).export
# => '{"id":"http://www.example.org/dr1",'\
# '"title":["Test Title"],'\
# '"description_si":["Test text description stored and indexed."],'\
# '"borrower_uri_i":[],'\
# '"clip_number_simr":[],'\
# '"price_s":[789.01],'\
# '"bookEdition":["Ed. 2"]}'
# deserialize resource from object profile
dr2_filled = ActiveTriples::Solrizer::ProfileIndexingService.new().import( object_profile2, DummyResource )
dr2_filled.attributes
# => {"id"=>"http://www.example.org/dr2",
# "title"=>["Test Title"],
# "description_si"=>["Test text description stored and indexed."],
# "borrower_uri_i"=>[],
# "clip_number_simr"=>[],
# "price_s"=>[789.01],
# "bookEdition"=>["Ed. 2"]}
Example: Properties Indexing Service to generate solr fields based on property definitions
# NOTE re-use dr1 and dr2 from object profile examples
# generate property fields
property_fields1 = ActiveTriples::Solrizer::PropertiesIndexingService.new(dr1).export
# => {
# :title_ti=>"Test Title",
# :title_ssort=>"Test Title",
# :description_si_tsi=>"Test text description stored and indexed.",
# :borrower_uri_i_si=>"http://example.org/i/b2",
# :clip_number_simr_itsim=>[7,8,9,10],
# :price_s_fs=>789.01
# }
# generate property fields
property_fields2 = ActiveTriples::Solrizer::PropertiesIndexingService.new(dr2).export
# => {
# :title_ti=>"Test Title",
# :title_ssort=>"Test Title",
# :description_si_tsi=>"Test text description stored and indexed.",
# :price_s_fs=>789.01
# }
Development Notes:
- I would like to see this expand to support specification of facets.
- The location and coordinate field types have not been tested and do not have examples.
- Some of the code in solr_service.rb is untested. It was copied from ActiveFedora as is. Mentions in the code to querying have not been tested. Query code was not copied at the time this document was written.
Contributing
Please observe the following guidelines:
- Do your work in a feature branch based on
master
and rebase before submitting a pull request. - Write tests for your contributions.
- Document every method you add using YARD annotations. (Note: Annotations are sparse in the existing codebase, help us fix that!)
- Organize your commits into logical units.
- Don't leave trailing whitespace (i.e. run
git diff --check
before committing). - Use well formed commit messages.