Ladder
Ladder is a dynamic framework for Linked Data modelling, persistence, and full-text indexing. It is implemented as a series of Ruby modules that can be used individually and incorporated within existing ActiveModel frameworks (eg. Project Hydra), or combined as a comprehensive stack.
Although conceptually similar to Spira, Ladder takes the opposite approach: instead of making RDF repositories (triple stores) behave like ActiveModel, it allows ActiveModel objects to behave like RDF resources.
Components
- Mongoid for persistence
- ActiveTriples for RDF handling
- ElasticSearch for full-text indexing
- ActiveJob for background job execution
History
Ladder was loosely conceived over the course of several years prior to 2011 as a way to encourage the GLAM community to think less dogmatically about established (often monolithic and/or niche) tools and instead embrace a broader vision of adopting more widely-used technologies.
In early 2012, Ladder began existence as an opportunity to escape from a decade of LAMP development and become familiar with Ruby. From 2012 to late 2013, a closed prototype was built under the auspices of Deliberate Data as a proof-of-concept to test the feasibility of the design. For those interested in the historical code, the original prototype branch is available, as is an experimental branch.
Installation
Add gem "ladder" to your Gemfile and run bundle.
Or install manually with gem install ladder.
Usage
- Resources
- Relations
- Dynamic Resources
- Files
- Indexing
- Indexing Resources
- Indexing Files
- Background Indexing
- Configuration
Resources
Ladder::Resources implement all the functionality of a Mongoid::Document and an ActiveTriples::Resource. To add Ladder integration for your model, require and include the main module in your class:
require 'ladder'
class Person
include Ladder::Resource
configure type: RDF::FOAF.Person
property :first_name, predicate: RDF::FOAF.name
property :description, predicate: RDF::DC.description
end
steve = Person.new(first_name: 'Steve', description: 'Funny-looking')
=> #<Person _id: 542f0c124169720ea0000000, first_name: {"en"=>"Steve"}, description: {"en"=>"Funny-looking"}>
steve.as_document
=> {"_id"=>BSON::ObjectId('542f0c124169720ea0000000'),
"first_name"=>{"en"=>"Steve"},
"description"=>{"en"=>"Funny-looking"}}
steve.as_jsonld
# => {
# "@context": {
# "dc": "http://purl.org/dc/terms/",
# "foaf": "http://xmlns.com/foaf/0.1/"
# },
# "@id": "http://example.org/people/542f0c124169720ea0000000",
# "@type": "foaf:Person",
# "dc:description": {
# "@language": "en",
# "@value": "Funny-looking"
# },
# "foaf:name": {
# "@language": "en",
# "@value": "Steve"
# }
# }
The #property method takes care of setting both Mongoid fields and ActiveTriples properties. Properties with a supplied class_name: will create a many-to-many relation.
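To make the idea of one macro feeding two layers more concrete, here is a minimal plain-Ruby sketch. It is not Ladder's source: SketchResource, SketchPerson, #as_triples and the string predicates are hypothetical stand-ins for the Mongoid and ActiveTriples machinery, and only show how a single #property call could register both a document field and an RDF predicate.

```ruby
# Hypothetical sketch, not Ladder's implementation: one macro call records
# both a document field and an RDF predicate under the same name.
module SketchResource
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Field names known to the (pretend) document store.
    def fields
      @fields ||= []
    end

    # Name => predicate URI mapping for the (pretend) RDF layer.
    def predicates
      @predicates ||= {}
    end

    def property(name, predicate:)
      fields << name
      predicates[name] = predicate
      attr_accessor name
    end
  end

  def initialize(attrs = {})
    attrs.each { |key, value| public_send("#{key}=", value) }
  end

  # Document-style view: field name => value.
  def as_document
    self.class.fields.to_h { |field| [field.to_s, public_send(field)] }
  end

  # RDF-style view: predicate URI => value.
  def as_triples
    self.class.predicates.to_h { |name, uri| [uri, public_send(name)] }
  end
end

class SketchPerson
  include SketchResource
  property :first_name, predicate: 'http://xmlns.com/foaf/0.1/name'
  property :description, predicate: 'http://purl.org/dc/terms/description'
end

steve = SketchPerson.new(first_name: 'Steve', description: 'Funny-looking')
p steve.as_document
p steve.as_triples
```

Both views read from the same attributes, so the field list and the predicate map cannot drift apart.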
By default, URIs are dynamically generated based on the name of the model class and the configured base URI. However, you can still set the base URI for a class explicitly just as you would in ActiveTriples, eg:
Person.base_uri
=> #<RDF::URI:0x3fecf69da274 URI:http://example.org/people>
Person.configure base_uri: 'http://some.other.uri/'
=> "http://some.other.uri/"
See the configuration section for more information on configuring default behaviour.
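As a rough illustration of how a URI could be derived from a class name and a base URI, consider the sketch below. The helper names and the tiny plural lookup table are assumptions made for this example; how Ladder actually inflects class names is not shown here.

```ruby
# Hypothetical helpers: a stand-in for real inflection logic.
IRREGULAR_PLURALS = { 'person' => 'people' }.freeze

def sketch_pluralize(word)
  IRREGULAR_PLURALS.fetch(word) { "#{word}s" }
end

# Join the base URI, the pluralized class name, and the document id.
def sketch_resource_uri(base_uri, class_name, id)
  segment = sketch_pluralize(class_name.downcase)
  [base_uri.chomp('/'), segment, id].join('/')
end

puts sketch_resource_uri('http://example.org', 'Person', '542f0c124169720ea0000000')
puts sketch_resource_uri('http://example.org/', 'Book', '542f28d44169721941000000')
```

A trailing slash on the base URI is tolerated, so both calls produce URIs shaped like the ones in the examples above.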
Relations
class Person
include Ladder::Resource
configure type: RDF::FOAF.Person
property :first_name, predicate: RDF::FOAF.name
property :books, predicate: RDF::FOAF.made, class_name: 'Book'
end
class Book
include Ladder::Resource
configure type: RDF::DC.BibliographicResource
property :title, predicate: RDF::DC.title
property :people, predicate: RDF::DC.creator, class_name: 'Person'
end
b = Book.new(title: 'Heart of Darkness')
=> #<Book _id: 542f28d44169721941000000, title: {"en"=>"Heart of Darkness"}, person_ids: nil>
b.people << Person.new(first_name: 'Joseph Conrad')
=> [#<Person _id: 542f28dd4169721941010000, first_name: {"en"=>"Joseph Conrad"}, book_ids: [BSON::ObjectId('542f28d44169721941000000')]>]
b.as_jsonld
# => {
# "@context": {
# "dc": "http://purl.org/dc/terms/"
# },
# "@id": "http://example.org/books/542f28d44169721941000000",
# "@type": "dc:BibliographicResource",
# "dc:creator": {
# "@id": "http://example.org/people/542f28dd4169721941010000"
# },
# "dc:title": {
# "@language": "en",
# "@value": "Heart of Darkness"
# }
# }
You'll notice that only the RDF node for the Book object on which #as_jsonld was called is serialized. To include the entire graph for related nodes, use the related: true option:
b.as_jsonld related: true
# => {
# "@context": {
# "dc": "http://purl.org/dc/terms/",
# "foaf": "http://xmlns.com/foaf/0.1/"
# },
# "@graph": [
# {
# "@id": "http://example.org/books/542f28d44169721941000000",
# "@type": "dc:BibliographicResource",
# "dc:creator": {
# "@id": "http://example.org/people/542f28dd4169721941010000"
# },
# "dc:title": {
# "@language": "en",
# "@value": "Heart of Darkness"
# }
# },
# {
# "@id": "http://example.org/people/542f28dd4169721941010000",
# "@type": "foaf:Person",
# "foaf:made": {
# "@id": "http://example.org/books/542f28d44169721941000000"
# },
# "foaf:name": {
# "@language": "en",
# "@value": "Joseph Conrad"
# }
# }
# ]
# }
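The difference between the two serializations can be sketched in plain Ruby. SketchNode and the short example URIs are hypothetical and only mimic the shape of the output: by default related nodes appear as "@id" references, while related: true wraps the node and its direct neighbours in an "@graph" array.

```ruby
# Hypothetical toy model of the related: option; not Ladder's serializer.
class SketchNode
  attr_reader :uri, :links

  def initialize(uri)
    @uri = uri
    @links = Hash.new { |hash, predicate| hash[predicate] = [] }
  end

  def link(predicate, node)
    @links[predicate] << node
    self
  end

  # A single node: related objects appear only as "@id" references.
  def as_node
    node = { '@id' => uri }
    links.each do |predicate, targets|
      node[predicate] = targets.map { |target| { '@id' => target.uri } }
    end
    node
  end

  # With related: true, directly linked nodes are serialized alongside.
  def as_jsonld(related: false)
    return as_node unless related

    neighbours = links.values.flatten.uniq
    { '@graph' => [as_node, *neighbours.map(&:as_node)] }
  end
end

book = SketchNode.new('http://example.org/books/1')
author = SketchNode.new('http://example.org/people/1')
book.link('dc:creator', author)
author.link('foaf:made', book)

p book.as_jsonld
p book.as_jsonld(related: true)
```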
If you want more control over how relations are defined (eg. in the case of embedded or 1:n relations), you can just use regular Mongoid and ActiveTriples syntax:
class Person
include Ladder::Resource
configure type: RDF::FOAF.Person
property :first_name, predicate: RDF::FOAF.name
embeds_one :address, class_name: 'Place'
property :address, predicate: RDF::FOAF.based_near
end
class Place
include Ladder::Resource
configure type: RDF::VCARD.Address
property :city, predicate: RDF::VCARD.locality
property :country, predicate: RDF::VCARD.send('country-name')
embedded_in :resident, class_name: 'Person', inverse_of: :address
property :resident, predicate: RDF::VCARD.agent
end
steve = Person.new(first_name: 'Steve')
=> #<Person _id: 542f341e41697219a2000000, first_name: {"en"=>"Steve"}, address: nil>
steve.address = Place.new(city: 'Toronto', country: 'Canada')
=> #<Place _id: 542f342741697219a2010000, city: {"en"=>"Toronto"}, country: {"en"=>"Canada"}, resident: nil>
steve.as_jsonld
# => {
# "@context": {
# "foaf": "http://xmlns.com/foaf/0.1/",
# "vcard": "http://www.w3.org/2006/vcard/ns#"
# },
# "@graph": [
# {
# "@id": "http://example.org/places/542f342741697219a2010000",
# "@type": "vcard:Address",
# "vcard:agent": {
# "@id": "http://example.org/people/542f341e41697219a2000000"
# },
# "vcard:country-name": {
# "@language": "en",
# "@value": "Canada"
# },
# "vcard:locality": {
# "@language": "en",
# "@value": "Toronto"
# }
# },
# {
# "@id": "http://example.org/people/542f341e41697219a2000000",
# "@type": "foaf:Person",
# "foaf:based_near": {
# "@id": "http://example.org/places/542f342741697219a2010000"
# },
# "foaf:name": {
# "@language": "en",
# "@value": "Steve"
# }
# }
# ]
# }
Note in this case that both objects are included in the RDF graph, thanks to embedded relations. This can be useful to avoid additional queries to the database for objects that are tightly coupled.
Dynamic Resources
In line with ActiveTriples' Open Model design, you can define properties on any Resource instance similarly to how you would on the class:
class Person
include Ladder::Resource::Dynamic
configure type: RDF::FOAF.Person
property :first_name, predicate: RDF::FOAF.name
end
steve = Person.new(first_name: 'Steve')
=> #<Person _id: 546669234169720397000000, first_name: {"en"=>"Steve"}>
steve.description
=> NoMethodError: undefined method 'description' for #<Person:0x007fb54eb1d0b8>
steve.property :description, predicate: RDF::DC.description
=> {:description=>"http://purl.org/dc/terms/description"}
steve.description = 'Funny-looking'
=> "Funny-looking"
steve.as_document
=> {"_id"=>BSON::ObjectId('546669234169720397000000'),
"first_name"=>{"en"=>"Steve"},
"_context"=>{:description=>"http://purl.org/dc/terms/description"},
"description"=>"Funny-looking"}
steve.as_jsonld
# => {
# "@context": {
# "dc": "http://purl.org/dc/terms/",
# "foaf": "http://xmlns.com/foaf/0.1/"
# },
# "@id": "http://example.org/people/546669234169720397000000",
# "@type": "foaf:Person",
# "dc:description": "Funny-looking",
# "foaf:name": {
# "@language": "en",
# "@value": "Steve"
# }
# }
Additionally, you can push RDF statements into a Resource instance like you would with ActiveTriples or RDF::Graph, noting that the subject is ignored since it is implicit:
steve << RDF::Statement(nil, RDF::DC.description, 'Tall, dark, and handsome')
steve << RDF::Statement(nil, RDF::FOAF.depiction, RDF::URI('http://some.image/pic.jpg'))
steve << RDF::Statement(nil, RDF::FOAF.age, 32)
steve.as_document
=> {"_id"=>BSON::ObjectId('546669234169720397000000'),
"first_name"=>{"en"=>"Steve"},
"_context"=>
{:description=>"http://purl.org/dc/terms/description",
:depiction=>"http://xmlns.com/foaf/0.1/depiction",
:age=>"http://xmlns.com/foaf/0.1/age"},
"description"=>"Tall, dark, and handsome",
"depiction"=>"http://some.image/pic.jpg",
"age"=>32}
steve.as_jsonld
# => {
# "@context": {
# "dc": "http://purl.org/dc/terms/",
# "foaf": "http://xmlns.com/foaf/0.1/",
# "xsd": "http://www.w3.org/2001/XMLSchema#"
# },
# "@id": "http://example.org/people/546669234169720397000000",
# "@type": "foaf:Person",
# "dc:description": "Tall, dark, and handsome",
# "foaf:age": {
# "@type": "xsd:integer",
# "@value": "32"
# },
# "foaf:depiction": {
# "@id": "http://some.image/pic.jpg"
# },
# "foaf:name": {
# "@language": "en",
# "@value": "Steve"
# }
# }
Dynamic properties cannot be localized. They can be any kind of literal, but they cannot be a related object. They can, however, contain the related object's URI.
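For readers curious how per-instance properties can work in Ruby at all, the following plain-Ruby sketch may help. SketchDynamicPerson is hypothetical and is not how Ladder::Resource::Dynamic is implemented; it only shows accessors being defined on one instance while a "_context" hash records the predicate for each dynamic name.

```ruby
# Hypothetical sketch of instance-level properties; not Ladder's code.
class SketchDynamicPerson
  attr_reader :context

  def initialize
    @context = {} # property name => predicate URI, mirrors "_context"
    @values = {}
  end

  # Define a reader/writer pair on this instance only.
  def property(name, predicate:)
    @context[name] = predicate
    define_singleton_method(name) { @values[name] }
    define_singleton_method("#{name}=") { |value| @values[name] = value }
    { name => predicate }
  end

  def as_document
    doc = {}
    doc['_context'] = @context unless @context.empty?
    @values.each { |name, value| doc[name.to_s] = value }
    doc
  end
end

steve = SketchDynamicPerson.new
other = SketchDynamicPerson.new
steve.property :description, predicate: 'http://purl.org/dc/terms/description'
steve.description = 'Funny-looking'

p steve.as_document
p other.respond_to?(:description)
```

Because the accessors live on the singleton class of steve, other instances of the same class are unaffected.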
Files
Files are bytestreams that store binary content using MongoDB's GridFS storage system. They are still identifiable by a URI, and contain technical metadata about the File's contents.
class Person
include Ladder::Resource
configure type: RDF::FOAF.Person
property :first_name, predicate: RDF::FOAF.name
property :thumbnails, predicate: RDF::FOAF.depiction, class_name: 'Image', inverse_of: nil
end
class Image
include Ladder::File
end
Because Files must be the target of a one-way relation, the inverse_of: nil option is required on the property (unless the one_sided_relations configuration option is set). Note that Files cannot be embedded.
steve = Person.new(first_name: 'Steve')
=> #<Person _id: 549d83c64169720b32010000, first_name: {"en"=>"Steve"}>
thumb = Image.new(file: open('http://some.image/pic.jpg'))
=> #<Image _id: 549d83c24169720b32000000>
steve.thumbnails << thumb
=> [#<Image _id: 549d83c24169720b32000000, >]
steve.as_jsonld
# => {
# "@context": {
# "foaf": "http://xmlns.com/foaf/0.1/"
# },
# "@id": "http://example.org/people/549d83c64169720b32010000",
# "@type": "foaf:Person",
# "foaf:depiction": {
# "@id": "http://example.org/images/549d83c24169720b32000000"
# },
# "foaf:name": {
# "@language": "en",
# "@value": "Steve"
# }
# }
steve.save
# ... File is stored to GridFS ...
=> true
Files have all the attributes of a GridFS file, and the stored binary content is accessed using #data.
thumb.reload
=> #<Image _id: 549d86184169720b6a000000, >
thumb.as_document
=> {"_id"=>BSON::ObjectId('549d86184169720b6a000000'),
"length"=>59709,
"chunkSize"=>4194304,
"uploadDate"=>2014-12-26 16:00:29 UTC,
"md5"=>"0d4a486e2cd71c51b7a92cfe96f29324",
"contentType"=>"image/jpeg",
"filename"=>"549d86184169720b6a000000/open-uri20141226-2922-u66ap6"}
thumb.length
=> 59709
thumb.data
=> # ... binary data ...
Indexing
Indexing Resources
You can index Resources for keyword searching by mixing in the Ladder::Searchable module:
class Person
include Ladder::Resource
include Ladder::Searchable
configure type: RDF::FOAF.Person
property :first_name, predicate: RDF::FOAF.name
property :description, predicate: RDF::DC.description
end
kimchy = Person.new(first_name: 'Shay', description: 'Real genius')
=> #<Person _id: 543b457b41697231c5000000, first_name: {"en"=>"Shay"}, description: {"en"=>"Real genius"}>
kimchy.save
=> true
results = Person.search 'shay'
# => #<Elasticsearch::Model::Response::Response:0x007fa2ca82a9f0
# @klass=[PROXY] Person,
# @search=
# #<Elasticsearch::Model::Searching::SearchRequest:0x007fa2ca830a58
# @definition={:index=>"people", :type=>"person", :q=>"shay"},
# @klass=[PROXY] Person,
# @params={}>>
results.count
=> 1
results.first._source
=> {"description"=>"Real genius", "first_name"=>"Shay"}
results.records.first == kimchy
=> true
When indexing, you can control how your model is stored in the index by calling #index_for_search and supplying a block that returns a serializable hash:
Person.index_for_search { as_jsonld }
=> :as_indexed_json
kimchy.as_indexed_json
# => {
# "@context": {
# "dc": "http://purl.org/dc/terms/",
# "foaf": "http://xmlns.com/foaf/0.1/"
# },
# "@id": "http://example.org/people/543b457b41697231c5000000",
# "@type": "foaf:Person",
# "dc:description": {
# "@language": "en",
# "@value": "Real genius"
# },
# "foaf:name": {
# "@language": "en",
# "@value": "Shay"
# }
# }
Person.index_for_search { as_qname }
=> :as_indexed_json
kimchy.as_indexed_json
# => {
# "dc": {
# "description": { "en": "Real genius" }
# },
# "foaf": {
# "name": { "en": "Shay" }
# },
# "rdf": {
# "type": "foaf:Person"
# }
# }
You can also index related objects as framed JSON-LD using #as_framed_jsonld or by using the related: true option with #as_qname or #as_jsonld. Related objects can be serialized by default using the :with_relations configuration option.
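The block passed to #index_for_search is evaluated against each instance, which is why bare calls such as as_jsonld resolve on the record. A minimal plain-Ruby sketch of that pattern follows. SketchSearchable, SketchRecord, #as_summary and #as_full are hypothetical names, and this is an assumption about the general technique rather than a copy of Ladder's code.

```ruby
# Hypothetical sketch of a class macro that turns a block into
# an #as_indexed_json instance method.
module SketchSearchable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # (Re)define #as_indexed_json so the block runs with the record as self.
    def index_for_search(&block)
      define_method(:as_indexed_json) { instance_exec(&block) }
    end
  end
end

class SketchRecord
  include SketchSearchable
  attr_reader :name, :description

  def initialize(name, description)
    @name = name
    @description = description
  end

  def as_summary
    { 'name' => name }
  end

  def as_full
    { 'name' => name, 'description' => description }
  end
end

kimchy = SketchRecord.new('Shay', 'Real genius')

SketchRecord.index_for_search { as_summary }
p kimchy.as_indexed_json

SketchRecord.index_for_search { as_full }
p kimchy.as_indexed_json
```

Calling the macro again simply replaces the method, so existing instances pick up the new serialization immediately.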
Indexing Files
Files that contain textual content (eg. HTML, PDF, ePub, DOC, etc) can be indexed when they are stored, again by mixing in the Ladder::Searchable module. This can be useful if you want to retrieve a File by searching for the textual content that it contains. Note that this requires the Mapper Attachments Plugin for Elasticsearch to be installed.
class OCR
include Ladder::File
include Ladder::Searchable
end
pdf = OCR.new(file: open('http://some.location/ocr.pdf'))
=> #<OCR _id: 54add77a4169721c23000000>
pdf.save
=> true
results = OCR.search 'Moomintroll'
# => #<Elasticsearch::Model::Response::Response:0x007fa2ca82a9f0
# @klass=[PROXY] OCR,
# @search=
# #<Elasticsearch::Model::Searching::SearchRequest:0x007fa2ca830a58
# @definition={:index=>"ocrs", :type=>"ocr", :q=>"Moomintroll"},
# @klass=[PROXY] OCR,
# @params={}>>
results.count
=> 1
results.records.first == pdf
=> true
results.records.first.as_document
=> {"_id"=>BSON::ObjectId('54add77a4169721c23000000'),
"length"=>12941,
"chunkSize"=>4194304,
"uploadDate"=>2015-01-08 01:03:54 UTC,
"md5"=>"831a47b953d6e11d17cee7de9abd73c4",
"contentType"=>"application/pdf",
"filename"=>"54add77a4169721c23000000/ocr.pdf"}
results.records.first.data
=> # ... binary data ...
Note the use of #records to access the Ladder::File instances directly (see the elasticsearch-model documentation for more information). However, if you want to get information about the file characteristics (including the extracted textual content), you can use a modified search query:
results = OCR.search 'Moomintroll', fields: '*'
# => #<Elasticsearch::Model::Response::Response:0x007fc36cadaa20
# @klass=[PROXY] OCR,
# @search=
# #<Elasticsearch::Model::Searching::SearchRequest:0x007fc36cadab10
# @definition={:index=>"ocrs", :type=>"ocr", :body=>{:query=>{:query_string=>{:query=>"Moomintroll"}}, :fields=>"*"}},
# @klass=[PROXY] OCR,
# @params={}>>
results.count
=> 1
results.first.fields
=> {
"file.content_type"=>["application/pdf"],
"file.keywords"=>[""],
"file"=>
["\nAnd so Moomintroll was helplessly thrown out into a strange and dangerous world and \ndropped up to his ears in the first snowdrift of his experience. It felt unpleasantly prickly \nto his velvet skin, but at the same time his nose caught a new smell. It was a more \nserious smell than any he had met before, and slightly frightening. But it made him wide \nawake and greatly interested.\n\n\n"],
"file.date"=>["2014-12-19T15:32:58Z"],
"file.title"=>["Untitled"]}
In this case, the #fields Hash contains all of the technical metadata obtained by Elasticsearch during indexing. Note that this is not the same as the metadata stored by GridFS (with the possible exception of content type). Finally, we can also provide contextual highlighting for search results by using a slightly more complex search query:
results = OCR.search query: { query_string: { query: 'his' } }, highlight: { fields: { file: {} } }
# => #<Elasticsearch::Model::Response::Response:0x007fd653dc8b48
# @klass=[PROXY] OCR,
# @search=
# #<Elasticsearch::Model::Searching::SearchRequest:0x007fd653dc8b48
# @definition={:index=>"ocrs", :type=>"ocr", :body=>{:query=>{:query_string=>{:query=>"his"}},
# :highlight=>{:fields=>{:file=>{}}}}},
# @klass=[PROXY] OCR,
# @params={}>>
results.count
=> 1
results.first.highlight.file.count
=> 2
results.first.highlight.file
=> [" <em>his</em> ears in the first snowdrift of <em>his</em> experience. It felt unpleasantly prickly \nto <em>his</em> velvet skin",
", but at the same time <em>his</em> nose caught a new smell. It was a more \nserious smell than any he had met"]
More information about highlighting queries is available in the Elasticsearch documentation.
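If you build highlighting queries often, it can help to assemble the request body in one place and inspect the JSON that would be sent. The sketch below uses only Ruby's standard json library; the helper name sketch_highlight_query is hypothetical, and the body mirrors the query_string and highlight structure used in the example above.

```ruby
require 'json'

# Hypothetical helper: build a query_string search body that also asks
# for highlighted fragments from a single field.
def sketch_highlight_query(terms, field)
  {
    query: { query_string: { query: terms } },
    highlight: { fields: { field => {} } }
  }
end

body = sketch_highlight_query('his', :file)
puts JSON.pretty_generate(body)
```

The resulting Hash has the same shape as the argument passed to OCR.search in the highlighting example.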
Background Indexing
In large-scale production environments, sending an HTTP request to Elasticsearch during the database transaction isn't optimal (especially for large Files), so Ladder uses ActiveJob to queue and process indexing operations in the background. Just use the Ladder::Searchable::Background module in your model:
class OCR
include Ladder::File
include Ladder::Searchable::Background
end
# ...
class Person
include Ladder::Resource
include Ladder::Searchable::Background
configure type: RDF::FOAF.Person
end
# ...
You'll also have to set the queue adapter in your application, depending on which backend you're using:
ActiveJob::Base.queue_adapter = :sidekiq
For more information on available queueing adapters and their features, see the ActiveJob documentation.
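To see why deferring the indexing call helps, here is a toy illustration that swaps ActiveJob for Ruby's built-in Queue and Thread. SketchBackgroundIndexer is hypothetical and much simpler than a real job backend: saving only enqueues a document, and a worker thread performs the (pretend) indexing later.

```ruby
# Hypothetical stand-in for a job queue, using only core Ruby classes.
class SketchBackgroundIndexer
  attr_reader :indexed

  def initialize
    @queue = Queue.new
    @indexed = []
    @worker = Thread.new do
      # Pop documents until the :stop sentinel arrives.
      while (doc = @queue.pop) != :stop
        @indexed << doc # stands in for the HTTP request to the search index
      end
    end
  end

  # Returns immediately; the caller does not wait for indexing.
  def enqueue(doc)
    @queue << doc
    true
  end

  # Drain the queue and wait for the worker to finish.
  def shutdown
    @queue << :stop
    @worker.join
  end
end

indexer = SketchBackgroundIndexer.new
indexer.enqueue({ 'id' => 1, 'first_name' => 'Shay' })
indexer.enqueue({ 'id' => 2, 'first_name' => 'Steve' })
indexer.shutdown
p indexer.indexed.map { |doc| doc['id'] }
```

A real queue adapter adds persistence, retries, and separate worker processes, none of which this sketch attempts.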
Configuration
Configuration options are set using Ladder::Config#settings, eg:
Ladder::Config.settings[:base_uri] = 'http://example.org'
=> "http://example.org"
Ladder::Config.settings
=> {:base_uri=>"http://example.org", :localize_fields=>false, :one_sided_relations=>false}
Ladder currently supports the following configuration options (defaults in parentheses):
- :base_uri ('urn:x-ladder'): Tells Ladder the base (root) URI to use for generating model URIs. For a Rack-based linked data application, this will typically be the HTTP(S) URL, eg. "http://some.domain/my_application/"
- :localize_fields (false): When set to true, Ladder will set fields defined using #property to be localized by default.
- :one_sided_relations (false): When set to true, Ladder will set relations defined using #property to be one-sided many-to-many relations. Otherwise, it will define has-and-belongs-to-many (HABTM) relations.
- :with_relations: When set to true, Ladder will always include directly related objects when serializing JSON-LD.
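A settings hash with defaults like the one described above can be sketched in a few lines of plain Ruby. SketchConfig is a hypothetical module, not Ladder::Config itself; it only includes the three options whose defaults are stated in this document.

```ruby
# Hypothetical sketch of a settings hash seeded from frozen defaults.
module SketchConfig
  DEFAULTS = {
    base_uri: 'urn:x-ladder',
    localize_fields: false,
    one_sided_relations: false
  }.freeze

  # Mutable copy of the defaults, created on first access.
  def self.settings
    @settings ||= DEFAULTS.dup
  end

  # Discard any overrides and return to the defaults.
  def self.reset!
    @settings = DEFAULTS.dup
  end
end

SketchConfig.settings[:base_uri] = 'http://example.org'
p SketchConfig.settings
```

Keeping the defaults frozen and handing out a copy means an override never alters the defaults themselves.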
Contributing
Anyone and everyone is welcome to contribute. Go crazy.
- Fork it ( https://github.com/ladder/ladder/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Authors
MJ Suhonos / mj@suhonos.ca
Acknowledgements
My biggest thanks to all the wonderful people who have shown interest and support for Ladder over the years.
Many thanks to Christopher Knight @NomadicKnight for ceding the "ladder" gem name. Check out his startup, Adventure Local / @advlo_.
License
Apache License Version 2.0 http://apache.org/licenses/LICENSE-2.0.txt