GeoBlacklight Sidecar Images
Store local copies of remote imagery in GeoBlacklight.
- Requirements
- Installation
- Rake Tasks
- View Customization
- Development
Description
This GeoBlacklight plugin captures remote images from geographic web services and saves them locally. It borrows the SolrDocumentSidecar concept from Spotlight: an ActiveRecord-based "sidecar" that pairs with each non-ActiveRecord SolrDocument. This lets us use Active Storage to attach images to our Solr documents.
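The sidecar idea can be sketched in plain Ruby, without Rails: a read-only document from Solr is paired with a separately persisted record keyed by the document id, and that record holds the mutable state (such as an attached image). The names `Document`, `Sidecar`, `SidecarStore`, and `find_or_create_for` are hypothetical stand-ins for SolrDocument, SolrDocumentSidecar, and the database table; the plugin itself uses ActiveRecord and Active Storage.

```ruby
# Hypothetical, plain-Ruby illustration of the "sidecar" pattern.
# Document stands in for a read-only SolrDocument (not ActiveRecord).
Document = Struct.new(:id, :title, keyword_init: true)

# Sidecar stands in for the ActiveRecord-backed SolrDocumentSidecar;
# it holds mutable data (here, an image path) for one document.
Sidecar = Struct.new(:document_id, :image_path, keyword_init: true) do
  def image_attached?
    !image_path.nil?
  end
end

# SidecarStore stands in for the database table, keyed by document id.
class SidecarStore
  def initialize
    @rows = {}
  end

  # Return the sidecar for a document, creating an empty one on first access.
  def find_or_create_for(document)
    @rows[document.id] ||= Sidecar.new(document_id: document.id)
  end

  def count
    @rows.size
  end
end

store = SidecarStore.new
doc = Document.new(id: "stanford-cz128vq0535", title: "Example layer")

sidecar = store.find_or_create_for(doc)
puts sidecar.image_attached?                        # prints false
sidecar.image_path = "storage/cz128vq0535.png"      # hypothetical path
puts store.find_or_create_for(doc).image_attached?  # prints true (same sidecar)
puts store.count                                    # prints 1
```

The Solr document never changes shape; everything the plugin needs to persist lives on the sidecar, looked up by document id.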
Requirements
Suggested
- Background Job Processor
Sidekiq is an excellent choice if you need an opinion.
Installation
Existing GeoBlacklight Instance
GeoBlacklight v4 with Aardvark metadata: add the gem to your Gemfile.
gem "geoblacklight_sidecar_images", "~> 1.0"
GeoBlacklight v3 with GBL v1.0 metadata: add the gem to your Gemfile.
gem "geoblacklight_sidecar_images", "~> 0.9.1", "< 1.0"
Run the generator.
$ bin/rails generate geoblacklight_sidecar_images:install
Run the database migration.
$ bin/rails db:migrate
Complete any necessary Active Storage setup steps, for example:
- Add a config/storage.yml file
local:
  service: Disk
  root: <%= Rails.root.join("storage") %>
- Set the Active Storage service in each config/environments file, for example in development.rb:
# Store uploaded files on the local file system (see config/storage.yml for options)
config.active_storage.service = :local
New GeoBlacklight Instance
Create a new GeoBlacklight instance with GeoBlacklight Sidecar Images (GBLSI) included, using the application template:
$ rails new app-name -m https://raw.githubusercontent.com/geoblacklight/geoblacklight_sidecar_images/develop/template.rb
Ingest Test Documents
# Run your GBL instance
bundle exec rake geoblacklight:server
# Index the GBL test fixtures
bundle exec rake gblsci:sample_data:seed
Rake tasks
Harvest images
Harvest all images
Spawns background jobs to harvest images for all documents in your Solr index.
bundle exec rake gblsci:images:harvest_all
Harvest an individual image
Harvests the image for a single document. Pass the document id in the DOC_ID environment variable.
DOC_ID='stanford-cz128vq0535' bundle exec rake gblsci:images:harvest_doc_id
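Rake tasks receive values like DOC_ID through `ENV`. The helper below is a hypothetical sketch (`doc_id_from_env` is not part of the plugin) of reading and validating that variable so a missing value fails with a clear message instead of an empty lookup.

```ruby
# Hypothetical helper: read and validate the DOC_ID environment variable.
# Accepts any Hash-like object so it can be exercised without a real ENV.
def doc_id_from_env(env = ENV)
  id = env.fetch("DOC_ID", "").strip
  if id.empty?
    raise ArgumentError, "DOC_ID is required, e.g. DOC_ID='stanford-cz128vq0535'"
  end
  id
end

puts doc_id_from_env({ "DOC_ID" => " stanford-cz128vq0535 " })
# prints stanford-cz128vq0535
```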
Retry incomplete harvests
Re-attempt image harvesting for every sidecar that is not in the succeeded state.
bundle exec rake gblsci:images:harvest_retry
Check image states
bundle exec rake gblsci:images:harvest_states
We use a state machine library to track success/failure of our harvest tasks. The states we track are:
- initialized - SolrDocumentSidecar created, no harvest attempt run
- queued - Harvest attempt queued as background job
- processing - Harvest attempt at work
- succeeded - Harvest was successful, image attached
- failed - Harvest failed, no image attached, error logged
- placeheld - Harvest was not successful, placeholder imagery will be used
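The states above can be modeled as a small transition table. This is a plain-Ruby sketch, not the plugin's implementation: the plugin relies on a state machine library, and the allowed transitions below (including failed and placeheld sidecars returning to queued on retry) are assumptions made for illustration. `TRANSITIONS` and `HarvestState` are hypothetical names.

```ruby
# Hypothetical transition table for the harvest states listed above.
# The real allowed transitions are defined by the plugin; these are assumed.
TRANSITIONS = {
  initialized: %i[queued],
  queued:      %i[processing],
  processing:  %i[succeeded failed placeheld],
  succeeded:   [],
  failed:      %i[queued], # assumed: a retry re-queues failed harvests
  placeheld:   %i[queued]  # assumed: a retry re-queues placeheld harvests
}.freeze

class HarvestState
  attr_reader :current_state, :history

  def initialize
    @current_state = :initialized
    @history = [:initialized]
  end

  def can_transition_to?(state)
    TRANSITIONS.fetch(@current_state).include?(state)
  end

  # Move to a new state, refusing transitions the table does not allow.
  def transition_to!(state)
    unless can_transition_to?(state)
      raise ArgumentError, "cannot go from #{@current_state} to #{state}"
    end
    @current_state = state
    @history << state
  end
end

state = HarvestState.new
%i[queued processing placeheld].each { |s| state.transition_to!(s) }
puts state.current_state                  # prints placeheld
puts state.can_transition_to?(:succeeded) # prints false
```

Recording every transition, rather than only the current state, is what makes questions like "what was the last error?" answerable later.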
Example Rails console queries:
# Find all sidecars in a given state
SolrDocumentSidecar.in_state(:succeeded) => [#<SolrDocumentSidecar:0x0000000170697960 ... ]
# Inspect a single sidecar
sidecar = SolrDocumentSidecar.in_state(:placeheld).first
sidecar.image.attached? => false
sidecar.image_state.current_state => "placeheld"
sidecar.image_state.last_transition => #<SidecarImageTransition id: 207, to_state: "placeheld", metadata: {"solr_doc_id"=>"stanford-cg357zz0321", "solr_version"=>1616509329754554368, "placeheld"=>true, "viewer_protocol"=>"wms", "image_url"=>"http://geowebservices-restricted.stanford.edu/geoserver/wms/reflect?&FORMAT=image%2Fpng&TRANSPARENT=TRUE&LAYERS=druid:cg357zz0321&WIDTH=300&HEIGHT=300", "service_url"=>"http://geowebservices-restricted.stanford.edu/geoserver/wms/reflect?&FORMAT=image%2Fpng&TRANSPARENT=TRUE&LAYERS=druid:cg357zz0321&WIDTH=300&HEIGHT=300", "gblsi_thumbnail_uri"=>false, "error"=>"Faraday::Error::ConnectionFailed"},...>
Destroy images
Remove everything
Remove all sidecar objects and attached images
bundle exec rake gblsci:images:harvest_purge_all
Remove orphaned ActiveRecord objects
Remove sidecar objects and their attached images when the corresponding Solr document no longer exists.
bundle exec rake gblsci:images:harvest_purge_orphans
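Finding orphans amounts to a set difference between the document ids stored on sidecars and the ids still present in Solr. A minimal sketch, with plain arrays standing in for the sidecar table and the Solr index; `orphaned_ids` is a hypothetical name and the id lists are made up for illustration.

```ruby
require "set"

# Ids recorded on sidecar rows (stand-in for the sidecar table).
sidecar_doc_ids = %w[stanford-cz128vq0535 stanford-cg357zz0321 example-doc-3]

# Ids currently in the Solr index (stand-in for a Solr query).
solr_doc_ids = Set.new(%w[stanford-cz128vq0535 example-doc-3])

# A sidecar is orphaned when its document id no longer exists in Solr.
def orphaned_ids(sidecar_ids, solr_ids)
  sidecar_ids.reject { |id| solr_ids.include?(id) }
end

orphans = orphaned_ids(sidecar_doc_ids, solr_doc_ids)
puts orphans.inspect # prints ["stanford-cg357zz0321"]
```

Using a `Set` for the Solr ids keeps each membership check constant time, which matters when the index holds many documents.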
Remove a batch
Remove the sidecar objects and attached images for the document ids listed in a CSV file
bundle exec rake gblsci:images:harvest_destroy_batch
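The batch task works from a CSV list of document ids. How the task locates the file and whether it expects a header row are not covered here, so the sketch below assumes a header named `id` (hypothetical) and shows one way to read, clean, and de-duplicate ids with Ruby's standard `CSV` library before deleting anything. `ids_from_csv` is likewise a hypothetical name.

```ruby
require "csv"

# Assumed input format: a header row "id" followed by one document id per row.
csv_text = <<~CSV
  id
  stanford-cz128vq0535
  stanford-cg357zz0321

  stanford-cz128vq0535
CSV

# Read ids, skipping blank rows and duplicates, and stripping whitespace.
def ids_from_csv(text)
  CSV.parse(text, headers: true, skip_blanks: true)
     .map { |row| row["id"].to_s.strip }
     .reject(&:empty?)
     .uniq
end

ids = ids_from_csv(csv_text)
puts ids.inspect # prints ["stanford-cz128vq0535", "stanford-cg357zz0321"]
```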
Troubleshooting
Harvest report
Generate a CSV file of sidecar objects and associated image state. Useful for debugging problem items.
bundle exec rake gblsci:images:harvest_report
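A report like this boils down to one CSV row per sidecar. The sketch below writes such a report with Ruby's standard `CSV` library; the column names, the `Record` struct, and the sample rows are hypothetical, and the plugin's actual report columns may differ.

```ruby
require "csv"

# Hypothetical sidecar summaries: document id, current state, last error.
Record = Struct.new(:doc_id, :state, :error)

records = [
  Record.new("stanford-cz128vq0535", "succeeded", nil),
  Record.new("stanford-cg357zz0321", "placeheld", "Faraday::Error::ConnectionFailed")
]

# Build the report as a string; nil values become empty fields.
# Use File.write to save it to disk if needed.
report = CSV.generate do |csv|
  csv << %w[doc_id state error]
  records.each { |r| csv << [r.doc_id, r.state, r.error] }
end

puts report
```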
Failed state inspect
Prints details of sidecars in the failed state to stdout.
bundle exec rake gblsci:images:harvest_failed_state_inspect
Prioritize Solr Thumbnail Field URIs
If you add a thumbnail URI to your GeoBlacklight Solr documents...
Example Doc
{
...
"dc_format_s":"TIFF",
"dc_creator_sm":["Minnesota. Department of Highways."],
"thumbnail_path_ss":"https://umedia.lib.umn.edu/sites/default/files/imagecache/square300/reference/562/image/jpeg/1089695.jpg",
"dc_type_s":"Still image",
...
}
Then you can set GBLSI_THUMBNAIL_FIELD in your GeoBlacklight settings.yml file (read as Settings.GBLSI_THUMBNAIL_FIELD) to the name of that Solr field, thumbnail_path_ss in the example above. For any document in your index with a value in that field, the image is harvested from that URI instead of via IIIF or the other web services.
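The precedence rule can be expressed as a small function: use the configured Solr field when the document has a non-blank value for it, otherwise fall back to a service-derived image. This is a hypothetical sketch; `pick_image_source` and the `:service` fallback are illustrative names, not the plugin's API, and the document is modeled as a plain Hash.

```ruby
# Hypothetical sketch of the thumbnail-field precedence rule.
# thumbnail_field plays the role of Settings.GBLSI_THUMBNAIL_FIELD.
def pick_image_source(doc, thumbnail_field)
  uri = doc[thumbnail_field].to_s.strip if thumbnail_field
  if uri && !uri.empty?
    [:thumbnail_field, uri]
  else
    # Stand-in for retrieving an image via IIIF or another web service.
    [:service, nil]
  end
end

doc = {
  "dc_format_s" => "TIFF",
  "thumbnail_path_ss" => "https://umedia.lib.umn.edu/sites/default/files/imagecache/square300/reference/562/image/jpeg/1089695.jpg"
}

p pick_image_source(doc, "thumbnail_path_ss").first                  # prints :thumbnail_field
p pick_image_source({ "dc_format_s" => "TIFF" }, "thumbnail_path_ss").first # prints :service
```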
View customization
Use basic Active Storage patterns to display imagery in your application.
Example Methods
# Is there an image?
document.sidecar.image.attached?
# Can Active Storage generate variants (e.g. resized versions) of the image?
document.sidecar.image.variable?
# Example image_tag with resize
<%= image_tag document.sidecar.image.variant(resize_to_fit: [100, 100]), {class: 'media-object'} %>
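`resize_to_fit: [100, 100]` scales the image so it fits inside a 100x100 box while keeping its aspect ratio. The arithmetic is sketched below in plain Ruby as an approximation for planning layouts; the actual resizing and rounding are done by the image processing backend Active Storage is configured with, and `fit_dimensions` is a hypothetical helper.

```ruby
# Approximate the output size of a "resize to fit" operation:
# scale uniformly so both sides fit inside the box, keeping aspect ratio.
def fit_dimensions(width, height, max_width, max_height)
  scale = [max_width.fdiv(width), max_height.fdiv(height)].min
  [(width * scale).round, (height * scale).round]
end

p fit_dimensions(300, 200, 100, 100) # prints [100, 67]
p fit_dimensions(200, 400, 100, 100) # prints [50, 100]
```

The smaller of the two scale factors wins, so the longer side (relative to the box) lands exactly on the limit and the other side shrinks proportionally.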
Search results
This GBL plugin includes a custom catalog/_index_split_default.html.erb file; look there for examples of rendering the attached image.
Show pages
Example of adding a thumbnail to the show page sidebar.
catalog/_show_sidebar.html.erb
<%# Add to end of file %>
<% if @document.sidecar.image.attached? %>
  <% if @document.sidecar.image.variable? %>
    <div class="card">
      <div class="card-header">Thumbnail</div>
      <div class="card-body">
        <%= image_tag @document.sidecar.image.variant(resize_to_fit: [200, 200]), {class: 'mr-3'} %>
      </div>
    </div>
  <% end %>
<% end %>
Development
# Run test suite
bundle exec rake ci
# Launch test app server
cd .internal_test_app/
bundle exec rake geoblacklight:server
# Load test fixtures
bundle exec rake gblsci:sample_data:seed
# Run harvest
bundle exec rake gblsci:images:harvest_all
# Tail image service log file
tail -f log/image_service_development.log