Harvest DOR object metadata from a Stanford public purl page

Harvestdor


A gem to harvest data from a Stanford PURL page, with convenience methods that return Nokogiri::XML::Document objects and raise errors when pieces are missing.

Installation

Add this line to your application's Gemfile:

gem 'harvestdor'

And then execute:

$ bundle

Or install it yourself as:

$ gem install harvestdor

Usage

Configuration

Possible configuration options (with default values unless otherwise indicated)

client = Harvestdor::Client.new({ # Example with all possible options
  :log_dir => File.join(File.dirname(__FILE__), "..", "logs"),
  :log_name => 'harvestdor.log',
  :purl => 'https://purl.stanford.edu'
})

Option 1: use a yaml file

For the contents of the YAML file, see spec/config/example.yml.

client = Harvestdor::Client.new({:config_yml_path => path_to_my_yml})
client.mods('oo111oo2222')

Option 2: pass in non-default configurations as a hash

client = Harvestdor::Client.new({:purl => 'https://my_purl.org'})
client.mods('oo111oo2222')

Option 3: set the attributes explicitly in your code

client = Harvestdor::Client.new
client.config.purl = 'https://my_purl.org'
client.mods('oo111oo2222')
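As a rough illustration of how the three options can end up with the same setting, here is a minimal sketch that uses only the Ruby standard library. `SketchClient`, its `Config` struct, and the precedence order (defaults, then YAML file, then options hash) are assumptions made for this example. They are not taken from the Harvestdor source, so check the gem itself before relying on any of this behaviour.

```ruby
require 'yaml'
require 'tempfile'

# Hypothetical stand-in for illustration; this is NOT Harvestdor::Client's code.
class SketchClient
  Config = Struct.new(:log_dir, :log_name, :purl, keyword_init: true)

  # Default values copied from the option listing in this README.
  DEFAULTS = {
    log_dir: File.join(File.dirname(__FILE__), '..', 'logs'),
    log_name: 'harvestdor.log',
    purl: 'https://purl.stanford.edu'
  }.freeze

  attr_reader :config

  def initialize(options = {})
    options = options.transform_keys(&:to_sym)
    yml_path = options.delete(:config_yml_path)
    from_yml = yml_path ? load_yml(yml_path) : {}
    # Assumed precedence: defaults < YAML file < options hash.
    merged = DEFAULTS.merge(from_yml).merge(options)
    @config = Config.new(**merged.slice(*DEFAULTS.keys))
  end

  private

  # Read a flat YAML mapping and symbolize its keys.
  def load_yml(path)
    (YAML.safe_load(File.read(path)) || {}).transform_keys(&:to_sym)
  end
end

Tempfile.create(['example', '.yml']) do |yml|
  yml.write("purl: https://my_purl.org\n")
  yml.flush

  by_yaml = SketchClient.new(config_yml_path: yml.path)   # Option 1
  by_hash = SketchClient.new(purl: 'https://my_purl.org') # Option 2
  by_attr = SketchClient.new                              # Option 3
  by_attr.config.purl = 'https://my_purl.org'

  # Each of the three clients now holds the same purl value.
  puts [by_yaml, by_hash, by_attr].map { |c| c.config.purl }.inspect
end
```

Settings that are not overridden, such as `log_name`, keep their defaults in all three cases.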

XML from PURL pages

You can get, for example, the contentMetadata for a druid:

it "#content_metadata retrieves contentMetadata as a Nokogiri::XML::Document" do
  cm = Harvestdor.content_metadata('bb375wb8869', 'https://purl-test.stanford.edu')
  cm.should be_kind_of(Nokogiri::XML::Document)
  cm.root.name.should == 'contentMetadata'
  cm.root.attributes['objectId'].text.should == @druid
end

Or the MODS metadata:

it "#mods returns a Nokogiri::XML::Document from the purl mods" do
  x = Harvestdor.mods('bb375wb8869', 'https://purl-test.stanford.edu')
  x.should be_kind_of(Nokogiri::XML::Document)
  x.root.name.should == 'mods'
  x.root.namespace.href.should == Harvestdor::MODS_NAMESPACE
end

Similarly for

  • mods
  • public_xml (all of it)
  • content_metadata
  • identity_metadata
  • rights_metadata
  • rdf
  • dc

You can also do this from a Harvestdor::Client object, and it will use the purl from the Client.config:

client = Harvestdor::Client.new({purl: 'https://thisone.org'})
client.identity_metadata('bb375wb8869')
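To make the idea of pulling one piece out of the public XML, and failing loudly when it is absent, more concrete, here is a small offline sketch. It makes several assumptions: the `<purl>/<druid>.xml` URL pattern, the `MissingPieceError` class, the helper names, and the sample XML are all made up for illustration. It uses REXML from the standard distribution so it runs without any gems, whereas Harvestdor itself returns Nokogiri::XML::Document objects. Namespaced pieces such as rdf and dc would need extra handling that is not shown here.

```ruby
require 'rexml/document'

# Hypothetical error class for this sketch; not the gem's own error hierarchy.
class MissingPieceError < StandardError; end

# Assumption: the public XML for a druid lives at "<purl>/<druid>.xml".
def public_xml_url(druid, purl = 'https://purl.stanford.edu')
  "#{purl.chomp('/')}/#{druid}.xml"
end

# Return the named child element of the public XML root, or raise if absent.
def extract_piece(public_xml, element_name)
  doc = REXML::Document.new(public_xml)
  piece = doc.root && doc.root.elements[element_name]
  raise MissingPieceError, "no #{element_name} in public XML" if piece.nil?
  piece
end

# Made-up sample; real public XML carries far more content.
SAMPLE_PUBLIC_XML = <<~XML
  <publicObject id="druid:bb375wb8869">
    <identityMetadata><objectId>druid:bb375wb8869</objectId></identityMetadata>
    <contentMetadata objectId="bb375wb8869" type="image"/>
  </publicObject>
XML

puts public_xml_url('bb375wb8869', 'https://purl-test.stanford.edu')

cm = extract_piece(SAMPLE_PUBLIC_XML, 'contentMetadata')
puts cm.name
puts cm.attributes['objectId']

begin
  extract_piece(SAMPLE_PUBLIC_XML, 'rightsMetadata')
rescue MissingPieceError => e
  puts "missing piece: #{e.message}"
end
```

Raising a dedicated error instead of returning nil lets a harvesting script separate "this object has no rightsMetadata" from an unrelated failure.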

Contributing

  • Fork it
  • Create your feature branch (git checkout -b my-new-feature)
  • Write code and tests.
  • Commit your changes (git commit -am 'Added some feature')
  • Push to the branch (git push origin my-new-feature)
  • Create new Pull Request

Releases

  • 0.0.14 Bug fix for compatibility with jruby
  • 0.0.13 Updated to work with Faraday 0.9, releases via rubygems instead of sul-gems
  • 0.0.11 better error handling, and better testing for errors
  • 0.0.10 tweak specs to test that unnecessary fetching isn't done.
  • 0.0.9 allows public xml to be passed as Nokogiri::XML::Document to content_metadata, etc. to avoid unnecessary fetching
  • 0.0.8 avoid undefined method 'size' from scrub_oai_args when using a non-nil default date param
  • 0.0.7 add oai client timeout overrides, update README
  • 0.0.6 refactoring oai_harvest for greater simplicity and passing errors through, add oai_record (get_record OAI request)
  • 0.0.5 don't send empty string arguments to OAI server so you can get actual results
  • 0.0.4 add integration spec and get it working with actual OAI server
  • 0.0.3 add method to get mods from purl
  • 0.0.2 tidy up README
  • 0.0.1 initial commit