Project

dwc_agent

0.01
There are a lot of open issues
A long-lived project that still receives updates
Parses the typically messy content in Darwin Core terms that contain people's names
Dependencies

Development

~> 12
~> 3.4

Runtime

Project Readme

DwC Agent

Ruby 3.0 gem that cleanses messy Darwin Core terms such as recordedBy or identifiedBy before passing them to the Namae gem, on which it depends and which performs the actual parsing. It also produces similarity scores between two given names.


Usage

require "dwc_agent"
names = DwcAgent.parse '13267 (male) W.J. Cody; 13268 (female) W.E. Kemp'
# => [#<struct Namae::Name family="Cody", given="W.J.", suffix=nil, particle=nil, dropping_particle=nil, nick=nil, appellation=nil, title=nil>,
#     #<struct Namae::Name family="Kemp", given="W.E.", suffix=nil, particle=nil, dropping_particle=nil, nick=nil, appellation=nil, title=nil>]
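The exact cleansing rules live inside the gem. As a rough, hypothetical illustration of the kind of pre-processing described above, the stdlib-only sketch below splits a recordedBy-style string on semicolons or vertical bars (the list separator Darwin Core recommends) and strips parenthetical notes and bare numbers. The method name toy_preclean is invented for this example, and its regular expressions are assumptions that are far simpler than what dwc_agent actually does.

```ruby
# Hypothetical, heavily simplified pre-cleaner; NOT dwc_agent's actual rules.
def toy_preclean(raw)
  raw.split(/\s*[;|]\s*/).map do |chunk|
    text = chunk.gsub(/\([^)]*\)/, " ") # drop parenthetical notes such as "(male)"
    text = text.gsub(/\b\d+\b/, " ")    # drop bare numbers such as "13267"
    text.squeeze(" ").strip             # collapse the leftover runs of spaces
  end.reject(&:empty?)
end

p toy_preclean("13267 (male) W.J. Cody; 13268 (female) W.E. Kemp")
# => ["W.J. Cody", "W.E. Kemp"]
```

Each surviving chunk is the sort of string a name parser such as Namae can then split into given and family parts.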

Parsing is occasionally imperfect, so it is advisable to pass each parsed name through the additional clean method.

require "dwc_agent"
names = DwcAgent.parse 'Chaboo, Bennett, Shin'
# => [#<struct Namae::Name family=nil, given="Chaboo", suffix=nil, particle=nil, dropping_particle=nil, nick=nil, appellation=nil, title=nil>,
#     #<struct Namae::Name family=nil, given="Bennett", suffix=nil, particle=nil, dropping_particle=nil, nick=nil, appellation=nil, title=nil>,
#     #<struct Namae::Name family=nil, given="Shin", suffix=nil, particle=nil, dropping_particle=nil, nick=nil, appellation=nil, title=nil>]
DwcAgent.clean names[0]
# => #<struct Namae::Name family="Chaboo", given=nil, suffix=nil, particle=nil, dropping_particle=nil, nick=nil, appellation=nil, title=nil>
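In the example above, clean moves the lone word "Chaboo" from the given-name slot to the family-name slot, since a single bare word is more likely a surname. The sketch below imitates only that one correction. ToyName is a hypothetical two-field stand-in for Namae::Name and toy_clean is an invented helper; the gem's real clean method is not reproduced here.

```ruby
# Hypothetical stand-in for Namae::Name with only the members used here.
ToyName = Struct.new(:family, :given)

# Toy version of ONE correction a clean step might make (an assumption, not
# the gem's logic): a single bare word parsed as a given name is treated as a
# family name. Initials such as "W.J." contain periods and are left alone.
def toy_clean(name)
  lone_word = name.family.nil? && !name.given.nil? &&
              name.given.match?(/\A[[:alpha:]'-]+\z/)
  lone_word ? ToyName.new(name.given, nil) : name
end

fixed = toy_clean(ToyName.new(nil, "Chaboo"))
puts fixed.family     # => Chaboo
puts fixed.given.nil? # => true
```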

A cleaned name may have all of its attributes set to nil if it fails the gem's internal logic checks. A utility method, DwcAgent.default, lets you test for this case:

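require "dwc_agent"
names = DwcAgent.parse 'Chaboo, Bennett, Shin'
cleaned_name = DwcAgent.clean names[0]
if cleaned_name != DwcAgent.default
  # Do something with the Namae::Name attributes
else
  # Perhaps use your unparsed input some other way
end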
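The comparison against DwcAgent.default works because Namae::Name is a Ruby Struct (as the #<struct ...> output shows), and Struct#== compares values member by member, so a name whose attributes are all nil equals an empty instance. A minimal stdlib-only sketch, assuming DwcAgent.default is such an all-nil name; BlankableName and BLANK are hypothetical stand-ins for Namae::Name and DwcAgent.default:

```ruby
# Hypothetical stand-in for Namae::Name, reduced to two members.
BlankableName = Struct.new(:family, :given)

# Analogue of DwcAgent.default (assumption): every member is nil.
BLANK = BlankableName.new

parsed = [
  BlankableName.new("Chaboo", nil),
  BlankableName.new,                 # a name that failed the checks
  BlankableName.new("Kemp", "W.E.")
]

# Struct#== compares members, so all-nil names equal BLANK and are dropped.
usable = parsed.reject { |name| name == BLANK }
puts usable.map(&:family).inspect # => ["Chaboo", "Kemp"]
```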

There is also a similarity score that compares the structure of two given names. The higher the score, the more likely it is that the two names refer to the "same" name. For instance, "John C." scores 2 when compared to "John Charles" and 1.1 when compared to "John" alone, whereas it scores 0 when compared to "Joshua" or "John R.". When two names share the same family name, this utility method can be used to down-weight search results whose given names are unlikely to match.

require "dwc_agent"
score = DwcAgent.similarity_score('John C.', 'John')
# => 1.1
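To make the idea of a structural score concrete, here is a toy heuristic written only for illustration; it is not dwc_agent's algorithm, and the method names are invented. It aligns given-name tokens position by position, adds 1 for each compatible pair (identical words, or an initial matching a word with the same first letter), adds 0.1 when one name has an extra token, and returns 0 as soon as two tokens conflict. With these arbitrary weights it reproduces the four example scores quoted above, but the gem's real scoring may behave differently on other inputs.

```ruby
# Toy given-name similarity; a hypothetical heuristic, NOT dwc_agent's implementation.

# Split a given name into lowercase alphabetic tokens: "W.J." -> ["w", "j"].
def given_tokens(name)
  name.to_s.downcase.scan(/[[:alpha:]]+/)
end

# Tokens are compatible if identical, or if one is a single letter (an
# initial) matching the first letter of the other.
def compatible_tokens?(x, y)
  return true if x == y
  (x.length == 1 || y.length == 1) && x[0] == y[0]
end

def toy_given_similarity(a, b)
  ta = given_tokens(a)
  tb = given_tokens(b)
  score = 0.0
  [ta.length, tb.length].max.times do |i|
    x, y = ta[i], tb[i]
    if x.nil? || y.nil?
      score += 0.1 # one name has an extra token: weak support
    elsif compatible_tokens?(x, y)
      score += 1.0 # matching word or initial
    else
      return 0.0   # any conflicting token rules the pair out
    end
  end
  score.round(2)
end

puts toy_given_similarity("John C.", "John Charles") # => 2.0
puts toy_given_similarity("John C.", "John")         # => 1.1
puts toy_given_similarity("John C.", "Joshua")       # => 0.0
puts toy_given_similarity("John C.", "John R.")      # => 0.0
```

Because a single conflicting token zeroes the score in this toy, a 0 is a strong signal for the down-weighting use case, while the small 0.1 bonus keeps a bare "John" ranked below a fully matching "John Charles".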

Or, from the command line:

gem install dwc_agent
dwcagent "13267 (male) W.J. Cody; 13268 (female) W.E. Kemp"
# => [{"title":null,"appellation":null,"given":"W.J.","particle":null,"family":"Cody","suffix":null,"dropping_particle":null,"nick":null},{"title":null,"appellation":null,"given":"W.E.","particle":null,"family":"Kemp","suffix":null,"dropping_particle":null,"nick":null}]
dwcagent-similarity "John C." "John"
# => 1.1
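Because the dwcagent command prints JSON, its output is easy to consume from other programs. As a small sketch, Ruby's standard json library can read the sample output of the dwcagent example (pasted here as a string literal rather than captured from the command) and pull out the family names. In practice the string could instead be captured with Ruby's backtick operator or Open3.capture2, assuming dwcagent is on your PATH.

```ruby
require "json"

# Sample output copied from the dwcagent example in this README.
output = '[{"title":null,"appellation":null,"given":"W.J.","particle":null,"family":"Cody","suffix":null,"dropping_particle":null,"nick":null},{"title":null,"appellation":null,"given":"W.E.","particle":null,"family":"Kemp","suffix":null,"dropping_particle":null,"nick":null}]'

names = JSON.parse(output) # an Array of Hashes with String keys
families = names.map { |name| name["family"] }
puts families.inspect # => ["Cody", "Kemp"]
```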

License

dwc_agent is released under the MIT license.

Support

Bug reports can be filed at https://github.com/bionomia/dwc_agent/issues.

Copyright

Authors: David P. Shorthouse

Copyright (c) 2024