Library of Congress MODS / MADS parser

Library of Congress MODS in Ruby

Purpose

This is a class-oriented Ruby library that parses Library of Congress (LOC) MODS data.

This gem is developed against the MODS 3.7 XSD.

Usage

Ruby API

require 'loc_mods'

# Single record under `<modsCollection>`
LocMods::Collection.from_xml(File.read("spec/fixtures/record_1.xml"))

# Full NIST Tech Pubs records
# https://github.com/usnistgov/NIST-Tech-Pubs/tree/nist-pages/xml
LocMods::Collection.from_xml(File.read("reference/allrecords-MODS.xml"))

Command line interface

LocMods provides a command-line interface (CLI) for various operations.

The main executable is loc-mods.

Commands:
  loc-mods detect-duplicates PATH...  # Detect duplicate records in MODS XML files or directories
  loc-mods help [COMMAND]             # Describe available commands or one specific command

Detect duplicates

The detect-duplicates command finds duplicate MODS records by using each record's DOI (Digital Object Identifier) as its "primary ID".

Note
The library assumes that every record has a DOI. If that is not the case, another way of setting the primary ID needs to be defined.
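
One way such a fallback could look is sketched below. This is an illustration only and not part of loc_mods: the `Record` struct, its fields, and `primary_key_for` are hypothetical names, and falling back to a report number is an assumption made for the example.

```ruby
# Hypothetical stand-in for a parsed MODS record; NOT the loc_mods class.
Record = Struct.new(:doi, :report_number, :title, keyword_init: true)

# Returns a [kind, value] pair identifying the record, preferring the DOI.
# Falls back to the report number (an assumed alternative key), and returns
# nil when neither value is present.
def primary_key_for(record)
  doi = record.doi.to_s.strip
  # DOIs are case-insensitive, so normalise case before comparing them.
  return [:doi, doi.downcase] unless doi.empty?

  number = record.report_number.to_s.strip
  return [:report_number, number] unless number.empty?

  nil
end
```

Returning the kind together with the value keeps a DOI and a report number from colliding if they happen to share the same string.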

Usage:

Usage:
  loc-mods detect-duplicates PATH...

Options:
  [--show-unchanged], [--no-show-unchanged] # Show unchanged attributes in the diff output
                                            # Default: false
  [--highlight-diff], [--no-highlight-diff] # Highlight only the differences
                                            # Default: false
  [--color=COLOR]                           # Use colors in the diff output (auto, on, off)
                                            # Default: auto
                                            # Possible values: auto, on, off

$ loc-mods detect-duplicates [OPTIONS] <file_or_directory_path>

Options:

--show-unchanged

(default: false) Show attributes of both records even when they do not differ.

--highlight-diff

(default: false) Highlight values only when they differ between two records.

--color=COLOR

(default: auto) Use colors in the diff output. Values:

auto

the CLI will detect whether the terminal supports colors and display with colors if it does.

on

the CLI will always display with colors.

off

the CLI will never display with colors.
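
The three color modes can be sketched in a few lines of Ruby. This is a minimal illustration of the behaviour described above, not the gem's actual implementation: `use_color?` and `colorize` are hypothetical helpers, and treating "auto" as "the output stream is a TTY" is an assumption about how terminal detection might be done.

```ruby
# Decides whether ANSI colors should be emitted for a given mode.
# mode is "auto", "on", or "off"; io is the stream being written to.
def use_color?(mode, io = $stdout)
  case mode.to_s
  when "on"   then true
  when "off"  then false
  # Assumption: "auto" means "color only when writing to a terminal".
  when "auto" then io.respond_to?(:tty?) && io.tty?
  else raise ArgumentError, "unknown color mode: #{mode.inspect}"
  end
end

# Wraps text in an ANSI color code (e.g. 31 for red) when enabled,
# and returns it unchanged otherwise.
def colorize(text, code, enabled)
  enabled ? "\e[#{code}m#{text}\e[0m" : text
end
```

With this shape, piping the output into a file under "auto" yields plain text, because the stream is no longer a terminal.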

Example:

$ loc-mods detect-duplicates /path/to/mods/files

This command will:

  1. Search for MODS XML files in the specified directory (and subdirectories if -r is used).

  2. Parse each MODS file and extract the DOI.

  3. Group records with the same DOI.

  4. For each group of duplicates:

    1. Display the shared DOI.

    2. List the filenames of the duplicate records.

    3. Show a detailed comparison of the differences between the records.
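
The grouping and comparison steps above can be sketched in plain Ruby. This is a simplified illustration under stated assumptions, not the code loc-mods runs: `ParsedRecord`, `duplicate_groups`, `attribute_diff`, and `report` are hypothetical names, and each record is assumed to have already been parsed into a flat hash of attributes.

```ruby
# Hypothetical flattened view of a parsed record: source file, DOI, and a
# hash of attribute name => value. The real loc_mods objects are richer.
ParsedRecord = Struct.new(:file, :doi, :attributes, keyword_init: true)

# Step 3: group records by DOI, keeping only DOIs seen more than once.
def duplicate_groups(records)
  records.group_by(&:doi).select { |_doi, group| group.size > 1 }
end

# Step 4.3: compare two attribute hashes. Returns key => [left, right] for
# every key whose values differ; a key missing on one side shows up as nil.
def attribute_diff(left, right)
  (left.keys | right.keys).each_with_object({}) do |key, diff|
    diff[key] = [left[key], right[key]] unless left[key] == right[key]
  end
end

# Steps 4.1 to 4.3: print the shared DOI, the files involved, and how each
# later record differs from the first record in its group.
def report(records, out: $stdout)
  duplicate_groups(records).each do |doi, group|
    out.puts "DOI: #{doi}"
    group.each { |record| out.puts "  #{record.file}" }
    first, *rest = group
    rest.each do |other|
      attribute_diff(first.attributes, other.attributes).each do |key, (l, r)|
        out.puts "  #{key}: #{l.inspect} != #{r.inspect}"
      end
    end
  end
end
```

Comparing every record against the first one in its group keeps the output linear in the group size; a pairwise comparison would be more thorough but noisier.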

The output will highlight differences, removed elements, and missing elements between the duplicate records, helping you identify discrepancies in the metadata.

Testing

bin/update-nist-mods

License

Copyright Ribose.