Project

pubid

0.0
There's a lot of open issues
Gem including all pubid-* gems.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
 Dependencies

Development

~> 13.0
~> 3.0

Runtime

~> 0.2.4
~> 1.12.5
~> 0.1.0
~> 0.3.4
~> 0.1.3
~> 0.3.1
~> 0.2.2
~> 0.7.5
~> 0.1.0
~> 0.3.2
~> 0.4.0
 Project Readme

Publication identifiers library

Gem Version

Pubid is a comprehensive framework for building and working with publication identifiers. It provides a robust API for creating, modifying, parsing, exporting, and verifying document identifiers, which are crucial for various document management systems and standards compliance

This gem includes all pubid-* gems for compatibility testing and allows parsing all identifiers from one gem.

Current use cases

The current use cases, as listed below, are what we have needed until now and have discovered through use of pubid to pubid-, metanorma-, and relaton-* projects. Work on pubid is not complete and will be expanding during the ongoing integration process.

  • building identifiers from scratch

  • modifiying identifiers (e.g. basing an amendment identifier on a base identifier)

  • parsing string identifiers

  • exporting/importing to different formats or styles (hash, yaml, URN, etc.)

  • verification of compliance for the identifier’s representation standard (NIST PubID 1.0 for NIST, RFC 5141 for ISO URN), or using commonly found patterns

  • verifying that a provided identifier is correct

  • comparing two provided identifiers (with the option to exclude specified attributes)

  • converting legacy identifiers to updated format

Purpose and Usage Guidelines

The pubid library is intended to be used wherever operations on publication identifiers are required. It acts as an intermediary between users and publication identifiers, streamlining the processes of creation, modification, comparison, and attribute access. By centralizing these operations in pubid-core, we ensure that any updates to identifier standards are seamlessly integrated across all related projects in an ecosystem.

It is crucial to use the pubid library for all operations with identifiers to ensure that updates in identifier standards are consistently reflected in these operations, and are driven by a data representation of the identifier, rather than the identifier string. For example, if an identifier like ISO/R 657/IV is updated to ISO/R 657-4:1969 by a standards body, and pubid acts on that update, the library will automatically recognize this change. Hence, comparing:

Pubid::Registry.parse("ISO/R 657/IV") == Pubid::Registry.parse("ISO/R 657-4:1969")
=> true

or accessing the year:

Pubid::Registry.parse("ISO/R 657/IV").year
=> 1969

will yield the updated results.

Moreover, changes in how identifiers are compared may cause some identifiers to become equal or unequal based on the latest updates. Thus, to reflect these changes accurately, the pubid library should be used for all publication identifier operations.

Information model

Technical documents in general, and standards specifically, are routinely identified and cited by document identifier, rather than by author, title, and/or publication date, as is routine practice for other bibliographic types. These document identifiers are usually somewhat arbitrary, containing a number identifying the current document as one in a sequence of documents.

However, technical document identifiers typically also contain representation of at least some of the bibliographic information about the document. This is why bibliographic citation of technical document (particularly in other technical documents) is often restricted to just the document identifier and the title: the identifier is felt in such communities to convey all the relevant bibliographic information needed to make sense of the standard. (The expectations on what information is needed is different from other bibliographic types within their communities: authorship for example is typically corporate, dates are assumed by default to be the latest availabe edition, and the place of publication and publisher are identiified with the organisation publishing the document.)

Pubid uses semantic representation of the bibliographic information used to generate an identifier for an organisation. That is why the format of the identifier can be updated without changing the underlying semantics it encodes. Each organisation that Pubid supports has its own semantics, but there are recurring semantic fields that Pubid uses, which drive what the identifiers look like, and which correspond to bibliographic information. (The Relaton bibliographic model equivalent is linked after each definition.)

  • The publisher is the organisation responsible for the document. Standards-defining organisations, in particular, are identified by established acronyms, such as ISO, IEC, NIST, which are usually prefixed to the document identifier; that prefix takes the place of a publisher field. The choice of publisher determines the flavour of Pubid that is used. (Relaton: Creator, type: Publisher)

  • Technical documents are often published by multiple organisations. Such co-publication is indicated through using the acronyms of all organisations involved in the prefix; publishers after the first are indicated as copublisher. For example, the prefix ISO/IEC/IEEE indicates that the document is copublished by all three organisations; ISO will be treated as the primary publisher by Pubid. (Relaton: Creator, type: Publisher)

  • Technical documents are almost always numbered in a sequence; that number is used along with the publisher to identify the document uniquely. (Relaton: Series number)

  • Technical documents may consist of different parts, which can be referenced individually. The part number for the document is used in addition to the number in that case. (Relaton: Citation, type: Part)

  • In special circumstances, multiple subsequent versions of a document can appear with the same number; this applies for example to multiple drafts. These subsequent versions can be indicated as iteration. (Relaton: Edition: Version)

  • Often there is different numbering used for different types of technical document; for example, technical reports, technical notes, and standards have different number sequences in ISO. For that reason, the document type may be required in addition to the number. For inclusion in an identifier the organisation will typically have a standardised abbreviation for each of the document types. (Relaton: Series, though Relaton for standards organisations treats these for convenience as subtypes of document, the document type being standard)

  • In some cases, different numbering is used not for different types of document, but different defined series. If that is the case, the organisation will typically have a standardised abbreviation for the series as well. (Relaton: Series)

  • Some organisations publish standards in multiple languages. In that case, the document language can also be provided, to indicate a particular language version.

  • Technical documents are often cited undated; in that case, the latest version of the document is usually assumed. If a specific edition of a document needs to be specified in the edition, this can be done by supplying an edition number, the year of publication, or both, depending on the organisation. (Relaton: Edition, Date)

  • Unlike other documents, technical documents are often circulated and cited in draft version before they are officially published. If this is routinely done, the organisation may provide for means to specify that a document is a draft, or the stage or status which the document has reached in the authoring process. If stages are used in citing documents, the organisation will typically have standard abbreviations for those stages, which will be used in the identifier. (Relaton: Edition)

  • Just as documents can be bibliographically related to other documents, document identifiers can be based on other document identifiers. This is done in Pubid by specifying two Pubid identifiers, with the first treated as an attribute of the second. The first common case of derived identifiers is when one document is a supplement of another (e.g. an appendix, a corrigendum); this is encoded in Pubid by specifying a base Pubid identifier, as an attribute of the derived document, with its own distinct number, type (e.g. Appendix), and year. The second case is when a document published by one organisation is adopted for use by another organisation: this often is indicated in document identifiers simply by prefixing the adopting organisation’s acronym to the original identifier (e.g. BS ISO 639). (Relaton: Relations: updates, adoptedFrom)

Usage

Pubid::Registry#parse resolves the pubid class related to the identifier (Pubid::Iso::Identifier for "ISO" identifiers) and returns an object with the parsed identifier

require "pubid"

pubid = Pubid::Registry.parse("ISO/IEC 13213")
pubid.class
=> Pubid::Iso::Identifier::Base
pubid.publisher
=> "ISO"
pubid.copublisher
=> "IEC"
pubid.number
=> 13213
pubid.to_s
=> "ISO/IEC 13213"

You can find usage examples in the pubid-core repository. For more specific usage guides, refer to repositories related to specific identifier providers, such as pubid-iso for ISO identifiers and pubid-ccsds for CCSDS identifier)