Publication identifiers library
Pubid is a comprehensive framework for building and working with publication identifiers. It provides a robust API for creating, modifying, parsing, exporting, and verifying document identifiers, which are crucial for various document management systems and standards compliance
This gem includes all pubid-* gems for compatibility testing and allows parsing all identifiers from one gem.
Current use cases
The current use cases, as listed below, are what we have needed until now and have discovered through use of pubid to pubid-, metanorma-, and relaton-* projects. Work on pubid is not complete and will be expanding during the ongoing integration process.
-
building identifiers from scratch
-
modifiying identifiers (e.g. basing an amendment identifier on a base identifier)
-
parsing string identifiers
-
exporting/importing to different formats or styles (hash, yaml, URN, etc.)
-
verification of compliance for the identifier’s representation standard (NIST PubID 1.0 for NIST, RFC 5141 for ISO URN), or using commonly found patterns
-
verifying that a provided identifier is correct
-
comparing two provided identifiers (with the option to exclude specified attributes)
-
converting legacy identifiers to updated format
Purpose and Usage Guidelines
The pubid library is intended to be used wherever operations on publication identifiers are required. It acts as an intermediary between users and publication identifiers, streamlining the processes of creation, modification, comparison, and attribute access. By centralizing these operations in pubid-core, we ensure that any updates to identifier standards are seamlessly integrated across all related projects in an ecosystem.
It is crucial to use the pubid library for all operations with identifiers to ensure that updates in identifier standards are consistently reflected in these operations, and are driven by a data representation of the identifier, rather than the identifier string. For example, if an identifier like ISO/R 657/IV
is updated to ISO/R 657-4:1969
by a standards body, and pubid acts on that update, the library will automatically recognize this change. Hence, comparing:
Pubid::Registry.parse("ISO/R 657/IV") == Pubid::Registry.parse("ISO/R 657-4:1969")
=> true
or accessing the year:
Pubid::Registry.parse("ISO/R 657/IV").year
=> 1969
will yield the updated results.
Moreover, changes in how identifiers are compared may cause some identifiers to become equal or unequal based on the latest updates. Thus, to reflect these changes accurately, the pubid library should be used for all publication identifier operations.
Information model
Technical documents in general, and standards specifically, are routinely identified and cited by document identifier, rather than by author, title, and/or publication date, as is routine practice for other bibliographic types. These document identifiers are usually somewhat arbitrary, containing a number identifying the current document as one in a sequence of documents.
However, technical document identifiers typically also contain representation of at least some of the bibliographic information about the document. This is why bibliographic citation of technical document (particularly in other technical documents) is often restricted to just the document identifier and the title: the identifier is felt in such communities to convey all the relevant bibliographic information needed to make sense of the standard. (The expectations on what information is needed is different from other bibliographic types within their communities: authorship for example is typically corporate, dates are assumed by default to be the latest availabe edition, and the place of publication and publisher are identiified with the organisation publishing the document.)
Pubid uses semantic representation of the bibliographic information used to generate an identifier for an organisation. That is why the format of the identifier can be updated without changing the underlying semantics it encodes. Each organisation that Pubid supports has its own semantics, but there are recurring semantic fields that Pubid uses, which drive what the identifiers look like, and which correspond to bibliographic information. (The Relaton bibliographic model equivalent is linked after each definition.)
-
The
publisher
is the organisation responsible for the document. Standards-defining organisations, in particular, are identified by established acronyms, such as ISO, IEC, NIST, which are usually prefixed to the document identifier; that prefix takes the place of a publisher field. The choice of publisher determines the flavour of Pubid that is used. (Relaton: Creator, type: Publisher) -
Technical documents are often published by multiple organisations. Such co-publication is indicated through using the acronyms of all organisations involved in the prefix; publishers after the first are indicated as
copublisher
. For example, the prefix ISO/IEC/IEEE indicates that the document is copublished by all three organisations; ISO will be treated as the primary publisher by Pubid. (Relaton: Creator, type: Publisher) -
Technical documents are almost always numbered in a sequence; that
number
is used along with the publisher to identify the document uniquely. (Relaton: Series number) -
Technical documents may consist of different parts, which can be referenced individually. The
part
number for the document is used in addition to the number in that case. (Relaton: Citation, type: Part) -
In special circumstances, multiple subsequent versions of a document can appear with the same number; this applies for example to multiple drafts. These subsequent versions can be indicated as
iteration
. (Relaton: Edition: Version) -
Often there is different numbering used for different types of technical document; for example, technical reports, technical notes, and standards have different number sequences in ISO. For that reason, the document
type
may be required in addition to the number. For inclusion in an identifier the organisation will typically have a standardised abbreviation for each of the document types. (Relaton: Series, though Relaton for standards organisations treats these for convenience as subtypes of document, the document type beingstandard
) -
In some cases, different numbering is used not for different types of document, but different defined
series
. If that is the case, the organisation will typically have a standardised abbreviation for the series as well. (Relaton: Series) -
Some organisations publish standards in multiple languages. In that case, the document
language
can also be provided, to indicate a particular language version. -
Technical documents are often cited undated; in that case, the latest version of the document is usually assumed. If a specific edition of a document needs to be specified in the edition, this can be done by supplying an
edition
number, theyear
of publication, or both, depending on the organisation. (Relaton: Edition, Date) -
Unlike other documents, technical documents are often circulated and cited in draft version before they are officially published. If this is routinely done, the organisation may provide for means to specify that a document is a
draft
, or thestage
orstatus
which the document has reached in the authoring process. If stages are used in citing documents, the organisation will typically have standard abbreviations for those stages, which will be used in the identifier. (Relaton: Edition) -
Just as documents can be bibliographically related to other documents, document identifiers can be based on other document identifiers. This is done in Pubid by specifying two Pubid identifiers, with the first treated as an attribute of the second. The first common case of derived identifiers is when one document is a supplement of another (e.g. an appendix, a corrigendum); this is encoded in Pubid by specifying a
base
Pubid identifier, as an attribute of the derived document, with its own distinctnumber
,type
(e.g. Appendix), andyear
. The second case is when a document published by one organisation isadopted
for use by another organisation: this often is indicated in document identifiers simply by prefixing the adopting organisation’s acronym to the original identifier (e.g. BS ISO 639). (Relaton: Relations:updates
,adoptedFrom
)
Usage
Pubid::Registry#parse
resolves the pubid class related to the identifier (Pubid::Iso::Identifier
for "ISO" identifiers) and returns an object with the parsed identifier
require "pubid"
pubid = Pubid::Registry.parse("ISO/IEC 13213")
pubid.class
=> Pubid::Iso::Identifier::Base
pubid.publisher
=> "ISO"
pubid.copublisher
=> "IEC"
pubid.number
=> 13213
pubid.to_s
=> "ISO/IEC 13213"
You can find usage examples in the pubid-core repository. For more specific usage guides, refer to repositories related to specific identifier providers, such as pubid-iso for ISO identifiers and pubid-ccsds for CCSDS identifier)