IEEE publication identifiers ("IEEE PubID")
Purpose
Implements a mechanism to parse and utilize IEEE publication identifiers.
Historic identifier patterns
There are at least two major "pattern series" of identifiers due to historical reasons: old (type I) and new (type II). This implementation attempts to support both types of publication identifier patterns.
Use cases to support
- analyze the pattern of a type I identifier
- parse a type II identifier into components
- generate a filename from the components, similar to the type I pattern
Elements of the PubID
Publisher
Name | Abbrev |
---|---|
Institute of Electrical and Electronics Engineers | IEEE |
Report number
`{number}` - a set of one or more digits and optional letters
Part
`{part}` - a set of digits and optional letters; starts with a digit; any letters come at the end; optional
Subpart
`{subpart}` - a set of digits and optional letters; optional; multiple subparts are possible
Year
`{year}` - a set of 4 digits; optional
Corrigendum & Amendment
`{cor}` - a corrigendum or an amendment with the pattern `Cor {cornum}-{year}` or `Amd {cornum}:{year}`, where `{cornum}` is a set of digits; optional
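The element descriptions above can be turned into small standalone checks. The following is a minimal sketch, not the library's implementation: the constant names and the `parse_correction` helper are hypothetical, and it assumes that `Cor` always pairs with `-` and `Amd` with `:`, exactly as the two patterns are written.

```ruby
# Hypothetical element-level patterns derived from the descriptions above.
PART_RE = /\A\d+[[:alpha:]]*\z/ # starts with a digit, letters only at the end
YEAR_RE = /\A\d{4}\z/           # exactly four digits
COR_RE  = /\A(?<kind>Cor|Amd) (?<cornum>\d+)(?<sep>[-:])(?<year>\d{4})\z/

# Returns a hash of correction components, or nil if the text does not fit.
def parse_correction(text)
  m = COR_RE.match(text)
  return nil unless m

  # Assumption: "Cor" uses "-" and "Amd" uses ":" before the year.
  expected_sep = m[:kind] == "Cor" ? "-" : ":"
  return nil unless m[:sep] == expected_sep

  { kind: m[:kind], cornum: m[:cornum].to_i, year: m[:year].to_i }
end
```

For example, `parse_correction("Amd 2:2010")` yields the kind `"Amd"`, the number `2` and the year `2010`, while `parse_correction("Cor 1:2004")` returns `nil` because the separator does not fit the `Cor` pattern.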
Type I pattern
{publisher} {type} {series} {number}{part}.{subpart}{year} {edition}/{conform}/{correction}
- `{publisher}`: `IEEE`
- `{type}`: one of the values `Standard`, `Std`, `Draft`, `Draft Standard`, `Draft Supplement` *
- `{series}`: one of the values `ISO/IEC`, `ISO/IEC/IEEE` *
- `{number}`: a set of digits, optionally prefixed with an uppercase letter and optionally suffixed with a letter
- `{part}`: 1 to 2 digits prefixed with `.` or `-` and optionally suffixed with up to 4 letters *
- `{subpart}`: 1 digit optionally suffixed with a letter *
- `{year}`: 4 digits prefixed with `-`, `:`, ` - `, or a space *
- `{edition}`: the prefix `Edition` followed by a reference in brackets, or the prefix `First edition` followed by a date in the format `YYYY-MM-DD` *
- `{conform}`: the prefix `Conformance` followed by 2 digits, a dash, and a 4-digit year *
- `{correction}`: the prefix `Cor` optionally followed by a space, or the prefix `Amd` followed by `.`, followed by 1 to 2 digits, a dash, and a 4-digit year *

(*) - optional
An identifier can be composed of two identifiers separated by a space. Only the first identifier needs to contain the publisher; for the second it is optional.
The following regular expression parses 100% of the identifiers in the type I dataset:
{
^IEEE\s
((?<type1>Standard|Std|Draft(\sStandard|\sSupplement)?)\s)?
((?<series>ISO\/IEC(\/IEEE)?)\s)?
(?<number1>[A-Z]?\d+[[:alpha:]]?)
([.-](?<part1>\d{1,2}(?!\d)[[:alpha:]]{0,4}))?
(\.(?<subpart1>\d[[:alpha:]]?))?
(?<year1>([-:]|\s-\s|,\s)\d{4})?
(\s(IEEE\s(?<type2>Std)\s)?(?<number2>[A-Z]?\d+[[:alpha:]]?)
([.-](?<part2>\d{1,2}(?!\d)[[:alpha:]]{0,4}))?
([.](?<subpart2>\d[[:alpha:]]?))?
(?<year2>([-:.]|_-|\s-\s|,\s)\d{4})?)?
(\s(?<edition>Edition(\s\([^)]+\))?|First\sedition\s[\d-]+))?
(\/(?<conform>Conformance\d{2})-(?<confyear>\d{4}))?
(\/(?<correction>(Cor\s?|Amd\.)\d{1,2})
(?<coryear>(:|-|:-)\d{4}))?$
}x
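As a usage sketch, the type I pattern can be wrapped in a small helper that returns only the named groups that matched. The constant `TYPE1_RE` and the helper `parse_type1` are hypothetical names. The correction group is written here as `(Cor\s?|Amd\.)\d{1,2}` so that the parentheses balance, which is an assumption about the intended grouping. The sample identifiers in the usage note are illustrative and not taken from the dataset.

```ruby
# Type I pattern in Ruby extended mode (whitespace in the pattern is ignored).
TYPE1_RE = %r{
  ^IEEE\s
  ((?<type1>Standard|Std|Draft(\sStandard|\sSupplement)?)\s)?
  ((?<series>ISO\/IEC(\/IEEE)?)\s)?
  (?<number1>[A-Z]?\d+[[:alpha:]]?)
  ([.-](?<part1>\d{1,2}(?!\d)[[:alpha:]]{0,4}))?
  (\.(?<subpart1>\d[[:alpha:]]?))?
  (?<year1>([-:]|\s-\s|,\s)\d{4})?
  (\s(IEEE\s(?<type2>Std)\s)?(?<number2>[A-Z]?\d+[[:alpha:]]?)
  ([.-](?<part2>\d{1,2}(?!\d)[[:alpha:]]{0,4}))?
  ([.](?<subpart2>\d[[:alpha:]]?))?
  (?<year2>([-:.]|_-|\s-\s|,\s)\d{4})?)?
  (\s(?<edition>Edition(\s\([^)]+\))?|First\sedition\s[\d-]+))?
  (\/(?<conform>Conformance\d{2})-(?<confyear>\d{4}))?
  (\/(?<correction>(Cor\s?|Amd\.)\d{1,2})
  (?<coryear>(:|-|:-)\d{4}))?$
}x

# Hypothetical helper: returns a Hash of matched groups, or nil on no match.
def parse_type1(identifier)
  match = TYPE1_RE.match(identifier)
  # named_captures maps group names to text; compact drops unmatched groups.
  match&.named_captures&.compact
end
```

With this sketch, `parse_type1("IEEE Std 802.3-2002")` yields `type1`, `number1`, `part1` and `year1`. Note that the year groups capture their leading separator, so `year1` is `"-2002"` rather than `"2002"`.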
Parsing PubID elements from type II identifiers
To parse PubID elements from type II identifiers, we can use the following regular expression:
{
^IEEE\s(?<number1>\w+(\.[A-Z]\d|\sHBK)?)
(?<part1>(\.|\s)\d{1,4}[[:alpha:],]{0,7}|-\d?[A-Z]+|-\d(?=[-.]))?
(?<subpart11>\.\d{1,3}[a-z]?|-\d{5}[a-z]?|-\d+(?=[-:_]))?
(?<subpart12>\.\d|-\d+(?=-))?
(?<year1>([-:.]|_-|\s-)\d{4})?
(\/(?<number2>([A-Z]?\d+[a-z]?|Conformance\d+))
((\.|-)(?<part2>\d{1,3}[a-z]?)(?!\d))?
(\.(?<subpart21>\d{1,2}))?)?
(\/(?<number3>\d+)(\.(?<part3>\d))?)?
(?<year2>([-:.]|_-|\s-)\d{4})?
((\/|_|-|\s\/)(?<correction>(Cor|(?i)Amd(?-i))(\s|\.|\.\s)?\d{1,2})
(?<coryear>(:|-|:-|_[A-Z][a-z]{2}_)\d{4}(-\d{4})?)?)?$
}x
This regular expression covers 99% of the identifiers in the type II bibxml-ieee dataset.
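A minimal usage sketch for the type II pattern follows. The names `TYPE2_RE` and `parse_type2` are hypothetical, and the sample identifiers in the usage note are illustrative rather than taken from the bibxml-ieee dataset.

```ruby
# Type II pattern in Ruby extended mode (whitespace in the pattern is ignored).
TYPE2_RE = %r{
  ^IEEE\s(?<number1>\w+(\.[A-Z]\d|\sHBK)?)
  (?<part1>(\.|\s)\d{1,4}[[:alpha:],]{0,7}|-\d?[A-Z]+|-\d(?=[-.]))?
  (?<subpart11>\.\d{1,3}[a-z]?|-\d{5}[a-z]?|-\d+(?=[-:_]))?
  (?<subpart12>\.\d|-\d+(?=-))?
  (?<year1>([-:.]|_-|\s-)\d{4})?
  (\/(?<number2>([A-Z]?\d+[a-z]?|Conformance\d+))
  ((\.|-)(?<part2>\d{1,3}[a-z]?)(?!\d))?
  (\.(?<subpart21>\d{1,2}))?)?
  (\/(?<number3>\d+)(\.(?<part3>\d))?)?
  (?<year2>([-:.]|_-|\s-)\d{4})?
  ((\/|_|-|\s\/)(?<correction>(Cor|(?i)Amd(?-i))(\s|\.|\.\s)?\d{1,2})
  (?<coryear>(:|-|:-|_[A-Z][a-z]{2}_)\d{4}(-\d{4})?)?)?$
}x

# Hypothetical helper: returns a Hash of matched groups, or nil on no match.
def parse_type2(identifier)
  match = TYPE2_RE.match(identifier)
  # Keep only the groups that actually took part in the match.
  match&.named_captures&.compact
end
```

For `parse_type2("IEEE 802.3-2002")` the result contains `number1`, `part1` and `year1`. Several groups in this pattern capture their leading separator, so `part1` is `".3"` and `year1` is `"-2002"`; callers that need bare values have to strip those characters themselves.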
File name generator
For type I identifiers, file names are generated by replacing the symbols `/`, `\`, `,`, `'`, `"`, `(`, `)`, and space with a replacement symbol. Sequences of multiple replacement symbols should be squeezed into one.
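The replace-and-squeeze rule for type I identifiers can be sketched as below. The helper name `type1_filename` is hypothetical, and the default replacement symbol `_` is an assumption made for illustration; pass a different symbol if the project uses another one.

```ruby
# Hypothetical helper; the default replacement symbol "_" is an assumption.
def type1_filename(identifier, replacement = "_")
  # Replace / \ , ' " ( ) and space with the replacement symbol.
  replaced = identifier.gsub(%r{[/\\,'"() ]}, replacement)
  # Squeeze runs of the replacement symbol into a single occurrence.
  run = /(?:#{Regexp.escape(replacement)})+/
  replaced.gsub(run) { replacement }
end
```

For instance, `type1_filename("IEEE Std 802.3-2002")` gives `"IEEE_Std_802.3-2002"`. A trailing bracket also turns into a replacement symbol, so an identifier ending in `)` produces a name ending in `_`.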
For type II identifiers, the PubID elements must first be parsed and then joined in this order:
IEEE.{number1}_{part1}.{subpart11}.{subpart12}-{year1}_{number2}_{part2}.{subpart21}_{number3}_{part3}-{year2}_{correction}-{coryear}
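The join order above can be sketched as a table of separators and element names. This is only an illustration under several assumptions that the text does not settle: `TYPE2_TEMPLATE` and `type2_filename` are hypothetical names, a missing element is skipped together with its preceding separator, and any separator characters captured at the start of an element (such as the `.` in `.3`) are dropped before joining.

```ruby
# Each entry is the separator that precedes an element, then the element name.
TYPE2_TEMPLATE = [
  [".", "number1"], ["_", "part1"], [".", "subpart11"], [".", "subpart12"],
  ["-", "year1"], ["_", "number2"], ["_", "part2"], [".", "subpart21"],
  ["_", "number3"], ["_", "part3"], ["-", "year2"],
  ["_", "correction"], ["-", "coryear"]
].freeze

# Hypothetical helper: builds a file name from a Hash of parsed elements.
def type2_filename(elements)
  TYPE2_TEMPLATE.each_with_object(+"IEEE") do |(separator, key), name|
    value = elements[key]
    next if value.nil? # Assumption: skip elements that were not parsed.

    # Assumption: drop separator characters captured with the element.
    cleaned = value.sub(/\A[^[:alnum:]]+/, "")
    next if cleaned.empty?

    name << separator << cleaned
  end
end
```

Given the elements `{ "number1" => "802", "part1" => ".3", "year1" => "-2002" }`, this sketch returns `"IEEE.802_3-2002"`.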