Project

termium

Parser for the TERMIUM Plus terminology database of the Government of Canada

Termium Ruby gem

Purpose

The Termium Ruby gem parses the export data formats of TERMIUM Plus, the terminology database service of the Government of Canada.

WARNING - Termium XML output requires manual correction

The default Termium XML output is invalid wherever a term domain is written in angle brackets: the opening bracket is escaped, but the closing "greater than" sign is not:

<textualSupport order="1" type="DEF">
  <value>&lt;artificial intelligence> operation that allows the firing of a rule, or the
    invocation of a program or a subprogram</value>
  <sourceRef order="1" />
</textualSupport>

The remedy is to manually escape the "greater than" sign using a find/replace or a regular expression:

string.gsub(/&lt;([^>]+)>/, '&lt;\1&gt;')

Results in:

<textualSupport order="1" type="DEF">
  <value>&lt;artificial intelligence&gt; operation that allows the firing of a rule, or the
    invocation of a program or a subprogram</value>
  <sourceRef order="1" />
</textualSupport>
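
The expression above also matches text that has already been corrected, because `[^>]+` can run past an escaped `&gt;` to the next tag. The following is a minimal sketch of a stricter variant, assuming that domain labels never contain `<`, `>` or `&`. The helper name `escape_domain_brackets` is hypothetical and is not part of the gem.

```ruby
# Hypothetical helper, not part of the termium gem.
# Escapes the closing ">" of a domain label such as "&lt;artificial intelligence>".
# Assumption: a domain label never contains "<", ">" or "&", so text that is
# already escaped ("&lt;...&gt;") is left untouched.
def escape_domain_brackets(xml)
  xml.gsub(/&lt;([^<>&]+)>/, '&lt;\1&gt;')
end

raw = "<value>&lt;artificial intelligence> operation that allows the firing of a rule</value>"
fixed = escape_domain_brackets(raw)
puts fixed
```

Because the label pattern excludes `&`, running the helper twice gives the same result as running it once. The trade-off is that a label containing an entity such as `&amp;` would not be corrected and would still need manual attention.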

Commands

termium convert

Convert a TERMIUM Plus export XML file to a Paneron Glossarist dataset.

termium convert

Purpose

This command converts a TERMIUM Plus export XML (<ns2:termium_extract>) file to a Paneron Glossarist dataset.
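
As a rough pre-flight check, a script could confirm that a file looks like a TERMIUM Plus extract before handing it to the command. This is a sketch only: the helper name `termium_extract?` is hypothetical, and it assumes the file uses the `ns2` prefix exactly as written above.

```ruby
# Hypothetical helper, not part of the termium gem.
# Returns true when the first element of the document is <ns2:termium_extract>.
def termium_extract?(xml)
  # Drop leading whitespace, an optional XML declaration, and any comments.
  body = xml.sub(/\A\s*(?:<\?xml.*?\?>\s*)?(?:<!--.*?-->\s*)*/m, "")
  body.match?(/\A<ns2:termium_extract[\s>\/]/)
end
```

A string check like this does not validate the XML. It only catches the common mistake of passing the wrong kind of file.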

The resulting dataset will look like this:

{OUTPUT_PATH}/
├── concepts/
│   ├── {CONCEPT_ID}.yaml
│   └── ...
└── localized_concepts/
    ├── {LOCALIZED_CONCEPT_ID}.yaml
    └── ...

Usage

$ termium convert -i INPUT_XML_FILE [-o OUTPUT_PATH] [--date-accepted DATE_ACCEPTED]

Options

Flag Description

-i, --input-path

Path to the source TERMIUM Plus XML export file. The file must start with the <ns2:termium_extract> element.

-o, --output-path

Path to the destination Glossarist dataset directory. If the directory doesn’t exist, it is created. If not provided, it defaults to the input file path without its extension, e.g. foo/bar.xml will export to foo/bar/.

--date-accepted

Date of acceptance for the dataset. This fills in the date_accepted value of the universal concept (which is exported to a YAML file).
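
The default for --output-path can be illustrated with a short sketch. It mirrors the documented behaviour (foo/bar.xml exports to foo/bar/) and is not taken from the gem's source. The helper name `default_output_path` is hypothetical, and the gem's own logic may differ.

```ruby
require "pathname"

# Hypothetical helper mirroring the documented default output path.
# Strips the file extension from the input path and keeps the directory part.
def default_output_path(input_path)
  Pathname.new(input_path).sub_ext("").to_s
end

puts default_output_path("foo/bar.xml")
```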

Examples

The data structures of these files can be seen in the following examples.

Example 1. Sample of {CONCEPT_ID}.yaml

This is 88a7dd87-6199-3516-9cec-f4cd79ff09c6.yaml.

---
data:
  identifier: '2120638'
  localized_concepts:
    eng: e114ee44-e601-5623-9099-48cfc2be2224
    fre: 9a7b88cb-4ee6-5d59-89bb-230425a3c96a
related: []
date_accepted: 2015-05-01
status: valid
id: 88a7dd87-6199-3516-9cec-f4cd79ff09c6

Example 2. Sample of {LOCALIZED_CONCEPT_ID}.yaml

This is e114ee44-e601-5623-9099-48cfc2be2224.yaml.

---
data:
  dates: []
  definition:
  - content: layer whose nodes directly communicate with external systems
  examples: []
  id: '2120638'
  notes:
  - content: 'visible layer: term and definition standardized by ISO/IEC [ISO/IEC
      2382-34:1999].'
  - content: 34.02.09 (2382)
  sources:
  - origin:
      ref: ISO/IEC 2382-34:1999
    type: lineage
    status: identical
  - origin:
      ref: Ranger, Natalie * 2006 * Bureau de la traduction / Translation Bureau *
        Services linguistiques / Linguistic Services * Bur. dir. Centre de traduction
        et de terminologie / Dir's Office Translation and Terminology Centre * Div.
        Citoyenneté et Protection civile / Citizen. & Emergency preparedness Div.
        * Normalisation terminologique / Terminology Standardization
    type: lineage
    status: identical
  terms:
  - type: expression
    normative_status: preferred
    designation: visible layer
    grammar_info:
    - preposition: false
      participle: false
      adj: false
      verb: false
      adverb: false
      noun: false
      gender: []
      number:
      - singular
  language_code: eng
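
The two samples are linked by the UUIDs under localized_concepts: each value names a file in the localized_concepts/ directory. The sketch below shows one way a consumer might follow that link and read the preferred designations. Both helper names are hypothetical, and the sketch assumes only the fields shown in the samples above.

```ruby
require "yaml"
require "date"

# Hypothetical helper, not part of the termium gem.
# Maps each language code of a universal concept to its localized concept file.
def localized_concept_paths(concept_yaml, dataset_dir)
  # date_accepted is an unquoted YAML date, so Date must be permitted explicitly.
  concept = YAML.safe_load(concept_yaml, permitted_classes: [Date])
  concept.fetch("data").fetch("localized_concepts").transform_values do |uuid|
    File.join(dataset_dir, "localized_concepts", "#{uuid}.yaml")
  end
end

# Hypothetical helper: returns the preferred designations of a localized concept.
def preferred_designations(localized_yaml)
  localized = YAML.safe_load(localized_yaml, permitted_classes: [Date])
  localized.fetch("data").fetch("terms")
           .select { |term| term["normative_status"] == "preferred" }
           .map { |term| term["designation"] }
end
```

`YAML.safe_load` is used rather than `YAML.load` so that only plain data types, plus the explicitly permitted `Date`, are instantiated from the files.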

Library

Usage

This gem makes heavy use of the lutaml-model classes for XML serialization.

The following code converts the Termium extract into a Glossarist dataset.

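require "fileutils"
require "termium"

# termium_extract_file: path to the TERMIUM Plus XML export
# glossarist_output_file: path to the output dataset directory
termium_extract = Termium::Extract.from_xml(File.read(termium_extract_file))
glossarist_col = termium_extract.to_concept
FileUtils.mkdir_p(glossarist_output_file)
glossarist_col.save_to_files(glossarist_output_file)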

Credits

This gem is developed, maintained and funded by Ribose Inc.

License

The gem is available as open source under the terms of the 2-Clause BSD License.