Interscript: Interoperable Script Conversion Systems, with Ruby and JavaScript runtimes
Introduction
This repository contains interoperable transliteration schemes from:
- ALA-LC
- BGN/PCGN
- ICAO
- ISO
- UN (by UNGEGN)
- Many, many other script conversion system authorities.
The goal is to make transliteration schemes interoperable so that their quality can be compared.
Demonstration
Installing Interscript
gem install interscript
Interscript stats
interscript stats | less
Transliteration
cat ara-Arab.txt
interscript -s odni-ara-Arab-Latn-2015 ara-Arab.txt -o ara-Arab-out.txt
cat ara-Arab-out.txt
Diacritization
# First, we need to install rababa
gem install rababa
# Now we can transliterate
interscript -s "var-ara-Arab-Arab-rababa|odni-ara-Arab-Latn-2015" ara-Arab.txt -o ara-Arab-out-tl.txt
cat ara-Arab-out-tl.txt
# Compare this to transliteration without diacritization
cat ara-Arab-out.txt
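The `|` in the system argument above chains two systems. The following Ruby sketch illustrates the assumed semantics, namely that steps run left to right and each step receives the previous step's output. The helpers `chain_steps` and `apply_chain` are hypothetical and not part of Interscript, and the lambdas are stand-ins that only label the text; they do not perform real diacritization or transliteration.

```ruby
# Hypothetical helpers for illustration; not part of the Interscript API.

# Split a chained system specification into its individual system codes.
def chain_steps(spec)
  spec.split("|").map(&:strip)
end

# Apply each step in order, feeding each result into the next step.
# `steps` maps a system code to any callable object.
def apply_chain(spec, steps, text)
  chain_steps(spec).reduce(text) { |acc, code| steps.fetch(code).call(acc) }
end

# Stand-in steps that only wrap the text so the order becomes visible.
steps = {
  "var-ara-Arab-Arab-rababa" => ->(t) { "diacritized(#{t})" },
  "odni-ara-Arab-Latn-2015"  => ->(t) { "latin(#{t})" }
}

result = apply_chain("var-ara-Arab-Arab-rababa|odni-ara-Arab-Latn-2015", steps, "text")
# result == "latin(diacritized(text))"
```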
Reversing
cat rus-rev.txt
interscript -s odni-rus-Latn-Cyrl-2015 rus-rev.txt -o rus.txt
# Note that Latn and Cyrl are reversed
cat rus.txt
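The reversed system code above differs from the forward one only in the order of the two script codes. A minimal Ruby sketch of that naming relationship follows; `reverse_system_code` is a hypothetical helper, not part of Interscript, and it only derives the name. It does not check that a system with that name actually exists.

```ruby
# Hypothetical helper for illustration; not part of the Interscript API.
# Swaps the source and target script codes of a system code.
def reverse_system_code(code)
  # Limit the split to 5 parts so a hyphenated identifier stays intact.
  parts = code.split("-", 5)
  raise ArgumentError, "expected 5 components in #{code.inspect}" unless parts.size == 5

  authority, language, source, target, id = parts
  [authority, language, target, source, id].join("-")
end

reversed = reverse_system_code("odni-rus-Cyrl-Latn-2015")
# reversed == "odni-rus-Latn-Cyrl-2015"
```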
Installation
Prerequisites
Interscript depends on Ruby. Once Ruby is installed, installing Interscript takes a single command. The following command will not work until Interscript v2 is released; until then, use the Git checkout described below.
gem install interscript -v "~>2.0"
You can also download a local copy of this Git repository, e.g. for development purposes:
git clone https://github.com/interscript/lcs
cd lcs/ruby
bundle install
Additional prerequisites for Thai systems
If you want to use the Thai transliteration systems, you will need to install some additional requirements. Please consult Usage with Secryst.
Usage
Assume you have a file ready in the source script like this:
cat <<EOT > rus-Cyrl.txt
Эх, тройка! птица тройка, кто тебя выдумал? знать, у бойкого народа ты
могла только родиться, в той земле, что не любит шутить, а
ровнем-гладнем разметнулась на полсвета, да и ступай считать версты,
пока не зарябит тебе в очи. И не хитрый, кажись, дорожный снаряд, не
железным схвачен винтом, а наскоро живьём с одним топором да долотом
снарядил и собрал тебя ярославский расторопный мужик. Не в немецких
ботфортах ямщик: борода да рукавицы, и сидит чёрт знает на чём; а
привстал, да замахнулся, да затянул песню — кони вихрем, спицы в
колесах смешались в один гладкий круг, только дрогнула дорога, да
вскрикнул в испуге остановившийся пешеход — и вон она понеслась,
понеслась, понеслась!
Н.В. Гоголь
EOT
You can run interscript on this text using different transliteration systems.
interscript rus-Cyrl.txt \
--system=bgnpcgn-rus-Cyrl-Latn-1947 \
--output=bgnpcgn-rus-Latn.txt
interscript rus-Cyrl.txt \
--system=iso-rus-Cyrl-Latn-9-1995 \
--output=iso-rus-Latn.txt
interscript rus-Cyrl.txt \
--system=icao-rus-Cyrl-Latn-9303 \
--output=icao-rus-Latn.txt
interscript rus-Cyrl.txt \
--system=bas-rus-Cyrl-Latn-2017-bss \
--output=bas-rus-Latn.txt
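The four invocations above differ only in the system code and in an output file name derived from it. As a rough sketch, the argument lists could be generated in Ruby like this. `interscript_command` is a hypothetical helper, not part of Interscript; the `--system` and `--output` flags and the output naming pattern are taken from the commands above.

```ruby
# Hypothetical helper for illustration; not part of the Interscript API.
# Builds the argument list for one interscript run, naming the output
# file <authority>-<language>-<target script>.txt as in the commands above.
def interscript_command(input, system_code)
  # Extra components after the target script are ignored here.
  authority, language, _source, target = system_code.split("-")
  output = "#{authority}-#{language}-#{target}.txt"
  ["interscript", input, "--system=#{system_code}", "--output=#{output}"]
end

systems = %w[
  bgnpcgn-rus-Cyrl-Latn-1947
  iso-rus-Cyrl-Latn-9-1995
  icao-rus-Cyrl-Latn-9303
  bas-rus-Cyrl-Latn-2017-bss
]

commands = systems.map { |code| interscript_command("rus-Cyrl.txt", code) }
# Each entry can be passed to Kernel#system, e.g. system(*commands.first)
```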
It is then easy to see the exact differences in rendering between the systems.
diff bgnpcgn-rus-Latn.txt bas-rus-Latn.txt
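Because most lines will differ between two systems, a word-level comparison can be easier to read than a line diff. The Ruby sketch below lists the word pairs that two outputs render differently. `differing_words` is a hypothetical helper, not part of Interscript, and it assumes both systems preserve word boundaries so that words can be paired by position.

```ruby
# Hypothetical helper for illustration; not part of the Interscript API.
# Returns the unique [word_a, word_b] pairs that differ between two texts.
def differing_words(text_a, text_b)
  words_a = text_a.split
  words_b = text_b.split
  # Pair words by position. If text_b has fewer words, the missing ones
  # appear as nil; extra words in text_b are ignored.
  words_a.zip(words_b).reject { |a, b| a == b }.uniq
end
```

For the files produced above, this could be called as `differing_words(File.read("bgnpcgn-rus-Latn.txt"), File.read("bas-rus-Latn.txt"))`.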
If you use Interscript from the Git repository, call the following command instead of interscript:
# Ensure you are in your Git repository root path
ruby/bin/interscript rus-Cyrl.txt \
--system=bas-rus-Cyrl-Latn-2017-bss \
--output=bas-rus-Latn.txt
Adding a transliteration system
Please consult the Map Editing Guide.
Integration with Ruby applications
Please consult the guide for integration with Ruby applications.
ISCS system codes
In accordance with ISO/CC 24229, the system code identifying a script conversion system has the following components:
e.g. bgnpcgn-rus-Cyrl-Latn-1947:
- bgnpcgn: the authority identifier
- rus: an ISO 639-{1,2,3,5} language code that this system applies to (for ISO 639-2, use the (T) code)
- Cyrl: an ISO 15924 script code, identifying the source script
- Latn: an ISO 15924 script code, identifying the target script
- 1947: an identifier unit within the authority to identify this system
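The component layout above can be sketched in Ruby as follows. `SystemCode` and `parse_system_code` are hypothetical names, not part of Interscript. The split is limited to five parts because the final identifier may itself contain hyphens, as in iso-rus-Cyrl-Latn-9-1995.

```ruby
# Hypothetical names for illustration; not part of the Interscript API.
SystemCode = Struct.new(:authority, :language, :source_script, :target_script, :id)

# Split a system code into its five components.
def parse_system_code(code)
  # A limit of 5 keeps hyphens inside the trailing identifier intact.
  parts = code.split("-", 5)
  raise ArgumentError, "expected 5 components in #{code.inspect}" unless parts.size == 5

  SystemCode.new(*parts)
end

parsed = parse_system_code("bgnpcgn-rus-Cyrl-Latn-1947")
# parsed.authority     == "bgnpcgn"
# parsed.language      == "rus"
# parsed.source_script == "Cyrl"
# parsed.target_script == "Latn"
# parsed.id            == "1947"
```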
Covered languages
Currently the schemes cover Cyrillic, Armenian, Greek, Arabic and Hebrew.
Samples to play with
- rus-Cyrl-1.txt: Copied from the XLS output from http://www.primorsk.vybory.izbirkom.ru/region/primorsk?action=show&global=true&root=254017025&tvd=4254017212287&vrn=100100067795849&prver=0&pronetvd=0&region=25&sub_region=25&type=242&vibid=4254017212287
- rus-Cyrl-2.txt: Copied from the XLS output from http://www.yaroslavl.vybory.izbirkom.ru/region/yaroslavl?action=show&root=764013001&tvd=4764013188704&vrn=4764013188693&prver=0&pronetvd=0&region=76&sub_region=76&type=426&vibid=4764013188704
References
Reference documents are located at the interscript-references repository. Some specifications that have distribution limitations may not be reproduced there.
Links to system definitions
Copyright and license
This is a Ribose project. Copyright Ribose.