A port of Perl's Unidecoder to Ruby. Transliterates Unicode strings to an ASCII approximation.

Unidecoder

This library provides methods to transliterate Unicode characters to an ASCII approximation.

The functionality in this library was originally written by Russell Norris for his Stringex library. This gem extracts the Unicode transliteration functionality from Stringex into a separate library and adds some functionality of its own.

The Unidecoder component of Stringex is itself a port of Sean M. Burke's Unidecode Perl module.

Installation

gem install unidecoder

Usage

"olá, mundo!".to_ascii                 #=> "ola, mundo!"
"你好".to_ascii                        #=> "Ni Hao "
"Jürgen Müller".to_ascii               #=> "Jurgen Muller"
"Jürgen Müller".to_ascii("ü" => "ue")  #=> "Juergen Mueller"

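The examples above suggest a per-character lookup in which caller-supplied overrides take precedence over the default mapping. The following is a minimal sketch of that idea, not the gem's actual implementation: `SKETCH_TABLE`, `sketch_to_ascii`, and the `"?"` fallback for unmapped characters are all hypothetical names and choices made for illustration.

```ruby
# Hypothetical, tiny lookup table for illustration only.
# The real gem ships far larger tables covering many scripts.
SKETCH_TABLE = {
  "\u00E1" => "a",     # á
  "\u00FC" => "u",     # ü
  "\u4F60" => "Ni ",   # 你
  "\u597D" => "Hao "   # 好
}.freeze

# Transliterate one character at a time:
#   1. a caller-supplied override wins,
#   2. ASCII characters pass through unchanged,
#   3. everything else is looked up in the table,
#      falling back to "?" (an assumption of this sketch).
def sketch_to_ascii(str, overrides = {})
  str.each_char.map do |char|
    if overrides.key?(char)
      overrides[char]
    elsif char.ascii_only?
      char
    else
      SKETCH_TABLE.fetch(char, "?")
    end
  end.join
end

sketch_to_ascii("J\u00FCrgen M\u00FCller")                       #=> "Jurgen Muller"
sketch_to_ascii("J\u00FCrgen M\u00FCller", "\u00FC" => "ue")     #=> "Juergen Mueller"
```

Checking overrides first is what lets a single call switch "ü" from "u" to "ue" without touching the default table.
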
Extra stuff

If either the Unicode or Active Support gem is installed, Unidecoder also performs Unicode normalization before transliterating strings to ASCII.

Starting with version 2.0.0, this gem requires Ruby >= 2.6, as it depends on Ruby's String#unicode_normalize.
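
To illustrate why normalization matters before a table lookup, here is a small sketch using only Ruby's built-in String#unicode_normalize. The constant and method names are hypothetical, and the choice of the NFC form is an assumption of this example; the source does not say which form the gem uses.

```ruby
# The same visible text can be encoded in two ways:
PRECOMPOSED = "J\u00FCrgen"   # "ü" as one code point (U+00FC)
DECOMPOSED  = "Ju\u0308rgen"  # "u" followed by a combining diaeresis (U+0308)

# Hypothetical helper: bring a string into one canonical form (NFC here)
# so that a per-character lookup sees "ü" as a single character.
def normalize_for_lookup(str)
  str.unicode_normalize(:nfc)
end

PRECOMPOSED == DECOMPOSED                        #=> false
PRECOMPOSED.length                               #=> 6
DECOMPOSED.length                                #=> 7
normalize_for_lookup(DECOMPOSED) == PRECOMPOSED  #=> true
```

Without this step, a character-by-character mapping would treat the combining mark in the decomposed string as a separate character.
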

Warnings

While this is a neat trick, in practice many transliterations end up being fairly useless. For example, all Chinese characters are transliterated to their Mandarin Chinese readings. Since Japanese also uses Chinese characters in its writing, but pronounces them differently from Mandarin, this library's transliteration of Japanese is useless.

Some languages, like Russian, would most correctly transliterate some letters based on context, rather than through a one-to-one mapping to ASCII. This library does not do that.

Other languages, like Hebrew and Arabic, don't write vowels but assume them from context, so the ASCII representation of these languages given by this library will look fairly ugly to native speakers.

Basically, your mileage may vary. I don't speak every language covered by this library, so there are certain to be limitations and errors. Your feedback is most appreciated!