unicode-script

A small utility for detecting the scripts (writing systems) used in Unicode text.

### Usage:

gem install unicode-script

Suppose you have a string

h = 'ひらがな'

then you can detect the script to which this string belongs:

UnicodeScript.detect(h) # => [{:script => 'Hiragana', :value => 'ひらがな'}]

It doesn't work well on strings that contain spaces and delimiters:

mixed_string = "This is latin, カタカナ and ひらがな"
UnicodeScript.detect(mixed_string) # => {:script=>"basic latin", :value=>"Thisislatin,and"} 
                                   #    {:script=>"katakana", :value=>"カタカナ"}
                                   #    {:script=>"hiragana", :value=>"ひらがな"}
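The grouping shown above, where spaces drop out and characters of the same script are merged, can be approximated without the gem by using the Unicode script properties built into Ruby regular expressions. This is a minimal sketch under that assumption, not the gem's actual implementation; the `SCRIPTS` table and the `group_by_script` helper are hypothetical names introduced for illustration.

```ruby
# Hypothetical helper, not part of unicode-script: groups the characters of a
# string by script using Ruby's built-in \p{...} regexp properties.
SCRIPTS = {
  'latin'    => /\p{Latin}/,
  'hiragana' => /\p{Hiragana}/,
  'katakana' => /\p{Katakana}/,
  'han'      => /\p{Han}/
}.freeze

def group_by_script(text)
  # Each script name maps to a growing (unfrozen) string of its characters.
  groups = Hash.new { |hash, key| hash[key] = +'' }
  text.each_char do |char|
    name, _pattern = SCRIPTS.find { |_, pattern| char.match?(pattern) }
    groups[name] << char if name # characters matching no listed script are skipped
  end
  groups.map { |script, value| { script: script, value: value } }
end

p group_by_script("This is latin, カタカナ and ひらがな")
```

In this sketch, spaces and the comma are dropped because they belong to the Unicode `Common` script rather than to `Latin`, so the Latin group does not contain the comma that appears in the gem's output above.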

You can also check whether a string belongs to a particular script:

UnicodeScript.hiragana?(h) # => true
kanji = '漢字'
UnicodeScript.cjk_unified_ideographs?(kanji) # => true
UnicodeScript.katakana?(h) # => false
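For comparison, a similar check can be written in plain Ruby with an anchored regular expression. This is only a sketch: `only_script?` is a hypothetical helper, it assumes that "belongs to" means every character is in the script, and it uses the `Han` script property, which is not identical to the CJK Unified Ideographs block that the gem's method name refers to.

```ruby
# Hypothetical helper, not part of unicode-script: true when the string is
# non-empty and every character matches the given single-character pattern.
def only_script?(text, char_pattern)
  text.match?(/\A(?:#{char_pattern})+\z/)
end

h = 'ひらがな'
kanji = '漢字'

puts only_script?(h, /\p{Hiragana}/)  # every character is Hiragana
puts only_script?(kanji, /\p{Han}/)   # every character is a Han ideograph
puts only_script?(h, /\p{Katakana}/)  # no character is Katakana
```

The three calls print `true`, `true` and `false`, in that order.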
