A small utility for detecting scripts (writing systems) in Unicode text.
### Usage:

Install the gem:

    gem install unicode-script

Suppose you have a string:

    h = 'ひらがな'

Then you can detect the script to which this string belongs:

    UnicodeScript.detect(h) # => [{:script => 'Hiragana', :value => 'ひらがな'}]

It doesn't work well on strings with spaces and delimiters. As the example below shows, whitespace is dropped and separate runs of the same script are merged into a single value:

    mixed_string = "This is latin, カタカナ and ひらがな"
    UnicodeScript.detect(mixed_string) # => {:script=>"basic latin", :value=>"Thisislatin,and"}
                                       #    {:script=>"katakana", :value=>"カタカナ"}
                                       #    {:script=>"hiragana", :value=>"ひらがな"}

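If you need the original spacing and word order preserved, one possible workaround is to split the text into consecutive runs yourself. The sketch below does not use this gem and is not how the gem is implemented; it relies only on Ruby's built-in regexp script properties (`\p{Latin}`, `\p{Hiragana}`, and so on). The names `SCRIPTS`, `script_of`, and `script_runs` are hypothetical helpers chosen for illustration, and the script labels are assumptions that do not match the gem's block names.

```ruby
# Hypothetical helpers, not part of the unicode-script gem.
# Maps an illustrative label to a built-in regexp script property.
SCRIPTS = {
  'latin'    => /\p{Latin}/,
  'hiragana' => /\p{Hiragana}/,
  'katakana' => /\p{Katakana}/,
  'han'      => /\p{Han}/
}.freeze

# Returns the label of the first matching script, or 'other'
# for spaces, punctuation, and anything not listed above.
def script_of(char)
  pair = SCRIPTS.find { |_name, pattern| pattern.match?(char) }
  pair ? pair.first : 'other'
end

# Splits text into consecutive runs of the same script,
# keeping whitespace and the original order intact.
def script_runs(text)
  text.each_char
      .chunk_while { |a, b| script_of(a) == script_of(b) }
      .map { |chars| { script: script_of(chars.first), value: chars.join } }
end

script_runs('カタカナ and ひらがな')
# => [{:script=>"katakana", :value=>"カタカナ"},
#     {:script=>"other",    :value=>" "},
#     {:script=>"latin",    :value=>"and"},
#     {:script=>"other",    :value=>" "},
#     {:script=>"hiragana", :value=>"ひらがな"}]
```

Joining the `:value` fields of the result reproduces the input string, so nothing is lost or reordered.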
You can also check whether a string belongs to a particular script:

    UnicodeScript.hiragana?(h) # => true
    kanji = '漢字'
    UnicodeScript.cjk_unified_ideographs?(kanji) # => true
    UnicodeScript.katakana?(h) # => false

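For comparison, a similar check can be sketched in plain Ruby with regexp script properties. This is an assumption-laden illustration rather than the gem's implementation: `only_script?` is a hypothetical helper, and it takes Ruby's property names (such as `Han`), not the gem's block names (such as `cjk_unified_ideographs`).

```ruby
# Hypothetical helper, not part of the unicode-script gem.
# True when every character of text belongs to the given script.
def only_script?(text, script)
  # Builds an anchored pattern such as \A\p{Hiragana}+\z.
  Regexp.new("\\A\\p{#{script}}+\\z").match?(text)
end

only_script?('ひらがな', 'Hiragana') # => true
only_script?('ひらがな', 'Katakana') # => false
only_script?('漢字', 'Han')          # => true
```

Because the pattern requires at least one character, an empty string returns `false`.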