Rudachi
JRuby wrapper for Sudachi.
Text
Rudachi::TextParser.parse('東京都へ行く')
=> "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"
File
File.open('input.txt', 'w') { |f| f << '東京都へ行く' }
Rudachi::FileParser.parse('input.txt')
=> "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"
IO
Rudachi::StreamParser.parse(StringIO.new('東京都へ行く'))
=> "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"
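The three parsers above return the same tab-separated string: one morpheme per line, terminated by EOS. A minimal sketch of turning that string into Ruby hashes follows. `parse_sudachi_output` and the hash keys are hypothetical names, not part of Rudachi, and the column meanings (surface, part of speech, normalized form) are assumed from the sample output.

```ruby
# Hypothetical helper (not part of Rudachi): converts the tab-separated
# output shown above into an array of hashes.
def parse_sudachi_output(output)
  lines = output.each_line.map(&:chomp)
  # Drop blank lines and the EOS sentence terminator.
  lines = lines.reject { |line| line.empty? || line == 'EOS' }

  lines.map do |line|
    # Assumed columns: surface, comma-separated part of speech, normalized form.
    surface, pos, normalized = line.split("\t", 3)
    { surface: surface, pos: pos.to_s.split(','), normalized: normalized }
  end
end

# Sample string copied from the examples above.
sample = "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"

tokens = parse_sudachi_output(sample)
tokens.each { |t| puts "#{t[:surface]} (#{t[:pos].first})" }
```

In real use the `sample` string would be replaced by the return value of one of the parsers.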
With some options
Rudachi::TextParser.new(o: 'output.txt', m: 'A').parse('東京都へ行く')
File.read('output.txt')
=> "東京\t名詞,固有名詞,地名,一般,*,*\t東京\n都\t名詞,普通名詞,一般,*,*,*\t都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"
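The `m: 'A'` option appears to correspond to Sudachi's split mode, where A gives the shortest units (Sudachi's own default is assumed to be C, the longest). That would explain why 東京都 is split into 東京 and 都 here. The sketch below only compares the two output strings shown in this README; it does not call Rudachi, and `surfaces` is a hypothetical helper.

```ruby
# Hypothetical helper (not part of Rudachi): extracts the first column
# (the surface form) from each line of the tab-separated output.
def surfaces(output)
  output.each_line
        .map(&:chomp)
        .reject { |line| line.empty? || line == 'EOS' }
        .map { |line| line.split("\t").first }
end

# Output strings copied from the examples above.
default_output = "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"
mode_a_output = "東京\t名詞,固有名詞,地名,一般,*,*\t東京\n都\t名詞,普通名詞,一般,*,*,*\t都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"

puts surfaces(default_output).join(' | ')
puts surfaces(mode_a_output).join(' | ')
```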
Requirements
Rudachi runs on JRuby. For standard Ruby, please see rudachi-rb.
Installation
- Install the Sudachi JAR and dictionary
Install the Sudachi JAR file
$ wget https://github.com/WorksApplications/Sudachi/releases/download/v0.5.3/sudachi-0.5.3-executable.zip
$ unzip sudachi-0.5.3-executable.zip
$ ls sudachi-0.5.3
LICENSE-2.0.txt README.md javax.json-1.1.jar jdartsclone-1.2.0.jar licenses sudachi-0.5.3.jar sudachi.json sudachi_fulldict.json
Install the Sudachi dictionary
$ wget http://sudachi.s3-website-ap-northeast-1.amazonaws.com/sudachidict/sudachi-dictionary-latest-full.zip
$ unzip -j -d sudachi-dictionary-latest-full sudachi-dictionary-latest-full.zip
$ mv sudachi-dictionary-latest-full/system_full.dic sudachi-dictionary-latest-full/system_core.dic
$ ls sudachi-dictionary-latest-full
LEGAL LICENSE-2.0.txt system_core.dic
- Install Rudachi
# Gemfile
gem 'rudachi'
Then run bundle install.
- Initialize Rudachi
require 'rudachi'
Rudachi.configure do |config|
  config.jar_path = 'sudachi-0.5.3/sudachi-0.5.3.jar'
end
Rudachi::Option.configure do |config|
  config.p = 'sudachi-dictionary-latest-full'
end
- Done! Try parsing some text:
Rudachi::TextParser.parse('こんにちは世界')
=> "こんにちは\t感動詞,一般,*,*,*,*\t今日は\n世界\t名詞,普通名詞,一般,*,*,*\t世界\nEOS\n"
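If the call above fails, one thing worth checking is whether the paths given in the configuration step exist. Here is a minimal sketch of such a pre-flight check. `missing_sudachi_files` is a hypothetical helper, not part of Rudachi, and the paths and the `system_core.dic` file name are simply those used in the installation steps above.

```ruby
# Hypothetical pre-flight check (not part of Rudachi): returns the expected
# files that are missing, given the JAR path and the dictionary directory.
def missing_sudachi_files(jar_path, dict_dir)
  expected = [jar_path, File.join(dict_dir, 'system_core.dic')]
  expected.reject { |path| File.file?(path) }
end

# Paths as used in the installation steps above.
missing = missing_sudachi_files(
  'sudachi-0.5.3/sudachi-0.5.3.jar',
  'sudachi-dictionary-latest-full'
)

if missing.empty?
  puts 'Sudachi JAR and dictionary found.'
else
  warn "Missing: #{missing.join(', ')}"
end
```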