A JRuby wrapper for Sudachi.

Rudachi

JRuby wrapper for Sudachi.

Text

Rudachi::TextParser.parse('東京都へ行く')
=> "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"
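
The returned string is Sudachi's plain-text output: one token per line with three tab-separated fields (surface form, comma-separated part-of-speech tags, normalized form), followed by an EOS line. If structured data is more convenient, the string can be split up as in the sketch below. The helper name parse_sudachi_output is hypothetical and not part of Rudachi; the sample string is the こんにちは世界 output from the installation steps.

```ruby
# Hypothetical helper, not part of Rudachi: converts Sudachi's
# tab-separated output into an array of hashes, one per token.
def parse_sudachi_output(output)
  output.each_line(chomp: true).each_with_object([]) do |line, tokens|
    # "EOS" marks the end of a sentence; skip it and any blank lines.
    next if line == 'EOS' || line.empty?

    surface, pos, normalized = line.split("\t", -1)
    tokens << {
      surface: surface,
      pos: pos.to_s.split(','),       # e.g. ["名詞", "普通名詞", "一般", "*", "*", "*"]
      normalized_form: normalized
    }
  end
end

output = "こんにちは\t感動詞,一般,*,*,*,*\t今日は\n世界\t名詞,普通名詞,一般,*,*,*\t世界\nEOS\n"
tokens = parse_sudachi_output(output)
tokens.each do |t|
  puts "#{t[:surface]} (#{t[:pos].first}) -> #{t[:normalized_form]}"
end
```

The same helper should work on the results of FileParser and StreamParser, since they return the same format.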

File

File.open('input.txt', 'w') { |f| f << '東京都へ行く' }
Rudachi::FileParser.parse('input.txt')
=> "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"

IO

Rudachi::StreamParser.parse(StringIO.new('東京都へ行く'))
=> "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"

Options

Options are passed to new as keyword arguments. Here o: writes the result to a file and m: selects the split mode, as with Sudachi's -o and -m command-line flags.

Rudachi::TextParser.new(o: 'output.txt', m: 'A').parse('東京都へ行く')
File.read('output.txt')
=> "東京\t名詞,固有名詞,地名,一般,*,*\t東京\n都\t名詞,普通名詞,一般,*,*,*\t都\nへ\t助詞,格助詞,*,*,*,*\tへ\n行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\nEOS\n"

Requirements

Rudachi runs on JRuby. For CRuby (MRI), see rudachi-rb.

Installation

  1. Install the Sudachi JAR and dictionary
Install the Sudachi JAR file
$ wget https://github.com/WorksApplications/Sudachi/releases/download/v0.5.3/sudachi-0.5.3-executable.zip
$ unzip sudachi-0.5.3-executable.zip
$ ls sudachi-0.5.3
LICENSE-2.0.txt  README.md  javax.json-1.1.jar	jdartsclone-1.2.0.jar  licenses  sudachi-0.5.3.jar  sudachi.json  sudachi_fulldict.json
Install the Sudachi dictionary
$ wget http://sudachi.s3-website-ap-northeast-1.amazonaws.com/sudachidict/sudachi-dictionary-latest-full.zip
$ unzip -j -d sudachi-dictionary-latest-full sudachi-dictionary-latest-full.zip
$ mv sudachi-dictionary-latest-full/system_full.dic sudachi-dictionary-latest-full/system_core.dic
$ ls sudachi-dictionary-latest-full
LEGAL  LICENSE-2.0.txt	system_core.dic
  2. Install Rudachi
# Gemfile
gem 'rudachi'

Then run bundle install.

  3. Initialize Rudachi
require 'rudachi'

Rudachi.configure do |config|
  config.jar_path = 'sudachi-0.5.3/sudachi-0.5.3.jar'
end

Rudachi::Option.configure do |config|
  config.p = 'sudachi-dictionary-latest-full'
end
  4. Done! Check that parsing works:
Rudachi::TextParser.parse('こんにちは世界')
=> "こんにちは\t感動詞,一般,*,*,*,*\t今日は\n世界\t名詞,普通名詞,一般,*,*,*\t世界\nEOS\n"
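
If the final step fails, a wrong path is a likely cause. The sketch below checks that the JAR and the renamed dictionary from step 1 are where the configuration in step 3 expects them. The helper name missing_sudachi_files is hypothetical and not part of Rudachi, and the default dictionary file name system_core.dic is an assumption taken from the mv command above.

```ruby
# Hypothetical helper, not part of Rudachi: returns the expected files
# that do not exist, so a bad path is reported before parsing is attempted.
def missing_sudachi_files(jar_path, dict_dir, dict_name = 'system_core.dic')
  [jar_path, File.join(dict_dir, dict_name)].reject { |path| File.file?(path) }
end

# Paths as used in the installation steps, relative to the working directory.
missing = missing_sudachi_files(
  'sudachi-0.5.3/sudachi-0.5.3.jar',
  'sudachi-dictionary-latest-full'
)

if missing.empty?
  puts 'Sudachi JAR and dictionary found.'
else
  warn "Missing: #{missing.join(', ')}"
end
```

Running this from the directory where the archives were unzipped shows whether the relative paths passed to config.jar_path and config.p point at existing files.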