chawan
======

A cup for chasen that provides an easy-to-use interface for extracting Japanese words


Methods
=======

* Chawan.parse(text)
    parse the given text with the analyzer; the default analyzer is :mecab

* Chawan.analyzer(xxx) (same as Chawan[xxx], Chawan.xxx)
    specify the analyzer


Class
=====

* Chawan::Nodes (Chawan.parse returns a Chawan::Nodes)
    #noun    : scope to nodes whose category is noun
    #verb    : scope to nodes whose category is verb
    #grep    : scope to nodes whose category matches the given pattern
    #compact : merge consecutive nodes of the same category

* Chawan::Node (a Chawan::Nodes has many Chawan::Node objects)
    #category   : part of speech
    #word       : text
    #attributes : hash of keys and values


Example
=======

    text = '登録された利用者'

    # 'parse' returns a Chawan::Nodes
    Chawan.parse(text)
    => [<名詞: '登録'>, <動詞: 'さ'>, <動詞: 'れ'>, <助動詞: 'た'>, <名詞: '利用'>, <名詞: '者'>]

    # Chawan::Nodes is enumerable
    Chawan.parse(text).select{|node| node.category == '名詞'}
    => [<名詞: '登録'>, <名詞: '利用'>, <名詞: '者'>]

    # gateway interface: noun
    Chawan.parse(text).noun
    => [<名詞: '登録'>, <名詞: '利用'>, <名詞: '者'>]

    # gateway interface: verb
    Chawan.parse(text).verb
    => [<動詞: 'さ'>, <動詞: 'れ'>, <助動詞: 'た'>]

    # gateway interface: grep
    Chawan.parse(text).grep(/動詞/)
    => [<動詞: 'さ'>, <動詞: 'れ'>, <助動詞: 'た'>]
    Chawan.parse(text).grep('動詞')
    => [<動詞: 'さ'>, <動詞: 'れ'>]

    # gateway interface: compact
    Chawan.parse(text).compact
    => [<名詞: '登録'>, <動詞: 'され'>, <助動詞: 'た'>, <名詞: '利用者'>]
    Chawan.parse(text).compact(/動詞/)
    => [<名詞: '登録'>, <動詞: 'された'>, <名詞: '利用'>, <名詞: '者'>]

    # gateway interface is chainable
    Chawan.parse(text).noun.verb
    => []

    # chainable is fun!
    Chawan.parse(text).noun
    => [<名詞: '登録'>, <名詞: '利用'>, <名詞: '者'>]
    Chawan.parse(text).compact.noun
    => [<名詞: '登録'>, <名詞: '利用者'>]
    Chawan.parse(text).noun.compact
    => [<名詞: '登録利用者'>]


Analyzer
========

The parser engine is called an 'analyzer'.
Available analyzers are:

* mecab : (default)
* chasen

    Chawan[:mecab].parse('test')
    => [<名詞: 'test'>]

    # same as
    #   Chawan.mecab.parse('test')
    #   Chawan.analyzer(:mecab).parse('test')
    #   Chawan.parse('test')    # default analyzer is :mecab

    Chawan[:chasen].parse('test')
    => [<記号: 't'>, <記号: 'e'>, <記号: 's'>, <記号: 't'>]


Required
========

* UTF-8
* 'mecab' unix command (and its path)


Todo
====

* use open3 rather than backquotes for executing unix commands


Author
======

maiha@wota.jp
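
The results of chaining #noun and #compact in a different order can be reproduced without MeCab installed. The following is a minimal sketch, not Chawan's actual implementation: Node, SAMPLE, nouns and compact are hypothetical stand-ins, and the merging rule (adjacent nodes with equal categories, or adjacent nodes that both match the pattern when one is given) is an assumption inferred from the outputs shown in the Example section.

```ruby
# Hypothetical stand-in for Chawan::Node: only a category and a word.
Node = Struct.new(:category, :word)

# The node list that the README shows for '登録された利用者'.
SAMPLE = [
  Node.new('名詞', '登録'),
  Node.new('動詞', 'さ'),
  Node.new('動詞', 'れ'),
  Node.new('助動詞', 'た'),
  Node.new('名詞', '利用'),
  Node.new('名詞', '者')
]

# Assumed equivalent of #noun: keep only nodes whose category is 名詞.
def nouns(nodes)
  nodes.select { |node| node.category == '名詞' }
end

# Assumed equivalent of #compact: merge each run of adjacent nodes into one.
# Without a pattern, a run is a series of nodes with equal categories.
# With a pattern, a run is a series of nodes whose categories all match it.
# The merged node keeps the category of the first node in the run.
def compact(nodes, pattern = nil)
  runs = nodes.chunk_while do |a, b|
    if pattern
      pattern === a.category && pattern === b.category
    else
      a.category == b.category
    end
  end
  runs.map { |run| Node.new(run.first.category, run.map(&:word).join) }
end

compact(SAMPLE).map(&:word)          # => ["登録", "され", "た", "利用者"]
compact(SAMPLE, /動詞/).map(&:word)  # => ["登録", "された", "利用", "者"]
nouns(compact(SAMPLE)).map(&:word)   # => ["登録", "利用者"]
compact(nouns(SAMPLE)).map(&:word)   # => ["登録利用者"]
```

Order matters because filtering first removes the verbs that separated '登録' from '利用者', so the three nouns become adjacent and merge into a single node.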