Project

tyccl

0.03
No commit activity in last 3 years
No release in over 3 years
"tyccl(同义词词林 哈工大扩展版) is a ruby gem that provides friendly functions to analyse similarity between Chinese Words."
 Project Readme

Tyccl

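tyccl (同义词词林 哈工大扩展版, the Harbin Institute of Technology extended edition of the Tongyici Cilin synonym thesaurus) is a Ruby gem that provides friendly functions for analysing the similarity between Chinese words.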

All of Tyccl's source files use the UTF-8 charset.
Lookup uses a trie and a hash. Time complexity is O(m) with m <= 5; space complexity is O(n), where n is proportional to the number of records in Cilin.
The data file Cilin.txt is 892.6 KB.
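
To make the trie-plus-hash idea concrete, here is a minimal sketch, not the gem's actual code. It assumes that each Cilin record has an 8-character code made of five levels (one letter, one letter, two digits, one letter, two digits) followed by a marker character, which would explain the bound m <= 5. The class name `CilinIndex`, its methods, and the sample codes are hypothetical.

```ruby
# Hypothetical index: a hash for word -> codes, a trie for code -> words.
class CilinIndex
  # Split a code such as "Aa01A01=" into its five levels (assumed layout).
  def self.levels(code)
    [code[0], code[1], code[2, 2], code[4], code[5, 2]]
  end

  def initialize
    @codes_by_word = Hash.new { |hash, word| hash[word] = [] }
    @root = {}
  end

  # Register one record: a code and the words listed under it.
  def add(code, words)
    node = @root
    self.class.levels(code).each { |segment| node = (node[segment] ||= {}) }
    (node[:words] ||= []).concat(words)
    words.each { |word| @codes_by_word[word] << code }
  end

  # Hash lookup: every code a word appears under.
  def codes(word)
    @codes_by_word.fetch(word, [])
  end

  # Trie lookup: at most five steps, one per level of the code.
  def words_at(code)
    node = @root
    self.class.levels(code).each do |segment|
      node = node[segment]
      return [] if node.nil?
    end
    node.fetch(:words, [])
  end
end

index = CilinIndex.new
index.add("Aa01A01=", ["人", "士"])   # illustrative record, not real data
index.add("Dd17A02=", ["人"])         # illustrative record, not real data
p index.codes("人")
p index.words_at("Aa01A01=")
```

The hash answers "which codes does this word have?" in one step, while the trie walk never exceeds five levels regardless of how many records are loaded.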

Installation

Add this line to your application's Gemfile:

gem 'tyccl'  
gem 'algorithms'

And then execute:

$ bundle

Or install it yourself as:

$ gem install tyccl  
$ gem install algorithms  

Usage

A simple example:

  
  # Result_t = Struct.new(:value, :x_id, :y_id)
  # This struct is used to return the result of an analysis:
  # * field 'value' stores the computed value
  # * fields 'x_id' and 'y_id' store the IDs of word X and word Y

  require 'tyccl'
 
  # Given wordA (string) and wordB (string),
  # returns a Result_t struct containing idA, idB, and the shortest semantic distance (int) between wordA and wordB.

  result = Tyccl.dist("西红柿", "黄瓜")  # "tomato", "cucumber"
  puts result.value
  puts result.x_id
  puts result.y_id

  # Given wordA (string) and wordB (string),
  # returns a Result_t struct containing the most similar pair of IDs (one for wordA, one for wordB) and the similarity (float) between idA and idB.
  result = Tyccl.sim("西红柿", "黄瓜")
  puts result.value
  puts result.x_id
  puts result.y_id

  # Given a word (string) and a level (int) in the range [0, 4] (default: 4).
  # The larger the level, the less similar the returned words are to the given word.
  # Returns a two-dimensional array of the word's similar words, grouped by each ID that the word matches.
  # Returns nil if the word has no similar words.

  m = Tyccl.get_similar("人")
  puts m
  # m is:
  # [["人", "士", "人物", "人士", "人氏", "人选"],
  #  ["成年人", "壮年人", "大人", "人", "丁", "壮丁", "佬", "中年人"],
  #  ["身体", "人"],
  #  ["人格", "人品", "人头", "人", "品质", "质地", "格调", "灵魂", "为人"],
  #  ["人数", "人头", "人口", "人", "口", "丁", "家口", "食指", "总人口"]]

Download the gem to find more methods in the API documentation and more examples in the tests.
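
Because a word can match several IDs, a call like `Tyccl.dist` has to pick one pair of IDs to report. The sketch below illustrates that selection only. It assumes a five-level code layout (letter, letter, two digits, letter, two digits) and a toy distance that counts tree edges between two codes. `toy_distance`, `closest_pair`, and the sample codes are hypothetical and are not the gem's implementation or data.

```ruby
# Same shape as the struct described in the usage comments.
Result_t = Struct.new(:value, :x_id, :y_id)

# Character ranges of the five assumed levels inside a code like "Aa01A01=".
LEVEL_SLICES = [0...1, 1...2, 2...4, 4...5, 5...7].freeze

# Number of leading levels two codes have in common.
def shared_levels(code_a, code_b)
  LEVEL_SLICES.take_while { |range| code_a[range] == code_b[range] }.size
end

# Toy metric: edges walked up from one code and down to the other.
def toy_distance(code_a, code_b)
  2 * (LEVEL_SLICES.size - shared_levels(code_a, code_b))
end

# Try every ID pair and keep the one with the smallest distance.
def closest_pair(ids_a, ids_b)
  ids_a.product(ids_b)
       .map { |a, b| Result_t.new(toy_distance(a, b), a, b) }
       .min_by(&:value)
end

result = closest_pair(["Bh06A32=", "Aa01A01="], ["Aa01B02="])  # made-up IDs
puts result.value
puts result.x_id
puts result.y_id
```

A similarity function could follow the same pattern with `max_by` in place of `min_by`, which matches the description of `Tyccl.sim` returning the most similar pair.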

Contributing

  1. Fork it ( http://github.com/JoeWoo/tyccl/fork )
  2. Create your feature branch (git checkout -b fork-new)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin fork-new)
  5. Create new Pull Request