slasherrb
This project is actually the ruby version of slasherjs. Slasher is a library that could extract the main content of an HTML article document. The result of extraction is depending of assumption on HTML document structure itself. Therefore, there may be flaws in the result if the document doesn't match the structure that is recognised by the library. This condition will make the library will be improved from time to time.
How To Install
Like other rubygems, just:
gem install slasher
or put this on your Gemfile
gem 'slasher'
How To Use
To use the library, you need to have an HTML document first.
require 'net/http'
require 'slasher'
uri = URI("http://sea-games-2015.liputan6.com/read/2252937/all-indonesia-finals-ganda-putra-sumbang-emas")
html = Net::HTTP.get(uri)
slasher = Slasher.new(html)
content = slasher.slash
#content variable will have the main content of the HTML document (article).
Website Coverage
This library has been tested against some websites and you can see the complete list in this document
TODO
- Add more test cases: international websites
- Performance analysis
- Better API documentation