HttpReader
Read any document on the internet and parse it into your own format :D
Installation
Add this line to your application's Gemfile:
gem 'http_reader'
And then execute:
$ bundle
Or install it yourself as:
$ gem install http_reader
Usage
engine = HttpReader::Engine.new(opts)
engine.read('http://www.google.com')
Available opts (a Hash):
- parsers: list of document parser classes [default: []]
- default_parser: parser used when no parser from parsers matches the URL [default: HashPageParser]
- http_client: HTTP client used to download page sources [default: HTTParty]
- browser: browser client used to process pages and download their source [default: Watir::Browser]
- logger: logger used by the engine [default: Logger]
Examples
Using the default parser (HashPageParser)
engine = HttpReader::Engine.new
read_opts = {title: 'h1', items: '.content li;array'}
engine.read('http://example.org', read_opts)
Where the page body is:
<h1>Information</h1>
<p>not important</p>
<div class="content">
Items: <ul><li>A</li><li>B</li><li>C</li></ul>
</div>
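The result should be (the h1 text as a string, and the text of every matching li as an array, as requested by the ;array suffix):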
{:title=>"Information", :items=>%w{A B C}}
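The read_opts hash maps each result key to a CSS selector, and the ;array suffix turns the value into a list of all matches. Below is a hypothetical, stdlib-only sketch of that behaviour, not HashPageParser's actual code: it swaps Nokogiri for REXML (so the page must be well-formed XML), supports only tag names, .class selectors and the descendant combinator, and assumes that without ;array only the first match is used. The names simple_match?, select_all and read_page are illustrative.

```ruby
require 'rexml/document'

# Hypothetical sketch of HashPageParser-style extraction using REXML (Ruby
# stdlib) instead of Nokogiri. Only a tiny CSS subset is supported: tag
# names, .class names, and the descendant combinator (a space).

# Does one simple selector ("li" or ".content") match this element?
def simple_match?(element, simple)
  if simple.start_with?('.')
    element.attributes['class'].to_s.split.include?(simple[1..])
  else
    element.name == simple
  end
end

# Returns every element under root that matches a selector like ".content li".
def select_all(root, selector)
  parts = selector.split
  matches = []
  root.each_recursive do |element|
    next unless simple_match?(element, parts.last)

    # Walk up the ancestors, consuming the remaining parts right to left.
    pending = parts[0...-1].reverse
    ancestor = element.parent
    while !pending.empty? && ancestor.is_a?(REXML::Element)
      pending.shift if simple_match?(ancestor, pending.first)
      ancestor = ancestor.parent
    end
    matches << element if pending.empty?
  end
  matches
end

# Builds the result hash. A spec is "selector" or "selector;array"; assumed
# semantics: ;array keeps every match, otherwise only the first is used.
def read_page(html, read_opts)
  root = REXML::Document.new("<root>#{html}</root>").root
  read_opts.each_with_object({}) do |(key, spec), result|
    selector, type = spec.split(';', 2)
    texts = select_all(root, selector).map { |el| el.text.to_s.strip }
    result[key] = type == 'array' ? texts : texts.first
  end
end

html = <<~HTML
  <h1>Information</h1>
  <p>not important</p>
  <div class="content">
  Items: <ul><li>A</li><li>B</li><li>C</li></ul>
  </div>
HTML

result = read_page(html, { title: 'h1', items: '.content li;array' })
p result # title: "Information", items: ["A", "B", "C"]
```

The paragraph is ignored because no spec selects it, which matches the expected result of the example.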
Using your own parser class
Class body:
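require 'nokogiri'

class TestParser < BasePageParser
  # URLs matching this pattern are handled by TestParser
  @pattern = /\A(http|https):\/\/www\.google\.com\z/

  class << self
    # Uses the browser to locate div#als and returns its HTML
    def browse_actions_for_html(browser, opts = {})
      div = browser.div(id: 'als')
      raise 'Cannot find div' unless div.exists?
      div.html
    end

    # Parses response.body with Nokogiri and returns the text of all <p> elements
    def parse(response, opts = {})
      n_body = Nokogiri::HTML(response.body)
      { text: n_body.css('p').text }
    end

    # Returning true asks the engine to use the browser instead of the HTTP client
    def use_browser
      true
    end
  end
end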
Initialization:
engine = HttpReader::Engine.new(default_parser: TestParser)
engine.read('http://www.google.com')
Or
engine = HttpReader::Engine.new(parsers: [TestParser])
engine.read('http://www.google.com')
Or
engine = HttpReader::Engine.new
engine.read('http://www.google.com', parser: TestParser)
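The three initialization styles above differ in how a parser gets chosen for a URL. The sketch below is a hypothetical illustration, not HttpReader's code: it assumes an explicit parser: read option wins, then the first class in parsers whose @pattern matches the URL, then default_parser (the fallback described for that option). SketchParser, GoogleParser, FallbackParser and resolve_parser are made-up names.

```ruby
# Hypothetical stand-in for a parser class: only the class-level @pattern is used.
class SketchParser
  class << self
    attr_reader :pattern
  end
end

class GoogleParser < SketchParser
  @pattern = %r{\Ahttps?://www\.google\.com\z}
end

class FallbackParser < SketchParser
  @pattern = nil
end

# Assumed precedence: explicit parser: option, then the first entry of
# parsers whose pattern matches the URL, then default_parser.
def resolve_parser(url, parsers:, default_parser:, read_opts: {})
  return read_opts[:parser] if read_opts[:parser]

  parsers.find { |parser| parser.pattern&.match?(url) } || default_parser
end

by_pattern = resolve_parser('http://www.google.com',
                            parsers: [GoogleParser], default_parser: FallbackParser)
fallback = resolve_parser('http://example.org',
                          parsers: [GoogleParser], default_parser: FallbackParser)
explicit = resolve_parser('http://example.org', parsers: [], default_parser: FallbackParser,
                          read_opts: { parser: GoogleParser })
p [by_pattern, fallback, explicit]
```

Keeping the pattern on the class itself lets the engine pick a parser without instantiating anything, which fits the class-method style of TestParser.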
More info about syntax
Dependencies
Gems
- nokogiri
- httparty
- headless
- watir-webdriver
System components
- Xvfb (installation on Ubuntu: sudo apt-get install xvfb)
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request