DocParser
DocParser is a Ruby web scraping (screen scraping) tool.
Use it to extract information from HTML documents.
It is distributed as the docparser gem. You can find the documentation here.
Features
- XPath and CSS support through Nokogiri
- Parallel processing of documents
- Six output formats:
  - CSV
  - XLSX
  - HTML
  - YAML
  - JSON
  - Screen (for debugging and development)
- And more (easy to extend)
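The sketch below illustrates two of the features above: processing documents in parallel and writing the same rows in several output formats. It uses only the Ruby standard library (Thread::Queue, csv, json, yaml). It is not DocParser's own API. The parallel_map and to_records helpers, the word-count extraction and the sample pages are hypothetical.

```ruby
require 'csv'
require 'json'
require 'yaml'

# Hypothetical helper, not part of DocParser: runs the block on every
# document using a small pool of threads and returns results in input order.
def parallel_map(documents, workers: 4, &block)
  queue = Queue.new
  documents.each_with_index { |doc, index| queue << [doc, index] }
  results = Array.new(documents.size)

  threads = Array.new(workers) do
    Thread.new do
      loop do
        begin
          doc, index = queue.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break
        end
        results[index] = block.call(doc) # each thread writes a distinct slot
      end
    end
  end
  threads.each(&:join)
  results
end

# Write a header plus rows as CSV text.
def to_csv(header, rows)
  CSV.generate do |csv|
    csv << header
    rows.each { |row| csv << row }
  end
end

# Turn rows into hashes keyed by the header, for JSON and YAML output.
def to_records(header, rows)
  rows.map { |row| header.zip(row).to_h }
end

# Hypothetical input: page bodies already loaded as strings.
pages = [
  { path: 'index.html', body: 'Welcome to the home page' },
  { path: 'about.html', body: 'About us' }
]
header = %w[path words]
rows = parallel_map(pages, workers: 2) { |page| [page[:path], page[:body].split.size] }

puts to_csv(header, rows)
puts JSON.pretty_generate(to_records(header, rows))
puts to_records(header, rows).to_yaml
```

Extracting rows once and serializing them separately keeps the output step independent of the scraping step, which is one way a tool like this could support several formats.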
Installation
Add this line to your application's Gemfile:
gem 'docparser'
And then execute:
bundle
Or install it yourself using:
gem install docparser
Usage
See example.rb
Todo
- Better examples and documentation
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request