Skrape
Skrape provides a cute DSL for extracting information from web pages. You give it a URL and a block, and it gives you back a hash.
Installation
Add this line to your application's Gemfile:
gem 'skrape'
And then execute:
$ bundle
Or install it yourself as:
$ gem install skrape
Usage
Let's say you have a page like this:
<html><body><h1>I am a title</h1></body></html>
And you want a hash like this:
{title: "I am a title"}
You can run Skrape like this:
results = Skrape::Page.new("http://example.com").extract do
  extract_title with: 'h1'
end
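To make the shape of the DSL concrete, here is a toy sketch of one way a block like the one above can be turned into a hash. It is not Skrape's source. ToyExtractor is a hypothetical name, and the sketch assumes the block is run with instance_eval so that calls such as extract_title reach method_missing, with the hash key taken from the part of the method name after extract_ (which matches the title key in the example). Real selector matching is replaced by a crude regex that only understands bare tag names, so the sketch runs with the standard library alone; Skrape itself works with Nokogiri.

```ruby
# Toy extract_* DSL (hypothetical; not Skrape's actual implementation).
class ToyExtractor
  def initialize(html)
    @html = html
    @results = {}
  end

  # Run the block with this object as self, so bare extract_* calls land here.
  def extract(&block)
    instance_eval(&block)
    @results
  end

  # extract_title with: 'h1'  ->  @results[:title] = contents of the first <h1>
  def method_missing(name, *args, &block)
    return super unless name.to_s.start_with?("extract_")

    options = args.first || {}
    key = name.to_s.delete_prefix("extract_").to_sym
    @results[key] = text_of(options.fetch(:with))
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("extract_") || super
  end

  private

  # Crude stand-in for a CSS selector engine: bare tag names only.
  # Returns what sits between the first matching open and close tags, or nil.
  def text_of(tag)
    t = Regexp.escape(tag)
    @html[%r{<#{t}\b[^>]*>(.*?)</#{t}>}m, 1]
  end
end

html = "<html><body><h1>I am a title</h1></body></html>"
results = ToyExtractor.new(html).extract do
  extract_title with: 'h1'
end
puts results[:title] # prints "I am a title"
```

One consequence of the instance_eval approach is that the block can still read local variables from the surrounding scope, while bare method calls inside it are resolved against the extractor rather than your own object.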
Skrape will give you the element's text by default, but when you are after something else, or need to massage the data first, you can pass a proc with the and_run: option:
results = Skrape::Page.new(url).extract do
  extract_link_href with: 'a', and_run: proc { |link| link.attr('href').value }
end
The matched element(s) will be passed into the proc as a Nokogiri::XML::NodeSet for you to play with. Whatever text the proc returns will be added to the results hash.
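Below is a minimal sketch of the default-versus-proc behaviour described above, assuming the extractor simply calls the and_run: proc with the matched elements when one is given and otherwise returns their text. extract_value, FakeNode and FakeNodeSet are hypothetical names. The fakes stand in for Nokogiri objects so the example runs with only the standard library, and they copy two real NodeSet behaviours worth knowing: text joins the text of every match, while attr(name) looks only at the first match. That is why a proc like link.attr('href').value yields a single href even when the page has several links; mapping over the set collects them all.

```ruby
# Stand-in for a Nokogiri node: its text plus an attribute hash (hypothetical).
FakeNode = Struct.new(:text, :attributes) do
  # Like Nokogiri::XML::Node#[]: read one attribute as a String.
  def [](name)
    attributes[name]
  end
end

# Stand-in for Nokogiri::XML::NodeSet (hypothetical, for illustration only).
class FakeNodeSet
  include Enumerable
  FakeAttr = Struct.new(:value)

  def initialize(nodes)
    @nodes = nodes
  end

  def each(&block)
    @nodes.each(&block)
  end

  # Like NodeSet#text: the text of every node, concatenated.
  def text
    map(&:text).join
  end

  # Like NodeSet#attr(name): only the first node is consulted.
  def attr(name)
    node = first
    node && FakeAttr.new(node[name])
  end
end

# Hypothetical helper: call the and_run: proc if given, else return the text.
def extract_value(elements, and_run: nil)
  and_run ? and_run.call(elements) : elements.text
end

links = FakeNodeSet.new([
  FakeNode.new("Home", { "href" => "/" }),
  FakeNode.new("About", { "href" => "/about" })
])

extract_value(links)                                             # => "HomeAbout"
extract_value(links, and_run: proc { |l| l.attr('href').value }) # => "/"
extract_value(links, and_run: proc { |l| l.map { |a| a['href'] }.join(', ') }) # => "/, /about"
```

The last proc joins the hrefs into one string so that it still returns text, which is what the README says gets stored in the hash.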
For those moments when you want an error raised when a selector matches nothing, you can add:
results = Skrape::Page.new(url).extract do
  error_when_selector_returns_nothing true
  extract_link_href with: 'a', and_run: proc { |link| link.attr('href').value }
end
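The flag above can be pictured as a guard applied to each selector's matches before any proc sees them. This is a sketch under assumptions: checked_matches and EmptySelectorError are hypothetical names, and the README does not say which exception class Skrape actually raises, so check the gem before rescuing a specific class in real code.

```ruby
# Hypothetical exception class; the README does not name Skrape's real one.
class EmptySelectorError < StandardError; end

# Guard sketch: with the flag on, an empty match raises instead of
# silently producing an empty value.
def checked_matches(selector, matches, error_when_empty:)
  if error_when_empty && matches.empty?
    raise EmptySelectorError, "selector #{selector.inspect} matched nothing"
  end
  matches
end

begin
  checked_matches('a', [], error_when_empty: true)
rescue EmptySelectorError => e
  puts e.message # prints: selector "a" matched nothing
end
```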
Contributing
- Fork it (http://github.com/sleepycat/skrape/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request