A cute little DSL for picking information off of web pages.

Skrape

Skrape provides a cute DSL for extracting information from pages on the web. You give it a URL and a block, and it gives you back a hash.

Installation

Add this line to your application's Gemfile:

gem 'skrape'

And then execute:

$ bundle

Or install it yourself as:

$ gem install skrape

Usage

Let's say you have a page like this:

<html><body><h1>I am a title</h1></body></html>

And you want a hash like this:

{title: "I am a title"}

You can run Skrape like this:

results = Skrape::Page.new("http://example.com").extract do
  extract_title with: 'h1'
end
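In the example above, the method name seems to carry the hash key: extract_title produces :title. The following is a minimal sketch of how a DSL like this could work, assuming instance_eval plus method_missing. It is an illustration, not Skrape's actual implementation. ExtractorSketch and the Hash that stands in for a parsed page are hypothetical, and they let the example run without Nokogiri.

```ruby
# Hypothetical sketch, not Skrape's source. A Hash of selector => element
# texts stands in for a parsed page so the example runs without Nokogiri.
class ExtractorSketch
  def initialize(document)
    @document = document
    @results = {}
  end

  # Evaluate the DSL block with this object as self, so bare calls such
  # as `extract_title with: "h1"` are dispatched to method_missing below.
  def extract(&block)
    instance_eval(&block)
    @results
  end

  def method_missing(name, *args, **options, &block)
    return super unless name.to_s.start_with?("extract_")

    key = name.to_s.delete_prefix("extract_").to_sym # extract_title -> :title
    texts = @document.fetch(options[:with], [])
    @results[key] = texts.join # default behavior: the element text
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("extract_") || super
  end
end

page = { "h1" => ["I am a title"] }
results = ExtractorSketch.new(page).extract do
  extract_title with: "h1"
end

puts results[:title] # prints "I am a title"
```

Because method_missing handles any extract_* name, a new field needs no new method definition. Each call simply adds one more key to the hash.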

By default Skrape gives you the element's text. When you are after something else, or need to massage the data, you can pass a proc with the and_run: option:

results = Skrape::Page.new(url).extract do
  extract_link_href with: 'a', and_run: proc {|link| link.attr('href').value }
end

The matched element(s) are passed into the proc as a Nokogiri::XML::NodeSet for you to play with. Whatever the proc returns is added to the hash of results.
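The difference between the default text and an and_run: proc can be sketched as a simple dispatch. This is a minimal illustration, not Skrape's code. The extract_value helper is hypothetical, and plain Structs stand in for Nokogiri nodes. In the real library the proc receives a Nokogiri::XML::NodeSet, so you would call something like link.attr('href').value as in the README example.

```ruby
# Hypothetical sketch, not Skrape's source: an `and_run:` proc takes over
# from the default "return the text" behavior. Structs stand in for nodes.
Element = Struct.new(:text, :attributes)

def extract_value(matches, and_run: nil)
  if and_run
    and_run.call(matches)    # caller decides what to pull out of the matches
  else
    matches.map(&:text).join # default: the element text
  end
end

links = [Element.new("Home", { "href" => "/index.html" })]

default = extract_value(links)
custom  = extract_value(links, and_run: proc { |els| els.first.attributes["href"] })

puts default # prints "Home"
puts custom  # prints "/index.html"
```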

If you want an error raised whenever a selector matches nothing, add error_when_selector_returns_nothing true to the block:

results = Skrape::Page.new(url).extract do
  error_when_selector_returns_nothing true
  extract_link_href with: 'a', and_run: proc {|link| link.attr('href').value }
end
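A strict mode like this can be sketched as a flag that the block sets before any extraction runs. This is a minimal sketch under stated assumptions, not Skrape's implementation. NoMatchError, StrictExtractorSketch, and extract_field are invented names, so check the gem for the error class it actually raises. A Hash of selector => texts stands in for the parsed page.

```ruby
# Hypothetical sketch, not Skrape's source: a strict mode that raises
# when a selector matches nothing. NoMatchError and extract_field are
# invented names; a Hash of selector => texts stands in for the page.
class NoMatchError < StandardError; end

class StrictExtractorSketch
  def initialize(document)
    @document = document
    @strict = false # off unless the block turns it on
    @results = {}
  end

  def error_when_selector_returns_nothing(flag)
    @strict = flag
  end

  def extract_field(key, with:)
    matches = @document.fetch(with, [])
    if matches.empty? && @strict
      raise NoMatchError, "selector #{with.inspect} returned nothing"
    end
    @results[key] = matches.join # "" when nothing matched in lenient mode
  end

  # Run the DSL block with this object as self and return the hash.
  def extract(&block)
    instance_eval(&block)
    @results
  end
end

page = { "h1" => ["I am a title"] } # note: no "a" elements

# Lenient (the sketch's default): the missing selector yields "".
lenient = StrictExtractorSketch.new(page).extract do
  extract_field :link_href, with: "a"
end

# Strict: the same lookup raises instead.
error_message = nil
begin
  StrictExtractorSketch.new(page).extract do
    error_when_selector_returns_nothing true
    extract_field :link_href, with: "a"
  end
rescue NoMatchError => e
  error_message = e.message
end

puts error_message # prints: selector "a" returned nothing
```

The flag is set first in the block, which means it only affects extractions that come after it. That ordering is a property of this sketch and may differ in the gem.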

Contributing

  1. Fork it (http://github.com/sleepycat/skrape/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request