# Graboid
Simply awesome web scraping. Better docs later. See specs.
## 0.3.4 Update
http://twoism.posterous.com/new-graboid-dsl
## Installation

    gem install nokogiri graboid

## Usage

    %w{rubygems graboid}.each { |f| require f }

    class RedditEntry
      include Graboid::Scraper

      selector '.entry'

      set :title
      set :domain, :selector => '.domain a'

      set :link, :selector => '.title' do |entry|
        entry.css('a').first['href']
      end

      page_with do |doc|
        doc.css('p.nextprev a').select{|a| a.text =~ /next/i }.first['href']
      end

      before_paginate do
        puts "opening page: #{self.source}"
        puts "collection size: #{self.collection.length}"
        puts "#{"*"*100}"
      end
    end

    @posts = RedditEntry.new( :source => 'http://reddit.com' ).all( :max_pages => 2 )

    @posts.each do |p|
      puts "title: #{p.title}"
      puts "domain: #{p.domain}"
      puts "link: #{p.link}"
      puts "#{"*"*100}"
    end

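
The `page_with` block in the example calls `.first['href']` directly, so on a page with no "next" link `first` returns `nil` and the call raises `NoMethodError`. Whether Graboid rescues that internally is not documented here, so the following is only a sketch of a nil-safe variant. `Link` and `next_page_href` are hypothetical names introduced for illustration; `Link` is a plain-Ruby stand-in for a Nokogiri node (it responds to `text` and `['href']`) so the snippet runs without any gems.

```ruby
# Hypothetical stand-in for a Nokogiri node: responds to #text and #['href'].
Link = Struct.new(:text, :href)

# Hypothetical helper: returns the href of the first link whose text
# matches /next/i, or nil when no such link exists (e.g. on the last page).
def next_page_href(links)
  link = links.find { |a| a.text =~ /next/i }
  link && link['href']
end

first_page = [Link.new('prev', '/?before=xyz'), Link.new('next', '/?after=abc')]
last_page  = [Link.new('prev', '/?before=xyz')]

p next_page_href(first_page)
p next_page_href(last_page)
```

Inside a scraper, the same helper could be called as `next_page_href(doc.css('p.nextprev a'))` from the `page_with` block. That Graboid treats a `nil` return as "stop paginating" is an assumption; check the specs before relying on it.
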
## Note on Patches/Pull Requests
- Fork the project.
- Make your feature addition or bug fix.
- Add tests for it. This is important so I don't break it in a future version unintentionally.
- Commit, and do not mess with the Rakefile, version, or history. (If you want to have your own version, that is fine, but bump the version in a commit by itself so I can ignore it when I pull.)
- Send me a pull request. Bonus points for topic branches.
## Copyright
Copyright (c) 2010 Christopher Burnett. See LICENSE for details.