graboid

web scraping made easier

Dependencies

Development

>= 1.2.9

Runtime

Graboid

Simply awesome web scraping. Better docs are coming later; for now, see the specs.

0.3.4 Update

The new Graboid DSL introduced in 0.3.4 is described at http://twoism.posterous.com/new-graboid-dsl

Installation

gem install nokogiri graboid

Usage

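require 'rubygems' # only needed on Ruby 1.8; later versions load RubyGems automatically
require 'graboid'

class RedditEntry
  include Graboid::Scraper

  # Each element matching this CSS selector is scraped as one RedditEntry.
  selector '.entry'

  # Fields read from inside each entry; :selector gives the CSS selector to use.
  set :title
  set :domain, :selector => '.domain a'

  # A block receives the matched node and can post-process it:
  # here, it returns the href of the first link inside '.title'.
  set :link, :selector => '.title' do |entry|
    entry.css('a').first['href']
  end

  # Returns the URL of the next page: the pager link whose text matches "next".
  page_with do |doc|
    doc.css('p.nextprev a').find { |a| a.text =~ /next/i }['href']
  end

  # Called before paginating to the next page.
  before_paginate do
    puts "opening page: #{self.source}"
    puts "collection size: #{self.collection.length}"
    puts '*' * 100
  end
end

# Scrape reddit.com, following "next" links for at most two pages.
posts = RedditEntry.new(:source => 'http://reddit.com').all(:max_pages => 2)

posts.each do |post|
  puts "title: #{post.title}"
  puts "domain: #{post.domain}"
  puts "link: #{post.link}"
  puts '*' * 100
end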
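The README doesn't document how graboid implements this DSL (the specs are the reference), so the block below is only a hypothetical, stdlib-only sketch of how class-level macros like `set` and `page_with` and a page-limited `all` could fit together. Everything in it is an assumption made for illustration: the names `MiniScraper`, `fields`, `pager` and `PAGES`, the three-argument `all`, and the guess that `set` without `:selector` falls back to `.<field name>`. Plain Hashes stand in for the documents Nokogiri would parse, and no HTTP requests are made; this is not graboid's code.

```ruby
# Hypothetical sketch of a graboid-style scraper DSL (NOT graboid's code).
# A "page" is a Hash standing in for a parsed HTML document:
#   :entries => one Hash per entry, mapping a selector to its matched text
#   :next    => URL of the next page, or nil on the last page
module MiniScraper
  # When a class includes MiniScraper, also give it the class-level macros.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Field definitions recorded by `set`: name => { :selector, :block }.
    def fields
      @fields ||= {}
    end

    # Record a field and define a reader for it. Assumption: with no
    # :selector, the field name is used as a CSS class (".title").
    def set(name, opts = {}, &block)
      fields[name] = { :selector => opts.fetch(:selector, ".#{name}"), :block => block }
      attr_reader name
    end

    # Store the block that extracts the next page's URL from a page.
    def page_with(&block)
      @pager = block
    end

    attr_reader :pager

    # Follow next-page links from `source`, stopping after `max_pages`
    # pages or when a page has no next link. `pages` maps URL => page.
    def all(pages, source, max_pages)
      records = []
      url = source
      max_pages.times do
        break if url.nil?
        page = pages.fetch(url) # a real scraper would fetch and parse here
        page[:entries].each { |entry| records << new(entry) }
        url = pager && pager.call(page)
      end
      records
    end
  end

  # Build one record, running each field's optional block on its raw value.
  def initialize(entry = {})
    self.class.fields.each do |name, field|
      raw = entry[field[:selector]]
      value = field[:block] ? field[:block].call(raw) : raw
      instance_variable_set("@#{name}", value)
    end
  end
end

class Post
  include MiniScraper

  set :title
  set :domain, :selector => '.domain a' do |text|
    text.strip
  end

  page_with { |page| page[:next] }
end

PAGES = {
  'p1' => { :entries => [{ '.title' => 'A', '.domain a' => ' a.com ' }], :next => 'p2' },
  'p2' => { :entries => [{ '.title' => 'B', '.domain a' => 'b.com' }], :next => 'p3' },
  'p3' => { :entries => [{ '.title' => 'C', '.domain a' => 'c.com' }], :next => nil }
}

posts = Post.all(PAGES, 'p1', 2)
posts.each { |post| puts "#{post.title} (#{post.domain})" }
```

With a limit of 2, only the first two stub pages are read, so the loop prints entries A and B; raising the limit past 3 stops early once a page has no next link. Keeping field definitions in a per-class Hash lets each including class carry its own schema while sharing the extraction and pagination code.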

Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so that I don't unintentionally break it in a future version.
  • Commit, but do not mess with the Rakefile, version, or history. (If you want to have your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.)
  • Send me a pull request. Bonus points for topic branches.

Copyright

Copyright (c) 2010 Christopher Burnett. See LICENSE for details.