Repository is archived
No commit activity in last 3 years
No release in over 3 years
Object scraper is a thin wrapper for Hpricot that enables recipe-like extraction of Ruby objects from various websites.
 Dependencies

Runtime

>= 0.8.2
 Project Readme

Object Scraper

Description

Object scraper is a thin wrapper for Hpricot that enables recipe-like extraction of Ruby objects from various websites.
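To make "recipe-like extraction" concrete, here is a minimal plain-Ruby sketch of the idea. It is not object-scraper's implementation: the Recipe class and the extract method are hypothetical, and plain hashes stand in for the Hpricot nodes the gem actually works with. Each attribute call inside the block records an extractor, and every node becomes one object by running all recorded extractors against it.

```ruby
# Hypothetical sketch, not object-scraper's source: it only illustrates the
# "recipe" idea of pairing each attribute with a block that extracts it.
class Recipe
  attr_reader :extractors

  def initialize
    @extractors = {}
  end

  # `recipe.text { |node| ... }` records the block as the extractor for :text.
  def method_missing(name, *args, &block)
    return super unless block && args.empty?

    @extractors[name] = block
  end
end

# Build one instance of `klass` per node by running every recorded extractor
# and assigning its result through the matching attribute writer.
def extract(klass, nodes)
  recipe = Recipe.new
  yield recipe
  nodes.map do |node|
    obj = klass.new
    recipe.extractors.each { |attr, blk| obj.public_send("#{attr}=", blk.call(node)) }
    obj
  end
end

class Entry
  attr_accessor :text, :date
end

# Plain hashes stand in for parsed HTML nodes (an assumption of this sketch).
nodes = [{ "content" => "hello" }, { "content" => "world" }]

entries = extract(Entry, nodes) do |s|
  s.text { |node| node["content"] }
end

p entries.map(&:text) # prints ["hello", "world"]
```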

Install

Gem

gem install object-scraper --source http://gemcutter.org

Rails

config.gem 'object-scraper', :source => 'http://gemcutter.org'

Example

require 'date' # provides DateTime, used by the date extractor below

class Entry
  attr_accessor :text, :date
end

uri     = "http://twitter.com/twitter"
pattern = ".status"

Scraper.define(:twitter, :class => :entry, :source => uri, :node => pattern) do |s|
  s.text { |node| node.at(".entry-content").inner_html }
  s.date { |node| DateTime.parse(node.at(".timestamp")[:data][/\'.*\'/].delete("'")) }
end

@objects = Scraper.parse(:twitter)
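The date extractor in this example does three things to the data attribute of the .timestamp node: it slices out the single-quoted part with /\'.*\'/, removes the quotes, and hands the rest to DateTime.parse. The sketch below replays those steps on a hypothetical attribute value; the real page markup may have looked different.

```ruby
require "date"

# Hypothetical value of the `data` attribute on a ".timestamp" node.
data = "{time:'Sat Oct 10 17:32:05 +0000 2009'}"

# Same steps as the date extractor: slice the quoted part (greedy, from the
# first quote to the last), drop the quotes, then parse what remains.
quoted = data[/\'.*\'/]     # "'Sat Oct 10 17:32:05 +0000 2009'"
raw    = quoted.delete("'") # "Sat Oct 10 17:32:05 +0000 2009"
date   = DateTime.parse(raw)

puts date.strftime("%Y-%m-%d %H:%M") # prints 2009-10-10 17:32
```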

If you define multiple scrapers, you can collect the objects from all of them with a single call:

@objects = Scraper.parse_all
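One way to picture parse_all is as a registry of named definitions whose results are concatenated. The MiniRegistry module below is a hypothetical sketch under that assumption; it assumes the results are flattened into one array in definition order, which may not match the gem exactly.

```ruby
# Hypothetical registry, not object-scraper's code: each definition is stored
# under its name, and parse_all concatenates the results of all of them.
module MiniRegistry
  @definitions = {}

  class << self
    def define(name, &producer)
      @definitions[name] = producer
    end

    def parse(name)
      @definitions.fetch(name).call
    end

    # Hash preserves insertion order, so results follow definition order.
    def parse_all
      @definitions.keys.flat_map { |name| parse(name) }
    end
  end
end

MiniRegistry.define(:first)  { ["a", "b"] }
MiniRegistry.define(:second) { ["c"] }

p MiniRegistry.parse_all # prints ["a", "b", "c"]
```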

Advanced Example

You can use another HTML parser instead of Hpricot by replacing the proc stored in Scraper.scrape_source_with. For example, to use Nokogiri:

require 'nokogiri'
Scraper.scrape_source_with = Proc.new { |source| Nokogiri::HTML(source) }

Scraper.define(:twitter, :class => :entry, :source => uri, :node => pattern) do |s|
  # initialize your objects here accordingly
end
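The override works because the parser is simply a callable stored on the module. Here is a minimal sketch of that pattern, with a hypothetical MiniScraper module and a toy line-splitting default standing in for Hpricot, so it runs without Nokogiri installed.

```ruby
# Hypothetical sketch of a swappable parser hook; object-scraper's real
# implementation may differ.
module MiniScraper
  # Default "parser": split the source into non-empty lines (a stand-in for Hpricot).
  DEFAULT_PARSER = proc { |source| source.lines.map(&:strip).reject(&:empty?) }

  class << self
    attr_writer :scrape_source_with

    # Fall back to the default until someone assigns a replacement.
    def scrape_source_with
      @scrape_source_with ||= DEFAULT_PARSER
    end

    def parse_source(source)
      scrape_source_with.call(source)
    end
  end
end

html = "<p>one</p>\n<p>two</p>\n"
p MiniScraper.parse_source(html) # prints ["<p>one</p>", "<p>two</p>"]

# Swap in another parser by assigning any object that responds to #call.
MiniScraper.scrape_source_with = proc { |source| source.scan(%r{<p>(.*?)</p>}).flatten }
p MiniScraper.parse_source(html) # prints ["one", "two"]
```

Storing a proc rather than hard-coding a parser keeps the DSL independent of the HTML library; only the node methods used inside the extractor blocks (such as at or inner_html) need to exist on whatever the new parser returns.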

Rails

When object-scraper is used as a gem in a Rails project, all scraper definitions in RAILS_ROOT/scrapers are loaded automatically.
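A rough sketch of how definitions in a directory can be picked up automatically is to load every .rb file in it, in sorted order. The load_scraper_definitions helper is hypothetical, and a temporary directory stands in for RAILS_ROOT/scrapers; the gem's actual Rails integration may work differently.

```ruby
require "tmpdir"

# Hypothetical helper: load every Ruby file in a directory, in sorted order,
# so each file can call Scraper.define at load time.
def load_scraper_definitions(dir)
  Dir.glob(File.join(dir, "*.rb")).sort.each { |path| load path }
end

LOADED = []

# A temporary directory stands in for RAILS_ROOT/scrapers.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "twitter.rb"), "LOADED << :twitter\n")
  File.write(File.join(dir, "blog.rb"), "LOADED << :blog\n")
  load_scraper_definitions(dir)
end

p LOADED # prints [:blog, :twitter]
```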

Author