HTML parsing

nokogiri

1297

927
74
Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri's many features is the ability to search documents via XPath or CSS3 selectors. XML is like violence - if it doesn’t solve your problems, you are not using enough of it.

Last commit: Thu, 22 Jul 2010 12:04:52 +0000

gem install nokogiri

Downloads: 316003

v1.4.3
19

hpricot

294

166
36
A swift, liberal HTML parser with a fantastic library.

Last commit: Mon, 17 May 2010 04:28:20 +0000

gem install hpricot

Downloads: 178819

v0.8.2
131525

scrubyt

251

247
28
scRUBYt! is a web scraping framework that is easy to learn and use, yet powerful and effective. Its most interesting part is a web-scraping DSL built on Hpricot and WWW::Mechanize, which lets you navigate to the page of interest, then extract and query data records with a few lines of code. It is hard to describe scRUBYt! in a few sentences; you have to see it for yourself!

Last commit: Mon, 25 May 2009 17:07:26 +0000

gem install scrubyt

Downloads: 3943

v0.4.06
2866

scrapi

56

72
3
scrAPI is an HTML scraping toolkit for Ruby. It uses CSS selectors to write easy, maintainable scraping rules to select, extract and store data from HTML content.

Last commit: Mon, 25 Aug 2008 20:41:23 +0000

gem install scrapi

Downloads: 3066

v1.2.0
2959

libxml-ruby

17

10
2
The libxml-ruby project provides Ruby language bindings for the GNOME Libxml2 XML toolkit. It is free software, released under the MIT License. The primary advantage of libxml-ruby over REXML is performance: if speed is what you need, these bindings are worth considering.

Last commit: Sun, 02 May 2010 21:38:42 +0000

gem install libxml-ruby

Downloads: 90757

v1.1.4
32966