Category: Web Content Scrapers

33%

56%

2013-01-26

link_thumbnailer (gottfrois/link_thumbnailer)

0.21

No release in over 3 years

Low commit activity in last 3 years

Ruby gem generating thumbnail images from a given URL.

Popularity

820,811

511

103

Releases

3.4.0

2012-08-19

2020-07-24

Activity

88%

80%

2020-08-27

data_miner (seamusabshere/data_miner)

0.08

No commit activity in last 3 years

No release in over 3 years

There are a lot of open issues

Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses Upsert internally for speed.

Popularity

518,725

305

Releases

3.0.0

115

2009-10-30

2014-02-04

Activity

63%

100%

2013-07-08

cobweb (stewartmckee/cobweb)

0.08

No commit activity in last 3 years

No release in over 3 years

There are a lot of open issues

Cobweb is a web crawler that can use Resque to cluster crawls, letting it crawl extremely large sites quickly; its description claims this is much more performant than multi-threaded crawlers. It also works as a standalone crawler with a detailed statistics interface for monitoring crawl progress.

Popularity

351,662

224

Releases

1.2.1

2010-11-10

2021-01-09

Activity

50%

57%

2016-04-07

tanakai (glaucocustodio/tanakai)

0.07

Low commit activity in last 3 years

No release in over a year

Maintained fork of Kimurai, a modern web scraping framework written in Ruby and based on Capybara/Nokogiri.

Popularity

53,229

290

Releases

1.7.5

2022-08-13

2025-02-10

Activity

50%

66%

2021-12-23

sinew (gurgeous/sinew)

0.06

Low commit activity in last 3 years

No release in over a year

Crawl websites easily using Ruby recipes, with caching and Nokogiri.

Popularity

62,227

255

Releases

4.0.1

2012-06-04

2023-08-19

Activity

100%

37%

2018-04-28

boilerpipe-ruby (gregors/boilerpipe-ruby)

0.03

No commit activity in last 3 years

No release in over 3 years

A pure Ruby implementation of the boilerpipe web content extraction algorithm.

Popularity

1,407,092

Releases

0.5.0

2016-03-13

2021-02-15

Activity

42%

2019-05-06

fletcher (hulihanapplications/fletcher)

0.02

No commit activity in last 3 years

No release in over 3 years

Easily fetch product information from third-party websites such as Amazon, Steam, eBay, etc.

Popularity

86,783

Releases

0.6.9

2011-12-07

2014-05-12

Activity

75%

23%

2014-01-04

docparser (jurriaan/docparser)

0.01

No commit activity in last 3 years

No release in over 3 years

DocParser is a Ruby gem for web scraping.

Popularity

36,809

Releases

0.3.0

2013-04-11

2020-04-13

Activity

100%

2016-05-08

arachnid2 (samnissen/arachnid2)

0.01

No release in over 3 years

Low commit activity in last 3 years

A simple, fast web crawler

Popularity

35,529

Releases

0.4.0

2018-05-29

2020-07-15

Activity

100%

69%

2019-04-15

horsefield (apa512/horsefield)

0.0

Low commit activity in last 3 years

A long-lived project that still receives updates

It's a scraper

Popularity

103,438

Releases

0.7.1

2013-08-25

2025-05-17

Activity

2015-12-11