Category: Web Content Scrapers

Pismo extracts content-related metadata from HTML pages and returns it in an organized form: a summary or first paragraph, body text, keywords, RSS feed URL, favicon, and so on.

238,080

747

0.7.4

2010-03-26

2010-12-19
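
As a rough illustration of the kind of metadata extraction the Pismo description refers to, the following is a minimal sketch that pulls a title, first paragraph, feed URL, and favicon out of an HTML string using only regular expressions from the Ruby standard library. It is not Pismo's implementation or API: `extract_metadata` and `SAMPLE_HTML` are hypothetical names invented for this example, and a real extractor would use a proper HTML parser rather than regexes.

```ruby
# Illustrative sketch only; NOT Pismo's API. `extract_metadata` and
# SAMPLE_HTML are hypothetical names made up for this example.
def extract_metadata(html)
  # Read the value of one attribute from a single tag string.
  attr = ->(tag, name) { tag[/\b#{name}\s*=\s*["']([^"']*)["']/i, 1] }

  link_tags = html.scan(/<link\b[^>]*>/i)
  feed_tag = link_tags.find { |t| attr.(t, "type").to_s =~ /rss\+xml/i }
  icon_tag = link_tags.find { |t| attr.(t, "rel").to_s =~ /\bicon\b/i }

  first_para = html[/<p\b[^>]*>(.*?)<\/p>/im, 1]

  {
    title: html[/<title\b[^>]*>(.*?)<\/title>/im, 1]&.strip,
    # Strip inline tags from the first paragraph to get plain text.
    lede: first_para&.gsub(/<[^>]+>/, "")&.strip,
    feed: feed_tag && attr.(feed_tag, "href"),
    favicon: icon_tag && attr.(icon_tag, "href")
  }
end

SAMPLE_HTML = <<~HTML
  <html><head>
    <title> Example Post </title>
    <link rel="alternate" type="application/rss+xml" href="/feed.xml">
    <link rel="icon" href="/favicon.ico">
  </head><body>
    <p>First <em>paragraph</em> of the post.</p>
    <p>Second paragraph.</p>
  </body></html>
HTML

meta = extract_metadata(SAMPLE_HTML)
meta.each { |key, value| puts "#{key}: #{value}" }
```

Fields that cannot be found come back as `nil`, so callers can tell "missing" apart from "empty".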

link_thumbnailer

0.29

No release in over 3 years

Low commit activity in last 3 years

gottfrois/link_thumbnailer

Ruby gem generating thumbnail images from a given URL.

718,700

512

3.4.0

2012-08-19

2020-07-24

cobweb

0.12

No release in over 3 years

Low commit activity in last 3 years

There are a lot of open issues

stewartmckee/cobweb

Cobweb is a web crawler that can use Resque to distribute crawls across a cluster, letting it crawl extremely large sites quickly, an approach the project describes as much more performant than multi-threaded crawlers. It can also run as a standalone crawler, with a statistics interface for monitoring crawl progress.

333,448

226

1.2.1

2010-11-10

2021-01-09
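
The queue-driven crawl pattern that the Cobweb description alludes to can be sketched in a few lines: each job handles one URL and enqueues any links it has not seen before, while shared counters track progress. This is a minimal single-process sketch under stated assumptions, not Cobweb's code or API. The in-memory `SITE` hash stands in for real HTTP fetches, a plain array stands in for a Resque queue, and `crawl_job` and `crawl` are hypothetical names.

```ruby
require "set"

# Hypothetical in-memory "site": URL => links found on that page.
SITE = {
  "/" => ["/about", "/blog"],
  "/about" => ["/"],
  "/blog" => ["/blog/1", "/blog/2"],
  "/blog/1" => ["/blog"],
  "/blog/2" => ["/blog/1", "/missing"]
}.freeze

# One unit of work: "fetch" a page and return its links.
# A real job would make an HTTP request and parse the response.
def crawl_job(url)
  SITE.fetch(url, [])
end

def crawl(start)
  queue = [start]    # stand-in for a shared job queue
  seen = Set[start]  # stand-in for shared "already queued" state
  stats = { jobs: 0, links: 0 }

  until queue.empty?
    url = queue.shift
    links = crawl_job(url)
    stats[:jobs] += 1
    stats[:links] += links.size
    # Set#add? returns nil when the link was already seen.
    links.each { |link| queue << link if seen.add?(link) }
  end

  [seen.to_a, stats]
end

urls, stats = crawl("/")
puts urls.inspect
puts stats.inspect
```

Because every job only reads from and writes to the queue and the seen-set, the same loop body could in principle be run by many workers at once if both structures lived in a shared store.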

data_miner

0.11

No commit activity in last 3 years

No release in over 3 years

There are a lot of open issues

seamusabshere/data_miner

Download, unpack from a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses Upsert internally for speed.

484,997

302

3.0.0

115

2009-10-30

2014-02-04

tanakai

0.09

Low commit activity in last 3 years

glaucocustodio/tanakai

Maintained fork of Kimurai, a modern web scraping framework written in Ruby and based on Capybara/Nokogiri

46,544

277

1.7.4

2022-08-13

2024-09-30

sinew

0.08

Low commit activity in last 3 years

No release in over a year

gurgeous/sinew

Crawl websites easily using Ruby recipes, with caching and Nokogiri.

54,752

255

4.0.1

2012-06-04

2023-08-19

boilerpipe-ruby

0.05

No commit activity in last 3 years

No release in over 3 years

gregors/boilerpipe-ruby

A pure Ruby implementation of the boilerpipe web content extraction algorithm.

1,389,081

0.5.0

2016-03-13

2021-02-15

fletcher

0.03

No release in over 3 years

Low commit activity in last 3 years

hulihanapplications/fletcher

Easily fetch product information from third party websites such as Amazon, Steam, eBay, etc.

82,181

0.6.9

2011-12-07

2014-05-12

docparser

0.01

No commit activity in last 3 years

No release in over 3 years

jurriaan/docparser

DocParser is a Ruby gem for web scraping.

33,216

0.3.0

2013-04-11

2020-04-13

arachnid2

0.01

No release in over 3 years

Low commit activity in last 3 years

samnissen/arachnid2

A simple, fast web crawler

30,909

0.4.0

2018-05-29

2020-07-15

horsefield

0.0

No commit activity in last 3 years

No release in over 3 years

apa512/horsefield

It's a scraper

97,007

0.6.1

2013-08-25

2020-05-29

url_scraper

0.0

No commit activity in last 3 years

No release in over 3 years

super-engineer/url_scraper

A simple plugin for extracting information from a URL entered by a user (similar to what Facebook does). This gem is built on top of the opengraph gem created by Michael Bleigh.

9,318

0.0.5
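
Link previews of the kind the url_scraper description mentions usually rely on Open Graph tags, which store page metadata in `<meta property="og:..." content="...">` elements. The sketch below collects those tags from an HTML string into a hash. It assumes nothing about url_scraper's or the opengraph gem's actual API: `open_graph` and `PAGE` are hypothetical names, and the regex-based parsing is a stand-in for a real HTML parser.

```ruby
# Illustrative sketch only; NOT the url_scraper or opengraph gem API.
# `open_graph` and PAGE are hypothetical names made up for this example.
def open_graph(html)
  html.scan(/<meta\b[^>]*>/i).each_with_object({}) do |tag, og|
    # Property name without the "og:" prefix, e.g. "title".
    property = tag[/\bproperty\s*=\s*["']og:([^"']+)["']/i, 1]
    # The backreference \1 makes the closing quote match the opening one.
    content = tag[/\bcontent\s*=\s*(["'])(.*?)\1/i, 2]
    og[property] = content if property && content
  end
end

PAGE = <<~HTML
  <head>
    <meta property="og:title" content="Example Article">
    <meta property="og:image" content="https://example.com/cover.png">
    <meta name="description" content="Not an Open Graph tag">
  </head>
HTML

puts open_graph(PAGE).inspect
```

Tags without an `og:` property, such as the plain `description` meta tag above, are skipped.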