Scrapula
Scrapula is a library for scraping web pages that simplifies some of the common actions involved.
It has a simple API that can be used in several ways and contexts, plus a shorter one that makes working with pages easier when keystrokes are scarce, such as in irb / pry sessions or quick and dirty scripts.
Requirements
It uses Mechanize and Nokogiri to fetch pages and extract information, and RSpec for testing.
Configuration
If you want to show the output of some steps:
Scrapula.verbose = true
API
Perform requests:
page = Scrapula.get 'example.net' #=> Scrapula::Page object
page = Scrapula.post 'example.net', { q: 'a query' } #=> Scrapula::Page object
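Note that the URLs above omit the scheme. A wrapper like this presumably normalizes bare hosts before handing them to Mechanize; here is a minimal sketch of that idea (the `normalize_url` helper is hypothetical, not part of Scrapula's API):

```ruby
require 'uri'

# Hypothetical helper: prepend a scheme when the URL lacks one,
# so bare hosts like 'example.net' become valid absolute URLs.
def normalize_url(url)
  uri = URI.parse(url)
  uri.scheme ? url : "http://#{url}"
end

normalize_url('example.net')          # => "http://example.net"
normalize_url('https://example.net')  # => "https://example.net"
```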
Extract information from the page:
# Using a CSS selector (all elements)
page.search! 'a'
# Using a CSS selector (first element)
page.at! 'h1'
# Using XPath (first element)
page.at! '//h1'
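Since `at!` and `search!` accept both CSS selectors and XPath expressions, the library presumably tells them apart by shape, much as Nokogiri's combined `at` / `search` methods do. A rough sketch of such a dispatch rule (hypothetical, not Scrapula's actual code):

```ruby
# Hypothetical dispatcher: treat queries that start with '/' or './'
# as XPath, and anything else as a CSS selector.
def query_kind(query)
  query.start_with?('/', './') ? :xpath : :css
end

query_kind('a')     # => :css
query_kind('//h1')  # => :xpath
```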
Perform a GET request:
Scrapula.get 'example.net'
S interface
This API is not required by default, so it is up to you to use it:
require 'scrapula/s'
For every HTTP verb it provides both the full method and a one-letter shortcut:
S.get 'example.net'
S.g 'example.net'
S.post 'example.net'
S.p 'example.net'
S.put 'example.net'
S.u 'example.net'
S.patch 'example.net'
S.a 'example.net'
S.delete 'example.net'
S.d 'example.net'
S.head 'example.net'
S.h 'example.net'
Additionally, GET requests can be performed through the shortest invocation:
S 'example.net'
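The one-letter shortcuts and the bare `S '...'` call could be built as plain aliases on top of the full verb methods; the sketch below shows that pattern under stated assumptions (the request bodies are stubbed strings, since the real `S` delegates to Scrapula):

```ruby
# Sketch of an S-style interface: full verb methods plus
# one-letter aliases. The bodies are stubs for illustration.
module S
  module_function

  def get(url)
    "GET #{url}" # stub; the real method would delegate to Scrapula.get
  end

  # One-letter shortcuts are simple aliases on the module's singleton
  class << self
    alias_method :g, :get
  end
end

# The shortest invocation, S 'example.net', works because Ruby allows
# a top-level method with the same name as the module:
def S(url)
  S.get(url)
end

S.g 'example.net'  # => "GET example.net"
S 'example.net'    # => "GET example.net"
```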
Examples
There are more examples in the examples folder.
Changelog
You can read about previous changes in CHANGELOG.md.
Contributing
Authors
Juan A. Martín Lucas (https://github.com/j-a-m-l)
License
This project is licensed under the MIT license. See LICENSE for details.