RCrawler
A Capybara wrapper for web crawling.
Dependencies
- nokogiri requires libxml2.
- capybara-webkit requires Qt. See the capybara-webkit wiki for installation instructions.
Installation
Add this line to your application's Gemfile:
gem 'rcrawler'
And then execute:
$ bundle
Or install it yourself as:
$ gem install rcrawler
Usage
Crawl
require "rcrawler"
RCrawler.crawl do
  # Any Capybara DSL call can be used here
  visit("https://example.com/login")
  page.fill_in("name", with: "user")
  page.fill_in("password", with: "secret")
  page.click_button("send")
  page.save_screenshot("/tmp/example.png")

  # Screenshot shortcut: equivalent to visit(url) followed by
  # page.save_screenshot(path)
  screenshot("http://example.com", "/tmp/example.png")

  # Nokogiri: doc returns Nokogiri::HTML(page.html)
  visit("http://example.com")
  doc.css("a.some_link").each { |a| puts a.attr("href") }
end
Configuration
RCrawler.configure do |c|
  c.threads = 10           # => default is 8
  c.timeout = 20           # => default is 10
  c.timeout_proc = :ignore # => default is :raise
end
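The difference between the :raise and :ignore settings can be illustrated with a small standalone sketch. It does not use RCrawler and is not taken from its source: `with_timeout` and its `on_timeout:` keyword are hypothetical names, and the behaviour shown (propagate the error versus silently return nil) is an assumption about what the two values of timeout_proc mean.

```ruby
require "timeout"

# Hypothetical helper, not part of RCrawler: runs a block under a time limit
# and handles expiry according to a policy modelled on timeout_proc.
def with_timeout(seconds, on_timeout: :raise)
  unless %i[raise ignore].include?(on_timeout)
    raise ArgumentError, "unknown policy: #{on_timeout.inspect}"
  end

  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  raise if on_timeout == :raise # :raise propagates the error to the caller
  nil                           # :ignore swallows the timeout and returns nil
end

# Under :ignore a slow step is skipped quietly.
with_timeout(0.1, on_timeout: :ignore) { sleep 1 }

# Under :raise (the default here) the caller has to handle the error.
begin
  with_timeout(0.1) { sleep 1 }
rescue Timeout::Error => e
  puts "timed out: #{e.message}"
end
```

With :ignore, a long crawl can continue past one unresponsive page; with :raise, the first slow page stops the run unless the caller rescues the error.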
Async processing
RCrawler.async do
  crawl do
    # do something
  end

  crawl do
    # do something
  end

  crawl do
    # do something
  end
end
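One way to picture several crawl blocks running at once is a fixed pool of worker threads draining a job queue. The sketch below uses only the Ruby standard library and is not RCrawler's implementation: `run_concurrently` is a hypothetical name, and the link between the pool size and the threads setting is an assumption.

```ruby
# Hypothetical sketch, not RCrawler's implementation: run callable jobs on a
# fixed number of worker threads and return the results in job order.
def run_concurrently(jobs, threads: 8)
  queue = Queue.new
  jobs.each_with_index { |job, index| queue << [job, index] }
  queue.close # pop returns nil once the closed queue is empty

  results = Array.new(jobs.size)
  workers = Array.new([threads, jobs.size].min) do
    Thread.new do
      while (item = queue.pop)
        job, index = item
        results[index] = job.call # each worker writes to its own slot
      end
    end
  end

  workers.each(&:join) # wait for every job to finish
  results
end

# Three stand-in "crawls" that each take about 0.1 seconds.
jobs = %w[a b c].map { |name| -> { sleep 0.1; "crawled #{name}" } }
p run_concurrently(jobs, threads: 3)
# => ["crawled a", "crawled b", "crawled c"]
```

Storing each result by its job index keeps the output order stable even though the jobs may finish in any order.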
Command
% rcrawler help
Commands:
rcrawler help [COMMAND] # Describe available commands or one specific command
rcrawler sc http://example.com # Get screen shot
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request