SiteMapper
Map all links on a given site.
SiteMapper will try to respect /robots.txt.
Works great with Wayback Archiver, a gem that crawls your site and submits each URL to the Internet Archive (Wayback Machine).
Installation
Install the gem:
gem install site_mapper
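Alternatively, if your project uses Bundler, you can add the gem to your Gemfile and run bundle install (plain RubyGems/Bundler usage, not specific to this gem):
# Gemfile
gem 'site_mapper'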
Usage
Command line usage:
# Crawl all found links on pages
# with the example.com domain
site_mapper example.com
Ruby usage:
# Crawl all found links on pages
# with the example.com domain
require 'site_mapper'
SiteMapper.map('example.com') do |new_url|
puts "New URL found: #{new_url}"
end
# Log to STDOUT
SiteMapper.map('example.com', logger: :system) do |new_url|
puts "New URL found: #{new_url}"
end
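The block is called once for each new URL that is found, so you can collect the results yourself. Below is a minimal sketch, assuming the crawl runs synchronously inside SiteMapper.map and relying only on the block API shown above; the output file name is an arbitrary choice for the example.
require 'site_mapper'

urls = []
SiteMapper.map('example.com') do |new_url|
  urls << new_url # collect each discovered URL
end

# Write one URL per line to a plain text file
File.write('example_com_urls.txt', urls.join("\n"))
puts "Found #{urls.size} URLs"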
Docs
You can find the docs online on RubyDoc.
This gem is documented using yard; to generate the documentation, run the following from the root of this repository:
yard # Generates documentation to doc/
Contributing
Contributions, feedback and suggestions are very welcome.
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
Notes
- Special thanks to the robots gem, which provided the bulk of the code in lib/robots.rb
Alternatives
There are a couple of great alternatives that are more mature and have more features than this gem. Please feel free to check them out: