### Site Checker
Site Checker is a simple Ruby gem that helps you check the integrity of your website by recursively visiting the referenced pages and images. I use it in my test environments to make sure that my websites don't have any dead links.
Install
gem install site_checker
Usage
In Test Code
First, load site_checker by adding this line to the file where you would like to use it:
require 'site_checker'
If you want to use it for testing, the line should go in test_helper.rb.
The usage is quite simple:
check_site("http://localhost:3000/app", "http://localhost:3000")
puts collected_remote_pages.inspect
puts collected_local_pages.inspect
puts collected_remote_images.inspect
puts collected_local_images.inspect
puts collected_problems.inspect
The snippet above opens http://localhost:3000/app and looks for links and images. If it finds a link to a local page, it recursively checks that page, too. The second argument, http://localhost:3000, defines the root URL of your website.
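To illustrate why the root argument matters, here is a minimal sketch of how a link could be classified as local or remote relative to a root URL. The helper `local_link?` is hypothetical and uses only Ruby's standard library; it is an assumption about the general idea, not site_checker's actual implementation.

```ruby
require "uri"

# Hypothetical helper, not part of site_checker: decides whether a link
# found on a page belongs to the site rooted at `root`.
def local_link?(link, root)
  uri = URI.parse(link)
  # Skip non-web schemes such as mailto: or javascript:.
  return false unless [nil, "http", "https"].include?(uri.scheme)
  # Relative links such as "/about" or "img/logo.png" have no host.
  return true if uri.host.nil?

  root_uri = URI.parse(root)
  uri.host == root_uri.host && uri.port == root_uri.port
rescue URI::InvalidURIError
  # Unparseable links are treated as non-local in this sketch.
  false
end

root = "http://localhost:3000"
links = ["/about", "http://localhost:3000/app/contact", "http://example.com/"]

local, remote = links.partition { |link| local_link?(link, root) }
puts "local:  #{local.inspect}"
puts "remote: #{remote.inspect}"
```

In this sketch only the links in `local` would be followed recursively; the ones in `remote` would merely be collected.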
If you don't want to use the DSL-like API, you can call the module directly:
SiteChecker.check("http://localhost:3000/app", "http://localhost:3000")
puts SiteChecker.remote_pages.inspect
puts SiteChecker.local_pages.inspect
puts SiteChecker.remote_images.inspect
puts SiteChecker.local_images.inspect
puts SiteChecker.problems.inspect
Using on Generated Content
If you have a static website (e.g. generated by Octopress), you can tell site_checker to use folders from the file system. With this approach, you don't need a web server to verify your website:
check_site("./public", "./public")
puts collected_problems.inspect
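The idea of checking links against the file system can be sketched as follows. The helper `local_file_exists?` is hypothetical, and the rule that a directory-style link resolves to `index.html` is an assumption made for this example; site_checker's own resolution rules may differ.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper, not site_checker's own code: resolves a site-relative
# link against a root folder and reports whether a matching file exists.
def local_file_exists?(root, link)
  path = File.join(root, link)
  # Assumption: directory-style links ("/blog/") point at an index.html.
  path = File.join(path, "index.html") if File.directory?(path)
  File.file?(path)
end

# Build a tiny throwaway site so the example runs without any setup.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "blog"))
  File.write(File.join(root, "index.html"), '<a href="/blog/">Blog</a>')
  File.write(File.join(root, "blog", "index.html"), "<h1>Blog</h1>")

  ["/", "/blog/", "/missing.html"].each do |link|
    status = local_file_exists?(root, link) ? "ok" : "not found"
    puts "#{link}: #{status}"
  end
end
```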
Configuration
You can instruct site_checker to ignore certain links:
SiteChecker.configure do |config|
  config.ignore_list = ["/", "/atom.xml"]
end
By default, it doesn't check the status of remote links and images (e.g. 404 or 500), but you can enable this:
SiteChecker.configure do |config|
  config.visit_references = true
end
Deep recursion can be expensive, so you can limit the maximum recursion depth with the following attribute:
SiteChecker.configure do |config|
  config.max_recursion_depth = 3
end
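To show how an ignore list and a maximum recursion depth interact, here is a minimal sketch of a depth-limited traversal over an in-memory link graph. The `site` hash, the `crawl` function, and the way depth is counted (the start page is depth 0) are all assumptions made for illustration; they are not taken from site_checker's source.

```ruby
require "set"

# Visits pages depth-first, skipping ignored links and not following links
# from pages that are already `max_depth` levels below the start page.
# `site` maps each page to the links it contains (a hypothetical stand-in
# for fetching and parsing real pages).
def crawl(site, page, max_depth:, ignore: [], depth: 0, visited: Set.new)
  return visited if ignore.include?(page) || visited.include?(page)

  visited << page
  return visited if depth >= max_depth

  site.fetch(page, []).each do |link|
    crawl(site, link, max_depth: max_depth, ignore: ignore,
                      depth: depth + 1, visited: visited)
  end
  visited
end

site = {
  "/"  => ["/a", "/atom.xml"],
  "/a" => ["/b"],
  "/b" => ["/c"],
  "/c" => []
}

pages = crawl(site, "/", max_depth: 2, ignore: ["/atom.xml"])
puts pages.to_a.inspect
```

With a depth of 2, `/c` is never reached because it sits three steps away from the start page, and `/atom.xml` is skipped because it is on the ignore list.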
Examples
Make sure that there are no local dead links on the website (I'm using RSpec syntax):
before(:each) do
  SiteChecker.configure do |config|
    config.ignore_list = ["/atom.xml", "/rss"]
  end
end
it "should not have dead local links" do
  check_site("http://localhost:3000", "http://localhost:3000")
  # this will print out the difference and I don't have to re-run with print
  collected_problems.should be_empty
end
Check that all the local pages can be reached in at most two steps:
before(:each) do
  SiteChecker.configure do |config|
    config.ignore_list = ["/atom.xml", "/rss"]
    config.max_recursion_depth = 2
  end
  @number_of_local_pages = 100
end
it "all the local pages have to be visited" do
  check_site("http://localhost:3000", "http://localhost:3000")
  collected_local_pages.size.should eq @number_of_local_pages
end
Command line
Since version 0.3.0, site_checker can also be used from the command line. Here is the list of the available options:
~ % site_checker -h
Visits the <site_url> and prints out the list of those URLs which cannot be found
Usage: site_checker [options] <site_url>
    -e, --visit-external-references  Visit external references (may take a bit longer)
    -m, --max-recursion-depth N      Set the depth of the recursion
    -r, --root URL                   The root URL of the path
    -i, --ignore URL                 Ignore the provided URL (can be applied several times)
    -p, --print-local-pages          Prints the list of the URLs of the collected local pages
    -x, --print-remote-pages         Prints the list of the URLs of the collected remote pages
    -y, --print-local-images         Prints the list of the URLs of the collected local images
    -z, --print-remote-images        Prints the list of the URLs of the collected remote images
    -h, --help                       Show a short description and this message
    -v, --version                    Show version
Troubleshooting
undefined method 'new' for SiteChecker:Module
This error occurs when the test code calls v0.1.1 methods, but a newer version of the gem has already been installed. Update your test code following the examples above.
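If you are not sure which version of the gem is installed, a quick check along these lines may help. This is a small sketch using the RubyGems API; the helper name `installed_versions` is hypothetical.

```ruby
require "rubygems"

# Returns the installed versions of a gem as strings (empty if none).
def installed_versions(gem_name)
  Gem::Specification.find_all_by_name(gem_name).map { |spec| spec.version.to_s }
end

puts installed_versions("site_checker").inspect
```

If the list contains a version newer than 0.1.1, the old-style calls in your test code are the likely cause of the error.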
Copyright
Copyright (c) 2013 Zsolt Fabok and Contributors. See LICENSE for details.