Blinkr
A broken page and link checker for websites. Optionally uses phantomjs to render pages to check resource loading, links created by JS, and report any JS page load errors.
Links are checked using typhoeus, which can execute up to 200 parallel requests and cache the results.
Installation
Add this line to your application's Gemfile:
gem 'blinkr'
And then execute:
$ bundle
Or install it yourself as:
$ gem install blinkr
If you wish to use phantomjs, install it for your platform from http://phantomjs.org/download.html
Usage
Blinkr determines which pages to load from your sitemap.xml. To run blinkr against your site, checking every a[href] link on all your pages:
blinkr -u http://www.jboss.org
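Blinkr gets its list of pages from the sitemap. As a rough illustration of that step, here is a minimal sketch that extracts every <loc> URL from a sitemap.xml using Ruby's REXML library. The sitemap_urls helper and the sample sitemap are hypothetical and not part of blinkr; blinkr's own sitemap handling may differ.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of blinkr): return every <loc> URL listed in
# a sitemap.xml document, i.e. the pages a sitemap-driven checker would visit.
def sitemap_urls(xml)
  doc = REXML::Document.new(xml)
  urls = []
  # <urlset> holds <url> entries; each <url> has a <loc> child with the address.
  doc.root.each_element do |url_el|
    url_el.each_element do |child|
      urls << child.text.strip if child.name == 'loc' && child.text
    end
  end
  urls
end

sitemap = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>http://www.jboss.org/</loc></url>
    <url><loc>http://www.jboss.org/products/</loc></url>
  </urlset>
XML

puts sitemap_urls(sitemap)
```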
If you want to customize blinkr, create a config file, blinkr.yaml. For example:
# Links and pages not to check (may be a regexp or a string)
skips:
  - !ruby/regexp /^\/video\/((?!91710755).)*\/$/
  - !ruby/regexp /^\/quickstarts\/((?!eap\/kitchensink).)*\/*$/
  - !ruby/regexp /^\/boms\/((?!eap\/jboss-javaee-6_0).)*\/*$/
  - !ruby/regexp /^\/archetypes\/((?!eap\/jboss-javaee6-webapp-archetype).)*\/*$/
# Errors to ignore when generating the output. Each ignore should be a hash
# containing a url (may be a regexp or a string), an error code (integer) and an
# error message (may be a regexp or a string)
ignores:
  - url: http://www.acme.com/foo
    message: Not Found
  - url: !ruby/regexp /^https?:\/\/(www\.)?acme\.com\/bar\//
    code: 500
# The output file to write the report to
report: _tmp/blinkr.html
# The URL to check (often specified on the command line)
base_url: http://www.jboss.org
# Specify the URL of the sitemap to use, rather than the default <base_url>/sitemap.xml
sitemap: http://www.jboss.org/my_sitemap.xml
# Specify the 'browser' used to load each page from the sitemap. By default
# typhoeus is used, which fetches the source of each page in parallel (fast).
# Alternatively, you can use phantomjs, which will process the JavaScript and
# CSS. This allows any links generated by JavaScript, as well as any resources
# loaded by the page or its JavaScript, to be checked. Additionally, any JS
# errors are reported. To use phantomjs, you must make sure the native binary is
# available on your path.
browser: phantomjs
# The number of times to try reloading a link if the server doesn't respond or
# refuses the connection. If the retry limit is exceeded, it will be reported as
# 'Server timed out' in the report. By default 3.
max_retrys: 3
# The number of times to try reloading a page. You may want to increase this if
# you find errors in the console saying that a page cannot be loaded.
max_page_retrys: 3
# Allows blinkr to ignore fragments (#foo), which can reduce the number of URLs
# to check. By default false.
ignore_fragments: true
# Control the number of threads used to run phantomjs. By default 8.
phantomjs_threads: 8
# Export the report to phantomjs
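The skips and ignores entries mix plain strings with !ruby/regexp values. Here is a minimal sketch of how such a file could be loaded and matched in plain Ruby, assuming Psych's YAML.safe_load with Regexp explicitly permitted. The helpers pattern_match?, skip? and ignored? are hypothetical, not blinkr's API, and the matching rules (exact comparison for strings, every key given in an ignore entry must match) are assumptions made for illustration.

```ruby
require 'yaml'

# Hypothetical sketch, not blinkr's own code: load a blinkr.yaml-style config.
# Under safe_load the !ruby/regexp tag is only accepted if Regexp is permitted.
config = YAML.safe_load(<<~'CONFIG', permitted_classes: [Regexp])
  skips:
    - !ruby/regexp /^\/video\//
    - http://www.acme.com/skip-me
  ignores:
    - url: http://www.acme.com/foo
      message: Not Found
    - url: !ruby/regexp /^https?:\/\/(www\.)?acme\.com\/bar\//
      code: 500
CONFIG

# A pattern may be a Regexp or a String. Strings are compared exactly here
# (an assumption; blinkr may treat them differently).
def pattern_match?(pattern, value)
  return false if value.nil?
  pattern.is_a?(Regexp) ? pattern.match?(value) : pattern == value.to_s
end

# True if any skip pattern matches the given URL or path.
def skip?(url, skips)
  skips.any? { |pattern| pattern_match?(pattern, url) }
end

# Assumption: an ignore entry applies when every key it sets (url, code,
# message) matches the error; keys it leaves out match anything.
def ignored?(error, ignores)
  ignores.any? do |ig|
    (ig['url'].nil? || pattern_match?(ig['url'], error[:url])) &&
      (ig['code'].nil? || ig['code'] == error[:code]) &&
      (ig['message'].nil? || pattern_match?(ig['message'], error[:message]))
  end
end

p skip?('/video/123/', config['skips'])
p ignored?({ url: 'https://acme.com/bar/x', code: 500, message: 'Oops' }, config['ignores'])
```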
You can specify a custom config file on the command line:
blinkr -c my_blinkr.yaml
If you want to see more details about the URLs blinkr is checking, you can use the -v option:
blinkr -u http://www.jboss.org -v
If blinkr reports a particular URL as bad but it works in your web browser, you can debug it by loading that single URL using typhoeus:
blinkr -c my_blinkr.yaml -s http://www.acme.com/corp
Additionally, you can specify the -w option to tell libcurl to run in verbose mode (this is very verbose, so it is normally used with -s):
blinkr -c my_blinkr.yaml -s http://www.acme.com/corp -w
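A link reported as 'Server timed out' has used up the retries allowed by max_retrys in the config. The loop below is a hypothetical sketch of that behaviour, not blinkr's implementation: check_link and its fetch argument are invented stand-ins for a typhoeus request, a status of 0 is assumed to mean that no response arrived, and max_retrys is assumed to count retries after the first attempt.

```ruby
# Hypothetical sketch (not blinkr's code) of the max_retrys behaviour.
# `fetch` stands in for an HTTP request and returns a status code;
# 0 is assumed to mean "no response / connection refused".
def check_link(url, max_retrys: 3, fetch:)
  attempts = 0
  loop do
    attempts += 1
    code = fetch.call(url)
    # Any real HTTP status ends the retry loop, whether it is 200 or 404.
    return { url: url, code: code, attempts: attempts } unless code.zero?
    # Assumption: max_retrys counts retries after the first attempt.
    if attempts > max_retrys
      return { url: url, code: 0, attempts: attempts, message: 'Server timed out' }
    end
  end
end

# Simulated server that refuses the first two connections, then answers.
responses = [0, 0, 200]
p check_link('http://www.acme.com/corp', fetch: ->(_url) { responses.shift })
```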
Extending Blinkr
Blinkr is based around a pipeline. Issues with the pages are collected, analysed, and then passed to the report for transformation and rendering. Additional sections may be appended to the report.
To add extensions to blinkr, you need to define a custom pipeline. The pipeline is defined in a Ruby file (e.g. blinkr.rb):
require 'acme/spellcheck'
Blinkr::Extensions::Pipeline.new do |config|
  # define the default extensions
  extension Blinkr::Extensions::Links.new config
  extension Blinkr::Extensions::JavaScript.new config
  extension Blinkr::Extensions::Resources.new config
  # define custom extensions
  extension ACME::Extensions::SpellCheck.new config
end
NOTE: You must add the default extensions to a custom pipeline for them to be executed.
The pipeline file is then referenced from blinkr.yaml:
# Use a custom pipeline
pipeline: blinkr.rb
An extension is just a standard Ruby class. It should declare an initialize(config) method, and may declare one or more of:
- collect(page)
- analyze(context, typhoeus)
- transform(page, error, default_html)
- append(context)
Each method is called as the pipeline progresses. The arguments passed are:
- page - an object containing the typhoeus response, the page body (as a Nokogiri HTML document), an array of errors for the page, any resource_errors which occurred when the page was loaded, and any javascript_errors which occurred when the page was loaded
- context - a map of url => page for the pages being analysed. After the analyze phase, and before the transform phase, any pages with no errors are removed from the context
- typhoeus - a wrapper around typhoeus, defining a process method and a process_all method, both of which take a url and a retry limit, and accept a block to execute when a response is returned
- error - an individual error, consisting of a type, a url, a title, a code, a message, a detail, a snippet and a fontawesome icon class
- default_html - the default HTML used to display the error
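To make the order of the phases concrete, here is a simplified, hypothetical driver; run_pipeline, Page and OldPageCheck are illustrative names, not blinkr classes. It calls each hook only on extensions that define it and prunes error-free pages between analyze and transform. Page is a cut-down stand-in holding only a URL and an errors array, and passing the HTML through each extension's transform in turn is an assumption.

```ruby
# Hypothetical, simplified driver (not Blinkr::Extensions::Pipeline).
# Page is a stand-in: blinkr's real page also carries response, body, etc.
Page = Struct.new(:url, :errors)

def run_pipeline(extensions, context, typhoeus = nil)
  # collect: every extension that defines it sees every page.
  context.each_value do |page|
    extensions.each { |ext| ext.collect(page) if ext.respond_to?(:collect) }
  end
  # analyze: extensions get the whole context plus the HTTP wrapper.
  extensions.each { |ext| ext.analyze(context, typhoeus) if ext.respond_to?(:analyze) }
  # Pages with no errors are removed before the transform phase.
  context.reject! { |_url, page| page.errors.empty? }
  # transform (assumption): each extension may rewrite the HTML in turn.
  body = context.each_value.flat_map do |page|
    page.errors.map do |error|
      extensions.select { |ext| ext.respond_to?(:transform) }
                .reduce("<li>#{error}</li>") { |html, ext| ext.transform(page, error, html) }
    end
  end
  # append: extra report sections, placed after the error list.
  appended = extensions.select { |ext| ext.respond_to?(:append) }.map { |ext| ext.append(context) }
  (body + appended).join("\n")
end

# Minimal stand-in extension that flags pages whose URL ends in ".old".
class OldPageCheck
  def initialize(config)
    @config = config
  end

  def collect(page)
    page.errors << "outdated page #{page.url}" if page.url.end_with?('.old')
  end

  def append(context)
    "<p>#{context.size} page(s) with errors</p>"
  end
end

context = {
  '/a.html' => Page.new('/a.html', []),
  '/b.old' => Page.new('/b.old', [])
}
puts run_pipeline([OldPageCheck.new({})], context)
```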
transform should return the HTML used to display the error, and append should return any HTML to be appended to the report. A templating language, such as slim or haml, may be used to generate the HTML.
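As a sketch of what transform and append might look like in a custom extension, the hypothetical class here highlights errors with a 5xx status code and appends a short summary section. The class name, the CSS class and the Error struct (a simplified stand-in for blinkr's error object, using the fields listed above) are assumptions for illustration only.

```ruby
# Simplified stand-in for blinkr's error object (fields as described above).
Error = Struct.new(:type, :url, :title, :code, :message, :detail, :snippet, :icon)

# Hypothetical extension in the style of ACME::Extensions::SpellCheck.
class HighlightServerErrors
  def initialize(config)
    @config = config
    @count = 0
  end

  # transform: return the HTML used to display one error. 5xx errors are
  # wrapped in a marker element; everything else keeps the default HTML.
  def transform(_page, error, default_html)
    return default_html unless error.code.to_i >= 500
    @count += 1
    %(<div class="server-error">#{default_html}</div>)
  end

  # append: return HTML to be added to the end of the report.
  def append(_context)
    "<h2>Server errors</h2><p>#{@count} error(s) with a 5xx status</p>"
  end
end

ext = HighlightServerErrors.new({})
err = Error.new(:links, 'http://www.acme.com/bar/', 'Server error', 500,
                'Internal Server Error', nil, nil, 'fa-bomb')
puts ext.transform(nil, err, '<li>Internal Server Error</li>')
puts ext.append({})
```

Plain string interpolation keeps the sketch dependency-free; slim or haml could generate the same HTML.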
The built-in extensions in lib/blinkr/extensions are good examples of how extensions can perform broken link analysis, or collect and format resource loading and JavaScript execution errors.
Contributing
- Fork it ( http://github.com/pmuir/blinkr/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request