GOV.UK: Seed the Crawler
This gem retrieves a list of seed URLs from the GOV.UK sitemap and adds them to RabbitMQ so that the crawler can consume them.
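The flow described above can be sketched in a few lines of Ruby. This is an illustration only, not the gem's actual implementation: the helper names `extract_seed_urls` and `publish_seed_urls`, the `seed` routing key, and the `InMemoryPublisher` stand-in are all hypothetical. It assumes the sitemap follows the standard sitemaps.org format and that the publisher is any object with a `publish(body, routing_key:)` method.

```ruby
require 'rexml/document'

# Hypothetical helper: collect every <loc> value from a sitemap XML string.
def extract_seed_urls(sitemap_xml)
  doc = REXML::Document.new(sitemap_xml)
  urls = []
  doc.root.each_recursive do |element|
    # `name` is the local element name, so the sitemap's default
    # namespace does not get in the way of the comparison.
    urls << element.text.strip if element.name == 'loc' && element.text
  end
  urls.uniq
end

# Hypothetical helper: hand each URL to a publisher and return the count.
# The 'seed' routing key is an assumption made for this sketch.
def publish_seed_urls(urls, publisher, routing_key: 'seed')
  urls.each { |url| publisher.publish(url, routing_key: routing_key) }
  urls.size
end

# Stand-in for a RabbitMQ exchange so the sketch runs without a broker.
class InMemoryPublisher
  attr_reader :messages

  def initialize
    @messages = []
  end

  def publish(body, routing_key:)
    @messages << [routing_key, body]
  end
end

SAMPLE_SITEMAP = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>https://www.gov.uk/</loc></url>
    <url><loc>https://www.gov.uk/browse</loc></url>
  </urlset>
XML

publisher = InMemoryPublisher.new
published = publish_seed_urls(extract_seed_urls(SAMPLE_SITEMAP), publisher)
puts "Published #{published} seed URLs"
```

Passing the publisher in as an argument keeps the sitemap parsing testable on its own; in a real run a RabbitMQ client object would take the place of `InMemoryPublisher`.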
Installation
Add this line to your application's Gemfile:
gem 'govuk_seed_crawler'
And then execute:
$ bundle
Or install it yourself as:
$ gem install govuk_seed_crawler
Usage
To run with the RabbitMQ connection defaults:
bundle exec seed-crawler https://www.gov.uk/
Run with --help to see a list of options:
bundle exec seed-crawler --help
Deployment
The gem is automatically deployed to RubyGems when the gem version is updated on main. (Don't forget to add to the CHANGELOG!)
For the new gem version to be used on GOV.UK, you'll need to update the reference in govuk-puppet.
Contributing
- Fork it ( http://github.com/{my-github-username}/govuk_seed_crawler/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request