NewsCrawler
NewsCrawler is a flexible, modular web crawler intended to provide a framework for website analysis.
Installation
gem install news_crawler
Getting started
To crawl a site (e.g. www.example.com) with the default configuration and modules:
news_crawler www.example.com
You can resume a crawl by invoking the command without any arguments:
news_crawler
For more information about configuration and module development, see NewsCrawler's page.
Requirements
- Ruby >= 1.9.3
- MongoDB
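The requirements above can be checked before installing with a short script. This is only a sketch and not part of NewsCrawler: the helper names `ruby_ok?` and `executable_on_path?` are made up for illustration, and it assumes MongoDB is installed locally with its `mongod` server binary on your PATH (on Windows the file is `mongod.exe`, and a remote MongoDB instance would not be detected this way).

```ruby
require "rubygems"

# Minimum Ruby version listed in the requirements.
MINIMUM_RUBY = Gem::Version.new("1.9.3")

# Hypothetical helper: true when the version string meets the minimum.
def ruby_ok?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MINIMUM_RUBY
end

# Hypothetical helper: looks for an executable file with the given
# name in each directory of the PATH string.
def executable_on_path?(name, path = ENV["PATH"].to_s)
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

puts "Ruby #{RUBY_VERSION}: #{ruby_ok? ? 'ok' : 'too old'}"
# Assumption: a local MongoDB install exposes the `mongod` binary.
puts "mongod on PATH: #{executable_on_path?('mongod') ? 'yes' : 'no'}"
```

A "no" for `mongod` does not necessarily mean the crawler cannot run; it only means no local server binary was found on the PATH.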
Caution
This is a prerelease version, so the API may change significantly.
Copyright
Copyright (C) 2013 Hà Quang Dương contact@haqduong.net
NewsCrawler is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
NewsCrawler is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with NewsCrawler. If not, see http://www.gnu.org/licenses/.