Bulbasaur
Bulbasaur was created to share components used by the Preadly crawler. It is a module for crawler operations built on the Nokogiri HTML/XML parser, and it simplifies common HTML operations such as extracting images and rewriting tags. It is one of pread.ly's contributions to the open source community.
Installation
Add this line to your application's Gemfile:
gem 'bulbasaur'
Or to get the latest updates:
gem 'bulbasaur', github: 'preadly/bulbasaur', branch: 'master'
And then execute:
$ bundle
Or install it manually with:
$ gem install bulbasaur
Usage
Bulbasaur has three main groups of operations: Extract, Replaces and Others.
Extract
Has four sub-operations:
- ExtractImagesFromHTML
- ExtractImagesFromYoutube
- ExtractImagesFromVimeo
- ExtractImagesFromAllResorces
html = "<img src='test.jpg' alt='test' /><img src='test-2.jpg' alt='test' />"
images = Bulbasaur::ExtractImagesFromHTML.new(html).call
puts images #print [{url: 'test.jpg', alt: 'test'}, {url: 'test-2.jpg', alt: 'test'}]
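To make the shape of the result concrete, here is a minimal sketch of the same idea using only the Ruby standard library. It is not Bulbasaur's implementation (which relies on Nokogiri): the helper name extract_images and its regular expressions are hypothetical, and it only handles simple, quoted attributes.

```ruby
# Hypothetical helper, not part of Bulbasaur: collects the src and alt
# attributes of every <img> tag using only the standard library.
def extract_images(html)
  html.scan(/<img\b[^>]*>/i).map do |tag|
    {
      url: tag[/\bsrc\s*=\s*["']([^"']*)["']/i, 1],
      alt: tag[/\balt\s*=\s*["']([^"']*)["']/i, 1]
    }
  end
end

html = "<img src='test.jpg' alt='test' /><img src='test-2.jpg' alt='test' />"
images = extract_images(html)
p images
```

A real parser such as Nokogiri is the safer choice for HTML found in the wild, since regular expressions break on unquoted attributes, comments and malformed markup.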
Replaces
Has two sub-operations:
- ReplacesByTagImage
- ReplacesByTagLink
html = "<img src='test.jpg' alt='test' />"
image_replaces = [{original_image_url:"test.jpg", url: "new-image.png"}]
Bulbasaur::ReplacesByTagImage.new(html, image_replaces).call
puts html #print <img src='new-image.png' alt='test' />
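The replacement list maps an original image URL to a new one. The following standard-library sketch illustrates that mapping. The helper replace_image_urls is hypothetical and not part of Bulbasaur, and unlike the snippet above it returns a new string rather than relying on the input being modified.

```ruby
# Hypothetical helper, not part of Bulbasaur: rewrites the src attribute of
# <img> tags whose current URL appears in the replacement list.
def replace_image_urls(html, replacements)
  # Build a lookup table from original URL to new URL.
  lookup = replacements.to_h { |r| [r[:original_image_url], r[:url]] }

  html.gsub(/(<img\b[^>]*?\bsrc\s*=\s*["'])([^"']*)(["'])/i) do
    prefix, current, quote = $1, $2, $3
    # Keep the current URL when no replacement is listed for it.
    "#{prefix}#{lookup.fetch(current, current)}#{quote}"
  end
end

html = "<img src='test.jpg' alt='test' />"
image_replaces = [{ original_image_url: "test.jpg", url: "new-image.png" }]
new_html = replace_image_urls(html, image_replaces)
puts new_html
```

Images whose URL is not listed in the replacements are left untouched.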
Others
- NormalizeURL
base_url = 'http://github.com'
context_url = 'preadly'
url = Bulbasaur::NormalizeURL.new(base_url, context_url).call
puts url #print http://github.com/preadly
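For comparison, resolving a relative URL against a base URL can be sketched with Ruby's standard URI.join. This is an assumption about what the normalization amounts to, not Bulbasaur's actual code, and the helper name normalize_url is hypothetical.

```ruby
require 'uri'

# Hypothetical helper, not part of Bulbasaur: resolves a possibly relative
# URL against a base URL with the standard library's URI.join.
def normalize_url(base_url, context_url)
  URI.join(base_url, context_url).to_s
end

puts normalize_url('http://github.com', 'preadly')          # http://github.com/preadly
puts normalize_url('http://github.com', 'http://pread.ly/') # absolute URLs pass through unchanged
```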
For more information about the components, run the RSpec tests with the --format d parameter:
rspec --format d --color
Contributing
- Fork it ( https://github.com/preadly/bulbasaur );
- Create your feature branch (git checkout -b my-new-feature);
- Commit your changes (git commit -am 'Add some feature');
- Push to the branch (git push origin my-new-feature);
- Create a new Pull Request.