Image Downloader
It is often necessary to collect the pictures from a page on the Internet. This gem solves exactly that task.
Installation
    sudo gem install image_downloader
Requirements
- ruby 1.8 or 1.9
- gem nokogiri
Description
Image Downloader is a rather simple library which does the following:
- gets the web page (with Net::HTTP)
- parses the HTML page (with a regexp or nokogiri)
- downloads the images (in one thread or in several)
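The three steps above can be sketched with the standard library alone. This is not the gem's actual implementation: the helper names fetch_page, extract_image_urls and download_all are hypothetical, the regexp is a deliberate simplification, and the code assumes a full URL with a scheme and a Ruby recent enough to provide File.binwrite.

```ruby
require 'net/http'
require 'uri'
require 'fileutils'

# Step 1: fetch the page body with Net::HTTP.
def fetch_page(page_url)
  Net::HTTP.get(URI.parse(page_url))
end

# Step 2: collect everything that looks like an image URL and
# resolve it against the page URL (simplified, hypothetical pattern).
def extract_image_urls(html, base_url)
  html.scan(/[^'"\s()<>]+\.(?:jpe?g|png|gif)/i).uniq.map do |src|
    URI.join(base_url, src).to_s
  end
end

# Step 3: download every image, one thread per URL.
def download_all(urls, target_dir)
  FileUtils.mkdir_p(target_dir)
  urls.map do |url|
    Thread.new do
      path = File.join(target_dir, File.basename(URI.parse(url).path))
      File.binwrite(path, Net::HTTP.get(URI.parse(url)))
    end
  end.each(&:join)
end
```

The parsing step can be tried without any network access by passing an HTML string directly to extract_image_urls.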
Example usage
After installation, you can use the following code as an example:
    require 'rubygems'
    require 'image_downloader'

    page_url = 'www.test.com'
    target_path = 'img_dir/'

    downloader = ImageDownloader::Process.new(page_url, target_path)

    #####
    # download all images found anywhere on the page (by regexp: everything that looks like an image URL)
    downloader.parse(:any_looks_like_image => true)
    ##### or
    # download images from all elements where images are usually placed (<img...>, <a...>, ...)
    downloader.parse()
    ##### or
    # download images from exact places on the page
    downloader.parse(:collect => {:link_icon => true})
    ##### or
    # download images by regexp
    downloader.parse(:regexp => /[^'"]+\.jpg/i)

    downloader.download()
The “parse” method accepts the following options:
    # find all URLs that contain an image extension
    :any_looks_like_image => true

    # find images in the specified locations
    :collect => {
      :all => true,                                   # all image locations
      :(img_src|a_href|style_url|link_icon) => true   # a specific location
    }

    # find by regexp
    :regexp => /['"]([^'"]+\.jpg)[^'"]*['"]/i   # for ruby 1.8 (in 1.9, () is not allowed for the scan method)
    :regexp => /[^'"]+\.jpg/i                   # the same, but shorter
    :regexp => /[^'"]+\.css/                    # other files can also be downloaded

    # ignore image URLs according to the given parameters
    :ignore_without => {:(extension|image_extension) => true}

    # set your preferred User-Agent (very important for avoiding 403, 404... responses from the server)
    :user_agent => "ruby"   # Mozilla/5.0 by default
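The remark about parentheses relates to how Ruby's String#scan treats capture groups: without groups it returns the matched strings, with groups it returns one array of captures per match. The snippet below only illustrates that standard behaviour on a made-up HTML string; how the gem itself consumes the result is not shown in this README.

```ruby
# Sample input, invented for this illustration.
html = %q(<img src="/img/a.jpg?v=2"> <img src='/img/b.JPG'>)

# No capture groups: scan returns each full match as a string.
plain = html.scan(/[^'"]+\.jpg/i)

# One capture group: scan returns an array of captures for every match.
grouped = html.scan(/['"]([^'"]+\.jpg)[^'"]*['"]/i)

# Flattening the grouped result gives the same list of URLs.
same = (grouped.flatten == plain)
```

Code that expects a flat list of strings therefore needs either the group-free pattern or an explicit flatten.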
Detailed location description
- img_src - tag: img, attribute: src="url"
- a_href - tag: a, attribute: href="url"
- style_url - tag: any, attribute: style="(background|background-image): url('url')"
- link_icon - tag: link, attribute: rel="shortcut icon" href="url"
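To make the four locations concrete, here is a rough sketch that maps each location name to a regexp and collects the matching URLs. The LOCATION_PATTERNS table and the collect_urls helper are hypothetical and much cruder than a real HTML parser such as nokogiri (for example, the link_icon pattern assumes rel comes before href); they are not taken from the gem.

```ruby
# Hypothetical patterns, one per location name from the list above.
LOCATION_PATTERNS = {
  :img_src   => /<img\b[^>]*\bsrc=["']([^"']+)["']/i,
  :a_href    => /<a\b[^>]*\bhref=["']([^"']+)["']/i,
  :style_url => /style=["'][^"']*background(?:-image)?:\s*url\(['"]?([^'")]+)['"]?\)/i,
  :link_icon => /<link\b[^>]*\brel=["']shortcut icon["'][^>]*\bhref=["']([^"']+)["']/i
}

# Return the URLs found in the requested locations, in the order requested.
def collect_urls(html, locations)
  locations.flat_map do |name|
    html.scan(LOCATION_PATTERNS.fetch(name)).flatten
  end
end

# Sample page, invented for this illustration.
html = <<-HTML
<link rel="shortcut icon" href="/favicon.ico">
<a href="/full/photo.jpg"><img src="/thumb/photo.jpg"></a>
<div style="background-image: url('/img/bg.png')"></div>
HTML

icons = collect_urls(html, [:link_icon])
all   = collect_urls(html, [:img_src, :a_href, :style_url, :link_icon])
```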
The “download” method accepts the following options:
    :parallel => true          # multi-threaded downloading (the default when no options are given)
    :consequentially => true   # sequential downloading in a single thread
    :user_agent => "ruby"      # Mozilla/5.0 by default
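The difference between the two modes can be sketched in a few lines. The each_download helper below is hypothetical and is not the gem's code; the block passed to it stands in for the real work of fetching and saving one file.

```ruby
require 'thread'

# Call the block once per URL: one thread per URL when parallel,
# otherwise one URL after another in the calling thread.
def each_download(urls, parallel = true, &fetch)
  if parallel
    urls.map { |url| Thread.new { fetch.call(url) } }.each(&:join)
  else
    urls.each { |url| fetch.call(url) }
  end
end

urls = ['http://example.com/a.jpg', 'http://example.com/b.jpg']

# Parallel: completion order is not guaranteed, so collect into a thread-safe Queue.
done = Queue.new
each_download(urls, true) { |url| done << url }

# Sequential: URLs are processed strictly in the given order.
in_order = []
each_download(urls, false) { |url| in_order << url }
```

Parallel mode usually finishes sooner on pages with many images, while sequential mode is gentler on the server and easier to follow in debug output.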
Executables
You can also use the following shell commands:
To download everything that looks like an image:
    download_any_images url dir/
To download only the favicon:
    download_icon url dir/
To download everything located in the usual places for pictures:
    download_images url dir/
To download by regexp:
    download_by_regexp url dir/ "[^'\"]+\\.js"
Debugging
"-d", "--debug"
To monitor the download process, pass the -d flag. A URI::InvalidURIError may occur in some cases.
    download_images url dir/ -d
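In plain Ruby, one common source of URI::InvalidURIError is a URL containing unescaped characters such as spaces. Whether that is what triggers the error inside the gem is an assumption; the safe_parse helper below is hypothetical and only shows how such URLs could be skipped instead of aborting the run.

```ruby
require 'uri'

# Return the parsed URI, or nil when the string is not a valid URI.
def safe_parse(url)
  URI.parse(url)
rescue URI::InvalidURIError => e
  warn "skipping #{url.inspect}: #{e.message}"
  nil
end

good = safe_parse('http://example.com/a.jpg')         # parsed normally
bad  = safe_parse('http://example.com/my photo.jpg')  # unescaped space: rescued, nil
```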
Copyright
Copyright © 2011 Malykh Oleg. See LICENSE.txt for further details.
License
The MIT License
Authors
Malykh Oleg (personal blog, in Russian)