No commit activity in last 3 years
No release in over 3 years
A simple library for downloading pictures from web pages. It fetches and parses a page with configurable options and retrieves images from the specified locations. Images can be downloaded in parallel across multiple threads or sequentially. In short, it is a picture downloader (or picture grabber) for web pages that can download images (.jpg, .jpeg, .png, .gif, .ico, .svg, .bmp) as well as other file types.

Image Downloader

It is often necessary to collect the pictures from a page on the Internet. This library solves that particular task.

Installation

sudo gem install image_downloader

Requirements

  • Ruby 1.8 or 1.9

  • the nokogiri gem

Description

Image Downloader is a rather simple library which does the following:

  • fetches a web page (with Net::HTTP)

  • parses the HTML page (with a regexp or nokogiri)

  • downloads the images (in one thread or in several)
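
The first two steps above can be sketched with the standard library alone. This is only an illustration of the idea, not the gem's own code: the names fetch_page, extract_image_urls and IMAGE_URL are hypothetical, and the pattern is a simplified guess at what "looks like an image URL" might mean.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of image_downloader): fetch the page body.
def fetch_page(url)
  Net::HTTP.get(URI.parse(url))
end

# Simplified pattern for anything that looks like an image URL.
IMAGE_URL = /[^'"\s()<>]+\.(?:jpe?g|png|gif|ico|svg|bmp)/i

# Hypothetical helper: collect image URLs from the HTML and resolve
# relative paths against the address of the page.
def extract_image_urls(html, base_url)
  base = URI.parse(base_url)
  html.scan(IMAGE_URL).uniq.map { |path| base.merge(path).to_s }
end

html = '<img src="/img/logo.png"><a href="photo.JPG">photo</a>'
puts extract_image_urls(html, 'http://www.test.com/gallery/')
```

Resolving every match against the page address means relative paths such as photo.JPG become full URLs that can be downloaded later.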

Example usage

After installation, you can use the following code as an example:

require 'rubygems'
require 'image_downloader'

page_url = 'www.test.com'
target_path = 'img_dir/'
downloader = ImageDownloader::Process.new(page_url,target_path)

#####
# download all images found anywhere on the page (by regexp: everything that looks like an image URL)
downloader.parse(:any_looks_like_image => true)

##### or
# download images from all elements where images are usually placed (<img...>, <a...>, ...)
downloader.parse()

##### or
# download images from exact places in the page
downloader.parse(:collect => {:link_icon => true})

##### or
# download images by regexp
downloader.parse(:regexp => /[^'"]+\.jpg/i)

downloader.download()

The “parse” method accepts the following options:

# find all URLs which contain an image extension
:any_looks_like_image => true

# find images in specified location
:collect => {
  :all => true, # all image locations
  :(img_src|a_href|style_url|link_icon) => true # or one specific location
}

# find by regexp
:regexp => /['"]([^'"]+\.jpg)[^'"]*['"]/i # for Ruby 1.8 (in 1.9, () groups are not allowed for the scan method)
:regexp => /[^'"]+\.jpg/i # the same, but shorter
:regexp => /[^'"]+\.css/  # other files can also be downloaded

# ignore image URLs that lack an extension or an image extension
:ignore_without => {:(extension|image_extension) => true}

# set a preferred User-Agent (very important for avoiding 403, 404... responses from the server)
:user_agent => "ruby" # Mozilla/5.0 by default
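
The note on the :regexp option is easier to follow with String#scan in front of you. In Ruby, scan returns a flat array of matches when the pattern has no capture groups, and an array of arrays (one inner array per match) when it has them. That may be why the group-free form is recommended above, though the README does not say so. The helper scan_urls below is hypothetical and is not part of the gem.

```ruby
# Hypothetical helper: run a pattern over the HTML and always return a
# flat list of strings, whether or not the pattern has capture groups.
def scan_urls(html, regexp)
  html.scan(regexp).flatten
end

html = %q(<img src="a.jpg"> <img src='b.jpg?v=2'>)

with_group    = /['"]([^'"]+\.jpg)[^'"]*['"]/i
without_group = /[^'"]+\.jpg/i

p html.scan(with_group)     # one inner array per match
p html.scan(without_group)  # a flat array of strings
p scan_urls(html, with_group) == scan_urls(html, without_group)
```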

Detailed location description

  • img_src - tag: img, attribute: src="url"

  • a_href - tag: a, attribute: href="url"

  • style_url - tag: any, attribute: style="(background|background-image): url('url')"

  • link_icon - tag: link, attribute: rel="shortcut icon" href="url"
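
To make the four locations concrete, here is a rough sketch that expresses each of them as a regular expression. It is an assumption about how such a lookup could work, not the gem's implementation (which may use nokogiri instead); LOCATION_PATTERNS and collect_urls are hypothetical names, and the patterns only cover the simple attribute layouts listed above.

```ruby
# Hypothetical patterns, one per location name from the list above.
# Each pattern captures the URL in its single group.
LOCATION_PATTERNS = {
  :img_src   => /<img\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/i,
  :a_href    => /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/i,
  :style_url => /style\s*=\s*"[^"]*background(?:-image)?\s*:[^"]*url\(\s*'?([^')"]+)'?\s*\)/i,
  :link_icon => /<link\b[^>]*\brel\s*=\s*["']shortcut icon["'][^>]*\bhref\s*=\s*["']([^"']+)["']/i
}

# Hypothetical helper: gather the URLs found at the requested locations.
def collect_urls(html, locations)
  locations.flat_map { |name| html.scan(LOCATION_PATTERNS.fetch(name)).flatten }
end

html = <<-HTML
<link rel="shortcut icon" href="/favicon.ico">
<img src="logo.png">
<a href="big.jpg">full size</a>
<div style="background: url('bg.gif')"></div>
HTML

p collect_urls(html, [:link_icon])
p collect_urls(html, [:img_src, :a_href, :style_url])
```

Regular expressions like these are fragile on real-world markup (attribute order, unquoted values), which is one reason to prefer an HTML parser when it is available.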

The “download” method accepts the following options:

:parallel => true # multi-threaded downloading (the default when no options are given)
:consequentially => true # sequential downloading in a single thread
:user_agent => "ruby" # Mozilla/5.0 by default
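
The difference between the two download modes, and the effect of :user_agent, can be sketched as follows. This is not the gem's code: build_request and fetch_all are hypothetical names, and the fetching itself is passed in as a block so that the example runs without network access.

```ruby
require 'net/http'
require 'uri'

DEFAULT_USER_AGENT = 'Mozilla/5.0'

# Hypothetical helper: build a GET request that carries the User-Agent header.
def build_request(url, user_agent = DEFAULT_USER_AGENT)
  uri = URI.parse(url)
  Net::HTTP::Get.new(uri.request_uri, 'User-Agent' => user_agent)
end

# Hypothetical helper: call the block for every URL, either one after
# another or with one thread per URL. Results keep the order of the input.
def fetch_all(urls, options = {}, &fetcher)
  if options[:consequentially]
    urls.map { |url| fetcher.call(url) }
  else
    urls.map { |url| Thread.new { fetcher.call(url) } }.map(&:value)
  end
end

request = build_request('http://www.test.com/img/a.png', 'ruby')
puts request['User-Agent']

# A stand-in block instead of a real download.
p fetch_all(%w[a.png b.png]) { |url| "saved #{url}" }
p fetch_all(%w[a.png b.png], :consequentially => true) { |url| "saved #{url}" }
```

One thread per URL is the simplest parallel scheme; for pages with hundreds of images a bounded pool of worker threads would be kinder to the server.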

Executables

You can also use the bundled executables from the shell:

To download anything that looks like an image

download_any_images url dir/

To download the favicon only

download_icon url dir/

To download everything located in the usual places for pictures

download_images url dir/

To download by regexp

download_by_regexp url dir/ "[^'\"]+\\.js"

Debugging

"-d", "--debug"

To monitor the download process, pass the -d flag. In some cases a URI::InvalidURIError may be raised.

download_images url dir/ -d
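
The README does not say which URLs trigger URI::InvalidURIError. One known case with Ruby's URI.parse is a URL that contains a space. The sketch below shows how such URLs can be set aside before downloading; partition_urls is a hypothetical helper and is not part of the gem.

```ruby
require 'uri'

# Hypothetical helper: split URLs into those URI.parse accepts and those
# it rejects with URI::InvalidURIError.
def partition_urls(urls)
  valid = []
  invalid = []
  urls.each do |url|
    begin
      URI.parse(url)
      valid << url
    rescue URI::InvalidURIError
      invalid << url
    end
  end
  [valid, invalid]
end

valid, invalid = partition_urls([
  'http://www.test.com/a.jpg',
  'http://www.test.com/my photo.jpg' # the space makes this URL invalid
])
p valid
p invalid
```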

Copyright © 2011 Malykh Oleg. See LICENSE.txt for further details.

License

The MIT License

Authors

Malykh Oleg (personal blog, in Russian)