ImgFetcher
ImgFetcher is a command-line tool that, given a plaintext file containing URLs (one per line), downloads each of them to the local disk.
Installation
You can install it from RubyGems with the following command:
$ gem install img_fetcher
Alternatively, you can clone this repository and build and install the gem yourself:
- Clone the repository.
- Install Ruby dependencies:
$ bundle install
- Build and install the gem with the Rake command:
$ bundle exec rake install
For development
The tests need some temporary folders, so run the following command to set up the development environment:
$ ./bin/setup
This will run bundle install and create the tmp/ and spec/support/tmp/ directories.
Usage
After installing the gem, you can run it from the command line:
$ img_fetcher -f plaintext.txt -o output_directory/
You can type img_fetcher --help at the terminal for more information.
Usage: img_fetcher -f <file_path> [options...]
-f, --file FILE_PATH [REQUIRED] Fetch and store the images from each line from the given file
-o, --output OUTPUT_DIRECTORY Specify the output directory
-V, --version Show version number and quit
-v, --verbose Make the operation more talkative
-t, --threaded Run the command with multiple threads
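As an illustration, the flags listed above could be parsed with Ruby's standard OptionParser. This is only a minimal sketch and not necessarily how the gem does it: parse_options and the layout of the options hash are hypothetical names, and -V/--version is left out for brevity.

```ruby
require 'optparse'

# Hypothetical parser mirroring the flags shown in the help text.
# Returns a plain hash; raises if the required --file flag is missing.
def parse_options(argv)
  options = { output: './', verbose: false, threaded: false }

  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: img_fetcher -f <file_path> [options...]'
    opts.on('-f', '--file FILE_PATH', 'File with one URL per line') { |v| options[:file] = v }
    opts.on('-o', '--output OUTPUT_DIRECTORY', 'Output directory') { |v| options[:output] = v }
    opts.on('-v', '--verbose', 'Make the operation more talkative') { options[:verbose] = true }
    opts.on('-t', '--threaded', 'Run with multiple threads') { options[:threaded] = true }
  end

  # Parse a copy so the caller's array is left untouched.
  parser.parse!(argv.dup)
  raise OptionParser::MissingArgument, '--file' unless options[:file]

  options
end
```

For example, parse_options(%w[-f plaintext.txt -o output_directory/ -t]) would yield a hash with :file, :output and :threaded set.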
The OUTPUT_DIRECTORY folder MUST exist. If it doesn't, files will be stored in the current directory (./).
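The fallback rule for the output directory can be sketched as follows. The helper name resolve_output_directory is hypothetical and only illustrates the behaviour described here; the gem's actual code may differ.

```ruby
# Hypothetical helper: returns the directory where files would be stored.
# Falls back to the current directory when the given path is missing
# or does not point to an existing directory.
def resolve_output_directory(path)
  return './' if path.nil? || !Dir.exist?(path)

  path
end
```

With this sketch, resolve_output_directory('no/such/dir') returns './', while an existing directory is returned unchanged.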
Threaded option
The --threaded option is a basic use of Ruby threads. A future improvement would be to limit the number of threads with a thread pool. Only the ImgFetcher::Stats class is synchronized with a Mutex. I am not sure whether puts must be synchronized as well, given that all threads constantly write to stdout.
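A minimal sketch of this idea, assuming one thread per input line and a Mutex-guarded counter: the Stats class below is a hypothetical stand-in for ImgFetcher::Stats, and process_lines is an illustrative name, not the gem's API. The block passed to process_lines stands in for the real download and returns a status symbol.

```ruby
# Hypothetical stand-in for ImgFetcher::Stats: counters guarded by a Mutex
# so that concurrent increments from several threads are not lost.
class Stats
  def initialize
    @mutex = Mutex.new
    @counts = Hash.new(0)
  end

  def increment(status)
    @mutex.synchronize { @counts[status] += 1 }
  end

  def to_h
    @mutex.synchronize { @counts.dup }
  end
end

# Spawns one thread per line (unbounded, as described above) and
# records the status returned by the block for each line.
def process_lines(lines, stats)
  threads = lines.each_with_index.map do |line, index|
    Thread.new do
      status = yield(line, index)
      stats.increment(status)
    end
  end
  threads.each(&:join)
  stats
end
```

Calling process_lines(lines, Stats.new) { |line, index| ... } waits for every thread to finish before returning the stats object.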
Output
If the --verbose option is selected, the output of the command is shown at the terminal with the following structure:
FILE ROW INDEX, STATUS, FILE ORIGINAL LINE
Downloaded files keep their original filename (whenever possible) at the end of the stored name, which starts with 6 random characters to avoid collisions with already existing files.
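The naming scheme can be sketched like this. The helper stored_filename is hypothetical, and both the underscore separator and the alphanumeric alphabet are assumptions; the text only states that the name starts with 6 random characters and ends with the original filename.

```ruby
require 'securerandom'
require 'uri'

# Hypothetical helper: builds a collision-resistant name for a downloaded file.
# Assumption: prefix and original name are joined with an underscore.
def stored_filename(url)
  # Last path segment of the URL, or an empty string when there is none.
  original = URI.parse(url).path.to_s.split('/').last.to_s
  prefix = SecureRandom.alphanumeric(6)

  original.empty? ? prefix : "#{prefix}_#{original}"
end
```

For a URL ending in /images/cat.png this produces a name made of six random characters, the assumed underscore, and cat.png.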
Downloading file from URL
To download files from a URL, the first approach would be to use open-uri. However, since the input is generated by external users, open-uri has some limitations and security issues if it is not handled carefully. After some research, the Down gem was chosen because it takes care of these issues for you, and also handles URL validation, file size, timeouts, number of redirects, connectivity problems, and more.
In this case, the maximum number of redirects is limited to 0 and there is no limit on the file size. As an improvement, both could be added as command-line options in the future.
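As a rough sketch of these settings, the options handed to Down could be built by a small helper. download_options is a hypothetical name; :max_redirects and :max_size are options accepted by Down.download, and the defaults below mirror the behaviour described above (no redirects, no size limit).

```ruby
# Hypothetical helper: builds the options hash that would be passed to Down.
# A nil max_size means "no limit", so it is dropped from the hash.
def download_options(max_redirects: 0, max_size: nil)
  { max_redirects: max_redirects, max_size: max_size }.compact
end

# With the down gem installed, the call would look like:
#   require 'down'
#   tempfile = Down.download(url, **download_options)
```

If both values were later exposed as command-line options, they could be forwarded through the same helper, for example download_options(max_redirects: 2, max_size: 5 * 1024 * 1024).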
Possible improvements
- If a URL is repeated in the file, don't fetch it again.
- Create a pool of threads for further customization.
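Both improvements could be combined roughly as follows. This is only a sketch of one possible design, not existing behaviour: fetch_with_pool and its pool_size parameter are hypothetical, and the block stands in for the actual download.

```ruby
# Hypothetical sketch: fetches each distinct URL once using a fixed
# number of worker threads that pull work from a shared queue.
def fetch_with_pool(urls, pool_size: 4)
  queue = Queue.new
  # Drop blank lines and repeated URLs before queueing the work.
  urls.map(&:strip).reject(&:empty?).uniq.each { |url| queue << url }
  queue.close

  results = {}
  mutex = Mutex.new
  workers = Array.new(pool_size) do
    Thread.new do
      # Queue#pop returns nil once the closed queue is empty.
      while (url = queue.pop)
        outcome = yield(url)
        mutex.synchronize { results[url] = outcome }
      end
    end
  end
  workers.each(&:join)
  results
end
```

A bounded pool keeps the number of simultaneous downloads fixed regardless of how many lines the input file has, and the uniq step ensures the block runs once per distinct URL.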
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/francoprud/img_fetcher. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.
License
The ImgFetcher gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the ImgFetcher project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.