Wire
Run a strict amount of threads during a time interval, primarily used for web scraping.
How to use
Example 1 - Basic
Start 100 threads, only run 10 at the same time, with a 3 second delay between each new thread, except the first 10.
100.times do
Wire.new(max: 10, wait: 3) do
# Do stuff
end
end
Example 2 - Timer
11.times do
Wire.new(max: 10, wait: 1) do
sleep 0.1
end
end
Time to run: ~ 1.2 seconds.
This is how it works.
- 11 threads is created, done at time 0.
- Running 10 threads, done at time 0.1
- Wait 1 second, done at time 1.1
- Start the 11th thread, done at time 1.2
Example 3 - Pass arguments
Wire.new(max: 10, wait: 1, vars: ["A", "B"]) do |first, last|
puts first # => "A"
puts last # => "B"
end
100.times do |n|
Wire.new(max: 10, wait: 1, vars: [n]) do |counter|
puts counter
end
end
# => 1 2 3 4 5 ...
Example 4 - Scraping
This project was originally build to solve the request limit problem when using Spotify´s Meta API.
In order to make the Metadata API snappy and open for everyone to use, rate limiting rules apply. If you make too many requests too fast, you’ll start getting 403 Forbidden responses. When rate limiting has kicked in, you’ll have to wait 10 seconds before making more requests. The rate limit is currently 10 request per second per ip. This may change.
We wanted to make as many request as possible without being banned due to the rate limit.
require "rest-client"
require "wire"
require "uri"
a_very_large_list_of_songs = ["Sweet Home Alabama", ...]
a_very_large_list_of_songs.each do |s|
Wire.new(max: 10, wait: 1, vars: [s]) do |song|
data = RestClient.get "http://ws.spotify.com/search/1/track.json?q=#{URI.encode(song)}"
# Do something with the data
end
end
Tip
Don't forget to join your threads using Thread#join
.
list = []
10.times do |n|
list << Thread.new do
# Do stuff
end
end
list.map(&:join)
Read more about #join here.
Arguments to pass
Ingoing arguments to new
.
-
max (Integer) The maximum amount of threads to run a the same time. The value 10 will be used if
max
is nil or zero. - wait (Integer) The time to wait before starting a new thread.
- vars (Array) A list of arguments to the block.
- silent (Boolean) The given block will not raise error if set to true. Default is false.
- timeout (Integer) The maximum time to run one thread, default is no limit.
- retries (Integer) How many times should we retry? Default is 0.
- delay (Float) Time between each retry. Default is 0.
How do install
[sudo] gem install wire
Requirements
Wire is tested on OS X 10.6.7 using Ruby 1.9.2.