Fetch
Fetch enables easy fetching of data from multiple web sources. It was extracted from Bogrobotten where we use it to fetch prices and other stuff from multiple merchants. We use it for price comparison, but you can use it for anything that involves fetching data from external sources.
Fetch uses the Typhoeus gem for fast and reliable asynchronous fetches from multiple URLs.
Installation
Add this line to your application's Gemfile:
gem "fetch"
Then run:
$ bundle
Example
In app/models/user.rb:
class User < ActiveRecord::Base
def fetcher
@fetcher ||= UserFetcher.new(self)
end
end
In app/fetchers/user_fetcher.rb:
class UserFetcher < Fetch::Base
modules Facebook::UserInfoFetch,
Github::UserInfoFetch
end
In lib/facebook/user_info_fetch.rb:
module Facebook
class UserInfoFetch < Fetch::Module
include Fetch::Simple
include Fetch::JSON
url do
"http://graph.facebook.com/#{fetchable.login}"
end
process do |user_info|
fetchable.update_attribute :facebook_id, user_info["id"]
end
end
end
In lib/github/user_info_fetch.rb
module Github
class UserInfoFetch < Fetch::Module
include Fetch::JSON
# Request for user ID
request do |req|
req.url = "https://api.github.com/users/#{fetchable.login}"
req.process do |user|
fetchable.update_attribute :github_id, user["id"]
end
end
# Request for repos
request do |req|
req.url = "https://api.github.com/users/#{fetchable.login}/repos"
req.process do |repos|
repo_names = repos.map { |r| r["name"] }
fetchable.update_attribute :github_repos, repo_names
end
end
end
end
Then, when everything is set up, you can do:
user = User.find(123)
user.fetcher.fetch
This will run three requests – one for Facebook and two for GitHub – and update the user model with a Facebook user ID, a GitHub user ID, and a list of GitHub repos.
Good to know
Doing something before a fetch
If you need to run something before a fetch is started, you can do it using the
before_fetch
callback.
class UserFetcher < Fetch::Module
modules Facebook::UserInfoFetch,
Github::UserInfoFetch
before_fetch do
# Do something before the fetch.
end
end
user = User.find(123)
UserFetcher.new(user).fetch
# => `before_fetch` is run before fetching
Note: If you define more than one before_fetch
callback, they are run in the order
in which they were defined.
Doing something after a fetch
If you need to run something after a fetch is completed, you can do it using
the after_fetch
callback.
class UserFetcher < Fetch::Module
modules Facebook::UserInfoFetch,
Github::UserInfoFetch
after_fetch do
# Do something after the fetch has completed.
end
end
user = User.find(123)
UserFetcher.new(user).fetch
# => `after_fetch` is run after fetching
Note: If you define more than one after_fetch
callback, they are run in
the reverse order of which they were defined.
Adding defaults to your requests
Each fetch module has a defaults
callback that you can use to set up defaults
for all requests in that modules.
class UserInfoFetch < Fetch::Module
defaults do |req|
req.user_agent = "My Awesome Bot!"
end
request do |req|
req.url = "http://test.com"
req.process do |body|
# Do some processing
end
end
end
This will add the user agent My Awesome Bot!
to all requests in the
UserInfoFetch
module.
The defaults
callback is inherited, like all other callbacks, so if you have
a base fetch class that you subclass, the defaults
callback in the superclass
will be run in all subclasses.
Handling HTTP failures
HTTP failures can be handled using the failure
callback. If you want to
handle failures for all requests generally, you can use the module-wide
failure
callback:
class UserInfoFetch < Fetch::Module
request do |req|
req.url = "http://test.com/something-failing"
req.process do |body|
# Do something if successful.
end
end
failure do |code, url|
Rails.logger.info "Fetching from #{url} failed: #{code}"
end
end
If you want to handle failures on the specific requests instead:
class UserInfoFetch < Fetch::Module
request do |req|
req.url = "http://test.com/something-failing"
req.process do |body|
# Do something if successful.
end
req.failure do |code, url|
# Handle the failure
end
end
end
When you handle failures directly on the request, the general failure
callback isn't called.
Note: If you don't specify a failure
callback at all, HTTP failures are ignored,
and processing skipped for the failed request.
Handling fetch errors
Sometimes a URL will return something that potentially makes your processing
code fail. To prevent this from breaking your whole fetch, you can handle
errors using the error
callback:
class UserInfoFetch < Fetch::Module
request do |req|
req.url = "http://test.com/something-failing"
req.process do |body|
# Do something if successful.
end
end
error do |exception|
Rails.logger.info "An error occured: #{exception.message}\n" +
exception.backtrace.join("\n")
raise exception if ["development", "test"].include?(Rails.env)
end
end
You can also do it directly on the requests:
class UserInfoFetch < Fetch::Module
request do |req|
req.url = "http://test.com/something-failing"
req.process do |body|
# Do something if successful.
end
req.error do |exception|
# Handle the error
end
end
end
If you handle errors directly on the requests, the general error
callback
isn't run.
Note: If you don't do any error handling in one of the two ways shown above, any exceptions that occur when processing will be raised, causing the whole fetch to fail. So please add error handling 😊
General error handling
If you need to ensure that something is run, even if anything in the fetch
fails, you can add an error
callback to your Fetch::Base
subclass.
class UserFetcher < Fetch::Base
modules Facebook::UserInfoFetch,
Github::UserInfoFetch
before_fetch do
this_fails!
end
error do |e|
# Do something that must be done,
# even if the fetch fails.
end
end
user = User.find(123)
UserFetcher.new(user).fetch
# => raises an exception, but the error callback will be run before that.
Parsing JSON
Fetch has a module for automatically parsing the request body as JSON before it is sent to the process block.
class UserInfoFetch < Fetch::Module
include Fetch::JSON
request do |req|
req.url = "http://api.test.com/user"
req.process do |json|
# Do something with the JSON.
end
end
end
Dynamically loading fetch modules
You can load fetch modules dynamically using the load
callback. Normally, the
modules defined with modules
are instantiated directly. When you use the
load
callback, this will determine how your modules are loaded.
class UserFetcher < Fetch::Base
modules :user_info_fetch, :status_fetch
load do |modules|
namespaces.product(modules).map do |path|
path.join("/").camelize.safe_constantize
end.compact
end
private
def namespaces
[:github, :facebook]
end
end
This will load the modules Github::UserInfoFetch
, Github::StatusFetch
,
Facebook::UserInfoFetch
and Facebook::StatusFetch
, if they are present.
The load
callback is only run once, so you can safely inherit it – only the
last one defined will be run.
Initializing fetch modules
Normally, a fetcher is initialized with an optional fetchable
that is sent
along to the fetch modules when they are initialized. You can change how this
works with the init
callback.
Let's say you have a Search
model with a SearchFetcher
that gets results
from various search engines. Normally, the Search
instance would be sent to
the fetch modules as a fetchable. Let's say you just want to send the keyword
to reduce coupling.
In app/fetchers/search_fetcher.rb:
class SearchFetcher < Fetch::Base
modules Google::KeywordFetch,
Bing::KeywordFetch
init do |klass|
klass.new(fetchable.keyword)
end
end
In lib/base/keyword_fetch.rb:
module Base
class KeywordFetch < Fetch::Module
attr_reader :keyword
def initialize(keyword)
@keyword = keyword
end
end
end
In lib/google/keyword_fetch.rb:
module Google
class KeywordFetch < Base::KeywordFetch
request do |req|
req.url = "https://www.google.com/search?q=#{CGI::escape(keyword)}"
req.process do |body|
# Do something with the body.
end
end
end
end
And lib/bing/keyword_fetch.rb something similar to Google.
Then:
search = Search.find(123)
SearchFetcher.new(search).fetch
Now the keyword will be sent to the fetch modules instead of the fetchable.
Changelog
See the changelog for changes in the different versions.
Contributing
Contributions are much appreciated. To contribute:
- Fork the project
- Create a feature branch (
git checkout -b my-new-feature
) - Make your changes, including tests so it doesn't break in the future
- Commit your changes (
git commit -am 'Add feature'
) - Push to the branch (
git push origin my-new-feature
) - Create new pull request
Please do not touch the version, as this will be updated by the owners when the gem is ready for a new release.