Google Safe Browsing Plugin
A Rails 3 plugin for Google Safe Browsing API v2. It supports Google malware and phishing list.
Installation
Add the plugin to your Gemfile
gem 'google_safe_browsing_plugin', '~> 0.1'
After bundle install, run the following to generate the db migration, model classes and other code
bundle install
bundle exec rails g google:install
Run the migrations generated from the previous step, and then seed the databse
bundle exec rake db:migrate
bundle exec rake google:safe_browsing:db_seed
Edit the configration file with your Google API key
# Edit config/google_safe_browsing.yml, and replace the real API key in line 2.
Build the hash prefix data locally
The plugin stores the hash prefixes in the relational database. The following rake task needs to be run under a cron schedule to keep the local data in sync with the Google server lists. It may require several runs initially before you have a relatively complete hash prefix set before you can do any meaningful full hash lookup. The first run may take a while because it needs to download quite a bit of data and store them in the local database. The initial run of the rake task could generate quarter a million shavar records in the database.
bundle exec rake google:safe_browsing:load_remote
Url lookup
After you run the 'load_remote' rake task several times, your local cache of the hash prefixes will be ready. Now you can start to do url lookup. Start the Rails console, and then try the following
url = 'financestudyhelp.com'
r = Google::SafeBrowsingHelper.lookup_url url
Since the Google Safe Browsing data get updated frequently, the previous query may not necessarily generate hit on Malware, other urls you can try are 'http://gumblar.cn' and 'http://ianfette.org'.
Upon a match on full-length hash lookup, the lookup_url call will return a hash object that contains the match. E.g.
{"financestudyhelp.com"=>["goog-malware-shavar"]}
The key of the hash is the url that's been queried. The array is the match themselves. If the url is both a malware and a phishing link, the value will be
["goog-malware-shavar","googpub-phish-shavar"]
If the url is neither a malware nor phishing link, the lookup result will be an empty array [].
Uninstall
If you want to uninstall the gem and remove the generated files
rails d google:install
Features and limitations
- The plugin does not hide the ActiveRecord models and the helper inside the Gem. It instead uses the generator function provided by Rails and Thor to copy the template model/migration/helper code to the Rails application source tree. The generated code is under the Google namespace.
- The plugin uses the relational database as the data store. But it doesn't have to. Redis may be a better choice considering lookup speed and cache expire. But the local data have to be kept under certain limit. The tables generated from the migration have google as the prefix.
- The hash prefix is stored in the database as plain text but not in binary/encoded format.
- The config parameters, in config/google_safe_browsing.yml, can be changed to your need. E.g. full_length_hash_expires is for how long the full-length hash will be cached locally.
- The plugin has backoff strategy built in when error happens. So if you notice that the full-length hash request doesn't go to Google, it probably is still in the backoff mode that prevents the request being sent to Google.
- Some RSpec tests are provided in the spec folder.
- The google:safe_browsing:load_remote rake task cannot be run repeatedly within a short time span. It honors the NEXT instruction from Google's response. This value is kept in google_functions#next_updated_at. You have to change the value if you want to run it immediately after the previous run.
- When Google sends out 'reset' instruction to the request, the plugin will not clean up the local data by default. You can change the behavior to reset the data by adding the following configuration in application.rb or the environment file in your Rails app:
# This will reset your local data
config.google_safe_browsing_upon_reset = lambda {
Google::SafeBrowsingShavar.delete_all
}
- 'rekey' instruction is not supported.
- The plugin works for Rails 3.2. But it should be relatively trivial to make it work with other Rails version. The reason for using Rails is mostly because of the value ActiveRecord provides for the data mapping and store. The download, parsing and encoding/decoding of the Google Safe Browsing data do not have to use Rails.
Reference
- Safe Browsing API v2
- google-safe-browsing project on Google code.
Copyright (c) 2013 stonelonely and contributors, released under the MIT license.