A Ruby gem for continuous source retrieval and data integration.
Aeternitas provides the means to regularly "poll" resources (e.g. a website, Twitter feed or API) and to permanently store the retrieved results. By default it avoids putting too much load on external servers and stores raw results as compressed files on disk. Aeternitas can be configured for a wide variety of polling strategies (e.g. frequencies, cooldown periods, ignoring exceptions, deactivating resources, ...).
Aeternitas is meant to be included in a Rails application and expects a working Sidekiq/Redis setup and any kind of database backend. All metadata is stored in two database tables (aeternitas_pollable_meta_data and aeternitas_sources), while metrics are stored in Redis and raw data as compressed files on disk.
Installation
Add this line to your application's Gemfile:
gem 'aeternitas'
And then execute:
$ bundle install
$ rails generate aeternitas:install
$ rake db:migrate
to install the gem and generate the tables and initializers needed by Aeternitas.
Quickstart
Aeternitas expects you to store each pollable as an ActiveRecord object. For instance, you might want to monitor several websites for usage of the word æternitas and store the websites' old states for later analysis. Using Aeternitas, you would create your website model and table:
$ rails generate model Website url:string aeternitas_word_count:integer
And then include Aeternitas:
class Website < ApplicationRecord
  include Aeternitas::Pollable
  polling_options do
    polling_frequency :weekly
  end
end
For now we are satisfied with Aeternitas' default settings except for the polling_frequency: we only want to check once a week.
Next, we have to implement the website's poll method.
def poll
  page_content = Net::HTTP.get(URI.parse(self.url))
  add_source(page_content) # store the retrieved page permanently
  aeternitas_word_count = page_content.scan('aeternitas').size
  update(aeternitas_word_count: aeternitas_word_count)
end
The poll method is called every time a pollable is due. In our example this would be once a week. The time at which Aeternitas executes the poll method is determined by the pollable's metadata, which is stored in a separate table, and can be checked using the next_polling method on a website (note: there are several advanced error states which may or may not allow a pollable to be polled).
Assuming you have already set up Sidekiq, the only thing left is to regularly run Aeternitas.enqueue_due_pollables and have a worker consuming the "polling" queue.
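To make the idea of a "due" pollable concrete, here is a minimal plain-Ruby sketch of the selection step. It does not use Aeternitas itself: the PollableMeta struct, its field names and the due_pollables helper are hypothetical stand-ins for the metadata table, and the real gem tracks more states than a simple active flag.

```ruby
# Hypothetical stand-in for a pollable's metadata row; field names are assumptions.
PollableMeta = Struct.new(:name, :next_polling, :active)

# Select every active pollable whose next polling time has passed.
def due_pollables(metas, now = Time.now)
  metas.select { |meta| meta.active && meta.next_polling <= now }
end

now = Time.utc(2024, 1, 8, 12, 0, 0)
metas = [
  PollableMeta.new('overdue',     now - 3600, true),
  PollableMeta.new('not yet due', now + 3600, true),
  PollableMeta.new('deactivated', now - 3600, false)
]
due = due_pollables(metas, now).map(&:name)
```

Only the first entry ends up in due: the second is scheduled for later and the third is not active.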
In most cases it makes sense to store polling results as sources so that further work can be done in separate jobs. In the example above we already added the page_content as a source to the website. Aeternitas only stores a new source if the source's fingerprint (an MD5 hash of the page_content) does not exist yet. If we wanted to compute the word count in a separate job, the following implementation would allow us to do so.
class Website < ApplicationRecord
  include Aeternitas::Pollable
  polling_options do
    polling_frequency :weekly
  end
  def poll
    page_content = Net::HTTP.get(URI.parse(self.url))
    new_source = add_source(page_content) # returns nil if source already exists
    CountWordJob.perform_async(new_source.id) if new_source
  end
end
class CountWordJob
  include Sidekiq::Worker
  def perform(source_id)
    source = Aeternitas::Source.find(source_id)
    page_content = source.raw_content
    aeternitas_word_count = page_content.scan('aeternitas').size
    website = source.pollable
    website.update(aeternitas_word_count: aeternitas_word_count)
  end
end
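The fingerprint check described above can be illustrated without Rails. The following is a minimal sketch, assuming the fingerprint is simply the MD5 hex digest of the raw content. SourceStore is a hypothetical in-memory stand-in for the sources table, not the gem's implementation.

```ruby
require 'digest'
require 'set'

# Hypothetical in-memory stand-in for the sources table; not part of Aeternitas.
class SourceStore
  def initialize
    @fingerprints = Set.new
  end

  # Returns the fingerprint if the content is new, nil if it was seen before.
  def add_source(raw_content)
    fingerprint = Digest::MD5.hexdigest(raw_content)
    # Set#add? returns nil when the element is already present
    @fingerprints.add?(fingerprint) ? fingerprint : nil
  end
end

store = SourceStore.new
first  = store.add_source('<html>aeternitas</html>')
second = store.add_source('<html>aeternitas</html>') # identical content
third  = store.add_source('<html>changed</html>')    # different content
```

Here second is nil because the content is unchanged, which mirrors why the poll method above only enqueues a job when add_source returns a new source.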
Configuration
Global Configuration
This configuration holds the global settings for Æternitas. It should be stored in config/initializers/aeternitas.rb.
redis
This option specifies the Redis connection details. Æternitas uses Redis for resource locking and to store statistics.
Aeternitas.configure do |config|
  config.redis = { host: 'localhost', port: 6379 }
end
For the available connection options, see the redis-rb documentation.
storage_adapter
Default: Aeternitas::StorageAdapter::File
Æternitas by default stores sources as compressed files on disk. If you want to store them in another way, you can do so by implementing the Aeternitas::StorageAdapter interface. For an example, have a look at Aeternitas::StorageAdapter::File.
To specify which storage adapter Æternitas should use, pass the class to this option:
Aeternitas.configure do |config|
  config.storage_adapter = Aeternitas::StorageAdapter::File
end
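As a rough idea of what a custom adapter could look like, here is a self-contained sketch that keeps compressed content in memory. The method names store, retrieve, delete and exist? are assumptions made for illustration. Check Aeternitas::StorageAdapter::File for the interface the gem actually expects before writing a real adapter.

```ruby
require 'zlib'

# Hypothetical adapter keeping compressed content in a Hash.
# The method names (store, retrieve, delete, exist?) are assumptions.
class InMemoryStorageAdapter
  def initialize(options = {})
    @options = options
    @blobs = {}
  end

  # Compresses and stores the content under the given id.
  def store(id, raw_content)
    raise ArgumentError, "#{id} already stored" if exist?(id)

    @blobs[id] = Zlib::Deflate.deflate(raw_content)
  end

  # Returns the decompressed content; raises KeyError for unknown ids.
  def retrieve(id)
    Zlib::Inflate.inflate(@blobs.fetch(id))
  end

  # Returns true if an entry was removed, false otherwise.
  def delete(id)
    !@blobs.delete(id).nil?
  end

  def exist?(id)
    @blobs.key?(id)
  end
end

adapter = InMemoryStorageAdapter.new
adapter.store('abc123', 'raw page content')
restored = adapter.retrieve('abc123')
```

An adapter like this would only be useful in tests, since its contents are lost when the process exits.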
storage_adapter_options
Some storage adapters need extra configuration. The file adapter, for example, needs to know where to store the files:
Aeternitas.configure do |config|
  config.storage_adapter_options = {
    directory: File.join(Rails.root, 'public', 'sources')
  }
end
Pollable Configuration
Pollables can be configured on a per-class basis using the polling_options block.
polling_frequency
Default: :daily
This option controls how often a pollable is polled and can be configured in two different ways.
Either use one of the presets defined in Aeternitas::PollingFrequency by specifying the preset's name as a symbol:
polling_options do
  polling_frequency :weekly
end
If you want to specify a more complex polling schedule, you can also use a custom lambda which returns the time of the next polling. For example, if you want the polling interval to depend on the pollable's age, you could use the following:
polling_options do
  # set the interval depending on the element's age (1 month, + 1 month for every 3 months of age)
  polling_frequency ->(context) { Time.now + 1.month + ((Time.now - context.created_at).to_i / 3.months.to_i).months }
end
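The arithmetic behind such an age-dependent schedule can be checked in plain Ruby. This sketch assumes the rule stated in the comment above (a base interval of one month plus one month for every three months of age) and approximates a month as 30 days. The MONTH constant and the next_polling_for helper are hypothetical names.

```ruby
MONTH = 30 * 24 * 60 * 60 # approximate month in seconds, for illustration only

# Returns the next polling time: one month from now, plus one extra month
# for every full three months of the pollable's age.
def next_polling_for(created_at, now)
  age_in_quarters = ((now - created_at) / (3 * MONTH)).floor
  now + MONTH + age_in_quarters * MONTH
end

now = Time.utc(2024, 1, 1)
fresh = next_polling_for(now - 10 * 24 * 60 * 60, now) # 10 days old
older = next_polling_for(now - 7 * MONTH, now)         # 7 months old
```

A 10-day-old record is polled again after one month. A 7-month-old record has two full quarters behind it and is polled again after three months.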
before_polling
Default: []
Specifies methods that are run before the polling is executed. You can specify either a method name or a lambda:
polling_options do
  # run the pollable's `foo` and `bar` methods
  before_polling :foo, :bar
  # run a custom block
  before_polling ->(pollable) { puts "About to poll #{pollable.id}" }
end
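How a mix of method names and lambdas might be dispatched can be sketched in a few lines of plain Ruby. This is an illustration of the idea, not the gem's code: DemoPollable and run_callbacks are hypothetical names.

```ruby
# Hypothetical pollable with two callback methods that record their calls.
class DemoPollable
  attr_reader :log

  def initialize
    @log = []
  end

  def foo
    @log << :foo
  end

  def bar
    @log << :bar
  end
end

# Run each callback in order: symbols are sent to the pollable,
# lambdas receive the pollable as their argument.
def run_callbacks(pollable, callbacks)
  callbacks.each do |callback|
    case callback
    when Symbol then pollable.public_send(callback)
    when Proc   then callback.call(pollable)
    else raise ArgumentError, "unsupported callback: #{callback.inspect}"
    end
  end
end

pollable = DemoPollable.new
run_callbacks(pollable, [:foo, :bar, ->(record) { record.log << :lambda }])
```

After the call, pollable.log contains the three entries in the order the callbacks were given.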
after_polling
Specifies methods that are run after polling was successful. See before_polling.
deactivation_errors
Default: []
Specifies error classes which, once they occur, will instantly deactivate the pollable. This can be useful, for example, if the error implies that the resource no longer exists.
polling_options do
  # deactivate the pollable if the tweet was not found
  deactivation_errors Twitter::Error::NotFound
end
ignore_error
Default: []
Errors specified as ignored errors are wrapped within Aeternitas::Errors::Ignored, which is then raised instead.
This is meant to be used in combination with error tracking systems like Airbrake. Instead of globally telling Airbrake which errors to ignore, you can do this on a per-pollable basis:
polling_options do
  # don't log an error if the Twitter API is down
  ignore_error Twitter::Error::ServiceUnavailable
end
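The difference between deactivation errors and ignored errors can be sketched in plain Ruby. All names below (NotFound, ServiceUnavailable, Ignored, guarded_poll) are hypothetical stand-ins, and the real gem may handle these cases differently. The sketch only shows the control flow described above.

```ruby
# Hypothetical error classes standing in for e.g. Twitter::Error::NotFound.
class NotFound < StandardError; end
class ServiceUnavailable < StandardError; end

# Hypothetical wrapper, playing the role of Aeternitas::Errors::Ignored.
class Ignored < StandardError
  attr_reader :original_error

  def initialize(original_error)
    @original_error = original_error
    super("ignored: #{original_error.class}")
  end
end

DEACTIVATION_ERRORS = [NotFound].freeze
IGNORED_ERRORS = [ServiceUnavailable].freeze

# Runs the block and returns :ok, or :deactivated if a deactivation error
# occurred. Ignored errors are re-raised wrapped in Ignored.
def guarded_poll
  yield
  :ok
rescue *DEACTIVATION_ERRORS
  :deactivated
rescue *IGNORED_ERRORS => e
  raise Ignored.new(e)
end

ok_result = guarded_poll { :fine }
gone_result = guarded_poll { raise NotFound }
wrapped = begin
  guarded_poll { raise ServiceUnavailable }
rescue Ignored => e
  e
end
```

An error tracker can then be told once to skip the wrapper class, while each pollable class decides which of its own errors get wrapped.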
sleep_on_guard_lock
Default: true
With this option set to true, a pollable that can't acquire the lock will sleep until the guard_timeout expires, effectively blocking the Sidekiq queue from processing any other jobs. This should only be used if you know that all the jobs within this queue will try to access the same resource and you want to pause the entire queue.
queue
Default: 'polling'
This option specifies the Sidekiq queue into which the poll job will be enqueued.
guard_key
Default: "#{obj.class.name}"
This option specifies the guard key, either as a method name or as a custom block. The default is to lock on the pollable class level, so only one job per pollable class is executed at a time, which avoids accidentally overloading (DDoSing) the remote server.
polling_options do
  # use the URL's host as lock key
  guard_key ->(website) { URI.parse(website.url).host }
end
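The effect of choosing the host as guard key can be illustrated with a tiny in-memory lock table. This is a sketch of the locking idea only: GuardTable and host_key are hypothetical names, and a plain Hash stands in for the Redis-based locking that Aeternitas uses.

```ruby
require 'uri'

# Hypothetical in-memory lock table; a plain Hash stands in for Redis here.
class GuardTable
  def initialize
    @locked = {}
  end

  # Returns true if the lock was acquired, false if the key is already held.
  def acquire(key)
    return false if @locked[key]

    @locked[key] = true
  end

  def release(key)
    @locked.delete(key)
  end
end

host_key = ->(url) { URI.parse(url).host }

guards = GuardTable.new
first  = guards.acquire(host_key.call('https://example.com/a'))
second = guards.acquire(host_key.call('https://example.com/b')) # same host
other  = guards.acquire(host_key.call('https://example.org/a')) # different host
```

Two URLs on the same host compete for one lock, so second is false, while a URL on another host can be polled in parallel.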
Web UI
Æternitas also has a monitoring interface which can be integrated into a Rails application.
To use it add this line to your application's Gemfile:
gem 'aeternitas_web_ui'
Then run
$ bundle install
And mount the engine
# config/routes.rb
mount Aeternitas::WebUi::Engine => '/aeternitas'
For more information head over to the project page: https://github.com/FHG-IMW/aeternitas_web_ui
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and spec-backed pull requests are welcome on GitHub at https://github.com/FHG-IMW/aeternitas. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
The gem is available as open source under the terms of the MIT License.