MagicPipe
MagicPipe is a Ruby library to push data to remote destinations on multiple topics.
It provides client adapters for several popular message buses and is meant to facilitate publishing messages and streaming data in different formats and to different backends.
Content
- Design, concepts and internals
- The moving parts
- Codecs
- Transports
- Senders
- Loaders
- Gluing everything together
- Multiple pipes
- The message payloads
- Usage
- Configuration
- Dependencies
- Use cases
- Installation
Design, concepts and internals
MagicPipe's design principles are:
- It should be plug and play, with minimal configuration -- it's an opinionated library.
- It should support different message formats and backends with a consistent interface.
- It should allow for multiple backends to be targeted at the same time.
- It should be extendable and customizable.
To achieve these goals, MagicPipe adopts a modular design with interchangeable parts.
The moving parts
The four main units of work are codecs, transports, senders and loaders. MagicPipe provides a set of classes out of the box, but users of the library can configure their own custom classes that implement the correct interface.
Codecs
Codecs accept a Ruby object and produce an encoded output. The input must respond to as_json, and as_json must return a Hash. The provided codecs are:
- YAML
- JSON
- MessagePack
- Thrift (work in progress)
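To make the codec contract above more concrete, here is a minimal sketch of what a custom codec could look like. The class name PrettyJsonCodec, its constructor and its encode method are hypothetical and chosen for illustration only; the interface MagicPipe actually expects may differ, so check the bundled codecs before writing your own.

```ruby
require "json"

# Hypothetical codec: the class name, constructor and #encode method are
# assumptions made for this sketch, not MagicPipe's documented interface.
class PrettyJsonCodec
  def initialize(object)
    @object = object
  end

  # Calls as_json on the input, checks that it returned a Hash,
  # and encodes that Hash as indented JSON.
  def encode
    data = @object.as_json
    raise ArgumentError, "as_json must return a Hash" unless data.is_a?(Hash)

    JSON.pretty_generate(data)
  end
end

# A plain Ruby object that satisfies the input contract.
Article = Struct.new(:id, :title) do
  def as_json
    { "id" => id, "title" => title }
  end
end

encoded = PrettyJsonCodec.new(Article.new(1, "Hello")).encode
```

The explicit Hash check mirrors the requirement stated above and fails early when an object returns something else from as_json.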
Transports
Transports are adapters for the different backends, and take care of establishing connections and submitting the payloads. The provided transports are:
- Debug
- Log
- HTTPS
- SQS
- Kafka (work in progress)
- DynamoDB (work in progress)
- Multi (allows targeting several transports at the same time)
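As a rough illustration of the role a transport plays, the sketch below keeps submitted payloads in memory, which could be handy in tests. The class name InMemoryTransport and the submit(payload, metadata) signature are assumptions made for this example, not MagicPipe's documented transport interface.

```ruby
# Hypothetical transport that stores submitted payloads in memory.
# The #submit signature (encoded payload plus a metadata Hash) is an
# assumption made for this sketch.
class InMemoryTransport
  attr_reader :messages

  def initialize
    @messages = []
  end

  # A real transport would open a connection and send the payload here.
  def submit(payload, metadata)
    @messages << { payload: payload, metadata: metadata }
    true
  end
end

transport = InMemoryTransport.new
transport.submit('{"id":1}', { topic: "articles", mime: "application/json" })
```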
Senders
Senders glue things together and implement a processing strategy. The provided senders are:
- Sync
- Async (Sidekiq)
Loaders
Loaders are used with the Async sender to serialize Ruby objects into something that can be passed to Sidekiq, and then to rebuild the original objects inside the Sidekiq workers. The provided loaders are:
- SimpleActiveRecord: it takes ActiveRecord::Base instances and an optional wrapper (e.g. an ActiveModel::Serializer) and turns them into class references (Strings) and a record ID. Then, when the Sidekiq jobs execute, it loads the record from the DB and wraps it in the serializer, if present.
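The round trip described above can be sketched without Rails or Sidekiq. Everything in this example is a stand-in: FakeRecord plays the role of an ActiveRecord model, FakeSerializer the role of a wrapper, and SimpleLoaderSketch with its dump and load methods is a hypothetical name, not the real loader's API.

```ruby
# Stand-in for an ActiveRecord model, backed by an in-memory Hash.
class FakeRecord
  STORE = {}
  attr_reader :id, :title

  def initialize(id, title)
    @id = id
    @title = title
    STORE[id] = self
  end

  # Mimics a lookup by primary key.
  def self.find(id)
    STORE.fetch(id)
  end
end

# Stand-in for a serializer that wraps a record.
class FakeSerializer
  def initialize(record)
    @record = record
  end

  def as_json
    { "id" => @record.id, "title" => @record.title }
  end
end

# Hypothetical loader: method names are assumptions made for this sketch.
module SimpleLoaderSketch
  # Reduces the record and the wrapper to values a job queue can carry:
  # two class names (Strings or nil) and a record ID.
  def self.dump(record, wrapper = nil)
    [record.class.name, record.id, wrapper && wrapper.name]
  end

  # Rebuilds the object inside the worker: resolves the classes by name,
  # reloads the record and wraps it if a wrapper was given.
  def self.load(class_name, id, wrapper_name = nil)
    record = Object.const_get(class_name).find(id)
    wrapper_name ? Object.const_get(wrapper_name).new(record) : record
  end
end

args = SimpleLoaderSketch.dump(FakeRecord.new(42, "Hello"), FakeSerializer)
rebuilt = SimpleLoaderSketch.load(*args)
```

Passing only Strings and an ID keeps the job arguments small and JSON-friendly, at the cost of one extra lookup when the job runs.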
Gluing everything together
A MagicPipe client encapsulates the configured parts. For example, clients with a single and multiple transports can be represented like this:
Client
├ Configuration
└ Sender
  ├ Codec
  ├ (Loader)
  └ Transport

Client
├ Configuration
└ Sender
  ├ Codec
  ├ (Loader)
  └ Transports
    ├ Transport A
    ├ Transport B
    └ Transport C
A client can only have a single codec and a single sender, but it can have multiple transports to submit data to multiple backends. This is particularly efficient with the async sender: the data is reloaded from the DB once and then submitted to every transport, which reduces the number of DB queries.
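The "one codec, many transports" arrangement can be pictured as a sender that encodes a message once and hands the same payload to each transport. The sketch below uses plain lambdas for the codec and the transports; FanOutSender and its call signature are hypothetical and only illustrate the idea, not MagicPipe's internals.

```ruby
require "json"

# Hypothetical synchronous sender: encodes once, then submits the same
# payload to every configured transport.
class FanOutSender
  def initialize(codec:, transports:)
    @codec = codec
    @transports = transports
  end

  def call(object, topic:)
    payload = @codec.call(object) # the object is encoded a single time
    @transports.each { |transport| transport.call(payload, topic) }
    payload
  end
end

# Stand-ins for a codec and two transports.
json_codec = ->(object) { JSON.generate(object.as_json) }
received = Hash.new { |hash, key| hash[key] = [] }
transport_a = ->(payload, topic) { received[:a] << [topic, payload] }
transport_b = ->(payload, topic) { received[:b] << [topic, payload] }

Item = Struct.new(:id) do
  def as_json
    { "id" => id }
  end
end

sender = FanOutSender.new(codec: json_codec, transports: [transport_a, transport_b])
sender.call(Item.new(7), topic: "items")
```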
Multiple pipes
Multiple clients with different configurations can be used together in the same process.
The main use case is to support different codecs (message formats). Some applications may in fact need to emit messages with different formats to different backends, for example JSON to both SQS and a remote HTTPS endpoint, and Thrift to Kafka.
Another use case is to use different Sidekiq queues (and worker pools) for different topics, which can be accomplished by using different MagicPipe clients for different types of objects.
The message payloads
MagicPipe wraps the payloads in message envelopes with extra metadata. These extra attributes are:
- the message topic (string)
- the producer name (string)
- the submission timestamp, captured when client#send_data is invoked (integer)
- the payload mime type, e.g. application/json (string)
Some transports will additionally provide this metadata as message meta attributes. For example, the HTTPS transport will set them as custom HTTP request headers, and the SQS transport will set them as the SQS message custom attributes.
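For a sense of what such an envelope might contain, here is a small sketch. The key names (metadata, body, topic, producer, time, mime) and the X-MagicPipe-* header names are invented for this example; the actual envelope layout and header names used by MagicPipe may be different.

```ruby
# Hypothetical envelope builder: the structure and key names are
# assumptions made for this sketch.
def build_envelope(body:, topic:, producer:, time:, mime:)
  {
    metadata: {
      topic: topic,
      producer: producer,
      time: time.to_i, # submission timestamp as an integer
      mime: mime
    },
    body: body
  }
end

# Turns the metadata into HTTP-style headers, the way a transport might
# expose it. The header prefix is invented for illustration.
def metadata_headers(metadata)
  metadata.each_with_object({}) do |(key, value), headers|
    headers["X-MagicPipe-#{key.to_s.capitalize}"] = value.to_s
  end
end

envelope = build_envelope(
  body: { "id" => 1 },
  topic: "articles",
  producer: "My Awesome Service (Production)",
  time: Time.at(1_500_000_000).utc,
  mime: "application/json"
)
headers = metadata_headers(envelope[:metadata])
```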
Usage
Create and configure a MagicPipe client (temporary API, still a work in progress):
require "magic_pipe/senders/async"
require "magic_pipe/transports/https"
require "magic_pipe/transports/sqs"
$magic_pipe = MagicPipe.build do |mp|
  mp.client_name = :my_magic_pipe
  mp.producer_name = "My Awesome Service (Production)"
  mp.sender = :async
  mp.loader = :simple_active_record
  mp.codec = :json
  mp.transports = [:https, :sqs]
  mp.sidekiq_options = {
    queue: "magic_pipe"
  }
  mp.https_transport_options = {
    url: "https://my.receiver.service/foo",
    basic_auth: "bar:foo",
  }
  mp.sqs_transport_options = {
    queue: "my_data_stream"
  }
  mp.logger = Rails.logger
  mp.metrics_client = $statsd_client
end
Then, to submit a message:
$magic_pipe.send_data(
  object: object,
  topic: "my_topic",
  wrapper: nil, # default
  time: Time.now.utc # default
)
A more concrete example, with an ActiveRecord object:
class Article < ActiveRecord::Base
  after_commit :send_on_magic_pipe

  private

  def send_on_magic_pipe
    $magic_pipe.send_data(
      object: self,
      topic: "articles",
      wrapper: Serializers::InventoryArticleSerializer,
      time: updated_at
    )
  end
end
Configuration
The MagicPipe::Config class lists all supported configuration options and their default values.
Users of the library are strongly encouraged to set explicit values for these settings:
- client_name: used internally to identify a specific magic pipe client instance, for example when handing over the message publication to Sidekiq.
- producer_name: the name of your application, as it should appear in the metadata of the published messages.
- logger: when used in Rails, this should be set to Rails.logger.
- metrics_client: to collect operational metrics. Wrapped by MagicPipe::Metrics. It's mainly designed to work with DataDog's StatsD out of the box, but any stats collector library would work, if you provide a thin adapter wrapper. If not configured, the default implementation will simply send the metrics to the logger.
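A thin adapter wrapper for a non-StatsD collector could look roughly like the sketch below. It assumes the metrics client is called with a StatsD-style increment(name, options) method; which methods MagicPipe::Metrics actually invokes is not stated here, so treat the method list as an assumption. CustomCollector and CollectorAdapter are hypothetical names.

```ruby
# Hypothetical in-house collector with its own method names.
class CustomCollector
  attr_reader :counters

  def initialize
    @counters = Hash.new(0)
  end

  def add(name, amount)
    @counters[name] += amount
  end
end

# Thin adapter exposing a StatsD-style #increment on top of the collector.
# That the library calls #increment is an assumption of this sketch.
class CollectorAdapter
  def initialize(collector)
    @collector = collector
  end

  # Options such as tags are accepted and ignored in this sketch.
  def increment(name, _options = {})
    @collector.add(name, 1)
  end
end

collector = CustomCollector.new
adapter = CollectorAdapter.new(collector)
adapter.increment("magic_pipe.messages.sent")
adapter.increment("magic_pipe.messages.sent", { tags: ["topic:articles"] })
```

An instance of the adapter would then be assigned to metrics_client in place of a StatsD client.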
More specific configuration is then required for the different modules of the library, when used.
Transport: SQS
This transport requires credentials for the AWS API. The credentials need to be associated with an IAM user that has full access to SQS, and they need to be present in the environment:
export AWS_ACCESS_KEY_ID='foo'
export AWS_SECRET_ACCESS_KEY='bar'
export AWS_REGION='us-east-1'
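Since missing credentials tend to surface only when the first message is sent, an application may want to verify these variables at boot. The helper below is a small sketch of such a check; missing_aws_vars is a hypothetical name and is not part of MagicPipe.

```ruby
# Environment variables listed above for the SQS transport.
REQUIRED_AWS_VARS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION].freeze

# Returns the names of the required variables that are unset or empty.
# Accepts any Hash-like object so it can be exercised without touching ENV.
def missing_aws_vars(env = ENV)
  REQUIRED_AWS_VARS.select { |name| env[name].to_s.empty? }
end

missing = missing_aws_vars
warn "Missing AWS settings: #{missing.join(', ')}" unless missing.empty?
```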
Transport: HTTPS
This transport builds a Faraday connection. A number of options can be configured:
$magic_pipe = MagicPipe.build do |mp|
  mp.transport = :https
  mp.https_transport_options = {
    url: "https://my.receiver.service/messages",
    dynamic_path_builder: -> (topic) { topic },
    basic_auth: "foo:bar",
    timeout: 2,
    open_timeout: 3,
  }
end
The dynamic_path_builder setting should be a callable that will receive the topic name. It defaults to nil, in which case the base URL will be used as is. If present, its return value will be used in Faraday as:
faraday_connection.post do |r|
  r.url dynamic_path_builder.call(current_topic)
end
This results in requests such as:
HTTP POST https://my.receiver.service/messages/a-topic-name
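The effect of different path builders can be previewed with a small stand-in for the URL handling. The target_url helper below is hypothetical and simply appends the builder's return value to the base URL; it is a simplification for illustration, not a reimplementation of Faraday's URL resolution.

```ruby
# Two illustrative path builders: one uses the topic as is, the other
# adds a version prefix and replaces underscores with dashes.
plain_builder = ->(topic) { topic }
versioned_builder = ->(topic) { "v1/#{topic.tr('_', '-')}" }

# Simplified stand-in: returns the base URL when no builder is set,
# otherwise appends the builder's relative path to it.
def target_url(base_url, builder, topic)
  return base_url if builder.nil?

  "#{base_url.chomp('/')}/#{builder.call(topic)}"
end

base = "https://my.receiver.service/messages"
default_url = target_url(base, nil, "a-topic-name")
plain_url = target_url(base, plain_builder, "a-topic-name")
versioned_url = target_url(base, versioned_builder, "user_events")
```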
Dependencies
Because of MagicPipe's modular design, and in order to keep a small installation footprint, all of its dependencies are optional. Users of the library need to manually install the required dependencies in their projects.
The Ruby gems MagicPipe's modules depend on are:
- Senders:
  - Async: sidekiq
- Transports:
  - SQS: aws-sdk-sqs
  - HTTPS: faraday, typhoeus
- Codecs:
  - JSON: oj (optional, will fall back to json from the stdlib if oj is missing)
  - MessagePack: msgpack
Use cases
TODO
- event driven architectures
- streaming domain data to replicate it somewhere else
Installation
Add this line to your application's Gemfile:
gem 'magic_pipe'
And then execute:
$ bundle
Or install it yourself as:
$ gem install magic_pipe