Project

shifty

0.0
Low commit activity in last 3 years
No release in over a year
Shifty provides tools for coordinating simple workers which consume a supplying queue and emit corresponding work products, valuing pure functions, carefully isolated side effects, and extremely low coupling.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

 Project Readme

Gem Version Tests Maintainability Test Coverage

The Shifty Framework

"How many Ruby fibers does it take to screw in a lightbulb?"

Quick Start

Add this line to your application's Gemfile:

gem 'shifty'

And then execute:

$ bundle

Or install it yourself:

$ gem install shifty

And then use it:

require 'shifty'

Shifty DSL

The DSL provides convenient shorthand for several common types of workers. Let's look at them.

But first, be sure to include the DSL mixin:

include Shifty::DSL

Source Worker

At the headwater of every Shifty pipeline, there is a source worker which is able to generate its own work items.

Here is possibly the simplest of source workers, which provides the same string each time it is invoked:

worker = source_worker { "Number 9" }

worker.shift #=> "Number 9"
worker.shift #=> "Number 9"
worker.shift #=> "Number 9"
# ...ad nauseum...

Sometimes you want a worker which generates until it doesn't:

counter = 0
worker = source_worker do
  if counter < 3
    counter += 1
    counter + 1000
  end
end

worker.shift #=> 1001
worker.shift #=> 1002
worker.shift #=> 1003
worker.shift #=> nil
worker.shift #=> nil (and will be nil henceforth)

If all you want is a source worker which generates a series of numbers the DSL provides an easier way to do it:

w1 = source_worker [0,1,2]
w2 = source_worker (0..2)

w1.shift == w2.shift #=> 0
w1.shift == w2.shift #=> 1
w1.shift == w2.shift #=> 2
w1.shift == w2.shift #=> nil (and henceforth, etc.)

Relay Worker

This is perhaps the most "normal" kind of worker in Shifty. It accepts a value and returns some transformation of that value:

squarer = relay_worker do |number|
  number ** 2
end

Of course a relay worker needs a source from which to get values upon which to operate:

source = source_worker (0..3)

squarer.supply = source

Or, if you prefer, the DSL provides a vertical pipe for linking the workers together into a pipeline:

pipeline = source | squarer

pipeline.shift #=> 0
pipeline.shift #=> 1
pipeline.shift #=> 4
pipeline.shift #=> 9
pipeline.shift #=> nil

Side Worker

As we travel down the pipeline, from time to time, we will want to stop and smell the roses, or write to a log file, or drop a beat, or create some other kind of side-effect. That's the purpose of the side_worker. A side worker will pass through the same value that it received, but as it does so, it can perform some kind side-effect work.

source = source_worker (0..3)

evens = []
even_stasher = side_effect do |value|
  if value % 2 == 0
    evens << value
  end
end

# re-using "squarer" from above example...
pipeline = source | even_stasher | squarer

pipeline.shift #=> 0
pipeline.shift #=> 1
pipeline.shift #=> 4
pipeline.shift #=> 9
pipeline.shift #=> nil

evens #=> [0, 2]

Notice that the output is the same as the previous example, even though we put this even_stasher side_worker in the middle of the pipeline.

However, we can also see that the evens array now contains the even numbers from source (the side effect).

But wait, you want to say, can't we still create side effects with regular ole relay workers? Why, yes. Yes you can. Ruby being what it is, there really isn't a way to prevent the implementation of any worker from creating side effects.

And still wait, you'll also be wanting to say, isn't it possible that a side worker could mutate the value as it's passed through? And again, yes. It would be very difficult* in the Ruby language (where so many things are passed by reference) to perfectly prevent a side worker from mutating the value. But please don't.

The side effect worker's purpose is to provide intentionality and clarity. When you're creating a side effect, let it be very clear. And don't do regular, pure functional style work in the same worker.

This will make side effects easier to troubleshoot; if you pull them out of the pipeline, the pipeline's output shouldn't change. By the same token, if there is a problem with a side effect, troubleshooting it will be much simpler if the side effects are already isolated and named.

Another measure for preventing side workers from creating unwanted side-effects is to use :hardened mode. That's right, side_worker has a mode (which defaults to :normal). When set to :hardened, each value is Marshal.dumped and Marshal.loaded before being passed to the side worker's callable. This helps to ensure that the original value will be passed through to the next worker in the queue in a pristine state (unmodified), and that no future or subsequent modification can take place.

Given a source...

source = source_worker [[:foo], [:bar]]

Consider this unsafe, unhardened side worker:

unsafe = side_worker { |v| v << :boo }

This produces unwanted mutations, because of the <<.

pipeline = source | unsafe

pipeline.shift #=> [:foo, :boo] <-- Oh noes!
pipeline.shift #=> [:bar, :boo] <-- Disaster!
pipeline.shift #=> nil

Let's harden it:

unsafe = side_worker(:hardened) { |v| v << :boo }

The :hardened side worker doesn't have access to the original value, and therefore its shenanigans are moot with respect to the main workflow:

pipeline = source | unsafe

pipeline.shift #=> [:foo] <-- No changes
pipeline.shift #=> [:bar] <-- Much better
pipeline.shift #=> nil

Filter Worker

The filter worker simply passes through the values which are given to it, but only those values which result in truthiness when your provided callable is run against it. Values which result in falsiness are simply discarded.

source = source_worker (0..5)

filter = filter_worker do |value|
  value % 2 == 0
end

pipeline = source | filter

pipeline.shift #=> 0
pipeline.shift #=> 2
pipeline.shift #=> 4
pipeline.shift #=> nil

Batch Worker

The batch worker gathers outputs into batches and the returns each batch as its output.

source = source_worker (0..7)

batch = batch_worker gathering: 3

pipeline = source | batch

pipeline.shift #=> [0, 1, 2]
pipeline.shift #=> [3, 4, 5]
pipeline.shift #=> [6, 7]
pipeline.shift #=> nil

Notice how the final batch doesn't have full compliment of three.

Fixed-numbered batch workers are useful for things like pagination, perhaps, but sometimes we don't want a batch to include a specific number of items, but to batch together all items until a certain condition is met. So batch worker can also accept a callable:

source = source_worker [
  "some", "rain", "must\n", "fall", "but", "ok" ]

line_reader = batch_worker do |value|
  value.end_with? "\n"
end

pipeline = source | line_reader

pipeline.shift #=> ["some", "rain", "must\n"]
pipeline.shift #=> ["fall", "but", "ok"]
pipeline.shift #=> nil

Splitter Worker

The splitter worker accepts a value from its supply, and generates an array and then successively returns each element of the array. Once the array has been expended, the splitter worker appeals to its supplier for another value and the process repeats.

source = source_worker [
  'A bold', 'move westward' ]

splitter = splitter_worker do |value|
  value.split(' ')
end

pipeline = source | splitter

pipeline.shift #=> 'A'
pipeline.shift #=> 'bold'
pipeline.shift #=> 'move'
pipeline.shift #=> 'westward'
pipeline.shift #=> nil

Trailing Worker

The trailing worker accepts a value, and returns an array containing the last n values. This worker could be useful for things like rolling averages.

No values are returned until n values had been accumulated. New values are unshifted into the zero-th position in the array of trailing values, so that a given value will graduate through the array until it reaches the last position and then subsequently removed.

Maybe it's easiest to just give an example:

source = source_worker (0..5)
trailer = trailing_worker 4

pipeline = source | trailer

pipeline.shift #=> [3,2,1,0]
pipeline.shift #=> [4,3,2,1]
pipeline.shift #=> [5,4,3,2]
pipeline.shift #=> nil

Note that as soon as a nil is received from the supplying queue, the trailing worker simply provides a nil.

Origins of Shifty

Shifty was Stepladder, which grew out of experimentation with Ruby fibers, after reading Dave Thomas' demo of Ruby fibers, wherein he created a pipeline of fiber processes, emulating the style and syntax of the *nix command line. I noticed that, courtesy of fibers' extremely low surface area, fiber-to-fiber collaborators could operate with extremely low coupling. That was the original motivation for creating the framework.

After playing around with the new framework a bit, I began to notice other interesting characteristics of this paradigm.

Escalator vs Elevator

Suppose we are performing several operations on the members of a largish collection. If we daisy-chain enumerable operators together (which is so easy and fun with Ruby) we will notice that if something goes awry with item number two during operation number seven, we nonetheless had to wait through a complete run of all items through operations 1 - 6 before we receive the bad news early in operation seven. Imagine if the first six operations take a long time to each complete? Furthermore, what if the operations on all incomplete items must be reversed in some way (e.g. cleaned up or rolled back)? It would be far less messy and far more expedient if each item could be processed though all operations before the next one is begun.

This is the design paradigm which shifty makes easy. Although all the workers in your assembly line can be coded in the same context (which is one of the big selling points of the daisy-chaining of enumerable methods,incidentally), you also get the advantage of passing each item though the entire op-chain before starting the next.

Think Locally, Act Globally

Because shifty workers use fibers as their basic API, the are almost unaware of the outside world. And they can pretty much be written as such. At the heart of each Shifty worker is a task which you provide, which is a callable ruby object (such as a Proc or a lambda).

The scope of the work is whatever scope existed in the task when you initially created the worker. And if you want a worker to maintain its own internal state, you can simply include a loop within the worker's task, and use the #handoff method to pass along the worker's product at the appropriate point in the loop.

For example:

realestate_maker = Shifty::Worker.new do
  oceans = %w[Atlantic Pacific Indiana Arctic]
  previous_ocean = "Atlantic"
  while current_ocean = oceans.sample
    drain current_ocean  #=> let's posit that this is a long-running async process!
    handoff previous_ocean
    previous_ocean = current_ocean
  end

Anything scoped to the outside of that loop but inside the worker's task will essentially become that worker's initial state. This means we often don't need to bother with instance variables, accessors, and other features of classes which deal with maintaining instances' state.

The Wormhole Effect

Despite the fact that the work is done in separate threads, the scope of the work is the scope of the coordinating body of code. This means that even the remaining coupling in such systems --which is primarily just common objects coupling-- this little remaining coupling is mitigated by the possibility of having coupled code live together. I call this feature the folded collaborators effect.

Consider the following code vaporware:

SUBJECT = 'kitteh'

include Shifty::DSL

tweet_getter = source_worker do
  twitter_api.fetch_my_tweets
end

about_me = filter_worker do |tweet|
  tweet.referenced.include? SUBJECT
end

tweet_formatter = relay_worker do |tweet|
  apply_format_to tweet
end

kitteh_tweets = tweet_getter | about_me | tweet_formatter

while tweet = kitteh_tweets.shift
  display(tweet)
end

None of these tasks have hard coupling with each other. If we were to insert another worker between the filter and the formatter, neither of those workers would need changes in their code, assuming the inserted worker plays nicely with the objects they're all passing along.

Which brings me to the point: these workers to have a dependency upon the objects they're handing off and receiving. But we have the capability to coordinate those workers in a centralized location (such as in this code example).

Shifty::Worker

The Shifty::Worker documentation (which was the old README) is now here.

Roadmap