Project

spinoza

0.0
No commit activity in last 3 years
No release in over 3 years
Model of Calvin distributed database.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
 Dependencies

Runtime

 Project Readme

spinoza

A model of the Calvin distributed database. The main purpose of this model is expository, rather than analysis for correctness or performance. All concurrency and distribution is modeled in a single-threaded process with in-memory data tables, which makes it easier to understand what is happening.

Spinoza, like Calvin, was a philosopher who dealt in determinism.

Calvin is developed by the Yale Databases group; the open-source releases are at https://github.com/yaledb.

Structure

The model of the underlying computer and network system is in lib/spinoza/system.

The Calvin model, implemented on the system models, is in lib/spinoza/calvin. Other distributed transaction models could also be implemented on this layer.

The transaction class, in lib/spinoza/transaction.rb, is mostly abstracted from these layers. It is very simplistic, intended to illustrate Calvin's replication and consistency characteristics.

Running

You will need ruby 2.0 or later, from http://ruby-lang.org, and the gems listed in the gemspec:

sequel
sqlite3
rbtree

You can also gem install spinoza, but it may not be up to date.

To run the unit tests:

rake test

Examples TBD.

Observing

One benefit of modeling concurrency in a single thread is being able to see what is going on at every node at every moment in time. In fact, Spinoza keeps a history of all "events". (Events are how we model the passage of time and the actions that occur at a point in time.) As noted in test/test-scheduler.rb, you can use this to make assertions about what occurred, when, and where:

pp @timeline.history.select {|time, event| event.action != :step_epoch}

(The :step_epoch events are frequent and not usually very interesting.) The hostory is stored in a red-black tree for easy access by time interval.

References

To do

  • The performance and error modeling should optionally be statistical, with variation using some distribution.

  • Model IO latency and compute time, in addition to currently modeled network latency.

  • Log#time_replicated should be a function of the reading node and depend on the link characteristics between that node and the writing node.

  • Transactions, to be more realistic, should have dataflow dependencies among operations. (But only for non-key values, because Calvin splits dependent transactions.)

  • Transactions also need conditionals, or, at least, conditional abort, which is needed to support the splitting mentioned above.

  • For comparison, implement a 2-phase commit transaction processor on top of the Spinoza::System classes.

  • Output spacetime diagrams using graphviz.

  • See also 'TODO' in code.

Contact

Joel VanderWerf, vjoel@users.sourceforge.net, @JoelVanderWerf.

License and Copyright

Copyright (c) 2014, Joel VanderWerf

License for this project is BSD. See the COPYING file for the standard BSD license.