Project: epo - The Ruby Toolbox

Project
epo

0.0
No commit activity in last 3 years
No release in over 3 years
crapooze/epo
A no-brainer, plain-ruby database
Popularity
12,237
Releases
0.0.3
2011-03-19
2011-04-11
Development
Primary Language
Ruby
Average date of last 50 commits
2011-03-28
Reverse Dependencies
Dependencies
Runtime

derailleur
>= 0.0.6
welo
>= 0.1.1
Project Readme
= EPO

EPO is a no-brainer, plain-ruby file system database. If you have objects and
need to store them in a clean hierarchy, then EPO is a good choice.
EPO is not a good choice if you want to perform optimized queries or if you
operate on big datasets.
In EPO, each object has a directory, so it is easy to use programs that write
their output to a given file, without resorting to temp dirs.  Similarly, if you do
lots of batch operations on your EPO resources, the one-directory-per-resource
approach scales well.

== Philosophy

EPO is only a small library, so summarizing its philosophy takes only a few
lines:
* My filesystem already is a database
* My database should be plain Ruby
* But my database should be easy to use from non-Ruby programs

== The EPO hierarchy

EPO tries to build on the lessons of KISS and REST routes.
If you're familiar with REST applications, you will find EPO's hierarchy pretty
straightforward.

=== Important Rules

Here are the simple rules for EPO's databases:
* EPO operates on Welo::Resources
* each resource has one directory
* the directory's path identifies the resource stored
* there is one serialization file per (resource, perspective, extension) tuple
* any other file in the directory should not impact your ruby application, but may have meaning to other programs (e.g., a thumbnail created by your file-viewer)

=== Examples

Suppose you have a resource "user" identified by its "login", and three
users: peter, jon, and marc.
The database directories may look like the following. The labels in brackets refer to the explanations that follow the tree.

$ tree db/
db/
└── user
    ├── peter
    │   ├── resource-epo-default.json  [1a]
    │   └── picture.png                [1b]
    ├── jon
    │   ├── resource-epo-default.json  [2a]
    │   ├── picture.png                [2b]
    │   └── pubkey.rsa.txt             [2c]
    └── marc
        ├── resource-epo-default.json  [3a]
        ├── picture.png                [3b]
        └── thumbnail.dat              [3c]

In the previous hierarchy, [1a, 2a, 3a] are the serializations of the user
resources under the 'default' (or :default) perspective in the JSON format.
The perspective is a concept of Welo's resource, which is basically a list of
fields of an object that you want to dump or observe (an administrator may have
access to more details than a mere, non-registered user).
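
As an illustration of the "one file per (resource, perspective, extension)" rule, here is a minimal sketch of how such a path could be assembled. The helper +serialization_path+ is hypothetical and not part of EPO; the "resource-epo-<perspective>.<extension>" pattern is only inferred from the tree above.

```ruby
# Hypothetical helper, NOT part of EPO: builds the path of one serialization
# file. The "resource-epo-<perspective>.<extension>" naming is an assumption
# inferred from the example tree.
def serialization_path(root, resource, identifier, perspective = :default, extension = 'json')
  File.join(root, resource, identifier, "resource-epo-#{perspective}.#{extension}")
end

puts serialization_path('db', 'user', 'peter')
# => db/user/peter/resource-epo-default.json
puts serialization_path('db', 'user', 'jon', :default, 'yaml')
# => db/user/jon/resource-epo-default.yaml
```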

The other objects may or may not be related to your Ruby application.  A
convenient explanation may be: [1b, 2b, 3b] are pictures, most likely your
users' headshots, uploaded by your users through your application.  User jon
has provided his public key in [2c], and we see that your file browser has
created a thumbnail in [3c], which has nothing to do with your application.

=== Attention

The main thing to keep in mind is that filesystems come with a slight problem:
file paths are strings.
As a result, when designing your EPO hierarchy, you must be aware of:
- encoding issues
- case sensitivity
- ambiguities with path separators 
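
The three pitfalls above can be checked before an identifier ever becomes a directory name. The following is a minimal sketch under stated assumptions: +safe_segment?+ and +case_collisions+ are hypothetical helpers, not part of EPO or Welo, and the exact rules you need depend on your filesystem.

```ruby
# Hypothetical guard, NOT part of EPO: tells whether an identifier is
# reasonably safe to use as a single directory name.
def safe_segment?(identifier)
  return false unless identifier.is_a?(String)
  return false if identifier.empty? || %w[. ..].include?(identifier)
  # Encoding issues: refuse byte sequences that are invalid in their encoding.
  return false unless identifier.valid_encoding?
  # Path separators (and NUL) would change the meaning of the path.
  return false if ['/', '\\', "\0"].any? { |char| identifier.include?(char) }
  true
end

# Case sensitivity: on a case-insensitive filesystem, "Peter" and "peter"
# would end up in the same directory. This returns the groups that collide.
def case_collisions(identifiers)
  identifiers.group_by(&:downcase).values.select { |group| group.size > 1 }
end

p safe_segment?('peter')                 # => true
p safe_segment?('a/b')                   # => false
p case_collisions(%w[peter Peter jon])   # => [["peter", "Peter"]]
```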

== Benefits

Example of things you can do easily with EPO:
* use a filesystem explorer to visualize/explore your DB
* organize your documents without hassle
* use bash scripts for batch processing
* use rsync/NFS/git on all or part of your database's content
* use ftp/http servers to expose a branch of your DB's hierarchy 
* have other programs use your DB easily without configuring databases
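
To illustrate the last point, here is a minimal sketch of a script that reads the user resources with nothing but Ruby's standard library, without loading EPO at all. The helper +each_user+, the "login" field, and the file name are assumptions based on the example tree above.

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper, NOT part of EPO: yields [identifier, parsed_hash] for
# each user directory under root. Assumes the layout of the example tree.
def each_user(root)
  return enum_for(:each_user, root) unless block_given?

  pattern = File.join(root, 'user', '*', 'resource-epo-default.json')
  Dir.glob(pattern).sort.each do |path|
    yield File.basename(File.dirname(path)), JSON.parse(File.read(path))
  end
end

# Build a throwaway hierarchy so the sketch is self-contained.
Dir.mktmpdir do |root|
  %w[peter jon marc].each do |login|
    dir = File.join(root, 'user', login)
    FileUtils.mkdir_p(dir)
    # The "login" field is an assumption about the serialized content.
    File.write(File.join(dir, 'resource-epo-default.json'), JSON.generate('login' => login))
  end

  each_user(root) { |identifier, data| puts "#{identifier}: #{data['login']}" }
end
```

Unrelated files in a resource directory (pictures, thumbnails) are simply not matched by the glob pattern.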

Say you have photos to sort. You may sort them by year, by place, by subject, or
by other criteria. Some software offers to tag your photos and build a database
of them for you. Unfortunately, such software is often closed: you
cannot re-use its database, or it takes a lot of effort to script the
software to do something it was not designed for (e.g., resize your pictures in a
batch). Moreover, the file viewer of your OS may be good enough to view your
pictures.  Filesystems solved many database issues long ago.  So, why
not just rely on your file system to store your pictures?

== Usage (please have a look at the examples directory)

=== EPO is simple
There is nothing complex in EPO. For example, there is no:
- index
- transaction
- thread/multiprocess safety
- lifecycle hook
If you want to add a missing feature, feel free to fork/subclass/add a module,
or implement it directly on your filesystem.
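
As an example of implementing a missing feature directly on the filesystem, here is a minimal sketch of multiprocess safety using an advisory lock file. The helper +with_resource_lock+ and the ".lock" file name are hypothetical, not part of EPO. The lock only protects against other processes that use the same helper, and flock may not behave the same on every network filesystem.

```ruby
require 'tmpdir'

# Hypothetical helper, NOT part of EPO: runs the block while holding an
# exclusive advisory lock on a ".lock" file inside the resource directory.
def with_resource_lock(dir)
  File.open(File.join(dir, '.lock'), File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX) # blocks until other holders release the lock
    begin
      yield
    ensure
      lock.flock(File::LOCK_UN)
    end
  end
end

Dir.mktmpdir do |dir|
  with_resource_lock(dir) do
    File.write(File.join(dir, 'note.txt'), 'written under lock')
  end
  puts File.read(File.join(dir, 'note.txt'))
end
```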

=== Theory

EPO uses:
* Derailleur's paths routing to quickly map paths to resources
  (we may want to remove this dependency later)
* Welo's observers to hook events when iterating on the filesystem

EPO::DB instances are just simple, in-memory Ruby objects.
They have no connection to take care of and no credentials.

An EPO::DB may understand several formats (json or yaml are standard choices).
You may modify this by changing EPO::DB#extensions.

EPO::DB instances are headless, in the sense that they don't store their
hierarchy's root in a variable. As a result, an EPO::DB contains the concepts
of the models but is not actually tied to the data on the filesystem.
You may have two EPO::DB instances on the same filesystem root (each
understanding different resources or formats).

=== Commented code bits

Say you have two models (which are Welo::Resources): Person and Item.
Both models must have an "identifying" named :flat_db (this is just a default convention).
  class Person 
    include Welo::Resource
    identify :flat_db, [:name]
    ...

To create a database only able to handle persons:
  EPO::DB.new([Person])

To create a database able to handle persons and items:
  db = EPO::DB.new([Person, Item])

To save a person with the default perspective, in all formats:
  person = Person.new(...)
  db.save(root, person)

To iterate on all the DB items (as observations):
  db.each_resource(root) { |observation| ... }

== Dependencies

* welo >= 0.1.0
* derailleur >= 0.5.0
* a JSON library if you use JSON formatting (recommended)

== Benchmark
Config:
* MacBook Pro (2010), without SSD
* ruby 1.9.2
* json 1.5.1

$ ruby example/benchmark.rb 
  0.770000   1.020000   1.790000 (  4.563546)
5935
  1.160000   0.420000   1.580000 (  1.601897)

So, in wall-clock time, roughly 4.6 seconds to write 6000 records, and 1.6
seconds to read them back.
No sync operation or cache flush was forced between the two measurements. That may be
why only 5935 records out of 6000 could be read back.
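
For readers who want to reproduce this kind of measurement without the gem, here is a minimal stand-in sketch. It is not the project's example/benchmark.rb: the helpers +write_records+ and +read_records+, the "record" resource name, and the record count are assumptions, and it only uses the standard library.

```ruby
require 'benchmark'
require 'json'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: writes `count` records, one directory per record.
def write_records(root, count)
  count.times do |i|
    dir = File.join(root, 'record', i.to_s)
    FileUtils.mkdir_p(dir)
    File.write(File.join(dir, 'resource-epo-default.json'), JSON.generate('id' => i))
  end
end

# Hypothetical helper: reads back every record found under root.
def read_records(root)
  pattern = File.join(root, 'record', '*', 'resource-epo-default.json')
  Dir.glob(pattern).map { |path| JSON.parse(File.read(path)) }
end

Dir.mktmpdir do |root|
  count = 1000
  puts Benchmark.measure { write_records(root, count) }

  records = nil
  read_timing = Benchmark.measure { records = read_records(root) }
  puts records.size # number of records actually read back
  puts read_timing
end
```

The output follows the same order as above: write timing, number of records read back, read timing.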

== License

The MIT license