Project

sql-ferret

0.0
No commit activity in last 3 years
No release in over 3 years
The SQL Ferret provides a navigational database style wrapper around SQLite. EXPERIMENTAL! The API and both DSLs are subject to change!
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Runtime

~> 1.0.0
 Project Readme
The SQL Ferret wraps SQLite into a navigational database style
interface.

BEWARE: this is raw and ugly EXPERIMENTAL code, and its API is
VERY LIKELY to change before 1.0.


== Overview

The [[Ferret::new]] constructor takes two arguments: a
(multiline) string containing the Ferret Data Schema Description
and a previously opened [[SQLite3::Database]] instance for
accessing an SQLite database with such a schema.  The resulting
[[Ferret]] instance's primary useful method is [[go]]; it takes
one mandatory argument -- the Ferret query string -- and may
also take numbered and named arguments, as well as a block.

Note that Ferret has TWO distinct DSLs: one for defining the
data model, one for querying and manipulating it.  When the
Ferret schema is stored in a text file, it's customarily given a
name in the form of [[foo.fers]].  Ferret query expressions are
typically inlined in Ruby code.


== Ferret query language

The simplest form of a Ferret query expression is a query over
one table, filtering by one input column, and producing one
output column:

  <table> ':' <input-field> '->' <output-field>

Such an expression corresponds to the SQL of

  SELECT <output-field> FROM <table> WHERE <input-field> = ?

Note that in order to process such an expression, [[Ferret#go]]
requires an extra argument besides the expression -- the
exemplar value for [[<input-field>]].  The separation of
expression from data is a deliberate design feature of Ferret:
on one hand, it's believed to make the expressions clearer; on
another, it provides a measure of protection against inadvertent
XSS vulnerabilities.

The <input-field> can be omitted.  In such a case, the query
fetches all rows from <table>.  If more than one input field is
supplied, they must be separated by commas and surrounded by
parentheses, like this.  (Rationale: Ferret's [[->]] operator,
normally binds very weakly on its left hand side.  While it
would not be hard to write an exception into the parser, it is
believed that permitting the surrounding parentheses to be
omitted is likely to lead to confusing Ferret query
expressions.)

Multiple output fields can be specified by separating them with
commas.  Surrounding parentheses are not necessary or permitted
around the right-hand side of an arrow.

More complex queries involve multiple tables and what relational
algebra calls /joins/.  Since Ferret aims to provide a
navigational rather than purely relational interface, it
presents joins as /dereferencing/, denoted by a trailing [[->]]
operator.  That is, the query

  houses: number -> resident -> name, phone

can correspond to the SQL of

  SELECT house.number, house.resident, resident.name,
      resident.age
  FROM houses LEFT JOIN residents ON
      houses.resident = residents.id
  WHERE house.number = ?

provided that the data schema specifies that the column of
[[houses.resident]] refers to [[resident]] through its [[id]].
(In SQL parlance, it needs to be defined as a foreign key.)  If,
instead of [[left join]], an [[inner join]] is desired, the
two-ended dereferencing arrow [[<->]] needs to be used instead
of [[->]].

A Ferret data schema permitting such translation might look
roughly thus:

  [houses]
  id: primary key, integer
  number: optional integer
  name: optional varchar
  street: varchar
  resident: optional ref residents(id)

  [residents]
  id: primary key, integer
  name: varchar = 'John Smith'
  phone: optional varchar

Note that the columns are by default _not_ nullable but they can
be explicitly defined as nullable by the keyword [[optional]].
The [[=]] character followed by an SQL expression specifies the
default value for a column.

Also note that in the definition of [[resident: optional ref
residents(id)]], the [[(id)]] can be omitted because it's clear
from context -- [[residents.id]] is the primary key of
[[residents]].

This data schema permits only up to one resident per house.
What if the house->resident relation needs to have an 1->n shape
rather than 1->0..1?  We could move the linking column from
[[houses]] to [[residents]], like this:

  [houses]
  id: primary key, integer
  number: optional integer
  name: optional varchar
  street: varchar
  resident: ghost ref residents(house)

  [residents]
  id: primary key, integer
  name: varchar = 'John Smith'
  phone: optional varchar
  house: optional ref residents

Note that we're still defining [[houses.resident]] but it's no
longer a /column/ -- that is, it does not have a matching SQL
table column anymore --, but a /ghost field/.

Besides being primary keys, columns can be defined merely
[[unique]].  Ferret does not currently support composite
secondary keys, but a future version might.

How does [[Ferret#go]] deliver its results?  It depends.  If a
block is given to it, it will call this block with each row;
otherwise, it collects rows and returns them.  (Actually, if
Ferret can prove, using [[unique]] and [[primary key]]
constraints, that the query necessarily produces 0 or 1 rows, it
will return either [[nil]] or the one row; otherwise, it will
return an array of the rows.) If the query specifies one column
(which may be precededed by dereferences); each 'row' will be
the value without encapsulation; otherwise, Ferret wraps rows
into [[OpenStruct]] instances.  The multicolumn behaviour can be
forced by adding an explicit trailing comma after what would
otherwise be the single requested column.  (Rationale: while
these rules are a bit clumsy to specify, they have proven
intuitive, in a Perlish way.)

Each queried column can be given an explicit name, analogously
to SQL's [[AS]] clause, by specifying
it between apostrophes after the column appears in the
expression.  Note that this is not a string literal; rather,
Ferret parses each apostrophe as a token, and the explicit name
must parse as a valid Ferret identifier token.

Star topology joins can be specified by surrounding a joining
arrow together with its right-hand side in parentheses, like
this:

  houses: number -> resident (-> name, phone), street

Such parentheses can be nested.

In addition to retrieval, Ferret query expressions also support
modification and deletion of entries.  This is notated by
terminating an expression in a 'blank' dereference operator
followed by a colon and a verb, like this:

  houses: number -> resident ->: set

The fields to be changed will then be specified as named
arguments to [[Ferret#go]].  The verb [[update]] can also be
used instead of [[set]]; it has exactly the same meaning.  When
the verb [[delete]] is used, [[Ferret#go]] does not take any
named arguments.

Outside the Ferret query expression mechanism, there's the
[[Ferret#insert]] method that takes the target table's name as a
mandatory argument and the values to be inserted as named
arguments, like this:

  $ferret.insert 'residents',
      house: 8,
      name: 'Jacob Doe',
      phone: '555-1212'

A future version of Ferret API is likely to provide record
insertion through [[Ferret#go]].  The reason we're not doing it
in this public release is that our autovivification mechanisms
are nowhere near settling yet.

Also of note is [[Ferret#change]], whose signature matches
[[Ferret#insert]] except that it performs the [[INSERT OR
REPLACE INTO ...]] operation instead of plain [[INSERT]], and
[[Ferret#transaction]], which supports recursive locking.
(This is quirky.  It's mainly intended for use in library
functions that need to group Ferret or SQL operations for
atomicity without assuming that an outer transaction exists or
does not exist, and it needs care even then.  Unless you know
you need it, you're probably better off using
[[SQLite3::Database#transaction]] directly.)

Instead of a single input value, it's permitted to
pass a whole collection of input values to [[Ferret#go]] --
then, Ferret uses [[foo in (?, ...)]] instead of [[foo = ?]],
and won't consider this column's possible declared uniquity when
deciding whether the query is a single-row query --, or [[nil]],
in which case Ferret uses [[foo is null]] for proper SQL-style
nullity checking.  (A 'collection' is defined through duck
typing -- anything that produces more than one value when the
[[*]] prefix operator is applied to it.)

When it's desired that [[Ferret#go]] produce distinct values, a
trailing [[: distinct]] or [[: select distinct]] can be used.
(These two are synonymous.)  Note that for query-type verbs, the
colon *must not* be preceded by a blank-RHS dereferencing arrow,
unlike for mutation-type verbs, which require it.

Besides straight values, Ferret supports interpreted values.
The set of such is currently hardcoded and is:

  iso8601
  unix_time
  subsecond_unix_time
  json
  pretty_json
  yaml
  ruby_marshal
  packed_hex

When a Ferret schema assigns an interpreted rather than straight
data type to a column, [[Ferret#go]] will automatically
interpret and 'deterpret' values for this column, unless the
column's name is prefixed with a backslash in the expression.
Note that [[Ferret#insert]] does not (currently?) support
interpretation, and always processes raw values.


== Likely future development

- 'en passant' filters in addition to 'initial' filters.  These
  will probably be notated by brackets, and may permit
  ordering comparison in addition to equality checks;
- use of [[Range]] values in addition to collections as filters
  passed to [[Ferret#go]];
- explicit ordering of the produced rows, probably via unary
  postfix operators;
- SQL grouping and aggregate functions;
- [[Ferret#go]] returning rows as a [[Hash]] instead of an
  [[Array]], using a given key or keys;
- Kleene dereferencing arrows [[-*>]] and [[-+>]] (and their
  double-ended [[INNER JOIN]] counterparts), implemented via
  SQLite's recently introduced [[WITH RECURSIVE]] construct;
- multi-column uniquity constraints and foreign keys;
- automated handling of SIKR (Strictly Incremental Knowledge
  Representation) packets so that Fossil-style multinode
  tracking could be implemented for (nearly) arbitrary data
  structures;
- accessing SQL's views;
- customisable autovivification in multistage insertions;
- defining [[Hash]]-like interfaces atop [[Ferret]] that would
  be backed with a custom, potentially joined, relation in the
  underlying SQL table;
- defining per-column access-controlled [[Ferret]]-like subAPIs;
- integration of [[insert]] and [[change]] into [[Ferret#go]];
- an interface for Android's built-in SQLite API as available
  through Ruboto, as an alternative to the [[sqlite3]] Rubygem
  which is not available on Ruboto;
- better documentation;
- renaming to avoid clashing with
  <https://github.com/jkraemer/ferret>.


== Possible future development

- tracking the underlying SQLite database's schema via [[PRAGMA
  user_version]] and automatically upgrading it;
- command line tools for database setup and data import and
  export;
- transparently joining another table so as to implement
  [[is-a]] type relation atop a SQL data model;
- transparently interpreting JSON or YAML data as extra payload
  fields of their containing table without explicit formal
  specification, akin to MongoDB;
- transparent compression of blob/YAML/JSON fields;
- extracting column type data, constraints, and foreign keys
  from the [[sqlite_master]] data so that the Ferret schema
  would only need to specify ghost fields and interpretations;
- a notation for /ad hoc/ joins;
- a third, hybrid DSL: Ferret dereference operator extensions to
  basic SQL queries, something like [[SELECT number, resident ->
  name, resident -> phone FROM houses WHERE id = ?]] or perhaps
  [[SELECT number, resident -> (name, phone) FROM ...]];
- custom enums as interpretations;
- global primary key definition in the schema.