DataMapper SQL Finders
DataMapper SQL Finders adds #by_sql
to your models, as a compliment to the standard pure-ruby
query system used by DataMapper. The SQL you write will be executed directly against the adapter,
but you do not need to lose the benefit of the field and table name abstraction offered by DM.
When you invoke #by_sql
, one or more table representations are yielded into a block, which you
provide. These objects are interpolated into your SQL, such that you use DataMapper properties in
the SQL and make no direct reference to the real field names in the database schema.
I wrote this gem because my original reason for using DataMapper was that I needed to work with a
large, existing database schema belonging to an old PHP application, subsequently ported to Rails.
We need the naming abstraction offered by DataMapper, but we also prefer its design. That said, we
run some queries that are less than trivial, mixing LEFT JOINs and inline derived tables…
something which DataMapper does not currently handle at all. #by_sql
allows us to drop down to
SQL in places where we need it, but where we don't want to by-pass DataMapper entirely.
The #by_sql
method returns a DataMapper Collection
wrapping a Query
, in just the same way
#all
does. Just like using #all
, no SQL is executed until a kicker triggers its execution
(e.g. beginning to iterate over a Collection).
Because the implementation is based around a real DataMapper Query object, you can chain other
DataMapper methods, such as #all
, or methods or relationships defined on your Model.
It looks like this:
class Post
include DataMapper::Resource
property :id, Serial
property :title, String
belongs_to :user
end
class User
include DataMapper::Resource
property :id, Serial
property :username, String
has n, :posts
def self.never_posted
by_sql(Post) { |u, p| "SELECT #{u.*} FROM #{u} LEFT JOIN #{p} ON #{p.user_id} = #{u.id} WHERE #{p.id} IS NULL" }
end
end
User.never_posted.each do |user|
puts "#{user.username} has never created a Post"
end
The first block argument is always the current Model. You can optionally pass additional models to #by_sql
and have
them yielded into the block if you need to join.
You may chain regular DataMapper finders onto the result (the original SQL is modified with the additions):
User.never_posted.all(:username.like => "%bob%")
A note about DataMapper 2.0
The DataMapper guys are hard at work creating DataMapper 2.0, which involves a lot of under-the-surface changes, most notably building DM's query interface atop Veritas, with the adapter layer generating SQL by walking a Veritas relation (an AST - abstract syntax tree). Because of the way DM 1 handles queries, it is not trivial to support SQL provided by the user (except for the trival case of it being in the WHERE clause). With any hope, gems like this will either not be needed in DM 2.0, or at least will be easy to implement cleanly, since SQL and Veritas play nicely with each other.
Installation
Via rubygems:
gem install dm-sql-finders
Detailed Usage
Note that in the following examples, you are not forced to use the table representations yielded into the block, but you are encouraged to. They respond to the following methods:
-
tbl.*
: expands the splat to only the known fields defined in your model. Other fields in the database are excluded. -
tbl.to_s
: represents the name of the table in the database.#to_s
is invoked implcitly in String context. Note that if you join to the same table multiple times, DataMapper SQL Finders will alias them accordingly. -
tbl.property_name
: represents the field name in the database mapping toproperty_name
in your model.
Writing the field/table names directly, while it will work, is not advised, since it will significantly hamper any future efforts to chain onto the query (and it reads just like SQL anyway, right?).
Basic SELECT statements
Returning a String from the block executes the SQL when a kicker is invoked (e.g. iterating the Collection).
def self.basically_everything
by_sql { |m| "SELECT #{m.*} FROM #{m}" }
end
Passing in variables
The block may return an Array, with the first element as the SQL and the following elements as the bind values.
def self.created_after(time)
by_sql { |m| ["SELECT #{m.*} FROM #{m} WHERE #{m.created_at} > ?", time] }
end
Selecting less than all fields
Just specify individual fields in the SQL. The regular DM semantics apply (i.e. the rest will be lazy loaded, and omitting the primary key means your records are immutable).
def self.usernames_only
by_sql { |u| "SELECT #{u.username} FROM #{u}" }
end
Selecting more than all fields (experimental)
This allows you to pre-load things like aggregate calculations you may otherwise add denormalizations for:
def self.with_post_counts
by_sql(Post) { |u, p| "SELECT #{u.*}, COUNT(#{p.id}) AS post_count FROM #{u} INNER JOIN #{p} ON #{p.user_id} = #{u.id} GROUP BY #{u.id}" }
end
A @post_count
instnace variable is set on all resources. Currently this is always a String. You will need to typecast manually.
See the section on "Joins" for details on the join syntax.
You should consider this feature experimental. It takes advantage of the fact DM Property instances can be created and thrown-away on-the-fly.
Ordering
DataMapper always adds an ORDER BY to your queries if you don't specify one. DataMapper SQL Finders behaves no differently. The default ordering is always ascending by primary key. You can override it in the SQL:
def self.backwards
by_sql { |m| "SELECT #{m.*} FROM #{m} ORDER BY #{m.id} DESC" }
end
Or you can provide it as a regular option to #by_sql
, just like you can with #all
:
def self.backwards
by_sql(:order => [:id.desc]) { |m| "SELECT #{m.*} FROM #{m}" }
end
Note that the :order
option take precendence over anything specified in the SQL. This allows method chains to override it.
Joins
The additional models are passed to #by_sql
, then you use them to construct the join.
class User
... snip ...
def self.posted_today
by_sql(Post) { |u, p| ["SELECT #{u.*} FROM #{u} INNER JOIN #{p} ON #{p.user_id} = #{u.id} WHERE #{p.created_at} > ?", Date.today - 1] }
end
end
The :links
option will also be interpreted and added to the FROM
clause in the SQL. This is useful if you need to override the SQL.
Limits and offsets
These can be specified in the SQL:
def self.penultimate_five
by_sql { |m| "SELECT #{m.*} FROM #{m} ORDER BY #{m.id} DESC LIMIT 5 OFFSET 5" }
end
Or they can be provided as options to #by_sql
:
def self.penultimate_five
by_sql(:limit => 5, :offset => 5) { |m| "SELECT #{m.*} FROM #{m}" }
end
If :limit
and/or :offset
are passed to #by_sql
, they take precedence over anything written in the SQL itself.
Method chaining
Method chaining with #by_sql
, for the most part, works just like with #all
. There are some current limitations,
such as reversing the order of a query that used ORDER BY
in the SQL, rather than via an :order
option.
Also note the you may not currently chain #by_sql
calls together. #by_sql
must, logically, always be the first
call in the chain.
User.by_sql{ |u| ["SELECT #{u.*} FROM #{u} WHERE #{u.role} = ?", "Manager"] }.all(:username.like => "%bob%", :order => [:username.desc])
Unions, Intersections and Differences
Unfortunately this is not currently supported, and will likely only be added after the other limitations are worked out.
Specifically, queries like this:
User.by_sql { |u| ["SELECT #{u.*} FROM #{u} WHERE #{u.created_at} < ?", Date.today - 365] } | User.all(:admin => true)
Should really produce SQL of the nature:
SELECT "users"."id", "users"."username", "users"."admin" FROM "users" WHERE ("created_at" < ?) OR (admin = TRUE)
I have no idea what will happen if it is attempted, but it almost certainly will not work ;)
Will it interfere with DataMapper?
#select_statement
on the adapter is overridden such that, when you use #by_sql
query, code in the gem is executed, and
when you execute a regular query, the original code pathways are followed. I'd prefer some sort of extension API in
DataObjectsAdapter to allow hooks into its SQL generation logic, but for now, this is how it works.
DataMapper 2.0 should fix this.
Contributors
DataMapper SQL Finders is currently written by Chris Corbyn
Contributions are more than gladly accepted. The primary goal is to support SQL in a way that does not break gems like dm-aggregates and dm-pager. The more the SQL can be interpreted and turned into a native Query, the better.
TODO
There are some known limitations, that are mostly edge-cases. You will only run into them if you try to get too crazy combining regular
DM queries with SQL (e.g. adding :links
to a hand-written SQL query works, unless you have used bind values somewhere other than the
WHERE clause and if, and only if DataMapper needs to use a bind value in the join, such as for special join conditions). Real
edge-cases.
- Support overriding
:fields
in a#by_sql
query (complex if the query depends on RDBMS native functions in both the WHERE and the SELECT) - Reverse the order when invoking
#reverse
in a#by_sql
query that usedORDER BY
in the SQL (note this will work just fine if you use the:order
option) - Better support for
?
replacements in places other than theWHERE
clause - Support set operations (union, intersection, difference)
- Possibly (?) support crazy complex mass-updates (seems a little DB-specific though):
User.by_sql { ... something with join conditions ... }.update!(:banned => true)
(MySQL, for one, can doUPDATE ... INNER JOIN ...
)
Future Plans
Before I started writing this, I wanted to implement something similar to Doctrine's (PHP) DQL, probably called DMQL. This is basically a strict superset of SQL that is pre-processed with DataMapper, having knowledge of your schema, therefore allowing you to simplify the query and let DMQL hande things like JOIN logic. Say, for example:
Post.by_dmql("JOIN User u WHERE u.username = ?", "Bob")
Which would INNER JOIN posts with users and map u.username
with the real field name of the User#username
property.
This gem would be a pre-requisite for that.
Copyright (c) 2011 Chris Corbyn.