Project

queryalize

0.0
No commit activity in last 3 years
No release in over 3 years
Queryalize lets you use Rails 3 to build queries just like with ActiveRecord::QueryMethods, except you can serialize the end result. This is useful for running queries that potentially return large result sets in the background using something like Resque or Delayed::Job.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Runtime

>= 3.0.0
 Project Readme

Queryalize lets you use Rails 3 to build queries just like with ActiveRecord::QueryMethods, except you can serialize the end result. This is useful for running queries that potentially return large result sets in the background using something like Resque or Delayed::Job.

Normally, using ActiveRecord::QueryMethods, you build queries like this:

query = User.where(:name => "something").order("created_at DESC")

With Queryalize, it's only a little different:

query = Querialize.new(User).where(:name => "something").order("created_at DESC")

However, now you get all of this goodness:

# NOTE the following methods DO NOT query the database,
# they return a representation of the query itself in one 
# of the following formats

json = query.to_json # => query as json data
yaml = query.to_yaml # => query as yaml data
hash = query.to_hash # => query as ruby hash

new_query_from_json = Queryalize.from_json(json)
new_query_from_yaml = Queryalize.from_yaml(yaml)
new_query_from_hash = Queryalize.from_hash(hash)

Why?

Imagine, for example, that you have a database that organizes music into several genres. You have built an admin interface that allows the administrator to filter the catalog of music by genre, and run updates against the result set. However, the database is large, and the query for "electronica" returns 1,000,000+ results. The administrator wants to re-process these entries such that the genre is "electronic" (without the annoying 'a' at the end).

Unfortunately, your schema is setup in such a way that you cannot simply run a single "UPDATE." Rather, you must iterate through each individual record and update its genre. Ouch. There is no way you can allow this to happen during the request, or it will certainly timeout. So you decide to queue the update, but how do you tell the queue workers which records to update? You could try to capture just the ids from the records, but you'd still need to store 1,000,000+ ids somewhere so the queue worker can reference them later, not to mention that actually collecting the ids takes a healthy amount of time and memory, and will probably also time out. You could build up your query and then use to_sql to pass the raw SQL to the queue worker, but then you can't use useful methods like 'find_each' in the queue task.

The solution is to serialize the query you've built, and then rebuild it in the queue task. It ends up looking something like this (if you're using Delayed::Job):

query = Queryalize.new(Music).joins("JOIN #{Genre.table_name} ON #{Genre.table_name}.music_id = #{Music.table_name}.id").where(["#{Genre.table_name}.name = ?", 'electronica'])
# see 1. below

worker = GenreWorker.new({
  :update => 'electronic',
  :query  => query.to_json
})

Delayed::Job.enqueue(worker)

# 1.
# written this way to demonstrate chaining, but a slightly cleaner way would be:
# genres = Genre.table_name
# query  = Queryalize.new(Music)
# query  = query.joins("JOIN #{genres} ON #{genres}.music_id = #{genres}.id")
# query  = query.where(["#{genres}.name = ?", 'electronica'])

The GenreWorker class looks something like this:

class GenreWorker

    def initialize(args)
      @update = args[:update]
      @query  = args[:query]
    end
    
    def perform
      Queryalize.from_json(@query).find_each do |music|
        music.genre.update_attribute(:name => @update)
      end
    end
  end
end

Notice the query was serialized and reconstructed to its original state, so you can seamlessly use ActiveRecord features like find_each. Simple!