Project

janko

0.0
No commit activity in last 3 years
No release in over 3 years
Because sometimes you just need to feed PostgreSQL a lot of data.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

> 5.0, ~> 5.0
~> 1.6
~> 2.8
~> 0
~> 10.0
>= 3.0.0, ~> 3.0

Runtime

>= 0.0.1, ~> 0.0.1
> 0.17, ~> 0.17
 Project Readme

Janko

One day, your Ruby app may meed to feed a metric shit-ton of data into PostgreSQL.

On that day, you need Janko.

Janko imports and merges large datasets with truly stupid levels of performance (see the benchmarks below).

Installation

Run gem install janko, or add gem "janko" to your Gemfile.

Janko has been tested against PostgreSQL 9.3, and requires at least PostgreSQL 9.1, as it makes use of a writable CTE. As soon as PostgreSQL 9.5 goes platinum, Janko support will be added there as well.

Putting It to Work

Importing

Janko::Import needs to be configured for each table you want to load with data:

require "janko/import"

# Let's import some data with Janko::Import
importer = Janko::Import.new

# Use the same database connection as ActiveRecord.
importer.connect(ActiveRecord::Base)

# Import lots of records at once.
importer.use(Janko::CopyImporter)

# Let's only import a few columns into the "users" table.
importer.table("users").columns(:id, :email, :password)

# And feed it some data from an array
importer.start
rows.each { |row| importer.push(row) }
importer.stop

Out of the box, Janko::Import offers a choice of two different data-loading behaviors: Janko::InsertImporter and Janko::CopyImporter.

Janko::CopyImporter is notably faster, but only throws errors after feeding the entire dataset into PostgreSQL. Janko::InsertImporter is half as fast as copy, but will error immediately if the database is unhappy with what its being fed.

Both will have the same final result regardless of errors, the only differences are speed, and whether or not errors occur immediately on importer.push, or when importer.stop is called.

Merging / Upserting

Janko::Merge works similarly to Janko::Import:

require "janko/merge"

# Create a Merger, and fetch a Builder.
merge = Janko::Merge.new

# Tell it about our connection.
merge.connect(ActiveRecord::Base)

# Set the target table.
merge.table("users")

# Use the email address as the key column, used to determine whether
# to UPDATE an existing row, or INSERT a new one.
merge.key(:email)

# Feed it some data from an array
merge.start
rows.each { |row| merge.push(row) }
merge.stop

By default, Merge will attempt to fill all columns on both INSERT and UPDATE except the id column. You can choose to UPDATE or INSERT only on specific columns with merge.update("column") and merge.insert("column"), respectively.

Multiple key columns are allowed.

Returning

Merge can return both updated and inserted rows via returning:

# Return inserted rows.
merge.returning(:inserted)

# Return updated rows.
merge.returning(:updated)

# Return all rows.
merge.returning(:all)

# Return nothing.
merge.returning(:none)

Results are stored in merge.result.

Column Defaults

PostgreSQL provides a lot of functionality that is difficult or impossible to expose through an ORM, so Janko provides some shockingly unsafe ways to interact with the database.

Merge#alter is dangerous as all hell. It provides zero safety against stupidity, intentional or otherwise, and is injected directly into the database without escaping.

Here's how it works:

# Use the database default value for a column if nil.
merge.alter(:updated_at) { |c| c.default(Janko::DEFAULT) }

# Preserve the existing value of a column if nil.
merge.alter(:created_at) { |c| c.default(Janko::KEEP) }

# Do both of those things.
merge.alter(:updated_at) { |c| c.default(Janko::KEEP | Janko::DEFAULT) }

# Have the vote column increment whenever a row is updated.
merge.alter(:votes) { |c| c.on_update("$OLD + 1") }

# Make all post titles really, really shouty.
merge.alter(:title) { |c| c.wrap("upper($NEW)") }

# Completely nuke every record in the users table via SQL injection. Did I
# mention that Merge#alter should fill you with a sense of great danger?
merge.alter(:title) { |c| c.wrap("; rollback; delete * from users;" }

When alterting columns, $NEW is a placeholder for the value to be inserted from the Ruby side, and $OLD is the existing value in the database.

Both of these use parameter binding, not string escaping or quoting, so you don't need to worry about SQL injection on the values themselves, just on any raw query you might supply with wrap or on_update.

Benchmarks

Benchmarks were run under a single-processor virtual machine on my laptop (a Macbook Air, 1.3GhZ Core i5), and yielded surprisingly nice results.

Import

Insert Performance Graph

Merge

Merge Performance Graph

Other Considerations.

Both Insert and Merge should be threadsafe insofar as the underlaying connection library is.

The FUTURE!

Better documentation.

Returning rows into a custom handler or other table.

JRuby and Sequel support should be fairly straightforward.

Integration with ActiveRecord and Sequel::Model.

Profiling, optimization, and concurrency testing.

Binary-format COPY support and/or faster CSV encoding.

No plans to support other databases, just PostgreSQL, but I wouldn't turn down a well-implemented pull request.