activerecord-fast-import
Loads data from text files into tables using fast native MySQL LOAD DATA INFILE query.
Examples
Loading data from tab delimited log file
Suppose you have an ActiveRecord model LogEntry defined as LogEntry happened_at:datetime url:string
and a log file with tab delimited columns like this:
2009-09-30 12:32:43<tab>http://github.com/
2009-09-30 13:36:13<tab>http://facebook.com/
To import data from this log file, you have to use
LogEntry.fast_import('huge.log')
That's it!
Of course in real world you will also need more advanced features. Read on...
Changing delimiters and ignoring some rows
Of course not all log files are delimited by tabs and newlines. Just pass custom delimiters to options. If you want to ignore first 10 lines, just use :ignore_lines
import_options = {
:fields_terminated_by => ',',
:lines_terminated_by => ';',
:ignore_lines => 10
}
Changing order of columns and ignoring columns
Now, imagine you want to import data from a huge log file with following format:
http://github.com/<tab>Mozilla<tab>2009-09-30 12:32:43
http://facebook.com/<tab>Opera<tab>2009-09-30 13:36:13
It is clear that columns are in different order and we even want to ignore the second column. Let's do it
import_options = {:columns => ["url", "@dummy", "happened_at"]}
LogEntry.fast_import('huge.log', import_options)
The special @dummy
loads that column into a local variable and when unused (in a transformation) is just ignored.
Transforming data
Now imagine we have a log file like this:
2009-09-30 12:32:43<tab>http://github.com/<tab>image.jpg
2009-09-30 13:36:13<tab>http://facebook.com/<tab>styles/default.css
We want to concatenate those two columns into one.
import_options = {
:columns => ["happened_at", "@domain", "@file"],
:mapping => { :url => "CONCAT(@domain, @file)" }
}
LogEntry.fast_import('huge.log', import_options)
Of course you can use any of those shiny MySQL functions.
Extra features - added by Rafal Piekarski
This version also provides other "LOAD DATA" features and they are fields in options hash:
:insert_method # defines method for inserting data: IGNORE or REPLACE
:charset_name # defines file charset
:fields_optionally_enclosed_by # allows you to define fields enclosure method
:fields_escaped_by # changes the escape character
Copyright
Copyright (c) 2009 Jan Suchal. See LICENSE for details.