csv_step_importer
A library to validate, speed up and organize bulk insertion of complex CSV data into multiple tables.
It depends on
Sample Application (Thanks to @vochicong)
vochicong/csv_step_importer_sample
Installation
Add this line to your application's Gemfile:
# Quicker CSV processing
gem 'csv_step_importer'
gem 'smarter_csv', github: 'tilo/smarter_csv'
NOTE: you might need to add gem 'smarter_csv', github: 'tilo/smarter_csv'
if you encounter problems building rows
And then execute:
$ bundle
Or install it yourself as:
$ gem install csv_step_importer
Usage
Hello world setup (super simple sample, single table, no user-defined row, no user-defined dao)
First let's create a basic rails application to play around with
rails new bookshop --database=mysql --api --skip-spring --skip-test
cd bookshop
echo "gem 'csv_step_importer'
gem 'smarter_csv', github: 'tilo/smarter_csv'" >> Gemfile
bundle install
rails g model author name:string:uniq email:string
rails g model book author:references name:string:uniq
rails db:create db:migrate
Then edit the model like this:
app/models/author.rb
class Author < ApplicationRecord
class ImportableModel < CSVStepImporter::Model::ImportableModel
# The model to be updated
def model_class
Module.nesting[1]
end
# return CSVStepImporter::Model::Reflector in order to enable reflections (e.g. get ids of all rows)
# disabled by default
def reflector_class
CSVStepImporter::Model::Reflector
end
def columns
[:name, :email, :created_at, :updated_at]
end
def composite_key_columns
[:name]
end
def on_duplicate_key_update
[:email, :updated_at]
end
end
end
Import a single data row
rails c
irb(main)> data = [{ name: 'Milan Kundera', email: 'milan.kundera@example.com' }]
irb(main)> importer = CSVStepImporter::Loader.new(rows: data, processor_classes: [Author::ImportableModel])
irb(main)> importer.valid?
=> true
irb(main)> importer.save!
(1.1ms) SET NAMES utf8, @@SESSION.sql_mode = CONCAT(CONCAT(@@sql_mode, ',STRICT_ALL_TABLES'), ',NO_AUTO_VALUE_ON_ZERO'), @@SESSION.sql_auto_is_null = 0, @@SESSION.wait_timeout = 2147483
(0.4ms) BEGIN
[Author::ImportableModel, Author(id: integer, name: string, email: string, created_at: datetime, updated_at: datetime)]
(13.3ms) SHOW VARIABLES like 'max_allowed_packet';
Author Create Many Without Validations Or Callbacks (9.2ms) INSERT INTO `authors` (`name`,`email`,`created_at`,`updated_at`) VALUES ('Milan Kundera','milan.kundera@example.com','2018-09-11 11:33:07','2018-09-11 11:33:07') ON DUPLICATE KEY UPDATE `authors`.`email`=VALUES(`email`),`authors`.`updated_at`=VALUES(`updated_at`)
(154.6ms) COMMIT
=> true
irb(main)> puts JSON.parse(Author.all.to_json).to_yaml # Isn't there an easy way to get clean yaml...
---
- id: 7
name: Milan Kundera
email: milan.kundera@example.com
created_at: '2018-09-11T11:42:15.000Z'
updated_at: '2018-09-11T11:42:15.000Z'
irb(main)> data = [{ name: 'Milan Kundera', email: 'milan.kundera2@example.com' }, { name: 'Immanuel Kant', email: 'immanuel.kant@example.com' }]
irb(main)> CSVStepImporter::Loader.new(rows: data, processor_classes: [Author::ImportableModel]).save!
=> true
irb(main)> puts JSON.parse(Author.all.to_json).to_yaml # Isn't there an easy way to get clean yaml...
# NOTE: updated_at changed, but the id did not
---
- id: 7
name: Milan Kundera
email: milan.kundera2@example.com
created_at: '2018-09-11T11:42:15.000Z'
updated_at: '2018-09-11T12:19:17.000Z'
- id: 9
name: Immanuel Kant
email: immanuel.kant@example.com
created_at: '2018-09-11T12:19:17.000Z'
updated_at: '2018-09-11T12:19:17.000Z'
Import a CSV data file
rails c
irb(main)> File.open("authors.csv", "w") do |file|
file.write(<<~CSV)
Name,Email
Milan Kundera,milan.kundera@example.com
Immanuel Kant,immanuel.kant@example.com
CSV
end
irb(main)> CSVStepImporter::Loader.new(path: 'authors.csv', processor_classes: [Author::ImportableModel], csv_options: {file_encoding: "UTF-8"}).save
=> true
irb(main)> puts JSON.parse(Author.all.to_json).to_yaml
# NOTE: The email and updated_at is updated as specified in on_duplicate_key_update
---
- id: 7
name: Milan Kundera
email: milan.kundera@example.com
created_at: '2018-09-11T11:42:15.000Z'
updated_at: '2018-09-11T12:24:13.000Z'
- id: 9
name: Immanuel Kant
email: immanuel.kant@example.com
created_at: '2018-09-11T12:19:17.000Z'
updated_at: '2018-09-11T12:24:13.000Z'
Simple model
By default, for each row read from the CSV file, a DAO belonging to a model will be created.
These models will be validated and saved in the order specified by the processor_classes
option.
The simplest model is one, which simply calls save
on all DAOs, which calls internally create_or_update
.
create_or_update
is customizable.
Example:
This example will call find_or_create_by
for each row after all validations have passed.
class SimpleDAO < CSVStepImporter::Model::DAO
def create_or_update
Currency.find_or_create_by(name: row.name, code: row.code)
end
end
class SimpleModel < CSVStepImporter::Model::Model
def dao_class
SimpleDAO
end
end
CSVStepImporter::Loader.new(path: 'currencies.csv', processor_classes: [SimpleModel]).save
Development
After checking out the repo, run bin/setup
to install dependencies. Then, run rake spec
to run the tests. You can also run bin/console
for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run rake install
. To release a new version, update the version number in version.rb
, and then run rake release
, which will create a git tag for the version, push git commits and tags, and push the .gem
file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/fruwe/csv_step_importer. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the CSVStepImporter project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.