Loadmop

Helps load up the OMOP Vocabulary/CDM into the database of your choice.

Requirements

Ruby 2.0+
Bundler
Some sort of RDBMS to store OMOP Vocabulary files in

Installation

Add this line to your application's Gemfile:

gem 'loadmop'

And then execute:

bundle

Or install it yourself as:

gem install loadmop

Usage

Run bundle exec loadmop with no arguments to get a help screen and figure out which command you want to run

Preparation

Create a .env file in the root directory of the clone, specifying:

SEQUELIZER_DATABASE
- The name of the database you want to install to
- Required
SEQUELIZER_ADAPTER
- The name of the adapter to use (e.g. postgres, sqlite, mysql, oracle)
- Required
SEQUELIZER_HOST
- The host on which the database lives
- Optional
SEQUELIZER_USERNAME
- The username to connect to the database with
- Optional
SEQUELIZER_PASSWORD
- The password to connect to the database with
- Optional
SEQUELIZER_SEARCH_PATH
- At least for PostgreSQL, specifies a schema to install into
- Optional
- When using PostgreSQL, make sure to use the following entry in loadmop's .env file to ensure the data is put in the correct schema:
  - SEQUELIZER_SEARCH_PATH=vocabulary,public

See the Sequelizer Gem for some .env examples

Then:

Create the database you just specified in your .env file
- loadmop isn't (yet) cool enough to actually create the database for you
- If you're using SQLite, you don't have to create the database file
- If your database defaults to using case-insensitive storage of text, (I'm looking at you MySQL and SQL Server), make sure to set a case-sensitive collation on your database
cd into a directory where you've defined a config/database.yml or .env file that is compatible with Sequelizer
run bundle install to make sure you have all the needed dependencies installed
run bundle exec sequelizer config to ensure your connection parameters are correctly set
run bundle exec sequelizer update_gemfile to ensure your Gemfile has the right database gem

Loading Vocabulary Files

Download the OMOP Vocabulary Files and unzip them to some directory.
- In addition to the default selected vocabularies, also select vocabulary 34 (ICD-10)
- After downloading the zip file with the vocabularies via the link sent via email, run the java program to download the CPT codes, which can take about an hour.
  - There are ~18,000 CPT codes to download one-at-a-time
- Also, if on a case sensitive file system, add concept.csv to CONCEPT.csv: cat concept.csv >> CONCEPT.csv
- Unfortunately, this cannot be made significantly easier as the restricted vocabulary data can not be reformatted and redistributed due to the licensing agreement.
Run bundle exec loadmop create_vocab_database <database_name> /absolute/path/to/directory/holding/unzipped/vocabulary/files
- This runs all the steps for setting up the vocabulary database, namely
  - Creating the proper tables
  - Prepping the CSV files to load into the database
  - Loading the CSV files into the database
  - Adding some useful indexes to the vocabulary tables

Loading CDM Data

Run bundle exec loadmop create_cdmv4_data /path/to/directory/holding/cdm/data/files
- This runs all the steps for loading CDM data into a database, namely
  - Creating the proper CDM tables
  - Prepping the CSV files to load into the database
  - Loading the CSV files into the database
  - Adding some useful indexes to the CDM tables

Pleas for Help

I've written methods to quickly load the data into PostgreSQL and SQLite, but I don't regularly use many other RDBMSs. Right now they use a method that should work for all RDBMSs, but is pretty slow.

Some of the faster methods are Unix-only as well. If there are fast, platform-independent ways to load the data, I'm interested.

Please submit suggestions or pull requests to speed up loading under other RDBMSs and I'll incorporate them. Thanks!

Contributing

Fork it https://github.com/outcomesinsights/loadmop/fork
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create a new Pull Request

Thanks

Outcomes Insights, Inc.
- Many thanks for allowing me to release a portion of my work as Open Source Software!
IMEDS/OMOP/OHDSI
- For developing the vocabulary files. What a fantastic resource!
OMOP/OHDSI
- For developing their CDM

loadmop

Development

Runtime