Loadmop
Helps load up the OMOP Vocabulary/CDM into the database of your choice.
Requirements
- Ruby 2.0+
- Bundler
- Some sort of RDBMS to store OMOP Vocabulary files in
Installation
Add this line to your application's Gemfile:
gem 'loadmop'
And then execute:
bundle
Or install it yourself as:
gem install loadmop
Usage
Run bundle exec loadmop
with no arguments to get a help screen and figure out which command you want to run
Preparation
Create a .env file in the root directory of the clone, specifying:
- SEQUELIZER_DATABASE
- The name of the database you want to install to
- Required
- SEQUELIZER_ADAPTER
- The name of the adapter to use (e.g. postgres, sqlite, mysql, oracle)
- Required
- SEQUELIZER_HOST
- The host on which the database lives
- Optional
- SEQUELIZER_USERNAME
- The username to connect to the database with
- Optional
- SEQUELIZER_PASSWORD
- The password to connect to the database with
- Optional
- SEQUELIZER_SEARCH_PATH
- At least for PostgreSQL, specifies a schema to install into
- Optional
-
When using PostgreSQL, make sure to use the following entry in loadmop's .env file to ensure the data is put in the correct schema:
- SEQUELIZER_SEARCH_PATH=vocabulary,public
See the Sequelizer Gem for some .env examples
Then:
-
Create the database you just specified in your .env file
- loadmop isn't (yet) cool enough to actually create the database for you
- If you're using SQLite, you don't have to create the database file
- If your database defaults to using case-insensitive storage of text, (I'm looking at you MySQL and SQL Server), make sure to set a case-sensitive collation on your database
- cd into a directory where you've defined a config/database.yml or .env file that is compatible with Sequelizer
- run
bundle install
to make sure you have all the needed dependencies installed - run
bundle exec sequelizer config
to ensure your connection parameters are correctly set - run
bundle exec sequelizer update_gemfile
to ensure your Gemfile has the right database gem
Loading Vocabulary Files
- Download the OMOP Vocabulary Files and unzip them to some directory.
- In addition to the default selected vocabularies, also select vocabulary 34 (ICD-10)
- After downloading the zip file with the vocabularies via the link sent via email, run the java program to download the CPT codes, which can take about an hour.
- There are ~18,000 CPT codes to download one-at-a-time
- Also, if on a case sensitive file system, add concept.csv to CONCEPT.csv:
cat concept.csv >> CONCEPT.csv
- Unfortunately, this cannot be made significantly easier as the restricted vocabulary data can not be reformatted and redistributed due to the licensing agreement.
- Run
bundle exec loadmop create_vocab_database <database_name> /absolute/path/to/directory/holding/unzipped/vocabulary/files
- This runs all the steps for setting up the vocabulary database, namely
- Creating the proper tables
- Prepping the CSV files to load into the database
- Loading the CSV files into the database
- Adding some useful indexes to the vocabulary tables
- This runs all the steps for setting up the vocabulary database, namely
Loading CDM Data
- Run
bundle exec loadmop create_cdmv4_data /path/to/directory/holding/cdm/data/files
- This runs all the steps for loading CDM data into a database, namely
- Creating the proper CDM tables
- Prepping the CSV files to load into the database
- Loading the CSV files into the database
- Adding some useful indexes to the CDM tables
- This runs all the steps for loading CDM data into a database, namely
Pleas for Help
I've written methods to quickly load the data into PostgreSQL and SQLite, but I don't regularly use many other RDBMSs. Right now they use a method that should work for all RDBMSs, but is pretty slow.
Some of the faster methods are Unix-only as well. If there are fast, platform-independent ways to load the data, I'm interested.
Please submit suggestions or pull requests to speed up loading under other RDBMSs and I'll incorporate them. Thanks!
Contributing
- Fork it https://github.com/outcomesinsights/loadmop/fork
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create a new Pull Request