Project

loadmop

0.0
No release in over 3 years
loadmop assits in loading OMOP Vocabulary files and OMOP CDMv4-compatible CSV files into your database
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

~> 1.6

Runtime

~> 10.3
>= 0.0.5
~> 0.19
 Project Readme

Loadmop

Helps load up the OMOP Vocabulary/CDM into the database of your choice.

Requirements

  • Ruby 2.0+
  • Bundler
  • Some sort of RDBMS to store OMOP Vocabulary files in

Installation

Add this line to your application's Gemfile:

gem 'loadmop'

And then execute:

bundle

Or install it yourself as:

gem install loadmop

Usage

Run bundle exec loadmop with no arguments to get a help screen and figure out which command you want to run

Preparation

Create a .env file in the root directory of the clone, specifying:

  • SEQUELIZER_DATABASE
    • The name of the database you want to install to
    • Required
  • SEQUELIZER_ADAPTER
    • The name of the adapter to use (e.g. postgres, sqlite, mysql, oracle)
    • Required
  • SEQUELIZER_HOST
    • The host on which the database lives
    • Optional
  • SEQUELIZER_USERNAME
    • The username to connect to the database with
    • Optional
  • SEQUELIZER_PASSWORD
    • The password to connect to the database with
    • Optional
  • SEQUELIZER_SEARCH_PATH
    • At least for PostgreSQL, specifies a schema to install into
    • Optional
    • When using PostgreSQL, make sure to use the following entry in loadmop's .env file to ensure the data is put in the correct schema:
      • SEQUELIZER_SEARCH_PATH=vocabulary,public

See the Sequelizer Gem for some .env examples

Then:

  • Create the database you just specified in your .env file
    • loadmop isn't (yet) cool enough to actually create the database for you
    • If you're using SQLite, you don't have to create the database file
    • If your database defaults to using case-insensitive storage of text, (I'm looking at you MySQL and SQL Server), make sure to set a case-sensitive collation on your database
  • cd into a directory where you've defined a config/database.yml or .env file that is compatible with Sequelizer
  • run bundle install to make sure you have all the needed dependencies installed
  • run bundle exec sequelizer config to ensure your connection parameters are correctly set
  • run bundle exec sequelizer update_gemfile to ensure your Gemfile has the right database gem

Loading Vocabulary Files

  • Download the OMOP Vocabulary Files and unzip them to some directory.
    • In addition to the default selected vocabularies, also select vocabulary 34 (ICD-10)
    • After downloading the zip file with the vocabularies via the link sent via email, run the java program to download the CPT codes, which can take about an hour.
      • There are ~18,000 CPT codes to download one-at-a-time
    • Also, if on a case sensitive file system, add concept.csv to CONCEPT.csv: cat concept.csv >> CONCEPT.csv
    • Unfortunately, this cannot be made significantly easier as the restricted vocabulary data can not be reformatted and redistributed due to the licensing agreement.
  • Run bundle exec loadmop create_vocab_database <database_name> /absolute/path/to/directory/holding/unzipped/vocabulary/files
    • This runs all the steps for setting up the vocabulary database, namely
      • Creating the proper tables
      • Prepping the CSV files to load into the database
      • Loading the CSV files into the database
      • Adding some useful indexes to the vocabulary tables

Loading CDM Data

  • Run bundle exec loadmop create_cdmv4_data /path/to/directory/holding/cdm/data/files
    • This runs all the steps for loading CDM data into a database, namely
      • Creating the proper CDM tables
      • Prepping the CSV files to load into the database
      • Loading the CSV files into the database
      • Adding some useful indexes to the CDM tables

Pleas for Help

I've written methods to quickly load the data into PostgreSQL and SQLite, but I don't regularly use many other RDBMSs. Right now they use a method that should work for all RDBMSs, but is pretty slow.

Some of the faster methods are Unix-only as well. If there are fast, platform-independent ways to load the data, I'm interested.

Please submit suggestions or pull requests to speed up loading under other RDBMSs and I'll incorporate them. Thanks!

Contributing

  1. Fork it https://github.com/outcomesinsights/loadmop/fork
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Thanks