Overview
We provide some utilities for validating data in a PostgreSQL data table. These utilities can be used as simple terminal commands and can be installed by:
gem install idata
idata comes along with the following commands:
- iload
- ivalidate
- ipatch
- ieval
- iexport
- imerge
- isanitize
Run a command with --help
switch for the details
Prequisites:
- PostgreSQL 9.0 or above
- Ruby 2.0 or above
- An auto ID field is required for data table to be validated using ivalidate
Usage
Suppose we have an items
table, and we want to validate its records against certain criteria like:
-
vendor_code
must not be null -
vendor_name
must not be null - The composite
[vendor_code, vendor_name]
must be unique - One
vendor_code
corresponds to only ONEvendor_name
(in other words, there must not be two items with differentvendor_name
but with the samevendor_code
) and vice-versa -
vendor_code
must reference thecode
column in thevendors
table
Then the validation command could be:
ivalidate --host=localhost --user=postgres --database=mydb --table=items
--log-to=validation_errors \
--not-null="vendor_code" \
--not-null="vendor_name" \
--unique="vendor_code,vendor_name" \
--consistent-by="vendor_code|vendor_name" \
--consistent-by="vendor_name|vendor_code" \
--cross-reference="vendor_code|vendors.code"
Validation results for every single record are logged to an additional column named validation_errors
of the items
table, as specified by the --log-to
switch. As you can see, most common checks can be performed using the supported switches:
--not-null
--unique
--consistent-by
--cross-reference
Custom Validation
For more customized checks, we support some other switches.
The --match="field/pattern/"
switch tells the program to check if value of a field
matches the provided pattern
(which is a regular expression).
For example:
# Check if item_id is a number:
ivalidate --match="item_id/[0-9]+/"
# Check if value of status is either 'A' or 'I' (any other value is not allowed)
ivalidate --match="status/^(A|I)$/"
In case you need even more customized validation other than the supported ones (match
, unique
, not-null
, cross-reference
...)
then the --query
switch may come in handy. For example:
ivalidate --query="start_date >= string_to_date('01/02/2014') -- invalid date"
You can also use --rquery
which is the reversed counterpart of --query
For example, the following two checks are equivalent, mark any record whose start_date < '01/02/2014'
as "invalid date":
ivalidate --query="start_date >= string_to_date('01/02/2014') -- invalid date"
ivalidate --rquery="start_date < string_to_date('01/02/2014') -- invalid date"
Note: run ivalidate --help
to see the full list of supported switches
Put it all together
You can put several ivalidate
commands (for several data tables) in one single bash/sh file.
Besides ivalidate
, we also support some other utilities to:
- Load data from text files to SQL tables
- Modify data tables
- Generate summary reports
See our sample.sh
for a comprehensive example