Process Structured CSV files (structured_csv)
Purpose
The structured_csv_to_yaml script converts a "Structured CSV" file into a YAML file.
When you have data whose structure is not yet defined, it is useful to manage it in a CSV file, which can be viewed and edited in a CSV editor such as Excel.
This is especially useful when developing a normalized structure for such data, because you can check that all of the existing data actually fits the defined structure.
Ultimately, the data is meant to be exported to a YAML file.
This script supports UTF-8 CSV files.
Note: This was originally developed to create over 50 normalized data models for ITU Operational Bulletin data. See https://github.com/ituob/ for more details.
Installation
Add this line to your application's Gemfile:
gem 'structured_csv'
and then run:
bundle install
Or install it without a Gemfile:
gem install structured_csv
Usage
$ structured_csv_to_yaml [input-file.csv]
where input-file.csv is the input CSV file; the output will be named input-file.yaml.
Details
A Structured CSV file has these properties:
Two structured sections. A section starts at a row whose first column holds the section name and which is otherwise empty; that row must be either the first row of the file or preceded by an empty row. Two section types are allowed: METADATA and DATA.
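The section rules above can be sketched in Ruby. This is a minimal illustration, not the gem's implementation: it assumes rows come from the standard library's CSV.parse, treats a row as empty when every cell is nil or blank, and keeps any extra cells on the section row (such as type=array, described further on) as options. The helpers split_sections and blank_row? are hypothetical names.

```ruby
require "csv"

# Section names recognized by this sketch.
SECTION_TYPES = %w[METADATA DATA].freeze

# A row counts as empty when every cell is nil or whitespace only.
def blank_row?(row)
  row.all? { |cell| cell.nil? || cell.strip.empty? }
end

# Hypothetical helper: group parsed CSV rows into sections. A section starts
# on the first row, or on a row preceded by an empty row, whose first cell is
# a known section name. Following non-empty rows belong to that section.
def split_sections(rows)
  sections = []
  previous_blank = true # the position before the first row counts as blank
  rows.each do |row|
    if blank_row?(row)
      previous_blank = true
      next
    end
    if previous_blank && SECTION_TYPES.include?(row.first)
      # Keep any extra non-empty cells (e.g. "type=array") as options.
      options = row.drop(1).compact.reject { |cell| cell.strip.empty? }
      sections << { name: row.first, options: options, rows: [] }
    elsif sections.any?
      sections.last[:rows] << row
    end
    previous_blank = false
  end
  sections
end

csv = <<~CSV
  METADATA,
  locale.bar.en,beef

  DATA,,,
  foo.bar.en,foo.bar.fr,foo.bar.jp,description
  beef,boeuf,牛肉,Yummy!
CSV

split_sections(CSV.parse(csv)).each do |section|
  puts "#{section[:name]}: #{section[:rows].length} row(s)"
end
# prints:
# METADATA: 1 row(s)
# DATA: 2 row(s)
```

Note that the DATA section's rows include its header row; separating the header is left to the conversion step.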
The METADATA section has values organized as key-value pairs:
- Column 1 is the name of the key
- Column 2 is the value
The key can be a normal string or namespaced:
- foobar maps to the YAML key foobar:
- foo.bar.boo maps to the YAML structure:
    foo:
      bar:
        boo:
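A namespaced key can be expanded into nested hashes by splitting it on dots. The sketch below assumes plain "." separators with no escaping; set_nested is a hypothetical helper, and it would raise an error if a prefix such as foo had already been assigned a non-hash value.

```ruby
require "yaml"

# Hypothetical helper: assign value at the position named by a dotted key,
# creating intermediate hashes as needed.
def set_nested(hash, dotted_key, value)
  *parents, leaf = dotted_key.split(".")
  target = parents.reduce(hash) { |h, part| h[part] ||= {} }
  target[leaf] = value
  hash
end

h = {}
set_nested(h, "foobar", 1)      # plain key: stays at the top level
set_nested(h, "foo.bar.boo", 2) # namespaced key: nested three levels deep
puts h.to_yaml
# prints:
# ---
# foobar: 1
# foo:
#   bar:
#     boo: 2
```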
A typical YAML output looks like this:
---
metadata:
  locale:
    bar:
      en: beef
      fr: boeuf
      jp: 牛肉
data:
  foo:
    bar:
...
A sample METADATA section looks like this table:
| METADATA | |
| locale.bar.en | beef |
| locale.bar.fr | boeuf |
| locale.bar.jp | 牛肉 |
And generates this YAML:
---
metadata:
  locale:
    bar:
      en: beef
      fr: boeuf
      jp: 牛肉
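The METADATA conversion can be sketched as folding each key-value row into one nested hash under a top-level metadata key. This assumes the rows arrive as [key, value] arrays with the METADATA section row already removed; metadata_to_hash is a hypothetical name, not part of the gem.

```ruby
require "yaml"

# Hypothetical helper: fold [key, value] rows into nested hashes, expanding
# dotted keys such as "locale.bar.en" into locale -> bar -> en.
def metadata_to_hash(rows)
  metadata = {}
  rows.each do |key, value|
    *parents, leaf = key.split(".")
    target = parents.reduce(metadata) { |h, part| h[part] ||= {} }
    target[leaf] = value
  end
  { "metadata" => metadata }
end

rows = [
  ["locale.bar.en", "beef"],
  ["locale.bar.fr", "boeuf"],
  ["locale.bar.jp", "牛肉"]
]
puts metadata_to_hash(rows).to_yaml
```

The printed YAML should have the same structure as the example above, although Psych's exact formatting (for example, quoting) is not guaranteed to match character for character.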
The DATA section has values organized in table form. The first row after the section name is the header row, and the first column is assumed to be the key.
A sample DATA section looks like this table:
| DATA | | | |
| foo.bar.en | foo.bar.fr | foo.bar.jp | description |
| beef | boeuf | 牛肉 | Yummy! |
| pork | porc | 豚肉 | Delicious! |
By default, this table generates this YAML format:
---
data:
  beef:
    foo:
      bar:
        en: beef
        fr: boeuf
        jp: 牛肉
    description: Yummy!
  pork:
    foo:
      bar:
        en: pork
        fr: porc
        jp: 豚肉
    description: Delicious!
...
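The default keyed conversion can be sketched as follows: every cell is stored in the record under its (possibly namespaced) header, and the record is filed under the value of its first column. This assumes the header row and data rows have already been separated from the DATA section row; data_to_keyed_hash and set_nested are hypothetical names, and rows with a duplicate first-column value would silently overwrite earlier ones in this sketch.

```ruby
require "yaml"

# Hypothetical helper: assign value at the position named by a dotted key.
def set_nested(hash, dotted_key, value)
  *parents, leaf = dotted_key.split(".")
  target = parents.reduce(hash) { |h, part| h[part] ||= {} }
  target[leaf] = value
end

# Build { "data" => { key => record } }, where key is the row's first cell.
# The first column is also kept inside the record, as in the example above.
def data_to_keyed_hash(header, rows)
  data = {}
  rows.each do |row|
    record = {}
    header.zip(row) { |column, cell| set_nested(record, column, cell) }
    data[row.first] = record
  end
  { "data" => data }
end

header = ["foo.bar.en", "foo.bar.fr", "foo.bar.jp", "description"]
rows = [
  ["beef", "boeuf", "牛肉", "Yummy!"],
  ["pork", "porc", "豚肉", "Delicious!"]
]
puts data_to_keyed_hash(header, rows).to_yaml
```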
When the data has no key, specify type=array in the second column of the DATA row to generate an array:
| DATA | type=array | | |
| foo.bar.en | foo.bar.fr | foo.bar.jp | description |
| beef | boeuf | 牛肉 | Yummy! |
| pork | porc | 豚肉 | Delicious! |
This generates:
---
data:
- foo:
    bar:
      en: beef
      fr: boeuf
      jp: 牛肉
  description: Yummy!
- foo:
    bar:
      en: pork
      fr: porc
      jp: 豚肉
  description: Delicious!
...
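The difference between the two layouts can be sketched as a single switch: records are built the same way, then either returned as a list (type=array) or keyed by their first cell (the default). Passing the section-row options as an array of strings is an assumption of this sketch, and build_data_value is a hypothetical name.

```ruby
# Hypothetical helper: assign value at the position named by a dotted key.
def set_nested(hash, dotted_key, value)
  *parents, leaf = dotted_key.split(".")
  target = parents.reduce(hash) { |h, part| h[part] ||= {} }
  target[leaf] = value
end

# Build the value stored under "data": an Array when the DATA row carries
# "type=array", otherwise a Hash keyed by each row's first cell.
def build_data_value(header, rows, options = [])
  records = rows.map do |row|
    record = {}
    header.zip(row) { |column, cell| set_nested(record, column, cell) }
    record
  end
  return records if options.include?("type=array")

  rows.map(&:first).zip(records).to_h
end

header = ["foo.bar.en", "description"]
rows = [["beef", "Yummy!"], ["pork", "Delicious!"]]
p build_data_value(header, rows, ["type=array"]).class # prints Array
p build_data_value(header, rows).class                 # prints Hash
```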
You can also specify the data type of a column by appending it in square brackets to the column header. The types string, boolean and integer are supported.
| DATA | | | |
| foo.bar.en[string] | foo.bar.fr[string] | yummy[boolean] | availability[integer] |
| beef | boeuf | TRUE | 3 |
| pork | porc | FALSE | 10 |
This generates:
---
data:
  beef:
    foo:
      bar:
        en: beef
        fr: boeuf
    yummy: true
    availability: 3
  pork:
    foo:
      bar:
        en: pork
        fr: porc
    yummy: false
    availability: 10
...
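Typed headers can be sketched as two steps: strip the bracketed suffix from the header to get the key and type, then cast each cell. The casting rules here are assumptions, not necessarily what structured_csv does: booleans accept TRUE/FALSE in any case, integers are parsed in base 10 (so "010" is ten, not octal), headers without a recognized suffix are treated as strings, and parse_header and cast are hypothetical names.

```ruby
# Matches headers such as "yummy[boolean]" and captures name and type.
TYPE_SUFFIX = /\A(?<name>.+)\[(?<type>string|boolean|integer)\]\z/

# Split a header into [key, type]; untyped headers default to "string".
def parse_header(column)
  m = TYPE_SUFFIX.match(column)
  m ? [m[:name], m[:type]] : [column, "string"]
end

# Convert a raw CSV cell to the declared type (nil cells stay nil).
def cast(cell, type)
  return nil if cell.nil?

  case type
  when "boolean"
    case cell.strip.downcase
    when "true" then true
    when "false" then false
    else raise ArgumentError, "not a boolean: #{cell.inspect}"
    end
  when "integer" then Integer(cell.strip, 10)
  else cell
  end
end

p parse_header("availability[integer]") # prints ["availability", "integer"]
p cast("TRUE", "boolean")               # prints true
p cast("10", "integer")                 # prints 10
```

Raising on an unrecognized boolean or integer value, rather than passing it through as a string, is a deliberate choice in this sketch: it surfaces cells that do not fit the declared structure, which is the point of normalizing the data.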
Examples
The samples/ folder contains a number of complex examples.