Project

to-arff

0.0
No commit activity in last 3 years
No release in over 3 years
ToARFF is a Ruby gem that converts an SQLite database file to an ARFF (Attribute-Relation File Format) file.
 Project Readme

ToARFF


Table of Contents

  • About
    • What is an ARFF File
  • Installation
  • Usage
    • Convert from an SQLite Database
  • Contributing
  • License

About

ToARFF is a Ruby library that converts SQLite database files to ARFF (Attribute-Relation File Format) files, the format used to specify datasets for WEKA, a machine learning and data mining tool.

What is an ARFF File:

This wiki describes it well:

"An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes. ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software."

Note: Converting from an SQLite database will generate one ARFF file per table. See this Stack Overflow post.
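
To make the format described above concrete, here is a minimal sketch that assembles an ARFF string from in-memory rows. It is only an illustration of the file layout (a relation name, attribute declarations, then comma-separated data rows). The helpers `quote_arff` and `to_arff` are hypothetical names invented for this example and are not part of the to-arff gem.

```ruby
# Illustrative only: these helpers are NOT part of the to-arff gem.

# Render one cell: numbers as-is, nil as "?" (ARFF's missing-value marker),
# everything else as a double-quoted string.
def quote_arff(value)
  return "?" if value.nil?
  return value.to_s if value.is_a?(Numeric)
  escaped = value.to_s.gsub('"') { '\"' } # escape embedded double quotes
  "\"#{escaped}\""
end

# Build the header (@RELATION, @ATTRIBUTE lines) followed by the @DATA rows.
# `attributes` maps attribute name => ARFF type; `rows` is an array of hashes.
def to_arff(relation, attributes, rows)
  lines = ["@RELATION #{relation}", ""]
  attributes.each { |name, type| lines << "@ATTRIBUTE #{name} #{type}" }
  lines << "" << "@DATA"
  rows.each do |row|
    lines << attributes.keys.map { |name| quote_arff(row[name]) }.join(",")
  end
  lines.join("\n") + "\n"
end

attributes = { "EmployeeId" => "NUMERIC", "LastName" => "STRING" }
rows = [
  { "EmployeeId" => 1, "LastName" => "Adams" },
  { "EmployeeId" => 2, "LastName" => "Edwards" }
]
puts to_arff("employees", attributes, rows)
```

The gem produces this kind of text for you from an SQLite table; the sketch is only meant to show what each part of the output means.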

Installation

Add this line to your application's Gemfile:

gem 'to-arff'

And then execute:

$ bundle

Or install it yourself as:

$ gem install to-arff

Usage

Convert from an SQLite Database

By Specifying Column Types (Recommended)

Use the convert() method and specify the column/attribute types as a JSON-style hash (or a nested hash).

require 'to-arff'
# Get the db file from https://github.com/dhrubomoy/to-arff/blob/master/spec/sample_db_files/sample2.db
sample = ToARFF::SQLiteDB.new "/path/to/sample2.db"
# Attribute names and types must be valid and specified as either json or nested hash
# eg. { "table1": {"column11": "NUMERIC",
#                    "column12": "STRING"
#                   },
#       "table2": {"column21": "class {Iris-setosa,Iris-versicolor,Iris-virginica}",
#                    "column22": "DATE \"yyyy-MM-dd HH:mm:ss\""
#                   }
#     }
# OR  { "table1" => {"column11"=>"NUMERIC",
#                    "column12"=>"STRING"
#                   },
#       "table2" => {"column21"=>"class {Iris-setosa,Iris-versicolor,Iris-virginica}",
#                    "column22"=>"DATE \"yyyy-MM-dd HH:mm:ss\""
#                   }
#     }
sample_column_types_param_json = {
                                    "employees": {
                                      "EmployeeId": "NUMERIC",
                                      "LastName": "STRING",
                                      "City": "STRING",
                                      "HireDate": "DATE \"yyyy-MM-dd HH:mm:ss\""
                                    },
                                    "albums": {
                                      "Albumid": "NUMERIC",
                                      "Title": "STRING"
                                    }
                                  }
sample_column_types_param_hash = { "employees" => {"EmployeeId"=>"NUMERIC",
                                              "LastName"=>"STRING",
                                              "City"=>"STRING",
                                              "HireDate"=>"DATE \"yyyy-MM-dd HH:mm:ss\""
                                             },
                                    "albums" => { "Albumid"=>"NUMERIC",
                                                  "Title"=>"STRING"
                                                }
                                  }
puts sample.convert column_types: sample_column_types_param_json
# OR
puts sample.convert column_types: sample_column_types_param_hash

Both calls produce a string similar to the following:

@RELATION employees

@ATTRIBUTE EmployeeId NUMERIC
@ATTRIBUTE LastName STRING
@ATTRIBUTE City STRING
@ATTRIBUTE HireDate DATE "yyyy-MM-dd HH:mm:ss"

@DATA
1,"Adams","Edmonton","2002-08-14 00:00:00"
2,"Edwards","Calgary","2002-05-01 00:00:00"
3,"Peacock","Calgary","2002-04-01 00:00:00"
...and so on...

@RELATION albums

@ATTRIBUTE Albumid NUMERIC
@ATTRIBUTE Title STRING

@DATA
1,"For Those About To Rock We Salute You"
2,"Balls to the Wall"
3,"Restless and Wild"
...and so on...
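
The string above holds several relations back to back, while the note in the About section says each table maps to its own ARFF file. A minimal sketch of one way to split such a string and write one file per relation follows. The helpers `split_relations` and `write_relations` are hypothetical and not part of the gem, and the name matching assumes unquoted relation names without spaces.

```ruby
require "tmpdir"

# Hypothetical helper (not part of to-arff): split a multi-relation string
# into { relation_name => arff_text }.
def split_relations(arff_text)
  # The lookahead keeps each "@RELATION" line attached to the chunk it starts.
  chunks = arff_text.split(/^(?=@RELATION\s)/i)
  chunks.each_with_object({}) do |chunk, result|
    name = chunk[/\A@RELATION\s+(\S+)/i, 1]
    next if name.nil? # skip anything that precedes the first @RELATION
    result[name] = chunk.strip + "\n"
  end
end

# Hypothetical helper: write each relation to "<dir>/<relation>.arff"
# and return the list of paths written.
def write_relations(arff_text, dir)
  split_relations(arff_text).map do |name, text|
    path = File.join(dir, "#{name}.arff")
    File.write(path, text)
    path
  end
end

# Stand-in for the string returned by convert.
combined = <<~ARFF
  @RELATION employees

  @ATTRIBUTE EmployeeId NUMERIC

  @DATA
  1

  @RELATION albums

  @ATTRIBUTE Albumid NUMERIC

  @DATA
  1
ARFF

Dir.mktmpdir do |dir|
  write_relations(combined, dir).each { |path| puts File.basename(path) }
end
```

In real use, `combined` would be the value returned by `sample.convert`, and `dir` would be whatever output directory you choose.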

By Specifying Column Names

require 'to-arff'
sample = ToARFF::SQLiteDB.new "/path/to/sample_sqlite.db"
# Column names must be specified like this:
# { "table1" => ["column11", "column12",...],
#   "table2" => ["column21", "column22",...]
# }
# OR
# { "table1": ["column11", "column12",...],
#   "table2": ["column21", "column22",...]
# }
sample_columns_json = { "albums": ["AlbumId", "Title", "ArtistId"],
                        "employees": ["EmployeeId", "LastName", "FirstName", "HireDate"]
                      }
sample_columns_hash = { "albums" => ["AlbumId", "Title", "ArtistId"],
                        "employees" => ["EmployeeId", "LastName", "FirstName", "HireDate"]
                      }
puts sample.convert columns: sample_columns_json
puts sample.convert columns: sample_columns_hash

Both the JSON-style and the hash form of the columns: parameter return a string similar to the following:

@RELATION albums

@ATTRIBUTE AlbumId NUMERIC
@ATTRIBUTE Title STRING
@ATTRIBUTE ArtistId NUMERIC

@DATA
1,"For Those About To Rock We Salute You",1
2,"Balls to the Wall",2
...and so on...



@RELATION employees

@ATTRIBUTE EmployeeId NUMERIC
@ATTRIBUTE LastName STRING
@ATTRIBUTE FirstName STRING
@ATTRIBUTE HireDate STRING

@DATA
1,"Adams","Andrew","2002-08-14 00:00:00"
2,"Edwards","Nancy","2002-05-01 00:00:00"
...and so on...

As you can see, the "HireDate" attribute did not get the correct datatype: it should be DATE "yyyy-MM-dd HH:mm:ss", not STRING. Specify column_types: when a column needs a particular type.
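
The recommended fix is to pass column_types: as shown earlier. If you already have a generated string and only need to correct one declaration, a post-processing step along these lines may be enough. This is a sketch under that assumption; `retype_attribute` is a hypothetical helper and not part of the gem.

```ruby
# Hypothetical helper (not part of to-arff): replace the declared type of
# every "@ATTRIBUTE <attribute> ..." line in an ARFF string.
def retype_attribute(arff_text, attribute, new_type)
  pattern = /^(@ATTRIBUTE\s+#{Regexp.escape(attribute)}\s+).*$/i
  # The block form inserts new_type literally, with no backslash processing.
  arff_text.gsub(pattern) { "#{Regexp.last_match(1)}#{new_type}" }
end

# Stand-in for part of the string returned by convert.
generated = <<~ARFF
  @RELATION employees

  @ATTRIBUTE EmployeeId NUMERIC
  @ATTRIBUTE HireDate STRING

  @DATA
  1,"2002-08-14 00:00:00"
ARFF

puts retype_attribute(generated, "HireDate", 'DATE "yyyy-MM-dd HH:mm:ss"')
```

Note that this rewrites every relation that declares an attribute with that name, so it is best suited to strings where the name is unique or should change everywhere.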

You can also do the following, but it might not generate the correct datatypes:
require 'to-arff'
sample = ToARFF::SQLiteDB.new "/path/to/sample_sqlite.db"
sample.convert tables: ["albums","employees"]
# OR
sample.convert
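
The usage examples above pass parameters in two styles: a quoted-colon literal ("json") and a hash-rocket literal. In Ruby these differ only in key type, which the sketch below demonstrates with the standard library alone. `stringify_keys` is a hypothetical helper for this example; whether the gem normalizes keys this way internally is not something this README confirms.

```ruby
require "json"

# Hypothetical helper (not part of to-arff): recursively convert hash keys
# to strings, leaving non-hash values untouched.
def stringify_keys(value)
  return value unless value.is_a?(Hash)
  value.each_with_object({}) { |(k, v), out| out[k.to_s] = stringify_keys(v) }
end

colon_style  = { "albums": { "Albumid": "NUMERIC", "Title": "STRING" } }
rocket_style = { "albums" => { "Albumid" => "NUMERIC", "Title" => "STRING" } }

p colon_style.keys.map(&:class)   # the quoted-colon literal yields Symbol keys
p rocket_style.keys.map(&:class)  # the hash-rocket literal yields String keys
p stringify_keys(colon_style) == rocket_style

# Real JSON text (for example, read from a file) parses to String keys by default.
json_text = '{"albums": {"Albumid": "NUMERIC", "Title": "STRING"}}'
p JSON.parse(json_text) == rocket_style
```

This means type definitions kept in a JSON file can be loaded with `JSON.parse` and used in the same nested-hash shape as the hash-rocket examples.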

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/dhrubomoy/to-arff. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

  1. Fork it ( https://github.com/dhrubomoy/to-arff/fork )
  2. Create branch (git checkout -b my-new-feature)
  3. Make changes. Add test cases for your changes
  4. Run rspec spec/ and make sure all the tests pass
  5. Commit your changes (git commit -am 'Add some feature')
  6. Push to the branch (git push origin my-new-feature)
  7. Create new Pull Request

License

The gem is available as open source under the terms of the MIT License.