hadoop-papyrus
Run Ruby DSL scripts on your Hadoop cluster.
Description
You can write a DSL script in Ruby that runs on Hadoop as a Mapper / Reducer. This gem depends on the ‘jruby-on-hadoop’ project.
Install
Required gems are all on GemCutter.
- Upgrade RubyGems to 1.3.5.
- Install the gems:
    $ gem install hadoop-papyrus
Usage
- Run a Hadoop cluster on your machines, and either add the ‘hadoop’ executable to your PATH or set the HADOOP_HOME environment variable.
- Put your input files into HDFS, e.g. wc/inputs/file1.
- Run ‘papyrus’ as follows:
    $ papyrus examples/word_count_test.rb
The Hadoop job results are written to wc/outputs/part-* in HDFS.
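The setup step above depends on the ‘hadoop’ executable being reachable through HADOOP_HOME or PATH. The following is a minimal sketch of how that could be checked before running a job. The helper name find_hadoop is hypothetical and not part of hadoop-papyrus, and checking HADOOP_HOME/bin/hadoop before PATH is an assumption about lookup order, not documented gem behaviour.

```ruby
# Hypothetical helper, NOT part of hadoop-papyrus: look for the 'hadoop'
# executable under HADOOP_HOME/bin first, then in each PATH directory.
# The lookup order is an assumption made for this sketch.
def find_hadoop(env = ENV)
  candidates = []

  home = env['HADOOP_HOME']
  candidates << File.join(home, 'bin', 'hadoop') if home && !home.empty?

  env['PATH'].to_s.split(File::PATH_SEPARATOR).each do |dir|
    candidates << File.join(dir, 'hadoop') unless dir.empty?
  end

  # Return the first candidate that is an executable file, or nil.
  candidates.find { |path| File.file?(path) && File.executable?(path) }
end

path = find_hadoop
puts(path ? "hadoop found at #{path}" : 'hadoop not found; set HADOOP_HOME or PATH')
```

The helper takes the environment as a parameter so it can be exercised with a plain Hash instead of the real ENV.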
Examples
Word Count DSL script
dsl 'WordCount'

from 'wc/inputs'
to 'wc/outputs'

count_uniq
total :bytes, :words, :lines
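To give a feel for what the Word Count script computes, here is a plain-Ruby sketch that runs locally without Hadoop or the gem. It assumes that count_uniq counts occurrences of each distinct word and that total sums bytes, words and lines over the input. Both are readings of the DSL keywords rather than confirmed gem behaviour, and the word_count helper and whitespace splitting are illustrative choices.

```ruby
# Plain-Ruby illustration only; it does not call Hadoop or hadoop-papyrus.
# Assumptions: words are separated by whitespace, count_uniq tallies each
# distinct word, and total sums bytes, words and lines.
def word_count(lines)
  counts = Hash.new(0)
  totals = { bytes: 0, words: 0, lines: 0 }

  lines.each do |line|
    words = line.split
    words.each { |word| counts[word] += 1 } # assumed meaning of count_uniq
    totals[:bytes] += line.bytesize         # assumed meaning of total :bytes
    totals[:words] += words.size            # assumed meaning of total :words
    totals[:lines] += 1                     # assumed meaning of total :lines
  end

  [counts, totals]
end

counts, totals = word_count(['hello world', 'hello hadoop'])
puts counts.inspect
puts totals.inspect
```

On a real cluster the same aggregation would be split across mappers and reducers, and the result would appear in the wc/outputs/part-* files.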
Log Analysis DSL script
dsl 'LogAnalysis'

data 'apache log on test2' do
  from 'apachelog/inputs'
  to 'apachelog/outputs'

  each_line do
    pattern /(.*) (.*) (.*) \[(.*)\] (".*") (\d*) (\d*) (.*) "(.*)"/
    column_name 'remote_host', 'pass', 'user', 'access_date', 'request', 'status', 'bytes', 'pass', 'ua'

    topic 'ua counts', :label => 'ua' do
      count_uniq column[:ua]
    end
  end
end
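The pattern and column names in the Log Analysis script can be tried out locally. The sketch below applies the same regular expression to a few made-up log lines, pairs each capture group with its column name, and counts lines per user agent, roughly what the ‘ua counts’ topic expresses. The sample lines and the parse_line helper are hypothetical, and treating 'pass' as "skip this column" is an assumption about the intent of that name.

```ruby
# Plain-Ruby illustration only; it does not call Hadoop or hadoop-papyrus.
PATTERN = /(.*) (.*) (.*) \[(.*)\] (".*") (\d*) (\d*) (.*) "(.*)"/
COLUMN_NAMES = %w[remote_host pass user access_date request status bytes pass ua]

# Pair each capture group with its column name.
# Assumption: columns named 'pass' are meant to be ignored.
def parse_line(line)
  match = PATTERN.match(line)
  return nil unless match

  COLUMN_NAMES.zip(match.captures).reject { |name, _| name == 'pass' }.to_h
end

# Made-up sample data in Apache combined log format.
LOG_LINES = [
  '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "http://example.com/start.html" "Mozilla/4.08"',
  '127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /a.png HTTP/1.0" 200 512 "-" "Mozilla/4.08"',
  '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET / HTTP/1.0" 404 0 "-" "curl/7.0"'
].freeze

# Count lines per user agent, similar in spirit to count_uniq column[:ua].
ua_counts = Hash.new(0)
LOG_LINES.each do |line|
  row = parse_line(line)
  ua_counts[row['ua']] += 1 if row
end

puts ua_counts.inspect
```

Note that the request capture keeps its surrounding double quotes because they sit inside the capture group, while the user agent capture does not.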
Run spec
Set HADOOP_HOME in your environment and run ‘jruby -S rake spec’.
Author
Koichi Fujikawa <fujibee@gmail.com>
Copyright
License: Apache License