0.02
No commit activity in last 3 years
No release in over 3 years
a plugin for Patriot Workflow Scheduler, which deal with Hadoop-related softwares.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies
 Project Readme

patriot-workflow-scheduler

Patriot-workflow-scheduler is a workflow scheduler that can manage batch jobs with complicated dependency relations. The workflow scheduler can deal with an extensible monolithic workflow. Since the various kinds of jobs can be consolidated into the monolithic workflow, various combinations of data integration can be flexibly achieved.

The scheduler offers an original DSL for batch configuration (PBC: Patriot Batch Configuration). The description of a complicated workflow can be simplified by the DSL. In addition, the DSL is designed to be extensible so that customized job types can be easily integrated.

Features

  • Execution management of batch jobs with complicated dependency
  • an extensible DSL for batch job configurations
  • a simple web console for jobs administration

Getting Started

Install

  • install
% gem install patriot-workflow-scheduler
% patriot-init ${install_dir}
  • update existing installation

gem install won't upgrade the js for web console. The patriot upgrade should be used to replace the js with new one.

% cd ${install_dir}
% ./bin/patriot upgrade

Execute a sample job

Command line tools are available for processing jobs defined in PBC. The tool can be triggered by the patriot script. The script takes three arguments: a tool name ('execute' for job execution), a target date of the jobs (in yyyy-MM-dd) and a path to the PBC file. In processing the PBC files, a variable '_date_' will be replaced with the target date.

% cd ${install_dir}
% cat batch/sample/daily/test.pbc
sh{
  name "test"
  commands "echo '#{_date_}' > /tmp/test.out"
}
% ./bin/patriot execute 2015-04-01 batch/sample/daily/test.pbc
% cat /tmp/test.out
2015-04-01

Dependency Configuration

Dependencies between jobs are defined through products. The products produced/required by jobs are configured by produce and require, respectively. A job becomes ready to be executed when all products required by the job are available. The products become available when all jobs which produce the product are finished.

To run jobs according to dependencies, the strict option should be passed to the execute tool.

% cat flow_sample.pbc
sh{
  require ['product1'] # run after 'product1' is created
  name "consumer"
  commands "echo 'this is a consumer'"
}
sh{
  produce ['product1'] # this job creates 'product1'
  name "producer"
  commands "echo 'this is a producer'"
}
% ./bin/patriot execute --strict 2015-04-01 flow_sample.pbc
# execute echo 'this is a producer'
# execute echo 'this is a consumer'

Build from Source

  • Compiling JSX

The Web console implemented in JSX (React.js) and compiled with Grunt. To compile the JSXs, install Nodejs and set up grunt.

% # install node (e.g., brew install node)
% npm install -g grunt-cli
% npm install
% grunt
( or "grunt --watch" to watchify, "grunt build" to uglify js files. )

-- Build gem and upgrade installation

gem build patriot-workflow-scheduler.gemspec
cd ${install_dir}
./bin/patriot upgrade

Generate documents

  • Github pages

The gh-pages branch should be managed as workdir at docs/build/html. The document can be generatd by Sphinx.

% git-new-workdir . docs/build/html gh-pages
% cd docs
% make html
  • API documents

You can easily generate API documents using yard.

% gem install yard
% yard doc lib plugins
% yard server

Now you can see docs at http://localhost:8808/docs/frames/index.

For more information

For understanding how to manage dependencies in a complicated workflow, the architecture of the scheduler, other command line tools, etc, please see the Github pages.

Requirements

  • Ruby ( >= 1.9 )
  • MySQL ( recommended for production use )

License

Copyright © CyberAgent, Inc. All Rights Reserved.

This package is released under Apache License Ver. 2.0. http://www.apache.org/licenses/LICENSE-2.0.txt