Triglav::Agent::Hdfs
Triglav Agent for Hdfs
Requirements
- JRuby >= 9.1.5.0
- Java >= 1.8.0_45
Prerequisites
- The HDFS path to be monitored must be created or modified atomically. To modify an HDFS path atomically, use one of the following strategies, for example:
  - Create a tmp directory and copy files into the directory, then move it to the target path
  - Create a marker file such as _SUCCESS after copying is done, and monitor the _SUCCESS file
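The first strategy can be sketched in Ruby as follows. This is only an illustration of the stage-then-rename pattern: it runs against the local filesystem as a stand-in for HDFS, and the helper name `publish_atomically` and the staging directory naming are assumptions, not part of this gem.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: stage files in a temporary directory, then rename
# the directory so the target path appears in a single step.
def publish_atomically(target_dir)
  staging_dir = "#{target_dir}.tmp.#{Process.pid}"
  FileUtils.mkdir_p(staging_dir)
  yield staging_dir                     # caller copies files into staging
  File.rename(staging_dir, target_dir)  # one rename publishes everything
  target_dir
end

Dir.mktmpdir do |root|
  target = File.join(root, '2016-12-01')
  publish_atomically(target) do |staging|
    File.write(File.join(staging, 'part-00000'), "data\n")
  end
  puts Dir.glob(File.join(target, '*')).map { |f| File.basename(f) }.inspect
end
```

A monitor that looks at the target path never observes a half-copied directory, because the path does not exist until the rename.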
Installation
Add this line to your application's Gemfile:
gem 'triglav-agent-hdfs'
And then execute:
$ bundle
Or install it yourself as:
$ gem install triglav-agent-hdfs
CLI
Usage: triglav-agent-hdfs [options]
-c, --config VALUE Config file (default: config.yml)
-s, --status VALUE Status storage file (default: status.yml)
-t, --token VALUE Triglav access token storage file (default: token.yml)
--dotenv Load environment variables from .env file (default: false)
-h, --help help
--log VALUE Log path (default: STDOUT)
--log-level VALUE Log level (default: info)
Run as:
TRIGLAV_ENV=development bundle exec triglav-agent-hdfs --dotenv -c config.yml
Configuration
Prepare config.yml, using example/config.yml as a reference.
You can use ERB templates in it. You may load environment variables from a .env file with the --dotenv option.
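As a rough illustration of how an ERB-templated YAML file resolves environment variables, the sketch below renders a small template with Ruby's standard `erb` and `yaml` libraries. The keys, the default URL, and the variable names `TRIGLAV_URL` and `MONITOR_INTERVAL` are assumptions for the example; see example/config.yml for the real layout.

```ruby
require 'erb'
require 'yaml'

# Hypothetical config fragment; keys and defaults are illustrative only.
template = <<~YAML
  triglav:
    url: <%= ENV.fetch('TRIGLAV_URL', 'http://localhost:7800') %>
  hdfs:
    monitor_interval: <%= ENV.fetch('MONITOR_INTERVAL', 60) %>
YAML

# Evaluate the ERB tags first, then parse the result as YAML.
def render_config(template)
  YAML.load(ERB.new(template).result)
end

config = render_config(template)
puts config['hdfs']['monitor_interval']
```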
serverengine section
You can specify any serverengine options in this section.
triglav section
Specify the Triglav API URL and a credential to authenticate with.
The access token obtained is stored in the token storage file (--token option).
hdfs section
This section is specific to triglav-agent-hdfs.
- monitor_interval: The interval in seconds to watch resources (number, default: 60)
- connection_info: Key-value pairs of HDFS connection info, where keys are resource URI patterns written as regular expressions and values are connection information
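The lookup implied by connection_info can be sketched as below: the first key whose regular expression matches a resource URI selects the connection information. The namespaces, the `config_files` value, and the helper name `find_connection_info` are hypothetical and only illustrate the matching, not the agent's actual implementation.

```ruby
# Hypothetical connection_info: keys are regular expressions over resource URIs.
CONNECTION_INFO = {
  '^hdfs://prod-ns/' => { 'config_files' => ['/etc/hadoop/conf/core-site.xml'] },
  '^hdfs://dev-ns/'  => { 'config_files' => ['/etc/hadoop-dev/conf/core-site.xml'] }
}.freeze

# Return the value of the first pattern that matches, or nil if none does.
def find_connection_info(resource_uri, connection_info = CONNECTION_INFO)
  _pattern, info = connection_info.find do |pattern, _|
    Regexp.new(pattern).match(resource_uri)
  end
  info
end

p find_connection_info('hdfs://dev-ns/logs/2016-12-01')
p find_connection_info('hdfs://unknown-ns/logs')
```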
Specification of Resource URI
A resource URI must be of the form:
hdfs://{namespace}/#{path}
Path accepts strftime format such as %Y-%m-%d.
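For instance, a path pattern containing strftime directives could be expanded for a given time with Ruby's `Time#strftime`. The namespace, path, and the helper name `expand_resource_uri` are made up for the example, and how the agent chooses the time to expand is not shown here.

```ruby
# Hypothetical helper: expand strftime directives in a resource URI.
def expand_resource_uri(uri_pattern, time)
  time.strftime(uri_pattern)
end

puts expand_resource_uri('hdfs://my-namespace/logs/%Y-%m-%d', Time.utc(2016, 12, 1))
# prints "hdfs://my-namespace/logs/2016-12-01"
```

Because the whole URI is passed through strftime in this sketch, a literal percent sign in a path would have to be written as %%.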
How it behaves
- Authenticate with triglav
  - Store the access token into the token storage file
  - Read the token from the token storage file next time
  - Refresh the access token if it is expired
- Repeat the following every monitor_interval seconds:
  - Obtain resource (table) lists of the specified prefix (keys of connection_info) from triglav.
  - Connect to hdfs with an appropriate connection info for a resource uri, and find resources that are newer than the last check.
  - Store checking information into the status storage file for the next check.
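The check-and-store step of the loop above can be sketched as follows. This is a simplified model under stated assumptions: modification times are passed in as a plain hash instead of being read from HDFS, and the status file layout (path mapped to last seen modification time) and the names `newer_paths` and `check_once` are hypothetical, not the agent's real format or API.

```ruby
require 'yaml'
require 'tmpdir'

# Paths whose modification time is newer than the one recorded last time.
def newer_paths(mtimes, last_checked)
  mtimes.select { |path, mtime| mtime > last_checked.fetch(path, 0) }.keys
end

# One monitoring pass: load status, detect updates, store status for next time.
def check_once(status_file, mtimes)
  status = (File.exist?(status_file) && YAML.load_file(status_file)) || {}
  updated = newer_paths(mtimes, status)
  File.write(status_file, YAML.dump(status.merge(mtimes)))
  updated
end

Dir.mktmpdir do |dir|
  status_file = File.join(dir, 'status.yml')
  p check_once(status_file, '/logs/2016-12-01' => 100)
  p check_once(status_file, '/logs/2016-12-01' => 100)
end
```

The second pass reports nothing because the stored modification time has not changed, which is the role the status storage file plays between runs.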
Development
Prepare
bundle
bundle exec rake vendor_jars
./prepare.sh
Edit the .env file or the config.yml file directly.
Start
Start up the Triglav API on localhost.
Run triglav-agent-hdfs as:
TRIGLAV_ENV=development bundle exec triglav-agent-hdfs --dotenv --debug -c example/config.yml
Debug mode, enabled with the --debug option, ignores the last_modification_time value in the status file.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/triglav-agent-hdfs/triglav-agent-hdfs. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
The gem is available as open source under the terms of the MIT License.
ToDo
- prepare mocks of both triglav and hdfs for tests