Skywatch (alpha)
A simple alerting system that lets you define checks and alerts in any language that are then magically run on Heroku. Nagios can go cry in a corner.
NoOps! Polyglot! Free monitoring of anything!
Installation
$ gem install skywatch
Usage / Quickstart
It's a fairly powerful tool. Run skywatch --help
to see a list of
subcommands. Here is the quickest way to something interesting:
$ mkdir demo && cd demo # make a directory for your scripts
$ skywatch init # this will fail and require Heroku auth
$ skywatch init # run again, after logged in
$ skywatch enable check example # enable the example check script
$ skywatch deploy # everything is shipped to Heroku
$ skywatch monitor # watch it run in the cloud
Features
- Lets you monitor or assert anything at any frequency
- Scriptable alerts (email, sms, tweet, etc)
- Runs in the cloud on Heroku for free
- Completely automated deployment
- Easily monitor activity logs in real-time
- Enable / disable checks or alerts
- Unphased by flapping. You get alerted once.
- Entire system is in self-contained CLI tool
- Simple enough for personal use, powerful enough for commercial use
- Can be used for building adaptive systems?
What the hell is this amazing thing??
tl;dr, skywatch is a tool to run repeating check scripts on Heroku. It's the simplest idea wrapped into a convenient utility.
Skywatch is a command-line utility that manages checking and alerting scripts used by a small (50 lines) watcher service. Skywatch deploys these scripts and the service on Heroku where they can run and monitor anything from the cloud for free.
The watcher service runs check scripts that can assert anything at any frequency. If a check script returns a non-zero exit status, it will fire any enabled alert scripts, passing it the output of the check script. Alert scripts can then act on this assertion failure, such as send email, SMS, or webhook.
The check script will continue to run and potentially fail, but the alert script only runs once if it ran without error. Only until a reset signal is sent will it be ready to fire the alert again for any failed check script. In this way, alerts work like clip indicators in the audio recording world. They turn on once any clipping happens and remain on until you manually reset them.
You manage your scripts locally with the skywatch command, or by hand since they're just files in directories. When you want to deploy script changes, toggle enabled scripts, or reset the alert state, you can run a skywatch command and it will handle pushing changes to Heroku for you.
Using skywatch
The skywatch command manages a directory containing check scripts and alert scripts. You can make a new directory and let skywatch set this up for you:
$ mkdir skywatch-demo
$ cd skywatch-demo
$ skywatch init
It will have you authenticate with your Heroku credentials if you
haven't already. Grab a free account if you don't have
one. When you run skywatch init
authenticated it will create some example alerts and checks, then deploy
an empty watcher to Heroku. None of the checks or alerts are enabled by
default. See the scripts it set up by just running skywatch
from the
directory:
$ skywatch
Checks for fathomless-crag-3169
example every 30s disabled
skywatch_watchers every 3600s disabled
Alerts for fathomless-crag-3169
email disabled
Take a look at all the files in the directory. Checks and alerts are nothing
more than scripts. Checks have a naming convention of <interval>.<name>
,
and enabling and disabling is just setting the execute bit on the
scripts. There's nothing the skywatch
command does that you can't
easily do by hand. It just happens to be terribly convenient.
$ skywatch edit alert email
This will open your editor and you can see the example email alert
script is using SendGrid. In fact, when you ran skywatch init
, you
were set up with a free SendGrid starter addon for 600 emails a day. So
let's try it by putting your email address in the TO
variable of the
script. Now enable the alert:
$ skywatch enable alert email
Let's create a new check script in bash that fails so we can get the alert.
$ skywatch create check failure_test 30
The last argument is the interval. Intervals are always in seconds. All
this did was create a new file under the checks
directory with a
little bit of boiler plate. Let's replace its contents with this:
#!/usr/bin/env bash
echo "Oh no, a failed check."
exit 255
Enable the check and then deploy:
$ skywatch enable check failure_test
$ skywatch deploy
It's going to move some files around and then deploy to Heroku. It keeps
a staging directory called .skywatch
, which is a Git repo used to push
to Heroku. It automatically adds this to a .gitignore
file, so you can
version your scripts with Git and not worry about this implementation
detail.
Once it's finished, you might want to run monitor to see how it went and what's going on. This is just tailing the Heroku logs of the watcher service:
$ skywatch monitor
You can run this whenever to see what it's doing. You'll probably see that it triggered the alert. Go check your email! That will be the only email you get, regardless of whether the check starts to work again and then fail again. No flapping. You have to manually reset:
$ skywatch reset
This should cause another alert email within 30 seconds. And of course, you can tear everything down with destroy:
$ skywatch destroy
This destroys the Heroku app and the .skywatch
directory. It doesn't
touch your scripts at all. In fact, you can run skywatch init
again if
you'd like.
The source code to all this is terribly simple. The watcher service is only about 50 lines of Ruby. Everything else is just file operations. In fact, the little state it maintains is kept in file metadata. For how automated it is, skywatch has to be one of the simplest monitoring services ever.
Writing Check Scripts
Check scripts are any executable script using the shebang to define the interpreter. Heroku has most common languages built-in to its Cedar stack, so feel free to use Python, Perl, Ruby, whatever. I like bash.
The only conventions of check scripts are the interval-in-the-filename and that a non-zero exit status will fire the alerts. Any output of the check script will be piped into STDIN of the alert script, so try be verbose but not too verbose.
If you're using bash, it's a good idea to use set -e
so any failed
subcommand will bubble up. Here's an example check script:
#!/usr/bin/env bash
set -e
curl --trace-ascii --silent --fail http://example.com
Writing Alert Scripts
Like check scripts, alert scripts can be written in any language. Also, like check scripts, the exit status is important. If an alert script exit status is non-zero, it will run again with the next failure of the check script.
The alert script is given the output of the check script via STDIN. It's also given 2 arguments. The first is the name of the check script. The second is the exit status of the failed check script. Here's an example alert script:
#!/usr/bin/env bash
TO=foobar@example.com
SUBJECT="[skywatch] $1"
BODY=`echo -e "Failure with status $2:\n\n$(cat)"`
set -e
curl \
-X 'POST' \
-F "api_user=$SENDGRID_USERNAME" \
-F "api_key=$SENDGRID_PASSWORD" \
-F "to=$TO" \
-F "subject=$SUBJECT" \
-F "text=$BODY" \
-F "from=$TO" \
--silent --fail "https://sendgrid.com/api/mail.send.json"
The output of an alert script is ignored. It might be a good idea to log
the output of failed alert scripts. You'd then be able to see it via
skywatch monitor
. Sounds like a contribution idea.
Contributing
- Fork it
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create new Pull Request
License
MIT