pupistry
Pupistry (puppet + artistry) is a solution for implementing reliable and secure
masterless puppet deployments by taking Puppet modules assembled by r10k and
generating compressed and signed archives for distribution to the masterless
servers.
Pupistry builds on the functionality offered by the r10k workflow, but rather
than requiring the implementation of site-specific custom bootstrap and custom
workflow mechanisms, Pupistry executes r10k, assembles the combined modules
and then generates a compressed artifact file. It then optionally signs the
artifact with GPG and finally uploads it into an Amazon S3 bucket along with a
manifest file.
The masterless Puppet machines then just run a Pupistry job which checks for a new version of the manifest file. If there is one, it downloads the new artifact and does an optional GPG validation before applying it and running Puppet. To make life even easier, Pupistry will even spit out bootstrap files for your platform which set up each server from scratch to pull and run the artifacts.
Essentially Pupistry is intended to be a robust solution for masterless Puppet deployments and makes it trivial for beginners to get started with Puppet.
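The agent-side check described above can be sketched roughly as follows. This is only an illustration and not Pupistry's actual code: the new_artifact? helper is hypothetical, and so is the assumption that a manifest is a YAML document carrying a "version" key.

```ruby
require "yaml"

# Hypothetical helper: decide whether a fresh artifact needs downloading.
# Assumes (for illustration only) that a manifest is YAML with a "version" key.
def new_artifact?(local_manifest, remote_manifest)
  # A node that has never pulled an artifact has no local manifest at all.
  local  = local_manifest ? YAML.safe_load(local_manifest) : {}
  remote = YAML.safe_load(remote_manifest)
  local["version"] != remote["version"]
end

# The version is quoted so YAML always treats it as a string.
remote = "version: '3f29c324aab076cd81667f9031a675e7'\n"

puts new_artifact?(nil, remote)     # no local manifest yet, so download
puts new_artifact?(remote, remote)  # same version on both sides, so skip
```

Comparing a small manifest instead of the artifact itself is what keeps the update check cheap when nothing has changed.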
Why Pupistry?
Masterless Puppet is a great solution for anyone wanting to avoid scaling issues and risk of centralised failure due to a central Puppet master, but it does bring a number of issues with it.
- Having to set up deployer keys for every Git repo used is a maintenance headache. Pupistry means only your workstation needs access, which presumably will have access to most/all repos already.
- Your system build success is dependent on all the Git repos you've used, including any third parties that could vanish. A single missing or broken repo could prevent autoscaling or new machine builds at a critical time. Pupistry's use of artifact files prevents surprises - if you can hit S3, you're sorted.
- It is easy for malicious code in the third party repos to slip in without noticing. Even if the author themselves is honest, not all repos have proper security like two-factor. Pupistry prevents surprise updates of modules and also has an easy diff feature to see what changed since you last generated an artifact.
- Puppet masterless tends to be implemented in many different ways using everyone's own hacky scripts. Pupistry's goal is to create a singular standard approach to masterless, in the same way that r10k created a standard approach to Git-based Puppet workflows. And this makes things easy - install Pupistry, add the companion Puppet module and run the bootstrap script. Easy!
- No dodgy cronjobs running r10k and Puppet in weird ways. A simple clean agent with daemon or run-once functionality.
- Performance - Go from 30+ second r10k update checks to 2 second Pupistry update checks. And when there is a change, it's a fast, efficient compressed file download from S3 rather than pulling numerous Git repos.
Usage
Building new artifacts
Build a new artifact:
$ pupistry build
I, [2015-04-08T22:19:30.419392 #52534] INFO -- : Using r10k utility to fetch the latest Puppet code
[R10K::Action::Deploy::Environment - INFO] Deploying environment /Users/jethro/.pupistry/cache/puppetcode/master
[R10K::Action::Deploy::Environment - INFO] Deploying module /Users/jethro/.pupistry/cache/puppetcode/master/modules/stdlib
[R10K::Action::Deploy::Environment - INFO] Deploying module /Users/jethro/.pupistry/cache/puppetcode/master/modules/ruby
[R10K::Action::Deploy::Environment - INFO] Deploying module /Users/jethro/.pupistry/cache/puppetcode/master/modules/gcc
[R10K::Action::Deploy::Environment - INFO] Deploying module /Users/jethro/.pupistry/cache/puppetcode/master/modules/inifile
[R10K::Action::Deploy::Environment - INFO] Deploying module /Users/jethro/.pupistry/cache/puppetcode/master/modules/vcsrepo
[R10K::Action::Deploy::Environment - INFO] Deploying module /Users/jethro/.pupistry/cache/puppetcode/master/modules/git
[R10K::Action::Deploy::Environment - INFO] Deploying module /Users/jethro/.pupistry/cache/puppetcode/master/modules/ntp
[R10K::Action::Deploy::Environment - INFO] Deploying module /Users/jethro/.pupistry/cache/puppetcode/master/modules/firewall
[R10K::Action::Deploy::Environment - INFO] Deploying module /Users/jethro/.pupistry/cache/puppetcode/master/modules/soe
I, [2015-04-08T22:21:21.705315 #52534] INFO -- : r10k run completed
I, [2015-04-08T22:21:21.706023 #52534] INFO -- : Creating artifact...
I, [2015-04-08T22:21:21.999753 #52534] INFO -- : Compressing artifact...
I, [2015-04-08T22:21:22.103131 #52534] INFO -- : Building manifest information for artifact...
I, [2015-04-08T22:21:22.107012 #52534] INFO -- : New artifact version 3f29c324aab076cd81667f9031a675e7 ready for pushing
--
Tip: Run pupistry diff to see what changed since the last artifact version
Note that artifact builds are done from the upstream Git repos, so if you
have made changes, remember to git push first before generating. The tool will
remind you if it detects nothing has changed since the last run.
Once your artifact is built, you can double check what has changed in the Puppet modules since the last run with:
$ pupistry diff
diff -Nuar unpacked.3f29c324aab076cd81667f9031a675e7/puppetcode/master/README.md unpacked.4a522dd22c0453e1e3ec3d17dfed151b/puppetcode/master/README.md
--- unpacked.3f29c324aab076cd81667f9031a675e7/puppetcode/master/README.md 2015-04-08 22:19:42.000000000 +1200
+++ unpacked.4a522dd22c0453e1e3ec3d17dfed151b/puppetcode/master/README.md 2015-04-08 23:01:14.000000000 +1200
@@ -1 +1,4 @@
Personal Puppet Repo
+
+Example of a changed file in a module somewhere, nice and visible for all to see.
+
--
Tip: Run pupistry push to GPG sign & upload if happy to go live
Finally when you're happy, push it to S3 to be delivered to all your servers. If you have gpg signing enabled, it will ask you to sign here... or tell you off if you have it disabled. :-)
$ pupistry push
I, [2015-04-08T22:52:01.020865 #53037] INFO -- : Uploading artifact version latest (3f29c324aab076cd81667f9031a675e7)
W, [2015-04-08T22:52:01.888356 #53037] WARN -- : You have GPG signing *disabled*, whilst not critical it does weaken your security.
W, [2015-04-08T22:52:01.888418 #53037] WARN -- : Skipping signing step...
I, [2015-04-08T22:52:03.043886 #53037] INFO -- : Upload of artifact version 3f29c324aab076cd81667f9031a675e7 completed and is now latest
Bootstrapping nodes
New machines need to be bootstrapped in order to install Pupistry, configure it and be able to download configuration. Generally this is a step done differently site-by-site (and you can still do it that way if you want), but if you want a nice easy life, Pupistry can generate you a bootstrap script for your platform.
$ pupistry bootstrap
- centos-7
- ubuntu-14.04
$ pupistry bootstrap --template centos-7
# Compatible with RHEL 7, CentOS 7 and maybe other variations.
rpm -ivh https://yum.puppetlabs.com/puppetlabs-release-el-7.noarch.rpm
yum update --assumeyes
yum install --assumeyes puppet ruby-devel rubygems gcc zlib-devel libxml2-devel patch
gem install pupistry
mkdir /etc/pupistry
cat > /etc/pupistry/settings.yaml << "EOF"
general:
  app_cache: ~/.pupistry/cache
  s3_bucket: example
  s3_prefix:
  gpg_disable: true
  gpg_signing_key: XYZXYZ
agent:
  puppetcode: /etc/puppet/environments/
  access_key_id:
  secret_access_key:
  region: ap-southeast-2
  proxy_uri:
EOF
pupistry apply --verbose
You generally can run this on a new non-Puppetised machine, or paste into the user data field of most cloud providers like AWS or Digital Ocean. If using CFN with AWS, you can make it part of the stack itself.
These bootstraps aren't mandatory. If you prefer a different approach, you can use these as an example and write your own - generally the essential bit is to get puppet installed, get pupistry (and deps to build its gems) installed and write the config before finally executing your first Pupistry/Puppet run.
If you are using AWS with the IAM Roles feature, it is acceptable for access_key_id and secret_access_key to be blank. If not, you will need to have these set to an account with read-only access to the configured S3 bucket!
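As a rough sketch of why blank keys are fine with IAM roles: a client can simply leave the credentials out and let the AWS SDK fall back to its default credential sources, such as an EC2 instance role. The s3_client_options helper below is hypothetical and is not taken from Pupistry's source. It only builds an options hash of the kind the AWS SDK for Ruby accepts (region, access_key_id, secret_access_key).

```ruby
# Hypothetical helper: build S3 client options from the "agent" section of
# settings.yaml. Blank keys are omitted so that the SDK can fall back to its
# default credential sources, such as an EC2 instance role.
def s3_client_options(agent)
  options = { region: agent["region"] }
  key     = agent["access_key_id"].to_s.strip
  secret  = agent["secret_access_key"].to_s.strip

  # Only pass explicit credentials when both halves are present.
  unless key.empty? || secret.empty?
    options[:access_key_id]     = key
    options[:secret_access_key] = secret
  end
  options
end

# Settings as they would look on an EC2 instance that relies on an IAM role.
agent = { "region" => "ap-southeast-2", "access_key_id" => nil, "secret_access_key" => nil }
p s3_client_options(agent)
```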
Running Puppet on target nodes
Pupistry replaces the need to call Puppet directly. Instead, call Pupistry and it will handle getting the artifact and then executing Puppet for you. It respects some parameters like --environment and --noop for easy testing of new manifests and modules.
At its simplest, to apply the current Puppet manifests:
$ pupistry apply
I, [2015-04-10T00:44:40.623101 #6726] INFO -- : Pulling latest artifact....
I, [2015-04-10T00:44:42.700540 #6726] INFO -- : Executing Puppet...
Notice: Compiled catalog for testhost1 in environment master in 2.21 seconds
Notice: Finished catalog run in 3.07 seconds
Check what is going to be applied (Puppet in --noop mode)
pupistry apply --noop
Specify an alternative environment:
pupistry apply --environment staging
Run pupistry as a system daemon. When you use the companion Puppet module, a system init file gets installed that sets this daemon up for you automatically.
pupistry apply --daemon
Note that the daemon runs & logs to the foreground if you run it like the above. The init script handles the syslog & backgrounding for you (why code what the init system can do for us?).
Alternatively, if you don't wish to use Pupistry to run the nodes, you don't have to. You can use Pupistry to build the artifacts and then pull them down and unpack via any means you find appropriate. It's just standard S3 + tar with some YAML and optional GPG signing.
Installation
1. Application
First install Pupistry onto your workstation. You can make pupistry generate a config file for you if you've never used it before:
gem install pupistry
pupistry setup
Alternatively if you like living on the edge, download this repository and run:
gem build pupistry.gemspec
gem install pupistry-VERSION.gem
pupistry setup
Pupistry will write an example config file into ~/.pupistry/settings.yaml for
you. You will need to edit it with your preferred editor.
2. S3 Bucket
Pupistry uses S3 for storing and pulling the artifact files. You need to configure the following:
- A private S3 bucket (you'll get this by default).
- An IAM account with access to write that bucket (for your build workstation)
- An IAM account with access to read that bucket (for your servers)
If you're not already using IAM with your AWS account you want to be - your servers should only ever have read access to the bucket and only your build workstation should be permitted to write new artifacts. IE, don't share your AWS root account around the place. :-)
Note that if you're running EC2 instances and using IAM roles, you can avoid needing to create explicit IAM credentials for the agents/servers, as long as you include read access to the Pupistry S3 bucket in the IAM roles for all servers that will be running it.
If you're new to AWS, we've made your life easy - there's an AWS CloudFormation template included with Pupistry that will build an S3 bucket and two IAM user accounts for you with sensible default policies.
Just make sure you have a working aws command - that's the Python CLI issued
by AWS themselves. Setup instructions can be found at:
http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html
Provided that you've set up aws correctly and have full permissions to your
account, you can now build your S3 bucket and IAM users with:
wget https://raw.githubusercontent.com/jethrocarr/pupistry/master/resources/aws/cfn_pupistry_bucket_and_iam.template
aws cloudformation create-stack \
--capabilities CAPABILITY_IAM \
--template-body file://cfn_pupistry_bucket_and_iam.template \
--stack-name pupistry-resources-changeme
It is very important that you change the stack name to something globally unique, or the stack will fail to build.
It may take 30 seconds or so to build, you can check for completion (or for an error) with:
aws cloudformation describe-stacks --query "Stacks[*].StackStatus" --stack-name pupistry-resources-changeme
Once status is CREATE_COMPLETE, you can get all the outputs from the stack with:
aws cloudformation describe-stacks --query "Stacks[*].Outputs[*]" --stack-name pupistry-resources-changeme
You now need to edit ~/.pupistry/settings.yaml
and enter in the equivalent
OutputValue for the following labels:
general:
  s3_bucket: S3Bucket
  ...
agent:
  access_key_id: AgentAccessKeyId
  secret_access_key: AgentSecretKeyID
  region: S3Region
  ...
build:
  access_key_id: BuildAccessKeyId
  secret_access_key: BuildSecretKeyID
  region: S3Region
  ...
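If you would rather not copy the values across by hand, the mapping above can be sketched in a few lines of Ruby. This is a hypothetical helper, not something shipped with Pupistry. It assumes the JSON printed by the describe-stacks command above is a nested array of objects with OutputKey and OutputValue fields, and it uses the output labels listed above.

```ruby
require "json"
require "yaml"

# Hypothetical helper: turn the JSON from
#   aws cloudformation describe-stacks --query "Stacks[*].Outputs[*]" ...
# into the matching parts of settings.yaml.
def settings_from_outputs(json_text)
  # The query returns one list of outputs per stack, so flatten the nesting.
  outputs = JSON.parse(json_text).flatten
  value   = outputs.map { |o| [o["OutputKey"], o["OutputValue"]] }.to_h

  {
    "general" => { "s3_bucket" => value["S3Bucket"] },
    "agent"   => {
      "access_key_id"     => value["AgentAccessKeyId"],
      "secret_access_key" => value["AgentSecretKeyID"],
      "region"            => value["S3Region"]
    },
    "build"   => {
      "access_key_id"     => value["BuildAccessKeyId"],
      "secret_access_key" => value["BuildSecretKeyID"],
      "region"            => value["S3Region"]
    }
  }
end

# Made-up sample values, trimmed to two outputs for brevity.
sample = '[[{"OutputKey": "S3Bucket", "OutputValue": "example-bucket"},
            {"OutputKey": "S3Region", "OutputValue": "ap-southeast-2"}]]'
puts settings_from_outputs(sample).to_yaml
```

The result still needs merging into your existing ~/.pupistry/settings.yaml, since the generated file contains other options that this sketch does not touch.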
3. Puppet Manifests & Configuration
Puppet Code Structure
The following is the expected minimum structure of the Puppetcode repository to enable it to work with Pupistry:
/Puppetfile
/hiera.yaml
/manifests/site.pp
Puppetfile is standard r10k and site.pp is standard Puppet. The Hiera config
is generally normal, but you do need to define a datadir to tell Puppet to look
where the puppet code gets unpacked to. Generally the following sample Hiera
will do the trick:
---
:backends: yaml
:yaml:
  :datadir: "%{::settings::confdir}/%{::environment}/hieradata"
:hierarchy:
  - "environments/%{::environment}"
  - "nodes/%{::hostname}"
  - common
Pupistry will override the default %{::settings::confdir} value with wherever
the Pupistry agent has been configured to write to (by default this will be the
Puppet-4 style /etc/puppetlabs/code/environments path) so this Hiera config
should work fine for any Pupistry deployed setups.
If you're using a hybrid of Pupistry and masterful Puppet, you may need to adjust
the datadir parameter in Hiera to a fixed path and the puppetcode parameter
in Pupistry to be the exact same value, since %{::settings::confdir} will
differ between Pupistry and masterful Puppet.
Pupistry will default to applying the master branch if one is not listed. If
you are doing branch-based environments, you can specify one when bootstrapping
and override it on a per-execution basis with --environment.
You'll notice pretty quickly if something is broken when doing pupistry apply.
Confused? No worries, check out the sample repo that shows a very simple setup.
You can copy this and start your own Puppet adventure, just add in your modules
to Puppetfile and add them to the relevant machines in manifests/site.pp.
https://github.com/jethrocarr/pupistry-samplepuppet
TODO: Longer term, the intent is to add support for various popular structures, but for now it is what it is. It's not hard, check out bin/pupistry and send pull requests.
Helper Module
Whilst you can use Pupistry to roll out any particular design of Puppet manifests, you will save yourself a lot of pain by also including the Pupistry companion Puppet module in your manifests.
The companion Puppet module will configure Pupistry for you, including setting up the system service and configuring Puppet and Hiera correctly for masterless operation.
You can fetch the module from: https://github.com/jethrocarr/puppet-pupistry
If you're doing r10k and Puppet masterless from scratch, this is probably
something you want to make life easy. With r10k, just add the following to your
Puppetfile:
# Install the Pupistry companion module
mod 'jethrocarr/pupistry'
# Dependencies for Pupistry companion if not already defined
mod 'puppetlabs/stdlib'
mod 'jethrocarr/initfact'
And include the pupistry module in all your systems:
node default {
  include pupistry
  ...
}
4. Building your first node (Bootstrapping)
No need for manual configuration of your servers/nodes. You just need to build
your first artifact with Pupistry (pupistry build && pupistry push) and then
generate a bootstrap script for your particular OS with pupistry bootstrap.
The bootstrap script will:
- Install Puppet and Pupistry for the particular OS.
- Download the latest artifact
- Trigger a Puppet run to build your server.
These bootstrap scripts can be generated for you. Refer to "Bootstrapping nodes" under the "Usage" instructions above for details.
The bootstrap script goal is to get you from stock OS to running Pupistry and doing your first Puppet run. After that - it's up to you and your Puppet skills to make your node actually do something useful. :-)
5. (optional) Baking an image with Packer
Note that the node initialisation process is still susceptible to weaknesses such as a bug in a new version of Puppet or Pupistry, or changes to the OS packages. If this is a concern/issue for you and you want complete reliability, then use the user data to build a host pre-loaded with Puppet and Pupistry and then create an image of it using a tool like Packer. Doing this, you can make it possible to build all the way to Puppet execution with no dependencies on any third parties other than your VM provider and AWS S3.
Pupistry includes support for generating some Packer examples that you can either use as-is or build upon to meet your own needs. You can list all the available Packer templates with:
pupistry packer
You can select a template and generate a Packer file by specifying the template and the output file on the command line:
pupistry packer --template aws_amazon-any --file output.json
Once the file has been generated, you can build your packer environment with
the packer build command. Note that some templates will require additional
variables to be passed to them at run time, for example the AWS template
requires a VPC ID and subnet ID specific to your account.
packer build \
-var 'aws_vpc_id=vpc-example' \
-var 'aws_subnet_id=subnet-example' \
output.json
By default any Packer machines are built with the hostname of "packer" which allows you to specifically target them with your manifests. If you don't do any targeting, the default manifests will be applied.
Templates tend to have other customisable variables. Check the available
options and their defaults with packer inspect output.json.
Tutorials
If you're looking for a more complete introduction to doing masterless Puppet and want to use Pupistry, check out a tutorial by the author:
https://www.jethrocarr.com/2015/05/10/setting-up-and-using-pupistry
By following this tutorial you can go from nothing to having a complete, up
and running masterless Puppet environment using Pupistry. It covers the very
basics of setting up your r10k environment.
GPG Notes
GPG can be a bit of a beast to set up and get used to. Pupistry tries to make the signing and key sharing process as simple as possible, but setting up GPG on your platform and creating your key is beyond the scope of this document.
If you are being asked for your GPG password for every pupistry push even in
rapid succession, then you may need to set up gpg-agent so it can keep you
logged in for short durations to get a better balance of security vs usability.
Currently Pupistry supports a 1:1 approach, where the key used to sign the artifact is the key used to verify it. Pull requests to add support for signing and verifying against a keyring list would be welcome to make it easier for teams to use GPG without having everyone with a single master key.
Note that GPG isn't vital for security - you still have end-to-end transport security between your build machine and your servers via HTTPS/TLS to and from the S3 bucket. All that GPG does is prevent anyone who managed to break into your S3 bucket from pushing their own Puppet manifests out.
Generally S3 is secure (assuming no bugs in AWS itself). Any likely exploit would come from you accidentally sharing your IAM credentials in the wrong place, or from an exploited build server.
Securing Hiera with HieraCrypt
In a standard Puppet master situation, the Puppet master parses the Hiera data and then passes only the values that apply to a particular host to it. But with masterless Puppet, all machines get a full copy of Hiera data, which could be a major issue if one box gets exploited and the contents leaked. Generally it goes against good practice and damages the isolation ability of VMs if you give all the VMs enough information to do some serious damage to themselves.
By default an out-of-the-box Pupistry installation suffers this limitation like most master-less Puppet solutions. However, there is an optional feature built into Pupistry called "HieraCrypt" which can be used to encrypt data and prevent excessive exposure of information to nodes.
The solution works by generating a cert on each node with the
pupistry hieracrypt --generate parameter and saving the output into your
puppetcode repository at hieracrypt/nodes/HOSTNAME. This output includes an
x509 cert made against the host's SSH RSA host key and a JSON array of all
the facter facts on that host that correlate to values inside the hiera.yaml
file.
When you run Pupistry on your build workstation, it parses the hiera.yaml file for each environment and generates a match of files per-node. It then encrypts these files and creates an encrypted package for each node that only they can decrypt.
For example, if your hiera.yaml file looks like:
:hierarchy:
- "environments/%{::environment}"
- "nodes/%{::hostname}"
- common
And your hieradata directory looks like:
hieradata/
hieradata/common.yaml
hieradata/environments
hieradata/nodes
hieradata/nodes/testhost.yaml
When Pupistry builds the artifact, it will include the common.yaml file for
all nodes, however the testhost.yaml file will only be included for the
server with that hostname.
All servers still get the encrypted data for all the other nodes as it is shipped as part of the artifact, but nodes can only decrypt the data encrypted against their key.
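The per-node matching described above can be illustrated with a short sketch. It is not HieraCrypt's actual implementation and leaves out the encryption step entirely. The hieradata_for_node helper is hypothetical, and the facts hash stands in for the facts a node saves with pupistry hieracrypt --generate.

```ruby
# Hypothetical helper: work out which hieradata files apply to one node by
# expanding "%{::fact}" placeholders in the hierarchy with that node's facts.
def hieradata_for_node(hierarchy, facts, available_files)
  candidates = hierarchy.map do |entry|
    expanded = entry.gsub(/%\{::(\w+)\}/) { facts.fetch(Regexp.last_match(1), "") }
    "#{expanded}.yaml"
  end
  # Keep only the files that actually exist in the hieradata directory.
  candidates.select { |file| available_files.include?(file) }
end

hierarchy = ['environments/%{::environment}', 'nodes/%{::hostname}', 'common']
files     = ['common.yaml', 'nodes/testhost.yaml']

p hieradata_for_node(hierarchy, { 'environment' => 'master', 'hostname' => 'testhost' }, files)
p hieradata_for_node(hierarchy, { 'environment' => 'master', 'hostname' => 'otherhost' }, files)
```

With the example layout above, only the node reporting the hostname testhost is matched against nodes/testhost.yaml, while both nodes are matched against common.yaml.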
Caveats & Future Plans
Use r10k
Currently only an r10k workflow is supported. Pull requests for others (eg
Librarian Puppet) are welcome, but it's not a priority for this author as r10k
is working nicely.
Bootstrap Functionality
Currently Pupistry only supports generation of bootstrap for select popular distributions and platforms. Other distributions will be added, but it may take time to get to your particular favourite distribution.
Note that it isn't a show stopper if support for your platform of choice doesn't yet exist - you can use pupistry with pretty much any *nix platform, you'll just not have the handy advantage of automatically generated bootstrap for your servers. And in many cases, one of the existing ones can easily be adapted to your platform of choice.
If you do customise it for a different platform, pull requests are VERY welcome, I'll add pretty much any OS if you write a decent bootstrap template for it.
Please see resources/bootstrap/BOOTSTRAP_NOTES.md for more details on how to write and debug bootstrap templates.
Continuous Deployment
A lot of what Pupistry does can also be accomplished by various home-grown Continuous Deployment (CD) solutions using platforms like Jenkins or Bamboo. CD is an excellent approach for larger organisations, but Pupistry has been designed for both large and small users so does not mandate it.
It would be possible to use Pupistry as part of your CD process and if you decide to do so, a pull request to better support CD systems out-of-the-box would be welcome.
PuppetDB
There's nothing stopping you from using PuppetDB other than Pupistry has no automatic setup hooks in the bootstrap config. Pull requests to support PuppetDB for masterless machines are welcome, although masterless users tend to want to avoid dependencies on a central point.
Windows
No idea whether this works under Windows, or what would be required to make it do so. Again, pull requests always welcome but it's not a priority for the author.
Developing
When developing Pupistry, you can run the Git repo copy with:
gem install bundler
bundle install
bundle exec pupistry
By default Pupistry will try to load a settings.yaml file in the current
working directory, before then trying ~/.pupistry/settings.yaml and then
finally /etc/pupistry/settings.yaml. You can also override with --config.
Add --verbose for additional debugging information. If you have a bug this
is the first thing you should run to get more context for reports.
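When debugging which config file is being picked up, the settings lookup order described above can be sketched as follows. The find_settings helper is hypothetical and only mirrors the documented order. Its cwd and home parameters exist purely so the sketch can be exercised against temporary directories.

```ruby
# Hypothetical helper mirroring the documented lookup order: an explicit
# override (as with --config) wins, then the current working directory, then
# ~/.pupistry, then /etc/pupistry.
def find_settings(override = nil, cwd: Dir.pwd, home: Dir.home)
  return override if override

  candidates = [
    File.join(cwd, "settings.yaml"),
    File.join(home, ".pupistry", "settings.yaml"),
    "/etc/pupistry/settings.yaml"
  ]
  # The first candidate that exists is used; nil means none were found.
  candidates.find { |path| File.file?(path) }
end

puts(find_settings || "no settings.yaml found")
```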
Whilst Pupistry has few tests, we would like to improve this. Please feel free to contribute any additional tests and aim to write tests for new features and definitely for any bug fixes. Once you have written tests, check the output of the tests and Rubocop with:
bundle exec rake
Contributions
Pull requests are very welcome. Pupistry is a very young app and there is plenty of work that can be done to improve its code quality, enhance existing features and add handy new features. Constructive feedback/requests via the issue tracker is fine, but pull requests speak louder than words. :-)
If you find a bug or need support, please use the issue tracker rather than personal emails to the author.
Feel free to grep the source for "TODO" comments on various tasks that need doing, or check out the issue tracker for interesting issues to tackle.
Author
Pupistry is developed by Jethro Carr. Blog posts about Pupistry and new features can be found at http://www.jethrocarr.com/tag/pupistry
Beer welcome.
License
Pupistry is licensed under the Apache License, Version 2.0 (the "License").
See the LICENSE.txt or http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.