User:Djmitche/New Releng Puppet Infrastructure

This is a complete re-implementation of puppet for release engineering.

Goals

  • A modern puppet installation, completely specifying all releng infrastructure (including Windows)
  • Manifests structured to apply settings across all machines, rather than distinct sets of manifests for each slave silo
  • Usable by external parties, both inside and outside of Mozilla
  • Hands-free installations

General

The releng puppet masters are managed by IT (in fact, managed by IT's puppet infrastructure).

The releng puppet infrastructure will use the same puppet version as the rest of Mozilla. Currently, this is Puppet 2.7.1. As IT upgrades, the masters will be upgraded; releng can then upgrade the clients using puppet itself.

Base Images

The base images for this infrastructure are barely-modified OS installs. They have just enough installed that they can connect to a puppet server, get certificates, and puppetize on boot.

See the base images page for details on how that is set up.

Manifests

The manifests are currently in http://hg.mozilla.org/users/dmitchell_mozilla.com/puppet, but this will move to http://hg.mozilla.org/build/puppet eventually.

Big Files

Some files are too big to keep in version control -- mostly package sources.

Concerns

  • Reproducibility - we should have a source-of-record for all files, and this should be enforced
    • [TODO] how?
  • Security - some of the packages have special licensing, or may have other security concerns; these will need to be protected from unauthorized access
  • Stability - releng systems should not dynamically update, even for security updates. We will need to have "snapshots" of constantly-evolving repositories like EPEL or updates, and a way to update those on demand.
  • Storage - this can wind up taking a *huge* amount of space. Saving space with hard-links between identical files might help, especially with snapshots.
    • [TODO] Is there a utility to fix this?
  • Replication - how can all of the masters stay in sync?
  • Updates - how can we time package updates?
    • This has significant security implications
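
The storage concern above asks whether a utility exists to hard-link identical files. As a minimal sketch only, assuming all repositories and snapshots live on one filesystem and that sharing ownership, mode, and mtime between linked copies is acceptable, the idea could look like this in Python. The function names `file_digest` and `hardlink_duplicates` are hypothetical and are not part of any existing releng tool.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Replace identical regular files under root with hard links.

    Returns the number of files that were replaced by links.
    Assumption: everything under root is on the same filesystem.
    """
    by_content = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            key = (os.path.getsize(path), file_digest(path))
            by_content[key].append(path)

    replaced = 0
    for paths in by_content.values():
        keeper = paths[0]
        for dup in paths[1:]:
            if os.path.samefile(keeper, dup):
                continue  # already the same inode, nothing to do
            tmp = dup + ".hardlink-tmp"
            os.link(keeper, tmp)   # create the new link next to the duplicate
            os.replace(tmp, dup)   # atomically swap it into place
            replaced += 1
    return replaced
```

Running `hardlink_duplicates` over a directory holding several snapshots would collapse unchanged packages into a single copy on disk. When snapshots are created with rsync, its `--link-dest` option may achieve the same effect at mirror time, so a separate pass like this might not be needed.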

RPMs / Yum Repos

There are several repos hosted on each puppet master:

  • CentOS 6
    • os - from the DVD
    • updates - mirrored at a well-defined time
  • EPEL 6 - mirrored at a well-defined time
  • releng
    • public - packages we can share with the community
    • private - packages that are private or proprietary (hopefully minimal)

Most of the questions in "Concerns", above, are still unanswered.

DMGs

Python Packages

Windows Packages

Masters

There is currently one puppet master, although long-term we will have multiple masters.

The masters update their manifests from mercurial once every 5 minutes, with a bit of "splay" added (so it does not always occur on the 5-minute mark). Any errors during the update are emailed, as is a diff of the manifests whenever they change; the diff emails form a kind of change control.
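
To illustrate the "splay" idea, here is a minimal sketch of an update loop that waits the 5-minute interval plus a random offset before each run. The 60-second maximum splay is an assumption (the text only says "a bit"), and `next_delay` and `run_updates` are hypothetical names; the real update step (pulling from mercurial and emailing errors and diffs) is passed in as a callable and is not shown.

```python
import random
import time

UPDATE_INTERVAL = 5 * 60  # seconds; the 5-minute period from the text
MAX_SPLAY = 60            # seconds; assumed, the text only says "a bit"


def next_delay(rng, interval=UPDATE_INTERVAL, max_splay=MAX_SPLAY):
    """Seconds to wait before the next update: the interval plus random splay."""
    return interval + rng.uniform(0, max_splay)


def run_updates(update, rounds, rng=None, sleep=time.sleep):
    """Call update() `rounds` times, sleeping interval + splay before each call."""
    rng = rng or random.Random()
    for _ in range(rounds):
        sleep(next_delay(rng))
        update()
```

Passing `sleep` and `rng` as parameters keeps the loop easy to exercise without actually waiting; with several masters, the random offset also keeps them from all hitting the mercurial server at the same moment.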

Multi-Master

Cert Signing

 A sysadmin asked the Architect,
   "What's the best way to install a new system?"
 The Architect answered,
   "Turn it on."
 The sysadmin was enlightened.

All of our installation tools are scriptable. These tools are responsible for fetching a signed certificate from the puppet master and installing it on the client before its first boot. This transaction will be authenticated using a protected shared secret. Non-Mozilla users can simply omit this part of the setup and sign certificates by hand.
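
The page does not say how the shared secret authenticates the certificate request. One common approach, shown here purely as an assumed sketch and not as the actual releng mechanism, is for the installation tool to send an HMAC of the client hostname computed with the secret, which the master recomputes and compares. The names `make_token` and `verify_token`, and the choice of HMAC-SHA256, are illustrative assumptions.

```python
import hashlib
import hmac


def make_token(secret: bytes, hostname: str) -> str:
    """Installer side: derive a token for this hostname from the shared secret."""
    return hmac.new(secret, hostname.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_token(secret: bytes, hostname: str, token: str) -> bool:
    """Master side: recompute the token and compare in constant time."""
    expected = make_token(secret, hostname)
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

With a scheme like this the secret itself never travels with the request, and a token issued for one hostname cannot be reused to obtain a certificate for another. A real deployment would likely also want a timestamp or nonce in the signed message to limit replay.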

Clients

Client images should proceed automatically from imaging to a fully-operational state. The base images are designed to support this.

Automatic/Manual Updates

A few things will need to be kept current. We will need some automated means to nag about these, lest the releng infra again fall out of date.

  • security updates for packages
  • puppet versions
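
As a rough sketch of what an automated nag for puppet versions might check, the following compares each client's reported version against the masters' version and lists the clients that lag behind. How versions are collected is not specified on this page, so the input mapping and the names `parse_version` and `stale_clients` are hypothetical; the sketch also assumes purely numeric dotted versions such as 2.7.1.

```python
def parse_version(text):
    """Turn a dotted numeric version like '2.7.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def stale_clients(client_versions, master_version):
    """Return the sorted hostnames whose puppet version is older than the master's."""
    target = parse_version(master_version)
    return sorted(
        host
        for host, version in client_versions.items()
        if parse_version(version) < target
    )
```

Comparing tuples of integers rather than raw strings matters here: as strings, "2.7.10" would sort before "2.7.2" and be reported as out of date.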