Balrog/Meetings/CloudOps - August 10, 2016
Attendees
bhearsum, ckolos, mostlygeek, chartjes
Notes
- mostlygeek is switching Balrog admin+public to the new deployment pipeline, which should make deployments quicker and less error-prone. E.g., we shouldn't get 502s caused by admin using a read-only database anymore.
- Now that the cleanup script has landed, we're going to give it a try soon.
  - Will use a clone of the prod database first, to see how long the initial run will take.
  - Need to decide on MAX_AGE (in days) for prod. Starting with 180 to match old cluster, may shrink it later.
- We're going to try to manually remove releases_history.version (stems from bug 741412), which we weren't able to do on the old cluster.
  - If it's still too painful or slow, we'll just live with that column continuing to exist, unused.
- We talked a bit about timing of deployments going forward. We decided that RelEng should give a window when requesting one because we're in the best place to know all of the restrictions (ongoing releases most notably). Docs to come on this.
- We discussed what to do with the old Balrog mana page:
  - The one that mostlygeek created recently replaces the architecture parts.
  - Updated deployment instructions have been started on the main Balrog wiki page, and will be fleshed out further.
- mostlygeek wants to try a couple of things with the Dockerfile, and intends to send pull requests:
  - Try not to run the apps as root.
  - Try using the Alpine slim Python image to greatly shrink the size of the images (~700 MB -> ~150 MB).
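
To make the MAX_AGE discussion above concrete, here is a minimal Python sketch of how an age cutoff in days could decide which history rows are kept. It is not the actual cleanup script: the function names, the (row_id, changed_at) row shape, and the use of datetime objects are all assumptions made for illustration. Only the 180-day starting value comes from the notes.

```python
from datetime import datetime, timedelta, timezone

# Starting value from the notes; may shrink later.
MAX_AGE_DAYS = 180


def cutoff_for(now, max_age_days=MAX_AGE_DAYS):
    """Return the moment before which rows count as too old (hypothetical helper)."""
    if max_age_days < 0:
        raise ValueError("max_age_days must be non-negative")
    return now - timedelta(days=max_age_days)


def split_by_age(rows, now, max_age_days=MAX_AGE_DAYS):
    """Partition (row_id, changed_at) pairs into (keep, delete) lists of ids.

    The row shape is an assumption; the real tables may store timestamps
    differently.
    """
    cutoff = cutoff_for(now, max_age_days)
    keep, delete = [], []
    for row_id, changed_at in rows:
        # Rows strictly older than the cutoff are candidates for deletion.
        (delete if changed_at < cutoff else keep).append(row_id)
    return keep, delete


if __name__ == "__main__":
    now = datetime(2016, 8, 10, tzinfo=timezone.utc)
    rows = [
        (1, datetime(2016, 1, 1, tzinfo=timezone.utc)),
        (2, datetime(2016, 8, 1, tzinfo=timezone.utc)),
    ]
    keep, delete = split_by_age(rows, now)
    print("cutoff:", cutoff_for(now).date())
    print("keep:", keep, "delete:", delete)
```

Running something like this against a clone first, as planned above, would also show how many rows a given MAX_AGE removes before the value is settled for prod.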