ReleaseEngineering/How To/Process release email
This is a list of automatically generated emails you should expect to receive as a release engineer at Mozilla. It is not complete.
Note that email is not a good notification methodology, and better systems should always be preferred. However, it is often all that is available for the audience that needs the notification. To minimize the pain of email notifications, follow these guidelines:
- email should go to a unique address for the service. This can be achieved by using "plus addresses" (preferred due to positive filtering criteria). (Note: AWS SES refers to these as "labels".)
- if not possible, the message MUST have a unique start to the subject field (brittle).
- email should be documented on this page.
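As a minimal sketch of why plus addresses make positive filtering easy, the following Python splits a recipient address into mailbox, label, and domain. The helper name `split_plus_address` and the `PlusAddress` type are hypothetical and are not part of any RelEng tool.

```python
from typing import NamedTuple, Optional


class PlusAddress(NamedTuple):
    mailbox: str
    label: Optional[str]  # text after the first "+", or None if there is none
    domain: str


def split_plus_address(address: str) -> PlusAddress:
    """Split 'mailbox+label@domain' into its parts (hypothetical helper)."""
    local, _, domain = address.strip().lower().rpartition("@")
    mailbox, plus, label = local.partition("+")
    return PlusAddress(mailbox, label if plus else None, domain)


print(split_plus_address("release+aws@mozilla.com"))
```

A filter can then key on the label alone (here `aws`) instead of guessing from the subject line.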
Some email is also routed to archives, which you may prefer to search instead of joining a list to receive emails:
Index
Note: The "Wildcard" column gives a suggestion on how to filter for that email.
Field | Wildcard | Further Notes |
---|---|---|
Subject | collapse report | #Performance Metrics |
Subject | Suspected machine issue (* | Not an actionable email at this point. (from: nobody@cruncher; see also bug 825625) |
Subject | Talos Suspected machine issue * | if you don't know, you don't care |
Subject | Try submission * | to: autolanduser@mozilla.com |
Subject | [vcs2vcs] alert_major_errors* | Major processing error. Make sure build duty and/or hwine know the details. |
Subject | [vcs2vcs]: git.m.o push N failed: * | Single occurrence: related to git.m.o/releases/gecko.git. If repeated, this is a major processing error; make sure build duty and/or hwine know the details. |
Subject | [vcs2vcs] process delays* | If repeated, this is a major processing error; make sure build duty and/or hwine know the details. |
Subject | [release-runner] failed | How to investigate release runner failures. (ignore if a release isn't actively being started) |
To | release+amp@mozilla.com | Test Google Play store account. Blassey and Snorp have access. RelEng does not have access at this time. |
To | release+aws@mozilla.com | AWS admin email, service notifications & marketing. See list of AWS emails, contact catlee if unsure how to handle. |
To | release+bitbucket@mozilla.com | Mozilla Bitbucket Admin email (contact hwine for now) |
To | release+aws-sanity-check@mozilla.com | Output from cruncher aws_sanity_checker.py (contact rail) |
To | release+ec2.*@mozilla.com | error output from crontab on the indicated machine. FIX ISSUE! |
To | release+sns@mozilla.com | SNS issue notifications from various services. FIX ISSUE! |
To | release+chromecast@mozilla.com | Developer account for Chromecast app support bug 1037018 (details) |
To | release+v2v-gh@mozilla.com | Primary email for github account moz-v2v-gh. Contact vcs-sync folk |
To | release+roku@mozilla.com | Primary email for Roku account, mfinkle is dev contact |
To | release+signaddons@mozilla.com | Primary email for signing addons in automation via API |
To | release+ubuntu-store@mozilla.com | Primary email for Ubuntu Store |
To | release+mozdef@mozilla.com | Security alerts from infosec's Mozdef server. Alert the team and infosec if you find suspicious activity. |
To | release+moc_notifications@mozilla.com | Something from the MOC. Action depends on content. (Cited in mana.) |
To | release+appleagent@mozilla.com | Related to Apple ID account -- bring to manager's attention if lots of activity. |
To | release+wcw@mozilla.com & release+wmw@mozilla.com | Requests for Wednesday Change Window (mana link to come). CiDuty or manager should respond. |
To | release+github@mozilla.com | GitHub private repo access, esp partners + xpi. |
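The wildcards in the table can be applied mechanically. The sketch below is one possible reading, assuming that `*` means "any text", that every other character (including `[`, `(`, `+` and `.`) is literal, and that a pattern must match from the start of the field. The function names are hypothetical. Note that a shell-style matcher such as `fnmatch` would misread `[vcs2vcs]` as a character class, which is why the pattern is escaped by hand here.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Treat '*' as 'any text' and every other character literally."""
    pieces = (re.escape(piece) for piece in pattern.split("*"))
    return re.compile(".*".join(pieces), re.IGNORECASE)


def matches(pattern: str, value: str) -> bool:
    # Anchored at the start only (an assumption): most table entries
    # describe the beginning of a Subject or a full To address.
    return wildcard_to_regex(pattern).match(value) is not None


print(matches("[vcs2vcs] process delays*", "[vcs2vcs] process delays on somehost"))
print(matches("release+ec2.*@mozilla.com", "release+ec2.somehost@mozilla.com"))
```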
Contents
- 1 Index
- 2 Performance Metrics
- 3 vcs2vcs System
- 4 Release runner
- 5 Amazon EC2 Instance scheduled for retirement
- 6 SNS Notifications from AWS
- 7 Mail to release+chromecast@mozilla.com
- 8 Security Alerts from Mozdef
- 9 Mail to release+moc_notifications@mozilla.com
- 10 Mail to release+appleagent@mozilla.com
- 11 Sample
- 12 Mail to release+adhoc-signing@mozilla.com
Performance Metrics
Why we get them
We get various emails containing raw data that relates to a performance bottleneck at some point in time. Typically these are produced by cron jobs, and so are received regularly regardless of metric status. (I.e. they may not require any action.)
What is sending them
Since this is a "catch all" category, various tools send them. Check the full headers for information on sender and source machine as needed.
What to do when one is received
If you don't know what it's about, you don't need to deal with it beyond setting up a filter to ignore it.
How to silence or acknowledge this alert
It's not an alert, so they'll keep coming until the end of time. Filter them if you're not involved with them.
Future plans
Adhoc, so varies by email. Theoretically, these should be transitional, and moved into automation and alerting as soon as the metric is understood.
How to best filter these emails
Since these are adhoc, you'll need adhoc filters. It would be nice if folks used a common prefix on subjects, such as "[releng metrics]".
vcs2vcs System
Why we get them
These emails are the interim notification for vcs2vcs system, and indicate an error that must be addressed. The b2g project is dependent upon parts of the vcs2vcs system, as are other developers and partners.
What is sending them
All emails are sent (perhaps indirectly) by a script from vcs2vcs tools. The hosts sending the email will be one of the ones listed in the configs. Full details of how each script is run, including troubleshooting tips, are in the docs (a formatted copy may be online here).
What to do when one is received
- if the subject contains "[vcs2vcs] AUTOFIX process delays", then look for another email within a few minutes and proceed as follows:
  - if a second email from the same host follows almost immediately with a subject of "[vcs2vcs] process delays", then the AUTOFIX failed, which is an unexpected condition. Page hwine.
  - if no follow-up email is received, the AUTOFIX worked. Log in bug 829025 and delete (or just delete and leave for hwine to log).
- if the subject contains "[vcs2vcs] process delays" and is repeated every 20 minutes, this is a service outage: one or more repositories are no longer being updated. The email contents will give specific errors. Consult the troubleshooting section of the docs (above) for guidance and/or PAGE hwine.
  - Unfortunately, there appear to be a few race conditions between scripts, so a single occurrence of the email may be a false positive. (Bug 839595 was filed to track this.)
- if the subject contains "[vcs2vcs] alert_major_errors alert", this is a major problem: one or more repositories are no longer being updated. The email contents will give specific errors. Consult the troubleshooting section of the docs (above) for guidance and/or contact hwine.
  - The most common cause of this is hg repo corruption. The recovery is scripted, but can take some time. Please add to bug 808129 if you fix it, or block that bug with a new bug.
  - NOTE: you may receive an additional email after the root cause is resolved. (The alert checks on the hour for problems in the prior hour.)
- if the subject contains "[vcs2vcs]: git.m.o push N failed for gecko.git:", this is a (usually) transient problem with pushing gecko.git (the partner-facing gecko repository) to either git.m.o or git staging. Two pushes are tried each iteration, and both should succeed. Each push is numbered '1' or '2'. If you see only one email report, the other push already succeeded and the report is ignorable. One or two sets of emails are ignorable; any more need investigation, starting with the health of git.mozilla.org. (Note that the message is short, as this also pages hwine via SMS, where brevity is nice.)
- if the subject is something else, this is likely unexpected output from a cron job. Judge the severity and escalate to hwine appropriately. File a bug to get better diagnosis of this error condition in the future.
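The subject-based triage above can be sketched as a small lookup. This is only an illustration: the action labels (`wait-for-followup` and so on) are shorthand invented for this sketch and are not produced by any vcs2vcs tool, and the subject fragments are copied from the list above.

```python
# Ordered (subject fragment, suggested action) pairs taken from the list above.
# The action labels are this sketch's own shorthand, not vcs2vcs output.
RULES = [
    ("[vcs2vcs] AUTOFIX process delays", "wait-for-followup"),
    ("[vcs2vcs] process delays", "outage-if-repeated"),
    ("[vcs2vcs] alert_major_errors", "major-error"),
    ("[vcs2vcs]: git.m.o push", "transient-push-failure"),
]


def triage(subject: str) -> str:
    """Return the first matching action, or flag unexpected cron output."""
    for fragment, action in RULES:
        if fragment in subject:
            return action
    return "unexpected-cron-output"


print(triage("[vcs2vcs]: git.m.o push 1 failed for gecko.git:"))
```

Whether a "process delays" email is really an outage still depends on it repeating, so a lookup like this only narrows down which bullet above applies.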
How to silence or acknowledge this alert
Resolving the root cause will stop the emails.
Future plans
The system will eventually be transitioned to Developer Productivity (nee Developer Services (nee IT)) for operations. Specific email will be converted to nagios alerts before then.
How to best filter these emails
All of these emails are sent to the addresses of the form: release+vcs2vcs*@mozilla.com. Common sub addresses are:
- release+vcs2vcs
- mail that will have specifics in the Subject line.
- release+vcs2vcs+forward
- mail to vcs2vcs user, forwarded via ~/.forward file.
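For mail clients or scripts that filter on the recipient, the address form above can be checked as in the following sketch. It uses the standard library's `email.utils.getaddresses` to parse a To header; the function name `is_vcs2vcs_recipient` is hypothetical.

```python
from email.utils import getaddresses


def is_vcs2vcs_recipient(to_header: str) -> bool:
    """True if any recipient looks like release+vcs2vcs*@mozilla.com."""
    for _name, addr in getaddresses([to_header]):
        local, _, domain = addr.lower().rpartition("@")
        if domain == "mozilla.com" and local.startswith("release+vcs2vcs"):
            return True
    return False


print(is_vcs2vcs_recipient("release+vcs2vcs+forward@mozilla.com"))
```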
Release runner
Why we get them
Release runner sends e-mail when it fails in any way, for example when it fails to poll ship it after a long period of time or fails to start a submitted release.
What is sending them
- Source: https://hg.mozilla.org/build/tools/file/default/buildfarm/release
- Sent by: buildbot
- Runs on: buildbot-master36 through supervisord
What to do when one is received
How to investigate release runner failures
How to silence or acknowledge this alert
Fix whatever problem release runner has hit. (Sometimes this means waiting out network issues.) There's no way to ack (in the nagios sense) release runner e-mails.
Future plans
They're here to stay.
How to best filter these emails
[release-runner] in the subject.
Amazon EC2 Instance scheduled for retirement
Example
One or more of your Amazon EC2 instances in the us-east-1 region is scheduled for retirement. The following instance(s) will be shut down after 12:00 AM UTC on 2013-10-22.
i-02cc2669
Why we get them
Amazon needs us to move our virtual instance(s) off of certain physical hardware so they can perform maintenance on it.
What is sending them
Automated notification sent by no-reply-aws@amazon.com
What to do when one is received
- determine what host is running on the specified EC2 instance.
- power the instance down in an orderly manner
- start it back up
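The first step, finding which instances are affected, can be sketched by pulling the region and instance IDs out of the notice text. This assumes the email body looks like the example above; the function name is hypothetical, and mapping an instance ID to a hostname is left to whatever inventory you use.

```python
import re

# EC2 instance IDs are "i-" followed by 8 (older) or 17 (newer) hex digits.
INSTANCE_ID = re.compile(r"\bi-(?:[0-9a-f]{17}|[0-9a-f]{8})\b")
# Assumes the wording "in the <region> region" from the example notice.
REGION = re.compile(r"in the ([a-z0-9-]+) region")


def parse_retirement_notice(body: str) -> dict:
    """Extract the region and instance IDs from a retirement notice."""
    region = REGION.search(body)
    return {
        "region": region.group(1) if region else None,
        "instance_ids": INSTANCE_ID.findall(body),
    }


notice = (
    "One or more of your Amazon EC2 instances in the us-east-1 region is "
    "scheduled for retirement. The following instance(s) will be shut down "
    "after 12:00 AM UTC on 2013-10-22.\ni-02cc2669"
)
print(parse_retirement_notice(notice))
```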
How to silence or acknowledge this alert
Future plans
How to best filter these emails
Filter on the sender and subject line.
SNS Notifications from AWS
Example
Anything with the Subject "AWS Notification Message"
Why we get them
We use SNS to deliver notifications about various Amazon services as well as services like Papertrail. These are generally critical alerts that we've set up, and they should be dealt with or investigated in a timely fashion. At the moment, only AWS CloudWatch and Papertrail use this service. We will likely add more in the future, after we get an SNS->irc bot set up, because SNS allows for an easy HTTP/HTTPS endpoint push that other services already integrate with.
What is sending them
The Amazon SNS service notification topic "buildduty"
- arn:aws:sns:us-west-2:314336048151:buildduty
- arn:aws:sns:us-east-1:314336048151:buildduty
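To tell which region's topic sent a given notification, the topic ARN can be split into its fields. This is a minimal sketch based on the general ARN layout `arn:partition:service:region:account-id:resource`; the `SnsTopic` type and function name are hypothetical.

```python
from typing import NamedTuple


class SnsTopic(NamedTuple):
    region: str
    account_id: str
    name: str


def parse_sns_topic_arn(arn: str) -> SnsTopic:
    """Split an SNS topic ARN into region, account ID, and topic name."""
    parts = arn.split(":")
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sns":
        raise ValueError(f"not an SNS topic ARN: {arn!r}")
    return SnsTopic(region=parts[3], account_id=parts[4], name=parts[5])


print(parse_sns_topic_arn("arn:aws:sns:us-west-2:314336048151:buildduty"))
```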
What to do when one is received
Determine what the issue is by parsing the output. Make sure someone is working on fixing the issue (if you're not sure how, at least contact ciduty for their input/advice).
How to silence or acknowledge this alert
Fix the underlying issue to stop the alert.
Future plans
In the near future we intend to send SNS notifications to an irc bot instead of via email.
How to best filter these emails
Ideally you should not filter them except into a high priority folder. You can filter on the Subject or the To address.
Mail to release+chromecast@mozilla.com
Why we get them
The mobile team is adding Chromecast support (ability to fling videos/tabs from a device to a TV). They need a persistent account not linked to a single developer who might leave the company at some point.
What is sending them
These emails come from the
What to do when one is received
Traffic should be light. If the email is not simply Google self-promotion, please forward it to lead mobile devs, namely :blassey and :mfinkle.
How to silence or acknowledge this alert
Future plans
How to best filter these emails
You can either filter on the "To:" field for "release+chromecast@mozilla.com" to catch just these emails, or filter on "From:" for "noreply@google.com" and move all mail from Google (we have multiple accounts mailing us intermittently) to a separate Google subfolder (coop).
Security Alerts from Mozdef
Why we get them
Mozdef is an ELK stack (logging aggregator + parser) run by the infosec team. They're consuming our Papertrail logs, at our request.
2016.09.13: We have asked them to create some preliminary alerts on ssh access to our signing infrastructure. See https://bugzilla.mozilla.org/show_bug.cgi?id=1290261
What is sending them
2016.09.13: the infosec team has a cron job finding ssh activity on the signing infrastructure, and that emails us.
What to do when one is received
2016.09.13: The emails are very new. For now, we most likely want to take a look and see what the 'normal' looks like, so we know when something out of the ordinary happens.
On suspicious email, notify the team and infosec.
How to silence or acknowledge this alert
2016.10.08: These will send once an hour if there is ssh access.
Future plans
2016.09.13: We may change the frequency of the emails to be more immediate, once we know the noise level.
How to best filter these emails
As noted in the table above, these are sent to release+mozdef@mozilla.com
Mail to release+moc_notifications@mozilla.com
Why we get them
Unsure when MOC will use this address.
What is sending them
Humans from MOC will use this address.
What to do when one is received
- Read and handle
How to silence or acknowledge this alert
- depends on context
Future plans
Unknown - check mana to see if anything has changed.
How to best filter these emails
Filter by "to" address.
Mail to release+appleagent@mozilla.com
Why we get them
- 2 step verification
- fall back account
What is sending them
Apple, when folks interact with the release Apple ID agent account.
What to do when one is received
- If you generated it, claim it by reply.
- Unclaimed emails should be escalated to folks with access to release Apple ID accounts
How to silence or acknowledge this alert
- depends on context
Future plans
none
How to best filter these emails
Filter by "to" address.
Sample
Why we get them
Give a brief explanation of what this email is for, what it helps us do and why it should be watched
What is sending them
Include a link to the source of the program sending the email. Include information on which hosts are sending the email, and give information on how program runs. Is it a daemon? Does it have an init script? Do you run it under screen?
What to do when one is received
- if the title contains "[scl-production-puppet-new] <slavename> is waiting to be signed", this is for information and requires no immediate action
- if the title contains "[scl-production-puppet-new] <slavename> has invalid cert", the script will try once to clean the cert before sending the email. If this is successful, you'll see a matching "<slavename> is waiting to be signed" email. The key will be automatically signed
How to silence or acknowledge this alert
Include information on how to make the emails stop
Future plans
provide any future plans for this email. Is it temporary? Is it going to be replaced by a real dashboard? Are you going to add/change things people filter on?
How to best filter these emails
provide insight on how to filter these emails. Is there a distinguishing header? Is it always from a specific host, or family of hosts? Is there a distinctive subject?
Mail to release+adhoc-signing@mozilla.com
Why we get them
- adhoc repo release promotion
What is sending them
https://github.com/mozilla-releng/adhoc-signing relpro
What to do when one is received
- Make sure we're expecting an adhoc signing operation
How to silence or acknowledge this alert
- depends on context
Future plans
none
How to best filter these emails
Filter by "to" address.