ReleaseEngineering/How To/Process release email

From MozillaWiki

This is a list of automatically generated emails you should expect to receive as a release engineer at Mozilla. It is not complete.

Note that email is not a good notification methodology, and better systems should always be preferred. However, it is often all that is available for the audience that needs the notification. To minimize the pain of email notifications, follow these guidelines:

  • email should go to a unique address for the service. This can be achieved by using "plus addresses" (preferred due to positive filtering criteria). (Note: AWS SES refers to these as "labels".)
    • if not possible, the message MUST have a unique start to the subject field (brittle).
  • email should be documented on this page.
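
As an illustration of why plus addresses make for positive filtering, the sketch below pulls the label out of a To header using only the Python standard library. The function name `plus_labels` and its defaults are assumptions made for this example; this is not an existing RelEng tool.

```python
from email.utils import getaddresses


def plus_labels(to_header, mailbox="release", domain="mozilla.com"):
    """Return the plus-address labels found in a To/Cc header value.

    Hypothetical helper: "release+aws@mozilla.com" yields "aws".
    """
    labels = []
    for _name, addr in getaddresses([to_header]):
        local, _, host = addr.partition("@")
        if host.lower() != domain:
            continue
        base, sep, label = local.partition("+")
        # Only addresses of the form mailbox+label@domain carry a label.
        if base.lower() == mailbox and sep:
            labels.append(label)
    return labels


header = "release+aws@mozilla.com, Someone <release+vcs2vcs+forward@mozilla.com>"
print(plus_labels(header))
```

A mail filter can then route on the label alone, which is less brittle than matching on the start of the subject.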

Some email is also routed to archives, which you may prefer to search instead of joining a list to receive emails:

Index

Note: The "Wildcard" column gives a suggestion on how to filter for that email.

Field | Wildcard | Further Notes
Subject | collapse report | #Performance Metrics
Subject | Suspected machine issue (* | Not an actionable email at this point. (from: nobody@cruncher - s/a bug 825625)
Subject | Talos Suspected machine issue * | if you don't know, you don't care
Subject | Try submission * | to: autolanduser@mozilla.com
Subject | [vcs2vcs] alert_major_errors* | major processing error; make sure build duty and/or hwine know details
Subject | [vcs2vcs]: git.m.o push N failed: * | single occurrence; related to git.m.o/releases/gecko.git. If repeated, this is a major processing error; make sure build duty and/or hwine know details
Subject | [vcs2vcs] process delays* | if repeated, this is a major processing error; make sure build duty and/or hwine know details
Subject | [release-runner] failed | How to investigate release runner failures. (ignore if a release isn't actively being started)
To | release+amp@mozilla.com | Test Google Play store account. Blassey and Snorp have access. RelEng does not have access at this time.
To | release+aws@mozilla.com | AWS admin email, service notifications & marketing. See list of AWS emails, contact catlee if unsure how to handle.
To | release+bitbucket@mozilla.com | Mozilla Bitbucket Admin email (contact hwine for now)
To | release+vcs2vcs@mozilla.com | Output from vcs2vcs hg<->git conversion (details)
To | release+aws-sanity-check@mozilla.com | Output from cruncher aws_sanity_checker.py (contact rail)
To | release+ec2.*@mozilla.com | error output from crontab on the indicated machine. FIX ISSUE!
To | release+sns@mozilla.com | SNS issue notifications from various services. FIX ISSUE!
To | release+update.b2g.o@mozilla.com | low disk space on dogfood update server, see bug 877224
To | release+chromecast@mozilla.com | Developer account for Chromecast app support bug 1037018 (details)
To | release+v2v-gh@mozilla.com | Primary email for github account moz-v2v-gh. Contact vcs-sync folk
To | release+roku@mozilla.com | Primary email for Roku account, mfinkle is dev contact
To | release+signaddons@mozilla.com | Primary email for signing addons in automation via API
To | release+ubuntu-store@mozilla.com | Primary email for Ubuntu Store
To | release+mozdef@mozilla.com | Security alerts from infosec's Mozdef server. Alert team & infosec if you find suspicious activity.
To | release+moc_notifications@mozilla.com | Something from the MOC. Action depends on content. (Cited in mana.)
To | release+appleagent@mozilla.com | Related to Apple ID account -- bring to manager's attention if lots of activity.
To | release+wcw@mozilla.com & release+wmw@mozilla.com | Requests for Wednesday Change Window (mana link to come). CiDuty or manager should respond.
To | release+cot@mozilla.com | GPG expiration monitoring. Alert aki, catlee, garndt.


Performance Metrics

Why we get them

We get various emails containing raw data that relates to a performance bottleneck at some point in time. Typically these are produced by cron jobs, and so are received regularly regardless of metric status. (I.e. they may not require any action.)

What is sending them

Since this is a "catch all" category, various tools send them. Check the full headers for information on sender and source machine as needed.

What to do when one is received

If you don't know what it's about, you don't need to deal with it beyond setting up a filter to ignore it.

How to silence or acknowledge this alert

It's not an alert, so they'll keep coming until the end of time. Filter them if you're not involved with them.

Future plans

Ad hoc, so this varies by email. Theoretically, these should be transitional, and moved into automation and alerting as soon as the metric is understood.

How to best filter these emails

Since these are ad hoc, you'll need ad hoc filters. It would be nice if folks used a common prefix on subjects, such as "[releng metrics]".

vcs2vcs System

Why we get them

These emails are the interim notification for the vcs2vcs system, and indicate an error that must be addressed. The b2g project is dependent upon parts of the vcs2vcs system, as are other developers and partners.

What is sending them

All emails are sent (perhaps indirectly) by a script from vcs2vcs tools. The host sending the email will be one of those listed in the configs. Full details of how each script is run, including troubleshooting tips, are in the docs (a formatted copy may be online here).

What to do when one is received

  • if the subject contains "[vcs2vcs] AUTOFIX process delays", then look for another email within a few minutes and proceed as follows:
    • if a second email from the same host follows almost immediately with a subject of "[vcs2vcs] process delays", then the AUTOFIX failed, which is an unexpected condition. Page hwine.
    • if no followup email is received, the AUTOFIX worked. Log in bug 829025 & delete (or just delete and leave for hwine to log).
  • if the subject contains "[vcs2vcs] process delays" and is repeated every 20 minutes, this is a service outage - one or more repositories are no longer being updated. The email contents will give specific errors. Consult the troubleshooting section of the docs (above) for guidance and/or PAGE hwine.
    • Unfortunately, there appear to be a few race conditions between scripts, so a single occurrence of the email may be a false positive. (bug 839595 filed to track this.)
  • if the subject contains "[vcs2vcs] alert_major_errors alert", this is a major problem - one or more repositories are no longer being updated. The email contents will give specific errors. Consult the troubleshooting section of the docs (above) for guidance and/or contact hwine.
    • The most common cause of this is hg repo corruption. The recovery is scripted, but can take some time. Please add to bug 808129 if you fix it, or block that bug with a new bug.
    • NOTE: you may receive an additional email after the root cause is resolved. (The alert checks on the hour for problems in the prior hour.)
  • if the subject contains "[vcs2vcs]: git.m.o push N failed for gecko.git:", this is a (usually) transient problem with pushing gecko.git (the partner facing gecko repository) to either git.m.o or git staging. Two pushes are tried each iteration - both should succeed. Each push is numbered '1' or '2'. If you see only one email report, the other push already succeeded, and the report is ignorable. One or two sets of emails are ignorable; any more need investigation, starting with the health of git.mozilla.org. (Note that the message is short, as this also pages hwine via sms, where brevity is nice.)
  • if the subject is something else, this is likely unexpected output from a cron job. Judge the severity and escalate to hwine appropriately. File a bug to get better diagnosis of this error condition in the future.
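
The subject-based triage above can be sketched as a small lookup function. This is only an illustration of the decision order: the function name and the returned keywords are hypothetical labels invented for this example, and the subject strings are the ones quoted in the list above.

```python
def triage_vcs2vcs(subject):
    """Return a short action keyword for a vcs2vcs email subject.

    The keywords are hypothetical; the rules paraphrase the list above.
    """
    if "[vcs2vcs] AUTOFIX process delays" in subject:
        # Wait a few minutes: a follow-up "process delays" mail means AUTOFIX failed.
        return "watch-for-followup"
    if "[vcs2vcs] process delays" in subject:
        # A single occurrence may be a false positive; repeats mean an outage.
        return "outage-if-repeated"
    if "[vcs2vcs] alert_major_errors" in subject:
        return "major-error"
    if "[vcs2vcs]: git.m.o push" in subject and "failed" in subject:
        # One or two sets of emails are ignorable; more need investigation.
        return "transient-unless-repeated"
    # Anything else is likely unexpected cron output: judge severity, file a bug.
    return "unexpected-output"


print(triage_vcs2vcs("[vcs2vcs] AUTOFIX process delays"))
```

The function only classifies a single subject line. Deciding whether an email is "repeated" still requires looking at the surrounding mail from the same host.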

How to silence or acknowledge this alert

Resolving the root cause will stop the emails.

Future plans

The system will eventually be transitioned to Developer Productivity (nee Developer Services (nee IT)) for operations. Specific email will be converted to nagios alerts before then.

How to best filter these emails

All of these emails are sent to addresses of the form release+vcs2vcs*@mozilla.com. Common sub addresses are:

release+vcs2vcs 
mail that will have specifics in the Subject line.
release+vcs2vcs+forward 
mail to vcs2vcs user, forwarded via ~/.forward file.
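
For mail clients that cannot express this wildcard directly, the same match can be sketched with the Python standard library's `fnmatch` module. The helper name `is_vcs2vcs_address` is an assumption made for this example.

```python
from fnmatch import fnmatchcase

# The address form given above; "*" matches any (possibly empty) sub address.
VCS2VCS_PATTERN = "release+vcs2vcs*@mozilla.com"


def is_vcs2vcs_address(address):
    """Return True if the address matches the vcs2vcs wildcard form."""
    # Lower-case first so the comparison ignores case in the address.
    return fnmatchcase(address.strip().lower(), VCS2VCS_PATTERN)


for addr in ("release+vcs2vcs@mozilla.com",
             "release+vcs2vcs+forward@mozilla.com",
             "release+aws@mozilla.com"):
    print(addr, is_vcs2vcs_address(addr))
```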

Release runner

Why we get them

Release runner sends e-mail when it fails in any way, for example when it fails to poll ship it after a long period of time, or fails to start a submitted release.

What is sending them

What to do when one is received

How to investigate release runner failures

How to silence or acknowledge this alert

Fix whatever problem release runner has hit. (Sometimes this means waiting out network issues.) There's no way to ack (in the nagios sense) release runner e-mails.

Future plans

They're here to stay.

How to best filter these emails

Filter on "[release-runner]" in the subject.

Amazon EC2 Instance scheduled for retirement

Example

One or more of your Amazon EC2 instances in the us-east-1 region is scheduled for retirement. The following instance(s) will be shut down after 12:00 AM UTC on 2013-10-22.

 i-02cc2669

Why we get them

Amazon needs us to move our virtual instance(s) off of certain physical hardware so they can perform maintenance on it.

What is sending them

Automated notification sent by no-reply-aws@amazon.com

What to do when one is received

  • determine what host is running on the specified EC2 instance.
  • power the instance down in an orderly manner
  • start it back up
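
The first step starts from the instance IDs listed in the notification. A minimal sketch of extracting them from the message body follows; it assumes the IDs appear as in the example above, and it accepts both the 8-character form shown there and the longer 17-character form. The function name is hypothetical, and mapping an ID to a host is not covered here.

```python
import re

# EC2 instance IDs: "i-" followed by 8 or 17 lowercase hex digits.
INSTANCE_ID_RE = re.compile(r"\bi-(?:[0-9a-f]{17}|[0-9a-f]{8})\b")


def retiring_instances(body):
    """Return the unique instance IDs mentioned in a retirement email body."""
    seen = []
    for match in INSTANCE_ID_RE.findall(body):
        if match not in seen:  # keep first-seen order, drop duplicates
            seen.append(match)
    return seen


body = """One or more of your Amazon EC2 instances in the us-east-1 region is
scheduled for retirement. The following instance(s) will be shut down after
12:00 AM UTC on 2013-10-22.

 i-02cc2669
"""
print(retiring_instances(body))
```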

How to silence or acknowledge this alert

Future plans

How to best filter these emails

Filter on the sender and subject line.

SNS Notifications from AWS

Example

Anything with the Subject "AWS Notification Message"

Why we get them

We use SNS to deliver notifications about various Amazon services as well as services like Papertrail. These are generally critical alerts that we've set up, and they should be investigated and dealt with in a timely fashion. At the moment, only AWS CloudWatch and Papertrail use this service. We will likely add more once we get an SNS->IRC bot set up, because SNS allows for an easy HTTP/HTTPS endpoint push that other services already integrate with.

What is sending them

The Amazon SNS service notification topic "buildduty"

  • arn:aws:sns:us-west-2:314336048151:buildduty
  • arn:aws:sns:us-east-1:314336048151:buildduty
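
When a notification arrives, the topic ARN tells you which region it came from. The sketch below splits an SNS topic ARN into its parts. The `SnsTopic` type and `parse_sns_topic_arn` function are hypothetical names for this example, and it assumes the standard "aws" partition used by the two ARNs above.

```python
from typing import NamedTuple


class SnsTopic(NamedTuple):
    region: str
    account_id: str
    name: str


def parse_sns_topic_arn(arn):
    """Split an SNS topic ARN of the form arn:aws:sns:REGION:ACCOUNT:TOPIC."""
    parts = arn.strip().split(":")
    if len(parts) != 6 or parts[:3] != ["arn", "aws", "sns"]:
        raise ValueError("not an SNS topic ARN: %r" % arn)
    return SnsTopic(region=parts[3], account_id=parts[4], name=parts[5])


for arn in ("arn:aws:sns:us-west-2:314336048151:buildduty",
            "arn:aws:sns:us-east-1:314336048151:buildduty"):
    topic = parse_sns_topic_arn(arn)
    print(topic.name, topic.region)
```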

What to do when one is received

Determine what the issue is by parsing the output. Make sure someone is working on fixing the issue (if you're not sure how, at least contact ciduty for their input/advice).

How to silence or acknowledge this alert

Fix the underlying issue to stop the alert.

Future plans

In the near future we intend to send SNS notifications to an IRC bot instead of via email.

How to best filter these emails

Ideally you should not filter them except into a high priority folder. You can filter on the Subject or the To address.

Mail to release+chromecast@mozilla.com

Why we get them

The mobile team is adding Chromecast support (ability to fling videos/tabs from a device to a TV). They need a persistent account not linked to a single developer who might leave the company at some point.

What is sending them

These emails come from the

What to do when one is received

Traffic should be light. If the email is not simply Google self-promotion, please forward it to lead mobile devs, namely :blassey and :mfinkle.

How to silence or acknowledge this alert

Future plans

How to best filter these emails

You can either filter on the "To:" field for "release+chromecast@mozilla.com" to catch just these emails, or filter on "From:" for "noreply@google.com" and move all mail from Google (we have multiple accounts mailing us intermittently) to a separate Google subfolder (coop).



Security Alerts from Mozdef

Why we get them

Mozdef is an ELK stack (logging aggregator + parser) run by the infosec team. They're consuming our Papertrail logs, at our request.

2016.09.13: We have asked them to create some preliminary alerts on ssh access to our signing infrastructure. See https://bugzilla.mozilla.org/show_bug.cgi?id=1290261

What is sending them

2016.09.13: the infosec team has a cron job finding ssh activity on the signing infrastructure, and that emails us.

What to do when one is received

2016.09.13: The emails are very new. For now, we most likely want to take a look and see what the 'normal' looks like, so we know when something out of the ordinary happens.

On suspicious email, notify the team and infosec.

How to silence or acknowledge this alert

2016.10.08: These will send once an hour if there is ssh access.

Future plans

2016.09.13: We may change the frequency of the emails to be more immediate, once we know the noise level.

How to best filter these emails

As noted in the table above, these are sent to release+mozdef@mozilla.com


Mail to release+moc_notifications@mozilla.com

Why we get them

Unsure when MOC will use this address.

What is sending them

Humans from MOC will use this address.

What to do when one is received

  • Read and handle

How to silence or acknowledge this alert

  • depends on context

Future plans

Unknown - check mana to see if anything has changed.

How to best filter these emails

Filter by "to" address.


Mail to release+appleagent@mozilla.com

Why we get them

  • 2 step verification
  • fall back account

What is sending them

Apple, when folks interact with the release Apple ID agent account.

What to do when one is received

  • If you generated it, claim it by reply.
  • Unclaimed emails should be escalated to folks with access to release Apple ID accounts

How to silence or acknowledge this alert

  • depends on context

Future plans

none

How to best filter these emails

Filter by "to" address.

Sample

Why we get them

Give a brief explanation of what this email is for, what it helps us do and why it should be watched

What is sending them

Include a link to the source of the program sending the email. Include information on which hosts are sending the email, and give information on how program runs. Is it a daemon? Does it have an init script? Do you run it under screen?

What to do when one is received

  • if the title contains "[scl-production-puppet-new] <slavename> is waiting to be signed", this is for information and requires no immediate action
  • if the title contains "[scl-production-puppet-new] <slavename> has invalid cert", the script will try once to clean the cert before sending the email. If this is successful, you'll see a matching "<slavename> is waiting to be signed" email. The key will be automatically signed

How to silence or acknowledge this alert

Include information on how to make the emails stop

Future plans

Provide any future plans for this email. Is it temporary? Is it going to be replaced by a real dashboard? Are you going to add/change things people filter on?

How to best filter these emails

Provide insight on how to filter these emails. Is there a distinguishing header? Is it always from a specific host, or family of hosts? Is there a distinctive subject?

Mail to release+cot@mozilla.com

Why we get them

  • gpg pubkey expiration monitoring ([1])

What is sending them

https://tools.taskcluster.net/hooks/project-releng/cot-gpg-keys%2Fexpiration

What to do when one is received

  • Alert the cot-gpg-keys contributors [2]

How to silence or acknowledge this alert

  • depends on context

Future plans

none

How to best filter these emails

Filter by "to" address.