Sheriffing/How To/Retrigger Jobs: Difference between revisions

From MozillaWiki
No edit summary
(→‎Retriggering Nightly Builds: link to new hooks, some cleanup)
 
(19 intermediate revisions by 5 users not shown)
{{Sheriffing How To|Retrigger jobs}}
__TOC__
Sometimes builds and tests need to be retriggered. For some classes of automation/infrastructure failures, this happens automatically and the job is marked in the [https://treeherder.mozilla.org/userguide.html Treeherder UI] as dark blue. For other cases, or if you're doing investigative work e.g. testing for an intermittent failure, you'll need to retrigger the job manually.
'''Retriggering''' a job causes that job (for example a specific test) to run again.
<span style="color:#D00">Tasks that belong to the release process, i.e. that actually ship a build to users, must not be retriggered or backfilled; otherwise the task chain will break. These tasks can be found e.g. on the beta and release trees. Examples are the "UV" or "Snap" tasks. If a task name at the bottom left of Treeherder starts with "release-", it is a release task. If you are unsure, ask in [https://chat.mozilla.org/#/room/#firefox-ci:mozilla.org Firefox CI] on Matrix.</span>


= Manual retriggers =
* Select a job result in Treeherder and click on it.
* This will display a results pane in the left bottom corner with information like Job, Machine, Task, etc.
* To retrigger this job/test, click on the circular arrow with the mouseover text of "Repeat the selected job" at the top of the results pane and the job will be retriggered. You can accomplish the same thing by simply pressing "r" in the results pane when you are logged in to Treeherder.
* To retrigger this job/test, click on the '''circular arrow''' with the ''mouseover text'' of ''"Repeat the selected job"'' at the top of the results pane and the job will be retriggered. You can accomplish the same thing by simply pressing '''"r"''' in the results pane when you are logged in to Treeherder.
[[File:Retrigger ll.jpg|400px|center]]
*If you want to manually retrigger '''multiple jobs''', you can '''add''' them to the '''pinboard''' and click on '''retrigger all''':
[[File:Rt all.jpg|400px|center]]
 
= Backfills =
 
'''Backfilling''' runs the job on the ''previous pushes'' (currently 5), where it often did not run because it was regarded as unnecessary or was skipped to save resources.
To backfill, you '''select a job''' and from the panel you click on the '''“...”''' next to the retrigger button, and choose the '''first option''':
[[File:Backfill.jpg|center]]
This comes in handy when you want to determine the push on which a certain failure started (e.g. for backouts).
<br />
If you want to backfill on '''''a certain number of pushes''''', click on '''“...”''' and then on '''Custom Action''':
[[File:Custom action.jpg|400px|center]]
*Choose backfill and change the '''depth''' to your number of choice:
[[File:Backfill details.jpg|500px|center]]
*And trigger.
 
= Retriggering Nightly Builds =


= Retriggering PGO Builds =
If new Nightlies have to be requested, be it for a backout or because the merge had to happen later than expected, wait for the normal 'Gecko Decision Task' to finish and request the Nightlies after that. Thanks to the 'shippable' builds, the Nightlies will create far fewer jobs and reuse the already running shippable builds.
PGO ('''P'''rofile-'''G'''uided '''O'''ptimization) builds can break, just like anything else. When you fix the bustage with a backout or with help from a developer, you can trigger PGO builds to verify that the issue is fixed.


To retrigger PGO builds on your changeset:
<br />
* Go to the buildapi self-serve page for mozilla-central: https://secure.pub.build.mozilla.org/buildapi/self-serve/mozilla-central
Nightly builds run at '''12:00 / 01:00 AM/PM RO time''' (10:00 and 22:00 UTC). If merges to central do not land before then, nightly builds are automatically scheduled for the last push to central before that time, unless Nightly builds already ran for that push (from the schedule 12 hours earlier).
* Input your changeset into the box labelled "Create new PGO builds on mozilla-central revision" at the bottom of the page.
* Click Submit


As of Jan 17, 2017 Linux and Android nightly builds on m-c are now running in taskcluster so the instructions on how to respin them have been updated for these platforms
<span style="color:#FF0000">'''Note:'''</span>  We only respin nightlies if we miss them by a few minutes or if we need to get something into the next nightly (for example: fixes for crashes). If they have been running for '''''more than half an hour''''', we <span style="color:#FF0000">won’t</span> respin them.
See https://wiki.mozilla.org/ReleaseEngineering/Buildduty/Other_Duties#How_do_I_re-spin_mozilla-central_nightlies.3F


= Retriggering Nightly Builds =
===== Steps =====
The above steps work equally well for triggering new Nightly builds, just use the "Create new nightly builds on mozilla-central revision" input box instead.  
 
======Cancel running undesired Nightly tasks on older push======
*Open mozilla-central and type  <span style="color:#14866d">"'''nightly'''"</span> in the upper right search box, also <span style="color:#14866d">select the running jobs</span> (gray) and <span style="color:#14866d">deselect the rest</span><br>
[[File:Nightly filter.jpg|800px|center]]
*Scroll down to the last merge, you will see "N" builds running<br/>
[[File:Nightly running.jpg|900px|center]]
*Pin all the jobs and cancel them<br>
[[File:Nightly pin.jpg|800px|center]]
*Pin all nightlies and select '''“Clear”'''
 
======Request new Nightlies======
*Open https://tools.taskcluster.net and log in if necessary
*Click on the following links:
**To retrigger nightlies for '''desktop platforms and Android''': https://firefox-ci-tc.services.mozilla.com/hooks/project-releng/cron-task-mozilla-central%2Fnightly-all
**To retrigger nightlies '''only for all desktop platforms''' https://firefox-ci-tc.services.mozilla.com/hooks/project-releng/cron-task-mozilla-central%2Fnightly-desktop
**To retrigger nightlies for '''Android''' https://firefox-ci-tc.services.mozilla.com/hooks/project-releng/cron-task-mozilla-central%2Fnightly-android
*At the top you will see when the task was last run (the picture is for reference)
[[File:Retrigger running.jpg|500px|center]]
*Scroll down and click '''Trigger hook''', a pop-up will be displayed, click '''Trigger Hook''' again
[[File:Trigger hook.jpg|500px|center]]
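The three hook links above differ only in the last part of the hook name. As a minimal sketch (the function name `nightly_hook_url` is hypothetical and not part of any Taskcluster tooling), the following shows how those URLs are composed, including the `%2F` that encodes the slash inside the hook id. It only builds the link; triggering still happens in the web UI as described above.

```python
from urllib.parse import quote

# Values copied from the links listed above.
ROOT_URL = "https://firefox-ci-tc.services.mozilla.com"
HOOK_GROUP = "project-releng"
VARIANTS = ("all", "desktop", "android")


def nightly_hook_url(variant: str) -> str:
    """Build the hook page URL for one Nightly variant (hypothetical helper)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown Nightly variant: {variant!r}")
    hook_id = f"cron-task-mozilla-central/nightly-{variant}"
    # safe="" makes quote() encode the "/" inside the hook id as %2F.
    return f"{ROOT_URL}/hooks/{HOOK_GROUP}/{quote(hook_id, safe='')}"


if __name__ == "__main__":
    for variant in VARIANTS:
        print(nightly_hook_url(variant))
```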


'''CAVEAT''': there are implications to triggering too many Nightly builds in a single day or in quick succession. Please talk with a sheriff first before retriggering Nightly builds.
= How to bulk retrigger build bustages a push at a time =
<span style="color:#FF0000"><big>'''Please note that this will run all failed jobs again, not only build bustages!'''</big></span>
Prerequisites:
Step 1:  Run <span style="color:#b32425">'''pip install taskcluster'''</span> to '''install the taskcluster Python package''' with pip
      If you get the message ''"The program 'pip' is currently not installed"'' then you have to install it by running:
            a. <code> '''sudo apt install -y python-pip''' </code>
            b. <code> when python-pip install is completed, run '''pip install taskcluster''' </code>
Step 2:  '''Download the script''' and '''make it executable''' by running the command: <span style="color:#b32425"> '''sudo wget https://hg.mozilla.org/build/braindump/raw-file/default/taskcluster/tc-filter.py -P /usr/local/sbin/ && sudo chmod +x /usr/local/sbin/tc-filter.py'''</span>
Step 3:  '''Re-sign in''' with the taskcluster tool if you were already signed in
How to use:
# Set the url of the taskcluster instance in which the failing tasks ran (there is also a community taskcluster instance which doesn't get sheriffed): <code>export TASKCLUSTER_ROOT_URL=https://firefox-ci-tc.services.mozilla.com</code>
# '''Sign in''' with the taskcluster tool if you are not already signed in: <span style="color:#b32425">'''eval $(taskcluster signin)'''</span>
# '''run''' <span style="color:#b32425">'''tc-filter.py --state failed --action rerun --graph-id</span> <span style="color:#14866d">geckoDecisionTaskTaskId</span>'''
''Note: Replace geckoDecisionTaskTaskId with the task id being shown on the bottom left when you click on the gecko decision task for the push with the failures.''
= Rerunning build bustages =
=====  How to install taskcluster CLI =====
This tool is needed in order to retrigger some build jobs, especially nightly builds.
Download ''Taskcluster CLI'' on Ubuntu from https://github.com/taskcluster/taskcluster-cli
From your /home/user folder ('''or the location where mozilla-unified is stored'''), run the following commands:
# <span style="color:#b32425">'''sudo wget https://index.taskcluster.net/v1/task/project.taskcluster.taskcluster-cli.latest/artifacts/public/linux-amd64/taskcluster -P /usr/local/sbin/'''</span>
# <span style="color:#b32425">'''sudo chmod +x /usr/local/sbin/taskcluster'''</span>
The tool is now '''installed''' and '''made executable''' in /usr/local/sbin/.
=====  How to use Taskcluster CLI =====
# From the terminal, '''run''' the command: <span style="color:#b32425"> '''eval $(taskcluster signin)'''</span>. This tool <span style="color:#14866d">will only work as long as the terminal remains open.</span>
# When the browser page opens, login using LDAP
# Click '''Create a new clientId''' and go to the end of the page, then click '''Create Client'''.
# Wait a few seconds, then close the browser.
# In the console, the following message should appear: <span style="color:#14866d">''Credentials output as environment variables.''</span>
# '''Run''' <span style="color:#b32425">'''taskcluster task rerun'''</span> <span style="color:#14866d">'''TASK_ID'''</span> (take the TASK_ID from the job summary – go to Treeherder, click the job and on the left side of the window you have Task)
# After following these steps, the console should output either <span style="color:#14866d">''running''</span> or <span style="color:#14866d">''pending''</span>.

Latest revision as of 21:16, 26 March 2024

Sometimes builds and tests need to be retriggered. For some classes of automation/infrastructure failures, this happens automatically and the job is marked in the Treeherder UI as dark blue. For other cases, or if you're doing investigative work, e.g. testing for an intermittent failure, you'll need to retrigger the job manually. Retriggering a job causes that job (for example a specific test) to run again.

Tasks that belong to the release process, i.e. that actually ship a build to users, must not be retriggered or backfilled; otherwise the task chain will break. These tasks can be found e.g. on the beta and release trees. Examples are the "UV" or "Snap" tasks. If a task name at the bottom left of Treeherder starts with "release-", it is a release task. If you are unsure, ask in Firefox CI on Matrix.
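
The naming rule above can be turned into a quick pre-check before retriggering from a script. This is a minimal sketch, not part of Treeherder or Taskcluster: the function name `looks_like_release_task` and the example task names are hypothetical, and the "release-" prefix is only one of the signals mentioned above, so a release task whose name does not carry that prefix is not caught.

```python
def looks_like_release_task(task_name: str) -> bool:
    """Return True if the task name starts with "release-".

    Hypothetical helper: it only encodes the prefix rule from the text
    above and cannot recognize release tasks that are named differently.
    """
    return task_name.strip().startswith("release-")


if __name__ == "__main__":
    # Example task names are made up for illustration.
    for name in ("release-example-task", "test-example/opt-mochitest"):
        if looks_like_release_task(name):
            print(f"{name}: release task, do NOT retrigger or backfill")
        else:
            print(f"{name}: no release- prefix, check the tree before retriggering")
```

A negative result is not proof that a task is safe to retrigger; when in doubt, ask in Firefox CI on Matrix as described above.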


Manual retriggers

  • Select a job result in Treeherder and click on it.
  • This will display a results pane in the left bottom corner with information like Job, Machine, Task, etc.
  • To retrigger this job/test, click on the circular arrow with the mouseover text of "Repeat the selected job" at the top of the results pane and the job will be retriggered. You can accomplish the same thing by simply pressing "r" in the results pane when you are logged in to Treeherder.
Retrigger ll.jpg
  • If you want to manually retrigger multiple jobs, you can add them to the pinboard and click on retrigger all:
Rt all.jpg

Backfills

Backfilling runs the job on the previous pushes (currently 5), where it often did not run because it was regarded as unnecessary or was skipped to save resources. To backfill, select a job, click the “...” next to the retrigger button in the panel, and choose the first option:

Backfill.jpg

This comes in handy when you want to determine the push on which a certain failure started (e.g. for backouts).
If you want to backfill on a certain number of pushes, click on “...” and then on Custom Action:

Custom action.jpg
  • Choose backfill and change the depth to your number of choice:
Backfill details.jpg
  • And trigger.
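
To illustrate why backfilling helps to find the push where a failure started, here is a minimal sketch of the underlying reasoning. It does not talk to Treeherder: the function name `pushes_to_backfill` and the input format (one entry per push, oldest first) are assumptions made for this example.

```python
from typing import List, Optional, Sequence


def pushes_to_backfill(results: Sequence[Optional[bool]]) -> List[int]:
    """Return the indexes of pushes on which the job still has to run.

    `results` is ordered oldest to newest. Each entry is True (job passed),
    False (job failed) or None (job did not run on that push).
    The gap of interest lies between the most recent pass and the first
    failure after it; pushes without a result in that gap are returned.
    """
    last_pass = -1
    first_fail = None
    for index, result in enumerate(results):
        if result is True:
            last_pass = index
            first_fail = None  # a later pass restarts the search
        elif result is False and first_fail is None:
            first_fail = index
    if first_fail is None:
        return []  # no failure after the last pass, nothing to narrow down
    return [i for i in range(last_pass + 1, first_fail) if results[i] is None]


if __name__ == "__main__":
    # Passed on push 0, skipped on pushes 1 and 2, failing from push 3 on.
    print(pushes_to_backfill([True, None, None, False, False]))
```

If the backfill is started from the first failing job, the number of returned pushes is a reasonable lower bound for the depth to enter in the Custom Action dialog.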

Retriggering Nightly Builds

If new Nightlies have to be requested, be it for a backout or because the merge had to happen later than expected, wait for the normal 'Gecko Decision Task' to finish and request the Nightlies after that. Thanks to the 'shippable' builds, the Nightlies will create far fewer jobs and reuse the already running shippable builds.


Nightly builds run at 12:00 / 01:00 AM/PM RO time (10:00 and 22:00 UTC). If merges to central do not land before then, nightly builds are automatically scheduled for the last push to central before that time, unless Nightly builds already ran for that push (from the schedule 12 hours earlier).

Note: We only respin nightlies if we miss them by a few minutes or if we need to get something into the next nightly (for example: fixes for crashes). If they have been running for more than half an hour, we won’t respin them.
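
The schedule and the half-hour rule above can be checked with a few lines of date arithmetic. This is a sketch under assumptions: the helper names are hypothetical, and it treats the scheduled time (10:00 / 22:00 UTC) as the moment the Nightlies started running, which may differ from the real start time shown in Treeherder.

```python
from datetime import datetime, timedelta, timezone

NIGHTLY_HOURS_UTC = (10, 22)            # schedule stated above
RESPIN_WINDOW = timedelta(minutes=30)   # "more than half an hour" rule


def latest_nightly_start(now: datetime) -> datetime:
    """Return the most recent scheduled Nightly start at or before `now`.

    `now` must be a timezone-aware datetime; it is converted to UTC.
    """
    now = now.astimezone(timezone.utc)
    candidates = [
        now.replace(hour=hour, minute=0, second=0, microsecond=0)
        - timedelta(days=days_back)
        for days_back in (0, 1)
        for hour in NIGHTLY_HOURS_UTC
    ]
    return max(c for c in candidates if c <= now)


def within_respin_window(now: datetime) -> bool:
    """True if the last scheduled Nightlies started at most 30 minutes ago."""
    elapsed = now.astimezone(timezone.utc) - latest_nightly_start(now)
    return elapsed <= RESPIN_WINDOW


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("last scheduled start:", latest_nightly_start(now).isoformat())
    print("within respin window:", within_respin_window(now))
```

The result only covers the timing rule; whether a respin is justified (for example to ship a crash fix) remains a sheriff decision.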

Steps
Cancel running undesired Nightly tasks on older push
  • Open mozilla-central and type "nightly" in the upper right search box, also select the running jobs (gray) and deselect the rest
Nightly filter.jpg
  • Scroll down to the last merge, you will see "N" builds running
Nightly running.jpg
  • Pin all the jobs and cancel them
Nightly pin.jpg
  • Pin all nightlies and select “Clear”
Request new Nightlies
Retrigger running.jpg
  • Scroll down and click Trigger hook, a pop-up will be displayed, click Trigger Hook again
Trigger hook.jpg

CAVEAT: there are implications to triggering too many Nightly builds in a single day or in quick succession. Please talk with a sheriff first before retriggering Nightly builds.

How to bulk retrigger build bustages a push at a time

Please note that this will run all failed jobs again, not only build bustages!

Prerequisites:

Step 1: Run pip install taskcluster to install the taskcluster Python package with pip

      If you get the message "The program 'pip' is currently not installed" then you have to install it by running:
           a.  sudo apt install -y python-pip 
           b.  when python-pip install is completed, run pip install taskcluster 

Step 2: Download the script and make it executable by running the command: sudo wget https://hg.mozilla.org/build/braindump/raw-file/default/taskcluster/tc-filter.py -P /usr/local/sbin/ && sudo chmod +x /usr/local/sbin/tc-filter.py

Step 3: Re-sign in with the taskcluster tool if you were already signed in

How to use:

  1. Set the url of the taskcluster instance in which the failing tasks ran (there is also a community taskcluster instance which doesn't get sheriffed): export TASKCLUSTER_ROOT_URL=https://firefox-ci-tc.services.mozilla.com
  2. Sign in with the taskcluster tool if you are not already signed in: eval $(taskcluster signin)
  3. run tc-filter.py --state failed --action rerun --graph-id geckoDecisionTaskTaskId

Note: Replace geckoDecisionTaskTaskId with the task id being shown on the bottom left when you click on the gecko decision task for the push with the failures.
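
The "How to use" steps can also be assembled in a small wrapper, which avoids typos in the flags. This is a sketch only: the helper names are hypothetical, the flags are exactly those quoted in step 3, and the code just builds the command and environment without running anything.

```python
import os
import shlex
from typing import Dict, List, Optional

# Root URL of the sheriffed Taskcluster instance, as given in step 1.
ROOT_URL = "https://firefox-ci-tc.services.mozilla.com"


def tc_filter_rerun_command(decision_task_id: str) -> List[str]:
    """Build the tc-filter.py call from step 3 for one Gecko decision task."""
    return [
        "tc-filter.py",
        "--state", "failed",
        "--action", "rerun",
        "--graph-id", decision_task_id,
    ]


def tc_filter_environment(base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and set TASKCLUSTER_ROOT_URL as in step 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["TASKCLUSTER_ROOT_URL"] = ROOT_URL
    return env


if __name__ == "__main__":
    # Placeholder id; replace it with the real decision task id.
    command = tc_filter_rerun_command("geckoDecisionTaskTaskId")
    print(shlex.join(command))
```

To execute it, the command and environment could be passed to `subprocess.run(command, env=tc_filter_environment(), check=True)` from the same shell session in which `eval $(taskcluster signin)` was run, because the credentials live in that session's environment variables.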

Rerunning build bustages

How to install taskcluster CLI

This tool is needed in order to retrigger some build jobs, especially nightly builds. Download Taskcluster CLI on Ubuntu from https://github.com/taskcluster/taskcluster-cli

From your /home/user folder (or the location where mozilla-unified is stored), run the following commands:

  1. sudo wget https://index.taskcluster.net/v1/task/project.taskcluster.taskcluster-cli.latest/artifacts/public/linux-amd64/taskcluster -P /usr/local/sbin/
  2. sudo chmod +x /usr/local/sbin/taskcluster

The tool is now installed and made executable in /usr/local/sbin/.

How to use Taskcluster CLI
  1. From the terminal, run the command: eval $(taskcluster signin). This tool will only work as long as the terminal remains open.
  2. When the browser page opens, login using LDAP
  3. Click Create a new clientId and go to the end of the page, then click Create Client.
  4. Wait a few seconds, then close the browser.
  5. In the console, the following message should appear: Credentials output as environment variables.
  6. Run taskcluster task rerun TASK_ID (take the TASK_ID from the job summary – go to Treeherder, click the job and on the left side of the window you have Task)
  7. After following these steps, the console should output either running or pending.
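
Copying the TASK_ID from Treeherder by hand is error-prone, so a small sanity check before step 6 can help. The sketch below is an illustration, not part of the Taskcluster CLI: the helper name is hypothetical, and the check assumes that task IDs are 22-character URL-safe strings. It only builds the command from step 6 and does not run it.

```python
import re
from typing import List

# Assumption: a task ID is 22 characters from the URL-safe base64 alphabet.
TASK_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{22}")


def rerun_command(task_id: str) -> List[str]:
    """Return the `taskcluster task rerun` call for a plausible task ID."""
    task_id = task_id.strip()  # drop whitespace picked up while copying
    if not TASK_ID_PATTERN.fullmatch(task_id):
        raise ValueError(f"this does not look like a task ID: {task_id!r}")
    return ["taskcluster", "task", "rerun", task_id]


if __name__ == "__main__":
    # Made-up ID with the expected shape, for illustration only.
    print(" ".join(rerun_command("AbCdEfGhIjKlMnOpQrStUv")))
```

As with the steps above, the command has to be run in the terminal where `eval $(taskcluster signin)` exported the credentials.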