ReleaseEngineering/Archive/Android Tegras: Difference between revisions

Revision as of 09:42, 14 February 2013

295 bytes removed , 14 February 2013

no edit summary

Callek

@@ Line 7: / Line 7: @@
 Nagios will alert us in channel (and send email) after it hits the retry limit for ping attempts.
-See the section [[ReleaseEngineering:How_To:Android_Tegras#Reboot_a_tegra|power cycle a tegra]].
+See the section [[#Reboot_a_tegra|power cycle a tegra]].
 === tegra agent check is CRITICAL ===
@@ Line 24: / Line 24: @@
      https://secure.pub.build.mozilla.org/buildapi/recent/tegra-338
-* If it's burning builds, connect to the associated foopy listed in the dashboard and [[#Disable a tegra|stop the tegra(s).
+* If it's burning builds, connect to the associated foopy listed in the dashboard and [[#Disable a tegra|stop the tegra(s)]].
 <!--
 TODO: This maintenance script needs updating, but docs will be (almost) perfect when done, so don't remove from page
@@ Line 50: / Line 49: @@
 = Basic tegra management =
 == Find what foopy a Tegra is on ==
-Open the Tegra Dashboard - the foopy number is shown to the right
+Open the [http://mobile-dashboard.pub.build.mozilla.org/ Tegra Dashboard] - the foopy number is shown to the right
 == Check status of Tegra(s) ==
-Find the Tegra on the [[http://mobile-dashboard1.build.mtv1.mozilla.com/tegras/ Dashboard]] and then ssh to that foopy
+Find the Tegra on the [[#Find_what_foopy_a_Tegra_is_on|Dashboard]] and then ssh to that foopy
   ssh cltbld@foopy##
@@ Line 66: / Line 65: @@
 == Clear an error flag ==
+This is done automatically, once an hour. But if you need to do it manually for some reason...
 Find the Tegra on the Dashboard, ssh to that foopy and then
   ssh cltbld@foopy05
-  ./check.sh -t tegra-002 -r
+  rm -f /builds/tegra-NNN/error.flg
-== Restart Tegra(s) ==
+== start Tegra(s) ==
 Find out which foopy server you need to be on and then run:
- ssh cltbld@foopy##
- screen -x # or you will hit bug 642369
   cd /builds
-  ./stop_cp.sh tegra-###
+  rm -f /builds/tegra-###/{disabled,error}.flg
-check the '''ps''' output that is generated at the end to ensure that nothing has glitched. If any zombie processes are found then you will need to kill them manually.  Once clear, run
- ./start_cp.sh tegra-###
-== start Tegra(s) ==
+The device should then attempt to start up within 5 minutes, running through verify and then starting buildbot if verify succeeds.
-Find out which foopy server you need to be on and then run:
- screen -x # or you will hit bug 642369
+Should it seem to have trouble starting, you can check its watcher log:
-  cd /builds
+  tail /builds/tegra-###/watcher.log
- ./start_cp.sh [tegra-###]
+And if that is stale you might want to peek at [[#Recover_a_foopy|recover a foopy]]
-If you specify the tegra-### parameter then it will only attempt to start that Tegra, otherwise it will walk thru all Tegras found in /builds/tegra-*
+== Disable a tegra ==
-== stop Tegra(s) ==
 First find the foopy server for the Tegra and then run:
- screen -x # or you will hit bug 642369
   cd /builds
-  ./stop_cp.sh [tegra-###]
+  touch tegra-NNN/disabled.flg
-If you specify the tegra-### parameter then it will only attempt to stop that Tegra, otherwise it will walk thru all Tegras found in /builds/tegra-*
-At the end of the startup process, stop_cp.sh will run
- ps auxw | grep "tegra-###"
+This will then stop the device within 5 minutes, at the next watch_devices cycle.
-to allow you to check that all associated or spawned child processes have been also stopped. Sadly some of them love to zombie and that just ruins any summer picnic.
+Should it seem to have trouble stopping, you can check its watcher log:
+ tail /builds/tegra-###/watcher.log
+And if that is stale you might want to peek at [[#Recover_a_foopy|recover a foopy]]
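The flag-file commands in the sections above can be collected into a few small shell functions. This is only a sketch, not a script that exists on the foopies: the function names and the BUILDS_DIR variable are hypothetical, introduced here so the commands can be tried against a scratch directory. The file names (error.flg, disabled.flg, watcher.log) are the ones used on this page.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the flag-file commands from this page.
# BUILDS_DIR is an assumption (defaults to /builds) so the functions
# can be exercised outside a real foopy.
BUILDS_DIR="${BUILDS_DIR:-/builds}"

# Clear only the error flag (normally done automatically once an hour).
tegra_clear_error() {
    rm -f "$BUILDS_DIR/$1/error.flg"
}

# Start a tegra: remove both flags so the watcher can bring it up.
tegra_start() {
    rm -f "$BUILDS_DIR/$1/disabled.flg" "$BUILDS_DIR/$1/error.flg"
}

# Disable a tegra: the next watch_devices cycle should stop it.
tegra_disable() {
    touch "$BUILDS_DIR/$1/disabled.flg"
}

# Show the end of the watcher log when a device seems stuck.
tegra_watcher_log() {
    tail "$BUILDS_DIR/$1/watcher.log"
}
```

For example, after sourcing the functions on the right foopy, `tegra_disable tegra-NNN` followed later by `tegra_start tegra-NNN` mirrors the manual disable and start steps.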
 == Reboot a tegra ==
@@ Line 120: / Line 109: @@
 If rebooting via PDU does not clear the problem, here are things to try:
 * reboot again - fairly common to have 2nd one clear it
-** especially if box responsive to ping & telnet (port 20701) after first reboot
+** especially if the device is responsive to ping & telnet (port 20701) after the first reboot
 == Recover a foopy ==
@@ Line 130: / Line 119: @@
   screen -x
   cd /builds
-  ./stop_cp.sh
+  rm -f tegra-*/watcher.lock
-  ./start_cp.sh
+  ./watch_devices.sh
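Before running the recovery steps above, it can help to see which watcher logs have actually gone stale. The following is a rough sketch under assumptions: the function name, the BUILDS_DIR variable and the 10-minute STALE_MIN threshold are all hypothetical (the threshold is chosen only because it is longer than the 5-minute watcher cycle mentioned on this page).

```shell
#!/bin/sh
# Hypothetical staleness check for watcher logs.
# BUILDS_DIR and STALE_MIN are assumptions, not settings used by the foopies.
BUILDS_DIR="${BUILDS_DIR:-/builds}"
STALE_MIN="${STALE_MIN:-10}"

# Print every watcher.log not modified within the last STALE_MIN minutes.
# Note: -mmin is supported by GNU and BSD find but is not part of POSIX.
stale_watcher_logs() {
    find "$BUILDS_DIR" -name watcher.log -mmin +"$STALE_MIN"
}
```

If `stale_watcher_logs` prints nothing, the watchers are probably still cycling and the problem is more likely with an individual device than with the foopy.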
 = Advanced tegra management =