ReleaseEngineering/Archive/Android Tegras: Difference between revisions

Line 7: Line 7:
Nagios will alert us in channel (and send email) after it hits the retry limit for ping attempts.
Nagios will alert us in channel (and send email) after it hits the retry limit for ping attempts.


See the section [[ReleaseEngineering:How_To:Android_Tegras#Reboot_a_tegra|power cycle a tegra]].
See the section [[#Reboot_a_tegra|power cycle a tegra]].


=== tegra agent check is CRITICAL ===
=== tegra agent check is CRITICAL ===
Line 24: Line 24:
     https://secure.pub.build.mozilla.org/buildapi/recent/tegra-338  
     https://secure.pub.build.mozilla.org/buildapi/recent/tegra-338  


* If it's burning builds, connect to the associated foopy listed in the dashboard and [[#Disable a tegra|stop the tegra(s).
* If it's burning builds, connect to the associated foopy listed in the dashboard and [[#Disable a tegra|stop the tegra(s)]].
 
<!--
<!--
TODO: This maintenance script needs updating, but docs will be (almost) perfect when done, so don't remove from page
TODO: This maintenance script needs updating, but docs will be (almost) perfect when done, so don't remove from page
Line 50: Line 49:
= Basic tegra management =
= Basic tegra management =
== Find what foopy a Tegra is on ==
== Find what foopy a Tegra is on ==
Open the Tegra Dashboard - the foopy number is shown to the right
Open the [http://mobile-dashboard.pub.build.mozilla.org/ Tegra Dashboard] - the foopy number is shown to the right


== Check status of Tegra(s) ==
== Check status of Tegra(s) ==
Find the Tegra on the [[http://mobile-dashboard1.build.mtv1.mozilla.com/tegras/ Dashboard]] and then ssh to that foopy
Find the Tegra on the [[#Find_what_foopy_a_Tegra_is_on|Dashboard]] and then ssh to that foopy


  ssh cltbld@foopy##
  ssh cltbld@foopy##
Line 66: Line 65:


== Clear an error flag ==
== Clear an error flag ==
This is done automatically, once an hour. But if you need to do it manually for some reason...
Find the Tegra on the Dashboard, ssh to that foopy and then
Find the Tegra on the Dashboard, ssh to that foopy and then


  ssh cltbld@foopy05
  ssh cltbld@foopy05
  ./check.sh -t tegra-002 -r
  rm -f /builds/tegra-NNN/error.flg
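
If you clear flags by hand often, the manual step can be wrapped in a small shell function. This is only a sketch: <code>clear_error_flag</code> is a hypothetical helper, not part of the foopy tooling, and it assumes the <code>/builds/tegra-NNN</code> layout shown in the command above.

```shell
# Hypothetical helper (not part of the foopy tooling): clear the error
# flag for one Tegra. The second argument overrides the builds directory,
# which is assumed to be /builds as in the command above.
clear_error_flag() {
  tegra="$1"
  builds_dir="${2:-/builds}"
  if [ ! -d "$builds_dir/$tegra" ]; then
    echo "no such device directory: $builds_dir/$tegra" >&2
    return 1
  fi
  # rm -f succeeds even when the flag is already gone
  rm -f "$builds_dir/$tegra/error.flg"
}
```

For example, <code>clear_error_flag tegra-002</code> run as cltbld on the right foopy. Checking for the device directory first guards against a mistyped tegra name, which a bare <code>rm -f</code> would ignore silently.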
 
== Restart Tegra(s) ==


== start Tegra(s) ==
Find out which foopy server you need to be on and then run:
Find out which foopy server you need to be on and then run:


ssh cltbld@foopy##
screen -x # or you will hit bug 642369
  cd /builds
  cd /builds
  ./stop_cp.sh tegra-###
  rm -f /builds/tegra-###/{disabled,error}.flg
 
check the '''ps''' output that is generated at the end to ensure that nothing has glitched. If any zombie processes are found then you will need to kill them manually.  Once clear, run
 
./start_cp.sh tegra-###


== start Tegra(s) ==
The device should then attempt to start up within 5 minutes, running through verify and then starting buildbot if verify succeeds.
Find out which foopy server you need to be on and then run:


screen -x # or you will hit bug 642369
Should it seem to have trouble starting, you can check its watcher log:
  cd /builds
  tail /builds/tegra-###/watcher.log
./start_cp.sh [tegra-###]
And if that is stale you might want to peek at [[#Recover_a_foopy|recover a foopy]]


If you specify the tegra-### parameter then it will only attempt to start that Tegra, otherwise it will walk thru all Tegras found in /builds/tegra-*
== Disable a tegra ==


== stop Tegra(s) ==
First find the foopy server for the Tegra and then run:
First find the foopy server for the Tegra and then run:
screen -x # or you will hit bug 642369
  cd /builds
  cd /builds
  ./stop_cp.sh [tegra-###]
  touch tegra-NNN/disabled.flg
 
If you specify the tegra-### parameter then it will only attempt to stop that Tegra, otherwise it will walk thru all Tegras found in /builds/tegra-*
 
At the end of the startup process, stop_cp.sh will run


ps auxw | grep "tegra-###"
This will then stop the device within 5 minutes, at the next watch_devices cycle.


to allow you to check that all associated or spawned child processes have been also stopped. Sadly some of them love to zombie and that just ruins any summer picnic.
Should it seem to have trouble starting, you can check its watcher log:
tail /builds/tegra-###/watcher.log
And if that is stale you might want to peek at [[#Recover_a_foopy|recover a foopy]]
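
The disable step and the "is the watcher log stale" check can be sketched as two shell functions. Both names are hypothetical and not part of the foopy tooling. The 10-minute default is an assumption based on the 5-minute watch_devices cycle mentioned above, and <code>find -mmin</code> is a GNU/BSD extension rather than POSIX.

```shell
# Hypothetical helpers (not part of the foopy tooling).

# Mark a Tegra as disabled by creating its disabled.flg.
disable_tegra() {
  tegra="$1"
  builds_dir="${2:-/builds}"          # assumed layout: /builds/tegra-NNN
  if [ ! -d "$builds_dir/$tegra" ]; then
    echo "no such device directory: $builds_dir/$tegra" >&2
    return 1
  fi
  touch "$builds_dir/$tegra/disabled.flg"
}

# Succeed (exit 0) if watcher.log is missing or was last modified more
# than max_age minutes ago.
watcher_log_is_stale() {
  tegra="$1"
  max_age="${2:-10}"                  # assumed threshold, in minutes
  builds_dir="${3:-/builds}"
  log="$builds_dir/$tegra/watcher.log"
  [ -f "$log" ] || return 0
  # find prints the path only when the file is older than max_age minutes
  [ -n "$(find "$log" -mmin +"$max_age")" ]
}
```

A possible use: <code>disable_tegra tegra-123</code>, then a few minutes later <code>watcher_log_is_stale tegra-123 && echo "watcher log is stale"</code>.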


== Reboot a tegra ==
== Reboot a tegra ==
Line 120: Line 109:
If rebooting via PDU does not clear the problem, here are things to try:
If rebooting via PDU does not clear the problem, here are things to try:
* reboot again - fairly common to have 2nd one clear it
* reboot again - fairly common to have 2nd one clear it
** especially if box responsive to ping & telnet (port 20701) after first reboot
** especially if device responsive to ping & telnet (port 20701) after first reboot


== Recover a foopy ==
== Recover a foopy ==
Line 130: Line 119:
  screen -x
  screen -x
  cd /builds
  cd /builds
  ./stop_cp.sh
  rm -f tegra-*/watcher.lock
  ./start_cp.sh
  ./watch_devices.sh
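
To see which locks were actually removed, the <code>rm -f tegra-*/watcher.lock</code> step can be written as a loop. This is a sketch only: <code>clear_watcher_locks</code> is a hypothetical name, and it does not run <code>watch_devices.sh</code> for you.

```shell
# Hypothetical helper (not part of the foopy tooling): remove every
# tegra-*/watcher.lock under the builds directory and report each one.
clear_watcher_locks() {
  builds_dir="${1:-/builds}"
  for lock in "$builds_dir"/tegra-*/watcher.lock; do
    [ -e "$lock" ] || continue        # glob matched nothing
    rm -f "$lock" && echo "removed $lock"
  done
  return 0
}
```

After it finishes, run <code>./watch_devices.sh</code> from <code>/builds</code> as in the steps above.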


= Advanced tegra management =
= Advanced tegra management =