| Line 12: | Line 12: | ||
== list of steps to try == | == list of steps to try == | ||
1. | === For blocked talos === | ||
* <b>Symptoms</b> | |||
** A given talos slave hasn't reported any numbers in a long time (6 to 8 hours or more) | |||
** A given talos slave from a set has been picking up builds in rapid succession and failing each one at browser download/installation | |||
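The first symptom can be checked mechanically if the time of each slave's last reported result is known. The following is only a sketch: the function name, the input mapping and the sample timestamps are made up for illustration, and the 6 hour threshold is simply the lower bound mentioned above.

```python
from datetime import datetime, timedelta

# Threshold taken from the symptom description (6 to 8 hours of silence).
STALE_AFTER = timedelta(hours=6)


def find_stale_slaves(last_reports, now, stale_after=STALE_AFTER):
    """Return the sorted names of slaves whose last report is older than stale_after.

    last_reports maps a slave name to the datetime of its last reported result.
    How those times are collected (waterfall, graph server) is left open here.
    """
    return sorted(
        name
        for name, last_seen in last_reports.items()
        if now - last_seen > stale_after
    )


if __name__ == "__main__":
    # Invented sample data, not real report times.
    now = datetime(2007, 12, 18, 12, 0)
    reports = {
        "qm-mini-ubuntu01": datetime(2007, 12, 18, 11, 30),
        "qm-mini-ubuntu02": datetime(2007, 12, 18, 3, 0),
    }
    print(find_stale_slaves(reports, now))
```

A slave flagged this way is a candidate for the restart steps that follow; it is not proof that the slave is blocked.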
1. Check waterfall at: http://qm-rhel02.mozilla.org:2006/ (mpt-vpn) | |||
* see if slave is connected. | * see if slave is connected. | ||
2. | 2. Restart slave | ||
* login to machine using provided credentials | * login to machine using provided credentials | ||
** VNC (qm-pxp01-05, qm-mini*): | ** VNC (qm-pxp01-05, qm-mini*): | ||
* close running instances of firefox or dialog windows (make sure to check the taskbar) | * close running instances of firefox or dialog windows (make sure to check the taskbar) | ||
* | * On WinXP/Vista | ||
** ctrl-c in the command window, answer yes to terminate buildbot process | ** ctrl-c in the command window, answer yes to terminate buildbot process | ||
** cd c:\ | ** cd c:\ | ||
** on qm-pxp*, 'buildbot start slave' (command does not return) | ** on qm-pxp*, 'buildbot start slave' (command does not return) | ||
** on qm-mini*, 'buildbot start talos-slave' (command does not return) | ** on qm-mini*, 'buildbot start talos-slave' (command does not return) | ||
* | * On Linux/Mac | ||
** login via ssh | ** login via ssh | ||
** 'buildbot stop talos-slave' (ignore 'never saw slave...' message on mac) | ** 'buildbot stop talos-slave' (ignore 'never saw slave...' message on mac) | ||
| Line 33: | Line 38: | ||
'''Note:''' builds are triggered by finished builds on the Tinderbox (Firefox for trunk, Mozilla1.8 for branch). Depending on when the master was started, it may then take up to 10 minutes to recognize a change. If the master is restarted, the first completed Tinderbox builds are often missed, so it can take 30-40 minutes or more to verify that systems are working as expected. | '''Note:''' builds are triggered by finished builds on the Tinderbox (Firefox for trunk, Mozilla1.8 for branch). Depending on when the master was started, it may then take up to 10 minutes to recognize a change. If the master is restarted, the first completed Tinderbox builds are often missed, so it can take 30-40 minutes or more to verify that systems are working as expected. | ||
3. | === For a talos machine reporting strange numbers === | ||
* | |||
* | * <b>Symptoms</b> | ||
* | ** A given talos machine is reporting significantly higher/lower numbers than matching machines. | ||
** buildbot | *** Talos machines reporting to trunk come in sets of three (qm-mini-ubuntu01/02/03, qm-mini-vista01/02/03, etc.) so that outlier results can be spotted. If we see an outlier, we try to fix the configuration on that machine so that it matches its peers. | ||
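Spotting the outlier in a set of three can be sketched as a comparison against the set's median. This is an illustration only: the function name and the 10% tolerance are assumptions, not values this page defines, and the sample numbers used with it should be replaced by real results from the graphs.

```python
from statistics import median


def find_outliers(results, tolerance=0.10):
    """Return the sorted names of machines whose result is far from the set's median.

    results maps a machine name to one test result (for example a page load time).
    tolerance is the allowed relative deviation from the median; 10% is an
    assumed default chosen for this sketch.
    """
    mid = median(results.values())
    return sorted(
        name
        for name, value in results.items()
        if abs(value - mid) > tolerance * abs(mid)
    )


if __name__ == "__main__":
    # Invented numbers for one set of three machines.
    print(find_outliers({
        "qm-mini-ubuntu01": 300.0,
        "qm-mini-ubuntu02": 305.0,
        "qm-mini-ubuntu03": 420.0,
    }))
```

With three machines the median is always one of the three results, so a single misconfigured machine cannot drag the reference value along with it the way it would with a mean.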
* | |||
** cd / | ==== Linux talos machines ==== | ||
* | 1. stop the build slave | ||
* | ~$ buildbot stop talos-slave | ||
* | 2. Is throttling on/correct? | ||
** | * reset the throttling | ||
** buildbot | ~$ sudo cpufreq-set -g userspace | ||
* | ~$ sudo cpufreq-set -g userspace -c 1 | ||
* | ~$ sudo cpufreq-set -f 1000 | ||
~$ sudo cpufreq-set -f 1000 -c 1 | |||
~$ cpufreq-info | |||
analyzing CPU 0: | |||
driver: acpi-cpufreq | |||
CPUs which need to switch frequency at the same time: 0 | |||
hardware limits: 1000 MHz - 1.67 GHz | |||
available frequency steps: 1.67 GHz, 1.50 GHz, 1.33 GHz, 1000 MHz | |||
available cpufreq governors: userspace, conservative, powersave, ondemand, performance | |||
current policy: frequency should be within 1000 MHz and 1.67 GHz. | |||
The governor "userspace" may decide which speed to use | |||
within this range. | |||
current CPU frequency is 1000 MHz. | |||
analyzing CPU 1: | |||
driver: acpi-cpufreq | |||
CPUs which need to switch frequency at the same time: 1 | |||
hardware limits: 1000 MHz - 1.67 GHz | |||
available frequency steps: 1.67 GHz, 1.50 GHz, 1.33 GHz, 1000 MHz | |||
available cpufreq governors: userspace, conservative, powersave, ondemand, performance | |||
current policy: frequency should be within 1000 MHz and 1.67 GHz. | |||
The governor "userspace" may decide which speed to use | |||
within this range. | |||
current CPU frequency is 1000 MHz. | |||
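To avoid checking the cpufreq-info output by eye on every machine, the text can be parsed and compared against the expected state (governor "userspace", 1000 MHz on every CPU). The sketch below assumes the output looks like the sample above; the function names are made up for illustration, and other cpufreq-info versions may word these lines differently.

```python
import re


def parse_cpufreq_info(text):
    """Parse cpufreq-info output into {cpu_number: {"governor": ..., "frequency": ...}}."""
    cpus = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"analyzing CPU (\d+):", line)
        if m:
            current = int(m.group(1))
            cpus[current] = {}
            continue
        if current is None:
            continue  # header lines before the first CPU block
        m = re.search(r'The governor "([\w-]+)"', line)
        if m:
            cpus[current]["governor"] = m.group(1)
        m = re.search(r"current CPU frequency is ([\d.]+ [MG]Hz)", line)
        if m:
            cpus[current]["frequency"] = m.group(1)
    return cpus


def throttling_ok(cpus, governor="userspace", frequency="1000 MHz"):
    """True only if at least one CPU was parsed and all CPUs match the expected state."""
    return bool(cpus) and all(
        info.get("governor") == governor and info.get("frequency") == frequency
        for info in cpus.values()
    )
```

One possible use is to feed it the captured output of cpufreq-info, for example via subprocess.run(["cpufreq-info"], capture_output=True, text=True).stdout, and to reset the throttling as shown above whenever throttling_ok returns False.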
3. Is the random number generator set up correctly? | |||
~$ cd /dev | |||
~$ sudo rm random; sudo mknod random c 1 9 | |||
~$ ls -l | grep random | |||
crw-r--r-- 1 root root 1, 9 2007-12-18 10:48 random | |||
crw-rw-rw- 1 root root 1, 9 2007-12-17 22:24 urandom | |||
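The point of this step is that random and urandom end up with the same device numbers (major 1, minor 9). A small sketch that reads those numbers out of ls -l output in the format shown above; the function names are hypothetical and the regular expression assumes that exact column layout.

```python
import re

# Matches character-device lines such as:
#   crw-rw-rw- 1 root root 1, 9 2007-12-17 22:24 urandom
DEVICE_LINE = re.compile(r"^c\S*\s+\d+\s+\S+\s+\S+\s+(\d+),\s*(\d+)\s+.*\s(\S+)$")


def char_devices(ls_output):
    """Map each character device name in ls -l output to its (major, minor) pair."""
    devices = {}
    for line in ls_output.splitlines():
        m = DEVICE_LINE.match(line.strip())
        if m:
            devices[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return devices


def random_matches_urandom(devices, expected=(1, 9)):
    """True if both random and urandom are present and have the expected numbers."""
    return devices.get("random") == devices.get("urandom") == expected
```

The same numbers can also be read directly with os.stat and os.major/os.minor on st_rdev instead of parsing ls output.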
4. Can you VNC to the machine? | |||
* log in via VNC | |||
* If this fails, log in via ssh and start the VNC server by hand | |||
~$ sudo x11vnc -display :0 -shared -forever -rfbauth /home/mozqa/.vnc/passwd -auth /var/lib/gdm/:0.Xauth -bg | |||
5. Check settings | |||
* Screensaver off | |||
* Auto-update off | |||
* all sleep features off | |||
* Screen size 1280 x 1024 | |||
6. Re-start apache | |||
~$ sudo /etc/init.d/apache2 restart | |||
7. Re-start the buildbot slave and check the numbers after the next successful machine cycle | |||
~$ buildbot start talos-slave | |||
8. If all else fails, reboot the machine | |||
9. Ensure the settings described above are correct | |||
10. Re-start the buildbot slave | |||
~$ buildbot start talos-slave | |||
==== Mac talos machines (Tiger/Leopard) ==== | |||
1. Stop the buildbot slave | |||
~$ buildbot stop talos-slave | |||
2. Check settings | |||
* Screensaver off | |||
* Auto-update off | |||
* all sleep features off | |||
* Screen size 1024 x 768 | |||
3. Ensure correct version of apache is running | |||
~$ sudo apachectl stop | |||
~$ sudo /etc/apache2/bin/apachectl start | |||
4. Re-start the buildbot slave and check the numbers after the next successful machine cycle | |||
~$ buildbot start talos-slave | |||
5. If all else fails, reboot the machine | |||
6. Ensure the settings described above are correct | |||
7. Re-start the buildbot slave | |||
~$ buildbot start talos-slave | |||
==== WinXP talos machines ==== | |||
1. Stop the buildbot slave | |||
* In the open cmd window, press Ctrl-C and answer 'y' to stop the slave | |||
2. Check throttling | |||
* Open speedswitch | |||
* Check Settings | |||
* Machine settings -> Forced Throttle 50% | |||
3. Re-start apache | |||
* Start Menu -> Programs -> Apache -> restart apache server | |||
4. Check settings | |||
* Display Theme: Windows XP | |||
* screensaver off | |||
* auto-update off | |||
* firewall off | |||
* all sleep features off | |||
* screen size 1280 x 1024 | |||
5. Re-start the buildbot slave and check the numbers after the next successful machine cycle | |||
* In the open cmd window, run 'buildbot start talos-slave' | |||
6. If all else fails, reboot the machine | |||
7. Ensure the settings described above are correct | |||
8. Re-start the buildbot slave | |||
* Open cmd | |||
* cd c:/ | |||
* buildbot start talos-slave | |||
==== Vista talos machines ==== | |||
1. Stop the buildbot slave | |||
* In the open cmd window, press Ctrl-C and answer 'y' to stop the slave | |||
2. Check throttling | |||
* Access Control Panel -> Hardware & Sound -> Power Options -> Edit Plan Settings -> Change advanced Settings | |||
** Max processor state 50% | |||
** Min processor state 50% | |||
3. Re-start apache | |||
* Start Menu -> Programs -> Apache -> Re-start apache server -> right click -> 'Run as administrator' | |||
4. Check settings | |||
* screensaver off | |||
* all sleep features off | |||
* screen size 1280 x 1024 | |||
5. Re-start the buildbot slave and check the numbers after the next successful machine cycle | |||
* Start Menu -> cmd -> right click -> 'Run as administrator' | |||
* cd c:/ | |||
* buildbot start talos-slave | |||
6. If all else fails, reboot the machine | |||
7. Ensure the settings described above are correct | |||
8. Re-start the buildbot slave | |||
* Start Menu -> cmd -> right click -> 'Run as administrator' | |||
* cd c:/ | |||
* buildbot start talos-slave | |||
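The "Check settings" lists for the four platforms differ only in a few values, so they can be kept in one table and compared against what is observed on a machine. The sketch below copies the expected values from the lists above; the key names, the platform labels and the function itself are made up for illustration, and how the observed values are gathered is left open.

```python
# Expected settings per platform, as listed in the sections above.
# False means "off". Key names are invented for this sketch.
EXPECTED_SETTINGS = {
    "linux": {"screensaver": False, "auto_update": False, "sleep": False,
              "screen_size": (1280, 1024)},
    "mac": {"screensaver": False, "auto_update": False, "sleep": False,
            "screen_size": (1024, 768)},
    "winxp": {"theme": "Windows XP", "screensaver": False, "auto_update": False,
              "firewall": False, "sleep": False, "screen_size": (1280, 1024)},
    "vista": {"screensaver": False, "sleep": False, "screen_size": (1280, 1024)},
}


def settings_mismatches(platform, observed):
    """Return {setting: (expected, observed)} for every setting that differs.

    A setting missing from observed is reported with an observed value of None.
    """
    expected = EXPECTED_SETTINGS[platform]
    return {
        key: (want, observed.get(key))
        for key, want in expected.items()
        if observed.get(key) != want
    }


if __name__ == "__main__":
    # A Mac slave left at the wrong resolution (invented example).
    print(settings_mismatches("mac", {
        "screensaver": False, "auto_update": False, "sleep": False,
        "screen_size": (1280, 1024),
    }))
```

An empty result means the machine matches the list for its platform; anything else names the setting to fix before restarting the slave.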