ReleaseEngineering/How To/Get a Missing Slave Back Online

From MozillaWiki

Usually you'll find out that a slave is missing when nagios alerts in #build, either with a failed PING or some other alert. There are several possible causes:

"But I can ping it just fine"

If a ping to the same hostname works fine for you, then nagios may be using an old IP address. This is particularly common when systems move around: nagios caches IP addresses each time its configuration is regenerated, so it can keep checking the old address until the next regeneration. Ask relops to take a look.
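Before pinging relops, it can help to confirm the mismatch. The following is a minimal sketch, assuming you can read the address nagios is checking from the alert or the nagios web UI; the function names and the `resolver` parameter are hypothetical and only there to make the check testable.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.gethostbyname_ex):
    """Return the sorted IPv4 addresses the hostname resolves to right now."""
    # gethostbyname_ex returns (canonical_name, alias_list, ip_address_list)
    _name, _aliases, addresses = resolver(hostname)
    return sorted(addresses)


def nagios_ip_is_stale(hostname, nagios_ip, resolver=socket.gethostbyname_ex):
    """True if the address nagios is checking is not in the current DNS answer."""
    return nagios_ip not in resolve_ipv4(hostname, resolver)
```

If `nagios_ip_is_stale("some-slave-hostname", "10.0.0.1")` returns `True` (both values here are placeholders), the alert is most likely pointing at an address the slave no longer uses.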

It seems down, but has -ix- in the hostname

IX boxes have an IPMI interface that sometimes works. See ReleaseEngineering/How Tos/Connect To IPMI. Generally we just use this interface to reboot the system.
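The linked page is the authoritative procedure. Purely as an illustration, here is a sketch of a reboot through the standard `ipmitool` CLI, assuming the management interface accepts IPMI-over-LAN connections; the management hostname and user name are placeholders, not real values from our setup.

```python
import subprocess


def ipmi_power_cycle_command(mgmt_host, user):
    """Build the ipmitool argv for a chassis power cycle.

    mgmt_host and user are hypothetical placeholders. The -E flag makes
    ipmitool read the password from the IPMI_PASSWORD environment variable,
    so it never appears on the command line.
    """
    return [
        "ipmitool",
        "-I", "lanplus",   # IPMI v2.0 over LAN
        "-H", mgmt_host,
        "-U", user,
        "-E",
        "chassis", "power", "cycle",
    ]


def power_cycle(mgmt_host, user):
    """Run the power cycle; raises CalledProcessError if ipmitool fails."""
    subprocess.run(ipmi_power_cycle_command(mgmt_host, user), check=True)
```

Building the argument list separately from running it makes it easy to print and review the command before anything is actually rebooted.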

Cannot get a display on mac

I initially thought the slave was unable to get a display because of a malfunctioning dongle. From the system.log:

May 29 07:09:39 talos-r4-snow-009 screenresolution[1142]: starting screenresolution argv=/usr/local/bin/screenresolution get
May 29 07:09:39 talos-r4-snow-009 screenresolution[1142]: kCGErrorFailure: Set a breakpoint @ CGErrorBreakpoint() to catch errors as they are logged.
May 29 07:09:39 talos-r4-snow-009 screenresolution[1142]: Error: failed to get list of active displays

However, it turns out that the problem was that a recent password update script had gone awry.

This was not an issue with the edid101d but rather a side effect of the passwd change being half implemented. The errors in comment 1 were from cltbld not being able to autologin. Keychain needed to be updated manually.

See bug 759332 for details.
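To check whether another slave is showing the same symptom, the log signature quoted above can be searched for mechanically. This is a minimal sketch, assuming system.log lines follow the syslog-style layout shown in the excerpt; the function name and regular expression are illustrative. A match only says that screenresolution could not list displays. As described above, the cause may be a failed autologin rather than the dongle.

```python
import re

# Syslog-style layout: "May 29 07:09:39 host process[pid]: message"
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<process>[^\[\s]+)\[(?P<pid>\d+)\]: (?P<message>.*)$"
)

FAILURE_TEXT = "failed to get list of active displays"


def find_display_failures(lines):
    """Return (timestamp, host) for each screenresolution display failure."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a syslog-style line; skip it
        if match["process"] == "screenresolution" and FAILURE_TEXT in match["message"]:
            hits.append((match["timestamp"], match["host"]))
    return hits
```

It can be fed an open file directly, for example `find_display_failures(open("/var/log/system.log"))`, since iterating a file yields its lines.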