ReleaseEngineering/How To/Fix the RAID config on an HP Machine
HP machines of this class can lose their RAID config at boot. This problem is tracked in bug 779487. If you see an HP machine sitting at a "Welcome to the Mozilla Corporation Network Installer" screen, it needs to have its RAID config fixed. You can do that with the following steps:
- Re-open (or create) the host-specific bug blocked by bug 779487, make sure it is assigned to "Server Operations: DCOps", and add the current details.
- Wait for DCOps to do their thing. This mostly occurs on mock builders, and production has burst coverage via AWS.
DCOps is working closely with HP on this problem, and there may be new patches and/or firmware they will want to apply. Letting them do the work ensures the latest information is used.
Urgent Workaround Instructions
If you need the host back in service ASAP (e.g. it's the only mock builder in staging), you can use the following steps to temporarily work around the issue:
- Assign the bug to yourself, and indicate you're going to attempt the workaround.
- Open the management console for the machine (for example, http://bld-centos6-hp-004-mgmt.build.mozilla.org). If this doesn't work from your local machine, RDP to winadmin and open it from there.
- Click through the "This connection is untrusted" warning and log in.
- Open the Remote Console ("Remote Console" → "Launch" under Java Integrated Remote Console). Click through any warnings/confirmations it throws up.
- Reset the machine ("Power Switch" → "Reset")
- Once the "HP Smart Array" screen comes up, press F8 until you get into the "Option Rom Configuration for Arrays" screen. If you end up in the main BIOS setup, start again.
- Hit ESC once to get to the main menu of this configuration screen.
- Choose "Create Logical Drive"
- Accept the defaults: "Port 1, Box 1, Bay 1" for the drive, "RAID 0" for the RAID config, "Use one drive as spare" unchecked, and "Maximum Boot partition" set to "Disable".
- Press F8 to confirm, Enter to continue, then ESC to exit.
- The machine should boot to CentOS and reenter the slave pool.
- Document that you performed this procedure in the bug and resolve the bug. (There's no action for DCOps if the machine isn't hanging at the prompt.)
- If the workaround did not succeed, make sure the bug is unassigned and in the component "Server Operations: DCOps".
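For the management console step above, the URL can be derived from the host name. The following is a minimal sketch, assuming every HP host follows the same pattern as the single example given (short host name plus "-mgmt" under build.mozilla.org). The function name mgmt_console_url and the default domain are hypothetical, chosen only for illustration; check the result against the inventory before relying on it.

```python
def mgmt_console_url(hostname: str, domain: str = "build.mozilla.org") -> str:
    """Return the assumed management console URL for a host.

    Assumption: the console lives at http://<short-name>-mgmt.<domain>,
    extrapolated from the one example URL in the steps above.
    """
    # Accept either a short name or a fully qualified name.
    short = hostname.split(".", 1)[0]
    # Avoid doubling the suffix if a -mgmt name was passed in.
    if short.endswith("-mgmt"):
        short = short[: -len("-mgmt")]
    return f"http://{short}-mgmt.{domain}"


if __name__ == "__main__":
    print(mgmt_console_url("bld-centos6-hp-004"))
```

Passing the fully qualified name (bld-centos6-hp-004.build.mozilla.org) yields the same URL as passing the short name.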