ReleaseEngineering/Mozpool/Handling Panda Failures: Difference between revisions

== Mozpool failed_ state ==


'''failed_power_cycling:''' "The power-cycle operation itself has failed or timed out multiple times"
*Explanation: This is typically caused by a relay board failure.  See Relay Board Failures


'''failed_pxe_booting:''' "While PXE booting, the device repeatedly failed to contact the imaging server from the live image."
*Explanation: Mozpool successfully power-cycled the relay associated with the panda board, but the panda board did not check in with Mozpool within the allotted time.
**Reasons:
**#squashfs file failed to download: This can be caused by the file being renamed on the Apache server or being misspelled in the pxe_config db entry. If the squashfs fails to download, check the Apache error logs and the PXE config in the mozpool db associated with the failed request for a mismatched squashfs URL.
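The mismatch check described above can be sketched in Python. This is only an illustration: the function names, the sample config text, and the assumption that the squashfs appears in the PXE config as a plain http(s) URL ending in ".squashfs" are hypothetical and not taken from Mozpool itself.

```python
import posixpath
import re
from urllib.parse import urlparse


def squashfs_names(pxe_config_text):
    """Return the file name of every .squashfs URL found in the config text."""
    # Assumption: the config references the image as an http(s) URL.
    urls = re.findall(r"https?://\S+?\.squashfs\b", pxe_config_text)
    return [posixpath.basename(urlparse(url).path) for url in urls]


def missing_squashfs(pxe_config_text, served_files):
    """Return squashfs names that the config references but the server lacks."""
    served = set(served_files)
    return [name for name in squashfs_names(pxe_config_text) if name not in served]


if __name__ == "__main__":
    # Hypothetical config line and directory listing, for illustration only.
    config = "append boot=live fetch=http://imaging.example/live/panda-live.squashfs quiet"
    print(missing_squashfs(config, ["panda-live-v2.squashfs"]))
```

A non-empty result suggests the file was renamed on the server or the URL in the config is misspelled. A URL misspelled so that it no longer ends in ".squashfs" would not be found by this sketch.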


'''failed_mobile_init_started:''' "While executing mobile-init, the device repeatedly failed to contact the imaging server from the live image."
*Explanation: The panda board successfully PXE-booted into the live environment but failed to continue executing a second-stage script.


'''failed_sut_verifying:''' "Could not connect to SUT agent."  <B>There is a known bug {{bug|836417}} causing all sut_verifying checks to fail after a reimage.  See [[#Known Issues and Handling ]]</B>  
*Explanation: Mozpool was unable to connect to a panda running SUTAgent. This may indicate that SUTAgent failed to start or has crashed.


'''failed_android_downloading:''' "While installing Android, the device timed out repeatedly while downloading Android"
*Explanation: One or more Android artifacts failed to download during the second-stage script.
**Reasons:
**#partition failed to mount: Formatting succeeded, but the filesystems failed to mount. This can be caused by a faulty SD card or a bad preseed image.


'''failed_android_extracting:''' "While installing Android, the device timed out repeatedly while extracting Android"
*Explanation: The Android artifacts were downloaded successfully, but one or more failed to extract.
**Reasons:   
**#artifact corrupted: If one of the Android artifact tarballs is corrupt, extraction will fail.
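One way to check whether an artifact tarball is corrupt before handing it to a device is to read every member to the end. The sketch below uses Python's standard tarfile module; the function name is hypothetical, and this is not the check Mozpool or the second-stage script performs.

```python
import tarfile
import zlib


def tarball_is_readable(path):
    """Return True if every regular file in the tarball can be read to the end."""
    try:
        # "r:*" lets tarfile detect gzip, bzip2, or xz compression on its own.
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                stream = archive.extractfile(member)
                # Read in 1 MiB chunks so large artifacts do not fill memory.
                while stream.read(1024 * 1024):
                    pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        # Truncated or damaged archives surface as one of these errors.
        return False
```

A False result for an artifact on the imaging server points at the tarball itself rather than at the device or its SD card.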


'''failed_b2g_downloading:''' "While installing B2G, the device timed out repeatedly while downloading B2G"
*Explanation: One or more B2G artifacts failed to download during the second-stage script.
**Reasons:
**#partition failed to mount: Formatting succeeded, but the filesystems failed to mount. This can be caused by a faulty SD card or a bad preseed image.


'''failed_b2g_extracting:''' "While installing B2G, the device timed out repeatedly while extracting B2G"
*Explanation: The B2G artifacts were downloaded successfully, but one or more failed to extract.
**Reasons:   
**#artifact corrupted: If one of the B2G artifact tarballs is corrupt, extraction will fail.


'''failed_b2g_pinging:''' "While installing B2G, the device timed out repeatedly while pinging the new image waiting for it to come up"
*Explanation: After the B2G installation, the device is power-cycled, given time to boot, and then pinged. This failure state is reached after repeated ping attempts fail.
**Reasons: