Sheriffing/How To/Hangs

From MozillaWiki

Hang/Timeout Test Failures need special attention

When a test hangs and we get "application timed out after 330 seconds with no output," we kill the process and, just in case something interesting was happening, put a "crash" stack in the log.

There's almost never anything significant at the top of the stack. Even when we were actually up to something, the (rare) significance is more likely to be buried in some other thread. Most of the time, though, we're just sitting there spinning the event loop, waiting for something that is never going to happen, and the "crash" signature is CrashingThread(void *), or libSystem.B.dylib + 0xd7a, or linux-gate.so + 0x424.

Those signatures mean absolutely nothing beyond what the summary already says: "application timed out after 330 seconds with no output". But thanks to the power of treeherder's bug suggestions, if you look at hang bugs, you'll see that when we get in a hurry, we happily star things like "test_HTMLElement58.html | application timed out after 330 seconds with no output" as "Windows mochitest-1,2,3 hangs on Shutdown | application timed out after 330 seconds with no output".

Please don't put any of those three signatures in bug summaries, please remove them when you see them, and please don't star new unfiled hangs as something utterly different in a different test. The only time we need them in the summary is when we don't have a test name, which should only be the case for Shutdown hangs. There, we just need to gather up the strength not to call an unfiled test timeout a shutdown timeout, even if treeherder suggests it.
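As a rough illustration of the two rules above, here is a minimal Python sketch. It assumes failure lines and bug summaries follow the "name | message" shape seen in the examples on this page. The helper names (strip_generic_signatures, leading_name, same_test) are hypothetical and are not part of treeherder or any Mozilla tool.

```python
# The three generic "crash" signatures named on this page.
GENERIC_SIGNATURES = (
    "CrashingThread(void *)",
    "libSystem.B.dylib + 0xd7a",
    "linux-gate.so + 0x424",
)


def strip_generic_signatures(summary: str) -> str:
    """Drop the generic hang signatures from a bug summary (hypothetical helper)."""
    for signature in GENERIC_SIGNATURES:
        summary = summary.replace(signature, "")
    # Re-join the remaining "|"-separated parts, skipping any left empty.
    parts = [part.strip() for part in summary.split("|")]
    return " | ".join(part for part in parts if part)


def leading_name(line: str) -> str:
    """Return the text before the first "|" (assumed to be the test name)."""
    return line.split("|", 1)[0].strip()


def same_test(failure_line: str, bug_summary: str) -> bool:
    """True only when the failure and the suggested bug name the same test."""
    return leading_name(failure_line) == leading_name(bug_summary)
```

With the example from this page, same_test reports a mismatch between the test_HTMLElement58.html failure and the "hangs on Shutdown" bug, which is the cue to file a new bug rather than accept the suggestion. An exact string comparison is deliberately strict here; a real check would probably need to normalize paths.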