ChrisCooper (talk | contribs) |
=== report crontask ===
There's a crontask on [https://mana.mozilla.org/wiki/display/websites/Releng+Cluster relengwebadm] that runs every minute to generate the builds-4hr.js file. If this crontask hangs, it prevents updates from occurring. This can happen because of DB connection issues or bad data in the DB. In these cases, killing the hung crontask is the appropriate fix. There may also be a stale lockfile under /var/lock/buildapi that needs removing.

However, the crontask takes about 15 minutes to run on a cold cache. If its cache (memcached) has gone cold, killing it before 15 minutes have elapsed only delays the failure.
From [https://bugzilla.mozilla.org/show_bug.cgi?id=1005342#c6 bug 1005342]:
{| class="wikitable"
|-
| ''Dustin J. Mitchell [:dustin] 2014-05-02 16:18:14 PDT''

I looked at 4:01 pacific, and saw that builds-4hr.js.gz had last been generated at 2:47. This is generated by a crontab that runs every minute, and makes a lot of heavy queries against the DB. On a good day with a cold cache, it's about 15 minutes. With a hot cache it can finish comfortably in one minute. It would seem that Matt's heavy query caused the builds-4hr.js.gz queries to go slowly, and thus caused the file's generation to fall behind.

I found the crontask, appropriately dated 2:47, and killed it. Cron started a new one up on the minute, and that completed in something like 45s.

I don't know why the original job didn't complete soon after mpressman cancelled his query.
|}
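The guidance in this section (only kill a report crontask that has already outlived a cold-cache run, then check /var/lock/buildapi for a stale lockfile) could be sketched roughly as below. This is an illustrative sketch, not the tooling actually used on relengwebadm. The page does not name the crontask's command line, so the <code>pattern</code> argument is a placeholder you would have to fill in. The sketch also assumes a Linux <code>ps</code> (procps-ng) that supports the <code>etimes</code> (elapsed seconds) column. Lockfile names are not given either, so the sketch only lists the directory for manual review and deletes nothing.

```python
import os
import signal
import subprocess

COLD_CACHE_SECONDS = 15 * 60  # a cold-cache run takes about 15 minutes
LOCK_DIR = "/var/lock/buildapi"  # directory named in this section


def parse_ps(ps_output):
    """Turn 'PID ELAPSED_SECONDS COMMAND' lines into (pid, age, command) tuples."""
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            rows.append((int(parts[0]), int(parts[1]), parts[2]))
    return rows


def hung_pids(rows, pattern, min_age=COLD_CACHE_SECONDS):
    """PIDs whose command contains `pattern` and that have outlived a cold-cache run."""
    me = os.getpid()  # never report this script itself
    return [pid for pid, age, cmd in rows
            if pattern in cmd and age > min_age and pid != me]


def kill_hung(pattern, dry_run=True):
    """Find hung report jobs; send SIGTERM only when dry_run is False."""
    # Assumption: procps-ng `ps` with the `etimes` column; empty headers suppress the header row.
    out = subprocess.check_output(
        ["ps", "-e", "-o", "pid=", "-o", "etimes=", "-o", "args="], text=True)
    pids = hung_pids(parse_ps(out), pattern)
    for pid in pids:
        print("hung report crontask:", pid)
        if not dry_run:
            os.kill(pid, signal.SIGTERM)
    return pids


def list_lockfiles(lock_dir=LOCK_DIR):
    """List candidate lockfiles for manual review; nothing is deleted here."""
    if not os.path.isdir(lock_dir):
        return []
    return sorted(os.path.join(lock_dir, name) for name in os.listdir(lock_dir))
```

Calling <code>kill_hung("some-pattern")</code> with the default <code>dry_run=True</code> only prints matching PIDs, which makes it easy to confirm that the job really is older than 15 minutes before killing anything. Cron starts a fresh run on the next minute once the hung one is gone.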
= Staging =