ReleaseEngineering/How To/Fix Build4h not updating

Latest revision as of 13:08, 12 December 2017

  • Nagios alert
<nagios-releng> Fri 02:57:18 PDT [4051] builddata.pub.build.mozilla.org:http file age - /buildjson/builds-4hr.js.gz is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:11:57 ago - 2896 bytes in 0.003 second response time (http://m.mozilla.org/http+file+age+-+/buildjson/builds-4hr.js.gz)
  • Causes
    • one possible reason for this is invalid data being imported on one or more buildbot-masters
  • Fix
  • check 'statusdb' for invalid entries:
mysql> select * from builds, build_properties, properties where builds.id = build_properties.build_id and build_properties.property_id = properties.id and name = 'request_times' and value like '"%';
+-----------+-------------+------------+----------+-----------+---------------------+---------------------+--------+------------------------------------------------+-----------+------+-------------+-----------+-----------+---------------+------------+------------------------------+
| id        | buildnumber | builder_id | slave_id | master_id | starttime           | endtime             | result | reason                                         | source_id | lost | property_id | build_id  | id        | name          | source     | value                        |
+-----------+-------------+------------+----------+-----------+---------------------+---------------------+--------+------------------------------------------------+-----------+------+-------------+-----------+-----------+---------------+------------+------------------------------+
| 110001696 |         705 |     615690 |    13071 |       213 | 2016-10-20 09:08:06 | 2016-10-20 10:32:59 |      2 | scheduler                                      |  16077227 |    0 |   403588549 | 110001696 | 403588549 | request_times | postrun.py | "{'128606532': 1476951285L}" |
| 110002178 |         707 |     615690 |    12017 |       213 | 2016-10-20 09:57:12 | 2016-10-20 11:21:48 |      2 | scheduler                                      |  16077868 |    0 |   403590537 | 110002178 | 403590537 | request_times | postrun.py | "{'128608528': 1476952425L}" |
  • in this particular case a number of rows had invalid request_times values, all coming from a single buildbot master (master_id 213 in the rows above): each value was a JSON string wrapping a Python dict repr rather than a JSON object.
  • to fix it, someone had to decode each value as JSON, eval() the resulting string as a Python expression, re-encode the result as JSON, and write it back to the database.
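
The decode / eval / re-encode step described above can be sketched as follows. This is only an illustration, not the script that was actually used: the function name `repair_request_times` is hypothetical, and the sketch targets Python 3, where the Python 2 long suffix (`L`) must be stripped before the value can be parsed, and where `ast.literal_eval` is used in place of a bare `eval()`. Writing the repaired value back to the statusdb is deliberately left out.

```python
import ast
import json
import re

# Python 2 long-integer suffix, e.g. 1476951285L (assumes no such text in the keys).
_LONG_SUFFIX = re.compile(r"(?<=\d)[lL]\b")


def repair_request_times(raw_value):
    """Return a valid JSON object string for a request_times value.

    raw_value is the text stored in the 'value' column. Broken rows hold a
    JSON *string* that wraps a Python dict repr; healthy rows already hold
    a JSON object and are returned unchanged.
    """
    decoded = json.loads(raw_value)          # step 1: decode as JSON
    if not isinstance(decoded, str):
        return raw_value                     # already a proper JSON object
    literal = _LONG_SUFFIX.sub("", decoded)  # drop the Python 2 'L' suffix
    parsed = ast.literal_eval(literal)       # step 2: evaluate as a Python literal
    return json.dumps(parsed)                # step 3: re-encode as JSON


if __name__ == "__main__":
    broken = "\"{'128606532': 1476951285L}\""
    print(repair_request_times(broken))
```

`ast.literal_eval` only accepts Python literals, so it is a safer stand-in for `eval()` when the input comes from a database. Each repaired string would then be written back to the matching row, ideally after backing up the original values.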
  • See also
 Bug 1311964