ReleaseEngineering/Buildduty/SVMeetings/Nov2-Nov6

From MozillaWiki

Upcoming vacation/PTO:

  • alin - Dec 24, Dec 28-31

Meetings every Tuesday and Thursday

Previous meetings:

  • https://wiki.mozilla.org/ReleaseEngineering/Buildduty/SVMeetings/Sept28-Oct2
  • https://wiki.mozilla.org/ReleaseEngineering/Buildduty/SVMeetings/Sept28-Oct2-coop-visit
  • https://wiki.mozilla.org/ReleaseEngineering/Buildduty/SVMeetings/Oct5-Oct9
  • https://wiki.mozilla.org/ReleaseEngineering/Buildduty/SVMeetings/Oct12-Oct16
  • https://wiki.mozilla.org/ReleaseEngineering/Buildduty/SVMeetings/Oct19-Oct23
  • https://wiki.mozilla.org/ReleaseEngineering/Buildduty/SVMeetings/Oct26-Oct30


30.10.2015 - 03.11.2015

New bugs

There are two blocker bugs alerting in buildduty; please check whether they need help or can be assigned to someone.

Could you go through the buildduty report and see whether any of these bugs can be closed? https://secure.pub.build.mozilla.org/builddata/reports/slave_health/buildduty_report.html

Add alert for buildbot lag: https://bugzilla.mozilla.org/show_bug.cgi?id=1220191. You could needinfo catlee to find out where the current graph that measures this lives. This is similar to the bug that checks pending counts: we want alerts when certain buildbot masters hit certain limits.
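One way such a lag alert could work, sketched under assumptions: the lag graph is read through Graphite's render API with format=json, which returns a list of series shaped like {"target": ..., "datapoints": [[value, timestamp], ...]}, and the result is mapped onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and thresholds below are hypothetical, and this is not the script attached to the bug.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def worst_latest_lag(render_json):
    """Return (target, value) for the series whose most recent value is highest.

    Graphite's /render?format=json output is a list of series shaped like
    {"target": name, "datapoints": [[value, timestamp], ...]}; intervals with
    no data come back as null (None after json.loads) and are skipped.
    Returns None if no series has any data.
    """
    worst = None
    for series in json.loads(render_json):
        points = [(ts, value) for value, ts in series["datapoints"] if value is not None]
        if not points:
            continue
        _, latest = max(points)  # the tuple with the newest timestamp
        if worst is None or latest > worst[1]:
            worst = (series["target"], latest)
    return worst


def check_lag(render_json, warn, crit):
    """Map the worst lag across all series to a Nagios (exit_code, message) pair."""
    try:
        worst = worst_latest_lag(render_json)
    except (ValueError, KeyError, TypeError) as exc:
        return UNKNOWN, "UNKNOWN: could not parse Graphite data (%s)" % exc
    if worst is None:
        return UNKNOWN, "UNKNOWN: no datapoints returned"
    target, value = worst
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    message = "%s: %s lag is %s (warn=%s, crit=%s)" % (
        STATUS_NAMES[code], target, value, warn, crit)
    return code, message
```

A wrapper script would fetch the render URL for the lag metric (one series per master), print the message and exit with the returned code so that NRPE/Nagios picks up the state. Taking the worst master means a single lagging master is enough to raise the alert.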

Enable DOM push Mochitests on Android Fennec: https://bugzilla.mozilla.org/show_bug.cgi?id=1214362. This adds new tests on mobile; let me know if you have questions. [Vlad] I will start looking over it.

[alin]

1. https://bugzilla.mozilla.org/show_bug.cgi?id=1220191 - created and uploaded the Python script that checks the age of the backlog; waiting for :catlee's review

2. Cleaned up the buildduty report a little:

   • marked several problem-tracking bugs as resolved:
      • Windows slaves: we have been monitoring them lately and things look good
      • one talos slave: it got re-imaged as a win732 slave
   • did a needinfo on bug 1185116 to see if the slave is still needed, plus some others from yesterday: https://bugzilla.mozilla.org/show_bug.cgi?id=1204970
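The backlog-age check for bug 1220191 boils down to "how long has the oldest pending request been waiting". A minimal sketch, assuming the pending requests can be obtained as (buildername, submitted_at) pairs with submitted_at as a Unix timestamp (for example from the scheduler database; the query itself is not shown). The function names are hypothetical and this is not the script under review.

```python
import time


def backlog_age(submitted_times, now=None):
    """Return the age in seconds of the oldest pending request, or 0 if none.

    Negative ages (e.g. from clock skew) are clamped to 0.
    """
    if not submitted_times:
        return 0
    if now is None:
        now = time.time()
    return max(0, now - min(submitted_times))


def oldest_by_builder(pending, now=None):
    """Group (buildername, submitted_at) pairs and return each builder's backlog age."""
    if now is None:
        now = time.time()
    by_builder = {}
    for buildername, submitted_at in pending:
        by_builder.setdefault(buildername, []).append(submitted_at)
    return {name: backlog_age(times, now) for name, times in by_builder.items()}
```

Measuring the oldest request rather than the number of pending requests is what distinguishes this check from the existing pending-count alert: a short queue can still hide a request that has been stuck for hours.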

[vlad] https://bugzil.la/1214184 - attached the date and time settings for the slave. https://bugzilla.mozilla.org/show_bug.cgi?id=1193002 - Kim, could you provide us with more details?


03.11.2015 - 05.11.2015

[alin]

1. Noticed that the alerts for a high number of pending jobs still use the results returned by the old version of check_pending_builds.py, e.g.: cruncher.srv.releng.scl3.mozilla.com:Pending builds is WARNING: WARNING Pending Builds: 2083 on cruncher. The logs in /var/log/nrpe.log contain entries like the one above, and /usr/lib64/nagios/plugins/custom/check_pending_builds still shows the old version of the script. Q: does it take a certain amount of time before it is synced with the version in the repo (https://hg.mozilla.org/build/braindump/file/tip/nagios-related/check_pending_builds.py)? Or do we need to change this?

2. https://bugzilla.mozilla.org/show_bug.cgi?id=1220191 - create nagios checks for buildbot backlog age and master lag
   • yesterday I updated the script that checks the age of the backlog; waiting for Chris's review
   • looked over the script that creates an alert from graphite data and played with graphite to understand it better
   • tested the script locally using several values for the thresholds
   • we would also need to create a nagios alert to check the lag on the masters. Is there a repo that contains the config files we need to modify?
   • talk to arr for more info on how to create nagios alerts

3. Loaning a dev-linux64-ec2 VM → puppetize.sh gets stuck while running
   • papertrail does not contain any data on the issue
   • connected to aws manager, switched to the buildduty user and then managed to ssh to the instance
   • uptime
   • puppetize.log contains the following:

Contacting puppet server puppet

5 Nov 09:26:06 ntpdate[20126]: 66.228.59.187 rate limit response from server.
5 Nov 09:26:07 ntpdate[20126]: adjust time server 67.18.187.111 offset -0.135954 sec

Got incorrect certificates (!?). It looks like it is failing here: http://hg.mozilla.org/build/puppet/file/tip/modules/puppet/files/puppetize.sh#l123
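For the check_pending_builds.py question above, one quick way to tell whether the deployed plugin has drifted from the repo copy is to compare checksums. A minimal sketch, assuming a copy of check_pending_builds.py from the braindump repo has already been downloaded next to the deployed file; the helper names are hypothetical.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_in_sync(deployed_path, repo_copy_path):
    """True if the deployed plugin is byte-for-byte identical to the repo copy."""
    return sha256_of(deployed_path) == sha256_of(repo_copy_path)
```

For example, is_in_sync("/usr/lib64/nagios/plugins/custom/check_pending_builds", "check_pending_builds.py") returning False would confirm the deployed plugin is stale; it says nothing about how or when the sync is supposed to happen.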

[vlad]

https://bugzilla.mozilla.org/show_bug.cgi?id=1214362