Breakpad/Status Meetings/2015-08-19: Difference between revisions

Revision as of 18:25, 19 August 2015


Meeting Info

Breakpad status meetings occur on Wed at 11:00am Pacific Time.

Conference numbers:

   Vidyo: Stability 
   650-903-0800 x92 conf 98200#
   800-707-2533 (pin 369) conf 98200#

IRC backchannel: #breakpad
Mountain View: Dancing Baby (3rd floor)

Operations Updates

stage is working again
- fair and balanced submitter is running
  - 10% of what we receive is now going to stage
- adi is coming in now
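
The 10% split could be approximated with an independent random draw per crash. This is only a sketch of the idea, not the submitter's actual code; `STAGE_SAMPLE_RATE` and `should_forward_to_stage` are hypothetical names.

```python
import random

# Hypothetical setting: fraction of incoming crashes to copy to stage.
STAGE_SAMPLE_RATE = 0.10


def should_forward_to_stage(rng=random, rate=STAGE_SAMPLE_RATE):
    """Return True for roughly `rate` of calls (one independent draw each)."""
    return rng.random() < rate
```

Passing a seeded `random.Random` as `rng` makes the split reproducible in tests.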

JSON problems recurred in prod
- stage has a patch that filters keys as well as values
- manually submitting a problematic crash leads us to believe stage is fixed
- write out the null bytes
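
Filtering keys as well as values might look like the sketch below. It assumes the offending characters are null bytes and that the crash is already parsed into dicts, lists, and strings; `strip_null_bytes` is a hypothetical helper, not the patch that is on stage.

```python
def strip_null_bytes(value):
    """Recursively remove NUL characters from strings, including dict keys."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        # Clean the key and the value; non-string keys pass through unchanged.
        return {strip_null_bytes(k): strip_null_bytes(v) for k, v in value.items()}
    if isinstance(value, list):
        return [strip_null_bytes(item) for item in value]
    return value
```

Two keys that differ only by null bytes would collapse into one after cleaning, so a real patch may want to detect that case.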

updating supersearchfields is not working
- adrian working on a patch

datadog runs on UDP
- we've been losing windows of time in the metrics
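
Why UDP can lose data silently: the sender gets no acknowledgement. The sketch below sends one StatsD-style gauge datagram; `format_gauge` and `send_metric` are hypothetical helpers, and port 8125 is assumed as the local agent's port.

```python
import socket


def format_gauge(name, value):
    """Build a StatsD-style gauge datagram, e.g. b'webapp.up:1|g'."""
    return f"{name}:{value}|g".encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Send one datagram and return the byte count handed to the OS.

    UDP has no acknowledgement: a successful return says nothing about
    whether any agent actually received the datagram.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

Because a dropped datagram raises no error on the sending side, gaps only show up later as missing windows in the graphs.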

webapp has been registering as down to pingdom/nagios
- the webapp shows high response times, but it is still serving requests in those windows
- can we get newrelic to look?
  - talk to Travis
  - we need 8 hosts for a while
- moztrap was seeing this
  - cache was acclimating pingdom to fast responses
  - whole cache was invalidating at once
- we are not actually down during these times; response times may just be exceeding the monitors' timeouts
- the only thing more dangerous than no alerting is noisy alerting
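
One way to cut the noise is to separate "slow" from "down" and page only after several consecutive hard failures. This is a sketch under assumptions: the probe uses a hard timeout generous enough to measure slow responses, and `classify`, `should_page`, and the threshold of 3 are all hypothetical.

```python
def classify(response_seconds, timeout_seconds):
    """Label one probe; response_seconds is None when no response arrived."""
    if response_seconds is None:
        return "down"
    if response_seconds > timeout_seconds:
        return "slow"
    return "ok"


def should_page(history, consecutive=3):
    """Page only when the last `consecutive` probes were all 'down'."""
    recent = history[-consecutive:]
    return len(recent) == consecutive and all(state == "down" for state in recent)
```

With this split, a slow window produces "slow" entries that can feed a dashboard or a low-priority alert instead of a page.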

pingdom accounts for the team?
- going to take approx 1 month to set us up for pingdom on the mozilla account

sentry is down
- cannot ingest
- should be back by the end of the day

loggly
- Django errors don't end up in syslog, and they don't go to stdout either
- maybe we don't want to continue with loggly
  - the features it has over other log aggregators are not that useful

Other

How many EC2 webhead nodes do we have in prod?
- Wanna scale that down?

Project Updates

Socorro Bug Tracker
this week's bugs

Deployment Triage

http://mzl.la/NuW9Zi

PR Triage

https://prs.paas.allizom.org/mozilla/socorro

QA

Help tracking down an intermittent failure
- ReadTimeout: HTTPSConnectionPool(host='crash-stats.mozilla.com', port=443): Read timed out. (read timeout=10)
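
While the cause is being tracked down, a test could retry on timeout and record each failure. A minimal, library-agnostic sketch; `retry_on_timeout` is a hypothetical helper, and the exception type to catch is passed in by the caller.

```python
import time


def retry_on_timeout(func, attempts=3, delay_seconds=1.0,
                     exceptions=(TimeoutError,), failures=None):
    """Call func(), retrying when it raises one of `exceptions`.

    Each caught exception is appended to `failures` (if given) so that
    intermittent timeouts stay visible instead of being silently hidden.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            if failures is not None:
                failures.append(exc)
            if attempt == attempts:
                raise  # Out of attempts: surface the last timeout.
            time.sleep(delay_seconds)
```

For a test that uses the `requests` library, the caller would pass `exceptions=(requests.exceptions.ReadTimeout,)`. Retries hide real slowness, so the recorded failures are worth reporting even when the call eventually succeeds.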

other business

We're ok with exposing ADI
The Raw Crash JSON has weeeeird keys - https://crash-stats.mozilla.com/admin/supersearch-fields/missing/

Travel, etc

Links
