Breakpad/Status Meetings/2015-08-19
Revision as of 18:25, 19 August 2015
Meeting Info
Breakpad status meetings occur on Wed at 11:00am Pacific Time.
Conference numbers:
Vidyo: Stability
650-903-0800 x92 conf 98200#
800-707-2533 (pin 369) conf 98200#
IRC backchannel: #breakpad
Mountain View: Dancing Baby (3rd floor)
Operations Updates
- stage is working again
  - fair and balanced submitter is running
    - 10% of what we receive is now going to stage
  - adi is coming in now
- JSON problems recurred in prod
  - stage has a patch that filters keys as well as values
  - manually submitting a problematic crash leads us to believe stage is fixed
  - write out the null bytes
- updating supersearchfields is not working
  - adrian working on a patch
- datadog runs on UDP
  - we've been losing windows of time
- webapp has been registering as down to pingdom/nagios
  - high response time requests from the webapp, but they are serving requests in those windows
  - can we get newrelic to look?
    - talk to Travis
    - we need 8 hosts for a while
  - moztrap was seeing this
    - cache was acclimating pingdom to fast responses
    - whole cache was invalidating at once
  - we are not actually down in these times, may just have higher response times above the timeouts
  - the only thing more dangerous than no alerting is noisy alerting
- pingdom accounts for the team?
  - going to take approx 1 month to set us up for pingdom on the mozilla account
- sentry is down
  - cannot ingest
  - should be back by the end of the day
- loggly
  - django errors don't end up in syslog, they don't go to stdout
  - maybe we don't want to continue with loggly
    - the features it has over other log aggregators are not that useful
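The JSON item above mentions a stage patch that filters keys as well as values. A minimal sketch of that idea follows, assuming the problem is NUL characters embedded in raw crash data. The function name `strip_null_bytes` and the sample crash are hypothetical and are not taken from the actual Socorro patch.

```python
import json


def strip_null_bytes(value):
    """Recursively remove NUL characters from dict keys and string values.

    Hypothetical helper for illustration; not the real Socorro patch.
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        # Clean keys as well as values. If two keys collide after
        # cleaning, the later one wins.
        return {strip_null_bytes(k): strip_null_bytes(v) for k, v in value.items()}
    if isinstance(value, list):
        return [strip_null_bytes(item) for item in value]
    # Numbers, booleans and None pass through unchanged.
    return value


# Made-up raw crash with NUL characters in a key and in values.
raw_crash = {"Product\x00": "Fire\x00fox", "nested": {"k\x00": ["a\x00", 1, None]}}
clean_crash = strip_null_bytes(raw_crash)
print(json.dumps(clean_crash, sort_keys=True))
```

Cleaning only the values would still leave a key such as `"Product\x00"` in the document, which is presumably why the patch needed to cover both.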
Other
- How many EC2 webhead nodes do we have in prod?
- Wanna scale that down?
Project Updates
Socorro Bug Tracker
this week's bugs
Deployment Triage
PR Triage
QA
- Help tracking down an intermittent failure
- ReadTimeout: HTTPSConnectionPool(host='crash-stats.mozilla.com', port=443): Read timed out. (read timeout=10)
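One possible way to make a test tolerate an occasional read timeout like the one above is to retry with exponential backoff. This is only a sketch: `fetch_with_retry` and its parameters are hypothetical names, and whether retrying is appropriate for this QA test is an assumption. If the test uses the requests library, the caller could pass `retry_on=(requests.exceptions.ReadTimeout,)`.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=1.0,
                     retry_on=(TimeoutError,), sleep=time.sleep):
    """Call fetch() and retry on timeout errors with exponential backoff.

    Hypothetical helper for illustration. `fetch` is any zero-argument
    callable; `retry_on` is a tuple of exception types worth retrying.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on as exc:
            last_error = exc
            # Wait before the next attempt, doubling the delay each time.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt timed out, so surface the last error to the caller.
    raise last_error
```

Retrying hides the intermittent failure rather than explaining it, so it is worth logging each retry while the underlying cause is tracked down.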
other business
- We're ok with exposing ADI
- The Raw Crash JSON has weeeeird keys - https://crash-stats.mozilla.com/admin/supersearch-fields/missing/