Breakpad/Status Meetings/2015-08-19

From MozillaWiki


Meeting Info

Breakpad status meetings occur on Wed at 11:00am Pacific Time.

Conference numbers:

   Vidyo: Stability 
   650-903-0800 x92 conf 98200#
   800-707-2533 (pin 369) conf 98200# 

IRC backchannel: #breakpad
Mountain View: Dancing Baby (3rd floor)

Operations Updates

  • stage is working again
    • fair and balanced submitter is running
      • 10% of what we receive is now going to stage
    • ADI is coming in now
  • JSON problems recurred in prod
    • stage has a patch that filters keys as well as values
    • manually submitting a problematic crash leads us to believe stage is fixed
    • write out the null bytes
  • updating supersearchfields is not working
    • adrian working on a patch
  • Datadog runs on UDP
    • we've been losing windows of time
  • webapp has been registering as down in Pingdom/Nagios
    • high-response-time requests from the webapp, but it is serving requests in those windows
    • can we get New Relic to look?
      • talk to Travis
      • we need 8 hosts for a while
    • moztrap was seeing this
      • cache was acclimating Pingdom to fast responses
      • whole cache was invalidating at once
    • we are not actually down at these times; we may just have response times that exceed the timeouts
    • the only thing more dangerous than no alerting is noisy alerting
  • Pingdom accounts for the team?
    • going to take approx. 1 month to set us up for Pingdom on the Mozilla account
  • Sentry is down
    • cannot ingest
    • should be back by the end of the day
  • Loggly
    • Django errors don't end up in syslog; they don't go to stdout
    • maybe we don't want to continue with Loggly
      • the features it has over other log aggregators are not that useful
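The notes say the stage patch for the recurring JSON problems "filters keys as well as values" for null bytes. The patch itself is not shown here, so the following is only a hypothetical sketch of that idea, assuming the raw crash has already been parsed into a Python dict and that "write out the null bytes" means stripping NUL characters. The function name `strip_null_bytes` is illustrative and not taken from Socorro.

```python
import json


def strip_null_bytes(obj):
    """Hypothetical sketch: recursively remove NUL characters from a parsed
    JSON structure, cleaning dict keys as well as string values."""
    if isinstance(obj, dict):
        # Keys are cleaned too; if two keys collapse to the same string,
        # the later one wins.
        return {strip_null_bytes(k): strip_null_bytes(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [strip_null_bytes(item) for item in obj]
    if isinstance(obj, str):
        return obj.replace("\x00", "")
    # Numbers, booleans and None pass through unchanged.
    return obj


if __name__ == "__main__":
    raw_crash = {"Product\x00": "Firefox\x00", "Notes": ["a\x00b", 1, None]}
    print(json.dumps(strip_null_bytes(raw_crash)))
```

Cleaning only values would leave a key such as `"Product\x00"` in place, which is the case the notes describe the stage patch as additionally handling.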


Other

  • How many EC2 webhead nodes do we have in prod?
    • Do we want to scale that down?

Project Updates

Socorro Bug Tracker
this week's bugs

Deployment Triage

PR Triage

QA

  • Help tracking down an intermittent failure
    • ReadTimeout: HTTPSConnectionPool(host='crash-stats.mozilla.com', port=443): Read timed out. (read timeout=10)

Other Business

  • We're OK with exposing ADI
  • The raw crash JSON has weird keys - https://crash-stats.mozilla.com/admin/supersearch-fields/missing/
  • how many EC2 webheads do we have, and do we want to scale it down?
    • pausing until we figure out the downtime reporting stuff
  • later this week there will be a PR for collector2015
    • changes collector system to use rule sets like the processor
    • will make it easier to accept non-binary crashes, raven-js style, etc.

Travel, etc

  • adrian AFK next week
  • lars is hiding in the water

Links