Breakpad/Status Meetings/2016-08-03
Meeting Info
Breakpad status meetings occur on Wed at 10:00am Pacific Time.
Conference numbers:
Vidyo: Stability
650-903-0800 x92 conf 98200#
800-707-2533 (pin 369) conf 98200#
IRC backchannel: #breakpad
Mountain View: Dancing Baby (3rd floor)
Operations Updates
- focused on "not being on fire"
- seems to be going well
- root cause of last week's issues
- configuration mismatch with the rest of the cluster
- Puppet missed putting the .yaml file in there
- the affected nodes defaulted to 2GB, and when they exhausted it everything went to hell
- we initially suspected that it was retention related
- debated but didn't land a change that would lower retention temporarily
- new Pingdom accounts coming if you have one already
- monitoring of ES
- Jason has been helping us to figure out our ES config and make it more robust
- JP has new monitoring agent
- we expect to have new, aggressive alerts
- Super Search errors are checked in the webapp health check
- should catch individual shard failures
- shard failures break Pingdom and Sentry now
- JP will own a plan for failure
- Python upgrade
- on the horizon
- JP wants a stable stage and prod before he does it
- let's do it this week, shortly after our next ship to prod
Project Updates
- Socorro::Middleware component to Graveyard
- Monitoring/healthcheck now checks for ES shard errors. It is in prod and runs every minute.
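A minimal sketch of what a shard-error check like the one above could look like. It is an illustration, not the Socorro implementation. It assumes the health check already has the JSON body of an Elasticsearch search response as a dict and relies on the standard `_shards` section of that response. The function names and the 200/500 status convention are hypothetical.

```python
def shard_failures(search_response):
    """Return readable shard failure messages; an empty list means healthy."""
    shards = search_response.get("_shards", {})
    failed = shards.get("failed", 0)
    if not failed:
        return []

    messages = []
    for failure in shards.get("failures", []):
        index = failure.get("index", "?")
        shard = failure.get("shard", "?")
        reason = failure.get("reason", "unknown reason")
        messages.append("{}[{}]: {}".format(index, shard, reason))

    # A failed count can arrive without per-shard details; still report it.
    if not messages:
        messages.append("{} shard(s) failed".format(failed))
    return messages


def health_check(search_response):
    """Hypothetical wrapper: map shard failures to an HTTP-style status."""
    errors = shard_failures(search_response)
    status = 500 if errors else 200
    return status, errors
```

Returning a non-200 status on any failed shard is what would let an external monitor polling the endpoint every minute notice a partial outage, even when the search itself still returns results.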
Deployment Triage
PR Triage
Major Projects
Migrating off of Persona
- Deployed in stage
- emails will be sent as soon as we put this in prod
Sending public data to Parquet for reading from Spark/re:dash
- Adrian and Peter have a prototype to add another crash storage that sends to S3
- Mark's awareness of reprocessing (aka. primary keys)
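One way to read the reprocessing concern: if the object key depends only on the crash ID, a reprocessed crash overwrites its earlier copy instead of adding a duplicate. The sketch below illustrates that idea under stated assumptions and is not the prototype itself. The class name, the `save_processed` method, the `uuid` field, and the key layout are all hypothetical. The injected `client` is assumed to expose a `put_object(Bucket=..., Key=..., Body=...)` call shaped like the boto3 S3 client's.

```python
import json


class S3CrashStorageSketch(object):
    """Hypothetical crash storage that writes processed crashes to S3."""

    def __init__(self, client, bucket, prefix="processed_crash"):
        # `client` is assumed to provide put_object(Bucket=, Key=, Body=).
        self.client = client
        self.bucket = bucket
        self.prefix = prefix

    def key_for(self, crash_id):
        # The key is derived only from the crash ID, so it behaves like a
        # primary key: saving the same crash again replaces the old object.
        return "{}/{}.json".format(self.prefix, crash_id)

    def save_processed(self, processed_crash):
        crash_id = processed_crash["uuid"]  # assumed field name
        key = self.key_for(crash_id)
        body = json.dumps(processed_crash, sort_keys=True).encode("utf-8")
        self.client.put_object(Bucket=self.bucket, Key=key, Body=body)
        return key
```

Any downstream reader would still need to handle objects changing after a reprocess. This sketch only avoids duplicate keys on the storage side.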
Symbols service refactoring (snappy, somewhat tangential to us)
No update.
Signature generation across crash reporters
Splitting out collector
No update.