Breakpad/Status Meetings/2016-08-03
From MozillaWiki
Contents
- 1 Meeting Info
- 2 Operations Updates
- 3 Project Updates
- 4 Major Projects
- 4.1 Migrating off of Persona
- 4.2 Sending public data to Parquet for reading from Spark/re:dash
- 4.3 Symbols service refactoring (snappy, somewhat tangential to us)
- 4.4 Signature generation across crash reporters
- 4.5 Splitting out collector
- 4.6 Collecting client-side JavaScript errors
- 4.7 Handling more PII data in crashes
- 4.8 Sending stacks for all crashes from the client
- 4.9 Replacing FTPscraper
- 5 Other business
- 6 Travel, etc.
- 7 Links
Meeting Info
Breakpad status meetings occur on Wednesdays at 10:00 a.m. Pacific Time.
Conference numbers:
Vidyo: Stability; phone: 650-903-0800 x92, conf 98200#, or 800-707-2533 (PIN 369), conf 98200#
IRC backchannel: #breakpad
Mountain View: Dancing Baby (3rd floor)
Operations Updates
- focused on "not being on fire"
- seems to be going well
- root cause of last week's issues
- a configuration mismatch with the rest of the cluster
- Puppet failed to put the .yaml file in place
- those nodes fell back to a 2GB default, and when they exhausted it everything fell over
- we initially suspected that it was retention related
- we debated, but did not land, a change that would temporarily lower retention
- new Pingdom accounts are coming for those who already have one
- monitoring of Elasticsearch (ES)
- Jason has been helping us figure out our ES config and make it more robust
- JP has a new monitoring agent
- we expect to have new, aggressive alerts
- Super Search errors are now checked in the webapp health check
- should catch individual shard failures
- shard failures now show up as failures in Pingdom and Sentry
- JP will own a plan for handling failures
- Python upgrade
- on the horizon
- JP wants a stable stage and prod before he does it
- let's do it this week, shortly after our next ship to prod
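The shard-failure health check above matters because an Elasticsearch search can return HTTP 200 while individual shards failed; the failures are reported under the `_shards` key of the response body. A minimal sketch of such a check, assuming the search response has already been parsed from JSON. The function names `shard_failures` and `healthcheck_status` are hypothetical and this is not Socorro's actual code.

```python
from typing import Any, Dict, List, Tuple


def shard_failures(search_response: Dict[str, Any]) -> List[str]:
    """Describe any shard failures reported in a parsed Elasticsearch search response.

    Hypothetical helper: Elasticsearch reports per-shard results under "_shards",
    so a search can succeed at the HTTP level while some shards fail.
    """
    shards = search_response.get("_shards", {})
    failed = shards.get("failed", 0)
    if not failed:
        return []
    problems = ["%d of %s shards failed" % (failed, shards.get("total", "?"))]
    for failure in shards.get("failures", []):
        # Each entry typically names the index and shard plus a reason object.
        reason = failure.get("reason", "unknown")
        if isinstance(reason, dict):
            reason = reason.get("reason") or reason.get("type", "unknown")
        problems.append(
            "index=%s shard=%s: %s" % (failure.get("index"), failure.get("shard"), reason)
        )
    return problems


def healthcheck_status(search_response: Dict[str, Any]) -> Tuple[int, List[str]]:
    """Map the shard check onto an HTTP status that an external monitor can poll."""
    problems = shard_failures(search_response)
    return (500 if problems else 200), problems
```

Returning a non-200 status is what lets an external poller such as Pingdom notice a partial failure that the search itself would otherwise hide.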
Project Updates
- Socorro::Middleware component to Graveyard
- aiming to reduce the components to ::infra, ::general, and ::webapp
- https://bugzilla.mozilla.org/show_bug.cgi?id=1291808
- Monitoring/healthcheck now checks for ES shard errors every minute, in prod.
- Home page AJAX code is now cached! Yay, faster home page.
- during the ES fire, release drivers were talking about a spike
- Laura said we needed to be stable so they could investigate
- the API now has cache headers
- the webapp now uses them, which improved performance
- Google Auth is getting ready to go out
- we see an error on stage that we cannot reproduce
- we suspect it's caused by security scanner tools and is benign
- one more fix going out and then we're ready for prod
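With cache headers on the API, a client such as the webapp can reuse a response until it goes stale. A minimal sketch of that freshness check, assuming the API sends a standard `Cache-Control: max-age=N` header. It deliberately ignores `s-maxage`, `Expires`, and revalidation, and the helper names are hypothetical.

```python
import re
import time
from typing import Optional

# Matches a "max-age=N" directive anywhere in a Cache-Control header value.
_MAX_AGE = re.compile(r"(?:^|,)\s*max-age\s*=\s*(\d+)", re.IGNORECASE)
# Directives that forbid reusing the stored response without revalidation.
_NO_REUSE = re.compile(r"(?:^|,)\s*(?:no-store|no-cache)\b", re.IGNORECASE)


def max_age(cache_control: str) -> Optional[int]:
    """Return the max-age in seconds, or None if the response must not be reused."""
    if _NO_REUSE.search(cache_control):
        return None
    match = _MAX_AGE.search(cache_control)
    return int(match.group(1)) if match else None


def is_fresh(cache_control: str, fetched_at: float, now: Optional[float] = None) -> bool:
    """True if a response fetched at `fetched_at` (epoch seconds) can still be served from cache."""
    age_limit = max_age(cache_control)
    if age_limit is None:
        return False
    current = time.time() if now is None else now
    return current - fetched_at < age_limit
```

Treating a missing `max-age` as "not fresh" keeps the sketch conservative: without an explicit lifetime, it always goes back to the API.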
Deployment Triage
PR Triage
Major Projects
Migrating off of Persona
- Deployed in stage
- emails will be sent as soon as we put this in prod
- also need to execute the migration for aliases https://bugzilla.mozilla.org/show_bug.cgi?id=1287548
- lonnen will migrate to clonnen, etc., because Google Auth doesn't support aliases
Sending public data to Parquet for reading from Spark/re:dash
- Adrian and Peter have a prototype that adds another crash storage that sends crashes to S3
- Mark is aware of reprocessing (i.e., primary keys)
- how useful is it and how often do we do it?
- we are going to unify raw and processed crashes into a single crash report JSON document based on the public schema
- this avoids duplicate info, puts all the info we have into one document, and uses the prettier name where one exists
- this applies only from the point where we transmit to the Telemetry data platform
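The raw/processed unification can be pictured as a merge of two dictionaries followed by a filter against the public schema. This is an illustration only: the raw-to-pretty name mapping, the field names, and the rule that processed values win are assumptions, not the agreed schema.

```python
from typing import Any, Dict, Iterable

# Hypothetical mapping from raw crash annotation names to the "prettier"
# names used in the processed crash / public schema.
PRETTY_NAMES = {"ProductName": "product", "Version": "version"}


def unify_crash(
    raw: Dict[str, Any],
    processed: Dict[str, Any],
    public_fields: Iterable[str],
) -> Dict[str, Any]:
    """Merge a raw and a processed crash into one public crash report document."""
    report: Dict[str, Any] = {}
    # Rename raw keys so a value present in both documents collapses to one key.
    for key, value in raw.items():
        report[PRETTY_NAMES.get(key, key)] = value
    # Assumption: processed values are normalized, so they override raw ones.
    report.update(processed)
    allowed = set(public_fields)
    # Keep only fields in the public schema; anything else (e.g. PII) is dropped.
    return {key: value for key, value in report.items() if key in allowed}
```

Keying each resulting document on its crash ID would let a reprocessed crash overwrite its earlier version, which is one possible answer to the primary-key question raised above.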
Symbols service refactoring (snappy, somewhat tangential to us)
No update.
Signature generation across crash reporters
- No update
- but Peter is asking about it here: https://bugzilla.mozilla.org/show_bug.cgi?id=828452
Splitting out collector
No update.
Collecting client-side JavaScript errors
- No update
Handling more PII data in crashes
Sending stacks for all crashes from the client
- No update
Replacing FTPscraper
- Nick Thomas suggests we put a file called "socorro.json" in the build artifacts on TaskCluster.
- A bad idea?
- https://public.etherpad-mozilla.org/p/socorro-releng-index-201607