Breakpad/Status Meetings/2016-08-03

* focused on "not being on fire"
** seems to be going well
** pull request went up for changing the retention in ES
*** did it get merged?
*** it did ''not''!
** well, then JP should have slept better last night
** we had started a job that ran once a week to expire old indexes
** we trimmed down how many indexes are kept
** root cause of last week's issues
*** configuration mismatch with the rest of the cluster
*** puppet missed putting the .yaml file in place
*** they defaulted to 2GB, and when that was exhausted everything went to hell
*** we initially suspected that it was retention related
*** debated but didn't land a change that would temporarily lower retention

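The weekly job that expires old indexes boils down to choosing which indexes fall outside the retention window. Below is a minimal sketch of only that selection step, assuming one index per week named with a hypothetical <code>socorro_YYYYMMDD</code> pattern (not necessarily Socorro's real naming). The <code>indexes_to_expire</code> function and the <code>keep</code> count are illustrative; the notes do not say how many indexes are kept. Actually deleting the returned indexes (for example through Elasticsearch's delete index API) is not shown.

```python
from datetime import datetime
from typing import List

# Hypothetical naming scheme (an assumption, for illustration only):
# one index per week, named "<PREFIX><YYYYMMDD>".
PREFIX = "socorro_"
DATE_FORMAT = "%Y%m%d"


def index_date(name: str) -> datetime:
    """Parse the date encoded in a weekly index name."""
    return datetime.strptime(name[len(PREFIX):], DATE_FORMAT)


def indexes_to_expire(names: List[str], keep: int) -> List[str]:
    """Return the weekly indexes outside the retention window, oldest first.

    Names without the expected prefix (e.g. ".kibana") are ignored.
    The newest `keep` weekly indexes are retained.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    weekly = sorted((n for n in names if n.startswith(PREFIX)), key=index_date)
    # Everything except the newest `keep` entries is eligible for deletion.
    return weekly[:max(len(weekly) - keep, 0)]


existing = ["socorro_20160704", "socorro_20160725", "socorro_20160711",
            "socorro_20160718", ".kibana"]
print(indexes_to_expire(existing, keep=2))
# ['socorro_20160704', 'socorro_20160711']
```

Sorting by the parsed date rather than the raw string keeps the selection correct even if the naming scheme ever stops sorting lexically.
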
* monitoring of ES
** Jason has been helping us figure out our ES config and make it more robust
** JP has a new monitoring agent
** new Pingdom accounts coming if you have one already
** we expect to have new, aggressive alerts
 
* Super Search errors are checked in the webapp health check
** should catch individual shard failures
** shard failures break Pingdom and Sentry now
** JP will own a plan for handling failures
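Catching individual shard failures matters because an Elasticsearch search can succeed at the HTTP level while some shards fail; the failures show up only in the <code>_shards</code> section of the response body. Below is a minimal sketch of such a check, assuming the health check has the parsed search response as a dict. The <code>check_shards</code> name is hypothetical, this is not the webapp's actual code, and wiring the result into Pingdom or Sentry is not shown.

```python
from typing import Any, Dict, Tuple


def check_shards(response: Dict[str, Any]) -> Tuple[bool, str]:
    """Report whether an Elasticsearch search response had shard failures.

    Returns (healthy, message). A missing "_shards" section is treated
    as unhealthy, since the check cannot tell what happened.
    """
    shards = response.get("_shards")
    if shards is None:
        return False, "response has no _shards section"
    failed = shards.get("failed", 0)
    total = shards.get("total", 0)
    if failed > 0:
        # Collect whatever failure reasons Elasticsearch included.
        reasons = [str(f.get("reason")) for f in shards.get("failures", [])]
        return False, f"{failed} of {total} shards failed: {reasons}"
    return True, f"all {total} shards succeeded"


ok, msg = check_shards({"_shards": {"total": 5, "successful": 4, "failed": 1,
                                    "failures": [{"shard": 3, "reason": "timeout"}]}})
print(ok, msg)  # False 1 of 5 shards failed: ['timeout']
```
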
 
* Python upgrade
** on the horizon
** JP wants a stable stage and prod before he does it
** let's do it this week, shortly after our next ship to prod
 
[https://bugzilla.mozilla.org/buglist.cgi?priority=P1&resolution=---&query_format=advanced&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&component=Infra&product=Socorro&list_id=13148014  P1 infra bugs]


== Project Updates ==