Webdev:Meetings:2010-09-28: Difference between revisions

836 bytes added, 28 September 2010

@@ Line 27: / Line 27: @@
 == lars ==
 == laura ==
+* Firefighting:
+** After the last webdev meeting, the rest of the week was consumed by the 1.8 [https://intranet.mozilla.org/Socorro:1.8_Postmortem release/disaster/rollback/postmortem]
+** Last week began with a day off and then [https://intranet.mozilla.org/Socorro:1.8_Postmortem#Outcomes_.2F_Actions grand plans to make things better]
+** and ended with further disasters (consuming Thursday and Friday) due to a hole in an HBase table, which we coded around with duct tape and then later fixed thanks to Cloudera Support and help from our friends at StumbleUpon.
+** This week began with a new disaster, which an entire day of work traced to a failed hard drive on cm-hadoop06. Yes, Hadoop is meant to stay redundant if it loses a node. As long as it's not the wrong node. And no, monitoring did not alert us to the drive going bad.
 == lorchard ==
 == malexis ==