CrashKill/WorkWeek2012/Analysis
From MozillaWiki
Desktop crash analysis - reports and steps we follow
- CrashKill - looks at new crashes that appeared in the last week
- Need better metrics on how crashy things are - today we only have crashes per 100 ADU (active daily users)
- Release management - interested in mean time between failures.
- What % of our users are crashing more than x times a day?
- Bucket of people that crash a lot.
- Can we collect this via Telemetry?
- Ben - used to have a user id with crash reports. Difficult to go back there.
- Look at getting the data from Telemetry.
- Histogram of how often users crash.
- Laura - comparison: what does a healthy browser look like?
- Add-ons - what does the histogram look like?
- We use correlation reports a lot - correlations are only available for the top 200 reports
- STR (steps to reproduce), correlation reports, explosive crash reports
- Top Mac crashes - breakdown by OS version
- Would like a top list of crashes per OS
- Mobile crashes - top list per device - device info is currently in the app notes
- Could put this info in some other field - now we are using the app notes
- Client side - can we make it easier to add a field?
- Signature summary - highly valuable
- Breakpad - minidump -> turned into a stack. A new version of the tool outputs JSON, so we could store more data. Never had the time to integrate it: it changes the format of the processed dump, which would make it incompatible with the rest of our data. It would give us much more searchable, flexible processed dumps.
- Input perspective - would like an easier way to get emails - how do you write your own queries?
- Explosive crashes - integrated into the UI yet? Part way.
- Dups - feedback from us on whether the duplicate detection seems right. Duplicates are reports where a bunch of fields are identical.
- Regression ranges - how do we isolate good regression ranges?
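The metrics discussed above (crashes per 100 ADU, mean time between failures, a per-user crash histogram, and the share of users crashing more than x times a day) can be sketched as below. This is a minimal illustration under stated assumptions: it assumes per-user daily crash counts are available (for example from Telemetry, since Socorro no longer links crash reports to a user id) and that total usage hours are known for MTBF. All function names and the sample data are hypothetical.

```python
from collections import Counter


def crashes_per_100_adu(total_crashes: int, adu: int) -> float:
    """Crash rate normalized to 100 active daily users (ADU)."""
    return 100.0 * total_crashes / adu


def mean_time_between_failures(usage_hours: float, total_crashes: int) -> float:
    """Hours of browser use per crash; infinite when nothing crashed."""
    if total_crashes == 0:
        return float("inf")
    return usage_hours / total_crashes


def crash_histogram(crashes_per_user: dict, adu: int) -> Counter:
    """Map 'crashes that day' -> 'number of users'.

    Assumption: crashes_per_user only lists users who crashed at least once,
    so the zero bucket is filled in from the ADU count.
    """
    hist = Counter(crashes_per_user.values())
    hist[0] = adu - len(crashes_per_user)
    return hist


def share_crashing_more_than(crashes_per_user: dict, adu: int, x: int) -> float:
    """Fraction of active users who crashed more than x times that day."""
    heavy = sum(1 for n in crashes_per_user.values() if n > x)
    return heavy / adu


if __name__ == "__main__":
    # Hypothetical one-day sample: 10 active users, 3 of whom crashed.
    per_user = {"u1": 1, "u2": 4, "u3": 1}
    adu = 10
    total = sum(per_user.values())  # 6 crashes in total
    print(crashes_per_100_adu(total, adu))  # 60.0
    print(mean_time_between_failures(30.0, total))  # 5.0
    print(sorted(crash_histogram(per_user, adu).items()))  # [(0, 7), (1, 2), (4, 1)]
    print(share_crashing_more_than(per_user, adu, 2))  # 0.1
```

The histogram is the piece that would answer "what does a healthy browser look like": a healthy population should have almost all of its mass in the zero bucket, and the same histogram could be split by installed add-ons for comparison.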
Mobile walk-through of how we use Socorro
- Which OS version.
- Mostly search by build ID
- Device report - combines signatures and devices - yes there is a bug on this.
- ADUs per OS and ADUs per device - is there a bug on this? No…probably not. They likely don't get the data from the client.
- Add information to Socorro linking devices and CPUs - bug 763629
- Adblock Plus - an add-on where we have had crashes
- Crashes we cannot catch in Socorro - OOM crashes
- We use the URL reports, build id reports, search by dates.
- Naoki - whiteboard session describing the pieces of crash analysis for mobile
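A per-device top crash list, as requested for mobile, amounts to grouping reports by device and counting signatures. The sketch below assumes the device name has already been extracted from each report (the notes say it currently lives in the app notes, whose exact format is not shown here). The function name, device names and signatures are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_crashes_per_device(
    reports: Iterable[Tuple[str, str]], n: int = 3
) -> Dict[str, List[Tuple[str, int]]]:
    """Return the n most common crash signatures for each device.

    reports: (device, signature) pairs, one per crash report.
    """
    by_device: Dict[str, Counter] = defaultdict(Counter)
    for device, signature in reports:
        by_device[device][signature] += 1
    # most_common keeps first-seen order among signatures with equal counts.
    return {device: counts.most_common(n) for device, counts in by_device.items()}


if __name__ == "__main__":
    sample = [
        ("device-A", "sig1"),
        ("device-A", "sig1"),
        ("device-A", "sig2"),
        ("device-B", "sig2"),
    ]
    print(top_crashes_per_device(sample, n=1))
    # {'device-A': [('sig1', 2)], 'device-B': [('sig2', 1)]}
```

Raw counts like these favor popular devices; ranking by crash rate would need ADUs per device, which the notes say is probably not collected from the client.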
Summary of crash activities
- Mondays 10:00am PT
- Engineering Meeting Tues 11:00am PT
- Channel Meetings (Tues/Thurs 2:00pm PT)
- Release signoff (Wed before release)
- Crash triage (Thurs 9:00am PT)
- Topcrash triage (ad-hoc)
- General crash triage (ad-hoc)
Finding Crashes
- Bob Clary - load crash URLs
- Fuzz testing
- Harvesting support data and input
- Triaging crash bugs
- Topcrash reports and explosive reports
- WinQual reports
Tools
- AddressSanitizer
- Valgrind
- Lithium
- Stack-blame
Security Bug?
- 0x0 - 0x0ffff (safe)
- Out of stack space (safe)
- Any other crash address (unsafe)
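The rule of thumb above can be written as a small first-pass triage check. This is a sketch only, assuming the crash address and an out-of-stack-space flag are already known for the report; the function name is hypothetical, and "unsafe" here means "needs a security review", not "confirmed exploitable".

```python
def triage_crash(address: int, out_of_stack: bool = False) -> str:
    """Label a crash 'safe' or 'unsafe' using the address ranges from the notes.

    Hypothetical helper: a first-pass heuristic, not a substitute for review.
    """
    if out_of_stack:
        # Running out of stack space is listed as safe.
        return "safe"
    if 0x0 <= address <= 0x0FFFF:
        # Near-null addresses (0x0 - 0x0ffff) are listed as safe.
        return "safe"
    # Any other address is treated as potentially exploitable.
    return "unsafe"


if __name__ == "__main__":
    print(triage_crash(0x0))  # safe
    print(triage_crash(0xDEADBEEF))  # unsafe
    print(triage_crash(0x7FFE1234, out_of_stack=True))  # safe
```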
What breakpad misses
- Hangs
- Jank
- OOM crashes
- Out of stack space
- Add-on disables breakpad
- Media:CrashAnalysisTools.pdf