Data/WorkingGroups/CrashReporting/Status2021


Crash Reporting Status 2021

Every month, the coordinator will send out an email asking for status updates from teams/projects working on crash reporting things. Details on this process are at Data/WorkingGroups/CrashReporting#Monthly_status_rollup.

Updates are compiled into a newsletter, sent to the mailing lists, and posted here.

Crash Reporting Headlines (August 12th, 2021)

Quick Summary

  • Windows Error Reporting crash collection and macOS crash handler improvements
  • Denoting more OOM crash reports as OOMs in the crash signature

Details

Completed

  • Crash Stats: flagging additional crash reports as OOMs
    • bug 1716742: flag ERROR_COMMITMENT_LIMIT as OOMs
    • bug 1723474: flag WER Windows crashes with a reason set to STATUS_FATAL_MEMORY_EXHAUSTION or STATUS_NO_MEMORY as OOMs
  • Crash reporter: WER improvements
    • Windows Error Reporting is now fully functional across all processes (bug 1697895 and bug 1682518). It flags OOM crashes correctly (bug 1711418), and its reports carry a special WindowsErrorReporting annotation that lets you tell them apart from the rest (bug 1703761). Capturing hangs has also been disabled (bug 1718226).
  • Crash reporter: macOS crash handler improvements
    • The macOS crash handler has been modernized and now properly reports 64-bit crashes (bug 1035892). Among other things this makes UAF crashes on arm64 macOS builds immediately obvious as the poison pattern will appear as the crashing address.
    • macOS crashes now have thread names correctly populated (bug 1658831)
    • An infamous main process crash while capturing the minidump of a child process has been fixed on macOS (bug 1723941).
  • Crash reporter: native thread names support for minidumps from Linux
    • Martin Sirringhaus implemented native thread name support in Linux minidumps for child processes (bug 1714465). Main process crashes still rely on the old machinery.
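
The WER flagging rule from bug 1723474 listed above can be illustrated with a minimal sketch. This is not Socorro's actual signature generation code: the function names are hypothetical, and the "OOM | " signature prefix and the way the WindowsErrorReporting annotation is checked are assumptions made for illustration.

```python
# Hypothetical sketch, not Socorro's implementation.
# Crash reasons that mark a WER-captured crash as out-of-memory (from the bug above).
WER_OOM_REASONS = {"STATUS_FATAL_MEMORY_EXHAUSTION", "STATUS_NO_MEMORY"}

# Assumed prefix used to denote OOM crashes in a signature.
OOM_PREFIX = "OOM | "


def is_wer_oom(annotations, reason):
    """Return True if the crash came through WER and has an OOM reason."""
    from_wer = "WindowsErrorReporting" in annotations
    return from_wer and reason in WER_OOM_REASONS


def flag_oom_signature(signature, annotations, reason):
    """Prefix the signature with the OOM marker when the rule matches."""
    if is_wer_oom(annotations, reason) and not signature.startswith(OOM_PREFIX):
        return OOM_PREFIX + signature
    return signature
```

Under these assumptions, a WER crash with reason STATUS_NO_MEMORY gets the prefix, while a crash with any other reason, or one without the WindowsErrorReporting annotation, keeps its signature unchanged.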

In progress


Crash Reporting Headlines (June 4th, 2021)

Quick Summary

  • Acquired symbols for openSUSE 15.3
  • Improving our support for acquiring macOS symbols for release and beta builds
  • Work continues on WER support in crash reporter around hang reports and child processes
  • Improved capturing annotations for OOM crashes and re-enabled grabbing memory reports

Details

Completed

In progress

  • Crash reporter: intercepting child process crashes via WER
  • All: Rust rewrite of all things breakpad
    • rust minidump-stackwalk:
      • https://github.com/luser/rust-minidump/tree/master/minidump-stackwalk
      • https://github.com/luser/rust-minidump/issues/153
      • You can now install and test rust-minidump minidump-stackwalk
      • Same CLI as existing minidump-stackwalk that Socorro uses. Outputs the same JSON schema.
      • We handle most stuff reasonably well on x86/x64 these days, having full stackwalkers/symbolicators. ARM/ARM64 support is in progress. Some fields like exploitability heuristic are not yet implemented.
      • The biggest remaining task is replacing `breakpad-symbols` with `symbolic`, which should significantly improve performance/reliability of all the debuginfo handling.
      • Additionally, we've gotten a commitment from Microsoft to help build Rust minidump-stackwalk, maintain, and extend it.
  • Tecken: new symbolication API microservice
  • Socorro: Better signature generation for Java crash reports
  • Symbols: improving process for acquiring symbols for macOS Big Sur
    • https://bugzilla.mozilla.org/show_bug.cgi?id=1683758
    • Currently we have symbols for release versions of macOS. Work is being done to acquire symbols for beta versions as well. Additionally, the process and tools for acquiring symbols for macOS Big Sur are being improved.
    • This enables profiles collected on beta versions of macOS with the Firefox profiler to have symbolicated system libraries.
    • This will improve stacks in crash reports for beta versions of macOS.
  • Firefox profiler: fix OOM errors when profiling local builds on Linux
  • Firefox profiler: getting inline callstacks
    • Making this work for official builds is a bigger lift and requires that our symbolication API stops using dump_syms as part of the symbolication pipeline. The Eliot rewrite is making big strides towards that goal and I am very excited about it. (Eliot currently goes [raw build artifact] -> [.sym file] -> [symbolic symcache] -> [API response]. Once we can go directly from the raw build artifact to the symbolic symcache, the rest should be easy.)
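
For readers who want to try the symbolication API mentioned above, the shape of a request body can be sketched roughly as follows. The field names ("jobs", "memoryMap", "stacks") are assumptions based on the publicly documented v5 request format and should be checked against the current Tecken/Eliot documentation; the helper name is hypothetical, and the module name and debug ID in the usage note are placeholders.

```python
import json


def build_symbolication_request(modules, stacks):
    """Build a request body for a symbolication API call (assumed v5-style layout).

    modules: list of (debug_filename, debug_id) pairs.
    stacks: list of stacks; each stack is a list of (module_index, offset) pairs,
            where module_index points into `modules`.
    """
    for stack in stacks:
        for module_index, _offset in stack:
            if not 0 <= module_index < len(modules):
                raise ValueError(f"module index {module_index} is out of range")
    job = {
        "memoryMap": [list(module) for module in modules],
        "stacks": [[list(frame) for frame in stack] for stack in stacks],
    }
    return {"jobs": [job]}


def to_json(payload):
    """Serialize the payload deterministically so it can be sent as a POST body."""
    return json.dumps(payload, sort_keys=True)
```

A call such as `build_symbolication_request([("xul.pdb", "<debug id>")], [[(0, 4096)]])` produces one job with a single module and a single one-frame stack.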

Crash Reporting Headlines (May 7th, 2021)

Quick Summary

  • WER support in crash reporter
    • Work continues on WER support so that we get crash reports in situations where we currently get none. Main process support should be done. Content process support is in progress.
  • Socorro's minidump-stackwalker improvements
    • Socorro's minidump-stackwalker was improved to emit additional Windows and macOS information. You can see this in the minidump-stackwalk output in the crash report view of Crash Stats.
  • rust-minidump progress is moving along
    • Work towards replacing Socorro's minidump-stackwalker with rust-minidump is progressing very nicely.
  • Crash Stats lets you search by major_version
  • Crash Stats has an improved Extensions tab in the crash report view

Details

Completed

  • Crash reporter: WER support
    • Windows Error Reporting interception landed last month and can intercept all the main process crashes we were previously missing. This includes __fastfail() crashes, catastrophic OOM crashes, weird DLL injections, and very late shutdown crashes. It significantly increased the nightly crash rate, which is good! Content process support is in progress.
  • Socorro: minidump-stackwalker improved Windows information
    • minidump-stackwalker was improved to print out richer information for Windows including unloaded modules, authenticode signatures, __fastfail() crash reasons, and NTSTATUS errors.
  • Socorro: minidump-stackwalker __crash_info support for macOS
    • minidump-stackwalker was improved to find and emit __crash_info information for Apple-specific error messages.
    • Thank you, Steven Michaud!
  • Crash reporter: fixed OOM crash annotations
    • Alexandre modified the way we handle out-of-memory crash annotations so that they will never be missing again.
  • rust-minidump: taught rust-minidump to parse MISC_INFO_5 format
  • rust-minidump: upgraded minidump-processor unwinder
  • rust-minidump: upgraded cli to match dump_syms
    • Upgraded the minidump-processor CLI frontend to match dump_syms, and taught it to generate a JSON version of its report (format is "whatever the layout of the current types are", to be iterated on over time)
    • https://github.com/luser/rust-minidump/pull/151
  • dump_syms: better support for Apple's compact unwinding
    • Taught symbolic (and therefore dump_syms) how to dump Apple's Compact Unwinding (.__unwind_info) format into breakpad's format for x86/x64, as well as wrote up a very thorough description of the format (that is otherwise missing from llvm's implementation, which is the only existing documentation of the format). Ideally when this lands it will fix Bug 1691022 (x64 macos missing CFI on socorro).
    • https://github.com/getsentry/symbolic/pull/372/
  • Crash Stats: added last error value to crash report view.
  • Crash Stats: redid process type support
    • Redid process type support: "parent" is now the value for parent process crash reports, and we're phasing out "browser".
    • This makes it a lot easier to search for parent crashes, and aggregations on process type now work.
    • https://bugzilla.mozilla.org/show_bug.cgi?id=1701357
  • Crash Stats: improved Extensions tab in report view
  • Crash Stats: fixed Bugs API to support POST as well as GET
  • Crash Stats: added search by major_version
  • Crash Stats: all Super Search fields now have exists/does-not-exist filter
  • Socorro: Added support for multiple processing pipeline rulesets
    • The first non-default ruleset I wrote is "regenerate_signature" which just regenerates the crash signature. It takes 1/10 the time regular processing takes. I'll use this going forward to regenerate crash signatures after signature generation changes.
    • We can use this infrastructure for additional processing as well. That's been something we've talked about over the years.
    • https://bugzilla.mozilla.org/show_bug.cgi?id=1705469
  • Siggen: Released socorro-siggen 1.0.6
    • This includes signature generation changes made since 1.0.5 as well as some minor fixes.
  • Presentation: Socorro Overview: 2021
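
The idea behind multiple processing rulesets, described under "Socorro: Added support for multiple processing pipeline rulesets" above, can be sketched as a mapping from ruleset name to an ordered list of rules. This is a toy illustration, not Socorro's processor code: the rule functions, the "frames" and "stackwalked" keys, and the signature format are all hypothetical stand-ins.

```python
# Hypothetical sketch of named rulesets; none of these rules are Socorro's real rules.

def stackwalk_rule(raw_crash, processed_crash):
    """Stand-in for the expensive part of regular processing."""
    processed_crash["stackwalked"] = True


def signature_rule(raw_crash, processed_crash):
    """Toy signature generation: join the top frames with ' | '."""
    frames = processed_crash.get("frames", [])
    processed_crash["signature"] = " | ".join(frames[:3]) or "EMPTY"


RULESETS = {
    # Regular processing runs everything.
    "default": [stackwalk_rule, signature_rule],
    # The lighter ruleset only regenerates the signature.
    "regenerate_signature": [signature_rule],
}


def process(ruleset_name, raw_crash, processed_crash):
    """Apply each rule of the chosen ruleset in order and return the result."""
    for rule in RULESETS[ruleset_name]:
        rule(raw_crash, processed_crash)
    return processed_crash
```

Running `process("regenerate_signature", {}, processed_crash)` touches only the signature, which is why a ruleset like this can be much cheaper than a full reprocess.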

In progress


Crash Reporting Headlines (April 7th, 2021)

Quick Summary

  • Started Crash Reporting Working Group
    • We started a Crash Reporting Working Group to coordinate crash reporting, ingestion, and analysis work. If you're interested in participating or lurking, we've got a mailing list (crash-reporting-wg) and a Matrix channel (#crashreporting).
  • Socorro: Ended collection of Email address data.
    • Firefox 89+ no longer sends Email address data in crash reports.
    • Email data is dropped at collection for all crash reports.
  • Socorro: Ended collection of Fennec crash reports.
  • Tecken: We need help testing new symbolication API microservice.

Details

Completed

  • Crash Stats: Improved preview in Slack/Matrix for crash report view urls and signature report view urls.
  • Socorro: End collection of Email data in crash reports.
    • I changed the collector to delete Email data for all incoming crash reports. I fixed the Firefox main and content crash reporter client code. I still have some changes to make in the webapp, but I'm waiting until May 2021 to do that.
    • Many thanks to Emily, Nneka, Gabriele, Mike, and Chris for their help with this!
    • https://bugzilla.mozilla.org/show_bug.cgi?id=1688883
  • Socorro: End collection of crash reports for Fennec
    • When working on ending collection of Email data, it came up that we don't need Fennec crash reports anymore. Thus Socorro now rejects all incoming crash reports for Fennec.
    • Many thanks to Emily, Stefan, Vesta, and Agi!
    • https://bugzilla.mozilla.org/show_bug.cgi?id=1699239
  • Crash Stats: Fixed the webapp to automatically update the PCI device db once a week.
  • Crash Stats: Redid "Raw data and minidumps" tab in crash report view.
    • The Crash Stats UI is confusing and clunky and I've been trying to fix bits of it over time. In this pass, I improved the tab that holds links to raw and processed crash data, minidumps, and the output of minidump-stackwalk. It should be clearer now as to what's protected data and what isn't. The links are at the top of the tab where they're easier to access. The minidump-stackwalk output is much easier to manipulate and use.
    • https://bugzilla.mozilla.org/show_bug.cgi?id=1696910
  • Tecken: New symbolication API microservices
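
The two collection changes above (dropping Email data and rejecting Fennec crash reports) amount to a small filter at ingestion time. The following is a minimal sketch of that idea, not the collector's real code: the function name is hypothetical, and treating "ProductName" and "Email" as the relevant annotation keys is an assumption.

```python
# Hypothetical sketch of collection-time filtering; not the actual collector code.
REJECTED_PRODUCTS = {"Fennec"}   # products whose crash reports are no longer accepted
DROPPED_FIELDS = {"Email"}       # annotations removed from every incoming report


def collect(raw_crash):
    """Return the crash report to store, or None if it should be rejected."""
    if raw_crash.get("ProductName") in REJECTED_PRODUCTS:
        return None
    # Build a new dict so the caller's data is not modified in place.
    return {key: value for key, value in raw_crash.items() if key not in DROPPED_FIELDS}
```

With this sketch, a Fennec report is rejected outright, and a Firefox report is kept with its Email annotation removed.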

In progress