Release Management/WER Investigation

From MozillaWiki

This page will track the initial investigation of Windows Error Reporting (WER). Our current plan is to hook our bug tracking and metrics into Winqual, the site/API used to access Microsoft's crash info. Using this info will help diagnose hangs (which aren't currently tracked) on Windows XP and later, and may even uncover new crashes.

Meetings

Associated Bugs

Resources

Builds

Creating File Mappings

  • MS suggests using their Microsoft Product Feedback Mapping Tool (AppMap.exe) to create a file manifest, which is required for crash collection for a new product version
  • (?) Need to explore generating this file manifest so that the build machines can create it

Uploading File Mappings

  • MS suggests using the "Upload File Mappings" option on the Administration menu
  • A blog post describes how to upload file mappings, but it might be out of date

WinQual Update Frequency

  • "By default we collect 10 cab (minidump) files per event"
  • Lag times
    • "Once we receive cab files for an event you will generally be able to see these cabs within a few hours of us receiving them."
    • "For newly detected crashes it can take more than 4 days to get the crashes processed and up on the site."
  • (?) Determine if there are any limitations to the number of products that can be registered (for possible use with nightlies)
  • (?) Bandwidth limits for pulling down data
  • Only the first few cab files are stored for a crash. The event viewer web interface offers the ability to make a "Data Request" to collect "Processor & Memory Information", the heap, specific files (logs, etc.), and additional cab files
    • (?) Can the web API be used to expose this to developers or will we need more accounts?
  • (?) Need to find out what it means when no cab files are available and the web interface offers "(click here to switch to collection mode)" - aren't we always collecting?

Accessing WinQual Data

Windows Live Login

  • (?) Access to WinQual requires a Windows Live login. Similarly, MS's StackHash client requires the Windows Live Sign-in Assistant. Need to determine if the associated library is required for logging in.
  • (?) Need to figure out what account we'd use for automated tools

WER Web API

Breakpad/Socorro

  • Crash dumps are stored in buckets
    • "For crash events the bucketing parameters are Application Name, Application Version, Application Build Date, Module Name, Module Version, Module Build Date, Exception Code, and Code Offset"
  • (?) Need to understand the overlap between WinQual crash data and Breakpad crash data, map the applicable info, and decide what to do with "additional" info
    • (?) Can we make use of individual hit event info in general? (as opposed to just crash cabs)
  • (?) Will hangs (heap dumps) need to be handled any differently than minidumps?
    • Hang blog posts: part 3 and part 4
    • Hangs are bucketed differently from crashes. On XP, "hangs really only have 2 effective bucketing parameters... all of particular version of an application’s hangs ended up in a single bucket." On Vista it's better, but "there are still edge cases (just as there are in crash bucketing) where a bucket does not uniquely identify a single bug." (see the hang blog posts above)
    • Need to also understand how to represent cross process hangs
  • (?) Who do we give access to minidumps?
  • (?) What is our current data retention policy?
    • We may want to keep hangs around for longer since there may be a lot, and they've never been investigated
  • (?) What is our access audit ability?
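The eight crash bucketing parameters quoted above could be modeled as a single key when mapping WinQual buckets onto our own data. This is only a minimal sketch: the class name, field names, helper function, and sample values are all hypothetical and do not come from the WinQual API or from Socorro.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class CrashBucketKey(NamedTuple):
    """One field per crash bucketing parameter quoted in the notes.

    Field names are our own invention; values are kept as strings
    because the notes do not say how WinQual encodes them.
    """
    app_name: str
    app_version: str
    app_build_date: str
    module_name: str
    module_version: str
    module_build_date: str
    exception_code: str
    code_offset: str


def count_hits_per_bucket(keys: Iterable[CrashBucketKey]) -> Counter:
    """Count how many events share each bucket key."""
    return Counter(keys)


# Hypothetical sample values, for illustration only.
sample = CrashBucketKey(
    "firefox.exe", "1.0.0.0", "2009-01-01",
    "example.dll", "1.0.0.0", "2009-01-01",
    "c0000005", "0001a2b3",
)
other = sample._replace(code_offset="0000ffff")
hits = count_hits_per_bucket([sample, sample, other])
```

Because the key is a tuple, two events land in the same bucket only when all eight parameters match. Hangs would need a different key, since they are bucketed on fewer effective parameters.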

Cab File Contents (for collector/processor)

  • FAQ - WER Services - more info here under the question "What are the different types of memory dumps?"
  • WERInternalMetadata.xml - (possibly) not present if version.txt is. Includes
    • OSVersionInformation - windows version info, architecture, etc.
    • ProblemSignatures - event type (crash/hang), crashing executable name, exe version/timestamp, methodDef token of faulting method (?), and IL offset of faulting instruction (?)
    • DynamicSignatures -
    • SystemInformation - HW info. (?) What's an MID?
  • AppCompat.txt (file name also seen in all lowercase) - not present if WERDataCollectionFailure.txt is. Includes information on all images loaded by the process.
  • WERDataCollectionFailure.txt - includes an error message if processing failed on Microsoft's side.
  • version.txt - only came across this once; it included only the OS version.
  • For crashes
  • For hangs
    • <process-name>.xml - additional hang metadata like the wait chain list
    • memory.hdmp - info about the difference between a heapdump and a minidump outlined here
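A collector/processor could start by checking which of the files listed above are present in an unpacked cab. The sketch below only encodes the observations in these notes. The function name and the returned keys are hypothetical, and the file list is assumed to come from whatever tool unpacks the cab.

```python
from typing import Dict, Iterable


def summarize_cab_listing(file_names: Iterable[str]) -> Dict[str, bool]:
    """Report which known WER files appear in a cab's file listing.

    Names are compared case-insensitively because AppCompat.txt has
    also been seen in all lowercase. Directory prefixes are dropped.
    """
    names = {
        name.replace("\\", "/").rsplit("/", 1)[-1].lower()
        for name in file_names
    }
    return {
        # Metadata file; possibly absent when version.txt is present.
        "has_metadata": "werinternalmetadata.xml" in names,
        # List of loaded images; absent when collection failed.
        "has_image_list": "appcompat.txt" in names,
        # Error message from a failed collection.
        "collection_failed": "werdatacollectionfailure.txt" in names,
        # Only seen once so far; held just the OS version.
        "has_version_txt": "version.txt" in names,
        # Heap dump observed in hang cabs.
        "has_hang_dump": "memory.hdmp" in names,
    }


# Hypothetical listings, for illustration only.
hang_cab = summarize_cab_listing(
    ["WERInternalMetadata.xml", "appcompat.txt", "memory.hdmp", "firefox.exe.xml"]
)
failed_cab = summarize_cab_listing(["cab\\WERDataCollectionFailure.txt", "version.txt"])
```

The result deliberately reports presence flags rather than a crash/hang verdict, since the notes do not yet list which files identify a crash cab.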

Future Investigations

  • Consider providing a solution (link) when the user is presented with the Windows crash dialog
    • Can even link to an exe (if part of the "Designed for Windows" logo program), which may be a good idea as a last-ditch effort if even Firefox's safe mode fails (application files no longer pristine, need reinstall).