PostCrash: Difference between revisions
Revision as of 20:48, 27 July 2010
DRAFT
The content of this page is a work in progress intended for review.
Please help improve the draft!
Ask questions or make suggestions in the discussion
or add your suggestions directly to this page.
The primary goal of this project is to engage with users who provided an e-mail address in the Crash Reporter by
- Providing relevant info about their crash as well as general info about how to avoid crashes in Firefox.
- Letting them know when their crash has been fixed in Firefox.
A secondary goal is to develop about:crashes into a more useful resource for advanced users, with more info about the crashes upfront (e.g. signature, bug status, support documentation, Firefox fix status, etc) and add relevant pointers/info to Session Restore so people can find it.
The project will require changes to SUMO and Socorro in order to supply the required functionality. For the secondary goal, Session Restore and/or about:crashes need to be changed, too.
Primary goal: Engaging with users after Firefox crashes
We should engage with users via e-mail in the following situations:
- Immediately when submitting the crash report to provide info about the crash, including relevant support documentation links for the crash signature and generic support documentation for how to avoid crashes in general.
- When new support documentation is available for the particular crash signature (if it didn't exist when the report was submitted).
- After a bug has been fixed and shipped in a Firefox release, to inform the user that the bug is fixed in Firefox version x.y.z. This will also work as a powerful way of retaining users, including getting users back who may have switched to another browser between the crash report and the fix.
When submitting the crash report
This e-mail will be automated:
- A user submits a crash report and provides an e-mail address.
- Socorro generates the signature for the crash and queries the SUMO search to see if there's an associated support article for this particular crash.
- If an article exists, the title and URL of that article are fetched and used in the e-mail.
- If an article does not exist, the report is flagged accordingly so that Socorro can revisit it when new info is available.
- An e-mail is sent out that points the user to the support article (if one exists), includes general information about how to avoid crashes (linking to a generic SUMO article listing ways to keep Firefox, plug-ins and add-ons up to date), and says where to get more info about the crash itself (linking to the crash report on Socorro).
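The steps above can be sketched roughly as follows. This is a minimal illustration, not Socorro code: `build_initial_email`, the `find_article` callback (standing in for the SUMO search) and both URLs are hypothetical names introduced here.

```python
from typing import Callable, Optional, Tuple

# Hypothetical placeholder URLs, used only for illustration.
GENERIC_ARTICLE_URL = "https://support.example.org/kb/avoid-crashes"
CRASH_REPORT_URL = "https://crash-stats.example.org/report/{crash_id}"

Article = Tuple[str, str]  # (title, url)


def build_initial_email(
    crash_id: str,
    signature: str,
    find_article: Callable[[str], Optional[Article]],
) -> dict:
    """Build the first e-mail and say whether the report needs a follow-up."""
    article = find_article(signature)  # stand-in for the SUMO search
    lines = ["Thank you for submitting a crash report."]
    if article is not None:
        title, url = article
        lines.append(f"Help for this crash: {title} ({url})")
    lines.append(f"How to avoid crashes: {GENERIC_ARTICLE_URL}")
    lines.append("Crash details: " + CRASH_REPORT_URL.format(crash_id=crash_id))
    lines.append("Please do not reply to this e-mail; "
                 "visit support.mozilla.com for help.")
    # Flag reports without an article so they can be revisited later.
    return {"body": "\n".join(lines), "needs_followup": article is None}
```

The `needs_followup` flag corresponds to flagging a report for which no article exists yet.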
Generally, the e-mail should:
- Thank the user for submitting the report and possibly describe what's going to happen to set the right expectations.
- Inform the user that we will contact them again if there's more info about the bug.
- Make it clear that it's not possible to reply to the e-mail and that they should go to support.mozilla.com for help.
Concerns about e-mail authenticity and phishing can be addressed by encouraging users to "visit mozilla.com and click on blah for more information" (taking a lead from PayPal/eBay here). We should make sure that e-mails come from a mozilla.com address and use whatever we have in place for e-mail authentication.
When new support documentation is available
This e-mail will be automated:
- A cron job would check each signature that lacks an associated SUMO article to see if new info is available.
- If a new article is detected, all crash reports with that signature and an e-mail address will be queued up for another e-mail message including the new information.
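One pass of such a cron job might look like the sketch below. All names (`queue_followups`, `find_article`, `emails_by_signature`) are hypothetical. Deduplicating addresses per signature is an added assumption, so that a user with repeated crashes is queued only once.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Article = Tuple[str, str]  # (title, url)


def queue_followups(
    pending_signatures: Iterable[str],
    find_article: Callable[[str], Optional[Article]],
    emails_by_signature: Dict[str, List[str]],
) -> List[Tuple[str, str, Article]]:
    """Return (address, signature, article) entries to e-mail."""
    queue = []
    for signature in pending_signatures:
        article = find_article(signature)
        if article is None:
            continue  # still no article; check again on the next run
        # Assumption: one follow-up per address per signature.
        for address in sorted(set(emails_by_signature.get(signature, []))):
            queue.append((address, signature, article))
    return queue
```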
Generally, this e-mail should make the new information prominent, but otherwise retain the language and key messages of the initial e-mail.
When a bug has been fixed in a shipped Firefox release
This would in part be a manual workflow:
- A Firefox bug is fixed in Bugzilla.
- A cron job notices when a bug associated with a crash signature is fixed, and puts it in a queue for the QA team.
- The QA team maintains this list of fixed signatures and keeps track of which Firefox release a fix is targeted for.
- When a fix is scheduled for a final Firefox release, the QA team passes over a list of crash bugs fixed to the SUMO team.
- The SUMO team writes a personalized, template-based message for each such fixed crash, explaining in which Firefox version the bug has been fixed and what steps the user should take next.
- The messages are sent to each user who submitted a crash report with that crash signature.
- The item is removed from the QA queue.
Secondary goals (not essential for this project)
Session Restore dialog
We want to make sure that more users are exposed to the already existing troubleshooting info about specific crashes, as well as general tips and advice on how to minimize crashes by keeping plugins up to date, etc.
A good opportunity to do that is just after Firefox resumes from a crash. The Session Restore dialog should check if Firefox has crashed more than x (e.g. 3) times in the last 24 hours. If so, it should at the very least display general tips/advice for avoiding crashes, but it should also query Socorro/SUMO to see if there is specific support available for the crash and show that as well.
The flowchart to the right illustrates the desired logic.
On the SUMO side, there would be specific articles for known crash signatures (this is already the case for many crashes today), as well as a generic checklist guide for things a user should do to avoid crashes.
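The crash-frequency check described above could be expressed like this. It is only a sketch: the function name, the default threshold of 3 and the list of crash timestamps are assumptions, and the real check would live in the Firefox client rather than in Python.

```python
from datetime import datetime, timedelta
from typing import Iterable


def should_show_crash_help(
    crash_times: Iterable[datetime],
    now: datetime,
    threshold: int = 3,
    window: timedelta = timedelta(hours=24),
) -> bool:
    """True if Firefox crashed more than `threshold` times within `window`."""
    recent = sum(1 for t in crash_times if timedelta(0) <= now - t <= window)
    return recent > threshold
```

With the defaults, four crashes in the last 24 hours trigger the tips, while three do not, matching "more than x times".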
about:crashes
We want to make about:crashes more useful. Ideally, for each crash we would display:
- The crash ID
- The crash signature (source: Socorro)
- Associated Bugzilla IDs (source: Socorro) and the status of these bugs (source: Bugzilla or Socorro)
- If a bug is RESOLVED FIXED, the version it was fixed in
- If a crash is associated with a plugin, a link to plugincheck (source: Socorro)
- If a canonical SUMO article exists, a link to that article; otherwise a link to a SUMO search for that signature (as is currently done in Socorro)
The client should obtain the extra information from Socorro and SUMO via XHR. This data should be stored locally once final. Final data includes:
- crash signature
- bugzilla ids for RESOLVED bugs
- plugin association
- canonical SUMO links
Data that should be polled each time about:crashes is loaded includes:
- Status of bugzilla bugs that were unresolved last time about:crashes was loaded
- Search for a canonical SUMO article where one previously didn't exist
In order to minimize load on Socorro and SUMO, we will by default only show an expanded view for the most recent crash. Users may individually expand other crashes (something like "check crash status" on each previous crash).
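The split between final (locally stored) data and polled data can be illustrated with a small sketch. The `CrashRecord` structure and `needs_refresh` are hypothetical. Following the text, the sketch treats only bugs whose status is "RESOLVED" as final.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CrashRecord:
    """Locally stored data for one crash (hypothetical structure)."""
    crash_id: str
    signature: Optional[str] = None
    plugin: Optional[str] = None
    sumo_url: Optional[str] = None  # canonical article only
    bug_status: Dict[int, str] = field(default_factory=dict)  # bug id -> status


def needs_refresh(record: CrashRecord) -> dict:
    """List what must still be polled when about:crashes is loaded."""
    unresolved = sorted(
        bug for bug, status in record.bug_status.items() if status != "RESOLVED"
    )
    return {"bugs": unresolved, "sumo_search": record.sumo_url is None}
```

A record with no unresolved bugs and a stored canonical article needs no requests at all, which is what keeps the load on Socorro and SUMO low.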
Question:
- Should we show both hangs and crashes? Users may load the page and see nothing but hangs. Does it make more sense to (perhaps by default) just show crashes?
Changes to Socorro:
- Add a webservice call. Given crash id, return signature, associated plugin, associated bugzilla id and bug status.
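As an illustration only, the response of such a call could carry the fields listed above. The `CrashInfo` type, its field names and the JSON encoding are assumptions; the page does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CrashInfo:
    """Hypothetical response for a crash id lookup."""
    crash_id: str
    signature: str
    plugin: Optional[str]      # None if no plugin is associated
    bug_id: Optional[int]      # None if no bug is on file
    bug_status: Optional[str]


def to_json(info: CrashInfo) -> str:
    """Serialize the response with stable key order."""
    return json.dumps(asdict(info), sort_keys=True)
```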
Early idea on a revamped about:crashes page
- The latest crash automatically queries SUMO and Socorro for status info.
- Older crashes might not query the servers automatically to reduce server load.
- To reduce information overload, the crash ID isn't shown, but clicking on the signature takes you to the full report on Socorro.
Some additional thoughts/ideas:
- Would be even better if the page could query SUMO for the full article title and show that instead of "View solution" (the SUMO service should also support l10n to make sure that a localized article title appears if it exists).
- The latest crash could be more prominently highlighted -- perhaps even have a separate section above the "Recent Crash Reports".
- The generic support link at the bottom could be expanded into its own section. The target link could take you to a start page specializing in crash/hang problems, with prominent links for how to upgrade plugins, etc.
Technical requirements
Changes to Socorro
- To be fleshed out
Changes to SUMO
- Add a webservice call that, given a signature and locale, searches for relevant content on SUMO and returns a URI. The web service should work as follows:
- If a canonical article exists for crash_signature (top search result), return the title and URL
- Else if other search results exist for crash_signature (e.g. forum threads), return URL to search results
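The decision logic of this web service might be sketched as below. The `Result` type, `SEARCH_BASE` URL and `lookup` function are hypothetical. Returning `None` when nothing matches is an assumption, since the text does not specify that case.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlencode

# Hypothetical search URL pattern, used only for illustration.
SEARCH_BASE = "https://support.example.org/{locale}/search"


@dataclass
class Result:
    title: str
    url: str
    canonical: bool  # True for a canonical article on the signature


def lookup(signature: str, locale: str, results: List[Result]) -> Optional[dict]:
    """Pick what to return for already-ranked search results."""
    if results and results[0].canonical:
        # Canonical article is the top result: return its title and URL.
        return {"title": results[0].title, "url": results[0].url}
    if results:
        # Other hits (e.g. forum threads): return the search results URL.
        query = urlencode({"q": signature})
        return {"title": None,
                "url": SEARCH_BASE.format(locale=locale) + "?" + query}
    return None  # assumption: no results means nothing to link to
```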
Questions/Discussion
- Some crashes have more than one associated bug. Should we email the user each time a bug is fixed or when all the bugs are fixed (or some other model)?
- The time between a crash and the release of its fix can be very long. Bugs being fixed on trunk now might not be available in a final release until the end of Q3 or Q4. It doesn't make much sense to e-mail users now about fixes that won't appear in a final Firefox 4.0 until then, or to e-mail them at the end of the year about a crash they had 6 months ago. A single signature with many bugs makes it nearly impossible to tell whether any one of the fixes addressed the specific crash that the user hit. We would need to start storing the full stack trace(s) of the bugs that we have fixed and map them to the full stack trace that the user experienced. -chofmann
- What volume of users supply emails? (Check: For each of the topcrashers, how many associated user emails do we have?) The architecture of the solution for sending email depends on the volume.
- It also depends on which events we send e-mails for (see the numbers below: expect to send about 12k e-mails per day if we auto-respond to every incoming report that includes an e-mail address, and around 5k per day if we e-mail only when there is a bug on file associated with the signature). -chofmann
- Do repeated crashes mean repeated e-mails to the users with the same message, or do we track what we have sent so that we don't keep telling users there is no fix for their crash yet? If we don't track it, the e-mails start to look like spam. --chofmann
- We could be over-engineering things here. The system that we had with Talkback served a good purpose. When we found a specific crash that we had something specific to tell users about, we constructed the message, then set up the system to watch for that signature and send the message. This covered a very small number of the total overall crashes, but the messages carried useful information as opposed to just auto-responding, e.g.
"from your recent crash we detected that you need to upgrade to Skype X.X to fix the crash."
At any particular point in time we might have something this specific to say in an extremely low percentage of cases. The numbers below indicate this might be lower than 1% of the times a user reports a crash to us.
- Sending e-mails with technical instructions on how to fix or mitigate crashes, especially instructions that involve downloading and installing software, is open to impersonation and phishing. We need to build in ways for users to verify that the message came from Mozilla and that the same instructions are available on an "official" Mozilla-hosted website.
some numbers
- 14k: count of reports that have an e-mail address
- 378k: total crashes per day
- 3.6%: ratio of reports with e-mails to total crashes
- 5k: reports where the signature has an e-mail and an assigned bug
- 1%: ratio of reports with e-mails and assigned bugs to total crashes
- 35%: ratio of reports with e-mails and assigned bugs to reports with e-mails

date                    email=yes  #crash  email/crash  email&bugs  email&bugs/tl  email/email&bug
20100530-crashdata.csv  12596      341229  0.03691      4319        0.0126572      0.342887
20100531-crashdata.csv  13567      372037  0.03646      4784        0.0128589      0.35262
20100601-crashdata.csv  13941      378282  0.03685      5053        0.0133578      0.362456
20100602-crashdata.csv  13806      379287  0.03639      4793        0.0126369      0.347168
20100603-crashdata.csv  12353      332924  0.03710      3855        0.0115792      0.31207

-chofmann
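The three ratios follow directly from the daily counts. As a quick check, this sketch recomputes them for the 20100601 figures (13941 reports with e-mail, 378282 crashes, 5053 reports with e-mail and an assigned bug). The function name and the result keys are made up for the illustration.

```python
def crash_ratios(with_email: int, total: int, with_email_and_bug: int) -> dict:
    """Recompute the three ratio columns from the raw daily counts."""
    return {
        "email_per_crash": with_email / total,
        "email_and_bug_per_crash": with_email_and_bug / total,
        "email_and_bug_per_email": with_email_and_bug / with_email,
    }


# Counts taken from the 20100601 figures above.
ratios = crash_ratios(with_email=13941, total=378282, with_email_and_bug=5053)
for name, value in ratios.items():
    print(f"{name}: {value:.4%}")
```

Note that the last column is the share of e-mail reports that also have an assigned bug, which is where the 35% figure comes from.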