Firefox Core Engineering: Difference between revisions

From MozillaWiki
(add Carl)
(updated, 2017-03-27)
Line 14: Line 14:
* Felipe Gomes (:felipe)
* Felipe Gomes (:felipe)
* Matt Howell (:mhowell)
* Matt Howell (:mhowell)
* Chris HC (:chutten)
* Chris HC (:chutten) -- honorary
* Kirk Steuber (:bytesized)
* Kirk Steuber (:bytesized)
* Robert Strong (:rstrong)
* Robert Strong (:rstrong)
Line 38: Line 38:


=== Mailing lists ===
=== Mailing lists ===
* bsmedberg-team
* dev-platform
* dev-platform
* fhr-dev
* fhr-dev
Line 52: Line 51:
=== telemetry.mozilla.org Dashboards ===
=== telemetry.mozilla.org Dashboards ===
All of these dashboards are in the process of transferring ownership. Please contact ddurst if you need data that is currently not functional.
All of these dashboards are in the process of transferring ownership. Please contact ddurst if you need data that is currently not functional.
* '''Add-on startup correlation:''' via iacomus; broken on 403 for S3 bucket
* '''Add-on shutdown correlation:''' via iacomus; broken on 403 for S3 bucket
* '''Main Thread I/O:''' via iacomus; broken on 403 for S3 bucket
* '''Population Distribution:''' functional
* '''Power Dashboard:''' functional
* '''SlowSQL:''' functional
* '''ChromeHangs:''' jquery csv issue resolved, backfilled data from 3/07 to 6/05
* '''Update Orphaning:''' functional
* '''Update Orphaning:''' functional
* '''Stability Dashboard:''' functional
* '''Stability Dashboard:''' functional


=== symbolapi.mozilla.org ===
=== symbolapi.mozilla.org ===
This is the [[Snappy_Symbolication_Server|symbolication server]] (aka "[https://github.com/mozilla/Snappy-Symbolication-Server Snappy Symbolication Server]") used by platform developers and performance dashboards. It is '''not''' used for the analogous process on Socorro.
This is the [[Snappy_Symbolication_Server|symbolication server]] (aka "[https://github.com/mozilla/Snappy-Symbolication-Server Snappy Symbolication Server]") used by platform developers and performance dashboards. It is '''not''' used for the analogous process on Socorro. '''This is currently being re-written by the owner of symbols, peterbe.'''


== Historical knowledge areas ==
== Historical knowledge areas ==
Line 69: Line 61:
* e10s system add-on (felipe)
* e10s system add-on (felipe)
* e10s data analysis (chutten)
* e10s data analysis (chutten)
* install and update (rstrong, mhowell)
* install and update (rstrong, mhowell, agashlin)
* telemetry, histograms, pings, and data reporting (chutten)
* telemetry, histograms, pings, and data reporting (chutten)
* stack-walking and breakpad (gsvelto)
* stack-walking and breakpad (gsvelto, ccorcoran)
* we keep moving him around (bytesized)
* flash plugin-related (bytesized, felipe, dthayer)


== Pipeline ==
== Pipeline ==
Line 82: Line 74:
* <strike>differentiate between process types in crash pings ({{Bugzilla|1310664}})</strike>
* <strike>differentiate between process types in crash pings ({{Bugzilla|1310664}})</strike>
* process stack data in crash pings into a queryable result ({{Bugzilla|1310695}})
* process stack data in crash pings into a queryable result ({{Bugzilla|1310695}})
* create CrashSender to handle crash pings instead of Gecko ({{Bugzilla|1310703}})
* <strike>create pingSender to handle crash pings instead of Gecko ({{Bugzilla|1310703}})</strike>
* enable client-side stackwalking and send basic stack traces with crash pings on Beta/GA
* enable client-side stackwalking and send basic stack traces with crash pings on Beta/release


See the [[Firefox_Core_Engineering/Get_More_Data_Faster|roadmap here]].
See the [[Firefox_Core_Engineering/Get_More_Data_Faster|roadmap here]].
Line 91: Line 83:
* <strike>create a dashboard like arewestableyet.com, but based on telemetry ({{Bugzilla|1297146}})</strike> '''Stability dashboard''': https://telemetry.mozilla.org/crashes/
* <strike>create a dashboard like arewestableyet.com, but based on telemetry ({{Bugzilla|1297146}})</strike> '''Stability dashboard''': https://telemetry.mozilla.org/crashes/
* establish confidence levels based on kilousagehours by comparing telemetry-based stability data with ADI-based stability data (ON HOLD)
* establish confidence levels based on kilousagehours by comparing telemetry-based stability data with ADI-based stability data (ON HOLD)
==== Stabilize symbolapi.m.o ====
The symbolication API service is used by platform developers for debugging. It may also be used as part of the processing step for stacks received via crash pings. But there have historically been issues with its performance ({{Bugzilla|1244589}}). Stabilizing this means:
* <strike>rewrite symbolapi.m.o, adding tests and fixing caching</strike>
* load test rewrite to ensure it improves on current uptime and load handling (PENDING)
* coordinate with Ops to set up regular deployment process and transfer ownership (ON HOLD)


==== Experiment with setting Flash to CTA by default ====
==== Experiment with setting Flash to CTA by default ====
All major browsers are stopping support for Flash; Firefox will stop supporting all NPAPI plugins (except Flash) shortly. Because there can be extreme user impact in blocking all Flash, we want to understand and attempt to smooth the transition to a post-Flash world. This includes:
All major browsers are stopping support for Flash; Firefox soon will only support Flash. Because there can be unpleasant user impact in blocking all Flash, we want to understand and attempt to smooth the transition to a post-Flash world. This includes:
* prefer fallback content to Flash ({{Bugzilla|1277346}})
* prefer fallback content to Flash ({{Bugzilla|1277346}})
* establish allowedlists and deniedlists ({{Bugzilla|1307604}}, {{Bugzilla|1307605}})
* establish allowedlists and deniedlists ({{Bugzilla|1307604}}, {{Bugzilla|1307605}})
* use heuristics to control when Flash is set to Click To Activate ({{Bugzilla|1307606}})
* use heuristics to control when Flash is set to Click To Activate ({{Bugzilla|1307606}})
We hope to do a geography-specific SHIELD study in release to understand user response and impact ({{Bugzilla|1277346}}).
We hope to do a geography-specific SHIELD study in FF53 release to understand user response and impact ({{Bugzilla|1277346}}).


See the [https://docs.google.com/document/d/1sYp0DNioPA5iF3iw9LHGf1uN5B5AgdCu7jJxAh0MiqA/edit details here].
See the [https://docs.google.com/document/d/1sYp0DNioPA5iF3iw9LHGf1uN5B5AgdCu7jJxAh0MiqA/edit details here].
Line 119: Line 105:
==== Updater and Orphan remediation ====
==== Updater and Orphan remediation ====
Remediation efforts have been tested for both system add-on capable and non (44.x and 43.0.1, respectively). Analysis thus far confirms the reach but not the effectiveness or rate of conversion that we'd hoped for. This means:
Remediation efforts have been tested for both system add-on capable and non (44.x and 43.0.1, respectively). Analysis thus far confirms the reach but not the effectiveness or rate of conversion that we'd hoped for. This means:
* continue the download instead of starting over after NS_ERROR_DOCUMENT_NOT_CACHED occurs (already fixed in Firefox 49) ({{Bugzilla|1272585}})
* <strike>continue the download instead of starting over after NS_ERROR_DOCUMENT_NOT_CACHED occurs ({{Bugzilla|1272585}})</strike> (FF49)
* continue the download instead of starting over after other networking errors occur ({{Bugzilla|1309124}})
* continue the download instead of starting over after other networking errors occur ({{Bugzilla|1309124}})
* <strike>download the update MAR file unthrottled (already landed) ({{Bugzilla|1309125}}, {{Bugzilla|1309668}})</strike>
* <strike>download the update MAR file unthrottled (already landed) ({{Bugzilla|1309125}}, {{Bugzilla|1309668}})</strike>
Line 125: Line 111:
* <strike>push either a system or hotfix add-on that changed the download throttle preference to 0 for FF 50+</strike>
* <strike>push either a system or hotfix add-on that changed the download throttle preference to 0 for FF 50+</strike>
* <strike>run another method (non sysaddon, non SHIELD?, etc) to urge 43.0.1 users to upgrade</strike>
* <strike>run another method (non sysaddon, non SHIELD?, etc) to urge 43.0.1 users to upgrade</strike>
* Updater UI is outdated, too big, and needs to be updated ({{Bugzilla|893505}})
* Updater UI is outdated, too big, and needs to be updated ({{Bugzilla|893505}}) (FF55)
** most recent mockups: https://mozilla.invisionapp.com/share/Y776FIBWS#/screens
** most recent mockups: https://mozilla.invisionapp.com/share/Y776FIBWS#/screens
* change compression to LZMA for updates ({{Bugzilla|641212}})
* change compression to LZMA for updates ({{Bugzilla|641212}}) (FF55?)
* move to SHA-384 for MAR signing ({{Bugzilla|1324498}})
* move to SHA-384 for MAR signing ({{Bugzilla|1324498}}) (FF55?)
* continue the download of an update if Firefox closes, instead of having to start over
* create an Update Agent, responsible for running independently, daily, and downloading an update if found ({{Bugzilla|1343669}})


==== Install UI ====
==== Install UI ====
* Streamlined installer testing in qx onboarding funnelcake ({{Bugzilla|1328445}})  
* Streamlined installer testing in QX onboarding funnelcake ({{Bugzilla|1328445}})  


==== Windows 64 ====
==== Windows 64 ====
Line 141: Line 127:
=== Current projects ===
=== Current projects ===
==== 2017 Q1 goals ====
==== 2017 Q1 goals ====
* update https://telemetry.mozilla.org/update-orphaning/ to v2
* <strike>update https://telemetry.mozilla.org/update-orphaning/ to v2</strike>
* start querying stacks received from crash pings
* start querying stacks received from crash pings
* relaunch of symbolapi.m.o (ON HOLD)
* <strike>relaunch of symbolapi.m.o (ON HOLD)</strike>
* implement crash ping signatures
* implement crash ping signatures
* create "Plugin Safety" (Flash) SHIELD study
* create "Plugin Safety" (Flash) SHIELD study
* simplify Updater UI
* simplify Updater UI
* LZMA and SHA384 for MAR files
* LZMA and SHA384 for MAR files
* land pingSender and implement for crashes and main pings
* <strike>land pingSender and implement for crashes and main pings</strike>




Line 154: Line 140:
This list should be considered a work in progress. Decisions will be reflected for a particular quarter.
This list should be considered a work in progress. Decisions will be reflected for a particular quarter.
* Assisting with measuring (and addressing) jank and hang
* Assisting with measuring (and addressing) jank and hang
* BHR
* assisting with Quantum Flow
* assisting with Photon


== (partial) Active Bug List ==
== (partial) Active Bug List ==

Revision as of 01:09, 28 March 2017

Function

Quoting the Platform/UI Team:

"The Platform teams and Firefox teams are co-dependent, but oftentimes priorities between the two groups do not line up perfectly. If Platform has a goal to ship some new feature that requires UI, and the Firefox team has no cycles to help develop the UX or UI, then that feature will often languish."

This can happen in the opposite direction too, with Toolkit needing support for a feature that doesn't line up with current Platform priorities.

The purpose of this team is to address needs that fall between Toolkit and Platform, with an emphasis (currently) on improving stability, quality, and performance – supported by empirical data. As such, we overlap a bit with everyone from Platform through Toolkit, Data, and more.

This team grew out of, in part, the Performance Engineering team, and owns that team's previous infrastructure – performance-related dashboards on telemetry.mozilla.org, the symbolication server, and more. It also includes the installer & updater applications.

Personnel

  • Carl Corcoran (:ccorcoran)
  • Neil Deakin (:enn)
  • Adam Gashlin (:agashlin)
  • Felipe Gomes (:felipe)
  • Matt Howell (:mhowell)
  • Chris HC (:chutten) -- honorary
  • Kirk Steuber (:bytesized)
  • Robert Strong (:rstrong)
  • Gabriele Svelto (:gsvelto)
  • Doug Thayer (:dthayer)
  • David Durst (:ddurst)

Goals

  1. Enabling stability
    In general, things we do should be tied to enabling stability – so, making it measurable and/or addressing issues.
  2. Supporting performance improvements
    Improving performance is certainly everyone's job—not just our team—but we hold the keys for some distinct pieces of the analysis that allow people to understand what needs to be improved. This is primarily, but not limited to, telemetry and data analysis.
  3. Improving the user/contributor experience
    This one is the weirdest: it covers things that we can do to further the web, and improve the experience for our users – both end-users and code contributors (some examples: Flash blocking, XUL performance analysis). This category is the most ripe for population.

Communication

You can typically find us in:

IRC

  • #fce
  • #developers
  • #e10s
  • #fx-team
  • #perf
  • #telemetry
  • #uptime

Mailing lists

  • dev-platform
  • fhr-dev

Process and Queuing

There is currently no regimented process for regular triage of candidate work. Needs usually filter down through Benjamin Smedberg's team or arise tangentially from performance analysis and experimentation.

All actively tracked work is marked with the whiteboard "[fce-active]" (for now). Alternatively, see the (partial) Active Bug List on this page.

Major initiatives are listed on this page.

Owned Infrastructure (needs updating per 1298080)

telemetry.mozilla.org Dashboards

All of these dashboards are in the process of transferring ownership. Please contact ddurst if you need data that is currently not functional.

  • Update Orphaning: functional
  • Stability Dashboard: functional

symbolapi.mozilla.org

This is the symbolication server (aka "Snappy Symbolication Server") used by platform developers and performance dashboards. It is not used for the analogous process on Socorro. This is currently being re-written by the owner of symbols, peterbe.

Historical knowledge areas

  • back-end of the user interface/XUL, et al (enn)
  • e10s system add-on (felipe)
  • e10s data analysis (chutten)
  • install and update (rstrong, mhowell, agashlin)
  • telemetry, histograms, pings, and data reporting (chutten)
  • stack-walking and breakpad (gsvelto, ccorcoran)
  • flash plugin-related (bytesized, felipe, dthayer)

Pipeline

Current efforts

Get More Data Faster

We need to reduce known blind spots and barriers to getting data. For this, our goals are to:

  • enable client-side stackwalking and send basic stack traces with crash pings (beginning in Nightly/Aurora, see 1280484)
  • enable content process crash reports (1293656)
  • differentiate between process types in crash pings (1310664)
  • process stack data in crash pings into a queryable result (1310695)
  • create pingSender to handle crash pings instead of Gecko (1310703)
  • enable client-side stackwalking and send basic stack traces with crash pings on Beta/release

See the roadmap here.

Stability Dashboard for Relman

Relman has been using arewestableyet.com and related graphs to understand stability by build and channel. This works, but it relies on ADI and crash-stats rather than telemetry, and that data is known to be unreliable. For this, our goals are to:

  • create a dashboard like arewestableyet.com, but based on telemetry (1297146) Stability dashboard: https://telemetry.mozilla.org/crashes/
  • establish confidence levels based on kilousagehours by comparing telemetry-based stability data with ADI-based stability data (ON HOLD)
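
As a rough illustration of the kilousagehours measure mentioned above, the sketch below normalizes a crash count by thousands of usage hours. It assumes "kilousagehours" means 1,000 hours of reported usage. The function name and the sample numbers are hypothetical and are not taken from the dashboard's code.

```python
def crashes_per_kilo_usage_hours(crash_count: int, usage_hours: float) -> float:
    """Return the number of crashes per 1,000 usage hours.

    Assumption: one "kilo usage hour" is 1,000 hours of reported usage.
    """
    if usage_hours <= 0:
        raise ValueError("usage_hours must be positive")
    return crash_count / (usage_hours / 1000.0)


if __name__ == "__main__":
    # Hypothetical figures: 50 crashes observed over 25,000 usage hours.
    print(crashes_per_kilo_usage_hours(50, 25_000))
```

Normalizing by usage hours rather than by installation counts keeps builds with very different amounts of use comparable.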

Experiment with setting Flash to CTA by default

All major browsers are stopping support for Flash; Firefox will soon support Flash as its only NPAPI plugin. Because blocking all Flash can have an unpleasant impact on users, we want to understand and attempt to smooth the transition to a post-Flash world. This includes:

  • prefer fallback content to Flash (1277346)
  • establish allowedlists and deniedlists (1307604, 1307605)
  • use heuristics to control when Flash is set to Click To Activate (1307606)

We hope to do a geography-specific SHIELD study in FF53 release to understand user response and impact (1277346).

See the details here.

Dashboard (repairs and additions)

Dashboards. So many dashboards.

  • change Update Orphaning dashboard to use MainSummary instead of longitudinal dataset
  • Removing potentially unused dashboards from t.m.o (1324526)
  • Updating dashboards (1324528)

Updater and Orphan remediation

Remediation efforts have been tested for both system add-on capable and non-capable versions (44.x and 43.0.1, respectively). Analysis thus far confirms the reach but not the effectiveness or rate of conversion that we'd hoped for. This means:

  • continue the download instead of starting over after NS_ERROR_DOCUMENT_NOT_CACHED occurs (1272585) (FF49)
  • continue the download instead of starting over after other networking errors occur (1309124)
  • download the update MAR file unthrottled (already landed) (1309125, 1309668)
  • serve a partial MAR file to Firefox 43.0.1 clients (1309130)
  • push either a system or hotfix add-on that changed the download throttle preference to 0 for FF 50+
  • run another method (non sysaddon, non SHIELD?, etc) to urge 43.0.1 users to upgrade
  • Updater UI is outdated, too big, and needs to be updated (893505) (FF55)
  • change compression to LZMA for updates (641212) (FF55?)
  • move to SHA-384 for MAR signing (1324498) (FF55?)
  • create an Update Agent, responsible for running independently, daily, and downloading an update if found (1343669)
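
To make the LZMA and SHA-384 items above more concrete, here is a minimal sketch using only the Python standard library. It shows the two primitives in isolation. It does not reproduce the MAR file format or the actual signing tooling, and the payload and function names are made up for illustration.

```python
import hashlib
import lzma


def compress_payload(data: bytes) -> bytes:
    """Compress bytes with LZMA (Python's default .xz container)."""
    return lzma.compress(data)


def sha384_hex(data: bytes) -> str:
    """Return the SHA-384 digest of the data as a hex string."""
    return hashlib.sha384(data).hexdigest()


if __name__ == "__main__":
    # Hypothetical stand-in for update contents.
    payload = b"example update payload " * 100
    compressed = compress_payload(payload)
    print(len(payload), len(compressed))
    print(sha384_hex(compressed))
```

A real signing step would sign the digest with a private key; that part is deliberately left out here.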

Install UI

  • Streamlined installer testing in QX onboarding funnelcake (1328445)

Windows 64

We want to start moving users to 64-bit when appropriate:

  • stub installer should automatically select 32-bit or 64-bit (797208)
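
The selection step could look roughly like the sketch below. This is an assumption-heavy illustration, not the stub installer's logic: it decides only on the reported machine architecture, and both the function name and the set of recognized machine strings are hypothetical. Any further criteria a real installer might apply are not covered.

```python
import platform

# Hypothetical set of machine strings treated as 64-bit x86.
SIXTY_FOUR_BIT_MACHINES = {"amd64", "x86_64"}


def select_build(machine: str) -> str:
    """Return "64-bit" or "32-bit" for a machine string.

    The input is expected to look like the value of platform.machine().
    """
    if machine.lower() in SIXTY_FOUR_BIT_MACHINES:
        return "64-bit"
    return "32-bit"


if __name__ == "__main__":
    print(select_build(platform.machine()))
```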


Current projects

2017 Q1 goals

  • update https://telemetry.mozilla.org/update-orphaning/ to v2
  • start querying stacks received from crash pings
  • relaunch of symbolapi.m.o (ON HOLD)
  • implement crash ping signatures
  • create "Plugin Safety" (Flash) SHIELD study
  • simplify Updater UI
  • LZMA and SHA384 for MAR files
  • land pingSender and implement for crashes and main pings


Potential future projects

This list should be considered a work in progress. Decisions will be reflected for a particular quarter.

  • Assisting with measuring (and addressing) jank and hang
  • BHR
  • assisting with Quantum Flow
  • assisting with Photon

(partial) Active Bug List

No results.

0 Total; 0 Open (0%); 0 Resolved (0%); 0 Verified (0%);