Firefox Core Engineering: Difference between revisions

From MozillaWiki
m (update bug #)
(Updated her name too)
 
(21 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{FirefoxCoreEngineering_banner}}
== Function ==
== Function ==
Quoting the [[Platform/UI Team]]:
The purpose of this team is to address needs that fall between Toolkit and Platform, with an emphasis (currently) on improving stability, quality, and performance – supported by empirical data. As such, we overlap a bit with everyone from Gecko, Desktop, Data, and more.
<blockquote>"The Platform teams and Firefox teams are co-dependent, but oftentimes priorities between the two groups do not line up perfectly. If Platform has a goal to ship some new feature that requires UI, and the Firefox team has no cycles to help develop the UX or UI, then it that feature will often languish."</blockquote>
This can happen in the opposite direction too, with Toolkit needing support for a feature that doesn't line up with current Platform priorities.


The purpose of this team is to address needs that fall between Toolkit and Platform, with an emphasis (currently) on improving stability, quality, and performance &ndash; supported by empirical data. As such, we overlap a bit with everyone from Platform through Toolkit, Data, and more.
This team grew out of, in part, the Performance Engineering team, and owns some of that team's infrastructure &ndash; some performance-related dashboards on telemetry.mozilla.org, crash analysis, hang visualization, etc. It also includes the installer & updater applications.
 
This team grew out of, in part, the Performance Engineering team, and owns that team's previous infrastructure &ndash; performance-related dashboards on telemetry.mozilla.org, the symbolication server, and more. It also includes the installer & updater applications.


== Personnel ==
== Personnel ==
Line 12: Line 9:
* Adam Gashlin (:agashlin)
* Adam Gashlin (:agashlin)
* Felipe Gomes (:felipe)
* Felipe Gomes (:felipe)
* Matt Howell (:mhowell)
* Molly Howell (:mhowell)
* Chris HC (:chutten)
* Chris HC (:chutten) -- honorary
* Kirk Steuber (:bytesized)
* Robin Steuber (:bytesized)
* Robert Strong (:rstrong)
* Robert Strong (:rstrong)
* Gabriele Svelto (:gsvelto)
* Gabriele Svelto (:gsvelto)
Line 22: Line 19:
== Goals ==
== Goals ==
# <b>Enabling stability</b><br/>In general, things we do should be tied to enabling stability &ndash; so, making it measurable and/or addressing issues.
# <b>Enabling stability</b><br/>In general, things we do should be tied to enabling stability &ndash; so, making it measurable and/or addressing issues.
# <b>Supporting performance improvements</b><br/>Improving performance is certainly everyone's job&mdash;not just our team&mdash;but we hold the keys for some distinct pieces of the analysis that allow people to understand what needs to be improved. This is primarily, but not limited to, telemetry and data analysis.
# <b>Supporting performance improvements</b><br/>Improving performance is certainly everyone's job&mdash;not just our team&mdash;but we hold the keys for some distinct historical pieces of the analysis that allow people to understand what needs to be improved. This is primarily, but not limited to, telemetry and data analysis.
# <b>Improving the user/contributor experience</b><br/>This one is the weirdest: it covers things that we can do to further the web, and improve the experience for our users &ndash; both end-users and code contributors (some examples: Flash blocking, XUL performance analysis). This category is the most ripe for population.
# <b>Improving the user/contributor experience</b><br/>This one is the weirdest: it covers things that we can do to further the web, and improve the experience for our users &ndash; both end users and code contributors (some examples: Flash blocking, XUL performance analysis). This category is the most open-ended for future expansion.


== Communication ==
== Communication ==
You can typically find us in:
You can typically find us in:
=== IRC ===
=== IRC ===
* '''#fce'''
* '''#fce''' (primary)
* #developers
* #developers
* #e10s
* #e10s
* #fx-team
* #fx-team
* #perf
* #perf
* #releng
* #telemetry
* #telemetry
* #uptime
* #uptime


=== Mailing lists ===
=== Mailing lists ===
* bsmedberg-team
* dev-platform
* dev-platform
* fhr-dev
* fhr-dev
* firefox-dev


== Process and Queuing ==
== Process and Queuing ==
There is currently no regimented process for regular triage of candidate work. Needs usually filter down through Benjamin Smedberg's team or tangentially related to performance analysis and experimentation.
There is currently no regimented process for regular triage of candidate work. Needs usually filter down through performance analysis and experimentation.


[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=all_fce-active&sharer_id=500559 All actively tracked work is marked with the whiteboard "<nowiki>[fce-active]</nowiki>"] (for now). Or look at the [[#Active Bug List]] on this page.
[https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=all_fce-active&sharer_id=500559 All actively tracked work is marked with the whiteboard "<nowiki>[fce-active-legacy]</nowiki>"] (for now). Or look at the [[#Active Bug List]] on this page.


Major initiatives are listed on this page.
Major initiatives are listed on this page.
Line 50: Line 48:
== Owned Infrastructure (needs updating per {{Bugzilla|1298080}}) ==
== Owned Infrastructure (needs updating per {{Bugzilla|1298080}}) ==
=== telemetry.mozilla.org Dashboards ===
=== telemetry.mozilla.org Dashboards ===
All of these dashboards are in the process of transferring ownership. Please contact ddurst if you need data that is currently not functional.
* '''Add-on startup correlation:''' via iacomus; broken on 403 for S3 bucket
* '''Add-on shutdown correlation:''' via iacomus; broken on 403 for S3 bucket
* '''Main Thread I/O:''' via iacomus; broken on 403 for S3 bucket
* '''Population Distribution:''' functional
* '''Power Dashboard:''' functional
* '''SlowSQL:''' functional
* '''ChromeHangs:''' jquery csv issue resolved, backfilled data from 3/07 to 6/05
* '''Update Orphaning:''' functional
* '''Update Orphaning:''' functional
* '''Stability Dashboard:''' functional


=== symbolapi.mozilla.org ===
=== symbolapi.mozilla.org ===
This is the [[Snappy_Symbolication_Server|symbolication server]] (aka "[https://github.com/mozilla/Snappy-Symbolication-Server Snappy Symbolication Server]") used by platform developers and performance dashboards. It is '''not''' used for the analogous process on Socorro.
This is the [[Snappy_Symbolication_Server|symbolication server]] (aka "[https://github.com/mozilla/Snappy-Symbolication-Server Snappy Symbolication Server]") used by platform developers and performance dashboards. It is '''not''' used for the analogous process on Socorro. '''This is currently slated to be replaced by the owner of symbols, peterbe. See [https://github.com/mozilla-services/tecken Tecken].'''


== Historical knowledge areas ==
== Historical knowledge areas ==
* back-end of the user interface/XUL, et al (enn)
* back-end of the user interface/XUL, et al (enn)
* e10s system add-on (felipe)
* e10s system add-ons & system add-ons for feature rollout (felipe)
* e10s data analysis (chutten)
* e10s data analysis (chutten)
* install and update (rstrong, mhowell)
* install and update (rstrong, mhowell)
* telemetry, histograms, pings, and data reporting (chutten)
* telemetry, histograms, pings, and data reporting (chutten)
* stack-walking and breakpad (gsvelto)
* stack-walking, breakpad, and crash pings (gsvelto, ccorcoran)
* we keep moving him around (bytesized)
* flash plugin-related (bytesized, felipe, dthayer)
* policy engine and MVP (felipe, bytesized)
* migration performance, BHR dashboard (dthayer)
 


== Pipeline ==
== Pipeline ==
=== Current efforts ===
=== Get More Data Faster ===
==== Get More Data Faster ====
We need to reduce known blind spots and barriers to getting data AND commit to non-ADI based metrics. For this, our goals are to:
We need to reduce known blind spots and barriers to getting data. For this, our goals are to:
* enable client-side stackwalking and send basic stack traces with crash pings (beginning in Nightly/Aurora, see {{Bugzilla|1280484}})
* <strike>enable content process crash reports ({{Bugzilla|1293656}})</strike>
* <strike>differentiate between process types in crash pings ({{Bugzilla|1310664}})</strike>
* process stack data in crash pings into a queryable result ({{Bugzilla|1310695}})
* process stack data in crash pings into a queryable result ({{Bugzilla|1310695}})
* create CrashSender to handle crash pings instead of Gecko ({{Bugzilla|1310703}})
* enable client-side stackwalking and send basic stack traces with crash pings on Beta/GA


See the [[Firefox_Core_Engineering/Get_More_Data_Faster|roadmap here]].
See the [[Firefox_Core_Engineering/Get_More_Data_Faster|legacy roadmap here]].
 
=== Set Flash to CTA by default ===
This includes:
* prefer fallback content to Flash ({{Bugzilla|1277346}})


==== Stability Dashboard for Relman ====
See the [https://docs.google.com/document/d/1sYp0DNioPA5iF3iw9LHGf1uN5B5AgdCu7jJxAh0MiqA/edit details here].
Relman has been using arewestableyet.com and related graphs to understand stability by build and channel; this is fine, but it relies on ADI and crash-stats rather than telemetry, and this is known to be unreliable. For this, our goals are to:
* <strike>create a dashboard like arewestableyet.com, but based on telemetry ({{Bugzilla|1297146}})</strike> '''Stability dashboard''': https://telemetry.mozilla.org/crashes/
* establish confidence levels based on kilousagehours by comparing telemetry-based stability data with ADI-based stability data


==== Stabilize symbolapi.m.o ====
=== XBL/XUL replacement ===
The symbolication API service is used by platform developers for debugging. It may also be used as part of the processing step for stacks received via crash pings. But there have historically been issues with its performance ({{Bugzilla|1244589}}). Stabilizing this means:
TBD with and after Browser Architecture's recommendations.
* <strike>rewrite symbolapi.m.o, adding tests and fixing caching</strike>
* load test rewrite to ensure it improves on current uptime and load handling (PENDING)
* coordinate with Ops to set up regular deployment process and transfer ownership (PENDING)


==== Experiment with blocking Flash ====
=== Policy Engine ===
All major browsers are stopping support for Flash; Firefox will stop supporting all NPAPI plugins (except Flash) shortly. Because there can be extreme user impact in blocking all Flash, we want to understand and attempt to smooth the transition to a post-Flash world. This includes:
When implemented, this should provide an API for pre-defined policies to support enterprise management of Firefox deployments.
* prefer fallback content to Flash ({{Bugzilla|1277346}})
* establish allowedlists and deniedlists ({{Bugzilla|1307604}}, {{Bugzilla|1307605}})
* use heuristics to control when Flash is set to Click To Activate ({{Bugzilla|1307606}})
We hope to do a geography-specific SHIELD study to understand user response and impact, but first we have to develop the mechanics needed for any such study.


==== Dashboard (repairs and additions) ====
=== Migration performance optimization ===
Dashboards. So many dashboards.
With bug {{Bugzilla|1332225}}, investigate and optimize the migration process for new users.
* ChromeHangs, SlowSQL, and Main Thread I/O dashboard rebuild
* change Update Orphaning dashboard to use MainSummary instead of longitudinal dataset


==== XUL performance tests ====
XUL is supposed to go away, but it would seem that we don't know what the performance implications could/will be. This work builds on Neil Deakin's 2015 experiment to shed some light on where we need to focus our optimization/change efforts.


==== Updater and Orphan remediation ====
=== App Updater and Installers ===
Remediation efforts have been tested for both system add-on capable and non (44.x and 43.0.1, respectively). Analysis thus far confirms the reach but not the effectiveness or rate of conversion that we'd hoped for. This means:
* continue the download instead of starting over after NS_ERROR_DOCUMENT_NOT_CACHED occurs (already fixed in Firefox 49) ({{Bugzilla|1272585}})
* continue the download instead of starting over after other networking errors occur ({{Bugzilla|1309124}})
* <strike>download the update MAR file unthrottled (already landed) ({{Bugzilla|1309125}}, {{Bugzilla|1309668}})</strike>
* <strike>serve a partial MAR file to Firefox 43.0.1 clients ({{Bugzilla|1309130}})</strike>
* push either a system or hotfix add-on that changed the download throttle preference to 0
* run another method (non sysaddon, non SHIELD?, etc) to urge 43.0.1 users to upgrade
* change compression to LZMA for updates ({{Bugzilla|641212}})


==== Install UI ====
==== Update Orphan remediation ====  
* The install UI is outdated (and too big) and needs to be updated. ({{Bugzilla|893505}})
Remediation efforts have been tested for both system add-on capable and non (44.x and 43.0.1, respectively). Efforts are identified by ongoing analysis, including the [https://telemetry.mozilla.org/update-orphaning/ update orphaning dashboard]. This has yielded such things as:
** most recent mockups: https://mozilla.invisionapp.com/share/Y776FIBWS#/screens
* continue the download instead of starting over after other networking errors occur ({{Bugzilla|1309124}}, <strike>{{Bugzilla|1348087}}</strike>)
* create an Update Agent, responsible for running independently, daily, and downloading an update if found ({{Bugzilla|1343669}})
* create a dashboard for non-orphan telemetry analysis


==== Windows 64 ====
==== Installer ====
We want to start moving users to 64-bit when appropriate:
* <strike>rename installed links to "Firefox" instead of "Mozilla Firefox" ({{Bugzilla|1413295}})</strike>
* <strike>stub installer should automatically select 32-bit or 64-bit ({{Bugzilla|797208}})</strike>
* stub installer metrics ({{Bugzilla|995794}})
* investigate MSI-based (read: non-NSIS) installer




=== Current projects ===
== Current projects ==
==== 2016 Q4 goals ====
=== 2018 Q1 goals ===
* landing of client-side stackwalking (DONE)
* APP UPDATE (see [[Firefox_Core_Engineering/App_Updater_roadmap|legacy roadmap]]):
* create separate content process crash pings (DONE)
** Allow update download to continue in the background (beyond Firefox session)
* start querying stacks received from crash pings (IN PROGRESS)
* INSTALLER (see [[Firefox_Core_Engineering/App_Installer_roadmap|legacy roadmap]]):
* relaunch of symbolapi.m.o -- now with tests and safe cache management (IN QA)
** Outline plan for MSI-based installer
* completion of definition phase of Flash-blocking & UI project (IN PROGRESS)
** Support onboarding
* LZMA compression for updates (IN REVISION)
* CRASH MACHINERY (see [[Firefox_Core_Engineering/Crash_machinery_roadmap|legacy roadmap]]):
* updated Install UI (IN PROGRESS)
** (Implement crash ping signatures) -- relies on Data Pipeline
* standardize orphan remediation process with respect to GA release cycle (IN ANALYSIS)
** (Start querying stacks received from crash pings for stability monitoring) -- relies on PI
* PERFORMANCE:
** Assist with measuring (and identifying) jank and hang via BHR
** Assist with performance improvements for migration
* ENTERPRISE:
** Prototype policy engine and policies MVP (for 59 beta tests, release in 60)
* XUL:
** Assist Browser Architecture team's flexbox recommendation effort


== Potential future projects ==
== Potential future projects ==
This list should be considered a work in progress. Decisions will be reflected for a particular quarter.
This list should be considered a work in progress. Decisions will be reflected for a particular quarter.
* Assisting with measuring (and addressing) jank and hang


== Active Bug List ==
* cmore is pitching that Firefox optimizes the user paths that support retention -- which could also include fixing paths where retention drops. This work probably involves a system add-on that initiates event-based recordation, as well as the analysis and remediation of the root cause (this is pending dcamp approval).
* DLL injection (see https://bugzilla.mozilla.org/show_bug.cgi?id=1306406) needs investigation/implementation of a dynamically updateable DLL blocklist (possibly using Kinto?).
* Mossop (& browser arch) has begun the de-XBL. Overall browser architecture (& UI architecture) could be game in the near future.
 
 
== (partial) Active Bug List ==
<bugzilla>
<bugzilla>
{
{
"whiteboard": "[fce-active]",
"whiteboard": "[fce-active-legacy]",
"resolution": "---",
"resolution": "---",
"include_fields": "id, product, summary, priority, status, whiteboard, keywords, assigned_to"
"include_fields": "id, product, summary, priority, status, whiteboard, keywords, assigned_to"
}
}
</bugzilla>
</bugzilla>

Latest revision as of 17:21, 18 November 2025

The Firefox Core Engineering (FCE) team ran from March 2016 through January 2018. All work on that team has been dispersed to other teams.

This page should be considered legacy.

For more information on former FCE activities, contact the individuals listed under Historical knowledge areas. (If you're in a pinch, ask ddurst.)

Function

The purpose of this team is to address needs that fall between Toolkit and Platform, with an emphasis (currently) on improving stability, quality, and performance – supported by empirical data. As such, we overlap a bit with everyone from Gecko, Desktop, Data, and more.

This team grew out of, in part, the Performance Engineering team, and owns some of that team's infrastructure – some performance-related dashboards on telemetry.mozilla.org, crash analysis, hang visualization, etc. It also includes the installer & updater applications.

Personnel

  • Neil Deakin (:enn)
  • Adam Gashlin (:agashlin)
  • Felipe Gomes (:felipe)
  • Molly Howell (:mhowell)
  • Chris HC (:chutten) -- honorary
  • Robin Steuber (:bytesized)
  • Robert Strong (:rstrong)
  • Gabriele Svelto (:gsvelto)
  • Doug Thayer (:dthayer)
  • David Durst (:ddurst)

Goals

  1. Enabling stability
    In general, things we do should be tied to enabling stability – so, making it measurable and/or addressing issues.
  2. Supporting performance improvements
    Improving performance is certainly everyone's job—not just our team—but we hold the keys for some distinct historical pieces of the analysis that allow people to understand what needs to be improved. This is primarily, but not limited to, telemetry and data analysis.
  3. Improving the user/contributor experience
    This one is the weirdest: it covers things that we can do to further the web, and improve the experience for our users – both end users and code contributors (some examples: Flash blocking, XUL performance analysis). This category is the most open-ended for future expansion.

Communication

You can typically find us in:

IRC

  • #fce (primary)
  • #developers
  • #e10s
  • #fx-team
  • #perf
  • #releng
  • #telemetry
  • #uptime

Mailing lists

  • dev-platform
  • fhr-dev
  • firefox-dev

Process and Queuing

There is currently no regimented process for regular triage of candidate work. Needs usually filter down through performance analysis and experimentation.

All actively tracked work is marked with the whiteboard tag "[fce-active-legacy]" (for now). Alternatively, see the (partial) Active Bug List on this page.

Major initiatives are listed on this page.

Owned Infrastructure (needs updating per bug 1298080)

telemetry.mozilla.org Dashboards

  • Update Orphaning: functional

symbolapi.mozilla.org

This is the symbolication server (aka "Snappy Symbolication Server") used by platform developers and performance dashboards. It is not used for the analogous process on Socorro. This is currently slated to be replaced by the owner of symbols, peterbe. See Tecken.

Historical knowledge areas

  • back-end of the user interface/XUL, et al (enn)
  • e10s system add-ons & system add-ons for feature rollout (felipe)
  • e10s data analysis (chutten)
  • install and update (rstrong, mhowell)
  • telemetry, histograms, pings, and data reporting (chutten)
  • stack-walking, breakpad, and crash pings (gsvelto, ccorcoran)
  • flash plugin-related (bytesized, felipe, dthayer)
  • policy engine and MVP (felipe, bytesized)
  • migration performance, BHR dashboard (dthayer)


Pipeline

Get More Data Faster

We need to reduce known blind spots and barriers to getting data AND commit to non-ADI-based metrics. For this, our goals are to:

  • process stack data in crash pings into a queryable result (1310695)

See the legacy roadmap here.

Set Flash to CTA by default

This includes:

  • prefer fallback content to Flash (1277346)

See the details here.

XBL/XUL replacement

TBD with and after Browser Architecture's recommendations.

Policy Engine

When implemented, this should provide an API for pre-defined policies to support enterprise management of Firefox deployments.

Migration performance optimization

With bug 1332225, investigate and optimize the migration process for new users.


App Updater and Installers

Update Orphan remediation

Remediation efforts have been tested on both system add-on-capable and non-capable versions (44.x and 43.0.1, respectively). Efforts are identified by ongoing analysis, including the update orphaning dashboard. This has yielded such things as:

  • continue the download instead of starting over after other networking errors occur (1309124, 1348087)
  • create an Update Agent, responsible for running independently, daily, and downloading an update if found (1343669)
  • create a dashboard for non-orphan telemetry analysis

Installer

  • rename installed links to "Firefox" instead of "Mozilla Firefox" (1413295)
  • stub installer metrics (995794)
  • investigate MSI-based (read: non-NSIS) installer


Current projects

2018 Q1 goals

  • APP UPDATE (see legacy roadmap):
    • Allow update download to continue in the background (beyond Firefox session)
  • INSTALLER (see legacy roadmap):
    • Outline plan for MSI-based installer
    • Support onboarding
  • CRASH MACHINERY (see legacy roadmap):
    • (Implement crash ping signatures) -- relies on Data Pipeline
    • (Start querying stacks received from crash pings for stability monitoring) -- relies on PI
  • PERFORMANCE:
    • Assist with measuring (and identifying) jank and hang via BHR
    • Assist with performance improvements for migration
  • ENTERPRISE:
    • Prototype policy engine and policies MVP (for 59 beta tests, release in 60)
  • XUL:
    • Assist Browser Architecture team's flexbox recommendation effort

Potential future projects

This list should be considered a work in progress. Decisions will be reflected for a particular quarter.

  • cmore is pitching that Firefox optimize the user paths that support retention, which could also include fixing paths where retention drops. This work probably involves a system add-on that initiates event-based recording, as well as analysis and remediation of the root cause (pending dcamp approval).
  • DLL injection (see https://bugzilla.mozilla.org/show_bug.cgi?id=1306406) needs investigation/implementation of a dynamically updateable DLL blocklist (possibly using Kinto?).
  • Mossop (& browser arch) has begun the de-XBL effort. Overall browser architecture (& UI architecture) could be fair game in the near future.


(partial) Active Bug List

Full Query
ID | Product | Summary | Priority | Status | Whiteboard | Keywords | Assigned to
799861 | Firefox | Stub installer does not detect proxy (Win7) | P3 | NEW | [stubv3-][fce-active-legacy] | |
1256952 | Core | [e10s] Caret breaks in contenteditable elements when I drag anything over them (without dropping) | P2 | REOPENED | [fce-active-legacy] | |
1258895 | Core | Make test_bug426082.html and test_bug656379-1.html work with e10s | P5 | NEW | btpp-active [fce-active-legacy] | |
1301572 | Toolkit | Stop update staging when there is a shutdown request | P3 | NEW | [fce-active-legacy] | |
1310680 | Mozilla Metrics | Update all usage of datasets that rely on crash ping aggregation | -- | NEW | [fce-active-legacy] | |
1316366 | Toolkit | include information from update-settings.ini in update ping? | P3 | NEW | [fce-active-legacy] | |
1324801 | Toolkit | Telemetry should throttle pings to a sane limit per day | P3 | NEW | [measurement:client][fce-active-legacy] | |
1416078 | Firefox | [meta] modernize crash handling and related areas | P2 | NEW | [fce-active-legacy] | meta |

8 Total; 8 Open (100%); 0 Resolved (0%); 0 Verified (0%);
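
The bug list above is generated from a whiteboard query. As a rough sketch only, the same parameters could be turned into a Bugzilla REST search URL as shown below. The endpoint `https://bugzilla.mozilla.org/rest/bug` and the helper name `build_fce_query_url` are assumptions for illustration and do not appear on this page. The field list is the one from the query block, with spaces removed.

```python
from urllib.parse import urlencode

# Assumption: the public Bugzilla REST search endpoint (not given on this page).
BUGZILLA_REST_SEARCH = "https://bugzilla.mozilla.org/rest/bug"


def build_fce_query_url(whiteboard="[fce-active-legacy]"):
    """Build a search URL mirroring the parameters of the page's bug query.

    Hypothetical helper: it only constructs the URL and performs no network I/O.
    """
    params = {
        "whiteboard": whiteboard,   # whiteboard tag used for tracked work
        "resolution": "---",        # "---" as in the page's query (unresolved bugs)
        "include_fields": "id,product,summary,priority,status,"
                          "whiteboard,keywords,assigned_to",
    }
    # urlencode percent-encodes the brackets and commas for us.
    return f"{BUGZILLA_REST_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_fce_query_url())
```

The resulting URL can be opened in a browser or fetched with any HTTP client. Whether it returns exactly the rows listed above depends on how the server matches the whiteboard field, which this sketch does not verify.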