Gecko:Shutdown issues: Difference between revisions

Add crash-stats search
(Discussions about Gecko shutdown related issues)
 
(Add crash-stats search)
Line 1: Line 1:
== Shutdown problems ==
== Shutdown problems ==


Bugzilla is filled with gruesome shutdown war stories. We have put a lot of effort into improving the situation with good results, but it is still pretty bad (look at crash-stats).
Bugzilla is filled with gruesome shutdown war stories. We have put a lot of effort into improving the situation with good results, but it is still pretty bad (as shown by this [https://crash-stats.mozilla.com/search/?product=Firefox&shutdown_progress=shutdown&_facets=signature&_columns=date&_columns=signature&_columns=version&_columns=build_id&_columns=platform&_columns=shutdown_progress#facet-signature crash-stats search]).
Properly shutting down is hard, but it is important in order to accurately check for memory leaks.
Properly shutting down is hard, but it is important in order to accurately check for memory leaks.


Line 8: Line 8:


After working on shutdown-related issues for a while we started to think in terms of two categories of objects:
After working on shutdown-related issues for a while we started to think in terms of two categories of objects:
* static objects, not truly static in the C++ sense, but typically abstractions with an extended lifetime that short-lived objects depend on. For example top-level IPDL protocols, singletons, the JS engine, etc.
* static objects, not truly static in the C++ sense, but typically abstractions with an extended lifetime that short-lived objects depend on. For example top-level IPDL protocols, singletons, a thread, the JS engine, etc.
* dynamic objects with shorter lifetimes such as textures, DOM elements, etc.
* dynamic objects with shorter lifetimes such as textures, DOM elements, etc.
This distinction is quite trivial but we'll use this terminology later.
This distinction is quite trivial but we'll use this terminology later.


Another way to categorize most objects is to separate manually managed objects with deterministic lifetime from automatically managed objects (reference-counted or garbage-collected), which for all intents and purposes have non-deterministic memory management. It turns out that there is a correlation between these categories: dynamic objects are often automatically managed and static objects are often manually managed.
Another way to categorize most objects is to separate manually managed objects with deterministic lifetime from automatically managed objects (reference-counted or garbage-collected), which for all intents and purposes have non-deterministic memory management. It turns out that there is a correlation between these categories: dynamic objects are often automatically managed and static objects are often manually managed.
For example, almost all of the modules (static objects) are destroyed synchronously one after the other in ShutdownXPCOM.
For example, most of the modules (static objects) are destroyed synchronously one after the other in ShutdownXPCOM.


A lot of the shutdown issues boil down to dynamic objects depending on static objects. During shutdown a lot of important static objects are destroyed, and if dynamic objects that depend on them are still alive, we run into use-after-free bugs. Let's call this the '''first family of shutdown issues'''.
A lot of the shutdown issues boil down to dynamic objects depending on static objects. During shutdown a lot of important static objects are destroyed, and if dynamic objects that depend on them are still alive, we run into use-after-free bugs. Let's call this the '''first family of shutdown issues'''.
Line 28: Line 28:
I presented it earlier. Dynamic and automatically managed resources outlive static resources they depend on, and it causes crashes. Ideally we would not have manually managed resources that get destroyed one after the other in ShutdownXPCOM; they would all be automatically managed so that everything keeps its dependencies alive. The reality is that it would be hard to get anything to shut down at all in such a situation, or it would require us to rethink how every single module is shut down. For example, graphics resources depend on threads, so they have to be shut down before ShutdownPhase::ShutdownThreads (after which, well, you can't use threads). XPCOM threads themselves depend on other things, which depend on other things, and so on, and you quickly find out that to automatically manage the lifetime of a certain module you have to make everything else automatically managed. It is probably not a manageable change (I'd be happy for someone to prove me wrong).
I presented it earlier. Dynamic and automatically managed resources outlive static resources they depend on, and it causes crashes. Ideally we would not have manually managed resources that get destroyed one after the other in ShutdownXPCOM; they would all be automatically managed so that everything keeps its dependencies alive. The reality is that it would be hard to get anything to shut down at all in such a situation, or it would require us to rethink how every single module is shut down. For example, graphics resources depend on threads, so they have to be shut down before ShutdownPhase::ShutdownThreads (after which, well, you can't use threads). XPCOM threads themselves depend on other things, which depend on other things, and so on, and you quickly find out that to automatically manage the lifetime of a certain module you have to make everything else automatically managed. It is probably not a manageable change (I'd be happy for someone to prove me wrong).


The current status is that modules are shut down sequentially in a way that (implicitly) tries to respect the dependencies between modules. Except that this dependency graph has cycles. The cycle collector ends up being destroyed very late, which means some cycle-collected DOM elements end up being destroyed after things that they depend on. As a result, some canvas and media elements end up being destroyed after the modules they depend on (media, gfx), which causes some issues. The graphics and media teams have put a lot of effort into mitigating this by trying to find live objects and force them to shut down even if something else will keep them alive longer, but it's hard to do this kind of thing well, especially if these objects may be used on other threads. The reality is that while we brought the crash volume down significantly, these crashes still exist today.
The current status is that modules are shut down sequentially in a way that (implicitly) tries to respect the dependencies between modules. Except that this dependency graph has cycles. The cycle collector ends up being destroyed very late, which means some cycle-collected DOM elements end up being destroyed after things that they depend on. As a result, some canvas and media elements end up being destroyed after the modules they depend on (media, gfx), which causes some issues. The graphics and media teams have put a lot of effort into mitigating this by trying to find live objects and force them to shut down even if something else will keep them alive longer, but keeping track of all live objects is hard, especially if these objects may be used on other threads. The result is that while we brought the crash volume down significantly, some objects are falling through the cracks and these crashes still exist today.
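
The mitigation described above (a module finding its live dynamic objects and forcing them to shut down, even though something else still holds them) could be sketched roughly as follows. This is only an illustration under loud assumptions: GfxModule, Texture, Register, Unregister and ForceShutdown are hypothetical names, not Gecko classes or APIs, and the sketch is single-threaded, so it sidesteps exactly the cross-thread difficulty mentioned in the paragraph.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

class GfxModule;  // Hypothetical "static" object (stand-in for a module).

// Hypothetical "dynamic" object that depends on the module.
class Texture {
 public:
  explicit Texture(GfxModule* aModule);
  ~Texture();

  // Drops the dependency on the module. Safe to call more than once.
  void ForceShutdown();

  bool IsUsable() const { return mModule != nullptr; }

 private:
  GfxModule* mModule;  // Non-owning; cleared before the module goes away.
};

class GfxModule {
 public:
  void Register(Texture* aTexture) { mLive.insert(aTexture); }
  void Unregister(Texture* aTexture) { mLive.erase(aTexture); }
  std::size_t LiveCount() const { return mLive.size(); }

  // Called from the sequential shutdown path, before the module is destroyed.
  void Shutdown() {
    // Copy first: ForceShutdown() unregisters, which would otherwise
    // invalidate the iterator we are walking.
    std::vector<Texture*> live(mLive.begin(), mLive.end());
    for (Texture* texture : live) {
      texture->ForceShutdown();
    }
    assert(mLive.empty());
  }

 private:
  std::unordered_set<Texture*> mLive;  // Every Texture still alive.
};

Texture::Texture(GfxModule* aModule) : mModule(aModule) {
  mModule->Register(this);
}

Texture::~Texture() { ForceShutdown(); }

void Texture::ForceShutdown() {
  if (mModule) {
    mModule->Unregister(this);
    mModule = nullptr;
  }
}
```

With this shape, a Texture kept alive by some long-lived owner can still be destroyed after the module without touching it, because its back-pointer was cleared during GfxModule::Shutdown(). The weak point is the registry itself: any object that is never registered, or that is in use on another thread while Shutdown() runs, is the kind of object that falls through the cracks.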


== Two-phase shutdown proposal ==
== Two-phase shutdown proposal ==
Line 36: Line 36:
* Phase 2: All modules shut down the way they do now, but without having to destroy all of their dynamic objects synchronously (which is the cause of much of our trouble) since the latter are already gone.
* Phase 2: All modules shut down the way they do now, but without having to destroy all of their dynamic objects synchronously (which is the cause of much of our trouble) since the latter are already gone.


This is certainly a lot easier said than done, but managing shutdown the way we do now is harder.
This is certainly a lot easier said than done, but managing shutdown the way we do now is arguably even harder.
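
To make the ordering concrete, here is a minimal sketch of the two phases. It assumes that the first phase is where dynamic objects get released while every module is still alive, which is what the Phase 2 bullet implies. Module, ReleaseDynamicObjects and TwoPhaseShutdown are hypothetical names for illustration, not existing Gecko code, and the vector is assumed to list modules in the order in which they are shut down today.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical module description; not a Gecko type.
struct Module {
  std::string name;
  int liveDynamicObjects;

  // Phase 1 hook: drop dynamic objects while all modules are still alive.
  void ReleaseDynamicObjects() { liveDynamicObjects = 0; }
};

// Runs both phases over aModules and returns a log of the steps taken.
std::vector<std::string> TwoPhaseShutdown(std::vector<Module>& aModules) {
  std::vector<std::string> log;

  // Phase 1: no module is destroyed yet, so dynamic objects can still
  // safely use whatever static objects they depend on while going away.
  for (Module& module : aModules) {
    module.ReleaseDynamicObjects();
    log.push_back("phase1:" + module.name);
  }

  // Phase 2: the existing sequential module shutdown, which can now rely
  // on there being no dynamic objects left to outlive a module.
  for (Module& module : aModules) {
    assert(module.liveDynamicObjects == 0);
    log.push_back("phase2:" + module.name);
  }
  return log;
}
```

The point of the split is the assertion in the second loop: by the time any module is torn down, nothing dynamic is left that could dereference it. The hard part, which this sketch does not address, is making ReleaseDynamicObjects actually reach every dynamic object.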


== Shutting down IPDL protocols ==
== Shutting down IPDL protocols ==