Sfink/Memory Ideas: Difference between revisions

Details:


<div id='A2'>A2. atop records a ton of statistics about memory, disk, network, CPU, and other things at a 10-minute sampling interval. Stats are collected at both global and per-process granularity. It monitors every process that starts and stops, even if the process appeared and disappeared entirely between two samples. It dumps all this in a somewhat-compressed binary log.

The visual UI has a good set of heuristics for detecting "large" values and coloring the output accordingly. If your disk is busy for >90% of the sampling interval, it'll turn red. If your network traffic is a high percentage of the expected maximum bandwidth, it'll turn red. And so on.

It lets you use it in 'top-like' mode, where it displays the current state of things, as well as in a historical mode where it reads from a log file. (It is decidedly *not* seamless between the two, but it should be.)

It also allows dumping historical data to text files. I've used that for generating graphs of various values.

For the browser, many of the same metrics are applicable, but I'd also like an equivalent of the processes' info. The idea is to know "what was going on at XXX?" So it should be user and browser actions, which tab was active, network requests, significant events firing, etc.

</div>
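
To make the browser half of A2 a little more concrete, here is a rough sketch of a timestamped activity log that can answer "what was going on at XXX?". Everything in it (`ActivityLog`, `Event`, the event kinds, the window size) is a hypothetical name made up for illustration and not an existing browser API. A real version would presumably live inside the browser and log to disk the way atop does.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    # Only the timestamp takes part in ordering; kind/detail are payload.
    timestamp: float
    kind: str = field(compare=False)    # e.g. "tab", "network", "event" (assumed labels)
    detail: str = field(compare=False)


class ActivityLog:
    """Hypothetical in-memory log of user and browser actions, kept sorted by time."""

    def __init__(self):
        self._events = []

    def record(self, timestamp, kind, detail):
        # insort keeps the list sorted even if events arrive out of order.
        bisect.insort(self._events, Event(timestamp, kind, detail))

    def around(self, timestamp, window=5.0):
        """Return every event within `window` seconds of `timestamp`."""
        keys = [e.timestamp for e in self._events]
        lo = bisect.bisect_left(keys, timestamp - window)
        hi = bisect.bisect_right(keys, timestamp + window)
        return self._events[lo:hi]
```

Usage would be along the lines of `log.around(spike_time)` after spotting a memory spike in a graph, to see which tab was active and which requests were in flight around then.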
----


<div id='A3'>A3. The idea is that rather than waiting for the screen to redraw for every action in getting to about:memory, you just do firefox 'about:memory...' and go have a cup of tea while it thinks about it.

</div>
----


<div id='A5'>A5. This is based on pure speculation, but I don't understand why the browser is so incredibly unusable when memory usage is going nuts. Why is all that memory being touched? Why isn't it just swapped out and forgotten? Under the assumption that it's the GC scanning it over and over again, it seems like it would be nice to suppress GC in this situation. Generational GC could eliminate this problem in a nicer and much more principled way.

</div>
----


<div id='B2'>B2. I have the impression that we have many, many memory-related problem reports that end up being useless. I think that's really our fault; it's too hard for users to file useful bug reports. Experienced Mozilla devs don't even know what to do.

</div>
----


<div id='B5'>B5. E.g., collect up all API calls that an addon makes (or record them, or whatever). Maintain a whitelist of APIs. (If you pass in a string, assume it may be duplicated a thousand times and stored in a sqlite DB forever, but if you're just setting existing booleans or reading state, you're blameless.)

</div>
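
A minimal sketch of the B5 bookkeeping, under heavy assumptions: the API names, the `AddonCallRecorder` class, and the "count characters of string arguments" scoring are all invented here for illustration. The only parts taken from the idea above are the whitelist and the rule that passing strings is what makes a call suspect.

```python
from collections import Counter

# Hypothetical whitelist: calls assumed to only read state or flip existing booleans.
ALLOWED_APIS = {"prefs.getBool", "prefs.setBool", "tabs.getActive"}


class AddonCallRecorder:
    """Tally non-whitelisted API calls per addon, weighted by string data passed in."""

    def __init__(self, allowed=ALLOWED_APIS):
        self.allowed = set(allowed)
        self.calls = Counter()          # addon -> number of non-whitelisted calls
        self.string_chars = Counter()   # addon -> total characters of string arguments

    def record(self, addon, api, *args):
        if api in self.allowed:
            return  # blameless: whitelisted call
        self.calls[addon] += 1
        # Pessimistically assume every string passed in may be copied and retained.
        self.string_chars[addon] += sum(len(a) for a in args if isinstance(a, str))

    def report(self):
        """Rows of (addon, calls, string_chars), most string data first."""
        rows = [(a, self.calls[a], self.string_chars[a]) for a in self.calls]
        return sorted(rows, key=lambda r: (-r[2], r[0]))
```

The report only ranks suspects. It says nothing about whether the strings were actually retained.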
----


<div id='C2'>C2. When looking at a memory leak, I took several snapshots of /proc/<pid>/maps, diffed them to find a memory region that appeared and did not disappear, and then dumped out the raw memory to a file. Then I ran strings on it.

</div>
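
The C2 procedure was done by hand, but it could be scripted roughly as follows. This is a sketch, not the commands actually used: the function names are made up, regions are matched by exact start and end address (so a mapping that grows between snapshots shows up as new), and `dump_region` assumes Linux plus permission to read `/proc/<pid>/mem` for the target process.

```python
import re


def parse_maps(text):
    """Map (start, end) address pairs to the rest of each /proc/<pid>/maps line."""
    regions = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue
        start, end = (int(x, 16) for x in parts[0].split("-"))
        regions[(start, end)] = parts[1] if len(parts) > 1 else ""
    return regions


def lingering_regions(snapshots):
    """Regions missing from the first snapshot but still present in the last one."""
    first = set(parse_maps(snapshots[0]))
    last = set(parse_maps(snapshots[-1]))
    return sorted(last - first)


def dump_region(pid, start, end):
    """Read raw bytes of one region (Linux only; needs ptrace-level access to pid)."""
    with open(f"/proc/{pid}/mem", "rb") as mem:
        mem.seek(start)
        return mem.read(end - start)


def printable_strings(data, min_len=4):
    """Rough stand-in for strings(1): runs of printable ASCII of at least min_len."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]
```

Chaining them would look like `printable_strings(dump_region(pid, *region))` for each region returned by `lingering_regions`.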
----


<div id='D2'>D2. I don't really know enough about the system to flesh this out properly, but it seems like when you have a bunch of memory lingering around when it really ought to be dead, many of the objects comprising that memory should be able to "know" that they *probably* shouldn't live past... the current page, or for more than a few seconds, or whatever. Assuming this is possible, it should be possible to walk up a dominator graph and give a fairly directed answer to "why has this outlived what it thought its lifespan would be?"

Not every memory allocation needs to be marked for this to work. You just need one object within the "leaked" memory to be marked.

It could also walk the graph "en masse" to ignore individual objects that are reachable longer than expected and focus on the clusters of objects that are kept alive by the same thing. (I'm thinking that the expected lifetime is a guess, and may be inaccurate.)

</div>
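
Here is one way the D2 walk could look on a toy heap graph. It is only a sketch under assumptions: the graph is a plain dict of object name to referenced names, "overdue" is a set of marked objects that have outlived their guessed lifetime, and the blame rule (nearest dominator that is not itself overdue) is my own choice for illustration. Dominators are computed with the simple iterative set-intersection method, which is fine for small graphs but slow on a real heap.

```python
def dominators(graph, root):
    """Return {node: set of nodes that dominate it} for nodes reachable from root."""
    reachable, stack = {root}, [root]
    while stack:
        for succ in graph.get(stack.pop(), ()):
            if succ not in reachable:
                reachable.add(succ)
                stack.append(succ)
    preds = {n: set() for n in reachable}
    for n in reachable:
        for succ in graph.get(n, ()):
            preds[succ].add(n)
    dom = {n: set(reachable) for n in reachable}
    dom[root] = {root}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for n in reachable - {root}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def blame_overdue(graph, root, overdue):
    """Group overdue objects by the nearest dominator that is not itself overdue."""
    dom = dominators(graph, root)
    clusters = {}
    for obj in overdue:
        if obj not in dom:
            continue  # unreachable, so already collectable
        candidates = [d for d in dom[obj] if d != obj and d not in overdue]
        # Dominators of obj form a chain; the largest dominator set is the nearest one.
        keeper = max(candidates, key=lambda d: len(dom[d]), default=root)
        clusters.setdefault(keeper, []).append(obj)
    return {k: sorted(v) for k, v in clusters.items()}
```

Large clusters under one keeper are the interesting output. A cluster of size one may just mean the lifetime guess was wrong.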
----


<div id='D4'>D4. E.g., use mprotect on a random subset of the heap to find pages (or smaller regions, but that's harder) that are never accessed after some point. Remove the GC/CC from consideration.
</div>
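
The bookkeeping behind D4 can be shown with a pure simulation. To be clear about what is assumed: nothing below calls mprotect. A real implementation would mark the sampled pages PROT_NONE, catch the resulting fault, record the page, and restore access. The `SampledHeap` class, page numbering, and `by_collector` flag are hypothetical stand-ins for that machinery.

```python
import random


class SampledHeap:
    """Simulate watching a random sample of heap pages for their first access."""

    def __init__(self, num_pages, sample_size, seed=None):
        rng = random.Random(seed)
        # Pages we would mprotect in a real implementation.
        self.watched = set(rng.sample(range(num_pages), sample_size))
        self.touched = set()

    def access(self, page, by_collector=False):
        if by_collector:
            return  # GC/CC scans are removed from consideration
        if page in self.watched:
            # Stands in for the fault handler: note the page, then unprotect it.
            self.touched.add(page)

    def never_accessed(self):
        """Watched pages the program itself has not touched since protection."""
        return sorted(self.watched - self.touched)
```

Pages that stay in `never_accessed()` for a long time are candidates for memory that is alive but dead in practice.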