Firefox OS/Performance/Debugging OOMs



Most child processes run with oom_adj 2 while they're in the foreground.  Child processes in the background run with oom_adj between 3 and 6 (inclusive).  Exactly which oom_adj a background child process gets depends on a number of factors, such as whether it's playing sound, whether it's the homescreen app, and so on.
== Debugging an OOM crash ==
Suppose you have a reproducible crash that you suspect is caused by the phone running out of memory.  The following are steps you can take to understand more about what's going wrong.
=== Step 1: Verify that it's actually an OOM ===
First, we need to check whether the crash is actually due to the phone running out of memory.  To do this, run <tt>adb shell dmesg</tt>.  If the app is being killed due to OOM, you'll see something like the following line in dmesg:
  <4>[06-18 07:40:25.291] [2897: Notes+]send sigkill to 2897 (Notes+), adj 2, size 30625
This line indicates that the phone's low-memory killer killed the Notes+ app (process-id 2897), which had oom_adj 2 when it was killed.  The size reported here is in pages, which are 4 KB each.  So in this case, the Notes+ app was using 30625 * 4 KB ≈ 120 MB of memory.
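The pages-to-megabytes arithmetic above is easy to script.  The following is a sketch, not part of any B2G tool; the regular expression is written to match the sample line above, and the exact dmesg format may vary between kernels:

```python
import re

# Matches the low-memory-killer line shown above (format may vary by kernel).
LMK_RE = re.compile(r"send sigkill to (\d+) \((.+?)\), adj (-?\d+), size (\d+)")

def parse_lmk_line(line, page_size_kb=4):
    """Return (pid, name, oom_adj, size_mb) for a low-memory-killer
    dmesg line, or None if the line doesn't match."""
    m = LMK_RE.search(line)
    if m is None:
        return None
    pid, name, adj, pages = m.groups()
    return int(pid), name, int(adj), int(pages) * page_size_kb / 1024.0

line = ("<4>[06-18 07:40:25.291] [2897: Notes+]"
        "send sigkill to 2897 (Notes+), adj 2, size 30625")
print(parse_lmk_line(line))  # Notes+ was using roughly 120 MB
```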
==== Digression: If it's not an OOM ====
If you don't see a line like this in the dmesg output, your crash is likely not an OOM.  The next step in debugging such a crash is usually to attach gdb to the crashing process and get a backtrace:
  $ cd path/to/B2G/checkout
  $ adb shell b2g-ps
  # Note pid of the app that you're going to crash
  $ ./run-gdb.sh attach <pid>
  (gdb) continue
  # crash the app
  (gdb) bt
Attach this output, along with the output of <tt>adb logcat</tt>, to a bug.
If your crash is due to OOM, a gdb backtrace is probably not interesting, because an OOM crash is triggered by a signal sent from the kernel, not by bad code that the process executes.
=== Step 2: Collect memory reports ===
After you've verified that your crash is actually due to OOM, the next step is to collect a memory report from the phone before the app crashes.  A memory report will help us understand where memory is being used.
This step is a bit tricky, because once an app crashes, there's no way to collect a memory report from that process.  There's also no way to trigger a memory report when the kernel tries to kill a process -- by then, it's too late.
To pull a memory report from the phone, first update your B2G checkout so you get the latest version of the <tt>get_about_memory.py</tt> tool.  <tt>repo sync</tt> is not sufficient; you must run <tt>git fetch && git merge</tt> (or <tt>git pull</tt>):
  $ cd path/to/B2G/checkout
  $ git fetch origin
  $ git merge --ff-only origin
Now you can run the tool:
  $ tools/get_about_memory.py
Once you get a memory report you're happy with, you can zip up the directory (named about-memory-N) and attach it to the relevant bug.
But again, this is only helpful if you run this command while the app you care about is alive and using a lot of memory.  We have a few options here.
==== Step 2, option 1: Get a different device ====
Often the easiest thing to do is to get a device with more RAM.  You know from step 1 above how much memory the process used when it crashed, so you can simply wait until the process is using about that much memory, and then take a memory report.
The <tt>b2g-info</tt> tool lets you see how much memory the different B2G processes are using.  You can run this tool in a loop by doing something like the following:
  $ adb shell 'while true; do b2g-info; sleep 1; done'
If b2g-info isn't available on your device, you can use b2g-procrank instead.
==== Step 2, option 2: Fastest finger ====
If you don't have access to a device with more RAM, you can try to run get_about_memory.py just before the app crashes.  Again, you can run b2g-info in a loop to figure out when to run get_about_memory.py.
Generating a memory report freezes all of the processes on the phone for a few moments, so it's often not difficult to grab a memory report just before a process is OOM-killed.
==== Step 2, option 3: Use a smaller testcase ====
We often hit OOMs when doing something like "load a file of size at least X in the app."
If the app crashes very quickly with a testcase of size X, you could try running a similar but smaller testcase (say, size X/2) and capturing a memory report after that succeeds.  The memory report generated this way often gives us good insight into the OOM crash that we ultimately care about.
==== Step 2, option 4: Run B2G on your desktop ====
If worst comes to worst, you can run B2G on your desktop, which probably has much more RAM than your Firefox OS phone.  This is tricky because B2G running on a desktop machine differs in some key ways from B2G running on a phone.
In particular, B2G on desktop machines has multiprocess disabled by default.  Multiprocess doesn't work 100% correctly anywhere yet, but it mostly works on Linux and Mac.  (I'm not sure yet how to enable it; for now, ask on #b2g, and please update the wiki if you figure it out.)  You can test on your desktop without multiprocess enabled, but in my experience a lot of our high memory usage issues are caused by our inter-process communication code, so single-process testing won't necessarily trigger the bug you're seeing.
It's also not as convenient to take memory reports from a B2G desktop process.  On Linux, you can send signal 34 to the main B2G process, and it will write <tt>memory-report-*.gz</tt> files out to /tmp.
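On typical Linux/glibc systems, signal 34 is SIGRTMIN.  A minimal sketch of triggering a desktop memory report from Python (finding the B2G pid is left to you):

```python
import os
import signal

def dump_memory_report(b2g_pid):
    """Ask a B2G desktop process (Linux) to write memory-report-*.gz
    files to /tmp.  Signal 34 is SIGRTMIN on typical glibc systems."""
    os.kill(b2g_pid, 34)

# Shell equivalent: kill -34 $(pidof b2g)
```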
One advantage to using B2G desktop builds is that you can use your favorite desktop debugging tools, such as Instruments on MacOS.  We've had a lot of success with this in the past.
Instructions for setting up B2G desktop builds can be found here: https://wiki.mozilla.org/Gaia/Hacking#B2G_Desktop
=== Step 3: Analyze the memory report ===
When you run get_about_memory.py, it will open a memory report in Firefox.  This file contains information about the memory usage of all processes on the system.
Reading these reports can be a bit overwhelming at first, but it's not so bad once you get the hang of it.  You can click on a sub-tree to collapse (or expand) it, and you can hover over any leaf node to get a description of what that node describes.
What you're looking for is something "unusually large" in the crashing process.  You can get an idea of what "unusually large" means by capturing a memory report of your app when it's not using a ton of memory and comparing that to the errant memory report.
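Comparing two reports can also be scripted.  The memory-report files are JSON with a top-level "reports" array whose entries carry "process", "path", and "amount" fields (note that in real reports the process name includes the pid, e.g. "Notes+ (pid 2897)"); the helper names below are my own, not part of any B2G tool:

```python
def amounts_by_path(report, process):
    """Sum 'amount' per measurement path for one process in an
    about:memory-style JSON report (already parsed with json.load)."""
    totals = {}
    for r in report["reports"]:
        if r["process"] == process:
            totals[r["path"]] = totals.get(r["path"], 0) + r["amount"]
    return totals

def biggest_growth(baseline, errant, process, top=5):
    """Paths whose amounts grew the most between two reports,
    largest growth first."""
    before = amounts_by_path(baseline, process)
    after = amounts_by_path(errant, process)
    deltas = {p: after[p] - before.get(p, 0) for p in after}
    return sorted(deltas.items(), key=lambda kv: -kv[1])[:top]
```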
Reading memory reports takes some practice, so feel free to ask for help.  The experts on this subject hang out in #memshrink on IRC.