Most child processes run with oom_adj 2 while they're in the foreground. Child processes in the background run with an oom_adj between 3 and 6 (inclusive). Exactly which oom_adj a background child process gets depends on a number of factors, such as whether it's playing sound, whether it's the homescreen app, and so on.
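If you want to see which oom_adj a particular process currently has, you can read it back from the kernel. This is a minimal sketch, assuming your device's kernel still exposes the legacy <tt>oom_adj</tt> file under <tt>/proc</tt> (newer kernels use <tt>oom_score_adj</tt> instead):

 $ adb shell b2g-ps                  # note the pid of the app you're interested in
 $ adb shell cat /proc/2897/oom_adj  # 2897 is an example pid; substitute your app's pid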
== Debugging an OOM crash ==

Suppose you have a reproducible crash that you suspect is caused by the phone running out of memory. The following are steps you can take to understand more about what's going wrong.
=== Step 1: Verify that it's actually an OOM ===

First, we need to check whether the crash is actually due to the phone running out of memory. To do this, run <tt>adb shell dmesg</tt>. If the app is being killed due to OOM, you'll see something like the following line in dmesg:

 <4>[06-18 07:40:25.291] [2897: Notes+]send sigkill to 2897 (Notes+), adj 2, size 30625

This line indicates that the phone's low-memory killer killed the Notes+ app (process id 2897), which had oom_adj 2 when it was killed. The size reported here is in pages, which are 4 KB each. So in this case, the Notes+ app was using 30625 * 4 KB ≈ 120 MB of memory.
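If you want to double-check that conversion, the arithmetic is easy to do in the shell; this is just the pages-times-4-KB formula above, expressed in megabytes:

 $ echo $(( 30625 * 4 / 1024 ))   # pages * 4 KB per page, divided by 1024 KB per MB
 119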
==== Digression: If it's not an OOM ====

If you don't see a line like this in the dmesg output, your crash is likely not an OOM. The next step in debugging such a crash is usually to attach gdb to the crashing process and get a backtrace:
 $ cd path/to/B2G/checkout
 $ adb shell b2g-ps
 # Note pid of the app that you're going to crash
 $ ./run-gdb.sh attach <pid>
 (gdb) continue
 # crash the app
 (gdb) bt
Attach this output, along with the output of <tt>adb logcat</tt>, to a bug.

If your crash is due to OOM, a gdb backtrace is probably not interesting, because an OOM crash is triggered by a signal sent from the kernel, not by bad code that the process executes.
=== Step 2: Collect memory reports ===

After you've verified that your crash is actually due to OOM, the next step is to collect a memory report from the phone before the app crashes. A memory report will help us understand where memory is being used.

This step is a bit tricky, because once an app crashes, there's no way to collect a memory report from that process. There's also no way to trigger a memory report when the kernel tries to kill a process -- by then, it's too late.

To pull a memory report from the phone, first update your tree so you get the latest version of the memory-report tool (<tt>get_about_memory.py</tt>). <tt>repo sync</tt> is not sufficient; you must <tt>git fetch && git merge</tt> (or <tt>git pull</tt>):
 $ cd path/to/B2G/checkout
 $ git fetch origin
 $ git merge --ff-only origin
Now you can run the tool:

 $ tools/get_about_memory.py
Once you get a memory report you're happy with, you can zip up the directory (named <tt>about-memory-N</tt>) and attach it to the relevant bug.
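For example, assuming the report ended up in a directory called <tt>about-memory-0</tt>, you could package it with something like:

 $ zip -r about-memory-0.zip about-memory-0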
But again, this is only helpful if you run <tt>get_about_memory.py</tt> while the app you care about is alive and using a lot of memory. We have a few options here.
==== Step 2, option 1: Get a different device ====

Often the easiest thing to do is to get a device with more RAM. You know from step 1 above how much memory the process used when it crashed, so you can simply wait until the process is using about that much memory, and then take a memory report.

The <tt>b2g-info</tt> tool lets you see how much memory the different B2G processes are using. You can run this tool in a loop by doing something like the following:
 $ adb shell 'while true; do b2g-info; sleep 1; done'
If <tt>b2g-info</tt> isn't available on your device, you can use <tt>b2g-procrank</tt> instead.
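The same polling trick works there too; for example:

 $ adb shell 'while true; do b2g-procrank; sleep 1; done'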
==== Step 2, option 2: Fastest finger ====

If you don't have access to a device with more RAM, you can try to run <tt>get_about_memory.py</tt> just before the app crashes. Again, you can run <tt>b2g-info</tt> in a loop to figure out when to run <tt>get_about_memory.py</tt>.

Collecting a memory report freezes all of the processes on the phone for a few moments, so it's often not difficult to grab a memory report just before a process OOMs.
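If you'd rather not rely on reflexes, here's a minimal sketch of a host-side watcher script (the name <tt>watch-and-report.sh</tt> is made up). It assumes you've already looked up the app's pid with <tt>b2g-ps</tt> and picked a threshold in pages somewhat below the size that dmesg reported in step 1; it polls the process's resident page count from its <tt>statm</tt> file under <tt>/proc</tt> and kicks off <tt>get_about_memory.py</tt> once the threshold is crossed:

 #!/bin/sh
 # Hypothetical usage: ./watch-and-report.sh <pid> <threshold-in-pages>
 # Run from the root of your B2G checkout so that tools/get_about_memory.py is found.
 PID=$1
 THRESHOLD=$2
 while true; do
   # The second field of /proc/<pid>/statm is the resident set size, in pages.
   RESIDENT=$(adb shell cat /proc/$PID/statm | tr -d '\r' | awk '{print $2}')
   case "$RESIDENT" in
     ""|*[!0-9]*)
       echo "Process $PID is gone (or statm was unreadable); too late to take a report."
       exit 1
       ;;
   esac
   if [ "$RESIDENT" -ge "$THRESHOLD" ]; then
     tools/get_about_memory.py
     exit 0
   fi
   sleep 1
 done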
==== Step 2, option 3: Use a smaller testcase ====

We often hit OOMs when doing something like "load a file of size at least X in the app."

If the app crashes very quickly with a testcase of size X, you could try running a similar but smaller testcase (say, size X/2) and capturing a memory report after that succeeds. The memory report generated this way often gives us good insight into the OOM crash that we ultimately care about.
==== Step 2, option 4: Run B2G on your desktop ====

If worst comes to worst, you can run B2G on your desktop, which probably has much more RAM than your FFOS phone. This is tricky because B2G running on a desktop machine is different in some key ways from B2G running on a phone.

In particular, B2G on desktop machines has multiprocess disabled by default. Multiprocess doesn't really work 100% correctly anywhere yet, but it mostly works on Linux and Mac. (I'm not sure yet how to enable it; for now, ask on #b2g and please update the wiki if you figure it out.) You can test on your desktop without multiprocess enabled, but in my experience a lot of our high memory usage issues are caused by our inter-process communication code, so that won't necessarily trigger the bug you're seeing.

It's also not as convenient to take memory reports from a B2G desktop process. On Linux, you can send signal 34 to the main B2G process and we'll write <tt>memory-report-*.gz</tt> files out to <tt>/tmp</tt>.
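For example, assuming the desktop build's main process shows up in the process list as <tt>b2g</tt> (you can check with <tt>ps</tt> if you're not sure), something like this should trigger the reports on Linux:

 $ kill -34 $(pidof b2g)
 $ ls /tmp/memory-report-*.gz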
One advantage to using B2G desktop builds is that you can use your favorite desktop debugging tools, such as Instruments on MacOS. We've had a lot of success with this in the past.

Instructions for setting up B2G desktop builds can be found here: https://wiki.mozilla.org/Gaia/Hacking#B2G_Desktop
=== Step 3: Analyze the memory report ===

When you run <tt>get_about_memory.py</tt>, it will open a memory report in Firefox. This report contains information about the memory usage of all processes on the system.

Reading these reports can be a bit overwhelming at first, but it's not so bad once you get the hang of it. You can click on a sub-tree to collapse (or expand) it, and you can hover over any leaf node to get a description of what that node measures.

What you're looking for is something "unusually large" in the crashing process. You can get an idea of what "unusually large" means by capturing a memory report of your app when it's not using a ton of memory and comparing that to the errant memory report.

Reading memory reports takes some practice, so feel free to ask for help. The experts on this subject hang out in #memshrink on IRC.