| = Profiling with oprofile = | | = Profiling with the gecko profiler = |
| OProfile is a system-wide profiler for Linux systems. For a detailed description of OProfile, see the URL below.<br>
| |
| http://oprofile.sourceforge.net/news/
| |
|
| |
|
OProfile consists of three parts: a Linux kernel driver, userspace applications, and the collected profiling samples.<br>
| | Good at: Native stacks (with runtime options) + javascript profiling, low overhead sampling, familiar for gecko developers |
|
| |
|
| == Prepare the Linux Kernel ==
| | See [https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler#Profiling_Boot_to_Gecko_%28with_a_real_device%29 these instructions]. Patches are in-flight to get native stacks in profiles, but that's not in default configurations yet. |
| Please make sure the features below are turned on in the kernel configuration file, which is normally .config in your Linux kernel directory. You need to recompile the Linux kernel after turning on the OProfile features.
| |
|
| |
|
| <pre>
| | = Profiling with systrace = |
| CONFIG_PROFILING=y
| | Good at: Shows process preemption, shows all calls to instrumented functions, Familiar for android developers |
| CONFIG_OPROFILE=y
| |
| CONFIG_HAVE_OPROFILE=y
| |
| </pre>
| |
|
| |
|
| == Userspace applications ==
| | Bad at: Requires configure option, higher overhead |
| The userspace applications of OProfile include opcontrol and oprofiled. You can find the OProfile source code in glue/gonk/external/oprofile.<br>
| |
|
| |
|
| == Host application ==
| | *Download android sdk to get systrace tool: |
| Use the host utility opreport to analyze the profiling samples.<br>
| | **[http://developer.android.com/sdk/index.html 1. download link] |
| You need to install it on your host system.
| | **2. the systrace.py tool is at path-to-android-sdk/tools/systrace |
| <pre>
| |
| sudo apt-get install oprofile
| |
| </pre>
| |
|
| |
|
| == Five Steps to profile your target device ==
| | *Enable systrace in B2G: |
| To make it easier to use OProfile on the B2G project, several Makefile targets have been written.
| | **Build with '--enable-systrace' config or just uncomment the MOZ_USE_SYSTRACE define in gecko/tools/profiler/GeckoProfilerImpl.h like: |
| <pre> | | <pre> |
| make op_setup # start up oprofile
| | #define MOZ_USE_SYSTRACE |
| make op_start # start profiling
| | #ifdef MOZ_USE_SYSTRACE |
| make op_status # check status
| | # define ATRACE_TAG ATRACE_TAG_ALWAYS |
| make op_stop # stop profiling
| | // We need HAVE_ANDROID_OS to be defined for Trace.h. |
| make op_pull # pull profile data from phone
| | // If its not set we will set it temporary and remove it. |
| make op_show # save profiling result in oprofile/oprofile.log
| | # ifndef HAVE_ANDROID_OS |
| | # define HAVE_ANDROID_OS |
| | # define REMOVE_HAVE_ANDROID_OS |
| | # endif |
| </pre> | | </pre> |
| ===make op_setup===
| |
| Prepare the opsetup script file and push it to the target device. <br>
| |
| The opsetup script wakes up oprofiled and sets up the trigger event.<br>
| |
| A snapshot of opsetup is listed below.<br>
| |
| <pre>
| |
| opcontrol --setup
| |
| opcontrol --vmlinux=/home/vincent/project/B2G_20120217/boot/kernel-android-galaxy-s2/vmlinux --kernel-range=0xc059c000,0xc0c06000 --event=CPU_CYCLES
| |
| </pre>
| |
|
| |
| ===make op_start===
| |
| We use "adb shell opcontrol --start" to start profiling and collect samples on the target device.<br>
| |
| ===make op_status===
| |
| We use "adb shell opcontrol --status" to check profiling status<br>
| |
| <pre>
| |
| Driver directory: /dev/oprofile
| |
| Session directory: /data/oprofile
| |
| Counter 0:
| |
| name: CPU_CYCLES
| |
| count: 150000
| |
| Counter 1 disabled
| |
| Counter 2 disabled
| |
| Counter 3 disabled
| |
| Counter 4 disabled
| |
| oprofiled pid: 3074
| |
| profiler is running
| |
| 5621 samples received
| |
| 0 samples lost overflow
| |
| </pre>
| |
| ===make op_stop===
| |
| We use "adb shell opcontrol --stop" to stop profiling.<br>
| |
| ===make op_pull===
| |
| Pull the profiling samples from the target device to the host PC and copy the related binary files so that symbols can be correlated with memory addresses.
| |
| ===make op_show===
| |
| Use opreport to analyze the profiling samples.<br>
| |
| Use sudo apt-get install oprofile to install it on your host system.
| |
| <pre>
| |
| CPU: ARM Cortex-A9, speed 0 MHz (estimated)
| |
| Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 150000
| |
| samples % image name app name symbol name
| |
| 5438 9.9701 libmozglue.so libmozglue.so __aeabi_idiv
| |
| 2811 5.1537 libGLESv2_mali.so libGLESv2_mali.so /system/lib/egl/libGLESv2_mali.so
| |
| 2348 4.3049 libc.so libc.so __aeabi_idiv
| |
| 2083 3.8190 libxul.so libxul.so pixman_composite_over_8888_8_8888_asm_neon
| |
| 1556 2.8528 libxul.so libxul.so pixman_composite_over_8888_8888_asm_neon
| |
| 1337 2.4513 libxul.so libxul.so pixman_scaled_bilinear_scanline_8888_8888_OVER_asm_neon
| |
| 594 1.0890 libc.so libc.so timesub
| |
| 578 1.0597 libxul.so libxul.so __aeabi_l2f
| |
| 547 1.0029 libmozglue.so libmozglue.so __aeabi_uidiv
| |
| 421 0.7719 libc.so libc.so localsub
| |
| 383 0.7022 libc.so libc.so memset
| |
| 357 0.6545 libxul.so libxul.so pixman_composite_over_n_8888_asm_neon
| |
| 341 0.6252 libxul.so libxul.so pixman_composite_over_n_8_8888_asm_neon
| |
| 308 0.5647 libxul.so libxul.so pixman_composite_src_8888_8888_asm_neon
| |
| 304 0.5574 libm.so libm.so floor
| |
| 211 0.3869 libc.so libc.so __findenv
| |
| 208 0.3814 libc.so libc.so pthread_mutex_lock
| |
| 201 0.3685 libmozglue.so libmozglue.so arena_malloc
| |
| 193 0.3538 libmozglue.so libmozglue.so arena_dalloc
| |
| 180 0.3300 libm.so libm.so fmod
| |
| 177 0.3245 libxul.so libxul.so nsIFrame::FinishAndStoreOverflow(nsOverflowAreas&, nsSize)
| |
| 176 0.3227 libc.so libc.so __system_property_find
| |
| 174 0.3190 libxul.so libxul.so gfx3DMatrix::Transform3D(gfxPoint3D const&) const
| |
| 171 0.3135 libxul.so libxul.so pixman_composite_src_n_8888_asm_neon
| |
| 162 0.2970 libc.so libc.so time2sub.clone.2
| |
| 161 0.2952 libxul.so libxul.so PL_DHashTableOperate
| |
| </pre>
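The flat opreport listing above is per symbol, so it can be hard to see which library dominates overall. A minimal sketch that sums the sample counts per image, assuming the column layout shown above (samples, %, image name, app name, symbol name); samples_per_image is a hypothetical helper name, not part of the B2G makefiles:

```shell
# Hypothetical helper: sum opreport sample counts per image (third column),
# largest total first. Reads opreport text on stdin. Header lines are
# skipped because their first field is not a plain integer.
samples_per_image() {
    awk '$1 ~ /^[0-9]+$/ { total[$3] += $1 }
         END { for (img in total) print total[img], img }' | sort -rn
}
```

Since make op_show saves its result in oprofile/oprofile.log, the helper could be run as samples_per_image < oprofile/oprofile.log.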
| |
|
| |
| = Profiling with perf =
| |
| The perf utility is a performance analysis tool for Linux.
| |
|
| |
| == Setup ==
| |
| The profiling data is collected on the target device, and the report is generated on the host side.<br>
| |
| You need to install the perf tool on the host side and create a directory for the kernel and libraries with symbols.
| |
|
| |
| * Install perf at host side for Ubuntu
| |
| $ sudo apt-get install linux-tools
| |
| $ perf --version
| |
| perf version 3.0.17
| |
|
| |
| * Create a directory for libraries with symbols<br>Here's a B2G makefile helper to create this directory.
| |
| $ make perf-create-symfs
| |
|
| |
| == Real time report ==
| |
| On target device, use perf top to generate and display performance counters in real time.
| |
| # perf top -p `pidof b2g`
| |
| The output will be like this:
| |
| PerfTop: 388 irqs/sec kernel:13.1% exact: 0.0% [1000Hz cycles], (target_pid: 7852)
| |
| -------------------------------------------------------------------------------
| |
|
| |
| samples pcnt function DSO
| |
| _______ _____ __________________________________ _________________
| |
|
| |
| 403.00 31.8% _downsample_2x2_rgba8888 libGLESv2_mali.so
| |
| 119.00 9.4% JaegerStubVeneer libxul.so
| |
| 93.00 7.3% _raw_spin_unlock_irqrestore [kernel.kallsyms]
| |
| 59.00 4.7% _m200_texture_deinterleave_16x16_b libMali.so
| |
| 56.00 4.4% memcpy libc.so
| |
| 40.00 3.2% finish_task_switch [kernel.kallsyms]
| |
| 37.00 2.9% vfprintf libc.so
| |
| 23.00 1.8% _gles_fb_tex_sub_image_2d libGLESv2_mali.so
| |
| 16.00 1.3% __sfvwrite libc.so
| |
| 16.00 1.3% __do_softirq [kernel.kallsyms]
| |
| 15.00 1.2% __memzero [kernel.kallsyms]
| |
| 13.00 1.0% getnstimeofday [kernel.kallsyms]
| |
| 12.00 0.9% _gles_generate_mipmaps_sw_16x16blo libGLESv2_mali.so
| |
| 12.00 0.9% snprintf libc.so
| |
| 12.00 0.9% __divsi3 libmozglue.so
| |
| 10.00 0.8% v7_dma_clean_range [kernel.kallsyms]
| |
|
| |
| == Recording for a period and generating report ==
| |
| Record at target side: (Hit CTRL-C to stop recording)
| |
| # perf record -o /data/local/perf.data -p `pidof b2g`
| |
|
| |
| Generate report at host side:
| |
| $ adb pull /data/local/perf.data .
| |
| $ perf report --symfs=/tmp/b2g_symfs_galaxys2 --vmlinux=/vmlinux
| |
| The output will be like this:
| |
| # Events: 4K cycles
| |
| #
| |
| # Overhead Command Shared Object
| |
| # ........ ....... ................. ...............................................................................................
| |
| #
| |
| 8.00% b2g perf-7852.map [.] 0x438413fc
| |
| 4.46% b2g [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
| |
| 4.36% b2g [unknown] [.] 0x43843500
| |
| 2.61% b2g [kernel.kallsyms] [k] finish_task_switch
| |
| 1.69% b2g libxul.so [.] JaegerStubVeneer
| |
| 1.20% b2g libxul.so [.] TypedArrayTemplate<float>::obj_getElement(JSContext*, JSObject*, JSObject*, unsigned int, J
| |
| 1.06% b2g libxul.so [.] void js::mjit::stubs::SetElem<0>(js::VMFrame&)
| |
| 1.05% b2g libxul.so [.] js::mjit::stubs::GetElem(js::VMFrame&)
| |
| 1.01% b2g libc.so [.] pthread_mutex_lock
| |
| 1.00% b2g libc.so [.] memcpy
| |
| 0.90% b2g libxul.so [.] JSObject::nativeLookup(JSContext*, int)
| |
| 0.88% b2g [kernel.kallsyms] [k] sub_preempt_count
| |
| 0.86% b2g libGLESv2_mali.so [.] 0xa3a0
| |
| 0.82% b2g [kernel.kallsyms] [k] add_preempt_count
| |
| 0.80% b2g [kernel.kallsyms] [k] __do_softirq
| |
| 0.79% b2g libxul.so [.] js_IsTypedArray(JSObject*)
| |
| 0.78% b2g libMali.so [.] 0x13be8
| |
| 0.67% b2g libxul.so [.] js::GetPropertyHelper(JSContext*, JSObject*, int, unsigned int, JS::Value*)
| |
| 0.66% b2g libxul.so [.] js::PropertyTable::search(int, bool)
| |
| 0.66% b2g libxul.so [.] js_GetProperty(JSContext*, JSObject*, JSObject*, int, JS::Value*)
| |
| 0.65% b2g libc.so [.] pthread_mutex_unlock
| |
| 0.59% b2g libxul.so [.] castNativeFromWrapper(JSContext*, JSObject*, unsigned int, nsISupports**, JS::Value*, XPCLa
| |
| 0.57% b2g libmozglue.so [.] __udivsi3
| |
| 0.53% b2g libxul.so [.] mozilla::gl::GLContextEGL::MakeCurrentImpl(bool)
| |
| 0.52% b2g libxul.so [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
| |
| 0.49% b2g libxul.so [.] js::TypedArray::getTypedArray(JSObject*)
| |
| 0.49% b2g libxul.so [.] js::GetPropertyOperation(JSContext*, unsigned char*, JS::Value const&, JS::Value*)
| |
| 0.48% b2g [kernel.kallsyms] [k] vector_swi
| |
| 0.47% b2g [kernel.kallsyms] [k] get_parent_ip
| |
| 0.42% b2g libxul.so [.] DisabledGetElem(js::VMFrame&, js::mjit::ic::GetElementIC*)
| |
|
| |
| == Recording with callgraph ==
| |
|
| |
| Use option '-g' to do callgraph recording:
| |
| # perf record -g -o /data/local/perf.data -p `pidof b2g`
| |
|
| |
| Note:
| |
| # To get a correct call graph report, you need to compile libraries with "-fno-omit-frame-pointer".
| |
| # On the SGS2 device, crashes are frequent when running perf with callgraph; this issue is still to be fixed.
| |
|
| |
| == System-wide and specific application profiling ==
| |
|
| |
| Use option '-a' to do system-wide profiling:
| |
| # perf record -o /data/local/perf.data -a
| |
|
| |
| Profile a specified command:
| |
| # perf record -o /data/local/perf.data /system/b2g/b2g
| |
|
| |
| Use option '-p' to profile an existing process: (On some devices there's no pidof, and you need to use ps to find out b2g PID)
| |
| # perf record -o /data/local/perf.data -p `pidof b2g`
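On devices without pidof, the PID can be extracted from ps output instead. A minimal sketch, assuming toolbox-style ps output where the PID is the second column and the process name is the last column; pid_of_name is a hypothetical helper name, and the process name /system/b2g/b2g is taken from the command path used above:

```shell
# Hypothetical helper: print the PID of the first process whose name
# (last column of the ps output) exactly matches $1.
# Reads ps output on stdin and assumes the PID is in the second column.
pid_of_name() {
    awk -v name="$1" '$NF == name { print $2; exit }'
}
```

From the host this could be used as adb shell ps | tr -d '\r' | pid_of_name /system/b2g/b2g, where tr strips the carriage returns that adb shell may append to each line. The resulting number can then be passed to perf record -p.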
| |
|
| |
| == Makefile helpers for perf ==
| |
|
| |
|
| Here are B2G makefile helpers to generate perf reports at host side.
| | *How to use systrace: |
| | **[http://developer.android.com/tools/help/systrace.html systrace.py document] |
| | **./systrace.py --time=10 -o mynewtrace.html sched |
|
| |
|
* Create a directory for libraries with symbols
| | Note: Gecko code is tagged as ATRACE_TAG_ALWAYS, so we don't set the category type. |
| $ make perf-create-symfs
| |
| * Remove the directory for libraries with symbols
| |
| $ make perf-clean-symfs
| |
| * Real time perf report for system wide
| |
| $ make perf-top
| |
| * Real time report for B2G process
| |
| $ make perf-top-b2g
| |
| * Summary perf report for system wide
| |
| $ make perf-report
| |
| * Summary perf report for B2G process
| |
| $ make perf-report-b2g
| |
| * Change recording duration<br>The perf-report-* targets automatically record for 10 seconds and then generate the report. You can change the duration with the "RECORD_DURATION" argument.<br>Below is an example that records for 30 seconds:
| |
| $ make perf-report RECORD_DURATION=30
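How the makefile helper limits the recording time is not shown here. One common way to get the same effect by hand is to let perf record profile a sleep command, so that recording ends when sleep exits. A minimal sketch under that assumption; record_system_wide is a hypothetical helper name, and it assumes a sleep binary is available on the device:

```shell
# Hypothetical helper: record system-wide on the device for a fixed number
# of seconds (default 10), then pull the data file to the host.
# perf stops recording when the `sleep` child process exits.
record_system_wide() {
    duration="${1:-10}"
    adb shell perf record -a -o /data/local/perf.data sleep "$duration" &&
        adb pull /data/local/perf.data .
}
```

Calling record_system_wide 30 would record for 30 seconds, after which perf report can be run on the host as described in the recording section above.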
| |