|
|
|
| |
|
| = Profiling with perf =
| The perf utility is a performance analysis tool for Linux.<br>Please read about [[B2G/Profiling/perf|profiling B2G with perf]].
| | |
| == Setup ==
| |
| Profiling data is collected on the target device, and the report is generated on the host.<br>
| |
| You need to install the perf tool on the host and create a directory that holds the kernel and libraries with symbols.
| |
| | |
| * Install perf on the host (Ubuntu)
| |
| $ sudo apt-get install linux-tools
| |
| | |
| * Create a directory for libraries with symbols<br>The following B2G makefile helper creates this directory.
| |
| $ make perf-create-symfs
| |
| | |
| == Real time report ==
| |
| On the target device, use perf top to display performance counters in real time.
| |
| # perf top -p `pidof b2g`
| |
| The output looks like this:
| |
| PerfTop: 388 irqs/sec kernel:13.1% exact: 0.0% [1000Hz cycles], (target_pid: 7852)
| |
| -------------------------------------------------------------------------------
| |
|
| |
| samples pcnt function DSO
| |
| _______ _____ __________________________________ _________________
| |
|
| |
| 403.00 31.8% _downsample_2x2_rgba8888 libGLESv2_mali.so
| |
| 119.00 9.4% JaegerStubVeneer libxul.so
| |
| 93.00 7.3% _raw_spin_unlock_irqrestore [kernel.kallsyms]
| |
| 59.00 4.7% _m200_texture_deinterleave_16x16_b libMali.so
| |
| 56.00 4.4% memcpy libc.so
| |
| 40.00 3.2% finish_task_switch [kernel.kallsyms]
| |
| 37.00 2.9% vfprintf libc.so
| |
| 23.00 1.8% _gles_fb_tex_sub_image_2d libGLESv2_mali.so
| |
| 16.00 1.3% __sfvwrite libc.so
| |
| 16.00 1.3% __do_softirq [kernel.kallsyms]
| |
| 15.00 1.2% __memzero [kernel.kallsyms]
| |
| 13.00 1.0% getnstimeofday [kernel.kallsyms]
| |
| 12.00 0.9% _gles_generate_mipmaps_sw_16x16blo libGLESv2_mali.so
| |
| 12.00 0.9% snprintf libc.so
| |
| 12.00 0.9% __divsi3 libmozglue.so
| |
| 10.00 0.8% v7_dma_clean_range [kernel.kallsyms]
| |
| | |
| == Recording for a period and generating report ==
| |
| Record on the target (press Ctrl-C to stop recording):
| |
| # perf record -o /data/local/perf.data -p `pidof b2g`
| |
| | |
| Generate the report on the host:
| |
| $ adb pull /data/local/perf.data .
| |
| $ perf report --symfs=/tmp/b2g_symfs_galaxys2 --vmlinux=/vmlinux
| |
| The output looks like this:
| |
| # Events: 4K cycles
| |
| #
| |
| # Overhead Command Shared Object
| |
| # ........ ....... ................. ...............................................................................................
| |
| #
| |
| 8.00% b2g perf-7852.map [.] 0x438413fc
| |
| 4.46% b2g [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
| |
| 4.36% b2g [unknown] [.] 0x43843500
| |
| 2.61% b2g [kernel.kallsyms] [k] finish_task_switch
| |
| 1.69% b2g libxul.so [.] JaegerStubVeneer
| |
| 1.20% b2g libxul.so [.] TypedArrayTemplate<float>::obj_getElement(JSContext*, JSObject*, JSObject*, unsigned int, J
| |
| 1.06% b2g libxul.so [.] void js::mjit::stubs::SetElem<0>(js::VMFrame&)
| |
| 1.05% b2g libxul.so [.] js::mjit::stubs::GetElem(js::VMFrame&)
| |
| 1.01% b2g libc.so [.] pthread_mutex_lock
| |
| 1.00% b2g libc.so [.] memcpy
| |
| 0.90% b2g libxul.so [.] JSObject::nativeLookup(JSContext*, int)
| |
| 0.88% b2g [kernel.kallsyms] [k] sub_preempt_count
| |
| 0.86% b2g libGLESv2_mali.so [.] 0xa3a0
| |
| 0.82% b2g [kernel.kallsyms] [k] add_preempt_count
| |
| 0.80% b2g [kernel.kallsyms] [k] __do_softirq
| |
| 0.79% b2g libxul.so [.] js_IsTypedArray(JSObject*)
| |
| 0.78% b2g libMali.so [.] 0x13be8
| |
| 0.67% b2g libxul.so [.] js::GetPropertyHelper(JSContext*, JSObject*, int, unsigned int, JS::Value*)
| |
| 0.66% b2g libxul.so [.] js::PropertyTable::search(int, bool)
| |
| 0.66% b2g libxul.so [.] js_GetProperty(JSContext*, JSObject*, JSObject*, int, JS::Value*)
| |
| 0.65% b2g libc.so [.] pthread_mutex_unlock
| |
| 0.59% b2g libxul.so [.] castNativeFromWrapper(JSContext*, JSObject*, unsigned int, nsISupports**, JS::Value*, XPCLa
| |
| 0.57% b2g libmozglue.so [.] __udivsi3
| |
| 0.53% b2g libxul.so [.] mozilla::gl::GLContextEGL::MakeCurrentImpl(bool)
| |
| 0.52% b2g libxul.so [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
| |
| 0.49% b2g libxul.so [.] js::TypedArray::getTypedArray(JSObject*)
| |
| 0.49% b2g libxul.so [.] js::GetPropertyOperation(JSContext*, unsigned char*, JS::Value const&, JS::Value*)
| |
| 0.48% b2g [kernel.kallsyms] [k] vector_swi
| |
| 0.47% b2g [kernel.kallsyms] [k] get_parent_ip
| |
| 0.42% b2g libxul.so [.] DisabledGetElem(js::VMFrame&, js::mjit::ic::GetElementIC*)
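A long per-symbol listing like the one above can be easier to read once it is grouped by shared object. The following is a minimal sketch, not part of the B2G tooling: the function name overhead_by_dso is hypothetical, and it assumes the plain-text report layout shown above (percentage, command, shared object, symbol) with a single-word command name.

```shell
#!/bin/sh
# Hypothetical helper: sum the Overhead column per shared object.
# Reads plain-text `perf report` output on stdin and prints one line per
# shared object, largest total first.
# Assumes: field 1 is "NN.NN%", field 2 is the command (no spaces),
# field 3 is the shared object.
overhead_by_dso() {
    awk '
        /^#/ || NF < 4 { next }            # skip header comments and blank lines
        $1 ~ /^[0-9.]+%$/ {
            pct = $1
            sub(/%$/, "", pct)             # drop the trailing percent sign
            total[$3] += pct
        }
        END {
            for (dso in total) printf "%6.2f%%  %s\n", total[dso], dso
        }
    ' | sort -rn
}

# Possible usage on the host (text output is assumed here):
#   perf report --stdio --symfs=/tmp/b2g_symfs_galaxys2 | overhead_by_dso
```

The totals only cover the rows present in the report, so they are a rough guide to where time goes rather than an exact breakdown.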
| |
| | |
| == Recording with callgraph ==
| |
| | |
| Use option '-g' to record call graphs:
| |
| # perf record -g -p `pidof b2g`
| |
| | |
| Note:
| |
| # To get a correct call graph report, you need to compile the libraries with "-fno-omit-frame-pointer".
| |
| # On the SGS2 device, perf crashes easily when recording with callgraph. This is an issue to be fixed.
| |
| | |
| == System-wide and specific application profiling ==
| |
| | |
| Use option '-a' to do system-wide profiling:
| |
| # perf record -o /data/local/perf.data -a
| |
| | |
| Profile a specified command:
| |
| # perf record -o /data/local/perf.data /system/b2g/b2g
| |
| | |
| Use option '-p' to profile an existing process (some devices have no pidof, in which case you need to use ps to find the b2g PID):
| |
| # perf record -o /data/local/perf.data -p `pidof b2g`
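Where pidof is missing, the PID can be picked out of the ps listing instead. The sketch below is one way to do that from the host. The function name pid_from_ps is hypothetical, and it assumes the ps output has a header row with a "PID" column and puts the process name (or its full path) in the last field of each row.

```shell
#!/bin/sh
# Hypothetical helper: print the PID of the first process whose name matches.
# Reads `ps` output on stdin; $1 is the process name to look for.
# Assumes a header row containing "PID" and the process name as the last field.
pid_from_ps() {
    awk -v name="$1" '
        { sub(/\r$/, "") }                 # tolerate CRLF line endings from adb
        NR == 1 {
            # Find which column holds the PID.
            for (i = 1; i <= NF; i++) if ($i == "PID") col = i
            next
        }
        col && ($NF == name || $NF ~ ("/" name "$")) { print $col; exit }
    '
}

# Possible usage from the host:
#   adb shell ps | pid_from_ps b2g
```

The printed PID can then be passed to the '-p' option in place of the pidof substitution.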
| |
| | |
| == Makefile helpers for perf ==
| |
| | |
| The following B2G makefile helpers generate perf reports on the host.
| |
| | |
| * Create the directory for libraries with symbols
| |
| $ make perf-create-symfs
| |
| * Remove the directory for libraries with symbols
| |
| $ make perf-clean-symfs
| |
| * Real-time system-wide perf report
| |
| $ make perf-top
| |
| * Real-time perf report for the B2G process
| |
| $ make perf-top-b2g
| |
| * System-wide summary perf report
| |
| $ make perf-report
| |
| * Summary perf report for the B2G process
| |
| $ make perf-report-b2g
| |
| * Change the recording duration<br>The perf-report-* targets automatically record for 10 seconds and then generate the report. You can change the duration with the "RECORD_DURATION" argument.<br>The example below records for 30 seconds:
| |
| $ make perf-report RECORD_DURATION=30
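How the makefile helper enforces the duration is not shown here. One common way to get a fixed-length recording by hand is to give perf record a sleep command as its workload, so that recording stops when sleep exits. The sketch below only prints such a command rather than running it. The function name timed_record_cmd is hypothetical, and it assumes a sleep binary is available on the device.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a perf record command that stops
# after a fixed number of seconds.
#   $1  duration in seconds (defaults to 10, matching the helper's default)
#   $2  target selection: "-a" for system-wide (default), or e.g. "-p 7852"
timed_record_cmd() {
    duration="${1:-10}"
    target="${2:--a}"
    echo "perf record -o /data/local/perf.data $target -- sleep $duration"
}

# Possible usage:
#   timed_record_cmd 30            # system-wide, 30 seconds
#   timed_record_cmd 30 "-p 7852"  # one process, 30 seconds
```

This is an illustration of the technique only, and the makefile targets may work differently.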
| |