Firefox OS/Performance/Profiling: Difference between revisions

From MozillaWiki
m (Lakrits moved page FirefoxOS/Performance/Profiling to Firefox OS/Performance/Profiling: The official spelling of "Firefox OS" leaves a space between the two parts of the name. It's easier to find a page if the spelling of its name is consistent...)
 
(17 intermediate revisions by 10 users not shown)
Line 1: Line 1:
= Profiling with oprofile =
= Profiling with the gecko profiler =
OProfile is a system-wide profiler for Linux systems. For a detailed description of OProfile, see the URL below.<br>
http://oprofile.sourceforge.net/news/


OProfile consists of three parts: a Linux kernel driver, userspace applications, and the collected profiling samples.<br>
Good at: Native stacks (with runtime options) + JavaScript profiling, low-overhead sampling, familiar to Gecko developers


== Prepare the Linux Kernel ==
See [https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler#Profiling_Boot_to_Gecko_%28with_a_real_device%29 these instructions]. Patches are in-flight to get native stacks in profiles, but that's not in default configurations yet.
Make sure the features below are enabled in the kernel configuration file, which is normally .config in your Linux kernel directory. You need to recompile the Linux kernel after enabling the OProfile features.


<pre>
= Profiling with systrace =
CONFIG_PROFILING=y
Good at: Shows process preemption, shows all calls to instrumented functions, familiar to Android developers
CONFIG_OPROFILE=y
CONFIG_HAVE_OPROFILE=y
</pre>


== Userspace applications ==
Bad at: Requires configure option, higher overhead
The OProfile userspace applications include opcontrol and oprofiled. You can find the OProfile source code in glue/gonk/external/oprofile.<br>


== Host application ==
*Download the Android SDK to get the systrace tool:
Use the host utility opreport to analyze the profiling samples.<br>
**[http://developer.android.com/sdk/index.html 1. download link]
You need to install it on your host system.
**2. the systrace.py tool is at path-to-android-sdk/tools/systrace
<pre>
sudo apt-get install oprofile
</pre>


== Five Steps to profile your target device ==
*Enable systrace in B2G:
To make OProfile easier to use in the B2G project, several Makefile targets have been written.
**Build with '--enable-systrace' config or just uncomment the MOZ_USE_SYSTRACE define in gecko/tools/profiler/GeckoProfilerImpl.h like:
<pre>
<pre>
make op_setup        # start up oprofile
#define MOZ_USE_SYSTRACE
make op_start        # start profiling
#ifdef MOZ_USE_SYSTRACE
make op_status      # check status
# define ATRACE_TAG ATRACE_TAG_ALWAYS
make op_stop        # stop profiling
// We need HAVE_ANDROID_OS to be defined for Trace.h.
make op_pull        # pull profile data from phone
// If its not set we will set it temporary and remove it.
make op_show        # save profiling result in oprofile/oprofile.log
# ifndef HAVE_ANDROID_OS
#   define HAVE_ANDROID_OS
#   define REMOVE_HAVE_ANDROID_OS
# endif
</pre>
</pre>
===make op_setup===
Prepares the opsetup script file and pushes it to the target device.<br>
The opsetup script starts oprofiled and sets up the trigger event.<br>
A snapshot of opsetup is shown below:<br>
<pre>
opcontrol --setup
opcontrol --vmlinux=/home/vincent/project/B2G_20120217/boot/kernel-android-galaxy-s2/vmlinux --kernel-range=0xc059c000,0xc0c06000 --event=CPU_CYCLES
</pre>
===make op_start===
We use "adb shell opcontrol --start" to start profiling and collect samples on the target device.<br>
===make op_status===
We use "adb shell opcontrol --status" to check the profiling status:<br>
<pre>
Driver directory: /dev/oprofile
Session directory: /data/oprofile
Counter 0:
    name: CPU_CYCLES
    count: 150000
Counter 1 disabled
Counter 2 disabled
Counter 3 disabled
Counter 4 disabled
oprofiled pid: 3074
profiler is running
      5621 samples received
          0 samples lost overflow
</pre>
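When the status check is scripted, the text above can be reduced to a few fields. The following is a minimal sketch, assuming the status output keeps the layout shown in the sample; the function name <code>parse_opcontrol_status</code> is hypothetical and not part of OProfile or the B2G Makefile targets.

```python
import re

# Sample text copied from the status output shown above.
SAMPLE_STATUS = """Driver directory: /dev/oprofile
Session directory: /data/oprofile
Counter 0:
    name: CPU_CYCLES
    count: 150000
Counter 1 disabled
oprofiled pid: 3074
profiler is running
      5621 samples received
          0 samples lost overflow
"""


def parse_opcontrol_status(text):
    """Extract a few fields from `opcontrol --status` style text.

    Assumption: the layout matches the sample above; other OProfile
    versions may print different lines.
    """
    status = {
        "running": False,
        "events": [],
        "samples_received": None,
        "samples_lost": None,
    }
    for raw in text.splitlines():
        line = raw.strip()
        if line == "profiler is running":
            status["running"] = True
        elif line.startswith("name:"):
            # Event name of an enabled counter, e.g. CPU_CYCLES.
            status["events"].append(line.split(":", 1)[1].strip())
        else:
            received = re.match(r"(\d+) samples received$", line)
            lost = re.match(r"(\d+) samples lost overflow$", line)
            if received:
                status["samples_received"] = int(received.group(1))
            elif lost:
                status["samples_lost"] = int(lost.group(1))
    return status
```

A non-zero <code>samples_lost</code> value suggests the sample buffer overflowed, in which case the profile may under-represent busy periods.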
===make op_stop===
We use "adb shell opcontrol --stop" to stop profiling.<br>
===make op_pull===
Pulls the profiling samples from the target device to the host PC and copies the related binary files so that memory addresses can be correlated with symbols.
===make op_show===
Uses opreport to analyze the profiling samples.<br>
Run "sudo apt-get install oprofile" to install opreport on your host system.
<pre>
CPU: ARM Cortex-A9, speed 0 MHz (estimated)
Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 150000
samples  %        image name              app name                symbol name
5438      9.9701  libmozglue.so            libmozglue.so            __aeabi_idiv
2811      5.1537  libGLESv2_mali.so        libGLESv2_mali.so        /system/lib/egl/libGLESv2_mali.so
2348      4.3049  libc.so                  libc.so                  __aeabi_idiv
2083      3.8190  libxul.so                libxul.so                pixman_composite_over_8888_8_8888_asm_neon
1556      2.8528  libxul.so                libxul.so                pixman_composite_over_8888_8888_asm_neon
1337      2.4513  libxul.so                libxul.so                pixman_scaled_bilinear_scanline_8888_8888_OVER_asm_neon
594      1.0890  libc.so                  libc.so                  timesub
578      1.0597  libxul.so                libxul.so                __aeabi_l2f
547      1.0029  libmozglue.so            libmozglue.so            __aeabi_uidiv
421      0.7719  libc.so                  libc.so                  localsub
383      0.7022  libc.so                  libc.so                  memset
357      0.6545  libxul.so                libxul.so                pixman_composite_over_n_8888_asm_neon
341      0.6252  libxul.so                libxul.so                pixman_composite_over_n_8_8888_asm_neon
308      0.5647  libxul.so                libxul.so                pixman_composite_src_8888_8888_asm_neon
304      0.5574  libm.so                  libm.so                  floor
211      0.3869  libc.so                  libc.so                  __findenv
208      0.3814  libc.so                  libc.so                  pthread_mutex_lock
201      0.3685  libmozglue.so            libmozglue.so            arena_malloc
193      0.3538  libmozglue.so            libmozglue.so            arena_dalloc
180      0.3300  libm.so                  libm.so                  fmod
177      0.3245  libxul.so                libxul.so                nsIFrame::FinishAndStoreOverflow(nsOverflowAreas&, nsSize)
176      0.3227  libc.so                  libc.so                  __system_property_find
174      0.3190  libxul.so                libxul.so                gfx3DMatrix::Transform3D(gfxPoint3D const&) const
171      0.3135  libxul.so                libxul.so                pixman_composite_src_n_8888_asm_neon
162      0.2970  libc.so                  libc.so                  time2sub.clone.2
161      0.2952  libxul.so                libxul.so                PL_DHashTableOperate
</pre>
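A per-symbol listing like the one above is often easier to read once it is summed per library. The following is a minimal sketch, assuming the five-column layout shown above and that image and app names contain no spaces; the function names are hypothetical helpers, not part of OProfile.

```python
from collections import defaultdict

# A few rows copied from the opreport output shown above.
SAMPLE_REPORT = """samples  %        image name               app name                 symbol name
5438      9.9701  libmozglue.so            libmozglue.so            __aeabi_idiv
2348      4.3049  libc.so                  libc.so                  __aeabi_idiv
547      1.0029  libmozglue.so            libmozglue.so            __aeabi_uidiv
177      0.3245  libxul.so                libxul.so                nsIFrame::FinishAndStoreOverflow(nsOverflowAreas&, nsSize)
"""


def parse_opreport(text):
    """Parse opreport-style rows into dictionaries.

    Header lines are skipped because their first field is not a number.
    The symbol is the remainder of the line, so C++ signatures that
    contain spaces stay intact.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 4)
        if len(parts) < 5 or not parts[0].isdigit():
            continue
        try:
            percent = float(parts[1])
        except ValueError:
            continue
        rows.append({
            "samples": int(parts[0]),
            "percent": percent,
            "image": parts[2],
            "app": parts[3],
            "symbol": parts[4].strip(),
        })
    return rows


def samples_by_image(rows):
    """Sum sample counts per image, largest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["image"]] += row["samples"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Calling <code>samples_by_image(parse_opreport(text))</code> on the contents of oprofile/oprofile.log would give one total per library, which helps decide whether time is going to Gecko (libxul.so), the graphics driver, or libc.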
= Profiling with perf =
The perf utility is a performance analysis tool for Linux.
== Setup ==
The profiling data is collected on the target device, and the report is generated on the host.<br>
You need to install the perf tool on the host and create a directory containing the kernel and libraries with symbols.
* Install perf on the host (Ubuntu)
$ sudo apt-get install linux-tools
$ perf --version
perf version 3.0.17
* Create a directory for libraries with symbols<br>Here's a B2G makefile helper to create this directory.
$ make perf-create-symfs
== Real time report ==
On the target device, use perf top to generate and display performance counters in real time.
# perf top -p `pidof b2g`
The output will be like this:
  PerfTop:    388 irqs/sec  kernel:13.1%  exact:  0.0% [1000Hz cycles],  (target_pid: 7852)
-------------------------------------------------------------------------------
              samples  pcnt function                          DSO
              _______ _____ __________________________________ _________________
              403.00 31.8% _downsample_2x2_rgba8888          libGLESv2_mali.so
              119.00  9.4% JaegerStubVeneer                  libxul.so       
                93.00  7.3% _raw_spin_unlock_irqrestore        [kernel.kallsyms]
                59.00  4.7% _m200_texture_deinterleave_16x16_b libMali.so     
                56.00  4.4% memcpy                            libc.so         
                40.00  3.2% finish_task_switch                [kernel.kallsyms]
                37.00  2.9% vfprintf                          libc.so         
                23.00  1.8% _gles_fb_tex_sub_image_2d          libGLESv2_mali.so
                16.00  1.3% __sfvwrite                        libc.so         
                16.00  1.3% __do_softirq                      [kernel.kallsyms]
                15.00  1.2% __memzero                          [kernel.kallsyms]
                13.00  1.0% getnstimeofday                    [kernel.kallsyms]
                12.00  0.9% _gles_generate_mipmaps_sw_16x16blo libGLESv2_mali.so
                12.00  0.9% snprintf                          libc.so         
                12.00  0.9% __divsi3                          libmozglue.so   
              10.00  0.8% v7_dma_clean_range                [kernel.kallsyms]
== Recording for a period and generating report ==
Record on the target device (press Ctrl-C to stop recording):
# perf record -o /data/local/perf.data -p `pidof b2g`
Generate the report on the host:
$ adb pull /data/local/perf.data .
$ perf report --symfs=/tmp/b2g_symfs_galaxys2 --vmlinux=/vmlinux
The output will be like this:
# Events: 4K cycles
#
# Overhead  Command      Shared Object                                                                                               
# ........  .......  .................  ...............................................................................................
#
      8.00%      b2g  perf-7852.map      [.] 0x438413fc     
      4.46%      b2g  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
      4.36%      b2g  [unknown]          [.] 0x43843500     
      2.61%      b2g  [kernel.kallsyms]  [k] finish_task_switch
      1.69%      b2g  libxul.so          [.] JaegerStubVeneer
      1.20%      b2g  libxul.so          [.] TypedArrayTemplate<float>::obj_getElement(JSContext*, JSObject*, JSObject*, unsigned int, J
      1.06%      b2g  libxul.so          [.] void js::mjit::stubs::SetElem<0>(js::VMFrame&)
      1.05%      b2g  libxul.so          [.] js::mjit::stubs::GetElem(js::VMFrame&)
      1.01%      b2g  libc.so            [.] pthread_mutex_lock
      1.00%      b2g  libc.so            [.] memcpy
      0.90%      b2g  libxul.so          [.] JSObject::nativeLookup(JSContext*, int)
      0.88%      b2g  [kernel.kallsyms]  [k] sub_preempt_count
      0.86%      b2g  libGLESv2_mali.so  [.] 0xa3a0         
      0.82%      b2g  [kernel.kallsyms]  [k] add_preempt_count
      0.80%      b2g  [kernel.kallsyms]  [k] __do_softirq
      0.79%      b2g  libxul.so          [.] js_IsTypedArray(JSObject*)
      0.78%      b2g  libMali.so        [.] 0x13be8       
      0.67%      b2g  libxul.so          [.] js::GetPropertyHelper(JSContext*, JSObject*, int, unsigned int, JS::Value*)
      0.66%      b2g  libxul.so          [.] js::PropertyTable::search(int, bool)
      0.66%      b2g  libxul.so          [.] js_GetProperty(JSContext*, JSObject*, JSObject*, int, JS::Value*)
      0.65%      b2g  libc.so            [.] pthread_mutex_unlock
      0.59%      b2g  libxul.so          [.] castNativeFromWrapper(JSContext*, JSObject*, unsigned int, nsISupports**, JS::Value*, XPCLa
      0.57%      b2g  libmozglue.so      [.] __udivsi3
      0.53%      b2g  libxul.so          [.] mozilla::gl::GLContextEGL::MakeCurrentImpl(bool)
      0.52%      b2g  libxul.so          [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
      0.49%      b2g  libxul.so          [.] js::TypedArray::getTypedArray(JSObject*)
      0.49%      b2g  libxul.so          [.] js::GetPropertyOperation(JSContext*, unsigned char*, JS::Value const&, JS::Value*)
      0.48%      b2g  [kernel.kallsyms]  [k] vector_swi
      0.47%      b2g  [kernel.kallsyms]  [k] get_parent_ip
      0.42%      b2g  libxul.so          [.] DisabledGetElem(js::VMFrame&, js::mjit::ic::GetElementIC*)
== Recording with callgraph ==
Use option '-g' to do callgraph recording:
# perf record -g -o /data/local/perf.data -p `pidof b2g`
Note:
# To get a correct call graph report, you need to compile the libraries with "-fno-omit-frame-pointer".
# On the SGS2 device, perf crashes easily when recording with callgraph. This is an issue to be fixed.
== System-wide and specific application profiling ==
Use option '-a' to do system-wide profiling:
# perf record -o /data/local/perf.data -a
Profile a specific command:
# perf record -o /data/local/perf.data /system/b2g/b2g
Use option '-p' to profile an existing process. (Some devices have no pidof, so you need to use ps to find the b2g PID.)
# perf record  -o /data/local/perf.data -p `pidof b2g`
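On devices without pidof, the PID can be read from the ps listing instead. The following is a minimal sketch, assuming the ps output starts with a header row containing a <code>PID</code> column and ends each row with the process name; the sample listing and the function name <code>find_pid</code> are illustrative assumptions, not output captured from a real device.

```python
import posixpath

# Illustrative listing only; real `adb shell ps` output may differ.
SAMPLE_PS = """USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
root      1     0     420    288   c00f3a84 0000c2ac S /init
root      7852  1     171676 61892 ffffffff 400d5f5c S /system/b2g/b2g
"""


def find_pid(ps_output, process_name="b2g"):
    """Return the PID of the first process whose name matches, else None.

    The PID column is located through the header row, and the process
    name is taken from the last field of each row, so an unlabeled
    state column before the name does not matter.
    """
    lines = ps_output.splitlines()
    if not lines:
        return None
    header = lines[0].split()
    if "PID" not in header:
        return None
    pid_col = header.index("PID")
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= pid_col or not fields[pid_col].isdigit():
            continue
        # Device paths use forward slashes regardless of the host OS.
        if posixpath.basename(fields[-1]) == process_name:
            return int(fields[pid_col])
    return None
```

The returned number can then be passed to perf with '-p' in place of the <code>`pidof b2g`</code> substitution.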
== Makefile helpers for perf ==


Here are B2G makefile helpers to generate perf reports on the host.
*How to use systrace:
**[http://developer.android.com/tools/help/systrace.html systrace.py document]
**./systrace.py --time=10 -o mynewtrace.html sched


* Create a directory for libraries with symbols
Note: Gecko code is tagged as ATRACE_TAG_ALWAYS, so we don't set the category type.
$ make perf-create-symfs
* Remove the directory for libraries with symbols
$ make perf-clean-symfs
* Real-time system-wide perf report
$ make perf-top
* Real-time report for the B2G process
$ make perf-top-b2g
* Summary system-wide perf report
$ make perf-report
* Summary perf report for the B2G process
$ make perf-report-b2g
* Change recording duration<br>The perf-report-* targets automatically record for 10 seconds and then generate the report. You can change the duration with the "RECORD_DURATION" argument.<br>Below is an example that records for 30 seconds:
$ make perf-report RECORD_DURATION=30

Latest revision as of 13:59, 1 February 2015

Profiling with the gecko profiler

Good at: Native stacks (with runtime options) + JavaScript profiling, low-overhead sampling, familiar to Gecko developers

See these instructions. Patches are in-flight to get native stacks in profiles, but that's not in default configurations yet.

Profiling with systrace

Good at: Shows process preemption, shows all calls to instrumented functions, familiar to Android developers

Bad at: Requires configure option, higher overhead

  • Download the Android SDK to get the systrace tool:
    • 1. download link
    • 2. the systrace.py tool is at path-to-android-sdk/tools/systrace
  • Enable systrace in B2G:
    • Build with '--enable-systrace' config or just uncomment the MOZ_USE_SYSTRACE define in gecko/tools/profiler/GeckoProfilerImpl.h like:
#define MOZ_USE_SYSTRACE
#ifdef MOZ_USE_SYSTRACE
# define ATRACE_TAG ATRACE_TAG_ALWAYS
// We need HAVE_ANDROID_OS to be defined for Trace.h.
// If its not set we will set it temporary and remove it.
# ifndef HAVE_ANDROID_OS
#   define HAVE_ANDROID_OS
#   define REMOVE_HAVE_ANDROID_OS
# endif

Note: Gecko code is tagged as ATRACE_TAG_ALWAYS, so we don't set the category type.