Firefox OS/Performance/Boot Profiling

From MozillaWiki

Outline


Project Plan

First Steps

  • Read the existing literature on Linux boot optimization
    • Take cues from existing attempts (someone mentioned a presentation by Samsung engineers on their work to improve boot time)
    • Read through the internal documentation on the Firefox OS boot process
    • Also read the source code, to get some insight into the IPDL question under Miscellanea
  • Figure out how to actually measure boot statistics
    • Pick a revision (probably HEAD, and make a note of it for consistency)
    • Profile the boot of the Flame, Hamachi, and Tarako (concerns were expressed about the ability to source Hamachis and Tarakos; if they are unavailable, just use Flames?)
      • Possibly benchmark several Flames to rule out hardware differences or defects
    • Use a sample size of 30 boots (a common rule of thumb for statistically meaningful results)
  • Collect data on freshly flashed devices vs. devices under a heavy workload
    • If there is no significant difference under heavy workload, these tests probably don't need to be repeated in the second step
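Once the 30 boots per device are collected, a minimal way to summarize them is sketched below. It assumes each boot is recorded as a single duration in seconds. The 95% confidence interval uses a normal approximation (z = 1.96), which is only reasonable around n = 30 or larger. The function name `summarize_boots` is hypothetical.

```python
"""Summarize repeated boot-time measurements (a sketch; the input format,
one duration in seconds per boot, is an assumption)."""
import math
import statistics


def summarize_boots(durations_s, z=1.96):
    """Return (mean, sample stdev, CI half-width) for a list of boot durations.

    The confidence interval uses a normal approximation, which is only
    reasonable for sample sizes of roughly 30 or more.
    """
    n = len(durations_s)
    if n < 2:
        raise ValueError("need at least two boots to estimate spread")
    mean = statistics.mean(durations_s)
    stdev = statistics.stdev(durations_s)  # sample (n - 1) standard deviation
    half_width = z * stdev / math.sqrt(n)
    return mean, stdev, half_width


if __name__ == "__main__":
    # Hypothetical numbers for illustration, not real measurements.
    samples = [25.1, 24.8, 25.6, 24.9, 25.3]
    mean, stdev, hw = summarize_boots(samples)
    print(f"mean {mean:.2f}s, stdev {stdev:.2f}s, 95% CI ±{hw:.2f}s")
```

Reporting the interval alongside the mean makes it easier to tell whether two devices, or two revisions, actually differ.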

Find out how much time each component takes at boot, and whether there is any room for improvement in general.
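One way to attribute boot time to components is to record a timestamp at each milestone and split the total into the intervals between them. The sketch below assumes such `(name, seconds)` pairs already exist. The milestone names in the example (`kernel_start`, `init_start`, `b2g_start`, `homescreen_visible`) are hypothetical placeholders. Real ones would come from dmesg, logcat, or instrumentation added to the boot path.

```python
"""Attribute boot time to phases from milestone timestamps (a sketch).

Milestone names used below are hypothetical placeholders.
"""


def phase_durations(milestones):
    """Given (name, seconds) pairs, return (phase_label, duration_s, share)
    tuples in boot order, where share is the fraction of total boot time."""
    ordered = sorted(milestones, key=lambda m: m[1])
    if len(ordered) < 2:
        raise ValueError("need at least two milestones")
    total = ordered[-1][1] - ordered[0][1]
    phases = []
    for (start_name, start_t), (end_name, end_t) in zip(ordered, ordered[1:]):
        duration = end_t - start_t
        share = duration / total if total > 0 else 0.0
        phases.append((f"{start_name} -> {end_name}", duration, share))
    return phases


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    example = [
        ("kernel_start", 0.0),
        ("init_start", 4.0),
        ("b2g_start", 8.0),
        ("homescreen_visible", 20.0),
    ]
    for label, dur, share in phase_durations(example):
        print(f"{label:35s} {dur:6.2f}s {share:6.1%}")
```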

Extra questions:
  • How does boot time differ on a "well-used" device (one with lots of accumulated personal data)?
    • Someone mentioned a tool to automatically populate a phone with contacts, etc. (b2gpopulate? mozpopulate?)
    • Installing lots of apps is probably sufficient, though.
    • Analyzing the boot sequence might show where slowdowns could occur as a result of this extra data
  • At precisely what stage does the UI come up?
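To decide whether a "well-used" or heavily loaded device really boots slower than a freshly flashed one, the two sets of boot times can be compared statistically. The sketch below uses a two-sided permutation test on the difference of means. This is one reasonable choice, picked here because it makes no normality assumption; it is not a method prescribed by this plan. The function name and default parameters are hypothetical.

```python
"""Compare boot times of two device states with a permutation test (a sketch)."""
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of means.

    Returns (observed mean(b) - mean(a), p-value). A fixed seed keeps the
    result reproducible between runs.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    rng = random.Random(seed)
    observed = _mean(b) - _mean(a)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = _mean(pooled[n_a:]) - _mean(pooled[:n_a])
        # Small tolerance so float rounding does not hide exact ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Adding one avoids reporting p = 0 from a finite number of permutations.
    return observed, (extreme + 1) / (n_permutations + 1)
```

Usage: `permutation_test(fresh_times, loaded_times)` with 30 boots in each list. A large p-value (say above 0.05) would support dropping the heavy-workload runs in the second step.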

I hope to have a good picture of the current state of affairs on Firefox OS before the first progress update. In short, this section is about profiling boot, attributing blame to the various subsystems, and doing research in the meantime, since data collection will probably be time-consuming.

Second step

  • Apply any relevant research or theories in an attempt to improve boot speed, and collect data as above
  • Fix obviously suboptimal sections of code, or any other code problems found
  • At this phase, once we know what is doing what, it is probably wise to focus on one of the following, depending on the results:
    • Linux kernel and system startup improvements
    • Boot to Gecko (B2G) improvements (weighted toward this side)

Relevant bugs:
bug 1010381
bug 994998

I'll try to update the first bug with comments on my progress: whether something worked, or whether an approach turned out not to be fruitful.

Important Questions

  • Are we lagging significantly behind our competitors?
    • A: Probably not, actually. Extremely rough and statistically invalid tests gave the following results:
      • Flame ~= 25s, Hamachi ~= 40s, Tarako ~= 30s (probably freshly flashed devices)
      • Nexus 5 ~= 35s (but around 4s before the button hold took effect)
        • Likely a well-used device
      • iPhone 4S ~= 1:14, i.e. 74s (but around 5s before the button hold took effect)
        • Well-used device
  • It would be very interesting to know what is taking the time: the Linux kernel or B2G?
  • What do we do differently from Android? (i.e. what choices did we make that let us boot faster or slower)
  • Telemetry: what tools exist to measure boot? How should we measure? Miscellaneous statistics?
    • dmesg timestamps could be very useful here.
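A minimal sketch of pulling timestamps out of dmesg, for example from a file saved with `adb shell dmesg > dmesg.txt`. It assumes the kernel prints timestamps (CONFIG_PRINTK_TIME), so lines look like `[    2.345678] message`, optionally with an Android-style `<6>` log-level prefix. These timestamps count from the start of kernel timekeeping, so bootloader time is not included. Using "Freeing unused kernel memory" as an end-of-kernel-init marker is an assumption. That message commonly appears just before userspace init starts, but its presence and wording should be checked on each device kernel.

```python
"""Extract kernel timestamps from saved dmesg output (a sketch)."""
import re
import sys

# Optional "<N>" log-level prefix, then "[seconds.micro]", then the message.
TIMESTAMP_RE = re.compile(r"^(?:<\d+>)?\[\s*(\d+\.\d+)\]\s?(.*)$")

# Assumed marker near the end of kernel init; verify per device kernel.
KERNEL_DONE_MARKER = "Freeing unused kernel memory"


def parse_dmesg(text):
    """Return a list of (seconds, message) tuples for timestamped lines."""
    entries = []
    for line in text.splitlines():
        match = TIMESTAMP_RE.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def first_time_of(entries, needle):
    """Timestamp of the first message containing `needle`, or None."""
    for seconds, message in entries:
        if needle in message:
            return seconds
    return None


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: dmesg_times.py <dmesg.txt>")
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            entries = parse_dmesg(f.read())
        done = first_time_of(entries, KERNEL_DONE_MARKER)
        if done is None:
            print("marker not found; pick another milestone message")
        else:
            print(f"kernel init finished at ~{done:.3f}s after timestamp zero")
```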

Possible General Techniques

What is Success

  • Tangible boot improvement
  • More data on the current state of affairs (characterize what takes time)
    • Relatedly, it would be great if the slow subsystems turned out to be ones that would not be included in an OEM car or TV build.

Miscellanea

  • An intermediate black screen appears between the partner logo and full boot-up on the Tarako (this wasn't really the case on my Flame, though). We want to eliminate it. Why does this happen? Can we show our own logo here instead? That is, if we can't eliminate it, could I write a framebuffer program to fill this empty visual gap?
  • IPDL: figure out which messages use RPC vs. async. Are all RPC usages (which block) completely justifiable?
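A rough first pass at the IPDL question could be an inventory of message declarations in the source tree, flagging the blocking ones for manual review. The sketch below assumes each message is declared on one line as `<semantics> Name(...)`, optionally preceded by annotations such as `prio(...)`. The set of semantics keywords (`async`, `sync`, `rpc`, `intr`) has changed across IPDL versions, so the regex is an approximation rather than a real IPDL parser. The script name is hypothetical.

```python
"""Inventory IPDL message semantics in a source tree (a sketch)."""
import re
import sys
from collections import Counter
from pathlib import Path

# Optional annotations like prio(urgent), then the semantics keyword and a name.
MESSAGE_RE = re.compile(
    r"^\s*(?:(?:prio|nested)\(\w+\)\s+)*(async|sync|rpc|intr)\s+(\w+)\s*\("
)
BLOCKING = {"sync", "rpc", "intr"}


def classify_lines(text):
    """Yield (line_number, semantics, message_name) for each message line."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = MESSAGE_RE.match(line)
        if match:
            yield lineno, match.group(1), match.group(2)


def scan_tree(root):
    """Count semantics across *.ipdl files and list the blocking messages."""
    counts = Counter()
    blocking = []
    for path in sorted(Path(root).rglob("*.ipdl")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, semantics, name in classify_lines(text):
            counts[semantics] += 1
            if semantics in BLOCKING:
                blocking.append(f"{path}:{lineno}: {semantics} {name}")
    return counts, blocking


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: ipdl_inventory.py <source-root>")
    else:
        counts, blocking = scan_tree(sys.argv[1])
        print(dict(counts))
        for entry in blocking:
            print(entry)
```

A protocol header such as `sync protocol PFoo {` is deliberately not counted, because the regex requires a message name followed by `(`.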