Mobile/BostonBrainstorm/Benchmarks


Why?

  • Make Fx for Android better, help establish industry standards, drive press coverage, and provide metrics for OEM negotiations
  • Provide metrics for competitive analysis
  • On raw scores we are behind on several existing benchmarks
  • Many existing numbers are not meaningful (generally bullshit) because they report figures that don't match what's actually happening under the covers

Goals

  • Be faster on things we are slow on
  • Build benchmarks that show we're faster
  • Create benchmarks on the right things and educate on why they are the right things
    • Ensure that benchmarks are simple for everyone to run
  • Fix existing benchmarks
  • Not just create new benchmarks, but help people see why we measure these things the way we do, and what they should be measuring

What

P1

Page Load

  • Issues: in mobile testing we serve pages over a local LAN
    • Priority: P1. Reproducible page load test - can we make something like iBench? (see the sketch after this list)
    • Doing page load across browsers is going to be important
    • Page load is rendering + networking
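
A minimal sketch of what a cross-browser page-load probe could look like, assuming only the standard Navigation Timing fields (which Gecko and WebKit both expose). The same instrumented page would be loaded in each browser under test, splitting the total into a networking share and a rendering share; the /report endpoint is a hypothetical hook on the serving harness, not anything that exists today.

```typescript
// Sketch only: report page-load time via the Navigation Timing API.
window.addEventListener("load", () => {
  // loadEventEnd is only final after the load handler returns, so defer a tick.
  setTimeout(() => {
    const t = performance.timing;
    const pageLoadMs = t.loadEventEnd - t.navigationStart; // total page load
    const networkMs = t.responseEnd - t.fetchStart;        // networking share
    const renderMs = t.loadEventEnd - t.responseEnd;       // parse + render share
    // Hypothetical reporting endpoint on the machine serving the test pages.
    new Image().src = `/report?load=${pageLoadMs}&net=${networkMs}&render=${renderMs}`;
  }, 0);
});
```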

Network Perf

  • Priority: P1. More important on mobile, but hard to do simply
    • Define it as: how fast do you get all the resources you need to load a page (JS/CSS/HTML/etc.)
    • Test with servers that do and don't support pipelining
    • Test on 2G/3G/4G/WiFi, etc.
    • There are lots of things to measure; measure them all and roll them up into a composite score
    • It's a subset of page load (probably should be related to the cross-browser page load bench; could share code)
    • Folks want to win on networking, but we have no data on where we stand right now
    • Also measure how much bandwidth we consume; if we can use less data bandwidth, our users are charged less $$$ for their data plans (see the sketch after this list)
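
One way to get both the latency composite and the bandwidth number from inside a page is the standard Resource Timing API; a sketch follows, assuming that API is available in the browsers under test. The field names are standard API fields, not Mozilla-specific.

```typescript
// Sketch: per-resource network measurement via the Resource Timing API.
// transferSize counts bytes over the wire (feeding the bandwidth question);
// duration feeds a latency composite across JS/CSS/HTML/images.
window.addEventListener("load", () => {
  const entries = performance.getEntriesByType("resource") as PerformanceResourceTiming[];
  let totalBytes = 0;
  let totalMs = 0;
  for (const e of entries) {
    totalBytes += e.transferSize; // 0 for cached or opaque cross-origin entries
    totalMs += e.duration;
  }
  console.log(`${entries.length} resources, ${totalBytes} bytes transferred, ` +
              `${totalMs.toFixed(0)} ms cumulative fetch time`);
});
```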

P2

  • Memory usage (already owned: kats)
    • Bug Number?
  • Responsiveness/snappy (already owned: chris lord)
    • Bug Number?
  • Canvas 2D/WebGL/game-y things - sprites, etc. (see the sketch after this list)
    • Same as the current gaming benchmarks? Bugs filed yet?
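
As a rough illustration of the canvas-2d "game-y" case, a sprite-blitting micro-benchmark in the spirit of the existing gaming benchmarks might look like the sketch below. The sprite asset, sprite count, and canvas size are all placeholders, not values from any existing benchmark.

```typescript
// Sketch: blit N sprites per frame for ten seconds and report frames per second.
const canvas = document.createElement("canvas");
canvas.width = 800;
canvas.height = 480;
document.body.appendChild(canvas);
const ctx = canvas.getContext("2d")!;
const sprite = new Image();
sprite.src = "sprite.png"; // placeholder asset
const N = 500;             // placeholder sprite count
let frames = 0;
let startTime = 0;
function frame(): void {
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  for (let i = 0; i < N; i++) {
    ctx.drawImage(sprite, Math.random() * 768, Math.random() * 448);
  }
  frames++;
  const elapsed = performance.now() - startTime;
  if (elapsed >= 10000) {
    console.log(`${((frames * 1000) / elapsed).toFixed(1)} fps with ${N} sprites`);
    return;
  }
  requestAnimationFrame(frame);
}
sprite.onload = () => {
  startTime = performance.now();
  requestAnimationFrame(frame);
};
```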

P3

  • Startup Time
    • Priority goes up if you're using WebRT, down if it's just the browser
    • Startup is measured well by Eideticker, but that gives neither simple execution nor a correctness check
    • We don't have a reproducible benchmark that other people can run to measure it
  • Battery
    • Probably going to need specialized hardware to test, and so won't have simple execution
  • Correctness/completeness of implementation
    • HTML5, we could just fix html5test.com
  • Media perf - video/audio
    • Hitting x frames per second on video is cool; we don't need to do more than some level x
    • Should be able to use a cross-browser API to do it
    • Eideticker should be a last resort - we could do this on a web page using the various Gecko/WebKit APIs (see the sketch after this list)
    • The eyeball test is horrible, but right now we don't have any numbers at all
  • CSS 3 animations/transitions (used for effects in games)
    • Priority: P3 (but much discussion on how important it is)
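
A sketch of the in-page approach for both the media and animation items, assuming standard APIs rather than Eideticker: requestAnimationFrame fires once per painted frame, so counting callbacks over a window approximates the animation frame rate, and getVideoPlaybackQuality() (where implemented) reports decoded vs. dropped video frames. Neither call is Mozilla-specific; whether this is close enough to what the eyeball test catches is exactly what would need validating.

```typescript
// Sketch: approximate the frame rate while an animation or video is running.
function measureFps(windowMs: number): Promise<number> {
  return new Promise((resolve) => {
    let frames = 0;
    const start = performance.now();
    function onFrame(now: number): void {
      frames++;
      if (now - start >= windowMs) {
        resolve((frames * 1000) / (now - start));
      } else {
        requestAnimationFrame(onFrame);
      }
    }
    requestAnimationFrame(onFrame);
  });
}

measureFps(5000).then((fps) => {
  console.log(`~${fps.toFixed(1)} fps over the sample window`);
  // If the page has a <video>, also ask for decoded/dropped frame counts.
  const video = document.querySelector("video");
  const q = video?.getVideoPlaybackQuality?.();
  if (q) console.log(`video: ${q.droppedVideoFrames}/${q.totalVideoFrames} frames dropped`);
});
```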

P4

  • JS Perf
    • Priority: P4 (keep track of it, but there are so many good benchmarks we can use that we don't need to develop anything here at the moment)
  • Layout - reflow, ?

P5

  • Disk IO/cache
    • Priority: P5
  • size on disk (install size)
    • Priority: P5/P6

Things to accomplish

  • simple execution
  • simple reporting
  • track regressions
  • cross browser
  • correctness/completion of implementation
  • Test doing things the "right" way; throw out tests that do things the "wrong" way, and educate people on why the "wrong" tests are useless
  • Emphasize doing things the "right" way

Plan Short term (this week)

  • media bench
  • canvas/webgl tests - vlad
  • Startup test using a script tag that prints the current time, like the old ts test; that should work across all browsers (see the sketch after this list)
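
A minimal sketch of what such a page could look like; how the old ts test actually worked is paraphrased here, not quoted. This would run as the first inline script on an otherwise-empty page, with an external harness recording the wall-clock time just before launching the browser at the page's URL; startup time is then firstScriptTime minus launch time.

```typescript
// Sketch: record the moment the first script executes after browser launch.
const firstScriptTime: number = Date.now();
document.title = String(firstScriptTime); // scrape-able by a harness
window.addEventListener("DOMContentLoaded", () => {
  document.body.textContent =
    `first script ran at ${firstScriptTime} (ms since the epoch)`;
});
```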

Plan Long Term

  • Find owners for page load and network benchmark tests
  • Battery - we had a tool that we could use to measure it; needs an owner (see the sketch after this list)
    • Is it enough to simply measure battery usage over the life of a test run (more reproducible)?
    • Could also measure how much of the CPU you use over the course of a test
    • Have a simple setup and measure it against a "real" environment to see if it is close enough to reality
    • Could write an Android test app for this using the Android instrumentation API
  • memory: kats
  • responsiveness: chris lord
  • Startup time - it would be nice to have something anyone can run
    • Publish the Eideticker stuff
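
For the battery item, one possible "simple setup" to validate against real hardware is a software-only drain sample via the Battery Status API (navigator.getBattery(), which Firefox has exposed in some form); the sketch below assumes that API is present, and the workload and duration are placeholders. This is an alternative to the Android instrumentation app idea above, not a replacement for hardware measurement.

```typescript
// Sketch: sample battery level before and after a test run.
async function sampleDrain(runMs: number): Promise<void> {
  // getBattery() is not in the default TS DOM typings, hence the cast.
  const battery = await (navigator as any).getBattery();
  const startLevel: number = battery.level; // 0.0 .. 1.0
  await new Promise((r) => setTimeout(r, runMs)); // run the test workload here
  const drainPct = (startLevel - battery.level) * 100;
  console.log(`battery drained ${drainPct.toFixed(1)}% over ${runMs / 1000} s`);
}
sampleDrain(10 * 60 * 1000); // e.g. a ten-minute run
```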

If the testing setup is complicated

  • Document our approach so that people can set up their own systems
  • Engage people who build their own systems, especially if they get different numbers than we do