Performance/Status Meetings/2007-June-27



Action Item Update

  • AI:Justin see if the perf machines are swapping or if they need more memory.
    • 5 are done, 5 more will be done this week.
    • bug 384314. Arriving tomorrow (Thursday).
  • AI:robcee bug 383167 tracking problem getting buildID-in-a-file from Tinderbox.
  • AI:rhelmer run performance tests with profiling on.
    • rhelmer set up a machine for this, but jprof has problems on trunk.
    • AI:rhelmer needs to change the tests to start/stop jprof as part of the test run. The first implementation was lost to a hard-disk failure; the new implementation will be better anyway, and is checked in!
    • Done, publishing results to experimental. Discussions about Tier 1 vs. Tier 2 support.
    • vlad filed bug 364779. No longer Linux-specific, now platform-independent.
      • People are constrained; will do this as part of the Talos framework. Buildbot is another option.
  • Getting higher resolution timers for tests
    • AI:Damon will meet with Boris about this. Different issues on different platforms.
    • Rob Arnold to take: bug 363258
  • Graph server status
    • Graph server for easy build-to-build comparisons
    • Alice's latest changes are now checked in.
    • AI:alice, justin DONE Discussions with IT about having them maintain the machine, not just Alice.
      • Justin & Alice to meet and set up staging & production machines.
      • Justin to support the production machine, but not 24x7.
      • Alice to work on the staging machines and push to production, as we do for a.m.o and other sites.
      • Justin is setting up the production machine; estimated 3 days.
  • AI:rhelmer/robcee XP machines frequently hang, freeze, run out of memory, etc.
    • Change the XP machines to clean-boot-and-auto-start-everything: auto-login, start the VNC server, etc.
    • rhelmer would like to do this for both build and perf machines.
    • Run one perf machine that reboots every 24 hours and compare its results to perf machines that are not rebooted frequently.
    • No change.
  • Reducing test variance
    • AI:schrep DONE will try playing with the existing Tp2 logs from rhelmer to see if schrep can do math magic. Talked with Ken and learned the following:
      • Tp: 5 test runs per page; takes the median for each page, then averages those medians
      • Tp2: 5 test runs per page; takes the median for each page after the max run is removed, then averages those medians
      • Suggestion to do many more runs per page (e.g. 100) to stabilize results
      • bug 386084
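The Tp/Tp2 aggregation rules above can be sketched as follows. This is a minimal illustration, not the actual test-harness code; the function names and the timing data are made up, and times are assumed to be in ms.

```python
from statistics import mean, median

def tp_score(page_times):
    """Tp: per page, take the median of its runs; average those medians."""
    return mean(median(runs) for runs in page_times)

def tp2_score(page_times):
    """Tp2: per page, drop the single slowest run, then take the median
    of the remaining runs; average those medians."""
    def median_without_max(runs):
        return median(sorted(runs)[:-1])  # sorted()[:-1] removes the max run
    return mean(median_without_max(runs) for runs in page_times)

# Five runs for each of three pages (made-up numbers, ms).
pages = [
    [120, 118, 119, 250, 121],   # one outlier run
    [80, 82, 81, 83, 80],
    [300, 305, 299, 302, 301],
]
print(round(tp_score(pages), 1))   # → 167.3
print(round(tp2_score(pages), 1))  # → 166.8
```

Note that dropping the max before taking the median barely changes the result here; the suggestion to run each page many more times (e.g. 100) helps because the standard error of the per-page estimate shrinks roughly as 1/sqrt(n).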


  • Generate reliable, relevant performance data (already underway as talos). Talos status update?

  • Areas where help is needed
  • expand the scope of performance testing beyond Ts/Tp/Txul/Tdhtml
  • reduce noise in tests to ~1% (suggested by bz, not started)
  • move perf tests to chrome, so we get more reliable results, and can test more than just content
  • improve performance reporting and analyses:
    • Better reports for sheriffs to easily spot perf regressions
    • Tracking down specific performance issues
  • stats change to track AUS usage by OS version.
  • Priorities for infra:
    • Generate historical baselines
    • Generate profile data regularly on builds
    • Getting the perf numbers more stable
    • Developing the graph server to display time spent in each module
  • New ideas
    • Question: How are we tracking perf bugs, specifically, and are we doing this the same way we are triaging security bugs? Can we do it the same way if not? (damon)

Gecko: Perf discussion

  • Perfathon next week
    • vlad to provide profile data before meeting
  • What's interesting data to collect?
    • High priority: historical 1.8->1.9 Tp(1,2,3) numbers
    • Ts time
    • memory usage
    • Talos initial results by next Wednesday
  • Todo: verification of test results
    • Fonts, valid rendering, etc.
    • TODO: Stuart: verify that concerns about talos results are addressed
  • Problem: hard to identify perf regressions via backouts, due to long cycle times and too few machines
    • Will have 4 Talos machines doing interleaved perf tests, which will help
    • We need a better solution to this.
      • Have a machine at the office that's constantly building and timing; once a regression is identified, the continuous build can be suspended and people can try backing out patches and running perf tests.
      • Use buildbot try to help automate this
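The backout workflow above is essentially a bisection over candidate changesets. A minimal sketch, assuming the regression persists once introduced; `find_regression`, the threshold, and the toy timing data are all hypothetical:

```python
def find_regression(changesets, run_perf_test, threshold_ms):
    """Binary-search for the first changeset whose perf number exceeds
    the threshold. Each probe implies a full build-and-test cycle, which
    is why cycle time and machine count dominate how fast this goes."""
    lo, hi = 0, len(changesets) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if run_perf_test(changesets[mid]) > threshold_ms:
            hi = mid          # regression already present at mid
        else:
            lo = mid + 1      # still fast; regression landed later
    return changesets[lo]

# Toy example: ten changesets, regression lands at changeset 6.
times = [100, 101, 99, 100, 102, 100, 140, 141, 139, 142]
cs = list(range(10))
print(find_regression(cs, lambda c: times[c], 120))  # → 6
```

Only four of the ten changesets get tested, which is the payoff of automating this with try builds rather than backing patches out one at a time.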

Last week

  • jprof tinderbox
    • jprof profile splitting
      • need list of functions checked in
        • just toplevel chunks (whatever you want separated out)
      • need to go back to 1.8 and collect data from there
  • synthetic tests
    • put specific tests into talos framework
    • poach mochitest performance tests
    • individual reftest-like items
  • need to get a bunch of people looking at profiles
    • take up some of these meetings to sit down and look at profiles
    • timer-based profiling is better (vtune/jprof/oprofile/etc., not quantify)
    • TODO: vlad to generate profile for next week's meeting
  • running without Fx chrome
    • need a dummy history implementation
    • that would get us a number that's just pure gecko, and not timing the firefox UI
    • rhelmer's extension to run Tp2 in chrome
  • examining default theme for performance issues
  • Tp vs. Tp2
    • Tp2 avoids a lot of firefox UI stuff because it's in an iframe
    • Tp needs server-side code, but it doesn't have to be the exact code that's there now
  • probes
    • has value, but needs to be maintained
    • merge dtrace/nsProbes/etc. so that we can use whatever tool without reinstrumenting the code
    • dbaron already maintains entrypoints into layout to assert that layout doesn't reenter incorrectly
    • TODO: vlad/dbaron - connect with sun guys, take probe stuff somewhere

Other Information

Followup on JS timing granularity: turns out that it's not JS timing errors after all! Instead, it's the synthetic load I was using, which was to multiply a pair of numbers. If the load is changed to be addition of numbers, the measured time is a clean linear function of the number of additions. So the mystery becomes one of why multiplication in JS sometimes takes about 16ms longer than expected, but at least we're not suspicious about our measurement tools.
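The linearity check described above can be reproduced with a few lines of code (sketched here in Python rather than JS; the `measure` helper and the load sizes are made up for illustration): time increasing repetition counts of a synthetic load and confirm that elapsed time grows linearly with the count.

```python
import time

def measure(n, op):
    """Time n repetitions of a tiny synthetic load starting from 1.0."""
    start = time.perf_counter()
    acc = 1.0
    for _ in range(n):
        acc = op(acc)
    return time.perf_counter() - start, acc

add = lambda x: x + 1.000001   # additive load: time should scale linearly in n
mul = lambda x: x * 1.000001   # multiplicative load: the anomalous case

for n in (100_000, 200_000, 400_000):
    t_add, _ = measure(n, add)
    t_mul, _ = measure(n, mul)
    print(n, f"{t_add:.4f}s", f"{t_mul:.4f}s")
```

If doubling n roughly doubles the elapsed time for a given load, the clock is fine and any residual jumps come from the load itself, which is exactly the conclusion reached above for the multiplication case.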

Related Bugs