Performance/Status Meetings/2007-June-13


Revision as of 17:30, 13 June 2007

Participants

robcee, martain, alice, justin, bobclary, joduinn, justin, damon, vlad

Action Item Update

  • AI:joduinn Choosing time/frequency of meeting: 10am-11am? Every week?
  • AI:Alice/Robcee: File bugs/bullet list of areas others could help for perf infra
  • AI:robcee bug 383167 tracking problem getting buildID-in-a-file from Tinderbox.
  • run performance tests with profiling on (rhelmer set up a machine for this, but jprof has problems on trunk)
  • AI:Justin see if the perf machines are swapping or if they need more memory.
    • Justin & rhelmer: hardware for JProf died.
    • AI:rhelmer reopened bug 366615. Hardware replaced, but rhelmer still installing software.
    • vlad filed bug 364779. No longer Linux-specific; now platform-independent.
  • Getting higher resolution timers for tests
    • AI:Damon will meet with Boris about this. Different issues on different platforms.
  • Graph server status
    • Graph server for easy build-to-build comparisons
    • Alice's latest changes are now checked into graphs.mozilla.org.
    • AI:alice, justin Discuss with IT having them maintain the machine, rather than Alice alone. Justin & Alice to meet and set up staging and production machines. Justin to support the production machine, but not 24x7. Alice to work on the staging machines and push to production, as we do for a.m.o and other sites.

Agenda

  • Generate reliable, relevant performance data (already underway as talos). Talos status update?

http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaTest

  • Areas where help is needed
  • Reducing test variance
  • expand the scope of performance testing beyond Ts/Tp/TXUL/TDHTML
  • reduce noise in tests to ~1% (suggested by bz, not started)
  • move perf tests to chrome, so we get more reliable results, and can test more than just content
  • improve performance reporting and analyses:
    • Better reports for sheriffs to easily spot perf regressions
    • Tracking down specific performance issues
    • stats change to track AUS usage by osversion.
  • New ideas
    • Question: How are we tracking perf bugs, specifically, and are we doing this the same way we are triaging security bugs? Can we do it the same way if not? (damon)
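
To make the "~1% noise" target in the agenda above concrete, one possible measure is the coefficient of variation (sample standard deviation divided by the mean) over repeated runs of the same test on the same build. The sketch below assumes that definition of noise, which the notes do not specify; the function name `noise_percent` and the sample timings are hypothetical.

```python
import statistics
from typing import Sequence


def noise_percent(timings_ms: Sequence[float]) -> float:
    """Coefficient of variation of repeated timings, as a percentage.

    Assumption: "noise" means sample standard deviation relative to the mean.
    Requires at least two samples (statistics.stdev raises otherwise).
    """
    mean = statistics.mean(timings_ms)
    if mean == 0:
        raise ValueError("mean timing is zero; relative noise is undefined")
    return 100.0 * statistics.stdev(timings_ms) / mean


if __name__ == "__main__":
    # Hypothetical timings (ms) from repeated runs of one test on one build.
    runs = [812.0, 805.0, 820.0, 809.0, 815.0]
    print(f"noise: {noise_percent(runs):.2f}%")
```

Under this definition, a test meets the target when `noise_percent` stays at or below roughly 1 across repeated runs.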

AI:Damon: Timer Resolution

Information from Boris:

There are several timers that are involved in the performance tests. First of all, there is the JS Date.now() function. This is a priori accurate to no better than 1ms. We also use JS timeouts (same accuracy) and Perl's timing stuff (worse, unless Time::HiRes is installed, which it should be on all the test boxen; with HiRes we should be getting microsecond precision).

In practice, the actual accuracy is sometimes worse than the 1ms accuracy listed above.

On Windows some of the commonly-used timer APIs (e.g. timeGetTime) only give 15ms accuracy; I'm not sure which, if any, of the above are affected by that. I seem to recall issues with JS timeouts due to that. Certainly anything on Windows that uses PR_IntervalNow() will be affected by the timeGetTime behavior.

Most of the other things above seem to use gettimeofday on Linux and GetSystemTimeAsFileTime on Windows; both seem to be accurate enough for our purposes, I think. The MSDN docs on GetSystemTimeAsFileTime are pretty slim.

On Mac I'm really not sure what the situation is.

In general, I _think_ that anything that's using JS Date.now() directly is good for 1ms precision (which means that tasks of under 100ms are hard to time to 1% accuracy). Anything using timeouts (I seem to recall Tp does this?) will get noise in it due to PR_IntervalNow; on Windows this might be a lot of noise. Ts uses the Perl timing, which should be ok.
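
The arithmetic behind the "under 100ms" remark can be sketched as follows: if a single measurement can be off by up to about one timer tick, the relative error is roughly the tick size divided by the task duration. The function names are hypothetical, and the one-tick error model is a simplifying assumption.

```python
def relative_error_percent(resolution_ms: float, duration_ms: float) -> float:
    """Approximate worst-case quantization error of one measurement, in percent.

    Assumption: the measured interval can be off by up to about one timer tick.
    """
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    return 100.0 * resolution_ms / duration_ms


def min_duration_ms(resolution_ms: float, target_error_percent: float) -> float:
    """Shortest task duration that can be timed to the target relative error."""
    if target_error_percent <= 0:
        raise ValueError("target_error_percent must be positive")
    return 100.0 * resolution_ms / target_error_percent


if __name__ == "__main__":
    # 1 ms timer (Date.now() as described above), 1% target.
    print(min_duration_ms(1.0, 1.0))
    # 15 ms timer (the Windows timeGetTime figure quoted above), 100 ms task.
    print(relative_error_percent(15.0, 100.0))
```

With a 1 ms tick and a 1% target, `min_duration_ms` gives 100 ms, which matches the figure in the text. With the same model, a 15 ms tick on a 100 ms task gives an error of about 15%.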

JS timing is actually worse than 1ms on Mac - granularity is in 16 ms ticks, easily visible if you graph timings of short loops. gettimeofday() on Mac has a 1-microsecond granularity though. -Stan
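
One way to see tick granularity like this without graphing is to poll a clock in a tight loop and record how far it jumps each time its value changes. The sketch below is in Python rather than JS, so `time.time()` only stands in for `Date.now()`; the helper names are hypothetical, and the result is an upper bound on the tick size because loop overhead also contributes to each step.

```python
import time
from typing import Callable, List


def observed_ticks(clock: Callable[[], float], samples: int = 20) -> List[float]:
    """Poll `clock` in a tight loop and record each visible forward step."""
    ticks: List[float] = []
    last = clock()
    while len(ticks) < samples:
        now = clock()
        if now > last:  # skip repeated readings (and any backward jump)
            ticks.append(now - last)
            last = now
    return ticks


def granularity(clock: Callable[[], float], samples: int = 20) -> float:
    """Smallest observed step: an upper bound on the clock's tick size."""
    return min(observed_ticks(clock, samples))


if __name__ == "__main__":
    # time.time() returns seconds; convert the step to milliseconds.
    step_ms = granularity(time.time) * 1000.0
    print(f"smallest observed time.time() step: {step_ms:.6f} ms")
```

A clock that advances in coarse ticks shows the same step size repeatedly, while a fine-grained clock shows steps dominated by the cost of the loop itself.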

Other Information

Priorities for infra:

  • Generate historical baselines
  • Generate profile data regularly on builds
  • Getting the perf numbers more stable
  • Developing the graph server to display time spent in each module