Performance/Status Meetings/2007-June-06

From MozillaWiki


Participants

Action Item Update

Agenda

  • Choosing time/frequency of meeting: 10am-11am? Every week?
  • Generate reliable, relevant performance data (already underway as Talos). Talos status update?
  • Graph server status
  • Areas where help is needed
  • Reducing test variance
  • expand the scope of performance testing beyond Ts/Tp/TXUL/TDHTML
  • reduce noise in tests to ~1% (suggested by bz, not started)
  • run performance tests with profiling on (rhelmer set up a machine for this, but jprof has problems on trunk)
  • move perf tests to chrome, so we get more reliable results, and can test more than just content
  • improve performance reporting and analyses:
  • Graph server for easy build-to-build comparisons
  • Better reports for sheriffs to easily spot perf regressions
  • Tracking down specific performance issues
  • stats change to track AUS usage by osversion.
  • New ideas

Action Items

  • Preed: Assist Robcee with BB Master instance (use ref plat)
    • preed: handed ref image to ????. Back-and-forth questions on what was and wasn't needed. Some issues about setting up a developer instance on a local machine in addition to the server image. AI:preed to meet offline with robcee.
  • Alice/Robcee: File bugs/bullet list of areas where others could help with perf infra
    • AI:alice: collected info on personal webpage, will post to public server soon
  • Justin & rhelmer: hardware for JProf
  • Getting higher resolution timers for tests
    • AI:Damon will meet with Boris about this. Different issues on different platforms.

AI:Damon: Timer Resolution

Information from Boris:

There are several timers involved in the performance tests. First of all, there is the JS Date.now() function. This is a priori accurate to no better than 1ms. We also use JS timeouts (same accuracy) and Perl's timing functions (worse, unless Time::HiRes is installed, which it should be on all the test boxen; with HiRes we should be getting microsecond precision).

In practice, the actual accuracy is sometimes worse than the 1ms accuracy listed above.

On Windows some of the commonly-used timer APIs (e.g. timeGetTime) only give 15ms accuracy; I'm not sure which, if any, of the above are affected by that. I seem to recall issues with JS timeouts due to that. Certainly anything on Windows that uses PR_IntervalNow() will be affected by the timeGetTime behavior.

Most of the other things above seem to use gettimeofday on Linux and GetSystemTimeAsFileTime on Windows; both seem to be accurate enough for our purposes, I think. The MSDN docs on GetSystemTimeAsFileTime are pretty slim.

On Mac I'm really not sure what the situation is.
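Where a platform's timer behavior is unknown, one option is to probe it empirically. The following is a minimal sketch, not part of the test harness: it spins on a clock until the reported value changes and records the smallest step seen. The function name observed_granularity and its parameters are hypothetical, and Python's time.time() stands in for whichever timer a test actually uses.

```python
import time


def observed_granularity(clock=time.time, samples=50):
    """Return the smallest nonzero step (in seconds) seen between clock readings.

    Hypothetical helper for illustration; `clock` is any zero-argument
    callable that returns a time in seconds.
    """
    smallest = float("inf")
    for _ in range(samples):
        start = clock()
        now = clock()
        # Busy-wait until the clock reports a different value.
        while now == start:
            now = clock()
        smallest = min(smallest, now - start)
    return smallest


if __name__ == "__main__":
    step_ms = observed_granularity() * 1000.0
    print(f"smallest observed time.time() step: {step_ms:.6f} ms")
```

A timer that only updates every 15ms would report a step of about 15ms here, while a microsecond-resolution timer would report a far smaller one. The result is an upper bound on the granularity, since the loop itself takes time to run.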

In general, I _think_ that anything that's using JS Date.now() directly is good for 1ms precision (which means that tasks of under 100ms are hard to time to 1% accuracy). Anything using timeouts (I seem to recall Tp does this?) will get noise in it due to PR_IntervalNow; on Windows this might be a lot of noise. Ts uses the Perl timing, which should be ok.
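The 100ms figure above follows from simple arithmetic, sketched below. This assumes a simplified error model in which a single measurement can be off by up to one timer tick; the function names are hypothetical and used only for illustration.

```python
def worst_case_error_percent(duration_ms, resolution_ms):
    """Worst-case relative error (in percent), assuming the measurement
    can be off by up to one timer tick (a simplifying assumption)."""
    return resolution_ms * 100 / duration_ms


def min_duration_for_accuracy(resolution_ms, target_percent):
    """Shortest task (in ms) that can be timed to within target_percent
    under the same one-tick error assumption."""
    return resolution_ms * 100 / target_percent


if __name__ == "__main__":
    # 1ms timer (Date.now()) with a 1% accuracy target.
    print(min_duration_for_accuracy(1, 1))
    # 15ms timer (timeGetTime-style granularity) with the same target.
    print(min_duration_for_accuracy(15, 1))
```

Under this model a 1ms timer needs tasks of at least 100ms to reach 1% accuracy, and a 15ms timer needs tasks of at least 1500ms, which is why timeout-based measurements on Windows could be much noisier.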

Other Information

Priorities for infra:

  • Get farm up with new machines reporting data for trunk
    • AI:Robcee configuring machines with Alice, expecting to have machines reporting this afternoon.
  • Generate historical baselines
  • Generate profile data regularly on builds
  • Getting the perf numbers more stable
  • Developing the graph server to display time spent in each module