Performance/Status Meetings/2007-June-06




Attendees: robcee, vlad,

Action Item Update


  • Choosing time/frequency of meeting: 10am-11am? Every week?
  • Generate reliable, relevant performance data (already underway as talos). Talos status update?

  • Areas where help is needed
  • Reducing test variance
  • expand the scope of performance testing beyond Ts/Tp/Txul/Tdhtml
  • reduce noise in tests to ~1% (suggested by bz; not started)
  • move perf tests to chrome, so we get more reliable results, and can test more than just content
  • Graph server status
    • Graph server for easy build-to-build comparisons
    • AI:alice: her latest changes are now checked in. Discussions with IT about having them maintain the machine, not just Alice.
  • improve performance reporting and analyses:
    • Better reports for sheriffs to easily spot perf regressions
    • Tracking down specific performance issues
    • stats change to track AUS usage by osversion.
  • New ideas
    • Question: How are we tracking perf bugs, specifically, and are we doing this the same way we are triaging security bugs? Can we do it the same way if not? (damon)
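The ~1% noise target above can be checked mechanically against any set of test runs. A minimal sketch in Python (the sample times are hypothetical, for illustration only; none of these numbers come from Talos) that expresses a run's noise as the relative standard deviation:

```python
import statistics

def noise_percent(samples):
    """Noise of a set of timing samples, expressed as the
    relative standard deviation (stdev / mean) in percent."""
    mean = statistics.mean(samples)
    return 100.0 * statistics.stdev(samples) / mean

# Hypothetical page-load times in ms (illustrative, not real Talos data).
runs = [251.0, 249.5, 250.2, 252.1, 248.9]
print(noise_percent(runs))  # roughly 0.5, i.e. under the ~1% target
```

A sheriff-facing report could flag any test whose noise_percent exceeds the target, which also makes "reduce noise to ~1%" a measurable exit criterion rather than a judgment call.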

Action Items

  • Preed: Assist Robcee with Buildbot master instance (use ref plat)
    • preed: handed ref image to ????. Back-and-forth questions about what was/wasn't needed. Some issues with setting up a developer instance on a local machine in addition to the server image. AI:preed to meet offline with robcee. Done.
  • Alice/Robcee: File bugs/bullet list of areas others could help for perf infra
  • Get farm up with new machines reporting data for trunk
  • run performance tests with profiling on (rhelmer set up a machine for this, but jprof has problems on trunk)
    • Justin & rhelmer: hardware for JProf died.
    • AI:rhelmer reopened bug 366615
    • vlad filed bug 364779
  • Getting higher resolution timers for tests
    • AI:Damon will meet with Boris about this. Different issues on different platforms.

AI:Damon: Timer Resolution

Information from Boris:

There are several timers involved in the performance tests. First of all, there is the JS Date() function, which is a priori accurate to no better than 1 ms. We also use JS timeouts (same accuracy) and Perl's timing functions (worse, unless Time::HiRes is installed, which it should be on all the test boxen; with HiRes we should be getting microsecond precision).

In practice, the actual accuracy is sometimes worse than the 1ms accuracy listed above.

On Windows some of the commonly-used timer APIs (e.g. timeGetTime) only give 15ms accuracy; I'm not sure which, if any, of the above are affected by that. I seem to recall issues with JS timeouts due to that. Certainly anything on Windows that uses PR_IntervalNow() will be affected by the timeGetTime behavior.

Most of the other things above seem to use gettimeofday on Linux and GetSystemTimeAsFileTime on Windows; both seem to be accurate enough for our purposes. I think. The MSDN docs on GetSystemTimeAsFileTime are pretty slim.

On Mac I'm really not sure what the situation is.
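When the documentation is slim or the platform behavior is unknown, the effective granularity can be probed empirically. A small Python sketch (an assumption about how one might measure it, not something from the meeting) that spins until the clock value changes and records the smallest observed step:

```python
import time

def timer_granularity(clock=time.time, samples=1000):
    """Estimate the smallest observable tick of `clock` by busy-waiting
    until the reported value changes, many times, and taking the minimum
    observed step. The result bounds the clock's effective resolution."""
    ticks = []
    for _ in range(samples):
        t0 = clock()
        t1 = clock()
        while t1 == t0:          # spin until the clock advances
            t1 = clock()
        ticks.append(t1 - t0)
    return min(ticks)

print(timer_granularity())  # seconds; ~0.001 on a 1 ms clock, ~0.016 on a 16 ms one
```

Graphing the full `ticks` list (rather than just the minimum) is the same trick Stan describes below: coarse ticks show up as flat bands in timings of short loops.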

In general, I _think_ that anything that's using JS directly is good for 1ms precision (which means that tasks of under 100ms are hard to time to 1% accuracy). Anything using timeouts (I seem to recall Tp does this?) will get noise in it due to PR_IntervalNow; on Windows this might be a lot of noise. Ts uses the Perl timing, which should be ok.

JS timing is actually worse than 1ms on Mac - granularity is in 16 ms ticks, easily visible if you graph timings of short loops. gettimeofday() on Mac has a 1-microsecond granularity though. -Stan
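The "under 100 ms is hard to time to 1%" point follows directly from tick size: a clock that advances in 1 ms steps can misreport any single measurement by up to a full tick. A toy model (hypothetical numbers, for illustration only):

```python
def quantize(duration_ms, tick_ms):
    """Model a clock that only advances in tick_ms steps: the measured
    duration is the true duration rounded down to a tick boundary, so
    the worst-case error on one reading is nearly a full tick."""
    return (duration_ms // tick_ms) * tick_ms

# A ~50 ms task timed with a 1 ms clock:
true_ms = 50.7
measured_ms = quantize(true_ms, 1.0)          # reads as 50.0
error_pct = 100.0 * (true_ms - measured_ms) / true_ms
print(error_pct)  # about 1.4%, already over the 1% target
```

With the 16 ms tick Stan quotes for Mac, the same 50 ms task can be off by over 30%, which is why moving to higher-resolution timers (or timing longer runs) matters for the 1% goal.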

Other Information

Priorities for infra:

  • Generate historical baselines
  • Generate profile data regularly on builds
  • Getting the perf numbers more stable
  • Developing the graph server to display time spent in each module