Performance/Status Meetings/2007-June-06
Participants
Action Item Update
Agenda
- Choosing time/frequency of meeting: 10am-11am? Every week?
- Generate reliable, relevant performance data (already underway as talos). Talos status update?
- Graph server status
- Areas where help is needed
- Reducing test variance
- expand the scope of performance testing beyond Ts/Tp/TXUL/TDHTML
- reduce noise in tests to ~1% (suggested by bz, not started)
- run performance tests with profiling on (rhelmer set up a machine for this, but jprof has problems on trunk)
- move perf tests to chrome, so we get more reliable results, and can test more than just content
- improve performance reporting and analyses:
- Graph server for easy build-to-build comparisons
- Better reports for sheriffs to easily spot perf regressions
- Tracking down specific performance issues
- stats change to track AUS usage by osversion.
- New ideas
Action Items
- Preed: Assist robcee with BB Master instance (use ref plat)
- preed: handed ref image to ????. Back-and-forth questions on what was and wasn't needed. Some issues about setting up a developer instance on a local machine in addition to the server image. AI:preed to meet offline with robcee.
- Alice/robcee: File bugs/bullet list of areas where others could help with perf infra
- AI:alice: collected info on personal webpage, will post to public server soon
- Justin & rhelmer: hardware for JProf
- AI:rhelmer reopened bug 366615
- Getting higher resolution timers for tests
- AI:Damon will meet with Boris about this. Different issues on different platforms.
AI:Damon: Timer Resolution
Information from Boris:
There are several timers involved in the performance tests. First of all, there is the JS Date.now() function. This is a priori accurate to no better than 1ms. We also use JS timeouts (same accuracy) and Perl's timing stuff (worse, unless Time::HiRes is installed, which it should be on all the test boxen; with HiRes we should be getting microsecond precision).
In practice, the actual accuracy is sometimes worse than the 1ms accuracy listed above.
On Windows some of the commonly-used timer APIs (e.g. timeGetTime) only give 15ms accuracy; I'm not sure which, if any, of the above are affected by that. I seem to recall issues with JS timeouts due to that. Certainly anything on Windows that uses PR_IntervalNow() will be affected by the timeGetTime behavior.
Most of the other things above seem to use gettimeofday on Linux and GetSystemTimeAsFileTime on Windows; both seem to be accurate enough for our purposes, I think. The MSDN docs on GetSystemTimeAsFileTime are pretty slim.
On Mac I'm really not sure what the situation is.
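Where the situation on a platform is unknown, one way to check is to probe the clock directly. The following is a minimal sketch, not something from the meeting: it reads a clock in a tight loop and reports the smallest nonzero step it sees. The function name estimate_granularity and its parameters are hypothetical, and Python's time.time stands in for whichever timer a given test actually uses.

```python
import time

def estimate_granularity(clock=time.time, samples=50, max_reads=10_000_000):
    """Return the smallest positive step seen between consecutive clock reads.

    The result is in whatever unit the clock returns (seconds for time.time).
    Returns None if the clock never advanced within max_reads reads.
    """
    smallest = None
    last = clock()
    seen = 0
    reads = 0
    while seen < samples and reads < max_reads:
        now = clock()
        reads += 1
        step = now - last
        if step > 0:
            # The clock ticked; remember the smallest tick observed so far.
            if smallest is None or step < smallest:
                smallest = step
            seen += 1
            last = now
    return smallest

if __name__ == "__main__":
    print("time.time step:         %.9f s" % estimate_granularity(time.time))
    print("time.perf_counter step: %.9f s" % estimate_granularity(time.perf_counter))
```

A clock that only updates every 15ms should report a step near 0.015 here, while a finer clock reports a much smaller step. Python also exposes the advertised resolution through time.get_clock_info, but the observed step is what matters for the tests.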
In general, I _think_ that anything that's using JS Date.now() directly is good for 1ms precision (which means that tasks of under 100ms are hard to time to 1% accuracy). Anything using timeouts (I seem to recall Tp does this?) will get noise in it due to PR_IntervalNow; on Windows this might be a lot of noise. Ts uses the Perl timing, which should be ok.
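The arithmetic behind the 100ms figure can be sketched as follows. This assumes the timing error is roughly one timer tick, which is a simplification, and the helper names are hypothetical. The last helper illustrates one common workaround that was not discussed in the meeting: timing several back-to-back runs as a single batch so the tick is spread over a longer interval.

```python
import math

def quantization_error_percent(duration_ms, resolution_ms):
    """Approximate relative error (in percent) caused by timer resolution alone."""
    return 100.0 * resolution_ms / duration_ms

def min_duration_ms(target_percent, resolution_ms):
    """Shortest duration that can be timed to within target_percent."""
    return 100.0 * resolution_ms / target_percent

def runs_needed(duration_ms, target_percent, resolution_ms):
    """Number of back-to-back runs to time as one batch to reach target_percent."""
    return math.ceil(min_duration_ms(target_percent, resolution_ms) / duration_ms)

if __name__ == "__main__":
    # 1ms timer (Date.now()): a 100ms task is right at the 1% limit.
    print(quantization_error_percent(100, 1))
    # 15ms timer (timeGetTime on Windows): a task must run much longer for 1%.
    print(min_duration_ms(1, 15))
    # A 20ms task with a 1ms timer: batch size needed for 1%.
    print(runs_needed(20, 1, 1))
```

Under these assumptions a 1ms timer needs tasks of at least 100ms for 1% accuracy, and a 15ms timer needs tasks of at least 1500ms.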
Other Information
Priorities for infra:
- Get farm up with new machines reporting data for trunk
- AI:Robcee configuring machines with Alice, expecting to have machines reporting this afternoon.
- Generate historical baselines
- Generate profile data regularly on builds
- Getting the perf numbers more stable
- Developing the graph server to display time spent in each module