Platform/Games/GameFocusedBenchmarking: Difference between revisions

Platform/Games/GameFocusedBenchmarking (view source)

Revision as of 21:07, 12 December 2013

1,798 bytes added, 12 December 2013

→‎Current Results

Klahnakoski


@@ Line 13: / Line 13: @@
 ==Current Results==
-View [https://metrics.mozilla.com/bugzilla-analysis/Perfy-Overview.html] and click to drill down to details.
+View [https://metrics.mozilla.com/bugzilla-analysis/Perfy-Overview.html Perfy-Overview.html] (requires LDAP) and click to drill down to details.
+The charts show the average score over the 20 test runs performed.  We believe the average is a fair aggregate for the sub-test results, given the characteristics we see:
+: ''Many of the sub-tests have bimodal behavior ([https://metrics.mozilla.com/bugzilla-analysis/Perfy-Details.html#benchmark=Benchmark.octane-2.0.Mandreel&platform=Platform.Linux&date=2013-12-12 example]).  We strongly suspect that garbage collection (GC) is triggered during some of the test runs and causes the two modes.  The plan is to add tooling that counts the number and duration of GC events to confirm this suspicion.
+''
+Each individual mode has low variance; the weight given to each mode is equal to the number of samples we observe in it; and the average provides us with a number that best reflects that balance.
+'''Caveat''' - The time series ([https://metrics.mozilla.com/bugzilla-analysis/Perfy-TimeSeries.html#platform=Platform.Linux&benchmark=Benchmark.octane-2.0 example]) does '''NOT''' use the average; it uses the median!  The purpose of the time series is to detect regressions in performance, which means we are interested in only one of the two modes and whether it changes over time.  The median naturally chooses the most popular mode, giving us consistent results over time.  There are dangers to using the median; for one, it is not sensitive to a change in the balance of modes over time.  Yet the median is much better than the average given the small number of tests we perform.  The average is sensitive to the random fluctuations between batteries of tests: these are not large when a battery is viewed on its own, but they are distracting when looking at variation between batteries.  We do not want to be chasing regressions when GC happened to hit thrice this round and only twice last time.
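The trade-off between average and median can be illustrated with a small sketch. The scores below are made up for illustration (they are not Perfy data), the attribution of the slow mode to GC follows the suspicion stated above rather than a confirmed measurement, and the helper name <code>battery</code> is hypothetical.

```python
from statistics import mean, median

# Hypothetical scores for one sub-test (higher is better). These values are
# invented for illustration; they are not taken from the Perfy charts.
FAST = 1000.0  # assumed mode when GC does not fire during a run
SLOW = 800.0   # assumed mode when GC does fire during a run


def battery(slow_runs, total_runs=20):
    """Return the scores of one battery with `slow_runs` runs in the slow mode."""
    return [SLOW] * slow_runs + [FAST] * (total_runs - slow_runs)


last_time = battery(slow_runs=2)   # GC happened to hit twice
this_round = battery(slow_runs=3)  # GC happened to hit thrice

# The average moves with the balance of modes, even though neither mode changed.
mean_shift = mean(this_round) - mean(last_time)

# The median stays on the most popular mode, so it reports no change.
median_shift = median(this_round) - median(last_time)

# The danger noted above: the median also ignores a large change in balance.
skewed = battery(slow_runs=9)
median_skewed = median(skewed)
mean_skewed = mean(skewed)

print(mean_shift, median_shift, median_skewed, mean_skewed)
```

In this sketch the average drops by 10 points between the two batteries while the median does not move, which is the behavior wanted for regression detection. The last case shows the cost: with 9 of 20 runs in the slow mode, the median still reports the fast mode.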
 ==Scope==
