FWIW, I'm not sure Tp2 does its calculations properly. I swear I did a recent run where one of the tests had times like 100, 120, 110, 1600, 100 and it gave me a median of 800 for that test. Needless to say, that skewed things a bit. It did show the 1600 as grayed out, but it seems it was still factored into the calculation.
EDIT: OK, I know for sure it's not calculating correctly. Here are the five times from a run I just did:
391 296 390 234 297
As intended, it did grey out the 391 value. However, it still calculated a median of 343.50, which is clearly wrong. It should be 296.50. It appears that instead of averaging the middle two remaining numbers, it's averaging the top two remaining numbers.
EDIT2: The more I look at other medians, the more suspect things get. I'm seeing other examples of things being completely wrong even when it appears that the high number is being factored out properly. For example, I did another run where the four remaining numbers were 141 141 157 157. The median of that is 149. The value reported was 157.
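For anyone who wants to check the arithmetic on the two local runs above, here is a minimal Python sketch. The function names are made up for illustration and are not taken from the Tp2 code, and `top_two_average` is only a guess at what the local run might be doing, based on the values it reported.

```python
from statistics import median


def drop_highest(times):
    """Return the times sorted ascending with the single highest value removed."""
    return sorted(times)[:-1]


def true_median(remaining):
    """Median of the remaining times (mean of the middle two when the count is even)."""
    return median(remaining)


def top_two_average(remaining):
    """Hypothesized buggy result: mean of the two highest remaining times."""
    ordered = sorted(remaining)
    return (ordered[-1] + ordered[-2]) / 2


# First run: five times, the highest (391) is greyed out.
first = drop_highest([391, 296, 390, 234, 297])
print(true_median(first))      # 296.5, the expected median
print(top_two_average(first))  # 343.5, the value that was reported

# Second run: only the four remaining times are known.
second = [141, 141, 157, 157]
print(true_median(second))      # 149.0, the expected median
print(top_two_average(second))  # 157.0, the value that was reported
```

Both reported values match the "average of the top two" guess, which fits the pattern but does not prove that this is what the code actually does.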
EDIT3: One more round of edits and I'm done. It appears that the actual tinderboxen may be doing the calculations correctly, and that the problem I'm seeing only occurs when Tp2 is run locally (which is still a valid bug :-)...). Looking at a recent run from the fx-linux-tbox perf test, I see something like this:
If I'm reading that correctly, it's median, mean, min, max, <5 times>. If so, those numbers are correct with the 680 run factored out. So I guess the good news is that the tinderboxen at least seem to be doing the correct calculations.
-RyanVM 06:58, 27 June 2007 (PDT)