Auto-tools/Projects/Signal From Noise/StatusNovember2012: Difference between revisions

* We use more replicates per page.
* Datazilla uses improved statistical methods: Welch's t-test, false discovery rate (FDR) control, and exponential smoothing.
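The three techniques named above can be sketched in plain Python. This is a minimal illustration only, not Datazilla's or datazilla-metrics' actual code, and the function names are hypothetical. Welch's t-test is shown up to the t statistic and the Welch-Satterthwaite degrees of freedom (turning these into a p-value needs a t-distribution CDF, for example SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`). The FDR step assumes the Benjamini-Hochberg procedure applied to p-values you already have, and the smoothing step is simple single-parameter exponential smoothing with an assumed smoothing factor `alpha`.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples.

    Unlike Student's t-test, this does not assume the two samples share a variance.
    """
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def benjamini_hochberg(pvalues, q=0.05):
    """Return one boolean per p-value: True where the hypothesis is rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected


def exponential_smoothing(values, alpha=0.3):
    """Simple exponential smoothing: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed
```

With many pages tested at once, each page's comparison yields its own p-value, so a per-page threshold of 0.05 would flag many spurious regressions; Benjamini-Hochberg instead bounds the expected fraction of false positives among the pages flagged.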
Open questions:
* How many replicates do we need for a given page/pageset? How ''should'' this be determined? We currently use 30 replicates of each page load.
Right now, deciding how many replicates we need is a manual process of reading the data, and we balance that against the demand for fast machine turnaround time. I don't know that this needs to be addressed here; I think we should merely outline what we currently do. ''jhammel: To me it seems extraordinarily dangerous not to document a crucial part of our system. 30 isn't a magic number, and 30 replicates are not guaranteed to resolve every distribution. If we ever need to reassess this, we should provide either guidelines on how to do it or a note that we haven't done it. IMHO, just stating what we've done is about as good as describing how the pre-SfN methodology worked: it makes the process appear rigorous by omission when in fact there is no particular rigor to it. To put it another way, is this not an open question? Will 30 serve all of our needs forever? Are there occasions when this should be reassessed? Knowing when a model breaks down is important if you wish to use that model.''
** (Clearly, as many as needed to resolve the parts of the distribution function we care about.)
** What would be the computational cost, compared to now, of running Talos with this number of replicates?
Drop it; I think we already do this.
** While this has been investigated by Metrics, https://wiki.mozilla.org/Metrics/Talos_Investigation#Higher_run_count_structure , no quantifiable conclusions have yet been reached.
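One textbook way to make the replicate count less of a magic number is to choose a target margin of error for the mean page-load time and solve for the number of replicates from a pilot sample's standard deviation, using n = (z * s / margin)^2. The sketch below is only that textbook approach, not something the project has adopted; it assumes independent, roughly normal replicates (real page-load distributions often are not, so treat the result as a starting point), and the function names, pilot timings and margins are hypothetical. For the cost question, it also assumes machine time scales linearly with the replicate count, so the cost relative to today is roughly n / 30.

```python
import math
from statistics import NormalDist, stdev


def replicates_needed(pilot, margin, confidence=0.95):
    """Replicates needed so the confidence interval for the mean is about +/- margin.

    Normal-approximation formula n = (z * s / margin) ** 2, where s is the pilot
    sample's standard deviation. Assumes independent, roughly normal replicates.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    s = stdev(pilot)
    return math.ceil((z * s / margin) ** 2)


def relative_cost(n, current=30):
    """Machine time relative to the current 30 replicates, assuming linear cost."""
    return n / current
```

For example, pilot timings of 98, 100 and 102 ms (standard deviation 2 ms) call for 16 replicates to pin the mean to within about 1 ms at 95% confidence, but 62 replicates for 0.5 ms: halving the margin roughly quadruples the count, which is why the target precision should be stated explicitly.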
A datazilla-metrics repository, https://github.com/mozilla/datazilla-metrics , has been created; it is a Python package that implements statistical methods useful for Datazilla.

