User:Staszyk/Thoughts on the webstats solutions

From MozillaWiki

The numbers in parentheses (p1 to p3) are priorities.

1. Mozilla custom reports

https://bugzilla.mozilla.org/show_bug.cgi?id=393200

In the mozilla.com profile we have a few Mozilla custom reports, which are very useful.

a) pages by source (p1)
   pages by browser (p1)
   pages by country (p2)
   pages by platform (p3)
b) sources by pages (p2)
c) browser by pages (p2)
d) platform by pages (p3)

This gives detailed data (UA, source, platform, country) for every page of the site. This is important because it provides data about:

  • each locale (/$lang/)
  • downloads (esp. users' UA)
  • entry points (source and referrer data, e.g. getfirefox.com).

As far as country, UA, referrer and platform data are concerned, the default Urchin installation offers only sum totals for the entire site, with no way of breaking them down by page.

2. Proposed enhancements

(implemented on mozilla-europe.org and hopefully soon to be implemented on mozilla.com)

a) outbound links tracking https://bugzilla.mozilla.org/show_bug.cgi?id=393056

  • click event listeners added with JavaScript
  • shows how users exit our sites (and not only where they finish their visits)
  • shows whether users actually follow the links we offer them (e.g. support sites)
  • shows which content users prefer (discussion boards, FAQs, newsgroups, books)

b) download tracking https://bugzilla.mozilla.org/show_bug.cgi?id=391347

  • measures downloads everywhere on the site: not only on the pages that use the download.html transition page, but also on the pages with direct links to download.mozilla.org (which normally aren't tracked on mozilla.com and mozilla-europe.org, because the hostname is different)
  • the proposed script (see the bug) provides a workaround for the mozilla.com Urchin bug [1], where only the first download parameter is recorded (e.g. ?product=firefox=2.0.0.6) and the other ones (e.g. lang, os) are left out.
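
To illustrate what "all download parameters" means here, the following is a minimal Python sketch, not the JavaScript from the bug. It simply reads every query parameter from a download link instead of stopping at the first one. The function name and the sample link (including its parameter values) are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def download_params(url):
    """Return every query parameter of a download link as a flat dict."""
    query = urlsplit(url).query
    # parse_qs returns a list of values per name; keep the first value of each
    return {name: values[0] for name, values in parse_qs(query).items()}


# Hypothetical link: the parameter values are invented for this example
link = "https://download.mozilla.org/?product=firefox&os=win&lang=fr"
params = download_params(link)
print(params)
```

With all three values available, a download could be counted per product, per platform and per locale rather than per product only.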

3. Urchin's shortcomings

a) URLs are one-dimensional

Let's take the following example:

Page               Pageviews
/fr/               3,000
  /fr/products/     1,000
  /fr/support/      2,000
/en/               12,000
  /en/products/     4,000
  /en/support/      8,000

In the above example, it's difficult to see the data for all support pages (10,000 pageviews in total). We can only go deeper into the hierarchy of the URLs, whereas it would often be useful to be able to change this hierarchy and have:

Page               Pageviews
/products/         5,000
  /products/fr/     1,000
  /products/en/     4,000
/support/          10,000
  /support/fr/      2,000
  /support/en/      8,000
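
The regrouping shown in the two tables above can be sketched in a few lines of Python. This is only an illustration done outside Urchin: it assumes every path has the form /<locale>/<section>/, and the function name is hypothetical.

```python
from collections import defaultdict

# Pageviews from the example above, keyed by /<locale>/<section>/ path
pageviews = {
    "/fr/products/": 1000,
    "/fr/support/": 2000,
    "/en/products/": 4000,
    "/en/support/": 8000,
}


def regroup_by_section(views):
    """Swap the hierarchy: section first, locale second."""
    totals = defaultdict(dict)
    for path, count in views.items():
        # Assumes exactly two path segments: locale, then section
        locale, section = path.strip("/").split("/")
        totals[section][locale] = count
    return totals


by_section = regroup_by_section(pageviews)
for section, locales in by_section.items():
    print(f"/{section}/  {sum(locales.values()):,}")
    for locale, count in locales.items():
        print(f"  /{section}/{locale}/  {count:,}")
```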

Current solution:

  1. use of regular expressions in the Filter field (only one filter is allowed, though)
  2. external processing, e.g. with OpenOffice's import, sort and search features. Not very efficient, but it has one great advantage: you export the data from Urchin once, so you are sure to be analysing the same set of data (from the same time range) all the time
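
Both approaches can be combined in a small script: apply a regular expression to an exported, tab-separated report. The sketch below is an assumption-heavy illustration; the column names ("Page", "Pageviews") and the inline sample stand in for a real export, whose exact layout may differ.

```python
import csv
import io
import re

# Stand-in for a tab-separated export; column names are assumptions
export = (
    "Page\tPageviews\n"
    "/fr/products/\t1000\n"
    "/fr/support/\t2000\n"
    "/en/products/\t4000\n"
    "/en/support/\t8000\n"
)

# Matches the support section of any two-letter locale
SUPPORT = re.compile(r"^/[a-z]{2}/support/")


def filter_rows(tsv_text, pattern):
    """Return (page, pageviews) pairs whose page matches the pattern."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["Page"], int(row["Pageviews"]))
        for row in reader
        if pattern.search(row["Page"])
    ]


rows = filter_rows(export, SUPPORT)
total = sum(count for _, count in rows)
print(rows, total)
```

Because the export is a fixed file, running the script again with a different pattern always works on the same time range.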

Possible other solutions:

  • I've started to think about some kind of database system that would allow multiple filters (includes, doesn't include, is equal to, etc.)
  • Maybe some offline/online piece of software to provide a database-like interface? It would also let us analyse a stable set of data
  • How could it be integrated with statistical software? Maybe R?
  • How could the interface be made to understand requests like: "display the download sum totals for all the locales, grouped by the users' browsers"? The current regexp implementation won't cut it.
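
As a rough idea of what such a database-like interface could do, here is a sketch using Python's built-in sqlite3 module. Everything in it is hypothetical: the table layout, the column names and the sample rows are invented for illustration and do not come from Urchin or from real statistics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per page, locale and browser combination
conn.execute(
    "CREATE TABLE visits ("
    "page TEXT, locale TEXT, browser TEXT, is_download INTEGER, hits INTEGER)"
)
sample_rows = [  # invented numbers, for illustration only
    ("/fr/products/", "fr", "Firefox", 1, 30),
    ("/fr/products/", "fr", "MSIE", 1, 70),
    ("/en/products/", "en", "Firefox", 1, 50),
    ("/en/products/", "en", "MSIE", 1, 150),
    ("/en/support/", "en", "MSIE", 0, 400),
]
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?)", sample_rows)


def downloads_by_browser(connection):
    """Download totals over all locales, grouped by browser."""
    query = """
        SELECT browser, SUM(hits)
        FROM visits
        WHERE is_download = 1              -- filter 1: "is equal to"
          AND page NOT LIKE '%/support/%'  -- filter 2: "doesn't include"
        GROUP BY browser
        ORDER BY browser
    """
    return connection.execute(query).fetchall()


result = downloads_by_browser(conn)
print(result)
```

The point of the sketch is that several filters and a grouping fit in one query, which a single regexp filter cannot express.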

b) Urchin is good for a quick check of pageviews, downloads etc. I don't think it's that good for conducting actual statistical analysis. Hence the idea of another tool, described above, as well as the need for decent export functionality.

c) More export options. Tab-separated values are quite OK, but the export option in Urchin only gives us the data displayed in the currently viewed report. How about some more advanced combined exports? XML maybe? (or... DB dumps? ;-)

d) Urchin's Adobe SVG is evil.

e) Load balancing --> make nladm01 work again! [2]

f) Urchin seems to get the bytes data wrong.

g) 404 and non-existing pages -- it would be nice to be able to display data only for actually existing pages (but only as an option). Thousands of mistyped URLs are kind of a mess :-) On the other hand, they give us some insight if the errors are repeated often, and that's why I would propose this only as an option.