Webdev:Meetings:2009-09-01

From MozillaWiki

Open Items

  • AMO team is seeking ideas: We generate CSVs from our statistics data for add-on authors. There are 3 date groupings and 8 different ways to plot the data. The problem is, the historical data continues to grow and we're running out of memory building these huge CSVs on the fly. Ideas:
    • When the uninstall survey started dying on CSVs we used a cron job to build them and cache them to disk. If we do this for all our add-ons, that's well over 200,000 files and growing. Perhaps we can combine this with one of the other ideas.
    • Provide less historical data. Right now it goes all the way back. Restricting that is weak sauce.
    • Reduce the number of groupings/plots. What if we just provided CSVs for daily downloads with a couple of sets of columns? That's only ~15,000 files per set of columns. Still a lot.
      • 1 row = 1 day, right? What if, beyond $x weeks of history, we only offered weekly or monthly totals? E.g. for data older than 6 months, 1 row = 1 week or 1 month.
    • Generate CSVs for add-ons with more than $x weeks of history. Eventually we'll be back to the problem with the first idea (too many files).
    • Write something way lighter weight to build CSVs on the fly. We can't scale this way forever though.
    • Limit the number of rows returned but provide paging parameters to view older ranges of data.
    • Output CSV as it is generated and bypass Cake views, thus avoiding the need to generate huge arrays of data
    • Each add-on ID gets its own tables, stats.addonid.*, and some data is only offered for the last year:
      • *.downloads: date, version, number of downloads
      • *.usage_total: date, sum of update pings
      • *.usage_apps: date, app, update pings
      • *.usage_ly_versions: date, version, update pings (only for last year)
      • *.usage_ly_apps_and_versions: date, version, app, appversion, update pings, userEnabled pings, incompatible pings (only for last year; is there a need for needsDependencies or blocklisted?)
      • *.usage_ly_os: date, app, os, update pings (only for last year)
    • Maybe a service to mail the developer the CSV data once per week/month.
    • Can metrics do this for us?
    • Why get all the data in memory at once? We could have a little service that builds the CSV and streams it out to disk. Let that stay cached for however long is appropriate.
    • add more ideas! thx
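
As a rough illustration of the "stream it out to disk and cache it" idea above, here is a minimal sketch in Python. It is not AMO code (the notes mention Cake views, so the real site is PHP); `fetch_rows`, `write_csv_streaming`, `cached_csv`, the file naming, and the one-day cache lifetime are all hypothetical names and values chosen for the example. The point is only that rows are written one at a time, so memory use stays flat no matter how much history an add-on has.

```python
import csv
import os
import tempfile
import time


def fetch_rows(addon_id):
    """Hypothetical stand-in for a database cursor: yields one row at a time."""
    for day in range(1, 4):
        yield (f"2009-08-{day:02d}", addon_id, 100 * day)


def write_csv_streaming(rows, header, path):
    """Write rows to `path` one at a time and return the number of data rows.

    The file is built under a temporary name and renamed into place, so a
    reader never sees a half-written CSV.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    count = 0
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            for row in rows:  # only one row is held in memory at a time
                writer.writerow(row)
                count += 1
        os.replace(tmp_path, path)  # atomic rename within one directory
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return count


def cached_csv(addon_id, cache_dir, max_age_seconds=86400):
    """Return the path of a cached CSV, rebuilding it only when it is stale."""
    path = os.path.join(cache_dir, f"downloads-{addon_id}.csv")
    fresh = (
        os.path.exists(path)
        and time.time() - os.path.getmtime(path) < max_age_seconds
    )
    if not fresh:
        header = ("date", "addon_id", "downloads")
        write_csv_streaming(fetch_rows(addon_id), header, path)
    return path
```

Because files are built on first request rather than by a cron job over every add-on, only add-ons whose authors actually ask for a CSV end up with a file on disk, which may ease the 200,000-file concern.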
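
The "monthly totals for older data" idea from the list can be sketched the same way. This is an assumption-laden example, not a proposal for the actual schema: `roll_up` and the 182-day default (roughly the "6 months" mentioned above) are hypothetical, and the input is assumed to be (date, count) pairs sorted by date ascending.

```python
from datetime import date, timedelta


def roll_up(daily, today, keep_daily_days=182):
    """Yield (label, total) pairs from (date, count) pairs sorted by date.

    Rows older than `keep_daily_days` are summed into one row per month
    (label "YYYY-MM"); newer rows pass through as one row per day.
    """
    cutoff = today - timedelta(days=keep_daily_days)
    month_label = None
    month_total = 0
    for day, count in daily:
        if day < cutoff:
            label = day.strftime("%Y-%m")
            if label != month_label:
                if month_label is not None:
                    yield month_label, month_total
                month_label, month_total = label, 0
            month_total += count
        else:
            if month_label is not None:
                # Flush the last monthly bucket before daily rows begin.
                yield month_label, month_total
                month_label = None
            yield day.isoformat(), count
    if month_label is not None:
        yield month_label, month_total


rows = [
    (date(2009, 1, 1), 5),
    (date(2009, 1, 2), 7),
    (date(2009, 2, 1), 3),
    (date(2009, 8, 31), 10),
]
result = list(roll_up(rows, today=date(2009, 9, 1)))
# result == [("2009-01", 12), ("2009-02", 3), ("2009-08-31", 10)]
```

Note that a month straddling the cutoff produces a partial monthly row followed by daily rows for the rest of that month; whether that is acceptable to add-on authors is an open question.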