Places/Stats: Difference between revisions

From MozillaWiki
(Documenting dataset cleaning & initial R code)
 

Revision as of 22:49, 12 June 2009

Context

See Places-Stats.mozilla


Analysis

Goals:

  • Insights into usage of bookmarks and history
  • Use open source tools to create and iterate on reproducible analysis of the places stats data set
  • [Andyed] Investigate potential to gather updated stats for metrics tracked in historical research (% usage of bookmarks, % new urls visited, etc.)

Toolset

  • R
  • GGobi

Code

See the Etherpad Page for the scratchpad

Load Data

places <- read.csv("...places.csv")

Compute age metrics

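# Parse the oldest and newest visit dates (month/day/2-digit year hour:minute) into POSIXct
places$oldest_stamp <- as.POSIXct(strptime(as.character(places$visit_date_oldest), format = "%m/%d/%y %H:%M"))
places$newest_stamp <- as.POSIXct(strptime(as.character(places$visit_date_newest), format = "%m/%d/%y %H:%M"))
# Span between the oldest and newest visit, in days
places$time_delta <- difftime(places$newest_stamp, places$oldest_stamp, units = "days")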
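
The age computation can be tried on a small toy data frame before running it on the full data set. This is only a sketch: the two sample rows are invented, and the explicit UTC timezone is an assumption (the snippet above relies on the local timezone). Values that do not match the format come back as NA, so counting NAs is a quick check on the parse.

```r
# Toy stand-in for places.csv; both rows are invented sample values.
toy <- data.frame(
  visit_date_oldest = c("01/15/09 08:30", "03/01/09 12:00"),
  visit_date_newest = c("06/12/09 22:49", "03/02/09 12:00"),
  stringsAsFactors = FALSE
)

# Parse with an explicit timezone (assumption: UTC) so results do not
# depend on the machine's locale or daylight-saving rules.
fmt <- "%m/%d/%y %H:%M"
toy$oldest_stamp <- as.POSIXct(toy$visit_date_oldest, format = fmt, tz = "UTC")
toy$newest_stamp <- as.POSIXct(toy$visit_date_newest, format = fmt, tz = "UTC")

# Span between the oldest and newest visit, in days.
toy$time_delta <- difftime(toy$newest_stamp, toy$oldest_stamp, units = "days")

# Number of rows whose oldest date failed to parse.
n_failed <- sum(is.na(toy$oldest_stamp))
```

Calling as.numeric(toy$time_delta) gives plain numbers that are easier to summarize or plot.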

Tags & Bookmark Metrics

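# Fraction (0 to 1) of bookmarks counted as tagged; NaN where bookmark_cnt is 0
places$bookmark_tagged_pct <- (places$bookmark_cnt - places$bookmark_nontag_cnt) / places$bookmark_cnt
# Corrected folder count: folder_cnt minus bookmark_cnt
places$folder_cnt_crrctd <- places$folder_cnt - places$bookmark_cnt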
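
Profiles with no bookmarks make the tagged ratio 0/0, which R evaluates to NaN. A possible guard, sketched on invented sample counts and assuming bookmark_nontag_cnt is the number of bookmarks without tags, is to record NA for those rows and skip them when summarizing:

```r
# Toy stand-in for the places data; the counts are invented sample values.
toy <- data.frame(
  bookmark_cnt        = c(10, 0, 4),
  bookmark_nontag_cnt = c(7, 0, 4)
)

# Fraction of bookmarks that are tagged; NA (rather than NaN) when a
# profile has no bookmarks at all.
toy$bookmark_tagged_pct <- ifelse(
  toy$bookmark_cnt > 0,
  (toy$bookmark_cnt - toy$bookmark_nontag_cnt) / toy$bookmark_cnt,
  NA_real_
)

# Summaries need na.rm = TRUE to skip profiles without bookmarks.
mean_tagged <- mean(toy$bookmark_tagged_pct, na.rm = TRUE)
```

Multiplying the column by 100 gives a true percentage if that is what the _pct suffix is meant to convey.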