Places/Stats: Difference between revisions
Revision as of 22:49, 12 June 2009
Context
Analysis
Goals:
- Insights into usage of bookmarks and history
- Use open source tools to create and iterate on reproducible analysis of the places stats data set
- [Andyed] Investigate the potential to gather updated stats for metrics tracked in historical research (% usage of bookmarks, % new URLs visited, etc.)
Toolset
- R
- GGobi
Code
See the Etherpad Page for the scratchpad
Load Data
places <- read.csv("...places.csv")
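A quick structural check right after loading can catch parsing problems early. The following is a minimal sketch, not part of the original scratchpad: the inline rows are made-up values that stand in for the real file, and `sample_csv` and `places_demo` are hypothetical names. The column names are the ones used in the steps below.

```r
# Hypothetical sample rows standing in for places.csv (values are illustrative only)
sample_csv <- "visit_date_oldest,visit_date_newest,bookmark_cnt,bookmark_nontag_cnt,folder_cnt
1/5/09 10:30,6/1/09 18:45,120,100,150
3/12/09 08:00,5/30/09 23:10,0,0,4"

# Keep date strings as character rather than factor so they can be parsed later
places_demo <- read.csv(text = sample_csv, stringsAsFactors = FALSE)

str(places_demo)      # column names and types
summary(places_demo)  # value ranges and NA counts
```

Running `str()` and `summary()` on the real `places` data frame in the same way should show whether the date columns came in as text and whether any count columns contain unexpected NAs.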
Compute age metrics
# Parse the oldest/newest visit strings, then compute the history span in days
places$oldest_stamp <- as.POSIXct(strptime(as.character(places$visit_date_oldest), format="%m/%d/%y %H:%M"))
places$newest_stamp <- as.POSIXct(strptime(as.character(places$visit_date_newest), format="%m/%d/%y %H:%M"))
places$time_delta <- difftime(places$newest_stamp, places$oldest_stamp, units="days")
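To see how the age computation behaves on concrete values, here is a small sketch on made-up rows. The helper `parse_stamp`, the `demo` data frame, and the `age_days` column are hypothetical names introduced for illustration. Fixing the time zone to UTC is an assumption made here so that daylight-saving shifts do not affect the day counts.

```r
# Hypothetical helper (not in the original code): parse "m/d/yy H:M" strings as UTC
parse_stamp <- function(x) {
  as.POSIXct(strptime(as.character(x), format = "%m/%d/%y %H:%M", tz = "UTC"))
}

# Made-up rows; the second oldest-visit value is deliberately unparseable
demo <- data.frame(
  visit_date_oldest = c("1/5/09 10:30", "bad value"),
  visit_date_newest = c("1/7/09 10:30", "6/1/09 18:45"),
  stringsAsFactors = FALSE
)

demo$oldest_stamp <- parse_stamp(demo$visit_date_oldest)
demo$newest_stamp <- parse_stamp(demo$visit_date_newest)
demo$time_delta <- difftime(demo$newest_stamp, demo$oldest_stamp, units = "days")

# Plain numeric days are easier to plot or export than difftime objects
demo$age_days <- as.numeric(demo$time_delta, units = "days")

# Rows whose timestamps failed to parse end up as NA and can be counted
sum(is.na(demo$age_days))
```

Strings that do not match the format become NA instead of raising an error, so counting NAs in the derived column is a cheap way to check the cleaning step.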
Tags & Bookmark Metrics
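# Share of bookmarks that carry at least one tag
places$bookmark_tagged_pct <- (places$bookmark_cnt - places$bookmark_nontag_cnt) / places$bookmark_cnt
# Folder count with the bookmark count subtracted
places$folder_cnt_crrctd <- places$folder_cnt - places$bookmark_cnt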
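The tagged share is undefined for profiles with no bookmarks, because 0 / 0 evaluates to NaN in R. The sketch below shows one possible guard on made-up rows; the `demo` data frame and the choice to record NA for such profiles are assumptions, not part of the original analysis. Note that despite the `_pct` suffix the value is a fraction between 0 and 1.

```r
# Made-up rows; the second profile has no bookmarks at all
demo <- data.frame(
  bookmark_cnt        = c(120, 0),
  bookmark_nontag_cnt = c(100, 0),
  folder_cnt          = c(150, 4)
)

# Number of bookmarks that have at least one tag
tagged <- demo$bookmark_cnt - demo$bookmark_nontag_cnt

# Record NA instead of NaN when there are no bookmarks to divide by
demo$bookmark_tagged_pct <- ifelse(demo$bookmark_cnt > 0,
                                   tagged / demo$bookmark_cnt,
                                   NA)

# Same correction as in the original code
demo$folder_cnt_crrctd <- demo$folder_cnt - demo$bookmark_cnt

# Mean tagged share across profiles that have bookmarks
mean(demo$bookmark_tagged_pct, na.rm = TRUE)
```

With an explicit NA, summary functions can skip empty profiles via `na.rm = TRUE` rather than propagating NaN through every aggregate.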