;Description
:Create Python scripts to generate Places DBs with various characteristics such as "many visits within the same domain", "visits across many domains", "many tags", "many bookmarks", etc. Also, collect data from real-world users to inform the profiles of our generated DBs.
= Status =
Sprint's been on the back burner while we're getting Firefox 3.5b4 out the door.
Currently collecting stats from Mozilla community at https://places-stats.mozilla.com/. Been doing so since early March. Stats will inform our database generation script.
Database generation script (Python) being worked on. Patches are up on {{bug|480340}}. If you are feeling adventurous, please download the Python and try it out. I would like to document bootstrapping this better. Feel free to ping ddahl in #places.
Relevant links:
* [http://forums.mozillazine.org/viewtopic.php?f=23&t=1172765 Mozillazine forum posting] about stats collection portion
* [http://daviddahl.blogspot.com/2009/03/places-database-generator-stats.html ddahl's blog post] about database generation script
* [http://blog.mozilla.com/adw/2009/03/25/places-stats/ adw's blog post] about stats collection implementation and initial results
= Goals / Use Cases =
'''The chief goal''' is to be able to automate the generation of these sample sqlite databases for a continuous test to run on Places. We want to be able to reliably set some benchmarks and see what code changes either slow down or speed up queries in Places.
The sample data set should actually be quite huge (according to Beltzner and Shaver). We should collect stats from users so that our sample databases reflect real-world use.
Next step: feed the data we gather from the stats web page into the generation script as input.
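As a rough illustration of that next step, here is a minimal sketch of a generator driven by a stats dictionary. It is not the script from the bug. The <code>generate_places_db</code> function, the keys of <code>stats</code>, and the two-table schema (a small subset of columns) are all assumptions made for illustration.

```python
import random
import sqlite3


def generate_places_db(path, stats, seed=0):
    """Create a simplified Places-like database sized according to `stats`.

    `stats` is a hypothetical dict with the keys "places", "domains" and
    "visits_per_place"; the real input format is not decided yet.
    """
    rng = random.Random(seed)
    conn = sqlite3.connect(path)
    # Simplified stand-ins for the real tables: only a few columns each.
    conn.executescript("""
        CREATE TABLE moz_places (
            id INTEGER PRIMARY KEY, url TEXT, visit_count INTEGER);
        CREATE TABLE moz_historyvisits (
            id INTEGER PRIMARY KEY, place_id INTEGER, visit_date INTEGER);
    """)
    visits = stats["visits_per_place"]
    for place_id in range(1, stats["places"] + 1):
        # Fewer domains means more visits concentrated in the same domain.
        domain = rng.randrange(stats["domains"])
        url = "http://site%d.example.com/page%d" % (domain, place_id)
        conn.execute(
            "INSERT INTO moz_places (id, url, visit_count) VALUES (?, ?, ?)",
            (place_id, url, visits))
        # Placeholder timestamps; a real generator would use realistic dates.
        conn.executemany(
            "INSERT INTO moz_historyvisits (place_id, visit_date) VALUES (?, ?)",
            [(place_id, rng.randrange(10 ** 15)) for _ in range(visits)])
    conn.commit()
    return conn


# "Many visits within the same domain": every place shares one domain.
conn = generate_places_db(
    ":memory:", {"places": 50, "domains": 1, "visits_per_place": 4})
```

Passing a file path instead of <code>:memory:</code> would write the database to disk. The fixed seed keeps generated databases reproducible, which matters if they are used for benchmarks.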
= Non Goals =
Creating a sample database for every little niche use case. If at some point it becomes important to test a little niche use case, fine, our generator script should be able to handle it, but we will not be doing so at the outset.
Going out of our way to collect data that would help other teams/sprinters at Mozilla. If we can share our results with others because it would help them, fantastic. But time is wasting, we need to get going, so we can't accommodate everyone. Maybe later.
= Design =
* Keywords
We can come up with different data points in each dimension, take cartesian product across all dimensions to get a full suite of databases... User of our script should be able to specify a point in each dimension, and our script generates a database.
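The cartesian product idea can be sketched with <code>itertools.product</code>. The dimension names and data points below are made up for illustration; the real ones would come from the dimension list above and from the collected stats.

```python
import itertools

# Hypothetical dimensions and data points, for illustration only.
DIMENSIONS = {
    "bookmarks": [0, 500],
    "keywords": [0, 20],
    "tags": [0, 100],
}


def database_specs(dimensions):
    """Yield one dict per point in the cartesian product of all dimensions."""
    names = sorted(dimensions)
    for values in itertools.product(*(dimensions[name] for name in names)):
        yield dict(zip(names, values))


# The full suite: one spec per database to generate.
specs = list(database_specs(DIMENSIONS))
```

Each dict in <code>specs</code> is one point in the space, so a user who wants a single database would pass one such dict to the generator instead of iterating over the whole suite. The suite grows multiplicatively with each added dimension, so the number of data points per dimension should stay small.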
= Implementation =
=== Database generator ===
Set up Django by downloading the tarball:
http://www.djangoproject.com/download/1.0.2/tarball/
Uncompress it and run:
 sudo python setup.py install
Add the Django bin directory to your path:
 export PATH=$PATH:~/code/python/django/bin:~/code/python
 cd ~/code/python
Run this:
 django-admin.py startproject places
 django-admin.py startapp builddb
Copy a places.sqlite file to ~/code/python/places, then set:
 export PLACES_DB_PATH=~/code/python/places/places.sqlite
 export DJANGO_SETTINGS_MODULE=places.settings
 export PYTHONPATH=$PYTHONPATH:~/code/python
Edit places/settings.py:
 import os
 DATABASE_ENGINE = 'sqlite3'
 DATABASE_NAME = os.environ['PLACES_DB_PATH']
Reverse engineer the Django models from the schema:
 cd ~/code/python/places
 python manage.py inspectdb >> builddb/models.py
Now, we need to clean up the foreign keys.
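One possible way to start the foreign key cleanup is to list the columns that should become relations. The sketch below only checks which assumed references actually exist in a given database. The mapping <code>ASSUMED_REFERENCES</code> and the helper <code>foreign_key_candidates</code> are hypothetical and should be verified against the real Places schema. The demo runs against a toy in-memory schema, not a real places.sqlite.

```python
import sqlite3

# Hypothetical mapping from (table, column) to the table the column refers to.
# These entries are assumptions; verify them against the real schema.
ASSUMED_REFERENCES = {
    ("moz_historyvisits", "place_id"): "moz_places",
    ("moz_bookmarks", "fk"): "moz_places",
}


def foreign_key_candidates(conn, references):
    """Return (table, column, target) for assumed references found in the DB."""
    found = []
    for (table, column), target in sorted(references.items()):
        # PRAGMA table_info yields one row per column; index 1 is the name.
        # A missing table simply yields no rows.
        columns = [row[1] for row in conn.execute(
            "PRAGMA table_info(%s)" % table)]
        if column in columns:
            found.append((table, column, target))
    return found


# Toy schema for demonstration; point sqlite3.connect at PLACES_DB_PATH instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE moz_historyvisits (id INTEGER PRIMARY KEY, place_id INTEGER);
""")
candidates = foreign_key_candidates(conn, ASSUMED_REFERENCES)
```

Each candidate would then be edited by hand in builddb/models.py. Presumably the plain integer field that inspectdb generated gets replaced with something like <code>models.ForeignKey(MozPlaces, db_column='place_id')</code>, where the model class name depends on what inspectdb actually produced.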
=== Stats collector ===
https://places-stats.mozilla.com/
The stats collector is a CGI script written in Ruby, located at the above address. Visitors are presented with instructions on how to submit statistics about their Places databases. They copy a small piece of JavaScript, located at https://places-stats.mozilla.com/places.js and embedded in the page, paste it into Firefox's JavaScript console, and evaluate it. The JavaScript computes numerous statistics from their Places database, presents them to the user, and lets the user submit them to the site. Once submitted, the stats are inserted into a MySQL database, from which they are presented to all visitors to the site.
We will publicize the site via blogs, forums, and wherever else to solicit submissions from the community.
= Bugs =
* {{bug|480340}}
* https://places-stats.mozilla.com/
= Misc notes for ddahl and adw =
=== Awesomebar autocomplete ===
How should AutoComplete be stressed? Shawn says:
* http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp
</pre>
AutoComplete is definitely important, but we'd like our database construction scripts/methodology to be general enough to generate places databases for any kind of testing context.
=== Frecency ===
* [https://developer.mozilla.org/en/The_Places_frecency_algorithm Algorithm description], though sdwilsh says this may be out of date
* Actual frecency calculation at nsNavHistory::CalculateFrecencyInternal(), http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistory.cpp#7275
=== Stats we should have collected but did not ===
For each data point:
* Distribution of moz_historyvisits.visit_type. This value is one of the nsINavHistoryService.TRANSITION_* constants.
* Distribution of moz_places.typed
* Distribution of moz_places.frecency
* Nested folder stats (ddahl)
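For the three distributions listed above, a simple GROUP BY query may be enough. The following is a sketch in Python rather than the JavaScript the stats page uses. The <code>distribution</code> helper is hypothetical, and the demo data is a toy table, not real user data.

```python
import sqlite3


def distribution(conn, table, column):
    """Return {value: row_count} for one column, computed with GROUP BY.

    Table and column names are interpolated into the SQL, so only pass
    trusted, hard-coded identifiers.
    """
    query = "SELECT %s, COUNT(*) FROM %s GROUP BY %s" % (column, table, column)
    return dict(conn.execute(query).fetchall())


# Toy data for demonstration; a real run would open a places.sqlite file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moz_historyvisits (id INTEGER PRIMARY KEY, visit_type INTEGER)")
conn.executemany(
    "INSERT INTO moz_historyvisits (visit_type) VALUES (?)",
    [(1,), (1,), (2,), (5,)])
visit_types = distribution(conn, "moz_historyvisits", "visit_type")
```

The same call with <code>"moz_places", "typed"</code> would cover the second item. Frecency takes many distinct values, so it would probably need bucketing before the counts become useful.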