Confirmed users, Bureaucrats and Sysops emeriti
3,599
edits
(→Design) |
m (moved Firefox/Sprints/Places DB Creation Scripts to Firefox/Projects/Places DB Creation Scripts: They're actually projects) |
||
(46 intermediate revisions by 4 users not shown) | |||
Line 4: | Line 4: | ||
;Description | ;Description | ||
:Create python | :Create python scripts to generate Places DBs with various characteristics such as "many visits within the same domain", "visits across many domains", "many tags", "many bookmarks", etc. Also, collect data from real-world users to inform the profiles of our generated DBs. | ||
= Status = | |||
Sprint's been on the back burner while we're getting Firefox 3.5b4 out the door. | |||
Currently collecting stats from Mozilla community at https://places-stats.mozilla.com/. Been doing so since early March. Stats will inform our database generation script. | |||
Database generation script (Python) being worked on. Patches are up on {{bug|480340}}. If you are feeling adventurous, please download the Python and try it out. I would like to document bootstrapping this better. Feel free to ping ddahl in #places. | |||
Relevant links: | |||
* [http://forums.mozillazine.org/viewtopic.php?f=23&t=1172765 Mozillazine forum posting] about stats collection portion | |||
* [http://daviddahl.blogspot.com/2009/03/places-database-generator-stats.html ddahl's blog post] about database generation script | |||
* [http://blog.mozilla.com/adw/2009/03/25/places-stats/ adw's blog post] about stats collection implementation and initial results | |||
= Goals / Use Cases = | = Goals / Use Cases = | ||
The chief goal is to be able to automate the generation of these sample sqlite databases for a continuous test to run on Places. We want to be able to reliably set some benchmarks and see what code changes either slow down or speed up queries in Places. | '''The chief goal''' is to be able to automate the generation of these sample sqlite databases for a continuous test to run on Places. We want to be able to reliably set some benchmarks and see what code changes either slow down or speed up queries in Places. | ||
The sample data set should actually be quite huge (according to Beltzner and Shaver). We should collect stats from users so that our sample databases reflect real-world use. | |||
Next step: take as input to the generation script the data we gather from the stats web page. | |||
= Non Goals = | = Non Goals = | ||
Creating a sample database for every little niche use case. If at some point it becomes important to test a little niche use case, fine, our generator script should be able to handle it, but we will not be doing so at the outset. | |||
Going out of our way to collect data that would help other teams/sprinters at Mozilla. If we can share our results with others because it would help them, fantastic. But time is wasting, we need to get going, so we can't accommodate everyone. Maybe later. | |||
= Design = | = Design = | ||
Line 40: | Line 60: | ||
* Number of tags | * Number of tags | ||
* Nature of tags (each bookmark has tons of tags, few tags, or varied?) | * Nature of tags (each bookmark has tons of tags, few tags, or varied?) | ||
* Keywords | |||
We can come up with different data points in each dimension, take cartesian product across all dimensions to get a full suite of databases... User of our script should be able to specify a point in each dimension, and our script generates a database. | |||
= Implementation = | |||
=== Database generator === | |||
set up django: | |||
http://www.djangoproject.com/download/1.0.2/tarball/ | |||
uncompress and run: | |||
sudo python setup.py install | |||
add django bin to your path | |||
export PATH=$PATH:~/code/python/django/bin:~/code/python | |||
cd ~/code/python | |||
run this: | |||
django-admin.py startproject places | |||
django-admin.py startapp builddb | |||
copy a places.sqlite file to ~/code/python/places | |||
export PLACES_DB_PATH=~/code/python/places/places.sqlite | |||
export DJANGO_SETTINGS_MODULE=places.settings | |||
export PYTHONPATH=$PYTHONPATH:~/code/python | |||
edit the places/settings.py: | |||
import os | |||
DATABASE_ENGINE = 'sqlite3' | |||
DATABASE_NAME = os.environ['PLACES_DB_PATH'] | |||
reverse engineer the Django Models from the schema: | |||
cd ~/code/python/places | |||
python manage.py inspectdb >> builddb/models.py | |||
Now, we need to clean up the foreign keys. | |||
=== Stats collector === | |||
https://places-stats.mozilla.com/ | |||
The stats collector is a CGI script written in Ruby located at the above address. Visitors are presented with instructions on how to submit statistics related to their Places databases. They copy a small piece of JavaScript, located at | |||
https://places-stats.mozilla.com/places.js and embedded in the page, and paste it into Firefox's JavaScript console and evaluate it. The JavaScript computes numerous statistics from their Places database, presents them to the user, and allows him to submit them to the site. Once submitted, the stats are inserted into a MySQL database, from which they are presented to all visitors to the site. | |||
We will publicize the site via blogs, forums, and wherever else to solicit submissions from the community. | |||
= Bugs = | |||
* {{bug|480340}} | |||
* https://places-stats.mozilla.com/ | |||
= Misc notes for ddahl and adw = | |||
=== Awesomebar autocomplete === | |||
Shawn says: | How should AutoComplete be stressed? Shawn says: | ||
* http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp | * http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp | ||
** GetAutoCompleteBaseQuery() | ** GetAutoCompleteBaseQuery() http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp#190 | ||
** see BOOK_TAG_SQL - having a lot of tags will slow stuff down, however that might not be representative of "normal users" | ** see BOOK_TAG_SQL - having a lot of tags will slow stuff down, however that might not be representative of "normal users": http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp#109 | ||
** mDBAdaptiveQuery | ** mDBAdaptiveQuery http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp#485 | ||
** mDBKeywordQuery | ** mDBKeywordQuery http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp#507 | ||
** AutoCompleteProcessSearch() | ** AutoCompleteProcessSearch() http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp#1000 - does post-processing | ||
* This file (or these queries at least) are being rewritten in JS: see _processRow() in https://bug455555.bugzilla.mozilla.org/attachment.cgi?id=363641 | * This file (or these queries at least) are being rewritten in JS: see _processRow() in https://bug455555.bugzilla.mozilla.org/attachment.cgi?id=363641 | ||
--- | Some notes on the above funcs and SQL: | ||
* GetAutoCompleteBaseQuery() selects from table moz_places(_temp) x moz_favicons; where frecency != 0; orders by column 9 (guessing this is frecency column...) | |||
* BOOK_TAG_SQL defined in terms of SQL_STR_FRAGMENT_GET_BOOK_TAG http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp#94 which itself selects from moz_bookmarks x moz_bookmarks where TYPE_BOOKMARK; sometimes orders by lastModified; | |||
* mDBAdaptiveQuery uses BOOK_TAG_SQL, moz_inputhistory... | |||
GetAutoCompleteBaseQuery() boils down to: | |||
<pre> | |||
SELECT h.url, | |||
h.title, | |||
f.url, | |||
(SELECT b.parent | |||
FROM moz_bookmarks b | |||
JOIN moz_bookmarks t ON t.id = b.parent AND t.parent != ?1 | |||
WHERE b.type = nsINavBookmarksService::TYPE_BOOKMARK AND | |||
b.fk = h.id | |||
ORDER BY b.lastModified DESC LIMIT 1 | |||
) AS parent, | |||
(SELECT b.title | |||
FROM moz_bookmarks b | |||
JOIN moz_bookmarks t ON t.id = b.parent AND t.parent != ?1 | |||
WHERE b.type = nsINavBookmarksService::TYPE_BOOKMARK AND | |||
b.fk = h.id | |||
ORDER BY b.lastModified DESC LIMIT 1 | |||
) AS bookmark, | |||
(SELECT GROUP_CONCAT(t.title, ',') | |||
FROM moz_bookmarks b | |||
JOIN moz_bookmarks t ON t.id = b.parent AND t.parent = ?1 | |||
WHERE b.type = nsINavBookmarksService::TYPE_BOOKMARK AND | |||
b.fk = h.id | |||
) AS tags, | |||
h.visit_count, h.typed, h.frecency | |||
FROM moz_places_temp h | |||
LEFT OUTER JOIN moz_favicons f ON f.id = h.favicon_id | |||
WHERE h.frecency <> 0 | |||
{ADDITIONAL_CONDITIONS} | |||
UNION ALL | |||
SELECT h.url, | |||
h.title, | |||
f.url, | |||
(SELECT b.parent | |||
FROM moz_bookmarks b | |||
JOIN moz_bookmarks t ON t.id = b.parent AND t.parent != ?1 | |||
WHERE b.type = nsINavBookmarksService::TYPE_BOOKMARK AND | |||
b.fk = h.id | |||
ORDER BY b.lastModified DESC LIMIT 1 | |||
) AS parent, | |||
(SELECT b.title | |||
FROM moz_bookmarks b | |||
JOIN moz_bookmarks t ON t.id = b.parent AND t.parent != ?1 | |||
WHERE b.type = nsINavBookmarksService::TYPE_BOOKMARK AND | |||
b.fk = h.id | |||
ORDER BY b.lastModified DESC LIMIT 1 | |||
) AS bookmark, | |||
(SELECT GROUP_CONCAT(t.title, ',') | |||
FROM moz_bookmarks b | |||
JOIN moz_bookmarks t ON t.id = b.parent AND t.parent = ?1 | |||
WHERE b.type = nsINavBookmarksService::TYPE_BOOKMARK AND | |||
b.fk = h.id | |||
) AS tags, | |||
h.visit_count, | |||
h.typed, | |||
h.frecency | |||
FROM moz_places h | |||
LEFT OUTER JOIN moz_favicons f ON f.id = h.favicon_id | |||
WHERE h.id NOT IN (SELECT id FROM moz_places_temp) AND | |||
h.frecency <> 0 | |||
{ADDITIONAL_CONDITIONS} | |||
-- ORDER BY h.frecency, the 9th column | |||
ORDER BY 9 DESC LIMIT ?2 OFFSET ?3); | |||
</pre> | |||
mDBAdaptiveQuery: | |||
<pre> | |||
SELECT IFNULL(h_t.url, h.url), | |||
IFNULL(h_t.title, h.title), | |||
f.url, | |||
(SELECT b.parent | |||
FROM moz_bookmarks b | |||
JOIN moz_bookmarks t ON t.id = b.parent AND t.parent != ?1 | |||
WHERE b.type = nsINavBookmarksService::TYPE_BOOKMARK AND | |||
b.fk = h.id | |||
ORDER BY b.lastModified DESC LIMIT 1 | |||
) AS parent, | |||
(SELECT b.title | |||
FROM moz_bookmarks b | |||
JOIN moz_bookmarks t ON t.id = b.parent AND t.parent != ?1 | |||
WHERE b.type = nsINavBookmarksService::TYPE_BOOKMARK AND | |||
b.fk = h.id | |||
ORDER BY b.lastModified DESC LIMIT 1 | |||
) AS bookmark, | |||
(SELECT GROUP_CONCAT(t.title, ',') | |||
FROM moz_bookmarks b | |||
JOIN moz_bookmarks t ON t.id = b.parent AND t.parent = ?1 | |||
WHERE b.type = nsINavBookmarksService::TYPE_BOOKMARK AND | |||
b.fk = h.id | |||
) AS tags, | |||
IFNULL(h_t.visit_count, h.visit_count), | |||
IFNULL(h_t.typed, h.typed), | |||
rank | |||
FROM (SELECT ROUND( | |||
MAX( | |||
((i.input = ?2) + (SUBSTR(i.input, 1, LENGTH(?2)) = ?2)) * | |||
i.use_count | |||
), 1 | |||
) AS rank, | |||
place_id | |||
FROM moz_inputhistory i | |||
GROUP BY i.place_id HAVING rank > 0 | |||
) AS i | |||
LEFT JOIN moz_places h ON h.id = i.place_id | |||
LEFT JOIN moz_places_temp h_t ON h_t.id = i.place_id | |||
LEFT JOIN moz_favicons f ON f.id = IFNULL(h_t.favicon_id, h.favicon_id) | |||
WHERE IFNULL(h_t.url, h.url) NOTNULL | |||
ORDER BY rank DESC, IFNULL(h_t.frecency, h.frecency) DESC | |||
</pre> | |||
mDBKeywordQuery: | |||
<pre> | |||
SELECT IFNULL( | |||
(SELECT REPLACE(url, '%s', ?2) FROM moz_places_temp WHERE id = b.fk), | |||
(SELECT REPLACE(url, '%s', ?2) FROM moz_places WHERE id = b.fk) | |||
) AS search_url, | |||
IFNULL(h_t.title, h.title), | |||
COALESCE( | |||
f.url, | |||
(SELECT f.url FROM moz_places_temp | |||
JOIN moz_favicons f ON f.id = favicon_id | |||
WHERE rev_host = IFNULL( | |||
(SELECT rev_host FROM moz_places_temp WHERE id = b.fk), | |||
(SELECT rev_host FROM moz_places WHERE id = b.fk) | |||
) | |||
ORDER BY frecency DESC LIMIT 1), | |||
(SELECT f.url FROM moz_places | |||
JOIN moz_favicons f ON f.id = favicon_id | |||
WHERE rev_host = IFNULL( | |||
(SELECT rev_host FROM moz_places_temp WHERE id = b.fk), | |||
(SELECT rev_host FROM moz_places WHERE id = b.fk) | |||
) | |||
ORDER BY frecency DESC LIMIT 1) | |||
), | |||
b.parent, | |||
b.title, | |||
NULL, | |||
IFNULL(h_t.visit_count, h.visit_count), | |||
IFNULL(h_t.typed, h.typed) | |||
FROM moz_keywords k | |||
JOIN moz_bookmarks b ON b.keyword_id = k.id | |||
LEFT JOIN moz_places AS h ON h.url = search_url | |||
LEFT JOIN moz_places_temp AS h_t ON h_t.url = search_url | |||
LEFT JOIN moz_favicons f ON f.id = IFNULL(h_t.favicon_id, h.favicon_id) | |||
WHERE LOWER(k.keyword) = LOWER(?1) | |||
ORDER BY IFNULL(h_t.frecency, h.frecency) DESC") | |||
</pre> | |||
AutoComplete is definitely important, but we'd like our database construction scripts/methodology to be general enough to generate places databases for any kind of testing context. | |||
=== Frecency === | |||
* [https://developer.mozilla.org/en/The_Places_frecency_algorithm Algorithm description], though sdwilsh says this may be out of date | |||
* Actual frecency calculation at nsNavHistory::CalculateFrecencyInternal(), http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistory.cpp#7275 | |||
=== Stats we should have collected but did not === | |||
For each data point: | |||
* Distribution of moz_historyvisits.visit_type. This value is one of the nsINavHistoryService.TRANSITION_* constants. | |||
* Distribution of moz_places.typed | |||
* Distribution of moz_places.frecency | |||
* Nested folder stats (ddahl) |