User:Sspitzer/GlobalFrecency: Difference between revisions
No edit summary |
No edit summary |
||
| Line 134: | Line 134: | ||
unvisited bookmarks from previous versions will be hidden until visited, but we want them to show up in ac. for stuff we really want hidden, set frecency to 0, like place: and unvisited livemark items. (is there more?) | unvisited bookmarks from previous versions will be hidden until visited, but we want them to show up in ac. for stuff we really want hidden, set frecency to 0, like place: and unvisited livemark items. (is there more?) | ||
x) | |||
watch out, don't calculate frecency on history import. instead, set to -1 and come back later on idle. | |||
nsNavHistory::AddPageWithVisit(), passing PR_FALSE for aCalculateFrecency to InternalAddNewPage() | |||
'''todo''' | '''todo''' | ||
x) | |||
// XXX | |||
// on import / fx 2 migration, is the frecency work going to slow us down? | |||
// find out if we import history before or after bookmarks for fx 2 | |||
// we might want to skip this stuff, as well as the frecency work | |||
// caused by GetUrlIdFor() which calls InternalAddNewPage() | |||
// if we do skip this, after import, we will // need to call FixInvalidFrecenciesForExcludedPlaces() | |||
// we might need to call it anyways, if items aren't properly annotated | |||
// as livemarks feeds yet. | |||
less of a deal as typicall bookmarks are much fewer than history? not sure. | |||
x) | x) | ||
Revision as of 00:15, 4 January 2008
notes
1)
unvisited bookmarks are no longer hidden, but unvisited livemark items are. this happens in nsNavBookmarks::InsertBookmark(), where we call UpdateFrecencyAndHiddenForBookmark(), but only if the parent is not a livemark
2)
when you remove a bookmark, should the frecency change for that place? yes, it should, as we gave weight to bookmarks. RemoveItem() calls UpdateBookmarkHashOnRemove() with the place id, so we do this work there. Note, if you remove a folder, we will eventually call RemoveItem(), so that's how this works with removing folders.
3)
spin off bug:
// UpdateBookmarkHashOnRemove() does a sanity check using // IsBookmarkedInDatabase(), so it might not have actually // removed the bookmark. should we have a boolean out param // for if we actually removed it, and use that to decide if we call // UpdateFrecency() and the rest of this code?
4)
on calculating frecency, if we don't know ahead of time, we need to check that for a place id, if it is bookmarked, but not a livemark. see CalculateFrecency() [not CalculateFrecencyInternal()]
5)
spin off bug:
upon moving a bookmark, update the frecency. for unvisited bookmarks, this bumps the frecency, because we use PR_Now() as the pseudo visit, which is desired, as we're expressing interest. For visited bookmarks, it may age the bookmark and lower frecency. Is that desired? for now, punt on this and for annoations for updating frecencies.
5)
issue: if when autocompleteing, we will prefer moz_place title over bookmark title, and so when showing match, we will show that one. because of the fix for bug #407292 – When adding a bookmark with no title, we should use the uri as the title, you might get the uri as the title if it matches the user text, as we prefer it.
6)
spin off bug:
we recalc twice on idle, one for invalid and one for old frecencies. need marco's last_visit_date change to make old work right, or do a subquery until then. right now, we are going to recalc the same places over and over. (if we do a sub query, watch out for performance)
recalculating for invalid (frecency == -1) is for firefox 2 / 3b2 -> b3
recalculating for old frecencies is for the ebay problem (https://bugzilla.mozilla.org/show_bug.cgi?id=394038#c19)
to solve the ebay problem, we should also recalc the moz_places with high frecency and "old" last visit date. (In another patch, marco has added last visit date to moz_places, which will make the query we need more efficient as we don't have to join against moz_historyvisits)
If something has a recent last visit date, then we recently recalculated frecency, so the frecency would be accurate.
If something has a low frecency and an old last visit date, recalculating would only lower the frecency (and low frecency wouldn't be hurting our autocomplete results.)
But if something has a high frecency and an old last visit date, recalculating the frecency would lower the frecency, which would solve the ebay problem.
7)
when updating places with frecency of -1, we call CalculateFrecency(), which will check if the place is a livemark item (only), and if so, use PR_FALSE for is bookmarked. we need to do all this so that upon calculating frecencies for previously hidden livemark items, we don't unhide them. (They should come back with frecency of zero if they were never visited.)
8)
in previous versions of the patch, I had hidden <> 1 in the check for invalid frecencies. we don't need to (nor want to) limit recalculations to non hidden. what we currently do is look for anything with frecency of -1, and recalc (on idle). if frecency goes to > 0, we can unhide it. this will cause old, hidden bookmarks to be unhidden and start showing up in url bar autocomplete, yet keep hidden, unvisited livemark items hidden.
note, we have had issues in the past where unvisited bookmarks are hidden. (see bug #369887)
with a new bookmark, we create a moz_place for it, with hidden = 1. if you were to visit that bookmark, we would set hidden to 0, even if you later cleared all visits.
for new bookmarks, we will be calculating the frecency (and unhiding) when we call InsertBookmark()
9)
I've made prefs for everything, so we can easily tune it. note, for download, and redirect transitions, and 0 (undefined, see bug #xxxxxx) the weight is 0. (will anything bad happen if we go negative? I don't think so.)
10)
when do we recalc frecencies for places I don't revisit? on expiration, but that could be a long time. we also recalc on idle (after first bucket days), ordered by high frececeny.
11)
like unvisited livemark items, place: queries should not show up in autocomplete results. using IsQueryURI(), I check if the place url is place:..., and if so, pass in false for isBookmark to CalculateFrecency() so that they stay hidden until visited (or in the case of place: urls, always.) because the frecency is zero (passing in PR_TRUE for isBookmark would result in non-zero frecency), we will also no recalculate frecency on idle. only upon visit.
12) on clear all data, resetting frecency to -1
13) on expiration, I reset frecency to -1
on expiration (EraseVisits()), for every place we deleted visits, visit_count and frecency is reset.
in the code that recalcs frecency, if old frecency is -1, we recalc visit count and update it.
14) on delete visits, see nsNavHistory::RemovePage() and nsNavHistory::RemovePagesFromHost(), set frecency = -1
15)
add frecency <> 0 to ac queries because...
16)
add code so that for all non place: and unvisisted children of livemarks we set frecency to 0, so that we can use it after migration, clear all private data, etc
17)
are there any undesirable side effects of clearing hidden after recalculating frecency? I don't think so, as visiting a bm would unhide it, and then you could always clear private data or remove the visit to have the same effect.
18) note, bookmarks also no in open location and pref page autocomplete (all history autocomplete)
19) after v7 schema migration, run fix it method so place: and unvisited livemark items don't show up, as our migration sets frecency to -1
20)
for the code that resets to -1, better to fix existing code skip over place we will reset to 0, or do what I'm doing and query and reset to 0. note, need the fix it code for after migration, so this keeps it simple. need to verify if my query is a perf issue with lots of history. (less concerned after marco's index fix, however.)
21)
for the ac queries, sort by freceny, then typed, then visit_count because we might not have frecency. in the case of 3b2 migration or clear all private data, lots of places have frecency = -1 (until idle), so we will have lots of frecency ties, so use typed and visit_count to break the ties and provide better results.
22)
should we add "AND h.frecency <> 0" to our autocomplete queries?
yes, because take the following scenario:
a) an unvisited livemark items get frecency of 0, since not a bookmark b) visit that livemark item, frecency of non 0, shows up in ac, no longer hidden since frecency is non 0. c) clear history, livemark item no longer hidden, would should up in ac until we recalc on idle d) I added code FixInvalidFrecenciesForExcludedPlaces() so that we will reset place: and unvisited livemark item frecency to 0 after clear history, expire, etc to solve this problem.
what about migration or first run? as long as we call FixInvalidFrecenciesForExcludedPlaces() we are good.
why <> 0 and not > 0. We want invalid frecencies to show up in ac, as those are bookmarks, migrated places, places after clear all private data, etc before we got a chance to recalculate.
23)
from the ac queries, why remove hidden <> 1
unvisited bookmarks from previous versions will be hidden until visited, but we want them to show up in ac. for stuff we really want hidden, set frecency to 0, like place: and unvisited livemark items. (is there more?)
x)
watch out, don't calculate frecency on history import. instead, set to -1 and come back later on idle.
nsNavHistory::AddPageWithVisit(), passing PR_FALSE for aCalculateFrecency to InternalAddNewPage()
todo
x)
// XXX // on import / fx 2 migration, is the frecency work going to slow us down? // find out if we import history before or after bookmarks for fx 2 // we might want to skip this stuff, as well as the frecency work // caused by GetUrlIdFor() which calls InternalAddNewPage() // if we do skip this, after import, we will // need to call FixInvalidFrecenciesForExcludedPlaces() // we might need to call it anyways, if items aren't properly annotated // as livemarks feeds yet.
less of a deal as typicall bookmarks are much fewer than history? not sure.
x)
what about the other occurrences of hidden <> 1 in nsNavHistory.cpp? see comment #23 about about why that is bogus. use frecency <> 0?
x)
should we get rid of the hidden column?
x)
for calc frecncy, never calc 0 unless place: or unvisit livemark item, so do 1.
x)
for the code that resets to -1, better to skip over stuff, or fix it afterwards?
x)
everything gets frecency of -1, until we visit something or recalc on idle. for fx2 migration, when we import history, we can recalc (some?) at the end. for fx3b2 migration, we can recalc some at the end.
and then wait for idle?
on clear all data (and bm import or restore? should we do a quick recalc of bookmarks? what if they have a lot of bookmarks? should part of our work on idle be to recalc bookmarks?
x)
when calc frecncy, never calc 0 unless place: or unvisit livemark item, so do 1.
x)
note, not clearing visit count, so what does that mean for bug #394133
in the places organizer and history sidebar ui, visit count is lazily corrected (except for place: and for unvisited children of livemarks, because we use visit count in our query.
note, untrusted visit count is used in other places, like the smart bookmarks menu
x)
for visit count re-calculation, visit_type NOT in (0,4), is that correct?
x)
// should we add "AND visit_type NOT IN (0,4)" // XXX aren't embed's affecting visit count already in ::AddVisit()? // I think we have a bug about not letting them do that. // XXX note, should we fix AddVisit() to recalc visit count by using // CalculateVisitCount()?
x)
// should we recalc frecency for all non-livemark item, non place: query // bookmark items? What if there are a lot? Better to let it happen on idle? // if we add a frecency > 0 check to autocomplete, nothing will show up // we could have visited a livemark item, cleared history, which marked it // as non hidden, then, until recalculation on idle, it wills how up // in autocomplete. we could recalc frecency for (or just make 0) // all place: uri and livemark items, and then do frecency <> 0 // and keep hidden <> 1. this way, we will show everything in ac // after clear all data that we've got, except for place: and livemark items
x)
2 -> 3 migration, frecency = -1, but do we have visit counts? force place: and unvisited livemarks to be zero, do one on idle, wait for rest? 3b2 - > 3 migraiton, frecency = -1, but we do have visit counts. force place: and unvisited livemarks to be zero, do on idle, wait for rest. clear all private data, frecency = -1, but we do have visit counts. force place: and unvisited livemarks to be zero, do on idle, wait for rest.
small area: partial expiration, frecency = -1 for a few places, but we do have visit counts. don't think we need to force place and unvisited livemarks, (maybe we do for unvisit livemaks) , do on idle, wait for rest.
small, but could be big: manual delete places, , but we do have visit counts. force place: and livemarks to be zero, do on idle, wait for rest.
..or whereever we end up with frecency = -1
on all of these, fire the idle timer / invalid frecency
don't reset visit count, use that for determining which places to do recalc first, but always need to recompute visitcount before recalc frecency
x)
do forced "on idle" work in "on visit"?
x)
verify we check oldVisitCount and frecency = -1 everywhere we should be.
x)
MigrateV7Up() may need to call "rv = prefs->SetIntPref(PREF_BROWSER_LEFTPANEFOLDERID, -1);"
covered by another bug, but we need to fix it before landing, otherwise going back and forth between versions will hork the organizer.
x)
possible optimization, if not bookmark and no visit count, bail out early from frequency calculation?
x)
rename the isBookmark arg to isNonHiddenBookmark in the CalculateFrecency() methods, as it's really for non place:, non univisited livemark items?
x)
make sure that we aren't doing too much work when a livemark gets updated, due to removing from hash
x)
first time migration delay: from adding column with -1 to moz_places (acceptable, not much we can do)
x)
determine if we really need the index on frecency
x)
first time, all bogus (need to take the top <x> and recalc), on migration and import from fx 2
x)
decide if we should recalc frecency for bookmarks (remaining places?) after clear all data, or just wait until idle
x)
make sure not too much "on idle" cpu usage
x)
determine if we should do more (or less) invalid / old calc on idle
x)
pref for autocomplete chunk size and timeout (AUTOCOMPLETE_SEARCH_CHUNK_SIZE and AUTOCOMPLETE_SEARCH_TIMEOUT?)
x)
make sure we stop searching once we hit max results, or do we not know if we have hit max results in the backend?
x)
should url bar drop down be frecency or visit date? (keep visit date, so I need to fix my code to be the old way, but ask ux if they like by frecency?)
x)
we do need to recalculate frecency on history import?
x)
recalc on bm import / restore?
x)
XXX when removing duplicate ids, should we be resetting recency to -1?