Places/StatusMeetings/2006-10-26 and 2006-10-27

From MozillaWiki


Places meeting: 2006-10-26 4pm PST

sspitzer       thunder / myk / dietrich / et al:  meeting time?
thunder        hola
dietrich       howdy all
myk            sspitzer: yes, i'm here
dietrich       sspitzer: i'll try out the new patch on bug 356487 asap
dietrich       anyone have a chance to check out the ERD i sent out?
sspitzer       dietrich:  yes
thunder        yeah
sspitzer       I had a question
myk            dietrich: one more naming suggestion is to call the moz_anno_names table moz_attributes, since the moz_annos table is an entity-attribute-value (EAV) table, and moz_anno_names stores the attributes that the moz_annos table uses
dietrich       myk: moz_attributes sounds kinda generic.. moz_anno_names is specific to annotations
dietrich       moz_anno_attributes?
myk            dietrich: sure
sspitzer       my question was also about moz_annos
sspitzer       and this is not new to your ERD, but:
sspitzer       are moz_annos required to have a moz_anno_name?
sspitzer       can we have unnamed annotations?
sspitzer       I think it should be a "may have a " relationship
myk            dietrich: i generally prefer one-word names, but perhaps expressiveness is better than brevity in this case
sspitzer       instead of a "has a"
myk            sspitzer: they are required to have a name
dietrich       myk: we could remove the moz_* prefix from all the tables
dietrich       sspitzer: i'm not sure what the use case for un-typed annotations is
myk            dietrich: yeah, i've thought about that as well, but it makes some sense to have it given that sqlite doesn't support namespaces or cross-db queries
sspitzer       dietrich:  here's my use case
myk            dietrich: the use case i usually consider is that multiple extensions want to store some related data in the database; we don't want them to create colliding tables (neither colliding with us nor with each other)...
dietrich       myk: yeah, prefixing to avoid collisions makes sense
myk            dietrich: large "enterprise" RDBMSes use namespaces to accomplish this; medium-scale RDBMSes like MySQL allow you to join across databases, effectively making databases be namespaces
myk            dietrich: incidentally, i dislike prefixes too and think they are often used unnecessarily, but in this case i think they make sense
dietrich       myk: yeah, the lack of cross-db queries means extensions are likely to add to our db as opposed to creating their own
dietrich       ugh
myk            dietrich: yeah :-)
sspitzer       well, I was thinking about extension developers, who might use annotations in ways other than tagging (which is how I have been thinking about them, mostly).  in ways where all annotations are of the same type, but to play nice with other extensions and our own code, they might need a moz_anno_name, like "extension_xyz", so maybe my use case is wrong.
dietrich       sspitzer: yeah, anno_names are basically being used as a loose typing system
sspitzer       ok, then ignore my question.
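For readers less familiar with the entity-attribute-value (EAV) layout myk describes, here is a minimal sketch using Python's built-in sqlite3 module. The table names follow the discussion (moz_annos, plus moz_anno_attributes as the proposed rename of moz_anno_names), but the columns and the set_anno/get_annos helpers are hypothetical simplifications, not the actual Places schema or API. The NOT NULL on anno_attribute_id reflects the point that every annotation must have a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
-- One row per annotation name: the "attribute" in EAV terms.
CREATE TABLE moz_anno_attributes (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- Entity (place_id), attribute (anno_attribute_id), value (content).
CREATE TABLE moz_annos (
    id INTEGER PRIMARY KEY,
    place_id INTEGER NOT NULL REFERENCES moz_places(id),
    anno_attribute_id INTEGER NOT NULL REFERENCES moz_anno_attributes(id),
    content TEXT
);
""")


def set_anno(conn, url, name, value):
    """Store one annotation, creating the place and attribute rows if needed (hypothetical helper)."""
    conn.execute("INSERT OR IGNORE INTO moz_places (url) VALUES (?)", (url,))
    conn.execute("INSERT OR IGNORE INTO moz_anno_attributes (name) VALUES (?)", (name,))
    conn.execute(
        """INSERT INTO moz_annos (place_id, anno_attribute_id, content)
           SELECT p.id, a.id, ? FROM moz_places p, moz_anno_attributes a
           WHERE p.url = ? AND a.name = ?""",
        (value, url, name),
    )


def get_annos(conn, url):
    """Return {annotation name: value} for every annotation on url (hypothetical helper)."""
    rows = conn.execute(
        """SELECT a.name, n.content
           FROM moz_annos n
           JOIN moz_anno_attributes a ON a.id = n.anno_attribute_id
           JOIN moz_places p ON p.id = n.place_id
           WHERE p.url = ?""",
        (url,),
    )
    return dict(rows)


set_anno(conn, "http://example.com/", "example/rating", "5")
set_anno(conn, "http://example.com/", "example/note", "read later")
print(get_annos(conn, "http://example.com/"))
```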
myk            dietrich: hmm, actually it looks like i'm wrong; sqlite does support joining across databases using the ATTACH DATABASE syntax
myk            although "There is a compile-time limit of 10 attached database files."
thunder        it must be fairly inefficient if they do that
myk            (http://www.sqlite.org/lang_attach.html)
thunder        s/do that/have that restriction/
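A minimal sketch of the ATTACH DATABASE approach, assuming an extension keeps its data in its own file and joins against Places only when it needs to. The file name extension_xyz.sqlite, the ratings table, and the simplified moz_places columns are hypothetical. Each attached file counts against the compile-time attach limit myk quotes.

```python
import os
import sqlite3
import tempfile

# Hypothetical extension database living in its own file.
ext_path = os.path.join(tempfile.mkdtemp(), "extension_xyz.sqlite")
ext = sqlite3.connect(ext_path)
ext.execute("CREATE TABLE ratings (url TEXT PRIMARY KEY, stars INTEGER)")
ext.execute("INSERT INTO ratings VALUES ('http://example.com/', 4)")
ext.commit()
ext.close()

# Main (Places-like) database, heavily simplified.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT UNIQUE, title TEXT)")
conn.execute("INSERT INTO moz_places (url, title) VALUES ('http://example.com/', 'Example')")
conn.commit()  # finish the pending write so no transaction is open while attaching

# Attach the extension's file under the schema name "ext" and join across both files.
conn.execute("ATTACH DATABASE ? AS ext", (ext_path,))
rows = conn.execute(
    """SELECT p.title, r.stars
       FROM moz_places p JOIN ext.ratings r ON r.url = p.url"""
).fetchall()
print(rows)
conn.execute("DETACH DATABASE ext")
```

The extension's tables never live in the Places file, so removing the extension can be as simple as deleting its own database.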
sspitzer       speaking of SQLite, I think thunder is working on updating to a new version, one with full text indexing.
thunder        oh; yes
thunder        I have a patch waiting to be reviewed by vlad
thunder        (well, and committed by vlad, since I don't have an account)
dietrich       myk: thx for the link. yeah we should encourage extension devs to use external dbs :)
dietrich       thunder: cool, is there a bug for that?
myk            dietrich: yeah, you're right, we really should; i wonder what it means for our prefixes
thunder        yeah, looking
thunder        just a sec
thunder        341137
thunder        https://bugzilla.mozilla.org/show_bug.cgi?id=341137
myk            sspitzer: brett encouraged third parties to use namespaces in the annotation names to avoid collisions
myk            sspitzer: f.e., in my implementation of microsummaries on top of places, i used microsummary/ as the namespace, f.e. the annotation name for the generator URI is microsummary/generator_uri (or something like that; would have to look at source to recall the exact name)
dietrich       myk: if we can't restrict table creation in our db, then we should keep the prefixes
sspitzer       myk:  ok, that makes sense.  thanks for the background info.
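A sketch of the naming convention brett suggested, assuming annotation names of the form namespace/name (myk recalls microsummary/generator_uri but notes the exact name may differ). The namespaced and names_in_namespace helpers are hypothetical. One design note: the prefix lookup uses a range comparison instead of LIKE, because _ is a LIKE wildcard, so LIKE 'extension_xyz/%' would also match a name such as extensionAxyz/other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moz_anno_attributes (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")


def namespaced(namespace, name):
    """Build a '<namespace>/<name>' annotation name (hypothetical helper)."""
    if "/" in namespace:
        raise ValueError("namespace must not contain '/'")
    return f"{namespace}/{name}"


def names_in_namespace(conn, namespace):
    """List names under one namespace via a range scan on the 'namespace/' prefix."""
    lower = namespace + "/"
    upper = namespace + "0"  # '0' is the character right after '/' in ASCII
    rows = conn.execute(
        "SELECT name FROM moz_anno_attributes WHERE name >= ? AND name < ? ORDER BY name",
        (lower, upper),
    )
    return [r[0] for r in rows]


for ns, name in [("microsummary", "generator_uri"),
                 ("extension_xyz", "generator_uri"),
                 ("extension_xyz", "last_checked"),
                 ("extensionAxyz", "other")]:
    conn.execute("INSERT INTO moz_anno_attributes (name) VALUES (?)", (namespaced(ns, name),))

print(names_in_namespace(conn, "extension_xyz"))

# For comparison: LIKE treats '_' as a wildcard, so this also counts 'extensionAxyz/other'.
like_count = conn.execute(
    "SELECT COUNT(*) FROM moz_anno_attributes WHERE name LIKE 'extension_xyz/%'"
).fetchone()[0]
print(like_count)
```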
myk            dietrich: i think i concur, especially given that compile-time limit to the number of other databases one can attach
thunder        dietrich: my patch doesn't enable the text searching stuff
thunder        (yet)
thunder        I thought I'd get this reviewed first, then add the interface to mozStorage
dietrich       fyi: i sent mail to todd agulnick, asking for him to take a look at the proposed changes
dietrich       i'll hit up the yahoo people also
thunder        cool
thunder        um, do we have an agenda of some sort?
dietrich       thunder: i think just to report any progress on the task list, and then any specific issues people want to bring up
thunder        ok,
thunder        I'm in the process of making a couple of new tests for tinderbox
thunder        TpMH (medium history) and TpLH (large history)
thunder        they are basically Tp, but I copy in a history file
thunder        however, I don't have good history files to copy in :)
dietrich       thunder: QA might be able to help getting history files
thunder        I'm going to shoot for ~1MB for the medium and ~10MB for the large
thunder        ah good call
thunder        I'll ask
thunder        other than that, I just need to get this patch checked in and deployed, and we'll have new testing data
thunder        hopefully it'll be useful
dietrich       cool
dietrich       i'm sure it will. far more realistic baselines for testing Tp
thunder        yeah
thunder        other than that, nothing new and exciting to report
thunder        that's the SQLite 3.3.8 update seth mentioned
thunder        I'll add interfaces for the text search stuff when I figure out how to do that :)
dietrich       anyone else have anything to discuss?
dietrich       the only other thing i wanted to mention is that we should be thinking about where we want UI to go after we get Fx2 parity
dietrich       eg, considering stuff here: http://wiki.mozilla.org/Firefox/Feature_Brainstorming:Bookmarks
thunder        yeah.
dietrich       as well as adding any ideas to that list
thunder        that is where the exciting stuff starts :-)
dietrich       exactly!
thunder        I want cheese-based bookmarks
dietrich       mmmm
thunder        but we should offer a lactose-free version
dietrich       on that note: meeting adjourned :)
thunder        um, with full text search in the db, we should have smart bookmark folders
thunder        haha :)
dietrich       thunder: yes, that'd be v. cool
dietrich       hey i don't think that's on the list
thunder        really
*              thunder adds
*              myk votes for cheese-based bookmarks
thunder        hooray!
thunder        okay, it has been totally added
sspitzer       I don't have much to report, except a new patch (that addresses the comments from dietrich).
thunder        woo
thunder        commit!
thunder        :-P

Places meeting: 2006-10-27 2pm PST

sspitzer       dietrich / thunder / myk / et al
dietrich       hi
sspitzer       dietrich:  first, thanks for the tip about the storage inspector.
sspitzer       that's a very useful extension
dietrich       yah
sspitzer       https://addons.mozilla.org/firefox/3072/
sspitzer       so, a couple questions that relate to the ERD
sspitzer       http://wiki.mozilla.org/Places:BookmarksComments#Places_ERD
sspitzer       we have title and user_title in the moz_places table (moz_history if you use the storage inspector on your trunk with places enabled "as is" today)
sspitzer       those are usually the same, right?
sspitzer       can we do a sort of copy-on-write trickery, where we don't store it twice if they are the same?
sspitzer       when we do our query, we ask for both columns, and if user_title is null, we use title?
sspitzer       or, is there a reason we do what we are doing?
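sspitzer's proposal (store user_title only when it differs from the page title, and fall back to title at query time) maps directly onto SQL's COALESCE. A minimal sketch with a simplified, hypothetical moz_places table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE moz_places (
           id INTEGER PRIMARY KEY,
           url TEXT UNIQUE NOT NULL,
           title TEXT,       -- title taken from the page itself
           user_title TEXT   -- NULL unless the user renamed it
       )"""
)
conn.executemany(
    "INSERT INTO moz_places (url, title, user_title) VALUES (?, ?, ?)",
    [
        ("http://a.example/", "Page A", None),    # common case: the title is stored once
        ("http://b.example/", "Page B", "My B"),  # user-chosen title overrides the page title
    ],
)

# COALESCE returns its first non-NULL argument.
rows = conn.execute(
    "SELECT url, COALESCE(user_title, title) AS display_title FROM moz_places ORDER BY url"
).fetchall()
print(rows)
```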
dietrich       sspitzer: iirc user_title is basically for bookmarks
dietrich       title retains the original title from history
dietrich       say you bookmark a page and title the bookmark differently, it populates user_title
dietrich       user_title may be obsoleted by the schema changes actually
sspitzer       how is that?
dietrich       my opinion is that the title is one of the properties of a bookmark most likely to be changed, and to be represented in multiple places
dietrich       and therefore should be tied to the bookmark, not to the history entry
dietrich       i think that user_title should be removed, and a title property added to moz_bookmarks
sspitzer       ok, I follow you.  I thought you were saying it has been changed in your new erd
sspitzer       you are saying that we should.
dietrich       right
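A sketch of dietrich's suggestion, assuming a nullable title column on moz_bookmarks (the column layout is hypothetical, not the actual ERD). Because bookmarks are no longer unique per URI, two bookmarks of the same page can carry different titles while the page's own title stays in moz_places:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL, title TEXT);
CREATE TABLE moz_bookmarks (
    id INTEGER PRIMARY KEY,
    place_id INTEGER NOT NULL REFERENCES moz_places(id),
    title TEXT  -- per-bookmark title; NULL means "use the page title"
);
INSERT INTO moz_places (id, url, title) VALUES (1, 'http://example.com/', 'Example Domain');
-- Two bookmarks of the same page, e.g. in different folders.
INSERT INTO moz_bookmarks (place_id, title) VALUES (1, 'Example (work folder)');
INSERT INTO moz_bookmarks (place_id, title) VALUES (1, NULL);
""")

rows = conn.execute(
    """SELECT b.id, COALESCE(b.title, p.title)
       FROM moz_bookmarks b JOIN moz_places p ON p.id = b.place_id
       ORDER BY b.id"""
).fetchall()
print(rows)
```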
sspitzer       I agree, but this leads me into my next question, which is really something mconnor asked me (and he might have asked you already, too)
thunder        dietrich: agreed
thunder        I think
dietrich       that's basically what i say in the wiki, but i forgot to change that in the diagram
thunder        nod
sspitzer       why are bookmarks and history in the same data model?  Could we have two databases, and when we need to, join across them?
dietrich       sspitzer: since history is unique on URI and bookmarks isn't anymore, that's a possibility
dietrich       however,
thunder        I was thinking about that too
dietrich       there might be performance repercussions of loading the 2 files separately
myk            dietrich: putting user_title into moz_bookmarks seems like the right thing to do
dietrich       i think that there are close ties between history and annos, bookmarks and annos, but not as close an association between bookmarks and history
dietrich       wrt how they're used in practice
myk            dietrich: since it's a property of the bookmark, not the place itself
dietrich       myk: yep
dietrich       sspitzer: i think you bring up a larger question: what does a tight integration between history and bookmarks in the data model buy us?
myk            sspitzer: it's a good question; one might also ask the corollary question, however: why ever have more than one database, no matter how many different kinds of data we start storing in sqlite?
sspitzer       for myks question, do we want extensions to be using our database?
sspitzer       not that I have a problem with extension authors, but it seems like they could impact the browser, even after the extension is removed.
sspitzer       I was thinking (maybe naively) that having one giant database with everything could be costly on startup.
sspitzer       at yesterday's 4pm meeting, we chatted here about table prefixes
sspitzer       about why we were doing moz_*
sspitzer       about my naive thinking, I don't know if sqlite is designed for one db with lots of tables, or if it's better to have multiple dbs.  we already have:
myk            sspitzer: that may be true, since databases are represented as a single file; on the other hand, perhaps sqlite just loads a small portion of that file on startup; i don't know
sspitzer       myk:  me neither
sspitzer       we already have:
sspitzer       bookmarks_history.sqlite        urlclassifier.sqlite
sspitzer       formhistory.sqlite              urlclassifier2.sqlite
sspitzer       search.sqlite
sspitzer       each of those is a separate db on disk.
sspitzer       dietrich asked:
sspitzer       what does a tight integration between history and bookmarks in the data model buy us?
dietrich       i think that the page URI singleton model demands a tight integration between them
myk            sspitzer: as an aside, perhaps we should rename bookmarks_history.sqlite -> places.sqlite
myk            (that is, unless it turns out that we should be sticking everything into a single database)
dietrich       eg: if we remove the place_id FK in moz_bookmarks, what breaks?
dietrich       anything that modifies a bookmark will now be annotating a bookmark URI, not the place URI
thunder        it might make it slower to update the history table when one visits the place
thunder        since you'd have to look up which row it is
thunder        though, probably not
thunder        since that is/could be indexed anyway
dietrich       yeah
thunder        (to look up by uri)
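thunder's point that the per-visit update stays cheap as long as the URI lookup is indexed can be checked with EXPLAIN QUERY PLAN. A minimal sketch with a hypothetical visit_count column and record_visit helper; the exact wording of the plan differs between SQLite versions, but it should name an index rather than a full table scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint makes SQLite build an index on url automatically.
conn.execute(
    """CREATE TABLE moz_places (
           id INTEGER PRIMARY KEY,
           url TEXT UNIQUE NOT NULL,
           visit_count INTEGER NOT NULL DEFAULT 0)"""
)


def record_visit(conn, url):
    """Create the place row on first visit, then bump its counter (hypothetical helper)."""
    conn.execute("INSERT OR IGNORE INTO moz_places (url) VALUES (?)", (url,))
    conn.execute("UPDATE moz_places SET visit_count = visit_count + 1 WHERE url = ?", (url,))


record_visit(conn, "http://example.com/")
record_visit(conn, "http://example.com/")
print(conn.execute("SELECT url, visit_count FROM moz_places").fetchall())

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM moz_places WHERE url = ?", ("http://example.com/",)
).fetchall()
print([row[-1] for row in plan])
```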
sspitzer       does separating them make "clear private data" faster?
sspitzer       if we don't have to worry about bookmarks being lost?
dietrich       sspitzer: possibly
dietrich       however, i think it's important to think about what we want from a functionality perspective first, then optimize on the requirements of those goals
myk            fwiw, i agree with dietrich 
myk            we should avoid premature optimization and instead focus on accurately modeling the concepts with the database schema
thunder        hmm
thunder        sure; but from that perspective
myk            once we have an accurate model, then it makes sense to measure its performance and modify the schema accordingly (adding indexes, denormalizing in exceptional cases, etc.)
thunder        bookmarks is a separate concept from history
thunder        so should live in its own table :)
myk            thunder: right, which is why we represent the two concepts with their own tables
thunder        ah, I thought the argument was geared toward not separating them
dietrich       hence my question: after removing the bookmark singleton approach, is there a reason to keep that place_id FK in the moz_bookmarks table?
thunder        my confusion, sorry
sspitzer       dietrich: thinking...
dietrich       i'm trying to think of use-cases for showing history data for a bookmark
thunder        there are
dietrich       but in those cases, you could join on URI instead of place_id
thunder        but, right
thunder        which is why I think it's not necessary to have the FK there
dietrich       also: say you wanted visitation stats for a bookmark, those would have to be stored against the bookmark URI, not the original URI
thunder        URI?
thunder        or row?
dietrich       thunder: same thing wrt annotations
dietrich       thunder: right now there are place: uris for folders, queries, etc
thunder        it could be useful to know how many times I've been to a site vs how many times I've clicked on this bookmark
thunder        both could be useful, really, depending on context
dietrich       one of my recommendations was to provide URIs for all bookmarks datastore objects, for annotations, etc
myk            dietrich: removing the place_id FK essentially turns the uri column into a FK; i don't see a problem with that offhand
dietrich       even if that URI is something simple like place:bookmark:{PKID}
dietrich       myk: exactly - place_id is redundant at that point
thunder        yeah, it does do that, with the difference that we don't need to maintain it pointing to an actual row
thunder        (right?)
dietrich       yep
myk            dietrich: there's a minor space hit, but i'd say it's insignificant, since users don't generally have too many bookmarks
thunder        so you could have a bookmark with a uri that is not in the history table
dietrich       thunder: i think so
myk            dietrich: one consideration is that joins might be more expensive
thunder        whereas with the FK route it seems bad if it doesn't point anywhere (though, it could just be null, I guess? - but even then you still have to walk the bookmarks when you clear history)
dietrich       well, a bookmark's URI *is* a URI that's not already in history
thunder        er
thunder        no, I mean
dietrich       myk: how so?
thunder        the uri pointed to by a bookmark
myk            dietrich: as we'd be joining on URI string rather than integer ID
dietrich       myk: good point - given that we'd already be indexing the URI cols, i wonder how big that hit would be
myk            dietrich: here's a use case that requires a join from places -> bookmarks: say i wanted to search my history for some site.  in the search results, i'd probably want to know that a particular result is bookmarked rather than being just some random site i visited once
myk            f.e. the search results might include an icon for each result, and the icon for results that are also bookmarks would be different (f.e. a bookmark symbol overlaid over the favicon)
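myk's use case (search history, and mark which results are also bookmarks) can be expressed as a single query over the place_id foreign key. The schema and the search_history helper are a hypothetical simplification:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL, title TEXT);
CREATE TABLE moz_bookmarks (id INTEGER PRIMARY KEY, place_id INTEGER REFERENCES moz_places(id));
CREATE INDEX moz_bookmarks_place ON moz_bookmarks(place_id);
INSERT INTO moz_places VALUES (1, 'http://mozilla.org/', 'Mozilla'),
                              (2, 'http://mozilla.example/', 'Mozilla fan page'),
                              (3, 'http://other.example/', 'Other');
INSERT INTO moz_bookmarks (place_id) VALUES (1);
""")


def search_history(conn, term):
    """Return (url, title, is_bookmarked) for places whose title contains term."""
    return conn.execute(
        """SELECT p.url, p.title,
                  EXISTS (SELECT 1 FROM moz_bookmarks b WHERE b.place_id = p.id) AS is_bookmarked
           FROM moz_places p
           WHERE p.title LIKE '%' || ? || '%'
           ORDER BY p.id""",
        (term,),
    ).fetchall()


print(search_history(conn, "Mozilla"))
```

The front end could then pick the bookmark-overlay icon wherever is_bookmarked is 1, without a second round trip to the bookmark service.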
dietrich       myk: so that was my next question: in use-cases like that, should it be done at the db layer?
myk            dietrich: well, my first thought is that the db layer solution would be simplest, but maybe that's just because i know how to do it
thunder        that is a valid use-case, but we don't know how much slower that would be sans-integer-FK
dietrich       myk: i think it would be faster to implement that use-case at the db layer than it would be to call out to the bookmark service from the front-end
thunder        nor do we know how much faster other operations (e.g. clear history) would be
myk            dietrich: sure; in that case, i suspect that a join from places -> bookmarks would be faster on an integer key than a URI, but i don't know that for sure
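To turn the integer-vs-URI question into something measurable, one can build both join styles against the same data and compare their plans (and, on realistic history files, their timings). A sketch, assuming a test-only moz_bookmarks that carries both place_id and url so the two joins can be compared side by side; the plan text varies by SQLite version, so no output is shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
-- Test-only table holding both an integer FK and a copy of the URL.
CREATE TABLE moz_bookmarks (id INTEGER PRIMARY KEY, place_id INTEGER, url TEXT);
CREATE INDEX bm_place_id ON moz_bookmarks(place_id);
CREATE INDEX bm_url ON moz_bookmarks(url);
""")
for i in range(1000):
    conn.execute("INSERT INTO moz_places (id, url) VALUES (?, ?)", (i, f"http://site{i}.example/"))
for i in range(0, 1000, 10):  # bookmark every tenth place
    conn.execute("INSERT INTO moz_bookmarks (place_id, url) VALUES (?, ?)", (i, f"http://site{i}.example/"))

BY_ID = "SELECT p.url FROM moz_places p JOIN moz_bookmarks b ON b.place_id = p.id ORDER BY p.id"
BY_URL = "SELECT p.url FROM moz_places p JOIN moz_bookmarks b ON b.url = p.url ORDER BY p.id"

# Both join styles must return the same rows before comparing their cost.
assert conn.execute(BY_ID).fetchall() == conn.execute(BY_URL).fetchall()
for label, sql in (("integer FK", BY_ID), ("URI string", BY_URL)):
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(label, plan)
```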
dietrich       any interaction w/ annotations are likely joins on URI cols
dietrich       in the current model, and with the proposed changes
dietrich       and likely to occur more often than the use-case we're discussing
myk            another consideration is that it isn't clear to me that "places" is conceptually the same thing as "history"; i imagine that interesting applications could be enabled by differentiating between the two
dietrich       so that's really an issue we have in the status quo (if it's even an issue)
dietrich       myk: i totally agree. i think that using annotations as an extensibility mechanism is key to that.
myk            dietrich: yes, annotations is important, but we would also need to provide for the possibility that a place in the places table doesn't necessarily represent an item in "history", if "history" means the set of places the user has visited
thunder        hrm
dietrich       myk: i think that the separation of moz_visits and moz_places effectively does that
myk            dietrich: yes, indeed
myk            dietrich: but we seem to have been talking about the places table as if it's history, and i wanted to make sure we were differentiating between them conceptually
dietrich       ok
dietrich       hm, i wonder if clearing history in places clears moz_places entries (ne moz_history), or just entries in moz_visits
dietrich       b/c entries in moz_places are kind of like an implicit visit
sspitzer       dietrich:  didn't brett also tell us recently that the clearing of history happens "in chunks"?
dietrich       sure you can put non-history stuff in there, but history URIs are there because you visited them
dietrich       i guess if there's no moz_visits entry, there's no way to tell how the moz_places entry got there
myk            dietrich: perhaps the solution is to clear entries which are neither bookmarks, nor history, nor have any annotations in the annotations database to indicate that some other code cares about those URIs
sspitzer       looking for his comments...
dietrich       myk: sure that might do it
dietrich       when clearing history, clear moz_visits *and* remove "un-attached" entries in moz_places
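The clearing strategy dietrich and myk converge on (delete the visits, then remove moz_places rows that no bookmark, visit, or annotation still refers to) could look like this. The tables and the clear_history helper are hypothetical simplifications:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
CREATE TABLE moz_visits (id INTEGER PRIMARY KEY, place_id INTEGER NOT NULL, visit_date INTEGER);
CREATE TABLE moz_bookmarks (id INTEGER PRIMARY KEY, place_id INTEGER NOT NULL);
CREATE TABLE moz_annos (id INTEGER PRIMARY KEY, place_id INTEGER NOT NULL, content TEXT);
INSERT INTO moz_places VALUES (1, 'http://visited-only.example/'),
                              (2, 'http://bookmarked.example/'),
                              (3, 'http://annotated.example/');
INSERT INTO moz_visits (place_id, visit_date) VALUES (1, 100), (2, 200), (3, 300);
INSERT INTO moz_bookmarks (place_id) VALUES (2);
INSERT INTO moz_annos (place_id, content) VALUES (3, 'kept by some extension');
""")


def clear_history(conn):
    """Delete all visits, then every place that nothing else refers to."""
    with conn:  # one transaction: either both steps happen or neither does
        conn.execute("DELETE FROM moz_visits")
        # The moz_visits check is a no-op after a full clear, but keeps the
        # orphan sweep correct if only some visits were deleted.
        conn.execute(
            """DELETE FROM moz_places
               WHERE id NOT IN (SELECT place_id FROM moz_bookmarks)
                 AND id NOT IN (SELECT place_id FROM moz_annos)
                 AND id NOT IN (SELECT place_id FROM moz_visits)"""
        )


clear_history(conn)
print(conn.execute("SELECT url FROM moz_places ORDER BY id").fetchall())
```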
myk            dietrich: you mentioned earlier that interactions between places and annotations are likely joins between URI columns, but it looks like your ERD has them joined by ID
sspitzer       "places history expiration happens incrementally as you browse (instead of delaying shutdown)"
dietrich       sspitzer: yeah, i wonder if clearing history manually forces removal from the db, or if it does the expire-over-time thing
dietrich       eg: via cpd
myk            dietrich: also, there's a partial model for expiring annotations; i've previously proposed adding "expire when bookmark goes away" functionality; perhaps there should also be "expire when history cleared"
sspitzer       for more on expiration, see brettw's comments on http://wiki.mozilla.org/Places/StatusMeetings/2006-10-12
sspitzer       he writes:
dietrich       myk: yeah i was mistaken, it's by place_id
dietrich       myk: sure. i think brettw referenced a bug for that
dietrich       maybe u filed it
dietrich       :)
myk            :-)
myk            i probably filed the bookmarks one; if so, i'll add history to it
myk            hope i'm not dumping too many considerations on y'all
dietrich       myk: not at all :)
dietrich       we're still early in the process, so it's great to have more ideas to churn on
dietrich       i think the takeaways wrt the schema, which i need to update the ERD with, are:
sspitzer       dietrich:  looking at http://lxr.mozilla.org/seamonkey/source/toolkit/components/places/src/nsNavHistoryExpire.cpp, clearing history manually doesn't appear to do the expire-over-time thing.
dietrich       - remove place_id FK from bookmarks
dietrich       and rename the db file to moz_places.sqlite?
dietrich       or just places
sspitzer       I think places.sqlite
sspitzer       based on the names of the other .sqlite files
dietrich       k
sspitzer       one more item, in addition to "- remove place_id FK from bookmarks"
sspitzer       what about user_title?
sspitzer       or, did you already have that, but not in the ERD image?
dietrich       sspitzer: right, that also needs to be added to the diagram
sspitzer       any objection to me making today's chat part of yesterday's 4pm chat log, that I'll post on http://wiki.mozilla.org/Places/StatusMeetings?
myk            i'm not so sure removing place_id is actually the right thing to do
dietrich       myk: yeah that one's wobbly
dietrich       i think there aren't any super arguments for keeping it
dietrich       but no args for removing it either
dietrich       which leaves performance, which i think would be worse (just not sure to what degree)
dietrich       i'll leave it in for now
dietrich       however, thunder, this means that you cannot bookmark something that isn't in moz_places :)
sspitzer       dietrich:  then how does it work when I right click and bookmark a link I've never been to?
sspitzer       do we create an entry, and set the visited to null, or some special value?
sspitzer       (in the moz_places table)
dietrich       sspitzer: right now it creates entries in both
dietrich       both moz_history and moz_bookmarks
dietrich       i'm not sure what we do for visits
dietrich       sspitzer: http://lxr.mozilla.org/seamonkey/source/toolkit/components/places/src/nsNavHistory.cpp#868
dietrich       that's what's called when you add a bookmark
dietrich       "create a new hidden, untyped, unvisited entry"
sspitzer       ok, and then the moz_bookmark references that?
dietrich       yep
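A sketch of the flow dietrich describes for bookmarking a never-visited link: look up the URI in moz_places, create a hidden, untyped, unvisited row if it is missing, then point the new moz_bookmarks row at it. This mirrors the comment quoted from nsNavHistory.cpp, but the Python helpers and column names are hypothetical, not the Places C++ API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE moz_places (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE NOT NULL,
    hidden INTEGER NOT NULL DEFAULT 0,
    typed INTEGER NOT NULL DEFAULT 0,
    visit_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE moz_bookmarks (
    id INTEGER PRIMARY KEY,
    place_id INTEGER NOT NULL REFERENCES moz_places(id),
    title TEXT);
""")


def get_or_create_place(conn, url):
    """Return the place id for url, creating a hidden, untyped, unvisited row if none exists."""
    row = conn.execute("SELECT id FROM moz_places WHERE url = ?", (url,)).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO moz_places (url, hidden, typed, visit_count) VALUES (?, 1, 0, 0)", (url,)
    )
    return cur.lastrowid


def add_bookmark(conn, url, title):
    """Create a bookmark that references the (possibly new) place row."""
    with conn:  # commit the place row and the bookmark row together
        place_id = get_or_create_place(conn, url)
        cur = conn.execute(
            "INSERT INTO moz_bookmarks (place_id, title) VALUES (?, ?)", (place_id, title)
        )
        return cur.lastrowid


bookmark_id = add_bookmark(conn, "http://never-visited.example/", "Read later")
print(conn.execute("SELECT url, hidden, typed, visit_count FROM moz_places").fetchall())
```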
sspitzer       ok, thanks for clarifying.