User talk:Mak77

Places Database Splitting

  • Ideas:
    • Split places.sqlite into history.sqlite and bookmarks.sqlite
    • Use a SYNC=OFF connection on history.sqlite
    • Use a SYNC=FULL connection on bookmarks.sqlite
  • Needs:
    • SQLite support for different sync settings on attached databases
    • History backup system
  • Pros:
    • no fsyncs for history
    • faster history update/write queries
    • restore the old "delete history by deleting history file" behaviour
    • 2 smaller dbs instead of a huge one (easier/faster vacuuming)
    • possibility to regenerate the table schema, fixing bogus column names
    • possibility to prepare a better DB schema for future changes
  • Cons:
    • history can be lost in case of an OS crash or power failure
    • migration will require time to move all data
    • if history is lost, we also lose icons, pageAnnos, frecencies, inputhistory
    • deleting a history item would not delete it from history backups
  • Solutions:
    • back up history: hold backup copies of history.sqlite, or save a json backup containing the 100 (or more) most frecent pages
    • a progress bar during migration would probably make this less irritating
    • when doing the backup save all relevant data for the backed-up uris
    • not an easy one, unless the user understands we are restoring an old history

The basic idea is to change the PRIMARY KEY we currently use in the moz_places table. In practice we now have two candidate keys: the real primary key is "id", but "url" is a UNIQUE column that could serve as a primary key too. Making "url" the primary key would let us easily split the bookmarks-related tables into a different database, replacing "place_id" columns with "url" columns and joining where our_table.url = moz_places.url. Ideally we could keep both keys: since adding a url column with an index increases the table size, we could continue joining on "place_id" inside history.sqlite while using "url" inside bookmarks.sqlite.

This change would also make bookmarks/keywords/tags checks against a url much faster, since we would no longer need to join with moz_places (url would also be inside moz_bookmarks).
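A minimal sketch of the url-based join across two databases, using Python's sqlite3 module. The in-memory databases stand in for history.sqlite and bookmarks.sqlite, the table layouts are heavily trimmed, and the "bookmarks" schema name, the index name and the helper functions are assumptions made for illustration only.

```python
import sqlite3

# Sketch only: in-memory databases stand in for history.sqlite (main)
# and bookmarks.sqlite (attached as the hypothetical schema "bookmarks").
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS bookmarks")

# Trimmed-down tables: url is the shared key between the two databases.
conn.execute(
    "CREATE TABLE main.moz_places (id INTEGER PRIMARY KEY, url TEXT UNIQUE, title TEXT)"
)
conn.execute(
    "CREATE TABLE bookmarks.moz_bookmarks (id INTEGER PRIMARY KEY, url TEXT, title TEXT)"
)
conn.execute("CREATE INDEX bookmarks.moz_bookmarks_url ON moz_bookmarks (url)")

conn.execute("INSERT INTO main.moz_places (url, title) VALUES ('http://mozilla.org/', 'Mozilla')")
conn.execute("INSERT INTO main.moz_places (url, title) VALUES ('http://example.com/', 'Example')")
conn.execute(
    "INSERT INTO bookmarks.moz_bookmarks (url, title) "
    "VALUES ('http://mozilla.org/', 'My Mozilla bookmark')"
)


def is_bookmarked(url):
    """Check a url against bookmarks without touching moz_places at all."""
    row = conn.execute(
        "SELECT 1 FROM bookmarks.moz_bookmarks WHERE url = ? LIMIT 1", (url,)
    ).fetchone()
    return row is not None


def bookmarks_with_history():
    """Join across the two databases on url instead of place_id."""
    return conn.execute(
        "SELECT b.title, p.title FROM bookmarks.moz_bookmarks b "
        "JOIN main.moz_places p ON p.url = b.url ORDER BY b.id"
    ).fetchall()
```

The is_bookmarked check only reads bookmarks.sqlite, which is the speedup described above; the cross-database join is needed only when history data is wanted too.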

The migration patch would allow us to regenerate all tables, fixing bogus column names and adding columns we have needed for some time, like last_visit and referer. Regenerating the dbs would also have the effect of a VACUUM, which is surely a nice thing to do at this point.

Tags out of bookmarks table

  • Ideas:
    • move tags to a dedicated table
  • Pros:
    • No data duplication
    • faster tag finding
    • no need to filter out tags from bookmarks queries
    • no bad inserts bugs
    • easier tags rename/merge/split
  • Cons:
    • need to revise all tags related code and tagging service

Tags are currently bookmark folders, so we have to duplicate bookmarks inside them, take care not to add containers/separators inside them, move bookmarks when merging/splitting two tags, and filter them out of all bookmarks queries. Having tags in a separate table with schema (tag, item_id) would make all of this much easier and faster. Why this and not a classic (id, tag)+(id, item_id) approach? Thinking of the merge case, doing UPDATE moz_tags SET tag = "X" where item_id = "Y" would automatically merge or split tags at no cost. The primary key for such a table would be a compound one (tag, item_id), but we will most likely also need single indexes on both columns. Tags should be case-insensitive, so they would always be saved as LOWERCASE in the db.
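A rough sketch of such a (tag, item_id) table in Python's sqlite3, to show what tagging and merging could look like. The index name and the helper functions are hypothetical. The merge here is written as an UPDATE OR IGNORE followed by a DELETE, which is one way (an assumption, not necessarily the intended one) to cope with items that already carry both tags and would otherwise violate the compound primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE moz_tags (
        tag     TEXT    NOT NULL,
        item_id INTEGER NOT NULL,
        PRIMARY KEY (tag, item_id)
    )
    """
)
# The compound key already serves lookups by tag (its leading column),
# so this sketch only adds a separate index for lookups by item_id.
conn.execute("CREATE INDEX moz_tags_item_id ON moz_tags (item_id)")


def tag_item(item_id, tag):
    """Tags are stored lowercase so that they compare case-insensitively."""
    conn.execute(
        "INSERT OR IGNORE INTO moz_tags (tag, item_id) VALUES (?, ?)",
        (tag.lower(), item_id),
    )


def merge_tags(old, new):
    """Fold every item tagged `old` into `new`."""
    old, new = old.lower(), new.lower()
    if old == new:
        return
    # Rows that would collide with an existing (new, item_id) row are skipped...
    conn.execute("UPDATE OR IGNORE moz_tags SET tag = ? WHERE tag = ?", (new, old))
    # ...and those skipped leftovers are then dropped.
    conn.execute("DELETE FROM moz_tags WHERE tag = ?", (old,))


def items_for(tag):
    rows = conn.execute(
        "SELECT item_id FROM moz_tags WHERE tag = ? ORDER BY item_id", (tag.lower(),)
    )
    return [row[0] for row in rows]


tag_item(1, "Mozilla")
tag_item(1, "firefox")
tag_item(2, "Firefox")
tag_item(3, "mozilla")
merge_tags("firefox", "mozilla")
```

After the merge, item 1 (which had both tags) appears only once under "mozilla", and no bookmark rows had to be moved or duplicated.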

Keyword ids out of bookmarks table

  • Ideas:
    • move keyword ids to a dedicated table
  • Pros:
    • No null columns in bookmarks
    • faster keyword finding
  • Cons:
    •  ?

Keywords live in a separate table, but they are linked through a keyword_id column in moz_bookmarks. This column is often null, which is not good since it takes a lot of useless space in the db. We should instead have a (keyword, item_id) table, and no column in moz_bookmarks.
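As an illustration, a (keyword, item_id) table could be queried like this. This is only a sketch in Python's sqlite3: moz_bookmarks is reduced to a few columns (with url stored directly, as proposed earlier on this page), and url_for_keyword is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# moz_bookmarks carries no keyword_id column at all in this layout.
conn.execute("CREATE TABLE moz_bookmarks (id INTEGER PRIMARY KEY, url TEXT, title TEXT)")
conn.execute(
    """
    CREATE TABLE moz_keywords (
        keyword TEXT    NOT NULL,
        item_id INTEGER NOT NULL,
        PRIMARY KEY (keyword, item_id)
    )
    """
)

conn.execute(
    "INSERT INTO moz_bookmarks (id, url, title) "
    "VALUES (1, 'http://example.com/search?q=%s', 'Search')"
)
conn.execute("INSERT INTO moz_bookmarks (id, url, title) VALUES (2, 'http://mozilla.org/', 'Mozilla')")
# Only bookmarks that actually have a keyword get a row here.
conn.execute("INSERT INTO moz_keywords (keyword, item_id) VALUES ('s', 1)")


def url_for_keyword(keyword):
    """Resolve a keyword to the url of the bookmark it points at, or None."""
    row = conn.execute(
        "SELECT b.url FROM moz_keywords k "
        "JOIN moz_bookmarks b ON b.id = k.item_id "
        "WHERE k.keyword = ? LIMIT 1",
        (keyword,),
    ).fetchone()
    return row[0] if row else None
```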


Livemark children out of bookmarks table

  • Ideas:
    • don't save livemark children
    • Convert livemarks to dynamic containers
  • Pros:
    • No need to filter them in queries (autocomplete ones for example)
    • No need to write to disk (or no need to fsync)
  • Cons:
    • breaks extensions that wrongly use the bookmarks service to access them
    • needs new API methods in livemark service for extension developers
    • If held in memory they will not be available offline
    • If held in memory they will have to be reparsed on every new session

Livemark children are added to the bookmarks table, and often removed after a few hours or a day. This is bad because we always have to filter them out of bookmarks queries, and adding/removing them so often causes db fragmentation. It also requires a lot of fsyncs!

Ideally, if we split the databases, we could save them in a dedicated SYNC=OFF table; that would still require writing to disk, however, but would be much faster.

Hold Embed visits in memory tables

  • Ideas:
    • don't sync embed visits to disk
  • Pros:
    • No need to expire them manually
    • faster adding, less sync work
  • Cons:
    • embed visits are discarded on session close instead of after 24h
    • needs temp tables setup from the app start

Embed visits are used only for link coloring, and adding and removing them so often causes database fragmentation. So it would be better not to write them to disk at all.
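A small sketch of the idea in Python's sqlite3: embed visits go to a TEMP table that disappears when the connection closes, while link coloring looks at both tables. The table name moz_embedvisits_temp, the EMBED constant and the helpers are placeholders chosen for this example, and the visits table is cut down to two columns. Note that SQLite may still back temp tables with a temporary file, depending on the temp_store setting.

```python
import sqlite3

EMBED = 4  # placeholder visit type for embedded content, assumed for this sketch

conn = sqlite3.connect(":memory:")  # stands in for the on-disk database
conn.execute(
    "CREATE TABLE moz_historyvisits (id INTEGER PRIMARY KEY, place_id INTEGER, visit_type INTEGER)"
)
# Temp tables are private to the connection and vanish when it is closed,
# so nothing here needs to be expired manually.
conn.execute("CREATE TEMP TABLE moz_embedvisits_temp (place_id INTEGER)")


def add_visit(place_id, visit_type):
    """Route embed visits to the temp table, everything else to the real one."""
    if visit_type == EMBED:
        conn.execute(
            "INSERT INTO temp.moz_embedvisits_temp (place_id) VALUES (?)", (place_id,)
        )
    else:
        conn.execute(
            "INSERT INTO main.moz_historyvisits (place_id, visit_type) VALUES (?, ?)",
            (place_id, visit_type),
        )


def is_visited(place_id):
    """Link coloring checks both the persistent and the session-only visits."""
    row = conn.execute(
        "SELECT 1 FROM main.moz_historyvisits WHERE place_id = ? "
        "UNION ALL "
        "SELECT 1 FROM temp.moz_embedvisits_temp WHERE place_id = ? "
        "LIMIT 1",
        (place_id, place_id),
    ).fetchone()
    return row is not None


add_visit(1, 1)      # a normal visit
add_visit(2, EMBED)  # an embed visit, never written to the main table
```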


Use Places Temp Tables for Private Browsing

  • Ideas:
    • while in private browsing save visits to temp tables
  • Pros:
    • All Places features will work during private browsing
  • Cons:
    • needs temp tables setup from the app start
    • need a UI change to distinguish between real visits and private ones

Currently, when private browsing starts it "freezes" Places, so that no new visits are added. In this state I cannot use the awesomebar, the history menu or the sidebar to see where I was 5 minutes ago or to find a website I have closed. Ideally all Places features should continue working.

The private visits could be shown differently in the primary UI, maybe by putting a special private icon near them, or by changing their color to grey or their background. They would be marked as private in the temp table through a special "private" column, and on sync we would avoid writing "private" visits to disk. On exiting private browsing we would clear all private items from the temp tables (so they would disappear from the UI).

What if the user bookmarks a site? Most likely we should mark the site as no longer private and save its history (this would be easy: change the private value before adding the bookmark).
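The flow described above could look roughly like the following sketch in Python's sqlite3. The table name moz_historyvisits_temp, the reduced column set and all helper functions are assumptions made for this example; only the "private" column comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the on-disk database
conn.execute("CREATE TABLE moz_historyvisits (url TEXT, visit_date INTEGER)")
conn.execute(
    "CREATE TEMP TABLE moz_historyvisits_temp "
    "(url TEXT, visit_date INTEGER, private INTEGER NOT NULL DEFAULT 0)"
)


def add_visit(url, visit_date, private=False):
    """All visits land in the temp table first; private ones are flagged."""
    conn.execute(
        "INSERT INTO temp.moz_historyvisits_temp (url, visit_date, private) VALUES (?, ?, ?)",
        (url, visit_date, int(private)),
    )


def sync_to_disk():
    """Move only non-private visits to the persistent table."""
    conn.execute(
        "INSERT INTO main.moz_historyvisits (url, visit_date) "
        "SELECT url, visit_date FROM temp.moz_historyvisits_temp WHERE private = 0"
    )
    conn.execute("DELETE FROM temp.moz_historyvisits_temp WHERE private = 0")


def on_bookmark_added(url):
    """Bookmarking a site clears the private flag, so its history gets saved."""
    conn.execute("UPDATE temp.moz_historyvisits_temp SET private = 0 WHERE url = ?", (url,))


def exit_private_browsing():
    """Private visits simply vanish, from the temp table and hence from the UI."""
    conn.execute("DELETE FROM temp.moz_historyvisits_temp WHERE private = 1")


def visible_history():
    """What the UI would list: (url, visit_date, private) across both tables."""
    return conn.execute(
        "SELECT url, visit_date, 0 AS private FROM main.moz_historyvisits "
        "UNION ALL "
        "SELECT url, visit_date, private FROM temp.moz_historyvisits_temp "
        "ORDER BY 2"
    ).fetchall()


add_visit("http://a.example/", 1)
add_visit("http://b.example/", 2, private=True)
add_visit("http://c.example/", 3, private=True)
on_bookmark_added("http://c.example/")
sync_to_disk()
during_private = visible_history()
exit_private_browsing()
after_private = visible_history()
```

While private browsing is active, the UI query still returns the private visit (with its flag set, so it can be drawn differently); after exit only the normal and the bookmarked visits remain.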


Use a Preordered Nested Tree table schema for bookmarks

  • Ideas:
    • change bookmarks table to a preordered nested tree schema
  • Needs:
    • cleaner table (no tags, no livemarks children, no keywords)
  • Pros:
    • non recursive queries to get ancestors/descendants
    • fast children count
    • not possible to create cycles between folders
  • Cons:
    • needs 2 new columns
    • inserts require updating more than one item

Nested trees are a good way to manage a hierarchy in SQL: you can most likely select anything with one query, and no recursion is needed at all. So with a single query you can know whether a node is a descendant or an ancestor of another one.

On the other hand, when inserting a new node we will have to update other nodes too, so inserts could be a bit slower.
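To make the trade-off concrete, here is a minimal nested-set sketch in Python's sqlite3. The two new columns are called lft and rgt here, and the helper names are likewise assumptions; the table is reduced to what the example needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moz_bookmarks "
    "(id INTEGER PRIMARY KEY, title TEXT, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
)
conn.execute("INSERT INTO moz_bookmarks (id, title, lft, rgt) VALUES (1, 'root', 1, 2)")


def add_child(parent_id, title):
    """Insert a new last child; every boundary at or after the parent's
    right edge has to shift by 2, which is the cost paid on inserts."""
    (parent_rgt,) = conn.execute(
        "SELECT rgt FROM moz_bookmarks WHERE id = ?", (parent_id,)
    ).fetchone()
    conn.execute("UPDATE moz_bookmarks SET rgt = rgt + 2 WHERE rgt >= ?", (parent_rgt,))
    conn.execute("UPDATE moz_bookmarks SET lft = lft + 2 WHERE lft > ?", (parent_rgt,))
    cur = conn.execute(
        "INSERT INTO moz_bookmarks (title, lft, rgt) VALUES (?, ?, ?)",
        (title, parent_rgt, parent_rgt + 1),
    )
    return cur.lastrowid


def descendants(node_id):
    """All descendants in tree order, with one non-recursive query."""
    rows = conn.execute(
        "SELECT c.title FROM moz_bookmarks c "
        "JOIN moz_bookmarks p ON c.lft > p.lft AND c.rgt < p.rgt "
        "WHERE p.id = ? ORDER BY c.lft",
        (node_id,),
    )
    return [row[0] for row in rows]


def is_descendant(node_id, ancestor_id):
    row = conn.execute(
        "SELECT 1 FROM moz_bookmarks c, moz_bookmarks p "
        "WHERE c.id = ? AND p.id = ? AND c.lft > p.lft AND c.rgt < p.rgt",
        (node_id, ancestor_id),
    ).fetchone()
    return row is not None


def descendant_count(node_id):
    """The count falls out of the two boundaries, with no scan of the subtree."""
    lft, rgt = conn.execute(
        "SELECT lft, rgt FROM moz_bookmarks WHERE id = ?", (node_id,)
    ).fetchone()
    return (rgt - lft - 1) // 2


root = 1
menu = add_child(root, "menu")
toolbar = add_child(root, "toolbar")
page = add_child(menu, "mozilla.org")
```

Because every node's interval must sit strictly inside its parent's interval, a folder cannot end up inside one of its own descendants, which is why cycles cannot be represented in this schema.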


Add a more referer-like column to history

  • Ideas:
    • add a column reporting the last valid visit that brought us to a page
  • Pros:
    • useful for extensions and security checks
  • Cons:
    • needs 1 new column

Currently from_visit reports the page we have come from, so in case of a double redirect we would have to recurse twice to get to the original site. It would be nice to have a new column reporting the real referer, or a new table schema allowing us to query without recursion (a tree-like schema could do the trick, but would make inserts a bit slower).
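The difference can be sketched as follows in Python's sqlite3. The is_redirect flag and the referer column are hypothetical simplifications, and "real referer" is interpreted here as the nearest visit up the from_visit chain that was not itself reached through a redirect; that interpretation is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE moz_historyvisits (
        id          INTEGER PRIMARY KEY,
        url         TEXT,
        from_visit  INTEGER,                     -- existing column
        is_redirect INTEGER NOT NULL DEFAULT 0,  -- simplified stand-in for the visit type
        referer     INTEGER                      -- proposed column
    )
    """
)


def add_visit(url, from_visit=None, is_redirect=False):
    """Resolve the real referer once, at insert time, from the parent row."""
    referer = from_visit
    if is_redirect and from_visit is not None:
        parent_is_redirect, parent_referer = conn.execute(
            "SELECT is_redirect, referer FROM moz_historyvisits WHERE id = ?",
            (from_visit,),
        ).fetchone()
        if parent_is_redirect:
            referer = parent_referer
    cur = conn.execute(
        "INSERT INTO moz_historyvisits (url, from_visit, is_redirect, referer) "
        "VALUES (?, ?, ?, ?)",
        (url, from_visit, int(is_redirect), referer),
    )
    return cur.lastrowid


def referer_by_walking(visit_id):
    """What is needed today: follow from_visit one hop per redirect."""
    from_visit, is_redirect = conn.execute(
        "SELECT from_visit, is_redirect FROM moz_historyvisits WHERE id = ?", (visit_id,)
    ).fetchone()
    while is_redirect and from_visit is not None:
        parent = from_visit
        from_visit, is_redirect = conn.execute(
            "SELECT from_visit, is_redirect FROM moz_historyvisits WHERE id = ?", (parent,)
        ).fetchone()
        if not is_redirect:
            return parent
    return from_visit


def referer_by_column(visit_id):
    """With the proposed column: a single lookup, no recursion."""
    return conn.execute(
        "SELECT referer FROM moz_historyvisits WHERE id = ?", (visit_id,)
    ).fetchone()[0]


# A double redirect: start -> hop1 -> hop2
start = add_visit("http://start.example/")
hop1 = add_visit("http://hop1.example/", from_visit=start, is_redirect=True)
hop2 = add_visit("http://hop2.example/", from_visit=hop1, is_redirect=True)
```

For the final visit, from_visit points at the intermediate hop, while both helpers resolve to the starting visit; the column version gets there with a single query.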