Places/AsyncAPIsForSync

From MozillaWiki
Jump to: navigation, search

Overview

Tracking bug for Sync: bug 606353

Current situation

Right now Sync calls various synchronous Places API methods to read and write records, with the exception of history reads where it rolls its own asynchronous SQL queries.

Problem

Synchronous sqlite I/O is hurting us a lot on mobile.

Proposed solution

Provide powerful asynchronous methods to replace Sync's many synchronous calls. So instead of adding a bookmark and then adding a bunch of annotations to it, it would be great if Places had an API that would let us do it one go, do all the writes async and then call us back.

Bookmarks

bug 519514

Description of Sync's bookmark record (includes folders, separators, livemarks, etc.): https://wiki.mozilla.org/Labs/Weave/Developer/BrowserObjects#bookmark

Read

For syncing out as well as conflict resolution, right now we call

  • getFolderIdForItem()
  • getBookmarkURI()
  • getKeywordForBookmark()
  • and get the various "bookmarkProperties/" annotations (description, loadInSidebar, staticTitle, etc.).

I imagine that this information could be retrieved with one query which could be wrapped into getBookmarkInfoAsync() or similar API call.

Write

We currently use the following nsINavBookmarksService methods to create items:

  • insertBookmark()
  • createFolder()
  • insertSeparator()
  • as well as nsILivemarkService::createLivemark()

When we make these or a subset of these async, how will be notified of the ID (and later, when we add GUIDs, their GUID)? Through an nsINavBookmarkObserver? Or will there be a callback we can pass into? We need the ID / GUID so that we can set various "bookmarkProperties/" annotations (description, loadInSidebar, staticTitle, etc.) as well as keywords via setKeywordForBookmark(). Ideally, of course, we could follow aforementioned theme and have the insert*Async() methods allow us to pass those in as well. Then we'd only need a callback for special cases such as microsummaries and tags.

To update already existing items, we call setItemTitle(), changeBookmarkURI(), setKeywordForBookmark() and update various annations. Perhaps there could be an updateBookmarkAsync() akin to insertBookmarkAsync() to save us these various separate method calls?

Detailed Proposal

The new methods can be added to a new interface. Maybe nsIBookmarksService (and it only ever does async stuff)?

The only thing not covered here so far is livemarks. Not sure if it's worth an API for them or not (can we get data on how many people store now?).

nsIAnnotationInfo

/**
 * Interface that describes an annotation.
 */
interface nsIAnnotationInfo : nsISupports
{
  readonly AUTF8String name;
  readonly nsIVariant value;
  readonly long flags;
  readonly unsigned short expiration;
}

Ideally, we could use this for a future async annotation service. For now, it's just purposed for nsIBookmarkInfo.

nsIBookmarkInfo

/**
 * Interface that describes a bookmark.
 */
interface nsIBookmarkInfo : nsISupports
{
  readonly ACString guid;
  readonly long long id;
  readonly unsigned short type;
  readonly long long parentId;
  readonly ACString parentGuid;
  readonly long index;
  readonly nsIURI uri;
  readonly AString keyword;
  readonly AString title;
  readonly long long frecency;
  /**
   * An array of nsIAnnotationInfo objects for the bookmark.  These are item annotations from nsIAnnotationService.
   */
  readonly nsIVariant annotations;
}

The idea here is that this interface will be returned and given to places APIs when you ask for information about a bookmark, and when you want to add a bookmark. Because we have the type present, this will handle bookmarks, folders, separators, and dynamic containers (which, if I recall correctly, handles microsummaries and live bookmarks).

I included guids here, which makes this gated (possibly) on bug 607117.

Not clear to me if we need tags on here or not. Need input from the Sync team.

nsIBookmarkInfoCallback

interface nsIBookmarkInfoCallback : nsISupports
{
  /**
   * Called when one of the bookmark methods is done with its work.
   *
   * @param aBookmarkInfo
   *        The information about the bookmark, or undefined if nothing was found.
   */
  void onComplete(in nsIBookmarkInfo aBookmarkInfo);
}

This should be marked with [function] so JS consumers can just pass a function in if they want.

getBookmarkInfo

/**
 * Gets all information known about a bookmark.  Callers must specify the id, guid, or the URI of the bookmark in question.
 *
 * @param aInfo
 *        The bookmark info object that contains the id xor the guid xor the URI of the bookmark.
 * @param aCallback
 *        The object/function to notify when we have the information about the bookmark.
 */
void getBookmarkInfo(in nsIBookmarkInfo aInfo,
                     in nsIBookmarkInfoCallback aCallback);

Consumers would do something like this: bs.getBookmarkInfo({guid:"..."}, function(aInfo) { ... });

C++ consumers won't be happy, but I'm not sure I care.

insertItem

/**
 * Inserts a bookmark item.  Required fields on nsIBookmarkInfo are:
 *  - parentId or parentGuid
 *  - uri
 *  - title
 * Everything else is optional.
 *
 * @param aInfo
 *        The bookmark info object that contains the data needed.
 * @param aCallback
 *        The object/function to notify when we have added the bookmark.
 */
void insertItem(in nsIBookmarkInfo aInfo,
                [optional] in nsIBookmarkInfoCallback aCallback);

Need to document about everything that would make us throw. Also, how do we handle errors? Want to keep the callback simple, ideally.

insertItems

/**
 * Inserts many bookmark items.  Just like insertItem otherwise.
 *
 * @param aInfo
 *        The bookmark info objects that contain the data needed.
 * @param aCallback
 *        The object/function to notify when we have added each bookmark.
 */
void insertItems([array, size_is(aLength) in nsIBookmarkInfo aInfo,
                 in unsigned long aLength,
                 [optional] in nsIBookmarkInfoCallback aCallback);

Just like insertBookmarkWithInfo, but takes a big array of bookmark info and does it all at once. This is basically the batch mode version. I suspect mak and I are going to debate how to best do batch mode, so this may change a lot still.

updateItem

/**
 * 
 * Update the information about a bookmark item.  Callers must specify the id, guid, xor the URI of the bookmark in question.
 *
 * @param aIdentifier
 *        The bookmark info object that contains the id xor the guid xor the URI of the bookmark.
 * @param aInfo
 *        The information to update about the bookmark.
 * @param aCallback
 *        The object/function to notify when we have updated the information about the bookmark.
 */
void updateItem(in nsIBookmarkInfo aIdentifier,
                in nsIBookmarkInfo aInfo,
                [optional] in nsIBookmarkInfoCallback aCallback);

updateItems

/**
 * Updates many bookmark items.  Just like updateItem otherwise.  aIdentifiers and aInfo must have a 1:1 mapping.
 *
 * @param aIdentifiers
 *        Array of bookmark info objects that contains the id xor the guid xor the URI of the bookmarks.
 * @param aInfo
 *        Array of information to update about the bookmark.
 * @param aCallback
 *        The object/function to notify when we have added each bookmark.
 */
void updateItems([array, size_is(aLength) in nsIBookmarkInfo aIdentifiers,
                 [array, size_is(aLength) in nsIBookmarkInfo aInfo,
                 in unsigned long aLength,
                 [optional] in nsIBookmarkInfoCallback aCallback);

Notes and feedback on Proposal

No need for "plural" methods

Due to the way Sync engines work, we have no need for the batch operation methods (insertBookmarks, updateItems) as we deal with each bookmark record individually. We're running this in batchmode anyway. --philikon

We didn't want runInBatchMode to have magic flags that deal with these async methods. Shouldn't be a problem for the Sync team to deal with this new way. --sdwilsh

So the bookmarks store will just accumulate the incoming objects in memory and then the bookmarks engine, at the end of _processIncoming(), will flush the records to disk, using the async call. To avoid hogging too much memory when lots of bookmarks are involved, the store should flush after a certain number of records have accumulated as well. 50 seems like a good number since that's incidentally also the batch size for network fetches on mobile. --philikon

Types of places items covered

Just so we're on the same page, these are the data types Sync supports:

  • bookmarks (TYPE_BOOKMARK)
  • folders (TYPE_FOLDER)
  • separators (TYPE_SEPARATOR)
  • microsummary (TYPE_BOOKMARK with microsummary)
  • livemark (TYPE_FOLDER that's a livemark)
  • query (TYPE_BOOKMARK with a place: URI)

Note that Sync does not support "dynamic containers" (TYPE_DYNAMIC_CONTAINER). --philikon

We don't care about dynamic containers so far, mainly because the API lies and they just don't work. --mak

All these are covered, but you may have to roll your own way with annotations. --sdwilsh

To eloborate: microsummaries and livemarks are regular bookmarks and folders, respectively, with some annotations added to them. So instead of going through their APIs we're just going to add the necessary annotations. We might want to expose some of the internal constants these services use via their interfaces so that we don't have to copy & paste them. --philikon

Interfaces

The separate interface is the best bet, both for compatibility and clear sync/async separation. Name could either be nsIBookmarksService (drop ns?) or IAsyncBookmarks. --mak

Don't have a strong preference. I say just defer to sr on this. --sdwilsh

Dietrich suggested to just do a AsyncBookmarks.jsm module, and completely drop xpcom. I mostly agree, this interface is not going to have cpp users, and we are also making it hard for cpp users (the arrays and so on...). We'd just need some private interface in nsPIPlacesDatabase to forward notifications. What would block us from doing that? --mak

Parent Name

Sync seem to use parentName for reconciliation, but does not look like something that should be part of the API, both the parent and the name can change easily and having 2 volatile informations seem fragile. On the other side caching parent names on the fly in a local hash is probably as efficient. --mak

I think the whole setup is a bit volatile. We should see what the Sync team says about it and if they care that they might have to do a second query to get the actual name. It's not hard to add it if need be, it just feels awkward. --sdwilsh

Having the parent name in there feels awkward but it would make our code much easier while making the places code only slightly more complicated (I hope). I will have to think about alternatives to our current duping algorithm, but there's probably little alternative to having it match folder names. --philikon

Decision: keep the parentName. --sdwilsh

Predecessor

This is similar to the parent name issue: In order to preserve ordering, Sync maintains the GUID of an item's predecessor in each record. So it'd be good if nsIBookmarkInfo were to contain that info. Bonus points for making updateItem understand this and move the item appropriately. --philikon

We'd like to just keep it at itemIndex, due to more complicated queries or more queries. philikon says it's possible, but will follow up. --sdwilsh

It's possible but all options I can think of will require serious change:

  1. We could make bookmarks have the itemIndex instead of predecessorid. However, adding a new bookmark in a folder or moving a bookmark would invalidate a lot of other bookmarks, potentially even all, in that folder.
  2. We could store the predecessor GUID in annotations since we now have async APIs for querying and storing these. But we would have to ensure that this annotation exists on all bookmarks first which would mean a lot of I/O on first sync, pretty much exactly the same kind of I/O that we have to do for GUIDs right now.
  3. Instead of maintaining the ordering on the bookmarks, maintain it on their folders. Inserting or moving a bookmark would invalidate one more object, its folder (as opposed to its successor as it does now). This would mean having a childrenGUIDs array on nsIBookmarkInfo that would get/set the order of a folder's children.

Both option 1 and 3 would mean a version bump for the bookmark storage. These are similarly problematic as global storage version bumps. Fortunately, we're doing one very very soon, so we could just move to the new bookmark storage version now so that it's in place when we get the async APIs. I'm personally leaning towards option 3 --philikon

Changing bookmarks table to drop position could not be that easy, I have to check, but iirc we rely pretty much on that positioning and on it not having "holes". So if that's the idea, we have to be cautious here, if the "bookmark storage" is something on Sync side, then nevermind! --mak

Yes, "bookmark storage" is a Sync component. --philikon

Alternate proposal: Sync adds an annotation on the folder containing the list of guids (in order). This results in no extra API frontage and solves all current issues. --sdwilsh

Tags

nsIBookmarkInfo should most likely include tags or we need a way to set tags async. It's indeed impossible to create a bookmark async and tag it synchronously. it could be possible to add tags in the bookmarks added notification, but is it worth it? --mak

I thought about this, but I'm pretty sure that because we store tags as a bookmark in a different root, Sync will just work magically anyway. We should confirm this with them though. I'd rather not add it yet if we can avoid it because this is already a lot of work. --sdwilsh

We explicitly ignore the tags folder right now, so it doesn't magically work. Sync uses the tagging service API explicitly. So either we should perhaps start syncing the tags folder (if that does indeed magically work) or nsIBookmarkInfo should contain the tags. --philikon

OK, we could add a getBookmarkInfoWithTags (or have another argument to getBookmarkInfo) that could obtain tags too. Syncing the bookmarg tags folder wouldn't work out right for things like the iphone client. --sdwilsh

Take in count in future tags will have to move out of bookmarks folders/table to their own storage, thus it is not sane to request syncs of the tags root. What Sync does using taggign service is correct. Not working here though since it's sync. What's the cost of just returning tags always? If I use this interface is pretty clear I want to know EVERYTHING on a bookmark. --mak

nsIBookmarkInfo readonly

getBookmarkInfo takes a nsIBookmarkInfo object, thus we probably don't want all of its properties being readonly. --mak

It shouldn't matter. We don't have to return the same nsIBookmarkInfo object that we are passed in (in fact, for threadsafety reasons, we shouldn't!) --sdwilsh

Query by URI

Notice that while passing an id or guid will return a single bookmark, passing an uri is not guaranteed to do so. the same uri can have multiple bookmarks. what to do here? either don't allow by uri, or return last modified bookmark (that is what we do today) --mak

Last modified probably works fine, but we should check with the Sync team. I suspect they might actually care about all cases, in which case, we need to change our approach a bit. --sdwilsh

Pretty sure we don't have any need for querying bookmarks by URI. --philikon

Let's just drop support for querying on URI. --sdwilsh

I guess mobile could find querying by uri useful instead... but we can fix this later. --mak

Missing attributes ignored on update

updatebookmarkWithInfo same as above for name. Actually how to handle missing information in the info object? Does the implementer have to collect all info, change and then submit? Or things that should not be changed must be null? How to set a null value then? Having to collect info to be able to change them is going to be more expensive than changing them one by one. --mak

How about updateItem then? The implementer should only have to pass in the id or the guid. I envisioned them also being able to specify only the parts that changed: updateItem({id:2}, {guid:"GUID_HERE"}); This would set the guid on item with id of 2 to "GUID_HERE". Easy to implement because things will just throw NS_ERROR_NOT_IMPLEMENTED. I hope this doesn't get logged to the error console though... --sdwilsh

Yes, it would be great if missing properties on the nsIBookmarkInfo object would simply be ignored in the update. The same should go for annotations. This means that this API wouldn't allow us to remove annotations, but that's ok since we have no need for it. But we do avoid having to first get all the information before updating it. --philikon

This shouldn't be a problem. We can add an API in the future for removal of data, but we do not need to in this cycle (2.0). --sdwilsh

History

See the full spec

Input History

Turns out we want to sync input history too, see bug 597874. It would be great if we could add the following to nsIPlaceInfo:

 /**
  * An array of nsIInputHistory objects for the place.
  */
 readonly nsIVariant inputs;

with nsIInputHistory:

interface nsIInputHistory : nsISupports
{
  ACString input;
  double use_count;
}