Calendar:Calendar Discussions

From MozillaWiki

Ago's misc thoughts on sharing/syncing

There are three backend types (each of which can sit on a LAN or a WAN):

I. Flatfiles (all records read and written at once; local queries on the object obtained from the parsed text)

II. DB files (record-level access; one DB engine per client, possibly all accessing the same shared file; local queries)

III. DB servers or equivalent (centralised; remote queries; requires a communication protocol). Forget about these for the time being.


There are 5 "situations":

1. Local edit while "connected" to a remote flatfile backend = 2-way, in sequence: lock + "download" + merge + "upload" + unlock.

2. Local edit while "connected" to a shared DB data file = 1-way, local -> remote, for the individual event.

3. Switch from online to offline = reload = 1-way: remote -> local. For a DB, use the sequential index to fetch the newly edited records; for flatfiles, do a full download.

4. Switch from offline to online = sync = 2-way: remote <-> local. Use the sequential index to tell conflict cases apart from trivial ones.

5. Syncing third-party apps. Use an intermediate file (id|md5) to find out which records were edited offline.
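Situation 4 hinges on comparing each record's sequential index with the highest index known at the moment the client went offline. A minimal Python sketch, assuming each side can be read as a dict of event id -> `Record(seq, data)` and that `base` is that recorded maximum; `Record` and `classify` are hypothetical names, not existing oeIICal code:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    seq: int   # sequential index, incremented on every edit
    data: str  # event payload (unparsed)


def classify(local: Dict[str, Record], remote: Dict[str, Record], base: int) -> Dict[str, List[str]]:
    """Sort event ids into sync actions; seq > base means 'edited since going offline'."""
    actions: Dict[str, List[str]] = {
        "push": [], "pull": [], "conflict": [],
        "delete_local": [], "delete_remote": [], "unchanged": [],
    }
    for event_id in sorted(local.keys() | remote.keys()):
        loc, rem = local.get(event_id), remote.get(event_id)
        if loc is not None and rem is not None:
            loc_changed, rem_changed = loc.seq > base, rem.seq > base
            if loc_changed and rem_changed:
                actions["conflict"].append(event_id)  # edited on both sides
            elif loc_changed:
                actions["push"].append(event_id)      # trivial: local wins
            elif rem_changed:
                actions["pull"].append(event_id)      # trivial: remote wins
            else:
                actions["unchanged"].append(event_id)
        elif loc is not None:
            # Only local: created offline, or deleted remotely after going offline.
            actions["push" if loc.seq > base else "delete_local"].append(event_id)
        else:
            # Only remote: created remotely, or deleted locally while offline.
            actions["pull" if rem.seq > base else "delete_remote"].append(event_id)
    return actions
```

A record deleted on one side but edited on the other comes out here as a plain pull or push; a real implementation would probably want to flag that as a conflict too.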


On top of this, there are two possible online configurations (with respect to the remote calendar), which affect how (1) and (2) are implemented:

A. You always have a local calendar. All edits occur twice: on the local and on the remote calendar. If you cannot save an edit on the remote calendar, you move to offline mode. In most cases queries are performed locally. That includes a client running an instance of sqlite off a data file in a remote folder, or a remote file loaded and parsed into an in-memory object which is in turn browsed/queried. -> There is no need for a heavyweight API for remote calendars... All that is needed is a few standardised methods for getting the (unparsed) data in and out. Possibly, local calendars should be handled directly by oeIICal (better if SQL-based), while remote calendars should be accessed via plugins which are specific implementations of the same (simpler) interface (reload, sync, addEvent, editEvent, deleteEvent, connect, disconnect...). Plugins should take care of networking where required.

B. You work directly off the remote calendar; while connected to it you do not use the local calendar at all. When the connection drops or is closed, or the application is shut down, the calendar held in memory is dumped locally. This makes sense if you are sharing a flatfile ics on a LAN and the local storage is also a flatfile (so there is little point in keeping it up to date), or if you are using an sqlite file stored on a shared folder. In effect this is like having "local" backends which are actually shared. The implementation can be done within oeIICal, plus some extra code for flatfiles (see bug 265274). This will not work over a WAN, with its networking/latency issues, but it will be adequate on LANs. And I think we should worry about LANs first...
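Under A, the remote side reduces to a small plugin interface that only moves unparsed data. The sketch below is one possible shape for it, using Python-style names for the methods listed under A (reload, sync, addEvent, editEvent, deleteEvent, connect, disconnect). `RemoteCalendarPlugin`, `InMemoryPlugin` and `DualWriteCalendar` are hypothetical; the toy plugin just keeps events in a dict, and `DualWriteCalendar` illustrates the "every edit occurs twice, go offline if the remote save fails" rule:

```python
from abc import ABC, abstractmethod
from typing import Dict


class RemoteCalendarPlugin(ABC):
    """Hypothetical plugin interface for remote calendars (configuration A).

    Plugins only move unparsed event text in and out; parsing and queries stay local.
    """

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def disconnect(self) -> None: ...

    @abstractmethod
    def reload(self) -> Dict[str, str]:
        """Return every remote event as raw text, keyed by event id."""

    @abstractmethod
    def sync(self, local: Dict[str, str]) -> Dict[str, str]:
        """Reconcile with a local copy and return the resulting remote state."""

    @abstractmethod
    def add_event(self, event_id: str, raw: str) -> None: ...

    @abstractmethod
    def edit_event(self, event_id: str, raw: str) -> None: ...

    @abstractmethod
    def delete_event(self, event_id: str) -> None: ...


class InMemoryPlugin(RemoteCalendarPlugin):
    """Toy plugin for illustration; disconnect() simulates a dropped connection."""

    def __init__(self) -> None:
        self.events: Dict[str, str] = {}
        self.reachable = True

    def _check(self) -> None:
        if not self.reachable:
            raise ConnectionError("remote calendar unreachable")

    def connect(self) -> None:
        self.reachable = True

    def disconnect(self) -> None:
        self.reachable = False

    def reload(self) -> Dict[str, str]:
        self._check()
        return dict(self.events)

    def sync(self, local: Dict[str, str]) -> Dict[str, str]:
        # Toy policy only: local copies overwrite remote ones; no conflict handling.
        self._check()
        self.events.update(local)
        return dict(self.events)

    def add_event(self, event_id: str, raw: str) -> None:
        self._check()
        self.events[event_id] = raw

    def edit_event(self, event_id: str, raw: str) -> None:
        self._check()
        self.events[event_id] = raw

    def delete_event(self, event_id: str) -> None:
        self._check()
        self.events.pop(event_id, None)


class DualWriteCalendar:
    """Configuration A: every edit is applied locally, then pushed through the plugin."""

    def __init__(self, plugin: RemoteCalendarPlugin) -> None:
        self.plugin = plugin
        self.local: Dict[str, str] = {}
        self.online = True

    def save(self, event_id: str, raw: str) -> None:
        is_new = event_id not in self.local
        self.local[event_id] = raw  # the local calendar is always written
        if not self.online:
            return
        try:
            if is_new:
                self.plugin.add_event(event_id, raw)
            else:
                self.plugin.edit_event(event_id, raw)
        except ConnectionError:
            # The remote save failed: keep the local edit and switch to offline mode.
            self.online = False
```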


More details:

Possibly both A and B should be supported. B seems easier to implement and should be the preferred method for LAN sharing. In this case there is no need to maintain duplicate copies of the data.

File-based backends are more complex, since you need to take care of locks explicitly and create procedures that avoid overwriting other people's changes, both to the same records and to other records. Only edited records should be affected. It seems trivial, but it is not. For flat files this means that the file must be locked, then "downloaded", then the edited events are merged in, then it is "uploaded" again, and finally the lock is removed (see bug 265274). It is obviously easier for a DB backend...
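The lock + "download" + merge + "upload" + unlock sequence can be sketched as follows. This is not the code from bug 265274: it assumes a hypothetical `<file>.lock` created with `O_EXCL` as the lock, and a JSON map of event id -> event text standing in for a real .ics file; stale locks are not handled. `edits` maps an id to its new text, or to `None` for a deletion:

```python
import json
import os
import tempfile
import time
from typing import Dict, Optional


def locked_merge(path: str, edits: Dict[str, Optional[str]], timeout: float = 5.0) -> Dict[str, str]:
    """Merge only this client's edited records into a shared flatfile."""
    lock_path = path + ".lock"  # hypothetical lock-file convention
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another client already holds the lock.
            os.close(os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not lock {path}")
            time.sleep(0.1)
    try:
        # "Download": re-read the shared file, which may contain other people's changes.
        try:
            with open(path, encoding="utf-8") as f:
                events: Dict[str, str] = json.load(f)
        except FileNotFoundError:
            events = {}
        # Merge: touch only the records this client edited.
        for event_id, raw in edits.items():
            if raw is None:
                events.pop(event_id, None)
            else:
                events[event_id] = raw
        # "Upload": write a temporary file next to the target, then swap it in atomically.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(events, f)
        os.replace(tmp_path, path)
        return events
    finally:
        os.remove(lock_path)  # unlock, even if the merge failed
```

Swapping in a temporary file with `os.replace` means other clients never read a half-written calendar. `O_EXCL` lock files are not reliable on every network filesystem, so a LAN deployment would need to check what the shared folder actually supports.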

In online mode, to incorporate external changes you do not need a sync, you need a reload: a unidirectional transfer, remote->local. Every X seconds, merge in the records that were changed on the remote calendar since the last reload. This involves two loops: first add/edit the new and changed records, then go through the locally stored events and delete any record that has no matching id remotely. For flat files, every X seconds check whether the file was modified externally; if so, download it, replacing the local copy, and reparse.
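A sketch of both reload variants, assuming a DB-style backend exposed as a dict of id -> (sequential index, data) and a flatfile whose modification time can be read with `os.stat`; `reload_db` and `flatfile_changed` are hypothetical names. For brevity `remote` is the full remote table; a real DB backend would query only the rows with an index above `last_seq`, plus the list of remote ids for the delete loop:

```python
import os
from typing import Dict, Optional, Tuple

# Hypothetical record shape: (sequential index, event data).
Record = Tuple[int, str]


def reload_db(local: Dict[str, Record], remote: Dict[str, Record], last_seq: int) -> int:
    """One-way remote -> local reload; returns the new high-water index."""
    # Loop 1: add or update every record edited remotely since the last reload.
    for event_id, (seq, data) in remote.items():
        if seq > last_seq:
            local[event_id] = (seq, data)
    # Loop 2: delete local records that no longer have a matching id remotely.
    for event_id in [eid for eid in local if eid not in remote]:
        del local[event_id]
    return max([last_seq] + [seq for seq, _ in remote.values()])


def flatfile_changed(path: str, last_mtime_ns: Optional[int]) -> Tuple[bool, int]:
    """Report whether the shared file changed since the last check, plus its current mtime."""
    mtime_ns = os.stat(path).st_mtime_ns
    return mtime_ns != last_mtime_ns, mtime_ns
```

The caller would re-download and reparse the flatfile only when the first value returned by `flatfile_changed` is true.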

Direct access to the upload/download mechanism should be avoided or made unidirectional. A better approach is a read-only property for some calendars (= download only) plus an "export" function. Never allow manually downloading (refresh) and uploading (publish) the same calendar between a local and a remote flatfile: data can get lost.

For both syncing and reloading, a sequential index (one that is incremented on every edit; a timestamp is not adequate) is required. One catch: in online mode (when using A), the sequential index of an event in the local calendar must always have the same value as in the remote calendar; it cannot be "automatic". So you take the max index of the remote calendar, increment it, and use the result both locally and remotely. When you go offline, the local indices are incremented "automatically" based on the max value in the local table. Knowing the max value of this index at the time you go offline is what makes syncing possible. See my comments in http://wiki.mozilla.org/Mozilla2.0?UnifiedStorage
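A small model of these index rules, assuming configuration A with both calendars reduced to dicts of event id -> sequential index. `IndexedCalendarPair` is hypothetical, and taking the max over stored values is a simplification: a real backend would keep a persistent counter so that deleting the newest record cannot lower it:

```python
from typing import Dict, List, Optional


class IndexedCalendarPair:
    """Hypothetical local/remote pair sharing one sequential index (configuration A)."""

    def __init__(self) -> None:
        self.local: Dict[str, int] = {}   # event id -> sequential index
        self.remote: Dict[str, int] = {}  # event id -> sequential index
        self.online = True
        self.offline_base: Optional[int] = None  # max index when we went offline

    def edit(self, event_id: str) -> int:
        if self.online:
            # Online: take the remote max, increment, and use it on both sides.
            seq = max(self.remote.values(), default=0) + 1
            self.remote[event_id] = seq
        else:
            # Offline: increment "automatically" from the local table's max.
            seq = max(self.local.values(), default=0) + 1
        self.local[event_id] = seq
        return seq

    def go_offline(self) -> None:
        self.online = False
        self.offline_base = max(self.local.values(), default=0)

    def edited_offline(self) -> List[str]:
        """Ids whose index is above the recorded base, i.e. edited while offline."""
        if self.offline_base is None:
            return []
        return sorted(eid for eid, seq in self.local.items() if seq > self.offline_base)
```

In use, an edit made while another client has already pushed index 5 remotely gets index 6 on both sides; after going offline, the next local edit gets 7 and is the only one reported by `edited_offline()`.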

Syncing with third-party apps is a different beast. Third-party apps will probably not use such an index, so you need some way of marking the records that were edited offline in order to sync. A simple but possibly inefficient/inaccurate technique is to stringify the events and store the id and md5 of each event at the time of the last sync. This is only for checking which records were modified offline, and only for third-party apps. The stringification/md5 does not need to guarantee bit-perfect equality. Third-party apps should never access a shared calendar directly unless they can guarantee the integrity of the index (incrementing it properly)...
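The id|md5 bookkeeping might look like this, assuming the "stringified" event is simply its raw text (so any textual difference counts as a modification; normalising the text first would cut down false positives) and a snapshot file with one `id|md5` line per event. All function names are hypothetical:

```python
import hashlib
from typing import Dict, List


def fingerprint(event_text: str) -> str:
    """md5 of the stringified event; equality does not need to be bit-perfect."""
    return hashlib.md5(event_text.encode("utf-8")).hexdigest()


def write_snapshot(events: Dict[str, str]) -> str:
    """Build the intermediate file: one 'id|md5' line per event, taken at sync time."""
    return "".join(f"{eid}|{fingerprint(text)}\n" for eid, text in sorted(events.items()))


def parse_snapshot(snapshot: str) -> Dict[str, str]:
    result: Dict[str, str] = {}
    for line in snapshot.splitlines():
        if line:
            # rsplit keeps ids that themselves contain '|' intact.
            eid, digest = line.rsplit("|", 1)
            result[eid] = digest
    return result


def changed_since(snapshot: str, events: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the third-party app's current events with the snapshot from the last sync."""
    old = parse_snapshot(snapshot)
    return {
        "added": sorted(eid for eid in events if eid not in old),
        "modified": sorted(eid for eid in events if eid in old and old[eid] != fingerprint(events[eid])),
        "deleted": sorted(eid for eid in old if eid not in events),
    }
```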

mvl's comments back:

In short: why distinguish a lot of different cases when they all boil down to syncing? Sure, reloading to merge the remote changes back is simpler than a sync, but only because there are no local changes, and a sync detects that anyway. No need to write additional code. So I suggest thinking about the worst case, fixing that, and all the other cases will work too. And the worst case is a shared file with non-Mozilla clients writing to it.

Ago

Hmmm, as you say, when you are reloading (i.e. merging external changes) that is not strictly speaking syncing, since there are never conflict cases, and this makes a difference when you use flatfiles: you simply overwrite the local file with the remote one and parse it all. Also, when you are editing while online and connected to a DB, you are not really syncing; you are only editing one row of a recordset. And if you are online and connected to a shared flatfile, the sequence "download"+merge+"upload" is not really record-by-record syncing... To see this better, consider what happens when you switch offline->online with flatfiles: you need to scan all records, and you cannot use the download/upload scheme. Those are different cases.

For LAN sharing you can (and probably should) use (B) above. It is an oeIICal implementation, with no need for plugins, since the storage always appears "local", even when it sits on a shared folder. In this case moving offline is equivalent to dumping and changing the path of the local storage to point to a file in the profile, and moving online means syncing and changing the "local" path to point to a file on a shared folder. At any time you use only one storage (oeIICal). My bug in practice implements part of BI (configuration B with a type I flatfile backend: "online" only, with a shared flatfile, over a LAN).
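Configuration B, where going offline or online is just a change of storage path, can be sketched as below. `SwitchableStorage` is hypothetical, the dump format is JSON rather than ics, and `sync` is a caller-supplied function standing in for whatever sync procedure is eventually chosen:

```python
import json
from typing import Callable, Dict


class SwitchableStorage:
    """Hypothetical configuration-B storage: one backend whose file path follows the online state."""

    def __init__(self, shared_path: str, profile_path: str) -> None:
        self.shared_path = shared_path
        self.profile_path = profile_path
        self.path = shared_path  # online: work directly off the shared file
        self.online = True

    def go_offline(self, events: Dict[str, str]) -> None:
        # Dump the in-memory calendar into the profile, then point the storage at it.
        with open(self.profile_path, "w", encoding="utf-8") as f:
            json.dump(events, f)
        self.path = self.profile_path
        self.online = False

    def go_online(self, sync: Callable[[str, str], None]) -> None:
        # Reconcile offline edits with the shared file first, then point the storage back at it.
        sync(self.profile_path, self.shared_path)
        self.path = self.shared_path
        self.online = True
```

Dumping from memory rather than copying the shared file matters here, because by the time the client goes offline the shared folder may already be unreachable.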

As for third-party apps, I think they are a different issue, because you have to find a way to detect records edited offline. Maybe there is a dirty field that most third-party apps use; otherwise it is relatively involved... Letting third-party apps directly access a shared Mozilla calendar in r/w mode is yet another issue, and a major one... We might want an API they can use to access the storage safely (in particular, they should not disrupt the index)...

Let's do simpler things first: syncing for Mozilla apps (i.e. online/offline capabilities using the sequential index) and LAN sharing (BI/BII). LAN sharing is by far the most common case (you are much more likely to share on a LAN than on a WAN)... and one where sponsors can be found... The online part of BI is already there (see bug 265274); if you add syncing (+ sequential index), BI can be completed with online/offline facilities. Then doing BII should be easier (no explicit locks, no all-at-once edits). Once we have solid LAN sharing to build on, we can worry about third-party apps and latency/networking issues (A/plugins).

Similar capabilities should also be applied to the Thunderbird AddressBook.

mvl

The only reason I see not to always use the sync code is performance, and with it lock time. But I think syncing can be pretty smart and pretty fast. So let's start with that, and if it really turns out to be a problem, let's fix it or special-case it, rather than starting with lots of special cases before we know they are really needed.

For better performance, a one-way sync (aka merge) can be added, using the same trick to find out which records are dirty. This is needed anyway to get into a synced state when adding a new calendar.

ago

No problem with me. I suggest you start with my code and try to modify it so that it uses syncing (you will need to modify both reload and part of RetrieveAndSaveLocalCalendar). You get locks and checks for free, and the differences between the two approaches will be immediately evident. As mentioned, let's focus first on LAN sharing + offline capabilities with flatfiles... In fact I have ditched my plugin idea; I think it is for later on (A)...