Calendar:Syncing Algorithm

From MozillaWiki
Jump to: navigation, search

Syncing

For syncing, you can't really trust the last modified dates. If another computer happens to had a wrong date set, data could get lost. This is bad. So, you need to use generation numbers. Those can be stores in the x-syncid field. Mozilla Calendar can adjust them, no problem there. But other clients will likely not do that. This means you don't know if the remote event has changed.

A possible solution would be to store the lastmodifiedtime of the record when it was last synced. Then in the next sync, compare it with the current lastmodifiedtime. This way you can calculate if the remote record was changed. This wont work, because then only one app could sync to a certain file, not the entire office.

Or a hash of the contents can be stored in the syncid property, and recalculate it the next sync, and compare. This wont work, for the same reason.

Alternativly, a copy of the event can be stored locally when syncing. At the next sync, the events can be compared to see if the remote event was changed. Even better, a variation of copy-on-write van be used. When an event is edited, a copy of the original is saved. This will save diskspace. For every external source the file has to be synced, an addition copy can be made.

Possible sync algorithm

given a remote file, and locally synced to it. Locally stored in some oeIICal impl. (need to figure out how to actually get into that state)

There are two local caledar files: the one holding the date (call it a cache), and one that is empty (call it a scratch). When the user adds or edits an event, the event is copied into the scratch calendar, before the changes are written (But only if no such copy exists). Only after that the local cache is edited. Same when deleting a event. This also makes a copy into the other calendar. The result is that all the events the user edited are saved in the state they were at the time of the last sync.

When syncing again, all changed events are checked for remote events with the same id

  1. If there is no remote event, it means the local event is new, and should be added to the remote
  2. if there is a remote event, but it differs from the changed event, there is a conflict.
  3. if there is a remote event, and it doesn't differ, and no local event, it means the local event was deleted, and so should the remote

pseudocode:

if (!remoteEvent)
  remoteCalendar.add(localEvent);
else if (remoteEvent != changedEvent)
  conflict();
else if (!localEvent)
  remoteCalendar.deleteEvent(changedEvent.id)


The next step is to compare all the remote events with the local copies.

When syning, all the remote events are compared with the local copies.

  1. If there is no local event with the same id, a new event was made in the remote file. This event is copied to the local
  2. If the local event with the same id matches, nothing changed, nothing to do
  3. If the events don't match, but no local copy was made, the remote was changed, and can be copied over the local
  4. If there is a local copy, and the copy matches the remote, the local event was changed, and can be copied over the remote
  5. If there is a local copy, but it doesn't match, there is a problem. Ask the user.

We can take advantage of the fact that we already processed locally deleted events

pseudocode again:

if (!localEvent)
  localCalendar.add(remoteEvent);
else if (localEvent != remoteEvent)
  if (!changeEvent)
    localCalendar.modify(localEvent.id, remoteEvent);
  else if (changeEvent == remoteEvent)
    remoteCalendar.modify(remoteEvent.id, localEvent);
  else
    conflict();


After syncing, the local copies should be deleted, so new ones can be made for the next sync session

Discussions

discussions moved to Calendar:Calendar Discussions

Ago comments on algorithm

The algorithm can be efficient to get the "merged" set as a whole. Less so to get individual records to be merged. I explain. Say you have A (set of current events on a mozilla client), B (set of original events before modification, the scrap dataset), C (set of current events of calendar to sync). To get the merged set quickly you proceed as following:

D = subset of C conatining events with id NOT in B

Now compare events in B and C with same id:

  1. if C(id)==B(id) or if C(id) does not exists -> add event A(id) to D. If A(id) does not exist do nothing.
  2. if C(id)!=B(id) and A(id) does not exists add C(id) to D
  3. if C(id)!=B(id) and A(id) exists -> conflict case. Solve according to preset rules [priority to A/priority to C/duplicate/user interaction] -> add A(id) and/or C(id) to D

D is now the merged set, and in order to get it, you only need #B comparisons (in fact less than that when A(id) and/or C(id) do not exist)! Note that in the scheme above if there is a conflict between a deleted event on one side and the same event being edited on the other side, the edit prevails over deletion, no conflict is called. This is fast, since you start with the block D (the original block D is always in the merged set) which takes away a lot of work. Should you want to know which records, individually, need to be added/edited/deleted into A, then you need to loop also through D, and to have this information you waste #D comparisons. Which suggests that to gain speed, after obtaining the merged set, this set must be dumped as a whole completely replacing both A and C (as opposed to be picky and only change the records that really changed)...

Consider the case of an "online" use when you edit the record and changes are immidiately reflected on a connected calendar. B will only contain the edited event. So the fast algorithm is almost equivalent to "download all + parse + merge edited record + upload all"...

Moreover might you need a second sync algorithm when the scrap db B is not available, in case it gets deleted/corrupted. A sequential index will alleviate the pain. In this case D=C, B=events in A with sequential index > index as of the time you went offline. For each id in B, if C(id) does not exist add B(id) to D, otherwise B(id) is a conflict case. You have more conflicts than it would be necessary if you had a proper B and newly deleted events in A have no "effect" on D (you loose this information anyway since there is no way to know if an event was deleted from A or added to C), but it is as good as it gets. In fact such index could also be used to check the integrity of B...

mvl

That would get the merged set (I think), but that is not enough. You can't just copy the whole set over. You need to use AddEvent(), if only to not confuse observers and alarms. You want to know which events changed. And the problem with the algo is that you claim you start with D. but to get that requires work too. So the algo isn't neccesarily faster.