6.0.0-git
2018-12-15

[#12567] AS: Unlikely race condition leads to data loss
Summary AS: Unlikely race condition leads to data loss
Queue Synchronization
Queue Version Git master
Type Bug
State Assigned
Priority 1. Low
Owners mrubinsk (at) horde (dot) org
Requester thomas.jarosch (at) intra2net (dot) com
Created 2013-08-13 (1950 days ago)
Due
Updated 2013-11-21 (1850 days ago)
Assigned 2013-11-21 (1850 days ago)
Resolved
Milestone 6
Patch No

History
2013-11-21 21:29:26 Michael Rubinsky Comment #4
State ⇒ Assigned
Milestone ⇒ 6
Reply to this comment
Target for H6 due to change in API return values needed to fix this properly.
2013-11-20 18:22:23 Michael Rubinsky Comment #3
State ⇒ Feedback
Reply to this comment
Looking at this again.

The strategy of using an MD5 of the object won't work. When sending 
changes from client->server, we are not guaranteed to have the full 
object represented. Often times the client will only send the changed 
properties, so we would need to first apply the change to the backend, 
then fetch the change again before calculating the MD5. Of course, 
this also opens up another place we could run into the race condition 
we are trying to prevent. I.e., someone else could modify the object 
between our committing the changes and re-fetching the object to 
obtain an MD5. The only real solution here is to refactor the API to 
return a modseq/timestamp (let's call these "syncstamps"), after each 
message change, which would of course be a BC break.
2013-08-13 14:39:21 Michael Rubinsky Assigned to Michael Rubinsky
State ⇒ Assigned
 
2013-08-13 10:40:35 Thomas Jarosch Comment #2 Reply to this comment
When Horde_ActiveSync processes an incoming change, we write down the 
modseq number and the object_uid of that change among other things. 
The idea is to not send back changes originating from a client _back_ 
to this client on the next getChanges() run.

Because of the problems detailed before, the idea is to add
an md5 based change detection. The changes to the code
is limited to Horde_ActiveSync and not that huge.

The current, existing sync map looks like this:
-----------------------------
CREATE TABLE "horde_activesync_map" (
   "message_uid" varchar(255) NOT NULL,
   "sync_modtime" int,
   "sync_key" varchar(255) NOT NULL,
   "sync_devid" varchar(255) NOT NULL,
   "sync_folderid" varchar(255) NOT NULL,
   "sync_user" varchar(255)
, "sync_clientid" varchar(255), "sync_deleted" boolean);
-----------------------------

The only thing we would add is the new
   "sync_md5" varchar(255)

field


Here's the pseudo new code:
-----------------------------
function process_change_from_PIM()
{
     $md5 = md5(array_sort($object));

     if (send_change_to_backend($object) == true) {
         store_into_syncmap(highest_modseq, sync_key, md5 and the others)
     }
}


function send_changes_to_client($start, $end, $current_sync_key)
{
     $changes = getServerChanges($start, $end);

     // GC the sync map (collect all entries for this device+folder)
     $client_changes = "SELECT * from sync_map WHERE folder_id = 
current_folder,
                                           dev_id = current_device,
                                           user = current_user
                                           ORDER BY modseq";           
  // <-- the ORDER BY is important here

     foreach($client_changes as $row => $client_change)
     {
         if ($client_change['modseq'] < $start && 
$client_change['sync_key'] < $current_sync_key) {
              sql("DELETE from syncmap WHERE modseq = 
$client_change['modseq'] AND folder_id=current AND devid=current AND 
user=current");
              unset($client_change[$row]);
         }
     }

     // collect most current md5sum (=object state) on the client
     // (the client could change an object multiple times
         while a big sync _to_ the client is running)
     // the $client_changes array is already sorted by modseq
     // therefore we have the latest object state of the client at the 
end of the loop

     $latest_md5 = array();
     foreach($client_changes as $client_change) {
          $latest_md5[$client_change['object_uid']] = $client_change['md5'];
     }

     // final check if server change originated on the client
     foreach($changes as $id => $change) {
          $object_uid = $change['object_uid'];
          if (isset($latest_md5[$object_uid])) {
               $object = fetch_from_backend($object_uid);
               $server_md5 = md5(array_sort($object));

               if ($server_md5 == $latest_md5['object_uid']) {
                   Horde::log("Ignoring PIM initiated change for: 
object_uid, modseq, md5");
                   unset($changes[$id]);
               }
           }
     }

     // get all remaining changes to send
     $objects = fetch_from_backend($changes);
     send_to_client($objects);
}
-----------------------------

As Michael pointed out, my first implementation is not perfect
since it currently needs another roundtrip through the API.
We try to move this to the point were we need to fetch the object anyway.

The new code simplifies garbage collection of the syncMap, too
since it's self-cleaning as soon as an entry is not needed anymore.

During normal work loads (=changes coming in from the server),
the syncMap is stays empty and no md5 checksumming is done at all.
We only need to checksum changes initiated by the client.

2013-08-13 10:28:09 Thomas Jarosch Comment #1
Type ⇒ Bug
State ⇒ Unconfirmed
Priority ⇒ 1. Low
Summary ⇒ AS: Unlikely race condition leads to data loss
Queue ⇒ Synchronization
Milestone ⇒
Patch ⇒ No
Reply to this comment
Hi,

after intensive torture testing the ActiveSync code and day-long
email exchange with Michael ([mjr]) (thanks for your help!)
I figured out this scenario were the current sync engine
might skip a change on the server:

----------------------------------------------------------------
(1) Client sends a SYNC request to the server.
     It notes 10 as the last used modseq in the syncState.

(2) Client sends a change to the server. We import the change
     via _as->driver->ChangeMessage(). Internally that function
     is split into non-atomic parts (2a) and (2b).

(2a) We store the change via the API into the backend.
     The modseq stored in the history table is 13.

(x) Someone else does a change to the same object.
     The modseq stored in the history table is 16.

(2b) ChangeMessage() finally calls smartStatMessage() on "our"
      change from (2a). We will return 16 as the modseq.

(3) We insert the sync_key, the returned modseq and the object_uid 
into the syncMap.
     This will insert the modseq 16 into the syncMap.

(4) The client sends his SYNC request.
     The last sync request was at 10, so it will grab
     all changes between 10 and 16.

(5) The server part sees the MODIFY change for the object_uid sent in (2)
     It will call $backend->statMessage() and return 16 for this message.
     getPIMChangeTS(object_uid) will also return 16.

     As $pim_ts(=16) is >= $stat['mod'](=16) we ignore the
     change as originating from the PIM even though another
     user changed the object.
----------------------------------------------------------------

The root issue here is that the API and the horde apps don't
transport the modseq value after doing a change in Horde_History.

Transporting this value back to the ActiveSync layer is complex
surgery and not doable in a BC way. We needed to come up with another 
solution.
I'll add the proposed solution in a separate message.

Thomas

Saved Queries