Summary | AS: Unlikely race condition leads to data loss |
Queue | Synchronization |
Queue Version | Git master |
Type | Bug |
State | Assigned |
Priority | 1. Low |
Owners | mrubinsk (at) horde (dot) org |
Requester | thomas.jarosch (at) intra2net (dot) com |
Created | 08/13/2013 (4292 days ago) |
Due | |
Updated | 11/21/2013 (4192 days ago) |
Assigned | 11/21/2013 (4192 days ago) |
Resolved | |
Milestone | 6 |
Patch | No |
State ⇒ Assigned
Milestone ⇒ 6
State ⇒ Feedback
The strategy of using an MD5 of the object won't work. When sending
changes from client->server, we are not guaranteed to have the full
object represented. Often times the client will only send the changed
properties, so we would need to first apply the change to the backend,
then fetch the change again before calculating the MD5. Of course,
this also opens up another place we could run into the race condition
we are trying to prevent. I.e., someone else could modify the object
between our committing the changes and re-fetching the object to
obtain an MD5. The only real solution here is to refactor the API to
return a modseq/timestamp (let's call these "syncstamps"), after each
message change, which would of course be a BC break.
Assigned to Michael Rubinsky
modseq number and the object_uid of that change among other things.
The idea is to not send back changes originating from a client _back_
to this client on the next getChanges() run.
Because of the problems detailed before, the idea is to add
an md5 based change detection. The changes to the code
is limited to Horde_ActiveSync and not that huge.
The current, existing sync map looks like this:
-----------------------------
CREATE TABLE "horde_activesync_map" (
"message_uid" varchar(255) NOT NULL,
"sync_modtime" int,
"sync_key" varchar(255) NOT NULL,
"sync_devid" varchar(255) NOT NULL,
"sync_folderid" varchar(255) NOT NULL,
"sync_user" varchar(255)
, "sync_clientid" varchar(255), "sync_deleted" boolean);
-----------------------------
The only thing we would add is the new
"sync_md5" varchar(255)
field
Here's the pseudo new code:
-----------------------------
function process_change_from_PIM()
{
$md5 = md5(array_sort($object));
if (send_change_to_backend($object) == true) {
store_into_syncmap(highest_modseq, sync_key, md5 and the others)
}
}
function send_changes_to_client($start, $end, $current_sync_key)
{
$changes = getServerChanges($start, $end);
// GC the sync map (collect all entries for this device+folder)
$client_changes = "SELECT * from sync_map WHERE folder_id =
current_folder,
dev_id = current_device,
user = current_user
ORDER BY modseq";
// <-- the ORDER BY is important here
foreach($client_changes as $row => $client_change)
{
if ($client_change['modseq'] < $start &&
$client_change['sync_key'] < $current_sync_key) {
sql("DELETE from syncmap WHERE modseq =
$client_change['modseq'] AND folder_id=current AND devid=current AND
user=current");
unset($client_change[$row]);
}
}
// collect most current md5sum (=object state) on the client
// (the client could change an object multiple times
while a big sync _to_ the client is running)
// the $client_changes array is already sorted by modseq
// therefore we have the latest object state of the client at the
end of the loop
$latest_md5 = array();
foreach($client_changes as $client_change) {
$latest_md5[$client_change['object_uid']] = $client_change['md5'];
}
// final check if server change originated on the client
foreach($changes as $id => $change) {
$object_uid = $change['object_uid'];
if (isset($latest_md5[$object_uid])) {
$object = fetch_from_backend($object_uid);
$server_md5 = md5(array_sort($object));
if ($server_md5 == $latest_md5['object_uid']) {
Horde::log("Ignoring PIM initiated change for:
object_uid, modseq, md5");
unset($changes[$id]);
}
}
}
// get all remaining changes to send
$objects = fetch_from_backend($changes);
send_to_client($objects);
}
-----------------------------
As Michael pointed out, my first implementation is not perfect
since it currently needs another roundtrip through the API.
We try to move this to the point were we need to fetch the object anyway.
The new code simplifies garbage collection of the syncMap, too
since it's self-cleaning as soon as an entry is not needed anymore.
During normal work loads (=changes coming in from the server),
the syncMap is stays empty and no md5 checksumming is done at all.
We only need to checksum changes initiated by the client.
Priority ⇒ 1. Low
Type ⇒ Bug
Summary ⇒ AS: Unlikely race condition leads to data loss
Queue ⇒ Synchronization
Milestone ⇒
Patch ⇒ No
State ⇒ Unconfirmed
after intensive torture testing the ActiveSync code and day-long
email exchange with Michael ([mjr]) (thanks for your help!)
I figured out this scenario were the current sync engine
might skip a change on the server:
----------------------------------------------------------------
(1) Client sends a SYNC request to the server.
It notes 10 as the last used modseq in the syncState.
(2) Client sends a change to the server. We import the change
via _as->driver->ChangeMessage(). Internally that function
is split into non-atomic parts (2a) and (2b).
(2a) We store the change via the API into the backend.
The modseq stored in the history table is 13.
(x) Someone else does a change to the same object.
The modseq stored in the history table is 16.
(2b) ChangeMessage() finally calls smartStatMessage() on "our"
change from (2a). We will return 16 as the modseq.
(3) We insert the sync_key, the returned modseq and the object_uid
into the syncMap.
This will insert the modseq 16 into the syncMap.
(4) The client sends his SYNC request.
The last sync request was at 10, so it will grab
all changes between 10 and 16.
(5) The server part sees the MODIFY change for the object_uid sent in (2)
It will call $backend->statMessage() and return 16 for this message.
getPIMChangeTS(object_uid) will also return 16.
As $pim_ts(=16) is >= $stat['mod'](=16) we ignore the
change as originating from the PIM even though another
user changed the object.
----------------------------------------------------------------
The root issue here is that the API and the horde apps don't
transport the modseq value after doing a change in Horde_History.
Transporting this value back to the ActiveSync layer is complex
surgery and not doable in a BC way. We needed to come up with another
solution.
I'll add the proposed solution in a separate message.
Thomas