Summary | Add modification sequence support to Horde_History |
Queue | Horde Framework Packages |
Queue Version | Git master |
Type | Enhancement |
State | Resolved |
Priority | 1. Low |
Owners | mrubinsk (at) horde (dot) org |
Requester | thomas.jarosch (at) intra2net (dot) com |
Created | 04/11/2013 (4504 days ago) |
Due | |
Updated | 06/10/2013 (4444 days ago) |
Assigned | 04/12/2013 (4503 days ago) |
Resolved | 06/10/2013 (4444 days ago) |
Milestone | |
Patch | No |
State ⇒ Resolved
Priority ⇒ 1. Low
Type ⇒ Enhancement
Queue ⇒ Horde Framework Packages
Version ⇒ Git master
> Regarding the fudge factor, if we set it to f.e. ten seconds in the
> past, that would also mean changes need at least ten seconds until
> they are visible to the syncing device. Not pretty but much better
> than data loss.

That is roughly how the PING heartbeat already works. So the actual
delay with the new code could be anywhere from $interval + 0 seconds
(if the change occurs right at the end of a heartbeat) to $interval +
10 (if it occurs right at the beginning). $interval, by default, is
something like 10 seconds IIRC, and is configurable in the activesync
config.
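
A quick worked example of those bounds (a sketch; both 10 second
values are just the defaults mentioned above):

    $interval  = 10;  // heartbeat length in seconds (activesync config)
    $fudge     = 10;  // how far end_ts is moved into the past
    $change    = 100; // history entry written at t=100
    $heartbeat = 105; // next heartbeat expiry after the change
    while ($heartbeat - $fudge < $change) {
        $heartbeat += $interval; // not visible yet; wait one more beat
    }
    echo $heartbeat - $change;   // 15: between $fudge and $fudge + $interval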
> So, we'd need to add new methods to both Horde_Core_ActiveSync_Driver,
> Horde_Core_ActiveSync_Connector and EACH groupware application's API,
> and each library would need to perform multiple is_callable() checks
> to sniff out what data to present. What happens if somebody upgrades
> only Turba, for example, and not Kronolith? We'd have a mix of
> timestamps and counters with no real way for the AS library to know
> which we need to store/send. This is a REALLY ugly solution.

Better to keep the existing behavior for the current release and
implement this in a new branch to be merged for 6.0.
Regarding the fudge factor, if we set it to f.e. ten seconds in the
past, that would also mean changes need at least ten seconds until
they are visible to the syncing device. Not pretty but much better
than data loss.
Keep in mind that Horde_ActiveSync knows nothing about application
names/versions etc... There is nothing preventing it from being used
against some other, non-horde, groupware stack.
> An ugly kludge would be to add a function "has_sequence_support()"
> and check for that function name in Horde_ActiveSync.
That is something we only normally do when it is absolutely necessary,
i.e., when it can't wait until the next major version number. The
problem in this case is the changes are going to need to be too spread
out. Is it doable? Probably, but it will be an ugly, invasive, error
prone change:
(1) Horde_ActiveSync will need to be extended to store (and use) the
new counter in addition to the lastsync timestamp for ping detection.
I.e., Horde_ActiveSync is responsible for storing the state data.
(2) Horde_Core - which is where the getChanges() functionality lives -
will need to know about the new counter, and then need to know to
call the correct method in each individual application's APIs to fetch
changes. I.e., Horde_Core is responsible for reporting back changes to
the ActiveSync library from the various groupware apps.
(3) Groupware apps - Each app will need to be extended to accept the
new counter in lieu of (or in addition to) the timestamp.
To maintain BC, we need to ensure that upgrading (1) or (2) without
upgrading (3) does not cause anything to break. When upgrading (3), we
can always require newer versions of (1) and (2). The reverse is not
true.
So, we'd need to add new methods to both Horde_Core_ActiveSync_Driver,
Horde_Core_ActiveSync_Connector and EACH groupware application's API,
and each library would need to perform multiple is_callable() checks
to sniff out what data to present. What happens if somebody upgrades
only Turba, for example, and not Kronolith? We'd have a mix of
timestamps and counters with no real way for the AS library to know
which we need to store/send. This is a REALLY ugly solution.
The new methods would only be a stopgap, and the extra method names
and matching is_callable() checks would be removed in 6.0. Though the
more I think about what would be required to do this correctly,
waiting until 6.0 is really the best option.
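
For illustration, the sniffing forced onto the Core layer might look
like this (a sketch only; getChangesBySequence() is an invented name,
not a real API method):

    // Hypothetical capability sniffing in Horde_Core:
    if (is_callable(array($api, 'getChangesBySequence'))) {
        // Upgraded app: a monotonic counter is available.
        $changes = $api->getChangesBySequence($lastSeenSeq);
    } else {
        // Old app: timestamps only. The AS state engine now has to
        // store a counter for some collections and a timestamp for
        // others, with no reliable way to tell them apart later.
        $changes = $api->getChanges($lastSyncTs, $endTs);
    }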
Like checking the API version of an app, or checking whether a
function exists?
An ugly kludge would be to add a function "has_sequence_support()"
and check for that function name in Horde_ActiveSync.
That could be removed in 5.1 again.
-----------------------------------
> - Add sequence (=monotonic counter) support to Horde_DB.
>   ...if they can be used on rows outside of the primary key.
> - Store last seen sequence number for each device
> - Use new API::getChanges($last_seen_seq, null, true) API
>   to collect the changes on the server
Horde_ActiveSync would require an updated version of the groupware
apps. We can make the applications require newer versions of
libraries, but we can't do the opposite.
-----------------------------------
- Add sequence (=monotonic counter) support to Horde_DB.
SQLite and MySQL don't support sequences; an example of race-free
emulation can be found in PEAR's DB package.
If this is too much effort, we can implement it in Horde_History
using one extra table with a primary key and use the last insert id
as sequence number. The added row is immediately deleted again
after the insert. That's how the emulation works anyway. Probably
go this route (see the first sketch after this list).
- Add a new unsigned integer column "history_seq" to the 'horde_histories' table.
Every manipulation of a history entry increases this value.
- Horde_History
- New API call: function getbySequenceId($cmp, $seq, array
$filters = array(), $parent = null)
- Each Horde_History_Log entry has a new field 'seq'
- Adapt getActionTimestamp() to check the sequence number
  if multiple entries for a guid have the same timestamp
- Extend the API of the following applications:
- mnemo
- nag
- kronolith
- turba
API::getChanges($start, $end)
--> API::getChanges($start, $end, $ts_is_sequence=false)
API::listBy($action, $timestamp, $notepad = NULL, $end = NULL)
--> API::listBy($action, $timestamp, $notepad = NULL, $end = NULL,
    $ts_is_sequence=false)
- Adapt ActiveSync code (see the second sketch after this list)
  - Store last seen sequence number for each device
  - Use the new API::getChanges($last_seen_seq, null, true) call
    to collect the changes on the server
- Provide db update scripts
- Unit tests for new Horde_History sequence code
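
Two sketches of how this could look. First, the insert/delete sequence
emulation described above (plain PDO for brevity; table and column
names are placeholders, not a final schema, and the DDL is SQLite
syntax):

    -- CREATE TABLE horde_histories_modseq (
    --     history_modseq INTEGER PRIMARY KEY AUTOINCREMENT,
    --     modseq_dummy   INTEGER NOT NULL
    -- );

    function nextModSeq(PDO $db)
    {
        // The insert hands out a race-free, strictly increasing id.
        $db->exec('INSERT INTO horde_histories_modseq (modseq_dummy) VALUES (0)');
        $seq = (int)$db->lastInsertId();
        // Only the generated id matters; drop everything older. Keeping
        // the newest row avoids the auto-increment counter resetting.
        $db->exec('DELETE FROM horde_histories_modseq WHERE history_modseq < ' . $seq);
        return $seq;
    }

Second, how the pieces would fit together on the ActiveSync side,
using the signatures proposed above (the $state calls are invented for
illustration):

    // 1) Stamp every history modification with the next sequence
    //    number, stored in history_seq alongside history_ts.
    $seq = nextModSeq($db);

    // 2) On sync, ask for changes by sequence instead of timestamp.
    $lastSeenSeq = $state->getLastSequence();              // invented
    $changes = $api->getChanges($lastSeenSeq, null, true); // $ts_is_sequence

    // 3) Remember the highest 'seq' field seen in $changes.
    $state->setLastSequence($highestSeq);                  // invented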
-----------------------------------
Let me know what you think. I'll be gone now till Monday.
> The database would block query requests until it's done swapping and
> the pending writes are committed anyway.
We are talking about an SQL environment in this case, so let's focus
on that :)
Your assumption about the problem is correct. Think about how the
history backend works: If no timestamp is given to log(), it will use
the current time as timestamp.
In the worst case the time span between the time() call and the actual
INSERT/UPDATE query might range from a few seconds to ages.
The Horde_History::log() function works like this (simplified):
- get the complete history for $guid
- set ts to time() if no timestamp is given
- issue an UPDATE sql statement if replaceAction=true
for every matching row from the previously retrieved entries
OR:
INSERT new entry into the database
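
As a sketch (not the actual library code; getHistory() stands in for
the real lookup), with the race window marked:

    function log($guid, array $attributes = array(), $replaceAction = false)
    {
        $history = getHistory($guid);      // 1) read existing entries
        if (empty($attributes['ts'])) {
            $attributes['ts'] = time();    // 2) timestamp chosen here
        }
        // <-- swapping / scheduling delays can stall the script here -->
        if ($replaceAction) {
            // 3a) UPDATE every matching row retrieved in step 1
        } else {
            // 3b) INSERT a new history entry
        }
        // By the time the write commits, 'ts' may already lie inside a
        // window that a concurrent ActiveSync query has finished with.
    }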
The OS might interrupt / slow down the execution of the script
updating the history and let another process instance (the ActiveSync
server) execute first / faster. Increasing the fudge factor will only
hide the problem and will break sooner or later.
The history backend needs a monotonic counter in addition to the
timestamp. The timestamps are good for filtering on "give me all
calendar entries changed in the last two weeks" but are not reliable
for synchronization.
The monotonic counter needs to increase on every history database
modification. We then save the last seen counter value in the
"ActiveSync sync state" and can use this to look up new changes.
I'll have a look at the APIs involved and try to come up with a BC way
to modify the API.
The database would block query requests until it's done swapping and
the pending writes are committed anyway.

Time to pay attention to my day job... :)
> ...a low end server swapping for a few minutes). The "endTS - fudge
> factor" approach is safe.

Isn't it subject to the same issue from a slow/swapping server?
Regardless of which code adjusts the timestamp, there is still the
chance that the query will run before the data is available in history.
E.g., history writes a bunch of entries at 12345, but due to swapping,
it's not actually available in the history backend until 12350. In the
meantime, at 12347 the activesync code is querying for changes (so it
is asking up until, let's say, 12346). The changes will still never be
picked up because the next activesync query will have ts_from of 12346.
State ⇒ Assigned
> ...a low end server swapping for a few minutes). The "endTS - fudge
> factor" approach is safe.

> Maybe some "fudge factor" added to the timestamp before it is written
> by the history system to be sure it's in the future?
Not under any realistic load (short of something like a low end server
swapping for a few minutes). The "endTS - fudge factor" approach is
safe.

Keeping such a list means a lot of bookkeeping: tracking the entries,
checking the entries, expiring the entries... On the
other hand, we already have code in place to exclude sending changes
to the client that were the result of changes DUE to the client. E.g.,
changing a contact on the device still causes a history entry to be
created, but we must ignore that entry since the client is the one who
made the change in the first place. Perhaps if we need to go this
route, we could utilize the same system for this as well. Again, I'm
saying all of this without looking at code...
"start_ts" is higher than the last noted end_ts.
That still leaves one problem: If an object gets changed twice
during the same second and a client already saw the first change, it
will miss the second change. This risk is rather low but it will
happen sooner or later.
up with a simpler solution...
I still think having the history system increment the ts before
writing it, or activesync decrementing the end_ts before querying, is
the way to go. There will still be a slight chance of this happening,
but if the history system can't guarantee an entry is available within
something like 10 to 20 seconds of the time it is stamped with, then
there are bigger problems with the API itself here...
We lower the current sync timestamp by one second, and this ensures
we only pick up changes that are done (the past is the past), except
for clock skews on the server...
Then we just need to make sure we get the interval window right ('>' vs. '>=')
so we don't pick up entries twice. Since we can't modify the getChanges()
API behavior of the horde apps, we need to +1 or -1 in the ActiveSync module.
Done right, the change itself would be rather small.
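
A sketch of that adjustment on the ActiveSync side (assuming
getChanges() keeps its current exclusive "ts > start AND ts < end"
behavior):

    // Lower the end of the window into the past; that second is over:
    $thisSyncTS = time() - 1;
    // Pass end+1 so entries stamped exactly $thisSyncTS are included:
    $changes = $api->getChanges($lastSyncTS, $thisSyncTS + 1);
    // Persist $thisSyncTS as the next $lastSyncTS: entries at exactly
    // $thisSyncTS were delivered above and the exclusive '>' keeps
    // them out of the next window.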
> If we now also store which highest 'history_id' we've seen during the
> last sync, it will act as a monotonic counter. We can then filter the
> database query result by cutting off everything below the last seen
> history_id.
Anyway, I don't think this approach is the way to go. I feel this is
more of a work-around than a solution. As explained in my response,
this would require client knowledge of the internals of the history
backend or would require any history backend to expose another
numerically incrementing counter - neither of which is a current
requirement.
Also, this duplicates the functionality of timestamps, which are
essentially a numerically incrementing value. I think the correct way
to fix this would be to ensure that entries are guaranteed available
by the timestamp that it is recorded with. Maybe some "fudge factor"
added to the timestamp before it is written by the history system to
be sure it's in the future? A second choice, if this is not desired
for some reason, would be to decrement the end_ts by some fudge factor
to ensure any newly written changes are not occurring in the currently
queried slice of time.
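
Both variants in sketch form (the $fudge value here is arbitrary):

    // (a) History side: stamp entries far enough in the future that
    //     they are committed before any query window reaches them.
    $attributes['ts'] = time() + $fudge;

    // (b) ActiveSync side: only query up to a point safely in the
    //     past, so in-flight writes cannot land inside the window.
    $end_ts = time() - $fudge;
    $changes = $api->getChanges($start_ts, $end_ts);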
> The other possibility would be to somehow track the most recent
> history id detected. Though I really don't like this approach since
> it means the client code would need knowledge of how the history
> system functions - and it assumes any history backend would both
> have some unique entry identifier, and those identifiers are numeric
> (or at least increase in some numeric-like fashion).
In the Kolab world, an object can disappear and re-appear. So a
history database entry with a low id might have a current timestamp
and therefore be missed again.
> Both approaches would require some API changes to both the
> activesync code as well as the history code in order to provide the
> needed data - and such changes would need to maintain BC.
We could keep a list of all UIDs transmitted within the last sync
timestamp window (start_ts <= x <= end_ts). We would
also need to keep this list for every action (add, modify, delete).
We can start expiring entries in this list once the current "start_ts"
is higher than the last noted end_ts.
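
Sketched out, that bookkeeping might look like this (the structure is
invented for illustration):

    // Remember what was sent for the last window, per action:
    $sentLast = array(
        'end_ts' => $end_ts,
        'add'    => $addUids,
        'modify' => $modifyUids,
        'delete' => $deleteUids,
    );

    // Next sync: query inclusively, then drop anything already sent.
    foreach (array('add', 'modify', 'delete') as $action) {
        $changes[$action] = array_diff($changes[$action], $sentLast[$action]);
    }

    // Expire the list once $start_ts > $sentLast['end_ts'].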
That still leaves one problem: If an object gets changed twice during
the same second and a client already saw the first change, it will
miss the second change. This risk is rather low but it will happen
sooner or later.
Only a monotonic counter will solve this. Let's see if we can come up
with a simpler solution...
Queue ⇒ Synchronization
Assigned to Michael Rubinsky
State ⇒ Feedback
> I'm not sure if the listBy() / getChanges() horde app API should
> include a change if it's exactly on the desired end timestamp or not
> (history_ts <= $end_ts). If not, we should do a "thisSyncTS+1" query
> in the ActiveSync backend. The change would then be included since
> the start_ts is always excluded.
Ideally, all history entries for a single operation
should be written as an atomic action, and be guaranteed to be
completed by the indicated ts. This is obviously not the case here
though. We could change both the start and end ts to be inclusive in
the query, though as you said, this would lead to changes potentially
being picked up more than once. We could possibly keep track of the
UIDs of messages that have been changed on the last SYNC response, and
exclude them if they occurred on the exact ts boundary. I have no idea
if the current code would support this though (I don't have access to
code at the moment).
The other possibility would be to somehow track the most recent
history id detected. Though I really don't like this approach since it
means the client code would need knowledge of how the history system
functions - and it assumes any history backend would both have some
unique entry identifier, and those identifiers are numeric (or at
least increase in some numeric-like fashion).
Both approaches would require some API changes to both the activesync
code as well as the history code in order to provide the needed data -
and such changes would need to maintain BC.
Adapt the query for the history to "history_ts >= $start_ts AND
history_ts <= $end_ts". This might list entries already shown to the
client. If we now also store which highest 'history_id' we've seen
during the last sync, it will act as a monotonic counter. We can then
filter the database query result by cutting off everything below the
last seen history_id.
Priority ⇒ 1. Low
State ⇒ Unconfirmed
Patch ⇒ No
Milestone ⇒
Summary ⇒ ActiveSync misses changes from history database
Type ⇒ Bug
Queue ⇒ Horde Framework Packages
I've just discovered a problem in the way ActiveSync gets the changes
from the history database. I had a Kolab based contacts folder with 30
contacts and deleted them all at once.
During that time an iPad and some other devices were syncing in the
background.
Checking the iPad later on, seven contacts still existed on the device. Oops.
The server is a rather slow VM and it took a few seconds to delete all
contacts. Our ActiveSync implementation internally uses
getServerChanges($start_ts, $end_ts). This boils down to a simplified
history database query like
"select * from horde_histories WHERE history_ts > $start_ts AND
history_ts < $end_ts"
I've picked one still-existing contact and dumped the history for it:
history_id|object_uid|history_action|history_ts|history_desc|history_who|history_extra
759|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877dc-d794-4d57-aae8-60b3ac100184|add|1365665371||mueller|
812|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877dc-d794-4d57-aae8-60b3ac100184|delete|1365669188||mueller|
Relevant getServerChanges() calls from the log:
2013-04-11T10:32:59+02:00 DEBUG: [2253]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365665778,
1365669179, 0, 1)
2013-04-11T10:32:59+02:00 DEBUG: [3473]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365665778,
1365669179, 0, 0)
2013-04-11T10:33:02+02:00 DEBUG: [2253]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669179,
1365669182, 0, 1)
2013-04-11T10:33:03+02:00 DEBUG: [3473]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669179,
1365669183, 0, 0)
2013-04-11T10:33:07+02:00 DEBUG: [2253]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669183,
1365669187, 0, 1)
2013-04-11T10:33:08+02:00 DEBUG: [3473]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669183,
1365669188, 0, 0)
2013-04-11T10:33:11+02:00 DEBUG: [2253]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188,
1365669191, 0, 1)
2013-04-11T10:33:16+02:00 DEBUG: [2253]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188,
1365669196, 0, 1)
2013-04-11T10:33:21+02:00 DEBUG: [2253]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188,
1365669201, 0, 1)
2013-04-11T10:33:26+02:00 DEBUG: [2253]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188,
1365669206, 0, 1)
2013-04-11T10:33:31+02:00 DEBUG: [2253]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188,
1365669211, 0, 1)
2013-04-11T10:33:36+02:00 DEBUG: [2253]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188,
1365669216, 0, 1)
2013-04-11T10:33:41+02:00 DEBUG: [2253]
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188,
1365669221, 0, 1)
...
As you can see it queries for "ts > 1365669183 and ts < 1365669188".
Next time it queries for "> 1365669188 and < 1365669191". Since the
change happened exactly on 1365669188, the history query missed it.
I'm not sure if the listBy() / getChanges() horde app API should
include a change if it's exactly on the desired end timestamp or not
(history_ts <= $end_ts). If not, we should do a "thisSyncTS+1" query
in the ActiveSync backend.
But there is more :) Here's a short dump from my history database:
sqlite> select * from horde_histories where history_ts >= 1365669183
and history_ts <= 1365669188;
history_id|object_uid|history_action|history_ts|history_desc|history_who|history_extra
800|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877dc-668c-44ce-b0c2-60b3ac100184|delete|1365669183||mueller|
801|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877de-3368-4245-80e2-60b3ac100184|delete|1365669183||mueller|
802|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877d6-2dfc-4b22-aa60-60b3ac100184|delete|1365669183||mueller|
803|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877dd-c208-4ae6-9f64-60b3ac100184|delete|1365669184||mueller|
804|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877df-2fdc-4685-92b7-60b3ac100184|delete|1365669184||mueller|
805|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877d7-8e9c-4bfb-8ce4-60b3ac100184|delete|1365669185||mueller|
806|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877d4-e070-410d-9205-60b3ac100184|delete|1365669185||mueller|
807|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877e0-1584-4846-82c7-60b3ac100184|delete|1365669186||mueller|
808|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877de-b8d4-4a8d-8386-60b3ac100184|delete|1365669186||mueller|
809|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877d4-eaf4-4a74-9146-60b3ac100184|delete|1365669187||mueller|
810|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877d9-8c1c-4d62-9621-60b3ac100184|delete|1365669187||mueller|
811|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877df-8aac-402e-8a4e-60b3ac100184|delete|1365669188||mueller|
812|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877dc-d794-4d57-aae8-60b3ac100184|delete|1365669188||mueller|
As you can see there are several entries with the same change
timestamp. An ActiveSync client might query for "history_ts <=
1365669188" and only see the first entry (id: 811). The delete request
via the web UI is still running and adds the second delete entry to
the history database, also with ts = 1365669188 (id: 812).
The ActiveSync client will miss this one as it already queried for all
changes up to 1365669188. Classic race condition :)
Any idea how to fix this one? A monotonic counter would be best but
that's a rather intrusive change.
If we change the history database query to "history_ts >= start_ts AND
history_ts <= end_ts", we might deliver some changes twice to the
client. That must be avoided, too.
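
To make the dilemma concrete with the dump above (illustrative queries
only, no new data):

    -- Strict boundaries, as used today: entries on both edges fall
    -- outside this window (ids 800-802 and 811-812 are skipped):
    select history_id from horde_histories
     where history_ts > 1365669183 and history_ts < 1365669188;

    -- Inclusive boundaries catch them, but the next window starts at
    -- 1365669188 again and re-delivers ids 811 and 812:
    select history_id from horde_histories
     where history_ts >= 1365669188 and history_ts <= 1365669191;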
Cheers,
Thomas