6.0.0-beta1
8/10/25

[#12175] Add modification sequence support to Horde_History
Summary Add modification sequence support to Horde_History
Queue Horde Framework Packages
Queue Version Git master
Type Enhancement
State Resolved
Priority 1. Low
Owners mrubinsk (at) horde (dot) org
Requester thomas.jarosch (at) intra2net (dot) com
Created 04/11/2013 (4504 days ago)
Due
Updated 06/10/2013 (4444 days ago)
Assigned 04/12/2013 (4503 days ago)
Resolved 06/10/2013 (4444 days ago)
Milestone
Patch No

History
06/10/2013 12:39:55 PM Thomas Jarosch Comment #23
Implemented in Git.
w00t! :) Very nice.

06/10/2013 04:19:18 AM Michael Rubinsky Comment #22
State ⇒ Resolved
Implemented in Git.
04/22/2013 04:23:13 PM Michael Rubinsky State ⇒ Accepted
Priority ⇒ 1. Low
Type ⇒ Enhancement
Queue ⇒ Horde Framework Packages
Version ⇒ Git master
 
04/22/2013 04:22:40 PM Michael Rubinsky Summary ⇒ Add modification sequence support to Horde_History
 
04/18/2013 04:57:10 PM Michael Rubinsky Comment #21
Regarding the fudge factor, if we set it to e.g. ten seconds in the
past, that would also mean changes need at least ten seconds until
they are visible to the syncing device. Not pretty, but much better
than data loss.
There is already a built-in wait due to the way the sync/heartbeat
works. So the actual delay with the new code could be anywhere from
$interval + 0 seconds (if the change occurs right at the end of a
heartbeat) to $interval + 10 (if it occurs at the beginning of one).
$interval is, by default, something like 10 seconds IIRC and is
configurable in the ActiveSync config.


04/18/2013 07:24:24 AM Thomas Jarosch Comment #20
So, we'd need to add new methods to Horde_Core_ActiveSync_Driver
and Horde_Core_ActiveSync_Connector, as well as to EACH groupware
application's API, and each library would need to perform multiple
is_callable() checks to sniff out what data to present. What happens
if somebody upgrades only Turba, for example, and not Kronolith?
We'd have a mix of timestamps and counters with no real way for the
AS library to know which we need to store/send. This is a REALLY
ugly solution.
Yep, that's really messy. So let's go the fudge factor route for the 
current release and implement this in a new branch to be merged for 6.0.

Regarding the fudge factor, if we set it to e.g. ten seconds in the
past, that would also mean changes need at least ten seconds until
they are visible to the syncing device. Not pretty, but much better
than data loss.

04/15/2013 02:21:28 PM Michael Rubinsky Comment #19

No, not a version check, because the ActiveSync library knows nothing 
about application names/versions etc... There is nothing preventing it 
from being used against some other, non-horde, groupware stack.
An ugly kludge would be to add a function "has_sequence_support()"
and check for that function name in Horde_ActiveSync.
Generally speaking, this is the only possible solution, and is what we
normally do when it is absolutely necessary, i.e., when we can't wait
until the next major version number. The problem in this case is that
the changes are going to need to be too spread out. Is it doable?
Probably, but it will be an ugly, invasive, error-prone change:

(1) Horde_ActiveSync will need to be extended to store (and use) the 
new counter in addition to the lastsync timestamp for ping detection. 
I.e., Horde_ActiveSync is responsible for storing the state data.

(2) Horde_Core - which is where the getChanges() functionality lives -
will need to know about the new counter, and then need to know to
call the correct method in each individual application's APIs to fetch
changes. I.e., Horde_Core is responsible for reporting back changes to
the ActiveSync library from the various groupware apps.

(3) Groupware apps - Each app will need to be extended to accept the 
new counter in lieu of (or in addition to) the timestamp.

To maintain BC, we need to ensure that upgrading (1) or (2) without
upgrading (3) does not cause anything to break. When upgrading (3), we
can always require newer versions of (1) and (2). The reverse is not
true.

So, we'd need to add new methods to Horde_Core_ActiveSync_Driver
and Horde_Core_ActiveSync_Connector, as well as to EACH groupware
application's API, and each library would need to perform multiple
is_callable() checks to sniff out what data to present. What happens
if somebody upgrades only Turba, for example, and not Kronolith?
We'd have a mix of timestamps and counters with no real way for the
AS library to know which we need to store/send. This is a REALLY
ugly solution.
That could be removed in 5.1 again.
No, if this were implemented, it wouldn't be implemented until 5.1 - 
and the extra method names and matching is_callable() checks would be 
removed in 6.0. Though the more I think about what would be required 
to do this correctly, waiting until 6.0 is really the best option.
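To make the mixed-upgrade mess concrete, here is an illustrative sketch (in Python rather than Horde's PHP; the class and method names are hypothetical stand-ins for the is_callable() checks described above). With a mix of upgraded and un-upgraded apps, the sync layer is forced to carry BOTH a timestamp and a counter in its state:

```python
class OldBackend:
    """An app that only knows timestamps (e.g. an un-upgraded app)."""
    def get_changes(self, start_ts, end_ts):
        return [("event1", 1365669188)]

class NewBackend:
    """An upgraded app that also exposes a modification sequence."""
    def get_changes(self, start_ts, end_ts):
        return [("contact1", 1365669188)]
    def get_changes_by_sequence(self, last_seq):
        return [("contact1", 42)]

def collect_changes(backends, last_ts, last_seq):
    # The sync layer must sniff each backend's capabilities
    # (hasattr() here plays the role of PHP's is_callable()),
    # so it has to track both a timestamp and a counter.
    changes = []
    for b in backends:
        if hasattr(b, "get_changes_by_sequence"):
            changes += b.get_changes_by_sequence(last_seq)
        else:
            changes += b.get_changes(last_ts, None)
    return changes
```

The sketch shows why this is ugly: the capability check must be repeated at every layer that sits between the state storage and the application APIs.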



04/15/2013 11:03:15 AM Thomas Jarosch Comment #18

Is there a usual route in horde how to work around this?

Like checking the API version of an app or check if a function exists?
An ugly kludge would be to add a function "has_sequence_support()"
and check for that function name in Horde_ActiveSync.
That could be removed in 5.1 again.

04/12/2013 04:29:14 PM Michael Rubinsky State ⇒ Feedback
 
04/12/2013 01:16:28 PM Michael Rubinsky Comment #17
First draft of needed changes:
-----------------------------------
- Add sequence (=monotonic counter) support to Horde_DB.
Horde_Db already supports sequences, though I'd have to double check 
if they can be used on rows outside of the primary key.
- Adapt ActiveSync code
     - Store last seen sequence number for each device
     - Use new API::getChanges($last_seen_seq, null, true) API
       to collect the changes on the server
This is a BC break. This would mean the new version of
Horde_ActiveSync would require an updated version of the groupware
apps. We can make the applications require newer versions of
libraries, but we can't do the opposite.
04/12/2013 12:24:13 PM Thomas Jarosch Comment #16
First draft of needed changes:
-----------------------------------
- Add sequence (=monotonic counter) support to Horde_DB.

   sqlite and MySQL don't support sequences; an example of
   race-free emulation can be found in PEAR's DB package.

   If this is too much effort, we can implement it in Horde_History
   using one extra table with a primary key and use the last insert id
   as the sequence number. The added row is immediately deleted again
   after the insert. That's how the emulation works anyway. Probably
   go this route.

- Add a new unsigned integer column "history_seq" to the 'horde_histories' table.
   Every manipulation of a history entry increases this value.

- Horde_History
     - New API call: function getbySequenceId($cmp, $seq, array $filters = array(), $parent = null)
     - Each Horde_History_Log entry has a new field 'seq'
     - Adapt getActionTimestamp() to check the sequence number
       if multiple entries for a guid have the same timestamp

- Extend the API of the following applications:
     - mnemo
     - nag
     - kronolith
     - turba

     API::getChanges($start, $end)
         --> API::getChanges($start, $end, $ts_is_sequence=false)

     API::listBy($action, $timestamp, $notepad = NULL, $end = NULL)
         --> API::listBy($action, $timestamp, $notepad = NULL, $end = NULL, $ts_is_sequence=false)

- Adapt ActiveSync code
     - Store last seen sequence number for each device
     - Use new API::getChanges($last_seen_seq, null, true) API
       to collect the changes on the server

- Provide db update scripts
- Unit tests for new Horde_History sequence code
-----------------------------------

Let me know what you think. I'll be gone now till Monday.
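The sequence emulation sketched in the draft (insert a row into a one-column auto-increment table, take the last insert id, delete the row again) can be illustrated like this. This is a hedged Python/SQLite sketch, not Horde code; the table name is hypothetical:

```python
import sqlite3

# One extra table whose sole purpose is handing out monotonic ids.
# SQLite's AUTOINCREMENT guarantees ids keep increasing even after
# the rows are deleted, which is exactly what the emulation needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE horde_histories_seq "
    "(seq INTEGER PRIMARY KEY AUTOINCREMENT)"
)

def next_sequence(conn):
    # The database assigns the id, so this is race-free across
    # concurrent writers; the row itself is immediately discarded.
    cur = conn.execute("INSERT INTO horde_histories_seq DEFAULT VALUES")
    seq = cur.lastrowid
    conn.execute("DELETE FROM horde_histories_seq WHERE seq = ?", (seq,))
    return seq
```

On MySQL the same pattern works with an AUTO_INCREMENT column and LAST_INSERT_ID(); the key point is that the id is assigned inside the database, not computed by the client.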

04/12/2013 08:06:01 AM Thomas Jarosch Comment #15
Nevermind. I guess the swapping server wouldn't be answering the 
query requests until it's done swapping and the pending writes are 
committed anyway.
A swapping server is just a specialization of a multitasking 
environment in this case, so let's focus on that :)

Your assumption about the problem is correct. Think about how the 
history backend works: If no timestamp is given to log(), it will use 
the current time as timestamp.
In the worst case, the time span between the time() call and the actual
INSERT/UPDATE query might range from a few seconds to ages.

The Horde_History::log() function works like this (simplified):
- get the complete history for $guid
- set ts to time() if no timestamp is given
- issue an UPDATE sql statement if replaceAction=true
   for every matching row from the previously retrieved entries

   OR:
   INSERT new entry into the database


The OS might interrupt / slow down the execution of the script 
updating the history and let another process instance (the ActiveSync 
server) execute first / faster. Increasing the fudge factor will only 
hide the problem and will break sooner or later.

The history backend needs a monotonic counter in addition to the 
timestamp. The timestamps are good for filtering on "give me all 
calendar entries changed in the last two weeks" but are not reliable 
for synchronization.

The monotonic counter needs to increase on every history database 
modification. We then save the last seen counter value in the 
"ActiveSync sync state" and can use this to look up new changes.

I'll have a look at the APIs involved and try to come up with a BC way 
to modify the API.
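The counter idea above can be sketched in a few lines (illustrative Python, not Horde's PHP; all names are hypothetical): every history write gets the next counter value, and the sync state stores only the highest value a device has seen. Two changes within the same second still get distinct counter values, so neither can be missed:

```python
history = []
_counter = 0  # the monotonic counter, incremented on every write

def log_change(uid, action, ts):
    global _counter
    _counter += 1
    history.append({"seq": _counter, "uid": uid, "action": action, "ts": ts})

def get_changes_since(last_seen_seq):
    # The device's sync state holds last_seen_seq; timestamps are no
    # longer used for change detection, only for display/filtering.
    return [h for h in history if h["seq"] > last_seen_seq]

log_change("contact1", "modify", 1365669188)
log_change("contact2", "delete", 1365669188)  # same second, distinct seq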

04/11/2013 04:37:53 PM Michael Rubinsky Comment #14
Nevermind. I guess the swapping server wouldn't be answering the query 
requests until it's done swapping and the pending writes are committed 
anyway.

Time to pay attention to my day job.. :)
04/11/2013 04:25:01 PM Michael Rubinsky Comment #13
This will miss changes again if the server is too slow (think about a
low end server swapping for a few minutes). The "endTS - fudge
factor" approach is safe.
True. Sounds like a plan :)
Actually, thinking about this a bit more, won't either approach be
subject to the same issue from a slow/swapping server?

Regardless of which code adjusts the timestamp, there is still the 
chance that the query will run before the data is available in history.

E.g., history writes a bunch of entries at 12345, but due to swapping, 
it's not actually available in the history backend until 12350. In the 
meantime, at 12347 the activesync code is querying for changes (so it 
is asking up until lets say 12346). The changes will still never be 
picked up because the next activesync query will have ts_from of 12346.

04/11/2013 04:16:57 PM Michael Rubinsky Comment #12
btw: Discussing this on IRC might have been less chaotic ;)
True, but I don't have access to IRC from my current work
location :/

04/11/2013 03:18:14 PM Thomas Jarosch Comment #11
btw: Discussing this on IRC might have been less chaotic ;)

04/11/2013 03:16:34 PM Michael Rubinsky Comment #10
State ⇒ Assigned
This will miss changes again if the server is too slow (think about 
a low end server swapping for a few minutes). The "endTS - fudge 
factor" approach is safe.
True. Sounds like a plan :)

04/11/2013 03:13:10 PM Thomas Jarosch Comment #9
Maybe some "fudge factor"
added to the timestamp before it is written by the history system to
be sure it's in the future?
This will miss changes again if the server is too slow (think about a 
low end server swapping for a few minutes). The "endTS - fudge factor" 
approach is safe.

04/11/2013 03:10:39 PM Michael Rubinsky Comment #8

Yeah, though this seems like a lot of extra overhead, between creating 
the entries, checking the entries, expiring the entries... On the 
other hand, we already have code in place to exclude sending changes 
to the client that were the result of changes DUE to the client. E.g., 
changing a contact on the device still causes a history entry to be 
created, but we must ignore that entry since the client is the one who 
made the change in the first place. Perhaps if we need to go this 
route, we could utilize the same system for this as well. Again, I'm 
saying all of this without looking at code...
We can start expiring entries in this list once the current 
"start_ts" is higher than the last noted end_ts.

That still leaves one problem: If an object gets changed twice 
during the same second and a client already saw the first change, it 
will miss the second change. This risk is rather low but it will 
happen sooner or later.
Exactly. We would be exchanging one issue for another.
Only a monotonic counter will solve this. Let's see if we can come 
up with a simpler solution...
The more I think about it, I think my previous suggestion of either
having the history system increment the ts before writing it, or
activesync decrementing the end_ts before querying, is the way to go.
There will still be a slight chance of this happening, but if the
history system can't guarantee an entry is available within something
like 10 to 20 seconds of the time it is stamped with, then there are
bigger problems with the API itself here...

04/11/2013 03:10:17 PM Thomas Jarosch Comment #7

Again, the same idea occurred to me while discussing this with a colleague:

We lower the current sync timestamp by -1 and this ensures
we only pick up changes that are done (the past is the past).
(except for clock skews on the server...)

Then we just need to make sure we get the interval window right ('>' vs. '>=')
so we don't pick up entries twice. Since we can't modify the getChanges()
API behavior of the horde apps, we need to +1 or -1 in the ActiveSync module.

Done right, the change itself would be rather small.
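The windowing idea above can be made precise with a half-open interval. This is an illustrative Python sketch (not Horde code): query only up to "now - 1" so the current, possibly still-being-written second is excluded, and use "start < ts <= end" so consecutive windows neither overlap nor leave a gap:

```python
def window_query(entries, start_ts, end_ts):
    # Half-open interval: start excluded, end included. The next
    # window starts exactly where this one ended, so no entry is
    # returned twice and none falls between windows.
    return [e for e in entries if start_ts < e["ts"] <= end_ts]

entries = [{"uid": "a", "ts": 10}, {"uid": "b", "ts": 11}, {"uid": "c", "ts": 12}]
now = 12
first = window_query(entries, 0, now - 1)       # everything up to second 11
second = window_query(entries, now - 1, 13)     # picks up second 12, no repeats
```

Note this only removes the boundary double-pickup; it does not help when a second write lands on an already-synced timestamp, which is why the monotonic counter is still needed long-term.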

04/11/2013 03:01:39 PM Michael Rubinsky Comment #6
If we now also store which highest 'history_id' we've seen during the
last sync,
it will act as a monotonic counter.
Missing sentence: We can remove already seen history ids from the 
database query result
by cutting off everything below the last seen history_id.
It seems our replies crossed paths...

Anyway, I don't think this approach is the way to go. I feel this is
more of a work-around than a solution. As explained in my response,
this would require client knowledge of the internals of the history
backend, or would require any history backend to expose another
numerically incrementing counter - neither of which is a current
requirement.

Also this is duplicate functionality to timestamps which are 
essentially a numerically incrementing value. I think the correct way 
to fix this would be to ensure that entries are guaranteed available 
by the timestamp that it is recorded with. Maybe some "fudge factor" 
added to the timestamp before it is written by the history system to 
be sure it's in the future? A second choice, if this is not desired 
for some reason, would be to decrement the end_ts by some fudge factor 
to ensure any newly written changes are not occurring in the currently 
queried slice of time.

04/11/2013 02:47:05 PM Thomas Jarosch Comment #5
The other possibility would be to somehow track the most recent
history id detected. Though I really don't like this approach, since
it means the client code would need knowledge of how the history
system functions - and it assumes any history backend would both
have some unique entry identifier, and that those identifiers are
numeric (or at least increase in some numeric-like fashion).
My idea of abusing the history_id as a monotonic counter is doomed anyway:
In the Kolab world, an object can disappear and re-appear. So a 
history database entry with a low id might have a current timestamp 
and therefore be missed again.
Both approaches would require some API changes to both the 
activesync code as well as the history code in order to provide the 
needed data - and such changes would need to maintain BC.
It seems like a good idea to keep an array of seen object UIDs during 
the last sync timestamp window (start_ts <= x <= end_ts). We would 
also need to keep this list for every action (add, modify, delete).

We can start expiring entries in this list once the current "start_ts" 
is higher than the last noted end_ts.

That still leaves one problem: If an object gets changed twice during 
the same second and a client already saw the first change, it will 
miss the second change. This risk is rather low but it will happen 
sooner or later.

Only a monotonic counter will solve this. Let's see if we can come up 
with a simpler solution...
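The seen-UID approach discussed above, and the residual miss it leaves, can be sketched as follows (illustrative Python, hypothetical names): track (uid, ts) pairs seen in the last window, exclude them from the next inclusive query, and watch a second change within the same second get lost:

```python
def changes_with_seen_list(entries, start_ts, end_ts, seen):
    # Inclusive window, with already-delivered (uid, ts) pairs
    # filtered out so boundary entries are not sent twice.
    picked = []
    for e in entries:
        key = (e["uid"], e["ts"])
        if start_ts <= e["ts"] <= end_ts and key not in seen:
            picked.append(e)
            seen.add(key)
    return picked

seen = set()
entries = [{"uid": "a", "ts": 100}]
first = changes_with_seen_list(entries, 90, 100, seen)   # delivers the change
entries.append({"uid": "a", "ts": 100})                  # 2nd change, same second
second = changes_with_seen_list(entries, 90, 100, seen)  # filtered out: missed
```

The second change is indistinguishable from the first by (uid, ts) alone, which is exactly why only a monotonic counter closes this hole.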

04/11/2013 02:31:53 PM Michael Rubinsky Version ⇒ Git master
Queue ⇒ Synchronization
 
04/11/2013 02:31:21 PM Michael Rubinsky Comment #4
Assigned to Michael Rubinsky
State ⇒ Feedback
I'm not sure if the listBy() / getChanges() horde app API should 
include a change if it's exactly on the desired end timestamp or not 
(history_ts <= $end_ts). If not, we should do a "thisSyncTS+1" query 
in the ActiveSync backend.
Yes, it absolutely should. Otherwise that exact timestamp will never 
be included since the start_ts is always excluded.
But there is more :) Here's a short dump from my history database:
<snip log>

This one is trickier. Ideally, all entries with the same ts
should be written as an atomic action, and be guaranteed to be
completed by the indicated ts. This is obviously not the case here,
though. We could change both the start and end ts to be inclusive in
the query, though as you said, this would lead to changes potentially
being picked up more than once. We could possibly keep track of the
UIDs of messages that have been changed on the last SYNC response, and
exclude them if they occurred on the exact ts boundary. I have no idea
if the current code would support this, though (I don't have access to
the code at the moment).

The other possibility would be to somehow track the most recent
history id detected. Though I really don't like this approach, since
it means the client code would need knowledge of how the history
system functions - and it assumes any history backend would both have
some unique entry identifier, and that those identifiers are numeric
(or at least increase in some numeric-like fashion).

Both approaches would require some API changes to both the activesync 
code as well as the history code in order to provide the needed data - 
and such changes would need to maintain BC.


04/11/2013 02:20:16 PM Thomas Jarosch Comment #3
If we now also store which highest 'history_id' we've seen during 
the last sync,
it will act as a monotonic counter.
Missing sentence: We can remove already seen history ids from the 
database query result
by cutting off everything below the last seen history_id.

04/11/2013 02:17:40 PM Thomas Jarosch Comment #2
Solution proposal:

Adapt the query for the history to "history_ts >= $start_ts AND 
history_ts <= $end_ts".
This might list entries already shown to the client.

If we now also store which highest 'history_id' we've seen during the 
last sync,
it will act as a monotonic counter.
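The proposal above can be sketched in a few lines (illustrative Python, not Horde code): query inclusively on both timestamp bounds, then cut off everything at or below the highest history_id seen during the last sync:

```python
def get_changes(rows, start_ts, end_ts, last_seen_id):
    # Inclusive timestamp window; the stored last_seen_id removes
    # entries the client already received in the previous window.
    return [r for r in rows
            if start_ts <= r["ts"] <= end_ts and r["id"] > last_seen_id]

# The two same-second deletes from the ticket's history dump:
rows = [{"id": 811, "ts": 1365669188}, {"id": 812, "ts": 1365669188}]
# The last sync saw up to id 811; the inclusive re-query now
# returns only the late-arriving id 812, with no duplicates.
changes = get_changes(rows, 1365669188, 1365669200, 811)
```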

04/11/2013 01:52:15 PM Thomas Jarosch Comment #1
Priority ⇒ 1. Low
State ⇒ Unconfirmed
Patch ⇒ No
Milestone ⇒
Summary ⇒ ActiveSync misses changes from history database
Type ⇒ Bug
Queue ⇒ Horde Framework Packages
Hi,

I've just discovered a problem in the way ActiveSync gets the changes
from the history database. I had a Kolab-based contacts folder with 30
contacts and deleted them all at once. During that time, an iPad and
some other devices were syncing in the background.

Checking the iPad later on, seven contacts still existed on the device. Oops.

The server is a rather slow VM and it took a few seconds to delete all 
contacts. Our ActiveSync implementation internally uses 
getServerChanges($start_ts, $end_ts). This boils down to a simplified 
history database query like
     "select * from horde_histories WHERE history_ts > $start_ts AND 
history_ts < $end_ts"


I've picked one still-existing contact and dumped its history:

history_id|object_uid|history_action|history_ts|history_desc|history_who|history_extra
759|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877dc-d794-4d57-aae8-60b3ac100184|add|1365665371||mueller|
812|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877dc-d794-4d57-aae8-60b3ac100184|delete|1365669188||mueller|


Relevant getServerChanges() calls from the log:
2013-04-11T10:32:59+02:00 DEBUG: [2253] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365665778, 
1365669179, 0, 1)
2013-04-11T10:32:59+02:00 DEBUG: [3473] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365665778, 
1365669179, 0, 0)
2013-04-11T10:33:02+02:00 DEBUG: [2253] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669179, 
1365669182, 0, 1)
2013-04-11T10:33:03+02:00 DEBUG: [3473] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669179, 
1365669183, 0, 0)
2013-04-11T10:33:07+02:00 DEBUG: [2253] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669183, 
1365669187, 0, 1)
2013-04-11T10:33:08+02:00 DEBUG: [3473] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669183, 
1365669188, 0, 0)
2013-04-11T10:33:11+02:00 DEBUG: [2253] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188, 
1365669191, 0, 1)
2013-04-11T10:33:16+02:00 DEBUG: [2253] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188, 
1365669196, 0, 1)
2013-04-11T10:33:21+02:00 DEBUG: [2253] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188, 
1365669201, 0, 1)
2013-04-11T10:33:26+02:00 DEBUG: [2253] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188, 
1365669206, 0, 1)
2013-04-11T10:33:31+02:00 DEBUG: [2253] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188, 
1365669211, 0, 1)
2013-04-11T10:33:36+02:00 DEBUG: [2253] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188, 
1365669216, 0, 1)
2013-04-11T10:33:41+02:00 DEBUG: [2253] 
Horde_Core_ActiveSync_Driver::getServerChanges(@Contacts@, 1365669188, 
1365669221, 0, 1)
...

As you can see, it queries for "ts > 1365669183 and ts < 1365669188".
Next time it queries for "> 1365669188 and < 1365669191". Since the
change happened exactly on 1365669188, the history query missed it.
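The miss follows directly from the exclusive bounds. A minimal reproduction (illustrative Python stand-in for the simplified query above, using the entry id and timestamps from the log):

```python
def get_server_changes(entries, start_ts, end_ts):
    # Simplified form of the query above: both bounds exclusive.
    return [e for e in entries if start_ts < e["ts"] < end_ts]

entries = [{"id": 812, "ts": 1365669188}]

# The two consecutive sync windows from the log. The change sits
# exactly on the shared boundary, so neither window contains it.
a = get_server_changes(entries, 1365669183, 1365669188)  # ts < end: excluded
b = get_server_changes(entries, 1365669188, 1365669191)  # ts > start: excluded
```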

I'm not sure if the listBy() / getChanges() horde app API should 
include a change if it's exactly on the desired end timestamp or not 
(history_ts <= $end_ts). If not, we should do a "thisSyncTS+1" query 
in the ActiveSync backend.


But there is more :) Here's a short dump from my history database:

sqlite> select * from horde_histories where history_ts >= 1365669183 
and history_ts <= 1365669188;
history_id|object_uid|history_action|history_ts|history_desc|history_who|history_extra
800|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877dc-668c-44ce-b0c2-60b3ac100184|delete|1365669183||mueller|
801|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877de-3368-4245-80e2-60b3ac100184|delete|1365669183||mueller|
802|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877d6-2dfc-4b22-aa60-60b3ac100184|delete|1365669183||mueller|
803|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877dd-c208-4ae6-9f64-60b3ac100184|delete|1365669184||mueller|
804|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877df-2fdc-4685-92b7-60b3ac100184|delete|1365669184||mueller|
805|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877d7-8e9c-4bfb-8ce4-60b3ac100184|delete|1365669185||mueller|
806|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877d4-e070-410d-9205-60b3ac100184|delete|1365669185||mueller|
807|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877e0-1584-4846-82c7-60b3ac100184|delete|1365669186||mueller|
808|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877de-b8d4-4a8d-8386-60b3ac100184|delete|1365669186||mueller|
809|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877d4-eaf4-4a74-9146-60b3ac100184|delete|1365669187||mueller|
810|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877d9-8c1c-4d62-9621-60b3ac100184|delete|1365669187||mueller|
811|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877df-8aac-402e-8a4e-60b3ac100184|delete|1365669188||mueller|
812|turba:X-SLUZ40uwb8ZZt1JOhQWg4:514877dc-d794-4d57-aae8-60b3ac100184|delete|1365669188||mueller|


As you can see, there are several entries with the same change
timestamp. An ActiveSync client might query for "history_ts <=
1365669188" and only see the first entry (id: 811). The delete request
via the web UI is still running and adds the second delete to the
history database, also with ts = 1365669188 (id: 812).

The ActiveSync client will miss this one as it already queried for all 
changes up to 1365669188. Classic race condition :)

Any idea how to fix this one? A monotonic counter would be best but 
that's a rather intrusive change.

If we change the history database query to "history_ts >= start_ts AND 
history_ts <= end_ts", we might deliver some changes twice to the 
client. That must be avoided, too.

Cheers,
Thomas
