6.0.0-beta1
8/10/25

[#14368] Improve EAS performance with large History tables
Summary Improve EAS performance with large History tables
Queue Synchronization
Queue Version FRAMEWORK_5_2
Type Enhancement
State Resolved
Priority 1. Low
Owners mrubinsk (at) horde (dot) org
Requester hannes.brunssen (at) ewetel (dot) de
Created 05/21/2016 (3368 days ago)
Due
Updated 06/01/2016 (3357 days ago)
Assigned 05/26/2016 (3363 days ago)
Resolved 05/27/2016 (3362 days ago)
Milestone
Patch No

History
06/01/2016 02:21:12 PM Michael Rubinsky Comment #11 Reply to this comment
Fantastic!
06/01/2016 07:37:09 AM hannes (dot) brunssen (at) ewetel (dot) de Comment #10
New Attachment: database-load.png Download
Reply to this comment
Thank you for the change, the impact of these queries was even bigger 
than i thought. Our database load was between 6 and 8 before, now it's 
below 1.
05/27/2016 02:28:10 PM hannes (dot) brunssen (at) ewetel (dot) de Comment #9 Reply to this comment
Our FORCE INDEX patch with the new index on history_modseq and 
object_uid was a bad idea, the index was fast for history_action add, 
but slower for modify and delete,

We will test your new version on monday.
05/27/2016 01:15:32 PM Michael Rubinsky State ⇒ Resolved
 
05/27/2016 01:48:25 AM Michael Rubinsky Comment #8 Reply to this comment
Ok, this should do it.  If there is a large difference in the last and 
current syncStamp, we now update the state with the new value (only 
implemented in the Sql state driver, not the Mongo driver).

It should be noted that this is only an issue when the client and/or 
user is multiplexing all collections and not using separate 
collections for each calendar/addressbook etc... In that case, the 
modseq is tracked for the individual collection, so there would never 
be that large of a difference in syncstamps unless the client was 
turned off for years....
05/27/2016 01:42:44 AM Git Commit Comment #7 Reply to this comment
Changes have been made in Git (master):

commit f89c2cf8fe6d6ae4f27ae244484167f72a303793
Author: Michael J Rubinsky <mrubinsk@horde.org>
Date:   Thu May 26 21:39:52 2016 -0400

     Ticket: 14368 Update the syncstamp in the collection state.

     If there are no changes, but a large difference between _lastSyncStamp
     and _thisSyncStamp, update the syncstamp while leaving the rest of
     the state untouched.

     Helps with backends that don't track syncStamps per collection
     and have extremely large history datasets.

  .../lib/Horde/ActiveSync/Collections.php           |  6 +++
  .../lib/Horde/ActiveSync/Request/Sync.php          | 10 +++++
  .../ActiveSync/lib/Horde/ActiveSync/State/Base.php |  9 +++++
  .../ActiveSync/lib/Horde/ActiveSync/State/Sql.php  | 45 
++++++++++++++++++++++
  4 files changed, 70 insertions(+)

http://github.com/horde/horde/commit/f89c2cf8fe6d6ae4f27ae244484167f72a303793
05/26/2016 07:40:27 PM Michael Rubinsky Comment #6
Assigned to Michael Rubinsky
State ⇒ Assigned
Reply to this comment
I have some ideas about how to handle this, but need to test.
05/25/2016 04:04:32 PM hannes (dot) brunssen (at) ewetel (dot) de Comment #5 Reply to this comment
What about adding a composite index on history_action, object_uid? 
Since mysql uses the index that returns the least number of rows, 
this should improve performance when the modseq range is extremely 
high.
I tried a few different indexes with different modseq ranges for this 
query on an idle mysql server:

                                                 700k    120k    30k
1 history_modseq                                1.61    0.34    0.08
2 history_action                                2.25    2.16    2.13
3 object_uid                                    0.62    0.65    0.64
4 history_action, object_uid                    0.60    0.43    0.42
5 history_action, history_modseq, object_uid    0.50    0.13    0.02
6 history_modseq, object_uid                    0.39    0.13    0.02

This was order of the preferred indexes on a mysql 5.6 server:
700k: 4, 2
120k: 5, 4, 2
  30k: 5, 1

The history_action index seems to cause more problems than it helps, 
but it might be useful for other queries. The fasted index was never 
selected, so we will probably patch our local Horde installation to 
force the last index for these queries.

It would still be nice to prevent these long modseq ranges. Would it 
be possible to use a specific UPDATE query that only changes the 
modseq if every other field is unchanged and the sync_timestamp is a 
week in the past or could that still lead to data loss?
05/25/2016 12:46:12 PM Michael Rubinsky Comment #4 Reply to this comment
Our MySQL server uses the history_action index, but even if I force 
MySQL to use the object_uid index it's still over 1 second. The 
moqseq 956528 is from April 2015. I hoped there is a way to prevent 
checking the same history entries every 15 seconds for a year, at 
least in these cases when a user doesn't use a collection.
What about adding a composite index on history_action, object_uid? 
Since mysql uses the index that returns the least number of rows, this 
should improve performance when the modseq range is extremely high.
05/23/2016 08:01:27 AM hannes (dot) brunssen (at) ewetel (dot) de Comment #3 Reply to this comment
This query for example takes over 3 seconds to complete:

SELECT DISTINCT object_uid, history_id FROM horde_histories WHERE 
history_modseq > 956528 AND history_modseq <= 2191866 AND 
history_action = 'add' AND object_uid LIKE 
'kronolith:_J9sL_cb_Xuk0gpnaFfIGA9:%';

Our MySQL server uses the history_action index, but even if I force 
MySQL to use the object_uid index it's still over 1 second. The moqseq 
956528 is from April 2015. I hoped there is a way to prevent checking 
the same history entries every 15 seconds for a year, at least in 
these cases when a user doesn't use a collection.

[Show Quoted Text - 44 lines]
05/23/2016 12:43:10 AM Michael Rubinsky Comment #2
State ⇒ Feedback
Priority ⇒ 1. Low
Reply to this comment

This confuses me. We filter the history query on the calendar(s) being 
checked, so the current value of modSeq should be inconsequential 
since the query will only return entries with higher modSeq values AND 
matching the calendar. Could you clarify where you see loq modSeq 
values increasing "the number of history entries that has to be 
checked"?

[Show Quoted Text - 21 lines]
We can't change the request type. This would affect a number of other 
things, including preventing mirroring client-sourced changes back to 
the client and the handling of pending changes. It would possible 
(though admittedly a very small chance)  that this could overwrite any 
existing pending changes in the state. Basically this changes the sync 
state without changing the synckey, which we cannot do.
The update might only be needed on collections that support modseq, 
but the connector is not available from the state. I don't know if 
it would cause problems with mail folders, every 10000 seconds would 
be 8 times per day which would be more than necessary.
05/21/2016 08:43:53 PM hannes (dot) brunssen (at) ewetel (dot) de Comment #1
Patch ⇒ No
State ⇒ New
Milestone ⇒
Queue ⇒ Synchronization
Summary ⇒ Improve EAS performance with large History tables
Type ⇒ Enhancement
Priority ⇒ 2. Medium
Reply to this comment
We noticed a few devices that never send SYNC requests for collections 
like calendar oder contacts, because there are no changes. Without 
SYNC requests the activesync_state entries are not updated and the 
sync_mod values stay relatively low. With the low sync_mod values the 
number of history entries that has to be checked on every PING 
requests grow larger and larger. It takes a few seconds to check for 
updates in over a million history entries for a single calendar which 
is a growing part of our total database load. I don't know if this is 
a client bug or just normal behavior.

These lookups can be reduced if we update the sync state from time to 
time. This could be done if we override the getChanges method in 
Horde_ActiveSync_State_Sql:

public function getChanges(array $options = array())
{
        parent::getChanges($options);
        if ($this->_type == Horde_ActiveSync::REQUEST_TYPE_PING &&
                        empty($this->_changes) &&
                        $this->_lastSyncStamp < $this->_thisSyncStamp - 10000) {

                $this->_lastSyncStamp = $this->_thisSyncStamp;
                $this->_type = Horde_ActiveSync::REQUEST_TYPE_SYNC;
                $this->save();
                $this->_type = Horde_ActiveSync::REQUEST_TYPE_PING;
        }
        return $this->_changes;
}

The type change is a bit ugly, but the save method would clear the 
sync_data otherwise. That could be changed in the save method.

The update might only be needed on collections that support modseq, 
but the connector is not available from the state. I don't know if it 
would cause problems with mail folders, every 10000 seconds would be 8 
times per day which would be more than necessary.

Saved Queries