6.0.0-git
2019-12-06

[#13231] Hashtable session handler doesn't unlock on session close
Summary Hashtable session handler doesn't unlock on session close
Queue Horde Framework Packages
Queue Version Git master
Type Enhancement
State Assigned
Priority 1. Low
Owners
Requester arjen+horde (at) de-korte (dot) org
Created 2014-05-28 (2018 days ago)
Due
Updated 2014-05-29 (2017 days ago)
Assigned 2014-05-29 (2017 days ago)
Resolved
Milestone
Patch No

History
2014-05-29 20:06:52 Git Commit Comment #23 Reply to this comment
Changes have been made in Git (master):

commit fe943741f61c4d80764c87fefb4ec07a11252451
Author: Michael M Slusarz <slusarz@horde.org>
Date:   Thu May 29 14:06:22 2014 -0600

     [mms] In the Predis driver, ramp up to the maximum unlock wait 
interval (Request #13231).

  framework/HashTable/lib/Horde/HashTable/Predis.php |    4 ++--
  framework/HashTable/package.xml                    |    4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

http://github.com/horde/horde/commit/fe943741f61c4d80764c87fefb4ec07a11252451
2014-05-29 20:06:47 Git Commit Comment #22 Reply to this comment
Changes have been made in Git (master):

commit 5cd6d790db2d55491025306583d01369a4143a03
Author: Michael M Slusarz <slusarz@horde.org>
Date:   Thu May 29 14:02:26 2014 -0600

     Ticket #13231: session backend handling must be handled through 
Horde_Session

  ansel/lib/Ansel.php                        |    3 ++-
  kronolith/lib/Ajax/Application/Handler.php |    6 ++++--
  2 files changed, 6 insertions(+), 3 deletions(-)

http://github.com/horde/horde/commit/5cd6d790db2d55491025306583d01369a4143a03
2014-05-29 19:59:50 Michael Slusarz Comment #21 Reply to this comment

[Show Quoted Text - 9 lines]
Yes, if php max execution time is not set.

But obviously no if it is set and enforced.  You cut the time down 
from 10 minutes -> 30 seconds in that example.
I may be wrong, and Jan can correct me if I am, but I believe we 
already DO close the session while fetching the calendars...or maybe 
I misunderstand what you mean.
Well... kind of.  First, it is not being done properly, via 
Horde_Session (i will fix this), so $session is invalid after the 
session is reopened.

Second... I question whether we even need to have a writable session 
at all for this call.  Seems like we could add listEvents to the 
$_readOnly list.  The only places we are setting values in $session in 
Kronolith seem to be:

attendees.php
lib/Ajax/Application/Handler.php (in saveCalendar, so irrelevant for 
listEvents)
lib/View/EditEvent.php
lib/CalendarsManager.php
new.php

Other than Handler.php, the rest is UI related code and doesn't seem 
pertinent to listEvents activity.
2014-05-29 19:42:20 Michael Slusarz Comment #20
Type ⇒ Enhancement
State ⇒ Assigned
Priority ⇒ 1. Low
Reply to this comment
This doesn't help.  As I noted above, 10 requests/second is already
fairly heavy usage.  Your solution bumps that up to 20
requests/second (you have to assume worst-case scenario).
No, you don't. The requests will follow a Poisson distribution, 
assuming mt_rand is not biased towards certain values
It is.  (although not for the numbers you gave).
So the with increasing number of requests, the average will quickly 
stabilise towards 10 requests/second.
Not really.  10-20 requests is going to show bias one way or the 
other.  Sure enough, running a sample size of 20 mt_rand() calls 
created an average time of 82806ns.  18% from the expected "median" is 
not "stabilizing" in that sample size.  And if we are running 
thousands/millions of retries, the problem is not in the random 
distribution but rather in the session access in the first place.

And the original problem remains: you are potentially delaying the 
**normal** scenario - a single request is queued - in an attempt to 
make the outlier work better - since you potentially increase this 
common case to 50% slower than it currently is.  We need to optimize 
for the normal scenario.  I'm going to commit a slightly better 
algorithm for this purpose, by ramping up to the the 0.1 second delay.

[Show Quoted Text - 13 lines]
Only for the memcache sessionhandler - or at least a poll-based 
locking handler instead of an event-based handler.  WHICH IS EXPECTED 
(and one of the reasons why we don't recommend memcache 
sessionhandler).  That doesn't mean the assumption doesn't hold ... 
and the fact that the other sessionhandlers do support this 
generalization.

Randomizing the queue order is something you do to prevent deadlocks.   
It doesn't do anything to make a single access any faster.  In some 
instances it might be faster than the current strategy, but in some 
instances it might be worse.

The limiting factor remains the median polling time.

I'm with Mike in that there definitely is no bug here - this is an 
enhancement request at best.  It's possible that the 0.1 sec poll time 
is not ideal - but I'm not going to change that without evidence to 
prove a better value, especially since the sessionhandler works and 
the performance issues are known and have been known since the 
beginning.
2014-05-29 19:04:01 arjen+horde (at) de-korte (dot) org Comment #19 Reply to this comment
This doesn't help.  As I noted above, 10 requests/second is already 
fairly heavy usage.  Your solution bumps that up to 20 
requests/second (you have to assume worst-case scenario).
No, you don't. The requests will follow a Poisson distribution, 
assuming mt_rand is not biased towards certain values. So the with 
increasing number of requests, the average will quickly stabilise 
towards 10 requests/second.
And there is a general (albeit not mandatory) requirement that 
requests SHOULD be handled in a FIFO basis. If Kronolith for some 
reason sends 15 listEvents requests in a row, it is a reasonable 
assumption that the 15th is the least important request.  It should 
not, by virtue of random luck, suddenly jump to 2nd in the queue.
There is no way (whatever method you use) if these requests are 
handled by separate processes and it is required to get a lock on a 
shared resource. If that is needed, a new request can only be 
initiated when the previous one is know to be active (not waiting to 
get a lock on a resource). Since this is not done, requests can (and 
will be) served in random order. As the cachegrind output I sent you 
also indicates, where the third request that was started is served last.
2014-05-29 18:38:56 Michael Rubinsky Comment #18 Reply to this comment
I wonder if one compromise option is to lump all "internal" 
calendars into a single request, and all "other" calendars into a 
second request.
I agree it's an option. I think I even may have suggested this in the 
other ticket.

[Show Quoted Text - 27 lines]
I agree, but isn't this the same, even if the requests were lumped 
together? We still need to wait for the responses from all the 
calendars before we send the response. The user would, in this case, 
not see ANY calendars in their interface for 10 minutes in your above 
example, no?
My example is session-handler agnostic.

Seems to me a better solution, from a performance perspective, is to 
manually close the session while the remote calendars are being 
fetched.
I may be wrong, and Jan can correct me if I am, but I believe we 
already DO close the session while fetching the calendars...or maybe I 
misunderstand what you mean.

2014-05-29 18:29:12 Michael Slusarz Comment #17 Reply to this comment

[Show Quoted Text - 9 lines]
I wonder if one compromise option is to lump all "internal" calendars 
into a single request, and all "other" calendars into a second request.
... especially since I am not convinced that the use-case here is
what needs enhancement, not the session storage.
Your earlier comment about it working that way "because that's the
way the javascript code was written." isn't correct. It was written 
that way because we don't want the user waiting for EVERY SINGLE 
calendar to be loaded into memory before we send anything back to 
the client and update the UI.
OK.  I wasn't sure how listEvents were being sent.  I agree with this 
from a UI perspective.  Aside: it would be really nice for some sort 
of PHP solution to allow for bi-directional communication regarding 
this kind of polling.  see http://wiki.horde.org/Project/H6TODO and/or 
reactphp  Obviously this isn't useful now, but something to look at in 
the future.
This would create an unnecessary wait time for users that have, 
e.g., any slowly responding remote calendars.
Unfortunately, the problem as it currently stands is that you can 
possibly/easily create a pretty bad DoS situation for the current user.

Say a user has 20 remote calendars defined.  (It sounds like Arjen has 
~15, so this is not an excessive number).  Worst case scenario: those 
20 remote calendars are reachable ... meaning they won't reject 
network connection attempts ... but the actual services on those 
servers don't return anything.  The connections just sit in wait states.

If you send 20 requests, and each request has to time out, this is 20 
request * 30 seconds (the default) = 10 minutes.  Some of those PHP 
requests may timeout... but from my experience you can't count on this 
- and this can also be disabled locally in php.ini.  So you've 
essentially locked the user out of accessing Horde for 10 minutes.
IMO, it doesn't make any sense to force this extra wait time on the 
user because one of the (not even recommended) session handlers 
doesn't work satisfactorily.
My example is session-handler agnostic.

Seems to me a better solution, from a performance perspective, is to 
manually close the session while the remote calendars are being 
fetched.  (Even better, if the session data is unchanged in the 
listEvents call, you don't even need to reopen the session after it is 
completed.)  This is what we do in IMP when:
   * listing Mailboxes
   * building mailbox list
   * building message display data
to prevent these kind of DoS issues.
2014-05-29 17:08:21 Michael Rubinsky Comment #16 Reply to this comment
If Kronolith for some reason sends 15 listEvents requests in a row, 
it is a reasonable assumption that the 15th is the least important 
request.  It should not, by virtue of random luck, suddenly jump to 
2nd in the queue.
For the sake of completeness, the earlier requests are the requests 
that are (supposed) to take the least amount of time to complete. 
I.e., internal calendars. The later requests are those that may take a 
longer time, such as remote calendars.
... especially since I am not convinced that the use-case here is 
what needs enhancement, not the session storage.
Your earlier comment about it working that way "because that's the
way the javascript code was written." isn't correct. It was written 
that way because we don't want the user waiting for EVERY SINGLE 
calendar to be loaded into memory before we send anything back to the 
client and update the UI. This would create an unnecessary wait time 
for users that have, e.g., any slowly responding remote calendars.

IMO, it doesn't make any sense to force this extra wait time on the 
user because one of the (not even recommended) session handlers 
doesn't work satisfactorily.
2014-05-29 17:00:06 Michael Slusarz Comment #15 Reply to this comment
This doesn't help.  As I noted above, 10 requests/second is already 
fairly heavy usage.  Your solution bumps that up to 20 requests/second 
(you have to assume worst-case scenario).

And there is a general (albeit not mandatory) requirement that 
requests SHOULD be handled in a FIFO basis.  If Kronolith for some 
reason sends 15 listEvents requests in a row, it is a reasonable 
assumption that the 15th is the least important request.  It should 
not, by virtue of random luck, suddenly jump to 2nd in the queue.

From a practical perspective, this is the ONLY place in Horde where 
we are having this issue.  Nowhere else can I think of a place where 
we have more than 1 or maybe 2 session requests queued.  So we should 
not be vastly altering the queue system based on a single use-case ... 
especially since I am not convinced that the use-case here is what 
needs enhancement, not the session storage.
2014-05-29 10:01:50 arjen+horde (at) de-korte (dot) org Comment #14
New Attachment: Horde_Memcache.patch Download
Reply to this comment
Lowering the poll delay is not a viable option.  At 10 polls/second 
(per request), that is already quite a bit of network/protocol 
overhead.

Maybe it would be better to implement a logarithmically increasing 
delay mechanism.  Or for true performance, a hybrid flock/memcache 
system.
After session_start(), it seems to take a few milliseconds before the 
lock on the session data is released. I noticed that frequently it 
takes a while before the first six listEvents that are started manage 
to start a session. This is due to the fact that the poll delay for 
all of them is identical, so if they are started at the same time, the 
subsequent polls to retry to get a lock on the session data will also 
coincide. In that case, it may take up to six times 100 ms before all 
processes get going. Adding some randomness in the polling would help 
to minimize that, by retrying at different intervals (see attached 
patch).
2014-05-29 01:19:33 Michael Rubinsky Comment #13 Reply to this comment
I remember somewhere bringing up exactly this issue ... that it 
didn't seem to make sense to issue X # of listEvents requests as 
opposed to packaging all of them together in a single call. (Can't 
find a ticket, and searching my e-mails didn't uncover anything 
either).
Bug: 12729

2014-05-29 00:56:21 Michael Slusarz Comment #12 Reply to this comment
Back to the initial topic of this ticket...

I remember somewhere bringing up exactly this issue ... that it didn't 
seem to make sense to issue X # of listEvents requests as opposed to 
packaging all of them together in a single call. (Can't find a ticket, 
and searching my e-mails didn't uncover anything either).

I remember being told at the time that this was necessary due to the 
way the javascript code was written.  Which didn't seem to be a 
satisfactory resolution, IMHO, since that responded only to why it 
wasn't currently being done as opposed to whether it was a good idea 
or not.

Still feel that this is a valid concern.
2014-05-29 00:25:29 Git Commit Comment #11 Reply to this comment
Changes have been made in Git (master):

commit e284e4ec3d7d2a2aff1b4936d9f58fbe3c3dc63a
Author: Michael M Slusarz <slusarz@horde.org>
Date:   Wed May 28 18:24:57 2014 -0600

     Again make clear that nobody should be using anything other than 
file-based sessionhandler backends unless there is a demonstrable 
reason for it

     Bug #13231

  horde/config/conf.xml |    5 +++++
  1 files changed, 5 insertions(+), 0 deletions(-)

http://github.com/horde/horde/commit/e284e4ec3d7d2a2aff1b4936d9f58fbe3c3dc63a
2014-05-29 00:17:11 Michael Slusarz Comment #10 Reply to this comment

[Show Quoted Text - 9 lines]
It *is* working.  I'm guessing the issue lies more with our current 
polling choice - 0.1 seconds.

You can't compare memcache with other sessionhandlers re: locking, 
since other sessionhandlers (i.e. file, sql) have built-in lock 
mechanisms (flock, SQL row locking) that are going to be 
instantaneous.  For memcache, our only solution is a poll-based 
solution.

Lowering the poll delay is not a viable option.  At 10 polls/second 
(per request), that is already quite a bit of network/protocol overhead.

Maybe it would be better to implement a logarithmically increasing 
delay mechanism.  Or for true performance, a hybrid flock/memcache 
system.

Moral of the story: as I've tried to tell some of my clients, memcache 
is NOT the ideal session storage backend.  Memcache might be useful if 
you are running a pool of PHP servers, but on a single server setup 
there remains zero reason to use anything other than file-based 
sessions.
2014-05-29 00:08:22 Michael Slusarz Comment #9 Reply to this comment
I sent a bunch of xdebug output files to Michael Slusarz a few days 
ago, which shows session_start() is spending a lot of time in 
php::usleep (waiting for Horde_Memcache->lock). Would it be useful 
to add them to this ticket (or send them to someone else?)
The xdebug files are perfectly fine.  They are correctly calling 
unlock at the end of each access.
2014-05-28 13:12:23 Jan Schneider Comment #8 Reply to this comment
Well, that's how locking works. The question is, why is unlocking not
working during the session.
I don't think I fully understand this. Should the lock be released 
*before* or *after* processing the calendar resource?
Before: 
https://github.com/horde/horde/blob/master/kronolith/lib/Ajax/Application/Handler.php#L113

[Show Quoted Text - 17 lines]
No, sessions are always locked. And this doesn't have anything to do 
with Kronolith. Closing a session obviously just doesn't work in the 
Memcache implementation. For whatever reason.
2014-05-28 12:50:22 arjen+horde (at) de-korte (dot) org Comment #7 Reply to this comment
Well, that's how locking works. The question is, why is unlocking 
not working during the session.
I don't think I fully understand this. Should the lock be released 
*before* or *after* processing the calendar resource?
Horde_SessionHandler_Storage_Hashtable::close() should be called 
from PHP, and that should delete() the key from memcache again in 
Horde_Memcache::unlock().
That is what happens, since all 15 listEvents are serviced eventually. 
The problem I see (but this may be by design, I don't know) is that in 
the implementation of the current Hashtable sessionhandler, the lock 
seems to be removed only when the listEvent completed. Which means 
that effectively, only one listEvent is handled at the same time, 
while other listEvent processes are waiting for the lock to release.

I expected similar behavior as for the Sql or builtin sessionhandlers, 
where all listEvent processes are running concurrently and don't seem 
to have to wait for locks to be released. Maybe these don't lock the 
session data and is that the reason why they are a lot faster in 
updating the calendar view.
2014-05-28 12:32:24 Jan Schneider Comment #6 Reply to this comment
You rather mean *inside* Horde_Memcache#lock()?
Yes. When six processes attempt to lock the session data more or 
less at the same time, at most one will succeed in doing so, while 
the others sleep for 100 ms. If after that the session data is still 
locked (or locked again), this repeats until all have completed.
Well, that's how locking works. The question is, why is unlocking not 
working during the session. 
Horde_SessionHandler_Storage_Hashtable::close() should be called from 
PHP, and that should delete() the key from memcache again in 
Horde_Memcache::unlock().
2014-05-28 11:13:28 arjen+horde (at) de-korte (dot) org Comment #5 Reply to this comment
You rather mean *inside* Horde_Memcache#lock()?
Yes. When six processes attempt to lock the session data more or less 
at the same time, at most one will succeed in doing so, while the 
others sleep for 100 ms. If after that the session data is still 
locked (or locked again), this repeats until all have completed.
Well if the Memcache object doesn't add() this lock key, that's 
probably nothing that  Horde can handle.
The lock is added.
2014-05-28 10:55:10 Jan Schneider Comment #4 Reply to this comment
You rather mean *inside* Horde_Memcache#lock()? Well if the Memcache 
object doesn't add() this lock key, that's probably nothing that Horde 
can handle.
2014-05-28 10:16:20 arjen+horde (at) de-korte (dot) org Comment #3 Reply to this comment
I sent a bunch of xdebug output files to Michael Slusarz a few days 
ago, which shows session_start() is spending a lot of time in 
php::usleep (waiting for Horde_Memcache->lock). Would it be useful to 
add them to this ticket (or send them to someone else?)
2014-05-28 09:35:31 Jan Schneider Summary ⇒ Hashtable session handler doesn't unlock on session close
 
2014-05-28 09:34:55 Jan Schneider Type ⇒ Bug
Version ⇒ Git master
Queue ⇒ Horde Framework Packages
State ⇒ Feedback
Priority ⇒ 1. Low
 
2014-05-28 09:34:28 Jan Schneider Comment #2 Reply to this comment
This is what we already do. Well, we actually close the session while 
events are loaded. As you already noticed by using a different session 
handler this works just fine. If it doesn't work with the Hashtable 
backend, the bug is there.
2014-05-28 06:56:33 arjen+horde (at) de-korte (dot) org Comment #1
Type ⇒ Enhancement
State ⇒ New
Priority ⇒ 1. Low
Summary ⇒ listEvents will run sequentially when listing calendar events
Queue ⇒ Kronolith
Milestone ⇒
Patch ⇒ No
Reply to this comment
I'm using Firefox 29 which, like most browsers, will allow a limited 
number of simultaneous connections to the same server (in this 
version, six). When listing the events for my calendar, it will fire 
off 15 processes, each handling a single calendar resource. It will 
run 6 processes in parallel, so far so good.

Now the hairy part. With the builtin-, File- and Sql-sessionhandlers 
these 6 processes will run concurrently and all 15 are handled in 
about half a second. Very acceptable performance and switching between 
month views is pretty much instantaneous.

With the Hashtable sessionhandler things are quite different. Each 
process will attempt to get a lock on the (same) session data that is 
stored during session_start(). Obviously, only one of 6 listEvents 
calls succeeds to get the lock (so the locking mechanism works), the 
others have to wait until the session is closed again. So effectively, 
the listEvents calls are handled sequentially for the Hashtable 
sessionhandler. This means that switching between month views takes 
well over two seconds. Not impressive at all.

As far as I can see, the listEvents call from the Calendar will not 
have to change the session data. In this case, it might be useful to 
open the session data readonly, in which case no locking is used (nor 
needed), so multiple processes can use the same data concurrently.

Saved Queries