Tickets :: [#13231] Hashtable session handler doesn't unlock on session close

6.0.0-beta6

4/10/26

Summary	Hashtable session handler doesn't unlock on session close
Queue	Horde Framework Packages
Queue Version	Git master
Type	Enhancement
State	Assigned
Priority	1. Low
Owners
Requester	arjen+horde (at) de-korte (dot) org
Created	5/28/14 (4335 days ago)
Due
Updated	5/29/14 (4334 days ago)
Assigned	5/29/14 (4334 days ago)
Resolved
Milestone
Patch	No

05/29/2014 08:06:52 PM	Git Commit	Comment #23	Reply to this comment
Changes have been made in Git (master): commit fe943741f61c4d80764c87fefb4ec07a11252451 Author: Michael M Slusarz <slusarz@horde.org> Date: Thu May 29 14:06:22 2014 -0600 [mms] In the Predis driver, ramp up to the maximum unlock wait interval (Request #13231). framework/HashTable/lib/Horde/HashTable/Predis.php \| 4 ++-- framework/HashTable/package.xml \| 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) http://github.com/horde/horde/commit/fe943741f61c4d80764c87fefb4ec07a11252451

05/29/2014 08:06:47 PM	Git Commit	Comment #22	Reply to this comment
Changes have been made in Git (master): commit 5cd6d790db2d55491025306583d01369a4143a03 Author: Michael M Slusarz <slusarz@horde.org> Date: Thu May 29 14:02:26 2014 -0600 Ticket #13231: session backend handling must be handled through Horde_Session ansel/lib/Ansel.php \| 3 ++- kronolith/lib/Ajax/Application/Handler.php \| 6 ++++-- 2 files changed, 6 insertions(+), 3 deletions(-) http://github.com/horde/horde/commit/5cd6d790db2d55491025306583d01369a4143a03

05/29/2014 07:59:50 PM	Michael Slusarz	Comment #21	Reply to this comment
[Show Quoted Text - 9 lines][Hide Quoted Text] If you send 20 requests, and each request has to time out, this is 20 request * 30 seconds (the default) = 10 minutes. Some of those PHP requests may timeout... but from my experience you can't count on this - and this can also be disabled locally in php.ini. So you've essentially locked the user out of accessing Horde for 10 minutes. I agree, but isn't this the same, even if the requests were lumped together? We still need to wait for the responses from all the calendars before we send the response. Yes, if php max execution time is not set. But obviously no if it is set and enforced. You cut the time down from 10 minutes -> 30 seconds in that example. I may be wrong, and Jan can correct me if I am, but I believe we already DO close the session while fetching the calendars...or maybe I misunderstand what you mean. Well... kind of. First, it is not being done properly, via Horde_Session (i will fix this), so $session is invalid after the session is reopened. Second... I question whether we even need to have a writable session at all for this call. Seems like we could add listEvents to the $_readOnly list. The only places we are setting values in $session in Kronolith seem to be: attendees.php lib/Ajax/Application/Handler.php (in saveCalendar, so irrelevant for listEvents) lib/View/EditEvent.php lib/CalendarsManager.php new.php Other than Handler.php, the rest is UI related code and doesn't seem pertinent to listEvents activity.

05/29/2014 07:42:20 PM	Michael Slusarz	Comment #20 Priority ⇒ 1. Low State ⇒ Assigned Type ⇒ Enhancement	Reply to this comment
This doesn't help. As I noted above, 10 requests/second is already fairly heavy usage. Your solution bumps that up to 20 requests/second (you have to assume worst-case scenario). No, you don't. The requests will follow a Poisson distribution, assuming mt_rand is not biased towards certain values It is. (although not for the numbers you gave). So the with increasing number of requests, the average will quickly stabilise towards 10 requests/second. Not really. 10-20 requests is going to show bias one way or the other. Sure enough, running a sample size of 20 mt_rand() calls created an average time of 82806ns. 18% from the expected "median" is not "stabilizing" in that sample size. And if we are running thousands/millions of retries, the problem is not in the random distribution but rather in the session access in the first place. And the original problem remains: you are potentially delaying the normal scenario - a single request is queued - in an attempt to make the outlier work better - since you potentially increase this common case to 50% slower than it currently is. We need to optimize for the normal scenario. I'm going to commit a slightly better algorithm for this purpose, by ramping up to the the 0.1 second delay. [Show Quoted Text - 13 lines][Hide Quoted Text] And there is a general (albeit not mandatory) requirement that requests SHOULD be handled in a FIFO basis. If Kronolith for some reason sends 15 listEvents requests in a row, it is a reasonable assumption that the 15th is the least important request. It should not, by virtue of random luck, suddenly jump to 2nd in the queue. There is no way (whatever method you use) if these requests are handled by separate processes and it is required to get a lock on a shared resource If that is needed, a new request can only be initiated when the previous one is know to be active (not waiting to get a lock on a resource). Since this is not done, requests can (and will be) served in random order. Only for the memcache sessionhandler - or at least a poll-based locking handler instead of an event-based handler. WHICH IS EXPECTED (and one of the reasons why we don't recommend memcache sessionhandler). That doesn't mean the assumption doesn't hold ... and the fact that the other sessionhandlers do support this generalization. Randomizing the queue order is something you do to prevent deadlocks. It doesn't do anything to make a single access any faster. In some instances it might be faster than the current strategy, but in some instances it might be worse. The limiting factor remains the median polling time. I'm with Mike in that there definitely is no bug here - this is an enhancement request at best. It's possible that the 0.1 sec poll time is not ideal - but I'm not going to change that without evidence to prove a better value, especially since the sessionhandler works and the performance issues are known and have been known since the beginning.

05/29/2014 07:04:01 PM	arjen+horde (at) de-korte (dot) org	Comment #19	Reply to this comment
This doesn't help. As I noted above, 10 requests/second is already fairly heavy usage. Your solution bumps that up to 20 requests/second (you have to assume worst-case scenario). No, you don't. The requests will follow a Poisson distribution, assuming mt_rand is not biased towards certain values. So the with increasing number of requests, the average will quickly stabilise towards 10 requests/second. And there is a general (albeit not mandatory) requirement that requests SHOULD be handled in a FIFO basis. If Kronolith for some reason sends 15 listEvents requests in a row, it is a reasonable assumption that the 15th is the least important request. It should not, by virtue of random luck, suddenly jump to 2nd in the queue. There is no way (whatever method you use) if these requests are handled by separate processes and it is required to get a lock on a shared resource. If that is needed, a new request can only be initiated when the previous one is know to be active (not waiting to get a lock on a resource). Since this is not done, requests can (and will be) served in random order. As the cachegrind output I sent you also indicates, where the third request that was started is served last.

05/29/2014 06:38:56 PM	Michael Rubinsky	Comment #18	Reply to this comment
I wonder if one compromise option is to lump all "internal" calendars into a single request, and all "other" calendars into a second request. I agree it's an option. I think I even may have suggested this in the other ticket. [Show Quoted Text - 27 lines][Hide Quoted Text] OK. I wasn't sure how listEvents were being sent. I agree with this from a UI perspective. Aside: it would be really nice for some sort of PHP solution to allow for bi-directional communication regarding this kind of polling. see http://wiki.horde.org/Project/H6TODO and/or reactphp Obviously this isn't useful now, but something to look at in the future. This would create an unnecessary wait time for users that have, e.g., any slowly responding remote calendars. Unfortunately, the problem as it currently stands is that you can possibly/easily create a pretty bad DoS situation for the current user. Say a user has 20 remote calendars defined. (It sounds like Arjen has ~15, so this is not an excessive number). Worst case scenario: those 20 remote calendars are reachable ... meaning they won't reject network connection attempts ... but the actual services on those servers don't return anything. The connections just sit in wait states. If you send 20 requests, and each request has to time out, this is 20 request * 30 seconds (the default) = 10 minutes. Some of those PHP requests may timeout... but from my experience you can't count on this - and this can also be disabled locally in php.ini. So you've essentially locked the user out of accessing Horde for 10 minutes. I agree, but isn't this the same, even if the requests were lumped together? We still need to wait for the responses from all the calendars before we send the response. The user would, in this case, not see ANY calendars in their interface for 10 minutes in your above example, no? My example is session-handler agnostic. Seems to me a better solution, from a performance perspective, is to manually close the session while the remote calendars are being fetched. I may be wrong, and Jan can correct me if I am, but I believe we already DO close the session while fetching the calendars...or maybe I misunderstand what you mean.

05/29/2014 06:29:12 PM	Michael Slusarz	Comment #17	Reply to this comment
[Show Quoted Text - 9 lines][Hide Quoted Text] If Kronolith for some reason sends 15 listEvents requests in a row, it is a reasonable assumption that the 15th is the least important request. It should not, by virtue of random luck, suddenly jump to 2nd in the queue. For the sake of completeness, the earlier requests are the requests that are (supposed) to take the least amount of time to complete. I.e., internal calendars. The later requests are those that may take a longer time, such as remote calendars. I wonder if one compromise option is to lump all "internal" calendars into a single request, and all "other" calendars into a second request. ... especially since I am not convinced that the use-case here is what needs enhancement, not the session storage. Your earlier comment about it working that way "because that's the way the javascript code was written." isn't correct. It was written that way because we don't want the user waiting for EVERY SINGLE calendar to be loaded into memory before we send anything back to the client and update the UI. OK. I wasn't sure how listEvents were being sent. I agree with this from a UI perspective. Aside: it would be really nice for some sort of PHP solution to allow for bi-directional communication regarding this kind of polling. see http://wiki.horde.org/Project/H6TODO and/or reactphp Obviously this isn't useful now, but something to look at in the future. This would create an unnecessary wait time for users that have, e.g., any slowly responding remote calendars. Unfortunately, the problem as it currently stands is that you can possibly/easily create a pretty bad DoS situation for the current user. Say a user has 20 remote calendars defined. (It sounds like Arjen has ~15, so this is not an excessive number). Worst case scenario: those 20 remote calendars are reachable ... meaning they won't reject network connection attempts ... but the actual services on those servers don't return anything. The connections just sit in wait states. If you send 20 requests, and each request has to time out, this is 20 request * 30 seconds (the default) = 10 minutes. Some of those PHP requests may timeout... but from my experience you can't count on this - and this can also be disabled locally in php.ini. So you've essentially locked the user out of accessing Horde for 10 minutes. IMO, it doesn't make any sense to force this extra wait time on the user because one of the (not even recommended) session handlers doesn't work satisfactorily. My example is session-handler agnostic. Seems to me a better solution, from a performance perspective, is to manually close the session while the remote calendars are being fetched. (Even better, if the session data is unchanged in the listEvents call, you don't even need to reopen the session after it is completed.) This is what we do in IMP when: * listing Mailboxes * building mailbox list * building message display data to prevent these kind of DoS issues.

05/29/2014 05:08:21 PM	Michael Rubinsky	Comment #16	Reply to this comment
If Kronolith for some reason sends 15 listEvents requests in a row, it is a reasonable assumption that the 15th is the least important request. It should not, by virtue of random luck, suddenly jump to 2nd in the queue. For the sake of completeness, the earlier requests are the requests that are (supposed) to take the least amount of time to complete. I.e., internal calendars. The later requests are those that may take a longer time, such as remote calendars. ... especially since I am not convinced that the use-case here is what needs enhancement, not the session storage. Your earlier comment about it working that way "because that's the way the javascript code was written." isn't correct. It was written that way because we don't want the user waiting for EVERY SINGLE calendar to be loaded into memory before we send anything back to the client and update the UI. This would create an unnecessary wait time for users that have, e.g., any slowly responding remote calendars. IMO, it doesn't make any sense to force this extra wait time on the user because one of the (not even recommended) session handlers doesn't work satisfactorily.

05/29/2014 05:00:06 PM	Michael Slusarz	Comment #15	Reply to this comment
This doesn't help. As I noted above, 10 requests/second is already fairly heavy usage. Your solution bumps that up to 20 requests/second (you have to assume worst-case scenario). And there is a general (albeit not mandatory) requirement that requests SHOULD be handled in a FIFO basis. If Kronolith for some reason sends 15 listEvents requests in a row, it is a reasonable assumption that the 15th is the least important request. It should not, by virtue of random luck, suddenly jump to 2nd in the queue. From a practical perspective, this is the ONLY place in Horde where we are having this issue. Nowhere else can I think of a place where we have more than 1 or maybe 2 session requests queued. So we should not be vastly altering the queue system based on a single use-case ... especially since I am not convinced that the use-case here is what needs enhancement, not the session storage.

05/29/2014 10:01:50 AM	arjen+horde (at) de-korte (dot) org	Comment #14 New Attachment: Horde_Memcache.patch	Reply to this comment
Lowering the poll delay is not a viable option. At 10 polls/second (per request), that is already quite a bit of network/protocol overhead. Maybe it would be better to implement a logarithmically increasing delay mechanism. Or for true performance, a hybrid flock/memcache system. After session_start(), it seems to take a few milliseconds before the lock on the session data is released. I noticed that frequently it takes a while before the first six listEvents that are started manage to start a session. This is due to the fact that the poll delay for all of them is identical, so if they are started at the same time, the subsequent polls to retry to get a lock on the session data will also coincide. In that case, it may take up to six times 100 ms before all processes get going. Adding some randomness in the polling would help to minimize that, by retrying at different intervals (see attached patch).

05/29/2014 01:19:33 AM	Michael Rubinsky	Comment #13	Reply to this comment
I remember somewhere bringing up exactly this issue ... that it didn't seem to make sense to issue X # of listEvents requests as opposed to packaging all of them together in a single call. (Can't find a ticket, and searching my e-mails didn't uncover anything either). ~~Bug: 12729~~

05/29/2014 12:56:21 AM	Michael Slusarz	Comment #12	Reply to this comment
Back to the initial topic of this ticket... I remember somewhere bringing up exactly this issue ... that it didn't seem to make sense to issue X # of listEvents requests as opposed to packaging all of them together in a single call. (Can't find a ticket, and searching my e-mails didn't uncover anything either). I remember being told at the time that this was necessary due to the way the javascript code was written. Which didn't seem to be a satisfactory resolution, IMHO, since that responded only to why it wasn't currently being done as opposed to whether it was a good idea or not. Still feel that this is a valid concern.

05/29/2014 12:25:29 AM	Git Commit	Comment #11	Reply to this comment
Changes have been made in Git (master): commit e284e4ec3d7d2a2aff1b4936d9f58fbe3c3dc63a Author: Michael M Slusarz <slusarz@horde.org> Date: Wed May 28 18:24:57 2014 -0600 Again make clear that nobody should be using anything other than file-based sessionhandler backends unless there is a demonstrable reason for it Bug #13231 horde/config/conf.xml \| 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) http://github.com/horde/horde/commit/e284e4ec3d7d2a2aff1b4936d9f58fbe3c3dc63a

05/29/2014 12:17:11 AM	Michael Slusarz	Comment #10	Reply to this comment
[Show Quoted Text - 9 lines][Hide Quoted Text] You rather mean inside Horde_Memcache#lock()? Yes. When six processes attempt to lock the session data more or less at the same time, at most one will succeed in doing so, while the others sleep for 100 ms. If after that the session data is still locked (or locked again), this repeats until all have completed. Well, that's how locking works. The question is, why is unlocking not working during the session. It is working. I'm guessing the issue lies more with our current polling choice - 0.1 seconds. You can't compare memcache with other sessionhandlers re: locking, since other sessionhandlers (i.e. file, sql) have built-in lock mechanisms (flock, SQL row locking) that are going to be instantaneous. For memcache, our only solution is a poll-based solution. Lowering the poll delay is not a viable option. At 10 polls/second (per request), that is already quite a bit of network/protocol overhead. Maybe it would be better to implement a logarithmically increasing delay mechanism. Or for true performance, a hybrid flock/memcache system. Moral of the story: as I've tried to tell some of my clients, memcache is NOT the ideal session storage backend. Memcache might be useful if you are running a pool of PHP servers, but on a single server setup there remains zero reason to use anything other than file-based sessions.

05/29/2014 12:08:22 AM	Michael Slusarz	Comment #9	Reply to this comment
I sent a bunch of xdebug output files to Michael Slusarz a few days ago, which shows session_start() is spending a lot of time in php::usleep (waiting for Horde_Memcache->lock). Would it be useful to add them to this ticket (or send them to someone else?) The xdebug files are perfectly fine. They are correctly calling unlock at the end of each access.

05/28/2014 01:12:23 PM	Jan Schneider	Comment #8	Reply to this comment
Well, that's how locking works. The question is, why is unlocking not working during the session. I don't think I fully understand this. Should the lock be released before or after processing the calendar resource? Before: https://github.com/horde/horde/blob/master/kronolith/lib/Ajax/Application/Handler.php#L113 [Show Quoted Text - 17 lines][Hide Quoted Text] Horde_SessionHandler_Storage_Hashtable::close() should be called from PHP, and that should delete() the key from memcache again in Horde_Memcache::unlock(). That is what happens, since all 15 listEvents are serviced eventually. The problem I see (but this may be by design, I don't know) is that in the implementation of the current Hashtable sessionhandler, the lock seems to be removed only when the listEvent completed. Which means that effectively, only one listEvent is handled at the same time, while other listEvent processes are waiting for the lock to release. I expected similar behavior as for the Sql or builtin sessionhandlers, where all listEvent processes are running concurrently and don't seem to have to wait for locks to be released. Maybe these don't lock the session data and is that the reason why they are a lot faster in updating the calendar view. No, sessions are always locked. And this doesn't have anything to do with Kronolith. Closing a session obviously just doesn't work in the Memcache implementation. For whatever reason.

05/28/2014 12:50:22 PM	arjen+horde (at) de-korte (dot) org	Comment #7	Reply to this comment
Well, that's how locking works. The question is, why is unlocking not working during the session. I don't think I fully understand this. Should the lock be released before or after processing the calendar resource? Horde_SessionHandler_Storage_Hashtable::close() should be called from PHP, and that should delete() the key from memcache again in Horde_Memcache::unlock(). That is what happens, since all 15 listEvents are serviced eventually. The problem I see (but this may be by design, I don't know) is that in the implementation of the current Hashtable sessionhandler, the lock seems to be removed only when the listEvent completed. Which means that effectively, only one listEvent is handled at the same time, while other listEvent processes are waiting for the lock to release. I expected similar behavior as for the Sql or builtin sessionhandlers, where all listEvent processes are running concurrently and don't seem to have to wait for locks to be released. Maybe these don't lock the session data and is that the reason why they are a lot faster in updating the calendar view.

05/28/2014 12:32:24 PM	Jan Schneider	Comment #6	Reply to this comment
You rather mean inside Horde_Memcache#lock()? Yes. When six processes attempt to lock the session data more or less at the same time, at most one will succeed in doing so, while the others sleep for 100 ms. If after that the session data is still locked (or locked again), this repeats until all have completed. Well, that's how locking works. The question is, why is unlocking not working during the session. Horde_SessionHandler_Storage_Hashtable::close() should be called from PHP, and that should delete() the key from memcache again in Horde_Memcache::unlock().

05/28/2014 11:13:28 AM	arjen+horde (at) de-korte (dot) org	Comment #5	Reply to this comment
You rather mean inside Horde_Memcache#lock()? Yes. When six processes attempt to lock the session data more or less at the same time, at most one will succeed in doing so, while the others sleep for 100 ms. If after that the session data is still locked (or locked again), this repeats until all have completed. Well if the Memcache object doesn't add() this lock key, that's probably nothing that Horde can handle. The lock is added.

05/28/2014 10:55:10 AM	Jan Schneider	Comment #4	Reply to this comment
You rather mean inside Horde_Memcache#lock()? Well if the Memcache object doesn't add() this lock key, that's probably nothing that Horde can handle.

05/28/2014 10:16:20 AM	arjen+horde (at) de-korte (dot) org	Comment #3	Reply to this comment
I sent a bunch of xdebug output files to Michael Slusarz a few days ago, which shows session_start() is spending a lot of time in php::usleep (waiting for Horde_Memcache->lock). Would it be useful to add them to this ticket (or send them to someone else?)

05/28/2014 09:35:31 AM	Jan Schneider	Summary ⇒ Hashtable session handler doesn't unlock on session close

05/28/2014 09:34:55 AM	Jan Schneider	Version ⇒ Git master Queue ⇒ Horde Framework Packages Type ⇒ Bug State ⇒ Feedback Priority ⇒ 1. Low

05/28/2014 09:34:28 AM	Jan Schneider	Comment #2	Reply to this comment
This is what we already do. Well, we actually close the session while events are loaded. As you already noticed by using a different session handler this works just fine. If it doesn't work with the Hashtable backend, the bug is there.

05/28/2014 06:56:33 AM	arjen+horde (at) de-korte (dot) org	Comment #1 Priority ⇒ 1. Low State ⇒ New Patch ⇒ No Milestone ⇒ Summary ⇒ listEvents will run sequentially when listing calendar events Type ⇒ Enhancement Queue ⇒ Kronolith	Reply to this comment
I'm using Firefox 29 which, like most browsers, will allow a limited number of simultaneous connections to the same server (in this version, six). When listing the events for my calendar, it will fire off 15 processes, each handling a single calendar resource. It will run 6 processes in parallel, so far so good. Now the hairy part. With the builtin-, File- and Sql-sessionhandlers these 6 processes will run concurrently and all 15 are handled in about half a second. Very acceptable performance and switching between month views is pretty much instantaneous. With the Hashtable sessionhandler things are quite different. Each process will attempt to get a lock on the (same) session data that is stored during session_start(). Obviously, only one of 6 listEvents calls succeeds to get the lock (so the locking mechanism works), the others have to wait until the session is closed again. So effectively, the listEvents calls are handled sequentially for the Hashtable sessionhandler. This means that switching between month views takes well over two seconds. Not impressive at all. As far as I can see, the listEvents call from the Calendar will not have to change the session data. In this case, it might be useful to open the session data readonly, in which case no locking is used (nor needed), so multiple processes can use the same data concurrently.