6.0.0-alpha10
5/15/25

[#7363] Avoid bitwise operations in the SQL Share driver
Summary Avoid bitwise operations in the SQL Share driver
Queue Horde Framework Packages
Queue Version FRAMEWORK_3
Type Enhancement
State Resolved
Priority 2. Medium
Owners Horde Developers (at) , jan (at) horde (dot) org
Requester leena.heino (at) uta (dot) fi
Created 09/19/2008 (6082 days ago)
Due
Updated 01/30/2011 (5219 days ago)
Assigned 01/25/2010 (5589 days ago)
Resolved 01/30/2011 (5219 days ago)
Milestone
Patch No

History
01/30/2011 10:51:02 AM Jan Schneider Comment #34
State ⇒ Resolved
Reply to this comment
Yes
01/30/2011 03:59:27 AM Chuck Hagenbuch Comment #33
Assigned to Jan Schneider
State ⇒ Feedback
Reply to this comment
seems like this can be closed now with Sqlng?
05/31/2010 12:56:28 PM ibon (dot) igartua (at) ehu (dot) es Comment #32
New Attachment: sql.php[2].patch Download
Reply to this comment

The attached patch optimizes the way we do the problematic slow query. 
We have patched our production environment and our problem seems to be 
definitively solved. Thanks god!

We can do it because in our horde installation we don't allow sharing 
anything with all the users (authenticated or not) - perm_creator, 
perm_default and perm_guest will always be zero in our database

We are changing the original LEFT JOIN with a UNION

At the moment we ONLY change the query if the user is not part of any 
group and if we don't have $attributes. It could be improved, but this 
is enough for us.

Original query:
        SELECT DISTINCT s.*
        FROM nag_shares s
        LEFT JOIN nag_shares_users AS u ON u.share_id = s.share_id
        WHERE s.share_owner = 'username'
        OR (s.perm_creator & 2)
        OR (s.perm_default & 2)
        OR ( u.user_uid = 'username' AND (u.perm & 2))
        ORDER BY s.attribute_name ASC;

Improved query (written by David Fernandez -sysadmin at the UPV/EHU- 
based on the patch sent by John Madden to the kronolith mailing list:
        SELECT DISTINCT s.*
        FROM nag_shares s
        WHERE s.share_owner = 'username'
        UNION
        SELECT s2.*
        FROM nag_shares s2 RIGHT JOIN nag_shares_users u ON s2.share_id = u.share_id
        WHERE ( u.user_uid = 'username' AND (u.perm & 2))

Comments are welcomed !
01/25/2010 12:11:25 PM ibon (dot) igartua (at) ehu (dot) es Comment #31 Reply to this comment
ibon: that's really interesting. Have you observed similar things 
with other tables, or is this specific to the share tables? Do you 
have a sense of what about innodb is giving better performance here?
  Sorry for the delay replying. We only changed to InnoDB the _shares 
tables. I believe the performance improved a lit due to the way MySQL 
caches the table data always in main memory.

  At the moment we are still experiencing serious performance problems 
at peak ours. We use to have more than 1K active sessions 
simultaneously (active in the last 5 mins) with 25k rows in *_shares 
tables (kronolith, mnemo, nag and turba).

  We have a "big" server dedicated for the horde MySQL 5 server: 2 
quad core with 32 Gb RAM. We are at the moment setting up a second 
server and trying to configure them as multi-master. One for writing 
and both balanced for reading. We hope it'll improve the performance.

  By the way ... is this last patch published here reliable?
01/25/2010 02:33:16 AM Chuck Hagenbuch Comment #30
Taken from Duck
State ⇒ Assigned
Patch ⇒ No
New Attachment: sql.php[1].patch Download
Reply to this comment
Attaching a patch from John Madden, via the kronolith mailing list.
11/08/2009 04:50:32 PM Chuck Hagenbuch Comment #28 Reply to this comment
ibon: that's really interesting. Have you observed similar things with 
other tables, or is this specific to the share tables? Do you have a 
sense of what about innodb is giving better performance here? Thanks!
11/06/2009 10:58:57 AM ibon (dot) igartua (at) ehu (dot) es Comment #27 Reply to this comment
Note: we have experienced a considerable improvement moving the 
affected tables (*_shares & *_shares_users) from MyISAM to InnoDB
11/06/2009 10:04:32 AM Duck Comment #26 Reply to this comment
Practical question: moving from MySQL to Postgree or Oracle would 
solve this problem ?
MySQL doesn't feature bitmap type indexes. If you switch to a DB 
server that does, it will  help a lot.

10/30/2009 01:01:09 PM ibon (dot) igartua (at) ehu (dot) es Comment #25 Reply to this comment
We have the same problem here at the University of the Basque Country. 
We have recently upgraded from Horde Groupware Webmail 1.0 to 1.2 and 
activated the shares and the load of the MySql server is frightening.



Practical question: moving from MySQL to Postgree or Oracle would 
solve this problem ?
10/02/2009 09:55:27 AM Duck Comment #24 Reply to this comment
Release notes
suggest that the share sql driver was to be a major improvement from
the datatree based shares. But reading the comments the current share
sql driver is a major bottleneck in large installations and in
actually use it can be worse than the share datatree driver that it
replaced.
I wrote the sql driver as the datatree driver was 100 time slower on 
really large data. Tested on my production installation we fall from 
40s to 0.34s or something per Ansel gallery list with caa 80k 
galleries and 1M images. Mainly because of how sql sever creates and 
manages temporary tables for query execution. Even others have 
reported big speed improvements on the mailing-list. But, yes, it was 
just a step forward to something really useful. See ticket #6109.
Would it be possible for someone to write some sort of conversion
script from the current share sql to the new one that does not use
bitwise operations. I could then test this new code with data from
our installation.
A good time to rewrite Hore_Share into H4/git lib :)
10/02/2009 09:48:25 AM Jan Schneider Comment #23 Reply to this comment
Ping?
Nothing from me at this time.
This was actually a ping regarding Chuck's question for a large set of 
test data.
10/02/2009 09:28:35 AM leena (dot) heino (at) uta (dot) fi Comment #22 Reply to this comment
Ping?
Nothing from me at this time. Probably even not in future as the next
month I am going to join another company where RoR is used an I don't
know if I will have time to spend in for Horde anymore. For the next
30 days I will try to update all my code to the last H4 changes and
fix my internal buglist. This is all I can offer for now. I have one
guy who will replace me in this company.
I have a suggestion. Replace the current sql based share driver with 
the one that does not use the bitwise operations. Release notes 
suggest that the share sql driver was to be a major improvement from 
the datatree based shares. But reading the comments the current share 
sql driver is a major bottleneck in large installations and in 
actually use it can be worse than the share datatree driver that it 
replaced.



Would it be possible for someone to write some sort of conversion 
script from the current share sql to the new one that does not use 
bitwise operations. I could then test this new code with data from our 
installation.
10/02/2009 08:57:26 AM Duck Comment #21 Reply to this comment
Ping?
Nothing from me at this time. Probably even not in future as the next 
month I am going to join another company where RoR is used an I don't 
know if I will have time to spend in for Horde anymore. For the next 
30 days I will try to update all my code to the last H4 changes and 
fix my internal buglist. This is all I can offer for now. I have one 
guy who will replace me in this company.
10/01/2009 10:01:34 PM Jan Schneider Comment #20 Reply to this comment
Ping?
07/09/2009 05:10:52 AM Chuck Hagenbuch Comment #19
Assigned to Duck
Taken from Chuck Hagenbuch
State ⇒ Feedback
Reply to this comment
Duck - any chance of getting that test data?
07/08/2009 06:10:07 PM paul (dot) hamby (at) kw (dot) com Comment #18 Reply to this comment
Any progress on this bug? We have a fairly large user base (approx 
75000 users, 10-1500 users actually login daily). We recently upgraded 
from an older Horde version to the Horde Webmail Edition 1.2.2. With 
the upgrade, we now have 3 load balanced webmail servers all talking 
to one Mysql server (ver. 5.0..32, dual proc, dual core 2.0 GHz Xeons 
with 4GB RAM).

The webmail servers are holding up great, but the MySQL box is under 
heavy load. We've done a fair amount of tuning and tweaking, but we 
appear to be affected by this "bug". When we run "mytop" we 
consistently see queries similar to the one mentioned in this bug (for 
turba, nag, mnemo and kronolith tables) which do not use indexes and 
create temporary tables. These are also the only queries that show up 
in the mysql-slow.log.


11/05/2008 11:11:56 AM Duck Comment #17 Reply to this comment
Duck - as far as I know you have the largest data set for shares.
Could you make it available for others to use in performance testing?
Alternately, we could generate some random test data and use that. I
know it'd be better than my local database for real scale testing.
I will prepare some testing data and some possible structures in the 
next days.
11/04/2008 05:08:56 PM Chuck Hagenbuch Comment #16 Reply to this comment
Duck - as far as I know you have the largest data set for shares. 
Could you make it available for others to use in performance testing? 
Alternately, we could generate some random test data and use that. I 
know it'd be better than my local database for real scale testing.
10/09/2008 07:13:44 AM Duck Comment #15 Reply to this comment
ACTUAL QUERY  - time: 2.062

SELECT DISTINCT s.*  FROM ansel_shares s  LEFT JOIN ansel_shares_users 
AS u ON u.share_id = s.share_id WHERE ( (s.share_owner = 'duck' OR 
(s.perm_creator & 2) != 0 OR (s.perm_default & 2) != 0 OR ( u.user_uid 
= 'duck' AND (u.perm & 2) != 0))  AND attribute_images > 2) AND 
(s.share_parents = '' OR s.share_parents IS NULL) ORDER BY s.share_id 
DESC



FLAT CRITERIA - time: 2.031

SELECT DISTINCT s.*  FROM ansel_shares s  LEFT JOIN ansel_shares_users 
AS u ON u.share_id = s.share_id WHERE ( (s.share_owner = 'duck' OR 
(s.perm_creator_show = 1) OR (s.perm_default_show = 1) OR ( u.user_uid 
= 'duck' AND u.perm_show = 1))  AND attribute_images > 2) AND 
(s.share_parents = '' OR s.share_parents IS NULL) ORDER BY s.share_id 
DESC



WITHOUT PARENTELS IF NOT NEEDED  - time: 1.964

SELECT DISTINCT s.*  FROM ansel_shares s  LEFT JOIN ansel_shares_users 
AS u ON u.share_id = s.share_id WHERE ( (s.share_owner = 'duck' OR 
s.perm_creator_show = 1  OR s.perm_default_show = 1  OR ( u.user_uid = 
'duck' AND u.perm_show = 1 ))  AND attribute_images > 2) AND 
(s.share_parents = '' OR s.share_parents IS NULL) ORDER BY s.share_id 
DESC



JOIN CIRTERIA OUT OF WHERE  - time: 1.9208

SELECT DISTINCT s.*  FROM ansel_shares s  LEFT JOIN ansel_shares_users 
AS u ON u.share_id = s.share_id AND u.user_uid = 'duck' AND 
u.perm_show = 1 WHERE ( (s.share_owner = 'duck' OR s.perm_creator_show 
= 1  OR s.perm_default_show = 1) AND attribute_images > 2) AND 
(s.share_parents = '' OR s.share_parents IS NULL) ORDER BY s.share_id 
DESC


10/08/2008 03:54:07 PM Duck Comment #14
New Attachment: shares.tgz Download
Reply to this comment
I create a new version of sql and sql_heirarical driver that stores 
every permission type in its own column. It checks table structure so 
even custom permission like PERM_DELEGATE from Kronolith should work 
if a columns like perm_default/guest/creater_delegate will be created. 
If share caching is enabled, it check the table structure only once 
per session.



I tested the driver with Spread and Ansel. So it should work but is 
not fully tested. I will try to make some performance test on my large 
installation (63k galleries).
10/08/2008 03:38:54 PM Duck Comment #13 Reply to this comment
For me, both GROUP BY and DISTINCT are using a temporary tables.
This is normal. As a temporary table is created for every join. Then 
depends on the data length and server settings if a table is created 
in memory or on disk.


10/08/2008 01:36:01 PM Duck Comment #12 Reply to this comment
I still think this is worth exploring, but Jan and Michael R have
raised some good points on how we can support custom permissions
(such as Kronolith's PERMS_DELEGATE) in a BC way.
if PERMS_DELEGATE and other custom permissions must be possible, we 
can still do as with attributes - relay on table structure. We can 
detect fields on _save() and select with *.


10/06/2008 10:16:44 PM Michael Rubinsky Comment #11
New Attachment: mrubinsk_explains.txt Download
Reply to this comment
Very small dataset, so I'm not sure how helpful this is, but here it 
is.  For me, both GROUP BY and DISTINCT are using a temporary tables.
10/06/2008 08:03:57 PM Chuck Hagenbuch Comment #10 Reply to this comment
Can either/both of you provide EXPLAIN output for these patches?
10/06/2008 05:35:40 PM Duck Comment #9
New Attachment: sql.diff Download
Reply to this comment
In general I am counter to a solution on the PHP part. As you still 
need to load the data on the client site (probably from a remote 
server) and process it.



What about GROUP BY?

DISTINCT is normaly used to remove duplicates. GROUP BY to aggregate 
operators to each group. In our occasion DISTINCT and GROUP BY 
generates the same query plan. Probably will be faster to use GROUP BY 
for listings as better results in joins.

Even for counting we can use GROUP BY where we need to return just the 
number of rows affected instead of looping it into the countShares().



Look at the patch. And try it on your data.
10/06/2008 02:39:33 PM Michael Rubinsky Comment #8
Patch ⇒ Yes
New Attachment: sql.php.patch Download
Reply to this comment
What about something like this for removing the DISTINCT?



Ran it through a few common tasks in Turba, but not tested in depth.
10/06/2008 02:03:52 PM Michael Rubinsky Comment #7 Reply to this comment
Actually, looking over the code now, it looks like we *might* be able 
to drop the duplicates (and get rid of the need for the DISTINCT 
qualifier) where we need to in PHP code. Not 100% sure yet, and it 
would probably require an extra loop in _countShares(), but it might 
be worth it if it's possible.
10/06/2008 01:46:32 PM Michael Rubinsky Comment #6 Reply to this comment
The DISTINCT was/is necessary to avoid counting the same share more 
than once when the share has more than one entry in either the *_users 
or *_groups table. Otherwise, we get erroneous share counts.



Not ideal, but necessary with the current implementation. I agree 
things need to be reorganized after seeing the results of those 
EXPLAIN statements.
10/06/2008 09:14:46 AM Duck Comment #5 Reply to this comment
Mysql does not uses indexes for bitwise operations. So it performs a 
full table scan. I was aware of this while I was writing the native 
driver. But I let this problem as it is still much better than the DT 
driver and had no time to solve even this. This part is still on my 
TODO...



Another table scan was introduced in in revision 1.35 of mrubinsk by 
adding the DISTINCT statement in the SQL query. And this is the *real* 
performance killer for MySQL. As it must scan the joined table - it 
means that must allocate a new temporary table with all joined data, 
using limit or not. Brrr....



So we must reorganize data  and query to avoid this two table scans.
10/01/2008 08:25:25 PM Chuck Hagenbuch Assigned to Chuck Hagenbuch
Assigned to Horde DevelopersHorde Developers
 
10/01/2008 08:25:11 PM Chuck Hagenbuch Comment #4
New Attachment: explains.txt Download
Reply to this comment
Some EXPLAIN output for Share_sql vs. Share_datatree. Neither is 
great, and the datatree query isn't doing as much work, but the 
Share_sql query as is now is clearly pretty bad.
09/21/2008 08:50:00 AM adrieder (at) sbox (dot) tugraz (dot) at Comment #3 Reply to this comment
Or maybe rewriting the query using subquery and union:
[....]
Perhaps gaining a better utilization of index in the database queries.
In my tests (I think it should read "LEFT JOIN nag_shares_users" 
instead of "LEFT JOIN kronolith_shares_users") your modified query 
with the subquery was up to 6 times as fast as the original query. The 
database holds about 12,000 shares.


09/20/2008 05:55:48 PM Chuck Hagenbuch Comment #2
Summary ⇒ Avoid bitwise operations in the SQL Share driver
State ⇒ Accepted
Reply to this comment
I still think this is worth exploring, but Jan and Michael R have 
raised some good points on how we can support custom permissions (such 
as Kronolith's PERMS_DELEGATE) in a BC way.
09/19/2008 05:12:19 PM leena (dot) heino (at) uta (dot) fi Comment #1
State ⇒ New
Priority ⇒ 2. Medium
Type ⇒ Enhancement
Summary ⇒ share sql driver performance
Queue ⇒ Horde Framework Packages
Milestone ⇒
Patch ⇒ No
Reply to this comment
It seems that these bitwise operations in share sql-driver is a major 
performance killer at least in MySQL. This seems to be because SQL's 
bitwise operations do not use index in MySQL and therefore each query 
made in share sql-driver has to make a full table scan.



Would it be possible to change the database schema and the code so 
that each permission to different user type (creator/default/guest) 
would be stored in its own column and therefore we would have columns 
like perm_creator_show, perm_creator_read, perm_creator_edit and 
perm_creator_delete. This would allow us to ditch the bitwise 
operations in the query.



Instead of current query like this:

SELECT DISTINCT s.*  FROM nag_shares s

LEFT JOIN nag_shares_users AS u

ON u.share_id = s.share_id WHERE s.share_owner = 'foo'

OR (s.perm_creator & 4) != 0

OR (s.perm_default & 4) != 0

OR ( u.user_uid = 'foo' AND (u.perm & 4) != 0)

ORDER BY s.attribute_name ASC;



we could do the same query as:

SELECT DISTINCT s.*  FROM nag_shares s

LEFT JOIN nag_shares_users AS u

ON u.share_id = s.share_id WHERE s.share_owner = 'foo'

OR (s.perm_creator_read = 1)

OR (s.perm_default_read = 1)

OR ( u.user_uid = 'foo' AND (u.perm_read = 1))

ORDER BY s.attribute_name ASC;



Or maybe rewriting the query using subquery and union:

SELECT DISTINCT * FROM ((SELECT s.* FROM nag_shares s

WHERE (s.perm_creator = 1)

OR (s.perm_default = 1)

OR (s.share_owner = 'foo')) UNION (SELECT s.* FROM nag_shares s

LEFT JOIN kronolith_shares_users AS u ON u.share_id = s.share_id WHERE 
(u.user_uid = 'foo'

AND (u.perm = 1)))) AS new_s ORDER BY new_s.attribute_name ASC;



Perhaps gaining a better utilization of index in the database queries.

Saved Queries