[#7363] Avoid bitwise operations in the SQL Share driver
Summary Avoid bitwise operations in the SQL Share driver
Queue Horde Framework Packages
Queue Version FRAMEWORK_3
Type Enhancement
State Resolved
Priority 2. Medium
Owners Horde Developers, jan@horde.org
Requester leena.heino@uta.fi
Created 2008-09-19 (4310 days ago)
Due
Updated 2011-01-30 (3447 days ago)
Assigned 2010-01-25 (3817 days ago)
Resolved 2011-01-30 (3447 days ago)
Milestone
Patch No

Comments
leena.heino@uta.fi 2008-09-19 17:12:19
It seems that these bitwise operations in share sql-driver is a major 
performance killer at least in MySQL. This seems to be because SQL's 
bitwise operations do not use index in MySQL and therefore each query 
made in share sql-driver has to make a full table scan.



Would it be possible to change the database schema and the code so 
that each permission to different user type (creator/default/guest) 
would be stored in its own column and therefore we would have columns 
like perm_creator_show, perm_creator_read, perm_creator_edit and 
perm_creator_delete. This would allow us to ditch the bitwise 
operations in the query.



Instead of current query like this:

SELECT DISTINCT s.*  FROM nag_shares s

LEFT JOIN nag_shares_users AS u

ON u.share_id = s.share_id WHERE s.share_owner = 'foo'

OR (s.perm_creator & 4) != 0

OR (s.perm_default & 4) != 0

OR ( u.user_uid = 'foo' AND (u.perm & 4) != 0)

ORDER BY s.attribute_name ASC;



we could do the same query as:

SELECT DISTINCT s.*  FROM nag_shares s

LEFT JOIN nag_shares_users AS u

ON u.share_id = s.share_id WHERE s.share_owner = 'foo'

OR (s.perm_creator_read = 1)

OR (s.perm_default_read = 1)

OR ( u.user_uid = 'foo' AND (u.perm_read = 1))

ORDER BY s.attribute_name ASC;



Or maybe rewriting the query using subquery and union:

SELECT DISTINCT * FROM ((SELECT s.* FROM nag_shares s

WHERE (s.perm_creator = 1)

OR (s.perm_default = 1)

OR (s.share_owner = 'foo')) UNION (SELECT s.* FROM nag_shares s

LEFT JOIN kronolith_shares_users AS u ON u.share_id = s.share_id WHERE 
(u.user_uid = 'foo'

AND (u.perm = 1)))) AS new_s ORDER BY new_s.attribute_name ASC;



Perhaps gaining a better utilization of index in the database queries.

Chuck Hagenbuch <chuck@horde.org> 2008-09-20 17:55:48
I still think this is worth exploring, but Jan and Michael R have 
raised some good points on how we can support custom permissions (such 
as Kronolith's PERMS_DELEGATE) in a BC way.

adrieder@sbox.tugraz.at 2008-09-21 08:50:00
> Or maybe rewriting the query using subquery and union:

[....]

> Perhaps gaining a better utilization of index in the database queries.



In my tests (I think it should read "LEFT JOIN nag_shares_users" 
instead of "LEFT JOIN kronolith_shares_users") your modified query 
with the subquery was up to 6 times as fast as the original query. The 
database holds about 12,000 shares.



Chuck Hagenbuch <chuck@horde.org> 2008-10-01 20:25:11
Some EXPLAIN output for Share_sql vs. Share_datatree. Neither is 
great, and the datatree query isn't doing as much work, but the 
Share_sql query as is now is clearly pretty bad.

Duck <duck@obala.net> 2008-10-06 09:14:46
Mysql does not uses indexes for bitwise operations. So it performs a 
full table scan. I was aware of this while I was writing the native 
driver. But I let this problem as it is still much better than the DT 
driver and had no time to solve even this. This part is still on my 
TODO...



Another table scan was introduced in in revision 1.35 of mrubinsk by 
adding the DISTINCT statement in the SQL query. And this is the *real* 
performance killer for MySQL. As it must scan the joined table - it 
means that must allocate a new temporary table with all joined data, 
using limit or not. Brrr....



So we must reorganize data  and query to avoid this two table scans.

Michael Rubinsky <mrubinsk@horde.org> 2008-10-06 13:46:32
The DISTINCT was/is necessary to avoid counting the same share more 
than once when the share has more than one entry in either the *_users 
or *_groups table. Otherwise, we get erroneous share counts.



Not ideal, but necessary with the current implementation. I agree 
things need to be reorganized after seeing the results of those 
EXPLAIN statements.

Michael Rubinsky <mrubinsk@horde.org> 2008-10-06 14:03:52
Actually, looking over the code now, it looks like we *might* be able 
to drop the duplicates (and get rid of the need for the DISTINCT 
qualifier) where we need to in PHP code. Not 100% sure yet, and it 
would probably require an extra loop in _countShares(), but it might 
be worth it if it's possible.

Michael Rubinsky <mrubinsk@horde.org> 2008-10-06 14:39:33
What about something like this for removing the DISTINCT?



Ran it through a few common tasks in Turba, but not tested in depth.

Duck <duck@obala.net> 2008-10-06 17:35:40
In general I am counter to a solution on the PHP part. As you still 
need to load the data on the client site (probably from a remote 
server) and process it.



What about GROUP BY?

DISTINCT is normaly used to remove duplicates. GROUP BY to aggregate 
operators to each group. In our occasion DISTINCT and GROUP BY 
generates the same query plan. Probably will be faster to use GROUP BY 
for listings as better results in joins.

Even for counting we can use GROUP BY where we need to return just the 
number of rows affected instead of looping it into the countShares().



Look at the patch. And try it on your data.

Chuck Hagenbuch <chuck@horde.org> 2008-10-06 20:03:57
Can either/both of you provide EXPLAIN output for these patches?

Michael Rubinsky <mrubinsk@horde.org> 2008-10-06 22:16:44
Very small dataset, so I'm not sure how helpful this is, but here it 
is.  For me, both GROUP BY and DISTINCT are using a temporary tables.

Duck <duck@obala.net> 2008-10-08 13:36:01
> I still think this is worth exploring, but Jan and Michael R have

> raised some good points on how we can support custom permissions

> (such as Kronolith's PERMS_DELEGATE) in a BC way.



if PERMS_DELEGATE and other custom permissions must be possible, we 
can still do as with attributes - relay on table structure. We can 
detect fields on _save() and select with *.



Duck <duck@obala.net> 2008-10-08 15:38:54
> For me, both GROUP BY and DISTINCT are using a temporary tables.



This is normal. As a temporary table is created for every join. Then 
depends on the data length and server settings if a table is created 
in memory or on disk.



Duck <duck@obala.net> 2008-10-08 15:54:07
I create a new version of sql and sql_heirarical driver that stores 
every permission type in its own column. It checks table structure so 
even custom permission like PERM_DELEGATE from Kronolith should work 
if a columns like perm_default/guest/creater_delegate will be created. 
If share caching is enabled, it check the table structure only once 
per session.



I tested the driver with Spread and Ansel. So it should work but is 
not fully tested. I will try to make some performance test on my large 
installation (63k galleries).

Duck <duck@obala.net> 2008-10-09 07:13:44
ACTUAL QUERY  - time: 2.062

SELECT DISTINCT s.*  FROM ansel_shares s  LEFT JOIN ansel_shares_users 
AS u ON u.share_id = s.share_id WHERE ( (s.share_owner = 'duck' OR 
(s.perm_creator & 2) != 0 OR (s.perm_default & 2) != 0 OR ( u.user_uid 
= 'duck' AND (u.perm & 2) != 0))  AND attribute_images > 2) AND 
(s.share_parents = '' OR s.share_parents IS NULL) ORDER BY s.share_id 
DESC



FLAT CRITERIA - time: 2.031

SELECT DISTINCT s.*  FROM ansel_shares s  LEFT JOIN ansel_shares_users 
AS u ON u.share_id = s.share_id WHERE ( (s.share_owner = 'duck' OR 
(s.perm_creator_show = 1) OR (s.perm_default_show = 1) OR ( u.user_uid 
= 'duck' AND u.perm_show = 1))  AND attribute_images > 2) AND 
(s.share_parents = '' OR s.share_parents IS NULL) ORDER BY s.share_id 
DESC



WITHOUT PARENTELS IF NOT NEEDED  - time: 1.964

SELECT DISTINCT s.*  FROM ansel_shares s  LEFT JOIN ansel_shares_users 
AS u ON u.share_id = s.share_id WHERE ( (s.share_owner = 'duck' OR 
s.perm_creator_show = 1  OR s.perm_default_show = 1  OR ( u.user_uid = 
'duck' AND u.perm_show = 1 ))  AND attribute_images > 2) AND 
(s.share_parents = '' OR s.share_parents IS NULL) ORDER BY s.share_id 
DESC



JOIN CIRTERIA OUT OF WHERE  - time: 1.9208

SELECT DISTINCT s.*  FROM ansel_shares s  LEFT JOIN ansel_shares_users 
AS u ON u.share_id = s.share_id AND u.user_uid = 'duck' AND 
u.perm_show = 1 WHERE ( (s.share_owner = 'duck' OR s.perm_creator_show 
= 1  OR s.perm_default_show = 1) AND attribute_images > 2) AND 
(s.share_parents = '' OR s.share_parents IS NULL) ORDER BY s.share_id 
DESC



Chuck Hagenbuch <chuck@horde.org> 2008-11-04 17:08:56
Duck - as far as I know you have the largest data set for shares. 
Could you make it available for others to use in performance testing? 
Alternately, we could generate some random test data and use that. I 
know it'd be better than my local database for real scale testing.

Duck <duck@obala.net> 2008-11-05 11:11:56
> Duck - as far as I know you have the largest data set for shares.

> Could you make it available for others to use in performance testing?

> Alternately, we could generate some random test data and use that. I

> know it'd be better than my local database for real scale testing.



I will prepare some testing data and some possible structures in the 
next days.

paul.hamby@kw.com 2009-07-08 18:10:07
Any progress on this bug? We have a fairly large user base (approx 
75000 users, 10-1500 users actually login daily). We recently upgraded 
from an older Horde version to the Horde Webmail Edition 1.2.2. With 
the upgrade, we now have 3 load balanced webmail servers all talking 
to one Mysql server (ver. 5.0..32, dual proc, dual core 2.0 GHz Xeons 
with 4GB RAM).

The webmail servers are holding up great, but the MySQL box is under 
heavy load. We've done a fair amount of tuning and tweaking, but we 
appear to be affected by this "bug". When we run "mytop" we 
consistently see queries similar to the one mentioned in this bug (for 
turba, nag, mnemo and kronolith tables) which do not use indexes and 
create temporary tables. These are also the only queries that show up 
in the mysql-slow.log.



Chuck Hagenbuch <chuck@horde.org> 2009-07-09 05:10:52
Duck - any chance of getting that test data?

Jan Schneider <jan@horde.org> 2009-10-01 22:01:34
Ping?

Duck <duck@obala.net> 2009-10-02 08:57:26
> Ping?



Nothing from me at this time. Probably even not in future as the next 
month I am going to join another company where RoR is used an I don't 
know if I will have time to spend in for Horde anymore. For the next 
30 days I will try to update all my code to the last H4 changes and 
fix my internal buglist. This is all I can offer for now. I have one 
guy who will replace me in this company.

leena.heino@uta.fi 2009-10-02 09:28:35
>> Ping?

>

> Nothing from me at this time. Probably even not in future as the next

> month I am going to join another company where RoR is used an I don't

> know if I will have time to spend in for Horde anymore. For the next

> 30 days I will try to update all my code to the last H4 changes and

> fix my internal buglist. This is all I can offer for now. I have one

> guy who will replace me in this company.



I have a suggestion. Replace the current sql based share driver with 
the one that does not use the bitwise operations. Release notes 
suggest that the share sql driver was to be a major improvement from 
the datatree based shares. But reading the comments the current share 
sql driver is a major bottleneck in large installations and in 
actually use it can be worse than the share datatree driver that it 
replaced.



Would it be possible for someone to write some sort of conversion 
script from the current share sql to the new one that does not use 
bitwise operations. I could then test this new code with data from our 
installation.

Jan Schneider <jan@horde.org> 2009-10-02 09:48:25
>> Ping?

>

> Nothing from me at this time.



This was actually a ping regarding Chuck's question for a large set of 
test data.

Duck <duck@obala.net> 2009-10-02 09:55:27
> Release notes

> suggest that the share sql driver was to be a major improvement from

> the datatree based shares. But reading the comments the current share

> sql driver is a major bottleneck in large installations and in

> actually use it can be worse than the share datatree driver that it

> replaced.



I wrote the sql driver as the datatree driver was 100 time slower on 
really large data. Tested on my production installation we fall from 
40s to 0.34s or something per Ansel gallery list with caa 80k 
galleries and 1M images. Mainly because of how sql sever creates and 
manages temporary tables for query execution. Even others have 
reported big speed improvements on the mailing-list. But, yes, it was 
just a step forward to something really useful. See ticket #6109.



>

> Would it be possible for someone to write some sort of conversion

> script from the current share sql to the new one that does not use

> bitwise operations. I could then test this new code with data from

> our installation.



A good time to rewrite Hore_Share into H4/git lib :)

ibon.igartua@ehu.es 2009-10-30 13:01:09
We have the same problem here at the University of the Basque Country. 
We have recently upgraded from Horde Groupware Webmail 1.0 to 1.2 and 
activated the shares and the load of the MySql server is frightening.



Practical question: moving from MySQL to Postgree or Oracle would 
solve this problem ?

Duck <duck@obala.net> 2009-11-06 10:04:32
> Practical question: moving from MySQL to Postgree or Oracle would 
> solve this problem ?

MySQL doesn't feature bitmap type indexes. If you switch to a DB 
server that does, it will  help a lot.


ibon.igartua@ehu.es 2009-11-06 10:58:57
Note: we have experienced a considerable improvement moving the 
affected tables (*_shares & *_shares_users) from MyISAM to InnoDB

Chuck Hagenbuch <chuck@horde.org> 2009-11-08 16:50:32
ibon: that's really interesting. Have you observed similar things with 
other tables, or is this specific to the share tables? Do you have a 
sense of what about innodb is giving better performance here? Thanks!

Duck <duck@obala.net> 2009-11-10 19:00:27

Chuck Hagenbuch <chuck@horde.org> 2010-01-25 02:33:16
Attaching a patch from John Madden, via the kronolith mailing list.

ibon.igartua@ehu.es 2010-01-25 12:11:25
> ibon: that's really interesting. Have you observed similar things 
> with other tables, or is this specific to the share tables? Do you 
> have a sense of what about innodb is giving better performance here?

  Sorry for the delay replying. We only changed to InnoDB the _shares 
tables. I believe the performance improved a lit due to the way MySQL 
caches the table data always in main memory.

  At the moment we are still experiencing serious performance problems 
at peak ours. We use to have more than 1K active sessions 
simultaneously (active in the last 5 mins) with 25k rows in *_shares 
tables (kronolith, mnemo, nag and turba).

  We have a "big" server dedicated for the horde MySQL 5 server: 2 
quad core with 32 Gb RAM. We are at the moment setting up a second 
server and trying to configure them as multi-master. One for writing 
and both balanced for reading. We hope it'll improve the performance.

  By the way ... is this last patch published here reliable?

ibon.igartua@ehu.es 2010-05-31 12:56:28

The attached patch optimizes the way we do the problematic slow query. 
We have patched our production environment and our problem seems to be 
definitively solved. Thanks god!

We can do it because in our horde installation we don't allow sharing 
anything with all the users (authenticated or not) - perm_creator, 
perm_default and perm_guest will always be zero in our database

We are changing the original LEFT JOIN with a UNION

At the moment we ONLY change the query if the user is not part of any 
group and if we don't have $attributes. It could be improved, but this 
is enough for us.

Original query:
        SELECT DISTINCT s.*
        FROM nag_shares s
        LEFT JOIN nag_shares_users AS u ON u.share_id = s.share_id
        WHERE s.share_owner = 'username'
        OR (s.perm_creator & 2)
        OR (s.perm_default & 2)
        OR ( u.user_uid = 'username' AND (u.perm & 2))
        ORDER BY s.attribute_name ASC;

Improved query (written by David Fernandez -sysadmin at the UPV/EHU- 
based on the patch sent by John Madden to the kronolith mailing list:
        SELECT DISTINCT s.*
        FROM nag_shares s
        WHERE s.share_owner = 'username'
        UNION
        SELECT s2.*
        FROM nag_shares s2 RIGHT JOIN nag_shares_users u ON s2.share_id = u.share_id
        WHERE ( u.user_uid = 'username' AND (u.perm & 2))

Comments are welcomed !

Chuck Hagenbuch <chuck@horde.org> 2011-01-30 03:59:27
seems like this can be closed now with Sqlng?

Jan Schneider <jan@horde.org> 2011-01-30 10:51:02
Yes