Summary | Avoid bitwise operations in the SQL Share driver |
Queue | Horde Framework Packages |
Queue Version | FRAMEWORK_3 |
Type | Enhancement |
State | Resolved |
Priority | 2. Medium |
Owners | Horde Developers (at) , jan (at) horde (dot) org |
Requester | leena.heino (at) uta (dot) fi |
Created | 09/19/2008 (6082 days ago) |
Due | |
Updated | 01/30/2011 (5219 days ago) |
Assigned | 01/25/2010 (5589 days ago) |
Resolved | 01/30/2011 (5219 days ago) |
Milestone | |
Patch | No |
State ⇒ Resolved
Assigned to Jan Schneider
State ⇒ Feedback
New Attachment: sql.php[2].patch
The attached patch optimizes the way we do the problematic slow query.
We have patched our production environment and our problem seems to be
definitively solved. Thanks god!
We can do it because in our horde installation we don't allow sharing
anything with all the users (authenticated or not) - perm_creator,
perm_default and perm_guest will always be zero in our database
We are changing the original LEFT JOIN with a UNION
At the moment we ONLY change the query if the user is not part of any
group and if we don't have $attributes. It could be improved, but this
is enough for us.
Original query:
SELECT DISTINCT s.*
FROM nag_shares s
LEFT JOIN nag_shares_users AS u ON u.share_id = s.share_id
WHERE s.share_owner = 'username'
OR (s.perm_creator & 2)
OR (s.perm_default & 2)
OR ( u.user_uid = 'username' AND (u.perm & 2))
ORDER BY s.attribute_name ASC;
Improved query (written by David Fernandez -sysadmin at the UPV/EHU-
based on the patch sent by John Madden to the kronolith mailing list:
SELECT DISTINCT s.*
FROM nag_shares s
WHERE s.share_owner = 'username'
UNION
SELECT s2.*
FROM nag_shares s2 RIGHT JOIN nag_shares_users u ON s2.share_id = u.share_id
WHERE ( u.user_uid = 'username' AND (u.perm & 2))
Comments are welcomed !
with other tables, or is this specific to the share tables? Do you
have a sense of what about innodb is giving better performance here?
tables. I believe the performance improved a lit due to the way MySQL
caches the table data always in main memory.
At the moment we are still experiencing serious performance problems
at peak ours. We use to have more than 1K active sessions
simultaneously (active in the last 5 mins) with 25k rows in *_shares
tables (kronolith, mnemo, nag and turba).
We have a "big" server dedicated for the horde MySQL 5 server: 2
quad core with 32 Gb RAM. We are at the moment setting up a second
server and trying to configure them as multi-master. One for writing
and both balanced for reading. We hope it'll improve the performance.
By the way ... is this last patch published here reliable?
Taken from Duck
State ⇒ Assigned
Patch ⇒ No
New Attachment: sql.php[1].patch
other tables, or is this specific to the share tables? Do you have a
sense of what about innodb is giving better performance here? Thanks!
affected tables (*_shares & *_shares_users) from MyISAM to InnoDB
solve this problem ?
server that does, it will help a lot.
We have recently upgraded from Horde Groupware Webmail 1.0 to 1.2 and
activated the shares and the load of the MySql server is frightening.
Practical question: moving from MySQL to Postgree or Oracle would
solve this problem ?
suggest that the share sql driver was to be a major improvement from
the datatree based shares. But reading the comments the current share
sql driver is a major bottleneck in large installations and in
actually use it can be worse than the share datatree driver that it
replaced.
really large data. Tested on my production installation we fall from
40s to 0.34s or something per Ansel gallery list with caa 80k
galleries and 1M images. Mainly because of how sql sever creates and
manages temporary tables for query execution. Even others have
reported big speed improvements on the mailing-list. But, yes, it was
just a step forward to something really useful. See
ticket #6109.script from the current share sql to the new one that does not use
bitwise operations. I could then test this new code with data from
our installation.
test data.
month I am going to join another company where RoR is used an I don't
know if I will have time to spend in for Horde anymore. For the next
30 days I will try to update all my code to the last H4 changes and
fix my internal buglist. This is all I can offer for now. I have one
guy who will replace me in this company.
the one that does not use the bitwise operations. Release notes
suggest that the share sql driver was to be a major improvement from
the datatree based shares. But reading the comments the current share
sql driver is a major bottleneck in large installations and in
actually use it can be worse than the share datatree driver that it
replaced.
Would it be possible for someone to write some sort of conversion
script from the current share sql to the new one that does not use
bitwise operations. I could then test this new code with data from our
installation.
month I am going to join another company where RoR is used an I don't
know if I will have time to spend in for Horde anymore. For the next
30 days I will try to update all my code to the last H4 changes and
fix my internal buglist. This is all I can offer for now. I have one
guy who will replace me in this company.
Assigned to Duck
Taken from Chuck Hagenbuch
State ⇒ Feedback
75000 users, 10-1500 users actually login daily). We recently upgraded
from an older Horde version to the Horde Webmail Edition 1.2.2. With
the upgrade, we now have 3 load balanced webmail servers all talking
to one Mysql server (ver. 5.0..32, dual proc, dual core 2.0 GHz Xeons
with 4GB RAM).
The webmail servers are holding up great, but the MySQL box is under
heavy load. We've done a fair amount of tuning and tweaking, but we
appear to be affected by this "bug". When we run "mytop" we
consistently see queries similar to the one mentioned in this bug (for
turba, nag, mnemo and kronolith tables) which do not use indexes and
create temporary tables. These are also the only queries that show up
in the mysql-slow.log.
Could you make it available for others to use in performance testing?
Alternately, we could generate some random test data and use that. I
know it'd be better than my local database for real scale testing.
next days.
Could you make it available for others to use in performance testing?
Alternately, we could generate some random test data and use that. I
know it'd be better than my local database for real scale testing.
SELECT DISTINCT s.* FROM ansel_shares s LEFT JOIN ansel_shares_users
AS u ON u.share_id = s.share_id WHERE ( (s.share_owner = 'duck' OR
(s.perm_creator & 2) != 0 OR (s.perm_default & 2) != 0 OR ( u.user_uid
= 'duck' AND (u.perm & 2) != 0)) AND attribute_images > 2) AND
(s.share_parents = '' OR s.share_parents IS NULL) ORDER BY s.share_id
DESC
FLAT CRITERIA - time: 2.031
SELECT DISTINCT s.* FROM ansel_shares s LEFT JOIN ansel_shares_users
AS u ON u.share_id = s.share_id WHERE ( (s.share_owner = 'duck' OR
(s.perm_creator_show = 1) OR (s.perm_default_show = 1) OR ( u.user_uid
= 'duck' AND u.perm_show = 1)) AND attribute_images > 2) AND
(s.share_parents = '' OR s.share_parents IS NULL) ORDER BY s.share_id
DESC
WITHOUT PARENTELS IF NOT NEEDED - time: 1.964
SELECT DISTINCT s.* FROM ansel_shares s LEFT JOIN ansel_shares_users
AS u ON u.share_id = s.share_id WHERE ( (s.share_owner = 'duck' OR
s.perm_creator_show = 1 OR s.perm_default_show = 1 OR ( u.user_uid =
'duck' AND u.perm_show = 1 )) AND attribute_images > 2) AND
(s.share_parents = '' OR s.share_parents IS NULL) ORDER BY s.share_id
DESC
JOIN CIRTERIA OUT OF WHERE - time: 1.9208
SELECT DISTINCT s.* FROM ansel_shares s LEFT JOIN ansel_shares_users
AS u ON u.share_id = s.share_id AND u.user_uid = 'duck' AND
u.perm_show = 1 WHERE ( (s.share_owner = 'duck' OR s.perm_creator_show
= 1 OR s.perm_default_show = 1) AND attribute_images > 2) AND
(s.share_parents = '' OR s.share_parents IS NULL) ORDER BY s.share_id
DESC
New Attachment: shares.tgz
every permission type in its own column. It checks table structure so
even custom permission like PERM_DELEGATE from Kronolith should work
if a columns like perm_default/guest/creater_delegate will be created.
If share caching is enabled, it check the table structure only once
per session.
I tested the driver with Spread and Ansel. So it should work but is
not fully tested. I will try to make some performance test on my large
installation (63k galleries).
depends on the data length and server settings if a table is created
in memory or on disk.
raised some good points on how we can support custom permissions
(such as Kronolith's PERMS_DELEGATE) in a BC way.
can still do as with attributes - relay on table structure. We can
detect fields on _save() and select with *.
New Attachment: mrubinsk_explains.txt
is. For me, both GROUP BY and DISTINCT are using a temporary tables.
New Attachment: sql.diff
need to load the data on the client site (probably from a remote
server) and process it.
What about GROUP BY?
DISTINCT is normaly used to remove duplicates. GROUP BY to aggregate
operators to each group. In our occasion DISTINCT and GROUP BY
generates the same query plan. Probably will be faster to use GROUP BY
for listings as better results in joins.
Even for counting we can use GROUP BY where we need to return just the
number of rows affected instead of looping it into the countShares().
Look at the patch. And try it on your data.
Patch ⇒ Yes
New Attachment: sql.php.patch
Ran it through a few common tasks in Turba, but not tested in depth.
to drop the duplicates (and get rid of the need for the DISTINCT
qualifier) where we need to in PHP code. Not 100% sure yet, and it
would probably require an extra loop in _countShares(), but it might
be worth it if it's possible.
than once when the share has more than one entry in either the *_users
or *_groups table. Otherwise, we get erroneous share counts.
Not ideal, but necessary with the current implementation. I agree
things need to be reorganized after seeing the results of those
EXPLAIN statements.
full table scan. I was aware of this while I was writing the native
driver. But I let this problem as it is still much better than the DT
driver and had no time to solve even this. This part is still on my
TODO...
Another table scan was introduced in in revision 1.35 of mrubinsk by
adding the DISTINCT statement in the SQL query. And this is the *real*
performance killer for MySQL. As it must scan the joined table - it
means that must allocate a new temporary table with all joined data,
using limit or not. Brrr....
So we must reorganize data and query to avoid this two table scans.
Assigned to
New Attachment: explains.txt
great, and the datatree query isn't doing as much work, but the
Share_sql query as is now is clearly pretty bad.
instead of "LEFT JOIN kronolith_shares_users") your modified query
with the subquery was up to 6 times as fast as the original query. The
database holds about 12,000 shares.
Summary ⇒ Avoid bitwise operations in the SQL Share driver
State ⇒ Accepted
raised some good points on how we can support custom permissions (such
as Kronolith's PERMS_DELEGATE) in a BC way.
State ⇒ New
Priority ⇒ 2. Medium
Type ⇒ Enhancement
Summary ⇒ share sql driver performance
Queue ⇒ Horde Framework Packages
Milestone ⇒
Patch ⇒ No
performance killer at least in MySQL. This seems to be because SQL's
bitwise operations do not use index in MySQL and therefore each query
made in share sql-driver has to make a full table scan.
Would it be possible to change the database schema and the code so
that each permission to different user type (creator/default/guest)
would be stored in its own column and therefore we would have columns
like perm_creator_show, perm_creator_read, perm_creator_edit and
perm_creator_delete. This would allow us to ditch the bitwise
operations in the query.
Instead of current query like this:
SELECT DISTINCT s.* FROM nag_shares s
LEFT JOIN nag_shares_users AS u
ON u.share_id = s.share_id WHERE s.share_owner = 'foo'
OR (s.perm_creator & 4) != 0
OR (s.perm_default & 4) != 0
OR ( u.user_uid = 'foo' AND (u.perm & 4) != 0)
ORDER BY s.attribute_name ASC;
we could do the same query as:
SELECT DISTINCT s.* FROM nag_shares s
LEFT JOIN nag_shares_users AS u
ON u.share_id = s.share_id WHERE s.share_owner = 'foo'
OR (s.perm_creator_read = 1)
OR (s.perm_default_read = 1)
OR ( u.user_uid = 'foo' AND (u.perm_read = 1))
ORDER BY s.attribute_name ASC;
Or maybe rewriting the query using subquery and union:
SELECT DISTINCT * FROM ((SELECT s.* FROM nag_shares s
WHERE (s.perm_creator = 1)
OR (s.perm_default = 1)
OR (s.share_owner = 'foo')) UNION (SELECT s.* FROM nag_shares s
LEFT JOIN kronolith_shares_users AS u ON u.share_id = s.share_id WHERE
(u.user_uid = 'foo'
AND (u.perm = 1)))) AS new_s ORDER BY new_s.attribute_name ASC;
Perhaps gaining a better utilization of index in the database queries.