6.0.0-git
2019-04-24

[#4797] A large horde_vfs table can cause large memory usage during GC
Summary A large horde_vfs table can cause large memory usage during GC
Queue Horde Framework Packages
Queue Version HEAD
Type Bug
State Resolved
Priority 2. Medium
Owners
Requester sean (at) duke (dot) edu
Created 2006-12-18 (4510 days ago)
Due
Updated 2006-12-21 (4507 days ago)
Assigned 2006-12-20 (4508 days ago)
Resolved 2006-12-21 (4507 days ago)
Milestone
Patch No

History
2006-12-21 18:48:35 Chuck Hagenbuch Comment #8
State ⇒ Resolved
Reply to this comment
I was able to get it to work, with two small changes.

1) Make the first line of the function be '$conn = $this->_connect();'
2) Change '$this->_write_db' to be '$this->db'
This means you are not using HEAD, and in fact you're not even using 
the latest FW_3 code. VFS_sql was changed a while ago to not load the 
body of every file in the _listFolder() method (it uses the db server 
to get the file sizes instead), which is probably what's driving up 
your memory usage anyway. I'll keep the changes, but you should really 
update.
A couple other comments, I notice a lot of the other functions use
$this->_db->quote(), but this one doesn't.
It uses a prepared statement (the ? placeholders).
Also, will there be a similar function put in place for sql_file?
That'd have to do filesystem operations anyway - not nearly as much of 
a speed/efficiency increase. I'd probably accept a patch for one though.
2006-12-21 16:11:43 sean (at) duke (dot) edu Comment #7 Reply to this comment
Thanks for the quick turnaround.



I was able to get it to work, with two small changes.



1) Make the first line of the function be '$conn = $this->_connect();'

2) Change '$this->_write_db' to be '$this->db'



A couple other comments, I notice a lot of the other functions use 
$this->_db->quote(), but this one doesn't.    Also, will there be a 
similar function put in place for sql_file?
2006-12-20 23:09:27 Chuck Hagenbuch Comment #6 Reply to this comment
Okay, can you give this implementation a shot? Drop this into VFS_sql:



     /**

      * Garbage collect files in the VFS storage system.

      *

      * @param string $path   The VFS path to clean.

      * @param integer $secs  The minimum amount of time (in seconds) required

      *                       before a file is removed.

      */

     function gc($path, $secs = 345600)

     {

         $sql = 'DELETE FROM ' . $this->_params['table']

             . ' WHERE vfs_type = ? AND vfs_modified < ? AND (vfs_path 
= ? OR vfs_path LIKE ?)';

         $this->log($sql, PEAR_LOG_DEBUG);



         $values = array(VFS_FILE,

                         time() - $secs,

                         $this->_convertPath($path),

                         $this->_convertPath($path) . '/%');



         return $this->_write_db->query($sql, $values);

     }



... and use it with the latest version of GC.php 
(http://cvs.horde.org/framework/VFS/VFS/GC.php - it'll call a gc() 
method if it exists in the passed-in $vfs object).
2006-12-20 21:33:25 sean (at) duke (dot) edu Comment #5 Reply to this comment
Here's what I'm using in an hourly cron job (put in place to work 
around this bug):



THRESHOLD=`date -d "12 hours ago" +%s`

DEL_COMMAND="DELETE FROM horde_vfs WHERE vfs_path = 
'.horde/imp/compose' AND vfs_modified < ${THRESHOLD}"





After doing this, I'm thinking it might be better to replace the 
'vfs_path = '.horde/imp/compose'' check with a check on the vfs_type.   
  The main concern I have here is not deleting the directories.     
This works for the IMP application.  It may need to be made more 
robust for a general use statement.


2006-12-20 20:08:11 Chuck Hagenbuch Comment #4 Reply to this comment
I'm referring to the code in lib/VFS/GC.php
Thanks. I thought it was in VFS somewhere but I forgot about that file 
and didn't grep - my bad.



Do you happen to have the DELETE statement already, to save my lazy 
self a bit more time?
2006-12-20 19:44:46 sean (at) duke (dot) edu Comment #3 Reply to this comment
Which garbage collection are you referring to?
I'm referring to the code in lib/VFS/GC.php
2006-12-20 19:42:36 Chuck Hagenbuch Comment #2
State ⇒ Feedback
Reply to this comment
Which garbage collection are you referring to?
2006-12-20 19:41:11 Chuck Hagenbuch Version ⇒ HEAD
Queue ⇒ Horde Framework Packages
State ⇒
 
2006-12-18 16:07:37 sean (at) duke (dot) edu Comment #1
Type ⇒ Bug
State ⇒ Unconfirmed
Priority ⇒ 2. Medium
Summary ⇒ A large horde_vfs table can cause large memory usage during GC
Queue ⇒ Horde Base
Reply to this comment
I'm running IMP using Horde VFS as the filestore for attachment 
uploads.  I'm using MySQL as the Horde VFS backend.



When my horde_vfs table grew to 4GB in size, an apache process would 
grow to almost 4GB in size when GC was run.  This caused quite a bit 
of thrashing on boxes that only have 4GB of memory.



After looking at the code, when GC is run, with the SQL backend, it 
ends up doing a select to load the whole table contents into memory.   
This presumably gets put into a buffer in the mysql libs before being 
loaded into PHP, and thus before the PHP memory limit can take effect.



The unfortunate part is that all SQL needs for doing this cleanup is a 
single DELETE statement, then the mysql server will do all the work.   
As such, I suggest allowing each Horde VFS backend have its own GC code.

Saved Queries