
[#5913] Lower memory usage while downloading files
Summary: Lower memory usage while downloading files
Queue: Gollem
Queue Version: HEAD
Type: Enhancement
State: Resolved
Priority: 1. Low
Owners: chuck (at) horde (dot) org
Requester: olger.diekstra (at) cardno (dot) com (dot) au
Created: 11/22/2007 (6407 days ago)
Due:
Updated: 09/04/2010 (5390 days ago)
Assigned:
Resolved: 12/11/2007 (6388 days ago)
Milestone:
Patch: No

History
12/11/2007 07:28:54 PM Chuck Hagenbuch Comment #22
When will the next stable release see the light?
All of these changes will be in the Horde 3.2 release series - see 
http://wiki.horde.org/ReleaseManagement for some details. The VFS 
changes were in the latest Horde 3.2 release candidate, and the Gollem 
changes will be in Gollem 1.1, to be released after Horde 3.2 is out.
12/11/2007 05:19:09 AM olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #21
Hehe, I'm happy with 8mb memory_limit... I agree that not using usleep 
is better, even if it uses just .05 seconds. Windows? What's that...? 
;-)

Glad to be of help! When will the next stable release see the light?



Cheers! Olger.
I made it 8192 bytes without the usleep. I guess we could make it 4mb
and maybe get in under the default 8mb memory limit? I didn't include
the usleep because you didn't have it in your original code and it
didn't "smell" right to me (won't work on windows either).

Thanks for all your work on this! I'm closing the ticket but things
can still be tweaked of course.
12/11/2007 05:03:16 AM Chuck Hagenbuch Comment #20
State ⇒ Resolved
I made it 8192 bytes without the usleep. I guess we could make it 4mb 
and maybe get in under the default 8mb memory limit? I didn't include 
the usleep because you didn't have it in your original code and it 
didn't "smell" right to me (won't work on windows either).



Thanks for all your work on this! I'm closing the ticket but things 
can still be tweaked of course.
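
For context, here is a minimal sketch of what an 8192-byte streaming loop along 
these lines might look like. This is an illustration only, not the committed 
code; the $stream/$data variables and the size() call are taken from the 
view.php excerpts quoted elsewhere in this ticket:

$browser->downloadHeaders($filename, null, false,
                          $GLOBALS['gollem_vfs']->size($filedir, $filename));
if (is_resource($stream)) {
    /* Send the file in 8192-byte chunks so memory usage stays flat. */
    while (!feof($stream)) {
        echo fread($stream, 8192);
    }
    fclose($stream);
} else {
    /* The backend returned a string rather than a stream. */
    echo $data;
}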
12/11/2007 12:18:49 AM olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #19
OK, lack of time prevented me from going further on this.

Tried the modifications. It didn't work so I took a look at view.php 
and modified it as follows:



case 'download_file':
    $browser->downloadHeaders($filename, null, false,
                              $GLOBALS['gollem_vfs']->size($filedir, $filename));
    if (is_resource($stream)) {
        while ($buffer = fread($stream, 10240)) {
            print $buffer;
            ob_flush();
            flush();
            usleep(50000);
        }
    } else {
        echo $data;
    }
    /* if (is_resource($stream)) {
        fpassthru($stream);
    } else {
        echo $data;
    } */
    break;



That works for me with a 16MB memory_limit downloading a 61MB file. 
The fpassthru approach still gobbles up more memory than is desirable. 
ob_flush() and flush() are necessary to clear the internal buffers of 
the webserver and PHP, and the usleep(50000) (.05 seconds) allows a 
bit of time for the buffers to actually be flushed before filling them 
again.

How's that?



Cheers! Olger.
12/04/2007 04:34:27 PM Chuck Hagenbuch Comment #18
Is it possible to get a zipped up file of the development code (like
including the code you put in there) from the cvs? Like just one
zipfile for Horde and one for Gollem that I can simply download and
unzip?
I tried using the files from the snapshots, but they don't seem to
have the files I'm after.
That's exactly what the snapshots are. Perhaps you're not using the 
right snapshots. From last night's, you want:



http://ftp.horde.org/pub/snaps/latest/framework-HEAD-2007-12-04.tar.gz

http://ftp.horde.org/pub/snaps/latest/gollem-HEAD-2007-12-04.tar.gz
I haven't used cvs before (am usually a lone programmer, so haven't
had much need for it)
You're missing the point of a version control system then, but that's 
a separate issue. :)
12/04/2007 06:44:50 AM olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #17
Is it possible to get a zipped up file of the development code (like 
including the code you put in there) from the cvs? Like just one 
zipfile for Horde and one for Gollem that I can simply download and 
unzip?

I tried using the files from the snapshots, but they don't seem to 
have the files I'm after.

I haven't used cvs before (am usually a lone programmer, so haven't 
had much need for it) and find it very illogical. Tried getting a CVS 
client but that didn't help much either.


12/04/2007 05:07:29 AM Chuck Hagenbuch Deleted Original Message
 
12/04/2007 05:07:12 AM Chuck Hagenbuch Priority ⇒ 1. Low
 
12/04/2007 05:07:00 AM Chuck Hagenbuch Comment #16
I'm more than happy to help make it work for every filesystem. Took
me a while to figure out how to get files out of the cvs system
though, but I've got the files you modified. (btw, running Horde
3.1.5 and Gollem H3 1.0.3 for testing).
Tested your modifications
No, you didn't test the changes to Gollem. You need the VFS updates 
and the Gollem changes.
The thing is though, in the gollem/view.php file there is a line:
          $data = $GLOBALS['gollem_vfs']->read($filedir, $filename);
http://cvs.horde.org/diff.php?r1=1.60&r2=1.61&f=gollem%2Fview.php



Your view.php looks nothing like the current one in Gollem CVS. See 
http://horde.org/source/ if you still need help with CVS.
For some reason it doesn't copy the file from the VFS to the local
filesystem, but you may know why.
... because you're still accessing the VFS. copy() doesn't work the 
way you're trying to use it. To copy to a _different_ filesystem 
(virtual or otherwise) you need to read the file data.
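
To illustrate the point about copying to a different filesystem, here is a 
rough sketch of moving VFS data to a local temp file in fixed-size blocks. It 
assumes the backend can hand back an open stream resource ($stream), which is 
a simplification rather than a specific VFS API call:

/* Sketch only: copy a VFS-backed file to local disk without holding
 * the whole file in memory. $stream is an assumed read handle. */
$tmp = tempnam(sys_get_temp_dir(), 'gollem');
$out = fopen($tmp, 'wb');
while (!feof($stream)) {
    fwrite($out, fread($stream, 8192));
}
fclose($out);
/* $tmp can now be streamed to the browser, e.g. with readfile($tmp). */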
12/04/2007 12:23:04 AM olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #15
New Attachment: view.php
Hi Chuck,



I'm more than happy to help make it work for every filesystem. Took me 
a while to figure out how to get files out of the cvs system though, 
but I've got the files you modified. (btw, running Horde 3.1.5 and 
Gollem H3 1.0.3 for testing).

Tested your modifications, but no joy. Tried downloading a 380MB file, 
with various memory settings, all the way up to 800MB memory_limit. 
Keeps eating the memory.



The thing is though, in the gollem/view.php file there is a line:

          $data = $GLOBALS['gollem_vfs']->read($filedir, $filename);

which reads everything that's returned from the "read ( , )" function 
into a variable (in memory, which is a local resource!).

In the function "read( , )" you use another variable to read the file 
into a variable, so basically your doubling the memory need to be able 
to download a file.



Memory is a local resource. It doesn't care if it came from SQL, FTP, 
SMB. It's stored in local memory in the machine serving the browser. 
As opposed to copying the entire file into memory, why not copy it to 
disk (a temp directory comes to mind)? (I've used a hardcoded '/tmp/' 
for now, but it's easy enough to get the system variable.)

Using my previously posted ReadFileChunked function, that would be easy 
and compatible with any backend. It would need some additional checking, 
I suppose, but this illustrates the basic idea (gollem/view.php); see 
attachment.



For some reason it doesn't copy the file from the VFS to the local 
filesystem, but you may know why.



I think it gets the basic idea across.



Cheers! Olger.

11/30/2007 12:08:11 AM Chuck Hagenbuch Comment #14
I know that you are trying to be helpful and I appreciate that. But at 
some point you need to start to understand the codebase that you are 
working with.



What your changes do is make sure that Gollem can't work with any 
backend except a local filesystem. I hope it's obvious that that's not 
an acceptable change.



My commits were to the CVS version of Horde and Gollem (Horde is at 
3.2-RC1, Gollem will be released as Gollem 1.1 once Horde 3.2 is 
released - see http://wiki.horde.org/ReleaseManagement). If you are 
interested in helping us in a way that can be incorporated into the 
code and that will help all Gollem users, you need to test that 
version as well. You can get snapshots from http://snaps.horde.org/.



You should be able to just drop in the new file.php and view.php files 
as well, but since I'm not sure which version you mean by "latest and 
greatest", I can't guarantee that. Horde 3.2 is backwards compatible 
with Horde 3.0, though.
11/30/2007 12:01:22 AM olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #13
OK, fixed it myself. Figured out how to use objects (again, I used to 
program in Turbo Pascal with objects a loooong time ago...), and 
resolved my issue with the hardcoded pathname.

Also fixed my memory issue while I was at it. PHP memory_limit is now 
set to 16MB, and I'm downloading a 368MB file. Sweet.



Still in the same file (gollem/view.php). This is what I've done:

/* $data = $GLOBALS['gollem_vfs']->read($filedir, $filename);
if (is_a($data, 'PEAR_Error')) {
    $notification->push(sprintf(_("Access denied to %s"), $filename), 'horde.error');
    header('Location: ' . Util::addParameter(Horde::applicationUrl('manager.php', true), 'actionID', $actionID));
    exit;
} */

Commented this part out completely as it read the file into memory, 
stuffing it up.

/* Run through action handlers. */
switch ($actionID) {
case 'download_file':
    $File_Dir  = $GLOBALS['gollem_vfs']->_getNativePath($filedir, $filename);
    $File_Size = FileSize($File_Dir);
    $browser->downloadHeaders($filename, null, false, $File_Size);
    ReadFileChunked($File_Dir);
    /* echo $data; */
    break;



Let me know what you think!



Cheers! Olger.
11/29/2007 11:33:46 PM olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #12
Hi Chuck,



Installed the latest and greatest stable of Horde/Gollem, but couldn't 
really figure out how to place the file.php/ftp.php. I found similar 
files in the Horde VFS directory, but they were of a vastly different 
size, so I haven't been able to test this yet.

However, having had a spare minute to play with this, I did get it 
working with my own function. I've just downloaded a 368MB file 
successfully from Gollem with my own function. Here's how:



Modified gollem/view.php:

Added function

Function ReadFileChunked ($FileName) {
    $chunksize = (102400); // how many bytes per chunk
    $buffer = '';
    $handle = fopen($FileName, 'rb');
    if ($handle === false) { return false; }
    while (!feof($handle)) {
        $buffer = fread($handle, $chunksize);
        print $buffer;
    }
    return fclose($handle);
}



Then changed this section:

case 'download_file':
    $browser->downloadHeaders($filename, null, false, strlen($data));
    ReadFileChunked("/exampledir/home/web/".$filename);
    /* echo $data; */
    break;



"/exampledir/home" is where all the userdirectories are located, "web" 
is the logged in user. I didn't know how to get that information from 
Horde/Gollem quickly, so for testing purposes I hardcoded it.



But it works a treat. The only reason I still have to keep the PHP 
memory_limit just over the filesize I'm trying to download is because 
of this line:

$data = $GLOBALS['gollem_vfs']->read($filedir, $filename);

Which reads the entire file into memory.



But as opposed to having to set the memory_limit to just over double 
the file size, it now needs to be just over the size of the file.

I'm not used to working with objects in PHP, so haven't been able to 
retrieve the directory from the $GLOBALS['gollem_vfs'] object 
(although I could print the array and view it).

Now if we can get that one line fixed so that the object doesn't read 
the file contents into memory anymore, we'd be laughing.



Cheers! Olger.

11/29/2007 05:20:14 AM Chuck Hagenbuch Comment #11
Assigned to Chuck Hagenbuch
Please give these two commits a shot, assuming that you are using 
either the file or FTP backend in Gollem:



http://lists.horde.org/archives/cvs/Week-of-Mon-20071126/072746.html

http://lists.horde.org/archives/cvs/Week-of-Mon-20071126/072745.html



I went ahead and used fpassthru because I didn't see anything that 
indicated that it read the whole file into memory - just that on some 
older versions it might leak, which is a problem but a different one. :)
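
For reference, a minimal sketch of the fpassthru() idea mentioned above, 
assuming the backend returns an open stream (the fallback to an in-memory 
string mirrors the view.php excerpts quoted elsewhere in this ticket):

if (is_resource($stream)) {
    fpassthru($stream);   /* push the remaining stream data straight to the client */
    fclose($stream);
} else {
    echo $data;           /* backend gave us a string, not a stream */
}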
11/25/2007 11:27:53 PM olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #10
Hi Chuck,



I posted my code in the initial ticket, but just in case you can't see 
it for some reason, here it is again (slightly modified to make it 
easier to read/use).

All you'd basically do is call the second function with the filename, 
and the file should be sent in bits to the browser.



Function ReadFileChunked ($FileName) {
    $chunksize = (102400); // how many bytes per chunk
    $buffer = '';
    $handle = fopen($FileName, 'rb');
    if ($handle === false) { return false; }
    while (!feof($handle)) {
        $buffer = fread($handle, $chunksize);
        print $buffer;
    }
    return fclose($handle);
}

Function SendFileToBrowser ($FileName) {
    Header("Content-Type: application/force-download;");
    Header("Content-Disposition: attachment; filename=\"".$FileName."\"");
    Header("Content-Length: ".FileSize($FileName));
    ReadFileChunked($FileName);
    Exit;
}
Btw, you didn't post the code you mentioned yet, but is it really
necessary to use chunked reads, or will readfile
(http://www.php.net/readfile) or fpassthru do the right thing? In a
way fpassthru would be ideal because we can return a stream from the
VFS library in some backends, or fake it with a local file.
Did a bit of reading up on fpassthru, and you might want to look at 
the first comment on this page: 
http://au.php.net/manual/en/function.fpassthru.php.

There are also various other comments about memory usage when 
downloading files on that page.



Cheers! Olger.
11/25/2007 11:04:41 PM Chuck Hagenbuch Comment #9
Btw, you didn't post the code you mentioned yet, but is it really 
necessary to use chunked reads, or will readfile 
(http://www.php.net/readfile) or fpassthru do the right thing? In a 
way fpassthru would be ideal because we can return a stream from the 
VFS library in some backends, or fake it with a local file.
11/25/2007 11:02:27 PM Chuck Hagenbuch Comment #8
State ⇒ Feedback
I'm not sure what dynamically changing size would have to do with this.
That was a comment on Chuck's remark about often not reading static files.
I meant files on the local filesystem, vs. remote files, not 
often-changing files.
But, it's really not that hard (from my point of view). Right now (it
would appear, anyhow) the entire file is read into memory, from
whatever source (DB, FTP, SMB, etc.), causing memory problems with
large files.
As opposed to reading the file into RAM, why not copy it to local
disk and then send it to the client in chunks, saving heaps on
memory?
As long as you can create that local copy without reading the file 
into RAM in the first place, fine. Works for FTP, but SQL is much 
harder. Etc. You won't get any argument that lower memory usage is 
better - but I think you'll have a better appreciation for things if 
you _do_ look at the Gollem and VFS code.
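
As a rough illustration of the FTP case (a sketch only; the host, credentials 
and remote path are placeholders, and this is not how the VFS FTP driver is 
actually implemented):

/* Sketch: fetch an FTP-backed file into a local temp file without
 * loading it into PHP memory, then stream that copy to the browser. */
$conn = ftp_connect('ftp.example.com');            /* placeholder host */
ftp_login($conn, 'username', 'password');          /* placeholder credentials */
$tmp = fopen(tempnam(sys_get_temp_dir(), 'vfs'), 'w+b');
ftp_fget($conn, $tmp, '/remote/path/file.bin', FTP_BINARY);
rewind($tmp);
fpassthru($tmp);                                   /* send the local copy */
fclose($tmp);
ftp_close($conn);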
11/25/2007 10:35:40 PM olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #7
Hi Guys,
Looking at the sort of files that are being stored in Gollem (and
the same goes for
attachments as well), those wouldn't dynamically change size.
I'm not sure what dynamically changing size would have to do with this.
That was a comment on Chuck's remark about often not reading static files.



But, it's really not that hard (from my point of view). Right now (it 
would appear, anyhow) the entire file is read into memory, from 
whatever source (DB, FTP, SMB, etc.), causing memory problems with 
large files.

As opposed to reading the file into RAM, why not copy it to local disk 
and then send it to the client in chunks, saving heaps on memory?

The downside of having to use so much memory is that your webserver 
will start swapping to disk sooner. In general, clients sit on 
connections that are way slower than average disks can read, so speed 
is not an issue here.

A server with 1GB of RAM will happily serve hundreds of people a 250MB 
file when this is done in chunks, whereas it'll choke on two users if 
those files are read into memory first.



BTW, I live in Australia, so you might not get quick responses when 
you guys are half way through the day... :-)



Cheers! Olger.
11/23/2007 11:40:49 PM Chuck Hagenbuch Comment #6
Version ⇒ HEAD
Michael and Jan's comments are right on the money, hopefully helping 
Olger see some of the complexities here.



Olger, what does this refer to?
Looking at the sort of files that are being stored in Gollem (and 
the same goes for
attachments as well), those wouldn't dynamically change size.
I'm not sure what dynamically changing size would have to do with this.
11/23/2007 11:04:01 AM Jan Schneider Comment #5
This probably gets much easier and more efficient if we can switch to 
streams for some backends in Horde 4.
11/23/2007 07:20:12 AM Michael Slusarz Comment #4
As Chuck says, this is entirely undoable if using a backend like FTP, 
since PHP's ftp get functions don't allow us to chunk data from the 
FTP server.  At a minimum, this kind of block/chunk reading needs to 
be put in the VFS drivers, not Gollem.
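
To sketch the shape of such a driver-level hook (the class and method names 
here are purely hypothetical and are not the real VFS API, which is defined by 
the commits referenced elsewhere in this ticket):

/* Hypothetical sketch: a local-file backend exposing a stream so that
 * callers can read in fixed-size blocks instead of one big string. */
class VFS_file_sketch {
    var $_root = '/var/vfs';   /* assumed base directory */

    function readStream($path, $name)
    {
        /* Return an open read handle instead of the file contents. */
        return @fopen($this->_root . '/' . $path . '/' . $name, 'rb');
    }
}

A caller such as gollem/view.php could then fread() from that handle in 
chunks, as in the snippets elsewhere in this ticket.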
11/22/2007 07:03:52 AM olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #3
Ah, I think I can wait a few days... ;-)



Looking at the sort of files that are being stored in Gollem (and the 
same goes for attachments as well), those wouldn't dynamically change 
size. If they do, something is probably wrong. But the code can easily 
be expanded to include some file checks every time it reads a chunk.



Cheers! Enjoy Thanksgiving!

Olger.
11/22/2007 06:41:32 AM Chuck Hagenbuch Comment #2
Summary ⇒ Lower memory usage while downloading files
Priority ⇒ 2. Medium
Thanks for the ticket. The challenge in Gollem is we're very often not 
reading static files - we're reading from a database, or from an FTP 
server, or an SMB share, or ...



In any case, we can take another look at this, but I'm caught up in 
Thanksgiving stuff here in the U.S. for now, so it'll be a few days at 
least.
11/22/2007 12:12:25 AM olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #1
Priority ⇒ 3. High
State ⇒ New
Queue ⇒ Gollem
Summary ⇒ Fix for php memory_limit
Type ⇒ Enhancement
Gollem (probably Horde as an application entirely) needs quite a 
memory footprint (PHP memory_limit) for downloading mail attachments 
or downloading files from Gollem.



I haven't tested this problem in the latest and greatest of Horde, but 
all tickets and forums I've seen indicate this is no different in the 
current release.



The problem comes from the fact that the file to be downloaded is read 
entirely into memory (on the server) before it is pushed out to the 
browser. This basically means that a 200MB file stored in Gollem needs 
a PHP memory_limit of at least 200MB to be able to download it. If one 
foreach is used on the array storing the file in memory, the required 
memory_limit doubles.



Getting around this problem is easy: push out the file in chunks 
instead of all at once. I have no idea how Gollem (or Horde) pushes 
out a file to a browser, but I'd imagine it would be similar to what I 
do below.

I think the code below is easy enough to read without me blabbering on 
about it... It's code I use myself for a similar sort of function and, 
as far as I'm concerned, it is available for Horde to use. I basically 
created the code myself, using code templates I found in various 
places on the net for ideas.

Using this code, PHP's memory_limit can be left at the default 8MB, and 
any sized file can be downloaded.



Questions? Do ask!



Function ReadFileChunked ($FileName) {
    $chunksize = (102400); // how many bytes per chunk
    $buffer = '';
    $handle = fopen($FileName, 'rb');
    if ($handle === false) { return false; }
    while (!feof($handle)) {
        $buffer = fread($handle, $chunksize);
        print $buffer;
    }
    return fclose($handle);
}

{
    $File["File"] = "200MBfiletobedownloaded.zip";
    $File["Size"] = FileSize($File["File"]);
    Header("Content-Type: application/force-download;");
    Header("Content-Disposition: attachment; filename=\"".$File["File"]."\"");
    Header("Content-Length: ".$File["Size"]);
    ReadFileChunked($File["File"]);
    Exit;
}

