2019-04-25

[#5913] Lower memory usage while downloading files
Summary: Lower memory usage while downloading files
Queue: Gollem
Queue Version: HEAD
Type: Enhancement
State: Resolved
Priority: 1. Low
Owners: chuck (at) horde (dot) org
Requester: olger.diekstra (at) cardno (dot) com (dot) au
Created: 2007-11-22 (4172 days ago)
Due:
Updated: 2010-09-04 (3155 days ago)
Assigned:
Resolved: 2007-12-11 (4153 days ago)
Milestone:
Patch: No

History
2007-12-11 19:28:54 Chuck Hagenbuch Comment #22
When will the next stable release see the light?
All of these changes will be in the Horde 3.2 release series - see 
http://wiki.horde.org/ReleaseManagement for some details. The VFS 
changes were in the latest Horde 3.2 release candidate, and the Gollem 
changes will be in Gollem 1.1, to be released after Horde 3.2 is out.
2007-12-11 05:19:09 olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #21
Hehe, I'm happy with 8mb memory_limit... I agree that not using usleep 
is better, even if it uses just .05 seconds. Windows? What's that...? 
;-)

Glad to be of help! When will the next stable release see the light?



Cheers! Olger.
I made it 8192 bytes without the usleep. I guess we could make it 4mb
and maybe get in under the default 8mb memory limit? I didn't include
the usleep because you didn't have it in your original code and it
didn't "smell" right to me (won't work on windows either).

Thanks for all your work on this! I'm closing the ticket but things
can still be tweaked of course.
2007-12-11 05:03:16 Chuck Hagenbuch Comment #20
State ⇒ Resolved
I made it 8192 bytes without the usleep. I guess we could make it 4mb 
and maybe get in under the default 8mb memory limit? I didn't include 
the usleep because you didn't have it in your original code and it 
didn't "smell" right to me (won't work on windows either).



Thanks for all your work on this! I'm closing the ticket but things 
can still be tweaked of course.
2007-12-11 00:18:49 olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #19
OK, lack of time prevented me from going further on this.

Tried the modifications. It didn't work so I took a look at view.php 
and modified it as follows:



case 'download_file':
    $browser->downloadHeaders($filename, null, false, $GLOBALS['gollem_vfs']->size($filedir, $filename));
    if (is_resource($stream)) {
        while ($buffer = fread($stream, 10240)) {
            print $buffer;
            ob_flush(); flush();
            usleep(50000);
        }
    } else {
        echo $data;
    }
    /*  if (is_resource($stream)) {
        fpassthru($stream);
    } else {
        echo $data;
    }  */
    break;



That works for me with 16MB memory_limit downloading a 61MB file. The 
fpassthru still gobbles up more memory than is desirable. ob_flush() 
and flush() are necessary to clear the internal buffers of PHP and the 
webserver; the usleep(50000) (.05 seconds) allows a bit of time for the 
buffers to be actually flushed before filling them again.
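
A minimal sketch of the chunk-and-flush loop described above, assuming PHP 7 or later. The name streamInChunks() and its parameters are hypothetical and not part of Gollem or the Horde VFS. It differs from the posted loop in two hedged ways: it tests the read result strictly, because a final chunk consisting only of "0" is falsy in PHP and would end a `while ($buffer = fread(...))` loop early, and it leaves out the usleep() call.

```php
<?php
/**
 * Send an already opened stream to the client in fixed-size chunks.
 * streamInChunks() is a hypothetical helper name used for illustration.
 *
 * @param resource $stream    readable stream
 * @param int      $chunkSize bytes requested per read
 * @return int number of bytes written to output
 */
function streamInChunks($stream, int $chunkSize = 8192): int
{
    $sent = 0;
    while (!feof($stream)) {
        $buffer = fread($stream, $chunkSize);
        // Stop on a read error or when nothing more is returned.
        // The strict comparison keeps a chunk such as "0" from ending the loop.
        if ($buffer === false || $buffer === '') {
            break;
        }
        echo $buffer;
        $sent += strlen($buffer);
        // Flush PHP's own output buffer if one is active, then the SAPI buffer.
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();
    }
    return $sent;
}
```

Only one chunk is held in a PHP variable at a time, so the chunk size, rather than the file size, bounds what this loop itself keeps in memory.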

How's that?



Cheers! Olger.
2007-12-04 16:34:27 Chuck Hagenbuch Comment #18
Is it possible to get a zipped up file of the development code (like
including the code you put in there) from the cvs? Like just one
zipfile for Horde and one for Gollem that I can simply download and
unzip?
I tried using the files from the snapshots, but they don't seem to
have the files I'm after.
That's exactly what the snapshots are. Perhaps you're not using the 
right snapshots. From last nights, you want:



http://ftp.horde.org/pub/snaps/latest/framework-HEAD-2007-12-04.tar.gz

http://ftp.horde.org/pub/snaps/latest/gollem-HEAD-2007-12-04.tar.gz
I haven't used cvs before (am usually a lone programmer, so haven't
had much need for it)
You're missing the point of a version control system then, but that's 
a separate issue. :)
2007-12-04 06:44:50 olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #17
Is it possible to get a zipped up file of the development code (like 
including the code you put in there) from the cvs? Like just one 
zipfile for Horde and one for Gollem that I can simply download and 
unzip?

I tried using the files from the snapshots, but they don't seem to 
have the files I'm after.

I haven't used cvs before (am usually a lone programmer, so haven't 
had much need for it) and find it very illogical. Tried getting a cvs 
client but that didn't help much either.


2007-12-04 05:07:29 Chuck Hagenbuch Deleted Original Message
 
2007-12-04 05:07:12 Chuck Hagenbuch Priority ⇒ 1. Low
 
2007-12-04 05:07:00 Chuck Hagenbuch Comment #16
I'm more than happy to help make it work for every filesystem. Took
me a while to figure out how to get files out of the cvs system
though, but I've got the files you modified. (btw, running Horde
3.1.5 and Gollem H3 1.0.3 for testing).
Tested your modifications
No, you didn't test the changes to Gollem. You need the VFS updates 
and the Gollem changes.
The thing is though, in the gollem/view.php file there is a line:
          $data = $GLOBALS['gollem_vfs']->read($filedir, $filename);
http://cvs.horde.org/diff.php?r1=1.60&r2=1.61&f=gollem%2Fview.php



Your view.php looks nothing like the current one in Gollem CVS. See 
http://horde.org/source/ if you still need help with CVS.
For some reason it doesn't copy the file from the VFS to the local
filesystem, but you may know why.
... because you're still accessing the VFS. copy() doesn't work the 
way you're trying to use it. To copy to a _different_ filesystem 
(virtual or otherwise) you need to read the file data.
2007-12-04 00:23:04 olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #15
New Attachment: view.php
Hi Chuck,



I'm more than happy to help make it work for every filesystem. Took me 
a while to figure out how to get files out of the cvs system though, 
but I've got the files you modified. (btw, running Horde 3.1.5 and 
Gollem H3 1.0.3 for testing).

Tested your modifications, but no joy. Tried downloading a 380MB file, 
with various memory settings, all the way up to 800MB memory_limit. 
Keeps eating the memory.



The thing is though, in the gollem/view.php file there is a line:

          $data = $GLOBALS['gollem_vfs']->read($filedir, $filename);

which reads everything that's returned from the "read ( , )" function 
into a variable (in memory, which is a local resource!).

Inside the function "read( , )" the file is read into yet another 
variable, so basically you're doubling the memory needed to be able 
to download a file.



Memory is a local resource. It doesn't care if it came from SQL, FTP, 
SMB. It's stored in local memory in the machine serving the browser. 
As opposed to copying the entire file into memory, why not copy it to 
disk (temp directory comes to mind)? (I've used a hardcoded '/tmp/' 
for now, but it's easy enough to get the system variable).

Using my previously posted ReadFileChunked function, that would be easy 
and compatible with any backend. It would need some additional checking 
I suppose, but this illustrates the basic idea (gollem/view.php), see 
attachment.
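
One way the temp-file idea could look, as a sketch only. It assumes the backend can hand over a readable stream in the first place, which this ticket notes is not true of every backend. spoolToTemp() is a hypothetical helper name; sys_get_temp_dir() stands in for the hardcoded '/tmp/'.

```php
<?php
/**
 * Spool a readable stream to a temporary file and return the file's path.
 * spoolToTemp() is a hypothetical helper, not part of Gollem or the Horde VFS.
 *
 * @param resource $source readable stream supplied by the backend (assumption)
 */
function spoolToTemp($source): string
{
    // Ask PHP for the system temp directory instead of hardcoding '/tmp/'.
    $path = tempnam(sys_get_temp_dir(), 'dl_');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file');
    }
    $target = fopen($path, 'wb');
    if ($target === false) {
        throw new RuntimeException('Could not open the temporary file');
    }
    // Copies stream to stream, so no PHP string holding the whole file is built.
    stream_copy_to_stream($source, $target);
    fclose($target);
    return $path;
}
```

The returned path could then be passed to a chunked reader such as ReadFileChunked and removed with unlink() afterwards.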



For some reason it doesn't copy the file from the VFS to the local 
filesystem, but you may know why.



I think it gets the basic idea across.



Cheers! Olger.

2007-11-30 00:08:11 Chuck Hagenbuch Comment #14
I know that you are trying to be helpful and I appreciate that. But at 
some point you need to start to understand the codebase that you are 
working with.



What your changes do is make sure that Gollem can't work with any 
backend except a local filesystem. I hope it's obvious that that's not 
an acceptable change.



My commits were to the CVS version of Horde and Gollem (Horde is at 
3.2-RC1, Gollem will be released as Gollem 1.1 once Horde 3.2 is 
released - see http://wiki.horde.org/ReleaseManagement). If you are 
interested in helping us in a way that can be incorporated into the 
code and that will help all Gollem users, you need to test that 
version as well. You can get snapshots from http://snaps.horde.org/.



You should be able to just drop in the new file.php and view.php files 
as well, but since I'm not sure which version you mean by "latest and 
greatest", I can't guarantee that. Horde 3.2 is backwards compatible 
with Horde 3.0, though.
2007-11-30 00:01:22 olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #13
OK, fixed it myself. Figured out how to use objects (again, used to 
program in Turbo Pascal with objects loooong time ago...), and 
resolved my issue with the hardcoded pathname.

Also fixed my memory issue while I was at it. PHP memory_limit now set 
to 16MB, and downloading 368MB file. Sweet.



Still in the same file (gollem/view.php). This is what I did:

/* $data = $GLOBALS['gollem_vfs']->read($filedir, $filename);
if (is_a($data, 'PEAR_Error')) {
     $notification->push(sprintf(_("Access denied to %s"), $filename), 'horde.error');
     header('Location: ' . Util::addParameter(Horde::applicationUrl('manager.php', true), 'actionID', $actionID));
     exit;
}*/

Commented this part out completely as it read the file into memory, 
stuffing it up.



/* Run through action handlers. */
switch ($actionID) {
case 'download_file':
     $File_Dir  = $GLOBALS['gollem_vfs']->_getNativePath($filedir, $filename);
     $File_Size = FileSize($File_Dir);
     $browser->downloadHeaders($filename, null, false, $File_Size);
     ReadFileChunked($File_Dir);
     /* echo $data; */
     break;



Let me know what you think!



Cheers! Olger.
2007-11-29 23:33:46 olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #12
Hi Chuck,



Installed the latest and greatest stable of Horde/Gollem, but couldn't 
really figure out how to place the file.php/ftp.php. I found similar 
files, in the Horde VFS directory, but they were of a vastly different 
size. So haven't been able to test this yet.

However, having had a spare minute to play with this, I did get it 
working with my own function. I've just downloaded a 368MB file 
successfully from Gollem with my own function. Here's how:



Modified gollem/view.php:

Added function

Function ReadFileChunked ($FileName) {
    $chunksize = (102400); // how many bytes per chunk
    $buffer = '';
    $handle = fopen($FileName, 'rb');
    if ($handle === false) { return false; }
    while (!feof($handle)) {
        $buffer = fread($handle, $chunksize);
        print $buffer;
    }
    return fclose($handle);
}



Then changed this section:

case 'download_file':
     $browser->downloadHeaders($filename, null, false, strlen($data));
     ReadFileChunked("/exampledir/home/web/".$filename);
     /* echo $data; */
     break;



"/exampledir/home" is where all the userdirectories are located, "web" 
is the logged in user. I didn't know how to get that information from 
Horde/Gollem quickly, so for testing purposes I hardcoded it.



But it works a treat. The only reason I still have to keep the PHP 
memory_limit just over the filesize I'm trying to download is because 
of this line:

$data = $GLOBALS['gollem_vfs']->read($filedir, $filename);

Which reads the entire file into memory.



But as opposed to having to set the memory_limit to just over double 
the file size, it now needs to be just over the size of the file.

I'm not used to working with objects in PHP, so haven't been able to 
retrieve the directory from the $GLOBALS['gollem_vfs'] object 
(although I could print the array and view it).

Now if we can get that one line fixed so that the object doesn't read 
the file contents into memory anymore, we'd be laughing.



Cheers! Olger.

2007-11-29 05:20:14 Chuck Hagenbuch Comment #11
Assigned to Chuck Hagenbuch
Please give these two commits a shot, assuming that you are using 
either the file or FTP backend in Gollem:



http://lists.horde.org/archives/cvs/Week-of-Mon-20071126/072746.html

http://lists.horde.org/archives/cvs/Week-of-Mon-20071126/072745.html



I went ahead and used fpassthru because I didn't see anything that 
indicated that it read the whole file into memory - just that on some 
older versions it might leak, which is a problem but a different one. :)
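
A hedged sketch of the fpassthru approach, assuming PHP 7 or later. It follows the stream-or-string branching quoted elsewhere in this ticket, but the function name passThroughOrEcho() is hypothetical and this is not the committed Gollem code.

```php
<?php
/**
 * Output either an open stream (via fpassthru) or an in-memory string.
 * passThroughOrEcho() is a hypothetical helper name used for illustration.
 *
 * @param resource|string $streamOrData stream from the backend, or file data
 * @return int number of bytes written to output
 */
function passThroughOrEcho($streamOrData): int
{
    if (is_resource($streamOrData)) {
        // fpassthru() writes the rest of the stream to output
        // and returns the number of bytes it passed through.
        $sent = fpassthru($streamOrData);
        return $sent === false ? 0 : $sent;
    }
    // Fallback for backends that can only return the data as a string.
    echo $streamOrData;
    return strlen($streamOrData);
}
```

The string fallback still needs memory for the whole file; only the stream branch avoids holding it in a PHP variable.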
2007-11-25 23:27:53 olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #10
Hi Chuck,



I posted my code in the initial ticket, but just in case you can't see 
it for some reason, here it is again (slightly modified to make it 
easier to read/use).

All you'd basically do is call the second function with the filename, 
and the file should be sent in bits to the browser.



Function ReadFileChunked ($FileName) {
    $chunksize = (102400); // how many bytes per chunk
    $buffer = '';
    $handle = fopen($FileName, 'rb');
    if ($handle === false) { return false; }
    while (!feof($handle)) {
        $buffer = fread($handle, $chunksize);
        print $buffer;
    }
    return fclose($handle);
}

Function SendFileToBrowser ($FileName) {
    Header("Content-Type: application/force-download;");
    Header("Content-Disposition: attachment; filename=\"".$FileName."\"");
    Header("Content-Length: ".FileSize($FileName));
    ReadFileChunked($FileName);
    Exit;
}
Btw, you didn't post the code you mentioned yet, but is it really
necessary to use chunked reads, or will readfile
(http://www.php.net/readfile) or fpassthru do the right thing? In a
way fpassthru would be ideal because we can return a stream from the
VFS library in some backends, or fake it with a local file.
Did a bit of reading up on fpassthru, and you might want to look at 
the first comment on this page: 
http://au.php.net/manual/en/function.fpassthru.php.

There's also various other comments on memory usage downloading files 
on that page.



Cheers! Olger.
2007-11-25 23:04:41 Chuck Hagenbuch Comment #9
Btw, you didn't post the code you mentioned yet, but is it really 
necessary to use chunked reads, or will readfile 
(http://www.php.net/readfile) or fpassthru do the right thing? In a 
way fpassthru would be ideal because we can return a stream from the 
VFS library in some backends, or fake it with a local file.
2007-11-25 23:02:27 Chuck Hagenbuch Comment #8
State ⇒ Feedback
I'm not sure what dynamically changing size would have to do with this.
That was a comment on Chuck's remark about often not reading static files.
I meant files on the local filesystem, vs. remote files, not 
often-changing files.
But, it's really not that hard (from my point of view). Right now,
(it would appear anyhows) the entire file is read into memory, from
whatever source (DB, ftp, smb, etc). Causing memory problems with
large files.
As opposed to reading the file into RAM, why not copy it to local
disk and then send it to the client in chunks, saving heaps on
memory?
As long as you can create that local copy without reading the file 
into RAM in the first place, fine. Works for FTP, but SQL is much 
harder. Etc. You won't get any argument that lower memory usage is 
better - but I think you'll have a better appreciation for things if 
you _do_ look at the Gollem and VFS code.
2007-11-25 22:35:40 olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #7
Hi Guys,
Looking at the sort of files that are being stored in Gollem (and
the same goes for
attachments as well), those wouldn't dynamically change size.
I'm not sure what dynamically changing size would have to do with this.
That was a comment on Chuck's remark about often not reading static files.



But, it's really not that hard (from my point of view). Right now, (it 
would appear anyhows) the entire file is read into memory, from 
whatever source (DB, ftp, smb, etc). Causing memory problems with 
large files.

As opposed to reading the file into RAM, why not copy it to local disk 
and then send it to the client in chunks, saving heaps on memory?

The downside of having to use so much memory is that your webserver 
will start swapping to disk sooner. In general, clients sit on 
connections that are way slower than average disks can read, so speed 
is not an issue here.

A one GB RAM server will happily serve hundreds of people a 250MB file 
when this is done in chunks, whereas it'll choke on 2 users if those 
files are read into memory first.
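
The arithmetic behind that comparison, as a rough sketch. estimateBufferBytes() is a hypothetical helper, the figure of 300 users is an assumed stand-in for "hundreds", and the estimate ignores interpreter, web server and output-buffer overhead.

```php
<?php
/**
 * Rough estimate of file data held in PHP variables at one moment.
 * estimateBufferBytes() is a hypothetical helper for illustration only.
 */
function estimateBufferBytes(int $concurrentDownloads, int $bytesHeldPerDownload): int
{
    return $concurrentDownloads * $bytesHeldPerDownload;
}

$mib = 1024 * 1024;

// Whole-file reads: 2 users, each holding a 250 MiB file in memory.
$wholeFile = estimateBufferBytes(2, 250 * $mib);   // 524288000 bytes = 500 MiB

// Chunked reads: 300 users (assumed), each holding one 102400-byte chunk.
$chunked = estimateBufferBytes(300, 102400);       // 30720000 bytes, about 29.3 MiB
```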



BTW, I live in Australia, so you might not get quick responses when 
you guys are half way through the day... :-)



Cheers! Olger.
2007-11-23 23:40:49 Chuck Hagenbuch Comment #6
Version ⇒ HEAD
Michael and Jan's comments are right on the money, hopefully helping 
Olger see some of the complexities here.



Olger, what does this refer to?
Looking at the sort of files that are being stored in Gollem (and 
the same goes for
attachments as well), those wouldn't dynamically change size.
I'm not sure what dynamically changing size would have to do with this.
2007-11-23 11:04:01 Jan Schneider Comment #5
This probably gets much easier and more efficient if we can switch to 
streams for some backends in Horde 4.
2007-11-23 07:20:12 Michael Slusarz Comment #4
As chuck says, this is entirely undoable if using a backend like FTP, 
since PHP's ftp get functions don't allow us to chunk data from the 
ftp server.  At a minimum, this kind of block/chunk reading needs to 
be put in the VFS drivers, not gollem.
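
A sketch of what driver-level stream access could look like, assuming PHP 7 or later. The interface ChunkReadableBackend, the class LocalFileBackend and the method openStream() are hypothetical names for illustration and are not the Horde VFS API. Only a local-file driver is shown; other backends would need their own way of producing a stream.

```php
<?php
/**
 * Hypothetical contract: a backend that can expose a stored file as a stream.
 */
interface ChunkReadableBackend
{
    /** @return resource readable stream for the stored file */
    public function openStream(string $dir, string $name);
}

/**
 * Hypothetical local-filesystem driver implementing that contract.
 * Path handling is deliberately simple; real code would validate $dir and $name.
 */
final class LocalFileBackend implements ChunkReadableBackend
{
    private $root;

    public function __construct(string $root)
    {
        $this->root = rtrim($root, '/');
    }

    public function openStream(string $dir, string $name)
    {
        $path = $this->root . '/' . trim($dir, '/') . '/' . $name;
        $handle = fopen($path, 'rb');
        if ($handle === false) {
            throw new RuntimeException('Cannot open ' . $path);
        }
        return $handle;
    }
}
```

With the chunking behind the driver, the application only ever sees a stream and can pass it to fpassthru() or a chunked read loop regardless of backend.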
2007-11-22 07:03:52 olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #3
Ah, I think I can wait a few days... ;-)



Looking at the sort of files that are being stored in Gollem (and the 
same goes for attachments as well), those wouldn't dynamically change 
size. If they do, something is probably wrong. But the code can easily 
be expanded to include some file checks every time it reads a chunk.



Cheers! Enjoy Thanksgiving!

Olger.
2007-11-22 06:41:32 Chuck Hagenbuch Comment #2
Summary ⇒ Lower memory usage while downloading files
Priority ⇒ 2. Medium
Thanks for the ticket. The challenge in Gollem is we're very often not 
reading static files - we're reading from a database, or from an FTP 
server, or an SMB share, or ...



In any case, we can take another look at this, but I'm caught up in 
Thanksgiving stuff here in the U.S. for now, so it'll be a few days at 
least.
2007-11-22 00:12:25 olger (dot) diekstra (at) cardno (dot) com (dot) au Comment #1
Type ⇒ Enhancement
State ⇒ New
Priority ⇒ 3. High
Summary ⇒ Fix for php memory_limit
Queue ⇒ Gollem
Gollem (probably Horde as an application entirely) needs quite a 
memory footprint (PHP memory_limit) for downloading mail attachments 
or downloading files from Gollem.



I haven't tested this problem in the latest and greatest of Horde, but 
all tickets and forums I've seen indicate this is no different in the 
current release.



The problem comes from the fact that the entire file that is about to 
be downloaded is read entirely into memory (on the server) before it 
is pushed out to the browser. This basically means that a 200MB file 
stored in Gollem needs a PHP memory_limit of at least 200MB to be able 
to download it. If a foreach is run over the array holding the file 
in memory, the required memory_limit doubles.



Getting around this problem is easy: push out the file in chunks 
instead of all at once. I have no idea how Gollem (or Horde) pushes 
out a file to a browser, but I'd imagine it would be similar to what I 
do below.

I think the below code is easy enough to read without me blabbering on 
about it... It's code I use myself for a similar sort of function. And, 
as far as I'm concerned, is available for Horde to use. I've basically 
created the code myself using code templates for ideas I found on 
various places on the net.

Using this code, PHP's memory_limit can be left at default 8MB, and 
any sized file can be downloaded.



Questions? Do ask!



Function ReadFileChunked ($FileName) {
    $chunksize = (102400); // how many bytes per chunk
    $buffer = '';
    $handle = fopen($FileName, 'rb');
    if ($handle === false) { return false; }
    while (!feof($handle)) {
        $buffer = fread($handle, $chunksize);
        print $buffer;
    }
    return fclose($handle);
}

{
    $File["File"] = "200MBfiletobedownloaded.zip";
    $File["Size"] = FileSize($File["File"]);
    Header("Content-Type: application/force-download;");
    Header("Content-Disposition: attachment; filename=\"".$File["File"]."\"");
    Header("Content-Length: ".$File["Size"]);
    ReadFileChunked($File["File"]);
    Exit;
}
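
A hedged variant of the header step, assuming PHP 7 or later. buildDownloadHeaders() is a hypothetical helper, not part of Horde or Gollem. It swaps in application/octet-stream for application/force-download (an assumption about what is preferable, not something the ticket decides) and strips characters that would break the quoted filename.

```php
<?php
/**
 * Build the three download headers for a file name and size.
 * buildDownloadHeaders() is a hypothetical helper used for illustration.
 *
 * @return string[] header lines ready to be passed to header()
 */
function buildDownloadHeaders(string $fileName, int $size): array
{
    // Keep only the base name, then drop quotes, backslashes and line breaks
    // so the value cannot terminate the quoted string or inject a header.
    $safeName = str_replace(['"', "\r", "\n", '\\'], '', basename($fileName));

    return [
        'Content-Type: application/octet-stream',
        'Content-Disposition: attachment; filename="' . $safeName . '"',
        'Content-Length: ' . $size,
    ];
}
```

Each returned line would be sent with header() before the first chunk is printed.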

