Summary | Lower memory usage while downloading files |
Queue | Gollem |
Queue Version | HEAD |
Type | Enhancement |
State | Resolved |
Priority | 1. Low |
Owners | chuck (at) horde (dot) org |
Requester | olger.diekstra (at) cardno (dot) com (dot) au |
Created | 11/22/2007 (6407 days ago) |
Due | |
Updated | 09/04/2010 (5390 days ago) |
Assigned | |
Resolved | 12/11/2007 (6388 days ago) |
Milestone | |
Patch | No |
http://wiki.horde.org/ReleaseManagement for some details. The VFS
changes were in the latest Horde 3.2 release candidate, and the Gollem
changes will be in Gollem 1.1, to be released after Horde 3.2 is out.
is better, even if it uses just .05 seconds. Windows? What's that...?
;-)
Glad to be of help! When will the next stable release see the light?
Cheers! Olger.
and maybe get in under the default 8MB memory limit? I didn't include
the usleep because you didn't have it in your original code and it
didn't "smell" right to me (it won't work on Windows either).
Thanks for all your work on this! I'm closing the ticket but things
can still be tweaked of course.
State ⇒ Resolved
and maybe get in under the default 8MB memory limit? I didn't include
the usleep because you didn't have it in your original code and it
didn't "smell" right to me (it won't work on Windows either).
Thanks for all your work on this! I'm closing the ticket but things
can still be tweaked of course.
Tried the modifications. They didn't work, so I took a look at view.php
and modified it as follows:
case 'download_file':
    $browser->downloadHeaders($filename, null, false,
                              $GLOBALS['gollem_vfs']->size($filedir, $filename));
    if (is_resource($stream)) {
        while ($buffer = fread($stream, 10240)) {
            print $buffer;
            ob_flush();
            flush();
            usleep(50000);
        }
    } else {
        echo $data;
    }
    /* if (is_resource($stream)) {
           fpassthru($stream);
       } else {
           echo $data;
       } */
    break;
That works for me with a 16MB memory_limit downloading a 61MB file. The
fpassthru still gobbles up more memory than is desirable. ob_flush()
and flush() are necessary to clear the internal buffers of the webserver
and PHP, and the usleep(50000) (.05 seconds) allows a bit of time for
the buffers to actually be flushed before filling them again.
How's that?
Cheers! Olger.
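As an aside, a rough sketch of an alternative that ends PHP's output buffers once up front instead of calling ob_flush() on every chunk (reusing the $stream handle from the code above; this is an assumption, not what was committed):

// Flush and close any active PHP output buffers once, up front,
// so each print below goes more or less straight to the webserver.
while (ob_get_level() > 0) {
    ob_end_flush();
}

if (is_resource($stream)) {
    while ($buffer = fread($stream, 10240)) {
        print $buffer;
        flush();   // push the webserver's own buffer out to the client
    }
}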
including the code you put in there) from the cvs? Like just one
zipfile for Horde and one for Gollem that I can simply download and
unzip?
I tried using the files from the snapshots, but they don't seem to
have the files I'm after.
right snapshots. From last night's, you want:
http://ftp.horde.org/pub/snaps/latest/framework-HEAD-2007-12-04.tar.gz
http://ftp.horde.org/pub/snaps/latest/gollem-HEAD-2007-12-04.tar.gz
had much need for it)
a separate issue. :)
including the code you put in there) from the cvs? Like just one
zipfile for Horde and one for Gollem that I can simply download and
unzip?
I tried using the files from the snapshots, but they don't seem to
have the files I'm after.
I haven't used cvs before (am usually a lone programmer, so haven't
had much need for it) and find it very illogical. I tried getting a cvs
client, but that didn't help much either.
me a while to figure out how to get files out of the cvs system
though, but I've got the files you modified. (btw, running Horde
3.1.5 and Gollem H3 1.0.3 for testing).
Tested your modifications
and the Gollem changes.
$data = $GLOBALS['gollem_vfs']->read($filedir, $filename);
Your view.php looks nothing like the current one in Gollem CVS. See
http://horde.org/source/ if you still need help with CVS.
filesystem, but you may know why.
way you're trying to use it. To copy to a _different_ filesystem
(virtual or otherwise) you need to read the file data.
New Attachment: view.php
I'm more than happy to help make it work for every filesystem. Took me
a while to figure out how to get files out of the cvs system though,
but I've got the files you modified. (btw, running Horde 3.1.5 and
Gollem H3 1.0.3 for testing).
Tested your modifications, but no joy. Tried downloading a 380MB file,
with various memory settings, all the way up to 800MB memory_limit.
Keeps eating the memory.
The thing is though, in the gollem/view.php file there is a line:
$data = $GLOBALS['gollem_vfs']->read($filedir, $filename);
which reads everything that's returned from the "read ( , )" function
into a variable (in memory, which is a local resource!).
In the function "read( , )" you use another variable to read the file
into, so basically you're doubling the memory needed to download a file.
Memory is a local resource. It doesn't matter whether the data came from
SQL, FTP or SMB; it's stored in local memory on the machine serving the browser.
As opposed to copying the entire file into memory, why not copy it to
disk (temp directory comes to mind)? (I've used a hardcoded '/tmp/'
for now, but it's easy enough to get the system variable).
Using my ReadFileChunked function from earlier, that would be easy and
compatible with any backend. It would need some additional checking I
suppose, but this illustrates the basic idea (gollem/view.php), see
attachment.
For some reason it doesn't copy the file from the VFS to the local
filesystem, but you may know why.
I think it gets the basic idea across.
Cheers! Olger.
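For illustration, a rough sketch of the temp-file idea (not the attached view.php; tempnam() stands in for the hardcoded '/tmp/'). It still pays for one full read() from the backend, but the big string can be freed as soon as it is parked on disk, and the download itself then runs in small chunks:

$data = $GLOBALS['gollem_vfs']->read($filedir, $filename);
if (is_a($data, 'PEAR_Error')) {
    // handle the error the same way view.php already does for read()
}

$tmpfile = tempnam(sys_get_temp_dir(), 'gollem');   // instead of a hardcoded '/tmp/'
file_put_contents($tmpfile, $data);
unset($data);                                       // release the in-memory copy right away

$browser->downloadHeaders($filename, null, false, filesize($tmpfile));
ReadFileChunked($tmpfile);                          // the chunked reader posted earlier
unlink($tmpfile);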
some point you need to start to understand the codebase that you are
working with.
What your changes do is make sure that Gollem can't work with any
backend except a local filesystem. I hope it's obvious that that's not
an acceptable change.
My commits were to the CVS version of Horde and Gollem (Horde is at
3.2-RC1, Gollem will be released as Gollem 1.1 once Horde 3.2 is
released - see http://wiki.horde.org/ReleaseManagement). If you are
interested in helping us in a way that can be incorporated into the
code and that will help all Gollem users, you need to test that
version as well. You can get snapshots from http://snaps.horde.org/.
You should be able to just drop in the new file.php and view.php files
as well, but since I'm not sure which version you mean by "latest and
greatest", I can't guarantee that. Horde 3.2 is backwards compatible
with Horde 3.0, though.
program in Turbo Pascal with objects loooong time ago...), and
resolved my issue with the hardcoded pathname.
Also fixed my memory issue while I was at it. PHP memory_limit now set
to 16MB, and downloading 368MB file. Sweet.
Still in the same file (gollem/view.php). This is what I did:
/* $data = $GLOBALS['gollem_vfs']->read($filedir, $filename);
   if (is_a($data, 'PEAR_Error')) {
       $notification->push(sprintf(_("Access denied to %s"), $filename),
                           'horde.error');
       header('Location: ' . Util::addParameter(Horde::applicationUrl('manager.php', true),
                                                'actionID', $actionID));
       exit;
   } */
Commented this part out completely as it read the file into memory,
stuffing it up.
/* Run through action handlers. */
switch ($actionID) {
case 'download_file':
    $File_Dir  = $GLOBALS['gollem_vfs']->_getNativePath($filedir, $filename);
    $File_Size = filesize($File_Dir);
    $browser->downloadHeaders($filename, null, false, $File_Size);
    ReadFileChunked($File_Dir);
    /* echo $data; */
    break;
Let me know what you think!
Cheers! Olger.
Installed the latest and greatest stable of Horde/Gollem, but couldn't
really figure out how to place the file.php/ftp.php. I found similar
files in the Horde VFS directory, but they were of a vastly different
size, so I haven't been able to test this yet.
However, having had a spare minute to play with this, I did get it
working with my own function. I've just successfully downloaded a 368MB
file from Gollem. Here's how:
Modified gollem/view.php:
Added function
function ReadFileChunked($FileName) {
    $chunksize = 102400;   // how many bytes per chunk
    $buffer = '';
    $handle = fopen($FileName, 'rb');
    if ($handle === false) {
        return false;
    }
    while (!feof($handle)) {
        $buffer = fread($handle, $chunksize);
        print $buffer;
    }
    return fclose($handle);
}
Then changed this section:
case 'download_file':
    $browser->downloadHeaders($filename, null, false, strlen($data));
    ReadFileChunked("/exampledir/home/web/" . $filename);
    /* echo $data; */
    break;
"/exampledir/home" is where all the userdirectories are located, "web"
is the logged in user. I didn't know how to get that information from
Horde/Gollem quickly, so for testing purposes I hardcoded it.
But it works a treat. The only reason I still have to keep the PHP
memory_limit just over the filesize I'm trying to download is because
of this line:
$data = $GLOBALS['gollem_vfs']->read($filedir, $filename);
Which reads the entire file into memory.
But as opposed to having to set the memory_limit to just over double
the file size, it now needs to be just over the size of the file.
I'm not used to working with objects in PHP, so haven't been able to
retrieve the directory from the $GLOBALS['gollem_vfs'] object
(although I could print the array and view it).
Now if we can get that one line fixed so that the object doesn't read
the file contents into memory anymore, we'd be laughing.
Cheers! Olger.
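For illustration, a rough sketch of what view.php could look like if the VFS object handed back a stream instead of a string; the readStream() method name here is hypothetical, not an existing VFS call:

$stream = $GLOBALS['gollem_vfs']->readStream($filedir, $filename);
if (is_a($stream, 'PEAR_Error')) {
    // handle the error the same way view.php handles a failed read()
}

$browser->downloadHeaders($filename, null, false,
                          $GLOBALS['gollem_vfs']->size($filedir, $filename));

while (!feof($stream)) {
    print fread($stream, 102400);   // 100KB at a time, never the whole file
    flush();
}
fclose($stream);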
Assigned to Chuck Hagenbuch
either the file or FTP backend in Gollem:
http://lists.horde.org/archives/cvs/Week-of-Mon-20071126/072746.html
http://lists.horde.org/archives/cvs/Week-of-Mon-20071126/072745.html
I went ahead and used fpassthru because I didn't see anything that
indicated that it read the whole file into memory - just that on some
older versions it might leak, which is a problem but a different one. :)
I posted my code in the initial ticket, but just in case you can't see
it for some reason, here it is again (slightly modified to make it
easier to read/use).
All you'd basically do is call the second function with the filename,
and the file should be sent in bits to the browser.
function ReadFileChunked($FileName) {
    $chunksize = 102400;   // how many bytes per chunk
    $buffer = '';
    $handle = fopen($FileName, 'rb');
    if ($handle === false) {
        return false;
    }
    while (!feof($handle)) {
        $buffer = fread($handle, $chunksize);
        print $buffer;
    }
    return fclose($handle);
}
function SendFileToBrowser($FileName) {
    header("Content-Type: application/force-download;");
    header("Content-Disposition: attachment; filename=\"" . $FileName . "\"");
    header("Content-Length: " . filesize($FileName));
    ReadFileChunked($FileName);
    exit;
}
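For example (the file path is made up), sending a file would just be:

SendFileToBrowser('/srv/files/200MBfiletobedownloaded.zip');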
necessary to use chunked reads, or will readfile
(http://www.php.net/readfile) or fpassthru do the right thing? In a
way fpassthru would be ideal because we can return a stream from the
VFS library in some backends, or fake it with a local file.
the first comment on this page:
http://au.php.net/manual/en/function.fpassthru.php.
There are also various other comments about memory usage when
downloading files on that page.
Cheers! Olger.
necessary to use chunked reads, or will readfile
(http://www.php.net/readfile) or fpassthru do the right thing? In a
way fpassthru would be ideal because we can return a stream from the
VFS library in some backends, or fake it with a local file.
State ⇒ Feedback
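For reference, a rough sketch of the two built-ins in question (the path is illustrative). Both stream straight to output rather than building the file up in a PHP string, although with output buffering active the buffer itself can still grow:

// readfile() takes a path and sends the whole file to the output buffer.
readfile('/tmp/example-large-file.bin');

// fpassthru() takes an already-open handle and sends everything from the
// current pointer to EOF -- handy when a backend can hand back a stream.
$fp = fopen('/tmp/example-large-file.bin', 'rb');
if ($fp !== false) {
    fpassthru($fp);
    fclose($fp);
}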
often-changing files.
(it would appear, anyhow) the entire file is read into memory, from
whatever source (DB, FTP, SMB, etc.), causing memory problems with
large files.
As opposed to reading the file into RAM, why not copy it to local
disk and then send it to the client in chunks, saving heaps on
memory?
into RAM in the first place, fine. Works for FTP, but SQL is much
harder. Etc. You won't get any argument that lower memory usage is
better - but I think you'll have a better appreciation for things if
you _do_ look at the Gollem and VFS code.
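A rough sketch of why the SQL case is the hard one: PDO's large-object binding is the closest PHP gets to a database stream, and with several drivers even that simply returns the whole value as a string, which is exactly the original problem (table and column names below are made up):

$db   = new PDO('mysql:host=localhost;dbname=horde', 'user', 'secret');
$stmt = $db->prepare('SELECT vfs_data FROM horde_vfs WHERE vfs_name = ?');
$stmt->execute(array($filename));
$stmt->bindColumn(1, $lob, PDO::PARAM_LOB);
$stmt->fetch(PDO::FETCH_BOUND);

if (is_resource($lob)) {
    fpassthru($lob);   // a real stream: constant memory
} else {
    echo $lob;         // many drivers return a string: the whole blob in RAM again
}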
the same goes for
attachments as well), those wouldn't dynamically change size.
But it's really not that hard (from my point of view). Right now (it
would appear, anyhow) the entire file is read into memory, from
whatever source (DB, FTP, SMB, etc.), causing memory problems with
large files.
As opposed to reading the file into RAM, why not copy it to local disk
and then send it to the client in chunks, saving heaps on memory?
The downside of having to use so much memory is that your webserver
will start swapping to disk sooner. In general, clients sit on
connections that are way slower than average disks can read, so speed
is not an issue here.
A server with 1GB of RAM will happily serve hundreds of people a 250MB
file when this is done in chunks (each connection only ever holds a
~100KB buffer), whereas it'll choke on two users if those files are read
into memory first (two in-memory copies already come to 500MB before any
other overhead).
BTW, I live in Australia, so you might not get quick responses when
you guys are halfway through the day... :-)
Cheers! Olger.
Version ⇒ HEAD
Olger see some of the complexities here.
Olger, what does this refer to?
the same goes for
attachments as well), those wouldn't dynamically change size.
streams for some backends in Horde 4.
since PHP's ftp get functions don't allow us to chunk data from the
ftp server. At a minimum, this kind of block/chunk reading needs to
be put in the VFS drivers, not gollem.
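A rough sketch of the driver-level idea, with all names hypothetical: an FTP driver could pull the remote file into a php://temp stream (RAM up to a couple of MB, then a real temp file) and hand that stream back for the caller to fpassthru(). The whole file still crosses the wire in one go, since the ftp functions can't chunk it, but it never sits in a PHP string:

// Sketch of a stream-returning read in an FTP VFS driver (names hypothetical).
function readStream($path, $name)
{
    $stream = fopen('php://temp', 'r+b');        // RAM up to ~2MB, then a temp file
    $remote = $this->_getNativePath($path, $name);
    if (!@ftp_fget($this->_stream, $stream, $remote, FTP_BINARY)) {
        fclose($stream);
        return PEAR::raiseError(sprintf('Unable to open VFS file "%s".', $name));
    }
    rewind($stream);
    return $stream;                              // caller can fpassthru() this
}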
Looking at the sort of files that are being stored in Gollem (and the
same goes for attachments as well), those wouldn't dynamically change
size. If they do, something is probably wrong. But the code can easily
be expanded to include some file checks every time it reads a chunk.
Cheers! Enjoy Thanksgiving!
Olger.
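A rough sketch of the per-chunk check mentioned above (illustrative only): bail out if the file's size changes mid-transfer; clearstatcache() makes filesize() re-stat the file on each pass:

function ReadFileChunkedChecked($FileName) {
    $chunksize = 102400;
    $expected  = filesize($FileName);
    $handle    = fopen($FileName, 'rb');
    if ($handle === false) {
        return false;
    }
    while (!feof($handle)) {
        clearstatcache();
        if (filesize($FileName) !== $expected) {
            fclose($handle);
            return false;      // the file changed underneath us; abort
        }
        print fread($handle, $chunksize);
    }
    return fclose($handle);
}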
Summary ⇒ Lower memory usage while downloading files
Priority ⇒ 2. Medium
reading static files - we're reading from a database, or from an FTP
server, or an SMB share, or ...
In any case, we can take another look at this, but I'm caught up in
Thanksgiving stuff here in the U.S. for now, so it'll be a few days at
least.
Priority ⇒ 3. High
State ⇒ New
Queue ⇒ Gollem
Summary ⇒ Fix for php memory_limit
Type ⇒ Enhancement
memory footprint (PHP memory_limit) for downloading mail attachments
or downloading files from Gollem.
I haven't tested this problem in the latest and greatest of Horde, but
all tickets and forums I've seen indicate this is no different in the
current release.
The problem comes from the fact that the entire file that is about to
be downloaded is read entirely into memory (on the server) before it
is pushed out to the browser. This basically means that a 200MB file
stored in Gollem needs a PHP memory_limit of at least 200MB to be able
to download it. If even one foreach is used on the array storing the
file in memory, the required memory_limit doubles.
Getting around this problem is easy: push out the file in chunks
instead of all at once. I have no idea how Gollem (or Horde) pushes a
file out to the browser, but I'd imagine it would be similar to what I
do below.
I think the code below is easy enough to read without me blabbering on
about it... It's code I use myself for a similar sort of function and,
as far as I'm concerned, it's available for Horde to use. I basically
created the code myself using code templates and ideas I found in
various places on the net.
Using this code, PHP's memory_limit can be left at the default 8MB, and
a file of any size can be downloaded.
Questions? Do ask!
function ReadFileChunked($FileName) {
    $chunksize = 102400;   // how many bytes per chunk
    $buffer = '';
    $handle = fopen($FileName, 'rb');
    if ($handle === false) {
        return false;
    }
    while (!feof($handle)) {
        $buffer = fread($handle, $chunksize);
        print $buffer;
    }
    return fclose($handle);
}
{ $File["File"] = "200MBfiletobedownloaded.zip";
$File["Size"] = FileSize($File["File"]);
Header("Content-Type: application/force-download;");
Header("Content-Disposition: attachment; filename=\"".$File["File"]."\"");
Header("Content-Length: ".$File["Size"]);
ReadFileChunked($File["File"]);
Exit;
}