Summary | Bugs in the current implementation for PARTIAL limiting (RFC 5267 [4.4]) in Horde's Imap_Client library |
Queue | Horde Framework Packages |
Queue Version | Git master |
Type | Bug |
State | Resolved |
Priority | 1. Low |
Owners | slusarz (at) horde (dot) org |
Requester | daniel_bistriceanu (at) yahoo (dot) com |
Created | 04/30/2014 (4092 days ago) |
Due | |
Updated | 05/14/2014 (4078 days ago) |
Assigned | 04/30/2014 (4092 days ago) |
Resolved | 05/14/2014 (4078 days ago) |
Github Issue Link | |
Github Pull Request | |
Milestone | |
Patch | No |
State ⇒ Resolved
range of message UIDs. The latter is described in RFC 3501 [9]:
seq-range = seq-number ":" seq-number
and since PARTIAL syntax does not support '*', as described in RFC
5267 [4.4], a partial range is not exactly the same as a seq-range
of message UIDs, but a range like nz-number ":" nz-number.
wants to pass invalid data into the partial field, that's your
prerogative. You're not going to get any data that makes any sense
back, but that's your own fault.
After all, a client author can do something like extend
Horde_Imap_Client_Ids to cause it to return nothing but "X"s. So even
if we did type checking we are still not going to catch this issue.
In other words... at some point, it is no longer a library's job to
make sure the input is spotless ... especially in a language like PHP
where you are not worried about things like memory/buffers either
(this might be a different discussion if the library was programmed in
C. It's not.)
clear/intuitive, and certainly not inferable only from the (mixed)
type hint and the description for the 'partial' option available in
the documentation.
appreciated.
http://marc.info/?l=dovecot&m=139928889515465&w=2
Dovecot's quirky behavior which actually explains the weird issue
mentioned above.
broken the last few releases. (See above email).
As mentioned in previous post, we should abstract partial behavior for
IMAP servers that don't support CONTEXT. You are going to lose a lot
of the performance benefits, since the PHP server still needs to grab
the entire search results list, but there's not much you can do about
it. (It is still more efficient since it will release memory sooner
in the process and without the need for further user intervention.)
commit 700333ec974f29b89da42886f66a2ae354a9afd2
Author: Michael M Slusarz <slusarz@horde.org>
Date: Wed May 14 00:25:34 2014 -0600
[mms] Support partial search limitation even if the PARTIAL
search return option is not available on the server (
Request #13153)..../Imap_Client/lib/Horde/Imap/Client/Socket.php | 22 +++++++++++++++++--
.../lib/Horde/Imap/Client/Socket/Pop3.php | 6 +++++
framework/Imap_Client/package.xml | 2 +
3 files changed, 27 insertions(+), 3 deletions(-)
http://github.com/horde/horde/commit/700333ec974f29b89da42886f66a2ae354a9afd2
commit 4e3d0ec4a5340e1b34e15c4ef78077c89a198ee4
Author: Michael M Slusarz <slusarz@horde.org>
Date: Fri May 2 23:02:24 2014 -0600
Don't allow empty PARTIAL limiters
See
Ticket #13153.../Imap_Client/lib/Horde/Imap/Client/Socket.php | 20
++++++++++----------
1 files changed, 10 insertions(+), 10 deletions(-)
http://github.com/horde/horde/commit/4e3d0ec4a5340e1b34e15c4ef78077c89a198ee4
range of message UIDs. The latter is described in RFC 3501 [9]:
seq-range = seq-number ":" seq-number
and since PARTIAL syntax does not support '*', as described in RFC
5267 [4.4], a partial range is not exactly the same as a seq-range of
message UIDs, but a range like nz-number ":" nz-number.
Horde_Imap_Client_Ids is a much more generic handler that works with a
mixed set of ids representations:
* @param mixed $ids Either self::ALL, self::SEARCH_RES,
* Horde_Imap_Client_Ids object, array, or string.
As such it would accept any of those representation variants as valid
even though the rage_string for the corresponding $options['partial']
might not make any sense as a PARTIAL range in some particular cases.
For example, it would even accept a $options['partial?] set to
Horde_Imap_Client_Ids::ALL which would result in a PARTIAL range
broken error caused by the command that was actually sent that
included a RETURN (PARTIAL :), as the range_string property returns
an empty string in this particular case. And yes, this is not really
an option that should be supported and it falls in the same
type/sanity-checking category, like the existing check for a not empty
string/array, and can probably be avoided by just checking if the
range_string representation is not empty before proceeding further.
I still believe that the most efficient way of validating a partial
range would be to simply check that it does in fact follow its
specification, since we're only validating a specific subset of all
the possible representations accepted by the Horde_Imap_Client_Ids.
But on the other hand you are correct that the current API
implementation behaves differently and since in my opinion the
documentation on this is not exactly clear (see below) your approach
is probably best in order not to break backwards compatibility. So
maybe this should be changed in the next version (3.0) if you agree
that it?s a good idea.
using an older version of the library (for reasons which are not
relevant to this issue) on which I manually applied your fix.
Unfortunately I wasn't thorough enough and because in that version
there was no `case 'range_string':` in the `__get `magic method from
Ids class I was always getting that ':', regardless of the actual
value of the 'partial' option. Like I said, my bad for not also
checking the current implementation of Ids and assuming it was the
same as mine.
(http://dev.horde.org/api/master/lib/Imap_Client/classes/Horde_Imap_Client_Base.html#method_search). That was actually among the first things I did when I tried to use PARTIAL results. But I didn't see that it stated anywhere that the 'partial' option supports an array representation, or any of the other representation variants that the Ids class can handle. The current implementation, after your fix, supports either the literal string representation: 'nz-number:nz-number' or an array representation to specify the same thing, like array('nz-number', 'nz-number') which may or may not contain any other values in between the two that define the range, or an already existing Ids object that may or may not contain a valid partial range, or even Horde_Imap_Client_Ids::ALL, Horde_Imap_Client_Ids::SEARCH_RES which make no sense in this case. I might be wrong, but for me this is neither really correct, nor clear/intuitive, and certainly not inferable only from the (mixed) type hint and the description for the 'partial' option available in the
documentation.
That being said, I strongly agree with you that this library doesn't
need even more bloat, especially not just for handling things like this.
the library on which I had to manually apply your fix and I didn't do
a thorough job at it.
with IMAP PARTIAL results and this library in particular was about a
month ago. The original issue was that the library was failing on
mailboxes that had a huge number of messages, due to memory
limitations while trying to retrieve all message UIDs, even though the
results were being paginated through on the client side and only the
relevant messages were being fetched afterwards. So it made sense to
try and leverage the use of PARTIAL results to retrieve only the
relevant message UIDs as that would have made the process a lot more
efficient and completely avoided the memory limitation issue. My
problem was that the existing implementation wasn't actually allowing
PARTIAL results for an UID SORT command because CONTEXT=SORT was not
available on my server. And even though that might be the correct
behavior as far as RFC 5267 is concerned, the strange thing was that I
could manually run an UID SORT command to return a correct PARTIAL,
which showed that it was in fact working properly on my server even
w/o the CONTEXT=SORT.
Thank you for your prompt replies and for the clarification about
Dovecot's quirky behavior which actually explains the weird issue
mentioned above.
Dovecot's SEARCH and SORT code is the same. So limiting via PARTIAL
will work on SORT simply because the code being used is identical
and limitations on PARTIAL with SORT aren't being enforced.
src/imap/cmd-search.c and src/imap/cmd-sort.c both call
cmd_search_parse_return_if_found
cmd_search_parse_return_if_found (src/imap/imap-search.c) calls
search_parse_return_options (same file). Which is where the PARTIAL
return modified is parsed.
Looking at the code, it looks like CONTEXT and UPDATE are also both
available... and sure enough, they are both accepted without error.
And it looks like UPDATE is working for SORT also. Hmmmmm...
I might have to ping Timo to see if he simply forgot to add the
capability flag to the available list. Or if there are issues with
the code that I am not aware of.
Priority ⇒ 1. Low
Patch ⇒ No
ability away from someone using the code ... that's the definition of
breaking backward compatibility. Whether its a good idea or not is
completely irrelevant - we're stuck with it until Horde_Imap_Client 3.0.
And from an API perspective, by requiring someone to provide a full
partial string you are requiring them to do 2 things: 1.) Have a
detailed knowledge of RFC 3501 (number sequence formatting) and 2.)
Have a detailed knowledge of RFC 5267.
The brilliance/advantage of allowing someone to pass an array in is
that all of these details are abstracted. The library does all the
IMAP formatting details for you.
think that trying to handle them the same way just for consistency
is the best idea.
seq-number = nz-number / "*"
uniqueid = nz-number
Outside of the '*', they *are* identicial from a syntax level. So we
can (and should) treat them the same internally.
Horde_Imap_Client_Ids works on both equally well. We surely don't
want to have two libraries that duplicate the exact same code.
the moment your current fix will always set the partials range to
':' which is clearly invalid:
code to catch that, but the result's going to be the same ... an
invalid search. GIGO).
I haven't gone through and been 100% thorough in type-checking,
sanity-checking, etc. especially for parameters that are very liberal
about their data input. Trying to do that right now is simply going
to add more bloat to the library - Socket.php is already too big.
Once the commands are broken up into separate classes (coming in 3.0),
this will probably happen since the code will be more maintainable
even with all the sanity checks.
* - partial: (mixed) The range of results to return (message sequence
* numbers) Only a single range is supported (represented by
* the minimum and maximum values contained in the range
* given).
* DEFAULT: All messages are returned.
This code:
$pids = $this->getIdsOb(array(1,2,3,5,6,7))->range_string;
returns:
1:7
S: 4 BAD Error in IMAP command UID SORT: PARTIAL range broken.
PARTIAL return). $pids, used as the argument to PARTIAL, can NEVER
contain a comma (,) for example. You either copy/pasted this example
or you are not running an unmodified copy of the current code.
cases. For example you can have a server that supports ESORT but not
CONTEXT=SORT, which is exactly the case on my test server:
can't use PARTIAL (or CONTEXT or UPDATE) with ESORT.
Would you happen to be using Dovecot? Dovecot does NOT support
CONTEXT=SORT. See, e.g., http://wiki2.dovecot.org/Roadmap
Now what you are seeing is probably the following: internally,
Dovecot's SEARCH and SORT code is the same. So limiting via PARTIAL
will work on SORT simply because the code being used is identical and
limitations on PARTIAL with SORT aren't being enforced. Which is
perfectly legal: there's nothing in IMAP spec that prevents commands
from executing if not listed in CAPABILITY.
But the more difficult part in implementing CONTEXT=SORT actually
deals with CONTEXT & UPDATE, which hasn't been written yet in Dovecot.
Without those two, CONTEXT=SORT can't be listed as supported since
there is no way to indicate which of the 3 return values are supported.
Fortuitously this works on your system. But there is absolutely no
way of knowing this from the CAPABILITY string and no guarantee this
works on other systems. Per the RFC, the only way PARTIAL is
supported for SORTs is if CONTEXT=SORT is available
RFC 5267 [4.1]: if CONTEXT=SORT is advertised, ESORT accepts 3
additional return arguments: UPDATE, CONTEXT, and PARTIAL. Without
CONTEXT=SORT, these 3 RETURN arguments aren't available. That's all
that section says.
The RFC does not say that PARTIAL searching is supported for SORTs if
CONTEXT=SEARCH is available. (Which makes logical sense: there's no
reason to support CONTEXT=SOMETHING type capability strings unless
they refer to a specific subset of behavior modifications. Otherwise
a single CONTEXT capability would have been defined handling both
SEARCH and SORT.)
What really should be done, and would be a more productive use of our
time, is to abstract partial support to the client also (i.e.
client-side sorting if SORT is not available). Meaning a user should
be able to pass-in the partial argument and get the partial list of
responses, regardless if CONTEXT is available on the server.
Obviously CONTEXT would make this a much less expensive operation, but
that should be irrelevant for the purposes of someone using the API.
if it is a valid sequence range ('range_start:range_end'), as
specified in RFC 5267 [4.4].
So we do need to convert to an Ids object first. We just need to
catch the single case where the range contains a single ID
as anything else but the actual literal representation? It's kind of
pointless to have this array option and allow an array(1, 2, 3) to be
specified here instead of the literal '1:3' because, as stated in RFC
5267 [4.4], the IMAP command accepts only a literal range
representation as a valid PARTIAL. In my opinion the array support
would just add additional overhead for converting back and forth for
no real gains. And it's a lot easier to use the literal representation
anyway, especially when you want to retrieve more than just a few
messages at a time: consider for example specifying '1:100' vs its
array(1,2,3,...,100) counterpart. Not to mention that handling an
array option also requires additional processing to prevent errors for
an edge case like the one reported. For me it seems that this
additional array option just ends up making things more complicated
and less clear; and also less intuitive when for a range like 'x:x' it
would also have to handle the option of array(x).
Partial ranges have nothing to do with message UIDs and I don't think
that trying to handle them the same way just for consistency is the
best idea. In case of specifying a set of message UIDs then it does
indeed make a lot of sense to use an Ids object. In that case you
don't always have a single sequence-set representation and the Ids
class handles things properly because that's what it had been designed
and implemented for, but the same does not apply to partial ranges. A
partial range must always contain exactly one single literal
sequence-set representation, so, like I already said, I don?t really
see the point of allowing the ability to specify it as an array as well.
Also you did not commit any changes to Horde_Imap_Client_Ids so at the
moment your current fix will always set the partials range to ':'
which is clearly invalid:
C: 4 UID SORT RETURN (PARTIAL :) (DATE) US-ASCII UNDELETED
S: 4 BAD Error in IMAP command UID SORT: PARTIAL range broken.
And last but not least, if you're hell bent on allowing this array
option, then you need even more additional processing, to make sure
that you're actually sending a proper partial range. For example
something like array(1,2,3,5,6,7), while valid as far as Ids class is
concerned, would still break your current fix because it does not have
a single sequence-set rage representation and after the conversion it
would get translated from '1:3,5:7'*, which is obviously still not a
valid partial range:
C: 4 UID SORT RETURN (PARTIAL 1:3,5:7) (DATE) US-ASCII UNDELETED
S: 4 BAD Error in IMAP command UID SORT: PARTIAL range broken.
*Assuming here that the 'range_string' property accessed through the
magic method was supposed to return the literal sequence-set
representation for the $_ids stored in the Ids object, similar to
'tostring'.
CONTEXT=SEARCH does in fact imply ESEARCH:
In the case of CONTEXT=SEARCH, the server supports the extended
SEARCH command syntax described in [IMAP-ABNF], and accepts three
additional return options.
CONTEXT=SORT does in fact imply ESORT:
Servers advertising CONTEXT=SORT also advertise the SORT capability,
as described in [SORT], support the extended SORT command syntax
described in Section 3, and accept three additional return options
for this extended SORT.
These additional return options allow for notifications of changes to
the results of SEARCH or SORT commands, and also allow for access to
partial results.
However the reverse is neither implied nor required for either cases.
For example you can have a server that supports ESORT but not
CONTEXT=SORT, which is exactly the case on my test server:
C: [LOGIN Command - username: test]
S: 1 OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID
ENABLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS MULTIAPPEND
UNSELECT IDLE CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1
CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH
LIST-STATUS] Logged in
As you can see it does support both ESEARCH and ESORT but only
CONTEXT=SEARCH, no CONTEXT=SORT.
Moreover, while on the $server_sort branch, PARTIAL return does NOT
require CONTEXT=SORT as well as ESORT, having only the latter
available is enough.
So allowing an ESORT with PARTIAL return even if CONTEXT=SORT is not
available is in fact correct.
Furthermore, the existing check makes the CONTEXT=SORT availability
mandatory, which in fact denies access to PARTIAL returns on servers
that do not support it but do support ESORT, which is in fact not
correct.
The entire check should be rewritten, because it is incorrectly
assuming that you could have a server that does not support ESORT and
still support CONTEXT=SORT. This can never happen, simply because if
the server does not support ESORT it will never support only
CONTEXT=SORT since the latter implies the former. Same holds true for
ESEARCH and CONTEXT=SEARCH for that matter. And yes, technically
CONTEXT=SEARCH is irrelevant on the $server_sort branch, but unlike
CONTEXT=SORT at least it does not needlessly block the PARTIAL return
on that branch in this particular case:)
The proper fix would be to change the existing check from:
if ((!$esearch || !empty($options['partial'])) &&
($cap = $this->queryCapability('CONTEXT')) &&
in_array('SORT|SEARCH', $cap)) {
to just a simple:
if (($esearch && !empty($options['partial'])) ) {
on both code branches, because on the $server_sort branch the $esearch
is set based on ESORT availability and on the other branch it is set
based on ESEARCH availability and those are the relevant capabilities,
like explained above.
Just for reference, I?ve manually added on my test server 101 messages
with subject headers like ?PARTIAL return test xxx?, making sure that
the partial range is different from the message UIDs (just for
exemplification purposes) and that the default mailbox order is also
be different than the custom sort orders used for testing. Afterwards
I did the following to confirm that PARTIAL return is working properly
when only ESORT is available, no CONTEXT=SORT:
1. Request a PARTIAL return, sorting by subject:
C: 4 UID SORT RETURN (PARTIAL 1:2) (SUBJECT) US-ASCII UNDELETED
S: * ESEARCH (TAG "4") UID PARTIAL (1:2 202,201)
S: 4 OK Sort completed (0.001 secs).
2. Fetch the message UIDs returned to confirm that the messages were
in fact first ordered by subject and only afterwards the PARTIAL range
was returned:
C: 5 UID FETCH 202,201 (ENVELOPE INTERNALDATE RFC822.SIZE)
S: * 100 FETCH (UID 201 INTERNALDATE "01-May-2014 11:05:03 +0000"
RFC822.SIZE 1827 ENVELOPE ("Thu, 01 May 2014 11:05:03 +0000" "PARTIAL
return test 002" ((NIL NIL "daniel_bistriceanu" "yahoo.com")) ((NIL
NIL "daniel_bistriceanu" "yahoo.com")) ((NIL NIL "daniel_bistriceanu"
"yahoo.com")) ((NIL NIL "test" "example.com")) NIL NIL NIL NIL))
S: * 101 FETCH (UID 202 INTERNALDATE "01-May-2014 11:05:04 +0000"
RFC822.SIZE 1827 ENVELOPE ("Thu, 01 May 2014 11:05:04 +0000" "PARTIAL
return test 001" ((NIL NIL "daniel_bistriceanu" "yahoo.com")) ((NIL
NIL "daniel_bistriceanu" "yahoo.com")) ((NIL NIL "daniel_bistriceanu"
"yahoo.com")) ((NIL NIL "test" "example.com")) NIL NIL NIL NIL))
S: 5 OK Fetch completed.
3. Request the exact same PARTIAL return, sorting by subject in reverse order:
C: 4 UID SORT RETURN (PARTIAL 1:2) (REVERSE SUBJECT) US-ASCII UNDELETED
S: * ESEARCH (TAG "4") UID PARTIAL (1:2 102:103)
S: 4 OK Sort completed (0.001 secs).
4. Fetch the message UIDs returned to confirm that the messages were
in fact first ordered by subject in reverse order and only afterwards
the PARTIAL range was returned:
C: 5 UID FETCH 102:103 (ENVELOPE INTERNALDATE RFC822.SIZE)
S: * 1 FETCH (UID 102 INTERNALDATE "01-May-2014 11:03:50 +0000"
RFC822.SIZE 1827 ENVELOPE ("Thu, 01 May 2014 11:03:48 +0000" "PARTIAL
return test 101" ((NIL NIL "daniel_bistriceanu" "yahoo.com")) ((NIL
NIL "daniel_bistriceanu" "yahoo.com")) ((NIL NIL "daniel_bistriceanu"
"yahoo.com")) ((NIL NIL "test" "example.com")) NIL NIL NIL NIL))
S: * 2 FETCH (UID 103 INTERNALDATE "01-May-2014 11:03:51 +0000"
RFC822.SIZE 1827 ENVELOPE ("Thu, 01 May 2014 11:03:50 +0000" "PARTIAL
return test 100" ((NIL NIL "daniel_bistriceanu" "yahoo.com")) ((NIL
NIL "daniel_bistriceanu" "yahoo.com")) ((NIL NIL "daniel_bistriceanu"
"yahoo.com")) ((NIL NIL "test" "example.com")) NIL NIL NIL NIL))
S: 5 OK Fetch completed.
State ⇒ Feedback
CONTEXT support to test).
commit 6e423151ed8e3ecb97167fa2b6fd464b0e29debd
Author: Michael M Slusarz <slusarz@horde.org>
Date: Wed Apr 30 16:18:31 2014 -0600
[mms] Correctly handle partial search limiting for a single ID
(
Bug #13153)..../Imap_Client/lib/Horde/Imap/Client/Socket.php | 22 ++++++++++++++-----
framework/Imap_Client/package.xml | 2 +
2 files changed, 18 insertions(+), 6 deletions(-)
http://github.com/horde/horde/commit/6e423151ed8e3ecb97167fa2b6fd464b0e29debd
Assigned to Michael Slusarz
State ⇒ Assigned
if it is a valid sequence range ('range_start:range_end'), as
specified in RFC 5267 [4.4].
So we do need to convert to an Ids object first. We just need to
catch the single case where the range contains a single ID
First, you're allowing an ESORT with PARTIAL return if CONTEXT=SORT is
not available, which is not correct.
Second, support for CONTEXT=SORT (or CONTEXT=SEARCH) isn't true 4466
ESEARCH support. For the sort instance, ESEARCH-like behavior is
available only if ESORT is available or CONTEXT=SORT is available.
ESEARCH-like behavior is NOT available if we are using the SORT
command and CONTEXT=SEARCH is available, since we are not using
CONTEXT=SEARCH support in that code branch. Thus the specific check
for SORT is correct in this instance - CONTEXT=SEARCH availability is
irrelevant.
New Attachment: PARTIAL_limiting_fixes.patch
Priority ⇒ 2. Medium
Type ⇒ Bug
Summary ⇒ Bugs in the current implementation for PARTIAL limiting (RFC 5267 [4.4]) in Horde's Imap_Client library
Queue ⇒ Horde Framework Packages
Milestone ⇒
Patch ⇒ Yes
State ⇒ Unconfirmed
Horde_Imap_Client_Socket class needs to be fixed in order to properly
support PARTIAL limiting:
1. Using a partial range like 'x:x' breaks the current implementation
for PARTIAL limiting, even though that partial range is perfectly valid.
For example, using '1:1' the expected behavior would be to have a
RETURN (PARTIAL 1:1:) in the IMAP command sent to the server.
Something like 'C: 4 UID SORT RETURN (PARTIAL 1:1) (DATE) US-ASCII
UNDELETED'.
Instead it sends: 'C: 4 UID SORT RETURN (PARTIAL 1) (DATE) US-ASCII
UNDELETED', which is obviously an invalid IMAP command that returns
'S: 4 BAD Error in IMAP command UID SORT: PARTIAL range broken.'
This issue is caused by the current behavior of converting the partial
range to ids and then back to its string representation:
strval($this->getIdsOb($options['partial']));
To fix this it should just enforce the usage of the provided partial
if it is a valid sequence range ('range_start:range_end'), as
specified in RFC 5267 [4.4].
2. In case there's a 'sort' option the current implementation
conditions the PARTIAL limiting to the availability of the 'SORT'
capability on the 'CONTEXT', even though it also states just below the
check that "RFC 5267 indicates RFC 4466 ESEARCH support,
notwithstanding RFC 4731 support."
and PARTIAL limiting relies on 'ESEARCH' not 'ESORT' to retrive the
message UIDs that correspond to the PARTIAL limiting range that was
specified in the UID SORT command, properly taking into account the
'sort' criterion:
C: 4 UID SORT RETURN (PARTIAL 1:1) (DATE) US-ASCII UNDELETED
S: * ESEARCH (TAG "4") UID PARTIAL (1:1 1)
S: 4 OK Sort completed (0.000 secs).
To fix this it should check for the availability of the 'SEARCH'
capability on the 'CONTEXT' as an alternative check that 'ESEARCH' is
available.