Summary | vlad.horde.org timeout after EHLO |
Queue | Horde.org Servers |
Type | Bug |
State | Resolved |
Priority | 2. Medium |
Owners | |
Requester | vilius (at) lnk (dot) lt |
Created | 12/28/2006 (6786 days ago) |
Due | |
Updated | 01/29/2007 (6754 days ago) |
Assigned | |
Resolved | 01/18/2007 (6765 days ago) |
Github Issue Link | |
Github Pull Request | |
Milestone | |
Patch | No |
Jan 19 15:11:37 mail postfix/smtp[30232]: 91DCF10E03A0: conversation
with lists.horde.org[199.175.137.231] timed out while sending MAIL FROM
Jan 19 15:11:50 mail postfix/smtp[30232]: 91DCF10E03A0:
to=<horde@lists.horde.org>, relay=smtp.easydns.com[205.210.42.52]:25,
delay=389, delays=0.08/0/383/5.1, dsn=2.0.0, status=sent (250 Ok:
queued as 188DC5054D)
A telnet test hangs right after EHLO:
[root@mail ~]# telnet lists.horde.org 25
Trying 199.175.137.231...
Connected to lists.horde.org (199.175.137.231).
Escape character is '^]'.
EHLO mail.lnk.lt
State ⇒ Resolved
issues not long ago with a broken router but this one was replaced.
And I didn't know that turning off window scaling on our side would
help too.
Anyway, for the record, on BSD this is:
sysctl net.inet.tcp.rfc1323=0
The window scaling in linux changed in 2.6.17. If I disable scaling
on my inbound MX, messages come through. With scaling enabled, I get
nothing.
Unfortunately (for me), I can't leave scaling disabled on my inbound MX.
Thinking about it more, disabling scaling on vlad/coyote may not do
anything. If the horde machines are behind a firewall/router that's
stripping tcp scaling from the headers one-way, then any scaling that
my MX tries to do will be stripped, leaving me with a large size
window and vlad/coyote with a very small window, and few or no packets
transferred past a certain point.
This would suggest that anybody running a linux 2.6.17+ kernel on
their MX host with tcp window scaling enabled (the default) and the
default values for tcp_wmem and tcp_rmem should be having the same
problems receiving mail from the horde mailing lists.
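To make the mismatch concrete, here is a minimal sketch of the arithmetic, assuming RFC 1323 semantics (the 16-bit window field is shifted left by the scale factor announced in the SYN). The field value 46 and the shift of 7 are hypothetical numbers picked for illustration, not values captured from vlad or coyote, and `effective_window` is a made-up helper name.

```shell
#!/bin/sh
# effective_window FIELD SHIFT
# Bytes a host believes are available, given the raw 16-bit window
# field and the scale shift that host applies to it.
effective_window() {
    echo $(( $1 << $2 ))
}

# Hypothetical example values (assumptions, not measured):
field=46        # raw window field the receiving MX puts in its TCP header
shift_count=7   # scale factor the receiving MX announced in its SYN

echo "receiver means: $(effective_window "$field" "$shift_count") bytes"
# If a middlebox stripped the option, the sender applies no shift at all:
echo "sender assumes: $(effective_window "$field" 0) bytes"
```

With these numbers the receiver is offering 5888 bytes while the sender thinks it may only send 46, which would match the "very small window" described above.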
Under linux with a 2.6 kernel,
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
I'm not sure how to disable it under other OSes.
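Before and after toggling the setting it can help to confirm what the kernel currently reports. A small sketch, assuming a Linux host with procfs mounted; the helper name `window_scaling_state` is made up for this example, and the optional file argument exists only so the function can be tried against a scratch file.

```shell
#!/bin/sh
# window_scaling_state [FILE]
# Prints "enabled" or "disabled" based on the 0/1 flag stored in FILE.
# FILE defaults to the Linux procfs path used in the echo command above.
window_scaling_state() {
    file=${1:-/proc/sys/net/ipv4/tcp_window_scaling}
    if [ ! -r "$file" ]; then
        # Not Linux, or procfs is not readable: report that we cannot tell.
        echo "unknown"
        return 1
    fi
    if [ "$(cat "$file")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Report the current state; do not fail the script on non-Linux hosts.
window_scaling_state || true
```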
$ host -t MX lnk.lt
lnk.lt mail is handled by 5 mail.lnk.lt.
lnk.lt mail is handled by 10 mxbackup.data.lt.
If mail.lnk.lt doesn't work, postfix tries mxbackup the next time and
vice versa.
Jan 18 05:13:07 vlad postfix/smtp[80652]: A33F214D2:
to=<vilius@lnk.lt>, relay=mxbackup.data.lt[213.197.128.83]:25,
delay=302, delays=0.28/0.17/301/0.42, dsn=2.0.0, status=sent (250
2.0.0 Ok: queued as 9425F385)
Also I see direct connections from vlad and coyote in mail.lnk.lt.
packets larger than 1420 bytes. You might as well put the MTU back to
1500. Thanks for checking.
sizes, starting at 1500 bytes and working my way down. I didn't get a
response until I dropped to 1448 bytes per packet.
through. Mail generated by hand (either telnet to port 25 or using
"mail" or "sendmail" or whatever) is generally going to be small enough
because it lacks the usual amount of list-related headers.
The problem arises because a) something (either directly on coyote
and vlad, or something in the network) is filtering ICMP traffic, and
b) something in the network path has an MTU of 1480. Because of the
ICMP filtering, the ICMP "fragmentation needed" message isn't getting
through, breaking PMTU discovery.
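The size arithmetic behind this can be sketched as follows. It assumes IPv4 and TCP headers of 20 bytes each with no options (TCP timestamps, for example, would add more), and the helper names `mss_for_mtu` and `fits_path` are hypothetical.

```shell
#!/bin/sh
# Header sizes assume IPv4 and TCP without options (20 bytes each).
IP_HDR=20
TCP_HDR=20

# mss_for_mtu MTU: largest TCP payload that fits in one packet of that MTU.
mss_for_mtu() {
    echo $(( $1 - IP_HDR - TCP_HDR ))
}

# fits_path PAYLOAD PATH_MTU: succeeds if a segment carrying PAYLOAD
# bytes fits through a link with PATH_MTU without being fragmented.
fits_path() {
    [ $(( $1 + IP_HDR + TCP_HDR )) -le "$2" ]
}

full=$(mss_for_mtu 1500)   # segment size a sender with a 1500-byte MTU uses
if fits_path "$full" 1480; then
    echo "full-size segment ($full bytes) passes"
else
    echo "full-size segment ($full bytes) is dropped, and the sender is never told"
fi
```

A short hand-typed message stays well under the limit, so it gets through, while a list message big enough to fill a segment does not.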
I'm scratching my head now. We are definitely missing something here.
mail.lnk.lt with HELO or EHLO request?
Dec 28 11:05:15 mail postfix/smtpd[11975]: timeout after EHLO from
coyote.horde.org[199.175.137.230]
Dec 28 11:05:15 mail postfix/smtpd[11975]: disconnect from
coyote.horde.org[199.175.137.230]
But mxbackup.data.lt caught me as a spammer via an IP blacklist.
either with MTU size and/or window scaling, and ICMP filtering. We're
looking at a new list server, on a different network, so that should
resolve this.
doesn't understand EHLO response?
but so is my backup MX, which accepts the mail, no problem.
have no "staff". Our hosting is donated; we've asked them to look at
it, but no one there works "for" Horde.
since some time between the 20th and the 24th. If there was a change
made in that time frame to drop or filter ICMP traffic, that's
probably the cause.
State ⇒
Disabling MTU discovery and setting interface MTU to 1400 does help.
Could someone from Horde staff please look into this issue? As I'm
actively using HEAD versions, it is getting very hard to follow CVS
changes and ticket watches.
http://msgs.securepoint.com/cgi-bin/get/postfix9904/37.html
http://msgs.securepoint.com/cgi-bin/get/postfix9904/37/1.html
Basically, it looks like something in the path is filtering ICMP and
has broken PMTU discovery.
I tested by sending ICMP echo request packets with varying sizes.
Anything over 1480 bytes for the total packet size was dropped. I
tested from two different network paths, to verify that it wasn't
something on my end. The two routes share the last 4 hops only:
19 154.11.4.129 36.329 ms 36.818 ms 36.967 ms
20 208.181.86.221 37.716 ms 39.063 ms 41.819 ms
21 209.53.254.36 43.078 ms 41.967 ms 41.206 ms
22 199.175.137.231 38.667 ms 38.190 ms 36.957 ms
and:
7 154.11.4.129 143.938 ms 144.835 ms 144.333 ms
8 208.181.86.221 197.612 ms 148.102 ms 145.105 ms
9 209.53.254.36 145.927 ms 144.991 ms 145.115 ms
10 199.175.137.231 145.803 ms 148.961 ms 147.354 ms
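A size sweep like the one described above can be sketched as follows. This assumes Linux iputils `ping`, where `-M do` sets the Don't Fragment bit and `-s` gives the ICMP payload size (total IPv4 packet size minus 20 bytes of IP header and 8 bytes of ICMP header). The commands are only printed, not run, and the helper names are hypothetical.

```shell
#!/bin/sh
# icmp_payload_for_packet SIZE
# ICMP payload bytes needed so the whole IPv4 packet is SIZE bytes
# (20-byte IP header without options + 8-byte ICMP echo header).
icmp_payload_for_packet() {
    echo $(( $1 - 28 ))
}

# probe_commands HOST START STOP STEP
# Print one ping command per packet size, from START down to STOP.
probe_commands() {
    host=$1
    size=$2
    stop=$3
    step=$4
    while [ "$size" -ge "$stop" ]; do
        echo "ping -c 1 -M do -s $(icmp_payload_for_packet "$size") $host"
        size=$(( size - step ))
    done
}

# Sweep from 1500 bytes down to 1400 in 20-byte steps.
probe_commands lists.horde.org 1500 1400 20
```

The largest size that still gets a reply is an estimate of the path MTU, provided echo requests and replies are not themselves being filtered.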
sendmail MX. Connections from vlad are timing out. I haven't checked
logs for anything from coyote.
Dec 28 11:05:15 mail postfix/smtpd[11975]: timeout after EHLO from
coyote.horde.org[199.175.137.230]
Dec 28 11:05:15 mail postfix/smtpd[11975]: disconnect from
coyote.horde.org[199.175.137.230]
Priority ⇒ 2. Medium
State ⇒ Unconfirmed
Queue ⇒ Horde.org Servers
Summary ⇒ vlad.horde.org timeout after EHLO
Type ⇒ Bug
postfix 2.3.3 and vlad.horde.org can not send me email anymore.
Everything else works ok except for vlad.horde.org. This is what I see
constantly in the logs:
Dec 28 10:36:49 mail postfix/smtpd[16383]: connect from
vlad.horde.org[199.175.137.231]
Dec 28 10:41:47 mail postfix/smtpd[14568]: timeout after EHLO from
vlad.horde.org[199.175.137.231]
Dec 28 10:41:47 mail postfix/smtpd[14568]: disconnect from
vlad.horde.org[199.175.137.231]
Dec 28 10:41:49 mail postfix/smtpd[24168]: timeout after EHLO from
vlad.horde.org[199.175.137.231]
Dec 28 10:41:49 mail postfix/smtpd[24168]: disconnect from
vlad.horde.org[199.175.137.231]
I verified that the MTA is working perfectly:
[root@mail root]# telnet 213.197.188.3 25
Trying 213.197.188.3...
Connected to mail.lnk.lt (213.197.188.3).
Escape character is '^]'.
220 mail.lnk.lt ESMTP Windows 2003 Server
EHLO test.lnk.lt
250-mail.lnk.lt
250-PIPELINING
250-SIZE 51199999
250-VRFY
250-ETRN
250-STARTTLS
250-AUTH PLAIN LOGIN
250-AUTH=PLAIN LOGIN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
MAIL FROM: <vilius@lnk.lt>
250 2.1.0 Ok
RCPT TO: <vilius@lnk.lt>
250 2.1.5 Ok
DATA
354 End data with <CR><LF>.<CR><LF>
Subject: asd
asdasd
.
250 2.0.0 Ok: queued as 7E97F10E0AC0
quit
221 2.0.0 Bye
Connection closed by foreign host.
Could it be that vlad.horde.org is not compatible with new postfix or
doesn't understand EHLO response?