Tickets :: [#4814] vlad.horde.org timeout after EHLO

7/27/25

Summary	vlad.horde.org timeout after EHLO
Queue	Horde.org Servers
Type	Bug
State	Resolved
Priority	2. Medium
Owners
Requester	vilius (at) lnk (dot) lt
Created	12/28/2006 (6786 days ago)
Due
Updated	01/29/2007 (6754 days ago)
Assigned
Resolved	01/18/2007 (6765 days ago)
Github Issue Link
Github Pull Request
Milestone
Patch	No

01/29/2007 09:27:13 PM	vilius (at) lnk (dot) lt	Comment #31
It is not happening anymore. Thanks!

01/19/2007 02:55:11 PM	vilius (at) lnk (dot) lt	Comment #30
Now I see another problem.

Jan 19 15:11:37 mail postfix/smtp[30232]: 91DCF10E03A0: conversation with lists.horde.org[199.175.137.231] timed out while sending MAIL FROM
Jan 19 15:11:50 mail postfix/smtp[30232]: 91DCF10E03A0: to=<horde@lists.horde.org>, relay=smtp.easydns.com[205.210.42.52]:25, delay=389, delays=0.08/0/383/5.1, dsn=2.0.0, status=sent (250 Ok: queued as 188DC5054D)

The telnet test hangs right after EHLO:

[root@mail ~]# telnet lists.horde.org 25
Trying 199.175.137.231...
Connected to lists.horde.org (199.175.137.231).
Escape character is '^]'.
EHLO mail.lnk.lt
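
For anyone reading delivery lines like the one logged in this comment: as far as I know, since Postfix 2.3 the delays=a/b/c/d field splits the total delay into time before the queue manager, time in the queue manager, connection setup, and message transmission. A rough Python sketch for pulling that apart follows; the function name and the stage labels are my own, not anything Postfix defines.

```python
import re

# Stage labels are made up for this sketch; the a/b/c/d order is assumed to
# follow the Postfix 2.3+ logging format.
STAGES = ("before_qmgr", "in_qmgr", "conn_setup", "transmission")


def parse_delays(log_line):
    """Return the four delays= stages of a Postfix delivery line as floats."""
    match = re.search(r"\bdelays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)", log_line)
    if match is None:
        raise ValueError("no delays= field found")
    return dict(zip(STAGES, (float(value) for value in match.groups())))


# The delivery line quoted in this comment.
line = (
    "Jan 19 15:11:50 mail postfix/smtp[30232]: 91DCF10E03A0: "
    "to=<horde@lists.horde.org>, relay=smtp.easydns.com[205.210.42.52]:25, "
    "delay=389, delays=0.08/0/383/5.1, dsn=2.0.0, "
    "status=sent (250 Ok: queued as 188DC5054D)"
)

stages = parse_delays(line)
slowest = max(stages, key=stages.get)
print(slowest, stages[slowest])
```

Almost all of the 389 seconds land in the connection setup stage here, which would be consistent with the first destination timing out before the fallback relay was tried.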

01/18/2007 11:38:36 PM	Jan Schneider	Comment #29 State ⇒ Resolved
Actually, disabling TCP window scaling did help. We had the same issues not long ago with a broken router but this one was replaced. And I didn't know that turning off window scaling on our side would help too. Anyway, for the record, on BSD this is:

sysctl net.inet.tcp.rfc1323=0

01/18/2007 11:28:22 PM	cbs (at) cts (dot) ucla (dot) edu	Comment #28
More information: http://kerneltrap.org/node/6723

01/18/2007 11:22:32 PM	cbs (at) cts (dot) ucla (dot) edu	Comment #27
I'm looking at http://lwn.net/Articles/92727/

The window scaling in Linux changed in 2.6.17. If I disable scaling on my inbound MX, messages come through. With scaling enabled, I get nothing. Unfortunately (for me), I can't leave scaling disabled on my inbound MX.

Thinking about it more, disabling scaling on vlad/coyote may not do anything. If the horde machines are behind a firewall/router that's stripping TCP scaling from the headers one-way, then any scaling that my MX tries to do will be stripped, leaving me with a large window and vlad/coyote with a very small window, and few or no packets transferred past a certain point.

This would suggest that anybody running a Linux 2.6.17+ kernel on their MX host with TCP window scaling enabled (the default) and the default values for tcp_wmem and tcp_rmem should be having the same problems receiving mail from the horde mailing lists.
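
To make the mismatch described in this comment concrete, here is a minimal Python sketch of the arithmetic. It assumes the usual RFC 1323 rule that the real window is the 16-bit header field shifted left by the negotiated scale; the shift of 7 and the 64 KiB buffer are made-up numbers, and the function name is hypothetical.

```python
def effective_window(advertised, scale_shift):
    """Window in bytes as interpreted by a peer using the given scale shift."""
    if not 0 <= advertised <= 0xFFFF:
        raise ValueError("the TCP window field is 16 bits wide")
    if not 0 <= scale_shift <= 14:
        raise ValueError("RFC 1323 limits the shift count to 14")
    return advertised << scale_shift


# Illustrative values only: a receiver that wants to offer 64 KiB and
# negotiated (or believes it negotiated) a shift of 7.
receiver_shift = 7
intended_window = 64 * 1024
header_field = intended_window >> receiver_shift  # what goes on the wire

# What the receiver means by that header field.
print(effective_window(header_field, receiver_shift))

# What the sender sees if a middlebox stripped the scale option from the
# handshake in one direction, so the sender applies no shift at all.
print(effective_window(header_field, 0))
```

With these numbers the receiver believes it is offering 65536 bytes while the sender thinks it may only have 512 bytes in flight, which matches the "large window on one side, very small window on the other" picture.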

01/18/2007 11:08:12 PM	cbs (at) cts (dot) ucla (dot) edu	Comment #26
Can you try disabling TCP window scaling? Under Linux with a 2.6 kernel:

echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

I'm not sure how to disable it under other OSes.

01/18/2007 01:43:09 PM	vilius (at) lnk (dot) lt	Comment #25
Seems like you are right :(

01/18/2007 01:30:53 PM	Jan Schneider	Comment #24
> Nope, we don't use mxbackup at the moment at all.

Sure you do:

$ host -t MX lnk.lt
lnk.lt mail is handled by 5 mail.lnk.lt.
lnk.lt mail is handled by 10 mxbackup.data.lt.

If mail.lnk.lt doesn't work, postfix tries mxbackup the next time and vice versa.

> Also I see direct connections from vlad and coyote in mail.lnk.lt.

Successful connections? Because this is what I see on vlad:

Jan 18 05:13:07 vlad postfix/smtp[80652]: A33F214D2: to=<vilius@lnk.lt>, relay=mxbackup.data.lt[213.197.128.83]:25, delay=302, delays=0.28/0.17/301/0.42, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 9425F385)
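
As an illustration of the MX fallback order mentioned in this comment, a small Python sketch that reads `host -t MX` style output and lists the targets by preference, lowest number first. The function name is hypothetical, and the parsing assumes the exact "mail is handled by" wording shown in the ticket.

```python
def mx_order(host_output):
    """Return MX targets sorted by preference (lowest value is tried first)."""
    records = []
    for line in host_output.splitlines():
        parts = line.split()
        # Expected shape: "<domain> mail is handled by <preference> <target>"
        if len(parts) >= 7 and parts[1:5] == ["mail", "is", "handled", "by"]:
            records.append((int(parts[5]), parts[6].rstrip(".")))
    return [target for _, target in sorted(records)]


# The two records quoted in this comment.
sample = (
    "lnk.lt mail is handled by 10 mxbackup.data.lt.\n"
    "lnk.lt mail is handled by 5 mail.lnk.lt.\n"
)
print(mx_order(sample))
```

A sending MTA would normally try mail.lnk.lt first and only fall back to mxbackup.data.lt when that attempt fails, so mail arriving through the backup does not show that the primary path works.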

01/18/2007 01:08:03 PM	vilius (at) lnk (dot) lt	Comment #23
Nope, we don't use mxbackup at the moment at all. Also I see direct connections from vlad and coyote in mail.lnk.lt.

01/18/2007 01:00:57 PM	Jan Schneider	Comment #22
But only because mxbackup.data.lt fixed its IP blacklists.

01/18/2007 12:47:00 PM	vilius (at) lnk (dot) lt	Comment #21
Seems like I started to receive mail about an hour ago. Whuupee :)

01/18/2007 12:41:08 AM	cbs (at) cts (dot) ucla (dot) edu	Comment #20
The only change from where I sit is that now I can't ping with any packets larger than 1420 bytes. You might as well put the MTU back to 1500. Thanks for checking.

01/17/2007 11:37:34 PM	Jan Schneider	Comment #19
Nope

01/17/2007 11:23:30 PM	cbs (at) cts (dot) ucla (dot) edu	Comment #18
Can you try 1448? I just tested pinging vlad with varying packet sizes, starting at 1500 bytes and working my way down. I didn't get a response until I dropped to 1448 bytes per packet.

01/17/2007 11:08:32 PM	Jan Schneider	Comment #17
Doesn't seem to make a difference.

01/17/2007 10:32:17 PM	cbs (at) cts (dot) ucla (dot) edu	Comment #16
Can you try an MTU of 1480 or 1476?

01/17/2007 09:29:12 PM	Jan Schneider	Comment #15
Excuse my ignorance, but would it help if we change the MTU on vlad?

01/15/2007 10:18:44 PM	cbs (at) cts (dot) ucla (dot) edu	Comment #14
Anything that is short and keeps packet sizes under 1480 will come through. Mail generated by hand (either via telnet to port 25 or using "mail" or "sendmail" or whatever) is generally going to be small enough because it lacks the normal amount of list-related headers.

The problem is caused because a) something (either directly on coyote and vlad, or something in the network) is filtering ICMP traffic, and b) something in the network path has an MTU of 1480. Because of the ICMP filtering, ICMP MUST-FRAGMENT isn't being sent, breaking PMTU discovery.
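
The packet-size reasoning in this comment can be sketched with a little arithmetic in Python. This assumes plain IPv4 and TCP headers of 20 bytes each with no options, which is a simplification (TCP timestamps, for example, would add to the header); the helper names and the 300-byte test mail are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options
ICMP_HEADER = 8   # bytes, echo request/reply header


def tcp_mss_for_mtu(mtu):
    """Largest TCP payload per segment that a host with this MTU would offer."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ping_payload_for_packet(total_size):
    """ICMP echo payload size that yields the given total IPv4 packet size."""
    return total_size - IPV4_HEADER - ICMP_HEADER


def segment_fits(payload_bytes, path_mtu):
    """True if a TCP segment with this payload fits through the path unfragmented."""
    return payload_bytes + IPV4_HEADER + TCP_HEADER <= path_mtu


path_mtu = 1480                      # the bottleneck suspected in this ticket
full_segment = tcp_mss_for_mtu(1500) # what a host on standard Ethernet sends

print(full_segment)                          # payload of a full-size segment
print(segment_fits(full_segment, path_mtu))  # a full list mail segment
print(segment_fits(300, path_mtu))           # a short hand-typed test mail
print(ping_payload_for_packet(path_mtu))     # ping payload for a 1480-byte packet
```

A full-size segment from a 1500-byte interface does not fit through a 1480-byte hop, and without the ICMP error the sender never learns to shrink it, while a short test message fits in a single small packet and gets through.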

01/15/2007 01:08:43 PM	vilius (at) lnk (dot) lt	Comment #13
Yep. One with "This is a test." and a second with "Another test." I'm scratching my head now. We are definitely missing something here.

01/15/2007 01:00:54 PM	Jan Schneider	Comment #12
> Yeah, I received your test mail. Strange enough. Have you greeted mail.lnk.lt with a HELO or EHLO request?

Both, you should have received two mails from coyote.

01/15/2007 12:45:24 PM	vilius (at) lnk (dot) lt	Comment #11
Yeah, I received your test mail. Strange enough. Have you greeted mail.lnk.lt with a HELO or EHLO request?

01/15/2007 12:22:10 PM	Jan Schneider	Comment #10
> The same with coyote.horde.org
>
> Dec 28 11:05:15 mail postfix/smtpd[11975]: timeout after EHLO from coyote.horde.org[199.175.137.230]
> Dec 28 11:05:15 mail postfix/smtpd[11975]: disconnect from coyote.horde.org[199.175.137.230]

I was able to connect and send from coyote to mail.lnk.lt just fine. But mxbackup.data.lt caught me as a spammer because of an IP blacklist.

01/09/2007 03:56:29 AM	Chuck Hagenbuch	Comment #9
It doesn't have anything to do with postfix; it's a TCP-level problem, either with MTU size and/or window scaling, and ICMP filtering. We're looking at a new list server, on a different network, so that should resolve this.

01/09/2007 03:48:18 AM	Michael Rubinsky	Comment #8
> Could it be that vlad.horde.org is not compatible with new postfix or doesn't understand EHLO response?

FWIW, the machine that I am having the issue on is postfix 2.2.2, but so is my backup MX, which accepts the mail, no problem.

01/05/2007 05:21:00 PM	cbs (at) cts (dot) ucla (dot) edu	Comment #7
> We're asking someone to look into it, but please understand that we have no "staff". Our hosting is donated; we've asked them to look at it, but no one there works "for" Horde.

If it helps to pinpoint whatever changed, I've been having problems since some time between the 20th and the 24th. If there was a change made in that time frame to drop or filter ICMP traffic, that's probably the cause.

01/05/2007 04:55:08 PM	Chuck Hagenbuch	Comment #6 State ⇒
We're asking someone to look into it, but please understand that we have no "staff". Our hosting is donated; we've asked them to look at it, but no one there works "for" Horde.

01/05/2007 12:59:39 PM	vilius (at) lnk (dot) lt	Comment #5
Confirmed. Disabling MTU discovery and setting the interface MTU to 1400 does help. Could someone from Horde staff please look into this issue? As I'm actively using HEAD versions, it is getting very hard to follow CVS changes and ticket watches.

12/30/2006 01:07:00 AM	cbs (at) cts (dot) ucla (dot) edu	Comment #4
This thread might be relevant:

http://msgs.securepoint.com/cgi-bin/get/postfix9904/37.html
http://msgs.securepoint.com/cgi-bin/get/postfix9904/37/1.html

Basically, it looks like something in the path is filtering ICMP, and broke PMTU discovery. I tested by sending ICMP echo request packets with varying sizes. Anything over 1480 bytes for the total packet size was dropped. I tested from two different network paths, to verify that it wasn't something on my end. The two routes share the last 4 hops only:

19 154.11.4.129 36.329 ms 36.818 ms 36.967 ms
20 208.181.86.221 37.716 ms 39.063 ms 41.819 ms
21 209.53.254.36 43.078 ms 41.967 ms 41.206 ms
22 199.175.137.231 38.667 ms 38.190 ms 36.957 ms

and:

7 154.11.4.129 143.938 ms 144.835 ms 144.333 ms
8 208.181.86.221 197.612 ms 148.102 ms 145.105 ms
9 209.53.254.36 145.927 ms 144.991 ms 145.115 ms
10 199.175.137.231 145.803 ms 148.961 ms 147.354 ms
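
The "vary the packet size until it stops getting through" test from this comment could be automated roughly as follows. This is only a Python sketch of the search logic: `probe` is a hypothetical callback standing in for a real don't-fragment ping of a given total size, and the search assumes that if one size passes, every smaller size passes too. The lower bound of 68 is the minimum IPv4 MTU.

```python
def largest_passing_size(probe, low=68, high=1500):
    """Binary-search the largest total packet size for which probe(size) is True.

    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid got through, the limit is at least mid
        else:
            high = mid - 1  # mid was dropped, the limit is below mid
    return low


def fake_probe(size, path_limit=1480):
    """Stand-in for a real probe: pretends the path drops anything over 1480 bytes."""
    return size <= path_limit


print(largest_passing_size(fake_probe))
```

With a real probe this needs about eleven round trips for the 68 to 1500 range instead of stepping down one size at a time.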

12/28/2006 09:34:04 AM	cbs (at) cts (dot) ucla (dot) edu	Comment #3
I can confirm this. I'm seeing the same thing with delivery to a sendmail MX. Connections from vlad are timing out. I haven't checked logs for anything from coyote.

12/28/2006 09:06:56 AM	vilius (at) lnk (dot) lt	Comment #2
The same with coyote.horde.org

Dec 28 11:05:15 mail postfix/smtpd[11975]: timeout after EHLO from coyote.horde.org[199.175.137.230]
Dec 28 11:05:15 mail postfix/smtpd[11975]: disconnect from coyote.horde.org[199.175.137.230]

12/28/2006 09:00:11 AM	vilius (at) lnk (dot) lt	Comment #1 Priority ⇒ 2. Medium State ⇒ Unconfirmed Queue ⇒ Horde.org Servers Summary ⇒ vlad.horde.org timeout after EHLO Type ⇒ Bug
I recently upgraded my mail server's SMTPD from postfix 2.0.x to postfix 2.3.3 and vlad.horde.org can not send me email anymore. Everything else works ok except for vlad.horde.org. This is what I see constantly in the logs:

Dec 28 10:36:49 mail postfix/smtpd[16383]: connect from vlad.horde.org[199.175.137.231]
Dec 28 10:41:47 mail postfix/smtpd[14568]: timeout after EHLO from vlad.horde.org[199.175.137.231]
Dec 28 10:41:47 mail postfix/smtpd[14568]: disconnect from vlad.horde.org[199.175.137.231]
Dec 28 10:41:49 mail postfix/smtpd[24168]: timeout after EHLO from vlad.horde.org[199.175.137.231]
Dec 28 10:41:49 mail postfix/smtpd[24168]: disconnect from vlad.horde.org[199.175.137.231]

I verified that the MTA is working perfectly:

[root@mail root]# telnet 213.197.188.3 25
Trying 213.197.188.3...
Connected to mail.lnk.lt (213.197.188.3).
Escape character is '^]'.
220 mail.lnk.lt ESMTP Windows 2003 Server
EHLO test.lnk.lt
250-mail.lnk.lt
250-PIPELINING
250-SIZE 51199999
250-VRFY
250-ETRN
250-STARTTLS
250-AUTH PLAIN LOGIN
250-AUTH=PLAIN LOGIN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
MAIL FROM: <vilius@lnk.lt>
250 2.1.0 Ok
RCPT TO: <vilius@lnk.lt>
250 2.1.5 Ok
DATA
354 End data with <CR><LF>.<CR><LF>
Subject: asd asdasd
.
250 2.0.0 Ok: queued as 7E97F10E0AC0
quit
221 2.0.0 Bye
Connection closed by foreign host.

Could it be that vlad.horde.org is not compatible with new postfix or doesn't understand the EHLO response?
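
For reference, this is roughly what a client has to do to "understand" the EHLO response shown in this comment. It is a minimal Python sketch of the SMTP multi-line reply convention, where "250-" marks a continuation line and "250 " marks the last line. The function name is hypothetical and this is not how postfix itself is implemented.

```python
def parse_smtp_reply(lines):
    """Parse one SMTP reply into (code, [text of each line]).

    Lines of the form 'NNN-text' continue the reply; 'NNN text' ends it.
    """
    code = None
    texts = []
    for line in lines:
        if len(line) < 3 or not line[:3].isdigit():
            raise ValueError("not an SMTP reply line: %r" % line)
        this_code = int(line[:3])
        if code is None:
            code = this_code
        elif this_code != code:
            raise ValueError("reply code changed in the middle of a reply")
        texts.append(line[4:])
        if line[3:4] != "-":
            break  # final line of the reply
    else:
        # The loop ended without seeing a final line (or the input was empty).
        raise ValueError("reply was never terminated")
    return code, texts


# The EHLO response from the telnet session in this comment.
ehlo_reply = [
    "250-mail.lnk.lt",
    "250-PIPELINING",
    "250-SIZE 51199999",
    "250-VRFY",
    "250-ETRN",
    "250-STARTTLS",
    "250-AUTH PLAIN LOGIN",
    "250-AUTH=PLAIN LOGIN",
    "250-ENHANCEDSTATUSCODES",
    "250-8BITMIME",
    "250 DSN",
]

code, texts = parse_smtp_reply(ehlo_reply)
# The first line is the server greeting; the rest are extension keywords.
extensions = [text.split()[0].upper() for text in texts[1:]]
print(code, extensions)
```

A client that never receives the final "250 " line keeps waiting, so a reply that is cut off in transit would look like a hang after EHLO from the client side.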