* SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: David C Niemi @ 2003-01-24 18:43 UTC
To: linux-kernel
I have been experiencing some baffling SSH client hangs under 2.5.59 (and
55) in which the session totally hangs up after I have typed (typically)
10-100 characters. Right before it hangs permanently, a character is
echo'd back to the screen several seconds late. Interestingly, data due
back for my client which is initiated by the server side does make it, I
just can't type anything further.
To reproduce this: ssh in to a somewhat distant host. At a command
prompt, hold down a letter key for a couple of minutes, or just type text
in. If you cut'n'paste text, it rarely hangs (my guess is that this
requires a lot fewer round trips than interactive typing). It should hang
before you get a screenful (sometimes the sessions hang even before they
are set up).
The system involved is a new Dell desktop with a P4/2.6 CPU and an
integrated Intel E1000 NIC, being used at 100Mb full duplex
(autonegotiated). Sessions go through a Cisco PIX on their way to
anywhere useful. The problem doesn't seem to occur if the SSH client and
server are on the same subnet; I'm not sure whether the PIX is an
essential cause of this or if any old router would do the same thing.
I've also reproduced it while being attached to different 100TX switches,
so I think the problem is higher-level.
As for networking options, I see the problem both using the (rather
extensive) default options, and a stripped down set of options with no QOS
or netfilter or anything else fancy.
Neither "ifconfig" nor dmesg show *any* errors whatsoever.
Anyone else seeing SSH client hangs to nonlocal hosts under 2.5.59?
David C Niemi
* Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: David S. Miller @ 2003-01-24 20:29 UTC
To: David C Niemi; +Cc: linux-kernel
On Fri, 2003-01-24 at 10:43, David C Niemi wrote:
> The system involved is a new Dell desktop with a P4/2.6 CPU and an
> integrated Intel E1000 NIC, being used at 100Mb full duplex
> (autonegotiated).
What happens if you comment out the enabling of
NETIF_F_TSO in drivers/net/e1000/e1000_main.c around
line 428? Does the problem persist?
* Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: lost @ 2003-01-24 20:46 UTC
To: linux-kernel
On Fri, 24 Jan 2003, David C Niemi wrote:
> I have been experiencing some baffling SSH client hangs under 2.5.59 (and
> 55) in which the session totally hangs up after I have typed (typically)
> 10-100 characters. Right before it hangs permanently, a character is
> echo'd back to the screen several seconds late. Interestingly, data due
> back for my client which is initiated by the server side does make it, I
> just can't type anything further.
<snip>
> Neither "ifconfig" nor dmesg show *any* errors whatsoever.
>
> Anyone else seeing SSH client hangs to nonlocal hosts under 2.5.59?
I'm seeing the same problem with a D-Link NIC (8139too driver). Exact same
symptoms - a delayed echo followed by no further echos. Checking netstat
shows an output queue for the socket but it never transmits anything.
Messages echoed by the remote server also make it through the connection.
The same problem does not occur using "telnet" to connect to the remote
host.
William Astle
finger lost@l-w.net for further information
Geek Code V3.12: GCS/M/S d- s+:+ !a C++ UL++++$ P++ L+++ !E W++ !N w--- !O !M PS PE V-- Y+ PGP t+@ 5++ X !R tv+@ b+++@ !DI D? G e++ h+ y?
* Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: Christopher Faylor @ 2003-01-24 21:15 UTC
To: lost; +Cc: linux-kernel
On Fri, Jan 24, 2003 at 01:46:10PM -0700, lost@l-w.net wrote:
>On Fri, 24 Jan 2003, David C Niemi wrote:
><snip>
>I'm seeing the same problem with a D-Link NIC (8139too driver). Exact same
>symptoms - a delayed echo followed by no further echos. Checking netstat
>shows an output queue for the socket but it never transmits anything.
>Messages echoed by the remote server also make it through the connection.
I hate "me toos" but maybe this will provide some useful data.
I'm seeing the same thing with a 3c59x driver. I couldn't reproduce the
problem with a tulip driver when I connect my laptop directly to my
cable modem. The problem only occurs when going through the laptop
(which acts as a firewall, running netfilter) to a remote site, in my
case the site is sources.redhat.com.
cgf
* Re: SSH Hangs in 2.5.59 and 2.5.55 (TCP_NODELAY?)
From: Christopher Faylor @ 2003-01-25 2:00 UTC
To: linux-kernel
On Fri, Jan 24, 2003 at 04:15:23PM -0500, Christopher Faylor wrote:
>On Fri, Jan 24, 2003 at 01:46:10PM -0700, lost@l-w.net wrote:
><snip>
>I hate "me toos" but maybe this will provide some useful data.
>
>I'm seeing the same thing with a 3c59x driver. I couldn't reproduce the
>problem with a tulip driver when I connect my laptop directly to my
>cable modem. The problem only occurs when going through the laptop
>(which acts as a firewall, running netfilter) to a remote site, in my
>case the site is sources.redhat.com.
Checking the strace log between telnet and ssh, I noticed that ssh does
this:
  setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
while telnet doesn't. If I introduce that call into telnet, it seems to
hang eventually too in the same way as ssh.
cgf
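The strace line above is ssh disabling Nagle's algorithm on its socket. A minimal sketch of making (and verifying) the same call from C, assuming a Linux/glibc system: on Linux, `SOL_TCP` as printed by strace and the portable `IPPROTO_TCP` are the same level value (6). The helper names `set_nodelay` and `get_nodelay` are illustrative, not from the thread.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>   /* socket, setsockopt, getsockopt */
#include <unistd.h>       /* close */

/* Disable Nagle's algorithm on a TCP socket, the equivalent of the
 * setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) call ssh makes.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int set_nodelay(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Read the option back: 1 if set, 0 if clear, -1 on error. */
static int get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

Adding `set_nodelay(fd)` right after the connect in a telnet-like client is the kind of change cgf describes making to telnet.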
* Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: David C Niemi @ 2003-01-27 14:27 UTC
To: linux-kernel
On Fri, 24 Jan 2003, David S. Miller wrote:
> What happens if you comment out the enabling of
> NETIF_F_TSO in drivers/net/e1000/e1000_main.c around
> line 428? Does the problem persist?
Yes, the problem persists. Interesting that it seems to happen on a
variety of Ethernet cards, I wonder if the problem's in the TCP area.
Interestingly it seems like on the *unafflicted* systems I can still see
the "delayed character" symptom, but eventually the outstanding
characters do get echoed back to the screen. Whereas on the afflicted
2.5.5x systems, as soon as there is a delay (perhaps due to a
retransmission) all outstanding characters (after the delayed one) are
lost or permanently hung up somewhere.
DCN
* Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: David S. Miller @ 2003-01-27 18:06 UTC
To: lkernel2003; +Cc: linux-kernel
David, this email address you use "lkernel2003@tuxers.net" always
bounces for me. Maybe you have it fixed now, but for the first two
replies I've sent you on this issue I've gotten a "user unknown" bounce.
This gets annoying after a while when you're trying to help someone
fix a problem. :(
* Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: David S. Miller @ 2003-01-27 18:11 UTC
To: lkernel2003; +Cc: linux-kernel, kuznet
   From: David C Niemi <lkernel2003@tuxers.net>
   Date: Mon, 27 Jan 2003 09:27:25 -0500 (EST)
   On Fri, 24 Jan 2003, David S. Miller wrote:
   > What happens if you comment out the enabling of
   > NETIF_F_TSO in drivers/net/e1000/e1000_main.c around
   > line 428? Does the problem persist?
   Yes, the problem persists. Interesting that it seems to happen on a
   variety of Ethernet cards, I wonder if the problem's in the TCP area.
I think the clue in this thread is the TCP_NODELAY socket option.
The one post claimed that by turning this on in telnet, it made
telnet exhibit the same problems SSH shows.
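Why TCP_NODELAY changes the traffic pattern: by default a TCP sender applies Nagle's algorithm (RFC 896), so while earlier data is still unacknowledged, small writes such as single keystrokes are held back and coalesced. With TCP_NODELAY set, as ssh does, each keystroke goes out immediately, so more small segments tend to be in flight at once. The helper below is a hypothetical, simplified version of that textbook rule only; it is not the kernel's send-path logic, and the thread does not establish that this is the mechanism behind the hang.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified Nagle decision (textbook rule, not Linux's actual code):
 * may a segment of seg_len bytes be transmitted right now? */
static bool may_send_now(size_t seg_len, size_t mss,
                         size_t unacked_bytes, bool nodelay)
{
    if (nodelay)
        return true;            /* TCP_NODELAY: never hold back small segments */
    if (seg_len >= mss)
        return true;            /* full-sized segments are always allowed */
    return unacked_bytes == 0;  /* small segment: wait until all data is ACKed */
}
```

For example, a 20-byte keystroke typed while 48 bytes are still unacknowledged waits when `nodelay` is false, but is sent at once when it is true.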
* Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: Anders Gustafsson @ 2003-01-27 18:28 UTC
To: David S. Miller; +Cc: lkernel2003, linux-kernel, kuznet, tobi
On Mon, Jan 27, 2003 at 10:11:28AM -0800, David S. Miller wrote:
>
> I think the clue in this thread is the TCP_NODELAY socket option.
> The one post claimed that by turning this on in telnet, it made
> telnet exhibit the same problems SSH shows.
This is a "me too", well actually not me, but some friends are seeing this.
If I remember correctly the data was actually sent to the server and only
the response was lost (seen by stracing the shell on the server). Someone
suggested that it might be the sequence number being screwed up.
-- 
Anders Gustafsson - andersg@0x63.nu - http://0x63.nu/
* [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: David S. Miller @ 2003-01-27 22:36 UTC
To: andersg; +Cc: lkernel2003, linux-kernel, kuznet, tobi
Hey guys, can you all see if this patch makes the problem go away in
2.5.x? It is merely a guess, but it is worth enough to experiment.
Alexey, this piece of code was buggy first time it was coded, and it
may still have some holes. :-)))
--- net/ipv4/tcp_output.c.~1~	Mon Jan 27 14:45:49 2003
+++ net/ipv4/tcp_output.c	Mon Jan 27 14:46:33 2003
@@ -889,7 +889,7 @@
 	if (atomic_read(&sk->wmem_alloc) > min(sk->wmem_queued+(sk->wmem_queued>>2),sk->sndbuf))
 		return -EAGAIN;
 
-	if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) {
+	if (0 && before(TCP_SKB_CB(skb)->seq, tp->snd_una)) {
 		if (before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))
 			BUG();
 
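The condition the test patch short-circuits with `0 &&` asks whether the segment about to be retransmitted starts before `snd_una`, the oldest unacknowledged sequence number, i.e. whether its front has already been acknowledged (the inner `BUG()` fires if even its end has been). TCP sequence numbers are 32-bit and wrap, so the comparison is done on the signed difference. Below is a userspace re-implementation of that comparison as the kernel's `before()` helper defines it; `seq_before` and `head_partly_acked` are illustrative names, not kernel functions.

```c
#include <stdint.h>

/* Wraparound-safe "seq1 comes before seq2" for 32-bit sequence numbers:
 * reinterpret the unsigned difference as signed, as before() does. */
static int seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* Hypothetical wrapper for the patched test: the queued segment starting
 * at seg_seq has already been partly acknowledged up to snd_una. */
static int head_partly_acked(uint32_t seg_seq, uint32_t snd_una)
{
    return seq_before(seg_seq, snd_una);
}
```

Because the comparison uses the signed difference, `seq_before(0xFFFFFFF0u, 0x10u)` is true even though the first value is numerically larger: it sits 32 sequence numbers earlier across the wrap.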
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: lost @ 2003-01-28 2:25 UTC
To: David S. Miller; +Cc: andersg, lkernel2003, linux-kernel, kuznet, tobi
On Mon, 27 Jan 2003, David S. Miller wrote:
> Hey guys, can you all see if this patch makes the problem go away in
> 2.5.x? It is merely a guess, but it is worth enough to experiment.
>
> Alexey, this piece of code was buggy first time it was coded, and it
> may still have some holes. :-)))
It seems to have cleared up the problem for me. I've been running an SSH
session for the past hour without any lock up problems with the patch
installed. Without it, the lock up happened quite reliably within a few
minutes.
William Astle
finger lost@l-w.net for further information
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
From: kuznet @ 2003-01-28 2:57 UTC
To: David S. Miller; +Cc: andersg, lkernel2003, linux-kernel, tobi
Hello!
> Alexey, this piece of code was buggy first time it was coded, and it
> may still have some holes. :-)))
To my shame, I cannot say "no". It was written sort of too fast. :-)
Did the reporters see packets with wrong checksum on wire or wrong tcp
headers or something like that?
Alexey
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
From: Christopher Faylor @ 2003-01-28 3:22 UTC
To: kuznet; +Cc: David S. Miller, linux-kernel
On Tue, Jan 28, 2003 at 05:57:55AM +0300, kuznet@ms2.inr.ac.ru wrote:
>>Alexey, this piece of code was buggy first time it was coded, and it
>>may still have some holes. :-)))
>
>To my shame, I cannot say "no". It was written sort of too fast. :-)
>
>Did the reporters see packets with wrong checksum on wire or wrong tcp
>headers or something like that?
My knowledge of TCP/IP is extremely minimal but the sequence number
looked weird when the stall occurred. It looked like the sequence
numbers you get with the -S option to tcpdump. All of the other packets
had small sequence numbers and what I assume was the bad packet had a
large one.
I'm sorry if this is gibberish and makes no sense. I don't know how to
tell if the checksum was wrong or not.
cgf
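Some background on the "-S" remark: by default tcpdump prints sequence numbers relative to the first sequence number it saw on the connection, which is why most packets show small values; `-S` prints the raw 32-bit values instead. The conversion is a modulo-2^32 subtraction, sketched below (`relative_seq` is an illustrative name, not a tcpdump function). The example values are taken from the server side of the absolute-sequence trace posted later in this thread, whose SYN carries 420199885.

```c
#include <stdint.h>

/* Relative sequence number as tcpdump shows it without -S: the raw value
 * minus the connection's first (initial) sequence number. Unsigned
 * arithmetic wraps modulo 2^32, so this stays correct across wraparound. */
static uint32_t relative_seq(uint32_t seq, uint32_t isn)
{
    return seq - isn;
}
```

With an ISN of 420199885, the first data byte 420199886 is relative sequence 1, and 420200501 is relative 616.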
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: Christopher Faylor @ 2003-01-28 3:39 UTC
To: David S. Miller; +Cc: andersg, lkernel2003, linux-kernel, kuznet, tobi
On Mon, Jan 27, 2003 at 02:36:25PM -0800, David S. Miller wrote:
>Hey guys, can you all see if this patch makes the problem go away in
>2.5.x? It is merely a guess, but it is worth enough to experiment.
><snip>
Sorry, but this doesn't do it for me. I still get a hang.
cgf
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: kuznet @ 2003-01-28 3:55 UTC
To: Christopher Faylor; +Cc: davem, andersg, lkernel2003, linux-kernel, tobi
Hello!
> Sorry, but this doesn't do it for me. I still get a hang.
Can you make tcpdump of this session which looks like tcpdump with -S? :-)
Alexey
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: dada1 @ 2003-01-28 7:08 UTC
To: kuznet, Christopher Faylor
Cc: davem, andersg, lkernel2003, linux-kernel, tobi
From: <kuznet@ms2.inr.ac.ru>
> Can you make tcpdump of this session which looks like tcpdump with -S? :-)
>
>
> Alexey
Hello Alexey
I do have a lot of such hangs too.
The client is a Linux 2.4.20 machine, but hangs were seen with other OSes as well.
The server is a Linux 2.5.59 SMP machine, with traffic control in operation.
07:28:28.176006 client.1022 > server.22: S 3603063450:3603063450(0) win 5840 <mss 1460,sackOK,timestamp 359716783 0,nop,wscale 0> (DF)
07:28:28.176151 server.22 > client.1022: S 420199885:420199885(0) ack 3603063451 win 5840 <mss 1460> (DF)
07:28:28.580529 client.1022 > server.22: . ack 420199886 win 5840 (DF)
07:28:28.583078 server.22 > client.1022: P 420199886:420199909(23) ack 3603063451 win 5840 (DF)
07:28:29.006494 client.1022 > server.22: . ack 420199909 win 5840 (DF) [tos 0x10]
07:28:29.007644 client.1022 > server.22: P 3603063451:3603063472(21) ack 420199909 win 5840 (DF) [tos 0x10]
07:28:29.007743 server.22 > client.1022: . ack 3603063472 win 5840 (DF)
07:28:29.008246 server.22 > client.1022: P 420199909:420200185(276) ack 3603063472 win 5840 (DF)
07:28:29.575223 client.1022 > server.22: P 3603063472:3603063628(156) ack 420200185 win 6432 (DF) [tos 0x10]
07:28:29.576148 client.1022 > server.22: P 3603063628:3603063680(52) ack 420200185 win 6432 (DF) [tos 0x10]
07:28:29.596173 server.22 > client.1022: P 420200185:420200197(12) ack 3603063680 win 5840 (DF)
07:28:30.020642 client.1022 > server.22: P 3603063680:3603063700(20) ack 420200197 win 6432 (DF) [tos 0x10]
07:28:30.059904 server.22 > client.1022: . ack 3603063700 win 5840 (DF)
07:28:31.407346 client.1022 > server.22: P 3603063680:3603063700(20) ack 420200197 win 6432 (DF) [tos 0x10]
07:28:31.407464 server.22 > client.1022: . ack 3603063700 win 5840 (DF)
07:28:34.369344 server.22 > client.1022: P 420200197:420200209(12) ack 3603063700 win 5840 (DF)
07:28:34.784326 client.1022 > server.22: P 3603063700:3603063840(140) ack 420200209 win 6432 (DF) [tos 0x10]
07:28:34.784398 server.22 > client.1022: . ack 3603063840 win 6432 (DF)
07:28:34.786551 server.22 > client.1022: P 420200209:420200221(12) ack 3603063840 win 6432 (DF)
07:28:35.268516 client.1022 > server.22: . ack 420200221 win 6432 (DF) [tos 0x10]
07:28:38.597375 client.1022 > server.22: P 3603063840:3603063868(28) ack 420200221 win 6432 (DF) [tos 0x10]
07:28:38.605463 server.22 > client.1022: P 420200221:420200233(12) ack 3603063868 win 6432 (DF)
07:28:39.096028 client.1022 > server.22: . ack 420200233 win 6432 (DF) [tos 0x10]
07:28:39.099880 client.1022 > server.22: P 3603063868:3603064016(148) ack 420200233 win 6432 (DF) [tos 0x10]
07:28:39.101205 server.22 > client.1022: P 420200233:420200245(12) ack 3603064016 win 6432 (DF)
07:28:39.665364 client.1022 > server.22: P 3603064016:3603064028(12) ack 420200245 win 6432 (DF) [tos 0x10]
07:28:39.681917 server.22 > client.1022: P 420200245:420200377(132) ack 3603064028 win 6432 (DF) [tos 0x10]
07:28:39.957853 server.22 > client.1022: P 420200377:420200413(36) ack 3603064028 win 6432 (DF) [tos 0x10]
07:28:40.191540 client.1022 > server.22: . ack 420200377 win 7504 (DF) [tos 0x10]
07:28:40.432214 client.1022 > server.22: . ack 420200413 win 7504 (DF) [tos 0x10]
07:28:41.928298 client.1022 > server.22: P 3603064028:3603064048(20) ack 420200413 win 7504 (DF) [tos 0x10]
07:28:41.938316 server.22 > client.1022: P 420200413:420200457(44) ack 3603064048 win 6432 (DF) [tos 0x10]
07:28:42.123677 client.1022 > server.22: P 3603064048:3603064068(20) ack 420200413 win 7504 (DF) [tos 0x10]
07:28:42.134272 server.22 > client.1022: P 420200457:420200501(44) ack 3603064068 win 6432 (DF) [tos 0x10]
07:28:42.483256 client.1022 > server.22: P 3603064068:3603064108(40) ack 420200457 win 7504 (DF) [tos 0x10]
07:28:42.494187 server.22 > client.1022: P 420200501:420200569(68) ack 3603064108 win 6432 (DF) [tos 0x10]
07:28:42.679902 client.1022 > server.22: . ack 420200501 win 7504 (DF) [tos 0x10]
07:28:42.792933 client.1022 > server.22: P 3603064108:3603064128(20) ack 420200501 win 7504 (DF) [tos 0x10]
07:28:42.803135 server.22 > client.1022: P 420200569:420200613(44) ack 3603064128 win 6432 (DF) [tos 0x10]
07:28:42.825978 client.1022 > server.22: P 3603064128:3603064148(20) ack 420200501 win 7504 (DF) [tos 0x10]
07:28:42.836109 server.22 > client.1022: P 420200613:420200657(44) ack 3603064148 win 6432 (DF) [tos 0x10]
07:28:43.408817 client.1022 > server.22: P 3603064148:3603064188(40) ack 420200501 win 7504 (DF) [tos 0x10]
07:28:43.461886 server.22 > client.1022: . ack 3603064188 win 6432 (DF) [tos 0x10]
07:28:43.589866 server.22 > client.1022: P 420200501:420200569(68) ack 3603064188 win 6432 (DF) [tos 0x10]
07:28:44.087198 client.1022 > server.22: . ack 420200569 win 7504 (DF) [tos 0x10]
07:28:45.410465 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:28:49.050628 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:28:56.328994 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:29:10.886716 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:29:40.003185 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:30:38.235034 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:32:34.696797 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:34:34.678762 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:36:34.660724 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:38:34.642703 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:40:34.623666 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:42:34.605632 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:44:34.587601 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
07:46:34.569566 server.22 > client.1022: P ack 3603064188 win 6432 (DF) [tos 0x10]
netstat -an on the server tells us that 156 bytes are waiting in the send queue.
Thanks for your help
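The timing of the server's repeated zero-length `P ack 3603064188` packets in the trace above follows a recognizable pattern: the gaps are about 3.64 s, 7.28 s, 14.56 s, 29.12 s, 58.23 s and 116.46 s, and after that they stay near 120 s. That is exponential backoff of the retransmission timer, doubling on each retry and clamped at Linux's 120-second ceiling (`TCP_RTO_MAX`). The sketch below reproduces that schedule; the 3640 ms base is simply the first gap read off this trace, not a kernel constant, and `backed_off_rto_ms` is an illustrative name, not the kernel's timer code.

```c
#include <stdint.h>

/* Retransmission timeout after n consecutive backoffs: double the base
 * RTO once per retry and clamp the result to max_ms (Linux: 120 s). */
static uint32_t backed_off_rto_ms(uint32_t base_ms, unsigned n, uint32_t max_ms)
{
    uint64_t rto = base_ms;   /* 64-bit so doubling cannot overflow */

    while (n-- > 0 && rto < max_ms)
        rto *= 2;
    return rto > max_ms ? max_ms : (uint32_t)rto;
}
```

With a 3640 ms base and a 120000 ms ceiling, retries 0 through 5 give 3640, 7280, 14560, 29120, 58240 and 116480 ms, and every later retry is capped at 120000 ms, which lines up with the gaps in the trace.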
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: Sebastian Benoit @ 2003-01-28 12:36 UTC
To: dada1
Cc: kuznet, Christopher Faylor, davem, andersg, lkernel2003, linux-kernel, tobi
dada1(dada1@cosmosbay.com)@2003.01.28 08:08:10 +0000:
> From: <kuznet@ms2.inr.ac.ru>
> > Can you make tcpdump of this session which looks like tcpdump with -S? :-)
This might help you:
i still have a similar problem (ssh hang with other traffic) that i
reported in November on netdev:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103641051419994&w=2
(this post includes a tcpdump and a description of how to reproduce it)
i did not follow up on that because i did not have the time and i ran
into hardware problems then...
(You can find the 'socket' program i used here:
http://www.jnickelsen.de/socket/socket-1.2.html)
/Sebastian
-- 
Sebastian Benoit <benoit-lists@fb12.de>
My mail is GnuPG signed -- Unsigned ones are bogus -- http://www.gnupg.org/
GnuPG 0x5BA22F00 2001-07-31 2999 9839 6C9E E4BF B540 C44B 4EC4 E1BE 5BA2 2F00
"After writing for fifteen years it struck me I had no talent for writing. But I couldn't give it up: by that time I was already famous." -- Mark Twain
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: kuznet @ 2003-01-28 14:09 UTC
To: Sebastian Benoit
Cc: dada1, cgf, davem, andersg, lkernel2003, linux-kernel, tobi
Hello!
> http://marc.theaimsgroup.com/?l=linux-kernel&m=103641051419994&w=2
Thank you. Christopher also gave something similar.
Dave, look:
23:24:05.617819 trixie.bosbc.com.32793 > sources.redhat.com.22: P [bad tcp cksum 3770!] 5136:5136(0) ack 32369 win 45144 <nop,nop,timestamp 122553 80640703> (DF) [tos 0x10] (ttl 64, id 9958, len 52)
23:24:06.093754 trixie.bosbc.com.32793 > sources.redhat.com.22: P [bad tcp cksum 5b6e!] 5136:5136(0) ack 32369 win 45144 <nop,nop,timestamp 123029 80640703> (DF) [tos 0x10] (ttl 64, id 9959, len 52)
23:24:07.045603 trixie.bosbc.com.32793 > sources.redhat.com.22: P [bad tcp cksum a36a!] 5136:5136(0) ack 32369 win 45144 <nop,nop,timestamp 123981 80640703> (DF) [tos 0x10] (ttl 64, id 9960, len 52)
We apparently have segment of zero length in queue. :-)
Well, that chunk cannot be responsible for this directly, I am afraid
we somewhat arrived to attempt to retransmit already acked segment.
It is weird enough to be good hint. :-)
Alexey
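Alexey's reading of these packets can be checked by hand. `len 52` is the IPv4 total length. Assuming a 20-byte IP header with no options, and a TCP header of 20 bytes plus 12 bytes of options (nop, nop and the 10-byte timestamp shown in the trace), nothing is left for payload, which agrees with the `5136:5136(0)` sequence range. A small helper for that arithmetic, with an illustrative name:

```c
#include <stdint.h>

/* TCP payload bytes in an IPv4 packet: IP total length minus the IP
 * header and the TCP header (including TCP options). */
static int tcp_payload_len(uint16_t ip_total_len, unsigned ip_hdr_len,
                           unsigned tcp_hdr_len)
{
    return (int)ip_total_len - (int)ip_hdr_len - (int)tcp_hdr_len;
}
```

`tcp_payload_len(52, 20, 20 + 12)` is 0, i.e. a PSH segment that carries no data at all.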
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: David S. Miller @ 2003-01-28 18:35 UTC
To: kuznet; +Cc: benoit-lists, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
   From: kuznet@ms2.inr.ac.ru
   Date: Tue, 28 Jan 2003 17:09:09 +0300 (MSK)
   We apparently have segment of zero length in queue. :-)
   Well, that chunk cannot be responsible for this directly, I am afraid
   we somewhat arrived to attempt to retransmit already acked segment.
Hmmm, it is one of few places where sequence numbers of already
sent packet are mangled. :-)
Good set of debug checks would be the following:
--- net/ipv4/tcp_output.c.~1~	Mon Jan 27 14:46:33 2003
+++ net/ipv4/tcp_output.c	Tue Jan 28 10:47:08 2003
@@ -441,6 +441,9 @@
 	TCP_SKB_CB(buff)->end_seq = TCP_SKB_CB(skb)->end_seq;
 	TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(buff)->seq;
 
+	BUG_TRAP(TCP_SKB_CB(buff)->seq != TCP_SKB_CB(buff)->end_seq);
+	BUG_TRAP(TCP_SKB_CB(skb)->seq != TCP_SKB_CB(skb)->end_seq);
+
 	/* PSH and FIN should only be set in the second packet. */
 	flags = TCP_SKB_CB(skb)->flags;
 	TCP_SKB_CB(skb)->flags = flags & ~(TCPCB_FLAG_FIN|TCPCB_FLAG_PSH);
@@ -524,6 +527,7 @@
 	}
 
 	TCP_SKB_CB(skb)->seq += len;
+	BUG_TRAP(TCP_SKB_CB(skb)->seq != TCP_SKB_CB(skb)->end_seq);
 	skb->ip_summed = CHECKSUM_HW;
 	return 0;
 }
@@ -796,6 +800,7 @@
 
 		/* Update sequence range on original skb. */
 		TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(next_skb)->end_seq;
+		BUG_TRAP(TCP_SKB_CB(skb)->seq != TCP_SKB_CB(skb)->end_seq);
 
 		/* Merge over control information. */
 		flags |= TCP_SKB_CB(next_skb)->flags; /* This moves PSH/FIN etc. over */
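The debug checks above assert that, after a queued segment is split, trimmed or merged, neither piece ends up with `seq == end_seq`, i.e. zero length. `BUG_TRAP` reports a failed condition and lets execution continue rather than halting. The userspace sketch below imitates that for the split case only. Everything in it is hypothetical: `struct seg` stands in for the two fields of `TCP_SKB_CB(skb)`, `SEG_TRAP` is a stand-in for `BUG_TRAP`, and the line `buff->seq = skb->seq + len` is an assumption about how the tail's start is set, since that line is not shown in the hunk.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the sequence fields of TCP_SKB_CB(skb). */
struct seg {
    uint32_t seq;     /* first sequence number covered */
    uint32_t end_seq; /* one past the last sequence number covered */
};

static int trap_count = 0;

/* Imitates BUG_TRAP(): report a failed assertion, count it, keep going. */
#define SEG_TRAP(x) do {                                        \
        if (!(x)) {                                             \
            trap_count++;                                       \
            fprintf(stderr, "assertion (%s) failed at %s (%d)\n", \
                    #x, __FILE__, __LINE__);                    \
        }                                                       \
    } while (0)

/* Split *skb after len bytes: *skb keeps the head, *buff gets the tail.
 * Mirrors the end_seq bookkeeping in the first hunk, then applies the
 * same two zero-length checks. */
static void split_seg(struct seg *skb, struct seg *buff, uint32_t len)
{
    buff->seq = skb->seq + len;          /* assumed, not shown in the hunk */
    buff->end_seq = skb->end_seq;
    skb->end_seq = buff->seq;

    SEG_TRAP(buff->seq != buff->end_seq);
    SEG_TRAP(skb->seq != skb->end_seq);
}
```

Splitting a segment covering 100..200 at 40 bytes yields 100..140 and 140..200 with no report; splitting at 0 bytes leaves a zero-length head and triggers exactly one report, which is the kind of state the checks are meant to catch.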
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
From: Sebastian Benoit @ 2003-01-28 19:16 UTC
To: David S. Miller
Cc: kuznet, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
David S. Miller(davem@redhat.com)@2003.01.28 10:35:34 +0000:
>    From: kuznet@ms2.inr.ac.ru
>    Date: Tue, 28 Jan 2003 17:09:09 +0300 (MSK)
>
>    We apparently have segment of zero length in queue. :-)
>
>    Well, that chunk cannot be responsible for this directly, I am afraid
>    we somewhat arrived to attempt to retransmit already acked segment.
>
> Hmmm, it is one of few places where sequence numbers of already
> sent packet are mangled. :-)
>
> Good set of debug checks would be the following:
no output, i did 4 tests, everytime i was able to lock the
ssh-connection within a few seconds.
kernel 2.5.59 + your debug-patch.
tcpdump of one:
20:07:30.788431 ronja.fluchtwagenfahrer.de.32774 > turing.fb12.de.ssh: . ack 4359 win 13888 <nop,nop,timestamp 591456 50952833> (DF) [tos 0x10]
20:07:31.054101 ronja.fluchtwagenfahrer.de.32774 > turing.fb12.de.ssh: P 2927:2975(48) ack 4359 win 13888 <nop,nop,timestamp 591722 50952833> (DF) [tos 0x10]
20:07:31.119062 turing.fb12.de.ssh > ronja.fluchtwagenfahrer.de.32774: P 4359:4407(48) ack 2975 win 10944 <nop,nop,timestamp 50952865 591722> (DF) [tos 0x10]
20:07:31.119102 ronja.fluchtwagenfahrer.de.32774 > turing.fb12.de.ssh: . ack 4407 win 13888 <nop,nop,timestamp 591787 50952865> (DF) [tos 0x10]
20:07:31.132819 turing.fb12.de.ssh > ronja.fluchtwagenfahrer.de.32774: P 4407:4487(80) ack 2975 win 10944 <nop,nop,timestamp 50952865 591722> (DF) [tos 0x10]
20:07:31.132842 ronja.fluchtwagenfahrer.de.32774 > turing.fb12.de.ssh: . ack 4487 win 13888 <nop,nop,timestamp 591801 50952865> (DF) [tos 0x10]
20:07:31.132930 turing.fb12.de.ssh > ronja.fluchtwagenfahrer.de.32774: P 4487:4551(64) ack 2975 win 10944 <nop,nop,timestamp 50952866 591722> (DF) [tos 0x10]
20:07:31.132951 ronja.fluchtwagenfahrer.de.32774 > turing.fb12.de.ssh: . ack 4551 win 13888 <nop,nop,timestamp 591801 50952866> (DF) [tos 0x10]
20:07:31.602060 ronja.fluchtwagenfahrer.de.32774 > turing.fb12.de.ssh: P 2975:3023(48) ack 4551 win 13888 <nop,nop,timestamp 592270 50952866> (DF) [tos 0x10]
20:07:31.687764 ronja.fluchtwagenfahrer.de.32774 > turing.fb12.de.ssh: P 3023:3071(48) ack 4551 win 13888 <nop,nop,timestamp 592356 50952866> (DF) [tos 0x10]
20:07:31.834730 ronja.fluchtwagenfahrer.de.32774 > turing.fb12.de.ssh: P 2975:3023(48) ack 4551 win 13888 <nop,nop,timestamp 592503 50952866> (DF) [tos 0x10]
20:07:31.888875 turing.fb12.de.ssh > ronja.fluchtwagenfahrer.de.32774: P 4551:4599(48) ack 3023 win 10944 <nop,nop,timestamp 50952942 592503> (DF) [tos 0x10]
---- here it hangs ---
20:07:31.888910 ronja.fluchtwagenfahrer.de.32774 > turing.fb12.de.ssh: . ack 4599 win 13888 <nop,nop,timestamp 592557 50952942> (DF) [tos 0x10]
20:07:32.300653 ronja.fluchtwagenfahrer.de.32774 > turing.fb12.de.ssh: P ack 4599 win 13888 <nop,nop,timestamp 592969 50952942> (DF) [tos 0x10]
20:07:33.232614 ronja.fluchtwagenfahrer.de.32774 > turing.fb12.de.ssh: P ack 4599 win 13888 <nop,nop,timestamp 593901 50952942> (DF) [tos 0x10]
20:07:35.096334 ronja.fluchtwagenfahrer.de.32774 > turing.fb12.de.ssh: P ack 4599 win 13888 <nop,nop,timestamp 595765 50952942> (DF) [tos 0x10]
20:07:37.269116 ronja.fluchtwagenfahrer.de.32773 > turing.fb12.de.ssh: P ack 1 win 34800 <nop,nop,timestamp 597938 50948566> (DF) [tos 0x10]
/B.
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
2003-01-28 19:16 ` Sebastian Benoit
@ 2003-01-28 20:34 ` David S. Miller
2003-01-28 21:59 ` Christopher Faylor
` (2 more replies)
0 siblings, 3 replies; 38+ messages in thread
From: David S. Miller @ 2003-01-28 20:34 UTC (permalink / raw)
To: benoit-lists; +Cc: kuznet, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
   From: Sebastian Benoit <benoit-lists@fb12.de>
   Date: Tue, 28 Jan 2003 20:16:45 +0100
   David S. Miller(davem@redhat.com)@2003.01.28 10:35:34 +0000:
   > Good set of debug checks would be the following:
   no output, i did 4 tests, every time i was able to lock the ssh-connection
   within a few seconds. kernel 2.5.59 + your debug-patch.
Thanks for testing, how about this new patch at the end of this email?
Does it make the problem go away?
Alexey, most solid report is that 2.5.43-bk1 makes bug appear.
This is good because it sort of narrows things down. What is contained
there in networking is:
1) initial stackable dst logic, should not cause problems
2) addition of UDP sendfile and ip_append_*() logic
3) fix to tcp_check_req() "fix" :-) it only changes behavior on
   connect so should not be a problem
I heavily, therefore, suspect #2 which is why I am poking around
in the tcp.c changes to change checksumming and copying semantics.
--- net/ipv4/tcp.c.~1~ Tue Jan 28 12:40:09 2003
+++ net/ipv4/tcp.c Tue Jan 28 12:41:48 2003
@@ -1089,11 +1089,13 @@
 			if (!skb)
 				goto wait_for_memory;
 
+#if 0
 			/*
 			 * Check whether we can use HW checksum.
 			 */
 			if (sk->route_caps & (NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM))
 				skb->ip_summed = CHECKSUM_HW;
+#endif
 
 			skb_entail(sk, tp, skb);
 			copy = mss_now;
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
2003-01-28 20:34 ` David S. Miller
@ 2003-01-28 21:59 ` Christopher Faylor
2003-01-28 22:12 ` Sebastian Benoit
2003-01-28 23:56 ` kuznet
2 siblings, 0 replies; 38+ messages in thread
From: Christopher Faylor @ 2003-01-28 21:59 UTC (permalink / raw)
To: David S. Miller
Cc: benoit-lists, kuznet, dada1, andersg, lkernel2003, linux-kernel, tobi
On Tue, Jan 28, 2003 at 12:34:13PM -0800, David S. Miller wrote:
> From: Sebastian Benoit <benoit-lists@fb12.de>
> Date: Tue, 28 Jan 2003 20:16:45 +0100
>
> David S. Miller(davem@redhat.com)@2003.01.28 10:35:34 +0000:
> > Good set of debug checks would be the following:
>
> no output, i did 4 tests, every time i was able to lock the ssh-connection
> within a few seconds. kernel 2.5.59 + your debug-patch.
>
>Thanks for testing, how about this new patch at the end of this email?
>Does it make the problem go away?
It does for me, yes. I tried very hard to make ssh hang but I couldn't
do so.
cgf
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
2003-01-28 20:34 ` David S. Miller
2003-01-28 21:59 ` Christopher Faylor
@ 2003-01-28 22:12 ` Sebastian Benoit
2003-01-28 23:21 ` David S. Miller
2003-01-28 23:56 ` kuznet
2 siblings, 1 reply; 38+ messages in thread
From: Sebastian Benoit @ 2003-01-28 22:12 UTC (permalink / raw)
To: David S. Miller
Cc: kuznet, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
[-- Attachment #1: Type: text/plain, Size: 962 bytes --]
David S. Miller(davem@redhat.com)@2003.01.28 12:34:13 +0000:
> Thanks for testing, how about this new patch at the end of this email?
> Does it make the problem go away?
this does it!
/B.
> --- net/ipv4/tcp.c.~1~ Tue Jan 28 12:40:09 2003
> +++ net/ipv4/tcp.c Tue Jan 28 12:41:48 2003
> @@ -1089,11 +1089,13 @@
>  			if (!skb)
>  				goto wait_for_memory;
>  
> +#if 0
>  			/*
>  			 * Check whether we can use HW checksum.
>  			 */
>  			if (sk->route_caps & (NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM))
>  				skb->ip_summed = CHECKSUM_HW;
> +#endif
>  
>  			skb_entail(sk, tp, skb);
>  			copy = mss_now;
>
--
Sebastian Benoit <benoit-lists@fb12.de>
My mail is GnuPG signed -- Unsigned ones are bogus -- http://www.gnupg.org/
GnuPG 0x5BA22F00 2001-07-31 2999 9839 6C9E E4BF B540 C44B 4EC4 E1BE 5BA2 2F00
The dyslexic agnostic with insomnia laid awake all night wondering if there
really was a dog.
[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
2003-01-28 22:12 ` Sebastian Benoit
@ 2003-01-28 23:21 ` David S. Miller
2003-01-29 0:02 ` [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, kuznet
2003-01-29 0:09 ` kuznet
0 siblings, 2 replies; 38+ messages in thread
From: David S. Miller @ 2003-01-28 23:21 UTC (permalink / raw)
To: benoit-lists; +Cc: kuznet, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
   From: Sebastian Benoit <benoit-lists@fb12.de>
   Date: Tue, 28 Jan 2003 23:12:01 +0100
   David S. Miller(davem@redhat.com)@2003.01.28 12:34:13 +0000:
   > Thanks for testing, how about this new patch at the end of this email?
   > Does it make the problem go away?
   this does it!
Alexey, my current suspect is skb->csum state on retransmit.
BTW, how come tcp_trim_head() can just set skb->ip_summed
blindly to CHECKSUM_HW and not setup skb->csum? Even if you can
depend upon net/core/dev.c to do the checksum for you, you still
would need to setup skb->csum properly.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-28 23:21 ` David S. Miller
@ 2003-01-29 0:02 ` kuznet
2003-01-29 0:09 ` kuznet
1 sibling, 0 replies; 38+ messages in thread
From: kuznet @ 2003-01-29 0:02 UTC (permalink / raw)
To: David S. Miller
Cc: benoit-lists, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
Hello!
> BTW, how come tcp_trim_head() can just set skb->ip_summed
> blindly to CHECKSUM_HW and not setup skb->csum?
When skb->ip_summed is CHECKSUM_HW skb->csum is ignored and initialized
at the moment when segment is transmitted in tcp_v*_send_check().
Alexey
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-28 23:21 ` David S. Miller
2003-01-29 0:02 ` [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, kuznet
@ 2003-01-29 0:09 ` kuznet
2003-01-29 0:46 ` Sebastian Benoit
2003-01-29 6:52 ` David S. Miller
1 sibling, 2 replies; 38+ messages in thread
From: kuznet @ 2003-01-29 0:09 UTC (permalink / raw)
To: David S. Miller
Cc: benoit-lists, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
Hello!
The proposed fix is enclosed. Please, check.
Alexey
===== net/ipv4/tcp_output.c 1.19 vs edited =====
--- 1.19/net/ipv4/tcp_output.c Fri Oct 25 15:46:21 2002
+++ edited/net/ipv4/tcp_output.c Wed Jan 29 03:07:26 2003
@@ -786,13 +786,13 @@
 		/* Ok. We will be able to collapse the packet. */
 		__skb_unlink(next_skb, next_skb->list);
 
+		memcpy(skb_put(skb, next_skb_size), next_skb->data, next_skb_size);
+
 		if (next_skb->ip_summed == CHECKSUM_HW)
 			skb->ip_summed = CHECKSUM_HW;
 
-		if (skb->ip_summed != CHECKSUM_HW) {
-			memcpy(skb_put(skb, next_skb_size), next_skb->data, next_skb_size);
+		if (skb->ip_summed != CHECKSUM_HW)
 			skb->csum = csum_block_add(skb->csum, next_skb->csum, skb_size);
-		}
 
 		/* Update sequence range on original skb. */
 		TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(next_skb)->end_seq;
^ permalink raw reply [flat|nested] 38+ messages in thread
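The software-checksum branch that the patch keeps merges the second segment's checksum into the first with csum_block_add(skb->csum, next_skb->csum, skb_size). The offset argument matters because a one's-complement sum of a block that starts at an odd byte offset has its bytes in the opposite halves of each 16-bit word, so it must be byte-swapped before being added. The following is a minimal userspace sketch of that arithmetic, not kernel code: the model_* names are hypothetical, sums are modelled as big-endian 16-bit words, and the byte-swap formula is assumed to match the 2.4/2.5-era csum_block_add() rather than taken from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the checksum helpers (model_* names are
 * illustrative, not kernel APIs). Values are unfolded 32-bit one's-complement
 * accumulators, like skb->csum. */

/* One's-complement add with end-around carry (cf. csum_add()). */
static uint32_t model_csum_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    return r + (r < b);           /* fold the carry back in */
}

/* Sum a block as big-endian 16-bit words; an odd trailing byte is padded
 * with a zero low byte. */
static uint32_t model_csum_partial(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum = model_csum_add(sum, (uint32_t)p[i] << 8 | p[i + 1]);
    if (len & 1)
        sum = model_csum_add(sum, (uint32_t)p[len - 1] << 8);
    return sum;
}

/* Merge sum2, the sum of a block placed at byte `offset`, into sum.
 * At an odd offset the block's bytes swap halves within each 16-bit word,
 * so sum2 is byte-swapped lane by lane before the add (assumed to mirror
 * the era's csum_block_add()). */
static uint32_t model_csum_block_add(uint32_t sum, uint32_t sum2, size_t offset)
{
    if (offset & 1)
        sum2 = ((sum2 & 0x00FF00FFu) << 8) + ((sum2 >> 8) & 0x00FF00FFu);
    return model_csum_add(sum, sum2);
}

/* Fold a 32-bit accumulator to 16 bits (cf. csum_fold() without the
 * final inversion). */
static uint16_t model_fold(uint32_t sum)
{
    sum = (sum & 0xFFFFu) + (sum >> 16);
    sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}
```

For example, merging the sum of a 5-byte block with the sum of a following 4-byte block through model_csum_block_add(..., 5) folds to the same value as summing all 9 bytes at once, while a plain model_csum_add() of the two sums does not.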
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-29 0:09 ` kuznet
@ 2003-01-29 0:46 ` Sebastian Benoit
2003-01-29 4:12 ` Christopher Faylor
2003-01-29 6:52 ` David S. Miller
1 sibling, 1 reply; 38+ messages in thread
From: Sebastian Benoit @ 2003-01-29 0:46 UTC (permalink / raw)
To: kuznet
Cc: David S. Miller, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
[-- Attachment #1: Type: text/plain, Size: 486 bytes --]
kuznet@ms2.inr.ac.ru(kuznet@ms2.inr.ac.ru)@2003.01.29 03:09:21 +0000:
> Hello!
>
> The proposed fix is enclosed. Please, check.
okay, this seems to be a solution. i can't get the ssh session to lock
up with this patch.
thanks, B.
--
Sebastian Benoit <benoit-lists@fb12.de>
My mail is GnuPG signed -- Unsigned ones are bogus -- http://www.gnupg.org/
GnuPG 0x5BA22F00 2001-07-31 2999 9839 6C9E E4BF B540 C44B 4EC4 E1BE 5BA2 2F00
I'm not as think as you stoned I am.
[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-29 0:46 ` Sebastian Benoit
@ 2003-01-29 4:12 ` Christopher Faylor
0 siblings, 0 replies; 38+ messages in thread
From: Christopher Faylor @ 2003-01-29 4:12 UTC (permalink / raw)
To: linux-kernel
On Wed, Jan 29, 2003 at 01:46:42AM +0100, Sebastian Benoit wrote:
>kuznet@ms2.inr.ac.ru(kuznet@ms2.inr.ac.ru)@2003.01.29 03:09:21 +0000:
>>The proposed fix is enclosed. Please, check.
>
>okay, this seems to be a solution. i can't get the ssh session to lock
>up with this patch.
Ditto for me. Thank you!
cgf
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-29 0:09 ` kuznet
2003-01-29 0:46 ` Sebastian Benoit
@ 2003-01-29 6:52 ` David S. Miller
1 sibling, 0 replies; 38+ messages in thread
From: David S. Miller @ 2003-01-29 6:52 UTC (permalink / raw)
To: kuznet; +Cc: benoit-lists, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
   From: kuznet@ms2.inr.ac.ru
   Date: Wed, 29 Jan 2003 03:09:21 +0300 (MSK)
   The proposed fix is enclosed. Please, check.
Installed locally and I will propagate everywhere as soon as possible.
Thanks a lot Alexey.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-28 20:34 ` David S. Miller
2003-01-28 21:59 ` Christopher Faylor
2003-01-28 22:12 ` Sebastian Benoit
@ 2003-01-28 23:56 ` kuznet
2003-01-29 0:08 ` David S. Miller
2 siblings, 1 reply; 38+ messages in thread
From: kuznet @ 2003-01-28 23:56 UTC (permalink / raw)
To: David S. Miller
Cc: benoit-lists, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
Hello!
> Alexey, most solid report is that 2.5.43-bk1 makes bug appear.
> This is good because it sort of narrows things down.
Now I do not think so. It looks like some old beast just got manifested.
It happens when 2 short consecutive segments are lost. Funny things happen
when retransmitting. First, I do not see collapsing, which must be
successful in this case. So, the first segment is retransmitted alone,
but the second is never retransmitted, tcp even prefers to retransmit
the third one. Something is already bad, queue is broken in an interesting
way, the impression is that... that... that tcp did collapsing, but
"forgot" to modify skb length.
Hey! Interesting thing has just happened, it is the first time when I
found the bug formulating a sentence while writing e-mail not while
peering to code. :-)
Shheit, look into tcp_retrans_try_collapse():
		if (skb->ip_summed != CHECKSUM_HW) {
			memcpy(skb_put(skb, next_skb_size), next_skb->data, nex$
			skb->csum = csum_block_add(skb->csum, next_skb->csum, s$
		}
WHERE IS skb_put and copy when skb->ip_summed==CHECKSUM_HW??!!
So, the fix is move of memcpy() line out of if clause.
Alexey
^ permalink raw reply [flat|nested] 38+ messages in thread
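Alexey's diagnosis, that the sequence range is extended while the payload is never copied when CHECKSUM_HW is set, can be reproduced with a toy model. The sketch below is hypothetical: struct model_seg and the collapse_* functions only mimic the statements of tcp_retrans_try_collapse() that matter here (checksum bookkeeping is omitted), and the 48-byte segments 2975:3023 and 3023:3071 are borrowed from Sebastian's tcpdump purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; this is NOT the kernel's sk_buff. */
enum { MODEL_CHECKSUM_NONE, MODEL_CHECKSUM_HW };

struct model_seg {
    uint8_t  data[256];
    size_t   len;          /* payload bytes, like skb->len */
    uint32_t seq, end_seq; /* like TCP_SKB_CB(skb)->seq / end_seq */
    int      ip_summed;
};

/* Pre-fix logic: the payload is appended only on the software-checksum
 * path, but the sequence range is always extended. */
static void collapse_buggy(struct model_seg *skb, const struct model_seg *next)
{
    if (next->ip_summed == MODEL_CHECKSUM_HW)
        skb->ip_summed = MODEL_CHECKSUM_HW;
    if (skb->ip_summed != MODEL_CHECKSUM_HW) {
        memcpy(skb->data + skb->len, next->data, next->len);
        skb->len += next->len;
        /* skb->csum would be updated with csum_block_add() here */
    }
    skb->end_seq = next->end_seq;
}

/* Post-fix logic: always copy; only the checksum update stays conditional. */
static void collapse_fixed(struct model_seg *skb, const struct model_seg *next)
{
    memcpy(skb->data + skb->len, next->data, next->len);
    skb->len += next->len;
    if (next->ip_summed == MODEL_CHECKSUM_HW)
        skb->ip_summed = MODEL_CHECKSUM_HW;
    /* if (skb->ip_summed != MODEL_CHECKSUM_HW) update skb->csum here */
    skb->end_seq = next->end_seq;
}

/* A segment is self-consistent when its payload covers its sequence range. */
static int seg_consistent(const struct model_seg *s)
{
    return (size_t)(s->end_seq - s->seq) == s->len;
}
```

With both segments marked hardware-checksummed, collapse_buggy() leaves a segment that claims 96 bytes of sequence space but carries only 48 bytes of data, which matches the "forgot to modify skb length" impression; collapse_fixed() keeps length and sequence range in step.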
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-28 23:56 ` kuznet
@ 2003-01-29 0:08 ` David S. Miller
2003-01-29 3:14 ` kuznet
2003-01-29 14:12 ` David C Niemi
0 siblings, 2 replies; 38+ messages in thread
From: David S. Miller @ 2003-01-29 0:08 UTC (permalink / raw)
To: kuznet; +Cc: benoit-lists, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
   From: kuznet@ms2.inr.ac.ru
   Date: Wed, 29 Jan 2003 02:56:41 +0300 (MSK)
   Hey! Interesting thing has just happened, it is the first time when I
   found the bug formulating a sentence while writing e-mail not while
   peering to code. :-)
Congratulations :-)
   Shheit, look into tcp_retrans_try_collapse():
   		if (skb->ip_summed != CHECKSUM_HW) {
   			memcpy(skb_put(skb, next_skb_size), next_skb->data, nex$
   			skb->csum = csum_block_add(skb->csum, next_skb->csum, s$
   		}
   WHERE IS skb_put and copy when skb->ip_summed==CHECKSUM_HW??!!
   So, the fix is move of memcpy() line out of if clause.
Indeed, this bug exists in 2.4 as well of course.
This bug is 2.4.3 vintage :-) It got added as part of initial
zerocopy merge in fact.
Here is 2.4.x version of fix, 2.5.x is identical sans some
line number differences. I will push this all to Linus/Marcelo.
BTW, Alexey, please please explain to me how that trick made
by tcp_trim_head() works. :-) I am talking about how it is
setting ip_summed to CHECKSUM_HARDWARE blindly and not even
bothering to set skb->csum correctly.
--- net/ipv4/tcp_output.c.~1~ Tue Jan 28 16:12:39 2003
+++ net/ipv4/tcp_output.c Tue Jan 28 16:14:18 2003
@@ -721,10 +721,9 @@
 		if (next_skb->ip_summed == CHECKSUM_HW)
 			skb->ip_summed = CHECKSUM_HW;
 
-		if (skb->ip_summed != CHECKSUM_HW) {
-			memcpy(skb_put(skb, next_skb_size), next_skb->data, next_skb_size);
+		memcpy(skb_put(skb, next_skb_size), next_skb->data, next_skb_size);
+		if (skb->ip_summed != CHECKSUM_HW)
 			skb->csum = csum_block_add(skb->csum, next_skb->csum, skb_size);
-		}
 
 		/* Update sequence range on original skb. */
 		TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(next_skb)->end_seq;
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-29 0:08 ` David S. Miller
@ 2003-01-29 3:14 ` kuznet
2003-01-29 7:32 ` David S. Miller
2003-01-29 14:12 ` David C Niemi
1 sibling, 1 reply; 38+ messages in thread
From: kuznet @ 2003-01-29 3:14 UTC (permalink / raw)
To: David S. Miller
Cc: benoit-lists, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
Hello!
> BTW, Alexey, please please explain to me how that trick made
> by tcp_trim_head() works. :-) I am talking about how it is
> setting ip_summed to CHECKSUM_HARDWARE blindly and not even
> bothering to set skb->csum correctly.
skb->csum is not used inside TCP when skb->ip_summed==CHECKSUM_HW:
void tcp_v4_send_check(struct sock *sk, struct tcphdr *th, int len,
		       struct sk_buff *skb)
{
	struct inet_opt *inet = inet_sk(sk);

	if (skb->ip_summed == CHECKSUM_HW) {
		th->check = ~tcp_v4_check(th, len, inet->saddr, inet->daddr, 0);
		skb->csum = offsetof(struct tcphdr, check);
And when pushing segment down to IP, it is initialized to offset of th->check.
So, it is safe to make skb->ip_summed := CHECKSUM_HW any moment
when we are lazy to recalculate checksum.
Frankly speaking, it is not very good, I was confused _a_ _lot_ when seeing
wrong checksums on those bogus zero-length packets in tcpdumps made
by Christopher. But saves some source lines.
Alexey
^ permalink raw reply [flat|nested] 38+ messages in thread
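Alexey's explanation relies on how a CHECKSUM_HW transmit is finished. As quoted, tcp_v4_send_check() stores ~tcp_v4_check(..., 0), i.e. the folded pseudo-header sum without the final inversion, in th->check, and records offsetof(struct tcphdr, check) in skb->csum. The device is then expected to sum the whole TCP segment, invert the result and write it at that offset. The hypothetical userspace model below checks that this two-step path yields the same value as a full software checksum; all function names are illustrative, the "device" step is an assumption about generic offload behaviour rather than any particular driver, and the addresses and payload are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of CHECKSUM_HW transmit; names are illustrative. */

#define CHECK_OFF 16 /* offset of the checksum field in a TCP header */

/* One's-complement add with end-around carry. */
static uint32_t add1c(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    return r + (r < b);
}

/* Add a block, as big-endian 16-bit words, to a running sum. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum = add1c(sum, (uint32_t)p[i] << 8 | p[i + 1]);
    if (len & 1)
        sum = add1c(sum, (uint32_t)p[len - 1] << 8);
    return sum;
}

/* Fold to 16 bits without inverting. */
static uint16_t fold(uint32_t s)
{
    s = (s & 0xFFFFu) + (s >> 16);
    s = (s & 0xFFFFu) + (s >> 16);
    return (uint16_t)s;
}

/* IPv4 pseudo-header sum: addresses, protocol 6 (TCP), TCP length. */
static uint32_t pseudo_sum(uint32_t saddr, uint32_t daddr, uint16_t len)
{
    uint32_t s = 0;

    s = add1c(s, saddr >> 16);
    s = add1c(s, saddr & 0xFFFFu);
    s = add1c(s, daddr >> 16);
    s = add1c(s, daddr & 0xFFFFu);
    s = add1c(s, 6);
    return add1c(s, len);
}

/* Software path: zero the field, sum pseudo-header and segment, invert. */
static uint16_t sw_checksum(uint8_t *seg, size_t len, uint32_t saddr, uint32_t daddr)
{
    seg[CHECK_OFF] = seg[CHECK_OFF + 1] = 0;
    return (uint16_t)~fold(sum16(seg, len, pseudo_sum(saddr, daddr, (uint16_t)len)));
}

/* Offload step 1 (stack): seed the field with the folded, non-inverted
 * pseudo-header sum, cf. th->check = ~tcp_v4_check(...) quoted above. */
static void hw_seed(uint8_t *seg, size_t len, uint32_t saddr, uint32_t daddr)
{
    uint16_t seed = fold(pseudo_sum(saddr, daddr, (uint16_t)len));

    seg[CHECK_OFF] = (uint8_t)(seed >> 8);
    seg[CHECK_OFF + 1] = (uint8_t)(seed & 0xFF);
}

/* Offload step 2 (assumed device behaviour): sum the segment as-is,
 * invert, and store at the offset the stack supplied. */
static void hw_complete(uint8_t *seg, size_t len, size_t csum_offset)
{
    uint16_t c = (uint16_t)~fold(sum16(seg, len, 0));

    seg[csum_offset] = (uint8_t)(c >> 8);
    seg[csum_offset + 1] = (uint8_t)(c & 0xFF);
}
```

Because the device recomputes everything from the header it is handed, nothing the stack kept in skb->csum beforehand is consulted, which is the property that lets tcp_trim_head() switch ip_summed to CHECKSUM_HW without recomputing a sum.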
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-29 3:14 ` kuznet
@ 2003-01-29 7:32 ` David S. Miller
0 siblings, 0 replies; 38+ messages in thread
From: David S. Miller @ 2003-01-29 7:32 UTC (permalink / raw)
To: kuznet; +Cc: benoit-lists, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
   From: kuznet@ms2.inr.ac.ru
   Date: Wed, 29 Jan 2003 06:14:55 +0300 (MSK)
   skb->csum is not used inside TCP when skb->ip_summed==CHECKSUM_HW:
   ...
   So, it is safe to make skb->ip_summed := CHECKSUM_HW any moment
   when we are lazy to recalculate checksum.
I see, clever trick as I had suspected. Thanks for the explanation.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-29 0:08 ` David S. Miller
2003-01-29 3:14 ` kuznet
@ 2003-01-29 14:12 ` David C Niemi
2003-01-29 14:24 ` kuznet
2003-02-02 15:40 ` Bill Davidsen
1 sibling, 2 replies; 38+ messages in thread
From: David C Niemi @ 2003-01-29 14:12 UTC (permalink / raw)
To: David S. Miller
Cc: kuznet, benoit-lists, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
On Tue, 28 Jan 2003, David S. Miller wrote:
> From: kuznet@ms2.inr.ac.ru
> Date: Wed, 29 Jan 2003 02:56:41 +0300 (MSK)
>
> Hey! Interesting thing has just happened, it is the first time when I
> found the bug formulating a sentence while writing e-mail not while
> peering to code. :-)
>
> Congratulations :-)
Just to confirm, this fix works for me as well.
...
> Indeed, this bug exists in 2.4 as well of course.
>
> This bug is 2.4.3 vintage :-) It got added as part of initial
> zerocopy merge in fact.
Odd, then, that I was unable to reproduce the SSH hangs under 2.4.18
even once, despite heavily using it for several days under the same
circumstances. Is there any reason 2.4.x would be better able to recover?
2.5.59 with the fix seems to feel a bit less balky than 2.4.18 without the
fix, so it seemed to me that 2.4.18 had some way of recovering at the cost
of a several-second pause in the session.
DCN
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-29 14:12 ` David C Niemi
@ 2003-01-29 14:24 ` kuznet
2003-01-29 15:11 ` dada1
2003-02-02 15:40 ` Bill Davidsen
1 sibling, 1 reply; 38+ messages in thread
From: kuznet @ 2003-01-29 14:24 UTC (permalink / raw)
To: David C Niemi
Cc: davem, benoit-lists, dada1, cgf, andersg, lkernel2003, linux-kernel, tobi
Hello!
> Odd, then, that I was unable to reproduce the SSH hangs under 2.4.18
The bug is there, but it cannot be triggered with ssh.
In 2.4 it can happen only on sockets which use sendfile().
Alexey
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-29 14:24 ` kuznet
@ 2003-01-29 15:11 ` dada1
0 siblings, 0 replies; 38+ messages in thread
From: dada1 @ 2003-01-29 15:11 UTC (permalink / raw)
To: kuznet, David C Niemi
Cc: davem, benoit-lists, cgf, andersg, lkernel2003, linux-kernel, tobi
> Hello!
>
> > Odd, then, that I was unable to reproduce the SSH hangs under 2.4.18
>
> The bug is there, but it cannot be triggered with ssh.
> In 2.4 it can happen only on sockets which use sendfile().
>
> Alexey
>
Thanks VERY much Alexey for your fast fix.
Back to linux 2.5.59, is the TOS 0x10 mandatory to have such hangs, or are
all TCP sessions potentially candidates?
Eric
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [TEST FIX] Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x,
2003-01-29 14:12 ` David C Niemi
2003-01-29 14:24 ` kuznet
@ 2003-02-02 15:40 ` Bill Davidsen
1 sibling, 0 replies; 38+ messages in thread
From: Bill Davidsen @ 2003-02-02 15:40 UTC (permalink / raw)
To: David C Niemi; +Cc: David S. Miller, Linux Kernel Mailing List
On Wed, 29 Jan 2003, David C Niemi wrote:
>
> On Tue, 28 Jan 2003, David S. Miller wrote:
> > From: kuznet@ms2.inr.ac.ru
> > Date: Wed, 29 Jan 2003 02:56:41 +0300 (MSK)
> >
> > Hey! Interesting thing has just happened, it is the first time when I
> > found the bug formulating a sentence while writing e-mail not while
> > peering to code. :-)
> >
> > Congratulations :-)
>
> Just to confirm, this fix works for me as well.
>
> ...
> > Indeed, this bug exists in 2.4 as well of course.
> >
> > This bug is 2.4.3 vintage :-) It got added as part of initial
> > zerocopy merge in fact.
>
> Odd, then, that I was unable to reproduce the SSH hangs under 2.4.18
> even once, despite heavily using it for several days under the same
> circumstances. Is there any reason 2.4.x would be better able to recover?
> 2.5.59 with the fix seems to feel a bit less balky than 2.4.18 without the
> fix, so it seemed to me that 2.4.18 had some way of recovering at the cost
> of a several second pause in the session.
The problem which I have been seeing with some regularity is not the hang
you describe (I see that infrequently) but rather a hang after I exit an
ssh connection. I open several dozen windows at a time to a cluster when I
do admin, and when I close almost always at least one doesn't drop without
"~." to help. So far in an hour I haven't seen that.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX
2003-01-24 18:43 SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX David C Niemi
` (2 preceding siblings ...)
2003-01-27 14:27 ` SSH Hangs in 2.5.59 and 2.5.55 but not 2.4.x, through Cisco PIX David C Niemi
@ 2003-01-27 21:27 ` Bill Davidsen
3 siblings, 0 replies; 38+ messages in thread
From: Bill Davidsen @ 2003-01-27 21:27 UTC (permalink / raw)
To: David C Niemi; +Cc: linux-kernel
On Fri, 24 Jan 2003, David C Niemi wrote:
>
> I have been experiencing some baffling SSH client hangs under 2.5.59 (and
> 55) in which the session totally hangs up after I have typed (typically)
> 10-100 characters. Right before it hangs permanently, a character is
> echo'd back to the screen several seconds late. Interestingly, data due
> back for my client which is initiated by the server side does make it, I
> just can't type anything further.
>
> To reproduce this: ssh in to a somewhat distant host. At a command
> prompt, hold down a letter key for a couple of minutes, or just type text
> in. If you cut'n'paste text, it rarely hangs (my guess is that this
> requires a lot fewer round trips than interactive typing). It should hang
> before you get a screenful (sometimes the sessions hang even before they
> are set up).
Sorry to say I sometimes see this on 2.4 kernels as well, even on PPP
dialed connections. The symptoms are that the local ssh client just stops
sending packets. That's very easy to tell with an external modem:-)
The connection is still fine, if I have multiple connections to the host
only one hangs, and I believe it's a client issue in ssh. What I see may
or may not be related to your problem.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
^ permalink raw reply [flat|nested] 38+ messages in thread