From: Arnd Hannemann
Subject: Re: scp stalls mysteriously
Date: Thu, 03 Dec 2009 16:34:12 +0100
Message-ID: <4B17DA74.3070600@nets.rwth-aachen.de>
References: <20091130213727.2f4047d2@houba> <20091201211945.505d3c98@houba> <20091202085925.472136e2@houba> <20091202154403.GB30730@sd-11162.dedibox.fr> <20091202183451.173db5f2@houba> <4B16BD58.3040802@tvk.rwth-aachen.de> <20091203085933.GD30730@sd-11162.dedibox.fr> <4B17CABE.8070402@nets.rwth-aachen.de>
To: Ilpo Järvinen
Cc: Frederic Leroy, Damian Lukowski, Netdev, David Miller, Eric Dumazet, Herbert Xu, Greg KH

Ilpo Järvinen wrote:
> On Thu, 3 Dec 2009, Arnd Hannemann wrote:
>
>> Ilpo Järvinen wrote:
>>
>> [snipped]
>>
>>> Also, we have another mystery to be solved: the fast retransmission is
>>> not triggered for some reason (or alternatively not captured into a
>>> log), even in the working .9. case. It would be easy to see whether it
>>> works at all from TCP point of view by looking into mibs once you
>>> have some transfers in a working configuration:
>>>
>>> grep -A1 TCP /proc/net/netstat
>>>
>>> ...luckily this fast retransmit issue is less crucial as almost all people
>>> are pretty happy already if their RTO-based recovery works even if the
>>> fast recovery would not. So figuring it out can be postponed (if one has
>>> to prioritize) until the silent death issue is out of the way.
>>>
>>>
>> I looked at the working .9 case stream from 192.168.1.15 to 192.168.1.19.
>> I don't think it is a mystery that fast retransmit does not trigger.
>> The condition SACKED_DATA > 3 * SMSS is simply not fulfilled.
>> Neither are there 3 non-contiguous SACK sequences.
>> The segments sent are too small :-(
>> Interesting though, seems to me in this case non-SACK would be better than SACK.
>> Or did I miss something?
>
> Yes, a particularly big one: Linux does not count SACKed bytes but packets.
> In the first recovery, plenty of packets are SACKed:
>
> 135 sack 1 {2598:2646}>
> 108 sack 1 {2598:2694}>
> 121 sack 1 {2598:2742}>
> 95 sack 1 {2598:2790}>
> 426 sack 1 {2598:2838}>
>
> fackets_out should be 6 now, which is way more than 3, which is the
> default tp->reordering.

OK, you probably know better than me. But aren't the SKBs collapsed to
SMSS-sized segments and then counted? I thought so.
The 3 * SMSS restriction is from RFC 3517, but of course you know that.

>
>> Hey we could cook up a draft for this problem ;-)
>>
>> Anyway, real problem is, RTO does not trigger...
>
> There are two problems. ...Both are real. ;-) But the significance of
> one is much worse than the other.

I agree.
I'm already trying to get scp to stall, but no luck so far. Neither with
artificially dropping packets, nor using WLAN :-(

Best regards,
Arnd
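
For reference, the two loss-detection rules discussed above can be compared on the SACK blocks quoted in this thread. The sketch below is a simplified model, not kernel code. It makes three assumptions that are not stated in the thread: an SMSS of 1460 bytes, 48-byte segments (inferred from each SACK block growing by 48 bytes), and exactly one lost segment below sequence 2598. The byte rule uses the strict ">" form as written in the email.

```python
# Simplified model, not kernel code. SMSS, the segment size and the
# single-segment hole are assumptions; the SACK blocks are from the thread.
SMSS = 1460                # assumed sender maximum segment size, in bytes
DUP_THRESH = 3             # threshold of 3, as for the default tp->reordering
SEGMENT_SIZE = 48          # inferred: each quoted SACK block grows by 48 bytes
ASSUMED_HOLE_SEGMENTS = 1  # assumption: one lost segment below 2598

SACK_BLOCKS = [
    (2598, 2646),
    (2598, 2694),
    (2598, 2742),
    (2598, 2790),
    (2598, 2838),
]


def byte_based_trigger(block):
    """Byte-count condition as stated in the email: SACKed data > 3 * SMSS."""
    left, right = block
    return (right - left) > DUP_THRESH * SMSS


def packet_based_trigger(block):
    """Packet-count condition: fackets_out > reordering threshold."""
    left, right = block
    sacked_packets = (right - left) // SEGMENT_SIZE
    fackets_out = ASSUMED_HOLE_SEGMENTS + sacked_packets
    return fackets_out, fackets_out > DUP_THRESH


def compare(blocks):
    """Return (block, sacked_bytes, byte_rule, fackets_out, packet_rule) rows."""
    rows = []
    for block in blocks:
        fackets_out, by_packets = packet_based_trigger(block)
        rows.append((block, block[1] - block[0],
                     byte_based_trigger(block), fackets_out, by_packets))
    return rows


if __name__ == "__main__":
    for block, sacked_bytes, by_bytes, fackets_out, by_packets in compare(SACK_BLOCKS):
        print(f"{block}: {sacked_bytes} bytes SACKed, byte rule={by_bytes}, "
              f"fackets_out={fackets_out}, packet rule={by_packets}")
```

Under these assumptions the byte rule never fires, because at most 240 bytes are SACKed against a threshold of 4380, while the packet count reaches 6 on the last ACK, matching the fackets_out value mentioned above. There is also only a single contiguous SACK block, so the "3 non-contiguous SACK sequences" condition is not met either.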