From mboxrd@z Thu Jan 1 00:00:00 1970
From: Arnd Hannemann
Subject: Re: [PATCH] tcp: bound RTO to minimum
Date: Thu, 25 Aug 2011 11:46:02 +0200
Message-ID: <4E5619DA.6070902@arndnet.de>
References: <1314226834.6797.5.camel@edumazet-laptop> <1314229310-8074-1-git-send-email-hagen@jauu.net> <1314250134.6797.24.camel@edumazet-laptop> <4033BFEE-C432-4D94-8372-BA166AF2AA26@comsys.rwth-aachen.de> <1314260805.2387.11.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC> <4E560BFD.5020301@arndnet.de> <1314263389.2387.21.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Alexander Zimmermann, Yuchung Cheng, Hagen Paul Pfeifer, netdev, Lukowski Damian
To: Eric Dumazet
Return-path:
Received: from mail2.unitix.de ([176.9.2.175]:41326 "EHLO mail2.unitix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750867Ab1HYJqE (ORCPT ); Thu, 25 Aug 2011 05:46:04 -0400
In-Reply-To: <1314263389.2387.21.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
Sender: netdev-owner@vger.kernel.org
List-ID:

Hi Eric,

On 25.08.2011 11:09, Eric Dumazet wrote:
> On Thursday, 25 August 2011 at 10:46 +0200, Arnd Hannemann wrote:
>> On 25.08.2011 10:26, Eric Dumazet wrote:
>>> On Thursday, 25 August 2011 at 09:28 +0200, Alexander Zimmermann wrote:
>>>> On 25.08.2011 at 07:28, Eric Dumazet wrote:
>>>
>>>>> Real question is: do we really want to process ~1000 timer interrupts
>>>>> per tcp session, ~2000 skb alloc/free/build/handling, possibly ~1000 ARP
>>>>> requests, only to make tcp recover in ~1sec when connectivity returns
>>>>> back. This just doesn't scale.
>>>>
>>>> maybe a stupid question, but 1000? With a minRTO of 200ms and a maximum
>>>> probing time of 120s, we get 600 retransmits in a worst-case scenario
>>>> (assuming that we get an ICMP for every RTO retransmission). No?
>>>
>>> Where is the "max probing time of 120s" asserted?
>>>
>>> It is not the case on my machine:
>>> I have way more retransmits than that, even if spaced by 1600 ms
>>>
>>> 07:16:13.389331 write(3, "\350F\235JC\357\376\363&\3\374\270R\21L\26\324{\37p\342\244i\304\356\241I:\301\332\222\26"..., 48) = 48
>>> 07:16:13.389417 select(7, [3 4], [], NULL, NULL) = 1 (in [3])
>>> 07:31:39.901311 read(3, 0xff8c4c90, 8192) = -1 EHOSTUNREACH (No route to host)
>>>
>>> Old kernels were performing up to 15 retries, doing exponential backoff.
>>>
>>> Now it's kind of unlimited, according to experimental results.
>>
>> That shouldn't be. It should stop after the same time as a TCP connection with an
>> RTO of Minimum RTO which is doing 15 retries (tcp_retries2=15) and doing exponential backoff.
>> So it should be around 900s*. But it could be that because of the icsk_retransmit wrapover
>> this doesn't work as expected.
>>
>> * 200ms + 400ms + 800ms ...
>
> It is 924 seconds with retries2=15 (default value)
>
> I said ~1000 probes.
>
> If ICMP are not rate limited, that could be about 924*5 probes, instead
> of 15 probes on old kernels.

At a rate of 5 packets/s if RTT is zero, yes. I would like to say: so what?
But your example with millions of idle connections stands.

> Maybe we should refine the thing a bit, to not reverse backoff unless
> rto is > some_threshold.
>
> Say 10s being the value, that would give at most 92 tries.

I personally think that 10s would be too large and eliminate the benefit
of the algorithm, so I would prefer a different solution.

In case of one bulk data TCP session, which was transmitting hundreds of packets/s
before the connectivity disruption, that worst-case rate of 5 packets/s really
seems conservative enough.

However, in the case of a lot of idle connections, which were transmitting only
a number of packets per minute, we might increase the rate drastically for
a certain period until it throttles down.
You say that we have a problem here, correct?

Do you think it would be possible without much hassle to use a kind of "global"
rate limiting only for these probe packets of a TCP connection?

> I mean, what is the gain to be able to restart a frozen TCP session with
> a 1sec latency instead of 10s if it was blocked more than 60 seconds ?

I'm afraid it does a lot, especially in highly dynamic environments. You
don't just have the additional latency, you may actually miss the full
period where connectivity was there, and then just retransmit into the next
connectivity-disrupted period.

Best regards,
Arnd
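
P.S. For anyone who wants to check the figures thrown around in this thread,
here is a rough sketch of the arithmetic. It is not kernel code. It assumes a
minimum RTO of 200 ms, an RTO cap of 120 s, and tcp_retries2=15 counted as the
initial timeout plus 15 backed-off ones; the function names are made up for
illustration. Times are kept in integer milliseconds to avoid rounding noise.

```python
# Back-of-the-envelope arithmetic for the numbers in this thread.
# Assumptions (hypothetical helper names, not kernel code):
#   min RTO = 200 ms, RTO cap = 120 s, tcp_retries2 = 15.

RTO_MIN_MS = 200
RTO_MAX_MS = 120_000
RETRIES2 = 15


def backoff_intervals_ms(rto_min=RTO_MIN_MS, rto_max=RTO_MAX_MS, retries=RETRIES2):
    """Timer intervals with exponential backoff: 200, 400, 800, ... capped at rto_max."""
    return [min(rto_min * 2 ** i, rto_max) for i in range(retries + 1)]


def total_timeout_ms(rto_min=RTO_MIN_MS, rto_max=RTO_MAX_MS, retries=RETRIES2):
    """Time until the connection is given up, as the sum of all intervals."""
    return sum(backoff_intervals_ms(rto_min, rto_max, retries))


def max_probes(total_ms, spacing_ms):
    """Upper bound on probes if one probe is sent every spacing_ms."""
    return total_ms // spacing_ms


if __name__ == "__main__":
    total = total_timeout_ms()
    print("give-up time (ms):", total)
    print("probes at 200 ms spacing:", max_probes(total, RTO_MIN_MS))
    print("probes at 10 s spacing:", max_probes(total, 10_000))
    print("probes at 200 ms within 120 s:", max_probes(RTO_MAX_MS, RTO_MIN_MS))
```

Under these assumptions the give-up time comes out at 924.6 s, which matches
the 924 seconds quoted above. A probe every 200 ms over that period gives
roughly the 924*5 figure, a 10 s floor gives 92, and 120 s of probing at
200 ms gives the 600 from Alexander's question.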