From mboxrd@z Thu Jan 1 00:00:00 1970
From: Willy Tarreau
Subject: Re: TCP: orphans broken by RFC 2525 #2.17
Date: Mon, 27 Sep 2010 09:34:43 +0200
Message-ID: <20100927073443.GR12373@1wt.eu>
References: <20100926232530.GK12373@1wt.eu> <20100926.181202.28824153.davem@davemloft.net> <20100927053901.GL12373@1wt.eu> <20100926.234202.241938788.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: netdev@vger.kernel.org
To: David Miller
Return-path:
Received: from 1wt.eu ([62.212.114.60]:45720 "EHLO 1wt.eu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751483Ab0I0Heq (ORCPT ); Mon, 27 Sep 2010 03:34:46 -0400
Content-Disposition: inline
In-Reply-To: <20100926.234202.241938788.davem@davemloft.net>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Sun, Sep 26, 2010 at 11:42:02PM -0700, David Miller wrote:
> From: Willy Tarreau
> Date: Mon, 27 Sep 2010 07:39:01 +0200
>
> > On Sun, Sep 26, 2010 at 06:12:02PM -0700, David Miller wrote:
> >> From: Willy Tarreau
> >> Date: Mon, 27 Sep 2010 01:25:30 +0200
> >>
> >> > Agreed. But that's not a reason for killing outgoing data that is
> >> > being sent when there are some data left in the rcv buffer.
> >>
> >> What alternative notification to the peer do you suggest other than a
> >> reset, then? TCP gives us no other.
> >
> > I know, and I agree to send the reset, but after the data are correctly
> > transferred. This reset's purpose is only to inform the other side that
> > the data it sent were destroyed. It is not a requirement to tell it they
> > were destroyed earlier or later. What matters is that it's informed they
> > were destroyed.
>
> So you want us to hold onto to the full connection state for however
> long it takes to send the pending data

Not for however long it takes, just as we do right now with orphans,
nothing more, nothing less.

> just because your application
> doesn't want to wait around to sink a pending newline character?

It's not that it *doesn't want* to wait for the pending newline character,
it's that this character has no reason to be there and cannot be predicted,
and even when you find it, nothing tells the application that it's the
last one.

> Is that what this boils down to?

No, it's the opposite in fact: the goal is to ensure we can reliably release
the whole connection ASAP instead of being forced to sink any possible
data that may come from it and that will neither be consumed nor lead to
a reset. Look:

  case A (current one):
     we send the response to the client from an orphaned connection.
     Most of the time, the client won't have any issue and will get
     the response. In some rare circumstances, some data sent by the
     client after the response causes an RST to be emitted, which may
     destroy in-flight data. Those issues are extremely rare, but they
     still happen.

  case B (my proposal, which was the behaviour before the RFC 2525 fix):
     we send the response to the client.
     it acks it.
     we send an RST.
     End of the transfer. Total time: 50 ms (avg RTT over ADSL).

  case C (alternative):
     we send the response to the client.
     the application can't know the client has acked it, and must keep
     the connection open for however long is necessary to get the only
     form of ACK the application can detect: the FIN from the client,
     which takes 6 minutes on my ADSL line for 10 meg.

In case C, not only does the state remain *a lot* longer, but the bandwidth
usage is much worse, and in the end the client does not even get the reset
that we're trying to ensure it gets to indicate that the data were dropped.

So while case C is a reliable workaround, it's the least efficient method
and the most expensive one in terms of memory, CPU, network bandwidth,
socket usage, file descriptor usage and perceived time.

You see, I'm not trying to do dirty, dangerous things to save a few lines
of code.
I'm even OK with having a lot of Linux-specific code to make use of the
features the Linux stack provides that make it more efficient than other
implementations. I'm just seeking reliability.

Willy
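
For readers who want to see what case C costs the application, here is a
minimal sketch of that workaround (often called a lingering close). It
assumes a blocking POSIX stream socket. The helper name drain_and_close()
and the 4096-byte buffer are hypothetical and do not come from this thread,
and a real server would also want a timeout on the drain loop.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper illustrating case C: half-close, then sink
 * everything the peer still sends until its FIN shows up as EOF.
 * Returns the number of bytes discarded, or -1 on error.
 */
long drain_and_close(int fd)
{
    char buf[4096];          /* arbitrary scratch buffer (assumption) */
    long discarded = 0;
    ssize_t n;

    /* Send our FIN; data already queued for sending is still delivered. */
    if (shutdown(fd, SHUT_WR) < 0 && errno != ENOTCONN) {
        close(fd);
        return -1;
    }

    /* Read and throw away incoming data until recv() reports EOF. */
    for (;;) {
        n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {
            discarded += n;
            continue;
        }
        if (n == 0)
            break;           /* peer's FIN: the only "ACK" we can observe */
        if (errno == EINTR)
            continue;        /* interrupted by a signal, retry */
        close(fd);
        return -1;
    }

    /* The receive queue is empty now, so close() finds no unread data. */
    close(fd);
    return discarded;
}
```

The loop only ends when the client closes its side, which is why the
connection state, the file descriptor and the bandwidth stay tied up for as
long as the client keeps sending.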