From mboxrd@z Thu Jan 1 00:00:00 1970 From: "David S. Miller" Subject: Re: issue with new TCP TSO stuff Date: Thu, 12 May 2005 21:36:18 -0700 (PDT) Message-ID: <20050512.213618.14358320.davem@davemloft.net> References: <20050512231038.GA22440@gondor.apana.org.au> <20050512.162426.75782784.davem@davemloft.net> <20050512235236.GA22658@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: netdev@oss.sgi.com Return-path: To: herbert@gondor.apana.org.au In-Reply-To: <20050512235236.GA22658@gondor.apana.org.au> Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com List-Id: netdev.vger.kernel.org From: Herbert Xu Subject: Re: issue with new TCP TSO stuff Date: Fri, 13 May 2005 09:52:36 +1000 > > #2 could be handled by down-sizing TSO frames when packet loss occurs. > > Ie. tcp_retransmit_skb() or whatever will segmentize a TSO packet > > which is within the sequence it is trying to retransmit. Implementing > > this is non- trivial mostly due to the fact that it has to work handle > > GFP_ATOMIC memory failures and also get all the pcount crap correct. > > I think most of the code to do that might already be there. Pity > we'll have to keep track of the pcount though. Yes, when we walk the queue to retransmit, we chop up the TSO frame via tcp_trim_head() and tcp_fragment(). It is very loose and would need refinements though. I think we'd be OK, because even if we get a GFP_ATOMIC allocation failure, it's just like any other failure. We're doing retransmits so a timer will be the worst case deadlock breaker should we get an allocation failure while chopping up the TSO frame. This actually makes me worried about sk->sk_send_head advancement. I think we have a deadlock possible here. If sk->sk_send_head is the only packet in the output queue, and the remote system has no reason to send us zero-window probes or any other packets, we could wedge if the skb_clone() fails for the tcp_transmit_skb() call. We won't set any timers if we fail to send that single sk->sk_send_head write queue packet, and we have no pending frames waiting to be ACK'd, so nothing will wake us up to retry the transmit. > > and size appropriately. I think the tcp_snd_test() simplifications > > made by my TSO Reloaded patch would help a lot here. The send test is > > logically now split to it's two tests 1) whether to send anything at > > all, and 2) once #1 passes, how many such packets. > > That was another one of my comments to your patch :) I was going to > suggest that we move the cwnd/in_flight loop from tcp_snd_test to > emit_send_queue. One elegant part of the tcp_snd_test() in the TCP Reloaded patch is that tcp_write_queue() calls it exactly once. That loop in there is bogus and unnecessary. In the changelog for the TCP Reloaded patch I mention that this can be simplified into: new_packets = skb_queue_len(&sk->sk_write_queue) - in_flight; cwnd -= in_flight; return min(new_packets, cwnd); It is a BUG() for this calculation to be larger than the number of packets from sk->sk_send_head to the end of the write queue (which is the invariant we must guarentee for our caller). We could even check for this while debugging the implementation early on. > > This would be a sort of super-TSO that would do less segmenting work > > than even a "perfect" TSO segmenter would. > > > > I'm still not sure which approach is best, just throwing around some > > ideas. > > On paper your new super-TSO approach sounds the best. However, I > suppose we won't know for sure until we try :) Yeah, TCP Reloaded sounded really nice on paper too :-) To handle these super-TSO frames properly, we would need to add sk->sk_send_offset to go along with sk->sk_send_head. So then update_send_head() would advance sk->sk_send_offset, and if that completely consumed sk->sk_send_head then the pointer would be advanced. Tests of sk->sk_send_head would need to be fixed up similarly.