From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kieran Mansley
Subject: Re: TCPBacklogDrops during aggressive bursts of traffic
Date: Tue, 22 May 2012 17:32:59 +0100
Message-ID: <1337704382.1698.53.camel@kjm-desktop.uk.level5networks.com>
References: <1337092718.1689.45.camel@kjm-desktop.uk.level5networks.com>
 <1337093776.8512.1089.camel@edumazet-glaptop>
 <1337099368.1689.47.camel@kjm-desktop.uk.level5networks.com>
 <1337099641.8512.1102.camel@edumazet-glaptop>
 <1337100454.2544.25.camel@bwh-desktop.uk.solarflarecom.com>
 <1337101280.8512.1108.camel@edumazet-glaptop>
 <1337272292.1681.16.camel@kjm-desktop.uk.level5networks.com>
 <1337272654.3403.20.camel@edumazet-glaptop>
 <1337674831.1698.7.camel@kjm-desktop.uk.level5networks.com>
 <1337678759.3361.147.camel@edumazet-glaptop>
 <1337679045.3361.154.camel@edumazet-glaptop>
 <1337699379.1698.30.camel@kjm-desktop.uk.level5networks.com>
 <1337703170.3361.217.camel@edumazet-glaptop>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Ben Hutchings ,
To: Eric Dumazet
Return-path:
Received: from webmail.solarflare.com ([12.187.104.25]:42045 "EHLO
 ocex02.SolarFlarecom.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
 with ESMTP id S1750826Ab2EVQdD (ORCPT ); Tue, 22 May 2012 12:33:03 -0400
In-Reply-To: <1337703170.3361.217.camel@edumazet-glaptop>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tue, 2012-05-22 at 18:12 +0200, Eric Dumazet wrote:
>
> __tcp_select_window() ( more precisely tcp_space() takes into account
> memory used in receive/ofo queue, but not frames in backlog queue)
>
> So if you send bursts, it might explain TCP stack continues to advertise
> a too big window, instead of anticipate the problem.
>
> Please try the following patch :
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index e79aa48..82382cb 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1042,8 +1042,9 @@ static inline int tcp_win_from_space(int space)
>  /* Note: caller must be prepared to deal with negative returns */
>  static inline int tcp_space(const struct sock *sk)
>  {
> -	return tcp_win_from_space(sk->sk_rcvbuf -
> -				  atomic_read(&sk->sk_rmem_alloc));
> +	int used = atomic_read(&sk->sk_rmem_alloc) + sk->sk_backlog.len;
> +
> +	return tcp_win_from_space(sk->sk_rcvbuf - used);
>  }
>
>  static inline int tcp_full_space(const struct sock *sk)

I can give this a try (not sure when - probably later this week) but I
think it is back to front. The patch above will reduce the advertised
window by sk_backlog.len, but the backlog was empty at the time the
window that allowed the dropped packets to be sent was advertised. It
is later, when the kernel is waking the application and takes the
socket lock, that the backlog starts to be used and the drop happens.
But reducing the window advertised at this point is futile - the
packets that will be dropped are already in flight.

The problem exists because the backlog has a tighter limit on it than
the receive window does; I think the backlog should be able to accept
sk_rcvbuf bytes in addition to what is already in the receive buffer
(or up to the advertised receive window if that's smaller). At the
moment it will only accept sk_rcvbuf bytes including what is already
in the receive buffer. The logic is that in this case we're using the
backlog because the receive buffer is in the process of being emptied
into the application, and so the receive buffer will very soon be
empty, and so we will very soon be able to accept sk_rcvbuf bytes.
This is evident from the packet capture, as the kernel stack is quite
happy to accept the significant quantity of data that arrives as part
of the same burst immediately after it has dropped a couple of packets.

Perhaps it would be easier for me to write a patch to show this
suggested solution?

Kieran
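
For illustration only, the two backlog limits contrasted above can be
modelled in userspace roughly as follows. This is a sketch under stated
assumptions, not kernel code and not the patch offered in the mail:
struct sock_model and both function names are hypothetical, the fields
merely mirror sk_rcvbuf, sk_rmem_alloc and sk_backlog.len, and the
optional cap at the advertised receive window is left out.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a socket's receive accounting. */
struct sock_model {
	int rcvbuf;      /* stands in for sk->sk_rcvbuf */
	int rmem_alloc;  /* stands in for sk->sk_rmem_alloc (receive queue bytes) */
	int backlog_len; /* stands in for sk->sk_backlog.len */
};

/*
 * Limit as described in the mail for the current behaviour: the backlog
 * and the receive buffer share a single sk_rcvbuf budget.
 */
static bool backlog_accepts_shared_budget(const struct sock_model *sk,
					  int truesize)
{
	return sk->rmem_alloc + sk->backlog_len + truesize <= sk->rcvbuf;
}

/*
 * Limit as suggested in the mail: the backlog may hold up to sk_rcvbuf
 * bytes in addition to whatever is already in the receive buffer.
 */
static bool backlog_accepts_separate_budget(const struct sock_model *sk,
					    int truesize)
{
	return sk->backlog_len + truesize <= sk->rcvbuf;
}
```

With rcvbuf = 87380, rmem_alloc = 80000 and an empty backlog, a
10000-byte packet is refused under the shared budget (90000 > 87380)
but accepted under the separate budget (10000 <= 87380), which is the
drop scenario described above.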