From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: TCPBacklogDrops during aggressive bursts of traffic
Date: Wed, 23 May 2012 19:24:26 +0200
Message-ID: <1337793866.3361.3090.camel@edumazet-glaptop>
References: <1337092718.1689.45.camel@kjm-desktop.uk.level5networks.com>
 <1337099368.1689.47.camel@kjm-desktop.uk.level5networks.com>
 <1337099641.8512.1102.camel@edumazet-glaptop>
 <1337100454.2544.25.camel@bwh-desktop.uk.solarflarecom.com>
 <1337101280.8512.1108.camel@edumazet-glaptop>
 <1337272292.1681.16.camel@kjm-desktop.uk.level5networks.com>
 <1337272654.3403.20.camel@edumazet-glaptop>
 <1337674831.1698.7.camel@kjm-desktop.uk.level5networks.com>
 <1337678759.3361.147.camel@edumazet-glaptop>
 <1337679045.3361.154.camel@edumazet-glaptop>
 <1337699379.1698.30.camel@kjm-desktop.uk.level5networks.com>
 <1337703170.3361.217.camel@edumazet-glaptop>
 <1337704382.1698.53.camel@kjm-desktop.uk.level5networks.com>
 <1337705135.3361.226.camel@edumazet-glaptop>
 <1337720076.3361.667.camel@edumazet-glaptop>
 <1337766246.3361.2447.camel@edumazet-glaptop>
 <1337774978.3361.2744.camel@edumazet-glaptop>
 <4FBD0A85.4040407@intel.com>
 <1337789530.3361.2992.camel@edumazet-glaptop>
 <4FBD1740.1020304@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Kieran Mansley , Jeff Kirsher , Ben Hutchings , netdev@vger.kernel.org
To: Alexander Duyck
Return-path:
Received: from mail-ee0-f46.google.com ([74.125.83.46]:37683 "EHLO
 mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
 ESMTP id S1752747Ab2EWRYc (ORCPT ); Wed, 23 May 2012 13:24:32 -0400
Received: by eeit10 with SMTP id t10so2189618eei.19 for ;
 Wed, 23 May 2012 10:24:30 -0700 (PDT)
In-Reply-To: <4FBD1740.1020304@intel.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Wed, 2012-05-23 at 09:58 -0700, Alexander Duyck wrote:
> Right, but the problem is that in order to make this work we are
> dropping the padding for head and hoping to have room for
> shared info. This is going to kill performance for things like
> routing workloads since the entire head is going to have to be copied
> over to make space for NET_SKB_PAD.

Hey, I said that is one of the points I have to add to my patch.
Please read it again ;)

By the way, instead of a full copy, we can also add code doing the
skb->head upgrade back to a fragment in case we need to add a tunnel
header. So maybe NET_SKB_PAD is not really needed anymore.

Anyway, a router host could use a different allocation strategy (going
back to the current one).

> Also this assumes no RSC being enabled. RSC is normally enabled by
> default. If it is turned on we are going to start receiving full 2K
> buffers which will cause even more issues since there wouldn't be any
> room for shared info in the 2K frame.

Hey, this is one of the points I have to address, as also mentioned.
It's almost trivial to check the length: if we have room for the
shared info, take it; if not, allocate the head as before.

> The way the driver is currently written probably provides the optimal
> setup for truesize given the circumstances.

It's unfortunate the hardware has 1KB granularity.

> In order to support receiving at least 1 full 1500 byte frame per
> fragment, and supporting RSC, I have to support receiving up to 2K of
> data. If we try to make it all part of one paged receive we would
> then have to either reduce the receive buffer size to 1K in hardware
> and span multiple fragments for a 1.5K frame, or allocate a 3K buffer
> so we would have room to add NET_SKB_PAD and the shared info on the
> end. At which point we are back to the extra 1K again, only in that
> case we cannot trim it off later via skb_try_coalesce. In the 3K
> buffer case we would be over a 1/2 page, which means we can only get
> one buffer per page instead of 2, in which case we might as well just
> round it up to 4K and be honest.
>
> The reason I am confused is that I thought the skb_try_coalesce
> function was supposed to be what addressed these types of issues. If
> these packets go through that function they should be stripping the
> sk_buff and possibly even the skb->head if we used the fragment,
> since the only thing that is going to end up in the head would be the
> TCP header, which should have been pulled prior to trying to
> coalesce.
>
> I will need to investigate this further to understand what is going
> on. I realize that dealing with 3K of memory for buffer storage is
> not ideal, but all of the alternatives lean more toward 4K when fully
> implemented. I'll try and see what alternative solutions we might
> have available.

The problem is that skb_try_coalesce() is not used when we store
packets in the socket backlog; it is only used for TCP at the moment.