From: David Miller
Subject: Re: [net-next PATCH V2 1/9] net: frag evictor, avoid killing warm frag queues
Date: Thu, 29 Nov 2012 12:44:27 -0500 (EST)
Message-ID: <20121129.124427.1093031685966728935.davem@davemloft.net>
References: <20121129161019.17754.29670.stgit@dragon> <20121129161052.17754.85017.stgit@dragon>
In-Reply-To: <20121129161052.17754.85017.stgit@dragon>
To: brouer@redhat.com
Cc: eric.dumazet@gmail.com, fw@strlen.de, netdev@vger.kernel.org, pablo@netfilter.org, tgraf@suug.ch, amwang@redhat.com, kaber@trash.net, paulmck@linux.vnet.ibm.com, herbert@gondor.hengli.com.au

From: Jesper Dangaard Brouer
Date: Thu, 29 Nov 2012 17:11:09 +0100

> The fragmentation evictor system has a very unfortunate eviction
> system for killing fragments when the system is put under pressure.
> If packets are coming in too fast, the evictor code kills "warm"
> fragments too quickly, resulting in a massive performance drop,
> because we drop frag lists where we have already queued up a lot of
> fragments/work, which get killed before they have a chance to
> complete.

I think this is a trade-off where the decision is somewhat arbitrary.

If you kill warm entries, the sending of all of those fragments is wasted. If you retain warm entries and drop incoming new fragments, well, then the sending of all of those newer fragments is wasted too.

The only way I could see this making sense is if some "probability of fulfillment" were taken into account. For example, if you already have more than half of the fragments, then yes, it may be advisable to retain the warm entry.
Otherwise, as I said, the decision seems arbitrary.

Let's take a step back and think about why this is happening at all.

I wonder how reasonable the high and low thresholds really are. Even once you move them to per-cpu, I think the limits are far too small.

I'm under the impression that it's common for skb->truesize for 1500 MTU frames to be rounded up to the next power of 2, so 2048 bytes or thereabouts. Then add in the sk_buff control overhead, as well as the inet_frag head. A 64K datagram needs about 45 fragments at 1500 MTU, and 45 * 2048 is already about 90K before any overhead, so a 64K fragmented frame probably consumes close to 100K.

So once we have three 64K frames in flight, we're already over the high threshold and will start dropping things.

That's beyond stupid.