From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jesper Dangaard Brouer
Subject: Re: [net-next PATCH V3-evictor] net: frag evictor, avoid killing warm frag queues
Date: Thu, 06 Dec 2012 13:26:00 +0100
Message-ID: <1354796760.20888.217.camel@localhost>
References: <1354319937.20109.285.camel@edumazet-glaptop>
 <20121204133007.20215.52566.stgit@dragon>
 <1354699462.20888.207.camel@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: "David S. Miller", Florian Westphal, netdev@vger.kernel.org,
 Thomas Graf, "Paul E. McKenney", Cong Wang, Herbert Xu
To: Eric Dumazet
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:26090 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1423364Ab2LFM0q (ORCPT ); Thu, 6 Dec 2012 07:26:46 -0500
In-Reply-To: <1354699462.20888.207.camel@localhost>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Wed, 2012-12-05 at 10:24 +0100, Jesper Dangaard Brouer wrote:
>
> The previous evictor patch, of letting new fragments enter, worked
> amazingly well.  But I suspect this might also be related to a
> bug/problem in the evictor loop (which was being hidden by that
> patch).

The evictor loop does not contain a bug, just an SMP scalability issue
(which is fixed by later patches).

The first evictor patch, which does not let new fragments enter, only
worked amazingly well because it hides this (and other) scalability
issues, and implicitly allows frags already "in" to exceed the mem
usage limit for 1 jiffie.  That invalidates the patch, as the
improvement was only a side effect.

> My new *theory* is that the evictor loop will be looping too much, if
> it finds a fragment which is INET_FRAG_COMPLETE ... in that case, we
> don't advance the LRU list, and thus will pick up the exact same
> inet_frag_queue again in the loop... to get out of the loop we need
> another CPU or packet to change the LRU list for us... I'll test that
> theory... (it could also be CPUs fighting over the same LRU head
> element that causes this) ... more to come...

The above theory does happen, but it does not cause excessive looping.
The CPUs are just fighting over who gets to free the inet_frag_queue
and who gets to unlink it from its data structures (resulting, I
guess, in cache line bouncing between CPUs).

CPUs fighting for the same LRU head (inet_frag_queue) element is bad
for scalability.  We could fix it by unlinking the element as soon as
a CPU grabs it, but that would require changing a read_lock to a
write_lock, so we might not gain much performance.  I already
(implicitly) fix this in a later patch, where I move the LRU lists to
be per CPU.  So I don't know if it's worth fixing here.  (Sketches of
the current loop and of both possible fixes follow below, after my
signature.)

(And yes, I'm using thresh 4MB/3MB as my default setting now, but I'm
also experimenting with other thresh sizes.)

P.S. Thank you, Eric, for being so persistent; it made me realize this
patch was not good.  We can hopefully now move on to the other
patches, which fix the real scalability issues.

--Jesper
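
For readers without the tree in front of them, here is the shape of
the evictor loop being discussed.  This is a simplified sketch, from
memory, of inet_frag_evictor() in net/ipv4/inet_fragment.c of this
era -- not a verbatim copy, and some details are elided:

int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f)
{
	struct inet_frag_queue *q;
	int work, evicted = 0;

	work = atomic_read(&nf->mem) - nf->low_thresh;
	while (work > 0) {
		read_lock(&f->lock);
		if (list_empty(&nf->lru_list)) {
			read_unlock(&f->lock);
			break;
		}

		/* Every CPU running the evictor samples the SAME LRU
		 * head here; this is the element they fight over.
		 */
		q = list_first_entry(&nf->lru_list,
				     struct inet_frag_queue, lru_list);
		atomic_inc(&q->refcnt);
		read_unlock(&f->lock);

		spin_lock(&q->lock);
		/* If another CPU killed it first, INET_FRAG_COMPLETE
		 * is already set and we do no work in this iteration;
		 * we just drop our ref and go around again.
		 */
		if (!(q->last_in & INET_FRAG_COMPLETE))
			inet_frag_kill(q, f); /* unlinks from hash + LRU */
		spin_unlock(&q->lock);

		/* Only the last ref holder frees it and credits 'work'. */
		if (atomic_dec_and_test(&q->refcnt))
			inet_frag_destroy(q, f, &work);
		evicted++;
	}

	return evicted;
}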
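
The "unlink it as soon as a CPU grabs it" variant would replace the
top of that loop with something like the following.  This is only a
hypothetical sketch, not a patch I'm proposing -- note the read_lock
becoming a write_lock, which is why I doubt it buys much:

		write_lock(&f->lock);		/* was: read_lock() */
		if (list_empty(&nf->lru_list)) {
			write_unlock(&f->lock);
			break;
		}
		q = list_first_entry(&nf->lru_list,
				     struct inet_frag_queue, lru_list);
		/* Unlink immediately, so no other CPU can grab the
		 * same element.  The price: an exclusive lock on
		 * every iteration, on every CPU.  inet_frag_kill()
		 * would then need to cope with an already-unlinked
		 * lru_list entry, hence list_del_init().
		 */
		list_del_init(&q->lru_list);
		atomic_inc(&q->refcnt);
		write_unlock(&f->lock);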
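
And the direction the later patch takes instead: per-CPU LRU lists,
so evictors on different CPUs stop sharing a list head entirely.
Again a hypothetical sketch with made-up names (frag_lru_pcpu,
frag_lru_add), NOT the actual patch:

#include <linux/list.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

/* Hypothetical shape only (made-up names) -- not the actual patch. */
struct frag_lru_pcpu {
	spinlock_t		lock;
	struct list_head	list;
};

struct netns_frags_pcpu {
	/* allocated with alloc_percpu(struct frag_lru_pcpu) */
	struct frag_lru_pcpu __percpu *lru;
};

static void frag_lru_add(struct netns_frags_pcpu *nf,
			 struct inet_frag_queue *q)
{
	/* get_cpu_ptr() disables preemption while we touch our list */
	struct frag_lru_pcpu *l = get_cpu_ptr(nf->lru);

	spin_lock(&l->lock);
	list_add_tail(&q->lru_list, &l->list);
	spin_unlock(&l->lock);
	put_cpu_ptr(nf->lru);
}

/* The evictor then walks only the local CPU's list, so two CPUs
 * never contend for the same LRU head or bounce its cache lines.
 */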