From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Miller
Subject: Re: [net-next PATCH 00/17] fib_trie: Reduce time spent in fib_table_lookup by 35 to 75%
Date: Thu, 01 Jan 2015 21:08:41 -0500 (EST)
Message-ID: <20150101.210841.1269406605009943743.davem@davemloft.net>
References: <20141231184649.3006.29958.stgit@ahduyck-vm-fedora20>
	<20141231.184610.1802958694945952516.davem@davemloft.net>
	<54A4B1D4.1030506@gmail.com>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: alexander.h.duyck@redhat.com, netdev@vger.kernel.org
To: alexander.duyck@gmail.com
Return-path:
Received: from shards.monkeyblade.net ([149.20.54.216]:36894 "EHLO
	shards.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751523AbbABCIn (ORCPT );
	Thu, 1 Jan 2015 21:08:43 -0500
In-Reply-To: <54A4B1D4.1030506@gmail.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

From: Alexander Duyck
Date: Wed, 31 Dec 2014 18:32:52 -0800

> On 12/31/2014 03:46 PM, David Miller wrote:
>> This knocks about 35 cpu cycles off of a lookup that ends up using the
>> default route on sparc64. From about ~438 cycles to ~403.
>
> Did that 438 value include both fib_table_lookup and check_leaf? Just
> curious as the overall gain seems smaller than what I have been seeing
> on the x86 system I was testing with, but then again it could just be a
> sparc64 thing.

This is just a default run of my kbench_mod.ko from the net_test_tools
repo. You can try it as well on x86-64 or similar.

> I've started work on a second round of patches. With any luck they
> should be ready by the time the next net-next opens. My hope is to cut
> the look-up time by another 30 to 50%, though it will take some time as
> I have to go through and drop the leaf_info structure, and look at
> splitting the tnode in half to break the key/pos/bits and child pointer
> dependency chain which will hopefully allow for a significant reduction
> in memory read stalls.

I'm very much looking forward to this.

> I am also planning to take a look at addressing the memory waste that
> occurs on nodes larger than 256 bytes due to the way kmalloc allocates
> memory as powers of 2. I'm thinking I might try encouraging the growth
> of smaller nodes, and discouraging anything over 256 by implementing a
> "truesize" type logic that can be used in the inflate/halve functions so
> that the memory usage is more accurately reflected.

Wouldn't this result in a deeper tree? The whole point is to keep the
tree as shallow as possible to minimize the memory refs on a lookup,
right?
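
A back-of-the-envelope sketch of the trade-off discussed above, in C. It
models the allocator as rounding every request up to a power of two, as
the quoted text describes, and computes how many bytes a node with
2^bits child pointers would lose to that rounding. The header size, the
pointer size, and all function names here are hypothetical and are not
taken from fib_trie.c; real kmalloc also has a few non-power-of-two size
classes, so the numbers are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes for illustration only; not taken from fib_trie.c. */
#define NODE_HEADER_BYTES 32u  /* assumed fixed per-node header */
#define CHILD_PTR_BYTES    8u  /* assumed 64-bit child pointer */

/* Simplified allocator model: round a request up to a power of two. */
static size_t round_up_pow2(size_t size)
{
	size_t p = 1;

	assert(size > 0);
	while (p < size)
		p <<= 1;
	return p;
}

/* Bytes requested for a node holding 2^bits child pointers. */
static size_t node_request_bytes(unsigned int bits)
{
	return NODE_HEADER_BYTES + ((size_t)CHILD_PTR_BYTES << bits);
}

/* Bytes lost to rounding for a node holding 2^bits child pointers. */
static size_t node_wasted_bytes(unsigned int bits)
{
	size_t req = node_request_bytes(bits);

	return round_up_pow2(req) - req;
}
```

Under these assumed sizes, a node with bits = 5 (32 children) requests
288 bytes and occupies 512, so 224 bytes are wasted. Capping bits at 4
keeps the node at 160 bytes, inside a 256-byte allocation, but the same
key bits then have to be covered by additional levels, which is the
depth concern raised above.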