From mboxrd@z Thu Jan 1 00:00:00 1970 From: Patrick McHardy Subject: Re: [PATCH 3/3] Solving scalability issue: for chain list "name" searching. Date: Tue, 15 Jan 2008 18:18:48 +0100 Message-ID: <478CEAF8.5020509@trash.net> References: <1198244659.23885.8.camel@localhost.localdomain> <1198245030.23885.17.camel@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Cc: Netfilter Developers , Harald Welte , Martin Josefsson To: jdb@comx.dk Return-path: Received: from stinky.trash.net ([213.144.137.162]:45105 "EHLO stinky.trash.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751423AbYAORSx (ORCPT ); Tue, 15 Jan 2008 12:18:53 -0500 In-Reply-To: <1198245030.23885.17.camel@localhost.localdomain> Sender: netfilter-devel-owner@vger.kernel.org List-ID: Jesper Dangaard Brouer wrote: > Solving scalability issue: for chain list "name" searching. > Functions: iptcc_find_label(), iptc_is_chain(). > > Testing if a chain exist, requires a linearly walk of linked list with > chain-names (doing a strcmp(3) in each step). Giving a worst-case > runtime of O(n) where n is the number of chains. > > Why is this important to fix?! If only called once, this should not be > a big concern, even-though the string compares are expensive. > > The performance issue arise with many chains for example; when using > "iptables-restore", or when listing all "iptables -nL" rules, or when > using CPAN IPTables::libiptc. > > Having 50k chains, the rule listing, with the command: > "./iptables -nL > /dev/null", > Without patch it takes approximately 5 minutes, > With the patch it takes 0.5 seconds. > > Listing without patch: > real 4m49.426s > user 4m37.993s > sys 0m0.280s > > Listing with patch: > real 0m0.558s > user 0m0.484s > sys 0m0.064s > > How is it solved?! > > The issue is solved introducing a new data structure, that allow us to > do binary search of chain names. Thus, reducing the worst-case runtime > to O(log n). > > Being more specific: > > The new data structure is called "chain index", which is an array with > pointers into the chain list, with CHAIN_INDEX_BUCKET_LEN spacing. > This facilitates the ability to speedup chain list searching, by find > a more optimal starting points when searching the linked list. > > The runtime complexity is actually also affected by this "bucket" size > concept. Thus, O(log(n/k) + k) where k is CHAIN_INDEX_BUCKET_LEN. > > A nice property of the chain index, is that the "bucket" list > length is max CHAIN_INDEX_BUCKET_LEN (when just build, inserts will > change this). Oppose to hashing, where the "bucket" list length can > vary a lot. > > Signed-off-by: Jesper Dangaard Brouer This looks fine and survives some basic testing, so I've applied it. Thanks a lot Jesper.