From: David Miller
Subject: Re: [PATCH net-next-2.6 v2] filter: optimize sk_run_filter
Date: Fri, 19 Nov 2010 09:52:00 -0800 (PST)
Message-ID: <20101119.095200.59681766.davem@davemloft.net>
References: <1290172607.3034.124.camel@edumazet-laptop>
To: xiaosuo@gmail.com
Cc: eric.dumazet@gmail.com, hagen@jauu.net, netdev@vger.kernel.org

From: Changli Gao
Date: Fri, 19 Nov 2010 22:13:07 +0800

> On Fri, Nov 19, 2010 at 9:16 PM, Eric Dumazet wrote:
>> [PATCH net-next-2.6 v2] filter: optimize sk_run_filter
>>
>> Remove pc variable to avoid arithmetic to compute fentry at each filter
>> instruction. Jumps directly manipulate fentry pointer.
>>
>> As the last instruction of filter[] is guaranteed to be a RETURN, and
>> all jumps are before the last instruction, we dont need to check filter
>> bounds (number of instructions in filter array) at each iteration, so we
>> remove it from sk_run_filter() params.
>>
>> On x86_32 remove f_k var introduced in commit 57fe93b374a6b871
>> (filter: make sure filters dont read uninitialized memory)
>>
>> Note : We could use a CONFIG_ARCH_HAS_{FEW|MANY}_REGISTERS in order to
>> avoid too many ifdefs in this code.
>>
>> This helps compiler to use cpu registers to hold fentry and A
>> accumulator.
>>
>> On x86_32, this saves 401 bytes, and more important, sk_run_filter()
>> runs much faster because less register pressure (One less conditional
>> branch per BPF instruction)
>>
>> # size net/core/filter.o net/core/filter_pre.o
>>    text    data     bss     dec     hex filename
>>    2948       0       0    2948     b84 net/core/filter.o
>>    3349       0       0    3349     d15 net/core/filter_pre.o
>>
>> on x86_64 :
>> # size net/core/filter.o net/core/filter_pre.o
>>    text    data     bss     dec     hex filename
>>    5173       0       0    5173    1435 net/core/filter.o
>>    5224       0       0    5224    1468 net/core/filter_pre.o
>>
>> Signed-off-by: Eric Dumazet
>> Cc: Changli Gao
>> Cc: Hagen Paul Pfeifer
>
> Acked-by: Changli Gao

Ok, I'm applying this to net-next-2.6 for now.

It keeps the "f_k" situation optimal for all cases, on every platform
where I've taken a look at the asm output (sparc64, x86-32, x86-64).

I can't currently think of a way to get rid of that ifdef, so for
now it's a small price to pay to get this optimal.

Thanks Eric!
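
For anyone reading this thread later, the idea in the quoted patch text (drop the pc index, let jumps move the fentry pointer, and rely on the guaranteed trailing RETURN instead of a per-iteration bounds check) can be sketched roughly as below. This is only a toy illustration and not the code in net/core/filter.c: the opcodes, struct toy_insn, run_indexed(), run_pointer() and toy_prog are all hypothetical names made up for the example, and the sketch assumes the program was already validated so that every jump lands before the final return.

```c
#include <stdint.h>

/* Hypothetical toy opcodes; NOT the kernel's BPF_* encoding. */
enum { TOY_LD_IMM, TOY_ADD_IMM, TOY_JEQ, TOY_RET_A, TOY_RET_K };

/* Hypothetical instruction layout, loosely shaped like a classic filter entry. */
struct toy_insn {
	uint16_t code;
	uint8_t  jt;	/* extra instructions to skip when the test is true  */
	uint8_t  jf;	/* extra instructions to skip when the test is false */
	uint32_t k;	/* immediate operand */
};

/* "Before" shape: index variable, fentry recomputed and bounds checked each step. */
uint32_t run_indexed(const struct toy_insn *filter, int flen)
{
	uint32_t A = 0;
	int pc;

	for (pc = 0; pc < flen; pc++) {
		const struct toy_insn *fentry = &filter[pc];

		switch (fentry->code) {
		case TOY_LD_IMM:  A = fentry->k;  continue;
		case TOY_ADD_IMM: A += fentry->k; continue;
		case TOY_JEQ:
			pc += (A == fentry->k) ? fentry->jt : fentry->jf;
			continue;
		case TOY_RET_A:   return A;
		case TOY_RET_K:   return fentry->k;
		default:          return 0;
		}
	}
	return 0;
}

/*
 * "After" shape: no pc and no length argument. Jumps adjust fentry itself.
 * Assumes a validated program: last instruction is a return and all jumps
 * stay inside the array, so the loop needs no bounds test.
 */
uint32_t run_pointer(const struct toy_insn *fentry)
{
	uint32_t A = 0;

	for (;; fentry++) {
		switch (fentry->code) {
		case TOY_LD_IMM:  A = fentry->k;  continue;
		case TOY_ADD_IMM: A += fentry->k; continue;
		case TOY_JEQ:
			fentry += (A == fentry->k) ? fentry->jt : fentry->jf;
			continue;
		case TOY_RET_A:   return A;
		case TOY_RET_K:   return fentry->k;
		default:          return 0;
		}
	}
}

/* Small validated program: load 5, skip the "return 0" if A == 5, add 2, return A. */
const struct toy_insn toy_prog[] = {
	{ TOY_LD_IMM,  0, 0, 5 },
	{ TOY_JEQ,     1, 0, 5 },
	{ TOY_RET_K,   0, 0, 0 },
	{ TOY_ADD_IMM, 0, 0, 2 },
	{ TOY_RET_A,   0, 0, 0 },
};
```

Both loops give the same result on toy_prog; the pointer version simply has no index arithmetic and no length test in the loop, which is where the patch description says the saved branch and the lower register pressure come from.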