netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: Willy Tarreau <willy@w.ods.org>
Cc: Harald Welte <laforge@netfilter.org>,
	netdev@vger.kernel.org, netfilter-devel@lists.netfilter.org,
	linux-kernel@vger.kernel.org, Andi Kleen <ak@suse.de>
Subject: Re: [PATCH 2/3] netfilter : 3 patches to boost ip_tables performance
Date: Fri, 23 Sep 2005 07:14:24 +0200	[thread overview]
Message-ID: <43338F30.6070601@cosmosbay.com> (raw)
In-Reply-To: <20050923040234.GC595@alpha.home.local>

Willy Tarreau a écrit :
> On Thu, Sep 22, 2005 at 03:05:50PM +0200, Eric Dumazet wrote:
> (...) 
> 
>>It was necessary to get the best code with gcc-3.4.4 on i386 and 
>>gcc-4.0.1 on x86_64
>>
>>For example :
>>
>>bool1 = FWINV(ret != 0, IPT_INV_VIA_OUT);
>>if (bool1) {
>>
>>gives a better code than :
>>
>>if (FWINV(ret != 0, IPT_INV_VIA_OUT)) {
>>
>>(one less conditional branch)
>>
>>Dont ask me why, it is shocking but true :(
> 
> 
> I also noticed many times that gcc's optimization of "if (complex condition)"
> is rather poor and it's often better to put it in a variable before. I even
> remember that if you use an intermediate variable, it can often generate a
> CMOV instruction on processors which support it, while it produces cond tests
> and jumps without the variable. Generally speaking, if you want fast code,
> you have to write it as a long sequence of small instructions, just as if
> you were writing assembly. As you said, shocking but true.

Even without CMOV support, the suggested patch helps :

Here is the code generated with gcc-3.4.4  on a pentium4 (i686) for :

/********************/
bool1 = ((ip->saddr&ipinfo->smsk.s_addr) != ipinfo->src.s_addr);
bool1 ^= !!(ipinfo->invflags & IPT_INV_SRCIP);

bool2 = ((ip->daddr&ipinfo->dmsk.s_addr) != ipinfo->dst.s_addr);
bool2 ^= !!(ipinfo->invflags & IPT_INV_DSTIP);

if ((bool1 | bool2) != 0) {

/********************/
cb:       0f b6 56 53          movzbl 0x53(%esi),%edx
cf:       8b 46 08             mov    0x8(%esi),%eax #ip->saddr
d2:       23 47 0c             and    0xc(%edi),%eax #ipinfo->smsk.s_addr
d5:       0f b6 da             movzbl %dl,%ebx
d8:       3b 06                cmp    (%esi),%eax #ipinfo->src.s_addr
da:       88 55 cf             mov    %dl,0xffffffcf(%ebp)
dd:       89 da                mov    %ebx,%edx
df:       0f 95 c0             setne  %al
e2:       c1 ea 03             shr    $0x3,%edx
e5:       31 c2                xor    %eax,%edx
e7:       8b 46 0c             mov    0xc(%esi),%eax #ip->daddr&ipinfo
ea:       23 47 10             and    0x10(%edi),%eax #ipinfo->dmsk.s_addr
ed:       3b 46 04             cmp    0x4(%esi),%eax #ipinfo->dst.s_addr
f0:       89 d8                mov    %ebx,%eax
f2:       0f 95 c1             setne  %cl
f5:       c1 e8 04             shr    $0x4,%eax
f8:       31 c8                xor    %ecx,%eax
fa:       09 d0                or     %edx,%eax
fc:       a8 01                test   $0x1,%al
fe:       0f 85 95 00 00 00    jne    dest // only one conditional branch

As you can see the whole sequence is rather good : only one conditional branch
(No CMOV instructions as you can see, so even on a i486 the code should be 
roughly the same)

Now here is the code generated for the original code :
/********************/
	if (FWINV((ip->saddr&ipinfo->smsk.s_addr) != ipinfo->src.s_addr,
			IPT_INV_SRCIP)
	  || FWINV((ip->daddr&ipinfo->dmsk.s_addr) != ipinfo->dst.s_addr,
			IPT_INV_DSTIP)) {
/********************/
       cb:       0f b6 4e 53             movzbl 0x53(%esi),%ecx
       cf:       f6 c1 08                test   $0x8,%cl
       d2:       0f 84 af 01 00 00       je     287 <ipt_do_table+0x25d>
       d8:       8b 46 08                mov    0x8(%esi),%eax
       db:       23 47 0c                and    0xc(%edi),%eax
       de:       3b 06                   cmp    (%esi),%eax
       e0:       0f 84 b0 01 00 00       je     296 <ipt_do_table+0x26c>
       e6:       f6 c1 10                test   $0x10,%cl
       e9:       0f 84 d4 01 00 00       je     2c3 <ipt_do_table+0x299>
       ef:       8b 46 0c                mov    0xc(%esi),%eax
       f2:       23 47 10                and    0x10(%edi),%eax
       f5:       3b 46 04                cmp    0x4(%esi),%eax
       f8:       0f 84 98 01 00 00       je     296 <ipt_do_table+0x26c>

...

      287:       8b 46 08                mov    0x8(%esi),%eax
      28a:       23 47 0c                and    0xc(%edi),%eax
      28d:       3b 06                   cmp    (%esi),%eax
      28f:       2e 0f 84 50 fe ff ff    je,pn  e6 <ipt_do_table+0xbc>
      296:       0f b7 46 5a             movzwl 0x5a(%esi),%eax
      29a:       01 c6                   add    %eax,%esi
      29c:       8b 4d f0                mov    0xfffffff0(%ebp),%ecx
      29f:       85 c9                   test   %ecx,%ecx
      2a1:       0f 84 24 fe ff ff       je     cb <ipt_do_table+0xa1>

...

      2c3:       8b 46 0c                mov    0xc(%esi),%eax
      2c6:       23 47 10                and    0x10(%edi),%eax
      2c9:       3b 46 04                cmp    0x4(%esi),%eax
      2cc:       75 c8                   jne    296 <ipt_do_table+0x26c>
      2ce:       e9 2b fe ff ff          jmp    fe <ipt_do_table+0xd4>


/******************/
	As you can see, that a lot of conditional branches, that cannot be predicted 
correctly by the cpu, unless consecutives iptables rules generate the same flow.


Eric

  reply	other threads:[~2005-09-23  5:14 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-09-19 17:09 [PATCH, netfilter] NUMA aware ipv4/netfilter/ip_tables.c Eric dumazet
2005-09-19 17:20 ` Eric Dumazet
2005-09-19 17:48 ` Andi Kleen
2005-09-19 19:09   ` Eric Dumazet
2005-09-20  9:47   ` Eric Dumazet
2005-09-20 16:30     ` Andi Kleen
2005-09-20 17:02       ` Eric Dumazet
2005-09-20 21:45       ` [PATCH] Adds sys_set_mempolicy() in include/linux/syscalls.h , " Eric Dumazet
2005-09-20 21:46         ` [PATCH] Adds sys_set_mempolicy() in include/linux/syscalls.h Eric Dumazet
2005-09-21 21:24           ` [PATCH 0/3] netfilter : 3 patches to boost ip_tables performance Eric Dumazet
2005-09-21 22:43             ` Christoph Lameter
2005-09-22  0:34               ` David S. Miller
2005-09-22  1:44                 ` Christoph Lameter
2005-09-22 12:11                   ` Eric Dumazet
2005-09-22 12:49                     ` Christoph Hellwig
2005-09-22 12:54                       ` Andi Kleen
2005-09-22 12:58                         ` Christoph Hellwig
2005-09-22 13:05                           ` Andi Kleen
2005-09-22 15:37                             ` Christoph Lameter
2005-09-22 15:50                               ` Eric Dumazet
2005-09-22 15:55                                 ` Christoph Lameter
2005-09-23 17:11                                 ` Harald Welte
2005-09-23 17:44                                   ` Christoph Lameter
2005-09-23 18:04                                     ` Dave Hansen
2005-09-23 17:47                                   ` Eric Dumazet
2005-09-23 18:00                                     ` Kyle Moffett
2005-09-22  4:18             ` James Morris
2005-09-22  5:07               ` Eric Dumazet
2005-09-22 13:03             ` Andi Kleen
2005-09-22 13:30               ` Eric Dumazet
2005-09-23 17:09               ` Harald Welte
2005-09-27 16:23                 ` Andi Kleen
2005-09-28  0:25                   ` Henrik Nordstrom
2005-09-28  8:32                     ` Harald Welte
2005-09-28  8:37                       ` Andi Kleen
2005-10-04 17:01                         ` Patrick McHardy
2005-10-05 16:53                           ` Andi Kleen
2005-10-07  2:38                             ` Harald Welte
2005-10-06 17:59                               ` Andi Kleen
2005-10-07 17:08                                 ` Patrick McHardy
2005-10-07 17:21                                   ` Andi Kleen
2005-10-07 17:50                                     ` Patrick McHardy
2005-09-28 10:34                       ` Henrik Nordstrom
2005-11-25 11:23             ` [PATCH] netfilter : zap get_cpu()/put_cpu() calls from ip_tables Eric Dumazet
2005-11-25 11:28               ` [PATCH (resent with the attachment !)] " Eric Dumazet
2005-11-25 18:20                 ` Patrick McHardy
2005-09-21 21:29           ` [PATCH 1/3] netfilter : 3 patches to boost ip_tables performance Eric Dumazet
2005-09-22 12:57             ` Harald Welte
2005-09-22 13:17               ` Eric Dumazet
2005-09-21 21:32           ` [PATCH 2/3] " Eric Dumazet
2005-09-22 12:48             ` Harald Welte
2005-09-22 13:05               ` Eric Dumazet
2005-09-23  4:02                 ` Willy Tarreau
2005-09-23  5:14                   ` Eric Dumazet [this message]
2005-09-23 11:33                     ` Willy Tarreau
2005-09-23 14:00                   ` Tim Mattox
2005-09-21 21:37           ` [PATCH 3/3] " Eric Dumazet
2005-09-22 12:50             ` Harald Welte

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=43338F30.6070601@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=ak@suse.de \
    --cc=laforge@netfilter.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@lists.netfilter.org \
    --cc=willy@w.ods.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).