All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: Willy Tarreau <willy@w.ods.org>
Cc: Harald Welte <laforge@netfilter.org>,
	netdev@vger.kernel.org, netfilter-devel@lists.netfilter.org,
	linux-kernel@vger.kernel.org, Andi Kleen <ak@suse.de>
Subject: Re: [PATCH 2/3] netfilter : 3 patches to boost ip_tables performance
Date: Fri, 23 Sep 2005 07:14:24 +0200	[thread overview]
Message-ID: <43338F30.6070601@cosmosbay.com> (raw)
In-Reply-To: <20050923040234.GC595@alpha.home.local>

Willy Tarreau a écrit :
> On Thu, Sep 22, 2005 at 03:05:50PM +0200, Eric Dumazet wrote:
> (...) 
> 
>>It was necessary to get the best code with gcc-3.4.4 on i386 and 
>>gcc-4.0.1 on x86_64
>>
>>For example :
>>
>>bool1 = FWINV(ret != 0, IPT_INV_VIA_OUT);
>>if (bool1) {
>>
>>gives a better code than :
>>
>>if (FWINV(ret != 0, IPT_INV_VIA_OUT)) {
>>
>>(one less conditional branch)
>>
>>Dont ask me why, it is shocking but true :(
> 
> 
> I also noticed many times that gcc's optimization of "if (complex condition)"
> is rather poor and it's often better to put it in a variable before. I even
> remember that if you use an intermediate variable, it can often generate a
> CMOV instruction on processors which support it, while it produces cond tests
> and jumps without the variable. Generally speaking, if you want fast code,
> you have to write it as a long sequence of small instructions, just as if
> you were writing assembly. As you said, shocking but true.

Even without CMOV support, the suggested patch helps :

Here is the code generated with gcc-3.4.4  on a pentium4 (i686) for :

/********************/
bool1 = ((ip->saddr&ipinfo->smsk.s_addr) != ipinfo->src.s_addr);
bool1 ^= !!(ipinfo->invflags & IPT_INV_SRCIP);

bool2 = ((ip->daddr&ipinfo->dmsk.s_addr) != ipinfo->dst.s_addr);
bool2 ^= !!(ipinfo->invflags & IPT_INV_DSTIP);

if ((bool1 | bool2) != 0) {

/********************/
cb:       0f b6 56 53          movzbl 0x53(%esi),%edx
cf:       8b 46 08             mov    0x8(%esi),%eax #ip->saddr
d2:       23 47 0c             and    0xc(%edi),%eax #ipinfo->smsk.s_addr
d5:       0f b6 da             movzbl %dl,%ebx
d8:       3b 06                cmp    (%esi),%eax #ipinfo->src.s_addr
da:       88 55 cf             mov    %dl,0xffffffcf(%ebp)
dd:       89 da                mov    %ebx,%edx
df:       0f 95 c0             setne  %al
e2:       c1 ea 03             shr    $0x3,%edx
e5:       31 c2                xor    %eax,%edx
e7:       8b 46 0c             mov    0xc(%esi),%eax #ip->daddr&ipinfo
ea:       23 47 10             and    0x10(%edi),%eax #ipinfo->dmsk.s_addr
ed:       3b 46 04             cmp    0x4(%esi),%eax #ipinfo->dst.s_addr
f0:       89 d8                mov    %ebx,%eax
f2:       0f 95 c1             setne  %cl
f5:       c1 e8 04             shr    $0x4,%eax
f8:       31 c8                xor    %ecx,%eax
fa:       09 d0                or     %edx,%eax
fc:       a8 01                test   $0x1,%al
fe:       0f 85 95 00 00 00    jne    dest // only one conditional branch

As you can see the whole sequence is rather good : only one conditional branch
(No CMOV instructions as you can see, so even on a i486 the code should be 
roughly the same)

Now here is the code generated for the original code :
/********************/
	if (FWINV((ip->saddr&ipinfo->smsk.s_addr) != ipinfo->src.s_addr,
			IPT_INV_SRCIP)
	  || FWINV((ip->daddr&ipinfo->dmsk.s_addr) != ipinfo->dst.s_addr,
			IPT_INV_DSTIP)) {
/********************/
       cb:       0f b6 4e 53             movzbl 0x53(%esi),%ecx
       cf:       f6 c1 08                test   $0x8,%cl
       d2:       0f 84 af 01 00 00       je     287 <ipt_do_table+0x25d>
       d8:       8b 46 08                mov    0x8(%esi),%eax
       db:       23 47 0c                and    0xc(%edi),%eax
       de:       3b 06                   cmp    (%esi),%eax
       e0:       0f 84 b0 01 00 00       je     296 <ipt_do_table+0x26c>
       e6:       f6 c1 10                test   $0x10,%cl
       e9:       0f 84 d4 01 00 00       je     2c3 <ipt_do_table+0x299>
       ef:       8b 46 0c                mov    0xc(%esi),%eax
       f2:       23 47 10                and    0x10(%edi),%eax
       f5:       3b 46 04                cmp    0x4(%esi),%eax
       f8:       0f 84 98 01 00 00       je     296 <ipt_do_table+0x26c>

...

      287:       8b 46 08                mov    0x8(%esi),%eax
      28a:       23 47 0c                and    0xc(%edi),%eax
      28d:       3b 06                   cmp    (%esi),%eax
      28f:       2e 0f 84 50 fe ff ff    je,pn  e6 <ipt_do_table+0xbc>
      296:       0f b7 46 5a             movzwl 0x5a(%esi),%eax
      29a:       01 c6                   add    %eax,%esi
      29c:       8b 4d f0                mov    0xfffffff0(%ebp),%ecx
      29f:       85 c9                   test   %ecx,%ecx
      2a1:       0f 84 24 fe ff ff       je     cb <ipt_do_table+0xa1>

...

      2c3:       8b 46 0c                mov    0xc(%esi),%eax
      2c6:       23 47 10                and    0x10(%edi),%eax
      2c9:       3b 46 04                cmp    0x4(%esi),%eax
      2cc:       75 c8                   jne    296 <ipt_do_table+0x26c>
      2ce:       e9 2b fe ff ff          jmp    fe <ipt_do_table+0xd4>


/******************/
	As you can see, that a lot of conditional branches, that cannot be predicted 
correctly by the cpu, unless consecutives iptables rules generate the same flow.


Eric

  reply	other threads:[~2005-09-23  5:14 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-09-19 17:09 [PATCH, netfilter] NUMA aware ipv4/netfilter/ip_tables.c Eric dumazet
2005-09-19 17:20 ` Eric Dumazet
2005-09-19 17:48 ` Andi Kleen
2005-09-19 19:09   ` Eric Dumazet
2005-09-20  9:47   ` Eric Dumazet
2005-09-20 16:30     ` Andi Kleen
2005-09-20 17:02       ` Eric Dumazet
2005-09-20 21:45       ` [PATCH] Adds sys_set_mempolicy() in include/linux/syscalls.h , " Eric Dumazet
2005-09-20 21:45         ` Eric Dumazet
2005-09-20 21:46         ` [PATCH] Adds sys_set_mempolicy() in include/linux/syscalls.h Eric Dumazet
2005-09-21 21:24           ` [PATCH 0/3] netfilter : 3 patches to boost ip_tables performance Eric Dumazet
2005-09-21 22:43             ` Christoph Lameter
2005-09-22  0:34               ` David S. Miller
2005-09-22  0:34                 ` David S. Miller
2005-09-22  1:44                 ` Christoph Lameter
2005-09-22 12:11                   ` Eric Dumazet
2005-09-22 12:11                     ` Eric Dumazet
2005-09-22 12:49                     ` Christoph Hellwig
2005-09-22 12:54                       ` Andi Kleen
2005-09-22 12:58                         ` Christoph Hellwig
2005-09-22 13:05                           ` Andi Kleen
2005-09-22 15:37                             ` Christoph Lameter
2005-09-22 15:50                               ` Eric Dumazet
2005-09-22 15:50                                 ` Eric Dumazet
2005-09-22 15:55                                 ` Christoph Lameter
2005-09-23 17:11                                 ` Harald Welte
2005-09-23 17:44                                   ` Christoph Lameter
2005-09-23 18:04                                     ` Dave Hansen
2005-09-26 17:58                                       ` vmalloc_node Christoph Lameter
2005-09-26 18:10                                         ` vmalloc_node Dave Hansen
2005-09-23 17:47                                   ` [PATCH 0/3] netfilter : 3 patches to boost ip_tables performance Eric Dumazet
2005-09-23 18:00                                     ` Kyle Moffett
2005-09-22  4:18             ` James Morris
2005-09-22  4:18               ` James Morris
2005-09-22  5:07               ` Eric Dumazet
2005-09-22 13:03             ` Andi Kleen
2005-09-22 13:30               ` Eric Dumazet
2005-09-23 17:09               ` Harald Welte
2005-09-27 16:23                 ` Andi Kleen
2005-09-28  0:25                   ` Henrik Nordstrom
2005-09-28  8:32                     ` Harald Welte
2005-09-28  8:32                       ` Harald Welte
2005-09-28  8:37                       ` Andi Kleen
2005-09-28  8:37                         ` Andi Kleen
2005-10-04 17:01                         ` Patrick McHardy
2005-10-05 16:53                           ` Andi Kleen
2005-10-07  2:38                             ` Harald Welte
2005-10-06 17:59                               ` Andi Kleen
2005-10-07 17:08                                 ` Patrick McHardy
2005-10-07 17:21                                   ` Andi Kleen
2005-10-07 17:50                                     ` Patrick McHardy
2005-09-28 10:34                       ` Henrik Nordstrom
2005-09-28 10:34                         ` Henrik Nordstrom
2005-11-25 11:23             ` [PATCH] netfilter : zap get_cpu()/put_cpu() calls from ip_tables Eric Dumazet
2005-11-25 11:28               ` [PATCH (resent with the attachment !)] " Eric Dumazet
2005-11-25 18:20                 ` Patrick McHardy
2005-09-21 21:29           ` [PATCH 1/3] netfilter : 3 patches to boost ip_tables performance Eric Dumazet
2005-09-22 12:57             ` Harald Welte
2005-09-22 12:57               ` Harald Welte
2005-09-22 13:17               ` Eric Dumazet
2005-09-22 13:17                 ` Eric Dumazet
2005-09-21 21:32           ` [PATCH 2/3] " Eric Dumazet
2005-09-22 12:48             ` Harald Welte
2005-09-22 12:48               ` Harald Welte
2005-09-22 13:05               ` Eric Dumazet
2005-09-23  4:02                 ` Willy Tarreau
2005-09-23  5:14                   ` Eric Dumazet [this message]
2005-09-23 11:33                     ` Willy Tarreau
2005-09-23 14:00                   ` Tim Mattox
2005-09-21 21:37           ` [PATCH 3/3] " Eric Dumazet
2005-09-22 12:50             ` Harald Welte
2005-09-22 12:50               ` Harald Welte

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=43338F30.6070601@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=ak@suse.de \
    --cc=laforge@netfilter.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@lists.netfilter.org \
    --cc=willy@w.ods.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.