From: Eric Dumazet <dada1@cosmosbay.com>
To: Willy Tarreau <willy@w.ods.org>
Cc: Harald Welte <laforge@netfilter.org>,
netdev@vger.kernel.org, netfilter-devel@lists.netfilter.org,
linux-kernel@vger.kernel.org, Andi Kleen <ak@suse.de>
Subject: Re: [PATCH 2/3] netfilter : 3 patches to boost ip_tables performance
Date: Fri, 23 Sep 2005 07:14:24 +0200
Message-ID: <43338F30.6070601@cosmosbay.com>
In-Reply-To: <20050923040234.GC595@alpha.home.local>
Willy Tarreau wrote:
> On Thu, Sep 22, 2005 at 03:05:50PM +0200, Eric Dumazet wrote:
> (...)
>
>>It was necessary to get the best code with gcc-3.4.4 on i386 and
>>gcc-4.0.1 on x86_64
>>
>>For example :
>>
>>bool1 = FWINV(ret != 0, IPT_INV_VIA_OUT);
>>if (bool1) {
>>
>>gives a better code than :
>>
>>if (FWINV(ret != 0, IPT_INV_VIA_OUT)) {
>>
>>(one less conditional branch)
>>
>>Don't ask me why, it is shocking but true :(
>
>
> I also noticed many times that gcc's optimization of "if (complex condition)"
> is rather poor and it's often better to put it in a variable before. I even
> remember that if you use an intermediate variable, it can often generate a
> CMOV instruction on processors which support it, while it produces cond tests
> and jumps without the variable. Generally speaking, if you want fast code,
> you have to write it as a long sequence of small instructions, just as if
> you were writing assembly. As you said, shocking but true.
Even without CMOV support, the suggested patch helps :
Here is the code generated with gcc-3.4.4 on a pentium4 (i686) for :
/********************/
bool1 = ((ip->saddr&ipinfo->smsk.s_addr) != ipinfo->src.s_addr);
bool1 ^= !!(ipinfo->invflags & IPT_INV_SRCIP);
bool2 = ((ip->daddr&ipinfo->dmsk.s_addr) != ipinfo->dst.s_addr);
bool2 ^= !!(ipinfo->invflags & IPT_INV_DSTIP);
if ((bool1 | bool2) != 0) {
/********************/
cb: 0f b6 56 53 movzbl 0x53(%esi),%edx
cf: 8b 46 08 mov 0x8(%esi),%eax #ipinfo->smsk.s_addr
d2: 23 47 0c and 0xc(%edi),%eax #ip->saddr
d5: 0f b6 da movzbl %dl,%ebx
d8: 3b 06 cmp (%esi),%eax #ipinfo->src.s_addr
da: 88 55 cf mov %dl,0xffffffcf(%ebp)
dd: 89 da mov %ebx,%edx
df: 0f 95 c0 setne %al
e2: c1 ea 03 shr $0x3,%edx
e5: 31 c2 xor %eax,%edx
e7: 8b 46 0c mov 0xc(%esi),%eax #ipinfo->dmsk.s_addr
ea: 23 47 10 and 0x10(%edi),%eax #ip->daddr
ed: 3b 46 04 cmp 0x4(%esi),%eax #ipinfo->dst.s_addr
f0: 89 d8 mov %ebx,%eax
f2: 0f 95 c1 setne %cl
f5: c1 e8 04 shr $0x4,%eax
f8: 31 c8 xor %ecx,%eax
fa: 09 d0 or %edx,%eax
fc: a8 01 test $0x1,%al
fe: 0f 85 95 00 00 00 jne dest // only one conditional branch
As you can see, the whole sequence is rather good: only one conditional branch.
(There are no CMOV instructions, so even on an i486 the code should be
roughly the same.)
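For anyone who wants to try this outside the kernel, here is a minimal standalone sketch of the same pattern. The struct layouts and the names pkt_hdr, rule_ip, INV_SRCIP, INV_DSTIP and addr_mismatch are simplified, hypothetical stand-ins, not the kernel definitions; only the two flag values are taken from the disassembly (test $0x8 and test $0x10). Whether a given compiler actually emits a single branch for it has to be checked with objdump.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct iphdr and struct ipt_ip:
 * only the fields used by the address test are kept. */
struct pkt_hdr {
    uint32_t saddr;
    uint32_t daddr;
};

struct rule_ip {
    uint32_t src, dst;    /* addresses the rule matches against */
    uint32_t smsk, dmsk;  /* masks applied to the packet addresses */
    uint8_t  invflags;    /* inversion bits */
};

#define INV_SRCIP 0x08    /* same bit values as seen in the disassembly */
#define INV_DSTIP 0x10

/* Returns 1 when the packet addresses do NOT satisfy the rule, else 0.
 * Each test is stored in a variable and the two results are combined
 * with a bitwise OR, so no short-circuit evaluation is required. */
int addr_mismatch(const struct pkt_hdr *ip, const struct rule_ip *ipinfo)
{
    int bool1, bool2;

    bool1  = ((ip->saddr & ipinfo->smsk) != ipinfo->src);
    bool1 ^= !!(ipinfo->invflags & INV_SRCIP);
    bool2  = ((ip->daddr & ipinfo->dmsk) != ipinfo->dst);
    bool2 ^= !!(ipinfo->invflags & INV_DSTIP);

    return (bool1 | bool2) != 0;
}
```

Compiling this with -O2 and running objdump -d on the object file is one way to compare against the listing above; the result will likely differ between gcc versions.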
Now here is the code generated for the original code :
/********************/
if (FWINV((ip->saddr&ipinfo->smsk.s_addr) != ipinfo->src.s_addr,
IPT_INV_SRCIP)
|| FWINV((ip->daddr&ipinfo->dmsk.s_addr) != ipinfo->dst.s_addr,
IPT_INV_DSTIP)) {
/********************/
cb: 0f b6 4e 53 movzbl 0x53(%esi),%ecx
cf: f6 c1 08 test $0x8,%cl
d2: 0f 84 af 01 00 00 je 287 <ipt_do_table+0x25d>
d8: 8b 46 08 mov 0x8(%esi),%eax
db: 23 47 0c and 0xc(%edi),%eax
de: 3b 06 cmp (%esi),%eax
e0: 0f 84 b0 01 00 00 je 296 <ipt_do_table+0x26c>
e6: f6 c1 10 test $0x10,%cl
e9: 0f 84 d4 01 00 00 je 2c3 <ipt_do_table+0x299>
ef: 8b 46 0c mov 0xc(%esi),%eax
f2: 23 47 10 and 0x10(%edi),%eax
f5: 3b 46 04 cmp 0x4(%esi),%eax
f8: 0f 84 98 01 00 00 je 296 <ipt_do_table+0x26c>
...
287: 8b 46 08 mov 0x8(%esi),%eax
28a: 23 47 0c and 0xc(%edi),%eax
28d: 3b 06 cmp (%esi),%eax
28f: 2e 0f 84 50 fe ff ff je,pn e6 <ipt_do_table+0xbc>
296: 0f b7 46 5a movzwl 0x5a(%esi),%eax
29a: 01 c6 add %eax,%esi
29c: 8b 4d f0 mov 0xfffffff0(%ebp),%ecx
29f: 85 c9 test %ecx,%ecx
2a1: 0f 84 24 fe ff ff je cb <ipt_do_table+0xa1>
...
2c3: 8b 46 0c mov 0xc(%esi),%eax
2c6: 23 47 10 and 0x10(%edi),%eax
2c9: 3b 46 04 cmp 0x4(%esi),%eax
2cc: 75 c8 jne 296 <ipt_do_table+0x26c>
2ce: e9 2b fe ff ff jmp fe <ipt_do_table+0xd4>
/******************/
As you can see, that is a lot of conditional branches, which the CPU cannot
predict correctly unless consecutive iptables rules follow the same flow.
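Both forms compute the same truth value; only the generated code differs. Here is a small sketch that checks this exhaustively, assuming FWINV expands to the test XORed with the normalized inversion bit. The names FWINV_SKETCH, match_shortcircuit, match_bitwise and forms_agree are hypothetical and exist only for this illustration.

```c
#define INV_SRCIP 0x08
#define INV_DSTIP 0x10

/* Assumed shape of FWINV: the test XORed with the normalized invert bit. */
#define FWINV_SKETCH(cond, flags, bit) ((cond) ^ !!((flags) & (bit)))

/* Original style: the two tests are joined with a short-circuit ||. */
int match_shortcircuit(int src_ne, int dst_ne, unsigned int invflags)
{
    if (FWINV_SKETCH(src_ne, invflags, INV_SRCIP)
        || FWINV_SKETCH(dst_ne, invflags, INV_DSTIP))
        return 1;
    return 0;
}

/* Patched style: intermediate variables joined with a bitwise |. */
int match_bitwise(int src_ne, int dst_ne, unsigned int invflags)
{
    int bool1 = FWINV_SKETCH(src_ne, invflags, INV_SRCIP);
    int bool2 = FWINV_SKETCH(dst_ne, invflags, INV_DSTIP);

    return (bool1 | bool2) != 0;
}

/* Returns 1 if both forms agree for every combination of the two
 * comparison results and the two inversion bits, 0 otherwise. */
int forms_agree(void)
{
    static const unsigned int flags[] = {
        0, INV_SRCIP, INV_DSTIP, INV_SRCIP | INV_DSTIP
    };
    int a, b;
    unsigned int i;

    for (a = 0; a <= 1; a++)
        for (b = 0; b <= 1; b++)
            for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++)
                if (match_shortcircuit(a, b, flags[i])
                    != match_bitwise(a, b, flags[i]))
                    return 0;
    return 1;
}
```

The bitwise form always evaluates both tests. That is safe here because the operands are plain loads and compares with no side effects.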
Eric