* [RFC][PATCH] optimise iptables interface matching
@ 2007-05-24 5:55 Philip Craig
2007-05-24 17:43 ` Patrick McHardy
2007-05-25 0:44 ` Yasuyuki KOZAKAI
0 siblings, 2 replies; 14+ messages in thread
From: Philip Craig @ 2007-05-24 5:55 UTC (permalink / raw)
To: Netfilter Developer Mailing List
[-- Attachment #1: Type: text/plain, Size: 1022 bytes --]
Optimise iptables for rules that specify 0 or 1 interface matches,
which is the more common case (at least for my rulesets).
Below are the oprofile cpu cycle percentages from a 30 second
period of an iperf throughput test on a 667MHz IXP465 with
Realtek 8169 network interfaces.
rules iface % cpu before % cpu after saving (adjusted)
0 7.7662 4.9191 2.8471
10 0 15.9798 9.8453 3.2874
20 0 23.6914 14.2051 6.6392
10 1 14.6068 11.7332 0.0265
20 1 21.1646 17.1905 1.1270
10 2 14.6497 13.0306 -1.2280
20 2 21.1647 20.3536 -2.0360
- saving with 0 rules is due to policies
- adjusted saving means with the 0 rules saving subtracted
- iface 0 means "iptables -I FORWARD"
- iface 1 means "iptables -I FORWARD -i eth0"
- iface 2 means "iptables -I FORWARD -i eth0 -o eth1"
If you think this is an acceptable approach then I can update
the patch for IPv6. Any suggestions for other parts of
netfilter/iptables to look at optimising are also welcome.
Signed-off-by: Philip Craig <philipc@snapgear.com>
[-- Attachment #2: iptables-iface-optim.patch --]
[-- Type: text/plain, Size: 4134 bytes --]
--- linux-2.6.x/include/linux/netfilter_ipv4/ip_tables.h 26 Apr 2007 13:13:50 -0000 1.2
+++ linux-2.6.x/include/linux/netfilter_ipv4/ip_tables.h 24 May 2007 04:46:59 -0000
@@ -63,6 +63,10 @@ struct ipt_ip {
#define IPT_F_GOTO 0x02 /* Set if jump is a goto */
#define IPT_F_MASK 0x03 /* All possible flag bits mask. */
+/* Internal values used for optimisation */
+#define IPT_F_VIA_IN 0x10 /* Set if rule has iif match */
+#define IPT_F_VIA_OUT 0x20 /* Set if rule has oif match */
+
/* Values for "inv" field in struct ipt_ip. */
#define IPT_INV_VIA_IN 0x01 /* Invert the sense of IN IFACE. */
#define IPT_INV_VIA_OUT 0x02 /* Invert the sense of OUT IFACE */
--- linux-2.6.x/net/ipv4/netfilter/ip_tables.c 26 Apr 2007 11:17:49 -0000 1.1.1.29
+++ linux-2.6.x/net/ipv4/netfilter/ip_tables.c 24 May 2007 04:46:59 -0000
@@ -112,30 +112,34 @@ ip_packet_match(const struct iphdr *ip,
}
/* Look for ifname matches; this should unroll nicely. */
- for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
- ret |= (((const unsigned long *)indev)[i]
- ^ ((const unsigned long *)ipinfo->iniface)[i])
- & ((const unsigned long *)ipinfo->iniface_mask)[i];
- }
+ if (ipinfo->flags & IPT_F_VIA_IN) {
+ for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
+ ret |= (((const unsigned long *)indev)[i]
+ ^ ((const unsigned long *)ipinfo->iniface)[i])
+ & ((const unsigned long *)ipinfo->iniface_mask)[i];
+ }
- if (FWINV(ret != 0, IPT_INV_VIA_IN)) {
- dprintf("VIA in mismatch (%s vs %s).%s\n",
- indev, ipinfo->iniface,
- ipinfo->invflags&IPT_INV_VIA_IN ?" (INV)":"");
- return 0;
+ if (FWINV(ret != 0, IPT_INV_VIA_IN)) {
+ dprintf("VIA in mismatch (%s vs %s).%s\n",
+ indev, ipinfo->iniface,
+ ipinfo->invflags&IPT_INV_VIA_IN ?" (INV)":"");
+ return 0;
+ }
}
- for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
- ret |= (((const unsigned long *)outdev)[i]
- ^ ((const unsigned long *)ipinfo->outiface)[i])
- & ((const unsigned long *)ipinfo->outiface_mask)[i];
- }
+ if (ipinfo->flags & IPT_F_VIA_OUT) {
+ for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
+ ret |= (((const unsigned long *)outdev)[i]
+ ^ ((const unsigned long *)ipinfo->outiface)[i])
+ & ((const unsigned long *)ipinfo->outiface_mask)[i];
+ }
- if (FWINV(ret != 0, IPT_INV_VIA_OUT)) {
- dprintf("VIA out mismatch (%s vs %s).%s\n",
- outdev, ipinfo->outiface,
- ipinfo->invflags&IPT_INV_VIA_OUT ?" (INV)":"");
- return 0;
+ if (FWINV(ret != 0, IPT_INV_VIA_OUT)) {
+ dprintf("VIA out mismatch (%s vs %s).%s\n",
+ outdev, ipinfo->outiface,
+ ipinfo->invflags&IPT_INV_VIA_OUT ?" (INV)":"");
+ return 0;
+ }
}
/* Check specific protocol */
@@ -159,13 +163,21 @@ ip_packet_match(const struct iphdr *ip,
}
static inline int
-ip_checkentry(const struct ipt_ip *ip)
+ip_checkentry(struct ipt_ip *ip)
{
+ size_t i;
+
if (ip->flags & ~IPT_F_MASK) {
duprintf("Unknown flag bits set: %08X\n",
ip->flags & ~IPT_F_MASK);
return 0;
}
+ for (i = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
+ if (ip->iniface_mask[i])
+ ip->flags |= IPT_F_VIA_IN;
+ if (ip->outiface_mask[i])
+ ip->flags |= IPT_F_VIA_OUT;
+ }
if (ip->invflags & ~IPT_INV_MASK) {
duprintf("Unknown invflag bits set: %08X\n",
ip->invflags & ~IPT_INV_MASK);
@@ -869,8 +881,9 @@ copy_entries_to_user(unsigned int total_
}
/* FIXME: use iterator macros --RR */
- /* ... then go back and fix counters and names */
+ /* ... then go back and fix counters, flags, and names */
for (off = 0, num = 0; off < total_size; off += e->next_offset, num++){
+ u_int8_t flags;
unsigned int i;
struct ipt_entry_match *m;
struct ipt_entry_target *t;
@@ -884,6 +897,15 @@ copy_entries_to_user(unsigned int total_
goto free_counters;
}
+ flags = e->ip.flags & IPT_F_MASK;
+ if (copy_to_user(userptr + off
+ + offsetof(struct ipt_entry, ip.flags),
+ &flags,
+ sizeof(flags)) != 0) {
+ ret = -EFAULT;
+ goto free_counters;
+ }
+
for (i = sizeof(struct ipt_entry);
i < e->target_offset;
i += m->u.match_size) {
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-05-24 5:55 [RFC][PATCH] optimise iptables interface matching Philip Craig
@ 2007-05-24 17:43 ` Patrick McHardy
2007-05-24 23:07 ` Philip Craig
2007-05-25 0:44 ` Yasuyuki KOZAKAI
1 sibling, 1 reply; 14+ messages in thread
From: Patrick McHardy @ 2007-05-24 17:43 UTC (permalink / raw)
To: Philip Craig; +Cc: Netfilter Developer Mailing List
Philip Craig wrote:
> Optimise iptables for rules that specify 0 or 1 interface matches,
> which is the more common case (at least for my rulesets).
>
> Below are the oprofile cpu cycle percentages from a 30 second
> period of an iperf throughput test on a 667MHz IXP465 with
> Realtek 8169 network interfaces.
>
> rules iface % cpu before % cpu after saving (adjusted)
> 0 7.7662 4.9191 2.8471
> 10 0 15.9798 9.8453 3.2874
> 20 0 23.6914 14.2051 6.6392
> 10 1 14.6068 11.7332 0.0265
> 20 1 21.1646 17.1905 1.1270
> 10 2 14.6497 13.0306 -1.2280
> 20 2 21.1647 20.3536 -2.0360
>
> - saving with 0 rules is due to policies
> - adjusted saving means with the 0 rules saving subtracted
> - iface 0 means "iptables -I FORWARD"
> - iface 1 means "iptables -I FORWARD -i eth0"
> - iface 2 means "iptables -I FORWARD -i eth0 -o eth1"
>
> If you think this is an acceptable approach then I can update
> the patch for IPv6. Any suggestions for other parts of
> netfilter/iptables to look at optimising are also welcome.
I don't like the kernel-internal fiddling with the flags too
much, but I don't see a way around it. But even if there is
no other way,
> @@ -884,6 +897,15 @@ copy_entries_to_user(unsigned int total_
> goto free_counters;
> }
>
> + flags = e->ip.flags & IPT_F_MASK;
> + if (copy_to_user(userptr + off
> + + offsetof(struct ipt_entry, ip.flags),
> + &flags,
> + sizeof(flags)) != 0) {
> + ret = -EFAULT;
> + goto free_counters;
> + }
> +
userspace should just ignore unknown flags.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-05-24 17:43 ` Patrick McHardy
@ 2007-05-24 23:07 ` Philip Craig
2007-05-26 8:47 ` Patrick McHardy
0 siblings, 1 reply; 14+ messages in thread
From: Philip Craig @ 2007-05-24 23:07 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Netfilter Developer Mailing List
Patrick McHardy wrote:
> I don't like the kernel-internal fiddling with the flags too
> much, but I don't see a way around it.
The other idea I had was moving the interface matching into
an internal match that would be checked by IPT_MATCH_ITERATE().
Not sure if this is feasible yet.
> userspace should just ignore unknown flags.
I was trying to completely hide them from userspace so that we
still have the option to use them for something else later on.
If we ever send them to userspace, then they are taken forever,
otherwise a newer iptables userspace may not work with an older
kernel.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-05-24 5:55 [RFC][PATCH] optimise iptables interface matching Philip Craig
2007-05-24 17:43 ` Patrick McHardy
@ 2007-05-25 0:44 ` Yasuyuki KOZAKAI
2007-05-25 0:56 ` Philip Craig
1 sibling, 1 reply; 14+ messages in thread
From: Yasuyuki KOZAKAI @ 2007-05-25 0:44 UTC (permalink / raw)
To: philipc; +Cc: netfilter-devel
Hi, Philip,
From: Philip Craig <philipc@snapgear.com>
Date: Thu, 24 May 2007 15:55:23 +1000
> Optimise iptables for rules that specify 0 or 1 interface matches,
> which is the more common case (at least for my rulesets).
> /* Look for ifname matches; this should unroll nicely. */
> - for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
> - ret |= (((const unsigned long *)indev)[i]
> - ^ ((const unsigned long *)ipinfo->iniface)[i])
> - & ((const unsigned long *)ipinfo->iniface_mask)[i];
> - }
> + if (ipinfo->flags & IPT_F_VIA_IN) {
> + for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
> + ret |= (((const unsigned long *)indev)[i]
> + ^ ((const unsigned long *)ipinfo->iniface)[i])
> + & ((const unsigned long *)ipinfo->iniface_mask)[i];
> + }
This breaks binary compatibility. The current userspace programs do not set
this bit. And ip_checkentry() in all kernel before you change will reject
unknown flags from new userspace programs.
Actually, we cannot add a new flag to 'struct ipt_ip'. It does not have
revision field. Unfortunately it has no field such as name[] in
'struct xt_entry_match' to steal one octet for revision.
I expect that this optimization will be done when we introduce netlink
interface.
-- Yasuyuki Kozakai
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-05-25 0:44 ` Yasuyuki KOZAKAI
@ 2007-05-25 0:56 ` Philip Craig
2007-05-25 4:11 ` Yasuyuki KOZAKAI
2007-05-26 9:20 ` Patrick McHardy
0 siblings, 2 replies; 14+ messages in thread
From: Philip Craig @ 2007-05-25 0:56 UTC (permalink / raw)
To: Yasuyuki KOZAKAI; +Cc: netfilter-devel
Yasuyuki KOZAKAI wrote:
> This breaks binary compatibility. The current userspace programs do not set
> this bit. And ip_checkentry() in all kernel before you change will reject
> unknown flags from new userspace programs.
Userspace doesn't need to set it because the kernel can derive it
automatically (which I've done in ip_checkentry).
Basically I just needed two bits of internal state, and the flags
field looked like a convenient place for this. Userspace should
be completely ignorant of these bits.
> Actually, we cannot add a new flag to 'struct ipt_ip'. It does not have
> revision field. Unfortunately it has no field such as name[] in
> 'struct xt_entry_match' to steal one octet for revision.
If we can never add new flags, then that would be a reason for me to
not bother with clearing the internal bits before sending to userspace.
> I expect that this optimization will be done when we introduce netlink
> interface.
Yes, I'm happy to just keep a local patch until then if you don't think
it is worth it.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-05-25 0:56 ` Philip Craig
@ 2007-05-25 4:11 ` Yasuyuki KOZAKAI
2007-05-26 9:20 ` Patrick McHardy
1 sibling, 0 replies; 14+ messages in thread
From: Yasuyuki KOZAKAI @ 2007-05-25 4:11 UTC (permalink / raw)
To: philipc; +Cc: netfilter-devel, yasuyuki.kozakai
From: Philip Craig <philipc@snapgear.com>
Date: Fri, 25 May 2007 10:56:10 +1000
> Yasuyuki KOZAKAI wrote:
> > This breaks binary compatibility. The current userspace programs do not set
> > this bit. And ip_checkentry() in all kernel before you change will reject
> > unknown flags from new userspace programs.
>
> Userspace doesn't need to set it because the kernel can derive it
> automatically (which I've done in ip_checkentry).
> Basically I just needed two bits of internal state, and the flags
> field looked like a convenient place for this. Userspace should
> be completely ignorant of these bits.
Ah, indeed, thanks for explanation.
-- Yasuyuki Kozakai
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-05-24 23:07 ` Philip Craig
@ 2007-05-26 8:47 ` Patrick McHardy
0 siblings, 0 replies; 14+ messages in thread
From: Patrick McHardy @ 2007-05-26 8:47 UTC (permalink / raw)
To: Philip Craig; +Cc: Netfilter Developer Mailing List
Philip Craig wrote:
> Patrick McHardy wrote:
>
>>I don't like the kernel-internal fiddling with the flags too
>>much, but I don't see a way around it.
>
>
> The other idea I had was moving the interface matching into
> an internal match that would be checked by IPT_MATCH_ITERATE().
> Not sure if this is feasible yet.
Mhh .. probably not since you would have to put it somewhere
in the blob.
>>userspace should just ignore unknown flags.
>
>
> I was trying to completely hide them from userspace so that we
> still have the option to use them for something else later on.
> If we ever send them to userspace, then they are taken forever,
> otherwise a newer iptables userspace may not work with an older
> kernel.
We could do that, but we would need some other place to store them
since once we want to use them for something new we can't put them
in ->flags anymore.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-05-25 0:56 ` Philip Craig
2007-05-25 4:11 ` Yasuyuki KOZAKAI
@ 2007-05-26 9:20 ` Patrick McHardy
2007-05-28 19:53 ` Henrik Nordstrom
1 sibling, 1 reply; 14+ messages in thread
From: Patrick McHardy @ 2007-05-26 9:20 UTC (permalink / raw)
To: Philip Craig; +Cc: netfilter-devel, Yasuyuki KOZAKAI
Philip Craig wrote:
> Yasuyuki KOZAKAI wrote:
>
>>Actually, we cannot add a new flag to 'struct ipt_ip'. It does not have
>>revision field. Unfortunately it has no field such as name[] in
>>'struct xt_entry_match' to steal one octet for revision.
>
>
> If we can never add new flags, then that would be a reason for me to
> not bother with clearing the internal bits before sending to userspace.
We can add flags for new features, but not flags that are required to
be set to behave compatible since that would break iptables userspace
for old kernels.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-05-26 9:20 ` Patrick McHardy
@ 2007-05-28 19:53 ` Henrik Nordstrom
2007-05-29 0:24 ` Philip Craig
2007-05-29 9:54 ` Patrick McHardy
0 siblings, 2 replies; 14+ messages in thread
From: Henrik Nordstrom @ 2007-05-28 19:53 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netfilter-devel, Philip Craig
[-- Attachment #1: Type: text/plain, Size: 1307 bytes --]
lör 2007-05-26 klockan 11:20 +0200 skrev Patrick McHardy:
> We can add flags for new features, but not flags that are required to
> be set to behave compatible since that would break iptables userspace
> for old kernels.
But the proposed change is completely transparent to userspace... the
use of the new flags is purely kernel-only and not visible to
userspace..
The drawback is that it reduces the possible new flags which might be
added by two bits, or if put another way reduces the flags field from 8
to 6 bits, leaving only 3 unused flag bits for future flag type
expansions.
But I am a little curious.. how much difference would it yield if simply
the loop was instead changed to terminate on string end instead of
always iterating over the max interface name length..
for (i = 0, ret = 0; ((const unsigned long *)ipinfo->outiface_mask)[i] && i < IFNAMSIZ/sizeof(unsigned long); i++) {
...
}
instead of
for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
...
}
plus a small change to the userspace to zero the mask after the first \0
in the interface name on exact matches.
The userspace only allow exact or prefix matching, so it should be ok to
restrict the compare function to this I think..
Regards
Henrik
[-- Attachment #2: Detta är en digitalt signerad meddelandedel --]
[-- Type: application/pgp-signature, Size: 307 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-05-28 19:53 ` Henrik Nordstrom
@ 2007-05-29 0:24 ` Philip Craig
2007-05-30 1:34 ` Henrik Nordstrom
2007-05-29 9:54 ` Patrick McHardy
1 sibling, 1 reply; 14+ messages in thread
From: Philip Craig @ 2007-05-29 0:24 UTC (permalink / raw)
To: Henrik Nordstrom; +Cc: netfilter-devel, Patrick McHardy
Henrik Nordstrom wrote:
> But I am a little curious.. how much difference would it yield if simply
> the loop was instead changed to terminate on string end instead of
> always iterating over the max interface name length..
>
> for (i = 0, ret = 0; ((const unsigned long *)ipinfo->outiface_mask)[i] && i < IFNAMSIZ/sizeof(unsigned long); i++) {
> ...
> }
>
> instead of
>
> for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
> ...
> }
>
> plus a small change to the userspace to zero the mask after the first \0
> in the interface name on exact matches.
>
> The userspace only allow exact or prefix matching, so it should be ok to
> restrict the compare function to this I think..
I'll try that, but that will also prevent loop unrolling if you're
using -funroll-loops. Not sure if any builds use this, but the
comment in the code implies some do.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-05-28 19:53 ` Henrik Nordstrom
2007-05-29 0:24 ` Philip Craig
@ 2007-05-29 9:54 ` Patrick McHardy
1 sibling, 0 replies; 14+ messages in thread
From: Patrick McHardy @ 2007-05-29 9:54 UTC (permalink / raw)
To: Henrik Nordstrom; +Cc: netfilter-devel, Philip Craig
Henrik Nordstrom wrote:
> lör 2007-05-26 klockan 11:20 +0200 skrev Patrick McHardy:
>
>
>>We can add flags for new features, but not flags that are required to
>>be set to behave compatible since that would break iptables userspace
>>for old kernels.
>
>
> But the proposed change is completely transparent to userspace... the
> use of the new flags is purely kernel-only and not visible to
> userspace..
Yes, certainly, that statement was not specific to this patch.
> The drawback is that it reduces the possible new flags which might be
> added by two bits, or if put another way reduces the flags field from 8
> to 6 bits, leaving only 3 unused flag bits for future flag type
> expansions.
>
>
> But I am a little curious.. how much difference would it yield if simply
> the loop was instead changed to terminate on string end instead of
> always iterating over the max interface name length..
>
> for (i = 0, ret = 0; ((const unsigned long *)ipinfo->outiface_mask)[i] && i < IFNAMSIZ/sizeof(unsigned long); i++) {
> ...
> }
>
> instead of
>
> for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
> ...
> }
>
> plus a small change to the userspace to zero the mask after the first \0
> in the interface name on exact matches.
It already does that I think. This sounds like a good alternative,
I'd be interested to see how much improvement it yields.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-05-29 0:24 ` Philip Craig
@ 2007-05-30 1:34 ` Henrik Nordstrom
2007-06-06 5:49 ` Philip Craig
0 siblings, 1 reply; 14+ messages in thread
From: Henrik Nordstrom @ 2007-05-30 1:34 UTC (permalink / raw)
To: Philip Craig; +Cc: netfilter-devel, Patrick McHardy
[-- Attachment #1: Type: text/plain, Size: 674 bytes --]
tis 2007-05-29 klockan 10:24 +1000 skrev Philip Craig:
>
> I'll try that, but that will also prevent loop unrolling if you're
> using -funroll-loops. Not sure if any builds use this, but the
> comment in the code implies some do.
My GCC unrolls that loop just fine...
x86_64 gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)
Here is a corrected version of the for loop:
for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long) && ((const unsigned long *)ipinfo->outiface_mask)[i]; i++) {
but for modern 64-bit CPUs I suspect the original is actually fastest as
IFNAMSIZ is only 16 bytes and fits in two parallel 64-bit operations..
Regards
Henrik
[-- Attachment #2: Detta är en digitalt signerad meddelandedel --]
[-- Type: application/pgp-signature, Size: 307 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-05-30 1:34 ` Henrik Nordstrom
@ 2007-06-06 5:49 ` Philip Craig
2007-06-06 6:13 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Philip Craig @ 2007-06-06 5:49 UTC (permalink / raw)
To: Henrik Nordstrom; +Cc: netfilter-devel, Patrick McHardy
[-- Attachment #1: Type: text/plain, Size: 1863 bytes --]
Henrik Nordstrom wrote:
> tis 2007-05-29 klockan 10:24 +1000 skrev Philip Craig:
>> I'll try that, but that will also prevent loop unrolling if you're
>> using -funroll-loops. Not sure if any builds use this, but the
>> comment in the code implies some do.
>
> My GCC unrolls that loop just fine...
>
> x86_64 gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)
You're right, I should check things before making statements.
> Here is a corrected version of the for loop:
>
> for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long) && ((const unsigned long *)ipinfo->outiface_mask)[i]; i++) {
>
>
> but for modern 64-bit CPUs I suspect the original is actually fastest as
> IFNAMSIZ is only 16 bytes and fits in two parallel 64-bit operations..
For my ARM platform, changing the for loop is a win for no interface
matches, or short interface names (I didn't test longer interface names).
But looking at the generated assembly for x86_64, this results in more
instructions and branches, and I can't see this being a win. (I'm not
set up to profile this.)
Also, the two optimisations are not mutually exclusive: one is to skip
the whole comparison completely (including inversion), and one is to
terminate the for loop early. So we can use your loop termination
condition to skip the whole comparison too.
The attached patch is logically equivalent to my first (assuming a
contiguous and zero padded mask), but it avoids messing with flags. In
practice, my profiling says it is slightly slower than the first patch
for 0 interface matches, but slightly faster for 1 or 2.
Note: this patch (and the original patch) change the behaviour when
inverting a zero length mask. That is for either of:
iptables -A INPUT ! -i +
iptables -A INPUT ! -i ""
Not sure if this matters.
I also tried testing just the first byte, instead of a long, but that
was slower.
[-- Attachment #2: iptables-iface-optim2.patch --]
[-- Type: text/plain, Size: 2099 bytes --]
--- linux-2.6.x/net/ipv4/netfilter/ip_tables.c 26 Apr 2007 11:17:49 -0000 1.1.1.29
+++ linux-2.6.x/net/ipv4/netfilter/ip_tables.c 6 Jun 2007 05:45:16 -0000
@@ -112,30 +112,34 @@ ip_packet_match(const struct iphdr *ip,
}
/* Look for ifname matches; this should unroll nicely. */
- for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
- ret |= (((const unsigned long *)indev)[i]
- ^ ((const unsigned long *)ipinfo->iniface)[i])
- & ((const unsigned long *)ipinfo->iniface_mask)[i];
- }
+ if (((const unsigned long *)ipinfo->iniface_mask)[0]) {
+ for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
+ ret |= (((const unsigned long *)indev)[i]
+ ^ ((const unsigned long *)ipinfo->iniface)[i])
+ & ((const unsigned long *)ipinfo->iniface_mask)[i];
+ }
- if (FWINV(ret != 0, IPT_INV_VIA_IN)) {
- dprintf("VIA in mismatch (%s vs %s).%s\n",
- indev, ipinfo->iniface,
- ipinfo->invflags&IPT_INV_VIA_IN ?" (INV)":"");
- return 0;
+ if (FWINV(ret != 0, IPT_INV_VIA_IN)) {
+ dprintf("VIA in mismatch (%s vs %s).%s\n",
+ indev, ipinfo->iniface,
+ ipinfo->invflags&IPT_INV_VIA_IN ?" (INV)":"");
+ return 0;
+ }
}
- for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
- ret |= (((const unsigned long *)outdev)[i]
- ^ ((const unsigned long *)ipinfo->outiface)[i])
- & ((const unsigned long *)ipinfo->outiface_mask)[i];
- }
+ if (((const unsigned long *)ipinfo->outiface_mask)[0]) {
+ for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
+ ret |= (((const unsigned long *)outdev)[i]
+ ^ ((const unsigned long *)ipinfo->outiface)[i])
+ & ((const unsigned long *)ipinfo->outiface_mask)[i];
+ }
- if (FWINV(ret != 0, IPT_INV_VIA_OUT)) {
- dprintf("VIA out mismatch (%s vs %s).%s\n",
- outdev, ipinfo->outiface,
- ipinfo->invflags&IPT_INV_VIA_OUT ?" (INV)":"");
- return 0;
+ if (FWINV(ret != 0, IPT_INV_VIA_OUT)) {
+ dprintf("VIA out mismatch (%s vs %s).%s\n",
+ outdev, ipinfo->outiface,
+ ipinfo->invflags&IPT_INV_VIA_OUT ?" (INV)":"");
+ return 0;
+ }
}
/* Check specific protocol */
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC][PATCH] optimise iptables interface matching
2007-06-06 5:49 ` Philip Craig
@ 2007-06-06 6:13 ` Eric Dumazet
0 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2007-06-06 6:13 UTC (permalink / raw)
To: Philip Craig; +Cc: netfilter-devel, Patrick McHardy, Henrik Nordstrom
Philip Craig a écrit :
> Henrik Nordstrom wrote:
>> tis 2007-05-29 klockan 10:24 +1000 skrev Philip Craig:
>>> I'll try that, but that will also prevent loop unrolling if you're
>>> using -funroll-loops. Not sure if any builds use this, but the
>>> comment in the code implies some do.
>> My GCC unrolls that loop just fine...
>>
>> x86_64 gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)
>
> You're right, I should check things before making statements.
>
>> Here is a corrected version of the for loop:
>>
>> for (i = 0, ret = 0; i < IFNAMSIZ/sizeof(unsigned long) && ((const unsigned long *)ipinfo->outiface_mask)[i]; i++) {
>>
>>
>> but for modern 64-bit CPUs I suspect the original is actually fastest as
>> IFNAMSIZ is only 16 bytes and fits in two parallel 64-bit operations..
>
> For my ARM platform, changing the for loop is a win for no interface
> matches, or short interface names (I didn't test longer interface names).
>
> But looking at the generated assembly for x86_64, this results in more
> instructions and branches, and I can't see this being a win. (I'm not
> set up to profile this.)
>
> Also, the two optimisations are not mutually exclusive: one is to skip
> the whole comparison completely (including inversion), and one is to
> terminate the for loop early. So we can use your loop termination
> condition to skip the whole comparison too.
>
> The attached patch is logically equivalent to my first (assuming a
> contiguous and zero padded mask), but it avoids messing with flags. In
> practice, my profiling says it is slightly slower than the first patch
> for 0 interface matches, but slightly faster for 1 or 2.
>
> Note: this patch (and the original patch) change the behaviour when
> inverting a zero length mask. That is for either of:
> iptables -A INPUT ! -i +
> iptables -A INPUT ! -i ""
> Not sure if this matters.
>
> I also tried testing just the first byte, instead of a long, but that
> was slower.
>
>
In my analysis (oprofiling), I found instructions were not really a problem.
The big problem comes from the size of data that have to be read by the CPU to
perform a typical table lookup.
I consider 16 bytes masks (32 bytes per rule, one for iface, one for oface) is
a waste of memory. And on current CPUS, with 64 bytes cache lines, memory
bandwidth is the limiting factor.
We probably could use a mask length (one byte for iface, one for oface) to
reduce memory footprint, and have better chance to keep tables in cpu caches.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2007-06-06 6:13 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-24 5:55 [RFC][PATCH] optimise iptables interface matching Philip Craig
2007-05-24 17:43 ` Patrick McHardy
2007-05-24 23:07 ` Philip Craig
2007-05-26 8:47 ` Patrick McHardy
2007-05-25 0:44 ` Yasuyuki KOZAKAI
2007-05-25 0:56 ` Philip Craig
2007-05-25 4:11 ` Yasuyuki KOZAKAI
2007-05-26 9:20 ` Patrick McHardy
2007-05-28 19:53 ` Henrik Nordstrom
2007-05-29 0:24 ` Philip Craig
2007-05-30 1:34 ` Henrik Nordstrom
2007-06-06 5:49 ` Philip Craig
2007-06-06 6:13 ` Eric Dumazet
2007-05-29 9:54 ` Patrick McHardy
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.