iptables performance under 2.6.0[-test9]

* iptables performance under 2.6.0[-test9]
@ 2003-10-27 16:10 Andy Polyakov
  2003-10-27 18:05 ` Andy Polyakov
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Polyakov @ 2003-10-27 16:10 UTC (permalink / raw)
  To: netfilter-devel

Hi,

I tried to deploy 2.6.0[-test9] iptables to masquerade a private
interface. Strangely enough, the ip_conntrack.ko module seems to affect
performance of *some* TCP connections. More specifically, I found that
performance of certain TCP connections (netscape mail rebuilding the index
of a large IMAP mailbox in my case) is *reproducibly* unacceptable
(minutes vs. the normal 15-20 seconds). In the course of troubleshooting I
started to unload iptables modules one by one, and after 'rmmod
ip_conntrack' performance became normal. I don't know if it's
essential, but performance is not affected if I deploy ipchains instead
of iptables for an equivalent setup. Oh! Those affected TCP connections are
*not* masqueraded and are "bound" to the primary interface. I'm looking for
triggering factors now (what's so special about netscape mail rebuilding
a mailbox index), but I figured that meanwhile it might be worth asking
people on this list if this resembles any other problem report. Does it?
A.

* Re: iptables performance under 2.6.0[-test9]
  2003-10-27 16:10 iptables performance under 2.6.0[-test9] Andy Polyakov
@ 2003-10-27 18:05 ` Andy Polyakov
  2003-10-27 18:30   ` Andy Polyakov
  2003-10-28  8:30   ` Patrick McHardy
  0 siblings, 2 replies; 10+ messages in thread
From: Andy Polyakov @ 2003-10-27 18:05 UTC (permalink / raw)
  To: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1063 bytes --]

> I tried to deploy 2.6.0[-test9] iptables to masquerade a private
> interface. Strangely enough ip_conntrack.ko module seems to affect
> performance of *some* TCP connections. ... I'm looking for
> triggering factors...

Apparently it's not whole connections that are affected, but only some
outgoing packets: those corresponding to socket writes larger than
MTU-40 [where 40 is the size of the TCP/IP headers]... I fail
to imagine how come, but here is how I can reproduce the problem with
the attached head.pl script:

- 'lsmod' to make sure *no* iptables modules are loaded;
- 'time ./head.pl some.host 2000' says that it takes a fraction of
a second for the script to complete;
- modprobe ip_conntrack;
- 'time ./head.pl some.host 2000' now says that it takes over 3(!)
seconds to complete;

If I reduce the number of \n's sent in the first socket write by passing a
value of 1460 or lower as the last command-line argument, the script completes
instantly regardless of whether ip_conntrack is loaded or not. Hence the
conclusion about MTU-40... Cheers. A.
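
The MTU-40 threshold can be checked with a little arithmetic. A minimal
sketch, assuming a 1500-byte ethernet MTU and 20-byte IP and TCP headers
without options; the constants and the function name are illustrative and
are not taken from head.pl:

```python
ETH_MTU = 1500    # assumed ethernet MTU, in bytes
IP_HEADER = 20    # assumed IPv4 header size, no options
TCP_HEADER = 20   # assumed TCP header size, no options


def split_into_segments(write_size, mtu=ETH_MTU):
    """Return the TCP payload sizes one socket write would be cut into."""
    mss = mtu - IP_HEADER - TCP_HEADER  # largest payload that fits in one packet
    if mss <= 0:
        raise ValueError("MTU too small to carry any TCP payload")
    full, rest = divmod(write_size, mss)
    # `full` maximum-sized segments, plus one short segment for the remainder
    return [mss] * full + ([rest] if rest else [])
```

Under these assumptions a 2000-byte write is cut into segments of 1460 and
540 bytes, while a write of 1460 bytes or less fits into a single segment.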

[-- Attachment #2: head.pl --]
[-- Type: application/x-perl, Size: 395 bytes --]

* Re: iptables performance under 2.6.0[-test9]
  2003-10-27 18:05 ` Andy Polyakov
@ 2003-10-27 18:30   ` Andy Polyakov
  2003-10-28  8:30   ` Patrick McHardy
  1 sibling, 0 replies; 10+ messages in thread
From: Andy Polyakov @ 2003-10-27 18:30 UTC (permalink / raw)
  To: netfilter-devel

Sorry that I failed to keep everything in one mail, but here is
another observation.

> - 'lsmod' to make sure *no* iptables modules are loaded;
> - 'time ./head.pl some.host 2000' says that it takes a portion of
> elapsed second for script to complete;

tcpdump on the same machine starts as follows (skipping the initial SYN packets):

19:16:20.768383 eth0 > me.33336 > he.www: . 1:1(0) ack 1 win 5840 (DF)
19:16:20.779070 eth0 > me.33336 > he.www: P 1:2001(2000) ack 1 win 5840 (DF)
19:16:20.780130 eth0 < he.www > me.33336: . 1:1(0) ack 1461 win 8760 (DF)
19:16:20.780158 eth0 < he.www > me.33336: . 1:1(0) ack 2001 win 11680 (DF)

Note that tcpdump shows an impossible thing, the 4th packet being
larger than the MTU!? I don't know how come, but that is *not* how
it looks on the server side. On the server side I do get a 1460-byte
packet followed by a 540-byte one, just as one would normally expect...

> - modprobe ip_conntrack;
> - 'time ./head.pl some.host 2000' now says that it takes over 3(!)
> elapsed seconds to complete;

19:19:54.361162 eth0 > me.33337 > he.www: . 1:1(0) ack 1 win 5840 (DF)
19:19:57.370629 eth0 > me.33337 > he.www: . 1:1461(1460) ack 1 win 5840 (DF)
19:19:57.371675 eth0 < he.www > me.33337: . 1:1(0) ack 1461 win 8760 (DF)
19:19:57.374550 eth0 > me.33337 > he.www: P 1461:2001(540) ack 1 win 5840 (DF)
19:19:57.375074 eth0 < he.www > me.33337: . 1:1(0) ack 2001 win 11680 (DF)

No impossible things, but the first MTU-40 sized packet is delayed for
3 seconds... A.
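
The delay can be read straight off the tcpdump timestamps. A small sketch,
assuming both timestamps fall on the same day; the helper name is
illustrative:

```python
from datetime import datetime

TCPDUMP_TIME = "%H:%M:%S.%f"  # tcpdump's default hh:mm:ss.microseconds stamp


def gap_seconds(first, second):
    """Seconds elapsed between two tcpdump timestamps taken on the same day."""
    start = datetime.strptime(first, TCPDUMP_TIME)
    end = datetime.strptime(second, TCPDUMP_TIME)
    return (end - start).total_seconds()
```

For the second trace, gap_seconds("19:19:54.361162", "19:19:57.370629")
comes out at roughly 3.009 seconds, against roughly 0.011 seconds between
the corresponding lines of the first trace.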

* Re: iptables performance under 2.6.0[-test9]
  2003-10-27 18:05 ` Andy Polyakov
  2003-10-27 18:30   ` Andy Polyakov
@ 2003-10-28  8:30   ` Patrick McHardy
  2003-10-28 10:01     ` Andy Polyakov
  1 sibling, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2003-10-28  8:30 UTC (permalink / raw)
  To: Andy Polyakov; +Cc: netfilter-devel

Can you please provide (pcap format) packet dumps of

a) the correct situation
b) broken situation

I would like to have a look at it. Please make sure to capture
with "-s 1500" so the packets are in the dump completely.

Thanks,
Patrick

* Re: iptables performance under 2.6.0[-test9]
  2003-10-28  8:30   ` Patrick McHardy
@ 2003-10-28 10:01     ` Andy Polyakov
  2003-10-28 10:09       ` Patrick McHardy
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Polyakov @ 2003-10-28 10:01 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 661 bytes --]

> Can you please provide (pcap format) packet dumps of
> 
> a) the correct situation
> b) broken situation

Attached.

> I would like to have a look at it.

As already mentioned in my last letter, note that 4th packet in
fast.tcpdump is *not* actual wire traffic. On the wire [e.g. on server
side] I see *two* packets with 1460 and 540 \n's as TCP payload.

In either case it apparently has something[/everything?] to do with
ip_refrag in ip_conntrack_standalone.c. At least if I comment out the
"if ((*pskb)->len > dst_pmtu(&rt->u.dst)) { ... }" statement in this
function, './head.pl some.host 2000' completes instantly even if I
insmod the patched module. A.

[-- Attachment #2: fast.tcpdump --]
[-- Type: application/octet-stream, Size: 2896 bytes --]

[-- Attachment #3: slow.tcpdump --]
[-- Type: application/octet-stream, Size: 3008 bytes --]

* Re: iptables performance under 2.6.0[-test9]
  2003-10-28 10:01     ` Andy Polyakov
@ 2003-10-28 10:09       ` Patrick McHardy
  2003-10-28 11:18         ` Andy Polyakov
  0 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2003-10-28 10:09 UTC (permalink / raw)
  To: Andy Polyakov; +Cc: netfilter-devel

Andy Polyakov wrote:

>As already mentioned in my last letter, note that 4th packet in
>fast.tcpdump is *not* actual wire traffic. On the wire [e.g. on server
>side] I see *two* packets with 1460 and 540 \n's as TCP payload.
>
>  
>
Sorry I forgot about this earlier, can you make the dumps again on
both sides of the firewall ? If possible sync the clocks of both
boxes so times are comparable.


>In either case it apparently has something[/everything?] to do with
>ip_refrag in ip_conntrack_standalone.c. At least if I comment out the
>"if ((*pskb)->len > dst_pmtu(&rt->u.dst)) { ... }" statement in this
>function, './head.pl some.host 2000' completes instantly even if I
>insmod the patched module. A.
>

Yes these lines are suspect nr. 1 ;) I wonder however how it keeps
working without refragmentation, this implies it's not necessary to
refragment here as the stack will do it anyways ..

Best regards,
Patrick

* Re: iptables performance under 2.6.0[-test9]
  2003-10-28 10:09       ` Patrick McHardy
@ 2003-10-28 11:18         ` Andy Polyakov
  2003-10-28 12:19           ` Patrick McHardy
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Polyakov @ 2003-10-28 11:18 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

> >As already mentioned in my last letter, note that 4th packet in
> >fast.tcpdump is *not* actual wire traffic. On the wire [e.g. on server
> >side] I see *two* packets with 1460 and 540 \n's as TCP payload.
> >
> Sorry I forgot about this earlier, can you make the dumps again on
> both sides of the firewall ?

Can't you take my word for it? See my last letter from yesterday, first
tcpdump snippet. You'll see that my server first ACKs 1460 bytes and
then 540 more. This alone proves that the server got the 2000 bytes in two
packets, not one. But I can actually confirm that tcpdump does show two
packets on the server. I see no point in wasting public bandwidth on an
attachment that does not actually add any extra information...

> >In either case it apparently has something[/everything?] to do with
> >ip_refrag in ip_conntrack_standalone.c. At least if I comment out the
> >"if ((*pskb)->len > dst_pmtu(&rt->u.dst)) { ... }" statement in this
> >function, './head.pl some.host 2000' completes instantly even if I
> >insmod the patched module. A.
> 
> Yes these lines are suspect nr. 1 ;) I wonder however how it keeps
> working without refragmentation, this implies it's not necessary to
> refragment here as the stack will do it anyways ..

Yes, it does [imply that]. But I rather wonder why it assembled a TCP
packet larger than the MTU in the first place. In either case it looks like
whoever did that gives up after 3 seconds and generates two smaller ones
instead. Normally I would also complain that one would expect tcpdump to
show wire traffic even on the client side, but this is a really bad
occasion to do so, as otherwise we probably wouldn't have landed on
ip_refrag so soon:-) A.

* Re: iptables performance under 2.6.0[-test9]
  2003-10-28 11:18         ` Andy Polyakov
@ 2003-10-28 12:19           ` Patrick McHardy
  2003-10-28 21:59             ` Andy Polyakov
  0 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2003-10-28 12:19 UTC (permalink / raw)
  To: Andy Polyakov; +Cc: netfilter-devel

Andy Polyakov wrote:

>>Sorry I forgot about this earlier, can you make the dumps again on
>>both sides of the firewall ?
>>    
>>
>
>Can't you take my word for it? See my last letter from yesterday, first
>tcpdump snippet. You'll see that my server first ACKs 1460 bytes and
>then 540 extra. This alone proves that server got 2000 bytes in two
>packets, not one. But I can actually confirm that tcpdump does show two
>packets on server. I see no meaning to waste public bandwidth on
>attachment that does not actually add any extra information...
>
>  
>

Sure, it's not that I wouldn't believe you but I wanted to see if your
problems could be caused by de/refragmentation introducing higher rtt
variance and thus higher retransmit timeout, but after reading your
mails more carefully that seems really unlikely.

>Yes, it does [imply that]. But I rather wonder why did it assemble a TCP
>packet larger than MTU in first place. But in either case it looks like
>whomever who did that gives up after 3 seconds and generates two smaller
>instead. Normally I would also complain that one would expect tcpdump to
>show wire traffic even on the client side, but this is really bad
>opportunity to do so, as we probably wouldn't land on ip_refrag so
>soon:-) A.
>  
>

Yes you're right the stack should never generate tcp packets > mtu.

This is either a misconfiguration or a bug in TCP. I assume you
checked your configuration (btw: can you please post it ?) so we
need to find out why the stack generates these packets. Perhaps
you can check the pmtu by printing dst->pmtu in tcp_v4_connect
(rt->u.dst.pmtu). Unfortunately I'm busy ATM so I can't help
much right now.

Regards,
Patrick

* Re: iptables performance under 2.6.0[-test9]
  2003-10-28 12:19           ` Patrick McHardy
@ 2003-10-28 21:59             ` Andy Polyakov
  2003-10-29  0:32               ` Patrick McHardy
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Polyakov @ 2003-10-28 21:59 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1815 bytes --]

> >Yes, it does [imply that]. But I rather wonder why did it assemble a TCP
> >packet larger than MTU in first place. But in either case it looks like
> >whomever who did that gives up after 3 seconds and generates two smaller
> >instead. Normally I would also complain that one would expect tcpdump to
> >show wire traffic even on the client side, but this is really bad
> >opportunity to do so, as we probably wouldn't land on ip_refrag so
> >soon:-)
> 
> Yes you're right the stack should never generate tcp packets > mtu.
> 
> This is either a misconfiguration or a bug in TCP.

Looks like neither:-) My NIC turned out to be NETIF_F_TSO capable, which
means that it "can off-load TCP/IP segmentation" and the kernel is allowed
to, and does, throw packets larger than the ethernet MTU at it [and tcpdump
was therefore honest].

I'm currently running the attached patch and it apparently solves my
*particular* problem, but I can't tell if it's actually "the right
thing(tm)" to do... Is (*pskb)->sk->sk_route_caps the right place to check?
Maybe out->features is more appropriate? Is there a TSO maximum which one
should compare (*pskb)->len against? Those kinds of questions...

HOWEVER!!! Even if we figure out "the right thing(tm)" and address the
NETIF_F_TSO issue in a proper manner, it does *not* necessarily mean that
the performance problem will disappear as well. Well, in my opinion... I
mean performance might still suffer whenever a user, for example,
masquerades a larger-MTU interface behind a "narrower" one, e.g. behind a
PPPoE virtual interface, and further experiments should therefore be
performed... But I'm not sure if I'll be able to assist, because my
NETIF_F_TSO capable NIC might make it impossible to arrange a proper
setup [without PPPoE which I simply don't have]. I'll try, but can't
make any promises... Cheers. A.
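
The effect of the attached patch on the test cases from this thread can be
modelled in user space. This is a rough sketch and not kernel code: the flag
value, the Packet type and the function name are hypothetical, the 2040-byte
length assumes 40 bytes of headers on top of the 2000-byte write, and only
the two conditions themselves are taken from the thread:

```python
from dataclasses import dataclass

NETIF_F_TSO = 0x800  # placeholder bit for this model; the real value is in the kernel headers


@dataclass
class Packet:
    length: int      # packet length including IP and TCP headers
    route_caps: int  # capabilities of the owning socket's route, 0 if there is no socket


def refrag_verdict(pkt, pmtu, patched):
    """Model of the length check in ip_refrag, with or without the TSO shortcut."""
    if patched and pkt.route_caps & NETIF_F_TSO:
        return "accept"      # patched module: leave segmentation to the TSO-capable NIC
    if pkt.length > pmtu:
        return "refragment"  # the "len > pmtu" branch that was commented out earlier
    return "accept"
```

In this model a 2040-byte packet on a TSO-capable route is sent down the
refragmentation path by the unpatched check and accepted untouched by the
patched one, while a 1500-byte packet is accepted either way.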

[-- Attachment #2: nf.diff --]
[-- Type: text/plain, Size: 505 bytes --]

--- ./net/ipv4/netfilter/ip_conntrack_standalone.c.orig	Sat Oct 25 20:43:32 2003
+++ ./net/ipv4/netfilter/ip_conntrack_standalone.c	Tue Oct 28 23:16:56 2003
@@ -198,6 +198,9 @@
 	if (ip_confirm(hooknum, pskb, in, out, okfn) != NF_ACCEPT)
 		return NF_DROP;

+	if ((*pskb)->sk && (*pskb)->sk->sk_route_caps&NETIF_F_TSO)
+		return NF_ACCEPT;
+
 	/* Local packets are never produced too large for their
 	   interface.  We degfragment them at LOCAL_OUT, however,
 	   so we have to refragment them here. */

* Re: iptables performance under 2.6.0[-test9]
  2003-10-28 21:59             ` Andy Polyakov
@ 2003-10-29  0:32               ` Patrick McHardy
  0 siblings, 0 replies; 10+ messages in thread
From: Patrick McHardy @ 2003-10-29  0:32 UTC (permalink / raw)
  To: Andy Polyakov; +Cc: netfilter-devel

Andy Polyakov wrote:

>>This is either a misconfiguration or a bug in TCP.
>>    
>>
>
>Looks like neither:-) My NIC turned to be NETIF_F_TSO capable, which
>means that it "can off-load TCP/IP segmentation" and kernel is allowed
>to and does throw packets larger than ethernet MTU at it [and tcpdump
>therefore was honest].
>
>I'm currently running attached patch and it apparently solves my
>*particular* problem, but I can't tell if it's actually "the right
>thing(tm)" to do... Is (*pskb)->sk->sk_route_caps right place to check?
>Maybe out->features is more appropriate? Is there TSO maximum which one
>should compare (*pskb)->len against? That kind of questions...
>  
>
NETIF_F_TSO is a netdevice flag but it needs to be enabled, so it's
probably not dev->features which we need to check. I'm going to have
a look at this tomorrow. However, I do not understand why the packets
got dropped after fragmentation. This is what we need to understand
in order to fix it.

>HOWEVER!!! Even if we figure out "the right thing(tm)" and address the
>NETIF_F_TSO issue in proper manner, it does *not* necessarily mean that
>performance problem will disappear as well. Well, in my opinion... I
>mean performance might still suffer, whenever user will for example
>masquerade a larger MTU interface behind "narrower" one, e.g. behind
>PPPoE virtual interface, and further experiments should therefore be
>performed... But I'm not sure if I'll be able to assist, because my
>NETIF_F_TSO capable NIC might make it impossible to arrange for proper
>setup [without PPPoE which I simply don't have]. I'll try, but can't
>make any promises... Cheers. A.
>

Yes, that is a known problem: ip_conntrack will perform refragmentation
with a different mtu despite IP_DF being set. The ipv6 conntrack port from
USAGI solved the problem by keeping the original sk_buffs with the
defragmented one and, instead of refragmenting, sending the original ones
and checking size and DF for them. This solves the pmtu discovery issues.
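
A rough user-space sketch of that idea, assuming the ordinary IPv4 rule that
an oversized packet with DF set is answered with an ICMP "fragmentation
needed" error instead of being fragmented. The function and verdict names
are made up for illustration and are not taken from the USAGI code:

```python
def verdicts_for_originals(originals, out_mtu):
    """Decide what happens to each original packet kept beside the defragmented one.

    `originals` is a list of (length, df) pairs, where `df` says whether the
    Don't Fragment bit was set on that original packet.
    """
    verdicts = []
    for length, df in originals:
        if length <= out_mtu:
            verdicts.append("send")              # fits the outgoing interface as it is
        elif df:
            verdicts.append("icmp-frag-needed")  # too big and DF set: report it, do not fragment
        else:
            verdicts.append("fragment")          # too big, DF clear: fragmenting is allowed
    return verdicts
```

Because each decision uses the original packet's own size and DF bit, an
oversized DF packet is reported back to the sender, which is what path MTU
discovery relies on.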

One remaining (not very important) problem is protocols like NFS
which send carefully spaced fragments. Defragmentation "eats" the
spacing, so they are sent to the device in a burst.

Regards,
Patrick

