From: Patrick McHardy <kaber@trash.net>
To: Andy Polyakov <appro@fy.chalmers.se>
Cc: netfilter-devel@lists.netfilter.org
Subject: Re: iptables performance under 2.6.0[-test9]
Date: Wed, 29 Oct 2003 01:32:45 +0100
Message-ID: <3F9F0AAD.2040203@trash.net>
In-Reply-To: <3F9EE6C8.AE3F16DA@fy.chalmers.se>

Andy Polyakov wrote:

>>This is either a misconfiguration or a bug in TCP.
>
>Looks like neither:-) My NIC turned to be NETIF_F_TSO capable, which
>means that it "can off-load TCP/IP segmentation" and kernel is allowed
>to and does throw packets larger than ethernet MTU at it [and tcpdump
>therefore was honest].
>
>I'm currently running attached patch and it apparently solves my
>*particular* problem, but I can't tell if it's actually "the right
>thing(tm)" to do... Is (*pskb)->sk->sk_route_caps right place to check?
>Maybe out->features is more appropriate? Is there TSO maximum which one
>should compare (*pskb)->len against? That kind of questions...
>  
>
NETIF_F_TSO is a netdevice flag, but it needs to be enabled, so it's
probably not dev->features that we need to check. I'm going to have
a look at this tomorrow. However, I do not understand why the packets
got dropped after fragmentation. This is what we need to understand
before we can fix it.

>HOWEVER!!! Even if we figure out "the right thing(tm)" and address the
>NETIF_F_TSO issue in proper manner, it does *not* necessarily mean that
>performance problem will disappear as well. Well, in my opinion... I
>mean performance might still suffer, whenever user will for example
>masquerade a larger MTU interface behind "narrower" one, e.g. behind
>PPPoE virtual interface, and further experiments should therefore be
>performed... But I'm not sure if I'll be able to assist, because my
>NETIF_F_TSO capable NIC might make it impossible to arrange for proper
>setup [without PPPoE which I simply don't have]. I'll try, but can't
>make any promises... Cheers. A.
>

Yes, that is a known problem: ip_conntrack will refragment with a
different MTU even when IP_DF is set. The IPv6 conntrack port from USAGI
solved this by keeping the original sk_buffs alongside the defragmented
one and, instead of refragmenting, sending the original ones and checking
size and DF for them. This solves the PMTU discovery issues.

One remaining (not very important) problem is protocols like NFS,
which send carefully spaced fragments. Defragmentation "eats" the
spacing, so they are sent to the device in a burst.
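
The loss of spacing can be sketched with a toy timing model. This is
only an assumption-laden illustration: defrag_times() and min_gap() are
hypothetical helpers, times are in arbitrary microseconds, and all
processing and queueing delay is ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model: with defragmentation, no fragment leaves before the last one
 * has arrived, so every fragment is emitted at that latest arrival time.
 */
static void defrag_times(const unsigned *arrival_us, size_t n,
                         unsigned *out_us)
{
    unsigned last = 0;

    assert(arrival_us != NULL && out_us != NULL);
    for (size_t i = 0; i < n; i++)
        if (arrival_us[i] > last)
            last = arrival_us[i];
    for (size_t i = 0; i < n; i++)
        out_us[i] = last;
}

/*
 * Smallest gap between consecutive entries of a non-decreasing series.
 * A result of 0 means at least two fragments go out back to back.
 */
static unsigned min_gap(const unsigned *t_us, size_t n)
{
    unsigned gap;

    assert(t_us != NULL);
    if (n < 2)
        return 0;
    gap = t_us[1] - t_us[0];
    for (size_t i = 2; i < n; i++)
        if (t_us[i] - t_us[i - 1] < gap)
            gap = t_us[i] - t_us[i - 1];
    return gap;
}
```

For fragments arriving at 0, 100, 200 and 300 the minimum gap is 100 on
input, but after defrag_times() all four are scheduled at 300 and the
minimum gap drops to 0, which is the burst described above.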

Regards,
Patrick


>------------------------------------------------------------------------
>
>--- ./net/ipv4/netfilter/ip_conntrack_standalone.c.orig	Sat Oct 25 20:43:32 2003
>+++ ./net/ipv4/netfilter/ip_conntrack_standalone.c	Tue Oct 28 23:16:56 2003
>@@ -198,6 +198,9 @@
> 	if (ip_confirm(hooknum, pskb, in, out, okfn) != NF_ACCEPT)
> 		return NF_DROP;
> 
>+	if ((*pskb)->sk && (*pskb)->sk->sk_route_caps&NETIF_F_TSO)
>+		return NF_ACCEPT;
>+
> 	/* Local packets are never produced too large for their
> 	   interface.  We degfragment them at LOCAL_OUT, however,
> 	   so we have to refragment them here. */
>  
>
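
For readers following the quoted patch, its added condition can be
modelled in user space as below. Everything here is a stand-in: struct
model_sock, struct model_skb, MODEL_F_TSO and skip_refragment() are
hypothetical, and the bit value is chosen for this sketch only. In C,
`&` binds tighter than `&&`, so the unparenthesised test in the patch
already groups the way the explicit parentheses show here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this model only. */
#define MODEL_F_TSO 0x800u

/* Stand-ins for the socket and packet buffer (not kernel types). */
struct model_sock {
    unsigned route_caps;          /* capabilities cached for the route */
};

struct model_skb {
    const struct model_sock *sk;  /* NULL when no local socket owns it */
    size_t len;
};

/*
 * Mirrors the patch's test: skip refragmentation only for packets that
 * have an owning socket whose cached route capabilities include TSO.
 */
static bool skip_refragment(const struct model_skb *skb)
{
    assert(skb != NULL);
    return skb->sk != NULL && (skb->sk->route_caps & MODEL_F_TSO) != 0;
}
```

Packets without an owning socket always fall through to the existing
refragmentation path in this model, as they do in the patch. Whether
the socket's cached capabilities are the right thing to test is exactly
the open question raised in the thread.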

