From: Patrick McHardy
Subject: Re: iptables performance under 2.6.0[-test9]
Date: Wed, 29 Oct 2003 01:32:45 +0100
Sender: netfilter-devel-admin@lists.netfilter.org
Message-ID: <3F9F0AAD.2040203@trash.net>
References: <3F9D4370.99795B87@fy.chalmers.se> <3F9D5E60.866B0B63@fy.chalmers.se> <3F9E292C.3020509@trash.net> <3F9E3E6C.C0CC5598@fy.chalmers.se> <3F9E406C.7050105@trash.net> <3F9E506A.BD4BA635@fy.chalmers.se> <3F9E5EE6.1090103@trash.net> <3F9EE6C8.AE3F16DA@fy.chalmers.se>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: netfilter-devel@lists.netfilter.org
To: Andy Polyakov
In-Reply-To: <3F9EE6C8.AE3F16DA@fy.chalmers.se>
Errors-To: netfilter-devel-admin@lists.netfilter.org
List-Id: netfilter-devel.vger.kernel.org

Andy Polyakov wrote:

>>This is either a misconfiguration or a bug in TCP.
>>
>
>Looks like neither:-) My NIC turned to be NETIF_F_TSO capable, which
>means that it "can off-load TCP/IP segmentation" and kernel is allowed
>to and does throw packets larger than ethernet MTU at it [and tcpdump
>therefore was honest].
>
>I'm currently running attached patch and it apparently solves my
>*particular* problem, but I can't tell if it's actually "the right
>thing(tm)" to do... Is (*pskb)->sk->sk_route_caps right place to check?
>Maybe out->features is more appropriate? Is there TSO maximum which one
>should compare (*pskb)->len against? That kind of questions...
>

NETIF_F_TSO is a netdevice flag, but it also needs to be enabled, so it's
probably not dev->features that we need to check. I'm going to have a
look at this tomorrow.

However, I do not understand why the packets got dropped after
fragmentation. That is what we need to understand before we can fix it.

>HOWEVER!!!
>Even if we figure out "the right thing(tm)" and address the
>NETIF_F_TSO issue in proper manner, it does *not* necessarily mean that
>performance problem will disappear as well. Well, in my opinion... I
>mean performance might still suffer, whenever user will for example
>masquerade a larger MTU interface behind "narrower" one, e.g. behind
>PPPoE virtual interface, and further experiments should therefore be
>performed... But I'm not sure if I'll be able to assist, because my
>NETIF_F_TSO capable NIC might make it impossible to arrange for proper
>setup [without PPPoE which I simply don't have]. I'll try, but can't
>make any promises...
>
>Cheers. A.
>

Yes, that is a known problem: ip_conntrack refragments packets to a
different MTU even when IP_DF is set. The IPv6 conntrack port from USAGI
solved this by keeping the original sk_buffs alongside the defragmented
one. Instead of refragmenting, it sends the original fragments and
checks size and DF on each of them. This also solves the PMTU discovery
issues.

One remaining (not very important) problem is protocols like NFS that
send carefully spaced fragments. Defragmentation "eats" the spacing, so
the fragments are sent to the device in a burst.

Regards,
Patrick

>------------------------------------------------------------------------
>
>--- ./net/ipv4/netfilter/ip_conntrack_standalone.c.orig	Sat Oct 25 20:43:32 2003
>+++ ./net/ipv4/netfilter/ip_conntrack_standalone.c	Tue Oct 28 23:16:56 2003
>@@ -198,6 +198,9 @@
> 	if (ip_confirm(hooknum, pskb, in, out, okfn) != NF_ACCEPT)
> 		return NF_DROP;
> 
>+	if ((*pskb)->sk && (*pskb)->sk->sk_route_caps&NETIF_F_TSO)
>+		return NF_ACCEPT;
>+
> 	/* Local packets are never produced too large for their
> 	   interface.  We degfragment them at LOCAL_OUT, however,
> 	   so we have to refragment them here. */
>
>
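The per-fragment decision in the USAGI approach described above can be modelled roughly as follows. This is a userspace sketch, not kernel code. `struct orig_frag`, `classify_frag()` and `plan_output()` are hypothetical names invented for illustration. Real netfilter code would operate on sk_buffs and would itself send the ICMP "fragmentation needed" error. The sketch only shows the idea: each original fragment is checked against the outgoing MTU and its own DF bit. Refragmenting the reassembled packet loses that per-fragment information.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one original fragment, kept alongside the
 * reassembled packet instead of being freed after defragmentation.
 * Only the fields needed for the output decision are modelled. */
struct orig_frag {
    unsigned int len; /* on-the-wire length of this fragment, in bytes */
    int df;           /* nonzero if the DF bit was set on this fragment */
};

enum frag_action {
    FRAG_FORWARD_AS_IS, /* fits the outgoing MTU: send unchanged */
    FRAG_NEEDS_SPLIT,   /* too big, DF clear: may be fragmented further */
    FRAG_DROP_PMTU      /* too big, DF set: drop and report MTU to sender */
};

/* Decide what to do with one original fragment on an interface with
 * the given MTU, rather than refragmenting the reassembled packet. */
static enum frag_action classify_frag(const struct orig_frag *f,
                                      unsigned int mtu)
{
    assert(f != NULL && mtu > 0);
    if (f->len <= mtu)
        return FRAG_FORWARD_AS_IS;
    return f->df ? FRAG_DROP_PMTU : FRAG_NEEDS_SPLIT;
}

/* Walk all original fragments and count how many fall into each
 * action. counts[] must have room for the three enum values. */
static void plan_output(const struct orig_frag *frags, size_t n,
                        unsigned int mtu, size_t counts[3])
{
    size_t i;

    counts[FRAG_FORWARD_AS_IS] = 0;
    counts[FRAG_NEEDS_SPLIT] = 0;
    counts[FRAG_DROP_PMTU] = 0;
    for (i = 0; i < n; i++)
        counts[classify_frag(&frags[i], mtu)]++;
}
```

Consider a 1492-byte PPPoE MTU, similar to the masquerading setup Andy mentions. A 400-byte tail fragment goes out unchanged. A 1500-byte fragment is either split further or, if DF is set, dropped so that the sender can learn the smaller path MTU. The spacing problem for NFS is not addressed by this decision alone.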