From mboxrd@z Thu Jan 1 00:00:00 1970
From: Risto Pajula
Subject: Re: IP fragmentation performance and don't fragment bug when forwarding
Date: Fri, 7 Dec 2018 16:46:44 +0200
Message-ID:
References: <462e25db-8aad-7687-31e5-fb812d8daeaa@gmail.com> <51078d4c-17de-9c9d-4ba2-07a4b8e73575@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: netdev@vger.kernel.org
To: "David S. Miller" , Alexey Kuznetsov
Return-path:
Received: from mail-lf1-f68.google.com ([209.85.167.68]:36701 "EHLO mail-lf1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725998AbeLGOqr (ORCPT ); Fri, 7 Dec 2018 09:46:47 -0500
Received: by mail-lf1-f68.google.com with SMTP id a16so3223266lfg.3 for ; Fri, 07 Dec 2018 06:46:45 -0800 (PST)
In-Reply-To: <51078d4c-17de-9c9d-4ba2-07a4b8e73575@gmail.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

Hello.

I have been able to track the poor forwarding latency down to the TCP
window scale option. The Netgem device uses a rather large window scale
factor (x256), and I have been able to reproduce the router's poor
forwarding latency with a Linux box in the internal network by setting
net.ipv4.tcp_rmem to a large value, which raises the TCP window scale
factor accordingly. I still do not have a clue why this causes
forwarding in the Linux kernel to block. Maybe something in the
connection tracking?

With ICMP timestamp messages I have also been able to pinpoint that the
latency is caused on the eth1 sending side (the following hping3 example
is run on the router toward the internal network):

xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
using eth1, addr: 192.168.0.1, MTU: 1500
HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 data bytes
len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
icmp_seq=0 rtt=7.9 ms
ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
ICMP timestamp RTT tsrtt=7

len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
icmp_seq=1 rtt=235.9 ms
ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
ICMP timestamp RTT tsrtt=235

len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
icmp_seq=2 rtt=3.8 ms
ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
ICMP timestamp RTT tsrtt=3

len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
icmp_seq=3 rtt=47.8 ms
ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
ICMP timestamp RTT tsrtt=47

len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
icmp_seq=4 rtt=7.7 ms
ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
ICMP timestamp RTT tsrtt=7

len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
icmp_seq=5 rtt=3.7 ms
ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
ICMP timestamp RTT tsrtt=3

^C
--- 192.168.0.112 hping statistic ---
6 packets tramitted, 6 packets received, 0% packet loss
round-trip min/avg/max = 3.7/51.1/235.9 ms

BR.
Risto

On 2.12.2018 23:32, Risto Pajula wrote:
> Hello.
>
> You can most likely ignore the "DF Bit, mtu bug when forwarding" case.
> There aren't actually big IP packets on the wire; instead there is a
> burst of packets on the wire, which are combined by GRO, and thus
> dropping them should not happen. Sorry about the invalid bug report.
>
> However, the poor latency from the internal network to the internet
> still remains, with GRO both enabled and disabled. I will try to study
> this further...
>
>
> BR.
> Risto
>
>
> On 2.12.2018 14:01, Risto Pajula wrote:
>> Hello.
>>
>> I have encountered a weird performance problem in Linux IP
>> fragmentation when using video streaming services behind NAT.
>> I have also studied a possible bug in the DF bit (don't fragment)
>> handling when forwarding IP packets.
>>
>> First the system setup description:
>>
>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre router]-internet
>>
>> where:
>> host1: a Netgem N7800 "cable box" for online video streaming
>> services provided by the local telco (can access Netflix, HBO nordic,
>> "live TV", etc.)
>> linux router: Linux computer with a dual-core Intel Celeron G1840,
>> currently running Linux kernel 4.20.0-rc2 and openSUSE Leap 15.0
>> eth1: the Linux router's internal (NAT) interface, 192.168.0.1/24
>> network, mtu set to 1500, RTL8169sb/8110sb
>> eth0: the Linux router's internet-facing interface, public IP address,
>> mtu set to 1500, RTL8168evl/8111evl
>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly
>> connected to eth0 of the Linux router.
>>
>> When the Netgem N7800 is used with online video services (Netflix,
>> HBO nordic, etc.), the Linux router receives very BIG IP packets on
>> eth0, up to ~20 kB. This seems to lead to the following problems in
>> the Linux IP stack.
>>
>> IP fragmentation performance:
>> When the Linux router receives these large IP packets on eth0
>> everything works, but they seem to cause a very large degradation in
>> latency from the internal network to the internet while the IP
>> fragmentation is performed. The ping latency from the internal
>> network to the internet increases from a stable 15-20 ms up to
>> 700-800 ms, and so does the ping from the internal network to the
>> Linux router's eth1 (192.168.0.). However, the uplink works
>> perfectly: the ping from the Linux router to the internet stays
>> stable while streaming the online services.
>> It seems that the IP fragmentation is somehow blocking eth1
>> reception or transmission for a very long time (which it shouldn't).
>> I'm able to test and debug the issue further, but advice regarding
>> where to look would be appreciated.
>>
>>
>> DF Bit, mtu bug when forwarding:
>> I have started to study the above-mentioned problem and have found a
>> possible bug in the DF bit and mtu handling in IP forwarding. The BIG
>> packets received from streaming services all have the "DF bit" set,
>> and the question is whether we should be forwarding them at all, as
>> that would result in them being fragmented. Apparently we currently
>> are... I have traced this down to the function ip_exceeds_mtu() in
>> ip_forward.c, and the following patch seems to fix that.
>>
>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>>                 return false;
>>
>>         /* original fragment exceeds mtu and DF is set */
>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
>> +        if (unlikely(skb->len > mtu))
>>                 return true;
>>
>>         if (skb->ignore_df)
>>
>>
>> This seems to work (in some ways): after the change, IP packets that
>> are too large for the internal network get dropped and we send
>> "ICMP Destination unreachable, The datagram is too big" messages to
>> the originator (as we should?). However, it seems that not all
>> services really like this... Netflix behaves as expected and ping is
>> stable from the internal network to the internet, but for example HBO
>> nordic will not work anymore (too little buffering? Retransmissions
>> not working?). So it seems the original issue should also be fixed
>> (and the fragmentation should be allowed?).
>>
>>
>>
>> Any advice would be appreciated. Thanks!
>>
>> PS. Watching TV was not this intensive 20 years ago :)
>>
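
The ICMP timestamp figures in the hping3 output at the top of this
message can be split per direction with a few lines of Python. This is
a minimal sketch, not part of the original report: the sample tuples
are copied from that output, while the function names and the use of
the smallest sample as a baseline are assumptions made for
illustration. The clocks of the router and 192.168.0.112 are not
synchronized, so Receive - Originate contains an unknown constant
offset and only its variation between probes is meaningful.

```python
# Samples copied from the hping3 run at the top of this message:
# (icmp_seq, rtt_ms, originate_ms, receive_ms, transmit_ms)
SAMPLES = [
    (0, 7.9, 52294891, 52294895, 52294895),
    (1, 235.9, 52295891, 52296128, 52296128),
    (2, 3.8, 52296891, 52296895, 52296895),
    (3, 47.8, 52297891, 52297940, 52297940),
    (4, 7.7, 52298891, 52298895, 52298895),
    (5, 3.7, 52299891, 52299895, 52299895),
]


def outbound_excess(samples):
    """Receive - Originate per probe, minus the smallest value seen.

    Assumption: the fastest probe saw no queueing, so subtracting it
    removes the constant clock offset and the minimum path delay,
    leaving the extra delay on the router -> host leg in milliseconds.
    """
    raw = [receive - originate for _, _, originate, receive, _ in samples]
    baseline = min(raw)
    return [value - baseline for value in raw]


def rtt_excess(samples):
    """Measured RTT per probe, minus the smallest RTT seen (ms)."""
    rtts = [rtt for _, rtt, _, _, _ in samples]
    baseline = min(rtts)
    return [round(value - baseline, 1) for value in rtts]


if __name__ == "__main__":
    rows = zip(SAMPLES, outbound_excess(SAMPLES), rtt_excess(SAMPLES))
    for sample, out_ms, rtt_ms in rows:
        print(f"icmp_seq={sample[0]} outbound_excess={out_ms} ms "
              f"rtt_excess={rtt_ms} ms")
```

For the two slow probes (icmp_seq 1 and 3) the growth in
Receive - Originate matches the growth in RTT to within the 1 ms
timestamp resolution, which is consistent with the delay sitting on
the router-to-host leg. The timestamps alone cannot separate eth1
transmit queueing from the wire or from the host's receive path.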