netdev.vger.kernel.org archive mirror
From: Risto Pajula <or.pajula@gmail.com>
To: "David S. Miller" <davem@davemloft.net>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: netdev@vger.kernel.org
Subject: Re: IP fragmentation performance and don't fragment bug when forwarding
Date: Fri, 7 Dec 2018 16:46:44 +0200
Message-ID: <b5d114a8-ef56-bae5-5c7c-09db04d609a4@gmail.com>
In-Reply-To: <51078d4c-17de-9c9d-4ba2-07a4b8e73575@gmail.com>

Hello.

I have been able to track the poor forwarding latency down to the TCP
window scale option. The Netgem device uses a rather large window scale
factor (x256), and I have been able to reproduce the router's poor
forwarding latency with a Linux box in the internal network by setting
net.ipv4.tcp_rmem to a large value, which makes that box advertise a
larger TCP window scale factor as well. I still have no clue why this
causes forwarding in the Linux kernel to block. Maybe something in
connection tracking?


With ICMP timestamp messages I have also been able to pinpoint that
the latency arises on the eth1 sending side (the following hping3
example is run on the router towards the internal network):


xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
using eth1, addr: 192.168.0.1, MTU: 1500
HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0 
data bytes
len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
icmp_seq=0 rtt=7.9 ms
ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
ICMP timestamp RTT tsrtt=7

len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
icmp_seq=1 rtt=235.9 ms
ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
ICMP timestamp RTT tsrtt=235

len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
icmp_seq=2 rtt=3.8 ms
ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
ICMP timestamp RTT tsrtt=3

len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
icmp_seq=3 rtt=47.8 ms
ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
ICMP timestamp RTT tsrtt=47

len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
icmp_seq=4 rtt=7.7 ms
ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
ICMP timestamp RTT tsrtt=7

len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
icmp_seq=5 rtt=3.7 ms
ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
ICMP timestamp RTT tsrtt=3

^C
--- 192.168.0.112 hping statistic ---
6 packets tramitted, 6 packets received, 0% packet loss
round-trip min/avg/max = 3.7/51.1/235.9 ms
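
The per-leg numbers can be derived from the three ICMP timestamps as in
the following minimal sketch. It assumes the timestamps are milliseconds
since midnight UT (RFC 792) and that the clock offset between the two
hosts stays constant during the test, so only the variation of the
outbound value between probes is meaningful. The helper names are made up
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ICMP timestamps count milliseconds since midnight UT (RFC 792). */
#define MS_PER_DAY 86400000u

/*
 * Hypothetical helper: elapsed milliseconds from `earlier` to `later`,
 * allowing for a single wrap at midnight. Both values must be valid
 * ICMP timestamps, i.e. below MS_PER_DAY.
 */
static uint32_t ts_delta_ms(uint32_t earlier, uint32_t later)
{
    assert(earlier < MS_PER_DAY && later < MS_PER_DAY);
    return (later + MS_PER_DAY - earlier) % MS_PER_DAY;
}

/*
 * Apparent delay on the outbound leg (sender -> responder). It includes
 * any clock offset between the two hosts, so only its variation between
 * probes is meaningful, not its absolute value.
 */
static uint32_t outbound_ms(uint32_t originate, uint32_t receive)
{
    return ts_delta_ms(originate, receive);
}

/* Time the responder held the request before sending the reply. */
static uint32_t turnaround_ms(uint32_t receive, uint32_t transmit)
{
    return ts_delta_ms(receive, transmit);
}
```

Applied to the six probes above, the outbound values are 4, 237, 4, 49, 4
and 4 ms and the turnaround is 0 ms every time, so the slow probes
(icmp_seq=1 and icmp_seq=3) lose their time between the router sending
the request and the host timestamping its arrival.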



BR.
Risto

On 2.12.2018 23:32, Risto Pajula wrote:
> Hello.
>
> You can most likely ignore the "DF Bit, mtu bug when forwarding" case. 
> There aren't actually big IP packets on the wire; instead there are
> bursts of packets on the wire, which are combined by GRO, so dropping
> them should not happen. Sorry about the invalid bug report.
>
> However the poor latency from the internal network to the internet still
> remains, with GRO both enabled and disabled. I will try to study further...
>
>
> BR.
> Risto
>
>
> On 2.12.2018 14:01, Risto Pajula wrote:
>> Hello.
>>
>> I have encountered a weird performance problem in Linux IP 
>> fragmentation when using video streaming services behind the NAT. 
>> Also I have studied a possible bug in the DF bit (don't fragment) 
>> handling when forwarding the IP packets.
>>
>> First the system setup description:
>>
>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre router]-internet
>>
>> where:
>> host1: is a Netgem N7800 "cable box" for online video streaming 
>> services provided by local telco (Can access Netflix, HBO nordic, 
>> "live TV", etc.)
>> linux router: Linux computer with Dualcore Intel Celeron G1840, 
>> running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0
>> eth1: Linux router's internal (NAT) interface, 192.168.0.1/24 network,
>> mtu set to 1500, RTL8169sb/8110sb
>> eth0: Linux router's internet facing interface, public ip address, mtu
>> set to 1500, RTL8168evl/8111evl
>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly 
>> connected to the eth0 of the Linux router.
>>
>> And now when using the Netgem N7800 with online video services
>> (Netflix, HBO nordic, etc.) the Linux router receives very BIG IP
>> packets on eth0, up to ~20kB. This seems to lead to the following
>> problems in the Linux IP stack.
>>
>> IP fragmentation performance:
>> When the Linux router receives these large IP packets on eth0
>> everything works, but they seem to cause a very large increase in
>> latency from the internal network to the internet while the IP
>> fragmentation is performed. The ping latency from the internal
>> network to the internet increases from a stable 15ms-20ms up to
>> 700-800ms, and so does the ping from the internal network to the
>> linux router eth1 (192.168.0.). However the uplink works
>> perfectly: the ping from the linux router to the internet is still
>> stable while streaming the online services. It seems that the IP
>> fragmentation is somehow blocking the eth1 reception or transmission
>> for a very long time (which it shouldn't). I'm able to test and debug
>> the issue further, but advice regarding where to look would be
>> appreciated.
>>
>>
>> DF Bit, mtu bug when forwarding:
>> I have started to study the above mentioned problem and have found a
>> possible bug in the DF bit and mtu handling in IP forwarding. The BIG
>> packets received from streaming services all have the "DF bit" set,
>> and the question is whether we should be forwarding them at all, as
>> that would result in them being fragmented. Apparently we currently
>> are... I have traced this down to the function ip_exceeds_mtu() in
>> ip_forward.c, and the following patch seems to fix that.
>>
>> --- net/ipv4/ip_forward.c.orig  2018-12-02 11:09:32.764320780 +0200
>> +++ net/ipv4/ip_forward.c       2018-12-02 12:53:25.031232347 +0200
>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
>>                 return false;
>>
>>         /* original fragment exceeds mtu and DF is set */
>> -       if (unlikely(IPCB(skb)->frag_max_size > mtu))
>> +        if (unlikely(skb->len > mtu))
>>                 return true;
>>
>>         if (skb->ignore_df)
>>
>>
>> This seems to work (in some ways): after the change, IP packets that
>> are too large for the internal network get dropped and we send
>> "ICMP Destination unreachable, The datagram is too big" messages to
>> the originator (as we should?). However it seems that not all
>> services really like this... Netflix behaves as expected and ping is
>> stable from the internal network to the internet, but for example HBO
>> nordic will not work anymore (too little buffering? Retransmissions
>> not working?). So it seems the original issue should also be fixed
>> (and fragmentation should be allowed?).
>>
>>
>>
>> Any advice would be appreciated. Thanks!
>>
>> PS. Watching TV was not this intensive 20 years ago :)
>>
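
The 2.12. follow-up quoted above withdrew the DF report because the
~20kB packets were GRO aggregates rather than packets on the wire. That
distinction can be illustrated with the rough user-space model below. It
is only a sketch under the assumption that an aggregated packet is judged
by its largest on-wire segment and not by its combined length. struct
fwd_pkt and fwd_exceeds_mtu() are hypothetical names and do not mirror
the kernel's ip_exceeds_mtu() line by line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified packet descriptor; not a kernel structure. */
struct fwd_pkt {
    size_t len;      /* total length of the (possibly aggregated) packet */
    bool   df;       /* Don't Fragment bit set in the IP header */
    size_t seg_len;  /* largest on-wire segment if aggregated, else 0 */
};

/*
 * Return true when the packet would have to be dropped and answered
 * with "fragmentation needed", false when it can be forwarded.
 */
static bool fwd_exceeds_mtu(const struct fwd_pkt *p, size_t mtu)
{
    assert(p != NULL);

    if (p->len <= mtu)
        return false;   /* fits as it is */
    if (!p->df)
        return false;   /* may be fragmented by the router */
    if (p->seg_len != 0 && p->seg_len <= mtu)
        return false;   /* aggregate is re-split into segments that fit */
    return true;        /* genuinely too big and DF is set */
}
```

In this model a 20000 byte aggregate with DF set and 1500 byte segments
is forwarded on a 1500 byte MTU link, whereas comparing the combined
length against the MTU (as the skb->len patch does) would drop it.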


Thread overview: 14+ messages
2018-12-02 12:01 IP fragmentation performance and don't fragment bug when forwarding Risto Pajula
2018-12-02 21:32 ` Risto Pajula
2018-12-07 14:46   ` Risto Pajula [this message]
2018-12-09 23:28     ` IP (rtl8169) forwarding bug (performance) Risto Pajula
2018-12-10 21:26       ` Heiner Kallweit
2018-12-10 22:20         ` Risto Pajula
2018-12-11 17:01           ` Risto Pajula
2018-12-11 19:51             ` Heiner Kallweit
2018-12-12  1:28               ` Risto Pajula
2018-12-12  6:23                 ` Heiner Kallweit
2018-12-12 23:20                   ` Risto Pajula
2018-12-13  4:52                     ` Stephen Hemminger
2018-12-13 22:10                       ` Risto Pajula
2018-12-13 22:30                         ` Heiner Kallweit
