From mboxrd@z Thu Jan 1 00:00:00 1970 From: Stephen Hemminger Subject: Re: IP (rtl8169) forwarding bug (performance) Date: Wed, 12 Dec 2018 20:52:38 -0800 Message-ID: <20181212205238.25b30922@xeon-e3> References: <462e25db-8aad-7687-31e5-fb812d8daeaa@gmail.com> <51078d4c-17de-9c9d-4ba2-07a4b8e73575@gmail.com> <514e2dd2-2f2d-a038-9b21-410c6e5fd90d@gmail.com> <7ab35f56-cb8e-ddbc-ce8a-148a943dec92@gmail.com> <75bb86b8-8c1b-315c-7e4d-a64215dd9e2e@gmail.com> <45b17eaf-26f3-f446-0aaa-bee7d7f48807@gmail.com> <5a59227c-dd73-801f-0d5a-a7383ae45295@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: Heiner Kallweit , "David S. Miller" , Alexey Kuznetsov , Realtek linux nic maintainers , netdev@vger.kernel.org To: Risto Pajula Return-path: Received: from mail-pl1-f194.google.com ([209.85.214.194]:35945 "EHLO mail-pl1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726478AbeLMEyT (ORCPT ); Wed, 12 Dec 2018 23:54:19 -0500 Received: by mail-pl1-f194.google.com with SMTP id g9so448315plo.3 for ; Wed, 12 Dec 2018 20:54:19 -0800 (PST) In-Reply-To: <5a59227c-dd73-801f-0d5a-a7383ae45295@gmail.com> Sender: netdev-owner@vger.kernel.org List-ID: On Thu, 13 Dec 2018 01:20:48 +0200 Risto Pajula wrote: > Hello. >=20 > I'm not able reproduce the actual problem anymore which was the high=20 > ping latency from the internal network. >=20 > This starts to sound like some sort of voodoo, but... >=20 > I tested replacing the switch to another brand where internal network=20 > trtl8169 is connected. This did not have any effect on the behavior, the= =20 > latency remained high. >=20 > Then I disconnected every device and all the network cables from the=20 > internal network. The I connected everything again. After that=20 > reproducing the problem seems very difficult. Likelihood for this to=20 > happen dramatically decreased. >=20 > Looking at pings 30min I once got this. >=20 > Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 > Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 > Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 > Reply from 192.168.0.1: bytes=3D32 time=3D18ms TTL=3D64 > Reply from 192.168.0.1: bytes=3D32 time=3D24ms TTL=3D64 > Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 > Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 >=20 > I did also look at the kernel traces, it seems the rtl8169 send FIFO=20 > still stalls, but it does not happen as often and in practice is hard to= =20 > notice for the user. >=20 > For example I did find this kind sequence: >=20 > t0: rtl8169_start_xmit first 1506 byte frame is written > ...9 more frames is transmitted > t0+350us: 2 frames is still not sent.=C2=A0 (transmission should have tak= en=20 > 120us + interframe time) >=20 > BR. > Risto >=20 > On 12.12.2018 8:23, Heiner Kallweit wrote: > > According to your description of the issue it doesn't need a very exoti= c scenario to trigger it. > > And due to the fact that Realtek network chips are used on a lot of con= sumer mainboards, I would > > assume quite some people are using such a mainboard for a use case like= yours. > > This makes it somewhat suspicious that at least I'm not aware of more s= uch reports. > > Therefore I think it's a good idea to check the other components of the= internal network. > > > > On 12.12.2018 02:28, Risto Pajula wrote: =20 > >> Hello. > >> > >> I added some debug prints to diagnose the bug properly. I can send the= patches if you are willing to debug/try... for example this output is prod= uced: > >> > >> 96096:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.466703: rtl8169_start_x= mit: RTLDBG221 eth1 rtl8169_start_xmit len: 1506 opts1: B0000000 txpol: 0 e= ntry: 24 cur_tx: 5656 frags: 0 dirty_tx: 5648 tx_left: 8 > >> 96097:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.466704: rtl8169_start_x= mit: RTLDBG111 eth1 rtl_tx_slots_avail > >> 96099:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] d.h.=C2=A0=C2=A0 232.468827: rtl8169_interru= pt: RTLDBG111 eth1 rtl8169_interrupt > >> 96103:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.468833: rtl8169_poll: R= TLDBG111 eth1 rtl8169_poll > >> 96104:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.468834: rtl8169_poll: R= TLDBG111 eth1 rtl_rx > >> 96105:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.468835: rtl8169_poll: R= TLDBG111 eth1 rtl8169_try_rx_copy > >> 96184:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.468914: rtl_tx: RTLDBG2= 22 eth1 rtl_tx len: 1506 opts1: B00005E2 txpol: 0 entry: 16 cur_tx: 5657 fr= ags: 0 dirty_tx: 5648 tx_left: 9 > >> 96185:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.468924: rtl_tx: RTLDBG2= 23 eth1 rtl_tx entry opts1: 16, B00005E2 17, B00005E2 18, B00005E2 19, B000= 05E2 20, B00005E2 21, B00005E2 22, B00005E2 23, B00005E2 24, B00005E2 > >> 96189:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] d.h.=C2=A0=C2=A0 232.469006: rtl8169_interru= pt: RTLDBG111 eth1 rtl8169_interrupt > >> 96193:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.469012: rtl8169_poll: R= TLDBG111 eth1 rtl8169_poll > >> 96194:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.469014: rtl8169_poll: R= TLDBG111 eth1 rtl_rx > >> 96196:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.469020: rtl_tx: RTLDBG2= 22 eth1 rtl_tx len: 1506 opts1: 30000000 txpol: 0 entry: 16 cur_tx: 5657 fr= ags: 0 dirty_tx: 5648 tx_left: 9 > >> 96204:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.469032: rtl_tx: RTLDBG2= 22 eth1 rtl_tx len: 1506 opts1: 30000000 txpol: 0 entry: 17 cur_tx: 5657 fr= ags: 0 dirty_tx: 5649 tx_left: 8 > >> 96212:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.469041: rtl_tx: RTLDBG2= 22 eth1 rtl_tx len: 1506 opts1: B00005E2 txpol: 0 entry: 18 cur_tx: 5657 fr= ags: 0 dirty_tx: 5650 tx_left: 7 > >> 96213:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0=C2=A0 232.469050: rtl_tx: RTLDBG2= 23 eth1 rtl_tx entry opts1: 18, B00005E2 19, B00005E2 20, B00005E2 21, B000= 05E2 22, B00005E2 23, B00005E2 24, B00005E2 > >> 96235:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0= =C2=A0=C2=A0=C2=A0=C2=A0 [000] d.h.=C2=A0=C2=A0 232.471253: rtl8169_interru= pt: RTLDBG111 eth1 rtl8169_interrupt > >> > >> Here we can clearly see that TX Fifo descriptors are properly setup an= d send is initiated, but then the rtl8169 stops sending after few frames. I= think the bug is in the HW side (or in the rtl8169 firmware). Some PCI set= tings might affect this, but then again shouldn't the rtl8169 then report P= CI error interrupt... > >> > >> Also this TX Fifo stalling happens with any traffic and no netflix/net= gem/large tcp windows are required to trigger it. What the netgem/netflix d= oes is that it causes this bug to be visible by probably due to timing of t= he traffic pattern, and thus causes also upper layers of the network stack = to take actions.. > >> > >> RTL8168evl indeed is a onboard Ethernet on a reasonable new mothorboar= d and the RTL8169sb is and old PCI card. But, the same issue happens with b= oth ethernet adapters (I switched the interfaces). > >> > >> But then again - this got me thinking... If the issue (send fifo stall= ing) does not happen in the internet facing interface even with heavy uploa= d traffic, then the problem must be in the internal network itself. Tomorro= w I will change the Ethernet switch. There could be some sort of compatibil= ity problem? > >> > >> BR. > >> Risto > >> > >> > >> On 11.12.2018 21:51, Heiner Kallweit wrote: =20 > >>> OK, then another idea .. At the very beginning of the mail thread it > >>> was stated that the router has to network ports: > >>> linux router: Linux computer with Dualcore Intel Celeron G1840, runni= ng currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0 > >>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 network,= mtu set to 1500, RTL8169sb/8110sb > >>> eth0: Linux Routers internet facing interface, public ip address, mtu= set to 1500,=C2=A0 RTL8168evl/8111evl > >>> > >>> r8169 supports about 50 members of the RTL8169/RTL8168/RTL8101 family > >>> and more or less every single member needs its own quirks. > >>> RTL8168evl (PCIe) is somewhat recent, RTL8169sb (PCI) is ancient. > >>> To rule out you triggered some hardware issue: Could you switch both > >>> interfaces and check whether you see a change in system behavior? > >>> > >>> On 11.12.2018 18:01, Risto Pajula wrote: =20 > >>>> Hello. > >>>> > >>>> A freshly built 4.20.0-rc6-next-20181210-lp150.12.25-default waited = me when I got back from work, but unfortunately it did not help at all, it = behaved exactly in same manner. > >>>> > >>>> Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 > >>>> Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 > >>>> Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 > >>>> Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 > >>>> Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 > >>>> Reply from 192.168.0.1: bytes=3D32 time=3D73ms TTL=3D64 > >>>> Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 > >>>> Reply from 192.168.0.1: bytes=3D32 time=3D83ms TTL=3D64 > >>>> Reply from 192.168.0.1: bytes=3D32 time=3D307ms TTL=3D64 > >>>> Reply from 192.168.0.1: bytes=3D32 time=3D115ms TTL=3D64 > >>>> Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 > >>>> Reply from 192.168.0.1: bytes=3D32 time<1ms TTL=3D64 > >>>> > >>>> BR. > >>>> Risto > >>>> > >>>> > >>>> On 11.12.2018 0:20, Risto Pajula wrote: =20 > >>>>> Hello. > >>>>> > >>>>> I have not yet tested with linux-next but I will, thanks for pointi= ng that out. > >>>>> > >>>>> ...But I have studied the problem a bit more, indeed it seems that = the rtl8169 transmission queue gets stuck. > >>>>> > >>>>> Below is some trace log. Starting from 802026 a burst of frames is = forwarded from eth0 to eth1 and when we get to the rtl_tx only two frames h= ave been transmitted.. (Should have had 1,25 ms to transmit them).. Also th= en the sequence seems to repeat for very long time that only two massages g= et transmitted. > >>>>> > >>>>> 800541:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698143: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 34 cur_tx: 4406946 frags:= 0 dirty_tx: 4406946 tx_left: 0 > >>>>> 800671:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698189: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 35 cur_tx: 4406947 frags:= 0 dirty_tx: 4406946 tx_left: 1 > >>>>> 800835:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698235: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 36 cur_tx: 4406948 frags:= 0 dirty_tx: 4406946 tx_left: 2 > >>>>> 800905:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.698255: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 44069= 46 tx_left: 3 > >>>>> 800932:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.698263: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 44069= 47 tx_left: 2 > >>>>> 800960:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.698270: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 37 cur_tx: 4406949 frags: 0 dirty_tx: 44069= 48 tx_left: 1 > >>>>> 800998:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698282: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 37 cur_tx: 4406949 frags:= 0 dirty_tx: 4406949 tx_left: 0 > >>>>> 801085:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.698312: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 38 cur_tx: 4406950 frags: 0 dirty_tx: 44069= 49 tx_left: 1 > >>>>> 801137:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698329: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 38 cur_tx: 4406950 frags:= 0 dirty_tx: 4406950 tx_left: 0 > >>>>> 801229:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.698363: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 39 cur_tx: 4406951 frags: 0 dirty_tx: 44069= 50 tx_left: 1 > >>>>> 801264:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698375: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 39 cur_tx: 4406951 frags:= 0 dirty_tx: 4406951 tx_left: 0 > >>>>> 801356:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.698409: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 40 cur_tx: 4406952 frags: 0 dirty_tx: 44069= 51 tx_left: 1 > >>>>> 801391:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698421: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 40 cur_tx: 4406952 frags:= 0 dirty_tx: 4406952 tx_left: 0 > >>>>> 801482:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.698455: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 41 cur_tx: 4406953 frags: 0 dirty_tx: 44069= 52 tx_left: 1 > >>>>> 801518:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698468: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 41 cur_tx: 4406953 frags:= 0 dirty_tx: 4406953 tx_left: 0 > >>>>> 801610:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.698502: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 42 cur_tx: 4406954 frags: 0 dirty_tx: 44069= 53 tx_left: 1 > >>>>> 801645:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698514: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 42 cur_tx: 4406954 frags:= 0 dirty_tx: 4406954 tx_left: 0 > >>>>> 801737:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.698548: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 43 cur_tx: 4406955 frags: 0 dirty_tx: 44069= 54 tx_left: 1 > >>>>> 801772:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698560: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 43 cur_tx: 4406955 frags:= 0 dirty_tx: 4406955 tx_left: 0 > >>>>> 801864:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.698594: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 44 cur_tx: 4406956 frags: 0 dirty_tx: 44069= 55 tx_left: 1 > >>>>> 801899:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698607: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 44 cur_tx: 4406956 frags:= 0 dirty_tx: 4406956 tx_left: 0 > >>>>> 801984:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.698640: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 45 cur_tx: 4406957 frags: 0 dirty_tx: 44069= 56 tx_left: 1 > >>>>> 802026:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698655: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 45 cur_tx: 4406957 frags:= 0 dirty_tx: 4406957 tx_left: 0 > >>>>> 802129:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698701: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 46 cur_tx: 4406958 frags:= 0 dirty_tx: 4406957 tx_left: 1 > >>>>> 802232:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698747: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 47 cur_tx: 4406959 frags:= 0 dirty_tx: 4406957 tx_left: 2 > >>>>> 802335:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698793: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 48 cur_tx: 4406960 frags:= 0 dirty_tx: 4406957 tx_left: 3 > >>>>> 802438:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698840: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 49 cur_tx: 4406961 frags:= 0 dirty_tx: 4406957 tx_left: 4 > >>>>> 802541:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698885: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 50 cur_tx: 4406962 frags:= 0 dirty_tx: 4406957 tx_left: 5 > >>>>> 802644:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698932: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 51 cur_tx: 4406963 frags:= 0 dirty_tx: 4406957 tx_left: 6 > >>>>> 802747:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.698978: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 52 cur_tx: 4406964 frags:= 0 dirty_tx: 4406957 tx_left: 7 > >>>>> 802851:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.699025: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 53 cur_tx: 4406965 frags:= 0 dirty_tx: 4406957 tx_left: 8 > >>>>> 805094:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.699871: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 44069= 57 tx_left: 9 > >>>>> 805126:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.699878: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 54 cur_tx: 4406966 frags: 0 dirty_tx: 44069= 58 tx_left: 8 > >>>>> 805197:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.699905: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 54 cur_tx: 4406966 frags:= 0 dirty_tx: 4406959 tx_left: 7 > >>>>> 805234:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [001] ..s.=C2=A0 7237.699925: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 55 cur_tx: 4406967 frags:= 0 dirty_tx: 4406959 tx_left: 8 > >>>>> 811446:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.702344: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 44069= 59 tx_left: 9 > >>>>> 811479:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.702354: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 56 cur_tx: 4406968 frags: 0 dirty_tx: 44069= 60 tx_left: 8 > >>>>> 811641:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.702395: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 56 cur_tx: 4406968 frags:= 0 dirty_tx: 4406961 tx_left: 7 > >>>>> 811733:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.702416: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 57 cur_tx: 4406969 frags:= 0 dirty_tx: 4406961 tx_left: 8 > >>>>> 812050:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hping= 3-2871=C2=A0 [000] ..s.=C2=A0 7237.704770: rtl8169_poll: ORP33 eth1 rtl_tx = len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406961 tx_left: 9 > >>>>> 812061:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hping= 3-2871=C2=A0 [000] ..s.=C2=A0 7237.704777: rtl8169_poll: ORP33 eth1 rtl_tx = len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 4406962 tx_left: 8 > >>>>> 812124:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hping= 3-2871=C2=A0 [000] ..s.=C2=A0 7237.704811: rtl8169_start_xmit: ORP33 eth1 r= tl8169_start_xmit len: 1506 entry: 58 cur_tx: 4406970 frags: 0 dirty_tx: 44= 06963 tx_left: 7 > >>>>> 812176:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hping= 3-2871=C2=A0 [000] ..s.=C2=A0 7237.704831: rtl8169_start_xmit: ORP33 eth1 r= tl8169_start_xmit len: 1506 entry: 59 cur_tx: 4406971 frags: 0 dirty_tx: 44= 06963 tx_left: 8 > >>>>> 812300:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.707247: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 44069= 63 tx_left: 9 > >>>>> 812312:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.707257: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 60 cur_tx: 4406972 frags: 0 dirty_tx: 44069= 64 tx_left: 8 > >>>>> 812389:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.707297: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 60 cur_tx: 4406972 frags:= 0 dirty_tx: 4406965 tx_left: 7 > >>>>> 812426:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.707318: rtl8169_start_xmi= t: ORP33 eth1 rtl8169_start_xmit len: 1506 entry: 61 cur_tx: 4406973 frags:= 0 dirty_tx: 4406965 tx_left: 8 > >>>>> 812827:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.709690: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 44069= 65 tx_left: 9 > >>>>> 812838:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 7237.709700: rtl8169_poll: ORP= 33 eth1 rtl_tx len: 1506 entry: 62 cur_tx: 4406974 frags: 0 dirty_tx: 44069= 66 tx_left: 8 > >>>>> ..... > >>>>> > >>>>> > >>>>> BR. > >>>>> Risto > >>>>> > >>>>> On 10.12.2018 23:26, Heiner Kallweit wrote: =20 > >>>>>> Did you test also with the latest linux-next kernel? Some recent c= hanges like 2e6eedb4813e > >>>>>> "r8169: make use of xmit_more and __netdev_sent_queue" may have a = positive impact. > >>>>>> > >>>>>> On 10.12.2018 00:28, Risto Pajula wrote: =20 > >>>>>>> Hello. > >>>>>>> > >>>>>>> Old subject: "Re: IP fragmentation performance and don't fragment= bug when forwarding > >>>>>>> > >>>>>>> > >>>>>>> I have now been tracing the kernel and finding the bug seems diff= icult. I think the bug is combination of several things, likely cause is th= at it only occurs with rtl8169 and how it is using the netdevapi/DQL api. > >>>>>>> > >>>>>>> =C2=A0=C2=A0From my investigations seems that following happens: > >>>>>>> > >>>>>>> *Burst of frames is received from internet. (eth0) > >>>>>>> *These are accepted and forwarded to the (eth1) > >>>>>>> *DQL hits the limit, this causes the scheduling for the device to= be stopped (__QUEUE_STATE_STACK_XOFF) > >>>>>>> *This and combination of some timing causes the=C2=A0 performance= degradation, messages are only scheduled for transmission in soft Irq cont= ext. > >>>>>>> > >>>>>>> I still do now know, why the DQL hits the limits.. I think it sho= uld not,=C2=A0 maybe something undesirable first happens with rtl8169. > >>>>>>> > >>>>>>> I inserted following trace printk functions to the code.. > >>>>>>> > >>>>>>> rtl8169_start_xmit: > >>>>>>> trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_t= x: %u frags: %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, skb->len, entr= y, tp->cur_tx, frags, tp->dirty_tx, (tp->cur_tx - tp->dirty_tx) ); > >>>>>>> > >>>>>>> > >>>>>>> rtl_tx: > >>>>>>> trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags:= %d dirty_tx: %u tx_left: %u\n" , tp->dev->name, tx_skb->skb->len, tp->cur_= tx % NUM_TX_DESC, tp->cur_tx, 0, dirty_tx, tx_left ); > >>>>>>> > >>>>>>> > >>>>>>> Here is some grepped output only filtering the ping packet in two= different situations: > >>>>>>> > >>>>>>> //trace when downloading ubuntu (~20MB/s), only 3 secs fit to buf= fer. ping is all the time normal.... > >>>>>>> //orig tracing.out3754 > >>>>>>> 117493:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 5235.407116: rtl8169_poll: O= RP33 eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 dirty_tx: 37476= 40 tx_left: 1 > >>>>>>> 118704:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 5235.416057: rtl8169_poll: O= RP33 eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 dirty_tx: 37476= 67 tx_left: 1 > >>>>>>> 346319:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hpi= ng3-2871=C2=A0 [000] ....=C2=A0 5236.002456: rtl8169_start_xmit: ORP33 eth1= rtl8169_start_xmit len: 54 entry: 8 cur_tx: 3754312 frags: 0 dirty_tx: 375= 4312 tx_left: 0 > >>>>>>> 346337:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hpi= ng3-2871=C2=A0 [000] ..s.=C2=A0 5236.002475: rtl8169_poll: ORP33 eth1 rtl_t= x len: 54 entry: 9 cur_tx: 3754313 frags: 0 dirty_tx: 3754312 tx_left: 1 > >>>>>>> 1277155:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hp= ing3-2871=C2=A0 [001] ....=C2=A0 5237.002627: rtl8169_start_xmit: ORP33 eth= 1 rtl8169_start_xmit len: 54 entry: 38 cur_tx: 3766502 frags: 0 dirty_tx: 3= 766502 tx_left: 0 > >>>>>>> 1277173:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 5237.002646: rtl8169_poll: = ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 frags: 0 dirty_tx: 3766= 502 tx_left: 1 > >>>>>>> > >>>>>>> > >>>>>>> //imcp messages from route to intenal network when TCP stream is = on. tracing.out3236/eth1_ring.txt is the original > >>>>>>> //no idea why the first frame is tripled... > >>>>>>> 200792:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hpi= ng3-2871=C2=A0 [001] ....=C2=A0 3239.593384: rtl8169_start_xmit: ORP33 eth1= rtl8169_start_xmit len: 54 entry: 34 cur_tx: 2132770 frags: 0 dirty_tx: 21= 32770 tx_left: 0 > >>>>>>> 201578:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hpi= ng3-2871=C2=A0 [001] ....=C2=A0 3240.593625: rtl8169_start_xmit: ORP33 eth1= rtl8169_start_xmit len: 54 entry: 40 cur_tx: 2132776 frags: 0 dirty_tx: 21= 32776 tx_left: 0 > >>>>>>> 341061:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3241.738011: rtl8169_start_x= mit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52 cur_tx: 2133684 frags:= 0 dirty_tx: 2133680 tx_left: 4 > >>>>>>> 342112:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3241.745297: rtl8169_poll: O= RP33 eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 dirty_tx: 21336= 84 tx_left: 7 > >>>>>>> 500126:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3242.739500: rtl8169_start_x= mit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49 cur_tx: 2134513 frags:= 0 dirty_tx: 2134505 tx_left: 8 > >>>>>>> 501739:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3242.751654: rtl8169_poll: O= RP33 eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 dirty_tx: 21345= 13 tx_left: 9 > >>>>>>> 646220:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3243.745761: rtl8169_start_x= mit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41 cur_tx: 2135273 frags:= 0 dirty_tx: 2135267 tx_left: 6 > >>>>>>> 647633:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3243.755486: rtl8169_poll: O= RP33 eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 dirty_tx: 21352= 73 tx_left: 8 > >>>>>>> 802878:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3244.739947: rtl8169_start_x= mit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2136087 frags:= 0 dirty_tx: 2136081 tx_left: 6 > >>>>>>> 804298:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3244.749677: rtl8169_poll: O= RP33 eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 dirty_tx: 21360= 87 tx_left: 8 > >>>>>>> 961190:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3245.746217: rtl8169_start_x= mit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16 cur_tx: 2136912 frags:= 0 dirty_tx: 2136906 tx_left: 6 > >>>>>>> 962610:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3245.755946: rtl8169_poll: O= RP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 dirty_tx: 21369= 12 tx_left: 8 > >>>>>>> 1118044:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3246.740336: rtl8169_start_= xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62 cur_tx: 2137726 frags= : 0 dirty_tx: 2137720 tx_left: 6 > >>>>>>> 1119443:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3246.750047: rtl8169_poll: = ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 frags: 0 dirty_tx: 21377= 26 tx_left: 7 > >>>>>>> 1264047:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3247.709202: rtl8169_start_= xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29 cur_tx: 2138525 frags= : 0 dirty_tx: 2138518 tx_left: 7 > >>>>>>> 1264740:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3247.718918: rtl8169_poll: = ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 frags: 0 dirty_tx: 2138= 525 tx_left: 4 > >>>>>>> 1419958:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3248.736436: rtl8169_start_= xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48 cur_tx: 2139312 frags= : 0 dirty_tx: 2139305 tx_left: 7 > >>>>>>> 1421551:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3248.746189: rtl8169_poll: = ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 frags: 0 dirty_tx: 2139= 312 tx_left: 8 > >>>>>>> 1578746:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3249.742702: rtl8169_start_= xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46 cur_tx: 2140142 frags= : 0 dirty_tx: 2140135 tx_left: 7 > >>>>>>> 1580194:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3249.752457: rtl8169_poll: = ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 frags: 0 dirty_tx: 2140= 142 tx_left: 8 > >>>>>>> 1729597:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3250.746839: rtl8169_start_= xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36 cur_tx: 2140964 frags= : 0 dirty_tx: 2140957 tx_left: 7 > >>>>>>> 1731015:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3250.756594: rtl8169_poll: = ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 frags: 0 dirty_tx: 2140= 964 tx_left: 7 > >>>>>>> 1881028:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3251.740991: rtl8169_start_= xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19 cur_tx: 2141779 frags= : 0 dirty_tx: 2141772 tx_left: 7 > >>>>>>> 1881381:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3251.750718: rtl8169_poll: = ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 frags: 0 dirty_tx: 2141= 779 tx_left: 7 > >>>>>>> //something happens, the output queue empties.. now it is fast, a= nd rtl8169_start_xmit is not called from soft_irq > >>>>>>> 1896178:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hp= ing3-2871=C2=A0 [001] ....=C2=A0 3252.595994: rtl8169_start_xmit: ORP33 eth= 1 rtl8169_start_xmit len: 54 entry: 23 cur_tx: 2141911 frags: 0 dirty_tx: 2= 141911 tx_left: 0 > >>>>>>> 1896196:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3252.596055: rtl8169_poll: = ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 frags: 0 dirty_tx: 2141= 911 tx_left: 1 > >>>>>>> 1923136:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hp= ing3-2871=C2=A0 [001] ....=C2=A0 3253.596227: rtl8169_start_xmit: ORP33 eth= 1 rtl8169_start_xmit len: 54 entry: 50 cur_tx: 2142066 frags: 0 dirty_tx: 2= 142066 tx_left: 0 > >>>>>>> 1923154:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3253.596289: rtl8169_poll: = ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 frags: 0 dirty_tx: 2142= 066 tx_left: 1 > >>>>>>> 1951633:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hp= ing3-2871=C2=A0 [001] ....=C2=A0 3254.596459: rtl8169_start_xmit: ORP33 eth= 1 rtl8169_start_xmit len: 54 entry: 5 cur_tx: 2142213 frags: 0 dirty_tx: 21= 42213 tx_left: 0 > >>>>>>> 1951651:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3254.596521: rtl8169_poll: = ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 frags: 0 dirty_tx: 21422= 13 tx_left: 1 > >>>>>>> 1954472:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 hp= ing3-2871=C2=A0 [001] ....=C2=A0 3255.596700: rtl8169_start_xmit: ORP33 eth= 1 rtl8169_start_xmit len: 54 entry: 15 cur_tx: 2142223 frags: 0 dirty_tx: 2= 142223 tx_left: 0 > >>>>>>> 1954490:=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 -0=C2=A0=C2=A0=C2=A0=C2=A0 [000] ..s.=C2=A0 3255.596763: rtl8169_poll: = ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 frags: 0 dirty_tx: 2142= 223 tx_left: 1 > >>>>>>> //until it soon, again is slow.. > >>>>>>> > >>>>>>> > >>>>>>> BR. > >>>>>>> Risto > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On 7.12.2018 16:46, Risto Pajula wrote: =20 > >>>>>>>> Hello. > >>>>>>>> > >>>>>>>> I have been to track the poor forwarding latency to the TCP Wind= ow scale options. The Netgem device uses rather large windows scale options= (x256) and I have been able to reproduce the routers poor forwarding laten= cy also with linux box running in the internal network and changing the net= .ipv4.tcp_rmem to a large value and thus changing the TCP window scaling op= tions to larger ones. I still do not have clue why this causes the forwarfi= ng in the linux kernel to block? Maybe something in the connection tracking= ....? > >>>>>>>> > >>>>>>>> > >>>>>>>> With the ICMP timestamp messages I have been able to also pinpoi= nt that the latency is caused in the eth1 sending side (the following hping= 3 example is run in the router toward the internal network... > >>>>>>>> > >>>>>>>> > >>>>>>>> xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp= -ts -V > >>>>>>>> using eth1, addr: 192.168.0.1, MTU: 1500 > >>>>>>>> HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 head= ers + 0 data bytes > >>>>>>>> len=3D46 ip=3D192.168.0.112 ttl=3D64 id=3D49464 tos=3D0 iplen=3D= 40 > >>>>>>>> icmp_seq=3D0 rtt=3D7.9 ms > >>>>>>>> ICMP timestamp: Originate=3D52294891 Receive=3D52294895 Transmit= =3D52294895 > >>>>>>>> ICMP timestamp RTT tsrtt=3D7 > >>>>>>>> > >>>>>>>> len=3D46 ip=3D192.168.0.112 ttl=3D64 id=3D49795 tos=3D0 iplen=3D= 40 > >>>>>>>> icmp_seq=3D1 rtt=3D235.9 ms > >>>>>>>> ICMP timestamp: Originate=3D52295891 Receive=3D52296128 Transmit= =3D52296128 > >>>>>>>> ICMP timestamp RTT tsrtt=3D235 > >>>>>>>> > >>>>>>>> len=3D46 ip=3D192.168.0.112 ttl=3D64 id=3D49941 tos=3D0 iplen=3D= 40 > >>>>>>>> icmp_seq=3D2 rtt=3D3.8 ms > >>>>>>>> ICMP timestamp: Originate=3D52296891 Receive=3D52296895 Transmit= =3D52296895 > >>>>>>>> ICMP timestamp RTT tsrtt=3D3 > >>>>>>>> > >>>>>>>> len=3D46 ip=3D192.168.0.112 ttl=3D64 id=3D50685 tos=3D0 iplen=3D= 40 > >>>>>>>> icmp_seq=3D3 rtt=3D47.8 ms > >>>>>>>> ICMP timestamp: Originate=3D52297891 Receive=3D52297940 Transmit= =3D52297940 > >>>>>>>> ICMP timestamp RTT tsrtt=3D47 > >>>>>>>> > >>>>>>>> len=3D46 ip=3D192.168.0.112 ttl=3D64 id=3D51266 tos=3D0 iplen=3D= 40 > >>>>>>>> icmp_seq=3D4 rtt=3D7.7 ms > >>>>>>>> ICMP timestamp: Originate=3D52298891 Receive=3D52298895 Transmit= =3D52298895 > >>>>>>>> ICMP timestamp RTT tsrtt=3D7 > >>>>>>>> > >>>>>>>> len=3D46 ip=3D192.168.0.112 ttl=3D64 id=3D52245 tos=3D0 iplen=3D= 40 > >>>>>>>> icmp_seq=3D5 rtt=3D3.7 ms > >>>>>>>> ICMP timestamp: Originate=3D52299891 Receive=3D52299895 Transmit= =3D52299895 > >>>>>>>> ICMP timestamp RTT tsrtt=3D3 > >>>>>>>> > >>>>>>>> ^C > >>>>>>>> --- 192.168.0.112 hping statistic --- > >>>>>>>> 6 packets tramitted, 6 packets received, 0% packet loss > >>>>>>>> round-trip min/avg/max =3D 3.7/51.1/235.9 ms > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> BR. > >>>>>>>> Risto > >>>>>>>> > >>>>>>>> On 2.12.2018 23:32, Risto Pajula wrote: =20 > >>>>>>>>> Hello. > >>>>>>>>> > >>>>>>>>> You can most likely ignore the "DF Bit, mtu bug when forwarding= " case. There isn't actually big IP packets on the wire, instead there is b= urst of packets on the wire, which are combined by the GRO... And thus drop= ping them should not happen. Sorry about the invalid bug report. > >>>>>>>>> > >>>>>>>>> However the poor latency from intenal network to the internet s= till remain, both GRO enabled and disabled. I will try to study further... > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> BR. > >>>>>>>>> Risto > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 2.12.2018 14:01, Risto Pajula wrote: =20 > >>>>>>>>>> Hello. > >>>>>>>>>> > >>>>>>>>>> I have encountered a weird performance problem in Linux IP fra= gmentation when using video streaming services behind the NAT. Also I have = studied a possible bug in the DF bit (don't fragment) handling when forward= ing the IP packets. > >>>>>>>>>> > >>>>>>>>>> First the system setup description: > >>>>>>>>>> > >>>>>>>>>> [host1]-int lan-(eth1)[linux router](eth0)-extlan-[fibre route= r]-internet > >>>>>>>>>> > >>>>>>>>>> where: > >>>>>>>>>> host1: is a Netgem N7800 "cable box" for online video streamin= g services provided by local telco (Can access Netflix, HBO nordic, "live T= V", etc.) > >>>>>>>>>> linux router: Linux computer with Dualcore Intel Celeron G1840= , running currently Linux kernel 4.20.0-rc2, and openSUSE Leap 15.0 > >>>>>>>>>> eth1: Linux Routers internal (NAT) interface, 192.168.0.1/24 n= etwork, mtu set to 1500, RTL8169sb/8110sb > >>>>>>>>>> eth0: Linux Routers internet facing interface, public ip addre= ss, mtu set to 1500,=C2=A0 RTL8168evl/8111evl > >>>>>>>>>> fibre router: Alcatel Lucent fibre router (I-241G-Q), directly= connected to the eth0 of the Linux router. > >>>>>>>>>> > >>>>>>>>>> And now when using the Netgem N7800 with online video services= (Netflix, HBO nordic, etc) the Linux router will receive very BIG IP packe= ts in the eth0 upto ~20kB, this seems to lead to the following problems in = the Linux IP stack. > >>>>>>>>>> > >>>>>>>>>> IP fragmentation performance: > >>>>>>>>>> When the Linux router receives these large IP packets in the e= th0 everything works, but it seems that them cause very large performance d= egradation from internal network to the internet regarding the latency when= the IP fragmentation is performed. The ping latency from internal network = to the internel network increases from stable 15ms-20ms up to 700-800ms AND= also the ping from the internal network to the linux router eth1 (192.168.= 0.). However up link works perfectly, the ping is still stable when streami= ng the online services (From linux router to the internet). It seems that t= he IP fragmentation is somehow blocking the eth1 reception or transmission = for very long time (which it shouldn't). I'm able to test and debug the iss= ue further, but advice regarding where to look would be appreciated. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> DF Bit, mtu bug when forwarding: > >>>>>>>>>> I have started to study the above mentioned problem and have f= ound a possible bug in the DF bit and mtu handling in IP forwarding. The BI= G packets received from streaming services all have the "DF bit" set and th= e question is that should we be forwarding them at all as that would result= them being fragmented? Apparently we currently are... I have traced this d= own to the ip_forward.c function ip_exceeds_mtu(), and the following patch = seems to fix that. > >>>>>>>>>> > >>>>>>>>>> --- net/ipv4/ip_forward.c.orig=C2=A0 2018-12-02 11:09:32.76432= 0780 +0200 > >>>>>>>>>> +++ net/ipv4/ip_forward.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 = 2018-12-02 12:53:25.031232347 +0200 > >>>>>>>>>> @@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct > >>>>>>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return false; > >>>>>>>>>> > >>>>>>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 /* ori= ginal fragment exceeds mtu and DF is set */ > >>>>>>>>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (unlikely(IPCB(skb)->= frag_max_size > mtu)) > >>>>>>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (unlikely(skb->= len > mtu)) > >>>>>>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return true; > >>>>>>>>>> > >>>>>>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (sk= b->ignore_df) > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> This seems to work (in some ways) - after the change IP packet= s that are too large to the internal network get dropped and we are sending= "ICMP Destination unreachable, The datagram is too big" messages to the or= iginator (as we should?). However it seems that not all services really lik= e this... Netflix behaves as expected and ping is stable from internal netw= ork to the internet, but for example HBO nordic will not work anymore (too = little buffering? Retransimissions not working?). So it seems the original = issue should be also fixed (And the fragmention should be allowed?). > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Any advice would be appreciated. Thanks! > >>>>>>>>>> > >>>>>>>>>> PS. Watching TV was not this intensive 20 years ago :) > >>>>>>>>>> =20 Did you disable ethernet flow control? Ethernet flow control is usually a bad idea, it can cause head of line blocking. Unfortunately, most devices default to on.