From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jesper Dangaard Brouer
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal
 users traffic
Date: Sat, 10 Nov 2018 23:06:30 +0100
Message-ID: <20181110230630.0daeba8e@redhat.com>
References: <61697e49-e839-befc-8330-fc00187c48ee@itcare.pl>
 <659fbf4b481c815f45a58b2351481cc9f761445b.camel@mellanox.com>
 <6486d01d-7a50-33c4-e27f-4ace8aa8e150@itcare.pl>
 <920c2665-781f-5f62-efbe-347e63063a24@itcare.pl>
 <162e25c6-dae2-7e1e-75f0-9c5b22453495@itcare.pl>
 <20181110203409.482f39ec@redhat.com>
 <7037c58d-d77d-bdd5-6c91-19cea3cbe539@itcare.pl>
 <69c11e38-da50-15a3-2dfc-bc47ccc134b9@itcare.pl>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT
Cc: Saeed Mahameed, "netdev@vger.kernel.org", brouer@redhat.com
To: Paweł Staszewski
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:44944 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725815AbeKKHxF
 (ORCPT ); Sun, 11 Nov 2018 02:53:05 -0500
In-Reply-To: <69c11e38-da50-15a3-2dfc-bc47ccc134b9@itcare.pl>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Sat, 10 Nov 2018 20:56:02 +0100
Paweł Staszewski wrote:

> On 10.11.2018 at 20:49, Paweł Staszewski wrote:
> >
> > On 10.11.2018 at 20:34, Jesper Dangaard Brouer wrote:
> >> On Fri, 9 Nov 2018 23:20:38 +0100 Paweł Staszewski
> >> wrote:
> >>
> >>> On 08.11.2018 at 20:12, Paweł Staszewski wrote:
> >>>> CPU load is lower than for the connectx4 - but it looks like the
> >>>> bandwidth limit is the same :)
> >>>> But also after reaching 60Gbit/60Gbit:
> >>>>
> >>>>   bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
> >>>>    input: /proc/net/dev type: rate
> >>>>    -      iface                 Rx           Tx            Total
> >>>>   ================================================================
> >>>>          enp175s0:      45.09 Gb/s   15.09 Gb/s      60.18 Gb/s
> >>>>          enp216s0:      15.14 Gb/s   45.19 Gb/s      60.33 Gb/s
> >>>>   ----------------------------------------------------------------
> >>>>             total:      60.45 Gb/s   60.48 Gb/s     120.93 Gb/s
> >>>
> >>> Today reached 65/65Gbit/s
> >>>
> >>> But starting from 60Gbit/s RX / 60Gbit/s TX, the NICs start to drop
> >>> packets (with 50% CPU on all 28 cores) - so there is still CPU power
> >>> to use :).
> >>
> >> This is weird!
> >>
> >> How do you see / measure these drops?
> >
> > A simple icmp test like ping -i 0.1.
> > I am testing with icmp to a management IP address on a vlan that is
> > attached to one NIC (the side that is more stressed with RX).
> > And another icmp test is forwarded through this router, to a host
> > behind it.
> >
> > Both measurements show the same loss ratio, 0.1 to 0.5%, after
> > reaching ~45Gbit/s on the RX side. Depending on how hard the RX side
> > is pushed, drops vary between 0.1 and 0.5 - even 0.6% :)

Okay, good to know that you use an external measurement for this. I do
think packets are getting dropped by the NIC.
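BTW, this kind of external ICMP loss probe is easy to script. Below is
a minimal Python sketch, an illustration only: the target address is a
placeholder, and sub-0.2s intervals may require root with older iputils.

#!/usr/bin/env python3
# Minimal sketch of an external ICMP loss probe, wrapping the standard
# iputils ping(8). Illustration only; the target address is a placeholder.
import re
import subprocess

TARGET = "192.0.2.1"  # placeholder: management IP / host behind the router

def ping_loss(target, count=100, interval=0.1):
    """Run ping and parse the packet-loss percentage it reports.
    Note: intervals below 0.2s may require root with older iputils."""
    out = subprocess.run(
        ["ping", "-q", "-c", str(count), "-i", str(interval), target],
        capture_output=True, text=True).stdout
    m = re.search(r"(\d+(?:\.\d+)?)% packet loss", out)
    return float(m.group(1)) if m else None

if __name__ == "__main__":
    print(f"{TARGET}: {ping_loss(TARGET)}% packet loss")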
> >>> So I checked other stats.
> >>> softnet_stat shows an average of 1k squeezed per sec:
> >>
> >> Is the below output the raw counters, not per sec?
> >>
> >> It would be valuable to see the per-sec stats instead...
> >> I use this tool:
> >> https://github.com/netoptimizer/network-testing/blob/master/bin/softnet_stat.pl

> CPU          total/sec     dropped/sec    squeezed/sec   collision/sec      rx_rps/sec  flow_limit/sec
> CPU:00               0               0               0               0               0               0
[...]
> CPU:13               0               0               0               0               0               0
> CPU:14          485538               0              43               0               0               0
> CPU:15          474794               0              51               0               0               0
> CPU:16          449322               0              41               0               0               0
> CPU:17          476420               0              46               0               0               0
> CPU:18          440436               0              38               0               0               0
> CPU:19          501499               0              49               0               0               0
> CPU:20          459468               0              49               0               0               0
> CPU:21          438928               0              47               0               0               0
> CPU:22          468983               0              40               0               0               0
> CPU:23          446253               0              47               0               0               0
> CPU:24          451909               0              46               0               0               0
> CPU:25          479373               0              55               0               0               0
> CPU:26          467848               0              49               0               0               0
> CPU:27          453153               0              51               0               0               0
> CPU:28               0               0               0               0               0               0
[...]
> CPU:40               0               0               0               0               0               0
> CPU:41               0               0               0               0               0               0
> CPU:42          466853               0              43               0               0               0
> CPU:43          453059               0              54               0               0               0
> CPU:44          363219               0              34               0               0               0
> CPU:45          353632               0              38               0               0               0
> CPU:46          371618               0              40               0               0               0
> CPU:47          350518               0              46               0               0               0
> CPU:48          397544               0              40               0               0               0
> CPU:49          364873               0              38               0               0               0
> CPU:50          383630               0              38               0               0               0
> CPU:51          358771               0              39               0               0               0
> CPU:52          372547               0              38               0               0               0
> CPU:53          372882               0              36               0               0               0
> CPU:54          366244               0              43               0               0               0
> CPU:55          365886               0              39               0               0               0
>
> Summed:       11835201               0            1217               0               0               0

Do notice that the per-CPU squeeze rate is not too large. The summed
total of 11.8 Mpps is a little high compared to:

 Ethtool(enp216s0) stat:      4971677 (      4,971,677) <= rx_packets /sec
 Ethtool(enp175s0) stat:      3717148 (      3,717,148) <= rx_packets /sec

 Sum: 3717148 + 4971677 = 8688825 (8,688,825)
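As a side note, the per-sec delta computation behind numbers like these
can be sketched in a few lines. This Python version is only an
illustration of what softnet_stat.pl does, not the actual tool, and it
assumes the 4.19-era layout of /proc/net/softnet_stat (hexadecimal
fields: column 0 is the processed total, column 1 dropped, column 2
time_squeeze):

#!/usr/bin/env python3
# Sketch of a per-sec softnet_stat reader, in the spirit of
# softnet_stat.pl (an illustration, not the actual tool). Assumes the
# ~4.19 kernel layout of /proc/net/softnet_stat: one line per online
# CPU, hex fields, where field 0 is the processed total, field 1 is
# dropped and field 2 is time_squeeze.
import time

def read_softnet():
    with open("/proc/net/softnet_stat") as f:
        return [[int(v, 16) for v in line.split()] for line in f]

def per_sec(interval=1.0):
    prev = read_softnet()
    time.sleep(interval)
    cur = read_softnet()
    print(f"{'CPU':<8}{'total/sec':>12}{'dropped/sec':>13}{'squeezed/sec':>14}")
    for cpu, (old, new) in enumerate(zip(prev, cur)):
        # Per-sec rates; ignores 32-bit counter wrap for brevity.
        total, dropped, squeezed = (
            (new[i] - old[i]) / interval for i in (0, 1, 2))
        print(f"CPU:{cpu:02d}  {total:12.0f}{dropped:13.0f}{squeezed:14.0f}")

if __name__ == "__main__":
    per_sec()  # run in a loop for continuous monitoring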
[...]

> >>>
> >>> Remember, those tests are now on two separate connectx5 cards,
> >>> connected to two separate pcie x16 gen 3.0 slots.
> >>
> >> That is strange... I still suspect some HW NIC issue, can you provide
> >> ethtool stats info via this tool:
> >>
> >> https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
> >>
> >> $ ethtool_stats.pl --dev enp175s0 --dev enp216s0
> >>
> >> The tool removes zero-stats counters and reports per-sec stats. It
> >> makes it easier to spot what is relevant for the given workload.
> >
> > yes, mlnx just have too many counters that are always 0 for my case :)
> > Will try this also.
>
> But there are still a lot of non-0 counters:
>
> Show adapter(s) (enp175s0 enp216s0) statistics (ONLY that changed!)
> Ethtool(enp175s0) stat:         8891 (          8,891) <= ch0_arm /sec
[...]

I have copied the stats over into another document so I can have a
better look at them... and I've found some interesting stats. E.g. we
can see that the NIC hardware is dropping packets.

RX-drops on enp175s0:

 Ethtool(enp175s0) stat:   4850734036 (  4,850,734,036) <= rx_bytes /sec
 Ethtool(enp175s0) stat:   5069043007 (  5,069,043,007) <= rx_bytes_phy /sec
                           -218308971 (   -218,308,971) Dropped bytes /sec
 Ethtool(enp175s0) stat:       139602 (        139,602) <= rx_discards_phy /sec
 Ethtool(enp175s0) stat:      3717148 (      3,717,148) <= rx_packets /sec
 Ethtool(enp175s0) stat:      3862420 (      3,862,420) <= rx_packets_phy /sec
                              -145272 (       -145,272) Dropped packets /sec

RX-drops on enp216s0 are smaller:

 Ethtool(enp216s0) stat:   2592286809 (  2,592,286,809) <= rx_bytes /sec
 Ethtool(enp216s0) stat:   2633575771 (  2,633,575,771) <= rx_bytes_phy /sec
                            -41288962 (    -41,288,962) Dropped bytes /sec
 Ethtool(enp216s0) stat:          464 (            464) <= rx_discards_phy /sec
 Ethtool(enp216s0) stat:      4971677 (      4,971,677) <= rx_packets /sec
 Ethtool(enp216s0) stat:      4975563 (      4,975,563) <= rx_packets_phy /sec
                                -3886 (         -3,886) Dropped packets /sec

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
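P.S. The drop accounting above can be reproduced by sampling
"ethtool -S" twice and diffing the counters. A minimal Python sketch
follows; it is an illustration, not the ethtool_stats.pl tool, and the
rx_packets_phy / rx_discards_phy counter names are mlx5-specific
assumptions:

#!/usr/bin/env python3
# Sketch: detect NIC hardware drops by diffing "ethtool -S" samples.
# The phy counters count packets seen on the wire; the difference to
# rx_packets is what never reached the stack. The counter names
# (rx_packets_phy, rx_discards_phy) are mlx5-specific assumptions.
import subprocess
import time

def ethtool_stats(dev):
    out = subprocess.run(["ethtool", "-S", dev],
                         capture_output=True, text=True).stdout
    stats = {}
    for line in out.splitlines():
        key, sep, val = line.partition(":")
        if sep:
            try:
                stats[key.strip()] = int(val)
            except ValueError:
                pass  # skip non-numeric lines like the header
    return stats

def rx_drops_per_sec(dev, interval=1.0):
    prev = ethtool_stats(dev)
    time.sleep(interval)
    cur = ethtool_stats(dev)
    rate = lambda k: (cur[k] - prev[k]) / interval  # KeyError if NIC lacks it
    phy, sw = rate("rx_packets_phy"), rate("rx_packets")
    print(f"{dev}: rx_packets_phy {phy:.0f}/s  rx_packets {sw:.0f}/s  "
          f"hw-dropped {phy - sw:.0f}/s  "
          f"rx_discards_phy {rate('rx_discards_phy'):.0f}/s")

if __name__ == "__main__":
    for dev in ("enp175s0", "enp216s0"):
        rx_drops_per_sec(dev)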