From: Eric Dumazet
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal users traffic
Date: Wed, 31 Oct 2018 15:09:05 -0700
Message-ID: <61e30474-b5e9-4dc8-a8a6-90cdd17d2a66@gmail.com>
In-Reply-To: <61697e49-e839-befc-8330-fc00187c48ee@itcare.pl>
To: Paweł Staszewski, netdev

On 10/31/2018 02:57 PM, Paweł Staszewski wrote:
> Hi
>
> So maybe someone will be interested in how the Linux kernel handles normal traffic (not pktgen :) )
>
> Server HW configuration:
>
> CPU: Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
>
> NICs: 2x 100G Mellanox ConnectX-4 (connected to x16 PCIe, 8 GT/s)
>
> Server software:
>
> FRR - as routing daemon
>
> enp175s0f0 (100G) - 16 vlans from upstreams (28 RSS queues bound to the local NUMA node)
>
> enp175s0f1 (100G) - 343 vlans to clients (28 RSS queues bound to the local NUMA node)
>
> Maximum traffic that the server can handle:
>
> Bandwidth:
>
>  bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
>   input: /proc/net/dev type: rate
>   \         iface                       Rx                   Tx                Total
> ==============================================================================
>        enp175s0f1:          28.51 Gb/s           37.24 Gb/s           65.74 Gb/s
>        enp175s0f0:          38.07 Gb/s           28.44 Gb/s           66.51 Gb/s
> ------------------------------------------------------------------------------
>             total:          66.58 Gb/s           65.67 Gb/s          132.25 Gb/s
>
> Packets per second:
>
>  bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
>   input: /proc/net/dev type: rate
>   -         iface                       Rx                   Tx                Total
> ==============================================================================
>        enp175s0f1:      5248589.00 P/s       3486617.75 P/s       8735207.00 P/s
>        enp175s0f0:      3557944.25 P/s       5232516.00 P/s       8790460.00 P/s
> ------------------------------------------------------------------------------
>             total:      8806533.00 P/s       8719134.00 P/s      17525668.00 P/s
>
> After reaching those limits, the NICs on the upstream side (which carry more RX traffic) start to drop packets.
>
> I just don't understand why the server can't handle more bandwidth (~40 Gbit/s is the limit where all CPUs are at 100% utilization) while pps on the RX side keep increasing.
>
> I was thinking that maybe I had reached some PCIe x16 limit - but x16 at 8 GT/s is ~126 Gbit/s - and also when testing with pktgen I can reach more bandwidth and pps (about 4x more compared to normal internet traffic).
>
> And I am wondering if there is something that can be improved here.
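The PCIe arithmetic above looks right: x16 at 8 GT/s with 128b/130b encoding is roughly 126 Gbit/s per direction, so the ~66 Gbit/s of RX should be well under the slot limit. A quick way to confirm the negotiated link is sketched below; the af:00.0 bus address is only inferred from the enp175s0f0 name and should be taken from ethtool -i on the real box.

# Confirm the NIC really trained at x16 / 8 GT/s (compare LnkSta to LnkCap).
# af:00.0 is a guess derived from "enp175s0f0" (bus 175 = 0xaf); use the
# bus-info that ethtool reports instead.
ethtool -i enp175s0f0 | grep bus-info
lspci -s af:00.0 -vv | grep -E 'LnkCap:|LnkSta:'
# Theoretical ceiling: 16 lanes * 8 Gb/s * 128/130 encoding ~= 126 Gb/s,
# before TLP/DLLP overhead.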
> Some more information / counters / stats and perf top output below:
>
> Perf top flame graph:
>
> https://uploadfiles.io/7zo6u
>
> System configuration (long):
>
> cat /sys/devices/system/node/node1/cpulist
> 14-27,42-55
> cat /sys/class/net/enp175s0f0/device/numa_node
> 1
> cat /sys/class/net/enp175s0f1/device/numa_node
> 1
>
> ip -s -d link ls dev enp175s0f0
> 6: enp175s0f0: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 8192
>     link/ether 0c:c4:7a:d8:5d:1c brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 448 numrxqueues 56 gso_max_size 65536 gso_max_segs 65535
>     RX: bytes            packets       errors  dropped  overrun  mcast
>     184142375840858      141347715974  2       2806325  0        85050528
>     TX: bytes            packets       errors  dropped  carrier  collsns
>     99270697277430       172227994003  0       0        0        0
>
> ip -s -d link ls dev enp175s0f1
> 7: enp175s0f1: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 8192
>     link/ether 0c:c4:7a:d8:5d:1d brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 448 numrxqueues 56 gso_max_size 65536 gso_max_segs 65535
>     RX: bytes            packets       errors  dropped  overrun  mcast
>     99686284170801       173507590134  61      669685   0        100304421
>     TX: bytes            packets       errors  dropped  carrier  collsns
>     184435107970545      142383178304  0       0        0        0
>
> ./softnet.sh
> cpu      total    dropped   squeezed  collision        rps  flow_limit
>
>    PerfTop:  108490 irqs/sec  kernel:99.6%  exact:  0.0% [4000Hz cycles],  (all, 56 CPUs)
> ---------------------------------------------------------------------------------------------------
>
>     26.78%  [kernel]       [k] queued_spin_lock_slowpath

This is highly suspect. A call graph (perf record -a -g sleep 1; perf report --stdio) would tell us what is going on.

With that many TX/RX queues, I would expect you not to use RPS/RFS and to have a 1:1 RX/TX queue mapping, so I do not know what could cause such spinlock contention.
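For completeness, a minimal way to check those settings and capture the call graph; this is only a sketch that assumes the standard sysfs queue layout and the interface names from above.

# All-zero masks here mean RPS/XPS are not steering packets for that queue.
grep . /sys/class/net/enp175s0f0/queues/rx-*/rps_cpus | head
grep . /sys/class/net/enp175s0f0/queues/tx-*/xps_cpus | head
# Call graph of where the CPUs actually spend time (who is taking that lock):
perf record -a -g sleep 1
perf report --stdio | head -60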