From: Jesper Dangaard Brouer
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal users traffic
Date: Sat, 10 Nov 2018 20:34:09 +0100
Message-ID: <20181110203409.482f39ec@redhat.com>
References: <61697e49-e839-befc-8330-fc00187c48ee@itcare.pl>
 <659fbf4b481c815f45a58b2351481cc9f761445b.camel@mellanox.com>
 <6486d01d-7a50-33c4-e27f-4ace8aa8e150@itcare.pl>
 <920c2665-781f-5f62-efbe-347e63063a24@itcare.pl>
 <162e25c6-dae2-7e1e-75f0-9c5b22453495@itcare.pl>
To: Paweł Staszewski
Cc: Saeed Mahameed, netdev@vger.kernel.org, brouer@redhat.com

On Fri, 9 Nov 2018 23:20:38 +0100 Paweł Staszewski wrote:

> On 08.11.2018 at 20:12, Paweł Staszewski wrote:
> > CPU load is lower than for connectx4 - but it looks like the bandwidth
> > limit is the same :)
> > But also after reaching 60Gbit/60Gbit:
> >
> >  bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
> >   input: /proc/net/dev type: rate
> >   -         iface                   Rx                Tx             Total
> > ==========================================================================
> >          enp175s0:          45.09 Gb/s           15.09 Gb/s     60.18 Gb/s
> >          enp216s0:          15.14 Gb/s           45.19 Gb/s     60.33 Gb/s
> > --------------------------------------------------------------------------
> >             total:          60.45 Gb/s           60.48 Gb/s    120.93 Gb/s
>
> Today I reached 65/65 Gbit/s.
> But starting from 60 Gbit/s RX / 60 Gbit/s TX the NICs start to drop packets
> (with 50% CPU on all 28 cores) - so there is still CPU power to use :).

This is weird! How do you see / measure these drops?

> So I checked other stats.
> softnet_stat shows an average of 1k squeezed per sec:

Is the output below the raw counters, not per-second counts? It would be
valuable to see the per-second stats instead... I use this tool:
 https://github.com/netoptimizer/network-testing/blob/master/bin/softnet_stat.pl
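If fetching the script is a hassle, an untested quick-and-dirty equivalent
(needs gawk for strtonum(); "squeezed" is the 3rd column of
/proc/net/softnet_stat and all values there are hex):

  # sample the time_squeeze column twice, 1 sec apart, print per-CPU delta
  awk '{ print strtonum("0x" $3) }' /proc/net/softnet_stat > /tmp/sq.1
  sleep 1
  awk '{ print strtonum("0x" $3) }' /proc/net/softnet_stat | \
    paste /tmp/sq.1 - | \
    awk '{ printf "CPU%-3d squeezed/sec: %d\n", NR-1, $2 - $1 }'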
> cpu      total    dropped   squeezed  collision        rps flow_limit
>   0      18554          0          1          0          0 0
>   1      16728          0          1          0          0 0
>   2      18033          0          1          0          0 0
>   3      17757          0          1          0          0 0
>   4      18861          0          0          0          0 0
>   5          0          0          1          0          0 0
>   6          2          0          1          0          0 0
>   7          0          0          1          0          0 0
>   8          0          0          0          0          0 0
>   9          0          0          1          0          0 0
>  10          0          0          0          0          0 0
>  11          0          0          1          0          0 0
>  12         50          0          1          0          0 0
>  13        257          0          0          0          0 0
>  14 3629115363          0    3353259          0          0 0
>  15  255167835          0    3138271          0          0 0
>  16 4240101961          0    3036130          0          0 0
>  17  599810018          0    3072169          0          0 0
>  18  432796524          0    3034191          0          0 0
>  19   41803906          0    3037405          0          0 0
>  20  900382666          0    3112294          0          0 0
>  21  620926085          0    3086009          0          0 0
>  22   41861198          0    3023142          0          0 0
>  23 4090425574          0    2990412          0          0 0
>  24 4264870218          0    3010272          0          0 0
>  25  141401811          0    3027153          0          0 0
>  26  104155188          0    3051251          0          0 0
>  27 4261258691          0    3039765          0          0 0
>  28          4          0          1          0          0 0
>  29          4          0          0          0          0 0
>  30          0          0          1          0          0 0
>  31          0          0          0          0          0 0
>  32          3          0          1          0          0 0
>  33          1          0          1          0          0 0
>  34          0          0          1          0          0 0
>  35          0          0          0          0          0 0
>  36          0          0          1          0          0 0
>  37          0          0          1          0          0 0
>  38          0          0          1          0          0 0
>  39          0          0          1          0          0 0
>  40          0          0          0          0          0 0
>  41          0          0          1          0          0 0
>  42  299758202          0    3139693          0          0 0
>  43 4254727979          0    3103577          0          0 0
>  44 1959555543          0    2554885          0          0 0
>  45 1675702723          0    2513481          0          0 0
>  46 1908435503          0    2519698          0          0 0
>  47 1877799710          0    2537768          0          0 0
>  48 2384274076          0    2584673          0          0 0
>  49 2598104878          0    2593616          0          0 0
>  50 1897566829          0    2530857          0          0 0
>  51 1712741629          0    2489089          0          0 0
>  52 1704033648          0    2495892          0          0 0
>  53 1636781820          0    2499783          0          0 0
>  54 1861997734          0    2541060          0          0 0
>  55 2113521616          0    2555673          0          0 0
>
> So I raised netdev backlog and budget to really high values:
> 524288 for netdev_budget and the same for backlog

Does it affect the squeezed counters?

Notice, this (crazy) huge netdev_budget limit will also be limited by
/proc/sys/net/core/netdev_budget_usecs.
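To be concrete (the sysctl names below are the real knobs; the value in the
last command is only an illustration, not a recommendation):

  $ sysctl net.core.netdev_budget net.core.netdev_budget_usecs

  # With netdev_budget cranked up to 524288, the usecs limit is what actually
  # stops net_rx_action() and bumps the "squeezed" counter.  If you want to
  # experiment, raise that one as well, e.g.:
  $ sysctl -w net.core.netdev_budget_usecs=8000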
> This raised softirqs from about 600k/sec to 800k/sec for NET_TX/NET_RX

Hmmm, this could indicate that not enough NAPI bulking is occurring.

I have a BPF tool, called 'napi_monitor', that can give you some insight
into NAPI bulking and softirq idle/kthread starting. Could you try to run
it, so I can try to understand what is going on? You find the tool here:

 https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/
 https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/napi_monitor_user.c
 https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/napi_monitor_kern.c
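If building the samples is inconvenient, a rough stand-in (this is NOT the
napi_monitor tool, just a sketch, and it assumes you have bpftrace installed)
is to histogram how many packets each napi_poll call returns. Lots of polls
hitting the full 64 budget means good bulking; mostly 1-8 means poor bulking:

  # histogram of 'work' (packets processed) per NAPI poll, all devices lumped
  bpftrace -e 'tracepoint:napi:napi_poll { @work = lhist(args->work, 0, 64, 8); }'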
> But after these changes I have fewer packet drops.
>
> Below is perf top from the max traffic reached:
>    PerfTop:   72230 irqs/sec  kernel:99.4%  exact:  0.0% [4000Hz cycles],  (all, 56 CPUs)
> ------------------------------------------------------------------------------------------
>     12.62%  [kernel]       [k] mlx5e_skb_from_cqe_mpwrq_linear
>      8.44%  [kernel]       [k] mlx5e_sq_xmit
>      6.69%  [kernel]       [k] build_skb
>      5.21%  [kernel]       [k] fib_table_lookup
>      3.54%  [kernel]       [k] memcpy_erms
>      3.20%  [kernel]       [k] mlx5e_poll_rx_cq
>      2.25%  [kernel]       [k] vlan_do_receive
>      2.20%  [kernel]       [k] mlx5e_post_rx_mpwqes
>      2.02%  [kernel]       [k] mlx5e_handle_rx_cqe_mpwrq
>      1.95%  [kernel]       [k] __dev_queue_xmit
>      1.83%  [kernel]       [k] dev_gro_receive
>      1.79%  [kernel]       [k] tcp_gro_receive
>      1.73%  [kernel]       [k] ip_finish_output2
>      1.63%  [kernel]       [k] mlx5e_poll_tx_cq
>      1.49%  [kernel]       [k] ipt_do_table
>      1.38%  [kernel]       [k] inet_gro_receive
>      1.31%  [kernel]       [k] __netif_receive_skb_core
>      1.30%  [kernel]       [k] _raw_spin_lock
>      1.28%  [kernel]       [k] mlx5_eq_int
>      1.24%  [kernel]       [k] irq_entries_start
>      1.19%  [kernel]       [k] __build_skb
>      1.15%  [kernel]       [k] swiotlb_map_page
>      1.02%  [kernel]       [k] vlan_dev_hard_start_xmit
>      0.94%  [kernel]       [k] pfifo_fast_dequeue
>      0.92%  [kernel]       [k] ip_route_input_rcu
>      0.86%  [kernel]       [k] kmem_cache_alloc
>      0.80%  [kernel]       [k] mlx5e_xmit
>      0.79%  [kernel]       [k] dev_hard_start_xmit
>      0.78%  [kernel]       [k] _raw_spin_lock_irqsave
>      0.74%  [kernel]       [k] ip_forward
>      0.72%  [kernel]       [k] tasklet_action_common.isra.21
>      0.68%  [kernel]       [k] pfifo_fast_enqueue
>      0.67%  [kernel]       [k] netif_skb_features
>      0.66%  [kernel]       [k] skb_segment
>      0.60%  [kernel]       [k] skb_gro_receive
>      0.56%  [kernel]       [k] validate_xmit_skb.isra.142
>      0.53%  [kernel]       [k] skb_release_data
>      0.51%  [kernel]       [k] mlx5e_page_release
>      0.51%  [kernel]       [k] ip_rcv_core.isra.20.constprop.25
>      0.51%  [kernel]       [k] __qdisc_run
>      0.50%  [kernel]       [k] tcp4_gro_receive
>      0.49%  [kernel]       [k] page_frag_free
>      0.46%  [kernel]       [k] kmem_cache_free_bulk
>      0.43%  [kernel]       [k] kmem_cache_free
>      0.42%  [kernel]       [k] try_to_wake_up
>      0.39%  [kernel]       [k] _raw_spin_lock_irq
>      0.39%  [kernel]       [k] find_busiest_group
>      0.37%  [kernel]       [k] __memcpy
>
> Remember those tests are now on two separate connectx5 cards connected to
> two separate pcie x16 gen 3.0 slots.

That is strange... I still suspect some HW NIC issue. Can you provide
ethtool stats info via this tool:

 https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl

 $ ethtool_stats.pl --dev enp175s0 --dev enp216s0

The tool removes zero-stat counters and reports per-second stats. That
makes it easier to spot what is relevant for the given workload.

Can you give the output from:

 $ ethtool --show-priv-flags DEVICE

I want you to experiment with:

 ethtool --set-priv-flags DEVICE rx_striding_rq off

I think you have already played with 'rx_cqe_compress', right?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer