From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Paweł Staszewski" <pstaszewski@itcare.pl>
Cc: Saeed Mahameed <saeedm@mellanox.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
brouer@redhat.com
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal users traffic
Date: Sat, 10 Nov 2018 20:34:09 +0100
Message-ID: <20181110203409.482f39ec@redhat.com>
In-Reply-To: <e6ece370-bf62-5d36-0417-779d2345fc8d@itcare.pl>
On Fri, 9 Nov 2018 23:20:38 +0100 Paweł Staszewski <pstaszewski@itcare.pl> wrote:
> On 08.11.2018 at 20:12, Paweł Staszewski wrote:
> > CPU load is lower than for connectx4 - but it looks like bandwidth
> > limit is the same :)
> > But also after reaching 60Gbit/60Gbit
> >
> > bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
> > input: /proc/net/dev type: rate
> >      iface               Rx               Tx            Total
> > ==========================================================================
> >   enp175s0:      45.09 Gb/s       15.09 Gb/s       60.18 Gb/s
> >   enp216s0:      15.14 Gb/s       45.19 Gb/s       60.33 Gb/s
> > --------------------------------------------------------------------------
> >      total:      60.45 Gb/s       60.48 Gb/s      120.93 Gb/s
>
> Today reached 65/65Gbit/s
>
> But starting from 60Gbit/s RX / 60Gbit TX nics start to drop packets
> (with 50%CPU on all 28cores) - so still there is cpu power to use :).
This is weird!
How do you see / measure these drops?
> So checked other stats.
> softnet_stats shows average 1k squeezed per sec:
Is the output below the raw counters, not per-second rates?
It would be more valuable to see the per-second stats instead...
I use this tool:
https://github.com/netoptimizer/network-testing/blob/master/bin/softnet_stat.pl
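To illustrate what "per sec" means here, a minimal Python sketch (not the
Perl tool above, just the same idea). It assumes the first three hex
columns of /proc/net/softnet_stat are total (processed), dropped and
squeezed, and that the counters are 32-bit and wrap around. The function
names are made up for this example, and the row index is only roughly the
CPU number.

```python
import time

U32 = 1 << 32  # assumption: softnet counters are 32-bit and wrap around


def parse_softnet(text):
    """Return one (total, dropped, squeezed) tuple per row of softnet_stat."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # The kernel prints these counters as hexadecimal numbers.
            rows.append(tuple(int(f, 16) for f in fields[:3]))
    return rows


def per_second(before, after, interval):
    """Counter deltas per second, tolerating 32-bit wrap-around."""
    return [
        tuple(((a - b) % U32) / interval for a, b in zip(row_a, row_b))
        for row_b, row_a in zip(before, after)
    ]


def read_softnet(path="/proc/net/softnet_stat"):
    with open(path) as f:
        return parse_softnet(f.read())


if __name__ == "__main__":
    interval = 1.0
    first = read_softnet()
    time.sleep(interval)
    second = read_softnet()
    rates = per_second(first, second, interval)
    for row, (total, dropped, squeezed) in enumerate(rates):
        if total or dropped or squeezed:  # skip idle rows
            print(f"row {row}: total {total:.0f}/s "
                  f"dropped {dropped:.0f}/s squeezed {squeezed:.0f}/s")
```

The modulo on the delta matters for counters like the ones quoted below,
where several totals are already close to 2^32.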
> cpu total dropped squeezed collision rps flow_limit
> 0 18554 0 1 0 0 0
> 1 16728 0 1 0 0 0
> 2 18033 0 1 0 0 0
> 3 17757 0 1 0 0 0
> 4 18861 0 0 0 0 0
> 5 0 0 1 0 0 0
> 6 2 0 1 0 0 0
> 7 0 0 1 0 0 0
> 8 0 0 0 0 0 0
> 9 0 0 1 0 0 0
> 10 0 0 0 0 0 0
> 11 0 0 1 0 0 0
> 12 50 0 1 0 0 0
> 13 257 0 0 0 0 0
> 14 3629115363 0 3353259 0 0 0
> 15 255167835 0 3138271 0 0 0
> 16 4240101961 0 3036130 0 0 0
> 17 599810018 0 3072169 0 0 0
> 18 432796524 0 3034191 0 0 0
> 19 41803906 0 3037405 0 0 0
> 20 900382666 0 3112294 0 0 0
> 21 620926085 0 3086009 0 0 0
> 22 41861198 0 3023142 0 0 0
> 23 4090425574 0 2990412 0 0 0
> 24 4264870218 0 3010272 0 0 0
> 25 141401811 0 3027153 0 0 0
> 26 104155188 0 3051251 0 0 0
> 27 4261258691 0 3039765 0 0 0
> 28 4 0 1 0 0 0
> 29 4 0 0 0 0 0
> 30 0 0 1 0 0 0
> 31 0 0 0 0 0 0
> 32 3 0 1 0 0 0
> 33 1 0 1 0 0 0
> 34 0 0 1 0 0 0
> 35 0 0 0 0 0 0
> 36 0 0 1 0 0 0
> 37 0 0 1 0 0 0
> 38 0 0 1 0 0 0
> 39 0 0 1 0 0 0
> 40 0 0 0 0 0 0
> 41 0 0 1 0 0 0
> 42 299758202 0 3139693 0 0 0
> 43 4254727979 0 3103577 0 0 0
> 44 1959555543 0 2554885 0 0 0
> 45 1675702723 0 2513481 0 0 0
> 46 1908435503 0 2519698 0 0 0
> 47 1877799710 0 2537768 0 0 0
> 48 2384274076 0 2584673 0 0 0
> 49 2598104878 0 2593616 0 0 0
> 50 1897566829 0 2530857 0 0 0
> 51 1712741629 0 2489089 0 0 0
> 52 1704033648 0 2495892 0 0 0
> 53 1636781820 0 2499783 0 0 0
> 54 1861997734 0 2541060 0 0 0
> 55 2113521616 0 2555673 0 0 0
>
>
> So I raised the netdev backlog and budget to really high values:
> 524288 for netdev_budget and the same for the backlog
Does it affect the squeezed counters?
Notice, this (crazy) huge netdev_budget limit will also be limited
by /proc/sys/net/core/netdev_budget_usecs.
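A rough way to see why the time limit matters, as a simplified Python
model: one softirq round ends when either the packet budget or the time
budget runs out, whichever comes first. The per-packet cost of 100 ns and
the 2000 usecs value are purely hypothetical numbers for illustration,
and the model ignores NAPI poll weight and timer granularity.

```python
def effective_budget(netdev_budget, budget_usecs, ns_per_packet):
    """Packets one softirq round can handle before either limit ends it.

    Simplified model: the round stops when netdev_budget packets have
    been processed or budget_usecs microseconds have elapsed.
    """
    time_bound = (budget_usecs * 1000) // ns_per_packet
    return min(netdev_budget, time_bound)


def read_sysctl(name, base="/proc/sys/net/core"):
    """Read an integer sysctl such as netdev_budget or netdev_budget_usecs."""
    with open(f"{base}/{name}") as f:
        return int(f.read().split()[0])


# Hypothetical example: huge packet budget, 2000 usecs, 100 ns per packet.
print(effective_budget(524288, 2000, 100))  # -> 20000, the time limit wins
```

Under these assumed numbers, raising netdev_budget beyond 20000 changes
nothing unless netdev_budget_usecs is raised as well.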
> This raised softirqs from about 600k/sec to 800k/sec for NET_TX/NET_RX
Hmmm, this could indicate that not enough NAPI bulking is occurring.
I have a BPF tool called 'napi_monitor' that can give you some insight
into NAPI bulking and softirq idle/kthread starting. Could you try to
run it, so we can try to understand this? You can find the tool here:
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/napi_monitor_user.c
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/napi_monitor_kern.c
> But after these changes I have fewer packet drops.
>
>
> Below perf top from max traffic reached:
> PerfTop: 72230 irqs/sec kernel:99.4% exact: 0.0% [4000Hz cycles], (all, 56 CPUs)
> ------------------------------------------------------------------------------------------
>
> 12.62% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
> 8.44% [kernel] [k] mlx5e_sq_xmit
> 6.69% [kernel] [k] build_skb
> 5.21% [kernel] [k] fib_table_lookup
> 3.54% [kernel] [k] memcpy_erms
> 3.20% [kernel] [k] mlx5e_poll_rx_cq
> 2.25% [kernel] [k] vlan_do_receive
> 2.20% [kernel] [k] mlx5e_post_rx_mpwqes
> 2.02% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq
> 1.95% [kernel] [k] __dev_queue_xmit
> 1.83% [kernel] [k] dev_gro_receive
> 1.79% [kernel] [k] tcp_gro_receive
> 1.73% [kernel] [k] ip_finish_output2
> 1.63% [kernel] [k] mlx5e_poll_tx_cq
> 1.49% [kernel] [k] ipt_do_table
> 1.38% [kernel] [k] inet_gro_receive
> 1.31% [kernel] [k] __netif_receive_skb_core
> 1.30% [kernel] [k] _raw_spin_lock
> 1.28% [kernel] [k] mlx5_eq_int
> 1.24% [kernel] [k] irq_entries_start
> 1.19% [kernel] [k] __build_skb
> 1.15% [kernel] [k] swiotlb_map_page
> 1.02% [kernel] [k] vlan_dev_hard_start_xmit
> 0.94% [kernel] [k] pfifo_fast_dequeue
> 0.92% [kernel] [k] ip_route_input_rcu
> 0.86% [kernel] [k] kmem_cache_alloc
> 0.80% [kernel] [k] mlx5e_xmit
> 0.79% [kernel] [k] dev_hard_start_xmit
> 0.78% [kernel] [k] _raw_spin_lock_irqsave
> 0.74% [kernel] [k] ip_forward
> 0.72% [kernel] [k] tasklet_action_common.isra.21
> 0.68% [kernel] [k] pfifo_fast_enqueue
> 0.67% [kernel] [k] netif_skb_features
> 0.66% [kernel] [k] skb_segment
> 0.60% [kernel] [k] skb_gro_receive
> 0.56% [kernel] [k] validate_xmit_skb.isra.142
> 0.53% [kernel] [k] skb_release_data
> 0.51% [kernel] [k] mlx5e_page_release
> 0.51% [kernel] [k] ip_rcv_core.isra.20.constprop.25
> 0.51% [kernel] [k] __qdisc_run
> 0.50% [kernel] [k] tcp4_gro_receive
> 0.49% [kernel] [k] page_frag_free
> 0.46% [kernel] [k] kmem_cache_free_bulk
> 0.43% [kernel] [k] kmem_cache_free
> 0.42% [kernel] [k] try_to_wake_up
> 0.39% [kernel] [k] _raw_spin_lock_irq
> 0.39% [kernel] [k] find_busiest_group
> 0.37% [kernel] [k] __memcpy
>
>
>
> Remember those tests are now on two separate connectx5 connected to
> two separate pcie x16 gen 3.0
That is strange... I still suspect some HW NIC issue, can you provide
ethtool stats info via tool:
https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
$ ethtool_stats.pl --dev enp175s0 --dev enp216s0
The tool removes zero-stats counters and reports per-second stats. It
makes it easier to spot what is relevant for the given workload.
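The idea behind that filtering can be sketched in a few lines of Python.
This is not the Perl tool, only an illustration: it assumes the usual
"name: value" layout of `ethtool -S` output, and it takes "zero-stats" to
mean counters that did not change during the interval. All function names
are made up for this example.

```python
import subprocess
import sys
import time


def parse_ethtool_stats(text):
    """Turn 'name: value' lines from `ethtool -S` into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def nonzero_rates(before, after, interval):
    """Per-second deltas, keeping only counters that actually changed."""
    rates = {}
    for name, new in after.items():
        delta = new - before.get(name, new)
        if delta != 0:
            rates[name] = delta / interval
    return rates


def sample(dev):
    out = subprocess.run(["ethtool", "-S", dev], check=True,
                         capture_output=True, text=True).stdout
    return parse_ethtool_stats(out)


if __name__ == "__main__":
    interval = 1.0
    devs = sys.argv[1:]  # e.g. enp175s0 enp216s0
    first = {dev: sample(dev) for dev in devs}
    if devs:
        time.sleep(interval)
    for dev in devs:
        for name, rate in sorted(nonzero_rates(first[dev], sample(dev), interval).items()):
            print(f"{dev} {name}: {rate:.0f}/s")
```

With hundreds of per-queue counters on these NICs, dropping the unchanged
ones is what makes a drop or error counter stand out.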
Can you give the output from:
$ ethtool --show-priv-flags DEVICE
I want you to experiment with:
$ ethtool --set-priv-flags DEVICE rx_striding_rq off
I think you have already played with 'rx_cqe_compress', right?
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer