From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rick Jones <rick.jones2@hpe.com>,
netdev@vger.kernel.org, Saeed Mahameed <saeedm@mellanox.com>,
Tariq Toukan <tariqt@mellanox.com>,
brouer@redhat.com
Subject: Re: Netperf UDP issue with connected sockets
Date: Mon, 21 Nov 2016 17:03:51 +0100
Message-ID: <20161121170351.50a09ee1@redhat.com>
In-Reply-To: <1479408683.8455.273.camel@edumazet-glaptop3.roam.corp.google.com>
On Thu, 17 Nov 2016 10:51:23 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-11-17 at 19:30 +0100, Jesper Dangaard Brouer wrote:
>
> > The point is I can see a socket Send-Q forming, thus we do know the
> > application has something to send. Thus, there is a possibility for
> > non-opportunistic bulking. Allowing/implementing bulk enqueue from
> > the socket layer into the qdisc layer should be fairly simple (and
> > the rest of xmit_more is already in place).
>
>
> As I said, you are fooled by TX completions.
Obviously TX completions play a role, yes, and I bet I can adjust the
TX completion handling to make xmit_more happen, at the expense of
added latency.
The point is that the "bloated" spinlock in __dev_queue_xmit is still
caused by the MMIO tailptr/doorbell. The added cost occurs when
enqueueing packets, and results in the inability to get enough packets
into the qdisc for xmit_more to get going (on my system). I argue that
a bulk enqueue API would allow us to get past the hurdle of
transitioning into xmit_more mode more easily.
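Here is a toy model of the doorbell argument. It is plain C, not kernel code, and struct tx_ring_model, model_xmit(), send_one_by_one() and send_bulk() are all hypothetical names. The model assumes that the only cost that matters is the MMIO doorbell write. It also assumes that a driver honoring the xmit_more hint defers that write until the last packet of a batch. Under those assumptions, a bulk enqueue from the socket Send-Q turns one doorbell per packet into one doorbell per batch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a NIC TX ring: the doorbell (MMIO tail-pointer
 * write) is assumed to be the only expensive step. */
struct tx_ring_model {
    size_t queued;    /* descriptors written but not yet announced to HW */
    size_t doorbells; /* MMIO tail-pointer writes performed */
};

/* Driver-side xmit: ring the doorbell only when 'more' is false,
 * mimicking how a driver uses the xmit_more hint. */
static void model_xmit(struct tx_ring_model *r, bool more)
{
    r->queued++;
    if (!more) {
        r->doorbells++; /* announce all queued descriptors at once */
        r->queued = 0;
    }
}

/* One packet per enqueue: the stack never signals that more will follow. */
static size_t send_one_by_one(size_t npkts)
{
    struct tx_ring_model r = {0, 0};
    for (size_t i = 0; i < npkts; i++)
        model_xmit(&r, false);
    return r.doorbells;
}

/* Hypothetical bulk enqueue: packets waiting in the socket Send-Q are
 * handed down in batches of up to 'bulk'; every packet except the last
 * one of each batch carries the 'more' hint. */
static size_t send_bulk(size_t npkts, size_t bulk)
{
    assert(bulk > 0);
    struct tx_ring_model r = {0, 0};
    for (size_t i = 0; i < npkts; i++) {
        bool last_in_batch = ((i + 1) % bulk == 0) || (i + 1 == npkts);
        model_xmit(&r, !last_in_batch);
    }
    return r.doorbells;
}
```

For example, 1000 packets cost 1000 doorbells when sent one by one, but only 125 with batches of 8. The model deliberately ignores qdisc locking and TX completion timing. That is exactly where Eric's "fooled by TX completions" objection applies.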
> Please make sure to increase the sndbuf limits !
>
> echo 2129920 >/proc/sys/net/core/wmem_default
Testing with this makes no difference.
$ grep -H . /proc/sys/net/core/wmem_default
/proc/sys/net/core/wmem_default:2129920
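Here is a small check, assuming the Linux semantics documented in socket(7). A new socket's send buffer starts at net.core.wmem_default. A setsockopt(SO_SNDBUF) request is capped by net.core.wmem_max, and the kernel then doubles it for bookkeeping overhead; getsockopt() returns that doubled value. The helper name udp_effective_sndbuf() is hypothetical. It simply reports what a fresh UDP socket actually gets:

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the effective SO_SNDBUF of a fresh UDP socket, or -1 on error.
 * request == 0 keeps the default (net.core.wmem_default); request > 0
 * asks for that many bytes (capped by net.core.wmem_max, then doubled). */
static int udp_effective_sndbuf(int request)
{
    assert(request >= 0);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (request > 0 &&
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &request, sizeof(request)) < 0) {
        close(fd);
        return -1;
    }
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &len) < 0)
        val = -1;
    close(fd);
    return val;
}
```

The sysctl is read when the socket is created. The sending application (udp_flood here) therefore has to be restarted after wmem_default is changed for the new value to take effect.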
> lpaa23:~# sar -n DEV 1 10|grep eth1
IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
> 10:49:25 eth1 7.00 9273283.00 0.61 2187214.90 0.00 0.00 0.00
> 10:49:26 eth1 1.00 9230795.00 0.06 2176787.57 0.00 0.00 1.00
> 10:49:27 eth1 2.00 9247906.00 0.17 2180915.45 0.00 0.00 0.00
> 10:49:28 eth1 3.00 9246542.00 0.23 2180790.38 0.00 0.00 1.00
> Average: eth1 2.50 9018045.70 0.25 2126893.82 0.00 0.00 0.50
Very impressive numbers: 9.2 Mpps TX.
What is this test? What kind of traffic? Multiple CPUs?
> lpaa23:~# ethtool -S eth1|grep more; sleep 1;ethtool -S eth1|grep more
> xmit_more: 2251366909
> xmit_more: 2256011392
>
> lpaa23:~# echo 2256011392-2251366909 | bc
> 4644483
xmit_more definitely works on your system, but I cannot get it to
"kick in" on my setup. Once xmit_more is active, the "bloated"
spinlock problem should go away.
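Here is the bc arithmetic above written out as a sketch. counter_rate() and xmit_more_fraction() are illustrative helpers and are not part of ethtool. The calculation assumes the `sleep 1` window is close to one second and that the sar average is representative of the same interval, so the result is only approximate:

```c
#include <assert.h>
#include <stdint.h>

/* Per-second rate from two samples of a monotonically increasing counter. */
static double counter_rate(uint64_t before, uint64_t after, double secs)
{
    assert(secs > 0.0 && after >= before);
    return (double)(after - before) / secs;
}

/* Fraction of transmitted packets that carried the xmit_more hint. */
static double xmit_more_fraction(double xmit_more_per_sec, double tx_pps)
{
    assert(tx_pps > 0.0);
    return xmit_more_per_sec / tx_pps;
}
```

With Eric's numbers this gives 4,644,483 xmit_more/sec against roughly 9.2 Mpps TX, so about half of the packets carried the hint. For the remote-CPU mlx5 run below, it is 350 out of 2,141,441 packets/sec, which is about 0.016%.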
(Tests with "udp_flood --pmtu 3 --send")
Forcing TX completion to happen on the same CPU, no xmit_more:
~/git/network-testing/bin/ethtool_stats.pl --sec 2 --dev mlx5p2
Show adapter(s) (mlx5p2) statistics (ONLY that changed!)
Ethtool(mlx5p2 ) stat: 104592908 ( 104,592,908) <= tx0_bytes /sec
Ethtool(mlx5p2 ) stat: 39059 ( 39,059) <= tx0_nop /sec
Ethtool(mlx5p2 ) stat: 1743215 ( 1,743,215) <= tx0_packets /sec
Ethtool(mlx5p2 ) stat: 104719986 ( 104,719,986) <= tx_bytes /sec
Ethtool(mlx5p2 ) stat: 111774540 ( 111,774,540) <= tx_bytes_phy /sec
Ethtool(mlx5p2 ) stat: 1745333 ( 1,745,333) <= tx_csum_partial /sec
Ethtool(mlx5p2 ) stat: 1745333 ( 1,745,333) <= tx_packets /sec
Ethtool(mlx5p2 ) stat: 1746477 ( 1,746,477) <= tx_packets_phy /sec
Ethtool(mlx5p2 ) stat: 111483434 ( 111,483,434) <= tx_prio1_bytes /sec
Ethtool(mlx5p2 ) stat: 1741928 ( 1,741,928) <= tx_prio1_packets /sec
Forcing TX completion to happen on remote CPU, some xmit_more:
Show adapter(s) (mlx5p2) statistics (ONLY that changed!)
Ethtool(mlx5p2 ) stat: 128485892 ( 128,485,892) <= tx0_bytes /sec
Ethtool(mlx5p2 ) stat: 31840 ( 31,840) <= tx0_nop /sec
Ethtool(mlx5p2 ) stat: 2141432 ( 2,141,432) <= tx0_packets /sec
Ethtool(mlx5p2 ) stat: 350 ( 350) <= tx0_xmit_more /sec
Ethtool(mlx5p2 ) stat: 128486459 ( 128,486,459) <= tx_bytes /sec
Ethtool(mlx5p2 ) stat: 137052191 ( 137,052,191) <= tx_bytes_phy /sec
Ethtool(mlx5p2 ) stat: 2141441 ( 2,141,441) <= tx_csum_partial /sec
Ethtool(mlx5p2 ) stat: 2141441 ( 2,141,441) <= tx_packets /sec
Ethtool(mlx5p2 ) stat: 2141441 ( 2,141,441) <= tx_packets_phy /sec
Ethtool(mlx5p2 ) stat: 137051300 ( 137,051,300) <= tx_prio1_bytes /sec
Ethtool(mlx5p2 ) stat: 2141427 ( 2,141,427) <= tx_prio1_packets /sec
Ethtool(mlx5p2 ) stat: 350 ( 350) <= tx_xmit_more /sec
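Here are the two runs side by side; ns_per_packet() and relative_gain() are illustrative helpers. Moving TX completion to a remote CPU raises tx_packets from 1,745,333 to 2,141,441 per second, which is about 22.7% more. Assuming a single sending thread, that shrinks the average time budget per packet from roughly 573 ns to 467 ns. The faster run shows only 350 xmit_more/sec, so doorbell batching can account for very little of that difference:

```c
#include <assert.h>

/* Average time budget per packet, in nanoseconds, at a given packet rate. */
static double ns_per_packet(double pps)
{
    assert(pps > 0.0);
    return 1e9 / pps;
}

/* Relative throughput change between two runs, e.g. 0.25 means +25%. */
static double relative_gain(double base_pps, double new_pps)
{
    assert(base_pps > 0.0);
    return (new_pps - base_pps) / base_pps;
}
```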
> PerfTop: 76969 irqs/sec kernel:96.6% exact: 100.0% [4000Hz cycles:pp], (all, 48 CPUs)
>---------------------------------------------------------------------------------------------
> 11.64% [kernel] [k] skb_set_owner_w
> 6.21% [kernel] [k] queued_spin_lock_slowpath
> 4.76% [kernel] [k] _raw_spin_lock
> 4.40% [kernel] [k] __ip_make_skb
> 3.10% [kernel] [k] sock_wfree
> 2.87% [kernel] [k] ipt_do_table
> 2.76% [kernel] [k] fq_dequeue
> 2.71% [kernel] [k] mlx4_en_xmit
> 2.50% [kernel] [k] __dev_queue_xmit
> 2.29% [kernel] [k] __ip_append_data.isra.40
> 2.28% [kernel] [k] udp_sendmsg
> 2.01% [kernel] [k] __alloc_skb
> 1.90% [kernel] [k] napi_consume_skb
> 1.63% [kernel] [k] udp_send_skb
> 1.62% [kernel] [k] skb_release_data
> 1.62% [kernel] [k] entry_SYSCALL_64_fastpath
> 1.56% [kernel] [k] dev_hard_start_xmit
> 1.55% udpsnd [.] __libc_send
> 1.48% [kernel] [k] netif_skb_features
> 1.42% [kernel] [k] __qdisc_run
> 1.35% [kernel] [k] sk_dst_check
> 1.33% [kernel] [k] sock_def_write_space
> 1.30% [kernel] [k] kmem_cache_alloc_node_trace
> 1.29% [kernel] [k] __local_bh_enable_ip
> 1.21% [kernel] [k] copy_user_enhanced_fast_string
> 1.08% [kernel] [k] __kmalloc_reserve.isra.40
> 1.08% [kernel] [k] SYSC_sendto
> 1.07% [kernel] [k] kmem_cache_alloc_node
> 0.95% [kernel] [k] ip_finish_output2
> 0.95% [kernel] [k] ktime_get
> 0.91% [kernel] [k] validate_xmit_skb
> 0.88% [kernel] [k] sock_alloc_send_pskb
> 0.82% [kernel] [k] sock_sendmsg
My perf outputs below...
Forcing TX completion to happen on the same CPU, no xmit_more:
# Overhead CPU Command Shared Object Symbol
# ........ ... .......... ................. ...............................
#
12.17% 000 udp_flood [kernel.vmlinux] [k] _raw_spin_lock
5.03% 000 udp_flood [mlx5_core] [k] mlx5e_sq_xmit
3.13% 000 udp_flood [kernel.vmlinux] [k] __ip_append_data.isra.47
2.85% 000 udp_flood [kernel.vmlinux] [k] entry_SYSCALL_64
2.75% 000 udp_flood [mlx5_core] [k] mlx5e_poll_tx_cq
2.61% 000 udp_flood [kernel.vmlinux] [k] sock_def_write_space
2.48% 000 udp_flood [kernel.vmlinux] [k] skb_set_owner_w
2.25% 000 udp_flood [kernel.vmlinux] [k] __alloc_skb
2.21% 000 udp_flood [kernel.vmlinux] [k] udp_sendmsg
2.19% 000 udp_flood [kernel.vmlinux] [k] __slab_free
2.08% 000 udp_flood [kernel.vmlinux] [k] sock_wfree
2.06% 000 udp_flood [kernel.vmlinux] [k] __ip_make_skb
1.93% 000 udp_flood [mlx5_core] [k] mlx5e_get_cqe
1.93% 000 udp_flood libc-2.17.so [.] __libc_send
1.80% 000 udp_flood [kernel.vmlinux] [k] entry_SYSCALL_64_fastpath
1.64% 000 udp_flood [kernel.vmlinux] [k] kfree
1.61% 000 udp_flood [kernel.vmlinux] [k] ip_finish_output2
1.59% 000 udp_flood [kernel.vmlinux] [k] __local_bh_enable_ip
1.57% 000 udp_flood [kernel.vmlinux] [k] __dev_queue_xmit
1.49% 000 udp_flood [kernel.vmlinux] [k] __kmalloc_node_track_caller
1.38% 000 udp_flood [kernel.vmlinux] [k] kmem_cache_alloc_node
1.30% 000 udp_flood [kernel.vmlinux] [k] dst_release
1.26% 000 udp_flood [kernel.vmlinux] [k] ksize
1.26% 000 udp_flood [kernel.vmlinux] [k] sk_dst_check
1.22% 000 udp_flood [kernel.vmlinux] [k] SYSC_sendto
1.22% 000 udp_flood [kernel.vmlinux] [k] ip_send_check
Forcing TX completion to happen on remote CPU, some xmit_more:
# Overhead CPU Command Shared Object Symbol
# ........ ... ............ ................ ..............................
#
11.67% 002 udp_flood [kernel.vmlinux] [k] _raw_spin_lock
7.61% 002 udp_flood [kernel.vmlinux] [k] skb_set_owner_w
6.15% 002 udp_flood [mlx5_core] [k] mlx5e_sq_xmit
3.05% 002 udp_flood [kernel.vmlinux] [k] entry_SYSCALL_64
2.89% 002 udp_flood [kernel.vmlinux] [k] __ip_append_data.isra.47
2.78% 000 swapper [mlx5_core] [k] mlx5e_poll_tx_cq
2.65% 002 udp_flood [kernel.vmlinux] [k] sk_dst_check
2.36% 002 udp_flood [kernel.vmlinux] [k] __alloc_skb
2.22% 002 udp_flood [kernel.vmlinux] [k] ip_finish_output2
2.07% 000 swapper [kernel.vmlinux] [k] __slab_free
2.06% 002 udp_flood [kernel.vmlinux] [k] udp_sendmsg
1.97% 002 udp_flood [kernel.vmlinux] [k] ksize
1.92% 002 udp_flood [kernel.vmlinux] [k] entry_SYSCALL_64_fastpath
1.82% 002 udp_flood [kernel.vmlinux] [k] __ip_make_skb
1.79% 002 udp_flood libc-2.17.so [.] __libc_send
1.62% 002 udp_flood [kernel.vmlinux] [k] __kmalloc_node_track_caller
1.53% 002 udp_flood [kernel.vmlinux] [k] __local_bh_enable_ip
1.48% 002 udp_flood [kernel.vmlinux] [k] sock_alloc_send_pskb
1.43% 002 udp_flood [kernel.vmlinux] [k] __dev_queue_xmit
1.39% 002 udp_flood [kernel.vmlinux] [k] ip_send_check
1.39% 002 udp_flood [kernel.vmlinux] [k] kmem_cache_alloc_node
1.37% 002 udp_flood [kernel.vmlinux] [k] dst_release
1.21% 002 udp_flood [kernel.vmlinux] [k] udp_send_skb
1.18% 002 udp_flood [kernel.vmlinux] [k] __fget_light
1.16% 002 udp_flood [kernel.vmlinux] [k] kfree
1.15% 000 swapper [kernel.vmlinux] [k] sock_wfree
1.14% 002 udp_flood [kernel.vmlinux] [k] SYSC_sendto
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer