From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Sabrina Dubroca <sd@queasysnail.net>,
	brouer@redhat.com
Subject: Re: [PATCH net-next 0/5] net: add protocol level recvmmsg support
Date: Mon, 28 Nov 2016 14:52:41 +0100
Message-ID: <20161128145241.4c1b083d@redhat.com>
In-Reply-To: <20161128132141.217aef39@redhat.com>


On Mon, 28 Nov 2016 13:21:41 +0100 Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> On Mon, 28 Nov 2016 11:52:38 +0100 Paolo Abeni <pabeni@redhat.com> wrote:
> >   
> > > > [2] like [1], but using the minimum number of flows to saturate the user space
> > > >  sink, that is 1 flow for the old kernel and 3 for the patched one.
> > > >  the tput increases since the contention on the rx lock is low.
> > > > [3] like [1] but using a single flow with both old and new kernel. All the
> > > >  packets land on the same rx queue and there is a single ksoftirqd instance
> > > >  running    
[...]
> > 
> > We also used connected socket for test[3], with relative little
> > difference (the tput increased for both unpatched and patched kernel, 
> > and the difference was roughly the same).  
> 
> When I use connected sockets (RX side) and ip_early_demux enabled, I do
> see a performance boost for recvmmsg.  With these patches applied,
> forced ksoftirqd on CPU0 and udp_sink on CPU2, pktgen single flow
> sending size 1472 bytes.
> 
> $ sysctl net/ipv4/ip_early_demux
> net.ipv4.ip_early_demux = 1
> 
> $ grep -H . /proc/sys/net/core/{r,w}mem_max
> /proc/sys/net/core/rmem_max:1048576
> /proc/sys/net/core/wmem_max:1048576
> 
> # taskset -c 2 ./udp_sink --count $((10**7)) --port 9 --repeat 1
> #                               ns      pps             cycles
> recvMmsg/32  	run: 0 10000000	462.51	2162095.23	1853
> recvmsg   	run: 0 10000000	536.47	1864041.75	2150
> read      	run: 0 10000000	492.01	2032460.71	1972
> recvfrom  	run: 0 10000000	553.94	1805262.84	2220
> 
> # taskset -c 2 ./udp_sink --count $((10**7)) --port 9 --repeat 1 --connect
> #                               ns      pps             cycles
> recvMmsg/32  	run: 0 10000000	405.15	2468225.03	1623
> recvmsg   	run: 0 10000000	548.23	1824049.58	2197
> read      	run: 0 10000000	489.76	2041825.27	1962
> recvfrom  	run: 0 10000000	466.18	2145091.77	1868
> 
> My theory is that by enabling connect'ed RX socket, the ksoftirqd gets
> faster (no fib_lookup) and is no-longer a bottleneck.  This is
> confirmed by nstat.
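
For readers unfamiliar with the term: a "connected" RX socket is a UDP
socket on which connect() has been called, so the kernel records a
single peer for it.  The sketch below shows the idea over loopback in
Python.  It is only an illustration; udp_sink's source is not shown in
this thread, and the function name and the assumption that --connect
does the equivalent on the receiving socket are mine.

```python
import socket


def connected_udp_roundtrip(payload: bytes = b"x" * 64) -> bytes:
    """Send one datagram over loopback to a connect()'ed UDP receiver.

    Hypothetical helper for illustration; not taken from udp_sink.
    """
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        tx.bind(("127.0.0.1", 0))  # sender gets an ephemeral port
        rx.bind(("127.0.0.1", 0))  # receiver gets an ephemeral port
        # connect() on the RX socket records the peer address, so this
        # socket only accepts datagrams from that one source.
        rx.connect(tx.getsockname())
        rx.settimeout(2.0)  # do not hang forever if the datagram is lost
        tx.sendto(payload, rx.getsockname())
        return rx.recv(2048)
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    print(len(connected_udp_roundtrip()))
```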

Paolo asked me to run a test with small packets from pktgen, and I was
actually surprised by the result.

# taskset -c 2 ./udp_sink --count $((10**7)) --port 9 --repeat 1 --connect
recvMmsg/32  	run: 0 10000000	426.61	2344076.59	1709	17098657328
recvmsg   	run: 0 10000000	533.49	1874449.82	2138	21382574965
read      	run: 0 10000000	470.22	2126651.13	1884	18846797802
recvfrom  	run: 0 10000000	513.74	1946499.83	2059	20591095477

Notice how recvMmsg/32 got slower by 124 kpps (2468225 pps -> 2344076 pps).
I was expecting it to get faster, given we just established that
udp_sink was the bottleneck, and smaller packets should mean fewer
bytes copied to userspace (copy_user_enhanced_fast_string).  (With
nstat I observe that ksoftirqd is again the bottleneck.)
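
As a sanity check on the numbers above, the columns can be
cross-checked against each other.  This is a small sketch, assuming the
"ns" column is nanoseconds per packet and "cycles" is cycles per
packet; the variable names are mine, the values are copied from the
recvMmsg/32 rows of the two --connect runs.

```python
NSEC_PER_SEC = 1e9


def pps_from_ns(ns_per_packet: float) -> float:
    """Packets per second implied by a per-packet cost in nanoseconds."""
    return NSEC_PER_SEC / ns_per_packet


# (ns, pps, cycles) for recvMmsg/32 with --connect, copied from the tables.
big = (405.15, 2468225.03, 1623)    # 1472-byte packets
small = (426.61, 2344076.59, 1709)  # small packets

delta_pps = big[1] - small[1]              # the "124 kpps" slowdown
slowdown_pct = 100.0 * delta_pps / big[1]  # same slowdown as a percentage
implied_ghz = small[2] / small[0]          # cycles per ns, i.e. clock in GHz

if __name__ == "__main__":
    print(f"delta: {delta_pps:.0f} pps ({slowdown_pct:.2f}%)")
    print(f"pps recomputed from ns: {pps_from_ns(small[0]):.0f}")
    print(f"implied clock: {implied_ghz:.2f} GHz")
```

The pps recomputed from the ns column agrees with the reported pps to
within rounding, which suggests the two columns are the same
measurement expressed in different units.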

Looking at the perf diff for CPU2 (baseline = 64-byte packets), we do
see an increase in copy_user_enhanced_fast_string.  More interestingly,
we see a decrease in the locking cost when using big packets (see the
rows marked ** below):

# Event 'cycles:ppp'
#
# Baseline    Delta  Shared Object     Symbol                                   
# ........  .......  ................  .........................................
#
    15.09%   +0.33%  [kernel.vmlinux]  [k] copy_msghdr_from_user
    12.36%  +21.89%  [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
     8.65%   -0.63%  [kernel.vmlinux]  [k] udp_process_skb
     7.33%   -1.88%  [kernel.vmlinux]  [k] __skb_try_recv_datagram_batch
 **  7.12%   -6.66%  [kernel.vmlinux]  [k] udp_rmem_release **
 **  6.71%   -6.52%  [kernel.vmlinux]  [k] _raw_spin_lock_bh **
     6.35%   +1.36%  [kernel.vmlinux]  [k] __free_page_frag
     4.39%   +0.29%  [kernel.vmlinux]  [k] copy_msghdr_to_user_gen
     2.87%   -1.52%  [kernel.vmlinux]  [k] skb_release_data
     2.60%   +0.14%  [kernel.vmlinux]  [k] __put_user_4
     2.27%   -2.18%  [kernel.vmlinux]  [k] __sk_mem_reduce_allocated
     2.11%   +0.08%  [kernel.vmlinux]  [k] cmpxchg_double_slab.isra.68
     1.90%   +2.40%  [kernel.vmlinux]  [k] __slab_free
     1.73%   +0.20%  [kernel.vmlinux]  [k] __udp_recvmmsg
     1.62%   -1.62%  [kernel.vmlinux]  [k] intel_idle
     1.52%   +0.22%  [kernel.vmlinux]  [k] copy_to_iter
     1.20%   -0.03%  [kernel.vmlinux]  [k] import_iovec
     1.14%   +0.05%  [kernel.vmlinux]  [k] rw_copy_check_uvector
     0.80%   -0.04%  [kernel.vmlinux]  [k] recvmmsg_ctx_to_user
     0.75%   -0.69%  [kernel.vmlinux]  [k] __local_bh_enable_ip
     0.71%   +0.18%  [kernel.vmlinux]  [k] skb_copy_datagram_iter
     0.70%   -0.07%  [kernel.vmlinux]  [k] recvmmsg_ctx_from_user
     0.67%   +0.08%  [kernel.vmlinux]  [k] kmem_cache_free
     0.56%   +0.42%  [kernel.vmlinux]  [k] udp_process_msg
     0.48%   +0.05%  [kernel.vmlinux]  [k] skb_release_head_state
     0.46%           [kernel.vmlinux]  [k] lapic_next_deadline
     0.36%           [kernel.vmlinux]  [k] __switch_to
     0.34%   -0.03%  [kernel.vmlinux]  [k] consume_skb
     0.32%   -0.05%  [kernel.vmlinux]  [k] skb_consume_udp
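
To read the table: perf diff reports Delta as the compared profile's
percentage minus the baseline's, so baseline + delta gives the share in
the big-packet run.  The sketch below applies that to three rows copied
from the output above.  The regular expression and function name are
mine and only cover rows that have a Delta value.

```python
import re

# Matches e.g. " **  7.12%   -6.66%  [kernel.vmlinux]  [k] udp_rmem_release **"
ROW = re.compile(
    r"^\s*(?:\*\*)?\s*([\d.]+)%\s+([+-][\d.]+)%\s+\S+\s+\[k\]\s+(\S+)"
)


def parse_row(line):
    """Return (symbol, baseline_pct, compared_pct), or None if no match."""
    m = ROW.match(line)
    if m is None:
        return None  # header lines and rows without a Delta column
    baseline = float(m.group(1))
    delta = float(m.group(2))
    return m.group(3), baseline, baseline + delta


SAMPLE = [
    "    12.36%  +21.89%  [kernel.vmlinux]  [k] copy_user_enhanced_fast_string",
    " **  7.12%   -6.66%  [kernel.vmlinux]  [k] udp_rmem_release **",
    " **  6.71%   -6.52%  [kernel.vmlinux]  [k] _raw_spin_lock_bh **",
]

if __name__ == "__main__":
    for line in SAMPLE:
        symbol, baseline, compared = parse_row(line)
        print(f"{symbol}: {baseline:.2f}% -> {compared:.2f}%")
```

Read this way, udp_rmem_release and _raw_spin_lock_bh each drop to well
under 1% of the CPU2 profile with big packets, while the copy to
userspace grows to roughly a third of it.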


The perf diff from CPU0 also shows less lock contention:

# Event 'cycles:ppp'
#
# Baseline    Delta  Shared Object     Symbol                                   
# ........  .......  ................  .........................................
#
    11.04%   -3.02%  [kernel.vmlinux]  [k] __udp_enqueue_schedule_skb
     9.98%   +2.16%  [mlx5_core]       [k] mlx5e_handle_rx_cqe
     7.23%   -1.85%  [kernel.vmlinux]  [k] udp_v4_early_demux
     3.90%   +0.73%  [kernel.vmlinux]  [k] build_skb
     3.85%   -1.77%  [kernel.vmlinux]  [k] udp_queue_rcv_skb
     3.83%   +0.02%  [kernel.vmlinux]  [k] sock_def_readable
 **  3.26%   -3.19%  [kernel.vmlinux]  [k] queued_spin_lock_slowpath **
     2.99%   +0.55%  [kernel.vmlinux]  [k] __build_skb
     2.97%   +0.11%  [kernel.vmlinux]  [k] __udp4_lib_rcv
 **  2.87%   -1.39%  [kernel.vmlinux]  [k] _raw_spin_lock **
     2.67%   +0.60%  [kernel.vmlinux]  [k] ip_rcv
     2.65%   +0.61%  [kernel.vmlinux]  [k] __netif_receive_skb_core
     2.64%   +0.79%  [ip_tables]       [k] ipt_do_table
     2.37%   +0.37%  [kernel.vmlinux]  [k] read_tsc
     2.26%   +0.52%  [mlx5_core]       [k] mlx5e_get_cqe
     2.11%   -1.15%  [kernel.vmlinux]  [k] __sk_mem_raise_allocated
     2.10%   +0.37%  [kernel.vmlinux]  [k] __rcu_read_unlock
     2.04%   +0.67%  [mlx5_core]       [k] mlx5e_alloc_rx_wqe
     1.86%   +0.40%  [kernel.vmlinux]  [k] inet_gro_receive
     1.57%   +0.11%  [kernel.vmlinux]  [k] kmem_cache_alloc
     1.53%   +0.28%  [kernel.vmlinux]  [k] _raw_read_lock
     1.53%   +0.25%  [kernel.vmlinux]  [k] dev_gro_receive
     1.38%   -0.18%  [kernel.vmlinux]  [k] udp_gro_receive
     1.19%   +0.37%  [kernel.vmlinux]  [k] __rcu_read_lock
     1.14%   +0.31%  [kernel.vmlinux]  [k] _raw_read_unlock
     1.14%   +0.12%  [kernel.vmlinux]  [k] ip_rcv_finish
     1.13%   +0.20%  [kernel.vmlinux]  [k] __udp4_lib_lookup
     1.05%   +0.16%  [kernel.vmlinux]  [k] ktime_get_with_offset
     0.94%   +0.38%  [kernel.vmlinux]  [k] ip_local_deliver_finish
     0.91%   +0.22%  [kernel.vmlinux]  [k] do_csum
     0.86%   -0.04%  [kernel.vmlinux]  [k] ipv4_pktinfo_prepare
     0.84%   +0.05%  [kernel.vmlinux]  [k] sk_filter_trim_cap
     0.84%   +0.20%  [kernel.vmlinux]  [k] ip_local_deliver
     0.84%   +0.19%  [kernel.vmlinux]  [k] udp4_gro_receive

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

