From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Sabrina Dubroca <sd@queasysnail.net>,
	brouer@redhat.com
Subject: Re: [PATCH net-next 0/5] net: add protocol level recvmmsg support
Date: Mon, 28 Nov 2016 13:21:41 +0100
Message-ID: <20161128132141.217aef39@redhat.com>
In-Reply-To: <1480330358.6718.13.camel@redhat.com>

On Mon, 28 Nov 2016 11:52:38 +0100
Paolo Abeni <pabeni@redhat.com> wrote:

> Hi Jesper,
> 
> On Fri, 2016-11-25 at 18:37 +0100, Jesper Dangaard Brouer wrote:
> > > The measured performance delta is as follows:
> > > 
> > > 		before		after
> > > 		(Kpps)		(Kpps)
> > > 
> > > udp flood[1]	570		1800(+215%)
> > > max tput[2]	1850		3500(+89%)
> > > single queue[3]	1850		1630(-11%)
> > > 
> > > [1] line rate flood using multiple 64 bytes packets and multiple flows  
> > 
> > Is [1] sending multiple flows into a single UDP-sink?
> 
> Yes, in the test scenario [1] there are multiple UDP flows using 16
> different rx queues on the receiver host, and a single user space
> reader.
> 
> > > [2] like [1], but using the minimum number of flows to saturate the user space
> > >  sink, that is 1 flow for the old kernel and 3 for the patched one.
> > >  the tput increases since the contention on the rx lock is low.
> > > [3] like [1] but using a single flow with both old and new kernel. All the
> > >  packets land on the same rx queue and there is a single ksoftirqd instance
> > >  running  
> > 
> > It is important to know: do ksoftirqd and the UDP-sink run on the same CPU?
> 
> No pinning is enforced. The scheduler moves the user space process to a
> different CPU from the one running the ksoftirqd kernel thread.

This floating userspace process can cause a high variation between test
runs.  On my system, the performance drops to approx 600Kpps when
ksoftirqd and udp_sink share the same CPU.

Quick run with your patches applied:

Sender: pktgen with big packets
 ./pktgen_sample03_burst_single_flow.sh -i mlx5p2 -d 198.18.50.1 \
   -m 7c:fe:90:c7:b1:cf -t1 -b128 -s 1472

Both ksoftirqd and udp_sink forced onto CPU0:

# taskset -c 0 ./udp_sink --count $((10**7)) --port 9 --repeat 1
                                ns      pps             cycles 
recvMmsg/32  	run: 0 10000000	1667.93	599547.16	6685
recvmsg   	run: 0 10000000	1810.70	552273.39	7257
read      	run: 0 10000000	1634.72	611723.95	6552
recvfrom  	run: 0 10000000	1585.06	630891.39	6353

 
> > > The regression in the single queue scenario is actually due to the improved
> > > performance of the recvmmsg() syscall: the user space process is now
> > > significantly faster than the ksoftirqd process so that the latter often
> > > needs to wake up the user space process.
> > 
> > When measuring these things, make sure that we/you measure both the packets
> > actually received in the userspace UDP-sink, and also measure packets
> > RX processed by ksoftirqd (and I often also look at what HW got delivered).
> > Sometimes, when userspace is too slow, the kernel can/will drop packets.
> > 
> > It is actually quite easily verified with cmdline:
> > 
> >  nstat > /dev/null && sleep 1  && nstat
> > 
> > For HW measurements I use the tool ethtool_stats.pl:
> >  https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl  
> 
> We collected the UDP stats for all three scenarios; we see a lot of
> drops in test[1] and few, by design, in test[2]. In test [3], with the
> patched kernel, the drops are 0: ksoftirqd is way slower than the user
> space sink. 
> 
> > > Since ksoftirqd is the bottleneck in such a scenario, overall this causes a
> > > tput reduction. In a real use case, where the udp sink is performing some
> > > actual processing of the received data, such regression is unlikely to really
> > > have an effect.  
> > 
> > My experience is that the performance of RX UDP is affected by:
> >  * if socket is connected or not (yes, RX side also)
> >  * state of /proc/sys/net/ipv4/ip_early_demux
> > 
> > You don't need to run with all the combinations, but it would be nice
> > if you specify what config you have based your measurements on (and
> > keep it stable across your runs).
> > 
> > I've actually implemented the "--connect" option to my udp_sink
> > program[1] today, but I've not pushed it yet, if you are interested.  
> 
> The reported numbers are all gathered with unconnected sockets and early
> demux enabled.
> 
> We also used connected sockets for test[3], with relatively little
> difference (the tput increased for both unpatched and patched kernel,
> and the difference was roughly the same).

When I use connected sockets (RX side) with ip_early_demux enabled, I do
see a performance boost for recvmmsg.  Setup: these patches applied,
ksoftirqd forced onto CPU0 and udp_sink onto CPU2, and pktgen sending a
single flow of 1472-byte packets.

$ sysctl net/ipv4/ip_early_demux
net.ipv4.ip_early_demux = 1

$ grep -H . /proc/sys/net/core/{r,w}mem_max
/proc/sys/net/core/rmem_max:1048576
/proc/sys/net/core/wmem_max:1048576

# taskset -c 2 ./udp_sink --count $((10**7)) --port 9 --repeat 1
#                               ns      pps             cycles
recvMmsg/32  	run: 0 10000000	462.51	2162095.23	1853
recvmsg   	run: 0 10000000	536.47	1864041.75	2150
read      	run: 0 10000000	492.01	2032460.71	1972
recvfrom  	run: 0 10000000	553.94	1805262.84	2220

# taskset -c 2 ./udp_sink --count $((10**7)) --port 9 --repeat 1 --connect
#                               ns      pps             cycles
recvMmsg/32  	run: 0 10000000	405.15	2468225.03	1623
recvmsg   	run: 0 10000000	548.23	1824049.58	2197
read      	run: 0 10000000	489.76	2041825.27	1962
recvfrom  	run: 0 10000000	466.18	2145091.77	1868
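
As a sanity check on the two tables above: the ns and pps columns should be
reciprocals of each other, and the gain from --connect can be read off the
recvMmsg/32 rows.  A minimal sketch in Python; the helper names are
hypothetical and the inputs are just the numbers copied from the runs above:

```python
# Numbers are copied from the recvMmsg/32 rows above; the helper names are
# illustrative and not part of udp_sink.

def pps_from_ns(ns_per_packet: float) -> float:
    """Packets per second implied by a per-packet cost in nanoseconds."""
    return 1e9 / ns_per_packet


def gain_percent(before_pps: float, after_pps: float) -> float:
    """Relative change from before_pps to after_pps, in percent."""
    return (after_pps / before_pps - 1.0) * 100.0


unconnected_pps = pps_from_ns(462.51)  # table reports 2162095.23
connected_pps = pps_from_ns(405.15)    # table reports 2468225.03
connect_gain = gain_percent(unconnected_pps, connected_pps)

print(f"unconnected: {unconnected_pps:.0f} pps")
print(f"connected:   {connected_pps:.0f} pps")
print(f"gain from --connect: {connect_gain:.1f}%")
```

The recomputed rates differ slightly from the pps column because the ns
values in the table are rounded; the gain for recvMmsg/32 comes out at
roughly 14%.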

My theory is that with a connect()'ed RX socket, ksoftirqd gets
faster (no fib_lookup) and is no longer the bottleneck.  This is
confirmed by the nstat output below: only the connected run shows
UdpRcvbufErrors, i.e. ksoftirqd now delivers more than udp_sink drains.

Below: unconnected
 $ nstat > /dev/null && sleep 1  && nstat
 #kernel
 IpInReceives                    2143944            0.0
 IpInDelivers                    2143945            0.0
 UdpInDatagrams                  2143944            0.0
 IpExtInOctets                   3125889306         0.0
 IpExtInNoECTPkts                2143956            0.0

Below: connected
 $ nstat > /dev/null && sleep 1  && nstat
 #kernel
 IpInReceives                    2925155            0.0
 IpInDelivers                    2925156            0.0
 UdpInDatagrams                  2440925            0.0
 UdpInErrors                     484230             0.0
 UdpRcvbufErrors                 484230             0.0
 IpExtInOctets                   4264896402         0.0
 IpExtInNoECTPkts                2925170            0.0
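
To put a number on the drops in the two nstat samples above, here is a small
Python sketch.  It assumes (my reading of the counters, not something nstat
states) that the datagrams offered to the socket are UdpInDatagrams plus
UdpRcvbufErrors; the function name is hypothetical:

```python
# Counters copied from the two one-second nstat samples above.
unconnected = {"IpInDelivers": 2143945, "UdpInDatagrams": 2143944}
connected = {
    "IpInDelivers": 2925156,
    "UdpInDatagrams": 2440925,
    "UdpRcvbufErrors": 484230,
}


def rcvbuf_drop_fraction(counters: dict) -> float:
    """Fraction of offered UDP datagrams dropped on a full receive buffer.

    Assumption: offered = UdpInDatagrams + UdpRcvbufErrors.
    """
    dropped = counters.get("UdpRcvbufErrors", 0)
    offered = counters["UdpInDatagrams"] + dropped
    return dropped / offered


for name, counters in (("unconnected", unconnected), ("connected", connected)):
    print(f"{name}: {rcvbuf_drop_fraction(counters) * 100:.1f}% rcvbuf drops")
```

In the unconnected sample nothing is dropped at the socket, so userspace keeps
up with what ksoftirqd delivers; in the connected sample about one datagram in
six is dropped on a full receive buffer.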

This is a 50Gbit/s link, and IpInReceives corresponds to approx 35Gbit/s.
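
The approx 35Gbit/s figure can be reproduced from the connected nstat sample.
A rough Python sketch, under two assumptions that are mine and not stated in
this mail: pktgen's "-s 1472" counts the Ethernet header but not the FCS, and
each frame costs an extra 24 bytes on the wire (4 FCS + 8 preamble/SFD +
12 inter-frame gap):

```python
# Counters from the connected one-second nstat sample above.
ip_in_receives = 2925155        # packets
ip_ext_in_octets = 4264896402   # bytes seen at the IP layer

# Assumptions, not taken from the mail:
PKTGEN_FRAME_BYTES = 1472       # "-s 1472": Ethernet header + payload, no FCS
WIRE_OVERHEAD_BYTES = 4 + 8 + 12  # FCS + preamble/SFD + inter-frame gap

# Throughput as seen by the IP layer (excludes all Ethernet overhead).
ip_gbps = ip_ext_in_octets * 8 / 1e9

# Throughput as seen on the wire, under the assumptions above.
wire_bytes_per_frame = PKTGEN_FRAME_BYTES + WIRE_OVERHEAD_BYTES
wire_gbps = ip_in_receives * wire_bytes_per_frame * 8 / 1e9

print(f"IP layer: {ip_gbps:.1f} Gbit/s")
print(f"on wire:  {wire_gbps:.1f} Gbit/s")
```

Either way of counting lands in the 34-35 Gbit/s range, well below the
50Gbit/s line rate.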

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
