From: Jesper Dangaard Brouer <brouer@redhat.com>
To: David Laight <David.Laight@ACULAB.COM>
Cc: 'Marek Majkowski' <marek@cloudflare.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	network dev <netdev@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>,
	brouer@redhat.com, Paolo Abeni <pabeni@redhat.com>
Subject: Re: epoll_wait() performance
Date: Wed, 27 Nov 2019 16:48:21 +0100
Message-ID: <20191127164821.1c41deff@carbon>
In-Reply-To: <5f4028c48a1a4673bd3b38728e8ade07@AcuMS.aculab.com>


On Wed, 27 Nov 2019 10:39:44 +0000 David Laight <David.Laight@ACULAB.COM> wrote:

> ...
> > > While using recvmmsg() to read multiple messages might seem a good idea, it is much
> > > slower than recv() when there is only one message (even recvmsg() is a lot slower).
> > > (I'm not sure why the code paths are so slow, I suspect it is all the copy_from_user()
> > > and faffing with the user iov[].)
> > >
> > > So using poll() we repoll the fd after calling recv() to find if there is a second message.
> > > However the second poll has a significant performance cost (but less than using recvmmsg()).  
> > 
> > That sounds wrong. Single recvmmsg(), even when receiving only a
> > single message, should be faster than two syscalls - recv() and
> > poll().  
> 
> My suspicion is the extra two copy_from_user() needed for each recvmsg are a
> significant overhead, most likely due to the crappy code that tries to stop
> the kernel buffer being overrun.
>
> I need to run the tests on a system with a 'home built' kernel to see how much
> difference this makes (by seeing how much slower duplicating the copy makes it).
> 
> The system call cost of poll() gets amortized over a reasonable number of sockets.
> So doing poll() on a socket with no data is a lot faster than the setup for recvmsg
> even allowing for looking up the fd.
> 
> This could be fixed by an extra flag to recvmmsg() to indicate that you only really
> expect one message and to call the poll() function before each subsequent receive.
> 
> There is also the 'reschedule' that Eric added to the loop in recvmmsg.
> I don't know how much that actually costs.
> In this case the process is likely to be running at a RT priority and pinned to a cpu.
> In some cases the cpu is also reserved (at boot time) so that 'random' other code can't use it.
> 
> We really do want to receive all these UDP packets in a timely manner.
> Although very low latency isn't itself an issue.
> The data is telephony audio with (typically) one packet every 20ms.
> The code only looks for packets every 10ms - that helps no end since, in principle,
> only a single poll()/epoll_wait() call (on all the sockets) is needed every 10ms.
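A minimal sketch of the receive pattern described above, assuming non-blocking use and a caller that sleeps for the 10ms tick between calls (the sleep is not shown): one poll() across all sockets, then for each readable socket a recv() followed by a zero-timeout poll() on that fd to see whether another datagram is queued. The helper names drain_ready_sockets() and more_pending() and the 2048-byte buffer are hypothetical, not taken from David's code.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: after one recv(), ask poll() with a zero timeout
 * whether another datagram is already queued on the same fd. */
static int more_pending(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    return poll(&p, 1, 0) == 1 && (p.revents & POLLIN);
}

/* One pass of the periodic loop: a single poll() over all sockets (each
 * pfds[i].events must be POLLIN), then recv() + re-poll on every readable
 * socket until it is empty. Returns datagrams read, or -1 on poll error. */
static int drain_ready_sockets(struct pollfd *pfds, size_t nfds)
{
    char buf[2048];
    int total = 0;

    assert(pfds != NULL || nfds == 0);
    /* Timeout 0: the caller sleeps between ticks, so never block here. */
    if (poll(pfds, (nfds_t)nfds, 0) < 0)
        return -1;

    for (size_t i = 0; i < nfds; i++) {
        if (!(pfds[i].revents & POLLIN))
            continue;
        do {
            ssize_t n = recv(pfds[i].fd, buf, sizeof(buf), MSG_DONTWAIT);
            if (n < 0)
                break;      /* EAGAIN or a real error: move to next fd */
            /* Process the n bytes in buf here (omitted). */
            total++;
        } while (more_pending(pfds[i].fd));  /* the extra poll() per message */
    }
    return total;
}
```

The cost David describes is visible in the loop: every datagram costs one recv() plus one extra poll(), and the final poll() on each socket finds nothing.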

I have a simple udp_sink tool[1] that cycles through the different
receive socket system calls.  I gave it a quick spin on a Fedora 31
kernel (5.3.12-300.fc31.x86_64) with an mlx5 100G interface, and I was
very surprised to see a significant regression/slowdown for recvMmsg.

$ sudo ./udp_sink --port 9 --repeat 1 --count $((10**7))
             run      count     ns/pkt   pps         cycles  payload
recvMmsg/32  run:  0  10000000  1461.41  684270.96   5261    18       demux:1
recvmsg      run:  0  10000000  889.82   1123824.84  3203    18       demux:1
read         run:  0  10000000  974.81   1025841.68  3509    18       demux:1
recvfrom     run:  0  10000000  1056.51  946513.44   3803    18       demux:1
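The columns are related by simple arithmetic: pps is 1e9 divided by ns/pkt, and cycles divided by ns/pkt gives a clock rate of about 3.6 GHz for this machine. That clock figure is an inference from the printed numbers, not something the tool reports; the function names below are hypothetical.

```c
#include <assert.h>

/* Packets per second implied by a per-packet cost in nanoseconds. */
static double pps_from_ns(double ns_per_pkt)
{
    assert(ns_per_pkt > 0.0);
    return 1e9 / ns_per_pkt;
}

/* Clock rate in GHz implied by the cycles/pkt and ns/pkt columns. */
static double implied_ghz(double cycles_per_pkt, double ns_per_pkt)
{
    assert(ns_per_pkt > 0.0);
    return cycles_per_pkt / ns_per_pkt;
}
```

For the recvMmsg/32 row, pps_from_ns(1461.41) gives about 684,270.7 against the printed 684270.96; the small gap comes from ns/pkt being rounded to two decimals. implied_ghz(5261, 1461.41) is just under 3.6.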

Plain recvmsg() reaches roughly 1.6x the packet rate of recvmmsg() here:
 recvMmsg/32 = 684,270 pps
 recvmsg     = 1,123,824 pps
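For reference, a receive call of the kind the recvMmsg/32 row measures can be sketched as one recvmmsg() asking for up to 32 datagrams, with MSG_WAITFORONE so it blocks only until the first one arrives and then returns whatever else is already queued. This is a minimal sketch assuming one buffer per slot; it is not the udp_sink code, and BATCH, BUFSZ and recv_batch() are illustrative names. Every slot needs its own msghdr/iovec, which the kernel has to copy in from user space, i.e. the per-message setup David suspects is expensive.

```c
#define _GNU_SOURCE          /* recvmmsg(), struct mmsghdr, MSG_WAITFORONE */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH 32             /* matches the "/32" in recvMmsg/32 */
#define BUFSZ 2048           /* arbitrary per-datagram buffer size */

static char bufs[BATCH][BUFSZ];
static struct iovec iov[BATCH];
static struct mmsghdr msgs[BATCH];

/* Receive up to BATCH datagrams with a single system call.
 * Returns the number received (msgs[i].msg_len holds each length), or -1. */
static int recv_batch(int fd)
{
    assert(fd >= 0);
    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BUFSZ;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    /* MSG_WAITFORONE: block for the first datagram, then behave as if
     * MSG_DONTWAIT were set. NULL timeout: no overall deadline. */
    return recvmmsg(fd, msgs, BATCH, MSG_WAITFORONE, NULL);
}
```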

[1] https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

For a connected UDP socket:

$ sudo ./udp_sink --port 9 --repeat 1 --connect
             run      count     ns/pkt   pps         cycles  payload
recvMmsg/32  run:  0  1000000   1240.06  806411.73   4464    18       demux:1 c:1
recvmsg      run:  0  1000000   768.80   1300724.75  2767    18       demux:1 c:1
read         run:  0  1000000   823.40   1214478.40  2964    18       demux:1 c:1
recvfrom     run:  0  1000000   889.19   1124616.11  3201    18       demux:1 c:1
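On a UDP socket, connect() only records a default peer: afterwards the socket accepts datagrams from that peer only, and plain recv()/read() can be used. A minimal IPv4 sketch of setting up such a receiver follows; open_connected_udp() is a hypothetical helper, and that udp_sink's --connect option does the equivalent is an assumption.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: UDP socket bound to local_port on all IPv4
 * addresses and connected to *peer. Returns the fd, or -1 on error. */
static int open_connected_udp(uint16_t local_port, const struct sockaddr_in *peer)
{
    struct sockaddr_in local = {
        .sin_family = AF_INET,
        .sin_port = htons(local_port),
        .sin_addr.s_addr = htonl(INADDR_ANY),
    };
    assert(peer != NULL);

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    /* bind() picks the local port; connect() fixes the remote address. */
    if (bind(fd, (const struct sockaddr *)&local, sizeof(local)) < 0 ||
        connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```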


Here are some older results (approx. v4.10-rc1), where recvMmsg/32 was still slightly faster than recvmsg:

[brouer@skylake src]$ sudo taskset -c 2 ./udp_sink --count $((10**7)) --port 9 --connect
 recvMmsg/32    run: 0 10000000 537.89  1859106.74      2155    21559353816
 recvmsg        run: 0 10000000 552.69  1809344.44      2215    22152468673
 read           run: 0 10000000 476.65  2097970.76      1910    19104864199
 recvfrom       run: 0 10000000 450.76  2218492.60      1806    18066972794
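The older run was pinned with `taskset -c 2`. taskset is built on sched_setaffinity(); a minimal sketch of pinning the calling thread from inside a test program follows, where pin_to_cpu() is a hypothetical helper name.

```c
#define _GNU_SOURCE          /* cpu_set_t macros, sched_setaffinity() */
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Pin the calling thread to a single CPU, roughly what `taskset -c CPU`
 * arranges for a whole process. Returns 0, or -1 with errno set. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means the calling thread. */
    return sched_setaffinity(0, sizeof(set), &set);
}
```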


