From: Pavel Begunkov <asml.silence@gmail.com>
To: io-uring@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Cc: Jakub Kicinski <kuba@kernel.org>,
Jonathan Lemon <jonathan.lemon@gmail.com>,
"David S . Miller" <davem@davemloft.net>,
Willem de Bruijn <willemb@google.com>,
Eric Dumazet <edumazet@google.com>,
David Ahern <dsahern@kernel.org>, Jens Axboe <axboe@kernel.dk>,
Pavel Begunkov <asml.silence@gmail.com>
Subject: [RFC v2 00/19] io_uring zerocopy tx
Date: Tue, 21 Dec 2021 15:35:22 +0000
Message-ID: <cover.1640029579.git.asml.silence@gmail.com>
This is an update on io_uring zerocopy tx, still RFC. For v1 and design notes see
https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@gmail.com/
Absolute numbers (against dummy) are higher than in v1: about +10-12% requests/s
in the peak performance case. 5/19 brought a couple of percent, but most of the
gain came with 8/19 and 9/19 (+8-11% in numbers, 5-7% in profiles). Avoiding
partial copies, as those two patches do, will also be needed in the future for
p2p. Is there any reason not to do the same for paged non-zc? And for small
(under 100-150B) packets?
Most of the checks are removed from the non-zc paths. This is implemented in a
slightly trickier way in __ip_append_data(), but considering the already existing
assumptions around the "from" argument it should be fine.
Benchmarks for dummy netdev, UDP/IPv4, payload size=4096:
-n<N> is how many requests we submit per syscall. From the io_uring perspective
-n1 is wasteful and far from optimal, but it is included for comparison.
-z0 disables zerocopy, leaving just normal io_uring send requests.
-f flushes "buffer free" notifications for every request.
                      | K reqs/s | speedup
msg_zerocopy (non-zc) | 1120     | 1.12
msg_zerocopy (zc)     | 997      | 1
io_uring -n1 -z0      | 1469     | 1.47
io_uring -n8 -z0      | 1780     | 1.78
io_uring -n1 -f       | 1688     | 1.69
io_uring -n1          | 1774     | 1.77
io_uring -n8 -f       | 2075     | 2.08
io_uring -n8          | 2265     | 2.27
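
The speedup column appears to be each row's K reqs/s divided by the
msg_zerocopy (zc) figure of 997, truncated (not rounded) to two decimals. A
minimal Python sketch that reproduces the column under that assumption; the
names RESULTS, BASELINE and speedup() are illustrative and not part of the
benchmark tooling:

```python
# K reqs/s copied from the table above (dummy netdev, UDP/IPv4, payload 4096).
RESULTS = {
    "msg_zerocopy (non-zc)": 1120,
    "msg_zerocopy (zc)": 997,
    "io_uring -n1 -z0": 1469,
    "io_uring -n8 -z0": 1780,
    "io_uring -n1 -f": 1688,
    "io_uring -n1": 1774,
    "io_uring -n8 -f": 2075,
    "io_uring -n8": 2265,
}

# Assumption: the baseline row is the one whose speedup is listed as 1.
BASELINE = "msg_zerocopy (zc)"


def speedup(name, results=RESULTS, baseline=BASELINE):
    """Throughput relative to the baseline, truncated to two decimals."""
    # Integer division keeps the truncation exact; truncating instead of
    # rounding is an assumption chosen because it matches the listed values.
    return (results[name] * 100 // results[baseline]) / 100


if __name__ == "__main__":
    for name, kreqs in RESULTS.items():
        print(f"{name:<22}| {kreqs:<8} | {speedup(name):.2f}")
```
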
Note: comparing zc vs non-zc might not be too interesting. The relative
performance difference can be shifted in favour of zerocopy by cutting constant
per-request overhead, and there are easy ways of doing that, e.g. by compiling
out unused features. This is even more true for the table below, as there was
additional noise taking a good quarter of CPU cycles.
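
To illustrate the point about constant per-request overhead, here is a rough
Python sketch. It assumes a CPU-bound run on one core, so the average cost per
request is simply the inverse of the throughput, and it subtracts the same
hypothetical constant from both the zc and non-zc cost. The function names and
the cut sizes in the loop are made up for the example; only the 2265 and 1780
K reqs/s figures (io_uring -n8 with and without -z0) come from the dummy
benchmark.

```python
def per_request_ns(k_reqs_per_s):
    """Average time per request in nanoseconds, assuming a CPU-bound run."""
    return 1e9 / (k_reqs_per_s * 1e3)


def ratio_after_cut(zc_kreqs, nonzc_kreqs, cut_ns):
    """zc/non-zc throughput ratio after removing cut_ns from both costs."""
    t_zc = per_request_ns(zc_kreqs) - cut_ns
    t_nonzc = per_request_ns(nonzc_kreqs) - cut_ns
    if t_zc <= 0 or t_nonzc <= 0:
        raise ValueError("cut_ns exceeds the measured per-request time")
    # Throughput is inversely proportional to per-request time.
    return t_nonzc / t_zc


if __name__ == "__main__":
    # Hypothetical amounts of constant overhead removed from every request.
    for cut_ns in (0, 100, 200):
        ratio = ratio_after_cut(2265, 1780, cut_ns)
        print(f"cut {cut_ns:>3} ns -> zc/non-zc = {ratio:.2f}")
```

The larger the shared constant that is removed, the larger the ratio becomes,
which is why the zc to non-zc comparison depends so much on how lean the rest
of the request path is.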
Some data for UDP/IPv6 between a pair of NICs. 9/19 wasn't there at the time of
testing. All tests are CPU bound, so, as expected, reqs/s for zerocopy doesn't
vary much between different payload sizes. The io_uring to msg_zerocopy ratio is
not too representative, for reasons similar to those described above.
payload | test | K reqs/s
___________________________________________
8192 | io_uring -n8 (dummy) | 599
| io_uring -n1 -z0 | 264
| io_uring -n8 -z0 | 302
| msg_zerocopy | 248
| msg_zerocopy -z | 183
| io_uring -n1 -f | 306
| io_uring -n1 | 318
| io_uring -n8 -f | 373
| io_uring -n8 | 401
4096 | io_uring -n8 (dummy) | 601
| io_uring -n1 -z0 | 303
| io_uring -n8 -z0 | 366
| msg_zerocopy | 278
| msg_zerocopy -z | 187
| io_uring -n1 -f | 317
| io_uring -n1 | 325
| io_uring -n8 -f | 387
| io_uring -n8 | 405
1024 | io_uring -n8 (dummy) | 601
| io_uring -n1 -z0 | 329
| io_uring -n8 -z0 | 407
| msg_zerocopy | 301
| msg_zerocopy -z | 186
| io_uring -n1 -f | 317
| io_uring -n1 | 327
| io_uring -n8 -f | 390
| io_uring -n8 | 403
512 | io_uring -n8 (dummy) | 601
| io_uring -n1 -z0 | 340
| io_uring -n8 -z0 | 417
| msg_zerocopy | 310
| msg_zerocopy -z | 186
| io_uring -n1 -f | 317
| io_uring -n1 | 328
| io_uring -n8 -f | 392
| io_uring -n8 | 406
128 | io_uring -n8 (dummy) | 602
| io_uring -n1 -z0 | 341
| io_uring -n8 -z0 | 428
| msg_zerocopy | 317
| msg_zerocopy -z | 188
| io_uring -n1 -f | 318
| io_uring -n1 | 331
| io_uring -n8 -f | 391
| io_uring -n8 | 408
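
As a quick check of the "doesn't vary much" observation, the sketch below takes
the io_uring -n8 (zerocopy) and io_uring -n8 -z0 (non-zc) rows from the table
above and computes the zc to non-zc ratio per payload size and the relative
spread of each series. It is plain arithmetic on the published numbers; the
helper names are illustrative, and the caveat above about the ratios not being
too representative still applies.

```python
# K reqs/s from the UDP/IPv6 NIC table above, keyed by payload size in bytes.
ZC = {8192: 401, 4096: 405, 1024: 403, 512: 406, 128: 408}      # io_uring -n8
NON_ZC = {8192: 302, 4096: 366, 1024: 407, 512: 417, 128: 428}  # io_uring -n8 -z0


def zc_ratio(payload):
    """Zerocopy throughput relative to non-zc for one payload size."""
    return ZC[payload] / NON_ZC[payload]


def relative_spread(series):
    """(max - min) / min across payload sizes for one test."""
    lo, hi = min(series.values()), max(series.values())
    return (hi - lo) / lo


if __name__ == "__main__":
    for payload in sorted(ZC, reverse=True):
        print(f"{payload:>5}B  zc/non-zc = {zc_ratio(payload):.2f}")
    print(f"zc spread:     {relative_spread(ZC):.1%}")
    print(f"non-zc spread: {relative_spread(NON_ZC):.1%}")
```

In these numbers the zerocopy series stays within a couple of percent across
payload sizes, while the non-zc series speeds up as the payload shrinks and
overtakes zerocopy at 1024B and below.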
https://github.com/isilence/linux/tree/zc_v2
https://github.com/isilence/liburing/tree/zc_v2
The benchmark is <liburing>/test/send-zc:
send-zc [-f] [-n<N>] [-z0] -s<payload size> -D<dst ip> (-6|-4) [-t<sec>] udp
As a server you can use msg_zerocopy from the kernel's selftests, or a copy of
it at <liburing>/test/msg_zerocopy. No server is needed for dummy testing.
dummy setup:
sudo ip li add dummy0 type dummy && sudo ip li set dummy0 up mtu 65536
# route traffic for the specified IP through dummy0
sudo ip route add <ip_address> dev dummy0
v2: remove additional overhead for non-zc from skb_release_data() (Jonathan)
avoid msg propagation, hide extra bits of non-zc overhead
task_work based "buffer free" notifications
improve io_uring's notification refcounting
added 5/19 (no pfmemalloc tracking)
added 8/19 and 9/19 preventing small copies with zc
misc small changes
Pavel Begunkov (19):
skbuff: add SKBFL_DONT_ORPHAN flag
skbuff: pass a struct ubuf_info in msghdr
net: add zerocopy_sg_from_iter for bvec
net: optimise page get/free for bvec zc
net: don't track pfmemalloc for zc registered mem
ipv4/udp: add support msgdr::msg_ubuf
ipv6/udp: add support msgdr::msg_ubuf
ipv4: avoid partial copy for zc
ipv6: avoid partial copy for zc
io_uring: add send notifiers registration
io_uring: infrastructure for send zc notifications
io_uring: wire send zc request type
io_uring: add an option to flush zc notifications
io_uring: opcode independent fixed buf import
io_uring: sendzc with fixed buffers
io_uring: cache struct ubuf_info
io_uring: unclog ctx refs waiting with zc notifiers
io_uring: task_work for notification delivery
io_uring: optimise task referencing by notifiers
fs/io_uring.c | 440 +++++++++++++++++++++++++++++++++-
include/linux/skbuff.h | 46 ++--
include/linux/socket.h | 1 +
include/uapi/linux/io_uring.h | 14 ++
net/compat.c | 1 +
net/core/datagram.c | 58 +++++
net/core/skbuff.c | 16 +-
net/ipv4/ip_output.c | 55 +++--
net/ipv6/ip6_output.c | 54 ++++-
net/socket.c | 3 +
10 files changed, 633 insertions(+), 55 deletions(-)
--
2.34.1