From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
Subject: [PATCH RFC net-next 00/11] udp gso
Date: Tue, 17 Apr 2018 16:00:50 -0400
Message-ID: <20180417200059.30154-1-willemdebruijn.kernel@gmail.com>
From: Willem de Bruijn <willemb@google.com>
Segmentation offload reduces cycles/byte for large packets by
amortizing the cost of protocol stack traversal.
This patchset implements GSO for UDP. A process can concatenate and
submit multiple datagrams to the same destination in one send call
by setting socket option SOL_UDP/UDP_SEGMENT with the segment size,
or passing an analogous cmsg at send time.
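
A minimal userspace sketch of the two interfaces described above, not
taken from the patches themselves. Assumptions: the fallback values
SOL_UDP 17 and UDP_SEGMENT 103, the int argument to setsockopt and the
16-bit cmsg payload follow the interface as later merged upstream and
may differ from this RFC. The helper names udp_gso_enable() and
udp_gso_prepare() are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/udp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Assumed values, used only if the libc headers predate the feature. */
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103
#endif

/*
 * Socket-wide variant: every later send on fd is segmented into
 * gso_size byte datagrams. Returns 0 on success, -1 with errno set
 * (for example on a kernel without the feature).
 */
int udp_gso_enable(int fd, int gso_size)
{
	return setsockopt(fd, SOL_UDP, UDP_SEGMENT,
			  &gso_size, sizeof(gso_size));
}

/*
 * Per-call variant: fill msg so that a following sendmsg() carries the
 * segment size in a SOL_UDP/UDP_SEGMENT control message. The caller
 * owns buf and control. Returns 0, or -1 if control is too small.
 */
int udp_gso_prepare(struct msghdr *msg, struct iovec *iov,
		    void *buf, size_t len,
		    void *control, size_t control_len,
		    uint16_t gso_size)
{
	struct cmsghdr *cm;

	if (control_len < CMSG_SPACE(sizeof(gso_size)))
		return -1;

	memset(msg, 0, sizeof(*msg));
	memset(control, 0, control_len);

	iov->iov_base = buf;
	iov->iov_len = len;
	msg->msg_iov = iov;
	msg->msg_iovlen = 1;
	msg->msg_control = control;
	msg->msg_controllen = CMSG_SPACE(sizeof(gso_size));

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_UDP;
	cm->cmsg_type = UDP_SEGMENT;
	cm->cmsg_len = CMSG_LEN(sizeof(gso_size));
	memcpy(CMSG_DATA(cm), &gso_size, sizeof(gso_size));
	return 0;
}
```

A caller would set msg_name to the destination (or use a connected
socket) and pass the prepared msghdr to sendmsg(). The control buffer
should be aligned for struct cmsghdr, for instance by placing it in a
union with one.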
The stack will send the entire large (up to network layer max size)
datagram through the protocol layer. At the GSO layer, it is broken
up into individual segments. All receive the same network layer header
and UDP src and dst port. All but the last segment have the same UDP
header, but the last may differ in length and checksum.
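
The split described above can be sketched as plain arithmetic. This is
an illustration only, not the kernel code. It assumes the segment size
counts UDP payload bytes (headers excluded); the struct and function
names are hypothetical.

```c
#include <stddef.h>

#define UDP_HDR_LEN 8	/* size of the UDP header in bytes */

struct gso_layout {
	size_t nsegs;		/* datagrams produced by segmentation */
	size_t last_payload;	/* payload bytes in the final datagram */
	size_t last_udp_len;	/* UDP length field of the final datagram */
};

/*
 * Split payload_len bytes into segments of gso_size payload bytes.
 * All segments but the last carry exactly gso_size bytes; the last
 * carries the remainder, which is why its UDP length (and therefore
 * its checksum) may differ from the others.
 */
struct gso_layout gso_split(size_t payload_len, size_t gso_size)
{
	struct gso_layout l = { 0, 0, 0 };

	if (payload_len == 0 || gso_size == 0)
		return l;

	l.nsegs = (payload_len + gso_size - 1) / gso_size;	/* round up */
	l.last_payload = payload_len - (l.nsegs - 1) * gso_size;
	l.last_udp_len = UDP_HDR_LEN + l.last_payload;
	return l;
}
```

For example, a 3000 byte send with a 1400 byte segment size yields
three datagrams carrying 1400, 1400 and 200 payload bytes.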

This initial patchset is RFC. A few open items:

* MSG_MORE

  The feature requires UDP checksum offload, as without it the
  checksum + copy operation at send() time is likely cheaper than
  checksumming each segment in the GSO layer.

  UDP checksum offload is disabled with MSG_MORE. As a result, GSO
  only works in the lockless fast path.

  The patchset can be simplified by explicitly excluding MSG_MORE.
  For one, patch 1 can be dropped by passing ipcm to udp_send_skb
  instead of inet_cork.

* MSG_ZEROCOPY

  UDP zerocopy has been sent for review before. Completion
  notification cost exceeds the savings from copy avoidance for
  datagrams of regular MSS (< 1500B).

  UDP GSO enables building larger packets, at which point
  zerocopy becomes effective. Results with the current benchmark
  are not as great as from GSO itself, though that may say more
  about the benchmark. Either way, I do not intend to submit
  this separate feature as part of a final UDP GSO patchset.

* GSO_BY_FRAGS

  An alternative implementation that would allow non-uniform
  segment length is to use GSO_BY_FRAGS like SCTP. This would
  likely require MSG_MORE to build the list using multiple
  send calls (or one sendmmsg). The two approaches are not
  mutually exclusive, so that could be a follow-up.

Initial results show a significant reduction in UDP cycles/byte.
See the main patch for more details and benchmark results.

    udp
      876 MB/s 14873 msg/s 624666 calls/s
        11,205,777,429 cycles

    udp gso
      2139 MB/s 36282 msg/s 36282 calls/s
        11,204,374,561 cycles

The patch set is broken down as follows:
- patch 1 is a prerequisite: code rearrangement, noop otherwise
- patch 2 is the core feature
- patches 3, 4 and 6 are refinements
- patch 5 adds the cmsg interface
- patch 7 adds udp zerocopy
- patches 8..11 are tests
This idea was presented previously at netconf 2017-2
http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
Known limitations:
- The feature requires pacing and possibly a lower threshold on
  segment size to limit the number of segments that may be passed
  to the NIC at once.
- Even when only accepting datagrams with CHECKSUM_PARTIAL, the
  segmentation layer must drop or fall back to software checksumming
  if the device cannot checksum the packet.
  This can happen if a device advertises checksum offload in
  general, but removes it for this skb in .ndo_features_check.
Willem de Bruijn (11):
udp: expose inet cork to udp
udp: add gso
udp: better wmem accounting on gso
udp: paged allocation with gso
udp: add gso segment cmsg
udp: add gso support to virtual devices
udp: zerocopy
selftests: udp gso
selftests: udp gso with connected sockets
selftests: udp gso with corking
selftests: udp gso benchmark
include/linux/netdev_features.h | 3 +
include/linux/skbuff.h | 10 +
include/linux/udp.h | 1 +
include/net/inet_sock.h | 1 +
include/net/ip.h | 3 +-
include/net/ipv6.h | 2 +
include/net/udp.h | 5 +
include/uapi/linux/udp.h | 1 +
net/core/skbuff.c | 14 +-
net/core/sock.c | 5 +-
net/ipv4/af_inet.c | 2 +-
net/ipv4/ip_output.c | 63 +-
net/ipv4/udp.c | 78 ++-
net/ipv4/udp_offload.c | 63 ++
net/ipv6/ip6_offload.c | 5 +-
net/ipv6/ip6_output.c | 66 +-
net/ipv6/udp.c | 29 +-
net/ipv6/udp_offload.c | 14 +
tools/testing/selftests/net/.gitignore | 3 +
tools/testing/selftests/net/Makefile | 3 +-
tools/testing/selftests/net/udpgso.c | 621 ++++++++++++++++++
tools/testing/selftests/net/udpgso.sh | 31 +
tools/testing/selftests/net/udpgso_bench.sh | 74 +++
tools/testing/selftests/net/udpgso_bench_rx.c | 265 ++++++++
tools/testing/selftests/net/udpgso_bench_tx.c | 379 +++++++++++
25 files changed, 1689 insertions(+), 52 deletions(-)
create mode 100644 tools/testing/selftests/net/udpgso.c
create mode 100755 tools/testing/selftests/net/udpgso.sh
create mode 100755 tools/testing/selftests/net/udpgso_bench.sh
create mode 100644 tools/testing/selftests/net/udpgso_bench_rx.c
create mode 100644 tools/testing/selftests/net/udpgso_bench_tx.c
--
2.17.0.484.g0c8726318c-goog