* [PATCH net-next v3 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS
From: David Howells @ 2023-06-02 15:07 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Here are patches to do the following:
(1) Block MSG_SENDPAGE_* flags from leaking into ->sendmsg() from
userspace, whilst allowing splice_to_socket() to pass them in.
(2) Allow MSG_SPLICE_PAGES to be passed into tls_*_sendmsg(). Until
support is added, it will be ignored and a splice-driven sendmsg()
will be treated like a normal sendmsg(). TCP, UDP, AF_UNIX and
Chelsio-TLS already handle the flag in net-next.
(3) Allow tls/sw to be given a zero-length send()/sendto()/sendmsg()
without MSG_MORE set to allow userspace to flush the pending record.
(4) Replace a chain of functions to splice-to-sendpage with a single
function to splice via sendmsg() with MSG_SPLICE_PAGES. This allows a
bunch of pages to be spliced from a pipe in a single call using a
bio_vec[] and pushes the main processing loop down into the bowels of
the protocol driver rather than repeatedly calling in with a page at a
time.
(5) Alter the behaviour of sendfile() and fix SPLICE_F_MORE/MSG_MORE
signalling[1] such that SPLICE_F_MORE is always signalled until we have
read sufficient data to finish the request. If we get a zero-length
read before we've managed to splice sufficient data, we now leave the
socket expecting more data and leave it to userspace to deal with it.
(6) Address the now failing TLS multi_chunk_sendfile kselftest by putting
in a zero-length send() to end the record.
(7) Make AF_TLS handle the MSG_SPLICE_PAGES internal sendmsg flag.
MSG_SPLICE_PAGES is an internal hint that tells the protocol that it
should splice the pages supplied if it can. The AF_TLS sendpage
implementations are then turned into wrappers around that.
(8) Provide some sample programs for driving AF_ALG (hash & encrypt), TCP,
TLS, UDP and AF_UNIX.
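Items (3), (5) and (6) together mean that a userspace sender may need to end the record itself when sendfile() runs out of input early. A minimal sketch of that pattern follows. The helper names flush_pending_record() and sendfile_and_flush() are hypothetical and are not taken from the patches or the samples; the only behaviour assumed from the series is that a zero-length send() without MSG_MORE flushes what is pending.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sendfile.h>

/* Hypothetical helper: a zero-length send() with no MSG_MORE, which the
 * series treats as "end the pending record". */
static ssize_t flush_pending_record(int sock)
{
	return send(sock, "", 0, 0);
}

/*
 * Hypothetical helper: push up to 'want' bytes from 'in' to 'sock' with
 * sendfile().  If the source runs dry before 'want' bytes were sent, the
 * socket may still be expecting more data, so flush explicitly.
 * Returns the number of bytes sent, or -1 on error.
 */
static ssize_t sendfile_and_flush(int sock, int in, size_t want)
{
	size_t done = 0;

	while (done < want) {
		ssize_t n = sendfile(sock, in, NULL, want - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)	/* zero-length read: input ended early */
			break;
		done += (size_t)n;
	}

	if (done < want && flush_pending_record(sock) < 0)
		return -1;
	return (ssize_t)done;
}
```

On a socket without TLS the extra zero-length send() is harmless, so the same helper can be used for plain TCP or AF_UNIX senders.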
Here are some simple timings, taking the best timing for each out of
several runs. In the following table, samples added in the last patch were
used for the first five columns and the tls kselftest for the last:
Patches unix-   tcp-send        tls-send        tls
        send                                    kselftest
                10G     lo      10G     lo
======= ======= ======= ======= ======= ======= =======
none    0.516   0.469   0.492   3.121   3.082   1.152
splice  0.470   0.452   0.471   3.074   3.041   0.294
all     0.469   0.440   0.475   3.077   3.041   0.345
The times are all in seconds. The "none" row is with none of the patches
applied; "splice" is up to the splice-to-sendpage replacement; and "all" is
with all the patches applied. The "10G" columns are for sending to a server
on a different box over 10G ethernet and the "lo" columns are for sending to
a server on the same box over the loopback device.
I think the apparent improvement is from cutting out a layer in the splice
stack and pushing more than one page in a single sendmsg. The improvement
in the tls selftest column is particularly marked.
The following sample and selftest commands were used:
unix-sink /tmp/sock & # server
unix-send -ds 256M /tmp/sock # client
tcp-sink & # server
tcp-send -ds 256M 127.0.0.1 # client - loopback
tcp-send -ds 256M 192.168.6.1 # client - 10G ethernet
tls-sink & # server
tls-send -ds 256M 127.0.0.1 # client - loopback
tls-send -ds 256M 192.168.6.1 # client - 10G ethernet
tls -r tls.12_aes_gcm.multi_chunk_sendfile
where 256M is a 256MiB file to be read in its entirety unless otherwise
specified, -d indicates O_DIRECT and -s asks for splice (if input is a
pipe) or sendfile (if input not a pipe) to be used.
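The -s rule above (splice if the input is a pipe, sendfile otherwise) can be sketched as below. This is only an illustration of the selection logic and is not the code from the samples; the helper name splice_or_sendfile() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sendfile.h>

/*
 * Hypothetical helper: move up to 'len' bytes from 'in' to 'out', using
 * splice() when the input is a pipe and sendfile() when it is not.
 * Returns the byte count from the underlying call, or -1 on error.
 */
static ssize_t splice_or_sendfile(int in, int out, size_t len)
{
	struct stat st;

	if (fstat(in, &st) < 0)
		return -1;

	if (S_ISFIFO(st.st_mode))
		return splice(in, NULL, out, NULL, len, 0);

	return sendfile(out, in, NULL, len);
}
```

A real sender would loop until the whole length has been transferred, since both calls may return short counts.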
I've pushed the patches here also:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-2-tls
David
Changes
=======
ver #3)
- Include the splice-to-socket rewrite patch.
- Fix SPLICE_F_MORE/MSG_MORE signalling.
- Allow AF_TLS to accept sendmsg() with MSG_SPLICE_PAGES before it is
handled.
- Allow a zero-length send() to a TLS socket to flush an outstanding
record.
- Address TLS kselftest failure.
ver #2)
- Dropped the slab data copying.
- "rls_" should be "tls_".
- Attempted to fix splice_direct_to_actor().
- Blocked MSG_SENDPAGE_* from being set by userspace.
Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/20230524153311.3625329-1-dhowells@redhat.com/ # v1
David Howells (11):
net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg
tls/sw: Use zero-length sendmsg() without MSG_MORE to flush
splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()
splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()
tls: Address behaviour change in multi_chunk_sendfile kselftest
tls/sw: Support MSG_SPLICE_PAGES
tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
tls/device: Support MSG_SPLICE_PAGES
tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES
net: Add samples for network I/O and splicing
fs/splice.c | 176 ++++++++++++++++++------
include/linux/fs.h | 2 -
include/linux/socket.h | 4 +-
include/linux/splice.h | 2 +
net/socket.c | 26 +---
net/tls/tls_device.c | 97 ++++++-------
net/tls/tls_sw.c | 217 +++++++++++-------------------
samples/Kconfig | 14 ++
samples/Makefile | 1 +
samples/net/Makefile | 13 ++
samples/net/alg-encrypt.c | 206 ++++++++++++++++++++++++++++
samples/net/alg-hash.c | 147 ++++++++++++++++++++
samples/net/splice-out.c | 147 ++++++++++++++++++++
samples/net/tcp-send.c | 177 ++++++++++++++++++++++++
samples/net/tcp-sink.c | 80 +++++++++++
samples/net/tls-send.c | 188 ++++++++++++++++++++++++++
samples/net/tls-sink.c | 104 ++++++++++++++
samples/net/udp-send.c | 156 +++++++++++++++++++++
samples/net/udp-sink.c | 84 ++++++++++++
samples/net/unix-send.c | 151 +++++++++++++++++++++
samples/net/unix-sink.c | 54 ++++++++
tools/testing/selftests/net/tls.c | 6 +-
22 files changed, 1792 insertions(+), 260 deletions(-)
create mode 100644 samples/net/Makefile
create mode 100644 samples/net/alg-encrypt.c
create mode 100644 samples/net/alg-hash.c
create mode 100644 samples/net/splice-out.c
create mode 100644 samples/net/tcp-send.c
create mode 100644 samples/net/tcp-sink.c
create mode 100644 samples/net/tls-send.c
create mode 100644 samples/net/tls-sink.c
create mode 100644 samples/net/udp-send.c
create mode 100644 samples/net/udp-sink.c
create mode 100644 samples/net/unix-send.c
create mode 100644 samples/net/unix-sink.c
* [PATCH net-next v3 01/11] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
From: David Howells @ 2023-06-02 15:07 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel

It is necessary to allow MSG_SENDPAGE_* to be passed into ->sendmsg() to
allow sendmsg(MSG_SPLICE_PAGES) to replace ->sendpage(). Unblocking them
in the network protocol, however, allows these flags to be passed in by
userspace too[1].

Fix this by marking MSG_SENDPAGE_NOPOLICY, MSG_SENDPAGE_NOTLAST and
MSG_SENDPAGE_DECRYPTED as internal flags, which causes sendmsg() to object
if they are passed to sendmsg() by userspace. Network protocol ->sendmsg()
implementations can then allow them through.

Note that it should be possible to remove MSG_SENDPAGE_NOTLAST once
sendpage is removed as a whole slew of pages will be passed in in one go by
splice through sendmsg, with MSG_MORE being set if it has more data waiting
in the pipe.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20230526181338.03a99016@kernel.org/ [1]
---
 include/linux/socket.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index bd1cc3238851..3fd3436bc09f 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -339,7 +339,9 @@ struct ucred {
 #endif
 
 /* Flags to be cleared on entry by sendmsg and sendmmsg syscalls */
-#define MSG_INTERNAL_SENDMSG_FLAGS (MSG_SPLICE_PAGES)
+#define MSG_INTERNAL_SENDMSG_FLAGS \
+	(MSG_SPLICE_PAGES | MSG_SENDPAGE_NOPOLICY | MSG_SENDPAGE_NOTLAST | \
+	 MSG_SENDPAGE_DECRYPTED)
 
 /* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
 #define SOL_IP		0
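To illustrate what extending the internal-flags mask achieves, here is a small userspace model. It follows the header comment in the diff ("cleared on entry") and simply masks the internal bits out of whatever userspace supplied. The DEMO_* names and their numeric values are assumptions made for the sketch, and sanitize_user_flags() is a hypothetical stand-in for the syscall entry path, not a kernel function.

```c
#include <assert.h>

/* Illustrative values only; the real definitions live in
 * include/linux/socket.h and should be taken from there. */
#define DEMO_MSG_MORE			0x8000u
#define DEMO_MSG_SENDPAGE_NOPOLICY	0x10000u
#define DEMO_MSG_SENDPAGE_NOTLAST	0x20000u
#define DEMO_MSG_SENDPAGE_DECRYPTED	0x100000u
#define DEMO_MSG_SPLICE_PAGES		0x8000000u

/* Model of the extended mask from the patch. */
#define DEMO_INTERNAL_SENDMSG_FLAGS \
	(DEMO_MSG_SPLICE_PAGES | DEMO_MSG_SENDPAGE_NOPOLICY | \
	 DEMO_MSG_SENDPAGE_NOTLAST | DEMO_MSG_SENDPAGE_DECRYPTED)

/* Hypothetical stand-in for the syscall entry: strip internal flags so
 * that only in-kernel callers can hand them to ->sendmsg(). */
static unsigned int sanitize_user_flags(unsigned int flags)
{
	return flags & ~DEMO_INTERNAL_SENDMSG_FLAGS;
}
```

With this in place, a protocol's ->sendmsg() can accept the internal flags knowing that they can only have come from a kernel caller such as the splice path.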
* [PATCH net-next v3 02/11] tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg
From: David Howells @ 2023-06-02 15:07 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel

Allow MSG_SPLICE_PAGES to be specified to sendmsg() but treat it as normal
sendmsg for now. This means the data will just be copied until
MSG_SPLICE_PAGES is handled.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
 net/tls/tls_device.c | 3 ++-
 net/tls/tls_sw.c     | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a959572a816f..9ef766e41c7a 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -447,7 +447,8 @@ static int tls_push_data(struct sock *sk,
 	long timeo;
 
 	if (flags &
-	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST))
+	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST |
+	      MSG_SPLICE_PAGES))
 		return -EOPNOTSUPP;
 
 	if (unlikely(sk->sk_err))
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 6e6a7c37d685..cac1adc968e8 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -953,7 +953,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	int pending;
 
 	if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
-			       MSG_CMSG_COMPAT))
+			       MSG_CMSG_COMPAT | MSG_SPLICE_PAGES))
 		return -EOPNOTSUPP;
 
 	ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
* [PATCH net-next v3 03/11] tls/sw: Use zero-length sendmsg() without MSG_MORE to flush
From: David Howells @ 2023-06-02 15:07 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel

Allow userspace to end a TLS record without supplying any data by calling
send()/sendto()/sendmsg() with no data and no MSG_MORE flag. This can be
used to flush a previous send/splice that had MSG_MORE or SPLICE_F_MORE set
or a sendfile() that was incomplete.

Without this, a zero-length send to tls-sw is just ignored. I think
tls-device will do the right thing without modification.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
 net/tls/tls_sw.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index cac1adc968e8..6aa6d17888f5 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -945,7 +945,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	struct tls_rec *rec;
 	int required_size;
 	int num_async = 0;
-	bool full_record;
+	bool full_record = false;
 	int record_room;
 	int num_zc = 0;
 	int orig_size;
@@ -971,6 +971,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 		}
 	}
 
+	if (!msg_data_left(msg) && eor)
+		goto just_flush;
+
 	while (msg_data_left(msg)) {
 		if (sk->sk_err) {
 			ret = -sk->sk_err;
@@ -1082,6 +1085,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 		 */
 		tls_ctx->pending_open_record_frags = true;
 		copied += try_to_copy;
+just_flush:
 		if (full_record || eor) {
 			ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
 						  record_type, &copied,
* Re: [PATCH net-next v3 03/11] tls/sw: Use zero-length sendmsg() without MSG_MORE to flush
From: Simon Horman @ 2023-06-02 18:27 UTC
To: David Howells
Cc: netdev, Linus Torvalds, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, Dan Carpenter

+ dan Carpenter

On Fri, Jun 02, 2023 at 04:07:44PM +0100, David Howells wrote:
> Allow userspace to end a TLS record without supplying any data by calling
> send()/sendto()/sendmsg() with no data and no MSG_MORE flag. This can be
> used to flush a previous send/splice that had MSG_MORE or SPLICE_F_MORE set
> or a sendfile() that was incomplete.
>
> Without this, a zero-length send to tls-sw is just ignored. I think
> tls-device will do the right thing without modification.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Chuck Lever <chuck.lever@oracle.com>
> cc: Boris Pismenny <borisp@nvidia.com>
> cc: John Fastabend <john.fastabend@gmail.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Eric Dumazet <edumazet@google.com>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: netdev@vger.kernel.org
> ---
>  net/tls/tls_sw.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index cac1adc968e8..6aa6d17888f5 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -945,7 +945,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>  	struct tls_rec *rec;
>  	int required_size;
>  	int num_async = 0;
> -	bool full_record;
> +	bool full_record = false;
>  	int record_room;
>  	int num_zc = 0;
>  	int orig_size;
> @@ -971,6 +971,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>  		}
>  	}
>
> +	if (!msg_data_left(msg) && eor)
> +		goto just_flush;
> +

Hi David,

the flow of this function is not entirely simple, so it is not easy for me
to manually verify this. But in combination gcc-12 -Wmaybe-uninitialized
and Smatch report that the following may be used uninitialised as a result
of this change:

 * msg_pl
 * orig_size
 * msg_en
 * required_size
 * try_to_copy

>  	while (msg_data_left(msg)) {
>  		if (sk->sk_err) {
>  			ret = -sk->sk_err;
> @@ -1082,6 +1085,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>  		 */
>  		tls_ctx->pending_open_record_frags = true;
>  		copied += try_to_copy;
> +just_flush:
>  		if (full_record || eor) {
>  			ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
>  						  record_type, &copied,
>
>
* Re: [PATCH net-next v3 03/11] tls/sw: Use zero-length sendmsg() without MSG_MORE to flush
From: Dan Carpenter @ 2023-06-02 19:00 UTC
To: Simon Horman
Cc: David Howells, netdev, Linus Torvalds, Chuck Lever, Boris Pismenny,
John Fastabend, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel

On Fri, Jun 02, 2023 at 08:27:56PM +0200, Simon Horman wrote:
> + dan Carpenter
>
> On Fri, Jun 02, 2023 at 04:07:44PM +0100, David Howells wrote:
> > Allow userspace to end a TLS record without supplying any data by calling
> > send()/sendto()/sendmsg() with no data and no MSG_MORE flag. This can be
> > used to flush a previous send/splice that had MSG_MORE or SPLICE_F_MORE set
> > or a sendfile() that was incomplete.
> >
> > Without this, a zero-length send to tls-sw is just ignored. I think
> > tls-device will do the right thing without modification.
> >
> > Signed-off-by: David Howells <dhowells@redhat.com>
> > cc: Chuck Lever <chuck.lever@oracle.com>
> > cc: Boris Pismenny <borisp@nvidia.com>
> > cc: John Fastabend <john.fastabend@gmail.com>
> > cc: Jakub Kicinski <kuba@kernel.org>
> > cc: Eric Dumazet <edumazet@google.com>
> > cc: "David S. Miller" <davem@davemloft.net>
> > cc: Paolo Abeni <pabeni@redhat.com>
> > cc: Jens Axboe <axboe@kernel.dk>
> > cc: Matthew Wilcox <willy@infradead.org>
> > cc: netdev@vger.kernel.org
> > ---
> >  net/tls/tls_sw.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > index cac1adc968e8..6aa6d17888f5 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -945,7 +945,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> >  	struct tls_rec *rec;
> >  	int required_size;
> >  	int num_async = 0;
> > -	bool full_record;
> > +	bool full_record = false;
> >  	int record_room;
> >  	int num_zc = 0;
> >  	int orig_size;
> > @@ -971,6 +971,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> >  		}
> >  	}
> >
> > +	if (!msg_data_left(msg) && eor)
> > +		goto just_flush;
> > +
>
> Hi David,
>
> the flow of this function is not entirely simple, so it is not easy for me
> to manually verify this. But in combination gcc-12 -Wmaybe-uninitialized
> and Smatch report that the following may be used uninitialised as a result
> of this change:
>
>  * msg_pl

This warning seems correct to me.

>  * orig_size

This warning assumes we hit the first warning and then hit the goto
wait_for_memory;

>  * msg_en

I don't get this warning on my system but it's the same thing. Hit the
first warning then the goto wait_for_memory.

>  * required_size

Same.

>  * try_to_copy

I don't really understand this warning and I can't reproduce it.
Strange.

regards,
dan carpenter
* Re: [PATCH net-next v3 03/11] tls/sw: Use zero-length sendmsg() without MSG_MORE to flush
From: Simon Horman @ 2023-06-03 14:51 UTC
To: Dan Carpenter
Cc: David Howells, netdev, Linus Torvalds, Chuck Lever, Boris Pismenny,
John Fastabend, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel

On Fri, Jun 02, 2023 at 10:00:45PM +0300, Dan Carpenter wrote:
> On Fri, Jun 02, 2023 at 08:27:56PM +0200, Simon Horman wrote:
> > + dan Carpenter
> >
> > On Fri, Jun 02, 2023 at 04:07:44PM +0100, David Howells wrote:
> > > Allow userspace to end a TLS record without supplying any data by calling
> > > send()/sendto()/sendmsg() with no data and no MSG_MORE flag. This can be
> > > used to flush a previous send/splice that had MSG_MORE or SPLICE_F_MORE set
> > > or a sendfile() that was incomplete.
> > >
> > > Without this, a zero-length send to tls-sw is just ignored. I think
> > > tls-device will do the right thing without modification.
> > >
> > > Signed-off-by: David Howells <dhowells@redhat.com>
> > > cc: Chuck Lever <chuck.lever@oracle.com>
> > > cc: Boris Pismenny <borisp@nvidia.com>
> > > cc: John Fastabend <john.fastabend@gmail.com>
> > > cc: Jakub Kicinski <kuba@kernel.org>
> > > cc: Eric Dumazet <edumazet@google.com>
> > > cc: "David S. Miller" <davem@davemloft.net>
> > > cc: Paolo Abeni <pabeni@redhat.com>
> > > cc: Jens Axboe <axboe@kernel.dk>
> > > cc: Matthew Wilcox <willy@infradead.org>
> > > cc: netdev@vger.kernel.org
> > > ---
> > >  net/tls/tls_sw.c | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > > index cac1adc968e8..6aa6d17888f5 100644
> > > --- a/net/tls/tls_sw.c
> > > +++ b/net/tls/tls_sw.c
> > > @@ -945,7 +945,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > >  	struct tls_rec *rec;
> > >  	int required_size;
> > >  	int num_async = 0;
> > > -	bool full_record;
> > > +	bool full_record = false;
> > >  	int record_room;
> > >  	int num_zc = 0;
> > >  	int orig_size;
> > > @@ -971,6 +971,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > >  		}
> > >  	}
> > >
> > > +	if (!msg_data_left(msg) && eor)
> > > +		goto just_flush;
> > > +
> >
> > Hi David,
> >
> > the flow of this function is not entirely simple, so it is not easy for me
> > to manually verify this. But in combination gcc-12 -Wmaybe-uninitialized
> > and Smatch report that the following may be used uninitialised as a result
> > of this change:
> >
> >  * msg_pl
>
> This warning seems correct to me.
>
> >  * orig_size
>
> This warning assumes we hit the first warning and then hit the goto
> wait_for_memory;
>
> >  * msg_en
>
> I don't get this warning on my system but it's the same thing. Hit the
> first warning then the goto wait_for_memory.
>
> >  * required_size
>
> Same.
>
> >  * try_to_copy
>
> I don't really understand this warning and I can't reproduce it.
> Strange.

Thanks Dan.

Of the above I think only the last one was flagged by GCC but not Smatch.
I can try investigating further if it is useful.
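For readers following the msg_pl discussion, the hazard is that a forward goto into the flush code skips the assignments made on the normal path. The model below is a deliberately tiny illustration of that control flow and of one possible guard (initialise the pointer and set it up again at the label). It is not the tls_sw_sendmsg() code and not the fix adopted for the series; struct record and send_model() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the open TLS record. */
struct record {
	size_t queued;
};

/*
 * Model of the flow under review: a zero-length call with 'eor' set
 * jumps straight to the flush code.  Anything the normal path assigns
 * must therefore be valid before the jump or be set up at the label.
 * Returns 1 if the record was flushed, 0 otherwise.
 */
static int send_model(struct record *open_rec, size_t data_left, int eor)
{
	struct record *msg_pl = NULL;	/* initialised so the jump is safe */
	int flushed = 0;

	if (!data_left && eor)
		goto just_flush;

	msg_pl = open_rec;		/* normal path assigns before use */
	msg_pl->queued += data_left;

just_flush:
	if (eor) {
		if (!msg_pl)		/* reached via the early jump */
			msg_pl = open_rec;
		msg_pl->queued = 0;
		flushed = 1;
	}
	return flushed;
}
```

Without the NULL initialisation and the check at the label, the early-jump path would read msg_pl before anything had assigned it, which is the situation the compiler and Smatch warnings describe.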
* [PATCH net-next v3 04/11] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage() 2023-06-02 15:07 [PATCH net-next v3 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells ` (2 preceding siblings ...) 2023-06-02 15:07 ` [PATCH net-next v3 03/11] tls/sw: Use zero-length sendmsg() without MSG_MORE to flush David Howells @ 2023-06-02 15:07 ` David Howells 2023-06-02 15:07 ` [PATCH net-next v3 05/11] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor() David Howells ` (6 subsequent siblings) 10 siblings, 0 replies; 17+ messages in thread From: David Howells @ 2023-06-02 15:07 UTC (permalink / raw) To: netdev, Linus Torvalds Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend, Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel Replace generic_splice_sendpage() + splice_from_pipe + pipe_to_sendpage() with a net-specific handler, splice_to_socket(), that calls sendmsg() with MSG_SPLICE_PAGES set instead of calling ->sendpage(). MSG_MORE is used to indicate if the sendmsg() is expected to be followed with more data. This allows multiple pipe-buffer pages to be passed in a single call in a BVEC iterator, allowing the processing to be pushed down to a loop in the protocol driver. This helps pave the way for passing multipage folios down too. Protocols that haven't been converted to handle MSG_SPLICE_PAGES yet should just ignore it and do a normal sendmsg() for now - although that may be a bit slower as it may copy everything. Signed-off-by: David Howells <dhowells@redhat.com> cc: "David S. 
Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: netdev@vger.kernel.org --- fs/splice.c | 158 +++++++++++++++++++++++++++++++++-------- include/linux/fs.h | 2 - include/linux/splice.h | 2 + net/socket.c | 26 +------ 4 files changed, 131 insertions(+), 57 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index 3e06611d19ae..9b1d43c0c562 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -33,6 +33,7 @@ #include <linux/fsnotify.h> #include <linux/security.h> #include <linux/gfp.h> +#include <linux/net.h> #include <linux/socket.h> #include <linux/sched/signal.h> @@ -448,30 +449,6 @@ const struct pipe_buf_operations nosteal_pipe_buf_ops = { }; EXPORT_SYMBOL(nosteal_pipe_buf_ops); -/* - * Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos' - * using sendpage(). Return the number of bytes sent. - */ -static int pipe_to_sendpage(struct pipe_inode_info *pipe, - struct pipe_buffer *buf, struct splice_desc *sd) -{ - struct file *file = sd->u.file; - loff_t pos = sd->pos; - int more; - - if (!likely(file->f_op->sendpage)) - return -EINVAL; - - more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; - - if (sd->len < sd->total_len && - pipe_occupancy(pipe->head, pipe->tail) > 1) - more |= MSG_SENDPAGE_NOTLAST; - - return file->f_op->sendpage(file, buf->page, buf->offset, - sd->len, &pos, more); -} - static void wakeup_pipe_writers(struct pipe_inode_info *pipe) { smp_mb(); @@ -652,7 +629,7 @@ static void splice_from_pipe_end(struct pipe_inode_info *pipe, struct splice_des * Description: * This function does little more than loop over the pipe and call * @actor to do the actual moving of a single struct pipe_buffer to - * the desired destination. See pipe_to_file, pipe_to_sendpage, or + * the desired destination. See pipe_to_file, pipe_to_sendmsg, or * pipe_to_user. 
* */ @@ -833,8 +810,9 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, EXPORT_SYMBOL(iter_file_splice_write); +#ifdef CONFIG_NET /** - * generic_splice_sendpage - splice data from a pipe to a socket + * splice_to_socket - splice data from a pipe to a socket * @pipe: pipe to splice from * @out: socket to write to * @ppos: position in @out @@ -846,13 +824,131 @@ EXPORT_SYMBOL(iter_file_splice_write); * is involved. * */ -ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, struct file *out, - loff_t *ppos, size_t len, unsigned int flags) +ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags) { - return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_sendpage); -} + struct socket *sock = sock_from_file(out); + struct bio_vec bvec[16]; + struct msghdr msg = {}; + ssize_t ret; + size_t spliced = 0; + bool need_wakeup = false; + + pipe_lock(pipe); + + while (len > 0) { + unsigned int head, tail, mask, bc = 0; + size_t remain = len; + + /* + * Check for signal early to make process killable when there + * are always buffers available + */ + ret = -ERESTARTSYS; + if (signal_pending(current)) + break; + + while (pipe_empty(pipe->head, pipe->tail)) { + ret = 0; + if (!pipe->writers) + goto out; + + if (spliced) + goto out; + + ret = -EAGAIN; + if (flags & SPLICE_F_NONBLOCK) + goto out; + + ret = -ERESTARTSYS; + if (signal_pending(current)) + goto out; + + if (need_wakeup) { + wakeup_pipe_writers(pipe); + need_wakeup = false; + } + + pipe_wait_readable(pipe); + } + + head = pipe->head; + tail = pipe->tail; + mask = pipe->ring_size - 1; + + while (!pipe_empty(head, tail)) { + struct pipe_buffer *buf = &pipe->bufs[tail & mask]; + size_t seg; -EXPORT_SYMBOL(generic_splice_sendpage); + if (!buf->len) { + tail++; + continue; + } + + seg = min_t(size_t, remain, buf->len); + seg = min_t(size_t, seg, PAGE_SIZE); + + ret = pipe_buf_confirm(pipe, buf); + if (unlikely(ret)) { + if 
(ret == -ENODATA) + ret = 0; + break; + } + + bvec_set_page(&bvec[bc++], buf->page, seg, buf->offset); + remain -= seg; + if (seg >= buf->len) + tail++; + if (bc >= ARRAY_SIZE(bvec)) + break; + } + + if (!bc) + break; + + msg.msg_flags = MSG_SPLICE_PAGES; + if (flags & SPLICE_F_MORE) + msg.msg_flags |= MSG_MORE; + if (remain && pipe_occupancy(pipe->head, tail) > 0) + msg.msg_flags |= MSG_MORE; + + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvec, bc, + len - remain); + ret = sock_sendmsg(sock, &msg); + if (ret <= 0) + break; + + spliced += ret; + len -= ret; + tail = pipe->tail; + while (ret > 0) { + struct pipe_buffer *buf = &pipe->bufs[tail & mask]; + size_t seg = min_t(size_t, ret, buf->len); + + buf->offset += seg; + buf->len -= seg; + ret -= seg; + + if (!buf->len) { + pipe_buf_release(pipe, buf); + tail++; + } + } + + if (tail != pipe->tail) { + pipe->tail = tail; + if (pipe->files) + need_wakeup = true; + } + } + +out: + pipe_unlock(pipe); + if (need_wakeup) + wakeup_pipe_writers(pipe); + return spliced ?: ret; +} +#endif static int warn_unsupported(struct file *file, const char *op) { diff --git a/include/linux/fs.h b/include/linux/fs.h index 21a981680856..f8254c3acf83 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2759,8 +2759,6 @@ extern ssize_t generic_file_splice_read(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); extern ssize_t iter_file_splice_write(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int); -extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, - struct file *out, loff_t *, size_t len, unsigned int flags); extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out, loff_t *opos, size_t len, unsigned int flags); diff --git a/include/linux/splice.h b/include/linux/splice.h index a55179fd60fc..991ae318b6eb 100644 --- a/include/linux/splice.h +++ b/include/linux/splice.h @@ -84,6 +84,8 @@ extern long do_splice(struct file *in, loff_t *off_in, extern 
long do_tee(struct file *in, struct file *out, size_t len, unsigned int flags); +extern ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags); /* * for dynamic pipe sizing diff --git a/net/socket.c b/net/socket.c index 3df96e9ba4e2..c4d9104418c8 100644 --- a/net/socket.c +++ b/net/socket.c @@ -57,6 +57,7 @@ #include <linux/mm.h> #include <linux/socket.h> #include <linux/file.h> +#include <linux/splice.h> #include <linux/net.h> #include <linux/interrupt.h> #include <linux/thread_info.h> @@ -126,8 +127,6 @@ static long compat_sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg); #endif static int sock_fasync(int fd, struct file *filp, int on); -static ssize_t sock_sendpage(struct file *file, struct page *page, - int offset, size_t size, loff_t *ppos, int more); static ssize_t sock_splice_read(struct file *file, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags); @@ -162,8 +161,7 @@ static const struct file_operations socket_file_ops = { .mmap = sock_mmap, .release = sock_close, .fasync = sock_fasync, - .sendpage = sock_sendpage, - .splice_write = generic_splice_sendpage, + .splice_write = splice_to_socket, .splice_read = sock_splice_read, .show_fdinfo = sock_show_fdinfo, }; @@ -1066,26 +1064,6 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg, } EXPORT_SYMBOL(kernel_recvmsg); -static ssize_t sock_sendpage(struct file *file, struct page *page, - int offset, size_t size, loff_t *ppos, int more) -{ - struct socket *sock; - int flags; - int ret; - - sock = file->private_data; - - flags = (file->f_flags & O_NONBLOCK) ? 
MSG_DONTWAIT : 0; - /* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */ - flags |= more; - - ret = kernel_sendpage(sock, page, offset, size, flags); - - if (trace_sock_send_length_enabled()) - call_trace_sock_send_length(sock->sk, ret, 0); - return ret; -} - static ssize_t sock_splice_read(struct file *file, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) ^ permalink raw reply related [flat|nested] 17+ messages in thread
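For readers following the 04/11 diff above: the choice of message flags made before each sock_sendmsg() call in splice_to_socket() can be modelled in userspace roughly as below. This is only a sketch. The MODEL_* constants and model_batch_flags() are made-up names with placeholder bit values, not the kernel's definitions, and nothing here touches a real pipe or socket.

```c
#include <stddef.h>

/* Placeholder bit values for this model only; not the kernel's. */
enum {
	MODEL_SPLICE_F_MORE    = 1u << 0,
	MODEL_MSG_MORE         = 1u << 1,
	MODEL_MSG_SPLICE_PAGES = 1u << 2,
};

/*
 * Model of the flag selection done for each batch of pipe buffers:
 *   splice_flags - flags the caller passed to splice()
 *   remain       - bytes of the request not covered by this batch
 *   occupancy    - pipe buffers still queued beyond this batch
 */
static unsigned int model_batch_flags(unsigned int splice_flags,
				      size_t remain, unsigned int occupancy)
{
	unsigned int msg_flags = MODEL_MSG_SPLICE_PAGES;

	/* The caller asked for MORE, so pass it through. */
	if (splice_flags & MODEL_SPLICE_F_MORE)
		msg_flags |= MODEL_MSG_MORE;

	/* More of the request is outstanding and the pipe has data for it. */
	if (remain && occupancy > 0)
		msg_flags |= MODEL_MSG_MORE;

	return msg_flags;
}
```

In this model a batch that completes the request, or one after which the pipe is empty, carries MORE only if the caller set it.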
* [PATCH net-next v3 05/11] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()
2023-06-02 15:07 [PATCH net-next v3 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (3 preceding siblings ...)
2023-06-02 15:07 ` [PATCH net-next v3 04/11] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage() David Howells
@ 2023-06-02 15:07 ` David Howells
2023-06-02 16:36 ` Linus Torvalds
2023-06-02 15:07 ` [PATCH net-next v3 06/11] tls: Address behaviour change in multi_chunk_sendfile kselftest David Howells
` (5 subsequent siblings)
10 siblings, 1 reply; 17+ messages in thread
From: David Howells @ 2023-06-02 15:07 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, Christoph Hellwig, Al Viro, Jan Kara,
Jeff Layton, David Hildenbrand, Christian Brauner, linux-fsdevel,
linux-block

splice_direct_to_actor() doesn't manage SPLICE_F_MORE correctly[1] - and,
as a result, it incorrectly signals/fails to signal MSG_MORE when splicing
to a socket. The problem I'm seeing happens when a short splice occurs
because we got a short read due to hitting the EOF on a file: as the length
read (read_len) is less than the remaining size to be spliced (len),
SPLICE_F_MORE (and thus MSG_MORE) is set.

The issue is that, for the moment, we have no way to know *why* the short
read occurred and so can't make a good decision on whether we *should* keep
MSG_MORE set.

Further, the argument can be made that it should be left to userspace to
decide how to handle it - userspace could perform some sort of cancellation
for example.

MSG_SENDPAGE_NOTLAST was added to work around this, but that is also set
incorrectly under some circumstances - for example if a short read fills a
single pipe_buffer, but the next read would return more (seqfile can do
this).

This was observed with the multi_chunk_sendfile tests in the tls kselftest
program. Some of those tests would hang and time out when the last chunk
of file was less than the sendfile request size:

	build/kselftest/net/tls -r tls.12_aes_gcm.multi_chunk_sendfile

This has been observed before[2] and worked around in AF_TLS[3].

Fix this by making splice_direct_to_actor() always signal SPLICE_F_MORE if
we haven't yet hit the requested operation size. SPLICE_F_MORE remains
signalled if the user passed it in to splice() but otherwise gets cleared
when we've read sufficient data to fulfill the request.

The cleanup of a short splice to userspace is left to userspace.

[!] Note that this changes user-visible behaviour. It will cause the
multi_chunk_sendfile tests in the TLS kselftest to fail. This failure in
the testsuite will be addressed in a subsequent patch by making userspace
do a zero-length send().

It appears that SPLICE_F_MORE is only used by splice-to-socket.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: David Hildenbrand <david@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/ [2]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a [3]
---
 fs/splice.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 9b1d43c0c562..c71bd8e03469 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1052,13 +1052,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 	 */
 	bytes = 0;
 	len = sd->total_len;
+
+	/* Don't block on output, we have to drain the direct pipe. */
 	flags = sd->flags;
+	sd->flags &= ~SPLICE_F_NONBLOCK;

 	/*
-	 * Don't block on output, we have to drain the direct pipe.
+	 * We signal MORE until we've read sufficient data to fulfill the
+	 * request and we keep signalling it if the caller set it.
 	 */
-	sd->flags &= ~SPLICE_F_NONBLOCK;
 	more = sd->flags & SPLICE_F_MORE;
+	sd->flags |= SPLICE_F_MORE;

 	WARN_ON_ONCE(!pipe_empty(pipe->head, pipe->tail));

@@ -1074,14 +1078,12 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 		sd->total_len = read_len;

 		/*
-		 * If more data is pending, set SPLICE_F_MORE
-		 * If this is the last data and SPLICE_F_MORE was not set
-		 * initially, clears it.
+		 * If we now have sufficient data to fulfill the request then
+		 * we clear SPLICE_F_MORE if it was not set initially.
 		 */
-		if (read_len < len)
-			sd->flags |= SPLICE_F_MORE;
-		else if (!more)
+		if (read_len >= len && !more)
 			sd->flags &= ~SPLICE_F_MORE;
+
 		/*
 		 * NOTE: nonblocking mode only applies to the input. We
 		 * must not do the output in nonblocking mode as then we

^ permalink raw reply related [flat|nested] 17+ messages in thread
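The two versions of the SPLICE_F_MORE rule in the 05/11 diff can be compared with a small userspace model. This is a sketch only. MODEL_F_MORE, model_more_old() and model_more_new() are hypothetical names, the bit value is a placeholder, and the model covers only the flag arithmetic for a single read, not the surrounding loop.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_F_MORE 0x1u	/* placeholder bit, not the real SPLICE_F_MORE */

/*
 * Rule removed by the patch: set MORE on a short read, otherwise clear it
 * unless the caller asked for it.
 */
static unsigned int model_more_old(unsigned int flags, bool caller_more,
				   size_t read_len, size_t len)
{
	if (read_len < len)
		flags |= MODEL_F_MORE;
	else if (!caller_more)
		flags &= ~MODEL_F_MORE;
	return flags;
}

/*
 * Rule added by the patch: MORE is set up front and cleared only once
 * enough data has been read, and then only if the caller did not set it.
 */
static unsigned int model_more_new(unsigned int flags, bool caller_more,
				   size_t read_len, size_t len)
{
	flags |= MODEL_F_MORE;	/* the patch does this once, before the loop */
	if (read_len >= len && !caller_more)
		flags &= ~MODEL_F_MORE;
	return flags;
}
```

For a short read such as model_more_new(0, false, 10, 100) the MORE bit stays set. It is cleared only when read_len reaches len and the caller did not pass MORE.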
* Re: [PATCH net-next v3 05/11] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()
2023-06-02 15:07 ` [PATCH net-next v3 05/11] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor() David Howells
@ 2023-06-02 16:36 ` Linus Torvalds
0 siblings, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2023-06-02 16:36 UTC (permalink / raw)
To: David Howells
Cc: netdev, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, Christoph Hellwig, Al Viro, Jan Kara,
Jeff Layton, David Hildenbrand, Christian Brauner, linux-fsdevel,
linux-block

On Fri, Jun 2, 2023 at 11:08 AM David Howells <dhowells@redhat.com> wrote:
>
> Fix this by making splice_direct_to_actor() always signal SPLICE_F_MORE if
> we haven't yet hit the requested operation size.

Well, I certainly like this patch better than the previous versions,
just because it doesn't add random fd-specific code.

That said, I think it might be worth really documenting the behavior,
particularly for files where the kernel *could* know "the file is at
EOF, no more data".

I hope that if user space wants to splice() a file to a socket, said
user space would have done an 'fstat()' and actually pass in the file
size as the length to splice(). Because if they do, I think this
simplified patch does the right thing automatically.

But if user space instead passes in a "maximally big len", and just
depends on the kernel then doing the

	ret = do_splice_to(in, &pos, pipe, len, flags);
	if (unlikely(ret <= 0))
		goto out_release;

to stop splicing at EOF, then the last splice_write() will have had
SPLICE_F_MORE set, even though no more data is coming from the file,
of course.

And I think that's fine. But wasn't that effectively what the old code
was already doing because 'read_len' was smaller than 'len'? I thought
that was what you wanted to fix?

IOW, I thought you wanted to clear SPLICE_F_MORE when we hit EOF. This
still doesn't do that.

So now I'm confused about what your "fix" is. Your patch doesn't
actually seem to change existing behavior in splice_direct_to_actor().

I was expecting you to actually pass the 'sd' down to do_splice_to()
and then to ->splice_read(), so that the splice_read() function could
say "I have no more", and clear it.

But you didn't do that.

Am I misreading something, or did I miss another patch?

             Linus

^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH net-next v3 06/11] tls: Address behaviour change in multi_chunk_sendfile kselftest
2023-06-02 15:07 [PATCH net-next v3 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (4 preceding siblings ...)
2023-06-02 15:07 ` [PATCH net-next v3 05/11] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor() David Howells
@ 2023-06-02 15:07 ` David Howells
2023-06-02 15:07 ` [PATCH net-next v3 07/11] tls/sw: Support MSG_SPLICE_PAGES David Howells
` (4 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: David Howells @ 2023-06-02 15:07 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel

The multi_chunk_sendfile tests in the TLS kselftest now fail because the
behaviour of sendfile()[*] changed when SPLICE_F_MORE signalling was fixed.
Now MSG_MORE is signalled to the socket until we have read sufficient data
to fulfill the request - which means if we get a short read, MSG_MORE isn't
seen to be dropped and the TLS record remains pending.

[*] This will also affect splice() if SPLICE_F_MORE isn't included in the
flags.

Fix the TLS multi_chunk_sendfile kselftest to attempt to flush the
outstanding TLS record if we get a short sendfile() by doing a zero-length
send() with MSG_MORE unset.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
 tools/testing/selftests/net/tls.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index e699548d4247..8f4bed8aacc0 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -377,7 +377,7 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
 	char buf[TLS_PAYLOAD_MAX_LEN];
 	uint16_t test_payload_size;
 	int size = 0;
-	int ret;
+	int ret = 0;
 	char filename[] = "/tmp/mytemp.XXXXXX";
 	int fd = mkstemp(filename);
 	off_t offset = 0;
@@ -398,6 +398,10 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
 		size -= ret;
 	}

+	/* Flush the TLS record on a short read. */
+	if (ret < chunk_size)
+		EXPECT_EQ(send(self->fd, "", 0, 0), 0);
+
 	EXPECT_EQ(recv(self->cfd, buf, test_payload_size, MSG_WAITALL),
 		  test_payload_size);

^ permalink raw reply related [flat|nested] 17+ messages in thread
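Outside the kselftest harness, the flush that 06/11 adds could be wrapped roughly as follows. This is a sketch under stated assumptions: sendfile_chunk_flush() is a hypothetical helper name, and the zero-length send() only has the record-flushing effect described in the series on a socket with kTLS set up. On other stream sockets it is expected to be a harmless no-op.

```c
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: sendfile() one chunk and, if it came up short,
 * issue a zero-length send() without MSG_MORE so that a pending TLS
 * record is pushed out, as the selftest change does.
 * Returns what sendfile() returned, or -1 if the flush itself failed.
 */
static ssize_t sendfile_chunk_flush(int sock, int fd, off_t *off, size_t chunk)
{
	ssize_t n = sendfile(sock, fd, off, chunk);

	/* Short transfer: the socket may still be expecting more data. */
	if (n >= 0 && (size_t)n < chunk) {
		if (send(sock, "", 0, 0) < 0)
			return -1;
	}
	return n;
}
```

A caller would invoke this once per chunk. Only a short final chunk triggers the extra send().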
* [PATCH net-next v3 07/11] tls/sw: Support MSG_SPLICE_PAGES 2023-06-02 15:07 [PATCH net-next v3 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells ` (5 preceding siblings ...) 2023-06-02 15:07 ` [PATCH net-next v3 06/11] tls: Address behaviour change in multi_chunk_sendfile kselftest David Howells @ 2023-06-02 15:07 ` David Howells 2023-06-02 15:07 ` [PATCH net-next v3 08/11] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES David Howells ` (3 subsequent siblings) 10 siblings, 0 replies; 17+ messages in thread From: David Howells @ 2023-06-02 15:07 UTC (permalink / raw) To: netdev, Linus Torvalds Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend, Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel Make TLS's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator if possible. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Eric Dumazet <edumazet@google.com> cc: "David S. Miller" <davem@davemloft.net> cc: Paolo Abeni <pabeni@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: netdev@vger.kernel.org --- Notes: ver #2) - "rls_" should be "tls_". 
net/tls/tls_sw.c | 46 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 45 insertions(+), 1 deletion(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 6aa6d17888f5..14636cc6c3a4 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -929,6 +929,38 @@ static int tls_sw_push_pending_record(struct sock *sk, int flags) &copied, flags); } +static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg, + struct sk_msg *msg_pl, size_t try_to_copy, + ssize_t *copied) +{ + struct page *page = NULL, **pages = &page; + + do { + ssize_t part; + size_t off; + bool put = false; + + part = iov_iter_extract_pages(&msg->msg_iter, &pages, + try_to_copy, 1, 0, &off); + if (part <= 0) + return part ?: -EIO; + + if (WARN_ON_ONCE(!sendpage_ok(page))) { + iov_iter_revert(&msg->msg_iter, part); + return -EIO; + } + + sk_msg_page_add(msg_pl, page, part, off); + sk_mem_charge(sk, part); + if (put) + put_page(page); + *copied += part; + try_to_copy -= part; + } while (try_to_copy && !sk_msg_full(msg_pl)); + + return 0; +} + int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) { long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); @@ -1021,6 +1053,17 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) full_record = true; } + if (try_to_copy && (msg->msg_flags & MSG_SPLICE_PAGES)) { + ret = tls_sw_sendmsg_splice(sk, msg, msg_pl, + try_to_copy, &copied); + if (ret < 0) + goto send_end; + tls_ctx->pending_open_record_frags = true; + if (full_record || eor || sk_msg_full(msg_pl)) + goto copied; + continue; + } + if (!is_kvec && (full_record || eor) && !async_capable) { u32 first = msg_pl->sg.end; @@ -1083,8 +1126,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) /* Open records defined only if successfully copied, otherwise * we would trim the sg but not reset the open record frags. 
*/ - tls_ctx->pending_open_record_frags = true; copied += try_to_copy; +copied: + tls_ctx->pending_open_record_frags = true; just_flush: if (full_record || eor) { ret = bpf_exec_tx_verdict(msg_pl, sk, full_record, ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net-next v3 08/11] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES 2023-06-02 15:07 [PATCH net-next v3 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells ` (6 preceding siblings ...) 2023-06-02 15:07 ` [PATCH net-next v3 07/11] tls/sw: Support MSG_SPLICE_PAGES David Howells @ 2023-06-02 15:07 ` David Howells 2023-06-02 15:07 ` [PATCH net-next v3 09/11] tls/device: Support MSG_SPLICE_PAGES David Howells ` (2 subsequent siblings) 10 siblings, 0 replies; 17+ messages in thread From: David Howells @ 2023-06-02 15:07 UTC (permalink / raw) To: netdev, Linus Torvalds Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend, Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, bpf Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. [!] Note that tls_sw_sendpage_locked() appears to have the wrong locking upstream. I think the caller will only hold the socket lock, but it should hold tls_ctx->tx_lock too. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Eric Dumazet <edumazet@google.com> cc: "David S. 
Miller" <davem@davemloft.net> cc: Paolo Abeni <pabeni@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: netdev@vger.kernel.org cc: bpf@vger.kernel.org --- net/tls/tls_sw.c | 165 +++++++++-------------------------------------- 1 file changed, 31 insertions(+), 134 deletions(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 14636cc6c3a4..4caed478bef8 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -961,7 +961,8 @@ static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg, return 0; } -int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) +static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg, + size_t size) { long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); struct tls_context *tls_ctx = tls_get_ctx(sk); @@ -984,15 +985,6 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) int ret = 0; int pending; - if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | - MSG_CMSG_COMPAT | MSG_SPLICE_PAGES)) - return -EOPNOTSUPP; - - ret = mutex_lock_interruptible(&tls_ctx->tx_lock); - if (ret) - return ret; - lock_sock(sk); - if (unlikely(msg->msg_controllen)) { ret = tls_process_cmsg(sk, msg, &record_type); if (ret) { @@ -1197,157 +1189,62 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) send_end: ret = sk_stream_error(sk, msg->msg_flags, ret); - - release_sock(sk); - mutex_unlock(&tls_ctx->tx_lock); return copied > 0 ? 
copied : ret; } -static int tls_sw_do_sendpage(struct sock *sk, struct page *page, - int offset, size_t size, int flags) +int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) { - long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT); struct tls_context *tls_ctx = tls_get_ctx(sk); - struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx); - struct tls_prot_info *prot = &tls_ctx->prot_info; - unsigned char record_type = TLS_RECORD_TYPE_DATA; - struct sk_msg *msg_pl; - struct tls_rec *rec; - int num_async = 0; - ssize_t copied = 0; - bool full_record; - int record_room; - int ret = 0; - bool eor; - - eor = !(flags & MSG_SENDPAGE_NOTLAST); - sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk); - - /* Call the sk_stream functions to manage the sndbuf mem. */ - while (size > 0) { - size_t copy, required_size; - - if (sk->sk_err) { - ret = -sk->sk_err; - goto sendpage_end; - } - - if (ctx->open_rec) - rec = ctx->open_rec; - else - rec = ctx->open_rec = tls_get_rec(sk); - if (!rec) { - ret = -ENOMEM; - goto sendpage_end; - } - - msg_pl = &rec->msg_plaintext; - - full_record = false; - record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size; - copy = size; - if (copy >= record_room) { - copy = record_room; - full_record = true; - } - - required_size = msg_pl->sg.size + copy + prot->overhead_size; - - if (!sk_stream_memory_free(sk)) - goto wait_for_sndbuf; -alloc_payload: - ret = tls_alloc_encrypted_msg(sk, required_size); - if (ret) { - if (ret != -ENOSPC) - goto wait_for_memory; - - /* Adjust copy according to the amount that was - * actually allocated. 
The difference is due - * to max sg elements limit - */ - copy -= required_size - msg_pl->sg.size; - full_record = true; - } - - sk_msg_page_add(msg_pl, page, copy, offset); - sk_mem_charge(sk, copy); - - offset += copy; - size -= copy; - copied += copy; - - tls_ctx->pending_open_record_frags = true; - if (full_record || eor || sk_msg_full(msg_pl)) { - ret = bpf_exec_tx_verdict(msg_pl, sk, full_record, - record_type, &copied, flags); - if (ret) { - if (ret == -EINPROGRESS) - num_async++; - else if (ret == -ENOMEM) - goto wait_for_memory; - else if (ret != -EAGAIN) { - if (ret == -ENOSPC) - ret = 0; - goto sendpage_end; - } - } - } - continue; -wait_for_sndbuf: - set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); -wait_for_memory: - ret = sk_stream_wait_memory(sk, &timeo); - if (ret) { - if (ctx->open_rec) - tls_trim_both_msgs(sk, msg_pl->sg.size); - goto sendpage_end; - } + int ret; - if (ctx->open_rec) - goto alloc_payload; - } + if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | + MSG_CMSG_COMPAT | MSG_SPLICE_PAGES | + MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY)) + return -EOPNOTSUPP; - if (num_async) { - /* Transmit if any encryptions have completed */ - if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) { - cancel_delayed_work(&ctx->tx_work.work); - tls_tx_records(sk, flags); - } - } -sendpage_end: - ret = sk_stream_error(sk, flags, ret); - return copied > 0 ? 
copied : ret; + ret = mutex_lock_interruptible(&tls_ctx->tx_lock); + if (ret) + return ret; + lock_sock(sk); + ret = tls_sw_sendmsg_locked(sk, msg, size); + release_sock(sk); + mutex_unlock(&tls_ctx->tx_lock); + return ret; } int tls_sw_sendpage_locked(struct sock *sk, struct page *page, int offset, size_t size, int flags) { + struct bio_vec bvec; + struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, }; + if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY | MSG_NO_SHARED_FRAGS)) return -EOPNOTSUPP; + if (flags & MSG_SENDPAGE_NOTLAST) + msg.msg_flags |= MSG_MORE; - return tls_sw_do_sendpage(sk, page, offset, size, flags); + bvec_set_page(&bvec, page, size, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); + return tls_sw_sendmsg_locked(sk, &msg, size); } int tls_sw_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags) { - struct tls_context *tls_ctx = tls_get_ctx(sk); - int ret; + struct bio_vec bvec; + struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, }; if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY)) return -EOPNOTSUPP; + if (flags & MSG_SENDPAGE_NOTLAST) + msg.msg_flags |= MSG_MORE; - ret = mutex_lock_interruptible(&tls_ctx->tx_lock); - if (ret) - return ret; - lock_sock(sk); - ret = tls_sw_do_sendpage(sk, page, offset, size, flags); - release_sock(sk); - mutex_unlock(&tls_ctx->tx_lock); - return ret; + bvec_set_page(&bvec, page, size, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); + return tls_sw_sendmsg(sk, &msg, size); } static int ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net-next v3 09/11] tls/device: Support MSG_SPLICE_PAGES 2023-06-02 15:07 [PATCH net-next v3 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells ` (7 preceding siblings ...) 2023-06-02 15:07 ` [PATCH net-next v3 08/11] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES David Howells @ 2023-06-02 15:07 ` David Howells 2023-06-02 15:07 ` [PATCH net-next v3 10/11] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES David Howells 2023-06-02 15:07 ` [PATCH net-next v3 11/11] net: Add samples for network I/O and splicing David Howells 10 siblings, 0 replies; 17+ messages in thread From: David Howells @ 2023-06-02 15:07 UTC (permalink / raw) To: netdev, Linus Torvalds Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend, Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel Make TLS's device sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator if possible. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Eric Dumazet <edumazet@google.com> cc: "David S. 
Miller" <davem@davemloft.net> cc: Paolo Abeni <pabeni@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: netdev@vger.kernel.org --- net/tls/tls_device.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index 9ef766e41c7a..f2f1aff19e4a 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -509,6 +509,29 @@ static int tls_push_data(struct sock *sk, tls_append_frag(record, &zc_pfrag, copy); iter_offset.offset += copy; + } else if (copy && (flags & MSG_SPLICE_PAGES)) { + struct page_frag zc_pfrag; + struct page **pages = &zc_pfrag.page; + size_t off; + + rc = iov_iter_extract_pages(iter_offset.msg_iter, + &pages, copy, 1, 0, &off); + if (rc <= 0) { + if (rc == 0) + rc = -EIO; + goto handle_error; + } + copy = rc; + + if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) { + iov_iter_revert(iter_offset.msg_iter, copy); + rc = -EIO; + goto handle_error; + } + + zc_pfrag.offset = off; + zc_pfrag.size = copy; + tls_append_frag(record, &zc_pfrag, copy); } else if (copy) { copy = min_t(size_t, copy, pfrag->size - pfrag->offset); @@ -572,6 +595,9 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) union tls_iter_offset iter; int rc; + if (!tls_ctx->zerocopy_sendfile) + msg->msg_flags &= ~MSG_SPLICE_PAGES; + mutex_lock(&tls_ctx->tx_lock); lock_sock(sk); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net-next v3 10/11] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES 2023-06-02 15:07 [PATCH net-next v3 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells ` (8 preceding siblings ...) 2023-06-02 15:07 ` [PATCH net-next v3 09/11] tls/device: Support MSG_SPLICE_PAGES David Howells @ 2023-06-02 15:07 ` David Howells 2023-06-02 15:07 ` [PATCH net-next v3 11/11] net: Add samples for network I/O and splicing David Howells 10 siblings, 0 replies; 17+ messages in thread From: David Howells @ 2023-06-02 15:07 UTC (permalink / raw) To: netdev, Linus Torvalds Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend, Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. With that, the tls_iter_offset union is no longer necessary and can be replaced with an iov_iter pointer and the zc_page argument to tls_push_data() can also be removed. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Eric Dumazet <edumazet@google.com> cc: "David S. 
Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
 net/tls/tls_device.c | 84 +++++++++++-------------------------------------
 1 file changed, 20 insertions(+), 64 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index f2f1aff19e4a..c698d6d60219 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -422,16 +422,10 @@ static int tls_device_copy_data(void *addr, size_t bytes, struct iov_iter *i)
 	return 0;
 }

-union tls_iter_offset {
-	struct iov_iter *msg_iter;
-	int offset;
-};
-
 static int tls_push_data(struct sock *sk,
-			 union tls_iter_offset iter_offset,
+			 struct iov_iter *iter,
 			 size_t size, int flags,
-			 unsigned char record_type,
-			 struct page *zc_page)
+			 unsigned char record_type)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_prot_info *prot = &tls_ctx->prot_info;
@@ -500,22 +494,13 @@ static int tls_push_data(struct sock *sk,
 		record = ctx->open_record;

 		copy = min_t(size_t, size, max_open_record_len - record->len);
-		if (copy && zc_page) {
-			struct page_frag zc_pfrag;
-
-			zc_pfrag.page = zc_page;
-			zc_pfrag.offset = iter_offset.offset;
-			zc_pfrag.size = copy;
-			tls_append_frag(record, &zc_pfrag, copy);
-
-			iter_offset.offset += copy;
-		} else if (copy && (flags & MSG_SPLICE_PAGES)) {
+		if (copy && (flags & MSG_SPLICE_PAGES)) {
 			struct page_frag zc_pfrag;
 			struct page **pages = &zc_pfrag.page;
 			size_t off;

-			rc = iov_iter_extract_pages(iter_offset.msg_iter,
-						    &pages, copy, 1, 0, &off);
+			rc = iov_iter_extract_pages(iter, &pages,
+						    copy, 1, 0, &off);
 			if (rc <= 0) {
 				if (rc == 0)
 					rc = -EIO;
@@ -524,7 +509,7 @@ static int tls_push_data(struct sock *sk,
 			copy = rc;

 			if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
-				iov_iter_revert(iter_offset.msg_iter, copy);
+				iov_iter_revert(iter, copy);
 				rc = -EIO;
 				goto handle_error;
 			}
@@ -537,7 +522,7 @@ static int tls_push_data(struct sock *sk,

 			rc = tls_device_copy_data(page_address(pfrag->page) +
 						  pfrag->offset, copy,
-						  iter_offset.msg_iter);
+						  iter);
 			if (rc)
 				goto handle_error;
 			tls_append_frag(record, pfrag, copy);
@@ -592,7 +577,6 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 {
 	unsigned char record_type = TLS_RECORD_TYPE_DATA;
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	union tls_iter_offset iter;
 	int rc;

 	if (!tls_ctx->zerocopy_sendfile)
@@ -607,8 +591,8 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 			goto out;
 	}

-	iter.msg_iter = &msg->msg_iter;
-	rc = tls_push_data(sk, iter, size, msg->msg_flags, record_type, NULL);
+	rc = tls_push_data(sk, &msg->msg_iter, size, msg->msg_flags,
+			   record_type);

 out:
 	release_sock(sk);
@@ -619,44 +603,18 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 int tls_device_sendpage(struct sock *sk, struct page *page,
 			int offset, size_t size, int flags)
 {
-	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	union tls_iter_offset iter_offset;
-	struct iov_iter msg_iter;
-	char *kaddr;
-	struct kvec iov;
-	int rc;
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };

 	if (flags & MSG_SENDPAGE_NOTLAST)
-		flags |= MSG_MORE;
-
-	mutex_lock(&tls_ctx->tx_lock);
-	lock_sock(sk);
+		msg.msg_flags |= MSG_MORE;

-	if (flags & MSG_OOB) {
-		rc = -EOPNOTSUPP;
-		goto out;
-	}
-
-	if (tls_ctx->zerocopy_sendfile) {
-		iter_offset.offset = offset;
-		rc = tls_push_data(sk, iter_offset, size,
-				   flags, TLS_RECORD_TYPE_DATA, page);
-		goto out;
-	}
-
-	kaddr = kmap(page);
-	iov.iov_base = kaddr + offset;
-	iov.iov_len = size;
-	iov_iter_kvec(&msg_iter, ITER_SOURCE, &iov, 1, size);
-	iter_offset.msg_iter = &msg_iter;
-	rc = tls_push_data(sk, iter_offset, size, flags, TLS_RECORD_TYPE_DATA,
-			   NULL);
-	kunmap(page);
+	if (flags & MSG_OOB)
+		return -EOPNOTSUPP;

-out:
-	release_sock(sk);
-	mutex_unlock(&tls_ctx->tx_lock);
-	return rc;
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+	return tls_device_sendmsg(sk, &msg, size);
 }

 struct tls_record_info *tls_get_record(struct tls_offload_context_tx *context,
@@ -721,12 +679,10 @@ EXPORT_SYMBOL(tls_get_record);

 static int tls_device_push_pending_record(struct sock *sk, int flags)
 {
-	union tls_iter_offset iter;
-	struct iov_iter msg_iter;
+	struct iov_iter iter;

-	iov_iter_kvec(&msg_iter, ITER_SOURCE, NULL, 0, 0);
-	iter.msg_iter = &msg_iter;
-	return tls_push_data(sk, iter, 0, flags, TLS_RECORD_TYPE_DATA, NULL);
+	iov_iter_kvec(&iter, ITER_SOURCE, NULL, 0, 0);
+	return tls_push_data(sk, &iter, 0, flags, TLS_RECORD_TYPE_DATA);
 }

 void tls_device_write_space(struct sock *sk, struct tls_context *ctx)

^ permalink raw reply related [flat|nested] 17+ messages in thread
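For readers less familiar with the pattern, the conversion above reduces tls_device_sendpage() to three steps: describe the one buffer with a single-element vector, fold the "not last" hint into MSG_MORE, and hand the result to the sendmsg path. The following is only a rough userspace analogue, not kernel code and not part of this patch. It uses a plain struct iovec where the kernel uses a bio_vec, it runs over an AF_UNIX socketpair rather than a TLS socket, and it does not use MSG_SPLICE_PAGES, which is a kernel-internal flag. The helper names send_one_buffer() and demo_roundtrip() are hypothetical and chosen purely for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: wrap one buffer in a single-element iovec and pass
 * it to sendmsg().  "not_last" plays the role that MSG_SENDPAGE_NOTLAST
 * plays in the kernel wrapper: it is translated into MSG_MORE.
 */
static ssize_t send_one_buffer(int sock, const void *buf, size_t len,
			       bool not_last)
{
	struct iovec iov = {
		.iov_base = (void *)buf,
		.iov_len = len,
	};
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
	};

	return sendmsg(sock, &msg, not_last ? MSG_MORE : 0);
}

/*
 * Send two chunks over an AF_UNIX stream socketpair, the first flagged as
 * "more to come", then read everything back into out[].  Returns the number
 * of bytes received, or -1 on error.
 */
static ssize_t demo_roundtrip(char *out, size_t outlen)
{
	ssize_t n, total = 0;
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;

	if (send_one_buffer(sv[0], "hello ", 6, true) != 6 ||
	    send_one_buffer(sv[0], "world", 5, false) != 5) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	close(sv[0]);	/* Signal EOF to the reader */

	while ((n = read(sv[1], out + total, outlen - total)) > 0)
		total += n;
	close(sv[1]);
	return n < 0 ? -1 : total;
}
```

Called with a zeroed 32-byte buffer, demo_roundtrip() should hand back the two chunks concatenated in order. On a real TLS socket the MSG_MORE hint additionally decides whether the pending record is closed, which this sketch does not exercise.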
* [PATCH net-next v3 11/11] net: Add samples for network I/O and splicing
2023-06-02 15:07 [PATCH net-next v3 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (9 preceding siblings ...)
2023-06-02 15:07 ` [PATCH net-next v3 10/11] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES David Howells
@ 2023-06-02 15:07 ` David Howells
2023-06-03 6:38 ` Jakub Kicinski
10 siblings, 1 reply; 17+ messages in thread
From: David Howells @ 2023-06-02 15:07 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, Herbert Xu

Add some small sample programs for doing network I/O including splicing.

There are three IPv4/IPv6 servers: tcp-sink, tls-sink and udp-sink. They
can be given a port number by passing "-p <port>" and will listen on an
IPv6 socket unless given a "-4" flag, in which case they'll listen for IPv4
only.

There are three IPv4/IPv6 clients: tcp-send, tls-send and udp-send. They
are given a file to get data from (or "-" for stdin) and the name of a
server to talk to. They can also be given a port number by passing
"-p <port>", "-4" or "-6" to force the use of IPv4 or IPv6, "-s" to
indicate they should use splice/sendfile to transfer the data and "-n" to
specify how much data to copy. If "-s" is given, the input will be spliced
if it's a pipe and sendfiled otherwise.

A driver program, splice-out, is provided to splice data from a file/stdin
to stdout and can be used to pipe into the aforementioned clients for
testing splice. This takes the name of the file to splice from (or "-" for
stdin). It can also be given "-w <size>" to indicate the maximum size of
each splice, "-k <size>" if a chunk of the input should be skipped between
splices to prevent coalescence and "-s" if sendfile should be used instead
of splice.

Additionally, there is an AF_UNIX client and server. These are similar to
the IPv[46] programs, except both take a socket path and there is no option
to change the port number.

And then there are two AF_ALG clients (there is no server). These are
similar to the other clients, except no destination is specified. One
exercises skcipher encryption and the other hashing.

Examples include:

	./splice-out -w0x400 /foo/16K 4K | ./alg-encrypt -s -
	./splice-out -w0x400 /foo/1M | ./unix-send -s - /tmp/foo
	./splice-out -w0x400 /foo/16K 16K -w1 | ./tls-send -s6 -n16K - servbox
	./tcp-send /bin/ls 192.168.6.1
	./udp-send -4 -p5555 /foo/4K localhost

where, for example, /foo/16K is a 16KiB file.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: netdev@vger.kernel.org
---
 samples/Kconfig           |  14 +++
 samples/Makefile          |   1 +
 samples/net/Makefile      |  13 +++
 samples/net/alg-encrypt.c | 206 ++++++++++++++++++++++++++++++++++++++
 samples/net/alg-hash.c    | 147 +++++++++++++++++++++++++++
 samples/net/splice-out.c  | 147 +++++++++++++++++++++++++++
 samples/net/tcp-send.c    | 177 ++++++++++++++++++++++++++++++++
 samples/net/tcp-sink.c    |  80 +++++++++++++++
 samples/net/tls-send.c    | 188 ++++++++++++++++++++++++++++++++++
 samples/net/tls-sink.c    | 104 +++++++++++++++++++
 samples/net/udp-send.c    | 156 +++++++++++++++++++++++++++++
 samples/net/udp-sink.c    |  84 ++++++++++++++++
 samples/net/unix-send.c   | 151 ++++++++++++++++++++++++++++
 samples/net/unix-sink.c   |  54 ++++++++++
 14 files changed, 1522 insertions(+)
 create mode 100644 samples/net/Makefile
 create mode 100644 samples/net/alg-encrypt.c
 create mode 100644 samples/net/alg-hash.c
 create mode 100644 samples/net/splice-out.c
 create mode 100644 samples/net/tcp-send.c
 create mode 100644 samples/net/tcp-sink.c
 create mode 100644 samples/net/tls-send.c
 create mode 100644 samples/net/tls-sink.c
 create mode 100644 samples/net/udp-send.c
 create mode 100644 samples/net/udp-sink.c
 create mode 100644 samples/net/unix-send.c
 create mode 100644 samples/net/unix-sink.c

diff --git a/samples/Kconfig b/samples/Kconfig
index b2db430bd3ff..928e06b08b99 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -280,6 +280,20 @@ config SAMPLE_KMEMLEAK
 	  Build a sample program which have explicitly leaks memory to test
 	  kmemleak

+config SAMPLE_NET
+	bool "Build example programs for driving network protocols"
+	depends on NET
+	help
+	  Build example userspace programs for driving network protocols. Most
+	  of the programs (tcp, udp, tls, unix) come as client-server pairs
+	  that allow the test to be split across a network (but not in the unix
+	  case); but some, such as the AF_ALG samples are standalone as there
+	  is no server per se.
+
+	  The programs allow sendfile and splice to be used. An additional
+	  program is provided that allows sendfile/splice to stdout for use in
+	  piping in to the other programs to operate splice there.
+
 source "samples/rust/Kconfig"

 endif # SAMPLES
diff --git a/samples/Makefile b/samples/Makefile
index 7727f1a0d6d1..b9fbf80a53be 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -37,3 +37,4 @@ obj-$(CONFIG_SAMPLE_KMEMLEAK) += kmemleak/
 obj-$(CONFIG_SAMPLE_CORESIGHT_SYSCFG)	+= coresight/
 obj-$(CONFIG_SAMPLE_FPROBE)		+= fprobe/
 obj-$(CONFIG_SAMPLES_RUST)		+= rust/
+obj-$(CONFIG_SAMPLE_NET)		+= net/
diff --git a/samples/net/Makefile b/samples/net/Makefile
new file mode 100644
index 000000000000..0ccd68a36edf
--- /dev/null
+++ b/samples/net/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+userprogs-always-y += \
+	alg-hash \
+	alg-encrypt \
+	splice-out \
+	tcp-send \
+	tcp-sink \
+	tls-send \
+	tls-sink \
+	udp-send \
+	udp-sink \
+	unix-send \
+	unix-sink
diff --git a/samples/net/alg-encrypt.c b/samples/net/alg-encrypt.c
new file mode 100644
index 000000000000..3851b5fbaeda
--- /dev/null
+++ b/samples/net/alg-encrypt.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* AF_ALG skcipher encryption test
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com) + */ + +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <string.h> +#include <getopt.h> +#include <limits.h> +#include <fcntl.h> +#include <unistd.h> +#include <sys/un.h> +#include <sys/socket.h> +#include <sys/stat.h> +#include <sys/sendfile.h> +#include <linux/if_alg.h> + +#define OSERROR(X, Y) \ + do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0) +#define min(x, y) ((x) < (y) ? (x) : (y)) + +static unsigned char buffer[4096 * 32] __attribute__((aligned(4096))); +static unsigned char iv[16]; +static unsigned char key[16]; + +static const struct sockaddr_alg sa = { + .salg_family = AF_ALG, + .salg_type = "skcipher", + .salg_name = "cbc(aes)", +}; + +static void format(void) +{ + fprintf(stderr, "alg-send [-ds] [-n<size>] <file>|-\n"); + exit(2); +} + +static void algif_add_set_op(struct msghdr *msg, unsigned int op) +{ + struct cmsghdr *__cmsg; + + __cmsg = msg->msg_control + msg->msg_controllen; + __cmsg->cmsg_len = CMSG_LEN(sizeof(unsigned int)); + __cmsg->cmsg_level = SOL_ALG; + __cmsg->cmsg_type = ALG_SET_OP; + *(unsigned int *)CMSG_DATA(__cmsg) = op; + msg->msg_controllen += CMSG_ALIGN(__cmsg->cmsg_len); +} + +static void algif_add_set_iv(struct msghdr *msg, const void *iv, size_t ivlen) +{ + struct af_alg_iv *ivbuf; + struct cmsghdr *__cmsg; + + printf("%zx\n", msg->msg_controllen); + __cmsg = msg->msg_control + msg->msg_controllen; + __cmsg->cmsg_len = CMSG_LEN(sizeof(*ivbuf) + ivlen); + __cmsg->cmsg_level = SOL_ALG; + __cmsg->cmsg_type = ALG_SET_IV; + ivbuf = (struct af_alg_iv *)CMSG_DATA(__cmsg); + ivbuf->ivlen = ivlen; + memcpy(ivbuf->iv, iv, ivlen); + msg->msg_controllen += CMSG_ALIGN(__cmsg->cmsg_len); +} + +int main(int argc, char *argv[]) +{ + struct msghdr msg; + struct stat st; + const char *filename; + unsigned char ctrl[4096]; + unsigned int flags = O_RDONLY; + ssize_t r, w, o, ret; + size_t size = LONG_MAX, total = 0, i, out = 160; + char 
*end; + bool use_sendfile = false, all = true; + int opt, alg, sock, fd = 0; + + while ((opt = getopt(argc, argv, "dn:s")) != EOF) { + switch (opt) { + case 'd': + flags |= O_DIRECT; + break; + case 'n': + size = strtoul(optarg, &end, 0); + switch (*end) { + case 'K': + case 'k': + size *= 1024; + break; + case 'M': + case 'm': + size *= 1024 * 1024; + break; + } + all = false; + break; + case 's': + use_sendfile = true; + break; + default: + format(); + } + } + + argc -= optind; + argv += optind; + if (argc != 1) + format(); + filename = argv[0]; + + alg = socket(AF_ALG, SOCK_SEQPACKET, 0); + OSERROR(alg, "AF_ALG"); + OSERROR(bind(alg, (struct sockaddr *)&sa, sizeof(sa)), "bind"); + OSERROR(setsockopt(alg, SOL_ALG, ALG_SET_KEY, key, sizeof(key)), + "ALG_SET_KEY"); + sock = accept(alg, NULL, 0); + OSERROR(sock, "accept"); + + if (strcmp(filename, "-") != 0) { + fd = open(filename, flags); + OSERROR(fd, filename); + OSERROR(fstat(fd, &st), filename); + size = st.st_size; + } else { + OSERROR(fstat(fd, &st), argv[2]); + } + + memset(&msg, 0, sizeof(msg)); + msg.msg_control = ctrl; + algif_add_set_op(&msg, ALG_OP_ENCRYPT); + algif_add_set_iv(&msg, iv, sizeof(iv)); + + OSERROR(sendmsg(sock, &msg, MSG_MORE), "sock/sendmsg"); + + if (!use_sendfile) { + bool more = false; + + while (size) { + r = read(fd, buffer, sizeof(buffer)); + OSERROR(r, filename); + if (r == 0) + break; + size -= r; + + o = 0; + do { + more = size > 0; + w = send(sock, buffer + o, r - o, + more ? MSG_MORE : 0); + OSERROR(w, "sock/send"); + total += w; + o += w; + } while (o < r); + } + + if (more) + send(sock, NULL, 0, 0); + } else if (S_ISFIFO(st.st_mode)) { + do { + r = splice(fd, NULL, sock, NULL, size, + size > 0 ? 
SPLICE_F_MORE : 0); + OSERROR(r, "sock/splice"); + size -= r; + total += r; + } while (r > 0 && size > 0); + if (size && !all) { + fprintf(stderr, "Short splice\n"); + exit(1); + } + } else { + r = sendfile(sock, fd, NULL, size); + OSERROR(r, "sock/sendfile"); + if (r != size) { + fprintf(stderr, "Short sendfile\n"); + exit(1); + } + total = r; + } + + while (total > 0) { + ret = read(sock, buffer, min(sizeof(buffer), total)); + OSERROR(ret, "sock/read"); + if (ret == 0) + break; + total -= ret; + + if (out > 0) { + ret = min(out, ret); + out -= ret; + for (i = 0; i < ret; i++) + printf("%02x", (unsigned char)buffer[i]); + } + printf("...\n"); + } + + OSERROR(close(sock), "sock/close"); + OSERROR(close(alg), "alg/close"); + OSERROR(close(fd), "close"); + return 0; +} diff --git a/samples/net/alg-hash.c b/samples/net/alg-hash.c new file mode 100644 index 000000000000..df63c87e7661 --- /dev/null +++ b/samples/net/alg-hash.c @@ -0,0 +1,147 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* AF_ALG hash test + * + * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + */ + +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <string.h> +#include <getopt.h> +#include <limits.h> +#include <fcntl.h> +#include <unistd.h> +#include <sys/un.h> +#include <sys/socket.h> +#include <sys/stat.h> +#include <sys/sendfile.h> +#include <linux/if_alg.h> + +#define OSERROR(X, Y) \ + do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0) + +static unsigned char buffer[4096 * 32] __attribute__((aligned(4096))); + +static const struct sockaddr_alg sa = { + .salg_family = AF_ALG, + .salg_type = "hash", + .salg_name = "sha1", +}; + +static void format(void) +{ + fprintf(stderr, "alg-send [-ds] [-n<size>] <file>|-\n"); + exit(2); +} + +int main(int argc, char *argv[]) +{ + struct stat st; + const char *filename; + unsigned int flags = O_RDONLY; + ssize_t r, w, o, ret; + size_t size = LONG_MAX, i; + char *end; + int use_sendfile = 0; + int opt, alg, sock, fd = 0; + + while ((opt = getopt(argc, argv, "n:s")) != EOF) { + switch (opt) { + case 'd': + flags |= O_DIRECT; + break; + case 'n': + size = strtoul(optarg, &end, 0); + switch (*end) { + case 'K': + case 'k': + size *= 1024; + break; + case 'M': + case 'm': + size *= 1024 * 1024; + break; + } + break; + case 's': + use_sendfile = true; + break; + default: + format(); + } + } + + argc -= optind; + argv += optind; + if (argc != 1) + format(); + filename = argv[0]; + + alg = socket(AF_ALG, SOCK_SEQPACKET, 0); + OSERROR(alg, "AF_ALG"); + OSERROR(bind(alg, (struct sockaddr *)&sa, sizeof(sa)), "bind"); + sock = accept(alg, NULL, 0); + OSERROR(sock, "accept"); + + if (strcmp(filename, "-") != 0) { + fd = open(filename, flags); + OSERROR(fd, filename); + OSERROR(fstat(fd, &st), filename); + size = st.st_size; + } else { + OSERROR(fstat(fd, &st), argv[2]); + } + + if (!use_sendfile) { + bool more = false; + + while (size) { + r = read(fd, buffer, sizeof(buffer)); + OSERROR(r, filename); + if (r == 0) + break; 
+ size -= r; + + o = 0; + do { + more = size > 0; + w = send(sock, buffer + o, r - o, + more ? MSG_MORE : 0); + OSERROR(w, "sock/send"); + o += w; + } while (o < r); + } + + if (more) + send(sock, NULL, 0, 0); + } else if (S_ISFIFO(st.st_mode)) { + r = splice(fd, NULL, sock, NULL, size, 0); + OSERROR(r, "sock/splice"); + if (r != size) { + fprintf(stderr, "Short splice\n"); + exit(1); + } + } else { + r = sendfile(sock, fd, NULL, size); + OSERROR(r, "sock/sendfile"); + if (r != size) { + fprintf(stderr, "Short sendfile\n"); + exit(1); + } + } + + ret = read(sock, buffer, sizeof(buffer)); + OSERROR(ret, "sock/read"); + + for (i = 0; i < ret; i++) + printf("%02x", (unsigned char)buffer[i]); + printf("\n"); + + OSERROR(close(sock), "sock/close"); + OSERROR(close(alg), "alg/close"); + OSERROR(close(fd), "close"); + return 0; +} diff --git a/samples/net/splice-out.c b/samples/net/splice-out.c new file mode 100644 index 000000000000..224010dfd387 --- /dev/null +++ b/samples/net/splice-out.c @@ -0,0 +1,147 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Splice or sendfile from the given file/stdin to stdout. + * + * Format: splice-out [-s] <file>|- [<size>] + * + * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <string.h> +#include <fcntl.h> +#include <unistd.h> +#include <getopt.h> +#include <sys/stat.h> +#include <sys/sendfile.h> + +#define OSERROR(X, Y) \ + do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0) +#define min(x, y) ((x) < (y) ? 
(x) : (y)) + +static unsigned char buffer[4096]; + +static void format(void) +{ + fprintf(stderr, "splice-out [-dkN][-s][-wN] <file>|- [<size>]\n"); + exit(2); +} + +int main(int argc, char *argv[]) +{ + struct stat st; + const char *filename; + unsigned int flags = O_RDONLY; + ssize_t r; + size_t size = 1024 * 1024, skip = 0, unit = 0, part; + char *end; + bool use_sendfile = false, all = true; + int opt, fd = 0; + + while ((opt = getopt(argc, argv, "dk:sw:")), + opt != -1) { + switch (opt) { + case 'd': + flags |= O_DIRECT; + break; + case 'k': + /* Skip size - prevent coalescence. */ + skip = strtoul(optarg, &end, 0); + if (skip < 1 || skip >= 4096) { + fprintf(stderr, "-kN must be 0<N<4096\n"); + exit(2); + } + break; + case 's': + use_sendfile = 1; + break; + case 'w': + /* Write unit size */ + unit = strtoul(optarg, &end, 0); + if (!unit) { + fprintf(stderr, "-wN must be >0\n"); + exit(2); + } + switch (*end) { + case 'K': + case 'k': + unit *= 1024; + break; + case 'M': + case 'm': + unit *= 1024 * 1024; + break; + } + break; + default: + format(); + } + } + + argc -= optind; + argv += optind; + + if (argc != 1 && argc != 2) + format(); + + filename = argv[0]; + if (argc == 2) { + size = strtoul(argv[1], &end, 0); + switch (*end) { + case 'K': + case 'k': + size *= 1024; + break; + case 'M': + case 'm': + size *= 1024 * 1024; + break; + } + all = false; + } + + OSERROR(fstat(1, &st), "stdout"); + if (!S_ISFIFO(st.st_mode)) { + fprintf(stderr, "stdout must be a pipe\n"); + exit(3); + } + + if (strcmp(filename, "-") != 0) { + fd = open(filename, flags); + OSERROR(fd, filename); + OSERROR(fstat(fd, &st), filename); + if (!all && size > st.st_size) { + fprintf(stderr, "%s: Specified size larger than file\n", + filename); + exit(3); + } + } + + do { + if (skip) { + part = skip; + do { + r = read(fd, buffer, skip); + OSERROR(r, filename); + part -= r; + } while (part > 0 && r > 0); + } + + part = unit ? 
min(size, unit) : size; + if (use_sendfile) { + r = sendfile(1, fd, NULL, part); + OSERROR(r, "sendfile"); + } else { + r = splice(fd, NULL, 1, NULL, part, 0); + OSERROR(r, "splice"); + } + if (!all) + size -= r; + } while (r > 0 && size > 0); + + OSERROR(close(fd), "close"); + return 0; +} diff --git a/samples/net/tcp-send.c b/samples/net/tcp-send.c new file mode 100644 index 000000000000..608055354789 --- /dev/null +++ b/samples/net/tcp-send.c @@ -0,0 +1,177 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * TCP send client. Pass -s to use splice/sendfile; -z to use MSG_ZEROCOPY. + * + * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <string.h> +#include <getopt.h> +#include <limits.h> +#include <fcntl.h> +#include <unistd.h> +#include <netdb.h> +#include <netinet/in.h> +#include <sys/stat.h> +#include <sys/sendfile.h> + +#define OSERROR(X, Y) \ + do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0) + +static unsigned char buffer[4096] __attribute__((aligned(4096))); + +static void format(void) +{ + fprintf(stderr, + "tcp-send [-46dsz][-p<port>][-n<size>] <file>|- <server>\n"); + exit(2); +} + +int main(int argc, char *argv[]) +{ + struct addrinfo *addrs = NULL, hints = {}; + struct stat st; + const char *filename, *sockname, *service = "5555"; + unsigned int flags = O_RDONLY; + ssize_t r, w, o; + size_t size = LONG_MAX; + char *end; + bool use_sendfile = false, use_zerocopy = false, all = true; + int opt, sock, fd = 0, gai; + + hints.ai_family = AF_UNSPEC; + hints.ai_socktype = SOCK_STREAM; + + while ((opt = getopt(argc, argv, "46dn:p:sz")) != EOF) { + switch (opt) { + case '4': + hints.ai_family = AF_INET; + break; + case '6': + hints.ai_family = AF_INET6; + break; + case 'd': + flags |= O_DIRECT; + break; + case 'n': + size = strtoul(optarg, &end, 0); + switch (*end) { + case 'K': + case 
'k': + size *= 1024; + break; + case 'M': + case 'm': + size *= 1024 * 1024; + break; + } + all = false; + break; + case 'p': + service = optarg; + break; + case 's': + use_sendfile = true; + break; + case 'z': + use_zerocopy = true; + break; + default: + format(); + } + } + + argc -= optind; + argv += optind; + if (argc != 2) + format(); + filename = argv[0]; + sockname = argv[1]; + + gai = getaddrinfo(sockname, service, &hints, &addrs); + if (gai) { + fprintf(stderr, "%s: %s\n", sockname, gai_strerror(gai)); + exit(3); + } + + if (!addrs) { + fprintf(stderr, "%s: No addresses\n", sockname); + exit(3); + } + + sockname = addrs->ai_canonname; + sock = socket(addrs->ai_family, addrs->ai_socktype, addrs->ai_protocol); + OSERROR(sock, "socket"); + OSERROR(connect(sock, addrs->ai_addr, addrs->ai_addrlen), "connect"); + + if (strcmp(filename, "-") != 0) { + fd = open(filename, flags); + OSERROR(fd, filename); + OSERROR(fstat(fd, &st), filename); + if (size > st.st_size) + size = st.st_size; + } else { + OSERROR(fstat(fd, &st), filename); + } + + if (!use_sendfile) { + unsigned int flags = 0; + + if (use_zerocopy) { + int zcflag = 1; + + OSERROR(setsockopt(sock, SOL_SOCKET, SO_ZEROCOPY, + &zcflag, sizeof(zcflag)), + "SOCK_ZEROCOPY"); + flags |= MSG_ZEROCOPY; + } + + while (size) { + r = read(fd, buffer, sizeof(buffer)); + OSERROR(r, filename); + if (r == 0) + break; + size -= r; + + o = 0; + do { + flags &= ~MSG_MORE; + if (size > 0) + flags |= MSG_MORE; + w = send(sock, buffer + o, r - o, flags); + OSERROR(w, "sock/send"); + o += w; + } while (o < r); + } + + if (flags & MSG_MORE) + send(sock, NULL, 0, flags & ~MSG_MORE); + } else if (S_ISFIFO(st.st_mode)) { + do { + r = splice(fd, NULL, sock, NULL, size, + size > 0 ? 
SPLICE_F_MORE : 0); + OSERROR(r, "sock/splice"); + size -= r; + } while (r > 0 && size > 0); + if (size && !all) { + fprintf(stderr, "Short splice\n"); + exit(1); + } + } else { + r = sendfile(sock, fd, NULL, size); + OSERROR(r, "sock/sendfile"); + if (r != size) { + fprintf(stderr, "Short sendfile\n"); + exit(1); + } + } + + OSERROR(close(sock), "sock/close"); + OSERROR(close(fd), "close"); + return 0; +} diff --git a/samples/net/tcp-sink.c b/samples/net/tcp-sink.c new file mode 100644 index 000000000000..5c27c24dfb76 --- /dev/null +++ b/samples/net/tcp-sink.c @@ -0,0 +1,80 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * TCP sink server + * + * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <string.h> +#include <getopt.h> +#include <fcntl.h> +#include <unistd.h> +#include <netinet/in.h> + +#define OSERROR(X, Y) \ + do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0) + +static unsigned char buffer[512 * 1024]; + +static void format(void) +{ + fprintf(stderr, "tcp-sink [-4][-p<port>]\n"); + exit(2); +} + +int main(int argc, char *argv[]) +{ + unsigned int port = 5555; + bool ipv6 = true; + int opt, server_sock, sock; + + + while ((opt = getopt(argc, argv, "4p:")) != EOF) { + switch (opt) { + case '4': + ipv6 = false; + break; + case 'p': + port = atoi(optarg); + break; + default: + format(); + } + } + + if (!ipv6) { + struct sockaddr_in sin = { + .sin_family = AF_INET, + .sin_port = htons(port), + }; + server_sock = socket(AF_INET, SOCK_STREAM, 0); + OSERROR(server_sock, "socket"); + OSERROR(bind(server_sock, (struct sockaddr *)&sin, sizeof(sin)), + "bind"); + OSERROR(listen(server_sock, 1), "listen"); + } else { + struct sockaddr_in6 sin6 = { + .sin6_family = AF_INET6, + .sin6_port = htons(port), + }; + server_sock = socket(AF_INET6, SOCK_STREAM, 0); + OSERROR(server_sock, "socket"); + OSERROR(bind(server_sock, 
(struct sockaddr *)&sin6, + sizeof(sin6)), + "bind"); + OSERROR(listen(server_sock, 1), "listen"); + } + + for (;;) { + sock = accept(server_sock, NULL, NULL); + if (sock != -1) { + while (read(sock, buffer, sizeof(buffer)) > 0) + ; + close(sock); + } + } +} diff --git a/samples/net/tls-send.c b/samples/net/tls-send.c new file mode 100644 index 000000000000..d99b79aaf536 --- /dev/null +++ b/samples/net/tls-send.c @@ -0,0 +1,188 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * TLS-over-TCP send client. Pass -s to splice. + * + * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <string.h> +#include <getopt.h> +#include <limits.h> +#include <fcntl.h> +#include <unistd.h> +#include <netdb.h> +#include <netinet/in.h> +#include <netinet/tcp.h> +#include <sys/stat.h> +#include <sys/sendfile.h> +#include <linux/tls.h> + +#define OSERROR(X, Y) \ + do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0) + +static unsigned char buffer[4096]; + +static void format(void) +{ + fprintf(stderr, + "tls-send [-46ds][-n<size>][-p<port>] <file>|- <server>\n"); + exit(2); +} + +static void set_tls(int sock) +{ + struct tls12_crypto_info_aes_gcm_128 crypto_info; + + crypto_info.info.version = TLS_1_2_VERSION; + crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128; + memset(crypto_info.iv, 0, TLS_CIPHER_AES_GCM_128_IV_SIZE); + memset(crypto_info.rec_seq, 0, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE); + memset(crypto_info.key, 0, TLS_CIPHER_AES_GCM_128_KEY_SIZE); + memset(crypto_info.salt, 0, TLS_CIPHER_AES_GCM_128_SALT_SIZE); + + OSERROR(setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")), + "TCP_ULP"); + OSERROR(setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, + sizeof(crypto_info)), + "TLS_TX"); + OSERROR(setsockopt(sock, SOL_TLS, TLS_RX, &crypto_info, + sizeof(crypto_info)), + "TLS_RX"); +} + +int main(int argc, char 
*argv[]) +{ + struct addrinfo *addrs = NULL, hints = {}; + struct stat st; + const char *filename, *sockname, *service = "5556"; + unsigned int flags = O_RDONLY; + ssize_t r, w, o; + size_t size = LONG_MAX; + char *end; + bool use_sendfile = false, all = true; + int opt, sock, fd = 0, gai; + + hints.ai_family = AF_UNSPEC; + hints.ai_socktype = SOCK_STREAM; + + while ((opt = getopt(argc, argv, "46dn:p:s")) != EOF) { + switch (opt) { + case '4': + hints.ai_family = AF_INET; + break; + case '6': + hints.ai_family = AF_INET6; + break; + case 'd': + flags |= O_DIRECT; + break; + case 'n': + size = strtoul(optarg, &end, 0); + switch (*end) { + case 'K': + case 'k': + size *= 1024; + break; + case 'M': + case 'm': + size *= 1024 * 1024; + break; + } + all = false; + break; + case 'p': + service = optarg; + break; + case 's': + use_sendfile = true; + break; + default: + format(); + } + } + + argc -= optind; + argv += optind; + if (argc != 2) + format(); + filename = argv[0]; + sockname = argv[1]; + + gai = getaddrinfo(sockname, service, &hints, &addrs); + if (gai) { + fprintf(stderr, "%s: %s\n", sockname, gai_strerror(gai)); + exit(3); + } + + if (!addrs) { + fprintf(stderr, "%s: No addresses\n", sockname); + exit(3); + } + + sockname = addrs->ai_canonname; + sock = socket(addrs->ai_family, addrs->ai_socktype, addrs->ai_protocol); + OSERROR(sock, "socket"); + OSERROR(connect(sock, addrs->ai_addr, addrs->ai_addrlen), "connect"); + set_tls(sock); + + if (strcmp(filename, "-") != 0) { + fd = open(filename, flags); + OSERROR(fd, filename); + OSERROR(fstat(fd, &st), filename); + if (size > st.st_size) + size = st.st_size; + } else { + OSERROR(fstat(fd, &st), filename); + } + + if (!use_sendfile) { + bool more = false; + + while (size) { + r = read(fd, buffer, sizeof(buffer)); + OSERROR(r, filename); + if (r == 0) + break; + size -= r; + + o = 0; + do { + more = size > 0; + w = send(sock, buffer + o, r - o, + more ? 
MSG_MORE : 0); + OSERROR(w, "sock/send"); + o += w; + } while (o < r); + } + + if (more) + send(sock, NULL, 0, 0); + } else if (S_ISFIFO(st.st_mode)) { + do { + r = splice(fd, NULL, sock, NULL, size, + size > 0 ? SPLICE_F_MORE : 0); + OSERROR(r, "sock/splice"); + size -= r; + } while (r > 0 && size > 0); + if (size && !all) { + fprintf(stderr, "Short splice\n"); + exit(1); + } + } else { + r = sendfile(sock, fd, NULL, size); + OSERROR(r, "sock/sendfile"); + if (r != size) { + fprintf(stderr, "Short sendfile\n"); + exit(1); + } + } + + OSERROR(close(sock), "sock/close"); + OSERROR(close(fd), "close"); + return 0; +} diff --git a/samples/net/tls-sink.c b/samples/net/tls-sink.c new file mode 100644 index 000000000000..67900b74d6d6 --- /dev/null +++ b/samples/net/tls-sink.c @@ -0,0 +1,104 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * TLS-over-TCP sink server + * + * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <string.h> +#include <getopt.h> +#include <fcntl.h> +#include <unistd.h> +#include <netinet/in.h> +#include <netinet/tcp.h> +#include <linux/tls.h> + +#define OSERROR(X, Y) \ + do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0) + +static unsigned char buffer[512 * 1024]; + +static void format(void) +{ + fprintf(stderr, "tls-sink [-4][-p<port>]\n"); + exit(2); +} + +static void set_tls(int sock) +{ + struct tls12_crypto_info_aes_gcm_128 crypto_info; + + crypto_info.info.version = TLS_1_2_VERSION; + crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128; + memset(crypto_info.iv, 0, TLS_CIPHER_AES_GCM_128_IV_SIZE); + memset(crypto_info.rec_seq, 0, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE); + memset(crypto_info.key, 0, TLS_CIPHER_AES_GCM_128_KEY_SIZE); + memset(crypto_info.salt, 0, TLS_CIPHER_AES_GCM_128_SALT_SIZE); + + OSERROR(setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")), + "TCP_ULP"); + 
OSERROR(setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, + sizeof(crypto_info)), + "TLS_TX"); + OSERROR(setsockopt(sock, SOL_TLS, TLS_RX, &crypto_info, + sizeof(crypto_info)), + "TLS_RX"); +} + +int main(int argc, char *argv[]) +{ + unsigned int port = 5556; + bool ipv6 = true; + int opt, server_sock, sock; + + + while ((opt = getopt(argc, argv, "4p:")) != EOF) { + switch (opt) { + case '4': + ipv6 = false; + break; + case 'p': + port = atoi(optarg); + break; + default: + format(); + } + } + + if (!ipv6) { + struct sockaddr_in sin = { + .sin_family = AF_INET, + .sin_port = htons(port), + }; + server_sock = socket(AF_INET, SOCK_STREAM, 0); + OSERROR(server_sock, "socket"); + OSERROR(bind(server_sock, (struct sockaddr *)&sin, sizeof(sin)), + "bind"); + OSERROR(listen(server_sock, 1), "listen"); + } else { + struct sockaddr_in6 sin6 = { + .sin6_family = AF_INET6, + .sin6_port = htons(port), + }; + server_sock = socket(AF_INET6, SOCK_STREAM, 0); + OSERROR(server_sock, "socket"); + OSERROR(bind(server_sock, (struct sockaddr *)&sin6, + sizeof(sin6)), + "bind"); + OSERROR(listen(server_sock, 1), "listen"); + } + + for (;;) { + sock = accept(server_sock, NULL, NULL); + if (sock != -1) { + set_tls(sock); + while (read(sock, buffer, sizeof(buffer)) > 0) + ; + close(sock); + } + } +} diff --git a/samples/net/udp-send.c b/samples/net/udp-send.c new file mode 100644 index 000000000000..7c6c27eb0fcc --- /dev/null +++ b/samples/net/udp-send.c @@ -0,0 +1,156 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * UDP send client. Pass -s to splice. + * + * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + */ + +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <string.h> +#include <getopt.h> +#include <fcntl.h> +#include <unistd.h> +#include <netdb.h> +#include <netinet/in.h> +#include <sys/stat.h> +#include <sys/sendfile.h> + +#define OSERROR(X, Y) \ + do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0) +#define min(x, y) ((x) < (y) ? (x) : (y)) + +static unsigned char buffer[65536]; + +static void format(void) +{ + fprintf(stderr, + "udp-send [-46s][-n<size>][-p<port>] <file>|- <server>\n"); + exit(2); +} + +int main(int argc, char *argv[]) +{ + struct addrinfo *addrs = NULL, hints = {}; + struct stat st; + const char *filename, *sockname, *service = "5555"; + unsigned int flags = O_RDONLY, len; + ssize_t r, o, size = 65535; + char *end; + bool use_sendfile = false; + int opt, sock, fd = 0, gai; + + hints.ai_family = AF_UNSPEC; + hints.ai_socktype = SOCK_DGRAM; + + while ((opt = getopt(argc, argv, "46dn:p:s")) != EOF) { + switch (opt) { + case '4': + hints.ai_family = AF_INET; + break; + case '6': + hints.ai_family = AF_INET6; + break; + case 'd': + flags |= O_DIRECT; + break; + case 'n': + size = strtoul(optarg, &end, 0); + switch (*end) { + case 'K': + case 'k': + size *= 1024; + break; + } + if (size > 65535) { + fprintf(stderr, + "Too much data for UDP packet\n"); + exit(2); + } + break; + case 'p': + service = optarg; + break; + case 's': + use_sendfile = true; + break; + default: + format(); + } + } + + argc -= optind; + argv += optind; + if (argc != 2) + format(); + filename = argv[0]; + sockname = argv[1]; + + gai = getaddrinfo(sockname, service, &hints, &addrs); + if (gai) { + fprintf(stderr, "%s: %s\n", sockname, gai_strerror(gai)); + exit(3); + } + + if (!addrs) { + fprintf(stderr, "%s: No addresses\n", sockname); + exit(3); + } + + sockname = addrs->ai_canonname; + sock = socket(addrs->ai_family, addrs->ai_socktype, addrs->ai_protocol); + 
+	OSERROR(sock, "socket");
+	OSERROR(connect(sock, addrs->ai_addr, addrs->ai_addrlen), "connect");
+
+	if (strcmp(filename, "-") != 0) {
+		fd = open(filename, flags);
+		OSERROR(fd, filename);
+		OSERROR(fstat(fd, &st), filename);
+		if (size > st.st_size)
+			size = st.st_size;
+	} else {
+		OSERROR(fstat(fd, &st), filename);
+	}
+
+	len = htonl(size);
+	OSERROR(send(sock, &len, 4, MSG_MORE), "sock/send");
+
+	if (!use_sendfile) {
+		while (size) {
+			r = read(fd, buffer, min(sizeof(buffer), size));
+			OSERROR(r, filename);
+			if (r == 0)
+				break;
+			size -= r;
+
+			o = 0;
+			do {
+				ssize_t w = send(sock, buffer + o, r - o,
+						 size > 0 ? MSG_MORE : 0);
+				OSERROR(w, "sock/send");
+				o += w;
+			} while (o < r);
+		}
+	} else if (S_ISFIFO(st.st_mode)) {
+		r = splice(fd, NULL, sock, NULL, size, 0);
+		OSERROR(r, "sock/splice");
+		if (r != size) {
+			fprintf(stderr, "Short splice\n");
+			exit(1);
+		}
+	} else {
+		r = sendfile(sock, fd, NULL, size);
+		OSERROR(r, "sock/sendfile");
+		if (r != size) {
+			fprintf(stderr, "Short sendfile\n");
+			exit(1);
+		}
+	}
+
+	OSERROR(close(sock), "sock/close");
+	OSERROR(close(fd), "close");
+	return 0;
+}
diff --git a/samples/net/udp-sink.c b/samples/net/udp-sink.c
new file mode 100644
index 000000000000..f23c64acec4a
--- /dev/null
+++ b/samples/net/udp-sink.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * UDP sink server
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <getopt.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <netinet/in.h>
+
+#define OSERROR(X, Y) \
+	do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+
+static unsigned char buffer[512 * 1024];
+
+static void format(void)
+{
+	fprintf(stderr, "udp-sink [-4][-p<port>]\n");
+	exit(2);
+}
+
+int main(int argc, char *argv[])
+{
+	struct iovec iov[1] = {
+		[0] = {
+			.iov_base = buffer,
+			.iov_len = sizeof(buffer),
+		},
+	};
+	struct msghdr msg = {
+		.msg_iov = iov,
+		.msg_iovlen = 1,
+	};
+	unsigned int port = 5555;
+	bool ipv6 = true;
+	int opt, sock;
+
+	while ((opt = getopt(argc, argv, "4p:")) != EOF) {
+		switch (opt) {
+		case '4':
+			ipv6 = false;
+			break;
+		case 'p':
+			port = atoi(optarg);
+			break;
+		default:
+			format();
+		}
+	}
+
+	if (!ipv6) {
+		struct sockaddr_in sin = {
+			.sin_family = AF_INET,
+			.sin_port = htons(port),
+		};
+		sock = socket(AF_INET, SOCK_DGRAM, 0);
+		OSERROR(sock, "socket");
+		OSERROR(bind(sock, (struct sockaddr *)&sin, sizeof(sin)),
+			"bind");
+	} else {
+		struct sockaddr_in6 sin6 = {
+			.sin6_family = AF_INET6,
+			.sin6_port = htons(port),
+		};
+		sock = socket(AF_INET6, SOCK_DGRAM, 0);
+		OSERROR(sock, "socket");
+		OSERROR(bind(sock, (struct sockaddr *)&sin6, sizeof(sin6)),
+			"bind");
+	}
+
+	for (;;) {
+		ssize_t r;
+
+		r = recvmsg(sock, &msg, 0);
+		printf("rx %zd\n", r);
+	}
+}
diff --git a/samples/net/unix-send.c b/samples/net/unix-send.c
new file mode 100644
index 000000000000..5950fcf1ccd2
--- /dev/null
+++ b/samples/net/unix-send.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * AF_UNIX stream send client. Pass -s to use splice/sendfile.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <getopt.h>
+#include <limits.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/un.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/sendfile.h>
+
+#define OSERROR(X, Y) \
+	do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+#define min(x, y) ((x) < (y) ? (x) : (y))
+
+static unsigned char buffer[4096];
+
+static void format(void)
+{
+	fprintf(stderr, "unix-send [-ds] [-n<size>] <file>|- <socket-file>\n");
+	exit(2);
+}
+
+int main(int argc, char *argv[])
+{
+	struct sockaddr_un sun = { .sun_family = AF_UNIX, };
+	struct stat st;
+	const char *filename, *sockname;
+	unsigned int flags = O_RDONLY;
+	ssize_t r, w, o, size = LONG_MAX;
+	size_t plen, total = 0;
+	char *end;
+	bool use_sendfile = false, all = true;
+	int opt, sock, fd = 0;
+
+	while ((opt = getopt(argc, argv, "dn:s")) != EOF) {
+		switch (opt) {
+		case 'd':
+			flags |= O_DIRECT;
+			break;
+		case 'n':
+			size = strtoul(optarg, &end, 0);
+			switch (*end) {
+			case 'K':
+			case 'k':
+				size *= 1024;
+				break;
+			case 'M':
+			case 'm':
+				size *= 1024 * 1024;
+				break;
+			}
+			all = false;
+			break;
+		case 's':
+			use_sendfile = true;
+			break;
+		default:
+			format();
+		}
+	}
+
+	argc -= optind;
+	argv += optind;
+	if (argc != 2)
+		format();
+	filename = argv[0];
+	sockname = argv[1];
+
+	plen = strlen(sockname);
+	if (plen == 0 || plen > sizeof(sun.sun_path) - 1) {
+		fprintf(stderr, "socket filename too short or too long\n");
+		exit(2);
+	}
+	memcpy(sun.sun_path, sockname, plen + 1);
+
+	sock = socket(AF_UNIX, SOCK_STREAM, 0);
+	OSERROR(sock, "socket");
+	OSERROR(connect(sock, (struct sockaddr *)&sun, sizeof(sun)), "connect");
+
+	if (strcmp(filename, "-") != 0) {
+		fd = open(filename, flags);
+		OSERROR(fd, filename);
+		OSERROR(fstat(fd, &st), filename);
+		if (size > st.st_size)
+			size = st.st_size;
+	} else {
+		OSERROR(fstat(fd, &st), filename);
+	}
+
+	if (!use_sendfile) {
+		bool more = false;
+
+		while (size) {
+			r = read(fd, buffer, min(sizeof(buffer), size));
+			OSERROR(r, filename);
+			if (r == 0)
+				break;
+			size -= r;
+
+			o = 0;
+			do {
+				more = size > 0;
+				w = send(sock, buffer + o, r - o,
+					 more ? MSG_MORE : 0);
+				OSERROR(w, "sock/send");
+				o += w;
+				total += w;
+			} while (o < r);
+		}
+
+		if (more)
+			send(sock, NULL, 0, 0);
+	} else if (S_ISFIFO(st.st_mode)) {
+		do {
+			r = splice(fd, NULL, sock, NULL, size,
+				   size > 0 ? SPLICE_F_MORE : 0);
+			OSERROR(r, "sock/splice");
+			size -= r;
+			total += r;
+		} while (r > 0 && size > 0);
+		if (size && !all) {
+			fprintf(stderr, "Short splice\n");
+			exit(1);
+		}
+	} else {
+		r = sendfile(sock, fd, NULL, size);
+		OSERROR(r, "sock/sendfile");
+		if (r != size) {
+			fprintf(stderr, "Short sendfile\n");
+			exit(1);
+		}
+		total += r;
+	}
+
+	printf("Sent %zu bytes\n", total);
+	OSERROR(close(sock), "sock/close");
+	OSERROR(close(fd), "close");
+	return 0;
+}
diff --git a/samples/net/unix-sink.c b/samples/net/unix-sink.c
new file mode 100644
index 000000000000..9f0a5ac9c578
--- /dev/null
+++ b/samples/net/unix-sink.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * UNIX stream sink server
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/un.h>
+#include <sys/socket.h>
+
+#define OSERROR(X, Y) \
+	do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+
+static unsigned char buffer[512 * 1024];
+
+int main(int argc, char *argv[])
+{
+	struct sockaddr_un sun = { .sun_family = AF_UNIX, };
+	size_t plen;
+	int server_sock, sock;
+
+	if (argc != 2) {
+		fprintf(stderr, "unix-sink <socket-file>\n");
+		exit(2);
+	}
+
+	plen = strlen(argv[1]);
+	if (plen == 0 || plen > sizeof(sun.sun_path) - 1) {
+		fprintf(stderr, "socket filename too short or too long\n");
+		exit(2);
+	}
+	memcpy(sun.sun_path, argv[1], plen + 1);
+
+	server_sock = socket(AF_UNIX, SOCK_STREAM, 0);
+	OSERROR(server_sock, "socket");
+	OSERROR(bind(server_sock, (struct sockaddr *)&sun, sizeof(sun)),
+		"bind");
+	OSERROR(listen(server_sock, 1), "listen");
+
+	for (;;) {
+		sock = accept(server_sock, NULL, NULL);
+		if (sock != -1) {
+			while (read(sock, buffer, sizeof(buffer)) > 0)
+				;
+			close(sock);
+		}
+	}
+}
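
The non-splice path in unix-send.c sends each chunk with MSG_MORE while more data is expected, and issues a zero-length send() if the input ran out while MSG_MORE was still set. Below is a minimal sketch of that pattern over an AF_UNIX socketpair. It is not part of the patch: send_chunks() and loopback_count() are hypothetical helpers, the 4096-byte chunk mirrors the sample's buffer size, and the 16 KiB cap is an assumption chosen so that all data fits in the socket buffer before the reader starts. MSG_MORE is Linux-specific.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not in the patch): send len bytes in pieces of at
 * most chunk bytes, setting MSG_MORE while fewer than expected bytes have
 * been queued.  Returns the number of bytes sent, or -1 on error.
 */
ssize_t send_chunks(int sock, const unsigned char *buf, size_t len,
		    size_t chunk, size_t expected)
{
	size_t done = 0;
	int more = 0;

	assert(chunk > 0);	/* a zero chunk size would never make progress */
	while (done < len) {
		size_t n = len - done < chunk ? len - done : chunk;
		ssize_t w;

		more = done + n < expected;
		w = send(sock, buf + done, n, more ? MSG_MORE : 0);
		if (w == -1)
			return -1;
		done += (size_t)w;
	}

	/* Input ended early: the last send promised more data, so end the
	 * sequence with a zero-length send without MSG_MORE.
	 */
	if (more && send(sock, NULL, 0, 0) == -1)
		return -1;
	return (ssize_t)done;
}

/* Hypothetical helper: push len bytes through a socketpair while claiming
 * that expected bytes will follow, then count what the peer receives.
 * Assumes len is small enough to sit in the socket buffer unread.
 */
long loopback_count(size_t len, size_t expected)
{
	static unsigned char data[16384], sink[4096];
	long total = 0;
	ssize_t r;
	int sv[2];

	if (len > sizeof(data))
		return -1;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	memset(data, 'x', len);

	if (send_chunks(sv[0], data, len, 4096, expected) == -1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	shutdown(sv[0], SHUT_WR);	/* let the reader see end-of-stream */

	while ((r = read(sv[1], sink, sizeof(sink))) > 0)
		total += r;
	close(sv[0]);
	close(sv[1]);
	return r == -1 ? -1 : total;
}
```

Calling loopback_count(5000, 10000) exercises the early-end case, where the final data send still carried MSG_MORE and the trailing zero-length send() is issued; loopback_count(10000, 10000) covers the normal case, where the last chunk goes out without MSG_MORE.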
* Re: [PATCH net-next v3 11/11] net: Add samples for network I/O and splicing
2023-06-02 15:07 ` [PATCH net-next v3 11/11] net: Add samples for network I/O and splicing David Howells
@ 2023-06-03 6:38 ` Jakub Kicinski
0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2023-06-03 6:38 UTC (permalink / raw)
To: David Howells
Cc: netdev, Linus Torvalds, Chuck Lever, Boris Pismenny, John Fastabend,
David S. Miller, Eric Dumazet, Paolo Abeni, Willem de Bruijn,
David Ahern, Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel,
Herbert Xu

On Fri, 2 Jun 2023 16:07:52 +0100 David Howells wrote:
> Examples include:
>
> ./splice-out -w0x400 /foo/16K 4K | ./alg-encrypt -s -
> ./splice-out -w0x400 /foo/1M | ./unix-send -s - /tmp/foo
> ./splice-out -w0x400 /foo/16K 16K -w1 | ./tls-send -s6 -n16K - servbox
> ./tcp-send /bin/ls 192.168.6.1
> ./udp-send -4 -p5555 /foo/4K localhost

Can it be made into a selftest? Move the code and wrap the above in
a bash script?
end of thread; newest message 2023-06-03 14:51 UTC

Thread overview: 17+ messages
2023-06-02 15:07 [PATCH net-next v3 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
2023-06-02 15:07 ` [PATCH net-next v3 01/11] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace David Howells
2023-06-02 15:07 ` [PATCH net-next v3 02/11] tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg David Howells
2023-06-02 15:07 ` [PATCH net-next v3 03/11] tls/sw: Use zero-length sendmsg() without MSG_MORE to flush David Howells
2023-06-02 18:27 ` Simon Horman
2023-06-02 19:00 ` Dan Carpenter
2023-06-03 14:51 ` Simon Horman
2023-06-02 15:07 ` [PATCH net-next v3 04/11] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage() David Howells
2023-06-02 15:07 ` [PATCH net-next v3 05/11] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor() David Howells
2023-06-02 16:36 ` Linus Torvalds
2023-06-02 15:07 ` [PATCH net-next v3 06/11] tls: Address behaviour change in multi_chunk_sendfile kselftest David Howells
2023-06-02 15:07 ` [PATCH net-next v3 07/11] tls/sw: Support MSG_SPLICE_PAGES David Howells
2023-06-02 15:07 ` [PATCH net-next v3 08/11] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES David Howells
2023-06-02 15:07 ` [PATCH net-next v3 09/11] tls/device: Support MSG_SPLICE_PAGES David Howells
2023-06-02 15:07 ` [PATCH net-next v3 10/11] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES David Howells
2023-06-02 15:07 ` [PATCH net-next v3 11/11] net: Add samples for network I/O and splicing David Howells
2023-06-03 6:38 ` Jakub Kicinski