* [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS
@ 2023-06-05 12:45 David Howells
2023-06-05 12:45 ` [PATCH net-next v4 01/11] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace David Howells
` (10 more replies)
0 siblings, 11 replies; 15+ messages in thread
From: David Howells @ 2023-06-05 12:45 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Here are patches to do the following:
(1) Block MSG_SENDPAGE_* flags from leaking into ->sendmsg() from
userspace, whilst allowing splice_to_socket() to pass them in.
(2) Allow MSG_SPLICE_PAGES to be passed into tls_*_sendmsg(). Until
support is added, it will be ignored and a splice-driven sendmsg()
will be treated like a normal sendmsg(). TCP, UDP, AF_UNIX and
Chelsio-TLS already handle the flag in net-next.
(3) Replace a chain of functions to splice-to-sendpage with a single
function to splice via sendmsg() with MSG_SPLICE_PAGES. This allows a
bunch of pages to be spliced from a pipe in a single call using a
bio_vec[] and pushes the main processing loop down into the bowels of
the protocol driver rather than repeatedly calling in with a page at a
time.
(4) Provide a ->splice_eof() op[2] that allows splice to signal to its
output that the input observed a premature EOF and that the caller
didn't flag SPLICE_F_MORE, thereby allowing a corked socket to be
flushed. This attempts to maintain the current behaviour. It is also
not called if we didn't manage to read any data and so didn't called
the actor function.
This needs routing though several layers to get it down to the network
protocol.
[!] Note that I chose not to pass in any flags - I'm not sure it's
particularly useful to pass in the splice flags; I also elected
not to return any error code - though we might actually want to do
that.
(5) Provide tls_{device,sw}_splice_eof() to flush a pending TLS record if
there is one.
(6) Alter the behaviour of sendfile() and fix SPLICE_F_MORE/MSG_MORE
signalling[1] such SPLICE_F_MORE is always signalled until we have
read sufficient data to finish the request. If we get a zero-length
before we've managed to splice sufficient data, we now leave the
socket expecting more data and leave it to userspace to deal with it.
(7) Make AF_TLS handle the MSG_SPLICE_PAGES internal sendmsg flag.
MSG_SPLICE_PAGES is an internal hint that tells the protocol that it
should splice the pages supplied if it can. Its sendpage
implementations are then turned into wrappers around that.
I've pushed the patches here also:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-2-tls
David
Changes
=======
ver #4)
- Switch to using ->splice_eof() to signal premature EOF to the splice
output[2].
ver #3)
- Include the splice-to-socket rewrite patch.
- Fix SPLICE_F_MORE/MSG_MORE signalling.
- Allow AF_TLS to accept sendmsg() with MSG_SPLICE_PAGES before it is
handled.
- Allow a zero-length send() to a TLS socket to flush an outstanding
record.
- Address TLS kselftest failure.
ver #2)
- Dropped the slab data copying.
- "rls_" should be "tls_".
- Attempted to fix splice_direct_to_actor().
- Blocked MSG_SENDPAGE_* from being set by userspace.
Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ [2]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/20230524153311.3625329-1-dhowells@redhat.com/ # v1
David Howells (11):
net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg
splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()
splice, net: Add a splice_eof op to file-ops and socket-ops
tls/sw: Use splice_eof() to flush
tls/device: Use splice_eof() to flush
splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()
tls/sw: Support MSG_SPLICE_PAGES
tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
tls/device: Support MSG_SPLICE_PAGES
tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES
fs/splice.c | 207 ++++++++++++++++++++++++++-------
include/linux/fs.h | 3 +-
include/linux/net.h | 1 +
include/linux/socket.h | 4 +-
include/linux/splice.h | 3 +
include/net/sock.h | 1 +
net/socket.c | 36 ++----
net/tls/tls.h | 2 +
net/tls/tls_device.c | 110 +++++++++---------
net/tls/tls_main.c | 4 +
net/tls/tls_sw.c | 253 ++++++++++++++++++++++-------------------
11 files changed, 385 insertions(+), 239 deletions(-)
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH net-next v4 01/11] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
2023-06-05 12:45 [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
@ 2023-06-05 12:45 ` David Howells
2023-06-05 12:45 ` [PATCH net-next v4 02/11] tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg David Howells
` (9 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2023-06-05 12:45 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
It is necessary to allow MSG_SENDPAGE_* to be passed into ->sendmsg() to
allow sendmsg(MSG_SPLICE_PAGES) to replace ->sendpage(). Unblocking them
in the network protocol, however, allows these flags to be passed in by
userspace too[1].
Fix this by marking MSG_SENDPAGE_NOPOLICY, MSG_SENDPAGE_NOTLAST and
MSG_SENDPAGE_DECRYPTED as internal flags, which causes sendmsg() to object
if they are passed to sendmsg() by userspace. Network protocol ->sendmsg()
implementations can then allow them through.
Note that it should be possible to remove MSG_SENDPAGE_NOTLAST once
sendpage is removed as a whole slew of pages will be passed in in one go by
splice through sendmsg, with MSG_MORE being set if it has more data waiting
in the pipe.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20230526181338.03a99016@kernel.org/ [1]
---
include/linux/socket.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index bd1cc3238851..3fd3436bc09f 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -339,7 +339,9 @@ struct ucred {
#endif
/* Flags to be cleared on entry by sendmsg and sendmmsg syscalls */
-#define MSG_INTERNAL_SENDMSG_FLAGS (MSG_SPLICE_PAGES)
+#define MSG_INTERNAL_SENDMSG_FLAGS \
+ (MSG_SPLICE_PAGES | MSG_SENDPAGE_NOPOLICY | MSG_SENDPAGE_NOTLAST | \
+ MSG_SENDPAGE_DECRYPTED)
/* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
#define SOL_IP 0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v4 02/11] tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg
2023-06-05 12:45 [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
2023-06-05 12:45 ` [PATCH net-next v4 01/11] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace David Howells
@ 2023-06-05 12:45 ` David Howells
2023-06-05 12:45 ` [PATCH net-next v4 03/11] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage() David Howells
` (8 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2023-06-05 12:45 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Allow MSG_SPLICE_PAGES to be specified to sendmsg() but treat it as normal
sendmsg for now. This means the data will just be copied until
MSG_SPLICE_PAGES is handled.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
net/tls/tls_device.c | 3 ++-
net/tls/tls_sw.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a959572a816f..9ef766e41c7a 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -447,7 +447,8 @@ static int tls_push_data(struct sock *sk,
long timeo;
if (flags &
- ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST))
+ ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST |
+ MSG_SPLICE_PAGES))
return -EOPNOTSUPP;
if (unlikely(sk->sk_err))
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 6e6a7c37d685..cac1adc968e8 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -953,7 +953,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int pending;
if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
- MSG_CMSG_COMPAT))
+ MSG_CMSG_COMPAT | MSG_SPLICE_PAGES))
return -EOPNOTSUPP;
ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v4 03/11] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()
2023-06-05 12:45 [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
2023-06-05 12:45 ` [PATCH net-next v4 01/11] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace David Howells
2023-06-05 12:45 ` [PATCH net-next v4 02/11] tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg David Howells
@ 2023-06-05 12:45 ` David Howells
2023-06-05 14:50 ` Simon Horman
2023-06-05 12:45 ` [PATCH net-next v4 04/11] splice, net: Add a splice_eof op to file-ops and socket-ops David Howells
` (7 subsequent siblings)
10 siblings, 1 reply; 15+ messages in thread
From: David Howells @ 2023-06-05 12:45 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Replace generic_splice_sendpage() + splice_from_pipe + pipe_to_sendpage()
with a net-specific handler, splice_to_socket(), that calls sendmsg() with
MSG_SPLICE_PAGES set instead of calling ->sendpage().
MSG_MORE is used to indicate if the sendmsg() is expected to be followed
with more data.
This allows multiple pipe-buffer pages to be passed in a single call in a
BVEC iterator, allowing the processing to be pushed down to a loop in the
protocol driver. This helps pave the way for passing multipage folios down
too.
Protocols that haven't been converted to handle MSG_SPLICE_PAGES yet should
just ignore it and do a normal sendmsg() for now - although that may be a
bit slower as it may copy everything.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
fs/splice.c | 158 +++++++++++++++++++++++++++++++++--------
include/linux/fs.h | 2 -
include/linux/splice.h | 2 +
net/socket.c | 26 +------
4 files changed, 131 insertions(+), 57 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index 3e06611d19ae..9b1d43c0c562 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -33,6 +33,7 @@
#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/gfp.h>
+#include <linux/net.h>
#include <linux/socket.h>
#include <linux/sched/signal.h>
@@ -448,30 +449,6 @@ const struct pipe_buf_operations nosteal_pipe_buf_ops = {
};
EXPORT_SYMBOL(nosteal_pipe_buf_ops);
-/*
- * Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos'
- * using sendpage(). Return the number of bytes sent.
- */
-static int pipe_to_sendpage(struct pipe_inode_info *pipe,
- struct pipe_buffer *buf, struct splice_desc *sd)
-{
- struct file *file = sd->u.file;
- loff_t pos = sd->pos;
- int more;
-
- if (!likely(file->f_op->sendpage))
- return -EINVAL;
-
- more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
-
- if (sd->len < sd->total_len &&
- pipe_occupancy(pipe->head, pipe->tail) > 1)
- more |= MSG_SENDPAGE_NOTLAST;
-
- return file->f_op->sendpage(file, buf->page, buf->offset,
- sd->len, &pos, more);
-}
-
static void wakeup_pipe_writers(struct pipe_inode_info *pipe)
{
smp_mb();
@@ -652,7 +629,7 @@ static void splice_from_pipe_end(struct pipe_inode_info *pipe, struct splice_des
* Description:
* This function does little more than loop over the pipe and call
* @actor to do the actual moving of a single struct pipe_buffer to
- * the desired destination. See pipe_to_file, pipe_to_sendpage, or
+ * the desired destination. See pipe_to_file, pipe_to_sendmsg, or
* pipe_to_user.
*
*/
@@ -833,8 +810,9 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
EXPORT_SYMBOL(iter_file_splice_write);
+#ifdef CONFIG_NET
/**
- * generic_splice_sendpage - splice data from a pipe to a socket
+ * splice_to_socket - splice data from a pipe to a socket
* @pipe: pipe to splice from
* @out: socket to write to
* @ppos: position in @out
@@ -846,13 +824,131 @@ EXPORT_SYMBOL(iter_file_splice_write);
* is involved.
*
*/
-ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, struct file *out,
- loff_t *ppos, size_t len, unsigned int flags)
+ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags)
{
- return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_sendpage);
-}
+ struct socket *sock = sock_from_file(out);
+ struct bio_vec bvec[16];
+ struct msghdr msg = {};
+ ssize_t ret;
+ size_t spliced = 0;
+ bool need_wakeup = false;
+
+ pipe_lock(pipe);
+
+ while (len > 0) {
+ unsigned int head, tail, mask, bc = 0;
+ size_t remain = len;
+
+ /*
+ * Check for signal early to make process killable when there
+ * are always buffers available
+ */
+ ret = -ERESTARTSYS;
+ if (signal_pending(current))
+ break;
+
+ while (pipe_empty(pipe->head, pipe->tail)) {
+ ret = 0;
+ if (!pipe->writers)
+ goto out;
+
+ if (spliced)
+ goto out;
+
+ ret = -EAGAIN;
+ if (flags & SPLICE_F_NONBLOCK)
+ goto out;
+
+ ret = -ERESTARTSYS;
+ if (signal_pending(current))
+ goto out;
+
+ if (need_wakeup) {
+ wakeup_pipe_writers(pipe);
+ need_wakeup = false;
+ }
+
+ pipe_wait_readable(pipe);
+ }
+
+ head = pipe->head;
+ tail = pipe->tail;
+ mask = pipe->ring_size - 1;
+
+ while (!pipe_empty(head, tail)) {
+ struct pipe_buffer *buf = &pipe->bufs[tail & mask];
+ size_t seg;
-EXPORT_SYMBOL(generic_splice_sendpage);
+ if (!buf->len) {
+ tail++;
+ continue;
+ }
+
+ seg = min_t(size_t, remain, buf->len);
+ seg = min_t(size_t, seg, PAGE_SIZE);
+
+ ret = pipe_buf_confirm(pipe, buf);
+ if (unlikely(ret)) {
+ if (ret == -ENODATA)
+ ret = 0;
+ break;
+ }
+
+ bvec_set_page(&bvec[bc++], buf->page, seg, buf->offset);
+ remain -= seg;
+ if (seg >= buf->len)
+ tail++;
+ if (bc >= ARRAY_SIZE(bvec))
+ break;
+ }
+
+ if (!bc)
+ break;
+
+ msg.msg_flags = MSG_SPLICE_PAGES;
+ if (flags & SPLICE_F_MORE)
+ msg.msg_flags |= MSG_MORE;
+ if (remain && pipe_occupancy(pipe->head, tail) > 0)
+ msg.msg_flags |= MSG_MORE;
+
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvec, bc,
+ len - remain);
+ ret = sock_sendmsg(sock, &msg);
+ if (ret <= 0)
+ break;
+
+ spliced += ret;
+ len -= ret;
+ tail = pipe->tail;
+ while (ret > 0) {
+ struct pipe_buffer *buf = &pipe->bufs[tail & mask];
+ size_t seg = min_t(size_t, ret, buf->len);
+
+ buf->offset += seg;
+ buf->len -= seg;
+ ret -= seg;
+
+ if (!buf->len) {
+ pipe_buf_release(pipe, buf);
+ tail++;
+ }
+ }
+
+ if (tail != pipe->tail) {
+ pipe->tail = tail;
+ if (pipe->files)
+ need_wakeup = true;
+ }
+ }
+
+out:
+ pipe_unlock(pipe);
+ if (need_wakeup)
+ wakeup_pipe_writers(pipe);
+ return spliced ?: ret;
+}
+#endif
static int warn_unsupported(struct file *file, const char *op)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21a981680856..f8254c3acf83 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2759,8 +2759,6 @@ extern ssize_t generic_file_splice_read(struct file *, loff_t *,
struct pipe_inode_info *, size_t, unsigned int);
extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
struct file *, loff_t *, size_t, unsigned int);
-extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
- struct file *out, loff_t *, size_t len, unsigned int flags);
extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
loff_t *opos, size_t len, unsigned int flags);
diff --git a/include/linux/splice.h b/include/linux/splice.h
index a55179fd60fc..991ae318b6eb 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -84,6 +84,8 @@ extern long do_splice(struct file *in, loff_t *off_in,
extern long do_tee(struct file *in, struct file *out, size_t len,
unsigned int flags);
+extern ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags);
/*
* for dynamic pipe sizing
diff --git a/net/socket.c b/net/socket.c
index 3df96e9ba4e2..c4d9104418c8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -57,6 +57,7 @@
#include <linux/mm.h>
#include <linux/socket.h>
#include <linux/file.h>
+#include <linux/splice.h>
#include <linux/net.h>
#include <linux/interrupt.h>
#include <linux/thread_info.h>
@@ -126,8 +127,6 @@ static long compat_sock_ioctl(struct file *file,
unsigned int cmd, unsigned long arg);
#endif
static int sock_fasync(int fd, struct file *filp, int on);
-static ssize_t sock_sendpage(struct file *file, struct page *page,
- int offset, size_t size, loff_t *ppos, int more);
static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags);
@@ -162,8 +161,7 @@ static const struct file_operations socket_file_ops = {
.mmap = sock_mmap,
.release = sock_close,
.fasync = sock_fasync,
- .sendpage = sock_sendpage,
- .splice_write = generic_splice_sendpage,
+ .splice_write = splice_to_socket,
.splice_read = sock_splice_read,
.show_fdinfo = sock_show_fdinfo,
};
@@ -1066,26 +1064,6 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
}
EXPORT_SYMBOL(kernel_recvmsg);
-static ssize_t sock_sendpage(struct file *file, struct page *page,
- int offset, size_t size, loff_t *ppos, int more)
-{
- struct socket *sock;
- int flags;
- int ret;
-
- sock = file->private_data;
-
- flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
- /* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */
- flags |= more;
-
- ret = kernel_sendpage(sock, page, offset, size, flags);
-
- if (trace_sock_send_length_enabled())
- call_trace_sock_send_length(sock->sk, ret, 0);
- return ret;
-}
-
static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags)
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v4 04/11] splice, net: Add a splice_eof op to file-ops and socket-ops
2023-06-05 12:45 [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (2 preceding siblings ...)
2023-06-05 12:45 ` [PATCH net-next v4 03/11] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage() David Howells
@ 2023-06-05 12:45 ` David Howells
2023-06-05 12:45 ` [PATCH net-next v4 05/11] tls/sw: Use splice_eof() to flush David Howells
` (6 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2023-06-05 12:45 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, Christoph Hellwig, Al Viro, Jan Kara,
Jeff Layton, David Hildenbrand, Christian Brauner, linux-fsdevel,
linux-block
Add an optional method, ->splice_eof(), to allow splice to indicate the
premature termination of a splice to struct file_operations and struct
proto_ops.
This is called if sendfile() or splice() encounters all of the following
conditions inside splice_direct_to_actor():
(1) the user did not set SPLICE_F_MORE (splice only), and
(2) an EOF condition occurred (->splice_read() returned 0), and
(3) we haven't read enough to fulfill the request (ie. len > 0 still), and
(4) we have already spliced at least one byte.
A further patch will modify the behaviour of SPLICE_F_MORE to always be
passed to the actor if either the user set it or we haven't yet read
sufficient data to fulfill the request.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: David Hildenbrand <david@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
---
fs/splice.c | 31 ++++++++++++++++++++++++++++++-
include/linux/fs.h | 1 +
include/linux/net.h | 1 +
include/linux/splice.h | 1 +
include/net/sock.h | 1 +
net/socket.c | 10 ++++++++++
6 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/fs/splice.c b/fs/splice.c
index 9b1d43c0c562..3063f9a3ba62 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -969,6 +969,17 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
return out->f_op->splice_write(pipe, out, ppos, len, flags);
}
+/*
+ * Indicate to the caller that there was a premature EOF when reading from the
+ * source and the caller didn't indicate they would be sending more data after
+ * this.
+ */
+static void do_splice_eof(struct splice_desc *sd)
+{
+ if (sd->splice_eof)
+ sd->splice_eof(sd);
+}
+
/*
* Attempt to initiate a splice from a file to a pipe.
*/
@@ -1068,7 +1079,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
ret = do_splice_to(in, &pos, pipe, len, flags);
if (unlikely(ret <= 0))
- goto out_release;
+ goto read_failure;
read_len = ret;
sd->total_len = read_len;
@@ -1108,6 +1119,15 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
file_accessed(in);
return bytes;
+read_failure:
+ /*
+ * If the user did *not* set SPLICE_F_MORE *and* we didn't hit that
+ * "use all of len" case that cleared SPLICE_F_MORE, *and* we did a
+ * "->splice_in()" that returned EOF (ie zero) *and* we have sent at
+ * least 1 byte *then* we will also do the ->splice_eof() call.
+ */
+ if (ret == 0 && !more && len > 0 && bytes)
+ do_splice_eof(sd);
out_release:
/*
* If we did an incomplete transfer we must release
@@ -1136,6 +1156,14 @@ static int direct_splice_actor(struct pipe_inode_info *pipe,
sd->flags);
}
+static void direct_file_splice_eof(struct splice_desc *sd)
+{
+ struct file *file = sd->u.file;
+
+ if (file->f_op->splice_eof)
+ file->f_op->splice_eof(file);
+}
+
/**
* do_splice_direct - splices data directly between two files
* @in: file to splice from
@@ -1161,6 +1189,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
.flags = flags,
.pos = *ppos,
.u.file = out,
+ .splice_eof = direct_file_splice_eof,
.opos = opos,
};
long ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f8254c3acf83..e393f2550300 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1796,6 +1796,7 @@ struct file_operations {
int (*flock) (struct file *, int, struct file_lock *);
ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
+ void (*splice_eof)(struct file *file);
int (*setlease)(struct file *, long, struct file_lock **, void **);
long (*fallocate)(struct file *file, int mode, loff_t offset,
loff_t len);
diff --git a/include/linux/net.h b/include/linux/net.h
index b73ad8e3c212..8defc8f1d82e 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -210,6 +210,7 @@ struct proto_ops {
int offset, size_t size, int flags);
ssize_t (*splice_read)(struct socket *sock, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len, unsigned int flags);
+ void (*splice_eof)(struct socket *sock);
int (*set_peek_off)(struct sock *sk, int val);
int (*peek_len)(struct socket *sock);
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 991ae318b6eb..4fab18a6e371 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -38,6 +38,7 @@ struct splice_desc {
struct file *file; /* file to read/write */
void *data; /* cookie */
} u;
+ void (*splice_eof)(struct splice_desc *sd); /* Unexpected EOF handler */
loff_t pos; /* file position */
loff_t *opos; /* sendfile: output position */
size_t num_spliced; /* number of bytes already spliced */
diff --git a/include/net/sock.h b/include/net/sock.h
index 656ea89f60ff..330b9c24ef70 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1267,6 +1267,7 @@ struct proto {
size_t len, int flags, int *addr_len);
int (*sendpage)(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
+ void (*splice_eof)(struct socket *sock);
int (*bind)(struct sock *sk,
struct sockaddr *addr, int addr_len);
int (*bind_add)(struct sock *sk,
diff --git a/net/socket.c b/net/socket.c
index c4d9104418c8..b778fc03c6e0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -130,6 +130,7 @@ static int sock_fasync(int fd, struct file *filp, int on);
static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags);
+static void sock_splice_eof(struct file *file);
#ifdef CONFIG_PROC_FS
static void sock_show_fdinfo(struct seq_file *m, struct file *f)
@@ -163,6 +164,7 @@ static const struct file_operations socket_file_ops = {
.fasync = sock_fasync,
.splice_write = splice_to_socket,
.splice_read = sock_splice_read,
+ .splice_eof = sock_splice_eof,
.show_fdinfo = sock_show_fdinfo,
};
@@ -1076,6 +1078,14 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
return sock->ops->splice_read(sock, ppos, pipe, len, flags);
}
+static void sock_splice_eof(struct file *file)
+{
+ struct socket *sock = file->private_data;
+
+ if (sock->ops->splice_eof)
+ sock->ops->splice_eof(sock);
+}
+
static ssize_t sock_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
struct file *file = iocb->ki_filp;
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v4 05/11] tls/sw: Use splice_eof() to flush
2023-06-05 12:45 [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (3 preceding siblings ...)
2023-06-05 12:45 ` [PATCH net-next v4 04/11] splice, net: Add a splice_eof op to file-ops and socket-ops David Howells
@ 2023-06-05 12:45 ` David Howells
2023-06-05 12:45 ` [PATCH net-next v4 06/11] tls/device: " David Howells
` (5 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2023-06-05 12:45 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Allow splice to end a TLS record after prematurely ending a splice/sendfile
due to getting an EOF condition (->splice_read() returned 0) after splice
had called TLS with a sendmsg() with MSG_MORE set when the user didn't set
MSG_MORE.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
net/tls/tls.h | 1 +
net/tls/tls_main.c | 2 ++
net/tls/tls_sw.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 77 insertions(+)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index 0672acab2773..4922668fefaa 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -97,6 +97,7 @@ void tls_update_rx_zc_capable(struct tls_context *tls_ctx);
void tls_sw_strparser_arm(struct sock *sk, struct tls_context *ctx);
void tls_sw_strparser_done(struct tls_context *tls_ctx);
int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
+void tls_sw_splice_eof(struct socket *sock);
int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
int tls_sw_sendpage(struct sock *sk, struct page *page,
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 3d45fdb5c4e9..83fa15e52af6 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -924,6 +924,7 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
ops[TLS_BASE][TLS_BASE] = *base;
ops[TLS_SW ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
+ ops[TLS_SW ][TLS_BASE].splice_eof = tls_sw_splice_eof;
ops[TLS_SW ][TLS_BASE].sendpage_locked = tls_sw_sendpage_locked;
ops[TLS_BASE][TLS_SW ] = ops[TLS_BASE][TLS_BASE];
@@ -992,6 +993,7 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
prot[TLS_SW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
prot[TLS_SW][TLS_BASE].sendmsg = tls_sw_sendmsg;
+ prot[TLS_SW][TLS_BASE].splice_eof = tls_sw_splice_eof;
prot[TLS_SW][TLS_BASE].sendpage = tls_sw_sendpage;
prot[TLS_BASE][TLS_SW] = prot[TLS_BASE][TLS_BASE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index cac1adc968e8..7a6bb670073f 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1155,6 +1155,80 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
return copied > 0 ? copied : ret;
}
+/*
+ * Handle unexpected EOF during splice without SPLICE_F_MORE set.
+ */
+void tls_sw_splice_eof(struct socket *sock)
+{
+ struct sock *sk = sock->sk;
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
+ struct tls_rec *rec;
+ struct sk_msg *msg_pl;
+ ssize_t copied = 0;
+ bool retrying = false;
+ int ret = 0;
+ int pending;
+
+ if (!ctx->open_rec)
+ return;
+
+ mutex_lock(&tls_ctx->tx_lock);
+ lock_sock(sk);
+
+retry:
+ rec = ctx->open_rec;
+ if (!rec)
+ goto unlock;
+
+ msg_pl = &rec->msg_plaintext;
+
+ /* Check the BPF advisor and perform transmission. */
+ ret = bpf_exec_tx_verdict(msg_pl, sk, false, TLS_RECORD_TYPE_DATA,
+ &copied, 0);
+ switch (ret) {
+ case 0:
+ case -EAGAIN:
+ if (retrying)
+ goto unlock;
+ retrying = true;
+ goto retry;
+ case -EINPROGRESS:
+ break;
+ default:
+ goto unlock;
+ }
+
+ /* Wait for pending encryptions to get completed */
+ spin_lock_bh(&ctx->encrypt_compl_lock);
+ ctx->async_notify = true;
+
+ pending = atomic_read(&ctx->encrypt_pending);
+ spin_unlock_bh(&ctx->encrypt_compl_lock);
+ if (pending)
+ crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+ else
+ reinit_completion(&ctx->async_wait.completion);
+
+ /* There can be no concurrent accesses, since we have no pending
+ * encrypt operations
+ */
+ WRITE_ONCE(ctx->async_notify, false);
+
+ if (ctx->async_wait.err)
+ goto unlock;
+
+ /* Transmit if any encryptions have completed */
+ if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
+ cancel_delayed_work(&ctx->tx_work.work);
+ tls_tx_records(sk, 0);
+ }
+
+unlock:
+ release_sock(sk);
+ mutex_unlock(&tls_ctx->tx_lock);
+}
+
static int tls_sw_do_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v4 06/11] tls/device: Use splice_eof() to flush
2023-06-05 12:45 [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (4 preceding siblings ...)
2023-06-05 12:45 ` [PATCH net-next v4 05/11] tls/sw: Use splice_eof() to flush David Howells
@ 2023-06-05 12:45 ` David Howells
2023-06-05 12:45 ` [PATCH net-next v4 07/11] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor() David Howells
` (4 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2023-06-05 12:45 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Allow splice to end a TLS record after prematurely ending a splice/sendfile
due to getting an EOF condition (->splice_read() returned 0) after splice
had called TLS with a sendmsg() with MSG_MORE set when the user didn't set
MSG_MORE.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
net/tls/tls.h | 1 +
net/tls/tls_device.c | 23 +++++++++++++++++++++++
net/tls/tls_main.c | 2 ++
3 files changed, 26 insertions(+)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index 4922668fefaa..d002c3af1966 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -116,6 +116,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
size_t len, unsigned int flags);
int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
+void tls_device_splice_eof(struct socket *sock);
int tls_device_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
int tls_tx_records(struct sock *sk, int flags);
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 9ef766e41c7a..439be833dcf9 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -590,6 +590,29 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
return rc;
}
+void tls_device_splice_eof(struct socket *sock)
+{
+ struct sock *sk = sock->sk;
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ union tls_iter_offset iter;
+ struct iov_iter iov_iter = {};
+
+ if (!tls_is_partially_sent_record(tls_ctx))
+ return;
+
+ mutex_lock(&tls_ctx->tx_lock);
+ lock_sock(sk);
+
+ if (tls_is_partially_sent_record(tls_ctx)) {
+ iov_iter_bvec(&iov_iter, ITER_SOURCE, NULL, 0, 0);
+ iter.msg_iter = &iov_iter;
+ tls_push_data(sk, iter, 0, 0, TLS_RECORD_TYPE_DATA, NULL);
+ }
+
+ release_sock(sk);
+ mutex_unlock(&tls_ctx->tx_lock);
+}
+
int tls_device_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 83fa15e52af6..113f3ff91d6c 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -1009,10 +1009,12 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
#ifdef CONFIG_TLS_DEVICE
prot[TLS_HW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
prot[TLS_HW][TLS_BASE].sendmsg = tls_device_sendmsg;
+ prot[TLS_HW][TLS_BASE].splice_eof = tls_device_splice_eof;
prot[TLS_HW][TLS_BASE].sendpage = tls_device_sendpage;
prot[TLS_HW][TLS_SW] = prot[TLS_BASE][TLS_SW];
prot[TLS_HW][TLS_SW].sendmsg = tls_device_sendmsg;
+ prot[TLS_HW][TLS_SW].splice_eof = tls_device_splice_eof;
prot[TLS_HW][TLS_SW].sendpage = tls_device_sendpage;
prot[TLS_BASE][TLS_HW] = prot[TLS_BASE][TLS_SW];
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v4 07/11] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()
2023-06-05 12:45 [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (5 preceding siblings ...)
2023-06-05 12:45 ` [PATCH net-next v4 06/11] tls/device: " David Howells
@ 2023-06-05 12:45 ` David Howells
2023-06-05 12:45 ` [PATCH net-next v4 08/11] tls/sw: Support MSG_SPLICE_PAGES David Howells
` (3 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2023-06-05 12:45 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, Christoph Hellwig, Al Viro, Jan Kara,
Jeff Layton, David Hildenbrand, Christian Brauner, linux-fsdevel,
linux-block
splice_direct_to_actor() doesn't manage SPLICE_F_MORE correctly[1] - and,
as a result, it incorrectly signals/fails to signal MSG_MORE when splicing
to a socket. The problem I'm seeing happens when a short splice occurs
because we got a short read due to hitting the EOF on a file: as the length
read (read_len) is less than the remaining size to be spliced (len),
SPLICE_F_MORE (and thus MSG_MORE) is set.
The issue is that, for the moment, we have no way to know *why* the short
read occurred and so can't make a good decision on whether we *should* keep
MSG_MORE set.
MSG_SENDPAGE_NOTLAST was added to work around this, but that is also set
incorrectly under some circumstances - for example if a short read fills a
single pipe_buffer, but the next read would return more (seqfile can do
this).
This was observed with the multi_chunk_sendfile tests in the tls kselftest
program. Some of those tests would hang and time out when the last chunk
of file was less than the sendfile request size:
build/kselftest/net/tls -r tls.12_aes_gcm.multi_chunk_sendfile
This has been observed before[2] and worked around in AF_TLS[3].
Fix this by making splice_direct_to_actor() always signal SPLICE_F_MORE if
we haven't yet hit the requested operation size. SPLICE_F_MORE remains
signalled if the user passed it in to splice() but otherwise gets cleared
when we've read sufficient data to fulfill the request.
If, however, we get a premature EOF from ->splice_read(), have sent at
least one byte and SPLICE_F_MORE was not set by the caller, ->splice_eof()
will be invoked.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: David Hildenbrand <david@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/ [2]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a [3]
---
Notes:
ver #4)
- Use ->splice_eof() to signal a premature EOF to the splice output.
fs/splice.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index 3063f9a3ba62..1be3ba622b0c 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1063,13 +1063,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
*/
bytes = 0;
len = sd->total_len;
+
+ /* Don't block on output, we have to drain the direct pipe. */
flags = sd->flags;
+ sd->flags &= ~SPLICE_F_NONBLOCK;
/*
- * Don't block on output, we have to drain the direct pipe.
+ * We signal MORE until we've read sufficient data to fulfill the
+ * request and we keep signalling it if the caller set it.
*/
- sd->flags &= ~SPLICE_F_NONBLOCK;
more = sd->flags & SPLICE_F_MORE;
+ sd->flags |= SPLICE_F_MORE;
WARN_ON_ONCE(!pipe_empty(pipe->head, pipe->tail));
@@ -1085,14 +1089,12 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
sd->total_len = read_len;
/*
- * If more data is pending, set SPLICE_F_MORE
- * If this is the last data and SPLICE_F_MORE was not set
- * initially, clears it.
+ * If we now have sufficient data to fulfill the request then
+ * we clear SPLICE_F_MORE if it was not set initially.
*/
- if (read_len < len)
- sd->flags |= SPLICE_F_MORE;
- else if (!more)
+ if (read_len >= len && !more)
sd->flags &= ~SPLICE_F_MORE;
+
/*
* NOTE: nonblocking mode only applies to the input. We
* must not do the output in nonblocking mode as then we
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v4 08/11] tls/sw: Support MSG_SPLICE_PAGES
2023-06-05 12:45 [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (6 preceding siblings ...)
2023-06-05 12:45 ` [PATCH net-next v4 07/11] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor() David Howells
@ 2023-06-05 12:45 ` David Howells
2023-06-05 12:45 ` [PATCH net-next v4 09/11] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES David Howells
` (2 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2023-06-05 12:45 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Make TLS's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced from the source iterator if possible.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
Notes:
ver #2)
- "rls_" should be "tls_".
net/tls/tls_sw.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 45 insertions(+), 1 deletion(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 7a6bb670073f..b85f92be7c9d 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -929,6 +929,38 @@ static int tls_sw_push_pending_record(struct sock *sk, int flags)
&copied, flags);
}
+static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg,
+ struct sk_msg *msg_pl, size_t try_to_copy,
+ ssize_t *copied)
+{
+ struct page *page = NULL, **pages = &page;
+
+ do {
+ ssize_t part;
+ size_t off;
+ bool put = false;
+
+ part = iov_iter_extract_pages(&msg->msg_iter, &pages,
+ try_to_copy, 1, 0, &off);
+ if (part <= 0)
+ return part ?: -EIO;
+
+ if (WARN_ON_ONCE(!sendpage_ok(page))) {
+ iov_iter_revert(&msg->msg_iter, part);
+ return -EIO;
+ }
+
+ sk_msg_page_add(msg_pl, page, part, off);
+ sk_mem_charge(sk, part);
+ if (put)
+ put_page(page);
+ *copied += part;
+ try_to_copy -= part;
+ } while (try_to_copy && !sk_msg_full(msg_pl));
+
+ return 0;
+}
+
int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
@@ -1018,6 +1050,17 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
full_record = true;
}
+ if (try_to_copy && (msg->msg_flags & MSG_SPLICE_PAGES)) {
+ ret = tls_sw_sendmsg_splice(sk, msg, msg_pl,
+ try_to_copy, &copied);
+ if (ret < 0)
+ goto send_end;
+ tls_ctx->pending_open_record_frags = true;
+ if (full_record || eor || sk_msg_full(msg_pl))
+ goto copied;
+ continue;
+ }
+
if (!is_kvec && (full_record || eor) && !async_capable) {
u32 first = msg_pl->sg.end;
@@ -1080,8 +1123,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
/* Open records defined only if successfully copied, otherwise
* we would trim the sg but not reset the open record frags.
*/
- tls_ctx->pending_open_record_frags = true;
copied += try_to_copy;
+copied:
+ tls_ctx->pending_open_record_frags = true;
if (full_record || eor) {
ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
record_type, &copied,
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v4 09/11] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
2023-06-05 12:45 [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (7 preceding siblings ...)
2023-06-05 12:45 ` [PATCH net-next v4 08/11] tls/sw: Support MSG_SPLICE_PAGES David Howells
@ 2023-06-05 12:45 ` David Howells
2023-06-05 12:45 ` [PATCH net-next v4 10/11] tls/device: Support MSG_SPLICE_PAGES David Howells
2023-06-05 12:46 ` [PATCH net-next v4 11/11] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES David Howells
10 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2023-06-05 12:45 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, bpf
Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg()
with MSG_SPLICE_PAGES rather than directly splicing in the pages itself.
[!] Note that tls_sw_sendpage_locked() appears to have the wrong locking
upstream. I think the caller will only hold the socket lock, but it
should hold tls_ctx->tx_lock too.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
cc: bpf@vger.kernel.org
---
net/tls/tls_sw.c | 173 ++++++++++-------------------------------------
1 file changed, 35 insertions(+), 138 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index b85f92be7c9d..5ffb8de862f6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -961,7 +961,8 @@ static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg,
return 0;
}
-int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
+ size_t size)
{
long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
struct tls_context *tls_ctx = tls_get_ctx(sk);
@@ -984,15 +985,6 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int ret = 0;
int pending;
- if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
- MSG_CMSG_COMPAT | MSG_SPLICE_PAGES))
- return -EOPNOTSUPP;
-
- ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
- if (ret)
- return ret;
- lock_sock(sk);
-
if (unlikely(msg->msg_controllen)) {
ret = tls_process_cmsg(sk, msg, &record_type);
if (ret) {
@@ -1193,10 +1185,27 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
send_end:
ret = sk_stream_error(sk, msg->msg_flags, ret);
+ return copied > 0 ? copied : ret;
+}
+int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+{
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ int ret;
+
+ if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+ MSG_CMSG_COMPAT | MSG_SPLICE_PAGES |
+ MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
+ return -EOPNOTSUPP;
+
+ ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
+ if (ret)
+ return ret;
+ lock_sock(sk);
+ ret = tls_sw_sendmsg_locked(sk, msg, size);
release_sock(sk);
mutex_unlock(&tls_ctx->tx_lock);
- return copied > 0 ? copied : ret;
+ return ret;
}
/*
@@ -1273,151 +1282,39 @@ void tls_sw_splice_eof(struct socket *sock)
mutex_unlock(&tls_ctx->tx_lock);
}
-static int tls_sw_do_sendpage(struct sock *sk, struct page *page,
- int offset, size_t size, int flags)
-{
- long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
- struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
- struct tls_prot_info *prot = &tls_ctx->prot_info;
- unsigned char record_type = TLS_RECORD_TYPE_DATA;
- struct sk_msg *msg_pl;
- struct tls_rec *rec;
- int num_async = 0;
- ssize_t copied = 0;
- bool full_record;
- int record_room;
- int ret = 0;
- bool eor;
-
- eor = !(flags & MSG_SENDPAGE_NOTLAST);
- sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
-
- /* Call the sk_stream functions to manage the sndbuf mem. */
- while (size > 0) {
- size_t copy, required_size;
-
- if (sk->sk_err) {
- ret = -sk->sk_err;
- goto sendpage_end;
- }
-
- if (ctx->open_rec)
- rec = ctx->open_rec;
- else
- rec = ctx->open_rec = tls_get_rec(sk);
- if (!rec) {
- ret = -ENOMEM;
- goto sendpage_end;
- }
-
- msg_pl = &rec->msg_plaintext;
-
- full_record = false;
- record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size;
- copy = size;
- if (copy >= record_room) {
- copy = record_room;
- full_record = true;
- }
-
- required_size = msg_pl->sg.size + copy + prot->overhead_size;
-
- if (!sk_stream_memory_free(sk))
- goto wait_for_sndbuf;
-alloc_payload:
- ret = tls_alloc_encrypted_msg(sk, required_size);
- if (ret) {
- if (ret != -ENOSPC)
- goto wait_for_memory;
-
- /* Adjust copy according to the amount that was
- * actually allocated. The difference is due
- * to max sg elements limit
- */
- copy -= required_size - msg_pl->sg.size;
- full_record = true;
- }
-
- sk_msg_page_add(msg_pl, page, copy, offset);
- sk_mem_charge(sk, copy);
-
- offset += copy;
- size -= copy;
- copied += copy;
-
- tls_ctx->pending_open_record_frags = true;
- if (full_record || eor || sk_msg_full(msg_pl)) {
- ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
- record_type, &copied, flags);
- if (ret) {
- if (ret == -EINPROGRESS)
- num_async++;
- else if (ret == -ENOMEM)
- goto wait_for_memory;
- else if (ret != -EAGAIN) {
- if (ret == -ENOSPC)
- ret = 0;
- goto sendpage_end;
- }
- }
- }
- continue;
-wait_for_sndbuf:
- set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-wait_for_memory:
- ret = sk_stream_wait_memory(sk, &timeo);
- if (ret) {
- if (ctx->open_rec)
- tls_trim_both_msgs(sk, msg_pl->sg.size);
- goto sendpage_end;
- }
-
- if (ctx->open_rec)
- goto alloc_payload;
- }
-
- if (num_async) {
- /* Transmit if any encryptions have completed */
- if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
- cancel_delayed_work(&ctx->tx_work.work);
- tls_tx_records(sk, flags);
- }
- }
-sendpage_end:
- ret = sk_stream_error(sk, flags, ret);
- return copied > 0 ? copied : ret;
-}
-
int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
+ struct bio_vec bvec;
+ struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
+
if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY |
MSG_NO_SHARED_FRAGS))
return -EOPNOTSUPP;
+ if (flags & MSG_SENDPAGE_NOTLAST)
+ msg.msg_flags |= MSG_MORE;
- return tls_sw_do_sendpage(sk, page, offset, size, flags);
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ return tls_sw_sendmsg_locked(sk, &msg, size);
}
int tls_sw_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
- struct tls_context *tls_ctx = tls_get_ctx(sk);
- int ret;
+ struct bio_vec bvec;
+ struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
return -EOPNOTSUPP;
+ if (flags & MSG_SENDPAGE_NOTLAST)
+ msg.msg_flags |= MSG_MORE;
- ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
- if (ret)
- return ret;
- lock_sock(sk);
- ret = tls_sw_do_sendpage(sk, page, offset, size, flags);
- release_sock(sk);
- mutex_unlock(&tls_ctx->tx_lock);
- return ret;
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ return tls_sw_sendmsg(sk, &msg, size);
}
static int
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v4 10/11] tls/device: Support MSG_SPLICE_PAGES
2023-06-05 12:45 [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (8 preceding siblings ...)
2023-06-05 12:45 ` [PATCH net-next v4 09/11] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES David Howells
@ 2023-06-05 12:45 ` David Howells
2023-06-05 12:46 ` [PATCH net-next v4 11/11] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES David Howells
10 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2023-06-05 12:45 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Make TLS's device sendmsg() support MSG_SPLICE_PAGES. This causes pages to
be spliced from the source iterator if possible.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
net/tls/tls_device.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 439be833dcf9..bb3bb523544e 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -509,6 +509,29 @@ static int tls_push_data(struct sock *sk,
tls_append_frag(record, &zc_pfrag, copy);
iter_offset.offset += copy;
+ } else if (copy && (flags & MSG_SPLICE_PAGES)) {
+ struct page_frag zc_pfrag;
+ struct page **pages = &zc_pfrag.page;
+ size_t off;
+
+ rc = iov_iter_extract_pages(iter_offset.msg_iter,
+ &pages, copy, 1, 0, &off);
+ if (rc <= 0) {
+ if (rc == 0)
+ rc = -EIO;
+ goto handle_error;
+ }
+ copy = rc;
+
+ if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
+ iov_iter_revert(iter_offset.msg_iter, copy);
+ rc = -EIO;
+ goto handle_error;
+ }
+
+ zc_pfrag.offset = off;
+ zc_pfrag.size = copy;
+ tls_append_frag(record, &zc_pfrag, copy);
} else if (copy) {
copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
@@ -572,6 +595,9 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
union tls_iter_offset iter;
int rc;
+ if (!tls_ctx->zerocopy_sendfile)
+ msg->msg_flags &= ~MSG_SPLICE_PAGES;
+
mutex_lock(&tls_ctx->tx_lock);
lock_sock(sk);
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v4 11/11] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES
2023-06-05 12:45 [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (9 preceding siblings ...)
2023-06-05 12:45 ` [PATCH net-next v4 10/11] tls/device: Support MSG_SPLICE_PAGES David Howells
@ 2023-06-05 12:46 ` David Howells
10 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2023-06-05 12:46 UTC (permalink / raw)
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself. With that, the tls_iter_offset
union is no longer necessary and can be replaced with an iov_iter pointer
and the zc_page argument to tls_push_data() can also be removed.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
net/tls/tls_device.c | 92 +++++++++++---------------------------------
1 file changed, 23 insertions(+), 69 deletions(-)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index bb3bb523544e..b4864d55900f 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -422,16 +422,10 @@ static int tls_device_copy_data(void *addr, size_t bytes, struct iov_iter *i)
return 0;
}
-union tls_iter_offset {
- struct iov_iter *msg_iter;
- int offset;
-};
-
static int tls_push_data(struct sock *sk,
- union tls_iter_offset iter_offset,
+ struct iov_iter *iter,
size_t size, int flags,
- unsigned char record_type,
- struct page *zc_page)
+ unsigned char record_type)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_prot_info *prot = &tls_ctx->prot_info;
@@ -500,22 +494,13 @@ static int tls_push_data(struct sock *sk,
record = ctx->open_record;
copy = min_t(size_t, size, max_open_record_len - record->len);
- if (copy && zc_page) {
- struct page_frag zc_pfrag;
-
- zc_pfrag.page = zc_page;
- zc_pfrag.offset = iter_offset.offset;
- zc_pfrag.size = copy;
- tls_append_frag(record, &zc_pfrag, copy);
-
- iter_offset.offset += copy;
- } else if (copy && (flags & MSG_SPLICE_PAGES)) {
+ if (copy && (flags & MSG_SPLICE_PAGES)) {
struct page_frag zc_pfrag;
struct page **pages = &zc_pfrag.page;
size_t off;
- rc = iov_iter_extract_pages(iter_offset.msg_iter,
- &pages, copy, 1, 0, &off);
+ rc = iov_iter_extract_pages(iter, &pages,
+ copy, 1, 0, &off);
if (rc <= 0) {
if (rc == 0)
rc = -EIO;
@@ -524,7 +509,7 @@ static int tls_push_data(struct sock *sk,
copy = rc;
if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
- iov_iter_revert(iter_offset.msg_iter, copy);
+ iov_iter_revert(iter, copy);
rc = -EIO;
goto handle_error;
}
@@ -537,7 +522,7 @@ static int tls_push_data(struct sock *sk,
rc = tls_device_copy_data(page_address(pfrag->page) +
pfrag->offset, copy,
- iter_offset.msg_iter);
+ iter);
if (rc)
goto handle_error;
tls_append_frag(record, pfrag, copy);
@@ -592,7 +577,6 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
unsigned char record_type = TLS_RECORD_TYPE_DATA;
struct tls_context *tls_ctx = tls_get_ctx(sk);
- union tls_iter_offset iter;
int rc;
if (!tls_ctx->zerocopy_sendfile)
@@ -607,8 +591,8 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
goto out;
}
- iter.msg_iter = &msg->msg_iter;
- rc = tls_push_data(sk, iter, size, msg->msg_flags, record_type, NULL);
+ rc = tls_push_data(sk, &msg->msg_iter, size, msg->msg_flags,
+ record_type);
out:
release_sock(sk);
@@ -620,8 +604,7 @@ void tls_device_splice_eof(struct socket *sock)
{
struct sock *sk = sock->sk;
struct tls_context *tls_ctx = tls_get_ctx(sk);
- union tls_iter_offset iter;
- struct iov_iter iov_iter = {};
+ struct iov_iter iter = {};
if (!tls_is_partially_sent_record(tls_ctx))
return;
@@ -630,9 +613,8 @@ void tls_device_splice_eof(struct socket *sock)
lock_sock(sk);
if (tls_is_partially_sent_record(tls_ctx)) {
- iov_iter_bvec(&iov_iter, ITER_SOURCE, NULL, 0, 0);
- iter.msg_iter = &iov_iter;
- tls_push_data(sk, iter, 0, 0, TLS_RECORD_TYPE_DATA, NULL);
+ iov_iter_bvec(&iter, ITER_SOURCE, NULL, 0, 0);
+ tls_push_data(sk, &iter, 0, 0, TLS_RECORD_TYPE_DATA);
}
release_sock(sk);
@@ -642,44 +624,18 @@ void tls_device_splice_eof(struct socket *sock)
int tls_device_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
- struct tls_context *tls_ctx = tls_get_ctx(sk);
- union tls_iter_offset iter_offset;
- struct iov_iter msg_iter;
- char *kaddr;
- struct kvec iov;
- int rc;
+ struct bio_vec bvec;
+ struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
if (flags & MSG_SENDPAGE_NOTLAST)
- flags |= MSG_MORE;
-
- mutex_lock(&tls_ctx->tx_lock);
- lock_sock(sk);
+ msg.msg_flags |= MSG_MORE;
- if (flags & MSG_OOB) {
- rc = -EOPNOTSUPP;
- goto out;
- }
-
- if (tls_ctx->zerocopy_sendfile) {
- iter_offset.offset = offset;
- rc = tls_push_data(sk, iter_offset, size,
- flags, TLS_RECORD_TYPE_DATA, page);
- goto out;
- }
-
- kaddr = kmap(page);
- iov.iov_base = kaddr + offset;
- iov.iov_len = size;
- iov_iter_kvec(&msg_iter, ITER_SOURCE, &iov, 1, size);
- iter_offset.msg_iter = &msg_iter;
- rc = tls_push_data(sk, iter_offset, size, flags, TLS_RECORD_TYPE_DATA,
- NULL);
- kunmap(page);
+ if (flags & MSG_OOB)
+ return -EOPNOTSUPP;
-out:
- release_sock(sk);
- mutex_unlock(&tls_ctx->tx_lock);
- return rc;
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ return tls_device_sendmsg(sk, &msg, size);
}
struct tls_record_info *tls_get_record(struct tls_offload_context_tx *context,
@@ -744,12 +700,10 @@ EXPORT_SYMBOL(tls_get_record);
static int tls_device_push_pending_record(struct sock *sk, int flags)
{
- union tls_iter_offset iter;
- struct iov_iter msg_iter;
+ struct iov_iter iter;
- iov_iter_kvec(&msg_iter, ITER_SOURCE, NULL, 0, 0);
- iter.msg_iter = &msg_iter;
- return tls_push_data(sk, iter, 0, flags, TLS_RECORD_TYPE_DATA, NULL);
+ iov_iter_kvec(&iter, ITER_SOURCE, NULL, 0, 0);
+ return tls_push_data(sk, &iter, 0, flags, TLS_RECORD_TYPE_DATA);
}
void tls_device_write_space(struct sock *sk, struct tls_context *ctx)
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v4 03/11] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()
2023-06-05 12:45 ` [PATCH net-next v4 03/11] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage() David Howells
@ 2023-06-05 14:50 ` Simon Horman
2023-06-05 15:10 ` David Howells
0 siblings, 1 reply; 15+ messages in thread
From: Simon Horman @ 2023-06-05 14:50 UTC (permalink / raw)
To: David Howells
Cc: netdev, Linus Torvalds, Chuck Lever, Boris Pismenny,
John Fastabend, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
Jens Axboe, linux-mm, linux-kernel
On Mon, Jun 05, 2023 at 01:45:52PM +0100, David Howells wrote:
...
> @@ -846,13 +824,131 @@ EXPORT_SYMBOL(iter_file_splice_write);
> * is involved.
> *
> */
> -ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, struct file *out,
> - loff_t *ppos, size_t len, unsigned int flags)
> +ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
> + loff_t *ppos, size_t len, unsigned int flags)
> {
> - return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_sendpage);
> -}
> + struct socket *sock = sock_from_file(out);
> + struct bio_vec bvec[16];
> + struct msghdr msg = {};
> + ssize_t ret;
> + size_t spliced = 0;
> + bool need_wakeup = false;
> +
> + pipe_lock(pipe);
> +
> + while (len > 0) {
Hi David,
I'm assuming the answer is that this cannot occur,
but I thought I should mention this anyway.
If the initial value of len is 0 (or less).
...
> +
> +out:
> + pipe_unlock(pipe);
> + if (need_wakeup)
> + wakeup_pipe_writers(pipe);
> + return spliced ?: ret;
Then ret will be used uninitialised here.
> +}
> +#endif
>
> static int warn_unsupported(struct file *file, const char *op)
> {
...
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v4 03/11] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()
2023-06-05 14:50 ` Simon Horman
@ 2023-06-05 15:10 ` David Howells
2023-06-05 15:18 ` Simon Horman
0 siblings, 1 reply; 15+ messages in thread
From: David Howells @ 2023-06-05 15:10 UTC (permalink / raw)
To: Simon Horman
Cc: dhowells, netdev, Linus Torvalds, Chuck Lever, Boris Pismenny,
John Fastabend, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
Jens Axboe, linux-mm, linux-kernel
Simon Horman <simon.horman@corigine.com> wrote:
> I'm assuming the answer is that this cannot occur,
> but I thought I should mention this anyway.
>
> If the initial value of len is 0 (or less).
> ...
> > + return spliced ?: ret;
>
> Then ret will be used uninitialised here.
len shouldn't be <0 as it's size_t.
I don't think it should be possible to get there with len==0 - at least from
userspace. sys_splice() returns immediately and sys_sendfile() either splices
to a pipe or goes via splice_direct_to_actor() will just drop straight out.
But there are kernel users - nfsd for example - but I don't know if they would
splice directly to a socket.
That said, it's probably worth preclearing ret just to be sure.
David
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v4 03/11] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()
2023-06-05 15:10 ` David Howells
@ 2023-06-05 15:18 ` Simon Horman
0 siblings, 0 replies; 15+ messages in thread
From: Simon Horman @ 2023-06-05 15:18 UTC (permalink / raw)
To: David Howells
Cc: netdev, Linus Torvalds, Chuck Lever, Boris Pismenny,
John Fastabend, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
Jens Axboe, linux-mm, linux-kernel
On Mon, Jun 05, 2023 at 04:10:57PM +0100, David Howells wrote:
> Simon Horman <simon.horman@corigine.com> wrote:
>
> > I'm assuming the answer is that this cannot occur,
> > but I thought I should mention this anyway.
> >
> > If the initial value of len is 0 (or less).
> > ...
> > > + return spliced ?: ret;
> >
> > Then ret will be used uninitialised here.
>
> len shouldn't be <0 as it's size_t.
Yes, sorry for not noticing that.
> I don't think it should be possible to get there with len==0 - at least from
> userspace. sys_splice() returns immediately and sys_sendfile() either splices
> to a pipe or goes via splice_direct_to_actor() will just drop straight out.
> But there are kernel users - nfsd for example - but I don't know if they would
> splice directly to a socket.
>
> That said, it's probably worth preclearing ret just to be sure.
Thanks, I agree it's better to be safe than sorry.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2023-06-05 15:19 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-05 12:45 [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
2023-06-05 12:45 ` [PATCH net-next v4 01/11] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace David Howells
2023-06-05 12:45 ` [PATCH net-next v4 02/11] tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg David Howells
2023-06-05 12:45 ` [PATCH net-next v4 03/11] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage() David Howells
2023-06-05 14:50 ` Simon Horman
2023-06-05 15:10 ` David Howells
2023-06-05 15:18 ` Simon Horman
2023-06-05 12:45 ` [PATCH net-next v4 04/11] splice, net: Add a splice_eof op to file-ops and socket-ops David Howells
2023-06-05 12:45 ` [PATCH net-next v4 05/11] tls/sw: Use splice_eof() to flush David Howells
2023-06-05 12:45 ` [PATCH net-next v4 06/11] tls/device: " David Howells
2023-06-05 12:45 ` [PATCH net-next v4 07/11] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor() David Howells
2023-06-05 12:45 ` [PATCH net-next v4 08/11] tls/sw: Support MSG_SPLICE_PAGES David Howells
2023-06-05 12:45 ` [PATCH net-next v4 09/11] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES David Howells
2023-06-05 12:45 ` [PATCH net-next v4 10/11] tls/device: Support MSG_SPLICE_PAGES David Howells
2023-06-05 12:46 ` [PATCH net-next v4 11/11] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES David Howells
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).