* [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
@ 2026-05-31 1:01 Askar Safin
2026-05-31 1:01 ` [PATCH 1/3] tee: fs/splice.c: remove unused parameter "flags" from "link_pipe" Askar Safin
` (5 more replies)
0 siblings, 6 replies; 16+ messages in thread
From: Askar Safin @ 2026-05-31 1:01 UTC (permalink / raw)
To: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara
Cc: linux-kernel, linux-mm, linux-api, netdev, Linus Torvalds,
Matthew Wilcox, Jens Axboe, Christoph Hellwig, David Howells,
Andrew Morton, David Hildenbrand, Pedro Falcato, Miklos Szeredi,
patches
This patchset is for VFS.
Recently we got a lot of vulnerabilities in splice/vmsplice.
Also vmsplice already was source of vulnerabilities in the past:
CVE-2020-29374 (see https://lwn.net/Articles/849638/ ).
Also vmsplice is problematic for other reasons. Here is what other
developers say:
Linus Torvalds in 2023:
> So I'd personally be perfectly ok with just making vmsplice() be
> exactly the same as write, and turn all of vmsplice() into just "it's
> a read() if the pipe is open for read, and a write if it's open for
> writing".
https://lore.kernel.org/all/CAHk-=wgG_2cmHgZwKjydi7=iimyHyN8aessnbM9XQ9ufbaUz9g@mail.gmail.com/
Christoph Hellwig in May 2026:
> vmsplice is the worst, as it is one of the few remaining places that
> can incorrectly dirty file backed pages without telling the file system
> and cause the other problems fixed by a FOLL_PIN conversion, but it is
> the only one where we do not have any idea yet how we could convert it
> to FOLL_PIN due to the unbounded pin time.
https://lore.kernel.org/all/agwFlBKvKytjURDO@infradead.org/
See recent discussion here:
https://lore.kernel.org/all/20260516182126.530498-1-pfalcato@suse.de/T/#u
For all these reasons I propose to make vmsplice a simple wrapper for
preadv2/pwritev2.
vmsplice(fd, vec, vlen, vmsplice_flags) will
be equivalent to preadv2(fd, vec, vlen, -1, rw_flags) if you have
readable pipe and to pwritev2(fd, vec, vlen, -1, rw_flags) if you have
writable pipe.
SPLICE_F_NONBLOCK is translated to RWF_NOWAIT, all other SPLICE_F_*
flags are ignored.
There is a small change to handling of NONBLOCK-related flags,
see commit messages for details.
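A minimal userspace sketch of the mapping described above, not the kernel implementation. The helper names `splice_to_rw_flags`, `vmsplice_emulated` and `demo_roundtrip` are hypothetical and exist only for illustration. The `fstat`/`S_ISFIFO` pipe check and the `fcntl(F_GETFL)` direction check are assumptions standing in for the in-kernel checks, and `RWF_NOWAIT` on a pipe may be rejected by older kernels, so the demo only performs I/O with flags that translate to 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>	/* SPLICE_F_* flags, O_ACCMODE */
#include <string.h>
#include <sys/stat.h>
#include <sys/uio.h>	/* preadv2, pwritev2, RWF_NOWAIT */
#include <unistd.h>

/* Hypothetical helper: only SPLICE_F_NONBLOCK survives the translation. */
static int splice_to_rw_flags(unsigned int splice_flags)
{
	return (splice_flags & SPLICE_F_NONBLOCK) ? RWF_NOWAIT : 0;
}

/*
 * Hypothetical userspace emulation of the proposed semantics:
 * not a pipe -> EBADF, writable pipe -> pwritev2, otherwise preadv2.
 * Offset -1 means "no offset", which is what a pipe requires.
 */
static ssize_t vmsplice_emulated(int fd, const struct iovec *vec, int vlen,
				 unsigned int splice_flags)
{
	struct stat st;
	int fl;

	if (fstat(fd, &st) < 0)
		return -1;
	if (!S_ISFIFO(st.st_mode)) {
		errno = EBADF;
		return -1;
	}

	fl = fcntl(fd, F_GETFL);
	if (fl < 0)
		return -1;

	if ((fl & O_ACCMODE) != O_RDONLY)
		return pwritev2(fd, vec, vlen, -1,
				splice_to_rw_flags(splice_flags));
	return preadv2(fd, vec, vlen, -1, splice_to_rw_flags(splice_flags));
}

/* Write "hello" into a fresh pipe and read it back into dst. */
static ssize_t demo_roundtrip(char *dst, size_t dst_len)
{
	int p[2];
	char msg[] = "hello";
	struct iovec w = { .iov_base = msg, .iov_len = 5 };
	struct iovec r = { .iov_base = dst, .iov_len = dst_len };
	ssize_t n;

	if (pipe(p) < 0)
		return -1;

	n = vmsplice_emulated(p[1], &w, 1, 0);
	assert(n == 5);	/* a 5-byte write fits into an empty pipe */

	/* SPLICE_F_GIFT is accepted but has no effect on the read side */
	n = vmsplice_emulated(p[0], &r, 1, SPLICE_F_GIFT);

	close(p[0]);
	close(p[1]);
	return n;
}
```

Calling `demo_roundtrip(buf, sizeof(buf))` with an 8-byte buffer should copy the five bytes through the pipe. The point of the sketch is that no page pinning or gifting is involved, only ordinary vectored reads and writes.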
I tested this patch in Qemu.
This patchset was written by me, not by LLMs.
Askar Safin (3):
tee: fs/splice.c: remove unused parameter "flags" from "link_pipe"
vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
splice: remove PIPE_BUF_FLAG_GIFT
fs/fuse/dev.c | 1 -
fs/read_write.c | 23 +++++
fs/splice.c | 202 +-------------------------------------
include/linux/pipe_fs_i.h | 1 -
include/linux/skbuff.h | 4 +-
include/linux/splice.h | 2 +-
include/linux/syscalls.h | 4 +-
7 files changed, 33 insertions(+), 204 deletions(-)
base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d (7.1-rc5)
--
2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread

* [PATCH 1/3] tee: fs/splice.c: remove unused parameter "flags" from "link_pipe"
2026-05-31 1:01 [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
@ 2026-05-31 1:01 ` Askar Safin
2026-05-31 1:01 ` [PATCH 2/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
` (4 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: Askar Safin @ 2026-05-31 1:01 UTC (permalink / raw)
To: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara
Cc: linux-kernel, linux-mm, linux-api, netdev, Linus Torvalds,
Matthew Wilcox, Jens Axboe, Christoph Hellwig, David Howells,
Andrew Morton, David Hildenbrand, Pedro Falcato, Miklos Szeredi,
patches

Remove unused parameter "flags" from "link_pipe".

Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 fs/splice.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 9d8f63e2fd1a..59adbc2fa4d6 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1849,7 +1849,7 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
  */
 static ssize_t link_pipe(struct pipe_inode_info *ipipe,
 			 struct pipe_inode_info *opipe,
-			 size_t len, unsigned int flags)
+			 size_t len)
 {
 	struct pipe_buffer *ibuf, *obuf;
 	unsigned int i_head, o_head;
@@ -1962,7 +1962,7 @@ ssize_t do_tee(struct file *in, struct file *out, size_t len,
 			if (!ret) {
 				ret = opipe_prep(opipe, flags);
 				if (!ret)
-					ret = link_pipe(ipipe, opipe, len, flags);
+					ret = link_pipe(ipipe, opipe, len);
 			}
 		}
 
--
2.47.3

^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-05-31 1:01 [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
2026-05-31 1:01 ` [PATCH 1/3] tee: fs/splice.c: remove unused parameter "flags" from "link_pipe" Askar Safin
@ 2026-05-31 1:01 ` Askar Safin
2026-05-31 1:01 ` [PATCH 3/3] splice: remove PIPE_BUF_FLAG_GIFT Askar Safin
` (3 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: Askar Safin @ 2026-05-31 1:01 UTC (permalink / raw)
To: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara
Cc: linux-kernel, linux-mm, linux-api, netdev, Linus Torvalds,
Matthew Wilcox, Jens Axboe, Christoph Hellwig, David Howells,
Andrew Morton, David Hildenbrand, Pedro Falcato, Miklos Szeredi,
patches

vmsplice behavior on writable pipe became equivalent to pwritev2.
vmsplice behavior on readable pipe already was nearly equivalent
to preadv2, but I made this explicit. I. e. I made it obvious from
code that vmsplice now is equivalent to preadv2/pwritev2.

Also I moved vmsplice to fs/read_write.c, because now it arguably
belongs there.

Note that SPLICE_F_NONBLOCK behavior slightly changed: previously
vmsplice ignored whether the pipe was opened with O_NONBLOCK, and
mode of operation depended on whether SPLICE_F_NONBLOCK was passed
only. Now the operation will be non-blocking if O_NONBLOCK was passed
when opening *or* SPLICE_F_NONBLOCK was passed to vmsplice. Previous
behavior was arguably buggy, and new behavior is arguably better.

Now SPLICE_F_GIFT is always ignored by all 3 syscalls: splice, tee
and vmsplice.

Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 fs/read_write.c          |  23 +++++
 fs/splice.c              | 192 +--------------------------------------
 include/linux/skbuff.h   |   4 +-
 include/linux/splice.h   |   2 +-
 include/linux/syscalls.h |   4 +-
 5 files changed, 29 insertions(+), 196 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 50bff7edc91f..1e5444f4dab3 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1213,6 +1213,29 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
 	return do_pwritev(fd, vec, vlen, pos, flags);
 }
 
+/*
+ * Legacy preadv2/pwritev2 wrapper.
+ */
+SYSCALL_DEFINE4(vmsplice, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned int, flags)
+{
+	if (unlikely(flags & ~SPLICE_F_ALL))
+		return -EINVAL;
+
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return -EBADF;
+
+	/* We do do_writev/do_readv, so it is okay to pass "false" here */
+	if (!get_pipe_info(fd_file(f), /* for_splice = */ false))
+		return -EBADF;
+
+	if (fd_file(f)->f_mode & FMODE_WRITE)
+		return do_writev(fd, vec, vlen, (flags & SPLICE_F_NONBLOCK) ? RWF_NOWAIT : 0);
+	else
+		return do_readv(fd, vec, vlen, (flags & SPLICE_F_NONBLOCK) ? RWF_NOWAIT : 0);
+}
+
 /*
  * Various compat syscalls.  Note that they all pretend to take a native
  * iovec - import_iovec will properly treat those as compat_iovecs based on
diff --git a/fs/splice.c b/fs/splice.c
index 59adbc2fa4d6..b1a4e3713bd6 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -159,22 +159,6 @@ const struct pipe_buf_operations page_cache_pipe_buf_ops = {
 	.get		= generic_pipe_buf_get,
 };
 
-static bool user_page_pipe_buf_try_steal(struct pipe_inode_info *pipe,
-		struct pipe_buffer *buf)
-{
-	if (!(buf->flags & PIPE_BUF_FLAG_GIFT))
-		return false;
-
-	buf->flags |= PIPE_BUF_FLAG_LRU;
-	return generic_pipe_buf_try_steal(pipe, buf);
-}
-
-static const struct pipe_buf_operations user_page_pipe_buf_ops = {
-	.release	= page_cache_pipe_buf_release,
-	.try_steal	= user_page_pipe_buf_try_steal,
-	.get		= generic_pipe_buf_get,
-};
-
 static void wakeup_pipe_readers(struct pipe_inode_info *pipe)
 {
 	smp_mb();
@@ -589,8 +573,7 @@ static void splice_from_pipe_end(struct pipe_inode_info *pipe, struct splice_des
  * Description:
  *    This function does little more than loop over the pipe and call
  *    @actor to do the actual moving of a single struct pipe_buffer to
- *    the desired destination. See pipe_to_file, pipe_to_sendmsg, or
- *    pipe_to_user.
+ *    the desired destination. See pipe_to_file or pipe_to_sendmsg.
  *
  */
 ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd,
@@ -1440,179 +1423,6 @@ static ssize_t __do_splice(struct file *in, loff_t __user *off_in,
 	return ret;
 }
 
-static ssize_t iter_to_pipe(struct iov_iter *from,
-			    struct pipe_inode_info *pipe,
-			    unsigned int flags)
-{
-	struct pipe_buffer buf = {
-		.ops = &user_page_pipe_buf_ops,
-		.flags = flags
-	};
-	size_t total = 0;
-	ssize_t ret = 0;
-
-	while (iov_iter_count(from)) {
-		struct page *pages[16];
-		ssize_t left;
-		size_t start;
-		int i, n;
-
-		left = iov_iter_get_pages2(from, pages, ~0UL, 16, &start);
-		if (left <= 0) {
-			ret = left;
-			break;
-		}
-
-		n = DIV_ROUND_UP(left + start, PAGE_SIZE);
-		for (i = 0; i < n; i++) {
-			int size = umin(left, PAGE_SIZE - start);
-
-			buf.page = pages[i];
-			buf.offset = start;
-			buf.len = size;
-			ret = add_to_pipe(pipe, &buf);
-			if (unlikely(ret < 0)) {
-				iov_iter_revert(from, left);
-				// this one got dropped by add_to_pipe()
-				while (++i < n)
-					put_page(pages[i]);
-				goto out;
-			}
-			total += ret;
-			left -= size;
-			start = 0;
-		}
-	}
-out:
-	return total ? total : ret;
-}
-
-static int pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
-			struct splice_desc *sd)
-{
-	int n = copy_page_to_iter(buf->page, buf->offset, sd->len, sd->u.data);
-	return n == sd->len ? n : -EFAULT;
-}
-
-/*
- * For lack of a better implementation, implement vmsplice() to userspace
- * as a simple copy of the pipe's pages to the user iov.
- */
-static ssize_t vmsplice_to_user(struct file *file, struct iov_iter *iter,
-				unsigned int flags)
-{
-	struct pipe_inode_info *pipe = get_pipe_info(file, true);
-	struct splice_desc sd = {
-		.total_len = iov_iter_count(iter),
-		.flags = flags,
-		.u.data = iter
-	};
-	ssize_t ret = 0;
-
-	if (!pipe)
-		return -EBADF;
-
-	pipe_clear_nowait(file);
-
-	if (sd.total_len) {
-		pipe_lock(pipe);
-		ret = __splice_from_pipe(pipe, &sd, pipe_to_user);
-		pipe_unlock(pipe);
-	}
-
-	if (ret > 0)
-		fsnotify_access(file);
-
-	return ret;
-}
-
-/*
- * vmsplice splices a user address range into a pipe. It can be thought of
- * as splice-from-memory, where the regular splice is splice-from-file (or
- * to file). In both cases the output is a pipe, naturally.
- */
-static ssize_t vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
-				unsigned int flags)
-{
-	struct pipe_inode_info *pipe;
-	ssize_t ret = 0;
-	unsigned buf_flag = 0;
-
-	if (flags & SPLICE_F_GIFT)
-		buf_flag = PIPE_BUF_FLAG_GIFT;
-
-	pipe = get_pipe_info(file, true);
-	if (!pipe)
-		return -EBADF;
-
-	pipe_clear_nowait(file);
-
-	pipe_lock(pipe);
-	ret = wait_for_space(pipe, flags);
-	if (!ret)
-		ret = iter_to_pipe(iter, pipe, buf_flag);
-	pipe_unlock(pipe);
-	if (ret > 0) {
-		wakeup_pipe_readers(pipe);
-		fsnotify_modify(file);
-	}
-	return ret;
-}
-
-/*
- * Note that vmsplice only really supports true splicing _from_ user memory
- * to a pipe, not the other way around. Splicing from user memory is a simple
- * operation that can be supported without any funky alignment restrictions
- * or nasty vm tricks. We simply map in the user memory and fill them into
- * a pipe. The reverse isn't quite as easy, though. There are two possible
- * solutions for that:
- *
- *	- memcpy() the data internally, at which point we might as well just
- *	  do a regular read() on the buffer anyway.
- *	- Lots of nasty vm tricks, that are neither fast nor flexible (it
- *	  has restriction limitations on both ends of the pipe).
- *
- * Currently we punt and implement it as a normal copy, see pipe_to_user().
- *
- */
-SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
-		unsigned long, nr_segs, unsigned int, flags)
-{
-	struct iovec iovstack[UIO_FASTIOV];
-	struct iovec *iov = iovstack;
-	struct iov_iter iter;
-	ssize_t error;
-	int type;
-
-	if (unlikely(flags & ~SPLICE_F_ALL))
-		return -EINVAL;
-
-	CLASS(fd, f)(fd);
-	if (fd_empty(f))
-		return -EBADF;
-	if (fd_file(f)->f_mode & FMODE_WRITE)
-		type = ITER_SOURCE;
-	else if (fd_file(f)->f_mode & FMODE_READ)
-		type = ITER_DEST;
-	else
-		return -EBADF;
-
-	error = import_iovec(type, uiov, nr_segs,
-			     ARRAY_SIZE(iovstack), &iov, &iter);
-	if (error < 0)
-		return error;
-
-	if (!iov_iter_count(&iter))
-		error = 0;
-	else if (type == ITER_SOURCE)
-		error = vmsplice_to_pipe(fd_file(f), &iter, flags);
-	else
-		error = vmsplice_to_user(fd_file(f), &iter, flags);
-
-	kfree(iov);
-	return error;
-}
-
 SYSCALL_DEFINE6(splice, int, fd_in, loff_t __user *, off_in,
 		int, fd_out, loff_t __user *, off_out,
 		size_t, len, unsigned int, flags)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2bcf78a4de7b..2961fee3e5cc 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -505,7 +505,7 @@ enum {
 	SKBFL_ZEROCOPY_ENABLE = BIT(0),
 
 	/* This indicates at least one fragment might be overwritten
-	 * (as in vmsplice(), sendfile() ...)
+	 * (as in sendfile(), ...)
 	 * If we need to compute a TX checksum, we'll need to copy
 	 * all frags to avoid possible bad checksum
 	 */
@@ -4017,7 +4017,7 @@ static inline int skb_linearize(struct sk_buff *skb)
  * @skb: buffer to test
  *
  * Return: true if the skb has at least one frag that might be modified
- * by an external entity (as in vmsplice()/sendfile())
+ * by an external entity (as in sendfile())
  */
 static inline bool skb_has_shared_frag(const struct sk_buff *skb)
 {
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 9dec4861d09f..fb4f035aae83 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -19,7 +19,7 @@
 				 /* we may still block on the fd we splice */
 				 /* from/to, of course */
 #define SPLICE_F_MORE	(0x04)	/* expect more data */
-#define SPLICE_F_GIFT	(0x08)	/* pages passed in are a gift */
+#define SPLICE_F_GIFT	(0x08)	/* ignored */
 
 #define SPLICE_F_ALL (SPLICE_F_MOVE|SPLICE_F_NONBLOCK|SPLICE_F_MORE|SPLICE_F_GIFT)
 
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index f5639d5ac331..a86a88207956 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -514,8 +514,8 @@ asmlinkage long sys_ppoll_time32(struct pollfd __user *, unsigned int,
 			  struct old_timespec32 __user *, const sigset_t __user *,
 			  size_t);
 asmlinkage long sys_signalfd4(int ufd, sigset_t __user *user_mask, size_t sizemask, int flags);
-asmlinkage long sys_vmsplice(int fd, const struct iovec __user *iov,
-			     unsigned long nr_segs, unsigned int flags);
+asmlinkage long sys_vmsplice(unsigned long fd, const struct iovec __user *vec,
+			     unsigned long vlen, unsigned int flags);
 asmlinkage long sys_splice(int fd_in, loff_t __user *off_in,
 			   int fd_out, loff_t __user *off_out,
 			   size_t len, unsigned int flags);
--
2.47.3

^ permalink raw reply related [flat|nested] 16+ messages in thread
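The SPLICE_F_NONBLOCK change described in the [PATCH 2/3] commit message can be sketched as two predicates, assuming the commit message describes both the old and the new rule accurately. The names `nonblocking_before`, `nonblocking_after`, `check_difference` and `empty_nonblocking_readv_errno` are hypothetical. The last helper uses plain `readv` on a pipe created with `O_NONBLOCK`, as a stand-in for what the patched vmsplice would delegate to; it does not call vmsplice itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>	/* O_NONBLOCK, SPLICE_F_NONBLOCK */
#include <stdbool.h>
#include <sys/uio.h>
#include <unistd.h>

/* Old rule, per the commit message: only SPLICE_F_NONBLOCK counts. */
static bool nonblocking_before(int open_flags, unsigned int splice_flags)
{
	(void)open_flags;	/* O_NONBLOCK on the pipe was ignored */
	return (splice_flags & SPLICE_F_NONBLOCK) != 0;
}

/* New rule: either O_NONBLOCK on the pipe or SPLICE_F_NONBLOCK suffices. */
static bool nonblocking_after(int open_flags, unsigned int splice_flags)
{
	return (open_flags & O_NONBLOCK) != 0 ||
	       (splice_flags & SPLICE_F_NONBLOCK) != 0;
}

/* The only input for which the two rules disagree. */
static void check_difference(void)
{
	assert(!nonblocking_before(O_NONBLOCK, 0));
	assert(nonblocking_after(O_NONBLOCK, 0));
}

/*
 * Read from an empty O_NONBLOCK pipe with readv() and report errno.
 * A blocking pipe would hang here; a non-blocking one fails at once.
 */
static int empty_nonblocking_readv_errno(void)
{
	int p[2];
	char byte;
	struct iovec v = { .iov_base = &byte, .iov_len = 1 };
	int err = 0;

	if (pipe2(p, O_NONBLOCK) < 0)
		return -1;
	if (readv(p[0], &v, 1) < 0)
		err = errno;
	close(p[0]);
	close(p[1]);
	return err;
}
```

Under the new rule a caller that relied on vmsplice blocking on an `O_NONBLOCK` pipe would start seeing `EAGAIN`, which is the behavior change the commit message flags.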
* [PATCH 3/3] splice: remove PIPE_BUF_FLAG_GIFT
2026-05-31 1:01 [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
2026-05-31 1:01 ` [PATCH 1/3] tee: fs/splice.c: remove unused parameter "flags" from "link_pipe" Askar Safin
2026-05-31 1:01 ` [PATCH 2/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
@ 2026-05-31 1:01 ` Askar Safin
2026-05-31 8:54 ` [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Pedro Falcato
` (2 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: Askar Safin @ 2026-05-31 1:01 UTC (permalink / raw)
To: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara
Cc: linux-kernel, linux-mm, linux-api, netdev, Linus Torvalds,
Matthew Wilcox, Jens Axboe, Christoph Hellwig, David Howells,
Andrew Morton, David Hildenbrand, Pedro Falcato, Miklos Szeredi,
patches

It is unused now.

Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 fs/fuse/dev.c             | 1 -
 fs/splice.c               | 6 ++----
 include/linux/pipe_fs_i.h | 1 -
 3 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 5dda7080f4a9..fb8fe0c96692 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2352,7 +2352,6 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 				goto out_free;
 
 			*obuf = *ibuf;
-			obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 			obuf->len = rem;
 			ibuf->offset += obuf->len;
 			ibuf->len -= obuf->len;
diff --git a/fs/splice.c b/fs/splice.c
index b1a4e3713bd6..6ddf7dd72f7b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1622,10 +1622,9 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
 			*obuf = *ibuf;
 
 			/*
-			 * Don't inherit the gift and merge flags, we need to
+			 * Don't inherit the merge flag, we need to
 			 * prevent multiple steals of this page.
 			 */
-			obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 			obuf->flags &= ~PIPE_BUF_FLAG_CAN_MERGE;
 
 			obuf->len = len;
@@ -1711,10 +1710,9 @@ static ssize_t link_pipe(struct pipe_inode_info *ipipe,
 		*obuf = *ibuf;
 
 		/*
-		 * Don't inherit the gift and merge flag, we need to prevent
+		 * Don't inherit the merge flag, we need to prevent
 		 * multiple steals of this page.
 		 */
-		obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 		obuf->flags &= ~PIPE_BUF_FLAG_CAN_MERGE;
 
 		if (obuf->len > len)
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 7f6a92ac9704..a1eeed800669 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -6,7 +6,6 @@
 
 #define PIPE_BUF_FLAG_LRU	0x01	/* page is on the LRU */
 #define PIPE_BUF_FLAG_ATOMIC	0x02	/* was atomically mapped */
-#define PIPE_BUF_FLAG_GIFT	0x04	/* page is a gift */
 #define PIPE_BUF_FLAG_PACKET	0x08	/* read() as a packet */
 #define PIPE_BUF_FLAG_CAN_MERGE	0x10	/* can merge buffers */
 #define PIPE_BUF_FLAG_WHOLE	0x20	/* read() must return entire buffer or error */
--
2.47.3

^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-05-31 1:01 [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
` (2 preceding siblings ...)
2026-05-31 1:01 ` [PATCH 3/3] splice: remove PIPE_BUF_FLAG_GIFT Askar Safin
@ 2026-05-31 8:54 ` Pedro Falcato
2026-05-31 21:21 ` Askar Safin
2026-06-01 3:11 ` Andy Lutomirski
2026-06-01 16:23 ` Christian Brauner
5 siblings, 1 reply; 16+ messages in thread
From: Pedro Falcato @ 2026-05-31 8:54 UTC (permalink / raw)
To: Askar Safin
Cc: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara,
linux-kernel, linux-mm, linux-api, netdev, Linus Torvalds,
Matthew Wilcox, Jens Axboe, Christoph Hellwig, David Howells,
Andrew Morton, David Hildenbrand, Miklos Szeredi, patches

On Sun, May 31, 2026 at 01:01:04AM +0000, Askar Safin wrote:
> This patchset is for VFS.
>
> Recently we got a lot of vulnerabilities in splice/vmsplice.
>
> Also vmsplice already was source of vulnerabilities in the past:
> CVE-2020-29374 (see https://lwn.net/Articles/849638/ ).
>
> Also vmsplice is problematic for other reasons. Here is what other
> developers say:
>
> Linus Torvalds in 2023:
> > So I'd personally be perfectly ok with just making vmsplice() be
> > exactly the same as write, and turn all of vmsplice() into just "it's
> > a read() if the pipe is open for read, and a write if it's open for
> > writing".
> https://lore.kernel.org/all/CAHk-=wgG_2cmHgZwKjydi7=iimyHyN8aessnbM9XQ9ufbaUz9g@mail.gmail.com/
>
> Christoph Hellwig in May 2026:
> > vmsplice is the worst, as it is one of the few remaining places that
> > can incorrectly dirty file backed pages without telling the file system
> > and cause the other problems fixed by a FOLL_PIN conversion, but it is
> > the only one where we do not have any idea yet how we could convert it
> > to FOLL_PIN due to the unbounded pin time.
> https://lore.kernel.org/all/agwFlBKvKytjURDO@infradead.org/
>
> See recent discussion here:
> https://lore.kernel.org/all/20260516182126.530498-1-pfalcato@suse.de/T/#u

So, you took an ongoing discussion with an ongoing RFC patchset, and you
decided to reimplement part of the idea on your own, as a concurrent patchset.
Riiiiiight.... I don't think I have to NAK this, do I?

>
> For all these reasons I propose to make vmsplice a simple wrapper for
> preadv2/pwritev2.
>
> vmsplice(fd, vec, vlen, vmsplice_flags) will
> be equivalent to preadv2(fd, vec, vlen, -1, rw_flags) if you have
> readable pipe and to pwritev2(fd, vec, vlen, -1, rw_flags) if you have
> writable pipe.

This does not work. https://codesearch.debian.net/search?q=vmsplice%28&literal=1
There are users.

--
Pedro

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-05-31 8:54 ` [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Pedro Falcato
@ 2026-05-31 21:21 ` Askar Safin
2026-06-01 16:16 ` Christian Brauner
0 siblings, 1 reply; 16+ messages in thread
From: Askar Safin @ 2026-05-31 21:21 UTC (permalink / raw)
To: Pedro Falcato
Cc: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara,
linux-kernel, linux-mm, linux-api, netdev, Linus Torvalds,
Matthew Wilcox, Jens Axboe, Christoph Hellwig, David Howells,
Andrew Morton, David Hildenbrand, Miklos Szeredi, patches

On Sun, May 31, 2026 at 11:54 AM Pedro Falcato <pfalcato@suse.de> wrote:
> So, you took an ongoing discussion with an ongoing RFC patchset, and you
> decided to reimplement part of the idea on your own, as a concurrent patchset.

Yes. But I propose an alternative solution to this problem.

Brauner said in discussion for your patchset:
"So I'm not very likely to pick this up as is".
So, I decided to submit another solution.

Pedro, I'm not trying to insult you. Other kernel developers
will decide which of these two solutions they like more.

Many people in discussion of your patchset said how
they dislike splice/vmsplice, and especially vmsplice.
Hellwig said "vmsplice is the worst".

Brauner, Hellwig, Horn said that they dislike vmsplice.
They said that vmsplice in its current form should not be used,
and that it is broken.

Despite all these problems nobody managed to fix vmsplice in all these years.

So I propose just to effectively remove it.

You may think that I just saw a recent discussion and decided to jump in.
No. splice/vmsplice is my topic of interest for many years.
You can verify this by searching "f:Askar splice" on lore.kernel.org .

I simply decided that given recent vulnerabilities now is the perfect time
to solve all these vmsplice problems once and for all.

I explained my position here:
https://lore.kernel.org/all/20260523204100.553125-1-safinaskar@gmail.com/ .
Nobody answered, so I just posted this patchset.

If my patchset is applied, then I will try to deal
with splice-pagecache-to-pipe somehow,
probably by removing it, too. :) I decided first
to deal with vmsplice, because it seems to be easier
problem.

> > vmsplice(fd, vec, vlen, vmsplice_flags) will
> > be equivalent to preadv2(fd, vec, vlen, -1, rw_flags) if you have
> > readable pipe and to pwritev2(fd, vec, vlen, -1, rw_flags) if you have
> > writable pipe.
>
> This does not work. https://codesearch.debian.net/search?q=vmsplice%28&literal=1
> There are users.

Yes, there are. But my solution is compatible. vmsplice is simply
performance optimization. vmsplice will work just as before, but slower.
And, most importantly, vmsplice design problems will be gone (nobody
managed to fix them anyway for all these years).

--
Askar Safin

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-05-31 21:21 ` Askar Safin
@ 2026-06-01 16:16 ` Christian Brauner
0 siblings, 0 replies; 16+ messages in thread
From: Christian Brauner @ 2026-06-01 16:16 UTC (permalink / raw)
To: Askar Safin
Cc: Pedro Falcato, linux-fsdevel, Alexander Viro, Jan Kara,
linux-kernel, linux-mm, linux-api, netdev, Linus Torvalds,
Matthew Wilcox, Jens Axboe, Christoph Hellwig, David Howells,
Andrew Morton, David Hildenbrand, Miklos Szeredi, patches

On Mon, Jun 01, 2026 at 12:21:06AM +0300, Askar Safin wrote:
> On Sun, May 31, 2026 at 11:54 AM Pedro Falcato <pfalcato@suse.de> wrote:
> > So, you took an ongoing discussion with an ongoing RFC patchset, and you
> > decided to reimplement part of the idea on your own, as a concurrent patchset.
>
> Yes. But I propose an alternative solution to this problem.

So I think this is a case where no explicit rules have been broken. But
if you know that someone has been posting patches and is working on a
problem just racing them to get your own stuff merged is very likely to
unnecessarily ruffle feathers. So sync with the person next time. The
discussion wasn't at an impasse and Pedro is expected to follow-up. It's
not very nice to just have someone else's work be for naught.

> Brauner said in discussion for your patchset:
> "So I'm not very likely to pick this up as is".
> So, I decided to submit another solution.

This lacks quite some context... I said "in its current form" and then a
long discussion ensued.

> If my patchset is applied, then I will try to deal
> with splice-pagecache-to-pipe somehow,
> probably by removing it, too. :) I decided first

So ok, but this is literally what Pedro is working on. This just wastes
people's time.

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-05-31 1:01 [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
` (3 preceding siblings ...)
2026-05-31 8:54 ` [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Pedro Falcato
@ 2026-06-01 3:11 ` Andy Lutomirski
2026-06-01 15:36 ` Matthew Wilcox
2026-06-01 16:23 ` Christian Brauner
5 siblings, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2026-06-01 3:11 UTC (permalink / raw)
To: Askar Safin
Cc: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara,
linux-kernel, linux-mm, linux-api, netdev, Linus Torvalds,
Matthew Wilcox, Jens Axboe, Christoph Hellwig, David Howells,
Andrew Morton, David Hildenbrand, Pedro Falcato, Miklos Szeredi,
patches

On Sat, May 30, 2026 at 6:03 PM Askar Safin <safinaskar@gmail.com> wrote:
>
> See recent discussion here:
> https://lore.kernel.org/all/20260516182126.530498-1-pfalcato@suse.de/T/#u
>
> For all these reasons I propose to make vmsplice a simple wrapper for
> preadv2/pwritev2.
>

I have no comment on the code or the history. But I'm 100% in favor
of the solution. vmsplice is a crappy API, and would be incredibly
complex to get the implementation right, and it should be removed.
But it has users, and the approach of just mapping them straight to
pread/pwrite makes perfect sense.

(If anyone wants to contemplate how bad the API is, contemplate gift
mode. Or contemplate that, if you want correct results, you need to
avoid modifying the memory until the recipient is done reading or you
need to avoid reading the memory until the writer is done writing, and
vmsplice *does not tell you when it's done*. And there isn't even a
caller specification of whether they want to read or write. It's ...
crap.)

--Andy

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-06-01 3:11 ` Andy Lutomirski
@ 2026-06-01 15:36 ` Matthew Wilcox
2026-06-01 15:50 ` Linus Torvalds
0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2026-06-01 15:36 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Askar Safin, linux-fsdevel, Christian Brauner, Alexander Viro,
Jan Kara, linux-kernel, linux-mm, linux-api, netdev, Linus Torvalds,
Jens Axboe, Christoph Hellwig, David Howells, Andrew Morton,
David Hildenbrand, Pedro Falcato, Miklos Szeredi, patches

On Sun, May 31, 2026 at 08:11:34PM -0700, Andy Lutomirski wrote:
> On Sat, May 30, 2026 at 6:03 PM Askar Safin <safinaskar@gmail.com> wrote:
> >
> > See recent discussion here:
> > https://lore.kernel.org/all/20260516182126.530498-1-pfalcato@suse.de/T/#u
> >
> > For all these reasons I propose to make vmsplice a simple wrapper for
> > preadv2/pwritev2.
> >
>
> I have no comment on the code or the history. But I'm 100% in favor
> of the solution. vmsplice is a crappy API, and would be incredibly
> complex to get the implementation right, and it should be removed.
> But it has users, and the approach of just mapping them straight to
> pread/pwrite makes perfect sense.

I agree with Andy. I think it was appropriate to send this series,
since (as far as I can tell) it's a completely different approach from
the others taken. I'm not really qualified to judge whether the
implementation is good (it's a bit outside my competency as a reviewer),
but the described approach is more convincing to me than the other
approaches.

Can we review this series properly?

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-06-01 15:36 ` Matthew Wilcox
@ 2026-06-01 15:50 ` Linus Torvalds
2026-06-01 16:17 ` Christian Brauner
0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2026-06-01 15:50 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andy Lutomirski, Askar Safin, linux-fsdevel, Christian Brauner,
Alexander Viro, Jan Kara, linux-kernel, linux-mm, linux-api, netdev,
Jens Axboe, Christoph Hellwig, David Howells, Andrew Morton,
David Hildenbrand, Pedro Falcato, Miklos Szeredi, patches

On Mon, 1 Jun 2026 at 08:36, Matthew Wilcox <willy@infradead.org> wrote:
>
> Can we review this series properly?

Well, since it pretty much is what I suggested a few years ago, I
certainly won't NAK it.

And the patches looked very straightforward to me. Just the final
diffstat is worth quoting again because that certainly doesn't look
problematic:

 7 files changed, 33 insertions(+), 204 deletions(-)

and it removes that GIFT flag that was truly disgusting.

So I'm certainly ok with it from a "looking at the patch" standpoint.
I didn't _test_ it. I don't have any workload that might remotely
care.

I did a quick scan on debian code search for vmsplice, and after ten
pages of entries that weren't actually *using* it but had lists of
system calls, I grew bored. So there are likely users, but I don't
know what they are and how much they care. It *might* be a big
performance issue somewhere. Unlikely, but...

Linus

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-06-01 15:50 ` Linus Torvalds
@ 2026-06-01 16:17 ` Christian Brauner
2026-06-01 16:22 ` Linus Torvalds
0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2026-06-01 16:17 UTC (permalink / raw)
To: Linus Torvalds
Cc: Matthew Wilcox, Andy Lutomirski, Askar Safin, linux-fsdevel,
Alexander Viro, Jan Kara, linux-kernel, linux-mm, linux-api, netdev,
Jens Axboe, Christoph Hellwig, David Howells, Andrew Morton,
David Hildenbrand, Pedro Falcato, Miklos Szeredi, patches

On Mon, Jun 01, 2026 at 08:50:00AM -0700, Linus Torvalds wrote:
> On Mon, 1 Jun 2026 at 08:36, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > Can we review this series properly?
>
> Well, since it pretty much is what I suggested a few years ago, I
> certainly won't NAK it.
>
> And the patches looked very straightforward to me. Just the final
> diffstat is worth quoting again because that certainly doesn't look
> problematic:
>
> 7 files changed, 33 insertions(+), 204 deletions(-)
>
> and it removes that GIFT flag that was truly disgusting.
>
> So I'm certainly ok with it from a "looking at the patch" standpoint.
> I didn't _test_ it. I don't have any workload that might remotely
> care.
>
> I did a quick scan on debian code search for vmsplice, and after ten
> pages of entries that weren't actually *using* it but had lists of
> system calls, I grew bored. So there are likely users, but I don't
> know what they are and how much they care. It *might* be a big
> performance issue somewhere. Unlikely, but...

As usual I would argue to accept it and revert in case we get actual
regression reports...

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-06-01 16:17 ` Christian Brauner
@ 2026-06-01 16:22 ` Linus Torvalds
0 siblings, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2026-06-01 16:22 UTC (permalink / raw)
To: Christian Brauner
Cc: Matthew Wilcox, Andy Lutomirski, Askar Safin, linux-fsdevel,
Alexander Viro, Jan Kara, linux-kernel, linux-mm, linux-api, netdev,
Jens Axboe, Christoph Hellwig, David Howells, Andrew Morton,
David Hildenbrand, Pedro Falcato, Miklos Szeredi, patches

On Mon, 1 Jun 2026 at 09:17, Christian Brauner <brauner@kernel.org> wrote:
>
> As usual I would argue to accept it and revert in case we get actual
> regression reports...

Yes, likely the only way we'd ever find out ..

Linus
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-05-31 1:01 [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
` (4 preceding siblings ...)
2026-06-01 3:11 ` Andy Lutomirski
@ 2026-06-01 16:23 ` Christian Brauner
2026-06-01 17:17 ` Linus Torvalds
5 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2026-06-01 16:23 UTC (permalink / raw)
To: Askar Safin
Cc: Christian Brauner, linux-kernel, linux-mm, linux-api, netdev,
Linus Torvalds, Matthew Wilcox, Jens Axboe, Christoph Hellwig,
David Howells, Andrew Morton, David Hildenbrand, Pedro Falcato,
Miklos Szeredi, patches, linux-fsdevel, Alexander Viro, Jan Kara

On Sun, 31 May 2026 01:01:04 +0000, Askar Safin wrote:
> This patchset is for VFS.
>
> Recently we got a lot of vulnerabilities in splice/vmsplice.
>
> Also vmsplice already was source of vulnerabilities in the past:
> CVE-2020-29374 (see https://lwn.net/Articles/849638/ ).
>
> [...]

Applied to the vfs-7.2.vmsplice branch of the vfs/vfs.git tree.
Patches in the vfs-7.2.vmsplice branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-7.2.vmsplice

[1/3] tee: fs/splice.c: remove unused parameter "flags" from "link_pipe"
https://git.kernel.org/vfs/vfs/c/a9f7db50ed2f
[2/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
https://git.kernel.org/vfs/vfs/c/e2c0b2368081
[3/3] splice: remove PIPE_BUF_FLAG_GIFT
https://git.kernel.org/vfs/vfs/c/7d75aa8edfce
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-06-01 16:23 ` Christian Brauner
@ 2026-06-01 17:17 ` Linus Torvalds
2026-06-01 17:33 ` Al Viro
0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2026-06-01 17:17 UTC (permalink / raw)
To: Christian Brauner
Cc: Askar Safin, linux-kernel, linux-mm, linux-api, netdev,
Matthew Wilcox, Jens Axboe, Christoph Hellwig, David Howells,
Andrew Morton, David Hildenbrand, Pedro Falcato, Miklos Szeredi,
patches, linux-fsdevel, Alexander Viro, Jan Kara

On Mon, 1 Jun 2026 at 09:42, Christian Brauner <brauner@kernel.org> wrote:
>
> Applied to the vfs-7.2.vmsplice branch of the vfs/vfs.git tree.

Btw, if people want to work further on this - assuming we don't get
any huge screams of pain from having effectively gotten rid of
vmsplice() - I don't think it would hurt to look at limiting the
"regular" splice() too.

We already have the code to just turn it into a pure copy on the
"splice to pipe" case: copy_splice_read().

In many ways it would be *lovely* to just always force that path. We
already do that explicitly for DAX and O_DIRECT, but we made a lot of
special files do it implicitly too, so quite a lot of the splice
reading cases already use that "just read() into a kernel space
buffer" model for splicing.

It would be interesting to hear who would even notice if we just
always used that copy case, and made "f_op->splice_read" never trigger
at all.

And it turns out that the only thing that ever uses
"f_op->splice_write" is splice_to_socket. Which was actually the
problematic buggy case. Everybody else pretty much seems to just use
iter_file_splice_write(), which does the "emulate it with just a write
from kernel buffers".

So *if* we get rid of f_op->splice_read, we do leave the case that
really caused problems, but nobody will ever care. Because once splice
only deals with private buffers that can't be shared with anything
else, a f_op->splice_write() that gets things wrong is pretty much a
non-event.

(We'd have to look at 'tee()' too: I don't think anybody really uses
it, but it does do the "no copy linking" by just incrementing
refcounts on the pipe buffers. So to really protect against
splice_write users messing up, that should do copies too, but as long
as it's all "private ephemeral buffers" that get their refcounts
updated, I don't think anybody *really* cares)

TLDR: maybe we could get rid of "f_op->splice_read". *That* would be
a big simplification.

Linus
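For readers following along from userspace: the choice between the
f_op->splice_read path (linking pages into the pipe) and
copy_splice_read() (copying into private kernel buffers) is not visible
in the splice() call itself. For a file that is not being modified
concurrently, the same bytes arrive in the pipe either way. Below is a
minimal sketch of that call, assuming Linux and glibc's splice()
wrapper. The helper name splice_file_to_pipe is hypothetical and does
not come from the patch series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not from the series):
 * move up to len bytes from in_fd, starting at its current file
 * offset, into the write end of a pipe.
 *
 * Returns the number of bytes moved, 0 at end of file, or -1 with
 * errno set. The caller cannot tell from the return value whether
 * the kernel linked page-cache pages into the pipe or copied the
 * data into private buffers.
 */
ssize_t splice_file_to_pipe(int in_fd, int pipe_wr, size_t len)
{
    ssize_t n;

    do {
        /* NULL offsets: use and advance in_fd's own file offset. */
        n = splice(in_fd, NULL, pipe_wr, NULL, len, 0);
    } while (n < 0 && errno == EINTR);

    return n;
}
```

A caller would create a pipe with pipe(), call the helper with the
pipe's write end, and then read() or splice() the data back out of the
read end.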
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-06-01 17:17 ` Linus Torvalds
@ 2026-06-01 17:33 ` Al Viro
2026-06-01 20:04 ` Steven Rostedt
0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2026-06-01 17:33 UTC (permalink / raw)
To: Linus Torvalds
Cc: Christian Brauner, Askar Safin, linux-kernel, linux-mm, linux-api,
netdev, Matthew Wilcox, Jens Axboe, Christoph Hellwig, David Howells,
Andrew Morton, David Hildenbrand, Pedro Falcato, Miklos Szeredi,
patches, linux-fsdevel, Jan Kara, Steven Rostedt

On Mon, Jun 01, 2026 at 10:17:23AM -0700, Linus Torvalds wrote:

> TLDR: maybe we could get rid of "f_op->splice_read". *That* would be
> a big simplification.

FUSE might be interesting - fuse_dev_splice_read() and its ilk.
Communications between the kernel and fuse server at least used to
seriously want that, so that would be one place to look for unhappy
userland...

splice-related logic in fs/fuse/dev.c is interesting; another place
like this is kernel/trace/, but I'm less familiar with that one.

rostedt Cc'd (miklos already had been)
* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
2026-06-01 17:33 ` Al Viro
@ 2026-06-01 20:04 ` Steven Rostedt
0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2026-06-01 20:04 UTC (permalink / raw)
To: Al Viro
Cc: Linus Torvalds, Christian Brauner, Askar Safin, linux-kernel,
linux-mm, linux-api, netdev, Matthew Wilcox, Jens Axboe,
Christoph Hellwig, David Howells, Andrew Morton, David Hildenbrand,
Pedro Falcato, Miklos Szeredi, patches, linux-fsdevel, Jan Kara

On Mon, 1 Jun 2026 18:33:25 +0100
Al Viro <viro@zeniv.linux.org.uk> wrote:

> On Mon, Jun 01, 2026 at 10:17:23AM -0700, Linus Torvalds wrote:
>
> > TLDR: maybe we could get rid of "f_op->splice_read". *That* would be
> > a big simplification.
>
> FUSE might be interesting - fuse_dev_splice_read() and its ilk.
> Communications between the kernel and fuse server at least used to
> seriously want that, so that would be one place to look for unhappy
> userland...
>
> splice-related logic in fs/fuse/dev.c is interesting; another place
> like this is kernel/trace/, but I'm less familiar with that one.
>
> rostedt Cc'd (miklos already had been)

Thanks for the Cc.

The tracing ring buffer was specifically made to be used by splice, and
libtracefs has a lot of code to use it as well. As reading the ring
buffer literally swaps out the write portion with a blank read portion,
that portion (sub-buffer) is fed directly into splice, providing a
zero-copy path for the trace data from the write of the event to going
into a file.

trace-cmd defaults to using splice to copy the tracing ring buffer
directly into files to avoid as much copying during live recordings as
possible.

Whatever changes we make, I would like to make sure there are no
regressions in performance of trace-cmd record.

Thanks,

-- Steve
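The recording pattern Steven describes is, in outline, a source ->
pipe -> file relay built from two splice() calls per chunk. The sketch
below is not trace-cmd or libtracefs code. It is a generic relay,
assuming Linux and glibc, with a hypothetical function name
(splice_relay) and an arbitrarily chosen 64 KiB chunk size. Whether the
first splice() hands over pages or copies them is decided in the
kernel, which is exactly the property whose performance impact is being
asked about.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): relay all data from in_fd
 * to out_fd through an intermediate pipe using splice().
 *
 * Returns the total number of bytes relayed, or -1 on error with
 * errno set. out_fd must not be opened with O_APPEND, which splice()
 * rejects for its output side.
 */
ssize_t splice_relay(int in_fd, int out_fd)
{
    int p[2];
    ssize_t total = 0;
    int saved_errno = 0;

    if (pipe(p) < 0)
        return -1;

    for (;;) {
        /* Step 1: source -> pipe. May return less than requested. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, 65536, 0);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0) {
            saved_errno = errno;
            total = -1;
            break;
        }
        if (n == 0)
            break; /* end of input */

        /* Step 2: drain exactly what step 1 put into the pipe. */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n, 0);
            if (m < 0 && errno == EINTR)
                continue;
            if (m <= 0) {
                saved_errno = (m < 0) ? errno : EIO;
                total = -1;
                goto out;
            }
            n -= m;
            total += m;
        }
    }

out:
    close(p[0]);
    close(p[1]);
    if (total < 0)
        errno = saved_errno;
    return total;
}
```

Timing a loop like this against a large input, before and after a
kernel change, is one rough way to check for the kind of regression
raised above. It is no substitute for measuring trace-cmd record
itself.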
end of thread, other threads:[~2026-06-01 20:04 UTC | newest]
Thread overview: 16+ messages
2026-05-31 1:01 [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
2026-05-31 1:01 ` [PATCH 1/3] tee: fs/splice.c: remove unused parameter "flags" from "link_pipe" Askar Safin
2026-05-31 1:01 ` [PATCH 2/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
2026-05-31 1:01 ` [PATCH 3/3] splice: remove PIPE_BUF_FLAG_GIFT Askar Safin
2026-05-31 8:54 ` [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Pedro Falcato
2026-05-31 21:21 ` Askar Safin
2026-06-01 16:16 ` Christian Brauner
2026-06-01 3:11 ` Andy Lutomirski
2026-06-01 15:36 ` Matthew Wilcox
2026-06-01 15:50 ` Linus Torvalds
2026-06-01 16:17 ` Christian Brauner
2026-06-01 16:22 ` Linus Torvalds
2026-06-01 16:23 ` Christian Brauner
2026-06-01 17:17 ` Linus Torvalds
2026-06-01 17:33 ` Al Viro
2026-06-01 20:04 ` Steven Rostedt