From: Joe Damato <jdamato@fastly.com>
To: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, asml.silence@gmail.com,
linux-fsdevel@vger.kernel.org, edumazet@google.com,
pabeni@redhat.com, horms@kernel.org, linux-api@vger.kernel.org,
linux-arch@vger.kernel.org, viro@zeniv.linux.org.uk,
jack@suse.cz, kuba@kernel.org, shuah@kernel.org, sdf@fomichev.me,
mingo@redhat.com, arnd@arndb.de, brauner@kernel.org,
akpm@linux-foundation.org, tglx@linutronix.de, jolsa@kernel.org,
linux-kselftest@vger.kernel.org, Joe Damato <jdamato@fastly.com>
Subject: [RFC -next 00/10] Add ZC notifications to splice and sendfile
Date: Wed, 19 Mar 2025 00:15:11 +0000
Message-ID: <20250319001521.53249-1-jdamato@fastly.com>
Greetings:
Welcome to the RFC.
Currently, when a user app uses sendfile, it has no way to know whether
the bytes were transmitted; sendfile simply returns, but it is possible
that a slow client on the other side may take time to receive and ACK
the bytes. In the meantime, the user app which called sendfile has no
way to know whether it can overwrite the data on disk that it just
sendfile'd.
One way to fix this is to add zerocopy notifications to sendfile similar
to how MSG_ZEROCOPY works with sendmsg. This is possible thanks to the
extensive work done by Pavel [1].
To support this, two important user ABI changes are proposed:
- A new splice flag, SPLICE_F_ZC, which allows users to signal that
splice should generate zerocopy notifications if possible.
- A new system call, sendfile2, which is similar to sendfile64 except
that it takes an additional argument, flags, which allows the user
to specify either a "regular" sendfile or a sendfile with zerocopy
notifications enabled.
In either case, user apps can read notifications from the error queue
(like they would with MSG_ZEROCOPY) to determine when their call to
sendfile has completed.
I tested this RFC using the selftest modified in the last patch and also
by using the selftest between two different physical hosts:
# server
./msg_zerocopy -4 -i eth0 -t 2 -v -r tcp
# client (does the sendfiling)
dd if=/dev/zero of=sendfile_data bs=1M count=8
./msg_zerocopy -4 -i eth0 -D $SERVER_IP -v -l 1 -t 2 -z -f sendfile_data tcp
I would love to get high level feedback from folks on a few things:
- Is this functionality, at a high level, something that would be
desirable / useful? I think so, but of course I am biased ;)
- Is this approach generally headed in the right direction? Are the
proposed user ABI changes reasonable?
If the above two points are generally agreed upon then I'd welcome
feedback on the patches themselves :)
This is kind of a net thing, but also kind of a splice thing, so I hope
I am sending this to the right places to get appropriate feedback. I
based my code on the vfs/for-next tree, but am happy to rebase on another
tree if desired. The cc-list got a little out of control, so I manually
trimmed it down quite a bit; sorry if I missed anyone I should have CC'd
in the process.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/cover.1657643355.git.asml.silence@gmail.com/
Joe Damato (10):
splice: Add ubuf_info to prepare for ZC
splice: Add helper that passes through splice_desc
splice: Factor splice_socket into a helper
splice: Add SPLICE_F_ZC and attach ubuf
fs: Add splice_write_sd to file operations
fs: Extend do_sendfile to take a flags argument
fs: Add sendfile2 which accepts a flags argument
fs: Add sendfile flags for sendfile2
fs: Add sendfile2 syscall
selftests: Add sendfile zerocopy notification test
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/tools/syscall_32.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
fs/read_write.c | 40 +++++++---
fs/splice.c | 87 +++++++++++++++++----
include/linux/fs.h | 2 +
include/linux/sendfile.h | 10 +++
include/linux/splice.h | 7 +-
include/linux/syscalls.h | 2 +
include/uapi/asm-generic/unistd.h | 4 +-
net/socket.c | 1 +
scripts/syscall.tbl | 1 +
tools/testing/selftests/net/msg_zerocopy.c | 54 ++++++++++++-
tools/testing/selftests/net/msg_zerocopy.sh | 5 ++
27 files changed, 200 insertions(+), 29 deletions(-)
create mode 100644 include/linux/sendfile.h
base-commit: 2e72b1e0aac24a12f3bf3eec620efaca7ab7d4de
--
2.43.0
Thread overview: 38+ messages
2025-03-19 0:15 Joe Damato [this message]
2025-03-19 0:15 ` [RFC -next 01/10] splice: Add ubuf_info to prepare for ZC Joe Damato
2025-03-19 0:15 ` [RFC -next 02/10] splice: Add helper that passes through splice_desc Joe Damato
2025-03-19 0:15 ` [RFC -next 03/10] splice: Factor splice_socket into a helper Joe Damato
2025-03-19 0:15 ` [RFC -next 04/10] splice: Add SPLICE_F_ZC and attach ubuf Joe Damato
2025-03-19 0:15 ` [RFC -next 05/10] fs: Add splice_write_sd to file operations Joe Damato
2025-03-19 0:15 ` [RFC -next 06/10] fs: Extend do_sendfile to take a flags argument Joe Damato
2025-03-19 0:15 ` [RFC -next 07/10] fs: Add sendfile2 which accepts " Joe Damato
2025-03-19 0:15 ` [RFC -next 08/10] fs: Add sendfile flags for sendfile2 Joe Damato
2025-03-19 0:15 ` [RFC -next 09/10] fs: Add sendfile2 syscall Joe Damato
2025-03-19 0:15 ` [RFC -next 10/10] selftests: Add sendfile zerocopy notification test Joe Damato
2025-03-19 8:04 ` [RFC -next 00/10] Add ZC notifications to splice and sendfile Christoph Hellwig
2025-03-19 15:32 ` Joe Damato
2025-03-19 16:07 ` Jens Axboe
2025-03-19 17:04 ` Joe Damato
2025-03-19 17:20 ` Jens Axboe
2025-03-19 17:45 ` Joe Damato
2025-03-19 18:37 ` Jens Axboe
2025-03-19 19:15 ` Stefan Metzmacher
2025-03-20 10:46 ` Pavel Begunkov
2025-03-21 7:55 ` Stefan Metzmacher
2025-03-21 20:51 ` Pavel Begunkov
2025-03-19 19:16 ` Joe Damato
2025-03-21 11:11 ` Jens Axboe
2025-03-20 5:57 ` Christoph Hellwig
2025-03-20 18:23 ` Joe Damato
2025-03-21 5:56 ` Christoph Hellwig
2025-03-21 11:14 ` Jens Axboe
2025-03-21 16:36 ` Joe Damato
2025-03-21 20:30 ` Joe Damato
2025-03-21 20:33 ` Jens Axboe
2025-03-21 21:28 ` Joe Damato
2025-03-21 20:35 ` Jens Axboe
2025-03-21 16:44 ` Joe Damato
2025-03-19 23:22 ` Joe Damato
2025-03-21 11:13 ` Jens Axboe
2025-03-20 5:50 ` Christoph Hellwig
2025-03-20 18:05 ` Joe Damato