linux-fsdevel.vger.kernel.org archive mirror
From: Jens Axboe <axboe@kernel.dk>
To: Joe Damato <jdamato@fastly.com>,
	Christoph Hellwig <hch@infradead.org>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	asml.silence@gmail.com, linux-fsdevel@vger.kernel.org,
	edumazet@google.com, pabeni@redhat.com, horms@kernel.org,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	viro@zeniv.linux.org.uk, jack@suse.cz, kuba@kernel.org,
	shuah@kernel.org, sdf@fomichev.me, mingo@redhat.com,
	arnd@arndb.de, brauner@kernel.org, akpm@linux-foundation.org,
	tglx@linutronix.de, jolsa@kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [RFC -next 00/10] Add ZC notifications to splice and sendfile
Date: Fri, 21 Mar 2025 05:11:00 -0600
Message-ID: <d458a42e-f9b4-4075-af0d-f715c15e3566@kernel.dk>
In-Reply-To: <Z9sX98Y0Xy9-Vzqf@LQ3V64L9R2>

On 3/19/25 1:16 PM, Joe Damato wrote:
>>> In general: it does seem a bit odd to me that there isn't a safe
>>> sendfile syscall in Linux that uses existing completion notification
>>> mechanisms.
>>
>> Pretty natural, I think. sendfile(2) predates that by quite a bit, and
>> the last real change to sendfile was using splice underneath. Which I
>> did, and that was probably almost 20 years ago at this point...
>>
>> I do think it makes sense to have a sendfile that's both fast and
>> efficient, and can be used sanely with buffer reuse without relying on
>> odd heuristics.
> 
> Just trying to tie this together in my head -- are you saying that
> you think the kernel internals of sendfile could be changed in a
> different way or that this is a userland problem (and they should use
> the io_uring wrapper you suggested above)?

I'm saying that it of course makes sense to have a way to do sendfile
where you know when reuse is safe, and that we have an API that provides
that very nicely already without needing to add syscalls. If you used
io_uring for this, then the "tx is done, reuse is fine" notification is
just another notification, not anything special that needs new plumbing.
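
To make the "just another notification" point concrete, below is a
minimal sketch of the completion handling for an io_uring zero-copy
send. It assumes the documented IORING_OP_SEND_ZC behaviour: the request
posts one CQE carrying the send result with IORING_CQE_F_MORE set,
followed by a second CQE flagged IORING_CQE_F_NOTIF once the kernel no
longer references the buffer. The zc_buf_state struct and
zc_buf_on_cqe() are hypothetical names used for illustration only, and
the sketch covers plain send zc, not a sendfile-style operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/io_uring.h>

/* Fallbacks for older uapi headers; values mirror linux/io_uring.h. */
#ifndef IORING_CQE_F_MORE
#define IORING_CQE_F_MORE (1U << 1)
#endif
#ifndef IORING_CQE_F_NOTIF
#define IORING_CQE_F_NOTIF (1U << 3)
#endif

/* Hypothetical per-buffer bookkeeping, not part of any kernel API. */
struct zc_buf_state {
	bool send_done;	/* the send result CQE has been seen */
	bool reusable;	/* the kernel no longer references the buffer */
	int send_res;	/* bytes sent, or a negative errno */
};

/* Feed every CQE whose user_data maps to this buffer through here. */
void zc_buf_on_cqe(struct zc_buf_state *st, const struct io_uring_cqe *cqe)
{
	assert(st && cqe);

	if (cqe->flags & IORING_CQE_F_NOTIF) {
		/* Second CQE: "tx is done, reuse is fine". */
		st->reusable = true;
		return;
	}

	/* First CQE: the usual send result. */
	st->send_done = true;
	st->send_res = cqe->res;

	/* Without F_MORE no notification follows, so reuse is safe now. */
	if (!(cqe->flags & IORING_CQE_F_MORE))
		st->reusable = true;
}
```

An event loop would call zc_buf_on_cqe() from its normal CQE reaping
path and recycle the buffer only once reusable is set; no separate
error-queue polling is involved.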

>>>>> I would also argue that there are likely user apps out there that
>>>>> use both sendmsg MSG_ZEROCOPY for certain writes (for data in
>>>>> memory) and also use sendfile (for data on disk). One example would
>>>>> be a reverse proxy that might write HTTP headers to clients via
>>>>> sendmsg but transmit the response body with sendfile.
>>>>>
>>>>> For those apps, the code to check the error queue already exists for
>>>>> sendmsg + MSG_ZEROCOPY, so swapping in sendfile2 seems like an easy
>>>>> way to ensure safe sendfile usage.
>>>>
>>>> Sure that is certainly possible. I didn't say that wasn't the case,
>>>> rather that the error queue approach is a work-around in the first place
>>>> for not having some kind of async notification mechanism for when it's
>>>> free to reuse.
>>>
>>> Of course, I certainly agree that the error queue is a work around.
>>> But it works, apps use it, and it's fairly well known. I don't see any
>>> reason, other than historical context, why sendmsg can use this
>>> mechanism, splice can, but sendfile shouldn't?
>>
>> My argument would be the same as for other features - if you can do it
>> simpler this other way, why not consider that? The end result would be
>> the same, you can do fast sendfile() with sane buffer reuse. But the
>> kernel side would be simpler, which is always a main goal for those
>> of us who have to maintain the kernel.
>>
>> Just adding sendfile2() works in the sense that it's an easier drop in
>> replacement for an app, though the error queue side does mean it needs
>> to change anyway - it's not just replacing one syscall with another. And
>> if we want to be lazy, sure that's fine. I just don't think it's the
>> best way to do it when we literally have a mechanism that's designed for
>> this and works with reuse already with normal send zc (and receive side
>> too, in the next kernel).
> 
> It seems like you've answered the question I asked above and that
> you are suggesting there might be a better and simpler sendfile2
> kernel-side implementation that doesn't rely on splice internals at
> all.
> 
> Am I following you? If so, I'll drop the sendfile2 stuff from this
> series and stick with the splice changes only, if you are (at a high
> level) OK with the idea of adding a flag for this to splice.
> 
> In the meantime, I'll take a few more reads through the io_uring code
> to see if I can work out how sendfile2 might be built on top of that
> instead of splice in the kernel.

Heh, I don't know how you jumped to that conclusion based on my feedback,
and it seems like it has solidified through other replies. No, I'm not
saying that the sendfile2 approach makes sense for the kernel. It makes
some vague amount of sense only on the premise of "oh but this is easy
for applications as they already know how to use sendfile(2)".

-- 
Jens Axboe
