Linux userland API discussions
 help / color / mirror / Atom feed
From: Askar Safin <safinaskar@gmail.com>
To: luto@amacapital.net
Cc: akpm@linux-foundation.org, axboe@kernel.dk, brauner@kernel.org,
	david@kernel.org, dhowells@redhat.com, hch@infradead.org,
	jack@suse.cz, linux-api@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, miklos@szeredi.hu, netdev@vger.kernel.org,
	patches@lists.linux.dev, pfalcato@suse.de, safinaskar@gmail.com,
	torvalds@linux-foundation.org, viro@zeniv.linux.org.uk,
	willy@infradead.org
Subject: Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
Date: Thu,  4 Jun 2026 01:43:11 +0300	[thread overview]
Message-ID: <20260603224311.834796-1-safinaskar@gmail.com> (raw)
In-Reply-To: <CALCETrW3XcNLuB1Y6PSkxQDSK2o+=EB2AAd25SjWQqcJemwnbw@mail.gmail.com>

Andy Lutomirski <luto@amacapital.net>:
> Maybe we should keep an API that does an optimized copy, from one fd
> to another, that can send from a file to the network with at most ONE
> cpu-side copy.  Not aiming for zero like sendfile / splice.  Aiming
> for one.

Yes, this is what my hypothetical future patch will do.

One copy from pagecache to pipe, and then network uses that buffer
directly.

> But splice_to_socket involves
> MSG_SPLACE_PAGES, which I think is a part of the mess that you
> dislike.  And the path where one does copy_splice_read and then
> splice_to_socket has to be a bit complex because of tee and (I think)
> because splice_to_socket cannot assume that the incoming data is just
> ordinary unshared buffers.

My future patch will provide new guarantee: pipe buffers are always
stable, i. e. they will not be externally-modified.

So hopefully network code will be adjusted to use this guarantee.

But pipe buffers will not be "ordinary unshared buffers".

They still may be shared with other things because of tee(2).
(But they are still stable! They will not be randomly modified!)

But network code can do "pipe_buf_try_steal" and thus ensure that
these buffers are not shared with anything else.

So, network code can be modified to use "pipe_buf_try_steal", and you
will get "ordinary unshared buffers" exactly as you want. This will
give you in total exactly one copy.

Also: as well as I understand, previously, pipe_buf_try_steal was
kind of lie. It may return true for buffers created via vmsplice with
GIFT. (I did not check this, but I think so.) I. e. pipe_buf_try_steal will
return "true" in this case, but pages are still shared! But, thanks to my
vmsplice patchset (which is already applied), this is no longer true!
So now pipe_buf_try_steal is absolutely safe to use!

Finally, we can degrade tee(2) to copy, and hopefully this will
allow us to always be sure that pipe buffers are not shared with anything.
This is possible future direction.

-- 
Askar Safin

  parent reply	other threads:[~2026-06-03 22:43 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-31  1:01 [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
2026-05-31  1:01 ` [PATCH 1/3] tee: fs/splice.c: remove unused parameter "flags" from "link_pipe" Askar Safin
2026-05-31  1:01 ` [PATCH 2/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
2026-06-03 20:56   ` Stefan Metzmacher
2026-06-03 21:17     ` Askar Safin
2026-05-31  1:01 ` [PATCH 3/3] splice: remove PIPE_BUF_FLAG_GIFT Askar Safin
2026-05-31  8:54 ` [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Pedro Falcato
2026-05-31 21:21   ` Askar Safin
2026-06-01 16:16     ` Christian Brauner
2026-06-02 21:12   ` Askar Safin
2026-06-02 21:37     ` Pedro Falcato
2026-06-02 22:06       ` Linus Torvalds
2026-06-02 22:41         ` Pedro Falcato
2026-06-02 23:07           ` Askar Safin
2026-06-02 22:54         ` Askar Safin
2026-06-03  0:05           ` Linus Torvalds
2026-06-03  1:08             ` Askar Safin
2026-06-03  3:51             ` Andy Lutomirski
2026-06-03  4:20               ` Linus Torvalds
2026-06-03  6:45                 ` Christian Brauner
2026-06-03 13:40                   ` Christian Brauner
2026-06-03 15:26                     ` Linus Torvalds
2026-06-03 18:10                 ` Andy Lutomirski
2026-06-03 18:28                   ` Linus Torvalds
2026-06-03 19:22                     ` David Howells
2026-06-03 19:59                     ` Linus Torvalds
2026-06-03 21:31                     ` Andy Lutomirski
2026-06-03 21:36                       ` Linus Torvalds
2026-06-03 21:38                         ` Linus Torvalds
2026-06-03 22:23                         ` Andy Lutomirski
2026-06-03 22:53                           ` Linus Torvalds
2026-06-03 22:43                       ` Askar Safin [this message]
2026-06-03 22:49                         ` Andy Lutomirski
2026-06-03 23:00                           ` Askar Safin
2026-06-04  0:01                             ` Linus Torvalds
2026-06-03 18:12                 ` Jakub Kicinski
2026-06-03 11:43               ` Pedro Falcato
2026-06-03 18:14                 ` Jakub Kicinski
2026-06-01  3:11 ` Andy Lutomirski
2026-06-01 15:36   ` Matthew Wilcox
2026-06-01 15:50     ` Linus Torvalds
2026-06-01 16:17       ` Christian Brauner
2026-06-01 16:22         ` Linus Torvalds
2026-06-03 19:24       ` David Howells
2026-06-01 16:23 ` Christian Brauner
2026-06-01 17:17   ` Linus Torvalds
2026-06-01 17:33     ` Al Viro
2026-06-01 20:04       ` Steven Rostedt
2026-06-02  0:28         ` Andrew Morton
2026-06-02  8:25           ` David Hildenbrand (Arm)
2026-06-02 18:44             ` Eric Biggers
2026-06-03  7:50               ` David Hildenbrand (Arm)
2026-06-03  9:57       ` Miklos Szeredi
2026-06-04  0:45 ` Askar Safin
2026-06-04  1:52   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260603224311.834796-1-safinaskar@gmail.com \
    --to=safinaskar@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=david@kernel.org \
    --cc=dhowells@redhat.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@amacapital.net \
    --cc=miklos@szeredi.hu \
    --cc=netdev@vger.kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=pfalcato@suse.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox