From: Willy Tarreau <w@1wt.eu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Steven Rostedt <rostedt@goodmis.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>,
Askar Safin <safinaskar@gmail.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-api@vger.kernel.org, netdev@vger.kernel.org,
Matthew Wilcox <willy@infradead.org>,
Jens Axboe <axboe@kernel.dk>,
Christoph Hellwig <hch@infradead.org>,
David Howells <dhowells@redhat.com>,
David Hildenbrand <david@kernel.org>,
Pedro Falcato <pfalcato@suse.de>,
Miklos Szeredi <miklos@szeredi.hu>,
patches@lists.linux.dev, linux-fsdevel@vger.kernel.org,
Jan Kara <jack@suse.cz>
Subject: Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
Date: Thu, 4 Jun 2026 17:53:37 +0200
Message-ID: <aiGfgRch99l_5z11@1wt.eu>
In-Reply-To: <CAHk-=wiQB-j53cTs9kM4UeXoXPaFj78aJe3D6Yp1Fohg7i4tWA@mail.gmail.com>
On Thu, Jun 04, 2026 at 07:31:30AM -0700, Linus Torvalds wrote:
> On Wed, 3 Jun 2026 at 23:32, Willy Tarreau <w@1wt.eu> wrote:
> >
> > I'm using vmsplice() + tee() + splice() in high-performance applications,
> > load generators to be precise, and soon a cache. This is super convenient
> > and extremely efficient:
> >
> > - vmsplice() is used to prepare a "master" pipe with data to be sent
> > over TCP or kTLS
> > - then for each request, we do tee() from this master pipe to per-request
> > pipes.
> > - the per-request pipes are those that are used to deliver the data to
> > the socket via splice().
>
> So most of those would actually not be affected by any of the existing
> patches: the pipe->socket splice would remain, the tee() code would
> still just take a ref to the page count.
OK!
> The vmsplice() would change,
OK, but for this use case it's not dramatic (it could be more annoying
for the cache, though, where I'd like this zero-copy from memory to the
wire).
> but looking at your haterm.c sources, it
> looks like it's mostly a fairly small thing ("common_response[]" being
> 16kB).
In this one it's indeed a 16kB block that is repeated into the
same pipe for simplicity; in its ancestor it was 64kB. We try to
make the pipe as large as we can, but that's all.
> That is typically *faster* to just copy than look up pages.
>
> HOWEVER.
>
> It looks like you're actually doing exactly the thing that I thought
> was crazy and wouldn't even work reliably: you change the
> common_response[] contents dynamically *after* the vmsplice, and
> depend on the fact that changing it in user space changes the buffer
> in the pipe too.
No no, it's definitely not doing that (or it's a bug, but it's not
supposed to happen). I'm perfectly aware that one must definitely not
do that, and it's a guarantee the user of vmsplice() must provide.
> So that would break *entirely* with the vmsplice() changes if I read
> the code right (which I might not do) simply because that looks like
> it really does require that "writably shared buffer after the fact".
We agree that this would deliver complete garbage and I'm not interested
in such a "feature" at all.
> Interesting. Because the vmsplice() code uses get_user_pages_fast(),
> and honestly, it never pinned the page reliably to the original source
> - it breaks COW randomly in one direction or the other after fork()
I must confess I never knew how it deals with pages shared over a
fork(), and have been wondering if two processes could create a
shared memory area on the fly just by using vmsplice() on each side
and end up with the same pages (I don't need this but it could have
very nice use cases).
> (and I thought even after a page-out, but thinking more about it the
> swap cache may have made it work for that case).
>
> Uhhuh. That does look like it makes the vmsplice() changes untenable.
No no don't worry, I'm not seeing any value in changing data after
vmsplice() and that would just be a bug. My goal here is only to
pre-fill a buffer with a pattern then prepare the pipe with that
pattern, nothing less, nothing more.
> But I may be reading your haproxy code entirely wrong.
I think so, but I wouldn't be the one blaming you for this ;-)
Thanks for the clarifications!
Willy