From: David Laight <david.laight.linux@gmail.com>
To: Stefan Metzmacher <metze@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Andy Lutomirski <luto@amacapital.net>,
Askar Safin <safinaskar@gmail.com>,
akpm@linux-foundation.org, axboe@kernel.dk, brauner@kernel.org,
david@kernel.org, dhowells@redhat.com, hch@infradead.org,
jack@suse.cz, linux-api@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, miklos@szeredi.hu, netdev@vger.kernel.org,
patches@lists.linux.dev, pfalcato@suse.de,
viro@zeniv.linux.org.uk, willy@infradead.org
Subject: Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
Date: Fri, 5 Jun 2026 13:19:42 +0100 [thread overview]
Message-ID: <20260605131942.4584728e@pumpkin> (raw)
In-Reply-To: <512d948f-7883-4d8c-b2c5-a777e70ca975@samba.org>
On Fri, 5 Jun 2026 11:43:45 +0200
Stefan Metzmacher <metze@samba.org> wrote:
> Hi Linus,
>
> >> Am I understanding correctly that this will completely break zerocopy
> >> sendfile?
> >
> > Very much, yes.
> >
> > And it's worth making it very very clear that ABSOLUTELY NONE of the
> > recent big security bugs were in splice.
> >
> > They were all in the networking and crypto code that just didn't deal
> > with shared data correctly.
> >
> > So in that sense, it's a bit sad to discuss castrating splice.
> >
> > But it's probably still the right thing to at least try.
> >
> > I've seen very impressive benchmark numbers over the years, but
> > they've often smelled more like benchmarketing than actual real work.
> >
> > There's also a real possibility that a lot of the sendfile / splice
> > advantage has little to do with zero-copy, and more to do with the
> > cost of mapping and maintaining buffers in user space.
> >
> > If you are sending file data using plain reads and writes, it's not
> > just the "copy from user space to socket data structures".
> >
> > There's also the cost of populating user space in the first place:
> > page faults for mmap made *that* historical copy avoidance basically a
> > fairy tale.
> >
> > And not using mmap means that you have the cost of double caching in
> > the kernel _and_ user space etc.
> >
> > So sendfile() as a concept (whether you use combinations of splice()
> > system calls or the sendfile system call itsefl) isn't necessarily
> > only about the zero-copy, it's really also about avoiding the user
> > space memory management.
>
> I don't think so. Ok, maybe for webservers just serving tiny
> html files, that's true. But for me with Samba it's really the
> copy_to/from_iter() that is the major factor.
Is that copy also doing the ip checksum?
I really can't tell from the code (it does sometimes, even for tcp).
But I can't help feeling that optimisation is well past its sell by date.
-- David
>
> We can use io_uring with IOSQE_ASYNC in order to offload
> the memcpy cpu wasting to different cores, but it's still
> wasting a lot of resources.
>
> For the case of filesystem => socket, we can use
> IORING_OP_SENDMSG_ZC and that at least removes the
> copy_from_iter() in the sendmsg path, but the
> IORING_OP_READV of buffers in the sizes up to 8MBytes
> is wasting cpu in copy_to_iter().
>
> For the case with smbdirect and RDMA offload with 2x200GBit/s links
> changes from only ~33GBytes/s are used (and the server cpu even if using multiple cores)
> is the limit. Without the memcpy waste ~46GByte/s is easily reached
> and the limit is just the network link.
>
> Maybe another solution could be having a version of
> copy_to/from_iter that uses async_memcpy(), but didn't
> have the time to experiment with that yet. Maybe a new flag
> to preadv2/pwritev2 could control that, so that the
> application can decide what's better.
>
> But without an alternative please don't kill splice.
>
> A lot of people are frustrated because they bought hardware
> that is able to handle a lot of throughput, but
> e.g. with the default of smb over tcp they get no
> higher than 3.5GByte/s on a 100GBit/s link that's able
> to handle ~11GBytes/s. And io_uring and splice are
> a key factor to fix that.
>
> Thanks!
> metze
>
next prev parent reply other threads:[~2026-06-05 12:19 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-31 1:01 [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
2026-05-31 1:01 ` [PATCH 1/3] tee: fs/splice.c: remove unused parameter "flags" from "link_pipe" Askar Safin
2026-05-31 1:01 ` [PATCH 2/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
2026-06-03 20:56 ` Stefan Metzmacher
2026-06-03 21:17 ` Askar Safin
2026-06-04 9:06 ` David Laight
2026-06-04 14:17 ` Linus Torvalds
2026-06-04 17:38 ` David Laight
2026-06-04 19:30 ` Linus Torvalds
2026-06-04 21:32 ` David Laight
2026-06-04 21:42 ` Linus Torvalds
2026-06-05 9:32 ` Florian Weimer
2026-06-05 1:57 ` Nathan Chancellor
2026-06-05 8:23 ` David Laight
2026-06-04 23:25 ` Askar Safin
2026-06-05 11:02 ` Mark Brown
2026-05-31 1:01 ` [PATCH 3/3] splice: remove PIPE_BUF_FLAG_GIFT Askar Safin
2026-05-31 8:54 ` [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Pedro Falcato
2026-05-31 21:21 ` Askar Safin
2026-06-01 16:16 ` Christian Brauner
2026-06-02 21:12 ` Askar Safin
2026-06-02 21:37 ` Pedro Falcato
2026-06-02 22:06 ` Linus Torvalds
2026-06-02 22:41 ` Pedro Falcato
2026-06-02 23:07 ` Askar Safin
2026-06-02 22:54 ` Askar Safin
2026-06-03 0:05 ` Linus Torvalds
2026-06-03 1:08 ` Askar Safin
2026-06-03 3:51 ` Andy Lutomirski
2026-06-03 4:20 ` Linus Torvalds
2026-06-03 6:45 ` Christian Brauner
2026-06-03 13:40 ` Christian Brauner
2026-06-03 15:26 ` Linus Torvalds
2026-06-03 18:10 ` Andy Lutomirski
2026-06-03 18:28 ` Linus Torvalds
2026-06-03 19:22 ` David Howells
2026-06-03 19:59 ` Linus Torvalds
2026-06-03 21:31 ` Andy Lutomirski
2026-06-03 21:36 ` Linus Torvalds
2026-06-03 21:38 ` Linus Torvalds
2026-06-03 22:23 ` Andy Lutomirski
2026-06-03 22:53 ` Linus Torvalds
2026-06-03 22:43 ` Askar Safin
2026-06-03 22:49 ` Andy Lutomirski
2026-06-03 23:00 ` Askar Safin
2026-06-04 0:01 ` Linus Torvalds
2026-06-03 18:12 ` Jakub Kicinski
2026-06-05 9:43 ` Stefan Metzmacher
2026-06-05 12:19 ` David Laight [this message]
2026-06-03 11:43 ` Pedro Falcato
2026-06-03 18:14 ` Jakub Kicinski
2026-06-01 3:11 ` Andy Lutomirski
2026-06-01 15:36 ` Matthew Wilcox
2026-06-01 15:50 ` Linus Torvalds
2026-06-01 16:17 ` Christian Brauner
2026-06-01 16:22 ` Linus Torvalds
2026-06-03 19:24 ` David Howells
2026-06-01 16:23 ` Christian Brauner
2026-06-01 17:17 ` Linus Torvalds
2026-06-01 17:33 ` Al Viro
2026-06-01 20:04 ` Steven Rostedt
2026-06-02 0:28 ` Andrew Morton
2026-06-02 8:25 ` David Hildenbrand (Arm)
2026-06-02 18:44 ` Eric Biggers
2026-06-03 7:50 ` David Hildenbrand (Arm)
2026-06-04 6:32 ` Willy Tarreau
2026-06-04 14:31 ` Linus Torvalds
2026-06-04 15:53 ` Willy Tarreau
2026-06-04 15:58 ` Linus Torvalds
2026-06-04 16:15 ` Willy Tarreau
2026-06-04 15:53 ` Andy Lutomirski
2026-06-04 16:09 ` Willy Tarreau
2026-06-04 17:25 ` Andy Lutomirski
2026-06-03 9:57 ` Miklos Szeredi
2026-06-05 8:35 ` Collin Funk
2026-06-04 0:45 ` [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
2026-06-04 1:52 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260605131942.4584728e@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=axboe@kernel.dk \
--cc=brauner@kernel.org \
--cc=david@kernel.org \
--cc=dhowells@redhat.com \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=linux-api@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@amacapital.net \
--cc=metze@samba.org \
--cc=miklos@szeredi.hu \
--cc=netdev@vger.kernel.org \
--cc=patches@lists.linux.dev \
--cc=pfalcato@suse.de \
--cc=safinaskar@gmail.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox