From: David Laight <david.laight.linux@gmail.com>
To: Stefan Metzmacher <metze@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Andy Lutomirski <luto@amacapital.net>,
Askar Safin <safinaskar@gmail.com>,
akpm@linux-foundation.org, axboe@kernel.dk, brauner@kernel.org,
david@kernel.org, dhowells@redhat.com, hch@infradead.org,
jack@suse.cz, linux-api@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, miklos@szeredi.hu, netdev@vger.kernel.org,
patches@lists.linux.dev, pfalcato@suse.de,
viro@zeniv.linux.org.uk, willy@infradead.org
Subject: Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
Date: Fri, 5 Jun 2026 13:19:42 +0100
Message-ID: <20260605131942.4584728e@pumpkin>
In-Reply-To: <512d948f-7883-4d8c-b2c5-a777e70ca975@samba.org>
On Fri, 5 Jun 2026 11:43:45 +0200
Stefan Metzmacher <metze@samba.org> wrote:
> Hi Linus,
>
> >> Am I understanding correctly that this will completely break zerocopy
> >> sendfile?
> >
> > Very much, yes.
> >
> > And it's worth making it very very clear that ABSOLUTELY NONE of the
> > recent big security bugs were in splice.
> >
> > They were all in the networking and crypto code that just didn't deal
> > with shared data correctly.
> >
> > So in that sense, it's a bit sad to discuss castrating splice.
> >
> > But it's probably still the right thing to at least try.
> >
> > I've seen very impressive benchmark numbers over the years, but
> > they've often smelled more like benchmarketing than actual real work.
> >
> > There's also a real possibility that a lot of the sendfile / splice
> > advantage has little to do with zero-copy, and more to do with the
> > cost of mapping and maintaining buffers in user space.
> >
> > If you are sending file data using plain reads and writes, it's not
> > just the "copy from user space to socket data structures".
> >
> > There's also the cost of populating user space in the first place:
> > page faults for mmap made *that* historical copy avoidance basically a
> > fairy tale.
> >
> > And not using mmap means that you have the cost of double caching in
> > the kernel _and_ user space etc.
> >
> > So sendfile() as a concept (whether you use combinations of splice()
> > system calls or the sendfile system call itself) isn't necessarily
> > only about the zero-copy, it's really also about avoiding the user
> > space memory management.
>
> I don't think so. Ok, maybe for webservers just serving tiny
> html files, that's true. But for me with Samba it's really the
> copy_to/from_iter() that is the major factor.

Is that copy also doing the IP checksum?
I really can't tell from the code (it does sometimes, even for TCP).
But I can't help feeling that optimisation is well past its sell-by date.

-- David

>
> We can use io_uring with IOSQE_ASYNC in order to offload
> the cpu-wasting memcpy to different cores, but it's still
> wasting a lot of resources.
>
> For the case of filesystem => socket, we can use
> IORING_OP_SENDMSG_ZC and that at least removes the
> copy_from_iter() in the sendmsg path, but the
> IORING_OP_READV with buffer sizes of up to 8MBytes
> still wastes cpu in copy_to_iter().
>
> For the case with smbdirect and RDMA offload with 2x200GBit/s links,
> only ~33GBytes/s are used, and the server cpu (even when using
> multiple cores) is the limit. Without the memcpy waste, ~46GBytes/s is
> easily reached and the limit is just the network link.
>
> Maybe another solution could be having a version of
> copy_to/from_iter that uses async_memcpy(), but I haven't
> had the time to experiment with that yet. Maybe a new flag
> to preadv2/pwritev2 could control that, so that the
> application can decide what's better.
>
> But without an alternative please don't kill splice.
>
> A lot of people are frustrated because they bought hardware
> that is able to handle a lot of throughput, but
> e.g. with the default of smb over tcp they get no
> higher than 3.5GByte/s on a 100GBit/s link that's able
> to handle ~11GBytes/s. And io_uring and splice are
> a key factor to fix that.
>
> Thanks!
> metze
>
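
For readers who want to see the two data paths discussed above side by
side, here is a minimal user-space sketch. It is an illustration only, not
Samba's code or anything posted in this thread: the function names, the
CHUNK size and the demo() helper are made up for the example. The first
function bounces the data through a user-space buffer (read, then send);
the second uses os.sendfile(), Python's wrapper around sendfile(2), so the
data never enters a user-space buffer. Whether the kernel also avoids
copies internally depends on the kernel and the socket type, and the sketch
measures nothing.

```python
import os
import socket
import tempfile

CHUNK = 64 * 1024  # arbitrary buffer size chosen for this sketch


def send_with_read_write(in_fd, sock, count):
    """Copy through user space: read() into a buffer, then send it."""
    sent = 0
    while sent < count:
        buf = os.read(in_fd, min(CHUNK, count - sent))
        if not buf:
            break  # end of file reached early
        sock.sendall(buf)
        sent += len(buf)
    return sent


def send_with_sendfile(in_fd, sock, count):
    """Let the kernel move file data to the socket via sendfile(2)."""
    sent = 0
    while sent < count:
        # An explicit offset is passed, so in_fd's file position is not used.
        n = os.sendfile(sock.fileno(), in_fd, sent, count - sent)
        if n == 0:
            break  # end of file reached early
        sent += n
    return sent


def demo(size=32 * 1024):
    """Send the same file over a local socket pair with both methods."""
    payload = os.urandom(size)
    results = {}
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        methods = (("read_write", send_with_read_write),
                   ("sendfile", send_with_sendfile))
        for name, fn in methods:
            a, b = socket.socketpair()
            with a, b:
                os.lseek(f.fileno(), 0, os.SEEK_SET)
                sent = fn(f.fileno(), a, size)
                a.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
                received = b""
                while True:
                    chunk = b.recv(CHUNK)
                    if not chunk:
                        break
                    received += chunk
            results[name] = (sent, received == payload)
    return results


if __name__ == "__main__":
    print(demo())
```

The payload is kept small (32 KiB) so that a single thread can write all of
it before reading anything back; a real server would of course use
non-blocking I/O or separate threads.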