Linux userland API discussions
 help / color / mirror / Atom feed
From: David Laight <david.laight.linux@gmail.com>
To: Stefan Metzmacher <metze@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Askar Safin <safinaskar@gmail.com>,
	akpm@linux-foundation.org, axboe@kernel.dk, brauner@kernel.org,
	david@kernel.org, dhowells@redhat.com, hch@infradead.org,
	jack@suse.cz, linux-api@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, miklos@szeredi.hu, netdev@vger.kernel.org,
	patches@lists.linux.dev, pfalcato@suse.de,
	viro@zeniv.linux.org.uk, willy@infradead.org
Subject: Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
Date: Fri, 5 Jun 2026 13:19:42 +0100	[thread overview]
Message-ID: <20260605131942.4584728e@pumpkin> (raw)
In-Reply-To: <512d948f-7883-4d8c-b2c5-a777e70ca975@samba.org>

On Fri, 5 Jun 2026 11:43:45 +0200
Stefan Metzmacher <metze@samba.org> wrote:

> Hi Linus,
> 
> >> Am I understanding correctly that this will completely break zerocopy
> >> sendfile?  
> > 
> > Very much, yes.
> > 
> > And it's worth making it very very clear that ABSOLUTELY NONE of the
> > recent big security bugs were in splice.
> > 
> > They were all in the networking and crypto code that just didn't deal
> > with shared data correctly.
> > 
> > So in that sense, it's a bit sad to discuss castrating splice.
> > 
> > But it's probably still the right thing to at least try.
> > 
> > I've seen very impressive benchmark numbers over the years, but
> > they've often smelled more like benchmarketing than actual real work.
> > 
> > There's also a real possibility that a lot of the sendfile / splice
> > advantage has little to do with zero-copy, and more to do with the
> > cost of mapping and maintaining buffers in user space.
> > 
> > If you are sending file data using plain reads and writes, it's not
> > just the "copy from user space to socket data structures".
> > 
> > There's also the cost of populating user space in the first place:
> > page faults for mmap made *that* historical copy avoidance basically a
> > fairy tale.
> > 
> > And not using mmap means that you have the cost of double caching in
> > the kernel _and_ user space etc.
> > 
> > So sendfile() as a concept (whether you use combinations of splice()
> > system calls or the sendfile system call itsefl) isn't necessarily
> > only about the zero-copy, it's really also about avoiding the user
> > space memory management.  
> 
> I don't think so. Ok, maybe for webservers just serving tiny
> html files, that's true. But for me with Samba it's really the
> copy_to/from_iter() that is the major factor.

Is that copy also doing the ip checksum?
I really can't tell from the code (it does sometimes, even for tcp).
But I can't help feeling that optimisation is well past its sell by date.

-- David

> 
> We can use io_uring with IOSQE_ASYNC in order to offload
> the memcpy cpu wasting to different cores, but it's still
> wasting a lot of resources.
> 
> For the case of filesystem => socket, we can use
> IORING_OP_SENDMSG_ZC and that at least removes the
> copy_from_iter() in the sendmsg path, but the
> IORING_OP_READV of buffers in the sizes up to 8MBytes
> is wasting cpu in copy_to_iter().
> 
> For the case with smbdirect and RDMA offload with 2x200GBit/s links
> changes from only ~33GBytes/s are used (and the server cpu even if using multiple cores)
> is the limit. Without the memcpy waste ~46GByte/s is easily reached
> and the limit is just the network link.
> 
> Maybe another solution could be having a version of
> copy_to/from_iter that uses async_memcpy(), but didn't
> have the time to experiment with that yet. Maybe a new flag
> to preadv2/pwritev2 could control that, so that the
> application can decide what's better.
> 
> But without an alternative please don't kill splice.
> 
> A lot of people are frustrated because they bought hardware
> that is able to handle a lot of throughput, but
> e.g. with the default of smb over tcp they get no
> higher than 3.5GByte/s on a 100GBit/s link that's able
> to handle ~11GBytes/s. And io_uring and splice are
> a key factor to fix that.
> 
> Thanks!
> metze
> 


  reply	other threads:[~2026-06-05 12:19 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-31  1:01 [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
2026-05-31  1:01 ` [PATCH 1/3] tee: fs/splice.c: remove unused parameter "flags" from "link_pipe" Askar Safin
2026-05-31  1:01 ` [PATCH 2/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
2026-06-03 20:56   ` Stefan Metzmacher
2026-06-03 21:17     ` Askar Safin
2026-06-04  9:06       ` David Laight
2026-06-04 14:17         ` Linus Torvalds
2026-06-04 17:38           ` David Laight
2026-06-04 19:30             ` Linus Torvalds
2026-06-04 21:32               ` David Laight
2026-06-04 21:42                 ` Linus Torvalds
2026-06-05  9:32                   ` Florian Weimer
2026-06-05  1:57                 ` Nathan Chancellor
2026-06-05  8:23                   ` David Laight
2026-06-04 23:25             ` Askar Safin
2026-06-05 11:02   ` Mark Brown
2026-05-31  1:01 ` [PATCH 3/3] splice: remove PIPE_BUF_FLAG_GIFT Askar Safin
2026-05-31  8:54 ` [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Pedro Falcato
2026-05-31 21:21   ` Askar Safin
2026-06-01 16:16     ` Christian Brauner
2026-06-02 21:12   ` Askar Safin
2026-06-02 21:37     ` Pedro Falcato
2026-06-02 22:06       ` Linus Torvalds
2026-06-02 22:41         ` Pedro Falcato
2026-06-02 23:07           ` Askar Safin
2026-06-02 22:54         ` Askar Safin
2026-06-03  0:05           ` Linus Torvalds
2026-06-03  1:08             ` Askar Safin
2026-06-03  3:51             ` Andy Lutomirski
2026-06-03  4:20               ` Linus Torvalds
2026-06-03  6:45                 ` Christian Brauner
2026-06-03 13:40                   ` Christian Brauner
2026-06-03 15:26                     ` Linus Torvalds
2026-06-03 18:10                 ` Andy Lutomirski
2026-06-03 18:28                   ` Linus Torvalds
2026-06-03 19:22                     ` David Howells
2026-06-03 19:59                     ` Linus Torvalds
2026-06-03 21:31                     ` Andy Lutomirski
2026-06-03 21:36                       ` Linus Torvalds
2026-06-03 21:38                         ` Linus Torvalds
2026-06-03 22:23                         ` Andy Lutomirski
2026-06-03 22:53                           ` Linus Torvalds
2026-06-03 22:43                       ` Askar Safin
2026-06-03 22:49                         ` Andy Lutomirski
2026-06-03 23:00                           ` Askar Safin
2026-06-04  0:01                             ` Linus Torvalds
2026-06-03 18:12                 ` Jakub Kicinski
2026-06-05  9:43                 ` Stefan Metzmacher
2026-06-05 12:19                   ` David Laight [this message]
2026-06-03 11:43               ` Pedro Falcato
2026-06-03 18:14                 ` Jakub Kicinski
2026-06-01  3:11 ` Andy Lutomirski
2026-06-01 15:36   ` Matthew Wilcox
2026-06-01 15:50     ` Linus Torvalds
2026-06-01 16:17       ` Christian Brauner
2026-06-01 16:22         ` Linus Torvalds
2026-06-03 19:24       ` David Howells
2026-06-01 16:23 ` Christian Brauner
2026-06-01 17:17   ` Linus Torvalds
2026-06-01 17:33     ` Al Viro
2026-06-01 20:04       ` Steven Rostedt
2026-06-02  0:28         ` Andrew Morton
2026-06-02  8:25           ` David Hildenbrand (Arm)
2026-06-02 18:44             ` Eric Biggers
2026-06-03  7:50               ` David Hildenbrand (Arm)
2026-06-04  6:32           ` Willy Tarreau
2026-06-04 14:31             ` Linus Torvalds
2026-06-04 15:53               ` Willy Tarreau
2026-06-04 15:58                 ` Linus Torvalds
2026-06-04 16:15                   ` Willy Tarreau
2026-06-04 15:53             ` Andy Lutomirski
2026-06-04 16:09               ` Willy Tarreau
2026-06-04 17:25                 ` Andy Lutomirski
2026-06-03  9:57       ` Miklos Szeredi
2026-06-05  8:35   ` Collin Funk
2026-06-04  0:45 ` [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2 Askar Safin
2026-06-04  1:52   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260605131942.4584728e@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=david@kernel.org \
    --cc=dhowells@redhat.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@amacapital.net \
    --cc=metze@samba.org \
    --cc=miklos@szeredi.hu \
    --cc=netdev@vger.kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=pfalcato@suse.de \
    --cc=safinaskar@gmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox