linux-fsdevel.vger.kernel.org archive mirror
From: Miklos Szeredi <miklos@szeredi.hu>
To: Matt Whitlock <kernel@mattwhitlock.name>
Cc: David Howells <dhowells@redhat.com>,
	netdev@vger.kernel.org, Matthew Wilcox <willy@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>,
	linux-fsdevel@kvack.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
Date: Wed, 19 Jul 2023 21:35:33 +0200
Message-ID: <CAJfpegvq4M_Go7fHiWVBBkrK6h4ChLqQTd0+EOKbRWZDcVerWA@mail.gmail.com>
In-Reply-To: <c634a18e-9f2b-4746-bd8f-aa1d41e6ddf7@mattwhitlock.name>

On Wed, 19 Jul 2023 at 19:59, Matt Whitlock <kernel@mattwhitlock.name> wrote:
>
> On Wednesday, 19 July 2023 06:17:51 EDT, Miklos Szeredi wrote:
> > On Thu, 29 Jun 2023 at 17:56, David Howells <dhowells@redhat.com> wrote:
> >>
> >> Splicing data from, say, a file into a pipe currently leaves the source
> >> pages in the pipe after splice() returns - but this means that those pages
> >> can be subsequently modified by shared-writable mmap(), write(),
> >> fallocate(), etc. before they're consumed.
> >
> > What is this trying to fix?   The above behavior is well known, so
> > it's not likely to be a problem.
>
> Respectfully, it's not well-known, as it's not documented. If the splice(2)
> man page had mentioned that pages can be mutated after they're already
> ostensibly at rest in the output pipe buffer, then my nightly backups
> wouldn't have been incurring corruption silently for many months.

splice(2):

       Though we talk of copying, actual copies are generally avoided.
       The kernel does this by implementing a pipe buffer as a set of
       reference-counted pointers to pages of kernel memory.  The kernel
       creates "copies" of pages in a buffer by creating new pointers
       (for the output buffer) referring to the pages, and increasing
       the reference counts for the pages: only pointers are copied, not
       the pages of the buffer.

While not explicitly stating that the contents of the pages can change
after being spliced, this can easily be inferred from the above
semantics.

Thanks,
Miklos
