From: Matt Whitlock <kernel@mattwhitlock.name>
To: Matthew Wilcox <willy@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>,
<linux-fsdevel@vger.kernel.org>, Christoph Hellwig <hch@lst.de>,
David Howells <dhowells@redhat.com>, Jens Axboe <axboe@kernel.dk>,
Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [Reproducer] Corruption, possible race between splice and FALLOC_FL_PUNCH_HOLE
Date: Tue, 27 Jun 2023 14:38:32 -0400 [thread overview]
Message-ID: <992abc31-c052-435d-b9e4-a55dccb03a73@mattwhitlock.name> (raw)
In-Reply-To: <ZJq6nJBoX1m6Po9+@casper.infradead.org>
On Tuesday, 27 June 2023 06:31:56 EDT, Matthew Wilcox wrote:
> Perhaps the problem is that splice() appears to copy, but really just
> takes the reference. Perhaps splice needs to actually copy if it
> sees a multi-page folio and isn't going to take all of it. I'm not
> an expert in splice-ology, so let's cc some people who know more about
> splice than I do.
As a naive userspace coder, my mental model of splice is that it takes
references to the pages backing the input file/pipe and appends those
references to the output pipe's queue. Crucially, my expectation is that
the cloned pages are marked copy-on-write, such that if some other process
subsequently modifies the pages, then those pages will be copied at the
time of modification (i.e., copy-on-write), and the references in the
spliced-into pipe buffer will still refer to the unmodified originals.
If splice is merely taking a reference without caring to implement
copy-on-write semantics, then it's not just FALLOC_FL_PUNCH_HOLE that will
cause problems. Indeed, *any* write to the input file in the range where
pages were spliced into a pipe will impact the pages already in the pipe
buffer. I'm sure that makes perfect sense to a kernel developer, but it
violates the implied contract of splice, which is to act *as though* the
pages had been *copied* into the pipe.
Obviously, I'm not asking for splice to actually copy the pages, as
zero-copy semantics are a large part of the motivation for preferring
splice in the first place, but I do believe that the pages should be made
copy-on-write such that subsequent writes to those pages in the page cache,
whether by write or mmap or fallocate or otherwise, will force a copy of
the pages to be made if the copy-on-write reference count of the modified
pages is non-zero. Note, I say, "copy-on-write reference count," as
distinct from the ordinary reference count, as you would need to properly
handle the case where there are multiple *shared* references to a page that
*should* observe any changes made to it, while there may also be multiple
*private* references to the page that *must not* observe any changes made
to it.
Just a thought: if you're going to implement bi-level reference counting
semantics, then you could make MMAP_PRIVATE actually take a snapshot of the
mapped pages at the time of mapping. Currently, mmap(2) says, "It is
unspecified whether changes made to the file after the mmap() call are
visible in the mapped region," so you have the latitude to make the private
mapping truly isolated from subsequent changes in the backing file, using
the same plumbing as you could use to isolate spliced pages.
next prev parent reply other threads:[~2023-06-27 18:41 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-27 1:12 [Reproducer] Corruption, possible race between splice and FALLOC_FL_PUNCH_HOLE Matt Whitlock
2023-06-27 5:47 ` Dave Chinner
2023-06-27 6:20 ` Dave Chinner
2023-06-27 10:31 ` Matthew Wilcox
2023-06-27 18:38 ` Matt Whitlock [this message]
2023-06-27 21:49 ` Dave Chinner
2023-06-27 18:14 ` Matt Whitlock
2023-06-27 21:51 ` Dave Chinner
2023-06-28 6:30 ` David Howells
2023-06-28 8:15 ` Dave Chinner
2023-06-28 9:33 ` David Howells
2023-06-28 12:51 ` Matthew Wilcox
2023-06-28 14:19 ` David Howells
2023-06-28 22:41 ` Dave Chinner
2023-06-28 17:35 ` Matt Whitlock
2023-06-28 18:27 ` David Howells
2023-06-28 22:17 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=992abc31-c052-435d-b9e4-a55dccb03a73@mattwhitlock.name \
--to=kernel@mattwhitlock.name \
--cc=axboe@kernel.dk \
--cc=david@fromorbit.com \
--cc=dhowells@redhat.com \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).