linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 4/7] xfs: buffered write failure should not truncate the page cache
Date: Wed, 2 Nov 2022 15:26:42 -0700	[thread overview]
Message-ID: <Y2LuogWYlbkpcoHD@magnolia> (raw)
In-Reply-To: <20221102210433.GZ3600936@dread.disaster.area>

On Thu, Nov 03, 2022 at 08:04:33AM +1100, Dave Chinner wrote:
> On Wed, Nov 02, 2022 at 09:41:30AM -0700, Darrick J. Wong wrote:
> > On Tue, Nov 01, 2022 at 11:34:09AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > xfs_buffered_write_iomap_end() currently invalidates the page cache
> > > over the unused range of the delalloc extent it allocated. While the
> > > write allocated the delalloc extent, it does not own it exclusively
> > > as the write does not hold any locks that prevent either writeback
> > > or mmap page faults from changing the state of either the page cache
> > > or the extent state backing this range.
> > > 
> > > Whilst xfs_bmap_punch_delalloc_range() already handles races in
> > > extent conversion - it will only punch out delalloc extents and it
> > > ignores any other type of extent - the page cache truncate does not
> > > discriminate between data written by this write or some other task.
> > > As a result, truncating the page cache can result in data corruption
> > > if the write races with mmap modifications to the file over the same
> > > range.
> > > 
> > > generic/346 exercises this workload, and if we randomly fail writes
> > > (as will happen when iomap gets stale iomap detection later in the
> > > patchset), it will randomly corrupt the file data because it removes
> > > data written by mmap() in the same page as the write() that failed.
> > > 
> > > Hence we do not want to punch out the page cache over the range of
> > > the extent we failed to write to - what we actually need to do is
> > > detect the ranges that have dirty data in cache over them and *not
> > > punch them out*.
> > 
> > Same dumb question as hch -- why do we need to punch out the nondirty
> > pagecache after a failed write?  If the folios are uptodate then we're
> > evicting cache unnecessarily, and if they're !uptodate can't we let
> > reclaim do the dirty work for us?
> 
> Sorry, we don't punch out the page cache  - this was badly worded. I
> meant:
> 
> "[...] - what we actually need to do is
> detect the ranges that have dirty data in cache over the delalloc
> extent and retain those regions of the delalloc extent."
> 
> i.e. we never punch the page cache anymore, and we only selectively
> punch the delalloc extent that back the clean regions of thw write
> range...

Ah ok.  I was thinking there was a discrepancy between the description
in the commit message and the code, but then I zoomed out and asked
myself why dump the pagecache at all...

> > I don't know if there are hysterical raisins for this or if the goal is
> > to undo memory consumption after a write failure?  If we're stale-ing
> > the write because the iomapping changed, why not leave the folio where
> > it is, refresh the iomapping, and come back to (possibly?) the same
> > folio?
> 
> I can't say for certain - I haven't gone an looked at the history.
> I suspect it goes back to the days where write() could write zeroes
> into the page cache for eof zeroing or zeroing for file extension
> before the write() started writing data. Maybe that was part of what
> it was trying to undo?

Yeah, I dunno, but it certainly looks like it might be an attempt to
undo the effects of posteof zeroing if the write fails.  Back in 2.6.36
days, commit 155130a4f784 ("get rid of block_write_begin_newtrunc")
changed xfs_vm_write_begin to truncate any posteof areas after a failed
write.  This was part of some sort of "new truncate sequence" that
itself got fixed and patched various times after that.

In 4.8, commit 68a9f5e7007c ("xfs: implement iomap based buffered write
path") changed this truncation to the unprocessed part of a short write,
even if it wasn't posteof.  The commit says it's part of a broad
rewrite, but doesn't mention anything about that particular change.

/me shrugs

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

  reply	other threads:[~2022-11-02 22:26 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-01  0:34 xfs, iomap: fix data corrupton due to stale cached iomaps Dave Chinner
2022-11-01  0:34 ` [PATCH 1/7] xfs: write page faults in iomap are not buffered writes Dave Chinner
2022-11-02  7:17   ` Christoph Hellwig
2022-11-02 16:12   ` Darrick J. Wong
2022-11-02 21:11     ` Dave Chinner
2022-11-01  0:34 ` [PATCH 2/7] xfs: punching delalloc extents on write failure is racy Dave Chinner
2022-11-02  7:18   ` Christoph Hellwig
2022-11-02 16:22   ` Darrick J. Wong
2022-11-01  0:34 ` [PATCH 3/7] xfs: use byte ranges for write cleanup ranges Dave Chinner
2022-11-02  7:20   ` Christoph Hellwig
2022-11-02 16:32   ` Darrick J. Wong
2022-11-04  5:40     ` Dave Chinner
2022-11-07 23:53       ` Darrick J. Wong
2022-11-01  0:34 ` [PATCH 4/7] xfs: buffered write failure should not truncate the page cache Dave Chinner
2022-11-01 11:57   ` kernel test robot
2022-11-02  7:24   ` Christoph Hellwig
2022-11-02 20:57     ` Dave Chinner
2022-11-02 16:41   ` Darrick J. Wong
2022-11-02 21:04     ` Dave Chinner
2022-11-02 22:26       ` Darrick J. Wong [this message]
2022-11-04  8:08   ` Christoph Hellwig
2022-11-04 23:10     ` Dave Chinner
2022-11-07 23:48       ` Darrick J. Wong
2022-11-01  0:34 ` [PATCH 5/7] iomap: write iomap validity checks Dave Chinner
2022-11-02  8:36   ` Christoph Hellwig
2022-11-02 16:43     ` Darrick J. Wong
2022-11-02 16:58       ` Darrick J. Wong
2022-11-03  0:35         ` Dave Chinner
2022-11-04  8:12           ` Christoph Hellwig
2022-11-02 16:57   ` Darrick J. Wong
2022-11-01  0:34 ` [PATCH 6/7] xfs: use iomap_valid method to detect stale cached iomaps Dave Chinner
2022-11-01  9:15   ` kernel test robot
2022-11-02  8:41   ` Christoph Hellwig
2022-11-02 21:39     ` Dave Chinner
2022-11-04  8:14       ` Christoph Hellwig
2022-11-02 17:19   ` Darrick J. Wong
2022-11-02 22:36     ` Dave Chinner
2022-11-08  0:00       ` Darrick J. Wong
2022-11-01  0:34 ` [PATCH 7/7] xfs: drop write error injection is unfixable, remove it Dave Chinner
2022-11-01  3:39 ` xfs, iomap: fix data corrupton due to stale cached iomaps Darrick J. Wong
2022-11-01  4:21   ` Dave Chinner
2022-11-02 17:23     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2LuogWYlbkpcoHD@magnolia \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).