public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfs: stabilize insert range start boundary to avoid COW writeback race
Date: Wed, 11 Dec 2019 08:41:00 +1100	[thread overview]
Message-ID: <20191210214100.GB19256@dread.disaster.area> (raw)
In-Reply-To: <20191210132340.11330-1-bfoster@redhat.com>

On Tue, Dec 10, 2019 at 08:23:40AM -0500, Brian Foster wrote:
> generic/522 (fsx) occasionally fails with a file corruption due to
> an insert range operation. The primary characteristic of the
> corruption is a misplaced insert range operation that differs from
> the requested target offset. The reason for this behavior is a race
> between the extent shift sequence of an insert range and a COW
> writeback completion that causes a front merge with the first extent
> in the shift.

How is the COW writeback completion modifying the extent list while
an extent shift is modifying the extent list?  Both should be
running under XFS_ILOCK_EXCL contexts so there shouldn't be a race
condition here unless we've screwed up the extent list modification
atomicity...

> 
> The shift preparation function flushes and unmaps from the target
> offset of the operation to the end of the file to ensure no
> modifications can be made and page cache is invalidated before file
> data is shifted. An insert range operation then splits the extent at
> the target offset, if necessary, and begins to shift the start
> offset of each extent starting from the end of the file to the start
> offset. The shift sequence operates at extent level and so depends
> on the preparation sequence to guarantee no changes can be made to
> the target range during the shift.

Oh... shifting extents is not an atomic operation w.r.t. other
inode modifications - both insert and collapse run individual
modification transactions and lock/unlock the inode around each
transaction. So, essentially, they aren't atomic when faced with
other *metadata* modifications to the inode.

> If the block immediately prior to
> the target offset was dirty and shared, however, it can undergo
> writeback and move from the COW fork to the data fork at any point
> during the shift. If the block is contiguous with the block at the
> start offset of the insert range, it can front merge and alter the
> start offset of the extent. Once the shift sequence reaches the
> target offset, it shifts based on the latest start offset and
> silently changes the target offset of the operation and corrupts the
> file.

Yup, that's exactly the landmine that non-atomic, multi-transaction
extent range operations have. It might be a COW operation, it might
be something else that ends up manipulating the extent list. But
because the ILOCK is not held across the entire extent shift,
insert/collapse are susceptible to corruption when any other XFs
code concurrently modifies the extent list.

I think insert/collapse need to be converted to work like a
truncate operation instead of a series on individual write
operations. That is, they are a permanent transaction that locks the
inode once and is rolled repeatedly until the entire extent listi
modification is done and then the inode is unlocked.

> To address this problem, update the shift preparation code to
> stabilize the start boundary along with the full range of the
> insert. Also update the existing corruption check to fail if any
> extent is shifted with a start offset behind the target offset of
> the insert range. This prevents insert from racing with COW
> writeback completion and fails loudly in the event of an unexpected
> extent shift.

It looks ok to avoid this particular symptom (backportable point
fix), but I really think we should convert insert/collapse to be
atomic w.r.t other extent list modifications....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2019-12-10 21:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-10 13:23 [PATCH] xfs: stabilize insert range start boundary to avoid COW writeback race Brian Foster
2019-12-10 14:02 ` Carlos Maiolino
2019-12-10 21:41 ` Dave Chinner [this message]
2019-12-11 12:47   ` Brian Foster
2019-12-11 20:52     ` Dave Chinner
2019-12-12 14:16       ` Brian Foster
2019-12-12 20:48         ` Dave Chinner
2019-12-13 11:37           ` Brian Foster
2019-12-17  5:59 ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191210214100.GB19256@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox