From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 2/2] xfs: hole the inode lock across a full file collapse
Date: Thu, 14 Aug 2014 13:11:36 +1000
Message-ID: <20140814031136.GG20518@dastard>
In-Reply-To: <20140813154229.GB4426@laptop.bfoster>

On Wed, Aug 13, 2014 at 11:42:29AM -0400, Brian Foster wrote:
> On Fri, Aug 08, 2014 at 02:49:26PM -0400, Brian Foster wrote:
> > A file collapse stress test workload reproduces collapse failures
> > mid-operation due to changes in the inode fork extent count across
> > extent shift cycles. xfs_collapse_file_space() currently calls
> > xfs_bmap_shift_extents() to shift one extent at a time per transaction.
> > The extent index is used to track the next extent to shift after each
> > iteration.

Right, it does so after writing back all the dirty pages and
invalidating the cache. This is done under the IOLOCK_EXCL, so we
should end up with nothing being able to newly dirty the inode while
the collapse range operation is in progress.

> >
> > A concurrent fsx and fsstress workload reproduces a scenario where the
> > extent count changes during this sequence, causing the 'current_ext'
> > index to become inaccurate and possibly skip shifting an extent. The
> > likely result of this behavior is the subsequent shift attempt will not
> > find a hole in the area of the skipped extent and fail, leaving the file
> > in a partially collapsed state.
> >
> > This occurs because the ilock is released and acquired across each
> > transaction and each individual extent shift. Tracepoint output shows
> > that once the ilock is released after an extent shift, a pending
> > blocking writeback (e.g., sync) can acquire the lock and proceed before
> > the next extent is shifted down. If the writeback converts part of a
> > delayed allocation earlier in the file, for example, it can insert a new
> > extent into the map. Tracing confirms a call to
> > xfs_bmap_add_extent_delay_real() in this particular instance.

Are we getting a dirty extent in the range being collapsed, or does
it exist outside the range being shifted? If outside, then shouldn't
we just sync the entire file first? i.e. treat it the same way as we
do xfs_swap_extents()?

Realistically, cur_extent is not valid once we drop the ILOCK.
Perhaps we should keep the offset of the extent we are up to in the
shift, and then look up the offset to get current extent map index
once we pick the ILOCK back up. That way we avoid issues with other
parts of the extent map changing between ILOCK hold contexts.
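
A minimal sketch of the offset-based resume idea, as a userspace model
and not kernel code. Everything here is hypothetical and made up for
illustration: struct extent_map, map_insert() and map_lookup_idx() are
not XFS interfaces, and the real in-core extent list is not a flat
array of start offsets.

```c
#include <stddef.h>

/* Hypothetical model: an extent map as a sorted array of start offsets. */
#define MAP_MAX 16

struct extent_map {
	unsigned long long startoff[MAP_MAX];
	size_t count;
};

/*
 * Insert a new extent start offset, keeping the array sorted.
 * Stands in for anything that adds a record while the lock is dropped.
 * Returns 0 on success, -1 if the map is full.
 */
static int map_insert(struct extent_map *m, unsigned long long off)
{
	size_t i;

	if (m->count == MAP_MAX)
		return -1;

	/* Slide larger entries up one slot to make room. */
	i = m->count;
	while (i > 0 && m->startoff[i - 1] > off) {
		m->startoff[i] = m->startoff[i - 1];
		i--;
	}
	m->startoff[i] = off;
	m->count++;
	return 0;
}

/*
 * Binary search for the index of the first extent whose start offset
 * is >= off. Returns m->count if there is no such extent.
 */
static size_t map_lookup_idx(const struct extent_map *m, unsigned long long off)
{
	size_t lo = 0, hi = m->count;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (m->startoff[mid] < off)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

With extents starting at 0, 20, 40 and 60, suppose the next extent to
shift is the one at offset 60, i.e. index 3. If map_insert(&m, 10) runs
while the lock is dropped, index 3 now refers to the extent at offset
40, whereas map_lookup_idx(&m, 60) returns 4 and still finds the
intended extent. The saved offset survives changes elsewhere in the
map; the saved index does not.
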
> > To prevent this scenario, hold the ilock across the entire extent shift
> > loop in xfs_collapse_file_space().
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > fs/xfs/xfs_bmap_util.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 2f1e30d..96eb97b 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -1474,6 +1474,8 @@ xfs_collapse_file_space(
> > if (error)
> > return error;
> >
> > + xfs_ilock(ip, XFS_ILOCK_EXCL);
> > +
>
> I realized this moves the lock outside of the xfs_trans_reserve(), thus
> opening a potential deadlock scenario with regard to the log. I suppose
> this might be harder to hit in real life than a sync() causing the
> operation to fall over mid-sequence, so I'm still Ok with keeping this
> unless anybody objects.

We can't do that. Inode locking rules explicitly forbid it - we've
avoided needing to break this rule for 20 years, so let's not
start making exceptions now just because it's a "simple fix".
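
A toy model of the ordering at issue, assuming the rule is "reserve
the transaction first, then take the ILOCK", which is what moving the
lock outside of xfs_trans_reserve() would violate. This is a userspace
sketch only: struct ctx and the model_*() helpers are hypothetical and
do not correspond to real XFS functions.

```c
#include <stdbool.h>

/* Hypothetical per-thread state for the model; not kernel code. */
struct ctx {
	bool holds_ilock;
	bool has_reservation;
};

/*
 * Reserving log space may block waiting for the log, so the model
 * only allows it while the ILOCK is not held.
 * Returns 0 on success, -1 on an ordering violation.
 */
static int model_trans_reserve(struct ctx *c)
{
	if (c->holds_ilock)
		return -1;
	c->has_reservation = true;
	return 0;
}

/* Take the ILOCK. Returns -1 if this context already holds it. */
static int model_ilock(struct ctx *c)
{
	if (c->holds_ilock)
		return -1;
	c->holds_ilock = true;
	return 0;
}

/* Commit the transaction only; the ILOCK stays held. */
static void model_commit(struct ctx *c)
{
	c->has_reservation = false;
}

/* Drop the ILOCK. */
static void model_iunlock(struct ctx *c)
{
	c->holds_ilock = false;
}
```

In this model a loop of reserve, lock, commit, unlock per extent never
trips the check. Taking the lock once and keeping it across the loop
does: the second model_trans_reserve() call returns -1 because the
ILOCK is still held from the first iteration.
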
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com