From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfs: don't unlock invalidated buf on aborted tx commit
Date: Wed, 22 Aug 2018 09:05:13 -0400 [thread overview]
Message-ID: <20180822130512.GC26084@bfoster> (raw)
In-Reply-To: <20180822002840.GP31495@dastard>
On Wed, Aug 22, 2018 at 10:28:40AM +1000, Dave Chinner wrote:
> On Tue, Aug 21, 2018 at 10:01:16AM -0400, Brian Foster wrote:
> > xfstests generic/388,475 occasionally reproduce assertion failures
> > in xfs_buf_item_unpin() when the final bli reference is dropped on
> > an invalidated buffer and the buffer is not locked as it is expected
> > to be. Invalidated buffers should remain locked on transaction
> > commit until the final unpin, at which point the buffer is removed
> > from the AIL and the bli is freed since stale buffers are not
> > written back.
> >
> > The assert failures are associated with filesystem shutdown,
> > typically due to log I/O errors injected by the test. The
> > problematic situation can occur if the shutdown happens to cause a
> > race between an active transaction that has invalidated a particular
> > buffer and an I/O error on a log buffer that contains the bli
> > associated with the same (now stale) buffer.
> >
> > Both transaction and log contexts acquire a bli reference. If the
> > transaction has already invalidated the buffer by the time the I/O
> > error occurs and ends up aborting due to shutdown, the transaction
> > and log hold the last two references to a stale bli. If the
> > transaction cancel occurs first, it treats the buffer as non-stale
> > due to the aborted state: the bli reference is dropped and the
> > buffer is released/unlocked. The log buffer I/O error handling
> > eventually calls into xfs_buf_item_unpin(), drops the final
> > reference to the bli and treats it as stale. The buffer wasn't left
> > locked by xfs_buf_item_unlock(), however, so the assert fails and
> > the buffer is double unlocked. The latter problem is mitigated by
> > the fact that the fs is shutdown and no further damage is possible.
> >
> > ->iop_unlock() of an invalidated buffer should behave consistently
> > with respect to the bli refcount, regardless of aborted state. If
> > the refcount remains elevated on commit, we know the bli is awaiting
> > an unpin (since it can't be in another transaction) and will be
> > handled appropriately on log buffer completion. If the final bli
> > reference of an invalidated buffer is dropped in ->iop_unlock(), we
> > can assume the transaction has aborted because invalidation implies
> > a dirty transaction. In the non-abort case, the log would have
> > acquired a bli reference in ->iop_pin() and prevented bli release at
> > ->iop_unlock() time. In the abort case the item must be freed and
> > buffer unlocked because it wasn't pinned by the log.
> >
> > Rework xfs_buf_item_unlock() to simplify the currently circuitous
> > and duplicate logic and leave invalidated buffers locked based on
> > bli refcount, regardless of aborted state. This ensures that a
> > pinned, stale buffer is always found locked when eventually
> > unpinned.
>
> Sounds reasonable to me.
>
> BTW, can you follow this up by checking the freeing logic in
> xfs_trans_brelse()? There's a commented out assert that the pincount
> should be zero, implying that at some point there was a similar
> free within transaction context vs unpin from log io completion
> race occurring there.
>
Sure, I'll look into it once I get this patch fixed up.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >
> > Hi all,
> >
> > This survives the assert reproducer, several xfstests runs with v4/v5
> > superblock filesystems, and fsstress and fsx workloads over a few hours.
> >
> > Brian
> >
> > fs/xfs/xfs_buf_item.c | 85 +++++++++++++++++++------------------------
> > 1 file changed, 37 insertions(+), 48 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> > index 1c9d1398980b..5e8b91543d93 100644
> > --- a/fs/xfs/xfs_buf_item.c
> > +++ b/fs/xfs/xfs_buf_item.c
> > @@ -556,73 +556,62 @@ xfs_buf_item_unlock(
> > {
> > struct xfs_buf_log_item *bip = BUF_ITEM(lip);
> > struct xfs_buf *bp = bip->bli_buf;
> > + int freed;
>
> atomic_dec_and_test() returns a bool.
>
Ok.
> > bool aborted;
> > bool hold = !!(bip->bli_flags & XFS_BLI_HOLD);
> > bool dirty = !!(bip->bli_flags & XFS_BLI_DIRTY);
> > + bool stale = !!(bip->bli_flags & XFS_BLI_STALE);
> > #if defined(DEBUG) || defined(XFS_WARN)
> > bool ordered = !!(bip->bli_flags & XFS_BLI_ORDERED);
> > #endif
>
> Can we get rid of the "!!" here? The variables are booleans, so
> there's not need to do a double negation to squash the value down to
> 0 or 1 as the compiler will do that implicitly...
>
Yeah, I'll remove those.
> > + /*
> > + * The bli dirty state should match whether the blf has logged segments
> > + * except for ordered buffers, where only the bli should be dirty.
> > + */
> > + ASSERT((!ordered && dirty == xfs_buf_item_dirty_format(bip)) ||
> > + (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
> > + ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
> > + trace_xfs_buf_item_unlock(bip);
>
> Can you put the trace before the asserts? If an assert fires, we
> want the ibuf item state in the event trace record...
>
Ok.
> Speaking of trace events, the trace_xfs_buf_item_unlock_stale()
> event is no longer called by this function. Can you remove it or
> or re-add it as per below?
>
I removed it intentionally because the existing tracepoint already
expresses stale state, I just forgot to delete it.
> > + aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags);
> >
> > /*
> > + * Clear the buffer's association with this transaction and
> > + * per-transaction state from the bli, which has been copied above.
> > */
> > + bp->b_transp = NULL;
> > bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
> >
> > /*
> > + * Drop the transaction bli reference and free the item if clean or
> > + * aborted and we had the last reference. In either case this is the
> > + * last opportunity to free the item since it won't be written back.
> > + * Otherwise, the bli is still referenced or dirty and will be freed on
> > + * final unpin of the buffer (if stale) or writeback completion.
> > */
> > + freed = atomic_dec_and_test(&bip->bli_refcount);
> > + if (freed && (aborted || !dirty)) {
> > + ASSERT(!aborted || XFS_FORCED_SHUTDOWN(lip->li_mountp));
> > + ASSERT(!stale || aborted);
> > + /* an aborted item may be in the AIL, remove it first */
> > + if (aborted)
> > + xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
> > + xfs_buf_item_relse(bp);
> > }
> >
> > /*
> > + * If the buffer was invalidated, leave it locked on transaction commit
> > + * unless we just dropped the final reference. The latter case should
> > + * only ever happen on abort because invalidation dirties the
> > + * transaction and the log would have grabbed another bli reference when
> > + * the buffer was pinned. In the non-abort case, the buffer is unlocked
> > + * on final unpin and the bli freed since stale buffers are not written
> > + * back.
> > */
> > + if (stale && !freed)
> > + return;
>
> This logic is still a bit circuitous. We have one action if the item
> is freed, and another action if it not freed. But they are separated
> by large comments and it's not clear that we do one or the other,
> but not both. Maybe it would be clearer if it were structured like
> this:
>
I mainly wanted to condense some of the comments and duplicate code
because the existing function seemed far longer than necessary, adding
to the confusion. Beyond that, I made the above structure changes
because I found it more simple to isolate the cases where we were
responsible to free the item from the stale specific check (and
comments) around unlocking it or not.
That said...
> /*
> * Comment explaining life, the universe and everything.
> */
> freed = atomic_dec_and_test(&bip->bli_refcount);
> if (freed) {
> if (aborted || !dirty) {
> ....
> } else if (stale) {
> trace_xfs_buf_item_unlock_stale(bip);
> return;
> }
... this looks equivalent (minus the stale tp that I still want to kill)
to me and facilitates the other changes. I'll give it a try.
Brian
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
next prev parent reply other threads:[~2018-08-22 16:30 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-21 14:01 [PATCH] xfs: don't unlock invalidated buf on aborted tx commit Brian Foster
2018-08-21 19:56 ` Bill O'Donnell
2018-08-22 0:28 ` Dave Chinner
2018-08-22 13:05 ` Brian Foster [this message]
2018-08-24 14:56 ` Brian Foster
2018-08-22 6:01 ` Christoph Hellwig
2018-08-22 13:05 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180822130512.GC26084@bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).