* [PATCH RFC 0/2] xfs: fix up EFI/EFD error handling @ 2015-07-23 20:13 Brian Foster 2015-07-23 20:13 ` [PATCH RFC 1/2] xfs: return committed status from xfs_trans_roll() Brian Foster 2015-07-23 20:13 ` [PATCH RFC 2/2] xfs: fix efi/efd error handling to avoid fs shutdown hangs Brian Foster 0 siblings, 2 replies; 10+ messages in thread From: Brian Foster @ 2015-07-23 20:13 UTC (permalink / raw) To: xfs Hi all, This is another quick RFC related to testing issues I'm hitting with my previously posted log recovery fixes. Patch 1 is a small refactor to support patch 2. Patch 2 describes the issues I'm hitting in repeated shutdown and log recovery testing. This passes some spot testing and seems to resolve the hang from the test sequence described in the commit log description, but more testing is definitely required. I _think_ I follow the log callbacks and how the EFI/EFD are expected to be managed, but I could easily be missing something. As it is, I'm off for a long weekend after today so I'm sending this to hopefully get some eyes to sanity check before I get too deep into testing this along with the previous fixes next week... Brian Brian Foster (2): xfs: return committed status from xfs_trans_roll() xfs: fix efi/efd error handling to avoid fs shutdown hangs fs/xfs/xfs_bmap_util.c | 35 ++++++++++++++++++++-------- fs/xfs/xfs_extfree_item.c | 59 +++++++++++++++++++++++++---------------------- fs/xfs/xfs_trans.c | 15 ++++++++++-- fs/xfs/xfs_trans.h | 1 + 4 files changed, 71 insertions(+), 39 deletions(-) -- 2.1.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH RFC 1/2] xfs: return committed status from xfs_trans_roll() 2015-07-23 20:13 [PATCH RFC 0/2] xfs: fix up EFI/EFD error handling Brian Foster @ 2015-07-23 20:13 ` Brian Foster 2015-07-28 0:40 ` Dave Chinner 2015-07-23 20:13 ` [PATCH RFC 2/2] xfs: fix efi/efd error handling to avoid fs shutdown hangs Brian Foster 1 sibling, 1 reply; 10+ messages in thread From: Brian Foster @ 2015-07-23 20:13 UTC (permalink / raw) To: xfs Some callers need to make error handling decisions based on whether the current transaction successfully committed or not. Rename xfs_trans_roll(), add a new parameter and provide a wrapper to preserve existing callers. Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/xfs/xfs_trans.c | 15 +++++++++++++-- fs/xfs/xfs_trans.h | 1 + 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 0582a27..a0ab1da 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -1019,9 +1019,10 @@ xfs_trans_cancel( * chunk we've been working on and get a new transaction to continue. */ int -xfs_trans_roll( +__xfs_trans_roll( struct xfs_trans **tpp, - struct xfs_inode *dp) + struct xfs_inode *dp, + int *committed) { struct xfs_trans *trans; struct xfs_trans_res tres; @@ -1052,6 +1053,7 @@ xfs_trans_roll( if (error) return error; + *committed = 1; trans = *tpp; /* @@ -1074,3 +1076,12 @@ xfs_trans_roll( xfs_trans_ijoin(trans, dp, 0); return 0; } + +int +xfs_trans_roll( + struct xfs_trans **tpp, + struct xfs_inode *dp) +{ + int committed = 0; + return __xfs_trans_roll(tpp, dp, &committed); +} diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 3b21b4e..518ae5c 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -226,6 +226,7 @@ void xfs_trans_log_efd_extent(xfs_trans_t *, xfs_fsblock_t, xfs_extlen_t); int xfs_trans_commit(struct xfs_trans *); +int __xfs_trans_roll(struct xfs_trans **, struct xfs_inode *, int *); int xfs_trans_roll(struct xfs_trans **, struct xfs_inode *); void xfs_trans_cancel(xfs_trans_t *); int xfs_trans_ail_init(struct xfs_mount *); -- 2.1.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH RFC 1/2] xfs: return committed status from xfs_trans_roll() 2015-07-23 20:13 ` [PATCH RFC 1/2] xfs: return committed status from xfs_trans_roll() Brian Foster @ 2015-07-28 0:40 ` Dave Chinner 2015-07-28 13:40 ` Brian Foster 2015-08-10 18:55 ` Brian Foster 0 siblings, 2 replies; 10+ messages in thread From: Dave Chinner @ 2015-07-28 0:40 UTC (permalink / raw) To: Brian Foster; +Cc: xfs [ reply to both patches in one reply, because it's related. ] On Thu, Jul 23, 2015 at 04:13:29PM -0400, Brian Foster wrote: > Some callers need to make error handling decisions based on whether the > current transaction successfully committed or not. Rename > xfs_trans_roll(), add a new parameter and provide a wrapper to preserve > existing callers. > > Signed-off-by: Brian Foster <bfoster@redhat.com> > --- > fs/xfs/xfs_trans.c | 15 +++++++++++++-- > fs/xfs/xfs_trans.h | 1 + > 2 files changed, 14 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c > index 0582a27..a0ab1da 100644 > --- a/fs/xfs/xfs_trans.c > +++ b/fs/xfs/xfs_trans.c > @@ -1019,9 +1019,10 @@ xfs_trans_cancel( > * chunk we've been working on and get a new transaction to continue. > */ > int > -xfs_trans_roll( > +__xfs_trans_roll( > struct xfs_trans **tpp, > - struct xfs_inode *dp) > + struct xfs_inode *dp, > + int *committed) > { > struct xfs_trans *trans; > struct xfs_trans_res tres; > @@ -1052,6 +1053,7 @@ xfs_trans_roll( > if (error) > return error; So here we return committed = 0, error != 0. > > + *committed = 1; > trans = *tpp; And from here on in we return committed = 1 regardless of the error. So looking at the next patch in xfs_bmap_finish(): > free->xbfi_blockcount); > > - error = xfs_trans_roll(tp, NULL); > - *committed = 1; > + error = __xfs_trans_roll(tp, NULL, committed); > + > /* > - * We have a new transaction, so we should return committed=1, > - * even though we're returning an error. > + * We have a new transaction, so we should return committed=1, even > + * though we're returning an error. If an error was returned after the > + * original transaction was committed, defer the error handling until > + * the EFD is logged. We do this because a committed EFI requires an EFD > + * transaction to be processed to ensure the EFI is released. > */ > - if (error) > + if (error && *committed == 0) { > + *committed = 1; > return error; > + } So if we failed to commit the EFI, we say we did and then return. Why do we need to do that? /me goes an looks at all the callers. Hmm - only xfs_itruncate_extents() relies on that behaviour, but it could just as easily do an inode join on error, because on success the inode has already been joined to the new transaction by xfs_trans_roll(). Looking further, we have quite a bit of inconsistency in the error handling of xfs_bmap_finish() - some callers issue a xfs_bmap_cancel() in the error path, and some don't. failing to cancel the freelist on error looks to me like a memory leak, because we don't free the extents from the free list until the EFD for the extent has been logged. If we error out earlier, we still have items on the free list that haven't been processed. So it looks to me like we need fixes there. Further, it appears to me that there is really one xfs_bmap_finish() caller that requires the committed flag: xfs_qm_dqalloc(). All the others either use it for an assert or joining the inode to the transaction when committed = 1, which xfs_trans_roll() will have already done if we return committed = 1.... So xfs_qm_dqalloc(), on error, simply cancels the freelist and returns the error - it ignores the committed flag. So the committed flag is used to determine how to handle the dquot buffer: xfs_trans_bhold(tp, bp); if ((error = xfs_bmap_finish(tpp, &flist, &committed))) { goto error1; } if (committed) { tp = *tpp; xfs_trans_bjoin(tp, bp); } else { xfs_trans_bhold_release(tp, bp); } Now, xfs_trans_bhold() simply sets the BLI_HOLD flag on the buffer, and transaction commit clears it. That means we could call it unconditionally, and all we have to do is remove an assert(BLI_HOLD) from xfs_trans_bhold_release() to enable that. Given this is the only caller of xfs_trans_bhold_release(), I don't see a problem with doing that. Nor is there a problem with multiple joins of an object to a transaction, so we could simply make this: xfs_trans_bhold(tp, bp); error = xfs_bmap_finish(tpp, &flist, &committed); if (error) goto error1; tp = *tpp; xfs_trans_bjoin(tp, bp); xfs_trans_bhold_release(tp, bp); And the committed flag is not necessary. With this, we can hide everything to do with error on commit vs error after commit inside xfs_bmap_finish()... > efd = xfs_trans_get_efd(*tp, efi, flist->xbf_count); > for (free = flist->xbf_first; free != NULL; free = next) { > next = free->xbfi_next; > - if ((error = xfs_free_extent(*tp, free->xbfi_startblock, > - free->xbfi_blockcount))) { > + > + /* > + * Free the extent if the above trans roll hasn't failed and log > + * the EFD before handling errors from either call to ensure the > + * EFI reference is accounted for in the tp. Otherwise, the EFI > + * is never released on abort and pins the AIL indefinitely. > + */ > + if (!error) > + error = xfs_free_extent(*tp, free->xbfi_startblock, > + free->xbfi_blockcount); > + xfs_trans_log_efd_extent(*tp, efd, free->xbfi_startblock, > + free->xbfi_blockcount); > + if (error) { > /* > * The bmap free list will be cleaned up at a > * higher level. The EFI will be canceled when > @@ -118,11 +134,10 @@ xfs_bmap_finish( > SHUTDOWN_META_IO_ERROR); > return error; > } > - xfs_trans_log_efd_extent(*tp, efd, free->xbfi_startblock, > - free->xbfi_blockcount); > xfs_bmap_del_free(flist, NULL, free); > } > - return 0; > + > + return error; This loop doesn't obviously do the right thing now. It will log the first extent into the EFD and then trigger a shutdown and return. The extent count in the EFD may not match the extent count on the EFI, so releasing the EFD at this point may not release all the extents in the EFI and hence not release the EFI. I think I'd prefer to see a xfs_trans_cancel_efi() call to handle this error path rather than having to go through the efd to release the reference on the EFI. i.e. error = __xfs_trans_roll(tp, NULL, &committed); if (error) { if (committed) { if (!XFS_FORCED_SHUTDOWN(mp)) xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); xfs_efi_cancel(tp, efi); } return error; } /* * Cancel a committed EFI after triggering a shutdown. We need to do * this so that the committed EFI is removed from the AIL and freed, * otherwise unmount will hang due to a non-empty AIL. */ xfs_efi_cancel() { ASSERT(XFS_FORCED_SHUTDOWN(mp)); xfs_efi_release(efi, efi->nextents); } Doing this means there's less complexity on the EFD abort side of things, as we don't have to now handle this failure case via transaction completion callback interface. Hmmm - something I just noticed: if we only have one EFD per EFI, why do we do we have that layer of extent counting before dropping real references? > xfs_efd_item_unlock( > struct xfs_log_item *lip) > { > - if (lip->li_flags & XFS_LI_ABORTED) > - xfs_efd_item_free(EFD_ITEM(lip)); > + struct xfs_efd_log_item *efdp = EFD_ITEM(lip); > + > + if (lip->li_flags & XFS_LI_ABORTED) { > + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); > + xfs_efd_item_free(efdp); > + } > } i.e. we always call xfs_efi_release() with efi_nextents or efd_nextents, which are always the same, and so we never partially complete an EFI. Should we just kill that layer, as it does tend to complicate the EFI release code? Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RFC 1/2] xfs: return committed status from xfs_trans_roll() 2015-07-28 0:40 ` Dave Chinner @ 2015-07-28 13:40 ` Brian Foster 2015-07-28 21:51 ` Dave Chinner 2015-07-30 17:21 ` Christoph Hellwig 2015-08-10 18:55 ` Brian Foster 1 sibling, 2 replies; 10+ messages in thread From: Brian Foster @ 2015-07-28 13:40 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Tue, Jul 28, 2015 at 10:40:09AM +1000, Dave Chinner wrote: > [ reply to both patches in one reply, because it's related. ] > > On Thu, Jul 23, 2015 at 04:13:29PM -0400, Brian Foster wrote: > > Some callers need to make error handling decisions based on whether the > > current transaction successfully committed or not. Rename > > xfs_trans_roll(), add a new parameter and provide a wrapper to preserve > > existing callers. > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > --- > > fs/xfs/xfs_trans.c | 15 +++++++++++++-- > > fs/xfs/xfs_trans.h | 1 + > > 2 files changed, 14 insertions(+), 2 deletions(-) > > > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c > > index 0582a27..a0ab1da 100644 > > --- a/fs/xfs/xfs_trans.c > > +++ b/fs/xfs/xfs_trans.c > > @@ -1019,9 +1019,10 @@ xfs_trans_cancel( > > * chunk we've been working on and get a new transaction to continue. > > */ > > int > > -xfs_trans_roll( > > +__xfs_trans_roll( > > struct xfs_trans **tpp, > > - struct xfs_inode *dp) > > + struct xfs_inode *dp, > > + int *committed) > > { > > struct xfs_trans *trans; > > struct xfs_trans_res tres; > > @@ -1052,6 +1053,7 @@ xfs_trans_roll( > > if (error) > > return error; > > So here we return committed = 0, error != 0. > > > > > + *committed = 1; > > trans = *tpp; > > And from here on in we return committed = 1 regardless of the error. > > So looking at the next patch in xfs_bmap_finish(): > > > free->xbfi_blockcount); > > > > - error = xfs_trans_roll(tp, NULL); > > - *committed = 1; > > + error = __xfs_trans_roll(tp, NULL, committed); > > + > > /* > > - * We have a new transaction, so we should return committed=1, > > - * even though we're returning an error. > > + * We have a new transaction, so we should return committed=1, even > > + * though we're returning an error. If an error was returned after the > > + * original transaction was committed, defer the error handling until > > + * the EFD is logged. We do this because a committed EFI requires an EFD > > + * transaction to be processed to ensure the EFI is released. > > */ > > - if (error) > > + if (error && *committed == 0) { > > + *committed = 1; > > return error; > > + } > > So if we failed to commit the EFI, we say we did and then return. > Why do we need to do that? > > /me goes an looks at all the callers. > > Hmm - only xfs_itruncate_extents() relies on that behaviour, but it > could just as easily do an inode join on error, because on success > the inode has already been joined to the new transaction by > xfs_trans_roll(). > Interesting, though I don't follow why this caller even depends on it. It doesn't transfer lock ownership to the transaction. What difference does it make in the error path if the inode is joined? > Looking further, we have quite a bit of inconsistency in the error > handling of xfs_bmap_finish() - some callers issue a > xfs_bmap_cancel() in the error path, and some don't. failing to > cancel the freelist on error looks to me like a memory leak, because > we don't free the extents from the free list until the EFD for the > extent has been logged. If we error out earlier, we still have items > on the free list that haven't been processed. > > So it looks to me like we need fixes there. > Heh, not too surprising. I'll make a note to make a pass through these. > Further, it appears to me that there is really one xfs_bmap_finish() > caller that requires the committed flag: xfs_qm_dqalloc(). All the > others either use it for an assert or joining the inode to the > transaction when committed = 1, which xfs_trans_roll() will have > already done if we return committed = 1.... > Assuming xfs_trans_reserve() hasn't failed, which could cause *committed == 1 without the inode joined. We could probably change this in __xfs_trans_roll() since the inode is presumably already locked. > > So xfs_qm_dqalloc(), on error, simply cancels the freelist and > returns the error - it ignores the committed flag. So the committed > flag is used to determine how to handle the dquot buffer: > > xfs_trans_bhold(tp, bp); > > if ((error = xfs_bmap_finish(tpp, &flist, &committed))) { > goto error1; > } > > if (committed) { > tp = *tpp; > xfs_trans_bjoin(tp, bp); > } else { > xfs_trans_bhold_release(tp, bp); > } > > Now, xfs_trans_bhold() simply sets the BLI_HOLD flag on the buffer, > and transaction commit clears it. That means we could call it > unconditionally, and all we have to do is remove an assert(BLI_HOLD) > from xfs_trans_bhold_release() to enable that. Given this is the > only caller of xfs_trans_bhold_release(), I don't see a problem with > doing that. Nor is there a problem with multiple joins of an object > to a transaction, so we could simply make this: > > xfs_trans_bhold(tp, bp); > error = xfs_bmap_finish(tpp, &flist, &committed); > if (error) > goto error1; > tp = *tpp; > xfs_trans_bjoin(tp, bp); > xfs_trans_bhold_release(tp, bp); > > And the committed flag is not necessary. With this, we can hide > everything to do with error on commit vs error after commit inside > xfs_bmap_finish()... > Seems reasonable at a glance. > > efd = xfs_trans_get_efd(*tp, efi, flist->xbf_count); > > for (free = flist->xbf_first; free != NULL; free = next) { > > next = free->xbfi_next; > > - if ((error = xfs_free_extent(*tp, free->xbfi_startblock, > > - free->xbfi_blockcount))) { > > + > > + /* > > + * Free the extent if the above trans roll hasn't failed and log > > + * the EFD before handling errors from either call to ensure the > > + * EFI reference is accounted for in the tp. Otherwise, the EFI > > + * is never released on abort and pins the AIL indefinitely. > > + */ > > + if (!error) > > + error = xfs_free_extent(*tp, free->xbfi_startblock, > > + free->xbfi_blockcount); > > + xfs_trans_log_efd_extent(*tp, efd, free->xbfi_startblock, > > + free->xbfi_blockcount); > > + if (error) { > > /* > > * The bmap free list will be cleaned up at a > > * higher level. The EFI will be canceled when > > @@ -118,11 +134,10 @@ xfs_bmap_finish( > > SHUTDOWN_META_IO_ERROR); > > return error; > > } > > - xfs_trans_log_efd_extent(*tp, efd, free->xbfi_startblock, > > - free->xbfi_blockcount); > > xfs_bmap_del_free(flist, NULL, free); > > } > > - return 0; > > + > > + return error; > > This loop doesn't obviously do the right thing now. It > will log the first extent into the EFD and then trigger a shutdown > and return. The extent count in the EFD may not match the > extent count on the EFI, so releasing the EFD at this point may not > release all the extents in the EFI and hence not release the EFI. > The EFD unlock handler forcibly releases the EFI on abort. It drops the EFI extent count reference by whatever the extent count on the EFD is, and that is determined on EFD initialization (xfs_trans_get_efd()) regardless of how many extents were logged to the EFD. That said, the error handling here is certainly not obvious because it depends on the lifecycle of the associated log items. The broader goal is to reduce that dependency so the code here is more straightforward... > I think I'd prefer to see a xfs_trans_cancel_efi() call to handle > this error path rather than having to go through the efd to release > the reference on the EFI. i.e. > > error = __xfs_trans_roll(tp, NULL, &committed); > if (error) { > if (committed) { > if (!XFS_FORCED_SHUTDOWN(mp)) > xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); > xfs_efi_cancel(tp, efi); > } > return error; > } > That's a nice idea. It pulls some error handling out of the log item handling code explicitly. An EFD version might be useful for the unlogged EFD error case as well. IMO, the more of these cases that are handled explicitly in xfs_bmap_finish() rather than implicitly via the transaction management code, the more reliable and robust to future change it will be. I'll explore it further... > /* > * Cancel a committed EFI after triggering a shutdown. We need to do > * this so that the committed EFI is removed from the AIL and freed, > * otherwise unmount will hang due to a non-empty AIL. > */ > xfs_efi_cancel() > { > ASSERT(XFS_FORCED_SHUTDOWN(mp)); > xfs_efi_release(efi, efi->nextents); > } > > Doing this means there's less complexity on the EFD abort side of > things, as we don't have to now handle this failure case via > transaction completion callback interface. > Indeed. > Hmmm - something I just noticed: if we only have one EFD per EFI, > why do we do we have that layer of extent counting before dropping > real references? > I wondered this myself, but hadn't made it deep enough to see if we used the reference count elsewhere. > > xfs_efd_item_unlock( > > struct xfs_log_item *lip) > > { > > - if (lip->li_flags & XFS_LI_ABORTED) > > - xfs_efd_item_free(EFD_ITEM(lip)); > > + struct xfs_efd_log_item *efdp = EFD_ITEM(lip); > > + > > + if (lip->li_flags & XFS_LI_ABORTED) { > > + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); > > + xfs_efd_item_free(efdp); > > + } > > } > > i.e. we always call xfs_efi_release() with efi_nextents or > efd_nextents, which are always the same, and so we never partially > complete an EFI. Should we just kill that layer, as it does tend to > complicate the EFI release code? > Yeah, that might be a good idea if we don't use the reference count elsewhere. I'll look into that as a subsequent cleanup as well. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RFC 1/2] xfs: return committed status from xfs_trans_roll() 2015-07-28 13:40 ` Brian Foster @ 2015-07-28 21:51 ` Dave Chinner 2015-07-29 11:47 ` Brian Foster 2015-07-30 17:21 ` Christoph Hellwig 1 sibling, 1 reply; 10+ messages in thread From: Dave Chinner @ 2015-07-28 21:51 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On Tue, Jul 28, 2015 at 09:40:06AM -0400, Brian Foster wrote: > On Tue, Jul 28, 2015 at 10:40:09AM +1000, Dave Chinner wrote: > > [ reply to both patches in one reply, because it's related. ] > > > > On Thu, Jul 23, 2015 at 04:13:29PM -0400, Brian Foster wrote: > > > Some callers need to make error handling decisions based on whether the > > > current transaction successfully committed or not. Rename > > > xfs_trans_roll(), add a new parameter and provide a wrapper to preserve > > > existing callers. > > > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > > --- > > > fs/xfs/xfs_trans.c | 15 +++++++++++++-- > > > fs/xfs/xfs_trans.h | 1 + > > > 2 files changed, 14 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c > > > index 0582a27..a0ab1da 100644 > > > --- a/fs/xfs/xfs_trans.c > > > +++ b/fs/xfs/xfs_trans.c > > > @@ -1019,9 +1019,10 @@ xfs_trans_cancel( > > > * chunk we've been working on and get a new transaction to continue. > > > */ > > > int > > > -xfs_trans_roll( > > > +__xfs_trans_roll( > > > struct xfs_trans **tpp, > > > - struct xfs_inode *dp) > > > + struct xfs_inode *dp, > > > + int *committed) > > > { > > > struct xfs_trans *trans; > > > struct xfs_trans_res tres; > > > @@ -1052,6 +1053,7 @@ xfs_trans_roll( > > > if (error) > > > return error; > > > > So here we return committed = 0, error != 0. > > > > > > > > + *committed = 1; > > > trans = *tpp; > > > > And from here on in we return committed = 1 regardless of the error. > > > > So looking at the next patch in xfs_bmap_finish(): > > > > > free->xbfi_blockcount); > > > > > > - error = xfs_trans_roll(tp, NULL); > > > - *committed = 1; > > > + error = __xfs_trans_roll(tp, NULL, committed); > > > + > > > /* > > > - * We have a new transaction, so we should return committed=1, > > > - * even though we're returning an error. > > > + * We have a new transaction, so we should return committed=1, even > > > + * though we're returning an error. If an error was returned after the > > > + * original transaction was committed, defer the error handling until > > > + * the EFD is logged. We do this because a committed EFI requires an EFD > > > + * transaction to be processed to ensure the EFI is released. > > > */ > > > - if (error) > > > + if (error && *committed == 0) { > > > + *committed = 1; > > > return error; > > > + } > > > > So if we failed to commit the EFI, we say we did and then return. > > Why do we need to do that? > > > > /me goes an looks at all the callers. > > > > Hmm - only xfs_itruncate_extents() relies on that behaviour, but it > > could just as easily do an inode join on error, because on success > > the inode has already been joined to the new transaction by > > xfs_trans_roll(). > > > > Interesting, though I don't follow why this caller even depends on it. > It doesn't transfer lock ownership to the transaction. What difference > does it make in the error path if the inode is joined? Callers of xfs_itruncate_extents() expect it to be locked and joined on return, even on error. > > Looking further, we have quite a bit of inconsistency in the error > > handling of xfs_bmap_finish() - some callers issue a > > xfs_bmap_cancel() in the error path, and some don't. failing to > > cancel the freelist on error looks to me like a memory leak, because > > we don't free the extents from the free list until the EFD for the > > extent has been logged. If we error out earlier, we still have items > > on the free list that haven't been processed. > > > > So it looks to me like we need fixes there. > > > > Heh, not too surprising. I'll make a note to make a pass through these. > > > Further, it appears to me that there is really one xfs_bmap_finish() > > caller that requires the committed flag: xfs_qm_dqalloc(). All the > > others either use it for an assert or joining the inode to the > > transaction when committed = 1, which xfs_trans_roll() will have > > already done if we return committed = 1.... > > > > Assuming xfs_trans_reserve() hasn't failed, which could cause *committed > == 1 without the inode joined. We could probably change this in > __xfs_trans_roll() since the inode is presumably already locked. I don't think we can join the inode until after the reservation is done. It could still be done in __xfs_trans_roll() regardless. > > > + return error; > > > > This loop doesn't obviously do the right thing now. It > > will log the first extent into the EFD and then trigger a shutdown > > and return. The extent count in the EFD may not match the > > extent count on the EFI, so releasing the EFD at this point may not > > release all the extents in the EFI and hence not release the EFI. > > > > The EFD unlock handler forcibly releases the EFI on abort. It drops the > EFI extent count reference by whatever the extent count on the EFD is, > and that is determined on EFD initialization (xfs_trans_get_efd()) > regardless of how many extents were logged to the EFD. > > That said, the error handling here is certainly not obvious because it > depends on the lifecycle of the associated log items. The broader goal > is to reduce that dependency so the code here is more straightforward... *nod* > > I think I'd prefer to see a xfs_trans_cancel_efi() call to handle > > this error path rather than having to go through the efd to release > > the reference on the EFI. i.e. > > > > error = __xfs_trans_roll(tp, NULL, &committed); > > if (error) { > > if (committed) { > > if (!XFS_FORCED_SHUTDOWN(mp)) > > xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); > > xfs_efi_cancel(tp, efi); > > } > > return error; > > } > > > > That's a nice idea. It pulls some error handling out of the log item > handling code explicitly. An EFD version might be useful for the > unlogged EFD error case as well. IMO, the more of these cases that are > handled explicitly in xfs_bmap_finish() rather than implicitly via the > transaction management code, the more reliable and robust to future > change it will be. I'll explore it further... Yes, that mirrors my thinking exactly - the EFI/EFD error handling has always been problematic with the reliance on reference counting via transaction commit/abort callbacks to handle it. > > Hmmm - something I just noticed: if we only have one EFD per EFI, > > why do we do we have that layer of extent counting before dropping > > real references? > > > > I wondered this myself, but hadn't made it deep enough to see if we used > the reference count elsewhere. > > > > xfs_efd_item_unlock( > > > struct xfs_log_item *lip) > > > { > > > - if (lip->li_flags & XFS_LI_ABORTED) > > > - xfs_efd_item_free(EFD_ITEM(lip)); > > > + struct xfs_efd_log_item *efdp = EFD_ITEM(lip); > > > + > > > + if (lip->li_flags & XFS_LI_ABORTED) { > > > + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); > > > + xfs_efd_item_free(efdp); > > > + } > > > } > > > > i.e. we always call xfs_efi_release() with efi_nextents or > > efd_nextents, which are always the same, and so we never partially > > complete an EFI. Should we just kill that layer, as it does tend to > > complicate the EFI release code? > > > > Yeah, that might be a good idea if we don't use the reference count > elsewhere. I'll look into that as a subsequent cleanup as well. Excellent! Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RFC 1/2] xfs: return committed status from xfs_trans_roll() 2015-07-28 21:51 ` Dave Chinner @ 2015-07-29 11:47 ` Brian Foster 0 siblings, 0 replies; 10+ messages in thread From: Brian Foster @ 2015-07-29 11:47 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Wed, Jul 29, 2015 at 07:51:40AM +1000, Dave Chinner wrote: > On Tue, Jul 28, 2015 at 09:40:06AM -0400, Brian Foster wrote: > > On Tue, Jul 28, 2015 at 10:40:09AM +1000, Dave Chinner wrote: > > > [ reply to both patches in one reply, because it's related. ] > > > > > > On Thu, Jul 23, 2015 at 04:13:29PM -0400, Brian Foster wrote: ... > > > > free->xbfi_blockcount); > > > > > > > > - error = xfs_trans_roll(tp, NULL); > > > > - *committed = 1; > > > > + error = __xfs_trans_roll(tp, NULL, committed); > > > > + > > > > /* > > > > - * We have a new transaction, so we should return committed=1, > > > > - * even though we're returning an error. > > > > + * We have a new transaction, so we should return committed=1, even > > > > + * though we're returning an error. If an error was returned after the > > > > + * original transaction was committed, defer the error handling until > > > > + * the EFD is logged. We do this because a committed EFI requires an EFD > > > > + * transaction to be processed to ensure the EFI is released. > > > > */ > > > > - if (error) > > > > + if (error && *committed == 0) { > > > > + *committed = 1; > > > > return error; > > > > + } > > > > > > So if we failed to commit the EFI, we say we did and then return. > > > Why do we need to do that? > > > > > > /me goes an looks at all the callers. > > > > > > Hmm - only xfs_itruncate_extents() relies on that behaviour, but it > > > could just as easily do an inode join on error, because on success > > > the inode has already been joined to the new transaction by > > > xfs_trans_roll(). > > > > > > > Interesting, though I don't follow why this caller even depends on it. > > It doesn't transfer lock ownership to the transaction. What difference > > does it make in the error path if the inode is joined? > > Callers of xfs_itruncate_extents() expect it to be locked and joined > on return, even on error. > All of the callers I see cancel the transaction and unlock the inode separately on error. I could be glossing over some obvious point here, but so far I'm not seeing it... > > > Looking further, we have quite a bit of inconsistency in the error > > > handling of xfs_bmap_finish() - some callers issue a > > > xfs_bmap_cancel() in the error path, and some don't. failing to > > > cancel the freelist on error looks to me like a memory leak, because > > > we don't free the extents from the free list until the EFD for the > > > extent has been logged. If we error out earlier, we still have items > > > on the free list that haven't been processed. > > > > > > So it looks to me like we need fixes there. > > > > > > > Heh, not too surprising. I'll make a note to make a pass through these. > > > > > Further, it appears to me that there is really one xfs_bmap_finish() > > > caller that requires the committed flag: xfs_qm_dqalloc(). All the > > > others either use it for an assert or joining the inode to the > > > transaction when committed = 1, which xfs_trans_roll() will have > > > already done if we return committed = 1.... > > > > > > > Assuming xfs_trans_reserve() hasn't failed, which could cause *committed > > == 1 without the inode joined. We could probably change this in > > __xfs_trans_roll() since the inode is presumably already locked. > > I don't think we can join the inode until after the reservation is > done. It could still be done in __xfs_trans_roll() regardless. > Hmm, I don't see anything obvious that prevents it. It probably defies convention though. Anyways, we're probably a few levels of indirection away from the bug fixes and work that we know needs to happen at this point. I'll try to get the EFI/EFD stuff worked out, along with some of the other issues I'm reproducing, then we can go from there... Brian > > > > + return error; > > > > > > This loop doesn't obviously do the right thing now. It > > > will log the first extent into the EFD and then trigger a shutdown > > > and return. The extent count in the EFD may not match the > > > extent count on the EFI, so releasing the EFD at this point may not > > > release all the extents in the EFI and hence not release the EFI. > > > > > > > The EFD unlock handler forcibly releases the EFI on abort. It drops the > > EFI extent count reference by whatever the extent count on the EFD is, > > and that is determined on EFD initialization (xfs_trans_get_efd()) > > regardless of how many extents were logged to the EFD. > > > > That said, the error handling here is certainly not obvious because it > > depends on the lifecycle of the associated log items. The broader goal > > is to reduce that dependency so the code here is more straightforward... > > *nod* > > > > I think I'd prefer to see a xfs_trans_cancel_efi() call to handle > > > this error path rather than having to go through the efd to release > > > the reference on the EFI. i.e. > > > > > > error = __xfs_trans_roll(tp, NULL, &committed); > > > if (error) { > > > if (committed) { > > > if (!XFS_FORCED_SHUTDOWN(mp)) > > > xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); > > > xfs_efi_cancel(tp, efi); > > > } > > > return error; > > > } > > > > > > > That's a nice idea. It pulls some error handling out of the log item > > handling code explicitly. An EFD version might be useful for the > > unlogged EFD error case as well. IMO, the more of these cases that are > > handled explicitly in xfs_bmap_finish() rather than implicitly via the > > transaction management code, the more reliable and robust to future > > change it will be. I'll explore it further... > > Yes, that mirrors my thinking exactly - the EFI/EFD error handling > has always been problematic with the reliance on reference counting > via transaction commit/abort callbacks to handle it. > > > > Hmmm - something I just noticed: if we only have one EFD per EFI, > > > why do we do we have that layer of extent counting before dropping > > > real references? > > > > > > > I wondered this myself, but hadn't made it deep enough to see if we used > > the reference count elsewhere. > > > > > > xfs_efd_item_unlock( > > > > struct xfs_log_item *lip) > > > > { > > > > - if (lip->li_flags & XFS_LI_ABORTED) > > > > - xfs_efd_item_free(EFD_ITEM(lip)); > > > > + struct xfs_efd_log_item *efdp = EFD_ITEM(lip); > > > > + > > > > + if (lip->li_flags & XFS_LI_ABORTED) { > > > > + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); > > > > + xfs_efd_item_free(efdp); > > > > + } > > > > } > > > > > > i.e. we always call xfs_efi_release() with efi_nextents or > > > efd_nextents, which are always the same, and so we never partially > > > complete an EFI. Should we just kill that layer, as it does tend to > > > complicate the EFI release code? > > > > > > > Yeah, that might be a good idea if we don't use the reference count > > elsewhere. I'll look into that as a subsequent cleanup as well. > > Excellent! > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RFC 1/2] xfs: return committed status from xfs_trans_roll() 2015-07-28 13:40 ` Brian Foster 2015-07-28 21:51 ` Dave Chinner @ 2015-07-30 17:21 ` Christoph Hellwig 1 sibling, 0 replies; 10+ messages in thread From: Christoph Hellwig @ 2015-07-30 17:21 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On Tue, Jul 28, 2015 at 09:40:06AM -0400, Brian Foster wrote: > > Hmmm - something I just noticed: if we only have one EFD per EFI, > > why do we do we have that layer of extent counting before dropping > > real references? > > > > I wondered this myself, but hadn't made it deep enough to see if we used > the reference count elsewhere. Because the elders (no pun on Alex, sorry :)) didn't realize we a) don't even ever look at at the logged extents in EFD b) have a 1:1 relationship between EFIs and EFDs I tried to sort some of that out about a year ago, but I didn't manage to get far. have just one. Note that not having to log these _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RFC 1/2] xfs: return committed status from xfs_trans_roll() 2015-07-28 0:40 ` Dave Chinner 2015-07-28 13:40 ` Brian Foster @ 2015-08-10 18:55 ` Brian Foster 1 sibling, 0 replies; 10+ messages in thread From: Brian Foster @ 2015-08-10 18:55 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Tue, Jul 28, 2015 at 10:40:09AM +1000, Dave Chinner wrote: > [ reply to both patches in one reply, because it's related. ] > > On Thu, Jul 23, 2015 at 04:13:29PM -0400, Brian Foster wrote: > > Some callers need to make error handling decisions based on whether the > > current transaction successfully committed or not. Rename > > xfs_trans_roll(), add a new parameter and provide a wrapper to preserve > > existing callers. > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > --- > > fs/xfs/xfs_trans.c | 15 +++++++++++++-- > > fs/xfs/xfs_trans.h | 1 + > > 2 files changed, 14 insertions(+), 2 deletions(-) > > > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c > > index 0582a27..a0ab1da 100644 > > --- a/fs/xfs/xfs_trans.c > > +++ b/fs/xfs/xfs_trans.c > > @@ -1019,9 +1019,10 @@ xfs_trans_cancel( > > * chunk we've been working on and get a new transaction to continue. > > */ > > int > > -xfs_trans_roll( > > +__xfs_trans_roll( ... > So xfs_qm_dqalloc(), on error, simply cancels the freelist and > returns the error - it ignores the committed flag. So the committed > flag is used to determine how to handle the dquot buffer: > > xfs_trans_bhold(tp, bp); > > if ((error = xfs_bmap_finish(tpp, &flist, &committed))) { > goto error1; > } > > if (committed) { > tp = *tpp; > xfs_trans_bjoin(tp, bp); > } else { > xfs_trans_bhold_release(tp, bp); > } > > Now, xfs_trans_bhold() simply sets the BLI_HOLD flag on the buffer, > and transaction commit clears it. That means we could call it > unconditionally, and all we have to do is remove an assert(BLI_HOLD) > from xfs_trans_bhold_release() to enable that. Given this is the > only caller of xfs_trans_bhold_release(), I don't see a problem with > doing that. Nor is there a problem with multiple joins of an object > to a transaction, so we could simply make this: > > xfs_trans_bhold(tp, bp); > error = xfs_bmap_finish(tpp, &flist, &committed); > if (error) > goto error1; > tp = *tpp; > xfs_trans_bjoin(tp, bp); > xfs_trans_bhold_release(tp, bp); > > And the committed flag is not necessary. With this, we can hide > everything to do with error on commit vs error after commit inside > xfs_bmap_finish()... > I'm starting to look back at this and try to understand the intent here (FYI, I've appended some of the xfs_bmap_cancel() fixes to v3 of the EFI/EFD series, to be posted shortly). Is the proposition that we should be able to remove the committed parameter from xfs_bmap_finish() entirely? If so, the buffer hold technique makes some sense here, but not necessarily the potential for multiple joins of the object to the transaction that this kind of usage introduces. At least, there are currently assertions to protect against that kind of thing in the buffer/inode join handlers that would have to be removed. In the case of a buffer join, it looks like there's some reference counting that would have to be rectified against this type of usage (I take it there is some appropriate use for this that I'm not familiar with). Both the buffer and inode cases also look like they'd introduce multiple calls to xfs_trans_add_item(), which I'm not sure is totally safe at a glance. The fact that an xfs_log_item points to a single xfs_log_item_desc looks a bit suspicious to me, but I'd have to take a harder look at that. Even if it were completely safe, IMO we obfuscate the context of code as shown above because the object could be added to the new transaction or re-added to the original depending on whether the previous transaction freed some blocks or not. In other words, I think a multiple join of an object has to be made completely idempotent in order to justify use of a pattern like that shown above. I'm not sure it's worth doing that just to kill a flag. Thoughts? Brian > > efd = xfs_trans_get_efd(*tp, efi, flist->xbf_count); > > for (free = flist->xbf_first; free != NULL; free = next) { > > next = free->xbfi_next; > > - if ((error = xfs_free_extent(*tp, free->xbfi_startblock, > > - free->xbfi_blockcount))) { > > + > > + /* > > + * Free the extent if the above trans roll hasn't failed and log > > + * the EFD before handling errors from either call to ensure the > > + * EFI reference is accounted for in the tp. Otherwise, the EFI > > + * is never released on abort and pins the AIL indefinitely. > > + */ > > + if (!error) > > + error = xfs_free_extent(*tp, free->xbfi_startblock, > > + free->xbfi_blockcount); > > + xfs_trans_log_efd_extent(*tp, efd, free->xbfi_startblock, > > + free->xbfi_blockcount); > > + if (error) { > > /* > > * The bmap free list will be cleaned up at a > > * higher level. The EFI will be canceled when > > @@ -118,11 +134,10 @@ xfs_bmap_finish( > > SHUTDOWN_META_IO_ERROR); > > return error; > > } > > - xfs_trans_log_efd_extent(*tp, efd, free->xbfi_startblock, > > - free->xbfi_blockcount); > > xfs_bmap_del_free(flist, NULL, free); > > } > > - return 0; > > + > > + return error; > > This loop doesn't obviously do the right thing now. It > will log the first extent into the EFD and then trigger a shutdown > and return. The extent count in the EFD may not match the > extent count on the EFI, so releasing the EFD at this point may not > release all the extents in the EFI and hence not release the EFI. > > I think I'd prefer to see a xfs_trans_cancel_efi() call to handle > this error path rather than having to go through the efd to release > the reference on the EFI. i.e. > > error = __xfs_trans_roll(tp, NULL, &committed); > if (error) { > if (committed) { > if (!XFS_FORCED_SHUTDOWN(mp)) > xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); > xfs_efi_cancel(tp, efi); > } > return error; > } > > /* > * Cancel a committed EFI after triggering a shutdown. We need to do > * this so that the committed EFI is removed from the AIL and freed, > * otherwise unmount will hang due to a non-empty AIL. > */ > xfs_efi_cancel() > { > ASSERT(XFS_FORCED_SHUTDOWN(mp)); > xfs_efi_release(efi, efi->nextents); > } > > Doing this means there's less complexity on the EFD abort side of > things, as we don't have to now handle this failure case via > transaction completion callback interface. > > Hmmm - something I just noticed: if we only have one EFD per EFI, > why do we do we have that layer of extent counting before dropping > real references? > > > xfs_efd_item_unlock( > > struct xfs_log_item *lip) > > { > > - if (lip->li_flags & XFS_LI_ABORTED) > > - xfs_efd_item_free(EFD_ITEM(lip)); > > + struct xfs_efd_log_item *efdp = EFD_ITEM(lip); > > + > > + if (lip->li_flags & XFS_LI_ABORTED) { > > + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); > > + xfs_efd_item_free(efdp); > > + } > > } > > i.e. we always call xfs_efi_release() with efi_nextents or > efd_nextents, which are always the same, and so we never partially > complete an EFI. Should we just kill that layer, as it does tend to > complicate the EFI release code? > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH RFC 2/2] xfs: fix efi/efd error handling to avoid fs shutdown hangs 2015-07-23 20:13 [PATCH RFC 0/2] xfs: fix up EFI/EFD error handling Brian Foster 2015-07-23 20:13 ` [PATCH RFC 1/2] xfs: return committed status from xfs_trans_roll() Brian Foster @ 2015-07-23 20:13 ` Brian Foster 2015-07-29 22:18 ` Dave Chinner 1 sibling, 1 reply; 10+ messages in thread From: Brian Foster @ 2015-07-23 20:13 UTC (permalink / raw) To: xfs Freeing an extent in XFS involves logging an EFI (extent free intention), freeing the actual extent, and logging an EFD (extent free done). The EFI object is created with a reference count of 2: one for the current transaction and one for the subsequently created EFD. Under normal circumstances, the first reference is dropped when the EFI is unpinned and the second reference is dropped when the EFD is committed to the on-disk log. In event of errors or filesystem shutdown, there are various potential cleanup scenarios depending on the state of the EFI/EFD. The cleanup scenarios are confusing and racy, as demonstrated by the following test sequence: # mount $dev $mnt # fsstress -d $mnt -n 99999 -p 16 -z -f fallocate=1 \ -f punch=1 -f creat=1 -f unlink=1 & # sleep 5 # killall -9 fsstress; wait # godown -f $mnt # umount ... in which the final umount can hang due to the AIL being pinned indefinitely by one or more EFI items. This can occur due to several conditions. For example, if the shutdown occurs after the EFI is committed to the on-disk log and the EFD committed to the CIL, but before the EFD committed to the log, the EFD iop_committed() abort handler does not drop its reference to the EFI. Alternatively, manual error injection in the xfs_bmap_finish() codepath shows that if an error occurs after the EFI transaction is committed but before the EFD is constructed and logged, the EFI is never released from the AIL. Update the EFI/EFD item handling code to use a more straightforward and reliable approach to error handling. If the EFI transaction is cancelled, the EFI is freed when the log item is unlocked. If the EFI transaction is committed successfully, from that point forward it is the responsibility of the EFD to drop its EFI reference. This means that the EFI unpin callback only ever drops the log reference to the EFI. It does not free the EFI in the event of log I/O error. This also means that the EFD item must drop its EFI reference either if the EFD transaction is cancelled, committed or itself aborted due to log I/O error. Finally, update xfs_bmap_finish() to ensure that once an EFI transaction is committed, we are guaranteed to construct and log the associated EFD. This ensures that the EFD is aborted and drops the reference to the EFI. Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/xfs/xfs_bmap_util.c | 35 ++++++++++++++++++++-------- fs/xfs/xfs_extfree_item.c | 59 +++++++++++++++++++++++++---------------------- 2 files changed, 57 insertions(+), 37 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 0f34886c..878b33a 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -88,20 +88,36 @@ xfs_bmap_finish( xfs_trans_log_efi_extent(*tp, efi, free->xbfi_startblock, free->xbfi_blockcount); - error = xfs_trans_roll(tp, NULL); - *committed = 1; + error = __xfs_trans_roll(tp, NULL, committed); + /* - * We have a new transaction, so we should return committed=1, - * even though we're returning an error. + * We have a new transaction, so we should return committed=1, even + * though we're returning an error. If an error was returned after the + * original transaction was committed, defer the error handling until + * the EFD is logged. We do this because a committed EFI requires an EFD + * transaction to be processed to ensure the EFI is released. */ - if (error) + if (error && *committed == 0) { + *committed = 1; return error; + } efd = xfs_trans_get_efd(*tp, efi, flist->xbf_count); for (free = flist->xbf_first; free != NULL; free = next) { next = free->xbfi_next; - if ((error = xfs_free_extent(*tp, free->xbfi_startblock, - free->xbfi_blockcount))) { + + /* + * Free the extent if the above trans roll hasn't failed and log + * the EFD before handling errors from either call to ensure the + * EFI reference is accounted for in the tp. Otherwise, the EFI + * is never released on abort and pins the AIL indefinitely. + */ + if (!error) + error = xfs_free_extent(*tp, free->xbfi_startblock, + free->xbfi_blockcount); + xfs_trans_log_efd_extent(*tp, efd, free->xbfi_startblock, + free->xbfi_blockcount); + if (error) { /* * The bmap free list will be cleaned up at a * higher level. The EFI will be canceled when @@ -118,11 +134,10 @@ xfs_bmap_finish( SHUTDOWN_META_IO_ERROR); return error; } - xfs_trans_log_efd_extent(*tp, efd, free->xbfi_startblock, - free->xbfi_blockcount); xfs_bmap_del_free(flist, NULL, free); } - return 0; + + return error; } int diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index adc8f8f..0e426ab 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -128,12 +128,12 @@ xfs_efi_item_pin( } /* - * While EFIs cannot really be pinned, the unpin operation is the last place at - * which the EFI is manipulated during a transaction. If we are being asked to - * remove the EFI it's because the transaction has been cancelled and by - * definition that means the EFI cannot be in the AIL so remove it from the - * transaction and free it. Otherwise coordinate with xfs_efi_release() - * to determine who gets to free the EFI. + * The unpin operation is the last place an EFI is manipulated in the log. It is + * either inserted in the AIL or aborted in the event of a log I/O error. In + * either case, the EFI transaction has been successfully committed to make it + * this far. Therefore, we expect whoever committed the EFI to construct the EFD + * and manage the EFD's reference to the EFI appropriately. Simply drop the + * log's EFI reference now that the log is done with it. */ STATIC void xfs_efi_item_unpin( @@ -141,14 +141,6 @@ xfs_efi_item_unpin( int remove) { struct xfs_efi_log_item *efip = EFI_ITEM(lip); - - if (remove) { - ASSERT(!(lip->li_flags & XFS_LI_IN_AIL)); - if (lip->li_desc) - xfs_trans_del_item(lip); - xfs_efi_item_free(efip); - return; - } __xfs_efi_release(efip); } @@ -167,6 +159,11 @@ xfs_efi_item_push( return XFS_ITEM_PINNED; } +/* + * The EFI has been either committed or aborted if the transaction has been + * cancelled. If the transaction was cancelled, an EFD isn't going to be + * constructed and thus we free the EFI here directly. + */ STATIC void xfs_efi_item_unlock( struct xfs_log_item *lip) @@ -415,20 +412,27 @@ xfs_efd_item_push( return XFS_ITEM_PINNED; } +/* + * The EFD is either committed or aborted if the transaction is cancelled. If + * the transaction is cancelled, drop our reference to the EFI and free the EFD. + */ STATIC void xfs_efd_item_unlock( struct xfs_log_item *lip) { - if (lip->li_flags & XFS_LI_ABORTED) - xfs_efd_item_free(EFD_ITEM(lip)); + struct xfs_efd_log_item *efdp = EFD_ITEM(lip); + + if (lip->li_flags & XFS_LI_ABORTED) { + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); + xfs_efd_item_free(efdp); + } } /* - * When the efd item is committed to disk, all we need to do - * is delete our reference to our partner efi item and then - * free ourselves. Since we're freeing ourselves we must - * return -1 to keep the transaction code from further referencing - * this item. + * When the efd item is committed to disk, all we need to do is delete our + * reference to our partner efi item and then free ourselves. Since we're + * freeing ourselves we must return -1 to keep the transaction code from further + * referencing this item. */ STATIC xfs_lsn_t xfs_efd_item_committed( @@ -438,14 +442,15 @@ xfs_efd_item_committed( struct xfs_efd_log_item *efdp = EFD_ITEM(lip); /* - * If we got a log I/O error, it's always the case that the LR with the - * EFI got unpinned and freed before the EFD got aborted. + * Drop the efi reference regardless of whether this item has been + * aborted. Once the EFI transaction is successfully committed, it is + * the sole responsibility of the EFD to clean up the EFI (even if the + * EFI is aborted due to log I/O error). */ - if (!(lip->li_flags & XFS_LI_ABORTED)) - xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); - + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); xfs_efd_item_free(efdp); - return (xfs_lsn_t)-1; + + return (xfs_lsn_t) -1; } /* -- 2.1.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH RFC 2/2] xfs: fix efi/efd error handling to avoid fs shutdown hangs 2015-07-23 20:13 ` [PATCH RFC 2/2] xfs: fix efi/efd error handling to avoid fs shutdown hangs Brian Foster @ 2015-07-29 22:18 ` Dave Chinner 0 siblings, 0 replies; 10+ messages in thread From: Dave Chinner @ 2015-07-29 22:18 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On Thu, Jul 23, 2015 at 04:13:30PM -0400, Brian Foster wrote: > Freeing an extent in XFS involves logging an EFI (extent free > intention), freeing the actual extent, and logging an EFD (extent free > done). The EFI object is created with a reference count of 2: one for > the current transaction and one for the subsequently created EFD. Under > normal circumstances, the first reference is dropped when the EFI is > unpinned and the second reference is dropped when the EFD is committed > to the on-disk log. > > In event of errors or filesystem shutdown, there are various potential > cleanup scenarios depending on the state of the EFI/EFD. The cleanup > scenarios are confusing and racy, as demonstrated by the following test > sequence: > > # mount $dev $mnt > # fsstress -d $mnt -n 99999 -p 16 -z -f fallocate=1 \ > -f punch=1 -f creat=1 -f unlink=1 & > # sleep 5 > # killall -9 fsstress; wait > # godown -f $mnt > # umount > > ... in which the final umount can hang due to the AIL being pinned > indefinitely by one or more EFI items. This can occur due to several > conditions. For example, if the shutdown occurs after the EFI is > committed to the on-disk log and the EFD committed to the CIL, but > before the EFD committed to the log, the EFD iop_committed() abort > handler does not drop its reference to the EFI. Alternatively, manual > error injection in the xfs_bmap_finish() codepath shows that if an error > occurs after the EFI transaction is committed but before the EFD is > constructed and logged, the EFI is never released from the AIL. > > Update the EFI/EFD item handling code to use a more straightforward and > reliable approach to error handling. If the EFI transaction is > cancelled, the EFI is freed when the log item is unlocked. If the EFI > transaction is committed successfully, from that point forward it is the > responsibility of the EFD to drop its EFI reference. This means that the > EFI unpin callback only ever drops the log reference to the EFI. It does > not free the EFI in the event of log I/O error. This also means that the > EFD item must drop its EFI reference either if the EFD transaction is > cancelled, committed or itself aborted due to log I/O error. Finally, > update xfs_bmap_finish() to ensure that once an EFI transaction is > committed, we are guaranteed to construct and log the associated EFD. > This ensures that the EFD is aborted and drops the reference to the EFI. > > Signed-off-by: Brian Foster <bfoster@redhat.com> FWIW, we have the same hang problems in log recovery when xfs_free_extent fails - the efd is not logged and the filesystem is not shut down, so the EFI being processed is not released or removed from the AIL. We also abort the processing of EFIs on the first error, leaving the AIL full of EFIs that will never be removed and so the unmount will hang trying to empty the AIL... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2015-08-10 18:55 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-07-23 20:13 [PATCH RFC 0/2] xfs: fix up EFI/EFD error handling Brian Foster 2015-07-23 20:13 ` [PATCH RFC 1/2] xfs: return committed status from xfs_trans_roll() Brian Foster 2015-07-28 0:40 ` Dave Chinner 2015-07-28 13:40 ` Brian Foster 2015-07-28 21:51 ` Dave Chinner 2015-07-29 11:47 ` Brian Foster 2015-07-30 17:21 ` Christoph Hellwig 2015-08-10 18:55 ` Brian Foster 2015-07-23 20:13 ` [PATCH RFC 2/2] xfs: fix efi/efd error handling to avoid fs shutdown hangs Brian Foster 2015-07-29 22:18 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox