* [PATCH] xfs: free the efi AIL entry on log recovery failure @ 2013-12-05 21:40 Mark Tinguely 2013-12-05 22:32 ` Dave Chinner 0 siblings, 1 reply; 4+ messages in thread From: Mark Tinguely @ 2013-12-05 21:40 UTC (permalink / raw) To: xfs [-- Attachment #1: xfs-remove-efi-entry-before-log-unmount.patch --] [-- Type: text/plain, Size: 1401 bytes --] If freeing an extent fails during recovery, then the filesystem will be forced down with the EFI entry still on the AIL. This will result in hanging the function xfs_ail_push_all_sync(). This patch is similar to the patches that removed the dquot and inode in commits 32ce90a and dea9609. Found by mounting an metadata dump that triggers a XFS_WANT_CORRUPTED_RETURN() on log recovery. Signed-off-by: Mark Tinguely <tinguely@sgi.com> --- fs/xfs/xfs_log_recover.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) Index: b/fs/xfs/xfs_log_recover.c =================================================================== --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3668,7 +3668,7 @@ xlog_recover_process_efi( extp = &(efip->efi_format.efi_extents[i]); error = xfs_free_extent(tp, extp->ext_start, extp->ext_len); if (error) - goto abort_error; + goto free_abort; xfs_trans_log_efd_extent(tp, efdp, extp->ext_start, extp->ext_len); } @@ -3677,6 +3677,9 @@ xlog_recover_process_efi( error = xfs_trans_commit(tp, 0); return error; +free_abort: + set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); + xfs_efi_release(efip, efip->efi_format.efi_nextents); abort_error: xfs_trans_cancel(tp, XFS_TRANS_ABORT); return error; _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: free the efi AIL entry on log recovery failure 2013-12-05 21:40 [PATCH] xfs: free the efi AIL entry on log recovery failure Mark Tinguely @ 2013-12-05 22:32 ` Dave Chinner 2013-12-05 23:06 ` Mark Tinguely 0 siblings, 1 reply; 4+ messages in thread From: Dave Chinner @ 2013-12-05 22:32 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Thu, Dec 05, 2013 at 03:40:07PM -0600, Mark Tinguely wrote: > If freeing an extent fails during recovery, then the filesystem > will be forced down with the EFI entry still on the AIL. This > will result in hanging the function xfs_ail_push_all_sync(). > > This patch is similar to the patches that removed the dquot and > inode in commits 32ce90a and dea9609. > > Found by mounting an metadata dump that triggers a > XFS_WANT_CORRUPTED_RETURN() on log recovery. > > Signed-off-by: Mark Tinguely <tinguely@sgi.com> > --- > fs/xfs/xfs_log_recover.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > Index: b/fs/xfs/xfs_log_recover.c > =================================================================== > --- a/fs/xfs/xfs_log_recover.c > +++ b/fs/xfs/xfs_log_recover.c > @@ -3668,7 +3668,7 @@ xlog_recover_process_efi( > extp = &(efip->efi_format.efi_extents[i]); > error = xfs_free_extent(tp, extp->ext_start, extp->ext_len); > if (error) > - goto abort_error; > + goto free_abort; > xfs_trans_log_efd_extent(tp, efdp, extp->ext_start, > extp->ext_len); > } > @@ -3677,6 +3677,9 @@ xlog_recover_process_efi( > error = xfs_trans_commit(tp, 0); > return error; > > +free_abort: > + set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); > + xfs_efi_release(efip, efip->efi_format.efi_nextents); > abort_error: > xfs_trans_cancel(tp, XFS_TRANS_ABORT); > return error; Why does a failure of xfs_trans_reserve() in this function not require the XFS_EFI_RECOVERED bit to be set? If it does require setting the XFS_EFI_RECOVERED bit (I think it does), then we unconditionally set that bit in xlog_recover_process_efi() so it should be pulled up to be the first thing the function does, not something that is handled in the error paths. IOWs, we leave this function having guaranteed that we've pulled the EFI from the AIL on any failure. Bigger picture: if we fail to recover this EFI, we abort the AIL walk across all the EFIs in the AIL. That means all unrecovered EFIs get left in the AIL and this will result in hanging the function xfs_ail_push_all_sync(). IOWs, this patch only fixes the case where we get an error on the last EFI in the AIL during recovery. Hence a larger fix is needed here - if we fail to recover an EFI, we need to continue to walk the AIL and set the XFS_EFI_RECOVERED bit on all the EFIs still in the AIL and release them. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: free the efi AIL entry on log recovery failure 2013-12-05 22:32 ` Dave Chinner @ 2013-12-05 23:06 ` Mark Tinguely 2013-12-05 23:54 ` Dave Chinner 0 siblings, 1 reply; 4+ messages in thread From: Mark Tinguely @ 2013-12-05 23:06 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 12/05/13 16:32, Dave Chinner wrote: > On Thu, Dec 05, 2013 at 03:40:07PM -0600, Mark Tinguely wrote: >> If freeing an extent fails during recovery, then the filesystem >> will be forced down with the EFI entry still on the AIL. This >> will result in hanging the function xfs_ail_push_all_sync(). >> >> This patch is similar to the patches that removed the dquot and >> inode in commits 32ce90a and dea9609. >> >> Found by mounting an metadata dump that triggers a >> XFS_WANT_CORRUPTED_RETURN() on log recovery. >> >> Signed-off-by: Mark Tinguely<tinguely@sgi.com> >> --- >> fs/xfs/xfs_log_recover.c | 5 ++++- >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> Index: b/fs/xfs/xfs_log_recover.c >> =================================================================== >> --- a/fs/xfs/xfs_log_recover.c >> +++ b/fs/xfs/xfs_log_recover.c >> @@ -3668,7 +3668,7 @@ xlog_recover_process_efi( >> extp =&(efip->efi_format.efi_extents[i]); >> error = xfs_free_extent(tp, extp->ext_start, extp->ext_len); >> if (error) >> - goto abort_error; >> + goto free_abort; >> xfs_trans_log_efd_extent(tp, efdp, extp->ext_start, >> extp->ext_len); >> } >> @@ -3677,6 +3677,9 @@ xlog_recover_process_efi( >> error = xfs_trans_commit(tp, 0); >> return error; >> >> +free_abort: >> + set_bit(XFS_EFI_RECOVERED,&efip->efi_flags); >> + xfs_efi_release(efip, efip->efi_format.efi_nextents); >> abort_error: >> xfs_trans_cancel(tp, XFS_TRANS_ABORT); >> return error; > > Why does a failure of xfs_trans_reserve() in this function > not require the XFS_EFI_RECOVERED bit to be set? > > If it does require setting the XFS_EFI_RECOVERED bit (I think it > does), then we unconditionally set that bit in > xlog_recover_process_efi() so it should be pulled up to be the first > thing the function does, not something that is handled in the error > paths. IOWs, we leave this function having guaranteed that we've > pulled the EFI from the AIL on any failure. or not if we move where we remove the entries from the AIL.... > > Bigger picture: if we fail to recover this EFI, we abort the AIL > walk across all the EFIs in the AIL. That means all unrecovered EFIs > get left in the AIL and this will result in hanging the function > xfs_ail_push_all_sync(). IOWs, this patch only fixes the case where > we get an error on the last EFI in the AIL during recovery. > > Hence a larger fix is needed here - if we fail to recover an EFI, we > need to continue to walk the AIL and set the XFS_EFI_RECOVERED bit > on all the EFIs still in the AIL and release them. ...pretty sure the forced shutdown will happen later and the caller, xlog_recover_process_efis(), is already walking the ail with a xfs_ail_cursor. As you mention, there can be more entries, then xlog_recover_process_efis() could continue to use the cursor to the remove the remaining entries from the AIL on error. Thanks for the feedback. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: free the efi AIL entry on log recovery failure 2013-12-05 23:06 ` Mark Tinguely @ 2013-12-05 23:54 ` Dave Chinner 0 siblings, 0 replies; 4+ messages in thread From: Dave Chinner @ 2013-12-05 23:54 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Thu, Dec 05, 2013 at 05:06:52PM -0600, Mark Tinguely wrote: > On 12/05/13 16:32, Dave Chinner wrote: > >On Thu, Dec 05, 2013 at 03:40:07PM -0600, Mark Tinguely wrote: > >>If freeing an extent fails during recovery, then the filesystem > >>will be forced down with the EFI entry still on the AIL. This > >>will result in hanging the function xfs_ail_push_all_sync(). > >> > >>This patch is similar to the patches that removed the dquot and > >>inode in commits 32ce90a and dea9609. > >> > >>Found by mounting an metadata dump that triggers a > >>XFS_WANT_CORRUPTED_RETURN() on log recovery. > >> > >>Signed-off-by: Mark Tinguely<tinguely@sgi.com> > >>--- > >> fs/xfs/xfs_log_recover.c | 5 ++++- > >> 1 file changed, 4 insertions(+), 1 deletion(-) > >> > >>Index: b/fs/xfs/xfs_log_recover.c > >>=================================================================== > >>--- a/fs/xfs/xfs_log_recover.c > >>+++ b/fs/xfs/xfs_log_recover.c > >>@@ -3668,7 +3668,7 @@ xlog_recover_process_efi( > >> extp =&(efip->efi_format.efi_extents[i]); > >> error = xfs_free_extent(tp, extp->ext_start, extp->ext_len); > >> if (error) > >>- goto abort_error; > >>+ goto free_abort; > >> xfs_trans_log_efd_extent(tp, efdp, extp->ext_start, > >> extp->ext_len); > >> } > >>@@ -3677,6 +3677,9 @@ xlog_recover_process_efi( > >> error = xfs_trans_commit(tp, 0); > >> return error; > >> > >>+free_abort: > >>+ set_bit(XFS_EFI_RECOVERED,&efip->efi_flags); > >>+ xfs_efi_release(efip, efip->efi_format.efi_nextents); > >> abort_error: > >> xfs_trans_cancel(tp, XFS_TRANS_ABORT); > >> return error; > > > >Why does a failure of xfs_trans_reserve() in this function > >not require the XFS_EFI_RECOVERED bit to be set? > > > >If it does require setting the XFS_EFI_RECOVERED bit (I think it > >does), then we unconditionally set that bit in > >xlog_recover_process_efi() so it should be pulled up to be the first > >thing the function does, not something that is handled in the error > >paths. IOWs, we leave this function having guaranteed that we've > >pulled the EFI from the AIL on any failure. > > or not if we move where we remove the entries from the AIL.... Sorry, I don't understand what this means. Can you explain in more detail? > >Bigger picture: if we fail to recover this EFI, we abort the AIL > >walk across all the EFIs in the AIL. That means all unrecovered EFIs > >get left in the AIL and this will result in hanging the function > >xfs_ail_push_all_sync(). IOWs, this patch only fixes the case where > >we get an error on the last EFI in the AIL during recovery. > > > >Hence a larger fix is needed here - if we fail to recover an EFI, we > >need to continue to walk the AIL and set the XFS_EFI_RECOVERED bit > >on all the EFIs still in the AIL and release them. > > ...pretty sure the forced shutdown will happen later and the caller, > xlog_recover_process_efis(), The forced shutdown occurs within the context of the xlog_recover_process_efis() i.e. immediately on detection of the corruption error in xfs_free_extent(). > is already walking the ail with a > xfs_ail_cursor. As you mention, there can be more entries, then > xlog_recover_process_efis() could continue to use the cursor to the > remove the remaining entries from the AIL on error. I'm not sure that will be sufficient, because the EFIs that are in the AIL due to references from the EFD are relying on the EFD to take the last reference away from the EFI and hence free it. When the shutdown occurs, the EFDs are aborted via ->iop_committed() after the XFS_LI_ABORTED flag is set on the EFD log items and this happens: STATIC xfs_lsn_t xfs_efd_item_committed( struct xfs_log_item *lip, xfs_lsn_t lsn) { struct xfs_efd_log_item *efdp = EFD_ITEM(lip); /* * If we got a log I/O error, it's always the case that the LR with the * EFI got unpinned and freed before the EFD got aborted. */ if (!(lip->li_flags & XFS_LI_ABORTED)) xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); xfs_efd_item_free(efdp); return (xfs_lsn_t)-1; } i.e. the EFIs don't get released, and so continue to live in the AIL even though they have the XFS_EFI_RECOVERED bit set on them. Hence you have to walk the entire AIL again, releasing all EFIs that have the XFS_EFI_RECOVERED set, and for those that don't have it set it needs to be set and the EFI released.... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-12-05 23:54 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-12-05 21:40 [PATCH] xfs: free the efi AIL entry on log recovery failure Mark Tinguely 2013-12-05 22:32 ` Dave Chinner 2013-12-05 23:06 ` Mark Tinguely 2013-12-05 23:54 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox