From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29]) by oss.sgi.com (Postfix) with ESMTP id A55D17F3F for ; Thu, 5 Dec 2013 17:06:55 -0600 (CST) Message-ID: <52A1070C.6080006@sgi.com> Date: Thu, 05 Dec 2013 17:06:52 -0600 From: Mark Tinguely MIME-Version: 1.0 Subject: Re: [PATCH] xfs: free the efi AIL entry on log recovery failure References: <20131205214054.330817383@sgi.com> <20131205223244.GJ29897@dastard> In-Reply-To: <20131205223244.GJ29897@dastard> List-Id: XFS Filesystem from SGI List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Errors-To: xfs-bounces@oss.sgi.com Sender: xfs-bounces@oss.sgi.com To: Dave Chinner Cc: xfs@oss.sgi.com On 12/05/13 16:32, Dave Chinner wrote: > On Thu, Dec 05, 2013 at 03:40:07PM -0600, Mark Tinguely wrote: >> If freeing an extent fails during recovery, then the filesystem >> will be forced down with the EFI entry still on the AIL. This >> will result in hanging the function xfs_ail_push_all_sync(). >> >> This patch is similar to the patches that removed the dquot and >> inode in commits 32ce90a and dea9609. >> >> Found by mounting an metadata dump that triggers a >> XFS_WANT_CORRUPTED_RETURN() on log recovery. >> >> Signed-off-by: Mark Tinguely >> --- >> fs/xfs/xfs_log_recover.c | 5 ++++- >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> Index: b/fs/xfs/xfs_log_recover.c >> =================================================================== >> --- a/fs/xfs/xfs_log_recover.c >> +++ b/fs/xfs/xfs_log_recover.c >> @@ -3668,7 +3668,7 @@ xlog_recover_process_efi( >> extp =&(efip->efi_format.efi_extents[i]); >> error = xfs_free_extent(tp, extp->ext_start, extp->ext_len); >> if (error) >> - goto abort_error; >> + goto free_abort; >> xfs_trans_log_efd_extent(tp, efdp, extp->ext_start, >> extp->ext_len); >> } >> @@ -3677,6 +3677,9 @@ xlog_recover_process_efi( >> error = xfs_trans_commit(tp, 0); >> return error; >> >> +free_abort: >> + set_bit(XFS_EFI_RECOVERED,&efip->efi_flags); >> + xfs_efi_release(efip, efip->efi_format.efi_nextents); >> abort_error: >> xfs_trans_cancel(tp, XFS_TRANS_ABORT); >> return error; > > Why does a failure of xfs_trans_reserve() in this function > not require the XFS_EFI_RECOVERED bit to be set? > > If it does require setting the XFS_EFI_RECOVERED bit (I think it > does), then we unconditionally set that bit in > xlog_recover_process_efi() so it should be pulled up to be the first > thing the function does, not something that is handled in the error > paths. IOWs, we leave this function having guaranteed that we've > pulled the EFI from the AIL on any failure. or not if we move where we remove the entries from the AIL.... > > Bigger picture: if we fail to recover this EFI, we abort the AIL > walk across all the EFIs in the AIL. That means all unrecovered EFIs > get left in the AIL and this will result in hanging the function > xfs_ail_push_all_sync(). IOWs, this patch only fixes the case where > we get an error on the last EFI in the AIL during recovery. > > Hence a larger fix is needed here - if we fail to recover an EFI, we > need to continue to walk the AIL and set the XFS_EFI_RECOVERED bit > on all the EFIs still in the AIL and release them. ...pretty sure the forced shutdown will happen later and the caller, xlog_recover_process_efis(), is already walking the ail with a xfs_ail_cursor. As you mention, there can be more entries, then xlog_recover_process_efis() could continue to use the cursor to the remove the remaining entries from the AIL on error. Thanks for the feedback. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs