From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <xfs-bounces@oss.sgi.com>
Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29])
	by oss.sgi.com (Postfix) with ESMTP id A55D17F3F
	for <xfs@oss.sgi.com>; Thu,  5 Dec 2013 17:06:55 -0600 (CST)
Message-ID: <52A1070C.6080006@sgi.com>
Date: Thu, 05 Dec 2013 17:06:52 -0600
From: Mark Tinguely <tinguely@sgi.com>
MIME-Version: 1.0
Subject: Re: [PATCH] xfs: free the efi AIL entry on log recovery failure
References: <20131205214054.330817383@sgi.com> <20131205223244.GJ29897@dastard>
In-Reply-To: <20131205223244.GJ29897@dastard>
List-Id: XFS Filesystem from SGI <xfs.oss.sgi.com>
List-Unsubscribe: <http://oss.sgi.com/mailman/options/xfs>,
	<mailto:xfs-request@oss.sgi.com?subject=unsubscribe>
List-Archive: <http://oss.sgi.com/pipermail/xfs>
List-Post: <mailto:xfs@oss.sgi.com>
List-Help: <mailto:xfs-request@oss.sgi.com?subject=help>
List-Subscribe: <http://oss.sgi.com/mailman/listinfo/xfs>,
	<mailto:xfs-request@oss.sgi.com?subject=subscribe>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Errors-To: xfs-bounces@oss.sgi.com
Sender: xfs-bounces@oss.sgi.com
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com

On 12/05/13 16:32, Dave Chinner wrote:
> On Thu, Dec 05, 2013 at 03:40:07PM -0600, Mark Tinguely wrote:
>> If freeing an extent fails during recovery, then the filesystem
>> will be forced down with the EFI entry still on the AIL. This
>> will result in hanging the function xfs_ail_push_all_sync().
>>
>> This patch is similar to the patches that removed the dquot and
>> inode in commits 32ce90a and dea9609.
>>
>> Found by mounting an metadata dump that triggers a
>> XFS_WANT_CORRUPTED_RETURN() on log recovery.
>>
>> Signed-off-by: Mark Tinguely<tinguely@sgi.com>
>> ---
>>   fs/xfs/xfs_log_recover.c |    5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> Index: b/fs/xfs/xfs_log_recover.c
>> ===================================================================
>> --- a/fs/xfs/xfs_log_recover.c
>> +++ b/fs/xfs/xfs_log_recover.c
>> @@ -3668,7 +3668,7 @@ xlog_recover_process_efi(
>>   		extp =&(efip->efi_format.efi_extents[i]);
>>   		error = xfs_free_extent(tp, extp->ext_start, extp->ext_len);
>>   		if (error)
>> -			goto abort_error;
>> +			goto free_abort;
>>   		xfs_trans_log_efd_extent(tp, efdp, extp->ext_start,
>>   					 extp->ext_len);
>>   	}
>> @@ -3677,6 +3677,9 @@ xlog_recover_process_efi(
>>   	error = xfs_trans_commit(tp, 0);
>>   	return error;
>>
>> +free_abort:
>> +	set_bit(XFS_EFI_RECOVERED,&efip->efi_flags);
>> +	xfs_efi_release(efip, efip->efi_format.efi_nextents);
>>   abort_error:
>>   	xfs_trans_cancel(tp, XFS_TRANS_ABORT);
>>   	return error;
>
> Why does a failure of xfs_trans_reserve() in this function
> not require the XFS_EFI_RECOVERED bit to be set?
>
> If it does require setting the XFS_EFI_RECOVERED bit (I think it
> does), then we unconditionally set that bit in
> xlog_recover_process_efi() so it should be pulled up to be the first
> thing the function does, not something that is handled in the error
> paths. IOWs, we leave this function having guaranteed that we've
> pulled the EFI from the AIL on any failure.

or not if we move where we remove the entries from the AIL....

>
> Bigger picture: if we fail to recover this EFI, we abort the AIL
> walk across all the EFIs in the AIL. That means all unrecovered EFIs
> get left in the AIL and this will result in hanging the function
> xfs_ail_push_all_sync().  IOWs, this patch only fixes the case where
> we get an error on the last EFI in the AIL during recovery.
>
> Hence a larger fix is needed here - if we fail to recover an EFI, we
> need to continue to walk the AIL and set the XFS_EFI_RECOVERED bit
> on all the EFIs still in the AIL and release them.

...pretty sure the forced shutdown will happen later and the caller, 
xlog_recover_process_efis(), is already walking the ail with a 
xfs_ail_cursor. As you mention, there can be more entries, then 
xlog_recover_process_efis() could continue to use the cursor to the 
remove the remaining entries from the AIL on error.

Thanks for the feedback.

--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs