linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] xfs: Clean the EFI on errors series
@ 2014-03-25 20:06 Mark Tinguely
  2014-03-25 20:06 ` [PATCH 1/2] xfs: remove efi from AIL in log recovery error Mark Tinguely
  2014-03-25 20:06 ` [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown Mark Tinguely
  0 siblings, 2 replies; 12+ messages in thread
From: Mark Tinguely @ 2014-03-25 20:06 UTC (permalink / raw)
  To: xfs

Here is a broken up version of the clean the EFI from the AIL
on error.
 
Patch 1 is for log recovery and patch 2 is for log aborts, forced
shutdowns in xfs_bmap_finish() or another thread while the EFI is
in the CIL or AIL and the EFD is in the CIL.
 
If xfs_free_extent() fails while processing the extent free list,
one has to manually remove the EFI because the EFD is not complete
and is not in the transaction.
 
--Mark.


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] xfs: remove efi from AIL in log recovery error
  2014-03-25 20:06 [PATCH 0/2] xfs: Clean the EFI on errors series Mark Tinguely
@ 2014-03-25 20:06 ` Mark Tinguely
  2014-03-28 15:24   ` Brian Foster
  2014-03-31 20:25   ` [PATCH v2 1/2] xfs: remove efi from AIL in log recovery Mark Tinguely
  2014-03-25 20:06 ` [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown Mark Tinguely
  1 sibling, 2 replies; 12+ messages in thread
From: Mark Tinguely @ 2014-03-25 20:06 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-fix-efi-in-log-recovery-error.patch --]
[-- Type: text/plain, Size: 5155 bytes --]

xlog_recover_process_efi{s}() functions are completing the
second half of xfs_bmap_finish() which frees extents. If this
operation fails, the EFI will stay on the AIL and prevents
the xfs_ail_push all_sync() from completing and the mount will
fail to unmount.

Rather than have a special log recovery flag XFS_EFI_RECOVERED
to decrement the EFI/EFD counter, call the same decrement function
from the log recovery routine that is called then the EFI is added
to the AIL from a log write.

Remove all other unprocessed EFIs from the log recovery AIL
when one is discovered in error.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
---
Rewritten with suggestions from Dave.
Note: calling xfs_efi_item_unpin() seemed more explainatory than calling
      the helper __xfs_efi_release().

 fs/xfs/xfs_extfree_item.c |    9 +++------
 fs/xfs/xfs_log_recover.c  |   28 +++++++++++++++-------------
 fs/xfs/xfs_trans.h        |    1 +
 3 files changed, 19 insertions(+), 19 deletions(-)

Index: b/fs/xfs/xfs_extfree_item.c
===================================================================
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -134,9 +134,10 @@ xfs_efi_item_pin(
  * remove the EFI it's because the transaction has been canceled and by
  * definition that means the EFI cannot be in the AIL so remove it from the
  * transaction and free it.  Otherwise coordinate with xfs_efi_release()
- * to determine who gets to free the EFI.
+ * to determine who gets to free the EFI. Call from log recovery of EFI
+ * entries so the EFD or error handling will remove the entry.
  */
-STATIC void
+void
 xfs_efi_item_unpin(
 	struct xfs_log_item	*lip,
 	int			remove)
@@ -313,10 +314,6 @@ xfs_efi_release(xfs_efi_log_item_t	*efip
 {
 	ASSERT(atomic_read(&efip->efi_next_extent) >= nextents);
 	if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) {
-		/* recovery needs us to drop the EFI reference, too */
-		if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags))
-			__xfs_efi_release(efip);
-
 		__xfs_efi_release(efip);
 		/* efip may now have been freed, do not reference it again. */
 	}
Index: b/fs/xfs/xfs_log_recover.c
===================================================================
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3634,6 +3634,7 @@ xlog_recover_process_data(
 /*
  * Process an extent free intent item that was recovered from
  * the log.  We need to free the extents that it describes.
+ * The caller will release this and any following EFIs upon error.
  */
 STATIC int
 xlog_recover_process_efi(
@@ -3648,6 +3649,13 @@ xlog_recover_process_efi(
 	xfs_fsblock_t		startblock_fsb;
 
 	ASSERT(!test_bit(XFS_EFI_RECOVERED, &efip->efi_flags));
+	set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
+
+	/*
+	 * Decrement the EFI/EFD counter so the EFI is removed after
+	 * processing the EFD or error handling in the caller.
+	 */
+	xfs_efi_item_unpin(&efip->efi_item, 0);
 
 	/*
 	 * First check the validity of the extents described by the
@@ -3662,12 +3670,6 @@ xlog_recover_process_efi(
 		    (extp->ext_len == 0) ||
 		    (startblock_fsb >= mp->m_sb.sb_dblocks) ||
 		    (extp->ext_len >= mp->m_sb.sb_agblocks)) {
-			/*
-			 * This will pull the EFI from the AIL and
-			 * free the memory associated with it.
-			 */
-			set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
-			xfs_efi_release(efip, efip->efi_format.efi_nextents);
 			return XFS_ERROR(EIO);
 		}
 	}
@@ -3687,7 +3689,6 @@ xlog_recover_process_efi(
 					 extp->ext_len);
 	}
 
-	set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
 	error = xfs_trans_commit(tp, 0);
 	return error;
 
@@ -3718,8 +3719,8 @@ STATIC int
 xlog_recover_process_efis(
 	struct xlog	*log)
 {
-	xfs_log_item_t		*lip;
-	xfs_efi_log_item_t	*efip;
+	struct xfs_log_item	*lip;
+	struct xfs_efi_log_item	*efip;
 	int			error = 0;
 	struct xfs_ail_cursor	cur;
 	struct xfs_ail		*ailp;
@@ -3750,13 +3751,14 @@ xlog_recover_process_efis(
 		}
 
 		spin_unlock(&ailp->xa_lock);
-		error = xlog_recover_process_efi(log->l_mp, efip);
-		spin_lock(&ailp->xa_lock);
+		/* Skip all EFIs after first EFI in error. */
+		if (!error)
+			error = xlog_recover_process_efi(log->l_mp, efip);
 		if (error)
-			goto out;
+			xfs_efi_release(efip, efip->efi_format.efi_nextents);
+		spin_lock(&ailp->xa_lock);
 		lip = xfs_trans_ail_cursor_next(ailp, &cur);
 	}
-out:
 	xfs_trans_ail_cursor_done(ailp, &cur);
 	spin_unlock(&ailp->xa_lock);
 	return error;
Index: b/fs/xfs/xfs_trans.h
===================================================================
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -216,6 +216,7 @@ void		xfs_trans_ijoin(struct xfs_trans *
 void		xfs_trans_log_buf(xfs_trans_t *, struct xfs_buf *, uint, uint);
 void		xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint);
 struct xfs_efi_log_item	*xfs_trans_get_efi(xfs_trans_t *, uint);
+void		xfs_efi_item_unpin(struct xfs_log_item *, int);
 void		xfs_efi_release(struct xfs_efi_log_item *, uint);
 void		xfs_trans_log_efi_extent(xfs_trans_t *,
 					 struct xfs_efi_log_item *,


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown
  2014-03-25 20:06 [PATCH 0/2] xfs: Clean the EFI on errors series Mark Tinguely
  2014-03-25 20:06 ` [PATCH 1/2] xfs: remove efi from AIL in log recovery error Mark Tinguely
@ 2014-03-25 20:06 ` Mark Tinguely
  2014-04-03 19:34   ` Dave Chinner
  1 sibling, 1 reply; 12+ messages in thread
From: Mark Tinguely @ 2014-03-25 20:06 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-fix-efi-xfs_bmapi_finish-error.patch --]
[-- Type: text/plain, Size: 7709 bytes --]

The extent free intention (EFI) and extent free done (EFD)
log items are in separate transactions. It is possible that
the EFI can be pushed to the AIL before a forced shutdown
where it gets stuck for following reasons:

 No complete EFD in the transaction. This can happen if the
 transaction fails to reserve space or the freeing the extent
 fails in xfs_bmap_finish().

 EFD IOP Abort processing. If xfs_trans_cancel() is called with
 an abort flag, or if the xfs_trans_commit() is called when the
 file system is in forced shutdown or if the log buffer write fails,
 then the EFD iop commands will not remove the EFI from the AIL.

If the EFI entry is left on the AIL, xfs_ail_push all_sync() will
fail to complete, the unmount will hang, and a reboot is required
to get the complete the unmount.

Do not free the EFI structure immediately on forced shutdowns, but
instead use the IOP calls to match the EFI/EFD entries. A small
wrinkle in the mix is the EFI is not automatically placed on the
AIL by the IOP routines in the cases but may have made it to the
AIL before the abort. We now have to check if the EFI is on the AIL
in the abort IOP cases.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
---
Dave, as I tried to explain, if the error in extent free happen early
enough, the EFD is not usable and is not in the transaction to cause
IOP commands to clean up the EFI. It too must be done manually.

The "in the AIL" test must be done in the abort IOP commands.

 fs/xfs/xfs_bmap_util.c    |   21 +++++++++++++++----
 fs/xfs/xfs_extfree_item.c |   49 +++++++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_log_recover.c  |    3 +-
 fs/xfs/xfs_trans.h        |    2 -
 4 files changed, 56 insertions(+), 19 deletions(-)

Index: b/fs/xfs/xfs_bmap_util.c
===================================================================
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -116,12 +116,14 @@ xfs_bmap_finish(
 
 	error = xfs_trans_reserve(ntp, &tres, 0, 0);
 	if (error)
-		return error;
+		goto error_return;
+
 	efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count);
 	for (free = flist->xbf_first; free != NULL; free = next) {
 		next = free->xbfi_next;
-		if ((error = xfs_free_extent(ntp, free->xbfi_startblock,
-				free->xbfi_blockcount))) {
+		error = xfs_free_extent(ntp, free->xbfi_startblock,
+				free->xbfi_blockcount);
+		if (error) {
 			/*
 			 * The bmap free list will be cleaned up at a
 			 * higher level.  The EFI will be canceled when
@@ -136,13 +138,24 @@ xfs_bmap_finish(
 						   (error == EFSCORRUPTED) ?
 						   SHUTDOWN_CORRUPT_INCORE :
 						   SHUTDOWN_META_IO_ERROR);
-			return error;
+			goto error_return;
 		}
 		xfs_trans_log_efd_extent(ntp, efd, free->xbfi_startblock,
 			free->xbfi_blockcount);
 		xfs_bmap_del_free(flist, NULL, free);
 	}
 	return 0;
+
+ error_return:
+	/*
+	 * No EFD in the transaction matching the EFI can be used at this
+	 * point Manually release the EFI and remove from AIL if necessary.
+	 * If the EFI did not make it into the AIL, then the transaction
+	 * cancel of the EFI will decrement the EFI/EFD counter and will not
+	 * attempt to remove itself from the AIL.
+	 */
+	xfs_efi_release(efi, efi->efi_format.efi_nextents, 0);
+	return error;
 }
 
 int
Index: b/fs/xfs/xfs_extfree_item.c
===================================================================
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -56,15 +56,22 @@ xfs_efi_item_free(
  */
 STATIC void
 __xfs_efi_release(
-	struct xfs_efi_log_item	*efip)
+	struct xfs_efi_log_item	*efip,
+	int			abort)
 {
 	struct xfs_ail		*ailp = efip->efi_item.li_ailp;
 
 	if (atomic_dec_and_test(&efip->efi_refcount)) {
 		spin_lock(&ailp->xa_lock);
-		/* xfs_trans_ail_delete() drops the AIL lock. */
-		xfs_trans_ail_delete(ailp, &efip->efi_item,
-				     SHUTDOWN_LOG_IO_ERROR);
+		/*
+		 * The EFI may not be on the AIL on abort.
+		 * xfs_trans_ail_delete() drops the AIL lock.
+		 */
+		if (abort)
+			spin_unlock(&ailp->xa_lock);
+		else
+			xfs_trans_ail_delete(ailp, &efip->efi_item,
+					     SHUTDOWN_LOG_IO_ERROR);
 		xfs_efi_item_free(efip);
 	}
 }
@@ -148,10 +155,10 @@ xfs_efi_item_unpin(
 		ASSERT(!(lip->li_flags & XFS_LI_IN_AIL));
 		if (lip->li_desc)
 			xfs_trans_del_item(lip);
-		xfs_efi_item_free(efip);
+		__xfs_efi_release(efip, 1);
 		return;
 	}
-	__xfs_efi_release(efip);
+	__xfs_efi_release(efip, 0);
 }
 
 /*
@@ -173,8 +180,9 @@ STATIC void
 xfs_efi_item_unlock(
 	struct xfs_log_item	*lip)
 {
+	/* EFI will not be on the AIL if called on abort */
 	if (lip->li_flags & XFS_LI_ABORTED)
-		xfs_efi_item_free(EFI_ITEM(lip));
+		__xfs_efi_release(EFI_ITEM(lip), 1);
 }
 
 /*
@@ -310,11 +318,12 @@ xfs_efi_copy_format(xfs_log_iovec_t *buf
  */
 void
 xfs_efi_release(xfs_efi_log_item_t	*efip,
-		uint			nextents)
+		uint			nextents,
+		int			abort)
 {
 	ASSERT(atomic_read(&efip->efi_next_extent) >= nextents);
 	if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) {
-		__xfs_efi_release(efip);
+		__xfs_efi_release(efip, abort);
 		/* efip may now have been freed, do not reference it again. */
 	}
 }
@@ -417,8 +426,19 @@ STATIC void
 xfs_efd_item_unlock(
 	struct xfs_log_item	*lip)
 {
-	if (lip->li_flags & XFS_LI_ABORTED)
-		xfs_efd_item_free(EFD_ITEM(lip));
+	struct xfs_efd_log_item	*efdp = EFD_ITEM(lip);
+
+	if (!(lip->li_flags & XFS_LI_ABORTED))
+		return;
+
+	/* Free the EFI when aborting a commit. The EFI will be either
+	 * added to the AIL in a CIL push before this abort or unlocked
+	 * before the EFD unlock. In either case we can check the AIL
+	 * status now.
+	 */
+	xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents,
+			!(efdp->efd_efip->efi_item.li_flags & XFS_LI_IN_AIL));
+	xfs_efd_item_free(efdp);
 }
 
 /*
@@ -434,14 +454,17 @@ xfs_efd_item_committed(
 	xfs_lsn_t		lsn)
 {
 	struct xfs_efd_log_item	*efdp = EFD_ITEM(lip);
+	int abort = 0;
 
 	/*
 	 * If we got a log I/O error, it's always the case that the LR with the
 	 * EFI got unpinned and freed before the EFD got aborted.
 	 */
-	if (!(lip->li_flags & XFS_LI_ABORTED))
-		xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
+	if (lip->li_flags & XFS_LI_ABORTED &&
+	    !(efdp->efd_efip->efi_item.li_flags & XFS_LI_IN_AIL))
+		abort = 1;
 
+	xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents, abort);
 	xfs_efd_item_free(efdp);
 	return (xfs_lsn_t)-1;
 }
Index: b/fs/xfs/xfs_log_recover.c
===================================================================
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3754,8 +3754,9 @@ xlog_recover_process_efis(
 		/* Skip all EFIs after first EFI in error. */
 		if (!error)
 			error = xlog_recover_process_efi(log->l_mp, efip);
+		/* EFI will be on the AIL, so abort == 0 */
 		if (error)
-			xfs_efi_release(efip, efip->efi_format.efi_nextents);
+			xfs_efi_release(efip, efip->efi_format.efi_nextents, 0);
 		spin_lock(&ailp->xa_lock);
 		lip = xfs_trans_ail_cursor_next(ailp, &cur);
 	}
Index: b/fs/xfs/xfs_trans.h
===================================================================
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -217,7 +217,7 @@ void		xfs_trans_log_buf(xfs_trans_t *, s
 void		xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint);
 struct xfs_efi_log_item	*xfs_trans_get_efi(xfs_trans_t *, uint);
 void		xfs_efi_item_unpin(struct xfs_log_item *, int);
-void		xfs_efi_release(struct xfs_efi_log_item *, uint);
+void		xfs_efi_release(struct xfs_efi_log_item *, uint, int);
 void		xfs_trans_log_efi_extent(xfs_trans_t *,
 					 struct xfs_efi_log_item *,
 					 xfs_fsblock_t,


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xfs: remove efi from AIL in log recovery error
  2014-03-25 20:06 ` [PATCH 1/2] xfs: remove efi from AIL in log recovery error Mark Tinguely
@ 2014-03-28 15:24   ` Brian Foster
  2014-03-28 15:41     ` Mark Tinguely
  2014-03-31 20:25   ` [PATCH v2 1/2] xfs: remove efi from AIL in log recovery Mark Tinguely
  1 sibling, 1 reply; 12+ messages in thread
From: Brian Foster @ 2014-03-28 15:24 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Tue, Mar 25, 2014 at 03:06:34PM -0500, Mark Tinguely wrote:
> xlog_recover_process_efi{s}() functions are completing the
> second half of xfs_bmap_finish() which frees extents. If this
> operation fails, the EFI will stay on the AIL and prevents
> the xfs_ail_push all_sync() from completing and the mount will
> fail to unmount.
> 
> Rather than have a special log recovery flag XFS_EFI_RECOVERED
> to decrement the EFI/EFD counter, call the same decrement function
> from the log recovery routine that is called then the EFI is added
> to the AIL from a log write.
> 
> Remove all other unprocessed EFIs from the log recovery AIL
> when one is discovered in error.
> 
> Signed-off-by: Mark Tinguely <tinguely@sgi.com>
> ---
> Rewritten with suggestions from Dave.
> Note: calling xfs_efi_item_unpin() seemed more explainatory than calling
>       the helper __xfs_efi_release().
> 
>  fs/xfs/xfs_extfree_item.c |    9 +++------
>  fs/xfs/xfs_log_recover.c  |   28 +++++++++++++++-------------
>  fs/xfs/xfs_trans.h        |    1 +
>  3 files changed, 19 insertions(+), 19 deletions(-)
> 
> Index: b/fs/xfs/xfs_extfree_item.c
> ===================================================================
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -134,9 +134,10 @@ xfs_efi_item_pin(
>   * remove the EFI it's because the transaction has been canceled and by
>   * definition that means the EFI cannot be in the AIL so remove it from the
>   * transaction and free it.  Otherwise coordinate with xfs_efi_release()
> - * to determine who gets to free the EFI.
> + * to determine who gets to free the EFI. Call from log recovery of EFI
> + * entries so the EFD or error handling will remove the entry.
>   */
> -STATIC void
> +void
>  xfs_efi_item_unpin(
>  	struct xfs_log_item	*lip,
>  	int			remove)
> @@ -313,10 +314,6 @@ xfs_efi_release(xfs_efi_log_item_t	*efip
>  {
>  	ASSERT(atomic_read(&efip->efi_next_extent) >= nextents);
>  	if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) {
> -		/* recovery needs us to drop the EFI reference, too */
> -		if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags))
> -			__xfs_efi_release(efip);
> -
>  		__xfs_efi_release(efip);
>  		/* efip may now have been freed, do not reference it again. */
>  	}
> Index: b/fs/xfs/xfs_log_recover.c
> ===================================================================
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -3634,6 +3634,7 @@ xlog_recover_process_data(
>  /*
>   * Process an extent free intent item that was recovered from
>   * the log.  We need to free the extents that it describes.
> + * The caller will release this and any following EFIs upon error.
>   */
>  STATIC int
>  xlog_recover_process_efi(
> @@ -3648,6 +3649,13 @@ xlog_recover_process_efi(
>  	xfs_fsblock_t		startblock_fsb;
>  
>  	ASSERT(!test_bit(XFS_EFI_RECOVERED, &efip->efi_flags));
> +	set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
> +
> +	/*
> +	 * Decrement the EFI/EFD counter so the EFI is removed after
> +	 * processing the EFD or error handling in the caller.
> +	 */
> +	xfs_efi_item_unpin(&efip->efi_item, 0);
>  
>  	/*
>  	 * First check the validity of the extents described by the
> @@ -3662,12 +3670,6 @@ xlog_recover_process_efi(
>  		    (extp->ext_len == 0) ||
>  		    (startblock_fsb >= mp->m_sb.sb_dblocks) ||
>  		    (extp->ext_len >= mp->m_sb.sb_agblocks)) {
> -			/*
> -			 * This will pull the EFI from the AIL and
> -			 * free the memory associated with it.
> -			 */
> -			set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
> -			xfs_efi_release(efip, efip->efi_format.efi_nextents);
>  			return XFS_ERROR(EIO);
>  		}
>  	}
> @@ -3687,7 +3689,6 @@ xlog_recover_process_efi(
>  					 extp->ext_len);
>  	}
>  
> -	set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
>  	error = xfs_trans_commit(tp, 0);
>  	return error;
>  
> @@ -3718,8 +3719,8 @@ STATIC int
>  xlog_recover_process_efis(
>  	struct xlog	*log)
>  {
> -	xfs_log_item_t		*lip;
> -	xfs_efi_log_item_t	*efip;
> +	struct xfs_log_item	*lip;
> +	struct xfs_efi_log_item	*efip;
>  	int			error = 0;
>  	struct xfs_ail_cursor	cur;
>  	struct xfs_ail		*ailp;
> @@ -3750,13 +3751,14 @@ xlog_recover_process_efis(
>  		}
>  
>  		spin_unlock(&ailp->xa_lock);
> -		error = xlog_recover_process_efi(log->l_mp, efip);
> -		spin_lock(&ailp->xa_lock);
> +		/* Skip all EFIs after first EFI in error. */
> +		if (!error)
> +			error = xlog_recover_process_efi(log->l_mp, efip);
>  		if (error)
> -			goto out;
> +			xfs_efi_release(efip, efip->efi_format.efi_nextents);

Hi Mark,

If we hit the scenario where we start skipping EFIs after an error, is
the equivalent unpin() call from process_efi() not necessary on the
subsequent EFIs?

Brian

> +		spin_lock(&ailp->xa_lock);
>  		lip = xfs_trans_ail_cursor_next(ailp, &cur);
>  	}
> -out:
>  	xfs_trans_ail_cursor_done(ailp, &cur);
>  	spin_unlock(&ailp->xa_lock);
>  	return error;
> Index: b/fs/xfs/xfs_trans.h
> ===================================================================
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -216,6 +216,7 @@ void		xfs_trans_ijoin(struct xfs_trans *
>  void		xfs_trans_log_buf(xfs_trans_t *, struct xfs_buf *, uint, uint);
>  void		xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint);
>  struct xfs_efi_log_item	*xfs_trans_get_efi(xfs_trans_t *, uint);
> +void		xfs_efi_item_unpin(struct xfs_log_item *, int);
>  void		xfs_efi_release(struct xfs_efi_log_item *, uint);
>  void		xfs_trans_log_efi_extent(xfs_trans_t *,
>  					 struct xfs_efi_log_item *,
> 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xfs: remove efi from AIL in log recovery error
  2014-03-28 15:24   ` Brian Foster
@ 2014-03-28 15:41     ` Mark Tinguely
  2014-03-28 16:07       ` Brian Foster
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Tinguely @ 2014-03-28 15:41 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On 03/28/14 10:24, Brian Foster wrote:
> On Tue, Mar 25, 2014 at 03:06:34PM -0500, Mark Tinguely wrote:
>> xlog_recover_process_efi{s}() functions are completing the
>> second half of xfs_bmap_finish() which frees extents. If this
>> operation fails, the EFI will stay on the AIL and prevents
>> the xfs_ail_push all_sync() from completing and the mount will
>> fail to unmount.
>>
>> Rather than have a special log recovery flag XFS_EFI_RECOVERED
>> to decrement the EFI/EFD counter, call the same decrement function
>> from the log recovery routine that is called then the EFI is added
>> to the AIL from a log write.
>>
>> Remove all other unprocessed EFIs from the log recovery AIL
>> when one is discovered in error.
>>
>> Signed-off-by: Mark Tinguely<tinguely@sgi.com>
>> ---
>> Rewritten with suggestions from Dave.
>> Note: calling xfs_efi_item_unpin() seemed more explainatory than calling
>>        the helper __xfs_efi_release().
>>
>>   fs/xfs/xfs_extfree_item.c |    9 +++------
>>   fs/xfs/xfs_log_recover.c  |   28 +++++++++++++++-------------
>>   fs/xfs/xfs_trans.h        |    1 +
>>   3 files changed, 19 insertions(+), 19 deletions(-)
>>
>> Index: b/fs/xfs/xfs_extfree_item.c
>> ===================================================================
>> --- a/fs/xfs/xfs_extfree_item.c
>> +++ b/fs/xfs/xfs_extfree_item.c
>> @@ -134,9 +134,10 @@ xfs_efi_item_pin(
>>    * remove the EFI it's because the transaction has been canceled and by
>>    * definition that means the EFI cannot be in the AIL so remove it from the
>>    * transaction and free it.  Otherwise coordinate with xfs_efi_release()
>> - * to determine who gets to free the EFI.
>> + * to determine who gets to free the EFI. Call from log recovery of EFI
>> + * entries so the EFD or error handling will remove the entry.
>>    */
>> -STATIC void
>> +void
>>   xfs_efi_item_unpin(
>>   	struct xfs_log_item	*lip,
>>   	int			remove)
>> @@ -313,10 +314,6 @@ xfs_efi_release(xfs_efi_log_item_t	*efip
>>   {
>>   	ASSERT(atomic_read(&efip->efi_next_extent)>= nextents);
>>   	if (atomic_sub_and_test(nextents,&efip->efi_next_extent)) {
>> -		/* recovery needs us to drop the EFI reference, too */
>> -		if (test_bit(XFS_EFI_RECOVERED,&efip->efi_flags))
>> -			__xfs_efi_release(efip);
>> -
>>   		__xfs_efi_release(efip);
>>   		/* efip may now have been freed, do not reference it again. */
>>   	}
>> Index: b/fs/xfs/xfs_log_recover.c
>> ===================================================================
>> --- a/fs/xfs/xfs_log_recover.c
>> +++ b/fs/xfs/xfs_log_recover.c
>> @@ -3634,6 +3634,7 @@ xlog_recover_process_data(
>>   /*
>>    * Process an extent free intent item that was recovered from
>>    * the log.  We need to free the extents that it describes.
>> + * The caller will release this and any following EFIs upon error.
>>    */
>>   STATIC int
>>   xlog_recover_process_efi(
>> @@ -3648,6 +3649,13 @@ xlog_recover_process_efi(
>>   	xfs_fsblock_t		startblock_fsb;
>>
>>   	ASSERT(!test_bit(XFS_EFI_RECOVERED,&efip->efi_flags));
>> +	set_bit(XFS_EFI_RECOVERED,&efip->efi_flags);
>> +
>> +	/*
>> +	 * Decrement the EFI/EFD counter so the EFI is removed after
>> +	 * processing the EFD or error handling in the caller.
>> +	 */
>> +	xfs_efi_item_unpin(&efip->efi_item, 0);
>>
>>   	/*
>>   	 * First check the validity of the extents described by the
>> @@ -3662,12 +3670,6 @@ xlog_recover_process_efi(
>>   		    (extp->ext_len == 0) ||
>>   		    (startblock_fsb>= mp->m_sb.sb_dblocks) ||
>>   		    (extp->ext_len>= mp->m_sb.sb_agblocks)) {
>> -			/*
>> -			 * This will pull the EFI from the AIL and
>> -			 * free the memory associated with it.
>> -			 */
>> -			set_bit(XFS_EFI_RECOVERED,&efip->efi_flags);
>> -			xfs_efi_release(efip, efip->efi_format.efi_nextents);
>>   			return XFS_ERROR(EIO);
>>   		}
>>   	}
>> @@ -3687,7 +3689,6 @@ xlog_recover_process_efi(
>>   					 extp->ext_len);
>>   	}
>>
>> -	set_bit(XFS_EFI_RECOVERED,&efip->efi_flags);
>>   	error = xfs_trans_commit(tp, 0);
>>   	return error;
>>
>> @@ -3718,8 +3719,8 @@ STATIC int
>>   xlog_recover_process_efis(
>>   	struct xlog	*log)
>>   {
>> -	xfs_log_item_t		*lip;
>> -	xfs_efi_log_item_t	*efip;
>> +	struct xfs_log_item	*lip;
>> +	struct xfs_efi_log_item	*efip;
>>   	int			error = 0;
>>   	struct xfs_ail_cursor	cur;
>>   	struct xfs_ail		*ailp;
>> @@ -3750,13 +3751,14 @@ xlog_recover_process_efis(
>>   		}
>>
>>   		spin_unlock(&ailp->xa_lock);
>> -		error = xlog_recover_process_efi(log->l_mp, efip);
>> -		spin_lock(&ailp->xa_lock);
>> +		/* Skip all EFIs after first EFI in error. */
>> +		if (!error)
>> +			error = xlog_recover_process_efi(log->l_mp, efip);
>>   		if (error)
>> -			goto out;
>> +			xfs_efi_release(efip, efip->efi_format.efi_nextents);
>
> Hi Mark,
>
> If we hit the scenario where we start skipping EFIs after an error, is
> the equivalent unpin() call from process_efi() not necessary on the
> subsequent EFIs?
>
> Brian

yes, good catch. They will have to be decremented twice. something like:
+		if (!error)
+			error = xlog_recover_process_efi(log->l_mp, efip);
+		else
+			xfs_efi_item_unpin(&efip->efi_item, 0);
+		if (error)
...

--Mark

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xfs: remove efi from AIL in log recovery error
  2014-03-28 15:41     ` Mark Tinguely
@ 2014-03-28 16:07       ` Brian Foster
  2014-03-28 16:21         ` Mark Tinguely
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Foster @ 2014-03-28 16:07 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Fri, Mar 28, 2014 at 10:41:06AM -0500, Mark Tinguely wrote:
> On 03/28/14 10:24, Brian Foster wrote:
> >On Tue, Mar 25, 2014 at 03:06:34PM -0500, Mark Tinguely wrote:
> >>xlog_recover_process_efi{s}() functions are completing the
> >>second half of xfs_bmap_finish() which frees extents. If this
> >>operation fails, the EFI will stay on the AIL and prevents
> >>the xfs_ail_push all_sync() from completing and the mount will
> >>fail to unmount.
> >>
> >>Rather than have a special log recovery flag XFS_EFI_RECOVERED
> >>to decrement the EFI/EFD counter, call the same decrement function
> >>from the log recovery routine that is called then the EFI is added
> >>to the AIL from a log write.
> >>
> >>Remove all other unprocessed EFIs from the log recovery AIL
> >>when one is discovered in error.
> >>
> >>Signed-off-by: Mark Tinguely<tinguely@sgi.com>
> >>---
> >>Rewritten with suggestions from Dave.
> >>Note: calling xfs_efi_item_unpin() seemed more explainatory than calling
> >>       the helper __xfs_efi_release().
> >>
> >>  fs/xfs/xfs_extfree_item.c |    9 +++------
> >>  fs/xfs/xfs_log_recover.c  |   28 +++++++++++++++-------------
> >>  fs/xfs/xfs_trans.h        |    1 +
> >>  3 files changed, 19 insertions(+), 19 deletions(-)
> >>
> >>Index: b/fs/xfs/xfs_extfree_item.c
> >>===================================================================
> >>--- a/fs/xfs/xfs_extfree_item.c
> >>+++ b/fs/xfs/xfs_extfree_item.c
> >>@@ -134,9 +134,10 @@ xfs_efi_item_pin(
> >>   * remove the EFI it's because the transaction has been canceled and by
> >>   * definition that means the EFI cannot be in the AIL so remove it from the
> >>   * transaction and free it.  Otherwise coordinate with xfs_efi_release()
> >>- * to determine who gets to free the EFI.
> >>+ * to determine who gets to free the EFI. Call from log recovery of EFI
> >>+ * entries so the EFD or error handling will remove the entry.
> >>   */
> >>-STATIC void
> >>+void
> >>  xfs_efi_item_unpin(
> >>  	struct xfs_log_item	*lip,
> >>  	int			remove)
> >>@@ -313,10 +314,6 @@ xfs_efi_release(xfs_efi_log_item_t	*efip
> >>  {
> >>  	ASSERT(atomic_read(&efip->efi_next_extent)>= nextents);
> >>  	if (atomic_sub_and_test(nextents,&efip->efi_next_extent)) {
> >>-		/* recovery needs us to drop the EFI reference, too */
> >>-		if (test_bit(XFS_EFI_RECOVERED,&efip->efi_flags))
> >>-			__xfs_efi_release(efip);
> >>-
> >>  		__xfs_efi_release(efip);
> >>  		/* efip may now have been freed, do not reference it again. */
> >>  	}
> >>Index: b/fs/xfs/xfs_log_recover.c
> >>===================================================================
> >>--- a/fs/xfs/xfs_log_recover.c
> >>+++ b/fs/xfs/xfs_log_recover.c
> >>@@ -3634,6 +3634,7 @@ xlog_recover_process_data(
> >>  /*
> >>   * Process an extent free intent item that was recovered from
> >>   * the log.  We need to free the extents that it describes.
> >>+ * The caller will release this and any following EFIs upon error.
> >>   */
> >>  STATIC int
> >>  xlog_recover_process_efi(
> >>@@ -3648,6 +3649,13 @@ xlog_recover_process_efi(
> >>  	xfs_fsblock_t		startblock_fsb;
> >>
> >>  	ASSERT(!test_bit(XFS_EFI_RECOVERED,&efip->efi_flags));
> >>+	set_bit(XFS_EFI_RECOVERED,&efip->efi_flags);
> >>+
> >>+	/*
> >>+	 * Decrement the EFI/EFD counter so the EFI is removed after
> >>+	 * processing the EFD or error handling in the caller.
> >>+	 */
> >>+	xfs_efi_item_unpin(&efip->efi_item, 0);
> >>
> >>  	/*
> >>  	 * First check the validity of the extents described by the
> >>@@ -3662,12 +3670,6 @@ xlog_recover_process_efi(
> >>  		    (extp->ext_len == 0) ||
> >>  		    (startblock_fsb>= mp->m_sb.sb_dblocks) ||
> >>  		    (extp->ext_len>= mp->m_sb.sb_agblocks)) {
> >>-			/*
> >>-			 * This will pull the EFI from the AIL and
> >>-			 * free the memory associated with it.
> >>-			 */
> >>-			set_bit(XFS_EFI_RECOVERED,&efip->efi_flags);
> >>-			xfs_efi_release(efip, efip->efi_format.efi_nextents);
> >>  			return XFS_ERROR(EIO);
> >>  		}
> >>  	}
> >>@@ -3687,7 +3689,6 @@ xlog_recover_process_efi(
> >>  					 extp->ext_len);
> >>  	}
> >>
> >>-	set_bit(XFS_EFI_RECOVERED,&efip->efi_flags);
> >>  	error = xfs_trans_commit(tp, 0);
> >>  	return error;
> >>
> >>@@ -3718,8 +3719,8 @@ STATIC int
> >>  xlog_recover_process_efis(
> >>  	struct xlog	*log)
> >>  {
> >>-	xfs_log_item_t		*lip;
> >>-	xfs_efi_log_item_t	*efip;
> >>+	struct xfs_log_item	*lip;
> >>+	struct xfs_efi_log_item	*efip;
> >>  	int			error = 0;
> >>  	struct xfs_ail_cursor	cur;
> >>  	struct xfs_ail		*ailp;
> >>@@ -3750,13 +3751,14 @@ xlog_recover_process_efis(
> >>  		}
> >>
> >>  		spin_unlock(&ailp->xa_lock);
> >>-		error = xlog_recover_process_efi(log->l_mp, efip);
> >>-		spin_lock(&ailp->xa_lock);
> >>+		/* Skip all EFIs after first EFI in error. */
> >>+		if (!error)
> >>+			error = xlog_recover_process_efi(log->l_mp, efip);
> >>  		if (error)
> >>-			goto out;
> >>+			xfs_efi_release(efip, efip->efi_format.efi_nextents);
> >
> >Hi Mark,
> >
> >If we hit the scenario where we start skipping EFIs after an error, is
> >the equivalent unpin() call from process_efi() not necessary on the
> >subsequent EFIs?
> >
> >Brian
> 
> yes, good catch. They will have to be decremented twice. something like:
> +		if (!error)
> +			error = xlog_recover_process_efi(log->l_mp, efip);
> +		else
> +			xfs_efi_item_unpin(&efip->efi_item, 0);
> +		if (error)
> ...
> 

Ok, looks reasonable to me. An extra sentence or two in the previous
comment to explain what's going on there would be nice as well. ;)

Brian

> --Mark

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xfs: remove efi from AIL in log recovery error
  2014-03-28 16:07       ` Brian Foster
@ 2014-03-28 16:21         ` Mark Tinguely
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Tinguely @ 2014-03-28 16:21 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On 03/28/14 11:07, Brian Foster wrote:
> On Fri, Mar 28, 2014 at 10:41:06AM -0500, Mark Tinguely wrote:
>> On 03/28/14 10:24, Brian Foster wrote:
>>> On Tue, Mar 25, 2014 at 03:06:34PM -0500, Mark Tinguely wrote:

...

>>> Hi Mark,
>>>
>>> If we hit the scenario where we start skipping EFIs after an error, is
>>> the equivalent unpin() call from process_efi() not necessary on the
>>> subsequent EFIs?
>>>
>>> Brian
>>
>> yes, good catch. They will have to be decremented twice. something like:
>> +		if (!error)
>> +			error = xlog_recover_process_efi(log->l_mp, efip);
>> +		else
>> +			xfs_efi_item_unpin(&efip->efi_item, 0);
>> +		if (error)
>> ...
>>
>
> Ok, looks reasonable to me. An extra sentence or two in the previous
> comment to explain what's going on there would be nice as well. ;)
>
> Brian

Probably will flip the if statement logic, but a comment is also a good 
idea. Thank-you for the feed back.

--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] xfs: remove efi from AIL in log recovery
  2014-03-25 20:06 ` [PATCH 1/2] xfs: remove efi from AIL in log recovery error Mark Tinguely
  2014-03-28 15:24   ` Brian Foster
@ 2014-03-31 20:25   ` Mark Tinguely
  2014-04-03 19:07     ` Dave Chinner
  1 sibling, 1 reply; 12+ messages in thread
From: Mark Tinguely @ 2014-03-31 20:25 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-fix-efi-in-log-recovery-error.patch --]
[-- Type: text/plain, Size: 5703 bytes --]

xlog_recover_process_efi{s}() functions are completing the
second half of xfs_bmap_finish() which frees extents. If this
operation fails, the EFI will stay on the AIL and prevents
the xfs_ail_push all_sync() from completing and the mount will
fail to unmount.

Rather than have a special log recovery flag XFS_EFI_RECOVERED
to decrement the EFI/EFD counter, call the same decrement function
from the log recovery routine that is called then the EFI is added
to the AIL from a log write.

Remove all other unprocessed EFIs from the log recovery AIL
when one is discovered in error.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
---
v2 per Brian's feedback:
  decrement the EFI counter again when removing EFI entries after the
  entry with the error.

  add comment in xlog_recover_process_efis() where we delete the EFIs.

v1:
Rewritten with suggestions from Dave.
Note: calling xfs_efi_item_unpin() seemed more explainatory than calling
      the helper __xfs_efi_release().

 fs/xfs/xfs_extfree_item.c |    9 +++------
 fs/xfs/xfs_log_recover.c  |   37 ++++++++++++++++++++++++-------------
 fs/xfs/xfs_trans.h        |    1 +
 3 files changed, 28 insertions(+), 19 deletions(-)

Index: b/fs/xfs/xfs_extfree_item.c
===================================================================
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -134,9 +134,10 @@ xfs_efi_item_pin(
  * remove the EFI it's because the transaction has been cancelled and by
  * definition that means the EFI cannot be in the AIL so remove it from the
  * transaction and free it.  Otherwise coordinate with xfs_efi_release()
- * to determine who gets to free the EFI.
+ * to determine who gets to free the EFI. Call from log recovery of EFI
+ * entries so the EFD or error handling will remove the entry.
  */
-STATIC void
+void
 xfs_efi_item_unpin(
 	struct xfs_log_item	*lip,
 	int			remove)
@@ -313,10 +314,6 @@ xfs_efi_release(xfs_efi_log_item_t	*efip
 {
 	ASSERT(atomic_read(&efip->efi_next_extent) >= nextents);
 	if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) {
-		/* recovery needs us to drop the EFI reference, too */
-		if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags))
-			__xfs_efi_release(efip);
-
 		__xfs_efi_release(efip);
 		/* efip may now have been freed, do not reference it again. */
 	}
Index: b/fs/xfs/xfs_log_recover.c
===================================================================
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3634,6 +3634,7 @@ xlog_recover_process_data(
 /*
  * Process an extent free intent item that was recovered from
  * the log.  We need to free the extents that it describes.
+ * The caller will release this and any following EFIs upon error.
  */
 STATIC int
 xlog_recover_process_efi(
@@ -3648,6 +3649,13 @@ xlog_recover_process_efi(
 	xfs_fsblock_t		startblock_fsb;
 
 	ASSERT(!test_bit(XFS_EFI_RECOVERED, &efip->efi_flags));
+	set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
+
+	/*
+	 * Decrement the EFI/EFD counter so the EFI is removed after
+	 * processing the EFD or error handling in the caller.
+	 */
+	xfs_efi_item_unpin(&efip->efi_item, 0);
 
 	/*
 	 * First check the validity of the extents described by the
@@ -3662,12 +3670,6 @@ xlog_recover_process_efi(
 		    (extp->ext_len == 0) ||
 		    (startblock_fsb >= mp->m_sb.sb_dblocks) ||
 		    (extp->ext_len >= mp->m_sb.sb_agblocks)) {
-			/*
-			 * This will pull the EFI from the AIL and
-			 * free the memory associated with it.
-			 */
-			set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
-			xfs_efi_release(efip, efip->efi_format.efi_nextents);
 			return XFS_ERROR(EIO);
 		}
 	}
@@ -3687,7 +3689,6 @@ xlog_recover_process_efi(
 					 extp->ext_len);
 	}
 
-	set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
 	error = xfs_trans_commit(tp, 0);
 	return error;
 
@@ -3718,8 +3719,8 @@ STATIC int
 xlog_recover_process_efis(
 	struct xlog	*log)
 {
-	xfs_log_item_t		*lip;
-	xfs_efi_log_item_t	*efip;
+	struct xfs_log_item	*lip;
+	struct xfs_efi_log_item	*efip;
 	int			error = 0;
 	struct xfs_ail_cursor	cur;
 	struct xfs_ail		*ailp;
@@ -3750,13 +3751,23 @@ xlog_recover_process_efis(
 		}
 
 		spin_unlock(&ailp->xa_lock);
-		error = xlog_recover_process_efi(log->l_mp, efip);
-		spin_lock(&ailp->xa_lock);
+		if (error) {
+			/*
+			 * An error happened while processing a previous
+			 * EFI entry. Since the extents are freed in order,
+			 * skip the remaining unprocessed EFIs. To do this
+			 * the EFI entry counter must be decremented once
+			 * here and the entry will be decremented and removed
+			 * with the following xfs_efi_release().
+			 */
+			xfs_efi_item_unpin(&efip->efi_item, 0);
+		} else
+			error = xlog_recover_process_efi(log->l_mp, efip);
 		if (error)
-			goto out;
+			xfs_efi_release(efip, efip->efi_format.efi_nextents);
+		spin_lock(&ailp->xa_lock);
 		lip = xfs_trans_ail_cursor_next(ailp, &cur);
 	}
-out:
 	xfs_trans_ail_cursor_done(ailp, &cur);
 	spin_unlock(&ailp->xa_lock);
 	return error;
Index: b/fs/xfs/xfs_trans.h
===================================================================
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -216,6 +216,7 @@ void		xfs_trans_ijoin(struct xfs_trans *
 void		xfs_trans_log_buf(xfs_trans_t *, struct xfs_buf *, uint, uint);
 void		xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint);
 struct xfs_efi_log_item	*xfs_trans_get_efi(xfs_trans_t *, uint);
+void		xfs_efi_item_unpin(struct xfs_log_item *, int);
 void		xfs_efi_release(struct xfs_efi_log_item *, uint);
 void		xfs_trans_log_efi_extent(xfs_trans_t *,
 					 struct xfs_efi_log_item *,


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] xfs: remove efi from AIL in log recovery
  2014-03-31 20:25   ` [PATCH v2 1/2] xfs: remove efi from AIL in log recovery Mark Tinguely
@ 2014-04-03 19:07     ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2014-04-03 19:07 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Mon, Mar 31, 2014 at 03:25:56PM -0500, Mark Tinguely wrote:
> xlog_recover_process_efi{s}() functions are completing the
> second half of xfs_bmap_finish() which frees extents. If this
> operation fails, the EFI will stay on the AIL and prevents
> the xfs_ail_push all_sync() from completing and the mount will
> fail to unmount.
> 
> Rather than have a special log recovery flag XFS_EFI_RECOVERED
> to decrement the EFI/EFD counter, call the same decrement function
> from the log recovery routine that is called then the EFI is added
> to the AIL from a log write.
....
> ===================================================================
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -3634,6 +3634,7 @@ xlog_recover_process_data(
>  /*
>   * Process an extent free intent item that was recovered from
>   * the log.  We need to free the extents that it describes.
> + * The caller will release this and any following EFIs upon error.
>   */
>  STATIC int
>  xlog_recover_process_efi(
> @@ -3648,6 +3649,13 @@ xlog_recover_process_efi(
>  	xfs_fsblock_t		startblock_fsb;
>  
>  	ASSERT(!test_bit(XFS_EFI_RECOVERED, &efip->efi_flags));
> +	set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
> +
> +	/*
> +	 * Decrement the EFI/EFD counter so the EFI is removed after
> +	 * processing the EFD or error handling in the caller.
> +	 */
> +	xfs_efi_item_unpin(&efip->efi_item, 0);

This should be done where the EFI is inserted into the AIL during
recovery, not where it is removed. That means "committed" EFIs in
the AIL have the same reference count values for both the normal
transaction case and the log recovery case. Brian already caught a
bug in your original patch because of this, so lets not leave
landmines if we can avoid it.

Also, the comment needs to explain why we don't need the reference
count (i.e. because it's a committed EFI and hence only the active
EFD can hold a reference against it now) and that on error the
caller is responsible for removing the EFI from the AIL and freeing
it.

Finally, I don't think there's any need for the XFS_EFI_RECOVERED
bit anymore, because we either process it (and remove it from the
AIL) or free it after removing it from the AIL on error and so it
won't ever get found in the AIL with that flag set....

> @@ -3750,13 +3751,23 @@ xlog_recover_process_efis(
>  		}
>  
>  		spin_unlock(&ailp->xa_lock);
> -		error = xlog_recover_process_efi(log->l_mp, efip);
> -		spin_lock(&ailp->xa_lock);
> +		if (error) {
> +			/*
> +			 * An error happened while processing a previous
> +			 * EFI entry. Since the extents are freed in order,
> +			 * skip the remaining unprocessed EFIs. To do this
> +			 * the EFI entry counter must be decremented once
> +			 * here and the entry will be decremented and removed
> +			 * with the following xfs_efi_release().
> +			 */
> +			xfs_efi_item_unpin(&efip->efi_item, 0);
> +		} else
> +			error = xlog_recover_process_efi(log->l_mp, efip);
>  		if (error)
> -			goto out;
> +			xfs_efi_release(efip, efip->efi_format.efi_nextents);
> +		spin_lock(&ailp->xa_lock);
>  		lip = xfs_trans_ail_cursor_next(ailp, &cur);

OK, so this now relies on xfs_efi_release() to remove the EFI from
the AIL. But the reason the unpin ugliness needs to be done here is
that we left a stale AIL reference on the EFI when we inserted it
into the AIL. That's another reason for moving that
xfs_efi_item_unpin() to the insert code.

Also, I'd prefer the large comment decribing the logic flow goes at
the head of the function (i.e. describing the overall logic flow of
the function) and that call should clean up this logic a lot.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown
  2014-03-25 20:06 ` [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown Mark Tinguely
@ 2014-04-03 19:34   ` Dave Chinner
  2014-04-03 20:48     ` Mark Tinguely
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2014-04-03 19:34 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Tue, Mar 25, 2014 at 03:06:35PM -0500, Mark Tinguely wrote:
> The extent free intention (EFI) and extent free done (EFD)
> log items are in separate transactions. It is possible that
> the EFI can be pushed to the AIL before a forced shutdown
> where it gets stuck for following reasons:
> 
>  No complete EFD in the transaction. This can happen if the
>  transaction fails to reserve space or the freeing the extent
>  fails in xfs_bmap_finish().
> 
>  EFD IOP Abort processing. If xfs_trans_cancel() is called with
>  an abort flag, or if the xfs_trans_commit() is called when the
>  file system is in forced shutdown or if the log buffer write fails,
>  then the EFD iop commands will not remove the EFI from the AIL.
> 
> If the EFI entry is left on the AIL, xfs_ail_push all_sync() will
> fail to complete, the unmount will hang, and a reboot is required
> to get the complete the unmount.
> 
> Do not free the EFI structure immediately on forced shutdowns, but
> instead use the IOP calls to match the EFI/EFD entries. A small
> wrinkle in the mix is the EFI is not automatically placed on the
> AIL by the IOP routines in the cases but may have made it to the
> AIL before the abort. We now have to check if the EFI is on the AIL
> in the abort IOP cases.
> 
> Signed-off-by: Mark Tinguely <tinguely@sgi.com>
> ---
> Dave, as I tried to explain, if the error in extent free happen early
> enough, the EFD is not usable and is not in the transaction to cause
> IOP commands to clean up the EFI. It too must be done manually.
>
> The "in the AIL" test must be done in the abort IOP commands.

I still don't see the reason behind these assertions. It just
doesn't make any sense to me that it can't be handled once the EFI
has been attached to the EFD, nor why checking AIL status in
__xfs_efi_release() does not work correctly...

>  fs/xfs/xfs_bmap_util.c    |   21 +++++++++++++++----
>  fs/xfs/xfs_extfree_item.c |   49 +++++++++++++++++++++++++++++++++-------------
>  fs/xfs/xfs_log_recover.c  |    3 +-
>  fs/xfs/xfs_trans.h        |    2 -
>  4 files changed, 56 insertions(+), 19 deletions(-)
> 
> Index: b/fs/xfs/xfs_bmap_util.c
> ===================================================================
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -116,12 +116,14 @@ xfs_bmap_finish(
>  
>  	error = xfs_trans_reserve(ntp, &tres, 0, 0);
>  	if (error)
> -		return error;
> +		goto error_return;
> +
>  	efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count);
>  	for (free = flist->xbf_first; free != NULL; free = next) {
>  		next = free->xbfi_next;
> -		if ((error = xfs_free_extent(ntp, free->xbfi_startblock,
> -				free->xbfi_blockcount))) {
> +		error = xfs_free_extent(ntp, free->xbfi_startblock,
> +				free->xbfi_blockcount);
> +		if (error) {
>  			/*
>  			 * The bmap free list will be cleaned up at a
>  			 * higher level.  The EFI will be canceled when
> @@ -136,13 +138,24 @@ xfs_bmap_finish(
>  						   (error == EFSCORRUPTED) ?
>  						   SHUTDOWN_CORRUPT_INCORE :
>  						   SHUTDOWN_META_IO_ERROR);
> -			return error;
> +			goto error_return;
>  		}
>  		xfs_trans_log_efd_extent(ntp, efd, free->xbfi_startblock,
>  			free->xbfi_blockcount);
>  		xfs_bmap_del_free(flist, NULL, free);
>  	}
>  	return 0;
> +
> + error_return:
> +	/*
> +	 * No EFD in the transaction matching the EFI can be used at this
> +	 * point Manually release the EFI and remove from AIL if necessary.
> +	 * If the EFI did not make it into the AIL, then the transaction
> +	 * cancel of the EFI will decrement the EFI/EFD counter and will not
> +	 * attempt to remove itself from the AIL.
> +	 */
> +	xfs_efi_release(efi, efi->efi_format.efi_nextents, 0);
> +	return error;

This is inconsistent from a high level logic point of view. The EFI
is attached to the EFD after a call to xfs_trans_get_efd() and so
cleanup on failure should always occur through the IOP* interfaces
(specifically xfs_efd_item_unlock() with an aborted log item).

The only case where a xfs_efi_release() call is required is the
xfs_trans_reserve() failure case, where nothing in the current
transaction context owns the reference on the EFI we are going to
pass to the EFD.

>  }
>  
>  int
> Index: b/fs/xfs/xfs_extfree_item.c
> ===================================================================
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -56,15 +56,22 @@ xfs_efi_item_free(
>   */
>  STATIC void
>  __xfs_efi_release(
> -	struct xfs_efi_log_item	*efip)
> +	struct xfs_efi_log_item	*efip,
> +	int			abort)
>  {
>  	struct xfs_ail		*ailp = efip->efi_item.li_ailp;
>  
>  	if (atomic_dec_and_test(&efip->efi_refcount)) {
>  		spin_lock(&ailp->xa_lock);
> -		/* xfs_trans_ail_delete() drops the AIL lock. */
> -		xfs_trans_ail_delete(ailp, &efip->efi_item,
> -				     SHUTDOWN_LOG_IO_ERROR);
> +		/*
> +		 * The EFI may not be on the AIL on abort.
> +		 * xfs_trans_ail_delete() drops the AIL lock.
> +		 */
> +		if (abort)
> +			spin_unlock(&ailp->xa_lock);
> +		else
> +			xfs_trans_ail_delete(ailp, &efip->efi_item,
> +					     SHUTDOWN_LOG_IO_ERROR);
>  		xfs_efi_item_free(efip);
>  	}
>  }

"Abort" is not necessary. If it's the last reference, and the EFI is
in the AIL, then it needs to be removed. i.e. the check shoul dbe
against XFS_LI_IN_AIL, not against a flag passed in by the caller.

> @@ -148,10 +155,10 @@ xfs_efi_item_unpin(
>  		ASSERT(!(lip->li_flags & XFS_LI_IN_AIL));
>  		if (lip->li_desc)
>  			xfs_trans_del_item(lip);
> -		xfs_efi_item_free(efip);
> +		__xfs_efi_release(efip, 1);
>  		return;
>  	}
> -	__xfs_efi_release(efip);
> +	__xfs_efi_release(efip, 0);
>  }
>  
>  /*
> @@ -173,8 +180,9 @@ STATIC void
>  xfs_efi_item_unlock(
>  	struct xfs_log_item	*lip)
>  {
> +	/* EFI will not be on the AIL if called on abort */

If that is true, then ASSERT() it rather than comment about it.

>  	if (lip->li_flags & XFS_LI_ABORTED)
> -		xfs_efi_item_free(EFI_ITEM(lip));
> +		__xfs_efi_release(EFI_ITEM(lip), 1);
>  }
>  
>  /*
> @@ -310,11 +318,12 @@ xfs_efi_copy_format(xfs_log_iovec_t *buf
>   */
>  void
>  xfs_efi_release(xfs_efi_log_item_t	*efip,
> -		uint			nextents)
> +		uint			nextents,
> +		int			abort)
>  {
>  	ASSERT(atomic_read(&efip->efi_next_extent) >= nextents);
>  	if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) {
> -		__xfs_efi_release(efip);
> +		__xfs_efi_release(efip, abort);
>  		/* efip may now have been freed, do not reference it again. */
>  	}
>  }
> @@ -417,8 +426,19 @@ STATIC void
>  xfs_efd_item_unlock(
>  	struct xfs_log_item	*lip)
>  {
> -	if (lip->li_flags & XFS_LI_ABORTED)
> -		xfs_efd_item_free(EFD_ITEM(lip));
> +	struct xfs_efd_log_item	*efdp = EFD_ITEM(lip);
> +
> +	if (!(lip->li_flags & XFS_LI_ABORTED))
> +		return;
> +
> +	/* Free the EFI when aborting a commit. The EFI will be either
> +	 * added to the AIL in a CIL push before this abort or unlocked
> +	 * before the EFD unlock. In either case we can check the AIL
> +	 * status now.
> +	 */

We are not freeing the EFI here - we are simply releasing the
reference the EFD holds on it because we are about to free the EFD.
Nothing else actually matters here - we don't care whether the EFI
is in the AIL or not, because that can be handled internally to
the xfs_efi_release() code....

> +	xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents,
> +			!(efdp->efd_efip->efi_item.li_flags & XFS_LI_IN_AIL));
> +	xfs_efd_item_free(efdp);
>  }
>  
>  /*
> @@ -434,14 +454,17 @@ xfs_efd_item_committed(
>  	xfs_lsn_t		lsn)
>  {
>  	struct xfs_efd_log_item	*efdp = EFD_ITEM(lip);
> +	int abort = 0;
>  
>  	/*
>  	 * If we got a log I/O error, it's always the case that the LR with the
>  	 * EFI got unpinned and freed before the EFD got aborted.
>  	 */
> -	if (!(lip->li_flags & XFS_LI_ABORTED))
> -		xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
> +	if (lip->li_flags & XFS_LI_ABORTED &&
> +	    !(efdp->efd_efip->efi_item.li_flags & XFS_LI_IN_AIL))
> +		abort = 1;
>  
> +	xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents, abort);

All this abort logic can go away if __xfs_efi_release()
handles the AIL state internally and all the EFD does is drop the
reference to the EFI it owns. This function simply becomes:

{
	struct xfs_efd_log_item *efdp = EFD_ITEM(lip);

	xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
	xfs_efd_item_free(efdp);
	return (xfs_lsn_t)-1;
}

No conditional behaviour, always does the same thing regardless of
caller context.


>  }
> Index: b/fs/xfs/xfs_log_recover.c
> ===================================================================
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -3754,8 +3754,9 @@ xlog_recover_process_efis(
>  		/* Skip all EFIs after first EFI in error. */
>  		if (!error)
>  			error = xlog_recover_process_efi(log->l_mp, efip);
> +		/* EFI will be on the AIL, so abort == 0 */
>  		if (error)
> -			xfs_efi_release(efip, efip->efi_format.efi_nextents);
> +			xfs_efi_release(efip, efip->efi_format.efi_nextents, 0);

Yet another place that has to know what the state of the EFI is to
handle it correctly....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown
  2014-04-03 19:34   ` Dave Chinner
@ 2014-04-03 20:48     ` Mark Tinguely
  2014-04-03 21:53       ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Tinguely @ 2014-04-03 20:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/03/14 14:34, Dave Chinner wrote:
> On Tue, Mar 25, 2014 at 03:06:35PM -0500, Mark Tinguely wrote:
>> The extent free intention (EFI) and extent free done (EFD)
>> log items are in separate transactions. It is possible that
>> the EFI can be pushed to the AIL before a forced shutdown
>> where it gets stuck for following reasons:
>>
>>   No complete EFD in the transaction. This can happen if the
>>   transaction fails to reserve space or the freeing the extent
>>   fails in xfs_bmap_finish().
>>
>>   EFD IOP Abort processing. If xfs_trans_cancel() is called with
>>   an abort flag, or if the xfs_trans_commit() is called when the
>>   file system is in forced shutdown or if the log buffer write fails,
>>   then the EFD iop commands will not remove the EFI from the AIL.
>>
>> If the EFI entry is left on the AIL, xfs_ail_push all_sync() will
>> fail to complete, the unmount will hang, and a reboot is required
>> to get the complete the unmount.
>>
>> Do not free the EFI structure immediately on forced shutdowns, but
>> instead use the IOP calls to match the EFI/EFD entries. A small
>> wrinkle in the mix is the EFI is not automatically placed on the
>> AIL by the IOP routines in the cases but may have made it to the
>> AIL before the abort. We now have to check if the EFI is on the AIL
>> in the abort IOP cases.
>>
>> Signed-off-by: Mark Tinguely<tinguely@sgi.com>
>> ---
>> Dave, as I tried to explain, if the error in extent free happen early
>> enough, the EFD is not usable and is not in the transaction to cause
>> IOP commands to clean up the EFI. It too must be done manually.
>>
>> The "in the AIL" test must be done in the abort IOP commands.
>
> I still don't see the reason behind these assertions. It just
> doesn't make any sense to me that it can't be handled once the EFI
> has been attached to the EFD, nor why checking AIL status in
> __xfs_efi_release() does not work correctly...
>
>>   fs/xfs/xfs_bmap_util.c    |   21 +++++++++++++++----
>>   fs/xfs/xfs_extfree_item.c |   49 +++++++++++++++++++++++++++++++++-------------
>>   fs/xfs/xfs_log_recover.c  |    3 +-
>>   fs/xfs/xfs_trans.h        |    2 -
>>   4 files changed, 56 insertions(+), 19 deletions(-)
>>
>> Index: b/fs/xfs/xfs_bmap_util.c
>> ===================================================================
>> --- a/fs/xfs/xfs_bmap_util.c
>> +++ b/fs/xfs/xfs_bmap_util.c
>> @@ -116,12 +116,14 @@ xfs_bmap_finish(
>>
>>   	error = xfs_trans_reserve(ntp,&tres, 0, 0);
>>   	if (error)
>> -		return error;
>> +		goto error_return;
>> +
>>   	efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count);
>>   	for (free = flist->xbf_first; free != NULL; free = next) {
>>   		next = free->xbfi_next;
>> -		if ((error = xfs_free_extent(ntp, free->xbfi_startblock,
>> -				free->xbfi_blockcount))) {
>> +		error = xfs_free_extent(ntp, free->xbfi_startblock,
>> +				free->xbfi_blockcount);
>> +		if (error) {
>>   			/*
>>   			 * The bmap free list will be cleaned up at a
>>   			 * higher level.  The EFI will be canceled when
>> @@ -136,13 +138,24 @@ xfs_bmap_finish(
>>   						   (error == EFSCORRUPTED) ?
>>   						   SHUTDOWN_CORRUPT_INCORE :
>>   						   SHUTDOWN_META_IO_ERROR);
>> -			return error;
>> +			goto error_return;
>>   		}
>>   		xfs_trans_log_efd_extent(ntp, efd, free->xbfi_startblock,
>>   			free->xbfi_blockcount);

^^^^^^
  here is where the EFD is placed in the transaction. It cannot be placed
  in the transaction earlier because it is not complete.

>>   		xfs_bmap_del_free(flist, NULL, free);
>>   	}
>>   	return 0;
>> +
>> + error_return:
>> +	/*
>> +	 * No EFD in the transaction matching the EFI can be used at this
>> +	 * point Manually release the EFI and remove from AIL if necessary.
>> +	 * If the EFI did not make it into the AIL, then the transaction
>> +	 * cancel of the EFI will decrement the EFI/EFD counter and will not
>> +	 * attempt to remove itself from the AIL.
>> +	 */
>> +	xfs_efi_release(efi, efi->efi_format.efi_nextents, 0);
>> +	return error;
>
> This is inconsistent from a high level logic point of view. The EFI
> is attached to the EFD after a call to xfs_trans_get_efd() and so
> cleanup on failure should always occur through the IOP* interfaces
> (specifically xfs_efd_item_unlock() with an aborted log item).
>
> The only case where a xfs_efi_release() call is required is the
> xfs_trans_reserve() failure case, where nothing in the current
> transaction context owns the reference on the EFI we are going to
> pass to the EFD.

The EFD is not in the transaction in the failed xfs_extent_free failure 
case and there is not efd IOP call.


>
>>   }
>>
>>   int
>> Index: b/fs/xfs/xfs_extfree_item.c
>> ===================================================================
>> --- a/fs/xfs/xfs_extfree_item.c
>> +++ b/fs/xfs/xfs_extfree_item.c
>> @@ -56,15 +56,22 @@ xfs_efi_item_free(
>>    */
>>   STATIC void
>>   __xfs_efi_release(
>> -	struct xfs_efi_log_item	*efip)
>> +	struct xfs_efi_log_item	*efip,
>> +	int			abort)
>>   {
>>   	struct xfs_ail		*ailp = efip->efi_item.li_ailp;
>>
>>   	if (atomic_dec_and_test(&efip->efi_refcount)) {
>>   		spin_lock(&ailp->xa_lock);
>> -		/* xfs_trans_ail_delete() drops the AIL lock. */
>> -		xfs_trans_ail_delete(ailp,&efip->efi_item,
>> -				     SHUTDOWN_LOG_IO_ERROR);
>> +		/*
>> +		 * The EFI may not be on the AIL on abort.
>> +		 * xfs_trans_ail_delete() drops the AIL lock.
>> +		 */
>> +		if (abort)
>> +			spin_unlock(&ailp->xa_lock);
>> +		else
>> +			xfs_trans_ail_delete(ailp,&efip->efi_item,
>> +					     SHUTDOWN_LOG_IO_ERROR);
>>   		xfs_efi_item_free(efip);
>>   	}
>>   }
>
> "Abort" is not necessary. If it's the last reference, and the EFI is
> in the AIL, then it needs to be removed. i.e. the check shoul dbe
> against XFS_LI_IN_AIL, not against a flag passed in by the caller.

You are mistaken. It may or may not be in the AIL.

>
>> @@ -148,10 +155,10 @@ xfs_efi_item_unpin(
>>   		ASSERT(!(lip->li_flags&  XFS_LI_IN_AIL));
>>   		if (lip->li_desc)
>>   			xfs_trans_del_item(lip);
>> -		xfs_efi_item_free(efip);
>> +		__xfs_efi_release(efip, 1);
>>   		return;
>>   	}
>> -	__xfs_efi_release(efip);
>> +	__xfs_efi_release(efip, 0);
>>   }
>>
>>   /*
>> @@ -173,8 +180,9 @@ STATIC void
>>   xfs_efi_item_unlock(
>>   	struct xfs_log_item	*lip)
>>   {
>> +	/* EFI will not be on the AIL if called on abort */
>
> If that is true, then ASSERT() it rather than comment about it.
>
>>   	if (lip->li_flags&  XFS_LI_ABORTED)
>> -		xfs_efi_item_free(EFI_ITEM(lip));
>> +		__xfs_efi_release(EFI_ITEM(lip), 1);
>>   }
>>
>>   /*
>> @@ -310,11 +318,12 @@ xfs_efi_copy_format(xfs_log_iovec_t *buf
>>    */
>>   void
>>   xfs_efi_release(xfs_efi_log_item_t	*efip,
>> -		uint			nextents)
>> +		uint			nextents,
>> +		int			abort)
>>   {
>>   	ASSERT(atomic_read(&efip->efi_next_extent)>= nextents);
>>   	if (atomic_sub_and_test(nextents,&efip->efi_next_extent)) {
>> -		__xfs_efi_release(efip);
>> +		__xfs_efi_release(efip, abort);
>>   		/* efip may now have been freed, do not reference it again. */
>>   	}
>>   }
>> @@ -417,8 +426,19 @@ STATIC void
>>   xfs_efd_item_unlock(
>>   	struct xfs_log_item	*lip)
>>   {
>> -	if (lip->li_flags&  XFS_LI_ABORTED)
>> -		xfs_efd_item_free(EFD_ITEM(lip));
>> +	struct xfs_efd_log_item	*efdp = EFD_ITEM(lip);
>> +
>> +	if (!(lip->li_flags&  XFS_LI_ABORTED))
>> +		return;
>> +
>> +	/* Free the EFI when aborting a commit. The EFI will be either
>> +	 * added to the AIL in a CIL push before this abort or unlocked
>> +	 * before the EFD unlock. In either case we can check the AIL
>> +	 * status now.
>> +	 */
>
> We are not freeing the EFI here - we are simply releasing the
> reference the EFD holds on it because we are about to free the EFD.
> Nothing else actually matters here - we don't care whether the EFI
> is in the AIL or not, because that can be handled internally to
> the xfs_efi_release() code....

Again. The EFI is in the previous commit. It could have made it to the 
AIL or is in the same abort.


>
>> +	xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents,
>> +			!(efdp->efd_efip->efi_item.li_flags&  XFS_LI_IN_AIL));
>> +	xfs_efd_item_free(efdp);
>>   }
>>   /*
>> @@ -434,14 +454,17 @@ xfs_efd_item_committed(
>>   	xfs_lsn_t		lsn)
>>   {
>>   	struct xfs_efd_log_item	*efdp = EFD_ITEM(lip);
>> +	int abort = 0;
>>
>>   	/*
>>   	 * If we got a log I/O error, it's always the case that the LR with the
>>   	 * EFI got unpinned and freed before the EFD got aborted.
>>   	 */
>> -	if (!(lip->li_flags&  XFS_LI_ABORTED))
>> -		xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
>> +	if (lip->li_flags & XFS_LI_ABORTED &&
>> +	    !(efdp->efd_efip->efi_item.li_flags&  XFS_LI_IN_AIL))
>> +		abort = 1;
>>
>> +	xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents, abort);
>
> All this abort logic can go away if __xfs_efi_release()
> handles the AIL state internally and all the EFD does is drop the
> reference to the EFI it owns. This function simply becomes:

The EFI in the previous commit could have made it to the AIL or not.


> {
> 	struct xfs_efd_log_item *efdp = EFD_ITEM(lip);
>
> 	xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
> 	xfs_efd_item_free(efdp);
> 	return (xfs_lsn_t)-1;
> }
>
> No conditional behaviour, always does the same thing regardless of
> caller context.

I disagree. It may or may not be in the AIL at this point.


>
>>   }
>> Index: b/fs/xfs/xfs_log_recover.c
>> ===================================================================
>> --- a/fs/xfs/xfs_log_recover.c
>> +++ b/fs/xfs/xfs_log_recover.c
>> @@ -3754,8 +3754,9 @@ xlog_recover_process_efis(
>>   		/* Skip all EFIs after first EFI in error. */
>>   		if (!error)
>>   			error = xlog_recover_process_efi(log->l_mp, efip);
>> +		/* EFI will be on the AIL, so abort == 0 */
>>   		if (error)
>> -			xfs_efi_release(efip, efip->efi_format.efi_nextents);
>> +			xfs_efi_release(efip, efip->efi_format.efi_nextents, 0);
>
> Yet another place that has to know what the state of the EFI is to
> handle it correctly....

We know the EFI is in the AIL here.


--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown
  2014-04-03 20:48     ` Mark Tinguely
@ 2014-04-03 21:53       ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2014-04-03 21:53 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Thu, Apr 03, 2014 at 03:48:40PM -0500, Mark Tinguely wrote:
> On 04/03/14 14:34, Dave Chinner wrote:
> >On Tue, Mar 25, 2014 at 03:06:35PM -0500, Mark Tinguely wrote:
> >>The extent free intention (EFI) and extent free done (EFD)
> >>log items are in separate transactions. It is possible that
> >>the EFI can be pushed to the AIL before a forced shutdown
> >>where it gets stuck for following reasons:
> >>
> >>  No complete EFD in the transaction. This can happen if the
> >>  transaction fails to reserve space or the freeing the extent
> >>  fails in xfs_bmap_finish().
> >>
> >>  EFD IOP Abort processing. If xfs_trans_cancel() is called with
> >>  an abort flag, or if the xfs_trans_commit() is called when the
> >>  file system is in forced shutdown or if the log buffer write fails,
> >>  then the EFD iop commands will not remove the EFI from the AIL.
> >>
> >>If the EFI entry is left on the AIL, xfs_ail_push all_sync() will
> >>fail to complete, the unmount will hang, and a reboot is required
> >>to get the complete the unmount.
> >>
> >>Do not free the EFI structure immediately on forced shutdowns, but
> >>instead use the IOP calls to match the EFI/EFD entries. A small
> >>wrinkle in the mix is the EFI is not automatically placed on the
> >>AIL by the IOP routines in the cases but may have made it to the
> >>AIL before the abort. We now have to check if the EFI is on the AIL
> >>in the abort IOP cases.
> >>
> >>Signed-off-by: Mark Tinguely<tinguely@sgi.com>
> >>---
> >>Dave, as I tried to explain, if the error in extent free happen early
> >>enough, the EFD is not usable and is not in the transaction to cause
> >>IOP commands to clean up the EFI. It too must be done manually.
> >>
> >>The "in the AIL" test must be done in the abort IOP commands.
> >
> >I still don't see the reason behind these assertions. It just
> >doesn't make any sense to me that it can't be handled once the EFI
> >has been attached to the EFD, nor why checking AIL status in
> >__xfs_efi_release() does not work correctly...
> >
> >>  fs/xfs/xfs_bmap_util.c    |   21 +++++++++++++++----
> >>  fs/xfs/xfs_extfree_item.c |   49 +++++++++++++++++++++++++++++++++-------------
> >>  fs/xfs/xfs_log_recover.c  |    3 +-
> >>  fs/xfs/xfs_trans.h        |    2 -
> >>  4 files changed, 56 insertions(+), 19 deletions(-)
> >>
> >>Index: b/fs/xfs/xfs_bmap_util.c
> >>===================================================================
> >>--- a/fs/xfs/xfs_bmap_util.c
> >>+++ b/fs/xfs/xfs_bmap_util.c
> >>@@ -116,12 +116,14 @@ xfs_bmap_finish(
> >>
> >>  	error = xfs_trans_reserve(ntp,&tres, 0, 0);
> >>  	if (error)
> >>-		return error;
> >>+		goto error_return;
> >>+
> >>  	efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count);
> >>  	for (free = flist->xbf_first; free != NULL; free = next) {
> >>  		next = free->xbfi_next;
> >>-		if ((error = xfs_free_extent(ntp, free->xbfi_startblock,
> >>-				free->xbfi_blockcount))) {
> >>+		error = xfs_free_extent(ntp, free->xbfi_startblock,
> >>+				free->xbfi_blockcount);
> >>+		if (error) {
> >>  			/*
> >>  			 * The bmap free list will be cleaned up at a
> >>  			 * higher level.  The EFI will be canceled when
> >>@@ -136,13 +138,24 @@ xfs_bmap_finish(
> >>  						   (error == EFSCORRUPTED) ?
> >>  						   SHUTDOWN_CORRUPT_INCORE :
> >>  						   SHUTDOWN_META_IO_ERROR);
> >>-			return error;
> >>+			goto error_return;
> >>  		}
> >>  		xfs_trans_log_efd_extent(ntp, efd, free->xbfi_startblock,
> >>  			free->xbfi_blockcount);
> 
> ^^^^^^
>  here is where the EFD is placed in the transaction. It cannot be placed
>  in the transaction earlier because it is not complete.

No, that's wrong. The EFD is linked to the transaction in
xfs_trans_get_efd() where it calls xfs_trans_add_item(). The above
call is where it is logged (i.e. marked dirty), not where it is
added to the transaction.

> >This is inconsistent from a high level logic point of view. The EFI
> >is attached to the EFD after a call to xfs_trans_get_efd() and so
> >cleanup on failure should always occur through the IOP* interfaces
> >(specifically xfs_efd_item_unlock() with an aborted log item).
> >
> >The only case where a xfs_efi_release() call is required is the
> >xfs_trans_reserve() failure case, where nothing in the current
> >transaction context owns the reference on the EFI we are going to
> >pass to the EFD.
> 
> The EFD is not in the transaction in the failed xfs_extent_free
> failure case and there is not efd IOP call.

Yes it is. It may no be dirty, though.

> >>--- a/fs/xfs/xfs_extfree_item.c
> >>+++ b/fs/xfs/xfs_extfree_item.c
> >>@@ -56,15 +56,22 @@ xfs_efi_item_free(
> >>   */
> >>  STATIC void
> >>  __xfs_efi_release(
> >>-	struct xfs_efi_log_item	*efip)
> >>+	struct xfs_efi_log_item	*efip,
> >>+	int			abort)
> >>  {
> >>  	struct xfs_ail		*ailp = efip->efi_item.li_ailp;
> >>
> >>  	if (atomic_dec_and_test(&efip->efi_refcount)) {
> >>  		spin_lock(&ailp->xa_lock);
> >>-		/* xfs_trans_ail_delete() drops the AIL lock. */
> >>-		xfs_trans_ail_delete(ailp,&efip->efi_item,
> >>-				     SHUTDOWN_LOG_IO_ERROR);
> >>+		/*
> >>+		 * The EFI may not be on the AIL on abort.
> >>+		 * xfs_trans_ail_delete() drops the AIL lock.
> >>+		 */
> >>+		if (abort)
> >>+			spin_unlock(&ailp->xa_lock);
> >>+		else
> >>+			xfs_trans_ail_delete(ailp,&efip->efi_item,
> >>+					     SHUTDOWN_LOG_IO_ERROR);
> >>  		xfs_efi_item_free(efip);
> >>  	}
> >>  }
> >
> >"Abort" is not necessary. If it's the last reference, and the EFI is
> >in the AIL, then it needs to be removed. i.e. the check shoul dbe
> >against XFS_LI_IN_AIL, not against a flag passed in by the caller.
> 
> You are mistaken. It may or may not be in the AIL.

No, I'm not mistaken - I'm talking about using a different
techinique to determine whether we need to remove the EFI from the
AIL or not.

That is, we have a flag on *every log item* that tells us whether
the item is in the AIL. It is guaranteed to be correct, and we use
it in many places to determine if the object we are about to free is
in the AIL and hence needs to be removed first. IOWs, we can use
introspection to remove the EFI from the AIL when the last reference
goes away.  The code is simply:

	if (atomic_dec_and_test(&efip->efi_refcount)) {
		spin_lock(&ailp->xa_lock);
		if (efip->efi_item.li_flags & XFS_LI_IN_AIL) {
			xfs_trans_ail_delete(ailp,&efip->efi_item,
					     SHUTDOWN_LOG_IO_ERROR);
		}
		spin_unlock(&ailp->xa_lock);
		xfs_efi_item_free(efip);
	}

No abort flag is needed.

Other examples of this exact same check being done in error handling
paths for various log item are xfs_iflush_abort(),
xfs_buf_item_unlock() and xfs_qm_dqflush().

> >>  	if (atomic_sub_and_test(nextents,&efip->efi_next_extent)) {
> >>-		__xfs_efi_release(efip);
> >>+		__xfs_efi_release(efip, abort);
> >>  		/* efip may now have been freed, do not reference it again. */
> >>  	}
> >>  }
> >>@@ -417,8 +426,19 @@ STATIC void
> >>  xfs_efd_item_unlock(
> >>  	struct xfs_log_item	*lip)
> >>  {
> >>-	if (lip->li_flags&  XFS_LI_ABORTED)
> >>-		xfs_efd_item_free(EFD_ITEM(lip));
> >>+	struct xfs_efd_log_item	*efdp = EFD_ITEM(lip);
> >>+
> >>+	if (!(lip->li_flags&  XFS_LI_ABORTED))
> >>+		return;

BTW, your email program is still mangling quoted whitespace in the
code. Can you please fix it?

> >>+
> >>+	/* Free the EFI when aborting a commit. The EFI will be either
> >>+	 * added to the AIL in a CIL push before this abort or unlocked
> >>+	 * before the EFD unlock. In either case we can check the AIL
> >>+	 * status now.
> >>+	 */
> >
> >We are not freeing the EFI here - we are simply releasing the
> >reference the EFD holds on it because we are about to free the EFD.
> >Nothing else actually matters here - we don't care whether the EFI
> >is in the AIL or not, because that can be handled internally to
> >the xfs_efi_release() code....
> 
> Again. The EFI is in the previous commit. It could have made it to
> the AIL or is in the same abort.

You're still missing the point. It doesn't matter where the EFI is -
what matters is that we are freeing the object that holds a
reference to the EFI and so we have to drop it.

The point of using reference counting and introspection that no
caller needs to care about the internal state of the EFI, just that
they hold a reference that needs to be dropped. And that means
simpler code and that future modifications are much less likely to
screw up the EFI error handling.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-04-03 21:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-25 20:06 [PATCH 0/2] xfs: Clean the EFI on errors series Mark Tinguely
2014-03-25 20:06 ` [PATCH 1/2] xfs: remove efi from AIL in log recovery error Mark Tinguely
2014-03-28 15:24   ` Brian Foster
2014-03-28 15:41     ` Mark Tinguely
2014-03-28 16:07       ` Brian Foster
2014-03-28 16:21         ` Mark Tinguely
2014-03-31 20:25   ` [PATCH v2 1/2] xfs: remove efi from AIL in log recovery Mark Tinguely
2014-04-03 19:07     ` Dave Chinner
2014-03-25 20:06 ` [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown Mark Tinguely
2014-04-03 19:34   ` Dave Chinner
2014-04-03 20:48     ` Mark Tinguely
2014-04-03 21:53       ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).