[PATCH 0/2 V3] Resubmit items failed during writeback

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2 V3] Resubmit items failed during writeback
@ 2017-06-08 14:00 Carlos Maiolino
  2017-06-08 14:00 ` [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure Carlos Maiolino
  2017-06-08 14:00 ` [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback Carlos Maiolino
  0 siblings, 2 replies; 9+ messages in thread
From: Carlos Maiolino @ 2017-06-08 14:00 UTC (permalink / raw)
  To: linux-xfs

Hi,

this is the 3rd revision of this patchset, based on the discussion in the
previous version, hoping to have fixed all the issues pointed there :)

I've removed the atomic handling of the xfs_log_item flags once it has been
decided to convert the usage to atomic handling after the patchset if accepted,
so, I'll add it to another patchset together with the dquot fix for the items
resubmission.

More details on each patch.

Cheers

-- 
2.9.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure
  2017-06-08 14:00 [PATCH 0/2 V3] Resubmit items failed during writeback Carlos Maiolino
@ 2017-06-08 14:00 ` Carlos Maiolino
  2017-06-08 16:43   ` Brian Foster
  2017-06-08 14:00 ` [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback Carlos Maiolino
  1 sibling, 1 reply; 9+ messages in thread
From: Carlos Maiolino @ 2017-06-08 14:00 UTC (permalink / raw)
  To: linux-xfs

With the current code, XFS never re-submit a failed buffer for IO,
because the failed item in the buffer is kept in the flush locked state
forever.

To be able to resubmit an log item for IO, we need a way to mark an item
as failed, if, for any reason the buffer which the item belonged to
failed during writeback.

Add a new log item callback to be used after an IO completion failure
and make the needed clean ups.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
---

V2:
	- Update commit log to include a better description of why this
	  patch is needed and fix spelling mistakes
	- Move xfs_buf_do_callbacks_fail() call into
	  xfs_buf_iodone_callback_error, so the callbacks can be executed
	  before the buffer is released, and only after it has been
	  retried once

V3:
	- fix some loops according to hch suggestion
	- whitespace cleanup

 fs/xfs/xfs_buf_item.c | 23 ++++++++++++++++++++++-
 fs/xfs/xfs_trans.h    |  7 +++++--
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 0306168..523b7a4 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -29,6 +29,7 @@
 #include "xfs_error.h"
 #include "xfs_trace.h"
 #include "xfs_log.h"
+#include "xfs_inode.h"
 
 
 kmem_zone_t	*xfs_buf_item_zone;
@@ -1051,6 +1052,20 @@ xfs_buf_do_callbacks(
 	}
 }
 
+STATIC void
+xfs_buf_do_callbacks_fail(
+	struct xfs_buf		*bp)
+{
+	struct xfs_log_item	*lip, *next;
+	unsigned int		bflags = bp->b_flags;
+
+	for (lip = bp->b_fspriv; lip; lip = next) {
+		next = lip->li_bio_list;
+		if (lip->li_ops->iop_error)
+			lip->li_ops->iop_error(lip, bp, bflags);
+	}
+}
+
 static bool
 xfs_buf_iodone_callback_error(
 	struct xfs_buf		*bp)
@@ -1120,8 +1135,14 @@ xfs_buf_iodone_callback_error(
 	if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount)
 		goto permanent_error;
 
-	/* still a transient error, higher layers will retry */
+	/*
+	 * still a transient error, run IO completion failure callbacks and
+	 * let the higher layers retry the buffer.
+	 * */
 	xfs_buf_ioerror(bp, 0);
+
+	/* run failure callbacks before releasing buffer */
+	xfs_buf_do_callbacks_fail(bp);
 	xfs_buf_relse(bp);
 	return true;
 
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index a07acbf..3486517 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -64,11 +64,13 @@ typedef struct xfs_log_item {
 } xfs_log_item_t;
 
 #define	XFS_LI_IN_AIL	0x1
-#define XFS_LI_ABORTED	0x2
+#define	XFS_LI_ABORTED	0x2
+#define	XFS_LI_FAILED	0x4
 
 #define XFS_LI_FLAGS \
 	{ XFS_LI_IN_AIL,	"IN_AIL" }, \
-	{ XFS_LI_ABORTED,	"ABORTED" }
+	{ XFS_LI_ABORTED,	"ABORTED" }, \
+	{ XFS_LI_FAILED,	"FAILED" }
 
 struct xfs_item_ops {
 	void (*iop_size)(xfs_log_item_t *, int *, int *);
@@ -79,6 +81,7 @@ struct xfs_item_ops {
 	void (*iop_unlock)(xfs_log_item_t *);
 	xfs_lsn_t (*iop_committed)(xfs_log_item_t *, xfs_lsn_t);
 	void (*iop_committing)(xfs_log_item_t *, xfs_lsn_t);
+	void (*iop_error)(xfs_log_item_t *, xfs_buf_t *, unsigned int bflags);
 };
 
 void	xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item,
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure
  2017-06-08 14:00 ` [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure Carlos Maiolino
@ 2017-06-08 16:43   ` Brian Foster
  2017-06-14 10:15     ` Carlos Maiolino
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Foster @ 2017-06-08 16:43 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: linux-xfs

On Thu, Jun 08, 2017 at 04:00:13PM +0200, Carlos Maiolino wrote:
> With the current code, XFS never re-submit a failed buffer for IO,
> because the failed item in the buffer is kept in the flush locked state
> forever.
> 
> To be able to resubmit an log item for IO, we need a way to mark an item
> as failed, if, for any reason the buffer which the item belonged to
> failed during writeback.
> 
> Add a new log item callback to be used after an IO completion failure
> and make the needed clean ups.
> 
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
> 
> V2:
> 	- Update commit log to include a better description of why this
> 	  patch is needed and fix spelling mistakes
> 	- Move xfs_buf_do_callbacks_fail() call into
> 	  xfs_buf_iodone_callback_error, so the callbacks can be executed
> 	  before the buffer is released, and only after it has been
> 	  retried once
> 
> V3:
> 	- fix some loops according to hch suggestion
> 	- whitespace cleanup
> 
>  fs/xfs/xfs_buf_item.c | 23 ++++++++++++++++++++++-
>  fs/xfs/xfs_trans.h    |  7 +++++--
>  2 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 0306168..523b7a4 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -29,6 +29,7 @@
>  #include "xfs_error.h"
>  #include "xfs_trace.h"
>  #include "xfs_log.h"
> +#include "xfs_inode.h"
>  
>  
>  kmem_zone_t	*xfs_buf_item_zone;
> @@ -1051,6 +1052,20 @@ xfs_buf_do_callbacks(
>  	}
>  }
>  
> +STATIC void
> +xfs_buf_do_callbacks_fail(
> +	struct xfs_buf		*bp)
> +{
> +	struct xfs_log_item	*lip, *next;
> +	unsigned int		bflags = bp->b_flags;
> +
> +	for (lip = bp->b_fspriv; lip; lip = next) {
> +		next = lip->li_bio_list;
> +		if (lip->li_ops->iop_error)
> +			lip->li_ops->iop_error(lip, bp, bflags);

There's still no need to pass bflags here, particularly since we pass
along the buffer already. (I'm still questioning the need for
->iop_error(), but I'll leave that as a final review question once
everything else is in order. :P)

> +	}
> +}
> +
>  static bool
>  xfs_buf_iodone_callback_error(
>  	struct xfs_buf		*bp)
> @@ -1120,8 +1135,14 @@ xfs_buf_iodone_callback_error(
>  	if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount)
>  		goto permanent_error;
>  
> -	/* still a transient error, higher layers will retry */
> +	/*
> +	 * still a transient error, run IO completion failure callbacks and
> +	 * let the higher layers retry the buffer.
> +	 * */
>  	xfs_buf_ioerror(bp, 0);
> +
> +	/* run failure callbacks before releasing buffer */
> +	xfs_buf_do_callbacks_fail(bp);

I think in principle it might be more appropriate to invoke the
callbacks before we reset the I/O error. That way the failure callbacks
see the buffer in the error state, then we reset the error and release
it back to the AIL (so to speak).

Otherwise the rest looks sane to me.

Brian

>  	xfs_buf_relse(bp);
>  	return true;
>  
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index a07acbf..3486517 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -64,11 +64,13 @@ typedef struct xfs_log_item {
>  } xfs_log_item_t;
>  
>  #define	XFS_LI_IN_AIL	0x1
> -#define XFS_LI_ABORTED	0x2
> +#define	XFS_LI_ABORTED	0x2
> +#define	XFS_LI_FAILED	0x4
>  
>  #define XFS_LI_FLAGS \
>  	{ XFS_LI_IN_AIL,	"IN_AIL" }, \
> -	{ XFS_LI_ABORTED,	"ABORTED" }
> +	{ XFS_LI_ABORTED,	"ABORTED" }, \
> +	{ XFS_LI_FAILED,	"FAILED" }
>  
>  struct xfs_item_ops {
>  	void (*iop_size)(xfs_log_item_t *, int *, int *);
> @@ -79,6 +81,7 @@ struct xfs_item_ops {
>  	void (*iop_unlock)(xfs_log_item_t *);
>  	xfs_lsn_t (*iop_committed)(xfs_log_item_t *, xfs_lsn_t);
>  	void (*iop_committing)(xfs_log_item_t *, xfs_lsn_t);
> +	void (*iop_error)(xfs_log_item_t *, xfs_buf_t *, unsigned int bflags);
>  };
>  
>  void	xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item,
> -- 
> 2.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure
  2017-06-08 16:43   ` Brian Foster
@ 2017-06-14 10:15     ` Carlos Maiolino
  0 siblings, 0 replies; 9+ messages in thread
From: Carlos Maiolino @ 2017-06-14 10:15 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

> > +STATIC void
> > +xfs_buf_do_callbacks_fail(
> > +	struct xfs_buf		*bp)
> > +{
> > +	struct xfs_log_item	*lip, *next;
> > +	unsigned int		bflags = bp->b_flags;
> > +
> > +	for (lip = bp->b_fspriv; lip; lip = next) {
> > +		next = lip->li_bio_list;
> > +		if (lip->li_ops->iop_error)
> > +			lip->li_ops->iop_error(lip, bp, bflags);
> 
> There's still no need to pass bflags here, particularly since we pass
> along the buffer already. (I'm still questioning the need for
> ->iop_error(), but I'll leave that as a final review question once
> everything else is in order. :P)
> 

Agreed regarding bflags, removed for V4.

Regarding iop_error(), well, I'm not discussing it anymore :)

> > +	}
> > +}
> > +
> >  static bool
> >  xfs_buf_iodone_callback_error(
> >  	struct xfs_buf		*bp)
> > @@ -1120,8 +1135,14 @@ xfs_buf_iodone_callback_error(
> >  	if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount)
> >  		goto permanent_error;
> >  
> > -	/* still a transient error, higher layers will retry */
> > +	/*
> > +	 * still a transient error, run IO completion failure callbacks and
> > +	 * let the higher layers retry the buffer.
> > +	 * */
> >  	xfs_buf_ioerror(bp, 0);
> > +
> > +	/* run failure callbacks before releasing buffer */
> > +	xfs_buf_do_callbacks_fail(bp);
> 
> I think in principle it might be more appropriate to invoke the
> callbacks before we reset the I/O error. That way the failure callbacks
> see the buffer in the error state, then we reset the error and release
> it back to the AIL (so to speak).
> 

Agreed, on V4.


> Otherwise the rest looks sane to me.
> 

Thanks for the review
> Brian
> 
> >  	xfs_buf_relse(bp);
> >  	return true;
> >  
> > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> > index a07acbf..3486517 100644
> > --- a/fs/xfs/xfs_trans.h
> > +++ b/fs/xfs/xfs_trans.h
> > @@ -64,11 +64,13 @@ typedef struct xfs_log_item {
> >  } xfs_log_item_t;
> >  
> >  #define	XFS_LI_IN_AIL	0x1
> > -#define XFS_LI_ABORTED	0x2
> > +#define	XFS_LI_ABORTED	0x2
> > +#define	XFS_LI_FAILED	0x4
> >  
> >  #define XFS_LI_FLAGS \
> >  	{ XFS_LI_IN_AIL,	"IN_AIL" }, \
> > -	{ XFS_LI_ABORTED,	"ABORTED" }
> > +	{ XFS_LI_ABORTED,	"ABORTED" }, \
> > +	{ XFS_LI_FAILED,	"FAILED" }
> >  
> >  struct xfs_item_ops {
> >  	void (*iop_size)(xfs_log_item_t *, int *, int *);
> > @@ -79,6 +81,7 @@ struct xfs_item_ops {
> >  	void (*iop_unlock)(xfs_log_item_t *);
> >  	xfs_lsn_t (*iop_committed)(xfs_log_item_t *, xfs_lsn_t);
> >  	void (*iop_committing)(xfs_log_item_t *, xfs_lsn_t);
> > +	void (*iop_error)(xfs_log_item_t *, xfs_buf_t *, unsigned int bflags);
> >  };
> >  
> >  void	xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item,
> > -- 
> > 2.9.3
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback
  2017-06-08 14:00 [PATCH 0/2 V3] Resubmit items failed during writeback Carlos Maiolino
  2017-06-08 14:00 ` [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure Carlos Maiolino
@ 2017-06-08 14:00 ` Carlos Maiolino
  2017-06-08 16:45   ` Brian Foster
  1 sibling, 1 reply; 9+ messages in thread
From: Carlos Maiolino @ 2017-06-08 14:00 UTC (permalink / raw)
  To: linux-xfs

When a buffer has been failed during writeback, the inode items into it
are kept flush locked, and are never resubmitted due the flush lock, so,
if any buffer fails to be written, the items in AIL are never written to
disk and never unlocked.

This causes unmount operation to hang due these items flush locked in AIL,
but this also causes the items in AIL to never be written back, even when
the IO device comes back to normal.

I've been testing this patch with a DM-thin device, creating a
filesystem larger than the real device.

When writing enough data to fill the DM-thin device, XFS receives ENOSPC
errors from the device, and keep spinning on xfsaild (when 'retry
forever' configuration is set).

At this point, the filesystem can not be unmounted because of the flush locked
items in AIL, but worse, the items in AIL are never retried at all
(once xfs_inode_item_push() will skip the items that are flush locked),
even if the underlying DM-thin device is expanded to the proper size.

This patch fixes both cases, retrying any item that has been failed
previously, using the infra-structure provided by the previous patch.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
---
V2:
	- Fix XFS_LI_FAILED flag removal
	- Use atomic operations to set and clear XFS_LI_FAILED flag
	- Remove check for XBF_WRITE_FAIL in xfs_inode_item_push
	- Add more comments to the code
	- Add a helper function to resubmit the failed buffers, so this
	  can be also used in dquot system without duplicating code

V3:
	- drop xfs_imap_to_bp call using a pointer in the log item to
	  hold the buffer address, fixing a possible infinite loop if
	  xfs_map_to_bp returns -EIO
	- use xa_lock instead of atomic operations to handle log item
	  flags
	- Add a hold to the buffer for each log item failed
	- move buffer resubmission up in xfs_inode_item_push()

 fs/xfs/xfs_buf_item.c   | 39 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_buf_item.h   |  2 ++
 fs/xfs/xfs_inode_item.c | 44 +++++++++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_trans.h      |  1 +
 fs/xfs/xfs_trans_ail.c  |  4 ++--
 5 files changed, 85 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 523b7a4..9b7bd23 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1222,3 +1222,42 @@ xfs_buf_iodone(
 	xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
 	xfs_buf_item_free(BUF_ITEM(lip));
 }
+
+/*
+ * Requeue a failed buffer for writeback
+ *
+ * Return true if the buffer has been re-queued properly, false otherwise
+ */
+bool
+xfs_buf_resubmit_failed_buffers(
+	struct xfs_inode	*ip,
+	struct xfs_log_item	*lip,
+	struct list_head	*buffer_list)
+{
+	struct xfs_log_item	*next;
+	struct xfs_buf		*bp;
+	bool			ret;
+
+	bp = lip->li_buf;
+
+	/* Clear XFS_LI_FAILED flag from all items before resubmit
+	 *
+	 * XFS_LI_FAILED set/clear is protected by xa_lock, caller  this
+	 * function already have it acquired
+	 */
+	for (; lip; lip = next) {
+		next = lip->li_bio_list;
+		lip->li_flags &= ~XFS_LI_FAILED;
+		xfs_buf_rele(bp);
+	}
+
+	/* Add this buffer back to the delayed write list */
+	xfs_buf_lock(bp);
+	if (!xfs_buf_delwri_queue(bp, buffer_list))
+		ret = false;
+	else
+		ret = true;
+
+	xfs_buf_unlock(bp);
+	return ret;
+}
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index f7eba99..32aceae 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -70,6 +70,8 @@ void	xfs_buf_attach_iodone(struct xfs_buf *,
 			      xfs_log_item_t *);
 void	xfs_buf_iodone_callbacks(struct xfs_buf *);
 void	xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *);
+bool	xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *,
+					struct list_head *);
 
 extern kmem_zone_t	*xfs_buf_item_zone;
 
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 08cb7d1..2f5a49e 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -27,6 +27,7 @@
 #include "xfs_error.h"
 #include "xfs_trace.h"
 #include "xfs_trans_priv.h"
+#include "xfs_buf_item.h"
 #include "xfs_log.h"
 
 
@@ -475,6 +476,26 @@ xfs_inode_item_unpin(
 		wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
 }
 
+STATIC void
+xfs_inode_item_error(
+	struct xfs_log_item	*lip,
+	struct xfs_buf		*bp,
+	unsigned int		bflags)
+{
+
+	/*
+	 * The buffer writeback containing this inode has been failed
+	 * mark it as failed and unlock the flush lock, so it can be retried
+	 * again.
+	 * Use xa_lock to avoid races with other log item state changes.
+	 */
+	spin_lock(&lip->li_ailp->xa_lock);
+	xfs_buf_hold(bp);
+	lip->li_flags |= XFS_LI_FAILED;
+	lip->li_buf = bp;
+	spin_unlock(&lip->li_ailp->xa_lock);
+}
+
 STATIC uint
 xfs_inode_item_push(
 	struct xfs_log_item	*lip,
@@ -504,6 +525,18 @@ xfs_inode_item_push(
 	}
 
 	/*
+	 * The buffer containing this item failed to be written back
+	 * previously. Resubmit the buffer for IO.
+	 */
+	if (lip->li_flags & XFS_LI_FAILED) {
+		if (!xfs_buf_resubmit_failed_buffers(ip, lip,
+						     buffer_list))
+			rval = XFS_ITEM_FLUSHING;
+
+		goto out_unlock;
+	}
+
+	/*
 	 * Stale inode items should force out the iclog.
 	 */
 	if (ip->i_flags & XFS_ISTALE) {
@@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = {
 	.iop_unlock	= xfs_inode_item_unlock,
 	.iop_committed	= xfs_inode_item_committed,
 	.iop_push	= xfs_inode_item_push,
-	.iop_committing = xfs_inode_item_committing
+	.iop_committing = xfs_inode_item_committing,
+	.iop_error	= xfs_inode_item_error
 };
 
 
@@ -710,7 +744,8 @@ xfs_iflush_done(
 		 * the AIL lock.
 		 */
 		iip = INODE_ITEM(blip);
-		if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn)
+		if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) ||
+		    lip->li_flags & XFS_LI_FAILED)
 			need_ail++;
 
 		blip = next;
@@ -718,7 +753,8 @@ xfs_iflush_done(
 
 	/* make sure we capture the state of the initial inode. */
 	iip = INODE_ITEM(lip);
-	if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn)
+	if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) ||
+	    lip->li_flags & XFS_LI_FAILED)
 		need_ail++;
 
 	/*
@@ -739,6 +775,8 @@ xfs_iflush_done(
 			if (INODE_ITEM(blip)->ili_logged &&
 			    blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn)
 				mlip_changed |= xfs_ail_delete_one(ailp, blip);
+			else if (blip->li_flags & XFS_LI_FAILED)
+				blip->li_flags &= ~XFS_LI_FAILED;
 		}
 
 		if (mlip_changed) {
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 3486517..86c3464 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -49,6 +49,7 @@ typedef struct xfs_log_item {
 	struct xfs_ail			*li_ailp;	/* ptr to AIL */
 	uint				li_type;	/* item type */
 	uint				li_flags;	/* misc flags */
+	struct xfs_buf			*li_buf;	/* real buffer pointer */
 	struct xfs_log_item		*li_bio_list;	/* buffer item list */
 	void				(*li_cb)(struct xfs_buf *,
 						 struct xfs_log_item *);
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 9056c0f..36e08dd 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk(
 bool
 xfs_ail_delete_one(
 	struct xfs_ail		*ailp,
-	struct xfs_log_item 	*lip)
+	struct xfs_log_item	*lip)
 {
 	struct xfs_log_item	*mlip = xfs_ail_min(ailp);
 
 	trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn);
 	xfs_ail_delete(ailp, lip);
-	lip->li_flags &= ~XFS_LI_IN_AIL;
+	lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED);
 	lip->li_lsn = 0;
 
 	return mlip == lip;
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback
  2017-06-08 14:00 ` [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback Carlos Maiolino
@ 2017-06-08 16:45   ` Brian Foster
  2017-06-12 12:44     ` Carlos Maiolino
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Foster @ 2017-06-08 16:45 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: linux-xfs

On Thu, Jun 08, 2017 at 04:00:14PM +0200, Carlos Maiolino wrote:
> When a buffer has been failed during writeback, the inode items into it
> are kept flush locked, and are never resubmitted due the flush lock, so,
> if any buffer fails to be written, the items in AIL are never written to
> disk and never unlocked.
> 
> This causes unmount operation to hang due these items flush locked in AIL,
> but this also causes the items in AIL to never be written back, even when
> the IO device comes back to normal.
> 
> I've been testing this patch with a DM-thin device, creating a
> filesystem larger than the real device.
> 
> When writing enough data to fill the DM-thin device, XFS receives ENOSPC
> errors from the device, and keep spinning on xfsaild (when 'retry
> forever' configuration is set).
> 
> At this point, the filesystem can not be unmounted because of the flush locked
> items in AIL, but worse, the items in AIL are never retried at all
> (once xfs_inode_item_push() will skip the items that are flush locked),
> even if the underlying DM-thin device is expanded to the proper size.
> 
> This patch fixes both cases, retrying any item that has been failed
> previously, using the infra-structure provided by the previous patch.
> 
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
> V2:
> 	- Fix XFS_LI_FAILED flag removal
> 	- Use atomic operations to set and clear XFS_LI_FAILED flag
> 	- Remove check for XBF_WRITE_FAIL in xfs_inode_item_push
> 	- Add more comments to the code
> 	- Add a helper function to resubmit the failed buffers, so this
> 	  can be also used in dquot system without duplicating code
> 
> V3:
> 	- drop xfs_imap_to_bp call using a pointer in the log item to
> 	  hold the buffer address, fixing a possible infinite loop if
> 	  xfs_map_to_bp returns -EIO
> 	- use xa_lock instead of atomic operations to handle log item
> 	  flags
> 	- Add a hold to the buffer for each log item failed
> 	- move buffer resubmission up in xfs_inode_item_push()
> 
>  fs/xfs/xfs_buf_item.c   | 39 +++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_buf_item.h   |  2 ++
>  fs/xfs/xfs_inode_item.c | 44 +++++++++++++++++++++++++++++++++++++++++---
>  fs/xfs/xfs_trans.h      |  1 +
>  fs/xfs/xfs_trans_ail.c  |  4 ++--
>  5 files changed, 85 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 523b7a4..9b7bd23 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -1222,3 +1222,42 @@ xfs_buf_iodone(
>  	xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
>  	xfs_buf_item_free(BUF_ITEM(lip));
>  }
> +
> +/*
> + * Requeue a failed buffer for writeback
> + *
> + * Return true if the buffer has been re-queued properly, false otherwise
> + */
> +bool
> +xfs_buf_resubmit_failed_buffers(
> +	struct xfs_inode	*ip,
> +	struct xfs_log_item	*lip,
> +	struct list_head	*buffer_list)
> +{
> +	struct xfs_log_item	*next;
> +	struct xfs_buf		*bp;
> +	bool			ret;
> +
> +	bp = lip->li_buf;
> +
> +	/* Clear XFS_LI_FAILED flag from all items before resubmit
> +	 *
> +	 * XFS_LI_FAILED set/clear is protected by xa_lock, caller  this
> +	 * function already have it acquired
> +	 */
> +	for (; lip; lip = next) {
> +		next = lip->li_bio_list;
> +		lip->li_flags &= ~XFS_LI_FAILED;
> +		xfs_buf_rele(bp);

We need to check that the log item has LI_FAILED set before we release
the buffer because it's possible for the buffer to have a mix of failed
and recently flushed (not failed) items. We should set ->li_buf = NULL
wherever we clear LI_FAILED as well.

A couple small inline helpers to fail/unfail log items correctly (i.e.,
xfs_log_item_[fail|clear]() or some such) might help to make sure we get
the reference counting right. We could also assert that the correct lock
is held from those helpers.

> +	}
> +
> +	/* Add this buffer back to the delayed write list */
> +	xfs_buf_lock(bp);
> +	if (!xfs_buf_delwri_queue(bp, buffer_list))
> +		ret = false;
> +	else
> +		ret = true;
> +
> +	xfs_buf_unlock(bp);
> +	return ret;
> +}
> diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
> index f7eba99..32aceae 100644
> --- a/fs/xfs/xfs_buf_item.h
> +++ b/fs/xfs/xfs_buf_item.h
> @@ -70,6 +70,8 @@ void	xfs_buf_attach_iodone(struct xfs_buf *,
>  			      xfs_log_item_t *);
>  void	xfs_buf_iodone_callbacks(struct xfs_buf *);
>  void	xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *);
> +bool	xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *,
> +					struct list_head *);
>  
>  extern kmem_zone_t	*xfs_buf_item_zone;
>  
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index 08cb7d1..2f5a49e 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -27,6 +27,7 @@
>  #include "xfs_error.h"
>  #include "xfs_trace.h"
>  #include "xfs_trans_priv.h"
> +#include "xfs_buf_item.h"
>  #include "xfs_log.h"
>  
>  
> @@ -475,6 +476,26 @@ xfs_inode_item_unpin(
>  		wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
>  }
>  
> +STATIC void
> +xfs_inode_item_error(
> +	struct xfs_log_item	*lip,
> +	struct xfs_buf		*bp,
> +	unsigned int		bflags)
> +{
> +
> +	/*
> +	 * The buffer writeback containing this inode has been failed
> +	 * mark it as failed and unlock the flush lock, so it can be retried
> +	 * again.
> +	 * Use xa_lock to avoid races with other log item state changes.
> +	 */
> +	spin_lock(&lip->li_ailp->xa_lock);

I'd push the ->xa_lock up into caller and cycle it once over the list
rather than once per item. This is more consistent with the locking
pattern for successful I/Os.

> +	xfs_buf_hold(bp);
> +	lip->li_flags |= XFS_LI_FAILED;
> +	lip->li_buf = bp;

Also, while I think this is technically Ok based on the current
implementation, I think we should also test that LI_FAILED is not
already set and only hold the buf in that case (i.e., the previously
mentioned helper). This helps prevent a landmine if the
implementation/locking/etc. changes down the road.

I think it would be fine to assert that LI_FAILED is not set already if
we wanted to, though, as long as there's a brief comment as well.

> +	spin_unlock(&lip->li_ailp->xa_lock);
> +}
> +
>  STATIC uint
>  xfs_inode_item_push(
>  	struct xfs_log_item	*lip,
> @@ -504,6 +525,18 @@ xfs_inode_item_push(
>  	}
>  
>  	/*
> +	 * The buffer containing this item failed to be written back
> +	 * previously. Resubmit the buffer for IO.
> +	 */
> +	if (lip->li_flags & XFS_LI_FAILED) {
> +		if (!xfs_buf_resubmit_failed_buffers(ip, lip,
> +						     buffer_list))
> +			rval = XFS_ITEM_FLUSHING;
> +
> +		goto out_unlock;
> +	}

I think this needs to go at least before the ilock. We don't need the
latter lock because the inode is already flushed. Somebody else could be
holding ilock and blocked on the flush lock. If that occurs, we'll never
make progress.

I also think we should lock/unlock the buffer here rather than down in
the helper. Hmm, maybe even queue it as well so it's clear what's going
on. For example, something like:

	if (lip->li_flags & XFS_LI_FAILED) {
		xfs_buf_lock(bp);
		xfs_buf_clear_failed_items(bp);
		if (!xfs_buf_delwri_queue(bp, buffer_list))
			rval = XFS_ITEM_FLUSHING;
		xfs_buf_unlock(bp);
		goto out_unlock;
	}

... but that is mostly just aesthetic.

> +
> +	/*
>  	 * Stale inode items should force out the iclog.
>  	 */
>  	if (ip->i_flags & XFS_ISTALE) {
> @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = {
>  	.iop_unlock	= xfs_inode_item_unlock,
>  	.iop_committed	= xfs_inode_item_committed,
>  	.iop_push	= xfs_inode_item_push,
> -	.iop_committing = xfs_inode_item_committing
> +	.iop_committing = xfs_inode_item_committing,
> +	.iop_error	= xfs_inode_item_error
>  };
>  
>  
> @@ -710,7 +744,8 @@ xfs_iflush_done(
>  		 * the AIL lock.
>  		 */
>  		iip = INODE_ITEM(blip);
> -		if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn)
> +		if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) ||
> +		    lip->li_flags & XFS_LI_FAILED)
>  			need_ail++;
>  
>  		blip = next;
> @@ -718,7 +753,8 @@ xfs_iflush_done(
>  
>  	/* make sure we capture the state of the initial inode. */
>  	iip = INODE_ITEM(lip);
> -	if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn)
> +	if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) ||
> +	    lip->li_flags & XFS_LI_FAILED)
>  		need_ail++;
>  
>  	/*
> @@ -739,6 +775,8 @@ xfs_iflush_done(
>  			if (INODE_ITEM(blip)->ili_logged &&
>  			    blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn)
>  				mlip_changed |= xfs_ail_delete_one(ailp, blip);
> +			else if (blip->li_flags & XFS_LI_FAILED)
> +				blip->li_flags &= ~XFS_LI_FAILED;

This needs to do the full "unfail" sequence as well (another place to
call the helper).

>  		}
>  
>  		if (mlip_changed) {
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index 3486517..86c3464 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -49,6 +49,7 @@ typedef struct xfs_log_item {
>  	struct xfs_ail			*li_ailp;	/* ptr to AIL */
>  	uint				li_type;	/* item type */
>  	uint				li_flags;	/* misc flags */
> +	struct xfs_buf			*li_buf;	/* real buffer pointer */
>  	struct xfs_log_item		*li_bio_list;	/* buffer item list */
>  	void				(*li_cb)(struct xfs_buf *,
>  						 struct xfs_log_item *);
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 9056c0f..36e08dd 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk(
>  bool
>  xfs_ail_delete_one(
>  	struct xfs_ail		*ailp,
> -	struct xfs_log_item 	*lip)
> +	struct xfs_log_item	*lip)
>  {
>  	struct xfs_log_item	*mlip = xfs_ail_min(ailp);
>  
>  	trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn);
>  	xfs_ail_delete(ailp, lip);
> -	lip->li_flags &= ~XFS_LI_IN_AIL;
> +	lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED);

... and the same thing here.

Brian

>  	lip->li_lsn = 0;
>  
>  	return mlip == lip;
> -- 
> 2.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback
  2017-06-08 16:45   ` Brian Foster
@ 2017-06-12 12:44     ` Carlos Maiolino
  2017-06-12 13:45       ` Brian Foster
  0 siblings, 1 reply; 9+ messages in thread
From: Carlos Maiolino @ 2017-06-12 12:44 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Hey.

> > +/*
> > + * Requeue a failed buffer for writeback
> > + *
> > + * Return true if the buffer has been re-queued properly, false otherwise
> > + */
> > +bool
> > +xfs_buf_resubmit_failed_buffers(
> > +	struct xfs_inode	*ip,
> > +	struct xfs_log_item	*lip,
> > +	struct list_head	*buffer_list)
> > +{
> > +	struct xfs_log_item	*next;
> > +	struct xfs_buf		*bp;
> > +	bool			ret;
> > +
> > +	bp = lip->li_buf;
> > +
> > +	/* Clear XFS_LI_FAILED flag from all items before resubmit
> > +	 *
> > +	 * XFS_LI_FAILED set/clear is protected by xa_lock, caller  this
> > +	 * function already have it acquired
> > +	 */
> > +	for (; lip; lip = next) {
> > +		next = lip->li_bio_list;
> > +		lip->li_flags &= ~XFS_LI_FAILED;
> > +		xfs_buf_rele(bp);
> 
> We need to check that the log item has LI_FAILED set before we release
> the buffer because it's possible for the buffer to have a mix of failed
> and recently flushed (not failed) items. We should set ->li_buf = NULL
> wherever we clear LI_FAILED as well.
> 
> A couple small inline helpers to fail/unfail log items correctly (i.e.,
> xfs_log_item_[fail|clear]() or some such) might help to make sure we get
> the reference counting right. We could also assert that the correct lock
> is held from those helpers.
> 

Agreed, will queue it for V4

> > +	}
> > +
> > +	/* Add this buffer back to the delayed write list */
> > +	xfs_buf_lock(bp);
> > +	if (!xfs_buf_delwri_queue(bp, buffer_list))
> > +		ret = false;
> > +	else
> > +		ret = true;
> > +
> > +	xfs_buf_unlock(bp);
> > +	return ret;
> > +}
> > diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
> > index f7eba99..32aceae 100644
> > --- a/fs/xfs/xfs_buf_item.h
> > +++ b/fs/xfs/xfs_buf_item.h
> > @@ -70,6 +70,8 @@ void	xfs_buf_attach_iodone(struct xfs_buf *,
> >  			      xfs_log_item_t *);
> >  void	xfs_buf_iodone_callbacks(struct xfs_buf *);
> >  void	xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *);
> > +bool	xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *,
> > +					struct list_head *);
> >  
> >  extern kmem_zone_t	*xfs_buf_item_zone;
> >  
> > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> > index 08cb7d1..2f5a49e 100644
> > --- a/fs/xfs/xfs_inode_item.c
> > +++ b/fs/xfs/xfs_inode_item.c
> > @@ -27,6 +27,7 @@
> >  #include "xfs_error.h"
> >  #include "xfs_trace.h"
> >  #include "xfs_trans_priv.h"
> > +#include "xfs_buf_item.h"
> >  #include "xfs_log.h"
> >  
> >  
> > @@ -475,6 +476,26 @@ xfs_inode_item_unpin(
> >  		wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
> >  }
> >  
> > +STATIC void
> > +xfs_inode_item_error(
> > +	struct xfs_log_item	*lip,
> > +	struct xfs_buf		*bp,
> > +	unsigned int		bflags)
> > +{
> > +
> > +	/*
> > +	 * The buffer writeback containing this inode has been failed
> > +	 * mark it as failed and unlock the flush lock, so it can be retried
> > +	 * again.
> > +	 * Use xa_lock to avoid races with other log item state changes.
> > +	 */
> > +	spin_lock(&lip->li_ailp->xa_lock);
> 
> I'd push the ->xa_lock up into caller and cycle it once over the list
> rather than once per item. This is more consistent with the locking
> pattern for successful I/Os.
> 

I considered it too, and decided to use spin_lock/unlock on each interaction to
avoid having it locked for a long time, but I don't think there is any harm in
using a single lock/unlock.

> > +	xfs_buf_hold(bp);
> > +	lip->li_flags |= XFS_LI_FAILED;
> > +	lip->li_buf = bp;
> 
> Also, while I think this is technically Ok based on the current
> implementation, I think we should also test that LI_FAILED is not
> already set and only hold the buf in that case (i.e., the previously
> mentioned helper). This helps prevent a landmine if the
> implementation/locking/etc. changes down the road.
> 
> I think it would be fine to assert that LI_FAILED is not set already if
> we wanted to, though, as long as there's a brief comment as well.
> 
> > +	spin_unlock(&lip->li_ailp->xa_lock);
> > +}
> > +
> >  STATIC uint
> >  xfs_inode_item_push(
> >  	struct xfs_log_item	*lip,
> > @@ -504,6 +525,18 @@ xfs_inode_item_push(
> >  	}
> >  
> >  	/*
> > +	 * The buffer containing this item failed to be written back
> > +	 * previously. Resubmit the buffer for IO.
> > +	 */
> > +	if (lip->li_flags & XFS_LI_FAILED) {
> > +		if (!xfs_buf_resubmit_failed_buffers(ip, lip,
> > +						     buffer_list))
> > +			rval = XFS_ITEM_FLUSHING;
> > +
> > +		goto out_unlock;
> > +	}
> 
> I think this needs to go at least before the ilock. We don't need the
> latter lock because the inode is already flushed. Somebody else could be
> holding ilock and blocked on the flush lock. If that occurs, we'll never
> make progress.

It should be after the ilock IIRC, having it before the ilock will deadlock it
any unmount when there are still inodes to be destroyed.
> 
> I also think we should lock/unlock the buffer here rather than down in
> the helper. Hmm, maybe even queue it as well so it's clear what's going
> on. For example, something like:
> 
> 	if (lip->li_flags & XFS_LI_FAILED) {
> 		xfs_buf_lock(bp);
> 		xfs_buf_clear_failed_items(bp);
> 		if (!xfs_buf_delwri_queue(bp, buffer_list))
> 			rval = XFS_ITEM_FLUSHING;
> 		xfs_buf_unlock(bp);
> 		goto out_unlock;
> 	}
> 

Well, we could do that, but we would need to duplicate this code between inode
and dquot item, while keeping the helper as-is, we can re-use it in both inode
and dquot.

> ... but that is mostly just aesthetic.
> 
> > +
> > +	/*
> >  	 * Stale inode items should force out the iclog.
> >  	 */
> >  	if (ip->i_flags & XFS_ISTALE) {
> > @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = {
> >  	.iop_unlock	= xfs_inode_item_unlock,
> >  	.iop_committed	= xfs_inode_item_committed,
> >  	.iop_push	= xfs_inode_item_push,
> > -	.iop_committing = xfs_inode_item_committing
> > +	.iop_committing = xfs_inode_item_committing,
> > +	.iop_error	= xfs_inode_item_error
> >  };
> >  
> >  
> > @@ -710,7 +744,8 @@ xfs_iflush_done(
> >  		 * the AIL lock.
> >  		 */
> >  		iip = INODE_ITEM(blip);
> > -		if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn)
> > +		if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) ||
> > +		    lip->li_flags & XFS_LI_FAILED)
> >  			need_ail++;
> >  
> >  		blip = next;
> > @@ -718,7 +753,8 @@ xfs_iflush_done(
> >  
> >  	/* make sure we capture the state of the initial inode. */
> >  	iip = INODE_ITEM(lip);
> > -	if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn)
> > +	if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) ||
> > +	    lip->li_flags & XFS_LI_FAILED)
> >  		need_ail++;
> >  
> >  	/*
> > @@ -739,6 +775,8 @@ xfs_iflush_done(
> >  			if (INODE_ITEM(blip)->ili_logged &&
> >  			    blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn)
> >  				mlip_changed |= xfs_ail_delete_one(ailp, blip);
> > +			else if (blip->li_flags & XFS_LI_FAILED)
> > +				blip->li_flags &= ~XFS_LI_FAILED;
> 
> This needs to do the full "unfail" sequence as well (another place to
> call the helper).
> 
> >  		}
> >  
> >  		if (mlip_changed) {
> > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> > index 3486517..86c3464 100644
> > --- a/fs/xfs/xfs_trans.h
> > +++ b/fs/xfs/xfs_trans.h
> > @@ -49,6 +49,7 @@ typedef struct xfs_log_item {
> >  	struct xfs_ail			*li_ailp;	/* ptr to AIL */
> >  	uint				li_type;	/* item type */
> >  	uint				li_flags;	/* misc flags */
> > +	struct xfs_buf			*li_buf;	/* real buffer pointer */
> >  	struct xfs_log_item		*li_bio_list;	/* buffer item list */
> >  	void				(*li_cb)(struct xfs_buf *,
> >  						 struct xfs_log_item *);
> > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> > index 9056c0f..36e08dd 100644
> > --- a/fs/xfs/xfs_trans_ail.c
> > +++ b/fs/xfs/xfs_trans_ail.c
> > @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk(
> >  bool
> >  xfs_ail_delete_one(
> >  	struct xfs_ail		*ailp,
> > -	struct xfs_log_item 	*lip)
> > +	struct xfs_log_item	*lip)
> >  {
> >  	struct xfs_log_item	*mlip = xfs_ail_min(ailp);
> >  
> >  	trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn);
> >  	xfs_ail_delete(ailp, lip);
> > -	lip->li_flags &= ~XFS_LI_IN_AIL;
> > +	lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED);
> 
> ... and the same thing here.
> 

I agree with the helpers, and I'll queue them up for V4, I'll just wait for some
more reviews on this version before sending V4.

Thanks for the review Brian.

Cheers

-- 
Carlos

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback
  2017-06-12 12:44     ` Carlos Maiolino
@ 2017-06-12 13:45       ` Brian Foster
  2017-06-14 12:29         ` Carlos Maiolino
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Foster @ 2017-06-12 13:45 UTC (permalink / raw)
  To: linux-xfs

On Mon, Jun 12, 2017 at 02:44:23PM +0200, Carlos Maiolino wrote:
> Hey.
> 
> > > +/*
> > > + * Requeue a failed buffer for writeback
> > > + *
> > > + * Return true if the buffer has been re-queued properly, false otherwise
> > > + */
> > > +bool
> > > +xfs_buf_resubmit_failed_buffers(
> > > +	struct xfs_inode	*ip,
> > > +	struct xfs_log_item	*lip,
> > > +	struct list_head	*buffer_list)
> > > +{
> > > +	struct xfs_log_item	*next;
> > > +	struct xfs_buf		*bp;
> > > +	bool			ret;
> > > +
> > > +	bp = lip->li_buf;
> > > +
> > > +	/* Clear XFS_LI_FAILED flag from all items before resubmit
> > > +	 *
> > > +	 * XFS_LI_FAILED set/clear is protected by xa_lock, caller  this
> > > +	 * function already have it acquired
> > > +	 */
> > > +	for (; lip; lip = next) {
> > > +		next = lip->li_bio_list;
> > > +		lip->li_flags &= ~XFS_LI_FAILED;
> > > +		xfs_buf_rele(bp);
> > 
> > We need to check that the log item has LI_FAILED set before we release
> > the buffer because it's possible for the buffer to have a mix of failed
> > and recently flushed (not failed) items. We should set ->li_buf = NULL
> > wherever we clear LI_FAILED as well.
> > 
> > A couple small inline helpers to fail/unfail log items correctly (i.e.,
> > xfs_log_item_[fail|clear]() or some such) might help to make sure we get
> > the reference counting right. We could also assert that the correct lock
> > is held from those helpers.
> > 
> 
> Agreed, will queue it for V4
> 
> > > +	}
> > > +
> > > +	/* Add this buffer back to the delayed write list */
> > > +	xfs_buf_lock(bp);
> > > +	if (!xfs_buf_delwri_queue(bp, buffer_list))
> > > +		ret = false;
> > > +	else
> > > +		ret = true;
> > > +
> > > +	xfs_buf_unlock(bp);
> > > +	return ret;
> > > +}
> > > diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
> > > index f7eba99..32aceae 100644
> > > --- a/fs/xfs/xfs_buf_item.h
> > > +++ b/fs/xfs/xfs_buf_item.h
> > > @@ -70,6 +70,8 @@ void	xfs_buf_attach_iodone(struct xfs_buf *,
> > >  			      xfs_log_item_t *);
> > >  void	xfs_buf_iodone_callbacks(struct xfs_buf *);
> > >  void	xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *);
> > > +bool	xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *,
> > > +					struct list_head *);
> > >  
> > >  extern kmem_zone_t	*xfs_buf_item_zone;
> > >  
> > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> > > index 08cb7d1..2f5a49e 100644
> > > --- a/fs/xfs/xfs_inode_item.c
> > > +++ b/fs/xfs/xfs_inode_item.c
> > > @@ -27,6 +27,7 @@
> > >  #include "xfs_error.h"
> > >  #include "xfs_trace.h"
> > >  #include "xfs_trans_priv.h"
> > > +#include "xfs_buf_item.h"
> > >  #include "xfs_log.h"
> > >  
> > >  
> > > @@ -475,6 +476,26 @@ xfs_inode_item_unpin(
> > >  		wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
> > >  }
> > >  
> > > +STATIC void
> > > +xfs_inode_item_error(
> > > +	struct xfs_log_item	*lip,
> > > +	struct xfs_buf		*bp,
> > > +	unsigned int		bflags)
> > > +{
> > > +
> > > +	/*
> > > +	 * The buffer writeback containing this inode has been failed
> > > +	 * mark it as failed and unlock the flush lock, so it can be retried
> > > +	 * again.
> > > +	 * Use xa_lock to avoid races with other log item state changes.
> > > +	 */
> > > +	spin_lock(&lip->li_ailp->xa_lock);
> > 
> > I'd push the ->xa_lock up into caller and cycle it once over the list
> > rather than once per item. This is more consistent with the locking
> > pattern for successful I/Os.
> > 
> 
> I considered it too, and decided to use spin_lock/unlock on each interaction to
> avoid having it locked for a long time, but I don't think there is any harm in
> using a single lock/unlock.
> 

I don't think we're too concerned about performance at this low a level
when when I/Os are failing (or yet, at least), but I'm guessing that
pulling up the lock is still lighter weight than what a successful I/O
completion does.

That aside, I think we also want the entire sequence of marking
LI_FAILED to be atomic with respect to the AIL (e.g., so the next AIL
push sees the state of the items before the errors are processed or
after, not somewhere in between).

> > > +	xfs_buf_hold(bp);
> > > +	lip->li_flags |= XFS_LI_FAILED;
> > > +	lip->li_buf = bp;
> > 
> > Also, while I think this is technically Ok based on the current
> > implementation, I think we should also test that LI_FAILED is not
> > already set and only hold the buf in that case (i.e., the previously
> > mentioned helper). This helps prevent a landmine if the
> > implementation/locking/etc. changes down the road.
> > 
> > I think it would be fine to assert that LI_FAILED is not set already if
> > we wanted to, though, as long as there's a brief comment as well.
> > 
> > > +	spin_unlock(&lip->li_ailp->xa_lock);
> > > +}
> > > +
> > >  STATIC uint
> > >  xfs_inode_item_push(
> > >  	struct xfs_log_item	*lip,
> > > @@ -504,6 +525,18 @@ xfs_inode_item_push(
> > >  	}
> > >  
> > >  	/*
> > > +	 * The buffer containing this item failed to be written back
> > > +	 * previously. Resubmit the buffer for IO.
> > > +	 */
> > > +	if (lip->li_flags & XFS_LI_FAILED) {
> > > +		if (!xfs_buf_resubmit_failed_buffers(ip, lip,
> > > +						     buffer_list))
> > > +			rval = XFS_ITEM_FLUSHING;
> > > +
> > > +		goto out_unlock;
> > > +	}
> > 
> > I think this needs to go at least before the ilock. We don't need the
> > latter lock because the inode is already flushed. Somebody else could be
> > holding ilock and blocked on the flush lock. If that occurs, we'll never
> > make progress.
> 
> It should be after the ilock IIRC, having it before the ilock will deadlock it
> any unmount when there are still inodes to be destroyed.

Hmm.. so if we get into the failed state on an inode and a reclaim
attempt occurs on the inode, xfs_reclaim_inode() can grab the ilock and
then block on the flush lock. With this code as it is, we'll now never
resubmit the failed buffer (unless via some other inode, by chance)
because we can't get the inode lock. IOW, we're stuck in a livelock that
resembles the original problem.

AFAICT, we don't need the inode lock in this case at all. The only
reason for it here that I'm aware of is to follow the ilock -> flush
lock ordering rules and we know the inode is already flush locked in the
failed case. Can you elaborate on the unmount deadlock issue if we move
this before the ilock? Shouldn't the I/O error handling detect the
unmounting state the next go around and flag it as a permanent error?
I'm wondering if there is some other bug lurking there that we need to
work around...

(Note that I think it's important we resolve this above all else,
because getting this correct is probably what dictates whether this is a
viable approach or that we need to start looking at something like
Dave's flush lock rework idea.)

> > 
> > I also think we should lock/unlock the buffer here rather than down in
> > the helper. Hmm, maybe even queue it as well so it's clear what's going
> > on. For example, something like:
> > 
> > 	if (lip->li_flags & XFS_LI_FAILED) {
> > 		xfs_buf_lock(bp);
> > 		xfs_buf_clear_failed_items(bp);
> > 		if (!xfs_buf_delwri_queue(bp, buffer_list))
> > 			rval = XFS_ITEM_FLUSHING;
> > 		xfs_buf_unlock(bp);
> > 		goto out_unlock;
> > 	}
> > 
> 
> Well, we could do that, but we would need to duplicate this code between inode
> and dquot item, while keeping the helper as-is, we can re-use it in both inode
> and dquot.
> 

That doesn't seem so bad to me. It's basically just a lock and delwri
queue of a buffer. That said, we could also create a couple separate
helpers here: one that requeues a buffer and another that clears failed
items on a buffer. Hmm, I'm actually wondering if we can refactor
xfs_inode_item_push() a bit to just reuse some of the existing code
here. Perhaps it's best to get the locking down correctly before looking
into this...

BTW, I just realized that the xfs_buf_lock() above probably needs to be
a trylock (where we return XFS_ITEM_LOCKED on failure). The buffer lock
via xfs_iflush() is a trylock as well, and we actually drop and
reacquire ->xa_lock across that and the delwri queue of the buffer (but
I'm not totally sure that part is necessary here).

> > ... but that is mostly just aesthetic.
> > 
> > > +
> > > +	/*
> > >  	 * Stale inode items should force out the iclog.
> > >  	 */
> > >  	if (ip->i_flags & XFS_ISTALE) {
> > > @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = {
> > >  	.iop_unlock	= xfs_inode_item_unlock,
> > >  	.iop_committed	= xfs_inode_item_committed,
> > >  	.iop_push	= xfs_inode_item_push,
> > > -	.iop_committing = xfs_inode_item_committing
> > > +	.iop_committing = xfs_inode_item_committing,
> > > +	.iop_error	= xfs_inode_item_error
> > >  };
> > >  
> > >  
> > > @@ -710,7 +744,8 @@ xfs_iflush_done(
> > >  		 * the AIL lock.
> > >  		 */
> > >  		iip = INODE_ITEM(blip);
> > > -		if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn)
> > > +		if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) ||
> > > +		    lip->li_flags & XFS_LI_FAILED)
> > >  			need_ail++;
> > >  
> > >  		blip = next;
> > > @@ -718,7 +753,8 @@ xfs_iflush_done(
> > >  
> > >  	/* make sure we capture the state of the initial inode. */
> > >  	iip = INODE_ITEM(lip);
> > > -	if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn)
> > > +	if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) ||
> > > +	    lip->li_flags & XFS_LI_FAILED)
> > >  		need_ail++;
> > >  
> > >  	/*
> > > @@ -739,6 +775,8 @@ xfs_iflush_done(
> > >  			if (INODE_ITEM(blip)->ili_logged &&
> > >  			    blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn)
> > >  				mlip_changed |= xfs_ail_delete_one(ailp, blip);
> > > +			else if (blip->li_flags & XFS_LI_FAILED)
> > > +				blip->li_flags &= ~XFS_LI_FAILED;
> > 
> > This needs to do the full "unfail" sequence as well (another place to
> > call the helper).
> > 
> > >  		}
> > >  
> > >  		if (mlip_changed) {
> > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> > > index 3486517..86c3464 100644
> > > --- a/fs/xfs/xfs_trans.h
> > > +++ b/fs/xfs/xfs_trans.h
> > > @@ -49,6 +49,7 @@ typedef struct xfs_log_item {
> > >  	struct xfs_ail			*li_ailp;	/* ptr to AIL */
> > >  	uint				li_type;	/* item type */
> > >  	uint				li_flags;	/* misc flags */
> > > +	struct xfs_buf			*li_buf;	/* real buffer pointer */
> > >  	struct xfs_log_item		*li_bio_list;	/* buffer item list */
> > >  	void				(*li_cb)(struct xfs_buf *,
> > >  						 struct xfs_log_item *);
> > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> > > index 9056c0f..36e08dd 100644
> > > --- a/fs/xfs/xfs_trans_ail.c
> > > +++ b/fs/xfs/xfs_trans_ail.c
> > > @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk(
> > >  bool
> > >  xfs_ail_delete_one(
> > >  	struct xfs_ail		*ailp,
> > > -	struct xfs_log_item 	*lip)
> > > +	struct xfs_log_item	*lip)
> > >  {
> > >  	struct xfs_log_item	*mlip = xfs_ail_min(ailp);
> > >  
> > >  	trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn);
> > >  	xfs_ail_delete(ailp, lip);
> > > -	lip->li_flags &= ~XFS_LI_IN_AIL;
> > > +	lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED);
> > 
> > ... and the same thing here.
> > 
> 
> I agree with the helpers, and I'll queue them up for V4, I'll just wait for some
> more reviews on this version before sending V4.
> 

Sounds good.

Brian

> Thanks for the review Brian.
> 
> Cheers
> 
> -- 
> Carlos
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback
  2017-06-12 13:45       ` Brian Foster
@ 2017-06-14 12:29         ` Carlos Maiolino
  0 siblings, 0 replies; 9+ messages in thread
From: Carlos Maiolino @ 2017-06-14 12:29 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Jun 12, 2017 at 09:45:44AM -0400, Brian Foster wrote:
> On Mon, Jun 12, 2017 at 02:44:23PM +0200, Carlos Maiolino wrote:
> > Hey.
> > 
> > > > +/*
> > > > + * Requeue a failed buffer for writeback
> > > > + *
> > > > + * Return true if the buffer has been re-queued properly, false otherwise
> > > > + */
> > > > +bool
> > > > +xfs_buf_resubmit_failed_buffers(
> > > > +	struct xfs_inode	*ip,
> > > > +	struct xfs_log_item	*lip,
> > > > +	struct list_head	*buffer_list)
> > > > +{
> > > > +	struct xfs_log_item	*next;
> > > > +	struct xfs_buf		*bp;
> > > > +	bool			ret;
> > > > +
> > > > +	bp = lip->li_buf;
> > > > +
> > > > +	/* Clear XFS_LI_FAILED flag from all items before resubmit
> > > > +	 *
> > > > +	 * XFS_LI_FAILED set/clear is protected by xa_lock, caller  this
> > > > +	 * function already have it acquired
> > > > +	 */
> > > > +	for (; lip; lip = next) {
> > > > +		next = lip->li_bio_list;
> > > > +		lip->li_flags &= ~XFS_LI_FAILED;
> > > > +		xfs_buf_rele(bp);
> > > 
> > > We need to check that the log item has LI_FAILED set before we release
> > > the buffer because it's possible for the buffer to have a mix of failed
> > > and recently flushed (not failed) items. We should set ->li_buf = NULL
> > > wherever we clear LI_FAILED as well.
> > > 
> > > A couple small inline helpers to fail/unfail log items correctly (i.e.,
> > > xfs_log_item_[fail|clear]() or some such) might help to make sure we get
> > > the reference counting right. We could also assert that the correct lock
> > > is held from those helpers.
> > > 
> > 
> > Agreed, will queue it for V4
> > 
> > > > +	}
> > > > +
> > > > +	/* Add this buffer back to the delayed write list */
> > > > +	xfs_buf_lock(bp);
> > > > +	if (!xfs_buf_delwri_queue(bp, buffer_list))
> > > > +		ret = false;
> > > > +	else
> > > > +		ret = true;
> > > > +
> > > > +	xfs_buf_unlock(bp);
> > > > +	return ret;
> > > > +}
> > > > diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
> > > > index f7eba99..32aceae 100644
> > > > --- a/fs/xfs/xfs_buf_item.h
> > > > +++ b/fs/xfs/xfs_buf_item.h
> > > > @@ -70,6 +70,8 @@ void	xfs_buf_attach_iodone(struct xfs_buf *,
> > > >  			      xfs_log_item_t *);
> > > >  void	xfs_buf_iodone_callbacks(struct xfs_buf *);
> > > >  void	xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *);
> > > > +bool	xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *,
> > > > +					struct list_head *);
> > > >  
> > > >  extern kmem_zone_t	*xfs_buf_item_zone;
> > > >  
> > > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> > > > index 08cb7d1..2f5a49e 100644
> > > > --- a/fs/xfs/xfs_inode_item.c
> > > > +++ b/fs/xfs/xfs_inode_item.c
> > > > @@ -27,6 +27,7 @@
> > > >  #include "xfs_error.h"
> > > >  #include "xfs_trace.h"
> > > >  #include "xfs_trans_priv.h"
> > > > +#include "xfs_buf_item.h"
> > > >  #include "xfs_log.h"
> > > >  
> > > >  
> > > > @@ -475,6 +476,26 @@ xfs_inode_item_unpin(
> > > >  		wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
> > > >  }
> > > >  
> > > > +STATIC void
> > > > +xfs_inode_item_error(
> > > > +	struct xfs_log_item	*lip,
> > > > +	struct xfs_buf		*bp,
> > > > +	unsigned int		bflags)
> > > > +{
> > > > +
> > > > +	/*
> > > > +	 * The buffer writeback containing this inode has been failed
> > > > +	 * mark it as failed and unlock the flush lock, so it can be retried
> > > > +	 * again.
> > > > +	 * Use xa_lock to avoid races with other log item state changes.
> > > > +	 */
> > > > +	spin_lock(&lip->li_ailp->xa_lock);
> > > 
> > > I'd push the ->xa_lock up into caller and cycle it once over the list
> > > rather than once per item. This is more consistent with the locking
> > > pattern for successful I/Os.
> > > 
> > 
> > I considered it too, and decided to use spin_lock/unlock on each interaction to
> > avoid having it locked for a long time, but I don't think there is any harm in
> > using a single lock/unlock.
> > 
> 
> I don't think we're too concerned about performance at this low a level
> when when I/Os are failing (or yet, at least), but I'm guessing that
> pulling up the lock is still lighter weight than what a successful I/O
> completion does.
> 
> That aside, I think we also want the entire sequence of marking
> LI_FAILED to be atomic with respect to the AIL (e.g., so the next AIL
> push sees the state of the items before the errors are processed or
> after, not somewhere in between).
> 

Agreed.

> > > > +	xfs_buf_hold(bp);
> > > > +	lip->li_flags |= XFS_LI_FAILED;
> > > > +	lip->li_buf = bp;
> > > 
> > > Also, while I think this is technically Ok based on the current
> > > implementation, I think we should also test that LI_FAILED is not
> > > already set and only hold the buf in that case (i.e., the previously
> > > mentioned helper). This helps prevent a landmine if the
> > > implementation/locking/etc. changes down the road.
> > > 
> > > I think it would be fine to assert that LI_FAILED is not set already if
> > > we wanted to, though, as long as there's a brief comment as well.
> > > 
> > > > +	spin_unlock(&lip->li_ailp->xa_lock);
> > > > +}
> > > > +
> > > >  STATIC uint
> > > >  xfs_inode_item_push(
> > > >  	struct xfs_log_item	*lip,
> > > > @@ -504,6 +525,18 @@ xfs_inode_item_push(
> > > >  	}
> > > >  
> > > >  	/*
> > > > +	 * The buffer containing this item failed to be written back
> > > > +	 * previously. Resubmit the buffer for IO.
> > > > +	 */
> > > > +	if (lip->li_flags & XFS_LI_FAILED) {
> > > > +		if (!xfs_buf_resubmit_failed_buffers(ip, lip,
> > > > +						     buffer_list))
> > > > +			rval = XFS_ITEM_FLUSHING;
> > > > +
> > > > +		goto out_unlock;
> > > > +	}
> > > 
> > > I think this needs to go at least before the ilock. We don't need the
> > > latter lock because the inode is already flushed. Somebody else could be
> > > holding ilock and blocked on the flush lock. If that occurs, we'll never
> > > make progress.
> > 
> > It should be after the ilock IIRC, having it before the ilock will deadlock it
> > any unmount when there are still inodes to be destroyed.
> 
> Hmm.. so if we get into the failed state on an inode and a reclaim
> attempt occurs on the inode, xfs_reclaim_inode() can grab the ilock and
> then block on the flush lock. With this code as it is, we'll now never
> resubmit the failed buffer (unless via some other inode, by chance)
> because we can't get the inode lock. IOW, we're stuck in a livelock that
> resembles the original problem.
> 
> AFAICT, we don't need the inode lock in this case at all. The only
> reason for it here that I'm aware of is to follow the ilock -> flush
> lock ordering rules and we know the inode is already flush locked in the
> failed case. Can you elaborate on the unmount deadlock issue if we move
> this before the ilock? Shouldn't the I/O error handling detect the
> unmounting state the next go around and flag it as a permanent error?
> I'm wondering if there is some other bug lurking there that we need to
> work around...
> 

Nevermind, I didn't investigate when I've hit the deadlock, I found what
happened, fixed, and I'll queue this to V4.

> (Note that I think it's important we resolve this above all else,
> because getting this correct is probably what dictates whether this is a
> viable approach or that we need to start looking at something like
> Dave's flush lock rework idea.)
> 
> > > 
> > > I also think we should lock/unlock the buffer here rather than down in
> > > the helper. Hmm, maybe even queue it as well so it's clear what's going
> > > on. For example, something like:
> > > 
> > > 	if (lip->li_flags & XFS_LI_FAILED) {
> > > 		xfs_buf_lock(bp);
> > > 		xfs_buf_clear_failed_items(bp);
> > > 		if (!xfs_buf_delwri_queue(bp, buffer_list))
> > > 			rval = XFS_ITEM_FLUSHING;
> > > 		xfs_buf_unlock(bp);
> > > 		goto out_unlock;
> > > 	}
> > > 
> > 
> > Well, we could do that, but we would need to duplicate this code between inode
> > and dquot item, while keeping the helper as-is, we can re-use it in both inode
> > and dquot.
> > 
> 
> That doesn't seem so bad to me. It's basically just a lock and delwri
> queue of a buffer. That said, we could also create a couple separate
> helpers here: one that requeues a buffer and another that clears failed
> items on a buffer. Hmm, I'm actually wondering if we can refactor
> xfs_inode_item_push() a bit to just reuse some of the existing code
> here. Perhaps it's best to get the locking down correctly before looking
> into this...
> 
> BTW, I just realized that the xfs_buf_lock() above probably needs to be
> a trylock (where we return XFS_ITEM_LOCKED on failure). The buffer lock
> via xfs_iflush() is a trylock as well, and we actually drop and
> reacquire ->xa_lock across that and the delwri queue of the buffer (but
> I'm not totally sure that part is necessary here).
> 
> > > ... but that is mostly just aesthetic.
> > > 
> > > > +
> > > > +	/*
> > > >  	 * Stale inode items should force out the iclog.
> > > >  	 */
> > > >  	if (ip->i_flags & XFS_ISTALE) {
> > > > @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = {
> > > >  	.iop_unlock	= xfs_inode_item_unlock,
> > > >  	.iop_committed	= xfs_inode_item_committed,
> > > >  	.iop_push	= xfs_inode_item_push,
> > > > -	.iop_committing = xfs_inode_item_committing
> > > > +	.iop_committing = xfs_inode_item_committing,
> > > > +	.iop_error	= xfs_inode_item_error
> > > >  };
> > > >  
> > > >  
> > > > @@ -710,7 +744,8 @@ xfs_iflush_done(
> > > >  		 * the AIL lock.
> > > >  		 */
> > > >  		iip = INODE_ITEM(blip);
> > > > -		if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn)
> > > > +		if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) ||
> > > > +		    lip->li_flags & XFS_LI_FAILED)
> > > >  			need_ail++;
> > > >  
> > > >  		blip = next;
> > > > @@ -718,7 +753,8 @@ xfs_iflush_done(
> > > >  
> > > >  	/* make sure we capture the state of the initial inode. */
> > > >  	iip = INODE_ITEM(lip);
> > > > -	if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn)
> > > > +	if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) ||
> > > > +	    lip->li_flags & XFS_LI_FAILED)
> > > >  		need_ail++;
> > > >  
> > > >  	/*
> > > > @@ -739,6 +775,8 @@ xfs_iflush_done(
> > > >  			if (INODE_ITEM(blip)->ili_logged &&
> > > >  			    blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn)
> > > >  				mlip_changed |= xfs_ail_delete_one(ailp, blip);
> > > > +			else if (blip->li_flags & XFS_LI_FAILED)
> > > > +				blip->li_flags &= ~XFS_LI_FAILED;
> > > 
> > > This needs to do the full "unfail" sequence as well (another place to
> > > call the helper).
> > > 
> > > >  		}
> > > >  
> > > >  		if (mlip_changed) {
> > > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> > > > index 3486517..86c3464 100644
> > > > --- a/fs/xfs/xfs_trans.h
> > > > +++ b/fs/xfs/xfs_trans.h
> > > > @@ -49,6 +49,7 @@ typedef struct xfs_log_item {
> > > >  	struct xfs_ail			*li_ailp;	/* ptr to AIL */
> > > >  	uint				li_type;	/* item type */
> > > >  	uint				li_flags;	/* misc flags */
> > > > +	struct xfs_buf			*li_buf;	/* real buffer pointer */
> > > >  	struct xfs_log_item		*li_bio_list;	/* buffer item list */
> > > >  	void				(*li_cb)(struct xfs_buf *,
> > > >  						 struct xfs_log_item *);
> > > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> > > > index 9056c0f..36e08dd 100644
> > > > --- a/fs/xfs/xfs_trans_ail.c
> > > > +++ b/fs/xfs/xfs_trans_ail.c
> > > > @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk(
> > > >  bool
> > > >  xfs_ail_delete_one(
> > > >  	struct xfs_ail		*ailp,
> > > > -	struct xfs_log_item 	*lip)
> > > > +	struct xfs_log_item	*lip)
> > > >  {
> > > >  	struct xfs_log_item	*mlip = xfs_ail_min(ailp);
> > > >  
> > > >  	trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn);
> > > >  	xfs_ail_delete(ailp, lip);
> > > > -	lip->li_flags &= ~XFS_LI_IN_AIL;
> > > > +	lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED);
> > > 
> > > ... and the same thing here.
> > > 
> > 
> > I agree with the helpers, and I'll queue them up for V4, I'll just wait for some
> > more reviews on this version before sending V4.
> > 
> 
> Sounds good.
> 
> Brian
> 
> > Thanks for the review Brian.
> > 
> > Cheers
> > 
> > -- 
> > Carlos
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-06-14 12:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-06-08 14:00 [PATCH 0/2 V3] Resubmit items failed during writeback Carlos Maiolino
2017-06-08 14:00 ` [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure Carlos Maiolino
2017-06-08 16:43   ` Brian Foster
2017-06-14 10:15     ` Carlos Maiolino
2017-06-08 14:00 ` [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback Carlos Maiolino
2017-06-08 16:45   ` Brian Foster
2017-06-12 12:44     ` Carlos Maiolino
2017-06-12 13:45       ` Brian Foster
2017-06-14 12:29         ` Carlos Maiolino

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).