* [PATCH 0/2 V3] Resubmit items failed during writeback @ 2017-06-08 14:00 Carlos Maiolino 2017-06-08 14:00 ` [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure Carlos Maiolino 2017-06-08 14:00 ` [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback Carlos Maiolino 0 siblings, 2 replies; 9+ messages in thread From: Carlos Maiolino @ 2017-06-08 14:00 UTC (permalink / raw) To: linux-xfs Hi, this is the 3rd revision of this patchset, based on the discussion of the previous version, and it hopefully fixes all the issues pointed out there :) I've removed the atomic handling of the xfs_log_item flags, since it has been decided to convert the flag usage to atomic handling after this patchset is accepted; I'll add that conversion to another patchset, together with the dquot fix for item resubmission. More details are in each patch. Cheers -- 2.9.3 ^ permalink raw reply [flat|nested] 9+ messages in thread
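[As background for the deferred flag conversion mentioned in the cover letter, a minimal sketch of what atomic handling of the xfs_log_item flags could look like. The bit numbering, helper names, and the change of li_flags from uint to unsigned long are assumptions for illustration only; they are not part of this posting.]

/*
 * Hedged sketch only: treat the log item flags as bit numbers so they can
 * be updated atomically with set_bit()/clear_bit()/test_bit().  This
 * assumes li_flags is converted from uint to unsigned long; the names and
 * numbering here are illustrative.
 */
#define XFS_LI_IN_AIL_BIT	0
#define XFS_LI_ABORTED_BIT	1
#define XFS_LI_FAILED_BIT	2

static inline void
xfs_example_item_set_failed(struct xfs_log_item *lip)
{
	set_bit(XFS_LI_FAILED_BIT, &lip->li_flags);
}

static inline bool
xfs_example_item_failed(struct xfs_log_item *lip)
{
	return test_bit(XFS_LI_FAILED_BIT, &lip->li_flags);
}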
* [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure 2017-06-08 14:00 [PATCH 0/2 V3] Resubmit items failed during writeback Carlos Maiolino @ 2017-06-08 14:00 ` Carlos Maiolino 2017-06-08 16:43 ` Brian Foster 2017-06-08 14:00 ` [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback Carlos Maiolino 1 sibling, 1 reply; 9+ messages in thread From: Carlos Maiolino @ 2017-06-08 14:00 UTC (permalink / raw) To: linux-xfs With the current code, XFS never re-submit a failed buffer for IO, because the failed item in the buffer is kept in the flush locked state forever. To be able to resubmit an log item for IO, we need a way to mark an item as failed, if, for any reason the buffer which the item belonged to failed during writeback. Add a new log item callback to be used after an IO completion failure and make the needed clean ups. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> --- V2: - Update commit log to include a better description of why this patch is needed and fix spelling mistakes - Move xfs_buf_do_callbacks_fail() call into xfs_buf_iodone_callback_error, so the callbacks can be executed before the buffer is released, and only after it has been retried once V3: - fix some loops according to hch suggestion - whitespace cleanup fs/xfs/xfs_buf_item.c | 23 ++++++++++++++++++++++- fs/xfs/xfs_trans.h | 7 +++++-- 2 files changed, 27 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 0306168..523b7a4 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -29,6 +29,7 @@ #include "xfs_error.h" #include "xfs_trace.h" #include "xfs_log.h" +#include "xfs_inode.h" kmem_zone_t *xfs_buf_item_zone; @@ -1051,6 +1052,20 @@ xfs_buf_do_callbacks( } } +STATIC void +xfs_buf_do_callbacks_fail( + struct xfs_buf *bp) +{ + struct xfs_log_item *lip, *next; + unsigned int bflags = bp->b_flags; + + for (lip = bp->b_fspriv; lip; lip = next) { + next = lip->li_bio_list; + if (lip->li_ops->iop_error) + lip->li_ops->iop_error(lip, bp, bflags); + } +} + static bool xfs_buf_iodone_callback_error( struct xfs_buf *bp) @@ -1120,8 +1135,14 @@ xfs_buf_iodone_callback_error( if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount) goto permanent_error; - /* still a transient error, higher layers will retry */ + /* + * still a transient error, run IO completion failure callbacks and + * let the higher layers retry the buffer. 
+ * */ xfs_buf_ioerror(bp, 0); + + /* run failure callbacks before releasing buffer */ + xfs_buf_do_callbacks_fail(bp); xfs_buf_relse(bp); return true; diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index a07acbf..3486517 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -64,11 +64,13 @@ typedef struct xfs_log_item { } xfs_log_item_t; #define XFS_LI_IN_AIL 0x1 -#define XFS_LI_ABORTED 0x2 +#define XFS_LI_ABORTED 0x2 +#define XFS_LI_FAILED 0x4 #define XFS_LI_FLAGS \ { XFS_LI_IN_AIL, "IN_AIL" }, \ - { XFS_LI_ABORTED, "ABORTED" } + { XFS_LI_ABORTED, "ABORTED" }, \ + { XFS_LI_FAILED, "FAILED" } struct xfs_item_ops { void (*iop_size)(xfs_log_item_t *, int *, int *); @@ -79,6 +81,7 @@ struct xfs_item_ops { void (*iop_unlock)(xfs_log_item_t *); xfs_lsn_t (*iop_committed)(xfs_log_item_t *, xfs_lsn_t); void (*iop_committing)(xfs_log_item_t *, xfs_lsn_t); + void (*iop_error)(xfs_log_item_t *, xfs_buf_t *, unsigned int bflags); }; void xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item, -- 2.9.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
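[To illustrate how the new infrastructure in patch 1 is meant to be consumed, a minimal sketch of a log item type wiring up the new ->iop_error callback. The function and ops names here are hypothetical; patch 2 adds the real inode-item implementation, including the locking this sketch leaves out.]

/*
 * Sketch only: a log item type records that the buffer backing it failed
 * to write back, so a later AIL push can resubmit the buffer.  The flag
 * manipulation and locking are simplified here; see patch 2 for the real
 * inode item version.
 */
STATIC void
xfs_example_item_error(
	struct xfs_log_item	*lip,
	struct xfs_buf		*bp,
	unsigned int		bflags)
{
	lip->li_flags |= XFS_LI_FAILED;
}

static const struct xfs_item_ops xfs_example_item_ops = {
	/* ... the usual iop_size/iop_format/iop_push/etc. callbacks ... */
	.iop_error	= xfs_example_item_error,
};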
* Re: [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure 2017-06-08 14:00 ` [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure Carlos Maiolino @ 2017-06-08 16:43 ` Brian Foster 2017-06-14 10:15 ` Carlos Maiolino 0 siblings, 1 reply; 9+ messages in thread From: Brian Foster @ 2017-06-08 16:43 UTC (permalink / raw) To: Carlos Maiolino; +Cc: linux-xfs On Thu, Jun 08, 2017 at 04:00:13PM +0200, Carlos Maiolino wrote: > With the current code, XFS never re-submit a failed buffer for IO, > because the failed item in the buffer is kept in the flush locked state > forever. > > To be able to resubmit an log item for IO, we need a way to mark an item > as failed, if, for any reason the buffer which the item belonged to > failed during writeback. > > Add a new log item callback to be used after an IO completion failure > and make the needed clean ups. > > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> > --- > > V2: > - Update commit log to include a better description of why this > patch is needed and fix spelling mistakes > - Move xfs_buf_do_callbacks_fail() call into > xfs_buf_iodone_callback_error, so the callbacks can be executed > before the buffer is released, and only after it has been > retried once > > V3: > - fix some loops according to hch suggestion > - whitespace cleanup > > fs/xfs/xfs_buf_item.c | 23 ++++++++++++++++++++++- > fs/xfs/xfs_trans.h | 7 +++++-- > 2 files changed, 27 insertions(+), 3 deletions(-) > > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c > index 0306168..523b7a4 100644 > --- a/fs/xfs/xfs_buf_item.c > +++ b/fs/xfs/xfs_buf_item.c > @@ -29,6 +29,7 @@ > #include "xfs_error.h" > #include "xfs_trace.h" > #include "xfs_log.h" > +#include "xfs_inode.h" > > > kmem_zone_t *xfs_buf_item_zone; > @@ -1051,6 +1052,20 @@ xfs_buf_do_callbacks( > } > } > > +STATIC void > +xfs_buf_do_callbacks_fail( > + struct xfs_buf *bp) > +{ > + struct xfs_log_item *lip, *next; > + unsigned int bflags = bp->b_flags; > + > + for (lip = bp->b_fspriv; lip; lip = next) { > + next = lip->li_bio_list; > + if (lip->li_ops->iop_error) > + lip->li_ops->iop_error(lip, bp, bflags); There's still no need to pass bflags here, particularly since we pass along the buffer already. (I'm still questioning the need for ->iop_error(), but I'll leave that as a final review question once everything else is in order. :P) > + } > +} > + > static bool > xfs_buf_iodone_callback_error( > struct xfs_buf *bp) > @@ -1120,8 +1135,14 @@ xfs_buf_iodone_callback_error( > if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount) > goto permanent_error; > > - /* still a transient error, higher layers will retry */ > + /* > + * still a transient error, run IO completion failure callbacks and > + * let the higher layers retry the buffer. > + * */ > xfs_buf_ioerror(bp, 0); > + > + /* run failure callbacks before releasing buffer */ > + xfs_buf_do_callbacks_fail(bp); I think in principle it might be more appropriate to invoke the callbacks before we reset the I/O error. That way the failure callbacks see the buffer in the error state, then we reset the error and release it back to the AIL (so to speak). Otherwise the rest looks sane to me. 
Brian > xfs_buf_relse(bp); > return true; > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h > index a07acbf..3486517 100644 > --- a/fs/xfs/xfs_trans.h > +++ b/fs/xfs/xfs_trans.h > @@ -64,11 +64,13 @@ typedef struct xfs_log_item { > } xfs_log_item_t; > > #define XFS_LI_IN_AIL 0x1 > -#define XFS_LI_ABORTED 0x2 > +#define XFS_LI_ABORTED 0x2 > +#define XFS_LI_FAILED 0x4 > > #define XFS_LI_FLAGS \ > { XFS_LI_IN_AIL, "IN_AIL" }, \ > - { XFS_LI_ABORTED, "ABORTED" } > + { XFS_LI_ABORTED, "ABORTED" }, \ > + { XFS_LI_FAILED, "FAILED" } > > struct xfs_item_ops { > void (*iop_size)(xfs_log_item_t *, int *, int *); > @@ -79,6 +81,7 @@ struct xfs_item_ops { > void (*iop_unlock)(xfs_log_item_t *); > xfs_lsn_t (*iop_committed)(xfs_log_item_t *, xfs_lsn_t); > void (*iop_committing)(xfs_log_item_t *, xfs_lsn_t); > + void (*iop_error)(xfs_log_item_t *, xfs_buf_t *, unsigned int bflags); > }; > > void xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item, > -- > 2.9.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
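[To make the ordering point in the review above concrete, a minimal sketch of the tail of xfs_buf_iodone_callback_error() with the failure callbacks run while the buffer still carries its error. This only illustrates the suggested order; it is not the committed code.]

	/*
	 * Still a transient error: run the IO completion failure callbacks
	 * while bp->b_error is still set, then clear the error and release
	 * the buffer so the higher layers can retry it.  (Sketch of the
	 * ordering suggested in review.)
	 */
	xfs_buf_do_callbacks_fail(bp);
	xfs_buf_ioerror(bp, 0);
	xfs_buf_relse(bp);
	return true;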
* Re: [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure 2017-06-08 16:43 ` Brian Foster @ 2017-06-14 10:15 ` Carlos Maiolino 0 siblings, 0 replies; 9+ messages in thread From: Carlos Maiolino @ 2017-06-14 10:15 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs > > +STATIC void > > +xfs_buf_do_callbacks_fail( > > + struct xfs_buf *bp) > > +{ > > + struct xfs_log_item *lip, *next; > > + unsigned int bflags = bp->b_flags; > > + > > + for (lip = bp->b_fspriv; lip; lip = next) { > > + next = lip->li_bio_list; > > + if (lip->li_ops->iop_error) > > + lip->li_ops->iop_error(lip, bp, bflags); > > There's still no need to pass bflags here, particularly since we pass > along the buffer already. (I'm still questioning the need for > ->iop_error(), but I'll leave that as a final review question once > everything else is in order. :P) > Agreed regarding bflags, removed for V4. Regarding iop_error(), well, I'm not discussing it anymore :) > > + } > > +} > > + > > static bool > > xfs_buf_iodone_callback_error( > > struct xfs_buf *bp) > > @@ -1120,8 +1135,14 @@ xfs_buf_iodone_callback_error( > > if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount) > > goto permanent_error; > > > > - /* still a transient error, higher layers will retry */ > > + /* > > + * still a transient error, run IO completion failure callbacks and > > + * let the higher layers retry the buffer. > > + * */ > > xfs_buf_ioerror(bp, 0); > > + > > + /* run failure callbacks before releasing buffer */ > > + xfs_buf_do_callbacks_fail(bp); > > I think in principle it might be more appropriate to invoke the > callbacks before we reset the I/O error. That way the failure callbacks > see the buffer in the error state, then we reset the error and release > it back to the AIL (so to speak). > Agreed, on V4. > Otherwise the rest looks sane to me. > Thanks for the review > Brian > > > xfs_buf_relse(bp); > > return true; > > > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h > > index a07acbf..3486517 100644 > > --- a/fs/xfs/xfs_trans.h > > +++ b/fs/xfs/xfs_trans.h > > @@ -64,11 +64,13 @@ typedef struct xfs_log_item { > > } xfs_log_item_t; > > > > #define XFS_LI_IN_AIL 0x1 > > -#define XFS_LI_ABORTED 0x2 > > +#define XFS_LI_ABORTED 0x2 > > +#define XFS_LI_FAILED 0x4 > > > > #define XFS_LI_FLAGS \ > > { XFS_LI_IN_AIL, "IN_AIL" }, \ > > - { XFS_LI_ABORTED, "ABORTED" } > > + { XFS_LI_ABORTED, "ABORTED" }, \ > > + { XFS_LI_FAILED, "FAILED" } > > > > struct xfs_item_ops { > > void (*iop_size)(xfs_log_item_t *, int *, int *); > > @@ -79,6 +81,7 @@ struct xfs_item_ops { > > void (*iop_unlock)(xfs_log_item_t *); > > xfs_lsn_t (*iop_committed)(xfs_log_item_t *, xfs_lsn_t); > > void (*iop_committing)(xfs_log_item_t *, xfs_lsn_t); > > + void (*iop_error)(xfs_log_item_t *, xfs_buf_t *, unsigned int bflags); > > }; > > > > void xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item, > > -- > > 2.9.3 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Carlos ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback 2017-06-08 14:00 [PATCH 0/2 V3] Resubmit items failed during writeback Carlos Maiolino 2017-06-08 14:00 ` [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure Carlos Maiolino @ 2017-06-08 14:00 ` Carlos Maiolino 2017-06-08 16:45 ` Brian Foster 1 sibling, 1 reply; 9+ messages in thread From: Carlos Maiolino @ 2017-06-08 14:00 UTC (permalink / raw) To: linux-xfs When a buffer has been failed during writeback, the inode items into it are kept flush locked, and are never resubmitted due the flush lock, so, if any buffer fails to be written, the items in AIL are never written to disk and never unlocked. This causes unmount operation to hang due these items flush locked in AIL, but this also causes the items in AIL to never be written back, even when the IO device comes back to normal. I've been testing this patch with a DM-thin device, creating a filesystem larger than the real device. When writing enough data to fill the DM-thin device, XFS receives ENOSPC errors from the device, and keep spinning on xfsaild (when 'retry forever' configuration is set). At this point, the filesystem can not be unmounted because of the flush locked items in AIL, but worse, the items in AIL are never retried at all (once xfs_inode_item_push() will skip the items that are flush locked), even if the underlying DM-thin device is expanded to the proper size. This patch fixes both cases, retrying any item that has been failed previously, using the infra-structure provided by the previous patch. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> --- V2: - Fix XFS_LI_FAILED flag removal - Use atomic operations to set and clear XFS_LI_FAILED flag - Remove check for XBF_WRITE_FAIL in xfs_inode_item_push - Add more comments to the code - Add a helper function to resubmit the failed buffers, so this can be also used in dquot system without duplicating code V3: - drop xfs_imap_to_bp call using a pointer in the log item to hold the buffer address, fixing a possible infinite loop if xfs_map_to_bp returns -EIO - use xa_lock instead of atomic operations to handle log item flags - Add a hold to the buffer for each log item failed - move buffer resubmission up in xfs_inode_item_push() fs/xfs/xfs_buf_item.c | 39 +++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_buf_item.h | 2 ++ fs/xfs/xfs_inode_item.c | 44 +++++++++++++++++++++++++++++++++++++++++--- fs/xfs/xfs_trans.h | 1 + fs/xfs/xfs_trans_ail.c | 4 ++-- 5 files changed, 85 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 523b7a4..9b7bd23 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -1222,3 +1222,42 @@ xfs_buf_iodone( xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE); xfs_buf_item_free(BUF_ITEM(lip)); } + +/* + * Requeue a failed buffer for writeback + * + * Return true if the buffer has been re-queued properly, false otherwise + */ +bool +xfs_buf_resubmit_failed_buffers( + struct xfs_inode *ip, + struct xfs_log_item *lip, + struct list_head *buffer_list) +{ + struct xfs_log_item *next; + struct xfs_buf *bp; + bool ret; + + bp = lip->li_buf; + + /* Clear XFS_LI_FAILED flag from all items before resubmit + * + * XFS_LI_FAILED set/clear is protected by xa_lock, caller this + * function already have it acquired + */ + for (; lip; lip = next) { + next = lip->li_bio_list; + lip->li_flags &= ~XFS_LI_FAILED; + xfs_buf_rele(bp); + } + + /* Add this buffer back to the delayed 
write list */ + xfs_buf_lock(bp); + if (!xfs_buf_delwri_queue(bp, buffer_list)) + ret = false; + else + ret = true; + + xfs_buf_unlock(bp); + return ret; +} diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h index f7eba99..32aceae 100644 --- a/fs/xfs/xfs_buf_item.h +++ b/fs/xfs/xfs_buf_item.h @@ -70,6 +70,8 @@ void xfs_buf_attach_iodone(struct xfs_buf *, xfs_log_item_t *); void xfs_buf_iodone_callbacks(struct xfs_buf *); void xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *); +bool xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *, + struct list_head *); extern kmem_zone_t *xfs_buf_item_zone; diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index 08cb7d1..2f5a49e 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -27,6 +27,7 @@ #include "xfs_error.h" #include "xfs_trace.h" #include "xfs_trans_priv.h" +#include "xfs_buf_item.h" #include "xfs_log.h" @@ -475,6 +476,26 @@ xfs_inode_item_unpin( wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT); } +STATIC void +xfs_inode_item_error( + struct xfs_log_item *lip, + struct xfs_buf *bp, + unsigned int bflags) +{ + + /* + * The buffer writeback containing this inode has been failed + * mark it as failed and unlock the flush lock, so it can be retried + * again. + * Use xa_lock to avoid races with other log item state changes. + */ + spin_lock(&lip->li_ailp->xa_lock); + xfs_buf_hold(bp); + lip->li_flags |= XFS_LI_FAILED; + lip->li_buf = bp; + spin_unlock(&lip->li_ailp->xa_lock); +} + STATIC uint xfs_inode_item_push( struct xfs_log_item *lip, @@ -504,6 +525,18 @@ xfs_inode_item_push( } /* + * The buffer containing this item failed to be written back + * previously. Resubmit the buffer for IO. + */ + if (lip->li_flags & XFS_LI_FAILED) { + if (!xfs_buf_resubmit_failed_buffers(ip, lip, + buffer_list)) + rval = XFS_ITEM_FLUSHING; + + goto out_unlock; + } + + /* * Stale inode items should force out the iclog. */ if (ip->i_flags & XFS_ISTALE) { @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = { .iop_unlock = xfs_inode_item_unlock, .iop_committed = xfs_inode_item_committed, .iop_push = xfs_inode_item_push, - .iop_committing = xfs_inode_item_committing + .iop_committing = xfs_inode_item_committing, + .iop_error = xfs_inode_item_error }; @@ -710,7 +744,8 @@ xfs_iflush_done( * the AIL lock. */ iip = INODE_ITEM(blip); - if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) + if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) || + lip->li_flags & XFS_LI_FAILED) need_ail++; blip = next; @@ -718,7 +753,8 @@ xfs_iflush_done( /* make sure we capture the state of the initial inode. 
*/ iip = INODE_ITEM(lip); - if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) + if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) || + lip->li_flags & XFS_LI_FAILED) need_ail++; /* @@ -739,6 +775,8 @@ xfs_iflush_done( if (INODE_ITEM(blip)->ili_logged && blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) mlip_changed |= xfs_ail_delete_one(ailp, blip); + else if (blip->li_flags & XFS_LI_FAILED) + blip->li_flags &= ~XFS_LI_FAILED; } if (mlip_changed) { diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 3486517..86c3464 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -49,6 +49,7 @@ typedef struct xfs_log_item { struct xfs_ail *li_ailp; /* ptr to AIL */ uint li_type; /* item type */ uint li_flags; /* misc flags */ + struct xfs_buf *li_buf; /* real buffer pointer */ struct xfs_log_item *li_bio_list; /* buffer item list */ void (*li_cb)(struct xfs_buf *, struct xfs_log_item *); diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 9056c0f..36e08dd 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk( bool xfs_ail_delete_one( struct xfs_ail *ailp, - struct xfs_log_item *lip) + struct xfs_log_item *lip) { struct xfs_log_item *mlip = xfs_ail_min(ailp); trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn); xfs_ail_delete(ailp, lip); - lip->li_flags &= ~XFS_LI_IN_AIL; + lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED); lip->li_lsn = 0; return mlip == lip; -- 2.9.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
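[For context on the livelock described in the commit message of patch 2, a simplified paraphrase (not a verbatim excerpt) of the pre-patch check in xfs_inode_item_push() that causes flush locked items to be skipped forever once their buffer write has failed.]

	/*
	 * Paraphrased pre-patch behaviour: an inode that is already flush
	 * locked is reported as "flushing" and skipped.  If the flush lock
	 * is held because the buffer write failed, nothing ever releases
	 * it, so every subsequent AIL push lands here and the buffer is
	 * never resubmitted.
	 */
	if (!xfs_iflock_nowait(ip)) {
		rval = XFS_ITEM_FLUSHING;
		goto out_unlock;
	}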
* Re: [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback 2017-06-08 14:00 ` [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback Carlos Maiolino @ 2017-06-08 16:45 ` Brian Foster 2017-06-12 12:44 ` Carlos Maiolino 0 siblings, 1 reply; 9+ messages in thread From: Brian Foster @ 2017-06-08 16:45 UTC (permalink / raw) To: Carlos Maiolino; +Cc: linux-xfs On Thu, Jun 08, 2017 at 04:00:14PM +0200, Carlos Maiolino wrote: > When a buffer has been failed during writeback, the inode items into it > are kept flush locked, and are never resubmitted due the flush lock, so, > if any buffer fails to be written, the items in AIL are never written to > disk and never unlocked. > > This causes unmount operation to hang due these items flush locked in AIL, > but this also causes the items in AIL to never be written back, even when > the IO device comes back to normal. > > I've been testing this patch with a DM-thin device, creating a > filesystem larger than the real device. > > When writing enough data to fill the DM-thin device, XFS receives ENOSPC > errors from the device, and keep spinning on xfsaild (when 'retry > forever' configuration is set). > > At this point, the filesystem can not be unmounted because of the flush locked > items in AIL, but worse, the items in AIL are never retried at all > (once xfs_inode_item_push() will skip the items that are flush locked), > even if the underlying DM-thin device is expanded to the proper size. > > This patch fixes both cases, retrying any item that has been failed > previously, using the infra-structure provided by the previous patch. > > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> > --- > V2: > - Fix XFS_LI_FAILED flag removal > - Use atomic operations to set and clear XFS_LI_FAILED flag > - Remove check for XBF_WRITE_FAIL in xfs_inode_item_push > - Add more comments to the code > - Add a helper function to resubmit the failed buffers, so this > can be also used in dquot system without duplicating code > > V3: > - drop xfs_imap_to_bp call using a pointer in the log item to > hold the buffer address, fixing a possible infinite loop if > xfs_map_to_bp returns -EIO > - use xa_lock instead of atomic operations to handle log item > flags > - Add a hold to the buffer for each log item failed > - move buffer resubmission up in xfs_inode_item_push() > > fs/xfs/xfs_buf_item.c | 39 +++++++++++++++++++++++++++++++++++++++ > fs/xfs/xfs_buf_item.h | 2 ++ > fs/xfs/xfs_inode_item.c | 44 +++++++++++++++++++++++++++++++++++++++++--- > fs/xfs/xfs_trans.h | 1 + > fs/xfs/xfs_trans_ail.c | 4 ++-- > 5 files changed, 85 insertions(+), 5 deletions(-) > > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c > index 523b7a4..9b7bd23 100644 > --- a/fs/xfs/xfs_buf_item.c > +++ b/fs/xfs/xfs_buf_item.c > @@ -1222,3 +1222,42 @@ xfs_buf_iodone( > xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE); > xfs_buf_item_free(BUF_ITEM(lip)); > } > + > +/* > + * Requeue a failed buffer for writeback > + * > + * Return true if the buffer has been re-queued properly, false otherwise > + */ > +bool > +xfs_buf_resubmit_failed_buffers( > + struct xfs_inode *ip, > + struct xfs_log_item *lip, > + struct list_head *buffer_list) > +{ > + struct xfs_log_item *next; > + struct xfs_buf *bp; > + bool ret; > + > + bp = lip->li_buf; > + > + /* Clear XFS_LI_FAILED flag from all items before resubmit > + * > + * XFS_LI_FAILED set/clear is protected by xa_lock, caller this > + * function already have it 
acquired > + */ > + for (; lip; lip = next) { > + next = lip->li_bio_list; > + lip->li_flags &= ~XFS_LI_FAILED; > + xfs_buf_rele(bp); We need to check that the log item has LI_FAILED set before we release the buffer because it's possible for the buffer to have a mix of failed and recently flushed (not failed) items. We should set ->li_buf = NULL wherever we clear LI_FAILED as well. A couple small inline helpers to fail/unfail log items correctly (i.e., xfs_log_item_[fail|clear]() or some such) might help to make sure we get the reference counting right. We could also assert that the correct lock is held from those helpers. > + } > + > + /* Add this buffer back to the delayed write list */ > + xfs_buf_lock(bp); > + if (!xfs_buf_delwri_queue(bp, buffer_list)) > + ret = false; > + else > + ret = true; > + > + xfs_buf_unlock(bp); > + return ret; > +} > diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h > index f7eba99..32aceae 100644 > --- a/fs/xfs/xfs_buf_item.h > +++ b/fs/xfs/xfs_buf_item.h > @@ -70,6 +70,8 @@ void xfs_buf_attach_iodone(struct xfs_buf *, > xfs_log_item_t *); > void xfs_buf_iodone_callbacks(struct xfs_buf *); > void xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *); > +bool xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *, > + struct list_head *); > > extern kmem_zone_t *xfs_buf_item_zone; > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c > index 08cb7d1..2f5a49e 100644 > --- a/fs/xfs/xfs_inode_item.c > +++ b/fs/xfs/xfs_inode_item.c > @@ -27,6 +27,7 @@ > #include "xfs_error.h" > #include "xfs_trace.h" > #include "xfs_trans_priv.h" > +#include "xfs_buf_item.h" > #include "xfs_log.h" > > > @@ -475,6 +476,26 @@ xfs_inode_item_unpin( > wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT); > } > > +STATIC void > +xfs_inode_item_error( > + struct xfs_log_item *lip, > + struct xfs_buf *bp, > + unsigned int bflags) > +{ > + > + /* > + * The buffer writeback containing this inode has been failed > + * mark it as failed and unlock the flush lock, so it can be retried > + * again. > + * Use xa_lock to avoid races with other log item state changes. > + */ > + spin_lock(&lip->li_ailp->xa_lock); I'd push the ->xa_lock up into caller and cycle it once over the list rather than once per item. This is more consistent with the locking pattern for successful I/Os. > + xfs_buf_hold(bp); > + lip->li_flags |= XFS_LI_FAILED; > + lip->li_buf = bp; Also, while I think this is technically Ok based on the current implementation, I think we should also test that LI_FAILED is not already set and only hold the buf in that case (i.e., the previously mentioned helper). This helps prevent a landmine if the implementation/locking/etc. changes down the road. I think it would be fine to assert that LI_FAILED is not set already if we wanted to, though, as long as there's a brief comment as well. > + spin_unlock(&lip->li_ailp->xa_lock); > +} > + > STATIC uint > xfs_inode_item_push( > struct xfs_log_item *lip, > @@ -504,6 +525,18 @@ xfs_inode_item_push( > } > > /* > + * The buffer containing this item failed to be written back > + * previously. Resubmit the buffer for IO. > + */ > + if (lip->li_flags & XFS_LI_FAILED) { > + if (!xfs_buf_resubmit_failed_buffers(ip, lip, > + buffer_list)) > + rval = XFS_ITEM_FLUSHING; > + > + goto out_unlock; > + } I think this needs to go at least before the ilock. We don't need the latter lock because the inode is already flushed. Somebody else could be holding ilock and blocked on the flush lock. 
If that occurs, we'll never make progress. I also think we should lock/unlock the buffer here rather than down in the helper. Hmm, maybe even queue it as well so it's clear what's going on. For example, something like: if (lip->li_flags & XFS_LI_FAILED) { xfs_buf_lock(bp); xfs_buf_clear_failed_items(bp); if (!xfs_buf_delwri_queue(bp, buffer_list)) rval = XFS_ITEM_FLUSHING; xfs_buf_unlock(bp); goto out_unlock; } ... but that is mostly just aesthetic. > + > + /* > * Stale inode items should force out the iclog. > */ > if (ip->i_flags & XFS_ISTALE) { > @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = { > .iop_unlock = xfs_inode_item_unlock, > .iop_committed = xfs_inode_item_committed, > .iop_push = xfs_inode_item_push, > - .iop_committing = xfs_inode_item_committing > + .iop_committing = xfs_inode_item_committing, > + .iop_error = xfs_inode_item_error > }; > > > @@ -710,7 +744,8 @@ xfs_iflush_done( > * the AIL lock. > */ > iip = INODE_ITEM(blip); > - if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) > + if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) || > + lip->li_flags & XFS_LI_FAILED) > need_ail++; > > blip = next; > @@ -718,7 +753,8 @@ xfs_iflush_done( > > /* make sure we capture the state of the initial inode. */ > iip = INODE_ITEM(lip); > - if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) > + if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) || > + lip->li_flags & XFS_LI_FAILED) > need_ail++; > > /* > @@ -739,6 +775,8 @@ xfs_iflush_done( > if (INODE_ITEM(blip)->ili_logged && > blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) > mlip_changed |= xfs_ail_delete_one(ailp, blip); > + else if (blip->li_flags & XFS_LI_FAILED) > + blip->li_flags &= ~XFS_LI_FAILED; This needs to do the full "unfail" sequence as well (another place to call the helper). > } > > if (mlip_changed) { > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h > index 3486517..86c3464 100644 > --- a/fs/xfs/xfs_trans.h > +++ b/fs/xfs/xfs_trans.h > @@ -49,6 +49,7 @@ typedef struct xfs_log_item { > struct xfs_ail *li_ailp; /* ptr to AIL */ > uint li_type; /* item type */ > uint li_flags; /* misc flags */ > + struct xfs_buf *li_buf; /* real buffer pointer */ > struct xfs_log_item *li_bio_list; /* buffer item list */ > void (*li_cb)(struct xfs_buf *, > struct xfs_log_item *); > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > index 9056c0f..36e08dd 100644 > --- a/fs/xfs/xfs_trans_ail.c > +++ b/fs/xfs/xfs_trans_ail.c > @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk( > bool > xfs_ail_delete_one( > struct xfs_ail *ailp, > - struct xfs_log_item *lip) > + struct xfs_log_item *lip) > { > struct xfs_log_item *mlip = xfs_ail_min(ailp); > > trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn); > xfs_ail_delete(ailp, lip); > - lip->li_flags &= ~XFS_LI_IN_AIL; > + lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED); ... and the same thing here. Brian > lip->li_lsn = 0; > > return mlip == lip; > -- > 2.9.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
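[A minimal sketch of the kind of set/clear helpers suggested in the review above. The names, the lockdep assertion, and the exact placement are assumptions for illustration; the point is that the buffer reference is taken or dropped exactly once per state change, so the hold/rele counts stay balanced even when a buffer carries a mix of failed and successfully flushed items.]

static inline void
xfs_set_li_failed(
	struct xfs_log_item	*lip,
	struct xfs_buf		*bp)
{
	lockdep_assert_held(&lip->li_ailp->xa_lock);

	/* only take the buffer reference on the first failure */
	if (!(lip->li_flags & XFS_LI_FAILED)) {
		xfs_buf_hold(bp);
		lip->li_flags |= XFS_LI_FAILED;
		lip->li_buf = bp;
	}
}

static inline void
xfs_clear_li_failed(
	struct xfs_log_item	*lip)
{
	struct xfs_buf		*bp = lip->li_buf;

	lockdep_assert_held(&lip->li_ailp->xa_lock);

	/* drop the reference only if this item was actually failed */
	if (lip->li_flags & XFS_LI_FAILED) {
		lip->li_flags &= ~XFS_LI_FAILED;
		lip->li_buf = NULL;
		xfs_buf_rele(bp);
	}
}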
* Re: [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback 2017-06-08 16:45 ` Brian Foster @ 2017-06-12 12:44 ` Carlos Maiolino 2017-06-12 13:45 ` Brian Foster 0 siblings, 1 reply; 9+ messages in thread From: Carlos Maiolino @ 2017-06-12 12:44 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs Hey. > > +/* > > + * Requeue a failed buffer for writeback > > + * > > + * Return true if the buffer has been re-queued properly, false otherwise > > + */ > > +bool > > +xfs_buf_resubmit_failed_buffers( > > + struct xfs_inode *ip, > > + struct xfs_log_item *lip, > > + struct list_head *buffer_list) > > +{ > > + struct xfs_log_item *next; > > + struct xfs_buf *bp; > > + bool ret; > > + > > + bp = lip->li_buf; > > + > > + /* Clear XFS_LI_FAILED flag from all items before resubmit > > + * > > + * XFS_LI_FAILED set/clear is protected by xa_lock, caller this > > + * function already have it acquired > > + */ > > + for (; lip; lip = next) { > > + next = lip->li_bio_list; > > + lip->li_flags &= ~XFS_LI_FAILED; > > + xfs_buf_rele(bp); > > We need to check that the log item has LI_FAILED set before we release > the buffer because it's possible for the buffer to have a mix of failed > and recently flushed (not failed) items. We should set ->li_buf = NULL > wherever we clear LI_FAILED as well. > > A couple small inline helpers to fail/unfail log items correctly (i.e., > xfs_log_item_[fail|clear]() or some such) might help to make sure we get > the reference counting right. We could also assert that the correct lock > is held from those helpers. > Agreed, will queue it for V4 > > + } > > + > > + /* Add this buffer back to the delayed write list */ > > + xfs_buf_lock(bp); > > + if (!xfs_buf_delwri_queue(bp, buffer_list)) > > + ret = false; > > + else > > + ret = true; > > + > > + xfs_buf_unlock(bp); > > + return ret; > > +} > > diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h > > index f7eba99..32aceae 100644 > > --- a/fs/xfs/xfs_buf_item.h > > +++ b/fs/xfs/xfs_buf_item.h > > @@ -70,6 +70,8 @@ void xfs_buf_attach_iodone(struct xfs_buf *, > > xfs_log_item_t *); > > void xfs_buf_iodone_callbacks(struct xfs_buf *); > > void xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *); > > +bool xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *, > > + struct list_head *); > > > > extern kmem_zone_t *xfs_buf_item_zone; > > > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c > > index 08cb7d1..2f5a49e 100644 > > --- a/fs/xfs/xfs_inode_item.c > > +++ b/fs/xfs/xfs_inode_item.c > > @@ -27,6 +27,7 @@ > > #include "xfs_error.h" > > #include "xfs_trace.h" > > #include "xfs_trans_priv.h" > > +#include "xfs_buf_item.h" > > #include "xfs_log.h" > > > > > > @@ -475,6 +476,26 @@ xfs_inode_item_unpin( > > wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT); > > } > > > > +STATIC void > > +xfs_inode_item_error( > > + struct xfs_log_item *lip, > > + struct xfs_buf *bp, > > + unsigned int bflags) > > +{ > > + > > + /* > > + * The buffer writeback containing this inode has been failed > > + * mark it as failed and unlock the flush lock, so it can be retried > > + * again. > > + * Use xa_lock to avoid races with other log item state changes. > > + */ > > + spin_lock(&lip->li_ailp->xa_lock); > > I'd push the ->xa_lock up into caller and cycle it once over the list > rather than once per item. This is more consistent with the locking > pattern for successful I/Os. 
> I considered it too, and decided to use spin_lock/unlock on each interaction to avoid having it locked for a long time, but I don't think there is any harm in using a single lock/unlock. > > + xfs_buf_hold(bp); > > + lip->li_flags |= XFS_LI_FAILED; > > + lip->li_buf = bp; > > Also, while I think this is technically Ok based on the current > implementation, I think we should also test that LI_FAILED is not > already set and only hold the buf in that case (i.e., the previously > mentioned helper). This helps prevent a landmine if the > implementation/locking/etc. changes down the road. > > I think it would be fine to assert that LI_FAILED is not set already if > we wanted to, though, as long as there's a brief comment as well. > > > + spin_unlock(&lip->li_ailp->xa_lock); > > +} > > + > > STATIC uint > > xfs_inode_item_push( > > struct xfs_log_item *lip, > > @@ -504,6 +525,18 @@ xfs_inode_item_push( > > } > > > > /* > > + * The buffer containing this item failed to be written back > > + * previously. Resubmit the buffer for IO. > > + */ > > + if (lip->li_flags & XFS_LI_FAILED) { > > + if (!xfs_buf_resubmit_failed_buffers(ip, lip, > > + buffer_list)) > > + rval = XFS_ITEM_FLUSHING; > > + > > + goto out_unlock; > > + } > > I think this needs to go at least before the ilock. We don't need the > latter lock because the inode is already flushed. Somebody else could be > holding ilock and blocked on the flush lock. If that occurs, we'll never > make progress. It should be after the ilock IIRC, having it before the ilock will deadlock it any unmount when there are still inodes to be destroyed. > > I also think we should lock/unlock the buffer here rather than down in > the helper. Hmm, maybe even queue it as well so it's clear what's going > on. For example, something like: > > if (lip->li_flags & XFS_LI_FAILED) { > xfs_buf_lock(bp); > xfs_buf_clear_failed_items(bp); > if (!xfs_buf_delwri_queue(bp, buffer_list)) > rval = XFS_ITEM_FLUSHING; > xfs_buf_unlock(bp); > goto out_unlock; > } > Well, we could do that, but we would need to duplicate this code between inode and dquot item, while keeping the helper as-is, we can re-use it in both inode and dquot. > ... but that is mostly just aesthetic. > > > + > > + /* > > * Stale inode items should force out the iclog. > > */ > > if (ip->i_flags & XFS_ISTALE) { > > @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = { > > .iop_unlock = xfs_inode_item_unlock, > > .iop_committed = xfs_inode_item_committed, > > .iop_push = xfs_inode_item_push, > > - .iop_committing = xfs_inode_item_committing > > + .iop_committing = xfs_inode_item_committing, > > + .iop_error = xfs_inode_item_error > > }; > > > > > > @@ -710,7 +744,8 @@ xfs_iflush_done( > > * the AIL lock. > > */ > > iip = INODE_ITEM(blip); > > - if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) > > + if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) || > > + lip->li_flags & XFS_LI_FAILED) > > need_ail++; > > > > blip = next; > > @@ -718,7 +753,8 @@ xfs_iflush_done( > > > > /* make sure we capture the state of the initial inode. 
*/ > > iip = INODE_ITEM(lip); > > - if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) > > + if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) || > > + lip->li_flags & XFS_LI_FAILED) > > need_ail++; > > > > /* > > @@ -739,6 +775,8 @@ xfs_iflush_done( > > if (INODE_ITEM(blip)->ili_logged && > > blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) > > mlip_changed |= xfs_ail_delete_one(ailp, blip); > > + else if (blip->li_flags & XFS_LI_FAILED) > > + blip->li_flags &= ~XFS_LI_FAILED; > > This needs to do the full "unfail" sequence as well (another place to > call the helper). > > > } > > > > if (mlip_changed) { > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h > > index 3486517..86c3464 100644 > > --- a/fs/xfs/xfs_trans.h > > +++ b/fs/xfs/xfs_trans.h > > @@ -49,6 +49,7 @@ typedef struct xfs_log_item { > > struct xfs_ail *li_ailp; /* ptr to AIL */ > > uint li_type; /* item type */ > > uint li_flags; /* misc flags */ > > + struct xfs_buf *li_buf; /* real buffer pointer */ > > struct xfs_log_item *li_bio_list; /* buffer item list */ > > void (*li_cb)(struct xfs_buf *, > > struct xfs_log_item *); > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > > index 9056c0f..36e08dd 100644 > > --- a/fs/xfs/xfs_trans_ail.c > > +++ b/fs/xfs/xfs_trans_ail.c > > @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk( > > bool > > xfs_ail_delete_one( > > struct xfs_ail *ailp, > > - struct xfs_log_item *lip) > > + struct xfs_log_item *lip) > > { > > struct xfs_log_item *mlip = xfs_ail_min(ailp); > > > > trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn); > > xfs_ail_delete(ailp, lip); > > - lip->li_flags &= ~XFS_LI_IN_AIL; > > + lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED); > > ... and the same thing here. > I agree with the helpers, and I'll queue them up for V4, I'll just wait for some more reviews on this version before sending V4. Thanks for the review Brian. Cheers -- Carlos ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback 2017-06-12 12:44 ` Carlos Maiolino @ 2017-06-12 13:45 ` Brian Foster 2017-06-14 12:29 ` Carlos Maiolino 0 siblings, 1 reply; 9+ messages in thread From: Brian Foster @ 2017-06-12 13:45 UTC (permalink / raw) To: linux-xfs On Mon, Jun 12, 2017 at 02:44:23PM +0200, Carlos Maiolino wrote: > Hey. > > > > +/* > > > + * Requeue a failed buffer for writeback > > > + * > > > + * Return true if the buffer has been re-queued properly, false otherwise > > > + */ > > > +bool > > > +xfs_buf_resubmit_failed_buffers( > > > + struct xfs_inode *ip, > > > + struct xfs_log_item *lip, > > > + struct list_head *buffer_list) > > > +{ > > > + struct xfs_log_item *next; > > > + struct xfs_buf *bp; > > > + bool ret; > > > + > > > + bp = lip->li_buf; > > > + > > > + /* Clear XFS_LI_FAILED flag from all items before resubmit > > > + * > > > + * XFS_LI_FAILED set/clear is protected by xa_lock, caller this > > > + * function already have it acquired > > > + */ > > > + for (; lip; lip = next) { > > > + next = lip->li_bio_list; > > > + lip->li_flags &= ~XFS_LI_FAILED; > > > + xfs_buf_rele(bp); > > > > We need to check that the log item has LI_FAILED set before we release > > the buffer because it's possible for the buffer to have a mix of failed > > and recently flushed (not failed) items. We should set ->li_buf = NULL > > wherever we clear LI_FAILED as well. > > > > A couple small inline helpers to fail/unfail log items correctly (i.e., > > xfs_log_item_[fail|clear]() or some such) might help to make sure we get > > the reference counting right. We could also assert that the correct lock > > is held from those helpers. > > > > Agreed, will queue it for V4 > > > > + } > > > + > > > + /* Add this buffer back to the delayed write list */ > > > + xfs_buf_lock(bp); > > > + if (!xfs_buf_delwri_queue(bp, buffer_list)) > > > + ret = false; > > > + else > > > + ret = true; > > > + > > > + xfs_buf_unlock(bp); > > > + return ret; > > > +} > > > diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h > > > index f7eba99..32aceae 100644 > > > --- a/fs/xfs/xfs_buf_item.h > > > +++ b/fs/xfs/xfs_buf_item.h > > > @@ -70,6 +70,8 @@ void xfs_buf_attach_iodone(struct xfs_buf *, > > > xfs_log_item_t *); > > > void xfs_buf_iodone_callbacks(struct xfs_buf *); > > > void xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *); > > > +bool xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *, > > > + struct list_head *); > > > > > > extern kmem_zone_t *xfs_buf_item_zone; > > > > > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c > > > index 08cb7d1..2f5a49e 100644 > > > --- a/fs/xfs/xfs_inode_item.c > > > +++ b/fs/xfs/xfs_inode_item.c > > > @@ -27,6 +27,7 @@ > > > #include "xfs_error.h" > > > #include "xfs_trace.h" > > > #include "xfs_trans_priv.h" > > > +#include "xfs_buf_item.h" > > > #include "xfs_log.h" > > > > > > > > > @@ -475,6 +476,26 @@ xfs_inode_item_unpin( > > > wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT); > > > } > > > > > > +STATIC void > > > +xfs_inode_item_error( > > > + struct xfs_log_item *lip, > > > + struct xfs_buf *bp, > > > + unsigned int bflags) > > > +{ > > > + > > > + /* > > > + * The buffer writeback containing this inode has been failed > > > + * mark it as failed and unlock the flush lock, so it can be retried > > > + * again. > > > + * Use xa_lock to avoid races with other log item state changes. 
> > > + */ > > > + spin_lock(&lip->li_ailp->xa_lock); > > > > I'd push the ->xa_lock up into caller and cycle it once over the list > > rather than once per item. This is more consistent with the locking > > pattern for successful I/Os. > > > > I considered it too, and decided to use spin_lock/unlock on each interaction to > avoid having it locked for a long time, but I don't think there is any harm in > using a single lock/unlock. > I don't think we're too concerned about performance at this low a level when when I/Os are failing (or yet, at least), but I'm guessing that pulling up the lock is still lighter weight than what a successful I/O completion does. That aside, I think we also want the entire sequence of marking LI_FAILED to be atomic with respect to the AIL (e.g., so the next AIL push sees the state of the items before the errors are processed or after, not somewhere in between). > > > + xfs_buf_hold(bp); > > > + lip->li_flags |= XFS_LI_FAILED; > > > + lip->li_buf = bp; > > > > Also, while I think this is technically Ok based on the current > > implementation, I think we should also test that LI_FAILED is not > > already set and only hold the buf in that case (i.e., the previously > > mentioned helper). This helps prevent a landmine if the > > implementation/locking/etc. changes down the road. > > > > I think it would be fine to assert that LI_FAILED is not set already if > > we wanted to, though, as long as there's a brief comment as well. > > > > > + spin_unlock(&lip->li_ailp->xa_lock); > > > +} > > > + > > > STATIC uint > > > xfs_inode_item_push( > > > struct xfs_log_item *lip, > > > @@ -504,6 +525,18 @@ xfs_inode_item_push( > > > } > > > > > > /* > > > + * The buffer containing this item failed to be written back > > > + * previously. Resubmit the buffer for IO. > > > + */ > > > + if (lip->li_flags & XFS_LI_FAILED) { > > > + if (!xfs_buf_resubmit_failed_buffers(ip, lip, > > > + buffer_list)) > > > + rval = XFS_ITEM_FLUSHING; > > > + > > > + goto out_unlock; > > > + } > > > > I think this needs to go at least before the ilock. We don't need the > > latter lock because the inode is already flushed. Somebody else could be > > holding ilock and blocked on the flush lock. If that occurs, we'll never > > make progress. > > It should be after the ilock IIRC, having it before the ilock will deadlock it > any unmount when there are still inodes to be destroyed. Hmm.. so if we get into the failed state on an inode and a reclaim attempt occurs on the inode, xfs_reclaim_inode() can grab the ilock and then block on the flush lock. With this code as it is, we'll now never resubmit the failed buffer (unless via some other inode, by chance) because we can't get the inode lock. IOW, we're stuck in a livelock that resembles the original problem. AFAICT, we don't need the inode lock in this case at all. The only reason for it here that I'm aware of is to follow the ilock -> flush lock ordering rules and we know the inode is already flush locked in the failed case. Can you elaborate on the unmount deadlock issue if we move this before the ilock? Shouldn't the I/O error handling detect the unmounting state the next go around and flag it as a permanent error? I'm wondering if there is some other bug lurking there that we need to work around... (Note that I think it's important we resolve this above all else, because getting this correct is probably what dictates whether this is a viable approach or that we need to start looking at something like Dave's flush lock rework idea.) 
> > > > I also think we should lock/unlock the buffer here rather than down in > > the helper. Hmm, maybe even queue it as well so it's clear what's going > > on. For example, something like: > > > > if (lip->li_flags & XFS_LI_FAILED) { > > xfs_buf_lock(bp); > > xfs_buf_clear_failed_items(bp); > > if (!xfs_buf_delwri_queue(bp, buffer_list)) > > rval = XFS_ITEM_FLUSHING; > > xfs_buf_unlock(bp); > > goto out_unlock; > > } > > > > Well, we could do that, but we would need to duplicate this code between inode > and dquot item, while keeping the helper as-is, we can re-use it in both inode > and dquot. > That doesn't seem so bad to me. It's basically just a lock and delwri queue of a buffer. That said, we could also create a couple separate helpers here: one that requeues a buffer and another that clears failed items on a buffer. Hmm, I'm actually wondering if we can refactor xfs_inode_item_push() a bit to just reuse some of the existing code here. Perhaps it's best to get the locking down correctly before looking into this... BTW, I just realized that the xfs_buf_lock() above probably needs to be a trylock (where we return XFS_ITEM_LOCKED on failure). The buffer lock via xfs_iflush() is a trylock as well, and we actually drop and reacquire ->xa_lock across that and the delwri queue of the buffer (but I'm not totally sure that part is necessary here). > > ... but that is mostly just aesthetic. > > > > > + > > > + /* > > > * Stale inode items should force out the iclog. > > > */ > > > if (ip->i_flags & XFS_ISTALE) { > > > @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = { > > > .iop_unlock = xfs_inode_item_unlock, > > > .iop_committed = xfs_inode_item_committed, > > > .iop_push = xfs_inode_item_push, > > > - .iop_committing = xfs_inode_item_committing > > > + .iop_committing = xfs_inode_item_committing, > > > + .iop_error = xfs_inode_item_error > > > }; > > > > > > > > > @@ -710,7 +744,8 @@ xfs_iflush_done( > > > * the AIL lock. > > > */ > > > iip = INODE_ITEM(blip); > > > - if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) > > > + if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) || > > > + lip->li_flags & XFS_LI_FAILED) > > > need_ail++; > > > > > > blip = next; > > > @@ -718,7 +753,8 @@ xfs_iflush_done( > > > > > > /* make sure we capture the state of the initial inode. */ > > > iip = INODE_ITEM(lip); > > > - if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) > > > + if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) || > > > + lip->li_flags & XFS_LI_FAILED) > > > need_ail++; > > > > > > /* > > > @@ -739,6 +775,8 @@ xfs_iflush_done( > > > if (INODE_ITEM(blip)->ili_logged && > > > blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) > > > mlip_changed |= xfs_ail_delete_one(ailp, blip); > > > + else if (blip->li_flags & XFS_LI_FAILED) > > > + blip->li_flags &= ~XFS_LI_FAILED; > > > > This needs to do the full "unfail" sequence as well (another place to > > call the helper). 
> > > > > } > > > > > > if (mlip_changed) { > > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h > > > index 3486517..86c3464 100644 > > > --- a/fs/xfs/xfs_trans.h > > > +++ b/fs/xfs/xfs_trans.h > > > @@ -49,6 +49,7 @@ typedef struct xfs_log_item { > > > struct xfs_ail *li_ailp; /* ptr to AIL */ > > > uint li_type; /* item type */ > > > uint li_flags; /* misc flags */ > > > + struct xfs_buf *li_buf; /* real buffer pointer */ > > > struct xfs_log_item *li_bio_list; /* buffer item list */ > > > void (*li_cb)(struct xfs_buf *, > > > struct xfs_log_item *); > > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > > > index 9056c0f..36e08dd 100644 > > > --- a/fs/xfs/xfs_trans_ail.c > > > +++ b/fs/xfs/xfs_trans_ail.c > > > @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk( > > > bool > > > xfs_ail_delete_one( > > > struct xfs_ail *ailp, > > > - struct xfs_log_item *lip) > > > + struct xfs_log_item *lip) > > > { > > > struct xfs_log_item *mlip = xfs_ail_min(ailp); > > > > > > trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn); > > > xfs_ail_delete(ailp, lip); > > > - lip->li_flags &= ~XFS_LI_IN_AIL; > > > + lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED); > > > > ... and the same thing here. > > > > I agree with the helpers, and I'll queue them up for V4, I'll just wait for some > more reviews on this version before sending V4. > Sounds good. Brian > Thanks for the review Brian. > > Cheers > > -- > Carlos > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
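[Putting the suggestions in the review above together (handle the failed buffer before taking the ilock, trylock the buffer, clear and requeue in the caller), a hedged sketch of what the top of xfs_inode_item_push() could look like. xfs_buf_clear_failed_items() is the hypothetical helper named in the review, and the exact structure here is an assumption, not the final code.]

	/*
	 * Sketch of the review suggestions: deal with a previously failed
	 * buffer before trying the ilock, since the inode is already flush
	 * locked and a waiter on the flush lock may be holding the ilock.
	 * A trylock mirrors the buffer locking done via xfs_iflush().
	 */
	if (lip->li_flags & XFS_LI_FAILED) {
		struct xfs_buf	*bp = lip->li_buf;
		uint		rval = XFS_ITEM_SUCCESS;

		if (!xfs_buf_trylock(bp))
			return XFS_ITEM_LOCKED;

		xfs_buf_clear_failed_items(bp);	/* hypothetical helper */
		if (!xfs_buf_delwri_queue(bp, buffer_list))
			rval = XFS_ITEM_FLUSHING;

		xfs_buf_unlock(bp);
		return rval;
	}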
* Re: [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback 2017-06-12 13:45 ` Brian Foster @ 2017-06-14 12:29 ` Carlos Maiolino 0 siblings, 0 replies; 9+ messages in thread From: Carlos Maiolino @ 2017-06-14 12:29 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs On Mon, Jun 12, 2017 at 09:45:44AM -0400, Brian Foster wrote: > On Mon, Jun 12, 2017 at 02:44:23PM +0200, Carlos Maiolino wrote: > > Hey. > > > > > > +/* > > > > + * Requeue a failed buffer for writeback > > > > + * > > > > + * Return true if the buffer has been re-queued properly, false otherwise > > > > + */ > > > > +bool > > > > +xfs_buf_resubmit_failed_buffers( > > > > + struct xfs_inode *ip, > > > > + struct xfs_log_item *lip, > > > > + struct list_head *buffer_list) > > > > +{ > > > > + struct xfs_log_item *next; > > > > + struct xfs_buf *bp; > > > > + bool ret; > > > > + > > > > + bp = lip->li_buf; > > > > + > > > > + /* Clear XFS_LI_FAILED flag from all items before resubmit > > > > + * > > > > + * XFS_LI_FAILED set/clear is protected by xa_lock, caller this > > > > + * function already have it acquired > > > > + */ > > > > + for (; lip; lip = next) { > > > > + next = lip->li_bio_list; > > > > + lip->li_flags &= ~XFS_LI_FAILED; > > > > + xfs_buf_rele(bp); > > > > > > We need to check that the log item has LI_FAILED set before we release > > > the buffer because it's possible for the buffer to have a mix of failed > > > and recently flushed (not failed) items. We should set ->li_buf = NULL > > > wherever we clear LI_FAILED as well. > > > > > > A couple small inline helpers to fail/unfail log items correctly (i.e., > > > xfs_log_item_[fail|clear]() or some such) might help to make sure we get > > > the reference counting right. We could also assert that the correct lock > > > is held from those helpers. 
> > > > > > > Agreed, will queue it for V4 > > > > > > + } > > > > + > > > > + /* Add this buffer back to the delayed write list */ > > > > + xfs_buf_lock(bp); > > > > + if (!xfs_buf_delwri_queue(bp, buffer_list)) > > > > + ret = false; > > > > + else > > > > + ret = true; > > > > + > > > > + xfs_buf_unlock(bp); > > > > + return ret; > > > > +} > > > > diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h > > > > index f7eba99..32aceae 100644 > > > > --- a/fs/xfs/xfs_buf_item.h > > > > +++ b/fs/xfs/xfs_buf_item.h > > > > @@ -70,6 +70,8 @@ void xfs_buf_attach_iodone(struct xfs_buf *, > > > > xfs_log_item_t *); > > > > void xfs_buf_iodone_callbacks(struct xfs_buf *); > > > > void xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *); > > > > +bool xfs_buf_resubmit_failed_buffers(struct xfs_inode *, struct xfs_log_item *, > > > > + struct list_head *); > > > > > > > > extern kmem_zone_t *xfs_buf_item_zone; > > > > > > > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c > > > > index 08cb7d1..2f5a49e 100644 > > > > --- a/fs/xfs/xfs_inode_item.c > > > > +++ b/fs/xfs/xfs_inode_item.c > > > > @@ -27,6 +27,7 @@ > > > > #include "xfs_error.h" > > > > #include "xfs_trace.h" > > > > #include "xfs_trans_priv.h" > > > > +#include "xfs_buf_item.h" > > > > #include "xfs_log.h" > > > > > > > > > > > > @@ -475,6 +476,26 @@ xfs_inode_item_unpin( > > > > wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT); > > > > } > > > > > > > > +STATIC void > > > > +xfs_inode_item_error( > > > > + struct xfs_log_item *lip, > > > > + struct xfs_buf *bp, > > > > + unsigned int bflags) > > > > +{ > > > > + > > > > + /* > > > > + * The buffer writeback containing this inode has been failed > > > > + * mark it as failed and unlock the flush lock, so it can be retried > > > > + * again. > > > > + * Use xa_lock to avoid races with other log item state changes. > > > > + */ > > > > + spin_lock(&lip->li_ailp->xa_lock); > > > > > > I'd push the ->xa_lock up into caller and cycle it once over the list > > > rather than once per item. This is more consistent with the locking > > > pattern for successful I/Os. > > > > > > > I considered it too, and decided to use spin_lock/unlock on each interaction to > > avoid having it locked for a long time, but I don't think there is any harm in > > using a single lock/unlock. > > > > I don't think we're too concerned about performance at this low a level > when when I/Os are failing (or yet, at least), but I'm guessing that > pulling up the lock is still lighter weight than what a successful I/O > completion does. > > That aside, I think we also want the entire sequence of marking > LI_FAILED to be atomic with respect to the AIL (e.g., so the next AIL > push sees the state of the items before the errors are processed or > after, not somewhere in between). > Agreed. > > > > + xfs_buf_hold(bp); > > > > + lip->li_flags |= XFS_LI_FAILED; > > > > + lip->li_buf = bp; > > > > > > Also, while I think this is technically Ok based on the current > > > implementation, I think we should also test that LI_FAILED is not > > > already set and only hold the buf in that case (i.e., the previously > > > mentioned helper). This helps prevent a landmine if the > > > implementation/locking/etc. changes down the road. > > > > > > I think it would be fine to assert that LI_FAILED is not set already if > > > we wanted to, though, as long as there's a brief comment as well. 
> > > > > > > + spin_unlock(&lip->li_ailp->xa_lock); > > > > +} > > > > + > > > > STATIC uint > > > > xfs_inode_item_push( > > > > struct xfs_log_item *lip, > > > > @@ -504,6 +525,18 @@ xfs_inode_item_push( > > > > } > > > > > > > > /* > > > > + * The buffer containing this item failed to be written back > > > > + * previously. Resubmit the buffer for IO. > > > > + */ > > > > + if (lip->li_flags & XFS_LI_FAILED) { > > > > + if (!xfs_buf_resubmit_failed_buffers(ip, lip, > > > > + buffer_list)) > > > > + rval = XFS_ITEM_FLUSHING; > > > > + > > > > + goto out_unlock; > > > > + } > > > > > > I think this needs to go at least before the ilock. We don't need the > > > latter lock because the inode is already flushed. Somebody else could be > > > holding ilock and blocked on the flush lock. If that occurs, we'll never > > > make progress. > > > > It should be after the ilock IIRC, having it before the ilock will deadlock it > > any unmount when there are still inodes to be destroyed. > > Hmm.. so if we get into the failed state on an inode and a reclaim > attempt occurs on the inode, xfs_reclaim_inode() can grab the ilock and > then block on the flush lock. With this code as it is, we'll now never > resubmit the failed buffer (unless via some other inode, by chance) > because we can't get the inode lock. IOW, we're stuck in a livelock that > resembles the original problem. > > AFAICT, we don't need the inode lock in this case at all. The only > reason for it here that I'm aware of is to follow the ilock -> flush > lock ordering rules and we know the inode is already flush locked in the > failed case. Can you elaborate on the unmount deadlock issue if we move > this before the ilock? Shouldn't the I/O error handling detect the > unmounting state the next go around and flag it as a permanent error? > I'm wondering if there is some other bug lurking there that we need to > work around... > Nevermind, I didn't investigate when I've hit the deadlock, I found what happened, fixed, and I'll queue this to V4. > (Note that I think it's important we resolve this above all else, > because getting this correct is probably what dictates whether this is a > viable approach or that we need to start looking at something like > Dave's flush lock rework idea.) > > > > > > > I also think we should lock/unlock the buffer here rather than down in > > > the helper. Hmm, maybe even queue it as well so it's clear what's going > > > on. For example, something like: > > > > > > if (lip->li_flags & XFS_LI_FAILED) { > > > xfs_buf_lock(bp); > > > xfs_buf_clear_failed_items(bp); > > > if (!xfs_buf_delwri_queue(bp, buffer_list)) > > > rval = XFS_ITEM_FLUSHING; > > > xfs_buf_unlock(bp); > > > goto out_unlock; > > > } > > > > > > > Well, we could do that, but we would need to duplicate this code between inode > > and dquot item, while keeping the helper as-is, we can re-use it in both inode > > and dquot. > > > > That doesn't seem so bad to me. It's basically just a lock and delwri > queue of a buffer. That said, we could also create a couple separate > helpers here: one that requeues a buffer and another that clears failed > items on a buffer. Hmm, I'm actually wondering if we can refactor > xfs_inode_item_push() a bit to just reuse some of the existing code > here. Perhaps it's best to get the locking down correctly before looking > into this... > > BTW, I just realized that the xfs_buf_lock() above probably needs to be > a trylock (where we return XFS_ITEM_LOCKED on failure). 
The buffer lock > via xfs_iflush() is a trylock as well, and we actually drop and > reacquire ->xa_lock across that and the delwri queue of the buffer (but > I'm not totally sure that part is necessary here). > > > > ... but that is mostly just aesthetic. > > > > > > > + > > > > + /* > > > > * Stale inode items should force out the iclog. > > > > */ > > > > if (ip->i_flags & XFS_ISTALE) { > > > > @@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_inode_item_ops = { > > > > .iop_unlock = xfs_inode_item_unlock, > > > > .iop_committed = xfs_inode_item_committed, > > > > .iop_push = xfs_inode_item_push, > > > > - .iop_committing = xfs_inode_item_committing > > > > + .iop_committing = xfs_inode_item_committing, > > > > + .iop_error = xfs_inode_item_error > > > > }; > > > > > > > > > > > > @@ -710,7 +744,8 @@ xfs_iflush_done( > > > > * the AIL lock. > > > > */ > > > > iip = INODE_ITEM(blip); > > > > - if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) > > > > + if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) || > > > > + lip->li_flags & XFS_LI_FAILED) > > > > need_ail++; > > > > > > > > blip = next; > > > > @@ -718,7 +753,8 @@ xfs_iflush_done( > > > > > > > > /* make sure we capture the state of the initial inode. */ > > > > iip = INODE_ITEM(lip); > > > > - if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) > > > > + if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) || > > > > + lip->li_flags & XFS_LI_FAILED) > > > > need_ail++; > > > > > > > > /* > > > > @@ -739,6 +775,8 @@ xfs_iflush_done( > > > > if (INODE_ITEM(blip)->ili_logged && > > > > blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn) > > > > mlip_changed |= xfs_ail_delete_one(ailp, blip); > > > > + else if (blip->li_flags & XFS_LI_FAILED) > > > > + blip->li_flags &= ~XFS_LI_FAILED; > > > > > > This needs to do the full "unfail" sequence as well (another place to > > > call the helper). > > > > > > > } > > > > > > > > if (mlip_changed) { > > > > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h > > > > index 3486517..86c3464 100644 > > > > --- a/fs/xfs/xfs_trans.h > > > > +++ b/fs/xfs/xfs_trans.h > > > > @@ -49,6 +49,7 @@ typedef struct xfs_log_item { > > > > struct xfs_ail *li_ailp; /* ptr to AIL */ > > > > uint li_type; /* item type */ > > > > uint li_flags; /* misc flags */ > > > > + struct xfs_buf *li_buf; /* real buffer pointer */ > > > > struct xfs_log_item *li_bio_list; /* buffer item list */ > > > > void (*li_cb)(struct xfs_buf *, > > > > struct xfs_log_item *); > > > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > > > > index 9056c0f..36e08dd 100644 > > > > --- a/fs/xfs/xfs_trans_ail.c > > > > +++ b/fs/xfs/xfs_trans_ail.c > > > > @@ -687,13 +687,13 @@ xfs_trans_ail_update_bulk( > > > > bool > > > > xfs_ail_delete_one( > > > > struct xfs_ail *ailp, > > > > - struct xfs_log_item *lip) > > > > + struct xfs_log_item *lip) > > > > { > > > > struct xfs_log_item *mlip = xfs_ail_min(ailp); > > > > > > > > trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn); > > > > xfs_ail_delete(ailp, lip); > > > > - lip->li_flags &= ~XFS_LI_IN_AIL; > > > > + lip->li_flags &= ~(XFS_LI_IN_AIL | XFS_LI_FAILED); > > > > > > ... and the same thing here. > > > > > > > I agree with the helpers, and I'll queue them up for V4, I'll just wait for some > > more reviews on this version before sending V4. > > > > Sounds good. > > Brian > > > Thanks for the review Brian. 
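Before I send V4, this is roughly the shape of the "unfail" side I'm picturing, so you can shout if it isn't what you meant (again an untested sketch with made-up names):

/* alongside the set helper, e.g. in xfs_trans_priv.h */
static inline void
xfs_clear_li_failed(
	struct xfs_log_item	*lip)
{
	struct xfs_buf	*bp = lip->li_buf;

	/* caller holds ->xa_lock */
	if (lip->li_flags & XFS_LI_FAILED) {
		lip->li_flags &= ~XFS_LI_FAILED;
		lip->li_buf = NULL;
		xfs_buf_rele(bp);
	}
}

xfs_iflush_done() and xfs_ail_delete_one() would call this instead of clearing the flag by hand, and the resubmit check in xfs_inode_item_push() would move in front of the ilock and become essentially your earlier snippet with the buffer lock turned into a trylock (rval and buffer_list being the usual ones from the push function):

	/*
	 * The buffer containing this item failed to be written back
	 * previously.  Resubmit the buffer for IO.  The inode is
	 * already flush locked, so don't touch the ilock here;
	 * ->xa_lock is held by the AIL push and protects LI_FAILED.
	 */
	if (lip->li_flags & XFS_LI_FAILED) {
		struct xfs_buf		*bp = lip->li_buf;
		struct xfs_log_item	*blip;

		if (!xfs_buf_trylock(bp))
			return XFS_ITEM_LOCKED;

		if (!xfs_buf_delwri_queue(bp, buffer_list))
			rval = XFS_ITEM_FLUSHING;

		/*
		 * Queue before clearing: the delwri list holds its own
		 * buffer reference, so dropping the references the
		 * failed items took cannot free the buffer under us.
		 */
		for (blip = bp->b_fspriv; blip; blip = blip->li_bio_list)
			xfs_clear_li_failed(blip);

		xfs_buf_unlock(bp);
		return rval;
	}

Whether that loop stays open coded or hides behind the existing xfs_buf_resubmit_failed_buffers() helper is mostly a matter of taste; the locking is the part I wanted to agree on first.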
> >
> > Cheers
> >
> > --
> > Carlos

--
Carlos

^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread (newest: ~2017-06-14 12:30 UTC)

Thread overview: 9+ messages:
  2017-06-08 14:00 [PATCH 0/2 V3] Resubmit items failed during writeback Carlos Maiolino
  2017-06-08 14:00 ` [PATCH 1/2 V3] xfs: Add infrastructure needed for error propagation during buffer IO failure Carlos Maiolino
  2017-06-08 16:43   ` Brian Foster
  2017-06-14 10:15     ` Carlos Maiolino
  2017-06-08 14:00 ` [PATCH 2/2 V3] xfs: Properly retry failed inode items in case of error during buffer writeback Carlos Maiolino
  2017-06-08 16:45   ` Brian Foster
  2017-06-12 12:44     ` Carlos Maiolino
  2017-06-12 13:45       ` Brian Foster
  2017-06-14 12:29         ` Carlos Maiolino