* [PATCH 0/2] xfs: Clean the EFI on errors series @ 2014-03-25 20:06 Mark Tinguely 2014-03-25 20:06 ` [PATCH 1/2] xfs: remove efi from AIL in log recovery error Mark Tinguely 2014-03-25 20:06 ` [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown Mark Tinguely 0 siblings, 2 replies; 12+ messages in thread From: Mark Tinguely @ 2014-03-25 20:06 UTC (permalink / raw) To: xfs Here is a broken up version of the clean the EFI from the AIL on error. Patch 1 is for log recovery and patch 2 is for log aborts, forced shutdowns in xfs_bmap_finish() or another thread while the EFI is in the CIL or AIL and the EFD is in the CIL. If xfs_free_extent() fails while processing the extent free list, one has to manually remove the EFI because the EFD is not complete and is not in the transaction. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] xfs: remove efi from AIL in log recovery error 2014-03-25 20:06 [PATCH 0/2] xfs: Clean the EFI on errors series Mark Tinguely @ 2014-03-25 20:06 ` Mark Tinguely 2014-03-28 15:24 ` Brian Foster 2014-03-31 20:25 ` [PATCH v2 1/2] xfs: remove efi from AIL in log recovery Mark Tinguely 2014-03-25 20:06 ` [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown Mark Tinguely 1 sibling, 2 replies; 12+ messages in thread From: Mark Tinguely @ 2014-03-25 20:06 UTC (permalink / raw) To: xfs [-- Attachment #1: xfs-fix-efi-in-log-recovery-error.patch --] [-- Type: text/plain, Size: 5155 bytes --] xlog_recover_process_efi{s}() functions are completing the second half of xfs_bmap_finish() which frees extents. If this operation fails, the EFI will stay on the AIL and prevents the xfs_ail_push all_sync() from completing and the mount will fail to unmount. Rather than have a special log recovery flag XFS_EFI_RECOVERED to decrement the EFI/EFD counter, call the same decrement function from the log recovery routine that is called then the EFI is added to the AIL from a log write. Remove all other unprocessed EFIs from the log recovery AIL when one is discovered in error. Signed-off-by: Mark Tinguely <tinguely@sgi.com> --- Rewritten with suggestions from Dave. Note: calling xfs_efi_item_unpin() seemed more explainatory than calling the helper __xfs_efi_release(). fs/xfs/xfs_extfree_item.c | 9 +++------ fs/xfs/xfs_log_recover.c | 28 +++++++++++++++------------- fs/xfs/xfs_trans.h | 1 + 3 files changed, 19 insertions(+), 19 deletions(-) Index: b/fs/xfs/xfs_extfree_item.c =================================================================== --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -134,9 +134,10 @@ xfs_efi_item_pin( * remove the EFI it's because the transaction has been canceled and by * definition that means the EFI cannot be in the AIL so remove it from the * transaction and free it. Otherwise coordinate with xfs_efi_release() - * to determine who gets to free the EFI. + * to determine who gets to free the EFI. Call from log recovery of EFI + * entries so the EFD or error handling will remove the entry. */ -STATIC void +void xfs_efi_item_unpin( struct xfs_log_item *lip, int remove) @@ -313,10 +314,6 @@ xfs_efi_release(xfs_efi_log_item_t *efip { ASSERT(atomic_read(&efip->efi_next_extent) >= nextents); if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) { - /* recovery needs us to drop the EFI reference, too */ - if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags)) - __xfs_efi_release(efip); - __xfs_efi_release(efip); /* efip may now have been freed, do not reference it again. */ } Index: b/fs/xfs/xfs_log_recover.c =================================================================== --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3634,6 +3634,7 @@ xlog_recover_process_data( /* * Process an extent free intent item that was recovered from * the log. We need to free the extents that it describes. + * The caller will release this and any following EFIs upon error. */ STATIC int xlog_recover_process_efi( @@ -3648,6 +3649,13 @@ xlog_recover_process_efi( xfs_fsblock_t startblock_fsb; ASSERT(!test_bit(XFS_EFI_RECOVERED, &efip->efi_flags)); + set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); + + /* + * Decrement the EFI/EFD counter so the EFI is removed after + * processing the EFD or error handling in the caller. + */ + xfs_efi_item_unpin(&efip->efi_item, 0); /* * First check the validity of the extents described by the @@ -3662,12 +3670,6 @@ xlog_recover_process_efi( (extp->ext_len == 0) || (startblock_fsb >= mp->m_sb.sb_dblocks) || (extp->ext_len >= mp->m_sb.sb_agblocks)) { - /* - * This will pull the EFI from the AIL and - * free the memory associated with it. - */ - set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); - xfs_efi_release(efip, efip->efi_format.efi_nextents); return XFS_ERROR(EIO); } } @@ -3687,7 +3689,6 @@ xlog_recover_process_efi( extp->ext_len); } - set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); error = xfs_trans_commit(tp, 0); return error; @@ -3718,8 +3719,8 @@ STATIC int xlog_recover_process_efis( struct xlog *log) { - xfs_log_item_t *lip; - xfs_efi_log_item_t *efip; + struct xfs_log_item *lip; + struct xfs_efi_log_item *efip; int error = 0; struct xfs_ail_cursor cur; struct xfs_ail *ailp; @@ -3750,13 +3751,14 @@ xlog_recover_process_efis( } spin_unlock(&ailp->xa_lock); - error = xlog_recover_process_efi(log->l_mp, efip); - spin_lock(&ailp->xa_lock); + /* Skip all EFIs after first EFI in error. */ + if (!error) + error = xlog_recover_process_efi(log->l_mp, efip); if (error) - goto out; + xfs_efi_release(efip, efip->efi_format.efi_nextents); + spin_lock(&ailp->xa_lock); lip = xfs_trans_ail_cursor_next(ailp, &cur); } -out: xfs_trans_ail_cursor_done(ailp, &cur); spin_unlock(&ailp->xa_lock); return error; Index: b/fs/xfs/xfs_trans.h =================================================================== --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -216,6 +216,7 @@ void xfs_trans_ijoin(struct xfs_trans * void xfs_trans_log_buf(xfs_trans_t *, struct xfs_buf *, uint, uint); void xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint); struct xfs_efi_log_item *xfs_trans_get_efi(xfs_trans_t *, uint); +void xfs_efi_item_unpin(struct xfs_log_item *, int); void xfs_efi_release(struct xfs_efi_log_item *, uint); void xfs_trans_log_efi_extent(xfs_trans_t *, struct xfs_efi_log_item *, _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] xfs: remove efi from AIL in log recovery error 2014-03-25 20:06 ` [PATCH 1/2] xfs: remove efi from AIL in log recovery error Mark Tinguely @ 2014-03-28 15:24 ` Brian Foster 2014-03-28 15:41 ` Mark Tinguely 2014-03-31 20:25 ` [PATCH v2 1/2] xfs: remove efi from AIL in log recovery Mark Tinguely 1 sibling, 1 reply; 12+ messages in thread From: Brian Foster @ 2014-03-28 15:24 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Tue, Mar 25, 2014 at 03:06:34PM -0500, Mark Tinguely wrote: > xlog_recover_process_efi{s}() functions are completing the > second half of xfs_bmap_finish() which frees extents. If this > operation fails, the EFI will stay on the AIL and prevents > the xfs_ail_push all_sync() from completing and the mount will > fail to unmount. > > Rather than have a special log recovery flag XFS_EFI_RECOVERED > to decrement the EFI/EFD counter, call the same decrement function > from the log recovery routine that is called then the EFI is added > to the AIL from a log write. > > Remove all other unprocessed EFIs from the log recovery AIL > when one is discovered in error. > > Signed-off-by: Mark Tinguely <tinguely@sgi.com> > --- > Rewritten with suggestions from Dave. > Note: calling xfs_efi_item_unpin() seemed more explainatory than calling > the helper __xfs_efi_release(). > > fs/xfs/xfs_extfree_item.c | 9 +++------ > fs/xfs/xfs_log_recover.c | 28 +++++++++++++++------------- > fs/xfs/xfs_trans.h | 1 + > 3 files changed, 19 insertions(+), 19 deletions(-) > > Index: b/fs/xfs/xfs_extfree_item.c > =================================================================== > --- a/fs/xfs/xfs_extfree_item.c > +++ b/fs/xfs/xfs_extfree_item.c > @@ -134,9 +134,10 @@ xfs_efi_item_pin( > * remove the EFI it's because the transaction has been canceled and by > * definition that means the EFI cannot be in the AIL so remove it from the > * transaction and free it. Otherwise coordinate with xfs_efi_release() > - * to determine who gets to free the EFI. > + * to determine who gets to free the EFI. Call from log recovery of EFI > + * entries so the EFD or error handling will remove the entry. > */ > -STATIC void > +void > xfs_efi_item_unpin( > struct xfs_log_item *lip, > int remove) > @@ -313,10 +314,6 @@ xfs_efi_release(xfs_efi_log_item_t *efip > { > ASSERT(atomic_read(&efip->efi_next_extent) >= nextents); > if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) { > - /* recovery needs us to drop the EFI reference, too */ > - if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags)) > - __xfs_efi_release(efip); > - > __xfs_efi_release(efip); > /* efip may now have been freed, do not reference it again. */ > } > Index: b/fs/xfs/xfs_log_recover.c > =================================================================== > --- a/fs/xfs/xfs_log_recover.c > +++ b/fs/xfs/xfs_log_recover.c > @@ -3634,6 +3634,7 @@ xlog_recover_process_data( > /* > * Process an extent free intent item that was recovered from > * the log. We need to free the extents that it describes. > + * The caller will release this and any following EFIs upon error. > */ > STATIC int > xlog_recover_process_efi( > @@ -3648,6 +3649,13 @@ xlog_recover_process_efi( > xfs_fsblock_t startblock_fsb; > > ASSERT(!test_bit(XFS_EFI_RECOVERED, &efip->efi_flags)); > + set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); > + > + /* > + * Decrement the EFI/EFD counter so the EFI is removed after > + * processing the EFD or error handling in the caller. > + */ > + xfs_efi_item_unpin(&efip->efi_item, 0); > > /* > * First check the validity of the extents described by the > @@ -3662,12 +3670,6 @@ xlog_recover_process_efi( > (extp->ext_len == 0) || > (startblock_fsb >= mp->m_sb.sb_dblocks) || > (extp->ext_len >= mp->m_sb.sb_agblocks)) { > - /* > - * This will pull the EFI from the AIL and > - * free the memory associated with it. > - */ > - set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); > - xfs_efi_release(efip, efip->efi_format.efi_nextents); > return XFS_ERROR(EIO); > } > } > @@ -3687,7 +3689,6 @@ xlog_recover_process_efi( > extp->ext_len); > } > > - set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); > error = xfs_trans_commit(tp, 0); > return error; > > @@ -3718,8 +3719,8 @@ STATIC int > xlog_recover_process_efis( > struct xlog *log) > { > - xfs_log_item_t *lip; > - xfs_efi_log_item_t *efip; > + struct xfs_log_item *lip; > + struct xfs_efi_log_item *efip; > int error = 0; > struct xfs_ail_cursor cur; > struct xfs_ail *ailp; > @@ -3750,13 +3751,14 @@ xlog_recover_process_efis( > } > > spin_unlock(&ailp->xa_lock); > - error = xlog_recover_process_efi(log->l_mp, efip); > - spin_lock(&ailp->xa_lock); > + /* Skip all EFIs after first EFI in error. */ > + if (!error) > + error = xlog_recover_process_efi(log->l_mp, efip); > if (error) > - goto out; > + xfs_efi_release(efip, efip->efi_format.efi_nextents); Hi Mark, If we hit the scenario where we start skipping EFIs after an error, is the equivalent unpin() call from process_efi() not necessary on the subsequent EFIs? Brian > + spin_lock(&ailp->xa_lock); > lip = xfs_trans_ail_cursor_next(ailp, &cur); > } > -out: > xfs_trans_ail_cursor_done(ailp, &cur); > spin_unlock(&ailp->xa_lock); > return error; > Index: b/fs/xfs/xfs_trans.h > =================================================================== > --- a/fs/xfs/xfs_trans.h > +++ b/fs/xfs/xfs_trans.h > @@ -216,6 +216,7 @@ void xfs_trans_ijoin(struct xfs_trans * > void xfs_trans_log_buf(xfs_trans_t *, struct xfs_buf *, uint, uint); > void xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint); > struct xfs_efi_log_item *xfs_trans_get_efi(xfs_trans_t *, uint); > +void xfs_efi_item_unpin(struct xfs_log_item *, int); > void xfs_efi_release(struct xfs_efi_log_item *, uint); > void xfs_trans_log_efi_extent(xfs_trans_t *, > struct xfs_efi_log_item *, > > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] xfs: remove efi from AIL in log recovery error 2014-03-28 15:24 ` Brian Foster @ 2014-03-28 15:41 ` Mark Tinguely 2014-03-28 16:07 ` Brian Foster 0 siblings, 1 reply; 12+ messages in thread From: Mark Tinguely @ 2014-03-28 15:41 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On 03/28/14 10:24, Brian Foster wrote: > On Tue, Mar 25, 2014 at 03:06:34PM -0500, Mark Tinguely wrote: >> xlog_recover_process_efi{s}() functions are completing the >> second half of xfs_bmap_finish() which frees extents. If this >> operation fails, the EFI will stay on the AIL and prevents >> the xfs_ail_push all_sync() from completing and the mount will >> fail to unmount. >> >> Rather than have a special log recovery flag XFS_EFI_RECOVERED >> to decrement the EFI/EFD counter, call the same decrement function >> from the log recovery routine that is called then the EFI is added >> to the AIL from a log write. >> >> Remove all other unprocessed EFIs from the log recovery AIL >> when one is discovered in error. >> >> Signed-off-by: Mark Tinguely<tinguely@sgi.com> >> --- >> Rewritten with suggestions from Dave. >> Note: calling xfs_efi_item_unpin() seemed more explainatory than calling >> the helper __xfs_efi_release(). >> >> fs/xfs/xfs_extfree_item.c | 9 +++------ >> fs/xfs/xfs_log_recover.c | 28 +++++++++++++++------------- >> fs/xfs/xfs_trans.h | 1 + >> 3 files changed, 19 insertions(+), 19 deletions(-) >> >> Index: b/fs/xfs/xfs_extfree_item.c >> =================================================================== >> --- a/fs/xfs/xfs_extfree_item.c >> +++ b/fs/xfs/xfs_extfree_item.c >> @@ -134,9 +134,10 @@ xfs_efi_item_pin( >> * remove the EFI it's because the transaction has been canceled and by >> * definition that means the EFI cannot be in the AIL so remove it from the >> * transaction and free it. Otherwise coordinate with xfs_efi_release() >> - * to determine who gets to free the EFI. >> + * to determine who gets to free the EFI. Call from log recovery of EFI >> + * entries so the EFD or error handling will remove the entry. >> */ >> -STATIC void >> +void >> xfs_efi_item_unpin( >> struct xfs_log_item *lip, >> int remove) >> @@ -313,10 +314,6 @@ xfs_efi_release(xfs_efi_log_item_t *efip >> { >> ASSERT(atomic_read(&efip->efi_next_extent)>= nextents); >> if (atomic_sub_and_test(nextents,&efip->efi_next_extent)) { >> - /* recovery needs us to drop the EFI reference, too */ >> - if (test_bit(XFS_EFI_RECOVERED,&efip->efi_flags)) >> - __xfs_efi_release(efip); >> - >> __xfs_efi_release(efip); >> /* efip may now have been freed, do not reference it again. */ >> } >> Index: b/fs/xfs/xfs_log_recover.c >> =================================================================== >> --- a/fs/xfs/xfs_log_recover.c >> +++ b/fs/xfs/xfs_log_recover.c >> @@ -3634,6 +3634,7 @@ xlog_recover_process_data( >> /* >> * Process an extent free intent item that was recovered from >> * the log. We need to free the extents that it describes. >> + * The caller will release this and any following EFIs upon error. >> */ >> STATIC int >> xlog_recover_process_efi( >> @@ -3648,6 +3649,13 @@ xlog_recover_process_efi( >> xfs_fsblock_t startblock_fsb; >> >> ASSERT(!test_bit(XFS_EFI_RECOVERED,&efip->efi_flags)); >> + set_bit(XFS_EFI_RECOVERED,&efip->efi_flags); >> + >> + /* >> + * Decrement the EFI/EFD counter so the EFI is removed after >> + * processing the EFD or error handling in the caller. >> + */ >> + xfs_efi_item_unpin(&efip->efi_item, 0); >> >> /* >> * First check the validity of the extents described by the >> @@ -3662,12 +3670,6 @@ xlog_recover_process_efi( >> (extp->ext_len == 0) || >> (startblock_fsb>= mp->m_sb.sb_dblocks) || >> (extp->ext_len>= mp->m_sb.sb_agblocks)) { >> - /* >> - * This will pull the EFI from the AIL and >> - * free the memory associated with it. >> - */ >> - set_bit(XFS_EFI_RECOVERED,&efip->efi_flags); >> - xfs_efi_release(efip, efip->efi_format.efi_nextents); >> return XFS_ERROR(EIO); >> } >> } >> @@ -3687,7 +3689,6 @@ xlog_recover_process_efi( >> extp->ext_len); >> } >> >> - set_bit(XFS_EFI_RECOVERED,&efip->efi_flags); >> error = xfs_trans_commit(tp, 0); >> return error; >> >> @@ -3718,8 +3719,8 @@ STATIC int >> xlog_recover_process_efis( >> struct xlog *log) >> { >> - xfs_log_item_t *lip; >> - xfs_efi_log_item_t *efip; >> + struct xfs_log_item *lip; >> + struct xfs_efi_log_item *efip; >> int error = 0; >> struct xfs_ail_cursor cur; >> struct xfs_ail *ailp; >> @@ -3750,13 +3751,14 @@ xlog_recover_process_efis( >> } >> >> spin_unlock(&ailp->xa_lock); >> - error = xlog_recover_process_efi(log->l_mp, efip); >> - spin_lock(&ailp->xa_lock); >> + /* Skip all EFIs after first EFI in error. */ >> + if (!error) >> + error = xlog_recover_process_efi(log->l_mp, efip); >> if (error) >> - goto out; >> + xfs_efi_release(efip, efip->efi_format.efi_nextents); > > Hi Mark, > > If we hit the scenario where we start skipping EFIs after an error, is > the equivalent unpin() call from process_efi() not necessary on the > subsequent EFIs? > > Brian yes, good catch. They will have to be decremented twice. something like: + if (!error) + error = xlog_recover_process_efi(log->l_mp, efip); + else + xfs_efi_item_unpin(&efip->efi_item, 0); + if (error) ... --Mark _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] xfs: remove efi from AIL in log recovery error 2014-03-28 15:41 ` Mark Tinguely @ 2014-03-28 16:07 ` Brian Foster 2014-03-28 16:21 ` Mark Tinguely 0 siblings, 1 reply; 12+ messages in thread From: Brian Foster @ 2014-03-28 16:07 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Fri, Mar 28, 2014 at 10:41:06AM -0500, Mark Tinguely wrote: > On 03/28/14 10:24, Brian Foster wrote: > >On Tue, Mar 25, 2014 at 03:06:34PM -0500, Mark Tinguely wrote: > >>xlog_recover_process_efi{s}() functions are completing the > >>second half of xfs_bmap_finish() which frees extents. If this > >>operation fails, the EFI will stay on the AIL and prevents > >>the xfs_ail_push all_sync() from completing and the mount will > >>fail to unmount. > >> > >>Rather than have a special log recovery flag XFS_EFI_RECOVERED > >>to decrement the EFI/EFD counter, call the same decrement function > >>from the log recovery routine that is called then the EFI is added > >>to the AIL from a log write. > >> > >>Remove all other unprocessed EFIs from the log recovery AIL > >>when one is discovered in error. > >> > >>Signed-off-by: Mark Tinguely<tinguely@sgi.com> > >>--- > >>Rewritten with suggestions from Dave. > >>Note: calling xfs_efi_item_unpin() seemed more explainatory than calling > >> the helper __xfs_efi_release(). > >> > >> fs/xfs/xfs_extfree_item.c | 9 +++------ > >> fs/xfs/xfs_log_recover.c | 28 +++++++++++++++------------- > >> fs/xfs/xfs_trans.h | 1 + > >> 3 files changed, 19 insertions(+), 19 deletions(-) > >> > >>Index: b/fs/xfs/xfs_extfree_item.c > >>=================================================================== > >>--- a/fs/xfs/xfs_extfree_item.c > >>+++ b/fs/xfs/xfs_extfree_item.c > >>@@ -134,9 +134,10 @@ xfs_efi_item_pin( > >> * remove the EFI it's because the transaction has been canceled and by > >> * definition that means the EFI cannot be in the AIL so remove it from the > >> * transaction and free it. Otherwise coordinate with xfs_efi_release() > >>- * to determine who gets to free the EFI. > >>+ * to determine who gets to free the EFI. Call from log recovery of EFI > >>+ * entries so the EFD or error handling will remove the entry. > >> */ > >>-STATIC void > >>+void > >> xfs_efi_item_unpin( > >> struct xfs_log_item *lip, > >> int remove) > >>@@ -313,10 +314,6 @@ xfs_efi_release(xfs_efi_log_item_t *efip > >> { > >> ASSERT(atomic_read(&efip->efi_next_extent)>= nextents); > >> if (atomic_sub_and_test(nextents,&efip->efi_next_extent)) { > >>- /* recovery needs us to drop the EFI reference, too */ > >>- if (test_bit(XFS_EFI_RECOVERED,&efip->efi_flags)) > >>- __xfs_efi_release(efip); > >>- > >> __xfs_efi_release(efip); > >> /* efip may now have been freed, do not reference it again. */ > >> } > >>Index: b/fs/xfs/xfs_log_recover.c > >>=================================================================== > >>--- a/fs/xfs/xfs_log_recover.c > >>+++ b/fs/xfs/xfs_log_recover.c > >>@@ -3634,6 +3634,7 @@ xlog_recover_process_data( > >> /* > >> * Process an extent free intent item that was recovered from > >> * the log. We need to free the extents that it describes. > >>+ * The caller will release this and any following EFIs upon error. > >> */ > >> STATIC int > >> xlog_recover_process_efi( > >>@@ -3648,6 +3649,13 @@ xlog_recover_process_efi( > >> xfs_fsblock_t startblock_fsb; > >> > >> ASSERT(!test_bit(XFS_EFI_RECOVERED,&efip->efi_flags)); > >>+ set_bit(XFS_EFI_RECOVERED,&efip->efi_flags); > >>+ > >>+ /* > >>+ * Decrement the EFI/EFD counter so the EFI is removed after > >>+ * processing the EFD or error handling in the caller. > >>+ */ > >>+ xfs_efi_item_unpin(&efip->efi_item, 0); > >> > >> /* > >> * First check the validity of the extents described by the > >>@@ -3662,12 +3670,6 @@ xlog_recover_process_efi( > >> (extp->ext_len == 0) || > >> (startblock_fsb>= mp->m_sb.sb_dblocks) || > >> (extp->ext_len>= mp->m_sb.sb_agblocks)) { > >>- /* > >>- * This will pull the EFI from the AIL and > >>- * free the memory associated with it. > >>- */ > >>- set_bit(XFS_EFI_RECOVERED,&efip->efi_flags); > >>- xfs_efi_release(efip, efip->efi_format.efi_nextents); > >> return XFS_ERROR(EIO); > >> } > >> } > >>@@ -3687,7 +3689,6 @@ xlog_recover_process_efi( > >> extp->ext_len); > >> } > >> > >>- set_bit(XFS_EFI_RECOVERED,&efip->efi_flags); > >> error = xfs_trans_commit(tp, 0); > >> return error; > >> > >>@@ -3718,8 +3719,8 @@ STATIC int > >> xlog_recover_process_efis( > >> struct xlog *log) > >> { > >>- xfs_log_item_t *lip; > >>- xfs_efi_log_item_t *efip; > >>+ struct xfs_log_item *lip; > >>+ struct xfs_efi_log_item *efip; > >> int error = 0; > >> struct xfs_ail_cursor cur; > >> struct xfs_ail *ailp; > >>@@ -3750,13 +3751,14 @@ xlog_recover_process_efis( > >> } > >> > >> spin_unlock(&ailp->xa_lock); > >>- error = xlog_recover_process_efi(log->l_mp, efip); > >>- spin_lock(&ailp->xa_lock); > >>+ /* Skip all EFIs after first EFI in error. */ > >>+ if (!error) > >>+ error = xlog_recover_process_efi(log->l_mp, efip); > >> if (error) > >>- goto out; > >>+ xfs_efi_release(efip, efip->efi_format.efi_nextents); > > > >Hi Mark, > > > >If we hit the scenario where we start skipping EFIs after an error, is > >the equivalent unpin() call from process_efi() not necessary on the > >subsequent EFIs? > > > >Brian > > yes, good catch. They will have to be decremented twice. something like: > + if (!error) > + error = xlog_recover_process_efi(log->l_mp, efip); > + else > + xfs_efi_item_unpin(&efip->efi_item, 0); > + if (error) > ... > Ok, looks reasonable to me. An extra sentence or two in the previous comment to explain what's going on there would be nice as well. ;) Brian > --Mark _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] xfs: remove efi from AIL in log recovery error 2014-03-28 16:07 ` Brian Foster @ 2014-03-28 16:21 ` Mark Tinguely 0 siblings, 0 replies; 12+ messages in thread From: Mark Tinguely @ 2014-03-28 16:21 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On 03/28/14 11:07, Brian Foster wrote: > On Fri, Mar 28, 2014 at 10:41:06AM -0500, Mark Tinguely wrote: >> On 03/28/14 10:24, Brian Foster wrote: >>> On Tue, Mar 25, 2014 at 03:06:34PM -0500, Mark Tinguely wrote: ... >>> Hi Mark, >>> >>> If we hit the scenario where we start skipping EFIs after an error, is >>> the equivalent unpin() call from process_efi() not necessary on the >>> subsequent EFIs? >>> >>> Brian >> >> yes, good catch. They will have to be decremented twice. something like: >> + if (!error) >> + error = xlog_recover_process_efi(log->l_mp, efip); >> + else >> + xfs_efi_item_unpin(&efip->efi_item, 0); >> + if (error) >> ... >> > > Ok, looks reasonable to me. An extra sentence or two in the previous > comment to explain what's going on there would be nice as well. ;) > > Brian Probably will flip the if statement logic, but a comment is also a good idea. Thank-you for the feed back. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 1/2] xfs: remove efi from AIL in log recovery 2014-03-25 20:06 ` [PATCH 1/2] xfs: remove efi from AIL in log recovery error Mark Tinguely 2014-03-28 15:24 ` Brian Foster @ 2014-03-31 20:25 ` Mark Tinguely 2014-04-03 19:07 ` Dave Chinner 1 sibling, 1 reply; 12+ messages in thread From: Mark Tinguely @ 2014-03-31 20:25 UTC (permalink / raw) To: xfs [-- Attachment #1: xfs-fix-efi-in-log-recovery-error.patch --] [-- Type: text/plain, Size: 5703 bytes --] xlog_recover_process_efi{s}() functions are completing the second half of xfs_bmap_finish() which frees extents. If this operation fails, the EFI will stay on the AIL and prevents the xfs_ail_push all_sync() from completing and the mount will fail to unmount. Rather than have a special log recovery flag XFS_EFI_RECOVERED to decrement the EFI/EFD counter, call the same decrement function from the log recovery routine that is called then the EFI is added to the AIL from a log write. Remove all other unprocessed EFIs from the log recovery AIL when one is discovered in error. Signed-off-by: Mark Tinguely <tinguely@sgi.com> --- v2 per Brian's feedback: decrement the EFI counter again when removing EFI entries after the entry with the error. add comment in xlog_recover_process_efis() where we delete the EFIs. v1: Rewritten with suggestions from Dave. Note: calling xfs_efi_item_unpin() seemed more explainatory than calling the helper __xfs_efi_release(). fs/xfs/xfs_extfree_item.c | 9 +++------ fs/xfs/xfs_log_recover.c | 37 ++++++++++++++++++++++++------------- fs/xfs/xfs_trans.h | 1 + 3 files changed, 28 insertions(+), 19 deletions(-) Index: b/fs/xfs/xfs_extfree_item.c =================================================================== --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -134,9 +134,10 @@ xfs_efi_item_pin( * remove the EFI it's because the transaction has been cancelled and by * definition that means the EFI cannot be in the AIL so remove it from the * transaction and free it. Otherwise coordinate with xfs_efi_release() - * to determine who gets to free the EFI. + * to determine who gets to free the EFI. Call from log recovery of EFI + * entries so the EFD or error handling will remove the entry. */ -STATIC void +void xfs_efi_item_unpin( struct xfs_log_item *lip, int remove) @@ -313,10 +314,6 @@ xfs_efi_release(xfs_efi_log_item_t *efip { ASSERT(atomic_read(&efip->efi_next_extent) >= nextents); if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) { - /* recovery needs us to drop the EFI reference, too */ - if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags)) - __xfs_efi_release(efip); - __xfs_efi_release(efip); /* efip may now have been freed, do not reference it again. */ } Index: b/fs/xfs/xfs_log_recover.c =================================================================== --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3634,6 +3634,7 @@ xlog_recover_process_data( /* * Process an extent free intent item that was recovered from * the log. We need to free the extents that it describes. + * The caller will release this and any following EFIs upon error. */ STATIC int xlog_recover_process_efi( @@ -3648,6 +3649,13 @@ xlog_recover_process_efi( xfs_fsblock_t startblock_fsb; ASSERT(!test_bit(XFS_EFI_RECOVERED, &efip->efi_flags)); + set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); + + /* + * Decrement the EFI/EFD counter so the EFI is removed after + * processing the EFD or error handling in the caller. + */ + xfs_efi_item_unpin(&efip->efi_item, 0); /* * First check the validity of the extents described by the @@ -3662,12 +3670,6 @@ xlog_recover_process_efi( (extp->ext_len == 0) || (startblock_fsb >= mp->m_sb.sb_dblocks) || (extp->ext_len >= mp->m_sb.sb_agblocks)) { - /* - * This will pull the EFI from the AIL and - * free the memory associated with it. - */ - set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); - xfs_efi_release(efip, efip->efi_format.efi_nextents); return XFS_ERROR(EIO); } } @@ -3687,7 +3689,6 @@ xlog_recover_process_efi( extp->ext_len); } - set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); error = xfs_trans_commit(tp, 0); return error; @@ -3718,8 +3719,8 @@ STATIC int xlog_recover_process_efis( struct xlog *log) { - xfs_log_item_t *lip; - xfs_efi_log_item_t *efip; + struct xfs_log_item *lip; + struct xfs_efi_log_item *efip; int error = 0; struct xfs_ail_cursor cur; struct xfs_ail *ailp; @@ -3750,13 +3751,23 @@ xlog_recover_process_efis( } spin_unlock(&ailp->xa_lock); - error = xlog_recover_process_efi(log->l_mp, efip); - spin_lock(&ailp->xa_lock); + if (error) { + /* + * An error happened while processing a previous + * EFI entry. Since the extents are freed in order, + * skip the remaining unprocessed EFIs. To do this + * the EFI entry counter must be decremented once + * here and the entry will be decremented and removed + * with the following xfs_efi_release(). + */ + xfs_efi_item_unpin(&efip->efi_item, 0); + } else + error = xlog_recover_process_efi(log->l_mp, efip); if (error) - goto out; + xfs_efi_release(efip, efip->efi_format.efi_nextents); + spin_lock(&ailp->xa_lock); lip = xfs_trans_ail_cursor_next(ailp, &cur); } -out: xfs_trans_ail_cursor_done(ailp, &cur); spin_unlock(&ailp->xa_lock); return error; Index: b/fs/xfs/xfs_trans.h =================================================================== --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -216,6 +216,7 @@ void xfs_trans_ijoin(struct xfs_trans * void xfs_trans_log_buf(xfs_trans_t *, struct xfs_buf *, uint, uint); void xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint); struct xfs_efi_log_item *xfs_trans_get_efi(xfs_trans_t *, uint); +void xfs_efi_item_unpin(struct xfs_log_item *, int); void xfs_efi_release(struct xfs_efi_log_item *, uint); void xfs_trans_log_efi_extent(xfs_trans_t *, struct xfs_efi_log_item *, _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/2] xfs: remove efi from AIL in log recovery 2014-03-31 20:25 ` [PATCH v2 1/2] xfs: remove efi from AIL in log recovery Mark Tinguely @ 2014-04-03 19:07 ` Dave Chinner 0 siblings, 0 replies; 12+ messages in thread From: Dave Chinner @ 2014-04-03 19:07 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Mon, Mar 31, 2014 at 03:25:56PM -0500, Mark Tinguely wrote: > xlog_recover_process_efi{s}() functions are completing the > second half of xfs_bmap_finish() which frees extents. If this > operation fails, the EFI will stay on the AIL and prevents > the xfs_ail_push all_sync() from completing and the mount will > fail to unmount. > > Rather than have a special log recovery flag XFS_EFI_RECOVERED > to decrement the EFI/EFD counter, call the same decrement function > from the log recovery routine that is called then the EFI is added > to the AIL from a log write. .... > =================================================================== > --- a/fs/xfs/xfs_log_recover.c > +++ b/fs/xfs/xfs_log_recover.c > @@ -3634,6 +3634,7 @@ xlog_recover_process_data( > /* > * Process an extent free intent item that was recovered from > * the log. We need to free the extents that it describes. > + * The caller will release this and any following EFIs upon error. > */ > STATIC int > xlog_recover_process_efi( > @@ -3648,6 +3649,13 @@ xlog_recover_process_efi( > xfs_fsblock_t startblock_fsb; > > ASSERT(!test_bit(XFS_EFI_RECOVERED, &efip->efi_flags)); > + set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); > + > + /* > + * Decrement the EFI/EFD counter so the EFI is removed after > + * processing the EFD or error handling in the caller. > + */ > + xfs_efi_item_unpin(&efip->efi_item, 0); This should be done where the EFI is inserted into the AIL during recovery, not where it is removed. That means "committed" EFIs in the AIL have the same reference count values for both the normal transaction case and the log recovery case. Brian already caught a bug in your original patch because of this, so lets not leave landmines if we can avoid it. Also, the comment needs to explain why we don't need the reference count (i.e. because it's a committed EFI and hence only the active EFD can hold a reference against it now) and that on error the caller is responsible for removing the EFI from the AIL and freeing it. Finally, I don't think there's any need for the XFS_EFI_RECOVERED bit anymore, because we either process it (and remove it from the AIL) or free it after removing it from the AIL on error and so it won't ever get found in the AIL with that flag set.... > @@ -3750,13 +3751,23 @@ xlog_recover_process_efis( > } > > spin_unlock(&ailp->xa_lock); > - error = xlog_recover_process_efi(log->l_mp, efip); > - spin_lock(&ailp->xa_lock); > + if (error) { > + /* > + * An error happened while processing a previous > + * EFI entry. Since the extents are freed in order, > + * skip the remaining unprocessed EFIs. To do this > + * the EFI entry counter must be decremented once > + * here and the entry will be decremented and removed > + * with the following xfs_efi_release(). > + */ > + xfs_efi_item_unpin(&efip->efi_item, 0); > + } else > + error = xlog_recover_process_efi(log->l_mp, efip); > if (error) > - goto out; > + xfs_efi_release(efip, efip->efi_format.efi_nextents); > + spin_lock(&ailp->xa_lock); > lip = xfs_trans_ail_cursor_next(ailp, &cur); OK, so this now relies on xfs_efi_release() to remove the EFI from the AIL. But the reason the unpin ugliness needs to be done here is that we left a stale AIL reference on the EFI when we inserted it into the AIL. That's another reason for moving that xfs_efi_item_unpin() to the insert code. Also, I'd prefer the large comment decribing the logic flow goes at the head of the function (i.e. describing the overall logic flow of the function) and that call should clean up this logic a lot. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown 2014-03-25 20:06 [PATCH 0/2] xfs: Clean the EFI on errors series Mark Tinguely 2014-03-25 20:06 ` [PATCH 1/2] xfs: remove efi from AIL in log recovery error Mark Tinguely @ 2014-03-25 20:06 ` Mark Tinguely 2014-04-03 19:34 ` Dave Chinner 1 sibling, 1 reply; 12+ messages in thread From: Mark Tinguely @ 2014-03-25 20:06 UTC (permalink / raw) To: xfs [-- Attachment #1: xfs-fix-efi-xfs_bmapi_finish-error.patch --] [-- Type: text/plain, Size: 7709 bytes --] The extent free intention (EFI) and extent free done (EFD) log items are in separate transactions. It is possible that the EFI can be pushed to the AIL before a forced shutdown where it gets stuck for following reasons: No complete EFD in the transaction. This can happen if the transaction fails to reserve space or the freeing the extent fails in xfs_bmap_finish(). EFD IOP Abort processing. If xfs_trans_cancel() is called with an abort flag, or if the xfs_trans_commit() is called when the file system is in forced shutdown or if the log buffer write fails, then the EFD iop commands will not remove the EFI from the AIL. If the EFI entry is left on the AIL, xfs_ail_push all_sync() will fail to complete, the unmount will hang, and a reboot is required to get the complete the unmount. Do not free the EFI structure immediately on forced shutdowns, but instead use the IOP calls to match the EFI/EFD entries. A small wrinkle in the mix is the EFI is not automatically placed on the AIL by the IOP routines in the cases but may have made it to the AIL before the abort. We now have to check if the EFI is on the AIL in the abort IOP cases. Signed-off-by: Mark Tinguely <tinguely@sgi.com> --- Dave, as I tried to explain, if the error in extent free happen early enough, the EFD is not usable and is not in the transaction to cause IOP commands to clean up the EFI. It too must be done manually. The "in the AIL" test must be done in the abort IOP commands. fs/xfs/xfs_bmap_util.c | 21 +++++++++++++++---- fs/xfs/xfs_extfree_item.c | 49 +++++++++++++++++++++++++++++++++------------- fs/xfs/xfs_log_recover.c | 3 +- fs/xfs/xfs_trans.h | 2 - 4 files changed, 56 insertions(+), 19 deletions(-) Index: b/fs/xfs/xfs_bmap_util.c =================================================================== --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -116,12 +116,14 @@ xfs_bmap_finish( error = xfs_trans_reserve(ntp, &tres, 0, 0); if (error) - return error; + goto error_return; + efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count); for (free = flist->xbf_first; free != NULL; free = next) { next = free->xbfi_next; - if ((error = xfs_free_extent(ntp, free->xbfi_startblock, - free->xbfi_blockcount))) { + error = xfs_free_extent(ntp, free->xbfi_startblock, + free->xbfi_blockcount); + if (error) { /* * The bmap free list will be cleaned up at a * higher level. The EFI will be canceled when @@ -136,13 +138,24 @@ xfs_bmap_finish( (error == EFSCORRUPTED) ? SHUTDOWN_CORRUPT_INCORE : SHUTDOWN_META_IO_ERROR); - return error; + goto error_return; } xfs_trans_log_efd_extent(ntp, efd, free->xbfi_startblock, free->xbfi_blockcount); xfs_bmap_del_free(flist, NULL, free); } return 0; + + error_return: + /* + * No EFD in the transaction matching the EFI can be used at this + * point Manually release the EFI and remove from AIL if necessary. + * If the EFI did not make it into the AIL, then the transaction + * cancel of the EFI will decrement the EFI/EFD counter and will not + * attempt to remove itself from the AIL. + */ + xfs_efi_release(efi, efi->efi_format.efi_nextents, 0); + return error; } int Index: b/fs/xfs/xfs_extfree_item.c =================================================================== --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -56,15 +56,22 @@ xfs_efi_item_free( */ STATIC void __xfs_efi_release( - struct xfs_efi_log_item *efip) + struct xfs_efi_log_item *efip, + int abort) { struct xfs_ail *ailp = efip->efi_item.li_ailp; if (atomic_dec_and_test(&efip->efi_refcount)) { spin_lock(&ailp->xa_lock); - /* xfs_trans_ail_delete() drops the AIL lock. */ - xfs_trans_ail_delete(ailp, &efip->efi_item, - SHUTDOWN_LOG_IO_ERROR); + /* + * The EFI may not be on the AIL on abort. + * xfs_trans_ail_delete() drops the AIL lock. + */ + if (abort) + spin_unlock(&ailp->xa_lock); + else + xfs_trans_ail_delete(ailp, &efip->efi_item, + SHUTDOWN_LOG_IO_ERROR); xfs_efi_item_free(efip); } } @@ -148,10 +155,10 @@ xfs_efi_item_unpin( ASSERT(!(lip->li_flags & XFS_LI_IN_AIL)); if (lip->li_desc) xfs_trans_del_item(lip); - xfs_efi_item_free(efip); + __xfs_efi_release(efip, 1); return; } - __xfs_efi_release(efip); + __xfs_efi_release(efip, 0); } /* @@ -173,8 +180,9 @@ STATIC void xfs_efi_item_unlock( struct xfs_log_item *lip) { + /* EFI will not be on the AIL if called on abort */ if (lip->li_flags & XFS_LI_ABORTED) - xfs_efi_item_free(EFI_ITEM(lip)); + __xfs_efi_release(EFI_ITEM(lip), 1); } /* @@ -310,11 +318,12 @@ xfs_efi_copy_format(xfs_log_iovec_t *buf */ void xfs_efi_release(xfs_efi_log_item_t *efip, - uint nextents) + uint nextents, + int abort) { ASSERT(atomic_read(&efip->efi_next_extent) >= nextents); if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) { - __xfs_efi_release(efip); + __xfs_efi_release(efip, abort); /* efip may now have been freed, do not reference it again. */ } } @@ -417,8 +426,19 @@ STATIC void xfs_efd_item_unlock( struct xfs_log_item *lip) { - if (lip->li_flags & XFS_LI_ABORTED) - xfs_efd_item_free(EFD_ITEM(lip)); + struct xfs_efd_log_item *efdp = EFD_ITEM(lip); + + if (!(lip->li_flags & XFS_LI_ABORTED)) + return; + + /* Free the EFI when aborting a commit. The EFI will be either + * added to the AIL in a CIL push before this abort or unlocked + * before the EFD unlock. In either case we can check the AIL + * status now. + */ + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents, + !(efdp->efd_efip->efi_item.li_flags & XFS_LI_IN_AIL)); + xfs_efd_item_free(efdp); } /* @@ -434,14 +454,17 @@ xfs_efd_item_committed( xfs_lsn_t lsn) { struct xfs_efd_log_item *efdp = EFD_ITEM(lip); + int abort = 0; /* * If we got a log I/O error, it's always the case that the LR with the * EFI got unpinned and freed before the EFD got aborted. */ - if (!(lip->li_flags & XFS_LI_ABORTED)) - xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); + if (lip->li_flags & XFS_LI_ABORTED && + !(efdp->efd_efip->efi_item.li_flags & XFS_LI_IN_AIL)) + abort = 1; + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents, abort); xfs_efd_item_free(efdp); return (xfs_lsn_t)-1; } Index: b/fs/xfs/xfs_log_recover.c =================================================================== --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3754,8 +3754,9 @@ xlog_recover_process_efis( /* Skip all EFIs after first EFI in error. */ if (!error) error = xlog_recover_process_efi(log->l_mp, efip); + /* EFI will be on the AIL, so abort == 0 */ if (error) - xfs_efi_release(efip, efip->efi_format.efi_nextents); + xfs_efi_release(efip, efip->efi_format.efi_nextents, 0); spin_lock(&ailp->xa_lock); lip = xfs_trans_ail_cursor_next(ailp, &cur); } Index: b/fs/xfs/xfs_trans.h =================================================================== --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -217,7 +217,7 @@ void xfs_trans_log_buf(xfs_trans_t *, s void xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint); struct xfs_efi_log_item *xfs_trans_get_efi(xfs_trans_t *, uint); void xfs_efi_item_unpin(struct xfs_log_item *, int); -void xfs_efi_release(struct xfs_efi_log_item *, uint); +void xfs_efi_release(struct xfs_efi_log_item *, uint, int); void xfs_trans_log_efi_extent(xfs_trans_t *, struct xfs_efi_log_item *, xfs_fsblock_t, _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown 2014-03-25 20:06 ` [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown Mark Tinguely @ 2014-04-03 19:34 ` Dave Chinner 2014-04-03 20:48 ` Mark Tinguely 0 siblings, 1 reply; 12+ messages in thread From: Dave Chinner @ 2014-04-03 19:34 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Tue, Mar 25, 2014 at 03:06:35PM -0500, Mark Tinguely wrote: > The extent free intention (EFI) and extent free done (EFD) > log items are in separate transactions. It is possible that > the EFI can be pushed to the AIL before a forced shutdown > where it gets stuck for following reasons: > > No complete EFD in the transaction. This can happen if the > transaction fails to reserve space or the freeing the extent > fails in xfs_bmap_finish(). > > EFD IOP Abort processing. If xfs_trans_cancel() is called with > an abort flag, or if the xfs_trans_commit() is called when the > file system is in forced shutdown or if the log buffer write fails, > then the EFD iop commands will not remove the EFI from the AIL. > > If the EFI entry is left on the AIL, xfs_ail_push all_sync() will > fail to complete, the unmount will hang, and a reboot is required > to get the complete the unmount. > > Do not free the EFI structure immediately on forced shutdowns, but > instead use the IOP calls to match the EFI/EFD entries. A small > wrinkle in the mix is the EFI is not automatically placed on the > AIL by the IOP routines in the cases but may have made it to the > AIL before the abort. We now have to check if the EFI is on the AIL > in the abort IOP cases. > > Signed-off-by: Mark Tinguely <tinguely@sgi.com> > --- > Dave, as I tried to explain, if the error in extent free happen early > enough, the EFD is not usable and is not in the transaction to cause > IOP commands to clean up the EFI. It too must be done manually. > > The "in the AIL" test must be done in the abort IOP commands. I still don't see the reason behind these assertions. It just doesn't make any sense to me that it can't be handled once the EFI has been attached to the EFD, nor why checking AIL status in __xfs_efi_release() does not work correctly... > fs/xfs/xfs_bmap_util.c | 21 +++++++++++++++---- > fs/xfs/xfs_extfree_item.c | 49 +++++++++++++++++++++++++++++++++------------- > fs/xfs/xfs_log_recover.c | 3 +- > fs/xfs/xfs_trans.h | 2 - > 4 files changed, 56 insertions(+), 19 deletions(-) > > Index: b/fs/xfs/xfs_bmap_util.c > =================================================================== > --- a/fs/xfs/xfs_bmap_util.c > +++ b/fs/xfs/xfs_bmap_util.c > @@ -116,12 +116,14 @@ xfs_bmap_finish( > > error = xfs_trans_reserve(ntp, &tres, 0, 0); > if (error) > - return error; > + goto error_return; > + > efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count); > for (free = flist->xbf_first; free != NULL; free = next) { > next = free->xbfi_next; > - if ((error = xfs_free_extent(ntp, free->xbfi_startblock, > - free->xbfi_blockcount))) { > + error = xfs_free_extent(ntp, free->xbfi_startblock, > + free->xbfi_blockcount); > + if (error) { > /* > * The bmap free list will be cleaned up at a > * higher level. The EFI will be canceled when > @@ -136,13 +138,24 @@ xfs_bmap_finish( > (error == EFSCORRUPTED) ? > SHUTDOWN_CORRUPT_INCORE : > SHUTDOWN_META_IO_ERROR); > - return error; > + goto error_return; > } > xfs_trans_log_efd_extent(ntp, efd, free->xbfi_startblock, > free->xbfi_blockcount); > xfs_bmap_del_free(flist, NULL, free); > } > return 0; > + > + error_return: > + /* > + * No EFD in the transaction matching the EFI can be used at this > + * point Manually release the EFI and remove from AIL if necessary. > + * If the EFI did not make it into the AIL, then the transaction > + * cancel of the EFI will decrement the EFI/EFD counter and will not > + * attempt to remove itself from the AIL. > + */ > + xfs_efi_release(efi, efi->efi_format.efi_nextents, 0); > + return error; This is inconsistent from a high level logic point of view. The EFI is attached to the EFD after a call to xfs_trans_get_efd() and so cleanup on failure should always occur through the IOP* interfaces (specifically xfs_efd_item_unlock() with an aborted log item). The only case where a xfs_efi_release() call is required is the xfs_trans_reserve() failure case, where nothing in the current transaction context owns the reference on the EFI we are going to pass to the EFD. > } > > int > Index: b/fs/xfs/xfs_extfree_item.c > =================================================================== > --- a/fs/xfs/xfs_extfree_item.c > +++ b/fs/xfs/xfs_extfree_item.c > @@ -56,15 +56,22 @@ xfs_efi_item_free( > */ > STATIC void > __xfs_efi_release( > - struct xfs_efi_log_item *efip) > + struct xfs_efi_log_item *efip, > + int abort) > { > struct xfs_ail *ailp = efip->efi_item.li_ailp; > > if (atomic_dec_and_test(&efip->efi_refcount)) { > spin_lock(&ailp->xa_lock); > - /* xfs_trans_ail_delete() drops the AIL lock. */ > - xfs_trans_ail_delete(ailp, &efip->efi_item, > - SHUTDOWN_LOG_IO_ERROR); > + /* > + * The EFI may not be on the AIL on abort. > + * xfs_trans_ail_delete() drops the AIL lock. > + */ > + if (abort) > + spin_unlock(&ailp->xa_lock); > + else > + xfs_trans_ail_delete(ailp, &efip->efi_item, > + SHUTDOWN_LOG_IO_ERROR); > xfs_efi_item_free(efip); > } > } "Abort" is not necessary. If it's the last reference, and the EFI is in the AIL, then it needs to be removed. i.e. the check shoul dbe against XFS_LI_IN_AIL, not against a flag passed in by the caller. > @@ -148,10 +155,10 @@ xfs_efi_item_unpin( > ASSERT(!(lip->li_flags & XFS_LI_IN_AIL)); > if (lip->li_desc) > xfs_trans_del_item(lip); > - xfs_efi_item_free(efip); > + __xfs_efi_release(efip, 1); > return; > } > - __xfs_efi_release(efip); > + __xfs_efi_release(efip, 0); > } > > /* > @@ -173,8 +180,9 @@ STATIC void > xfs_efi_item_unlock( > struct xfs_log_item *lip) > { > + /* EFI will not be on the AIL if called on abort */ If that is true, then ASSERT() it rather than comment about it. > if (lip->li_flags & XFS_LI_ABORTED) > - xfs_efi_item_free(EFI_ITEM(lip)); > + __xfs_efi_release(EFI_ITEM(lip), 1); > } > > /* > @@ -310,11 +318,12 @@ xfs_efi_copy_format(xfs_log_iovec_t *buf > */ > void > xfs_efi_release(xfs_efi_log_item_t *efip, > - uint nextents) > + uint nextents, > + int abort) > { > ASSERT(atomic_read(&efip->efi_next_extent) >= nextents); > if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) { > - __xfs_efi_release(efip); > + __xfs_efi_release(efip, abort); > /* efip may now have been freed, do not reference it again. */ > } > } > @@ -417,8 +426,19 @@ STATIC void > xfs_efd_item_unlock( > struct xfs_log_item *lip) > { > - if (lip->li_flags & XFS_LI_ABORTED) > - xfs_efd_item_free(EFD_ITEM(lip)); > + struct xfs_efd_log_item *efdp = EFD_ITEM(lip); > + > + if (!(lip->li_flags & XFS_LI_ABORTED)) > + return; > + > + /* Free the EFI when aborting a commit. The EFI will be either > + * added to the AIL in a CIL push before this abort or unlocked > + * before the EFD unlock. In either case we can check the AIL > + * status now. > + */ We are not freeing the EFI here - we are simply releasing the reference the EFD holds on it because we are about to free the EFD. Nothing else actually matters here - we don't care whether the EFI is in the AIL or not, because that can be handled internally to the xfs_efi_release() code.... > + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents, > + !(efdp->efd_efip->efi_item.li_flags & XFS_LI_IN_AIL)); > + xfs_efd_item_free(efdp); > } > > /* > @@ -434,14 +454,17 @@ xfs_efd_item_committed( > xfs_lsn_t lsn) > { > struct xfs_efd_log_item *efdp = EFD_ITEM(lip); > + int abort = 0; > > /* > * If we got a log I/O error, it's always the case that the LR with the > * EFI got unpinned and freed before the EFD got aborted. > */ > - if (!(lip->li_flags & XFS_LI_ABORTED)) > - xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); > + if (lip->li_flags & XFS_LI_ABORTED && > + !(efdp->efd_efip->efi_item.li_flags & XFS_LI_IN_AIL)) > + abort = 1; > > + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents, abort); All this abort logic can go away if __xfs_efi_release() handles the AIL state internally and all the EFD does is drop the reference to the EFI it owns. This function simply becomes: { struct xfs_efd_log_item *efdp = EFD_ITEM(lip); xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); xfs_efd_item_free(efdp); return (xfs_lsn_t)-1; } No conditional behaviour, always does the same thing regardless of caller context. > } > Index: b/fs/xfs/xfs_log_recover.c > =================================================================== > --- a/fs/xfs/xfs_log_recover.c > +++ b/fs/xfs/xfs_log_recover.c > @@ -3754,8 +3754,9 @@ xlog_recover_process_efis( > /* Skip all EFIs after first EFI in error. */ > if (!error) > error = xlog_recover_process_efi(log->l_mp, efip); > + /* EFI will be on the AIL, so abort == 0 */ > if (error) > - xfs_efi_release(efip, efip->efi_format.efi_nextents); > + xfs_efi_release(efip, efip->efi_format.efi_nextents, 0); Yet another place that has to know what the state of the EFI is to handle it correctly.... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown 2014-04-03 19:34 ` Dave Chinner @ 2014-04-03 20:48 ` Mark Tinguely 2014-04-03 21:53 ` Dave Chinner 0 siblings, 1 reply; 12+ messages in thread From: Mark Tinguely @ 2014-04-03 20:48 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 04/03/14 14:34, Dave Chinner wrote: > On Tue, Mar 25, 2014 at 03:06:35PM -0500, Mark Tinguely wrote: >> The extent free intention (EFI) and extent free done (EFD) >> log items are in separate transactions. It is possible that >> the EFI can be pushed to the AIL before a forced shutdown >> where it gets stuck for following reasons: >> >> No complete EFD in the transaction. This can happen if the >> transaction fails to reserve space or the freeing the extent >> fails in xfs_bmap_finish(). >> >> EFD IOP Abort processing. If xfs_trans_cancel() is called with >> an abort flag, or if the xfs_trans_commit() is called when the >> file system is in forced shutdown or if the log buffer write fails, >> then the EFD iop commands will not remove the EFI from the AIL. >> >> If the EFI entry is left on the AIL, xfs_ail_push all_sync() will >> fail to complete, the unmount will hang, and a reboot is required >> to get the complete the unmount. >> >> Do not free the EFI structure immediately on forced shutdowns, but >> instead use the IOP calls to match the EFI/EFD entries. A small >> wrinkle in the mix is the EFI is not automatically placed on the >> AIL by the IOP routines in the cases but may have made it to the >> AIL before the abort. We now have to check if the EFI is on the AIL >> in the abort IOP cases. >> >> Signed-off-by: Mark Tinguely<tinguely@sgi.com> >> --- >> Dave, as I tried to explain, if the error in extent free happen early >> enough, the EFD is not usable and is not in the transaction to cause >> IOP commands to clean up the EFI. It too must be done manually. >> >> The "in the AIL" test must be done in the abort IOP commands. > > I still don't see the reason behind these assertions. It just > doesn't make any sense to me that it can't be handled once the EFI > has been attached to the EFD, nor why checking AIL status in > __xfs_efi_release() does not work correctly... > >> fs/xfs/xfs_bmap_util.c | 21 +++++++++++++++---- >> fs/xfs/xfs_extfree_item.c | 49 +++++++++++++++++++++++++++++++++------------- >> fs/xfs/xfs_log_recover.c | 3 +- >> fs/xfs/xfs_trans.h | 2 - >> 4 files changed, 56 insertions(+), 19 deletions(-) >> >> Index: b/fs/xfs/xfs_bmap_util.c >> =================================================================== >> --- a/fs/xfs/xfs_bmap_util.c >> +++ b/fs/xfs/xfs_bmap_util.c >> @@ -116,12 +116,14 @@ xfs_bmap_finish( >> >> error = xfs_trans_reserve(ntp,&tres, 0, 0); >> if (error) >> - return error; >> + goto error_return; >> + >> efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count); >> for (free = flist->xbf_first; free != NULL; free = next) { >> next = free->xbfi_next; >> - if ((error = xfs_free_extent(ntp, free->xbfi_startblock, >> - free->xbfi_blockcount))) { >> + error = xfs_free_extent(ntp, free->xbfi_startblock, >> + free->xbfi_blockcount); >> + if (error) { >> /* >> * The bmap free list will be cleaned up at a >> * higher level. The EFI will be canceled when >> @@ -136,13 +138,24 @@ xfs_bmap_finish( >> (error == EFSCORRUPTED) ? >> SHUTDOWN_CORRUPT_INCORE : >> SHUTDOWN_META_IO_ERROR); >> - return error; >> + goto error_return; >> } >> xfs_trans_log_efd_extent(ntp, efd, free->xbfi_startblock, >> free->xbfi_blockcount); ^^^^^^ here is where the EFD is placed in the transaction. It cannot be placed in the transaction earlier because it is not complete. >> xfs_bmap_del_free(flist, NULL, free); >> } >> return 0; >> + >> + error_return: >> + /* >> + * No EFD in the transaction matching the EFI can be used at this >> + * point Manually release the EFI and remove from AIL if necessary. >> + * If the EFI did not make it into the AIL, then the transaction >> + * cancel of the EFI will decrement the EFI/EFD counter and will not >> + * attempt to remove itself from the AIL. >> + */ >> + xfs_efi_release(efi, efi->efi_format.efi_nextents, 0); >> + return error; > > This is inconsistent from a high level logic point of view. The EFI > is attached to the EFD after a call to xfs_trans_get_efd() and so > cleanup on failure should always occur through the IOP* interfaces > (specifically xfs_efd_item_unlock() with an aborted log item). > > The only case where a xfs_efi_release() call is required is the > xfs_trans_reserve() failure case, where nothing in the current > transaction context owns the reference on the EFI we are going to > pass to the EFD. The EFD is not in the transaction in the failed xfs_extent_free failure case and there is not efd IOP call. > >> } >> >> int >> Index: b/fs/xfs/xfs_extfree_item.c >> =================================================================== >> --- a/fs/xfs/xfs_extfree_item.c >> +++ b/fs/xfs/xfs_extfree_item.c >> @@ -56,15 +56,22 @@ xfs_efi_item_free( >> */ >> STATIC void >> __xfs_efi_release( >> - struct xfs_efi_log_item *efip) >> + struct xfs_efi_log_item *efip, >> + int abort) >> { >> struct xfs_ail *ailp = efip->efi_item.li_ailp; >> >> if (atomic_dec_and_test(&efip->efi_refcount)) { >> spin_lock(&ailp->xa_lock); >> - /* xfs_trans_ail_delete() drops the AIL lock. */ >> - xfs_trans_ail_delete(ailp,&efip->efi_item, >> - SHUTDOWN_LOG_IO_ERROR); >> + /* >> + * The EFI may not be on the AIL on abort. >> + * xfs_trans_ail_delete() drops the AIL lock. >> + */ >> + if (abort) >> + spin_unlock(&ailp->xa_lock); >> + else >> + xfs_trans_ail_delete(ailp,&efip->efi_item, >> + SHUTDOWN_LOG_IO_ERROR); >> xfs_efi_item_free(efip); >> } >> } > > "Abort" is not necessary. If it's the last reference, and the EFI is > in the AIL, then it needs to be removed. i.e. the check shoul dbe > against XFS_LI_IN_AIL, not against a flag passed in by the caller. You are mistaken. It may or may not be in the AIL. > >> @@ -148,10 +155,10 @@ xfs_efi_item_unpin( >> ASSERT(!(lip->li_flags& XFS_LI_IN_AIL)); >> if (lip->li_desc) >> xfs_trans_del_item(lip); >> - xfs_efi_item_free(efip); >> + __xfs_efi_release(efip, 1); >> return; >> } >> - __xfs_efi_release(efip); >> + __xfs_efi_release(efip, 0); >> } >> >> /* >> @@ -173,8 +180,9 @@ STATIC void >> xfs_efi_item_unlock( >> struct xfs_log_item *lip) >> { >> + /* EFI will not be on the AIL if called on abort */ > > If that is true, then ASSERT() it rather than comment about it. > >> if (lip->li_flags& XFS_LI_ABORTED) >> - xfs_efi_item_free(EFI_ITEM(lip)); >> + __xfs_efi_release(EFI_ITEM(lip), 1); >> } >> >> /* >> @@ -310,11 +318,12 @@ xfs_efi_copy_format(xfs_log_iovec_t *buf >> */ >> void >> xfs_efi_release(xfs_efi_log_item_t *efip, >> - uint nextents) >> + uint nextents, >> + int abort) >> { >> ASSERT(atomic_read(&efip->efi_next_extent)>= nextents); >> if (atomic_sub_and_test(nextents,&efip->efi_next_extent)) { >> - __xfs_efi_release(efip); >> + __xfs_efi_release(efip, abort); >> /* efip may now have been freed, do not reference it again. */ >> } >> } >> @@ -417,8 +426,19 @@ STATIC void >> xfs_efd_item_unlock( >> struct xfs_log_item *lip) >> { >> - if (lip->li_flags& XFS_LI_ABORTED) >> - xfs_efd_item_free(EFD_ITEM(lip)); >> + struct xfs_efd_log_item *efdp = EFD_ITEM(lip); >> + >> + if (!(lip->li_flags& XFS_LI_ABORTED)) >> + return; >> + >> + /* Free the EFI when aborting a commit. The EFI will be either >> + * added to the AIL in a CIL push before this abort or unlocked >> + * before the EFD unlock. In either case we can check the AIL >> + * status now. >> + */ > > We are not freeing the EFI here - we are simply releasing the > reference the EFD holds on it because we are about to free the EFD. > Nothing else actually matters here - we don't care whether the EFI > is in the AIL or not, because that can be handled internally to > the xfs_efi_release() code.... Again. The EFI is in the previous commit. It could have made it to the AIL or is in the same abort. > >> + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents, >> + !(efdp->efd_efip->efi_item.li_flags& XFS_LI_IN_AIL)); >> + xfs_efd_item_free(efdp); >> } >> /* >> @@ -434,14 +454,17 @@ xfs_efd_item_committed( >> xfs_lsn_t lsn) >> { >> struct xfs_efd_log_item *efdp = EFD_ITEM(lip); >> + int abort = 0; >> >> /* >> * If we got a log I/O error, it's always the case that the LR with the >> * EFI got unpinned and freed before the EFD got aborted. >> */ >> - if (!(lip->li_flags& XFS_LI_ABORTED)) >> - xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); >> + if (lip->li_flags & XFS_LI_ABORTED && >> + !(efdp->efd_efip->efi_item.li_flags& XFS_LI_IN_AIL)) >> + abort = 1; >> >> + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents, abort); > > All this abort logic can go away if __xfs_efi_release() > handles the AIL state internally and all the EFD does is drop the > reference to the EFI it owns. This function simply becomes: The EFI in the previous commit could have made it to the AIL or not. > { > struct xfs_efd_log_item *efdp = EFD_ITEM(lip); > > xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); > xfs_efd_item_free(efdp); > return (xfs_lsn_t)-1; > } > > No conditional behaviour, always does the same thing regardless of > caller context. I disagree. It may or may not be in the AIL at this point. > >> } >> Index: b/fs/xfs/xfs_log_recover.c >> =================================================================== >> --- a/fs/xfs/xfs_log_recover.c >> +++ b/fs/xfs/xfs_log_recover.c >> @@ -3754,8 +3754,9 @@ xlog_recover_process_efis( >> /* Skip all EFIs after first EFI in error. */ >> if (!error) >> error = xlog_recover_process_efi(log->l_mp, efip); >> + /* EFI will be on the AIL, so abort == 0 */ >> if (error) >> - xfs_efi_release(efip, efip->efi_format.efi_nextents); >> + xfs_efi_release(efip, efip->efi_format.efi_nextents, 0); > > Yet another place that has to know what the state of the EFI is to > handle it correctly.... We know the EFI is in the AIL here. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown 2014-04-03 20:48 ` Mark Tinguely @ 2014-04-03 21:53 ` Dave Chinner 0 siblings, 0 replies; 12+ messages in thread From: Dave Chinner @ 2014-04-03 21:53 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Thu, Apr 03, 2014 at 03:48:40PM -0500, Mark Tinguely wrote: > On 04/03/14 14:34, Dave Chinner wrote: > >On Tue, Mar 25, 2014 at 03:06:35PM -0500, Mark Tinguely wrote: > >>The extent free intention (EFI) and extent free done (EFD) > >>log items are in separate transactions. It is possible that > >>the EFI can be pushed to the AIL before a forced shutdown > >>where it gets stuck for following reasons: > >> > >> No complete EFD in the transaction. This can happen if the > >> transaction fails to reserve space or the freeing the extent > >> fails in xfs_bmap_finish(). > >> > >> EFD IOP Abort processing. If xfs_trans_cancel() is called with > >> an abort flag, or if the xfs_trans_commit() is called when the > >> file system is in forced shutdown or if the log buffer write fails, > >> then the EFD iop commands will not remove the EFI from the AIL. > >> > >>If the EFI entry is left on the AIL, xfs_ail_push all_sync() will > >>fail to complete, the unmount will hang, and a reboot is required > >>to get the complete the unmount. > >> > >>Do not free the EFI structure immediately on forced shutdowns, but > >>instead use the IOP calls to match the EFI/EFD entries. A small > >>wrinkle in the mix is the EFI is not automatically placed on the > >>AIL by the IOP routines in the cases but may have made it to the > >>AIL before the abort. We now have to check if the EFI is on the AIL > >>in the abort IOP cases. > >> > >>Signed-off-by: Mark Tinguely<tinguely@sgi.com> > >>--- > >>Dave, as I tried to explain, if the error in extent free happen early > >>enough, the EFD is not usable and is not in the transaction to cause > >>IOP commands to clean up the EFI. It too must be done manually. > >> > >>The "in the AIL" test must be done in the abort IOP commands. > > > >I still don't see the reason behind these assertions. It just > >doesn't make any sense to me that it can't be handled once the EFI > >has been attached to the EFD, nor why checking AIL status in > >__xfs_efi_release() does not work correctly... > > > >> fs/xfs/xfs_bmap_util.c | 21 +++++++++++++++---- > >> fs/xfs/xfs_extfree_item.c | 49 +++++++++++++++++++++++++++++++++------------- > >> fs/xfs/xfs_log_recover.c | 3 +- > >> fs/xfs/xfs_trans.h | 2 - > >> 4 files changed, 56 insertions(+), 19 deletions(-) > >> > >>Index: b/fs/xfs/xfs_bmap_util.c > >>=================================================================== > >>--- a/fs/xfs/xfs_bmap_util.c > >>+++ b/fs/xfs/xfs_bmap_util.c > >>@@ -116,12 +116,14 @@ xfs_bmap_finish( > >> > >> error = xfs_trans_reserve(ntp,&tres, 0, 0); > >> if (error) > >>- return error; > >>+ goto error_return; > >>+ > >> efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count); > >> for (free = flist->xbf_first; free != NULL; free = next) { > >> next = free->xbfi_next; > >>- if ((error = xfs_free_extent(ntp, free->xbfi_startblock, > >>- free->xbfi_blockcount))) { > >>+ error = xfs_free_extent(ntp, free->xbfi_startblock, > >>+ free->xbfi_blockcount); > >>+ if (error) { > >> /* > >> * The bmap free list will be cleaned up at a > >> * higher level. The EFI will be canceled when > >>@@ -136,13 +138,24 @@ xfs_bmap_finish( > >> (error == EFSCORRUPTED) ? > >> SHUTDOWN_CORRUPT_INCORE : > >> SHUTDOWN_META_IO_ERROR); > >>- return error; > >>+ goto error_return; > >> } > >> xfs_trans_log_efd_extent(ntp, efd, free->xbfi_startblock, > >> free->xbfi_blockcount); > > ^^^^^^ > here is where the EFD is placed in the transaction. It cannot be placed > in the transaction earlier because it is not complete. No, that's wrong. The EFD is linked to the transaction in xfs_trans_get_efd() where it calls xfs_trans_add_item(). The above call is where it is logged (i.e. marked dirty), not where it is added to the transaction. > >This is inconsistent from a high level logic point of view. The EFI > >is attached to the EFD after a call to xfs_trans_get_efd() and so > >cleanup on failure should always occur through the IOP* interfaces > >(specifically xfs_efd_item_unlock() with an aborted log item). > > > >The only case where a xfs_efi_release() call is required is the > >xfs_trans_reserve() failure case, where nothing in the current > >transaction context owns the reference on the EFI we are going to > >pass to the EFD. > > The EFD is not in the transaction in the failed xfs_extent_free > failure case and there is not efd IOP call. Yes it is. It may no be dirty, though. > >>--- a/fs/xfs/xfs_extfree_item.c > >>+++ b/fs/xfs/xfs_extfree_item.c > >>@@ -56,15 +56,22 @@ xfs_efi_item_free( > >> */ > >> STATIC void > >> __xfs_efi_release( > >>- struct xfs_efi_log_item *efip) > >>+ struct xfs_efi_log_item *efip, > >>+ int abort) > >> { > >> struct xfs_ail *ailp = efip->efi_item.li_ailp; > >> > >> if (atomic_dec_and_test(&efip->efi_refcount)) { > >> spin_lock(&ailp->xa_lock); > >>- /* xfs_trans_ail_delete() drops the AIL lock. */ > >>- xfs_trans_ail_delete(ailp,&efip->efi_item, > >>- SHUTDOWN_LOG_IO_ERROR); > >>+ /* > >>+ * The EFI may not be on the AIL on abort. > >>+ * xfs_trans_ail_delete() drops the AIL lock. > >>+ */ > >>+ if (abort) > >>+ spin_unlock(&ailp->xa_lock); > >>+ else > >>+ xfs_trans_ail_delete(ailp,&efip->efi_item, > >>+ SHUTDOWN_LOG_IO_ERROR); > >> xfs_efi_item_free(efip); > >> } > >> } > > > >"Abort" is not necessary. If it's the last reference, and the EFI is > >in the AIL, then it needs to be removed. i.e. the check shoul dbe > >against XFS_LI_IN_AIL, not against a flag passed in by the caller. > > You are mistaken. It may or may not be in the AIL. No, I'm not mistaken - I'm talking about using a different techinique to determine whether we need to remove the EFI from the AIL or not. That is, we have a flag on *every log item* that tells us whether the item is in the AIL. It is guaranteed to be correct, and we use it in many places to determine if the object we are about to free is in the AIL and hence needs to be removed first. IOWs, we can use introspection to remove the EFI from the AIL when the last reference goes away. The code is simply: if (atomic_dec_and_test(&efip->efi_refcount)) { spin_lock(&ailp->xa_lock); if (efip->efi_item.li_flags & XFS_LI_IN_AIL) { xfs_trans_ail_delete(ailp,&efip->efi_item, SHUTDOWN_LOG_IO_ERROR); } spin_unlock(&ailp->xa_lock); xfs_efi_item_free(efip); } No abort flag is needed. Other examples of this exact same check being done in error handling paths for various log item are xfs_iflush_abort(), xfs_buf_item_unlock() and xfs_qm_dqflush(). > >> if (atomic_sub_and_test(nextents,&efip->efi_next_extent)) { > >>- __xfs_efi_release(efip); > >>+ __xfs_efi_release(efip, abort); > >> /* efip may now have been freed, do not reference it again. */ > >> } > >> } > >>@@ -417,8 +426,19 @@ STATIC void > >> xfs_efd_item_unlock( > >> struct xfs_log_item *lip) > >> { > >>- if (lip->li_flags& XFS_LI_ABORTED) > >>- xfs_efd_item_free(EFD_ITEM(lip)); > >>+ struct xfs_efd_log_item *efdp = EFD_ITEM(lip); > >>+ > >>+ if (!(lip->li_flags& XFS_LI_ABORTED)) > >>+ return; BTW, your email program is still mangling quoted whitespace in the code. Can you please fix it? > >>+ > >>+ /* Free the EFI when aborting a commit. The EFI will be either > >>+ * added to the AIL in a CIL push before this abort or unlocked > >>+ * before the EFD unlock. In either case we can check the AIL > >>+ * status now. > >>+ */ > > > >We are not freeing the EFI here - we are simply releasing the > >reference the EFD holds on it because we are about to free the EFD. > >Nothing else actually matters here - we don't care whether the EFI > >is in the AIL or not, because that can be handled internally to > >the xfs_efi_release() code.... > > Again. The EFI is in the previous commit. It could have made it to > the AIL or is in the same abort. You're still missing the point. It doesn't matter where the EFI is - what matters is that we are freeing the object that holds a reference to the EFI and so we have to drop it. The point of using reference counting and introspection that no caller needs to care about the internal state of the EFI, just that they hold a reference that needs to be dropped. And that means simpler code and that future modifications are much less likely to screw up the EFI error handling. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-04-03 21:53 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-03-25 20:06 [PATCH 0/2] xfs: Clean the EFI on errors series Mark Tinguely 2014-03-25 20:06 ` [PATCH 1/2] xfs: remove efi from AIL in log recovery error Mark Tinguely 2014-03-28 15:24 ` Brian Foster 2014-03-28 15:41 ` Mark Tinguely 2014-03-28 16:07 ` Brian Foster 2014-03-28 16:21 ` Mark Tinguely 2014-03-31 20:25 ` [PATCH v2 1/2] xfs: remove efi from AIL in log recovery Mark Tinguely 2014-04-03 19:07 ` Dave Chinner 2014-03-25 20:06 ` [PATCH 2/2] xfs: free the EFI entries from AIL on forced shutdown Mark Tinguely 2014-04-03 19:34 ` Dave Chinner 2014-04-03 20:48 ` Mark Tinguely 2014-04-03 21:53 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).