* [PATCH] xfs: don't free EFIs before the EFDs are committed
@ 2013-04-03 3:09 Dave Chinner
2013-04-03 19:12 ` Mark Tinguely
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Dave Chinner @ 2013-04-03 3:09 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Filesystems are occasionally being shut down with this error:
xfs_trans_ail_delete_bulk: attempting to delete a log item that is
not in the AIL.
It was diagnosed to be related to the EFI/EFD commit order when the
EFI and EFD are in different checkpoints and the EFD is committed
before the EFI here:
http://oss.sgi.com/archives/xfs/2013-01/msg00082.html
The real problem is that a single bit cannot fully describe the
states that the EFI/EFD processing can be in. These completion
states are:
EFI EFI in AIL EFD Result
committed/unpinned Yes committed OK
committed/pinned No committed Shutdown
uncommitted No committed Shutdown
Note that the "result" field is what should happen, not what does
happen. The current logic is broken and handles the first two cases
correctly by luck. That is, the code will free the EFI if the
XFS_EFI_COMMITTED bit is *not* set, rather than if it is set. The
inverted logic "works" because if both EFI and EFD are committed,
then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED
bit, and the second frees the EFI item. Hence as long as
xfs_efi_item_committed() has been called, everything appears to be
fine.
It is the third case where the logic fails - where
xfs_efd_item_committed() is called before xfs_efi_item_committed(),
and that results in the EFI being freed before it has been
committed. That is the bug that triggered the shutdown, and hence
keeping track of whether the EFI has been committed or not is
insufficient to correctly order the EFI/EFD operations w.r.t. the
AIL.
What we really want is this: the EFI is always placed into the
AIL before the last reference goes away. The only way to guarantee
that is that the EFI is not freed until after it has been unpinned
*and* the EFD has been committed. That is, restructure the logic so
that the only case that can occur is the first case.
This can be done easily by replacing the XFS_EFI_COMMITTED with an
EFI reference count. The EFI is initialised with it's own count, and
that is not released until it is unpinned. However, there is a
complication to this method - the high level EFI/EFD code in
xfs_bmap_finish() does not hold direct references to the EFI
structure, and runs a transaction commit between the EFI and EFD
processing. Hence the EFI can be freed even before the EFD is
created using such a method.
Further, log recovery uses the AIL for tracking EFI/EFDs that need
to be recovered, but it uses the AIL *differently* to the EFI
transaction commit. Hence log recovery never pins or unpins EFIs, so
we can't drop the EFI reference count indirectly to free the EFI.
However, this doesn't prevent us from using a reference count here.
There is a 1:1 relationship between EFIs and EFDs, so when we
initialise the EFI we can take a reference count for the EFD as
well. This solves the xfs_bmap_finish() issue - the EFI will never
be freed until the EFD is processed. In terms of log recovery,
during the committing of the EFD we can look for the
XFS_EFI_RECOVERED bit being set and drop the EFI reference as well,
thereby ensuring everything works correctly there as well.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_extfree_item.c | 27 +++++++++++++--------------
fs/xfs/xfs_extfree_item.h | 14 +++++++++-----
fs/xfs/xfs_log_recover.c | 1 +
3 files changed, 23 insertions(+), 19 deletions(-)
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index feb36d7..c0f3750 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -50,9 +50,8 @@ xfs_efi_item_free(
* Freeing the efi requires that we remove it from the AIL if it has already
* been placed there. However, the EFI may not yet have been placed in the AIL
* when called by xfs_efi_release() from EFD processing due to the ordering of
- * committed vs unpin operations in bulk insert operations. Hence the
- * test_and_clear_bit(XFS_EFI_COMMITTED) to ensure only the last caller frees
- * the EFI.
+ * committed vs unpin operations in bulk insert operations. Hence the reference
+ * count to ensure only the last caller frees the EFI.
*/
STATIC void
__xfs_efi_release(
@@ -60,7 +59,7 @@ __xfs_efi_release(
{
struct xfs_ail *ailp = efip->efi_item.li_ailp;
- if (!test_and_clear_bit(XFS_EFI_COMMITTED, &efip->efi_flags)) {
+ if (atomic_dec_and_test(&efip->efi_refcount)) {
spin_lock(&ailp->xa_lock);
/* xfs_trans_ail_delete() drops the AIL lock. */
xfs_trans_ail_delete(ailp, &efip->efi_item,
@@ -126,8 +125,8 @@ xfs_efi_item_pin(
* which the EFI is manipulated during a transaction. If we are being asked to
* remove the EFI it's because the transaction has been cancelled and by
* definition that means the EFI cannot be in the AIL so remove it from the
- * transaction and free it. Otherwise coordinate with xfs_efi_release() (via
- * XFS_EFI_COMMITTED) to determine who gets to free the EFI.
+ * transaction and free it. Otherwise coordinate with xfs_efi_release()
+ * to determine who gets to free the EFI.
*/
STATIC void
xfs_efi_item_unpin(
@@ -171,19 +170,13 @@ xfs_efi_item_unlock(
/*
* The EFI is logged only once and cannot be moved in the log, so simply return
- * the lsn at which it's been logged. For bulk transaction committed
- * processing, the EFI may be processed but not yet unpinned prior to the EFD
- * being processed. Set the XFS_EFI_COMMITTED flag so this case can be detected
- * when processing the EFD.
+ * the lsn at which it's been logged.
*/
STATIC xfs_lsn_t
xfs_efi_item_committed(
struct xfs_log_item *lip,
xfs_lsn_t lsn)
{
- struct xfs_efi_log_item *efip = EFI_ITEM(lip);
-
- set_bit(XFS_EFI_COMMITTED, &efip->efi_flags);
return lsn;
}
@@ -241,6 +234,7 @@ xfs_efi_init(
efip->efi_format.efi_nextents = nextents;
efip->efi_format.efi_id = (__psint_t)(void*)efip;
atomic_set(&efip->efi_next_extent, 0);
+ atomic_set(&efip->efi_refcount, 2);
return efip;
}
@@ -310,8 +304,13 @@ xfs_efi_release(xfs_efi_log_item_t *efip,
uint nextents)
{
ASSERT(atomic_read(&efip->efi_next_extent) >= nextents);
- if (atomic_sub_and_test(nextents, &efip->efi_next_extent))
+ if (atomic_sub_and_test(nextents, &efip->efi_next_extent)) {
__xfs_efi_release(efip);
+
+ /* recovery needs us to drop the EFI reference, too */
+ if (test_bit(XFS_EFI_RECOVERED, &efip->efi_flags))
+ __xfs_efi_release(efip);
+ }
}
static inline struct xfs_efd_log_item *EFD_ITEM(struct xfs_log_item *lip)
diff --git a/fs/xfs/xfs_extfree_item.h b/fs/xfs/xfs_extfree_item.h
index 375f68e..4322224 100644
--- a/fs/xfs/xfs_extfree_item.h
+++ b/fs/xfs/xfs_extfree_item.h
@@ -114,16 +114,20 @@ typedef struct xfs_efd_log_format_64 {
* Define EFI flag bits. Manipulated by set/clear/test_bit operators.
*/
#define XFS_EFI_RECOVERED 1
-#define XFS_EFI_COMMITTED 2
/*
- * This is the "extent free intention" log item. It is used
- * to log the fact that some extents need to be free. It is
- * used in conjunction with the "extent free done" log item
- * described below.
+ * This is the "extent free intention" log item. It is used to log the fact
+ * that some extents need to be free. It is used in conjunction with the
+ * "extent free done" log item described below.
+ *
+ * The EFI is reference counted so that it is not freed prior to both the EFI
+ * and EFD being committed and unpinned. This ensures that when the last
+ * reference goes away the EFI will always be in the AIL as it has been
+ * unpinned, regardless of whether the EFD is processed before or after the EFI.
*/
typedef struct xfs_efi_log_item {
xfs_log_item_t efi_item;
+ atomic_t efi_refcount;
atomic_t efi_next_extent;
unsigned long efi_flags; /* misc flags */
xfs_efi_log_format_t efi_format;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index d1dba7c..3ca3380 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2948,6 +2948,7 @@ xlog_recover_process_efi(
* This will pull the EFI from the AIL and
* free the memory associated with it.
*/
+ set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
xfs_efi_release(efip, efip->efi_format.efi_nextents);
return XFS_ERROR(EIO);
}
--
1.7.10.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH] xfs: don't free EFIs before the EFDs are committed 2013-04-03 3:09 [PATCH] xfs: don't free EFIs before the EFDs are committed Dave Chinner @ 2013-04-03 19:12 ` Mark Tinguely 2013-04-03 19:46 ` Eric Sandeen 2013-04-04 1:14 ` Dave Chinner 2013-04-04 22:06 ` Mark Tinguely 2013-04-05 18:31 ` Ben Myers 2 siblings, 2 replies; 10+ messages in thread From: Mark Tinguely @ 2013-04-03 19:12 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 04/02/13 22:09, Dave Chinner wrote: > From: Dave Chinner<dchinner@redhat.com> > > Filesystems are occasionally being shut down with this error: > > xfs_trans_ail_delete_bulk: attempting to delete a log item that is > not in the AIL. > > It was diagnosed to be related to the EFI/EFD commit order when the > EFI and EFD are in different checkpoints and the EFD is committed > before the EFI here: > > http://oss.sgi.com/archives/xfs/2013-01/msg00082.html > > The real problem is that a single bit cannot fully describe the > states that the EFI/EFD processing can be in. These completion > states are: > > EFI EFI in AIL EFD Result > committed/unpinned Yes committed OK > committed/pinned No committed Shutdown > uncommitted No committed Shutdown > > > Note that the "result" field is what should happen, not what does > happen. The current logic is broken and handles the first two cases > correctly by luck. That is, the code will free the EFI if the > XFS_EFI_COMMITTED bit is*not* set, rather than if it is set. The > inverted logic "works" because if both EFI and EFD are committed, > then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED > bit, and the second frees the EFI item. Hence as long as > xfs_efi_item_committed() has been called, everything appears to be > fine. > > It is the third case where the logic fails - where > xfs_efd_item_committed() is called before xfs_efi_item_committed(), > and that results in the EFI being freed before it has been > committed. That is the bug that triggered the shutdown, d hence > keeping track of whether the EFI has been committed or not is > insufficient to correctly order the EFI/EFD operations w.r.t. the > AIL. > I think the exist xfs_efi routines are working correctly. the xfs_trans_committed_bulk() for the efi does the IOP_COMMITTED (sets the XFS_EFI_COMMITTED bit), add the entry to the AIL and then an IOP_UNPIN() which clears the XFS_EFI_COMMITTED since it is set, it should not release the efi and remove the entry from the AIL. It also correctly detects if the EFD is committed before the EFI by shutting down the filesystem. This patch seems to paper over why the EFD will come before the EFI. The CIL commits are in the correct order - protected by the xlog_wait() but the callbacks are out of order. My previous patch fixed the callback from happening out of sequence. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: don't free EFIs before the EFDs are committed 2013-04-03 19:12 ` Mark Tinguely @ 2013-04-03 19:46 ` Eric Sandeen 2013-04-03 21:02 ` Eric Sandeen 2013-04-04 1:14 ` Dave Chinner 1 sibling, 1 reply; 10+ messages in thread From: Eric Sandeen @ 2013-04-03 19:46 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On 4/3/13 2:12 PM, Mark Tinguely wrote: > On 04/02/13 22:09, Dave Chinner wrote: >> From: Dave Chinner<dchinner@redhat.com> >> >> Filesystems are occasionally being shut down with this error: >> >> xfs_trans_ail_delete_bulk: attempting to delete a log item that is >> not in the AIL. >> >> It was diagnosed to be related to the EFI/EFD commit order when the >> EFI and EFD are in different checkpoints and the EFD is committed >> before the EFI here: >> >> http://oss.sgi.com/archives/xfs/2013-01/msg00082.html >> >> The real problem is that a single bit cannot fully describe the >> states that the EFI/EFD processing can be in. These completion >> states are: >> >> EFI EFI in AIL EFD Result >> committed/unpinned Yes committed OK >> committed/pinned No committed Shutdown >> uncommitted No committed Shutdown >> >> >> Note that the "result" field is what should happen, not what does >> happen. The current logic is broken and handles the first two cases >> correctly by luck. That is, the code will free the EFI if the >> XFS_EFI_COMMITTED bit is*not* set, rather than if it is set. The >> inverted logic "works" because if both EFI and EFD are committed, >> then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED >> bit, and the second frees the EFI item. Hence as long as >> xfs_efi_item_committed() has been called, everything appears to be >> fine. >> >> It is the third case where the logic fails - where >> xfs_efd_item_committed() is called before xfs_efi_item_committed(), >> and that results in the EFI being freed before it has been >> committed. That is the bug that triggered the shutdown, d hence >> keeping track of whether the EFI has been committed or not is >> insufficient to correctly order the EFI/EFD operations w.r.t. the >> AIL. >> > > I think the exist xfs_efi routines are working correctly. > > the xfs_trans_committed_bulk() for the efi does the IOP_COMMITTED > (sets the XFS_EFI_COMMITTED bit), add the entry to the AIL and then > an IOP_UNPIN() which clears the XFS_EFI_COMMITTED since it is set, it > should not release the efi and remove the entry from the AIL. It also > correctly detects if the EFD is committed before the EFI by shutting > down the filesystem. Well hang on, not everything can be cool in EFI-land here, if nothing else this: __xfs_efi_release( struct xfs_efi_log_item *efip) { struct xfs_ail *ailp = efip->efi_item.li_ailp; if (!test_and_clear_bit(XFS_EFI_COMMITTED, &efip->efi_flags)) { spin_lock(&ailp->xa_lock); /* xfs_trans_ail_delete() drops the AIL lock. */ xfs_trans_ail_delete(ailp, &efip->efi_item, SHUTDOWN_LOG_IO_ERROR); xfs_efi_item_free(efip); } } seems obviously broken. * test_and_clear_bit - Clear a bit and return its old value so it is saying if XFS_EFI_COMMITED was *not* set, free the efi. Which I think we must admit is broken. I have to get up to speed on this code, but it seems like at least this much above is quite obviously broken, and the previously proposed patch doesn't address it. Am I missing something? -Eric > This patch seems to paper over why the EFD will come before the EFI. > > The CIL commits are in the correct order - protected by the > xlog_wait() but the callbacks are out of order. > > My previous patch fixed the callback from happening out of sequence. > > --Mark. > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: don't free EFIs before the EFDs are committed 2013-04-03 19:46 ` Eric Sandeen @ 2013-04-03 21:02 ` Eric Sandeen 2013-04-03 21:45 ` Mark Tinguely 0 siblings, 1 reply; 10+ messages in thread From: Eric Sandeen @ 2013-04-03 21:02 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On 4/3/13 2:46 PM, Eric Sandeen wrote: > On 4/3/13 2:12 PM, Mark Tinguely wrote: >> On 04/02/13 22:09, Dave Chinner wrote: >>> From: Dave Chinner<dchinner@redhat.com> >>> >>> Filesystems are occasionally being shut down with this error: >>> >>> xfs_trans_ail_delete_bulk: attempting to delete a log item that is >>> not in the AIL. >>> >>> It was diagnosed to be related to the EFI/EFD commit order when the >>> EFI and EFD are in different checkpoints and the EFD is committed >>> before the EFI here: >>> >>> http://oss.sgi.com/archives/xfs/2013-01/msg00082.html >>> >>> The real problem is that a single bit cannot fully describe the >>> states that the EFI/EFD processing can be in. These completion >>> states are: >>> >>> EFI EFI in AIL EFD Result >>> committed/unpinned Yes committed OK >>> committed/pinned No committed Shutdown >>> uncommitted No committed Shutdown >>> >>> >>> Note that the "result" field is what should happen, not what does >>> happen. The current logic is broken and handles the first two cases >>> correctly by luck. That is, the code will free the EFI if the >>> XFS_EFI_COMMITTED bit is*not* set, rather than if it is set. The >>> inverted logic "works" because if both EFI and EFD are committed, >>> then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED >>> bit, and the second frees the EFI item. Hence as long as >>> xfs_efi_item_committed() has been called, everything appears to be >>> fine. >>> >>> It is the third case where the logic fails - where >>> xfs_efd_item_committed() is called before xfs_efi_item_committed(), >>> and that results in the EFI being freed before it has been >>> committed. That is the bug that triggered the shutdown, d hence >>> keeping track of whether the EFI has been committed or not is >>> insufficient to correctly order the EFI/EFD operations w.r.t. the >>> AIL. >>> >> >> I think the exist xfs_efi routines are working correctly. >> >> the xfs_trans_committed_bulk() for the efi does the IOP_COMMITTED >> (sets the XFS_EFI_COMMITTED bit), add the entry to the AIL and then >> an IOP_UNPIN() which clears the XFS_EFI_COMMITTED since it is set, it >> should not release the efi and remove the entry from the AIL. It also >> correctly detects if the EFD is committed before the EFI by shutting >> down the filesystem. > > Well hang on, not everything can be cool in EFI-land here, if nothing > else this: > > __xfs_efi_release( > struct xfs_efi_log_item *efip) > { > struct xfs_ail *ailp = efip->efi_item.li_ailp; > > if (!test_and_clear_bit(XFS_EFI_COMMITTED, &efip->efi_flags)) { > spin_lock(&ailp->xa_lock); > /* xfs_trans_ail_delete() drops the AIL lock. */ > xfs_trans_ail_delete(ailp, &efip->efi_item, > SHUTDOWN_LOG_IO_ERROR); > xfs_efi_item_free(efip); > } > } > > seems obviously broken. > > * test_and_clear_bit - Clear a bit and return its old value > > so it is saying if XFS_EFI_COMMITED was *not* set, free the efi. > Which I think we must admit is broken. > > I have to get up to speed on this code, but it seems like at least > this much above is quite obviously broken, and the previously > proposed patch doesn't address it. Am I missing something? Or are you saying that it's working as designed, so that the first caller finds it set, clears it and does nothing, and the 2nd caller finds it unset, and frees it? help me out here ;) -Eric _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: don't free EFIs before the EFDs are committed 2013-04-03 21:02 ` Eric Sandeen @ 2013-04-03 21:45 ` Mark Tinguely 2013-04-04 1:31 ` Dave Chinner 0 siblings, 1 reply; 10+ messages in thread From: Mark Tinguely @ 2013-04-03 21:45 UTC (permalink / raw) To: Eric Sandeen; +Cc: xfs On 04/03/13 16:02, Eric Sandeen wrote: > On 4/3/13 2:46 PM, Eric Sandeen wrote: >> On 4/3/13 2:12 PM, Mark Tinguely wrote: >>> On 04/02/13 22:09, Dave Chinner wrote: >>>> From: Dave Chinner<dchinner@redhat.com> >>>> >>>> Filesystems are occasionally being shut down with this error: >>>> >>>> xfs_trans_ail_delete_bulk: attempting to delete a log item that is >>>> not in the AIL. >>>> >>>> It was diagnosed to be related to the EFI/EFD commit order when the >>>> EFI and EFD are in different checkpoints and the EFD is committed >>>> before the EFI here: >>>> >>>> http://oss.sgi.com/archives/xfs/2013-01/msg00082.html >>>> >>>> The real problem is that a single bit cannot fully describe the >>>> states that the EFI/EFD processing can be in. These completion >>>> states are: >>>> >>>> EFI EFI in AIL EFD Result >>>> committed/unpinned Yes committed OK >>>> committed/pinned No committed Shutdown >>>> uncommitted No committed Shutdown >>>> >>>> >>>> Note that the "result" field is what should happen, not what does >>>> happen. The current logic is broken and handles the first two cases >>>> correctly by luck. That is, the code will free the EFI if the >>>> XFS_EFI_COMMITTED bit is*not* set, rather than if it is set. The >>>> inverted logic "works" because if both EFI and EFD are committed, >>>> then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED >>>> bit, and the second frees the EFI item. Hence as long as >>>> xfs_efi_item_committed() has been called, everything appears to be >>>> fine. >>>> >>>> It is the third case where the logic fails - where >>>> xfs_efd_item_committed() is called before xfs_efi_item_committed(), >>>> and that results in the EFI being freed before it has been >>>> committed. That is the bug that triggered the shutdown, d hence >>>> keeping track of whether the EFI has been committed or not is >>>> insufficient to correctly order the EFI/EFD operations w.r.t. the >>>> AIL. >>>> >>> >>> I think the exist xfs_efi routines are working correctly. >>> >>> the xfs_trans_committed_bulk() for the efi does the IOP_COMMITTED >>> (sets the XFS_EFI_COMMITTED bit), add the entry to the AIL and then >>> an IOP_UNPIN() which clears the XFS_EFI_COMMITTED since it is set, it >>> should not release the efi and remove the entry from the AIL. It also >>> correctly detects if the EFD is committed before the EFI by shutting >>> down the filesystem. >> >> Well hang on, not everything can be cool in EFI-land here, if nothing >> else this: >> >> __xfs_efi_release( >> struct xfs_efi_log_item *efip) >> { >> struct xfs_ail *ailp = efip->efi_item.li_ailp; >> >> if (!test_and_clear_bit(XFS_EFI_COMMITTED,&efip->efi_flags)) { >> spin_lock(&ailp->xa_lock); >> /* xfs_trans_ail_delete() drops the AIL lock. */ >> xfs_trans_ail_delete(ailp,&efip->efi_item, >> SHUTDOWN_LOG_IO_ERROR); >> xfs_efi_item_free(efip); >> } >> } >> >> seems obviously broken. >> >> * test_and_clear_bit - Clear a bit and return its old value >> >> so it is saying if XFS_EFI_COMMITED was *not* set, free the efi. >> Which I think we must admit is broken. >> >> I have to get up to speed on this code, but it seems like at least >> this much above is quite obviously broken, and the previously >> proposed patch doesn't address it. Am I missing something? > > Or are you saying that it's working as designed, so that the first > caller finds it set, clears it and does nothing, and the 2nd caller > finds it unset, and frees it? help me out here ;) > > -Eric > Yes. The order is IOP_COMMITTED and then IOP_UNPIN: The EFI IOP_COMMITTED: xfs_efi_item_committed( struct xfs_log_item *lip, xfs_lsn_t lsn) { struct xfs_efi_log_item *efip = EFI_ITEM(lip); set_bit(XFS_EFI_COMMITTED, &efip->efi_flags); return lsn; } then The EFI IOP_UNPIN() -> STATIC void xfs_efi_item_unpin( struct xfs_log_item *lip, int remove) { struct xfs_efi_log_item *efip = EFI_ITEM(lip); if (remove) { ASSERT(!(lip->li_flags & XFS_LI_IN_AIL)); if (lip->li_desc) xfs_trans_del_item(lip); xfs_efi_item_free(efip); return; } __xfs_efi_release(efip); } and the the EFD's IOP_COMMITED: STATIC xfs_lsn_t xfs_efi_item_committed( struct xfs_log_item *lip, xfs_lsn_t lsn) { struct xfs_efi_log_item *efip = EFI_ITEM(lip); set_bit(XFS_EFI_COMMITTED, &efip->efi_flags); return lsn; } The code makes sure the sequence EFI and then EFD are done in order. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: don't free EFIs before the EFDs are committed 2013-04-03 21:45 ` Mark Tinguely @ 2013-04-04 1:31 ` Dave Chinner 0 siblings, 0 replies; 10+ messages in thread From: Dave Chinner @ 2013-04-04 1:31 UTC (permalink / raw) To: Mark Tinguely; +Cc: Eric Sandeen, xfs On Wed, Apr 03, 2013 at 04:45:18PM -0500, Mark Tinguely wrote: > On 04/03/13 16:02, Eric Sandeen wrote: > >On 4/3/13 2:46 PM, Eric Sandeen wrote: > >>On 4/3/13 2:12 PM, Mark Tinguely wrote: > >>>On 04/02/13 22:09, Dave Chinner wrote: > >>>>From: Dave Chinner<dchinner@redhat.com> > >>>> > >>>>Filesystems are occasionally being shut down with this error: > >>>> > >>>>xfs_trans_ail_delete_bulk: attempting to delete a log item that is > >>>>not in the AIL. > >>>> > >>>>It was diagnosed to be related to the EFI/EFD commit order when the > >>>>EFI and EFD are in different checkpoints and the EFD is committed > >>>>before the EFI here: > >>>> > >>>>http://oss.sgi.com/archives/xfs/2013-01/msg00082.html > >>>> > >>>>The real problem is that a single bit cannot fully describe the > >>>>states that the EFI/EFD processing can be in. These completion > >>>>states are: > >>>> > >>>>EFI EFI in AIL EFD Result > >>>>committed/unpinned Yes committed OK > >>>>committed/pinned No committed Shutdown > >>>>uncommitted No committed Shutdown > >>>> > >>>> > >>>>Note that the "result" field is what should happen, not what does > >>>>happen. The current logic is broken and handles the first two cases > >>>>correctly by luck. That is, the code will free the EFI if the > >>>>XFS_EFI_COMMITTED bit is*not* set, rather than if it is set. The > >>>>inverted logic "works" because if both EFI and EFD are committed, > >>>>then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED > >>>>bit, and the second frees the EFI item. Hence as long as > >>>>xfs_efi_item_committed() has been called, everything appears to be > >>>>fine. > >>>> > >>>>It is the third case where the logic fails - where > >>>>xfs_efd_item_committed() is called before xfs_efi_item_committed(), > >>>>and that results in the EFI being freed before it has been > >>>>committed. That is the bug that triggered the shutdown, d hence > >>>>keeping track of whether the EFI has been committed or not is > >>>>insufficient to correctly order the EFI/EFD operations w.r.t. the > >>>>AIL. > >>>> > >>> > >>>I think the exist xfs_efi routines are working correctly. > >>> > >>>the xfs_trans_committed_bulk() for the efi does the IOP_COMMITTED > >>>(sets the XFS_EFI_COMMITTED bit), add the entry to the AIL and then > >>>an IOP_UNPIN() which clears the XFS_EFI_COMMITTED since it is set, it > >>>should not release the efi and remove the entry from the AIL. It also > >>>correctly detects if the EFD is committed before the EFI by shutting > >>>down the filesystem. > >> > >>Well hang on, not everything can be cool in EFI-land here, if nothing > >>else this: > >> > >>__xfs_efi_release( > >> struct xfs_efi_log_item *efip) > >>{ > >> struct xfs_ail *ailp = efip->efi_item.li_ailp; > >> > >> if (!test_and_clear_bit(XFS_EFI_COMMITTED,&efip->efi_flags)) { > >> spin_lock(&ailp->xa_lock); > >> /* xfs_trans_ail_delete() drops the AIL lock. */ > >> xfs_trans_ail_delete(ailp,&efip->efi_item, > >> SHUTDOWN_LOG_IO_ERROR); > >> xfs_efi_item_free(efip); > >> } > >>} > >> > >>seems obviously broken. > >> > >> * test_and_clear_bit - Clear a bit and return its old value > >> > >>so it is saying if XFS_EFI_COMMITED was *not* set, free the efi. > >>Which I think we must admit is broken. > >> > >>I have to get up to speed on this code, but it seems like at least > >>this much above is quite obviously broken, and the previously > >>proposed patch doesn't address it. Am I missing something? > > > >Or are you saying that it's working as designed, so that the first > >caller finds it set, clears it and does nothing, and the 2nd caller > >finds it unset, and frees it? help me out here ;) > > > >-Eric > > > > Yes. The order is IOP_COMMITTED and then IOP_UNPIN: Batching means you can get: IOP_COMMITTED(a) IOP_COMMITTED(b) IOP_COMMITTED(c) ..... IOP_COMMITTED(6) AIL_INSERT(a) AIL_INSERT(b) AIL_INSERT(c) ..... AIL_INSERT(6) IOP_UNPIN(a) IOP_UNPIN(b) IOP_UNPIN(c) ..... IOP_UNPIN(6) So that means at minimum we have to handle the EFI committed/pinned case when the EFD is committed as well. This is what the commit b199c8a (xfs: Pull EFI/EFD handling out from under the AIL lock) did - it moved the COMMITTED bit from the EFI IOP_UNPIN() callback to the IOP_COMMITTED() callback. But then the IOP_UNPIN() callback has to handle the same case by freeing the EFI as well, and hence the XFS_EFI_COMMITTED() bit became a proxy reference count. Which doesn't handle the third case, which is the EFD being processed before the EFI, and hence we get a shutdown because the EFD is trying to remove the EFI from the AIL before it is inserted. > The code makes sure the sequence EFI and then EFD are done in order. No, it doesn't ensure they are done in order - it *works* when they are done in order. If it ensured they were done in order, then the EFD commit would not be allowed to proceed until the EFI wsa committed. The code does not do that, not does it need to. But it needs to handle the EFD being processed before the EFI... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: don't free EFIs before the EFDs are committed 2013-04-03 19:12 ` Mark Tinguely 2013-04-03 19:46 ` Eric Sandeen @ 2013-04-04 1:14 ` Dave Chinner 1 sibling, 0 replies; 10+ messages in thread From: Dave Chinner @ 2013-04-04 1:14 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Wed, Apr 03, 2013 at 02:12:09PM -0500, Mark Tinguely wrote: > On 04/02/13 22:09, Dave Chinner wrote: > >From: Dave Chinner<dchinner@redhat.com> > > > >Filesystems are occasionally being shut down with this error: > > > >xfs_trans_ail_delete_bulk: attempting to delete a log item that is > >not in the AIL. > > > >It was diagnosed to be related to the EFI/EFD commit order when the > >EFI and EFD are in different checkpoints and the EFD is committed > >before the EFI here: > > > >http://oss.sgi.com/archives/xfs/2013-01/msg00082.html > > > >The real problem is that a single bit cannot fully describe the > >states that the EFI/EFD processing can be in. These completion > >states are: > > > >EFI EFI in AIL EFD Result > >committed/unpinned Yes committed OK > >committed/pinned No committed Shutdown > >uncommitted No committed Shutdown > > > > > >Note that the "result" field is what should happen, not what does > >happen. The current logic is broken and handles the first two cases > >correctly by luck. That is, the code will free the EFI if the > >XFS_EFI_COMMITTED bit is*not* set, rather than if it is set. The > >inverted logic "works" because if both EFI and EFD are committed, > >then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED > >bit, and the second frees the EFI item. Hence as long as > >xfs_efi_item_committed() has been called, everything appears to be > >fine. > > > >It is the third case where the logic fails - where > >xfs_efd_item_committed() is called before xfs_efi_item_committed(), > >and that results in the EFI being freed before it has been > >committed. That is the bug that triggered the shutdown, d hence > >keeping track of whether the EFI has been committed or not is > >insufficient to correctly order the EFI/EFD operations w.r.t. the > >AIL. > > > > I think the exist xfs_efi routines are working correctly. Speaking as the person whose designed and wrote the current EFI/EFD code in question here, I think it is broken and only working by chance. Hint: there is no comment anywhere (code or commit message) that is works by a tricky piece of logic that first clears the committed bit and then frees the object when it is not set the second time through. I *always* write comments about tricky logic like this when it is intentional, so that when I look at it a couple of years later I know exactly why the logic is correct even though it appears wrong. No comments therefore means this logic was not the intended behaviour and therefore only working by chance. > the xfs_trans_committed_bulk() for the efi does the IOP_COMMITTED > (sets the XFS_EFI_COMMITTED bit), add the entry to the AIL and then > an IOP_UNPIN() which clears the XFS_EFI_COMMITTED since it is set, > it should not release the efi and remove the entry from the AIL. It > also correctly detects if the EFD is committed before the EFI by > shutting down the filesystem. But that behaviour is wrong - seeing the EFD before the EFI is not a bug, just an indication that we have interleaved checkpoints being committed concurrently. The fact that we've seen the EFD means when we can simply free the EFI once it has been committed. The AIL is where the ordering of journalled objects matter; the order in which iclog commit callbacks are run does not affect that. > This patch seems to paper over why the EFD will come before the EFI. > > The CIL commits are in the correct order - protected by the > xlog_wait() but the callbacks are out of order. The design is such that what is physically on disk is correctly ordered, but we don't care about the callback order as the AIL is where the *in-memory object order* is correctly established. > My previous patch fixed the callback from happening out of sequence. It may have, but it serialised all CIL checkpoints, not just the commit record order. That approach to fixing the bug is simply a non-starter. Why are you raising this now when you were silent about it 3 months ago when this bug was first being discussed and the EFI/EFD handling bug was initially raised? Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: don't free EFIs before the EFDs are committed 2013-04-03 3:09 [PATCH] xfs: don't free EFIs before the EFDs are committed Dave Chinner 2013-04-03 19:12 ` Mark Tinguely @ 2013-04-04 22:06 ` Mark Tinguely 2013-04-05 0:45 ` Dave Chinner 2013-04-05 18:31 ` Ben Myers 2 siblings, 1 reply; 10+ messages in thread From: Mark Tinguely @ 2013-04-04 22:06 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 04/02/13 22:09, Dave Chinner wrote: > From: Dave Chinner<dchinner@redhat.com> > > Filesystems are occasionally being shut down with this error: > > xfs_trans_ail_delete_bulk: attempting to delete a log item that is > not in the AIL. > > It was diagnosed to be related to the EFI/EFD commit order when the > EFI and EFD are in different checkpoints and the EFD is committed > before the EFI here: > > http://oss.sgi.com/archives/xfs/2013-01/msg00082.html > > The real problem is that a single bit cannot fully describe the > states that the EFI/EFD processing can be in. These completion > states are: > > EFI EFI in AIL EFD Result > committed/unpinned Yes committed OK > committed/pinned No committed Shutdown > uncommitted No committed Shutdown > > > Note that the "result" field is what should happen, not what does > happen. The current logic is broken and handles the first two cases > correctly by luck. That is, the code will free the EFI if the > XFS_EFI_COMMITTED bit is *not* set, rather than if it is set. The > inverted logic "works" because if both EFI and EFD are committed, > then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED > bit, and the second frees the EFI item. Hence as long as > xfs_efi_item_committed() has been called, everything appears to be > fine. > > It is the third case where the logic fails - where > xfs_efd_item_committed() is called before xfs_efi_item_committed(), > and that results in the EFI being freed before it has been > committed. That is the bug that triggered the shutdown, and hence > keeping track of whether the EFI has been committed or not is > insufficient to correctly order the EFI/EFD operations w.r.t. the > AIL. > > What we really want is this: the EFI is always placed into the > AIL before the last reference goes away. The only way to guarantee > that is that the EFI is not freed until after it has been unpinned > *and* the EFD has been committed. That is, restructure the logic so > that the only case that can occur is the first case. > > This can be done easily by replacing the XFS_EFI_COMMITTED with an > EFI reference count. The EFI is initialised with it's own count, and > that is not released until it is unpinned. However, there is a > complication to this method - the high level EFI/EFD code in > xfs_bmap_finish() does not hold direct references to the EFI > structure, and runs a transaction commit between the EFI and EFD > processing. Hence the EFI can be freed even before the EFD is > created using such a method. > > Further, log recovery uses the AIL for tracking EFI/EFDs that need > to be recovered, but it uses the AIL *differently* to the EFI > transaction commit. Hence log recovery never pins or unpins EFIs, so > we can't drop the EFI reference count indirectly to free the EFI. > > However, this doesn't prevent us from using a reference count here. > There is a 1:1 relationship between EFIs and EFDs, so when we > initialise the EFI we can take a reference count for the EFD as > well. This solves the xfs_bmap_finish() issue - the EFI will never > be freed until the EFD is processed. In terms of log recovery, > during the committing of the EFD we can look for the > XFS_EFI_RECOVERED bit being set and drop the EFI reference as well, > thereby ensuring everything works correctly there as well. Duh me, beside what our problem where the cil push callbacks are called out of order, there can be several xfs_trans_committed_bulk() running at the same time. Yep, the counter allows efi and efd to happen in any order and make sure the efi is entered into the AIL. Okay, the pass2 of recovery places the efi item into the AIL, and the second decrement on the counter when the XFS_EFI_RECOVERED bit gets the counter to zero in the recovery case because there is no unpin. Reviewed-by: Mark Tinguely <tinguely@sgi.com> This patch also prevents the only chance of detecting of out of sequence ctx callbacks. The out of sequence ctx callbacks *does* matter because the li_lsn is set by the ctx->start_lsn and the lowest li_lsn is used to move the tail. When the callbacks are out of order, it is possible for the larger lsn of the greater ctx sequence to move the log tail beyond a lower ctx sequence entries before the entries for the lower sequence ctx are moved into the AIL. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: don't free EFIs before the EFDs are committed 2013-04-04 22:06 ` Mark Tinguely @ 2013-04-05 0:45 ` Dave Chinner 0 siblings, 0 replies; 10+ messages in thread From: Dave Chinner @ 2013-04-05 0:45 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Thu, Apr 04, 2013 at 05:06:31PM -0500, Mark Tinguely wrote: > On 04/02/13 22:09, Dave Chinner wrote: > >From: Dave Chinner<dchinner@redhat.com> > > > >Filesystems are occasionally being shut down with this error: > > > >xfs_trans_ail_delete_bulk: attempting to delete a log item that is > >not in the AIL. > > > >It was diagnosed to be related to the EFI/EFD commit order when the > >EFI and EFD are in different checkpoints and the EFD is committed > >before the EFI here: > > > >http://oss.sgi.com/archives/xfs/2013-01/msg00082.html > > > >The real problem is that a single bit cannot fully describe the > >states that the EFI/EFD processing can be in. These completion > >states are: > > > >EFI EFI in AIL EFD Result > >committed/unpinned Yes committed OK > >committed/pinned No committed Shutdown > >uncommitted No committed Shutdown > > > > > >Note that the "result" field is what should happen, not what does > >happen. The current logic is broken and handles the first two cases > >correctly by luck. That is, the code will free the EFI if the > >XFS_EFI_COMMITTED bit is *not* set, rather than if it is set. The > >inverted logic "works" because if both EFI and EFD are committed, > >then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED > >bit, and the second frees the EFI item. Hence as long as > >xfs_efi_item_committed() has been called, everything appears to be > >fine. > > > >It is the third case where the logic fails - where > >xfs_efd_item_committed() is called before xfs_efi_item_committed(), > >and that results in the EFI being freed before it has been > >committed. That is the bug that triggered the shutdown, and hence > >keeping track of whether the EFI has been committed or not is > >insufficient to correctly order the EFI/EFD operations w.r.t. the > >AIL. > > > >What we really want is this: the EFI is always placed into the > >AIL before the last reference goes away. The only way to guarantee > >that is that the EFI is not freed until after it has been unpinned > >*and* the EFD has been committed. That is, restructure the logic so > >that the only case that can occur is the first case. > > > >This can be done easily by replacing the XFS_EFI_COMMITTED with an > >EFI reference count. The EFI is initialised with it's own count, and > >that is not released until it is unpinned. However, there is a > >complication to this method - the high level EFI/EFD code in > >xfs_bmap_finish() does not hold direct references to the EFI > >structure, and runs a transaction commit between the EFI and EFD > >processing. Hence the EFI can be freed even before the EFD is > >created using such a method. > > > >Further, log recovery uses the AIL for tracking EFI/EFDs that need > >to be recovered, but it uses the AIL *differently* to the EFI > >transaction commit. Hence log recovery never pins or unpins EFIs, so > >we can't drop the EFI reference count indirectly to free the EFI. > > > >However, this doesn't prevent us from using a reference count here. > >There is a 1:1 relationship between EFIs and EFDs, so when we > >initialise the EFI we can take a reference count for the EFD as > >well. This solves the xfs_bmap_finish() issue - the EFI will never > >be freed until the EFD is processed. In terms of log recovery, > >during the committing of the EFD we can look for the > >XFS_EFI_RECOVERED bit being set and drop the EFI reference as well, > >thereby ensuring everything works correctly there as well. > > Duh me, beside what our problem where the cil push callbacks are > called out of order, there can be several xfs_trans_committed_bulk() > running at the same time. I'm not sure that can actually happen - callback processing is supposed to be serialised by xlog_state_do_callback().... > Yep, the counter allows efi and efd to happen in any order and make > sure the efi is entered into the AIL. > > Okay, the pass2 of recovery places the efi item into the AIL, and > the second decrement on the counter when the XFS_EFI_RECOVERED bit > gets the counter to zero in the recovery case because there is no > unpin. > > Reviewed-by: Mark Tinguely <tinguely@sgi.com> Thank you. > This patch also prevents the only chance of detecting of out of > sequence ctx callbacks. I don't think the callbacks are being attached out of order. Like I said previously: if that were happening then the commit records are being written to the log in the wrong order and that is a *major* issue. As such, I've attached the informal proof that I just ran through to verify that the current code is ordering commit records and hence callbacks on iclogs correctly for multiple outstanding checkpoints. I don't see any obvious hole in the logic as the callbacks are attached to the iclog the commit record is written into and the commit records are strictly serialised and ordered... > The out of sequence ctx callbacks *does* > matter because the li_lsn is set by the ctx->start_lsn and the > lowest li_lsn is used to move the tail. When the callbacks are out > of order, it is possible for the larger lsn of the greater ctx > sequence to move the log tail beyond a lower ctx sequence entries > before the entries for the lower sequence ctx are moved into the > AIL. Assuming we are getting callbacks running out of order, this is only a problem if the AIL is empty at the time the out of order callbacks are run. If the AIL is not empty, then the tail of the log is derived directly from the LSN of the first item in the AIL and that LSN will be from an item committed from a previous checkpoint, not the current one that is runing out of order. Hence the tail of the log is not affected in this case. If the AIL is empty, then the tail LSN is determined by the lowest LSN of the iclogs being processed concurrently. This code in xlog_state_do_callback() is supposed to serialise and order the completion of multiple concurrent iclog completions: lowest_lsn = xlog_get_lowest_lsn(log); if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, be64_to_cpu(iclog->ic_header.h_lsn)) < 0) { iclog = iclog->ic_next; continue; /* Leave this iclog for * another thread */ } IOWs, if the current iclog being checked for IO completion processing has a higher LSN than the lowest in all the iclogs that are not clean, then skip over it and check the next iclog. So, even if we have multiple concurrent iclogs in XLOG_STATE_DONE_SYNC/XLOG_STATE_DO_CALLBACK state, only the first that comes across the iclog with the lowest LSN in these states is supposed to run callbacks. And an iclog is considered in the xlog_get_lowest_lsn() calculation until it becomes dirty/active, which is a state transition that occurs *after* callbacks are run on the iclog. Further, once an iclog has been selected for callback processing, if it has callbacks on it, we do this: if (iclog->ic_callback) atomic64_set(&log->l_last_sync_lsn, be64_to_cpu(iclog->ic_header.h_lsn)); to set the tail of the log to match that of the current LSN of the iclog that is being processed. This is an "advance" update of the log tail that will be used until items are inserted into the AIL by callback processing (see xlog_assign_tail_lsn()). If there are not items to be placed into the AIL (i.e. it remains empty) then this ensures that the next log write still gets the tail LSN correct. And by this analysis, we should not be seeing concurrent execution of callbacks on different iclogs, nor should we be seeing out of order execution of callbacks on different iclogs. And because callbacks are ordered correctly by the CIL push code we shouldn't be seeing callbacks on a single iclog being run out of order either. IOWs, the tail LSN of the log should not be jumping around due to out of order iclog and/or callback processing. Unless there is a problem in xlog_state_do_callback() with serialising and ordering iclog IO completion processing, there shouldn't be a problem with out of order callback processing, either. Note that I haven't been through the iclog state machine in fine detail since 2007, so I'm going to need to spend a bit of time looking at it again to confirm this. I don't see any obvious problem at the moment, though.... Cheers, Dave. -- Dave Chinner david@fromorbit.com commit CIL operation: down_write(ctx lock) increment sequence switch ctxs spin_lock(cil lock) add old ctx to commit list spin_unlock(cil lock) up_write(ctx lock) write checkpoint to log spin_lock(cil lock) walk commiting list and wait for prior sequences to commit spin_unlock(cil lock) xfs_log_done() -> writes commit record, returns iclog, commit_lsn xfs_log_notify() -> attaches callback to iclog tail spin_lock(cil lock) ctx->commit_lsn = commit_lsn wake waiters spin_unlock(cil lock) release iclog <iclog IO started> <iclog IO complete> <iclog run callbacks> spin_lock(cil lock) remove from committing list. spin_unlock(cil lock) Informal proof of sequence commit order correctness by induction: The initial context switching segment runs under an exclusive lock, and it means that S2 can not be started (and hence committed) until S1 is already in the committing list. So when S2 commits, if S1 has not yet written a commit record we have the initial condition of: commiting list: S2, commit_lsn = 0 S1, commit_lsn = 0 Now, if S2 writes it's checkpoint into the iclogs before S1 completes (e.g. S1 is a 10MB checkpoint, S2 is single inode core from a fsync) S2 enters the committing list check and sees: 528 restart: 529 spin_lock(&cil->xc_cil_lock); 530 list_for_each_entry(new_ctx, &cil->xc_committing, committing) { First entry is S2 531 /* 532 * Higher sequences will wait for this one so skip them. 533 * Don't wait for own own sequence, either. 534 */ 535 if (new_ctx->sequence >= ctx->sequence) 536 continue; new_ctx->sequence = S2, ctx->sequence = S2, so continue; Second entry is S1, so we see: new_ctx->sequence = S1, ctx->sequence = S2, so we now check the commit lsn: 537 if (!new_ctx->commit_lsn) { new_ctx->commit_lsn = 0, so we: 538 /* 539 * It is still being pushed! Wait for the push to 540 * complete, then start again from the beginning. 541 */ 542 xlog_wait(&cil->xc_commit_wait, &cil->xc_cil_lock); 543 goto restart; wait and restart once woken. 544 } 545 } 546 spin_unlock(&cil->xc_cil_lock); Eventually, S1 finishes writing the checkpoint, and enters the commit record loop: 528 restart: 529 spin_lock(&cil->xc_cil_lock); 530 list_for_each_entry(new_ctx, &cil->xc_committing, committing) { First entry is S2 531 /* 532 * Higher sequences will wait for this one so skip them. 533 * Don't wait for own own sequence, either. 534 */ 535 if (new_ctx->sequence >= ctx->sequence) 536 continue; new_ctx->sequence = S2, ctx->sequence = S1, so we continue. Second entry is S1, so again we continue, hit the end of list and drop out of the loop. At this point S1 writes the commit record, then attaches the callbacks to the *tail* of the iclog callback list. In doing this, it gets the commit_lsn of the commit record, so it does: ctx->commit_lsn = commit_lsn wakeup_all(&cil->xc_commit_wait); Which wakes up S2. S2 now does: _ 528 restart: 529 spin_lock(&cil->xc_cil_lock); 530 list_for_each_entry(new_ctx, &cil->xc_committing, committing) { First entry is S2 531 /* 532 * Higher sequences will wait for this one so skip them. 533 * Don't wait for own own sequence, either. 534 */ 535 if (new_ctx->sequence >= ctx->sequence) 536 continue; new_ctx->sequence = S2, ctx->sequence = S2, so we continue. Second entry is S1, so we now check the commit lsn: 537 if (!new_ctx->commit_lsn) { new_ctx->commit_lsn != 0, so again we continue, hit the end of list and fall through to the commit record write. This attaches the callback to the *tail* of the iclog callback list, so it ordered *after* the callbacks for S1. Hence the commit records appear to be correctly ordered by this code, as are the callbacks being attached to iclogs. So, for the case of S1, S2 the code is safe. For the case of S1, S2, S3, we have the same initial conditions: - S2 is not made active until S1 is on the commit list - S3 is not made active until S2 is on the commit list Hence if we have S1 still committing while S2 and S3 have completed their commits, the wait cycle looks like: S2: Sees S2, continue; Sees S1, sees commit_lsn == 0, waits S3: Sees S3, continue; Sees S2, sees commit_lsn == 0, waits S1: completes commit, Sees S3, continue; Sees S2, continue; Sees S1, continue; Writes commit record Attaches callbacks sets commit_lsn wakeup_all() S2: S3: wake, restart wake, restart [win wakeup race on cil lock] Sees S3, continue Sees S2, sees commit_lsn == 0, wait Sees S3, continue Sees S2, continue Sees S1, sees commit_lsn != 0, continue [lost wakeup race, spins on cil lock] [not woken until S2 already through] Sees S3, continue Sees S2, sees commit_lsn == 0, wait Writes commit record Attaches callbacks sets commit_lsn wakeup_all() S3: Sees S3, continue Sees S2, sees commit_lsn != 0, continue Sees S1, sees commit_lsn != 0, continue Writes commit record Attaches callbacks sets commit_lsn wakeup_all() And so the ordering of commit records and multiple callbacks on the same iclog are correct for S1, S2 and S3. So, for and arbitrary set of sequences {S1...Sn, S(n+1)}, the initial conditions are: - S2 is not made active until S1 is on the commit list .... - S(n+1) is not made active until Sn is on the commit list Hence if we have S1 still committing while Sn and S(n+1) have completed their commits, the wait cycle looks like: Sn: Sees Sn, continue; ..... Sees S1, sees commit_lsn == 0, waits S(n+1_: Sees S(n+1), continue; Sees Sn, sees commit_lsn == 0, waits S1: completes commit, Sees S(n+1), continue; Sees Sn, continue; ..... Sees S1, continue; Writes commit record Attaches callbacks sets commit_lsn wakeup_all() Sn: S(n+1): wake, restart wake, restart [win wakeup race on cil lock] Sees S(n+1), continue Sees Sn, sees commit_lsn == 0, wait Sees S(n+1), continue Sees Sn, continue ..... Sees S1, sees commit_lsn != 0, continue [lost wakeup race, spins on cil lock] [not woken until Sn already through] Sees S(n+1), continue Sees Sn, sees commit_lsn == 0, wait Writes commit record Attaches callbacks sets commit_lsn wakeup_all() S3: Sees S(n+1), continue Sees Sn, sees commit_lsn != 0, continue ..... Sees S1, sees commit_lsn != 0, continue Writes commit record Attaches callbacks sets commit_lsn wakeup_all() Hence by induction the ordering of commit records and multiple callbacks on the same iclog are correct for {S1...Sn, S(n+1)} QED _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: don't free EFIs before the EFDs are committed 2013-04-03 3:09 [PATCH] xfs: don't free EFIs before the EFDs are committed Dave Chinner 2013-04-03 19:12 ` Mark Tinguely 2013-04-04 22:06 ` Mark Tinguely @ 2013-04-05 18:31 ` Ben Myers 2 siblings, 0 replies; 10+ messages in thread From: Ben Myers @ 2013-04-05 18:31 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Wed, Apr 03, 2013 at 02:09:21PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > Filesystems are occasionally being shut down with this error: > > xfs_trans_ail_delete_bulk: attempting to delete a log item that is > not in the AIL. > > It was diagnosed to be related to the EFI/EFD commit order when the > EFI and EFD are in different checkpoints and the EFD is committed > before the EFI here: > > http://oss.sgi.com/archives/xfs/2013-01/msg00082.html > > The real problem is that a single bit cannot fully describe the > states that the EFI/EFD processing can be in. These completion > states are: > > EFI EFI in AIL EFD Result > committed/unpinned Yes committed OK > committed/pinned No committed Shutdown > uncommitted No committed Shutdown > > > Note that the "result" field is what should happen, not what does > happen. The current logic is broken and handles the first two cases > correctly by luck. That is, the code will free the EFI if the > XFS_EFI_COMMITTED bit is *not* set, rather than if it is set. The > inverted logic "works" because if both EFI and EFD are committed, > then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED > bit, and the second frees the EFI item. Hence as long as > xfs_efi_item_committed() has been called, everything appears to be > fine. > > It is the third case where the logic fails - where > xfs_efd_item_committed() is called before xfs_efi_item_committed(), > and that results in the EFI being freed before it has been > committed. That is the bug that triggered the shutdown, and hence > keeping track of whether the EFI has been committed or not is > insufficient to correctly order the EFI/EFD operations w.r.t. the > AIL. > > What we really want is this: the EFI is always placed into the > AIL before the last reference goes away. The only way to guarantee > that is that the EFI is not freed until after it has been unpinned > *and* the EFD has been committed. That is, restructure the logic so > that the only case that can occur is the first case. > > This can be done easily by replacing the XFS_EFI_COMMITTED with an > EFI reference count. The EFI is initialised with it's own count, and > that is not released until it is unpinned. However, there is a > complication to this method - the high level EFI/EFD code in > xfs_bmap_finish() does not hold direct references to the EFI > structure, and runs a transaction commit between the EFI and EFD > processing. Hence the EFI can be freed even before the EFD is > created using such a method. > > Further, log recovery uses the AIL for tracking EFI/EFDs that need > to be recovered, but it uses the AIL *differently* to the EFI > transaction commit. Hence log recovery never pins or unpins EFIs, so > we can't drop the EFI reference count indirectly to free the EFI. > > However, this doesn't prevent us from using a reference count here. > There is a 1:1 relationship between EFIs and EFDs, so when we > initialise the EFI we can take a reference count for the EFD as > well. This solves the xfs_bmap_finish() issue - the EFI will never > be freed until the EFD is processed. In terms of log recovery, > during the committing of the EFD we can look for the > XFS_EFI_RECOVERED bit being set and drop the EFI reference as well, > thereby ensuring everything works correctly there as well. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> Applied. Regards, Ben _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2013-04-05 18:31 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-04-03 3:09 [PATCH] xfs: don't free EFIs before the EFDs are committed Dave Chinner 2013-04-03 19:12 ` Mark Tinguely 2013-04-03 19:46 ` Eric Sandeen 2013-04-03 21:02 ` Eric Sandeen 2013-04-03 21:45 ` Mark Tinguely 2013-04-04 1:31 ` Dave Chinner 2013-04-04 1:14 ` Dave Chinner 2013-04-04 22:06 ` Mark Tinguely 2013-04-05 0:45 ` Dave Chinner 2013-04-05 18:31 ` Ben Myers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox