linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] xfs: bli refcount fixups
@ 2018-08-27 14:25 Brian Foster
  2018-08-27 14:25 ` [PATCH 1/3] xfs: don't unlock invalidated buf on aborted tx commit Brian Foster
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Brian Foster @ 2018-08-27 14:25 UTC (permalink / raw)
  To: linux-xfs

Hi all,

This is a v2 of the stale buffer locking fix for an aborted transaction.
It has minor refactoring changes noted below. I've added two optional
patches on top because I wanted to take a stab at a bit more cleanup in
this area, but these are aesthetic changes only and can be dropped if
desired.

Patch 1 is the original bug fix. Patch 2 cleans up xfs_trans_brelse() to
remove some cruft and use similar logic/flow as xfs_buf_item_unlock().
Patch 3 refactors the bli refcount cleanup logic that is common between
the two functions. There is still open-coded bli refcount handling on
unpin because there are additional considerations in that path. Stale
inode buffer handling has special handling, for whatever reason. The
stale inode buf handling actually looks quite similar to the
non-specialized buffer handling, but further cleanup in that area would
require a more in-depth audit to be sure.

I find the end result a bit cleaner but I'm not wedded to it and I'm
fine with just dropping patches 2-3 if the current approach is
preferred. FWIW, this also raised a couple thoughts on alternative
approaches, such as reworking the bli refcount such that the AIL also
holds a reference, but it's not totally clear that (by itself)
simplifies things in the case of a shutdown with AIL-held blis.

Thoughts, reviews, flames appreciated.

Brian

v2:
- Refactor stale bli unlock logic to more closely resemble original.
- Use bool for 'freed' variable and clean up type casting.
- Reorder asserts/tracepoints and remove unused stale tracepoint.
- Added patches 2 and 3.
v1: https://marc.info/?l=linux-xfs&m=153486008127962&w=2

Brian Foster (3):
  xfs: don't unlock invalidated buf on aborted tx commit
  xfs: clean up xfs_trans_brelse()
  xfs: refactor xfs_buf_log_item reference count handling

 fs/xfs/xfs_buf_item.c  | 115 +++++++++++++++++++++--------------------
 fs/xfs/xfs_buf_item.h  |   1 +
 fs/xfs/xfs_trace.h     |   1 -
 fs/xfs/xfs_trans_buf.c |  98 +++++++++--------------------------
 4 files changed, 84 insertions(+), 131 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] xfs: don't unlock invalidated buf on aborted tx commit
  2018-08-27 14:25 [PATCH v2 0/3] xfs: bli refcount fixups Brian Foster
@ 2018-08-27 14:25 ` Brian Foster
  2018-08-27 14:25 ` [PATCH 2/3] xfs: clean up xfs_trans_brelse() Brian Foster
  2018-08-27 14:25 ` [PATCH 3/3] xfs: refactor xfs_buf_log_item reference count handling Brian Foster
  2 siblings, 0 replies; 8+ messages in thread
From: Brian Foster @ 2018-08-27 14:25 UTC (permalink / raw)
  To: linux-xfs

xfstests generic/388,475 occasionally reproduce assertion failures
in xfs_buf_item_unpin() when the final bli reference is dropped on
an invalidated buffer and the buffer is not locked as it is expected
to be. Invalidated buffers should remain locked on transaction
commit until the final unpin, at which point the buffer is removed
from the AIL and the bli is freed since stale buffers are not
written back.

The assert failures are associated with filesystem shutdown,
typically due to log I/O errors injected by the test. The
problematic situation can occur if the shutdown happens to cause a
race between an active transaction that has invalidated a particular
buffer and an I/O error on a log buffer that contains the bli
associated with the same (now stale) buffer.

Both transaction and log contexts acquire a bli reference. If the
transaction has already invalidated the buffer by the time the I/O
error occurs and ends up aborting due to shutdown, the transaction
and log hold the last two references to a stale bli. If the
transaction cancel occurs first, it treats the buffer as non-stale
due to the aborted state: the bli reference is dropped and the
buffer is released/unlocked. The log buffer I/O error handling
eventually calls into xfs_buf_item_unpin(), drops the final
reference to the bli and treats it as stale. The buffer wasn't left
locked by xfs_buf_item_unlock(), however, so the assert fails and
the buffer is double unlocked. The latter problem is mitigated by
the fact that the fs is shutdown and no further damage is possible.

->iop_unlock() of an invalidated buffer should behave consistently
with respect to the bli refcount, regardless of aborted state. If
the refcount remains elevated on commit, we know the bli is awaiting
an unpin (since it can't be in another transaction) and will be
handled appropriately on log buffer completion. If the final bli
reference of an invalidated buffer is dropped in ->iop_unlock(), we
can assume the transaction has aborted because invalidation implies
a dirty transaction. In the non-abort case, the log would have
acquired a bli reference in ->iop_pin() and prevented bli release at
->iop_unlock() time. In the abort case the item must be freed and
buffer unlocked because it wasn't pinned by the log.

Rework xfs_buf_item_unlock() to simplify the currently circuitous
and duplicate logic and leave invalidated buffers locked based on
bli refcount, regardless of aborted state. This ensures that a
pinned, stale buffer is always found locked when eventually
unpinned.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf_item.c | 92 ++++++++++++++++++++-----------------------
 fs/xfs/xfs_trace.h    |  1 -
 2 files changed, 42 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 1c9d1398980b..b39d1b5320e7 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -556,73 +556,65 @@ xfs_buf_item_unlock(
 {
 	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
 	struct xfs_buf		*bp = bip->bli_buf;
+	bool			freed;
 	bool			aborted;
-	bool			hold = !!(bip->bli_flags & XFS_BLI_HOLD);
-	bool			dirty = !!(bip->bli_flags & XFS_BLI_DIRTY);
+	bool			hold = bip->bli_flags & XFS_BLI_HOLD;
+	bool			dirty = bip->bli_flags & XFS_BLI_DIRTY;
+	bool			stale = bip->bli_flags & XFS_BLI_STALE;
 #if defined(DEBUG) || defined(XFS_WARN)
-	bool			ordered = !!(bip->bli_flags & XFS_BLI_ORDERED);
+	bool			ordered = bip->bli_flags & XFS_BLI_ORDERED;
 #endif
-
-	aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags);
-
-	/* Clear the buffer's association with this transaction. */
-	bp->b_transp = NULL;
-
-	/*
-	 * The per-transaction state has been copied above so clear it from the
-	 * bli.
-	 */
-	bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
-
-	/*
-	 * If the buf item is marked stale, then don't do anything.  We'll
-	 * unlock the buffer and free the buf item when the buffer is unpinned
-	 * for the last time.
-	 */
-	if (bip->bli_flags & XFS_BLI_STALE) {
-		trace_xfs_buf_item_unlock_stale(bip);
-		ASSERT(bip->__bli_format.blf_flags & XFS_BLF_CANCEL);
-		if (!aborted) {
-			atomic_dec(&bip->bli_refcount);
-			return;
-		}
-	}
-
 	trace_xfs_buf_item_unlock(bip);
 
 	/*
-	 * If the buf item isn't tracking any data, free it, otherwise drop the
-	 * reference we hold to it. If we are aborting the transaction, this may
-	 * be the only reference to the buf item, so we free it anyway
-	 * regardless of whether it is dirty or not. A dirty abort implies a
-	 * shutdown, anyway.
-	 *
 	 * The bli dirty state should match whether the blf has logged segments
 	 * except for ordered buffers, where only the bli should be dirty.
 	 */
 	ASSERT((!ordered && dirty == xfs_buf_item_dirty_format(bip)) ||
 	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
+	ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
+
+	aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags);
 
 	/*
-	 * Clean buffers, by definition, cannot be in the AIL. However, aborted
-	 * buffers may be in the AIL regardless of dirty state. An aborted
-	 * transaction that invalidates a buffer already in the AIL may have
-	 * marked it stale and cleared the dirty state, for example.
-	 *
-	 * Therefore if we are aborting a buffer and we've just taken the last
-	 * reference away, we have to check if it is in the AIL before freeing
-	 * it. We need to free it in this case, because an aborted transaction
-	 * has already shut the filesystem down and this is the last chance we
-	 * will have to do so.
+	 * Clear the buffer's association with this transaction and
+	 * per-transaction state from the bli, which has been copied above.
+	 */
+	bp->b_transp = NULL;
+	bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
+
+	/*
+	 * Drop the transaction's bli reference and deal with the item if we had
+	 * the last one. We must free the item if clean or aborted since it
+	 * wasn't pinned by the log and this is the last chance to do so. If the
+	 * bli is freed and dirty (but non-aborted), the buffer was not dirty in
+	 * this transaction but modified by a previous one and still awaiting
+	 * writeback. In that case, the bli is freed on buffer writeback
+	 * completion.
 	 */
-	if (atomic_dec_and_test(&bip->bli_refcount)) {
-		if (aborted) {
-			ASSERT(XFS_FORCED_SHUTDOWN(lip->li_mountp));
+	freed = atomic_dec_and_test(&bip->bli_refcount);
+	if (freed) {
+		ASSERT(!aborted || XFS_FORCED_SHUTDOWN(lip->li_mountp));
+		/*
+		 * An aborted item may be in the AIL regardless of dirty state.
+		 * For example, consider an aborted transaction that invalidated
+		 * a dirty bli and cleared the dirty state.
+		 */
+		if (aborted)
 			xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
+		if (aborted || !dirty)
 			xfs_buf_item_relse(bp);
-		} else if (!dirty)
-			xfs_buf_item_relse(bp);
+	} else if (stale) {
+		/*
+		 * Stale buffers remain locked until final unpin unless the bli
+		 * was freed in the branch above. A freed stale bli implies an
+		 * abort because buffer invalidation dirties the bli and
+		 * transaction.
+		 */
+		ASSERT(!freed);
+		return;
 	}
+	ASSERT(!stale || (aborted && freed));
 
 	if (!hold)
 		xfs_buf_relse(bp);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index ad315e83bc02..3043e5ed6495 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -473,7 +473,6 @@ DEFINE_BUF_ITEM_EVENT(xfs_buf_item_pin);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_unpin);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_unpin_stale);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_unlock);
-DEFINE_BUF_ITEM_EVENT(xfs_buf_item_unlock_stale);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_committed);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_push);
 DEFINE_BUF_ITEM_EVENT(xfs_trans_get_buf);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] xfs: clean up xfs_trans_brelse()
  2018-08-27 14:25 [PATCH v2 0/3] xfs: bli refcount fixups Brian Foster
  2018-08-27 14:25 ` [PATCH 1/3] xfs: don't unlock invalidated buf on aborted tx commit Brian Foster
@ 2018-08-27 14:25 ` Brian Foster
  2018-08-28 22:36   ` Dave Chinner
  2018-08-27 14:25 ` [PATCH 3/3] xfs: refactor xfs_buf_log_item reference count handling Brian Foster
  2 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2018-08-27 14:25 UTC (permalink / raw)
  To: linux-xfs

xfs_trans_brelse() is a bit of a historical mess, similar to
xfs_buf_item_unlock(). It is unnecessarily verbose, has snippets of
commented out code, inconsistency with regard to stale items, etc.

Clean up xfs_trans_brelse() to use similar logic and flow as
xfs_buf_item_unlock() with regard to bli reference count handling.
This patch makes no functional changes, but facilitates further
refactoring of the common bli reference count handling code.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_trans_buf.c | 108 +++++++++++++++--------------------------
 1 file changed, 38 insertions(+), 70 deletions(-)

diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 15919f67a88f..92a708cdfee3 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -322,49 +322,40 @@ xfs_trans_read_buf_map(
 }
 
 /*
- * Release the buffer bp which was previously acquired with one of the
- * xfs_trans_... buffer allocation routines if the buffer has not
- * been modified within this transaction.  If the buffer is modified
- * within this transaction, do decrement the recursion count but do
- * not release the buffer even if the count goes to 0.  If the buffer is not
- * modified within the transaction, decrement the recursion count and
- * release the buffer if the recursion count goes to 0.
+ * Release a buffer previously joined to the transaction. If the buffer is
+ * modified within this transaction, decrement the recursion count but do not
+ * release the buffer even if the count goes to 0. If the buffer is not modified
+ * within the transaction, decrement the recursion count and release the buffer
+ * if the recursion count goes to 0.
  *
- * If the buffer is to be released and it was not modified before
- * this transaction began, then free the buf_log_item associated with it.
+ * If the buffer is to be released and it was not already dirty before this
+ * transaction began, then also free the buf_log_item associated with it.
  *
- * If the transaction pointer is NULL, make this just a normal
- * brelse() call.
+ * If the transaction pointer is NULL, this is a normal xfs_buf_relse() call.
  */
 void
 xfs_trans_brelse(
-	xfs_trans_t		*tp,
-	xfs_buf_t		*bp)
+	struct xfs_trans	*tp,
+	struct xfs_buf		*bp)
 {
-	struct xfs_buf_log_item	*bip;
-	int			freed;
+	struct xfs_buf_log_item	*bip = bp->b_log_item;
+	bool			freed;
+	bool			dirty;
 
-	/*
-	 * Default to a normal brelse() call if the tp is NULL.
-	 */
-	if (tp == NULL) {
-		ASSERT(bp->b_transp == NULL);
+	ASSERT(bp->b_transp == tp);
+
+	if (!tp) {
 		xfs_buf_relse(bp);
 		return;
 	}
 
-	ASSERT(bp->b_transp == tp);
-	bip = bp->b_log_item;
+	trace_xfs_trans_brelse(bip);
 	ASSERT(bip->bli_item.li_type == XFS_LI_BUF);
-	ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
-	ASSERT(!(bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
 	ASSERT(atomic_read(&bip->bli_refcount) > 0);
 
-	trace_xfs_trans_brelse(bip);
-
 	/*
-	 * If the release is just for a recursive lock,
-	 * then decrement the count and return.
+	 * If the release is for a recursive lookup, then decrement the count
+	 * and return.
 	 */
 	if (bip->bli_recur > 0) {
 		bip->bli_recur--;
@@ -372,63 +363,40 @@ xfs_trans_brelse(
 	}
 
 	/*
-	 * If the buffer is dirty within this transaction, we can't
+	 * If the buffer is invalidated or dirty in this transaction, we can't
 	 * release it until we commit.
 	 */
 	if (test_bit(XFS_LI_DIRTY, &bip->bli_item.li_flags))
 		return;
-
-	/*
-	 * If the buffer has been invalidated, then we can't release
-	 * it until the transaction commits to disk unless it is re-dirtied
-	 * as part of this transaction.  This prevents us from pulling
-	 * the item from the AIL before we should.
-	 */
 	if (bip->bli_flags & XFS_BLI_STALE)
 		return;
 
-	ASSERT(!(bip->bli_flags & XFS_BLI_LOGGED));
-
 	/*
-	 * Free up the log item descriptor tracking the released item.
+	 * Unlink the log item from the transaction and clear the hold flag, if
+	 * set. We wouldn't want the next user of the buffer to get confused.
 	 */
+	ASSERT(!(bip->bli_flags & XFS_BLI_LOGGED));
 	xfs_trans_del_item(&bip->bli_item);
-
-	/*
-	 * Clear the hold flag in the buf log item if it is set.
-	 * We wouldn't want the next user of the buffer to
-	 * get confused.
-	 */
-	if (bip->bli_flags & XFS_BLI_HOLD) {
+	if (bip->bli_flags & XFS_BLI_HOLD)
 		bip->bli_flags &= ~XFS_BLI_HOLD;
-	}
 
 	/*
-	 * Drop our reference to the buf log item.
+	 * Drop the reference to the bli. At this point, the bli must be either
+	 * freed or dirty (or both). If freed, there are a couple cases where we
+	 * are responsible to free the item. If the bli is clean, we're the last
+	 * user of it. If the fs has shut down, the bli may be dirty and AIL
+	 * resident, but won't ever be written back.
 	 */
 	freed = atomic_dec_and_test(&bip->bli_refcount);
-
-	/*
-	 * If the buf item is not tracking data in the log, then we must free it
-	 * before releasing the buffer back to the free pool.
-	 *
-	 * If the fs has shutdown and we dropped the last reference, it may fall
-	 * on us to release a (possibly dirty) bli if it never made it to the
-	 * AIL (e.g., the aborted unpin already happened and didn't release it
-	 * due to our reference). Since we're already shutdown and need
-	 * ail_lock, just force remove from the AIL and release the bli here.
-	 */
-	if (XFS_FORCED_SHUTDOWN(tp->t_mountp) && freed) {
-		xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
-		xfs_buf_item_relse(bp);
-	} else if (!(bip->bli_flags & XFS_BLI_DIRTY)) {
-/***
-		ASSERT(bp->b_pincount == 0);
-***/
-		ASSERT(atomic_read(&bip->bli_refcount) == 0);
-		ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
-		ASSERT(!(bip->bli_flags & XFS_BLI_INODE_ALLOC_BUF));
-		xfs_buf_item_relse(bp);
+	dirty = bip->bli_flags & XFS_BLI_DIRTY;
+	ASSERT(freed || dirty);
+	if (freed) {
+		bool abort = XFS_FORCED_SHUTDOWN(tp->t_mountp);
+		ASSERT(abort || !test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
+		if (abort)
+			xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
+		if (!dirty || abort)
+			xfs_buf_item_relse(bp);
 	}
 
 	bp->b_transp = NULL;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] xfs: refactor xfs_buf_log_item reference count handling
  2018-08-27 14:25 [PATCH v2 0/3] xfs: bli refcount fixups Brian Foster
  2018-08-27 14:25 ` [PATCH 1/3] xfs: don't unlock invalidated buf on aborted tx commit Brian Foster
  2018-08-27 14:25 ` [PATCH 2/3] xfs: clean up xfs_trans_brelse() Brian Foster
@ 2018-08-27 14:25 ` Brian Foster
  2018-08-28 22:59   ` Dave Chinner
  2 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2018-08-27 14:25 UTC (permalink / raw)
  To: linux-xfs

The xfs_buf_log_item structure has a reference counter with slightly
tricky semantics. In the common case, a buffer is logged and
committed in a transaction, committed to the on-disk log (added to
the AIL) and then finally written back and removed from the AIL. The
bli refcount covers two potentially overlapping timeframes:

 1. the bli is held in an active transaction
 2. the bli is pinned by the log

The caveat to this approach is that the reference counter does not
purely dictate the lifetime of the bli. IOW, when a dirty buffer is
physically logged and unpinned, the bli refcount may go to zero as
the log item is inserted into the AIL. Only once the buffer is
written back can the bli finally be freed.

The above semantics means that it is not enough for the various
refcount decrementing contexts to release the bli on decrement to
zero. xfs_trans_brelse(), transaction commit (->iop_unlock()) and
unpin (->iop_unpin()) must all drop the associated reference and
make additional checks to determine if the current context is
responsible for freeing the item.

For example, if a transaction holds but does not dirty a particular
bli, the commit may drop the refcount to zero. If the bli itself is
clean, it is also not AIL resident and must be freed at this time.
The same is true for xfs_trans_brelse(). If the transaction dirties
a bli and then aborts or an unpin results in an abort due to a log
I/O error, the last reference count holder is expected to explicitly
remove the item from the AIL and release it (since an abort means
filesystem shutdown and metadata writeback will never occur).

This leads to fairly complex checks being replicated in a few
different places. Since ->iop_unlock() and xfs_trans_brelse() are
nearly identical, refactor the logic into a common helper that
implements and documents the semantics in one place. This patch does
not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf_item.c  | 85 ++++++++++++++++++++++++------------------
 fs/xfs/xfs_buf_item.h  |  1 +
 fs/xfs/xfs_trans_buf.c | 22 +----------
 3 files changed, 51 insertions(+), 57 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index b39d1b5320e7..d0191451fe18 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -556,13 +556,12 @@ xfs_buf_item_unlock(
 {
 	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
 	struct xfs_buf		*bp = bip->bli_buf;
-	bool			freed;
-	bool			aborted;
+	bool			released;
 	bool			hold = bip->bli_flags & XFS_BLI_HOLD;
-	bool			dirty = bip->bli_flags & XFS_BLI_DIRTY;
 	bool			stale = bip->bli_flags & XFS_BLI_STALE;
 #if defined(DEBUG) || defined(XFS_WARN)
 	bool			ordered = bip->bli_flags & XFS_BLI_ORDERED;
+	bool			dirty = bip->bli_flags & XFS_BLI_DIRTY;
 #endif
 	trace_xfs_buf_item_unlock(bip);
 
@@ -574,8 +573,6 @@ xfs_buf_item_unlock(
 	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
 	ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
 
-	aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags);
-
 	/*
 	 * Clear the buffer's association with this transaction and
 	 * per-transaction state from the bli, which has been copied above.
@@ -584,40 +581,16 @@ xfs_buf_item_unlock(
 	bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
 
 	/*
-	 * Drop the transaction's bli reference and deal with the item if we had
-	 * the last one. We must free the item if clean or aborted since it
-	 * wasn't pinned by the log and this is the last chance to do so. If the
-	 * bli is freed and dirty (but non-aborted), the buffer was not dirty in
-	 * this transaction but modified by a previous one and still awaiting
-	 * writeback. In that case, the bli is freed on buffer writeback
-	 * completion.
+	 * Unref the item and unlock the buffer unless held or stale. Stale
+	 * buffers remain locked until final unpin unless the bli is freed by
+	 * the unref call. The latter implies shutdown because buffer
+	 * invalidation dirties the bli and transaction.
 	 */
-	freed = atomic_dec_and_test(&bip->bli_refcount);
-	if (freed) {
-		ASSERT(!aborted || XFS_FORCED_SHUTDOWN(lip->li_mountp));
-		/*
-		 * An aborted item may be in the AIL regardless of dirty state.
-		 * For example, consider an aborted transaction that invalidated
-		 * a dirty bli and cleared the dirty state.
-		 */
-		if (aborted)
-			xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
-		if (aborted || !dirty)
-			xfs_buf_item_relse(bp);
-	} else if (stale) {
-		/*
-		 * Stale buffers remain locked until final unpin unless the bli
-		 * was freed in the branch above. A freed stale bli implies an
-		 * abort because buffer invalidation dirties the bli and
-		 * transaction.
-		 */
-		ASSERT(!freed);
+	released = xfs_buf_item_unref(bip);
+	if ((stale && !released) || hold)
 		return;
-	}
-	ASSERT(!stale || (aborted && freed));
-
-	if (!hold)
-		xfs_buf_relse(bp);
+	ASSERT(!stale || test_bit(XFS_LI_ABORTED, &lip->li_flags));
+	xfs_buf_relse(bp);
 }
 
 /*
@@ -951,6 +924,44 @@ xfs_buf_item_relse(
 	xfs_buf_item_free(bip);
 }
 
+bool
+xfs_buf_item_unref(
+	struct xfs_buf_log_item	*bip)
+{
+	struct xfs_log_item	*lip = &bip->bli_item;
+	bool			aborted;
+	bool			dirty;
+	bool			freed;
+
+	/* drop the bli ref and return if it wasn't the last one */
+	freed = atomic_dec_and_test(&bip->bli_refcount);
+	if (!freed)
+		return false;
+
+	/*
+	 * We dropped the last ref and must free the item if clean or aborted.
+	 * If the bli is dirty and non-aborted, the buffer was clean in the
+	 * transaction but still awaiting writeback from previous changes. In
+	 * that case, the bli is freed on buffer writeback completion.
+	 */
+	aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags) ||
+		  XFS_FORCED_SHUTDOWN(lip->li_mountp);
+	dirty = bip->bli_flags & XFS_BLI_DIRTY;
+	if (!aborted && dirty)
+		return false;
+
+	/*
+	 * The bli is aborted or clean. An aborted item may be in the AIL
+	 * regardless of dirty state.  For example, consider an aborted
+	 * transaction that invalidated a dirty bli and cleared the dirty
+	 * state.
+	 */
+	if (aborted)
+		xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
+	xfs_buf_item_relse(bip->bli_buf);
+	return true;
+}
+
 
 /*
  * Add the given log item with its callback to the list of callbacks
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index 3f7d7b72e7e6..ac5663955daf 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -51,6 +51,7 @@ struct xfs_buf_log_item {
 
 int	xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *);
 void	xfs_buf_item_relse(struct xfs_buf *);
+bool	xfs_buf_item_unref(struct xfs_buf_log_item *);
 void	xfs_buf_item_log(struct xfs_buf_log_item *, uint, uint);
 bool	xfs_buf_item_dirty_format(struct xfs_buf_log_item *);
 void	xfs_buf_attach_iodone(struct xfs_buf *,
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 92a708cdfee3..4a16db972518 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -339,8 +339,6 @@ xfs_trans_brelse(
 	struct xfs_buf		*bp)
 {
 	struct xfs_buf_log_item	*bip = bp->b_log_item;
-	bool			freed;
-	bool			dirty;
 
 	ASSERT(bp->b_transp == tp);
 
@@ -380,24 +378,8 @@ xfs_trans_brelse(
 	if (bip->bli_flags & XFS_BLI_HOLD)
 		bip->bli_flags &= ~XFS_BLI_HOLD;
 
-	/*
-	 * Drop the reference to the bli. At this point, the bli must be either
-	 * freed or dirty (or both). If freed, there are a couple cases where we
-	 * are responsible to free the item. If the bli is clean, we're the last
-	 * user of it. If the fs has shut down, the bli may be dirty and AIL
-	 * resident, but won't ever be written back.
-	 */
-	freed = atomic_dec_and_test(&bip->bli_refcount);
-	dirty = bip->bli_flags & XFS_BLI_DIRTY;
-	ASSERT(freed || dirty);
-	if (freed) {
-		bool abort = XFS_FORCED_SHUTDOWN(tp->t_mountp);
-		ASSERT(abort || !test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
-		if (abort)
-			xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
-		if (!dirty || abort)
-			xfs_buf_item_relse(bp);
-	}
+	/* drop the reference to the bli */
+	xfs_buf_item_unref(bip);
 
 	bp->b_transp = NULL;
 	xfs_buf_relse(bp);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] xfs: clean up xfs_trans_brelse()
  2018-08-27 14:25 ` [PATCH 2/3] xfs: clean up xfs_trans_brelse() Brian Foster
@ 2018-08-28 22:36   ` Dave Chinner
  2018-08-29 13:55     ` Brian Foster
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2018-08-28 22:36 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Aug 27, 2018 at 10:25:47AM -0400, Brian Foster wrote:
> xfs_trans_brelse() is a bit of a historical mess, similar to
> xfs_buf_item_unlock(). It is unnecessarily verbose, has snippets of
> commented out code, inconsistency with regard to stale items, etc.
> 
> Clean up xfs_trans_brelse() to use similar logic and flow as
> xfs_buf_item_unlock() with regard to bli reference count handling.
> This patch makes no functional changes, but facilitates further
> refactoring of the common bli reference count handling code.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Nice! I like it a lot. Couple of minor things....

> -	ASSERT(!(bip->bli_flags & XFS_BLI_LOGGED));
> -
>  	/*
> -	 * Free up the log item descriptor tracking the released item.
> +	 * Unlink the log item from the transaction and clear the hold flag, if
> +	 * set. We wouldn't want the next user of the buffer to get confused.
>  	 */
> +	ASSERT(!(bip->bli_flags & XFS_BLI_LOGGED));
>  	xfs_trans_del_item(&bip->bli_item);
> -
> -	/*
> -	 * Clear the hold flag in the buf log item if it is set.
> -	 * We wouldn't want the next user of the buffer to
> -	 * get confused.
> -	 */
> -	if (bip->bli_flags & XFS_BLI_HOLD) {
> +	if (bip->bli_flags & XFS_BLI_HOLD)
>  		bip->bli_flags &= ~XFS_BLI_HOLD;
> -	}

May as well just unconditionally clear XFS_BLI_HOLD - the cache line
is already dirty by this point, so it's less code than checking to
see if we do need to clear it.

>  	/*
> -	 * Drop our reference to the buf log item.
> +	 * Drop the reference to the bli. At this point, the bli must be either
> +	 * freed or dirty (or both). If freed, there are a couple cases where we
> +	 * are responsible to free the item. If the bli is clean, we're the last
> +	 * user of it. If the fs has shut down, the bli may be dirty and AIL
> +	 * resident, but won't ever be written back.
						    so we may also need to
	 * remove it from the AIL before freeing it
>  	 */
>  	freed = atomic_dec_and_test(&bip->bli_refcount);
> -
> -	/*
> -	 * If the buf item is not tracking data in the log, then we must free it
> -	 * before releasing the buffer back to the free pool.
> -	 *
> -	 * If the fs has shutdown and we dropped the last reference, it may fall
> -	 * on us to release a (possibly dirty) bli if it never made it to the
> -	 * AIL (e.g., the aborted unpin already happened and didn't release it
> -	 * due to our reference). Since we're already shutdown and need
> -	 * ail_lock, just force remove from the AIL and release the bli here.
> -	 */
> -	if (XFS_FORCED_SHUTDOWN(tp->t_mountp) && freed) {
> -		xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
> -		xfs_buf_item_relse(bp);
> -	} else if (!(bip->bli_flags & XFS_BLI_DIRTY)) {
> -/***
> -		ASSERT(bp->b_pincount == 0);
> -***/
> -		ASSERT(atomic_read(&bip->bli_refcount) == 0);
> -		ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
> -		ASSERT(!(bip->bli_flags & XFS_BLI_INODE_ALLOC_BUF));
> -		xfs_buf_item_relse(bp);
> +	dirty = bip->bli_flags & XFS_BLI_DIRTY;
> +	ASSERT(freed || dirty);
> +	if (freed) {
> +		bool abort = XFS_FORCED_SHUTDOWN(tp->t_mountp);
> +		ASSERT(abort || !test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
> +		if (abort)
> +			xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
> +		if (!dirty || abort)
> +			xfs_buf_item_relse(bp);
>  	}

That, overall, is much nicer than the current code and worth doing,
I think.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] xfs: refactor xfs_buf_log_item reference count handling
  2018-08-27 14:25 ` [PATCH 3/3] xfs: refactor xfs_buf_log_item reference count handling Brian Foster
@ 2018-08-28 22:59   ` Dave Chinner
  2018-08-29 13:55     ` Brian Foster
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2018-08-28 22:59 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Aug 27, 2018 at 10:25:48AM -0400, Brian Foster wrote:
> The xfs_buf_log_item structure has a reference counter with slightly
> tricky semantics. In the common case, a buffer is logged and
> committed in a transaction, committed to the on-disk log (added to
> the AIL) and then finally written back and removed from the AIL. The
> bli refcount covers two potentially overlapping timeframes:
> 
>  1. the bli is held in an active transaction
>  2. the bli is pinned by the log
> 
> The caveat to this approach is that the reference counter does not
> purely dictate the lifetime of the bli. IOW, when a dirty buffer is
> physically logged and unpinned, the bli refcount may go to zero as
> the log item is inserted into the AIL. Only once the buffer is
> written back can the bli finally be freed.
> 
> The above semantics means that it is not enough for the various
> refcount decrementing contexts to release the bli on decrement to
> zero. xfs_trans_brelse(), transaction commit (->iop_unlock()) and
> unpin (->iop_unpin()) must all drop the associated reference and
> make additional checks to determine if the current context is
> responsible for freeing the item.
> 
> For example, if a transaction holds but does not dirty a particular
> bli, the commit may drop the refcount to zero. If the bli itself is
> clean, it is also not AIL resident and must be freed at this time.
> The same is true for xfs_trans_brelse(). If the transaction dirties
> a bli and then aborts or an unpin results in an abort due to a log
> I/O error, the last reference count holder is expected to explicitly
> remove the item from the AIL and release it (since an abort means
> filesystem shutdown and metadata writeback will never occur).
> 
> This leads to fairly complex checks being replicated in a few
> different places. Since ->iop_unlock() and xfs_trans_brelse() are
> nearly identical, refactor the logic into a common helper that
> implements and documents the semantics in one place. This patch does
> not change behavior.

I think this improves the code further, because now we don't have to
care about the buf item cleanup when dropping references.

> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_buf_item.c  | 85 ++++++++++++++++++++++++------------------
>  fs/xfs/xfs_buf_item.h  |  1 +
>  fs/xfs/xfs_trans_buf.c | 22 +----------
>  3 files changed, 51 insertions(+), 57 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index b39d1b5320e7..d0191451fe18 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -556,13 +556,12 @@ xfs_buf_item_unlock(
>  {
>  	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
>  	struct xfs_buf		*bp = bip->bli_buf;
> -	bool			freed;
> -	bool			aborted;
> +	bool			released;
>  	bool			hold = bip->bli_flags & XFS_BLI_HOLD;
> -	bool			dirty = bip->bli_flags & XFS_BLI_DIRTY;
>  	bool			stale = bip->bli_flags & XFS_BLI_STALE;
>  #if defined(DEBUG) || defined(XFS_WARN)
>  	bool			ordered = bip->bli_flags & XFS_BLI_ORDERED;
> +	bool			dirty = bip->bli_flags & XFS_BLI_DIRTY;
>  #endif
>  	trace_xfs_buf_item_unlock(bip);
>  
> @@ -574,8 +573,6 @@ xfs_buf_item_unlock(
>  	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
>  	ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
>  
> -	aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags);
> -
>  	/*
>  	 * Clear the buffer's association with this transaction and
>  	 * per-transaction state from the bli, which has been copied above.
> @@ -584,40 +581,16 @@ xfs_buf_item_unlock(
>  	bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
>  
>  	/*
> -	 * Drop the transaction's bli reference and deal with the item if we had
> -	 * the last one. We must free the item if clean or aborted since it
> -	 * wasn't pinned by the log and this is the last chance to do so. If the
> -	 * bli is freed and dirty (but non-aborted), the buffer was not dirty in
> -	 * this transaction but modified by a previous one and still awaiting
> -	 * writeback. In that case, the bli is freed on buffer writeback
> -	 * completion.
> +	 * Unref the item and unlock the buffer unless held or stale. Stale
> +	 * buffers remain locked until final unpin unless the bli is freed by
> +	 * the unref call. The latter implies shutdown because buffer
> +	 * invalidation dirties the bli and transaction.
>  	 */
> -	freed = atomic_dec_and_test(&bip->bli_refcount);
> -	if (freed) {
> -		ASSERT(!aborted || XFS_FORCED_SHUTDOWN(lip->li_mountp));
> -		/*
> -		 * An aborted item may be in the AIL regardless of dirty state.
> -		 * For example, consider an aborted transaction that invalidated
> -		 * a dirty bli and cleared the dirty state.
> -		 */
> -		if (aborted)
> -			xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
> -		if (aborted || !dirty)
> -			xfs_buf_item_relse(bp);
> -	} else if (stale) {
> -		/*
> -		 * Stale buffers remain locked until final unpin unless the bli
> -		 * was freed in the branch above. A freed stale bli implies an
> -		 * abort because buffer invalidation dirties the bli and
> -		 * transaction.
> -		 */
> -		ASSERT(!freed);
> +	released = xfs_buf_item_unref(bip);
> +	if ((stale && !released) || hold)
>  		return;

Personal preference, but I'd put the hold check first because it's
not a compound logic statement.

> +bool
> +xfs_buf_item_unref(

"+xfs_buf_item_put()" to be consistent with dropping references on
other reference counted objects (e.g.  iput, dput, bio_put,
xfs_perag_put, xfs_log_ticket_put, etc).

> +	struct xfs_buf_log_item	*bip)
> +{
> +	struct xfs_log_item	*lip = &bip->bli_item;
> +	bool			aborted;
> +	bool			dirty;
> +	bool			freed;
> +
> +	/* drop the bli ref and return if it wasn't the last one */
> +	freed = atomic_dec_and_test(&bip->bli_refcount);
> +	if (!freed)
> +		return false;

We don't really need the freed variable here. This

	if (!atomic_dec_and_test(refcount))
		return false

is a common enough "do something only when we drop the last
reference" pattern that everyone should understand it by
now.

> +	 * If the bli is dirty and non-aborted, the buffer was clean in the
> +	 * transaction but still awaiting writeback from previous changes. In
> +	 * that case, the bli is freed on buffer writeback completion.
> +	 */
> +	aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags) ||
> +		  XFS_FORCED_SHUTDOWN(lip->li_mountp);
> +	dirty = bip->bli_flags & XFS_BLI_DIRTY;
> +	if (!aborted && dirty)
> +		return false;

Personal preference (again) is to check dirty first - same as the
comment text mentions "dirty and not-aborted"...

These are all minor things - I think it's a good improvement
overall.

Cheers,

dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] xfs: clean up xfs_trans_brelse()
  2018-08-28 22:36   ` Dave Chinner
@ 2018-08-29 13:55     ` Brian Foster
  0 siblings, 0 replies; 8+ messages in thread
From: Brian Foster @ 2018-08-29 13:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Aug 29, 2018 at 08:36:25AM +1000, Dave Chinner wrote:
> On Mon, Aug 27, 2018 at 10:25:47AM -0400, Brian Foster wrote:
> > xfs_trans_brelse() is a bit of a historical mess, similar to
> > xfs_buf_item_unlock(). It is unnecessarily verbose, has snippets of
> > commented out code, inconsistency with regard to stale items, etc.
> > 
> > Clean up xfs_trans_brelse() to use similar logic and flow as
> > xfs_buf_item_unlock() with regard to bli reference count handling.
> > This patch makes no functional changes, but facilitates further
> > refactoring of the common bli reference count handling code.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> 
> Nice! I like it a lot. Couple of minor things....
> 
> > -	ASSERT(!(bip->bli_flags & XFS_BLI_LOGGED));
> > -
> >  	/*
> > -	 * Free up the log item descriptor tracking the released item.
> > +	 * Unlink the log item from the transaction and clear the hold flag, if
> > +	 * set. We wouldn't want the next user of the buffer to get confused.
> >  	 */
> > +	ASSERT(!(bip->bli_flags & XFS_BLI_LOGGED));
> >  	xfs_trans_del_item(&bip->bli_item);
> > -
> > -	/*
> > -	 * Clear the hold flag in the buf log item if it is set.
> > -	 * We wouldn't want the next user of the buffer to
> > -	 * get confused.
> > -	 */
> > -	if (bip->bli_flags & XFS_BLI_HOLD) {
> > +	if (bip->bli_flags & XFS_BLI_HOLD)
> >  		bip->bli_flags &= ~XFS_BLI_HOLD;
> > -	}
> 
> May as well just unconditionally clear XFS_BLI_HOLD - the cache line
> is already dirty by this point, so it's less code than checking to
> see if we do need to clear it.
> 

Good point.

> >  	/*
> > -	 * Drop our reference to the buf log item.
> > +	 * Drop the reference to the bli. At this point, the bli must be either
> > +	 * freed or dirty (or both). If freed, there are a couple cases where we
> > +	 * are responsible to free the item. If the bli is clean, we're the last
> > +	 * user of it. If the fs has shut down, the bli may be dirty and AIL
> > +	 * resident, but won't ever be written back.
> 						    so we may also need to
> 	 * remove it from the AIL before freeing it
> >  	 */

Tweaked... though note that this comment ends up split/reworked in the
subsequent patch.

Brian

> >  	freed = atomic_dec_and_test(&bip->bli_refcount);
> > -
> > -	/*
> > -	 * If the buf item is not tracking data in the log, then we must free it
> > -	 * before releasing the buffer back to the free pool.
> > -	 *
> > -	 * If the fs has shutdown and we dropped the last reference, it may fall
> > -	 * on us to release a (possibly dirty) bli if it never made it to the
> > -	 * AIL (e.g., the aborted unpin already happened and didn't release it
> > -	 * due to our reference). Since we're already shutdown and need
> > -	 * ail_lock, just force remove from the AIL and release the bli here.
> > -	 */
> > -	if (XFS_FORCED_SHUTDOWN(tp->t_mountp) && freed) {
> > -		xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
> > -		xfs_buf_item_relse(bp);
> > -	} else if (!(bip->bli_flags & XFS_BLI_DIRTY)) {
> > -/***
> > -		ASSERT(bp->b_pincount == 0);
> > -***/
> > -		ASSERT(atomic_read(&bip->bli_refcount) == 0);
> > -		ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
> > -		ASSERT(!(bip->bli_flags & XFS_BLI_INODE_ALLOC_BUF));
> > -		xfs_buf_item_relse(bp);
> > +	dirty = bip->bli_flags & XFS_BLI_DIRTY;
> > +	ASSERT(freed || dirty);
> > +	if (freed) {
> > +		bool abort = XFS_FORCED_SHUTDOWN(tp->t_mountp);
> > +		ASSERT(abort || !test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags));
> > +		if (abort)
> > +			xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
> > +		if (!dirty || abort)
> > +			xfs_buf_item_relse(bp);
> >  	}
> 
> That, overall, is much nicer than the current code and worth doing,
> I think.
> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] xfs: refactor xfs_buf_log_item reference count handling
  2018-08-28 22:59   ` Dave Chinner
@ 2018-08-29 13:55     ` Brian Foster
  0 siblings, 0 replies; 8+ messages in thread
From: Brian Foster @ 2018-08-29 13:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Aug 29, 2018 at 08:59:09AM +1000, Dave Chinner wrote:
> On Mon, Aug 27, 2018 at 10:25:48AM -0400, Brian Foster wrote:
> > The xfs_buf_log_item structure has a reference counter with slightly
> > tricky semantics. In the common case, a buffer is logged and
> > committed in a transaction, committed to the on-disk log (added to
> > the AIL) and then finally written back and removed from the AIL. The
> > bli refcount covers two potentially overlapping timeframes:
> > 
> >  1. the bli is held in an active transaction
> >  2. the bli is pinned by the log
> > 
> > The caveat to this approach is that the reference counter does not
> > purely dictate the lifetime of the bli. IOW, when a dirty buffer is
> > physically logged and unpinned, the bli refcount may go to zero as
> > the log item is inserted into the AIL. Only once the buffer is
> > written back can the bli finally be freed.
> > 
> > The above semantics means that it is not enough for the various
> > refcount decrementing contexts to release the bli on decrement to
> > zero. xfs_trans_brelse(), transaction commit (->iop_unlock()) and
> > unpin (->iop_unpin()) must all drop the associated reference and
> > make additional checks to determine if the current context is
> > responsible for freeing the item.
> > 
> > For example, if a transaction holds but does not dirty a particular
> > bli, the commit may drop the refcount to zero. If the bli itself is
> > clean, it is also not AIL resident and must be freed at this time.
> > The same is true for xfs_trans_brelse(). If the transaction dirties
> > a bli and then aborts or an unpin results in an abort due to a log
> > I/O error, the last reference count holder is expected to explicitly
> > remove the item from the AIL and release it (since an abort means
> > filesystem shutdown and metadata writeback will never occur).
> > 
> > This leads to fairly complex checks being replicated in a few
> > different places. Since ->iop_unlock() and xfs_trans_brelse() are
> > nearly identical, refactor the logic into a common helper that
> > implements and documents the semantics in one place. This patch does
> > not change behavior.
> 
> I think this improves the code further, because now we don't have to
> care about the buf item cleanup when dropping references.
> 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_buf_item.c  | 85 ++++++++++++++++++++++++------------------
> >  fs/xfs/xfs_buf_item.h  |  1 +
> >  fs/xfs/xfs_trans_buf.c | 22 +----------
> >  3 files changed, 51 insertions(+), 57 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> > index b39d1b5320e7..d0191451fe18 100644
> > --- a/fs/xfs/xfs_buf_item.c
> > +++ b/fs/xfs/xfs_buf_item.c
> > @@ -556,13 +556,12 @@ xfs_buf_item_unlock(
> >  {
> >  	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
> >  	struct xfs_buf		*bp = bip->bli_buf;
> > -	bool			freed;
> > -	bool			aborted;
> > +	bool			released;
> >  	bool			hold = bip->bli_flags & XFS_BLI_HOLD;
> > -	bool			dirty = bip->bli_flags & XFS_BLI_DIRTY;
> >  	bool			stale = bip->bli_flags & XFS_BLI_STALE;
> >  #if defined(DEBUG) || defined(XFS_WARN)
> >  	bool			ordered = bip->bli_flags & XFS_BLI_ORDERED;
> > +	bool			dirty = bip->bli_flags & XFS_BLI_DIRTY;
> >  #endif
> >  	trace_xfs_buf_item_unlock(bip);
> >  
> > @@ -574,8 +573,6 @@ xfs_buf_item_unlock(
> >  	       (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
> >  	ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
> >  
> > -	aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags);
> > -
> >  	/*
> >  	 * Clear the buffer's association with this transaction and
> >  	 * per-transaction state from the bli, which has been copied above.
> > @@ -584,40 +581,16 @@ xfs_buf_item_unlock(
> >  	bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
> >  
> >  	/*
> > -	 * Drop the transaction's bli reference and deal with the item if we had
> > -	 * the last one. We must free the item if clean or aborted since it
> > -	 * wasn't pinned by the log and this is the last chance to do so. If the
> > -	 * bli is freed and dirty (but non-aborted), the buffer was not dirty in
> > -	 * this transaction but modified by a previous one and still awaiting
> > -	 * writeback. In that case, the bli is freed on buffer writeback
> > -	 * completion.
> > +	 * Unref the item and unlock the buffer unless held or stale. Stale
> > +	 * buffers remain locked until final unpin unless the bli is freed by
> > +	 * the unref call. The latter implies shutdown because buffer
> > +	 * invalidation dirties the bli and transaction.
> >  	 */
> > -	freed = atomic_dec_and_test(&bip->bli_refcount);
> > -	if (freed) {
> > -		ASSERT(!aborted || XFS_FORCED_SHUTDOWN(lip->li_mountp));
> > -		/*
> > -		 * An aborted item may be in the AIL regardless of dirty state.
> > -		 * For example, consider an aborted transaction that invalidated
> > -		 * a dirty bli and cleared the dirty state.
> > -		 */
> > -		if (aborted)
> > -			xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
> > -		if (aborted || !dirty)
> > -			xfs_buf_item_relse(bp);
> > -	} else if (stale) {
> > -		/*
> > -		 * Stale buffers remain locked until final unpin unless the bli
> > -		 * was freed in the branch above. A freed stale bli implies an
> > -		 * abort because buffer invalidation dirties the bli and
> > -		 * transaction.
> > -		 */
> > -		ASSERT(!freed);
> > +	released = xfs_buf_item_unref(bip);
> > +	if ((stale && !released) || hold)
> >  		return;
> 
> Personal preference, but I'd put the hold check first because it's
> not a compound logic statement.
> 

Ok.

> > +bool
> > +xfs_buf_item_unref(
> 
> "+xfs_buf_item_put()" to be consistent with dropping references on
> other reference counted objects (e.g.  iput, dput, bio_put,
> xfs_perag_put, xfs_log_ticket_put, etc).
> 

Works for me.

> > +	struct xfs_buf_log_item	*bip)
> > +{
> > +	struct xfs_log_item	*lip = &bip->bli_item;
> > +	bool			aborted;
> > +	bool			dirty;
> > +	bool			freed;
> > +
> > +	/* drop the bli ref and return if it wasn't the last one */
> > +	freed = atomic_dec_and_test(&bip->bli_refcount);
> > +	if (!freed)
> > +		return false;
> 
> We don't really need the freed variable here. This
> 
> 	if (!atomic_dec_and_test(refcount))
> 		return false
> 
> is a common enough "do something only when we drop the last
> reference" pattern that everyone should understand it by
> now.
> 

Yeah, I think I just missed that the associated asserts (where freed was
used) were made unnecessary by the simplified logic.

> > +	 * If the bli is dirty and non-aborted, the buffer was clean in the
> > +	 * transaction but still awaiting writeback from previous changes. In
> > +	 * that case, the bli is freed on buffer writeback completion.
> > +	 */
> > +	aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags) ||
> > +		  XFS_FORCED_SHUTDOWN(lip->li_mountp);
> > +	dirty = bip->bli_flags & XFS_BLI_DIRTY;
> > +	if (!aborted && dirty)
> > +		return false;
> 
> Personal preference (again) is to check dirty first - same as the
> comment text mentions "dirty and not-aborted"...
> 

Makes sense. The dirty/clean state is also the more common/primary
consideration, whereas aborted == true should obviously be a
rare/outlier case.

> These are all minor things - I think it's a good improvement
> overall.
> 

Thanks. I'll send a v3 with the associated tweaks..

Brian

> Cheers,
> 
> dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-08-29 17:52 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-08-27 14:25 [PATCH v2 0/3] xfs: bli refcount fixups Brian Foster
2018-08-27 14:25 ` [PATCH 1/3] xfs: don't unlock invalidated buf on aborted tx commit Brian Foster
2018-08-27 14:25 ` [PATCH 2/3] xfs: clean up xfs_trans_brelse() Brian Foster
2018-08-28 22:36   ` Dave Chinner
2018-08-29 13:55     ` Brian Foster
2018-08-27 14:25 ` [PATCH 3/3] xfs: refactor xfs_buf_log_item reference count handling Brian Foster
2018-08-28 22:59   ` Dave Chinner
2018-08-29 13:55     ` Brian Foster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).