linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: Chandan Babu R <chandan.babu@oracle.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	linux-xfs@vger.kernel.org, Dave Chinner <dchinner@redhat.com>
Subject: [PATCH 11/11] xfs: skip flushing log items during push
Date: Thu, 20 Jun 2024 09:21:28 +0200	[thread overview]
Message-ID: <20240620072146.530267-12-hch@lst.de> (raw)
In-Reply-To: <20240620072146.530267-1-hch@lst.de>

From: Dave Chinner <dchinner@redhat.com>

The AIL pushing code spends a huge amount of time skipping over
items that are already marked as flushing. It is not uncommon to
see hundreds of thousands of items skipped every second due to inode
clustering marking all the inodes in a cluster as flushing when the
first one is flushed.

However, to discover an item is already flushing and should be
skipped we have to call the iop_push() method for it to try to flush
the item. For inodes (where this matters most), we have to first
check that inode is flushable first.

We can optimise this overhead away by tracking whether the log item
is flushing internally. This allows xfsaild_push() to check the log
item directly for flushing state and immediately skip the log item.
Whilst this doesn't remove the CPU cache misses for loading the log
item, it does avoid the overhead of an indirect function call
and the cache misses involved in accessing inode and
backing cluster buffer structures to determine flushing state. When
trying to flush hundreds of thousands of inodes each second, this
CPU overhead saving adds up quickly.

It's so noticeable that the biggest issue with pushing on the AIL on
fast storage becomes the 10ms back-off wait when we hit enough
pinned buffers to break out of the push loop but not enough for the
AIL pushing to be considered stuck. This limits the xfsaild to about
70% total CPU usage, and on fast storage this isn't enough to keep
the storage 100% busy.

The xfsaild will block on IO submission on slow storage and so is
self throttling - it does not need a backoff in the case where we
are really just breaking out of the walk to submit the IO we have
gathered.

Further with no backoff we don't need to gather huge delwri lists to
mitigate the impact of backoffs, so we can submit IO more frequently
and reduce the time log items spend in flushing state by breaking
out of the item push loop once we've gathered enough IO to batch
submission effectively.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_inode.c      | 1 +
 fs/xfs/xfs_inode_item.c | 6 +++++-
 fs/xfs/xfs_trans.h      | 4 +++-
 fs/xfs/xfs_trans_ail.c  | 8 +++++++-
 4 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 58fb7a5062e1e6..da611ec56f1be0 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3713,6 +3713,7 @@ xfs_iflush(
 	iip->ili_last_fields = iip->ili_fields;
 	iip->ili_fields = 0;
 	iip->ili_fsync_fields = 0;
+	set_bit(XFS_LI_FLUSHING, &iip->ili_item.li_flags);
 	spin_unlock(&iip->ili_lock);
 
 	/*
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index f28d653300d124..ba49e56820f0a7 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -933,6 +933,7 @@ xfs_iflush_finish(
 		}
 		iip->ili_last_fields = 0;
 		iip->ili_flush_lsn = 0;
+		clear_bit(XFS_LI_FLUSHING, &lip->li_flags);
 		spin_unlock(&iip->ili_lock);
 		xfs_iflags_clear(iip->ili_inode, XFS_IFLUSHING);
 		if (drop_buffer)
@@ -991,8 +992,10 @@ xfs_buf_inode_io_fail(
 {
 	struct xfs_log_item	*lip;
 
-	list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
+	list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
 		set_bit(XFS_LI_FAILED, &lip->li_flags);
+		clear_bit(XFS_LI_FLUSHING, &lip->li_flags);
+	}
 }
 
 /*
@@ -1011,6 +1014,7 @@ xfs_iflush_abort_clean(
 	iip->ili_flush_lsn = 0;
 	iip->ili_item.li_buf = NULL;
 	list_del_init(&iip->ili_item.li_bio_list);
+	clear_bit(XFS_LI_FLUSHING, &iip->ili_item.li_flags);
 }
 
 /*
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 1636663707dc04..20eb6ea7ebaa04 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -58,13 +58,15 @@ struct xfs_log_item {
 #define	XFS_LI_FAILED	2
 #define	XFS_LI_DIRTY	3
 #define	XFS_LI_WHITEOUT	4
+#define	XFS_LI_FLUSHING	5
 
 #define XFS_LI_FLAGS \
 	{ (1u << XFS_LI_IN_AIL),	"IN_AIL" }, \
 	{ (1u << XFS_LI_ABORTED),	"ABORTED" }, \
 	{ (1u << XFS_LI_FAILED),	"FAILED" }, \
 	{ (1u << XFS_LI_DIRTY),		"DIRTY" }, \
-	{ (1u << XFS_LI_WHITEOUT),	"WHITEOUT" }
+	{ (1u << XFS_LI_WHITEOUT),	"WHITEOUT" }, \
+	{ (1u << XFS_LI_FLUSHING),	"FLUSHING" }
 
 struct xfs_item_ops {
 	unsigned flags;
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 6a106a05fae017..0fafcc9f3dbe44 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -512,6 +512,9 @@ xfsaild_push(
 	while ((XFS_LSN_CMP(lip->li_lsn, ailp->ail_target) <= 0)) {
 		int	lock_result;
 
+		if (test_bit(XFS_LI_FLUSHING, &lip->li_flags))
+			goto next_item;
+
 		/*
 		 * Note that iop_push may unlock and reacquire the AIL lock.  We
 		 * rely on the AIL cursor implementation to be able to deal with
@@ -581,9 +584,12 @@ xfsaild_push(
 		if (stuck > 100)
 			break;
 
+next_item:
 		lip = xfs_trans_ail_cursor_next(ailp, &cur);
 		if (lip == NULL)
 			break;
+		if (lip->li_lsn != lsn && count > 1000)
+			break;
 		lsn = lip->li_lsn;
 	}
 
@@ -620,7 +626,7 @@ xfsaild_push(
 		/*
 		 * Assume we have more work to do in a short while.
 		 */
-		tout = 10;
+		tout = 0;
 	}
 
 	return tout;
-- 
2.43.0


  parent reply	other threads:[~2024-06-20  7:22 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-20  7:21 xfs: byte-based grant head reservation tracking v4 Christoph Hellwig
2024-06-20  7:21 ` [PATCH 01/11] xfs: fix the contact address for the sysfs ABI documentation Christoph Hellwig
2024-06-20 15:20   ` Darrick J. Wong
2024-06-20  7:21 ` [PATCH 02/11] xfs: move and rename xfs_trans_committed_bulk Christoph Hellwig
2024-06-20  7:21 ` [PATCH 03/11] xfs: AIL doesn't need manual pushing Christoph Hellwig
2024-06-20 19:01   ` Darrick J. Wong
2024-06-20  7:21 ` [PATCH 04/11] xfs: background AIL push should target physical space Christoph Hellwig
2024-06-20 19:42   ` Darrick J. Wong
2024-06-21  5:37     ` Christoph Hellwig
2024-06-21 16:18       ` Darrick J. Wong
2024-06-20  7:21 ` [PATCH 05/11] xfs: ensure log tail is always up to date Christoph Hellwig
2024-06-20  7:21 ` [PATCH 06/11] xfs: l_last_sync_lsn is really AIL state Christoph Hellwig
2024-06-20  7:21 ` [PATCH 07/11] xfs: collapse xlog_state_set_callback in caller Christoph Hellwig
2024-06-20  7:21 ` [PATCH 08/11] xfs: track log space pinned by the AIL Christoph Hellwig
2024-06-20  7:21 ` [PATCH 09/11] xfs: pass the full grant head to accounting functions Christoph Hellwig
2024-06-20  7:21 ` [PATCH 10/11] xfs: grant heads track byte counts, not LSNs Christoph Hellwig
2024-07-01  0:59   ` Dave Chinner
2024-06-20  7:21 ` Christoph Hellwig [this message]
2024-06-20 19:51   ` [PATCH 11/11] xfs: skip flushing log items during push Darrick J. Wong
2024-06-21  5:48     ` Christoph Hellwig
2024-06-21 17:46       ` Darrick J. Wong
2024-07-02 18:51         ` Darrick J. Wong
2024-07-02 23:29           ` Dave Chinner
2024-07-03  5:10             ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240620072146.530267-12-hch@lst.de \
    --to=hch@lst.de \
    --cc=chandan.babu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).