From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Chandan Babu R <chandan.babu@oracle.com>,
Dave Chinner <david@fromorbit.com>,
linux-xfs@vger.kernel.org, Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH 11/11] xfs: skip flushing log items during push
Date: Thu, 20 Jun 2024 12:51:42 -0700 [thread overview]
Message-ID: <20240620195142.GG103034@frogsfrogsfrogs> (raw)
In-Reply-To: <20240620072146.530267-12-hch@lst.de>
On Thu, Jun 20, 2024 at 09:21:28AM +0200, Christoph Hellwig wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> The AIL pushing code spends a huge amount of time skipping over
> items that are already marked as flushing. It is not uncommon to
> see hundreds of thousands of items skipped every second due to inode
> clustering marking all the inodes in a cluster as flushing when the
> first one is flushed.
>
> However, to discover an item is already flushing and should be
> skipped we have to call the iop_push() method for it to try to flush
> the item. For inodes (where this matters most), we have to first
> check that inode is flushable first.
>
> We can optimise this overhead away by tracking whether the log item
> is flushing internally. This allows xfsaild_push() to check the log
> item directly for flushing state and immediately skip the log item.
> Whilst this doesn't remove the CPU cache misses for loading the log
> item, it does avoid the overhead of an indirect function call
> and the cache misses involved in accessing inode and
> backing cluster buffer structures to determine flushing state. When
> trying to flush hundreds of thousands of inodes each second, this
> CPU overhead saving adds up quickly.
>
> It's so noticeable that the biggest issue with pushing on the AIL on
> fast storage becomes the 10ms back-off wait when we hit enough
> pinned buffers to break out of the push loop but not enough for the
> AIL pushing to be considered stuck. This limits the xfsaild to about
> 70% total CPU usage, and on fast storage this isn't enough to keep
> the storage 100% busy.
>
> The xfsaild will block on IO submission on slow storage and so is
> self throttling - it does not need a backoff in the case where we
> are really just breaking out of the walk to submit the IO we have
> gathered.
>
> Further with no backoff we don't need to gather huge delwri lists to
> mitigate the impact of backoffs, so we can submit IO more frequently
> and reduce the time log items spend in flushing state by breaking
> out of the item push loop once we've gathered enough IO to batch
> submission effectively.
Is that what the new count > 1000 branch does?
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_inode.c | 1 +
> fs/xfs/xfs_inode_item.c | 6 +++++-
Does it make sense to do this for buffer or dquot items too?
Modulo my questions, the rest of the changes in this series look sensible.
--D
> fs/xfs/xfs_trans.h | 4 +++-
> fs/xfs/xfs_trans_ail.c | 8 +++++++-
> 4 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 58fb7a5062e1e6..da611ec56f1be0 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3713,6 +3713,7 @@ xfs_iflush(
> iip->ili_last_fields = iip->ili_fields;
> iip->ili_fields = 0;
> iip->ili_fsync_fields = 0;
> + set_bit(XFS_LI_FLUSHING, &iip->ili_item.li_flags);
> spin_unlock(&iip->ili_lock);
>
> /*
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index f28d653300d124..ba49e56820f0a7 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -933,6 +933,7 @@ xfs_iflush_finish(
> }
> iip->ili_last_fields = 0;
> iip->ili_flush_lsn = 0;
> + clear_bit(XFS_LI_FLUSHING, &lip->li_flags);
> spin_unlock(&iip->ili_lock);
> xfs_iflags_clear(iip->ili_inode, XFS_IFLUSHING);
> if (drop_buffer)
> @@ -991,8 +992,10 @@ xfs_buf_inode_io_fail(
> {
> struct xfs_log_item *lip;
>
> - list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
> + list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
> set_bit(XFS_LI_FAILED, &lip->li_flags);
> + clear_bit(XFS_LI_FLUSHING, &lip->li_flags);
> + }
> }
>
> /*
> @@ -1011,6 +1014,7 @@ xfs_iflush_abort_clean(
> iip->ili_flush_lsn = 0;
> iip->ili_item.li_buf = NULL;
> list_del_init(&iip->ili_item.li_bio_list);
> + clear_bit(XFS_LI_FLUSHING, &iip->ili_item.li_flags);
> }
>
> /*
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index 1636663707dc04..20eb6ea7ebaa04 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -58,13 +58,15 @@ struct xfs_log_item {
> #define XFS_LI_FAILED 2
> #define XFS_LI_DIRTY 3
> #define XFS_LI_WHITEOUT 4
> +#define XFS_LI_FLUSHING 5
>
> #define XFS_LI_FLAGS \
> { (1u << XFS_LI_IN_AIL), "IN_AIL" }, \
> { (1u << XFS_LI_ABORTED), "ABORTED" }, \
> { (1u << XFS_LI_FAILED), "FAILED" }, \
> { (1u << XFS_LI_DIRTY), "DIRTY" }, \
> - { (1u << XFS_LI_WHITEOUT), "WHITEOUT" }
> + { (1u << XFS_LI_WHITEOUT), "WHITEOUT" }, \
> + { (1u << XFS_LI_FLUSHING), "FLUSHING" }
>
> struct xfs_item_ops {
> unsigned flags;
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 6a106a05fae017..0fafcc9f3dbe44 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -512,6 +512,9 @@ xfsaild_push(
> while ((XFS_LSN_CMP(lip->li_lsn, ailp->ail_target) <= 0)) {
> int lock_result;
>
> + if (test_bit(XFS_LI_FLUSHING, &lip->li_flags))
> + goto next_item;
> +
> /*
> * Note that iop_push may unlock and reacquire the AIL lock. We
> * rely on the AIL cursor implementation to be able to deal with
> @@ -581,9 +584,12 @@ xfsaild_push(
> if (stuck > 100)
> break;
>
> +next_item:
> lip = xfs_trans_ail_cursor_next(ailp, &cur);
> if (lip == NULL)
> break;
> + if (lip->li_lsn != lsn && count > 1000)
> + break;
> lsn = lip->li_lsn;
> }
>
> @@ -620,7 +626,7 @@ xfsaild_push(
> /*
> * Assume we have more work to do in a short while.
> */
> - tout = 10;
> + tout = 0;
> }
>
> return tout;
> --
> 2.43.0
>
>
next prev parent reply other threads:[~2024-06-20 19:51 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-20 7:21 xfs: byte-based grant head reservation tracking v4 Christoph Hellwig
2024-06-20 7:21 ` [PATCH 01/11] xfs: fix the contact address for the sysfs ABI documentation Christoph Hellwig
2024-06-20 15:20 ` Darrick J. Wong
2024-06-20 7:21 ` [PATCH 02/11] xfs: move and rename xfs_trans_committed_bulk Christoph Hellwig
2024-06-20 7:21 ` [PATCH 03/11] xfs: AIL doesn't need manual pushing Christoph Hellwig
2024-06-20 19:01 ` Darrick J. Wong
2024-06-20 7:21 ` [PATCH 04/11] xfs: background AIL push should target physical space Christoph Hellwig
2024-06-20 19:42 ` Darrick J. Wong
2024-06-21 5:37 ` Christoph Hellwig
2024-06-21 16:18 ` Darrick J. Wong
2024-06-20 7:21 ` [PATCH 05/11] xfs: ensure log tail is always up to date Christoph Hellwig
2024-06-20 7:21 ` [PATCH 06/11] xfs: l_last_sync_lsn is really AIL state Christoph Hellwig
2024-06-20 7:21 ` [PATCH 07/11] xfs: collapse xlog_state_set_callback in caller Christoph Hellwig
2024-06-20 7:21 ` [PATCH 08/11] xfs: track log space pinned by the AIL Christoph Hellwig
2024-06-20 7:21 ` [PATCH 09/11] xfs: pass the full grant head to accounting functions Christoph Hellwig
2024-06-20 7:21 ` [PATCH 10/11] xfs: grant heads track byte counts, not LSNs Christoph Hellwig
2024-07-01 0:59 ` Dave Chinner
2024-06-20 7:21 ` [PATCH 11/11] xfs: skip flushing log items during push Christoph Hellwig
2024-06-20 19:51 ` Darrick J. Wong [this message]
2024-06-21 5:48 ` Christoph Hellwig
2024-06-21 17:46 ` Darrick J. Wong
2024-07-02 18:51 ` Darrick J. Wong
2024-07-02 23:29 ` Dave Chinner
2024-07-03 5:10 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240620195142.GG103034@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=chandan.babu@oracle.com \
--cc=david@fromorbit.com \
--cc=dchinner@redhat.com \
--cc=hch@lst.de \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox