public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: linux-xfs@vger.kernel.org
Cc: Dave Chinner <david@fromorbit.com>, Dave Chinner <dchinner@redhat.com>
Subject: [PATCH 5/5] xfs: lockless buffer lookup
Date: Sun,  3 Apr 2022 14:01:19 +0200	[thread overview]
Message-ID: <20220403120119.235457-6-hch@lst.de> (raw)
In-Reply-To: <20220403120119.235457-1-hch@lst.de>

From: Dave Chinner <dchinner@redhat.com>

Current work to merge the XFS inode life cycle with the VFS indoe
life cycle is finding some interesting issues. If we have a path
that hits buffer trylocks fairly hard (e.g. a non-blocking
background inode freeing function), we end up hitting massive
contention on the buffer cache hash locks:

-   92.71%     0.05%  [kernel]                  [k] xfs_inodegc_worker
   - 92.67% xfs_inodegc_worker
      - 92.13% xfs_inode_unlink
         - 91.52% xfs_inactive_ifree
            - 85.63% xfs_read_agi
               - 85.61% xfs_trans_read_buf_map
                  - 85.59% xfs_buf_read_map
                     - xfs_buf_get_map
                        - 85.55% xfs_buf_find
                           - 72.87% _raw_spin_lock
                              - do_raw_spin_lock
                                   71.86% __pv_queued_spin_lock_slowpath
                           - 8.74% xfs_buf_rele
                              - 7.88% _raw_spin_lock
                                 - 7.88% do_raw_spin_lock
                                      7.63% __pv_queued_spin_lock_slowpath
                           - 1.70% xfs_buf_trylock
                              - 1.68% down_trylock
                                 - 1.41% _raw_spin_lock_irqsave
                                    - 1.39% do_raw_spin_lock
                                         __pv_queued_spin_lock_slowpath
                           - 0.76% _raw_spin_unlock
                                0.75% do_raw_spin_unlock

This is basically hammering the pag->pag_buf_lock from lots of CPUs
doing trylocks at the same time. Most of the buffer trylock
operations ultimately fail after we've done the lookup, so we're
really hammering the buf hash lock whilst making no progress.

We can also see significant spinlock traffic on the same lock just
under normal operation when lots of tasks are accessing metadata
from the same AG, so let's avoid all this by creating a lookup fast
path which leverages the rhashtable's ability to do rcu protected
lookups.

We avoid races with the buffer release path by using
atomic_inc_not_zero() on the buffer hold count. Any buffer that is
in the LRU will have a non-zero count, thereby allowing the lockless
fast path to be taken in most cache hit situations. If the buffer
hold count is zero, then it is likely going through the release path
so in that case we fall back to the existing lookup miss slow path.
i.e. we simply use the fallback path where the caller allocates a
new buf and tries to insert it, but this time holding the
pag->pag_buf_lock.

The use of rcu protected lookups means that buffer handles now need
to be freed by RCU callbacks (same as inodes). We still free the
buffer pages before the RCU callback - we won't be trying to access
them at all on a buffer that has zero references - but we need the
buffer handle itself to be present for the entire rcu protected read
side to detect a zero hold count correctly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[hch: rebased and split]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.c | 25 +++++++++++++++++--------
 fs/xfs/xfs_buf.h |  1 +
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index dd68aee52118c2..2d6d57e80f56d5 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -295,6 +295,16 @@ xfs_buf_free_pages(
 	bp->b_flags &= ~_XBF_PAGES;
 }
 
+static void
+xfs_buf_free_callback(
+	struct callback_head	*cb)
+{
+	struct xfs_buf		*bp = container_of(cb, struct xfs_buf, b_rcu);
+
+	xfs_buf_free_maps(bp);
+	kmem_cache_free(xfs_buf_cache, bp);
+}
+
 static void
 xfs_buf_free(
 	struct xfs_buf		*bp)
@@ -308,10 +318,10 @@ xfs_buf_free(
 	else if (bp->b_flags & _XBF_KMEM)
 		kmem_free(bp->b_addr);
 
-	xfs_buf_free_maps(bp);
-	kmem_cache_free(xfs_buf_cache, bp);
+	call_rcu(&bp->b_rcu, xfs_buf_free_callback);
 }
 
+
 static int
 xfs_buf_alloc_kmem(
 	struct xfs_buf	*bp,
@@ -612,12 +622,11 @@ xfs_buf_get_map(
 
 	pag = xfs_perag_get(mp, xfs_daddr_to_agno(mp, cmap.bm_bn));
 
-	spin_lock(&pag->pag_buf_lock);
-	bp = rhashtable_lookup_fast(&pag->pag_buf_hash, &cmap,
-				    xfs_buf_hash_params);
-	if (bp)
-		atomic_inc(&bp->b_hold);
-	spin_unlock(&pag->pag_buf_lock);
+	rcu_read_lock();
+	bp = rhashtable_lookup(&pag->pag_buf_hash, &cmap, xfs_buf_hash_params);
+	if (bp && !atomic_inc_not_zero(&bp->b_hold))
+		bp = NULL;
+	rcu_read_unlock();
 
 	if (unlikely(!bp)) {
 		if (flags & XBF_NOALLOC) {
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index a28a2c5a496ab5..b74761a4e83025 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -194,6 +194,7 @@ struct xfs_buf {
 	int			b_last_error;
 
 	const struct xfs_buf_ops	*b_ops;
+	struct rcu_head		b_rcu;
 };
 
 /* Finding and Reading Buffers */
-- 
2.30.2


      parent reply	other threads:[~2022-04-03 12:01 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-03 12:01 lockless and cleaned up buffer lookup Christoph Hellwig
2022-04-03 12:01 ` [PATCH 1/5] xfs: add a flags argument to xfs_buf_get Christoph Hellwig
2022-04-03 12:01 ` [PATCH 2/5] xfs: replace xfs_buf_incore with an XBF_NOALLOC flag to xfs_buf_get* Christoph Hellwig
2022-04-03 21:54   ` Dave Chinner
2022-04-05 14:55     ` Christoph Hellwig
2022-04-05 21:21       ` Dave Chinner
2022-04-06 16:24         ` Christoph Hellwig
2022-04-03 12:01 ` [PATCH 3/5] xfs: remove a superflous hash lookup when inserting new buffers Christoph Hellwig
2022-04-03 23:04   ` Dave Chinner
2022-04-05 15:00     ` Christoph Hellwig
2022-04-05 22:01       ` Dave Chinner
2022-04-06 16:26         ` Christoph Hellwig
2022-04-03 12:01 ` [PATCH 4/5] xfs: reduce the number of atomic when locking a buffer after lookup Christoph Hellwig
2022-04-03 12:01 ` Christoph Hellwig [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220403120119.235457-6-hch@lst.de \
    --to=hch@lst.de \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox