From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 6/6] xfs: lockless buffer lookup
Date: Sat, 9 Jul 2022 17:15:40 -0700
Message-ID: <YsoaLD2T/aelPMxJ@magnolia>
In-Reply-To: <20220707235259.1097443-7-david@fromorbit.com>
On Fri, Jul 08, 2022 at 09:52:59AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Now that we have a standalone fast path for buffer lookup, we can
> easily convert it to use rcu lookups. When we continually hammer the
> buffer cache with trylock lookups, we end up with a huge amount of
> lock contention on the per-ag buffer hash locks:
>
> -   92.71%     0.05%  [kernel]                  [k] xfs_inodegc_worker
>    - 92.67% xfs_inodegc_worker
>       - 92.13% xfs_inode_unlink
>          - 91.52% xfs_inactive_ifree
>             - 85.63% xfs_read_agi
>                - 85.61% xfs_trans_read_buf_map
>                   - 85.59% xfs_buf_read_map
>                      - xfs_buf_get_map
>                         - 85.55% xfs_buf_find
>                            - 72.87% _raw_spin_lock
>                               - do_raw_spin_lock
>                                    71.86% __pv_queued_spin_lock_slowpath
>                            - 8.74% xfs_buf_rele
>                               - 7.88% _raw_spin_lock
>                                  - 7.88% do_raw_spin_lock
>                                       7.63% __pv_queued_spin_lock_slowpath
>                            - 1.70% xfs_buf_trylock
>                               - 1.68% down_trylock
>                                  - 1.41% _raw_spin_lock_irqsave
>                                     - 1.39% do_raw_spin_lock
>                                          __pv_queued_spin_lock_slowpath
>                            - 0.76% _raw_spin_unlock
>                                 0.75% do_raw_spin_unlock
>
> This is basically hammering the pag->pag_buf_lock from lots of CPUs
> doing trylocks at the same time. Most of the buffer trylock
> operations ultimately fail after we've done the lookup, so we're
> really hammering the buf hash lock whilst making no progress.
>
> We can also see significant spinlock traffic on the same lock just
> under normal operation when lots of tasks are accessing metadata
> from the same AG, so let's avoid all this by converting the lookup
> fast path to leverage the rhashtable's ability to do rcu protected
> lookups.
>
> We avoid races with the buffer release path by using
> atomic_inc_not_zero() on the buffer hold count. Any buffer that is
> in the LRU will have a non-zero count, thereby allowing the lockless
> fast path to be taken in most cache hit situations. If the buffer
> hold count is zero, then it is likely going through the release path
> so in that case we fall back to the existing lookup miss slow path.
>
> The slow path will then do an atomic lookup and insert under the
> buffer hash lock and hence serialise correctly against buffer
> release freeing the buffer.
>
> The use of rcu protected lookups means that buffer handles now need
> to be freed by RCU callbacks (same as inodes). We still free the
> buffer pages before the RCU callback - we won't be trying to access
> them at all on a buffer that has zero references - but we need the
> buffer handle itself to be present for the entire rcu protected read
> side to detect a zero hold count correctly.
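>
> As an illustration (not part of this patch; the release side steps
> are paraphrased from xfs_buf_rele()), this is roughly the
> interleaving the lockless fast path has to survive:
>
>	lookup (fast path)		release (last reference)
>	------------------		-------------------------
>	rcu_read_lock()
>	bp = rhashtable_lookup(...)
>					atomic_dec_and_lock(&bp->b_hold, ...)
>					rhashtable_remove_fast(...)
>					xfs_buf_free(bp)
>					  call_rcu(&bp->b_rcu, ...)
>	atomic_inc_not_zero(&bp->b_hold)
>	  fails because the hold count
>	  is zero, so we return -ENOENT
>	  and fall back to the slow path
>	rcu_read_unlock()
>					<grace period expires>
>					xfs_buf_free_callback() frees
>					the buffer handle
>
> The read of bp->b_hold in the fast path is only safe because the
> handle itself cannot be freed until the rcu read side completes.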
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks good, will test....

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
> fs/xfs/xfs_buf.c | 22 +++++++++++++++-------
> fs/xfs/xfs_buf.h | 1 +
> 2 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 1a6542e01bec..6dac5583977f 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -294,6 +294,16 @@ xfs_buf_free_pages(
>  	bp->b_flags &= ~_XBF_PAGES;
>  }
>
> +static void
> +xfs_buf_free_callback(
> +	struct callback_head	*cb)
> +{
> +	struct xfs_buf		*bp = container_of(cb, struct xfs_buf, b_rcu);
> +
> +	xfs_buf_free_maps(bp);
> +	kmem_cache_free(xfs_buf_cache, bp);
> +}
> +
>  static void
>  xfs_buf_free(
>  	struct xfs_buf		*bp)
> @@ -307,8 +317,7 @@ xfs_buf_free(
>  	else if (bp->b_flags & _XBF_KMEM)
>  		kmem_free(bp->b_addr);
>
> -	xfs_buf_free_maps(bp);
> -	kmem_cache_free(xfs_buf_cache, bp);
> +	call_rcu(&bp->b_rcu, xfs_buf_free_callback);
>  }
>
>  static int
> @@ -567,14 +576,13 @@ xfs_buf_lookup(
>  	struct xfs_buf		*bp;
>  	int			error;
>
> -	spin_lock(&pag->pag_buf_lock);
> +	rcu_read_lock();
>  	bp = rhashtable_lookup(&pag->pag_buf_hash, map, xfs_buf_hash_params);
> -	if (!bp) {
> -		spin_unlock(&pag->pag_buf_lock);
> +	if (!bp || !atomic_inc_not_zero(&bp->b_hold)) {
> +		rcu_read_unlock();
>  		return -ENOENT;
>  	}
> -	atomic_inc(&bp->b_hold);
> -	spin_unlock(&pag->pag_buf_lock);
> +	rcu_read_unlock();
>
>  	error = xfs_buf_find_lock(bp, flags);
>  	if (error) {
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index 58e9034d51bd..02b3c1635ec3 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -196,6 +196,7 @@ struct xfs_buf {
>  	int			b_last_error;
>
>  	const struct xfs_buf_ops	*b_ops;
> +	struct rcu_head		b_rcu;
>  };
>
>  /* Finding and Reading Buffers */
> --
> 2.36.1
>