From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Carlos Maiolino <cem@kernel.org>,
"Darrick J. Wong" <djwong@kernel.org>,
Dave Chinner <dchinner@redhat.com>,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: fix buffer lookup vs release race
Date: Tue, 14 Jan 2025 07:55:30 +1100 [thread overview]
Message-ID: <Z4V9wg8dbLXvq8hy@dread.disaster.area> (raw)
In-Reply-To: <20250113042542.2051287-3-hch@lst.de>
On Mon, Jan 13, 2025 at 05:24:27AM +0100, Christoph Hellwig wrote:
> Since commit 298f34224506 ("xfs: lockless buffer lookup") the buffer
> lookup fastpath is done without a hash-wide lock (then pag_buf_lock, now
> bc_lock) and only under RCU protection. But this means that nothing
> serializes lookups against the temporary 0 reference count for buffers
> that are added to the LRU after dropping the last regular reference,
> and a concurrent lookup would fail to find them.
>
> Fix this by doing all b_hold modifications under b_lock. We're already
> doing this for release so this "only" ~ doubles the b_lock round trips.
> We'll later look into the lockref infrastructure to optimize the number
> of lock round trips again.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Logic changes look ok, but...
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index 3d56bc7a35cc..cbf7c2a076c7 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -172,7 +172,8 @@ struct xfs_buf {
>
> xfs_daddr_t b_rhash_key; /* buffer cache index */
> int b_length; /* size of buffer in BBs */
> - atomic_t b_hold; /* reference count */
> + spinlock_t b_lock; /* internal state lock */
> + unsigned int b_hold; /* reference count */
> atomic_t b_lru_ref; /* lru reclaim ref count */
> xfs_buf_flags_t b_flags; /* status flags */
> struct semaphore b_sema; /* semaphore for lockables */
> @@ -182,7 +183,6 @@ struct xfs_buf {
> * bt_lru_lock and not by b_sema
> */
> struct list_head b_lru; /* lru list */
> - spinlock_t b_lock; /* internal state lock */
> unsigned int b_state; /* internal state flags */
> int b_io_error; /* internal IO error state */
> wait_queue_head_t b_waiters; /* unpin waiters */
... I think this is misguided.
The idea behind the initial cacheline layout is that it should stay
read-only as much as possible so that cache lookups can walk the
buffer without causing shared/exclusive cacheline contention with
existing buffer users.
This was really important back in the days when the cache used a
rb-tree (i.e. the rbnode pointers dominated lookup profiles), and
it's still likely important with the rhashtable on large caches.
i.e. Putting a spinlock in that first cache line will result in
lookups and shrinker walks having cacheline contention as the
shrinker needs exclusive access for the spin lock, whilst the lookup
walk needs shared access for the b_rhash_head, b_rhash_key and
b_length fields in _xfs_buf_obj_cmp() for lookless lookup
concurrency.
Hence I think it would be better to move the b_hold field to the
same cacheline as the b_state field rather than move it to the
initial cacheline that cache lookups walk...
-Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2025-01-13 20:55 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-13 4:24 fix buffer refcount races Christoph Hellwig
2025-01-13 4:24 ` [PATCH 1/2] xfs: check for dead buffers in xfs_buf_find_insert Christoph Hellwig
2025-01-13 4:56 ` Darrick J. Wong
2025-01-14 0:24 ` Dave Chinner
2025-01-13 4:24 ` [PATCH 2/2] xfs: fix buffer lookup vs release race Christoph Hellwig
2025-01-13 17:55 ` Darrick J. Wong
2025-01-13 20:55 ` Dave Chinner [this message]
2025-01-15 5:38 ` Christoph Hellwig
2025-01-15 11:21 ` Dave Chinner
2025-01-13 5:08 ` fix buffer refcount races Darrick J. Wong
2025-01-13 5:14 ` Christoph Hellwig
2025-01-13 7:13 ` Darrick J. Wong
-- strict thread matches above, loose matches on Subject: below --
2025-01-16 6:01 fix buffer refcount races v2 Christoph Hellwig
2025-01-16 6:01 ` [PATCH 2/2] xfs: fix buffer lookup vs release race Christoph Hellwig
2025-01-25 7:35 ` Lai, Yi
2025-01-26 5:50 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z4V9wg8dbLXvq8hy@dread.disaster.area \
--to=david@fromorbit.com \
--cc=cem@kernel.org \
--cc=dchinner@redhat.com \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox