From: Brian Foster <bfoster@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Carlos Maiolino <cem@kernel.org>,
Dave Chinner <dchinner@redhat.com>,
linux-xfs@vger.kernel.org, "Darrick J. Wong" <djwong@kernel.org>
Subject: Re: [PATCH 1/3] xfs: don't keep a reference for buffers on the LRU
Date: Fri, 23 Jan 2026 11:01:48 -0500
Message-ID: <aXObbCA2bu6p_8by@bfoster>
In-Reply-To: <20260122052709.412336-2-hch@lst.de>

On Thu, Jan 22, 2026 at 06:26:55AM +0100, Christoph Hellwig wrote:
> Currently the buffer cache adds a reference to b_hold for buffers that
> are on the LRU. This seems to go all the way back and allows releasing
> buffers from the LRU using xfs_buf_rele. But it makes xfs_buf_rele
> really complicated and differs from how other LRUs are implemented in
> Linux.
>
> Switch to not having a reference for buffers in the LRU, and use a
> separate negative hold value to mark buffers as dead. This simplifies
> xfs_buf_rele, which now just deals with the last "real" reference,
> and prepares for using the lockref primitive.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
> fs/xfs/xfs_buf.c | 140 ++++++++++++++++++-----------------------------
> fs/xfs/xfs_buf.h | 8 +--
> 2 files changed, 54 insertions(+), 94 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index db46883991de..aacdf080e400 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -80,11 +80,8 @@ xfs_buf_stale(
>
> spin_lock(&bp->b_lock);
> atomic_set(&bp->b_lru_ref, 0);
> - if (!(bp->b_state & XFS_BSTATE_DISPOSE) &&
> - (list_lru_del_obj(&bp->b_target->bt_lru, &bp->b_lru)))
> - bp->b_hold--;
> -
> - ASSERT(bp->b_hold >= 1);
> + if (bp->b_hold >= 0)
> + list_lru_del_obj(&bp->b_target->bt_lru, &bp->b_lru);
I see that b_hold goes away in subsequent patches, but it is still
unsigned as of this patch, which makes the new ">= 0" check and the
"= -1" assignments look like a potential bisect bomb. I wouldn't want
to turn this into a big effort just to fix up a mid-series transient,
but maybe this patch should change b_hold to a signed type and call out
the transient wart in the commit log? (A quick spot test at this point
in the series might not hurt either, if that hasn't been done.)
> spin_unlock(&bp->b_lock);
> }
>
...
> @@ -862,76 +859,24 @@ xfs_buf_hold(
> }
>
...
> -static void
> -xfs_buf_rele_cached(
> +xfs_buf_destroy(
> struct xfs_buf *bp)
> {
...
> + if (!xfs_buf_is_uncached(bp)) {
> + struct xfs_buf_cache *bch =
> + xfs_buftarg_buf_cache(bp->b_target, bp->b_pag);
>
> - ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
> rhashtable_remove_fast(&bch->bc_hash, &bp->b_rhash_head,
> xfs_buf_hash_params);
This looks like a subtle locking change in that the rhashtable removal
is no longer under b_lock...? AFAICT that is fine, since RCU protection
is what matters for the lookup side, but it might be worth documenting
that change in the commit log as well so it's clear that it is
intentional and safe.
Brian
> - if (pag)
> - xfs_perag_put(pag);
> - freebuf = true;
> - }
>
> -out_unlock:
> - spin_unlock(&bp->b_lock);
> + if (bp->b_pag)
> + xfs_perag_put(bp->b_pag);
> + }
>
> - if (freebuf)
> - xfs_buf_free(bp);
> + xfs_buf_free(bp);
> }
>
> /*
> @@ -942,10 +887,22 @@ xfs_buf_rele(
> struct xfs_buf *bp)
> {
> trace_xfs_buf_rele(bp, _RET_IP_);
> - if (xfs_buf_is_uncached(bp))
> - xfs_buf_rele_uncached(bp);
> - else
> - xfs_buf_rele_cached(bp);
> +
> + spin_lock(&bp->b_lock);
> + if (!--bp->b_hold) {
> + if (xfs_buf_is_uncached(bp) || !atomic_read(&bp->b_lru_ref))
> + goto kill;
> + list_lru_add_obj(&bp->b_target->bt_lru, &bp->b_lru);
> + }
> + spin_unlock(&bp->b_lock);
> + return;
> +
> +kill:
> + bp->b_hold = -1;
> + list_lru_del_obj(&bp->b_target->bt_lru, &bp->b_lru);
> + spin_unlock(&bp->b_lock);
> +
> + xfs_buf_destroy(bp);
> }
>
> /*
> @@ -1254,9 +1211,11 @@ xfs_buf_ioerror_alert(
>
> /*
> * To simulate an I/O failure, the buffer must be locked and held with at least
> - * three references. The LRU reference is dropped by the stale call. The buf
> - * item reference is dropped via ioend processing. The third reference is owned
> - * by the caller and is dropped on I/O completion if the buffer is XBF_ASYNC.
> + * two references.
> + *
> + * The buf item reference is dropped via ioend processing. The second reference
> + * is owned by the caller and is dropped on I/O completion if the buffer is
> + * XBF_ASYNC.
> */
> void
> xfs_buf_ioend_fail(
> @@ -1514,19 +1473,14 @@ xfs_buftarg_drain_rele(
>
> if (!spin_trylock(&bp->b_lock))
> return LRU_SKIP;
> - if (bp->b_hold > 1) {
> + if (bp->b_hold > 0) {
> /* need to wait, so skip it this pass */
> spin_unlock(&bp->b_lock);
> trace_xfs_buf_drain_buftarg(bp, _RET_IP_);
> return LRU_SKIP;
> }
>
> - /*
> - * clear the LRU reference count so the buffer doesn't get
> - * ignored in xfs_buf_rele().
> - */
> - atomic_set(&bp->b_lru_ref, 0);
> - bp->b_state |= XFS_BSTATE_DISPOSE;
> + bp->b_hold = -1;
> list_lru_isolate_move(lru, item, dispose);
> spin_unlock(&bp->b_lock);
> return LRU_REMOVED;
> @@ -1581,7 +1535,7 @@ xfs_buftarg_drain(
> "Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
> (long long)xfs_buf_daddr(bp));
> }
> - xfs_buf_rele(bp);
> + xfs_buf_destroy(bp);
> }
> if (loop++ != 0)
> delay(100);
> @@ -1610,11 +1564,23 @@ xfs_buftarg_isolate(
> struct list_head *dispose = arg;
>
> /*
> - * we are inverting the lru lock/bp->b_lock here, so use a trylock.
> - * If we fail to get the lock, just skip it.
> + * We are inverting the lru lock vs bp->b_lock order here, so use a
> + * trylock. If we fail to get the lock, just skip the buffer.
> */
> if (!spin_trylock(&bp->b_lock))
> return LRU_SKIP;
> +
> + /*
> + * If the buffer is in use, remove it from the LRU for now as we can't
> + * free it. It will be added to the LRU again when the reference count
> + * hits zero.
> + */
> + if (bp->b_hold > 0) {
> + list_lru_isolate(lru, &bp->b_lru);
> + spin_unlock(&bp->b_lock);
> + return LRU_REMOVED;
> + }
> +
> /*
> * Decrement the b_lru_ref count unless the value is already
> * zero. If the value is already zero, we need to reclaim the
> @@ -1625,7 +1591,7 @@ xfs_buftarg_isolate(
> return LRU_ROTATE;
> }
>
> - bp->b_state |= XFS_BSTATE_DISPOSE;
> + bp->b_hold = -1;
> list_lru_isolate_move(lru, item, dispose);
> spin_unlock(&bp->b_lock);
> return LRU_REMOVED;
> @@ -1647,7 +1613,7 @@ xfs_buftarg_shrink_scan(
> struct xfs_buf *bp;
> bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
> list_del_init(&bp->b_lru);
> - xfs_buf_rele(bp);
> + xfs_buf_destroy(bp);
> }
>
> return freed;
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index e25cd2a160f3..1117cd9cbfb9 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -68,11 +68,6 @@ typedef unsigned int xfs_buf_flags_t;
> { XBF_INCORE, "INCORE" }, \
> { XBF_TRYLOCK, "TRYLOCK" }
>
> -/*
> - * Internal state flags.
> - */
> -#define XFS_BSTATE_DISPOSE (1 << 0) /* buffer being discarded */
> -
> struct xfs_buf_cache {
> struct rhashtable bc_hash;
> };
> @@ -159,6 +154,7 @@ struct xfs_buf {
>
> xfs_daddr_t b_rhash_key; /* buffer cache index */
> int b_length; /* size of buffer in BBs */
> + spinlock_t b_lock; /* internal state lock */
> unsigned int b_hold; /* reference count */
> atomic_t b_lru_ref; /* lru reclaim ref count */
> xfs_buf_flags_t b_flags; /* status flags */
> @@ -169,8 +165,6 @@ struct xfs_buf {
> * bt_lru_lock and not by b_sema
> */
> struct list_head b_lru; /* lru list */
> - spinlock_t b_lock; /* internal state lock */
> - unsigned int b_state; /* internal state flags */
> wait_queue_head_t b_waiters; /* unpin waiters */
> struct list_head b_list;
> struct xfs_perag *b_pag;
> --
> 2.47.3
>
>