From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/4] xfs: buffer pins need to hold a buffer reference
Date: Tue, 16 May 2023 18:26:42 -0700 [thread overview]
Message-ID: <20230517012642.GQ858799@frogsfrogsfrogs> (raw)
In-Reply-To: <20230517000449.3997582-2-david@fromorbit.com>
On Wed, May 17, 2023 at 10:04:46AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When a buffer is unpinned by xfs_buf_item_unpin(), we need to access
> the buffer after we've dropped the buffer log item reference count.
> This opens a window where we can have two racing unpins for the
> buffer item (e.g. shutdown checkpoint context callback processing
> racing with journal IO iclog completion processing) and both attempt
> to access the buffer after dropping the BLI reference count. If we
> are unlucky, the "BLI freed" context wins the race and frees the
> buffer before the "BLI still active" case checks the buffer pin
> count.
>
> This results in a use after free that can only be triggered
> in active filesystem shutdown situations.
>
> To fix this, we need to ensure that buffer existence extends beyond
> the BLI reference count checks and until the unpin processing is
> complete. This implies that a buffer pin operation must also take a
> buffer reference to ensure that the buffer cannot be freed until the
> buffer unpin processing is complete.
>
> Reported-by: yangerkun <yangerkun@huawei.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
LGTM
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_buf_item.c | 88 ++++++++++++++++++++++++++++++++-----------
> 1 file changed, 65 insertions(+), 23 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index df7322ed73fa..b2d211730fd2 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -452,10 +452,18 @@ xfs_buf_item_format(
> * This is called to pin the buffer associated with the buf log item in memory
> * so it cannot be written out.
> *
> - * We also always take a reference to the buffer log item here so that the bli
> - * is held while the item is pinned in memory. This means that we can
> - * unconditionally drop the reference count a transaction holds when the
> - * transaction is completed.
> + * We take a reference to the buffer log item here so that the BLI life cycle
> + * extends at least until the buffer is unpinned via xfs_buf_item_unpin() and
> + * inserted into the AIL.
> + *
> + * We also need to take a reference to the buffer itself as the BLI unpin
> + * processing requires accessing the buffer after the BLI has dropped the final
> + * BLI reference. See xfs_buf_item_unpin() for an explanation.
> + * If unpins race to drop the final BLI reference and only the
> + * BLI owns a reference to the buffer, then the loser of the race can have the
> + * buffer fgreed from under it (e.g. on shutdown). Taking a buffer reference per
> + * pin count ensures the life cycle of the buffer extends for as
> + * long as we hold the buffer pin reference in xfs_buf_item_unpin().
> */
> STATIC void
> xfs_buf_item_pin(
> @@ -470,13 +478,30 @@ xfs_buf_item_pin(
>
> trace_xfs_buf_item_pin(bip);
>
> + xfs_buf_hold(bip->bli_buf);
> atomic_inc(&bip->bli_refcount);
> atomic_inc(&bip->bli_buf->b_pin_count);
> }
>
> /*
> - * This is called to unpin the buffer associated with the buf log item which
> - * was previously pinned with a call to xfs_buf_item_pin().
> + * This is called to unpin the buffer associated with the buf log item which was
> + * previously pinned with a call to xfs_buf_item_pin(). We enter this function
> + * with a buffer pin count, a buffer reference and a BLI reference.
> + *
> + * We must drop the BLI reference before we unpin the buffer because the AIL
> + * doesn't acquire a BLI reference whenever it accesses it. Therefore if the
> + * refcount drops to zero, the bli could still be AIL resident and the buffer
> + * submitted for I/O at any point before we return. This can result in IO
> + * completion freeing the buffer while we are still trying to access it here.
> + * This race condition can also occur in shutdown situations where we abort and
> + * unpin buffers from contexts other that journal IO completion.
> + *
> + * Hence we have to hold a buffer reference per pin count to ensure that the
> + * buffer cannot be freed until we have finished processing the unpin operation.
> + * The reference is taken in xfs_buf_item_pin(), and we must hold it until we
> + * are done processing the buffer state. In the case of an abort (remove =
> + * true) then we re-use the current pin reference as the IO reference we hand
> + * off to IO failure handling.
> */
> STATIC void
> xfs_buf_item_unpin(
> @@ -493,24 +518,18 @@ xfs_buf_item_unpin(
>
> trace_xfs_buf_item_unpin(bip);
>
> - /*
> - * Drop the bli ref associated with the pin and grab the hold required
> - * for the I/O simulation failure in the abort case. We have to do this
> - * before the pin count drops because the AIL doesn't acquire a bli
> - * reference. Therefore if the refcount drops to zero, the bli could
> - * still be AIL resident and the buffer submitted for I/O (and freed on
> - * completion) at any point before we return. This can be removed once
> - * the AIL properly holds a reference on the bli.
> - */
> freed = atomic_dec_and_test(&bip->bli_refcount);
> - if (freed && !stale && remove)
> - xfs_buf_hold(bp);
> if (atomic_dec_and_test(&bp->b_pin_count))
> wake_up_all(&bp->b_waiters);
>
> - /* nothing to do but drop the pin count if the bli is active */
> - if (!freed)
> + /*
> + * Nothing to do but drop the buffer pin reference if the BLI is
> + * still active
> + */
> + if (!freed) {
> + xfs_buf_rele(bp);
> return;
> + }
>
> if (stale) {
> ASSERT(bip->bli_flags & XFS_BLI_STALE);
> @@ -522,6 +541,15 @@ xfs_buf_item_unpin(
>
> trace_xfs_buf_item_unpin_stale(bip);
>
> + /*
> + * The buffer has been locked and referenced since it was marked
> + * stale so we own both lock and reference exclusively here. We
> + * do not need the pin reference any more, so drop it now so
> + * that we only have one reference to drop once item completion
> + * processing is complete.
> + */
> + xfs_buf_rele(bp);
> +
> /*
> * If we get called here because of an IO error, we may or may
> * not have the item on the AIL. xfs_trans_ail_delete() will
> @@ -538,16 +566,30 @@ xfs_buf_item_unpin(
> ASSERT(bp->b_log_item == NULL);
> }
> xfs_buf_relse(bp);
> - } else if (remove) {
> + return;
> + }
> +
> + if (remove) {
> /*
> - * The buffer must be locked and held by the caller to simulate
> - * an async I/O failure. We acquired the hold for this case
> - * before the buffer was unpinned.
> + * We need to simulate an async IO failures here to ensure that
> + * the correct error completion is run on this buffer. This
> + * requires a reference to the buffer and for the buffer to be
> + * locked. We can safely pass ownership of the pin reference to
> + * the IO to ensure that nothing can free the buffer while we
> + * wait for the lock and then run the IO failure completion.
> */
> xfs_buf_lock(bp);
> bp->b_flags |= XBF_ASYNC;
> xfs_buf_ioend_fail(bp);
> + return;
> }
> +
> + /*
> + * BLI has no more active references - it will be moved to the AIL to
> + * manage the remaining BLI/buffer life cycle. There is nothing left for
> + * us to do here so drop the pin reference to the buffer.
> + */
> + xfs_buf_rele(bp);
> }
>
> STATIC uint
> --
> 2.40.1
>
next prev parent reply other threads:[~2023-05-17 1:26 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-17 0:04 xfs: bug fixes for 6.4-rcX Dave Chinner
2023-05-17 0:04 ` [PATCH 1/4] xfs: buffer pins need to hold a buffer reference Dave Chinner
2023-05-17 1:26 ` Darrick J. Wong [this message]
2023-05-17 12:58 ` Christoph Hellwig
2023-05-17 22:24 ` Dave Chinner
2023-05-17 0:04 ` [PATCH 2/4] xfs: restore allocation trylock iteration Dave Chinner
2023-05-17 1:11 ` Darrick J. Wong
2023-05-17 12:59 ` Christoph Hellwig
2023-05-17 0:04 ` [PATCH 3/4] xfs: defered work could create precommits Dave Chinner
2023-05-17 1:20 ` Darrick J. Wong
2023-05-17 1:44 ` Dave Chinner
2023-06-01 15:01 ` Christoph Hellwig
2023-05-17 0:04 ` [PATCH 4/4] xfs: fix AGF vs inode cluster buffer deadlock Dave Chinner
2023-05-17 1:26 ` Darrick J. Wong
2023-05-17 1:47 ` Dave Chinner
2023-06-01 1:51 ` Dave Chinner
2023-06-01 14:38 ` Darrick J. Wong
2023-06-01 15:12 ` Christoph Hellwig
2023-06-25 2:58 ` Matthew Wilcox
2023-06-25 22:34 ` Dave Chinner
2023-05-17 7:07 ` xfs: bug fixes for 6.4-rcX Amir Goldstein
2023-05-17 11:34 ` Dave Chinner
2023-05-17 12:48 ` Amir Goldstein
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230517012642.GQ858799@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox