XFS stable LTS mailing list
From: <gregkh@linuxfoundation.org>
To: djwong@kernel.org,gregkh@linuxfoundation.org,hch@lst.de,leah.rumancik@gmail.com,xfs-stable@lists.linux.dev
Cc: <stable-commits@vger.kernel.org>
Subject: Patch "xfs: force all buffers to be written during btree bulk load" has been added to the 6.1-stable tree
Date: Sun, 16 Mar 2025 07:17:06 +0100
Message-ID: <2025031606-hummus-verse-3052@gregkh>
In-Reply-To: <20250313202550.2257219-23-leah.rumancik@gmail.com>


This is a note to let you know that I've just added the patch titled

    xfs: force all buffers to be written during btree bulk load

to the 6.1-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     xfs-force-all-buffers-to-be-written-during-btree-bulk-load.patch
and it can be found in the queue-6.1 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From stable+bounces-124380-greg=kroah.com@vger.kernel.org Thu Mar 13 21:26:40 2025
From: Leah Rumancik <leah.rumancik@gmail.com>
Date: Thu, 13 Mar 2025 13:25:42 -0700
Subject: xfs: force all buffers to be written during btree bulk load
To: stable@vger.kernel.org
Cc: xfs-stable@lists.linux.dev, "Darrick J. Wong" <djwong@kernel.org>, Christoph Hellwig <hch@lst.de>, Leah Rumancik <leah.rumancik@gmail.com>
Message-ID: <20250313202550.2257219-23-leah.rumancik@gmail.com>

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 13ae04d8d45227c2ba51e188daf9fc13d08a1b12 ]

While stress-testing online repair of btrees, I noticed periodic
assertion failures from the buffer cache about buffers with incorrect
DELWRI_Q state.  Looking further, I observed this race between the AIL
trying to write out a btree block and repair zapping a btree block after
the fact:

AIL:    Repair0:

pin buffer X
delwri_queue:
set DELWRI_Q
add to delwri list

        stale buf X:
        clear DELWRI_Q
        does not clear b_list
        free space X
        commit

delwri_submit   # oops

Worse yet, I discovered that running the same repair over and over in a
tight loop can result in a second race that causes data integrity
problems with the repair:

AIL:    Repair0:        Repair1:

pin buffer X
delwri_queue:
set DELWRI_Q
add to delwri list

        stale buf X:
        clear DELWRI_Q
        does not clear b_list
        free space X
        commit

                        find free space X
                        get buffer
                        rewrite buffer
                        delwri_queue:
                        set DELWRI_Q
                        already on a list, do not add
                        commit

                        BAD: committed tree root before all blocks written

delwri_submit   # too late now

I traced this to my own misunderstanding of how the delwri lists work,
particularly with regard to the AIL's buffer list.  If a buffer is
logged and committed, the buffer can end up on that AIL buffer list.  If
btree repairs are run twice in rapid succession, it's possible that the
first repair will invalidate the buffer and free it before the next time
the AIL wakes up.  Marking the buffer stale clears DELWRI_Q from the
buffer state without removing the buffer from its delwri list.  The
buffer doesn't know which list it's on, so it cannot know which lock to
take to protect the list for a removal.

If the second repair allocates the same block, it will then recycle the
buffer to start writing the new btree block.  Meanwhile, if the AIL
wakes up and walks the buffer list, it will ignore the buffer because it
can't lock it, and go back to sleep.

When the second repair calls delwri_queue to put the buffer on the
list of buffers to write before committing the new btree, it will set
DELWRI_Q again, but since the buffer hasn't been removed from the AIL's
buffer list, it won't add it to the bulkload buffer's list.
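
The sequence above can be replayed in a small userspace model.  This is
only a sketch, not kernel code: every model_* name is hypothetical, the
list helpers are hand-rolled stand-ins for the kernel's list.h, and
locking, pinning and buffer hold counts are left out.  It assumes only
what the text states: queueing sets DELWRI_Q and links the buffer only
if b_list is empty, and staling clears DELWRI_Q without unlinking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; the names echo the commit message, not kernel symbols. */
#define MODEL_DELWRI_Q 0x1u

struct model_list { struct model_list *next, *prev; };

struct model_buf {
	unsigned int		flags;
	struct model_list	b_list;
};

static void model_list_init(struct model_list *l) { l->next = l->prev = l; }

static bool model_list_empty(const struct model_list *l) { return l->next == l; }

static void model_list_add_tail(struct model_list *item, struct model_list *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Queueing: set the flag, but link the buffer only if it is on no list. */
static bool model_delwri_queue(struct model_buf *bp, struct model_list *list)
{
	if (bp->flags & MODEL_DELWRI_Q)
		return false;
	bp->flags |= MODEL_DELWRI_Q;
	if (model_list_empty(&bp->b_list))
		model_list_add_tail(&bp->b_list, list);
	return true;
}

/* Staling: clear the flag, leave b_list linked wherever it was. */
static void model_buf_stale(struct model_buf *bp) { bp->flags &= ~MODEL_DELWRI_Q; }

/* Walk a list head and report whether bp is linked on it. */
static bool model_on_list(const struct model_buf *bp, const struct model_list *list)
{
	for (const struct model_list *p = list->next; p != list; p = p->next)
		if (p == &bp->b_list)
			return true;
	return false;
}

/* Replay the race; returns whether X ended up on the repair's own list. */
static bool model_replay_race(void)
{
	struct model_list ail_list, repair_list;
	struct model_buf x = { 0 };

	model_list_init(&ail_list);
	model_list_init(&repair_list);
	model_list_init(&x.b_list);

	model_delwri_queue(&x, &ail_list);	/* AIL queues X */
	assert(model_on_list(&x, &ail_list));
	model_buf_stale(&x);			/* Repair0 stales X */
	model_delwri_queue(&x, &repair_list);	/* Repair1 queues X */
	return model_on_list(&x, &repair_list);
}
```

In this model, model_replay_race() returns false: the flag is set
again, but the buffer is still linked on the AIL's list, so submitting
the repair's list would never write it.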

This is incorrect, because the bulkload caller relies on delwri_submit
to ensure that all the buffers have been sent to disk /before/
committing the new btree root pointer.  This ordering is
required for data consistency.

Worse, the AIL won't clear DELWRI_Q from the buffer when it does finally
drop it, so the next thread to walk through the btree will trip over a
debug assertion on that flag.

To fix this, create a new function that waits for the buffer to be
removed from any other delwri lists before adding the buffer to the
caller's delwri list.  By waiting for the buffer to clear both the
delwri list and any potential delwri wait list, we can be sure that
repair will initiate writes of all buffers and report all write errors
back to userspace instead of committing the new structure.
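
The idea of the fix can be sketched in the same kind of userspace
model.  Again, this is an illustration under stated assumptions rather
than the kernel implementation: all model_* names are hypothetical, and
the sleep in wait_var_event() is replaced by directly running the other
list owner's submit pass, which is assumed to be the only other list
the buffer can be on.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; the names echo the commit message, not kernel symbols. */
#define MODEL_DELWRI_Q 0x1u

struct model_list { struct model_list *next, *prev; };

struct model_buf {
	unsigned int		flags;
	struct model_list	b_list;
};

static void model_list_init(struct model_list *l) { l->next = l->prev = l; }

static bool model_list_empty(const struct model_list *l) { return l->next == l; }

static void model_list_add_tail(struct model_list *item, struct model_list *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void model_list_del_init(struct model_list *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	model_list_init(item);
}

/* Queueing: set the flag, but link the buffer only if it is on no list. */
static bool model_delwri_queue(struct model_buf *bp, struct model_list *list)
{
	if (bp->flags & MODEL_DELWRI_Q)
		return false;
	bp->flags |= MODEL_DELWRI_Q;
	if (model_list_empty(&bp->b_list))
		model_list_add_tail(&bp->b_list, list);
	return true;
}

/* Stand-in for the other owner's submit pass: unlink every entry. */
static void model_other_submit(struct model_list *other)
{
	while (!model_list_empty(other))
		model_list_del_init(other->next);
}

/*
 * Sketch of the "queue here" idea: do not queue until bp is off every
 * other list.  The kernel sleeps until woken; this model instead drives
 * the other owner's submit pass, and assumes bp is only ever on 'other'.
 */
static void model_delwri_queue_here(struct model_buf *bp,
		struct model_list *list, struct model_list *other)
{
	while (!model_list_empty(&bp->b_list))
		model_other_submit(other);
	assert(!(bp->flags & MODEL_DELWRI_Q));
	model_delwri_queue(bp, list);
}

/* Replay the race with the fix; returns whether X is on repair's list. */
static bool model_replay_fixed(void)
{
	struct model_list ail_list, repair_list;
	struct model_buf x = { 0 };

	model_list_init(&ail_list);
	model_list_init(&repair_list);
	model_list_init(&x.b_list);

	model_delwri_queue(&x, &ail_list);	/* AIL queues X */
	x.flags &= ~MODEL_DELWRI_Q;		/* Repair0 stales X */
	model_delwri_queue_here(&x, &repair_list, &ail_list);
	return (x.flags & MODEL_DELWRI_Q) &&
	       repair_list.next == &x.b_list;
}
```

Here model_replay_fixed() returns true: once the stale entry has left
the AIL's list, the buffer is linked on the repair's list with DELWRI_Q
set, so a later submit of that list covers it.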

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/libxfs/xfs_btree_staging.c |    4 ---
 fs/xfs/xfs_buf.c                  |   44 ++++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_buf.h                  |    1 
 3 files changed, 42 insertions(+), 7 deletions(-)

--- a/fs/xfs/libxfs/xfs_btree_staging.c
+++ b/fs/xfs/libxfs/xfs_btree_staging.c
@@ -342,9 +342,7 @@ xfs_btree_bload_drop_buf(
 	if (*bpp == NULL)
 		return;
 
-	if (!xfs_buf_delwri_queue(*bpp, buffers_list))
-		ASSERT(0);
-
+	xfs_buf_delwri_queue_here(*bpp, buffers_list);
 	xfs_buf_relse(*bpp);
 	*bpp = NULL;
 }
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2040,6 +2040,14 @@ error_free:
 	return NULL;
 }
 
+static inline void
+xfs_buf_list_del(
+	struct xfs_buf		*bp)
+{
+	list_del_init(&bp->b_list);
+	wake_up_var(&bp->b_list);
+}
+
 /*
  * Cancel a delayed write list.
  *
@@ -2057,7 +2065,7 @@ xfs_buf_delwri_cancel(
 
 		xfs_buf_lock(bp);
 		bp->b_flags &= ~_XBF_DELWRI_Q;
-		list_del_init(&bp->b_list);
+		xfs_buf_list_del(bp);
 		xfs_buf_relse(bp);
 	}
 }
@@ -2111,6 +2119,34 @@ xfs_buf_delwri_queue(
 }
 
 /*
+ * Queue a buffer to this delwri list as part of a data integrity operation.
+ * If the buffer is on any other delwri list, we'll wait for that to clear
+ * so that the caller can submit the buffer for IO and wait for the result.
+ * Callers must ensure the buffer is not already on the list.
+ */
+void
+xfs_buf_delwri_queue_here(
+	struct xfs_buf		*bp,
+	struct list_head	*buffer_list)
+{
+	/*
+	 * We need this buffer to end up on the /caller's/ delwri list, not any
+	 * old list.  This can happen if the buffer is marked stale (which
+	 * clears DELWRI_Q) after the AIL queues the buffer to its list but
+	 * before the AIL has a chance to submit the list.
+	 */
+	while (!list_empty(&bp->b_list)) {
+		xfs_buf_unlock(bp);
+		wait_var_event(&bp->b_list, list_empty(&bp->b_list));
+		xfs_buf_lock(bp);
+	}
+
+	ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
+
+	xfs_buf_delwri_queue(bp, buffer_list);
+}
+
+/*
  * Compare function is more complex than it needs to be because
  * the return value is only 32 bits and we are doing comparisons
  * on 64 bit values
@@ -2172,7 +2208,7 @@ xfs_buf_delwri_submit_buffers(
 		 * reference and remove it from the list here.
 		 */
 		if (!(bp->b_flags & _XBF_DELWRI_Q)) {
-			list_del_init(&bp->b_list);
+			xfs_buf_list_del(bp);
 			xfs_buf_relse(bp);
 			continue;
 		}
@@ -2192,7 +2228,7 @@ xfs_buf_delwri_submit_buffers(
 			list_move_tail(&bp->b_list, wait_list);
 		} else {
 			bp->b_flags |= XBF_ASYNC;
-			list_del_init(&bp->b_list);
+			xfs_buf_list_del(bp);
 		}
 		__xfs_buf_submit(bp, false);
 	}
@@ -2246,7 +2282,7 @@ xfs_buf_delwri_submit(
 	while (!list_empty(&wait_list)) {
 		bp = list_first_entry(&wait_list, struct xfs_buf, b_list);
 
-		list_del_init(&bp->b_list);
+		xfs_buf_list_del(bp);
 
 		/*
 		 * Wait on the locked buffer, check for errors and unlock and
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -305,6 +305,7 @@ extern void xfs_buf_stale(struct xfs_buf
 /* Delayed Write Buffer Routines */
 extern void xfs_buf_delwri_cancel(struct list_head *);
 extern bool xfs_buf_delwri_queue(struct xfs_buf *, struct list_head *);
+void xfs_buf_delwri_queue_here(struct xfs_buf *bp, struct list_head *bl);
 extern int xfs_buf_delwri_submit(struct list_head *);
 extern int xfs_buf_delwri_submit_nowait(struct list_head *);
 extern int xfs_buf_delwri_pushbuf(struct xfs_buf *, struct list_head *);


Patches currently in stable-queue which might be from leah.rumancik@gmail.com are

queue-6.1/xfs-fix-confusing-xfs_extent_item-variable-names.patch
queue-6.1/xfs-fix-32-bit-truncation-in-xfs_compute_rextslog.patch
queue-6.1/xfs-transfer-recovered-intent-item-ownership-in-iop_recover.patch
queue-6.1/xfs-initialise-di_crc-in-xfs_log_dinode.patch
queue-6.1/xfs-consider-minlen-sized-extents-in-xfs_rtallocate_extent_block.patch
queue-6.1/xfs-don-t-leak-recovered-attri-intent-items.patch
queue-6.1/xfs-remove-unused-fields-from-struct-xbtree_ifakeroot.patch
queue-6.1/xfs-fix-bounds-check-in-xfs_defer_agfl_block.patch
queue-6.1/xfs-ensure-logflagsp-is-initialized-in-xfs_bmap_del_extent_real.patch
queue-6.1/xfs-convert-rt-bitmap-extent-lengths-to-xfs_rtbxlen_t.patch
queue-6.1/xfs-pass-refcount-intent-directly-through-the-log-intent-code.patch
queue-6.1/xfs-fix-perag-leak-when-growfs-fails.patch
queue-6.1/xfs-pass-the-xfs_defer_pending-object-to-iop_recover.patch
queue-6.1/xfs-update-dir3-leaf-block-metadata-after-swap.patch
queue-6.1/xfs-use-deferred-frees-for-btree-block-freeing.patch
queue-6.1/xfs-make-rextslog-computation-consistent-with-mkfs.patch
queue-6.1/xfs-pass-xfs_extent_free_item-directly-through-the-log-intent-code.patch
queue-6.1/xfs-move-the-xfs_rtbitmap.c-declarations-to-xfs_rtbitmap.h.patch
queue-6.1/xfs-recompute-growfsrtfree-transaction-reservation-while-growing-rt-volume.patch
queue-6.1/xfs-reserve-less-log-space-when-recovering-log-intent-items.patch
queue-6.1/xfs-pass-the-xfs_bmbt_irec-directly-through-the-log-intent-code.patch
queue-6.1/xfs-force-all-buffers-to-be-written-during-btree-bulk-load.patch
queue-6.1/xfs-reset-xfs_attr_incomplete-filter-on-node-removal.patch
queue-6.1/xfs-add-lock-protection-when-remove-perag-from-radix-tree.patch
queue-6.1/xfs-use-xfs_defer_pending-objects-to-recover-intent-items.patch
queue-6.1/xfs-pass-per-ag-references-to-xfs_free_extent.patch
queue-6.1/xfs-validate-block-number-being-freed-before-adding-to-xefi.patch
queue-6.1/xfs-don-t-allow-overly-small-or-large-realtime-volumes.patch
queue-6.1/xfs-remove-conditional-building-of-rt-geometry-validator-functions.patch
