public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: cem@kernel.org
To: linux-xfs@vger.kernel.org
Subject: [PATCH 34/35] xfs: fix internal error from AGFL exhaustion
Date: Thu, 15 Feb 2024 13:08:46 +0100	[thread overview]
Message-ID: <20240215120907.1542854-35-cem@kernel.org> (raw)
In-Reply-To: <20240215120907.1542854-1-cem@kernel.org>

From: Omar Sandoval <osandov@fb.com>

Source kernel commit: f63a5b3769ad7659da4c0420751d78958ab97675

We've been seeing XFS errors like the following:

XFS: Internal error i != 1 at line 3526 of file fs/xfs/libxfs/xfs_btree.c.  Caller xfs_btree_insert+0x1ec/0x280
...
Call Trace:
xfs_corruption_error+0x94/0xa0
xfs_btree_insert+0x221/0x280
xfs_alloc_fixup_trees+0x104/0x3e0
xfs_alloc_ag_vextent_size+0x667/0x820
xfs_alloc_fix_freelist+0x5d9/0x750
xfs_free_extent_fix_freelist+0x65/0xa0
__xfs_free_extent+0x57/0x180
...

This is the XFS_IS_CORRUPT() check in xfs_btree_insert() when
xfs_btree_insrec() fails.

After converting this into a panic and dissecting the core dump, I found
that xfs_btree_insrec() is failing because it's trying to split a leaf
node in the cntbt when the AG free list is empty. In particular, it's
failing to get a block from the AGFL _while trying to refill the AGFL_.

If a single operation splits every level of the bnobt and the cntbt (and
the rmapbt if it is enabled) at once, the free list will be empty. Then,
when the next operation tries to refill the free list, it allocates
space. If the allocation does not use a full extent, it will need to
insert records for the remaining space in the bnobt and cntbt. And if
those new records go in full leaves, the leaves (and potentially more
nodes up to the old root) need to be split.

Fix it by accounting for the additional splits that may be required to
refill the free list in the calculation for the minimum free list size.

P.S. As far as I can tell, this bug has existed for a long time -- maybe
back to xfs-history commit afdf80ae7405 ("Add XFS_AG_MAXLEVELS macros
...") in April 1994! It requires a very unlucky sequence of events, and
in fact we didn't hit it until a particular sparse mmap workload updated
from 5.12 to 5.19. But this bug existed in 5.12, so it must've been
exposed by some other change in allocation or writeback patterns. It's
also much less likely to be hit with the rmapbt enabled, since that
increases the minimum free list size and is unlikely to split at the
same time as the bnobt and cntbt.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
---
 libxfs/xfs_alloc.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 4519a0555..7ac7c2f6c 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2271,16 +2271,37 @@ xfs_alloc_min_freelist(
 
 	ASSERT(mp->m_alloc_maxlevels > 0);
 
+	/*
+	 * For a btree shorter than the maximum height, the worst case is that
+	 * every level gets split and a new level is added, then while inserting
+	 * another entry to refill the AGFL, every level under the old root gets
+	 * split again. This is:
+	 *
+	 *   (full height split reservation) + (AGFL refill split height)
+	 * = (current height + 1) + (current height - 1)
+	 * = (new height) + (new height - 2)
+	 * = 2 * new height - 2
+	 *
+	 * For a btree of maximum height, the worst case is that every level
+	 * under the root gets split, then while inserting another entry to
+	 * refill the AGFL, every level under the root gets split again. This is
+	 * also:
+	 *
+	 *   2 * (current height - 1)
+	 * = 2 * (new height - 1)
+	 * = 2 * new height - 2
+	 */
+
 	/* space needed by-bno freespace btree */
 	min_free = min_t(unsigned int, levels[XFS_BTNUM_BNOi] + 1,
-				       mp->m_alloc_maxlevels);
+				       mp->m_alloc_maxlevels) * 2 - 2;
 	/* space needed by-size freespace btree */
 	min_free += min_t(unsigned int, levels[XFS_BTNUM_CNTi] + 1,
-				       mp->m_alloc_maxlevels);
+				       mp->m_alloc_maxlevels) * 2 - 2;
 	/* space needed reverse mapping used space btree */
 	if (xfs_has_rmapbt(mp))
 		min_free += min_t(unsigned int, levels[XFS_BTNUM_RMAPi] + 1,
-						mp->m_rmap_maxlevels);
+						mp->m_rmap_maxlevels) * 2 - 2;
 
 	return min_free;
 }
-- 
2.43.0


  parent reply	other threads:[~2024-02-15 12:09 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-15 12:08 [PATCH 00/35] xfsprogs: libxfs-sync for 6.7 cem
2024-02-15 12:08 ` [PATCH 01/35] xfs: bump max fsgeom struct version cem
2024-02-15 12:08 ` [PATCH 02/35] xfs: hoist freeing of rt data fork extent mappings cem
2024-02-15 12:08 ` [PATCH 03/35] xfs: fix units conversion error in xfs_bmap_del_extent_delay cem
2024-02-15 12:08 ` [PATCH 04/35] xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h cem
2024-02-15 12:08 ` [PATCH 05/35] xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator cem
2024-02-15 12:08 ` [PATCH 06/35] xfs: convert rt bitmap/summary block numbers to xfs_fileoff_t cem
2024-02-15 12:08 ` [PATCH 07/35] xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t cem
2024-02-15 12:08 ` [PATCH 08/35] xfs: rename xfs_verify_rtext to xfs_verify_rtbext cem
2024-02-15 12:08 ` [PATCH 09/35] xfs: convert rt extent numbers to xfs_rtxnum_t cem
2024-02-15 12:08 ` [PATCH 10/35] xfs: create a helper to convert rtextents to rtblocks cem
2024-02-15 12:08 ` [PATCH 11/35] xfs: create a helper to compute leftovers of realtime extents cem
2024-02-15 12:08 ` [PATCH 12/35] xfs: create a helper to convert extlen to rtextlen cem
2024-02-15 12:08 ` [PATCH 13/35] xfs: create helpers to convert rt block numbers to rt extent numbers cem
2024-02-15 12:08 ` [PATCH 14/35] xfs: convert do_div calls to xfs_rtb_to_rtx helper calls cem
2024-02-15 12:08 ` [PATCH 15/35] xfs: create rt extent rounding helpers for realtime extent blocks cem
2024-02-15 12:08 ` [PATCH 16/35] xfs: use shifting and masking when converting rt extents, if possible cem
2024-02-15 12:08 ` [PATCH 17/35] xfs: convert the rtbitmap block and bit macros to static inline functions cem
2024-02-15 12:08 ` [PATCH 18/35] xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros cem
2024-02-15 12:08 ` [PATCH 19/35] xfs: convert open-coded xfs_rtword_t pointer accesses to helper cem
2024-02-15 12:08 ` [PATCH 20/35] xfs: convert rt summary macros to helpers cem
2024-02-15 12:08 ` [PATCH 21/35] xfs: convert to new timestamp accessors cem
2024-02-15 12:08 ` [PATCH 22/35] xfs: create helpers for rtbitmap block/wordcount computations cem
2024-02-15 12:08 ` [PATCH 23/35] xfs: create a helper to handle logging parts of rt bitmap/summary blocks cem
2024-02-15 12:08 ` [PATCH 24/35] xfs: use accessor functions for bitmap words cem
2024-02-15 12:08 ` [PATCH 25/35] xfs: create helpers for rtsummary block/wordcount computations cem
2024-02-15 12:08 ` [PATCH 26/35] xfs: use accessor functions for summary info words cem
2024-02-15 12:08 ` [PATCH 27/35] xfs: consolidate realtime allocation arguments cem
2024-02-15 12:08 ` [PATCH 28/35] xfs: cache last bitmap block in realtime allocator cem
2024-02-15 12:08 ` [PATCH 29/35] xfs: simplify xfs_rtbuf_get calling conventions cem
2024-02-15 12:08 ` [PATCH 30/35] xfs: simplify rt bitmap/summary block accessor functions cem
2024-02-15 12:08 ` [PATCH 31/35] xfs: invert the realtime summary cache cem
2024-02-15 12:08 ` [PATCH 32/35] xfs: factor out xfs_defer_pending_abort cem
2024-02-15 12:08 ` [PATCH 33/35] xfs: abort intent items when recovery intents fail cem
2024-02-15 12:08 ` cem [this message]
2024-02-15 12:08 ` [PATCH 35/35] xfs: inode recovery does not validate the recovered inode cem
2024-02-16  7:11 ` [PATCH 00/35] xfsprogs: libxfs-sync for 6.7 Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240215120907.1542854-35-cem@kernel.org \
    --to=cem@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox