From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4/4] xfs: fix off-by-one error in xfs_btree_space_to_height
Date: Tue, 20 Dec 2022 08:20:03 -0800 [thread overview]
Message-ID: <Y6HfiWz4R55RbW5G@magnolia> (raw)
In-Reply-To: <Y6HeYJXVhamT589A@magnolia>
On Tue, Dec 20, 2022 at 08:10:08AM -0800, Darrick J. Wong wrote:
> On Tue, Dec 20, 2022 at 04:00:01PM +1100, Dave Chinner wrote:
> > On Mon, Dec 19, 2022 at 04:05:19PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > >
> > > Lately I've been stress-testing extreme-sized rmap btrees by using the
> > > (new) xfs_db bmap_inflate command to clone bmbt mappings billions of
> > > times and then using xfs_repair to build new rmap and refcount btrees.
> > > This of course is /much/ faster than actually FICLONEing a file billions
> > > of times.
> > >
> > > Unfortunately, xfs_repair fails in xfs_btree_bload_compute_geometry with
> > > EOVERFLOW, which indicates that xfs_mount.m_rmap_maxlevels is not
> > > sufficiently large for the test scenario. For a 1TB filesystem (~67
> > > million AG blocks, 4 AGs) the btheight command reports:
> > >
> > > $ xfs_db -c 'btheight -n 4400801200 -w min rmapbt' /dev/sda
> > > rmapbt: worst case per 4096-byte block: 84 records (leaf) / 45 keyptrs (node)
> > > level 0: 4400801200 records, 52390491 blocks
> > > level 1: 52390491 records, 1164234 blocks
> > > level 2: 1164234 records, 25872 blocks
> > > level 3: 25872 records, 575 blocks
> > > level 4: 575 records, 13 blocks
> > > level 5: 13 records, 1 block
> > > 6 levels, 53581186 blocks total
> > >
> > > The AG is sufficiently large to build this rmap btree. Unfortunately,
> > > m_rmap_maxlevels is 5. Augmenting the loop in the space->height
> > > function to report height, node blocks, and blocks remaining produces
> > > this:
> > >
> > > ht 1 node_blocks 45 blockleft 67108863
> > > ht 2 node_blocks 2025 blockleft 67108818
> > > ht 3 node_blocks 91125 blockleft 67106793
> > > ht 4 node_blocks 4100625 blockleft 67015668
> > > final height: 5
> > >
> > > The goal of this function is to compute the maximum height btree that
> > > can be stored in the given number of ondisk fsblocks. Starting with the
> > > top level of the tree, each iteration through the loop adds the fanout
> > > factor of the next level down until we run out of blocks. IOWs, maximum
> > > height is achieved by using the smallest fanout factor that can apply
> > > to that level.
> > >
> > > However, the loop setup is not correct. Top level btree blocks are
> > > allowed to contain fewer than minrecs items, so the computation is
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > Ah, that's the critical piece of information I was looking for. I
> > couldn't work out from the code change below what was wrong with
> > limits[1]. So....
> >
> > > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > > index 4c16c8c31fcb..8d11d3f5e529 100644
> > > --- a/fs/xfs/libxfs/xfs_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_btree.c
> > > @@ -4666,7 +4666,11 @@ xfs_btree_space_to_height(
> > > const unsigned int *limits,
> > > unsigned long long leaf_blocks)
> > > {
> > > - unsigned long long node_blocks = limits[1];
> > > + /*
> > > + * The root btree block can have a fanout between 2 and maxrecs because
> > > + * the tree might not be big enough to fill it.
> > > + */
> >
> > Can you change this comment to say something like:
> >
> > /*
> > * The root btree block can have less than minrecs pointers
> > * in it because the tree might not be big enough to require
> > * that amount of fanout. Hence it has a minimum size of
> > * 2 pointers, not limits[1].
> > */
>
> Done. Thanks for the reviews! :)
>
> --D
>
> >
> > Otherwise it looks good.
> >
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> >
> > > + unsigned long long node_blocks = 2;
> > > unsigned long long blocks_left = leaf_blocks - 1;
> > > unsigned int height = 1;
> >
> > For future consideration, we don't use maxrecs in this calculation
> > at all - should we just pass minrecs into the function rather than
> > an array of limits?
I prefer to replace all those mxr/mnr array constructs with an explicit
structure:
struct xfs_btblock_geometry {
uint16_t leaf_maxrecs;
uint16_t leaf_minrecs;
uint16_t node_maxrecs;
uint16_t node_minrecs;
};
and get rid of all the mp->m_rmap_mxr[leaf != 0] stuff that slows me
down every time I have to read it.
--D
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
next prev parent reply other threads:[~2022-12-20 16:20 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-20 0:04 [PATCHSET 0/4] xfs: random fixes for 6.2, part 3 Darrick J. Wong
2022-12-20 0:05 ` [PATCH 1/4] xfs: don't assert if cmap covers imap after cycling lock Darrick J. Wong
2022-12-20 4:49 ` Dave Chinner
2022-12-20 0:05 ` [PATCH 2/4] xfs: don't stall background reclaim on inactvation Darrick J. Wong
2022-12-20 4:49 ` Dave Chinner
2022-12-20 16:28 ` Darrick J. Wong
2022-12-20 0:05 ` [PATCH 3/4] xfs: make xfs_iomap_page_ops static Darrick J. Wong
2022-12-20 4:49 ` Dave Chinner
2022-12-20 0:05 ` [PATCH 4/4] xfs: fix off-by-one error in xfs_btree_space_to_height Darrick J. Wong
2022-12-20 5:00 ` Dave Chinner
2022-12-20 16:10 ` Darrick J. Wong
2022-12-20 16:20 ` Darrick J. Wong [this message]
2022-12-20 20:39 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y6HfiWz4R55RbW5G@magnolia \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox