From: Chandan Babu R <chandan.babu@oracle.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: chandanrlinux@gmail.com, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 13/14] xfs: compute the maximum height of the rmap btree when reflink enabled
Date: Mon, 20 Sep 2021 15:26:56 +0530
Message-ID: <8735pz7eon.fsf@debian-BULLSEYE-live-builder-AMD64>
In-Reply-To: <163192862112.416199.3937220618088469929.stgit@magnolia>
On 18 Sep 2021 at 07:00, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Instead of assuming that the hardcoded XFS_BTREE_MAXLEVELS value is big
> enough to handle the maximally tall rmap btree when all blocks are in
> use and maximally shared, let's compute the maximum height assuming the
> rmapbt consumes as many blocks as possible.

Maximum rmap btree height calculations look good to me.

Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>

>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_btree.c       |   34 +++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_btree.h       |    2 ++
>  fs/xfs/libxfs/xfs_rmap_btree.c  |   40 ++++++++++++++++++++-------------------
>  fs/xfs/libxfs/xfs_rmap_btree.h  |    2 +-
>  fs/xfs/libxfs/xfs_trans_resv.c  |   12 ++++++++++++
>  fs/xfs/libxfs/xfs_trans_space.h |    7 +++++++
>  fs/xfs/xfs_mount.c              |    2 +-
>  7 files changed, 78 insertions(+), 21 deletions(-)
>
>
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 6cf49f7e1299..005bc42cf0bd 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -4526,6 +4526,40 @@ xfs_btree_compute_maxlevels(
>  	return level;
>  }
>  
> +/*
> + * Compute the maximum height of a btree that is allowed to consume up to the
> + * given number of blocks.
> + */
> +unsigned int
> +xfs_btree_compute_maxlevels_size(
> +	unsigned long long	max_btblocks,
> +	unsigned int		leaf_mnr)
> +{
> +	unsigned long long	leaf_blocks = leaf_mnr;
> +	unsigned long long	blocks_left;
> +	unsigned int		maxlevels;
> +
> +	if (max_btblocks < 1)
> +		return 0;
> +
> +	/*
> +	 * The loop increments maxlevels as long as there would be enough
> +	 * blocks left in the reservation to handle each node block at the
> +	 * current level pointing to the minimum possible number of leaf blocks
> +	 * at the next level down.  We start the loop assuming a single-level
> +	 * btree consuming one block.
> +	 */
> +	maxlevels = 1;
> +	blocks_left = max_btblocks - 1;
> +	while (leaf_blocks < blocks_left) {
> +		maxlevels++;
> +		blocks_left -= leaf_blocks;
> +		leaf_blocks *= leaf_mnr;
> +	}
> +
> +	return maxlevels;
> +}
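
For anyone who wants to poke at the loop outside the kernel, here is a
minimal userspace sketch of the same block-budget calculation. It is an
illustration only and not part of the patch: the function name
maxlevels_for_block_budget, the parameter name min_fanout, and the
assert() guard are hypothetical additions. The guard is there because a
fan-out of 0 would make the loop spin forever in this standalone form.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the block-budget height calculation.
 * max_btblocks: total blocks the btree is allowed to consume.
 * min_fanout:   minimum number of children per block (assumed >= 2).
 */
unsigned int
maxlevels_for_block_budget(
	unsigned long long	max_btblocks,
	unsigned int		min_fanout)
{
	unsigned long long	level_blocks = min_fanout;
	unsigned long long	blocks_left;
	unsigned int		maxlevels;

	/* Guard added for this sketch only; the patch has no such check. */
	assert(min_fanout >= 2);

	if (max_btblocks < 1)
		return 0;

	/* Start with a single-level tree that uses one block (the root). */
	maxlevels = 1;
	blocks_left = max_btblocks - 1;

	/*
	 * Add a level while the budget can still pay for the sparsest
	 * possible next level, which holds min_fanout times as many
	 * blocks as the level above it.
	 */
	while (level_blocks < blocks_left) {
		maxlevels++;
		blocks_left -= level_blocks;
		level_blocks *= min_fanout;
	}

	return maxlevels;
}
```

With a budget of 1000 blocks and a minimum fan-out of 10, the sketch
charges 1 + 10 + 100 blocks for three levels and then stops because the
next level would need 1000 blocks with only 889 left, so it returns 3.
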
> +
> /*
> * Query a regular btree for all records overlapping a given interval.
> * Start with a LE lookup of the key of low_rec and return all records
> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index 106760c540c7..d256d869f0af 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -476,6 +476,8 @@ xfs_failaddr_t xfs_btree_lblock_verify(struct xfs_buf *bp,
> unsigned int max_recs);
>
> uint xfs_btree_compute_maxlevels(uint *limits, unsigned long len);
> +unsigned int xfs_btree_compute_maxlevels_size(unsigned long long max_btblocks,
> + unsigned int leaf_mnr);
> unsigned long long xfs_btree_calc_size(uint *limits, unsigned long long len);
>
> /*
> diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
> index f3c4d0965cc9..85caeb14e4db 100644
> --- a/fs/xfs/libxfs/xfs_rmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_rmap_btree.c
> @@ -535,30 +535,32 @@ xfs_rmapbt_maxrecs(
> }
>
> /* Compute the maximum height of an rmap btree. */
> -void
> +unsigned int
>  xfs_rmapbt_compute_maxlevels(
> -	struct xfs_mount		*mp)
> +	struct xfs_mount	*mp)
>  {
> +	if (!xfs_has_reflink(mp)) {
> +		/*
> +		 * If there's no block sharing, compute the maximum rmapbt
> +		 * height assuming one rmap record per AG block.
> +		 */
> +		return xfs_btree_compute_maxlevels(mp->m_rmap_mnr,
> +				mp->m_sb.sb_agblocks);
> +	}
> +
>  	/*
> -	 * On a non-reflink filesystem, the maximum number of rmap
> -	 * records is the number of blocks in the AG, hence the max
> -	 * rmapbt height is log_$maxrecs($agblocks).  However, with
> -	 * reflink each AG block can have up to 2^32 (per the refcount
> -	 * record format) owners, which means that theoretically we
> -	 * could face up to 2^64 rmap records.
> +	 * Compute the asymptotic maxlevels for an rmapbt on a reflink fs.
>  	 *
> -	 * That effectively means that the max rmapbt height must be
> -	 * XFS_BTREE_MAXLEVELS.  "Fortunately" we'll run out of AG
> -	 * blocks to feed the rmapbt long before the rmapbt reaches
> -	 * maximum height.  The reflink code uses ag_resv_critical to
> -	 * disallow reflinking when less than 10% of the per-AG metadata
> -	 * block reservation since the fallback is a regular file copy.
> +	 * On a reflink filesystem, each AG block can have up to 2^32 (per the
> +	 * refcount record format) owners, which means that theoretically we
> +	 * could face up to 2^64 rmap records.  However, we're likely to run
> +	 * out of blocks in the AG long before that happens, which means that
> +	 * we must compute the max height based on what the btree will look
> +	 * like if it consumes almost all the blocks in the AG due to maximal
> +	 * sharing factor.
>  	 */
> -	if (xfs_has_reflink(mp))
> -		mp->m_rmap_maxlevels = XFS_BTREE_MAXLEVELS;
> -	else
> -		mp->m_rmap_maxlevels = xfs_btree_compute_maxlevels(
> -				mp->m_rmap_mnr, mp->m_sb.sb_agblocks);
> +	return xfs_btree_compute_maxlevels_size(mp->m_sb.sb_agblocks,
> +			mp->m_rmap_mnr[1]);
>  }
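
As a side note on the non-reflink branch, which sizes the tree from a
record count rather than a block budget: the body of
xfs_btree_compute_maxlevels() is not quoted in this hunk, so the sketch
below is only an assumption about how such a record-count-based height
can be computed. The function name maxlevels_for_records and the
assert() guard are hypothetical; limits[0] stands in for the minimum
records per leaf block and limits[1] for the minimum keys per node block.

```c
#include <assert.h>

/*
 * Hypothetical model: height of the tallest btree that can index nrecs
 * records when every block is as sparsely populated as allowed.
 */
unsigned int
maxlevels_for_records(
	const unsigned int	limits[2],
	unsigned long long	nrecs)
{
	unsigned long long	blocks;
	unsigned int		level;

	/* Guard for this sketch only: avoids div-by-zero and endless loops. */
	assert(limits[0] >= 1 && limits[1] >= 2);

	/* Number of leaf blocks needed, rounding up. */
	blocks = (nrecs + limits[0] - 1) / limits[0];

	/* Each node level shrinks the block count by the minimum fan-out. */
	for (level = 1; blocks > 1; level++)
		blocks = (blocks + limits[1] - 1) / limits[1];

	return level;
}
```

With both limits set to 4 and 1000 records, the block counts per level
go 250, 63, 16, 4, 1, so the sketch reports a height of 5.
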
>
> /* Calculate the refcount btree size for some records. */
> diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
> index f2eee6572af4..5aaecf755abd 100644
> --- a/fs/xfs/libxfs/xfs_rmap_btree.h
> +++ b/fs/xfs/libxfs/xfs_rmap_btree.h
> @@ -49,7 +49,7 @@ struct xfs_btree_cur *xfs_rmapbt_stage_cursor(struct xfs_mount *mp,
> void xfs_rmapbt_commit_staged_btree(struct xfs_btree_cur *cur,
> struct xfs_trans *tp, struct xfs_buf *agbp);
> int xfs_rmapbt_maxrecs(int blocklen, int leaf);
> -extern void xfs_rmapbt_compute_maxlevels(struct xfs_mount *mp);
> +unsigned int xfs_rmapbt_compute_maxlevels(struct xfs_mount *mp);
>
> extern xfs_extlen_t xfs_rmapbt_calc_size(struct xfs_mount *mp,
> unsigned long long len);
> diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> index 5e300daa2559..679f10e08f31 100644
> --- a/fs/xfs/libxfs/xfs_trans_resv.c
> +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> @@ -814,6 +814,15 @@ xfs_trans_resv_calc(
>  	struct xfs_mount	*mp,
>  	struct xfs_trans_resv	*resp)
>  {
> +	unsigned int		rmap_maxlevels = mp->m_rmap_maxlevels;
> +
> +	/*
> +	 * In the early days of rmap+reflink, we hardcoded the rmap maxlevels
> +	 * to 9 even if the AG size was smaller.
> +	 */
> +	if (xfs_has_rmapbt(mp) && xfs_has_reflink(mp))
> +		mp->m_rmap_maxlevels = XFS_OLD_REFLINK_RMAP_MAXLEVELS;
> +
>  	/*
>  	 * The following transactions are logged in physical format and
>  	 * require a permanent reservation on space.
> @@ -916,4 +925,7 @@ xfs_trans_resv_calc(
>  	resp->tr_clearagi.tr_logres = xfs_calc_clear_agi_bucket_reservation(mp);
>  	resp->tr_growrtzero.tr_logres = xfs_calc_growrtzero_reservation(mp);
>  	resp->tr_growrtfree.tr_logres = xfs_calc_growrtfree_reservation(mp);
> +
> +	/* Put everything back the way it was.  This goes at the end. */
> +	mp->m_rmap_maxlevels = rmap_maxlevels;
>  }
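
The save/override/restore dance in this hunk can be modelled in a few
lines of userspace C. This is a sketch under assumptions, not the kernel
code: struct mount_model, rmapadd_space_res() and resv_calc_model() are
hypothetical stand-ins for the mount structure, a reservation helper that
reads m_rmap_maxlevels, and xfs_trans_resv_calc() respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Historical hardcoded height, mirroring the value 9 in the patch. */
#define OLD_REFLINK_RMAP_MAXLEVELS	9

/* Hypothetical, heavily trimmed model of the mount structure. */
struct mount_model {
	unsigned int	rmap_maxlevels;
	bool		has_rmapbt;
	bool		has_reflink;
};

/* Stand-in for any reservation helper that reads the mount field. */
static unsigned int
rmapadd_space_res(const struct mount_model *mp)
{
	return mp->rmap_maxlevels;
}

/*
 * Compute a reservation using the historical height, then restore the
 * real computed height so the rest of the code sees the accurate value.
 */
unsigned int
resv_calc_model(struct mount_model *mp)
{
	unsigned int	saved;
	unsigned int	res;

	assert(mp != NULL);
	saved = mp->rmap_maxlevels;

	if (mp->has_rmapbt && mp->has_reflink)
		mp->rmap_maxlevels = OLD_REFLINK_RMAP_MAXLEVELS;

	res = rmapadd_space_res(mp);

	/* Put everything back the way it was. */
	mp->rmap_maxlevels = saved;
	return res;
}
```

The point of the pattern is that reservation sizes stay what they were
before the patch on rmap+reflink filesystems, while the mount keeps the
newly computed (possibly smaller) height for everything else.
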
> diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h
> index 50332be34388..440c9c390b86 100644
> --- a/fs/xfs/libxfs/xfs_trans_space.h
> +++ b/fs/xfs/libxfs/xfs_trans_space.h
> @@ -17,6 +17,13 @@
> /* Adding one rmap could split every level up to the top of the tree. */
> #define XFS_RMAPADD_SPACE_RES(mp) ((mp)->m_rmap_maxlevels)
>
> +/*
> + * Note that we historically set m_rmap_maxlevels to 9 when reflink was
> + * enabled, so we must preserve this behavior to avoid changing the transaction
> + * space reservations.
> + */
> +#define XFS_OLD_REFLINK_RMAP_MAXLEVELS (9)
> +
> /* Blocks we might need to add "b" rmaps to a tree. */
> #define XFS_NRMAPADD_SPACE_RES(mp, b)\
> (((b + XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp) - 1) / \
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 06dac09eddbd..e600a0b781c8 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -635,7 +635,7 @@ xfs_mountfs(
>  	xfs_bmap_compute_maxlevels(mp, XFS_DATA_FORK);
>  	xfs_bmap_compute_maxlevels(mp, XFS_ATTR_FORK);
>  	xfs_mount_setup_inode_geom(mp);
> -	xfs_rmapbt_compute_maxlevels(mp);
> +	mp->m_rmap_maxlevels = xfs_rmapbt_compute_maxlevels(mp);
>  	xfs_refcountbt_compute_maxlevels(mp);
>  
>  	/*
--
chandan