From: "Darrick J. Wong" <djwong@kernel.org>
To: Chandan Babu R <chandan.babu@oracle.com>
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com
Subject: Re: [PATCH V5 11/16] xfs: Introduce macros to represent new maximum extent counts for data/attr forks
Date: Tue, 1 Feb 2022 10:49:26 -0800
Message-ID: <20220201184926.GA8338@magnolia>
In-Reply-To: <20220121051857.221105-12-chandan.babu@oracle.com>

On Fri, Jan 21, 2022 at 10:48:52AM +0530, Chandan Babu R wrote:
> This commit defines new macros to represent maximum extent counts allowed by
> filesystems which have support for large per-inode extent counters.
>
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
> fs/xfs/libxfs/xfs_bmap.c | 8 +++-----
> fs/xfs/libxfs/xfs_bmap_btree.c | 2 +-
> fs/xfs/libxfs/xfs_format.h | 20 ++++++++++++++++----
> fs/xfs/libxfs/xfs_inode_buf.c | 3 ++-
> fs/xfs/libxfs/xfs_inode_fork.c | 2 +-
> fs/xfs/libxfs/xfs_inode_fork.h | 19 +++++++++++++++----
> 6 files changed, 38 insertions(+), 16 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 1948af000c97..384532aac60a 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -61,10 +61,8 @@ xfs_bmap_compute_maxlevels(
> int sz; /* root block size */
>
> /*
> - * The maximum number of extents in a file, hence the maximum number of
> - * leaf entries, is controlled by the size of the on-disk extent count,
> - * either a signed 32-bit number for the data fork, or a signed 16-bit
> - * number for the attr fork.
> + * The maximum number of extents in a fork, hence the maximum number of
> + * leaf entries, is controlled by the size of the on-disk extent count.
> *
> * Note that we can no longer assume that if we are in ATTR1 that the
> * fork offset of all the inodes will be
> @@ -74,7 +72,7 @@ xfs_bmap_compute_maxlevels(
> * ATTR2 we have to assume the worst case scenario of a minimum size
> * available.
> */
> - maxleafents = xfs_iext_max_nextents(whichfork);
> + maxleafents = xfs_iext_max_nextents(xfs_has_nrext64(mp), whichfork);
> if (whichfork == XFS_DATA_FORK)
> sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS);
> else
> diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
> index 453309fc85f2..e8d21d69b9ff 100644
> --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> @@ -611,7 +611,7 @@ xfs_bmbt_maxlevels_ondisk(void)
> minrecs[1] = xfs_bmbt_block_maxrecs(blocklen, false) / 2;
>
> /* One extra level for the inode root. */
> - return xfs_btree_compute_maxlevels(minrecs, MAXEXTNUM) + 1;
> + return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_EXTCNT_DATA_FORK) + 1;
> }
>
> /*
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 9934c320bf01..d3dfd45c39e0 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -872,10 +872,22 @@ enum xfs_dinode_fmt {
>
> /*
> * Max values for extlen, extnum, aextnum.
> - */
> -#define MAXEXTLEN ((xfs_extlen_t)0x001fffff) /* 21 bits */
> -#define MAXEXTNUM ((xfs_extnum_t)0x7fffffff) /* signed int */
> -#define MAXAEXTNUM ((xfs_aextnum_t)0x7fff) /* signed short */
> + *
> + * The newly introduced data fork extent counter is a 64-bit field. However, the
> + * maximum number of extents in a file is limited to 2^54 extents (assuming one
> + * block per extent) by the 54-bit wide startoff field of an extent record.
> + *
> + * A further limitation applies as shown below,
> + * 2^63 (max file size) / 64k (max block size) = 2^47
> + *
> + * Rounding up 47 to the nearest multiple of bits-per-byte results in 48. Hence
> + * 2^48 was chosen as the maximum data fork extent count.

Ok.  I know I've brought up previously the fact that we leave the upper
16 bits of di_big_nextents completely unused, AKA:

It's odd that startoff is a 54-bit field, di_big_nextents is a 64-bit
field, but we don't allow more than 2^48 data fork extents even though
that means that one cannot populate a file on a 4k-FSB filesystem with
one extent record for each file block.
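
To make the 4k-FSB point concrete, here is a minimal C sketch of the
arithmetic.  It only uses the figures quoted above (2^63 bytes maximum
file size, a limit of 2^48 - 1 data fork extents); the helper names are
hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_FILE_SIZE_LOG2	63			/* 2^63 bytes, per the patch comment */
#define MAX_DATA_EXTCNT		((1ULL << 48) - 1)	/* proposed data fork limit */

/* Number of blocks in a maximally sized file (hypothetical helper). */
static inline uint64_t max_file_blocks(unsigned int blocklog)
{
	return 1ULL << (MAX_FILE_SIZE_LOG2 - blocklog);
}

/* Could every block of a maximally sized file get its own extent record? */
static inline bool one_extent_per_block_fits(unsigned int blocklog)
{
	return max_file_blocks(blocklog) <= MAX_DATA_EXTCNT;
}
```

With blocklog = 12 (4k blocks) a maximally sized file has 2^51 blocks,
which exceeds the limit; with blocklog = 16 (64k blocks) it has 2^47
blocks, which fits.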

Prior to 5.16, a potential justification was that xfs_btree_cur
supported exactly 9 levels and we didn't want to raise that all the way
to 12 (or whatever you'd need to support a btree with 2^54 extent
records) for *all cursor types* to handle Ultra Extreme Fragmentation.
Now that we have separate cursor caches for all btree types, we could
create one bmbt cursor cache for NREXT64 data forks and another for all
other cases, which (in my mind anyway) assuages that concern.

The other justification we've covered is that the incore btree for a
data fork with 2^48 xfs_bmbt_irec records will consume a bit more than
2^52 bytes of memory, which is (AFAIK) the current x86_64 physical
address limit.  Assuming that CPU manufacturers keep adding an extra
address line every other year or so, the extremely wealthy could
complain about hitting this limit as early as 2040.  That's roughly 20
years out, which is probably enough time either to find a more efficient
incore extent map structure due to customer demand or to start using the
upper 16 bits.
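
The memory estimate can be sketched in a few lines of C.  The 16 bytes
per incore record used here is an assumption made for illustration (it
is not stated in this thread), and the helper name is hypothetical; the
result is a lower bound that ignores the tree's interior nodes, which is
where the "bit more than" comes from.

```c
#include <stdint.h>

/* ASSUMPTION: one packed incore extent record occupies 16 bytes. */
#define INCORE_REC_BYTES	16ULL

/* Bytes needed for the records alone, excluding btree node overhead. */
static inline uint64_t incore_extent_bytes(uint64_t nextents)
{
	return nextents * INCORE_REC_BYTES;
}
```

Under that assumption, 2^48 records need 2^52 bytes, whereas the old
data fork limit of 2^31 - 1 records needs just under 2^35 bytes (32GiB).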

So with those two factors in mind, I /think/ I'm ok with approving this
extension to the ondisk format.

IOWs, if anyone has an objection, the time to raise it is NOW.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> + */
> +#define MAXEXTLEN ((xfs_extlen_t)((1ULL << 21) - 1)) /* 21 bits */
> +#define XFS_MAX_EXTCNT_DATA_FORK ((xfs_extnum_t)((1ULL << 48) - 1)) /* Unsigned 48-bits */
> +#define XFS_MAX_EXTCNT_ATTR_FORK ((xfs_extnum_t)((1ULL << 32) - 1)) /* Unsigned 32-bits */
> +#define XFS_MAX_EXTCNT_DATA_FORK_OLD ((xfs_extnum_t)((1ULL << 31) - 1)) /* Signed 32-bits */
> +#define XFS_MAX_EXTCNT_ATTR_FORK_OLD ((xfs_extnum_t)((1ULL << 15) - 1)) /* Signed 16-bits */
>
> /*
> * Inode minimum and maximum sizes.
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index 860d32816909..34f360a38603 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -361,7 +361,8 @@ xfs_dinode_verify_fork(
> return __this_address;
> break;
> case XFS_DINODE_FMT_BTREE:
> - max_extents = xfs_iext_max_nextents(whichfork);
> + max_extents = xfs_iext_max_nextents(xfs_dinode_has_nrext64(dip),
> + whichfork);
> if (di_nextents > max_extents)
> return __this_address;
> break;
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index ce690abe5dce..a3a3b54f9c55 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -746,7 +746,7 @@ xfs_iext_count_may_overflow(
> if (whichfork == XFS_COW_FORK)
> return 0;
>
> - max_exts = xfs_iext_max_nextents(whichfork);
> + max_exts = xfs_iext_max_nextents(xfs_inode_has_nrext64(ip), whichfork);
>
> if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
> max_exts = 10;
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> index 4a8b77d425df..e56803436c61 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.h
> +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> @@ -133,12 +133,23 @@ static inline int8_t xfs_ifork_format(struct xfs_ifork *ifp)
> return ifp->if_format;
> }
>
> -static inline xfs_extnum_t xfs_iext_max_nextents(int whichfork)
> +static inline xfs_extnum_t xfs_iext_max_nextents(bool has_nrext64,
> + int whichfork)
> {
> - if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK)
> - return MAXEXTNUM;
> + switch (whichfork) {
> + case XFS_DATA_FORK:
> + case XFS_COW_FORK:
> + return has_nrext64 ? XFS_MAX_EXTCNT_DATA_FORK
> + : XFS_MAX_EXTCNT_DATA_FORK_OLD;
> +
> + case XFS_ATTR_FORK:
> + return has_nrext64 ? XFS_MAX_EXTCNT_ATTR_FORK
> + : XFS_MAX_EXTCNT_ATTR_FORK_OLD;
>
> - return MAXAEXTNUM;
> + default:
> + ASSERT(0);
> + return 0;
> + }
> }
>
> static inline xfs_extnum_t
> --
> 2.30.2
>
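
For anyone who wants to poke at the new limits outside the kernel, the
selection logic of the reworked helper can be mirrored in a userspace
sketch like the one below.  The typedef and the fork enum are stand-ins
chosen for illustration (the kernel defines its own in its headers), and
the kernel's ASSERT(0) in the default case is reduced to a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's type and fork identifiers. */
typedef uint64_t xfs_extnum_t;
enum { XFS_DATA_FORK, XFS_ATTR_FORK, XFS_COW_FORK };

/* Limits copied from the xfs_format.h hunk of the patch. */
#define XFS_MAX_EXTCNT_DATA_FORK	((xfs_extnum_t)((1ULL << 48) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK	((xfs_extnum_t)((1ULL << 32) - 1))
#define XFS_MAX_EXTCNT_DATA_FORK_OLD	((xfs_extnum_t)((1ULL << 31) - 1))
#define XFS_MAX_EXTCNT_ATTR_FORK_OLD	((xfs_extnum_t)((1ULL << 15) - 1))

static inline xfs_extnum_t
xfs_iext_max_nextents(bool has_nrext64, int whichfork)
{
	switch (whichfork) {
	case XFS_DATA_FORK:
	case XFS_COW_FORK:
		return has_nrext64 ? XFS_MAX_EXTCNT_DATA_FORK
				   : XFS_MAX_EXTCNT_DATA_FORK_OLD;
	case XFS_ATTR_FORK:
		return has_nrext64 ? XFS_MAX_EXTCNT_ATTR_FORK
				   : XFS_MAX_EXTCNT_ATTR_FORK_OLD;
	default:
		/* The patch calls ASSERT(0) here before returning 0. */
		return 0;
	}
}
```

Note that the COW fork shares the data fork limit in both the old and
the new scheme.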