From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Carlos Maiolino <cem@kernel.org>,
Hans Holmberg <hans.holmberg@wdc.com>,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH 10/43] xfs: make metabtree reservations global
Date: Thu, 6 Feb 2025 12:50:21 -0800 [thread overview]
Message-ID: <20250206205021.GL21808@frogsfrogsfrogs> (raw)
In-Reply-To: <20250206064511.2323878-11-hch@lst.de>
On Thu, Feb 06, 2025 at 07:44:26AM +0100, Christoph Hellwig wrote:
> Currently each metabtree inode has it's own space reservation to ensure
> it can be expanded to the maximum size. This is not very efficient as it
> requires a large number of blocks to be set aside that can't be used at
> all by other btrees.
>
> Switch to a model that uses a global pool instead in preparation for
> reducing the amount of reserved space.
...because inodes draw from the global pool, so there's no reason to
compartmentalize each rt rmap btree's reservation like we do for per-AG
rmap btrees, right?
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/libxfs/xfs_metafile.c | 164 ++++++++++++++++++++-----------
> fs/xfs/libxfs/xfs_metafile.h | 6 +-
> fs/xfs/scrub/reap.c | 12 ++-
> fs/xfs/scrub/repair.c | 54 ----------
> fs/xfs/scrub/repair.h | 1 -
> fs/xfs/scrub/rtrefcount_repair.c | 34 ++-----
> fs/xfs/scrub/rtrmap_repair.c | 29 +-----
> fs/xfs/xfs_fsops.c | 19 ++--
> fs/xfs/xfs_inode.h | 16 +--
> fs/xfs/xfs_mount.h | 9 ++
> fs/xfs/xfs_reflink.c | 12 +--
> fs/xfs/xfs_rtalloc.c | 43 +-------
> fs/xfs/xfs_rtalloc.h | 5 -
> fs/xfs/xfs_super.c | 1 +
> fs/xfs/xfs_trace.h | 23 ++---
> 15 files changed, 160 insertions(+), 268 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_metafile.c b/fs/xfs/libxfs/xfs_metafile.c
> index 7625e694eb8d..88f011750add 100644
> --- a/fs/xfs/libxfs/xfs_metafile.c
> +++ b/fs/xfs/libxfs/xfs_metafile.c
> @@ -21,6 +21,9 @@
> #include "xfs_errortag.h"
> #include "xfs_error.h"
> #include "xfs_alloc.h"
> +#include "xfs_rtgroup.h"
> +#include "xfs_rtrmap_btree.h"
> +#include "xfs_rtrefcount_btree.h"
>
> static const struct {
> enum xfs_metafile_type mtype;
> @@ -74,12 +77,11 @@ xfs_metafile_clear_iflag(
> }
>
> /*
> - * Is the amount of space that could be allocated towards a given metadata
> - * file at or beneath a certain threshold?
> + * Is the metafile reservations at or beneath a certain threshold?
> */
> static inline bool
> xfs_metafile_resv_can_cover(
> - struct xfs_inode *ip,
> + struct xfs_mount *mp,
> int64_t rhs)
> {
> /*
> @@ -88,43 +90,38 @@ xfs_metafile_resv_can_cover(
> * global free block count. Take care of the first case to avoid
> * touching the per-cpu counter.
> */
> - if (ip->i_delayed_blks >= rhs)
> + if (mp->m_metafile_resv_avail >= rhs)
> return true;
>
> /*
> * There aren't enough blocks left in the inode's reservation, but it
> * isn't critical unless there also isn't enough free space.
> */
> - return xfs_compare_freecounter(ip->i_mount, XC_FREE_BLOCKS,
> - rhs - ip->i_delayed_blks, 2048) >= 0;
> + return xfs_compare_freecounter(mp, XC_FREE_BLOCKS,
> + rhs - mp->m_metafile_resv_avail, 2048) >= 0;
> }
>
> /*
> - * Is this metadata file critically low on blocks? For now we'll define that
> - * as the number of blocks we can get our hands on being less than 10% of what
> - * we reserved or less than some arbitrary number (maximum btree height).
> + * Is the metafile reservation critically low on blocks? For now we'll define
> + * that as the number of blocks we can get our hands on being less than 10% of
> + * what we reserved or less than some arbitrary number (maximum btree height).
> */
> bool
> xfs_metafile_resv_critical(
> - struct xfs_inode *ip)
> + struct xfs_mount *mp)
> {
> - uint64_t asked_low_water;
> + ASSERT(xfs_has_metadir(mp));
>
> - if (!ip)
> - return false;
> -
> - ASSERT(xfs_is_metadir_inode(ip));
> - trace_xfs_metafile_resv_critical(ip, 0);
> + trace_xfs_metafile_resv_critical(mp, 0);
>
> - if (!xfs_metafile_resv_can_cover(ip, ip->i_mount->m_rtbtree_maxlevels))
> + if (!xfs_metafile_resv_can_cover(mp, mp->m_rtbtree_maxlevels))
> return true;
>
> - asked_low_water = div_u64(ip->i_meta_resv_asked, 10);
> - if (!xfs_metafile_resv_can_cover(ip, asked_low_water))
> + if (!xfs_metafile_resv_can_cover(mp,
> + div_u64(mp->m_metafile_resv_target, 10)))
> return true;
>
> - return XFS_TEST_ERROR(false, ip->i_mount,
> - XFS_ERRTAG_METAFILE_RESV_CRITICAL);
> + return XFS_TEST_ERROR(false, mp, XFS_ERRTAG_METAFILE_RESV_CRITICAL);
> }
>
> /* Allocate a block from the metadata file's reservation. */
> @@ -133,22 +130,24 @@ xfs_metafile_resv_alloc_space(
> struct xfs_inode *ip,
> struct xfs_alloc_arg *args)
> {
> + struct xfs_mount *mp = ip->i_mount;
> int64_t len = args->len;
>
> ASSERT(xfs_is_metadir_inode(ip));
> ASSERT(args->resv == XFS_AG_RESV_METAFILE);
>
> - trace_xfs_metafile_resv_alloc_space(ip, args->len);
> + trace_xfs_metafile_resv_alloc_space(mp, args->len);
>
> /*
> * Allocate the blocks from the metadata inode's block reservation
> * and update the ondisk sb counter.
> */
> - if (ip->i_delayed_blks > 0) {
> + mutex_lock(&mp->m_metafile_resv_lock);
> + if (mp->m_metafile_resv_avail > 0) {
> int64_t from_resv;
>
> - from_resv = min_t(int64_t, len, ip->i_delayed_blks);
> - ip->i_delayed_blks -= from_resv;
> + from_resv = min_t(int64_t, len, mp->m_metafile_resv_avail);
> + mp->m_metafile_resv_avail -= from_resv;
> xfs_mod_delalloc(ip, 0, -from_resv);
> xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_RES_FDBLOCKS,
> -from_resv);
> @@ -175,6 +174,9 @@ xfs_metafile_resv_alloc_space(
> xfs_trans_mod_sb(args->tp, field, -len);
> }
>
> + mp->m_metafile_resv_used += args->len;
> + mutex_unlock(&mp->m_metafile_resv_lock);
> +
> ip->i_nblocks += args->len;
> xfs_trans_log_inode(args->tp, ip, XFS_ILOG_CORE);
> }
> @@ -186,26 +188,33 @@ xfs_metafile_resv_free_space(
> struct xfs_trans *tp,
> xfs_filblks_t len)
> {
> + struct xfs_mount *mp = ip->i_mount;
> int64_t to_resv;
>
> ASSERT(xfs_is_metadir_inode(ip));
> - trace_xfs_metafile_resv_free_space(ip, len);
> +
> + trace_xfs_metafile_resv_free_space(mp, len);
>
> ip->i_nblocks -= len;
> xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
>
> + mutex_lock(&mp->m_metafile_resv_lock);
> + mp->m_metafile_resv_used -= len;
> +
> /*
> * Add the freed blocks back into the inode's delalloc reservation
> * until it reaches the maximum size. Update the ondisk fdblocks only.
> */
> - to_resv = ip->i_meta_resv_asked - (ip->i_nblocks + ip->i_delayed_blks);
> + to_resv = mp->m_metafile_resv_target -
> + (mp->m_metafile_resv_used + mp->m_metafile_resv_avail);
This separates accounting of the usage of the metadata btree (e.g.
rtrmapbt) from i_blocks, which is a convenient way to fix an accounting
bug if ever there's a metadir inode with a big enough xattr structure
that it blows out into a real ondisk block. So far I don't think that
happens in practice, but it's not impossible.
Clearly, the mapping of before to after is:
i_meta_resv_asked -> m_metafile_resv_target
i_nblocks -> m_metafile_resv_used
i_delayed_blks -> m_metafile_resv_avail
Makes sense. And I guess we still consider the reservations as "delayed
allocations" in the sense that we subtract from fdblocks and put the
same quantity into m_delalloc_blks.
> if (to_resv > 0) {
> to_resv = min_t(int64_t, to_resv, len);
> - ip->i_delayed_blks += to_resv;
> + mp->m_metafile_resv_avail += to_resv;
> xfs_mod_delalloc(ip, 0, to_resv);
> xfs_trans_mod_sb(tp, XFS_TRANS_SB_RES_FDBLOCKS, to_resv);
> len -= to_resv;
> }
> + mutex_unlock(&mp->m_metafile_resv_lock);
>
> /*
> * Everything else goes back to the filesystem, so update the in-core
> @@ -215,61 +224,96 @@ xfs_metafile_resv_free_space(
> xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, len);
> }
>
> -/* Release a metadata file's space reservation. */
> +static void
> +__xfs_metafile_resv_free(
> + struct xfs_mount *mp)
> +{
> + if (mp->m_metafile_resv_avail) {
> + xfs_mod_sb_delalloc(mp, -(int64_t)mp->m_metafile_resv_avail);
> + xfs_add_fdblocks(mp, mp->m_metafile_resv_avail);
> + }
> + mp->m_metafile_resv_avail = 0;
> + mp->m_metafile_resv_used = 0;
> + mp->m_metafile_resv_target = 0;
> +}
> +
> +/* Release unused metafile space reservation. */
> void
> xfs_metafile_resv_free(
> - struct xfs_inode *ip)
> + struct xfs_mount *mp)
> {
> - /* Non-btree metadata inodes don't need space reservations. */
> - if (!ip || !ip->i_meta_resv_asked)
> + if (!xfs_has_metadir(mp))
> return;
>
> - ASSERT(xfs_is_metadir_inode(ip));
> - trace_xfs_metafile_resv_free(ip, 0);
> + trace_xfs_metafile_resv_free(mp, 0);
>
> - if (ip->i_delayed_blks) {
> - xfs_mod_delalloc(ip, 0, -ip->i_delayed_blks);
> - xfs_add_fdblocks(ip->i_mount, ip->i_delayed_blks);
> - ip->i_delayed_blks = 0;
> - }
> - ip->i_meta_resv_asked = 0;
> + mutex_lock(&mp->m_metafile_resv_lock);
> + __xfs_metafile_resv_free(mp);
> + mutex_unlock(&mp->m_metafile_resv_lock);
> }
>
> -/* Set up a metadata file's space reservation. */
> +/* Set up a metafile space reservation. */
> int
> xfs_metafile_resv_init(
> - struct xfs_inode *ip,
> - xfs_filblks_t ask)
> + struct xfs_mount *mp)
> {
> + struct xfs_rtgroup *rtg = NULL;
> + xfs_filblks_t used = 0, target = 0;
> xfs_filblks_t hidden_space;
> - xfs_filblks_t used;
> - int error;
> + int error = 0;
>
> - if (!ip || ip->i_meta_resv_asked > 0)
> + if (!xfs_has_metadir(mp))
> return 0;
>
> - ASSERT(xfs_is_metadir_inode(ip));
> + /*
> + * Free any previous reservation to have a clean slate.
> + */
> + mutex_lock(&mp->m_metafile_resv_lock);
> + __xfs_metafile_resv_free(mp);
> +
> + /*
> + * Currently the only btree metafiles that require reservations are the
> + * rtrmap and the rtrefcount. Anything new will have to be added here
> + * as well.
> + */
> + while ((rtg = xfs_rtgroup_next(mp, rtg))) {
> + if (xfs_has_rtrmapbt(mp)) {
> + used += rtg_rmap(rtg)->i_nblocks;
> + target += xfs_rtrmapbt_calc_reserves(mp);
> + }
> + if (xfs_has_rtreflink(mp)) {
> + used += rtg_refcount(rtg)->i_nblocks;
> + target += xfs_rtrefcountbt_calc_reserves(mp);
> + }
> + }
> +
> + if (!target)
> + goto out_unlock;
>
> /*
> - * Space taken by all other metadata btrees are accounted on-disk as
> + * Space taken by the per-AG metadata btrees are accounted on-disk as
> * used space. We therefore only hide the space that is reserved but
> * not used by the trees.
> */
> - used = ip->i_nblocks;
> - if (used > ask)
> - ask = used;
> - hidden_space = ask - used;
> + if (used > target)
> + target = used;
> + hidden_space = target - used;
>
> - error = xfs_dec_fdblocks(ip->i_mount, hidden_space, true);
> + error = xfs_dec_fdblocks(mp, hidden_space, true);
> if (error) {
> - trace_xfs_metafile_resv_init_error(ip, error, _RET_IP_);
> - return error;
> + trace_xfs_metafile_resv_init_error(mp, 0);
> + goto out_unlock;
> }
>
> - xfs_mod_delalloc(ip, 0, hidden_space);
> - ip->i_delayed_blks = hidden_space;
> - ip->i_meta_resv_asked = ask;
> + xfs_mod_sb_delalloc(mp, hidden_space);
> +
> + mp->m_metafile_resv_target = target;
> + mp->m_metafile_resv_used = used;
> + mp->m_metafile_resv_avail = hidden_space;
> +
> + trace_xfs_metafile_resv_init(mp, target);
>
> - trace_xfs_metafile_resv_init(ip, ask);
> - return 0;
> +out_unlock:
> + mutex_unlock(&mp->m_metafile_resv_lock);
> + return error;
> }
> diff --git a/fs/xfs/libxfs/xfs_metafile.h b/fs/xfs/libxfs/xfs_metafile.h
> index 95af4b52e5a7..ae6f9e779b98 100644
> --- a/fs/xfs/libxfs/xfs_metafile.h
> +++ b/fs/xfs/libxfs/xfs_metafile.h
> @@ -26,13 +26,13 @@ void xfs_metafile_clear_iflag(struct xfs_trans *tp, struct xfs_inode *ip);
> /* Space reservations for metadata inodes. */
> struct xfs_alloc_arg;
>
> -bool xfs_metafile_resv_critical(struct xfs_inode *ip);
> +bool xfs_metafile_resv_critical(struct xfs_mount *mp);
> void xfs_metafile_resv_alloc_space(struct xfs_inode *ip,
> struct xfs_alloc_arg *args);
> void xfs_metafile_resv_free_space(struct xfs_inode *ip, struct xfs_trans *tp,
> xfs_filblks_t len);
> -void xfs_metafile_resv_free(struct xfs_inode *ip);
> -int xfs_metafile_resv_init(struct xfs_inode *ip, xfs_filblks_t ask);
> +void xfs_metafile_resv_free(struct xfs_mount *mp);
> +int xfs_metafile_resv_init(struct xfs_mount *mp);
>
> /* Code specific to kernel/userspace; must be provided externally. */
>
> diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c
> index b32fb233cf84..cb66918832a9 100644
> --- a/fs/xfs/scrub/reap.c
> +++ b/fs/xfs/scrub/reap.c
> @@ -932,13 +932,17 @@ xrep_reap_metadir_fsblocks(
> xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, XFS_DATA_FORK);
>
> error = xfsb_bitmap_walk(bitmap, xreap_fsmeta_extent, &rs);
> - if (error)
> + if (error || !xreap_dirty(&rs))
> return error;
>
> - if (xreap_dirty(&rs))
> - return xrep_defer_finish(sc);
> + error = xrep_defer_finish(sc);
> + if (error)
> + return error;
>
> - return 0;
> + /*
> + * Resize the reservations so that we don't fail to expand the btree.
> + */
> + return xfs_metafile_resv_init(sc->mp);
I'm not sure this fixes the reservation correctly -- let's say we have a
busted btree inode with a 100 fsblock reservation that was using 20 fsb.
At mount, we'll have called xfs_dec_fdblocks for 100 - 20 = 80 fsb:
hidden_space = target - used;
xfs_dec_fdblocks(mp, hidden_space, true);
Then we rebuild it with a more compact 18 fsb btree. Now we run
xfs_metafile_resv_init and it thinks it needs to call xfs_dec_fdblocks
for 100 - 18 = 82 fsb. But we already reserved 80 for this file, so the
correct thing to do here is to xfs_dec_fdblocks 2fsb from fdblocks, not
another 82.
--D
> }
>
> /*
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index 3b5288d3ef4e..5cdaf82aee7b 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -1380,57 +1380,3 @@ xrep_inode_set_nblocks(
> xfs_trans_mod_dquot_byino(sc->tp, sc->ip, XFS_TRANS_DQ_BCOUNT,
> delta);
> }
> -
> -/* Reset the block reservation for a metadata inode. */
> -int
> -xrep_reset_metafile_resv(
> - struct xfs_scrub *sc)
> -{
> - struct xfs_inode *ip = sc->ip;
> - int64_t delta;
> - int error;
> -
> - delta = ip->i_nblocks + ip->i_delayed_blks - ip->i_meta_resv_asked;
> - if (delta == 0)
> - return 0;
> -
> - /*
> - * Too many blocks have been reserved, transfer some from the incore
> - * reservation back to the filesystem.
> - */
> - if (delta > 0) {
> - int64_t give_back;
> -
> - give_back = min_t(uint64_t, delta, ip->i_delayed_blks);
> - if (give_back > 0) {
> - xfs_mod_delalloc(ip, 0, -give_back);
> - xfs_add_fdblocks(ip->i_mount, give_back);
> - ip->i_delayed_blks -= give_back;
> - }
> -
> - return 0;
> - }
> -
> - /*
> - * Not enough reservation; try to take some blocks from the filesystem
> - * to the metadata inode. @delta is negative here, so invert the sign.
> - */
> - delta = -delta;
> - error = xfs_dec_fdblocks(sc->mp, delta, true);
> - while (error == -ENOSPC) {
> - delta--;
> - if (delta == 0) {
> - xfs_warn(sc->mp,
> -"Insufficient free space to reset space reservation for inode 0x%llx after repair.",
> - ip->i_ino);
> - return 0;
> - }
> - error = xfs_dec_fdblocks(sc->mp, delta, true);
> - }
> - if (error)
> - return error;
> -
> - xfs_mod_delalloc(ip, 0, delta);
> - ip->i_delayed_blks += delta;
> - return 0;
> -}
> diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
> index 823c00d1a502..342bae348b47 100644
> --- a/fs/xfs/scrub/repair.h
> +++ b/fs/xfs/scrub/repair.h
> @@ -186,7 +186,6 @@ void xrep_trans_cancel_hook_dummy(void **cookiep, struct xfs_trans *tp);
>
> bool xrep_buf_verify_struct(struct xfs_buf *bp, const struct xfs_buf_ops *ops);
> void xrep_inode_set_nblocks(struct xfs_scrub *sc, int64_t new_blocks);
> -int xrep_reset_metafile_resv(struct xfs_scrub *sc);
>
> #else
>
> diff --git a/fs/xfs/scrub/rtrefcount_repair.c b/fs/xfs/scrub/rtrefcount_repair.c
> index 257cfb24beb4..983362447826 100644
> --- a/fs/xfs/scrub/rtrefcount_repair.c
> +++ b/fs/xfs/scrub/rtrefcount_repair.c
> @@ -697,32 +697,6 @@ xrep_rtrefc_build_new_tree(
> return error;
> }
>
> -/*
> - * Now that we've logged the roots of the new btrees, invalidate all of the
> - * old blocks and free them.
> - */
> -STATIC int
> -xrep_rtrefc_remove_old_tree(
> - struct xrep_rtrefc *rr)
> -{
> - int error;
> -
> - /*
> - * Free all the extents that were allocated to the former rtrefcountbt
> - * and aren't cross-linked with something else.
> - */
> - error = xrep_reap_metadir_fsblocks(rr->sc,
> - &rr->old_rtrefcountbt_blocks);
> - if (error)
> - return error;
> -
> - /*
> - * Ensure the proper reservation for the rtrefcount inode so that we
> - * don't fail to expand the btree.
> - */
> - return xrep_reset_metafile_resv(rr->sc);
> -}
> -
> /* Rebuild the rt refcount btree. */
> int
> xrep_rtrefcountbt(
> @@ -769,8 +743,12 @@ xrep_rtrefcountbt(
> if (error)
> goto out_bitmap;
>
> - /* Kill the old tree. */
> - error = xrep_rtrefc_remove_old_tree(rr);
> + /*
> + * Free all the extents that were allocated to the former rtrefcountbt
> + * and aren't cross-linked with something else.
> + */
> + error = xrep_reap_metadir_fsblocks(rr->sc,
> + &rr->old_rtrefcountbt_blocks);
> if (error)
> goto out_bitmap;
>
> diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c
> index f2fdd7a9fc24..fc2592c53af5 100644
> --- a/fs/xfs/scrub/rtrmap_repair.c
> +++ b/fs/xfs/scrub/rtrmap_repair.c
> @@ -810,28 +810,6 @@ xrep_rtrmap_build_new_tree(
>
> /* Reaping the old btree. */
>
> -/* Reap the old rtrmapbt blocks. */
> -STATIC int
> -xrep_rtrmap_remove_old_tree(
> - struct xrep_rtrmap *rr)
> -{
> - int error;
> -
> - /*
> - * Free all the extents that were allocated to the former rtrmapbt and
> - * aren't cross-linked with something else.
> - */
> - error = xrep_reap_metadir_fsblocks(rr->sc, &rr->old_rtrmapbt_blocks);
> - if (error)
> - return error;
> -
> - /*
> - * Ensure the proper reservation for the rtrmap inode so that we don't
> - * fail to expand the new btree.
> - */
> - return xrep_reset_metafile_resv(rr->sc);
> -}
> -
> static inline bool
> xrep_rtrmapbt_want_live_update(
> struct xchk_iscan *iscan,
> @@ -995,8 +973,11 @@ xrep_rtrmapbt(
> if (error)
> goto out_records;
>
> - /* Kill the old tree. */
> - error = xrep_rtrmap_remove_old_tree(rr);
> + /*
> + * Free all the extents that were allocated to the former rtrmapbt and
> + * aren't cross-linked with something else.
> + */
> + error = xrep_reap_metadir_fsblocks(rr->sc, &rr->old_rtrmapbt_blocks);
> if (error)
> goto out_records;
>
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 73275d7a6ec0..22d50f8acd92 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -24,6 +24,7 @@
> #include "xfs_rtalloc.h"
> #include "xfs_rtrmap_btree.h"
> #include "xfs_rtrefcount_btree.h"
> +#include "xfs_metafile.h"
>
> /*
> * Write new AG headers to disk. Non-transactional, but need to be
> @@ -561,15 +562,13 @@ xfs_fs_reserve_ag_blocks(
> return error;
> }
>
> - if (xfs_has_realtime(mp)) {
> - err2 = xfs_rt_resv_init(mp);
> - if (err2 && err2 != -ENOSPC) {
> - xfs_warn(mp,
> - "Error %d reserving realtime metadata reserve pool.", err2);
> - xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> - }
> + err2 = xfs_metafile_resv_init(mp);
> + if (err2 && err2 != -ENOSPC) {
> + xfs_warn(mp,
> + "Error %d reserving realtime metadata reserve pool.", err2);
> + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>
> - if (err2 && !error)
> + if (!error)
> error = err2;
> }
>
> @@ -585,9 +584,7 @@ xfs_fs_unreserve_ag_blocks(
> {
> struct xfs_perag *pag = NULL;
>
> - if (xfs_has_realtime(mp))
> - xfs_rt_resv_free(mp);
> -
> + xfs_metafile_resv_free(mp);
> while ((pag = xfs_perag_next(mp, pag)))
> xfs_ag_resv_free(pag);
> }
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index c08093a65352..1648dc5a8068 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -25,19 +25,9 @@ struct xfs_dquot;
> typedef struct xfs_inode {
> /* Inode linking and identification information. */
> struct xfs_mount *i_mount; /* fs mount struct ptr */
> - union {
> - struct {
> - struct xfs_dquot *i_udquot; /* user dquot */
> - struct xfs_dquot *i_gdquot; /* group dquot */
> - struct xfs_dquot *i_pdquot; /* project dquot */
> - };
> -
> - /*
> - * Space that has been set aside to accomodate expansions of a
> - * metadata btree rooted in this file.
> - */
> - uint64_t i_meta_resv_asked;
> - };
> + struct xfs_dquot *i_udquot; /* user dquot */
> + struct xfs_dquot *i_gdquot; /* group dquot */
> + struct xfs_dquot *i_pdquot; /* project dquot */
>
> /* Inode location stuff */
> xfs_ino_t i_ino; /* inode number (agno/agino)*/
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index d4a57e2fdcc5..9bfb5e08715d 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -263,6 +263,11 @@ typedef struct xfs_mount {
> atomic_t m_agirotor; /* last ag dir inode alloced */
> atomic_t m_rtgrotor; /* last rtgroup rtpicked */
>
> + struct mutex m_metafile_resv_lock;
> + uint64_t m_metafile_resv_target;
> + uint64_t m_metafile_resv_used;
> + uint64_t m_metafile_resv_avail;
> +
> /* Memory shrinker to throttle and reprioritize inodegc */
> struct shrinker *m_inodegc_shrinker;
> /*
> @@ -737,5 +742,9 @@ int xfs_add_incompat_log_feature(struct xfs_mount *mp, uint32_t feature);
> bool xfs_clear_incompat_log_features(struct xfs_mount *mp);
> void xfs_mod_delalloc(struct xfs_inode *ip, int64_t data_delta,
> int64_t ind_delta);
> +static inline void xfs_mod_sb_delalloc(struct xfs_mount *mp, int64_t delta)
> +{
> + percpu_counter_add(&mp->m_delalloc_blks, delta);
> +}
>
> #endif /* __XFS_MOUNT_H__ */
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 3e778e077d09..fd65e5d7994a 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1207,15 +1207,9 @@ xfs_reflink_ag_has_free_space(
> if (!xfs_has_rmapbt(mp))
> return 0;
> if (XFS_IS_REALTIME_INODE(ip)) {
> - struct xfs_rtgroup *rtg;
> - xfs_rgnumber_t rgno;
> -
> - rgno = xfs_rtb_to_rgno(mp, fsb);
> - rtg = xfs_rtgroup_get(mp, rgno);
> - if (xfs_metafile_resv_critical(rtg_rmap(rtg)))
> - error = -ENOSPC;
> - xfs_rtgroup_put(rtg);
> - return error;
> + if (xfs_metafile_resv_critical(mp))
> + return -ENOSPC;
> + return 0;
> }
>
> agno = XFS_FSB_TO_AGNO(mp, fsb);
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index 8da2498417f5..f5a0dbc46a14 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -1398,8 +1398,7 @@ xfs_growfs_rt(
> error = error2;
>
> /* Reset the rt metadata btree space reservations. */
> - xfs_rt_resv_free(mp);
> - error2 = xfs_rt_resv_init(mp);
> + error2 = xfs_metafile_resv_init(mp);
> if (error2 && error2 != -ENOSPC)
> error = error2;
> }
> @@ -1523,46 +1522,6 @@ xfs_rtalloc_reinit_frextents(
> return 0;
> }
>
> -/* Free space reservations for rt metadata inodes. */
> -void
> -xfs_rt_resv_free(
> - struct xfs_mount *mp)
> -{
> - struct xfs_rtgroup *rtg = NULL;
> - unsigned int i;
> -
> - while ((rtg = xfs_rtgroup_next(mp, rtg))) {
> - for (i = 0; i < XFS_RTGI_MAX; i++)
> - xfs_metafile_resv_free(rtg->rtg_inodes[i]);
> - }
> -}
> -
> -/* Reserve space for rt metadata inodes' space expansion. */
> -int
> -xfs_rt_resv_init(
> - struct xfs_mount *mp)
> -{
> - struct xfs_rtgroup *rtg = NULL;
> - xfs_filblks_t ask;
> - int error = 0;
> -
> - while ((rtg = xfs_rtgroup_next(mp, rtg))) {
> - int err2;
> -
> - ask = xfs_rtrmapbt_calc_reserves(mp);
> - err2 = xfs_metafile_resv_init(rtg_rmap(rtg), ask);
> - if (err2 && !error)
> - error = err2;
> -
> - ask = xfs_rtrefcountbt_calc_reserves(mp);
> - err2 = xfs_metafile_resv_init(rtg_refcount(rtg), ask);
> - if (err2 && !error)
> - error = err2;
> - }
> -
> - return error;
> -}
> -
> /*
> * Read in the bmbt of an rt metadata inode so that we never have to load them
> * at runtime. This enables the use of shared ILOCKs for rtbitmap scans. Use
> diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
> index 0d95b29092c9..78a690b489ed 100644
> --- a/fs/xfs/xfs_rtalloc.h
> +++ b/fs/xfs/xfs_rtalloc.h
> @@ -34,9 +34,6 @@ int /* error */
> xfs_rtmount_inodes(
> struct xfs_mount *mp); /* file system mount structure */
>
> -void xfs_rt_resv_free(struct xfs_mount *mp);
> -int xfs_rt_resv_init(struct xfs_mount *mp);
> -
> /*
> * Grow the realtime area of the filesystem.
> */
> @@ -65,8 +62,6 @@ xfs_rtmount_init(
> }
> # define xfs_rtmount_inodes(m) (((mp)->m_sb.sb_rblocks == 0)? 0 : (-ENOSYS))
> # define xfs_rtunmount_inodes(m)
> -# define xfs_rt_resv_free(mp) ((void)0)
> -# define xfs_rt_resv_init(mp) (0)
>
> static inline int
> xfs_growfs_check_rtgeom(const struct xfs_mount *mp,
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 5c9a2a0826ff..4414c8542144 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -2089,6 +2089,7 @@ xfs_init_fs_context(
> for (i = 0; i < XG_TYPE_MAX; i++)
> xa_init(&mp->m_groups[i].xa);
> mutex_init(&mp->m_growlock);
> + mutex_init(&mp->m_metafile_resv_lock);
> INIT_WORK(&mp->m_flush_inodes_work, xfs_flush_inodes_worker);
> INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
> mp->m_kobj.kobject.kset = xfs_kset;
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 740e0a8c3eca..a02129c202b2 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -5605,11 +5605,10 @@ DEFINE_METADIR_EVENT(xfs_metadir_lookup);
> /* metadata inode space reservations */
>
> DECLARE_EVENT_CLASS(xfs_metafile_resv_class,
> - TP_PROTO(struct xfs_inode *ip, xfs_filblks_t len),
> - TP_ARGS(ip, len),
> + TP_PROTO(struct xfs_mount *mp, xfs_filblks_t len),
> + TP_ARGS(mp, len),
> TP_STRUCT__entry(
> __field(dev_t, dev)
> - __field(xfs_ino_t, ino)
> __field(unsigned long long, freeblks)
> __field(unsigned long long, reserved)
> __field(unsigned long long, asked)
> @@ -5617,19 +5616,15 @@ DECLARE_EVENT_CLASS(xfs_metafile_resv_class,
> __field(unsigned long long, len)
> ),
> TP_fast_assign(
> - struct xfs_mount *mp = ip->i_mount;
> -
> __entry->dev = mp->m_super->s_dev;
> - __entry->ino = ip->i_ino;
> __entry->freeblks = xfs_sum_freecounter_raw(mp, XC_FREE_BLOCKS);
> - __entry->reserved = ip->i_delayed_blks;
> - __entry->asked = ip->i_meta_resv_asked;
> - __entry->used = ip->i_nblocks;
> + __entry->reserved = mp->m_metafile_resv_avail;
> + __entry->asked = mp->m_metafile_resv_target;
> + __entry->used = mp->m_metafile_resv_used;
> __entry->len = len;
> ),
> - TP_printk("dev %d:%d ino 0x%llx freeblks %llu resv %llu ask %llu used %llu len %llu",
> + TP_printk("dev %d:%d freeblks %llu resv %llu ask %llu used %llu len %llu",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> - __entry->ino,
> __entry->freeblks,
> __entry->reserved,
> __entry->asked,
> @@ -5638,14 +5633,14 @@ DECLARE_EVENT_CLASS(xfs_metafile_resv_class,
> )
> #define DEFINE_METAFILE_RESV_EVENT(name) \
> DEFINE_EVENT(xfs_metafile_resv_class, name, \
> - TP_PROTO(struct xfs_inode *ip, xfs_filblks_t len), \
> - TP_ARGS(ip, len))
> + TP_PROTO(struct xfs_mount *mp, xfs_filblks_t len), \
> + TP_ARGS(mp, len))
> DEFINE_METAFILE_RESV_EVENT(xfs_metafile_resv_init);
> DEFINE_METAFILE_RESV_EVENT(xfs_metafile_resv_free);
> DEFINE_METAFILE_RESV_EVENT(xfs_metafile_resv_alloc_space);
> DEFINE_METAFILE_RESV_EVENT(xfs_metafile_resv_free_space);
> DEFINE_METAFILE_RESV_EVENT(xfs_metafile_resv_critical);
> -DEFINE_INODE_ERROR_EVENT(xfs_metafile_resv_init_error);
> +DEFINE_METAFILE_RESV_EVENT(xfs_metafile_resv_init_error);
>
> #ifdef CONFIG_XFS_RT
> TRACE_EVENT(xfs_growfs_check_rtgeom,
> --
> 2.45.2
>
>
next prev parent reply other threads:[~2025-02-06 20:50 UTC|newest]
Thread overview: 106+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-06 6:44 support for zoned devices RFCv2 Christoph Hellwig
2025-02-06 6:44 ` [PATCH 01/43] xfs: factor out a xfs_rt_check_size helper Christoph Hellwig
2025-02-06 6:44 ` [PATCH 02/43] xfs: add a rtg_blocks helper Christoph Hellwig
2025-02-06 6:44 ` [PATCH 03/43] xfs: move xfs_bmapi_reserve_delalloc to xfs_iomap.c Christoph Hellwig
2025-02-06 6:44 ` [PATCH 04/43] xfs: skip always_cow inodes in xfs_reflink_trim_around_shared Christoph Hellwig
2025-02-06 20:13 ` Darrick J. Wong
2025-02-07 4:15 ` Christoph Hellwig
2025-02-07 4:22 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 05/43] xfs: refine the unaligned check for always COW inodes in xfs_file_dio_write Christoph Hellwig
2025-02-06 6:44 ` [PATCH 06/43] xfs: generalize the freespace and reserved blocks handling Christoph Hellwig
2025-02-06 21:29 ` Darrick J. Wong
2025-02-07 4:21 ` Christoph Hellwig
2025-02-07 4:28 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 07/43] xfs: preserve RT reservations across remounts Christoph Hellwig
2025-02-06 20:14 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 08/43] xfs: reflow xfs_dec_freecounter Christoph Hellwig
2025-02-06 20:14 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 09/43] xfs: trace in-memory freecounters Christoph Hellwig
2025-02-06 20:15 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 10/43] xfs: make metabtree reservations global Christoph Hellwig
2025-02-06 20:50 ` Darrick J. Wong [this message]
2025-02-07 4:18 ` Christoph Hellwig
2025-02-07 4:24 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 11/43] xfs: reduce metafile reservations Christoph Hellwig
2025-02-06 20:52 ` Darrick J. Wong
2025-02-07 4:19 ` Christoph Hellwig
2025-02-07 4:26 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 12/43] xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay Christoph Hellwig
2025-02-06 20:54 ` Darrick J. Wong
2025-02-07 4:19 ` Christoph Hellwig
2025-02-06 6:44 ` [PATCH 13/43] xfs: add a xfs_rtrmap_highest_rgbno helper Christoph Hellwig
2025-02-06 20:55 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 14/43] xfs: define the zoned on-disk format Christoph Hellwig
2025-02-06 21:00 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 15/43] xfs: allow internal RT devices for zoned mode Christoph Hellwig
2025-02-06 6:44 ` [PATCH 16/43] xfs: export zoned geometry via XFS_FSOP_GEOM Christoph Hellwig
2025-02-06 21:03 ` Darrick J. Wong
2025-02-07 4:20 ` Christoph Hellwig
2025-02-06 6:44 ` [PATCH 17/43] xfs: disable sb_frextents for zoned file systems Christoph Hellwig
2025-02-06 6:44 ` [PATCH 18/43] xfs: disable FITRIM for zoned RT devices Christoph Hellwig
2025-02-06 6:44 ` [PATCH 19/43] xfs: don't call xfs_can_free_eofblocks from ->release for zoned inodes Christoph Hellwig
2025-02-06 6:44 ` [PATCH 20/43] xfs: skip zoned RT inodes in xfs_inodegc_want_queue_rt_file Christoph Hellwig
2025-02-06 21:04 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 21/43] xfs: parse and validate hardware zone information Christoph Hellwig
2025-02-06 21:05 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 22/43] xfs: add the zoned space allocator Christoph Hellwig
2025-02-07 17:39 ` Darrick J. Wong
2025-02-13 5:14 ` Christoph Hellwig
2025-02-13 8:35 ` Christoph Hellwig
2025-02-17 9:20 ` Christoph Hellwig
2025-02-13 22:08 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 23/43] xfs: add support for zoned space reservations Christoph Hellwig
2025-02-07 17:52 ` Darrick J. Wong
2025-02-13 5:17 ` Christoph Hellwig
2025-02-13 22:09 ` Darrick J. Wong
2025-02-14 6:20 ` Christoph Hellwig
2025-02-14 18:25 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 24/43] xfs: implement zoned garbage collection Christoph Hellwig
2025-02-07 18:33 ` Darrick J. Wong
2025-02-13 5:22 ` Christoph Hellwig
2025-02-13 22:10 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 25/43] xfs: implement buffered writes to zoned RT devices Christoph Hellwig
2025-02-12 0:54 ` Darrick J. Wong
2025-02-13 5:39 ` Christoph Hellwig
2025-02-13 23:05 ` Darrick J. Wong
2025-02-14 6:22 ` Christoph Hellwig
2025-02-14 18:29 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 26/43] xfs: implement direct " Christoph Hellwig
2025-02-06 6:44 ` [PATCH 27/43] xfs: wire up zoned block freeing in xfs_rtextent_free_finish_item Christoph Hellwig
2025-02-06 6:44 ` [PATCH 28/43] xfs: hide reserved RT blocks from statfs Christoph Hellwig
2025-02-07 4:32 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 29/43] xfs: support growfs on zoned file systems Christoph Hellwig
2025-02-06 6:44 ` [PATCH 30/43] xfs: allow COW forks on zoned file systems in xchk_bmap Christoph Hellwig
2025-02-06 6:44 ` [PATCH 31/43] xfs: support xchk_xref_is_used_rt_space on zoned file systems Christoph Hellwig
2025-02-06 6:44 ` [PATCH 32/43] xfs: support xrep_require_rtext_inuse " Christoph Hellwig
2025-02-06 6:44 ` [PATCH 33/43] xfs: enable fsmap reporting for internal RT devices Christoph Hellwig
2025-02-07 4:39 ` Darrick J. Wong
2025-02-07 4:55 ` Christoph Hellwig
2025-02-06 6:44 ` [PATCH 34/43] xfs: disable reflink for zoned file systems Christoph Hellwig
2025-02-07 4:31 ` Darrick J. Wong
2025-02-07 4:54 ` Christoph Hellwig
2025-02-07 5:05 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 35/43] xfs: disable rt quotas " Christoph Hellwig
2025-02-07 4:31 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 36/43] xfs: enable the zoned RT device feature Christoph Hellwig
2025-02-06 6:44 ` [PATCH 37/43] xfs: support zone gaps Christoph Hellwig
2025-02-06 6:44 ` [PATCH 38/43] xfs: add a max_open_zones mount option Christoph Hellwig
2025-02-07 4:29 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 39/43] xfs: support write life time based data placement Christoph Hellwig
2025-02-12 0:27 ` Darrick J. Wong
2025-02-12 13:29 ` Hans Holmberg
2025-02-13 4:42 ` Darrick J. Wong
2025-02-13 13:03 ` Hans Holmberg
2025-02-14 6:41 ` hch
2025-02-14 12:20 ` Hans Holmberg
2025-02-14 16:21 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 40/43] xfs: wire up the show_stats super operation Christoph Hellwig
2025-02-06 6:44 ` [PATCH 41/43] xfs: export zone stats in /proc/*/mountstats Christoph Hellwig
2025-02-07 1:02 ` Darrick J. Wong
2025-02-07 4:25 ` Christoph Hellwig
2025-02-06 6:44 ` [PATCH 42/43] xfs: contain more sysfs code in xfs_sysfs.c Christoph Hellwig
2025-02-07 0:54 ` Darrick J. Wong
2025-02-06 6:44 ` [PATCH 43/43] xfs: export max_open_zones in sysfs Christoph Hellwig
2025-02-07 0:52 ` Darrick J. Wong
2025-02-07 4:23 ` Christoph Hellwig
2025-02-07 4:27 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250206205021.GL21808@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=cem@kernel.org \
--cc=hans.holmberg@wdc.com \
--cc=hch@lst.de \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.