* xfsprogs support for zoned devices v2
@ 2025-04-14 5:35 Christoph Hellwig
2025-04-14 5:35 ` [PATCH 01/43] xfs: generalize the freespace and reserved blocks handling Christoph Hellwig
` (43 more replies)
0 siblings, 44 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Hi all,
this series adds support for zoned devices to xfsprogs, matching the
kernel support that landed in 6.15-rc1.
For the libxfs syncs I kept separate fixup patches for the xfsprogs
local changes not directly ported from the kernel to make rebasing
easier. I can squash them for the final submission if needed.
Changes since v1:
- improve an error message in mkfs
- improve the comments for the new i_used_blocks field
- improve commit messages
- add a rtdev_name helper based on a mail from Darrick
- consolidate geometry reporting in a single patch
- consistently use xfs_rtblock_t
- fix indentation in check_zones and report_zones
- use BTOBB in check_zones and report_zones
- spelling fix in the mkfs man page
- s/log/RT/ in ioctl_xfs_fsgeometry.2
- fix a hunk that slipped into the wrong patch
- split creating validate_rtgroup_geometry into a separate patch
- add a comment why cfg->rtstart can't be overridden for zoned
devices
- drop "xfs_io: don't re-query geometry information in fsmap_f"
because it was causing problems on non-xfs file systems
- use sb_rextents to check for a present RT section in
process_dinode_int
- check for rtstart instead of XFS_FSOP_GEOM_FLAGS_ZONED in scrub
phase1_func
- exit on fatal error in report_zones and report errno values where
applicable
- report BLKRESETZONE errno
- allow overriding -d rtinherit
- merge the two scrub patches
^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH 01/43] xfs: generalize the freespace and reserved blocks handling
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 02/43] FIXUP: " Christoph Hellwig
` (42 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 712bae96631852c1a1822ee4f57a08ccd843358b
xfs_{add,dec}_freecounter already handles the block and RT extent
percpu counters, but it currently hardcodes the passed in counter.
Add a freecounter abstraction that uses an enum to designate the counter
and add wrappers that hide the actual percpu_counters. This will allow
expanding the reserved block handling to the RT extent counter in the
next step, and also prepares for adding yet another such counter that
can share the code. Both these additions will be needed for the zoned
allocator.
Also switch the flooring of the frextents counter to 0 in statfs for the
rtinherit case to a manual min_t call to match the handling of the
fdblocks counter for normal file systems.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_ialloc.c | 2 +-
libxfs/xfs_metafile.c | 2 +-
libxfs/xfs_sb.c | 8 ++++----
libxfs/xfs_types.h | 17 +++++++++++++++++
4 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 63ce76755eb7..b401299ad933 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -1922,7 +1922,7 @@ xfs_dialloc(
* that we can immediately allocate, but then we allow allocation on the
* second pass if we fail to find an AG with free inodes in it.
*/
- if (percpu_counter_read_positive(&mp->m_fdblocks) <
+ if (xfs_estimate_freecounter(mp, XC_FREE_BLOCKS) <
mp->m_low_space[XFS_LOWSP_1_PCNT]) {
ok_alloc = false;
low_space = true;
diff --git a/libxfs/xfs_metafile.c b/libxfs/xfs_metafile.c
index 4488e38a8734..7673265510fd 100644
--- a/libxfs/xfs_metafile.c
+++ b/libxfs/xfs_metafile.c
@@ -93,7 +93,7 @@ xfs_metafile_resv_can_cover(
* There aren't enough blocks left in the inode's reservation, but it
* isn't critical unless there also isn't enough free space.
*/
- return __percpu_counter_compare(&ip->i_mount->m_fdblocks,
+ return xfs_compare_freecounter(ip->i_mount, XC_FREE_BLOCKS,
rhs - ip->i_delayed_blks, 2048) >= 0;
}
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 50a43c0328cb..1781ca36b2cc 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1262,8 +1262,7 @@ xfs_log_sb(
mp->m_sb.sb_ifree = min_t(uint64_t,
percpu_counter_sum_positive(&mp->m_ifree),
mp->m_sb.sb_icount);
- mp->m_sb.sb_fdblocks =
- percpu_counter_sum_positive(&mp->m_fdblocks);
+ mp->m_sb.sb_fdblocks = xfs_sum_freecounter(mp, XC_FREE_BLOCKS);
}
/*
@@ -1272,9 +1271,10 @@ xfs_log_sb(
* we handle nearly-lockless reservations, so we must use the _positive
* variant here to avoid writing out nonsense frextents.
*/
- if (xfs_has_rtgroups(mp))
+ if (xfs_has_rtgroups(mp)) {
mp->m_sb.sb_frextents =
- percpu_counter_sum_positive(&mp->m_frextents);
+ xfs_sum_freecounter(mp, XC_FREE_RTEXTENTS);
+ }
xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index ca2401c1facd..76f3c31573ec 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -233,6 +233,23 @@ enum xfs_group_type {
{ XG_TYPE_AG, "ag" }, \
{ XG_TYPE_RTG, "rtg" }
+enum xfs_free_counter {
+ /*
+ * Number of free blocks on the data device.
+ */
+ XC_FREE_BLOCKS,
+
+ /*
+ * Number of free RT extents on the RT device.
+ */
+ XC_FREE_RTEXTENTS,
+ XC_FREE_NR,
+};
+
+#define XFS_FREECOUNTER_STR \
+ { XC_FREE_BLOCKS, "blocks" }, \
+ { XC_FREE_RTEXTENTS, "rtextents" }
+
/*
* Type verifier functions
*/
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
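The freecounter abstraction described in the commit message of patch 01 can be illustrated with a minimal userspace sketch. It assumes plain 64-bit integers in place of the kernel's percpu counters, and every `demo_` name is hypothetical rather than taken from the patch; only the idea of selecting the counter with an enum and hiding the storage behind wrappers follows the description.

```c
#include <stdint.h>

/* Hypothetical stand-ins: every demo_ name is illustrative only. */
enum demo_free_counter {
	DEMO_FREE_BLOCKS,	/* free blocks on the data device */
	DEMO_FREE_RTEXTENTS,	/* free RT extents on the RT device */
	DEMO_FREE_NR,		/* number of counters */
};

struct demo_mount {
	int64_t	free[DEMO_FREE_NR];	/* one slot per counter */
};

/* Read a counter, flooring a transiently negative value at zero. */
static inline uint64_t
demo_sum_freecounter(const struct demo_mount *mp, enum demo_free_counter ctr)
{
	int64_t	v = mp->free[ctr];

	return v > 0 ? (uint64_t)v : 0;
}

/* Give delta units back to the selected counter. */
static inline void
demo_add_freecounter(struct demo_mount *mp, enum demo_free_counter ctr,
		int64_t delta)
{
	mp->free[ctr] += delta;
}

/* Take delta units; returns 0 on success, -1 if the counter is too low. */
static inline int
demo_dec_freecounter(struct demo_mount *mp, enum demo_free_counter ctr,
		int64_t delta)
{
	if (mp->free[ctr] < delta)
		return -1;
	mp->free[ctr] -= delta;
	return 0;
}
```

Because callers only name the counter, adding a further counter in this sketch would mean adding an enum value before `DEMO_FREE_NR`, without touching the wrappers.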
* [PATCH 02/43] FIXUP: xfs: generalize the freespace and reserved blocks handling
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
2025-04-14 5:35 ` [PATCH 01/43] xfs: generalize the freespace and reserved blocks handling Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 03/43] xfs: make metabtree reservations global Christoph Hellwig
` (41 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
include/xfs_mount.h | 32 ++++++++++++++++++++++++++++++--
libxfs/libxfs_priv.h | 13 +------------
2 files changed, 31 insertions(+), 14 deletions(-)
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 383cba7d6e3f..e0f72fc32b25 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -63,8 +63,6 @@ typedef struct xfs_mount {
xfs_sb_t m_sb; /* copy of fs superblock */
#define m_icount m_sb.sb_icount
#define m_ifree m_sb.sb_ifree
-#define m_fdblocks m_sb.sb_fdblocks
-#define m_frextents m_sb.sb_frextents
spinlock_t m_sb_lock;
/*
@@ -332,6 +330,36 @@ static inline bool xfs_is_ ## name (struct xfs_mount *mp) \
__XFS_UNSUPP_OPSTATE(readonly)
__XFS_UNSUPP_OPSTATE(shutdown)
+static inline int64_t xfs_sum_freecounter(struct xfs_mount *mp,
+ enum xfs_free_counter ctr)
+{
+ if (ctr == XC_FREE_RTEXTENTS)
+ return mp->m_sb.sb_frextents;
+ return mp->m_sb.sb_fdblocks;
+}
+
+static inline int64_t xfs_estimate_freecounter(struct xfs_mount *mp,
+ enum xfs_free_counter ctr)
+{
+ return xfs_sum_freecounter(mp, ctr);
+}
+
+static inline int xfs_compare_freecounter(struct xfs_mount *mp,
+ enum xfs_free_counter ctr, int64_t rhs, int32_t batch)
+{
+ uint64_t count;
+
+ if (ctr == XC_FREE_RTEXTENTS)
+ count = mp->m_sb.sb_frextents;
+ else
+ count = mp->m_sb.sb_fdblocks;
+ if (count > rhs)
+ return 1;
+ else if (count < rhs)
+ return -1;
+ return 0;
+}
+
/* don't fail on device size or AG count checks */
#define LIBXFS_MOUNT_DEBUGGER (1U << 0)
/* report metadata corruption to stdout */
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 7e5c125b581a..cb4800de0b11 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -209,7 +209,7 @@ static inline bool WARN_ON(bool expr) {
}
#define WARN_ON_ONCE(e) WARN_ON(e)
-#define percpu_counter_read(x) (*x)
+
#define percpu_counter_read_positive(x) ((*x) > 0 ? (*x) : 0)
#define percpu_counter_sum_positive(x) ((*x) > 0 ? (*x) : 0)
@@ -219,17 +219,6 @@ uint32_t get_random_u32(void);
#define get_random_u32() (0)
#endif
-static inline int
-__percpu_counter_compare(uint64_t *count, int64_t rhs, int32_t batch)
-{
- if (*count > rhs)
- return 1;
- else if (*count < rhs)
- return -1;
- return 0;
-}
-
-
#define PAGE_SIZE getpagesize()
extern unsigned int PAGE_SHIFT;
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 03/43] xfs: make metabtree reservations global
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
2025-04-14 5:35 ` [PATCH 01/43] xfs: generalize the freespace and reserved blocks handling Christoph Hellwig
2025-04-14 5:35 ` [PATCH 02/43] FIXUP: " Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 04/43] FIXUP: " Christoph Hellwig
` (40 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 1df8d75030b787a9fae270b59b93eef809dd2011
Currently each metabtree inode has its own space reservation to ensure
it can be expanded to the maximum size, mirroring what is done for the
AG-based btrees. But unlike the AG-based btrees the metabtree inodes
aren't restricted to allocate from a single AG but can use free space
from the entire file system. And unlike AG-based btrees where the
required reservation shrinks with the available free space due to this,
the metabtree reservations for the rtrmap and rtrefcount trees are not
bound in any way by the data device free space as they track RT extent
allocations. This is not very efficient as it requires a large number
of blocks to be set aside that can't be used at all by other btrees.
Switch to a model that uses a global pool instead in preparation for
reducing the amount of reserved space, which now also removes the
overloading of the i_nblocks field for metabtree inodes, which would
create problems if metabtree inodes ever had a big enough xattr fork
to require xattr blocks outside the inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_metafile.c | 164 ++++++++++++++++++++++++++----------------
libxfs/xfs_metafile.h | 6 +-
2 files changed, 107 insertions(+), 63 deletions(-)
diff --git a/libxfs/xfs_metafile.c b/libxfs/xfs_metafile.c
index 7673265510fd..1216a0e5169e 100644
--- a/libxfs/xfs_metafile.c
+++ b/libxfs/xfs_metafile.c
@@ -19,6 +19,9 @@
#include "xfs_inode.h"
#include "xfs_errortag.h"
#include "xfs_alloc.h"
+#include "xfs_rtgroup.h"
+#include "xfs_rtrmap_btree.h"
+#include "xfs_rtrefcount_btree.h"
static const struct {
enum xfs_metafile_type mtype;
@@ -72,12 +75,11 @@ xfs_metafile_clear_iflag(
}
/*
- * Is the amount of space that could be allocated towards a given metadata
- * file at or beneath a certain threshold?
+ * Is the metafile reservations at or beneath a certain threshold?
*/
static inline bool
xfs_metafile_resv_can_cover(
- struct xfs_inode *ip,
+ struct xfs_mount *mp,
int64_t rhs)
{
/*
@@ -86,43 +88,38 @@ xfs_metafile_resv_can_cover(
* global free block count. Take care of the first case to avoid
* touching the per-cpu counter.
*/
- if (ip->i_delayed_blks >= rhs)
+ if (mp->m_metafile_resv_avail >= rhs)
return true;
/*
* There aren't enough blocks left in the inode's reservation, but it
* isn't critical unless there also isn't enough free space.
*/
- return xfs_compare_freecounter(ip->i_mount, XC_FREE_BLOCKS,
- rhs - ip->i_delayed_blks, 2048) >= 0;
+ return xfs_compare_freecounter(mp, XC_FREE_BLOCKS,
+ rhs - mp->m_metafile_resv_avail, 2048) >= 0;
}
/*
- * Is this metadata file critically low on blocks? For now we'll define that
- * as the number of blocks we can get our hands on being less than 10% of what
- * we reserved or less than some arbitrary number (maximum btree height).
+ * Is the metafile reservation critically low on blocks? For now we'll define
+ * that as the number of blocks we can get our hands on being less than 10% of
+ * what we reserved or less than some arbitrary number (maximum btree height).
*/
bool
xfs_metafile_resv_critical(
- struct xfs_inode *ip)
+ struct xfs_mount *mp)
{
- uint64_t asked_low_water;
+ ASSERT(xfs_has_metadir(mp));
- if (!ip)
- return false;
-
- ASSERT(xfs_is_metadir_inode(ip));
- trace_xfs_metafile_resv_critical(ip, 0);
+ trace_xfs_metafile_resv_critical(mp, 0);
- if (!xfs_metafile_resv_can_cover(ip, ip->i_mount->m_rtbtree_maxlevels))
+ if (!xfs_metafile_resv_can_cover(mp, mp->m_rtbtree_maxlevels))
return true;
- asked_low_water = div_u64(ip->i_meta_resv_asked, 10);
- if (!xfs_metafile_resv_can_cover(ip, asked_low_water))
+ if (!xfs_metafile_resv_can_cover(mp,
+ div_u64(mp->m_metafile_resv_target, 10)))
return true;
- return XFS_TEST_ERROR(false, ip->i_mount,
- XFS_ERRTAG_METAFILE_RESV_CRITICAL);
+ return XFS_TEST_ERROR(false, mp, XFS_ERRTAG_METAFILE_RESV_CRITICAL);
}
/* Allocate a block from the metadata file's reservation. */
@@ -131,22 +128,24 @@ xfs_metafile_resv_alloc_space(
struct xfs_inode *ip,
struct xfs_alloc_arg *args)
{
+ struct xfs_mount *mp = ip->i_mount;
int64_t len = args->len;
ASSERT(xfs_is_metadir_inode(ip));
ASSERT(args->resv == XFS_AG_RESV_METAFILE);
- trace_xfs_metafile_resv_alloc_space(ip, args->len);
+ trace_xfs_metafile_resv_alloc_space(mp, args->len);
/*
* Allocate the blocks from the metadata inode's block reservation
* and update the ondisk sb counter.
*/
- if (ip->i_delayed_blks > 0) {
+ mutex_lock(&mp->m_metafile_resv_lock);
+ if (mp->m_metafile_resv_avail > 0) {
int64_t from_resv;
- from_resv = min_t(int64_t, len, ip->i_delayed_blks);
- ip->i_delayed_blks -= from_resv;
+ from_resv = min_t(int64_t, len, mp->m_metafile_resv_avail);
+ mp->m_metafile_resv_avail -= from_resv;
xfs_mod_delalloc(ip, 0, -from_resv);
xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_RES_FDBLOCKS,
-from_resv);
@@ -173,6 +172,9 @@ xfs_metafile_resv_alloc_space(
xfs_trans_mod_sb(args->tp, field, -len);
}
+ mp->m_metafile_resv_used += args->len;
+ mutex_unlock(&mp->m_metafile_resv_lock);
+
ip->i_nblocks += args->len;
xfs_trans_log_inode(args->tp, ip, XFS_ILOG_CORE);
}
@@ -184,26 +186,33 @@ xfs_metafile_resv_free_space(
struct xfs_trans *tp,
xfs_filblks_t len)
{
+ struct xfs_mount *mp = ip->i_mount;
int64_t to_resv;
ASSERT(xfs_is_metadir_inode(ip));
- trace_xfs_metafile_resv_free_space(ip, len);
+
+ trace_xfs_metafile_resv_free_space(mp, len);
ip->i_nblocks -= len;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ mutex_lock(&mp->m_metafile_resv_lock);
+ mp->m_metafile_resv_used -= len;
+
/*
* Add the freed blocks back into the inode's delalloc reservation
* until it reaches the maximum size. Update the ondisk fdblocks only.
*/
- to_resv = ip->i_meta_resv_asked - (ip->i_nblocks + ip->i_delayed_blks);
+ to_resv = mp->m_metafile_resv_target -
+ (mp->m_metafile_resv_used + mp->m_metafile_resv_avail);
if (to_resv > 0) {
to_resv = min_t(int64_t, to_resv, len);
- ip->i_delayed_blks += to_resv;
+ mp->m_metafile_resv_avail += to_resv;
xfs_mod_delalloc(ip, 0, to_resv);
xfs_trans_mod_sb(tp, XFS_TRANS_SB_RES_FDBLOCKS, to_resv);
len -= to_resv;
}
+ mutex_unlock(&mp->m_metafile_resv_lock);
/*
* Everything else goes back to the filesystem, so update the in-core
@@ -213,61 +222,96 @@ xfs_metafile_resv_free_space(
xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, len);
}
-/* Release a metadata file's space reservation. */
+static void
+__xfs_metafile_resv_free(
+ struct xfs_mount *mp)
+{
+ if (mp->m_metafile_resv_avail) {
+ xfs_mod_sb_delalloc(mp, -(int64_t)mp->m_metafile_resv_avail);
+ xfs_add_fdblocks(mp, mp->m_metafile_resv_avail);
+ }
+ mp->m_metafile_resv_avail = 0;
+ mp->m_metafile_resv_used = 0;
+ mp->m_metafile_resv_target = 0;
+}
+
+/* Release unused metafile space reservation. */
void
xfs_metafile_resv_free(
- struct xfs_inode *ip)
+ struct xfs_mount *mp)
{
- /* Non-btree metadata inodes don't need space reservations. */
- if (!ip || !ip->i_meta_resv_asked)
+ if (!xfs_has_metadir(mp))
return;
- ASSERT(xfs_is_metadir_inode(ip));
- trace_xfs_metafile_resv_free(ip, 0);
+ trace_xfs_metafile_resv_free(mp, 0);
- if (ip->i_delayed_blks) {
- xfs_mod_delalloc(ip, 0, -ip->i_delayed_blks);
- xfs_add_fdblocks(ip->i_mount, ip->i_delayed_blks);
- ip->i_delayed_blks = 0;
- }
- ip->i_meta_resv_asked = 0;
+ mutex_lock(&mp->m_metafile_resv_lock);
+ __xfs_metafile_resv_free(mp);
+ mutex_unlock(&mp->m_metafile_resv_lock);
}
-/* Set up a metadata file's space reservation. */
+/* Set up a metafile space reservation. */
int
xfs_metafile_resv_init(
- struct xfs_inode *ip,
- xfs_filblks_t ask)
+ struct xfs_mount *mp)
{
+ struct xfs_rtgroup *rtg = NULL;
+ xfs_filblks_t used = 0, target = 0;
xfs_filblks_t hidden_space;
- xfs_filblks_t used;
- int error;
+ int error = 0;
- if (!ip || ip->i_meta_resv_asked > 0)
+ if (!xfs_has_metadir(mp))
return 0;
- ASSERT(xfs_is_metadir_inode(ip));
+ /*
+ * Free any previous reservation to have a clean slate.
+ */
+ mutex_lock(&mp->m_metafile_resv_lock);
+ __xfs_metafile_resv_free(mp);
+
+ /*
+ * Currently the only btree metafiles that require reservations are the
+ * rtrmap and the rtrefcount. Anything new will have to be added here
+ * as well.
+ */
+ while ((rtg = xfs_rtgroup_next(mp, rtg))) {
+ if (xfs_has_rtrmapbt(mp)) {
+ used += rtg_rmap(rtg)->i_nblocks;
+ target += xfs_rtrmapbt_calc_reserves(mp);
+ }
+ if (xfs_has_rtreflink(mp)) {
+ used += rtg_refcount(rtg)->i_nblocks;
+ target += xfs_rtrefcountbt_calc_reserves(mp);
+ }
+ }
+
+ if (!target)
+ goto out_unlock;
/*
- * Space taken by all other metadata btrees are accounted on-disk as
+ * Space taken by the per-AG metadata btrees are accounted on-disk as
* used space. We therefore only hide the space that is reserved but
* not used by the trees.
*/
- used = ip->i_nblocks;
- if (used > ask)
- ask = used;
- hidden_space = ask - used;
+ if (used > target)
+ target = used;
+ hidden_space = target - used;
- error = xfs_dec_fdblocks(ip->i_mount, hidden_space, true);
+ error = xfs_dec_fdblocks(mp, hidden_space, true);
if (error) {
- trace_xfs_metafile_resv_init_error(ip, error, _RET_IP_);
- return error;
+ trace_xfs_metafile_resv_init_error(mp, 0);
+ goto out_unlock;
}
- xfs_mod_delalloc(ip, 0, hidden_space);
- ip->i_delayed_blks = hidden_space;
- ip->i_meta_resv_asked = ask;
+ xfs_mod_sb_delalloc(mp, hidden_space);
+
+ mp->m_metafile_resv_target = target;
+ mp->m_metafile_resv_used = used;
+ mp->m_metafile_resv_avail = hidden_space;
+
+ trace_xfs_metafile_resv_init(mp, target);
- trace_xfs_metafile_resv_init(ip, ask);
- return 0;
+out_unlock:
+ mutex_unlock(&mp->m_metafile_resv_lock);
+ return error;
}
diff --git a/libxfs/xfs_metafile.h b/libxfs/xfs_metafile.h
index 95af4b52e5a7..ae6f9e779b98 100644
--- a/libxfs/xfs_metafile.h
+++ b/libxfs/xfs_metafile.h
@@ -26,13 +26,13 @@ void xfs_metafile_clear_iflag(struct xfs_trans *tp, struct xfs_inode *ip);
/* Space reservations for metadata inodes. */
struct xfs_alloc_arg;
-bool xfs_metafile_resv_critical(struct xfs_inode *ip);
+bool xfs_metafile_resv_critical(struct xfs_mount *mp);
void xfs_metafile_resv_alloc_space(struct xfs_inode *ip,
struct xfs_alloc_arg *args);
void xfs_metafile_resv_free_space(struct xfs_inode *ip, struct xfs_trans *tp,
xfs_filblks_t len);
-void xfs_metafile_resv_free(struct xfs_inode *ip);
-int xfs_metafile_resv_init(struct xfs_inode *ip, xfs_filblks_t ask);
+void xfs_metafile_resv_free(struct xfs_mount *mp);
+int xfs_metafile_resv_init(struct xfs_mount *mp);
/* Code specific to kernel/userspace; must be provided externally. */
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
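The global pool model that patch 03 switches to can be sketched as three numbers shared by all metabtree inodes. This is a simplified illustration under stated assumptions: the `demo_` names are hypothetical, the locking, delalloc accounting and transaction updates from the patch are left out, and the caller is assumed never to free more blocks than are in use.

```c
#include <stdint.h>

/* Hypothetical pool mirroring the target/used/avail fields in the patch. */
struct demo_resv_pool {
	uint64_t	target;	/* blocks we want set aside in total */
	uint64_t	used;	/* blocks currently owned by the btrees */
	uint64_t	avail;	/* blocks reserved but not yet used */
};

/*
 * Take len blocks for a btree.  Returns how many of them had to come
 * from general free space because the pool could not cover them.
 */
static uint64_t
demo_resv_alloc(struct demo_resv_pool *p, uint64_t len)
{
	uint64_t	from_resv = len < p->avail ? len : p->avail;

	p->avail -= from_resv;
	p->used += len;
	return len - from_resv;
}

/*
 * Give len blocks back.  The pool is refilled until used + avail reaches
 * target; the return value is what goes back to general free space.
 */
static uint64_t
demo_resv_free(struct demo_resv_pool *p, uint64_t len)
{
	uint64_t	held, to_resv = 0;

	p->used -= len;
	held = p->used + p->avail;
	if (p->target > held) {
		to_resv = p->target - held;
		if (to_resv > len)
			to_resv = len;
		p->avail += to_resv;
	}
	return len - to_resv;
}
```

With one pool per mount, a btree that grows beyond its own share can draw on blocks that another btree is not using, which is the inefficiency of per-inode reservations that the commit message points out.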
* [PATCH 04/43] FIXUP: xfs: make metabtree reservations global
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (2 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 03/43] xfs: make metabtree reservations global Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 05/43] xfs: reduce metafile reservations Christoph Hellwig
` (39 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
include/spinlock.h | 5 +++++
include/xfs_mount.h | 4 ++++
libxfs/libxfs_priv.h | 1 +
mkfs/xfs_mkfs.c | 25 ++++---------------------
4 files changed, 14 insertions(+), 21 deletions(-)
diff --git a/include/spinlock.h b/include/spinlock.h
index 82973726b101..73bd8c078fea 100644
--- a/include/spinlock.h
+++ b/include/spinlock.h
@@ -22,4 +22,9 @@ typedef pthread_mutex_t spinlock_t;
#define spin_trylock(l) (pthread_mutex_trylock(l) != EBUSY)
#define spin_unlock(l) pthread_mutex_unlock(l)
+#define mutex_init(l) pthread_mutex_init(l, NULL)
+#define mutex_lock(l) pthread_mutex_lock(l)
+#define mutex_trylock(l) (pthread_mutex_trylock(l) != EBUSY)
+#define mutex_unlock(l) pthread_mutex_unlock(l)
+
#endif /* __LIBXFS_SPINLOCK_H__ */
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index e0f72fc32b25..0acf952eb9d7 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -164,6 +164,10 @@ typedef struct xfs_mount {
atomic64_t m_allocbt_blks;
spinlock_t m_perag_lock; /* lock for m_perag_tree */
+ pthread_mutex_t m_metafile_resv_lock;
+ uint64_t m_metafile_resv_target;
+ uint64_t m_metafile_resv_used;
+ uint64_t m_metafile_resv_avail;
} xfs_mount_t;
#define M_IGEO(mp) (&(mp)->m_ino_geo)
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index cb4800de0b11..82952b0db629 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -151,6 +151,7 @@ enum ce { CE_DEBUG, CE_CONT, CE_NOTE, CE_WARN, CE_ALERT, CE_PANIC };
#define xfs_force_shutdown(d,n) ((void) 0)
#define xfs_mod_delalloc(a,b,c) ((void) 0)
+#define xfs_mod_sb_delalloc(sb, d) ((void) 0)
/* stop unused var warnings by assigning mp to itself */
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 3f4455d46383..ec82e05bf4e4 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -5102,8 +5102,6 @@ check_rt_meta_prealloc(
struct xfs_mount *mp)
{
struct xfs_perag *pag = NULL;
- struct xfs_rtgroup *rtg = NULL;
- xfs_filblks_t ask;
int error;
/*
@@ -5123,27 +5121,12 @@ check_rt_meta_prealloc(
}
}
- /* Realtime metadata btree inode */
- while ((rtg = xfs_rtgroup_next(mp, rtg))) {
- ask = libxfs_rtrmapbt_calc_reserves(mp);
- error = -libxfs_metafile_resv_init(rtg_rmap(rtg), ask);
- if (error)
- prealloc_fail(mp, error, ask, _("realtime rmap btree"));
-
- ask = libxfs_rtrefcountbt_calc_reserves(mp);
- error = -libxfs_metafile_resv_init(rtg_refcount(rtg), ask);
- if (error)
- prealloc_fail(mp, error, ask,
- _("realtime refcount btree"));
- }
+ error = -libxfs_metafile_resv_init(mp);
+ if (error)
+ prealloc_fail(mp, error, 0, _("metadata files"));
- /* Unreserve the realtime metadata reservations. */
- while ((rtg = xfs_rtgroup_next(mp, rtg))) {
- libxfs_metafile_resv_free(rtg_rmap(rtg));
- libxfs_metafile_resv_free(rtg_refcount(rtg));
- }
+ libxfs_metafile_resv_free(mp);
- /* Unreserve the per-AG reservations. */
while ((pag = xfs_perag_next(mp, pag)))
libxfs_ag_resv_free(pag);
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 05/43] xfs: reduce metafile reservations
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (3 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 04/43] FIXUP: " Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 06/43] xfs: add a rtg_blocks helper Christoph Hellwig
` (38 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 272e20bb24dc895375ccc18a82596a7259b5a652
There is no point in reserving more space than actually available
on the data device for the worst case scenario that is unlikely to
happen. Reserve at most 1/4th of the data device blocks, which is
still a heuristic.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_metafile.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/libxfs/xfs_metafile.c b/libxfs/xfs_metafile.c
index 1216a0e5169e..6ded87d09ab7 100644
--- a/libxfs/xfs_metafile.c
+++ b/libxfs/xfs_metafile.c
@@ -258,6 +258,7 @@ xfs_metafile_resv_init(
struct xfs_rtgroup *rtg = NULL;
xfs_filblks_t used = 0, target = 0;
xfs_filblks_t hidden_space;
+ xfs_rfsblock_t dblocks_avail = mp->m_sb.sb_dblocks / 4;
int error = 0;
if (!xfs_has_metadir(mp))
@@ -295,6 +296,8 @@ xfs_metafile_resv_init(
*/
if (used > target)
target = used;
+ else if (target > dblocks_avail)
+ target = dblocks_avail;
hidden_space = target - used;
error = xfs_dec_fdblocks(mp, hidden_space, true);
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
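The cap from patch 05 amounts to a small piece of arithmetic. The sketch below mirrors the two-branch check in the hunk under the assumption that all quantities are plain block counts; `demo_resv_target` is a hypothetical name and the function is not part of xfsprogs.

```c
#include <stdint.h>

/*
 * Pick the reservation target: never below what is already used,
 * otherwise at most a quarter of the data device blocks.
 */
static uint64_t
demo_resv_target(uint64_t used, uint64_t target, uint64_t dblocks)
{
	uint64_t	cap = dblocks / 4;

	if (used > target)
		target = used;
	else if (target > cap)
		target = cap;
	return target;
}
```

For example, with 8000 data device blocks the cap is 2000 blocks, so a worst-case target of 5000 blocks is reduced to 2000, while a target of 500 blocks is left alone.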
* [PATCH 06/43] xfs: add a rtg_blocks helper
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (4 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 05/43] xfs: reduce metafile reservations Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 07/43] xfs: move xfs_bmapi_reserve_delalloc to xfs_iomap.c Christoph Hellwig
` (37 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 012482b3308a49a84c2a7df08218dd4ad081e1da
Shortcut dereferencing the xg_block_count field in the generic group
structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_rtgroup.c | 2 +-
libxfs/xfs_rtgroup.h | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 24fb160b8067..6f65ecc3015d 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -267,7 +267,7 @@ xfs_rtgroup_get_geometry(
/* Fill out form. */
memset(rgeo, 0, sizeof(*rgeo));
rgeo->rg_number = rtg_rgno(rtg);
- rgeo->rg_length = rtg_group(rtg)->xg_block_count;
+ rgeo->rg_length = rtg_blocks(rtg);
xfs_rtgroup_geom_health(rtg, rgeo);
return 0;
}
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 03f39d4e43fc..9c7e03f913cb 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -66,6 +66,11 @@ static inline xfs_rgnumber_t rtg_rgno(const struct xfs_rtgroup *rtg)
return rtg->rtg_group.xg_gno;
}
+static inline xfs_rgblock_t rtg_blocks(const struct xfs_rtgroup *rtg)
+{
+ return rtg->rtg_group.xg_block_count;
+}
+
static inline struct xfs_inode *rtg_bitmap(const struct xfs_rtgroup *rtg)
{
return rtg->rtg_inodes[XFS_RTGI_BITMAP];
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 07/43] xfs: move xfs_bmapi_reserve_delalloc to xfs_iomap.c
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (5 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 06/43] xfs: add a rtg_blocks helper Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 08/43] xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay Christoph Hellwig
` (36 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 7c879c8275c0505c551f0fc6c152299c8d11f756
Delalloc reservations are not supported in userspace, and thus it doesn't
make sense to share this helper with xfsprogs. Move it to xfs_iomap.c,
next to its two callers.
Note that the rest of the delalloc handling should probably eventually
also move out of xfs_bmap.c, but that will require a bit more surgery.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_bmap.c | 294 ++--------------------------------------------
libxfs/xfs_bmap.h | 5 +-
2 files changed, 8 insertions(+), 291 deletions(-)
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 07c553d92423..4dc66e77744f 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -165,18 +165,16 @@ xfs_bmbt_update(
* Compute the worst-case number of indirect blocks that will be used
* for ip's delayed extent of length "len".
*/
-STATIC xfs_filblks_t
+xfs_filblks_t
xfs_bmap_worst_indlen(
- xfs_inode_t *ip, /* incore inode pointer */
- xfs_filblks_t len) /* delayed extent length */
+ struct xfs_inode *ip, /* incore inode pointer */
+ xfs_filblks_t len) /* delayed extent length */
{
- int level; /* btree level number */
- int maxrecs; /* maximum record count at this level */
- xfs_mount_t *mp; /* mount structure */
- xfs_filblks_t rval; /* return value */
+ struct xfs_mount *mp = ip->i_mount;
+ int maxrecs = mp->m_bmap_dmxr[0];
+ int level;
+ xfs_filblks_t rval;
- mp = ip->i_mount;
- maxrecs = mp->m_bmap_dmxr[0];
for (level = 0, rval = 0;
level < XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK);
level++) {
@@ -2565,146 +2563,6 @@ done:
#undef PREV
}
-/*
- * Convert a hole to a delayed allocation.
- */
-STATIC void
-xfs_bmap_add_extent_hole_delay(
- xfs_inode_t *ip, /* incore inode pointer */
- int whichfork,
- struct xfs_iext_cursor *icur,
- xfs_bmbt_irec_t *new) /* new data to add to file extents */
-{
- struct xfs_ifork *ifp; /* inode fork pointer */
- xfs_bmbt_irec_t left; /* left neighbor extent entry */
- xfs_filblks_t newlen=0; /* new indirect size */
- xfs_filblks_t oldlen=0; /* old indirect size */
- xfs_bmbt_irec_t right; /* right neighbor extent entry */
- uint32_t state = xfs_bmap_fork_to_state(whichfork);
- xfs_filblks_t temp; /* temp for indirect calculations */
-
- ifp = xfs_ifork_ptr(ip, whichfork);
- ASSERT(isnullstartblock(new->br_startblock));
-
- /*
- * Check and set flags if this segment has a left neighbor
- */
- if (xfs_iext_peek_prev_extent(ifp, icur, &left)) {
- state |= BMAP_LEFT_VALID;
- if (isnullstartblock(left.br_startblock))
- state |= BMAP_LEFT_DELAY;
- }
-
- /*
- * Check and set flags if the current (right) segment exists.
- * If it doesn't exist, we're converting the hole at end-of-file.
- */
- if (xfs_iext_get_extent(ifp, icur, &right)) {
- state |= BMAP_RIGHT_VALID;
- if (isnullstartblock(right.br_startblock))
- state |= BMAP_RIGHT_DELAY;
- }
-
- /*
- * Set contiguity flags on the left and right neighbors.
- * Don't let extents get too large, even if the pieces are contiguous.
- */
- if ((state & BMAP_LEFT_VALID) && (state & BMAP_LEFT_DELAY) &&
- left.br_startoff + left.br_blockcount == new->br_startoff &&
- left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
- state |= BMAP_LEFT_CONTIG;
-
- if ((state & BMAP_RIGHT_VALID) && (state & BMAP_RIGHT_DELAY) &&
- new->br_startoff + new->br_blockcount == right.br_startoff &&
- new->br_blockcount + right.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
- (!(state & BMAP_LEFT_CONTIG) ||
- (left.br_blockcount + new->br_blockcount +
- right.br_blockcount <= XFS_MAX_BMBT_EXTLEN)))
- state |= BMAP_RIGHT_CONTIG;
-
- /*
- * Switch out based on the contiguity flags.
- */
- switch (state & (BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG)) {
- case BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG:
- /*
- * New allocation is contiguous with delayed allocations
- * on the left and on the right.
- * Merge all three into a single extent record.
- */
- temp = left.br_blockcount + new->br_blockcount +
- right.br_blockcount;
-
- oldlen = startblockval(left.br_startblock) +
- startblockval(new->br_startblock) +
- startblockval(right.br_startblock);
- newlen = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
- oldlen);
- left.br_startblock = nullstartblock(newlen);
- left.br_blockcount = temp;
-
- xfs_iext_remove(ip, icur, state);
- xfs_iext_prev(ifp, icur);
- xfs_iext_update_extent(ip, state, icur, &left);
- break;
-
- case BMAP_LEFT_CONTIG:
- /*
- * New allocation is contiguous with a delayed allocation
- * on the left.
- * Merge the new allocation with the left neighbor.
- */
- temp = left.br_blockcount + new->br_blockcount;
-
- oldlen = startblockval(left.br_startblock) +
- startblockval(new->br_startblock);
- newlen = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
- oldlen);
- left.br_blockcount = temp;
- left.br_startblock = nullstartblock(newlen);
-
- xfs_iext_prev(ifp, icur);
- xfs_iext_update_extent(ip, state, icur, &left);
- break;
-
- case BMAP_RIGHT_CONTIG:
- /*
- * New allocation is contiguous with a delayed allocation
- * on the right.
- * Merge the new allocation with the right neighbor.
- */
- temp = new->br_blockcount + right.br_blockcount;
- oldlen = startblockval(new->br_startblock) +
- startblockval(right.br_startblock);
- newlen = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
- oldlen);
- right.br_startoff = new->br_startoff;
- right.br_startblock = nullstartblock(newlen);
- right.br_blockcount = temp;
- xfs_iext_update_extent(ip, state, icur, &right);
- break;
-
- case 0:
- /*
- * New allocation is not contiguous with another
- * delayed allocation.
- * Insert a new entry.
- */
- oldlen = newlen = 0;
- xfs_iext_insert(ip, icur, new, state);
- break;
- }
- if (oldlen != newlen) {
- ASSERT(oldlen > newlen);
- xfs_add_fdblocks(ip->i_mount, oldlen - newlen);
-
- /*
- * Nothing to do for disk quota accounting here.
- */
- xfs_mod_delalloc(ip, 0, (int64_t)newlen - oldlen);
- }
-}
-
/*
* Convert a hole to a real allocation.
*/
@@ -4033,144 +3891,6 @@ xfs_bmapi_read(
return 0;
}
-/*
- * Add a delayed allocation extent to an inode. Blocks are reserved from the
- * global pool and the extent inserted into the inode in-core extent tree.
- *
- * On entry, got refers to the first extent beyond the offset of the extent to
- * allocate or eof is specified if no such extent exists. On return, got refers
- * to the extent record that was inserted to the inode fork.
- *
- * Note that the allocated extent may have been merged with contiguous extents
- * during insertion into the inode fork. Thus, got does not reflect the current
- * state of the inode fork on return. If necessary, the caller can use lastx to
- * look up the updated record in the inode fork.
- */
-int
-xfs_bmapi_reserve_delalloc(
- struct xfs_inode *ip,
- int whichfork,
- xfs_fileoff_t off,
- xfs_filblks_t len,
- xfs_filblks_t prealloc,
- struct xfs_bmbt_irec *got,
- struct xfs_iext_cursor *icur,
- int eof)
-{
- struct xfs_mount *mp = ip->i_mount;
- struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork);
- xfs_extlen_t alen;
- xfs_extlen_t indlen;
- uint64_t fdblocks;
- int error;
- xfs_fileoff_t aoff;
- bool use_cowextszhint =
- whichfork == XFS_COW_FORK && !prealloc;
-
-retry:
- /*
- * Cap the alloc length. Keep track of prealloc so we know whether to
- * tag the inode before we return.
- */
- aoff = off;
- alen = XFS_FILBLKS_MIN(len + prealloc, XFS_MAX_BMBT_EXTLEN);
- if (!eof)
- alen = XFS_FILBLKS_MIN(alen, got->br_startoff - aoff);
- if (prealloc && alen >= len)
- prealloc = alen - len;
-
- /*
- * If we're targetting the COW fork but aren't creating a speculative
- * posteof preallocation, try to expand the reservation to align with
- * the COW extent size hint if there's sufficient free space.
- *
- * Unlike the data fork, the CoW cancellation functions will free all
- * the reservations at inactivation, so we don't require that every
- * delalloc reservation have a dirty pagecache.
- */
- if (use_cowextszhint) {
- struct xfs_bmbt_irec prev;
- xfs_extlen_t extsz = xfs_get_cowextsz_hint(ip);
-
- if (!xfs_iext_peek_prev_extent(ifp, icur, &prev))
- prev.br_startoff = NULLFILEOFF;
-
- error = xfs_bmap_extsize_align(mp, got, &prev, extsz, 0, eof,
- 1, 0, &aoff, &alen);
- ASSERT(!error);
- }
-
- /*
- * Make a transaction-less quota reservation for delayed allocation
- * blocks. This number gets adjusted later. We return if we haven't
- * allocated blocks already inside this loop.
- */
- error = xfs_quota_reserve_blkres(ip, alen);
- if (error)
- goto out;
-
- /*
- * Split changing sb for alen and indlen since they could be coming
- * from different places.
- */
- indlen = (xfs_extlen_t)xfs_bmap_worst_indlen(ip, alen);
- ASSERT(indlen > 0);
-
- fdblocks = indlen;
- if (XFS_IS_REALTIME_INODE(ip)) {
- error = xfs_dec_frextents(mp, xfs_blen_to_rtbxlen(mp, alen));
- if (error)
- goto out_unreserve_quota;
- } else {
- fdblocks += alen;
- }
-
- error = xfs_dec_fdblocks(mp, fdblocks, false);
- if (error)
- goto out_unreserve_frextents;
-
- ip->i_delayed_blks += alen;
- xfs_mod_delalloc(ip, alen, indlen);
-
- got->br_startoff = aoff;
- got->br_startblock = nullstartblock(indlen);
- got->br_blockcount = alen;
- got->br_state = XFS_EXT_NORM;
-
- xfs_bmap_add_extent_hole_delay(ip, whichfork, icur, got);
-
- /*
- * Tag the inode if blocks were preallocated. Note that COW fork
- * preallocation can occur at the start or end of the extent, even when
- * prealloc == 0, so we must also check the aligned offset and length.
- */
- if (whichfork == XFS_DATA_FORK && prealloc)
- xfs_inode_set_eofblocks_tag(ip);
- if (whichfork == XFS_COW_FORK && (prealloc || aoff < off || alen > len))
- xfs_inode_set_cowblocks_tag(ip);
-
- return 0;
-
-out_unreserve_frextents:
- if (XFS_IS_REALTIME_INODE(ip))
- xfs_add_frextents(mp, xfs_blen_to_rtbxlen(mp, alen));
-out_unreserve_quota:
- if (XFS_IS_QUOTA_ON(mp))
- xfs_quota_unreserve_blkres(ip, alen);
-out:
- if (error == -ENOSPC || error == -EDQUOT) {
- trace_xfs_delalloc_enospc(ip, off, len);
-
- if (prealloc || use_cowextszhint) {
- /* retry without any preallocation */
- use_cowextszhint = false;
- prealloc = 0;
- goto retry;
- }
- }
- return error;
-}
-
static int
xfs_bmapi_allocate(
struct xfs_bmalloca *bma)
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index 4b721d935994..4d48087fd3a8 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -219,10 +219,6 @@ int xfs_bmap_insert_extents(struct xfs_trans *tp, struct xfs_inode *ip,
bool *done, xfs_fileoff_t stop_fsb);
int xfs_bmap_split_extent(struct xfs_trans *tp, struct xfs_inode *ip,
xfs_fileoff_t split_offset);
-int xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, int whichfork,
- xfs_fileoff_t off, xfs_filblks_t len, xfs_filblks_t prealloc,
- struct xfs_bmbt_irec *got, struct xfs_iext_cursor *cur,
- int eof);
int xfs_bmapi_convert_delalloc(struct xfs_inode *ip, int whichfork,
xfs_off_t offset, struct iomap *iomap, unsigned int *seq);
int xfs_bmap_add_extent_unwritten_real(struct xfs_trans *tp,
@@ -233,6 +229,7 @@ xfs_extlen_t xfs_bmapi_minleft(struct xfs_trans *tp, struct xfs_inode *ip,
int fork);
int xfs_bmap_btalloc_low_space(struct xfs_bmalloca *ap,
struct xfs_alloc_arg *args);
+xfs_filblks_t xfs_bmap_worst_indlen(struct xfs_inode *ip, xfs_filblks_t len);
enum xfs_bmap_intent_type {
XFS_BMAP_MAP = 1,
--
2.47.2
* [PATCH 08/43] xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (6 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 07/43] xfs: move xfs_bmapi_reserve_delalloc to xfs_iomap.c Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 09/43] xfs: add a xfs_rtrmap_highest_rgbno helper Christoph Hellwig
` (35 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: f42c652434de5e26e02798bf6a0c2a4a8627196b
The zone allocator wants to be able to remove a delalloc mapping in the
COW fork while keeping the block reservation. To support that, pass the
flags argument down to xfs_bmap_del_extent_delay and support the
XFS_BMAPI_REMAP flag to keep the reservation.
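The accounting change described above can be modelled in isolation. What
follows is a minimal sketch, not the libxfs code: the sketch_* names, the
two-counter struct and the flag value are hypothetical stand-ins, the
realtime extent size is assumed to be one block, and the final "return
fdblocks to the free counter" step is assumed from context rather than
shown in the hunk below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for XFS_BMAPI_REMAP; not the libxfs value. */
#define SKETCH_BMAPI_REMAP	(1u << 0)

/* Hypothetical free space counters, standing in for the per-mount ones. */
struct sketch_counters {
	uint64_t	fdblocks;	/* free data device blocks */
	uint64_t	frextents;	/* free realtime extents */
};

/*
 * Give space back after del_blocks were removed from a delalloc extent.
 * da_old and da_new are the indirect block reservations before and after
 * the removal.  With SKETCH_BMAPI_REMAP set, only the indirect block
 * difference is returned; the reservation for del_blocks stays in place.
 */
static void
sketch_del_extent_delay(
	struct sketch_counters	*c,
	bool			isrt,
	uint64_t		del_blocks,
	uint64_t		da_old,
	uint64_t		da_new,
	uint32_t		bflags)
{
	uint64_t		fdblocks;

	assert(da_old >= da_new);
	fdblocks = da_old - da_new;

	if (bflags & SKETCH_BMAPI_REMAP) {
		/* the caller keeps the reservation for del_blocks */
	} else if (isrt) {
		/* assumes an rt extent size of one block */
		c->frextents += del_blocks;
	} else {
		fdblocks += del_blocks;
	}

	c->fdblocks += fdblocks;
}
```

In this model the only difference between the flagged and unflagged calls
is whether del_blocks ends up back in one of the free counters.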
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_bmap.c | 10 +++++++---
libxfs/xfs_bmap.h | 2 +-
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 4dc66e77744f..c40cdf004ac9 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -4662,7 +4662,8 @@ xfs_bmap_del_extent_delay(
int whichfork,
struct xfs_iext_cursor *icur,
struct xfs_bmbt_irec *got,
- struct xfs_bmbt_irec *del)
+ struct xfs_bmbt_irec *del,
+ uint32_t bflags) /* bmapi flags */
{
struct xfs_mount *mp = ip->i_mount;
struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork);
@@ -4782,7 +4783,9 @@ xfs_bmap_del_extent_delay(
da_diff = da_old - da_new;
fdblocks = da_diff;
- if (isrt)
+ if (bflags & XFS_BMAPI_REMAP)
+ ;
+ else if (isrt)
xfs_add_frextents(mp, xfs_blen_to_rtbxlen(mp, del->br_blockcount));
else
fdblocks += del->br_blockcount;
@@ -5384,7 +5387,8 @@ __xfs_bunmapi(
delete:
if (wasdel) {
- xfs_bmap_del_extent_delay(ip, whichfork, &icur, &got, &del);
+ xfs_bmap_del_extent_delay(ip, whichfork, &icur, &got,
+ &del, flags);
} else {
error = xfs_bmap_del_extent_real(ip, tp, &icur, cur,
&del, &tmp_logflags, whichfork,
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index 4d48087fd3a8..b4d9c6e0f3f9 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -204,7 +204,7 @@ int xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
xfs_extnum_t nexts, int *done);
void xfs_bmap_del_extent_delay(struct xfs_inode *ip, int whichfork,
struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *got,
- struct xfs_bmbt_irec *del);
+ struct xfs_bmbt_irec *del, uint32_t bflags);
void xfs_bmap_del_extent_cow(struct xfs_inode *ip,
struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *got,
struct xfs_bmbt_irec *del);
--
2.47.2
* [PATCH 09/43] xfs: add a xfs_rtrmap_highest_rgbno helper
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (7 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 08/43] xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 10/43] xfs: define the zoned on-disk format Christoph Hellwig
` (34 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: aacde95a37160b1462e46e0fd0cc7fd70e3bf1cc
Add a helper to find the last offset mapped in the rtrmap. This will be
used by the zoned code to find out where to start writing again on
conventional devices without hardware zone support.
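As a rough illustration of how a caller might use such a helper, here is a
sketch that derives a write position from the highest mapped block. It is
an assumption about the intended use, not code from this series: the
sketch_* names are hypothetical, and the sketch scans a flat array of
records, whereas the helper in the diff below reads the high key from the
rtrmap btree root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NULLRGBLOCK. */
#define SKETCH_NULLRGBLOCK	((uint32_t)-1)

/* Hypothetical flattened rmap record: a run of mapped group blocks. */
struct sketch_rmap_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
};

/* Highest group block covered by any record, or SKETCH_NULLRGBLOCK. */
static uint32_t
sketch_highest_rgbno(
	const struct sketch_rmap_rec	*recs,
	size_t				nr)
{
	uint32_t			highest = SKETCH_NULLRGBLOCK;
	size_t				i;

	for (i = 0; i < nr; i++) {
		uint32_t		last;

		assert(recs[i].blockcount > 0);
		last = recs[i].startblock + recs[i].blockcount - 1;
		if (highest == SKETCH_NULLRGBLOCK || last > highest)
			highest = last;
	}
	return highest;
}

/* Where sequential writing would resume: just past the last mapped block. */
static uint32_t
sketch_write_pointer(
	const struct sketch_rmap_rec	*recs,
	size_t				nr)
{
	uint32_t			highest = sketch_highest_rgbno(recs, nr);

	if (highest == SKETCH_NULLRGBLOCK)
		return 0;
	return highest + 1;
}
```

An empty group yields a write position of zero, which matches the
NULLRGBLOCK return for an empty root in the helper below.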
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_rtrmap_btree.c | 19 +++++++++++++++++++
libxfs/xfs_rtrmap_btree.h | 2 ++
2 files changed, 21 insertions(+)
diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
index d897d140efa7..d46fb67bb5d4 100644
--- a/libxfs/xfs_rtrmap_btree.c
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -1032,3 +1032,22 @@ xfs_rtrmapbt_init_rtsb(
xfs_btree_del_cursor(cur, error);
return error;
}
+
+/*
+ * Return the highest rgbno currently tracked by the rmap for this rtg.
+ */
+xfs_rgblock_t
+xfs_rtrmap_highest_rgbno(
+ struct xfs_rtgroup *rtg)
+{
+ struct xfs_btree_block *block = rtg_rmap(rtg)->i_df.if_broot;
+ union xfs_btree_key key = {};
+ struct xfs_btree_cur *cur;
+
+ if (block->bb_numrecs == 0)
+ return NULLRGBLOCK;
+ cur = xfs_rtrmapbt_init_cursor(NULL, rtg);
+ xfs_btree_get_keys(cur, block, &key);
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ return be32_to_cpu(key.__rmap_bigkey[1].rm_startblock);
+}
diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h
index 9d0915089891..e328fd62a149 100644
--- a/libxfs/xfs_rtrmap_btree.h
+++ b/libxfs/xfs_rtrmap_btree.h
@@ -207,4 +207,6 @@ struct xfs_btree_cur *xfs_rtrmapbt_mem_cursor(struct xfs_rtgroup *rtg,
int xfs_rtrmapbt_mem_init(struct xfs_mount *mp, struct xfbtree *xfbtree,
struct xfs_buftarg *btp, xfs_rgnumber_t rgno);
+xfs_rgblock_t xfs_rtrmap_highest_rgbno(struct xfs_rtgroup *rtg);
+
#endif /* __XFS_RTRMAP_BTREE_H__ */
--
2.47.2
* [PATCH 10/43] xfs: define the zoned on-disk format
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (8 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 09/43] xfs: add a xfs_rtrmap_highest_rgbno helper Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 11/43] FIXUP: " Christoph Hellwig
` (33 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 2167eaabe2fadde24cb8f1dafbec64da1d2ed2f5
Zoned file systems reuse the basic RT group enabled XFS file system
structure to support a mode where each RT group is always written from
start to end and then reset for reuse (after moving out any remaining
data). There are a few minor but important changes, which are indicated
by a new incompat flag:
1) there are no bitmap and summary inodes, thus the
/rtgroups/{rgno}.{bitmap,summary} metadir files do not exist and the
sb_rbmblocks superblock field must be cleared to zero.
2) there is a new superblock field that specifies the start of an
internal RT section. This allows supporting SMR HDDs that have random
writable space at the beginning which is used for the XFS data device
(which really is the metadata device for this configuration), directly
followed by an RT device on the same block device. While something
similar could be achieved using dm-linear, just having a single device
directly consumed by XFS makes handling the file systems a lot easier.
3) there is another superblock field that tracks the amount of reserved
space (or overprovisioning) that is never used for user capacity, but
allows GC to run more smoothly.
4) an overlay of the cowextsize field for the rtrmap inode so that we
can persistently track the total amount of rtblocks currently used in
an RT group. There is no data structure other than the rmap that
tracks used space in an RT group, and this counter is used to decide
when an RT group has been entirely emptied, and to select one that
is relatively empty if garbage collection needs to be performed.
While this counter could be tracked entirely in memory and rebuilt
from the rmap at mount time, that would lead to very long mount times
with the large number of RT groups implied by the number of hardware
zones especially on SMR hard drives with 256MB zone sizes.
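To make the role of the per-group used block counter from item 4 more
concrete, here is a minimal sketch of how such a counter could drive the
two decisions mentioned above. The selection policy (take the non-empty
group with the fewest used blocks) and all sketch_* names are assumptions
for illustration; the actual GC policy is not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returned when no group holds any data. */
#define SKETCH_NO_VICTIM	((size_t)-1)

/* Hypothetical in-memory view of one RT group. */
struct sketch_rtgroup {
	uint32_t	used_blocks;	/* persisted used block counter */
};

/*
 * Pick a garbage collection victim: the non-empty group with the fewest
 * used blocks.  Groups with a counter of zero are skipped because they
 * can be reset without moving any data.
 */
static size_t
sketch_pick_gc_victim(
	const struct sketch_rtgroup	*rtgs,
	size_t				nr,
	uint32_t			rg_blocks)
{
	size_t				victim = SKETCH_NO_VICTIM;
	size_t				i;

	for (i = 0; i < nr; i++) {
		/* mirrors the verifier bound on di_used_blocks below */
		assert(rtgs[i].used_blocks <= rg_blocks);

		if (rtgs[i].used_blocks == 0)
			continue;
		if (victim == SKETCH_NO_VICTIM ||
		    rtgs[i].used_blocks < rtgs[victim].used_blocks)
			victim = i;
	}
	return victim;
}
```

Both decisions read only the counter, so neither needs a walk of the rmap.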
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_format.h | 15 ++++++++--
libxfs/xfs_inode_buf.c | 21 ++++++++++----
libxfs/xfs_inode_util.c | 1 +
libxfs/xfs_log_format.h | 7 ++++-
libxfs/xfs_ondisk.h | 6 ++--
libxfs/xfs_rtbitmap.c | 11 +++++++
libxfs/xfs_rtgroup.c | 37 ++++++++++++++----------
libxfs/xfs_sb.c | 64 +++++++++++++++++++++++++++++++++++++++--
8 files changed, 133 insertions(+), 29 deletions(-)
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index b1007fb661ba..f67380a25805 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -178,9 +178,10 @@ typedef struct xfs_sb {
xfs_rgnumber_t sb_rgcount; /* number of realtime groups */
xfs_rtxlen_t sb_rgextents; /* size of a realtime group in rtx */
-
uint8_t sb_rgblklog; /* rt group number shift */
uint8_t sb_pad[7]; /* zeroes */
+ xfs_rfsblock_t sb_rtstart; /* start of internal RT section (FSB) */
+ xfs_filblks_t sb_rtreserved; /* reserved (zoned) RT blocks */
/* must be padded to 64 bit alignment */
} xfs_sb_t;
@@ -270,9 +271,10 @@ struct xfs_dsb {
__be64 sb_metadirino; /* metadata directory tree root */
__be32 sb_rgcount; /* # of realtime groups */
__be32 sb_rgextents; /* size of rtgroup in rtx */
-
__u8 sb_rgblklog; /* rt group number shift */
__u8 sb_pad[7]; /* zeroes */
+ __be64 sb_rtstart; /* start of internal RT section (FSB) */
+ __be64 sb_rtreserved; /* reserved (zoned) RT blocks */
/*
* The size of this structure must be padded to 64 bit alignment.
@@ -395,6 +397,8 @@ xfs_sb_has_ro_compat_feature(
#define XFS_SB_FEAT_INCOMPAT_EXCHRANGE (1 << 6) /* exchangerange supported */
#define XFS_SB_FEAT_INCOMPAT_PARENT (1 << 7) /* parent pointers */
#define XFS_SB_FEAT_INCOMPAT_METADIR (1 << 8) /* metadata dir tree */
+#define XFS_SB_FEAT_INCOMPAT_ZONED (1 << 9) /* zoned RT allocator */
+
#define XFS_SB_FEAT_INCOMPAT_ALL \
(XFS_SB_FEAT_INCOMPAT_FTYPE | \
XFS_SB_FEAT_INCOMPAT_SPINODES | \
@@ -952,7 +956,12 @@ struct xfs_dinode {
__be64 di_changecount; /* number of attribute changes */
__be64 di_lsn; /* flush sequence */
__be64 di_flags2; /* more random flags */
- __be32 di_cowextsize; /* basic cow extent size for file */
+ union {
+ /* basic cow extent size for (regular) file */
+ __be32 di_cowextsize;
+ /* used blocks in RTG for (zoned) rtrmap inode */
+ __be32 di_used_blocks;
+ };
__u8 di_pad2[12]; /* more padding for future expansion */
/* fields only written to during inode creation */
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 5ca857ca2dc1..5ca753465b96 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -249,7 +249,10 @@ xfs_inode_from_disk(
be64_to_cpu(from->di_changecount));
ip->i_crtime = xfs_inode_from_disk_ts(from, from->di_crtime);
ip->i_diflags2 = be64_to_cpu(from->di_flags2);
+ /* also covers the di_used_blocks union arm: */
ip->i_cowextsize = be32_to_cpu(from->di_cowextsize);
+ BUILD_BUG_ON(sizeof(from->di_cowextsize) !=
+ sizeof(from->di_used_blocks));
}
error = xfs_iformat_data_fork(ip, from);
@@ -346,6 +349,7 @@ xfs_inode_to_disk(
to->di_changecount = cpu_to_be64(inode_peek_iversion(inode));
to->di_crtime = xfs_inode_to_disk_ts(ip, ip->i_crtime);
to->di_flags2 = cpu_to_be64(ip->i_diflags2);
+ /* also covers the di_used_blocks union arm: */
to->di_cowextsize = cpu_to_be32(ip->i_cowextsize);
to->di_ino = cpu_to_be64(ip->i_ino);
to->di_lsn = cpu_to_be64(lsn);
@@ -749,11 +753,18 @@ xfs_dinode_verify(
!xfs_has_rtreflink(mp))
return __this_address;
- /* COW extent size hint validation */
- fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize),
- mode, flags, flags2);
- if (fa)
- return fa;
+ if (xfs_has_zoned(mp) &&
+ dip->di_metatype == cpu_to_be16(XFS_METAFILE_RTRMAP)) {
+ if (be32_to_cpu(dip->di_used_blocks) > mp->m_sb.sb_rgextents)
+ return __this_address;
+ } else {
+ /* COW extent size hint validation */
+ fa = xfs_inode_validate_cowextsize(mp,
+ be32_to_cpu(dip->di_cowextsize),
+ mode, flags, flags2);
+ if (fa)
+ return fa;
+ }
/* bigtime iflag can only happen on bigtime filesystems */
if (xfs_dinode_has_bigtime(dip) &&
diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c
index edc985eb9a4e..2a7988d774b8 100644
--- a/libxfs/xfs_inode_util.c
+++ b/libxfs/xfs_inode_util.c
@@ -319,6 +319,7 @@ xfs_inode_init(
if (xfs_has_v3inodes(mp)) {
inode_set_iversion(inode, 1);
+ /* also covers the di_used_blocks union arm: */
ip->i_cowextsize = 0;
times |= XFS_ICHGTIME_CREATE;
}
diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index a472ac2e45d0..0d637c276db0 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -475,7 +475,12 @@ struct xfs_log_dinode {
xfs_lsn_t di_lsn;
uint64_t di_flags2; /* more random flags */
- uint32_t di_cowextsize; /* basic cow extent size for file */
+ union {
+ /* basic cow extent size for (regular) file */
+ uint32_t di_cowextsize;
+ /* used blocks in RTG for (zoned) rtrmap inode */
+ uint32_t di_used_blocks;
+ };
uint8_t di_pad2[12]; /* more padding for future expansion */
/* fields only written to during inode creation */
diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index a85ecddaa48e..5ed44fdf7491 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -233,8 +233,8 @@ xfs_check_ondisk_structs(void)
16299260424LL);
/* superblock field checks we got from xfs/122 */
- XFS_CHECK_STRUCT_SIZE(struct xfs_dsb, 288);
- XFS_CHECK_STRUCT_SIZE(struct xfs_sb, 288);
+ XFS_CHECK_STRUCT_SIZE(struct xfs_dsb, 304);
+ XFS_CHECK_STRUCT_SIZE(struct xfs_sb, 304);
XFS_CHECK_SB_OFFSET(sb_magicnum, 0);
XFS_CHECK_SB_OFFSET(sb_blocksize, 4);
XFS_CHECK_SB_OFFSET(sb_dblocks, 8);
@@ -295,6 +295,8 @@ xfs_check_ondisk_structs(void)
XFS_CHECK_SB_OFFSET(sb_rgextents, 276);
XFS_CHECK_SB_OFFSET(sb_rgblklog, 280);
XFS_CHECK_SB_OFFSET(sb_pad, 281);
+ XFS_CHECK_SB_OFFSET(sb_rtstart, 288);
+ XFS_CHECK_SB_OFFSET(sb_rtreserved, 296);
}
#endif /* __XFS_ONDISK_H */
diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index 689d5844b8bd..34425a933650 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -1118,6 +1118,7 @@ xfs_rtfree_blocks(
xfs_extlen_t mod;
int error;
+ ASSERT(!xfs_has_zoned(mp));
ASSERT(rtlen <= XFS_MAX_BMBT_EXTLEN);
mod = xfs_blen_to_rtxoff(mp, rtlen);
@@ -1169,6 +1170,9 @@ xfs_rtalloc_query_range(
end = min(end, rtg->rtg_extents - 1);
+ if (xfs_has_zoned(mp))
+ return -EINVAL;
+
/* Iterate the bitmap, looking for discrepancies. */
while (start <= end) {
struct xfs_rtalloc_rec rec;
@@ -1263,6 +1267,8 @@ xfs_rtbitmap_blockcount_len(
struct xfs_mount *mp,
xfs_rtbxlen_t rtextents)
{
+ if (xfs_has_zoned(mp))
+ return 0;
return howmany_64(rtextents, xfs_rtbitmap_rtx_per_rbmblock(mp));
}
@@ -1303,6 +1309,11 @@ xfs_rtsummary_blockcount(
xfs_rtbxlen_t rextents = xfs_rtbitmap_bitcount(mp);
unsigned long long rsumwords;
+ if (xfs_has_zoned(mp)) {
+ *rsumlevels = 0;
+ return 0;
+ }
+
*rsumlevels = xfs_compute_rextslog(rextents) + 1;
rsumwords = xfs_rtbitmap_blockcount_len(mp, rextents) * (*rsumlevels);
return howmany_64(rsumwords, mp->m_blockwsize);
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 6f65ecc3015d..e58968286f32 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -191,15 +191,17 @@ xfs_rtgroup_lock(
ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) ||
!(rtglock_flags & XFS_RTGLOCK_BITMAP));
- if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
- /*
- * Lock both realtime free space metadata inodes for a freespace
- * update.
- */
- xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
- xfs_ilock(rtg_summary(rtg), XFS_ILOCK_EXCL);
- } else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
- xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
+ if (!xfs_has_zoned(rtg_mount(rtg))) {
+ if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
+ /*
+ * Lock both realtime free space metadata inodes for a
+ * freespace update.
+ */
+ xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
+ xfs_ilock(rtg_summary(rtg), XFS_ILOCK_EXCL);
+ } else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
+ xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
+ }
}
if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg))
@@ -225,11 +227,13 @@ xfs_rtgroup_unlock(
if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg))
xfs_iunlock(rtg_rmap(rtg), XFS_ILOCK_EXCL);
- if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
- xfs_iunlock(rtg_summary(rtg), XFS_ILOCK_EXCL);
- xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
- } else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
- xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
+ if (!xfs_has_zoned(rtg_mount(rtg))) {
+ if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
+ xfs_iunlock(rtg_summary(rtg), XFS_ILOCK_EXCL);
+ xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
+ } else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
+ xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
+ }
}
}
@@ -246,7 +250,8 @@ xfs_rtgroup_trans_join(
ASSERT(!(rtglock_flags & ~XFS_RTGLOCK_ALL_FLAGS));
ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED));
- if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
+ if (!xfs_has_zoned(rtg_mount(rtg)) &&
+ (rtglock_flags & XFS_RTGLOCK_BITMAP)) {
xfs_trans_ijoin(tp, rtg_bitmap(rtg), XFS_ILOCK_EXCL);
xfs_trans_ijoin(tp, rtg_summary(rtg), XFS_ILOCK_EXCL);
}
@@ -351,6 +356,7 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
.sick = XFS_SICK_RG_BITMAP,
.fmt_mask = (1U << XFS_DINODE_FMT_EXTENTS) |
(1U << XFS_DINODE_FMT_BTREE),
+ .enabled = xfs_has_nonzoned,
.create = xfs_rtbitmap_create,
},
[XFS_RTGI_SUMMARY] = {
@@ -359,6 +365,7 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
.sick = XFS_SICK_RG_SUMMARY,
.fmt_mask = (1U << XFS_DINODE_FMT_EXTENTS) |
(1U << XFS_DINODE_FMT_BTREE),
+ .enabled = xfs_has_nonzoned,
.create = xfs_rtsummary_create,
},
[XFS_RTGI_RMAP] = {
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 1781ca36b2cc..bc84792c565c 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -27,6 +27,7 @@
#include "xfs_rtgroup.h"
#include "xfs_rtrmap_btree.h"
#include "xfs_rtrefcount_btree.h"
+#include "xfs_rtbitmap.h"
/*
* Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -182,6 +183,8 @@ xfs_sb_version_to_features(
features |= XFS_FEAT_PARENT;
if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)
features |= XFS_FEAT_METADIR;
+ if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_ZONED)
+ features |= XFS_FEAT_ZONED;
return features;
}
@@ -263,6 +266,9 @@ static uint64_t
xfs_expected_rbmblocks(
struct xfs_sb *sbp)
{
+ if (xfs_sb_is_v5(sbp) &&
+ (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_ZONED))
+ return 0;
return howmany_64(xfs_extents_per_rbm(sbp),
NBBY * xfs_rtbmblock_size(sbp));
}
@@ -272,9 +278,15 @@ bool
xfs_validate_rt_geometry(
struct xfs_sb *sbp)
{
- if (sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE ||
- sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE)
- return false;
+ if (xfs_sb_is_v5(sbp) &&
+ (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_ZONED)) {
+ if (sbp->sb_rextsize != 1)
+ return false;
+ } else {
+ if (sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE ||
+ sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE)
+ return false;
+ }
if (sbp->sb_rblocks == 0) {
if (sbp->sb_rextents != 0 || sbp->sb_rbmblocks != 0 ||
@@ -432,6 +444,34 @@ xfs_validate_sb_rtgroups(
return 0;
}
+static int
+xfs_validate_sb_zoned(
+ struct xfs_mount *mp,
+ struct xfs_sb *sbp)
+{
+ if (sbp->sb_frextents != 0) {
+ xfs_warn(mp,
+"sb_frextents must be zero for zoned file systems.");
+ return -EINVAL;
+ }
+
+ if (sbp->sb_rtstart && sbp->sb_rtstart < sbp->sb_dblocks) {
+ xfs_warn(mp,
+"sb_rtstart (%lld) overlaps sb_dblocks (%lld).",
+ sbp->sb_rtstart, sbp->sb_dblocks);
+ return -EINVAL;
+ }
+
+ if (sbp->sb_rtreserved && sbp->sb_rtreserved >= sbp->sb_rblocks) {
+ xfs_warn(mp,
+"sb_rtreserved (%lld) larger than sb_rblocks (%lld).",
+ sbp->sb_rtreserved, sbp->sb_rblocks);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
/* Check the validity of the SB. */
STATIC int
xfs_validate_sb_common(
@@ -520,6 +560,11 @@ xfs_validate_sb_common(
if (error)
return error;
}
+ if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_ZONED) {
+ error = xfs_validate_sb_zoned(mp, sbp);
+ if (error)
+ return error;
+ }
} else if (sbp->sb_qflags & (XFS_PQUOTA_ENFD | XFS_GQUOTA_ENFD |
XFS_PQUOTA_CHKD | XFS_GQUOTA_CHKD)) {
xfs_notice(mp,
@@ -832,6 +877,14 @@ __xfs_sb_from_disk(
to->sb_rgcount = 1;
to->sb_rgextents = 0;
}
+
+ if (to->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_ZONED) {
+ to->sb_rtstart = be64_to_cpu(from->sb_rtstart);
+ to->sb_rtreserved = be64_to_cpu(from->sb_rtreserved);
+ } else {
+ to->sb_rtstart = 0;
+ to->sb_rtreserved = 0;
+ }
}
void
@@ -998,6 +1051,11 @@ xfs_sb_to_disk(
to->sb_rbmino = cpu_to_be64(0);
to->sb_rsumino = cpu_to_be64(0);
}
+
+ if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_ZONED) {
+ to->sb_rtstart = cpu_to_be64(from->sb_rtstart);
+ to->sb_rtreserved = cpu_to_be64(from->sb_rtreserved);
+ }
}
/*
--
2.47.2
* [PATCH 11/43] FIXUP: xfs: define the zoned on-disk format
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (9 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 10/43] xfs: define the zoned on-disk format Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 12/43] xfs: allow internal RT devices for zoned mode Christoph Hellwig
` (32 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
include/xfs_inode.h | 12 +++++++++++-
include/xfs_mount.h | 12 ++++++++++--
2 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/include/xfs_inode.h b/include/xfs_inode.h
index 5bb31eb4aa53..61d4d285a106 100644
--- a/include/xfs_inode.h
+++ b/include/xfs_inode.h
@@ -232,8 +232,13 @@ typedef struct xfs_inode {
xfs_rfsblock_t i_nblocks; /* # of direct & btree blocks */
prid_t i_projid; /* owner's project id */
xfs_extlen_t i_extsize; /* basic/minimum extent size */
- /* cowextsize is only used for v3 inodes, flushiter for v1/2 */
+ /*
+ * i_used_blocks is used for zoned rtrmap inodes,
+ * i_cowextsize is used for other v3 inodes,
+ * i_flushiter for v1/2 inodes
+ */
union {
+ uint32_t i_used_blocks; /* used blocks in RTG */
xfs_extlen_t i_cowextsize; /* basic cow extent size */
uint16_t i_flushiter; /* incremented on flush */
};
@@ -361,6 +366,11 @@ static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
}
#define XFS_IS_REALTIME_INODE(ip) ((ip)->i_diflags & XFS_DIFLAG_REALTIME)
+static inline bool xfs_is_zoned_inode(struct xfs_inode *ip)
+{
+ return xfs_has_zoned(ip->i_mount) && XFS_IS_REALTIME_INODE(ip);
+}
+
/* inode link counts */
static inline void set_nlink(struct inode *inode, uint32_t nlink)
{
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 0acf952eb9d7..7856acfb9f8e 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -207,6 +207,7 @@ typedef struct xfs_mount {
#define XFS_FEAT_NREXT64 (1ULL << 26) /* large extent counters */
#define XFS_FEAT_EXCHANGE_RANGE (1ULL << 27) /* exchange range */
#define XFS_FEAT_METADIR (1ULL << 28) /* metadata directory tree */
+#define XFS_FEAT_ZONED (1ULL << 29) /* zoned RT device */
#define __XFS_HAS_FEAT(name, NAME) \
static inline bool xfs_has_ ## name (const struct xfs_mount *mp) \
@@ -253,7 +254,7 @@ __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
__XFS_HAS_FEAT(large_extent_counts, NREXT64)
__XFS_HAS_FEAT(exchange_range, EXCHANGE_RANGE)
__XFS_HAS_FEAT(metadir, METADIR)
-
+__XFS_HAS_FEAT(zoned, ZONED)
static inline bool xfs_has_rtgroups(const struct xfs_mount *mp)
{
@@ -264,7 +265,9 @@ static inline bool xfs_has_rtgroups(const struct xfs_mount *mp)
static inline bool xfs_has_rtsb(const struct xfs_mount *mp)
{
/* all rtgroups filesystems with an rt section have an rtsb */
- return xfs_has_rtgroups(mp) && xfs_has_realtime(mp);
+ return xfs_has_rtgroups(mp) &&
+ xfs_has_realtime(mp) &&
+ !xfs_has_zoned(mp);
}
static inline bool xfs_has_rtrmapbt(const struct xfs_mount *mp)
@@ -279,6 +282,11 @@ static inline bool xfs_has_rtreflink(const struct xfs_mount *mp)
xfs_has_reflink(mp);
}
+static inline bool xfs_has_nonzoned(const struct xfs_mount *mp)
+{
+ return !xfs_has_zoned(mp);
+}
+
/* Kernel mount features that we don't support */
#define __XFS_UNSUPP_FEAT(name) \
static inline bool xfs_has_ ## name (const struct xfs_mount *mp) \
--
2.47.2
* [PATCH 12/43] xfs: allow internal RT devices for zoned mode
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (10 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 11/43] FIXUP: " Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 13/43] FIXUP: " Christoph Hellwig
` (31 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: bdc03eb5f98f6f1ae4bd5e020d1582a23efb7799
Allow creating an RT subvolume on the same device as the main data
device. This is mostly used for SMR HDDs where the conventional zones
are used for the data device and the sequential write required zones
for the zoned RT section.
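
To illustrate what an internal RT section means for address translation, here is a minimal standalone sketch of the group-block to disk-address calculation once a start offset is involved. The struct and the sketch_* names are hypothetical stand-ins, not the libxfs types, and the block size is passed as an assumed shift from file system blocks to 512-byte basic blocks.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_groups; only the fields used here. */
struct sketch_groups {
	uint32_t	blocks;		/* file system blocks per group */
	uint64_t	start_fsb;	/* first fs block of group 0 on the device */
};

/* Convert file system blocks to 512-byte basic blocks (assumed shift). */
static inline uint64_t
sketch_fsb_to_bb(uint64_t fsb, unsigned int blkbb_log)
{
	return fsb << blkbb_log;
}

/*
 * Map (group number, block within group) to a disk address.  With an
 * external RT device start_fsb is 0; with an internal one it is the
 * offset of the RT section behind the data section.
 */
static inline uint64_t
sketch_gbno_to_daddr(const struct sketch_groups *g, uint32_t gno,
		uint32_t gbno, unsigned int blkbb_log)
{
	uint64_t	fsbno = (uint64_t)gno * g->blocks + gbno;

	return sketch_fsb_to_bb(g->start_fsb + fsbno, blkbb_log);
}
```

With start_fsb set to zero this degenerates to the calculation used before this change, which is why external RT devices are unaffected.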
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_group.h | 6 ++++--
libxfs/xfs_rtgroup.h | 8 +++++---
libxfs/xfs_sb.c | 1 +
3 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index 242b05627c7a..a70096113384 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -107,9 +107,11 @@ xfs_gbno_to_daddr(
xfs_agblock_t gbno)
{
struct xfs_mount *mp = xg->xg_mount;
- uint32_t blocks = mp->m_groups[xg->xg_type].blocks;
+ struct xfs_groups *g = &mp->m_groups[xg->xg_type];
+ xfs_fsblock_t fsbno;
- return XFS_FSB_TO_BB(mp, (xfs_fsblock_t)xg->xg_gno * blocks + gbno);
+ fsbno = (xfs_fsblock_t)xg->xg_gno * g->blocks + gbno;
+ return XFS_FSB_TO_BB(mp, g->start_fsb + fsbno);
}
static inline uint32_t
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 9c7e03f913cb..e35d1d798327 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -230,7 +230,8 @@ xfs_rtb_to_daddr(
xfs_rgnumber_t rgno = xfs_rtb_to_rgno(mp, rtbno);
uint64_t start_bno = (xfs_rtblock_t)rgno * g->blocks;
- return XFS_FSB_TO_BB(mp, start_bno + (rtbno & g->blkmask));
+ return XFS_FSB_TO_BB(mp,
+ g->start_fsb + start_bno + (rtbno & g->blkmask));
}
static inline xfs_rtblock_t
@@ -238,10 +239,11 @@ xfs_daddr_to_rtb(
struct xfs_mount *mp,
xfs_daddr_t daddr)
{
- xfs_rfsblock_t bno = XFS_BB_TO_FSBT(mp, daddr);
+ struct xfs_groups *g = &mp->m_groups[XG_TYPE_RTG];
+ xfs_rfsblock_t bno;
+ bno = XFS_BB_TO_FSBT(mp, daddr) - g->start_fsb;
if (xfs_has_rtgroups(mp)) {
- struct xfs_groups *g = &mp->m_groups[XG_TYPE_RTG];
xfs_rgnumber_t rgno;
uint32_t rgbno;
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index bc84792c565c..a95d712363fa 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1201,6 +1201,7 @@ xfs_sb_mount_rextsize(
rgs->blocks = sbp->sb_rgextents * sbp->sb_rextsize;
rgs->blklog = mp->m_sb.sb_rgblklog;
rgs->blkmask = xfs_mask32lo(mp->m_sb.sb_rgblklog);
+ rgs->start_fsb = mp->m_sb.sb_rtstart;
} else {
rgs->blocks = 0;
rgs->blklog = 0;
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 13/43] FIXUP: xfs: allow internal RT devices for zoned mode
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (11 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 12/43] xfs: allow internal RT devices for zoned mode Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 20:16 ` Darrick J. Wong
2025-04-14 5:35 ` [PATCH 14/43] xfs: export zoned geometry via XFS_FSOP_GEOM Christoph Hellwig
` (30 subsequent siblings)
43 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/libxfs.h | 6 ++++++
include/xfs_mount.h | 7 +++++++
libxfs/init.c | 13 +++++++++----
libxfs/rdwr.c | 2 ++
repair/agheader.c | 4 +++-
5 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/include/libxfs.h b/include/libxfs.h
index 82b34b9d81c3..b968a2b88da3 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -293,4 +293,10 @@ static inline bool xfs_sb_version_hassparseinodes(struct xfs_sb *sbp)
xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_SPINODES);
}
+static inline bool xfs_sb_version_haszoned(struct xfs_sb *sbp)
+{
+ return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
+ xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_ZONED);
+}
+
#endif /* __LIBXFS_H__ */
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 7856acfb9f8e..bf9ebc25fc79 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -53,6 +53,13 @@ struct xfs_groups {
* rtgroup, so this mask must be 64-bit.
*/
uint64_t blkmask;
+
+ /*
+ * Start of the first group in the device. This is used to support a
+ * RT device following the data device on the same block device for
+ * SMR hard drives.
+ */
+ xfs_fsblock_t start_fsb;
};
/*
diff --git a/libxfs/init.c b/libxfs/init.c
index 5b45ed347276..a186369f3fd8 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -560,7 +560,7 @@ libxfs_buftarg_init(
progname);
exit(1);
}
- if (xi->rt.dev &&
+ if ((xi->rt.dev || xi->rt.dev == xi->data.dev) &&
(mp->m_rtdev_targp->bt_bdev != xi->rt.dev ||
mp->m_rtdev_targp->bt_mount != mp)) {
fprintf(stderr,
@@ -577,7 +577,11 @@ libxfs_buftarg_init(
else
mp->m_logdev_targp = libxfs_buftarg_alloc(mp, xi, &xi->log,
lfail);
- mp->m_rtdev_targp = libxfs_buftarg_alloc(mp, xi, &xi->rt, rfail);
+ if (!xi->rt.dev || xi->rt.dev == xi->data.dev)
+ mp->m_rtdev_targp = mp->m_ddev_targp;
+ else
+ mp->m_rtdev_targp = libxfs_buftarg_alloc(mp, xi, &xi->rt,
+ rfail);
}
/* Compute maximum possible height for per-AG btree types for this fs. */
@@ -978,7 +982,7 @@ libxfs_flush_mount(
error = err2;
}
- if (mp->m_rtdev_targp) {
+ if (mp->m_rtdev_targp && mp->m_rtdev_targp != mp->m_ddev_targp) {
err2 = libxfs_flush_buftarg(mp->m_rtdev_targp,
_("realtime device"));
if (!error)
@@ -1031,7 +1035,8 @@ libxfs_umount(
free(mp->m_fsname);
mp->m_fsname = NULL;
- libxfs_buftarg_free(mp->m_rtdev_targp);
+ if (mp->m_rtdev_targp != mp->m_ddev_targp)
+ libxfs_buftarg_free(mp->m_rtdev_targp);
if (mp->m_logdev_targp != mp->m_ddev_targp)
libxfs_buftarg_free(mp->m_logdev_targp);
libxfs_buftarg_free(mp->m_ddev_targp);
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 35be785c435a..f06763b38bd8 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -175,6 +175,8 @@ libxfs_getrtsb(
if (!mp->m_rtdev_targp->bt_bdev)
return NULL;
+ ASSERT(!mp->m_sb.sb_rtstart);
+
error = libxfs_buf_read_uncached(mp->m_rtdev_targp, XFS_RTSB_DADDR,
XFS_FSB_TO_BB(mp, 1), 0, &bp, &xfs_rtsb_buf_ops);
if (error)
diff --git a/repair/agheader.c b/repair/agheader.c
index 327ba041671f..048e6c3143b5 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -485,7 +485,9 @@ secondary_sb_whack(
*
* size is the size of data which is valid for this sb.
*/
- if (xfs_sb_version_hasmetadir(sb))
+ if (xfs_sb_version_haszoned(sb))
+ size = offsetofend(struct xfs_dsb, sb_rtreserved);
+ else if (xfs_sb_version_hasmetadir(sb))
size = offsetofend(struct xfs_dsb, sb_pad);
else if (xfs_sb_version_hasmetauuid(sb))
size = offsetofend(struct xfs_dsb, sb_meta_uuid);
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 14/43] xfs: export zoned geometry via XFS_FSOP_GEOM
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (12 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 13/43] FIXUP: " Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 15/43] xfs: disable sb_frextents for zoned file systems Christoph Hellwig
` (29 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 1fd8159e7ca41203798b6f65efaf1724eb318cd4
Export the zoned geometry information so that userspace can query it.
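
As a rough sketch of how a userspace consumer might interpret the exported fields: the struct below is a hypothetical cut-down view of the geometry structure, not the real struct xfs_fsop_geom, and the helper names are made up for illustration. Only the flag value (1 << 27) and the rtstart/rtreserved field names are taken from this patch. Treating a non-zero rtstart as "RT section lives on the data device" is an assumption based on the field comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag value as added to xfs_fs.h by this patch. */
#define SKETCH_GEOM_FLAGS_ZONED	(1u << 27)

/* Hypothetical cut-down view of the geometry structure. */
struct sketch_geom {
	uint32_t	flags;
	uint64_t	rtstart;	/* start of internal rt section */
	uint64_t	rtreserved;	/* RT (zoned) reserved blocks */
};

/* True if the file system reports the zoned feature. */
static bool
sketch_geom_is_zoned(const struct sketch_geom *geo)
{
	return (geo->flags & SKETCH_GEOM_FLAGS_ZONED) != 0;
}

/* Assumption: a non-zero rtstart means the RT section is internal. */
static bool
sketch_geom_has_internal_rt(const struct sketch_geom *geo)
{
	return geo->rtstart != 0;
}
```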
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_fs.h | 5 ++++-
libxfs/xfs_sb.c | 6 ++++++
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 2c3171262b44..5e66fb2b2cc7 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -189,7 +189,9 @@ struct xfs_fsop_geom {
uint32_t checked; /* o: checked fs & rt metadata */
__u32 rgextents; /* rt extents in a realtime group */
__u32 rgcount; /* number of realtime groups */
- __u64 reserved[16]; /* reserved space */
+ __u64 rtstart; /* start of internal rt section */
+ __u64 rtreserved; /* RT (zoned) reserved blocks */
+ __u64 reserved[14]; /* reserved space */
};
#define XFS_FSOP_GEOM_SICK_COUNTERS (1 << 0) /* summary counters */
@@ -247,6 +249,7 @@ typedef struct xfs_fsop_resblks {
#define XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE (1 << 24) /* exchange range */
#define XFS_FSOP_GEOM_FLAGS_PARENT (1 << 25) /* linux parent pointers */
#define XFS_FSOP_GEOM_FLAGS_METADIR (1 << 26) /* metadata directories */
+#define XFS_FSOP_GEOM_FLAGS_ZONED (1 << 27) /* zoned rt device */
/*
* Minimum and maximum sizes need for growth checks.
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index a95d712363fa..8344b40e4ab9 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1566,6 +1566,8 @@ xfs_fs_geometry(
geo->flags |= XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE;
if (xfs_has_metadir(mp))
geo->flags |= XFS_FSOP_GEOM_FLAGS_METADIR;
+ if (xfs_has_zoned(mp))
+ geo->flags |= XFS_FSOP_GEOM_FLAGS_ZONED;
geo->rtsectsize = sbp->sb_blocksize;
geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);
@@ -1586,6 +1588,10 @@ xfs_fs_geometry(
geo->rgcount = sbp->sb_rgcount;
geo->rgextents = sbp->sb_rgextents;
}
+ if (xfs_has_zoned(mp)) {
+ geo->rtstart = sbp->sb_rtstart;
+ geo->rtreserved = sbp->sb_rtreserved;
+ }
}
/* Read a secondary superblock. */
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 15/43] xfs: disable sb_frextents for zoned file systems
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (13 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 14/43] xfs: export zoned geometry via XFS_FSOP_GEOM Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:35 ` [PATCH 16/43] xfs: parse and validate hardware zone information Christoph Hellwig
` (28 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 1d319ac6fe1bd6364c5fc6e285ac47b117aed117
Zoned file systems not only don't use the global frextents counter, but
for them the in-memory percpu counter also includes reservations taken
before even allocating delalloc extent records, so it will never match
the per-zone used information. Disable all updates and verification of
the sb counter for zoned file systems as it isn't useful for them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_sb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 8344b40e4ab9..8df99e518c70 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1330,7 +1330,7 @@ xfs_log_sb(
* we handle nearly-lockless reservations, so we must use the _positive
* variant here to avoid writing out nonsense frextents.
*/
- if (xfs_has_rtgroups(mp)) {
+ if (xfs_has_rtgroups(mp) && !xfs_has_zoned(mp)) {
mp->m_sb.sb_frextents =
xfs_sum_freecounter(mp, XC_FREE_RTEXTENTS);
}
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 16/43] xfs: parse and validate hardware zone information
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (14 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 15/43] xfs: disable sb_frextents for zoned file systems Christoph Hellwig
@ 2025-04-14 5:35 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 17/43] FIXUP: " Christoph Hellwig
` (27 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:35 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 720c2d58348329ce57bfa7ecef93ee0c9bf4b405
Add support to validate and parse reported hardware zone state.
Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
libxfs/xfs_zones.c | 171 +++++++++++++++++++++++++++++++++++++++++++++
libxfs/xfs_zones.h | 35 ++++++++++
2 files changed, 206 insertions(+)
create mode 100644 libxfs/xfs_zones.c
create mode 100644 libxfs/xfs_zones.h
diff --git a/libxfs/xfs_zones.c b/libxfs/xfs_zones.c
new file mode 100644
index 000000000000..b022ed960eac
--- /dev/null
+++ b/libxfs/xfs_zones.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023-2025 Christoph Hellwig.
+ * Copyright (c) 2024-2025, Western Digital Corporation or its affiliates.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_rtgroup.h"
+#include "xfs_zones.h"
+
+static bool
+xfs_zone_validate_empty(
+ struct blk_zone *zone,
+ struct xfs_rtgroup *rtg,
+ xfs_rgblock_t *write_pointer)
+{
+ struct xfs_mount *mp = rtg_mount(rtg);
+
+ if (rtg_rmap(rtg)->i_used_blocks > 0) {
+ xfs_warn(mp, "empty zone %u has non-zero used counter (0x%x).",
+ rtg_rgno(rtg), rtg_rmap(rtg)->i_used_blocks);
+ return false;
+ }
+
+ *write_pointer = 0;
+ return true;
+}
+
+static bool
+xfs_zone_validate_wp(
+ struct blk_zone *zone,
+ struct xfs_rtgroup *rtg,
+ xfs_rgblock_t *write_pointer)
+{
+ struct xfs_mount *mp = rtg_mount(rtg);
+ xfs_rtblock_t wp_fsb = xfs_daddr_to_rtb(mp, zone->wp);
+
+ if (rtg_rmap(rtg)->i_used_blocks > rtg->rtg_extents) {
+ xfs_warn(mp, "zone %u has too large used counter (0x%x).",
+ rtg_rgno(rtg), rtg_rmap(rtg)->i_used_blocks);
+ return false;
+ }
+
+ if (xfs_rtb_to_rgno(mp, wp_fsb) != rtg_rgno(rtg)) {
+ xfs_warn(mp, "zone %u write pointer (0x%llx) outside of zone.",
+ rtg_rgno(rtg), wp_fsb);
+ return false;
+ }
+
+ *write_pointer = xfs_rtb_to_rgbno(mp, wp_fsb);
+ if (*write_pointer >= rtg->rtg_extents) {
+ xfs_warn(mp, "zone %u has invalid write pointer (0x%x).",
+ rtg_rgno(rtg), *write_pointer);
+ return false;
+ }
+
+ return true;
+}
+
+static bool
+xfs_zone_validate_full(
+ struct blk_zone *zone,
+ struct xfs_rtgroup *rtg,
+ xfs_rgblock_t *write_pointer)
+{
+ struct xfs_mount *mp = rtg_mount(rtg);
+
+ if (rtg_rmap(rtg)->i_used_blocks > rtg->rtg_extents) {
+ xfs_warn(mp, "zone %u has too large used counter (0x%x).",
+ rtg_rgno(rtg), rtg_rmap(rtg)->i_used_blocks);
+ return false;
+ }
+
+ *write_pointer = rtg->rtg_extents;
+ return true;
+}
+
+static bool
+xfs_zone_validate_seq(
+ struct blk_zone *zone,
+ struct xfs_rtgroup *rtg,
+ xfs_rgblock_t *write_pointer)
+{
+ struct xfs_mount *mp = rtg_mount(rtg);
+
+ switch (zone->cond) {
+ case BLK_ZONE_COND_EMPTY:
+ return xfs_zone_validate_empty(zone, rtg, write_pointer);
+ case BLK_ZONE_COND_IMP_OPEN:
+ case BLK_ZONE_COND_EXP_OPEN:
+ case BLK_ZONE_COND_CLOSED:
+ return xfs_zone_validate_wp(zone, rtg, write_pointer);
+ case BLK_ZONE_COND_FULL:
+ return xfs_zone_validate_full(zone, rtg, write_pointer);
+ case BLK_ZONE_COND_NOT_WP:
+ case BLK_ZONE_COND_OFFLINE:
+ case BLK_ZONE_COND_READONLY:
+ xfs_warn(mp, "zone %u has unsupported zone condition 0x%x.",
+ rtg_rgno(rtg), zone->cond);
+ return false;
+ default:
+ xfs_warn(mp, "zone %u has unknown zone condition 0x%x.",
+ rtg_rgno(rtg), zone->cond);
+ return false;
+ }
+}
+
+static bool
+xfs_zone_validate_conv(
+ struct blk_zone *zone,
+ struct xfs_rtgroup *rtg)
+{
+ struct xfs_mount *mp = rtg_mount(rtg);
+
+ switch (zone->cond) {
+ case BLK_ZONE_COND_NOT_WP:
+ return true;
+ default:
+ xfs_warn(mp,
+"conventional zone %u has unsupported zone condition 0x%x.",
+ rtg_rgno(rtg), zone->cond);
+ return false;
+ }
+}
+
+bool
+xfs_zone_validate(
+ struct blk_zone *zone,
+ struct xfs_rtgroup *rtg,
+ xfs_rgblock_t *write_pointer)
+{
+ struct xfs_mount *mp = rtg_mount(rtg);
+ struct xfs_groups *g = &mp->m_groups[XG_TYPE_RTG];
+
+ /*
+ * Check that the zone capacity matches the rtgroup size stored in the
+ * superblock. Note that all zones including the last one must have a
+ * uniform capacity.
+ */
+ if (XFS_BB_TO_FSB(mp, zone->capacity) != g->blocks) {
+ xfs_warn(mp,
+"zone %u capacity (0x%llx) does not match RT group size (0x%x).",
+ rtg_rgno(rtg), XFS_BB_TO_FSB(mp, zone->capacity),
+ g->blocks);
+ return false;
+ }
+
+ if (XFS_BB_TO_FSB(mp, zone->len) != 1 << g->blklog) {
+ xfs_warn(mp,
+"zone %u length (0x%llx) does match geometry (0x%x).",
+ rtg_rgno(rtg), XFS_BB_TO_FSB(mp, zone->len),
+ 1 << g->blklog);
+ }
+
+ switch (zone->type) {
+ case BLK_ZONE_TYPE_CONVENTIONAL:
+ return xfs_zone_validate_conv(zone, rtg);
+ case BLK_ZONE_TYPE_SEQWRITE_REQ:
+ return xfs_zone_validate_seq(zone, rtg, write_pointer);
+ default:
+ xfs_warn(mp, "zoned %u has unsupported type 0x%x.",
+ rtg_rgno(rtg), zone->type);
+ return false;
+ }
+}
diff --git a/libxfs/xfs_zones.h b/libxfs/xfs_zones.h
new file mode 100644
index 000000000000..c4f1367b2cca
--- /dev/null
+++ b/libxfs/xfs_zones.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LIBXFS_ZONES_H
+#define _LIBXFS_ZONES_H
+
+struct xfs_rtgroup;
+
+/*
+ * In order to guarantee forward progress for GC we need to reserve at least
+ * two zones: one that will be used for moving data into and one spare zone
+ * making sure that we have enough space to relocate a nearly-full zone.
+ * To allow for slightly sloppy accounting for when we need to reserve the
+ * second zone, we actually reserve three as that is easier than doing fully
+ * accurate bookkeeping.
+ */
+#define XFS_GC_ZONES 3U
+
+/*
+ * In addition we need two zones for user writes, one open zone for writing
+ * and one to still have available blocks without resetting the open zone
+ * when data in the open zone has been freed.
+ */
+#define XFS_RESERVED_ZONES (XFS_GC_ZONES + 1)
+#define XFS_MIN_ZONES (XFS_RESERVED_ZONES + 1)
+
+/*
+ * Always keep one zone out of the general open zone pool to allow for GC to
+ * happen while other writers are waiting for free space.
+ */
+#define XFS_OPEN_GC_ZONES 1U
+#define XFS_MIN_OPEN_ZONES (XFS_OPEN_GC_ZONES + 1U)
+
+bool xfs_zone_validate(struct blk_zone *zone, struct xfs_rtgroup *rtg,
+ xfs_rgblock_t *write_pointer);
+
+#endif /* _LIBXFS_ZONES_H */
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 17/43] FIXUP: xfs: parse and validate hardware zone information
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (15 preceding siblings ...)
2025-04-14 5:35 ` [PATCH 16/43] xfs: parse and validate hardware zone information Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 18/43] xfs: add the zoned space allocator Christoph Hellwig
` (26 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
libxfs/Makefile | 6 ++++--
libxfs/xfs_zones.c | 2 ++
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/libxfs/Makefile b/libxfs/Makefile
index f3daa598ca97..61c43529b532 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -71,7 +71,8 @@ HFILES = \
xfs_shared.h \
xfs_trans_resv.h \
xfs_trans_space.h \
- xfs_dir2_priv.h
+ xfs_dir2_priv.h \
+ xfs_zones.h
CFILES = buf_mem.c \
cache.c \
@@ -135,7 +136,8 @@ CFILES = buf_mem.c \
xfs_trans_inode.c \
xfs_trans_resv.c \
xfs_trans_space.c \
- xfs_types.c
+ xfs_types.c \
+ xfs_zones.c
EXTRA_CFILES=\
ioctl_c_dummy.c
diff --git a/libxfs/xfs_zones.c b/libxfs/xfs_zones.c
index b022ed960eac..712c0fe9b0da 100644
--- a/libxfs/xfs_zones.c
+++ b/libxfs/xfs_zones.c
@@ -3,6 +3,8 @@
* Copyright (c) 2023-2025 Christoph Hellwig.
* Copyright (c) 2024-2025, Western Digital Corporation or its affiliates.
*/
+#include <linux/blkzoned.h>
+#include "libxfs_priv.h"
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 18/43] xfs: add the zoned space allocator
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (16 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 17/43] FIXUP: " Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 19/43] xfs: add support for zoned space reservations Christoph Hellwig
` (25 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 4e4d52075577707f8393e3fc74c1ef79ca1d3ce6
For zoned RT devices space is always allocated at the write pointer, that
is, right after the last written block, and only recorded on I/O completion.
The actual allocation algorithm is very simple and just involves
picking a good zone, preferably the one used for the last write to the
inode. As the number of zones that can be written at the same time is
usually limited by the hardware, selecting a zone is done as late as
possible, from the iomap dio and buffered writeback bio submission
helpers just before submitting the bio.
Given that the writers already took a reservation before acquiring the
iolock, space will always be readily available if an open zone slot is
available. A new structure is used to track these open zones, and
pointed to by the xfs_rtgroup. Because zoned file systems don't have
a rsum cache the space for that pointer can be reused.
Allocations are only recorded at I/O completion time. The scheme used
for that is very similar to the reflink COW end I/O path.
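
The write pointer allocation described above can be sketched in a few lines. This is a simplified standalone model with hypothetical names (sketch_open_zone, sketch_zone_alloc), not the kernel allocator: it ignores zone selection, locking and the I/O completion step, and it assumes a request may be shortened to whatever still fits into the zone.

```c
#include <stdint.h>

/* Hypothetical model of an open zone: everything below wp is written. */
struct sketch_open_zone {
	uint32_t	write_pointer;	/* next block to be written */
	uint32_t	capacity;	/* usable blocks in the zone */
};

/*
 * Allocate up to @count blocks at the write pointer.  Returns the number
 * of blocks actually handed out and stores the first block in @start.
 * A return value of 0 means the zone is full.
 */
static uint32_t
sketch_zone_alloc(struct sketch_open_zone *oz, uint32_t count,
		uint32_t *start)
{
	uint32_t	avail = oz->capacity - oz->write_pointer;

	if (count > avail)
		count = avail;
	*start = oz->write_pointer;
	oz->write_pointer += count;
	return count;
}
```

There is no free space search at all in this model; the only decision left to the caller is which zone to pass in.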
Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_rtgroup.h | 22 +++++++++++++++++-----
libxfs/xfs_types.h | 1 +
2 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index e35d1d798327..5d8777f819f4 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -37,15 +37,27 @@ struct xfs_rtgroup {
xfs_rtxnum_t rtg_extents;
/*
- * Cache of rt summary level per bitmap block with the invariant that
- * rtg_rsum_cache[bbno] > the maximum i for which rsum[i][bbno] != 0,
- * or 0 if rsum[i][bbno] == 0 for all i.
- *
+ * For bitmap based RT devices this points to a cache of rt summary
+ * level per bitmap block with the invariant that rtg_rsum_cache[bbno]
+ * > the maximum i for which rsum[i][bbno] != 0, or 0 if
+ * rsum[i][bbno] == 0 for all i.
* Reads and writes are serialized by the rsumip inode lock.
+ *
+ * For zoned RT devices this points to the open zone structure for
+ * a group that is open for writers, or is NULL.
*/
- uint8_t *rtg_rsum_cache;
+ union {
+ uint8_t *rtg_rsum_cache;
+ struct xfs_open_zone *rtg_open_zone;
+ };
};
+/*
+ * For zoned RT devices this is set on groups that have no written blocks
+ * and can be picked by the allocator for opening.
+ */
+#define XFS_RTG_FREE XA_MARK_0
+
static inline struct xfs_rtgroup *to_rtg(struct xfs_group *xg)
{
return container_of(xg, struct xfs_rtgroup, rtg_group);
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index 76f3c31573ec..dc1db15f0be5 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -243,6 +243,7 @@ enum xfs_free_counter {
* Number of free RT extents on the RT device.
*/
XC_FREE_RTEXTENTS,
+
XC_FREE_NR,
};
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 19/43] xfs: add support for zoned space reservations
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (17 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 18/43] xfs: add the zoned space allocator Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 20/43] FIXUP: " Christoph Hellwig
` (24 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 0bb2193056b5969e4148fc0909e89a5362da873e
For zoned file systems garbage collection (GC) has to take the iolock
and mmaplock after moving data to a new place to synchronize with
readers. This means waiting for garbage collection with the iolock can
deadlock.
To avoid this, the worst case required blocks have to be reserved before
taking the iolock, which is done using a new RTAVAILABLE counter that
tracks blocks that are free to write into and don't require garbage
collection. The new helpers try to take these available blocks, and
if there aren't enough available they wake and wait for GC. This is
done using a list of on-stack reservations to ensure fairness.
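
The relationship between the two counters can be modelled as below. This is a single-threaded sketch with hypothetical names; it leaves out the waiting, the on-stack reservation list and the reserved pool entirely. It also assumes that a reservation is charged against both the free and the available counter, which is an illustration choice rather than something stated in this commit message.

```c
#include <stdint.h>

/* Hypothetical model of the two RT counters on a zoned file system. */
struct sketch_counters {
	uint64_t	free;		/* includes freed blocks awaiting a zone reset */
	uint64_t	available;	/* writable right now without GC */
};

enum sketch_resv {
	SKETCH_RESV_OK,		/* reservation taken */
	SKETCH_RESV_WAIT_GC,	/* space exists, but GC has to run first */
	SKETCH_RESV_NOSPC,	/* not enough free space at all */
};

/*
 * Try to reserve @count blocks before any lock is taken.  On anything
 * but SKETCH_RESV_OK the counters are left untouched; a real caller
 * would wake GC and sleep in the WAIT_GC case.
 */
static enum sketch_resv
sketch_reserve(struct sketch_counters *c, uint64_t count)
{
	if (c->free < count)
		return SKETCH_RESV_NOSPC;
	if (c->available < count)
		return SKETCH_RESV_WAIT_GC;
	c->free -= count;
	c->available -= count;
	return SKETCH_RESV_OK;
}
```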
Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_bmap.c | 15 +++++++++++----
libxfs/xfs_types.h | 12 +++++++++++-
2 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index c40cdf004ac9..3a857181bfa4 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -35,6 +35,7 @@
#include "xfs_symlink_remote.h"
#include "xfs_inode_util.h"
#include "xfs_rtgroup.h"
+#include "xfs_zone_alloc.h"
struct kmem_cache *xfs_bmap_intent_cache;
@@ -4783,12 +4784,18 @@ xfs_bmap_del_extent_delay(
da_diff = da_old - da_new;
fdblocks = da_diff;
- if (bflags & XFS_BMAPI_REMAP)
+ if (bflags & XFS_BMAPI_REMAP) {
;
- else if (isrt)
- xfs_add_frextents(mp, xfs_blen_to_rtbxlen(mp, del->br_blockcount));
- else
+ } else if (isrt) {
+ xfs_rtbxlen_t rtxlen;
+
+ rtxlen = xfs_blen_to_rtbxlen(mp, del->br_blockcount);
+ if (xfs_is_zoned_inode(ip))
+ xfs_zoned_add_available(mp, rtxlen);
+ xfs_add_frextents(mp, rtxlen);
+ } else {
fdblocks += del->br_blockcount;
+ }
xfs_add_fdblocks(mp, fdblocks);
xfs_mod_delalloc(ip, -(int64_t)del->br_blockcount, -da_diff);
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index dc1db15f0be5..f6f4f2d4b5db 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -244,12 +244,22 @@ enum xfs_free_counter {
*/
XC_FREE_RTEXTENTS,
+ /*
+ * Number of available for use RT extents.
+ *
+ * This counter only exists for zoned RT device and indicates the number
+ * of RT extents that can be directly used by writes. XC_FREE_RTEXTENTS
+ * also includes blocks that have been written previously and freed, but
+ * sit in a rtgroup that still needs a zone reset.
+ */
+ XC_FREE_RTAVAILABLE,
XC_FREE_NR,
};
#define XFS_FREECOUNTER_STR \
{ XC_FREE_BLOCKS, "blocks" }, \
- { XC_FREE_RTEXTENTS, "rtextents" }
+ { XC_FREE_RTEXTENTS, "rtextents" }, \
+ { XC_FREE_RTAVAILABLE, "rtavailable" }
/*
* Type verifier functions
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 20/43] FIXUP: xfs: add support for zoned space reservations
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (18 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 19/43] xfs: add support for zoned space reservations Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 21/43] xfs: implement zoned garbage collection Christoph Hellwig
` (23 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
libxfs/libxfs_priv.h | 2 ++
libxfs/xfs_bmap.c | 1 -
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 82952b0db629..d5f7d28e08e2 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -471,6 +471,8 @@ static inline int retzero(void) { return 0; }
#define xfs_sb_validate_fsb_count(sbp, nblks) (0)
#define xlog_calc_iovec_len(len) roundup(len, sizeof(uint32_t))
+#define xfs_zoned_add_available(mp, rtxnum) do { } while (0)
+
/*
* Prototypes for kernel static functions that are aren't in their
* associated header files.
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 3a857181bfa4..3cb47c3c8707 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -35,7 +35,6 @@
#include "xfs_symlink_remote.h"
#include "xfs_inode_util.h"
#include "xfs_rtgroup.h"
-#include "xfs_zone_alloc.h"
struct kmem_cache *xfs_bmap_intent_cache;
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 21/43] xfs: implement zoned garbage collection
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (19 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 20/43] FIXUP: " Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 22/43] xfs: enable fsmap reporting for internal RT devices Christoph Hellwig
` (22 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 080d01c41d44f0993f2c235a6bfdb681f0a66be6
RT groups on a zoned file system need to be completely empty before their
space can be reused. This means that partially empty groups need to be
emptied entirely to free up space if no entirely free groups are
available.
Add a garbage collection thread that moves all data out of the least used
zone when not enough free zones are available, and which resets all zones
that have been emptied. To find a zone to empty, a simple set of 10 buckets
based on the amount of space used in the zone is used. To empty zones,
the rmap is walked to find the owners and the data is read and then
written to the new place.
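
A bucket scheme of this kind could look roughly like the sketch below. The bucket formula, the linear scan and all names are assumptions made for illustration; the commit message only says that 10 buckets keyed on used space exist. Zones with no used blocks are skipped here because they only need a reset, not a data move.

```c
#include <stdint.h>

#define SKETCH_USED_BUCKETS	10

/* Map the used block count of a zone to a bucket; @capacity must be > 0. */
static unsigned int
sketch_used_bucket(uint32_t used, uint32_t capacity)
{
	unsigned int	b = (uint64_t)used * SKETCH_USED_BUCKETS / capacity;

	/* A completely used zone would land in bucket 10, clamp it. */
	return b < SKETCH_USED_BUCKETS ? b : SKETCH_USED_BUCKETS - 1;
}

/*
 * Pick a GC victim: the first zone found in the lowest non-empty bucket.
 * Returns the zone index, or -1 if no zone holds any data.
 */
static int
sketch_pick_victim(const uint32_t *used, int nr_zones, uint32_t capacity)
{
	for (unsigned int b = 0; b < SKETCH_USED_BUCKETS; b++) {
		for (int i = 0; i < nr_zones; i++) {
			if (used[i] > 0 &&
			    sketch_used_bucket(used[i], capacity) == b)
				return i;
		}
	}
	return -1;
}
```

The point of bucketing rather than keeping an exact ordering is that a victim does not have to be the single least used zone, only one from the least used class.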
To automatically defragment files the rmap records are sorted by inode
and logical offset. This means defragmentation of parallel writes into
a single zone happens automatically when performing garbage collection.
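
The ordering that makes this work is a plain two-key sort. The sketch below uses a hypothetical record layout and qsort() from the C library rather than the kernel's rmap records and sort helpers, so treat it as an illustration of the ordering only.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cut-down reverse mapping record. */
struct sketch_rmap {
	uint64_t	owner;		/* inode number */
	uint64_t	offset;		/* logical offset in the file */
	uint32_t	startblock;	/* where the data sits in the zone */
};

/* Order by owning inode first, then by logical file offset. */
static int
sketch_rmap_cmp(const void *a, const void *b)
{
	const struct sketch_rmap	*ra = a;
	const struct sketch_rmap	*rb = b;

	if (ra->owner != rb->owner)
		return ra->owner < rb->owner ? -1 : 1;
	if (ra->offset != rb->offset)
		return ra->offset < rb->offset ? -1 : 1;
	return 0;
}

/* Sort the records of a victim zone into the order GC would copy them. */
static void
sketch_sort_for_gc(struct sketch_rmap *recs, size_t nr)
{
	qsort(recs, nr, sizeof(*recs), sketch_rmap_cmp);
}
```

Copying in this order places blocks of the same file next to each other in the target zone even if they were interleaved with other files in the source zone.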
Because holding the iolock over the entire GC cycle would inject very
noticeable latency for other accesses to the inodes, the iolock is not
taken while performing I/O. Instead the I/O completion handler checks
that the mapping hasn't changed over the one recorded at the start of
the GC cycle and doesn't update the mapping if it changed.
Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_group.h | 21 +++++++++++++++++----
libxfs/xfs_rtgroup.h | 6 ++++++
2 files changed, 23 insertions(+), 4 deletions(-)
diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index a70096113384..cff3f815947b 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -19,10 +19,23 @@ struct xfs_group {
#ifdef __KERNEL__
/* -- kernel only structures below this line -- */
- /*
- * Track freed but not yet committed extents.
- */
- struct xfs_extent_busy_tree *xg_busy_extents;
+ union {
+ /*
+ * For perags and non-zoned RT groups:
+ * Track freed but not yet committed extents.
+ */
+ struct xfs_extent_busy_tree *xg_busy_extents;
+
+ /*
+ * For zoned RT groups:
+ * List of groups that need a zone reset.
+ *
+ * The zonegc code forces a log flush of the rtrmap inode before
+ * resetting the write pointer, so there is no need for
+ * individual busy extent tracking.
+ */
+ struct xfs_group *xg_next_reset;
+ };
/*
* Bitsets of per-ag metadata that have been checked and/or are sick.
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 5d8777f819f4..b325aff28264 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -58,6 +58,12 @@ struct xfs_rtgroup {
*/
#define XFS_RTG_FREE XA_MARK_0
+/*
+ * For zoned RT devices this is set on groups that are fully written and that
+ * have unused blocks. Used by the garbage collection to pick targets.
+ */
+#define XFS_RTG_RECLAIMABLE XA_MARK_1
+
static inline struct xfs_rtgroup *to_rtg(struct xfs_group *xg)
{
return container_of(xg, struct xfs_rtgroup, rtg_group);
--
2.47.2
* [PATCH 22/43] xfs: enable fsmap reporting for internal RT devices
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (20 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 21/43] xfs: implement zoned garbage collection Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 23/43] xfs: enable the zoned RT device feature Christoph Hellwig
` (21 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: e50ec7fac81aa271f20ae09868f772ff43a240b0
File systems with internal RT devices are a bit odd in that we need
to report AGs and RGs. To make this happen use separate synthetic
fmr_device values for the different sections instead of the dev_t
mapping used by other XFS configurations.
The data device is reported as file system metadata before the
start of the RGs for the synthetic RT fmr_device.
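For illustration only, a userspace consumer might decode the synthetic fmr_device values along these lines. The enum values are the ones added by this patch; demo_fmr_device_name() and the name strings are hypothetical, and the lookup only makes sense for file systems with an internal RT device, since other configurations report a dev_t instead:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as added to libxfs/xfs_fs.h by this patch. */
enum xfs_device {
	XFS_DEV_DATA	= 1,
	XFS_DEV_LOG	= 2,
	XFS_DEV_RT	= 3,
};

/* Hypothetical display names, indexed by the synthetic device value. */
static const char *const demo_dev_names[] = {
	[XFS_DEV_DATA]	= "data",
	[XFS_DEV_LOG]	= "log",
	[XFS_DEV_RT]	= "realtime",
};

/* Map a synthetic fmr_device value to a section name. */
static const char *
demo_fmr_device_name(uint32_t fmr_device)
{
	if (fmr_device < XFS_DEV_DATA || fmr_device > XFS_DEV_RT)
		return "unknown";
	assert(demo_dev_names[fmr_device] != NULL);
	return demo_dev_names[fmr_device];
}
```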
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_fs.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 5e66fb2b2cc7..12463ba766da 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -1082,6 +1082,15 @@ struct xfs_rtgroup_geometry {
#define XFS_IOC_COMMIT_RANGE _IOW ('X', 131, struct xfs_commit_range)
/* XFS_IOC_GETFSUUID ---------- deprecated 140 */
+/*
+ * Devices supported by a single XFS file system. Reported in fsmaps fmr_device
+ * when using internal RT devices.
+ */
+enum xfs_device {
+ XFS_DEV_DATA = 1,
+ XFS_DEV_LOG = 2,
+ XFS_DEV_RT = 3,
+};
#ifndef HAVE_BBMACROS
/*
--
2.47.2
* [PATCH 23/43] xfs: enable the zoned RT device feature
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (21 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 22/43] xfs: enable fsmap reporting for internal RT devices Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 24/43] xfs: support zone gaps Christoph Hellwig
` (20 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: be458049ffe32b5885c5c35b12997fd40c2986c4
Enable the zoned RT device directory feature. With this feature, RT
groups are written sequentially and always emptied before rewriting
the blocks. This perfectly maps to zoned devices, but can also be
used on conventional block devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_format.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index f67380a25805..cee7e26f23bd 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -408,7 +408,8 @@ xfs_sb_has_ro_compat_feature(
XFS_SB_FEAT_INCOMPAT_NREXT64 | \
XFS_SB_FEAT_INCOMPAT_EXCHRANGE | \
XFS_SB_FEAT_INCOMPAT_PARENT | \
- XFS_SB_FEAT_INCOMPAT_METADIR)
+ XFS_SB_FEAT_INCOMPAT_METADIR | \
+ XFS_SB_FEAT_INCOMPAT_ZONED)
#define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL
static inline bool
--
2.47.2
* [PATCH 24/43] xfs: support zone gaps
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (22 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 23/43] xfs: enable the zoned RT device feature Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 25/43] FIXUP: " Christoph Hellwig
` (19 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Source kernel commit: 97c69ba1c08d5a2bb3cacecae685b63e20e4d485
Zoned devices can have gaps between the end of the usable capacity of a
zone and the end of the zone in the LBA/daddr address space. In other
words, the hardware
equivalent to the RT groups already takes care of the power of 2
alignment for us. In this case the sparse FSB/RTB address space maps 1:1
to the device address space.
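A minimal sketch of the two translation modes, assuming a simplified model that ignores start_fsb and the conversion to 512-byte basic blocks (struct demo_rt_groups and demo_rtb_to_devblock() are hypothetical names, loosely modelled on xfs_rtb_to_daddr() in the diff below):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down version of the per-type group geometry. */
struct demo_rt_groups {
	uint32_t	blocks;		/* usable blocks per RT group */
	uint8_t		blklog;		/* log2 of the group stride in RTB space */
	bool		has_daddr_gaps;	/* device has the same gaps as RTB space */
};

/*
 * Translate a sparse RT block number into a device block number,
 * both expressed in file system blocks.
 */
static uint64_t
demo_rtb_to_devblock(const struct demo_rt_groups *g, uint64_t rtbno)
{
	uint64_t	rgno;
	uint64_t	blkmask;

	assert(g->blklog < 64);

	/* With gaps, the sparse address space maps 1:1 to the device. */
	if (g->has_daddr_gaps)
		return rtbno;

	/* Without gaps, groups are packed densely on the device. */
	blkmask = (1ULL << g->blklog) - 1;
	rgno = rtbno >> g->blklog;
	return rgno * g->blocks + (rtbno & blkmask);
}
```

For example, with 100 usable blocks per group and a stride of 128 (blklog 7), block 5 of group 2 is RT block 261; it lands on device block 205 when groups are packed, and stays at 261 when the device itself has the gaps.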
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/xfs_format.h | 4 +++-
libxfs/xfs_group.h | 6 +++++-
libxfs/xfs_rtgroup.h | 13 ++++++++-----
libxfs/xfs_sb.c | 3 +++
libxfs/xfs_zones.c | 19 +++++++++++++++++--
5 files changed, 36 insertions(+), 9 deletions(-)
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index cee7e26f23bd..9566a7623365 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -398,6 +398,7 @@ xfs_sb_has_ro_compat_feature(
#define XFS_SB_FEAT_INCOMPAT_PARENT (1 << 7) /* parent pointers */
#define XFS_SB_FEAT_INCOMPAT_METADIR (1 << 8) /* metadata dir tree */
#define XFS_SB_FEAT_INCOMPAT_ZONED (1 << 9) /* zoned RT allocator */
+#define XFS_SB_FEAT_INCOMPAT_ZONE_GAPS (1 << 10) /* RTGs have LBA gaps */
#define XFS_SB_FEAT_INCOMPAT_ALL \
(XFS_SB_FEAT_INCOMPAT_FTYPE | \
@@ -409,7 +410,8 @@ xfs_sb_has_ro_compat_feature(
XFS_SB_FEAT_INCOMPAT_EXCHRANGE | \
XFS_SB_FEAT_INCOMPAT_PARENT | \
XFS_SB_FEAT_INCOMPAT_METADIR | \
- XFS_SB_FEAT_INCOMPAT_ZONED)
+ XFS_SB_FEAT_INCOMPAT_ZONED | \
+ XFS_SB_FEAT_INCOMPAT_ZONE_GAPS)
#define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL
static inline bool
diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index cff3f815947b..4423932a2313 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -123,7 +123,11 @@ xfs_gbno_to_daddr(
struct xfs_groups *g = &mp->m_groups[xg->xg_type];
xfs_fsblock_t fsbno;
- fsbno = (xfs_fsblock_t)xg->xg_gno * g->blocks + gbno;
+ if (g->has_daddr_gaps)
+ fsbno = xfs_gbno_to_fsb(xg, gbno);
+ else
+ fsbno = (xfs_fsblock_t)xg->xg_gno * g->blocks + gbno;
+
return XFS_FSB_TO_BB(mp, g->start_fsb + fsbno);
}
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index b325aff28264..d36a6ae0abe5 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -245,11 +245,14 @@ xfs_rtb_to_daddr(
xfs_rtblock_t rtbno)
{
struct xfs_groups *g = &mp->m_groups[XG_TYPE_RTG];
- xfs_rgnumber_t rgno = xfs_rtb_to_rgno(mp, rtbno);
- uint64_t start_bno = (xfs_rtblock_t)rgno * g->blocks;
- return XFS_FSB_TO_BB(mp,
- g->start_fsb + start_bno + (rtbno & g->blkmask));
+ if (xfs_has_rtgroups(mp) && !g->has_daddr_gaps) {
+ xfs_rgnumber_t rgno = xfs_rtb_to_rgno(mp, rtbno);
+
+ rtbno = (xfs_rtblock_t)rgno * g->blocks + (rtbno & g->blkmask);
+ }
+
+ return XFS_FSB_TO_BB(mp, g->start_fsb + rtbno);
}
static inline xfs_rtblock_t
@@ -261,7 +264,7 @@ xfs_daddr_to_rtb(
xfs_rfsblock_t bno;
bno = XFS_BB_TO_FSBT(mp, daddr) - g->start_fsb;
- if (xfs_has_rtgroups(mp)) {
+ if (xfs_has_rtgroups(mp) && !g->has_daddr_gaps) {
xfs_rgnumber_t rgno;
uint32_t rgbno;
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 8df99e518c70..078c75febf4f 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1202,6 +1202,9 @@ xfs_sb_mount_rextsize(
rgs->blklog = mp->m_sb.sb_rgblklog;
rgs->blkmask = xfs_mask32lo(mp->m_sb.sb_rgblklog);
rgs->start_fsb = mp->m_sb.sb_rtstart;
+ if (xfs_sb_has_incompat_feature(sbp,
+ XFS_SB_FEAT_INCOMPAT_ZONE_GAPS))
+ rgs->has_daddr_gaps = true;
} else {
rgs->blocks = 0;
rgs->blklog = 0;
diff --git a/libxfs/xfs_zones.c b/libxfs/xfs_zones.c
index 712c0fe9b0da..7a81d83f5b3e 100644
--- a/libxfs/xfs_zones.c
+++ b/libxfs/xfs_zones.c
@@ -139,6 +139,7 @@ xfs_zone_validate(
{
struct xfs_mount *mp = rtg_mount(rtg);
struct xfs_groups *g = &mp->m_groups[XG_TYPE_RTG];
+ uint32_t expected_size;
/*
* Check that the zone capacity matches the rtgroup size stored in the
@@ -153,11 +154,25 @@ xfs_zone_validate(
return false;
}
- if (XFS_BB_TO_FSB(mp, zone->len) != 1 << g->blklog) {
+ if (g->has_daddr_gaps) {
+ expected_size = 1 << g->blklog;
+ } else {
+ if (zone->len != zone->capacity) {
+ xfs_warn(mp,
+"zone %u has capacity != size ((0x%llx vs 0x%llx)",
+ rtg_rgno(rtg),
+ XFS_BB_TO_FSB(mp, zone->len),
+ XFS_BB_TO_FSB(mp, zone->capacity));
+ return false;
+ }
+ expected_size = g->blocks;
+ }
+
+ if (XFS_BB_TO_FSB(mp, zone->len) != expected_size) {
xfs_warn(mp,
"zone %u length (0x%llx) does match geometry (0x%x).",
rtg_rgno(rtg), XFS_BB_TO_FSB(mp, zone->len),
- 1 << g->blklog);
+ expected_size);
}
switch (zone->type) {
--
2.47.2
* [PATCH 25/43] FIXUP: xfs: support zone gaps
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (23 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 24/43] xfs: support zone gaps Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 26/43] libfrog: report the zoned geometry Christoph Hellwig
` (18 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
db/convert.c | 6 +++++-
include/xfs_mount.h | 9 +++++++++
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/db/convert.c b/db/convert.c
index 47d3e86fdc4e..3eec4f224f51 100644
--- a/db/convert.c
+++ b/db/convert.c
@@ -44,10 +44,14 @@ xfs_daddr_to_rgno(
struct xfs_mount *mp,
xfs_daddr_t daddr)
{
+ struct xfs_groups *g = &mp->m_groups[XG_TYPE_RTG];
+
if (!xfs_has_rtgroups(mp))
return 0;
- return XFS_BB_TO_FSBT(mp, daddr) / mp->m_groups[XG_TYPE_RTG].blocks;
+ if (g->has_daddr_gaps)
+ return XFS_BB_TO_FSBT(mp, daddr) / (1 << g->blklog);
+ return XFS_BB_TO_FSBT(mp, daddr) / g->blocks;
}
typedef enum {
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index bf9ebc25fc79..5a714333c16e 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -47,6 +47,15 @@ struct xfs_groups {
*/
uint8_t blklog;
+ /*
+ * Zoned devices can have gaps beyond the usable capacity of a zone
+ * and the end in the LBA/daddr address space. In other words, the
+ * hardware equivalent to the RT groups already takes care of the power
+ * of 2 alignment for us. In this case the sparse FSB/RTB address space
+ * maps 1:1 to the device address space.
+ */
+ bool has_daddr_gaps;
+
/*
* Mask to extract the group-relative block number from a FSB.
* For a pre-rtgroups filesystem we pretend to have one very large
--
2.47.2
* [PATCH 26/43] libfrog: report the zoned geometry
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (24 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 25/43] FIXUP: " Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 22:16 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 27/43] xfs_repair: support repairing zoned file systems Christoph Hellwig
` (17 subsequent siblings)
43 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
The rtdev_name helper is based on example code posted by Darrick Wong.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libfrog/fsgeom.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
index b5220d2d6ffd..571d376c6b3c 100644
--- a/libfrog/fsgeom.c
+++ b/libfrog/fsgeom.c
@@ -8,6 +8,20 @@
#include "fsgeom.h"
#include "util.h"
+static inline const char *
+rtdev_name(
+ struct xfs_fsop_geom *geo,
+ const char *rtname)
+{
+ if (!geo->rtblocks)
+ return _("none");
+ if (geo->rtstart)
+ return _("internal");
+ if (!rtname)
+ return _("external");
+ return rtname;
+}
+
void
xfs_report_geom(
struct xfs_fsop_geom *geo,
@@ -34,6 +48,7 @@ xfs_report_geom(
int exchangerange;
int parent;
int metadir;
+ int zoned;
isint = geo->logstart > 0;
lazycount = geo->flags & XFS_FSOP_GEOM_FLAGS_LAZYSB ? 1 : 0;
@@ -55,6 +70,7 @@ xfs_report_geom(
exchangerange = geo->flags & XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE ? 1 : 0;
parent = geo->flags & XFS_FSOP_GEOM_FLAGS_PARENT ? 1 : 0;
metadir = geo->flags & XFS_FSOP_GEOM_FLAGS_METADIR ? 1 : 0;
+ zoned = geo->flags & XFS_FSOP_GEOM_FLAGS_ZONED ? 1 : 0;
printf(_(
"meta-data=%-22s isize=%-6d agcount=%u, agsize=%u blks\n"
@@ -68,7 +84,8 @@ xfs_report_geom(
"log =%-22s bsize=%-6d blocks=%u, version=%d\n"
" =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n"
"realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n"
-" =%-22s rgcount=%-4d rgsize=%u extents\n"),
+" =%-22s rgcount=%-4d rgsize=%u extents\n"
+" =%-22s zoned=%-6d start=%llu reserved=%llu\n"),
mntpoint, geo->inodesize, geo->agcount, geo->agblocks,
"", geo->sectsize, attrversion, projid32bit,
"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
@@ -81,10 +98,11 @@ xfs_report_geom(
isint ? _("internal log") : logname ? logname : _("external"),
geo->blocksize, geo->logblocks, logversion,
"", geo->logsectsize, geo->logsunit / geo->blocksize, lazycount,
- !geo->rtblocks ? _("none") : rtname ? rtname : _("external"),
+ rtdev_name(geo, rtname),
geo->rtextsize * geo->blocksize, (unsigned long long)geo->rtblocks,
(unsigned long long)geo->rtextents,
- "", geo->rgcount, geo->rgextents);
+ "", geo->rgcount, geo->rgextents,
+ "", zoned, geo->rtstart, geo->rtreserved);
}
/* Try to obtain the xfs geometry. On error returns a negative error code. */
--
2.47.2
* [PATCH 27/43] xfs_repair: support repairing zoned file systems
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (25 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 26/43] libfrog: report the zoned geometry Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-15 0:38 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 28/43] xfs_repair: fix the RT device check in process_dinode_int Christoph Hellwig
` (16 subsequent siblings)
43 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Not really much to do here. Mostly ignore the validation and
regeneration of the bitmap and summary inodes. Eventually this
could grow a bit of validation of the hardware zone state.
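The one piece of new bookkeeping is recomputing the used block count of each RT group from the block usage map. A simplified sketch of that walk, assuming a hypothetical flat extent array in place of the get_bmap_ext() iteration used by the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block states, standing in for the XR_E_* values. */
enum demo_state {
	DEMO_FREE,
	DEMO_INUSE,
	DEMO_INUSE_FS,
};

/* Hypothetical extent of blocks that all share one state. */
struct demo_extent {
	enum demo_state	state;
	uint32_t	len;	/* length in blocks */
};

/* Sum up the blocks of all extents that are in use in one group. */
static uint64_t
demo_count_used(const struct demo_extent *ext, size_t nr)
{
	uint64_t	used = 0;
	size_t		i;

	assert(ext != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		switch (ext[i].state) {
		case DEMO_INUSE:
		case DEMO_INUSE_FS:
			used += ext[i].len;
			break;
		default:
			break;
		}
	}
	return used;
}
```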
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
repair/dinode.c | 4 +++-
repair/phase5.c | 13 +++++++++++++
repair/phase6.c | 6 ++++--
repair/rt.c | 2 ++
repair/rtrmap_repair.c | 33 +++++++++++++++++++++++++++++++++
repair/xfs_repair.c | 9 ++++++---
6 files changed, 61 insertions(+), 6 deletions(-)
diff --git a/repair/dinode.c b/repair/dinode.c
index 8696a838087f..7bdd3dcf15c1 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -3585,7 +3585,9 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
validate_extsize(mp, dino, lino, dirty);
- if (dino->di_version >= 3)
+ if (dino->di_version >= 3 &&
+ (!xfs_has_zoned(mp) ||
+ dino->di_metatype != cpu_to_be16(XFS_METAFILE_RTRMAP)))
validate_cowextsize(mp, dino, lino, dirty);
/* nsec fields cannot be larger than 1 billion */
diff --git a/repair/phase5.c b/repair/phase5.c
index 4cf28d8ae1a2..e350b411c243 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -630,6 +630,19 @@ void
check_rtmetadata(
struct xfs_mount *mp)
{
+ if (xfs_has_zoned(mp)) {
+ /*
+ * Here we could/should verify the zone state a bit when we are
+ * on actual zoned devices:
+ * - compare hw write pointer to last written
+ * - compare zone state to last written
+ *
+ * Note much we can do when running in zoned mode on a
+ * conventional device.
+ */
+ return;
+ }
+
generate_rtinfo(mp);
check_rtbitmap(mp);
check_rtsummary(mp);
diff --git a/repair/phase6.c b/repair/phase6.c
index dbc090a54139..a7187e84daae 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -3460,8 +3460,10 @@ _(" - resetting contents of realtime bitmap and summary inodes\n"));
return;
while ((rtg = xfs_rtgroup_next(mp, rtg))) {
- ensure_rtgroup_bitmap(rtg);
- ensure_rtgroup_summary(rtg);
+ if (!xfs_has_zoned(mp)) {
+ ensure_rtgroup_bitmap(rtg);
+ ensure_rtgroup_summary(rtg);
+ }
ensure_rtgroup_rmapbt(rtg, est_fdblocks);
ensure_rtgroup_refcountbt(rtg, est_fdblocks);
}
diff --git a/repair/rt.c b/repair/rt.c
index e0a4943ee3b7..a2478fb635e3 100644
--- a/repair/rt.c
+++ b/repair/rt.c
@@ -222,6 +222,8 @@ check_rtfile_contents(
xfs_fileoff_t bno = 0;
int error;
+ ASSERT(!xfs_has_zoned(mp));
+
if (!ip) {
do_warn(_("unable to open %s file\n"), filename);
return;
diff --git a/repair/rtrmap_repair.c b/repair/rtrmap_repair.c
index 2b07e8943e59..955db1738fe2 100644
--- a/repair/rtrmap_repair.c
+++ b/repair/rtrmap_repair.c
@@ -141,6 +141,37 @@ xrep_rtrmap_btree_load(
return error;
}
+static void
+rtgroup_update_counters(
+ struct xfs_rtgroup *rtg)
+{
+ struct xfs_inode *rmapip = rtg->rtg_inodes[XFS_RTGI_RMAP];
+ struct xfs_mount *mp = rtg_mount(rtg);
+ uint64_t end =
+ xfs_rtbxlen_to_blen(mp, rtg->rtg_extents);
+ xfs_agblock_t gbno = 0;
+ uint64_t used = 0;
+
+ do {
+ int bstate;
+ xfs_extlen_t blen;
+
+ bstate = get_bmap_ext(rtg_rgno(rtg), gbno, end, &blen, true);
+ switch (bstate) {
+ case XR_E_INUSE:
+ case XR_E_INUSE_FS:
+ used += blen;
+ break;
+ default:
+ break;
+ }
+
+ gbno += blen;
+ } while (gbno < end);
+
+ rmapip->i_used_blocks = used;
+}
+
/* Update the inode counters. */
STATIC int
xrep_rtrmap_reset_counters(
@@ -153,6 +184,8 @@ xrep_rtrmap_reset_counters(
* generated.
*/
sc->ip->i_nblocks = rr->new_fork_info.ifake.if_blocks;
+ if (xfs_has_zoned(sc->mp))
+ rtgroup_update_counters(rr->rtg);
libxfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
/* Quotas don't exist so we're done. */
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index eeaaf6434689..7bf75c09b945 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -1388,16 +1388,19 @@ main(int argc, char **argv)
* Done with the block usage maps, toss them. Realtime metadata aren't
* rebuilt until phase 6, so we have to keep them around.
*/
- if (mp->m_sb.sb_rblocks == 0)
+ if (mp->m_sb.sb_rblocks == 0) {
rmaps_free(mp);
- free_bmaps(mp);
+ free_bmaps(mp);
+ }
if (!bad_ino_btree) {
phase6(mp);
phase_end(mp, 6);
- if (mp->m_sb.sb_rblocks != 0)
+ if (mp->m_sb.sb_rblocks != 0) {
rmaps_free(mp);
+ free_bmaps(mp);
+ }
free_rtgroup_inodes();
phase7(mp, phase2_threads);
--
2.47.2
* [PATCH 28/43] xfs_repair: fix the RT device check in process_dinode_int
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (26 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 27/43] xfs_repair: support repairing zoned file systems Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-15 0:34 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 29/43] xfs_repair: validate rt groups vs reported hardware zones Christoph Hellwig
` (15 subsequent siblings)
43 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Don't look at the variable for the rtname command line option, but
the actual file system geometry.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
repair/dinode.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/repair/dinode.c b/repair/dinode.c
index 7bdd3dcf15c1..8ca0aa0238c7 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -3265,8 +3265,8 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
flags &= XFS_DIFLAG_ANY;
}
- /* need an rt-dev for the realtime flag! */
- if ((flags & XFS_DIFLAG_REALTIME) && !rt_name) {
+ /* need an rt-dev for the realtime flag */
+ if ((flags & XFS_DIFLAG_REALTIME) && !mp->m_sb.sb_rextents) {
if (!uncertain) {
do_warn(
_("inode %" PRIu64 " has RT flag set but there is no RT device\n"),
--
2.47.2
* [PATCH 29/43] xfs_repair: validate rt groups vs reported hardware zones
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (27 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 28/43] xfs_repair: fix the RT device check in process_dinode_int Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-15 0:39 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 30/43] xfs_mkfs: factor out a validate_rtgroup_geometry helper Christoph Hellwig
` (14 subsequent siblings)
43 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Run a report zones ioctl, and verify the rt group state vs the
reported hardware zone state. Note that there is no way to actually
fix up any discrepancies here, as that would be rather scary without
having transactions.
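The per-zone consistency checks can be shown in isolation from the ioctl plumbing. The sketch below is not the code from the patch: struct demo_zone is a hypothetical stand-in for struct blk_zone, and the negative return values are arbitrary codes chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the fields reported for each hardware zone. */
struct demo_zone {
	uint64_t	start;		/* first sector of the zone */
	uint64_t	len;		/* zone size in sectors */
	uint64_t	capacity;	/* usable sectors in the zone */
};

/*
 * Check that all zones have the expected size and the same capacity.
 * Returns 0 if consistent, or a negative example-only code.
 */
static int
demo_check_zones(const struct demo_zone *zones, size_t nr,
		uint64_t zone_size)
{
	uint64_t	capacity = 0;
	size_t		i;

	assert(zone_size > 0);
	assert(zones != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (zones[i].len != zone_size)
			return -1;	/* inconsistent zone size */
		if (i == 0) {
			capacity = zones[i].capacity;
			if (capacity > zone_size)
				return -2;	/* capacity larger than size */
		} else if (zones[i].capacity != capacity) {
			return -3;	/* inconsistent zone capacity */
		}
	}
	return 0;
}
```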
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
repair/Makefile | 1 +
repair/phase5.c | 11 +---
repair/zoned.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++
repair/zoned.h | 10 ++++
4 files changed, 152 insertions(+), 9 deletions(-)
create mode 100644 repair/zoned.c
create mode 100644 repair/zoned.h
diff --git a/repair/Makefile b/repair/Makefile
index ff5b1f5abeda..fb0b2f96cc91 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -81,6 +81,7 @@ CFILES = \
strblobs.c \
threads.c \
versions.c \
+ zoned.c \
xfs_repair.c
LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBXCMD) $(LIBFROG) $(LIBUUID) $(LIBRT) \
diff --git a/repair/phase5.c b/repair/phase5.c
index e350b411c243..e44c26885717 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -21,6 +21,7 @@
#include "rmap.h"
#include "bulkload.h"
#include "agbtree.h"
+#include "zoned.h"
static uint64_t *sb_icount_ag; /* allocated inodes per ag */
static uint64_t *sb_ifree_ag; /* free inodes per ag */
@@ -631,15 +632,7 @@ check_rtmetadata(
struct xfs_mount *mp)
{
if (xfs_has_zoned(mp)) {
- /*
- * Here we could/should verify the zone state a bit when we are
- * on actual zoned devices:
- * - compare hw write pointer to last written
- * - compare zone state to last written
- *
- * Note much we can do when running in zoned mode on a
- * conventional device.
- */
+ check_zones(mp);
return;
}
diff --git a/repair/zoned.c b/repair/zoned.c
new file mode 100644
index 000000000000..456076b9817d
--- /dev/null
+++ b/repair/zoned.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2024 Christoph Hellwig.
+ */
+#include <ctype.h>
+#include <linux/blkzoned.h>
+#include "libxfs_priv.h"
+#include "libxfs.h"
+#include "xfs_zones.h"
+#include "err_protos.h"
+#include "zoned.h"
+
+/* random size that allows efficient processing */
+#define ZONES_PER_IOCTL 16384
+
+static void
+report_zones_cb(
+ struct xfs_mount *mp,
+ struct blk_zone *zone)
+{
+ xfs_rtblock_t zsbno = xfs_daddr_to_rtb(mp, zone->start);
+ xfs_rgblock_t write_pointer;
+ xfs_rgnumber_t rgno;
+ struct xfs_rtgroup *rtg;
+
+ if (xfs_rtb_to_rgbno(mp, zsbno) != 0) {
+ do_error(_("mismatched zone start 0x%llx."),
+ (unsigned long long)zsbno);
+ return;
+ }
+
+ rgno = xfs_rtb_to_rgno(mp, zsbno);
+ rtg = xfs_rtgroup_grab(mp, rgno);
+ if (!rtg) {
+ do_error(_("realtime group not found for zone %u."), rgno);
+ return;
+ }
+
+ if (!rtg_rmap(rtg))
+ do_warn(_("no rmap inode for zone %u."), rgno);
+ else
+ xfs_zone_validate(zone, rtg, &write_pointer);
+ xfs_rtgroup_rele(rtg);
+}
+
+void
+check_zones(
+ struct xfs_mount *mp)
+{
+ int fd = mp->m_rtdev_targp->bt_bdev_fd;
+ uint64_t sector = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rtstart);
+ unsigned int zone_size, zone_capacity;
+ uint64_t device_size;
+ size_t rep_size;
+ struct blk_zone_report *rep;
+ unsigned int i, n = 0;
+
+ if (ioctl(fd, BLKGETSIZE64, &device_size))
+ return; /* not a block device */
+ if (ioctl(fd, BLKGETZONESZ, &zone_size) || !zone_size)
+ return; /* not zoned */
+
+ /* BLKGETSIZE64 reports a byte value */
+ device_size = BTOBB(device_size);
+ if (device_size / zone_size < mp->m_sb.sb_rgcount) {
+ do_error(_("rt device too small\n"));
+ return;
+ }
+
+ rep_size = sizeof(struct blk_zone_report) +
+ sizeof(struct blk_zone) * ZONES_PER_IOCTL;
+ rep = malloc(rep_size);
+ if (!rep) {
+ do_warn(_("malloc failed for zone report\n"));
+ return;
+ }
+
+ while (n < mp->m_sb.sb_rgcount) {
+ struct blk_zone *zones = (struct blk_zone *)(rep + 1);
+ int ret;
+
+ memset(rep, 0, rep_size);
+ rep->sector = sector;
+ rep->nr_zones = ZONES_PER_IOCTL;
+
+ ret = ioctl(fd, BLKREPORTZONE, rep);
+ if (ret) {
+ do_error(_("ioctl(BLKREPORTZONE) failed: %d!\n"), ret);
+ goto out_free;
+ }
+ if (!rep->nr_zones)
+ break;
+
+ for (i = 0; i < rep->nr_zones; i++) {
+ if (n >= mp->m_sb.sb_rgcount)
+ break;
+
+ if (zones[i].len != zone_size) {
+ do_error(_("Inconsistent zone size!\n"));
+ goto out_free;
+ }
+
+ switch (zones[i].type) {
+ case BLK_ZONE_TYPE_CONVENTIONAL:
+ case BLK_ZONE_TYPE_SEQWRITE_REQ:
+ break;
+ case BLK_ZONE_TYPE_SEQWRITE_PREF:
+ do_error(
+_("Found sequential write preferred zone\n"));
+ goto out_free;
+ default:
+ do_error(
+_("Found unknown zone type (0x%x)\n"), zones[i].type);
+ goto out_free;
+ }
+
+ if (!n) {
+ zone_capacity = zones[i].capacity;
+ if (zone_capacity > zone_size) {
+ do_error(
+_("Zone capacity larger than zone size!\n"));
+ goto out_free;
+ }
+ } else if (zones[i].capacity != zone_capacity) {
+ do_error(
+_("Inconsistent zone capacity!\n"));
+ goto out_free;
+ }
+
+ report_zones_cb(mp, &zones[i]);
+ n++;
+ }
+ sector = zones[rep->nr_zones - 1].start +
+ zones[rep->nr_zones - 1].len;
+ }
+
+out_free:
+ free(rep);
+}
diff --git a/repair/zoned.h b/repair/zoned.h
new file mode 100644
index 000000000000..ab76bf15b3ca
--- /dev/null
+++ b/repair/zoned.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2024 Christoph Hellwig.
+ */
+#ifndef _XFS_REPAIR_ZONED_H_
+#define _XFS_REPAIR_ZONED_H_
+
+void check_zones(struct xfs_mount *mp);
+
+#endif /* _XFS_REPAIR_ZONED_H_ */
--
2.47.2
* [PATCH 30/43] xfs_mkfs: factor out a validate_rtgroup_geometry helper
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (28 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 29/43] xfs_repair: validate rt groups vs reported hardware zones Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-15 0:40 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 31/43] xfs_mkfs: support creating file system with zoned RT devices Christoph Hellwig
` (13 subsequent siblings)
43 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Factor out the rtgroup geometry checks so that they can be easily reused
for the upcoming zoned RT allocator support.
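To make the shape of the helper easier to see, here is a standalone sketch of the same four checks that returns a code instead of printing and calling usage(). The enum, the function name, and passing the limits as parameters are assumptions made for the example; the patch itself compares against XFS_MAX_RGBLOCKS and XFS_MAX_RGNUMBER directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result codes, one per check in the helper. */
enum demo_rg_err {
	DEMO_RG_OK,
	DEMO_RG_TOO_LARGE,		/* rgsize above the maximum */
	DEMO_RG_NOT_EXT_MULTIPLE,	/* rgsize not a multiple of rt extent */
	DEMO_RG_TOO_SMALL,		/* fewer than two rt extents */
	DEMO_RG_TOO_MANY,		/* rgcount above the maximum */
};

/* Apply the rtgroup geometry checks in the same order as the patch. */
static enum demo_rg_err
demo_validate_rtgroup_geometry(uint64_t rgsize, uint64_t rtextblocks,
		uint64_t rgcount, uint64_t max_rgblocks,
		uint64_t max_rgnumber)
{
	assert(rtextblocks != 0);

	if (rgsize > max_rgblocks)
		return DEMO_RG_TOO_LARGE;
	if (rgsize % rtextblocks != 0)
		return DEMO_RG_NOT_EXT_MULTIPLE;
	if (rgsize <= rtextblocks)
		return DEMO_RG_TOO_SMALL;
	if (rgcount > max_rgnumber)
		return DEMO_RG_TOO_MANY;
	return DEMO_RG_OK;
}
```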
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mkfs/xfs_mkfs.c | 67 +++++++++++++++++++++++++++----------------------
1 file changed, 37 insertions(+), 30 deletions(-)
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index ec82e05bf4e4..13b746b365e1 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3950,6 +3950,42 @@ out:
cfg->rgcount = howmany(cfg->rtblocks, cfg->rgsize);
}
+static void
+validate_rtgroup_geometry(
+ struct mkfs_params *cfg)
+{
+ if (cfg->rgsize > XFS_MAX_RGBLOCKS) {
+ fprintf(stderr,
+_("realtime group size (%llu) must be less than the maximum (%u)\n"),
+ (unsigned long long)cfg->rgsize,
+ XFS_MAX_RGBLOCKS);
+ usage();
+ }
+
+ if (cfg->rgsize % cfg->rtextblocks != 0) {
+ fprintf(stderr,
+_("realtime group size (%llu) not a multiple of rt extent size (%llu)\n"),
+ (unsigned long long)cfg->rgsize,
+ (unsigned long long)cfg->rtextblocks);
+ usage();
+ }
+
+ if (cfg->rgsize <= cfg->rtextblocks) {
+ fprintf(stderr,
+_("realtime group size (%llu) must be at least two realtime extents\n"),
+ (unsigned long long)cfg->rgsize);
+ usage();
+ }
+
+ if (cfg->rgcount > XFS_MAX_RGNUMBER) {
+ fprintf(stderr,
+_("realtime group count (%llu) must be less than the maximum (%u)\n"),
+ (unsigned long long)cfg->rgcount,
+ XFS_MAX_RGNUMBER);
+ usage();
+ }
+}
+
static void
calculate_rtgroup_geometry(
struct mkfs_params *cfg,
@@ -4007,36 +4043,7 @@ _("rgsize (%s) not a multiple of fs blk size (%d)\n"),
(cfg->rtblocks % cfg->rgsize != 0);
}
- if (cfg->rgsize > XFS_MAX_RGBLOCKS) {
- fprintf(stderr,
-_("realtime group size (%llu) must be less than the maximum (%u)\n"),
- (unsigned long long)cfg->rgsize,
- XFS_MAX_RGBLOCKS);
- usage();
- }
-
- if (cfg->rgsize % cfg->rtextblocks != 0) {
- fprintf(stderr,
-_("realtime group size (%llu) not a multiple of rt extent size (%llu)\n"),
- (unsigned long long)cfg->rgsize,
- (unsigned long long)cfg->rtextblocks);
- usage();
- }
-
- if (cfg->rgsize <= cfg->rtextblocks) {
- fprintf(stderr,
-_("realtime group size (%llu) must be at least two realtime extents\n"),
- (unsigned long long)cfg->rgsize);
- usage();
- }
-
- if (cfg->rgcount > XFS_MAX_RGNUMBER) {
- fprintf(stderr,
-_("realtime group count (%llu) must be less than the maximum (%u)\n"),
- (unsigned long long)cfg->rgcount,
- XFS_MAX_RGNUMBER);
- usage();
- }
+ validate_rtgroup_geometry(cfg);
if (cfg->rtextents)
cfg->rtbmblocks = howmany(cfg->rgsize / cfg->rtextblocks,
--
2.47.2
* [PATCH 31/43] xfs_mkfs: support creating file system with zoned RT devices
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (29 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 30/43] xfs_mkfs: factor out a validate_rtgroup_geometry helper Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-15 0:41 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 32/43] xfs_mkfs: calculate zone overprovisioning when specifying size Christoph Hellwig
` (12 subsequent siblings)
43 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
To create file systems with a zoned RT device, query the hardware
zone information to align the RT groups to it, and create an internal
RT device if the device has conventional and sequential write required
zones.
Default to using all sequential write required zones for the RT device if
there are sequential write required zones.
Default to 256 and 1% conventional when -r zoned is specified without
further options and there are no sequential write required zones. This
mimics a SMR HDD and works well with tests.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
libxfs/init.c | 2 +-
mkfs/proto.c | 3 +-
mkfs/xfs_mkfs.c | 511 ++++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 497 insertions(+), 19 deletions(-)
diff --git a/libxfs/init.c b/libxfs/init.c
index a186369f3fd8..393a94673f7e 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -251,7 +251,7 @@ libxfs_close_devices(
libxfs_device_close(&li->data);
if (li->log.dev && li->log.dev != li->data.dev)
libxfs_device_close(&li->log);
- if (li->rt.dev)
+ if (li->rt.dev && li->rt.dev != li->data.dev)
libxfs_device_close(&li->rt);
}
diff --git a/mkfs/proto.c b/mkfs/proto.c
index 7f56a3d82a06..7f80bef838be 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -1144,7 +1144,8 @@ rtinit_groups(
fail(_("rtrmap rtsb init failed"), error);
}
- rtfreesp_init(rtg);
+ if (!xfs_has_zoned(mp))
+ rtfreesp_init(rtg);
}
}
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 13b746b365e1..8b7a0b617d3e 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -6,6 +6,8 @@
#include "libfrog/util.h"
#include "libxfs.h"
#include <ctype.h>
+#include <linux/blkzoned.h>
+#include "libxfs/xfs_zones.h"
#include "xfs_multidisk.h"
#include "libxcmd.h"
#include "libfrog/fsgeom.h"
@@ -135,6 +137,9 @@ enum {
R_RGCOUNT,
R_RGSIZE,
R_CONCURRENCY,
+ R_ZONED,
+ R_START,
+ R_RESERVED,
R_MAX_OPTS,
};
@@ -739,6 +744,9 @@ static struct opt_params ropts = {
[R_RGCOUNT] = "rgcount",
[R_RGSIZE] = "rgsize",
[R_CONCURRENCY] = "concurrency",
+ [R_ZONED] = "zoned",
+ [R_START] = "start",
+ [R_RESERVED] = "reserved",
[R_MAX_OPTS] = NULL,
},
.subopt_params = {
@@ -804,6 +812,28 @@ static struct opt_params ropts = {
.maxval = INT_MAX,
.defaultval = 1,
},
+ { .index = R_ZONED,
+ .conflicts = { { &ropts, R_EXTSIZE },
+ { NULL, LAST_CONFLICT } },
+ .minval = 0,
+ .maxval = 1,
+ .defaultval = 1,
+ },
+ { .index = R_START,
+ .conflicts = { { &ropts, R_DEV },
+ { NULL, LAST_CONFLICT } },
+ .convert = true,
+ .minval = 0,
+ .maxval = LLONG_MAX,
+ .defaultval = SUBOPT_NEEDS_VAL,
+ },
+ { .index = R_RESERVED,
+ .conflicts = { { NULL, LAST_CONFLICT } },
+ .convert = true,
+ .minval = 0,
+ .maxval = LLONG_MAX,
+ .defaultval = SUBOPT_NEEDS_VAL,
+ },
},
};
@@ -1012,6 +1042,8 @@ struct sb_feat_args {
bool nortalign;
bool nrext64;
bool exchrange; /* XFS_SB_FEAT_INCOMPAT_EXCHRANGE */
+ bool zoned;
+ bool zone_gaps;
uint16_t qflags;
};
@@ -1035,6 +1067,8 @@ struct cli_params {
char *lsu;
char *rtextsize;
char *rtsize;
+ char *rtstart;
+ uint64_t rtreserved;
/* parameters where 0 is a valid CLI value */
int dsunit;
@@ -1121,6 +1155,8 @@ struct mkfs_params {
char *label;
struct sb_feat_args sb_feat;
+ uint64_t rtstart;
+ uint64_t rtreserved;
};
/*
@@ -1172,7 +1208,7 @@ usage( void )
/* prototype file */ [-p fname]\n\
/* quiet */ [-q]\n\
/* realtime subvol */ [-r extsize=num,size=num,rtdev=xxx,rgcount=n,rgsize=n,\n\
- concurrency=num]\n\
+ concurrency=num,zoned=0|1,start=n,reserved=n]\n\
/* sectorsize */ [-s size=num]\n\
/* version */ [-V]\n\
devicename\n\
@@ -1539,6 +1575,34 @@ discard_blocks(int fd, uint64_t nsectors, int quiet)
printf("Done.\n");
}
+static void
+reset_zones(
+ struct mkfs_params *cfg,
+ int fd,
+ uint64_t start_sector,
+ uint64_t nsectors,
+ int quiet)
+{
+ struct blk_zone_range range = {
+ .sector = start_sector,
+ .nr_sectors = nsectors,
+ };
+
+ if (!quiet) {
+ printf("Resetting zones...");
+ fflush(stdout);
+ }
+
+ if (ioctl(fd, BLKRESETZONE, &range) < 0) {
+ if (!quiet)
+ printf(" FAILED (%d)\n", -errno);
+ exit(1);
+ }
+
+ if (!quiet)
+ printf("Done.\n");
+}
+
static __attribute__((noreturn)) void
illegal_option(
const char *value,
@@ -2144,6 +2208,15 @@ rtdev_opts_parser(
case R_CONCURRENCY:
set_rtvol_concurrency(opts, subopt, cli, value);
break;
+ case R_ZONED:
+ cli->sb_feat.zoned = getnum(value, opts, subopt);
+ break;
+ case R_START:
+ cli->rtstart = getstr(value, opts, subopt);
+ break;
+ case R_RESERVED:
+ cli->rtreserved = getnum(value, opts, subopt);
+ break;
default:
return -EINVAL;
}
@@ -2445,7 +2518,213 @@ _("Version 1 logs do not support sector size %d\n"),
_("log stripe unit specified, using v2 logs\n"));
cli->sb_feat.log_version = 2;
}
+}
+
+struct zone_info {
+ /* number of zones, conventional or sequential */
+ unsigned int nr_zones;
+ /* number of conventional zones */
+ unsigned int nr_conv_zones;
+
+ /* size of the address space for a zone, in 512b blocks */
+ xfs_daddr_t zone_size;
+ /* write capacity of a zone, in 512b blocks */
+ xfs_daddr_t zone_capacity;
+};
+
+struct zone_topology {
+ struct zone_info data;
+ struct zone_info rt;
+ struct zone_info log;
+};
+
+/* random size that allows efficient processing */
+#define ZONES_PER_IOCTL 16384
+
+static void
+report_zones(
+ const char *name,
+ struct zone_info *zi)
+{
+ struct blk_zone_report *rep;
+ bool found_seq = false;
+ int fd, ret = 0;
+ uint64_t device_size;
+ uint64_t sector = 0;
+ size_t rep_size;
+ unsigned int i, n = 0;
+ struct stat st;
+
+ fd = open(name, O_RDONLY);
+ if (fd < 0) {
+ fprintf(stderr, _("Failed to open RT device: %d.\n"), -errno);
+ exit(1);
+ }
+
+ if (fstat(fd, &st) < 0) {
+ ret = -EIO;
+ goto out_close;
+ }
+ if (!S_ISBLK(st.st_mode))
+ goto out_close;
+
+ if (ioctl(fd, BLKGETSIZE64, &device_size)) {
+ fprintf(stderr, _("Failed to get device size: %d.\n"), -errno);
+ exit(1);
+ }
+ if (ioctl(fd, BLKGETZONESZ, &zi->zone_size) || !zi->zone_size)
+ goto out_close; /* not zoned */
+
+ /* BLKGETSIZE64 reports a byte value */
+ device_size = BTOBB(device_size);
+ zi->nr_zones = device_size / zi->zone_size;
+ zi->nr_conv_zones = 0;
+
+ rep_size = sizeof(struct blk_zone_report) +
+ sizeof(struct blk_zone) * ZONES_PER_IOCTL;
+ rep = malloc(rep_size);
+ if (!rep) {
+ fprintf(stderr,
+_("Failed to allocate memory for zone reporting.\n"));
+ exit(1);
+ }
+
+ while (n < zi->nr_zones) {
+ struct blk_zone *zones = (struct blk_zone *)(rep + 1);
+
+ memset(rep, 0, rep_size);
+ rep->sector = sector;
+ rep->nr_zones = ZONES_PER_IOCTL;
+
+ ret = ioctl(fd, BLKREPORTZONE, rep);
+ if (ret) {
+ fprintf(stderr,
+_("ioctl(BLKREPORTZONE) failed: %d!\n"), -errno);
+ exit(1);
+ }
+ if (!rep->nr_zones)
+ break;
+
+ for (i = 0; i < rep->nr_zones; i++) {
+ if (n >= zi->nr_zones)
+ break;
+
+ if (zones[i].len != zi->zone_size) {
+ fprintf(stderr,
+_("Inconsistent zone size!\n"));
+ exit(1);
+ }
+
+ switch (zones[i].type) {
+ case BLK_ZONE_TYPE_CONVENTIONAL:
+ /*
+ * We can only use the conventional space at the
+ * start of the device for metadata, so don't
+ * count later conventional zones. This is
+ * not an error because we can use them for data
+ * just fine.
+ */
+ if (!found_seq)
+ zi->nr_conv_zones++;
+ break;
+ case BLK_ZONE_TYPE_SEQWRITE_REQ:
+ found_seq = true;
+ break;
+ case BLK_ZONE_TYPE_SEQWRITE_PREF:
+ fprintf(stderr,
+_("Sequential write preferred zones not supported.\n"));
+ exit(1);
+ default:
+ fprintf(stderr,
+_("Unknown zone type (0x%x) found.\n"), zones[i].type);
+ exit(1);
+ }
+
+ if (!n) {
+ zi->zone_capacity = zones[i].capacity;
+ if (zi->zone_capacity > zi->zone_size) {
+ fprintf(stderr,
+_("Zone capacity larger than zone size!\n"));
+ exit(1);
+ }
+ } else if (zones[i].capacity != zi->zone_capacity) {
+ fprintf(stderr,
+_("Inconsistent zone capacity!\n"));
+ exit(1);
+ }
+
+ n++;
+ }
+ sector = zones[rep->nr_zones - 1].start +
+ zones[rep->nr_zones - 1].len;
+ }
+
+ free(rep);
+out_close:
+ close(fd);
+}
+
+static void
+validate_zoned(
+ struct mkfs_params *cfg,
+ struct cli_params *cli,
+ struct mkfs_default_params *dft,
+ struct zone_topology *zt)
+{
+ if (!cli->xi->data.isfile) {
+ report_zones(cli->xi->data.name, &zt->data);
+ if (zt->data.nr_zones) {
+ if (!zt->data.nr_conv_zones) {
+ fprintf(stderr,
+_("Data device requires conventional zones.\n"));
+ usage();
+ }
+ if (zt->data.zone_capacity != zt->data.zone_size) {
+ fprintf(stderr,
+_("Zone capacity equal to Zone size required for conventional zones.\n"));
+ usage();
+ }
+
+ cli->sb_feat.zoned = true;
+ cfg->rtstart =
+ zt->data.nr_conv_zones * zt->data.zone_capacity;
+ }
+ }
+
+ if (cli->xi->rt.name && !cli->xi->rt.isfile) {
+ report_zones(cli->xi->rt.name, &zt->rt);
+ if (zt->rt.nr_zones && !cli->sb_feat.zoned)
+ cli->sb_feat.zoned = true;
+ if (zt->rt.zone_size != zt->rt.zone_capacity)
+ cli->sb_feat.zone_gaps = true;
+ }
+
+ if (cli->xi->log.name && !cli->xi->log.isfile) {
+ report_zones(cli->xi->log.name, &zt->log);
+ if (zt->log.nr_zones) {
+ fprintf(stderr,
+_("Zoned devices not supported as log device!\n"));
+ usage();
+ }
+ }
+ if (cli->rtstart) {
+ /*
+ * For zoned devices with conventional zones, cfg->rtstart is
+ * set to the start of the first sequential write required zone
+ * above. Don't allow the user to override it as that won't
+ * work.
+ */
+ if (cfg->rtstart) {
+ fprintf(stderr,
+_("rtstart override not allowed on zoned devices.\n"));
+ usage();
+ }
+ cfg->rtstart = getnum(cli->rtstart, &ropts, R_START) / 512;
+ }
+
+ if (cli->rtreserved)
+ cfg->rtreserved = cli->rtreserved;
}
/*
@@ -2670,7 +2949,37 @@ _("inode btree counters not supported without finobt support\n"));
cli->sb_feat.inobtcnt = false;
}
- if (cli->xi->rt.name) {
+ if (cli->sb_feat.zoned) {
+ if (!cli->sb_feat.metadir) {
+ if (cli_opt_set(&mopts, M_METADIR)) {
+ fprintf(stderr,
+_("zoned realtime device not supported without metadir support\n"));
+ usage();
+ }
+ cli->sb_feat.metadir = true;
+ }
+ if (cli->rtextsize) {
+ if (cli_opt_set(&ropts, R_EXTSIZE)) {
+ fprintf(stderr,
+_("rt extent size not supported on realtime devices with zoned mode\n"));
+ usage();
+ }
+ cli->rtextsize = 0;
+ }
+ } else {
+ if (cli->rtstart) {
+ fprintf(stderr,
+_("internal RT section only supported in zoned mode\n"));
+ usage();
+ }
+ if (cli->rtreserved) {
+ fprintf(stderr,
+_("reserved RT blocks only supported in zoned mode\n"));
+ usage();
+ }
+ }
+
+ if (cli->xi->rt.name || cfg->rtstart) {
if (cli->rtextsize && cli->sb_feat.reflink) {
if (cli_opt_set(&mopts, M_REFLINK)) {
fprintf(stderr,
@@ -2911,6 +3220,11 @@ validate_rtextsize(
usage();
}
cfg->rtextblocks = 1;
+ } else if (cli->sb_feat.zoned) {
+ /*
+ * Zoned mode only supports a rtextsize of 1.
+ */
+ cfg->rtextblocks = 1;
} else {
/*
* If realtime extsize has not been specified by the user,
@@ -3315,7 +3629,8 @@ _("log stripe unit (%d bytes) is too large (maximum is 256KiB)\n"
static void
open_devices(
struct mkfs_params *cfg,
- struct libxfs_init *xi)
+ struct libxfs_init *xi,
+ struct zone_topology *zt)
{
uint64_t sector_mask;
@@ -3330,6 +3645,34 @@ open_devices(
usage();
}
+ if (zt->data.nr_zones) {
+ zt->rt.zone_size = zt->data.zone_size;
+ zt->rt.zone_capacity = zt->data.zone_capacity;
+ zt->rt.nr_zones = zt->data.nr_zones - zt->data.nr_conv_zones;
+ } else if (cfg->sb_feat.zoned && !cfg->rtstart && !xi->rt.dev) {
+ /*
+ * By default reserve at least 1% of the total capacity (rounded up to
+ * the next power of two) for metadata, but match the minimum we
+ * enforce elsewhere. This matches what SMR HDDs provide.
+ */
+ uint64_t rt_target_size = max((xi->data.size + 99) / 100,
+ BTOBB(300 * 1024 * 1024));
+
+ cfg->rtstart = 1;
+ while (cfg->rtstart < rt_target_size)
+ cfg->rtstart <<= 1;
+ }
+
+ if (cfg->rtstart) {
+ if (cfg->rtstart >= xi->data.size) {
+ fprintf(stderr,
+ _("device size %lld too small for zoned allocator\n"), xi->data.size);
+ usage();
+ }
+ xi->rt.size = xi->data.size - cfg->rtstart;
+ xi->data.size = cfg->rtstart;
+ }
+
/*
* Ok, Linux only has a 1024-byte resolution on device _size_,
* and the sizes below are in basic 512-byte blocks,
@@ -3348,17 +3691,42 @@ open_devices(
static void
discard_devices(
+ struct mkfs_params *cfg,
struct libxfs_init *xi,
+ struct zone_topology *zt,
int quiet)
{
/*
* This function has to be called after libxfs has been initialized.
*/
- if (!xi->data.isfile)
- discard_blocks(xi->data.fd, xi->data.size, quiet);
- if (xi->rt.dev && !xi->rt.isfile)
- discard_blocks(xi->rt.fd, xi->rt.size, quiet);
+ if (!xi->data.isfile) {
+ uint64_t nsectors = xi->data.size;
+
+ if (cfg->rtstart && zt->data.nr_zones) {
+ /*
+ * Note that the zone reset here includes the LBA range
+ * for the data device.
+ *
+ * This is because doing a single zone reset all on the
+ * entire device (which the kernel automatically does
+ * for us for a full device range) is a lot faster than
+ * resetting each zone individually and resetting
+ * the conventional zones used for the data device is a
+ * no-op.
+ */
+ reset_zones(cfg, xi->data.fd, 0,
+ cfg->rtstart + xi->rt.size, quiet);
+ nsectors -= cfg->rtstart;
+ }
+ discard_blocks(xi->data.fd, nsectors, quiet);
+ }
+ if (xi->rt.dev && !xi->rt.isfile) {
+ if (zt->rt.nr_zones)
+ reset_zones(cfg, xi->rt.fd, 0, xi->rt.size, quiet);
+ else
+ discard_blocks(xi->rt.fd, xi->rt.size, quiet);
+ }
if (xi->log.dev && xi->log.dev != xi->data.dev && !xi->log.isfile)
discard_blocks(xi->log.fd, xi->log.size, quiet);
}
@@ -3477,11 +3845,12 @@ reported by the device (%u).\n"),
static void
validate_rtdev(
struct mkfs_params *cfg,
- struct cli_params *cli)
+ struct cli_params *cli,
+ struct zone_topology *zt)
{
struct libxfs_init *xi = cli->xi;
- if (!xi->rt.dev) {
+ if (!xi->rt.dev && !cfg->rtstart) {
if (cli->rtsize) {
fprintf(stderr,
_("size specified for non-existent rt subvolume\n"));
@@ -3501,7 +3870,7 @@ _("size specified for non-existent rt subvolume\n"));
if (cli->rtsize) {
if (cfg->rtblocks > DTOBT(xi->rt.size, cfg->blocklog)) {
fprintf(stderr,
-_("size %s specified for rt subvolume is too large, maxi->um is %lld blocks\n"),
+_("size %s specified for rt subvolume is too large, maximum is %lld blocks\n"),
cli->rtsize,
(long long)DTOBT(xi->rt.size, cfg->blocklog));
usage();
@@ -3512,6 +3881,9 @@ _("size %s specified for rt subvolume is too large, maxi->um is %lld blocks\n"),
reported by the device (%u).\n"),
cfg->sectorsize, xi->rt.bsize);
}
+ } else if (zt->rt.nr_zones) {
+ cfg->rtblocks = DTOBT(zt->rt.nr_zones * zt->rt.zone_capacity,
+ cfg->blocklog);
} else {
/* grab volume size */
cfg->rtblocks = DTOBT(xi->rt.size, cfg->blocklog);
@@ -4050,6 +4422,92 @@ _("rgsize (%s) not a multiple of fs blk size (%d)\n"),
NBBY * (cfg->blocksize - sizeof(struct xfs_rtbuf_blkinfo)));
}
+static void
+calculate_zone_geometry(
+ struct mkfs_params *cfg,
+ struct cli_params *cli,
+ struct libxfs_init *xi,
+ struct zone_topology *zt)
+{
+ if (cfg->rtblocks == 0) {
+ fprintf(stderr,
+_("empty zoned realtime device not supported.\n"));
+ usage();
+ }
+
+ if (zt->rt.nr_zones) {
+ /* The RT device has hardware zones */
+ cfg->rgsize = zt->rt.zone_capacity * 512;
+
+ if (cfg->rgsize % cfg->blocksize) {
+ fprintf(stderr,
+_("rgsize (%s) not a multiple of fs blk size (%d)\n"),
+ cli->rgsize, cfg->blocksize);
+ usage();
+ }
+ if (cli->rgsize) {
+ fprintf(stderr,
+_("rgsize (%s) may not be specified when the rt device is zoned\n"),
+ cli->rgsize);
+ usage();
+ }
+
+ cfg->rgsize /= cfg->blocksize;
+ cfg->rgcount = howmany(cfg->rtblocks, cfg->rgsize);
+
+ if (cli->rgcount > cfg->rgcount) {
+ fprintf(stderr,
+_("rgcount (%llu) is larger than hardware zone count (%llu)\n"),
+ (unsigned long long)cli->rgcount,
+ (unsigned long long)cfg->rgcount);
+ usage();
+ } else if (cli->rgcount && cli->rgcount < cfg->rgcount) {
+ /* constrain the rt device to the given rgcount */
+ cfg->rgcount = cli->rgcount;
+ }
+ } else {
+ /* No hardware zones */
+ if (cli->rgsize) {
+ /* User-specified rtgroup size */
+ cfg->rgsize = getnum(cli->rgsize, &ropts, R_RGSIZE);
+
+ /* Check specified rgsize is a multiple of blocksize. */
+ if (cfg->rgsize % cfg->blocksize) {
+ fprintf(stderr,
+_("rgsize (%s) not a multiple of fs blk size (%d)\n"),
+ cli->rgsize, cfg->blocksize);
+ usage();
+ }
+ cfg->rgsize /= cfg->blocksize;
+ cfg->rgcount = cfg->rtblocks / cfg->rgsize +
+ (cfg->rtblocks % cfg->rgsize != 0);
+ } else if (cli->rgcount) {
+ /* User-specified rtgroup count */
+ cfg->rgcount = cli->rgcount;
+ cfg->rgsize = cfg->rtblocks / cfg->rgcount +
+ (cfg->rtblocks % cfg->rgcount != 0);
+ } else {
+ /* 256MB zones just like typical SMR HDDs */
+ cfg->rgsize = MEGABYTES(256, cfg->blocklog);
+ cfg->rgcount = cfg->rtblocks / cfg->rgsize +
+ (cfg->rtblocks % cfg->rgsize != 0);
+ }
+ }
+
+ if (cfg->rgcount < XFS_MIN_ZONES) {
+ fprintf(stderr,
+_("realtime group count (%llu) must be greater than the minimum zone count (%u)\n"),
+ (unsigned long long)cfg->rgcount,
+ XFS_MIN_ZONES);
+ usage();
+ }
+
+ validate_rtgroup_geometry(cfg);
+
+ /* Zoned RT devices don't use the rtbitmap, and have no bitmap blocks */
+ cfg->rtbmblocks = 0;
+}
+
static void
calculate_imaxpct(
struct mkfs_params *cfg,
@@ -4213,6 +4671,14 @@ sb_set_features(
sbp->sb_rgblklog = libxfs_compute_rgblklog(sbp->sb_rgextents,
cfg->rtextblocks);
}
+
+ if (fp->zoned) {
+ sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_ZONED;
+ sbp->sb_rtstart = (cfg->rtstart * 512) / cfg->blocksize;
+ sbp->sb_rtreserved = cfg->rtreserved / cfg->blocksize;
+ }
+ if (fp->zone_gaps)
+ sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_ZONE_GAPS;
}
/*
@@ -4775,9 +5241,11 @@ prepare_devices(
(xfs_extlen_t)XFS_FSB_TO_BB(mp, cfg->logblocks),
&sbp->sb_uuid, cfg->sb_feat.log_version,
lsunit, XLOG_FMT, XLOG_INIT_CYCLE, false);
-
/* finally, check we can write the last block in the realtime area */
- if (mp->m_rtdev_targp->bt_bdev && cfg->rtblocks > 0) {
+ if (mp->m_rtdev_targp->bt_bdev &&
+ mp->m_rtdev_targp != mp->m_ddev_targp &&
+ cfg->rtblocks > 0 &&
+ !xfs_has_zoned(mp)) {
buf = alloc_write_buf(mp->m_rtdev_targp,
XFS_FSB_TO_BB(mp, cfg->rtblocks - 1LL),
BTOBB(cfg->blocksize));
@@ -5216,7 +5684,7 @@ main(
*/
},
};
-
+ struct zone_topology zt = {};
struct list_head buffer_list;
int error;
@@ -5318,6 +5786,7 @@ main(
sectorsize = cfg.sectorsize;
validate_log_sectorsize(&cfg, &cli, &dft, &ft);
+ validate_zoned(&cfg, &cli, &dft, &zt);
validate_sb_features(&cfg, &cli);
/*
@@ -5342,11 +5811,11 @@ main(
/*
* Open and validate the device configurations
*/
- open_devices(&cfg, &xi);
+ open_devices(&cfg, &xi, &zt);
validate_overwrite(xi.data.name, force_overwrite);
validate_datadev(&cfg, &cli);
validate_logdev(&cfg, &cli);
- validate_rtdev(&cfg, &cli);
+ validate_rtdev(&cfg, &cli, &zt);
calc_stripe_factors(&cfg, &cli, &ft);
/*
@@ -5357,7 +5826,10 @@ main(
*/
calculate_initial_ag_geometry(&cfg, &cli, &xi);
align_ag_geometry(&cfg);
- calculate_rtgroup_geometry(&cfg, &cli, &xi);
+ if (cfg.sb_feat.zoned)
+ calculate_zone_geometry(&cfg, &cli, &xi, &zt);
+ else
+ calculate_rtgroup_geometry(&cfg, &cli, &xi);
calculate_imaxpct(&cfg, &cli);
@@ -5410,8 +5882,13 @@ main(
/*
* All values have been validated, discard the old device layout.
*/
+ if (cli.sb_feat.zoned && !discard) {
+ fprintf(stderr,
+ _("-K not supported for zoned file systems.\n"));
+ return 1;
+ }
if (discard && !dry_run)
- discard_devices(&xi, quiet);
+ discard_devices(&cfg, &xi, &zt, quiet);
/*
* we need the libxfs buffer cache from here on in.
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 32/43] xfs_mkfs: calculate zone overprovisioning when specifying size
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (30 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 31/43] xfs_mkfs: support creating file system with zoned RT devices Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 33/43] xfs_mkfs: default to rtinherit=1 for zoned file systems Christoph Hellwig
` (11 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
When a size is specified for zoned file systems, calculate the required
overprovisioning to back the requested capacity.
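The adjustment can be sketched as below. This is a simplified illustration,
not the mkfs code: struct zone_adjust and adjust_zones are hypothetical names,
the number of reserved zones is passed in as a parameter rather than taken
from XFS_RESERVED_ZONES, and the guard against underflow is an addition of
this sketch:

```c
#include <stdint.h>

struct zone_adjust {
	uint64_t	rgcount;	/* zones after adding the reserve */
	uint64_t	rtblocks;	/* total RT blocks backing them */
	uint64_t	slack;		/* unused blocks in the last user zone */
};

/*
 * Add reserved zones on top of the zones needed for the requested size,
 * capped at max_zones.  All sizes are in file system blocks.
 */
static struct zone_adjust
adjust_zones(
	uint64_t	rtblocks,	/* requested writable blocks */
	uint64_t	rgsize,		/* blocks per zone */
	uint64_t	max_zones,	/* zones available on the device */
	uint64_t	reserved_zones)	/* assumed overprovisioning */
{
	struct zone_adjust	za;

	/* zones needed for the request, rounded up */
	za.rgcount = (rtblocks + rgsize - 1) / rgsize + reserved_zones;
	if (za.rgcount > max_zones)
		za.rgcount = max_zones;

	za.rtblocks = za.rgcount * rgsize;
	if (za.rtblocks < rtblocks)
		za.slack = 0;	/* request cannot be fully backed */
	else
		za.slack = (za.rtblocks - rtblocks) % rgsize;
	return za;
}
```

For example, a request of 1000000 blocks with 65536-block zones and an assumed
reserve of 3 zones gives 19 zones and a slack of 48576 blocks, which the patch
adds to the reserved space so that the visible capacity matches the request.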
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
mkfs/xfs_mkfs.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 8b7a0b617d3e..5cd100c87f43 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -4422,6 +4422,49 @@ _("rgsize (%s) not a multiple of fs blk size (%d)\n"),
NBBY * (cfg->blocksize - sizeof(struct xfs_rtbuf_blkinfo)));
}
+/*
+ * If we're creating a zoned filesystem and the user specified a size, add
+ * enough over-provisioning to be able to back the requested amount of
+ * writable space.
+ */
+static void
+adjust_nr_zones(
+ struct mkfs_params *cfg,
+ struct cli_params *cli,
+ struct libxfs_init *xi,
+ struct zone_topology *zt)
+{
+ uint64_t new_rtblocks, slack;
+ unsigned int max_zones;
+
+ if (zt->rt.nr_zones)
+ max_zones = zt->rt.nr_zones;
+ else
+ max_zones = DTOBT(xi->rt.size, cfg->blocklog) / cfg->rgsize;
+
+ if (!cli->rgcount)
+ cfg->rgcount += XFS_RESERVED_ZONES;
+ if (cfg->rgcount > max_zones) {
+ cfg->rgcount = max_zones;
+ fprintf(stderr,
+_("Warning: not enough zones for backing requested rt size due to\n"
+ "over-provisioning needs, writable size will be less than %s\n"),
+ cli->rtsize);
+ }
+ new_rtblocks = (cfg->rgcount * cfg->rgsize);
+ slack = (new_rtblocks - cfg->rtblocks) % cfg->rgsize;
+
+ cfg->rtblocks = new_rtblocks;
+ cfg->rtextents = cfg->rtblocks / cfg->rtextblocks;
+
+ /*
+ * Add the slack to the end of the last zone to the reserved blocks.
+ * This ensures the visible user capacity is exactly the one that the
+ * user asked for.
+ */
+ cfg->rtreserved += (slack * cfg->blocksize);
+}
+
static void
calculate_zone_geometry(
struct mkfs_params *cfg,
@@ -4494,6 +4537,9 @@ _("rgsize (%s) not a multiple of fs blk size (%d)\n"),
}
}
+ if (cli->rtsize || cli->rgcount)
+ adjust_nr_zones(cfg, cli, xi, zt);
+
if (cfg->rgcount < XFS_MIN_ZONES) {
fprintf(stderr,
_("realtime group count (%llu) must be greater than the minimum zone count (%u)\n"),
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 33/43] xfs_mkfs: default to rtinherit=1 for zoned file systems
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (31 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 32/43] xfs_mkfs: calculate zone overprovisioning when specifying size Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-15 0:37 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 34/43] xfs_mkfs: reflink conflicts with zoned file systems for now Christoph Hellwig
` (10 subsequent siblings)
43 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Zoned file systems are intended to use sequential write required zones
(or areas treated as such) for the main data store, and usually use the
data device only for metadata that requires random writes.
rtinherit=1 is the way to achieve that, so enable it by default, but
still allow the user to override it if needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mkfs/xfs_mkfs.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 5cd100c87f43..e6adb345551e 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2966,6 +2966,13 @@ _("rt extent size not supported on realtime devices with zoned mode\n"));
}
cli->rtextsize = 0;
}
+
+ /*
+ * Set the rtinherit by default for zoned file systems as they
+ * usually use the data device purely as a metadata container.
+ */
+ if (!cli_opt_set(&dopts, D_RTINHERIT))
+ cli->fsx.fsx_xflags |= FS_XFLAG_RTINHERIT;
} else {
if (cli->rtstart) {
fprintf(stderr,
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 34/43] xfs_mkfs: reflink conflicts with zoned file systems for now
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (32 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 33/43] xfs_mkfs: default to rtinherit=1 for zoned file systems Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 35/43] xfs_mkfs: document the new zoned options in the man page Christoph Hellwig
` (9 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Don't allow reflink on zoned file systems until garbage collection learns
how to deal with shared extents and doesn't blindly unshare them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
mkfs/xfs_mkfs.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index e6adb345551e..9d84dc9cb426 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2966,6 +2966,14 @@ _("rt extent size not supported on realtime devices with zoned mode\n"));
}
cli->rtextsize = 0;
}
+ if (cli->sb_feat.reflink) {
+ if (cli_opt_set(&mopts, M_REFLINK)) {
+ fprintf(stderr,
+_("reflink not supported on realtime devices with zoned mode specified\n"));
+ usage();
+ }
+ cli->sb_feat.reflink = false;
+ }
/*
* Set the rtinherit by default for zoned file systems as they
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 35/43] xfs_mkfs: document the new zoned options in the man page
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (33 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 34/43] xfs_mkfs: reflink conflicts with zoned file systems for now Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 36/43] man: document XFS_FSOP_GEOM_FLAGS_ZONED Christoph Hellwig
` (8 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Add documentation for the zoned file system specific options.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
man/man8/mkfs.xfs.8.in | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index 37e3a88e7ac7..bc80493187f6 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -1248,6 +1248,23 @@ The magic value of
.I 0
forces use of the older rtgroups geometry calculations that is used for
mechanical storage.
+.TP
+.BI zoned= value
+Controls if the zoned allocator is used for the realtime device.
+The value is either 0 to disable the feature, or 1 to enable it.
+Defaults to 1 for zoned block devices, else 0.
+.TP
+.BI start= value
+Controls the start of the internal realtime section. Defaults to 0
+for conventional block devices, or the start of the first sequential write
+required zone for zoned block devices.
+This option is only valid if the zoned realtime allocator is used.
+.TP
+.BI reserved= value
+Controls the amount of space in the realtime section that is reserved for
+internal use by garbage collection and reorganization algorithms.
+Defaults to 0 if not set.
+This option is only valid if the zoned realtime allocator is used.
.RE
.PP
.PD 0
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 36/43] man: document XFS_FSOP_GEOM_FLAGS_ZONED
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (34 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 35/43] xfs_mkfs: document the new zoned options in the man page Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-15 0:36 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 37/43] xfs_io: correctly report RGs with internal rt dev in bmap output Christoph Hellwig
` (7 subsequent siblings)
43 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Document the new zoned feature flag and the two new fields added
with it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
man/man2/ioctl_xfs_fsgeometry.2 | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/man/man2/ioctl_xfs_fsgeometry.2 b/man/man2/ioctl_xfs_fsgeometry.2
index 502054f391e9..037f8e15e415 100644
--- a/man/man2/ioctl_xfs_fsgeometry.2
+++ b/man/man2/ioctl_xfs_fsgeometry.2
@@ -50,7 +50,9 @@ struct xfs_fsop_geom {
__u32 sick;
__u32 checked;
__u64 rgextents;
- __u64 reserved[16];
+ __u64 rtstart;
+ __u64 rtreserved;
+ __u64 reserved[14];
};
.fi
.in
@@ -143,6 +145,20 @@ for more details.
.I rgextents
Is the number of RT extents in each rtgroup.
.PP
+.I rtstart
+Start of the internal RT device in fsblocks. 0 if an external RT device
+is used.
+This field is meaningful only if the flag
+.B XFS_FSOP_GEOM_FLAGS_ZONED
+is set.
+.PP
+.I rtreserved
+The amount of space in the realtime section that is reserved for internal use
+by garbage collection and reorganization algorithms in fsblocks.
+This field is meaningful only if the flag
+.B XFS_FSOP_GEOM_FLAGS_ZONED
+is set.
+.PP
.I reserved
is set to zero.
.SH FILESYSTEM FEATURE FLAGS
@@ -221,6 +237,9 @@ Filesystem can exchange file contents atomically via XFS_IOC_EXCHANGE_RANGE.
.TP
.B XFS_FSOP_GEOM_FLAGS_METADIR
Filesystem contains a metadata directory tree.
+.TP
+.B XFS_FSOP_GEOM_FLAGS_ZONED
+Filesystem uses the zoned allocator for the RT device.
.RE
.SH XFS METADATA HEALTH REPORTING
.PP
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 37/43] xfs_io: correctly report RGs with internal rt dev in bmap output
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (35 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 36/43] man: document XFS_FSOP_GEOM_FLAGS_ZONED Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 38/43] xfs_io: don't re-query fs_path information in fsmap_f Christoph Hellwig
` (6 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Apply the rtstart offset when computing the RT group number and offset for
internal RT devices. Somehow this made gcc complain about possibly
overflowing abuf, so increase its size as well.
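The offset handling can be sketched as follows. This is a standalone
illustration in 512-byte units; struct rg_loc and locate_in_group are
hypothetical names, not part of xfs_io:

```c
#include <stdint.h>

struct rg_loc {
	int64_t		rgno;	/* RT group number */
	int64_t		rgoff;	/* offset within the group */
};

/*
 * Map a device block address to a group number and offset.  For an
 * internal RT device the RT section starts at bstart, so that has to be
 * subtracted first; for the data device bstart is 0.
 */
static struct rg_loc
locate_in_group(
	int64_t		bmv_block,
	int64_t		bstart,
	int64_t		bbperag)
{
	struct rg_loc	loc;
	int64_t		bno = bmv_block - bstart;

	loc.rgno = bno / bbperag;
	loc.rgoff = bno % bbperag;
	return loc;
}
```

With bstart = 1048576 and bbperag = 524288, block 2097252 maps to group 2 at
offset 100. Without subtracting bstart, the same block would be attributed to
group 4.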
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
io/bmap.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/io/bmap.c b/io/bmap.c
index b2f6b4905285..944f658b35f0 100644
--- a/io/bmap.c
+++ b/io/bmap.c
@@ -257,18 +257,21 @@ bmap_f(
#define FLG_BSW 0000010 /* Not on begin of stripe width */
#define FLG_ESW 0000001 /* Not on end of stripe width */
int agno;
- off_t agoff, bbperag;
+ off_t agoff, bbperag, bstart;
int foff_w, boff_w, aoff_w, tot_w, agno_w;
- char rbuf[32], bbuf[32], abuf[32];
+ char rbuf[32], bbuf[32], abuf[64];
int sunit, swidth;
foff_w = boff_w = aoff_w = MINRANGE_WIDTH;
tot_w = MINTOT_WIDTH;
if (is_rt) {
+ bstart = fsgeo.rtstart *
+ (fsgeo.blocksize / BBSIZE);
bbperag = bytes_per_rtgroup(&fsgeo) / BBSIZE;
sunit = 0;
swidth = 0;
} else {
+ bstart = 0;
bbperag = (off_t)fsgeo.agblocks *
(off_t)fsgeo.blocksize / BBSIZE;
sunit = (fsgeo.sunit * fsgeo.blocksize) / BBSIZE;
@@ -298,9 +301,11 @@ bmap_f(
map[i + 1].bmv_length - 1LL));
boff_w = max(boff_w, strlen(bbuf));
if (bbperag > 0) {
- agno = map[i + 1].bmv_block / bbperag;
- agoff = map[i + 1].bmv_block -
- (agno * bbperag);
+ off_t bno;
+
+ bno = map[i + 1].bmv_block - bstart;
+ agno = bno / bbperag;
+ agoff = bno % bbperag;
snprintf(abuf, sizeof(abuf),
"(%lld..%lld)",
(long long)agoff,
@@ -387,9 +392,11 @@ bmap_f(
printf("%4d: %-*s %-*s", i, foff_w, rbuf,
boff_w, bbuf);
if (bbperag > 0) {
- agno = map[i + 1].bmv_block / bbperag;
- agoff = map[i + 1].bmv_block -
- (agno * bbperag);
+ off_t bno;
+
+ bno = map[i + 1].bmv_block - bstart;
+ agno = bno / bbperag;
+ agoff = bno % bbperag;
snprintf(abuf, sizeof(abuf),
"(%lld..%lld)",
(long long)agoff,
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 38/43] xfs_io: don't re-query fs_path information in fsmap_f
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (36 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 37/43] xfs_io: correctly report RGs with internal rt dev in bmap output Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 39/43] xfs_io: handle internal RT devices in fsmap output Christoph Hellwig
` (5 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Reuse the information stashed in "file" instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
io/fsmap.c | 25 ++++++++-----------------
1 file changed, 8 insertions(+), 17 deletions(-)
diff --git a/io/fsmap.c b/io/fsmap.c
index 6de720f238bb..6a87e8972f26 100644
--- a/io/fsmap.c
+++ b/io/fsmap.c
@@ -14,6 +14,7 @@
static cmdinfo_t fsmap_cmd;
static dev_t xfs_data_dev;
+static dev_t xfs_log_dev;
static dev_t xfs_rt_dev;
static void
@@ -405,8 +406,6 @@ fsmap_f(
int c;
unsigned long long nr = 0;
size_t fsblocksize, fssectsize;
- struct fs_path *fs;
- static bool tab_init;
bool dumped_flags = false;
int dflag, lflag, rflag;
@@ -491,15 +490,19 @@ fsmap_f(
return 0;
}
+ xfs_data_dev = file->fs_path.fs_datadev;
+ xfs_log_dev = file->fs_path.fs_logdev;
+ xfs_rt_dev = file->fs_path.fs_rtdev;
+
memset(head, 0, sizeof(*head));
l = head->fmh_keys;
h = head->fmh_keys + 1;
if (dflag) {
- l->fmr_device = h->fmr_device = file->fs_path.fs_datadev;
+ l->fmr_device = h->fmr_device = xfs_data_dev;
} else if (lflag) {
- l->fmr_device = h->fmr_device = file->fs_path.fs_logdev;
+ l->fmr_device = h->fmr_device = xfs_log_dev;
} else if (rflag) {
- l->fmr_device = h->fmr_device = file->fs_path.fs_rtdev;
+ l->fmr_device = h->fmr_device = xfs_rt_dev;
} else {
l->fmr_device = 0;
h->fmr_device = UINT_MAX;
@@ -510,18 +513,6 @@ fsmap_f(
h->fmr_flags = UINT_MAX;
h->fmr_offset = ULLONG_MAX;
- /*
- * If this is an XFS filesystem, remember the data device.
- * (We report AG number/block for data device extents on XFS).
- */
- if (!tab_init) {
- fs_table_initialise(0, NULL, 0, NULL);
- tab_init = true;
- }
- fs = fs_table_lookup(file->name, FS_MOUNT_POINT);
- xfs_data_dev = fs ? fs->fs_datadev : 0;
- xfs_rt_dev = fs ? fs->fs_rtdev : 0;
-
head->fmh_count = map_size;
do {
/* Get some extents */
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 39/43] xfs_io: handle internal RT devices in fsmap output
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (37 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 38/43] xfs_io: don't re-query fs_path information in fsmap_f Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 40/43] xfs_spaceman: handle internal RT devices Christoph Hellwig
` (4 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Deal with the synthetic fmr_device values and the rt device offset when
calculating RG numbers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
io/fsmap.c | 33 ++++++++++++++++++++++++++-------
1 file changed, 26 insertions(+), 7 deletions(-)
diff --git a/io/fsmap.c b/io/fsmap.c
index 6a87e8972f26..005d32e500a0 100644
--- a/io/fsmap.c
+++ b/io/fsmap.c
@@ -247,8 +247,13 @@ dump_map_verbose(
(long long)BTOBBT(agoff),
(long long)BTOBBT(agoff + p->fmr_length - 1));
} else if (p->fmr_device == xfs_rt_dev && fsgeo->rgcount > 0) {
- agno = p->fmr_physical / bperrtg;
- agoff = p->fmr_physical % bperrtg;
+ uint64_t start = p->fmr_physical -
+ fsgeo->rtstart * fsgeo->blocksize;
+
+ agno = start / bperrtg;
+ if (agno < 0)
+ agno = -1;
+ agoff = start % bperrtg;
snprintf(abuf, sizeof(abuf),
"(%lld..%lld)",
(long long)BTOBBT(agoff),
@@ -326,8 +331,13 @@ dump_map_verbose(
"%lld",
(long long)agno);
} else if (p->fmr_device == xfs_rt_dev && fsgeo->rgcount > 0) {
- agno = p->fmr_physical / bperrtg;
- agoff = p->fmr_physical % bperrtg;
+ uint64_t start = p->fmr_physical -
+ fsgeo->rtstart * fsgeo->blocksize;
+
+ agno = start / bperrtg;
+ if (agno < 0)
+ agno = -1;
+ agoff = start % bperrtg;
snprintf(abuf, sizeof(abuf),
"(%lld..%lld)",
(long long)BTOBBT(agoff),
@@ -490,9 +500,18 @@ fsmap_f(
return 0;
}
- xfs_data_dev = file->fs_path.fs_datadev;
- xfs_log_dev = file->fs_path.fs_logdev;
- xfs_rt_dev = file->fs_path.fs_rtdev;
+ /*
+ * File systems with internal rt device use synthetic device values.
+ */
+ if (file->geom.rtstart) {
+ xfs_data_dev = XFS_DEV_DATA;
+ xfs_log_dev = XFS_DEV_LOG;
+ xfs_rt_dev = XFS_DEV_RT;
+ } else {
+ xfs_data_dev = file->fs_path.fs_datadev;
+ xfs_log_dev = file->fs_path.fs_logdev;
+ xfs_rt_dev = file->fs_path.fs_rtdev;
+ }
memset(head, 0, sizeof(*head));
l = head->fmh_keys;
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
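The last hunk of the patch above picks which device numbers the fsmap records are compared against: the synthetic values when the geometry reports a non-zero rtstart, the real dev_t values otherwise. A minimal sketch of that selection is shown here. The EX_DEV_* constants stand in for XFS_DEV_DATA, XFS_DEV_LOG and XFS_DEV_RT, and their numeric values are an assumption made for illustration. The struct ex_devs type and the ex_pick_fsmap_devs name are hypothetical.

```c
#include <stdint.h>

/*
 * Stand-ins for XFS_DEV_DATA, XFS_DEV_LOG and XFS_DEV_RT.  The numeric
 * values are assumed for this sketch and are not taken from the patch.
 */
enum {
	EX_DEV_DATA	= 1,
	EX_DEV_LOG	= 2,
	EX_DEV_RT	= 3,
};

/* Hypothetical container for the three device ids used by the output code. */
struct ex_devs {
	uint32_t	data;
	uint32_t	log;
	uint32_t	rt;
};

/*
 * Return the device ids that fsmap records should be matched against.
 * A non-zero rtstart means the RT section lives on the data device, in
 * which case the synthetic ids are used instead of the real ones.
 */
static struct ex_devs
ex_pick_fsmap_devs(uint64_t rtstart, struct ex_devs real)
{
	struct ex_devs	devs = real;

	if (rtstart) {
		devs.data = EX_DEV_DATA;
		devs.log = EX_DEV_LOG;
		devs.rt = EX_DEV_RT;
	}
	return devs;
}
```

Making the choice once up front means the per-record comparisons in the output code do not need to know which layout is in use.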
* [PATCH 40/43] xfs_spaceman: handle internal RT devices
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (38 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 39/43] xfs_io: handle internal RT devices in fsmap output Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 41/43] xfs_scrub: support internal RT device Christoph Hellwig
` (3 subsequent siblings)
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Handle the synthetic fmr_device values for fsmap.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
spaceman/freesp.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/spaceman/freesp.c b/spaceman/freesp.c
index dfbec52a7160..9ad321c4843f 100644
--- a/spaceman/freesp.c
+++ b/spaceman/freesp.c
@@ -140,12 +140,19 @@ scan_ag(
if (agno != NULLAGNUMBER) {
l->fmr_physical = cvt_agbno_to_b(xfd, agno, 0);
h->fmr_physical = cvt_agbno_to_b(xfd, agno + 1, 0);
- l->fmr_device = h->fmr_device = file->fs_path.fs_datadev;
+ if (file->xfd.fsgeom.rtstart)
+ l->fmr_device = XFS_DEV_DATA;
+ else
+ l->fmr_device = file->fs_path.fs_datadev;
} else {
l->fmr_physical = 0;
h->fmr_physical = ULLONG_MAX;
- l->fmr_device = h->fmr_device = file->fs_path.fs_rtdev;
+ if (file->xfd.fsgeom.rtstart)
+ l->fmr_device = XFS_DEV_RT;
+ else
+ l->fmr_device = file->fs_path.fs_rtdev;
}
+ h->fmr_device = l->fmr_device;
h->fmr_owner = ULLONG_MAX;
h->fmr_flags = UINT_MAX;
h->fmr_offset = ULLONG_MAX;
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
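The scan_ag hunk above sets the device on the low key and then copies it to the high key, so both ends of the query range always name the same device. A rough sketch of that pattern follows, using a simplified stand-in for the key structure. The struct ex_key type and the ex_fill_keys name are hypothetical, and only the fields touched in the hunk are modelled.

```c
#include <stdint.h>

/* Simplified stand-in for the key fields that scan_ag() fills in. */
struct ex_key {
	uint32_t	device;
	uint32_t	flags;
	uint64_t	physical;
	uint64_t	owner;
	uint64_t	offset;
};

/*
 * Fill a low/high key pair covering [start, end] on a single device.
 * The high key takes its device from the low key, and its remaining
 * fields are set to their maximum so every record in range matches.
 */
static void
ex_fill_keys(struct ex_key *low, struct ex_key *high, uint32_t device,
		uint64_t start, uint64_t end)
{
	low->device = device;
	low->flags = 0;
	low->physical = start;
	low->owner = 0;
	low->offset = 0;

	high->device = low->device;
	high->flags = UINT32_MAX;
	high->physical = end;
	high->owner = UINT64_MAX;
	high->offset = UINT64_MAX;
}
```

The caller would pass either the synthetic or the real device number, depending on whether the geometry reports an internal RT section.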
* [PATCH 41/43] xfs_scrub: support internal RT device
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (39 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 40/43] xfs_spaceman: handle internal RT devices Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-15 0:35 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 42/43] xfs_mdrestore: support internal RT devices Christoph Hellwig
` (2 subsequent siblings)
43 siblings, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Handle the synthetic fmr_device values, and deal with the fact that
ctx->fsinfo.fs_rt is allowed to be non-NULL for internal RT devices as
it is the same as the data device in this case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
scrub/phase1.c | 3 ++-
scrub/phase6.c | 65 ++++++++++++++++++++++++++++--------------------
scrub/phase7.c | 28 +++++++++++++++------
scrub/spacemap.c | 17 +++++++++----
4 files changed, 72 insertions(+), 41 deletions(-)
diff --git a/scrub/phase1.c b/scrub/phase1.c
index d03a9099a217..10e9aa1892b7 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -341,7 +341,8 @@ _("Kernel metadata repair facility is not available. Use -n to scrub."));
_("Unable to find log device path."));
return ECANCELED;
}
- if (ctx->mnt.fsgeom.rtblocks && ctx->fsinfo.fs_rt == NULL) {
+ if (ctx->mnt.fsgeom.rtblocks && ctx->fsinfo.fs_rt == NULL &&
+ !ctx->mnt.fsgeom.rtstart) {
str_error(ctx, ctx->mntpoint,
_("Unable to find realtime device path."));
return ECANCELED;
diff --git a/scrub/phase6.c b/scrub/phase6.c
index 2a52b2c92419..abf6f9713f1a 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -56,12 +56,21 @@ dev_to_pool(
struct media_verify_state *vs,
dev_t dev)
{
- if (dev == ctx->fsinfo.fs_datadev)
- return vs->rvp_data;
- else if (dev == ctx->fsinfo.fs_logdev)
- return vs->rvp_log;
- else if (dev == ctx->fsinfo.fs_rtdev)
- return vs->rvp_realtime;
+ if (ctx->mnt.fsgeom.rtstart) {
+ if (dev == XFS_DEV_DATA)
+ return vs->rvp_data;
+ if (dev == XFS_DEV_LOG)
+ return vs->rvp_log;
+ if (dev == XFS_DEV_RT)
+ return vs->rvp_realtime;
+ } else {
+ if (dev == ctx->fsinfo.fs_datadev)
+ return vs->rvp_data;
+ if (dev == ctx->fsinfo.fs_logdev)
+ return vs->rvp_log;
+ if (dev == ctx->fsinfo.fs_rtdev)
+ return vs->rvp_realtime;
+ }
abort();
}
@@ -71,12 +80,21 @@ disk_to_dev(
struct scrub_ctx *ctx,
struct disk *disk)
{
- if (disk == ctx->datadev)
- return ctx->fsinfo.fs_datadev;
- else if (disk == ctx->logdev)
- return ctx->fsinfo.fs_logdev;
- else if (disk == ctx->rtdev)
- return ctx->fsinfo.fs_rtdev;
+ if (ctx->mnt.fsgeom.rtstart) {
+ if (disk == ctx->datadev)
+ return XFS_DEV_DATA;
+ if (disk == ctx->logdev)
+ return XFS_DEV_LOG;
+ if (disk == ctx->rtdev)
+ return XFS_DEV_RT;
+ } else {
+ if (disk == ctx->datadev)
+ return ctx->fsinfo.fs_datadev;
+ if (disk == ctx->logdev)
+ return ctx->fsinfo.fs_logdev;
+ if (disk == ctx->rtdev)
+ return ctx->fsinfo.fs_rtdev;
+ }
abort();
}
@@ -87,11 +105,9 @@ bitmap_for_disk(
struct disk *disk,
struct media_verify_state *vs)
{
- dev_t dev = disk_to_dev(ctx, disk);
-
- if (dev == ctx->fsinfo.fs_datadev)
+ if (disk == ctx->datadev)
return vs->d_bad;
- else if (dev == ctx->fsinfo.fs_rtdev)
+ if (disk == ctx->rtdev)
return vs->r_bad;
return NULL;
}
@@ -501,14 +517,11 @@ report_ioerr(
.length = length,
};
struct disk_ioerr_report *dioerr = arg;
- dev_t dev;
-
- dev = disk_to_dev(dioerr->ctx, dioerr->disk);
/* Go figure out which blocks are bad from the fsmap. */
- keys[0].fmr_device = dev;
+ keys[0].fmr_device = disk_to_dev(dioerr->ctx, dioerr->disk);
keys[0].fmr_physical = start;
- keys[1].fmr_device = dev;
+ keys[1].fmr_device = keys[0].fmr_device;
keys[1].fmr_physical = start + length - 1;
keys[1].fmr_owner = ULLONG_MAX;
keys[1].fmr_offset = ULLONG_MAX;
@@ -675,14 +688,12 @@ remember_ioerr(
int ret;
if (!length) {
- dev_t dev = disk_to_dev(ctx, disk);
-
- if (dev == ctx->fsinfo.fs_datadev)
+ if (disk == ctx->datadev)
vs->d_trunc = true;
- else if (dev == ctx->fsinfo.fs_rtdev)
- vs->r_trunc = true;
- else if (dev == ctx->fsinfo.fs_logdev)
+ else if (disk == ctx->logdev)
vs->l_trunc = true;
+ else if (disk == ctx->rtdev)
+ vs->r_trunc = true;
return;
}
diff --git a/scrub/phase7.c b/scrub/phase7.c
index 01097b678798..e25502668b1c 100644
--- a/scrub/phase7.c
+++ b/scrub/phase7.c
@@ -68,25 +68,37 @@ count_block_summary(
void *arg)
{
struct summary_counts *counts;
+ bool is_rt = false;
unsigned long long len;
int ret;
+ if (ctx->mnt.fsgeom.rtstart) {
+ if (fsmap->fmr_device == XFS_DEV_LOG)
+ return 0;
+ if (fsmap->fmr_device == XFS_DEV_RT)
+ is_rt = true;
+ } else {
+ if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
+ return 0;
+ if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev)
+ is_rt = true;
+ }
+
counts = ptvar_get((struct ptvar *)arg, &ret);
if (ret) {
str_liberror(ctx, -ret, _("retrieving summary counts"));
return -ret;
}
- if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
- return 0;
+
if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
fsmap->fmr_owner == XFS_FMR_OWN_FREE) {
uint64_t blocks;
blocks = cvt_b_to_off_fsbt(&ctx->mnt, fsmap->fmr_length);
- if (fsmap->fmr_device == ctx->fsinfo.fs_datadev)
- hist_add(&counts->datadev_hist, blocks);
- else if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev)
+ if (is_rt)
hist_add(&counts->rtdev_hist, blocks);
+ else
+ hist_add(&counts->datadev_hist, blocks);
return 0;
}
@@ -94,10 +106,10 @@ count_block_summary(
/* freesp btrees live in free space, need to adjust counters later. */
if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
- fsmap->fmr_owner == XFS_FMR_OWN_AG) {
+ fsmap->fmr_owner == XFS_FMR_OWN_AG)
counts->agbytes += fsmap->fmr_length;
- }
- if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev) {
+
+ if (is_rt) {
/* Count realtime extents. */
counts->rbytes += len;
} else {
diff --git a/scrub/spacemap.c b/scrub/spacemap.c
index c293ab44a528..1ee4d1946d3d 100644
--- a/scrub/spacemap.c
+++ b/scrub/spacemap.c
@@ -103,9 +103,12 @@ scan_ag_rmaps(
bperag = (off_t)ctx->mnt.fsgeom.agblocks *
(off_t)ctx->mnt.fsgeom.blocksize;
- keys[0].fmr_device = ctx->fsinfo.fs_datadev;
+ if (ctx->mnt.fsgeom.rtstart)
+ keys[0].fmr_device = XFS_DEV_DATA;
+ else
+ keys[0].fmr_device = ctx->fsinfo.fs_datadev;
keys[0].fmr_physical = agno * bperag;
- keys[1].fmr_device = ctx->fsinfo.fs_datadev;
+ keys[1].fmr_device = keys[0].fmr_device;
keys[1].fmr_physical = ((agno + 1) * bperag) - 1;
keys[1].fmr_owner = ULLONG_MAX;
keys[1].fmr_offset = ULLONG_MAX;
@@ -140,9 +143,12 @@ scan_rtg_rmaps(
off_t bperrg = bytes_per_rtgroup(&ctx->mnt.fsgeom);
int ret;
- keys[0].fmr_device = ctx->fsinfo.fs_rtdev;
+ if (ctx->mnt.fsgeom.rtstart)
+ keys[0].fmr_device = XFS_DEV_RT;
+ else
+ keys[0].fmr_device = ctx->fsinfo.fs_rtdev;
keys[0].fmr_physical = (xfs_rtblock_t)rgno * bperrg;
- keys[1].fmr_device = ctx->fsinfo.fs_rtdev;
+ keys[1].fmr_device = keys[0].fmr_device;
keys[1].fmr_physical = ((rgno + 1) * bperrg) - 1;
keys[1].fmr_owner = ULLONG_MAX;
keys[1].fmr_offset = ULLONG_MAX;
@@ -216,7 +222,8 @@ scan_log_rmaps(
{
struct scrub_ctx *ctx = (struct scrub_ctx *)wq->wq_ctx;
- scan_dev_rmaps(ctx, ctx->fsinfo.fs_logdev, arg);
+ scan_dev_rmaps(ctx, ctx->mnt.fsgeom.rtstart ? 2 : ctx->fsinfo.fs_logdev,
+ arg);
}
/*
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
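In the phase7.c hunks above, count_block_summary decides up front whether a record belongs to the log (skipped), the RT section, or the data section, and the later code only looks at that result. A minimal sketch of the classification is given below. The EX_DEV_* values are assumed stand-ins for the XFS_DEV_* constants, and the enum ex_space, struct ex_devids and ex_classify names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-ins for XFS_DEV_LOG and XFS_DEV_RT. */
enum {
	EX_DEV_LOG	= 2,
	EX_DEV_RT	= 3,
};

/* Hypothetical classification result. */
enum ex_space {
	EX_SPACE_SKIP,	/* log device: not counted */
	EX_SPACE_DATA,
	EX_SPACE_RT,
};

/* Hypothetical holder for the real device numbers of the mount. */
struct ex_devids {
	uint32_t	data;
	uint32_t	log;
	uint32_t	rt;
};

/*
 * Classify one fsmap record by its device field.  With an internal RT
 * section the record carries synthetic ids, otherwise real device numbers.
 * Anything that is neither log nor RT is counted as data, as in the patch.
 */
static enum ex_space
ex_classify(uint32_t dev, bool internal_rt, const struct ex_devids *real)
{
	uint32_t	logdev = internal_rt ? EX_DEV_LOG : real->log;
	uint32_t	rtdev = internal_rt ? EX_DEV_RT : real->rt;

	if (dev == logdev)
		return EX_SPACE_SKIP;
	if (dev == rtdev)
		return EX_SPACE_RT;
	return EX_SPACE_DATA;
}
```

Resolving the device once avoids comparing against ctx->fsinfo values later in the function, where they would be wrong for the internal RT layout.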
* [PATCH 42/43] xfs_mdrestore: support internal RT devices
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (40 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 41/43] xfs_scrub: support internal RT device Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 43/43] xfs_growfs: " Christoph Hellwig
2025-04-25 15:48 ` [PATCH 44/43] xfs_repair: fix libxfs abstraction mess Darrick J. Wong
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Calculate the size properly for internal RT devices and skip restoring
to the external one for this case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
mdrestore/xfs_mdrestore.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/mdrestore/xfs_mdrestore.c b/mdrestore/xfs_mdrestore.c
index d5014981b15a..95b01a99a154 100644
--- a/mdrestore/xfs_mdrestore.c
+++ b/mdrestore/xfs_mdrestore.c
@@ -183,6 +183,20 @@ verify_device_size(
}
}
+static void
+verify_main_device_size(
+ const struct mdrestore_dev *dev,
+ struct xfs_sb *sb)
+{
+ xfs_rfsblock_t nr_blocks = sb->sb_dblocks;
+
+ /* internal RT device */
+ if (sb->sb_rtstart)
+ nr_blocks = sb->sb_rtstart + sb->sb_rblocks;
+
+ verify_device_size(dev, nr_blocks, sb->sb_blocksize);
+}
+
static void
read_header_v1(
union mdrestore_headers *h,
@@ -269,7 +283,7 @@ restore_v1(
((struct xfs_dsb*)block_buffer)->sb_inprogress = 1;
- verify_device_size(ddev, sb.sb_dblocks, sb.sb_blocksize);
+ verify_main_device_size(ddev, &sb);
bytes_read = 0;
@@ -432,14 +446,14 @@ restore_v2(
((struct xfs_dsb *)block_buffer)->sb_inprogress = 1;
- verify_device_size(ddev, sb.sb_dblocks, sb.sb_blocksize);
+ verify_main_device_size(ddev, &sb);
if (sb.sb_logstart == 0) {
ASSERT(mdrestore.external_log == true);
verify_device_size(logdev, sb.sb_logblocks, sb.sb_blocksize);
}
- if (sb.sb_rblocks > 0) {
+ if (sb.sb_rblocks > 0 && !sb.sb_rtstart) {
ASSERT(mdrestore.realtime_data == true);
verify_device_size(rtdev, sb.sb_rblocks, sb.sb_blocksize);
}
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
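The verify_main_device_size helper above sizes the main device as rtstart plus rblocks when the superblock has an internal RT section, and as dblocks otherwise. The same rule can be sketched as a pure function on plain integers. The ex_main_device_blocks and ex_main_device_bytes names are hypothetical, and the parameters are assumed to mirror sb_dblocks, sb_rtstart, sb_rblocks and sb_blocksize.

```c
#include <stdint.h>

/*
 * Number of file system blocks the main device must hold.  With an
 * internal RT section the device has to reach the end of that section,
 * which starts at rtstart and is rblocks long.
 */
static uint64_t
ex_main_device_blocks(uint64_t dblocks, uint64_t rtstart, uint64_t rblocks)
{
	if (rtstart)
		return rtstart + rblocks;
	return dblocks;
}

/* Same requirement expressed in bytes. */
static uint64_t
ex_main_device_bytes(uint64_t dblocks, uint64_t rtstart, uint64_t rblocks,
		uint32_t blocksize)
{
	return ex_main_device_blocks(dblocks, rtstart, rblocks) * blocksize;
}
```

For example, with dblocks of 1000, rtstart of 1024 and rblocks of 500, the main device needs 1524 blocks rather than 1000.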
* [PATCH 43/43] xfs_growfs: support internal RT devices
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (41 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 42/43] xfs_mdrestore: support internal RT devices Christoph Hellwig
@ 2025-04-14 5:36 ` Christoph Hellwig
2025-04-25 15:48 ` [PATCH 44/43] xfs_repair: fix libxfs abstraction mess Darrick J. Wong
43 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-14 5:36 UTC (permalink / raw)
To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs
Allow RT growfs when rtstart is set in the geometry, and adjust the
queried size for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
growfs/xfs_growfs.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/growfs/xfs_growfs.c b/growfs/xfs_growfs.c
index 4b941403e2fd..0d0b2ae3e739 100644
--- a/growfs/xfs_growfs.c
+++ b/growfs/xfs_growfs.c
@@ -202,7 +202,7 @@ main(int argc, char **argv)
progname, fname);
exit(1);
}
- if (rflag && !xi.rt.dev) {
+ if (rflag && (!xi.rt.dev && !geo.rtstart)) {
fprintf(stderr,
_("%s: failed to access realtime device for %s\n"),
progname, fname);
@@ -211,6 +211,13 @@ main(int argc, char **argv)
xfs_report_geom(&geo, datadev, logdev, rtdev);
+ if (geo.rtstart) {
+ xfs_daddr_t rtstart = geo.rtstart * (geo.blocksize / BBSIZE);
+
+ xi.rt.size = xi.data.size - rtstart;
+ xi.data.size = rtstart;
+ }
+
ddsize = xi.data.size;
dlsize = (xi.log.size ? xi.log.size :
geo.logblocks * (geo.blocksize / BBSIZE) );
--
2.47.2
^ permalink raw reply related [flat|nested] 60+ messages in thread
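The second growfs hunk above splits the size of the single block device into a data part and an RT part at rtstart, converting rtstart from file system blocks to 512-byte basic blocks first. A minimal sketch of that split follows. The struct ex_sizes type, the ex_split_internal_rt name and the EX_BBSIZE constant are hypothetical stand-ins, with EX_BBSIZE assumed to match BBSIZE.

```c
#include <stdint.h>

#define EX_BBSIZE	512	/* assumed basic block size in bytes */

/* Hypothetical pair of device sizes, both in basic blocks. */
struct ex_sizes {
	int64_t	data;
	int64_t	rt;
};

/*
 * Split the size of one block device into data and RT sizes.
 * rtstart is in file system blocks.  When it is zero there is no
 * internal RT section and the sizes are returned unchanged.
 */
static struct ex_sizes
ex_split_internal_rt(struct ex_sizes sizes, uint64_t rtstart,
		uint32_t blocksize)
{
	if (rtstart) {
		int64_t	rtstart_bb = (int64_t)rtstart * (blocksize / EX_BBSIZE);

		sizes.rt = sizes.data - rtstart_bb;
		sizes.data = rtstart_bb;
	}
	return sizes;
}
```

For a device of 2097152 basic blocks with 4096-byte blocks and rtstart at block 65536, the data part is 524288 basic blocks and the RT part is the remaining 1572864.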
* Re: [PATCH 13/43] FIXUP: xfs: allow internal RT devices for zoned mode
2025-04-14 5:35 ` [PATCH 13/43] FIXUP: " Christoph Hellwig
@ 2025-04-14 20:16 ` Darrick J. Wong
0 siblings, 0 replies; 60+ messages in thread
From: Darrick J. Wong @ 2025-04-14 20:16 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs
On Mon, Apr 14, 2025 at 07:35:56AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Seems fine to me now
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> include/libxfs.h | 6 ++++++
> include/xfs_mount.h | 7 +++++++
> libxfs/init.c | 13 +++++++++----
> libxfs/rdwr.c | 2 ++
> repair/agheader.c | 4 +++-
> 5 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/include/libxfs.h b/include/libxfs.h
> index 82b34b9d81c3..b968a2b88da3 100644
> --- a/include/libxfs.h
> +++ b/include/libxfs.h
> @@ -293,4 +293,10 @@ static inline bool xfs_sb_version_hassparseinodes(struct xfs_sb *sbp)
> xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_SPINODES);
> }
>
> +static inline bool xfs_sb_version_haszoned(struct xfs_sb *sbp)
> +{
> + return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
> + xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_ZONED);
> +}
> +
> #endif /* __LIBXFS_H__ */
> diff --git a/include/xfs_mount.h b/include/xfs_mount.h
> index 7856acfb9f8e..bf9ebc25fc79 100644
> --- a/include/xfs_mount.h
> +++ b/include/xfs_mount.h
> @@ -53,6 +53,13 @@ struct xfs_groups {
> * rtgroup, so this mask must be 64-bit.
> */
> uint64_t blkmask;
> +
> + /*
> + * Start of the first group in the device. This is used to support a
> + * RT device following the data device on the same block device for
> + * SMR hard drives.
> + */
> + xfs_fsblock_t start_fsb;
> };
>
> /*
> diff --git a/libxfs/init.c b/libxfs/init.c
> index 5b45ed347276..a186369f3fd8 100644
> --- a/libxfs/init.c
> +++ b/libxfs/init.c
> @@ -560,7 +560,7 @@ libxfs_buftarg_init(
> progname);
> exit(1);
> }
> - if (xi->rt.dev &&
> + if ((xi->rt.dev || xi->rt.dev == xi->data.dev) &&
> (mp->m_rtdev_targp->bt_bdev != xi->rt.dev ||
> mp->m_rtdev_targp->bt_mount != mp)) {
> fprintf(stderr,
> @@ -577,7 +577,11 @@ libxfs_buftarg_init(
> else
> mp->m_logdev_targp = libxfs_buftarg_alloc(mp, xi, &xi->log,
> lfail);
> - mp->m_rtdev_targp = libxfs_buftarg_alloc(mp, xi, &xi->rt, rfail);
> + if (!xi->rt.dev || xi->rt.dev == xi->data.dev)
> + mp->m_rtdev_targp = mp->m_ddev_targp;
> + else
> + mp->m_rtdev_targp = libxfs_buftarg_alloc(mp, xi, &xi->rt,
> + rfail);
> }
>
> /* Compute maximum possible height for per-AG btree types for this fs. */
> @@ -978,7 +982,7 @@ libxfs_flush_mount(
> error = err2;
> }
>
> - if (mp->m_rtdev_targp) {
> + if (mp->m_rtdev_targp && mp->m_rtdev_targp != mp->m_ddev_targp) {
> err2 = libxfs_flush_buftarg(mp->m_rtdev_targp,
> _("realtime device"));
> if (!error)
> @@ -1031,7 +1035,8 @@ libxfs_umount(
> free(mp->m_fsname);
> mp->m_fsname = NULL;
>
> - libxfs_buftarg_free(mp->m_rtdev_targp);
> + if (mp->m_rtdev_targp != mp->m_ddev_targp)
> + libxfs_buftarg_free(mp->m_rtdev_targp);
> if (mp->m_logdev_targp != mp->m_ddev_targp)
> libxfs_buftarg_free(mp->m_logdev_targp);
> libxfs_buftarg_free(mp->m_ddev_targp);
> diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> index 35be785c435a..f06763b38bd8 100644
> --- a/libxfs/rdwr.c
> +++ b/libxfs/rdwr.c
> @@ -175,6 +175,8 @@ libxfs_getrtsb(
> if (!mp->m_rtdev_targp->bt_bdev)
> return NULL;
>
> + ASSERT(!mp->m_sb.sb_rtstart);
> +
> error = libxfs_buf_read_uncached(mp->m_rtdev_targp, XFS_RTSB_DADDR,
> XFS_FSB_TO_BB(mp, 1), 0, &bp, &xfs_rtsb_buf_ops);
> if (error)
> diff --git a/repair/agheader.c b/repair/agheader.c
> index 327ba041671f..048e6c3143b5 100644
> --- a/repair/agheader.c
> +++ b/repair/agheader.c
> @@ -485,7 +485,9 @@ secondary_sb_whack(
> *
> * size is the size of data which is valid for this sb.
> */
> - if (xfs_sb_version_hasmetadir(sb))
> + if (xfs_sb_version_haszoned(sb))
> + size = offsetofend(struct xfs_dsb, sb_rtreserved);
> + else if (xfs_sb_version_hasmetadir(sb))
> size = offsetofend(struct xfs_dsb, sb_pad);
> else if (xfs_sb_version_hasmetauuid(sb))
> size = offsetofend(struct xfs_dsb, sb_meta_uuid);
> --
> 2.47.2
>
>
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 26/43] libfrog: report the zoned geometry
2025-04-14 5:36 ` [PATCH 26/43] libfrog: report the zoned geometry Christoph Hellwig
@ 2025-04-14 22:16 ` Darrick J. Wong
0 siblings, 0 replies; 60+ messages in thread
From: Darrick J. Wong @ 2025-04-14 22:16 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs
On Mon, Apr 14, 2025 at 07:36:09AM +0200, Christoph Hellwig wrote:
> The rtdev_name helper is based on example code posted by Darrick Wong.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Much nicer, thanks for cleaning this up
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> libfrog/fsgeom.c | 24 +++++++++++++++++++++---
> 1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
> index b5220d2d6ffd..571d376c6b3c 100644
> --- a/libfrog/fsgeom.c
> +++ b/libfrog/fsgeom.c
> @@ -8,6 +8,20 @@
> #include "fsgeom.h"
> #include "util.h"
>
> +static inline const char *
> +rtdev_name(
> + struct xfs_fsop_geom *geo,
> + const char *rtname)
> +{
> + if (!geo->rtblocks)
> + return _("none");
> + if (geo->rtstart)
> + return _("internal");
> + if (!rtname)
> + return _("external");
> + return rtname;
> +}
> +
> void
> xfs_report_geom(
> struct xfs_fsop_geom *geo,
> @@ -34,6 +48,7 @@ xfs_report_geom(
> int exchangerange;
> int parent;
> int metadir;
> + int zoned;
>
> isint = geo->logstart > 0;
> lazycount = geo->flags & XFS_FSOP_GEOM_FLAGS_LAZYSB ? 1 : 0;
> @@ -55,6 +70,7 @@ xfs_report_geom(
> exchangerange = geo->flags & XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE ? 1 : 0;
> parent = geo->flags & XFS_FSOP_GEOM_FLAGS_PARENT ? 1 : 0;
> metadir = geo->flags & XFS_FSOP_GEOM_FLAGS_METADIR ? 1 : 0;
> + zoned = geo->flags & XFS_FSOP_GEOM_FLAGS_ZONED ? 1 : 0;
>
> printf(_(
> "meta-data=%-22s isize=%-6d agcount=%u, agsize=%u blks\n"
> @@ -68,7 +84,8 @@ xfs_report_geom(
> "log =%-22s bsize=%-6d blocks=%u, version=%d\n"
> " =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n"
> "realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n"
> -" =%-22s rgcount=%-4d rgsize=%u extents\n"),
> +" =%-22s rgcount=%-4d rgsize=%u extents\n"
> +" =%-22s zoned=%-6d start=%llu reserved=%llu\n"),
> mntpoint, geo->inodesize, geo->agcount, geo->agblocks,
> "", geo->sectsize, attrversion, projid32bit,
> "", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
> @@ -81,10 +98,11 @@ xfs_report_geom(
> isint ? _("internal log") : logname ? logname : _("external"),
> geo->blocksize, geo->logblocks, logversion,
> "", geo->logsectsize, geo->logsunit / geo->blocksize, lazycount,
> - !geo->rtblocks ? _("none") : rtname ? rtname : _("external"),
> + rtdev_name(geo, rtname),
> geo->rtextsize * geo->blocksize, (unsigned long long)geo->rtblocks,
> (unsigned long long)geo->rtextents,
> - "", geo->rgcount, geo->rgextents);
> + "", geo->rgcount, geo->rgextents,
> + "", zoned, geo->rtstart, geo->rtreserved);
> }
>
> /* Try to obtain the xfs geometry. On error returns a negative error code. */
> --
> 2.47.2
>
>
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 28/43] xfs_repair: fix the RT device check in process_dinode_int
2025-04-14 5:36 ` [PATCH 28/43] xfs_repair: fix the RT device check in process_dinode_int Christoph Hellwig
@ 2025-04-15 0:34 ` Darrick J. Wong
0 siblings, 0 replies; 60+ messages in thread
From: Darrick J. Wong @ 2025-04-15 0:34 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs
On Mon, Apr 14, 2025 at 07:36:11AM +0200, Christoph Hellwig wrote:
> Don't look at the variable for the rtname command line option, but
> the actual file system geometry.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> repair/dinode.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/repair/dinode.c b/repair/dinode.c
> index 7bdd3dcf15c1..8ca0aa0238c7 100644
> --- a/repair/dinode.c
> +++ b/repair/dinode.c
> @@ -3265,8 +3265,8 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
> flags &= XFS_DIFLAG_ANY;
> }
>
> - /* need an rt-dev for the realtime flag! */
> - if ((flags & XFS_DIFLAG_REALTIME) && !rt_name) {
> + /* need an rt-dev for the realtime flag */
> + if ((flags & XFS_DIFLAG_REALTIME) && !mp->m_sb.sb_rextents) {
> if (!uncertain) {
> do_warn(
> _("inode %" PRIu64 " has RT flag set but there is no RT device\n"),
> --
> 2.47.2
>
>
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 41/43] xfs_scrub: support internal RT device
2025-04-14 5:36 ` [PATCH 41/43] xfs_scrub: support internal RT device Christoph Hellwig
@ 2025-04-15 0:35 ` Darrick J. Wong
0 siblings, 0 replies; 60+ messages in thread
From: Darrick J. Wong @ 2025-04-15 0:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs
On Mon, Apr 14, 2025 at 07:36:24AM +0200, Christoph Hellwig wrote:
> Handle the synthetic fmr_device values, and deal with the fact that
> ctx->fsinfo.fs_rt is allowed to be non-NULL for internal RT devices as
> it is the same as the data device in this case.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good now,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> scrub/phase1.c | 3 ++-
> scrub/phase6.c | 65 ++++++++++++++++++++++++++++--------------------
> scrub/phase7.c | 28 +++++++++++++++------
> scrub/spacemap.c | 17 +++++++++----
> 4 files changed, 72 insertions(+), 41 deletions(-)
>
> diff --git a/scrub/phase1.c b/scrub/phase1.c
> index d03a9099a217..10e9aa1892b7 100644
> --- a/scrub/phase1.c
> +++ b/scrub/phase1.c
> @@ -341,7 +341,8 @@ _("Kernel metadata repair facility is not available. Use -n to scrub."));
> _("Unable to find log device path."));
> return ECANCELED;
> }
> - if (ctx->mnt.fsgeom.rtblocks && ctx->fsinfo.fs_rt == NULL) {
> + if (ctx->mnt.fsgeom.rtblocks && ctx->fsinfo.fs_rt == NULL &&
> + !ctx->mnt.fsgeom.rtstart) {
> str_error(ctx, ctx->mntpoint,
> _("Unable to find realtime device path."));
> return ECANCELED;
> diff --git a/scrub/phase6.c b/scrub/phase6.c
> index 2a52b2c92419..abf6f9713f1a 100644
> --- a/scrub/phase6.c
> +++ b/scrub/phase6.c
> @@ -56,12 +56,21 @@ dev_to_pool(
> struct media_verify_state *vs,
> dev_t dev)
> {
> - if (dev == ctx->fsinfo.fs_datadev)
> - return vs->rvp_data;
> - else if (dev == ctx->fsinfo.fs_logdev)
> - return vs->rvp_log;
> - else if (dev == ctx->fsinfo.fs_rtdev)
> - return vs->rvp_realtime;
> + if (ctx->mnt.fsgeom.rtstart) {
> + if (dev == XFS_DEV_DATA)
> + return vs->rvp_data;
> + if (dev == XFS_DEV_LOG)
> + return vs->rvp_log;
> + if (dev == XFS_DEV_RT)
> + return vs->rvp_realtime;
> + } else {
> + if (dev == ctx->fsinfo.fs_datadev)
> + return vs->rvp_data;
> + if (dev == ctx->fsinfo.fs_logdev)
> + return vs->rvp_log;
> + if (dev == ctx->fsinfo.fs_rtdev)
> + return vs->rvp_realtime;
> + }
> abort();
> }
>
> @@ -71,12 +80,21 @@ disk_to_dev(
> struct scrub_ctx *ctx,
> struct disk *disk)
> {
> - if (disk == ctx->datadev)
> - return ctx->fsinfo.fs_datadev;
> - else if (disk == ctx->logdev)
> - return ctx->fsinfo.fs_logdev;
> - else if (disk == ctx->rtdev)
> - return ctx->fsinfo.fs_rtdev;
> + if (ctx->mnt.fsgeom.rtstart) {
> + if (disk == ctx->datadev)
> + return XFS_DEV_DATA;
> + if (disk == ctx->logdev)
> + return XFS_DEV_LOG;
> + if (disk == ctx->rtdev)
> + return XFS_DEV_RT;
> + } else {
> + if (disk == ctx->datadev)
> + return ctx->fsinfo.fs_datadev;
> + if (disk == ctx->logdev)
> + return ctx->fsinfo.fs_logdev;
> + if (disk == ctx->rtdev)
> + return ctx->fsinfo.fs_rtdev;
> + }
> abort();
> }
>
> @@ -87,11 +105,9 @@ bitmap_for_disk(
> struct disk *disk,
> struct media_verify_state *vs)
> {
> - dev_t dev = disk_to_dev(ctx, disk);
> -
> - if (dev == ctx->fsinfo.fs_datadev)
> + if (disk == ctx->datadev)
> return vs->d_bad;
> - else if (dev == ctx->fsinfo.fs_rtdev)
> + if (disk == ctx->rtdev)
> return vs->r_bad;
> return NULL;
> }
> @@ -501,14 +517,11 @@ report_ioerr(
> .length = length,
> };
> struct disk_ioerr_report *dioerr = arg;
> - dev_t dev;
> -
> - dev = disk_to_dev(dioerr->ctx, dioerr->disk);
>
> /* Go figure out which blocks are bad from the fsmap. */
> - keys[0].fmr_device = dev;
> + keys[0].fmr_device = disk_to_dev(dioerr->ctx, dioerr->disk);
> keys[0].fmr_physical = start;
> - keys[1].fmr_device = dev;
> + keys[1].fmr_device = keys[0].fmr_device;
> keys[1].fmr_physical = start + length - 1;
> keys[1].fmr_owner = ULLONG_MAX;
> keys[1].fmr_offset = ULLONG_MAX;
> @@ -675,14 +688,12 @@ remember_ioerr(
> int ret;
>
> if (!length) {
> - dev_t dev = disk_to_dev(ctx, disk);
> -
> - if (dev == ctx->fsinfo.fs_datadev)
> + if (disk == ctx->datadev)
> vs->d_trunc = true;
> - else if (dev == ctx->fsinfo.fs_rtdev)
> - vs->r_trunc = true;
> - else if (dev == ctx->fsinfo.fs_logdev)
> + else if (disk == ctx->logdev)
> vs->l_trunc = true;
> + else if (disk == ctx->rtdev)
> + vs->r_trunc = true;
> return;
> }
>
> diff --git a/scrub/phase7.c b/scrub/phase7.c
> index 01097b678798..e25502668b1c 100644
> --- a/scrub/phase7.c
> +++ b/scrub/phase7.c
> @@ -68,25 +68,37 @@ count_block_summary(
> void *arg)
> {
> struct summary_counts *counts;
> + bool is_rt = false;
> unsigned long long len;
> int ret;
>
> + if (ctx->mnt.fsgeom.rtstart) {
> + if (fsmap->fmr_device == XFS_DEV_LOG)
> + return 0;
> + if (fsmap->fmr_device == XFS_DEV_RT)
> + is_rt = true;
> + } else {
> + if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
> + return 0;
> + if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev)
> + is_rt = true;
> + }
> +
> counts = ptvar_get((struct ptvar *)arg, &ret);
> if (ret) {
> str_liberror(ctx, -ret, _("retrieving summary counts"));
> return -ret;
> }
> - if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
> - return 0;
> +
> if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
> fsmap->fmr_owner == XFS_FMR_OWN_FREE) {
> uint64_t blocks;
>
> blocks = cvt_b_to_off_fsbt(&ctx->mnt, fsmap->fmr_length);
> - if (fsmap->fmr_device == ctx->fsinfo.fs_datadev)
> - hist_add(&counts->datadev_hist, blocks);
> - else if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev)
> + if (is_rt)
> hist_add(&counts->rtdev_hist, blocks);
> + else
> + hist_add(&counts->datadev_hist, blocks);
> return 0;
> }
>
> @@ -94,10 +106,10 @@ count_block_summary(
>
> /* freesp btrees live in free space, need to adjust counters later. */
> if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
> - fsmap->fmr_owner == XFS_FMR_OWN_AG) {
> + fsmap->fmr_owner == XFS_FMR_OWN_AG)
> counts->agbytes += fsmap->fmr_length;
> - }
> - if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev) {
> +
> + if (is_rt) {
> /* Count realtime extents. */
> counts->rbytes += len;
> } else {
> diff --git a/scrub/spacemap.c b/scrub/spacemap.c
> index c293ab44a528..1ee4d1946d3d 100644
> --- a/scrub/spacemap.c
> +++ b/scrub/spacemap.c
> @@ -103,9 +103,12 @@ scan_ag_rmaps(
> bperag = (off_t)ctx->mnt.fsgeom.agblocks *
> (off_t)ctx->mnt.fsgeom.blocksize;
>
> - keys[0].fmr_device = ctx->fsinfo.fs_datadev;
> + if (ctx->mnt.fsgeom.rtstart)
> + keys[0].fmr_device = XFS_DEV_DATA;
> + else
> + keys[0].fmr_device = ctx->fsinfo.fs_datadev;
> keys[0].fmr_physical = agno * bperag;
> - keys[1].fmr_device = ctx->fsinfo.fs_datadev;
> + keys[1].fmr_device = keys[0].fmr_device;
> keys[1].fmr_physical = ((agno + 1) * bperag) - 1;
> keys[1].fmr_owner = ULLONG_MAX;
> keys[1].fmr_offset = ULLONG_MAX;
> @@ -140,9 +143,12 @@ scan_rtg_rmaps(
> off_t bperrg = bytes_per_rtgroup(&ctx->mnt.fsgeom);
> int ret;
>
> - keys[0].fmr_device = ctx->fsinfo.fs_rtdev;
> + if (ctx->mnt.fsgeom.rtstart)
> + keys[0].fmr_device = XFS_DEV_RT;
> + else
> + keys[0].fmr_device = ctx->fsinfo.fs_rtdev;
> keys[0].fmr_physical = (xfs_rtblock_t)rgno * bperrg;
> - keys[1].fmr_device = ctx->fsinfo.fs_rtdev;
> + keys[1].fmr_device = keys[0].fmr_device;
> keys[1].fmr_physical = ((rgno + 1) * bperrg) - 1;
> keys[1].fmr_owner = ULLONG_MAX;
> keys[1].fmr_offset = ULLONG_MAX;
> @@ -216,7 +222,8 @@ scan_log_rmaps(
> {
> struct scrub_ctx *ctx = (struct scrub_ctx *)wq->wq_ctx;
>
> - scan_dev_rmaps(ctx, ctx->fsinfo.fs_logdev, arg);
> + scan_dev_rmaps(ctx, ctx->mnt.fsgeom.rtstart ? XFS_DEV_LOG :
> + ctx->fsinfo.fs_logdev, arg);
> }
>
> /*
> --
> 2.47.2
>
>
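For readers following the scrub changes, the device selection in the quoted hunks can be sketched on its own. This is a minimal illustration, not the scrub code: the sketch_* names are hypothetical, the value 2 for the log device is taken from the literal in the scan_log_rmaps hunk, and the values used for the data and RT devices are assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the XFS_DEV_* identifiers.  Only the log
 * value (2) appears literally in the quoted patch; 1 and 3 are assumed.
 */
enum sketch_dev {
	SKETCH_DEV_DATA = 1,
	SKETCH_DEV_LOG = 2,
	SKETCH_DEV_RT = 3,
};

enum sketch_section { SECTION_DATA, SECTION_LOG, SECTION_RT };

struct sketch_fs {
	uint64_t rtstart;	/* nonzero: RT section lives on the data device */
	uint32_t datadev;	/* device numbers used when rtstart == 0 */
	uint32_t logdev;
	uint32_t rtdev;
};

/*
 * Classify an fsmap record by its fmr_device value.  With an internal RT
 * section the record carries a fixed identifier, otherwise it carries the
 * device number of the external device.
 */
static enum sketch_section
classify_fmr_device(const struct sketch_fs *fs, uint32_t fmr_device)
{
	uint32_t log = fs->rtstart ? SKETCH_DEV_LOG : fs->logdev;
	uint32_t rt = fs->rtstart ? SKETCH_DEV_RT : fs->rtdev;

	if (fmr_device == log)
		return SECTION_LOG;
	if (fmr_device == rt)
		return SECTION_RT;
	return SECTION_DATA;
}
```

Resolving the comparison values once per record keeps the later histogram and byte-counting branches free of repeated device checks, which is what the is_rt flag achieves in the patch.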
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 36/43] man: document XFS_FSOP_GEOM_FLAGS_ZONED
2025-04-14 5:36 ` [PATCH 36/43] man: document XFS_FSOP_GEOM_FLAGS_ZONED Christoph Hellwig
@ 2025-04-15 0:36 ` Darrick J. Wong
0 siblings, 0 replies; 60+ messages in thread
From: Darrick J. Wong @ 2025-04-15 0:36 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs
On Mon, Apr 14, 2025 at 07:36:19AM +0200, Christoph Hellwig wrote:
> Document the new zoned feature flag and the two new fields added
> with it.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Much better,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> man/man2/ioctl_xfs_fsgeometry.2 | 21 ++++++++++++++++++++-
> 1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/man/man2/ioctl_xfs_fsgeometry.2 b/man/man2/ioctl_xfs_fsgeometry.2
> index 502054f391e9..037f8e15e415 100644
> --- a/man/man2/ioctl_xfs_fsgeometry.2
> +++ b/man/man2/ioctl_xfs_fsgeometry.2
> @@ -50,7 +50,9 @@ struct xfs_fsop_geom {
> __u32 sick;
> __u32 checked;
> __u64 rgextents;
> - __u64 reserved[16];
> + __u64 rtstart;
> + __u64 rtreserved;
> + __u64 reserved[14];
> };
> .fi
> .in
> @@ -143,6 +145,20 @@ for more details.
> .I rgextents
> Is the number of RT extents in each rtgroup.
> .PP
> +.I rtstart
> +Start of the internal RT device in fsblocks. 0 if an external RT device
> +is used.
> +This field is meaningful only if the flag
> +.B XFS_FSOP_GEOM_FLAGS_ZONED
> +is set.
> +.PP
> +.I rtreserved
> +The amount of space in the realtime section that is reserved for internal use
> +by garbage collection and reorganization algorithms in fsblocks.
> +This field is meaningful only if the flag
> +.B XFS_FSOP_GEOM_FLAGS_ZONED
> +is set.
> +.PP
> .I reserved
> is set to zero.
> .SH FILESYSTEM FEATURE FLAGS
> @@ -221,6 +237,9 @@ Filesystem can exchange file contents atomically via XFS_IOC_EXCHANGE_RANGE.
> .TP
> .B XFS_FSOP_GEOM_FLAGS_METADIR
> Filesystem contains a metadata directory tree.
> +.TP
> +.B XFS_FSOP_GEOM_FLAGS_ZONED
> +Filesystem uses the zoned allocator for the RT device.
> .RE
> .SH XFS METADATA HEALTH REPORTING
> .PP
> --
> 2.47.2
>
>
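As a side note on the structure change quoted above: two __u64 fields are carved out of the reserved array, so the overall size should stay the same. A minimal sketch of that invariant follows, using cut-down hypothetical structs (only the tail of the geometry structure) and a placeholder flag bit, not the real header definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down tails of the geometry structure, before and after. */
struct geom_tail_before {
	uint64_t rgextents;
	uint64_t reserved[16];
};

struct geom_tail_after {
	uint64_t rgextents;
	uint64_t rtstart;
	uint64_t rtreserved;
	uint64_t reserved[14];
};

/* New fields must come out of the reserved space, not grow the struct. */
_Static_assert(sizeof(struct geom_tail_before) ==
	       sizeof(struct geom_tail_after),
	       "structure size must not change");
_Static_assert(offsetof(struct geom_tail_after, rtstart) ==
	       offsetof(struct geom_tail_before, reserved),
	       "rtstart must reuse the first reserved slot");

/* Placeholder bit; the real XFS_FSOP_GEOM_FLAGS_ZONED value is not shown here. */
#define SKETCH_GEOM_FLAGS_ZONED	(1u << 0)

/* Per the man page text, rtstart is only meaningful when the zoned flag is set. */
static bool
geom_has_internal_rt(uint32_t flags, uint64_t rtstart)
{
	return (flags & SKETCH_GEOM_FLAGS_ZONED) && rtstart != 0;
}
```

Because older kernels return zeroes in the reserved area, a caller that checks the flag first never misreads a zero rtstart as a valid offset.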
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 33/43] xfs_mkfs: default to rtinherit=1 for zoned file systems
2025-04-14 5:36 ` [PATCH 33/43] xfs_mkfs: default to rtinherit=1 for zoned file systems Christoph Hellwig
@ 2025-04-15 0:37 ` Darrick J. Wong
2025-04-15 8:09 ` Christoph Hellwig
0 siblings, 1 reply; 60+ messages in thread
From: Darrick J. Wong @ 2025-04-15 0:37 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs
On Mon, Apr 14, 2025 at 07:36:16AM +0200, Christoph Hellwig wrote:
> Zoned file systems are intended to use sequential write required zones
> (or areas treated as such) for the main data store, and usually use the
> data device only for metadata that requires random writes.
>
> rtinherit=1 is the way to achieve that, so enable it by default, but
> still allow the user to override it if needed.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> inherit
^^ Not sure what this is, but assuming that's just paste-fingering,
the patch looks ok so
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> mkfs/xfs_mkfs.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 5cd100c87f43..e6adb345551e 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -2966,6 +2966,13 @@ _("rt extent size not supported on realtime devices with zoned mode\n"));
> }
> cli->rtextsize = 0;
> }
> +
> + /*
> + * Set rtinherit by default for zoned file systems as they
> + * usually use the data device purely as a metadata container.
> + */
> + if (!cli_opt_set(&dopts, D_RTINHERIT))
> + cli->fsx.fsx_xflags |= FS_XFLAG_RTINHERIT;
> } else {
> if (cli->rtstart) {
> fprintf(stderr,
> --
> 2.47.2
>
>
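The "default on, but user can override" behaviour described in the commit message can be sketched in isolation. This is an illustration under assumptions: the sketch_* names and the flag value are placeholders, and the explicit-option handling is simplified compared to the mkfs option parser.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit standing in for FS_XFLAG_RTINHERIT. */
#define SKETCH_XFLAG_RTINHERIT	0x100u

/* Hypothetical record of whether the user passed -d rtinherit=N. */
struct sketch_opt {
	bool seen;	/* option was given on the command line */
	bool value;	/* value the user asked for */
};

/*
 * An explicit user choice always wins.  Without one, zoned file systems
 * get rtinherit so that new files land on the RT section by default.
 */
static uint32_t
apply_rtinherit_default(uint32_t xflags, bool zoned, struct sketch_opt opt)
{
	if (opt.seen) {
		if (opt.value)
			xflags |= SKETCH_XFLAG_RTINHERIT;
		else
			xflags &= ~SKETCH_XFLAG_RTINHERIT;
		return xflags;
	}
	if (zoned)
		xflags |= SKETCH_XFLAG_RTINHERIT;
	return xflags;
}
```

Tracking "was the option seen" separately from its value is what lets rtinherit=0 be distinguished from "not specified", which is the role cli_opt_set() plays in the quoted hunk.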
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 27/43] xfs_repair: support repairing zoned file systems
2025-04-14 5:36 ` [PATCH 27/43] xfs_repair: support repairing zoned file systems Christoph Hellwig
@ 2025-04-15 0:38 ` Darrick J. Wong
0 siblings, 0 replies; 60+ messages in thread
From: Darrick J. Wong @ 2025-04-15 0:38 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs
On Mon, Apr 14, 2025 at 07:36:10AM +0200, Christoph Hellwig wrote:
> Not really much to do here. Mostly ignore the validation and
> regeneration of the bitmap and summary inodes. Eventually this
> could grow a bit of validation of the hardware zone state.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
LGTM
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> repair/dinode.c | 4 +++-
> repair/phase5.c | 13 +++++++++++++
> repair/phase6.c | 6 ++++--
> repair/rt.c | 2 ++
> repair/rtrmap_repair.c | 33 +++++++++++++++++++++++++++++++++
> repair/xfs_repair.c | 9 ++++++---
> 6 files changed, 61 insertions(+), 6 deletions(-)
>
> diff --git a/repair/dinode.c b/repair/dinode.c
> index 8696a838087f..7bdd3dcf15c1 100644
> --- a/repair/dinode.c
> +++ b/repair/dinode.c
> @@ -3585,7 +3585,9 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
>
> validate_extsize(mp, dino, lino, dirty);
>
> - if (dino->di_version >= 3)
> + if (dino->di_version >= 3 &&
> + (!xfs_has_zoned(mp) ||
> + dino->di_metatype != cpu_to_be16(XFS_METAFILE_RTRMAP)))
> validate_cowextsize(mp, dino, lino, dirty);
>
> /* nsec fields cannot be larger than 1 billion */
> diff --git a/repair/phase5.c b/repair/phase5.c
> index 4cf28d8ae1a2..e350b411c243 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
> @@ -630,6 +630,19 @@ void
> check_rtmetadata(
> struct xfs_mount *mp)
> {
> + if (xfs_has_zoned(mp)) {
> + /*
> + * Here we could/should verify the zone state a bit when we are
> + * on actual zoned devices:
> + * - compare hw write pointer to last written
> + * - compare zone state to last written
> + *
> + * Not much we can do when running in zoned mode on a
> + * conventional device.
> + */
> + return;
> + }
> +
> generate_rtinfo(mp);
> check_rtbitmap(mp);
> check_rtsummary(mp);
> diff --git a/repair/phase6.c b/repair/phase6.c
> index dbc090a54139..a7187e84daae 100644
> --- a/repair/phase6.c
> +++ b/repair/phase6.c
> @@ -3460,8 +3460,10 @@ _(" - resetting contents of realtime bitmap and summary inodes\n"));
> return;
>
> while ((rtg = xfs_rtgroup_next(mp, rtg))) {
> - ensure_rtgroup_bitmap(rtg);
> - ensure_rtgroup_summary(rtg);
> + if (!xfs_has_zoned(mp)) {
> + ensure_rtgroup_bitmap(rtg);
> + ensure_rtgroup_summary(rtg);
> + }
> ensure_rtgroup_rmapbt(rtg, est_fdblocks);
> ensure_rtgroup_refcountbt(rtg, est_fdblocks);
> }
> diff --git a/repair/rt.c b/repair/rt.c
> index e0a4943ee3b7..a2478fb635e3 100644
> --- a/repair/rt.c
> +++ b/repair/rt.c
> @@ -222,6 +222,8 @@ check_rtfile_contents(
> xfs_fileoff_t bno = 0;
> int error;
>
> + ASSERT(!xfs_has_zoned(mp));
> +
> if (!ip) {
> do_warn(_("unable to open %s file\n"), filename);
> return;
> diff --git a/repair/rtrmap_repair.c b/repair/rtrmap_repair.c
> index 2b07e8943e59..955db1738fe2 100644
> --- a/repair/rtrmap_repair.c
> +++ b/repair/rtrmap_repair.c
> @@ -141,6 +141,37 @@ xrep_rtrmap_btree_load(
> return error;
> }
>
> +static void
> +rtgroup_update_counters(
> + struct xfs_rtgroup *rtg)
> +{
> + struct xfs_inode *rmapip = rtg->rtg_inodes[XFS_RTGI_RMAP];
> + struct xfs_mount *mp = rtg_mount(rtg);
> + uint64_t end =
> + xfs_rtbxlen_to_blen(mp, rtg->rtg_extents);
> + xfs_agblock_t gbno = 0;
> + uint64_t used = 0;
> +
> + do {
> + int bstate;
> + xfs_extlen_t blen;
> +
> + bstate = get_bmap_ext(rtg_rgno(rtg), gbno, end, &blen, true);
> + switch (bstate) {
> + case XR_E_INUSE:
> + case XR_E_INUSE_FS:
> + used += blen;
> + break;
> + default:
> + break;
> + }
> +
> + gbno += blen;
> + } while (gbno < end);
> +
> + rmapip->i_used_blocks = used;
> +}
> +
> /* Update the inode counters. */
> STATIC int
> xrep_rtrmap_reset_counters(
> @@ -153,6 +184,8 @@ xrep_rtrmap_reset_counters(
> * generated.
> */
> sc->ip->i_nblocks = rr->new_fork_info.ifake.if_blocks;
> + if (xfs_has_zoned(sc->mp))
> + rtgroup_update_counters(rr->rtg);
> libxfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
>
> /* Quotas don't exist so we're done. */
> diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> index eeaaf6434689..7bf75c09b945 100644
> --- a/repair/xfs_repair.c
> +++ b/repair/xfs_repair.c
> @@ -1388,16 +1388,19 @@ main(int argc, char **argv)
> * Done with the block usage maps, toss them. Realtime metadata aren't
> * rebuilt until phase 6, so we have to keep them around.
> */
> - if (mp->m_sb.sb_rblocks == 0)
> + if (mp->m_sb.sb_rblocks == 0) {
> rmaps_free(mp);
> - free_bmaps(mp);
> + free_bmaps(mp);
> + }
>
> if (!bad_ino_btree) {
> phase6(mp);
> phase_end(mp, 6);
>
> - if (mp->m_sb.sb_rblocks != 0)
> + if (mp->m_sb.sb_rblocks != 0) {
> rmaps_free(mp);
> + free_bmaps(mp);
> + }
> free_rtgroup_inodes();
>
> phase7(mp, phase2_threads);
> --
> 2.47.2
>
>
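The used-block accounting in rtgroup_update_counters() walks runs of equal block state and sums the in-use ones. A self-contained sketch of that idea follows, assuming the block usage map can be presented as an array of (state, length) runs. The sketch_* types are hypothetical and stand in for get_bmap_ext() and the XR_E_* states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block states, loosely modelled on XR_E_INUSE and XR_E_INUSE_FS. */
enum sketch_state { SK_FREE, SK_INUSE, SK_INUSE_FS };

/* One run of consecutive blocks that share a state. */
struct sketch_run {
	enum sketch_state state;
	uint32_t len;
};

/*
 * Sum the lengths of all in-use runs that fall inside [0, end).  A run
 * that crosses the end of the group is clamped to it.
 */
static uint64_t
count_used_blocks(const struct sketch_run *runs, size_t nr_runs, uint64_t end)
{
	uint64_t gbno = 0;
	uint64_t used = 0;

	for (size_t i = 0; i < nr_runs && gbno < end; i++) {
		uint64_t len = runs[i].len;

		if (len > end - gbno)
			len = end - gbno;

		switch (runs[i].state) {
		case SK_INUSE:
		case SK_INUSE_FS:
			used += len;
			break;
		default:
			break;
		}
		gbno += len;
	}
	return used;
}
```

For example, with runs of 10 in-use, 5 free, 3 in-use and 100 in-use blocks and a group end of 20, the last run is clamped to 2 blocks and the result is 15.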
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 29/43] xfs_repair: validate rt groups vs reported hardware zones
2025-04-14 5:36 ` [PATCH 29/43] xfs_repair: validate rt groups vs reported hardware zones Christoph Hellwig
@ 2025-04-15 0:39 ` Darrick J. Wong
0 siblings, 0 replies; 60+ messages in thread
From: Darrick J. Wong @ 2025-04-15 0:39 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs
On Mon, Apr 14, 2025 at 07:36:12AM +0200, Christoph Hellwig wrote:
> Run a report zones ioctl, and verify the rt group state vs the
> reported hardware zone state. Note that there is no way to actually
> fix up any discrepancies here, as that would be rather scary without
> having transactions.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> repair/Makefile | 1 +
> repair/phase5.c | 11 +---
> repair/zoned.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++
> repair/zoned.h | 10 ++++
> 4 files changed, 152 insertions(+), 9 deletions(-)
> create mode 100644 repair/zoned.c
> create mode 100644 repair/zoned.h
>
> diff --git a/repair/Makefile b/repair/Makefile
> index ff5b1f5abeda..fb0b2f96cc91 100644
> --- a/repair/Makefile
> +++ b/repair/Makefile
> @@ -81,6 +81,7 @@ CFILES = \
> strblobs.c \
> threads.c \
> versions.c \
> + zoned.c \
> xfs_repair.c
>
> LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBXCMD) $(LIBFROG) $(LIBUUID) $(LIBRT) \
> diff --git a/repair/phase5.c b/repair/phase5.c
> index e350b411c243..e44c26885717 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
> @@ -21,6 +21,7 @@
> #include "rmap.h"
> #include "bulkload.h"
> #include "agbtree.h"
> +#include "zoned.h"
>
> static uint64_t *sb_icount_ag; /* allocated inodes per ag */
> static uint64_t *sb_ifree_ag; /* free inodes per ag */
> @@ -631,15 +632,7 @@ check_rtmetadata(
> struct xfs_mount *mp)
> {
> if (xfs_has_zoned(mp)) {
> - /*
> - * Here we could/should verify the zone state a bit when we are
> - * on actual zoned devices:
> - * - compare hw write pointer to last written
> - * - compare zone state to last written
> - *
> - * Not much we can do when running in zoned mode on a
> - * conventional device.
> - */
> + check_zones(mp);
> return;
> }
>
> diff --git a/repair/zoned.c b/repair/zoned.c
> new file mode 100644
> index 000000000000..456076b9817d
> --- /dev/null
> +++ b/repair/zoned.c
> @@ -0,0 +1,139 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2024 Christoph Hellwig.
> + */
> +#include <ctype.h>
> +#include <linux/blkzoned.h>
> +#include "libxfs_priv.h"
> +#include "libxfs.h"
> +#include "xfs_zones.h"
> +#include "err_protos.h"
> +#include "zoned.h"
> +
> +/* random size that allows efficient processing */
> +#define ZONES_PER_IOCTL 16384
> +
> +static void
> +report_zones_cb(
> + struct xfs_mount *mp,
> + struct blk_zone *zone)
> +{
> + xfs_rtblock_t zsbno = xfs_daddr_to_rtb(mp, zone->start);
> + xfs_rgblock_t write_pointer;
> + xfs_rgnumber_t rgno;
> + struct xfs_rtgroup *rtg;
> +
> + if (xfs_rtb_to_rgbno(mp, zsbno) != 0) {
> + do_error(_("mismatched zone start 0x%llx."),
> + (unsigned long long)zsbno);
> + return;
> + }
> +
> + rgno = xfs_rtb_to_rgno(mp, zsbno);
> + rtg = xfs_rtgroup_grab(mp, rgno);
> + if (!rtg) {
> + do_error(_("realtime group not found for zone %u."), rgno);
> + return;
> + }
> +
> + if (!rtg_rmap(rtg))
> + do_warn(_("no rmap inode for zone %u."), rgno);
> + else
> + xfs_zone_validate(zone, rtg, &write_pointer);
> + xfs_rtgroup_rele(rtg);
> +}
> +
> +void
> +check_zones(
> + struct xfs_mount *mp)
> +{
> + int fd = mp->m_rtdev_targp->bt_bdev_fd;
> + uint64_t sector = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rtstart);
> + unsigned int zone_size, zone_capacity;
> + uint64_t device_size;
> + size_t rep_size;
> + struct blk_zone_report *rep;
> + unsigned int i, n = 0;
> +
> + if (ioctl(fd, BLKGETSIZE64, &device_size))
> + return; /* not a block device */
> + if (ioctl(fd, BLKGETZONESZ, &zone_size) || !zone_size)
> + return; /* not zoned */
> +
> + /* BLKGETSIZE64 reports a byte value */
> + device_size = BTOBB(device_size);
Much better now, thanks for cleaning up the type conversions
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> + if (device_size / zone_size < mp->m_sb.sb_rgcount) {
> + do_error(_("rt device too small\n"));
> + return;
> + }
> +
> + rep_size = sizeof(struct blk_zone_report) +
> + sizeof(struct blk_zone) * ZONES_PER_IOCTL;
> + rep = malloc(rep_size);
> + if (!rep) {
> + do_warn(_("malloc failed for zone report\n"));
> + return;
> + }
> +
> + while (n < mp->m_sb.sb_rgcount) {
> + struct blk_zone *zones = (struct blk_zone *)(rep + 1);
> + int ret;
> +
> + memset(rep, 0, rep_size);
> + rep->sector = sector;
> + rep->nr_zones = ZONES_PER_IOCTL;
> +
> + ret = ioctl(fd, BLKREPORTZONE, rep);
> + if (ret) {
> + do_error(_("ioctl(BLKREPORTZONE) failed: %d!\n"), ret);
> + goto out_free;
> + }
> + if (!rep->nr_zones)
> + break;
> +
> + for (i = 0; i < rep->nr_zones; i++) {
> + if (n >= mp->m_sb.sb_rgcount)
> + break;
> +
> + if (zones[i].len != zone_size) {
> + do_error(_("Inconsistent zone size!\n"));
> + goto out_free;
> + }
> +
> + switch (zones[i].type) {
> + case BLK_ZONE_TYPE_CONVENTIONAL:
> + case BLK_ZONE_TYPE_SEQWRITE_REQ:
> + break;
> + case BLK_ZONE_TYPE_SEQWRITE_PREF:
> + do_error(
> +_("Found sequential write preferred zone\n"));
> + goto out_free;
> + default:
> + do_error(
> +_("Found unknown zone type (0x%x)\n"), zones[i].type);
> + goto out_free;
> + }
> +
> + if (!n) {
> + zone_capacity = zones[i].capacity;
> + if (zone_capacity > zone_size) {
> + do_error(
> +_("Zone capacity larger than zone size!\n"));
> + goto out_free;
> + }
> + } else if (zones[i].capacity != zone_capacity) {
> + do_error(
> +_("Inconsistent zone capacity!\n"));
> + goto out_free;
> + }
> +
> + report_zones_cb(mp, &zones[i]);
> + n++;
> + }
> + sector = zones[rep->nr_zones - 1].start +
> + zones[rep->nr_zones - 1].len;
> + }
> +
> +out_free:
> + free(rep);
> +}
> diff --git a/repair/zoned.h b/repair/zoned.h
> new file mode 100644
> index 000000000000..ab76bf15b3ca
> --- /dev/null
> +++ b/repair/zoned.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (c) 2024 Christoph Hellwig.
> + */
> +#ifndef _XFS_REPAIR_ZONED_H_
> +#define _XFS_REPAIR_ZONED_H_
> +
> +void check_zones(struct xfs_mount *mp);
> +
> +#endif /* _XFS_REPAIR_ZONED_H_ */
> --
> 2.47.2
>
>
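On the type conversions mentioned in the review: BLKGETSIZE64 reports bytes while BLKGETZONESZ reports 512-byte sectors, so the device size has to be converted before the two are divided. A minimal sketch of that unit handling, with hypothetical helper names and a local rounding-up conversion that is assumed to behave like BTOBB:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BBSHIFT	9	/* 512-byte basic blocks */
#define SKETCH_BBSIZE	(1ULL << SKETCH_BBSHIFT)

/* Convert a byte count to basic blocks, rounding up. */
static uint64_t
bytes_to_bb(uint64_t bytes)
{
	return (bytes + SKETCH_BBSIZE - 1) >> SKETCH_BBSHIFT;
}

/*
 * Check that a device of device_bytes bytes, split into zones of
 * zone_size_bb basic blocks, has at least rgcount zones.
 */
static bool
device_holds_rtgroups(uint64_t device_bytes, uint32_t zone_size_bb,
		uint32_t rgcount)
{
	if (!zone_size_bb)
		return false;	/* not zoned, nothing to compare against */
	return bytes_to_bb(device_bytes) / zone_size_bb >= rgcount;
}
```

A 1 GiB device with 256 MiB zones holds four zones under this check, so an rgcount of 4 passes and an rgcount of 5 does not.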
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 30/43] xfs_mkfs: factor out a validate_rtgroup_geometry helper
2025-04-14 5:36 ` [PATCH 30/43] xfs_mkfs: factor out a validate_rtgroup_geometry helper Christoph Hellwig
@ 2025-04-15 0:40 ` Darrick J. Wong
0 siblings, 0 replies; 60+ messages in thread
From: Darrick J. Wong @ 2025-04-15 0:40 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs
On Mon, Apr 14, 2025 at 07:36:13AM +0200, Christoph Hellwig wrote:
> Factor out the rtgroup geometry checks so that they can be easily reused
> for the upcoming zoned RT allocator support.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Pretty straightforward, thanks for taking the time to disentangle this!
:)
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> mkfs/xfs_mkfs.c | 67 +++++++++++++++++++++++++++----------------------
> 1 file changed, 37 insertions(+), 30 deletions(-)
>
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index ec82e05bf4e4..13b746b365e1 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -3950,6 +3950,42 @@ out:
> cfg->rgcount = howmany(cfg->rtblocks, cfg->rgsize);
> }
>
> +static void
> +validate_rtgroup_geometry(
> + struct mkfs_params *cfg)
> +{
> + if (cfg->rgsize > XFS_MAX_RGBLOCKS) {
> + fprintf(stderr,
> +_("realtime group size (%llu) must be less than the maximum (%u)\n"),
> + (unsigned long long)cfg->rgsize,
> + XFS_MAX_RGBLOCKS);
> + usage();
> + }
> +
> + if (cfg->rgsize % cfg->rtextblocks != 0) {
> + fprintf(stderr,
> +_("realtime group size (%llu) not a multiple of rt extent size (%llu)\n"),
> + (unsigned long long)cfg->rgsize,
> + (unsigned long long)cfg->rtextblocks);
> + usage();
> + }
> +
> + if (cfg->rgsize <= cfg->rtextblocks) {
> + fprintf(stderr,
> +_("realtime group size (%llu) must be at least two realtime extents\n"),
> + (unsigned long long)cfg->rgsize);
> + usage();
> + }
> +
> + if (cfg->rgcount > XFS_MAX_RGNUMBER) {
> + fprintf(stderr,
> +_("realtime group count (%llu) must be less than the maximum (%u)\n"),
> + (unsigned long long)cfg->rgcount,
> + XFS_MAX_RGNUMBER);
> + usage();
> + }
> +}
> +
> static void
> calculate_rtgroup_geometry(
> struct mkfs_params *cfg,
> @@ -4007,36 +4043,7 @@ _("rgsize (%s) not a multiple of fs blk size (%d)\n"),
> (cfg->rtblocks % cfg->rgsize != 0);
> }
>
> - if (cfg->rgsize > XFS_MAX_RGBLOCKS) {
> - fprintf(stderr,
> -_("realtime group size (%llu) must be less than the maximum (%u)\n"),
> - (unsigned long long)cfg->rgsize,
> - XFS_MAX_RGBLOCKS);
> - usage();
> - }
> -
> - if (cfg->rgsize % cfg->rtextblocks != 0) {
> - fprintf(stderr,
> -_("realtime group size (%llu) not a multiple of rt extent size (%llu)\n"),
> - (unsigned long long)cfg->rgsize,
> - (unsigned long long)cfg->rtextblocks);
> - usage();
> - }
> -
> - if (cfg->rgsize <= cfg->rtextblocks) {
> - fprintf(stderr,
> -_("realtime group size (%llu) must be at least two realtime extents\n"),
> - (unsigned long long)cfg->rgsize);
> - usage();
> - }
> -
> - if (cfg->rgcount > XFS_MAX_RGNUMBER) {
> - fprintf(stderr,
> -_("realtime group count (%llu) must be less than the maximum (%u)\n"),
> - (unsigned long long)cfg->rgcount,
> - XFS_MAX_RGNUMBER);
> - usage();
> - }
> + validate_rtgroup_geometry(cfg);
>
> if (cfg->rtextents)
> cfg->rtbmblocks = howmany(cfg->rgsize / cfg->rtextblocks,
> --
> 2.47.2
>
>
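For illustration, the four checks that were factored out can also be written as a pure function that reports the first failure instead of calling usage(). This is a sketch only: the function name is hypothetical, and the limits are passed in as parameters because the values of XFS_MAX_RGBLOCKS and XFS_MAX_RGNUMBER are not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return NULL if the realtime group geometry is acceptable, or a short
 * description of the first violated rule.  The checks run in the same
 * order as in the quoted helper.
 */
static const char *
check_rtgroup_geometry(uint64_t rgsize, uint64_t rtextblocks,
		uint64_t rgcount, uint64_t max_rgblocks,
		uint64_t max_rgnumber)
{
	if (rgsize > max_rgblocks)
		return "realtime group size exceeds the maximum";
	/* Guard the modulo; a zero extent size is rejected in this sketch. */
	if (rtextblocks == 0 || rgsize % rtextblocks != 0)
		return "realtime group size not a multiple of rt extent size";
	if (rgsize <= rtextblocks)
		return "realtime group size must be at least two rt extents";
	if (rgcount > max_rgnumber)
		return "realtime group count exceeds the maximum";
	return NULL;
}
```

Returning a message rather than exiting makes the rules easy to exercise in isolation, while the mkfs helper keeps the print-and-usage() convention used by the rest of that file.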
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 31/43] xfs_mkfs: support creating file system with zoned RT devices
2025-04-14 5:36 ` [PATCH 31/43] xfs_mkfs: support creating file system with zoned RT devices Christoph Hellwig
@ 2025-04-15 0:41 ` Darrick J. Wong
0 siblings, 0 replies; 60+ messages in thread
From: Darrick J. Wong @ 2025-04-15 0:41 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs
On Mon, Apr 14, 2025 at 07:36:14AM +0200, Christoph Hellwig wrote:
> To create file systems with a zoned RT device, query the hardware
> zone information to align the RT groups to it, and create an internal
> RT device if the device has conventional and sequential write required
> zones.
>
> Default to use all sequential write required zones for the RT device if
> there are sequential write required zones.
>
> Default to 256 and 1% conventional when -r zoned is specified without
> further options and there are no sequential write required zones. This
> mimics an SMR HDD and works well with tests.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Much improved, thanks!
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
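To make the 1% default concrete, here is a small sketch of the arithmetic in the open_devices() hunk quoted below: take 1% of the data device size (rounded up), apply a 300 MiB floor, then round up to the next power of two. Sizes are in 512-byte basic blocks, and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Compute a default start for the internal RT section, in 512-byte
 * basic blocks, for a data device of data_size_bb basic blocks.
 */
static uint64_t
default_rtstart_bb(uint64_t data_size_bb)
{
	/* 300 MiB expressed in basic blocks (614400). */
	const uint64_t min_bb = (300ULL * 1024 * 1024 + 511) >> 9;
	uint64_t target = (data_size_bb + 99) / 100;	/* 1%, rounded up */
	uint64_t rtstart = 1;

	if (target < min_bb)
		target = min_bb;

	/* Round up to the next power of two. */
	while (rtstart < target)
		rtstart <<= 1;
	return rtstart;
}
```

Under these assumptions a 1 TiB device (2^31 basic blocks) gets an RT start of 2^25 basic blocks (16 GiB), and a 1 GiB device falls back to the floor and ends up at 2^20 basic blocks (512 MiB).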
> ---
> libxfs/init.c | 2 +-
> mkfs/proto.c | 3 +-
> mkfs/xfs_mkfs.c | 511 ++++++++++++++++++++++++++++++++++++++++++++++--
> 3 files changed, 497 insertions(+), 19 deletions(-)
>
> diff --git a/libxfs/init.c b/libxfs/init.c
> index a186369f3fd8..393a94673f7e 100644
> --- a/libxfs/init.c
> +++ b/libxfs/init.c
> @@ -251,7 +251,7 @@ libxfs_close_devices(
> libxfs_device_close(&li->data);
> if (li->log.dev && li->log.dev != li->data.dev)
> libxfs_device_close(&li->log);
> - if (li->rt.dev)
> + if (li->rt.dev && li->rt.dev != li->data.dev)
> libxfs_device_close(&li->rt);
> }
>
> diff --git a/mkfs/proto.c b/mkfs/proto.c
> index 7f56a3d82a06..7f80bef838be 100644
> --- a/mkfs/proto.c
> +++ b/mkfs/proto.c
> @@ -1144,7 +1144,8 @@ rtinit_groups(
> fail(_("rtrmap rtsb init failed"), error);
> }
>
> - rtfreesp_init(rtg);
> + if (!xfs_has_zoned(mp))
> + rtfreesp_init(rtg);
> }
> }
>
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 13b746b365e1..8b7a0b617d3e 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -6,6 +6,8 @@
> #include "libfrog/util.h"
> #include "libxfs.h"
> #include <ctype.h>
> +#include <linux/blkzoned.h>
> +#include "libxfs/xfs_zones.h"
> #include "xfs_multidisk.h"
> #include "libxcmd.h"
> #include "libfrog/fsgeom.h"
> @@ -135,6 +137,9 @@ enum {
> R_RGCOUNT,
> R_RGSIZE,
> R_CONCURRENCY,
> + R_ZONED,
> + R_START,
> + R_RESERVED,
> R_MAX_OPTS,
> };
>
> @@ -739,6 +744,9 @@ static struct opt_params ropts = {
> [R_RGCOUNT] = "rgcount",
> [R_RGSIZE] = "rgsize",
> [R_CONCURRENCY] = "concurrency",
> + [R_ZONED] = "zoned",
> + [R_START] = "start",
> + [R_RESERVED] = "reserved",
> [R_MAX_OPTS] = NULL,
> },
> .subopt_params = {
> @@ -804,6 +812,28 @@ static struct opt_params ropts = {
> .maxval = INT_MAX,
> .defaultval = 1,
> },
> + { .index = R_ZONED,
> + .conflicts = { { &ropts, R_EXTSIZE },
> + { NULL, LAST_CONFLICT } },
> + .minval = 0,
> + .maxval = 1,
> + .defaultval = 1,
> + },
> + { .index = R_START,
> + .conflicts = { { &ropts, R_DEV },
> + { NULL, LAST_CONFLICT } },
> + .convert = true,
> + .minval = 0,
> + .maxval = LLONG_MAX,
> + .defaultval = SUBOPT_NEEDS_VAL,
> + },
> + { .index = R_RESERVED,
> + .conflicts = { { NULL, LAST_CONFLICT } },
> + .convert = true,
> + .minval = 0,
> + .maxval = LLONG_MAX,
> + .defaultval = SUBOPT_NEEDS_VAL,
> + },
> },
> };
>
> @@ -1012,6 +1042,8 @@ struct sb_feat_args {
> bool nortalign;
> bool nrext64;
> bool exchrange; /* XFS_SB_FEAT_INCOMPAT_EXCHRANGE */
> + bool zoned;
> + bool zone_gaps;
>
> uint16_t qflags;
> };
> @@ -1035,6 +1067,8 @@ struct cli_params {
> char *lsu;
> char *rtextsize;
> char *rtsize;
> + char *rtstart;
> + uint64_t rtreserved;
>
> /* parameters where 0 is a valid CLI value */
> int dsunit;
> @@ -1121,6 +1155,8 @@ struct mkfs_params {
> char *label;
>
> struct sb_feat_args sb_feat;
> + uint64_t rtstart;
> + uint64_t rtreserved;
> };
>
> /*
> @@ -1172,7 +1208,7 @@ usage( void )
> /* prototype file */ [-p fname]\n\
> /* quiet */ [-q]\n\
> /* realtime subvol */ [-r extsize=num,size=num,rtdev=xxx,rgcount=n,rgsize=n,\n\
> - concurrency=num]\n\
> + concurrency=num,zoned=0|1,start=n,reserved=n]\n\
> /* sectorsize */ [-s size=num]\n\
> /* version */ [-V]\n\
> devicename\n\
> @@ -1539,6 +1575,34 @@ discard_blocks(int fd, uint64_t nsectors, int quiet)
> printf("Done.\n");
> }
>
> +static void
> +reset_zones(
> + struct mkfs_params *cfg,
> + int fd,
> + uint64_t start_sector,
> + uint64_t nsectors,
> + int quiet)
> +{
> + struct blk_zone_range range = {
> + .sector = start_sector,
> + .nr_sectors = nsectors,
> + };
> +
> + if (!quiet) {
> + printf("Resetting zones...");
> + fflush(stdout);
> + }
> +
> + if (ioctl(fd, BLKRESETZONE, &range) < 0) {
> + if (!quiet)
> + printf(" FAILED (%d)\n", -errno);
> + exit(1);
> + }
> +
> + if (!quiet)
> + printf("Done.\n");
> +}
> +
> static __attribute__((noreturn)) void
> illegal_option(
> const char *value,
> @@ -2144,6 +2208,15 @@ rtdev_opts_parser(
> case R_CONCURRENCY:
> set_rtvol_concurrency(opts, subopt, cli, value);
> break;
> + case R_ZONED:
> + cli->sb_feat.zoned = getnum(value, opts, subopt);
> + break;
> + case R_START:
> + cli->rtstart = getstr(value, opts, subopt);
> + break;
> + case R_RESERVED:
> + cli->rtreserved = getnum(value, opts, subopt);
> + break;
> default:
> return -EINVAL;
> }
> @@ -2445,7 +2518,213 @@ _("Version 1 logs do not support sector size %d\n"),
> _("log stripe unit specified, using v2 logs\n"));
> cli->sb_feat.log_version = 2;
> }
> +}
> +
> +struct zone_info {
> + /* number of zones, conventional or sequential */
> + unsigned int nr_zones;
> + /* number of conventional zones */
> + unsigned int nr_conv_zones;
> +
> + /* size of the address space for a zone, in 512b blocks */
> + xfs_daddr_t zone_size;
> + /* write capacity of a zone, in 512b blocks */
> + xfs_daddr_t zone_capacity;
> +};
> +
> +struct zone_topology {
> + struct zone_info data;
> + struct zone_info rt;
> + struct zone_info log;
> +};
> +
> +/* random size that allows efficient processing */
> +#define ZONES_PER_IOCTL 16384
> +
> +static void
> +report_zones(
> + const char *name,
> + struct zone_info *zi)
> +{
> + struct blk_zone_report *rep;
> + bool found_seq = false;
> + int fd, ret = 0;
> + uint64_t device_size;
> + uint64_t sector = 0;
> + size_t rep_size;
> + unsigned int i, n = 0;
> + struct stat st;
> +
> + fd = open(name, O_RDONLY);
> + if (fd < 0) {
> + fprintf(stderr, _("Failed to open RT device: %d.\n"), -errno);
> + exit(1);
> + }
> +
> + if (fstat(fd, &st) < 0) {
> + ret = -EIO;
> + goto out_close;
> + }
> + if (!S_ISBLK(st.st_mode))
> + goto out_close;
> +
> + if (ioctl(fd, BLKGETSIZE64, &device_size)) {
> + fprintf(stderr, _("Failed to get device size: %d.\n"), -errno);
> + exit(1);
> + }
> + if (ioctl(fd, BLKGETZONESZ, &zi->zone_size) || !zi->zone_size)
> + goto out_close; /* not zoned */
> +
> + /* BLKGETSIZE64 reports a byte value */
> + device_size = BTOBB(device_size);
> + zi->nr_zones = device_size / zi->zone_size;
> + zi->nr_conv_zones = 0;
> +
> + rep_size = sizeof(struct blk_zone_report) +
> + sizeof(struct blk_zone) * ZONES_PER_IOCTL;
> + rep = malloc(rep_size);
> + if (!rep) {
> + fprintf(stderr,
> +_("Failed to allocate memory for zone reporting.\n"));
> + exit(1);
> + }
> +
> + while (n < zi->nr_zones) {
> + struct blk_zone *zones = (struct blk_zone *)(rep + 1);
> +
> + memset(rep, 0, rep_size);
> + rep->sector = sector;
> + rep->nr_zones = ZONES_PER_IOCTL;
> +
> + ret = ioctl(fd, BLKREPORTZONE, rep);
> + if (ret) {
> + fprintf(stderr,
> +_("ioctl(BLKREPORTZONE) failed: %d!\n"), -errno);
> + exit(1);
> + }
> + if (!rep->nr_zones)
> + break;
> +
> + for (i = 0; i < rep->nr_zones; i++) {
> + if (n >= zi->nr_zones)
> + break;
> +
> + if (zones[i].len != zi->zone_size) {
> + fprintf(stderr,
> +_("Inconsistent zone size!\n"));
> + exit(1);
> + }
> +
> + switch (zones[i].type) {
> + case BLK_ZONE_TYPE_CONVENTIONAL:
> + /*
> + * We can only use the conventional space at the
> + * start of the device for metadata, so don't
> + * count later conventional zones. This is
> + * not an error because we can use them for data
> + * just fine.
> + */
> + if (!found_seq)
> + zi->nr_conv_zones++;
> + break;
> + case BLK_ZONE_TYPE_SEQWRITE_REQ:
> + found_seq = true;
> + break;
> + case BLK_ZONE_TYPE_SEQWRITE_PREF:
> + fprintf(stderr,
> +_("Sequential write preferred zones not supported.\n"));
> + exit(1);
> + default:
> + fprintf(stderr,
> +_("Unknown zone type (0x%x) found.\n"), zones[i].type);
> + exit(1);
> + }
> +
> + if (!n) {
> + zi->zone_capacity = zones[i].capacity;
> + if (zi->zone_capacity > zi->zone_size) {
> + fprintf(stderr,
> +_("Zone capacity larger than zone size!\n"));
> + exit(1);
> + }
> + } else if (zones[i].capacity != zi->zone_capacity) {
> + fprintf(stderr,
> +_("Inconsistent zone capacity!\n"));
> + exit(1);
> + }
> +
> + n++;
> + }
> + sector = zones[rep->nr_zones - 1].start +
> + zones[rep->nr_zones - 1].len;
> + }
> +
> + free(rep);
> +out_close:
> + close(fd);
> +}
> +
> +static void
> +validate_zoned(
> + struct mkfs_params *cfg,
> + struct cli_params *cli,
> + struct mkfs_default_params *dft,
> + struct zone_topology *zt)
> +{
> + if (!cli->xi->data.isfile) {
> + report_zones(cli->xi->data.name, &zt->data);
> + if (zt->data.nr_zones) {
> + if (!zt->data.nr_conv_zones) {
> + fprintf(stderr,
> +_("Data device requires conventional zones.\n"));
> + usage();
> + }
> + if (zt->data.zone_capacity != zt->data.zone_size) {
> + fprintf(stderr,
> +_("Zone capacity equal to Zone size required for conventional zones.\n"));
> + usage();
> + }
> +
> + cli->sb_feat.zoned = true;
> + cfg->rtstart =
> + zt->data.nr_conv_zones * zt->data.zone_capacity;
> + }
> + }
> +
> + if (cli->xi->rt.name && !cli->xi->rt.isfile) {
> + report_zones(cli->xi->rt.name, &zt->rt);
> + if (zt->rt.nr_zones && !cli->sb_feat.zoned)
> + cli->sb_feat.zoned = true;
> + if (zt->rt.zone_size != zt->rt.zone_capacity)
> + cli->sb_feat.zone_gaps = true;
> + }
> +
> + if (cli->xi->log.name && !cli->xi->log.isfile) {
> + report_zones(cli->xi->log.name, &zt->log);
> + if (zt->log.nr_zones) {
> + fprintf(stderr,
> +_("Zoned devices not supported as log device!\n"));
> + usage();
> + }
> + }
>
> + if (cli->rtstart) {
> + /*
> + * For zoned devices with conventional zones, cfg->rtstart is
> + * set to the start of the first sequential write required zone
> + * above. Don't allow the user to override it as that won't
> + * work.
> + */
> + if (cfg->rtstart) {
> + fprintf(stderr,
> +_("rtstart override not allowed on zoned devices.\n"));
> + usage();
> + }
> + cfg->rtstart = getnum(cli->rtstart, &ropts, R_START) / 512;
> + }
> +
> + if (cli->rtreserved)
> + cfg->rtreserved = cli->rtreserved;
> }
>
> /*
> @@ -2670,7 +2949,37 @@ _("inode btree counters not supported without finobt support\n"));
> cli->sb_feat.inobtcnt = false;
> }
>
> - if (cli->xi->rt.name) {
> + if (cli->sb_feat.zoned) {
> + if (!cli->sb_feat.metadir) {
> + if (cli_opt_set(&mopts, M_METADIR)) {
> + fprintf(stderr,
> +_("zoned realtime device not supported without metadir support\n"));
> + usage();
> + }
> + cli->sb_feat.metadir = true;
> + }
> + if (cli->rtextsize) {
> + if (cli_opt_set(&ropts, R_EXTSIZE)) {
> + fprintf(stderr,
> +_("rt extent size not supported on realtime devices with zoned mode\n"));
> + usage();
> + }
> + cli->rtextsize = 0;
> + }
> + } else {
> + if (cli->rtstart) {
> + fprintf(stderr,
> +_("internal RT section only supported in zoned mode\n"));
> + usage();
> + }
> + if (cli->rtreserved) {
> + fprintf(stderr,
> +_("reserved RT blocks only supported in zoned mode\n"));
> + usage();
> + }
> + }
> +
> + if (cli->xi->rt.name || cfg->rtstart) {
> if (cli->rtextsize && cli->sb_feat.reflink) {
> if (cli_opt_set(&mopts, M_REFLINK)) {
> fprintf(stderr,
> @@ -2911,6 +3220,11 @@ validate_rtextsize(
> usage();
> }
> cfg->rtextblocks = 1;
> + } else if (cli->sb_feat.zoned) {
> + /*
> + * Zoned mode only supports a rtextsize of 1.
> + */
> + cfg->rtextblocks = 1;
> } else {
> /*
> * If realtime extsize has not been specified by the user,
> @@ -3315,7 +3629,8 @@ _("log stripe unit (%d bytes) is too large (maximum is 256KiB)\n"
> static void
> open_devices(
> struct mkfs_params *cfg,
> - struct libxfs_init *xi)
> + struct libxfs_init *xi,
> + struct zone_topology *zt)
> {
> uint64_t sector_mask;
>
> @@ -3330,6 +3645,34 @@ open_devices(
> usage();
> }
>
> + if (zt->data.nr_zones) {
> + zt->rt.zone_size = zt->data.zone_size;
> + zt->rt.zone_capacity = zt->data.zone_capacity;
> + zt->rt.nr_zones = zt->data.nr_zones - zt->data.nr_conv_zones;
> + } else if (cfg->sb_feat.zoned && !cfg->rtstart && !xi->rt.dev) {
> + /*
> + * By default reserve at least 1% of the total capacity (rounded up to
> + * the next power of two) for metadata, but match the minimum we
> + * enforce elsewhere. This matches what SMR HDDs provide.
> + */
> + uint64_t rt_target_size = max((xi->data.size + 99) / 100,
> + BTOBB(300 * 1024 * 1024));
> +
> + cfg->rtstart = 1;
> + while (cfg->rtstart < rt_target_size)
> + cfg->rtstart <<= 1;
> + }
> +
> + if (cfg->rtstart) {
> + if (cfg->rtstart >= xi->data.size) {
> + fprintf(stderr,
> + _("device size %lld too small for zoned allocator\n"), xi->data.size);
> + usage();
> + }
> + xi->rt.size = xi->data.size - cfg->rtstart;
> + xi->data.size = cfg->rtstart;
> + }
> +
> /*
> * Ok, Linux only has a 1024-byte resolution on device _size_,
> * and the sizes below are in basic 512-byte blocks,
> @@ -3348,17 +3691,42 @@ open_devices(
>
> static void
> discard_devices(
> + struct mkfs_params *cfg,
> struct libxfs_init *xi,
> + struct zone_topology *zt,
> int quiet)
> {
> /*
> * This function has to be called after libxfs has been initialized.
> */
>
> - if (!xi->data.isfile)
> - discard_blocks(xi->data.fd, xi->data.size, quiet);
> - if (xi->rt.dev && !xi->rt.isfile)
> - discard_blocks(xi->rt.fd, xi->rt.size, quiet);
> + if (!xi->data.isfile) {
> + uint64_t nsectors = xi->data.size;
> +
> + if (cfg->rtstart && zt->data.nr_zones) {
> + /*
> + * Note that the zone reset here includes the LBA range
> + * for the data device.
> + *
> + * This is because doing a single zone reset all on the
> + * entire device (which the kernel automatically does
> + * for us for a full device range) is a lot faster than
> + * resetting each zone individually and resetting
> + * the conventional zones used for the data device is a
> + * no-op.
> + */
> + reset_zones(cfg, xi->data.fd, 0,
> + cfg->rtstart + xi->rt.size, quiet);
> + nsectors -= cfg->rtstart;
> + }
> + discard_blocks(xi->data.fd, nsectors, quiet);
> + }
> + if (xi->rt.dev && !xi->rt.isfile) {
> + if (zt->rt.nr_zones)
> + reset_zones(cfg, xi->rt.fd, 0, xi->rt.size, quiet);
> + else
> + discard_blocks(xi->rt.fd, xi->rt.size, quiet);
> + }
> if (xi->log.dev && xi->log.dev != xi->data.dev && !xi->log.isfile)
> discard_blocks(xi->log.fd, xi->log.size, quiet);
> }
> @@ -3477,11 +3845,12 @@ reported by the device (%u).\n"),
> static void
> validate_rtdev(
> struct mkfs_params *cfg,
> - struct cli_params *cli)
> + struct cli_params *cli,
> + struct zone_topology *zt)
> {
> struct libxfs_init *xi = cli->xi;
>
> - if (!xi->rt.dev) {
> + if (!xi->rt.dev && !cfg->rtstart) {
> if (cli->rtsize) {
> fprintf(stderr,
> _("size specified for non-existent rt subvolume\n"));
> @@ -3501,7 +3870,7 @@ _("size specified for non-existent rt subvolume\n"));
> if (cli->rtsize) {
> if (cfg->rtblocks > DTOBT(xi->rt.size, cfg->blocklog)) {
> fprintf(stderr,
> -_("size %s specified for rt subvolume is too large, maxi->um is %lld blocks\n"),
> +_("size %s specified for rt subvolume is too large, maximum is %lld blocks\n"),
> cli->rtsize,
> (long long)DTOBT(xi->rt.size, cfg->blocklog));
> usage();
> @@ -3512,6 +3881,9 @@ _("size %s specified for rt subvolume is too large, maxi->um is %lld blocks\n"),
> reported by the device (%u).\n"),
> cfg->sectorsize, xi->rt.bsize);
> }
> + } else if (zt->rt.nr_zones) {
> + cfg->rtblocks = DTOBT(zt->rt.nr_zones * zt->rt.zone_capacity,
> + cfg->blocklog);
> } else {
> /* grab volume size */
> cfg->rtblocks = DTOBT(xi->rt.size, cfg->blocklog);
> @@ -4050,6 +4422,92 @@ _("rgsize (%s) not a multiple of fs blk size (%d)\n"),
> NBBY * (cfg->blocksize - sizeof(struct xfs_rtbuf_blkinfo)));
> }
>
> +static void
> +calculate_zone_geometry(
> + struct mkfs_params *cfg,
> + struct cli_params *cli,
> + struct libxfs_init *xi,
> + struct zone_topology *zt)
> +{
> + if (cfg->rtblocks == 0) {
> + fprintf(stderr,
> +_("empty zoned realtime device not supported.\n"));
> + usage();
> + }
> +
> + if (zt->rt.nr_zones) {
> + /* The RT device has hardware zones */
> + cfg->rgsize = zt->rt.zone_capacity * 512;
> +
> + if (cfg->rgsize % cfg->blocksize) {
> + fprintf(stderr,
> +_("rgsize (%s) not a multiple of fs blk size (%d)\n"),
> + cli->rgsize, cfg->blocksize);
> + usage();
> + }
> + if (cli->rgsize) {
> + fprintf(stderr,
> +_("rgsize (%s) may not be specified when the rt device is zoned\n"),
> + cli->rgsize);
> + usage();
> + }
> +
> + cfg->rgsize /= cfg->blocksize;
> + cfg->rgcount = howmany(cfg->rtblocks, cfg->rgsize);
> +
> + if (cli->rgcount > cfg->rgcount) {
> + fprintf(stderr,
> +_("rgcount (%llu) is larger than hardware zone count (%llu)\n"),
> + (unsigned long long)cli->rgcount,
> + (unsigned long long)cfg->rgcount);
> + usage();
> + } else if (cli->rgcount && cli->rgcount < cfg->rgcount) {
> + /* constrain the rt device to the given rgcount */
> + cfg->rgcount = cli->rgcount;
> + }
> + } else {
> + /* No hardware zones */
> + if (cli->rgsize) {
> + /* User-specified rtgroup size */
> + cfg->rgsize = getnum(cli->rgsize, &ropts, R_RGSIZE);
> +
> + /* Check specified rgsize is a multiple of blocksize. */
> + if (cfg->rgsize % cfg->blocksize) {
> + fprintf(stderr,
> +_("rgsize (%s) not a multiple of fs blk size (%d)\n"),
> + cli->rgsize, cfg->blocksize);
> + usage();
> + }
> + cfg->rgsize /= cfg->blocksize;
> + cfg->rgcount = cfg->rtblocks / cfg->rgsize +
> + (cfg->rtblocks % cfg->rgsize != 0);
> + } else if (cli->rgcount) {
> + /* User-specified rtgroup count */
> + cfg->rgcount = cli->rgcount;
> + cfg->rgsize = cfg->rtblocks / cfg->rgcount +
> + (cfg->rtblocks % cfg->rgcount != 0);
> + } else {
> + /* 256MB zones just like typical SMR HDDs */
> + cfg->rgsize = MEGABYTES(256, cfg->blocklog);
> + cfg->rgcount = cfg->rtblocks / cfg->rgsize +
> + (cfg->rtblocks % cfg->rgsize != 0);
> + }
> + }
> +
> + if (cfg->rgcount < XFS_MIN_ZONES) {
> + fprintf(stderr,
> +_("realtime group count (%llu) must be greater than the minimum zone count (%u)\n"),
> + (unsigned long long)cfg->rgcount,
> + XFS_MIN_ZONES);
> + usage();
> + }
> +
> + validate_rtgroup_geometry(cfg);
> +
> + /* Zoned RT devices don't use the rtbitmap, and have no bitmap blocks */
> + cfg->rtbmblocks = 0;
> +}
> +
> static void
> calculate_imaxpct(
> struct mkfs_params *cfg,
> @@ -4213,6 +4671,14 @@ sb_set_features(
> sbp->sb_rgblklog = libxfs_compute_rgblklog(sbp->sb_rgextents,
> cfg->rtextblocks);
> }
> +
> + if (fp->zoned) {
> + sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_ZONED;
> + sbp->sb_rtstart = (cfg->rtstart * 512) / cfg->blocksize;
> + sbp->sb_rtreserved = cfg->rtreserved / cfg->blocksize;
> + }
> + if (fp->zone_gaps)
> + sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_ZONE_GAPS;
> }
>
> /*
> @@ -4775,9 +5241,11 @@ prepare_devices(
> (xfs_extlen_t)XFS_FSB_TO_BB(mp, cfg->logblocks),
> &sbp->sb_uuid, cfg->sb_feat.log_version,
> lsunit, XLOG_FMT, XLOG_INIT_CYCLE, false);
> -
> /* finally, check we can write the last block in the realtime area */
> - if (mp->m_rtdev_targp->bt_bdev && cfg->rtblocks > 0) {
> + if (mp->m_rtdev_targp->bt_bdev &&
> + mp->m_rtdev_targp != mp->m_ddev_targp &&
> + cfg->rtblocks > 0 &&
> + !xfs_has_zoned(mp)) {
> buf = alloc_write_buf(mp->m_rtdev_targp,
> XFS_FSB_TO_BB(mp, cfg->rtblocks - 1LL),
> BTOBB(cfg->blocksize));
> @@ -5216,7 +5684,7 @@ main(
> */
> },
> };
> -
> + struct zone_topology zt = {};
> struct list_head buffer_list;
> int error;
>
> @@ -5318,6 +5786,7 @@ main(
> sectorsize = cfg.sectorsize;
>
> validate_log_sectorsize(&cfg, &cli, &dft, &ft);
> + validate_zoned(&cfg, &cli, &dft, &zt);
> validate_sb_features(&cfg, &cli);
>
> /*
> @@ -5342,11 +5811,11 @@ main(
> /*
> * Open and validate the device configurations
> */
> - open_devices(&cfg, &xi);
> + open_devices(&cfg, &xi, &zt);
> validate_overwrite(xi.data.name, force_overwrite);
> validate_datadev(&cfg, &cli);
> validate_logdev(&cfg, &cli);
> - validate_rtdev(&cfg, &cli);
> + validate_rtdev(&cfg, &cli, &zt);
> calc_stripe_factors(&cfg, &cli, &ft);
>
> /*
> @@ -5357,7 +5826,10 @@ main(
> */
> calculate_initial_ag_geometry(&cfg, &cli, &xi);
> align_ag_geometry(&cfg);
> - calculate_rtgroup_geometry(&cfg, &cli, &xi);
> + if (cfg.sb_feat.zoned)
> + calculate_zone_geometry(&cfg, &cli, &xi, &zt);
> + else
> + calculate_rtgroup_geometry(&cfg, &cli, &xi);
>
> calculate_imaxpct(&cfg, &cli);
>
> @@ -5410,8 +5882,13 @@ main(
> /*
> * All values have been validated, discard the old device layout.
> */
> + if (cli.sb_feat.zoned && !discard) {
> + fprintf(stderr,
> + _("-K not supported for zoned file systems.\n"));
> + return 1;
> + }
> if (discard && !dry_run)
> - discard_devices(&xi, quiet);
> + discard_devices(&cfg, &xi, &zt, quiet);
>
> /*
> * we need the libxfs buffer cache from here on in.
> --
> 2.47.2
>
>
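The default placement of the internal RT section in the open_devices hunk quoted above can be sketched in isolation. This is only a minimal illustration of the arithmetic (1% of the device rounded up, a 300 MiB floor, then rounding up to a power of two), not the mkfs code itself: default_rtstart, split_device and MIN_DATA_SECTORS are hypothetical names, and all sizes are assumed to be in 512-byte sectors as in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* 300 MiB in 512-byte sectors; mirrors BTOBB(300 * 1024 * 1024). */
#define MIN_DATA_SECTORS	((300ULL * 1024 * 1024) / 512)

/*
 * Hypothetical helper: pick the default start of the internal RT section,
 * in 512-byte sectors, for a device of dev_sectors sectors.
 */
static uint64_t
default_rtstart(uint64_t dev_sectors)
{
	uint64_t	target = (dev_sectors + 99) / 100;	/* 1%, rounded up */
	uint64_t	rtstart = 1;

	/* Never go below the minimum data section size. */
	if (target < MIN_DATA_SECTORS)
		target = MIN_DATA_SECTORS;

	/* Round up to the next power of two. */
	while (rtstart < target)
		rtstart <<= 1;
	return rtstart;
}

/*
 * Hypothetical helper: split the device at rtstart. Returns 0 on success
 * and -1 if the device is too small to leave room for an RT section.
 */
static int
split_device(uint64_t dev_sectors, uint64_t rtstart,
		uint64_t *data_sectors, uint64_t *rt_sectors)
{
	assert(data_sectors && rt_sectors);

	if (rtstart >= dev_sectors)
		return -1;
	*rt_sectors = dev_sectors - rtstart;
	*data_sectors = rtstart;
	return 0;
}
```

For a 1 TiB device (2^31 sectors), 1% rounds up to 21474837 sectors, and the next power of two is 33554432 sectors, so the data section would end up at 16 GiB with the remainder going to the RT section.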
* Re: [PATCH 33/43] xfs_mkfs: default to rtinherit=1 for zoned file systems
2025-04-15 0:37 ` Darrick J. Wong
@ 2025-04-15 8:09 ` Christoph Hellwig
0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-15 8:09 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs
On Mon, Apr 14, 2025 at 05:37:05PM -0700, Darrick J. Wong wrote:
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> >
> > inherit
>
> ^^ Not sure what this is, but assuming that's just paste-fingering,
> the patch looks ok so
This was me squashing rather than folding in my local fixup for
your suggestion.
* [PATCH 44/43] xfs_repair: fix libxfs abstraction mess
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
` (42 preceding siblings ...)
2025-04-14 5:36 ` [PATCH 43/43] xfs_growfs: " Christoph Hellwig
@ 2025-04-25 15:48 ` Darrick J. Wong
2025-04-28 13:17 ` Christoph Hellwig
2025-04-28 16:31 ` Andrey Albershteyn
43 siblings, 2 replies; 60+ messages in thread
From: Darrick J. Wong @ 2025-04-25 15:48 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
Do some xfs -> libxfs callsite conversions to shut up xfs/437.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
libxfs/libxfs_api_defs.h | 1 +
repair/zoned.c | 6 +++---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index dcb5dec0a7abd2..c5fcb5e3229ae4 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -407,6 +407,7 @@
#define xfs_verify_rgbno libxfs_verify_rgbno
#define xfs_verify_rtbno libxfs_verify_rtbno
#define xfs_zero_extent libxfs_zero_extent
+#define xfs_zone_validate libxfs_zone_validate
/* Please keep this list alphabetized. */
diff --git a/repair/zoned.c b/repair/zoned.c
index 456076b9817d76..206b0158f95f84 100644
--- a/repair/zoned.c
+++ b/repair/zoned.c
@@ -30,7 +30,7 @@ report_zones_cb(
}
rgno = xfs_rtb_to_rgno(mp, zsbno);
- rtg = xfs_rtgroup_grab(mp, rgno);
+ rtg = libxfs_rtgroup_grab(mp, rgno);
if (!rtg) {
do_error(_("realtime group not found for zone %u."), rgno);
return;
@@ -39,8 +39,8 @@ report_zones_cb(
if (!rtg_rmap(rtg))
do_warn(_("no rmap inode for zone %u."), rgno);
else
- xfs_zone_validate(zone, rtg, &write_pointer);
- xfs_rtgroup_rele(rtg);
+ libxfs_zone_validate(zone, rtg, &write_pointer);
+ libxfs_rtgroup_rele(rtg);
}
void
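For readers unfamiliar with the aliasing pattern this patch relies on, here is a minimal standalone sketch. It assumes, as I understand it, that the header maps xfs_ names onto libxfs_ names so that code shared with the kernel is exported under the libxfs_ prefix, while tool code is expected to spell out the libxfs_ name. The function names below (xfs_demo_sectors_to_blocks, demo_tool_zone_blocks) are hypothetical and are not part of xfsprogs.

```c
#include <assert.h>
#include <stdint.h>

/* In the style of libxfs_api_defs.h: alias the shared name to the libxfs_ name. */
#define xfs_demo_sectors_to_blocks	libxfs_demo_sectors_to_blocks

/*
 * "Shared" code written with the xfs_ prefix. Because of the macro above,
 * the symbol the compiler actually emits is libxfs_demo_sectors_to_blocks.
 */
uint64_t
xfs_demo_sectors_to_blocks(uint64_t sectors, unsigned int blocklog)
{
	assert(blocklog >= 9);
	/* Convert 512-byte sectors to filesystem blocks of 2^blocklog bytes. */
	return sectors >> (blocklog - 9);
}

/* "Tool" code (think repair/zoned.c) uses the libxfs_ name explicitly. */
uint64_t
demo_tool_zone_blocks(uint64_t zone_sectors)
{
	return libxfs_demo_sectors_to_blocks(zone_sectors, 12);
}
```

Both spellings resolve to the same symbol once the macro is in scope, which is why the conversion in the patch is a naming cleanup rather than a behavior change.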
* Re: [PATCH 44/43] xfs_repair: fix libxfs abstraction mess
2025-04-25 15:48 ` [PATCH 44/43] xfs_repair: fix libxfs abstraction mess Darrick J. Wong
@ 2025-04-28 13:17 ` Christoph Hellwig
2025-04-28 16:28 ` Andrey Albershteyn
2025-04-28 16:31 ` Andrey Albershteyn
1 sibling, 1 reply; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-28 13:17 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs
On Fri, Apr 25, 2025 at 08:48:18AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Do some xfs -> libxfs callsite conversions to shut up xfs/437.
I'll add it to the next round. Which afaik is just this and some minor
commit message tweaking.
Andrey: let me know when it's a good time for another xfsprogs series
round, and if you want the FIXUP patches squashed for that or not.
* Re: [PATCH 44/43] xfs_repair: fix libxfs abstraction mess
2025-04-28 13:17 ` Christoph Hellwig
@ 2025-04-28 16:28 ` Andrey Albershteyn
2025-04-29 12:24 ` Christoph Hellwig
0 siblings, 1 reply; 60+ messages in thread
From: Andrey Albershteyn @ 2025-04-28 16:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, Andrey Albershteyn, Hans Holmberg, linux-xfs
On 2025-04-28 15:17:45, Christoph Hellwig wrote:
> On Fri, Apr 25, 2025 at 08:48:18AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Do some xfs -> libxfs callsite conversions to shut up xfs/437.
>
> I'll add it to the next round. Which afaik is just this and some minor
> commit message tweaking.
>
> Andrey: let me know when it's a good time for another xfsprogs series
> round, and if you want the FIXUP patches squashed for that or not.
>
I'm preparing a for-next update; do you have any other changes for this
patchset? I can fix up the commits and fix the "inherit" commit message.
Or, if you want, you can send it now and I will take v3.
--
- Andrey
* Re: [PATCH 44/43] xfs_repair: fix libxfs abstraction mess
2025-04-25 15:48 ` [PATCH 44/43] xfs_repair: fix libxfs abstraction mess Darrick J. Wong
2025-04-28 13:17 ` Christoph Hellwig
@ 2025-04-28 16:31 ` Andrey Albershteyn
1 sibling, 0 replies; 60+ messages in thread
From: Andrey Albershteyn @ 2025-04-28 16:31 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs
On 2025-04-25 08:48:18, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Do some xfs -> libxfs callsite conversions to shut up xfs/437.
>
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
> libxfs/libxfs_api_defs.h | 1 +
> repair/zoned.c | 6 +++---
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
> index dcb5dec0a7abd2..c5fcb5e3229ae4 100644
> --- a/libxfs/libxfs_api_defs.h
> +++ b/libxfs/libxfs_api_defs.h
> @@ -407,6 +407,7 @@
> #define xfs_verify_rgbno libxfs_verify_rgbno
> #define xfs_verify_rtbno libxfs_verify_rtbno
> #define xfs_zero_extent libxfs_zero_extent
> +#define xfs_zone_validate libxfs_zone_validate
>
> /* Please keep this list alphabetized. */
>
> diff --git a/repair/zoned.c b/repair/zoned.c
> index 456076b9817d76..206b0158f95f84 100644
> --- a/repair/zoned.c
> +++ b/repair/zoned.c
> @@ -30,7 +30,7 @@ report_zones_cb(
> }
>
> rgno = xfs_rtb_to_rgno(mp, zsbno);
> - rtg = xfs_rtgroup_grab(mp, rgno);
> + rtg = libxfs_rtgroup_grab(mp, rgno);
> if (!rtg) {
> do_error(_("realtime group not found for zone %u."), rgno);
> return;
> @@ -39,8 +39,8 @@ report_zones_cb(
> if (!rtg_rmap(rtg))
> do_warn(_("no rmap inode for zone %u."), rgno);
> else
> - xfs_zone_validate(zone, rtg, &write_pointer);
> - xfs_rtgroup_rele(rtg);
> + libxfs_zone_validate(zone, rtg, &write_pointer);
> + libxfs_rtgroup_rele(rtg);
> }
>
> void
>
LGTM
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
--
- Andrey
* Re: [PATCH 44/43] xfs_repair: fix libxfs abstraction mess
2025-04-28 16:28 ` Andrey Albershteyn
@ 2025-04-29 12:24 ` Christoph Hellwig
0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2025-04-29 12:24 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Christoph Hellwig, Darrick J. Wong, Andrey Albershteyn,
Hans Holmberg, linux-xfs
On Mon, Apr 28, 2025 at 06:28:58PM +0200, Andrey Albershteyn wrote:
> > Andrey: let me know when it's a good time for another xfsprogs series
> > round, and if you want the FIXUP patches squashed for that or not.
> >
>
> I'm preparing a for-next update; do you have any other changes for this
> patchset? I can fix up the commits and fix the "inherit" commit message.
> Or, if you want, you can send it now and I will take v3.
Fixing the inherit message and folding in this patch were the only
todo items on my list.
Thread overview: 60+ messages
2025-04-14 5:35 xfsprogs support for zoned devices v2 Christoph Hellwig
2025-04-14 5:35 ` [PATCH 01/43] xfs: generalize the freespace and reserved blocks handling Christoph Hellwig
2025-04-14 5:35 ` [PATCH 02/43] FIXUP: " Christoph Hellwig
2025-04-14 5:35 ` [PATCH 03/43] xfs: make metabtree reservations global Christoph Hellwig
2025-04-14 5:35 ` [PATCH 04/43] FIXUP: " Christoph Hellwig
2025-04-14 5:35 ` [PATCH 05/43] xfs: reduce metafile reservations Christoph Hellwig
2025-04-14 5:35 ` [PATCH 06/43] xfs: add a rtg_blocks helper Christoph Hellwig
2025-04-14 5:35 ` [PATCH 07/43] xfs: move xfs_bmapi_reserve_delalloc to xfs_iomap.c Christoph Hellwig
2025-04-14 5:35 ` [PATCH 08/43] xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay Christoph Hellwig
2025-04-14 5:35 ` [PATCH 09/43] xfs: add a xfs_rtrmap_highest_rgbno helper Christoph Hellwig
2025-04-14 5:35 ` [PATCH 10/43] xfs: define the zoned on-disk format Christoph Hellwig
2025-04-14 5:35 ` [PATCH 11/43] FIXUP: " Christoph Hellwig
2025-04-14 5:35 ` [PATCH 12/43] xfs: allow internal RT devices for zoned mode Christoph Hellwig
2025-04-14 5:35 ` [PATCH 13/43] FIXUP: " Christoph Hellwig
2025-04-14 20:16 ` Darrick J. Wong
2025-04-14 5:35 ` [PATCH 14/43] xfs: export zoned geometry via XFS_FSOP_GEOM Christoph Hellwig
2025-04-14 5:35 ` [PATCH 15/43] xfs: disable sb_frextents for zoned file systems Christoph Hellwig
2025-04-14 5:35 ` [PATCH 16/43] xfs: parse and validate hardware zone information Christoph Hellwig
2025-04-14 5:36 ` [PATCH 17/43] FIXUP: " Christoph Hellwig
2025-04-14 5:36 ` [PATCH 18/43] xfs: add the zoned space allocator Christoph Hellwig
2025-04-14 5:36 ` [PATCH 19/43] xfs: add support for zoned space reservations Christoph Hellwig
2025-04-14 5:36 ` [PATCH 20/43] FIXUP: " Christoph Hellwig
2025-04-14 5:36 ` [PATCH 21/43] xfs: implement zoned garbage collection Christoph Hellwig
2025-04-14 5:36 ` [PATCH 22/43] xfs: enable fsmap reporting for internal RT devices Christoph Hellwig
2025-04-14 5:36 ` [PATCH 23/43] xfs: enable the zoned RT device feature Christoph Hellwig
2025-04-14 5:36 ` [PATCH 24/43] xfs: support zone gaps Christoph Hellwig
2025-04-14 5:36 ` [PATCH 25/43] FIXUP: " Christoph Hellwig
2025-04-14 5:36 ` [PATCH 26/43] libfrog: report the zoned geometry Christoph Hellwig
2025-04-14 22:16 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 27/43] xfs_repair: support repairing zoned file systems Christoph Hellwig
2025-04-15 0:38 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 28/43] xfs_repair: fix the RT device check in process_dinode_int Christoph Hellwig
2025-04-15 0:34 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 29/43] xfs_repair: validate rt groups vs reported hardware zones Christoph Hellwig
2025-04-15 0:39 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 30/43] xfs_mkfs: factor out a validate_rtgroup_geometry helper Christoph Hellwig
2025-04-15 0:40 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 31/43] xfs_mkfs: support creating file system with zoned RT devices Christoph Hellwig
2025-04-15 0:41 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 32/43] xfs_mkfs: calculate zone overprovisioning when specifying size Christoph Hellwig
2025-04-14 5:36 ` [PATCH 33/43] xfs_mkfs: default to rtinherit=1 for zoned file systems Christoph Hellwig
2025-04-15 0:37 ` Darrick J. Wong
2025-04-15 8:09 ` Christoph Hellwig
2025-04-14 5:36 ` [PATCH 34/43] xfs_mkfs: reflink conflicts with zoned file systems for now Christoph Hellwig
2025-04-14 5:36 ` [PATCH 35/43] xfs_mkfs: document the new zoned options in the man page Christoph Hellwig
2025-04-14 5:36 ` [PATCH 36/43] man: document XFS_FSOP_GEOM_FLAGS_ZONED Christoph Hellwig
2025-04-15 0:36 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 37/43] xfs_io: correctly report RGs with internal rt dev in bmap output Christoph Hellwig
2025-04-14 5:36 ` [PATCH 38/43] xfs_io: don't re-query fs_path information in fsmap_f Christoph Hellwig
2025-04-14 5:36 ` [PATCH 39/43] xfs_io: handle internal RT devices in fsmap output Christoph Hellwig
2025-04-14 5:36 ` [PATCH 40/43] xfs_spaceman: handle internal RT devices Christoph Hellwig
2025-04-14 5:36 ` [PATCH 41/43] xfs_scrub: support internal RT device Christoph Hellwig
2025-04-15 0:35 ` Darrick J. Wong
2025-04-14 5:36 ` [PATCH 42/43] xfs_mdrestore: support internal RT devices Christoph Hellwig
2025-04-14 5:36 ` [PATCH 43/43] xfs_growfs: " Christoph Hellwig
2025-04-25 15:48 ` [PATCH 44/43] xfs_repair: fix libxfs abstraction mess Darrick J. Wong
2025-04-28 13:17 ` Christoph Hellwig
2025-04-28 16:28 ` Andrey Albershteyn
2025-04-29 12:24 ` Christoph Hellwig
2025-04-28 16:31 ` Andrey Albershteyn