public inbox for linux-xfs@vger.kernel.org
* xfsprogs support for zoned devices
@ 2025-04-09  7:55 Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 01/45] xfs: generalize the freespace and reserved blocks handling Christoph Hellwig
                   ` (44 more replies)
  0 siblings, 45 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Hi all,

this series adds support for zoned devices to xfsprogs, matching the
kernel support that landed in 6.15-rc1.

For the libxfs syncs I kept separate fixup patches for the xfsprogs
local changes not directly ported from the kernel to make rebasing
easier.  I'll squash them for the final submission.



* [PATCH 01/45] xfs: generalize the freespace and reserved blocks handling
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 02/45] FIXUP: " Christoph Hellwig
                   ` (43 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 712bae96631852c1a1822ee4f57a08ccd843358b

xfs_{add,dec}_freecounter already handles the block and RT extent
percpu counters, but it currently hardcodes the passed in counter.

Add a freecounter abstraction that uses an enum to designate the counter
and add wrappers that hide the actual percpu_counters.  This will allow
expanding the reserved block handling to the RT extent counter in the
next step, and also prepares for adding yet another such counter that
can share the code.  Both these additions will be needed for the zoned
allocator.

Also switch the flooring of the frextents counter to 0 in statfs for the
rtinherit case to a manual min_t call to match the handling of the
fdblocks counter for normal file systems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ialloc.c   |  2 +-
 libxfs/xfs_metafile.c |  2 +-
 libxfs/xfs_sb.c       |  8 ++++----
 libxfs/xfs_types.h    | 17 +++++++++++++++++
 4 files changed, 23 insertions(+), 6 deletions(-)
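
The freecounter abstraction described in the commit message can be
sketched as below.  This is a minimal illustration only: the enum mirrors
the one this patch adds to xfs_types.h, but struct demo_mount and the
demo_* helpers are hypothetical stand-ins that use plain integers instead
of the kernel's percpu counters.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the enum added to libxfs/xfs_types.h by this patch. */
enum xfs_free_counter {
	XC_FREE_BLOCKS,		/* free blocks on the data device */
	XC_FREE_RTEXTENTS,	/* free RT extents on the RT device */
	XC_FREE_NR,
};

/* Hypothetical stand-in for struct xfs_mount: one slot per counter. */
struct demo_mount {
	int64_t		free[XC_FREE_NR];
};

/* Read a counter, flooring negative values at zero. */
static inline uint64_t
demo_sum_freecounter(
	const struct demo_mount	*mp,
	enum xfs_free_counter	ctr)
{
	assert(ctr < XC_FREE_NR);
	return mp->free[ctr] > 0 ? (uint64_t)mp->free[ctr] : 0;
}

/* Three-way compare of a counter against rhs: returns 1, 0 or -1. */
static inline int
demo_compare_freecounter(
	const struct demo_mount	*mp,
	enum xfs_free_counter	ctr,
	int64_t			rhs)
{
	assert(ctr < XC_FREE_NR);
	if (mp->free[ctr] > rhs)
		return 1;
	if (mp->free[ctr] < rhs)
		return -1;
	return 0;
}
```

Callers name the counter through the enum and never touch the storage
directly, so another counter only needs a new enum value before
XC_FREE_NR.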

diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 63ce76755eb7..b401299ad933 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -1922,7 +1922,7 @@ xfs_dialloc(
 	 * that we can immediately allocate, but then we allow allocation on the
 	 * second pass if we fail to find an AG with free inodes in it.
 	 */
-	if (percpu_counter_read_positive(&mp->m_fdblocks) <
+	if (xfs_estimate_freecounter(mp, XC_FREE_BLOCKS) <
 			mp->m_low_space[XFS_LOWSP_1_PCNT]) {
 		ok_alloc = false;
 		low_space = true;
diff --git a/libxfs/xfs_metafile.c b/libxfs/xfs_metafile.c
index 4488e38a8734..7673265510fd 100644
--- a/libxfs/xfs_metafile.c
+++ b/libxfs/xfs_metafile.c
@@ -93,7 +93,7 @@ xfs_metafile_resv_can_cover(
 	 * There aren't enough blocks left in the inode's reservation, but it
 	 * isn't critical unless there also isn't enough free space.
 	 */
-	return __percpu_counter_compare(&ip->i_mount->m_fdblocks,
+	return xfs_compare_freecounter(ip->i_mount, XC_FREE_BLOCKS,
 			rhs - ip->i_delayed_blks, 2048) >= 0;
 }
 
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 50a43c0328cb..1781ca36b2cc 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1262,8 +1262,7 @@ xfs_log_sb(
 		mp->m_sb.sb_ifree = min_t(uint64_t,
 				percpu_counter_sum_positive(&mp->m_ifree),
 				mp->m_sb.sb_icount);
-		mp->m_sb.sb_fdblocks =
-				percpu_counter_sum_positive(&mp->m_fdblocks);
+		mp->m_sb.sb_fdblocks = xfs_sum_freecounter(mp, XC_FREE_BLOCKS);
 	}
 
 	/*
@@ -1272,9 +1271,10 @@ xfs_log_sb(
 	 * we handle nearly-lockless reservations, so we must use the _positive
 	 * variant here to avoid writing out nonsense frextents.
 	 */
-	if (xfs_has_rtgroups(mp))
+	if (xfs_has_rtgroups(mp)) {
 		mp->m_sb.sb_frextents =
-				percpu_counter_sum_positive(&mp->m_frextents);
+				xfs_sum_freecounter(mp, XC_FREE_RTEXTENTS);
+	}
 
 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
 	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index ca2401c1facd..76f3c31573ec 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -233,6 +233,23 @@ enum xfs_group_type {
 	{ XG_TYPE_AG,	"ag" }, \
 	{ XG_TYPE_RTG,	"rtg" }
 
+enum xfs_free_counter {
+	/*
+	 * Number of free blocks on the data device.
+	 */
+	XC_FREE_BLOCKS,
+
+	/*
+	 * Number of free RT extents on the RT device.
+	 */
+	XC_FREE_RTEXTENTS,
+	XC_FREE_NR,
+};
+
+#define XFS_FREECOUNTER_STR \
+	{ XC_FREE_BLOCKS,		"blocks" }, \
+	{ XC_FREE_RTEXTENTS,		"rtextents" }
+
 /*
  * Type verifier functions
  */
-- 
2.47.2



* [PATCH 02/45] FIXUP: xfs: generalize the freespace and reserved blocks handling
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 01/45] xfs: generalize the freespace and reserved blocks handling Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 15:32   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 03/45] xfs: make metabtree reservations global Christoph Hellwig
                   ` (42 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_mount.h  | 32 ++++++++++++++++++++++++++++++--
 libxfs/libxfs_priv.h | 13 +------------
 2 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 383cba7d6e3f..e0f72fc32b25 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -63,8 +63,6 @@ typedef struct xfs_mount {
 	xfs_sb_t		m_sb;		/* copy of fs superblock */
 #define m_icount	m_sb.sb_icount
 #define m_ifree		m_sb.sb_ifree
-#define m_fdblocks	m_sb.sb_fdblocks
-#define m_frextents	m_sb.sb_frextents
 	spinlock_t		m_sb_lock;
 
 	/*
@@ -332,6 +330,36 @@ static inline bool xfs_is_ ## name (struct xfs_mount *mp) \
 __XFS_UNSUPP_OPSTATE(readonly)
 __XFS_UNSUPP_OPSTATE(shutdown)
 
+static inline int64_t xfs_sum_freecounter(struct xfs_mount *mp,
+		enum xfs_free_counter ctr)
+{
+	if (ctr == XC_FREE_RTEXTENTS)
+		return mp->m_sb.sb_frextents;
+	return mp->m_sb.sb_fdblocks;
+}
+
+static inline int64_t xfs_estimate_freecounter(struct xfs_mount *mp,
+		enum xfs_free_counter ctr)
+{
+	return xfs_sum_freecounter(mp, ctr);
+}
+
+static inline int xfs_compare_freecounter(struct xfs_mount *mp,
+		enum xfs_free_counter ctr, int64_t rhs, int32_t batch)
+{
+	uint64_t count;
+
+	if (ctr == XC_FREE_RTEXTENTS)
+		count = mp->m_sb.sb_frextents;
+	else
+		count = mp->m_sb.sb_fdblocks;
+	if (count > rhs)
+		return 1;
+	else if (count < rhs)
+		return -1;
+	return 0;
+}
+
 /* don't fail on device size or AG count checks */
 #define LIBXFS_MOUNT_DEBUGGER		(1U << 0)
 /* report metadata corruption to stdout */
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 7e5c125b581a..cb4800de0b11 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -209,7 +209,7 @@ static inline bool WARN_ON(bool expr) {
 }
 
 #define WARN_ON_ONCE(e)			WARN_ON(e)
-#define percpu_counter_read(x)		(*x)
+
 #define percpu_counter_read_positive(x)	((*x) > 0 ? (*x) : 0)
 #define percpu_counter_sum_positive(x)	((*x) > 0 ? (*x) : 0)
 
@@ -219,17 +219,6 @@ uint32_t get_random_u32(void);
 #define get_random_u32()	(0)
 #endif
 
-static inline int
-__percpu_counter_compare(uint64_t *count, int64_t rhs, int32_t batch)
-{
-	if (*count > rhs)
-		return 1;
-	else if (*count < rhs)
-		return -1;
-	return 0;
-}
-
-
 #define PAGE_SIZE		getpagesize()
 extern unsigned int PAGE_SHIFT;
 
-- 
2.47.2



* [PATCH 03/45] xfs: make metabtree reservations global
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 01/45] xfs: generalize the freespace and reserved blocks handling Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 02/45] FIXUP: " Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 04/45] FIXUP: " Christoph Hellwig
                   ` (41 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 1df8d75030b787a9fae270b59b93eef809dd2011

Currently each metabtree inode has its own space reservation to ensure
it can be expanded to the maximum size, mirroring what is done for the
AG-based btrees.  But unlike the AG-based btrees the metabtree inodes
aren't restricted to allocate from a single AG but can use free space
from the entire file system.  And unlike AG-based btrees where the
required reservation shrinks with the available free space due to this,
the metabtree reservations for the rtrmap and rtreflink trees are not
bound in any way by the data device free space as they track RT extent
allocations.  This is not very efficient as it requires a large number
of blocks to be set aside that can't be used at all by other btrees.

Switch to a model that uses a global pool instead, in preparation for
reducing the amount of reserved space.  This also removes the
overloading of the i_nblocks field for metabtree inodes, which would
create problems if metabtree inodes ever had a big enough xattr fork
to require xattr blocks outside the inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_metafile.c | 164 ++++++++++++++++++++++++++----------------
 libxfs/xfs_metafile.h |   6 +-
 2 files changed, 107 insertions(+), 63 deletions(-)
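
The global pool model described in the commit message can be sketched as
below.  This is a simplified model under stated assumptions, not the
libxfs code: struct demo_resv_pool and the demo_* functions are
hypothetical, a plain fdblocks field stands in for the free block
counter, and transaction and superblock accounting are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the m_metafile_resv_* fields of the mount. */
struct demo_resv_pool {
	pthread_mutex_t	lock;
	uint64_t	target;		/* blocks we want set aside in total */
	uint64_t	used;		/* blocks already owned by the btrees */
	uint64_t	avail;		/* reserved blocks not yet handed out */
	uint64_t	fdblocks;	/* unreserved free blocks (stand-in) */
};

/* Hide (target - used) blocks from free space; returns 0 or -1 if short. */
static int
demo_resv_init(struct demo_resv_pool *p, uint64_t used, uint64_t target,
		uint64_t fdblocks)
{
	uint64_t	hidden;

	pthread_mutex_init(&p->lock, NULL);
	if (used > target)
		target = used;
	hidden = target - used;
	if (hidden > fdblocks)
		return -1;
	p->fdblocks = fdblocks - hidden;
	p->target = target;
	p->used = used;
	p->avail = hidden;
	return 0;
}

/* Allocate len blocks: draw from the pool first, then from free space. */
static void
demo_resv_alloc(struct demo_resv_pool *p, uint64_t len)
{
	uint64_t	from_resv;

	pthread_mutex_lock(&p->lock);
	from_resv = len < p->avail ? len : p->avail;
	p->avail -= from_resv;
	assert(p->fdblocks >= len - from_resv);
	p->fdblocks -= len - from_resv;
	p->used += len;
	pthread_mutex_unlock(&p->lock);
}

/* Free len blocks: refill the pool up to target, return the rest. */
static void
demo_resv_free(struct demo_resv_pool *p, uint64_t len)
{
	uint64_t	to_resv = 0;

	pthread_mutex_lock(&p->lock);
	assert(p->used >= len);
	p->used -= len;
	if (p->target > p->used + p->avail) {
		to_resv = p->target - (p->used + p->avail);
		if (to_resv > len)
			to_resv = len;
		p->avail += to_resv;
	}
	p->fdblocks += len - to_resv;
	pthread_mutex_unlock(&p->lock);
}
```

With used = 10, target = 100 and 1000 free blocks, init hides 90 blocks.
Allocating 100 blocks then drains the pool and takes 10 more from free
space, and freeing 30 of them puts 20 back into the pool and 10 back
into free space.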

diff --git a/libxfs/xfs_metafile.c b/libxfs/xfs_metafile.c
index 7673265510fd..1216a0e5169e 100644
--- a/libxfs/xfs_metafile.c
+++ b/libxfs/xfs_metafile.c
@@ -19,6 +19,9 @@
 #include "xfs_inode.h"
 #include "xfs_errortag.h"
 #include "xfs_alloc.h"
+#include "xfs_rtgroup.h"
+#include "xfs_rtrmap_btree.h"
+#include "xfs_rtrefcount_btree.h"
 
 static const struct {
 	enum xfs_metafile_type	mtype;
@@ -72,12 +75,11 @@ xfs_metafile_clear_iflag(
 }
 
 /*
- * Is the amount of space that could be allocated towards a given metadata
- * file at or beneath a certain threshold?
+ * Is the metafile reservations at or beneath a certain threshold?
  */
 static inline bool
 xfs_metafile_resv_can_cover(
-	struct xfs_inode	*ip,
+	struct xfs_mount	*mp,
 	int64_t			rhs)
 {
 	/*
@@ -86,43 +88,38 @@ xfs_metafile_resv_can_cover(
 	 * global free block count.  Take care of the first case to avoid
 	 * touching the per-cpu counter.
 	 */
-	if (ip->i_delayed_blks >= rhs)
+	if (mp->m_metafile_resv_avail >= rhs)
 		return true;
 
 	/*
 	 * There aren't enough blocks left in the inode's reservation, but it
 	 * isn't critical unless there also isn't enough free space.
 	 */
-	return xfs_compare_freecounter(ip->i_mount, XC_FREE_BLOCKS,
-			rhs - ip->i_delayed_blks, 2048) >= 0;
+	return xfs_compare_freecounter(mp, XC_FREE_BLOCKS,
+			rhs - mp->m_metafile_resv_avail, 2048) >= 0;
 }
 
 /*
- * Is this metadata file critically low on blocks?  For now we'll define that
- * as the number of blocks we can get our hands on being less than 10% of what
- * we reserved or less than some arbitrary number (maximum btree height).
+ * Is the metafile reservation critically low on blocks?  For now we'll define
+ * that as the number of blocks we can get our hands on being less than 10% of
+ * what we reserved or less than some arbitrary number (maximum btree height).
  */
 bool
 xfs_metafile_resv_critical(
-	struct xfs_inode	*ip)
+	struct xfs_mount	*mp)
 {
-	uint64_t		asked_low_water;
+	ASSERT(xfs_has_metadir(mp));
 
-	if (!ip)
-		return false;
-
-	ASSERT(xfs_is_metadir_inode(ip));
-	trace_xfs_metafile_resv_critical(ip, 0);
+	trace_xfs_metafile_resv_critical(mp, 0);
 
-	if (!xfs_metafile_resv_can_cover(ip, ip->i_mount->m_rtbtree_maxlevels))
+	if (!xfs_metafile_resv_can_cover(mp, mp->m_rtbtree_maxlevels))
 		return true;
 
-	asked_low_water = div_u64(ip->i_meta_resv_asked, 10);
-	if (!xfs_metafile_resv_can_cover(ip, asked_low_water))
+	if (!xfs_metafile_resv_can_cover(mp,
+			div_u64(mp->m_metafile_resv_target, 10)))
 		return true;
 
-	return XFS_TEST_ERROR(false, ip->i_mount,
-			XFS_ERRTAG_METAFILE_RESV_CRITICAL);
+	return XFS_TEST_ERROR(false, mp, XFS_ERRTAG_METAFILE_RESV_CRITICAL);
 }
 
 /* Allocate a block from the metadata file's reservation. */
@@ -131,22 +128,24 @@ xfs_metafile_resv_alloc_space(
 	struct xfs_inode	*ip,
 	struct xfs_alloc_arg	*args)
 {
+	struct xfs_mount	*mp = ip->i_mount;
 	int64_t			len = args->len;
 
 	ASSERT(xfs_is_metadir_inode(ip));
 	ASSERT(args->resv == XFS_AG_RESV_METAFILE);
 
-	trace_xfs_metafile_resv_alloc_space(ip, args->len);
+	trace_xfs_metafile_resv_alloc_space(mp, args->len);
 
 	/*
 	 * Allocate the blocks from the metadata inode's block reservation
 	 * and update the ondisk sb counter.
 	 */
-	if (ip->i_delayed_blks > 0) {
+	mutex_lock(&mp->m_metafile_resv_lock);
+	if (mp->m_metafile_resv_avail > 0) {
 		int64_t		from_resv;
 
-		from_resv = min_t(int64_t, len, ip->i_delayed_blks);
-		ip->i_delayed_blks -= from_resv;
+		from_resv = min_t(int64_t, len, mp->m_metafile_resv_avail);
+		mp->m_metafile_resv_avail -= from_resv;
 		xfs_mod_delalloc(ip, 0, -from_resv);
 		xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_RES_FDBLOCKS,
 				-from_resv);
@@ -173,6 +172,9 @@ xfs_metafile_resv_alloc_space(
 		xfs_trans_mod_sb(args->tp, field, -len);
 	}
 
+	mp->m_metafile_resv_used += args->len;
+	mutex_unlock(&mp->m_metafile_resv_lock);
+
 	ip->i_nblocks += args->len;
 	xfs_trans_log_inode(args->tp, ip, XFS_ILOG_CORE);
 }
@@ -184,26 +186,33 @@ xfs_metafile_resv_free_space(
 	struct xfs_trans	*tp,
 	xfs_filblks_t		len)
 {
+	struct xfs_mount	*mp = ip->i_mount;
 	int64_t			to_resv;
 
 	ASSERT(xfs_is_metadir_inode(ip));
-	trace_xfs_metafile_resv_free_space(ip, len);
+
+	trace_xfs_metafile_resv_free_space(mp, len);
 
 	ip->i_nblocks -= len;
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
+	mutex_lock(&mp->m_metafile_resv_lock);
+	mp->m_metafile_resv_used -= len;
+
 	/*
 	 * Add the freed blocks back into the inode's delalloc reservation
 	 * until it reaches the maximum size.  Update the ondisk fdblocks only.
 	 */
-	to_resv = ip->i_meta_resv_asked - (ip->i_nblocks + ip->i_delayed_blks);
+	to_resv = mp->m_metafile_resv_target -
+		(mp->m_metafile_resv_used + mp->m_metafile_resv_avail);
 	if (to_resv > 0) {
 		to_resv = min_t(int64_t, to_resv, len);
-		ip->i_delayed_blks += to_resv;
+		mp->m_metafile_resv_avail += to_resv;
 		xfs_mod_delalloc(ip, 0, to_resv);
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_RES_FDBLOCKS, to_resv);
 		len -= to_resv;
 	}
+	mutex_unlock(&mp->m_metafile_resv_lock);
 
 	/*
 	 * Everything else goes back to the filesystem, so update the in-core
@@ -213,61 +222,96 @@ xfs_metafile_resv_free_space(
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, len);
 }
 
-/* Release a metadata file's space reservation. */
+static void
+__xfs_metafile_resv_free(
+	struct xfs_mount	*mp)
+{
+	if (mp->m_metafile_resv_avail) {
+		xfs_mod_sb_delalloc(mp, -(int64_t)mp->m_metafile_resv_avail);
+		xfs_add_fdblocks(mp, mp->m_metafile_resv_avail);
+	}
+	mp->m_metafile_resv_avail = 0;
+	mp->m_metafile_resv_used = 0;
+	mp->m_metafile_resv_target = 0;
+}
+
+/* Release unused metafile space reservation. */
 void
 xfs_metafile_resv_free(
-	struct xfs_inode	*ip)
+	struct xfs_mount	*mp)
 {
-	/* Non-btree metadata inodes don't need space reservations. */
-	if (!ip || !ip->i_meta_resv_asked)
+	if (!xfs_has_metadir(mp))
 		return;
 
-	ASSERT(xfs_is_metadir_inode(ip));
-	trace_xfs_metafile_resv_free(ip, 0);
+	trace_xfs_metafile_resv_free(mp, 0);
 
-	if (ip->i_delayed_blks) {
-		xfs_mod_delalloc(ip, 0, -ip->i_delayed_blks);
-		xfs_add_fdblocks(ip->i_mount, ip->i_delayed_blks);
-		ip->i_delayed_blks = 0;
-	}
-	ip->i_meta_resv_asked = 0;
+	mutex_lock(&mp->m_metafile_resv_lock);
+	__xfs_metafile_resv_free(mp);
+	mutex_unlock(&mp->m_metafile_resv_lock);
 }
 
-/* Set up a metadata file's space reservation. */
+/* Set up a metafile space reservation. */
 int
 xfs_metafile_resv_init(
-	struct xfs_inode	*ip,
-	xfs_filblks_t		ask)
+	struct xfs_mount	*mp)
 {
+	struct xfs_rtgroup	*rtg = NULL;
+	xfs_filblks_t		used = 0, target = 0;
 	xfs_filblks_t		hidden_space;
-	xfs_filblks_t		used;
-	int			error;
+	int			error = 0;
 
-	if (!ip || ip->i_meta_resv_asked > 0)
+	if (!xfs_has_metadir(mp))
 		return 0;
 
-	ASSERT(xfs_is_metadir_inode(ip));
+	/*
+	 * Free any previous reservation to have a clean slate.
+	 */
+	mutex_lock(&mp->m_metafile_resv_lock);
+	__xfs_metafile_resv_free(mp);
+
+	/*
+	 * Currently the only btree metafiles that require reservations are the
+	 * rtrmap and the rtrefcount.  Anything new will have to be added here
+	 * as well.
+	 */
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
+		if (xfs_has_rtrmapbt(mp)) {
+			used += rtg_rmap(rtg)->i_nblocks;
+			target += xfs_rtrmapbt_calc_reserves(mp);
+		}
+		if (xfs_has_rtreflink(mp)) {
+			used += rtg_refcount(rtg)->i_nblocks;
+			target += xfs_rtrefcountbt_calc_reserves(mp);
+		}
+	}
+
+	if (!target)
+		goto out_unlock;
 
 	/*
-	 * Space taken by all other metadata btrees are accounted on-disk as
+	 * Space taken by the per-AG metadata btrees are accounted on-disk as
 	 * used space.  We therefore only hide the space that is reserved but
 	 * not used by the trees.
 	 */
-	used = ip->i_nblocks;
-	if (used > ask)
-		ask = used;
-	hidden_space = ask - used;
+	if (used > target)
+		target = used;
+	hidden_space = target - used;
 
-	error = xfs_dec_fdblocks(ip->i_mount, hidden_space, true);
+	error = xfs_dec_fdblocks(mp, hidden_space, true);
 	if (error) {
-		trace_xfs_metafile_resv_init_error(ip, error, _RET_IP_);
-		return error;
+		trace_xfs_metafile_resv_init_error(mp, 0);
+		goto out_unlock;
 	}
 
-	xfs_mod_delalloc(ip, 0, hidden_space);
-	ip->i_delayed_blks = hidden_space;
-	ip->i_meta_resv_asked = ask;
+	xfs_mod_sb_delalloc(mp, hidden_space);
+
+	mp->m_metafile_resv_target = target;
+	mp->m_metafile_resv_used = used;
+	mp->m_metafile_resv_avail = hidden_space;
+
+	trace_xfs_metafile_resv_init(mp, target);
 
-	trace_xfs_metafile_resv_init(ip, ask);
-	return 0;
+out_unlock:
+	mutex_unlock(&mp->m_metafile_resv_lock);
+	return error;
 }
diff --git a/libxfs/xfs_metafile.h b/libxfs/xfs_metafile.h
index 95af4b52e5a7..ae6f9e779b98 100644
--- a/libxfs/xfs_metafile.h
+++ b/libxfs/xfs_metafile.h
@@ -26,13 +26,13 @@ void xfs_metafile_clear_iflag(struct xfs_trans *tp, struct xfs_inode *ip);
 /* Space reservations for metadata inodes. */
 struct xfs_alloc_arg;
 
-bool xfs_metafile_resv_critical(struct xfs_inode *ip);
+bool xfs_metafile_resv_critical(struct xfs_mount *mp);
 void xfs_metafile_resv_alloc_space(struct xfs_inode *ip,
 		struct xfs_alloc_arg *args);
 void xfs_metafile_resv_free_space(struct xfs_inode *ip, struct xfs_trans *tp,
 		xfs_filblks_t len);
-void xfs_metafile_resv_free(struct xfs_inode *ip);
-int xfs_metafile_resv_init(struct xfs_inode *ip, xfs_filblks_t ask);
+void xfs_metafile_resv_free(struct xfs_mount *mp);
+int xfs_metafile_resv_init(struct xfs_mount *mp);
 
 /* Code specific to kernel/userspace; must be provided externally. */
 
-- 
2.47.2
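
The low-space test in xfs_metafile_resv_critical() from the patch above
can be sketched as follows.  The demo_* names are hypothetical and the
free block counter is passed in as a plain integer.  The logic follows
the comments in the diff: the reservation is critical when the pool plus
free space cannot cover either the maximum btree height or 10% of the
target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can the pool, topped up from free space if needed, cover need blocks? */
static bool
demo_can_cover(uint64_t avail, uint64_t fdblocks, uint64_t need)
{
	/* Enough left in the reservation itself. */
	if (avail >= need)
		return true;
	/* Otherwise the shortfall must fit into the free space. */
	return fdblocks >= need - avail;
}

/* Critical if we cannot cover maxlevels blocks or a tenth of target. */
static bool
demo_resv_critical(uint64_t target, uint64_t avail, uint64_t fdblocks,
		unsigned int maxlevels)
{
	if (!demo_can_cover(avail, fdblocks, maxlevels))
		return true;
	if (!demo_can_cover(avail, fdblocks, target / 10))
		return true;
	return false;
}
```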



* [PATCH 04/45] FIXUP: xfs: make metabtree reservations global
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (2 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 03/45] xfs: make metabtree reservations global Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 15:43   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 05/45] xfs: reduce metafile reservations Christoph Hellwig
                   ` (40 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

---
 include/spinlock.h   |  5 +++++
 include/xfs_mount.h  |  4 ++++
 libxfs/libxfs_priv.h |  1 +
 mkfs/xfs_mkfs.c      | 25 ++++---------------------
 4 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/include/spinlock.h b/include/spinlock.h
index 82973726b101..73bd8c078fea 100644
--- a/include/spinlock.h
+++ b/include/spinlock.h
@@ -22,4 +22,9 @@ typedef pthread_mutex_t	spinlock_t;
 #define spin_trylock(l)		(pthread_mutex_trylock(l) != EBUSY)
 #define spin_unlock(l)		pthread_mutex_unlock(l)
 
+#define mutex_init(l)		pthread_mutex_init(l, NULL)
+#define mutex_lock(l)		pthread_mutex_lock(l)
+#define mutex_trylock(l)	(pthread_mutex_trylock(l) != EBUSY)
+#define mutex_unlock(l)		pthread_mutex_unlock(l)
+
 #endif /* __LIBXFS_SPINLOCK_H__ */
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index e0f72fc32b25..0acf952eb9d7 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -164,6 +164,10 @@ typedef struct xfs_mount {
 	atomic64_t		m_allocbt_blks;
 	spinlock_t		m_perag_lock;	/* lock for m_perag_tree */
 
+	pthread_mutex_t		m_metafile_resv_lock;
+	uint64_t		m_metafile_resv_target;
+	uint64_t		m_metafile_resv_used;
+	uint64_t		m_metafile_resv_avail;
 } xfs_mount_t;
 
 #define M_IGEO(mp)		(&(mp)->m_ino_geo)
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index cb4800de0b11..82952b0db629 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -151,6 +151,7 @@ enum ce { CE_DEBUG, CE_CONT, CE_NOTE, CE_WARN, CE_ALERT, CE_PANIC };
 
 #define xfs_force_shutdown(d,n)		((void) 0)
 #define xfs_mod_delalloc(a,b,c)		((void) 0)
+#define xfs_mod_sb_delalloc(sb, d)	((void) 0)
 
 /* stop unused var warnings by assigning mp to itself */
 
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 3f4455d46383..39e3349205fb 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -5102,8 +5102,6 @@ check_rt_meta_prealloc(
 	struct xfs_mount	*mp)
 {
 	struct xfs_perag	*pag = NULL;
-	struct xfs_rtgroup	*rtg = NULL;
-	xfs_filblks_t		ask;
 	int			error;
 
 	/*
@@ -5123,27 +5121,12 @@ check_rt_meta_prealloc(
 		}
 	}
 
-	/* Realtime metadata btree inode */
-	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
-		ask = libxfs_rtrmapbt_calc_reserves(mp);
-		error = -libxfs_metafile_resv_init(rtg_rmap(rtg), ask);
-		if (error)
-			prealloc_fail(mp, error, ask, _("realtime rmap btree"));
-
-		ask = libxfs_rtrefcountbt_calc_reserves(mp);
-		error = -libxfs_metafile_resv_init(rtg_refcount(rtg), ask);
-		if (error)
-			prealloc_fail(mp, error, ask,
-					_("realtime refcount btree"));
-	}
+	error = -libxfs_metafile_resv_init(mp);
+	if (error)
+		prealloc_fail(mp, error, 0, _("metafile"));
 
-	/* Unreserve the realtime metadata reservations. */
-	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
-		libxfs_metafile_resv_free(rtg_rmap(rtg));
-		libxfs_metafile_resv_free(rtg_refcount(rtg));
-	}
+	libxfs_metafile_resv_free(mp);
 
-	/* Unreserve the per-AG reservations. */
 	while ((pag = xfs_perag_next(mp, pag)))
 		libxfs_ag_resv_free(pag);
 
-- 
2.47.2



* [PATCH 05/45] xfs: reduce metafile reservations
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (3 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 04/45] FIXUP: " Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 06/45] xfs: add a rtg_blocks helper Christoph Hellwig
                   ` (39 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 272e20bb24dc895375ccc18a82596a7259b5a652

There is no point in reserving more space than actually available
on the data device for the worst case scenario that is unlikely to
happen.  Reserve at most 1/4th of the data device blocks, which is
still a heuristic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_metafile.c | 3 +++
 1 file changed, 3 insertions(+)
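
The cap described in the commit message can be sketched as a small
helper.  demo_resv_target() is a hypothetical name; it copies the
two-branch clamp from this patch as written and does not consider the
case where used itself exceeds a quarter of the device.

```c
#include <stdint.h>

/*
 * Return the reservation target after applying the cap.  The caller
 * would then hide (target - used) blocks from the free space counter.
 */
static uint64_t
demo_resv_target(uint64_t used, uint64_t target, uint64_t dblocks)
{
	uint64_t	dblocks_avail = dblocks / 4;

	if (used > target)
		target = used;		/* never reserve less than in use */
	else if (target > dblocks_avail)
		target = dblocks_avail;	/* cap at 1/4 of the data device */
	return target;
}
```

For a 1000 block data device the cap is 250 blocks, so a worst-case
target of 500 with 10 blocks in use is reduced to 250, while a target of
100 is left alone.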

diff --git a/libxfs/xfs_metafile.c b/libxfs/xfs_metafile.c
index 1216a0e5169e..6ded87d09ab7 100644
--- a/libxfs/xfs_metafile.c
+++ b/libxfs/xfs_metafile.c
@@ -258,6 +258,7 @@ xfs_metafile_resv_init(
 	struct xfs_rtgroup	*rtg = NULL;
 	xfs_filblks_t		used = 0, target = 0;
 	xfs_filblks_t		hidden_space;
+	xfs_rfsblock_t		dblocks_avail = mp->m_sb.sb_dblocks / 4;
 	int			error = 0;
 
 	if (!xfs_has_metadir(mp))
@@ -295,6 +296,8 @@ xfs_metafile_resv_init(
 	 */
 	if (used > target)
 		target = used;
+	else if (target > dblocks_avail)
+		target = dblocks_avail;
 	hidden_space = target - used;
 
 	error = xfs_dec_fdblocks(mp, hidden_space, true);
-- 
2.47.2



* [PATCH 06/45] xfs: add a rtg_blocks helper
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (4 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 05/45] xfs: reduce metafile reservations Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 07/45] xfs: move xfs_bmapi_reserve_delalloc to xfs_iomap.c Christoph Hellwig
                   ` (38 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 012482b3308a49a84c2a7df08218dd4ad081e1da

Shortcut dereferencing the xg_block_count field in the generic group
structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtgroup.c | 2 +-
 libxfs/xfs_rtgroup.h | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 24fb160b8067..6f65ecc3015d 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -267,7 +267,7 @@ xfs_rtgroup_get_geometry(
 	/* Fill out form. */
 	memset(rgeo, 0, sizeof(*rgeo));
 	rgeo->rg_number = rtg_rgno(rtg);
-	rgeo->rg_length = rtg_group(rtg)->xg_block_count;
+	rgeo->rg_length = rtg_blocks(rtg);
 	xfs_rtgroup_geom_health(rtg, rgeo);
 	return 0;
 }
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 03f39d4e43fc..9c7e03f913cb 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -66,6 +66,11 @@ static inline xfs_rgnumber_t rtg_rgno(const struct xfs_rtgroup *rtg)
 	return rtg->rtg_group.xg_gno;
 }
 
+static inline xfs_rgblock_t rtg_blocks(const struct xfs_rtgroup *rtg)
+{
+	return rtg->rtg_group.xg_block_count;
+}
+
 static inline struct xfs_inode *rtg_bitmap(const struct xfs_rtgroup *rtg)
 {
 	return rtg->rtg_inodes[XFS_RTGI_BITMAP];
-- 
2.47.2



* [PATCH 07/45] xfs: move xfs_bmapi_reserve_delalloc to xfs_iomap.c
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (5 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 06/45] xfs: add a rtg_blocks helper Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 08/45] xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay Christoph Hellwig
                   ` (37 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 7c879c8275c0505c551f0fc6c152299c8d11f756

Delalloc reservations are not supported in userspace, and thus it doesn't
make sense to share this helper with xfsprogs.  Move it to xfs_iomap.c,
next to its two callers.

Note that the rest of the delalloc handling should probably eventually
also move out of xfs_bmap.c, but that will require a bit more surgery.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_bmap.c | 294 ++--------------------------------------------
 libxfs/xfs_bmap.h |   5 +-
 2 files changed, 8 insertions(+), 291 deletions(-)
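
As background for xfs_bmap_worst_indlen(), which the diff below makes
non-static, here is a rough sketch of a worst-case indirect block
estimate.  The loop body is not visible in the hunk, so everything here
is an assumption for illustration: demo_worst_indlen() and its
parameters are hypothetical, with leaf_recs and node_recs standing in
for the per-level record limits and maxlevels for the maximum btree
height.

```c
#include <stdint.h>

/*
 * Estimate how many btree blocks are needed to map len file blocks if
 * every block becomes its own extent record.
 */
static uint64_t
demo_worst_indlen(uint64_t len, unsigned int leaf_recs,
		unsigned int node_recs, unsigned int maxlevels)
{
	uint64_t	rval = 0;
	unsigned int	maxrecs = leaf_recs;
	unsigned int	level;

	for (level = 0; level < maxlevels; level++) {
		/* Blocks needed at this level, rounded up. */
		len = (len + maxrecs - 1) / maxrecs;
		rval += len;
		/* One block left: one more block for each level above. */
		if (len == 1)
			return rval + maxlevels - level - 1;
		if (level == 0)
			maxrecs = node_recs;
	}
	return rval;
}
```

With 1000 blocks, 100 records per leaf, 50 per node and 5 levels, the
sketch needs 10 leaf blocks and 1 node block, plus 3 blocks for the
remaining levels, giving 14.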

diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 07c553d92423..4dc66e77744f 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -165,18 +165,16 @@ xfs_bmbt_update(
  * Compute the worst-case number of indirect blocks that will be used
  * for ip's delayed extent of length "len".
  */
-STATIC xfs_filblks_t
+xfs_filblks_t
 xfs_bmap_worst_indlen(
-	xfs_inode_t	*ip,		/* incore inode pointer */
-	xfs_filblks_t	len)		/* delayed extent length */
+	struct xfs_inode	*ip,		/* incore inode pointer */
+	xfs_filblks_t		len)		/* delayed extent length */
 {
-	int		level;		/* btree level number */
-	int		maxrecs;	/* maximum record count at this level */
-	xfs_mount_t	*mp;		/* mount structure */
-	xfs_filblks_t	rval;		/* return value */
+	struct xfs_mount	*mp = ip->i_mount;
+	int			maxrecs = mp->m_bmap_dmxr[0];
+	int			level;
+	xfs_filblks_t		rval;
 
-	mp = ip->i_mount;
-	maxrecs = mp->m_bmap_dmxr[0];
 	for (level = 0, rval = 0;
 	     level < XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK);
 	     level++) {
@@ -2565,146 +2563,6 @@ done:
 #undef	PREV
 }
 
-/*
- * Convert a hole to a delayed allocation.
- */
-STATIC void
-xfs_bmap_add_extent_hole_delay(
-	xfs_inode_t		*ip,	/* incore inode pointer */
-	int			whichfork,
-	struct xfs_iext_cursor	*icur,
-	xfs_bmbt_irec_t		*new)	/* new data to add to file extents */
-{
-	struct xfs_ifork	*ifp;	/* inode fork pointer */
-	xfs_bmbt_irec_t		left;	/* left neighbor extent entry */
-	xfs_filblks_t		newlen=0;	/* new indirect size */
-	xfs_filblks_t		oldlen=0;	/* old indirect size */
-	xfs_bmbt_irec_t		right;	/* right neighbor extent entry */
-	uint32_t		state = xfs_bmap_fork_to_state(whichfork);
-	xfs_filblks_t		temp;	 /* temp for indirect calculations */
-
-	ifp = xfs_ifork_ptr(ip, whichfork);
-	ASSERT(isnullstartblock(new->br_startblock));
-
-	/*
-	 * Check and set flags if this segment has a left neighbor
-	 */
-	if (xfs_iext_peek_prev_extent(ifp, icur, &left)) {
-		state |= BMAP_LEFT_VALID;
-		if (isnullstartblock(left.br_startblock))
-			state |= BMAP_LEFT_DELAY;
-	}
-
-	/*
-	 * Check and set flags if the current (right) segment exists.
-	 * If it doesn't exist, we're converting the hole at end-of-file.
-	 */
-	if (xfs_iext_get_extent(ifp, icur, &right)) {
-		state |= BMAP_RIGHT_VALID;
-		if (isnullstartblock(right.br_startblock))
-			state |= BMAP_RIGHT_DELAY;
-	}
-
-	/*
-	 * Set contiguity flags on the left and right neighbors.
-	 * Don't let extents get too large, even if the pieces are contiguous.
-	 */
-	if ((state & BMAP_LEFT_VALID) && (state & BMAP_LEFT_DELAY) &&
-	    left.br_startoff + left.br_blockcount == new->br_startoff &&
-	    left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
-		state |= BMAP_LEFT_CONTIG;
-
-	if ((state & BMAP_RIGHT_VALID) && (state & BMAP_RIGHT_DELAY) &&
-	    new->br_startoff + new->br_blockcount == right.br_startoff &&
-	    new->br_blockcount + right.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
-	    (!(state & BMAP_LEFT_CONTIG) ||
-	     (left.br_blockcount + new->br_blockcount +
-	      right.br_blockcount <= XFS_MAX_BMBT_EXTLEN)))
-		state |= BMAP_RIGHT_CONTIG;
-
-	/*
-	 * Switch out based on the contiguity flags.
-	 */
-	switch (state & (BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG)) {
-	case BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG:
-		/*
-		 * New allocation is contiguous with delayed allocations
-		 * on the left and on the right.
-		 * Merge all three into a single extent record.
-		 */
-		temp = left.br_blockcount + new->br_blockcount +
-			right.br_blockcount;
-
-		oldlen = startblockval(left.br_startblock) +
-			startblockval(new->br_startblock) +
-			startblockval(right.br_startblock);
-		newlen = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
-					 oldlen);
-		left.br_startblock = nullstartblock(newlen);
-		left.br_blockcount = temp;
-
-		xfs_iext_remove(ip, icur, state);
-		xfs_iext_prev(ifp, icur);
-		xfs_iext_update_extent(ip, state, icur, &left);
-		break;
-
-	case BMAP_LEFT_CONTIG:
-		/*
-		 * New allocation is contiguous with a delayed allocation
-		 * on the left.
-		 * Merge the new allocation with the left neighbor.
-		 */
-		temp = left.br_blockcount + new->br_blockcount;
-
-		oldlen = startblockval(left.br_startblock) +
-			startblockval(new->br_startblock);
-		newlen = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
-					 oldlen);
-		left.br_blockcount = temp;
-		left.br_startblock = nullstartblock(newlen);
-
-		xfs_iext_prev(ifp, icur);
-		xfs_iext_update_extent(ip, state, icur, &left);
-		break;
-
-	case BMAP_RIGHT_CONTIG:
-		/*
-		 * New allocation is contiguous with a delayed allocation
-		 * on the right.
-		 * Merge the new allocation with the right neighbor.
-		 */
-		temp = new->br_blockcount + right.br_blockcount;
-		oldlen = startblockval(new->br_startblock) +
-			startblockval(right.br_startblock);
-		newlen = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
-					 oldlen);
-		right.br_startoff = new->br_startoff;
-		right.br_startblock = nullstartblock(newlen);
-		right.br_blockcount = temp;
-		xfs_iext_update_extent(ip, state, icur, &right);
-		break;
-
-	case 0:
-		/*
-		 * New allocation is not contiguous with another
-		 * delayed allocation.
-		 * Insert a new entry.
-		 */
-		oldlen = newlen = 0;
-		xfs_iext_insert(ip, icur, new, state);
-		break;
-	}
-	if (oldlen != newlen) {
-		ASSERT(oldlen > newlen);
-		xfs_add_fdblocks(ip->i_mount, oldlen - newlen);
-
-		/*
-		 * Nothing to do for disk quota accounting here.
-		 */
-		xfs_mod_delalloc(ip, 0, (int64_t)newlen - oldlen);
-	}
-}
-
 /*
  * Convert a hole to a real allocation.
  */
@@ -4033,144 +3891,6 @@ xfs_bmapi_read(
 	return 0;
 }
 
-/*
- * Add a delayed allocation extent to an inode. Blocks are reserved from the
- * global pool and the extent inserted into the inode in-core extent tree.
- *
- * On entry, got refers to the first extent beyond the offset of the extent to
- * allocate or eof is specified if no such extent exists. On return, got refers
- * to the extent record that was inserted to the inode fork.
- *
- * Note that the allocated extent may have been merged with contiguous extents
- * during insertion into the inode fork. Thus, got does not reflect the current
- * state of the inode fork on return. If necessary, the caller can use lastx to
- * look up the updated record in the inode fork.
- */
-int
-xfs_bmapi_reserve_delalloc(
-	struct xfs_inode	*ip,
-	int			whichfork,
-	xfs_fileoff_t		off,
-	xfs_filblks_t		len,
-	xfs_filblks_t		prealloc,
-	struct xfs_bmbt_irec	*got,
-	struct xfs_iext_cursor	*icur,
-	int			eof)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
-	xfs_extlen_t		alen;
-	xfs_extlen_t		indlen;
-	uint64_t		fdblocks;
-	int			error;
-	xfs_fileoff_t		aoff;
-	bool			use_cowextszhint =
-					whichfork == XFS_COW_FORK && !prealloc;
-
-retry:
-	/*
-	 * Cap the alloc length. Keep track of prealloc so we know whether to
-	 * tag the inode before we return.
-	 */
-	aoff = off;
-	alen = XFS_FILBLKS_MIN(len + prealloc, XFS_MAX_BMBT_EXTLEN);
-	if (!eof)
-		alen = XFS_FILBLKS_MIN(alen, got->br_startoff - aoff);
-	if (prealloc && alen >= len)
-		prealloc = alen - len;
-
-	/*
-	 * If we're targetting the COW fork but aren't creating a speculative
-	 * posteof preallocation, try to expand the reservation to align with
-	 * the COW extent size hint if there's sufficient free space.
-	 *
-	 * Unlike the data fork, the CoW cancellation functions will free all
-	 * the reservations at inactivation, so we don't require that every
-	 * delalloc reservation have a dirty pagecache.
-	 */
-	if (use_cowextszhint) {
-		struct xfs_bmbt_irec	prev;
-		xfs_extlen_t		extsz = xfs_get_cowextsz_hint(ip);
-
-		if (!xfs_iext_peek_prev_extent(ifp, icur, &prev))
-			prev.br_startoff = NULLFILEOFF;
-
-		error = xfs_bmap_extsize_align(mp, got, &prev, extsz, 0, eof,
-					       1, 0, &aoff, &alen);
-		ASSERT(!error);
-	}
-
-	/*
-	 * Make a transaction-less quota reservation for delayed allocation
-	 * blocks.  This number gets adjusted later.  We return if we haven't
-	 * allocated blocks already inside this loop.
-	 */
-	error = xfs_quota_reserve_blkres(ip, alen);
-	if (error)
-		goto out;
-
-	/*
-	 * Split changing sb for alen and indlen since they could be coming
-	 * from different places.
-	 */
-	indlen = (xfs_extlen_t)xfs_bmap_worst_indlen(ip, alen);
-	ASSERT(indlen > 0);
-
-	fdblocks = indlen;
-	if (XFS_IS_REALTIME_INODE(ip)) {
-		error = xfs_dec_frextents(mp, xfs_blen_to_rtbxlen(mp, alen));
-		if (error)
-			goto out_unreserve_quota;
-	} else {
-		fdblocks += alen;
-	}
-
-	error = xfs_dec_fdblocks(mp, fdblocks, false);
-	if (error)
-		goto out_unreserve_frextents;
-
-	ip->i_delayed_blks += alen;
-	xfs_mod_delalloc(ip, alen, indlen);
-
-	got->br_startoff = aoff;
-	got->br_startblock = nullstartblock(indlen);
-	got->br_blockcount = alen;
-	got->br_state = XFS_EXT_NORM;
-
-	xfs_bmap_add_extent_hole_delay(ip, whichfork, icur, got);
-
-	/*
-	 * Tag the inode if blocks were preallocated. Note that COW fork
-	 * preallocation can occur at the start or end of the extent, even when
-	 * prealloc == 0, so we must also check the aligned offset and length.
-	 */
-	if (whichfork == XFS_DATA_FORK && prealloc)
-		xfs_inode_set_eofblocks_tag(ip);
-	if (whichfork == XFS_COW_FORK && (prealloc || aoff < off || alen > len))
-		xfs_inode_set_cowblocks_tag(ip);
-
-	return 0;
-
-out_unreserve_frextents:
-	if (XFS_IS_REALTIME_INODE(ip))
-		xfs_add_frextents(mp, xfs_blen_to_rtbxlen(mp, alen));
-out_unreserve_quota:
-	if (XFS_IS_QUOTA_ON(mp))
-		xfs_quota_unreserve_blkres(ip, alen);
-out:
-	if (error == -ENOSPC || error == -EDQUOT) {
-		trace_xfs_delalloc_enospc(ip, off, len);
-
-		if (prealloc || use_cowextszhint) {
-			/* retry without any preallocation */
-			use_cowextszhint = false;
-			prealloc = 0;
-			goto retry;
-		}
-	}
-	return error;
-}
-
 static int
 xfs_bmapi_allocate(
 	struct xfs_bmalloca	*bma)
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index 4b721d935994..4d48087fd3a8 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -219,10 +219,6 @@ int	xfs_bmap_insert_extents(struct xfs_trans *tp, struct xfs_inode *ip,
 		bool *done, xfs_fileoff_t stop_fsb);
 int	xfs_bmap_split_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t split_offset);
-int	xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, int whichfork,
-		xfs_fileoff_t off, xfs_filblks_t len, xfs_filblks_t prealloc,
-		struct xfs_bmbt_irec *got, struct xfs_iext_cursor *cur,
-		int eof);
 int	xfs_bmapi_convert_delalloc(struct xfs_inode *ip, int whichfork,
 		xfs_off_t offset, struct iomap *iomap, unsigned int *seq);
 int	xfs_bmap_add_extent_unwritten_real(struct xfs_trans *tp,
@@ -233,6 +229,7 @@ xfs_extlen_t xfs_bmapi_minleft(struct xfs_trans *tp, struct xfs_inode *ip,
 		int fork);
 int	xfs_bmap_btalloc_low_space(struct xfs_bmalloca *ap,
 		struct xfs_alloc_arg *args);
+xfs_filblks_t xfs_bmap_worst_indlen(struct xfs_inode *ip, xfs_filblks_t len);
 
 enum xfs_bmap_intent_type {
 	XFS_BMAP_MAP = 1,
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 08/45] xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (6 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 07/45] xfs: move xfs_bmapi_reserve_delalloc to xfs_iomap.c Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 09/45] xfs: add a xfs_rtrmap_highest_rgbno helper Christoph Hellwig
                   ` (36 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: f42c652434de5e26e02798bf6a0c2a4a8627196b

The zone allocator wants to be able to remove a delalloc mapping in the
COW fork while keeping the block reservation.  To support that, pass the
flags argument down to xfs_bmap_del_extent_delay and honor the
XFS_BMAPI_REMAP flag there to keep the reservation.
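
As a rough illustration of the accounting change, the following
standalone sketch models how many blocks would be handed back to the
free block counter when a delalloc extent is deleted.  It is not the
libxfs code: SKETCH_BMAPI_REMAP and sketch_blocks_to_free are
hypothetical names, and the sketch assumes that the value computed
here is what ends up being credited to the free block counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for XFS_BMAPI_REMAP; the real value is in xfs_bmap.h. */
#define SKETCH_BMAPI_REMAP	(1u << 0)

/*
 * Model of the free block credit when deleting a delalloc extent.
 *
 * da_old/da_new are the indirect block reservations before and after
 * the deletion, del_blocks is the length of the deleted range.
 */
uint64_t
sketch_blocks_to_free(
	uint64_t	da_old,
	uint64_t	da_new,
	uint64_t	del_blocks,
	bool		isrt,
	uint32_t	bflags)
{
	/* The shrinking indirect reservation is always given back. */
	uint64_t	fdblocks = da_old - da_new;

	if (bflags & SKETCH_BMAPI_REMAP)
		;	/* caller keeps the reservation for the deleted range */
	else if (!isrt)
		fdblocks += del_blocks;
	/*
	 * Without the flag, an RT extent is credited to the RT extent
	 * counter instead, which this sketch does not model.
	 */
	return fdblocks;
}
```

With da_old = 10, da_new = 4 and a 100 block deletion on a non-RT
inode, the sketch returns 106 without the flag and 6 with it.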

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_bmap.c | 10 +++++++---
 libxfs/xfs_bmap.h |  2 +-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 4dc66e77744f..c40cdf004ac9 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -4662,7 +4662,8 @@ xfs_bmap_del_extent_delay(
 	int			whichfork,
 	struct xfs_iext_cursor	*icur,
 	struct xfs_bmbt_irec	*got,
-	struct xfs_bmbt_irec	*del)
+	struct xfs_bmbt_irec	*del,
+	uint32_t		bflags)	/* bmapi flags */
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
@@ -4782,7 +4783,9 @@ xfs_bmap_del_extent_delay(
 	da_diff = da_old - da_new;
 	fdblocks = da_diff;
 
-	if (isrt)
+	if (bflags & XFS_BMAPI_REMAP)
+		;
+	else if (isrt)
 		xfs_add_frextents(mp, xfs_blen_to_rtbxlen(mp, del->br_blockcount));
 	else
 		fdblocks += del->br_blockcount;
@@ -5384,7 +5387,8 @@ __xfs_bunmapi(
 
 delete:
 		if (wasdel) {
-			xfs_bmap_del_extent_delay(ip, whichfork, &icur, &got, &del);
+			xfs_bmap_del_extent_delay(ip, whichfork, &icur, &got,
+					&del, flags);
 		} else {
 			error = xfs_bmap_del_extent_real(ip, tp, &icur, cur,
 					&del, &tmp_logflags, whichfork,
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index 4d48087fd3a8..b4d9c6e0f3f9 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -204,7 +204,7 @@ int	xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_extnum_t nexts, int *done);
 void	xfs_bmap_del_extent_delay(struct xfs_inode *ip, int whichfork,
 		struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *got,
-		struct xfs_bmbt_irec *del);
+		struct xfs_bmbt_irec *del, uint32_t bflags);
 void	xfs_bmap_del_extent_cow(struct xfs_inode *ip,
 		struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *got,
 		struct xfs_bmbt_irec *del);
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 09/45] xfs: add a xfs_rtrmap_highest_rgbno helper
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (7 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 08/45] xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 10/45] xfs: define the zoned on-disk format Christoph Hellwig
                   ` (35 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: aacde95a37160b1462e46e0fd0cc7fd70e3bf1cc

Add a helper to find the last offset mapped in the rtrmap.  This will be
used by the zoned code to find out where to start writing again on
conventional devices without hardware zone support.
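
To make the returned value concrete, here is a minimal standalone
model.  The helper in this patch reads the high key from the btree
root instead of scanning records; the sketch below does a linear scan
over an in-memory array and assumes that the high key corresponds to
the last block of the mapping that reaches furthest.  All sketch_*
names and SKETCH_NULLRGBLOCK are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "nothing mapped" sentinel, modelled on NULLRGBLOCK. */
#define SKETCH_NULLRGBLOCK	((uint32_t)-1)

struct sketch_rmap_rec {
	uint32_t	startblock;	/* first rtgroup block of the mapping */
	uint32_t	blockcount;	/* length of the mapping in blocks */
};

/* Return the highest mapped block, or the sentinel if nothing is mapped. */
uint32_t
sketch_highest_rgbno(
	const struct sketch_rmap_rec	*recs,
	size_t				nr)
{
	uint32_t	highest = SKETCH_NULLRGBLOCK;
	size_t		i;

	for (i = 0; i < nr; i++) {
		uint32_t	last;

		if (recs[i].blockcount == 0)
			continue;	/* an empty record maps nothing */
		last = recs[i].startblock + recs[i].blockcount - 1;
		if (highest == SKETCH_NULLRGBLOCK || last > highest)
			highest = last;
	}
	return highest;
}
```

A caller emulating a write pointer on a conventional device could
presumably resume writing at the block following the returned one.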

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtrmap_btree.c | 19 +++++++++++++++++++
 libxfs/xfs_rtrmap_btree.h |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
index d897d140efa7..d46fb67bb5d4 100644
--- a/libxfs/xfs_rtrmap_btree.c
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -1032,3 +1032,22 @@ xfs_rtrmapbt_init_rtsb(
 	xfs_btree_del_cursor(cur, error);
 	return error;
 }
+
+/*
+ * Return the highest rgbno currently tracked by the rmap for this rtg.
+ */
+xfs_rgblock_t
+xfs_rtrmap_highest_rgbno(
+	struct xfs_rtgroup	*rtg)
+{
+	struct xfs_btree_block	*block = rtg_rmap(rtg)->i_df.if_broot;
+	union xfs_btree_key	key = {};
+	struct xfs_btree_cur	*cur;
+
+	if (block->bb_numrecs == 0)
+		return NULLRGBLOCK;
+	cur = xfs_rtrmapbt_init_cursor(NULL, rtg);
+	xfs_btree_get_keys(cur, block, &key);
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	return be32_to_cpu(key.__rmap_bigkey[1].rm_startblock);
+}
diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h
index 9d0915089891..e328fd62a149 100644
--- a/libxfs/xfs_rtrmap_btree.h
+++ b/libxfs/xfs_rtrmap_btree.h
@@ -207,4 +207,6 @@ struct xfs_btree_cur *xfs_rtrmapbt_mem_cursor(struct xfs_rtgroup *rtg,
 int xfs_rtrmapbt_mem_init(struct xfs_mount *mp, struct xfbtree *xfbtree,
 		struct xfs_buftarg *btp, xfs_rgnumber_t rgno);
 
+xfs_rgblock_t xfs_rtrmap_highest_rgbno(struct xfs_rtgroup *rtg);
+
 #endif /* __XFS_RTRMAP_BTREE_H__ */
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 10/45] xfs: define the zoned on-disk format
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (8 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 09/45] xfs: add a xfs_rtrmap_highest_rgbno helper Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 11/45] FIXUP: " Christoph Hellwig
                   ` (34 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 2167eaabe2fadde24cb8f1dafbec64da1d2ed2f5

Zone file systems reuse the basic RT group enabled XFS file system
structure to support a mode where each RT group is always written from
start to end and then reset for reuse (after moving out any remaining
data).  There are a few minor but important changes, which are indicated
by a new incompat flag:

1) there are no bitmap and summary inodes, thus the
/rtgroups/{rgno}.{bitmap,summary} metadir files do not exist and the
sb_rbmblocks superblock field must be cleared to zero.

2) there is a new superblock field that specifies the start of an
internal RT section.  This allows supporting SMR HDDs that have random
writable space at the beginning which is used for the XFS data device
(which really is the metadata device for this configuration), directly
followed by an RT device on the same block device.  While something
similar could be achieved using dm-linear, just having a single device
directly consumed by XFS makes handling the file systems a lot easier.

3) there is another superblock field that tracks the amount of reserved
space (or overprovisioning) that is never used for user capacity, but
allows GC to run more smoothly.

4) an overlay of the cowextsize field for the rtrmap inode so that we
can persistently track the total amount of rtblocks currently used in
an RT group.  There is no data structure other than the rmap that
tracks used space in an RT group, and this counter is used to decide
when an RT group has been entirely emptied, and to select one that
is relatively empty if garbage collection needs to be performed.
While this counter could be tracked entirely in memory and rebuilt
from the rmap at mount time, that would lead to very long mount times
with the large number of RT groups implied by the number of hardware
zones, especially on SMR hard drives with 256MB zone sizes.
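
The superblock constraints that follow from points 1) to 3) can be
summarized in a small standalone model.  This is only a sketch of the
checks, not the libxfs validator: struct sketch_sb and
sketch_zoned_sb_ok are hypothetical names, and the struct carries just
the fields needed for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the in-core superblock, all units in fs blocks. */
struct sketch_sb {
	uint64_t	dblocks;	/* size of the data device */
	uint64_t	rblocks;	/* size of the RT section */
	uint64_t	rbmblocks;	/* RT bitmap blocks */
	uint64_t	frextents;	/* free RT extents counter */
	uint64_t	rtstart;	/* start of internal RT section, 0 if none */
	uint64_t	rtreserved;	/* reserved (overprovisioned) RT blocks */
};

/* Return true if the zoned-specific superblock fields look consistent. */
bool
sketch_zoned_sb_ok(
	const struct sketch_sb	*sb)
{
	/* No bitmap inodes exist, so there can be no bitmap blocks. */
	if (sb->rbmblocks != 0)
		return false;
	/* The free RT extent counter must be zero on disk. */
	if (sb->frextents != 0)
		return false;
	/* An internal RT section must not overlap the data device. */
	if (sb->rtstart && sb->rtstart < sb->dblocks)
		return false;
	/* The reservation must leave some user capacity. */
	if (sb->rtreserved && sb->rtreserved >= sb->rblocks)
		return false;
	return true;
}
```

Note that an RT section starting exactly at dblocks is accepted, which
matches the "directly followed by" layout described in 2).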

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_format.h     | 15 ++++++++--
 libxfs/xfs_inode_buf.c  | 21 ++++++++++----
 libxfs/xfs_inode_util.c |  1 +
 libxfs/xfs_log_format.h |  7 ++++-
 libxfs/xfs_ondisk.h     |  6 ++--
 libxfs/xfs_rtbitmap.c   | 11 +++++++
 libxfs/xfs_rtgroup.c    | 37 ++++++++++++++----------
 libxfs/xfs_sb.c         | 64 +++++++++++++++++++++++++++++++++++++++--
 8 files changed, 133 insertions(+), 29 deletions(-)

diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index b1007fb661ba..f67380a25805 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -178,9 +178,10 @@ typedef struct xfs_sb {
 
 	xfs_rgnumber_t	sb_rgcount;	/* number of realtime groups */
 	xfs_rtxlen_t	sb_rgextents;	/* size of a realtime group in rtx */
-
 	uint8_t		sb_rgblklog;    /* rt group number shift */
 	uint8_t		sb_pad[7];	/* zeroes */
+	xfs_rfsblock_t	sb_rtstart;	/* start of internal RT section (FSB) */
+	xfs_filblks_t	sb_rtreserved;	/* reserved (zoned) RT blocks */
 
 	/* must be padded to 64 bit alignment */
 } xfs_sb_t;
@@ -270,9 +271,10 @@ struct xfs_dsb {
 	__be64		sb_metadirino;	/* metadata directory tree root */
 	__be32		sb_rgcount;	/* # of realtime groups */
 	__be32		sb_rgextents;	/* size of rtgroup in rtx */
-
 	__u8		sb_rgblklog;    /* rt group number shift */
 	__u8		sb_pad[7];	/* zeroes */
+	__be64		sb_rtstart;	/* start of internal RT section (FSB) */
+	__be64		sb_rtreserved;	/* reserved (zoned) RT blocks */
 
 	/*
 	 * The size of this structure must be padded to 64 bit alignment.
@@ -395,6 +397,8 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_EXCHRANGE	(1 << 6)  /* exchangerange supported */
 #define XFS_SB_FEAT_INCOMPAT_PARENT	(1 << 7)  /* parent pointers */
 #define XFS_SB_FEAT_INCOMPAT_METADIR	(1 << 8)  /* metadata dir tree */
+#define XFS_SB_FEAT_INCOMPAT_ZONED	(1 << 9)  /* zoned RT allocator */
+
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE | \
 		 XFS_SB_FEAT_INCOMPAT_SPINODES | \
@@ -952,7 +956,12 @@ struct xfs_dinode {
 	__be64		di_changecount;	/* number of attribute changes */
 	__be64		di_lsn;		/* flush sequence */
 	__be64		di_flags2;	/* more random flags */
-	__be32		di_cowextsize;	/* basic cow extent size for file */
+	union {
+		/* basic cow extent size for (regular) file */
+		__be32		di_cowextsize;
+		/* used blocks in RTG for (zoned) rtrmap inode */
+		__be32		di_used_blocks;
+	};
 	__u8		di_pad2[12];	/* more padding for future expansion */
 
 	/* fields only written to during inode creation */
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 5ca857ca2dc1..5ca753465b96 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -249,7 +249,10 @@ xfs_inode_from_disk(
 					   be64_to_cpu(from->di_changecount));
 		ip->i_crtime = xfs_inode_from_disk_ts(from, from->di_crtime);
 		ip->i_diflags2 = be64_to_cpu(from->di_flags2);
+		/* also covers the di_used_blocks union arm: */
 		ip->i_cowextsize = be32_to_cpu(from->di_cowextsize);
+		BUILD_BUG_ON(sizeof(from->di_cowextsize) !=
+			     sizeof(from->di_used_blocks));
 	}
 
 	error = xfs_iformat_data_fork(ip, from);
@@ -346,6 +349,7 @@ xfs_inode_to_disk(
 		to->di_changecount = cpu_to_be64(inode_peek_iversion(inode));
 		to->di_crtime = xfs_inode_to_disk_ts(ip, ip->i_crtime);
 		to->di_flags2 = cpu_to_be64(ip->i_diflags2);
+		/* also covers the di_used_blocks union arm: */
 		to->di_cowextsize = cpu_to_be32(ip->i_cowextsize);
 		to->di_ino = cpu_to_be64(ip->i_ino);
 		to->di_lsn = cpu_to_be64(lsn);
@@ -749,11 +753,18 @@ xfs_dinode_verify(
 	    !xfs_has_rtreflink(mp))
 		return __this_address;
 
-	/* COW extent size hint validation */
-	fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize),
-			mode, flags, flags2);
-	if (fa)
-		return fa;
+	if (xfs_has_zoned(mp) &&
+	    dip->di_metatype == cpu_to_be16(XFS_METAFILE_RTRMAP)) {
+		if (be32_to_cpu(dip->di_used_blocks) > mp->m_sb.sb_rgextents)
+			return __this_address;
+	} else {
+		/* COW extent size hint validation */
+		fa = xfs_inode_validate_cowextsize(mp,
+				be32_to_cpu(dip->di_cowextsize),
+				mode, flags, flags2);
+		if (fa)
+			return fa;
+	}
 
 	/* bigtime iflag can only happen on bigtime filesystems */
 	if (xfs_dinode_has_bigtime(dip) &&
diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c
index edc985eb9a4e..2a7988d774b8 100644
--- a/libxfs/xfs_inode_util.c
+++ b/libxfs/xfs_inode_util.c
@@ -319,6 +319,7 @@ xfs_inode_init(
 
 	if (xfs_has_v3inodes(mp)) {
 		inode_set_iversion(inode, 1);
+		/* also covers the di_used_blocks union arm: */
 		ip->i_cowextsize = 0;
 		times |= XFS_ICHGTIME_CREATE;
 	}
diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index a472ac2e45d0..0d637c276db0 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -475,7 +475,12 @@ struct xfs_log_dinode {
 	xfs_lsn_t	di_lsn;
 
 	uint64_t	di_flags2;	/* more random flags */
-	uint32_t	di_cowextsize;	/* basic cow extent size for file */
+	union {
+		/* basic cow extent size for (regular) file */
+		uint32_t		di_cowextsize;
+		/* used blocks in RTG for (zoned) rtrmap inode */
+		uint32_t		di_used_blocks;
+	};
 	uint8_t		di_pad2[12];	/* more padding for future expansion */
 
 	/* fields only written to during inode creation */
diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index a85ecddaa48e..5ed44fdf7491 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -233,8 +233,8 @@ xfs_check_ondisk_structs(void)
 			16299260424LL);
 
 	/* superblock field checks we got from xfs/122 */
-	XFS_CHECK_STRUCT_SIZE(struct xfs_dsb,		288);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_sb,		288);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dsb,		304);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_sb,		304);
 	XFS_CHECK_SB_OFFSET(sb_magicnum,		0);
 	XFS_CHECK_SB_OFFSET(sb_blocksize,		4);
 	XFS_CHECK_SB_OFFSET(sb_dblocks,			8);
@@ -295,6 +295,8 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_SB_OFFSET(sb_rgextents,		276);
 	XFS_CHECK_SB_OFFSET(sb_rgblklog,		280);
 	XFS_CHECK_SB_OFFSET(sb_pad,			281);
+	XFS_CHECK_SB_OFFSET(sb_rtstart,			288);
+	XFS_CHECK_SB_OFFSET(sb_rtreserved,		296);
 }
 
 #endif /* __XFS_ONDISK_H */
diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index 689d5844b8bd..34425a933650 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -1118,6 +1118,7 @@ xfs_rtfree_blocks(
 	xfs_extlen_t		mod;
 	int			error;
 
+	ASSERT(!xfs_has_zoned(mp));
 	ASSERT(rtlen <= XFS_MAX_BMBT_EXTLEN);
 
 	mod = xfs_blen_to_rtxoff(mp, rtlen);
@@ -1169,6 +1170,9 @@ xfs_rtalloc_query_range(
 
 	end = min(end, rtg->rtg_extents - 1);
 
+	if (xfs_has_zoned(mp))
+		return -EINVAL;
+
 	/* Iterate the bitmap, looking for discrepancies. */
 	while (start <= end) {
 		struct xfs_rtalloc_rec	rec;
@@ -1263,6 +1267,8 @@ xfs_rtbitmap_blockcount_len(
 	struct xfs_mount	*mp,
 	xfs_rtbxlen_t		rtextents)
 {
+	if (xfs_has_zoned(mp))
+		return 0;
 	return howmany_64(rtextents, xfs_rtbitmap_rtx_per_rbmblock(mp));
 }
 
@@ -1303,6 +1309,11 @@ xfs_rtsummary_blockcount(
 	xfs_rtbxlen_t		rextents = xfs_rtbitmap_bitcount(mp);
 	unsigned long long	rsumwords;
 
+	if (xfs_has_zoned(mp)) {
+		*rsumlevels = 0;
+		return 0;
+	}
+
 	*rsumlevels = xfs_compute_rextslog(rextents) + 1;
 	rsumwords = xfs_rtbitmap_blockcount_len(mp, rextents) * (*rsumlevels);
 	return howmany_64(rsumwords, mp->m_blockwsize);
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 6f65ecc3015d..e58968286f32 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -191,15 +191,17 @@ xfs_rtgroup_lock(
 	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) ||
 	       !(rtglock_flags & XFS_RTGLOCK_BITMAP));
 
-	if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
-		/*
-		 * Lock both realtime free space metadata inodes for a freespace
-		 * update.
-		 */
-		xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
-		xfs_ilock(rtg_summary(rtg), XFS_ILOCK_EXCL);
-	} else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
-		xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
+	if (!xfs_has_zoned(rtg_mount(rtg))) {
+		if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
+			/*
+			 * Lock both realtime free space metadata inodes for a
+			 * freespace update.
+			 */
+			xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
+			xfs_ilock(rtg_summary(rtg), XFS_ILOCK_EXCL);
+		} else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
+			xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
+		}
 	}
 
 	if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg))
@@ -225,11 +227,13 @@ xfs_rtgroup_unlock(
 	if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg))
 		xfs_iunlock(rtg_rmap(rtg), XFS_ILOCK_EXCL);
 
-	if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
-		xfs_iunlock(rtg_summary(rtg), XFS_ILOCK_EXCL);
-		xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
-	} else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
-		xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
+	if (!xfs_has_zoned(rtg_mount(rtg))) {
+		if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
+			xfs_iunlock(rtg_summary(rtg), XFS_ILOCK_EXCL);
+			xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
+		} else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
+			xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
+		}
 	}
 }
 
@@ -246,7 +250,8 @@ xfs_rtgroup_trans_join(
 	ASSERT(!(rtglock_flags & ~XFS_RTGLOCK_ALL_FLAGS));
 	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED));
 
-	if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
+	if (!xfs_has_zoned(rtg_mount(rtg)) &&
+	    (rtglock_flags & XFS_RTGLOCK_BITMAP)) {
 		xfs_trans_ijoin(tp, rtg_bitmap(rtg), XFS_ILOCK_EXCL);
 		xfs_trans_ijoin(tp, rtg_summary(rtg), XFS_ILOCK_EXCL);
 	}
@@ -351,6 +356,7 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
 		.sick		= XFS_SICK_RG_BITMAP,
 		.fmt_mask	= (1U << XFS_DINODE_FMT_EXTENTS) |
 				  (1U << XFS_DINODE_FMT_BTREE),
+		.enabled	= xfs_has_nonzoned,
 		.create		= xfs_rtbitmap_create,
 	},
 	[XFS_RTGI_SUMMARY] = {
@@ -359,6 +365,7 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
 		.sick		= XFS_SICK_RG_SUMMARY,
 		.fmt_mask	= (1U << XFS_DINODE_FMT_EXTENTS) |
 				  (1U << XFS_DINODE_FMT_BTREE),
+		.enabled	= xfs_has_nonzoned,
 		.create		= xfs_rtsummary_create,
 	},
 	[XFS_RTGI_RMAP] = {
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 1781ca36b2cc..bc84792c565c 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -27,6 +27,7 @@
 #include "xfs_rtgroup.h"
 #include "xfs_rtrmap_btree.h"
 #include "xfs_rtrefcount_btree.h"
+#include "xfs_rtbitmap.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -182,6 +183,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_PARENT;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)
 		features |= XFS_FEAT_METADIR;
+	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_ZONED)
+		features |= XFS_FEAT_ZONED;
 
 	return features;
 }
@@ -263,6 +266,9 @@ static uint64_t
 xfs_expected_rbmblocks(
 	struct xfs_sb		*sbp)
 {
+	if (xfs_sb_is_v5(sbp) &&
+	    (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_ZONED))
+		return 0;
 	return howmany_64(xfs_extents_per_rbm(sbp),
 			  NBBY * xfs_rtbmblock_size(sbp));
 }
@@ -272,9 +278,15 @@ bool
 xfs_validate_rt_geometry(
 	struct xfs_sb		*sbp)
 {
-	if (sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE ||
-	    sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE)
-		return false;
+	if (xfs_sb_is_v5(sbp) &&
+	    (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_ZONED)) {
+		if (sbp->sb_rextsize != 1)
+			return false;
+	} else {
+		if (sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE ||
+		    sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE)
+			return false;
+	}
 
 	if (sbp->sb_rblocks == 0) {
 		if (sbp->sb_rextents != 0 || sbp->sb_rbmblocks != 0 ||
@@ -432,6 +444,34 @@ xfs_validate_sb_rtgroups(
 	return 0;
 }
 
+static int
+xfs_validate_sb_zoned(
+	struct xfs_mount	*mp,
+	struct xfs_sb		*sbp)
+{
+	if (sbp->sb_frextents != 0) {
+		xfs_warn(mp,
+"sb_frextents must be zero for zoned file systems.");
+		return -EINVAL;
+	}
+
+	if (sbp->sb_rtstart && sbp->sb_rtstart < sbp->sb_dblocks) {
+		xfs_warn(mp,
+"sb_rtstart (%lld) overlaps sb_dblocks (%lld).",
+			sbp->sb_rtstart, sbp->sb_dblocks);
+		return -EINVAL;
+	}
+
+	if (sbp->sb_rtreserved && sbp->sb_rtreserved >= sbp->sb_rblocks) {
+		xfs_warn(mp,
+"sb_rtreserved (%lld) larger than sb_rblocks (%lld).",
+			sbp->sb_rtreserved, sbp->sb_rblocks);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /* Check the validity of the SB. */
 STATIC int
 xfs_validate_sb_common(
@@ -520,6 +560,11 @@ xfs_validate_sb_common(
 			if (error)
 				return error;
 		}
+		if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_ZONED) {
+			error = xfs_validate_sb_zoned(mp, sbp);
+			if (error)
+				return error;
+		}
 	} else if (sbp->sb_qflags & (XFS_PQUOTA_ENFD | XFS_GQUOTA_ENFD |
 				XFS_PQUOTA_CHKD | XFS_GQUOTA_CHKD)) {
 			xfs_notice(mp,
@@ -832,6 +877,14 @@ __xfs_sb_from_disk(
 		to->sb_rgcount = 1;
 		to->sb_rgextents = 0;
 	}
+
+	if (to->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_ZONED) {
+		to->sb_rtstart = be64_to_cpu(from->sb_rtstart);
+		to->sb_rtreserved = be64_to_cpu(from->sb_rtreserved);
+	} else {
+		to->sb_rtstart = 0;
+		to->sb_rtreserved = 0;
+	}
 }
 
 void
@@ -998,6 +1051,11 @@ xfs_sb_to_disk(
 		to->sb_rbmino = cpu_to_be64(0);
 		to->sb_rsumino = cpu_to_be64(0);
 	}
+
+	if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_ZONED) {
+		to->sb_rtstart = cpu_to_be64(from->sb_rtstart);
+		to->sb_rtreserved = cpu_to_be64(from->sb_rtreserved);
+	}
 }
 
 /*
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 11/45] FIXUP: xfs: define the zoned on-disk format
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (9 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 10/45] xfs: define the zoned on-disk format Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 15:47   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 12/45] xfs: allow internal RT devices for zoned mode Christoph Hellwig
                   ` (33 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

---
 include/xfs_inode.h |  6 ++++++
 include/xfs_mount.h | 12 ++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/xfs_inode.h b/include/xfs_inode.h
index 5bb31eb4aa53..efef0da636d1 100644
--- a/include/xfs_inode.h
+++ b/include/xfs_inode.h
@@ -234,6 +234,7 @@ typedef struct xfs_inode {
 	xfs_extlen_t		i_extsize;	/* basic/minimum extent size */
 	/* cowextsize is only used for v3 inodes, flushiter for v1/2 */
 	union {
+		uint32_t	i_used_blocks;
 		xfs_extlen_t	i_cowextsize;	/* basic cow extent size */
 		uint16_t	i_flushiter;	/* incremented on flush */
 	};
@@ -361,6 +362,11 @@ static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
 }
 #define XFS_IS_REALTIME_INODE(ip) ((ip)->i_diflags & XFS_DIFLAG_REALTIME)
 
+static inline bool xfs_is_zoned_inode(struct xfs_inode *ip)
+{
+	return xfs_has_zoned(ip->i_mount) && XFS_IS_REALTIME_INODE(ip);
+}
+
 /* inode link counts */
 static inline void set_nlink(struct inode *inode, uint32_t nlink)
 {
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 0acf952eb9d7..7856acfb9f8e 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -207,6 +207,7 @@ typedef struct xfs_mount {
 #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
 #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
 #define XFS_FEAT_METADIR	(1ULL << 28)	/* metadata directory tree */
+#define XFS_FEAT_ZONED		(1ULL << 29)	/* zoned RT device */
 
 #define __XFS_HAS_FEAT(name, NAME) \
 static inline bool xfs_has_ ## name (const struct xfs_mount *mp) \
@@ -253,7 +254,7 @@ __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
 __XFS_HAS_FEAT(large_extent_counts, NREXT64)
 __XFS_HAS_FEAT(exchange_range, EXCHANGE_RANGE)
 __XFS_HAS_FEAT(metadir, METADIR)
-
+__XFS_HAS_FEAT(zoned, ZONED)
 
 static inline bool xfs_has_rtgroups(const struct xfs_mount *mp)
 {
@@ -264,7 +265,9 @@ static inline bool xfs_has_rtgroups(const struct xfs_mount *mp)
 static inline bool xfs_has_rtsb(const struct xfs_mount *mp)
 {
 	/* all rtgroups filesystems with an rt section have an rtsb */
-	return xfs_has_rtgroups(mp) && xfs_has_realtime(mp);
+	return xfs_has_rtgroups(mp) &&
+		xfs_has_realtime(mp) &&
+		!xfs_has_zoned(mp);
 }
 
 static inline bool xfs_has_rtrmapbt(const struct xfs_mount *mp)
@@ -279,6 +282,11 @@ static inline bool xfs_has_rtreflink(const struct xfs_mount *mp)
 	       xfs_has_reflink(mp);
 }
 
+static inline bool xfs_has_nonzoned(const struct xfs_mount *mp)
+{
+	return !xfs_has_zoned(mp);
+}
+
 /* Kernel mount features that we don't support */
 #define __XFS_UNSUPP_FEAT(name) \
 static inline bool xfs_has_ ## name (const struct xfs_mount *mp) \
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 12/45] xfs: allow internal RT devices for zoned mode
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (10 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 11/45] FIXUP: " Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 13/45] FIXUP: " Christoph Hellwig
                   ` (32 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: bdc03eb5f98f6f1ae4bd5e020d1582a23efb7799

Allow creating an RT subvolume on the same device as the main data
device.  This is mostly used for SMR HDDs where the conventional zones
are used for the data device and the sequential write required zones
for the zoned RT section.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
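Reader's note, not part of the commit: the address math this patch changes
can be sketched in isolation as below.  This is a simplified illustration
under stated assumptions, not the libxfs code.  struct groups_sketch, the
sketch_* helpers and the fixed SKETCH_BLKBB_LOG shift (4096-byte file system
blocks over 512-byte basic blocks) are all hypothetical stand-ins; only the
idea of adding start_fsb when going to a disk address and subtracting it on
the way back is taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xfs_groups. */
struct groups_sketch {
	uint32_t	blocks;		/* file system blocks per group */
	uint64_t	start_fsb;	/* first fs block of group 0 on the device */
};

/* Assumption for the sketch: 4096-byte fs blocks, 512-byte basic blocks. */
#define SKETCH_BLKBB_LOG	3

/* Group number plus block within the group -> 512-byte disk address. */
static inline uint64_t
sketch_gbno_to_daddr(
	const struct groups_sketch	*g,
	uint32_t			gno,
	uint32_t			gbno)
{
	uint64_t	fsbno = (uint64_t)gno * g->blocks + gbno;

	/* An internal RT section starts start_fsb blocks into the device. */
	return (g->start_fsb + fsbno) << SKETCH_BLKBB_LOG;
}

/* Disk address -> block offset relative to the start of the RT section. */
static inline uint64_t
sketch_daddr_to_rtoff(
	const struct groups_sketch	*g,
	uint64_t			daddr)
{
	return (daddr >> SKETCH_BLKBB_LOG) - g->start_fsb;
}
```

With start_fsb set to 0 both helpers reduce to the pre-patch arithmetic,
which is why an external RT device keeps working unchanged.
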
 libxfs/xfs_group.h   | 6 ++++--
 libxfs/xfs_rtgroup.h | 8 +++++---
 libxfs/xfs_sb.c      | 1 +
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index 242b05627c7a..a70096113384 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -107,9 +107,11 @@ xfs_gbno_to_daddr(
 	xfs_agblock_t		gbno)
 {
 	struct xfs_mount	*mp = xg->xg_mount;
-	uint32_t		blocks = mp->m_groups[xg->xg_type].blocks;
+	struct xfs_groups	*g = &mp->m_groups[xg->xg_type];
+	xfs_fsblock_t		fsbno;
 
-	return XFS_FSB_TO_BB(mp, (xfs_fsblock_t)xg->xg_gno * blocks + gbno);
+	fsbno = (xfs_fsblock_t)xg->xg_gno * g->blocks + gbno;
+	return XFS_FSB_TO_BB(mp, g->start_fsb + fsbno);
 }
 
 static inline uint32_t
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 9c7e03f913cb..e35d1d798327 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -230,7 +230,8 @@ xfs_rtb_to_daddr(
 	xfs_rgnumber_t		rgno = xfs_rtb_to_rgno(mp, rtbno);
 	uint64_t		start_bno = (xfs_rtblock_t)rgno * g->blocks;
 
-	return XFS_FSB_TO_BB(mp, start_bno + (rtbno & g->blkmask));
+	return XFS_FSB_TO_BB(mp,
+		g->start_fsb + start_bno + (rtbno & g->blkmask));
 }
 
 static inline xfs_rtblock_t
@@ -238,10 +239,11 @@ xfs_daddr_to_rtb(
 	struct xfs_mount	*mp,
 	xfs_daddr_t		daddr)
 {
-	xfs_rfsblock_t		bno = XFS_BB_TO_FSBT(mp, daddr);
+	struct xfs_groups	*g = &mp->m_groups[XG_TYPE_RTG];
+	xfs_rfsblock_t		bno;
 
+	bno = XFS_BB_TO_FSBT(mp, daddr) - g->start_fsb;
 	if (xfs_has_rtgroups(mp)) {
-		struct xfs_groups *g = &mp->m_groups[XG_TYPE_RTG];
 		xfs_rgnumber_t	rgno;
 		uint32_t	rgbno;
 
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index bc84792c565c..a95d712363fa 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1201,6 +1201,7 @@ xfs_sb_mount_rextsize(
 		rgs->blocks = sbp->sb_rgextents * sbp->sb_rextsize;
 		rgs->blklog = mp->m_sb.sb_rgblklog;
 		rgs->blkmask = xfs_mask32lo(mp->m_sb.sb_rgblklog);
+		rgs->start_fsb = mp->m_sb.sb_rtstart;
 	} else {
 		rgs->blocks = 0;
 		rgs->blklog = 0;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 13/45] FIXUP: xfs: allow internal RT devices for zoned mode
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (11 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 12/45] xfs: allow internal RT devices for zoned mode Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 15:55   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 14/45] xfs: export zoned geometry via XFS_FSOP_GEOM Christoph Hellwig
                   ` (31 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/libxfs.h    |  6 ++++++
 include/xfs_mount.h |  7 +++++++
 libfrog/fsgeom.c    |  2 +-
 libxfs/init.c       | 13 +++++++++----
 libxfs/rdwr.c       |  2 ++
 repair/agheader.c   |  4 +++-
 6 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/include/libxfs.h b/include/libxfs.h
index 82b34b9d81c3..b968a2b88da3 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -293,4 +293,10 @@ static inline bool xfs_sb_version_hassparseinodes(struct xfs_sb *sbp)
 		xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_SPINODES);
 }
 
+static inline bool xfs_sb_version_haszoned(struct xfs_sb *sbp)
+{
+	return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
+		xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_ZONED);
+}
+
 #endif	/* __LIBXFS_H__ */
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 7856acfb9f8e..bf9ebc25fc79 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -53,6 +53,13 @@ struct xfs_groups {
 	 * rtgroup, so this mask must be 64-bit.
 	 */
 	uint64_t		blkmask;
+
+	/*
+	 * Start of the first group in the device.  This is used to support a
+	 * RT device following the data device on the same block device for
+	 * SMR hard drives.
+	 */
+	xfs_fsblock_t		start_fsb;
 };
 
 /*
diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
index b5220d2d6ffd..13df88ae43a7 100644
--- a/libfrog/fsgeom.c
+++ b/libfrog/fsgeom.c
@@ -81,7 +81,7 @@ xfs_report_geom(
 		isint ? _("internal log") : logname ? logname : _("external"),
 			geo->blocksize, geo->logblocks, logversion,
 		"", geo->logsectsize, geo->logsunit / geo->blocksize, lazycount,
-		!geo->rtblocks ? _("none") : rtname ? rtname : _("external"),
+		!geo->rtblocks ? _("none") : rtname ? rtname : _("internal"),
 		geo->rtextsize * geo->blocksize, (unsigned long long)geo->rtblocks,
 			(unsigned long long)geo->rtextents,
 		"", geo->rgcount, geo->rgextents);
diff --git a/libxfs/init.c b/libxfs/init.c
index 5b45ed347276..a186369f3fd8 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -560,7 +560,7 @@ libxfs_buftarg_init(
 				progname);
 			exit(1);
 		}
-		if (xi->rt.dev &&
+		if ((xi->rt.dev || xi->rt.dev == xi->data.dev) &&
 		    (mp->m_rtdev_targp->bt_bdev != xi->rt.dev ||
 		     mp->m_rtdev_targp->bt_mount != mp)) {
 			fprintf(stderr,
@@ -577,7 +577,11 @@ libxfs_buftarg_init(
 	else
 		mp->m_logdev_targp = libxfs_buftarg_alloc(mp, xi, &xi->log,
 				lfail);
-	mp->m_rtdev_targp = libxfs_buftarg_alloc(mp, xi, &xi->rt, rfail);
+	if (!xi->rt.dev || xi->rt.dev == xi->data.dev)
+		mp->m_rtdev_targp = mp->m_ddev_targp;
+	else
+		mp->m_rtdev_targp = libxfs_buftarg_alloc(mp, xi, &xi->rt,
+				rfail);
 }
 
 /* Compute maximum possible height for per-AG btree types for this fs. */
@@ -978,7 +982,7 @@ libxfs_flush_mount(
 			error = err2;
 	}
 
-	if (mp->m_rtdev_targp) {
+	if (mp->m_rtdev_targp && mp->m_rtdev_targp != mp->m_ddev_targp) {
 		err2 = libxfs_flush_buftarg(mp->m_rtdev_targp,
 				_("realtime device"));
 		if (!error)
@@ -1031,7 +1035,8 @@ libxfs_umount(
 	free(mp->m_fsname);
 	mp->m_fsname = NULL;
 
-	libxfs_buftarg_free(mp->m_rtdev_targp);
+	if (mp->m_rtdev_targp != mp->m_ddev_targp)
+		libxfs_buftarg_free(mp->m_rtdev_targp);
 	if (mp->m_logdev_targp != mp->m_ddev_targp)
 		libxfs_buftarg_free(mp->m_logdev_targp);
 	libxfs_buftarg_free(mp->m_ddev_targp);
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 35be785c435a..f06763b38bd8 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -175,6 +175,8 @@ libxfs_getrtsb(
 	if (!mp->m_rtdev_targp->bt_bdev)
 		return NULL;
 
+	ASSERT(!mp->m_sb.sb_rtstart);
+
 	error = libxfs_buf_read_uncached(mp->m_rtdev_targp, XFS_RTSB_DADDR,
 			XFS_FSB_TO_BB(mp, 1), 0, &bp, &xfs_rtsb_buf_ops);
 	if (error)
diff --git a/repair/agheader.c b/repair/agheader.c
index 327ba041671f..5bb4e47e0c5b 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -485,7 +485,9 @@ secondary_sb_whack(
 	 *
 	 * size is the size of data which is valid for this sb.
 	 */
-	if (xfs_sb_version_hasmetadir(sb))
+	if (xfs_sb_version_haszoned(sb))
+		size = offsetofend(struct xfs_dsb, sb_rtstart);
+	else if (xfs_sb_version_hasmetadir(sb))
 		size = offsetofend(struct xfs_dsb, sb_pad);
 	else if (xfs_sb_version_hasmetauuid(sb))
 		size = offsetofend(struct xfs_dsb, sb_meta_uuid);
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 14/45] xfs: export zoned geometry via XFS_FSOP_GEOM
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (12 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 13/45] FIXUP: " Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 15/45] xfs: disable sb_frextents for zoned file systems Christoph Hellwig
                   ` (30 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 1fd8159e7ca41203798b6f65efaf1724eb318cd4

Export the zoned geometry information so that userspace can query it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_fs.h | 5 ++++-
 libxfs/xfs_sb.c | 6 ++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 2c3171262b44..5e66fb2b2cc7 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -189,7 +189,9 @@ struct xfs_fsop_geom {
 	uint32_t	checked;	/* o: checked fs & rt metadata	*/
 	__u32		rgextents;	/* rt extents in a realtime group */
 	__u32		rgcount;	/* number of realtime groups	*/
-	__u64		reserved[16];	/* reserved space		*/
+	__u64		rtstart;	/* start of internal rt section */
+	__u64		rtreserved;	/* RT (zoned) reserved blocks	*/
+	__u64		reserved[14];	/* reserved space		*/
 };
 
 #define XFS_FSOP_GEOM_SICK_COUNTERS	(1 << 0)  /* summary counters */
@@ -247,6 +249,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE (1 << 24) /* exchange range */
 #define XFS_FSOP_GEOM_FLAGS_PARENT	(1 << 25) /* linux parent pointers */
 #define XFS_FSOP_GEOM_FLAGS_METADIR	(1 << 26) /* metadata directories */
+#define XFS_FSOP_GEOM_FLAGS_ZONED	(1 << 27) /* zoned rt device */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index a95d712363fa..8344b40e4ab9 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1566,6 +1566,8 @@ xfs_fs_geometry(
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE;
 	if (xfs_has_metadir(mp))
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_METADIR;
+	if (xfs_has_zoned(mp))
+		geo->flags |= XFS_FSOP_GEOM_FLAGS_ZONED;
 	geo->rtsectsize = sbp->sb_blocksize;
 	geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);
 
@@ -1586,6 +1588,10 @@ xfs_fs_geometry(
 		geo->rgcount = sbp->sb_rgcount;
 		geo->rgextents = sbp->sb_rgextents;
 	}
+	if (xfs_has_zoned(mp)) {
+		geo->rtstart = sbp->sb_rtstart;
+		geo->rtreserved = sbp->sb_rtreserved;
+	}
 }
 
 /* Read a secondary superblock. */
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 15/45] xfs: disable sb_frextents for zoned file systems
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (13 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 14/45] xfs: export zoned geometry via XFS_FSOP_GEOM Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 16/45] xfs: parse and validate hardware zone information Christoph Hellwig
                   ` (29 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 1d319ac6fe1bd6364c5fc6e285ac47b117aed117

Zoned file systems not only don't use the global frextents counter, but
for them the in-memory percpu counter also includes reservations taken
before even allocating delalloc extent records, so it will never match
the per-zone used information.  Disable all updates and verification of
the sb counter for zoned file systems as it isn't useful for them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_sb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 8344b40e4ab9..8df99e518c70 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1330,7 +1330,7 @@ xfs_log_sb(
 	 * we handle nearly-lockless reservations, so we must use the _positive
 	 * variant here to avoid writing out nonsense frextents.
 	 */
-	if (xfs_has_rtgroups(mp)) {
+	if (xfs_has_rtgroups(mp) && !xfs_has_zoned(mp)) {
 		mp->m_sb.sb_frextents =
 				xfs_sum_freecounter(mp, XC_FREE_RTEXTENTS);
 	}
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 16/45] xfs: parse and validate hardware zone information
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (14 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 15/45] xfs: disable sb_frextents for zoned file systems Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 17/45] FIXUP: " Christoph Hellwig
                   ` (28 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 720c2d58348329ce57bfa7ecef93ee0c9bf4b405

Add support to validate and parse reported hardware zone state.

Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
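Reader's note, not part of the commit: a condensed model of the sequential
zone checks added below may help when reading the diff.  It is a sketch
under assumptions, not the libxfs implementation.  The enum, the struct and
sketch_zone_validate() are hypothetical, the write pointer is taken to be
already relative to the zone start, and the capacity, length and
conventional zone checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the zone conditions the patch distinguishes. */
enum sketch_cond {
	SK_EMPTY,
	SK_OPEN,
	SK_CLOSED,
	SK_FULL,
	SK_OFFLINE,
};

struct sketch_zone {
	enum sketch_cond	cond;
	uint32_t		wp;		/* blocks from zone start */
	uint32_t		capacity;	/* usable blocks in the zone */
};

/*
 * Return true and set *write_pointer if the reported zone state is
 * consistent with the file system's used block counter for the zone.
 */
static bool
sketch_zone_validate(
	const struct sketch_zone	*z,
	uint32_t			used,
	uint32_t			*write_pointer)
{
	switch (z->cond) {
	case SK_EMPTY:
		/* An empty zone cannot hold any used blocks. */
		if (used > 0)
			return false;
		*write_pointer = 0;
		return true;
	case SK_OPEN:
	case SK_CLOSED:
		/* A partially written zone must have its wp inside the zone. */
		if (used > z->capacity || z->wp >= z->capacity)
			return false;
		*write_pointer = z->wp;
		return true;
	case SK_FULL:
		if (used > z->capacity)
			return false;
		*write_pointer = z->capacity;
		return true;
	default:
		/* Offline, read-only and unknown conditions are rejected. */
		return false;
	}
}
```
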
 libxfs/xfs_zones.c | 171 +++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_zones.h |  35 ++++++++++
 2 files changed, 206 insertions(+)
 create mode 100644 libxfs/xfs_zones.c
 create mode 100644 libxfs/xfs_zones.h

diff --git a/libxfs/xfs_zones.c b/libxfs/xfs_zones.c
new file mode 100644
index 000000000000..b022ed960eac
--- /dev/null
+++ b/libxfs/xfs_zones.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023-2025 Christoph Hellwig.
+ * Copyright (c) 2024-2025, Western Digital Corporation or its affiliates.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_rtgroup.h"
+#include "xfs_zones.h"
+
+static bool
+xfs_zone_validate_empty(
+	struct blk_zone		*zone,
+	struct xfs_rtgroup	*rtg,
+	xfs_rgblock_t		*write_pointer)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+
+	if (rtg_rmap(rtg)->i_used_blocks > 0) {
+		xfs_warn(mp, "empty zone %u has non-zero used counter (0x%x).",
+			 rtg_rgno(rtg), rtg_rmap(rtg)->i_used_blocks);
+		return false;
+	}
+
+	*write_pointer = 0;
+	return true;
+}
+
+static bool
+xfs_zone_validate_wp(
+	struct blk_zone		*zone,
+	struct xfs_rtgroup	*rtg,
+	xfs_rgblock_t		*write_pointer)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	xfs_rtblock_t		wp_fsb = xfs_daddr_to_rtb(mp, zone->wp);
+
+	if (rtg_rmap(rtg)->i_used_blocks > rtg->rtg_extents) {
+		xfs_warn(mp, "zone %u has too large used counter (0x%x).",
+			 rtg_rgno(rtg), rtg_rmap(rtg)->i_used_blocks);
+		return false;
+	}
+
+	if (xfs_rtb_to_rgno(mp, wp_fsb) != rtg_rgno(rtg)) {
+		xfs_warn(mp, "zone %u write pointer (0x%llx) outside of zone.",
+			 rtg_rgno(rtg), wp_fsb);
+		return false;
+	}
+
+	*write_pointer = xfs_rtb_to_rgbno(mp, wp_fsb);
+	if (*write_pointer >= rtg->rtg_extents) {
+		xfs_warn(mp, "zone %u has invalid write pointer (0x%x).",
+			 rtg_rgno(rtg), *write_pointer);
+		return false;
+	}
+
+	return true;
+}
+
+static bool
+xfs_zone_validate_full(
+	struct blk_zone		*zone,
+	struct xfs_rtgroup	*rtg,
+	xfs_rgblock_t		*write_pointer)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+
+	if (rtg_rmap(rtg)->i_used_blocks > rtg->rtg_extents) {
+		xfs_warn(mp, "zone %u has too large used counter (0x%x).",
+			 rtg_rgno(rtg), rtg_rmap(rtg)->i_used_blocks);
+		return false;
+	}
+
+	*write_pointer = rtg->rtg_extents;
+	return true;
+}
+
+static bool
+xfs_zone_validate_seq(
+	struct blk_zone		*zone,
+	struct xfs_rtgroup	*rtg,
+	xfs_rgblock_t		*write_pointer)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+
+	switch (zone->cond) {
+	case BLK_ZONE_COND_EMPTY:
+		return xfs_zone_validate_empty(zone, rtg, write_pointer);
+	case BLK_ZONE_COND_IMP_OPEN:
+	case BLK_ZONE_COND_EXP_OPEN:
+	case BLK_ZONE_COND_CLOSED:
+		return xfs_zone_validate_wp(zone, rtg, write_pointer);
+	case BLK_ZONE_COND_FULL:
+		return xfs_zone_validate_full(zone, rtg, write_pointer);
+	case BLK_ZONE_COND_NOT_WP:
+	case BLK_ZONE_COND_OFFLINE:
+	case BLK_ZONE_COND_READONLY:
+		xfs_warn(mp, "zone %u has unsupported zone condition 0x%x.",
+			rtg_rgno(rtg), zone->cond);
+		return false;
+	default:
+		xfs_warn(mp, "zone %u has unknown zone condition 0x%x.",
+			rtg_rgno(rtg), zone->cond);
+		return false;
+	}
+}
+
+static bool
+xfs_zone_validate_conv(
+	struct blk_zone		*zone,
+	struct xfs_rtgroup	*rtg)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+
+	switch (zone->cond) {
+	case BLK_ZONE_COND_NOT_WP:
+		return true;
+	default:
+		xfs_warn(mp,
+"conventional zone %u has unsupported zone condition 0x%x.",
+			 rtg_rgno(rtg), zone->cond);
+		return false;
+	}
+}
+
+bool
+xfs_zone_validate(
+	struct blk_zone		*zone,
+	struct xfs_rtgroup	*rtg,
+	xfs_rgblock_t		*write_pointer)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_groups	*g = &mp->m_groups[XG_TYPE_RTG];
+
+	/*
+	 * Check that the zone capacity matches the rtgroup size stored in the
+	 * superblock.  Note that all zones including the last one must have a
+	 * uniform capacity.
+	 */
+	if (XFS_BB_TO_FSB(mp, zone->capacity) != g->blocks) {
+		xfs_warn(mp,
+"zone %u capacity (0x%llx) does not match RT group size (0x%x).",
+			rtg_rgno(rtg), XFS_BB_TO_FSB(mp, zone->capacity),
+			g->blocks);
+		return false;
+	}
+
+	if (XFS_BB_TO_FSB(mp, zone->len) != 1 << g->blklog) {
+		xfs_warn(mp,
+"zone %u length (0x%llx) does not match geometry (0x%x).",
+			rtg_rgno(rtg), XFS_BB_TO_FSB(mp, zone->len),
+			1 << g->blklog);
+	}
+
+	switch (zone->type) {
+	case BLK_ZONE_TYPE_CONVENTIONAL:
+		return xfs_zone_validate_conv(zone, rtg);
+	case BLK_ZONE_TYPE_SEQWRITE_REQ:
+		return xfs_zone_validate_seq(zone, rtg, write_pointer);
+	default:
+		xfs_warn(mp, "zone %u has unsupported type 0x%x.",
+			rtg_rgno(rtg), zone->type);
+		return false;
+	}
+}
diff --git a/libxfs/xfs_zones.h b/libxfs/xfs_zones.h
new file mode 100644
index 000000000000..c4f1367b2cca
--- /dev/null
+++ b/libxfs/xfs_zones.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LIBXFS_ZONES_H
+#define _LIBXFS_ZONES_H
+
+struct xfs_rtgroup;
+
+/*
+ * In order to guarantee forward progress for GC we need to reserve at least
+ * two zones:  one that will be used for moving data into and one spare zone
+ * making sure that we have enough space to relocate a nearly-full zone.
+ * To allow for slightly sloppy accounting for when we need to reserve the
+ * second zone, we actually reserve three as that is easier than doing fully
+ * accurate bookkeeping.
+ */
+#define XFS_GC_ZONES		3U
+
+/*
+ * In addition we need two zones for user writes, one open zone for writing
+ * and one to still have available blocks without resetting the open zone
+ * when data in the open zone has been freed.
+ */
+#define XFS_RESERVED_ZONES	(XFS_GC_ZONES + 1)
+#define XFS_MIN_ZONES		(XFS_RESERVED_ZONES + 1)
+
+/*
+ * Always keep one zone out of the general open zone pool to allow for GC to
+ * happen while other writers are waiting for free space.
+ */
+#define XFS_OPEN_GC_ZONES	1U
+#define XFS_MIN_OPEN_ZONES	(XFS_OPEN_GC_ZONES + 1U)
+
+bool xfs_zone_validate(struct blk_zone *zone, struct xfs_rtgroup *rtg,
+	xfs_rgblock_t *write_pointer);
+
+#endif /* _LIBXFS_ZONES_H */
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 17/45] FIXUP: xfs: parse and validate hardware zone information
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (15 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 16/45] xfs: parse and validate hardware zone information Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 15:56   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 18/45] xfs: add the zoned space allocator Christoph Hellwig
                   ` (27 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/Makefile    | 6 ++++--
 libxfs/xfs_zones.c | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/libxfs/Makefile b/libxfs/Makefile
index f3daa598ca97..61c43529b532 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -71,7 +71,8 @@ HFILES = \
 	xfs_shared.h \
 	xfs_trans_resv.h \
 	xfs_trans_space.h \
-	xfs_dir2_priv.h
+	xfs_dir2_priv.h \
+	xfs_zones.h
 
 CFILES = buf_mem.c \
 	cache.c \
@@ -135,7 +136,8 @@ CFILES = buf_mem.c \
 	xfs_trans_inode.c \
 	xfs_trans_resv.c \
 	xfs_trans_space.c \
-	xfs_types.c
+	xfs_types.c \
+	xfs_zones.c
 
 EXTRA_CFILES=\
 	ioctl_c_dummy.c
diff --git a/libxfs/xfs_zones.c b/libxfs/xfs_zones.c
index b022ed960eac..712c0fe9b0da 100644
--- a/libxfs/xfs_zones.c
+++ b/libxfs/xfs_zones.c
@@ -3,6 +3,8 @@
  * Copyright (c) 2023-2025 Christoph Hellwig.
  * Copyright (c) 2024-2025, Western Digital Corporation or its affiliates.
  */
+#include <linux/blkzoned.h>
+#include "libxfs_priv.h"
 #include "xfs.h"
 #include "xfs_fs.h"
 #include "xfs_shared.h"
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 18/45] xfs: add the zoned space allocator
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (16 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 17/45] FIXUP: " Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 19/45] xfs: add support for zoned space reservations Christoph Hellwig
                   ` (26 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 4e4d52075577707f8393e3fc74c1ef79ca1d3ce6

For zoned RT devices space is always allocated at the write pointer, that
is right after the last written block and only recorded on I/O completion.

The actual allocation algorithm is very simple and just involves picking
a good zone, preferably the one used for the last write to the inode.  As
the number of zones that can be written at the same time is usually
limited by the hardware, selecting a zone is done as late as possible,
from the iomap dio and buffered writeback bio submission helpers just
before submitting the bio.

Given that the writers already took a reservation before acquiring the
iolock, space will always be readily available if an open zone slot is
available.  A new structure is used to track these open zones, and
pointed to by the xfs_rtgroup.  Because zoned file systems don't have
a rsum cache the space for that pointer can be reused.

Allocations are only recorded at I/O completion time.  The scheme used
for that is very similar to the reflink COW end I/O path.

Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
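Reader's note, not part of the commit: "allocating at the write pointer"
can be pictured with the toy model below.  It is only a sketch under
assumptions.  struct sketch_open_zone and sketch_zone_alloc() are
hypothetical names, and zone selection, locking and the recording of the
mapping at I/O completion are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of an open zone. */
struct sketch_open_zone {
	uint32_t	capacity;	/* writable blocks in the zone */
	uint32_t	write_pointer;	/* next block to be written */
};

/*
 * Hand out up to @want blocks starting at the write pointer.  Returns the
 * number of blocks granted, which is less than @want (possibly 0) when the
 * zone is running out of space, and stores the start block in *start.
 */
static uint32_t
sketch_zone_alloc(
	struct sketch_open_zone	*oz,
	uint32_t		want,
	uint32_t		*start)
{
	uint32_t	left = oz->capacity - oz->write_pointer;
	uint32_t	got = want < left ? want : left;

	*start = oz->write_pointer;
	oz->write_pointer += got;
	return got;
}
```

There is no free space search at all: the only decision is which zone to
write to, and a short allocation simply means the caller has to continue
in another zone.
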
 libxfs/xfs_rtgroup.h | 22 +++++++++++++++++-----
 libxfs/xfs_types.h   |  1 +
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index e35d1d798327..5d8777f819f4 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -37,15 +37,27 @@ struct xfs_rtgroup {
 	xfs_rtxnum_t		rtg_extents;
 
 	/*
-	 * Cache of rt summary level per bitmap block with the invariant that
-	 * rtg_rsum_cache[bbno] > the maximum i for which rsum[i][bbno] != 0,
-	 * or 0 if rsum[i][bbno] == 0 for all i.
-	 *
+	 * For bitmap based RT devices this points to a cache of rt summary
+	 * level per bitmap block with the invariant that rtg_rsum_cache[bbno]
+	 * > the maximum i for which rsum[i][bbno] != 0, or 0 if
+	 * rsum[i][bbno] == 0 for all i.
 	 * Reads and writes are serialized by the rsumip inode lock.
+	 *
+	 * For zoned RT devices this points to the open zone structure for
+	 * a group that is open for writers, or is NULL.
 	 */
-	uint8_t			*rtg_rsum_cache;
+	union {
+		uint8_t			*rtg_rsum_cache;
+		struct xfs_open_zone	*rtg_open_zone;
+	};
 };
 
+/*
+ * For zoned RT devices this is set on groups that have no written blocks
+ * and can be picked by the allocator for opening.
+ */
+#define XFS_RTG_FREE			XA_MARK_0
+
 static inline struct xfs_rtgroup *to_rtg(struct xfs_group *xg)
 {
 	return container_of(xg, struct xfs_rtgroup, rtg_group);
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index 76f3c31573ec..dc1db15f0be5 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -243,6 +243,7 @@ enum xfs_free_counter {
 	 * Number of free RT extents on the RT device.
 	 */
 	XC_FREE_RTEXTENTS,
+
 	XC_FREE_NR,
 };
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 19/45] xfs: add support for zoned space reservations
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (17 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 18/45] xfs: add the zoned space allocator Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 20/45] FIXUP: " Christoph Hellwig
                   ` (25 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 0bb2193056b5969e4148fc0909e89a5362da873e

For zoned file systems garbage collection (GC) has to take the iolock
and mmaplock after moving data to a new place to synchronize with
readers.  This means waiting for garbage collection with the iolock held
can deadlock.

To avoid this, the worst case required blocks have to be reserved before
taking the iolock, which is done using a new RTAVAILABLE counter that
tracks blocks that are free to write into and don't require garbage
collection.  The new helpers try to take these available blocks, and
if there aren't enough available they wake GC and wait for it.  This is
done using a list of on-stack reservations to ensure fairness.

Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
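Reader's note, not part of the commit: the relationship between the two RT
counters can be modelled as below.  This is a sketch under assumptions and
not the kernel or libxfs code.  struct sketch_counters and the sketch_*
helpers are hypothetical, plain integers stand in for the percpu counters,
and the waiting and fairness list are reduced to a boolean return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for XC_FREE_RTEXTENTS and XC_FREE_RTAVAILABLE. */
struct sketch_counters {
	int64_t		free_rtextents;	/* free, may still need a zone reset */
	int64_t		available;	/* free and directly writable */
};

/*
 * Try to reserve @count blocks before taking the iolock.  A false return
 * is where a real implementation would wake GC and wait.
 */
static bool
sketch_reserve(
	struct sketch_counters	*c,
	int64_t			count)
{
	if (c->available < count)
		return false;
	c->available -= count;
	c->free_rtextents -= count;
	return true;
}

/* Blocks freed in a zone that has not been reset: free, not yet available. */
static void
sketch_free_blocks(
	struct sketch_counters	*c,
	int64_t			count)
{
	c->free_rtextents += count;
}

/* After GC resets a zone, its reclaimed blocks become available again. */
static void
sketch_zone_reset(
	struct sketch_counters	*c,
	int64_t			reclaimed)
{
	c->available += reclaimed;
}
```
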
 libxfs/xfs_bmap.c  | 15 +++++++++++----
 libxfs/xfs_types.h | 12 +++++++++++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index c40cdf004ac9..3a857181bfa4 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -35,6 +35,7 @@
 #include "xfs_symlink_remote.h"
 #include "xfs_inode_util.h"
 #include "xfs_rtgroup.h"
+#include "xfs_zone_alloc.h"
 
 struct kmem_cache		*xfs_bmap_intent_cache;
 
@@ -4783,12 +4784,18 @@ xfs_bmap_del_extent_delay(
 	da_diff = da_old - da_new;
 	fdblocks = da_diff;
 
-	if (bflags & XFS_BMAPI_REMAP)
+	if (bflags & XFS_BMAPI_REMAP) {
 		;
-	else if (isrt)
-		xfs_add_frextents(mp, xfs_blen_to_rtbxlen(mp, del->br_blockcount));
-	else
+	} else if (isrt) {
+		xfs_rtbxlen_t	rtxlen;
+
+		rtxlen = xfs_blen_to_rtbxlen(mp, del->br_blockcount);
+		if (xfs_is_zoned_inode(ip))
+			xfs_zoned_add_available(mp, rtxlen);
+		xfs_add_frextents(mp, rtxlen);
+	} else {
 		fdblocks += del->br_blockcount;
+	}
 
 	xfs_add_fdblocks(mp, fdblocks);
 	xfs_mod_delalloc(ip, -(int64_t)del->br_blockcount, -da_diff);
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index dc1db15f0be5..f6f4f2d4b5db 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -244,12 +244,22 @@ enum xfs_free_counter {
 	 */
 	XC_FREE_RTEXTENTS,
 
+	/*
+	 * Number of available for use RT extents.
+	 *
+	 * This counter only exists for zoned RT device and indicates the number
+	 * of RT extents that can be directly used by writes.  XC_FREE_RTEXTENTS
+	 * also includes blocks that have been written previously and freed, but
+	 * sit in a rtgroup that still needs a zone reset.
+	 */
+	XC_FREE_RTAVAILABLE,
 	XC_FREE_NR,
 };
 
 #define XFS_FREECOUNTER_STR \
 	{ XC_FREE_BLOCKS,		"blocks" }, \
-	{ XC_FREE_RTEXTENTS,		"rtextents" }
+	{ XC_FREE_RTEXTENTS,		"rtextents" }, \
+	{ XC_FREE_RTAVAILABLE,		"rtavailable" }
 
 /*
  * Type verifier functions
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 20/45] FIXUP: xfs: add support for zoned space reservations
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (18 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 19/45] xfs: add support for zoned space reservations Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 15:56   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 21/45] xfs: implement zoned garbage collection Christoph Hellwig
                   ` (24 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/libxfs_priv.h | 2 ++
 libxfs/xfs_bmap.c    | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 82952b0db629..d5f7d28e08e2 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -471,6 +471,8 @@ static inline int retzero(void) { return 0; }
 #define xfs_sb_validate_fsb_count(sbp, nblks)		(0)
 #define xlog_calc_iovec_len(len)		roundup(len, sizeof(uint32_t))
 
+#define xfs_zoned_add_available(mp, rtxnum)	do { } while (0)
+
 /*
  * Prototypes for kernel static functions that are aren't in their
  * associated header files.
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 3a857181bfa4..3cb47c3c8707 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -35,7 +35,6 @@
 #include "xfs_symlink_remote.h"
 #include "xfs_inode_util.h"
 #include "xfs_rtgroup.h"
-#include "xfs_zone_alloc.h"
 
 struct kmem_cache		*xfs_bmap_intent_cache;
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 21/45] xfs: implement zoned garbage collection
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (19 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 20/45] FIXUP: " Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 22/45] xfs: enable fsmap reporting for internal RT devices Christoph Hellwig
                   ` (23 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 080d01c41d44f0993f2c235a6bfdb681f0a66be6

RT groups on a zoned file system need to be completely empty before their
space can be reused.  This means that partially empty groups need to be
emptied entirely to free up space if no entirely free groups are
available.

Add a garbage collection thread that moves all data out of the least used
zone when not enough free zones are available, and which resets all zones
that have been emptied.  To find the least used zones, a simple set of 10
buckets based on the amount of space used in each zone is used.  To empty
zones, the rmap is walked to find the owners and the data is read and then
written to the new place.

To automatically defragment files the rmap records are sorted by inode
and logical offset.  This means defragmentation of parallel writes into
a single zone happens automatically when performing garbage collection.
Because holding the iolock over the entire GC cycle would inject very
noticeable latency for other accesses to the inodes, the iolock is not
taken while performing I/O.  Instead the I/O completion handler checks
that the mapping hasn't changed from the one recorded at the start of
the GC cycle and doesn't update the mapping if it has changed.
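
As a rough illustration of the bucket idea described above, here is a
minimal userspace sketch.  It is not the kernel implementation: the names
NR_BUCKETS, used_bucket() and pick_victim() are hypothetical, and the
per-zone "reclaimable" flag only loosely corresponds to the
XFS_RTG_RECLAIMABLE mark added in this patch.

```c
#include <assert.h>
#include <stdint.h>

#define NR_BUCKETS	10	/* hypothetical name; the commit only says "10 buckets" */

/* Map a zone's used block count to a bucket; bucket 0 holds the emptiest zones. */
static inline unsigned int
used_bucket(uint32_t used_blocks, uint32_t zone_blocks)
{
	uint64_t	b;

	assert(zone_blocks > 0 && used_blocks <= zone_blocks);
	b = (uint64_t)used_blocks * NR_BUCKETS / zone_blocks;
	return b >= NR_BUCKETS ? NR_BUCKETS - 1 : (unsigned int)b;
}

/*
 * Return the index of the first reclaimable zone in the lowest populated
 * bucket, or -1 if no zone is a candidate.
 */
static inline int
pick_victim(const uint32_t *used, const int *reclaimable, int nr_zones,
	    uint32_t zone_blocks)
{
	unsigned int	best_bucket = NR_BUCKETS;
	int		best = -1;
	int		i;

	for (i = 0; i < nr_zones; i++) {
		unsigned int	bucket;

		if (!reclaimable[i])
			continue;
		bucket = used_bucket(used[i], zone_blocks);
		if (bucket < best_bucket) {
			best_bucket = bucket;
			best = i;
		}
	}
	return best;
}
```

Bucketing trades precision for cheap bookkeeping: zones within the same
bucket are treated as equally good victims, so no fully sorted list of
zones has to be maintained.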

Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_group.h   | 21 +++++++++++++++++----
 libxfs/xfs_rtgroup.h |  6 ++++++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index a70096113384..cff3f815947b 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -19,10 +19,23 @@ struct xfs_group {
 #ifdef __KERNEL__
 	/* -- kernel only structures below this line -- */
 
-	/*
-	 * Track freed but not yet committed extents.
-	 */
-	struct xfs_extent_busy_tree *xg_busy_extents;
+	union {
+		/*
+		 * For perags and non-zoned RT groups:
+		 * Track freed but not yet committed extents.
+		 */
+		struct xfs_extent_busy_tree	*xg_busy_extents;
+
+		/*
+		 * For zoned RT groups:
+		 * List of groups that need a zone reset.
+		 *
+		 * The zonegc code forces a log flush of the rtrmap inode before
+		 * resetting the write pointer, so there is no need for
+		 * individual busy extent tracking.
+		 */
+		struct xfs_group		*xg_next_reset;
+	};
 
 	/*
 	 * Bitsets of per-ag metadata that have been checked and/or are sick.
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 5d8777f819f4..b325aff28264 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -58,6 +58,12 @@ struct xfs_rtgroup {
  */
 #define XFS_RTG_FREE			XA_MARK_0
 
+/*
+ * For zoned RT devices this is set on groups that are fully written and that
+ * have unused blocks.  Used by the garbage collection to pick targets.
+ */
+#define XFS_RTG_RECLAIMABLE		XA_MARK_1
+
 static inline struct xfs_rtgroup *to_rtg(struct xfs_group *xg)
 {
 	return container_of(xg, struct xfs_rtgroup, rtg_group);
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 22/45] xfs: enable fsmap reporting for internal RT devices
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (20 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 21/45] xfs: implement zoned garbage collection Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 23/45] xfs: enable the zoned RT device feature Christoph Hellwig
                   ` (22 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: e50ec7fac81aa271f20ae09868f772ff43a240b0

File systems with internal RT devices are a bit odd in that we need
to report AGs and RGs.  To make this happen use separate synthetic
fmr_device values for the different sections instead of the dev_t
mapping used by other XFS configurations.

The data device is reported as file system metadata before the
start of the RGs for the synthetic RT fmr_device.
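
For illustration only, a consumer of fsmap records on such a file system
might decode the synthetic values roughly as sketched below.  The enum
values are copied from the xfs_device enum added by this patch;
fmr_device_name() is a hypothetical helper, and the caller is assumed to
already know that the file system uses an internal RT device (otherwise
fmr_device carries a dev_t).

```c
#include <stdint.h>

/* Values copied from the enum xfs_device added to xfs_fs.h by this patch. */
enum xfs_device {
	XFS_DEV_DATA	= 1,
	XFS_DEV_LOG	= 2,
	XFS_DEV_RT	= 3,
};

/* Hypothetical helper: name the section a synthetic fmr_device refers to. */
static inline const char *
fmr_device_name(uint32_t fmr_device)
{
	switch (fmr_device) {
	case XFS_DEV_DATA:
		return "data";
	case XFS_DEV_LOG:
		return "log";
	case XFS_DEV_RT:
		return "rt";
	default:
		return "unknown";
	}
}
```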

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_fs.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 5e66fb2b2cc7..12463ba766da 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -1082,6 +1082,15 @@ struct xfs_rtgroup_geometry {
 #define XFS_IOC_COMMIT_RANGE	     _IOW ('X', 131, struct xfs_commit_range)
 /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
 
+/*
+ * Devices supported by a single XFS file system.  Reported in fsmaps fmr_device
+ * when using internal RT devices.
+ */
+enum xfs_device {
+	XFS_DEV_DATA	= 1,
+	XFS_DEV_LOG	= 2,
+	XFS_DEV_RT	= 3,
+};
 
 #ifndef HAVE_BBMACROS
 /*
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 23/45] xfs: enable the zoned RT device feature
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (21 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 22/45] xfs: enable fsmap reporting for internal RT devices Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 24/45] xfs: support zone gaps Christoph Hellwig
                   ` (21 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: be458049ffe32b5885c5c35b12997fd40c2986c4

Enable the zoned RT device feature.  With this feature, RT
groups are written sequentially and always emptied before rewriting
the blocks.  This perfectly maps to zoned devices, but can also be
used on conventional block devices.
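
To show why adding the bit to XFS_SB_FEAT_INCOMPAT_ALL is what "enables"
the feature, here is a simplified sketch of an incompat mask check.  The
two bit values match the definitions visible in xfs_format.h in this
series; the shortened names and has_unknown_incompat() are hypothetical
stand-ins, not the libxfs helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEAT_INCOMPAT_METADIR	(1u << 8)	/* metadata dir tree */
#define FEAT_INCOMPAT_ZONED	(1u << 9)	/* zoned RT allocator */

/*
 * A file system must be rejected if its superblock sets any incompat bit
 * that is not part of the mask of features this code knows about.
 */
static inline bool
has_unknown_incompat(uint32_t sb_features, uint32_t known_mask)
{
	return (sb_features & ~known_mask) != 0;
}
```

With only METADIR in the known mask, a superblock carrying the ZONED bit
is refused; once ZONED is part of the mask the same superblock is
accepted.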

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_format.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index f67380a25805..cee7e26f23bd 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -408,7 +408,8 @@ xfs_sb_has_ro_compat_feature(
 		 XFS_SB_FEAT_INCOMPAT_NREXT64 | \
 		 XFS_SB_FEAT_INCOMPAT_EXCHRANGE | \
 		 XFS_SB_FEAT_INCOMPAT_PARENT | \
-		 XFS_SB_FEAT_INCOMPAT_METADIR)
+		 XFS_SB_FEAT_INCOMPAT_METADIR | \
+		 XFS_SB_FEAT_INCOMPAT_ZONED)
 
 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 24/45] xfs: support zone gaps
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (22 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 23/45] xfs: enable the zoned RT device feature Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 25/45] FIXUP: " Christoph Hellwig
                   ` (20 subsequent siblings)
  44 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Source kernel commit: 97c69ba1c08d5a2bb3cacecae685b63e20e4d485

Zoned devices can have gaps between the end of the usable capacity of a
zone and the end of the zone in the LBA/daddr address space.  In other
words, the hardware
equivalent to the RT groups already takes care of the power of 2
alignment for us.  In this case the sparse FSB/RTB address space maps 1:1
to the device address space.
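
The difference between the two layouts can be sketched as below.  This is
a simplified model, not the libxfs code: struct groups_geom and
group_block_to_dev_block() are hypothetical stand-ins for struct
xfs_groups and xfs_gbno_to_daddr(), the result is in file system blocks
rather than 512-byte daddr units, and the sparse address is assumed to be
formed as (group number << blklog) | group block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of struct xfs_groups. */
struct groups_geom {
	uint32_t	blocks;		/* usable blocks per group */
	uint8_t		blklog;		/* log2 of the power-of-2 group stride */
	bool		has_daddr_gaps;	/* device layout is as sparse as the FSB space */
	uint64_t	start_fsb;	/* first block of group 0 on the device */
};

/* Translate (group number, block in group) to a device block number. */
static inline uint64_t
group_block_to_dev_block(const struct groups_geom *g, uint32_t gno,
			 uint32_t gbno)
{
	assert(gbno < g->blocks);

	/* With gaps, the sparse address maps 1:1 onto the device. */
	if (g->has_daddr_gaps)
		return g->start_fsb + (((uint64_t)gno << g->blklog) | gbno);

	/* Without gaps, groups are packed back to back on the device. */
	return g->start_fsb + (uint64_t)gno * g->blocks + gbno;
}
```

For a group of 1000 usable blocks with blklog 10, block 5 of group 2 lands
at device block 2053 with gaps and at 2005 without.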

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_format.h  |  4 +++-
 libxfs/xfs_group.h   |  6 +++++-
 libxfs/xfs_rtgroup.h | 13 ++++++++-----
 libxfs/xfs_sb.c      |  3 +++
 libxfs/xfs_zones.c   | 19 +++++++++++++++++--
 5 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index cee7e26f23bd..9566a7623365 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -398,6 +398,7 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_PARENT	(1 << 7)  /* parent pointers */
 #define XFS_SB_FEAT_INCOMPAT_METADIR	(1 << 8)  /* metadata dir tree */
 #define XFS_SB_FEAT_INCOMPAT_ZONED	(1 << 9)  /* zoned RT allocator */
+#define XFS_SB_FEAT_INCOMPAT_ZONE_GAPS	(1 << 10) /* RTGs have LBA gaps */
 
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE | \
@@ -409,7 +410,8 @@ xfs_sb_has_ro_compat_feature(
 		 XFS_SB_FEAT_INCOMPAT_EXCHRANGE | \
 		 XFS_SB_FEAT_INCOMPAT_PARENT | \
 		 XFS_SB_FEAT_INCOMPAT_METADIR | \
-		 XFS_SB_FEAT_INCOMPAT_ZONED)
+		 XFS_SB_FEAT_INCOMPAT_ZONED | \
+		 XFS_SB_FEAT_INCOMPAT_ZONE_GAPS)
 
 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool
diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index cff3f815947b..4423932a2313 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -123,7 +123,11 @@ xfs_gbno_to_daddr(
 	struct xfs_groups	*g = &mp->m_groups[xg->xg_type];
 	xfs_fsblock_t		fsbno;
 
-	fsbno = (xfs_fsblock_t)xg->xg_gno * g->blocks + gbno;
+	if (g->has_daddr_gaps)
+		fsbno = xfs_gbno_to_fsb(xg, gbno);
+	else
+		fsbno = (xfs_fsblock_t)xg->xg_gno * g->blocks + gbno;
+
 	return XFS_FSB_TO_BB(mp, g->start_fsb + fsbno);
 }
 
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index b325aff28264..d36a6ae0abe5 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -245,11 +245,14 @@ xfs_rtb_to_daddr(
 	xfs_rtblock_t		rtbno)
 {
 	struct xfs_groups	*g = &mp->m_groups[XG_TYPE_RTG];
-	xfs_rgnumber_t		rgno = xfs_rtb_to_rgno(mp, rtbno);
-	uint64_t		start_bno = (xfs_rtblock_t)rgno * g->blocks;
 
-	return XFS_FSB_TO_BB(mp,
-		g->start_fsb + start_bno + (rtbno & g->blkmask));
+	if (xfs_has_rtgroups(mp) && !g->has_daddr_gaps) {
+		xfs_rgnumber_t	rgno = xfs_rtb_to_rgno(mp, rtbno);
+
+		rtbno = (xfs_rtblock_t)rgno * g->blocks + (rtbno & g->blkmask);
+	}
+
+	return XFS_FSB_TO_BB(mp, g->start_fsb + rtbno);
 }
 
 static inline xfs_rtblock_t
@@ -261,7 +264,7 @@ xfs_daddr_to_rtb(
 	xfs_rfsblock_t		bno;
 
 	bno = XFS_BB_TO_FSBT(mp, daddr) - g->start_fsb;
-	if (xfs_has_rtgroups(mp)) {
+	if (xfs_has_rtgroups(mp) && !g->has_daddr_gaps) {
 		xfs_rgnumber_t	rgno;
 		uint32_t	rgbno;
 
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 8df99e518c70..078c75febf4f 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1202,6 +1202,9 @@ xfs_sb_mount_rextsize(
 		rgs->blklog = mp->m_sb.sb_rgblklog;
 		rgs->blkmask = xfs_mask32lo(mp->m_sb.sb_rgblklog);
 		rgs->start_fsb = mp->m_sb.sb_rtstart;
+		if (xfs_sb_has_incompat_feature(sbp,
+				XFS_SB_FEAT_INCOMPAT_ZONE_GAPS))
+			rgs->has_daddr_gaps = true;
 	} else {
 		rgs->blocks = 0;
 		rgs->blklog = 0;
diff --git a/libxfs/xfs_zones.c b/libxfs/xfs_zones.c
index 712c0fe9b0da..7a81d83f5b3e 100644
--- a/libxfs/xfs_zones.c
+++ b/libxfs/xfs_zones.c
@@ -139,6 +139,7 @@ xfs_zone_validate(
 {
 	struct xfs_mount	*mp = rtg_mount(rtg);
 	struct xfs_groups	*g = &mp->m_groups[XG_TYPE_RTG];
+	uint32_t		expected_size;
 
 	/*
 	 * Check that the zone capacity matches the rtgroup size stored in the
@@ -153,11 +154,25 @@ xfs_zone_validate(
 		return false;
 	}
 
-	if (XFS_BB_TO_FSB(mp, zone->len) != 1 << g->blklog) {
+	if (g->has_daddr_gaps) {
+		expected_size = 1 << g->blklog;
+	} else {
+		if (zone->len != zone->capacity) {
+			xfs_warn(mp,
+"zone %u has capacity != size ((0x%llx vs 0x%llx)",
+				rtg_rgno(rtg),
+				XFS_BB_TO_FSB(mp, zone->len),
+				XFS_BB_TO_FSB(mp, zone->capacity));
+			return false;
+		}
+		expected_size = g->blocks;
+	}
+
+	if (XFS_BB_TO_FSB(mp, zone->len) != expected_size) {
 		xfs_warn(mp,
 "zone %u length (0x%llx) does match geometry (0x%x).",
 			rtg_rgno(rtg), XFS_BB_TO_FSB(mp, zone->len),
-			1 << g->blklog);
+			expected_size);
 	}
 
 	switch (zone->type) {
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 25/45] FIXUP: xfs: support zone gaps
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (23 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 24/45] xfs: support zone gaps Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 15:57   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 26/45] libfrog: report the zoned flag Christoph Hellwig
                   ` (19 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 db/convert.c        | 6 +++++-
 include/xfs_mount.h | 9 +++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/db/convert.c b/db/convert.c
index 47d3e86fdc4e..3eec4f224f51 100644
--- a/db/convert.c
+++ b/db/convert.c
@@ -44,10 +44,14 @@ xfs_daddr_to_rgno(
 	struct xfs_mount	*mp,
 	xfs_daddr_t		daddr)
 {
+	struct xfs_groups	*g = &mp->m_groups[XG_TYPE_RTG];
+
 	if (!xfs_has_rtgroups(mp))
 		return 0;
 
-	return XFS_BB_TO_FSBT(mp, daddr) / mp->m_groups[XG_TYPE_RTG].blocks;
+	if (g->has_daddr_gaps)
+		return XFS_BB_TO_FSBT(mp, daddr) / (1 << g->blklog);
+	return XFS_BB_TO_FSBT(mp, daddr) / g->blocks;
 }
 
 typedef enum {
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index bf9ebc25fc79..5a714333c16e 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -47,6 +47,15 @@ struct xfs_groups {
 	 */
 	uint8_t			blklog;
 
+	/*
+	 * Zoned devices can have gaps beyond the usable capacity of a zone
+	 * and the end in the LBA/daddr address space.  In other words, the
+	 * hardware equivalent to the RT groups already takes care of the power
+	 * of 2 alignment for us.  In this case the sparse FSB/RTB address space
+	 * maps 1:1 to the device address space.
+	 */
+	bool			has_daddr_gaps;
+
 	/*
 	 * Mask to extract the group-relative block number from a FSB.
 	 * For a pre-rtgroups filesystem we pretend to have one very large
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 26/45] libfrog: report the zoned flag
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (24 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 25/45] FIXUP: " Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 15:58   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 27/45] xfs_repair: support repairing zoned file systems Christoph Hellwig
                   ` (18 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/fsgeom.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
index 13df88ae43a7..5c4ba29ca9ac 100644
--- a/libfrog/fsgeom.c
+++ b/libfrog/fsgeom.c
@@ -34,6 +34,7 @@ xfs_report_geom(
 	int			exchangerange;
 	int			parent;
 	int			metadir;
+	int			zoned;
 
 	isint = geo->logstart > 0;
 	lazycount = geo->flags & XFS_FSOP_GEOM_FLAGS_LAZYSB ? 1 : 0;
@@ -55,6 +56,7 @@ xfs_report_geom(
 	exchangerange = geo->flags & XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE ? 1 : 0;
 	parent = geo->flags & XFS_FSOP_GEOM_FLAGS_PARENT ? 1 : 0;
 	metadir = geo->flags & XFS_FSOP_GEOM_FLAGS_METADIR ? 1 : 0;
+	zoned = geo->flags & XFS_FSOP_GEOM_FLAGS_ZONED ? 1 : 0;
 
 	printf(_(
 "meta-data=%-22s isize=%-6d agcount=%u, agsize=%u blks\n"
@@ -68,7 +70,7 @@ xfs_report_geom(
 "log      =%-22s bsize=%-6d blocks=%u, version=%d\n"
 "         =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n"
 "realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n"
-"         =%-22s rgcount=%-4d rgsize=%u extents\n"),
+"         =%-22s rgcount=%-4d rgsize=%u extents, zoned=%d\n"),
 		mntpoint, geo->inodesize, geo->agcount, geo->agblocks,
 		"", geo->sectsize, attrversion, projid32bit,
 		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
@@ -84,7 +86,7 @@ xfs_report_geom(
 		!geo->rtblocks ? _("none") : rtname ? rtname : _("internal"),
 		geo->rtextsize * geo->blocksize, (unsigned long long)geo->rtblocks,
 			(unsigned long long)geo->rtextents,
-		"", geo->rgcount, geo->rgextents);
+		"", geo->rgcount, geo->rgextents, zoned);
 }
 
 /* Try to obtain the xfs geometry.  On error returns a negative error code. */
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 27/45] xfs_repair: support repairing zoned file systems
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (25 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 26/45] libfrog: report the zoned flag Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 16:10   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 28/45] xfs_repair: fix the RT device check in process_dinode_int Christoph Hellwig
                   ` (17 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Not really much to do here.  Mostly ignore the validation and
regeneration of the bitmap and summary inodes.  Eventually this
could grow a bit of validation of the hardware zone state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 repair/dinode.c        |  4 +++-
 repair/phase5.c        | 13 +++++++++++++
 repair/phase6.c        |  6 ++++--
 repair/rt.c            |  2 ++
 repair/rtrmap_repair.c | 33 +++++++++++++++++++++++++++++++++
 repair/xfs_repair.c    |  9 ++++++---
 6 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/repair/dinode.c b/repair/dinode.c
index 8696a838087f..7bdd3dcf15c1 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -3585,7 +3585,9 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 
 	validate_extsize(mp, dino, lino, dirty);
 
-	if (dino->di_version >= 3)
+	if (dino->di_version >= 3 &&
+	    (!xfs_has_zoned(mp) ||
+	     dino->di_metatype != cpu_to_be16(XFS_METAFILE_RTRMAP)))
 		validate_cowextsize(mp, dino, lino, dirty);
 
 	/* nsec fields cannot be larger than 1 billion */
diff --git a/repair/phase5.c b/repair/phase5.c
index 4cf28d8ae1a2..e350b411c243 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -630,6 +630,19 @@ void
 check_rtmetadata(
 	struct xfs_mount	*mp)
 {
+	if (xfs_has_zoned(mp)) {
+		/*
+		 * Here we could/should verify the zone state a bit when we are
+		 * on actual zoned devices:
+		 *	- compare hw write pointer to last written
+		 *	- compare zone state to last written
+		 *
+		 * Note much we can do when running in zoned mode on a
+		 * conventional device.
+		 */
+		return;
+	}
+
 	generate_rtinfo(mp);
 	check_rtbitmap(mp);
 	check_rtsummary(mp);
diff --git a/repair/phase6.c b/repair/phase6.c
index dbc090a54139..a7187e84daae 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -3460,8 +3460,10 @@ _("        - resetting contents of realtime bitmap and summary inodes\n"));
 		return;
 
 	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
-		ensure_rtgroup_bitmap(rtg);
-		ensure_rtgroup_summary(rtg);
+		if (!xfs_has_zoned(mp)) {
+			ensure_rtgroup_bitmap(rtg);
+			ensure_rtgroup_summary(rtg);
+		}
 		ensure_rtgroup_rmapbt(rtg, est_fdblocks);
 		ensure_rtgroup_refcountbt(rtg, est_fdblocks);
 	}
diff --git a/repair/rt.c b/repair/rt.c
index e0a4943ee3b7..a2478fb635e3 100644
--- a/repair/rt.c
+++ b/repair/rt.c
@@ -222,6 +222,8 @@ check_rtfile_contents(
 	xfs_fileoff_t		bno = 0;
 	int			error;
 
+	ASSERT(!xfs_has_zoned(mp));
+
 	if (!ip) {
 		do_warn(_("unable to open %s file\n"), filename);
 		return;
diff --git a/repair/rtrmap_repair.c b/repair/rtrmap_repair.c
index 2b07e8943e59..955db1738fe2 100644
--- a/repair/rtrmap_repair.c
+++ b/repair/rtrmap_repair.c
@@ -141,6 +141,37 @@ xrep_rtrmap_btree_load(
 	return error;
 }
 
+static void
+rtgroup_update_counters(
+	struct xfs_rtgroup	*rtg)
+{
+	struct xfs_inode	*rmapip = rtg->rtg_inodes[XFS_RTGI_RMAP];
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	uint64_t		end =
+		xfs_rtbxlen_to_blen(mp, rtg->rtg_extents);
+	xfs_agblock_t		gbno = 0;
+	uint64_t		used = 0;
+
+	do {
+		int		bstate;
+		xfs_extlen_t	blen;
+
+		bstate = get_bmap_ext(rtg_rgno(rtg), gbno, end, &blen, true);
+		switch (bstate) {
+		case XR_E_INUSE:
+		case XR_E_INUSE_FS:
+			used += blen;
+			break;
+		default:
+			break;
+		}
+
+		gbno += blen;
+	} while (gbno < end);
+
+	rmapip->i_used_blocks = used;
+}
+
 /* Update the inode counters. */
 STATIC int
 xrep_rtrmap_reset_counters(
@@ -153,6 +184,8 @@ xrep_rtrmap_reset_counters(
 	 * generated.
 	 */
 	sc->ip->i_nblocks = rr->new_fork_info.ifake.if_blocks;
+	if (xfs_has_zoned(sc->mp))
+		rtgroup_update_counters(rr->rtg);
 	libxfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
 
 	/* Quotas don't exist so we're done. */
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index eeaaf6434689..7bf75c09b945 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -1388,16 +1388,19 @@ main(int argc, char **argv)
 	 * Done with the block usage maps, toss them.  Realtime metadata aren't
 	 * rebuilt until phase 6, so we have to keep them around.
 	 */
-	if (mp->m_sb.sb_rblocks == 0)
+	if (mp->m_sb.sb_rblocks == 0) {
 		rmaps_free(mp);
-	free_bmaps(mp);
+		free_bmaps(mp);
+	}
 
 	if (!bad_ino_btree)  {
 		phase6(mp);
 		phase_end(mp, 6);
 
-		if (mp->m_sb.sb_rblocks != 0)
+		if (mp->m_sb.sb_rblocks != 0) {
 			rmaps_free(mp);
+			free_bmaps(mp);
+		}
 		free_rtgroup_inodes();
 
 		phase7(mp, phase2_threads);
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 28/45] xfs_repair: fix the RT device check in process_dinode_int
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (26 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 27/45] xfs_repair: support repairing zoned file systems Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 16:11   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 29/45] xfs_repair: validate rt groups vs reported hardware zones Christoph Hellwig
                   ` (16 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Don't look at the variable for the rtname command line option, but
the actual file system geometry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 repair/dinode.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/repair/dinode.c b/repair/dinode.c
index 7bdd3dcf15c1..0c559c408085 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -3265,8 +3265,9 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 			flags &= XFS_DIFLAG_ANY;
 		}
 
-		/* need an rt-dev for the realtime flag! */
-		if ((flags & XFS_DIFLAG_REALTIME) && !rt_name) {
+		/* need an rt-dev for the realtime flag */
+		if ((flags & XFS_DIFLAG_REALTIME) &&
+		    !mp->m_rtdev_targp) {
 			if (!uncertain) {
 				do_warn(
 	_("inode %" PRIu64 " has RT flag set but there is no RT device\n"),
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 29/45] xfs_repair: validate rt groups vs reported hardware zones
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (27 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 28/45] xfs_repair: fix the RT device check in process_dinode_int Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 18:41   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 30/45] xfs_mkfs: support creating zoned file systems Christoph Hellwig
                   ` (15 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Run a report zones ioctl, and verify the rt group state vs the
reported hardware zone state.  Note that there is no way to actually
fix up any discrepancies here, as that would be rather scary without
having transactions.
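
The consistency rules applied to the reported zones can be summarised by
the following standalone sketch.  It deliberately avoids the ioctl so it
can run anywhere: struct zone_info and check_zone_report() are
hypothetical, and only the numeric zone type values are taken from
<linux/blkzoned.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as BLK_ZONE_TYPE_* in <linux/blkzoned.h>. */
enum zone_type {
	ZONE_CONVENTIONAL	= 1,
	ZONE_SEQWRITE_REQ	= 2,
	ZONE_SEQWRITE_PREF	= 3,
};

/* Hypothetical, trimmed-down stand-in for struct blk_zone. */
struct zone_info {
	uint64_t	start;		/* first sector of the zone */
	uint64_t	len;		/* zone size in sectors */
	uint64_t	capacity;	/* usable sectors in the zone */
	uint8_t		type;
};

/*
 * Return the index of the first zone that violates a rule, or -1 if all
 * zones have the expected size, an accepted type, and one shared capacity
 * that does not exceed the zone size.
 */
static inline int
check_zone_report(const struct zone_info *z, size_t nr, uint64_t zone_size)
{
	size_t	i;

	for (i = 0; i < nr; i++) {
		if (z[i].len != zone_size)
			return (int)i;
		if (z[i].type != ZONE_CONVENTIONAL &&
		    z[i].type != ZONE_SEQWRITE_REQ)
			return (int)i;
		if (z[i].capacity > zone_size)
			return (int)i;
		if (z[i].capacity != z[0].capacity)
			return (int)i;
	}
	return -1;
}
```

Like the patch, the sketch only detects a mismatch and reports where it
is; it makes no attempt to repair anything.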

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 repair/Makefile |   1 +
 repair/phase5.c |  11 +---
 repair/zoned.c  | 136 ++++++++++++++++++++++++++++++++++++++++++++++++
 repair/zoned.h  |  10 ++++
 4 files changed, 149 insertions(+), 9 deletions(-)
 create mode 100644 repair/zoned.c
 create mode 100644 repair/zoned.h

diff --git a/repair/Makefile b/repair/Makefile
index ff5b1f5abeda..fb0b2f96cc91 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -81,6 +81,7 @@ CFILES = \
 	strblobs.c \
 	threads.c \
 	versions.c \
+	zoned.c \
 	xfs_repair.c
 
 LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBXCMD) $(LIBFROG) $(LIBUUID) $(LIBRT) \
diff --git a/repair/phase5.c b/repair/phase5.c
index e350b411c243..e44c26885717 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -21,6 +21,7 @@
 #include "rmap.h"
 #include "bulkload.h"
 #include "agbtree.h"
+#include "zoned.h"
 
 static uint64_t	*sb_icount_ag;		/* allocated inodes per ag */
 static uint64_t	*sb_ifree_ag;		/* free inodes per ag */
@@ -631,15 +632,7 @@ check_rtmetadata(
 	struct xfs_mount	*mp)
 {
 	if (xfs_has_zoned(mp)) {
-		/*
-		 * Here we could/should verify the zone state a bit when we are
-		 * on actual zoned devices:
-		 *	- compare hw write pointer to last written
-		 *	- compare zone state to last written
-		 *
-		 * Note much we can do when running in zoned mode on a
-		 * conventional device.
-		 */
+		check_zones(mp);
 		return;
 	}
 
diff --git a/repair/zoned.c b/repair/zoned.c
new file mode 100644
index 000000000000..06b2a08dff39
--- /dev/null
+++ b/repair/zoned.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2024 Christoph Hellwig.
+ */
+#include <ctype.h>
+#include <linux/blkzoned.h>
+#include "libxfs_priv.h"
+#include "libxfs.h"
+#include "xfs_zones.h"
+#include "err_protos.h"
+#include "zoned.h"
+
+/* random size that allows efficient processing */
+#define ZONES_PER_IOCTL			16384
+
+static void
+report_zones_cb(
+	struct xfs_mount	*mp,
+	struct blk_zone		*zone)
+{
+	xfs_fsblock_t		zsbno = xfs_daddr_to_rtb(mp, zone->start);
+	xfs_rgblock_t		write_pointer;
+	xfs_rgnumber_t		rgno;
+	struct xfs_rtgroup	*rtg;
+
+	if (xfs_rtb_to_rgbno(mp, zsbno) != 0) {
+		do_error(_("mismatched zone start 0x%llx."),
+				(unsigned long long)zsbno);
+		return;
+	}
+
+	rgno = xfs_rtb_to_rgno(mp, zsbno);
+	rtg = xfs_rtgroup_grab(mp, rgno);
+	if (!rtg) {
+		do_error(_("realtime group not found for zone %u."), rgno);
+		return;
+	}
+
+	if (!rtg_rmap(rtg))
+		do_warn(_("no rmap inode for zone %u."), rgno);
+	else
+		xfs_zone_validate(zone, rtg, &write_pointer);
+	xfs_rtgroup_rele(rtg);
+}
+
+void check_zones(struct xfs_mount *mp)
+{
+	int fd = mp->m_rtdev_targp->bt_bdev_fd;
+	uint64_t sector = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rtstart);
+	unsigned int zone_size, zone_capacity;
+	struct blk_zone_report *rep;
+	unsigned int i, n = 0;
+	uint64_t device_size;
+	size_t rep_size;
+
+	if (ioctl(fd, BLKGETSIZE64, &device_size))
+		return; /* not a block device */
+	if (ioctl(fd, BLKGETZONESZ, &zone_size) || !zone_size)
+		return;	/* not zoned */
+
+	device_size /= 512; /* BLKGETSIZE64 reports a byte value */
+	if (device_size / zone_size < mp->m_sb.sb_rgcount) {
+		do_error(_("rt device too small\n"));
+		return;
+	}
+
+	rep_size = sizeof(struct blk_zone_report) +
+		   sizeof(struct blk_zone) * ZONES_PER_IOCTL;
+	rep = malloc(rep_size);
+	if (!rep) {
+		do_warn(_("malloc failed for zone report\n"));
+		return;
+	}
+
+	while (n < mp->m_sb.sb_rgcount) {
+		struct blk_zone *zones = (struct blk_zone *)(rep + 1);
+		int ret;
+
+		memset(rep, 0, rep_size);
+		rep->sector = sector;
+		rep->nr_zones = ZONES_PER_IOCTL;
+
+		ret = ioctl(fd, BLKREPORTZONE, rep);
+		if (ret) {
+			do_error(_("ioctl(BLKREPORTZONE) failed: %d!\n"), ret);
+			goto out_free;
+		}
+		if (!rep->nr_zones)
+			break;
+
+		for (i = 0; i < rep->nr_zones; i++) {
+			if (n >= mp->m_sb.sb_rgcount)
+				break;
+
+			if (zones[i].len != zone_size) {
+				do_error(_("Inconsistent zone size!\n"));
+				goto out_free;
+			}
+
+			switch (zones[i].type) {
+			case BLK_ZONE_TYPE_CONVENTIONAL:
+			case BLK_ZONE_TYPE_SEQWRITE_REQ:
+				break;
+			case BLK_ZONE_TYPE_SEQWRITE_PREF:
+				do_error(
+_("Found sequential write preferred zone\n"));
+				goto out_free;
+			default:
+				do_error(
+_("Found unknown zone type (0x%x)\n"), zones[i].type);
+				goto out_free;
+			}
+
+			if (!n) {
+				zone_capacity = zones[i].capacity;
+				if (zone_capacity > zone_size) {
+					do_error(
+_("Zone capacity larger than zone size!\n"));
+					goto out_free;
+				}
+			} else if (zones[i].capacity != zone_capacity) {
+				do_error(
+_("Inconsistent zone capacity!\n"));
+				goto out_free;
+			}
+
+			report_zones_cb(mp, &zones[i]);
+			n++;
+		}
+		sector = zones[rep->nr_zones - 1].start +
+			 zones[rep->nr_zones - 1].len;
+	}
+
+out_free:
+	free(rep);
+}
diff --git a/repair/zoned.h b/repair/zoned.h
new file mode 100644
index 000000000000..ab76bf15b3ca
--- /dev/null
+++ b/repair/zoned.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2024 Christoph Hellwig.
+ */
+#ifndef _XFS_REPAIR_ZONED_H_
+#define _XFS_REPAIR_ZONED_H_
+
+void check_zones(struct xfs_mount *mp);
+
+#endif /* _XFS_REPAIR_ZONED_H_ */
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 30/45] xfs_mkfs: support creating zoned file systems
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (28 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 29/45] xfs_repair: validate rt groups vs reported hardware zones Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 18:54   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 31/45] xfs_mkfs: calculate zone overprovisioning when specifying size Christoph Hellwig
                   ` (14 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Default to using all sequential write required zones for the RT device.

Default to 256 and 1% conventional when -r zoned is specified without
further options.  This mimics an SMR HDD and works well with tests.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/init.c     |   2 +-
 mkfs/proto.c      |   3 +-
 mkfs/xfs_mkfs.c   | 553 ++++++++++++++++++++++++++++++++++++++++++----
 repair/agheader.c |   2 +-
 4 files changed, 518 insertions(+), 42 deletions(-)

diff --git a/libxfs/init.c b/libxfs/init.c
index a186369f3fd8..393a94673f7e 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -251,7 +251,7 @@ libxfs_close_devices(
 		libxfs_device_close(&li->data);
 	if (li->log.dev && li->log.dev != li->data.dev)
 		libxfs_device_close(&li->log);
-	if (li->rt.dev)
+	if (li->rt.dev && li->rt.dev != li->data.dev)
 		libxfs_device_close(&li->rt);
 }
 
diff --git a/mkfs/proto.c b/mkfs/proto.c
index 7f56a3d82a06..7f80bef838be 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -1144,7 +1144,8 @@ rtinit_groups(
 				fail(_("rtrmap rtsb init failed"), error);
 		}
 
-		rtfreesp_init(rtg);
+		if (!xfs_has_zoned(mp))
+			rtfreesp_init(rtg);
 	}
 }
 
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 39e3349205fb..133ede8d8483 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -6,6 +6,8 @@
 #include "libfrog/util.h"
 #include "libxfs.h"
 #include <ctype.h>
+#include <linux/blkzoned.h>
+#include "libxfs/xfs_zones.h"
 #include "xfs_multidisk.h"
 #include "libxcmd.h"
 #include "libfrog/fsgeom.h"
@@ -135,6 +137,9 @@ enum {
 	R_RGCOUNT,
 	R_RGSIZE,
 	R_CONCURRENCY,
+	R_ZONED,
+	R_START,
+	R_RESERVED,
 	R_MAX_OPTS,
 };
 
@@ -739,6 +744,9 @@ static struct opt_params ropts = {
 		[R_RGCOUNT] = "rgcount",
 		[R_RGSIZE] = "rgsize",
 		[R_CONCURRENCY] = "concurrency",
+		[R_ZONED] = "zoned",
+		[R_START] = "start",
+		[R_RESERVED] = "reserved",
 		[R_MAX_OPTS] = NULL,
 	},
 	.subopt_params = {
@@ -804,6 +812,28 @@ static struct opt_params ropts = {
 		  .maxval = INT_MAX,
 		  .defaultval = 1,
 		},
+		{ .index = R_ZONED,
+		  .conflicts = { { &ropts, R_EXTSIZE },
+				 { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
+		{ .index = R_START,
+		  .conflicts = { { &ropts, R_DEV },
+				 { NULL, LAST_CONFLICT } },
+		  .convert = true,
+		  .minval = 0,
+		  .maxval = LLONG_MAX,
+		  .defaultval = SUBOPT_NEEDS_VAL,
+		},
+		{ .index = R_RESERVED,
+		  .conflicts = { { NULL, LAST_CONFLICT } },
+		  .convert = true,
+		  .minval = 0,
+		  .maxval = LLONG_MAX,
+		  .defaultval = SUBOPT_NEEDS_VAL,
+		},
 	},
 };
 
@@ -1012,6 +1042,8 @@ struct sb_feat_args {
 	bool	nortalign;
 	bool	nrext64;
 	bool	exchrange;		/* XFS_SB_FEAT_INCOMPAT_EXCHRANGE */
+	bool	zoned;
+	bool	zone_gaps;
 
 	uint16_t qflags;
 };
@@ -1035,6 +1067,8 @@ struct cli_params {
 	char	*lsu;
 	char	*rtextsize;
 	char	*rtsize;
+	char	*rtstart;
+	uint64_t rtreserved;
 
 	/* parameters where 0 is a valid CLI value */
 	int	dsunit;
@@ -1121,6 +1155,8 @@ struct mkfs_params {
 	char		*label;
 
 	struct sb_feat_args	sb_feat;
+	uint64_t	rtstart;
+	uint64_t	rtreserved;
 };
 
 /*
@@ -1172,7 +1208,7 @@ usage( void )
 /* prototype file */	[-p fname]\n\
 /* quiet */		[-q]\n\
 /* realtime subvol */	[-r extsize=num,size=num,rtdev=xxx,rgcount=n,rgsize=n,\n\
-			    concurrency=num]\n\
+			    concurrency=num,zoned=0|1,start=n,reserved=n]\n\
 /* sectorsize */	[-s size=num]\n\
 /* version */		[-V]\n\
 			devicename\n\
@@ -1539,6 +1575,30 @@ discard_blocks(int fd, uint64_t nsectors, int quiet)
 		printf("Done.\n");
 }
 
+static void
+reset_zones(struct mkfs_params *cfg, int fd, uint64_t start_sector,
+		uint64_t nsectors, int quiet)
+{
+	struct blk_zone_range range = {
+		.sector		= start_sector,
+		.nr_sectors	= nsectors,
+	};
+
+	if (!quiet) {
+		printf("Resetting zones...");
+		fflush(stdout);
+	}
+
+	if (ioctl(fd, BLKRESETZONE, &range) < 0) {
+		if (!quiet)
+			printf(" FAILED\n");
+		exit(1);
+	}
+
+	if (!quiet)
+		printf("Done.\n");
+}
+
 static __attribute__((noreturn)) void
 illegal_option(
 	const char		*value,
@@ -2144,6 +2204,15 @@ rtdev_opts_parser(
 	case R_CONCURRENCY:
 		set_rtvol_concurrency(opts, subopt, cli, value);
 		break;
+	case R_ZONED:
+		cli->sb_feat.zoned = getnum(value, opts, subopt);
+		break;
+	case R_START:
+		cli->rtstart = getstr(value, opts, subopt);
+		break;
+	case R_RESERVED:
+		cli->rtreserved = getnum(value, opts, subopt);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -2445,7 +2514,208 @@ _("Version 1 logs do not support sector size %d\n"),
 _("log stripe unit specified, using v2 logs\n"));
 		cli->sb_feat.log_version = 2;
 	}
+}
+
+struct zone_info {
+	/* number of zones, conventional or sequential */
+	unsigned int		nr_zones;
+	/* number of conventional zones */
+	unsigned int		nr_conv_zones;
+
+	/* size of the address space for a zone, in 512b blocks */
+	xfs_daddr_t		zone_size;
+	/* write capacity of a zone, in 512b blocks */
+	xfs_daddr_t		zone_capacity;
+};
 
+struct zone_topology {
+	struct zone_info	data;
+	struct zone_info	rt;
+	struct zone_info	log;
+};
+
+/* random size that allows efficient processing */
+#define ZONES_PER_IOCTL			16384
+
+static int report_zones(const char *name, struct zone_info *zi)
+{
+	struct blk_zone_report *rep;
+	size_t rep_size;
+	struct stat st;
+	unsigned int i, n = 0;
+	uint64_t device_size;
+	uint64_t sector = 0;
+	bool found_seq = false;
+	int ret = 0;
+	int fd;
+
+	fd = open(name, O_RDONLY);
+	if (fd < 0)
+		return -EIO;
+
+	if (fstat(fd, &st) < 0) {
+		ret = -EIO;
+		goto out_close;
+	}
+	if (!S_ISBLK(st.st_mode))
+		goto out_close;
+
+	if (ioctl(fd, BLKGETSIZE64, &device_size)) {
+		ret = -EIO;
+		goto out_close;
+	}
+	if (ioctl(fd, BLKGETZONESZ, &zi->zone_size) || !zi->zone_size)
+		goto out_close; /* not zoned */
+
+	device_size /= 512; /* BLKGETSIZE64 reports a byte value */
+	zi->nr_zones = device_size / zi->zone_size;
+	zi->nr_conv_zones = 0;
+
+	rep_size = sizeof(struct blk_zone_report) +
+		   sizeof(struct blk_zone) * ZONES_PER_IOCTL;
+	rep = malloc(rep_size);
+	if (!rep) {
+		ret = -ENOMEM;
+		goto out_close;
+	}
+
+	while (n < zi->nr_zones) {
+		struct blk_zone *zones = (struct blk_zone *)(rep + 1);
+
+		memset(rep, 0, rep_size);
+		rep->sector = sector;
+		rep->nr_zones = ZONES_PER_IOCTL;
+
+		ret = ioctl(fd, BLKREPORTZONE, rep);
+		if (ret) {
+			fprintf(stderr,
+_("ioctl(BLKREPORTZONE) failed: %d!\n"), ret);
+			goto out_free;
+		}
+		if (!rep->nr_zones)
+			break;
+
+		for (i = 0; i < rep->nr_zones; i++) {
+			if (n >= zi->nr_zones)
+				break;
+
+			if (zones[i].len != zi->zone_size) {
+				fprintf(stderr,
+_("Inconsistent zone size!\n"));
+				ret = -EIO;
+				goto out_free;
+			}
+
+			switch (zones[i].type) {
+			case BLK_ZONE_TYPE_CONVENTIONAL:
+				/*
+				 * We can only use the conventional space at the
+				 * start of the device for metadata, so don't
+				 * count later conventional zones.  This is
+				 * not an error because we can use them for data
+				 * just fine.
+				 */
+				if (!found_seq)
+					zi->nr_conv_zones++;
+				break;
+			case BLK_ZONE_TYPE_SEQWRITE_REQ:
+				found_seq = true;
+				break;
+			case BLK_ZONE_TYPE_SEQWRITE_PREF:
+				fprintf(stderr,
+_("Sequential write preferred zones not supported.\n"));
+				ret = -EIO;
+				goto out_free;
+			default:
+				fprintf(stderr,
+_("Unknown zone type (0x%x) found.\n"), zones[i].type);
+				ret = -EIO;
+				goto out_free;
+			}
+
+			if (!n) {
+				zi->zone_capacity = zones[i].capacity;
+				if (zi->zone_capacity > zi->zone_size) {
+					fprintf(stderr,
+_("Zone capacity larger than zone size!\n"));
+					ret = -EIO;
+					goto out_free;
+				}
+			} else if (zones[i].capacity != zi->zone_capacity) {
+				fprintf(stderr,
+_("Inconsistent zone capacity!\n"));
+				ret = -EIO;
+				goto out_free;
+			}
+
+			n++;
+		}
+		sector = zones[rep->nr_zones - 1].start +
+			 zones[rep->nr_zones - 1].len;
+	}
+
+out_free:
+	free(rep);
+out_close:
+	close(fd);
+	return ret;
+}
+
+static void
+validate_zoned(
+	struct mkfs_params	*cfg,
+	struct cli_params	*cli,
+	struct mkfs_default_params *dft,
+	struct zone_topology	*zt)
+{
+	if (!cli->xi->data.isfile) {
+		report_zones(cli->xi->data.name, &zt->data);
+		if (zt->data.nr_zones) {
+			if (!zt->data.nr_conv_zones) {
+				fprintf(stderr,
+_("Data device requires conventional zones.\n"));
+				usage();
+			}
+			if (zt->data.zone_capacity != zt->data.zone_size) {
+				fprintf(stderr,
+_("Zone capacity equal to Zone size required for conventional zones.\n"));
+				usage();
+			}
+
+			cli->sb_feat.zoned = true;
+			cfg->rtstart =
+				zt->data.nr_conv_zones * zt->data.zone_capacity;
+		}
+	}
+
+	if (cli->xi->rt.name && !cli->xi->rt.isfile) {
+		report_zones(cli->xi->rt.name, &zt->rt);
+		if (zt->rt.nr_zones && !cli->sb_feat.zoned)
+			cli->sb_feat.zoned = true;
+		if (zt->rt.zone_size != zt->rt.zone_capacity)
+			cli->sb_feat.zone_gaps = true;
+	}
+
+	if (cli->xi->log.name && !cli->xi->log.isfile) {
+		report_zones(cli->xi->log.name, &zt->log);
+		if (zt->log.nr_zones) {
+			fprintf(stderr,
+_("Zoned devices not supported as log device!\n"));
+			usage();
+		}
+	}
+
+	if (cli->rtstart) {
+		if (cfg->rtstart) {
+			fprintf(stderr,
+_("rtstart override not allowed on zoned devices.\n"));
+			usage();
+		}
+		cfg->rtstart = getnum(cli->rtstart, &ropts, R_START) / 512;
+	}
+
+	if (cli->rtreserved)
+		cfg->rtreserved = cli->rtreserved;
 }
 
 /*
@@ -2670,7 +2940,37 @@ _("inode btree counters not supported without finobt support\n"));
 		cli->sb_feat.inobtcnt = false;
 	}
 
-	if (cli->xi->rt.name) {
+	if (cli->sb_feat.zoned) {
+		if (!cli->sb_feat.metadir) {
+			if (cli_opt_set(&mopts, M_METADIR)) {
+				fprintf(stderr,
+_("zoned realtime device not supported without metadir support\n"));
+				usage();
+			}
+			cli->sb_feat.metadir = true;
+		}
+		if (cli->rtextsize) {
+			if (cli_opt_set(&ropts, R_EXTSIZE)) {
+				fprintf(stderr,
+_("rt extent size not supported on realtime devices with zoned mode\n"));
+				usage();
+			}
+			cli->rtextsize = 0;
+		}
+	} else {
+		if (cli->rtstart) {
+			fprintf(stderr,
+_("internal RT section only supported in zoned mode\n"));
+			usage();
+		}
+		if (cli->rtreserved) {
+			fprintf(stderr,
+_("reserved RT blocks only supported in zoned mode\n"));
+			usage();
+		}
+	}
+
+	if (cli->xi->rt.name || cfg->rtstart) {
 		if (cli->rtextsize && cli->sb_feat.reflink) {
 			if (cli_opt_set(&mopts, M_REFLINK)) {
 				fprintf(stderr,
@@ -2911,6 +3211,11 @@ validate_rtextsize(
 			usage();
 		}
 		cfg->rtextblocks = 1;
+	} else if (cli->sb_feat.zoned) {
+		/*
+		 * Zoned mode only supports a rtextsize of 1.
+		 */
+		cfg->rtextblocks = 1;
 	} else {
 		/*
 		 * If realtime extsize has not been specified by the user,
@@ -3315,7 +3620,8 @@ _("log stripe unit (%d bytes) is too large (maximum is 256KiB)\n"
 static void
 open_devices(
 	struct mkfs_params	*cfg,
-	struct libxfs_init	*xi)
+	struct libxfs_init	*xi,
+	struct zone_topology	*zt)
 {
 	uint64_t		sector_mask;
 
@@ -3330,6 +3636,34 @@ open_devices(
 		usage();
 	}
 
+	if (zt->data.nr_zones) {
+		zt->rt.zone_size = zt->data.zone_size;
+		zt->rt.zone_capacity = zt->data.zone_capacity;
+		zt->rt.nr_zones = zt->data.nr_zones - zt->data.nr_conv_zones;
+	} else if (cfg->sb_feat.zoned && !cfg->rtstart && !xi->rt.dev) {
+		/*
+		 * By default reserve 1% of the total capacity (rounded up to
+		 * the next power of two) for metadata, but match the minimum we
+		 * enforce elsewhere. This matches what SMR HDDs provide.
+		 */
+		uint64_t rt_target_size = max((xi->data.size + 99) / 100,
+					      BTOBB(300 * 1024 * 1024));
+
+		cfg->rtstart = 1;
+		while (cfg->rtstart < rt_target_size)
+			cfg->rtstart <<= 1;
+	}
+
+	if (cfg->rtstart) {
+		if (cfg->rtstart >= xi->data.size) {
+			fprintf(stderr,
+_("device size %lld too small for zoned allocator\n"), xi->data.size);
+			usage();
+		}
+		xi->rt.size = xi->data.size - cfg->rtstart;
+		xi->data.size = cfg->rtstart;
+	}
+
 	/*
 	 * Ok, Linux only has a 1024-byte resolution on device _size_,
 	 * and the sizes below are in basic 512-byte blocks,
@@ -3348,17 +3682,42 @@ open_devices(
 
 static void
 discard_devices(
+	struct mkfs_params	*cfg,
 	struct libxfs_init	*xi,
+	struct zone_topology	*zt,
 	int			quiet)
 {
 	/*
 	 * This function has to be called after libxfs has been initialized.
 	 */
 
-	if (!xi->data.isfile)
-		discard_blocks(xi->data.fd, xi->data.size, quiet);
-	if (xi->rt.dev && !xi->rt.isfile)
-		discard_blocks(xi->rt.fd, xi->rt.size, quiet);
+	if (!xi->data.isfile) {
+		uint64_t	nsectors = xi->data.size;
+
+		if (cfg->rtstart && zt->data.nr_zones) {
+			/*
+			 * Note that the zone reset here includes the LBA range
+			 * for the data device.
+			 *
+			 * This is because doing a single zone reset all on the
+			 * entire device (which the kernel automatically does
+			 * for us for a full device range) is a lot faster than
+			 * resetting each zone individually and resetting
+			 * the conventional zones used for the data device is a
+			 * no-op.
+			 */
+			reset_zones(cfg, xi->data.fd, 0,
+					cfg->rtstart + xi->rt.size, quiet);
+			nsectors -= cfg->rtstart;
+		}
+		discard_blocks(xi->data.fd, nsectors, quiet);
+	}
+	if (xi->rt.dev && !xi->rt.isfile) {
+		if (zt->rt.nr_zones)
+			reset_zones(cfg, xi->rt.fd, 0, xi->rt.size, quiet);
+		else
+			discard_blocks(xi->rt.fd, xi->rt.size, quiet);
+	}
 	if (xi->log.dev && xi->log.dev != xi->data.dev && !xi->log.isfile)
 		discard_blocks(xi->log.fd, xi->log.size, quiet);
 }
@@ -3477,11 +3836,12 @@ reported by the device (%u).\n"),
 static void
 validate_rtdev(
 	struct mkfs_params	*cfg,
-	struct cli_params	*cli)
+	struct cli_params	*cli,
+	struct zone_topology	*zt)
 {
 	struct libxfs_init	*xi = cli->xi;
 
-	if (!xi->rt.dev) {
+	if (!xi->rt.dev && !cfg->rtstart) {
 		if (cli->rtsize) {
 			fprintf(stderr,
 _("size specified for non-existent rt subvolume\n"));
@@ -3501,7 +3861,7 @@ _("size specified for non-existent rt subvolume\n"));
 	if (cli->rtsize) {
 		if (cfg->rtblocks > DTOBT(xi->rt.size, cfg->blocklog)) {
 			fprintf(stderr,
-_("size %s specified for rt subvolume is too large, maxi->um is %lld blocks\n"),
+_("size %s specified for rt subvolume is too large, maximum is %lld blocks\n"),
 				cli->rtsize,
 				(long long)DTOBT(xi->rt.size, cfg->blocklog));
 			usage();
@@ -3512,6 +3872,9 @@ _("size %s specified for rt subvolume is too large, maxi->um is %lld blocks\n"),
 reported by the device (%u).\n"),
 				cfg->sectorsize, xi->rt.bsize);
 		}
+	} else if (zt->rt.nr_zones) {
+		cfg->rtblocks = DTOBT(zt->rt.nr_zones * zt->rt.zone_capacity,
+				      cfg->blocklog);
 	} else {
 		/* grab volume size */
 		cfg->rtblocks = DTOBT(xi->rt.size, cfg->blocklog);
@@ -3950,6 +4313,42 @@ out:
 	cfg->rgcount = howmany(cfg->rtblocks, cfg->rgsize);
 }
 
+static void
+validate_rtgroup_geometry(
+	struct mkfs_params	*cfg)
+{
+	if (cfg->rgsize > XFS_MAX_RGBLOCKS) {
+		fprintf(stderr,
+_("realtime group size (%llu) must be less than the maximum (%u)\n"),
+				(unsigned long long)cfg->rgsize,
+				XFS_MAX_RGBLOCKS);
+		usage();
+	}
+
+	if (cfg->rgsize % cfg->rtextblocks != 0) {
+		fprintf(stderr,
+_("realtime group size (%llu) not a multiple of rt extent size (%llu)\n"),
+				(unsigned long long)cfg->rgsize,
+				(unsigned long long)cfg->rtextblocks);
+		usage();
+	}
+
+	if (cfg->rgsize <= cfg->rtextblocks) {
+		fprintf(stderr,
+_("realtime group size (%llu) must be at least two realtime extents\n"),
+				(unsigned long long)cfg->rgsize);
+		usage();
+	}
+
+	if (cfg->rgcount > XFS_MAX_RGNUMBER) {
+		fprintf(stderr,
+_("realtime group count (%llu) must be less than the maximum (%u)\n"),
+				(unsigned long long)cfg->rgcount,
+				XFS_MAX_RGNUMBER);
+		usage();
+	}
+}
+
 static void
 calculate_rtgroup_geometry(
 	struct mkfs_params	*cfg,
@@ -4007,40 +4406,97 @@ _("rgsize (%s) not a multiple of fs blk size (%d)\n"),
 				(cfg->rtblocks % cfg->rgsize != 0);
 	}
 
-	if (cfg->rgsize > XFS_MAX_RGBLOCKS) {
-		fprintf(stderr,
-_("realtime group size (%llu) must be less than the maximum (%u)\n"),
-				(unsigned long long)cfg->rgsize,
-				XFS_MAX_RGBLOCKS);
-		usage();
-	}
+	validate_rtgroup_geometry(cfg);
 
-	if (cfg->rgsize % cfg->rtextblocks != 0) {
+	if (cfg->rtextents)
+		cfg->rtbmblocks = howmany(cfg->rgsize / cfg->rtextblocks,
+			NBBY * (cfg->blocksize - sizeof(struct xfs_rtbuf_blkinfo)));
+}
+
+static void
+calculate_zone_geometry(
+	struct mkfs_params	*cfg,
+	struct cli_params	*cli,
+	struct libxfs_init	*xi,
+	struct zone_topology	*zt)
+{
+	if (cfg->rtblocks == 0) {
 		fprintf(stderr,
-_("realtime group size (%llu) not a multiple of rt extent size (%llu)\n"),
-				(unsigned long long)cfg->rgsize,
-				(unsigned long long)cfg->rtextblocks);
+_("empty zoned realtime device not supported.\n"));
 		usage();
 	}
 
-	if (cfg->rgsize <= cfg->rtextblocks) {
-		fprintf(stderr,
-_("realtime group size (%llu) must be at least two realtime extents\n"),
-				(unsigned long long)cfg->rgsize);
-		usage();
+	if (zt->rt.nr_zones) {
+		/* The RT device has hardware zones */
+		cfg->rgsize = zt->rt.zone_capacity * 512;
+
+		if (cfg->rgsize % cfg->blocksize) {
+			fprintf(stderr,
+_("rgsize (%s) not a multiple of fs blk size (%d)\n"),
+				cli->rgsize, cfg->blocksize);
+			usage();
+		}
+		if (cli->rgsize) {
+			fprintf(stderr,
+_("rgsize (%s) may not be specified when the rt device is zoned\n"),
+				cli->rgsize);
+			usage();
+		}
+
+		cfg->rgsize /= cfg->blocksize;
+		cfg->rgcount = howmany(cfg->rtblocks, cfg->rgsize);
+
+		if (cli->rgcount > cfg->rgcount) {
+			fprintf(stderr,
+_("rgcount (%llu) is larger than hardware zone count (%llu)\n"),
+					(unsigned long long)cli->rgcount,
+					(unsigned long long)cfg->rgcount);
+			usage();
+		} else if (cli->rgcount && cli->rgcount < cfg->rgcount) {
+			/* constrain the rt device to the given rgcount */
+			cfg->rgcount = cli->rgcount;
+		}
+	} else {
+		/* No hardware zones */
+		if (cli->rgsize) {
+			/* User-specified rtgroup size */
+			cfg->rgsize = getnum(cli->rgsize, &ropts, R_RGSIZE);
+
+			/* Check specified rgsize is a multiple of blocksize. */
+			if (cfg->rgsize % cfg->blocksize) {
+				fprintf(stderr,
+_("rgsize (%s) not a multiple of fs blk size (%d)\n"),
+					cli->rgsize, cfg->blocksize);
+				usage();
+			}
+			cfg->rgsize /= cfg->blocksize;
+			cfg->rgcount = cfg->rtblocks / cfg->rgsize +
+					(cfg->rtblocks % cfg->rgsize != 0);
+		} else if (cli->rgcount) {
+			/* User-specified rtgroup count */
+			cfg->rgcount = cli->rgcount;
+			cfg->rgsize = cfg->rtblocks / cfg->rgcount +
+					(cfg->rtblocks % cfg->rgcount != 0);
+		} else {
+			/* 256MB zones just like typical SMR HDDs */
+			cfg->rgsize = MEGABYTES(256, cfg->blocklog);
+			cfg->rgcount = cfg->rtblocks / cfg->rgsize +
+					(cfg->rtblocks % cfg->rgsize != 0);
+		}
 	}
 
-	if (cfg->rgcount > XFS_MAX_RGNUMBER) {
+	if (cfg->rgcount < XFS_MIN_ZONES)  {
 		fprintf(stderr,
-_("realtime group count (%llu) must be less than the maximum (%u)\n"),
+_("realtime group count (%llu) must be greater than the minimum zone count (%u)\n"),
 				(unsigned long long)cfg->rgcount,
-				XFS_MAX_RGNUMBER);
+				XFS_MIN_ZONES);
 		usage();
 	}
 
-	if (cfg->rtextents)
-		cfg->rtbmblocks = howmany(cfg->rgsize / cfg->rtextblocks,
-			NBBY * (cfg->blocksize - sizeof(struct xfs_rtbuf_blkinfo)));
+	validate_rtgroup_geometry(cfg);
+
+	/* Zoned RT devices don't use the rtbitmap, and have no bitmap blocks */
+	cfg->rtbmblocks = 0;
 }
 
 static void
@@ -4206,6 +4662,14 @@ sb_set_features(
 		sbp->sb_rgblklog = libxfs_compute_rgblklog(sbp->sb_rgextents,
 							   cfg->rtextblocks);
 	}
+
+	if (fp->zoned) {
+		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_ZONED;
+		sbp->sb_rtstart = (cfg->rtstart * 512) / cfg->blocksize;
+		sbp->sb_rtreserved = cfg->rtreserved / cfg->blocksize;
+	}
+	if (fp->zone_gaps)
+		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_ZONE_GAPS;
 }
 
 /*
@@ -4768,9 +5232,11 @@ prepare_devices(
 			 (xfs_extlen_t)XFS_FSB_TO_BB(mp, cfg->logblocks),
 			 &sbp->sb_uuid, cfg->sb_feat.log_version,
 			 lsunit, XLOG_FMT, XLOG_INIT_CYCLE, false);
-
 	/* finally, check we can write the last block in the realtime area */
-	if (mp->m_rtdev_targp->bt_bdev && cfg->rtblocks > 0) {
+	if (mp->m_rtdev_targp->bt_bdev &&
+	    mp->m_rtdev_targp != mp->m_ddev_targp &&
+	    cfg->rtblocks > 0 &&
+	    !xfs_has_zoned(mp)) {
 		buf = alloc_write_buf(mp->m_rtdev_targp,
 				XFS_FSB_TO_BB(mp, cfg->rtblocks - 1LL),
 				BTOBB(cfg->blocksize));
@@ -5209,7 +5675,7 @@ main(
 			 */
 		},
 	};
-
+	struct zone_topology zt = {};
 	struct list_head	buffer_list;
 	int			error;
 
@@ -5311,6 +5777,7 @@ main(
 	sectorsize = cfg.sectorsize;
 
 	validate_log_sectorsize(&cfg, &cli, &dft, &ft);
+	validate_zoned(&cfg, &cli, &dft, &zt);
 	validate_sb_features(&cfg, &cli);
 
 	/*
@@ -5335,11 +5802,11 @@ main(
 	/*
 	 * Open and validate the device configurations
 	 */
-	open_devices(&cfg, &xi);
+	open_devices(&cfg, &xi, &zt);
 	validate_overwrite(xi.data.name, force_overwrite);
 	validate_datadev(&cfg, &cli);
 	validate_logdev(&cfg, &cli);
-	validate_rtdev(&cfg, &cli);
+	validate_rtdev(&cfg, &cli, &zt);
 	calc_stripe_factors(&cfg, &cli, &ft);
 
 	/*
@@ -5350,7 +5817,10 @@ main(
 	 */
 	calculate_initial_ag_geometry(&cfg, &cli, &xi);
 	align_ag_geometry(&cfg);
-	calculate_rtgroup_geometry(&cfg, &cli, &xi);
+	if (cfg.sb_feat.zoned)
+		calculate_zone_geometry(&cfg, &cli, &xi, &zt);
+	else
+		calculate_rtgroup_geometry(&cfg, &cli, &xi);
 
 	calculate_imaxpct(&cfg, &cli);
 
@@ -5403,8 +5873,13 @@ main(
 	/*
 	 * All values have been validated, discard the old device layout.
 	 */
+	if (cli.sb_feat.zoned && !discard) {
+		fprintf(stderr,
+_("-K not supported for zoned file systems.\n"));
+		return 1;
+	}
 	if (discard && !dry_run)
-		discard_devices(&xi, quiet);
+		discard_devices(&cfg, &xi, &zt, quiet);
 
 	/*
 	 * we need the libxfs buffer cache from here on in.
diff --git a/repair/agheader.c b/repair/agheader.c
index 5bb4e47e0c5b..048e6c3143b5 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -486,7 +486,7 @@ secondary_sb_whack(
 	 * size is the size of data which is valid for this sb.
 	 */
 	if (xfs_sb_version_haszoned(sb))
-		size = offsetofend(struct xfs_dsb, sb_rtstart);
+		size = offsetofend(struct xfs_dsb, sb_rtreserved);
 	else if (xfs_sb_version_hasmetadir(sb))
 		size = offsetofend(struct xfs_dsb, sb_pad);
 	else if (xfs_sb_version_hasmetauuid(sb))
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 31/45] xfs_mkfs: calculate zone overprovisioning when specifying size
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (29 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 30/45] xfs_mkfs: support creating zoned file systems Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 19:06   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 32/45] xfs_mkfs: default to rtinherit=1 for zoned file systems Christoph Hellwig
                   ` (13 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

When size is specified for zoned file systems, calculate the required
overprovisioning to back the requested capacity.
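
A simplified sketch of the arithmetic, working in file system blocks.  The
struct, the function name and the spare_zones parameter are assumptions made
for the example (the patch uses XFS_RESERVED_ZONES and accounts the slack in
bytes), so treat it as an illustration rather than the implementation.

```c
#include <assert.h>
#include <stdint.h>

struct zone_plan {
	uint64_t rgcount;	/* zones backing the file system */
	uint64_t rtblocks;	/* total RT blocks including spare zones */
	uint64_t reserved;	/* slack blocks hidden from the user */
};

/*
 * Hypothetical helper: back want_blocks of user-visible space with zones
 * of rgsize blocks, adding spare_zones for overprovisioning and clamping
 * to the max_zones that the device can provide.
 */
struct zone_plan
plan_zones(uint64_t want_blocks, uint64_t rgsize, uint64_t spare_zones,
	   uint64_t max_zones)
{
	struct zone_plan p;

	assert(rgsize > 0);

	p.rgcount = (want_blocks + rgsize - 1) / rgsize + spare_zones;
	if (p.rgcount > max_zones)
		p.rgcount = max_zones;	/* writable size will be smaller */
	p.rtblocks = p.rgcount * rgsize;

	/* the unused tail of the last data zone becomes reserved space */
	if (p.rtblocks >= want_blocks)
		p.reserved = (p.rtblocks - want_blocks) % rgsize;
	else
		p.reserved = 0;
	return p;
}
```

For example, asking for 1000 blocks with 256-block zones and 3 spare zones
yields 7 zones, 1792 blocks in total and 24 reserved blocks, so exactly 1000
blocks remain visible once the spare zones are set aside.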

Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mkfs/xfs_mkfs.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 133ede8d8483..6b5eb9eb140a 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -4413,6 +4413,49 @@ _("rgsize (%s) not a multiple of fs blk size (%d)\n"),
 			NBBY * (cfg->blocksize - sizeof(struct xfs_rtbuf_blkinfo)));
 }
 
+/*
+ * If we're creating a zoned filesystem and the user specified a size, add
+ * enough over-provisioning to be able to back the requested amount of
+ * writable space.
+ */
+static void
+adjust_nr_zones(
+	struct mkfs_params	*cfg,
+	struct cli_params	*cli,
+	struct libxfs_init	*xi,
+	struct zone_topology	*zt)
+{
+	uint64_t		new_rtblocks, slack;
+	unsigned int		max_zones;
+
+	if (zt->rt.nr_zones)
+		max_zones = zt->rt.nr_zones;
+	else
+		max_zones = DTOBT(xi->rt.size, cfg->blocklog) / cfg->rgsize;
+
+	if (!cli->rgcount)
+		cfg->rgcount += XFS_RESERVED_ZONES;
+	if (cfg->rgcount > max_zones) {
+		cfg->rgcount = max_zones;
+		fprintf(stderr,
+_("Warning: not enough zones for backing requested rt size due to\n"
+  "over-provisioning needs, writeable size will be less than %s\n"),
+			cli->rtsize);
+	}
+	new_rtblocks = (cfg->rgcount * cfg->rgsize);
+	slack = (new_rtblocks - cfg->rtblocks) % cfg->rgsize;
+
+	cfg->rtblocks = new_rtblocks;
+	cfg->rtextents = cfg->rtblocks / cfg->rtextblocks;
+
+	/*
+	 * Add the slack at the end of the last zone to the reserved blocks.
+	 * This ensures the visible user capacity is exactly the one that the
+	 * user asked for.
+	 */
+	cfg->rtreserved += (slack * cfg->blocksize);
+}
+
 static void
 calculate_zone_geometry(
 	struct mkfs_params	*cfg,
@@ -4485,6 +4528,9 @@ _("rgsize (%s) not a multiple of fs blk size (%d)\n"),
 		}
 	}
 
+	if (cli->rtsize || cli->rgcount)
+		adjust_nr_zones(cfg, cli, xi, zt);
+
 	if (cfg->rgcount < XFS_MIN_ZONES)  {
 		fprintf(stderr,
 _("realtime group count (%llu) must be greater than the minimum zone count (%u)\n"),
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 32/45] xfs_mkfs: default to rtinherit=1 for zoned file systems
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (30 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 31/45] xfs_mkfs: calculate zone overprovisioning when specifying size Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 18:59   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 33/45] xfs_mkfs: reflink conflicts with zoned file systems for now Christoph Hellwig
                   ` (12 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Zoned file systems are intended to use sequential write required zones
(or areas treated as such) for data, and the main data device only for
metadata.  rtinherit=1 is the way to achieve that, so enable it by
default.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mkfs/xfs_mkfs.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 6b5eb9eb140a..7d4114e8a2ea 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2957,6 +2957,13 @@ _("rt extent size not supported on realtime devices with zoned mode\n"));
 			}
 			cli->rtextsize = 0;
 		}
+
+		/*
+		 * Force the rtinherit flag on the root inode for zoned file
+		 * systems as they use the data device only as a metadata
+		 * container.
+		 */
+		cli->fsx.fsx_xflags |= FS_XFLAG_RTINHERIT;
 	} else {
 		if (cli->rtstart) {
 			fprintf(stderr,
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 33/45] xfs_mkfs: reflink conflicts with zoned file systems for now
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (31 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 32/45] xfs_mkfs: default to rtinherit=1 for zoned file systems Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 19:00   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 34/45] xfs_mkfs: document the new zoned options in the man page Christoph Hellwig
                   ` (11 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Until GC is enhanced to not unshare reflinked blocks, we had better
prohibit this combination.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mkfs/xfs_mkfs.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 7d4114e8a2ea..0e351f9efc32 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2957,6 +2957,14 @@ _("rt extent size not supported on realtime devices with zoned mode\n"));
 			}
 			cli->rtextsize = 0;
 		}
+		if (cli->sb_feat.reflink) {
+			if (cli_opt_set(&mopts, M_REFLINK)) {
+				fprintf(stderr,
+_("reflink not supported on realtime devices with zoned mode specified\n"));
+				usage();
+			}
+			cli->sb_feat.reflink = false;
+		}
 
 		/*
 		 * Force the rtinherit flag on the root inode for zoned file
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 34/45] xfs_mkfs: document the new zoned options in the man page
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (32 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 33/45] xfs_mkfs: reflink conflicts with zoned file systems for now Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 19:00   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 35/45] libfrog: report the zoned geometry Christoph Hellwig
                   ` (10 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Add documentation for the zoned file system specific options.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 man/man8/mkfs.xfs.8.in | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index 37e3a88e7ac7..27df7f4546c7 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -1248,6 +1248,23 @@ The magic value of
 .I 0
 forces use of the older rtgroups geometry calculations that is used for
 mechanical storage.
+.TP
+.BI zoned= value
+Controls if the zoned allocator is used for the realtime device.
+The value is either 0 to disable the feature, or 1 to enable it.
+Defaults to 1 for zoned block devices, else 0.
+.TP
+.BI start= value
+Controls the start of the internal realtime section.  Defaults to 0
+for conventional block devices, or the start of the first sequential
+write required zone for zoned block devices.
+This option is only valid if the zoned realtime allocator is used.
+.TP
+.BI reserved= value
+Controls the amount of space in the realtime section that is reserved for
+internal use by garbage collection and reorganization algorithms.
+Defaults to 0 if not set.
+This option is only valid if the zoned realtime allocator is used.
 .RE
 .PP
 .PD 0
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 35/45] libfrog: report the zoned geometry
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (33 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 34/45] xfs_mkfs: document the new zoned options in the man page Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 19:01   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 36/45] man: document XFS_FSOP_GEOM_FLAGS_ZONED Christoph Hellwig
                   ` (9 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Also fix up the output to report all the zoned information on a separate
line, which also helps with alignment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/fsgeom.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
index 5c4ba29ca9ac..b4107b133861 100644
--- a/libfrog/fsgeom.c
+++ b/libfrog/fsgeom.c
@@ -70,7 +70,8 @@ xfs_report_geom(
 "log      =%-22s bsize=%-6d blocks=%u, version=%d\n"
 "         =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n"
 "realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n"
-"         =%-22s rgcount=%-4d rgsize=%u extents, zoned=%d\n"),
+"         =%-22s rgcount=%-4d rgsize=%u extents\n"
+"         =%-22s zoned=%-6d start=%llu reserved=%llu\n"),
 		mntpoint, geo->inodesize, geo->agcount, geo->agblocks,
 		"", geo->sectsize, attrversion, projid32bit,
 		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
@@ -86,7 +87,8 @@ xfs_report_geom(
 		!geo->rtblocks ? _("none") : rtname ? rtname : _("internal"),
 		geo->rtextsize * geo->blocksize, (unsigned long long)geo->rtblocks,
 			(unsigned long long)geo->rtextents,
-		"", geo->rgcount, geo->rgextents, zoned);
+		"", geo->rgcount, geo->rgextents,
+		"", zoned, geo->rtstart, geo->rtreserved);
 }
 
 /* Try to obtain the xfs geometry.  On error returns a negative error code. */
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 36/45] man: document XFS_FSOP_GEOM_FLAGS_ZONED
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (34 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 35/45] libfrog: report the zoned geometry Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 19:13   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 37/45] xfs_io: correctly report RGs with internal rt dev in bmap output Christoph Hellwig
                   ` (8 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Document the new zoned feature flag and the two new fields added
with it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 man/man2/ioctl_xfs_fsgeometry.2 | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/man/man2/ioctl_xfs_fsgeometry.2 b/man/man2/ioctl_xfs_fsgeometry.2
index 502054f391e9..df7234e603d2 100644
--- a/man/man2/ioctl_xfs_fsgeometry.2
+++ b/man/man2/ioctl_xfs_fsgeometry.2
@@ -50,7 +50,9 @@ struct xfs_fsop_geom {
 	__u32         sick;
 	__u32         checked;
 	__u64         rgextents;
-	__u64         reserved[16];
+	__u64         rtstart;
+	__u64         rtreserved;
+	__u64         reserved[14];
 };
 .fi
 .in
@@ -143,6 +145,20 @@ for more details.
 .I rgextents
 Is the number of RT extents in each rtgroup.
 .PP
+.I rtstart
+Start of the internal RT device in fsblocks.  0 if an external RT device
+is used.
+This field is meaningful only if the flag
+.B  XFS_FSOP_GEOM_FLAGS_ZONED
+is set.
+.PP
+.I rtreserved
+The amount of space in the realtime section that is reserved for internal use
+by garbage collection and reorganization algorithms in fsblocks.
+This field is meaningful only if the flag
+.B  XFS_FSOP_GEOM_FLAGS_ZONED
+is set.
+.PP
 .I reserved
 is set to zero.
 .SH FILESYSTEM FEATURE FLAGS
@@ -221,6 +237,9 @@ Filesystem can exchange file contents atomically via XFS_IOC_EXCHANGE_RANGE.
 .TP
 .B XFS_FSOP_GEOM_FLAGS_METADIR
 Filesystem contains a metadata directory tree.
+.TP
+.B XFS_FSOP_GEOM_FLAGS_ZONED
+Filesystem uses the zoned allocator for the RT device.
 .RE
 .SH XFS METADATA HEALTH REPORTING
 .PP
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 37/45] xfs_io: correctly report RGs with internal rt dev in bmap output
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (35 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 36/45] man: document XFS_FSOP_GEOM_FLAGS_ZONED Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 22:22   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 38/45] xfs_io: don't re-query fs_path information in fsmap_f Christoph Hellwig
                   ` (7 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Apply the proper rtstart offset when calculating RG numbers.  Somehow this
made gcc complain about possibly overflowing abuf, so increase its size as
well.
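
As a rough illustration of the offset handling described above, here is a
minimal sketch.  The helper bb_to_rg_loc() and struct rg_loc are hypothetical
names made up for this example and are not part of xfs_io; the arithmetic
mirrors the bstart/bno computation in the diff below, with all positions in
512-byte basic blocks.

```c
#include <stdint.h>

#define BBSIZE	512	/* basic block size in bytes, as used by xfsprogs */

/* Hypothetical result type for this sketch only. */
struct rg_loc {
	int64_t	rgno;	/* rtgroup number */
	int64_t	rgoff;	/* offset into the rtgroup, in basic blocks */
};

/*
 * Map a physical block address (in basic blocks) to an rtgroup number and
 * an offset within it.  rtstart is in fs blocks and is 0 when the RT
 * device is external, in which case no offset is applied.
 */
static struct rg_loc
bb_to_rg_loc(int64_t bmv_block, uint64_t rtstart, uint32_t blocksize,
	     int64_t bbperrg)
{
	int64_t		bstart = (int64_t)rtstart * (blocksize / BBSIZE);
	int64_t		bno = bmv_block - bstart;
	struct rg_loc	loc = { bno / bbperrg, bno % bbperrg };

	return loc;
}
```

With rtstart = 0 this degenerates to the old behaviour, which is why the
data device branch can simply set bstart to 0.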

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 io/bmap.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/io/bmap.c b/io/bmap.c
index b2f6b4905285..944f658b35f0 100644
--- a/io/bmap.c
+++ b/io/bmap.c
@@ -257,18 +257,21 @@ bmap_f(
 #define	FLG_BSW		0000010	/* Not on begin of stripe width */
 #define	FLG_ESW		0000001	/* Not on end   of stripe width */
 		int	agno;
-		off_t	agoff, bbperag;
+		off_t	agoff, bbperag, bstart;
 		int	foff_w, boff_w, aoff_w, tot_w, agno_w;
-		char	rbuf[32], bbuf[32], abuf[32];
+		char	rbuf[32], bbuf[32], abuf[64];
 		int	sunit, swidth;
 
 		foff_w = boff_w = aoff_w = MINRANGE_WIDTH;
 		tot_w = MINTOT_WIDTH;
 		if (is_rt) {
+			bstart = fsgeo.rtstart *
+				(fsgeo.blocksize / BBSIZE);
 			bbperag = bytes_per_rtgroup(&fsgeo) / BBSIZE;
 			sunit = 0;
 			swidth = 0;
 		} else {
+			bstart = 0;
 			bbperag = (off_t)fsgeo.agblocks *
 				  (off_t)fsgeo.blocksize / BBSIZE;
 			sunit = (fsgeo.sunit * fsgeo.blocksize) / BBSIZE;
@@ -298,9 +301,11 @@ bmap_f(
 						map[i + 1].bmv_length - 1LL));
 				boff_w = max(boff_w, strlen(bbuf));
 				if (bbperag > 0) {
-					agno = map[i + 1].bmv_block / bbperag;
-					agoff = map[i + 1].bmv_block -
-							(agno * bbperag);
+					off_t bno;
+
+					bno = map[i + 1].bmv_block - bstart;
+					agno = bno / bbperag;
+					agoff = bno % bbperag;
 					snprintf(abuf, sizeof(abuf),
 						"(%lld..%lld)",
 						(long long)agoff,
@@ -387,9 +392,11 @@ bmap_f(
 				printf("%4d: %-*s %-*s", i, foff_w, rbuf,
 					boff_w, bbuf);
 				if (bbperag > 0) {
-					agno = map[i + 1].bmv_block / bbperag;
-					agoff = map[i + 1].bmv_block -
-							(agno * bbperag);
+					off_t bno;
+
+					bno = map[i + 1].bmv_block - bstart;
+					agno = bno / bbperag;
+					agoff = bno % bbperag;
 					snprintf(abuf, sizeof(abuf),
 						"(%lld..%lld)",
 						(long long)agoff,
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 38/45] xfs_io: don't re-query fs_path information in fsmap_f
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (36 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 37/45] xfs_io: correctly report RGs with internal rt dev in bmap output Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 20:48   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 39/45] xfs_io: don't re-query geometry " Christoph Hellwig
                   ` (6 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Reuse the information stashed in "file" instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 io/fsmap.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/io/fsmap.c b/io/fsmap.c
index 6de720f238bb..6a87e8972f26 100644
--- a/io/fsmap.c
+++ b/io/fsmap.c
@@ -14,6 +14,7 @@
 
 static cmdinfo_t	fsmap_cmd;
 static dev_t		xfs_data_dev;
+static dev_t		xfs_log_dev;
 static dev_t		xfs_rt_dev;
 
 static void
@@ -405,8 +406,6 @@ fsmap_f(
 	int			c;
 	unsigned long long	nr = 0;
 	size_t			fsblocksize, fssectsize;
-	struct fs_path		*fs;
-	static bool		tab_init;
 	bool			dumped_flags = false;
 	int			dflag, lflag, rflag;
 
@@ -491,15 +490,19 @@ fsmap_f(
 		return 0;
 	}
 
+	xfs_data_dev = file->fs_path.fs_datadev;
+	xfs_log_dev = file->fs_path.fs_logdev;
+	xfs_rt_dev = file->fs_path.fs_rtdev;
+
 	memset(head, 0, sizeof(*head));
 	l = head->fmh_keys;
 	h = head->fmh_keys + 1;
 	if (dflag) {
-		l->fmr_device = h->fmr_device = file->fs_path.fs_datadev;
+		l->fmr_device = h->fmr_device = xfs_data_dev;
 	} else if (lflag) {
-		l->fmr_device = h->fmr_device = file->fs_path.fs_logdev;
+		l->fmr_device = h->fmr_device = xfs_log_dev;
 	} else if (rflag) {
-		l->fmr_device = h->fmr_device = file->fs_path.fs_rtdev;
+		l->fmr_device = h->fmr_device = xfs_rt_dev;
 	} else {
 		l->fmr_device = 0;
 		h->fmr_device = UINT_MAX;
@@ -510,18 +513,6 @@ fsmap_f(
 	h->fmr_flags = UINT_MAX;
 	h->fmr_offset = ULLONG_MAX;
 
-	/*
-	 * If this is an XFS filesystem, remember the data device.
-	 * (We report AG number/block for data device extents on XFS).
-	 */
-	if (!tab_init) {
-		fs_table_initialise(0, NULL, 0, NULL);
-		tab_init = true;
-	}
-	fs = fs_table_lookup(file->name, FS_MOUNT_POINT);
-	xfs_data_dev = fs ? fs->fs_datadev : 0;
-	xfs_rt_dev = fs ? fs->fs_rtdev : 0;
-
 	head->fmh_count = map_size;
 	do {
 		/* Get some extents */
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 39/45] xfs_io: don't re-query geometry information in fsmap_f
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (37 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 38/45] xfs_io: don't re-query fs_path information in fsmap_f Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 20:46   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 40/45] xfs_io: handle internal RT devices in fsmap output Christoph Hellwig
                   ` (5 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Use the information stored in "file" instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 io/fsmap.c | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/io/fsmap.c b/io/fsmap.c
index 6a87e8972f26..3cc1b510316c 100644
--- a/io/fsmap.c
+++ b/io/fsmap.c
@@ -166,9 +166,9 @@ static void
 dump_map_verbose(
 	unsigned long long	*nr,
 	struct fsmap_head	*head,
-	bool			*dumped_flags,
-	struct xfs_fsop_geom	*fsgeo)
+	bool			*dumped_flags)
 {
+	struct xfs_fsop_geom	*fsgeo = &file->geom;
 	unsigned long long	i;
 	struct fsmap		*p;
 	int			agno;
@@ -395,7 +395,6 @@ fsmap_f(
 	struct fsmap		*p;
 	struct fsmap_head	*head;
 	struct fsmap		*l, *h;
-	struct xfs_fsop_geom	fsgeo;
 	long long		start = 0;
 	long long		end = -1;
 	int			map_size;
@@ -470,17 +469,6 @@ fsmap_f(
 		end <<= BBSHIFT;
 	}
 
-	if (vflag) {
-		c = -xfrog_geometry(file->fd, &fsgeo);
-		if (c) {
-			fprintf(stderr,
-				_("%s: can't get geometry [\"%s\"]: %s\n"),
-				progname, file->name, strerror(c));
-			exitcode = 1;
-			return 0;
-		}
-	}
-
 	map_size = nflag ? nflag : 131072 / sizeof(struct fsmap);
 	head = malloc(fsmap_sizeof(map_size));
 	if (head == NULL) {
@@ -531,7 +519,7 @@ fsmap_f(
 			break;
 
 		if (vflag)
-			dump_map_verbose(&nr, head, &dumped_flags, &fsgeo);
+			dump_map_verbose(&nr, head, &dumped_flags);
 		else if (mflag)
 			dump_map_machine(&nr, head);
 		else
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 40/45] xfs_io: handle internal RT devices in fsmap output
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (38 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 39/45] xfs_io: don't re-query geometry " Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 21:48   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 41/45] xfs_spaceman: handle internal RT devices Christoph Hellwig
                   ` (4 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Deal with the synthetic fmr_device values and the rt device offset when
calculating RG numbers.
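
The device selection can be sketched as below.  This is only an
illustration: pick_fsmap_devs() and struct fsmap_devs are hypothetical, and
the SKETCH_DEV_* values are assumed stand-ins for XFS_DEV_DATA, XFS_DEV_LOG
and XFS_DEV_RT; check xfs_fs.h for the real definitions.

```c
#include <stdint.h>
#include <sys/types.h>

/* Assumed stand-ins for the synthetic XFS_DEV_* ids (values not verified). */
#define SKETCH_DEV_DATA	1
#define SKETCH_DEV_LOG	2
#define SKETCH_DEV_RT	3

/* Hypothetical container for the three fmr_device values to match on. */
struct fsmap_devs {
	uint32_t	data;
	uint32_t	log;
	uint32_t	rt;
};

/*
 * With an internal RT device (rtstart != 0) fsmap records carry synthetic
 * device ids; otherwise they carry the real device numbers.
 */
static struct fsmap_devs
pick_fsmap_devs(uint64_t rtstart, dev_t datadev, dev_t logdev, dev_t rtdev)
{
	struct fsmap_devs	devs;

	if (rtstart) {
		devs.data = SKETCH_DEV_DATA;
		devs.log = SKETCH_DEV_LOG;
		devs.rt = SKETCH_DEV_RT;
	} else {
		devs.data = (uint32_t)datadev;
		devs.log = (uint32_t)logdev;
		devs.rt = (uint32_t)rtdev;
	}
	return devs;
}
```

The same rtstart test recurs in the xfs_spaceman and xfs_scrub patches later
in the series.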

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 io/fsmap.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/io/fsmap.c b/io/fsmap.c
index 3cc1b510316c..41f2da50f344 100644
--- a/io/fsmap.c
+++ b/io/fsmap.c
@@ -247,8 +247,13 @@ dump_map_verbose(
 				(long long)BTOBBT(agoff),
 				(long long)BTOBBT(agoff + p->fmr_length - 1));
 		} else if (p->fmr_device == xfs_rt_dev && fsgeo->rgcount > 0) {
-			agno = p->fmr_physical / bperrtg;
-			agoff = p->fmr_physical % bperrtg;
+			uint64_t start = p->fmr_physical -
+				fsgeo->rtstart * fsgeo->blocksize;
+
+			agno = start / bperrtg;
+			if (agno < 0)
+				agno = -1;
+			agoff = start % bperrtg;
 			snprintf(abuf, sizeof(abuf),
 				"(%lld..%lld)",
 				(long long)BTOBBT(agoff),
@@ -326,8 +331,13 @@ dump_map_verbose(
 				"%lld",
 				(long long)agno);
 		} else if (p->fmr_device == xfs_rt_dev && fsgeo->rgcount > 0) {
-			agno = p->fmr_physical / bperrtg;
-			agoff = p->fmr_physical % bperrtg;
+			uint64_t start = p->fmr_physical -
+				fsgeo->rtstart * fsgeo->blocksize;
+
+			agno = start / bperrtg;
+			if (agno < 0)
+				agno = -1;
+			agoff = start % bperrtg;
 			snprintf(abuf, sizeof(abuf),
 				"(%lld..%lld)",
 				(long long)BTOBBT(agoff),
@@ -478,9 +488,18 @@ fsmap_f(
 		return 0;
 	}
 
-	xfs_data_dev = file->fs_path.fs_datadev;
-	xfs_log_dev = file->fs_path.fs_logdev;
-	xfs_rt_dev = file->fs_path.fs_rtdev;
+	/*
+	 * File systems with internal rt device use synthetic device values.
+	 */
+	if (file->geom.rtstart) {
+		xfs_data_dev = XFS_DEV_DATA;
+		xfs_log_dev = XFS_DEV_LOG;
+		xfs_rt_dev = XFS_DEV_RT;
+	} else {
+		xfs_data_dev = file->fs_path.fs_datadev;
+		xfs_log_dev = file->fs_path.fs_logdev;
+		xfs_rt_dev = file->fs_path.fs_rtdev;
+	}
 
 	memset(head, 0, sizeof(*head));
 	l = head->fmh_keys;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 41/45] xfs_spaceman: handle internal RT devices
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (39 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 40/45] xfs_io: handle internal RT devices in fsmap output Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 19:29   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 42/45] xfs_scrub: support internal RT sections Christoph Hellwig
                   ` (3 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Handle the synthetic fmr_device values for fsmap.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 spaceman/freesp.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/spaceman/freesp.c b/spaceman/freesp.c
index dfbec52a7160..9ad321c4843f 100644
--- a/spaceman/freesp.c
+++ b/spaceman/freesp.c
@@ -140,12 +140,19 @@ scan_ag(
 	if (agno != NULLAGNUMBER) {
 		l->fmr_physical = cvt_agbno_to_b(xfd, agno, 0);
 		h->fmr_physical = cvt_agbno_to_b(xfd, agno + 1, 0);
-		l->fmr_device = h->fmr_device = file->fs_path.fs_datadev;
+		if (file->xfd.fsgeom.rtstart)
+			l->fmr_device = XFS_DEV_DATA;
+		else
+			l->fmr_device = file->fs_path.fs_datadev;
 	} else {
 		l->fmr_physical = 0;
 		h->fmr_physical = ULLONG_MAX;
-		l->fmr_device = h->fmr_device = file->fs_path.fs_rtdev;
+		if (file->xfd.fsgeom.rtstart)
+			l->fmr_device = XFS_DEV_RT;
+		else
+			l->fmr_device = file->fs_path.fs_rtdev;
 	}
+	h->fmr_device = l->fmr_device;
 	h->fmr_owner = ULLONG_MAX;
 	h->fmr_flags = UINT_MAX;
 	h->fmr_offset = ULLONG_MAX;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 42/45] xfs_scrub: support internal RT sections
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (40 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 41/45] xfs_spaceman: handle internal RT devices Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 19:30   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 43/45] xfs_scrub: handle internal RT devices Christoph Hellwig
                   ` (2 subsequent siblings)
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 scrub/phase1.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scrub/phase1.c b/scrub/phase1.c
index d03a9099a217..e71cab7b7d90 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -341,7 +341,8 @@ _("Kernel metadata repair facility is not available.  Use -n to scrub."));
 _("Unable to find log device path."));
 		return ECANCELED;
 	}
-	if (ctx->mnt.fsgeom.rtblocks && ctx->fsinfo.fs_rt == NULL) {
+	if (ctx->mnt.fsgeom.rtblocks && ctx->fsinfo.fs_rt == NULL &&
+	    !(ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_ZONED)) {
 		str_error(ctx, ctx->mntpoint,
 _("Unable to find realtime device path."));
 		return ECANCELED;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 43/45] xfs_scrub: handle internal RT devices
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (41 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 42/45] xfs_scrub: support internal RT sections Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 19:34   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 44/45] xfs_mdrestore: support " Christoph Hellwig
  2025-04-09  7:55 ` [PATCH 45/45] xfs_growfs: " Christoph Hellwig
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Handle the synthetic fmr_device values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 scrub/phase6.c   | 65 ++++++++++++++++++++++++++++--------------------
 scrub/phase7.c   | 28 +++++++++++++++------
 scrub/spacemap.c | 17 +++++++++----
 3 files changed, 70 insertions(+), 40 deletions(-)

diff --git a/scrub/phase6.c b/scrub/phase6.c
index 2a52b2c92419..abf6f9713f1a 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -56,12 +56,21 @@ dev_to_pool(
 	struct media_verify_state	*vs,
 	dev_t				dev)
 {
-	if (dev == ctx->fsinfo.fs_datadev)
-		return vs->rvp_data;
-	else if (dev == ctx->fsinfo.fs_logdev)
-		return vs->rvp_log;
-	else if (dev == ctx->fsinfo.fs_rtdev)
-		return vs->rvp_realtime;
+	if (ctx->mnt.fsgeom.rtstart) {
+		if (dev == XFS_DEV_DATA)
+			return vs->rvp_data;
+		if (dev == XFS_DEV_LOG)
+			return vs->rvp_log;
+		if (dev == XFS_DEV_RT)
+			return vs->rvp_realtime;
+	} else {
+		if (dev == ctx->fsinfo.fs_datadev)
+			return vs->rvp_data;
+		if (dev == ctx->fsinfo.fs_logdev)
+			return vs->rvp_log;
+		if (dev == ctx->fsinfo.fs_rtdev)
+			return vs->rvp_realtime;
+	}
 	abort();
 }
 
@@ -71,12 +80,21 @@ disk_to_dev(
 	struct scrub_ctx	*ctx,
 	struct disk		*disk)
 {
-	if (disk == ctx->datadev)
-		return ctx->fsinfo.fs_datadev;
-	else if (disk == ctx->logdev)
-		return ctx->fsinfo.fs_logdev;
-	else if (disk == ctx->rtdev)
-		return ctx->fsinfo.fs_rtdev;
+	if (ctx->mnt.fsgeom.rtstart) {
+		if (disk == ctx->datadev)
+			return XFS_DEV_DATA;
+		if (disk == ctx->logdev)
+			return XFS_DEV_LOG;
+		if (disk == ctx->rtdev)
+			return XFS_DEV_RT;
+	} else {
+		if (disk == ctx->datadev)
+			return ctx->fsinfo.fs_datadev;
+		if (disk == ctx->logdev)
+			return ctx->fsinfo.fs_logdev;
+		if (disk == ctx->rtdev)
+			return ctx->fsinfo.fs_rtdev;
+	}
 	abort();
 }
 
@@ -87,11 +105,9 @@ bitmap_for_disk(
 	struct disk			*disk,
 	struct media_verify_state	*vs)
 {
-	dev_t				dev = disk_to_dev(ctx, disk);
-
-	if (dev == ctx->fsinfo.fs_datadev)
+	if (disk == ctx->datadev)
 		return vs->d_bad;
-	else if (dev == ctx->fsinfo.fs_rtdev)
+	if (disk == ctx->rtdev)
 		return vs->r_bad;
 	return NULL;
 }
@@ -501,14 +517,11 @@ report_ioerr(
 		.length			= length,
 	};
 	struct disk_ioerr_report	*dioerr = arg;
-	dev_t				dev;
-
-	dev = disk_to_dev(dioerr->ctx, dioerr->disk);
 
 	/* Go figure out which blocks are bad from the fsmap. */
-	keys[0].fmr_device = dev;
+	keys[0].fmr_device = disk_to_dev(dioerr->ctx, dioerr->disk);
 	keys[0].fmr_physical = start;
-	keys[1].fmr_device = dev;
+	keys[1].fmr_device = keys[0].fmr_device;
 	keys[1].fmr_physical = start + length - 1;
 	keys[1].fmr_owner = ULLONG_MAX;
 	keys[1].fmr_offset = ULLONG_MAX;
@@ -675,14 +688,12 @@ remember_ioerr(
 	int				ret;
 
 	if (!length) {
-		dev_t			dev = disk_to_dev(ctx, disk);
-
-		if (dev == ctx->fsinfo.fs_datadev)
+		if (disk == ctx->datadev)
 			vs->d_trunc = true;
-		else if (dev == ctx->fsinfo.fs_rtdev)
-			vs->r_trunc = true;
-		else if (dev == ctx->fsinfo.fs_logdev)
+		else if (disk == ctx->logdev)
 			vs->l_trunc = true;
+		else if (disk == ctx->rtdev)
+			vs->r_trunc = true;
 		return;
 	}
 
diff --git a/scrub/phase7.c b/scrub/phase7.c
index 01097b678798..e25502668b1c 100644
--- a/scrub/phase7.c
+++ b/scrub/phase7.c
@@ -68,25 +68,37 @@ count_block_summary(
 	void			*arg)
 {
 	struct summary_counts	*counts;
+	bool			is_rt = false;
 	unsigned long long	len;
 	int			ret;
 
+	if (ctx->mnt.fsgeom.rtstart) {
+		if (fsmap->fmr_device == XFS_DEV_LOG)
+			return 0;
+		if (fsmap->fmr_device == XFS_DEV_RT)
+			is_rt = true;
+	} else {
+		if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
+			return 0;
+		if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev)
+			is_rt = true;
+	}
+
 	counts = ptvar_get((struct ptvar *)arg, &ret);
 	if (ret) {
 		str_liberror(ctx, -ret, _("retrieving summary counts"));
 		return -ret;
 	}
-	if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
-		return 0;
+
 	if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
 	    fsmap->fmr_owner == XFS_FMR_OWN_FREE) {
 		uint64_t	blocks;
 
 		blocks = cvt_b_to_off_fsbt(&ctx->mnt, fsmap->fmr_length);
-		if (fsmap->fmr_device == ctx->fsinfo.fs_datadev)
-			hist_add(&counts->datadev_hist, blocks);
-		else if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev)
+		if (is_rt)
 			hist_add(&counts->rtdev_hist, blocks);
+		else
+			hist_add(&counts->datadev_hist, blocks);
 		return 0;
 	}
 
@@ -94,10 +106,10 @@ count_block_summary(
 
 	/* freesp btrees live in free space, need to adjust counters later. */
 	if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
-	    fsmap->fmr_owner == XFS_FMR_OWN_AG) {
+	    fsmap->fmr_owner == XFS_FMR_OWN_AG)
 		counts->agbytes += fsmap->fmr_length;
-	}
-	if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev) {
+
+	if (is_rt) {
 		/* Count realtime extents. */
 		counts->rbytes += len;
 	} else {
diff --git a/scrub/spacemap.c b/scrub/spacemap.c
index c293ab44a528..1ee4d1946d3d 100644
--- a/scrub/spacemap.c
+++ b/scrub/spacemap.c
@@ -103,9 +103,12 @@ scan_ag_rmaps(
 	bperag = (off_t)ctx->mnt.fsgeom.agblocks *
 		 (off_t)ctx->mnt.fsgeom.blocksize;
 
-	keys[0].fmr_device = ctx->fsinfo.fs_datadev;
+	if (ctx->mnt.fsgeom.rtstart)
+		keys[0].fmr_device = XFS_DEV_DATA;
+	else
+		keys[0].fmr_device = ctx->fsinfo.fs_datadev;
 	keys[0].fmr_physical = agno * bperag;
-	keys[1].fmr_device = ctx->fsinfo.fs_datadev;
+	keys[1].fmr_device = keys[0].fmr_device;
 	keys[1].fmr_physical = ((agno + 1) * bperag) - 1;
 	keys[1].fmr_owner = ULLONG_MAX;
 	keys[1].fmr_offset = ULLONG_MAX;
@@ -140,9 +143,12 @@ scan_rtg_rmaps(
 	off_t			bperrg = bytes_per_rtgroup(&ctx->mnt.fsgeom);
 	int			ret;
 
-	keys[0].fmr_device = ctx->fsinfo.fs_rtdev;
+	if (ctx->mnt.fsgeom.rtstart)
+		keys[0].fmr_device = XFS_DEV_RT;
+	else
+		keys[0].fmr_device = ctx->fsinfo.fs_rtdev;
 	keys[0].fmr_physical = (xfs_rtblock_t)rgno * bperrg;
-	keys[1].fmr_device = ctx->fsinfo.fs_rtdev;
+	keys[1].fmr_device = keys[0].fmr_device;
 	keys[1].fmr_physical = ((rgno + 1) * bperrg) - 1;
 	keys[1].fmr_owner = ULLONG_MAX;
 	keys[1].fmr_offset = ULLONG_MAX;
@@ -216,7 +222,8 @@ scan_log_rmaps(
 {
 	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
 
-	scan_dev_rmaps(ctx, ctx->fsinfo.fs_logdev, arg);
+	scan_dev_rmaps(ctx, ctx->mnt.fsgeom.rtstart ?
+			XFS_DEV_LOG : ctx->fsinfo.fs_logdev, arg);
 }
 
 /*
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 44/45] xfs_mdrestore: support internal RT devices
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (42 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 43/45] xfs_scrub: handle internal RT devices Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 19:36   ` Darrick J. Wong
  2025-04-09  7:55 ` [PATCH 45/45] xfs_growfs: " Christoph Hellwig
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Calculate the size properly for internal RT devices and skip restoring
to the external one for this case.
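
A minimal sketch of the size rule, assuming the superblock fields are passed
in as plain integers.  main_device_blocks() and needs_external_rt() are
hypothetical helpers for illustration, not functions in xfs_mdrestore.

```c
#include <stdint.h>

/*
 * Number of fs blocks the main device must hold.  With an internal RT
 * section (rtstart != 0) the RT blocks live behind the data section on
 * the same device, so the device has to cover rtstart + rblocks.
 */
static uint64_t
main_device_blocks(uint64_t dblocks, uint64_t rtstart, uint64_t rblocks)
{
	if (rtstart)
		return rtstart + rblocks;
	return dblocks;
}

/* An external RT device is only needed when the RT section is not internal. */
static int
needs_external_rt(uint64_t rtstart, uint64_t rblocks)
{
	return rblocks > 0 && !rtstart;
}
```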

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mdrestore/xfs_mdrestore.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/mdrestore/xfs_mdrestore.c b/mdrestore/xfs_mdrestore.c
index d5014981b15a..95b01a99a154 100644
--- a/mdrestore/xfs_mdrestore.c
+++ b/mdrestore/xfs_mdrestore.c
@@ -183,6 +183,20 @@ verify_device_size(
 	}
 }
 
+static void
+verify_main_device_size(
+	const struct mdrestore_dev	*dev,
+	struct xfs_sb			*sb)
+{
+	xfs_rfsblock_t			nr_blocks = sb->sb_dblocks;
+
+	/* internal RT device */
+	if (sb->sb_rtstart)
+		nr_blocks = sb->sb_rtstart + sb->sb_rblocks;
+
+	verify_device_size(dev, nr_blocks, sb->sb_blocksize);
+}
+
 static void
 read_header_v1(
 	union mdrestore_headers	*h,
@@ -269,7 +283,7 @@ restore_v1(
 
 	((struct xfs_dsb*)block_buffer)->sb_inprogress = 1;
 
-	verify_device_size(ddev, sb.sb_dblocks, sb.sb_blocksize);
+	verify_main_device_size(ddev, &sb);
 
 	bytes_read = 0;
 
@@ -432,14 +446,14 @@ restore_v2(
 
 	((struct xfs_dsb *)block_buffer)->sb_inprogress = 1;
 
-	verify_device_size(ddev, sb.sb_dblocks, sb.sb_blocksize);
+	verify_main_device_size(ddev, &sb);
 
 	if (sb.sb_logstart == 0) {
 		ASSERT(mdrestore.external_log == true);
 		verify_device_size(logdev, sb.sb_logblocks, sb.sb_blocksize);
 	}
 
-	if (sb.sb_rblocks > 0) {
+	if (sb.sb_rblocks > 0 && !sb.sb_rtstart) {
 		ASSERT(mdrestore.realtime_data == true);
 		verify_device_size(rtdev, sb.sb_rblocks, sb.sb_blocksize);
 	}
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH 45/45] xfs_growfs: support internal RT devices
  2025-04-09  7:55 xfsprogs support for zoned devices Christoph Hellwig
                   ` (43 preceding siblings ...)
  2025-04-09  7:55 ` [PATCH 44/45] xfs_mdrestore: support " Christoph Hellwig
@ 2025-04-09  7:55 ` Christoph Hellwig
  2025-04-09 19:35   ` Darrick J. Wong
  44 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-09  7:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: Darrick J . Wong, Hans Holmberg, linux-xfs

Allow RT growfs when rtstart is set in the geometry, and adjust the
queried size for it.
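
The size adjustment amounts to splitting one device at rtstart.  A small
sketch, where split_internal_rt() and struct dev_split are hypothetical
names for this example and all sizes are in 512-byte basic blocks:

```c
#include <stdint.h>

#define BBSIZE	512	/* basic block size in bytes, as used by xfsprogs */

/* Hypothetical result type: sizes of the two sections in basic blocks. */
struct dev_split {
	int64_t	data_bb;
	int64_t	rt_bb;
};

/*
 * Split the queried size of the single device into a data part and an RT
 * part.  rtstart is in fs blocks; when it is 0 there is no internal RT
 * section and the whole device is reported as data.
 */
static struct dev_split
split_internal_rt(int64_t dev_bb, uint64_t rtstart, uint32_t blocksize)
{
	struct dev_split	s = { dev_bb, 0 };

	if (rtstart) {
		int64_t	rtstart_bb = (int64_t)rtstart * (blocksize / BBSIZE);

		s.rt_bb = dev_bb - rtstart_bb;
		s.data_bb = rtstart_bb;
	}
	return s;
}
```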

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 growfs/xfs_growfs.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/growfs/xfs_growfs.c b/growfs/xfs_growfs.c
index 4b941403e2fd..0d0b2ae3e739 100644
--- a/growfs/xfs_growfs.c
+++ b/growfs/xfs_growfs.c
@@ -202,7 +202,7 @@ main(int argc, char **argv)
 			progname, fname);
 		exit(1);
 	}
-	if (rflag && !xi.rt.dev) {
+	if (rflag && (!xi.rt.dev && !geo.rtstart)) {
 		fprintf(stderr,
 			_("%s: failed to access realtime device for %s\n"),
 			progname, fname);
@@ -211,6 +211,13 @@ main(int argc, char **argv)
 
 	xfs_report_geom(&geo, datadev, logdev, rtdev);
 
+	if (geo.rtstart) {
+		xfs_daddr_t rtstart = geo.rtstart * (geo.blocksize / BBSIZE);
+
+		xi.rt.size = xi.data.size - rtstart;
+		xi.data.size = rtstart;
+	}
+
 	ddsize = xi.data.size;
 	dlsize = (xi.log.size ? xi.log.size :
 			geo.logblocks * (geo.blocksize / BBSIZE) );
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* Re: [PATCH 02/45] FIXUP: xfs: generalize the freespace and reserved blocks handling
  2025-04-09  7:55 ` [PATCH 02/45] FIXUP: " Christoph Hellwig
@ 2025-04-09 15:32   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 15:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:05AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  include/xfs_mount.h  | 32 ++++++++++++++++++++++++++++++--
>  libxfs/libxfs_priv.h | 13 +------------
>  2 files changed, 31 insertions(+), 14 deletions(-)
> 
> diff --git a/include/xfs_mount.h b/include/xfs_mount.h
> index 383cba7d6e3f..e0f72fc32b25 100644
> --- a/include/xfs_mount.h
> +++ b/include/xfs_mount.h
> @@ -63,8 +63,6 @@ typedef struct xfs_mount {
>  	xfs_sb_t		m_sb;		/* copy of fs superblock */
>  #define m_icount	m_sb.sb_icount
>  #define m_ifree		m_sb.sb_ifree
> -#define m_fdblocks	m_sb.sb_fdblocks
> -#define m_frextents	m_sb.sb_frextents
>  	spinlock_t		m_sb_lock;
>  
>  	/*
> @@ -332,6 +330,36 @@ static inline bool xfs_is_ ## name (struct xfs_mount *mp) \
>  __XFS_UNSUPP_OPSTATE(readonly)
>  __XFS_UNSUPP_OPSTATE(shutdown)
>  
> +static inline int64_t xfs_sum_freecounter(struct xfs_mount *mp,
> +		enum xfs_free_counter ctr)
> +{
> +	if (ctr == XC_FREE_RTEXTENTS)
> +		return mp->m_sb.sb_frextents;
> +	return mp->m_sb.sb_fdblocks;
> +}
> +
> +static inline int64_t xfs_estimate_freecounter(struct xfs_mount *mp,
> +		enum xfs_free_counter ctr)
> +{
> +	return xfs_sum_freecounter(mp, ctr);
> +}
> +
> +static inline int xfs_compare_freecounter(struct xfs_mount *mp,
> +		enum xfs_free_counter ctr, int64_t rhs, int32_t batch)
> +{
> +	uint64_t count;
> +
> +	if (ctr == XC_FREE_RTEXTENTS)
> +		count = mp->m_sb.sb_frextents;
> +	else
> +		count = mp->m_sb.sb_fdblocks;
> +	if (count > rhs)
> +		return 1;
> +	else if (count < rhs)
> +		return -1;
> +	return 0;
> +}
> +
>  /* don't fail on device size or AG count checks */
>  #define LIBXFS_MOUNT_DEBUGGER		(1U << 0)
>  /* report metadata corruption to stdout */
> diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
> index 7e5c125b581a..cb4800de0b11 100644
> --- a/libxfs/libxfs_priv.h
> +++ b/libxfs/libxfs_priv.h
> @@ -209,7 +209,7 @@ static inline bool WARN_ON(bool expr) {
>  }
>  
>  #define WARN_ON_ONCE(e)			WARN_ON(e)
> -#define percpu_counter_read(x)		(*x)
> +
>  #define percpu_counter_read_positive(x)	((*x) > 0 ? (*x) : 0)
>  #define percpu_counter_sum_positive(x)	((*x) > 0 ? (*x) : 0)
>  
> @@ -219,17 +219,6 @@ uint32_t get_random_u32(void);
>  #define get_random_u32()	(0)
>  #endif
>  
> -static inline int
> -__percpu_counter_compare(uint64_t *count, int64_t rhs, int32_t batch)
> -{
> -	if (*count > rhs)
> -		return 1;
> -	else if (*count < rhs)
> -		return -1;
> -	return 0;
> -}
> -
> -
>  #define PAGE_SIZE		getpagesize()
>  extern unsigned int PAGE_SHIFT;
>  
> -- 
> 2.47.2
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 04/45] FIXUP: xfs: make metabtree reservations global
  2025-04-09  7:55 ` [PATCH 04/45] FIXUP: " Christoph Hellwig
@ 2025-04-09 15:43   ` Darrick J. Wong
  2025-04-10  6:00     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 15:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:07AM +0200, Christoph Hellwig wrote:
> ---
>  include/spinlock.h   |  5 +++++
>  include/xfs_mount.h  |  4 ++++
>  libxfs/libxfs_priv.h |  1 +
>  mkfs/xfs_mkfs.c      | 25 ++++---------------------
>  4 files changed, 14 insertions(+), 21 deletions(-)
> 
> diff --git a/include/spinlock.h b/include/spinlock.h
> index 82973726b101..73bd8c078fea 100644
> --- a/include/spinlock.h
> +++ b/include/spinlock.h
> @@ -22,4 +22,9 @@ typedef pthread_mutex_t	spinlock_t;
>  #define spin_trylock(l)		(pthread_mutex_trylock(l) != EBUSY)
>  #define spin_unlock(l)		pthread_mutex_unlock(l)
>  
> +#define mutex_init(l)		pthread_mutex_init(l, NULL)
> +#define mutex_lock(l)		pthread_mutex_lock(l)
> +#define mutex_trylock(l)	(pthread_mutex_trylock(l) != EBUSY)
> +#define mutex_unlock(l)		pthread_mutex_unlock(l)
> +
>  #endif /* __LIBXFS_SPINLOCK_H__ */
> diff --git a/include/xfs_mount.h b/include/xfs_mount.h
> index e0f72fc32b25..0acf952eb9d7 100644
> --- a/include/xfs_mount.h
> +++ b/include/xfs_mount.h
> @@ -164,6 +164,10 @@ typedef struct xfs_mount {
>  	atomic64_t		m_allocbt_blks;
>  	spinlock_t		m_perag_lock;	/* lock for m_perag_tree */
>  
> +	pthread_mutex_t		m_metafile_resv_lock;
> +	uint64_t		m_metafile_resv_target;
> +	uint64_t		m_metafile_resv_used;
> +	uint64_t		m_metafile_resv_avail;
>  } xfs_mount_t;
>  
>  #define M_IGEO(mp)		(&(mp)->m_ino_geo)
> diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
> index cb4800de0b11..82952b0db629 100644
> --- a/libxfs/libxfs_priv.h
> +++ b/libxfs/libxfs_priv.h
> @@ -151,6 +151,7 @@ enum ce { CE_DEBUG, CE_CONT, CE_NOTE, CE_WARN, CE_ALERT, CE_PANIC };
>  
>  #define xfs_force_shutdown(d,n)		((void) 0)
>  #define xfs_mod_delalloc(a,b,c)		((void) 0)
> +#define xfs_mod_sb_delalloc(sb, d)	((void) 0)
>  
>  /* stop unused var warnings by assigning mp to itself */
>  
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 3f4455d46383..39e3349205fb 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -5102,8 +5102,6 @@ check_rt_meta_prealloc(
>  	struct xfs_mount	*mp)
>  {
>  	struct xfs_perag	*pag = NULL;
> -	struct xfs_rtgroup	*rtg = NULL;
> -	xfs_filblks_t		ask;
>  	int			error;
>  
>  	/*
> @@ -5123,27 +5121,12 @@ check_rt_meta_prealloc(
>  		}
>  	}
>  
> -	/* Realtime metadata btree inode */
> -	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
> -		ask = libxfs_rtrmapbt_calc_reserves(mp);
> -		error = -libxfs_metafile_resv_init(rtg_rmap(rtg), ask);
> -		if (error)
> -			prealloc_fail(mp, error, ask, _("realtime rmap btree"));
> -
> -		ask = libxfs_rtrefcountbt_calc_reserves(mp);
> -		error = -libxfs_metafile_resv_init(rtg_refcount(rtg), ask);
> -		if (error)
> -			prealloc_fail(mp, error, ask,
> -					_("realtime refcount btree"));
> -	}
> +	error = -libxfs_metafile_resv_init(mp);
> +	if (error)
> +		prealloc_fail(mp, error, 0, _("metafile"));

Could this be _("metadata files") so that the error message becomes:

mkfs.xfs: cannot handle expansion of metadata files; need 55 free blocks, have 7

With that changed,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

>  
> -	/* Unreserve the realtime metadata reservations. */
> -	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
> -		libxfs_metafile_resv_free(rtg_rmap(rtg));
> -		libxfs_metafile_resv_free(rtg_refcount(rtg));
> -	}
> +	libxfs_metafile_resv_free(mp);
>  
> -	/* Unreserve the per-AG reservations. */
>  	while ((pag = xfs_perag_next(mp, pag)))
>  		libxfs_ag_resv_free(pag);
>  
> -- 
> 2.47.2
> 
> 

* Re: [PATCH 11/45] FIXUP: xfs: define the zoned on-disk format
  2025-04-09  7:55 ` [PATCH 11/45] FIXUP: " Christoph Hellwig
@ 2025-04-09 15:47   ` Darrick J. Wong
  2025-04-09 16:04     ` Darrick J. Wong
  2025-04-10  6:01     ` Christoph Hellwig
  0 siblings, 2 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 15:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:14AM +0200, Christoph Hellwig wrote:

No SoB?

Eh whatever it's going to get folded into the previous patch anyway so
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  include/xfs_inode.h |  6 ++++++
>  include/xfs_mount.h | 12 ++++++++++--
>  2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/include/xfs_inode.h b/include/xfs_inode.h
> index 5bb31eb4aa53..efef0da636d1 100644
> --- a/include/xfs_inode.h
> +++ b/include/xfs_inode.h
> @@ -234,6 +234,7 @@ typedef struct xfs_inode {
>  	xfs_extlen_t		i_extsize;	/* basic/minimum extent size */
>  	/* cowextsize is only used for v3 inodes, flushiter for v1/2 */
>  	union {
> +		uint32_t	i_used_blocks;
>  		xfs_extlen_t	i_cowextsize;	/* basic cow extent size */
>  		uint16_t	i_flushiter;	/* incremented on flush */
>  	};
> @@ -361,6 +362,11 @@ static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
>  }
>  #define XFS_IS_REALTIME_INODE(ip) ((ip)->i_diflags & XFS_DIFLAG_REALTIME)
>  
> +static inline bool xfs_is_zoned_inode(struct xfs_inode *ip)
> +{
> +	return xfs_has_zoned(ip->i_mount) && XFS_IS_REALTIME_INODE(ip);
> +}
> +
>  /* inode link counts */
>  static inline void set_nlink(struct inode *inode, uint32_t nlink)
>  {
> diff --git a/include/xfs_mount.h b/include/xfs_mount.h
> index 0acf952eb9d7..7856acfb9f8e 100644
> --- a/include/xfs_mount.h
> +++ b/include/xfs_mount.h
> @@ -207,6 +207,7 @@ typedef struct xfs_mount {
>  #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
>  #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
>  #define XFS_FEAT_METADIR	(1ULL << 28)	/* metadata directory tree */
> +#define XFS_FEAT_ZONED		(1ULL << 29)	/* zoned RT device */
>  
>  #define __XFS_HAS_FEAT(name, NAME) \
>  static inline bool xfs_has_ ## name (const struct xfs_mount *mp) \
> @@ -253,7 +254,7 @@ __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
>  __XFS_HAS_FEAT(large_extent_counts, NREXT64)
>  __XFS_HAS_FEAT(exchange_range, EXCHANGE_RANGE)
>  __XFS_HAS_FEAT(metadir, METADIR)
> -
> +__XFS_HAS_FEAT(zoned, ZONED)
>  
>  static inline bool xfs_has_rtgroups(const struct xfs_mount *mp)
>  {
> @@ -264,7 +265,9 @@ static inline bool xfs_has_rtgroups(const struct xfs_mount *mp)
>  static inline bool xfs_has_rtsb(const struct xfs_mount *mp)
>  {
>  	/* all rtgroups filesystems with an rt section have an rtsb */
> -	return xfs_has_rtgroups(mp) && xfs_has_realtime(mp);
> +	return xfs_has_rtgroups(mp) &&
> +		xfs_has_realtime(mp) &&
> +		!xfs_has_zoned(mp);
>  }
>  
>  static inline bool xfs_has_rtrmapbt(const struct xfs_mount *mp)
> @@ -279,6 +282,11 @@ static inline bool xfs_has_rtreflink(const struct xfs_mount *mp)
>  	       xfs_has_reflink(mp);
>  }
>  
> +static inline bool xfs_has_nonzoned(const struct xfs_mount *mp)
> +{
> +	return !xfs_has_zoned(mp);
> +}
> +
>  /* Kernel mount features that we don't support */
>  #define __XFS_UNSUPP_FEAT(name) \
>  static inline bool xfs_has_ ## name (const struct xfs_mount *mp) \
> -- 
> 2.47.2
> 
> 

* Re: [PATCH 13/45] FIXUP: xfs: allow internal RT devices for zoned mode
  2025-04-09  7:55 ` [PATCH 13/45] FIXUP: " Christoph Hellwig
@ 2025-04-09 15:55   ` Darrick J. Wong
  2025-04-10  6:09     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 15:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:16AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/libxfs.h    |  6 ++++++
>  include/xfs_mount.h |  7 +++++++
>  libfrog/fsgeom.c    |  2 +-
>  libxfs/init.c       | 13 +++++++++----
>  libxfs/rdwr.c       |  2 ++
>  repair/agheader.c   |  4 +++-
>  6 files changed, 28 insertions(+), 6 deletions(-)
> 
> diff --git a/include/libxfs.h b/include/libxfs.h
> index 82b34b9d81c3..b968a2b88da3 100644
> --- a/include/libxfs.h
> +++ b/include/libxfs.h
> @@ -293,4 +293,10 @@ static inline bool xfs_sb_version_hassparseinodes(struct xfs_sb *sbp)
>  		xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_SPINODES);
>  }
>  
> +static inline bool xfs_sb_version_haszoned(struct xfs_sb *sbp)
> +{
> +	return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
> +		xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_ZONED);
> +}
> +
>  #endif	/* __LIBXFS_H__ */
> diff --git a/include/xfs_mount.h b/include/xfs_mount.h
> index 7856acfb9f8e..bf9ebc25fc79 100644
> --- a/include/xfs_mount.h
> +++ b/include/xfs_mount.h
> @@ -53,6 +53,13 @@ struct xfs_groups {
>  	 * rtgroup, so this mask must be 64-bit.
>  	 */
>  	uint64_t		blkmask;
> +
> +	/*
> +	 * Start of the first group in the device.  This is used to support a
> +	 * RT device following the data device on the same block device for
> +	 * SMR hard drives.
> +	 */
> +	xfs_fsblock_t		start_fsb;
>  };
>  
>  /*
> diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
> index b5220d2d6ffd..13df88ae43a7 100644
> --- a/libfrog/fsgeom.c
> +++ b/libfrog/fsgeom.c
> @@ -81,7 +81,7 @@ xfs_report_geom(
>  		isint ? _("internal log") : logname ? logname : _("external"),
>  			geo->blocksize, geo->logblocks, logversion,
>  		"", geo->logsectsize, geo->logsunit / geo->blocksize, lazycount,
> -		!geo->rtblocks ? _("none") : rtname ? rtname : _("external"),
> +		!geo->rtblocks ? _("none") : rtname ? rtname : _("internal"),

Hum.  This change means that if you call xfs_db -c info without
supplying a realtime device, the info command output will claim an
internal rt device:

$ truncate -s 3g /tmp/a
$ truncate -s 3g /tmp/b
$ mkfs.xfs -f /tmp/a -r rtdev=/tmp/b
$ xfs_db -c info /tmp/a
meta-data=/tmp/a                 isize=512    agcount=4, agsize=196608 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0    bigtime=1 inobtcount=1 nrext64=1
         =                       exchange=0   metadir=0
data     =                       bsize=4096   blocks=786432, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=0
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =internal               extsz=4096   blocks=786432, rtextents=786432
         =                       rgcount=0    rgsize=0 extents
         =                       zoned=0      start=0 reserved=0

The realtime device is external, you just haven't supplied one.  How
about:

static inline const char *
rtdev_name(
	const struct xfs_fsop_geo	*geo,
	const char			*rtname)
{
	if (!geo->rtblocks)
		return _("none");
	if (geo->rtstart)
		return _("internal");
	if (!rtname)
		return _("external");
	return rtname;
}

instead?
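
For readers who want to try the decision order outside of libfrog, here is a minimal standalone sketch of the helper proposed above.  It assumes a cut-down stand-in struct (geom_sketch is hypothetical; the real code would take struct xfs_fsop_geom) and drops the _() translation wrapper so that it compiles on its own:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two geometry fields the helper reads.
 * Not part of xfsprogs.
 */
struct geom_sketch {
	uint64_t	rtblocks;	/* size of the rt section, 0 if none */
	uint64_t	rtstart;	/* nonzero if the rt section lives on the data device */
};

/* Pick the label for the "realtime =" line of the geometry report. */
static const char *
rtdev_name_sketch(
	const struct geom_sketch	*geo,
	const char			*rtname)
{
	if (!geo->rtblocks)
		return "none";		/* no rt section at all */
	if (geo->rtstart)
		return "internal";	/* rt section follows the data section */
	if (!rtname)
		return "external";	/* rt device exists but was not named */
	return rtname;
}
```

With this ordering, a filesystem made with an external rt device but inspected without naming that device reports "external" again, and only a nonzero rtstart yields "internal".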

--D

>  		geo->rtextsize * geo->blocksize, (unsigned long long)geo->rtblocks,
>  			(unsigned long long)geo->rtextents,
>  		"", geo->rgcount, geo->rgextents);
> diff --git a/libxfs/init.c b/libxfs/init.c
> index 5b45ed347276..a186369f3fd8 100644
> --- a/libxfs/init.c
> +++ b/libxfs/init.c
> @@ -560,7 +560,7 @@ libxfs_buftarg_init(
>  				progname);
>  			exit(1);
>  		}
> -		if (xi->rt.dev &&
> +		if ((xi->rt.dev || xi->rt.dev == xi->data.dev) &&
>  		    (mp->m_rtdev_targp->bt_bdev != xi->rt.dev ||
>  		     mp->m_rtdev_targp->bt_mount != mp)) {
>  			fprintf(stderr,
> @@ -577,7 +577,11 @@ libxfs_buftarg_init(
>  	else
>  		mp->m_logdev_targp = libxfs_buftarg_alloc(mp, xi, &xi->log,
>  				lfail);
> -	mp->m_rtdev_targp = libxfs_buftarg_alloc(mp, xi, &xi->rt, rfail);
> +	if (!xi->rt.dev || xi->rt.dev == xi->data.dev)
> +		mp->m_rtdev_targp = mp->m_ddev_targp;
> +	else
> +		mp->m_rtdev_targp = libxfs_buftarg_alloc(mp, xi, &xi->rt,
> +				rfail);
>  }
>  
>  /* Compute maximum possible height for per-AG btree types for this fs. */
> @@ -978,7 +982,7 @@ libxfs_flush_mount(
>  			error = err2;
>  	}
>  
> -	if (mp->m_rtdev_targp) {
> +	if (mp->m_rtdev_targp && mp->m_rtdev_targp != mp->m_ddev_targp) {
>  		err2 = libxfs_flush_buftarg(mp->m_rtdev_targp,
>  				_("realtime device"));
>  		if (!error)
> @@ -1031,7 +1035,8 @@ libxfs_umount(
>  	free(mp->m_fsname);
>  	mp->m_fsname = NULL;
>  
> -	libxfs_buftarg_free(mp->m_rtdev_targp);
> +	if (mp->m_rtdev_targp != mp->m_ddev_targp)
> +		libxfs_buftarg_free(mp->m_rtdev_targp);
>  	if (mp->m_logdev_targp != mp->m_ddev_targp)
>  		libxfs_buftarg_free(mp->m_logdev_targp);
>  	libxfs_buftarg_free(mp->m_ddev_targp);
> diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> index 35be785c435a..f06763b38bd8 100644
> --- a/libxfs/rdwr.c
> +++ b/libxfs/rdwr.c
> @@ -175,6 +175,8 @@ libxfs_getrtsb(
>  	if (!mp->m_rtdev_targp->bt_bdev)
>  		return NULL;
>  
> +	ASSERT(!mp->m_sb.sb_rtstart);
> +
>  	error = libxfs_buf_read_uncached(mp->m_rtdev_targp, XFS_RTSB_DADDR,
>  			XFS_FSB_TO_BB(mp, 1), 0, &bp, &xfs_rtsb_buf_ops);
>  	if (error)
> diff --git a/repair/agheader.c b/repair/agheader.c
> index 327ba041671f..5bb4e47e0c5b 100644
> --- a/repair/agheader.c
> +++ b/repair/agheader.c
> @@ -485,7 +485,9 @@ secondary_sb_whack(
>  	 *
>  	 * size is the size of data which is valid for this sb.
>  	 */
> -	if (xfs_sb_version_hasmetadir(sb))
> +	if (xfs_sb_version_haszoned(sb))
> +		size = offsetofend(struct xfs_dsb, sb_rtstart);
> +	else if (xfs_sb_version_hasmetadir(sb))
>  		size = offsetofend(struct xfs_dsb, sb_pad);
>  	else if (xfs_sb_version_hasmetauuid(sb))
>  		size = offsetofend(struct xfs_dsb, sb_meta_uuid);
> -- 
> 2.47.2
> 
> 

* Re: [PATCH 17/45] FIXUP: xfs: parse and validate hardware zone information
  2025-04-09  7:55 ` [PATCH 17/45] FIXUP: " Christoph Hellwig
@ 2025-04-09 15:56   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 15:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:20AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

LGTM,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  libxfs/Makefile    | 6 ++++--
>  libxfs/xfs_zones.c | 2 ++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/libxfs/Makefile b/libxfs/Makefile
> index f3daa598ca97..61c43529b532 100644
> --- a/libxfs/Makefile
> +++ b/libxfs/Makefile
> @@ -71,7 +71,8 @@ HFILES = \
>  	xfs_shared.h \
>  	xfs_trans_resv.h \
>  	xfs_trans_space.h \
> -	xfs_dir2_priv.h
> +	xfs_dir2_priv.h \
> +	xfs_zones.h
>  
>  CFILES = buf_mem.c \
>  	cache.c \
> @@ -135,7 +136,8 @@ CFILES = buf_mem.c \
>  	xfs_trans_inode.c \
>  	xfs_trans_resv.c \
>  	xfs_trans_space.c \
> -	xfs_types.c
> +	xfs_types.c \
> +	xfs_zones.c
>  
>  EXTRA_CFILES=\
>  	ioctl_c_dummy.c
> diff --git a/libxfs/xfs_zones.c b/libxfs/xfs_zones.c
> index b022ed960eac..712c0fe9b0da 100644
> --- a/libxfs/xfs_zones.c
> +++ b/libxfs/xfs_zones.c
> @@ -3,6 +3,8 @@
>   * Copyright (c) 2023-2025 Christoph Hellwig.
>   * Copyright (c) 2024-2025, Western Digital Corporation or its affiliates.
>   */
> +#include <linux/blkzoned.h>
> +#include "libxfs_priv.h"
>  #include "xfs.h"
>  #include "xfs_fs.h"
>  #include "xfs_shared.h"
> -- 
> 2.47.2
> 
> 

* Re: [PATCH 20/45] FIXUP: xfs: add support for zoned space reservations
  2025-04-09  7:55 ` [PATCH 20/45] FIXUP: " Christoph Hellwig
@ 2025-04-09 15:56   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 15:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:23AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks fine,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  libxfs/libxfs_priv.h | 2 ++
>  libxfs/xfs_bmap.c    | 1 -
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
> index 82952b0db629..d5f7d28e08e2 100644
> --- a/libxfs/libxfs_priv.h
> +++ b/libxfs/libxfs_priv.h
> @@ -471,6 +471,8 @@ static inline int retzero(void) { return 0; }
>  #define xfs_sb_validate_fsb_count(sbp, nblks)		(0)
>  #define xlog_calc_iovec_len(len)		roundup(len, sizeof(uint32_t))
>  
> +#define xfs_zoned_add_available(mp, rtxnum)	do { } while (0)
> +
>  /*
>   * Prototypes for kernel static functions that are aren't in their
>   * associated header files.
> diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
> index 3a857181bfa4..3cb47c3c8707 100644
> --- a/libxfs/xfs_bmap.c
> +++ b/libxfs/xfs_bmap.c
> @@ -35,7 +35,6 @@
>  #include "xfs_symlink_remote.h"
>  #include "xfs_inode_util.h"
>  #include "xfs_rtgroup.h"
> -#include "xfs_zone_alloc.h"
>  
>  struct kmem_cache		*xfs_bmap_intent_cache;
>  
> -- 
> 2.47.2
> 
> 

* Re: [PATCH 25/45] FIXUP: xfs: support zone gaps
  2025-04-09  7:55 ` [PATCH 25/45] FIXUP: " Christoph Hellwig
@ 2025-04-09 15:57   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 15:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:28AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Makes sense to me
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  db/convert.c        | 6 +++++-
>  include/xfs_mount.h | 9 +++++++++
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/db/convert.c b/db/convert.c
> index 47d3e86fdc4e..3eec4f224f51 100644
> --- a/db/convert.c
> +++ b/db/convert.c
> @@ -44,10 +44,14 @@ xfs_daddr_to_rgno(
>  	struct xfs_mount	*mp,
>  	xfs_daddr_t		daddr)
>  {
> +	struct xfs_groups	*g = &mp->m_groups[XG_TYPE_RTG];
> +
>  	if (!xfs_has_rtgroups(mp))
>  		return 0;
>  
> -	return XFS_BB_TO_FSBT(mp, daddr) / mp->m_groups[XG_TYPE_RTG].blocks;
> +	if (g->has_daddr_gaps)
> +		return XFS_BB_TO_FSBT(mp, daddr) / (1 << g->blklog);
> +	return XFS_BB_TO_FSBT(mp, daddr) / g->blocks;
>  }
>  
>  typedef enum {
> diff --git a/include/xfs_mount.h b/include/xfs_mount.h
> index bf9ebc25fc79..5a714333c16e 100644
> --- a/include/xfs_mount.h
> +++ b/include/xfs_mount.h
> @@ -47,6 +47,15 @@ struct xfs_groups {
>  	 */
>  	uint8_t			blklog;
>  
> +	/*
> +	 * Zoned devices can have gaps beyond the usable capacity of a zone
> +	 * and the end in the LBA/daddr address space.  In other words, the
> +	 * hardware equivalent to the RT groups already takes care of the power
> +	 * of 2 alignment for us.  In this case the sparse FSB/RTB address space
> +	 * maps 1:1 to the device address space.
> +	 */
> +	bool			has_daddr_gaps;
> +
>  	/*
>  	 * Mask to extract the group-relative block number from a FSB.
>  	 * For a pre-rtgroups filesystem we pretend to have one very large
> -- 
> 2.47.2
> 
> 

* Re: [PATCH 26/45] libfrog: report the zoned flag
  2025-04-09  7:55 ` [PATCH 26/45] libfrog: report the zoned flag Christoph Hellwig
@ 2025-04-09 15:58   ` Darrick J. Wong
  2025-04-10  6:14     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 15:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:29AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I wonder why the other xfs_report_geom change (about the realtime device
name reporting) wasn't in this patch?

Anyway the changes here seem reasonable to me, so
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  libfrog/fsgeom.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
> index 13df88ae43a7..5c4ba29ca9ac 100644
> --- a/libfrog/fsgeom.c
> +++ b/libfrog/fsgeom.c
> @@ -34,6 +34,7 @@ xfs_report_geom(
>  	int			exchangerange;
>  	int			parent;
>  	int			metadir;
> +	int			zoned;
>  
>  	isint = geo->logstart > 0;
>  	lazycount = geo->flags & XFS_FSOP_GEOM_FLAGS_LAZYSB ? 1 : 0;
> @@ -55,6 +56,7 @@ xfs_report_geom(
>  	exchangerange = geo->flags & XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE ? 1 : 0;
>  	parent = geo->flags & XFS_FSOP_GEOM_FLAGS_PARENT ? 1 : 0;
>  	metadir = geo->flags & XFS_FSOP_GEOM_FLAGS_METADIR ? 1 : 0;
> +	zoned = geo->flags & XFS_FSOP_GEOM_FLAGS_ZONED ? 1 : 0;
>  
>  	printf(_(
>  "meta-data=%-22s isize=%-6d agcount=%u, agsize=%u blks\n"
> @@ -68,7 +70,7 @@ xfs_report_geom(
>  "log      =%-22s bsize=%-6d blocks=%u, version=%d\n"
>  "         =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n"
>  "realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n"
> -"         =%-22s rgcount=%-4d rgsize=%u extents\n"),
> +"         =%-22s rgcount=%-4d rgsize=%u extents, zoned=%d\n"),
>  		mntpoint, geo->inodesize, geo->agcount, geo->agblocks,
>  		"", geo->sectsize, attrversion, projid32bit,
>  		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
> @@ -84,7 +86,7 @@ xfs_report_geom(
>  		!geo->rtblocks ? _("none") : rtname ? rtname : _("internal"),
>  		geo->rtextsize * geo->blocksize, (unsigned long long)geo->rtblocks,
>  			(unsigned long long)geo->rtextents,
> -		"", geo->rgcount, geo->rgextents);
> +		"", geo->rgcount, geo->rgextents, zoned);
>  }
>  
>  /* Try to obtain the xfs geometry.  On error returns a negative error code. */
> -- 
> 2.47.2
> 
> 

* Re: [PATCH 11/45] FIXUP: xfs: define the zoned on-disk format
  2025-04-09 15:47   ` Darrick J. Wong
@ 2025-04-09 16:04     ` Darrick J. Wong
  2025-04-10  6:01     ` Christoph Hellwig
  1 sibling, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 16:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 08:47:15AM -0700, Darrick J. Wong wrote:
> On Wed, Apr 09, 2025 at 09:55:14AM +0200, Christoph Hellwig wrote:
> 
> No SoB?
> 
> Eh whatever it's going to get folded into the previous patch anyway so
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> 
> --D
> 
> > ---
> >  include/xfs_inode.h |  6 ++++++
> >  include/xfs_mount.h | 12 ++++++++++--
> >  2 files changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/xfs_inode.h b/include/xfs_inode.h
> > index 5bb31eb4aa53..efef0da636d1 100644
> > --- a/include/xfs_inode.h
> > +++ b/include/xfs_inode.h
> > @@ -234,6 +234,7 @@ typedef struct xfs_inode {
> >  	xfs_extlen_t		i_extsize;	/* basic/minimum extent size */
> >  	/* cowextsize is only used for v3 inodes, flushiter for v1/2 */
> >  	union {
> > +		uint32_t	i_used_blocks;

And only now did I notice the missing comment  ^^; can this add

"i_used_blocks is used for zoned rtrmap inodes"

from the kernel?

--D

> >  		xfs_extlen_t	i_cowextsize;	/* basic cow extent size */
> >  		uint16_t	i_flushiter;	/* incremented on flush */
> >  	};
> > @@ -361,6 +362,11 @@ static inline xfs_fsize_t XFS_ISIZE(struct xfs_inode *ip)
> >  }
> >  #define XFS_IS_REALTIME_INODE(ip) ((ip)->i_diflags & XFS_DIFLAG_REALTIME)
> >  
> > +static inline bool xfs_is_zoned_inode(struct xfs_inode *ip)
> > +{
> > +	return xfs_has_zoned(ip->i_mount) && XFS_IS_REALTIME_INODE(ip);
> > +}
> > +
> >  /* inode link counts */
> >  static inline void set_nlink(struct inode *inode, uint32_t nlink)
> >  {
> > diff --git a/include/xfs_mount.h b/include/xfs_mount.h
> > index 0acf952eb9d7..7856acfb9f8e 100644
> > --- a/include/xfs_mount.h
> > +++ b/include/xfs_mount.h
> > @@ -207,6 +207,7 @@ typedef struct xfs_mount {
> >  #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
> >  #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
> >  #define XFS_FEAT_METADIR	(1ULL << 28)	/* metadata directory tree */
> > +#define XFS_FEAT_ZONED		(1ULL << 29)	/* zoned RT device */
> >  
> >  #define __XFS_HAS_FEAT(name, NAME) \
> >  static inline bool xfs_has_ ## name (const struct xfs_mount *mp) \
> > @@ -253,7 +254,7 @@ __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
> >  __XFS_HAS_FEAT(large_extent_counts, NREXT64)
> >  __XFS_HAS_FEAT(exchange_range, EXCHANGE_RANGE)
> >  __XFS_HAS_FEAT(metadir, METADIR)
> > -
> > +__XFS_HAS_FEAT(zoned, ZONED)
> >  
> >  static inline bool xfs_has_rtgroups(const struct xfs_mount *mp)
> >  {
> > @@ -264,7 +265,9 @@ static inline bool xfs_has_rtgroups(const struct xfs_mount *mp)
> >  static inline bool xfs_has_rtsb(const struct xfs_mount *mp)
> >  {
> >  	/* all rtgroups filesystems with an rt section have an rtsb */
> > -	return xfs_has_rtgroups(mp) && xfs_has_realtime(mp);
> > +	return xfs_has_rtgroups(mp) &&
> > +		xfs_has_realtime(mp) &&
> > +		!xfs_has_zoned(mp);
> >  }
> >  
> >  static inline bool xfs_has_rtrmapbt(const struct xfs_mount *mp)
> > @@ -279,6 +282,11 @@ static inline bool xfs_has_rtreflink(const struct xfs_mount *mp)
> >  	       xfs_has_reflink(mp);
> >  }
> >  
> > +static inline bool xfs_has_nonzoned(const struct xfs_mount *mp)
> > +{
> > +	return !xfs_has_zoned(mp);
> > +}
> > +
> >  /* Kernel mount features that we don't support */
> >  #define __XFS_UNSUPP_FEAT(name) \
> >  static inline bool xfs_has_ ## name (const struct xfs_mount *mp) \
> > -- 
> > 2.47.2
> > 
> > 
> 

* Re: [PATCH 27/45] xfs_repair: support repairing zoned file systems
  2025-04-09  7:55 ` [PATCH 27/45] xfs_repair: support repairing zoned file systems Christoph Hellwig
@ 2025-04-09 16:10   ` Darrick J. Wong
  2025-04-10  6:27     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 16:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:30AM +0200, Christoph Hellwig wrote:
> Not really much to do here.  Mostly ignore the validation and
> regeneration of the bitmap and summary inodes.  Eventually this
> could grow a bit of validation of the hardware zone state.

What do we actually do about the hardware zone state?  If the write
pointer is lower than wherever the rtrmapbt thinks it is, then we're
screwed, right?

Does it matter if the hw write pointer is higher than where the rtrmapbt
thinks it is?  In that case, a new write will be beyond the last write
that the filesystem knows about, but the device will tell us the disk
address so it's all good aside from the freertx counters being wrong.  I
think?
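
To make the two cases in the questions above concrete, here is a small sketch of the comparison.  It only encodes the tentative reasoning in this mail, not anything xfs_repair does; the enum, the function name and both parameters are hypothetical, and fs_wp stands for wherever the rtrmapbt says the last written block of the zone ends:

```c
#include <stdint.h>

/* Hypothetical outcome codes; none of these names exist in xfsprogs. */
enum wp_state {
	WP_CONSISTENT,	/* hardware and filesystem agree */
	WP_HW_BEHIND,	/* fs maps blocks the device never accepted */
	WP_HW_AHEAD,	/* device holds blocks the fs has no record of */
};

/*
 * Compare the write pointer reported by the hardware with the one
 * implied by the filesystem metadata, both as offsets within a zone.
 */
static enum wp_state
classify_write_pointer(
	uint64_t	hw_wp,
	uint64_t	fs_wp)
{
	if (hw_wp < fs_wp)
		return WP_HW_BEHIND;	/* mapped data is missing on disk */
	if (hw_wp > fs_wp)
		return WP_HW_AHEAD;	/* only the free space accounting is off */
	return WP_CONSISTENT;
}
```

Under that reading, WP_HW_BEHIND is the case that cannot be repaired, while WP_HW_AHEAD would only leave the free extent counters wrong.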

The code changes presented here look ok to me though.

--D

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  repair/dinode.c        |  4 +++-
>  repair/phase5.c        | 13 +++++++++++++
>  repair/phase6.c        |  6 ++++--
>  repair/rt.c            |  2 ++
>  repair/rtrmap_repair.c | 33 +++++++++++++++++++++++++++++++++
>  repair/xfs_repair.c    |  9 ++++++---
>  6 files changed, 61 insertions(+), 6 deletions(-)
> 
> diff --git a/repair/dinode.c b/repair/dinode.c
> index 8696a838087f..7bdd3dcf15c1 100644
> --- a/repair/dinode.c
> +++ b/repair/dinode.c
> @@ -3585,7 +3585,9 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
>  
>  	validate_extsize(mp, dino, lino, dirty);
>  
> -	if (dino->di_version >= 3)
> +	if (dino->di_version >= 3 &&
> +	    (!xfs_has_zoned(mp) ||
> +	     dino->di_metatype != cpu_to_be16(XFS_METAFILE_RTRMAP)))
>  		validate_cowextsize(mp, dino, lino, dirty);
>  
>  	/* nsec fields cannot be larger than 1 billion */
> diff --git a/repair/phase5.c b/repair/phase5.c
> index 4cf28d8ae1a2..e350b411c243 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
> @@ -630,6 +630,19 @@ void
>  check_rtmetadata(
>  	struct xfs_mount	*mp)
>  {
> +	if (xfs_has_zoned(mp)) {
> +		/*
> +		 * Here we could/should verify the zone state a bit when we are
> +		 * on actual zoned devices:
> +		 *	- compare hw write pointer to last written
> +		 *	- compare zone state to last written
> +		 *
> +		 * Note much we can do when running in zoned mode on a
> +		 * conventional device.
> +		 */
> +		return;
> +	}
> +
>  	generate_rtinfo(mp);
>  	check_rtbitmap(mp);
>  	check_rtsummary(mp);
> diff --git a/repair/phase6.c b/repair/phase6.c
> index dbc090a54139..a7187e84daae 100644
> --- a/repair/phase6.c
> +++ b/repair/phase6.c
> @@ -3460,8 +3460,10 @@ _("        - resetting contents of realtime bitmap and summary inodes\n"));
>  		return;
>  
>  	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
> -		ensure_rtgroup_bitmap(rtg);
> -		ensure_rtgroup_summary(rtg);
> +		if (!xfs_has_zoned(mp)) {
> +			ensure_rtgroup_bitmap(rtg);
> +			ensure_rtgroup_summary(rtg);
> +		}
>  		ensure_rtgroup_rmapbt(rtg, est_fdblocks);
>  		ensure_rtgroup_refcountbt(rtg, est_fdblocks);
>  	}
> diff --git a/repair/rt.c b/repair/rt.c
> index e0a4943ee3b7..a2478fb635e3 100644
> --- a/repair/rt.c
> +++ b/repair/rt.c
> @@ -222,6 +222,8 @@ check_rtfile_contents(
>  	xfs_fileoff_t		bno = 0;
>  	int			error;
>  
> +	ASSERT(!xfs_has_zoned(mp));
> +
>  	if (!ip) {
>  		do_warn(_("unable to open %s file\n"), filename);
>  		return;
> diff --git a/repair/rtrmap_repair.c b/repair/rtrmap_repair.c
> index 2b07e8943e59..955db1738fe2 100644
> --- a/repair/rtrmap_repair.c
> +++ b/repair/rtrmap_repair.c
> @@ -141,6 +141,37 @@ xrep_rtrmap_btree_load(
>  	return error;
>  }
>  
> +static void
> +rtgroup_update_counters(
> +	struct xfs_rtgroup	*rtg)
> +{
> +	struct xfs_inode	*rmapip = rtg->rtg_inodes[XFS_RTGI_RMAP];
> +	struct xfs_mount	*mp = rtg_mount(rtg);
> +	uint64_t		end =
> +		xfs_rtbxlen_to_blen(mp, rtg->rtg_extents);
> +	xfs_agblock_t		gbno = 0;
> +	uint64_t		used = 0;
> +
> +	do {
> +		int		bstate;
> +		xfs_extlen_t	blen;
> +
> +		bstate = get_bmap_ext(rtg_rgno(rtg), gbno, end, &blen, true);
> +		switch (bstate) {
> +		case XR_E_INUSE:
> +		case XR_E_INUSE_FS:
> +			used += blen;
> +			break;
> +		default:
> +			break;
> +		}
> +
> +		gbno += blen;
> +	} while (gbno < end);
> +
> +	rmapip->i_used_blocks = used;
> +}
> +
>  /* Update the inode counters. */
>  STATIC int
>  xrep_rtrmap_reset_counters(
> @@ -153,6 +184,8 @@ xrep_rtrmap_reset_counters(
>  	 * generated.
>  	 */
>  	sc->ip->i_nblocks = rr->new_fork_info.ifake.if_blocks;
> +	if (xfs_has_zoned(sc->mp))
> +		rtgroup_update_counters(rr->rtg);
>  	libxfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
>  
>  	/* Quotas don't exist so we're done. */
> diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> index eeaaf6434689..7bf75c09b945 100644
> --- a/repair/xfs_repair.c
> +++ b/repair/xfs_repair.c
> @@ -1388,16 +1388,19 @@ main(int argc, char **argv)
>  	 * Done with the block usage maps, toss them.  Realtime metadata aren't
>  	 * rebuilt until phase 6, so we have to keep them around.
>  	 */
> -	if (mp->m_sb.sb_rblocks == 0)
> +	if (mp->m_sb.sb_rblocks == 0) {
>  		rmaps_free(mp);
> -	free_bmaps(mp);
> +		free_bmaps(mp);
> +	}
>  
>  	if (!bad_ino_btree)  {
>  		phase6(mp);
>  		phase_end(mp, 6);
>  
> -		if (mp->m_sb.sb_rblocks != 0)
> +		if (mp->m_sb.sb_rblocks != 0) {
>  			rmaps_free(mp);
> +			free_bmaps(mp);
> +		}
>  		free_rtgroup_inodes();
>  
>  		phase7(mp, phase2_threads);
> -- 
> 2.47.2
> 
> 

* Re: [PATCH 28/45] xfs_repair: fix the RT device check in process_dinode_int
  2025-04-09  7:55 ` [PATCH 28/45] xfs_repair: fix the RT device check in process_dinode_int Christoph Hellwig
@ 2025-04-09 16:11   ` Darrick J. Wong
  2025-04-10  6:29     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 16:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:31AM +0200, Christoph Hellwig wrote:
> Don't look at the variable for the rtname command line option, but
> the actual file system geometry.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  repair/dinode.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/repair/dinode.c b/repair/dinode.c
> index 7bdd3dcf15c1..0c559c408085 100644
> --- a/repair/dinode.c
> +++ b/repair/dinode.c
> @@ -3265,8 +3265,9 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
>  			flags &= XFS_DIFLAG_ANY;
>  		}
>  
> -		/* need an rt-dev for the realtime flag! */
> -		if ((flags & XFS_DIFLAG_REALTIME) && !rt_name) {
> +		/* need an rt-dev for the realtime flag */
> +		if ((flags & XFS_DIFLAG_REALTIME) &&
> +		    !mp->m_rtdev_targp) {

If we're going to check the fs geometry, then why not check
mp->m_sb.sb_rextents != 0?
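
A sketch of how the three candidate checks can disagree, to show why the geometry test is the stricter one.  It assumes, as the libxfs_buftarg_init hunk in patch 13 suggests, that the rt buffer target may simply alias the data target when no separate rt device is opened; the struct and the three helpers are hypothetical and exist only for this illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bundle of the three inputs being compared. */
struct rt_check_sketch {
	const char	*rt_name;	/* rt device named on the command line, or NULL */
	const void	*rtdev_targp;	/* buffer target used for rt I/O, or NULL */
	uint64_t	sb_rextents;	/* rt extents recorded in the superblock */
};

/* Old check: no rt device was named on the command line. */
static bool
no_rt_by_option(const struct rt_check_sketch *c)
{
	return c->rt_name == NULL;
}

/* Check from the patch: no buffer target for the rt device. */
static bool
no_rt_by_target(const struct rt_check_sketch *c)
{
	return c->rtdev_targp == NULL;
}

/* Check suggested above: the superblock describes no rt section. */
static bool
no_rt_by_geometry(const struct rt_check_sketch *c)
{
	return c->sb_rextents == 0;
}
```

For an internal rt section the option-based check flags a missing device even though one exists, and for a filesystem without any rt section a shared buffer target would let the target-based check pass; only the geometry check gives the expected answer in both cases.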

--D

>  			if (!uncertain) {
>  				do_warn(
>  	_("inode %" PRIu64 " has RT flag set but there is no RT device\n"),
> -- 
> 2.47.2
> 
> 

* Re: [PATCH 29/45] xfs_repair: validate rt groups vs reported hardware zones
  2025-04-09  7:55 ` [PATCH 29/45] xfs_repair: validate rt groups vs reported hardware zones Christoph Hellwig
@ 2025-04-09 18:41   ` Darrick J. Wong
  2025-04-10  6:34     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 18:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:32AM +0200, Christoph Hellwig wrote:
> Run a report zones ioctl, and verify the rt group state vs the
> reported hardware zone state.  Note that there is no way to actually
> fix up any discrepancies here, as that would be rather scary without
> having transactions.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  repair/Makefile |   1 +
>  repair/phase5.c |  11 +---
>  repair/zoned.c  | 136 ++++++++++++++++++++++++++++++++++++++++++++++++
>  repair/zoned.h  |  10 ++++
>  4 files changed, 149 insertions(+), 9 deletions(-)
>  create mode 100644 repair/zoned.c
>  create mode 100644 repair/zoned.h
> 
> diff --git a/repair/Makefile b/repair/Makefile
> index ff5b1f5abeda..fb0b2f96cc91 100644
> --- a/repair/Makefile
> +++ b/repair/Makefile
> @@ -81,6 +81,7 @@ CFILES = \
>  	strblobs.c \
>  	threads.c \
>  	versions.c \
> +	zoned.c \
>  	xfs_repair.c
>  
>  LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBXCMD) $(LIBFROG) $(LIBUUID) $(LIBRT) \
> diff --git a/repair/phase5.c b/repair/phase5.c
> index e350b411c243..e44c26885717 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
> @@ -21,6 +21,7 @@
>  #include "rmap.h"
>  #include "bulkload.h"
>  #include "agbtree.h"
> +#include "zoned.h"
>  
>  static uint64_t	*sb_icount_ag;		/* allocated inodes per ag */
>  static uint64_t	*sb_ifree_ag;		/* free inodes per ag */
> @@ -631,15 +632,7 @@ check_rtmetadata(
>  	struct xfs_mount	*mp)
>  {
>  	if (xfs_has_zoned(mp)) {
> -		/*
> -		 * Here we could/should verify the zone state a bit when we are
> -		 * on actual zoned devices:
> -		 *	- compare hw write pointer to last written
> -		 *	- compare zone state to last written
> -		 *
> -		 * Note much we can do when running in zoned mode on a
> -		 * conventional device.
> -		 */
> +		check_zones(mp);
>  		return;
>  	}
>  
> diff --git a/repair/zoned.c b/repair/zoned.c
> new file mode 100644
> index 000000000000..06b2a08dff39
> --- /dev/null
> +++ b/repair/zoned.c
> @@ -0,0 +1,136 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2024 Christoph Hellwig.
> + */
> +#include <ctype.h>
> +#include <linux/blkzoned.h>
> +#include "libxfs_priv.h"
> +#include "libxfs.h"
> +#include "xfs_zones.h"
> +#include "err_protos.h"
> +#include "zoned.h"
> +
> +/* random size that allows efficient processing */
> +#define ZONES_PER_IOCTL			16384
> +
> +static void
> +report_zones_cb(
> +	struct xfs_mount	*mp,
> +	struct blk_zone		*zone)
> +{
> +	xfs_fsblock_t		zsbno = xfs_daddr_to_rtb(mp, zone->start);

        ^^^^^^^^^^^^^ nit: xfs_rtblock_t ?

> +	xfs_rgblock_t		write_pointer;
> +	xfs_rgnumber_t		rgno;
> +	struct xfs_rtgroup	*rtg;
> +
> +	if (xfs_rtb_to_rgbno(mp, zsbno) != 0) {
> +		do_error(_("mismatched zone start 0x%llx."),
> +				(unsigned long long)zsbno);
> +		return;
> +	}
> +
> +	rgno = xfs_rtb_to_rgno(mp, zsbno);
> +	rtg = xfs_rtgroup_grab(mp, rgno);
> +	if (!rtg) {
> +		do_error(_("realtime group not found for zone %u."), rgno);
> +		return;
> +	}
> +
> +	if (!rtg_rmap(rtg))
> +		do_warn(_("no rmap inode for zone %u."), rgno);
> +	else
> +		xfs_zone_validate(zone, rtg, &write_pointer);
> +	xfs_rtgroup_rele(rtg);
> +}
> +
> +void check_zones(struct xfs_mount *mp)
> +{
> +	int fd = mp->m_rtdev_targp->bt_bdev_fd;
> +	uint64_t sector = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rtstart);
> +	unsigned int zone_size, zone_capacity;
> +	struct blk_zone_report *rep;
> +	unsigned int i, n = 0;
> +	uint64_t device_size;
> +	size_t rep_size;

Nit: inconsistent styles in declaration indentation
> +
> +	if (ioctl(fd, BLKGETSIZE64, &device_size))
> +		return; /* not a block device */
> +	if (ioctl(fd, BLKGETZONESZ, &zone_size) || !zone_size)
> +		return;	/* not zoned */
> +
> +	device_size /= 512; /* BLKGETSIZE64 reports a byte value */

device_size = BTOBB(device_size); ?
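
For reference, a minimal standalone sketch of what the helper would change.
The macros below are local copies written for illustration, assumed to mirror
the xfsprogs definitions with a 512-byte basic block; the two function names
are hypothetical.  BTOBB rounds a partial sector up, while BTOBBT truncates
like the open-coded "/ 512", so the results only differ when the byte count
is not a multiple of 512.

```c
#include <stdint.h>

/* Local stand-ins for illustration; assumed to match the xfsprogs macros. */
#define BBSHIFT		9
#define BBSIZE		(1 << BBSHIFT)
#define BTOBB(bytes)	(((uint64_t)(bytes) + BBSIZE - 1) >> BBSHIFT)
#define BTOBBT(bytes)	((uint64_t)(bytes) >> BBSHIFT)

/* Hypothetical helper: device size in 512-byte sectors, truncating. */
static uint64_t
bytes_to_sectors_trunc(uint64_t bytes)
{
	return BTOBBT(bytes);
}

/* Hypothetical helper: same conversion, rounding a partial sector up. */
static uint64_t
bytes_to_sectors_roundup(uint64_t bytes)
{
	return BTOBB(bytes);
}
```

Block device sizes are normally sector multiples, so either form should give
the same answer here; the macro mostly documents the intent.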

> +	if (device_size / zone_size < mp->m_sb.sb_rgcount) {
> +		do_error(_("rt device too small\n"));
> +		return;
> +	}
> +
> +	rep_size = sizeof(struct blk_zone_report) +
> +		   sizeof(struct blk_zone) * ZONES_PER_IOCTL;
> +	rep = malloc(rep_size);
> +	if (!rep) {
> +		do_warn(_("malloc failed for zone report\n"));
> +		return;
> +	}
> +
> +	while (n < mp->m_sb.sb_rgcount) {
> +		struct blk_zone *zones = (struct blk_zone *)(rep + 1);
> +		int ret;
> +
> +		memset(rep, 0, rep_size);
> +		rep->sector = sector;
> +		rep->nr_zones = ZONES_PER_IOCTL;
> +
> +		ret = ioctl(fd, BLKREPORTZONE, rep);
> +		if (ret) {
> +			do_error(_("ioctl(BLKREPORTZONE) failed: %d!\n"), ret);
> +			goto out_free;
> +		}
> +		if (!rep->nr_zones)
> +			break;
> +
> +		for (i = 0; i < rep->nr_zones; i++) {
> +			if (n >= mp->m_sb.sb_rgcount)
> +				break;
> +
> +			if (zones[i].len != zone_size) {
> +				do_error(_("Inconsistent zone size!\n"));
> +				goto out_free;
> +			}
> +
> +			switch (zones[i].type) {
> +			case BLK_ZONE_TYPE_CONVENTIONAL:
> +			case BLK_ZONE_TYPE_SEQWRITE_REQ:
> +				break;
> +			case BLK_ZONE_TYPE_SEQWRITE_PREF:
> +				do_error(
> +_("Found sequential write preferred zone\n"));

I wonder, can "sequential preferred" zones be treated as if they are
conventional zones?  Albeit really slow ones?

/me goes to rummage to see if he still has one of these DMSMR disks.

--D

> +				goto out_free;
> +			default:
> +				do_error(
> +_("Found unknown zone type (0x%x)\n"), zones[i].type);
> +				goto out_free;
> +			}
> +
> +			if (!n) {
> +				zone_capacity = zones[i].capacity;
> +				if (zone_capacity > zone_size) {
> +					do_error(
> +_("Zone capacity larger than zone size!\n"));
> +					goto out_free;
> +				}
> +			} else if (zones[i].capacity != zone_capacity) {
> +				do_error(
> +_("Inconsistent zone capacity!\n"));
> +				goto out_free;
> +			}
> +
> +			report_zones_cb(mp, &zones[i]);
> +			n++;
> +		}
> +		sector = zones[rep->nr_zones - 1].start +
> +			 zones[rep->nr_zones - 1].len;
> +	}
> +
> +out_free:
> +	free(rep);
> +}
> diff --git a/repair/zoned.h b/repair/zoned.h
> new file mode 100644
> index 000000000000..ab76bf15b3ca
> --- /dev/null
> +++ b/repair/zoned.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (c) 2024 Christoph Hellwig.
> + */
> +#ifndef _XFS_REPAIR_ZONED_H_
> +#define _XFS_REPAIR_ZONED_H_
> +
> +void check_zones(struct xfs_mount *mp);
> +
> +#endif /* _XFS_REPAIR_ZONED_H_ */
> -- 
> 2.47.2
> 
> 


* Re: [PATCH 30/45] xfs_mkfs: support creating zoned file systems
  2025-04-09  7:55 ` [PATCH 30/45] xfs_mkfs: support creating zoned file systems Christoph Hellwig
@ 2025-04-09 18:54   ` Darrick J. Wong
  2025-04-10  6:45     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 18:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:33AM +0200, Christoph Hellwig wrote:
> Default to use all sequential write required zones for the RT device.
> 
> Default to 256 and 1% conventional when -r zoned is specified without
> further options.  This mimics an SMR HDD and works well with tests.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  libxfs/init.c     |   2 +-
>  mkfs/proto.c      |   3 +-
>  mkfs/xfs_mkfs.c   | 553 ++++++++++++++++++++++++++++++++++++++++++----
>  repair/agheader.c |   2 +-
>  4 files changed, 518 insertions(+), 42 deletions(-)
> 
> diff --git a/libxfs/init.c b/libxfs/init.c
> index a186369f3fd8..393a94673f7e 100644
> --- a/libxfs/init.c
> +++ b/libxfs/init.c
> @@ -251,7 +251,7 @@ libxfs_close_devices(
>  		libxfs_device_close(&li->data);
>  	if (li->log.dev && li->log.dev != li->data.dev)
>  		libxfs_device_close(&li->log);
> -	if (li->rt.dev)
> +	if (li->rt.dev && li->rt.dev != li->data.dev)
>  		libxfs_device_close(&li->rt);
>  }
>  
> diff --git a/mkfs/proto.c b/mkfs/proto.c
> index 7f56a3d82a06..7f80bef838be 100644
> --- a/mkfs/proto.c
> +++ b/mkfs/proto.c
> @@ -1144,7 +1144,8 @@ rtinit_groups(
>  				fail(_("rtrmap rtsb init failed"), error);
>  		}
>  
> -		rtfreesp_init(rtg);
> +		if (!xfs_has_zoned(mp))
> +			rtfreesp_init(rtg);
>  	}
>  }
>  
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 39e3349205fb..133ede8d8483 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -6,6 +6,8 @@
>  #include "libfrog/util.h"
>  #include "libxfs.h"
>  #include <ctype.h>
> +#include <linux/blkzoned.h>
> +#include "libxfs/xfs_zones.h"
>  #include "xfs_multidisk.h"
>  #include "libxcmd.h"
>  #include "libfrog/fsgeom.h"
> @@ -135,6 +137,9 @@ enum {
>  	R_RGCOUNT,
>  	R_RGSIZE,
>  	R_CONCURRENCY,
> +	R_ZONED,
> +	R_START,
> +	R_RESERVED,
>  	R_MAX_OPTS,
>  };
>  
> @@ -739,6 +744,9 @@ static struct opt_params ropts = {
>  		[R_RGCOUNT] = "rgcount",
>  		[R_RGSIZE] = "rgsize",
>  		[R_CONCURRENCY] = "concurrency",
> +		[R_ZONED] = "zoned",
> +		[R_START] = "start",
> +		[R_RESERVED] = "reserved",
>  		[R_MAX_OPTS] = NULL,
>  	},
>  	.subopt_params = {
> @@ -804,6 +812,28 @@ static struct opt_params ropts = {
>  		  .maxval = INT_MAX,
>  		  .defaultval = 1,
>  		},
> +		{ .index = R_ZONED,
> +		  .conflicts = { { &ropts, R_EXTSIZE },
> +				 { NULL, LAST_CONFLICT } },
> +		  .minval = 0,
> +		  .maxval = 1,
> +		  .defaultval = 1,
> +		},
> +		{ .index = R_START,
> +		  .conflicts = { { &ropts, R_DEV },
> +				 { NULL, LAST_CONFLICT } },
> +		  .convert = true,
> +		  .minval = 0,
> +		  .maxval = LLONG_MAX,
> +		  .defaultval = SUBOPT_NEEDS_VAL,
> +		},
> +		{ .index = R_RESERVED,
> +		  .conflicts = { { NULL, LAST_CONFLICT } },
> +		  .convert = true,
> +		  .minval = 0,
> +		  .maxval = LLONG_MAX,
> +		  .defaultval = SUBOPT_NEEDS_VAL,
> +		},
>  	},
>  };
>  
> @@ -1012,6 +1042,8 @@ struct sb_feat_args {
>  	bool	nortalign;
>  	bool	nrext64;
>  	bool	exchrange;		/* XFS_SB_FEAT_INCOMPAT_EXCHRANGE */
> +	bool	zoned;
> +	bool	zone_gaps;
>  
>  	uint16_t qflags;
>  };
> @@ -1035,6 +1067,8 @@ struct cli_params {
>  	char	*lsu;
>  	char	*rtextsize;
>  	char	*rtsize;
> +	char	*rtstart;
> +	uint64_t rtreserved;
>  
>  	/* parameters where 0 is a valid CLI value */
>  	int	dsunit;
> @@ -1121,6 +1155,8 @@ struct mkfs_params {
>  	char		*label;
>  
>  	struct sb_feat_args	sb_feat;
> +	uint64_t	rtstart;
> +	uint64_t	rtreserved;
>  };
>  
>  /*
> @@ -1172,7 +1208,7 @@ usage( void )
>  /* prototype file */	[-p fname]\n\
>  /* quiet */		[-q]\n\
>  /* realtime subvol */	[-r extsize=num,size=num,rtdev=xxx,rgcount=n,rgsize=n,\n\
> -			    concurrency=num]\n\
> +			    concurrency=num,zoned=0|1,start=n,reserved=n]\n\
>  /* sectorsize */	[-s size=num]\n\
>  /* version */		[-V]\n\
>  			devicename\n\
> @@ -1539,6 +1575,30 @@ discard_blocks(int fd, uint64_t nsectors, int quiet)
>  		printf("Done.\n");
>  }
>  
> +static void
> +reset_zones(struct mkfs_params *cfg, int fd, uint64_t start_sector,
> +		uint64_t nsectors, int quiet)
> +{
> +	struct blk_zone_range range = {
> +		.sector		= start_sector,
> +		.nr_sectors	= nsectors,
> +	};
> +
> +	if (!quiet) {
> +		printf("Resetting zones...");
> +		fflush(stdout);
> +	}
> +
> +	if (ioctl(fd, BLKRESETZONE, &range) < 0) {
> +		if (!quiet)
> +			printf(" FAILED\n");

Should we print /why/ the zone reset failed?
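
Something along these lines, as a rough standalone sketch.  The helper name
and message format are made up for illustration and are not part of the
patch; the only real calls are snprintf() and strerror().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a failure message that includes the errno
 * text.  The caller passes the errno value saved right after the failed
 * ioctl(), before any other library call can overwrite it.
 */
static int
format_reset_failure(char *buf, size_t len, int saved_errno)
{
	return snprintf(buf, len, " FAILED: %s\n", strerror(saved_errno));
}
```

In reset_zones() that would mean copying errno into a local immediately after
the BLKRESETZONE ioctl fails and printing the formatted text instead of the
bare " FAILED".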

> +		exit(1);
> +	}
> +
> +	if (!quiet)
> +		printf("Done.\n");
> +}
> +
>  static __attribute__((noreturn)) void
>  illegal_option(
>  	const char		*value,
> @@ -2144,6 +2204,15 @@ rtdev_opts_parser(
>  	case R_CONCURRENCY:
>  		set_rtvol_concurrency(opts, subopt, cli, value);
>  		break;
> +	case R_ZONED:
> +		cli->sb_feat.zoned = getnum(value, opts, subopt);
> +		break;
> +	case R_START:
> +		cli->rtstart = getstr(value, opts, subopt);
> +		break;
> +	case R_RESERVED:
> +		cli->rtreserved = getnum(value, opts, subopt);
> +		break;
>  	default:
>  		return -EINVAL;
>  	}
> @@ -2445,7 +2514,208 @@ _("Version 1 logs do not support sector size %d\n"),
>  _("log stripe unit specified, using v2 logs\n"));
>  		cli->sb_feat.log_version = 2;
>  	}
> +}
> +
> +struct zone_info {
> +	/* number of zones, conventional or sequential */
> +	unsigned int		nr_zones;
> +	/* number of conventional zones */
> +	unsigned int		nr_conv_zones;
> +
> +	/* size of the address space for a zone, in 512b blocks */
> +	xfs_daddr_t		zone_size;
> +	/* write capacity of a zone, in 512b blocks */
> +	xfs_daddr_t		zone_capacity;
> +};
>  
> +struct zone_topology {
> +	struct zone_info	data;
> +	struct zone_info	rt;
> +	struct zone_info	log;
> +};
> +
> +/* random size that allows efficient processing */
> +#define ZONES_PER_IOCTL			16384
> +
> +static int report_zones(const char *name, struct zone_info *zi)
> +{
> +	struct blk_zone_report *rep;
> +	size_t rep_size;
> +	struct stat st;
> +	unsigned int i, n = 0;
> +	uint64_t device_size;
> +	uint64_t sector = 0;
> +	bool found_seq = false;
> +	int ret = 0;
> +	int fd;

Nit: indenting

> +
> +	fd = open(name, O_RDONLY);
> +	if (fd < 0)
> +		return -EIO;
> +
> +	if (fstat(fd, &st) < 0) {
> +		ret = -EIO;
> +		goto out_close;
> +	}
> +        if (!S_ISBLK(st.st_mode))

    ^^^^^^ especially here

> +		goto out_close;
> +
> +	if (ioctl(fd, BLKGETSIZE64, &device_size)) {
> +		ret = -EIO;

ret = errno; ?  But then...

> +		goto out_close;
> +	}

...what's the point in returning errors if the caller never checks?
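
To make the two points concrete, here is a small standalone sketch of the
pattern I have in mind.  probe_step() and probe_all() are hypothetical
stand-ins, not functions from the patch: the first keeps the specific cause
as a negative errno instead of a blanket -EIO, the second shows a caller that
stops at the first failure instead of ignoring the return value.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for one probe step: returns 0 on success or a
 * negative errno value, preserving the cause of the failure.
 */
static int
probe_step(int syscall_ret, int saved_errno)
{
	if (syscall_ret < 0)
		return -saved_errno;
	return 0;
}

/*
 * Hypothetical caller: propagate the first failure rather than dropping it.
 */
static int
probe_all(const int *rets, const int *errs, int n)
{
	int	i, error;

	for (i = 0; i < n; i++) {
		error = probe_step(rets[i], errs[i]);
		if (error)
			return error;
	}
	return 0;
}
```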

> +	if (ioctl(fd, BLKGETZONESZ, &zi->zone_size) || !zi->zone_size)
> +		goto out_close; /* not zoned */
> +
> +	device_size /= 512; /* BLKGETSIZE64 reports a byte value */

BTOBB

> +	zi->nr_zones = device_size / zi->zone_size;
> +	zi->nr_conv_zones = 0;
> +
> +	rep_size = sizeof(struct blk_zone_report) +
> +		   sizeof(struct blk_zone) * ZONES_PER_IOCTL;
> +	rep = malloc(rep_size);
> +	if (!rep) {
> +		ret = -ENOMEM;
> +		goto out_close;
> +	}
> +
> +	while (n < zi->nr_zones) {
> +		struct blk_zone *zones = (struct blk_zone *)(rep + 1);
> +
> +		memset(rep, 0, rep_size);
> +		rep->sector = sector;
> +		rep->nr_zones = ZONES_PER_IOCTL;
> +
> +		ret = ioctl(fd, BLKREPORTZONE, rep);
> +		if (ret) {
> +			fprintf(stderr,
> +_("ioctl(BLKREPORTZONE) failed: %d!\n"), ret);
> +			goto out_free;
> +		}
> +		if (!rep->nr_zones)
> +			break;
> +
> +		for (i = 0; i < rep->nr_zones; i++) {
> +			if (n >= zi->nr_zones)
> +				break;
> +
> +			if (zones[i].len != zi->zone_size) {
> +				fprintf(stderr,
> +_("Inconsistent zone size!\n"));
> +				ret = -EIO;
> +				goto out_free;
> +			}
> +
> +			switch (zones[i].type) {
> +			case BLK_ZONE_TYPE_CONVENTIONAL:
> +				/*
> +				 * We can only use the conventional space at the
> +				 * start of the device for metadata, so don't
> +				 * count later conventional zones.  This is
> +				 * not an error because we can use them for data
> +				 * just fine.
> +				 */
> +				if (!found_seq)
> +					zi->nr_conv_zones++;
> +				break;
> +			case BLK_ZONE_TYPE_SEQWRITE_REQ:
> +				found_seq = true;
> +				break;
> +			case BLK_ZONE_TYPE_SEQWRITE_PREF:
> +				fprintf(stderr,
> +_("Sequential write preferred zones not supported.\n"));
> +				ret = -EIO;
> +				goto out_free;
> +			default:
> +				fprintf(stderr,
> +_("Unknown zone type (0x%x) found.\n"), zones[i].type);
> +				ret = -EIO;
> +				goto out_free;
> +			}
> +
> +			if (!n) {
> +				zi->zone_capacity = zones[i].capacity;
> +				if (zi->zone_capacity > zi->zone_size) {
> +					fprintf(stderr,
> +_("Zone capacity larger than zone size!\n"));
> +					ret = -EIO;
> +					goto out_free;
> +				}
> +			} else if (zones[i].capacity != zi->zone_capacity) {
> +				fprintf(stderr,
> +_("Inconsistent zone capacity!\n"));
> +				ret = -EIO;
> +				goto out_free;
> +			}
> +
> +			n++;
> +		}
> +		sector = zones[rep->nr_zones - 1].start +
> +			 zones[rep->nr_zones - 1].len;
> +	}
> +
> +out_free:
> +	free(rep);
> +out_close:
> +	close(fd);
> +	return ret;
> +}
> +
> +static void
> +validate_zoned(
> +	struct mkfs_params	*cfg,
> +	struct cli_params	*cli,
> +	struct mkfs_default_params *dft,
> +	struct zone_topology	*zt)
> +{
> +	if (!cli->xi->data.isfile) {
> +		report_zones(cli->xi->data.name, &zt->data);
> +		if (zt->data.nr_zones) {
> +			if (!zt->data.nr_conv_zones) {
> +				fprintf(stderr,
> +_("Data devices requires conventional zones.\n"));
> +				usage();
> +			}
> +			if (zt->data.zone_capacity != zt->data.zone_size) {
> +				fprintf(stderr,
> +_("Zone capacity equal to Zone size required for conventional zones.\n"));
> +				usage();
> +			}
> +
> +			cli->sb_feat.zoned = true;
> +			cfg->rtstart =
> +				zt->data.nr_conv_zones * zt->data.zone_capacity;
> +		}
> +	}
> +
> +	if (cli->xi->rt.name && !cli->xi->rt.isfile) {
> +		report_zones(cli->xi->rt.name, &zt->rt);
> +		if (zt->rt.nr_zones && !cli->sb_feat.zoned)
> +			cli->sb_feat.zoned = true;
> +		if (zt->rt.zone_size != zt->rt.zone_capacity)
> +			cli->sb_feat.zone_gaps = true;
> +	}
> +
> +	if (cli->xi->log.name && !cli->xi->log.isfile) {
> +		report_zones(cli->xi->log.name, &zt->log);
> +		if (zt->log.nr_zones) {
> +			fprintf(stderr,
> +_("Zoned devices not supported as log device!\n"));

Too bad, we really ought to be able to write logs to a zone device.
But that's not in scope here.

> +			usage();
> +		}
> +	}
> +
> +	if (cli->rtstart) {
> +		if (cfg->rtstart) {

Er... why are we checking the variable that we set four lines down?
Is this supposed to be a check for external zoned rt devices?

> +			fprintf(stderr,
> +_("rtstart override not allowed on zoned devices.\n"));
> +			usage();
> +		}
> +		cfg->rtstart = getnum(cli->rtstart, &ropts, R_START) / 512;
> +	}
> +
> +	if (cli->rtreserved)
> +		cfg->rtreserved = cli->rtreserved;
>  }
>  
>  /*
> @@ -2670,7 +2940,37 @@ _("inode btree counters not supported without finobt support\n"));
>  		cli->sb_feat.inobtcnt = false;
>  	}
>  
> -	if (cli->xi->rt.name) {
> +	if (cli->sb_feat.zoned) {
> +		if (!cli->sb_feat.metadir) {
> +			if (cli_opt_set(&mopts, M_METADIR)) {
> +				fprintf(stderr,
> +_("zoned realtime device not supported without metadir support\n"));
> +				usage();
> +			}
> +			cli->sb_feat.metadir = true;
> +		}
> +		if (cli->rtextsize) {
> +			if (cli_opt_set(&ropts, R_EXTSIZE)) {
> +				fprintf(stderr,
> +_("rt extent size not supported on realtime devices with zoned mode\n"));
> +				usage();
> +			}
> +			cli->rtextsize = 0;
> +		}
> +	} else {
> +		if (cli->rtstart) {
> +			fprintf(stderr,
> +_("internal RT section only supported in zoned mode\n"));
> +			usage();
> +		}
> +		if (cli->rtreserved) {
> +			fprintf(stderr,
> +_("reserved RT blocks only supported in zoned mode\n"));
> +			usage();
> +		}
> +	}
> +
> +	if (cli->xi->rt.name || cfg->rtstart) {
>  		if (cli->rtextsize && cli->sb_feat.reflink) {
>  			if (cli_opt_set(&mopts, M_REFLINK)) {
>  				fprintf(stderr,
> @@ -2911,6 +3211,11 @@ validate_rtextsize(
>  			usage();
>  		}
>  		cfg->rtextblocks = 1;
> +	} else if (cli->sb_feat.zoned) {
> +		/*
> +		 * Zoned mode only supports a rtextsize of 1.
> +		 */
> +		cfg->rtextblocks = 1;
>  	} else {
>  		/*
>  		 * If realtime extsize has not been specified by the user,
> @@ -3315,7 +3620,8 @@ _("log stripe unit (%d bytes) is too large (maximum is 256KiB)\n"
>  static void
>  open_devices(
>  	struct mkfs_params	*cfg,
> -	struct libxfs_init	*xi)
> +	struct libxfs_init	*xi,
> +	struct zone_topology	*zt)
>  {
>  	uint64_t		sector_mask;
>  
> @@ -3330,6 +3636,34 @@ open_devices(
>  		usage();
>  	}
>  
> +	if (zt->data.nr_zones) {
> +		zt->rt.zone_size = zt->data.zone_size;
> +		zt->rt.zone_capacity = zt->data.zone_capacity;
> +		zt->rt.nr_zones = zt->data.nr_zones - zt->data.nr_conv_zones;
> +	} else if (cfg->sb_feat.zoned && !cfg->rtstart && !xi->rt.dev) {
> +		/*
> +		 * By default reserve at 1% of the total capacity (rounded up to
> +		 * the next power of two) for metadata, but match the minimum we
> +		 * enforce elsewhere. This matches what SMR HDDs provide.
> +		 */
> +		uint64_t rt_target_size = max((xi->data.size + 99) / 100,
> +					      BTOBB(300 * 1024 * 1024));
> +
> +		cfg->rtstart = 1;
> +		while (cfg->rtstart < rt_target_size)
> +			cfg->rtstart <<= 1;
> +	}
> +
> +	if (cfg->rtstart) {
> +		if (cfg->rtstart >= xi->data.size) {
> +			fprintf(stderr,
> + _("device size %lld too small for zoned allocator\n"), xi->data.size);
> +			usage();
> +		}
> +		xi->rt.size = xi->data.size - cfg->rtstart;
> +		xi->data.size = cfg->rtstart;
> +	}
> +
>  	/*
>  	 * Ok, Linux only has a 1024-byte resolution on device _size_,
>  	 * and the sizes below are in basic 512-byte blocks,
> @@ -3348,17 +3682,42 @@ open_devices(
>  
>  static void
>  discard_devices(
> +	struct mkfs_params	*cfg,
>  	struct libxfs_init	*xi,
> +	struct zone_topology	*zt,
>  	int			quiet)
>  {
>  	/*
>  	 * This function has to be called after libxfs has been initialized.
>  	 */
>  
> -	if (!xi->data.isfile)
> -		discard_blocks(xi->data.fd, xi->data.size, quiet);
> -	if (xi->rt.dev && !xi->rt.isfile)
> -		discard_blocks(xi->rt.fd, xi->rt.size, quiet);
> +	if (!xi->data.isfile) {
> +		uint64_t	nsectors = xi->data.size;
> +
> +		if (cfg->rtstart && zt->data.nr_zones) {
> +			/*
> +			 * Note that the zone reset here includes the LBA range
> +			 * for the data device.
> +			 *
> +			 * This is because doing a single zone reset all on the
> +			 * entire device (which the kernel automatically does
> +			 * for us for a full device range) is a lot faster than
> +			 * resetting each zone individually and resetting
> +			 * the conventional zones used for the data device is a
> +			 * no-op.
> +			 */
> +			reset_zones(cfg, xi->data.fd, 0,
> +					cfg->rtstart + xi->rt.size, quiet);
> +			nsectors -= cfg->rtstart;
> +		}
> +		discard_blocks(xi->data.fd, nsectors, quiet);
> +	}
> +	if (xi->rt.dev && !xi->rt.isfile) {
> +		if (zt->rt.nr_zones)
> +			reset_zones(cfg, xi->rt.fd, 0, xi->rt.size, quiet);
> +		else
> +			discard_blocks(xi->rt.fd, xi->rt.size, quiet);
> +	}
>  	if (xi->log.dev && xi->log.dev != xi->data.dev && !xi->log.isfile)
>  		discard_blocks(xi->log.fd, xi->log.size, quiet);
>  }
> @@ -3477,11 +3836,12 @@ reported by the device (%u).\n"),
>  static void
>  validate_rtdev(
>  	struct mkfs_params	*cfg,
> -	struct cli_params	*cli)
> +	struct cli_params	*cli,
> +	struct zone_topology	*zt)
>  {
>  	struct libxfs_init	*xi = cli->xi;
>  
> -	if (!xi->rt.dev) {
> +	if (!xi->rt.dev && !cfg->rtstart) {
>  		if (cli->rtsize) {
>  			fprintf(stderr,
>  _("size specified for non-existent rt subvolume\n"));
> @@ -3501,7 +3861,7 @@ _("size specified for non-existent rt subvolume\n"));
>  	if (cli->rtsize) {
>  		if (cfg->rtblocks > DTOBT(xi->rt.size, cfg->blocklog)) {
>  			fprintf(stderr,
> -_("size %s specified for rt subvolume is too large, maxi->um is %lld blocks\n"),
> +_("size %s specified for rt subvolume is too large, maximum is %lld blocks\n"),
>  				cli->rtsize,
>  				(long long)DTOBT(xi->rt.size, cfg->blocklog));
>  			usage();
> @@ -3512,6 +3872,9 @@ _("size %s specified for rt subvolume is too large, maxi->um is %lld blocks\n"),
>  reported by the device (%u).\n"),
>  				cfg->sectorsize, xi->rt.bsize);
>  		}
> +	} else if (zt->rt.nr_zones) {
> +		cfg->rtblocks = DTOBT(zt->rt.nr_zones * zt->rt.zone_capacity,
> +				      cfg->blocklog);
>  	} else {
>  		/* grab volume size */
>  		cfg->rtblocks = DTOBT(xi->rt.size, cfg->blocklog);
> @@ -3950,6 +4313,42 @@ out:
>  	cfg->rgcount = howmany(cfg->rtblocks, cfg->rgsize);
>  }
>  
> +static void
> +validate_rtgroup_geometry(
> +	struct mkfs_params	*cfg)
> +{
> +	if (cfg->rgsize > XFS_MAX_RGBLOCKS) {
> +		fprintf(stderr,
> +_("realtime group size (%llu) must be less than the maximum (%u)\n"),
> +				(unsigned long long)cfg->rgsize,
> +				XFS_MAX_RGBLOCKS);
> +		usage();
> +	}
> +
> +	if (cfg->rgsize % cfg->rtextblocks != 0) {
> +		fprintf(stderr,
> +_("realtime group size (%llu) not a multiple of rt extent size (%llu)\n"),
> +				(unsigned long long)cfg->rgsize,
> +				(unsigned long long)cfg->rtextblocks);
> +		usage();
> +	}
> +
> +	if (cfg->rgsize <= cfg->rtextblocks) {
> +		fprintf(stderr,
> +_("realtime group size (%llu) must be at least two realtime extents\n"),
> +				(unsigned long long)cfg->rgsize);
> +		usage();
> +	}
> +
> +	if (cfg->rgcount > XFS_MAX_RGNUMBER) {
> +		fprintf(stderr,
> +_("realtime group count (%llu) must be less than the maximum (%u)\n"),
> +				(unsigned long long)cfg->rgcount,
> +				XFS_MAX_RGNUMBER);
> +		usage();
> +	}
> +}

Hoisting this out probably should've been a separate patch.

<snip>

> diff --git a/repair/agheader.c b/repair/agheader.c
> index 5bb4e47e0c5b..048e6c3143b5 100644
> --- a/repair/agheader.c
> +++ b/repair/agheader.c

Should this be in a different patch?

--D

> @@ -486,7 +486,7 @@ secondary_sb_whack(
>  	 * size is the size of data which is valid for this sb.
>  	 */
>  	if (xfs_sb_version_haszoned(sb))
> -		size = offsetofend(struct xfs_dsb, sb_rtstart);
> +		size = offsetofend(struct xfs_dsb, sb_rtreserved);
>  	else if (xfs_sb_version_hasmetadir(sb))
>  		size = offsetofend(struct xfs_dsb, sb_pad);
>  	else if (xfs_sb_version_hasmetauuid(sb))
> -- 
> 2.47.2
> 
> 


* Re: [PATCH 32/45] xfs_mkfs: default to rtinherit=1 for zoned file systems
  2025-04-09  7:55 ` [PATCH 32/45] xfs_mkfs: default to rtinherit=1 for zoned file systems Christoph Hellwig
@ 2025-04-09 18:59   ` Darrick J. Wong
  2025-04-10  6:45     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 18:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:35AM +0200, Christoph Hellwig wrote:
> Zoned file systems are intended to use sequential write required zones
> (or areas treated as such) for data, and the main data device only for
> metadata.  rtinherit=1 is the way to achieve that, so enable it by
> default.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  mkfs/xfs_mkfs.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 6b5eb9eb140a..7d4114e8a2ea 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -2957,6 +2957,13 @@ _("rt extent size not supported on realtime devices with zoned mode\n"));
>  			}
>  			cli->rtextsize = 0;
>  		}
> +
> +		/*
> +		 * Force the rtinherit flag on the root inode for zoned file
> +		 * systems as they use the data device only as a metadata
> +		 * container.
> +		 */
> +		cli->fsx.fsx_xflags |= FS_XFLAG_RTINHERIT;

If the caller specified -d rtinherit=0, this will override their choice.
Perhaps only do this if !cli_opt_set(&dopts, D_RTINHERIT) ?  I can
imagine people trying to combine a large SSD and a large SMR drive and
wanting to be able to store files on both devices.
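
A minimal standalone model of that decision, under stated assumptions: the
function and the SKETCH_ constant are invented for illustration (the constant
is assumed to equal FS_XFLAG_RTINHERIT from <linux/fs.h>), and the
rtinherit_set_by_user argument stands in for cli_opt_set(&dopts,
D_RTINHERIT).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of FS_XFLAG_RTINHERIT; redefined so the sketch is standalone. */
#define SKETCH_XFLAG_RTINHERIT	0x00000100

/*
 * Hypothetical model: default rtinherit on for zoned file systems, but
 * leave the flags alone when the user passed an explicit -d rtinherit=
 * value.
 */
static uint32_t
zoned_default_xflags(uint32_t xflags, bool zoned, bool rtinherit_set_by_user)
{
	if (zoned && !rtinherit_set_by_user)
		xflags |= SKETCH_XFLAG_RTINHERIT;
	return xflags;
}
```

With this shape an explicit rtinherit=0 survives, and everyone else still
gets the new default.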

--D

>  	} else {
>  		if (cli->rtstart) {
>  			fprintf(stderr,
> -- 
> 2.47.2
> 
> 


* Re: [PATCH 33/45] xfs_mkfs: reflink conflicts with zoned file systems for now
  2025-04-09  7:55 ` [PATCH 33/45] xfs_mkfs: reflink conflicts with zoned file systems for now Christoph Hellwig
@ 2025-04-09 19:00   ` Darrick J. Wong
  2025-04-10  6:46     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 19:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:36AM +0200, Christoph Hellwig wrote:
> Until GC is enhanced to not unshare reflinked blocks, we had better
> prohibit this combination.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

...or gc figures out how to do reflinked moves.  Either way,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  mkfs/xfs_mkfs.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 7d4114e8a2ea..0e351f9efc32 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -2957,6 +2957,14 @@ _("rt extent size not supported on realtime devices with zoned mode\n"));
>  			}
>  			cli->rtextsize = 0;
>  		}
> +		if (cli->sb_feat.reflink) {
> +			if (cli_opt_set(&mopts, M_REFLINK)) {
> +				fprintf(stderr,
> +_("reflink not supported on realtime devices with zoned mode specified\n"));
> +				usage();
> +			}
> +			cli->sb_feat.reflink = false;
> +		}
>  
>  		/*
>  		 * Force the rtinherit flag on the root inode for zoned file
> -- 
> 2.47.2
> 
> 


* Re: [PATCH 34/45] xfs_mkfs: document the new zoned options in the man page
  2025-04-09  7:55 ` [PATCH 34/45] xfs_mkfs: document the new zoned options in the man page Christoph Hellwig
@ 2025-04-09 19:00   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 19:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:37AM +0200, Christoph Hellwig wrote:
> Add documentation for the zoned file system specific options.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  man/man8/mkfs.xfs.8.in | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
> index 37e3a88e7ac7..27df7f4546c7 100644
> --- a/man/man8/mkfs.xfs.8.in
> +++ b/man/man8/mkfs.xfs.8.in
> @@ -1248,6 +1248,23 @@ The magic value of
>  .I 0
>  forces use of the older rtgroups geometry calculations that is used for
>  mechanical storage.
> +.TP
> +.BI zoned= value
> +Controls if the zoned allocator is used for the realtime device.
> +The value is either 0 to disable the feature, or 1 to enable it.
> +Defaults to 1 for zoned block device, else 0.
> +.TP
> +.BI start= value
> +Controls the start of the internal realtime section.  Defaults to 0
> +for conventional block devices, or the start of the first sequential
> +required zone for zoned block devices.
> +This option is only valid if the zoned realtime allocator is used.
> +.TP
> +.BI reserved= value
> +Controls the amount of space in the realtime section that is reserved for
> +internal use by garbage collection and reorganization algorithms.
> +Defaults 0 if not set.

"Defaults to 0"

With that fixed,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> +This option is only valid if the zoned realtime allocator is used.
>  .RE
>  .PP
>  .PD 0
> -- 
> 2.47.2
> 
> 


* Re: [PATCH 35/45] libfrog: report the zoned geometry
  2025-04-09  7:55 ` [PATCH 35/45] libfrog: report the zoned geometry Christoph Hellwig
@ 2025-04-09 19:01   ` Darrick J. Wong
  2025-04-10  7:02     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 19:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:38AM +0200, Christoph Hellwig wrote:
> Also fix up to report all the zoned information in a separate line,
> which also helps with alignment.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Could all these xfs_report_geom changes be a single patch?
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  libfrog/fsgeom.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
> index 5c4ba29ca9ac..b4107b133861 100644
> --- a/libfrog/fsgeom.c
> +++ b/libfrog/fsgeom.c
> @@ -70,7 +70,8 @@ xfs_report_geom(
>  "log      =%-22s bsize=%-6d blocks=%u, version=%d\n"
>  "         =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n"
>  "realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n"
> -"         =%-22s rgcount=%-4d rgsize=%u extents, zoned=%d\n"),
> +"         =%-22s rgcount=%-4d rgsize=%u extents\n"
> +"         =%-22s zoned=%-6d start=%llu reserved=%llu\n"),
>  		mntpoint, geo->inodesize, geo->agcount, geo->agblocks,
>  		"", geo->sectsize, attrversion, projid32bit,
>  		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
> @@ -86,7 +87,8 @@ xfs_report_geom(
>  		!geo->rtblocks ? _("none") : rtname ? rtname : _("internal"),
>  		geo->rtextsize * geo->blocksize, (unsigned long long)geo->rtblocks,
>  			(unsigned long long)geo->rtextents,
> -		"", geo->rgcount, geo->rgextents, zoned);
> +		"", geo->rgcount, geo->rgextents,
> +		"", zoned, geo->rtstart, geo->rtreserved);
>  }
>  
>  /* Try to obtain the xfs geometry.  On error returns a negative error code. */
> -- 
> 2.47.2
> 
> 


* Re: [PATCH 31/45] xfs_mkfs: calculate zone overprovisioning when specifying size
  2025-04-09  7:55 ` [PATCH 31/45] xfs_mkfs: calculate zone overprovisioning when specifying size Christoph Hellwig
@ 2025-04-09 19:06   ` Darrick J. Wong
  2025-04-10  7:00     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 19:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:34AM +0200, Christoph Hellwig wrote:
> When size is specified for zoned file systems, calculate the required
> over provisioning to back the requested capacity.
> 
> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  mkfs/xfs_mkfs.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 133ede8d8483..6b5eb9eb140a 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -4413,6 +4413,49 @@ _("rgsize (%s) not a multiple of fs blk size (%d)\n"),
>  			NBBY * (cfg->blocksize - sizeof(struct xfs_rtbuf_blkinfo)));
>  }
>  
> +/*
> + * If we're creating a zoned filesystem and the user specified a size, add
> + * enough over-provisioning to be able to back the requested amount of
> + * writable space.
> + */
> +static void
> +adjust_nr_zones(
> +	struct mkfs_params	*cfg,
> +	struct cli_params	*cli,
> +	struct libxfs_init	*xi,
> +	struct zone_topology	*zt)
> +{
> +	uint64_t		new_rtblocks, slack;
> +	unsigned int		max_zones;
> +
> +	if (zt->rt.nr_zones)
> +		max_zones = zt->rt.nr_zones;
> +	else
> +		max_zones = DTOBT(xi->rt.size, cfg->blocklog) / cfg->rgsize;
> +
> +	if (!cli->rgcount)
> +		cfg->rgcount += XFS_RESERVED_ZONES;
> +	if (cfg->rgcount > max_zones) {
> +		cfg->rgcount = max_zones;
> +		fprintf(stderr,
> +_("Warning: not enough zones for backing requested rt size due to\n"
> +  "over-provisioning needs, writeable size will be less than %s\n"),

Nit: "writable", not "writeable"

With that fixed,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> +			cli->rtsize);
> +	}
> +	new_rtblocks = (cfg->rgcount * cfg->rgsize);
> +	slack = (new_rtblocks - cfg->rtblocks) % cfg->rgsize;
> +
> +	cfg->rtblocks = new_rtblocks;
> +	cfg->rtextents = cfg->rtblocks / cfg->rtextblocks;
> +
> +	/*
> +	 * Add the slack to the end of the last zone to the reserved blocks.
> +	 * This ensures the visible user capacity is exactly the one that the
> +	 * user asked for.
> +	 */
> +	cfg->rtreserved += (slack * cfg->blocksize);
> +}
> +
>  static void
>  calculate_zone_geometry(
>  	struct mkfs_params	*cfg,
> @@ -4485,6 +4528,9 @@ _("rgsize (%s) not a multiple of fs blk size (%d)\n"),
>  		}
>  	}
>  
> +	if (cli->rtsize || cli->rgcount)
> +		adjust_nr_zones(cfg, cli, xi, zt);
> +
>  	if (cfg->rgcount < XFS_MIN_ZONES)  {
>  		fprintf(stderr,
>  _("realtime group count (%llu) must be greater than the minimum zone count (%u)\n"),
> -- 
> 2.47.2
> 
> 
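
For readers following the arithmetic in adjust_nr_zones() above, here is a
minimal standalone sketch of the slack computation.  The struct and function
names are hypothetical stand-ins (not xfsprogs code), and it assumes that
rgcount already includes whatever extra zones were added for
over-provisioning.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the mkfs_params fields used here. */
struct zone_cfg {
	uint64_t	rtblocks;	/* requested RT size, in fs blocks */
	uint64_t	rgsize;		/* zone (rtgroup) size, in fs blocks */
	uint64_t	rgcount;	/* zone count, extra zones included */
	uint64_t	rtreserved;	/* reserved space, in bytes */
	uint32_t	blocksize;	/* fs block size, in bytes */
};

/*
 * Round the RT section up to whole zones and account the unused tail of
 * the last user-visible zone as reserved space, mirroring the quoted hunk.
 */
static void
add_zone_slack(struct zone_cfg *cfg)
{
	uint64_t	new_rtblocks = cfg->rgcount * cfg->rgsize;
	uint64_t	slack = (new_rtblocks - cfg->rtblocks) % cfg->rgsize;

	cfg->rtblocks = new_rtblocks;
	cfg->rtreserved += slack * cfg->blocksize;
}
```

With made-up numbers (1000 requested blocks, 256-block zones, 6 zones in
total), the RT section grows to 1536 blocks and the slack is
(1536 - 1000) % 256 = 24 blocks, i.e. 98304 bytes with 4096-byte blocks.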

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 36/45] man: document XFS_FSOP_GEOM_FLAGS_ZONED
  2025-04-09  7:55 ` [PATCH 36/45] man: document XFS_FSOP_GEOM_FLAGS_ZONED Christoph Hellwig
@ 2025-04-09 19:13   ` Darrick J. Wong
  2025-04-10  6:53     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 19:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:39AM +0200, Christoph Hellwig wrote:
> Document the new zoned feature flag and the two new fields added
> with it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  man/man2/ioctl_xfs_fsgeometry.2 | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/man/man2/ioctl_xfs_fsgeometry.2 b/man/man2/ioctl_xfs_fsgeometry.2
> index 502054f391e9..df7234e603d2 100644
> --- a/man/man2/ioctl_xfs_fsgeometry.2
> +++ b/man/man2/ioctl_xfs_fsgeometry.2
> @@ -50,7 +50,9 @@ struct xfs_fsop_geom {
>  	__u32         sick;
>  	__u32         checked;
>  	__u64         rgextents;
> -	__u64         reserved[16];
> +	__u64	      rtstart;
> +	__u64         rtreserved;
> +	__u64         reserved[14];
>  };
>  .fi
>  .in
> @@ -143,6 +145,20 @@ for more details.
>  .I rgextents
>  Is the number of RT extents in each rtgroup.
>  .PP
> +.I rtstart
> +Start of the internal RT device in fsblocks.  0 if an external log device

external RT device?

--D

> +is used.
> +This field is meaningful only if the flag
> +.B  XFS_FSOP_GEOM_FLAGS_ZONED
> +is set.
> +.PP
> +.I rtreserved
> +The amount of space in the realtime section that is reserved for internal use
> +by garbage collection and reorganization algorithms in fsblocks.
> +This field is meaningful only if the flag
> +.B  XFS_FSOP_GEOM_FLAGS_ZONED
> +is set.
> +.PP
>  .I reserved
>  is set to zero.
>  .SH FILESYSTEM FEATURE FLAGS
> @@ -221,6 +237,9 @@ Filesystem can exchange file contents atomically via XFS_IOC_EXCHANGE_RANGE.
>  .TP
>  .B XFS_FSOP_GEOM_FLAGS_METADIR
>  Filesystem contains a metadata directory tree.
> +.TP
> +.B XFS_FSOP_GEOM_FLAGS_ZONED
> +Filesystem uses the zoned allocator for the RT device.
>  .RE
>  .SH XFS METADATA HEALTH REPORTING
>  .PP
> -- 
> 2.47.2
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 41/45] xfs_spaceman: handle internal RT devices
  2025-04-09  7:55 ` [PATCH 41/45] xfs_spaceman: handle internal RT devices Christoph Hellwig
@ 2025-04-09 19:29   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 19:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:44AM +0200, Christoph Hellwig wrote:
> Handle the synthetic fmr_device values for fsmap.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks correct to me
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  spaceman/freesp.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/spaceman/freesp.c b/spaceman/freesp.c
> index dfbec52a7160..9ad321c4843f 100644
> --- a/spaceman/freesp.c
> +++ b/spaceman/freesp.c
> @@ -140,12 +140,19 @@ scan_ag(
>  	if (agno != NULLAGNUMBER) {
>  		l->fmr_physical = cvt_agbno_to_b(xfd, agno, 0);
>  		h->fmr_physical = cvt_agbno_to_b(xfd, agno + 1, 0);
> -		l->fmr_device = h->fmr_device = file->fs_path.fs_datadev;
> +		if (file->xfd.fsgeom.rtstart)
> +			l->fmr_device = XFS_DEV_DATA;
> +		else
> +			l->fmr_device = file->fs_path.fs_datadev;
>  	} else {
>  		l->fmr_physical = 0;
>  		h->fmr_physical = ULLONG_MAX;
> -		l->fmr_device = h->fmr_device = file->fs_path.fs_rtdev;
> +		if (file->xfd.fsgeom.rtstart)
> +			l->fmr_device = XFS_DEV_RT;
> +		else
> +			l->fmr_device = file->fs_path.fs_rtdev;
>  	}
> +		h->fmr_device = l->fmr_device;
>  	h->fmr_owner = ULLONG_MAX;
>  	h->fmr_flags = UINT_MAX;
>  	h->fmr_offset = ULLONG_MAX;
> -- 
> 2.47.2
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 42/45] xfs_scrub: support internal RT sections
  2025-04-09  7:55 ` [PATCH 42/45] xfs_scrub: support internal RT sections Christoph Hellwig
@ 2025-04-09 19:30   ` Darrick J. Wong
  2025-04-10  6:57     ` Christoph Hellwig
  0 siblings, 1 reply; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 19:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:45AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  scrub/phase1.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/scrub/phase1.c b/scrub/phase1.c
> index d03a9099a217..e71cab7b7d90 100644
> --- a/scrub/phase1.c
> +++ b/scrub/phase1.c
> @@ -341,7 +341,8 @@ _("Kernel metadata repair facility is not available.  Use -n to scrub."));
>  _("Unable to find log device path."));
>  		return ECANCELED;
>  	}
> -	if (ctx->mnt.fsgeom.rtblocks && ctx->fsinfo.fs_rt == NULL) {
> +	if (ctx->mnt.fsgeom.rtblocks && ctx->fsinfo.fs_rt == NULL &&
> +	    !(ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_ZONED)) {

Shouldn't this be gated on ctx->mnt.fsgeom.rtstart == 0 instead of
!ZONED?  I think we still want to be able to do media scans of zoned
external rt devices.
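
A rough sketch of the check suggested above, gated on rtstart rather than on
the zoned flag.  The struct and helper are hypothetical simplifications for
illustration, not the actual xfs_scrub code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the geometry fields involved. */
struct geom_sketch {
	uint64_t	rtblocks;	/* size of the RT section, in fs blocks */
	uint64_t	rtstart;	/* nonzero only for an internal RT section */
};

/*
 * Return true if a separate RT device path is needed but was not found.
 * An internal RT section (rtstart != 0) lives on the data device, so no
 * separate path is expected.  An external zoned RT device still needs one.
 */
static bool
rt_path_missing(const struct geom_sketch *geo, bool have_rt_path)
{
	return geo->rtblocks && !have_rt_path && geo->rtstart == 0;
}
```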

--D

>  		str_error(ctx, ctx->mntpoint,
>  _("Unable to find realtime device path."));
>  		return ECANCELED;
> -- 
> 2.47.2
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 43/45] xfs_scrub: handle internal RT devices
  2025-04-09  7:55 ` [PATCH 43/45] xfs_scrub: handle internal RT devices Christoph Hellwig
@ 2025-04-09 19:34   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 19:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:46AM +0200, Christoph Hellwig wrote:
> Handle the synthetic fmr_device values.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks fine,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  scrub/phase6.c   | 65 ++++++++++++++++++++++++++++--------------------
>  scrub/phase7.c   | 28 +++++++++++++++------
>  scrub/spacemap.c | 17 +++++++++----
>  3 files changed, 70 insertions(+), 40 deletions(-)
> 
> diff --git a/scrub/phase6.c b/scrub/phase6.c
> index 2a52b2c92419..abf6f9713f1a 100644
> --- a/scrub/phase6.c
> +++ b/scrub/phase6.c
> @@ -56,12 +56,21 @@ dev_to_pool(
>  	struct media_verify_state	*vs,
>  	dev_t				dev)
>  {
> -	if (dev == ctx->fsinfo.fs_datadev)
> -		return vs->rvp_data;
> -	else if (dev == ctx->fsinfo.fs_logdev)
> -		return vs->rvp_log;
> -	else if (dev == ctx->fsinfo.fs_rtdev)
> -		return vs->rvp_realtime;
> +	if (ctx->mnt.fsgeom.rtstart) {
> +		if (dev == XFS_DEV_DATA)
> +			return vs->rvp_data;
> +		if (dev == XFS_DEV_LOG)
> +			return vs->rvp_log;
> +		if (dev == XFS_DEV_RT)
> +			return vs->rvp_realtime;
> +	} else {
> +		if (dev == ctx->fsinfo.fs_datadev)
> +			return vs->rvp_data;
> +		if (dev == ctx->fsinfo.fs_logdev)
> +			return vs->rvp_log;
> +		if (dev == ctx->fsinfo.fs_rtdev)
> +			return vs->rvp_realtime;
> +	}
>  	abort();
>  }
>  
> @@ -71,12 +80,21 @@ disk_to_dev(
>  	struct scrub_ctx	*ctx,
>  	struct disk		*disk)
>  {
> -	if (disk == ctx->datadev)
> -		return ctx->fsinfo.fs_datadev;
> -	else if (disk == ctx->logdev)
> -		return ctx->fsinfo.fs_logdev;
> -	else if (disk == ctx->rtdev)
> -		return ctx->fsinfo.fs_rtdev;
> +	if (ctx->mnt.fsgeom.rtstart) {
> +		if (disk == ctx->datadev)
> +			return XFS_DEV_DATA;
> +		if (disk == ctx->logdev)
> +			return XFS_DEV_LOG;
> +		if (disk == ctx->rtdev)
> +			return XFS_DEV_RT;
> +	} else {
> +		if (disk == ctx->datadev)
> +			return ctx->fsinfo.fs_datadev;
> +		if (disk == ctx->logdev)
> +			return ctx->fsinfo.fs_logdev;
> +		if (disk == ctx->rtdev)
> +			return ctx->fsinfo.fs_rtdev;
> +	}
>  	abort();
>  }
>  
> @@ -87,11 +105,9 @@ bitmap_for_disk(
>  	struct disk			*disk,
>  	struct media_verify_state	*vs)
>  {
> -	dev_t				dev = disk_to_dev(ctx, disk);
> -
> -	if (dev == ctx->fsinfo.fs_datadev)
> +	if (disk == ctx->datadev)
>  		return vs->d_bad;
> -	else if (dev == ctx->fsinfo.fs_rtdev)
> +	if (disk == ctx->rtdev)
>  		return vs->r_bad;
>  	return NULL;
>  }
> @@ -501,14 +517,11 @@ report_ioerr(
>  		.length			= length,
>  	};
>  	struct disk_ioerr_report	*dioerr = arg;
> -	dev_t				dev;
> -
> -	dev = disk_to_dev(dioerr->ctx, dioerr->disk);
>  
>  	/* Go figure out which blocks are bad from the fsmap. */
> -	keys[0].fmr_device = dev;
> +	keys[0].fmr_device = disk_to_dev(dioerr->ctx, dioerr->disk);
>  	keys[0].fmr_physical = start;
> -	keys[1].fmr_device = dev;
> +	keys[1].fmr_device = keys[0].fmr_device;
>  	keys[1].fmr_physical = start + length - 1;
>  	keys[1].fmr_owner = ULLONG_MAX;
>  	keys[1].fmr_offset = ULLONG_MAX;
> @@ -675,14 +688,12 @@ remember_ioerr(
>  	int				ret;
>  
>  	if (!length) {
> -		dev_t			dev = disk_to_dev(ctx, disk);
> -
> -		if (dev == ctx->fsinfo.fs_datadev)
> +		if (disk == ctx->datadev)
>  			vs->d_trunc = true;
> -		else if (dev == ctx->fsinfo.fs_rtdev)
> -			vs->r_trunc = true;
> -		else if (dev == ctx->fsinfo.fs_logdev)
> +		else if (disk == ctx->logdev)
>  			vs->l_trunc = true;
> +		else if (disk == ctx->rtdev)
> +			vs->r_trunc = true;
>  		return;
>  	}
>  
> diff --git a/scrub/phase7.c b/scrub/phase7.c
> index 01097b678798..e25502668b1c 100644
> --- a/scrub/phase7.c
> +++ b/scrub/phase7.c
> @@ -68,25 +68,37 @@ count_block_summary(
>  	void			*arg)
>  {
>  	struct summary_counts	*counts;
> +	bool			is_rt = false;
>  	unsigned long long	len;
>  	int			ret;
>  
> +	if (ctx->mnt.fsgeom.rtstart) {
> +		if (fsmap->fmr_device == XFS_DEV_LOG)
> +			return 0;
> +		if (fsmap->fmr_device == XFS_DEV_RT)
> +			is_rt = true;
> +	} else {
> +		if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
> +			return 0;
> +		if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev)
> +			is_rt = true;
> +	}
> +
>  	counts = ptvar_get((struct ptvar *)arg, &ret);
>  	if (ret) {
>  		str_liberror(ctx, -ret, _("retrieving summary counts"));
>  		return -ret;
>  	}
> -	if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
> -		return 0;
> +
>  	if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
>  	    fsmap->fmr_owner == XFS_FMR_OWN_FREE) {
>  		uint64_t	blocks;
>  
>  		blocks = cvt_b_to_off_fsbt(&ctx->mnt, fsmap->fmr_length);
> -		if (fsmap->fmr_device == ctx->fsinfo.fs_datadev)
> -			hist_add(&counts->datadev_hist, blocks);
> -		else if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev)
> +		if (is_rt)
>  			hist_add(&counts->rtdev_hist, blocks);
> +		else
> +			hist_add(&counts->datadev_hist, blocks);
>  		return 0;
>  	}
>  
> @@ -94,10 +106,10 @@ count_block_summary(
>  
>  	/* freesp btrees live in free space, need to adjust counters later. */
>  	if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
> -	    fsmap->fmr_owner == XFS_FMR_OWN_AG) {
> +	    fsmap->fmr_owner == XFS_FMR_OWN_AG)
>  		counts->agbytes += fsmap->fmr_length;
> -	}
> -	if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev) {
> +
> +	if (is_rt) {
>  		/* Count realtime extents. */
>  		counts->rbytes += len;
>  	} else {
> diff --git a/scrub/spacemap.c b/scrub/spacemap.c
> index c293ab44a528..1ee4d1946d3d 100644
> --- a/scrub/spacemap.c
> +++ b/scrub/spacemap.c
> @@ -103,9 +103,12 @@ scan_ag_rmaps(
>  	bperag = (off_t)ctx->mnt.fsgeom.agblocks *
>  		 (off_t)ctx->mnt.fsgeom.blocksize;
>  
> -	keys[0].fmr_device = ctx->fsinfo.fs_datadev;
> +	if (ctx->mnt.fsgeom.rtstart)
> +		keys[0].fmr_device = XFS_DEV_DATA;
> +	else
> +		keys[0].fmr_device = ctx->fsinfo.fs_datadev;
>  	keys[0].fmr_physical = agno * bperag;
> -	keys[1].fmr_device = ctx->fsinfo.fs_datadev;
> +	keys[1].fmr_device = keys[0].fmr_device;
>  	keys[1].fmr_physical = ((agno + 1) * bperag) - 1;
>  	keys[1].fmr_owner = ULLONG_MAX;
>  	keys[1].fmr_offset = ULLONG_MAX;
> @@ -140,9 +143,12 @@ scan_rtg_rmaps(
>  	off_t			bperrg = bytes_per_rtgroup(&ctx->mnt.fsgeom);
>  	int			ret;
>  
> -	keys[0].fmr_device = ctx->fsinfo.fs_rtdev;
> +	if (ctx->mnt.fsgeom.rtstart)
> +		keys[0].fmr_device = XFS_DEV_RT;
> +	else
> +		keys[0].fmr_device = ctx->fsinfo.fs_rtdev;
>  	keys[0].fmr_physical = (xfs_rtblock_t)rgno * bperrg;
> -	keys[1].fmr_device = ctx->fsinfo.fs_rtdev;
> +	keys[1].fmr_device = keys[0].fmr_device;
>  	keys[1].fmr_physical = ((rgno + 1) * bperrg) - 1;
>  	keys[1].fmr_owner = ULLONG_MAX;
>  	keys[1].fmr_offset = ULLONG_MAX;
> @@ -216,7 +222,8 @@ scan_log_rmaps(
>  {
>  	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
>  
> -	scan_dev_rmaps(ctx, ctx->fsinfo.fs_logdev, arg);
> +	scan_dev_rmaps(ctx, ctx->mnt.fsgeom.rtstart ? 2 : ctx->fsinfo.fs_logdev,
> +			arg);
>  }
>  
>  /*
> -- 
> 2.47.2
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 45/45] xfs_growfs: support internal RT devices
  2025-04-09  7:55 ` [PATCH 45/45] xfs_growfs: " Christoph Hellwig
@ 2025-04-09 19:35   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 19:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:48AM +0200, Christoph Hellwig wrote:
> Allow RT growfs when rtstart is set in the geometry, and adjust the
> queried size for it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Seems fine to me...
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  growfs/xfs_growfs.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/growfs/xfs_growfs.c b/growfs/xfs_growfs.c
> index 4b941403e2fd..0d0b2ae3e739 100644
> --- a/growfs/xfs_growfs.c
> +++ b/growfs/xfs_growfs.c
> @@ -202,7 +202,7 @@ main(int argc, char **argv)
>  			progname, fname);
>  		exit(1);
>  	}
> -	if (rflag && !xi.rt.dev) {
> +	if (rflag && (!xi.rt.dev && !geo.rtstart)) {
>  		fprintf(stderr,
>  			_("%s: failed to access realtime device for %s\n"),
>  			progname, fname);
> @@ -211,6 +211,13 @@ main(int argc, char **argv)
>  
>  	xfs_report_geom(&geo, datadev, logdev, rtdev);
>  
> +	if (geo.rtstart) {
> +		xfs_daddr_t rtstart = geo.rtstart * (geo.blocksize / BBSIZE);
> +
> +		xi.rt.size = xi.data.size - rtstart;
> +		xi.data.size = rtstart;
> +	}
> +
>  	ddsize = xi.data.size;
>  	dlsize = (xi.log.size ? xi.log.size :
>  			geo.logblocks * (geo.blocksize / BBSIZE) );
> -- 
> 2.47.2
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 44/45] xfs_mdrestore: support internal RT devices
  2025-04-09  7:55 ` [PATCH 44/45] xfs_mdrestore: support " Christoph Hellwig
@ 2025-04-09 19:36   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 19:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:47AM +0200, Christoph Hellwig wrote:
> Calculate the size properly for internal RT devices and skip restoring
> to the external one for this case.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  mdrestore/xfs_mdrestore.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/mdrestore/xfs_mdrestore.c b/mdrestore/xfs_mdrestore.c
> index d5014981b15a..95b01a99a154 100644
> --- a/mdrestore/xfs_mdrestore.c
> +++ b/mdrestore/xfs_mdrestore.c
> @@ -183,6 +183,20 @@ verify_device_size(
>  	}
>  }
>  
> +static void
> +verify_main_device_size(
> +	const struct mdrestore_dev	*dev,
> +	struct xfs_sb			*sb)
> +{
> +	xfs_rfsblock_t			nr_blocks = sb->sb_dblocks;
> +
> +	/* internal RT device */
> +	if (sb->sb_rtstart)
> +		nr_blocks = sb->sb_rtstart + sb->sb_rblocks;
> +
> +	verify_device_size(dev, nr_blocks, sb->sb_blocksize);
> +}
> +
>  static void
>  read_header_v1(
>  	union mdrestore_headers	*h,
> @@ -269,7 +283,7 @@ restore_v1(
>  
>  	((struct xfs_dsb*)block_buffer)->sb_inprogress = 1;
>  
> -	verify_device_size(ddev, sb.sb_dblocks, sb.sb_blocksize);
> +	verify_main_device_size(ddev, &sb);
>  
>  	bytes_read = 0;
>  
> @@ -432,14 +446,14 @@ restore_v2(
>  
>  	((struct xfs_dsb *)block_buffer)->sb_inprogress = 1;
>  
> -	verify_device_size(ddev, sb.sb_dblocks, sb.sb_blocksize);
> +	verify_main_device_size(ddev, &sb);
>  
>  	if (sb.sb_logstart == 0) {
>  		ASSERT(mdrestore.external_log == true);
>  		verify_device_size(logdev, sb.sb_logblocks, sb.sb_blocksize);
>  	}
>  
> -	if (sb.sb_rblocks > 0) {
> +	if (sb.sb_rblocks > 0 && !sb.sb_rtstart) {
>  		ASSERT(mdrestore.realtime_data == true);
>  		verify_device_size(rtdev, sb.sb_rblocks, sb.sb_blocksize);
>  	}
> -- 
> 2.47.2
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 39/45] xfs_io: don't re-query geometry information in fsmap_f
  2025-04-09  7:55 ` [PATCH 39/45] xfs_io: don't re-query geometry " Christoph Hellwig
@ 2025-04-09 20:46   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 20:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:42AM +0200, Christoph Hellwig wrote:
> But use the information stored in "file".
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  io/fsmap.c | 18 +++---------------
>  1 file changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/io/fsmap.c b/io/fsmap.c
> index 6a87e8972f26..3cc1b510316c 100644
> --- a/io/fsmap.c
> +++ b/io/fsmap.c
> @@ -166,9 +166,9 @@ static void
>  dump_map_verbose(
>  	unsigned long long	*nr,
>  	struct fsmap_head	*head,
> -	bool			*dumped_flags,
> -	struct xfs_fsop_geom	*fsgeo)
> +	bool			*dumped_flags)
>  {
> +	struct xfs_fsop_geom	*fsgeo = &file->geom;
>  	unsigned long long	i;
>  	struct fsmap		*p;
>  	int			agno;
> @@ -395,7 +395,6 @@ fsmap_f(
>  	struct fsmap		*p;
>  	struct fsmap_head	*head;
>  	struct fsmap		*l, *h;
> -	struct xfs_fsop_geom	fsgeo;
>  	long long		start = 0;
>  	long long		end = -1;
>  	int			map_size;
> @@ -470,17 +469,6 @@ fsmap_f(
>  		end <<= BBSHIFT;
>  	}
>  
> -	if (vflag) {
> -		c = -xfrog_geometry(file->fd, &fsgeo);
> -		if (c) {
> -			fprintf(stderr,
> -				_("%s: can't get geometry [\"%s\"]: %s\n"),
> -				progname, file->name, strerror(c));
> -			exitcode = 1;
> -			return 0;
> -		}
> -	}

This leads to a regression on ext4:

# xfs_io -c 'fsmap -vvvvvv' /opt/
Floating point exception

which is a divide by zero in the line:

	agno = p->fmr_physical / bperag;

because bperag is 0 on ext4.  The current behavior is slightly less
unfriendly:

# xfs_io -c 'fsmap -vvvvvv' /opt/
xfs_io: can't get geometry ["/opt/"]: Inappropriate ioctl for device

because the xfrog_geometry() call here fails.
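
One possible way to avoid the crash, shown only as a sketch: treat a zero
group size as "no group information" instead of dividing by it.  The helper
name is made up, and this is not necessarily how the patch should handle
non-XFS filesystems.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a physical byte offset to a group number.
 * Returns -1 when the group size is zero, e.g. because no XFS geometry
 * is available for the file, rather than dividing by zero.
 */
static long long
group_number(uint64_t physical, uint64_t bytes_per_group)
{
	if (bytes_per_group == 0)
		return -1;
	return (long long)(physical / bytes_per_group);
}
```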

--D

> -
>  	map_size = nflag ? nflag : 131072 / sizeof(struct fsmap);
>  	head = malloc(fsmap_sizeof(map_size));
>  	if (head == NULL) {
> @@ -531,7 +519,7 @@ fsmap_f(
>  			break;
>  
>  		if (vflag)
> -			dump_map_verbose(&nr, head, &dumped_flags, &fsgeo);
> +			dump_map_verbose(&nr, head, &dumped_flags);
>  		else if (mflag)
>  			dump_map_machine(&nr, head);
>  		else
> -- 
> 2.47.2
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 38/45] xfs_io: don't re-query fs_path information in fsmap_f
  2025-04-09  7:55 ` [PATCH 38/45] xfs_io: don't re-query fs_path information in fsmap_f Christoph Hellwig
@ 2025-04-09 20:48   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 20:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:41AM +0200, Christoph Hellwig wrote:
> But reuse the information stashed in "file".
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Seems fine to me.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  io/fsmap.c | 25 ++++++++-----------------
>  1 file changed, 8 insertions(+), 17 deletions(-)
> 
> diff --git a/io/fsmap.c b/io/fsmap.c
> index 6de720f238bb..6a87e8972f26 100644
> --- a/io/fsmap.c
> +++ b/io/fsmap.c
> @@ -14,6 +14,7 @@
>  
>  static cmdinfo_t	fsmap_cmd;
>  static dev_t		xfs_data_dev;
> +static dev_t		xfs_log_dev;
>  static dev_t		xfs_rt_dev;
>  
>  static void
> @@ -405,8 +406,6 @@ fsmap_f(
>  	int			c;
>  	unsigned long long	nr = 0;
>  	size_t			fsblocksize, fssectsize;
> -	struct fs_path		*fs;
> -	static bool		tab_init;
>  	bool			dumped_flags = false;
>  	int			dflag, lflag, rflag;
>  
> @@ -491,15 +490,19 @@ fsmap_f(
>  		return 0;
>  	}
>  
> +	xfs_data_dev = file->fs_path.fs_datadev;
> +	xfs_log_dev = file->fs_path.fs_logdev;
> +	xfs_rt_dev = file->fs_path.fs_rtdev;
> +
>  	memset(head, 0, sizeof(*head));
>  	l = head->fmh_keys;
>  	h = head->fmh_keys + 1;
>  	if (dflag) {
> -		l->fmr_device = h->fmr_device = file->fs_path.fs_datadev;
> +		l->fmr_device = h->fmr_device = xfs_data_dev;
>  	} else if (lflag) {
> -		l->fmr_device = h->fmr_device = file->fs_path.fs_logdev;
> +		l->fmr_device = h->fmr_device = xfs_log_dev;
>  	} else if (rflag) {
> -		l->fmr_device = h->fmr_device = file->fs_path.fs_rtdev;
> +		l->fmr_device = h->fmr_device = xfs_rt_dev;
>  	} else {
>  		l->fmr_device = 0;
>  		h->fmr_device = UINT_MAX;
> @@ -510,18 +513,6 @@ fsmap_f(
>  	h->fmr_flags = UINT_MAX;
>  	h->fmr_offset = ULLONG_MAX;
>  
> -	/*
> -	 * If this is an XFS filesystem, remember the data device.
> -	 * (We report AG number/block for data device extents on XFS).
> -	 */
> -	if (!tab_init) {
> -		fs_table_initialise(0, NULL, 0, NULL);
> -		tab_init = true;
> -	}
> -	fs = fs_table_lookup(file->name, FS_MOUNT_POINT);
> -	xfs_data_dev = fs ? fs->fs_datadev : 0;
> -	xfs_rt_dev = fs ? fs->fs_rtdev : 0;
> -
>  	head->fmh_count = map_size;
>  	do {
>  		/* Get some extents */
> -- 
> 2.47.2
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 40/45] xfs_io: handle internal RT devices in fsmap output
  2025-04-09  7:55 ` [PATCH 40/45] xfs_io: handle internal RT devices in fsmap output Christoph Hellwig
@ 2025-04-09 21:48   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 21:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:43AM +0200, Christoph Hellwig wrote:
> Deal with the synthetic fmr_device values and the rt device offset when
> calculating RG numbers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  io/fsmap.c | 33 ++++++++++++++++++++++++++-------
>  1 file changed, 26 insertions(+), 7 deletions(-)
> 
> diff --git a/io/fsmap.c b/io/fsmap.c
> index 3cc1b510316c..41f2da50f344 100644
> --- a/io/fsmap.c
> +++ b/io/fsmap.c
> @@ -247,8 +247,13 @@ dump_map_verbose(
>  				(long long)BTOBBT(agoff),
>  				(long long)BTOBBT(agoff + p->fmr_length - 1));
>  		} else if (p->fmr_device == xfs_rt_dev && fsgeo->rgcount > 0) {
> -			agno = p->fmr_physical / bperrtg;
> -			agoff = p->fmr_physical % bperrtg;
> +			uint64_t start = p->fmr_physical -
> +				fsgeo->rtstart * fsgeo->blocksize;
> +
> +			agno = start / bperrtg;
> +			if (agno < 0)
> +				agno = -1;
> +			agoff = start % bperrtg;
>  			snprintf(abuf, sizeof(abuf),
>  				"(%lld..%lld)",
>  				(long long)BTOBBT(agoff),
> @@ -326,8 +331,13 @@ dump_map_verbose(
>  				"%lld",
>  				(long long)agno);
>  		} else if (p->fmr_device == xfs_rt_dev && fsgeo->rgcount > 0) {
> -			agno = p->fmr_physical / bperrtg;
> -			agoff = p->fmr_physical % bperrtg;
> +			uint64_t start = p->fmr_physical -
> +				fsgeo->rtstart * fsgeo->blocksize;
> +
> +			agno = start / bperrtg;
> +			if (agno < 0)
> +				agno = -1;
> +			agoff = start % bperrtg;
>  			snprintf(abuf, sizeof(abuf),
>  				"(%lld..%lld)",
>  				(long long)BTOBBT(agoff),
> @@ -478,9 +488,18 @@ fsmap_f(
>  		return 0;
>  	}
>  
> -	xfs_data_dev = file->fs_path.fs_datadev;
> -	xfs_log_dev = file->fs_path.fs_logdev;
> -	xfs_rt_dev = file->fs_path.fs_rtdev;
> +	/*
> +	 * File systems with internal rt device use synthetic device values.
> +	 */
> +	if (file->geom.rtstart) {
> +		xfs_data_dev = XFS_DEV_DATA;
> +		xfs_log_dev = XFS_DEV_LOG;
> +		xfs_rt_dev = XFS_DEV_RT;
> +	} else {
> +		xfs_data_dev = file->fs_path.fs_datadev;
> +		xfs_log_dev = file->fs_path.fs_logdev;
> +		xfs_rt_dev = file->fs_path.fs_rtdev;
> +	}
>  
>  	memset(head, 0, sizeof(*head));
>  	l = head->fmh_keys;
> -- 
> 2.47.2
> 
> 
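
The RG number calculation in the hunks above can be sketched in isolation as
follows.  The type and function names are hypothetical, and the explicit
range check is an assumption added here: the subtraction is unsigned, so an
offset below the start of the RT section has to be caught before
subtracting.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct rg_location {
	long long	rgno;	/* rtgroup number, or -1 if not in the RT section */
	uint64_t	rgoff;	/* byte offset within the rtgroup */
};

static struct rg_location
locate_in_rtgroup(
	uint64_t	physical,	/* fmr_physical, in bytes */
	uint64_t	rtstart,	/* start of the RT section, in fs blocks */
	uint32_t	blocksize,	/* fs block size, in bytes */
	uint64_t	bytes_per_rtg)	/* rtgroup size, in bytes */
{
	struct rg_location	loc = { .rgno = -1, .rgoff = 0 };
	uint64_t		rtstart_bytes = rtstart * blocksize;

	/* Check before subtracting: unsigned arithmetic would wrap around. */
	if (physical < rtstart_bytes || bytes_per_rtg == 0)
		return loc;

	loc.rgno = (long long)((physical - rtstart_bytes) / bytes_per_rtg);
	loc.rgoff = (physical - rtstart_bytes) % bytes_per_rtg;
	return loc;
}
```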

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 37/45] xfs_io: correctly report RGs with internal rt dev in bmap output
  2025-04-09  7:55 ` [PATCH 37/45] xfs_io: correctly report RGs with internal rt dev in bmap output Christoph Hellwig
@ 2025-04-09 22:22   ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-09 22:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:55:40AM +0200, Christoph Hellwig wrote:
> Apply the proper offset.  Somehow this made gcc complain about
> possible overflowing abuf, so increase the size for that as well.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks fine to me...
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  io/bmap.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/io/bmap.c b/io/bmap.c
> index b2f6b4905285..944f658b35f0 100644
> --- a/io/bmap.c
> +++ b/io/bmap.c
> @@ -257,18 +257,21 @@ bmap_f(
>  #define	FLG_BSW		0000010	/* Not on begin of stripe width */
>  #define	FLG_ESW		0000001	/* Not on end   of stripe width */
>  		int	agno;
> -		off_t	agoff, bbperag;
> +		off_t	agoff, bbperag, bstart;
>  		int	foff_w, boff_w, aoff_w, tot_w, agno_w;
> -		char	rbuf[32], bbuf[32], abuf[32];
> +		char	rbuf[32], bbuf[32], abuf[64];
>  		int	sunit, swidth;
>  
>  		foff_w = boff_w = aoff_w = MINRANGE_WIDTH;
>  		tot_w = MINTOT_WIDTH;
>  		if (is_rt) {
> +			bstart = fsgeo.rtstart *
> +				(fsgeo.blocksize / BBSIZE);
>  			bbperag = bytes_per_rtgroup(&fsgeo) / BBSIZE;
>  			sunit = 0;
>  			swidth = 0;
>  		} else {
> +			bstart = 0;
>  			bbperag = (off_t)fsgeo.agblocks *
>  				  (off_t)fsgeo.blocksize / BBSIZE;
>  			sunit = (fsgeo.sunit * fsgeo.blocksize) / BBSIZE;
> @@ -298,9 +301,11 @@ bmap_f(
>  						map[i + 1].bmv_length - 1LL));
>  				boff_w = max(boff_w, strlen(bbuf));
>  				if (bbperag > 0) {
> -					agno = map[i + 1].bmv_block / bbperag;
> -					agoff = map[i + 1].bmv_block -
> -							(agno * bbperag);
> +					off_t bno;
> +
> +					bno = map[i + 1].bmv_block - bstart;
> +					agno = bno / bbperag;
> +					agoff = bno % bbperag;
>  					snprintf(abuf, sizeof(abuf),
>  						"(%lld..%lld)",
>  						(long long)agoff,
> @@ -387,9 +392,11 @@ bmap_f(
>  				printf("%4d: %-*s %-*s", i, foff_w, rbuf,
>  					boff_w, bbuf);
>  				if (bbperag > 0) {
> -					agno = map[i + 1].bmv_block / bbperag;
> -					agoff = map[i + 1].bmv_block -
> -							(agno * bbperag);
> +					off_t bno;
> +
> +					bno = map[i + 1].bmv_block - bstart;
> +					agno = bno / bbperag;
> +					agoff = bno % bbperag;
>  					snprintf(abuf, sizeof(abuf),
>  						"(%lld..%lld)",
>  						(long long)agoff,
> -- 
> 2.47.2
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 04/45] FIXUP: xfs: make metabtree reservations global
  2025-04-09 15:43   ` Darrick J. Wong
@ 2025-04-10  6:00     ` Christoph Hellwig
  0 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  6:00 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 08:43:03AM -0700, Darrick J. Wong wrote:
> > +	error = -libxfs_metafile_resv_init(mp);
> > +	if (error)
> > +		prealloc_fail(mp, error, 0, _("metafile"));
> 
> Could this be _("metadata files") so that the error message becomes:
> 
> mkfs.xfs: cannot handle expansion of metadata files; need 55 free blocks, have 7
> 
> With that changed,

Sure.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 11/45] FIXUP: xfs: define the zoned on-disk format
  2025-04-09 15:47   ` Darrick J. Wong
  2025-04-09 16:04     ` Darrick J. Wong
@ 2025-04-10  6:01     ` Christoph Hellwig
  2025-04-10 16:31       ` Darrick J. Wong
  1 sibling, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  6:01 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 08:47:15AM -0700, Darrick J. Wong wrote:
> On Wed, Apr 09, 2025 at 09:55:14AM +0200, Christoph Hellwig wrote:
> 
> No SoB?

The FIXUPs don't really have a clear process yet.  I'll add them where
missing, but I'd also be interested if these fixups are a generally
useful process we should follow.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 13/45] FIXUP: xfs: allow internal RT devices for zoned mode
  2025-04-09 15:55   ` Darrick J. Wong
@ 2025-04-10  6:09     ` Christoph Hellwig
  0 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  6:09 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 08:55:48AM -0700, Darrick J. Wong wrote:
> Hum.  This change means that if you call xfs_db -c info without
> supplying a realtime device, the info command output will claim an
> internal rt device:
> 
> $ truncate -s 3g /tmp/a
> $ truncate -s 3g /tmp/b
> $ mkfs.xfs -f /tmp/a -r rtdev=/tmp/b
> $ xfs_db -c info /tmp/a

I guess this wants a regression test while we're at it.

> static inline const char *
> rtdev_name(
> 	const struct xfs_fsop_geo	*geo,
> 	const char			*rtname)
> {
> 	if (!geo->rtblocks)
> 		return _("none");
> 	if (geo->rtstart)
> 		return _("internal");
> 	if (!rtname)
> 		return _("external");
> 	return rtname;
> }
> 
> instead?

That should work.  Although the rtstart handling needs to come later,
as rtstart doesn't exist yet at this point.  I've added the helper for now and
will see how it works out while I finish the rebase.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 26/45] libfrog: report the zoned flag
  2025-04-09 15:58   ` Darrick J. Wong
@ 2025-04-10  6:14     ` Christoph Hellwig
  2025-04-10 16:36       ` Darrick J. Wong
  0 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  6:14 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 08:58:46AM -0700, Darrick J. Wong wrote:
> On Wed, Apr 09, 2025 at 09:55:29AM +0200, Christoph Hellwig wrote:
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> I wonder why the other xfs_report_geom change (about the realtime device
> name reporting) wasn't in this patch?

Good question.  I gave it a quick try and it seems to work out much
better indeed and also solves the rtstart issue mentioned earlier.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 27/45] xfs_repair: support repairing zoned file systems
  2025-04-09 16:10   ` Darrick J. Wong
@ 2025-04-10  6:27     ` Christoph Hellwig
  2025-04-10 16:41       ` Darrick J. Wong
  0 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  6:27 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:10:12AM -0700, Darrick J. Wong wrote:
> On Wed, Apr 09, 2025 at 09:55:30AM +0200, Christoph Hellwig wrote:
> > Not really much to do here.  Mostly ignore the validation and
> > regeneration of the bitmap and summary inodes.  Eventually this
> > could grow a bit of validation of the hardware zone state.
> 
> What do we actually do about the hardware zone state?  If the write
> pointer is lower than wherever the rtrmapbt thinks it is, then we're
> screwed, right?

Yes.  See offlist discussion with Hans.

> Does it matter if the hw write pointer is higher than where the rtrmapbt
> thinks it is?  In that case, a new write will be beyond the last write
> that the filesystem knows about, but the device will tell us the disk
> address so it's all good aside from the freertx counters being wrong.  I
> think?

Yes.  That's actually a totally expected case for an unclean shutdown
with data I/O in flight.  freertx is recalculated at each mount, and
the space not recorded in the rmapbt is simply marked as reclaimable
through garbage collection.
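
A minimal sketch of the two cases, assuming a hypothetical per-zone
summary (struct zone_state and zone_reconcile are illustrative names,
not xfsprogs or kernel code): a write pointer behind the rmapbt is
fatal, while one ahead of it only leaves blocks that are neither free
nor recorded, which garbage collection can reclaim.

```c
#include <stdint.h>

/* Hypothetical per-zone summary; field names are assumptions. */
struct zone_state {
	uint64_t	capacity;	/* usable blocks in the zone */
	uint64_t	hw_write_ptr;	/* blocks written according to the device */
	uint64_t	rmap_highest;	/* blocks accounted for by the rmapbt */
};

/*
 * Returns 0 and fills in the counters if the zone is usable, or -1 if
 * the hardware write pointer is behind what the rmapbt has recorded
 * (or beyond the zone capacity).
 */
static int
zone_reconcile(
	const struct zone_state	*zs,
	uint64_t		*free_blocks,
	uint64_t		*unrecorded)
{
	if (zs->hw_write_ptr > zs->capacity)
		return -1;
	if (zs->hw_write_ptr < zs->rmap_highest)
		return -1;

	/* Space past the write pointer is still writable. */
	*free_blocks = zs->capacity - zs->hw_write_ptr;
	/* Written but unrecorded space: reclaimable by GC. */
	*unrecorded = zs->hw_write_ptr - zs->rmap_highest;
	return 0;
}
```

Recomputing the counters from the write pointer on every call mirrors
the point above that freertx does not need to be trusted across an
unclean shutdown.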


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 28/45] xfs_repair: fix the RT device check in process_dinode_int
  2025-04-09 16:11   ` Darrick J. Wong
@ 2025-04-10  6:29     ` Christoph Hellwig
  0 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  6:29 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 09:11:55AM -0700, Darrick J. Wong wrote:
> > index 7bdd3dcf15c1..0c559c408085 100644
> > --- a/repair/dinode.c
> > +++ b/repair/dinode.c
> > @@ -3265,8 +3265,9 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
> >  			flags &= XFS_DIFLAG_ANY;
> >  		}
> >  
> > -		/* need an rt-dev for the realtime flag! */
> > -		if ((flags & XFS_DIFLAG_REALTIME) && !rt_name) {
> > +		/* need an rt-dev for the realtime flag */
> > +		if ((flags & XFS_DIFLAG_REALTIME) &&
> > +		    !mp->m_rtdev_targp) {
> 
> If we're going to check the fs geometry, then why not check
> mp->m_sb.sb_rextents != 0?

I'll give it a spin.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 29/45] xfs_repair: validate rt groups vs reported hardware zones
  2025-04-09 18:41   ` Darrick J. Wong
@ 2025-04-10  6:34     ` Christoph Hellwig
  2025-04-10 16:43       ` Darrick J. Wong
  0 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  6:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 11:41:12AM -0700, Darrick J. Wong wrote:
> > +#define ZONES_PER_IOCTL			16384
> > +
> > +static void
> > +report_zones_cb(
> > +	struct xfs_mount	*mp,
> > +	struct blk_zone		*zone)
> > +{
> > +	xfs_fsblock_t		zsbno = xfs_daddr_to_rtb(mp, zone->start);
> 
>         ^^^^^^^^^^^^^ nit: xfs_rtblock_t ?

Updated.

> Nit: inconsistent styles in declaration indentation

Fixed.

> > +	device_size /= 512; /* BLKGETSIZE64 reports a byte value */
> 
> device_size = BTOBB(device_size); ?

Sure.
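
For reference, a standalone sketch of what the conversion does.  The
macros below restate the libxfs definitions as I remember them, with
uint64_t swapped in for __u64 so the snippet builds on its own;
device_size_in_sectors is a hypothetical wrapper.

```c
#include <stdint.h>

#define BBSHIFT		9
#define BBSIZE		(1 << BBSHIFT)
/* Bytes to 512-byte basic blocks, rounding up. */
#define BTOBB(bytes)	(((uint64_t)(bytes) + BBSIZE - 1) >> BBSHIFT)

/* Hypothetical helper: convert a BLKGETSIZE64 byte count to sectors. */
static uint64_t
device_size_in_sectors(
	uint64_t	bytes)
{
	return BTOBB(bytes);
}
```

Unlike the open-coded division, BTOBB rounds up, so the two only agree
when the byte size is a multiple of 512, which should hold for block
device sizes.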

> > +
> > +			switch (zones[i].type) {
> > +			case BLK_ZONE_TYPE_CONVENTIONAL:
> > +			case BLK_ZONE_TYPE_SEQWRITE_REQ:
> > +				break;
> > +			case BLK_ZONE_TYPE_SEQWRITE_PREF:
> > +				do_error(
> > +_("Found sequential write preferred zone\n"));
> 
> I wonder, can "sequential preferred" zones be treated as if they are
> conventional zones?  Albeit really slow ones?

Yes, they could.  However in the kernel we've decided that dealing
with them is too painful for the few prototypes built that way and reject
them in the block layer.  So we won't ever see them here except with
a rather old kernel.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 30/45] xfs_mkfs: support creating zoned file systems
  2025-04-09 18:54   ` Darrick J. Wong
@ 2025-04-10  6:45     ` Christoph Hellwig
  2025-04-10 16:45       ` Darrick J. Wong
  0 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  6:45 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 11:54:49AM -0700, Darrick J. Wong wrote:
> > +	if (ioctl(fd, BLKRESETZONE, &range) < 0) {
> > +		if (!quiet)
> > +			printf(" FAILED\n");
> 
> Should we print /why/ the zone reset failed?

As in the errno value?  Sure.
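
Roughly like this, as a sketch only: format_reset_failure is a
hypothetical helper and the exact message wording is an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the failure message for a zone reset, including the reason.
 * The caller passes in the errno value it saved right after the ioctl.
 */
static int
format_reset_failure(
	char	*buf,
	size_t	len,
	int	err)
{
	return snprintf(buf, len, " FAILED (%s)\n", strerror(err));
}
```

The caller would do "int err = errno;" directly after the failed
ioctl() and before any other library call, since later calls may
overwrite errno.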

> > +static int report_zones(const char *name, struct zone_info *zi)
> > +{
> > +	struct blk_zone_report *rep;
> > +	size_t rep_size;
> > +	struct stat st;
> > +	unsigned int i, n = 0;
> > +	uint64_t device_size;
> > +	uint64_t sector = 0;
> > +	bool found_seq = false;
> > +	int ret = 0;
> > +	int fd;
> 
> Nit: indenting

Fixed.

> > +		goto out_close;
> > +
> > +	if (ioctl(fd, BLKGETSIZE64, &device_size)) {
> > +		ret = -EIO;
> 
> ret = errno; ?  But then...
> 
> > +		goto out_close;
> > +	}
> 
> ...what's the point in returning errors if the caller never checks?

Heh, I'll look into that.

> > +	if (cli->xi->log.name && !cli->xi->log.isfile) {
> > +		report_zones(cli->xi->log.name, &zt->log);
> > +		if (zt->log.nr_zones) {
> > +			fprintf(stderr,
> > +_("Zoned devices not supported as log device!\n"));
> 
> Too bad, we really ought to be able to write logs to a zone device.
> But that's not in scope here.

That is on my todo list, but I need to finish support for the zoned RT
device first.

> 
> > +			usage();
> > +		}
> > +	}
> > +
> > +	if (cli->rtstart) {
> > +		if (cfg->rtstart) {
> 
> Er... why are we checking the variable that we set four lines down?
> Is this supposed to be a check for external zoned rt devices?
> 
> > +			fprintf(stderr,
> > +_("rtstart override not allowed on zoned devices.\n"));
> > +			usage();
> > +		}
> > +		cfg->rtstart = getnum(cli->rtstart, &ropts, R_START) / 512;

For devices with hardware zones rtstart is already set when we get
here and we don't want to allow overriding with the command line
parameter as that won't work.
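
Spelled out as a standalone sketch (resolve_rtstart and its parameters
are hypothetical; the real code works on cli and cfg directly and calls
usage() instead of returning an error):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * probed:    rtstart in 512-byte sectors as found by zone probing, or 0
 * cli_given: whether the user passed an rtstart value
 * cli_bytes: the user-supplied rtstart in bytes
 *
 * Returns 0 and stores the result in *rtstart, or -1 if the user tried
 * to override a value that the hardware zones already dictate.
 */
static int
resolve_rtstart(
	uint64_t	probed,
	bool		cli_given,
	uint64_t	cli_bytes,
	uint64_t	*rtstart)
{
	if (cli_given) {
		if (probed)
			return -1;
		*rtstart = cli_bytes / 512;
		return 0;
	}
	*rtstart = probed;
	return 0;
}
```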

> > +static void
> > +validate_rtgroup_geometry(
> > +	struct mkfs_params	*cfg)

> Hoisting this out probably should've been a separate patch.

Sure, I'll add a new one for the refactoring.

> <snip>
> 
> > diff --git a/repair/agheader.c b/repair/agheader.c
> > index 5bb4e47e0c5b..048e6c3143b5 100644
> > --- a/repair/agheader.c
> > +++ b/repair/agheader.c
> 
> Should this be in a different patch?

Yes.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 32/45] xfs_mkfs: default to rtinherit=1 for zoned file systems
  2025-04-09 18:59   ` Darrick J. Wong
@ 2025-04-10  6:45     ` Christoph Hellwig
  0 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  6:45 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 11:59:21AM -0700, Darrick J. Wong wrote:
> > +		/*
> > +		 * Force the rtinherit flag on the root inode for zoned file
> > +		 * systems as they use the data device only as a metadata
> > +		 * container.
> > +		 */
> > +		cli->fsx.fsx_xflags |= FS_XFLAG_RTINHERIT;
> 
> If the caller specified -d rtinherit=0, this will override their choice.
> Perhaps only do this if !cli_opt_set(&dopts, D_RTINHERIT) ?  I can
> imagine people trying to combine a large SSD and a large SMR drive and
> wanting to be able to store files on both devices.

Sounds reasonable.
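
Something along these lines, as a sketch: zoned_default_xflags and the
rtinherit_given parameter are hypothetical stand-ins for the
cli_opt_set(&dopts, D_RTINHERIT) check, and the flag value is restated
from linux/fs.h so the snippet builds on its own.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef FS_XFLAG_RTINHERIT
#define FS_XFLAG_RTINHERIT	0x00000100	/* create with rt bit set */
#endif

/*
 * Default to rtinherit on zoned file systems, but leave the flags
 * alone if the user made an explicit choice on the command line.
 */
static uint32_t
zoned_default_xflags(
	uint32_t	xflags,
	bool		zoned,
	bool		rtinherit_given)
{
	if (zoned && !rtinherit_given)
		xflags |= FS_XFLAG_RTINHERIT;
	return xflags;
}
```

With that, an explicit rtinherit=0 survives on a zoned file system,
which covers the SSD plus SMR case.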


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 33/45] xfs_mkfs: reflink conflicts with zoned file systems for now
  2025-04-09 19:00   ` Darrick J. Wong
@ 2025-04-10  6:46     ` Christoph Hellwig
  2025-04-10 16:47       ` Darrick J. Wong
  0 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  6:46 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 12:00:00PM -0700, Darrick J. Wong wrote:
> On Wed, Apr 09, 2025 at 09:55:36AM +0200, Christoph Hellwig wrote:
> > Until GC is enhanced to not unshare reflinked blocks we better prohibit
> > this combination.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> ...or gc figures out how to do reflinked moves.  Either way,

That's what I mean with not unsharing.  I can adjust the wording
if mine was too confusing, though.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 36/45] man: document XFS_FSOP_GEOM_FLAGS_ZONED
  2025-04-09 19:13   ` Darrick J. Wong
@ 2025-04-10  6:53     ` Christoph Hellwig
  2025-04-10 16:47       ` Darrick J. Wong
  0 siblings, 1 reply; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  6:53 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 12:13:40PM -0700, Darrick J. Wong wrote:
> > +Start of the internal RT device in fsblocks.  0 if an external log device
> 
> external RT device?

instead of the log device?  Yes.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 42/45] xfs_scrub: support internal RT sections
  2025-04-09 19:30   ` Darrick J. Wong
@ 2025-04-10  6:57     ` Christoph Hellwig
  0 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  6:57 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 12:30:49PM -0700, Darrick J. Wong wrote:
> On Wed, Apr 09, 2025 at 09:55:45AM +0200, Christoph Hellwig wrote:
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  scrub/phase1.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/scrub/phase1.c b/scrub/phase1.c
> > index d03a9099a217..e71cab7b7d90 100644
> > --- a/scrub/phase1.c
> > +++ b/scrub/phase1.c
> > @@ -341,7 +341,8 @@ _("Kernel metadata repair facility is not available.  Use -n to scrub."));
> >  _("Unable to find log device path."));
> >  		return ECANCELED;
> >  	}
> > -	if (ctx->mnt.fsgeom.rtblocks && ctx->fsinfo.fs_rt == NULL) {
> > +	if (ctx->mnt.fsgeom.rtblocks && ctx->fsinfo.fs_rt == NULL &&
> > +	    !(ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_ZONED)) {
> 
> Shouldn't this be gated on ctx->mnt.fsgeom.rtstart == 0 instead of
> !ZONED?  I think we still want to be able to do media scans of zoned
> external rt devices.

It checks for ctx->fsinfo.fs_rt above, which is non-NULL for
external devices.  But looking at rtstart might still be better, I'll
look into it.
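
A sketch of the rtstart-based variant, with a hypothetical helper name
and plain parameters standing in for the ctx->mnt.fsgeom and
ctx->fsinfo fields:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * An RT device path is only required when the file system has RT
 * blocks that do not live on the data device, i.e. rtstart == 0.
 */
static bool
rt_device_path_missing(
	uint64_t	rtblocks,
	uint64_t	rtstart,
	const char	*fs_rt)
{
	return rtblocks != 0 && rtstart == 0 && fs_rt == NULL;
}
```

This keeps the error for external RT devices, zoned or not, whose path
cannot be found, and only skips it for internal RT sections.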


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 31/45] xfs_mkfs: calculate zone overprovisioning when specifying size
  2025-04-09 19:06   ` Darrick J. Wong
@ 2025-04-10  7:00     ` Christoph Hellwig
  0 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  7:00 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 12:06:14PM -0700, Darrick J. Wong wrote:
> > +_("Warning: not enough zones for backing requested rt size due to\n"
> > +  "over-provisioning needs, writeable size will be less than %s\n"),
> 
> Nit: "writable", not "writeable"

Fixed, thanks.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 35/45] libfrog: report the zoned geometry
  2025-04-09 19:01   ` Darrick J. Wong
@ 2025-04-10  7:02     ` Christoph Hellwig
  0 siblings, 0 replies; 95+ messages in thread
From: Christoph Hellwig @ 2025-04-10  7:02 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrey Albershteyn, Hans Holmberg, linux-xfs

On Wed, Apr 09, 2025 at 12:01:44PM -0700, Darrick J. Wong wrote:
> On Wed, Apr 09, 2025 at 09:55:38AM +0200, Christoph Hellwig wrote:
> > Also fix up to report all the zoned information in a separate line,
> > which also helps with alignment.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Could all these xfs_report_geom changes be a single patch?

Yes, that works much better.

> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

Although with all the changes in between I won't add the Reviewed-by tag for now.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 11/45] FIXUP: xfs: define the zoned on-disk format
  2025-04-10  6:01     ` Christoph Hellwig
@ 2025-04-10 16:31       ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-10 16:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Thu, Apr 10, 2025 at 08:01:32AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 09, 2025 at 08:47:15AM -0700, Darrick J. Wong wrote:
> > On Wed, Apr 09, 2025 at 09:55:14AM +0200, Christoph Hellwig wrote:
> > 
> > No SoB?
> 
> The FIXUPs don't really have a clear process yet.  I'll add them where
> missing, but I'd also be interested if these fixups are a generally
> useful process we should follow.

I like the FIXUPS approach because (a) it makes it easier to keep the
libxfs ports in sync with the kernel with a diff/patch script; and (b)
the subtleties of the userspace port are in a separate patch where it's
much easier to see.  The biggest downside is that it breaks compile
testing of every commit in the branch, but that can be worked around
with some (ugh) filtering heuristics on the subject line.

--D

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 26/45] libfrog: report the zoned flag
  2025-04-10  6:14     ` Christoph Hellwig
@ 2025-04-10 16:36       ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-10 16:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Thu, Apr 10, 2025 at 08:14:24AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 09, 2025 at 08:58:46AM -0700, Darrick J. Wong wrote:
> > On Wed, Apr 09, 2025 at 09:55:29AM +0200, Christoph Hellwig wrote:
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > 
> > I wonder why the other xfs_report_geom change (about the realtime device
> > name reporting) wasn't in this patch?
> 
> Good question.  I gave it a quick try and it seems to work out much
> better indeed and also solves the rtstart issue mentioned earlier.

<nod> I usually try to keep the FIXUP changes to the bare minimum that
makes compilation work again, and push everything else until after the
libxfs rebase part.  It sort of kept things sane.

--D

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 27/45] xfs_repair: support repairing zoned file systems
  2025-04-10  6:27     ` Christoph Hellwig
@ 2025-04-10 16:41       ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-10 16:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Thu, Apr 10, 2025 at 08:27:49AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 09, 2025 at 09:10:12AM -0700, Darrick J. Wong wrote:
> > On Wed, Apr 09, 2025 at 09:55:30AM +0200, Christoph Hellwig wrote:
> > > Not really much to do here.  Mostly ignore the validation and
> > > regeneration of the bitmap and summary inodes.  Eventually this
> > > could grow a bit of validation of the hardware zone state.
> > 
> > What do we actually do about the hardware zone state?  If the write
> > pointer is lower than wherever the rtrmapbt thinks it is, then we're
> > screwed, right?
> 
> Yes.  See offlist discussion with Hans.

To summarize -- in the ideal world we'd have another file mapping extent
state for "damaged".  Then we could amend the media error handler (aka
the stuff in xfs_notify_failure.c) to set all the mappings to damaged
and set an inode flag saying that the file lost data.  Reads to the
damaged areas would return EIO.  We'd also be able to report to
userspace exactly what was lost.

Unfortunately we don't really have that, so all we can do with the
current ondisk format is ... remove the mappings?

> > Does it matter if the hw write pointer is higher than where the rtrmapbt
> > thinks it is?  In that case, a new write will be beyond the last write
> > that the filesystem knows about, but the device will tell us the disk
> > address so it's all good aside from the freertx counters being wrong.  I
> > think?
> 
> Yes.  That's actually a totally expected case for an unclean shutdown
> with data I/O in flight.  freertx is recalculated at each mount, and
> the space not recorded in the rmapbt is simply marked as reclaimable
> through garbage collection.

<nod> Ok, that's what I thought was going on, thanks for the reminder.

--D

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 29/45] xfs_repair: validate rt groups vs reported hardware zones
  2025-04-10  6:34     ` Christoph Hellwig
@ 2025-04-10 16:43       ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-10 16:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Thu, Apr 10, 2025 at 08:34:57AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 09, 2025 at 11:41:12AM -0700, Darrick J. Wong wrote:
> > > +#define ZONES_PER_IOCTL			16384
> > > +
> > > +static void
> > > +report_zones_cb(
> > > +	struct xfs_mount	*mp,
> > > +	struct blk_zone		*zone)
> > > +{
> > > +	xfs_fsblock_t		zsbno = xfs_daddr_to_rtb(mp, zone->start);
> > 
> >         ^^^^^^^^^^^^^ nit: xfs_rtblock_t ?
> 
> Updated.
> 
> > Nit: inconsistent styles in declaration indentation
> 
> Fixed.
> 
> > > +	device_size /= 512; /* BLKGETSIZE64 reports a byte value */
> > 
> > device_size = BTOBB(device_size); ?
> 
> Sure.
> 
> > > +
> > > +			switch (zones[i].type) {
> > > +			case BLK_ZONE_TYPE_CONVENTIONAL:
> > > +			case BLK_ZONE_TYPE_SEQWRITE_REQ:
> > > +				break;
> > > +			case BLK_ZONE_TYPE_SEQWRITE_PREF:
> > > +				do_error(
> > > +_("Found sequential write preferred zone\n"));
> > 
> > I wonder, can "sequential preferred" zones be treated as if they are
> > conventional zones?  Albeit really slow ones?
> 
> Yes, they could.  However in the kernel we've decided that dealing
> with them is too painful for the few prototypes built that way and reject
> them in the block layer.  So we won't ever see them here except with
> a rather old kernel.

Ah, I hadn't realized those aren't even supported now, but yeah:

	case BLK_ZONE_TYPE_SEQWRITE_PREF:
	default:
		pr_warn("%s: Invalid zone type 0x%x at sectors %llu\n",
			disk->disk_name, (int)zone->type, zone->start);

I guess that means drive-managed SMR disks don't export zone info at
all?

--D

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 30/45] xfs_mkfs: support creating zoned file systems
  2025-04-10  6:45     ` Christoph Hellwig
@ 2025-04-10 16:45       ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-10 16:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Thu, Apr 10, 2025 at 08:45:01AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 09, 2025 at 11:54:49AM -0700, Darrick J. Wong wrote:
> > > +	if (ioctl(fd, BLKRESETZONE, &range) < 0) {
> > > +		if (!quiet)
> > > +			printf(" FAILED\n");
> > 
> > Should we print /why/ the zone reset failed?
> 
> As in the errno value?  Sure.

Yes.

> > > +static int report_zones(const char *name, struct zone_info *zi)
> > > +{
> > > +	struct blk_zone_report *rep;
> > > +	size_t rep_size;
> > > +	struct stat st;
> > > +	unsigned int i, n = 0;
> > > +	uint64_t device_size;
> > > +	uint64_t sector = 0;
> > > +	bool found_seq = false;
> > > +	int ret = 0;
> > > +	int fd;
> > 
> > Nit: indenting
> 
> Fixed.
> 
> > > +		goto out_close;
> > > +
> > > +	if (ioctl(fd, BLKGETSIZE64, &device_size)) {
> > > +		ret = -EIO;
> > 
> > ret = errno; ?  But then...
> > 
> > > +		goto out_close;
> > > +	}
> > 
> > ...what's the point in returning errors if the caller never checks?
> 
> Heh, I'll look into that.
> 
> > > +	if (cli->xi->log.name && !cli->xi->log.isfile) {
> > > +		report_zones(cli->xi->log.name, &zt->log);
> > > +		if (zt->log.nr_zones) {
> > > +			fprintf(stderr,
> > > +_("Zoned devices not supported as log device!\n"));
> > 
> > Too bad, we really ought to be able to write logs to a zone device.
> > But that's not in scope here.
> 
> That is on my todo list, but I need to finish support for the zoned RT
> device first.

<nod>

> > 
> > > +			usage();
> > > +		}
> > > +	}
> > > +
> > > +	if (cli->rtstart) {
> > > +		if (cfg->rtstart) {
> > 
> > Er... why are we checking the variable that we set four lines down?
> > Is this supposed to be a check for external zoned rt devices?
> > 
> > > +			fprintf(stderr,
> > > +_("rtstart override not allowed on zoned devices.\n"));
> > > +			usage();
> > > +		}
> > > +		cfg->rtstart = getnum(cli->rtstart, &ropts, R_START) / 512;
> 
> For devices with hardware zones rtstart is already set when we get
> here and we don't want to allow overriding with the command line
> parameter as that won't work.

Oh, ok.  Maybe a comment?

		/*
		 * Device probing might already have set cfg->rtstart
		 * from the zone data.
		 */
		if (cfg->rtstart) {

--D

> > > +static void
> > > +validate_rtgroup_geometry(
> > > +	struct mkfs_params	*cfg)
> 
> > Hoisting this out probably should've been a separate patch.
> 
> Sure, I'll add a new one for the refactoring.
> 
> > <snip>
> > 
> > > diff --git a/repair/agheader.c b/repair/agheader.c
> > > index 5bb4e47e0c5b..048e6c3143b5 100644
> > > --- a/repair/agheader.c
> > > +++ b/repair/agheader.c
> > 
> > Should this be in a different patch?
> 
> Yes.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 33/45] xfs_mkfs: reflink conflicts with zoned file systems for now
  2025-04-10  6:46     ` Christoph Hellwig
@ 2025-04-10 16:47       ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-10 16:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Thu, Apr 10, 2025 at 08:46:31AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 09, 2025 at 12:00:00PM -0700, Darrick J. Wong wrote:
> > On Wed, Apr 09, 2025 at 09:55:36AM +0200, Christoph Hellwig wrote:
> > > Until GC is enhanced to not unshare reflinked blocks we better prohibit
> > > this combination.
> > > 
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > 
> > ...or gc figures out how to do reflinked moves.  Either way,
> 
> That's what I mean with not unsharing.  I can adjust the wording
> if mine was too confusing, though.

"Don't allow reflink until zonegc learns how to deal with shared
extents." ?

--D

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 36/45] man: document XFS_FSOP_GEOM_FLAGS_ZONED
  2025-04-10  6:53     ` Christoph Hellwig
@ 2025-04-10 16:47       ` Darrick J. Wong
  0 siblings, 0 replies; 95+ messages in thread
From: Darrick J. Wong @ 2025-04-10 16:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrey Albershteyn, Hans Holmberg, linux-xfs

On Thu, Apr 10, 2025 at 08:53:51AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 09, 2025 at 12:13:40PM -0700, Darrick J. Wong wrote:
> > > +Start of the internal RT device in fsblocks.  0 if an external log device
> > 
> > external RT device?
> 
> instead of the log device?  Yes.

Yes, that was what I meant.  Sorry that was confusing. :)

--D

^ permalink raw reply	[flat|nested] 95+ messages in thread
