fix handling of too many open zones at mount time v2

public inbox for linux-xfs@vger.kernel.org

* fix handling of too many open zones at mount time v2
@ 2026-03-31 15:26 Christoph Hellwig
  2026-03-31 15:26 ` [PATCH 1/2] xfs: refactor xfs_mount_zones Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Christoph Hellwig @ 2026-03-31 15:26 UTC
  To: Carlos Maiolino; +Cc: Damien Le Moal, Hans Holmberg, linux-xfs

Hi all,

because there is no actual write pointer when running the zoned allocator
on conventional devices or zones, we can see spurious extra open zones
when the last blocks in a written zone have been invalidated.  This
series adds code to handle that case and remove these spurious extra
zones.  It also fixes up the mountinfo code for open zones to be
easier to parse, and adds a new sysfs file reporting the currently
open zones, which makes it easier to use the value in tests.
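
To illustrate the failure mode, here is a minimal userspace sketch; it is
not the kernel code.  It assumes a hypothetical toy_zone structure and fakes
a write pointer from the highest block that still has a mapping, roughly as
described above for conventional zones.  All names, and the rule that a
partially written zone counts as "open", are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one conventional zone; not an XFS structure. */
struct toy_zone {
	unsigned int	capacity;	/* number of blocks in the zone */
	const bool	*used;		/* used[i]: block i still has a mapping */
};

/*
 * Fake a write pointer: one past the highest block that is still mapped.
 * A sequential zone would report the hardware write pointer instead.
 */
static unsigned int
toy_fake_write_pointer(const struct toy_zone *z)
{
	unsigned int	i;

	assert(z->used != NULL);
	for (i = z->capacity; i > 0; i--)
		if (z->used[i - 1])
			return i;
	return 0;
}

/* Assumed rule: a zone looks "open" if the pointer says partially written. */
static bool
toy_zone_looks_open(const struct toy_zone *z)
{
	unsigned int	wp = toy_fake_write_pointer(z);

	return wp > 0 && wp < z->capacity;
}
```

In this model a zone that was written to the end and then had its tail
blocks invalidated gets a faked write pointer in the middle of the zone, so
it looks open even though it can never be written again.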

A test for the open zone behavior will be added to xfstests.

Changes since v1:
 - spelling and other documentation fixes
 - move two patches not directly related to the fix to a different
   series

Diffstat:
 xfs_trace.h      |    1 
 xfs_zone_alloc.c |  129 ++++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 110 insertions(+), 20 deletions(-)


* [PATCH 1/2] xfs: refactor xfs_mount_zones
  2026-03-31 15:26 fix handling of too many open zones at mount time v2 Christoph Hellwig
@ 2026-03-31 15:26 ` Christoph Hellwig
  2026-03-31 19:37   ` Damien Le Moal
  2026-03-31 15:26 ` [PATCH 2/2] xfs: handle too many open zones when mounting Christoph Hellwig
  2026-04-07 13:38 ` fix handling of too many open zones at mount time v2 Carlos Maiolino
  2 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2026-03-31 15:26 UTC
  To: Carlos Maiolino; +Cc: Damien Le Moal, Hans Holmberg, linux-xfs

xfs_mount_zones has grown a bit too big and unorganized.  Split the
zone reporting loop into a separate helper, hiding the rtg variable
there.  Print the mount message last, and also keep the VFS writeback
chunk size last instead of in the middle of the logic to calculate
the free/available blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
---
 fs/xfs/xfs_zone_alloc.c | 54 ++++++++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
index 06e2cb79030e..e9f1d9d08620 100644
--- a/fs/xfs/xfs_zone_alloc.c
+++ b/fs/xfs/xfs_zone_alloc.c
@@ -1230,6 +1230,29 @@ xfs_free_zone_info(
 	kfree(zi);
 }
 
+static int
+xfs_report_zones(
+	struct xfs_mount	*mp,
+	struct xfs_init_zones	*iz)
+{
+	struct xfs_rtgroup	*rtg = NULL;
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
+		xfs_rgblock_t		write_pointer;
+		int			error;
+
+		error = xfs_query_write_pointer(iz, rtg, &write_pointer);
+		if (!error)
+			error = xfs_init_zone(iz, rtg, write_pointer);
+		if (error) {
+			xfs_rtgroup_rele(rtg);
+			return error;
+		}
+	}
+
+	return 0;
+}
+
 int
 xfs_mount_zones(
 	struct xfs_mount	*mp)
@@ -1238,7 +1261,6 @@ xfs_mount_zones(
 		.zone_capacity	= mp->m_groups[XG_TYPE_RTG].blocks,
 		.zone_size	= xfs_rtgroup_raw_size(mp),
 	};
-	struct xfs_rtgroup	*rtg = NULL;
 	int			error;
 
 	if (!mp->m_rtdev_targp) {
@@ -1268,9 +1290,13 @@ xfs_mount_zones(
 	if (!mp->m_zone_info)
 		return -ENOMEM;
 
-	xfs_info(mp, "%u zones of %u blocks (%u max open zones)",
-		 mp->m_sb.sb_rgcount, iz.zone_capacity, mp->m_max_open_zones);
-	trace_xfs_zones_mount(mp);
+	error = xfs_report_zones(mp, &iz);
+	if (error)
+		goto out_free_zone_info;
+
+	xfs_set_freecounter(mp, XC_FREE_RTAVAILABLE, iz.available);
+	xfs_set_freecounter(mp, XC_FREE_RTEXTENTS,
+			iz.available + iz.reclaimable);
 
 	/*
 	 * The writeback code switches between inodes regularly to provide
@@ -1296,22 +1322,6 @@ xfs_mount_zones(
 		XFS_FSB_TO_B(mp, min(iz.zone_capacity, XFS_MAX_BMBT_EXTLEN)) >>
 			PAGE_SHIFT;
 
-	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
-		xfs_rgblock_t		write_pointer;
-
-		error = xfs_query_write_pointer(&iz, rtg, &write_pointer);
-		if (!error)
-			error = xfs_init_zone(&iz, rtg, write_pointer);
-		if (error) {
-			xfs_rtgroup_rele(rtg);
-			goto out_free_zone_info;
-		}
-	}
-
-	xfs_set_freecounter(mp, XC_FREE_RTAVAILABLE, iz.available);
-	xfs_set_freecounter(mp, XC_FREE_RTEXTENTS,
-			iz.available + iz.reclaimable);
-
 	/*
 	 * The user may configure GC to free up a percentage of unused blocks.
 	 * By default this is 0. GC will always trigger at the minimum level
@@ -1322,6 +1332,10 @@ xfs_mount_zones(
 	error = xfs_zone_gc_mount(mp);
 	if (error)
 		goto out_free_zone_info;
+
+	xfs_info(mp, "%u zones of %u blocks (%u max open zones)",
+		 mp->m_sb.sb_rgcount, iz.zone_capacity, mp->m_max_open_zones);
+	trace_xfs_zones_mount(mp);
 	return 0;
 
 out_free_zone_info:
-- 
2.47.3



* Re: [PATCH 1/2] xfs: refactor xfs_mount_zones
  2026-03-31 15:26 ` [PATCH 1/2] xfs: refactor xfs_mount_zones Christoph Hellwig
@ 2026-03-31 19:37   ` Damien Le Moal
  0 siblings, 0 replies; 6+ messages in thread
From: Damien Le Moal @ 2026-03-31 19:37 UTC
  To: Christoph Hellwig, Carlos Maiolino; +Cc: Hans Holmberg, linux-xfs

On 4/1/26 00:26, Christoph Hellwig wrote:
> xfs_mount_zones has grown a bit too big and unorganized.  Split the
> zone reporting loop into a separate helper, hiding the rtg variable
> there.  Print the mount message last, and also keep the VFS writeback
> chunk size last instead of in the middle of the logic to calculate
> the free/available blocks.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research


* [PATCH 2/2] xfs: handle too many open zones when mounting
  2026-03-31 15:26 fix handling of too many open zones at mount time v2 Christoph Hellwig
  2026-03-31 15:26 ` [PATCH 1/2] xfs: refactor xfs_mount_zones Christoph Hellwig
@ 2026-03-31 15:26 ` Christoph Hellwig
  2026-03-31 19:38   ` Damien Le Moal
  2026-04-07 13:38 ` fix handling of too many open zones at mount time v2 Carlos Maiolino
  2 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2026-03-31 15:26 UTC
  To: Carlos Maiolino; +Cc: Damien Le Moal, Hans Holmberg, linux-xfs

When running on conventional zones or devices, the zoned allocator does
not have a real write pointer, but instead fakes it up at mount time
based on the last block recorded in the rmap.  This can create spurious
"open" zones when the last written blocks in a conventional zone are
invalidated.  Add a loop to the mount code that repeatedly finds the
conventional zone with the highest used block in the rmap tree and
"finishes" it until we are no longer above the open zones limit.

While we're at it, also error out if there are too many open sequential
zones, which can only happen when the user overrode the max open zones
limit (or with really buggy hardware reducing the limit, but not much
we can do about that).
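
The loop described above can be sketched in userspace roughly as follows.
This is a simplified model and not the code in the patch: the toy_* types
and helpers are hypothetical, there is no locking, and the fullest zone is
picked by its faked write pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy open zone; fields only loosely mirror the patch. */
struct toy_open_zone {
	bool		conventional;	/* no hardware write pointer */
	bool		open;		/* still counted as open */
	unsigned int	written;	/* faked or real write pointer */
};

struct toy_counters {
	unsigned int	nr_open;
	unsigned long	available;
	unsigned long	reclaimable;
};

/* Return the open conventional zone with the highest write pointer. */
static struct toy_open_zone *
toy_find_fullest_conv(struct toy_open_zone *zones, size_t nr)
{
	struct toy_open_zone	*found = NULL;
	size_t			i;

	for (i = 0; i < nr; i++) {
		if (!zones[i].open || !zones[i].conventional)
			continue;
		if (!found || zones[i].written > found->written)
			found = &zones[i];
	}
	return found;
}

/*
 * Finish conventional zones until the limit is met.  Returns 0 on success,
 * or -1 if we are still over the limit with no conventional zone left.
 */
static int
toy_finish_spurious(struct toy_open_zone *zones, size_t nr,
		unsigned int capacity, unsigned int max_open,
		struct toy_counters *c)
{
	while (c->nr_open > max_open) {
		struct toy_open_zone	*oz = toy_find_fullest_conv(zones, nr);
		unsigned int		adjust;

		if (!oz)
			return -1;
		assert(oz->written <= capacity);
		adjust = capacity - oz->written;
		oz->written = capacity;
		oz->open = false;
		c->nr_open--;
		/* The unwritten tail moves from available to reclaimable. */
		c->available -= adjust;
		c->reclaimable += adjust;
	}
	return 0;
}
```

Picking the fullest zone first keeps the number of blocks moved from
available to reclaimable as small as possible for each zone that is closed.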

Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
---
 fs/xfs/xfs_trace.h      |  1 +
 fs/xfs/xfs_zone_alloc.c | 75 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 60d1e605dfa5..c5ad26a1d7bb 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -461,6 +461,7 @@ DEFINE_EVENT(xfs_zone_alloc_class, name,			\
 DEFINE_ZONE_ALLOC_EVENT(xfs_zone_record_blocks);
 DEFINE_ZONE_ALLOC_EVENT(xfs_zone_skip_blocks);
 DEFINE_ZONE_ALLOC_EVENT(xfs_zone_alloc_blocks);
+DEFINE_ZONE_ALLOC_EVENT(xfs_zone_spurious_open);
 
 TRACE_EVENT(xfs_zone_gc_select_victim,
 	TP_PROTO(struct xfs_rtgroup *rtg, unsigned int bucket),
diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
index e9f1d9d08620..5f8b6cbeebfd 100644
--- a/fs/xfs/xfs_zone_alloc.c
+++ b/fs/xfs/xfs_zone_alloc.c
@@ -1253,6 +1253,77 @@ xfs_report_zones(
 	return 0;
 }
 
+static inline bool
+xfs_zone_is_conv(
+	struct xfs_rtgroup	*rtg)
+{
+	return !bdev_zone_is_seq(rtg_mount(rtg)->m_rtdev_targp->bt_bdev,
+			xfs_gbno_to_daddr(rtg_group(rtg), 0));
+}
+
+static struct xfs_open_zone *
+xfs_find_fullest_conventional_open_zone(
+	struct xfs_mount	*mp)
+{
+	struct xfs_zone_info	*zi = mp->m_zone_info;
+	struct xfs_open_zone	*found = NULL, *oz;
+
+	spin_lock(&zi->zi_open_zones_lock);
+	list_for_each_entry(oz, &zi->zi_open_zones, oz_entry) {
+		if (!xfs_zone_is_conv(oz->oz_rtg))
+			continue;
+		if (!found || oz->oz_allocated > found->oz_allocated)
+			found = oz;
+	}
+	spin_unlock(&zi->zi_open_zones_lock);
+
+	return found;
+}
+
+/*
+ * Find the fullest conventional zones and remove them from the open zone pool
+ * until we are at the open zone limit.
+ *
+ * We can end up with spurious "open" zones when the last blocks in a fully
+ * written zone were invalidated, as there is no write pointer for conventional
+ * zones.
+ *
+ * If we are still over the limit when there is no conventional open zone left,
+ * the user overrode the max open zones limit using the max_open_zones mount
+ * option, and we should fail.
+ */
+static int
+xfs_finish_spurious_open_zones(
+	struct xfs_mount	*mp,
+	struct xfs_init_zones	*iz)
+{
+	struct xfs_zone_info	*zi = mp->m_zone_info;
+
+	while (zi->zi_nr_open_zones > mp->m_max_open_zones) {
+		struct xfs_open_zone	*oz;
+		xfs_filblks_t		adjust;
+
+		oz = xfs_find_fullest_conventional_open_zone(mp);
+		if (!oz) {
+			xfs_err(mp,
+"too many open zones for max_open_zones limit (%u/%u)",
+			zi->zi_nr_open_zones, mp->m_max_open_zones);
+			return -EINVAL;
+		}
+
+		xfs_rtgroup_lock(oz->oz_rtg, XFS_RTGLOCK_RMAP);
+		adjust = rtg_blocks(oz->oz_rtg) - oz->oz_written;
+		trace_xfs_zone_spurious_open(oz, oz->oz_written, adjust);
+		oz->oz_written = rtg_blocks(oz->oz_rtg);
+		xfs_open_zone_mark_full(oz);
+		xfs_rtgroup_unlock(oz->oz_rtg, XFS_RTGLOCK_RMAP);
+		iz->available -= adjust;
+		iz->reclaimable += adjust;
+	}
+
+	return 0;
+}
+
 int
 xfs_mount_zones(
 	struct xfs_mount	*mp)
@@ -1294,6 +1365,10 @@ xfs_mount_zones(
 	if (error)
 		goto out_free_zone_info;
 
+	error = xfs_finish_spurious_open_zones(mp, &iz);
+	if (error)
+		goto out_free_zone_info;
+
 	xfs_set_freecounter(mp, XC_FREE_RTAVAILABLE, iz.available);
 	xfs_set_freecounter(mp, XC_FREE_RTEXTENTS,
 			iz.available + iz.reclaimable);
-- 
2.47.3



* Re: [PATCH 2/2] xfs: handle too many open zones when mounting
  2026-03-31 15:26 ` [PATCH 2/2] xfs: handle too many open zones when mounting Christoph Hellwig
@ 2026-03-31 19:38   ` Damien Le Moal
  0 siblings, 0 replies; 6+ messages in thread
From: Damien Le Moal @ 2026-03-31 19:38 UTC
  To: Christoph Hellwig, Carlos Maiolino; +Cc: Hans Holmberg, linux-xfs

On 4/1/26 00:26, Christoph Hellwig wrote:
> When running on conventional zones or devices, the zoned allocator does
> not have a real write pointer, but instead fakes it up at mount time
> based on the last block recorded in the rmap.  This can create spurious
> "open" zones when the last written blocks in a conventional zone are
> invalidated.  Add a loop to the mount code that repeatedly finds the
> conventional zone with the highest used block in the rmap tree and
> "finishes" it until we are no longer above the open zones limit.
> 
> While we're at it, also error out if there are too many open sequential
> zones, which can only happen when the user overrode the max open zones
> limit (or with really buggy hardware reducing the limit, but not much
> we can do about that).
> 
> Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


* Re: fix handling of too many open zones at mount time v2
  2026-03-31 15:26 fix handling of too many open zones at mount time v2 Christoph Hellwig
  2026-03-31 15:26 ` [PATCH 1/2] xfs: refactor xfs_mount_zones Christoph Hellwig
  2026-03-31 15:26 ` [PATCH 2/2] xfs: handle too many open zones when mounting Christoph Hellwig
@ 2026-04-07 13:38 ` Carlos Maiolino
  2 siblings, 0 replies; 6+ messages in thread
From: Carlos Maiolino @ 2026-04-07 13:38 UTC
  To: Christoph Hellwig; +Cc: Damien Le Moal, Hans Holmberg, linux-xfs

On Tue, 31 Mar 2026 17:26:04 +0200, Christoph Hellwig wrote:
> because there is no actual write pointer when running the zoned allocator
> on conventional devices or zones, we can see spurious extra open zones
> when the last blocks in a written zone have been invalidated.  This
> series adds code to handle that case and remove these spurious extra
> zones.  It also fixes up the mountinfo code for open zones to be
> easier to parse, and adds a new sysfs file reporting the currently
> open zones, which makes it easier to use the value in tests.
> 
> [...]

Applied to for-next, thanks!

[1/2] xfs: refactor xfs_mount_zones
      commit: 02367990bdcbeabb0ffd3e8e227e5f79a04186fc
[2/2] xfs: handle too many open zones when mounting
      commit: c6584888864e36d6225a6c16d8c39fd2aa9a45d8

Best regards,
-- 
Carlos Maiolino <cem@kernel.org>


