public inbox for linux-xfs@vger.kernel.org
* fix handling of too many open zones at mount time
@ 2026-03-30  5:40 Christoph Hellwig
  2026-03-30  5:40 ` [PATCH 1/4] xfs: refactor xfs_mount_zones Christoph Hellwig
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Christoph Hellwig @ 2026-03-30  5:40 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: Damien Le Moal, Hans Holmberg, linux-xfs

Hi all,

Because there is no actual write pointer when running the zoned allocator
on conventional devices or zones, we can see spurious extra open zones
when the last blocks in a written zone have been invalidated.  This
series adds code to handle that case and remove these spurious extra
zones.  It also fixes up the mountinfo code for open zones to be
easier to parse, and adds a new sysfs file reporting the number of
currently open zones, which makes it easier to use the value in tests.

A test for the open zone behavior will be added to xfstests.

 Documentation/admin-guide/xfs.rst |    4 +
 fs/xfs/scrub/trace.h              |   12 ---
 fs/xfs/xfs_attr_item.c            |    1 
 fs/xfs/xfs_sysfs.c                |   13 +++
 fs/xfs/xfs_trace.h                |   12 ---
 fs/xfs/xfs_zone_alloc.c           |  128 ++++++++++++++++++++++++++++++++------
 fs/xfs/xfs_zone_info.c            |    8 +-
 7 files changed, 136 insertions(+), 42 deletions(-)


* [PATCH 1/4] xfs: refactor xfs_mount_zones
  2026-03-30  5:40 fix handling of too many open zones at mount time Christoph Hellwig
@ 2026-03-30  5:40 ` Christoph Hellwig
  2026-03-30  5:40 ` [PATCH 2/4] xfs: handle too many open zones when mounting Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2026-03-30  5:40 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: Damien Le Moal, Hans Holmberg, linux-xfs

xfs_mount_zones has grown a bit too big and unorganized.  Split the
zone reporting loop into a separate helper, hiding the rtg variable
there.  Print the mount message last, and also keep the VFS writeback
chunk size last instead of in the middle of the logic to calculate
the free/available blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_zone_alloc.c | 54 ++++++++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
index 06e2cb79030e..e9f1d9d08620 100644
--- a/fs/xfs/xfs_zone_alloc.c
+++ b/fs/xfs/xfs_zone_alloc.c
@@ -1230,6 +1230,29 @@ xfs_free_zone_info(
 	kfree(zi);
 }
 
+static int
+xfs_report_zones(
+	struct xfs_mount	*mp,
+	struct xfs_init_zones	*iz)
+{
+	struct xfs_rtgroup	*rtg = NULL;
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
+		xfs_rgblock_t		write_pointer;
+		int			error;
+
+		error = xfs_query_write_pointer(iz, rtg, &write_pointer);
+		if (!error)
+			error = xfs_init_zone(iz, rtg, write_pointer);
+		if (error) {
+			xfs_rtgroup_rele(rtg);
+			return error;
+		}
+	}
+
+	return 0;
+}
+
 int
 xfs_mount_zones(
 	struct xfs_mount	*mp)
@@ -1238,7 +1261,6 @@ xfs_mount_zones(
 		.zone_capacity	= mp->m_groups[XG_TYPE_RTG].blocks,
 		.zone_size	= xfs_rtgroup_raw_size(mp),
 	};
-	struct xfs_rtgroup	*rtg = NULL;
 	int			error;
 
 	if (!mp->m_rtdev_targp) {
@@ -1268,9 +1290,13 @@ xfs_mount_zones(
 	if (!mp->m_zone_info)
 		return -ENOMEM;
 
-	xfs_info(mp, "%u zones of %u blocks (%u max open zones)",
-		 mp->m_sb.sb_rgcount, iz.zone_capacity, mp->m_max_open_zones);
-	trace_xfs_zones_mount(mp);
+	error = xfs_report_zones(mp, &iz);
+	if (error)
+		goto out_free_zone_info;
+
+	xfs_set_freecounter(mp, XC_FREE_RTAVAILABLE, iz.available);
+	xfs_set_freecounter(mp, XC_FREE_RTEXTENTS,
+			iz.available + iz.reclaimable);
 
 	/*
 	 * The writeback code switches between inodes regularly to provide
@@ -1296,22 +1322,6 @@ xfs_mount_zones(
 		XFS_FSB_TO_B(mp, min(iz.zone_capacity, XFS_MAX_BMBT_EXTLEN)) >>
 			PAGE_SHIFT;
 
-	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
-		xfs_rgblock_t		write_pointer;
-
-		error = xfs_query_write_pointer(&iz, rtg, &write_pointer);
-		if (!error)
-			error = xfs_init_zone(&iz, rtg, write_pointer);
-		if (error) {
-			xfs_rtgroup_rele(rtg);
-			goto out_free_zone_info;
-		}
-	}
-
-	xfs_set_freecounter(mp, XC_FREE_RTAVAILABLE, iz.available);
-	xfs_set_freecounter(mp, XC_FREE_RTEXTENTS,
-			iz.available + iz.reclaimable);
-
 	/*
 	 * The user may configure GC to free up a percentage of unused blocks.
 	 * By default this is 0. GC will always trigger at the minimum level
@@ -1322,6 +1332,10 @@ xfs_mount_zones(
 	error = xfs_zone_gc_mount(mp);
 	if (error)
 		goto out_free_zone_info;
+
+	xfs_info(mp, "%u zones of %u blocks (%u max open zones)",
+		 mp->m_sb.sb_rgcount, iz.zone_capacity, mp->m_max_open_zones);
+	trace_xfs_zones_mount(mp);
 	return 0;
 
 out_free_zone_info:
-- 
2.47.3



* [PATCH 2/4] xfs: handle too many open zones when mounting
  2026-03-30  5:40 fix handling of too many open zones at mount time Christoph Hellwig
  2026-03-30  5:40 ` [PATCH 1/4] xfs: refactor xfs_mount_zones Christoph Hellwig
@ 2026-03-30  5:40 ` Christoph Hellwig
  2026-03-30  5:40 ` [PATCH 3/4] xfs: expose the number of open zones in sysfs Christoph Hellwig
  2026-03-30  5:41 ` [PATCH 4/4] xfs: untangle the open zones reporting in mountinfo Christoph Hellwig
  3 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2026-03-30  5:40 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: Damien Le Moal, Hans Holmberg, linux-xfs

When running on conventional zones or devices, the zoned allocator does
not have a real write pointer, but instead fakes it up at mount time
based on the last block recorded in the rmap.  This can create spurious
"open" zones when the last written blocks in a conventional zone are
invalidated.  Add a loop to the mount code to find the conventional zones
with the most used space and "finish" them.

While we're at it, also error out if there are too many open sequential
zones, which can only happen when the user overrode the max open zones
limit (or with really buggy hardware reducing the limit, but not much
we can do about that).
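The selection loop described above can be modelled in userspace roughly as
follows.  This is only a sketch under stated assumptions: struct zone,
struct counters and all function names are hypothetical stand-ins, not the
kernel structures, and a single "written" field replaces the separate
oz_allocated and oz_written counters used by the real code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an open zone, not the kernel struct. */
struct zone {
	bool		conventional;	/* zone has no hardware write pointer */
	bool		open;
	unsigned int	capacity;	/* usable blocks in the zone */
	unsigned int	written;	/* write pointer faked up at mount time */
};

/* Hypothetical stand-in for the free space counters tracked during mount. */
struct counters {
	unsigned long	available;
	unsigned long	reclaimable;
};

static size_t
count_open(const struct zone *zones, size_t nr)
{
	size_t	open = 0;

	for (size_t i = 0; i < nr; i++)
		if (zones[i].open)
			open++;
	return open;
}

/* Return the open conventional zone with the most written blocks, or NULL. */
static struct zone *
find_fullest_conventional(struct zone *zones, size_t nr)
{
	struct zone	*found = NULL;

	for (size_t i = 0; i < nr; i++) {
		struct zone	*z = &zones[i];

		if (!z->open || !z->conventional)
			continue;
		if (!found || z->written > found->written)
			found = z;
	}
	return found;
}

/*
 * Finish the fullest conventional zones until the open zone count fits the
 * limit.  Returns 0 on success, or -1 if the count is still over the limit
 * once only sequential zones remain open.
 */
int
finish_spurious_open_zones(struct zone *zones, size_t nr,
		unsigned int max_open, struct counters *c)
{
	while (count_open(zones, nr) > max_open) {
		struct zone	*z = find_fullest_conventional(zones, nr);
		unsigned int	adjust;

		if (!z)
			return -1;

		/* The unwritten tail is no longer available for new writes. */
		adjust = z->capacity - z->written;
		z->written = z->capacity;
		z->open = false;
		c->available -= adjust;
		c->reclaimable += adjust;
	}
	return 0;
}
```

Picking the fullest zone first keeps the amount of space moved from
available to reclaimable as small as possible for each zone that has to
be finished.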

Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_trace.h      |  1 +
 fs/xfs/xfs_zone_alloc.c | 74 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 5e8190fe2be9..433f84252119 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -461,6 +461,7 @@ DEFINE_EVENT(xfs_zone_alloc_class, name,			\
 DEFINE_ZONE_ALLOC_EVENT(xfs_zone_record_blocks);
 DEFINE_ZONE_ALLOC_EVENT(xfs_zone_skip_blocks);
 DEFINE_ZONE_ALLOC_EVENT(xfs_zone_alloc_blocks);
+DEFINE_ZONE_ALLOC_EVENT(xfs_zone_spurious_open);
 
 TRACE_EVENT(xfs_zone_gc_select_victim,
 	TP_PROTO(struct xfs_rtgroup *rtg, unsigned int bucket),
diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
index e9f1d9d08620..b0a6bc777c36 100644
--- a/fs/xfs/xfs_zone_alloc.c
+++ b/fs/xfs/xfs_zone_alloc.c
@@ -1253,6 +1253,76 @@ xfs_report_zones(
 	return 0;
 }
 
+static inline bool
+xfs_zone_is_conv(
+	struct xfs_rtgroup	*rtg)
+{
+	return !bdev_zone_is_seq(rtg_mount(rtg)->m_rtdev_targp->bt_bdev,
+			xfs_gbno_to_daddr(rtg_group(rtg), 0));
+}
+
+static struct xfs_open_zone *
+xfs_find_fullest_conventional_open_zone(
+	struct xfs_mount	*mp)
+{
+	struct xfs_zone_info	*zi = mp->m_zone_info;
+	struct xfs_open_zone	*found = NULL, *oz;
+
+	spin_lock(&zi->zi_open_zones_lock);
+	list_for_each_entry(oz, &zi->zi_open_zones, oz_entry) {
+		if (!xfs_zone_is_conv(oz->oz_rtg))
+			continue;
+		if (!found || oz->oz_allocated > found->oz_allocated)
+			found = oz;
+	}
+	spin_unlock(&zi->zi_open_zones_lock);
+
+	return found;
+}
+
+/*
+ * Find the fullest conventional zones and remove them from the open zone pool
+ * until we are at the open zone limit.
+ *
+ * We can end up with spurious "open" zones when the last blocks in a fully
+ * written zone were invalidated, as there is no write pointer for conventional
+ * zones.
+ *
+ * If we are still over the limit when there is no conventional open zone left,
+ * the user manually overrode the max open zones limit and we should fail.
+ */
+static int
+xfs_finish_spurious_open_zones(
+	struct xfs_mount	*mp,
+	struct xfs_init_zones	*iz)
+{
+	struct xfs_zone_info	*zi = mp->m_zone_info;
+
+	while (zi->zi_nr_open_zones > mp->m_max_open_zones) {
+		struct xfs_open_zone	*oz;
+		xfs_filblks_t		adjust;
+
+		oz = xfs_find_fullest_conventional_open_zone(mp);
+		if (!oz) {
+			xfs_err(mp,
+"too many open zones for max_open_zones limit (%u/%u)",
+			zi->zi_nr_open_zones, mp->m_max_open_zones);
+			return -EINVAL;
+		}
+
+		xfs_rtgroup_lock(oz->oz_rtg, XFS_RTGLOCK_RMAP);
+		adjust = rtg_blocks(oz->oz_rtg) - oz->oz_written;
+		trace_xfs_zone_spurious_open(oz, oz->oz_written, adjust);
+		oz->oz_written = rtg_blocks(oz->oz_rtg);
+		xfs_open_zone_mark_full(oz);
+		xfs_rtgroup_unlock(oz->oz_rtg, XFS_RTGLOCK_RMAP);
+		iz->available -= adjust;
+		iz->reclaimable += adjust;
+	}
+
+	return 0;
+}
+
 int
 xfs_mount_zones(
 	struct xfs_mount	*mp)
@@ -1294,6 +1364,10 @@ xfs_mount_zones(
 	if (error)
 		goto out_free_zone_info;
 
+	error = xfs_finish_spurious_open_zones(mp, &iz);
+	if (error)
+		goto out_free_zone_info;
+
 	xfs_set_freecounter(mp, XC_FREE_RTAVAILABLE, iz.available);
 	xfs_set_freecounter(mp, XC_FREE_RTEXTENTS,
 			iz.available + iz.reclaimable);
-- 
2.47.3



* [PATCH 3/4] xfs: expose the number of open zones in sysfs
  2026-03-30  5:40 fix handling of too many open zones at mount time Christoph Hellwig
  2026-03-30  5:40 ` [PATCH 1/4] xfs: refactor xfs_mount_zones Christoph Hellwig
  2026-03-30  5:40 ` [PATCH 2/4] xfs: handle too many open zones when mounting Christoph Hellwig
@ 2026-03-30  5:40 ` Christoph Hellwig
  2026-03-30  5:41 ` [PATCH 4/4] xfs: untangle the open zones reporting in mountinfo Christoph Hellwig
  3 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2026-03-30  5:40 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: Damien Le Moal, Hans Holmberg, linux-xfs

Add a sysfs attribute for the current number of open zones so that it
can be trivially read from userspace in monitoring or testing software.
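As an illustration of the userspace side, a monitoring or test tool could
read the value with something like the sketch below.  The helper name
read_uint_attr is hypothetical, and the path to pass in is an assumption:
it should be wherever the existing zoned attributes such as max_open_zones
are exposed for the file system in question.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Read a single unsigned decimal value from a sysfs-style attribute file.
 * Returns 0 on success or a negative errno value on failure.
 */
int
read_uint_attr(const char *path, unsigned int *value)
{
	FILE	*f = fopen(path, "r");
	int	ret = 0;

	if (!f)
		return -errno;
	/* The attribute is expected to hold one number followed by a newline. */
	if (fscanf(f, "%u", value) != 1)
		ret = -EINVAL;
	fclose(f);
	return ret;
}
```

A test could call this once for nr_open_zones and once for max_open_zones
and compare the two values.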

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/admin-guide/xfs.rst |  4 ++++
 fs/xfs/xfs_sysfs.c                | 13 +++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst
index 746ea60eed3f..eb3085421276 100644
--- a/Documentation/admin-guide/xfs.rst
+++ b/Documentation/admin-guide/xfs.rst
@@ -550,6 +550,10 @@ For zoned file systems, the following attributes are exposed in:
 	is limited by the capabilities of the backing zoned device, file system
 	size and the max_open_zones mount option.
 
+  nr_open_zones			(Min:  1  Default:  Varies  Max:  UINTMAX)
+	This read-only attribute exposes the current number of open zones
+	used by the file system.
+
   zonegc_low_space		(Min:  0  Default:  0  Max:  100)
 	Define a percentage for how much of the unused space that GC should keep
 	available for writing. A high value will reclaim more of the space
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 6c7909838234..43a02c622b6d 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -13,6 +13,7 @@
 #include "xfs_log.h"
 #include "xfs_log_priv.h"
 #include "xfs_mount.h"
+#include "xfs_zone_priv.h"
 #include "xfs_zones.h"
 
 struct xfs_sysfs_attr {
@@ -718,6 +719,17 @@ max_open_zones_show(
 }
 XFS_SYSFS_ATTR_RO(max_open_zones);
 
+static ssize_t
+nr_open_zones_show(
+	struct kobject		*kobj,
+	char			*buf)
+{
+	struct xfs_zone_info	*zi = zoned_to_mp(kobj)->m_zone_info;
+
+	return sysfs_emit(buf, "%u\n", zi->zi_nr_open_zones);
+}
+XFS_SYSFS_ATTR_RO(nr_open_zones);
+
 static ssize_t
 zonegc_low_space_store(
 	struct kobject		*kobj,
@@ -751,6 +763,7 @@ XFS_SYSFS_ATTR_RW(zonegc_low_space);
 
 static struct attribute *xfs_zoned_attrs[] = {
 	ATTR_LIST(max_open_zones),
+	ATTR_LIST(nr_open_zones),
 	ATTR_LIST(zonegc_low_space),
 	NULL,
 };
-- 
2.47.3



* [PATCH 4/4] xfs: untangle the open zones reporting in mountinfo
  2026-03-30  5:40 fix handling of too many open zones at mount time Christoph Hellwig
                   ` (2 preceding siblings ...)
  2026-03-30  5:40 ` [PATCH 3/4] xfs: expose the number of open zones in sysfs Christoph Hellwig
@ 2026-03-30  5:41 ` Christoph Hellwig
  2026-03-30  5:47   ` Christoph Hellwig
  3 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2026-03-30  5:41 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: Damien Le Moal, Hans Holmberg, linux-xfs

Keeping a value per line makes parsing much easier, so move the maximum
number of open zones into a separate line, and also add a new line for
the current number of open GC zones.  While that currently has to be
either 0 or 1, having a value future-proofs the interface for adding
more open GC zones if needed.
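To show why one value per line helps, here is a rough sketch of a parser
for the new format.  The function name find_stat is hypothetical, and the
code assumes the stats text has already been read into a NUL-terminated
buffer; it only relies on the "<key>: <number>" line layout added by this
patch.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Search text for a line of the form "<key>: <number>", ignoring leading
 * blanks, and store the number in *value.  Returns false if no line with
 * that key and a numeric value on the same line exists.
 */
bool
find_stat(const char *text, const char *key, unsigned int *value)
{
	size_t		keylen = strlen(key);
	const char	*line = text;

	while (*line) {
		const char	*eol = strchr(line, '\n');
		const char	*p = line;

		while (*p == ' ' || *p == '\t')
			p++;
		if (strncmp(p, key, keylen) == 0 && p[keylen] == ':') {
			const char	*v = p + keylen + 1;

			/* Only accept a value on the same line as the key. */
			while (*v == ' ' || *v == '\t')
				v++;
			if (isdigit((unsigned char)*v)) {
				*value = (unsigned int)strtoul(v, NULL, 10);
				return true;
			}
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return false;
}
```

With the old "number of open zones: %u / %u" line a caller had to split
two values out of one field, while here each key maps to exactly one
number.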

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_zone_info.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_zone_info.c b/fs/xfs/xfs_zone_info.c
index a2af44011654..8e56181021b3 100644
--- a/fs/xfs/xfs_zone_info.c
+++ b/fs/xfs/xfs_zone_info.c
@@ -95,8 +95,12 @@ xfs_zoned_show_stats(
 	seq_printf(m, "\tfree zones: %d\n", atomic_read(&zi->zi_nr_free_zones));
 
 	spin_lock(&zi->zi_open_zones_lock);
-	seq_printf(m, "\tnumber of open zones: %u / %u\n",
-		zi->zi_nr_open_zones, mp->m_max_open_zones);
+	seq_printf(m, "\tmax open zones: %u\n",
+		mp->m_max_open_zones);
+	seq_printf(m, "\tnr open zones: %u\n",
+		zi->zi_nr_open_zones);
+	seq_printf(m, "\tnr open GC zones: %u\n",
+		zi->zi_nr_open_gc_zones);
 	seq_puts(m, "\topen zones:\n");
 	list_for_each_entry(oz, &zi->zi_open_zones, oz_entry)
 		xfs_show_open_zone(m, oz);
-- 
2.47.3



* Re: [PATCH 4/4] xfs: untangle the open zones reporting in mountinfo
  2026-03-30  5:41 ` [PATCH 4/4] xfs: untangle the open zones reporting in mountinfo Christoph Hellwig
@ 2026-03-30  5:47   ` Christoph Hellwig
  2026-03-30  9:55     ` Niklas Cassel
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2026-03-30  5:47 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: Damien Le Moal, Hans Holmberg, linux-xfs

Note that this patch actually requires the "cleanup open GC zone handling"
series.  I had the two the other way around in my local tree, but decided
to send the series with the fix out first after a trivial rebase worked
fine, which obviously wasn't a good enough test.  Sorry.



* Re: [PATCH 4/4] xfs: untangle the open zones reporting in mountinfo
  2026-03-30  5:47   ` Christoph Hellwig
@ 2026-03-30  9:55     ` Niklas Cassel
  2026-03-30 13:10       ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Niklas Cassel @ 2026-03-30  9:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Carlos Maiolino, Damien Le Moal, Hans Holmberg, linux-xfs

On Mon, Mar 30, 2026 at 07:47:32AM +0200, Christoph Hellwig wrote:
> Note that this patch actually requires the "cleanup open GC zone handling"
> series.  I had the two the other way around in my local tree, but decided
> to send the series with the fix out first after a trivial rebase worked
> fine, which obviously wasn't a good enough test.  Sorry.

I don't know if this series should be applied to 7.0 or not, but if so,
then I'm guessing that we do not want this series to depend on a cleanup
series.

If both series are targeting 7.1, then it probably doesn't matter.


Side note: You probably know this already, but if you had it the other way
around in your local tree, "git reflog" can be of great help to get the
SHA1 of the commit before the trivial rebase, even if you no longer have
any references to that old SHA1.


Kind regards,
Niklas


* Re: [PATCH 4/4] xfs: untangle the open zones reporting in mountinfo
  2026-03-30  9:55     ` Niklas Cassel
@ 2026-03-30 13:10       ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2026-03-30 13:10 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Christoph Hellwig, Carlos Maiolino, Damien Le Moal, Hans Holmberg,
	linux-xfs

On Mon, Mar 30, 2026 at 11:55:01AM +0200, Niklas Cassel wrote:
> On Mon, Mar 30, 2026 at 07:47:32AM +0200, Christoph Hellwig wrote:
> > Note that this patch actually requires the "cleanup open GC zone handling"
> > series.  I had the two the other way around in my local tree, but decided
> > to send the series with the fix out first after a trivial rebase worked
> > fine, which obviously wasn't a good enough test.  Sorry.
> 
> I don't know if this series should be applied to 7.0 or not, but if so,
> then I'm guessing that we do not want this series to depend on a cleanup
> series.

Everything here is targeted at 7.1.


