* [PATCH] xfs: always allocate the free zone with the lowest index
@ 2026-01-20 8:57 Hans Holmberg
2026-01-20 15:53 ` Darrick J. Wong
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Hans Holmberg @ 2026-01-20 8:57 UTC (permalink / raw)
To: linux-xfs
Cc: Carlos Maiolino, Dave Chinner, Darrick J . Wong,
Christoph Hellwig, dlemoal, johannes.thumshirn, Hans Holmberg
Zones in the beginning of the address space are typically mapped to
higher bandwidth tracks on HDDs than those at the end of the address
space. So, instead of allocating zones "round robin" across the whole
address space, always allocate the zone with the lowest index.
This increases average write bandwidth for overwrite workloads
when less than the full capacity is being used. At ~50% utilization
this improves bandwidth for a random file overwrite benchmark
with 128MiB files and 256MiB zone capacity by 30%.
Running the same benchmark with small 2-8 MiB files at 67% capacity
shows no significant difference in performance. Due to heavy
fragmentation, the whole zone range is in use, greatly limiting the
number of free zones with high bandwidth.
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
---
fs/xfs/xfs_zone_alloc.c | 47 +++++++++++++++--------------------------
fs/xfs/xfs_zone_priv.h | 1 -
2 files changed, 17 insertions(+), 31 deletions(-)
diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
index bbcf21704ea0..d6c97026f733 100644
--- a/fs/xfs/xfs_zone_alloc.c
+++ b/fs/xfs/xfs_zone_alloc.c
@@ -408,31 +408,6 @@ xfs_zone_free_blocks(
return 0;
}
-static struct xfs_group *
-xfs_find_free_zone(
- struct xfs_mount *mp,
- unsigned long start,
- unsigned long end)
-{
- struct xfs_zone_info *zi = mp->m_zone_info;
- XA_STATE (xas, &mp->m_groups[XG_TYPE_RTG].xa, start);
- struct xfs_group *xg;
-
- xas_lock(&xas);
- xas_for_each_marked(&xas, xg, end, XFS_RTG_FREE)
- if (atomic_inc_not_zero(&xg->xg_active_ref))
- goto found;
- xas_unlock(&xas);
- return NULL;
-
-found:
- xas_clear_mark(&xas, XFS_RTG_FREE);
- atomic_dec(&zi->zi_nr_free_zones);
- zi->zi_free_zone_cursor = xg->xg_gno;
- xas_unlock(&xas);
- return xg;
-}
-
static struct xfs_open_zone *
xfs_init_open_zone(
struct xfs_rtgroup *rtg,
@@ -472,13 +447,25 @@ xfs_open_zone(
bool is_gc)
{
struct xfs_zone_info *zi = mp->m_zone_info;
+ XA_STATE (xas, &mp->m_groups[XG_TYPE_RTG].xa, 0);
struct xfs_group *xg;
- xg = xfs_find_free_zone(mp, zi->zi_free_zone_cursor, ULONG_MAX);
- if (!xg)
- xg = xfs_find_free_zone(mp, 0, zi->zi_free_zone_cursor);
- if (!xg)
- return NULL;
+ /*
+ * Pick the free zone with the lowest index. Zones in the beginning of the
+ * address space typically provide higher bandwidth than those at the
+ * end of the address space on HDDs.
+ */
+ xas_lock(&xas);
+ xas_for_each_marked(&xas, xg, ULONG_MAX, XFS_RTG_FREE)
+ if (atomic_inc_not_zero(&xg->xg_active_ref))
+ goto found;
+ xas_unlock(&xas);
+ return NULL;
+
+found:
+ xas_clear_mark(&xas, XFS_RTG_FREE);
+ atomic_dec(&zi->zi_nr_free_zones);
+ xas_unlock(&xas);
set_current_state(TASK_RUNNING);
return xfs_init_open_zone(to_rtg(xg), 0, write_hint, is_gc);
diff --git a/fs/xfs/xfs_zone_priv.h b/fs/xfs/xfs_zone_priv.h
index ce7f0e2f4598..8fbf9a52964e 100644
--- a/fs/xfs/xfs_zone_priv.h
+++ b/fs/xfs/xfs_zone_priv.h
@@ -72,7 +72,6 @@ struct xfs_zone_info {
/*
* Free zone search cursor and number of free zones:
*/
- unsigned long zi_free_zone_cursor;
atomic_t zi_nr_free_zones;
/*
--
2.40.1
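For readers following along, here is a minimal user-space sketch of the two policies the patch swaps. It is an illustration only, not the kernel code: the zone_free[] array is a hypothetical stand-in for the XFS_RTG_FREE xarray mark, and NR_ZONES, reset_zones(), alloc_round_robin() and alloc_lowest() are names made up for this example. Locking and reference counting are left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_ZONES 8	/* hypothetical, tiny zone count for illustration */

/* Hypothetical stand-in for the XFS_RTG_FREE mark: true means free. */
static bool zone_free[NR_ZONES];

/* Mark every zone as free again. */
static void reset_zones(void)
{
	for (size_t i = 0; i < NR_ZONES; i++)
		zone_free[i] = true;
}

/*
 * Old policy: search from the cursor to the end of the address space,
 * then wrap around to the start. The cursor remembers the last zone
 * handed out. Returns the zone index, or -1 if no zone is free.
 */
static int alloc_round_robin(size_t *cursor)
{
	for (size_t n = 0; n < NR_ZONES; n++) {
		size_t i = (*cursor + n) % NR_ZONES;

		if (zone_free[i]) {
			zone_free[i] = false;
			*cursor = i;
			return (int)i;
		}
	}
	return -1;
}

/*
 * New policy: always take the free zone with the lowest index.
 * Returns the zone index, or -1 if no zone is free.
 */
static int alloc_lowest(void)
{
	for (size_t i = 0; i < NR_ZONES; i++) {
		if (zone_free[i]) {
			zone_free[i] = false;
			return (int)i;
		}
	}
	return -1;
}
```

Starting with all eight zones free, taking three zones and then freeing zone 0 again makes the round-robin variant hand out zone 3 next, while the lowest-index variant reuses zone 0. That reuse of low-numbered zones is what keeps writes on the faster part of an HDD when the file system is only partly full.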
* Re: [PATCH] xfs: always allocate the free zone with the lowest index
2026-01-20 8:57 [PATCH] xfs: always allocate the free zone with the lowest index Hans Holmberg
@ 2026-01-20 15:53 ` Darrick J. Wong
2026-01-21 7:16 ` Christoph Hellwig
2026-01-21 7:23 ` Hans Holmberg
2026-01-21 7:16 ` Christoph Hellwig
2026-01-21 12:24 ` Carlos Maiolino
2 siblings, 2 replies; 6+ messages in thread
From: Darrick J. Wong @ 2026-01-20 15:53 UTC (permalink / raw)
To: Hans Holmberg
Cc: linux-xfs, Carlos Maiolino, Dave Chinner, Christoph Hellwig,
dlemoal, johannes.thumshirn
On Tue, Jan 20, 2026 at 09:57:46AM +0100, Hans Holmberg wrote:
> Zones in the beginning of the address space are typically mapped to
> higher bandwidth tracks on HDDs than those at the end of the address
> space. So, instead of allocating zones "round robin" across the whole
> address space, always allocate the zone with the lowest index.
Does it make any difference if it's a zoned ssd? I'd imagine not, but I
wonder if there are any longer term side effects like lower-numbered
zones filling up and getting gc'd more often?
--D
> This increases average write bandwidth for overwrite workloads
> when less than the full capacity is being used. At ~50% utilization
> this improves bandwidth for a random file overwrite benchmark
> with 128MiB files and 256MiB zone capacity by 30%.
>
> Running the same benchmark with small 2-8 MiB files at 67% capacity
> shows no significant difference in performance. Due to heavy
> fragmentation the whole zone range is in use, greatly limiting the
> number of free zones with high bw.
>
> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
> ---
>
> fs/xfs/xfs_zone_alloc.c | 47 +++++++++++++++--------------------------
> fs/xfs/xfs_zone_priv.h | 1 -
> 2 files changed, 17 insertions(+), 31 deletions(-)
>
> diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
> index bbcf21704ea0..d6c97026f733 100644
> --- a/fs/xfs/xfs_zone_alloc.c
> +++ b/fs/xfs/xfs_zone_alloc.c
> @@ -408,31 +408,6 @@ xfs_zone_free_blocks(
> return 0;
> }
>
> -static struct xfs_group *
> -xfs_find_free_zone(
> - struct xfs_mount *mp,
> - unsigned long start,
> - unsigned long end)
> -{
> - struct xfs_zone_info *zi = mp->m_zone_info;
> - XA_STATE (xas, &mp->m_groups[XG_TYPE_RTG].xa, start);
> - struct xfs_group *xg;
> -
> - xas_lock(&xas);
> - xas_for_each_marked(&xas, xg, end, XFS_RTG_FREE)
> - if (atomic_inc_not_zero(&xg->xg_active_ref))
> - goto found;
> - xas_unlock(&xas);
> - return NULL;
> -
> -found:
> - xas_clear_mark(&xas, XFS_RTG_FREE);
> - atomic_dec(&zi->zi_nr_free_zones);
> - zi->zi_free_zone_cursor = xg->xg_gno;
> - xas_unlock(&xas);
> - return xg;
> -}
> -
> static struct xfs_open_zone *
> xfs_init_open_zone(
> struct xfs_rtgroup *rtg,
> @@ -472,13 +447,25 @@ xfs_open_zone(
> bool is_gc)
> {
> struct xfs_zone_info *zi = mp->m_zone_info;
> + XA_STATE (xas, &mp->m_groups[XG_TYPE_RTG].xa, 0);
> struct xfs_group *xg;
>
> - xg = xfs_find_free_zone(mp, zi->zi_free_zone_cursor, ULONG_MAX);
> - if (!xg)
> - xg = xfs_find_free_zone(mp, 0, zi->zi_free_zone_cursor);
> - if (!xg)
> - return NULL;
> + /*
> + * Pick the free zone with lowest index. Zones in the beginning of the
> + * address space typically provides higher bandwidth than those at the
> + * end of the address space on HDDs.
> + */
> + xas_lock(&xas);
> + xas_for_each_marked(&xas, xg, ULONG_MAX, XFS_RTG_FREE)
> + if (atomic_inc_not_zero(&xg->xg_active_ref))
> + goto found;
> + xas_unlock(&xas);
> + return NULL;
> +
> +found:
> + xas_clear_mark(&xas, XFS_RTG_FREE);
> + atomic_dec(&zi->zi_nr_free_zones);
> + xas_unlock(&xas);
>
> set_current_state(TASK_RUNNING);
> return xfs_init_open_zone(to_rtg(xg), 0, write_hint, is_gc);
> diff --git a/fs/xfs/xfs_zone_priv.h b/fs/xfs/xfs_zone_priv.h
> index ce7f0e2f4598..8fbf9a52964e 100644
> --- a/fs/xfs/xfs_zone_priv.h
> +++ b/fs/xfs/xfs_zone_priv.h
> @@ -72,7 +72,6 @@ struct xfs_zone_info {
> /*
> * Free zone search cursor and number of free zones:
> */
> - unsigned long zi_free_zone_cursor;
> atomic_t zi_nr_free_zones;
>
> /*
> --
> 2.40.1
>
>
* Re: [PATCH] xfs: always allocate the free zone with the lowest index
2026-01-20 15:53 ` Darrick J. Wong
@ 2026-01-21 7:16 ` Christoph Hellwig
2026-01-21 7:23 ` Hans Holmberg
1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2026-01-21 7:16 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Hans Holmberg, linux-xfs, Carlos Maiolino, Dave Chinner,
Christoph Hellwig, dlemoal, johannes.thumshirn
On Tue, Jan 20, 2026 at 07:53:29AM -0800, Darrick J. Wong wrote:
> On Tue, Jan 20, 2026 at 09:57:46AM +0100, Hans Holmberg wrote:
> > Zones in the beginning of the address space are typically mapped to
> > higher bandwidth tracks on HDDs than those at the end of the address
> > space. So, instead of allocating zones "round robin" across the whole
> > address space, always allocate the zone with the lowest index.
>
> Does it make any difference if it's a zoned ssd? I'd imagine not, but I
> wonder if there are any longer term side effects like lower-numbered
> zones filling up and getting gc'd more often?
ZNS SSDs have to do wear leveling by mapping from logical to physical
zones, or even by recombining the internal arrangement of NAND blocks
into zones. The interface does not expose wear counters, and for modern NAND
the numbers might be different for different cells in the SSD anyway
and/or depend on various other things. Even read disturb, where frequent
reads require a rewrite, is a very real problem now.
So in short: no. That's probably the biggest difference between the
old Open Channel SSD concept and ZNS or other zoned interfaces, and
what makes using them directly from a normal file system feasible.
* Re: [PATCH] xfs: always allocate the free zone with the lowest index
2026-01-20 8:57 [PATCH] xfs: always allocate the free zone with the lowest index Hans Holmberg
2026-01-20 15:53 ` Darrick J. Wong
@ 2026-01-21 7:16 ` Christoph Hellwig
2026-01-21 12:24 ` Carlos Maiolino
2 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2026-01-21 7:16 UTC (permalink / raw)
To: Hans Holmberg
Cc: linux-xfs, Carlos Maiolino, Dave Chinner, Darrick J . Wong,
Christoph Hellwig, dlemoal, johannes.thumshirn
On Tue, Jan 20, 2026 at 09:57:46AM +0100, Hans Holmberg wrote:
> Zones in the beginning of the address space are typically mapped to
> higher bandwidth tracks on HDDs than those at the end of the address
> space. So, instead of allocating zones "round robin" across the whole
> address space, always allocate the zone with the lowest index.
>
> This increases average write bandwidth for overwrite workloads
> when less than the full capacity is being used. At ~50% utilization
> this improves bandwidth for a random file overwrite benchmark
> with 128MiB files and 256MiB zone capacity by 30%.
>
> Running the same benchmark with small 2-8 MiB files at 67% capacity
> shows no significant difference in performance. Due to heavy
> fragmentation the whole zone range is in use, greatly limiting the
> number of free zones with high bw.
Cool, thanks!
Reviewed-by: Christoph Hellwig <hch@lst.de>
I always like patches that speed things up by removing code :)
* Re: [PATCH] xfs: always allocate the free zone with the lowest index
2026-01-20 15:53 ` Darrick J. Wong
2026-01-21 7:16 ` Christoph Hellwig
@ 2026-01-21 7:23 ` Hans Holmberg
1 sibling, 0 replies; 6+ messages in thread
From: Hans Holmberg @ 2026-01-21 7:23 UTC (permalink / raw)
To: Darrick J. Wong
Cc: linux-xfs@vger.kernel.org, Carlos Maiolino, Dave Chinner, hch,
dlemoal@kernel.org, Johannes Thumshirn
On 20/01/2026 16:53, Darrick J. Wong wrote:
> On Tue, Jan 20, 2026 at 09:57:46AM +0100, Hans Holmberg wrote:
>> Zones in the beginning of the address space are typically mapped to
>> higher bandwidth tracks on HDDs than those at the end of the address
>> space. So, instead of allocating zones "round robin" across the whole
>> address space, always allocate the zone with the lowest index.
>
> Does it make any difference if it's a zoned ssd? I'd imagine not, but I
> wonder if there are any longer term side effects like lower-numbered
> zones filling up and getting gc'd more often?
It's a valid question. Your assumptions are correct: this has no effect
on zoned SSDs. There is no direct mapping between logical zones and
physical erase blocks, and thus no need to do any wear leveling from the host.
Those gritty details are taken care of by the firmware.
> --D
>
>> This increases average write bandwidth for overwrite workloads
>> when less than the full capacity is being used. At ~50% utilization
>> this improves bandwidth for a random file overwrite benchmark
>> with 128MiB files and 256MiB zone capacity by 30%.
>>
>> Running the same benchmark with small 2-8 MiB files at 67% capacity
>> shows no significant difference in performance. Due to heavy
>> fragmentation the whole zone range is in use, greatly limiting the
>> number of free zones with high bw.
>>
>> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
>> ---
>>
>> fs/xfs/xfs_zone_alloc.c | 47 +++++++++++++++--------------------------
>> fs/xfs/xfs_zone_priv.h | 1 -
>> 2 files changed, 17 insertions(+), 31 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
>> index bbcf21704ea0..d6c97026f733 100644
>> --- a/fs/xfs/xfs_zone_alloc.c
>> +++ b/fs/xfs/xfs_zone_alloc.c
>> @@ -408,31 +408,6 @@ xfs_zone_free_blocks(
>> return 0;
>> }
>>
>> -static struct xfs_group *
>> -xfs_find_free_zone(
>> - struct xfs_mount *mp,
>> - unsigned long start,
>> - unsigned long end)
>> -{
>> - struct xfs_zone_info *zi = mp->m_zone_info;
>> - XA_STATE (xas, &mp->m_groups[XG_TYPE_RTG].xa, start);
>> - struct xfs_group *xg;
>> -
>> - xas_lock(&xas);
>> - xas_for_each_marked(&xas, xg, end, XFS_RTG_FREE)
>> - if (atomic_inc_not_zero(&xg->xg_active_ref))
>> - goto found;
>> - xas_unlock(&xas);
>> - return NULL;
>> -
>> -found:
>> - xas_clear_mark(&xas, XFS_RTG_FREE);
>> - atomic_dec(&zi->zi_nr_free_zones);
>> - zi->zi_free_zone_cursor = xg->xg_gno;
>> - xas_unlock(&xas);
>> - return xg;
>> -}
>> -
>> static struct xfs_open_zone *
>> xfs_init_open_zone(
>> struct xfs_rtgroup *rtg,
>> @@ -472,13 +447,25 @@ xfs_open_zone(
>> bool is_gc)
>> {
>> struct xfs_zone_info *zi = mp->m_zone_info;
>> + XA_STATE (xas, &mp->m_groups[XG_TYPE_RTG].xa, 0);
>> struct xfs_group *xg;
>>
>> - xg = xfs_find_free_zone(mp, zi->zi_free_zone_cursor, ULONG_MAX);
>> - if (!xg)
>> - xg = xfs_find_free_zone(mp, 0, zi->zi_free_zone_cursor);
>> - if (!xg)
>> - return NULL;
>> + /*
>> + * Pick the free zone with lowest index. Zones in the beginning of the
>> + * address space typically provides higher bandwidth than those at the
>> + * end of the address space on HDDs.
>> + */
>> + xas_lock(&xas);
>> + xas_for_each_marked(&xas, xg, ULONG_MAX, XFS_RTG_FREE)
>> + if (atomic_inc_not_zero(&xg->xg_active_ref))
>> + goto found;
>> + xas_unlock(&xas);
>> + return NULL;
>> +
>> +found:
>> + xas_clear_mark(&xas, XFS_RTG_FREE);
>> + atomic_dec(&zi->zi_nr_free_zones);
>> + xas_unlock(&xas);
>>
>> set_current_state(TASK_RUNNING);
>> return xfs_init_open_zone(to_rtg(xg), 0, write_hint, is_gc);
>> diff --git a/fs/xfs/xfs_zone_priv.h b/fs/xfs/xfs_zone_priv.h
>> index ce7f0e2f4598..8fbf9a52964e 100644
>> --- a/fs/xfs/xfs_zone_priv.h
>> +++ b/fs/xfs/xfs_zone_priv.h
>> @@ -72,7 +72,6 @@ struct xfs_zone_info {
>> /*
>> * Free zone search cursor and number of free zones:
>> */
>> - unsigned long zi_free_zone_cursor;
>> atomic_t zi_nr_free_zones;
>>
>> /*
>> --
>> 2.40.1
>>
>>
>
* Re: [PATCH] xfs: always allocate the free zone with the lowest index
2026-01-20 8:57 [PATCH] xfs: always allocate the free zone with the lowest index Hans Holmberg
2026-01-20 15:53 ` Darrick J. Wong
2026-01-21 7:16 ` Christoph Hellwig
@ 2026-01-21 12:24 ` Carlos Maiolino
2 siblings, 0 replies; 6+ messages in thread
From: Carlos Maiolino @ 2026-01-21 12:24 UTC (permalink / raw)
To: linux-xfs, Hans Holmberg
Cc: Dave Chinner, Darrick J . Wong, Christoph Hellwig, dlemoal,
johannes.thumshirn
On Tue, 20 Jan 2026 09:57:46 +0100, Hans Holmberg wrote:
> Zones in the beginning of the address space are typically mapped to
> higher bandwidth tracks on HDDs than those at the end of the address
> space. So, instead of allocating zones "round robin" across the whole
> address space, always allocate the zone with the lowest index.
>
> This increases average write bandwidth for overwrite workloads
> when less than the full capacity is being used. At ~50% utilization
> this improves bandwidth for a random file overwrite benchmark
> with 128MiB files and 256MiB zone capacity by 30%.
>
> [...]
Applied to for-next, thanks!
[1/1] xfs: always allocate the free zone with the lowest index
commit: 01a28961549ac9c387ccd5eb00d58be1d8c2794b
Best regards,
--
Carlos Maiolino <cem@kernel.org>