* zoned allocator fixes
@ 2025-08-18 5:06 Christoph Hellwig
2025-08-18 5:06 ` [PATCH 1/3] xfs: remove xfs_last_used_zone Christoph Hellwig
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Christoph Hellwig @ 2025-08-18 5:06 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Hans Holmberg, Darrick J. Wong, linux-xfs
Hi all,
this has fixes for the zoned allocator. One removes my original version
of the inode zone affinity, which turned to be worse than the more
general version later added by Hans. The second one ensures we run
inodegc when out of space just like for the normal write path, and the
third one rejects swapon directly instead of going into a rat hole
iterating all extents first.
Diffstat:
xfs_aops.c | 3 +++
xfs_zone_alloc.c | 45 ++-------------------------------------------
xfs_zone_space_resv.c | 6 ++++++
3 files changed, 11 insertions(+), 43 deletions(-)
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] xfs: remove xfs_last_used_zone
2025-08-18 5:06 zoned allocator fixes Christoph Hellwig
@ 2025-08-18 5:06 ` Christoph Hellwig
2025-08-18 13:05 ` Hans Holmberg
2025-08-19 5:56 ` Damien Le Moal
2025-08-18 5:06 ` [PATCH 2/3] xfs: kick off inodegc when failing to reserve zoned blocks Christoph Hellwig
` (2 subsequent siblings)
3 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2025-08-18 5:06 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Hans Holmberg, Darrick J. Wong, linux-xfs
This was my first attempt at caching the last used zone. But it turns out
for O_DIRECT or RWF_DONTCACHE that operate concurrently or in very short
sequence, the bmap btree does not record a written extent yet, so it fails.
Because it then still finds the last written zone it can lead to a weird
ping-pong around a few zones with writers seeing different values.
Remove it entirely as the later added xfs_cached_zone actually does a
much better job enforcing the locality as the zone is associated with the
inode in the MRU cache as soon as the zone is selected.
Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_zone_alloc.c | 45 ++---------------------------------------
1 file changed, 2 insertions(+), 43 deletions(-)
diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
index f8bd6d741755..dfdb14120614 100644
--- a/fs/xfs/xfs_zone_alloc.c
+++ b/fs/xfs/xfs_zone_alloc.c
@@ -374,44 +374,6 @@ xfs_zone_free_blocks(
return 0;
}
-/*
- * Check if the zone containing the data just before the offset we are
- * writing to is still open and has space.
- */
-static struct xfs_open_zone *
-xfs_last_used_zone(
- struct iomap_ioend *ioend)
-{
- struct xfs_inode *ip = XFS_I(ioend->io_inode);
- struct xfs_mount *mp = ip->i_mount;
- xfs_fileoff_t offset_fsb = XFS_B_TO_FSB(mp, ioend->io_offset);
- struct xfs_rtgroup *rtg = NULL;
- struct xfs_open_zone *oz = NULL;
- struct xfs_iext_cursor icur;
- struct xfs_bmbt_irec got;
-
- xfs_ilock(ip, XFS_ILOCK_SHARED);
- if (!xfs_iext_lookup_extent_before(ip, &ip->i_df, &offset_fsb,
- &icur, &got)) {
- xfs_iunlock(ip, XFS_ILOCK_SHARED);
- return NULL;
- }
- xfs_iunlock(ip, XFS_ILOCK_SHARED);
-
- rtg = xfs_rtgroup_grab(mp, xfs_rtb_to_rgno(mp, got.br_startblock));
- if (!rtg)
- return NULL;
-
- xfs_ilock(rtg_rmap(rtg), XFS_ILOCK_SHARED);
- oz = READ_ONCE(rtg->rtg_open_zone);
- if (oz && (oz->oz_is_gc || !atomic_inc_not_zero(&oz->oz_ref)))
- oz = NULL;
- xfs_iunlock(rtg_rmap(rtg), XFS_ILOCK_SHARED);
-
- xfs_rtgroup_rele(rtg);
- return oz;
-}
-
static struct xfs_group *
xfs_find_free_zone(
struct xfs_mount *mp,
@@ -918,12 +880,9 @@ xfs_zone_alloc_and_submit(
goto out_error;
/*
- * If we don't have a cached zone in this write context, see if the
- * last extent before the one we are writing to points to an active
- * zone. If so, just continue writing to it.
+ * If we don't have a locally cached zone in this write context, see if
+ * the inode is still associated with a zone and use that if so.
*/
- if (!*oz && ioend->io_offset)
- *oz = xfs_last_used_zone(ioend);
if (!*oz)
*oz = xfs_cached_zone(mp, ip);
--
2.47.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] xfs: kick off inodegc when failing to reserve zoned blocks
2025-08-18 5:06 zoned allocator fixes Christoph Hellwig
2025-08-18 5:06 ` [PATCH 1/3] xfs: remove xfs_last_used_zone Christoph Hellwig
@ 2025-08-18 5:06 ` Christoph Hellwig
2025-08-18 13:05 ` Hans Holmberg
2025-08-18 5:06 ` [PATCH 3/3] xfs: reject swapon for inodes on a zoned file system earlier Christoph Hellwig
2025-08-19 13:11 ` zoned allocator fixes Carlos Maiolino
3 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-08-18 5:06 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Hans Holmberg, Darrick J. Wong, linux-xfs
XFS processes truncating unlinked inodes asynchronously and thus the free
space pool only sees them with a delay. The non-zoned write path thus
calls into inodegc to accelerate this processing before failing an
allocation due the lack of free blocks. Do the same for the zoned space
reservation.
Fixes: 0bb2193056b5 ("xfs: add support for zoned space reservations")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_zone_space_resv.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/xfs/xfs_zone_space_resv.c b/fs/xfs/xfs_zone_space_resv.c
index 1313c55b8cbe..9cd38716fd25 100644
--- a/fs/xfs/xfs_zone_space_resv.c
+++ b/fs/xfs/xfs_zone_space_resv.c
@@ -10,6 +10,7 @@
#include "xfs_mount.h"
#include "xfs_inode.h"
#include "xfs_rtbitmap.h"
+#include "xfs_icache.h"
#include "xfs_zone_alloc.h"
#include "xfs_zone_priv.h"
#include "xfs_zones.h"
@@ -230,6 +231,11 @@ xfs_zoned_space_reserve(
error = xfs_dec_freecounter(mp, XC_FREE_RTEXTENTS, count_fsb,
flags & XFS_ZR_RESERVED);
+ if (error == -ENOSPC && !(flags & XFS_ZR_NOWAIT)) {
+ xfs_inodegc_flush(mp);
+ error = xfs_dec_freecounter(mp, XC_FREE_RTEXTENTS, count_fsb,
+ flags & XFS_ZR_RESERVED);
+ }
if (error == -ENOSPC && (flags & XFS_ZR_GREEDY) && count_fsb > 1)
error = xfs_zoned_reserve_extents_greedy(mp, &count_fsb, flags);
if (error)
--
2.47.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] xfs: reject swapon for inodes on a zoned file system earlier
2025-08-18 5:06 zoned allocator fixes Christoph Hellwig
2025-08-18 5:06 ` [PATCH 1/3] xfs: remove xfs_last_used_zone Christoph Hellwig
2025-08-18 5:06 ` [PATCH 2/3] xfs: kick off inodegc when failing to reserve zoned blocks Christoph Hellwig
@ 2025-08-18 5:06 ` Christoph Hellwig
2025-08-18 13:06 ` Hans Holmberg
2025-08-19 7:17 ` Damien Le Moal
2025-08-19 13:11 ` zoned allocator fixes Carlos Maiolino
3 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2025-08-18 5:06 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Hans Holmberg, Darrick J. Wong, linux-xfs
No point in going down into the iomap mapping loop when we known it
will be rejected.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_aops.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1ee4f835ac3c..a26f79815533 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -760,6 +760,9 @@ xfs_vm_swap_activate(
{
struct xfs_inode *ip = XFS_I(file_inode(swap_file));
+ if (xfs_is_zoned_inode(ip))
+ return -EINVAL;
+
/*
* Swap file activation can race against concurrent shared extent
* removal in files that have been cloned. If this happens,
--
2.47.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] xfs: remove xfs_last_used_zone
2025-08-18 5:06 ` [PATCH 1/3] xfs: remove xfs_last_used_zone Christoph Hellwig
@ 2025-08-18 13:05 ` Hans Holmberg
2025-08-19 5:56 ` Damien Le Moal
1 sibling, 0 replies; 10+ messages in thread
From: Hans Holmberg @ 2025-08-18 13:05 UTC (permalink / raw)
To: hch, Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs@vger.kernel.org
On 18/08/2025 07:07, Christoph Hellwig wrote:
> This was my first attempt at caching the last used zone. But it turns out
> for O_DIRECT or RWF_DONTCACHE that operate concurrently or in very short
> sequence, the bmap btree does not record a written extent yet, so it fails.
> Because it then still finds the last written zone it can lead to a weird
> ping-pong around a few zones with writers seeing different values.
>
> Remove it entirely as the later added xfs_cached_zone actually does a
> much better job enforcing the locality as the zone is associated with the
> inode in the MRU cache as soon as the zone is selected.
>
> Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_zone_alloc.c | 45 ++---------------------------------------
> 1 file changed, 2 insertions(+), 43 deletions(-)
>
> diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
> index f8bd6d741755..dfdb14120614 100644
> --- a/fs/xfs/xfs_zone_alloc.c
> +++ b/fs/xfs/xfs_zone_alloc.c
> @@ -374,44 +374,6 @@ xfs_zone_free_blocks(
> return 0;
> }
>
> -/*
> - * Check if the zone containing the data just before the offset we are
> - * writing to is still open and has space.
> - */
> -static struct xfs_open_zone *
> -xfs_last_used_zone(
> - struct iomap_ioend *ioend)
> -{
> - struct xfs_inode *ip = XFS_I(ioend->io_inode);
> - struct xfs_mount *mp = ip->i_mount;
> - xfs_fileoff_t offset_fsb = XFS_B_TO_FSB(mp, ioend->io_offset);
> - struct xfs_rtgroup *rtg = NULL;
> - struct xfs_open_zone *oz = NULL;
> - struct xfs_iext_cursor icur;
> - struct xfs_bmbt_irec got;
> -
> - xfs_ilock(ip, XFS_ILOCK_SHARED);
> - if (!xfs_iext_lookup_extent_before(ip, &ip->i_df, &offset_fsb,
> - &icur, &got)) {
> - xfs_iunlock(ip, XFS_ILOCK_SHARED);
> - return NULL;
> - }
> - xfs_iunlock(ip, XFS_ILOCK_SHARED);
> -
> - rtg = xfs_rtgroup_grab(mp, xfs_rtb_to_rgno(mp, got.br_startblock));
> - if (!rtg)
> - return NULL;
> -
> - xfs_ilock(rtg_rmap(rtg), XFS_ILOCK_SHARED);
> - oz = READ_ONCE(rtg->rtg_open_zone);
> - if (oz && (oz->oz_is_gc || !atomic_inc_not_zero(&oz->oz_ref)))
> - oz = NULL;
> - xfs_iunlock(rtg_rmap(rtg), XFS_ILOCK_SHARED);
> -
> - xfs_rtgroup_rele(rtg);
> - return oz;
> -}
> -
> static struct xfs_group *
> xfs_find_free_zone(
> struct xfs_mount *mp,
> @@ -918,12 +880,9 @@ xfs_zone_alloc_and_submit(
> goto out_error;
>
> /*
> - * If we don't have a cached zone in this write context, see if the
> - * last extent before the one we are writing to points to an active
> - * zone. If so, just continue writing to it.
> + * If we don't have a locally cached zone in this write context, see if
> + * the inode is still associated with a zone and use that if so.
> */
> - if (!*oz && ioend->io_offset)
> - *oz = xfs_last_used_zone(ioend);
> if (!*oz)
> *oz = xfs_cached_zone(mp, ip);
>
Looks good to me.
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] xfs: kick off inodegc when failing to reserve zoned blocks
2025-08-18 5:06 ` [PATCH 2/3] xfs: kick off inodegc when failing to reserve zoned blocks Christoph Hellwig
@ 2025-08-18 13:05 ` Hans Holmberg
0 siblings, 0 replies; 10+ messages in thread
From: Hans Holmberg @ 2025-08-18 13:05 UTC (permalink / raw)
To: hch, Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs@vger.kernel.org
On 18/08/2025 07:07, Christoph Hellwig wrote:
> XFS processes truncating unlinked inodes asynchronously and thus the free
> space pool only sees them with a delay. The non-zoned write path thus
> calls into inodegc to accelerate this processing before failing an
> allocation due the lack of free blocks. Do the same for the zoned space
> reservation.
>
> Fixes: 0bb2193056b5 ("xfs: add support for zoned space reservations")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_zone_space_resv.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/fs/xfs/xfs_zone_space_resv.c b/fs/xfs/xfs_zone_space_resv.c
> index 1313c55b8cbe..9cd38716fd25 100644
> --- a/fs/xfs/xfs_zone_space_resv.c
> +++ b/fs/xfs/xfs_zone_space_resv.c
> @@ -10,6 +10,7 @@
> #include "xfs_mount.h"
> #include "xfs_inode.h"
> #include "xfs_rtbitmap.h"
> +#include "xfs_icache.h"
> #include "xfs_zone_alloc.h"
> #include "xfs_zone_priv.h"
> #include "xfs_zones.h"
> @@ -230,6 +231,11 @@ xfs_zoned_space_reserve(
>
> error = xfs_dec_freecounter(mp, XC_FREE_RTEXTENTS, count_fsb,
> flags & XFS_ZR_RESERVED);
> + if (error == -ENOSPC && !(flags & XFS_ZR_NOWAIT)) {
> + xfs_inodegc_flush(mp);
> + error = xfs_dec_freecounter(mp, XC_FREE_RTEXTENTS, count_fsb,
> + flags & XFS_ZR_RESERVED);
> + }
> if (error == -ENOSPC && (flags & XFS_ZR_GREEDY) && count_fsb > 1)
> error = xfs_zoned_reserve_extents_greedy(mp, &count_fsb, flags);
> if (error)
Looks good to me.
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] xfs: reject swapon for inodes on a zoned file system earlier
2025-08-18 5:06 ` [PATCH 3/3] xfs: reject swapon for inodes on a zoned file system earlier Christoph Hellwig
@ 2025-08-18 13:06 ` Hans Holmberg
2025-08-19 7:17 ` Damien Le Moal
1 sibling, 0 replies; 10+ messages in thread
From: Hans Holmberg @ 2025-08-18 13:06 UTC (permalink / raw)
To: hch, Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs@vger.kernel.org
On 18/08/2025 07:07, Christoph Hellwig wrote:
> No point in going down into the iomap mapping loop when we known it
> will be rejected.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_aops.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 1ee4f835ac3c..a26f79815533 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -760,6 +760,9 @@ xfs_vm_swap_activate(
> {
> struct xfs_inode *ip = XFS_I(file_inode(swap_file));
>
> + if (xfs_is_zoned_inode(ip))
> + return -EINVAL;
> +
> /*
> * Swap file activation can race against concurrent shared extent
> * removal in files that have been cloned. If this happens,
Looks good to me.
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] xfs: remove xfs_last_used_zone
2025-08-18 5:06 ` [PATCH 1/3] xfs: remove xfs_last_used_zone Christoph Hellwig
2025-08-18 13:05 ` Hans Holmberg
@ 2025-08-19 5:56 ` Damien Le Moal
1 sibling, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2025-08-19 5:56 UTC (permalink / raw)
To: Christoph Hellwig, Carlos Maiolino
Cc: Hans Holmberg, Darrick J. Wong, linux-xfs
On 8/18/25 14:06, Christoph Hellwig wrote:
> This was my first attempt at caching the last used zone. But it turns out
> for O_DIRECT or RWF_DONTCACHE that operate concurrently or in very short
> sequence, the bmap btree does not record a written extent yet, so it fails.
> Because it then still finds the last written zone it can lead to a weird
> ping-pong around a few zones with writers seeing different values.
>
> Remove it entirely as the later added xfs_cached_zone actually does a
> much better job enforcing the locality as the zone is associated with the
> inode in the MRU cache as soon as the zone is selected.
>
> Fixes: 4e4d52075577 ("xfs: add the zoned space allocator")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
I ran several write & overwrite workloads with this. No problems detected.
So:
Tested-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] xfs: reject swapon for inodes on a zoned file system earlier
2025-08-18 5:06 ` [PATCH 3/3] xfs: reject swapon for inodes on a zoned file system earlier Christoph Hellwig
2025-08-18 13:06 ` Hans Holmberg
@ 2025-08-19 7:17 ` Damien Le Moal
1 sibling, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2025-08-19 7:17 UTC (permalink / raw)
To: Christoph Hellwig, Carlos Maiolino
Cc: Hans Holmberg, Darrick J. Wong, linux-xfs
On 8/18/25 14:06, Christoph Hellwig wrote:
> No point in going down into the iomap mapping loop when we known it
> will be rejected.
s/known/know
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: zoned allocator fixes
2025-08-18 5:06 zoned allocator fixes Christoph Hellwig
` (2 preceding siblings ...)
2025-08-18 5:06 ` [PATCH 3/3] xfs: reject swapon for inodes on a zoned file system earlier Christoph Hellwig
@ 2025-08-19 13:11 ` Carlos Maiolino
3 siblings, 0 replies; 10+ messages in thread
From: Carlos Maiolino @ 2025-08-19 13:11 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Hans Holmberg, Darrick J. Wong, linux-xfs
On Mon, 18 Aug 2025 07:06:42 +0200, Christoph Hellwig wrote:
> this has fixes for the zoned allocator. One removes my original version
> of the inode zone affinity, which turned to be worse than the more
> general version later added by Hans. The second one ensures we run
> inodegc when out of space just like for the normal write path, and the
> third one rejects swapon directly instead of going into a rat hole
> iterating all extents first.
>
> [...]
Applied to for-next, thanks!
[1/3] xfs: remove xfs_last_used_zone
commit: d004d70d6cdf03928da0d05c8c15c2ccc15657cd
[2/3] xfs: kick off inodegc when failing to reserve zoned blocks
commit: 7d523255f524c95208cefef4edaed149615ff96c
[3/3] xfs: reject swapon for inodes on a zoned file system earlier
commit: 8e5a2441e18640fb22a25fd097368957bf5cab91
Best regards,
--
Carlos Maiolino <cem@kernel.org>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-08-19 13:11 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-18 5:06 zoned allocator fixes Christoph Hellwig
2025-08-18 5:06 ` [PATCH 1/3] xfs: remove xfs_last_used_zone Christoph Hellwig
2025-08-18 13:05 ` Hans Holmberg
2025-08-19 5:56 ` Damien Le Moal
2025-08-18 5:06 ` [PATCH 2/3] xfs: kick off inodegc when failing to reserve zoned blocks Christoph Hellwig
2025-08-18 13:05 ` Hans Holmberg
2025-08-18 5:06 ` [PATCH 3/3] xfs: reject swapon for inodes on a zoned file system earlier Christoph Hellwig
2025-08-18 13:06 ` Hans Holmberg
2025-08-19 7:17 ` Damien Le Moal
2025-08-19 13:11 ` zoned allocator fixes Carlos Maiolino
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).