* [PATCH 1/7] btrfs: zoned: document RECLAIM_ZONES flush state
2026-05-13 12:34 [PATCH 0/7] btrfs: fixes around generic/747 on zoned filesystems Johannes Thumshirn
@ 2026-05-13 12:34 ` Johannes Thumshirn
2026-05-14 14:44 ` Boris Burkov
2026-05-13 12:34 ` [PATCH 2/7] btrfs: zoned: decode 'RECLAIM_ZONES' state in tracepoints Johannes Thumshirn
` (6 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2026-05-13 12:34 UTC (permalink / raw)
To: linux-btrfs
Cc: Filipe Manana, David Sterba, Hans Holmberg, Boris Burkov,
Damien Le Moal, Naohiro Aota, Christoph Hellwig,
Johannes Thumshirn
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/space-info.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index f0436eea1544..58256a9c056d 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1411,6 +1411,11 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
* This is where we reclaim all of the pinned space generated by running the
* iputs
*
+ * RECLAIM_ZONES
+ * This state only works for the zoned mode. We scan the block groups in the
+ * reclaim_bgs_list and check if we can relocate them. If yes perform the
+ * relocation to garbage collect the zone.
+ *
* RESET_ZONES
* This state works only for the zoned mode. We scan the unused block group
* list and reset the zones and reuse the block group.
--
2.54.0
* Re: [PATCH 1/7] btrfs: zoned: document RECLAIM_ZONES flush state
2026-05-13 12:34 ` [PATCH 1/7] btrfs: zoned: document RECLAIM_ZONES flush state Johannes Thumshirn
@ 2026-05-14 14:44 ` Boris Burkov
0 siblings, 0 replies; 14+ messages in thread
From: Boris Burkov @ 2026-05-14 14:44 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: linux-btrfs, Filipe Manana, David Sterba, Hans Holmberg,
Damien Le Moal, Naohiro Aota, Christoph Hellwig
On Wed, May 13, 2026 at 02:34:39PM +0200, Johannes Thumshirn wrote:
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> fs/btrfs/space-info.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index f0436eea1544..58256a9c056d 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -1411,6 +1411,11 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
> * This is where we reclaim all of the pinned space generated by running the
> * iputs
> *
> + * RECLAIM_ZONES
> + * This state only works for the zoned mode. We scan the block groups in the
> + * reclaim_bgs_list and check if we can relocate them. If yes perform the
> + * relocation to garbage collect the zone.
> + *
I think it would be helpful for this to be clear on how many bgs it tries
to relocate
> * RESET_ZONES
> * This state works only for the zoned mode. We scan the unused block group
> * list and reset the zones and reuse the block group.
> --
> 2.54.0
>
* [PATCH 2/7] btrfs: zoned: decode 'RECLAIM_ZONES' state in tracepoints
2026-05-13 12:34 [PATCH 0/7] btrfs: fixes around generic/747 on zoned filesystems Johannes Thumshirn
2026-05-13 12:34 ` [PATCH 1/7] btrfs: zoned: document RECLAIM_ZONES flush state Johannes Thumshirn
@ 2026-05-13 12:34 ` Johannes Thumshirn
2026-05-13 12:34 ` [PATCH 3/7] btrfs: zoned: always set data_relocation_bg Johannes Thumshirn
` (5 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2026-05-13 12:34 UTC (permalink / raw)
To: linux-btrfs
Cc: Filipe Manana, David Sterba, Hans Holmberg, Boris Burkov,
Damien Le Moal, Naohiro Aota, Christoph Hellwig,
Johannes Thumshirn
Decode the 'RECLAIM_ZONES' state in tracepoints. As of now, only the
numerical state is shown in the tracepoints.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
include/trace/events/btrfs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index ec1df8b94517..ed272a100fa8 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -101,6 +101,7 @@ struct find_free_extent_ctl;
EM( ALLOC_CHUNK_FORCE, "ALLOC_CHUNK_FORCE") \
EM( RUN_DELAYED_IPUTS, "RUN_DELAYED_IPUTS") \
EM( COMMIT_TRANS, "COMMIT_TRANS") \
+ EM( RECLAIM_ZONES, "RECLAIM_ZONES") \
EMe(RESET_ZONES, "RESET_ZONES")
/*
--
2.54.0
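The table-driven decoding that this hunk extends can be illustrated with a small userspace sketch. It is not the kernel's tracepoint machinery: the enum values, struct state_name, flush_state_name() and flush_state_is() are hypothetical names made up for this example, and only the EM()/EMe() row style is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the flush states; the numeric values are made up. */
enum flush_state {
	COMMIT_TRANS = 1,
	RECLAIM_ZONES,
	RESET_ZONES,
};

/* One row per state. EMe marks the last row, mirroring the hunk above. */
#define FLUSH_STATES				\
	EM( COMMIT_TRANS,  "COMMIT_TRANS")	\
	EM( RECLAIM_ZONES, "RECLAIM_ZONES")	\
	EMe(RESET_ZONES,   "RESET_ZONES")

struct state_name {
	int value;
	const char *name;
};

/* Expand every row into a { value, name } initializer. */
#define EM(a, b)  { a, b },
#define EMe(a, b) { a, b }
static const struct state_name flush_state_names[] = { FLUSH_STATES };
#undef EM
#undef EMe

/* Return the symbolic name, or NULL if the state has no row in the table. */
static const char *flush_state_name(int state)
{
	size_t n = sizeof(flush_state_names) / sizeof(flush_state_names[0]);

	for (size_t i = 0; i < n; i++) {
		if (flush_state_names[i].value == state)
			return flush_state_names[i].name;
	}
	return NULL;
}

/* Convenience check: does the state decode to the expected string? */
static int flush_state_is(int state, const char *expected)
{
	const char *name = flush_state_name(state);

	assert(expected != NULL);
	return name != NULL && strcmp(name, expected) == 0;
}
```

A state without a row has no name to print, which matches the symptom described in the commit message: without the RECLAIM_ZONES row only the number is available.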
* [PATCH 3/7] btrfs: zoned: always set data_relocation_bg
2026-05-13 12:34 [PATCH 0/7] btrfs: fixes around generic/747 on zoned filesystems Johannes Thumshirn
2026-05-13 12:34 ` [PATCH 1/7] btrfs: zoned: document RECLAIM_ZONES flush state Johannes Thumshirn
2026-05-13 12:34 ` [PATCH 2/7] btrfs: zoned: decode 'RECLAIM_ZONES' state in tracepoints Johannes Thumshirn
@ 2026-05-13 12:34 ` Johannes Thumshirn
2026-05-14 5:42 ` Damien Le Moal
2026-05-14 14:54 ` Boris Burkov
2026-05-13 12:34 ` [PATCH 4/7] btrfs: zoned: don't account data relocation space-info in statfs free space Johannes Thumshirn
` (4 subsequent siblings)
7 siblings, 2 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2026-05-13 12:34 UTC (permalink / raw)
To: linux-btrfs
Cc: Filipe Manana, David Sterba, Hans Holmberg, Boris Burkov,
Damien Le Moal, Naohiro Aota, Christoph Hellwig,
Johannes Thumshirn
When searching for a data relocation block-group on mount,
btrfs_zoned_reserve_data_reloc_bg() is looking for the first empty DATA
block-group. But it first checks if the block-group is empty and if yes
continues the search, and then checks if it is the first DATA block-group.
Reverse the order, first check if it is the first DATA block-group found
and skip, then check if the block-group is empty. This ensures that there
is always a DATA block-group set as data_relocation_bg in the filesystem.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/zoned.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 16dd87aa06f2..0b80b498fb85 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2784,14 +2784,14 @@ void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info)
again:
bg_list = &space_info->block_groups[index];
list_for_each_entry(bg, bg_list, list) {
- if (bg->alloc_offset != 0)
- continue;
-
if (first) {
first = false;
continue;
}
+ if (bg->alloc_offset != 0)
+ continue;
+
if (space_info == data_sinfo) {
/* Migrate the block group to the data relocation space_info. */
struct btrfs_space_info *reloc_sinfo = data_sinfo->sub_group[0];
--
2.54.0
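The effect of swapping the two checks can be tried out in userspace with a minimal sketch. It models a single pass over one block-group list only; the retry via the again: label and the migration to the relocation space_info are left out. struct bg, pick_old() and pick_new() are hypothetical names for this illustration, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct btrfs_block_group. */
struct bg {
	unsigned long long alloc_offset; /* 0 means nothing allocated yet */
};

/* Old order: skip non-empty groups first, then skip the first empty one. */
static int pick_old(const struct bg *bgs, size_t n)
{
	bool first = true;

	assert(bgs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (bgs[i].alloc_offset != 0)
			continue;
		if (first) {
			first = false;
			continue;
		}
		return (int)i;
	}
	return -1; /* nothing selected in this pass */
}

/* New order: skip the first group in the list, then take the first empty one. */
static int pick_new(const struct bg *bgs, size_t n)
{
	bool first = true;

	assert(bgs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (first) {
			first = false;
			continue;
		}
		if (bgs[i].alloc_offset != 0)
			continue;
		return (int)i;
	}
	return -1; /* nothing selected in this pass */
}
```

With a list of one used group followed by one empty group, pick_old() selects nothing while pick_new() selects the empty group. With an empty group followed by a used one, neither variant selects anything in this single pass.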
* Re: [PATCH 3/7] btrfs: zoned: always set data_relocation_bg
2026-05-13 12:34 ` [PATCH 3/7] btrfs: zoned: always set data_relocation_bg Johannes Thumshirn
@ 2026-05-14 5:42 ` Damien Le Moal
2026-05-14 14:54 ` Boris Burkov
1 sibling, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2026-05-14 5:42 UTC (permalink / raw)
To: Johannes Thumshirn, linux-btrfs
Cc: Filipe Manana, David Sterba, Hans Holmberg, Boris Burkov,
Naohiro Aota, Christoph Hellwig
On 2026/05/13 21:34, Johannes Thumshirn wrote:
> When searching for a data relocation block-group on mount,
> btrfs_zoned_reserve_data_reloc_bg() is looking for the first empty DATA
> block-group. But it first checks if the block-group is empty and if yes
> continues the search, and then checks if it is the first DATA block-group.
>
> Reverse the order, first check if it is the first DATA block-group found
> and skip, then check if the block-group is empty. This ensures that there
> is always a DATA block-group set as data_relocation_bg in the filesystem.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Doesn't this need a Fixes tag ?
> ---
> fs/btrfs/zoned.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 16dd87aa06f2..0b80b498fb85 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -2784,14 +2784,14 @@ void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info)
> again:
> bg_list = &space_info->block_groups[index];
> list_for_each_entry(bg, bg_list, list) {
> - if (bg->alloc_offset != 0)
> - continue;
> -
> if (first) {
> first = false;
> continue;
> }
>
> + if (bg->alloc_offset != 0)
> + continue;
> +
> if (space_info == data_sinfo) {
> /* Migrate the block group to the data relocation space_info. */
> struct btrfs_space_info *reloc_sinfo = data_sinfo->sub_group[0];
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 3/7] btrfs: zoned: always set data_relocation_bg
2026-05-13 12:34 ` [PATCH 3/7] btrfs: zoned: always set data_relocation_bg Johannes Thumshirn
2026-05-14 5:42 ` Damien Le Moal
@ 2026-05-14 14:54 ` Boris Burkov
1 sibling, 0 replies; 14+ messages in thread
From: Boris Burkov @ 2026-05-14 14:54 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: linux-btrfs, Filipe Manana, David Sterba, Hans Holmberg,
Damien Le Moal, Naohiro Aota, Christoph Hellwig
On Wed, May 13, 2026 at 02:34:41PM +0200, Johannes Thumshirn wrote:
> When searching for a data relocation block-group on mount,
> btrfs_zoned_reserve_data_reloc_bg() is looking for the first empty DATA
> block-group. But it first checks if the block-group is empty and if yes
> continues the search, and then checks if it is the first DATA block-group.
>
> Reverse the order, first check if it is the first DATA block-group found
> and skip, then check if the block-group is empty. This ensures that there
> is always a DATA block-group set as data_relocation_bg in the filesystem.
You want the second empty block group? or the first? if you want the
first, then checking for "first" ever seems pointless. If you want the
second, then I think you need to layer the "if (first)" inside the
successful non-empty check?
As is, it just unilaterally skips the first bg even if it's not empty
and will grab the first empty one. It makes sense to not use the first
empty for reloc (that's for alloc presumably) but I don't see why just
plain skipping the first one makes sense.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> fs/btrfs/zoned.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 16dd87aa06f2..0b80b498fb85 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -2784,14 +2784,14 @@ void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info)
> again:
> bg_list = &space_info->block_groups[index];
> list_for_each_entry(bg, bg_list, list) {
> - if (bg->alloc_offset != 0)
> - continue;
> -
> if (first) {
> first = false;
> continue;
> }
>
> + if (bg->alloc_offset != 0)
> + continue;
> +
> if (space_info == data_sinfo) {
> /* Migrate the block group to the data relocation space_info. */
> struct btrfs_space_info *reloc_sinfo = data_sinfo->sub_group[0];
> --
> 2.54.0
>
* [PATCH 4/7] btrfs: zoned: don't account data relocation space-info in statfs free space
2026-05-13 12:34 [PATCH 0/7] btrfs: fixes around generic/747 on zoned filesystems Johannes Thumshirn
` (2 preceding siblings ...)
2026-05-13 12:34 ` [PATCH 3/7] btrfs: zoned: always set data_relocation_bg Johannes Thumshirn
@ 2026-05-13 12:34 ` Johannes Thumshirn
2026-05-14 5:42 ` Damien Le Moal
2026-05-13 12:34 ` [PATCH 5/7] btrfs: zoned: subtract zone_unusable space in statfs Johannes Thumshirn
` (3 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2026-05-13 12:34 UTC (permalink / raw)
To: linux-btrfs
Cc: Filipe Manana, David Sterba, Hans Holmberg, Boris Burkov,
Damien Le Moal, Naohiro Aota, Christoph Hellwig,
Johannes Thumshirn
Don't account the free space in a data relocation space-info sub-group as
usable free space in statfs.
This is misleading as no user allocations can be made in this space-info
sub-group. It is only a target for relocation.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/super.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index fb15decb0861..a0dbc0d2213f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1739,7 +1739,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
int mixed = 0;
list_for_each_entry(found, &fs_info->space_info, list) {
- if (found->flags & BTRFS_BLOCK_GROUP_DATA) {
+ if (found->flags & BTRFS_BLOCK_GROUP_DATA &&
+ found->subgroup_id != BTRFS_SUB_GROUP_DATA_RELOC) {
int i;
total_free_data += found->disk_total - found->disk_used;
--
2.54.0
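The filter added by this hunk can be sketched in userspace as follows. The struct layout, the BG_* flag macros, enum sub_group and total_free_data() are hypothetical simplifications for illustration; only the shape of the condition (a DATA space-info that is not the data relocation sub-group) follows the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the block group type flags. */
#define BG_DATA     (1ULL << 0)
#define BG_METADATA (1ULL << 2)

/* Hypothetical sub-group ids; only the relocation id matters here. */
enum sub_group { SUB_GROUP_PRIMARY, SUB_GROUP_DATA_RELOC };

struct space_info {
	uint64_t flags;
	enum sub_group subgroup_id;
	uint64_t disk_total;
	uint64_t disk_used;
};

/* Sum the free bytes of DATA space-infos, leaving out the relocation one. */
static uint64_t total_free_data(const struct space_info *infos, size_t n)
{
	uint64_t total = 0;

	assert(infos != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct space_info *found = &infos[i];

		if ((found->flags & BG_DATA) &&
		    found->subgroup_id != SUB_GROUP_DATA_RELOC)
			total += found->disk_total - found->disk_used;
	}
	return total;
}
```

In this sketch a relocation sub-group with 50 free bytes contributes nothing to the sum, so the reported figure only reflects space that user allocations could actually land in.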
* Re: [PATCH 4/7] btrfs: zoned: don't account data relocation space-info in statfs free space
2026-05-13 12:34 ` [PATCH 4/7] btrfs: zoned: don't account data relocation space-info in statfs free space Johannes Thumshirn
@ 2026-05-14 5:42 ` Damien Le Moal
0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2026-05-14 5:42 UTC (permalink / raw)
To: Johannes Thumshirn, linux-btrfs
Cc: Filipe Manana, David Sterba, Hans Holmberg, Boris Burkov,
Naohiro Aota, Christoph Hellwig
On 2026/05/13 21:34, Johannes Thumshirn wrote:
> Don't account the free space in a data relocation space-info sub-group as
> usable free space in statfs.
>
> This is misleading as no user allocations can be made in this space-info
> sub-group. It is only a target for relocation.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
It feels like this one also needs a Fixes tag...
> ---
> fs/btrfs/super.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index fb15decb0861..a0dbc0d2213f 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1739,7 +1739,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
> int mixed = 0;
>
> list_for_each_entry(found, &fs_info->space_info, list) {
> - if (found->flags & BTRFS_BLOCK_GROUP_DATA) {
> + if (found->flags & BTRFS_BLOCK_GROUP_DATA &&
> + found->subgroup_id != BTRFS_SUB_GROUP_DATA_RELOC) {
> int i;
>
> total_free_data += found->disk_total - found->disk_used;
--
Damien Le Moal
Western Digital Research
* [PATCH 5/7] btrfs: zoned: subtract zone_unusable space in statfs
2026-05-13 12:34 [PATCH 0/7] btrfs: fixes around generic/747 on zoned filesystems Johannes Thumshirn
` (3 preceding siblings ...)
2026-05-13 12:34 ` [PATCH 4/7] btrfs: zoned: don't account data relocation space-info in statfs free space Johannes Thumshirn
@ 2026-05-13 12:34 ` Johannes Thumshirn
2026-05-14 5:43 ` Damien Le Moal
2026-05-13 12:34 ` [PATCH 6/7] btrfs: zoned: fix deadlock waiting for ticket during data relocation Johannes Thumshirn
` (2 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2026-05-13 12:34 UTC (permalink / raw)
To: linux-btrfs
Cc: Filipe Manana, David Sterba, Hans Holmberg, Boris Burkov,
Damien Le Moal, Naohiro Aota, Christoph Hellwig,
Johannes Thumshirn
On zoned filesystems, space in block groups that has been freed but not
yet reset is tracked in bytes_zone_unusable. This space cannot be used for
new allocations until zone reclaim resets the zones, but it was being
reported as available space in statfs.
This caused statfs to over-report free space, leading to ENOSPC errors
when applications tried to allocate based on the reported free space.
Fix this by subtracting bytes_zone_unusable from total_free_data in the
statfs calculation for zoned filesystems.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/super.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index a0dbc0d2213f..498aa3039fbe 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1752,6 +1752,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
factor = btrfs_bg_type_to_factor(
btrfs_raid_array[i].bg_flag);
}
+
+ total_free_data -= found->bytes_zone_unusable * factor;
}
/*
--
2.54.0
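The arithmetic of the added line can be worked through with a small sketch. It assumes that disk_total and disk_used count raw bytes across all copies while bytes_zone_unusable counts logical bytes, which is why the latter is scaled by the RAID factor before subtracting. raw_free_data() is a hypothetical helper for this example, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Raw free bytes of one DATA space-info after removing zone-unusable space.
 * Assumption: disk_total and disk_used are raw bytes, bytes_zone_unusable is
 * logical, and factor is the number of raw bytes stored per logical byte.
 */
static uint64_t raw_free_data(uint64_t disk_total, uint64_t disk_used,
			      uint64_t bytes_zone_unusable, unsigned int factor)
{
	uint64_t free_raw;

	assert(factor >= 1);
	assert(disk_used <= disk_total);

	free_raw = disk_total - disk_used;
	/* Same shape as the added line: scale to raw bytes, then subtract. */
	free_raw -= bytes_zone_unusable * factor;
	return free_raw;
}
```

For example, with 8 GiB raw total, 2 GiB raw used, 1 GiB zone-unusable and a factor of 2, the result is 8 - 2 - (1 * 2) = 4 GiB raw. With a factor of 1 the same inputs give 5 GiB.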
* Re: [PATCH 5/7] btrfs: zoned: subtract zone_unusable space in statfs
2026-05-13 12:34 ` [PATCH 5/7] btrfs: zoned: subtract zone_unusable space in statfs Johannes Thumshirn
@ 2026-05-14 5:43 ` Damien Le Moal
0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2026-05-14 5:43 UTC (permalink / raw)
To: Johannes Thumshirn, linux-btrfs
Cc: Filipe Manana, David Sterba, Hans Holmberg, Boris Burkov,
Naohiro Aota, Christoph Hellwig
On 2026/05/13 21:34, Johannes Thumshirn wrote:
> On zoned filesystems, space in block groups that has been freed but not
> yet reset is tracked in bytes_zone_unusable. This space cannot be used for
> new allocations until zone reclaim resets the zones, but it was being
> reported as available space in statfs.
>
> This caused statfs to over-report free space, leading to ENOSPC errors
> when applications tried to allocate based on the reported free space.
>
> Fix this by subtracting bytes_zone_unusable from total_free_data in the
> statfs calculation for zoned filesystems.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Fixes tag here as well ? and same for the following patch.
> ---
> fs/btrfs/super.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index a0dbc0d2213f..498aa3039fbe 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1752,6 +1752,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
> factor = btrfs_bg_type_to_factor(
> btrfs_raid_array[i].bg_flag);
> }
> +
> + total_free_data -= found->bytes_zone_unusable * factor;
> }
>
> /*
--
Damien Le Moal
Western Digital Research
* [PATCH 6/7] btrfs: zoned: fix deadlock waiting for ticket during data relocation
2026-05-13 12:34 [PATCH 0/7] btrfs: fixes around generic/747 on zoned filesystems Johannes Thumshirn
` (4 preceding siblings ...)
2026-05-13 12:34 ` [PATCH 5/7] btrfs: zoned: subtract zone_unusable space in statfs Johannes Thumshirn
@ 2026-05-13 12:34 ` Johannes Thumshirn
2026-05-13 12:34 ` [RFC PATCH 7/7] btrfs: zoned: add RECLAIM_ZONES and RESET_ZONES to first async reclaim loop Johannes Thumshirn
2026-05-14 14:43 ` [PATCH 0/7] btrfs: fixes around generic/747 on zoned filesystems Boris Burkov
7 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2026-05-13 12:34 UTC (permalink / raw)
To: linux-btrfs
Cc: Filipe Manana, David Sterba, Hans Holmberg, Boris Burkov,
Damien Le Moal, Naohiro Aota, Christoph Hellwig,
Johannes Thumshirn
When performing data relocation on a zoned filesystem, BTRFS can deadlock
in handle_reserve_ticket(). The relocation process is waiting on a space
reservation ticket that can never be fulfilled, because the relocation
itself is the operation responsible for freeing up that space.
Fix this by introducing a new flush state,
BTRFS_RESERVE_FLUSH_DATA_RELOCATION, specifically for data chunk
allocation during zoned relocation. Like
BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE, this state uses
priority_reclaim_data_space() instead of the normal flushing path, which
avoids re-entering the relocation code and breaking the deadlock cycle.
In btrfs_alloc_data_chunk_ondemand(), select this new flush state when the
inode belongs to a data relocation root on a zoned filesystem.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/delalloc-space.c | 2 ++
fs/btrfs/space-info.c | 2 ++
fs/btrfs/space-info.h | 11 +++++++++++
3 files changed, 15 insertions(+)
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index 0970799d0aa4..c9d3ec6bbc3c 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -134,6 +134,8 @@ int btrfs_alloc_data_chunk_ondemand(const struct btrfs_inode *inode, u64 bytes)
if (btrfs_is_free_space_inode(inode))
flush = BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE;
+ else if (btrfs_is_zoned(fs_info) && btrfs_is_data_reloc_root(root))
+ flush = BTRFS_RESERVE_FLUSH_DATA_RELOCATION;
return btrfs_reserve_data_bytes(data_sinfo_for_inode(inode), bytes, flush);
}
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 58256a9c056d..ec811a77ebb1 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1703,6 +1703,7 @@ static int handle_reserve_ticket(struct btrfs_space_info *space_info,
ARRAY_SIZE(evict_flush_states));
break;
case BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE:
+ case BTRFS_RESERVE_FLUSH_DATA_RELOCATION:
priority_reclaim_data_space(space_info, ticket);
break;
default:
@@ -1966,6 +1967,7 @@ int btrfs_reserve_data_bytes(struct btrfs_space_info *space_info, u64 bytes,
ASSERT(flush == BTRFS_RESERVE_FLUSH_DATA ||
flush == BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE ||
+ flush == BTRFS_RESERVE_FLUSH_DATA_RELOCATION ||
flush == BTRFS_RESERVE_NO_FLUSH, "flush=%d", flush);
ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA,
"current->journal_info=0x%lx flush=%d",
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 24f45072ca4b..f2b8be2af5c3 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -77,6 +77,17 @@ enum btrfs_reserve_flush_enum {
*/
BTRFS_RESERVE_FLUSH_ALL_STEAL,
+ /*
+ * This is for relocation on zoned filesystems only. We need to use
+ * priority flushing for this, because otherwise we can deadlock on
+ * waiting for a ticket, that cannot be granted, because we cannot do
+ * any allocations.
+ *
+ * Apart from being specific to zoned relocation, it is equal to
+ * BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE.
+ */
+ BTRFS_RESERVE_FLUSH_DATA_RELOCATION,
+
/*
* This is for btrfs_use_block_rsv only. We have exhausted our block
* rsv and our global block rsv. This can happen for things like
--
2.54.0
* [RFC PATCH 7/7] btrfs: zoned: add RECLAIM_ZONES and RESET_ZONES to first async reclaim loop
2026-05-13 12:34 [PATCH 0/7] btrfs: fixes around generic/747 on zoned filesystems Johannes Thumshirn
` (5 preceding siblings ...)
2026-05-13 12:34 ` [PATCH 6/7] btrfs: zoned: fix deadlock waiting for ticket during data relocation Johannes Thumshirn
@ 2026-05-13 12:34 ` Johannes Thumshirn
2026-05-14 14:43 ` [PATCH 0/7] btrfs: fixes around generic/747 on zoned filesystems Boris Burkov
7 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2026-05-13 12:34 UTC (permalink / raw)
To: linux-btrfs
Cc: Filipe Manana, David Sterba, Hans Holmberg, Boris Burkov,
Damien Le Moal, Naohiro Aota, Christoph Hellwig,
Johannes Thumshirn
On zoned filesystems, when waiting for space tickets during data
relocation, the async reclaim flush state machine may starve if
RECLAIM_ZONES and RESET_ZONES states are not executed early in the flush
sequence.
Currently do_async_reclaim_data_space() only executes RECLAIM_ZONES and
RESET_ZONES in later flush states (FLUSH_DELALLOC and beyond), but by
the time these states are reached, the ticket wait may have already
deadlocked waiting for space that can only be freed by zone reset.
Fix this by adding RECLAIM_ZONES and RESET_ZONES to the first async
reclaim loop (FLUSH_ALLOC) for zoned filesystems, ensuring zone reset
happens early enough to free space for pending allocation tickets.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
This patch was AI assisted and I'm not sure this is the correct thing to
do (the flushing, not the use of AI), hence the RFC tag.
fs/btrfs/space-info.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index ec811a77ebb1..a1235f114f3e 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1451,6 +1451,17 @@ static void do_async_reclaim_data_space(struct btrfs_space_info *space_info)
while (!space_info->full) {
flush_space(space_info, U64_MAX, ALLOC_CHUNK_FORCE, false);
+ /*
+ * For zoned filesystems, also run RECLAIM_ZONES and RESET_ZONES
+ * in the first loop to avoid starvation. Zoned filesystems have
+ * sequential write requirements, so space cannot be reused until
+ * zones are reset. Running these states early ensures zones are
+ * reclaimed and reset before we get into a starvation situation.
+ */
+ if (btrfs_is_zoned(fs_info)) {
+ flush_space(space_info, U64_MAX, RECLAIM_ZONES, false);
+ flush_space(space_info, U64_MAX, RESET_ZONES, false);
+ }
spin_lock(&space_info->lock);
if (list_empty(&space_info->tickets)) {
space_info->flush = false;
--
2.54.0
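The order of flush states that one pass of the first loop would issue with this RFC applied can be modelled in userspace. This is only a sketch of the ordering: record_flush() stands in for flush_space() and merely logs the requested state, and struct flush_trace and first_loop_pass() are hypothetical names for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only the three states that the hunk above issues in the first loop. */
enum flush_state { ALLOC_CHUNK_FORCE, RECLAIM_ZONES, RESET_ZONES };

#define TRACE_MAX 8

/* Log of the states requested so far. */
struct flush_trace {
	enum flush_state seq[TRACE_MAX];
	size_t len;
};

/* Stand-in for flush_space(): record the state instead of doing any work. */
static void record_flush(struct flush_trace *t, enum flush_state state)
{
	assert(t != NULL);
	assert(t->len < TRACE_MAX);
	t->seq[t->len++] = state;
}

/* One pass of the first loop body, with the zoned-only additions. */
static void first_loop_pass(struct flush_trace *t, bool zoned)
{
	record_flush(t, ALLOC_CHUNK_FORCE);
	if (zoned) {
		record_flush(t, RECLAIM_ZONES);
		record_flush(t, RESET_ZONES);
	}
}
```

In this model a zoned filesystem issues three states per pass and a non-zoned one still issues only ALLOC_CHUNK_FORCE. Whether running zone reclaim this early is the right policy is the open question the RFC tag points at.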
* Re: [PATCH 0/7] btrfs: fixes around generic/747 on zoned filesystems
2026-05-13 12:34 [PATCH 0/7] btrfs: fixes around generic/747 on zoned filesystems Johannes Thumshirn
` (6 preceding siblings ...)
2026-05-13 12:34 ` [RFC PATCH 7/7] btrfs: zoned: add RECLAIM_ZONES and RESET_ZONES to first async reclaim loop Johannes Thumshirn
@ 2026-05-14 14:43 ` Boris Burkov
7 siblings, 0 replies; 14+ messages in thread
From: Boris Burkov @ 2026-05-14 14:43 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: linux-btrfs, Filipe Manana, David Sterba, Hans Holmberg,
Damien Le Moal, Naohiro Aota, Christoph Hellwig
On Wed, May 13, 2026 at 02:34:38PM +0200, Johannes Thumshirn wrote:
> This series fixes premature ENOSPC errors and starvation issues on zoned BTRFS
> filesystems when running xfstest generic/747, which tests garbage collection
> on zoned block devices using direct and buffered I/O.
>
> The investigation revealed two distinct issues:
>
> 1. Async reclaim starvation: On zoned filesystems, the flush state machine
> only executes RECLAIM_ZONES and RESET_ZONES in later flush states
> (FLUSH_DELALLOC and beyond). By the time these states are reached,
> ticket waiters can starvation waiting for space that can only be freed
"can starvation" reads like a typo or lost sentence edit
> by zone reset. The fix adds RECLAIM_ZONES and RESET_ZONES to the first
> async reclaim loop (FLUSH_ALLOC) specifically for zoned filesystems,
> ensuring zone reset happens early enough to free space for pending
> allocation tickets.
>
> 2. Inaccurate statfs reporting: On zoned filesystems, space in block
> groups that has been freed but not yet reset is tracked in
> bytes_zone_unusable. This space cannot be used for new allocations
> until zone reclaim resets the zones, but it was being reported as
> available space in statfs. This caused statfs to over-report free
> space, leading to ENOSPC errors when applications tried to allocate
> based on the reported free space. The fix subtracts bytes_zone_unusable
> from total_free_data in the statfs calculation for zoned filesystems,
> with proper RAID factor multiplication for unit conversion.
>
> Additionally, the series fixes a bug in data relocation block group selection
> where the first block group was incorrectly skipped, and adds a new flush
> state (BTRFS_RESERVE_FLUSH_DATA_RELOCATION) to use priority reclaim for
> zoned data relocation operations.
>
>
> Johannes Thumshirn (7):
> btrfs: zoned: document RECLAIM_ZONES flush state
> btrfs: zoned: decode 'RECLAIM_ZONES' state in tracepoints
> btrfs: zoned: always set data_relocation_bg
> btrfs: zoned: don't account data relocation space-info in statfs free
> space
> btrfs: zoned: subtract zone_unusable space in statfs
> btrfs: zoned: fix deadlock waiting for ticket during data relocation
> btrfs: zoned: add RECLAIM_ZONES and RESET_ZONES to first async reclaim
> loop
>
> fs/btrfs/delalloc-space.c | 2 ++
> fs/btrfs/space-info.c | 18 ++++++++++++++++++
> fs/btrfs/space-info.h | 11 +++++++++++
> fs/btrfs/super.c | 5 ++++-
> fs/btrfs/zoned.c | 6 +++---
> include/trace/events/btrfs.h | 1 +
> 6 files changed, 39 insertions(+), 4 deletions(-)
>
> --
> 2.54.0
>