* [PATCH v5] btrfs: zoned: reserve data_reloc block group on mount
From: Johannes Thumshirn @ 2025-06-04 8:46 UTC
To: linux-btrfs
Cc: Naohiro Aota, Damien Le Moal, David Sterba, Josef Bacik,
Filipe Manana, Johannes Thumshirn
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Create a block group dedicated to data relocation when mounting a zoned
filesystem.
If more than one empty DATA block group already exists at mount time, one
of them is used as the data relocation block group instead of a newly
created one.
This ensures that there is always space for garbage collection, so the
filesystem does not hit ENOSPC under heavy overwrite workloads.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
Changes to v4:
- Use btrfs_commit_transaction() instead of btrfs_end_transaction
fs/btrfs/disk-io.c | 1 +
fs/btrfs/zoned.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/zoned.h | 3 +++
3 files changed, 65 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3def93016963..b211dc8cdb86 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3562,6 +3562,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
goto fail_sysfs;
}
+ btrfs_zoned_reserve_data_reloc_bg(fs_info);
btrfs_free_zone_cache(fs_info);
btrfs_check_active_zone_reservation(fs_info);
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 19710634d63f..4e122d6c19c0 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -17,6 +17,7 @@
#include "fs.h"
#include "accessors.h"
#include "bio.h"
+#include "transaction.h"
/* Maximum number of zones to report per blkdev_report_zones() call */
#define BTRFS_REPORT_NR_ZONES 4096
@@ -2443,6 +2444,66 @@ void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg)
spin_unlock(&fs_info->relocation_bg_lock);
}
+void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info)
+{
+ struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
+ struct btrfs_space_info *space_info = data_sinfo->sub_group[0];
+ struct btrfs_trans_handle *trans;
+ struct btrfs_block_group *bg;
+ struct list_head *bg_list;
+ u64 alloc_flags;
+ bool initial = false;
+ bool did_chunk_alloc = false;
+ int index;
+ int ret;
+
+ if (!btrfs_is_zoned(fs_info))
+ return;
+
+ if (fs_info->data_reloc_bg)
+ return;
+
+ if (sb_rdonly(fs_info->sb))
+ return;
+
+ ASSERT(space_info->subgroup_id == BTRFS_SUB_GROUP_DATA_RELOC);
+ alloc_flags = btrfs_get_alloc_profile(fs_info, space_info->flags);
+ index = btrfs_bg_flags_to_raid_index(alloc_flags);
+
+ bg_list = &data_sinfo->block_groups[index];
+again:
+ list_for_each_entry(bg, bg_list, list) {
+ if (bg->used > 0)
+ continue;
+
+ if (!initial) {
+ initial = true;
+ continue;
+ }
+
+ fs_info->data_reloc_bg = bg->start;
+ set_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &bg->runtime_flags);
+ btrfs_zone_activate(bg);
+
+ return;
+ }
+
+ if (did_chunk_alloc)
+ return;
+
+ trans = btrfs_join_transaction(fs_info->tree_root);
+ if (IS_ERR(trans))
+ return;
+
+ ret = btrfs_chunk_alloc(trans, space_info, alloc_flags, CHUNK_ALLOC_FORCE);
+ btrfs_commit_transaction(trans);
+ if (ret == 1) {
+ did_chunk_alloc = true;
+ bg_list = &space_info->block_groups[index];
+ goto again;
+ }
+}
+
void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info)
{
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 9672bf4c3335..6e11533b8e14 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -88,6 +88,7 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
void btrfs_schedule_zone_finish_bg(struct btrfs_block_group *bg,
struct extent_buffer *eb);
void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg);
+void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info);
void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info);
bool btrfs_zoned_should_reclaim(const struct btrfs_fs_info *fs_info);
void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logical,
@@ -241,6 +242,8 @@ static inline void btrfs_schedule_zone_finish_bg(struct btrfs_block_group *bg,
static inline void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg) { }
+static inline void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info) { }
+
static inline void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info) { }
static inline bool btrfs_zoned_should_reclaim(const struct btrfs_fs_info *fs_info)
--
2.49.0
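The selection loop in btrfs_zoned_reserve_data_reloc_bg() can be read in isolation as "skip the first empty block group, take the next empty one". The following is only a userspace sketch of that scan, not kernel code: struct bg_model and pick_data_reloc_bg() are hypothetical names, a plain array stands in for the kernel's block group list, and the seen_empty pointer models the patch's `initial` flag, which keeps its value across the `goto again` rescan.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct btrfs_block_group: only what the scan reads. */
struct bg_model {
	uint64_t start;	/* logical start address of the block group */
	uint64_t used;	/* bytes in use; 0 means the block group is empty */
};

/*
 * Model of the scan in the patch: the first empty block group seen is
 * skipped, the next empty one is selected. *seen_empty plays the role of
 * `initial` and is deliberately not reset between calls.
 *
 * Returns the index of the selected block group, or -1 if none qualifies.
 */
static int pick_data_reloc_bg(const struct bg_model *bgs, size_t n,
			      bool *seen_empty)
{
	for (size_t i = 0; i < n; i++) {
		if (bgs[i].used > 0)
			continue;

		if (!*seen_empty) {
			*seen_empty = true;
			continue;
		}

		return (int)i;
	}

	return -1;
}
```

In this model, a first pass over a list with exactly one empty block group returns -1 but leaves the flag set, so a second pass over a list holding a freshly allocated empty block group selects that one immediately. That corresponds to the single retry the patch performs after btrfs_chunk_alloc() returns 1.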
* Re: [PATCH v5] btrfs: zoned: reserve data_reloc block group on mount
From: Filipe Manana @ 2025-06-04 16:16 UTC
To: Johannes Thumshirn
Cc: linux-btrfs, Naohiro Aota, Damien Le Moal, David Sterba,
Josef Bacik, Filipe Manana, Johannes Thumshirn
On Wed, Jun 4, 2025 at 9:46 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Create a block group dedicated to data relocation when mounting a zoned
> filesystem.
>
> If more than one empty DATA block group already exists at mount time, one
> of them is used as the data relocation block group instead of a newly
> created one.
>
> This ensures that there is always space for garbage collection, so the
> filesystem does not hit ENOSPC under heavy overwrite workloads.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> Changes to v4:
> - Use btrfs_commit_transaction() instead of btrfs_end_transaction
>
> fs/btrfs/disk-io.c | 1 +
> fs/btrfs/zoned.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/btrfs/zoned.h | 3 +++
> 3 files changed, 65 insertions(+)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 3def93016963..b211dc8cdb86 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3562,6 +3562,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
> goto fail_sysfs;
> }
>
> + btrfs_zoned_reserve_data_reloc_bg(fs_info);
> btrfs_free_zone_cache(fs_info);
>
> btrfs_check_active_zone_reservation(fs_info);
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 19710634d63f..4e122d6c19c0 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -17,6 +17,7 @@
> #include "fs.h"
> #include "accessors.h"
> #include "bio.h"
> +#include "transaction.h"
>
> /* Maximum number of zones to report per blkdev_report_zones() call */
> #define BTRFS_REPORT_NR_ZONES 4096
> @@ -2443,6 +2444,66 @@ void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg)
> spin_unlock(&fs_info->relocation_bg_lock);
> }
>
> +void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info)
> +{
> + struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
> + struct btrfs_space_info *space_info = data_sinfo->sub_group[0];
> + struct btrfs_trans_handle *trans;
> + struct btrfs_block_group *bg;
> + struct list_head *bg_list;
> + u64 alloc_flags;
> + bool initial = false;
> + bool did_chunk_alloc = false;
> + int index;
> + int ret;
> +
> + if (!btrfs_is_zoned(fs_info))
> + return;
> +
> + if (fs_info->data_reloc_bg)
> + return;
> +
> + if (sb_rdonly(fs_info->sb))
> + return;
> +
> + ASSERT(space_info->subgroup_id == BTRFS_SUB_GROUP_DATA_RELOC);
> + alloc_flags = btrfs_get_alloc_profile(fs_info, space_info->flags);
> + index = btrfs_bg_flags_to_raid_index(alloc_flags);
> +
> + bg_list = &data_sinfo->block_groups[index];
> +again:
> + list_for_each_entry(bg, bg_list, list) {
> + if (bg->used > 0)
> + continue;
> +
> + if (!initial) {
> + initial = true;
> + continue;
> + }
> +
> + fs_info->data_reloc_bg = bg->start;
> + set_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &bg->runtime_flags);
> + btrfs_zone_activate(bg);
> +
> + return;
> + }
> +
> + if (did_chunk_alloc)
> + return;
> +
> + trans = btrfs_join_transaction(fs_info->tree_root);
> + if (IS_ERR(trans))
> + return;
> +
> + ret = btrfs_chunk_alloc(trans, space_info, alloc_flags, CHUNK_ALLOC_FORCE);
> + btrfs_commit_transaction(trans);
OK, so the commit makes a difference, and I suppose it fixes the
zoned-specific corruption you mentioned before.
Can we please get a comment here that explains why it's needed?
Normally we don't need to commit: calling btrfs_end_transaction() is
enough, and anyone can then use the new block group.
Thanks.
> + if (ret == 1) {
> + did_chunk_alloc = true;
> + bg_list = &space_info->block_groups[index];
> + goto again;
> + }
> +}
> +
> void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info)
> {
> struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
> index 9672bf4c3335..6e11533b8e14 100644
> --- a/fs/btrfs/zoned.h
> +++ b/fs/btrfs/zoned.h
> @@ -88,6 +88,7 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
> void btrfs_schedule_zone_finish_bg(struct btrfs_block_group *bg,
> struct extent_buffer *eb);
> void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg);
> +void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info);
> void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info);
> bool btrfs_zoned_should_reclaim(const struct btrfs_fs_info *fs_info);
> void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logical,
> @@ -241,6 +242,8 @@ static inline void btrfs_schedule_zone_finish_bg(struct btrfs_block_group *bg,
>
> static inline void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg) { }
>
> +static inline void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info) { }
> +
> static inline void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info) { }
>
> static inline bool btrfs_zoned_should_reclaim(const struct btrfs_fs_info *fs_info)
> --
> 2.49.0
>
>
* Re: [PATCH v5] btrfs: zoned: reserve data_reloc block group on mount
From: Johannes Thumshirn @ 2025-06-05 6:32 UTC
To: Filipe Manana, Johannes Thumshirn
Cc: linux-btrfs@vger.kernel.org, Naohiro Aota, Damien Le Moal,
David Sterba, Josef Bacik, Filipe Manana
On 04.06.25 18:17, Filipe Manana wrote:
>> + ret = btrfs_chunk_alloc(trans, space_info, alloc_flags, CHUNK_ALLOC_FORCE);
>> + btrfs_commit_transaction(trans);
>
> OK, so the commit makes a difference, and I suppose it fixes the
> zoned-specific corruption you mentioned before.
>
> Can we please get a comment here that explains why it's needed?
> Normally we don't need to commit: calling btrfs_end_transaction() is
> enough, and anyone can then use the new block group.
With the patch at
https://lore.kernel.org/linux-btrfs/20250604103730.358907-1-jth@kernel.org/
applied, even v4 works (in a clean test environment). So I think I can
go back to v4, but that patch needs to be applied first (after I've
fixed the kbuild error).