* [PATCH v5] btrfs: zoned: reserve data_reloc block group on mount
@ 2025-06-04  8:46 Johannes Thumshirn
  2025-06-04 16:16 ` Filipe Manana
From: Johannes Thumshirn @ 2025-06-04  8:46 UTC
  To: linux-btrfs
  Cc: Naohiro Aota, Damien Le Moal, David Sterba, Josef Bacik,
	Filipe Manana, Johannes Thumshirn

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Create a block group dedicated to data relocation when a zoned
filesystem is mounted.

If more than one empty DATA block group already exists at mount time,
one of them is used as the data relocation block group instead of
allocating a new one.

This ensures that there is always space to perform garbage collection,
so the filesystem does not hit ENOSPC under heavy overwrite workloads.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
Changes to v4:
- Use btrfs_commit_transaction() instead of btrfs_end_transaction()

 fs/btrfs/disk-io.c |  1 +
 fs/btrfs/zoned.c   | 61 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h   |  3 +++
 3 files changed, 65 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3def93016963..b211dc8cdb86 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3562,6 +3562,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_sysfs;
 	}
 
+	btrfs_zoned_reserve_data_reloc_bg(fs_info);
 	btrfs_free_zone_cache(fs_info);
 
 	btrfs_check_active_zone_reservation(fs_info);
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 19710634d63f..4e122d6c19c0 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -17,6 +17,7 @@
 #include "fs.h"
 #include "accessors.h"
 #include "bio.h"
+#include "transaction.h"
 
 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -2443,6 +2444,66 @@ void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg)
 	spin_unlock(&fs_info->relocation_bg_lock);
 }
 
+void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
+	struct btrfs_space_info *space_info = data_sinfo->sub_group[0];
+	struct btrfs_trans_handle *trans;
+	struct btrfs_block_group *bg;
+	struct list_head *bg_list;
+	u64 alloc_flags;
+	bool initial = false;
+	bool did_chunk_alloc = false;
+	int index;
+	int ret;
+
+	if (!btrfs_is_zoned(fs_info))
+		return;
+
+	if (fs_info->data_reloc_bg)
+		return;
+
+	if (sb_rdonly(fs_info->sb))
+		return;
+
+	ASSERT(space_info->subgroup_id == BTRFS_SUB_GROUP_DATA_RELOC);
+	alloc_flags = btrfs_get_alloc_profile(fs_info, space_info->flags);
+	index = btrfs_bg_flags_to_raid_index(alloc_flags);
+
+	bg_list = &data_sinfo->block_groups[index];
+again:
+	list_for_each_entry(bg, bg_list, list) {
+		if (bg->used > 0)
+			continue;
+
+		if (!initial) {
+			initial = true;
+			continue;
+		}
+
+		fs_info->data_reloc_bg = bg->start;
+		set_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &bg->runtime_flags);
+		btrfs_zone_activate(bg);
+
+		return;
+	}
+
+	if (did_chunk_alloc)
+		return;
+
+	trans = btrfs_join_transaction(fs_info->tree_root);
+	if (IS_ERR(trans))
+		return;
+
+	ret = btrfs_chunk_alloc(trans, space_info, alloc_flags, CHUNK_ALLOC_FORCE);
+	btrfs_commit_transaction(trans);
+	if (ret == 1) {
+		did_chunk_alloc = true;
+		bg_list = &space_info->block_groups[index];
+		goto again;
+	}
+}
+
 void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 9672bf4c3335..6e11533b8e14 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -88,6 +88,7 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical,
 void btrfs_schedule_zone_finish_bg(struct btrfs_block_group *bg,
 				   struct extent_buffer *eb);
 void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg);
+void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info);
 void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info);
 bool btrfs_zoned_should_reclaim(const struct btrfs_fs_info *fs_info);
 void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logical,
@@ -241,6 +242,8 @@ static inline void btrfs_schedule_zone_finish_bg(struct btrfs_block_group *bg,
 
 static inline void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg) { }
 
+static inline void btrfs_zoned_reserve_data_reloc_bg(struct btrfs_fs_info *fs_info) { }
+
 static inline void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info) { }
 
 static inline bool btrfs_zoned_should_reclaim(const struct btrfs_fs_info *fs_info)
-- 
2.49.0
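
The selection logic of btrfs_zoned_reserve_data_reloc_bg() in the patch
above can be modeled in userspace roughly as follows. This is only a
sketch under stated assumptions, not the kernel code: every toy_* name is
hypothetical, a single array stands in for the kernel's per-space-info
block group lists, the "skip the first empty block group" state is reset
on each scan, and appending an array entry stands in for
btrfs_chunk_alloc(). It shows the two ideas from the commit message: leave
one empty block group for regular data and reserve another one for
relocation, allocating a new one at most once if no candidate exists.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct btrfs_block_group. */
struct toy_bg {
	uint64_t start;   /* logical start address, assumed non-zero */
	uint64_t used;    /* bytes in use; 0 means the group is empty */
	bool data_reloc;  /* set once the group is reserved for relocation */
};

#define TOY_MAX_BGS 8

/* Hypothetical stand-in for the parts of btrfs_fs_info used here. */
struct toy_fs {
	struct toy_bg bgs[TOY_MAX_BGS];
	size_t nr_bgs;
	uint64_t data_reloc_bg;  /* 0 means "no relocation group reserved" */
};

/* Return the second empty block group, or NULL if fewer than two are empty. */
static struct toy_bg *toy_find_second_empty(struct toy_fs *fs)
{
	bool skipped_first = false;

	for (size_t i = 0; i < fs->nr_bgs; i++) {
		struct toy_bg *bg = &fs->bgs[i];

		if (bg->used > 0)
			continue;
		/* Leave the first empty group for regular data writes. */
		if (!skipped_first) {
			skipped_first = true;
			continue;
		}
		return bg;
	}
	return NULL;
}

/* Append one empty block group; stands in for a forced chunk allocation. */
static bool toy_alloc_bg(struct toy_fs *fs, uint64_t start)
{
	if (fs->nr_bgs == TOY_MAX_BGS)
		return false;
	fs->bgs[fs->nr_bgs++] = (struct toy_bg){ .start = start };
	return true;
}

/* Reserve a relocation block group, allocating a new one at most once. */
static bool toy_reserve_data_reloc_bg(struct toy_fs *fs, uint64_t new_start)
{
	struct toy_bg *bg;

	if (fs->data_reloc_bg)
		return true;	/* already reserved, nothing to do */

	bg = toy_find_second_empty(fs);
	if (!bg) {
		if (!toy_alloc_bg(fs, new_start))
			return false;
		bg = toy_find_second_empty(fs);
		if (!bg)
			return false;
	}

	bg->data_reloc = true;
	fs->data_reloc_bg = bg->start;
	return true;
}
```

In this model, a filesystem with one used and one empty block group has
no candidate on the first scan, so one group is appended and the second
scan reserves it. The transaction handling discussed in the replies is
deliberately left out of the sketch.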



* Re: [PATCH v5] btrfs: zoned: reserve data_reloc block group on mount
  2025-06-04  8:46 [PATCH v5] btrfs: zoned: reserve data_reloc block group on mount Johannes Thumshirn
@ 2025-06-04 16:16 ` Filipe Manana
  2025-06-05  6:32   ` Johannes Thumshirn
From: Filipe Manana @ 2025-06-04 16:16 UTC
  To: Johannes Thumshirn
  Cc: linux-btrfs, Naohiro Aota, Damien Le Moal, David Sterba,
	Josef Bacik, Filipe Manana, Johannes Thumshirn

On Wed, Jun 4, 2025 at 9:46 AM Johannes Thumshirn <jth@kernel.org> wrote:
> +       trans = btrfs_join_transaction(fs_info->tree_root);
> +       if (IS_ERR(trans))
> +               return;
> +
> +       ret = btrfs_chunk_alloc(trans, space_info, alloc_flags, CHUNK_ALLOC_FORCE);
> +       btrfs_commit_transaction(trans);

Ok, so the commit makes a difference, and I suppose it fixes the
zoned-specific corruption you mentioned before.

Can we please get a comment here that explains why it's needed?
Normally we don't need to do it: calling btrfs_end_transaction() is
enough for anyone to be able to use the new block group.

Thanks.



* Re: [PATCH v5] btrfs: zoned: reserve data_reloc block group on mount
  2025-06-04 16:16 ` Filipe Manana
@ 2025-06-05  6:32   ` Johannes Thumshirn
From: Johannes Thumshirn @ 2025-06-05  6:32 UTC
  To: Filipe Manana, Johannes Thumshirn
  Cc: linux-btrfs@vger.kernel.org, Naohiro Aota, Damien Le Moal,
	David Sterba, Josef Bacik, Filipe Manana

On 04.06.25 18:17, Filipe Manana wrote:
>> +       ret = btrfs_chunk_alloc(trans, space_info, alloc_flags, CHUNK_ALLOC_FORCE);
>> +       btrfs_commit_transaction(trans);
> 
> Ok, so the commit makes a difference, and I suppose it fixes the
> zoned-specific corruption you mentioned before.
> 
> Can we please get a comment here that explains why it's needed?
> Normally we don't need to do it: calling btrfs_end_transaction() is
> enough for anyone to be able to use the new block group.

With the patch

https://lore.kernel.org/linux-btrfs/20250604103730.358907-1-jth@kernel.org/

applied, even v4 works (with a clean test environment). So I think I can
go back to v4, but I need to have the above applied first (after I've
fixed the kbuild error).
