public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Boris Burkov <boris@bur.io>
To: Mark Harmstone <mark@harmstone.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v4 06/16] btrfs: add extended version of struct block_group_item
Date: Fri, 31 Oct 2025 14:47:46 -0700	[thread overview]
Message-ID: <aQUugkU1TkymUM1T@devvm12410.ftw0.facebook.com> (raw)
In-Reply-To: <20251024181227.32228-7-mark@harmstone.com>

On Fri, Oct 24, 2025 at 07:12:07PM +0100, Mark Harmstone wrote:
> Add a struct btrfs_block_group_item_v2, which is used in the block group
> tree if the remap-tree incompat flag is set.
> 
> This adds two new fields to the block group item: `remap_bytes` and
> `identity_remap_count`.
> 
> `remap_bytes` records the amount of data that's physically within this
> block group, but nominally in another, remapped block group. This is
> necessary because this data will need to be moved first if this block
> group is itself relocated. If `remap_bytes` > 0, this is an indicator to
> the relocation thread that it will need to search the remap-tree for
> backrefs. A block group must also have `remap_bytes` == 0 before it can
> be dropped.
> 
> `identity_remap_count` records how many identity remap items are located
> in the remap tree for this block group. When relocation is begun for
> this block group, this is set to the number of holes in the free-space
> tree for this range. As identity remaps are converted into actual remaps
> by the relocation process, this number is decreased. Once it reaches 0,
> either because of relocation or because extents have been deleted, the
> block group has been fully remapped and its chunk's device extents are
> removed.
> 
> Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
> ---
>  fs/btrfs/accessors.h            |  20 +++++++
>  fs/btrfs/block-group.c          | 100 ++++++++++++++++++++++++--------
>  fs/btrfs/block-group.h          |  14 ++++-
>  fs/btrfs/discard.c              |   2 +-
>  fs/btrfs/tree-checker.c         |  10 +++-
>  include/uapi/linux/btrfs_tree.h |   8 +++
>  6 files changed, 126 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
> index 95a1ca8c099b..0dd161ee6863 100644
> --- a/fs/btrfs/accessors.h
> +++ b/fs/btrfs/accessors.h
> @@ -239,6 +239,26 @@ BTRFS_SETGET_FUNCS(block_group_flags, struct btrfs_block_group_item, flags, 64);
>  BTRFS_SETGET_STACK_FUNCS(stack_block_group_flags,
>  			struct btrfs_block_group_item, flags, 64);
>  
> +/* struct btrfs_block_group_item_v2 */
> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_used, struct btrfs_block_group_item_v2,
> +			 used, 64);
> +BTRFS_SETGET_FUNCS(block_group_v2_used, struct btrfs_block_group_item_v2, used, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_chunk_objectid,
> +			 struct btrfs_block_group_item_v2, chunk_objectid, 64);
> +BTRFS_SETGET_FUNCS(block_group_v2_chunk_objectid,
> +		   struct btrfs_block_group_item_v2, chunk_objectid, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_flags,
> +			 struct btrfs_block_group_item_v2, flags, 64);
> +BTRFS_SETGET_FUNCS(block_group_v2_flags, struct btrfs_block_group_item_v2, flags, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_remap_bytes,
> +			 struct btrfs_block_group_item_v2, remap_bytes, 64);
> +BTRFS_SETGET_FUNCS(block_group_v2_remap_bytes, struct btrfs_block_group_item_v2,
> +		   remap_bytes, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_identity_remap_count,
> +			 struct btrfs_block_group_item_v2, identity_remap_count, 32);
> +BTRFS_SETGET_FUNCS(block_group_v2_identity_remap_count, struct btrfs_block_group_item_v2,
> +		   identity_remap_count, 32);
> +
>  /* struct btrfs_free_space_info */
>  BTRFS_SETGET_FUNCS(free_space_extent_count, struct btrfs_free_space_info,
>  		   extent_count, 32);
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index b5f2ec8d013f..27173aca6fc1 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -2374,7 +2374,7 @@ static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
>  }
>  
>  static int read_one_block_group(struct btrfs_fs_info *info,
> -				struct btrfs_block_group_item *bgi,
> +				struct btrfs_block_group_item_v2 *bgi,
>  				const struct btrfs_key *key,
>  				int need_clear)
>  {
> @@ -2389,11 +2389,16 @@ static int read_one_block_group(struct btrfs_fs_info *info,
>  		return -ENOMEM;
>  
>  	cache->length = key->offset;
> -	cache->used = btrfs_stack_block_group_used(bgi);
> +	cache->used = btrfs_stack_block_group_v2_used(bgi);
>  	cache->commit_used = cache->used;
> -	cache->flags = btrfs_stack_block_group_flags(bgi);
> -	cache->global_root_id = btrfs_stack_block_group_chunk_objectid(bgi);
> +	cache->flags = btrfs_stack_block_group_v2_flags(bgi);
> +	cache->global_root_id = btrfs_stack_block_group_v2_chunk_objectid(bgi);
>  	cache->space_info = btrfs_find_space_info(info, cache->flags);
> +	cache->remap_bytes = btrfs_stack_block_group_v2_remap_bytes(bgi);
> +	cache->commit_remap_bytes = cache->remap_bytes;
> +	cache->identity_remap_count =
> +		btrfs_stack_block_group_v2_identity_remap_count(bgi);
> +	cache->commit_identity_remap_count = cache->identity_remap_count;
>  
>  	btrfs_set_free_space_tree_thresholds(cache);
>  
> @@ -2458,7 +2463,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
>  	} else if (cache->length == cache->used) {
>  		cache->cached = BTRFS_CACHE_FINISHED;
>  		btrfs_free_excluded_extents(cache);
> -	} else if (cache->used == 0) {
> +	} else if (cache->used == 0 && cache->remap_bytes == 0) {
>  		cache->cached = BTRFS_CACHE_FINISHED;
>  		ret = btrfs_add_new_free_space(cache, cache->start,
>  					       cache->start + cache->length, NULL);
> @@ -2478,7 +2483,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
>  
>  	set_avail_alloc_bits(info, cache->flags);
>  	if (btrfs_chunk_writeable(info, cache->start)) {
> -		if (cache->used == 0) {
> +		if (cache->used == 0 && cache->remap_bytes == 0) {
>  			ASSERT(list_empty(&cache->bg_list));
>  			if (btrfs_test_opt(info, DISCARD_ASYNC))
>  				btrfs_discard_queue_work(&info->discard_ctl, cache);
> @@ -2582,9 +2587,10 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  		need_clear = 1;
>  
>  	while (1) {
> -		struct btrfs_block_group_item bgi;
> +		struct btrfs_block_group_item_v2 bgi;
>  		struct extent_buffer *leaf;
>  		int slot;
> +		size_t size;
>  
>  		ret = find_first_block_group(info, path, &key);
>  		if (ret > 0)
> @@ -2595,8 +2601,16 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  		leaf = path->nodes[0];
>  		slot = path->slots[0];
>  
> +		if (btrfs_fs_incompat(info, REMAP_TREE)) {
> +			size = sizeof(struct btrfs_block_group_item_v2);
> +		} else {
> +			size = sizeof(struct btrfs_block_group_item);
> +			btrfs_set_stack_block_group_v2_remap_bytes(&bgi, 0);
> +			btrfs_set_stack_block_group_v2_identity_remap_count(&bgi, 0);
> +		}
> +
>  		read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
> -				   sizeof(bgi));
> +				   size);
>  
>  		btrfs_item_key_to_cpu(leaf, &key, slot);
>  		btrfs_release_path(path);
> @@ -2666,25 +2680,38 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans,
>  				   struct btrfs_block_group *block_group)
>  {
>  	struct btrfs_fs_info *fs_info = trans->fs_info;
> -	struct btrfs_block_group_item bgi;
> +	struct btrfs_block_group_item_v2 bgi;
>  	struct btrfs_root *root = btrfs_block_group_root(fs_info);
>  	struct btrfs_key key;
>  	u64 old_commit_used;
> +	size_t size;
>  	int ret;
>  
>  	spin_lock(&block_group->lock);
> -	btrfs_set_stack_block_group_used(&bgi, block_group->used);
> -	btrfs_set_stack_block_group_chunk_objectid(&bgi,
> -						   block_group->global_root_id);
> -	btrfs_set_stack_block_group_flags(&bgi, block_group->flags);
> +	btrfs_set_stack_block_group_v2_used(&bgi, block_group->used);
> +	btrfs_set_stack_block_group_v2_chunk_objectid(&bgi,
> +						      block_group->global_root_id);
> +	btrfs_set_stack_block_group_v2_flags(&bgi, block_group->flags);
> +	btrfs_set_stack_block_group_v2_remap_bytes(&bgi,
> +						   block_group->remap_bytes);
> +	btrfs_set_stack_block_group_v2_identity_remap_count(&bgi,
> +					block_group->identity_remap_count);
>  	old_commit_used = block_group->commit_used;
>  	block_group->commit_used = block_group->used;
> +	block_group->commit_remap_bytes = block_group->remap_bytes;
> +	block_group->commit_identity_remap_count =
> +		block_group->identity_remap_count;
>  	key.objectid = block_group->start;
>  	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
>  	key.offset = block_group->length;
>  	spin_unlock(&block_group->lock);
>  
> -	ret = btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi));
> +	if (btrfs_fs_incompat(fs_info, REMAP_TREE))
> +		size = sizeof(struct btrfs_block_group_item_v2);
> +	else
> +		size = sizeof(struct btrfs_block_group_item);
> +
> +	ret = btrfs_insert_item(trans, root, &key, &bgi, size);
>  	if (ret < 0) {
>  		spin_lock(&block_group->lock);
>  		block_group->commit_used = old_commit_used;
> @@ -3139,10 +3166,12 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
>  	struct btrfs_root *root = btrfs_block_group_root(fs_info);
>  	unsigned long bi;
>  	struct extent_buffer *leaf;
> -	struct btrfs_block_group_item bgi;
> +	struct btrfs_block_group_item_v2 bgi;
>  	struct btrfs_key key;
> -	u64 old_commit_used;
> -	u64 used;
> +	u64 old_commit_used, old_commit_remap_bytes;
> +	u32 old_commit_identity_remap_count;
> +	u64 used, remap_bytes;
> +	u32 identity_remap_count;
>  
>  	/*
>  	 * Block group items update can be triggered out of commit transaction
> @@ -3152,13 +3181,21 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
>  	 */
>  	spin_lock(&cache->lock);
>  	old_commit_used = cache->commit_used;
> +	old_commit_remap_bytes = cache->commit_remap_bytes;
> +	old_commit_identity_remap_count = cache->commit_identity_remap_count;
>  	used = cache->used;
> -	/* No change in used bytes, can safely skip it. */
> -	if (cache->commit_used == used) {
> +	remap_bytes = cache->remap_bytes;
> +	identity_remap_count = cache->identity_remap_count;
> +	/* No change in values, can safely skip it. */
> +	if (cache->commit_used == used &&
> +	    cache->commit_remap_bytes == remap_bytes &&
> +	    cache->commit_identity_remap_count == identity_remap_count) {
>  		spin_unlock(&cache->lock);
>  		return 0;
>  	}
>  	cache->commit_used = used;
> +	cache->commit_remap_bytes = remap_bytes;
> +	cache->commit_identity_remap_count = identity_remap_count;
>  	spin_unlock(&cache->lock);
>  
>  	key.objectid = cache->start;
> @@ -3174,11 +3211,23 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
>  
>  	leaf = path->nodes[0];
>  	bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
> -	btrfs_set_stack_block_group_used(&bgi, used);
> -	btrfs_set_stack_block_group_chunk_objectid(&bgi,
> -						   cache->global_root_id);
> -	btrfs_set_stack_block_group_flags(&bgi, cache->flags);
> -	write_extent_buffer(leaf, &bgi, bi, sizeof(bgi));
> +	btrfs_set_stack_block_group_v2_used(&bgi, used);
> +	btrfs_set_stack_block_group_v2_chunk_objectid(&bgi,
> +						      cache->global_root_id);
> +	btrfs_set_stack_block_group_v2_flags(&bgi, cache->flags);
> +
> +	if (btrfs_fs_incompat(fs_info, REMAP_TREE)) {
> +		btrfs_set_stack_block_group_v2_remap_bytes(&bgi,
> +							   cache->remap_bytes);
> +		btrfs_set_stack_block_group_v2_identity_remap_count(&bgi,
> +						cache->identity_remap_count);
> +		write_extent_buffer(leaf, &bgi, bi,
> +				    sizeof(struct btrfs_block_group_item_v2));
> +	} else {
> +		write_extent_buffer(leaf, &bgi, bi,
> +				    sizeof(struct btrfs_block_group_item));
> +	}
> +
>  fail:
>  	btrfs_release_path(path);
>  	/*
> @@ -3193,6 +3242,9 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
>  	if (ret < 0 && ret != -ENOENT) {
>  		spin_lock(&cache->lock);
>  		cache->commit_used = old_commit_used;
> +		cache->commit_remap_bytes = old_commit_remap_bytes;
> +		cache->commit_identity_remap_count =
> +			old_commit_identity_remap_count;
>  		spin_unlock(&cache->lock);
>  	}
>  	return ret;
> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> index 9172104a5889..af23fdb3cf4d 100644
> --- a/fs/btrfs/block-group.h
> +++ b/fs/btrfs/block-group.h
> @@ -129,6 +129,8 @@ struct btrfs_block_group {
>  	u64 flags;
>  	u64 cache_generation;
>  	u64 global_root_id;
> +	u64 remap_bytes;
> +	u32 identity_remap_count;
>  
>  	/*
>  	 * The last committed used bytes of this block group, if the above @used
> @@ -136,6 +138,15 @@ struct btrfs_block_group {
>  	 * group item of this block group.
>  	 */
>  	u64 commit_used;
> +	/*
> +	 * The last committed remap_bytes value of this block group.
> +	 */
> +	u64 commit_remap_bytes;
> +	/*
> +	 * The last commited identity_remap_count value of this block group.
> +	 */
> +	u32 commit_identity_remap_count;
> +
>  	/*
>  	 * If the free space extent count exceeds this number, convert the block
>  	 * group to bitmaps.
> @@ -282,7 +293,8 @@ static inline bool btrfs_is_block_group_used(const struct btrfs_block_group *bg)
>  {
>  	lockdep_assert_held(&bg->lock);
>  
> -	return (bg->used > 0 || bg->reserved > 0 || bg->pinned > 0);
> +	return (bg->used > 0 || bg->reserved > 0 || bg->pinned > 0 ||
> +		bg->remap_bytes > 0);
>  }
>  
>  static inline bool btrfs_is_block_group_data_only(const struct btrfs_block_group *block_group)
> diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> index 89fe85778115..ee5f5b2788e1 100644
> --- a/fs/btrfs/discard.c
> +++ b/fs/btrfs/discard.c
> @@ -373,7 +373,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
>  	if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
>  		return;
>  
> -	if (block_group->used == 0)
> +	if (block_group->used == 0 && block_group->remap_bytes == 0)
>  		add_to_discard_unused_list(discard_ctl, block_group);
>  	else
>  		add_to_discard_list(discard_ctl, block_group);
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index b6827c2a7815..08b1bcfc7db7 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -688,6 +688,7 @@ static int check_block_group_item(struct extent_buffer *leaf,
>  	u64 chunk_objectid;
>  	u64 flags;
>  	u64 type;
> +	size_t exp_size;
>  
>  	/*
>  	 * Here we don't really care about alignment since extent allocator can
> @@ -699,10 +700,15 @@ static int check_block_group_item(struct extent_buffer *leaf,
>  		return -EUCLEAN;
>  	}
>  
> -	if (unlikely(item_size != sizeof(bgi))) {
> +	if (btrfs_fs_incompat(fs_info, REMAP_TREE))
> +		exp_size = sizeof(struct btrfs_block_group_item_v2);
> +	else
> +		exp_size = sizeof(struct btrfs_block_group_item);
> +
> +	if (unlikely(item_size != exp_size)) {
>  		block_group_err(leaf, slot,
>  			"invalid item size, have %u expect %zu",
> -				item_size, sizeof(bgi));
> +				item_size, exp_size);
>  		return -EUCLEAN;
>  	}
>  
> diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
> index 9a36f0206d90..500e3a7df90b 100644
> --- a/include/uapi/linux/btrfs_tree.h
> +++ b/include/uapi/linux/btrfs_tree.h
> @@ -1229,6 +1229,14 @@ struct btrfs_block_group_item {
>  	__le64 flags;
>  } __attribute__ ((__packed__));
>  
> +struct btrfs_block_group_item_v2 {
> +	__le64 used;
> +	__le64 chunk_objectid;
> +	__le64 flags;
> +	__le64 remap_bytes;
> +	__le32 identity_remap_count;
> +} __attribute__ ((__packed__));
> +
>  struct btrfs_free_space_info {
>  	__le32 extent_count;
>  	__le32 flags;
> -- 
> 2.49.1
> 

  reply	other threads:[~2025-10-31 21:47 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-24 18:12 [PATCH v4 00/16] Remap tree Mark Harmstone
2025-10-24 18:12 ` [PATCH v4 01/16] btrfs: add definitions and constants for remap-tree Mark Harmstone
2025-10-31 22:50   ` Boris Burkov
2025-11-03 12:18     ` Mark Harmstone
2025-10-24 18:12 ` [PATCH v4 02/16] btrfs: add REMAP chunk type Mark Harmstone
2025-10-24 18:12 ` [PATCH v4 03/16] btrfs: allow remapped chunks to have zero stripes Mark Harmstone
2025-10-31 21:39   ` Boris Burkov
2025-10-24 18:12 ` [PATCH v4 04/16] btrfs: remove remapped block groups from the free-space tree Mark Harmstone
2025-10-31 21:44   ` Boris Burkov
2025-11-03 12:39     ` Mark Harmstone
2025-10-24 18:12 ` [PATCH v4 05/16] btrfs: don't add metadata items for the remap tree to the extent tree Mark Harmstone
2025-10-24 18:12 ` [PATCH v4 06/16] btrfs: add extended version of struct block_group_item Mark Harmstone
2025-10-31 21:47   ` Boris Burkov [this message]
2025-10-24 18:12 ` [PATCH v4 07/16] btrfs: allow mounting filesystems with remap-tree incompat flag Mark Harmstone
2025-10-24 18:12 ` [PATCH v4 08/16] btrfs: redirect I/O for remapped block groups Mark Harmstone
2025-10-31 22:03   ` Boris Burkov
2025-10-24 18:12 ` [PATCH v4 09/16] btrfs: handle deletions from remapped block group Mark Harmstone
2025-10-31 23:05   ` Boris Burkov
2025-11-03 15:51     ` Mark Harmstone
2025-10-31 23:30   ` Boris Burkov
2025-11-04 12:30     ` Mark Harmstone
2025-10-24 18:12 ` [PATCH v4 10/16] btrfs: handle setting up relocation of block group with remap-tree Mark Harmstone
2025-10-31 23:43   ` Boris Burkov
2025-11-03 18:45     ` Mark Harmstone
2025-10-24 18:12 ` [PATCH v4 11/16] btrfs: move existing remaps before relocating block group Mark Harmstone
2025-11-01  0:02   ` Boris Burkov
2025-11-04 13:00     ` Mark Harmstone
2025-11-01  0:10   ` Boris Burkov
2025-10-24 18:12 ` [PATCH v4 12/16] btrfs: replace identity remaps with actual remaps when doing relocations Mark Harmstone
2025-11-01  0:09   ` Boris Burkov
2025-11-04 14:31     ` Mark Harmstone
2025-10-24 18:12 ` [PATCH v4 13/16] btrfs: add do_remap param to btrfs_discard_extent() Mark Harmstone
2025-11-01  0:12   ` Boris Burkov
2025-10-24 18:12 ` [PATCH v4 14/16] btrfs: allow balancing remap tree Mark Harmstone
2025-10-24 18:12 ` [PATCH v4 15/16] btrfs: handle discarding fully-remapped block groups Mark Harmstone
2025-10-27 16:04   ` kernel test robot
2025-10-31 22:12     ` Boris Burkov
2025-11-03 16:49       ` Mark Harmstone
2025-11-09  8:42         ` Philip Li
2025-10-31 22:11   ` Boris Burkov
2025-11-03 17:01     ` Mark Harmstone
2025-10-24 18:12 ` [PATCH v4 16/16] btrfs: add stripe removal pending flag Mark Harmstone

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQUugkU1TkymUM1T@devvm12410.ftw0.facebook.com \
    --to=boris@bur.io \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=mark@harmstone.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox