From: Boris Burkov <boris@bur.io>
To: Mark Harmstone <mark@harmstone.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v4 06/16] btrfs: add extended version of struct block_group_item
Date: Fri, 31 Oct 2025 14:47:46 -0700
Message-ID: <aQUugkU1TkymUM1T@devvm12410.ftw0.facebook.com>
In-Reply-To: <20251024181227.32228-7-mark@harmstone.com>

On Fri, Oct 24, 2025 at 07:12:07PM +0100, Mark Harmstone wrote:
> Add a struct btrfs_block_group_item_v2, which is used in the block group
> tree if the remap-tree incompat flag is set.
>
> This adds two new fields to the block group item: `remap_bytes` and
> `identity_remap_count`.
>
> `remap_bytes` records the amount of data that's physically within this
> block group, but nominally in another, remapped block group. This is
> necessary because this data will need to be moved first if this block
> group is itself relocated. If `remap_bytes` > 0, this is an indicator to
> the relocation thread that it will need to search the remap-tree for
> backrefs. A block group must also have `remap_bytes` == 0 before it can
> be dropped.
>
> `identity_remap_count` records how many identity remap items are located
> in the remap tree for this block group. When relocation is begun for
> this block group, this is set to the number of holes in the free-space
> tree for this range. As identity remaps are converted into actual remaps
> by the relocation process, this number is decreased. Once it reaches 0,
> either because of relocation or because extents have been deleted, the
> block group has been fully remapped and its chunk's device extents are
> removed.
>
> Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
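
For readers following along, here is a minimal userspace sketch of the layout described above. It assumes the field order from the uapi hunk quoted further down; the names bg_item_v1, bg_item_v2, bg_item_size() and read_bg_item() are hypothetical stand-ins, not kernel symbols. It also ignores little-endian conversion, which the kernel accessors handle. The point is only that the v2 item is the v1 item plus a 12-byte tail, so a v1-sized read into a zeroed v2 buffer leaves both new fields at 0.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirrors of the packed on-disk layouts. */
struct bg_item_v1 {
	uint64_t used;
	uint64_t chunk_objectid;
	uint64_t flags;
} __attribute__((__packed__));

struct bg_item_v2 {
	uint64_t used;
	uint64_t chunk_objectid;
	uint64_t flags;
	uint64_t remap_bytes;
	uint32_t identity_remap_count;
} __attribute__((__packed__));

/* Three u64 fields, then a u64 and a u32 appended for v2. */
static_assert(sizeof(struct bg_item_v1) == 24, "v1 item is 24 bytes");
static_assert(sizeof(struct bg_item_v2) == 36, "v2 item is 36 bytes");
static_assert(offsetof(struct bg_item_v2, remap_bytes) ==
	      sizeof(struct bg_item_v1), "v2 only extends v1 at the tail");

/* Size of the item stored on disk, chosen by the incompat flag. */
static size_t bg_item_size(int has_remap_tree)
{
	return has_remap_tree ? sizeof(struct bg_item_v2)
			      : sizeof(struct bg_item_v1);
}

/*
 * Copy an item out of a raw buffer into the v2 in-memory form.
 * Zeroing first means a v1-sized copy leaves remap_bytes and
 * identity_remap_count at 0. No endianness conversion is done here.
 */
static void read_bg_item(struct bg_item_v2 *out, const uint8_t *raw,
			 int has_remap_tree)
{
	memset(out, 0, sizeof(*out));
	memcpy(out, raw, bg_item_size(has_remap_tree));
}
```

The sketch zeroes the whole struct before copying, whereas the patch zeroes only the two new fields through the stack setters; for the fields that matter the result is the same.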
> ---
> fs/btrfs/accessors.h | 20 +++++++
> fs/btrfs/block-group.c | 100 ++++++++++++++++++++++++--------
> fs/btrfs/block-group.h | 14 ++++-
> fs/btrfs/discard.c | 2 +-
> fs/btrfs/tree-checker.c | 10 +++-
> include/uapi/linux/btrfs_tree.h | 8 +++
> 6 files changed, 126 insertions(+), 28 deletions(-)
>
> diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
> index 95a1ca8c099b..0dd161ee6863 100644
> --- a/fs/btrfs/accessors.h
> +++ b/fs/btrfs/accessors.h
> @@ -239,6 +239,26 @@ BTRFS_SETGET_FUNCS(block_group_flags, struct btrfs_block_group_item, flags, 64);
> BTRFS_SETGET_STACK_FUNCS(stack_block_group_flags,
> struct btrfs_block_group_item, flags, 64);
>
> +/* struct btrfs_block_group_item_v2 */
> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_used, struct btrfs_block_group_item_v2,
> + used, 64);
> +BTRFS_SETGET_FUNCS(block_group_v2_used, struct btrfs_block_group_item_v2, used, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_chunk_objectid,
> + struct btrfs_block_group_item_v2, chunk_objectid, 64);
> +BTRFS_SETGET_FUNCS(block_group_v2_chunk_objectid,
> + struct btrfs_block_group_item_v2, chunk_objectid, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_flags,
> + struct btrfs_block_group_item_v2, flags, 64);
> +BTRFS_SETGET_FUNCS(block_group_v2_flags, struct btrfs_block_group_item_v2, flags, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_remap_bytes,
> + struct btrfs_block_group_item_v2, remap_bytes, 64);
> +BTRFS_SETGET_FUNCS(block_group_v2_remap_bytes, struct btrfs_block_group_item_v2,
> + remap_bytes, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_identity_remap_count,
> + struct btrfs_block_group_item_v2, identity_remap_count, 32);
> +BTRFS_SETGET_FUNCS(block_group_v2_identity_remap_count, struct btrfs_block_group_item_v2,
> + identity_remap_count, 32);
> +
> /* struct btrfs_free_space_info */
> BTRFS_SETGET_FUNCS(free_space_extent_count, struct btrfs_free_space_info,
> extent_count, 32);
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index b5f2ec8d013f..27173aca6fc1 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -2374,7 +2374,7 @@ static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
> }
>
> static int read_one_block_group(struct btrfs_fs_info *info,
> - struct btrfs_block_group_item *bgi,
> + struct btrfs_block_group_item_v2 *bgi,
> const struct btrfs_key *key,
> int need_clear)
> {
> @@ -2389,11 +2389,16 @@ static int read_one_block_group(struct btrfs_fs_info *info,
> return -ENOMEM;
>
> cache->length = key->offset;
> - cache->used = btrfs_stack_block_group_used(bgi);
> + cache->used = btrfs_stack_block_group_v2_used(bgi);
> cache->commit_used = cache->used;
> - cache->flags = btrfs_stack_block_group_flags(bgi);
> - cache->global_root_id = btrfs_stack_block_group_chunk_objectid(bgi);
> + cache->flags = btrfs_stack_block_group_v2_flags(bgi);
> + cache->global_root_id = btrfs_stack_block_group_v2_chunk_objectid(bgi);
> cache->space_info = btrfs_find_space_info(info, cache->flags);
> + cache->remap_bytes = btrfs_stack_block_group_v2_remap_bytes(bgi);
> + cache->commit_remap_bytes = cache->remap_bytes;
> + cache->identity_remap_count =
> + btrfs_stack_block_group_v2_identity_remap_count(bgi);
> + cache->commit_identity_remap_count = cache->identity_remap_count;
>
> btrfs_set_free_space_tree_thresholds(cache);
>
> @@ -2458,7 +2463,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
> } else if (cache->length == cache->used) {
> cache->cached = BTRFS_CACHE_FINISHED;
> btrfs_free_excluded_extents(cache);
> - } else if (cache->used == 0) {
> + } else if (cache->used == 0 && cache->remap_bytes == 0) {
> cache->cached = BTRFS_CACHE_FINISHED;
> ret = btrfs_add_new_free_space(cache, cache->start,
> cache->start + cache->length, NULL);
> @@ -2478,7 +2483,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
>
> set_avail_alloc_bits(info, cache->flags);
> if (btrfs_chunk_writeable(info, cache->start)) {
> - if (cache->used == 0) {
> + if (cache->used == 0 && cache->remap_bytes == 0) {
> ASSERT(list_empty(&cache->bg_list));
> if (btrfs_test_opt(info, DISCARD_ASYNC))
> btrfs_discard_queue_work(&info->discard_ctl, cache);
> @@ -2582,9 +2587,10 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
> need_clear = 1;
>
> while (1) {
> - struct btrfs_block_group_item bgi;
> + struct btrfs_block_group_item_v2 bgi;
> struct extent_buffer *leaf;
> int slot;
> + size_t size;
>
> ret = find_first_block_group(info, path, &key);
> if (ret > 0)
> @@ -2595,8 +2601,16 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
> leaf = path->nodes[0];
> slot = path->slots[0];
>
> + if (btrfs_fs_incompat(info, REMAP_TREE)) {
> + size = sizeof(struct btrfs_block_group_item_v2);
> + } else {
> + size = sizeof(struct btrfs_block_group_item);
> + btrfs_set_stack_block_group_v2_remap_bytes(&bgi, 0);
> + btrfs_set_stack_block_group_v2_identity_remap_count(&bgi, 0);
> + }
> +
> read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
> - sizeof(bgi));
> + size);
>
> btrfs_item_key_to_cpu(leaf, &key, slot);
> btrfs_release_path(path);
> @@ -2666,25 +2680,38 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans,
> struct btrfs_block_group *block_group)
> {
> struct btrfs_fs_info *fs_info = trans->fs_info;
> - struct btrfs_block_group_item bgi;
> + struct btrfs_block_group_item_v2 bgi;
> struct btrfs_root *root = btrfs_block_group_root(fs_info);
> struct btrfs_key key;
> u64 old_commit_used;
> + size_t size;
> int ret;
>
> spin_lock(&block_group->lock);
> - btrfs_set_stack_block_group_used(&bgi, block_group->used);
> - btrfs_set_stack_block_group_chunk_objectid(&bgi,
> - block_group->global_root_id);
> - btrfs_set_stack_block_group_flags(&bgi, block_group->flags);
> + btrfs_set_stack_block_group_v2_used(&bgi, block_group->used);
> + btrfs_set_stack_block_group_v2_chunk_objectid(&bgi,
> + block_group->global_root_id);
> + btrfs_set_stack_block_group_v2_flags(&bgi, block_group->flags);
> + btrfs_set_stack_block_group_v2_remap_bytes(&bgi,
> + block_group->remap_bytes);
> + btrfs_set_stack_block_group_v2_identity_remap_count(&bgi,
> + block_group->identity_remap_count);
> old_commit_used = block_group->commit_used;
> block_group->commit_used = block_group->used;
> + block_group->commit_remap_bytes = block_group->remap_bytes;
> + block_group->commit_identity_remap_count =
> + block_group->identity_remap_count;
> key.objectid = block_group->start;
> key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
> key.offset = block_group->length;
> spin_unlock(&block_group->lock);
>
> - ret = btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi));
> + if (btrfs_fs_incompat(fs_info, REMAP_TREE))
> + size = sizeof(struct btrfs_block_group_item_v2);
> + else
> + size = sizeof(struct btrfs_block_group_item);
> +
> + ret = btrfs_insert_item(trans, root, &key, &bgi, size);
> if (ret < 0) {
> spin_lock(&block_group->lock);
> block_group->commit_used = old_commit_used;
> @@ -3139,10 +3166,12 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
> struct btrfs_root *root = btrfs_block_group_root(fs_info);
> unsigned long bi;
> struct extent_buffer *leaf;
> - struct btrfs_block_group_item bgi;
> + struct btrfs_block_group_item_v2 bgi;
> struct btrfs_key key;
> - u64 old_commit_used;
> - u64 used;
> + u64 old_commit_used, old_commit_remap_bytes;
> + u32 old_commit_identity_remap_count;
> + u64 used, remap_bytes;
> + u32 identity_remap_count;
>
> /*
> * Block group items update can be triggered out of commit transaction
> @@ -3152,13 +3181,21 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
> */
> spin_lock(&cache->lock);
> old_commit_used = cache->commit_used;
> + old_commit_remap_bytes = cache->commit_remap_bytes;
> + old_commit_identity_remap_count = cache->commit_identity_remap_count;
> used = cache->used;
> - /* No change in used bytes, can safely skip it. */
> - if (cache->commit_used == used) {
> + remap_bytes = cache->remap_bytes;
> + identity_remap_count = cache->identity_remap_count;
> + /* No change in values, can safely skip it. */
> + if (cache->commit_used == used &&
> + cache->commit_remap_bytes == remap_bytes &&
> + cache->commit_identity_remap_count == identity_remap_count) {
> spin_unlock(&cache->lock);
> return 0;
> }
> cache->commit_used = used;
> + cache->commit_remap_bytes = remap_bytes;
> + cache->commit_identity_remap_count = identity_remap_count;
> spin_unlock(&cache->lock);
>
> key.objectid = cache->start;
> @@ -3174,11 +3211,23 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
>
> leaf = path->nodes[0];
> bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
> - btrfs_set_stack_block_group_used(&bgi, used);
> - btrfs_set_stack_block_group_chunk_objectid(&bgi,
> - cache->global_root_id);
> - btrfs_set_stack_block_group_flags(&bgi, cache->flags);
> - write_extent_buffer(leaf, &bgi, bi, sizeof(bgi));
> + btrfs_set_stack_block_group_v2_used(&bgi, used);
> + btrfs_set_stack_block_group_v2_chunk_objectid(&bgi,
> + cache->global_root_id);
> + btrfs_set_stack_block_group_v2_flags(&bgi, cache->flags);
> +
> + if (btrfs_fs_incompat(fs_info, REMAP_TREE)) {
> + btrfs_set_stack_block_group_v2_remap_bytes(&bgi,
> + cache->remap_bytes);
> + btrfs_set_stack_block_group_v2_identity_remap_count(&bgi,
> + cache->identity_remap_count);
> + write_extent_buffer(leaf, &bgi, bi,
> + sizeof(struct btrfs_block_group_item_v2));
> + } else {
> + write_extent_buffer(leaf, &bgi, bi,
> + sizeof(struct btrfs_block_group_item));
> + }
> +
> fail:
> btrfs_release_path(path);
> /*
> @@ -3193,6 +3242,9 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
> if (ret < 0 && ret != -ENOENT) {
> spin_lock(&cache->lock);
> cache->commit_used = old_commit_used;
> + cache->commit_remap_bytes = old_commit_remap_bytes;
> + cache->commit_identity_remap_count =
> + old_commit_identity_remap_count;
> spin_unlock(&cache->lock);
> }
> return ret;
> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> index 9172104a5889..af23fdb3cf4d 100644
> --- a/fs/btrfs/block-group.h
> +++ b/fs/btrfs/block-group.h
> @@ -129,6 +129,8 @@ struct btrfs_block_group {
> u64 flags;
> u64 cache_generation;
> u64 global_root_id;
> + u64 remap_bytes;
> + u32 identity_remap_count;
>
> /*
> * The last committed used bytes of this block group, if the above @used
> @@ -136,6 +138,15 @@ struct btrfs_block_group {
> * group item of this block group.
> */
> u64 commit_used;
> + /*
> + * The last committed remap_bytes value of this block group.
> + */
> + u64 commit_remap_bytes;
> + /*
> + * The last committed identity_remap_count value of this block group.
> + */
> + u32 commit_identity_remap_count;
> +
> /*
> * If the free space extent count exceeds this number, convert the block
> * group to bitmaps.
> @@ -282,7 +293,8 @@ static inline bool btrfs_is_block_group_used(const struct btrfs_block_group *bg)
> {
> lockdep_assert_held(&bg->lock);
>
> - return (bg->used > 0 || bg->reserved > 0 || bg->pinned > 0);
> + return (bg->used > 0 || bg->reserved > 0 || bg->pinned > 0 ||
> + bg->remap_bytes > 0);
> }
>
> static inline bool btrfs_is_block_group_data_only(const struct btrfs_block_group *block_group)
> diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> index 89fe85778115..ee5f5b2788e1 100644
> --- a/fs/btrfs/discard.c
> +++ b/fs/btrfs/discard.c
> @@ -373,7 +373,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
> if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
> return;
>
> - if (block_group->used == 0)
> + if (block_group->used == 0 && block_group->remap_bytes == 0)
> add_to_discard_unused_list(discard_ctl, block_group);
> else
> add_to_discard_list(discard_ctl, block_group);
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index b6827c2a7815..08b1bcfc7db7 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -688,6 +688,7 @@ static int check_block_group_item(struct extent_buffer *leaf,
> u64 chunk_objectid;
> u64 flags;
> u64 type;
> + size_t exp_size;
>
> /*
> * Here we don't really care about alignment since extent allocator can
> @@ -699,10 +700,15 @@ static int check_block_group_item(struct extent_buffer *leaf,
> return -EUCLEAN;
> }
>
> - if (unlikely(item_size != sizeof(bgi))) {
> + if (btrfs_fs_incompat(fs_info, REMAP_TREE))
> + exp_size = sizeof(struct btrfs_block_group_item_v2);
> + else
> + exp_size = sizeof(struct btrfs_block_group_item);
> +
> + if (unlikely(item_size != exp_size)) {
> block_group_err(leaf, slot,
> "invalid item size, have %u expect %zu",
> - item_size, sizeof(bgi));
> + item_size, exp_size);
> return -EUCLEAN;
> }
>
> diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
> index 9a36f0206d90..500e3a7df90b 100644
> --- a/include/uapi/linux/btrfs_tree.h
> +++ b/include/uapi/linux/btrfs_tree.h
> @@ -1229,6 +1229,14 @@ struct btrfs_block_group_item {
> __le64 flags;
> } __attribute__ ((__packed__));
>
> +struct btrfs_block_group_item_v2 {
> + __le64 used;
> + __le64 chunk_objectid;
> + __le64 flags;
> + __le64 remap_bytes;
> + __le32 identity_remap_count;
> +} __attribute__ ((__packed__));
> +
> struct btrfs_free_space_info {
> __le32 extent_count;
> __le32 flags;
> --
> 2.49.1
>
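
As a reading aid for the update_block_group_item() hunk, the skip-and-rollback pattern it extends can be sketched in userspace roughly as follows. This is a simplified illustration under stated assumptions, not the kernel code: bg_counters, bg_cache, update_bg_item() and the two write callbacks are hypothetical names, there is no locking, and the special case where the patch does not roll back on -ENOENT is left out.

```c
#include <assert.h>
#include <stdint.h>

/* The three values the v2 item tracks for a block group. */
struct bg_counters {
	uint64_t used;
	uint64_t remap_bytes;
	uint32_t identity_remap_count;
};

/* Hypothetical stand-in for the relevant part of struct btrfs_block_group. */
struct bg_cache {
	struct bg_counters cur;        /* live in-memory values */
	struct bg_counters committed;  /* values last written to the item */
};

typedef int (*write_item_fn)(const struct bg_counters *vals);

static int counters_equal(const struct bg_counters *a,
			  const struct bg_counters *b)
{
	return a->used == b->used &&
	       a->remap_bytes == b->remap_bytes &&
	       a->identity_remap_count == b->identity_remap_count;
}

/*
 * Write the item only if any of the three values changed since the
 * last successful write. On failure, restore the previous committed
 * values so that the next call retries the write.
 */
static int update_bg_item(struct bg_cache *bg, write_item_fn write_item)
{
	struct bg_counters old = bg->committed;
	struct bg_counters snap = bg->cur;
	int ret;

	if (counters_equal(&old, &snap))
		return 0;              /* nothing changed, skip the write */

	bg->committed = snap;
	ret = write_item(&snap);
	if (ret < 0)
		bg->committed = old;   /* roll back all three together */
	return ret;
}

/* Example callbacks: one that succeeds, one that fails with -5 (EIO). */
static int write_ok(const struct bg_counters *vals)
{
	(void)vals;
	return 0;
}

static int write_fail(const struct bg_counters *vals)
{
	(void)vals;
	return -5;
}
```

One difference worth noting: this sketch writes the snapshotted values, while the quoted hunk passes cache->remap_bytes and cache->identity_remap_count to the setters rather than the local remap_bytes and identity_remap_count copies taken under the lock.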