public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Mark Harmstone <maharmstone@meta.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [RFC PATCH 04/10] btrfs: add extended version of struct block_group_item
Date: Fri, 23 May 2025 12:00:11 +0000	[thread overview]
Message-ID: <c0280288-cb7c-4135-81b0-46e4cf6e4c54@meta.com> (raw)
In-Reply-To: <1b511b2f-0738-42a1-95af-546bd7b4e51e@gmx.com>

On 23/5/25 10:53, Qu Wenruo wrote:
> > 
> 
> 
> 在 2025/5/16 02:06, Mark Harmstone 写道:
>> Add a struct btrfs_block_group_item_v2, which is used in the block group
>> tree if the remap-tree incompat flag is set.
>>
>> This adds two new fields to the block group item: `remap_bytes` and
>> `identity_remap_count`.
>>
>> `remap_bytes` records the amount of data that's physically within this
>> block group, but nominally in another, remapped block group. This is
>> necessary because this data will need to be moved first if this block
>> group is itself relocated. If `remap_bytes` > 0, this is an indicator to
>> the relocation thread that it will need to search the remap-tree for
>> backrefs. A block group must also have `remap_bytes` == 0 before it can
>> be dropped.
>>
>> `identity_remap_count` records how many identity remap items are located
>> in the remap tree for this block group. When relocation is begun for
>> this block group, this is set to the number of holes in the free-space
>> tree for this range. As identity remaps are converted into actual remaps
>> by the relocation process, this number is decreased. Once it reaches 0,
>> either because of relocation or because extents have been deleted, the
>> block group has been fully remapped and its chunk's device extents are
>> removed.
> 
> Can we add those two items into a new item other than a completely new 
> v2 block group item?
> 
> I mean for regular block groups they do not need those members, and all 
> block groups starts from regular ones at mkfs time.
> 
> We can add a regular block group flag to indicate if the bg has the 
> extra members.
> 
> Thanks,
> Qu

I did consider that, but the downside of doing that is that it makes the 
timing of updating the block group tree less predictable. It would imply 
that when relocating a block group we have to lock the BGT up to the 
root, as we'd be changing keys and the length of items rather than doing 
everything in place.
This didn't seem worth it to save a few bytes, particularly as it's 
anticipated that in practice most block groups will have the REMAPPED 
flag set.

Mark

> 
>>
>> Signed-off-by: Mark Harmstone <maharmstone@fb.com>
>> ---
>>   fs/btrfs/accessors.h            |  20 +++++++
>>   fs/btrfs/block-group.c          | 101 ++++++++++++++++++++++++--------
>>   fs/btrfs/block-group.h          |  14 ++++-
>>   fs/btrfs/tree-checker.c         |  10 +++-
>>   include/uapi/linux/btrfs_tree.h |   8 +++
>>   5 files changed, 126 insertions(+), 27 deletions(-)
>>
>> diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
>> index 5f5eda8d6f9e..6e6dd664217b 100644
>> --- a/fs/btrfs/accessors.h
>> +++ b/fs/btrfs/accessors.h
>> @@ -264,6 +264,26 @@ BTRFS_SETGET_FUNCS(block_group_flags, struct 
>> btrfs_block_group_item, flags, 64);
>>   BTRFS_SETGET_STACK_FUNCS(stack_block_group_flags,
>>               struct btrfs_block_group_item, flags, 64);
>> +/* struct btrfs_block_group_item_v2 */
>> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_used, struct 
>> btrfs_block_group_item_v2,
>> +             used, 64);
>> +BTRFS_SETGET_FUNCS(block_group_v2_used, struct 
>> btrfs_block_group_item_v2, used, 64);
>> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_chunk_objectid,
>> +             struct btrfs_block_group_item_v2, chunk_objectid, 64);
>> +BTRFS_SETGET_FUNCS(block_group_v2_chunk_objectid,
>> +           struct btrfs_block_group_item_v2, chunk_objectid, 64);
>> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_flags,
>> +             struct btrfs_block_group_item_v2, flags, 64);
>> +BTRFS_SETGET_FUNCS(block_group_v2_flags, struct 
>> btrfs_block_group_item_v2, flags, 64);
>> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_remap_bytes,
>> +             struct btrfs_block_group_item_v2, remap_bytes, 64);
>> +BTRFS_SETGET_FUNCS(block_group_v2_remap_bytes, struct 
>> btrfs_block_group_item_v2,
>> +           remap_bytes, 64);
>> +BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_identity_remap_count,
>> +             struct btrfs_block_group_item_v2, identity_remap_count, 
>> 32);
>> +BTRFS_SETGET_FUNCS(block_group_v2_identity_remap_count, struct 
>> btrfs_block_group_item_v2,
>> +           identity_remap_count, 32);
>> +
>>   /* struct btrfs_free_space_info */
>>   BTRFS_SETGET_FUNCS(free_space_extent_count, struct 
>> btrfs_free_space_info,
>>              extent_count, 32);
>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>> index 5b0cb04b2b93..6a2aa792ccb2 100644
>> --- a/fs/btrfs/block-group.c
>> +++ b/fs/btrfs/block-group.c
>> @@ -2351,7 +2351,7 @@ static int 
>> check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
>>   }
>>   static int read_one_block_group(struct btrfs_fs_info *info,
>> -                struct btrfs_block_group_item *bgi,
>> +                struct btrfs_block_group_item_v2 *bgi,
>>                   const struct btrfs_key *key,
>>                   int need_clear)
>>   {
>> @@ -2366,11 +2366,16 @@ static int read_one_block_group(struct 
>> btrfs_fs_info *info,
>>           return -ENOMEM;
>>       cache->length = key->offset;
>> -    cache->used = btrfs_stack_block_group_used(bgi);
>> +    cache->used = btrfs_stack_block_group_v2_used(bgi);
>>       cache->commit_used = cache->used;
>> -    cache->flags = btrfs_stack_block_group_flags(bgi);
>> -    cache->global_root_id = btrfs_stack_block_group_chunk_objectid(bgi);
>> +    cache->flags = btrfs_stack_block_group_v2_flags(bgi);
>> +    cache->global_root_id = 
>> btrfs_stack_block_group_v2_chunk_objectid(bgi);
>>       cache->space_info = btrfs_find_space_info(info, cache->flags);
>> +    cache->remap_bytes = btrfs_stack_block_group_v2_remap_bytes(bgi);
>> +    cache->commit_remap_bytes = cache->remap_bytes;
>> +    cache->identity_remap_count =
>> +        btrfs_stack_block_group_v2_identity_remap_count(bgi);
>> +    cache->commit_identity_remap_count = cache->identity_remap_count;
>>       set_free_space_tree_thresholds(cache);
>> @@ -2435,7 +2440,7 @@ static int read_one_block_group(struct 
>> btrfs_fs_info *info,
>>       } else if (cache->length == cache->used) {
>>           cache->cached = BTRFS_CACHE_FINISHED;
>>           btrfs_free_excluded_extents(cache);
>> -    } else if (cache->used == 0) {
>> +    } else if (cache->used == 0 && cache->remap_bytes == 0) {
>>           cache->cached = BTRFS_CACHE_FINISHED;
>>           ret = btrfs_add_new_free_space(cache, cache->start,
>>                              cache->start + cache->length, NULL);
>> @@ -2455,7 +2460,8 @@ static int read_one_block_group(struct 
>> btrfs_fs_info *info,
>>       set_avail_alloc_bits(info, cache->flags);
>>       if (btrfs_chunk_writeable(info, cache->start)) {
>> -        if (cache->used == 0) {
>> +        if (cache->used == 0 && cache->identity_remap_count == 0 &&
>> +            cache->remap_bytes == 0) {
>>               ASSERT(list_empty(&cache->bg_list));
>>               if (btrfs_test_opt(info, DISCARD_ASYNC))
>>                   btrfs_discard_queue_work(&info->discard_ctl, cache);
>> @@ -2559,9 +2565,10 @@ int btrfs_read_block_groups(struct 
>> btrfs_fs_info *info)
>>           need_clear = 1;
>>       while (1) {
>> -        struct btrfs_block_group_item bgi;
>> +        struct btrfs_block_group_item_v2 bgi;
>>           struct extent_buffer *leaf;
>>           int slot;
>> +        size_t size;
>>           ret = find_first_block_group(info, path, &key);
>>           if (ret > 0)
>> @@ -2572,8 +2579,16 @@ int btrfs_read_block_groups(struct 
>> btrfs_fs_info *info)
>>           leaf = path->nodes[0];
>>           slot = path->slots[0];
>> +        if (btrfs_fs_incompat(info, REMAP_TREE)) {
>> +            size = sizeof(struct btrfs_block_group_item_v2);
>> +        } else {
>> +            size = sizeof(struct btrfs_block_group_item);
>> +            btrfs_set_stack_block_group_v2_remap_bytes(&bgi, 0);
>> +            btrfs_set_stack_block_group_v2_identity_remap_count(&bgi, 
>> 0);
>> +        }
>> +
>>           read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, 
>> slot),
>> -                   sizeof(bgi));
>> +                   size);
>>           btrfs_item_key_to_cpu(leaf, &key, slot);
>>           btrfs_release_path(path);
>> @@ -2643,25 +2658,38 @@ static int insert_block_group_item(struct 
>> btrfs_trans_handle *trans,
>>                      struct btrfs_block_group *block_group)
>>   {
>>       struct btrfs_fs_info *fs_info = trans->fs_info;
>> -    struct btrfs_block_group_item bgi;
>> +    struct btrfs_block_group_item_v2 bgi;
>>       struct btrfs_root *root = btrfs_block_group_root(fs_info);
>>       struct btrfs_key key;
>>       u64 old_commit_used;
>> +    size_t size;
>>       int ret;
>>       spin_lock(&block_group->lock);
>> -    btrfs_set_stack_block_group_used(&bgi, block_group->used);
>> -    btrfs_set_stack_block_group_chunk_objectid(&bgi,
>> -                           block_group->global_root_id);
>> -    btrfs_set_stack_block_group_flags(&bgi, block_group->flags);
>> +    btrfs_set_stack_block_group_v2_used(&bgi, block_group->used);
>> +    btrfs_set_stack_block_group_v2_chunk_objectid(&bgi,
>> +                              block_group->global_root_id);
>> +    btrfs_set_stack_block_group_v2_flags(&bgi, block_group->flags);
>> +    btrfs_set_stack_block_group_v2_remap_bytes(&bgi,
>> +                           block_group->remap_bytes);
>> +    btrfs_set_stack_block_group_v2_identity_remap_count(&bgi,
>> +                    block_group->identity_remap_count);
>>       old_commit_used = block_group->commit_used;
>>       block_group->commit_used = block_group->used;
>> +    block_group->commit_remap_bytes = block_group->remap_bytes;
>> +    block_group->commit_identity_remap_count =
>> +        block_group->identity_remap_count;
>>       key.objectid = block_group->start;
>>       key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
>>       key.offset = block_group->length;
>>       spin_unlock(&block_group->lock);
>> -    ret = btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi));
>> +    if (btrfs_fs_incompat(fs_info, REMAP_TREE))
>> +        size = sizeof(struct btrfs_block_group_item_v2);
>> +    else
>> +        size = sizeof(struct btrfs_block_group_item);
>> +
>> +    ret = btrfs_insert_item(trans, root, &key, &bgi, size);
>>       if (ret < 0) {
>>           spin_lock(&block_group->lock);
>>           block_group->commit_used = old_commit_used;
>> @@ -3116,10 +3144,12 @@ static int update_block_group_item(struct 
>> btrfs_trans_handle *trans,
>>       struct btrfs_root *root = btrfs_block_group_root(fs_info);
>>       unsigned long bi;
>>       struct extent_buffer *leaf;
>> -    struct btrfs_block_group_item bgi;
>> +    struct btrfs_block_group_item_v2 bgi;
>>       struct btrfs_key key;
>> -    u64 old_commit_used;
>> -    u64 used;
>> +    u64 old_commit_used, old_commit_remap_bytes;
>> +    u32 old_commit_identity_remap_count;
>> +    u64 used, remap_bytes;
>> +    u32 identity_remap_count;
>>       /*
>>        * Block group items update can be triggered out of commit 
>> transaction
>> @@ -3129,13 +3159,21 @@ static int update_block_group_item(struct 
>> btrfs_trans_handle *trans,
>>        */
>>       spin_lock(&cache->lock);
>>       old_commit_used = cache->commit_used;
>> +    old_commit_remap_bytes = cache->commit_remap_bytes;
>> +    old_commit_identity_remap_count = cache- 
>> >commit_identity_remap_count;
>>       used = cache->used;
>> -    /* No change in used bytes, can safely skip it. */
>> -    if (cache->commit_used == used) {
>> +    remap_bytes = cache->remap_bytes;
>> +    identity_remap_count = cache->identity_remap_count;
>> +    /* No change in values, can safely skip it. */
>> +    if (cache->commit_used == used &&
>> +        cache->commit_remap_bytes == remap_bytes &&
>> +        cache->commit_identity_remap_count == identity_remap_count) {
>>           spin_unlock(&cache->lock);
>>           return 0;
>>       }
>>       cache->commit_used = used;
>> +    cache->commit_remap_bytes = remap_bytes;
>> +    cache->commit_identity_remap_count = identity_remap_count;
>>       spin_unlock(&cache->lock);
>>       key.objectid = cache->start;
>> @@ -3151,11 +3189,23 @@ static int update_block_group_item(struct 
>> btrfs_trans_handle *trans,
>>       leaf = path->nodes[0];
>>       bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
>> -    btrfs_set_stack_block_group_used(&bgi, used);
>> -    btrfs_set_stack_block_group_chunk_objectid(&bgi,
>> -                           cache->global_root_id);
>> -    btrfs_set_stack_block_group_flags(&bgi, cache->flags);
>> -    write_extent_buffer(leaf, &bgi, bi, sizeof(bgi));
>> +    btrfs_set_stack_block_group_v2_used(&bgi, used);
>> +    btrfs_set_stack_block_group_v2_chunk_objectid(&bgi,
>> +                              cache->global_root_id);
>> +    btrfs_set_stack_block_group_v2_flags(&bgi, cache->flags);
>> +
>> +    if (btrfs_fs_incompat(fs_info, REMAP_TREE)) {
>> +        btrfs_set_stack_block_group_v2_remap_bytes(&bgi,
>> +                               cache->remap_bytes);
>> +        btrfs_set_stack_block_group_v2_identity_remap_count(&bgi,
>> +                        cache->identity_remap_count);
>> +        write_extent_buffer(leaf, &bgi, bi,
>> +                    sizeof(struct btrfs_block_group_item_v2));
>> +    } else {
>> +        write_extent_buffer(leaf, &bgi, bi,
>> +                    sizeof(struct btrfs_block_group_item));
>> +    }
>> +
>>   fail:
>>       btrfs_release_path(path);
>>       /*
>> @@ -3170,6 +3220,9 @@ static int update_block_group_item(struct 
>> btrfs_trans_handle *trans,
>>       if (ret < 0 && ret != -ENOENT) {
>>           spin_lock(&cache->lock);
>>           cache->commit_used = old_commit_used;
>> +        cache->commit_remap_bytes = old_commit_remap_bytes;
>> +        cache->commit_identity_remap_count =
>> +            old_commit_identity_remap_count;
>>           spin_unlock(&cache->lock);
>>       }
>>       return ret;
>> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
>> index 9de356bcb411..c484118b8b8d 100644
>> --- a/fs/btrfs/block-group.h
>> +++ b/fs/btrfs/block-group.h
>> @@ -127,6 +127,8 @@ struct btrfs_block_group {
>>       u64 flags;
>>       u64 cache_generation;
>>       u64 global_root_id;
>> +    u64 remap_bytes;
>> +    u32 identity_remap_count;
>>       /*
>>        * The last committed used bytes of this block group, if the 
>> above @used
>> @@ -134,6 +136,15 @@ struct btrfs_block_group {
>>        * group item of this block group.
>>        */
>>       u64 commit_used;
>> +    /*
>> +     * The last committed remap_bytes value of this block group.
>> +     */
>> +    u64 commit_remap_bytes;
>> +    /*
>> +     * The last commited identity_remap_count value of this block group.
>> +     */
>> +    u32 commit_identity_remap_count;
>> +
>>       /*
>>        * If the free space extent count exceeds this number, convert 
>> the block
>>        * group to bitmaps.
>> @@ -275,7 +286,8 @@ static inline bool btrfs_is_block_group_used(const 
>> struct btrfs_block_group *bg)
>>   {
>>       lockdep_assert_held(&bg->lock);
>> -    return (bg->used > 0 || bg->reserved > 0 || bg->pinned > 0);
>> +    return (bg->used > 0 || bg->reserved > 0 || bg->pinned > 0 ||
>> +        bg->remap_bytes > 0);
>>   }
>>   static inline bool btrfs_is_block_group_data_only(const struct 
>> btrfs_block_group *block_group)
>> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
>> index fd83df06e3fb..25311576fab6 100644
>> --- a/fs/btrfs/tree-checker.c
>> +++ b/fs/btrfs/tree-checker.c
>> @@ -687,6 +687,7 @@ static int check_block_group_item(struct 
>> extent_buffer *leaf,
>>       u64 chunk_objectid;
>>       u64 flags;
>>       u64 type;
>> +    size_t exp_size;
>>       /*
>>        * Here we don't really care about alignment since extent 
>> allocator can
>> @@ -698,10 +699,15 @@ static int check_block_group_item(struct 
>> extent_buffer *leaf,
>>           return -EUCLEAN;
>>       }
>> -    if (unlikely(item_size != sizeof(bgi))) {
>> +    if (btrfs_fs_incompat(fs_info, REMAP_TREE))
>> +        exp_size = sizeof(struct btrfs_block_group_item_v2);
>> +    else
>> +        exp_size = sizeof(struct btrfs_block_group_item);
>> +
>> +    if (unlikely(item_size != exp_size)) {
>>           block_group_err(leaf, slot,
>>               "invalid item size, have %u expect %zu",
>> -                item_size, sizeof(bgi));
>> +                item_size, exp_size);
>>           return -EUCLEAN;
>>       }
>> diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/ 
>> btrfs_tree.h
>> index 9a36f0206d90..500e3a7df90b 100644
>> --- a/include/uapi/linux/btrfs_tree.h
>> +++ b/include/uapi/linux/btrfs_tree.h
>> @@ -1229,6 +1229,14 @@ struct btrfs_block_group_item {
>>       __le64 flags;
>>   } __attribute__ ((__packed__));
>> +struct btrfs_block_group_item_v2 {
>> +    __le64 used;
>> +    __le64 chunk_objectid;
>> +    __le64 flags;
>> +    __le64 remap_bytes;
>> +    __le32 identity_remap_count;
>> +} __attribute__ ((__packed__));
>> +
>>   struct btrfs_free_space_info {
>>       __le32 extent_count;
>>       __le32 flags;
> 


  reply	other threads:[~2025-05-23 12:00 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-15 16:36 [RFC PATCH 00/10] Remap tree Mark Harmstone
2025-05-15 16:36 ` [RFC PATCH 01/10] btrfs: add definitions and constants for remap-tree Mark Harmstone
2025-05-21 12:43   ` Johannes Thumshirn
2025-05-23 13:06     ` Mark Harmstone
2025-05-15 16:36 ` [RFC PATCH 02/10] btrfs: add REMAP chunk type Mark Harmstone
2025-05-15 16:36 ` [RFC PATCH 03/10] btrfs: allow remapped chunks to have zero stripes Mark Harmstone
2025-05-15 16:36 ` [RFC PATCH 04/10] btrfs: add extended version of struct block_group_item Mark Harmstone
2025-05-23  9:53   ` Qu Wenruo
2025-05-23 12:00     ` Mark Harmstone [this message]
2025-05-15 16:36 ` [RFC PATCH 05/10] btrfs: allow mounting filesystems with remap-tree incompat flag Mark Harmstone
2025-05-15 16:36 ` [RFC PATCH 06/10] btrfs: redirect I/O for remapped block groups Mark Harmstone
2025-05-23 10:09   ` Qu Wenruo
2025-05-23 11:53     ` Mark Harmstone
2025-05-15 16:36 ` [RFC PATCH 07/10] btrfs: handle deletions from remapped block group Mark Harmstone
2025-05-15 16:36 ` [RFC PATCH 08/10] btrfs: handle setting up relocation of block group with remap-tree Mark Harmstone
2025-05-15 16:36 ` [RFC PATCH 09/10] btrfs: move existing remaps before relocating block group Mark Harmstone
     [not found]   ` <202505161726.w1lqCZxG-lkp@intel.com>
2025-05-16 11:43     ` Mark Harmstone
2025-05-15 16:36 ` [RFC PATCH 10/10] btrfs: replace identity maps with actual remaps when doing relocations Mark Harmstone
2025-05-21  0:04   ` Boris Burkov
2025-05-23 14:54     ` Mark Harmstone

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c0280288-cb7c-4135-81b0-46e4cf6e4c54@meta.com \
    --to=maharmstone@meta.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox