From: Mark Harmstone <mark@harmstone.com>
To: Boris Burkov <boris@bur.io>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 09/12] btrfs: handle deletions from remapped block group
Date: Mon, 11 Aug 2025 17:48:05 +0100 [thread overview]
Message-ID: <a70b41cb-dab5-440f-8250-2e849aef147b@harmstone.com> (raw)
In-Reply-To: <20250613234250.GG3621880@zen.localdomain>
On 14/06/2025 12.42 am, Boris Burkov wrote:
> On Thu, Jun 05, 2025 at 05:23:39PM +0100, Mark Harmstone wrote:
>> Handle the case where we free an extent from a block group that has the
>> REMAPPED flag set. Because the remap tree is orthogonal to the extent
>> tree, for data this may be within any number of identity remaps or
>> actual remaps. If we're freeing a metadata node, this will be wholly
>> inside one or the other.
>>
>> btrfs_remove_extent_from_remap_tree() searches the remap tree for the
>> remaps that cover the range in question, then calls
>> remove_range_from_remap_tree() for each one, to punch a hole in the
>> remap and adjust the free-space tree.
>>
>> For an identity remap, remove_range_from_remap_tree() will adjust the
>> block group's `identity_remap_count` if this changes. If it reaches
>> zero we call last_identity_remap_gone(), which removes the chunk's
>> stripes and device extents - it is now fully remapped.
>>
>> The changes which involve the block group's ro flag are because the
>> REMAPPED flag itself prevents a block group from having any new
>> allocations within it, and so we don't need to account for this
>> separately.
>
> Those changes didn't really make much sense to me. Do you *want to*
> delete the "unused" block group with remapped set?
> How did it get in the list?
> Did you actually hit this case and are fixing something?
So the life cycle of a block group is:
1. Normal BG
2. Partially remapped BG (REMAPPED flag set, one or more identity remaps)
3. Fully remapped BG (no identity remaps, chunk stripes removed; all data
that's nominally here is actually somewhere else)
4. Empty BG (all extents that nominally live within this BG have been
removed)
You still want to remove these empty BGs; otherwise they'll stick around
in the BG tree forever.
>
>>
>> Signed-off-by: Mark Harmstone <maharmstone@fb.com>
>> ---
>> fs/btrfs/block-group.c | 80 ++++---
>> fs/btrfs/block-group.h | 1 +
>> fs/btrfs/disk-io.c | 1 +
>> fs/btrfs/extent-tree.c | 30 ++-
>> fs/btrfs/fs.h | 1 +
>> fs/btrfs/relocation.c | 505 +++++++++++++++++++++++++++++++++++++++++
>> fs/btrfs/relocation.h | 3 +
>> fs/btrfs/volumes.c | 56 +++--
>> fs/btrfs/volumes.h | 6 +
>> 9 files changed, 628 insertions(+), 55 deletions(-)
>>
>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>> index 4529356bb1e3..334df145ab3f 100644
>> --- a/fs/btrfs/block-group.c
>> +++ b/fs/btrfs/block-group.c
>> @@ -1055,6 +1055,32 @@ static int remove_block_group_item(struct btrfs_trans_handle *trans,
>> return ret;
>> }
>>
>> +void btrfs_remove_bg_from_sinfo(struct btrfs_block_group *block_group)
>> +{
>> + int factor = btrfs_bg_type_to_factor(block_group->flags);
>> +
>> + spin_lock(&block_group->space_info->lock);
>> +
>> + if (btrfs_test_opt(block_group->fs_info, ENOSPC_DEBUG)) {
>> + WARN_ON(block_group->space_info->total_bytes
>> + < block_group->length);
>> + WARN_ON(block_group->space_info->bytes_readonly
>> + < block_group->length - block_group->zone_unusable);
>> + WARN_ON(block_group->space_info->bytes_zone_unusable
>> + < block_group->zone_unusable);
>> + WARN_ON(block_group->space_info->disk_total
>> + < block_group->length * factor);
>> + }
>> + block_group->space_info->total_bytes -= block_group->length;
>> + block_group->space_info->bytes_readonly -=
>> + (block_group->length - block_group->zone_unusable);
>> + btrfs_space_info_update_bytes_zone_unusable(block_group->space_info,
>> + -block_group->zone_unusable);
>> + block_group->space_info->disk_total -= block_group->length * factor;
>> +
>> + spin_unlock(&block_group->space_info->lock);
>> +}
>> +
>> int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>> struct btrfs_chunk_map *map)
>> {
>> @@ -1066,7 +1092,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>> struct kobject *kobj = NULL;
>> int ret;
>> int index;
>> - int factor;
>> struct btrfs_caching_control *caching_ctl = NULL;
>> bool remove_map;
>> bool remove_rsv = false;
>> @@ -1075,7 +1100,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>> if (!block_group)
>> return -ENOENT;
>>
>> - BUG_ON(!block_group->ro);
>> + BUG_ON(!block_group->ro && !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED));
>>
>> trace_btrfs_remove_block_group(block_group);
>> /*
>> @@ -1087,7 +1112,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>> block_group->length);
>>
>> index = btrfs_bg_flags_to_raid_index(block_group->flags);
>> - factor = btrfs_bg_type_to_factor(block_group->flags);
>>
>> /* make sure this block group isn't part of an allocation cluster */
>> cluster = &fs_info->data_alloc_cluster;
>> @@ -1211,26 +1235,11 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>>
>> spin_lock(&block_group->space_info->lock);
>> list_del_init(&block_group->ro_list);
>> -
>> - if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
>> - WARN_ON(block_group->space_info->total_bytes
>> - < block_group->length);
>> - WARN_ON(block_group->space_info->bytes_readonly
>> - < block_group->length - block_group->zone_unusable);
>> - WARN_ON(block_group->space_info->bytes_zone_unusable
>> - < block_group->zone_unusable);
>> - WARN_ON(block_group->space_info->disk_total
>> - < block_group->length * factor);
>> - }
>> - block_group->space_info->total_bytes -= block_group->length;
>> - block_group->space_info->bytes_readonly -=
>> - (block_group->length - block_group->zone_unusable);
>> - btrfs_space_info_update_bytes_zone_unusable(block_group->space_info,
>> - -block_group->zone_unusable);
>> - block_group->space_info->disk_total -= block_group->length * factor;
>> -
>> spin_unlock(&block_group->space_info->lock);
>>
>> + if (!(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED))
>> + btrfs_remove_bg_from_sinfo(block_group);
>
> So we double count the block group in the space info usage while the
> remapped one stays around? Why delete the block group itself then? Why
> not just delete it when the last identity remap is gone and we call this
> function? Sorry if this is obvious, but I don't see it for some reason.
btrfs_remove_bg_from_sinfo() also gets called in last_identity_remap_gone(),
so there's no double-counting. For a remapped extent, the data gets charged
to the destination BG's space info.
IIRC Qu asked a similar question, i.e. why we don't remove remapped BGs as
soon as the last identity remap has gone. The problem is that there would
then no longer be any way to distinguish addresses of fully remapped BGs
from completely invalid addresses, which makes things like btrfs-check more
difficult. See also the frequency at which bit-flip errors get posted to
the mailing list.
>
>> +
>> /*
>> * Remove the free space for the block group from the free space tree
>> * and the block group's item from the extent tree before marking the
>> @@ -1517,6 +1526,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>> while (!list_empty(&fs_info->unused_bgs)) {
>> u64 used;
>> int trimming;
>> + bool made_ro = false;
>>
>> block_group = list_first_entry(&fs_info->unused_bgs,
>> struct btrfs_block_group,
>> @@ -1553,7 +1563,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>>
>> spin_lock(&space_info->lock);
>> spin_lock(&block_group->lock);
>> - if (btrfs_is_block_group_used(block_group) || block_group->ro ||
>> + if (btrfs_is_block_group_used(block_group) ||
>> + (block_group->ro && !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) ||
>> list_is_singular(&block_group->list)) {
>> /*
>> * We want to bail if we made new allocations or have
>> @@ -1596,7 +1607,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>> */
>> used = btrfs_space_info_used(space_info, true);
>> if (space_info->total_bytes - block_group->length < used &&
>> - block_group->zone_unusable < block_group->length) {
>> + block_group->zone_unusable < block_group->length &&
>> + !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
>> /*
>> * Add a reference for the list, compensate for the ref
>> * drop under the "next" label for the
>> @@ -1614,8 +1626,14 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>> spin_unlock(&block_group->lock);
>> spin_unlock(&space_info->lock);
>>
>> - /* We don't want to force the issue, only flip if it's ok. */
>> - ret = inc_block_group_ro(block_group, 0);
>> + if (!(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
>> + /* We don't want to force the issue, only flip if it's ok. */
>> + ret = inc_block_group_ro(block_group, 0);
>> + made_ro = true;
>> + } else {
>> + ret = 0;
>> + }
>> +
>> up_write(&space_info->groups_sem);
>> if (ret < 0) {
>> ret = 0;
>> @@ -1624,7 +1642,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>>
>> ret = btrfs_zone_finish(block_group);
>> if (ret < 0) {
>> - btrfs_dec_block_group_ro(block_group);
>> + if (made_ro)
>> + btrfs_dec_block_group_ro(block_group);
>> if (ret == -EAGAIN)
>> ret = 0;
>> goto next;
>> @@ -1637,7 +1656,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>> trans = btrfs_start_trans_remove_block_group(fs_info,
>> block_group->start);
>> if (IS_ERR(trans)) {
>> - btrfs_dec_block_group_ro(block_group);
>> + if (made_ro)
>> + btrfs_dec_block_group_ro(block_group);
>> ret = PTR_ERR(trans);
>> goto next;
>> }
>> @@ -1647,7 +1667,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>> * just delete them, we don't care about them anymore.
>> */
>> if (!clean_pinned_extents(trans, block_group)) {
>> - btrfs_dec_block_group_ro(block_group);
>> + if (made_ro)
>> + btrfs_dec_block_group_ro(block_group);
>> goto end_trans;
>> }
>>
>> @@ -1661,7 +1682,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>> spin_lock(&fs_info->discard_ctl.lock);
>> if (!list_empty(&block_group->discard_list)) {
>> spin_unlock(&fs_info->discard_ctl.lock);
>> - btrfs_dec_block_group_ro(block_group);
>> + if (made_ro)
>> + btrfs_dec_block_group_ro(block_group);
>> btrfs_discard_queue_work(&fs_info->discard_ctl,
>> block_group);
>> goto end_trans;
>> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
>> index c484118b8b8d..767898929960 100644
>> --- a/fs/btrfs/block-group.h
>> +++ b/fs/btrfs/block-group.h
>> @@ -329,6 +329,7 @@ int btrfs_add_new_free_space(struct btrfs_block_group *block_group,
>> struct btrfs_trans_handle *btrfs_start_trans_remove_block_group(
>> struct btrfs_fs_info *fs_info,
>> const u64 chunk_offset);
>> +void btrfs_remove_bg_from_sinfo(struct btrfs_block_group *block_group);
>> int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>> struct btrfs_chunk_map *map);
>> void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info);
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index a0542b581f4e..dac22efd2332 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -2922,6 +2922,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>> mutex_init(&fs_info->chunk_mutex);
>> mutex_init(&fs_info->transaction_kthread_mutex);
>> mutex_init(&fs_info->cleaner_mutex);
>> + mutex_init(&fs_info->remap_mutex);
>> mutex_init(&fs_info->ro_block_group_mutex);
>> init_rwsem(&fs_info->commit_root_sem);
>> init_rwsem(&fs_info->cleanup_work_sem);
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index e8f752ef1da9..995784cdca9d 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -40,6 +40,7 @@
>> #include "orphan.h"
>> #include "tree-checker.h"
>> #include "raid-stripe-tree.h"
>> +#include "relocation.h"
>>
>> #undef SCRAMBLE_DELAYED_REFS
>>
>> @@ -2977,6 +2978,8 @@ static int do_free_extent_accounting(struct btrfs_trans_handle *trans,
>> u64 bytenr, struct btrfs_squota_delta *delta)
>> {
>> int ret;
>> + struct btrfs_block_group *bg;
>> + bool bg_is_remapped = false;
>> u64 num_bytes = delta->num_bytes;
>>
>> if (delta->is_data) {
>> @@ -3002,10 +3005,22 @@ static int do_free_extent_accounting(struct btrfs_trans_handle *trans,
>> return ret;
>> }
>>
>> - ret = add_to_free_space_tree(trans, bytenr, num_bytes);
>> - if (ret) {
>> - btrfs_abort_transaction(trans, ret);
>> - return ret;
>> + if (btrfs_fs_incompat(trans->fs_info, REMAP_TREE)) {
>> + bg = btrfs_lookup_block_group(trans->fs_info, bytenr);
>> + bg_is_remapped = bg->flags & BTRFS_BLOCK_GROUP_REMAPPED;
>
> As mentioned in the patch that sets the flag, this feels quite
> susceptible to races. I don't have a particular one in mind, it just
> feels off to set and rely on this flag to decide what work to do without
> some kind of explicit synchronization plan.
>
>> + btrfs_put_block_group(bg);
>> + }
>> +
>> + /*
>> + * If remapped, FST has already been taken care of in
>> + * remove_range_from_remap_tree().
>> + */
>> + if (!bg_is_remapped) {
>> + ret = add_to_free_space_tree(trans, bytenr, num_bytes);
>> + if (ret) {
>> + btrfs_abort_transaction(trans, ret);
>> + return ret;
>> + }
>> }
>>
>> ret = btrfs_update_block_group(trans, bytenr, num_bytes, false);
>> @@ -3387,6 +3402,13 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>> }
>> btrfs_release_path(path);
>>
>> + ret = btrfs_remove_extent_from_remap_tree(trans, path, bytenr,
>> + num_bytes);
>> + if (ret) {
>> + btrfs_abort_transaction(trans, ret);
>> + goto out;
>> + }
>> +
>> ret = do_free_extent_accounting(trans, bytenr, &delta);
>> }
>> btrfs_release_path(path);
>> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
>> index 8ceeb64aceb3..fd7dc61be9c7 100644
>> --- a/fs/btrfs/fs.h
>> +++ b/fs/btrfs/fs.h
>> @@ -551,6 +551,7 @@ struct btrfs_fs_info {
>> struct mutex transaction_kthread_mutex;
>> struct mutex cleaner_mutex;
>> struct mutex chunk_mutex;
>> + struct mutex remap_mutex;
>>
>> /*
>> * This is taken to make sure we don't set block groups ro after the
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index e45f3598ef03..54c3e99c7dab 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -37,6 +37,7 @@
>> #include "super.h"
>> #include "tree-checker.h"
>> #include "raid-stripe-tree.h"
>> +#include "free-space-tree.h"
>>
>> /*
>> * Relocation overview
>> @@ -3905,6 +3906,150 @@ static const char *stage_to_string(enum reloc_stage stage)
>> return "unknown";
>> }
>>
>> +static void adjust_block_group_remap_bytes(struct btrfs_trans_handle *trans,
>> + struct btrfs_block_group *bg,
>> + s64 diff)
>> +{
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + bool bg_already_dirty = true;
>> +
>> + bg->remap_bytes += diff;
>> +
>> + if (bg->used == 0 && bg->remap_bytes == 0)
>> + btrfs_mark_bg_unused(bg);
>> +
>> + spin_lock(&trans->transaction->dirty_bgs_lock);
>> + if (list_empty(&bg->dirty_list)) {
>> + list_add_tail(&bg->dirty_list, &trans->transaction->dirty_bgs);
>> + bg_already_dirty = false;
>> + btrfs_get_block_group(bg);
>> + }
>> + spin_unlock(&trans->transaction->dirty_bgs_lock);
>> +
>> + /* Modified block groups are accounted for in the delayed_refs_rsv. */
>> + if (!bg_already_dirty)
>> + btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
>> +}
>> +
>> +static int remove_chunk_stripes(struct btrfs_trans_handle *trans,
>> + struct btrfs_chunk_map *chunk,
>> + struct btrfs_path *path)
>> +{
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + struct btrfs_key key;
>> + struct extent_buffer *leaf;
>> + struct btrfs_chunk *c;
>> + int ret;
>> +
>> + key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
>> + key.type = BTRFS_CHUNK_ITEM_KEY;
>> + key.offset = chunk->start;
>> +
>> + ret = btrfs_search_slot(trans, fs_info->chunk_root, &key, path,
>> + 0, 1);
>> + if (ret) {
>> + if (ret == 1) {
>> + btrfs_release_path(path);
>> + ret = -ENOENT;
>> + }
>> + return ret;
>> + }
>> +
>> + leaf = path->nodes[0];
>> +
>> + c = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_chunk);
>> + btrfs_set_chunk_num_stripes(leaf, c, 0);
>> +
>> + btrfs_truncate_item(trans, path, offsetof(struct btrfs_chunk, stripe),
>> + 1);
>> +
>> + btrfs_mark_buffer_dirty(trans, leaf);
>> +
>> + btrfs_release_path(path);
>> +
>> + chunk->num_stripes = 0;
>> +
>> + return 0;
>> +}
>> +
>> +static int last_identity_remap_gone(struct btrfs_trans_handle *trans,
>> + struct btrfs_chunk_map *chunk,
>> + struct btrfs_block_group *bg,
>> + struct btrfs_path *path)
>> +{
>> + int ret;
>> +
>> + ret = btrfs_remove_dev_extents(trans, chunk);
>> + if (ret)
>> + return ret;
>> +
>> + mutex_lock(&trans->fs_info->chunk_mutex);
>> +
>> + for (unsigned int i = 0; i < chunk->num_stripes; i++) {
>> + ret = btrfs_update_device(trans, chunk->stripes[i].dev);
>> + if (ret) {
>> + mutex_unlock(&trans->fs_info->chunk_mutex);
>> + return ret;
>> + }
>> + }
>> +
>> + mutex_unlock(&trans->fs_info->chunk_mutex);
>> +
>> + write_lock(&trans->fs_info->mapping_tree_lock);
>> + btrfs_chunk_map_device_clear_bits(chunk, CHUNK_ALLOCATED);
>> + write_unlock(&trans->fs_info->mapping_tree_lock);
>> +
>> + btrfs_remove_bg_from_sinfo(bg);
>> +
>> + ret = remove_chunk_stripes(trans, chunk, path);
>> + if (ret)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +
>> +static int adjust_identity_remap_count(struct btrfs_trans_handle *trans,
>> + struct btrfs_path *path,
>> + struct btrfs_block_group *bg, int delta)
>> +{
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + struct btrfs_chunk_map *chunk;
>> + bool bg_already_dirty = true;
>> + int ret;
>> +
>> + WARN_ON(delta < 0 && -delta > bg->identity_remap_count);
>> +
>> + bg->identity_remap_count += delta;
>> +
>> + spin_lock(&trans->transaction->dirty_bgs_lock);
>> + if (list_empty(&bg->dirty_list)) {
>> + list_add_tail(&bg->dirty_list, &trans->transaction->dirty_bgs);
>> + bg_already_dirty = false;
>> + btrfs_get_block_group(bg);
>> + }
>> + spin_unlock(&trans->transaction->dirty_bgs_lock);
>> +
>> + /* Modified block groups are accounted for in the delayed_refs_rsv. */
>> + if (!bg_already_dirty)
>> + btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
>> +
>> + if (bg->identity_remap_count != 0)
>> + return 0;
>> +
>> + chunk = btrfs_find_chunk_map(fs_info, bg->start, 1);
>> + if (!chunk)
>> + return -ENOENT;
>> +
>> + ret = last_identity_remap_gone(trans, chunk, bg, path);
>> + if (ret)
>> + goto end;
>> +
>> + ret = 0;
>> +end:
>> + btrfs_free_chunk_map(chunk);
>> + return ret;
>> +}
>> +
>> int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
>> u64 *length, bool nolock)
>> {
>> @@ -4521,3 +4666,363 @@ u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info)
>> logical = fs_info->reloc_ctl->block_group->start;
>> return logical;
>> }
>> +
>> +static int remove_range_from_remap_tree(struct btrfs_trans_handle *trans,
>> + struct btrfs_path *path,
>> + struct btrfs_block_group *bg,
>> + u64 bytenr, u64 num_bytes)
>> +{
>> + int ret;
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + struct extent_buffer *leaf = path->nodes[0];
>> + struct btrfs_key key, new_key;
>> + struct btrfs_remap *remap_ptr = NULL, remap;
>> + struct btrfs_block_group *dest_bg = NULL;
>> + u64 end, new_addr = 0, remap_start, remap_length, overlap_length;
>> + bool is_identity_remap;
>> +
>> + end = bytenr + num_bytes;
>> +
>> + btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
>> +
>> + is_identity_remap = key.type == BTRFS_IDENTITY_REMAP_KEY;
>> +
>> + remap_start = key.objectid;
>> + remap_length = key.offset;
>> +
>> + if (!is_identity_remap) {
>> + remap_ptr = btrfs_item_ptr(leaf, path->slots[0],
>> + struct btrfs_remap);
>> + new_addr = btrfs_remap_address(leaf, remap_ptr);
>> +
>> + dest_bg = btrfs_lookup_block_group(fs_info, new_addr);
>> + }
>> +
>> + if (bytenr == remap_start && num_bytes >= remap_length) {
>> + /* Remove entirely. */
>> +
>> + ret = btrfs_del_item(trans, fs_info->remap_root, path);
>> + if (ret)
>> + goto end;
>> +
>> + btrfs_release_path(path);
>> +
>> + overlap_length = remap_length;
>> +
>> + if (!is_identity_remap) {
>> + /* Remove backref. */
>> +
>> + key.objectid = new_addr;
>> + key.type = BTRFS_REMAP_BACKREF_KEY;
>> + key.offset = remap_length;
>> +
>> + ret = btrfs_search_slot(trans, fs_info->remap_root,
>> + &key, path, -1, 1);
>> + if (ret) {
>> + if (ret == 1) {
>> + btrfs_release_path(path);
>> + ret = -ENOENT;
>> + }
>> + goto end;
>> + }
>> +
>> + ret = btrfs_del_item(trans, fs_info->remap_root, path);
>> +
>> + btrfs_release_path(path);
>> +
>> + if (ret)
>> + goto end;
>> +
>> + adjust_block_group_remap_bytes(trans, dest_bg,
>> + -remap_length);
>> + } else {
>> + ret = adjust_identity_remap_count(trans, path, bg, -1);
>> + if (ret)
>> + goto end;
>> + }
>> + } else if (bytenr == remap_start) {
>> + /* Remove beginning. */
>> +
>> + new_key.objectid = end;
>> + new_key.type = key.type;
>> + new_key.offset = remap_length + remap_start - end;
>> +
>> + btrfs_set_item_key_safe(trans, path, &new_key);
>> + btrfs_mark_buffer_dirty(trans, leaf);
>> +
>> + overlap_length = num_bytes;
>> +
>> + if (!is_identity_remap) {
>> + btrfs_set_remap_address(leaf, remap_ptr,
>> + new_addr + end - remap_start);
>> + btrfs_release_path(path);
>> +
>> + /* Adjust backref. */
>> +
>> + key.objectid = new_addr;
>> + key.type = BTRFS_REMAP_BACKREF_KEY;
>> + key.offset = remap_length;
>> +
>> + ret = btrfs_search_slot(trans, fs_info->remap_root,
>> + &key, path, -1, 1);
>> + if (ret) {
>> + if (ret == 1) {
>> + btrfs_release_path(path);
>> + ret = -ENOENT;
>> + }
>> + goto end;
>> + }
>> +
>> + leaf = path->nodes[0];
>> +
>> + new_key.objectid = new_addr + end - remap_start;
>> + new_key.type = BTRFS_REMAP_BACKREF_KEY;
>> + new_key.offset = remap_length + remap_start - end;
>> +
>> + btrfs_set_item_key_safe(trans, path, &new_key);
>> +
>> + remap_ptr = btrfs_item_ptr(leaf, path->slots[0],
>> + struct btrfs_remap);
>> + btrfs_set_remap_address(leaf, remap_ptr, end);
>> +
>> + btrfs_mark_buffer_dirty(trans, path->nodes[0]);
>> +
>> + btrfs_release_path(path);
>> +
>> + adjust_block_group_remap_bytes(trans, dest_bg,
>> + -num_bytes);
>> + }
>> + } else if (bytenr + num_bytes < remap_start + remap_length) {
>> + /* Remove middle. */
>> +
>> + new_key.objectid = remap_start;
>> + new_key.type = key.type;
>> + new_key.offset = bytenr - remap_start;
>> +
>> + btrfs_set_item_key_safe(trans, path, &new_key);
>> + btrfs_mark_buffer_dirty(trans, leaf);
>> +
>> + new_key.objectid = end;
>> + new_key.offset = remap_start + remap_length - end;
>> +
>> + btrfs_release_path(path);
>> +
>> + overlap_length = num_bytes;
>> +
>> + if (!is_identity_remap) {
>> + /* Add second remap entry. */
>> +
>> + ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
>> + path, &new_key,
>> + sizeof(struct btrfs_remap));
>> + if (ret)
>> + goto end;
>> +
>> + btrfs_set_stack_remap_address(&remap,
>> + new_addr + end - remap_start);
>> +
>> + write_extent_buffer(path->nodes[0], &remap,
>> + btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
>> + sizeof(struct btrfs_remap));
>> +
>> + btrfs_release_path(path);
>> +
>> + /* Shorten backref entry. */
>> +
>> + key.objectid = new_addr;
>> + key.type = BTRFS_REMAP_BACKREF_KEY;
>> + key.offset = remap_length;
>> +
>> + ret = btrfs_search_slot(trans, fs_info->remap_root,
>> + &key, path, -1, 1);
>> + if (ret) {
>> + if (ret == 1) {
>> + btrfs_release_path(path);
>> + ret = -ENOENT;
>> + }
>> + goto end;
>> + }
>> +
>> + new_key.objectid = new_addr;
>> + new_key.type = BTRFS_REMAP_BACKREF_KEY;
>> + new_key.offset = bytenr - remap_start;
>> +
>> + btrfs_set_item_key_safe(trans, path, &new_key);
>> + btrfs_mark_buffer_dirty(trans, path->nodes[0]);
>> +
>> + btrfs_release_path(path);
>> +
>> + /* Add second backref entry. */
>> +
>> + new_key.objectid = new_addr + end - remap_start;
>> + new_key.type = BTRFS_REMAP_BACKREF_KEY;
>> + new_key.offset = remap_start + remap_length - end;
>> +
>> + ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
>> + path, &new_key,
>> + sizeof(struct btrfs_remap));
>> + if (ret)
>> + goto end;
>> +
>> + btrfs_set_stack_remap_address(&remap, end);
>> +
>> + write_extent_buffer(path->nodes[0], &remap,
>> + btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
>> + sizeof(struct btrfs_remap));
>> +
>> + btrfs_release_path(path);
>> +
>> + adjust_block_group_remap_bytes(trans, dest_bg,
>> + -num_bytes);
>> + } else {
>> + /* Add second identity remap entry. */
>> +
>> + ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
>> + path, &new_key, 0);
>> + if (ret)
>> + goto end;
>> +
>> + btrfs_release_path(path);
>> +
>> + ret = adjust_identity_remap_count(trans, path, bg, 1);
>> + if (ret)
>> + goto end;
>> + }
>> + } else {
>> + /* Remove end. */
>> +
>> + new_key.objectid = remap_start;
>> + new_key.type = key.type;
>> + new_key.offset = bytenr - remap_start;
>> +
>> + btrfs_set_item_key_safe(trans, path, &new_key);
>> + btrfs_mark_buffer_dirty(trans, leaf);
>> +
>> + btrfs_release_path(path);
>> +
>> + overlap_length = remap_start + remap_length - bytenr;
>> +
>> + if (!is_identity_remap) {
>> + /* Shorten backref entry. */
>> +
>> + key.objectid = new_addr;
>> + key.type = BTRFS_REMAP_BACKREF_KEY;
>> + key.offset = remap_length;
>> +
>> + ret = btrfs_search_slot(trans, fs_info->remap_root,
>> + &key, path, -1, 1);
>> + if (ret) {
>> + if (ret == 1) {
>> + btrfs_release_path(path);
>> + ret = -ENOENT;
>> + }
>> + goto end;
>> + }
>> +
>> + new_key.objectid = new_addr;
>> + new_key.type = BTRFS_REMAP_BACKREF_KEY;
>> + new_key.offset = bytenr - remap_start;
>> +
>> + btrfs_set_item_key_safe(trans, path, &new_key);
>> + btrfs_mark_buffer_dirty(trans, path->nodes[0]);
>> +
>> + btrfs_release_path(path);
>> +
>> + adjust_block_group_remap_bytes(trans, dest_bg,
>> + bytenr - remap_start - remap_length);
>> + }
>> + }
>> +
>> + if (!is_identity_remap) {
>> + ret = add_to_free_space_tree(trans,
>> + bytenr - remap_start + new_addr,
>> + overlap_length);
>> + if (ret)
>> + goto end;
>> + }
>> +
>> + ret = overlap_length;
>> +
>> +end:
>> + if (dest_bg)
>> + btrfs_put_block_group(dest_bg);
>> +
>> + return ret;
>> +}
>> +
>> +int btrfs_remove_extent_from_remap_tree(struct btrfs_trans_handle *trans,
>> + struct btrfs_path *path,
>> + u64 bytenr, u64 num_bytes)
>> +{
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + struct btrfs_key key, found_key;
>> + struct extent_buffer *leaf;
>> + struct btrfs_block_group *bg;
>> + int ret;
>> +
>> + if (!(btrfs_super_incompat_flags(fs_info->super_copy) &
>> + BTRFS_FEATURE_INCOMPAT_REMAP_TREE))
>> + return 0;
>> +
>> + bg = btrfs_lookup_block_group(fs_info, bytenr);
>> + if (!bg)
>> + return 0;
>> +
>> + if (!(bg->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
>> + btrfs_put_block_group(bg);
>> + return 0;
>> + }
>> +
>> + mutex_lock(&fs_info->remap_mutex);
>> +
>> + do {
>> + key.objectid = bytenr;
>> + key.type = (u8)-1;
>> + key.offset = (u64)-1;
>> +
>> + ret = btrfs_search_slot(trans, fs_info->remap_root, &key, path,
>> + -1, 1);
>> + if (ret < 0)
>> + goto end;
>> +
>> + leaf = path->nodes[0];
>> +
>> + if (path->slots[0] == 0) {
>> + ret = -ENOENT;
>> + goto end;
>> + }
>> +
>> + path->slots[0]--;
>> +
>> + btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> +
>> + if (found_key.type != BTRFS_IDENTITY_REMAP_KEY &&
>> + found_key.type != BTRFS_REMAP_KEY) {
>> + ret = -ENOENT;
>> + goto end;
>> + }
>> +
>> + if (bytenr < found_key.objectid ||
>> + bytenr >= found_key.objectid + found_key.offset) {
>> + ret = -ENOENT;
>> + goto end;
>> + }
>> +
>> + ret = remove_range_from_remap_tree(trans, path, bg, bytenr,
>> + num_bytes);
>> + if (ret < 0)
>> + goto end;
>> +
>> + bytenr += ret;
>> + num_bytes -= ret;
>> + } while (num_bytes > 0);
>> +
>> + ret = 0;
>> +
>> +end:
>> + mutex_unlock(&fs_info->remap_mutex);
>> +
>> + btrfs_put_block_group(bg);
>> + btrfs_release_path(path);
>> + return ret;
>> +}
>> diff --git a/fs/btrfs/relocation.h b/fs/btrfs/relocation.h
>> index 8c9dfc55b799..0021f812b12c 100644
>> --- a/fs/btrfs/relocation.h
>> +++ b/fs/btrfs/relocation.h
>> @@ -32,5 +32,8 @@ bool btrfs_should_ignore_reloc_root(const struct btrfs_root *root);
>> u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info);
>> int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
>> u64 *length, bool nolock);
>> +int btrfs_remove_extent_from_remap_tree(struct btrfs_trans_handle *trans,
>> + struct btrfs_path *path,
>> + u64 bytenr, u64 num_bytes);
>>
>> #endif
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 62bd6259ebd3..6c0a67da92f1 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -2931,8 +2931,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>> return ret;
>> }
>>
>> -static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
>> - struct btrfs_device *device)
>> +int btrfs_update_device(struct btrfs_trans_handle *trans,
>> + struct btrfs_device *device)
>> {
>> int ret;
>> struct btrfs_path *path;
>> @@ -3236,25 +3236,13 @@ static int remove_chunk_item(struct btrfs_trans_handle *trans,
>> return btrfs_free_chunk(trans, chunk_offset);
>> }
>>
>> -int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>> +int btrfs_remove_dev_extents(struct btrfs_trans_handle *trans,
>> + struct btrfs_chunk_map *map)
>> {
>> struct btrfs_fs_info *fs_info = trans->fs_info;
>> - struct btrfs_chunk_map *map;
>> + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
>> u64 dev_extent_len = 0;
>> int i, ret = 0;
>> - struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
>> -
>> - map = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
>> - if (IS_ERR(map)) {
>> - /*
>> - * This is a logic error, but we don't want to just rely on the
>> - * user having built with ASSERT enabled, so if ASSERT doesn't
>> - * do anything we still error out.
>> - */
>> - DEBUG_WARN("errr %ld reading chunk map at offset %llu",
>> - PTR_ERR(map), chunk_offset);
>> - return PTR_ERR(map);
>> - }
>>
>> /*
>> * First delete the device extent items from the devices btree.
>> @@ -3275,7 +3263,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>> if (ret) {
>> mutex_unlock(&fs_devices->device_list_mutex);
>> btrfs_abort_transaction(trans, ret);
>> - goto out;
>> + return ret;
>> }
>>
>> if (device->bytes_used > 0) {
>> @@ -3295,6 +3283,30 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>> }
>> mutex_unlock(&fs_devices->device_list_mutex);
>>
>> + return 0;
>> +}
>> +
>> +int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>> +{
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + struct btrfs_chunk_map *map;
>> + int ret;
>> +
>> + map = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
>> + if (IS_ERR(map)) {
>> + /*
>> + * This is a logic error, but we don't want to just rely on the
>> + * user having built with ASSERT enabled, so if ASSERT doesn't
>> + * do anything we still error out.
>> + */
>> + ASSERT(0);
>> + return PTR_ERR(map);
>> + }
>> +
>> + ret = btrfs_remove_dev_extents(trans, map);
>> + if (ret)
>> + goto out;
>> +
>> /*
>> * We acquire fs_info->chunk_mutex for 2 reasons:
>> *
>> @@ -5436,7 +5448,7 @@ static void chunk_map_device_set_bits(struct btrfs_chunk_map *map, unsigned int
>> }
>> }
>>
>> -static void chunk_map_device_clear_bits(struct btrfs_chunk_map *map, unsigned int bits)
>> +void btrfs_chunk_map_device_clear_bits(struct btrfs_chunk_map *map, unsigned int bits)
>> {
>> for (int i = 0; i < map->num_stripes; i++) {
>> struct btrfs_io_stripe *stripe = &map->stripes[i];
>> @@ -5453,7 +5465,7 @@ void btrfs_remove_chunk_map(struct btrfs_fs_info *fs_info, struct btrfs_chunk_ma
>> write_lock(&fs_info->mapping_tree_lock);
>> rb_erase_cached(&map->rb_node, &fs_info->mapping_tree);
>> RB_CLEAR_NODE(&map->rb_node);
>> - chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
>> + btrfs_chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
>> write_unlock(&fs_info->mapping_tree_lock);
>>
>> /* Once for the tree reference. */
>> @@ -5489,7 +5501,7 @@ int btrfs_add_chunk_map(struct btrfs_fs_info *fs_info, struct btrfs_chunk_map *m
>> return -EEXIST;
>> }
>> chunk_map_device_set_bits(map, CHUNK_ALLOCATED);
>> - chunk_map_device_clear_bits(map, CHUNK_TRIMMED);
>> + btrfs_chunk_map_device_clear_bits(map, CHUNK_TRIMMED);
>> write_unlock(&fs_info->mapping_tree_lock);
>>
>> return 0;
>> @@ -5854,7 +5866,7 @@ void btrfs_mapping_tree_free(struct btrfs_fs_info *fs_info)
>> map = rb_entry(node, struct btrfs_chunk_map, rb_node);
>> rb_erase_cached(&map->rb_node, &fs_info->mapping_tree);
>> RB_CLEAR_NODE(&map->rb_node);
>> - chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
>> + btrfs_chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
>> /* Once for the tree ref. */
>> btrfs_free_chunk_map(map);
>> cond_resched_rwlock_write(&fs_info->mapping_tree_lock);
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 9fb8fe4312a5..0a73ea2a2a6a 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -779,6 +779,8 @@ u64 btrfs_calc_stripe_length(const struct btrfs_chunk_map *map);
>> int btrfs_nr_parity_stripes(u64 type);
>> int btrfs_chunk_alloc_add_chunk_item(struct btrfs_trans_handle *trans,
>> struct btrfs_block_group *bg);
>> +int btrfs_remove_dev_extents(struct btrfs_trans_handle *trans,
>> + struct btrfs_chunk_map *map);
>> int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset);
>>
>> #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
>> @@ -876,6 +878,10 @@ bool btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical);
>>
>> bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr);
>> const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb);
>> +int btrfs_update_device(struct btrfs_trans_handle *trans,
>> + struct btrfs_device *device);
>> +void btrfs_chunk_map_device_clear_bits(struct btrfs_chunk_map *map,
>> + unsigned int bits);
>>
>> #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
>> struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_info,
>> --
>> 2.49.0
>>
>
Thread overview: 39+ messages
2025-06-05 16:23 [PATCH 00/12] btrfs: remap tree Mark Harmstone
2025-06-05 16:23 ` [PATCH 01/12] btrfs: add definitions and constants for remap-tree Mark Harmstone
2025-06-13 21:02 ` Boris Burkov
2025-06-05 16:23 ` [PATCH 02/12] btrfs: add REMAP chunk type Mark Harmstone
2025-06-13 21:22 ` Boris Burkov
2025-06-05 16:23 ` [PATCH 03/12] btrfs: allow remapped chunks to have zero stripes Mark Harmstone
2025-06-13 21:41 ` Boris Burkov
2025-08-08 14:12 ` Mark Harmstone
2025-06-05 16:23 ` [PATCH 04/12] btrfs: remove remapped block groups from the free-space tree Mark Harmstone
2025-06-06 6:41 ` kernel test robot
2025-06-13 22:00 ` Boris Burkov
2025-08-12 14:50 ` Mark Harmstone
2025-06-05 16:23 ` [PATCH 05/12] btrfs: don't add metadata items for the remap tree to the extent tree Mark Harmstone
2025-06-13 22:39 ` Boris Burkov
2025-06-05 16:23 ` [PATCH 06/12] btrfs: add extended version of struct block_group_item Mark Harmstone
2025-06-05 16:23 ` [PATCH 07/12] btrfs: allow mounting filesystems with remap-tree incompat flag Mark Harmstone
2025-06-05 16:23 ` [PATCH 08/12] btrfs: redirect I/O for remapped block groups Mark Harmstone
2025-06-05 16:23 ` [PATCH 09/12] btrfs: handle deletions from remapped block group Mark Harmstone
2025-06-13 23:42 ` Boris Burkov
2025-08-11 16:48 ` Mark Harmstone [this message]
2025-08-11 16:59 ` Mark Harmstone
2025-06-05 16:23 ` [PATCH 10/12] btrfs: handle setting up relocation of block group with remap-tree Mark Harmstone
2025-06-13 23:25 ` Boris Burkov
2025-08-12 11:20 ` Mark Harmstone
2025-06-05 16:23 ` [PATCH 11/12] btrfs: move existing remaps before relocating block group Mark Harmstone
2025-06-06 11:20 ` kernel test robot
2025-06-05 16:23 ` [PATCH 12/12] btrfs: replace identity maps with actual remaps when doing relocations Mark Harmstone
2025-06-05 16:43 ` [PATCH 00/12] btrfs: remap tree Jonah Sabean
2025-06-06 13:35 ` Mark Harmstone
2025-06-09 16:05 ` Anand Jain
2025-06-09 18:51 ` David Sterba
2025-06-10 9:19 ` Mark Harmstone
2025-06-10 14:31 ` Mark Harmstone
2025-06-10 23:56 ` Qu Wenruo
2025-06-11 8:06 ` Mark Harmstone
2025-06-11 15:28 ` Mark Harmstone
2025-06-14 0:04 ` Boris Burkov
2025-06-26 22:10 ` Mark Harmstone
2025-06-27 5:59 ` Neal Gompa