Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Boris Burkov <boris@bur.io>
To: Mark Harmstone <mark@harmstone.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 10/16] btrfs: handle deletions from remapped block group
Date: Fri, 15 Aug 2025 17:28:22 -0700	[thread overview]
Message-ID: <20250816002822.GE3042054@zen.localdomain> (raw)
In-Reply-To: <20250813143509.31073-11-mark@harmstone.com>

On Wed, Aug 13, 2025 at 03:34:52PM +0100, Mark Harmstone wrote:
> Handle the case where we free an extent from a block group that has the
> REMAPPED flag set. Because the remap tree is orthogonal to the extent
> tree, for data this may be within any number of identity remaps or
> actual remaps. If we're freeing a metadata node, this will be wholly
> inside one or the other.
> 
> btrfs_remove_extent_from_remap_tree() searches the remap tree for the
> remaps that cover the range in question, then calls
> remove_range_from_remap_tree() for each one, to punch a hole in the
> remap and adjust the free-space tree.
> 
> For an identity remap, remove_range_from_remap_tree() will adjust the
> block group's `identity_remap_count` if this changes. If it reaches
> zero we call last_identity_remap_gone(), which removes the chunk's
> stripes and device extents - it is now fully remapped.
> 
> The changes which involve the block group's ro flag are because the
> REMAPPED flag itself prevents a block group from having any new
> allocations within it, and so we don't need to account for this
> separately.
> 
> Signed-off-by: Mark Harmstone <mark@harmstone.com>
> ---
>  fs/btrfs/block-group.c |  82 ++++---
>  fs/btrfs/block-group.h |   1 +
>  fs/btrfs/disk-io.c     |   1 +
>  fs/btrfs/extent-tree.c |  28 ++-
>  fs/btrfs/fs.h          |   1 +
>  fs/btrfs/relocation.c  | 510 +++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/relocation.h  |   3 +
>  fs/btrfs/volumes.c     |  56 +++--
>  fs/btrfs/volumes.h     |   6 +
>  9 files changed, 630 insertions(+), 58 deletions(-)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 8c28f829547e..7a0524138235 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1068,6 +1068,32 @@ static int remove_block_group_item(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +void btrfs_remove_bg_from_sinfo(struct btrfs_block_group *block_group)
> +{
> +	int factor = btrfs_bg_type_to_factor(block_group->flags);
> +
> +	spin_lock(&block_group->space_info->lock);
> +
> +	if (btrfs_test_opt(block_group->fs_info, ENOSPC_DEBUG)) {
> +		WARN_ON(block_group->space_info->total_bytes
> +			< block_group->length);
> +		WARN_ON(block_group->space_info->bytes_readonly
> +			< block_group->length - block_group->zone_unusable);
> +		WARN_ON(block_group->space_info->bytes_zone_unusable
> +			< block_group->zone_unusable);
> +		WARN_ON(block_group->space_info->disk_total
> +			< block_group->length * factor);
> +	}
> +	block_group->space_info->total_bytes -= block_group->length;
> +	block_group->space_info->bytes_readonly -=
> +		(block_group->length - block_group->zone_unusable);
> +	btrfs_space_info_update_bytes_zone_unusable(block_group->space_info,
> +						    -block_group->zone_unusable);
> +	block_group->space_info->disk_total -= block_group->length * factor;
> +
> +	spin_unlock(&block_group->space_info->lock);
> +}
> +
>  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>  			     struct btrfs_chunk_map *map)
>  {
> @@ -1079,7 +1105,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>  	struct kobject *kobj = NULL;
>  	int ret;
>  	int index;
> -	int factor;
>  	struct btrfs_caching_control *caching_ctl = NULL;
>  	bool remove_map;
>  	bool remove_rsv = false;
> @@ -1088,7 +1113,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>  	if (!block_group)
>  		return -ENOENT;
>  
> -	BUG_ON(!block_group->ro);
> +	BUG_ON(!block_group->ro && !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED));
>  
>  	trace_btrfs_remove_block_group(block_group);
>  	/*
> @@ -1100,7 +1125,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>  				  block_group->length);
>  
>  	index = btrfs_bg_flags_to_raid_index(block_group->flags);
> -	factor = btrfs_bg_type_to_factor(block_group->flags);
>  
>  	/* make sure this block group isn't part of an allocation cluster */
>  	cluster = &fs_info->data_alloc_cluster;
> @@ -1224,26 +1248,11 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>  
>  	spin_lock(&block_group->space_info->lock);
>  	list_del_init(&block_group->ro_list);
> -
> -	if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
> -		WARN_ON(block_group->space_info->total_bytes
> -			< block_group->length);
> -		WARN_ON(block_group->space_info->bytes_readonly
> -			< block_group->length - block_group->zone_unusable);
> -		WARN_ON(block_group->space_info->bytes_zone_unusable
> -			< block_group->zone_unusable);
> -		WARN_ON(block_group->space_info->disk_total
> -			< block_group->length * factor);
> -	}
> -	block_group->space_info->total_bytes -= block_group->length;
> -	block_group->space_info->bytes_readonly -=
> -		(block_group->length - block_group->zone_unusable);
> -	btrfs_space_info_update_bytes_zone_unusable(block_group->space_info,
> -						    -block_group->zone_unusable);
> -	block_group->space_info->disk_total -= block_group->length * factor;
> -
>  	spin_unlock(&block_group->space_info->lock);
>  
> +	if (!(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED))
> +		btrfs_remove_bg_from_sinfo(block_group);
> +
>  	/*
>  	 * Remove the free space for the block group from the free space tree
>  	 * and the block group's item from the extent tree before marking the
> @@ -1539,6 +1548,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>  	while (!list_empty(&fs_info->unused_bgs)) {
>  		u64 used;
>  		int trimming;
> +		bool made_ro = false;
>  
>  		block_group = list_first_entry(&fs_info->unused_bgs,
>  					       struct btrfs_block_group,
> @@ -1575,7 +1585,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>  
>  		spin_lock(&space_info->lock);
>  		spin_lock(&block_group->lock);
> -		if (btrfs_is_block_group_used(block_group) || block_group->ro ||
> +		if (btrfs_is_block_group_used(block_group) ||
> +		    (block_group->ro && !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) ||
>  		    list_is_singular(&block_group->list)) {
>  			/*
>  			 * We want to bail if we made new allocations or have
> @@ -1617,9 +1628,10 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>  		 * needing to allocate extents from the block group.
>  		 */
>  		used = btrfs_space_info_used(space_info, true);
> -		if ((space_info->total_bytes - block_group->length < used &&
> +		if (((space_info->total_bytes - block_group->length < used &&
>  		     block_group->zone_unusable < block_group->length) ||
> -		    has_unwritten_metadata(block_group)) {
> +		    has_unwritten_metadata(block_group)) &&
> +			!(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
>  			spin_unlock(&block_group->lock);
>  
>  			/*
> @@ -1638,8 +1650,14 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>  		spin_unlock(&block_group->lock);
>  		spin_unlock(&space_info->lock);
>  
> -		/* We don't want to force the issue, only flip if it's ok. */
> -		ret = inc_block_group_ro(block_group, 0);
> +		if (!(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
> +			/* We don't want to force the issue, only flip if it's ok. */
> +			ret = inc_block_group_ro(block_group, 0);
> +			made_ro = true;
> +		} else {
> +			ret = 0;
> +		}
> +
>  		up_write(&space_info->groups_sem);
>  		if (ret < 0) {
>  			ret = 0;
> @@ -1648,7 +1666,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>  
>  		ret = btrfs_zone_finish(block_group);
>  		if (ret < 0) {
> -			btrfs_dec_block_group_ro(block_group);
> +			if (made_ro)
> +				btrfs_dec_block_group_ro(block_group);
>  			if (ret == -EAGAIN) {
>  				btrfs_link_bg_list(block_group, &retry_list);
>  				ret = 0;
> @@ -1663,7 +1682,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>  		trans = btrfs_start_trans_remove_block_group(fs_info,
>  						     block_group->start);
>  		if (IS_ERR(trans)) {
> -			btrfs_dec_block_group_ro(block_group);
> +			if (made_ro)
> +				btrfs_dec_block_group_ro(block_group);
>  			ret = PTR_ERR(trans);
>  			goto next;
>  		}
> @@ -1673,7 +1693,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>  		 * just delete them, we don't care about them anymore.
>  		 */
>  		if (!clean_pinned_extents(trans, block_group)) {
> -			btrfs_dec_block_group_ro(block_group);
> +			if (made_ro)
> +				btrfs_dec_block_group_ro(block_group);
>  			goto end_trans;
>  		}
>  
> @@ -1687,7 +1708,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>  		spin_lock(&fs_info->discard_ctl.lock);
>  		if (!list_empty(&block_group->discard_list)) {
>  			spin_unlock(&fs_info->discard_ctl.lock);
> -			btrfs_dec_block_group_ro(block_group);
> +			if (made_ro)
> +				btrfs_dec_block_group_ro(block_group);
>  			btrfs_discard_queue_work(&fs_info->discard_ctl,
>  						 block_group);
>  			goto end_trans;
> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> index ecc89701b2ea..0433b0127ed8 100644
> --- a/fs/btrfs/block-group.h
> +++ b/fs/btrfs/block-group.h
> @@ -336,6 +336,7 @@ int btrfs_add_new_free_space(struct btrfs_block_group *block_group,
>  struct btrfs_trans_handle *btrfs_start_trans_remove_block_group(
>  				struct btrfs_fs_info *fs_info,
>  				const u64 chunk_offset);
> +void btrfs_remove_bg_from_sinfo(struct btrfs_block_group *block_group);
>  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>  			     struct btrfs_chunk_map *map);
>  void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info);
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 563aea5e3b1b..d92d08316322 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2907,6 +2907,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>  	mutex_init(&fs_info->chunk_mutex);
>  	mutex_init(&fs_info->transaction_kthread_mutex);
>  	mutex_init(&fs_info->cleaner_mutex);
> +	mutex_init(&fs_info->remap_mutex);
>  	mutex_init(&fs_info->ro_block_group_mutex);
>  	init_rwsem(&fs_info->commit_root_sem);
>  	init_rwsem(&fs_info->cleanup_work_sem);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index c1b96c728fe6..ca3f6d6bb5ba 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -40,6 +40,7 @@
>  #include "orphan.h"
>  #include "tree-checker.h"
>  #include "raid-stripe-tree.h"
> +#include "relocation.h"
>  
>  #undef SCRAMBLE_DELAYED_REFS
>  
> @@ -2999,7 +3000,8 @@ u64 btrfs_get_extent_owner_root(struct btrfs_fs_info *fs_info,
>  }
>  
>  static int do_free_extent_accounting(struct btrfs_trans_handle *trans,
> -				     u64 bytenr, struct btrfs_squota_delta *delta)
> +				     u64 bytenr, struct btrfs_squota_delta *delta,
> +				     bool remapped)
>  {
>  	int ret;
>  	u64 num_bytes = delta->num_bytes;
> @@ -3027,10 +3029,16 @@ static int do_free_extent_accounting(struct btrfs_trans_handle *trans,
>  		return ret;
>  	}
>  
> -	ret = btrfs_add_to_free_space_tree(trans, bytenr, num_bytes);
> -	if (ret) {
> -		btrfs_abort_transaction(trans, ret);
> -		return ret;
> +	/*
> +	 * If remapped, FST has already been taken care of in
> +	 * remove_range_from_remap_tree().
> +	 */

Why not do btrfs_remove_extent_from_remap_tree() here in
do_free_extent_accounting() rather than the caller?

> +	if (!remapped) {
> +		ret = btrfs_add_to_free_space_tree(trans, bytenr, num_bytes);
> +		if (ret) {
> +			btrfs_abort_transaction(trans, ret);
> +			return ret;
> +		}
>  	}
>  
>  	ret = btrfs_update_block_group(trans, bytenr, num_bytes, false);

So in the normal case, this will trigger updating the block group bytes
counters, and when they hit 0 put the block group on the unused list,
which queues it for deletion and ultimately removing from space info
etc.

I fail to see what is special about remapped block groups from that
perspective. I would strongly prefer to see you integrate with
btrsf_delete_unused_bgs() rather than special case skipping it there and
copying parts of the logic elsewhere.

Unless there is some very good reason for the special treatment that I
am not seeing.

> @@ -3396,7 +3404,15 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>  		}
>  		btrfs_release_path(path);
>  
> -		ret = do_free_extent_accounting(trans, bytenr, &delta);
> +		/* returns 1 on success and 0 on no-op */
> +		ret = btrfs_remove_extent_from_remap_tree(trans, path, bytenr,
> +							  num_bytes);
> +		if (ret < 0) {
> +			btrfs_abort_transaction(trans, ret);
> +			goto out;
> +		}
> +
> +		ret = do_free_extent_accounting(trans, bytenr, &delta, ret);
>  	}
>  	btrfs_release_path(path);
>  
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index 6ea96e76655e..dbb7de95241b 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -547,6 +547,7 @@ struct btrfs_fs_info {
>  	struct mutex transaction_kthread_mutex;
>  	struct mutex cleaner_mutex;
>  	struct mutex chunk_mutex;
> +	struct mutex remap_mutex;
>  
>  	/*
>  	 * This is taken to make sure we don't set block groups ro after the
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index e1f1da9336e7..03a1246af678 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -37,6 +37,7 @@
>  #include "super.h"
>  #include "tree-checker.h"
>  #include "raid-stripe-tree.h"
> +#include "free-space-tree.h"
>  
>  /*
>   * Relocation overview
> @@ -3884,6 +3885,148 @@ static const char *stage_to_string(enum reloc_stage stage)
>  	return "unknown";
>  }
>  
> +static void adjust_block_group_remap_bytes(struct btrfs_trans_handle *trans,
> +					   struct btrfs_block_group *bg,
> +					   s64 diff)
> +{
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	bool bg_already_dirty = true;
> +
> +	bg->remap_bytes += diff;
> +
> +	if (bg->used == 0 && bg->remap_bytes == 0)
> +		btrfs_mark_bg_unused(bg);
> +
> +	spin_lock(&trans->transaction->dirty_bgs_lock);
> +	if (list_empty(&bg->dirty_list)) {
> +		list_add_tail(&bg->dirty_list, &trans->transaction->dirty_bgs);
> +		bg_already_dirty = false;
> +		btrfs_get_block_group(bg);
> +	}
> +	spin_unlock(&trans->transaction->dirty_bgs_lock);
> +
> +	/* Modified block groups are accounted for in the delayed_refs_rsv. */
> +	if (!bg_already_dirty)
> +		btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
> +}
> +
> +static int remove_chunk_stripes(struct btrfs_trans_handle *trans,
> +				struct btrfs_chunk_map *chunk,
> +				struct btrfs_path *path)
> +{
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	struct btrfs_key key;
> +	struct extent_buffer *leaf;
> +	struct btrfs_chunk *c;
> +	int ret;
> +
> +	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
> +	key.type = BTRFS_CHUNK_ITEM_KEY;
> +	key.offset = chunk->start;
> +
> +	ret = btrfs_search_slot(trans, fs_info->chunk_root, &key, path,
> +				0, 1);
> +	if (ret) {
> +		if (ret == 1) {
> +			btrfs_release_path(path);
> +			ret = -ENOENT;
> +		}
> +		return ret;
> +	}
> +
> +	leaf = path->nodes[0];
> +
> +	c = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_chunk);
> +	btrfs_set_chunk_num_stripes(leaf, c, 0);
> +
> +	btrfs_truncate_item(trans, path, offsetof(struct btrfs_chunk, stripe),
> +			    1);
> +
> +	btrfs_mark_buffer_dirty(trans, leaf);
> +
> +	btrfs_release_path(path);
> +
> +	return 0;
> +}
> +

Same question as elsewhere, but placed here for clarity:
Why can't this be queued for normal unused bgs deletion, rather than
having a special remap bg deletion function?

> +static int last_identity_remap_gone(struct btrfs_trans_handle *trans,
> +				    struct btrfs_chunk_map *chunk,
> +				    struct btrfs_block_group *bg,
> +				    struct btrfs_path *path)
> +{
> +	int ret;
> +
> +	ret = btrfs_remove_dev_extents(trans, chunk);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&trans->fs_info->chunk_mutex);
> +
> +	for (unsigned int i = 0; i < chunk->num_stripes; i++) {
> +		ret = btrfs_update_device(trans, chunk->stripes[i].dev);
> +		if (ret) {
> +			mutex_unlock(&trans->fs_info->chunk_mutex);
> +			return ret;
> +		}
> +	}
> +
> +	mutex_unlock(&trans->fs_info->chunk_mutex);
> +
> +	write_lock(&trans->fs_info->mapping_tree_lock);
> +	btrfs_chunk_map_device_clear_bits(chunk, CHUNK_ALLOCATED);
> +	write_unlock(&trans->fs_info->mapping_tree_lock);
> +
> +	btrfs_remove_bg_from_sinfo(bg);
> +
> +	ret = remove_chunk_stripes(trans, chunk, path);
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> +
> +static int adjust_identity_remap_count(struct btrfs_trans_handle *trans,
> +				       struct btrfs_path *path,
> +				       struct btrfs_block_group *bg, int delta)
> +{
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	struct btrfs_chunk_map *chunk;
> +	bool bg_already_dirty = true;
> +	int ret;
> +
> +	WARN_ON(delta < 0 && -delta > bg->identity_remap_count);
> +
> +	bg->identity_remap_count += delta;
> +
> +	spin_lock(&trans->transaction->dirty_bgs_lock);
> +	if (list_empty(&bg->dirty_list)) {
> +		list_add_tail(&bg->dirty_list, &trans->transaction->dirty_bgs);
> +		bg_already_dirty = false;
> +		btrfs_get_block_group(bg);
> +	}
> +	spin_unlock(&trans->transaction->dirty_bgs_lock);
> +
> +	/* Modified block groups are accounted for in the delayed_refs_rsv. */
> +	if (!bg_already_dirty)
> +		btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
> +
> +	if (bg->identity_remap_count != 0)
> +		return 0;
> +
> +	chunk = btrfs_find_chunk_map(fs_info, bg->start, 1);
> +	if (!chunk)
> +		return -ENOENT;
> +
> +	ret = last_identity_remap_gone(trans, chunk, bg, path);
> +	if (ret)
> +		goto end;
> +
> +	ret = 0;
> +end:
> +	btrfs_free_chunk_map(chunk);
> +	return ret;
> +}
> +
>  int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
>  			  u64 *length, bool nolock)
>  {
> @@ -4504,3 +4647,370 @@ u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info)
>  		logical = fs_info->reloc_ctl->block_group->start;
>  	return logical;
>  }
> +
> +static int remove_range_from_remap_tree(struct btrfs_trans_handle *trans,
> +					struct btrfs_path *path,
> +					struct btrfs_block_group *bg,
> +					u64 bytenr, u64 num_bytes)
> +{
> +	int ret;
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	struct extent_buffer *leaf = path->nodes[0];
> +	struct btrfs_key key, new_key;
> +	struct btrfs_remap *remap_ptr = NULL, remap;
> +	struct btrfs_block_group *dest_bg = NULL;
> +	u64 end, new_addr = 0, remap_start, remap_length, overlap_length;
> +	bool is_identity_remap;
> +
> +	end = bytenr + num_bytes;
> +
> +	btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
> +
> +	is_identity_remap = key.type == BTRFS_IDENTITY_REMAP_KEY;
> +
> +	remap_start = key.objectid;
> +	remap_length = key.offset;
> +
> +	if (!is_identity_remap) {
> +		remap_ptr = btrfs_item_ptr(leaf, path->slots[0],
> +					   struct btrfs_remap);
> +		new_addr = btrfs_remap_address(leaf, remap_ptr);
> +
> +		dest_bg = btrfs_lookup_block_group(fs_info, new_addr);
> +	}
> +
> +	if (bytenr == remap_start && num_bytes >= remap_length) {
> +		/* Remove entirely. */
> +
> +		ret = btrfs_del_item(trans, fs_info->remap_root, path);
> +		if (ret)
> +			goto end;
> +
> +		btrfs_release_path(path);
> +
> +		overlap_length = remap_length;
> +
> +		if (!is_identity_remap) {
> +			/* Remove backref. */
> +
> +			key.objectid = new_addr;
> +			key.type = BTRFS_REMAP_BACKREF_KEY;
> +			key.offset = remap_length;
> +
> +			ret = btrfs_search_slot(trans, fs_info->remap_root,
> +						&key, path, -1, 1);
> +			if (ret) {
> +				if (ret == 1) {
> +					btrfs_release_path(path);
> +					ret = -ENOENT;
> +				}
> +				goto end;
> +			}
> +
> +			ret = btrfs_del_item(trans, fs_info->remap_root, path);
> +
> +			btrfs_release_path(path);
> +
> +			if (ret)
> +				goto end;
> +
> +			adjust_block_group_remap_bytes(trans, dest_bg,
> +						       -remap_length);
> +		} else {
> +			ret = adjust_identity_remap_count(trans, path, bg, -1);
> +			if (ret)
> +				goto end;
> +		}
> +	} else if (bytenr == remap_start) {
> +		/* Remove beginning. */
> +
> +		new_key.objectid = end;
> +		new_key.type = key.type;
> +		new_key.offset = remap_length + remap_start - end;
> +
> +		btrfs_set_item_key_safe(trans, path, &new_key);
> +		btrfs_mark_buffer_dirty(trans, leaf);
> +
> +		overlap_length = num_bytes;
> +
> +		if (!is_identity_remap) {
> +			btrfs_set_remap_address(leaf, remap_ptr,
> +						new_addr + end - remap_start);
> +			btrfs_release_path(path);
> +
> +			/* Adjust backref. */
> +
> +			key.objectid = new_addr;
> +			key.type = BTRFS_REMAP_BACKREF_KEY;
> +			key.offset = remap_length;
> +
> +			ret = btrfs_search_slot(trans, fs_info->remap_root,
> +						&key, path, -1, 1);
> +			if (ret) {
> +				if (ret == 1) {
> +					btrfs_release_path(path);
> +					ret = -ENOENT;
> +				}
> +				goto end;
> +			}
> +
> +			leaf = path->nodes[0];
> +
> +			new_key.objectid = new_addr + end - remap_start;
> +			new_key.type = BTRFS_REMAP_BACKREF_KEY;
> +			new_key.offset = remap_length + remap_start - end;
> +
> +			btrfs_set_item_key_safe(trans, path, &new_key);
> +
> +			remap_ptr = btrfs_item_ptr(leaf, path->slots[0],
> +						   struct btrfs_remap);
> +			btrfs_set_remap_address(leaf, remap_ptr, end);
> +
> +			btrfs_mark_buffer_dirty(trans, path->nodes[0]);
> +
> +			btrfs_release_path(path);
> +
> +			adjust_block_group_remap_bytes(trans, dest_bg,
> +						       -num_bytes);
> +		}
> +	} else if (bytenr + num_bytes < remap_start + remap_length) {
> +		/* Remove middle. */
> +
> +		new_key.objectid = remap_start;
> +		new_key.type = key.type;
> +		new_key.offset = bytenr - remap_start;
> +
> +		btrfs_set_item_key_safe(trans, path, &new_key);
> +		btrfs_mark_buffer_dirty(trans, leaf);
> +
> +		new_key.objectid = end;
> +		new_key.offset = remap_start + remap_length - end;
> +
> +		btrfs_release_path(path);
> +
> +		overlap_length = num_bytes;
> +
> +		if (!is_identity_remap) {
> +			/* Add second remap entry. */
> +
> +			ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
> +						path, &new_key,
> +						sizeof(struct btrfs_remap));
> +			if (ret)
> +				goto end;
> +
> +			btrfs_set_stack_remap_address(&remap,
> +						new_addr + end - remap_start);
> +
> +			write_extent_buffer(path->nodes[0], &remap,
> +				btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
> +				sizeof(struct btrfs_remap));
> +
> +			btrfs_release_path(path);
> +
> +			/* Shorten backref entry. */
> +
> +			key.objectid = new_addr;
> +			key.type = BTRFS_REMAP_BACKREF_KEY;
> +			key.offset = remap_length;
> +
> +			ret = btrfs_search_slot(trans, fs_info->remap_root,
> +						&key, path, -1, 1);
> +			if (ret) {
> +				if (ret == 1) {
> +					btrfs_release_path(path);
> +					ret = -ENOENT;
> +				}
> +				goto end;
> +			}
> +
> +			new_key.objectid = new_addr;
> +			new_key.type = BTRFS_REMAP_BACKREF_KEY;
> +			new_key.offset = bytenr - remap_start;
> +
> +			btrfs_set_item_key_safe(trans, path, &new_key);
> +			btrfs_mark_buffer_dirty(trans, path->nodes[0]);
> +
> +			btrfs_release_path(path);
> +
> +			/* Add second backref entry. */
> +
> +			new_key.objectid = new_addr + end - remap_start;
> +			new_key.type = BTRFS_REMAP_BACKREF_KEY;
> +			new_key.offset = remap_start + remap_length - end;
> +
> +			ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
> +						path, &new_key,
> +						sizeof(struct btrfs_remap));
> +			if (ret)
> +				goto end;
> +
> +			btrfs_set_stack_remap_address(&remap, end);
> +
> +			write_extent_buffer(path->nodes[0], &remap,
> +				btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
> +				sizeof(struct btrfs_remap));
> +
> +			btrfs_release_path(path);
> +
> +			adjust_block_group_remap_bytes(trans, dest_bg,
> +						       -num_bytes);
> +		} else {
> +			/* Add second identity remap entry. */
> +
> +			ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
> +						      path, &new_key, 0);
> +			if (ret)
> +				goto end;
> +
> +			btrfs_release_path(path);
> +
> +			ret = adjust_identity_remap_count(trans, path, bg, 1);
> +			if (ret)
> +				goto end;
> +		}
> +	} else {
> +		/* Remove end. */
> +
> +		new_key.objectid = remap_start;
> +		new_key.type = key.type;
> +		new_key.offset = bytenr - remap_start;
> +
> +		btrfs_set_item_key_safe(trans, path, &new_key);
> +		btrfs_mark_buffer_dirty(trans, leaf);
> +
> +		btrfs_release_path(path);
> +
> +		overlap_length = remap_start + remap_length - bytenr;
> +
> +		if (!is_identity_remap) {
> +			/* Shorten backref entry. */
> +
> +			key.objectid = new_addr;
> +			key.type = BTRFS_REMAP_BACKREF_KEY;
> +			key.offset = remap_length;
> +
> +			ret = btrfs_search_slot(trans, fs_info->remap_root,
> +						&key, path, -1, 1);
> +			if (ret) {
> +				if (ret == 1) {
> +					btrfs_release_path(path);
> +					ret = -ENOENT;
> +				}
> +				goto end;
> +			}
> +
> +			new_key.objectid = new_addr;
> +			new_key.type = BTRFS_REMAP_BACKREF_KEY;
> +			new_key.offset = bytenr - remap_start;
> +
> +			btrfs_set_item_key_safe(trans, path, &new_key);
> +			btrfs_mark_buffer_dirty(trans, path->nodes[0]);
> +
> +			btrfs_release_path(path);
> +
> +			adjust_block_group_remap_bytes(trans, dest_bg,
> +					bytenr - remap_start - remap_length);
> +		}
> +	}
> +
> +	if (!is_identity_remap) {
> +		ret = btrfs_add_to_free_space_tree(trans,
> +					     bytenr - remap_start + new_addr,
> +					     overlap_length);
> +		if (ret)
> +			goto end;
> +	}
> +
> +	ret = overlap_length;
> +
> +end:
> +	if (dest_bg)
> +		btrfs_put_block_group(dest_bg);
> +
> +	return ret;
> +}
> +
> +/*
> + * Returns 1 if remove_range_from_remap_tree() has been called successfully,
> + * 0 if block group wasn't remapped, and a negative number on error.
> + */
> +int btrfs_remove_extent_from_remap_tree(struct btrfs_trans_handle *trans,
> +					struct btrfs_path *path,
> +					u64 bytenr, u64 num_bytes)
> +{
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	struct btrfs_key key, found_key;
> +	struct extent_buffer *leaf;
> +	struct btrfs_block_group *bg;
> +	int ret, length;
> +
> +	if (!(btrfs_super_incompat_flags(fs_info->super_copy) &
> +	      BTRFS_FEATURE_INCOMPAT_REMAP_TREE))
> +		return 0;
> +
> +	bg = btrfs_lookup_block_group(fs_info, bytenr);
> +	if (!bg)
> +		return 0;
> +
> +	mutex_lock(&fs_info->remap_mutex);
> +
> +	if (!(bg->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
> +		mutex_unlock(&fs_info->remap_mutex);
> +		btrfs_put_block_group(bg);
> +		return 0;
> +	}
> +
> +	do {
> +		key.objectid = bytenr;
> +		key.type = (u8)-1;
> +		key.offset = (u64)-1;
> +
> +		ret = btrfs_search_slot(trans, fs_info->remap_root, &key, path,
> +					-1, 1);
> +		if (ret < 0)
> +			goto end;
> +
> +		leaf = path->nodes[0];
> +
> +		if (path->slots[0] == 0) {
> +			ret = -ENOENT;
> +			goto end;
> +		}
> +
> +		path->slots[0]--;
> +
> +		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> +
> +		if (found_key.type != BTRFS_IDENTITY_REMAP_KEY &&
> +		    found_key.type != BTRFS_REMAP_KEY) {
> +			ret = -ENOENT;
> +			goto end;
> +		}
> +
> +		if (bytenr < found_key.objectid ||
> +		    bytenr >= found_key.objectid + found_key.offset) {
> +			ret = -ENOENT;
> +			goto end;
> +		}
> +
> +		length = remove_range_from_remap_tree(trans, path, bg, bytenr,
> +						      num_bytes);
> +		if (length < 0) {
> +			ret = length;
> +			goto end;
> +		}
> +
> +		bytenr += length;
> +		num_bytes -= length;
> +	} while (num_bytes > 0);
> +
> +	ret = 1;
> +
> +end:
> +	mutex_unlock(&fs_info->remap_mutex);
> +
> +	btrfs_put_block_group(bg);
> +	btrfs_release_path(path);
> +	return ret;
> +}
> diff --git a/fs/btrfs/relocation.h b/fs/btrfs/relocation.h
> index a653c42a25a3..4b0bb34b3fc1 100644
> --- a/fs/btrfs/relocation.h
> +++ b/fs/btrfs/relocation.h
> @@ -33,5 +33,8 @@ bool btrfs_should_ignore_reloc_root(const struct btrfs_root *root);
>  u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info);
>  int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
>  			  u64 *length, bool nolock);
> +int btrfs_remove_extent_from_remap_tree(struct btrfs_trans_handle *trans,
> +					struct btrfs_path *path,
> +					u64 bytenr, u64 num_bytes);
>  
>  #endif
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a2c49cb8bfc6..fc2b3e7de32e 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2941,8 +2941,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  	return ret;
>  }
>  
> -static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
> -					struct btrfs_device *device)
> +int btrfs_update_device(struct btrfs_trans_handle *trans,
> +			struct btrfs_device *device)
>  {
>  	int ret;
>  	struct btrfs_path *path;
> @@ -3246,25 +3246,13 @@ static int remove_chunk_item(struct btrfs_trans_handle *trans,
>  	return btrfs_free_chunk(trans, chunk_offset);
>  }
>  
> -int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
> +int btrfs_remove_dev_extents(struct btrfs_trans_handle *trans,
> +			     struct btrfs_chunk_map *map)
>  {
>  	struct btrfs_fs_info *fs_info = trans->fs_info;
> -	struct btrfs_chunk_map *map;
> +	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
>  	u64 dev_extent_len = 0;
>  	int i, ret = 0;
> -	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> -
> -	map = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
> -	if (IS_ERR(map)) {
> -		/*
> -		 * This is a logic error, but we don't want to just rely on the
> -		 * user having built with ASSERT enabled, so if ASSERT doesn't
> -		 * do anything we still error out.
> -		 */
> -		DEBUG_WARN("errr %ld reading chunk map at offset %llu",
> -			   PTR_ERR(map), chunk_offset);
> -		return PTR_ERR(map);
> -	}
>  
>  	/*
>  	 * First delete the device extent items from the devices btree.
> @@ -3285,7 +3273,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>  		if (ret) {
>  			mutex_unlock(&fs_devices->device_list_mutex);
>  			btrfs_abort_transaction(trans, ret);
> -			goto out;
> +			return ret;
>  		}
>  
>  		if (device->bytes_used > 0) {
> @@ -3305,6 +3293,30 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>  	}
>  	mutex_unlock(&fs_devices->device_list_mutex);
>  
> +	return 0;
> +}
> +
> +int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
> +{
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	struct btrfs_chunk_map *map;
> +	int ret;
> +
> +	map = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
> +	if (IS_ERR(map)) {
> +		/*
> +		 * This is a logic error, but we don't want to just rely on the
> +		 * user having built with ASSERT enabled, so if ASSERT doesn't
> +		 * do anything we still error out.
> +		 */
> +		ASSERT(0);
> +		return PTR_ERR(map);
> +	}
> +
> +	ret = btrfs_remove_dev_extents(trans, map);
> +	if (ret)
> +		goto out;
> +
>  	/*
>  	 * We acquire fs_info->chunk_mutex for 2 reasons:
>  	 *
> @@ -5448,7 +5460,7 @@ static void chunk_map_device_set_bits(struct btrfs_chunk_map *map, unsigned int
>  	}
>  }
>  
> -static void chunk_map_device_clear_bits(struct btrfs_chunk_map *map, unsigned int bits)
> +void btrfs_chunk_map_device_clear_bits(struct btrfs_chunk_map *map, unsigned int bits)
>  {
>  	for (int i = 0; i < map->num_stripes; i++) {
>  		struct btrfs_io_stripe *stripe = &map->stripes[i];
> @@ -5465,7 +5477,7 @@ void btrfs_remove_chunk_map(struct btrfs_fs_info *fs_info, struct btrfs_chunk_ma
>  	write_lock(&fs_info->mapping_tree_lock);
>  	rb_erase_cached(&map->rb_node, &fs_info->mapping_tree);
>  	RB_CLEAR_NODE(&map->rb_node);
> -	chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
> +	btrfs_chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
>  	write_unlock(&fs_info->mapping_tree_lock);
>  
>  	/* Once for the tree reference. */
> @@ -5501,7 +5513,7 @@ int btrfs_add_chunk_map(struct btrfs_fs_info *fs_info, struct btrfs_chunk_map *m
>  		return -EEXIST;
>  	}
>  	chunk_map_device_set_bits(map, CHUNK_ALLOCATED);
> -	chunk_map_device_clear_bits(map, CHUNK_TRIMMED);
> +	btrfs_chunk_map_device_clear_bits(map, CHUNK_TRIMMED);
>  	write_unlock(&fs_info->mapping_tree_lock);
>  
>  	return 0;
> @@ -5866,7 +5878,7 @@ void btrfs_mapping_tree_free(struct btrfs_fs_info *fs_info)
>  		map = rb_entry(node, struct btrfs_chunk_map, rb_node);
>  		rb_erase_cached(&map->rb_node, &fs_info->mapping_tree);
>  		RB_CLEAR_NODE(&map->rb_node);
> -		chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
> +		btrfs_chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
>  		/* Once for the tree ref. */
>  		btrfs_free_chunk_map(map);
>  		cond_resched_rwlock_write(&fs_info->mapping_tree_lock);
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 430be12fd5e7..64b34710b68b 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -789,6 +789,8 @@ u64 btrfs_calc_stripe_length(const struct btrfs_chunk_map *map);
>  int btrfs_nr_parity_stripes(u64 type);
>  int btrfs_chunk_alloc_add_chunk_item(struct btrfs_trans_handle *trans,
>  				     struct btrfs_block_group *bg);
> +int btrfs_remove_dev_extents(struct btrfs_trans_handle *trans,
> +			     struct btrfs_chunk_map *map);
>  int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset);
>  
>  #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
> @@ -900,6 +902,10 @@ bool btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical);
>  
>  bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr);
>  const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb);
> +int btrfs_update_device(struct btrfs_trans_handle *trans,
> +			struct btrfs_device *device);
> +void btrfs_chunk_map_device_clear_bits(struct btrfs_chunk_map *map,
> +				       unsigned int bits);
>  
>  #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
>  struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_info,
> -- 
> 2.49.1
> 

  reply	other threads:[~2025-08-16  0:27 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-13 14:34 [PATCH v2 00/16] btrfs: remap tree Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 01/16] btrfs: add definitions and constants for remap-tree Mark Harmstone
2025-08-15 23:51   ` Boris Burkov
2025-08-18 17:21     ` Mark Harmstone
2025-08-18 17:33       ` Boris Burkov
2025-08-16  0:01   ` Qu Wenruo
2025-08-16  0:17     ` Qu Wenruo
2025-08-18 17:23       ` Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 02/16] btrfs: add REMAP chunk type Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 03/16] btrfs: allow remapped chunks to have zero stripes Mark Harmstone
2025-08-16  0:03   ` Boris Burkov
2025-08-22 17:01     ` Mark Harmstone
2025-08-19  1:05   ` kernel test robot
2025-08-22 17:07     ` Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 04/16] btrfs: remove remapped block groups from the free-space tree Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 05/16] btrfs: don't add metadata items for the remap tree to the extent tree Mark Harmstone
2025-08-16  0:06   ` Boris Burkov
2025-08-13 14:34 ` [PATCH v2 06/16] btrfs: add extended version of struct block_group_item Mark Harmstone
2025-08-16  0:08   ` Boris Burkov
2025-08-13 14:34 ` [PATCH v2 07/16] btrfs: allow mounting filesystems with remap-tree incompat flag Mark Harmstone
2025-08-22 19:14   ` Boris Burkov
2025-08-13 14:34 ` [PATCH v2 08/16] btrfs: redirect I/O for remapped block groups Mark Harmstone
2025-08-22 19:42   ` Boris Burkov
2025-08-27 14:08     ` Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 09/16] btrfs: release BG lock before calling btrfs_link_bg_list() Mark Harmstone
2025-08-16  0:32   ` Boris Burkov
2025-08-27 15:35     ` Mark Harmstone
2025-08-27 15:48       ` Filipe Manana
2025-08-27 15:52         ` Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 10/16] btrfs: handle deletions from remapped block group Mark Harmstone
2025-08-16  0:28   ` Boris Burkov [this message]
2025-08-27 17:11     ` Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 11/16] btrfs: handle setting up relocation of block group with remap-tree Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 12/16] btrfs: move existing remaps before relocating block group Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 13/16] btrfs: replace identity maps with actual remaps when doing relocations Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 14/16] btrfs: add do_remap param to btrfs_discard_extent() Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 15/16] btrfs: add fully_remapped_bgs list Mark Harmstone
2025-08-16  0:56   ` Boris Burkov
2025-08-27 18:51     ` Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 16/16] btrfs: allow balancing remap tree Mark Harmstone
2025-08-16  1:02   ` Boris Burkov
2025-09-02 14:58     ` Mark Harmstone
2025-09-02 15:21     ` Mark Harmstone

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250816002822.GE3042054@zen.localdomain \
    --to=boris@bur.io \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=mark@harmstone.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox