From: Mark Harmstone <mark@harmstone.com>
To: Boris Burkov <boris@bur.io>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 16/16] btrfs: allow balancing remap tree
Date: Tue, 2 Sep 2025 16:21:56 +0100
Message-ID: <1629f7a0-a2d8-400a-827d-eae9280de3cb@harmstone.com>
In-Reply-To: <20250816010231.GH3042054@zen.localdomain>

On 16/08/2025 2.02 am, Boris Burkov wrote:
> On Wed, Aug 13, 2025 at 03:34:58PM +0100, Mark Harmstone wrote:
>> Balancing the REMAP chunk, i.e. the chunk in which the remap tree lives,
>> is a special case.
>>
>> We can't use the remap tree itself for this, as then we'd have no way to
>> bootstrap it on mount. And we can't use the pre-remap tree code for this
>> as it relies on walking the extent tree, and we're not creating backrefs
>> for REMAP chunks.
>>
>> So instead, if a balance would relocate any REMAP block groups, mark
>> those block groups as readonly and COW every leaf of the remap tree.
>>
>> There's more sophisticated ways of doing this, such as only COWing nodes
>> within a block group that's to be relocated, but they're fiddly and with
>> lots of edge cases. Plus it's not anticipated that a) the number of
>> REMAP chunks is going to be particularly large, or b) that users will
>> want to only relocate some of these chunks - the main use case here is
>> to unbreak RAID conversion and device removal.
>>
>> Signed-off-by: Mark Harmstone <mark@harmstone.com>
>> ---
>>   fs/btrfs/volumes.c | 161 +++++++++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 157 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index e13f16a7a904..dc535ed90ae0 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -4011,8 +4011,11 @@ static bool should_balance_chunk(struct extent_buffer *leaf, struct btrfs_chunk
>>   	struct btrfs_balance_args *bargs = NULL;
>>   	u64 chunk_type = btrfs_chunk_type(leaf, chunk);
>>   
>> -	if (chunk_type & BTRFS_BLOCK_GROUP_REMAP)
>> -		return false;
>> +	/* treat REMAP chunks as METADATA */
>> +	if (chunk_type & BTRFS_BLOCK_GROUP_REMAP) {
>> +		chunk_type &= ~BTRFS_BLOCK_GROUP_REMAP;
>> +		chunk_type |= BTRFS_BLOCK_GROUP_METADATA;
> 
> why not honor the REMAP chunk type where appropriate?

This would imply adding a new flag to btrfs balance start, and a new
version of the ioctl, and I'm not sure it's worth it. Happy to argue
the toss though.

Doing btrfs balance start -m already implies -s, so it's not much of
a stretch for it to cover REMAP as well.

Possibly it would make more sense for REMAP to be SYSTEM for balancing
purposes rather than METADATA.
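
To make the two options concrete, here is a rough userspace sketch of
the folding step, not the kernel code. The DATA/SYSTEM/METADATA bit
values match the existing on-disk flags; BG_REMAP is a placeholder bit
picked for illustration (the real value is defined earlier in this
series), and fold_remap() and type_filter_matches() are hypothetical
helper names. The filter is a simplified stand-in for the type check in
should_balance_chunk().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* These three match the existing on-disk block group type bits. */
#define BG_DATA      (1ULL << 0)
#define BG_SYSTEM    (1ULL << 1)
#define BG_METADATA  (1ULL << 2)
/* ASSUMPTION: placeholder bit, not the value used by the series. */
#define BG_REMAP     (1ULL << 40)
#define BG_TYPE_MASK (BG_DATA | BG_SYSTEM | BG_METADATA)

/*
 * Fold a REMAP chunk into an existing type (alias) so that the
 * existing type filter can match it. Other chunk types pass through
 * unchanged.
 */
static inline uint64_t fold_remap(uint64_t chunk_type, uint64_t alias)
{
	/* Only the two candidates discussed in this thread make sense. */
	assert(alias == BG_METADATA || alias == BG_SYSTEM);

	if (chunk_type & BG_REMAP) {
		chunk_type &= ~BG_REMAP;
		chunk_type |= alias;
	}
	return chunk_type;
}

/* Simplified type filter: does the chunk type intersect the request? */
static inline bool type_filter_matches(uint64_t chunk_type,
				       uint64_t requested)
{
	return (chunk_type & BG_TYPE_MASK & requested) != 0;
}
```

In this sketch, folding with BG_METADATA means a REMAP chunk is selected
by a metadata-only request, while folding with BG_SYSTEM would make it
follow a system-only request instead. An unfolded REMAP chunk matches
no type request at all, which corresponds to the old behaviour of
skipping these chunks.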

>> +	}
>>   
>>   	/* type filter */
>>   	if (!((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
>> @@ -4095,6 +4098,113 @@ static bool should_balance_chunk(struct extent_buffer *leaf, struct btrfs_chunk
>>   	return true;
>>   }
>>   
>> +struct remap_chunk_info {
>> +	struct list_head list;
>> +	u64 offset;
>> +	struct btrfs_block_group *bg;
>> +	bool made_ro;
>> +};
>> +
>> +static int cow_remap_tree(struct btrfs_trans_handle *trans,
>> +			  struct btrfs_path *path)
>> +{
>> +	struct btrfs_fs_info *fs_info = trans->fs_info;
>> +	struct btrfs_key key = { 0 };
>> +	int ret;
>> +
>> +	ret = btrfs_search_slot(trans, fs_info->remap_root, &key, path, 0, 1);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	while (true) {
>> +		ret = btrfs_next_leaf(fs_info->remap_root, path);
>> +		if (ret < 0) {
>> +			return ret;
>> +		} else if (ret > 0) {
>> +			ret = 0;
>> +			break;
>> +		}
>> +
>> +		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>> +
>> +		btrfs_release_path(path);
>> +
>> +		ret = btrfs_search_slot(trans, fs_info->remap_root, &key, path,
>> +					0, 1);
>> +		if (ret < 0)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +static int balance_remap_chunks(struct btrfs_fs_info *fs_info,
>> +				struct btrfs_path *path,
>> +				struct list_head *chunks)
>> +{
>> +	struct remap_chunk_info *rci, *tmp;
>> +	struct btrfs_trans_handle *trans;
>> +	int ret;
>> +
>> +	list_for_each_entry_safe(rci, tmp, chunks, list) {
>> +		rci->bg = btrfs_lookup_block_group(fs_info, rci->offset);
>> +		if (!rci->bg) {
>> +			list_del(&rci->list);
>> +			kfree(rci);
>> +			continue;
>> +		}
>> +
>> +		ret = btrfs_inc_block_group_ro(rci->bg, false);
> 
> Just thinking out loud, what happens if we concurrently attempt a
> balance that would need to use the remap tree? Is something structurally
> blocking that at a higher level? Or will it fail? How will that failure
> be handled? Does the answer hold for btrfs-internal background reclaim
> rather than explicit balancing?
> 
>> +		if (ret)
>> +			goto end;
>> +
>> +		rci->made_ro = true;
>> +	}
>> +
>> +	if (list_empty(chunks))
>> +		return 0;
>> +
>> +	trans = btrfs_start_transaction(fs_info->remap_root, 0);
>> +	if (IS_ERR(trans)) {
>> +		ret = PTR_ERR(trans);
>> +		goto end;
>> +	}
>> +
>> +	mutex_lock(&fs_info->remap_mutex);
>> +
>> +	ret = cow_remap_tree(trans, path);
>> +
>> +	btrfs_release_path(path);
>> +
>> +	mutex_unlock(&fs_info->remap_mutex);
>> +
>> +	btrfs_commit_transaction(trans);
>> +
>> +end:
>> +	while (!list_empty(chunks)) {
>> +		bool unused;
>> +
>> +		rci = list_first_entry(chunks, struct remap_chunk_info, list);
>> +
>> +		spin_lock(&rci->bg->lock);
>> +		unused = !btrfs_is_block_group_used(rci->bg);
>> +		spin_unlock(&rci->bg->lock);
>> +
>> +		if (unused)
>> +			btrfs_mark_bg_unused(rci->bg);
>> +
>> +		if (rci->made_ro)
>> +			btrfs_dec_block_group_ro(rci->bg);
>> +
>> +		btrfs_put_block_group(rci->bg);
>> +
>> +		list_del(&rci->list);
>> +		kfree(rci);
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>   static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>>   {
>>   	struct btrfs_balance_control *bctl = fs_info->balance_ctl;
>> @@ -4117,6 +4227,9 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>>   	u32 count_meta = 0;
>>   	u32 count_sys = 0;
>>   	int chunk_reserved = 0;
>> +	struct remap_chunk_info *rci;
>> +	unsigned int num_remap_chunks = 0;
>> +	LIST_HEAD(remap_chunks);
>>   
>>   	path = btrfs_alloc_path();
>>   	if (!path) {
>> @@ -4215,7 +4328,8 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>>   				count_data++;
>>   			else if (chunk_type & BTRFS_BLOCK_GROUP_SYSTEM)
>>   				count_sys++;
>> -			else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
>> +			else if (chunk_type & (BTRFS_BLOCK_GROUP_METADATA |
>> +					       BTRFS_BLOCK_GROUP_REMAP))
>>   				count_meta++;
>>   
>>   			goto loop;
>> @@ -4235,6 +4349,30 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>>   			goto loop;
>>   		}
>>   
>> +		/*
>> +		 * Balancing REMAP chunks takes place separately - add the
>> +		 * details to a list so it can be processed later.
>> +		 */
>> +		if (chunk_type & BTRFS_BLOCK_GROUP_REMAP) {
>> +			mutex_unlock(&fs_info->reclaim_bgs_lock);
>> +
>> +			rci = kmalloc(sizeof(struct remap_chunk_info),
>> +				      GFP_NOFS);
>> +			if (!rci) {
>> +				ret = -ENOMEM;
>> +				goto error;
>> +			}
>> +
>> +			rci->offset = found_key.offset;
>> +			rci->bg = NULL;
>> +			rci->made_ro = false;
>> +			list_add_tail(&rci->list, &remap_chunks);
>> +
>> +			num_remap_chunks++;
>> +
>> +			goto loop;
>> +		}
>> +
>>   		if (!chunk_reserved) {
>>   			/*
>>   			 * We may be relocating the only data chunk we have,
>> @@ -4274,11 +4412,26 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>>   		key.offset = found_key.offset - 1;
>>   	}
>>   
>> +	btrfs_release_path(path);
>> +
>>   	if (counting) {
>> -		btrfs_release_path(path);
>>   		counting = false;
>>   		goto again;
>>   	}
>> +
>> +	if (!list_empty(&remap_chunks)) {
>> +		ret = balance_remap_chunks(fs_info, path, &remap_chunks);
>> +		if (ret == -ENOSPC)
>> +			enospc_errors++;
>> +
>> +		if (!ret) {
>> +			btrfs_delete_unused_bgs(fs_info);
> 
> Why is this necessary here?
> 
>> +
>> +			spin_lock(&fs_info->balance_lock);
>> +			bctl->stat.completed += num_remap_chunks;
>> +			spin_unlock(&fs_info->balance_lock);
>> +		}
>> +	}
>>   error:
>>   	btrfs_free_path(path);
>>   	if (enospc_errors) {
>> -- 
>> 2.49.1
>>
