public inbox for linux-btrfs@vger.kernel.org
From: Boris Burkov <boris@bur.io>
To: Mark Harmstone <maharmstone@fb.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 05/12] btrfs: don't add metadata items for the remap tree to the extent tree
Date: Fri, 13 Jun 2025 15:39:16 -0700
Message-ID: <20250613223916.GE3621880@zen.localdomain>
In-Reply-To: <20250605162345.2561026-6-maharmstone@fb.com>

On Thu, Jun 05, 2025 at 05:23:35PM +0100, Mark Harmstone wrote:
> There is the following potential problem with the remap tree and delayed refs:
> 
> * Remapped extent freed in a delayed ref, which removes an entry from the
>   remap tree
> * Remap tree now small enough to fit in a single leaf
> * Corruption as we now have a level-0 block with a level-1 metadata item
>   in the extent tree
> 
> One solution to this would be to rework the remap tree code so that it operates
> via delayed refs. But as we're hoping to remove cow-only metadata items in the
> future anyway, change things so that the remap tree doesn't have any entries in
> the extent tree. This also has the benefit of reducing write amplification.
> 
> We also make it so that the clear_cache mount option is a no-op, as with the
> extent tree v2, as the free-space tree can no longer be recreated from the
> extent tree.
> 
> Finally disable relocating the remap tree itself for the time being: rather
> than walking the extent tree, this will need to be changed so that the remap
> tree gets walked, and any nodes within the specified block groups get COWed.
> This code will also cover the future cases when we remove the metadata items
> for the SYSTEM block groups, i.e. the chunk and root trees.

Why not a separate trivial patch for disabling remap tree reloc?

> 
> Signed-off-by: Mark Harmstone <maharmstone@fb.com>
> ---
>  fs/btrfs/disk-io.c     |   3 ++
>  fs/btrfs/extent-tree.c | 114 ++++++++++++++++++++++++-----------------
>  fs/btrfs/volumes.c     |   3 ++
>  3 files changed, 73 insertions(+), 47 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 60cce96a9ec4..324116c3566c 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3064,6 +3064,9 @@ int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info)
>  		if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
>  			btrfs_warn(fs_info,
>  				   "'clear_cache' option is ignored with extent tree v2");
> +		else if (btrfs_fs_incompat(fs_info, REMAP_TREE))
> +			btrfs_warn(fs_info,
> +				   "'clear_cache' option is ignored with remap tree");
>  		else
>  			rebuild_free_space_tree = true;
>  	} else if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 46d4963a8241..205692fc1c7e 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3106,6 +3106,24 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>  	bool skinny_metadata = btrfs_fs_incompat(info, SKINNY_METADATA);
>  	u64 delayed_ref_root = href->owning_root;
>  
> +	is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID;
> +
> +	if (!is_data && node->ref_root == BTRFS_REMAP_TREE_OBJECTID) {

Are there cases where ref_root is REMAP_TREE but is_data is true? Or is
this redundant? If so, an assert might make more sense than including it
in the if condition.


Also, rather than special-casing / short-circuiting a generic function
for metadata/data that is fully concerned with *extents*, I would come
up with a name for the "non extent tree metadata" concept and handle it
as a separate case at the metadata-freeing callsite.
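
A minimal user-space sketch of that shape, not kernel code. Every demo_*
name is hypothetical, and DEMO_REMAP_TREE_OBJECTID is a placeholder value
rather than the one defined in patch 01/12. It assumes a remap tree ref
never describes data, so that check becomes an assertion inside a named
predicate, and the caller picks the free path instead of the generic
extent-freeing function doing so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as the kernel's BTRFS_FIRST_FREE_OBJECTID. */
#define DEMO_FIRST_FREE_OBJECTID 256ULL
/* Placeholder only; the real objectid comes from the remap tree series. */
#define DEMO_REMAP_TREE_OBJECTID 13ULL

/* Hypothetical stand-in for the fields of a delayed ref used here. */
struct demo_ref {
	uint64_t ref_root;	/* tree that owns the block */
	uint64_t owner;		/* level for tree blocks, inode number for data */
};

enum demo_free_path {
	DEMO_FREE_VIA_EXTENT_TREE,	/* normal path: drop the extent item */
	DEMO_FREE_NO_EXTENT_ITEM,	/* only free space and block group accounting */
};

static bool demo_ref_is_data(const struct demo_ref *ref)
{
	return ref->owner >= DEMO_FIRST_FREE_OBJECTID;
}

/* The named concept: metadata that has no item in the extent tree. */
static bool demo_ref_has_no_extent_item(const struct demo_ref *ref)
{
	if (ref->ref_root != DEMO_REMAP_TREE_OBJECTID)
		return false;
	/* Assumption: remap tree refs are always tree blocks, never data. */
	assert(!demo_ref_is_data(ref));
	return true;
}

/* The decision lives at the callsite, not in the generic free function. */
static enum demo_free_path demo_pick_free_path(const struct demo_ref *ref)
{
	if (demo_ref_has_no_extent_item(ref))
		return DEMO_FREE_NO_EXTENT_ITEM;
	return DEMO_FREE_VIA_EXTENT_TREE;
}
```

With this split, the generic free function would keep handling only refs
that have an extent item, and any later tree without extent items would
only need a change to the predicate.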

> +		ret = add_to_free_space_tree(trans, bytenr, num_bytes);
> +		if (ret) {
> +			btrfs_abort_transaction(trans, ret);
> +			return ret;
> +		}
> +
> +		ret = btrfs_update_block_group(trans, bytenr, num_bytes, false);
> +		if (ret) {
> +			btrfs_abort_transaction(trans, ret);
> +			return ret;
> +		}
> +
> +		return 0;
> +	}
> +
>  	extent_root = btrfs_extent_root(info, bytenr);
>  	ASSERT(extent_root);
>  
> @@ -3113,8 +3131,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>  	if (!path)
>  		return -ENOMEM;
>  
> -	is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID;
> -
>  	if (!is_data && refs_to_drop != 1) {
>  		btrfs_crit(info,
>  "invalid refs_to_drop, dropping more than 1 refs for tree block %llu refs_to_drop %u",
> @@ -4893,57 +4909,61 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
>  	int level = btrfs_delayed_ref_owner(node);
>  	bool skinny_metadata = btrfs_fs_incompat(fs_info, SKINNY_METADATA);
>  
> -	extent_key.objectid = node->bytenr;
> -	if (skinny_metadata) {
> -		/* The owner of a tree block is the level. */
> -		extent_key.offset = level;
> -		extent_key.type = BTRFS_METADATA_ITEM_KEY;
> -	} else {
> -		extent_key.offset = node->num_bytes;
> -		extent_key.type = BTRFS_EXTENT_ITEM_KEY;
> -		size += sizeof(*block_info);
> -	}
> +	if (node->ref_root != BTRFS_REMAP_TREE_OBJECTID) {

Similarly, I don't like wrapping this whole allocation function, which
fundamentally doesn't care about the remap tree, in a remap tree check.

I would at least change it to

if (unlikely(remap_tree))
        goto skip;

or something.
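
A minimal user-space sketch of the early-exit layout, not kernel code.
The sketch_* names and the placeholder objectid are hypothetical, and
sketch_unlikely() stands in for the kernel's unlikely() using the
GCC/Clang __builtin_expect builtin. The extent item work stays at its
original indentation, and only the remap tree case jumps past it to the
shared accounting step.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder only; not the objectid defined by the remap tree series. */
#define SKETCH_REMAP_TREE_OBJECTID 13ULL
/* User-space stand-in for the kernel's unlikely() (GCC/Clang builtin). */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the delayed ref node fields used here. */
struct sketch_node {
	uint64_t ref_root;
	uint64_t bytenr;
};

/* Counters standing in for the side effects of the real function. */
struct sketch_stats {
	int extent_items_inserted;
	int blocks_accounted;
};

static int sketch_alloc_tree_block(struct sketch_stats *st,
				   const struct sketch_node *node)
{
	assert(st && node);

	if (sketch_unlikely(node->ref_root == SKETCH_REMAP_TREE_OBJECTID))
		goto skip;

	/*
	 * Stands in for building the extent key, inserting the item and
	 * filling in the inline ref, all left at the original indentation.
	 */
	st->extent_items_inserted++;

skip:
	/* Stands in for the alloc_reserved_extent() call every block reaches. */
	st->blocks_accounted++;
	return 0;
}
```

Compared with nesting the body under an if, this keeps the diff to a few
lines and avoids re-indenting code that is otherwise unchanged.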

> +		extent_key.objectid = node->bytenr;
> +		if (skinny_metadata) {
> +			/* The owner of a tree block is the level. */
> +			extent_key.offset = level;
> +			extent_key.type = BTRFS_METADATA_ITEM_KEY;
> +		} else {
> +			extent_key.offset = node->num_bytes;
> +			extent_key.type = BTRFS_EXTENT_ITEM_KEY;
> +			size += sizeof(*block_info);
> +		}
>  
> -	path = btrfs_alloc_path();
> -	if (!path)
> -		return -ENOMEM;
> +		path = btrfs_alloc_path();
> +		if (!path)
> +			return -ENOMEM;
>  
> -	extent_root = btrfs_extent_root(fs_info, extent_key.objectid);
> -	ret = btrfs_insert_empty_item(trans, extent_root, path, &extent_key,
> -				      size);
> -	if (ret) {
> -		btrfs_free_path(path);
> -		return ret;
> -	}
> +		extent_root = btrfs_extent_root(fs_info, extent_key.objectid);
> +		ret = btrfs_insert_empty_item(trans, extent_root, path,
> +					      &extent_key, size);
> +		if (ret) {
> +			btrfs_free_path(path);
> +			return ret;
> +		}
>  
> -	leaf = path->nodes[0];
> -	extent_item = btrfs_item_ptr(leaf, path->slots[0],
> -				     struct btrfs_extent_item);
> -	btrfs_set_extent_refs(leaf, extent_item, 1);
> -	btrfs_set_extent_generation(leaf, extent_item, trans->transid);
> -	btrfs_set_extent_flags(leaf, extent_item,
> -			       flags | BTRFS_EXTENT_FLAG_TREE_BLOCK);
> +		leaf = path->nodes[0];
> +		extent_item = btrfs_item_ptr(leaf, path->slots[0],
> +					struct btrfs_extent_item);
> +		btrfs_set_extent_refs(leaf, extent_item, 1);
> +		btrfs_set_extent_generation(leaf, extent_item, trans->transid);
> +		btrfs_set_extent_flags(leaf, extent_item,
> +				flags | BTRFS_EXTENT_FLAG_TREE_BLOCK);
>  
> -	if (skinny_metadata) {
> -		iref = (struct btrfs_extent_inline_ref *)(extent_item + 1);
> -	} else {
> -		block_info = (struct btrfs_tree_block_info *)(extent_item + 1);
> -		btrfs_set_tree_block_key(leaf, block_info, &extent_op->key);
> -		btrfs_set_tree_block_level(leaf, block_info, level);
> -		iref = (struct btrfs_extent_inline_ref *)(block_info + 1);
> -	}
> +		if (skinny_metadata) {
> +			iref = (struct btrfs_extent_inline_ref *)(extent_item + 1);
> +		} else {
> +			block_info = (struct btrfs_tree_block_info *)(extent_item + 1);
> +			btrfs_set_tree_block_key(leaf, block_info, &extent_op->key);
> +			btrfs_set_tree_block_level(leaf, block_info, level);
> +			iref = (struct btrfs_extent_inline_ref *)(block_info + 1);
> +		}
>  
> -	if (node->type == BTRFS_SHARED_BLOCK_REF_KEY) {
> -		btrfs_set_extent_inline_ref_type(leaf, iref,
> -						 BTRFS_SHARED_BLOCK_REF_KEY);
> -		btrfs_set_extent_inline_ref_offset(leaf, iref, node->parent);
> -	} else {
> -		btrfs_set_extent_inline_ref_type(leaf, iref,
> -						 BTRFS_TREE_BLOCK_REF_KEY);
> -		btrfs_set_extent_inline_ref_offset(leaf, iref, node->ref_root);
> -	}
> +		if (node->type == BTRFS_SHARED_BLOCK_REF_KEY) {
> +			btrfs_set_extent_inline_ref_type(leaf, iref,
> +						BTRFS_SHARED_BLOCK_REF_KEY);
> +			btrfs_set_extent_inline_ref_offset(leaf, iref,
> +							   node->parent);
> +		} else {
> +			btrfs_set_extent_inline_ref_type(leaf, iref,
> +						BTRFS_TREE_BLOCK_REF_KEY);
> +			btrfs_set_extent_inline_ref_offset(leaf, iref,
> +							   node->ref_root);
> +		}
>  
> -	btrfs_free_path(path);
> +		btrfs_free_path(path);
> +	}
>  
>  	return alloc_reserved_extent(trans, node->bytenr, fs_info->nodesize);
>  }
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9159d11cb143..0f4954f998cd 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3981,6 +3981,9 @@ static bool should_balance_chunk(struct extent_buffer *leaf, struct btrfs_chunk
>  	struct btrfs_balance_args *bargs = NULL;
>  	u64 chunk_type = btrfs_chunk_type(leaf, chunk);
>  
> +	if (chunk_type & BTRFS_BLOCK_GROUP_REMAP)
> +		return false;
> +
>  	/* type filter */
>  	if (!((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
>  	      (bctl->flags & BTRFS_BALANCE_TYPE_MASK))) {
> -- 
> 2.49.0
> 
