From: Boris Burkov <boris@bur.io>
To: Mark Harmstone <maharmstone@fb.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 05/12] btrfs: don't add metadata items for the remap tree to the extent tree
Date: Fri, 13 Jun 2025 15:39:16 -0700
Message-ID: <20250613223916.GE3621880@zen.localdomain>
In-Reply-To: <20250605162345.2561026-6-maharmstone@fb.com>
On Thu, Jun 05, 2025 at 05:23:35PM +0100, Mark Harmstone wrote:
> There is the following potential problem with the remap tree and delayed refs:
>
> * Remapped extent freed in a delayed ref, which removes an entry from the
> remap tree
> * Remap tree now small enough to fit in a single leaf
> * Corruption as we now have a level-0 block with a level-1 metadata item
> in the extent tree
>
> One solution to this would be to rework the remap tree code so that it operates
> via delayed refs. But as we're hoping to remove cow-only metadata items in the
> future anyway, change things so that the remap tree doesn't have any entries in
> the extent tree. This also has the benefit of reducing write amplification.
>
> We also make it so that the clear_cache mount option is a no-op, as with the
> extent tree v2, as the free-space tree can no longer be recreated from the
> extent tree.
>
> Finally disable relocating the remap tree itself for the time being: rather
> than walking the extent tree, this will need to be changed so that the remap
> tree gets walked, and any nodes within the specified block groups get COWed.
> This code will also cover the future cases when we remove the metadata items
> for the SYSTEM block groups, i.e. the chunk and root trees.
Why not a separate trivial patch for disabling remap tree reloc?
>
> Signed-off-by: Mark Harmstone <maharmstone@fb.com>
> ---
> fs/btrfs/disk-io.c | 3 ++
> fs/btrfs/extent-tree.c | 114 ++++++++++++++++++++++++-----------------
> fs/btrfs/volumes.c | 3 ++
> 3 files changed, 73 insertions(+), 47 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 60cce96a9ec4..324116c3566c 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3064,6 +3064,9 @@ int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info)
> if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
> btrfs_warn(fs_info,
> "'clear_cache' option is ignored with extent tree v2");
> + else if (btrfs_fs_incompat(fs_info, REMAP_TREE))
> + btrfs_warn(fs_info,
> + "'clear_cache' option is ignored with remap tree");
> else
> rebuild_free_space_tree = true;
> } else if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 46d4963a8241..205692fc1c7e 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3106,6 +3106,24 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
> bool skinny_metadata = btrfs_fs_incompat(info, SKINNY_METADATA);
> u64 delayed_ref_root = href->owning_root;
>
> + is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID;
> +
> + if (!is_data && node->ref_root == BTRFS_REMAP_TREE_OBJECTID) {
Are there cases where ref_root is REMAP_TREE but is_data is true? Or is
this redundant? If so, an assert might make more sense than including it
in the if condition.
Also, rather than special-casing and short-circuiting a generic
metadata/data function that is entirely concerned with *extents*, I
would come up with a name for the "non extent tree metadata" concept and
make that a case at the metadata-freeing callsite.
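A rough sketch of the shape this could take, as standalone userspace C rather than kernel code. Everything prefixed sketch_/SKETCH_ is made up for illustration, the remap tree objectid value is a placeholder, and "untracked metadata" is only one possible name for the concept. The predicate asserts that a remap tree ref is never a data ref instead of testing for it, and the caller picks the path before any extent-tree-specific function is entered:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as BTRFS_FIRST_FREE_OBJECTID. */
#define SKETCH_FIRST_FREE_OBJECTID 256ULL
/* Placeholder only; not taken from the patch series. */
#define SKETCH_REMAP_TREE_OBJECTID 13ULL

/* Stand-in for the two delayed ref fields the check needs. */
struct sketch_ref {
	uint64_t ref_root;
	uint64_t owner;
};

enum sketch_free_path {
	SKETCH_FREE_DATA,
	SKETCH_FREE_TREE_BLOCK,
	SKETCH_FREE_UNTRACKED_METADATA, /* no item in the extent tree */
};

/* Hypothetical predicate naming the "non extent tree metadata" concept. */
static bool sketch_ref_is_untracked_metadata(const struct sketch_ref *ref)
{
	if (ref->ref_root != SKETCH_REMAP_TREE_OBJECTID)
		return false;
	/* A remap tree ref that looks like data would be a bug. */
	assert(ref->owner < SKETCH_FIRST_FREE_OBJECTID);
	return true;
}

/* The decision is made at the callsite, not inside the extent code. */
static enum sketch_free_path sketch_pick_free_path(const struct sketch_ref *ref)
{
	if (sketch_ref_is_untracked_metadata(ref))
		return SKETCH_FREE_UNTRACKED_METADATA;
	if (ref->owner >= SKETCH_FIRST_FREE_OBJECTID)
		return SKETCH_FREE_DATA;
	return SKETCH_FREE_TREE_BLOCK;
}
```

With that split, the untracked case would only return space to the free-space tree and update the block group, and __btrfs_free_extent() would stay purely about extents.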
> + ret = add_to_free_space_tree(trans, bytenr, num_bytes);
> + if (ret) {
> + btrfs_abort_transaction(trans, ret);
> + return ret;
> + }
> +
> + ret = btrfs_update_block_group(trans, bytenr, num_bytes, false);
> + if (ret) {
> + btrfs_abort_transaction(trans, ret);
> + return ret;
> + }
> +
> + return 0;
> + }
> +
> extent_root = btrfs_extent_root(info, bytenr);
> ASSERT(extent_root);
>
> @@ -3113,8 +3131,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
> if (!path)
> return -ENOMEM;
>
> - is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID;
> -
> if (!is_data && refs_to_drop != 1) {
> btrfs_crit(info,
> "invalid refs_to_drop, dropping more than 1 refs for tree block %llu refs_to_drop %u",
> @@ -4893,57 +4909,61 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
> int level = btrfs_delayed_ref_owner(node);
> bool skinny_metadata = btrfs_fs_incompat(fs_info, SKINNY_METADATA);
>
> - extent_key.objectid = node->bytenr;
> - if (skinny_metadata) {
> - /* The owner of a tree block is the level. */
> - extent_key.offset = level;
> - extent_key.type = BTRFS_METADATA_ITEM_KEY;
> - } else {
> - extent_key.offset = node->num_bytes;
> - extent_key.type = BTRFS_EXTENT_ITEM_KEY;
> - size += sizeof(*block_info);
> - }
> + if (node->ref_root != BTRFS_REMAP_TREE_OBJECTID) {
Similarly, I don't like nesting this whole allocation function, which
fundamentally doesn't care about the remap tree, inside a remap tree
check.
I would at least change it to

	if (unlikely(remap_tree))
		goto skip;

or something.
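To show the control flow only, here is a minimal userspace sketch of the early-exit form. The struct, the function name and the objectid value are stand-ins invented for this example, and unlikely() is redefined locally on top of the GCC/Clang builtin. The body of the function keeps its original indentation, and the remap tree case jumps straight to the step that both paths share:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define unlikely(x) (x)
#endif

/* Placeholder only; not taken from the patch series. */
#define SKETCH_REMAP_TREE_OBJECTID 13ULL

/* Stand-in for the delayed ref node. */
struct sketch_node {
	uint64_t ref_root;
	uint64_t bytenr;
};

/* Records which steps ran, so the flow can be checked. */
struct sketch_result {
	bool extent_item_inserted;
	bool space_reserved;
};

static int sketch_alloc_tree_block(const struct sketch_node *node,
				   struct sketch_result *res)
{
	assert(node && res);
	res->extent_item_inserted = false;
	res->space_reserved = false;

	if (unlikely(node->ref_root == SKETCH_REMAP_TREE_OBJECTID))
		goto skip;

	/* Stands in for building and inserting the extent/metadata item. */
	res->extent_item_inserted = true;

skip:
	/* Stands in for the final alloc_reserved_extent() call. */
	res->space_reserved = true;
	return 0;
}
```

The diff for the function then shrinks to the check and the label, instead of reindenting every line.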
> + extent_key.objectid = node->bytenr;
> + if (skinny_metadata) {
> + /* The owner of a tree block is the level. */
> + extent_key.offset = level;
> + extent_key.type = BTRFS_METADATA_ITEM_KEY;
> + } else {
> + extent_key.offset = node->num_bytes;
> + extent_key.type = BTRFS_EXTENT_ITEM_KEY;
> + size += sizeof(*block_info);
> + }
>
> - path = btrfs_alloc_path();
> - if (!path)
> - return -ENOMEM;
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
>
> - extent_root = btrfs_extent_root(fs_info, extent_key.objectid);
> - ret = btrfs_insert_empty_item(trans, extent_root, path, &extent_key,
> - size);
> - if (ret) {
> - btrfs_free_path(path);
> - return ret;
> - }
> + extent_root = btrfs_extent_root(fs_info, extent_key.objectid);
> + ret = btrfs_insert_empty_item(trans, extent_root, path,
> + &extent_key, size);
> + if (ret) {
> + btrfs_free_path(path);
> + return ret;
> + }
>
> - leaf = path->nodes[0];
> - extent_item = btrfs_item_ptr(leaf, path->slots[0],
> - struct btrfs_extent_item);
> - btrfs_set_extent_refs(leaf, extent_item, 1);
> - btrfs_set_extent_generation(leaf, extent_item, trans->transid);
> - btrfs_set_extent_flags(leaf, extent_item,
> - flags | BTRFS_EXTENT_FLAG_TREE_BLOCK);
> + leaf = path->nodes[0];
> + extent_item = btrfs_item_ptr(leaf, path->slots[0],
> + struct btrfs_extent_item);
> + btrfs_set_extent_refs(leaf, extent_item, 1);
> + btrfs_set_extent_generation(leaf, extent_item, trans->transid);
> + btrfs_set_extent_flags(leaf, extent_item,
> + flags | BTRFS_EXTENT_FLAG_TREE_BLOCK);
>
> - if (skinny_metadata) {
> - iref = (struct btrfs_extent_inline_ref *)(extent_item + 1);
> - } else {
> - block_info = (struct btrfs_tree_block_info *)(extent_item + 1);
> - btrfs_set_tree_block_key(leaf, block_info, &extent_op->key);
> - btrfs_set_tree_block_level(leaf, block_info, level);
> - iref = (struct btrfs_extent_inline_ref *)(block_info + 1);
> - }
> + if (skinny_metadata) {
> + iref = (struct btrfs_extent_inline_ref *)(extent_item + 1);
> + } else {
> + block_info = (struct btrfs_tree_block_info *)(extent_item + 1);
> + btrfs_set_tree_block_key(leaf, block_info, &extent_op->key);
> + btrfs_set_tree_block_level(leaf, block_info, level);
> + iref = (struct btrfs_extent_inline_ref *)(block_info + 1);
> + }
>
> - if (node->type == BTRFS_SHARED_BLOCK_REF_KEY) {
> - btrfs_set_extent_inline_ref_type(leaf, iref,
> - BTRFS_SHARED_BLOCK_REF_KEY);
> - btrfs_set_extent_inline_ref_offset(leaf, iref, node->parent);
> - } else {
> - btrfs_set_extent_inline_ref_type(leaf, iref,
> - BTRFS_TREE_BLOCK_REF_KEY);
> - btrfs_set_extent_inline_ref_offset(leaf, iref, node->ref_root);
> - }
> + if (node->type == BTRFS_SHARED_BLOCK_REF_KEY) {
> + btrfs_set_extent_inline_ref_type(leaf, iref,
> + BTRFS_SHARED_BLOCK_REF_KEY);
> + btrfs_set_extent_inline_ref_offset(leaf, iref,
> + node->parent);
> + } else {
> + btrfs_set_extent_inline_ref_type(leaf, iref,
> + BTRFS_TREE_BLOCK_REF_KEY);
> + btrfs_set_extent_inline_ref_offset(leaf, iref,
> + node->ref_root);
> + }
>
> - btrfs_free_path(path);
> + btrfs_free_path(path);
> + }
>
> return alloc_reserved_extent(trans, node->bytenr, fs_info->nodesize);
> }
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9159d11cb143..0f4954f998cd 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3981,6 +3981,9 @@ static bool should_balance_chunk(struct extent_buffer *leaf, struct btrfs_chunk
> struct btrfs_balance_args *bargs = NULL;
> u64 chunk_type = btrfs_chunk_type(leaf, chunk);
>
> + if (chunk_type & BTRFS_BLOCK_GROUP_REMAP)
> + return false;
> +
> /* type filter */
> if (!((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
> (bctl->flags & BTRFS_BALANCE_TYPE_MASK))) {
> --
> 2.49.0
>