From: Boris Burkov <boris@bur.io>
To: Mark Harmstone <mark@harmstone.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 07/16] btrfs: allow mounting filesystems with remap-tree incompat flag
Date: Fri, 22 Aug 2025 12:14:57 -0700 [thread overview]
Message-ID: <20250822191457.GA492925@zen.localdomain> (raw)
In-Reply-To: <20250813143509.31073-8-mark@harmstone.com>
On Wed, Aug 13, 2025 at 03:34:49PM +0100, Mark Harmstone wrote:
> If we encounter a filesystem with the remap-tree incompat flag set,
> valdiate its compatibility with the other flags, and load the remap tree
> using the values that have been added to the superblock.
>
> The remap-tree feature depends on the free space tere, but no-holes and
> block-group-tree have been made dependencies to reduce the testing
> matrix. Similarly I'm not aware of any reason why mixed-bg and zoned would be
> incompatible with remap-tree, but this is blocked for the time being
> until it can be fully tested.
>
> Signed-off-by: Mark Harmstone <mark@harmstone.com>
> ---
> fs/btrfs/Kconfig | 2 +
> fs/btrfs/accessors.h | 6 +++
> fs/btrfs/disk-io.c | 86 ++++++++++++++++++++++++++++-----
> fs/btrfs/extent-tree.c | 2 +
> fs/btrfs/fs.h | 4 +-
> fs/btrfs/transaction.c | 7 +++
> include/uapi/linux/btrfs_tree.h | 5 +-
> 7 files changed, 97 insertions(+), 15 deletions(-)
>
> diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
> index ea95c90c8474..598a4af4ce4b 100644
> --- a/fs/btrfs/Kconfig
> +++ b/fs/btrfs/Kconfig
> @@ -116,6 +116,8 @@ config BTRFS_EXPERIMENTAL
>
> - large folio support
>
> + - remap-tree - logical address remapping tree
> +
> If unsure, say N.
>
> config BTRFS_FS_REF_VERIFY
> diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
> index 0dd161ee6863..392eaad75e72 100644
> --- a/fs/btrfs/accessors.h
> +++ b/fs/btrfs/accessors.h
> @@ -882,6 +882,12 @@ BTRFS_SETGET_STACK_FUNCS(super_uuid_tree_generation, struct btrfs_super_block,
> uuid_tree_generation, 64);
> BTRFS_SETGET_STACK_FUNCS(super_nr_global_roots, struct btrfs_super_block,
> nr_global_roots, 64);
> +BTRFS_SETGET_STACK_FUNCS(super_remap_root, struct btrfs_super_block,
> + remap_root, 64);
> +BTRFS_SETGET_STACK_FUNCS(super_remap_root_generation, struct btrfs_super_block,
> + remap_root_generation, 64);
> +BTRFS_SETGET_STACK_FUNCS(super_remap_root_level, struct btrfs_super_block,
> + remap_root_level, 8);
>
> /* struct btrfs_file_extent_item */
> BTRFS_SETGET_STACK_FUNCS(stack_file_extent_type, struct btrfs_file_extent_item,
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 8e9520119d4f..563aea5e3b1b 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1181,6 +1181,8 @@ static struct btrfs_root *btrfs_get_global_root(struct btrfs_fs_info *fs_info,
> return btrfs_grab_root(btrfs_global_root(fs_info, &key));
> case BTRFS_RAID_STRIPE_TREE_OBJECTID:
> return btrfs_grab_root(fs_info->stripe_root);
> + case BTRFS_REMAP_TREE_OBJECTID:
> + return btrfs_grab_root(fs_info->remap_root);
> default:
> return NULL;
> }
> @@ -1271,6 +1273,7 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
> btrfs_put_root(fs_info->data_reloc_root);
> btrfs_put_root(fs_info->block_group_root);
> btrfs_put_root(fs_info->stripe_root);
> + btrfs_put_root(fs_info->remap_root);
> btrfs_check_leaked_roots(fs_info);
> btrfs_extent_buffer_leak_debug_check(fs_info);
> kfree(fs_info->super_copy);
> @@ -1825,6 +1828,7 @@ static void free_root_pointers(struct btrfs_fs_info *info, bool free_chunk_root)
> free_root_extent_buffers(info->data_reloc_root);
> free_root_extent_buffers(info->block_group_root);
> free_root_extent_buffers(info->stripe_root);
> + free_root_extent_buffers(info->remap_root);
> if (free_chunk_root)
> free_root_extent_buffers(info->chunk_root);
> }
> @@ -2256,20 +2260,31 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
> if (ret)
> goto out;
>
> - /*
> - * This tree can share blocks with some other fs tree during relocation
> - * and we need a proper setup by btrfs_get_fs_root
> - */
> - root = btrfs_get_fs_root(tree_root->fs_info,
> - BTRFS_DATA_RELOC_TREE_OBJECTID, true);
> - if (IS_ERR(root)) {
> - if (!btrfs_test_opt(fs_info, IGNOREBADROOTS)) {
> - ret = PTR_ERR(root);
> - goto out;
> - }
> - } else {
> + if (btrfs_fs_incompat(fs_info, REMAP_TREE)) {
> + /* remap_root already loaded in load_important_roots() */
> + root = fs_info->remap_root;
> +
> set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
> - fs_info->data_reloc_root = root;
> +
> + root->root_key.objectid = BTRFS_REMAP_TREE_OBJECTID;
> + root->root_key.type = BTRFS_ROOT_ITEM_KEY;
> + root->root_key.offset = 0;
> + } else {
It might be a good idea to vomit on finding a reloc tree if the
REMAP_TREE incompat bit is set. If that would happen elsewhere in tree
checker, that's great, the idea just struck me while reading this.
> + /*
> + * This tree can share blocks with some other fs tree during
> + * relocation and we need a proper setup by btrfs_get_fs_root
> + */
> + root = btrfs_get_fs_root(tree_root->fs_info,
> + BTRFS_DATA_RELOC_TREE_OBJECTID, true);
> + if (IS_ERR(root)) {
> + if (!btrfs_test_opt(fs_info, IGNOREBADROOTS)) {
> + ret = PTR_ERR(root);
> + goto out;
> + }
> + } else {
> + set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
> + fs_info->data_reloc_root = root;
> + }
> }
>
> location.objectid = BTRFS_QUOTA_TREE_OBJECTID;
> @@ -2509,6 +2524,28 @@ int btrfs_validate_super(const struct btrfs_fs_info *fs_info,
> ret = -EINVAL;
> }
>
> + /* Ditto for remap_tree */
Don't care strongly, but "ditto" is less clear and more prone to
breaking when other code gets refactored than just writing out what
you mean (perhaps with a reference to something above).
> + if (btrfs_fs_incompat(fs_info, REMAP_TREE) &&
> + (!btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID) ||
> + !btrfs_fs_incompat(fs_info, NO_HOLES) ||
> + !btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE))) {
> + btrfs_err(fs_info,
> +"remap-tree feature requires free-space-tree, no-holes, and block-group-tree");
> + ret = -EINVAL;
> + }
> +
> + if (btrfs_fs_incompat(fs_info, REMAP_TREE) &&
> + btrfs_fs_incompat(fs_info, MIXED_GROUPS)) {
> + btrfs_err(fs_info, "remap-tree not supported with mixed-bg");
> + ret = -EINVAL;
> + }
> +
> + if (btrfs_fs_incompat(fs_info, REMAP_TREE) &&
> + btrfs_fs_incompat(fs_info, ZONED)) {
> + btrfs_err(fs_info, "remap-tree not supported with zoned devices");
> + ret = -EINVAL;
> + }
> +
> /*
> * Hint to catch really bogus numbers, bitflips or so, more exact checks are
> * done later
> @@ -2667,6 +2704,18 @@ static int load_important_roots(struct btrfs_fs_info *fs_info)
> btrfs_warn(fs_info, "couldn't read tree root");
> return ret;
> }
> +
> + if (btrfs_fs_incompat(fs_info, REMAP_TREE)) {
> + bytenr = btrfs_super_remap_root(sb);
> + gen = btrfs_super_remap_root_generation(sb);
> + level = btrfs_super_remap_root_level(sb);
> + ret = load_super_root(fs_info->remap_root, bytenr, gen, level);
> + if (ret) {
> + btrfs_warn(fs_info, "couldn't read remap root");
> + return ret;
> + }
> + }
> +
> return 0;
> }
>
> @@ -3278,6 +3327,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
> struct btrfs_fs_info *fs_info = btrfs_sb(sb);
> struct btrfs_root *tree_root;
> struct btrfs_root *chunk_root;
> + struct btrfs_root *remap_root;
> int ret;
> int level;
>
> @@ -3312,6 +3362,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
> goto fail_alloc;
> }
>
> + if (btrfs_super_incompat_flags(disk_super) & BTRFS_FEATURE_INCOMPAT_REMAP_TREE) {
> + remap_root = btrfs_alloc_root(fs_info, BTRFS_REMAP_TREE_OBJECTID,
> + GFP_KERNEL);
> + fs_info->remap_root = remap_root;
> + if (!remap_root) {
> + ret = -ENOMEM;
> + goto fail_alloc;
> + }
> + }
> +
This feels like it should come after the csum verification stuff in the
bootstrap process. The csum stuff is from the super so it shouldn't
depend on remap, but remap is an eb and has csums, so it does "rely" on
that (obviously it will break anyway, but then why bother doing the
explicit checking at all)
As the most general statement, I think it putting it as "late as possible"
is the most self documenting option.
> btrfs_info(fs_info, "first mount of filesystem %pU", disk_super->fsid);
> /*
> * Verify the type first, if that or the checksum value are
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 5e038ae1a93f..c1b96c728fe6 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2589,6 +2589,8 @@ static u64 get_alloc_profile_by_root(struct btrfs_root *root, int data)
> flags = BTRFS_BLOCK_GROUP_DATA;
> else if (root == fs_info->chunk_root)
> flags = BTRFS_BLOCK_GROUP_SYSTEM;
> + else if (root == fs_info->remap_root)
> + flags = BTRFS_BLOCK_GROUP_REMAP;
> else
> flags = BTRFS_BLOCK_GROUP_METADATA;
>
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index 9ce75843b578..6ea96e76655e 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -288,7 +288,8 @@ enum {
> #define BTRFS_FEATURE_INCOMPAT_SUPP \
> (BTRFS_FEATURE_INCOMPAT_SUPP_STABLE | \
> BTRFS_FEATURE_INCOMPAT_RAID_STRIPE_TREE | \
> - BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2)
> + BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2 | \
> + BTRFS_FEATURE_INCOMPAT_REMAP_TREE)
>
> #else
>
> @@ -438,6 +439,7 @@ struct btrfs_fs_info {
> struct btrfs_root *data_reloc_root;
> struct btrfs_root *block_group_root;
> struct btrfs_root *stripe_root;
> + struct btrfs_root *remap_root;
>
> /* The log root tree is a directory of all the other log roots */
> struct btrfs_root *log_root_tree;
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index c5c0d9cf1a80..64b9c427af6a 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1953,6 +1953,13 @@ static void update_super_roots(struct btrfs_fs_info *fs_info)
> super->cache_generation = 0;
> if (test_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags))
> super->uuid_tree_generation = root_item->generation;
> +
> + if (btrfs_fs_incompat(fs_info, REMAP_TREE)) {
> + root_item = &fs_info->remap_root->root_item;
> + super->remap_root = root_item->bytenr;
> + super->remap_root_generation = root_item->generation;
> + super->remap_root_level = root_item->level;
> + }
> }
>
> int btrfs_transaction_blocked(struct btrfs_fs_info *info)
> diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
> index 500e3a7df90b..89bcb80081a6 100644
> --- a/include/uapi/linux/btrfs_tree.h
> +++ b/include/uapi/linux/btrfs_tree.h
> @@ -721,9 +721,12 @@ struct btrfs_super_block {
> __u8 metadata_uuid[BTRFS_FSID_SIZE];
>
> __u64 nr_global_roots;
> + __le64 remap_root;
> + __le64 remap_root_generation;
> + __u8 remap_root_level;
>
> /* Future expansion */
> - __le64 reserved[27];
> + __u8 reserved[199];
> __u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
> struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
>
> --
> 2.49.1
>
next prev parent reply other threads:[~2025-08-22 19:12 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-13 14:34 [PATCH v2 00/16] btrfs: remap tree Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 01/16] btrfs: add definitions and constants for remap-tree Mark Harmstone
2025-08-15 23:51 ` Boris Burkov
2025-08-18 17:21 ` Mark Harmstone
2025-08-18 17:33 ` Boris Burkov
2025-08-16 0:01 ` Qu Wenruo
2025-08-16 0:17 ` Qu Wenruo
2025-08-18 17:23 ` Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 02/16] btrfs: add REMAP chunk type Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 03/16] btrfs: allow remapped chunks to have zero stripes Mark Harmstone
2025-08-16 0:03 ` Boris Burkov
2025-08-22 17:01 ` Mark Harmstone
2025-08-19 1:05 ` kernel test robot
2025-08-22 17:07 ` Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 04/16] btrfs: remove remapped block groups from the free-space tree Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 05/16] btrfs: don't add metadata items for the remap tree to the extent tree Mark Harmstone
2025-08-16 0:06 ` Boris Burkov
2025-08-13 14:34 ` [PATCH v2 06/16] btrfs: add extended version of struct block_group_item Mark Harmstone
2025-08-16 0:08 ` Boris Burkov
2025-08-13 14:34 ` [PATCH v2 07/16] btrfs: allow mounting filesystems with remap-tree incompat flag Mark Harmstone
2025-08-22 19:14 ` Boris Burkov [this message]
2025-08-13 14:34 ` [PATCH v2 08/16] btrfs: redirect I/O for remapped block groups Mark Harmstone
2025-08-22 19:42 ` Boris Burkov
2025-08-27 14:08 ` Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 09/16] btrfs: release BG lock before calling btrfs_link_bg_list() Mark Harmstone
2025-08-16 0:32 ` Boris Burkov
2025-08-27 15:35 ` Mark Harmstone
2025-08-27 15:48 ` Filipe Manana
2025-08-27 15:52 ` Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 10/16] btrfs: handle deletions from remapped block group Mark Harmstone
2025-08-16 0:28 ` Boris Burkov
2025-08-27 17:11 ` Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 11/16] btrfs: handle setting up relocation of block group with remap-tree Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 12/16] btrfs: move existing remaps before relocating block group Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 13/16] btrfs: replace identity maps with actual remaps when doing relocations Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 14/16] btrfs: add do_remap param to btrfs_discard_extent() Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 15/16] btrfs: add fully_remapped_bgs list Mark Harmstone
2025-08-16 0:56 ` Boris Burkov
2025-08-27 18:51 ` Mark Harmstone
2025-08-13 14:34 ` [PATCH v2 16/16] btrfs: allow balancing remap tree Mark Harmstone
2025-08-16 1:02 ` Boris Burkov
2025-09-02 14:58 ` Mark Harmstone
2025-09-02 15:21 ` Mark Harmstone
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250822191457.GA492925@zen.localdomain \
--to=boris@bur.io \
--cc=linux-btrfs@vger.kernel.org \
--cc=mark@harmstone.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox