From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Josef Bacik <josef@toxicpanda.com>,
linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 8/8] btrfs: add support for multiple global roots
Date: Sat, 6 Nov 2021 09:18:02 +0800
Message-ID: <0595f1b5-2d5c-c5ca-cfad-efb753afec1b@gmx.com>
In-Reply-To: <a6f403691bdec22e8e052f699ae52f18875cb870.1636145221.git.josef@toxicpanda.com>

On 2021/11/6 04:49, Josef Bacik wrote:
> With extent tree v2 you will be able to create multiple csum, extent,
> and free space trees. They will be used based on the block group, which
> will now use the block_group_item->chunk_objectid to point to the set of
> global roots that it will use. When allocating new block groups we'll
> simply mod the gigabyte offset of the block group against the number of
> global roots we have and that will be the block group's global id.
>
> From there we can take the bytenr that we're modifying in the respective
> tree, look up the block group and get that block group's corresponding
> global root id. From there we can get to the appropriate global root
> for that bytenr.
>
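
For readers following along, the mapping described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct bg`, `bg_global_root_id()` and `global_root_id_for_bytenr()` are hypothetical names, and a linear scan stands in for the real btrfs_lookup_block_group() lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/* Hypothetical, simplified stand-in for struct btrfs_block_group. */
struct bg {
	uint64_t start;          /* logical start of the block group */
	uint64_t length;         /* size of the block group in bytes */
	uint64_t global_root_id; /* which set of global roots it uses */
};

/*
 * Same arithmetic as the hunk in btrfs_make_block_group(): take the
 * gigabyte offset of the block group, mod the number of global roots.
 */
static uint64_t bg_global_root_id(uint64_t bg_start, uint64_t nr_global_roots)
{
	assert(nr_global_roots > 0);
	return (bg_start / SZ_1G) % nr_global_roots;
}

/*
 * Assumption: a linear scan replaces the kernel's block group lookup.
 * Returns the global root id of the block group containing bytenr,
 * or 0 if none is found (the patch also falls back to 0).
 */
static uint64_t global_root_id_for_bytenr(const struct bg *bgs, size_t n,
					  uint64_t bytenr)
{
	for (size_t i = 0; i < n; i++) {
		if (bytenr >= bgs[i].start &&
		    bytenr - bgs[i].start < bgs[i].length)
			return bgs[i].global_root_id;
	}
	return 0;
}
```

With 4 global roots, a block group starting at 6G gets id 2, and any bytenr inside that block group resolves to the same id.
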
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> fs/btrfs/block-group.c | 11 +++++++--
> fs/btrfs/block-group.h | 1 +
> fs/btrfs/ctree.h | 2 ++
> fs/btrfs/disk-io.c | 49 +++++++++++++++++++++++++++++++-------
> fs/btrfs/free-space-tree.c | 2 ++
> fs/btrfs/transaction.c | 15 ++++++++++++
> fs/btrfs/tree-checker.c | 21 ++++++++++++++--
> 7 files changed, 88 insertions(+), 13 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 7eb0a8632a01..85516f2fd5da 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -2002,6 +2002,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
> cache->length = key->offset;
> cache->used = btrfs_stack_block_group_used(bgi);
> cache->flags = btrfs_stack_block_group_flags(bgi);
> + cache->global_root_id = btrfs_stack_block_group_chunk_objectid(bgi);
>
> set_free_space_tree_thresholds(cache);
>
> @@ -2284,7 +2285,7 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans,
> spin_lock(&block_group->lock);
> btrfs_set_stack_block_group_used(&bgi, block_group->used);
> btrfs_set_stack_block_group_chunk_objectid(&bgi,
> - BTRFS_FIRST_CHUNK_TREE_OBJECTID);
> + block_group->global_root_id);
> btrfs_set_stack_block_group_flags(&bgi, block_group->flags);
> key.objectid = block_group->start;
> key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
> @@ -2460,6 +2461,12 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
> cache->flags = type;
> cache->last_byte_to_unpin = (u64)-1;
> cache->cached = BTRFS_CACHE_FINISHED;
> + cache->global_root_id = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
> +
> + if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
> + cache->global_root_id = div64_u64(cache->start, SZ_1G) %
> + fs_info->nr_global_roots;
> +

Any special reason for this complex global_root_id calculation?

My initial assumption for global trees was pretty simple: root key items
like (CSUM_TREE, ROOT_ITEM, bg bytenr) or (EXTENT_TREE, ROOT_ITEM, bg
bytenr).

But that is definitely not the case here, so I'm wondering why we're not
using something simpler.
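
To make the comparison concrete, here is a rough userspace sketch of the two root key layouts. It is an illustration only: `struct root_key`, `key_per_block_group()` and `key_modulo()` are hypothetical names, while the objectid and key type values are the ones defined in btrfs_tree.h.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/* Values as defined in include/uapi/linux/btrfs_tree.h. */
#define BTRFS_EXTENT_TREE_OBJECTID 2ULL
#define BTRFS_CSUM_TREE_OBJECTID   7ULL
#define BTRFS_ROOT_ITEM_KEY        132

/* Hypothetical, simplified stand-in for struct btrfs_key. */
struct root_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Layout I had assumed: the block group bytenr is the key offset. */
static struct root_key key_per_block_group(uint64_t objectid, uint64_t bg_start)
{
	struct root_key key = { objectid, BTRFS_ROOT_ITEM_KEY, bg_start };
	return key;
}

/* Layout in this patch: the key offset is the modulo-derived global id. */
static struct root_key key_modulo(uint64_t objectid, uint64_t bg_start,
				  uint64_t nr_global_roots)
{
	struct root_key key = { objectid, BTRFS_ROOT_ITEM_KEY, 0 };

	assert(nr_global_roots > 0);
	key.offset = (bg_start / SZ_1G) % nr_global_roots;
	return key;
}
```

In the first layout every block group gets its own root, so the number of roots follows the number of block groups. In the second, block groups at 5G and 9G both map to offset 1 when there are 4 global roots, so the number of roots stays fixed.
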

Thanks,
Qu

> if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
> cache->needs_free_space = 1;
>
> @@ -2676,7 +2683,7 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
> bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
> btrfs_set_stack_block_group_used(&bgi, cache->used);
> btrfs_set_stack_block_group_chunk_objectid(&bgi,
> - BTRFS_FIRST_CHUNK_TREE_OBJECTID);
> + cache->global_root_id);
> btrfs_set_stack_block_group_flags(&bgi, cache->flags);
> write_extent_buffer(leaf, &bgi, bi, sizeof(bgi));
> btrfs_mark_buffer_dirty(leaf);
> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> index 5878b7ce3b78..93aabc68bb6a 100644
> --- a/fs/btrfs/block-group.h
> +++ b/fs/btrfs/block-group.h
> @@ -68,6 +68,7 @@ struct btrfs_block_group {
> u64 bytes_super;
> u64 flags;
> u64 cache_generation;
> + u64 global_root_id;
>
> /*
> * If the free space extent count exceeds this number, convert the block
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index b57367141b95..7de0cd2b87ec 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1057,6 +1057,8 @@ struct btrfs_fs_info {
> spinlock_t relocation_bg_lock;
> u64 data_reloc_bg;
>
> + u64 nr_global_roots;
> +
> spinlock_t zone_active_bgs_lock;
> struct list_head zone_active_bgs;
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 45b2bde43150..a8bc00d17b26 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1295,13 +1295,33 @@ struct btrfs_root *btrfs_global_root(struct btrfs_fs_info *fs_info,
> return root;
> }
>
> +static u64 btrfs_global_root_id(struct btrfs_fs_info *fs_info, u64 bytenr)
> +{
> + struct btrfs_block_group *block_group;
> + u64 ret;
> +
> + if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
> + return 0;
> +
> + if (likely(bytenr))
> + block_group = btrfs_lookup_block_group(fs_info, bytenr);
> + else
> + block_group = btrfs_lookup_first_block_group(fs_info, bytenr);
> + ASSERT(block_group);
> + if (!block_group)
> + return 0;
> + ret = block_group->global_root_id;
> + btrfs_put_block_group(block_group);
> + return ret;
> +}
> +
> struct btrfs_root *btrfs_csum_root(struct btrfs_fs_info *fs_info,
> u64 bytenr)
> {
> struct btrfs_key key = {
> .objectid = BTRFS_CSUM_TREE_OBJECTID,
> .type = BTRFS_ROOT_ITEM_KEY,
> - .offset = 0,
> + .offset = btrfs_global_root_id(fs_info, bytenr),
> };
>
> return btrfs_global_root(fs_info, &key);
> @@ -1313,7 +1333,7 @@ struct btrfs_root *btrfs_extent_root(struct btrfs_fs_info *fs_info,
> struct btrfs_key key = {
> .objectid = BTRFS_EXTENT_TREE_OBJECTID,
> .type = BTRFS_ROOT_ITEM_KEY,
> - .offset = 0,
> + .offset = btrfs_global_root_id(fs_info, bytenr),
> };
>
> return btrfs_global_root(fs_info, &key);
> @@ -2094,7 +2114,6 @@ static void backup_super_roots(struct btrfs_fs_info *info)
> {
> const int next_backup = info->backup_root_index;
> struct btrfs_root_backup *root_backup;
> - struct btrfs_root *csum_root = btrfs_csum_root(info, 0);
>
> root_backup = info->super_for_commit->super_roots + next_backup;
>
> @@ -2128,6 +2147,7 @@ static void backup_super_roots(struct btrfs_fs_info *info)
> btrfs_header_level(info->block_group_root->node));
> } else {
> struct btrfs_root *extent_root = btrfs_extent_root(info, 0);
> + struct btrfs_root *csum_root = btrfs_csum_root(info, 0);
>
> btrfs_set_backup_extent_root(root_backup,
> extent_root->node->start);
> @@ -2135,6 +2155,12 @@ static void backup_super_roots(struct btrfs_fs_info *info)
> btrfs_header_generation(extent_root->node));
> btrfs_set_backup_extent_root_level(root_backup,
> btrfs_header_level(extent_root->node));
> +
> + btrfs_set_backup_csum_root(root_backup, csum_root->node->start);
> + btrfs_set_backup_csum_root_gen(root_backup,
> + btrfs_header_generation(csum_root->node));
> + btrfs_set_backup_csum_root_level(root_backup,
> + btrfs_header_level(csum_root->node));
> }
>
> /*
> @@ -2156,12 +2182,6 @@ static void backup_super_roots(struct btrfs_fs_info *info)
> btrfs_set_backup_dev_root_level(root_backup,
> btrfs_header_level(info->dev_root->node));
>
> - btrfs_set_backup_csum_root(root_backup, csum_root->node->start);
> - btrfs_set_backup_csum_root_gen(root_backup,
> - btrfs_header_generation(csum_root->node));
> - btrfs_set_backup_csum_root_level(root_backup,
> - btrfs_header_level(csum_root->node));
> -
> btrfs_set_backup_total_bytes(root_backup,
> btrfs_super_total_bytes(info->super_copy));
> btrfs_set_backup_bytes_used(root_backup,
> @@ -2550,6 +2570,7 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
> {
> struct btrfs_fs_info *fs_info = tree_root->fs_info;
> struct btrfs_root *root;
> + u64 max_global_id = 0;
> int ret;
> struct btrfs_key key = {
> .objectid = objectid,
> @@ -2586,6 +2607,13 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
> break;
> btrfs_release_path(path);
>
> + /*
> + * Just worry about this for extent tree, it'll be the same for
> + * everybody.
> + */
> + if (objectid == BTRFS_EXTENT_TREE_OBJECTID)
> + max_global_id = max(max_global_id, key.offset);
> +
> found = true;
> root = read_tree_root_path(tree_root, path, &key);
> if (IS_ERR(root)) {
> @@ -2603,6 +2631,9 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
> }
> btrfs_release_path(path);
>
> + if (objectid == BTRFS_EXTENT_TREE_OBJECTID)
> + fs_info->nr_global_roots = max_global_id + 1;
> +
> if (!found || ret) {
> if (objectid == BTRFS_CSUM_TREE_OBJECTID)
> set_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
> diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
> index cf227450f356..60a73bcffaf1 100644
> --- a/fs/btrfs/free-space-tree.c
> +++ b/fs/btrfs/free-space-tree.c
> @@ -24,6 +24,8 @@ static struct btrfs_root *btrfs_free_space_root(
> .type = BTRFS_ROOT_ITEM_KEY,
> .offset = 0,
> };
> + if (btrfs_fs_incompat(block_group->fs_info, EXTENT_TREE_V2))
> + key.offset = block_group->global_root_id;
> return btrfs_global_root(block_group->fs_info, &key);
> }
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index ba8dd90ac3ce..e343ff8db05d 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1827,6 +1827,14 @@ static void update_super_roots(struct btrfs_fs_info *fs_info)
> super->cache_generation = 0;
> if (test_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags))
> super->uuid_tree_generation = root_item->generation;
> +
> + if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
> + root_item = &fs_info->block_group_root->root_item;
> +
> + super->block_group_root = root_item->bytenr;
> + super->block_group_root_generation = root_item->generation;
> + super->block_group_root_level = root_item->level;
> + }
> }
>
> int btrfs_transaction_in_commit(struct btrfs_fs_info *info)
> @@ -2261,6 +2269,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
> list_add_tail(&fs_info->chunk_root->dirty_list,
> &cur_trans->switch_commits);
>
> + if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
> + btrfs_set_root_node(&fs_info->block_group_root->root_item,
> + fs_info->block_group_root->node);
> + list_add_tail(&fs_info->block_group_root->dirty_list,
> + &cur_trans->switch_commits);
> + }
> +
> switch_commit_roots(trans);
>
> ASSERT(list_empty(&cur_trans->dirty_bgs));
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 1c33dd0e4afc..572f52d78297 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -639,8 +639,10 @@ static void block_group_err(const struct extent_buffer *eb, int slot,
> static int check_block_group_item(struct extent_buffer *leaf,
> struct btrfs_key *key, int slot)
> {
> + struct btrfs_fs_info *fs_info = leaf->fs_info;
> struct btrfs_block_group_item bgi;
> u32 item_size = btrfs_item_size_nr(leaf, slot);
> + u64 chunk_objectid;
> u64 flags;
> u64 type;
>
> @@ -663,8 +665,23 @@ static int check_block_group_item(struct extent_buffer *leaf,
>
> read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
> sizeof(bgi));
> - if (unlikely(btrfs_stack_block_group_chunk_objectid(&bgi) !=
> - BTRFS_FIRST_CHUNK_TREE_OBJECTID)) {
> + chunk_objectid = btrfs_stack_block_group_chunk_objectid(&bgi);
> + if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
> + /*
> + * We don't init the nr_global_roots until we load the global
> + * roots, so this could be 0 at mount time. If it's 0 we'll
> + * just assume we're fine, and later we'll check against our
> + * actual value.
> + */
> + if (unlikely(fs_info->nr_global_roots &&
> + chunk_objectid >= fs_info->nr_global_roots)) {
> + block_group_err(leaf, slot,
> + "invalid block group global root id, have %llu, needs to be <= %llu",
> + chunk_objectid,
> + fs_info->nr_global_roots);
> + return -EUCLEAN;
> + }
> + } else if (unlikely(chunk_objectid != BTRFS_FIRST_CHUNK_TREE_OBJECTID)) {
> block_group_err(leaf, slot,
> "invalid block group chunk objectid, have %llu expect %llu",
> btrfs_stack_block_group_chunk_objectid(&bgi),
>