[PATCH 0/8] btrfs: extent tree v2, support for global roots


* [PATCH 0/8] btrfs: extent tree v2, support for global roots
@ 2021-11-05 20:49 Josef Bacik
  2021-11-05 20:49 ` [PATCH 1/8] btrfs: add definition for EXTENT_TREE_V2 Josef Bacik
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Josef Bacik @ 2021-11-05 20:49 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Hello,

This is the kernel side of the global roots and block group root support.  The
motivation for this change is described in the progs patches.  The important
part here is that I've disabled qgroups and balance for now; this support will
be added back later.  I've also changed the global block rsv size calculation,
but it gives exactly the same result for !EXTENT_TREE_V2.  And finally there's
the support for loading the roots.  This doesn't panic and doesn't introduce
any performance regressions.  I've also hidden the support behind
CONFIG_BTRFS_DEBUG so it doesn't get used accidentally.  Thanks,

Josef

Josef Bacik (8):
  btrfs: add definition for EXTENT_TREE_V2
  btrfs: disable balance for extent tree v2 for now
  btrfs: disable qgroups in extent tree v2
  btrfs: use metadata usage for global block rsv in extent tree v2
  btrfs: tree-checker: don't fail on empty extent roots for extent tree
    v2
  btrfs: abstract out loading the tree root
  btrfs: add code to support the block group root
  btrfs: add support for multiple global roots

 fs/btrfs/block-group.c          |  11 +-
 fs/btrfs/block-group.h          |   1 +
 fs/btrfs/block-rsv.c            |  16 +--
 fs/btrfs/ctree.h                |  46 ++++++++-
 fs/btrfs/disk-io.c              | 178 +++++++++++++++++++++++---------
 fs/btrfs/disk-io.h              |   2 +
 fs/btrfs/free-space-tree.c      |   2 +
 fs/btrfs/print-tree.c           |   1 +
 fs/btrfs/qgroup.c               |   6 ++
 fs/btrfs/sysfs.c                |   5 +-
 fs/btrfs/transaction.c          |  15 +++
 fs/btrfs/tree-checker.c         |  35 ++++++-
 fs/btrfs/volumes.c              |   6 ++
 include/trace/events/btrfs.h    |   1 +
 include/uapi/linux/btrfs.h      |   1 +
 include/uapi/linux/btrfs_tree.h |   3 +
 16 files changed, 266 insertions(+), 63 deletions(-)

-- 
2.26.3



* [PATCH 1/8] btrfs: add definition for EXTENT_TREE_V2
  2021-11-05 20:49 [PATCH 0/8] btrfs: extent tree v2, support for global roots Josef Bacik
@ 2021-11-05 20:49 ` Josef Bacik
  2021-11-05 20:49 ` [PATCH 2/8] btrfs: disable balance for extent tree v2 for now Josef Bacik
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Josef Bacik @ 2021-11-05 20:49 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

This adds the initial definition of the EXTENT_TREE_V2 incompat feature
flag.  This also hides the support behind CONFIG_BTRFS_DEBUG.

THIS IS AN IN-DEVELOPMENT FORMAT CHANGE, DO NOT USE UNLESS YOU ARE A
DEVELOPER OR A TESTER.

The format is in flux and will be added in stages; any fs will need to
be re-made between updates to the format.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h           | 18 ++++++++++++++++++
 fs/btrfs/sysfs.c           |  5 ++++-
 include/uapi/linux/btrfs.h |  1 +
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d32aa0fe1415..8ec2f068a1c2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -297,6 +297,23 @@ static_assert(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
 #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET	0ULL
 #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR	0ULL
 
+#ifdef CONFIG_BTRFS_DEBUG
+#define BTRFS_FEATURE_INCOMPAT_SUPP			\
+	(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF |		\
+	 BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL |	\
+	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
+	 BTRFS_FEATURE_INCOMPAT_BIG_METADATA |		\
+	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO |		\
+	 BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD |		\
+	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
+	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
+	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
+	 BTRFS_FEATURE_INCOMPAT_NO_HOLES	|	\
+	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID	|	\
+	 BTRFS_FEATURE_INCOMPAT_RAID1C34	|	\
+	 BTRFS_FEATURE_INCOMPAT_ZONED		|	\
+	 BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2)
+#else
 #define BTRFS_FEATURE_INCOMPAT_SUPP			\
 	(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF |		\
 	 BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL |	\
@@ -311,6 +328,7 @@ static_assert(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
 	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID	|	\
 	 BTRFS_FEATURE_INCOMPAT_RAID1C34	|	\
 	 BTRFS_FEATURE_INCOMPAT_ZONED)
+#endif
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index f9eff3b0f77c..36f545ae1264 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -283,9 +283,11 @@ BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES);
 BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID);
 BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
 BTRFS_FEAT_ATTR_INCOMPAT(raid1c34, RAID1C34);
-/* Remove once support for zoned allocation is feature complete */
 #ifdef CONFIG_BTRFS_DEBUG
+/* Remove once support for zoned allocation is feature complete */
 BTRFS_FEAT_ATTR_INCOMPAT(zoned, ZONED);
+/* Remove once support for extent tree v2 is feature complete */
+BTRFS_FEAT_ATTR_INCOMPAT(extent_tree_v2, EXTENT_TREE_V2);
 #endif
 #ifdef CONFIG_FS_VERITY
 BTRFS_FEAT_ATTR_COMPAT_RO(verity, VERITY);
@@ -314,6 +316,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(raid1c34),
 #ifdef CONFIG_BTRFS_DEBUG
 	BTRFS_FEAT_ATTR_PTR(zoned),
+	BTRFS_FEAT_ATTR_PTR(extent_tree_v2),
 #endif
 #ifdef CONFIG_FS_VERITY
 	BTRFS_FEAT_ATTR_PTR(verity),
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index c1a665d87f61..bd29869448e3 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -309,6 +309,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID	(1ULL << 10)
 #define BTRFS_FEATURE_INCOMPAT_RAID1C34		(1ULL << 11)
 #define BTRFS_FEATURE_INCOMPAT_ZONED		(1ULL << 12)
+#define BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2	(1ULL << 13)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;
-- 
2.26.3
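
The flag gating in this patch can be illustrated with a small userspace
sketch.  It is only an approximation: the bit values come from the uapi hunk
above, but the DEMO_ names, the reduced flag set, and the runtime "debug"
switch standing in for the CONFIG_BTRFS_DEBUG #ifdef are assumptions made
for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as they appear in the include/uapi/linux/btrfs.h hunk. */
#define DEMO_INCOMPAT_RAID1C34       (1ULL << 11)
#define DEMO_INCOMPAT_ZONED          (1ULL << 12)
#define DEMO_INCOMPAT_EXTENT_TREE_V2 (1ULL << 13)

/*
 * Hypothetical runtime stand-in for the CONFIG_BTRFS_DEBUG #ifdef: the v2
 * bit is part of the supported mask only when debug is non-zero.  Only a
 * few of the real incompat flags are modelled here.
 */
uint64_t demo_incompat_supp(int debug)
{
	uint64_t supp = DEMO_INCOMPAT_RAID1C34 | DEMO_INCOMPAT_ZONED;

	if (debug)
		supp |= DEMO_INCOMPAT_EXTENT_TREE_V2;
	return supp;
}

/* Returns the incompat bits set on disk that this build does not support. */
uint64_t demo_unsupported_incompat(uint64_t sb_flags, uint64_t supp)
{
	return sb_flags & ~supp;
}
```

A non-zero result from demo_unsupported_incompat() corresponds to the case
where a mount has to be refused, which is how a non-debug build would treat
a filesystem that carries the EXTENT_TREE_V2 bit.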



* [PATCH 2/8] btrfs: disable balance for extent tree v2 for now
  2021-11-05 20:49 [PATCH 0/8] btrfs: extent tree v2, support for global roots Josef Bacik
  2021-11-05 20:49 ` [PATCH 1/8] btrfs: add definition for EXTENT_TREE_V2 Josef Bacik
@ 2021-11-05 20:49 ` Josef Bacik
  2021-11-05 20:49 ` [PATCH 3/8] btrfs: disable qgroups in extent tree v2 Josef Bacik
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Josef Bacik @ 2021-11-05 20:49 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Global root IDs make it problematic to do backref lookups for balance.
This isn't hard to deal with, but future changes are going to make it
impossible to look up backrefs on any cowonly roots, so go ahead and
disable balance on extent tree v2 for now, until we can add balance
support back in future patches.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 85842eb1f7b1..5e62b97cb265 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3216,6 +3216,12 @@ int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 	u64 length;
 	int ret;
 
+	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+		btrfs_err(fs_info,
+			  "relocate: not supported on extent tree v2 yet.");
+		return -EINVAL;
+	}
+
 	/*
 	 * Prevent races with automatic removal of unused block groups.
 	 * After we relocate and before we remove the chunk with offset
-- 
2.26.3
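
The early-return guard used here (and again for qgroups in the next patch)
can be sketched in userspace as follows.  The demo_ types and helpers are
hypothetical stand-ins for btrfs_fs_info and btrfs_fs_incompat(); only the
bit value and the -EINVAL return mirror the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_INCOMPAT_EXTENT_TREE_V2 (1ULL << 13)

/* Hypothetical, heavily reduced stand-in for struct btrfs_fs_info. */
struct demo_fs_info {
	uint64_t incompat_flags;
};

/* Stand-in for btrfs_fs_incompat(): is the given feature bit set? */
int demo_fs_incompat(const struct demo_fs_info *fs, uint64_t flag)
{
	return (fs->incompat_flags & flag) != 0;
}

/* Refuse the operation up front, before any locks or state are touched. */
int demo_relocate_chunk(const struct demo_fs_info *fs, uint64_t chunk_offset)
{
	if (demo_fs_incompat(fs, DEMO_INCOMPAT_EXTENT_TREE_V2)) {
		fprintf(stderr, "relocate: not supported on extent tree v2 yet\n");
		return -EINVAL;
	}

	(void)chunk_offset;	/* the real relocation work would start here */
	return 0;
}
```

Placing the check first means the unsupported path has nothing to unwind.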



* [PATCH 3/8] btrfs: disable qgroups in extent tree v2
  2021-11-05 20:49 [PATCH 0/8] btrfs: extent tree v2, support for global roots Josef Bacik
  2021-11-05 20:49 ` [PATCH 1/8] btrfs: add definition for EXTENT_TREE_V2 Josef Bacik
  2021-11-05 20:49 ` [PATCH 2/8] btrfs: disable balance for extent tree v2 for now Josef Bacik
@ 2021-11-05 20:49 ` Josef Bacik
  2021-11-05 20:49 ` [PATCH 4/8] btrfs: use metadata usage for global block rsv " Josef Bacik
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Josef Bacik @ 2021-11-05 20:49 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Backref lookups are going to be drastically different with extent tree
v2, so disable qgroups until we do the work to add this support for
extent tree v2.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/qgroup.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index be0c6ce3205d..2f37dbcfc35d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -940,6 +940,12 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 	int ret = 0;
 	int slot;
 
+	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+		btrfs_err(fs_info,
+			  "qgroups are currently unsupported in extent tree v2");
+		return -EINVAL;
+	}
+
 	mutex_lock(&fs_info->qgroup_ioctl_lock);
 	if (fs_info->quota_root)
 		goto out;
-- 
2.26.3



* [PATCH 4/8] btrfs: use metadata usage for global block rsv in extent tree v2
  2021-11-05 20:49 [PATCH 0/8] btrfs: extent tree v2, support for global roots Josef Bacik
                   ` (2 preceding siblings ...)
  2021-11-05 20:49 ` [PATCH 3/8] btrfs: disable qgroups in extent tree v2 Josef Bacik
@ 2021-11-05 20:49 ` Josef Bacik
  2021-11-05 20:49 ` [PATCH 5/8] btrfs: tree-checker: don't fail on empty extent roots for " Josef Bacik
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Josef Bacik @ 2021-11-05 20:49 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

With multiple csum and extent roots it is tricky to figure out the
total used across all of the different global roots.  Instead just use
the total metadata usage as a guide for the global rsv size.  This will
be adjusted up for the truncate minimum size, and clamped down to 512MiB
if it's too much.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-rsv.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index 7dcbe901b583..de6d8b4c8213 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -354,9 +354,8 @@ void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
 	struct btrfs_space_info *sinfo = block_rsv->space_info;
-	struct btrfs_root *extent_root = btrfs_extent_root(fs_info, 0);
-	struct btrfs_root *csum_root = btrfs_csum_root(fs_info, 0);
-	u64 num_bytes;
+	struct btrfs_root *root, *tmp;
+	u64 num_bytes = btrfs_root_used(&fs_info->tree_root->root_item);
 	unsigned min_items;
 
 	/*
@@ -364,9 +363,14 @@ void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
 	 * checksum tree and the root tree.  If the fs is empty we want to set
 	 * it to a minimal amount for safety.
 	 */
-	num_bytes = btrfs_root_used(&extent_root->root_item) +
-		btrfs_root_used(&csum_root->root_item) +
-		btrfs_root_used(&fs_info->tree_root->root_item);
+	read_lock(&fs_info->global_root_lock);
+	rbtree_postorder_for_each_entry_safe(root, tmp, &fs_info->global_root_tree,
+					     rb_node) {
+		if (root->root_key.objectid == BTRFS_EXTENT_TREE_OBJECTID ||
+		    root->root_key.objectid == BTRFS_CSUM_TREE_OBJECTID)
+			num_bytes += btrfs_root_used(&root->root_item);
+	}
+	read_unlock(&fs_info->global_root_lock);
 
 	/*
 	 * We at a minimum are going to modify the csum root, the tree root, and
-- 
2.26.3
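
The size calculation in the hunk above can be approximated in userspace like
this.  It is a sketch under assumptions: an array replaces the global root
rbtree, no locking is shown, and min_bytes is a hypothetical parameter
standing in for the minimum the kernel computes separately.  The object IDs
(2 for the extent tree, 7 for the csum tree) are the btrfs values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SZ_512M (512ULL * 1024 * 1024)

enum {
	DEMO_EXTENT_TREE_OBJECTID = 2,
	DEMO_CSUM_TREE_OBJECTID = 7,
};

/* Hypothetical reduced view of a global root: its id and bytes in use. */
struct demo_root {
	uint64_t objectid;
	uint64_t bytes_used;
};

uint64_t demo_global_rsv_size(uint64_t tree_root_used,
			      const struct demo_root *roots, size_t nr,
			      uint64_t min_bytes)
{
	uint64_t num_bytes = tree_root_used;
	size_t i;

	/* Sum every extent and csum root; other global roots are skipped. */
	for (i = 0; i < nr; i++) {
		if (roots[i].objectid == DEMO_EXTENT_TREE_OBJECTID ||
		    roots[i].objectid == DEMO_CSUM_TREE_OBJECTID)
			num_bytes += roots[i].bytes_used;
	}

	/* Adjust up to the minimum, then clamp down to 512MiB. */
	if (num_bytes < min_bytes)
		num_bytes = min_bytes;
	if (num_bytes > DEMO_SZ_512M)
		num_bytes = DEMO_SZ_512M;
	return num_bytes;
}
```

With a single extent root and a single csum root the sum matches the
previous three-term expression, which is why the result is unchanged for
filesystems without EXTENT_TREE_V2.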



* [PATCH 5/8] btrfs: tree-checker: don't fail on empty extent roots for extent tree v2
  2021-11-05 20:49 [PATCH 0/8] btrfs: extent tree v2, support for global roots Josef Bacik
                   ` (3 preceding siblings ...)
  2021-11-05 20:49 ` [PATCH 4/8] btrfs: use metadata usage for global block rsv " Josef Bacik
@ 2021-11-05 20:49 ` Josef Bacik
  2021-11-06  1:05   ` Qu Wenruo
  2021-11-05 20:49 ` [PATCH 6/8] btrfs: abstract out loading the tree root Josef Bacik
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Josef Bacik @ 2021-11-05 20:49 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

For extent tree v2 we can definitely have empty extent roots, so skip
this particular check if the feature flag is set.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/tree-checker.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 7733e8ac0a69..1c33dd0e4afc 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1633,7 +1633,6 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
 		/* These trees must never be empty */
 		if (unlikely(owner == BTRFS_ROOT_TREE_OBJECTID ||
 			     owner == BTRFS_CHUNK_TREE_OBJECTID ||
-			     owner == BTRFS_EXTENT_TREE_OBJECTID ||
 			     owner == BTRFS_DEV_TREE_OBJECTID ||
 			     owner == BTRFS_FS_TREE_OBJECTID ||
 			     owner == BTRFS_DATA_RELOC_TREE_OBJECTID)) {
@@ -1642,12 +1641,25 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
 				    owner);
 			return -EUCLEAN;
 		}
+
 		/* Unknown tree */
 		if (unlikely(owner == 0)) {
 			generic_err(leaf, 0,
 				"invalid owner, root 0 is not defined");
 			return -EUCLEAN;
 		}
+
+		/* EXTENT_TREE_V2 can have empty extent trees. */
+		if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
+			return 0;
+
+		if (unlikely(owner == BTRFS_EXTENT_TREE_OBJECTID)) {
+			generic_err(leaf, 0,
+			"invalid root, root %llu must never be empty",
+				    owner);
+			return -EUCLEAN;
+		}
+
 		return 0;
 	}
 
-- 
2.26.3
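
The resulting order of checks for an empty leaf can be sketched as a
standalone function.  This is an illustration only: demo_check_empty_leaf()
and the extent_tree_v2 argument are hypothetical, and DEMO_EUCLEAN is a
local constant defined with the Linux value (117) so the example does not
depend on the platform's errno.h.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_EUCLEAN 117	/* local copy of the Linux EUCLEAN value */

#define DEMO_ROOT_TREE_OBJECTID       1ULL
#define DEMO_EXTENT_TREE_OBJECTID     2ULL
#define DEMO_CHUNK_TREE_OBJECTID      3ULL
#define DEMO_DEV_TREE_OBJECTID        4ULL
#define DEMO_FS_TREE_OBJECTID         5ULL
#define DEMO_DATA_RELOC_TREE_OBJECTID ((uint64_t)-9)

/* Validate the owner of a leaf that contains zero items. */
int demo_check_empty_leaf(uint64_t owner, int extent_tree_v2)
{
	/* These trees must never be empty, with or without v2. */
	if (owner == DEMO_ROOT_TREE_OBJECTID ||
	    owner == DEMO_CHUNK_TREE_OBJECTID ||
	    owner == DEMO_DEV_TREE_OBJECTID ||
	    owner == DEMO_FS_TREE_OBJECTID ||
	    owner == DEMO_DATA_RELOC_TREE_OBJECTID)
		return -DEMO_EUCLEAN;

	/* Root 0 is not a defined tree. */
	if (owner == 0)
		return -DEMO_EUCLEAN;

	/* With extent tree v2, an empty extent root is legitimate. */
	if (extent_tree_v2)
		return 0;

	if (owner == DEMO_EXTENT_TREE_OBJECTID)
		return -DEMO_EUCLEAN;

	return 0;
}
```

Because the v2 test comes after the unconditional checks, only the extent
tree requirement is relaxed.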



* Re: [PATCH 5/8] btrfs: tree-checker: don't fail on empty extent roots for extent tree v2
  2021-11-05 20:49 ` [PATCH 5/8] btrfs: tree-checker: don't fail on empty extent roots for " Josef Bacik
@ 2021-11-06  1:05   ` Qu Wenruo
  0 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2021-11-06  1:05 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 2021/11/6 04:49, Josef Bacik wrote:
> For extent tree v2 we can definitely have empty extent roots, so skip
> this particular check if we have that set.

OK, I guess the changes in tree-checker are not yet complete.

I thought there would be more and bigger changes to support those
global roots.

But so far so good for tree-checker.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/tree-checker.c | 14 +++++++++++++-
>   1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 7733e8ac0a69..1c33dd0e4afc 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -1633,7 +1633,6 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
>   		/* These trees must never be empty */
>   		if (unlikely(owner == BTRFS_ROOT_TREE_OBJECTID ||
>   			     owner == BTRFS_CHUNK_TREE_OBJECTID ||
> -			     owner == BTRFS_EXTENT_TREE_OBJECTID ||
>   			     owner == BTRFS_DEV_TREE_OBJECTID ||
>   			     owner == BTRFS_FS_TREE_OBJECTID ||
>   			     owner == BTRFS_DATA_RELOC_TREE_OBJECTID)) {
> @@ -1642,12 +1641,25 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
>   				    owner);
>   			return -EUCLEAN;
>   		}
> +
>   		/* Unknown tree */
>   		if (unlikely(owner == 0)) {
>   			generic_err(leaf, 0,
>   				"invalid owner, root 0 is not defined");
>   			return -EUCLEAN;
>   		}
> +
> +		/* EXTENT_TREE_V2 can have empty extent trees. */
> +		if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
> +			return 0;
> +
> +		if (unlikely(owner == BTRFS_EXTENT_TREE_OBJECTID)) {
> +			generic_err(leaf, 0,
> +			"invalid root, root %llu must never be empty",
> +				    owner);
> +			return -EUCLEAN;
> +		}
> +
>   		return 0;
>   	}
>
>


* [PATCH 6/8] btrfs: abstract out loading the tree root
  2021-11-05 20:49 [PATCH 0/8] btrfs: extent tree v2, support for global roots Josef Bacik
                   ` (4 preceding siblings ...)
  2021-11-05 20:49 ` [PATCH 5/8] btrfs: tree-checker: don't fail on empty extent roots for " Josef Bacik
@ 2021-11-05 20:49 ` Josef Bacik
  2021-11-05 20:49 ` [PATCH 7/8] btrfs: add code to support the block group root Josef Bacik
  2021-11-05 20:49 ` [PATCH 8/8] btrfs: add support for multiple global roots Josef Bacik
  7 siblings, 0 replies; 21+ messages in thread
From: Josef Bacik @ 2021-11-05 20:49 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We're going to be adding more roots that need to be loaded from the
super block, so abstract out the code to read the tree_root from the
super block, and use this helper for the chunk root as well.  This will
make it simpler to load the new trees in the future.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/disk-io.c | 82 ++++++++++++++++++++++++++--------------------
 1 file changed, 47 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2a80b9b6d52d..db8e4856364e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2935,6 +2935,46 @@ static int btrfs_validate_write_super(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
+static int load_super_root(struct btrfs_root *root, u64 bytenr, u64 gen,
+			   int level)
+{
+	int ret = 0;
+
+	root->node = read_tree_block(root->fs_info, bytenr,
+				     root->root_key.objectid, gen, level, NULL);
+	if (IS_ERR(root->node)) {
+		ret = PTR_ERR(root->node);
+		root->node = NULL;
+	} else if (!extent_buffer_uptodate(root->node)) {
+		free_extent_buffer(root->node);
+		root->node = NULL;
+		ret = -EIO;
+	}
+
+	if (ret)
+		return ret;
+
+	btrfs_set_root_node(&root->root_item, root->node);
+	root->commit_root = btrfs_root_node(root);
+	btrfs_set_root_refs(&root->root_item, 1);
+	return ret;
+}
+
+static int load_important_roots(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_super_block *sb = fs_info->super_copy;
+	u64 gen, bytenr;
+	int level, ret;
+
+	bytenr = btrfs_super_root(sb);
+	gen = btrfs_super_generation(sb);
+	level = btrfs_super_root_level(sb);
+	ret = load_super_root(fs_info->tree_root, bytenr, gen, level);
+	if (ret)
+		btrfs_warn(fs_info, "couldn't read tree root");
+	return ret;
+}
+
 static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
 {
 	int backup_index = find_newest_super_backup(fs_info);
@@ -2945,9 +2985,6 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
 	int i;
 
 	for (i = 0; i < BTRFS_NUM_BACKUP_ROOTS; i++) {
-		u64 generation;
-		int level;
-
 		if (handle_error) {
 			if (!IS_ERR(tree_root->node))
 				free_extent_buffer(tree_root->node);
@@ -2972,29 +3009,13 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
 			if (ret < 0)
 				return ret;
 		}
-		generation = btrfs_super_generation(sb);
-		level = btrfs_super_root_level(sb);
-		tree_root->node = read_tree_block(fs_info, btrfs_super_root(sb),
-						  BTRFS_ROOT_TREE_OBJECTID,
-						  generation, level, NULL);
-		if (IS_ERR(tree_root->node)) {
-			handle_error = true;
-			ret = PTR_ERR(tree_root->node);
-			tree_root->node = NULL;
-			btrfs_warn(fs_info, "couldn't read tree root");
-			continue;
 
-		} else if (!extent_buffer_uptodate(tree_root->node)) {
+		ret = load_important_roots(fs_info);
+		if (ret) {
 			handle_error = true;
-			ret = -EIO;
-			btrfs_warn(fs_info, "error while reading tree root");
 			continue;
 		}
 
-		btrfs_set_root_node(&tree_root->root_item, tree_root->node);
-		tree_root->commit_root = btrfs_root_node(tree_root);
-		btrfs_set_root_refs(&tree_root->root_item, 1);
-
 		/*
 		 * No need to hold btrfs_root::objectid_mutex since the fs
 		 * hasn't been fully initialised and we are the only user
@@ -3014,8 +3035,8 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
 		}
 
 		/* All successful */
-		fs_info->generation = generation;
-		fs_info->last_trans_committed = generation;
+		fs_info->generation = btrfs_header_generation(tree_root->node);
+		fs_info->last_trans_committed = fs_info->generation;
 
 		/* Always begin writing backup roots after the one being used */
 		if (backup_index < 0) {
@@ -3605,21 +3626,12 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 
 	generation = btrfs_super_chunk_root_generation(disk_super);
 	level = btrfs_super_chunk_root_level(disk_super);
-
-	chunk_root->node = read_tree_block(fs_info,
-					   btrfs_super_chunk_root(disk_super),
-					   BTRFS_CHUNK_TREE_OBJECTID,
-					   generation, level, NULL);
-	if (IS_ERR(chunk_root->node) ||
-	    !extent_buffer_uptodate(chunk_root->node)) {
+	ret = load_super_root(chunk_root, btrfs_super_chunk_root(disk_super),
+			      generation, level);
+	if (ret) {
 		btrfs_err(fs_info, "failed to read chunk root");
-		if (!IS_ERR(chunk_root->node))
-			free_extent_buffer(chunk_root->node);
-		chunk_root->node = NULL;
 		goto fail_tree_roots;
 	}
-	btrfs_set_root_node(&chunk_root->root_item, chunk_root->node);
-	chunk_root->commit_root = btrfs_root_node(chunk_root);
 
 	read_extent_buffer(chunk_root->node, fs_info->chunk_tree_uuid,
 			   offsetof(struct btrfs_header, chunk_tree_uuid),
-- 
2.26.3
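
The shape of the refactoring, one helper that loads a root from a
(bytenr, generation, level) triple and is shared by the tree root and the
chunk root, can be modelled in userspace.  Everything here is a mock: the
demo_ types, the in-memory "disk" array, and the choice of -ENOENT for a
missing block are assumptions for the example, not the kernel's behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory model of a tree block. */
struct demo_block {
	uint64_t bytenr;
	uint64_t gen;
	int level;
	int uptodate;
};

struct demo_root {
	uint64_t objectid;
	const struct demo_block *node;
};

/* Mock reader: find the block matching the expected location and header. */
const struct demo_block *demo_read_block(const struct demo_block *disk,
					 size_t nr, uint64_t bytenr,
					 uint64_t gen, int level)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (disk[i].bytenr == bytenr && disk[i].gen == gen &&
		    disk[i].level == level)
			return &disk[i];
	}
	return NULL;
}

/* Shared helper: on any failure root->node is left NULL. */
int demo_load_super_root(struct demo_root *root,
			 const struct demo_block *disk, size_t nr,
			 uint64_t bytenr, uint64_t gen, int level)
{
	const struct demo_block *b;

	root->node = NULL;
	b = demo_read_block(disk, nr, bytenr, gen, level);
	if (!b)
		return -ENOENT;
	if (!b->uptodate)
		return -EIO;

	root->node = b;
	return 0;
}
```

Keeping the error cleanup inside the helper is what lets each caller shrink
to a single call followed by a check of the return value.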



* [PATCH 7/8] btrfs: add code to support the block group root
  2021-11-05 20:49 [PATCH 0/8] btrfs: extent tree v2, support for global roots Josef Bacik
                   ` (5 preceding siblings ...)
  2021-11-05 20:49 ` [PATCH 6/8] btrfs: abstract out loading the tree root Josef Bacik
@ 2021-11-05 20:49 ` Josef Bacik
  2021-11-06  1:11   ` Qu Wenruo
  2021-11-05 20:49 ` [PATCH 8/8] btrfs: add support for multiple global roots Josef Bacik
  7 siblings, 1 reply; 21+ messages in thread
From: Josef Bacik @ 2021-11-05 20:49 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

This code adds the on-disk structures for the block group root, which
will hold the block group items for extent tree v2.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h                | 26 ++++++++++++++++-
 fs/btrfs/disk-io.c              | 49 ++++++++++++++++++++++++++++-----
 fs/btrfs/disk-io.h              |  2 ++
 fs/btrfs/print-tree.c           |  1 +
 include/trace/events/btrfs.h    |  1 +
 include/uapi/linux/btrfs_tree.h |  3 ++
 6 files changed, 74 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8ec2f068a1c2..b57367141b95 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -271,8 +271,13 @@ struct btrfs_super_block {
 	/* the UUID written into btree blocks */
 	u8 metadata_uuid[BTRFS_FSID_SIZE];
 
+	__le64 block_group_root;
+	__le64 block_group_root_generation;
+	u8 block_group_root_level;
+
 	/* future expansion */
-	__le64 reserved[28];
+	u8 reserved8[7];
+	__le64 reserved[25];
 	u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
 	struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
 
@@ -648,6 +653,7 @@ struct btrfs_fs_info {
 	struct btrfs_root *quota_root;
 	struct btrfs_root *uuid_root;
 	struct btrfs_root *data_reloc_root;
+	struct btrfs_root *block_group_root;
 
 	/* the log root tree is a directory of all the other log roots */
 	struct btrfs_root *log_root_tree;
@@ -2326,6 +2332,17 @@ BTRFS_SETGET_STACK_FUNCS(backup_bytes_used, struct btrfs_root_backup,
 BTRFS_SETGET_STACK_FUNCS(backup_num_devices, struct btrfs_root_backup,
 		   num_devices, 64);
 
+/*
+ * for extent tree v2 we overload the extent root with the block group root, as
+ * we will have multiple extent roots.
+ */
+BTRFS_SETGET_STACK_FUNCS(backup_block_group_root, struct btrfs_root_backup,
+			 extent_root, 64);
+BTRFS_SETGET_STACK_FUNCS(backup_block_group_root_gen, struct btrfs_root_backup,
+			 extent_root_gen, 64);
+BTRFS_SETGET_STACK_FUNCS(backup_block_group_root_level,
+			 struct btrfs_root_backup, extent_root_level, 8);
+
 /* struct btrfs_balance_item */
 BTRFS_SETGET_FUNCS(balance_flags, struct btrfs_balance_item, flags, 64);
 
@@ -2460,6 +2477,13 @@ BTRFS_SETGET_STACK_FUNCS(super_cache_generation, struct btrfs_super_block,
 BTRFS_SETGET_STACK_FUNCS(super_magic, struct btrfs_super_block, magic, 64);
 BTRFS_SETGET_STACK_FUNCS(super_uuid_tree_generation, struct btrfs_super_block,
 			 uuid_tree_generation, 64);
+BTRFS_SETGET_STACK_FUNCS(super_block_group_root, struct btrfs_super_block,
+			 block_group_root, 64);
+BTRFS_SETGET_STACK_FUNCS(super_block_group_root_generation,
+			 struct btrfs_super_block,
+			 block_group_root_generation, 64);
+BTRFS_SETGET_STACK_FUNCS(super_block_group_root_level, struct btrfs_super_block,
+			 block_group_root_level, 8);
 
 int btrfs_super_csum_size(const struct btrfs_super_block *s);
 const char *btrfs_super_csum_name(u16 csum_type);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index db8e4856364e..45b2bde43150 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1733,6 +1733,7 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
 	btrfs_put_root(fs_info->uuid_root);
 	btrfs_put_root(fs_info->fs_root);
 	btrfs_put_root(fs_info->data_reloc_root);
+	btrfs_put_root(fs_info->block_group_root);
 	btrfs_check_leaked_roots(fs_info);
 	btrfs_extent_buffer_leak_debug_check(fs_info);
 	kfree(fs_info->super_copy);
@@ -2093,7 +2094,6 @@ static void backup_super_roots(struct btrfs_fs_info *info)
 {
 	const int next_backup = info->backup_root_index;
 	struct btrfs_root_backup *root_backup;
-	struct btrfs_root *extent_root = btrfs_extent_root(info, 0);
 	struct btrfs_root *csum_root = btrfs_csum_root(info, 0);
 
 	root_backup = info->super_for_commit->super_roots + next_backup;
@@ -2119,11 +2119,23 @@ static void backup_super_roots(struct btrfs_fs_info *info)
 	btrfs_set_backup_chunk_root_level(root_backup,
 			       btrfs_header_level(info->chunk_root->node));
 
-	btrfs_set_backup_extent_root(root_backup, extent_root->node->start);
-	btrfs_set_backup_extent_root_gen(root_backup,
-			       btrfs_header_generation(extent_root->node));
-	btrfs_set_backup_extent_root_level(root_backup,
-			       btrfs_header_level(extent_root->node));
+	if (btrfs_fs_incompat(info, EXTENT_TREE_V2)) {
+		btrfs_set_backup_block_group_root(root_backup,
+					info->block_group_root->node->start);
+		btrfs_set_backup_block_group_root_gen(root_backup,
+			btrfs_header_generation(info->block_group_root->node));
+		btrfs_set_backup_block_group_root_level(root_backup,
+			btrfs_header_level(info->block_group_root->node));
+	} else {
+		struct btrfs_root *extent_root = btrfs_extent_root(info, 0);
+
+		btrfs_set_backup_extent_root(root_backup,
+					     extent_root->node->start);
+		btrfs_set_backup_extent_root_gen(root_backup,
+				btrfs_header_generation(extent_root->node));
+		btrfs_set_backup_extent_root_level(root_backup,
+					btrfs_header_level(extent_root->node));
+	}
 
 	/*
 	 * we might commit during log recovery, which happens before we set
@@ -2268,6 +2280,7 @@ static void free_root_pointers(struct btrfs_fs_info *info, bool free_chunk_root)
 	free_root_extent_buffers(info->uuid_root);
 	free_root_extent_buffers(info->fs_root);
 	free_root_extent_buffers(info->data_reloc_root);
+	free_root_extent_buffers(info->block_group_root);
 	if (free_chunk_root)
 		free_root_extent_buffers(info->chunk_root);
 }
@@ -2970,8 +2983,20 @@ static int load_important_roots(struct btrfs_fs_info *fs_info)
 	gen = btrfs_super_generation(sb);
 	level = btrfs_super_root_level(sb);
 	ret = load_super_root(fs_info->tree_root, bytenr, gen, level);
-	if (ret)
+	if (ret) {
 		btrfs_warn(fs_info, "couldn't read tree root");
+		return ret;
+	}
+
+	if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
+		return 0;
+
+	bytenr = btrfs_super_block_group_root(sb);
+	gen = btrfs_super_block_group_root_generation(sb);
+	level = btrfs_super_block_group_root_level(sb);
+	ret = load_super_root(fs_info->block_group_root, bytenr, gen, level);
+	if (ret)
+		btrfs_warn(fs_info, "couldn't read block group root");
 	return ret;
 }
 
@@ -2984,6 +3009,16 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
 	int ret = 0;
 	int i;
 
+	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+		struct btrfs_root *root;
+		root = btrfs_alloc_root(fs_info,
+					BTRFS_BLOCK_GROUP_TREE_OBJECTID,
+					GFP_KERNEL);
+		if (!root)
+			return -ENOMEM;
+		fs_info->block_group_root = root;
+	}
+
 	for (i = 0; i < BTRFS_NUM_BACKUP_ROOTS; i++) {
 		if (handle_error) {
 			if (!IS_ERR(tree_root->node))
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 80b45fcac72a..fe2e16e75a3b 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -113,6 +113,8 @@ static inline struct btrfs_root *btrfs_grab_root(struct btrfs_root *root)
 
 static inline struct btrfs_root *btrfs_block_group_root(struct btrfs_fs_info *fs_info)
 {
+	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
+		return fs_info->block_group_root;
 	return btrfs_extent_root(fs_info, 0);
 }
 
diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index aae1027bd76a..5d89c230af94 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -23,6 +23,7 @@ static const struct root_name_map root_map[] = {
 	{ BTRFS_QUOTA_TREE_OBJECTID,		"QUOTA_TREE"		},
 	{ BTRFS_UUID_TREE_OBJECTID,		"UUID_TREE"		},
 	{ BTRFS_FREE_SPACE_TREE_OBJECTID,	"FREE_SPACE_TREE"	},
+	{ BTRFS_BLOCK_GROUP_TREE_OBJECTID,	"BLOCK_GROUP_TREE"	},
 	{ BTRFS_DATA_RELOC_TREE_OBJECTID,	"DATA_RELOC_TREE"	},
 };
 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 0d729664b4b4..f068ff30d654 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -53,6 +53,7 @@ struct btrfs_space_info;
 		{ BTRFS_TREE_RELOC_OBJECTID,	"TREE_RELOC"	},	\
 		{ BTRFS_UUID_TREE_OBJECTID,	"UUID_TREE"	},	\
 		{ BTRFS_FREE_SPACE_TREE_OBJECTID, "FREE_SPACE_TREE" },	\
+		{ BTRFS_BLOCK_GROUP_TREE_OBJECTID, "BLOCK_GROUP_TREE" },\
 		{ BTRFS_DATA_RELOC_TREE_OBJECTID, "DATA_RELOC_TREE" })
 
 #define show_root_type(obj)						\
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index e1c4c732aaba..75c76b685972 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -53,6 +53,9 @@
 /* tracks free space in block groups. */
 #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
 
+/* holds the block group items for extent tree v2. */
+#define BTRFS_BLOCK_GROUP_TREE_OBJECTID 11ULL
+
 /* device stats in the device tree */
 #define BTRFS_DEV_STATS_OBJECTID 0ULL
 
-- 
2.26.3
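
The super block hunk carves the three new fields out of the reserved area
without changing the overall size: 8 + 8 + 1 + 7 + 25 * 8 = 224 bytes, the
same as the old 28 * 8.  A compile-time sketch of that arithmetic follows.
The demo_ structs model only the reserved region, uint64_t stands in for
__le64, and the packed attribute is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reserved area before the patch. */
struct demo_reserved_old {
	uint64_t reserved[28];
} __attribute__((packed));

/* Same region after the patch, with the block group root fields added. */
struct demo_reserved_new {
	uint64_t block_group_root;
	uint64_t block_group_root_generation;
	uint8_t block_group_root_level;
	uint8_t reserved8[7];	/* pads the u8 level back to 8-byte alignment */
	uint64_t reserved[25];
} __attribute__((packed));

/* The region must keep its size so later fields do not move on disk. */
_Static_assert(sizeof(struct demo_reserved_old) ==
	       sizeof(struct demo_reserved_new),
	       "reserved area changed size");
_Static_assert(offsetof(struct demo_reserved_new, reserved) == 24,
	       "remaining reserved words are not where expected");
```

In the kernel the existing static_assert on sizeof(struct btrfs_super_block)
provides the equivalent guard for the whole structure.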



* Re: [PATCH 7/8] btrfs: add code to support the block group root
  2021-11-05 20:49 ` [PATCH 7/8] btrfs: add code to support the block group root Josef Bacik
@ 2021-11-06  1:11   ` Qu Wenruo
  2021-11-08 19:36     ` Josef Bacik
  0 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2021-11-06  1:11 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 2021/11/6 04:49, Josef Bacik wrote:
> This code adds the on disk structures for the block group root, which
> will hold the block group items for extent tree v2.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/ctree.h                | 26 ++++++++++++++++-
>   fs/btrfs/disk-io.c              | 49 ++++++++++++++++++++++++++++-----
>   fs/btrfs/disk-io.h              |  2 ++
>   fs/btrfs/print-tree.c           |  1 +
>   include/trace/events/btrfs.h    |  1 +
>   include/uapi/linux/btrfs_tree.h |  3 ++
>   6 files changed, 74 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 8ec2f068a1c2..b57367141b95 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -271,8 +271,13 @@ struct btrfs_super_block {
>   	/* the UUID written into btree blocks */
>   	u8 metadata_uuid[BTRFS_FSID_SIZE];
>
> +	__le64 block_group_root;
> +	__le64 block_group_root_generation;
> +	u8 block_group_root_level;
> +

Is there any special reason that the block group root can't be put into
the root tree?

If it's to reduce unnecessary updates to the tree root, then I guess the
free space tree root should also have some space in the super block.

The free space tree(s) and extent tree(s) now have almost the same
hotness, so it would not make much sense for one to have a direct pointer
in the super block while the other doesn't.

>   	/* future expansion */
> -	__le64 reserved[28];
> +	u8 reserved8[7];
> +	__le64 reserved[25];
>   	u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
>   	struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
>
> @@ -648,6 +653,7 @@ struct btrfs_fs_info {
>   	struct btrfs_root *quota_root;
>   	struct btrfs_root *uuid_root;
>   	struct btrfs_root *data_reloc_root;
> +	struct btrfs_root *block_group_root;
>
>   	/* the log root tree is a directory of all the other log roots */
>   	struct btrfs_root *log_root_tree;
> @@ -2326,6 +2332,17 @@ BTRFS_SETGET_STACK_FUNCS(backup_bytes_used, struct btrfs_root_backup,
>   BTRFS_SETGET_STACK_FUNCS(backup_num_devices, struct btrfs_root_backup,
>   		   num_devices, 64);
>
> +/*
> + * for extent tree v2 we overload the extent root with the block group root, as
> + * we will have multiple extent roots.
> + */
> +BTRFS_SETGET_STACK_FUNCS(backup_block_group_root, struct btrfs_root_backup,
> +			 extent_root, 64);
> +BTRFS_SETGET_STACK_FUNCS(backup_block_group_root_gen, struct btrfs_root_backup,
> +			 extent_root_gen, 64);
> +BTRFS_SETGET_STACK_FUNCS(backup_block_group_root_level,
> +			 struct btrfs_root_backup, extent_root_level, 8);

This also applies to the free space tree root.

Thus I'd say either both of them have super block pointers and backup
roots, or neither of them does.

Thanks,
Qu

> +
>   /* struct btrfs_balance_item */
>   BTRFS_SETGET_FUNCS(balance_flags, struct btrfs_balance_item, flags, 64);
>
> @@ -2460,6 +2477,13 @@ BTRFS_SETGET_STACK_FUNCS(super_cache_generation, struct btrfs_super_block,
>   BTRFS_SETGET_STACK_FUNCS(super_magic, struct btrfs_super_block, magic, 64);
>   BTRFS_SETGET_STACK_FUNCS(super_uuid_tree_generation, struct btrfs_super_block,
>   			 uuid_tree_generation, 64);
> +BTRFS_SETGET_STACK_FUNCS(super_block_group_root, struct btrfs_super_block,
> +			 block_group_root, 64);
> +BTRFS_SETGET_STACK_FUNCS(super_block_group_root_generation,
> +			 struct btrfs_super_block,
> +			 block_group_root_generation, 64);
> +BTRFS_SETGET_STACK_FUNCS(super_block_group_root_level, struct btrfs_super_block,
> +			 block_group_root_level, 8);
>
>   int btrfs_super_csum_size(const struct btrfs_super_block *s);
>   const char *btrfs_super_csum_name(u16 csum_type);
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index db8e4856364e..45b2bde43150 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1733,6 +1733,7 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
>   	btrfs_put_root(fs_info->uuid_root);
>   	btrfs_put_root(fs_info->fs_root);
>   	btrfs_put_root(fs_info->data_reloc_root);
> +	btrfs_put_root(fs_info->block_group_root);
>   	btrfs_check_leaked_roots(fs_info);
>   	btrfs_extent_buffer_leak_debug_check(fs_info);
>   	kfree(fs_info->super_copy);
> @@ -2093,7 +2094,6 @@ static void backup_super_roots(struct btrfs_fs_info *info)
>   {
>   	const int next_backup = info->backup_root_index;
>   	struct btrfs_root_backup *root_backup;
> -	struct btrfs_root *extent_root = btrfs_extent_root(info, 0);
>   	struct btrfs_root *csum_root = btrfs_csum_root(info, 0);
>
>   	root_backup = info->super_for_commit->super_roots + next_backup;
> @@ -2119,11 +2119,23 @@ static void backup_super_roots(struct btrfs_fs_info *info)
>   	btrfs_set_backup_chunk_root_level(root_backup,
>   			       btrfs_header_level(info->chunk_root->node));
>
> -	btrfs_set_backup_extent_root(root_backup, extent_root->node->start);
> -	btrfs_set_backup_extent_root_gen(root_backup,
> -			       btrfs_header_generation(extent_root->node));
> -	btrfs_set_backup_extent_root_level(root_backup,
> -			       btrfs_header_level(extent_root->node));
> +	if (btrfs_fs_incompat(info, EXTENT_TREE_V2)) {
> +		btrfs_set_backup_block_group_root(root_backup,
> +					info->block_group_root->node->start);
> +		btrfs_set_backup_block_group_root_gen(root_backup,
> +			btrfs_header_generation(info->block_group_root->node));
> +		btrfs_set_backup_block_group_root_level(root_backup,
> +			btrfs_header_level(info->block_group_root->node));
> +	} else {
> +		struct btrfs_root *extent_root = btrfs_extent_root(info, 0);
> +
> +		btrfs_set_backup_extent_root(root_backup,
> +					     extent_root->node->start);
> +		btrfs_set_backup_extent_root_gen(root_backup,
> +				btrfs_header_generation(extent_root->node));
> +		btrfs_set_backup_extent_root_level(root_backup,
> +					btrfs_header_level(extent_root->node));
> +	}
>
>   	/*
>   	 * we might commit during log recovery, which happens before we set
> @@ -2268,6 +2280,7 @@ static void free_root_pointers(struct btrfs_fs_info *info, bool free_chunk_root)
>   	free_root_extent_buffers(info->uuid_root);
>   	free_root_extent_buffers(info->fs_root);
>   	free_root_extent_buffers(info->data_reloc_root);
> +	free_root_extent_buffers(info->block_group_root);
>   	if (free_chunk_root)
>   		free_root_extent_buffers(info->chunk_root);
>   }
> @@ -2970,8 +2983,20 @@ static int load_important_roots(struct btrfs_fs_info *fs_info)
>   	gen = btrfs_super_generation(sb);
>   	level = btrfs_super_root_level(sb);
>   	ret = load_super_root(fs_info->tree_root, bytenr, gen, level);
> -	if (ret)
> +	if (ret) {
>   		btrfs_warn(fs_info, "couldn't read tree root");
> +		return ret;
> +	}
> +
> +	if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
> +		return 0;
> +
> +	bytenr = btrfs_super_block_group_root(sb);
> +	gen = btrfs_super_block_group_root_generation(sb);
> +	level = btrfs_super_block_group_root_level(sb);
> +	ret = load_super_root(fs_info->block_group_root, bytenr, gen, level);
> +	if (ret)
> +		btrfs_warn(fs_info, "couldn't read block group root");
>   	return ret;
>   }
>
> @@ -2984,6 +3009,16 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
>   	int ret = 0;
>   	int i;
>
> +	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
> +		struct btrfs_root *root;
> +		root = btrfs_alloc_root(fs_info,
> +					BTRFS_BLOCK_GROUP_TREE_OBJECTID,
> +					GFP_KERNEL);
> +		if (!root)
> +			return -ENOMEM;
> +		fs_info->block_group_root = root;
> +	}
> +
>   	for (i = 0; i < BTRFS_NUM_BACKUP_ROOTS; i++) {
>   		if (handle_error) {
>   			if (!IS_ERR(tree_root->node))
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index 80b45fcac72a..fe2e16e75a3b 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -113,6 +113,8 @@ static inline struct btrfs_root *btrfs_grab_root(struct btrfs_root *root)
>
>   static inline struct btrfs_root *btrfs_block_group_root(struct btrfs_fs_info *fs_info)
>   {
> +	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
> +		return fs_info->block_group_root;
>   	return btrfs_extent_root(fs_info, 0);
>   }
>
> diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
> index aae1027bd76a..5d89c230af94 100644
> --- a/fs/btrfs/print-tree.c
> +++ b/fs/btrfs/print-tree.c
> @@ -23,6 +23,7 @@ static const struct root_name_map root_map[] = {
>   	{ BTRFS_QUOTA_TREE_OBJECTID,		"QUOTA_TREE"		},
>   	{ BTRFS_UUID_TREE_OBJECTID,		"UUID_TREE"		},
>   	{ BTRFS_FREE_SPACE_TREE_OBJECTID,	"FREE_SPACE_TREE"	},
> +	{ BTRFS_BLOCK_GROUP_TREE_OBJECTID,	"BLOCK_GROUP_TREE"	},
>   	{ BTRFS_DATA_RELOC_TREE_OBJECTID,	"DATA_RELOC_TREE"	},
>   };
>
> diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
> index 0d729664b4b4..f068ff30d654 100644
> --- a/include/trace/events/btrfs.h
> +++ b/include/trace/events/btrfs.h
> @@ -53,6 +53,7 @@ struct btrfs_space_info;
>   		{ BTRFS_TREE_RELOC_OBJECTID,	"TREE_RELOC"	},	\
>   		{ BTRFS_UUID_TREE_OBJECTID,	"UUID_TREE"	},	\
>   		{ BTRFS_FREE_SPACE_TREE_OBJECTID, "FREE_SPACE_TREE" },	\
> +		{ BTRFS_BLOCK_GROUP_TREE_OBJECTID, "BLOCK_GROUP_TREE" },\
>   		{ BTRFS_DATA_RELOC_TREE_OBJECTID, "DATA_RELOC_TREE" })
>
>   #define show_root_type(obj)						\
> diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
> index e1c4c732aaba..75c76b685972 100644
> --- a/include/uapi/linux/btrfs_tree.h
> +++ b/include/uapi/linux/btrfs_tree.h
> @@ -53,6 +53,9 @@
>   /* tracks free space in block groups. */
>   #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
>
> +/* holds the block group items for extent tree v2. */
> +#define BTRFS_BLOCK_GROUP_TREE_OBJECTID 11ULL
> +
>   /* device stats in the device tree */
>   #define BTRFS_DEV_STATS_OBJECTID 0ULL
>
>
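
The super block hunk quoted above trades three u64 slots of reserved space
for the new fields. A minimal sketch of that arithmetic, assuming the
structure is packed as in the kernel and modelling only the affected region
(the struct names sb_tail_old and sb_tail_new are hypothetical, for
illustration only):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reserved area before the patch: 28 * 8 = 224 bytes. */
struct sb_tail_old {
	uint64_t reserved[28];
} __attribute__((packed));

/*
 * Same region after the patch: 8 + 8 + 1 = 17 bytes of new fields,
 * 7 bytes of padding to reach 24, then 25 * 8 = 200 bytes of reserved.
 */
struct sb_tail_new {
	uint64_t block_group_root;
	uint64_t block_group_root_generation;
	uint8_t block_group_root_level;
	uint8_t reserved8[7];
	uint64_t reserved[25];
} __attribute__((packed));

/* The region must not grow or shrink, or later fields would move on disk. */
_Static_assert(sizeof(struct sb_tail_old) == sizeof(struct sb_tail_new),
	       "reserved region changed size");
/* reserved8[] keeps the remaining u64 array 8-byte aligned in the region. */
_Static_assert(offsetof(struct sb_tail_new, reserved) % 8 == 0,
	       "reserved[] is no longer 8-byte aligned");
```

If either assertion fails to compile, the fields that follow the reserved
area (sys_chunk_array, super_roots) would shift.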


* Re: [PATCH 7/8] btrfs: add code to support the block group root
  2021-11-06  1:11   ` Qu Wenruo
@ 2021-11-08 19:36     ` Josef Bacik
  2021-11-09  1:14       ` Qu Wenruo
  0 siblings, 1 reply; 21+ messages in thread
From: Josef Bacik @ 2021-11-08 19:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, kernel-team

On Sat, Nov 06, 2021 at 09:11:44AM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/11/6 04:49, Josef Bacik wrote:
> > This code adds the on disk structures for the block group root, which
> > will hold the block group items for extent tree v2.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >   fs/btrfs/ctree.h                | 26 ++++++++++++++++-
> >   fs/btrfs/disk-io.c              | 49 ++++++++++++++++++++++++++++-----
> >   fs/btrfs/disk-io.h              |  2 ++
> >   fs/btrfs/print-tree.c           |  1 +
> >   include/trace/events/btrfs.h    |  1 +
> >   include/uapi/linux/btrfs_tree.h |  3 ++
> >   6 files changed, 74 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 8ec2f068a1c2..b57367141b95 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -271,8 +271,13 @@ struct btrfs_super_block {
> >   	/* the UUID written into btree blocks */
> >   	u8 metadata_uuid[BTRFS_FSID_SIZE];
> > 
> > +	__le64 block_group_root;
> > +	__le64 block_group_root_generation;
> > +	u8 block_group_root_level;
> > +
> 
> Is there any special reason that, block group root can't be put into
> root tree?
> 

Yes, I'm so glad you asked!

One of the planned changes with extent-tree-v2 is how we do relocation.  Since
we will no longer be able to track metadata in the extent tree, relocation
becomes much more of a pain in the ass.

In addition, relocation currently has a pretty big problem: it can generate
unlimited delayed refs because it absolutely has to update all paths that point
to a relocated block in a single transaction.

I'm fixing both of these problems with a new relocation thing, which will walk
through a block group, copy those extents to a new block group, and then update
a tree that maps the old logical address to the new logical address.

Because of this we could end up with blocks in the tree root that need to be
remapped from a relocated block group into a new block group.  Thus we need to
be able to know what that mapping is before we go read the tree root.  This
means we have to store the block group root (and the new mapping root I'll
introduce later) in the super block.

These two roots will behave like the chunk root: they'll have to be read first
in order to know where to find any remapped metadata blocks.  Because of that we
have to keep pointers to them in the super block instead of the tree root.

> If it's to reduce the unnecessary update on tree root, then I guess free
> space tree root should also have some space in super block.
> 
> As now free space tree(s) and extent tree(s) are having almost the same
> hotness, thus one having direct pointer in super block, while the other
> doesn't would not make much sense.

The extent tree and free space trees are both in the tree root; the only things
in the super block (currently) are the tree root and the chunk root.
Thanks,

Josef


* Re: [PATCH 7/8] btrfs: add code to support the block group root
  2021-11-08 19:36     ` Josef Bacik
@ 2021-11-09  1:14       ` Qu Wenruo
  2021-11-09 19:24         ` Josef Bacik
  0 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2021-11-09  1:14 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo; +Cc: linux-btrfs, kernel-team



On 2021/11/9 03:36, Josef Bacik wrote:
> On Sat, Nov 06, 2021 at 09:11:44AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2021/11/6 04:49, Josef Bacik wrote:
>>> This code adds the on disk structures for the block group root, which
>>> will hold the block group items for extent tree v2.
>>>
>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>> ---
>>>    fs/btrfs/ctree.h                | 26 ++++++++++++++++-
>>>    fs/btrfs/disk-io.c              | 49 ++++++++++++++++++++++++++++-----
>>>    fs/btrfs/disk-io.h              |  2 ++
>>>    fs/btrfs/print-tree.c           |  1 +
>>>    include/trace/events/btrfs.h    |  1 +
>>>    include/uapi/linux/btrfs_tree.h |  3 ++
>>>    6 files changed, 74 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index 8ec2f068a1c2..b57367141b95 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -271,8 +271,13 @@ struct btrfs_super_block {
>>>    	/* the UUID written into btree blocks */
>>>    	u8 metadata_uuid[BTRFS_FSID_SIZE];
>>>
>>> +	__le64 block_group_root;
>>> +	__le64 block_group_root_generation;
>>> +	u8 block_group_root_level;
>>> +
>>
>> Is there any special reason that, block group root can't be put into
>> root tree?
>>
> 
> Yes, I'm so glad you asked!
> 
> One of the planned changes with extent-tree-v2 is how we do relocation.  With no
> longer being able to track metadata in the extent tree, relocation becomes much
> more of a pain in the ass.

I'm surprised that relocation can be done at all without proper
metadata tracking in the new extent tree(s).

> 
> In addition, relocation currently has a pretty big problem, it can generate
> unlimited delayed refs because it absolutely has to update all paths that point
> to a relocated block in a single transaction.

Yep, that's also the biggest problem I attacked for the qgroup balance 
optimization.

> 
> I'm fixing both of these problems with a new relocation thing, which will walk
> through a block group, copy those extents to a new block group, and then update
> a tree that maps the old logical address to the new logical address.

That sounds like the proposal from Johannes for zoned support of RAID56.
An FTL-like layer.

But I'm still not sure how we could even find all the tree blocks in one
block group in the first place, as there are no longer backrefs in the
extent tree(s).

By iterating all tree blocks? That doesn't sound sane to me...

> 
> Because of this we could end up with blocks in the tree root that need to be
> remapped from a relocated block group into a new block group.  Thus we need to
> be able to know what that mapping is before we go read the tree root.  This
> means we have to store the block group root (and the new mapping root I'll
> introduce later) in the super block.

Wouldn't the new mapping root become a new bottleneck then?

If we relocate the full fs, wouldn't the mapping root (block group root)
be no different from the old extent tree?

Especially since the mapping is done at the extent level, not the chunk
level, it can produce tons of mapping entries, which is not much better
than the old extent tree.

> 
> These two roots will behave like the chunk root, they'll have to be read first
> in order to know where to find any remapped metadata blocks.  Because of that we
> have to keep pointers to them in the super block instead of the tree root.

Got the reason now, but I'm not yet convinced the block group root
mapping is the proper way to go....

And I'm still not sure how we can find all the tree blocks in one block
group without a backref for each tree block...

Thanks,
Qu

> 
>> If it's to reduce the unnecessary update on tree root, then I guess free
>> space tree root should also have some space in super block.
>>
>> As now free space tree(s) and extent tree(s) are having almost the same
>> hotness, thus one having direct pointer in super block, while the other
>> doesn't would not make much sense.
> 
> The extent tree and free space trees are both in the tree root, the only thing
> that's in the superblock (currently) is the tree root and the chunk root.
> Thanks,
> 
> Josef
> 



* Re: [PATCH 7/8] btrfs: add code to support the block group root
  2021-11-09  1:14       ` Qu Wenruo
@ 2021-11-09 19:24         ` Josef Bacik
  2021-11-09 23:44           ` Qu Wenruo
  2021-11-10  7:13           ` Qu Wenruo
  0 siblings, 2 replies; 21+ messages in thread
From: Josef Bacik @ 2021-11-09 19:24 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs, kernel-team

On Tue, Nov 09, 2021 at 09:14:06AM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/11/9 03:36, Josef Bacik wrote:
> > On Sat, Nov 06, 2021 at 09:11:44AM +0800, Qu Wenruo wrote:
> > > 
> > > 
> > > On 2021/11/6 04:49, Josef Bacik wrote:
> > > > This code adds the on disk structures for the block group root, which
> > > > will hold the block group items for extent tree v2.
> > > > 
> > > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > > > ---
> > > >    fs/btrfs/ctree.h                | 26 ++++++++++++++++-
> > > >    fs/btrfs/disk-io.c              | 49 ++++++++++++++++++++++++++++-----
> > > >    fs/btrfs/disk-io.h              |  2 ++
> > > >    fs/btrfs/print-tree.c           |  1 +
> > > >    include/trace/events/btrfs.h    |  1 +
> > > >    include/uapi/linux/btrfs_tree.h |  3 ++
> > > >    6 files changed, 74 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > > > index 8ec2f068a1c2..b57367141b95 100644
> > > > --- a/fs/btrfs/ctree.h
> > > > +++ b/fs/btrfs/ctree.h
> > > > @@ -271,8 +271,13 @@ struct btrfs_super_block {
> > > >    	/* the UUID written into btree blocks */
> > > >    	u8 metadata_uuid[BTRFS_FSID_SIZE];
> > > > 
> > > > +	__le64 block_group_root;
> > > > +	__le64 block_group_root_generation;
> > > > +	u8 block_group_root_level;
> > > > +
> > > 
> > > Is there any special reason that, block group root can't be put into
> > > root tree?
> > > 
> > 
> > Yes, I'm so glad you asked!
> > 
> > One of the planned changes with extent-tree-v2 is how we do relocation.  With no
> > longer being able to track metadata in the extent tree, relocation becomes much
> > more of a pain in the ass.
> 
> I'm even surprised that relocation can even be done without proper metadata
> tracking in the new extent tree(s).
> 
> > 
> > In addition, relocation currently has a pretty big problem, it can generate
> > unlimited delayed refs because it absolutely has to update all paths that point
> > to a relocated block in a single transaction.
> 
> Yep, that's also the biggest problem I attacked for the qgroup balance
> optimization.
> 
> > 
> > I'm fixing both of these problems with a new relocation thing, which will walk
> > through a block group, copy those extents to a new block group, and then update
> > a tree that maps the old logical address to the new logical address.
> 
> That sounds like the proposal from Johannes for zoned support of RAID56.
> An FTL-like layer.
> 
> But I'm still not sure how we could even get all the tree blocks in one
> block group in the first place, as there is no longer backref in the extent
> tree(s).
> 
> By iterating all tree blocks? That doesn't sound sane to me...
> 

No, by iterating the free areas in the free space tree.  We no longer care about
the metadata itself, just the space that is utilized in the block group.  We
will mark the block group as read only, search through the free space tree for
that block group to find extents, copy them to new locations, and insert a
mapping object for that block group that says "X range is now at Y".

As extents are freed, their new respective ranges are freed.  Once a relocated
block group's ->used hits 0, its mapping items are deleted.
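
To illustrate the "find extents via the free space tree" step: the allocated
ranges of a block group are simply the gaps between its free extents.  This is
only a rough userspace sketch, not the planned kernel code; struct range and
allocated_ranges() are made-up names, and the free extents are assumed to be
sorted, non-overlapping and inside the block group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for illustration; not a kernel structure. */
struct range {
	uint64_t start;
	uint64_t len;
};

/*
 * Walk the free extents of the block group [bg_start, bg_start + bg_len)
 * and emit every gap between them as an allocated range.
 * Returns the number of ranges written, or (size_t)-1 if out[] is too small.
 */
size_t allocated_ranges(uint64_t bg_start, uint64_t bg_len,
			const struct range *free_ext, size_t nr_free,
			struct range *out, size_t out_cap)
{
	uint64_t cursor = bg_start;
	uint64_t bg_end = bg_start + bg_len;
	size_t n = 0;

	for (size_t i = 0; i < nr_free; i++) {
		/* Input must be sorted and must stay inside the block group. */
		assert(free_ext[i].start >= cursor);
		assert(free_ext[i].start + free_ext[i].len <= bg_end);

		if (free_ext[i].start > cursor) {
			if (n == out_cap)
				return (size_t)-1;
			out[n].start = cursor;
			out[n].len = free_ext[i].start - cursor;
			n++;
		}
		cursor = free_ext[i].start + free_ext[i].len;
	}

	/* Anything after the last free extent is allocated too. */
	if (cursor < bg_end) {
		if (n == out_cap)
			return (size_t)-1;
		out[n].start = cursor;
		out[n].len = bg_end - cursor;
		n++;
	}
	return n;
}
```

Each returned range is a candidate for "copy to a new location and record
X -> Y", without knowing which tree block or data extent lives inside it.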

> > 
> > Because of this we could end up with blocks in the tree root that need to be
> > remapped from a relocated block group into a new block group.  Thus we need to
> > be able to know what that mapping is before we go read the tree root.  This
> > means we have to store the block group root (and the new mapping root I'll
> > introduce later) in the super block.
> 
> Wouldn't the new mapping root becoming a new bottleneck then?
> 
> If we relocate the full fs, then the mapping root (block group root) would
> be no different than an old extent tree?
> 
> Especially the mapping is done in extent level, not chunk level, thus it can
> cause tons of mapping entries, really not that better than old extent tree
> then.
> 

Except the problem with the old extent tree is that we are constantly modifying
it.  The mappings are never modified once they're created, unless we're
remapping an already remapped range.  Once the remapped extent is freed, its new
location will be normal, and freeing it won't update anything in the mapping
tree.

> > 
> > These two roots will behave like the chunk root, they'll have to be read first
> > in order to know where to find any remapped metadata blocks.  Because of that we
> > have to keep pointers to them in the super block instead of the tree root.
> 
> Got the reason now, but I'm not yet convinced the block group root mapping
> is the proper way to go....
> 
> And still not sure how can we find out all tree blocks in one block group
> without backref for each tree blocks...
> 

We won't, we'll find allocated ranges.  It's certainly less precise than the
backref tree, but waaaaaaaay faster, because we only care about the range that
is allocated and moving that range.

Also it gives us another neat ability: we can relocate parts of extents instead
of being required to move full extents.  Before, we had to move the whole extent
because we had to modify the file extent items to point at exactly the same
range.

Here the translation happens at the logical level, so we can easily split up
large extents, split any bios across the new logical locations, and stitch
them back together at the end.  Thanks,

Josef
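
A rough sketch of the logical-level translation described above, assuming a
sorted, non-overlapping table of "old logical -> new logical" entries.  The
names struct remap_entry, struct segment and remap_range() are hypothetical;
the real mapping tree and bio handling are not part of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: [old_start, old_start + len) now lives at new_start. */
struct remap_entry {
	uint64_t old_start;
	uint64_t new_start;
	uint64_t len;
};

struct segment {
	uint64_t logical;
	uint64_t len;
};

/*
 * Split [start, start + len) into segments.  Parts covered by a remap entry
 * are translated to the new location, uncovered parts pass through as-is.
 * Returns the number of segments, or (size_t)-1 if out[] is too small.
 */
size_t remap_range(const struct remap_entry *map, size_t nr_map,
		   uint64_t start, uint64_t len,
		   struct segment *out, size_t out_cap)
{
	uint64_t cur = start;
	uint64_t end = start + len;
	size_t n = 0;

	for (size_t i = 0; i < nr_map && cur < end; i++) {
		uint64_t m_start = map[i].old_start;
		uint64_t m_end = m_start + map[i].len;
		uint64_t seg_end;

		if (m_end <= cur)
			continue;	/* entry ends before the cursor */
		if (m_start >= end)
			break;		/* entry starts after the range */

		if (m_start > cur) {	/* unmapped gap before this entry */
			if (n == out_cap)
				return (size_t)-1;
			out[n++] = (struct segment){ cur, m_start - cur };
			cur = m_start;
		}

		seg_end = m_end < end ? m_end : end;
		if (n == out_cap)
			return (size_t)-1;
		out[n++] = (struct segment){
			map[i].new_start + (cur - m_start), seg_end - cur
		};
		cur = seg_end;
	}

	if (cur < end) {		/* unmapped tail */
		if (n == out_cap)
			return (size_t)-1;
		out[n++] = (struct segment){ cur, end - cur };
	}
	return n;
}
```

A single read that straddles two remapped pieces comes back as two segments,
which is the point where a bio would be split and later stitched together.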


* Re: [PATCH 7/8] btrfs: add code to support the block group root
  2021-11-09 19:24         ` Josef Bacik
@ 2021-11-09 23:44           ` Qu Wenruo
  2021-11-10 13:57             ` Josef Bacik
  2021-11-10  7:13           ` Qu Wenruo
  1 sibling, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2021-11-09 23:44 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Qu Wenruo, linux-btrfs, kernel-team



On 2021/11/10 03:24, Josef Bacik wrote:
> On Tue, Nov 09, 2021 at 09:14:06AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2021/11/9 03:36, Josef Bacik wrote:
>>> On Sat, Nov 06, 2021 at 09:11:44AM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2021/11/6 04:49, Josef Bacik wrote:
>>>>> This code adds the on disk structures for the block group root, which
>>>>> will hold the block group items for extent tree v2.
>>>>>
>>>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>>>> ---
>>>>>     fs/btrfs/ctree.h                | 26 ++++++++++++++++-
>>>>>     fs/btrfs/disk-io.c              | 49 ++++++++++++++++++++++++++++-----
>>>>>     fs/btrfs/disk-io.h              |  2 ++
>>>>>     fs/btrfs/print-tree.c           |  1 +
>>>>>     include/trace/events/btrfs.h    |  1 +
>>>>>     include/uapi/linux/btrfs_tree.h |  3 ++
>>>>>     6 files changed, 74 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>>> index 8ec2f068a1c2..b57367141b95 100644
>>>>> --- a/fs/btrfs/ctree.h
>>>>> +++ b/fs/btrfs/ctree.h
>>>>> @@ -271,8 +271,13 @@ struct btrfs_super_block {
>>>>>     	/* the UUID written into btree blocks */
>>>>>     	u8 metadata_uuid[BTRFS_FSID_SIZE];
>>>>>
>>>>> +	__le64 block_group_root;
>>>>> +	__le64 block_group_root_generation;
>>>>> +	u8 block_group_root_level;
>>>>> +
>>>>
>>>> Is there any special reason that, block group root can't be put into
>>>> root tree?
>>>>
>>>
>>> Yes, I'm so glad you asked!
>>>
>>> One of the planned changes with extent-tree-v2 is how we do relocation.  With no
>>> longer being able to track metadata in the extent tree, relocation becomes much
>>> more of a pain in the ass.
>>
>> I'm even surprised that relocation can even be done without proper metadata
>> tracking in the new extent tree(s).
>>
>>>
>>> In addition, relocation currently has a pretty big problem, it can generate
>>> unlimited delayed refs because it absolutely has to update all paths that point
>>> to a relocated block in a single transaction.
>>
>> Yep, that's also the biggest problem I attacked for the qgroup balance
>> optimization.
>>
>>>
>>> I'm fixing both of these problems with a new relocation thing, which will walk
>>> through a block group, copy those extents to a new block group, and then update
>>> a tree that maps the old logical address to the new logical address.
>>
>> That sounds like the proposal from Johannes for zoned support of RAID56.
>> An FTL-like layer.
>>
>> But I'm still not sure how we could even get all the tree blocks in one
>> block group in the first place, as there is no longer backref in the extent
>> tree(s).
>>
>> By iterating all tree blocks? That doesn't sound sane to me...
>>
> 
> No, iterating the free areas in the free space tree.  We no longer care about
> the metadata itself, just the space that is utilized in the block group.  We
> will mark the block group as read only, search through the free space tree for
> that block group to find extents, copy them to new locations, insert a mapping
> object for that block group to say "X range is now at Y".

OK, this makes sense now.

> 
> As extent's are free'd their new respective ranges are freed.  Once a relocated
> block groups ->used hits 0 its mapping items are deleted.
> 
>>>
>>> Because of this we could end up with blocks in the tree root that need to be
>>> remapped from a relocated block group into a new block group.  Thus we need to
>>> be able to know what that mapping is before we go read the tree root.  This
>>> means we have to store the block group root (and the new mapping root I'll
>>> introduce later) in the super block.
>>
>> Wouldn't the new mapping root becoming a new bottleneck then?
>>
>> If we relocate the full fs, then the mapping root (block group root) would
>> be no different than an old extent tree?
>>
>> Especially the mapping is done in extent level, not chunk level, thus it can
>> cause tons of mapping entries, really not that better than old extent tree
>> then.
>>
> 
> Except the problem with the old extent tree is we are constantly modifying it.
> The mapping's are never modified once they're created, unless we're remapping
> and already remapped range.  Once the remapped extent is free'd it's new
> location will be normal, and won't update anything in the mapping tree.

Oh, so the block group tree would be a colder version of the extent tree,
which would be much nicer.

But that also means that, to determine whether a tree block/data extent
really belongs to a chunk/bg, we need to search the new tree to be sure.

Would there be some way to do a reverse search? Otherwise it may be a
problem for balance again.

> 
>>>
>>> These two roots will behave like the chunk root, they'll have to be read first
>>> in order to know where to find any remapped metadata blocks.  Because of that we
>>> have to keep pointers to them in the super block instead of the tree root.
>>
>> Got the reason now, but I'm not yet convinced the block group root mapping
>> is the proper way to go....
>>
>> And still not sure how can we find out all tree blocks in one block group
>> without backref for each tree blocks...
>>
> 
> We won't, we'll find allocated ranges.  It's certainly less precise than the
> backref tree, but waaaaaaaay faster, because we only care about the range that
> is allocated and moving that range.
> 
> Also it gives us another neat ability, we can relocate parts of extents instead
> of being required to move full extents.  Before we had to move the whole extent
> because we have to modify the file extent items to point at exactly the same
> range.
> 
> Here the translation happens at the logical level, so we can easily split up
> large extents and simply split up any bio's across the new logical locations and
> stitch it back together at the end.  Thanks,

That really sounds better, and is what I'm going to do as a preparation 
for the iomap work.

That is, move all the bio splitting into the chunk layer, rather than the
current layer.

Thanks,
Qu

> 
> Josef
> 



* Re: [PATCH 7/8] btrfs: add code to support the block group root
  2021-11-09 23:44           ` Qu Wenruo
@ 2021-11-10 13:57             ` Josef Bacik
  0 siblings, 0 replies; 21+ messages in thread
From: Josef Bacik @ 2021-11-10 13:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs, kernel-team

On Wed, Nov 10, 2021 at 07:44:51AM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/11/10 03:24, Josef Bacik wrote:
> > On Tue, Nov 09, 2021 at 09:14:06AM +0800, Qu Wenruo wrote:
> > > 
> > > 
> > > On 2021/11/9 03:36, Josef Bacik wrote:
> > > > On Sat, Nov 06, 2021 at 09:11:44AM +0800, Qu Wenruo wrote:
> > > > > 
> > > > > 
> > > > > On 2021/11/6 04:49, Josef Bacik wrote:
> > > > > > This code adds the on disk structures for the block group root, which
> > > > > > will hold the block group items for extent tree v2.
> > > > > > 
> > > > > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > > > > > ---
> > > > > >     fs/btrfs/ctree.h                | 26 ++++++++++++++++-
> > > > > >     fs/btrfs/disk-io.c              | 49 ++++++++++++++++++++++++++++-----
> > > > > >     fs/btrfs/disk-io.h              |  2 ++
> > > > > >     fs/btrfs/print-tree.c           |  1 +
> > > > > >     include/trace/events/btrfs.h    |  1 +
> > > > > >     include/uapi/linux/btrfs_tree.h |  3 ++
> > > > > >     6 files changed, 74 insertions(+), 8 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > > > > > index 8ec2f068a1c2..b57367141b95 100644
> > > > > > --- a/fs/btrfs/ctree.h
> > > > > > +++ b/fs/btrfs/ctree.h
> > > > > > @@ -271,8 +271,13 @@ struct btrfs_super_block {
> > > > > >     	/* the UUID written into btree blocks */
> > > > > >     	u8 metadata_uuid[BTRFS_FSID_SIZE];
> > > > > > 
> > > > > > +	__le64 block_group_root;
> > > > > > +	__le64 block_group_root_generation;
> > > > > > +	u8 block_group_root_level;
> > > > > > +
> > > > > 
> > > > > Is there any special reason that, block group root can't be put into
> > > > > root tree?
> > > > > 
> > > > 
> > > > Yes, I'm so glad you asked!
> > > > 
> > > > One of the planned changes with extent-tree-v2 is how we do relocation.  With no
> > > > longer being able to track metadata in the extent tree, relocation becomes much
> > > > more of a pain in the ass.
> > > 
> > > I'm even surprised that relocation can even be done without proper metadata
> > > tracking in the new extent tree(s).
> > > 
> > > > 
> > > > In addition, relocation currently has a pretty big problem, it can generate
> > > > unlimited delayed refs because it absolutely has to update all paths that point
> > > > to a relocated block in a single transaction.
> > > 
> > > Yep, that's also the biggest problem I attacked for the qgroup balance
> > > optimization.
> > > 
> > > > 
> > > > I'm fixing both of these problems with a new relocation thing, which will walk
> > > > through a block group, copy those extents to a new block group, and then update
> > > > a tree that maps the old logical address to the new logical address.
> > > 
> > > That sounds like the proposal from Johannes for zoned support of RAID56.
> > > An FTL-like layer.
> > > 
> > > But I'm still not sure how we could even get all the tree blocks in one
> > > block group in the first place, as there is no longer backref in the extent
> > > tree(s).
> > > 
> > > By iterating all tree blocks? That doesn't sound sane to me...
> > > 
> > 
> > No, iterating the free areas in the free space tree.  We no longer care about
> > the metadata itself, just the space that is utilized in the block group.  We
> > will mark the block group as read only, search through the free space tree for
> > that block group to find extents, copy them to new locations, insert a mapping
> > object for that block group to say "X range is now at Y".
> 
> OK, this makes sense now.
> 
> > 
> > > As extents are freed, their respective new ranges are freed.  Once a relocated
> > > block group's ->used hits 0, its mapping items are deleted.
> > 
> > > > 
> > > > Because of this we could end up with blocks in the tree root that need to be
> > > > remapped from a relocated block group into a new block group.  Thus we need to
> > > > be able to know what that mapping is before we go read the tree root.  This
> > > > means we have to store the block group root (and the new mapping root I'll
> > > > introduce later) in the super block.
> > > 
> > > > Wouldn't the new mapping root become a new bottleneck then?
> > > > 
> > > > If we relocate the full fs, then the mapping root (block group root) would
> > > > be no different than an old extent tree?
> > > > 
> > > > Especially since the mapping is done at the extent level, not the chunk level,
> > > > it can cause tons of mapping entries, which is really not much better than
> > > > the old extent tree.
> > > 
> > 
> > Except the problem with the old extent tree is we are constantly modifying it.
> > > The mappings are never modified once they're created, unless we're remapping
> > > an already remapped range.  Once the remapped extent is freed, its new
> > > location will be normal, and won't update anything in the mapping tree.
> 
> > Oh, so the block group tree would be a colder version of the extent tree; that
> > would be really much nicer.
> > 
> > But that also means, to determine if a tree block/data extent really
> > belongs to a chunk/bg, we need to search the new tree to be sure.
> > 
> > Would there be something to do a reverse search? Otherwise it may be a problem
> > for balance again.
> 

Reverse search will be fuzzier than it used to be.  Using the snapshot tree
we'll be able to figure out who *potentially* points at a block, but we'll have
to go check.  This is the other reason I'm redoing balance at the same time:
first, it super sucks and needs to be better, and second, it simply won't be
practical to do backref lookups for metadata anymore.  Thanks,

Josef


* Re: [PATCH 7/8] btrfs: add code to support the block group root
  2021-11-09 19:24         ` Josef Bacik
  2021-11-09 23:44           ` Qu Wenruo
@ 2021-11-10  7:13           ` Qu Wenruo
  2021-11-10 13:54             ` Josef Bacik
  1 sibling, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2021-11-10  7:13 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo; +Cc: linux-btrfs, kernel-team



On 2021/11/10 03:24, Josef Bacik wrote:
> On Tue, Nov 09, 2021 at 09:14:06AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2021/11/9 03:36, Josef Bacik wrote:
>>> On Sat, Nov 06, 2021 at 09:11:44AM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2021/11/6 04:49, Josef Bacik wrote:
>>>>> This code adds the on disk structures for the block group root, which
>>>>> will hold the block group items for extent tree v2.
>>>>>
>>>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>>>> ---
>>>>>     fs/btrfs/ctree.h                | 26 ++++++++++++++++-
>>>>>     fs/btrfs/disk-io.c              | 49 ++++++++++++++++++++++++++++-----
>>>>>     fs/btrfs/disk-io.h              |  2 ++
>>>>>     fs/btrfs/print-tree.c           |  1 +
>>>>>     include/trace/events/btrfs.h    |  1 +
>>>>>     include/uapi/linux/btrfs_tree.h |  3 ++
>>>>>     6 files changed, 74 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>>> index 8ec2f068a1c2..b57367141b95 100644
>>>>> --- a/fs/btrfs/ctree.h
>>>>> +++ b/fs/btrfs/ctree.h
>>>>> @@ -271,8 +271,13 @@ struct btrfs_super_block {
>>>>>     	/* the UUID written into btree blocks */
>>>>>     	u8 metadata_uuid[BTRFS_FSID_SIZE];
>>>>>
>>>>> +	__le64 block_group_root;
>>>>> +	__le64 block_group_root_generation;
>>>>> +	u8 block_group_root_level;
>>>>> +
>>>>
>>>> Is there any special reason that, block group root can't be put into
>>>> root tree?
>>>>
>>>
>>> Yes, I'm so glad you asked!
>>>
>>> One of the planned changes with extent-tree-v2 is how we do relocation.  With no
>>> longer being able to track metadata in the extent tree, relocation becomes much
>>> more of a pain in the ass.
>>
>> I'm even surprised that relocation can even be done without proper metadata
>> tracking in the new extent tree(s).
>>
>>>
>>> In addition, relocation currently has a pretty big problem, it can generate
>>> unlimited delayed refs because it absolutely has to update all paths that point
>>> to a relocated block in a single transaction.
>>
>> Yep, that's also the biggest problem I attacked for the qgroup balance
>> optimization.
>>
>>>
>>> I'm fixing both of these problems with a new relocation thing, which will walk
>>> through a block group, copy those extents to a new block group, and then update
>>> a tree that maps the old logical address to the new logical address.
>>
>> That sounds like the proposal from Johannes for zoned support of RAID56.
>> An FTL-like layer.
>>
>> But I'm still not sure how we could even get all the tree blocks in one
>> block group in the first place, as there is no longer backref in the extent
>> tree(s).
>>
>> By iterating all tree blocks? That doesn't sound sane to me...
>>
>
> No, iterating the free areas in the free space tree.  We no longer care about
> the metadata itself, just the space that is utilized in the block group.  We
> will mark the block group as read only, search through the free space tree for
> that block group to find extents, copy them to new locations, insert a mapping
> object for that block group to say "X range is now at Y".
>
> As extents are freed, their respective new ranges are freed.  Once a relocated
> block group's ->used hits 0, its mapping items are deleted.
>
>>>
>>> Because of this we could end up with blocks in the tree root that need to be
>>> remapped from a relocated block group into a new block group.  Thus we need to
>>> be able to know what that mapping is before we go read the tree root.  This
>>> means we have to store the block group root (and the new mapping root I'll
>>> introduce later) in the super block.
>>
>> Wouldn't the new mapping root become a new bottleneck then?
>>
>> If we relocate the full fs, then the mapping root (block group root) would
>> be no different than an old extent tree?
>>
>> Especially since the mapping is done at the extent level, not the chunk level,
>> it can cause tons of mapping entries, which is really not much better than
>> the old extent tree.
>>
>
> Except the problem with the old extent tree is we are constantly modifying it.

I have another question related to this block group tree.

AFAIK your new extent-tree-v2 will greatly reduce the number of extent
items by:

- Skipping all backref items for global trees

- Skipping backref items for non-shared subvolumes,
   as they act just like global trees (until being snapshotted).

I'm wondering if the above modification is enough to make the extent tree so
cold that we don't even need the block group tree?

Thanks,
Qu

> The mappings are never modified once they're created, unless we're remapping
> an already remapped range.  Once the remapped extent is freed, its new
> location will be normal, and won't update anything in the mapping tree.
>
>>>
>>> These two roots will behave like the chunk root, they'll have to be read first
>>> in order to know where to find any remapped metadata blocks.  Because of that we
>>> have to keep pointers to them in the super block instead of the tree root.
>>
>> Got the reason now, but I'm not yet convinced the block group root mapping
>> is the proper way to go....
>>
>> And I'm still not sure how we can find all the tree blocks in one block group
>> without a backref for each tree block...
>>
>
> We won't, we'll find allocated ranges.  It's certainly less precise than the
> backref tree, but waaaaaaaay faster, because we only care about the range that
> is allocated and moving that range.
>
> Also it gives us another neat ability: we can relocate parts of extents instead
> of being required to move full extents.  Before, we had to move the whole extent
> because we had to modify the file extent items to point at exactly the same
> range.
>
> Here the translation happens at the logical level, so we can easily split up
> large extents and simply split up any bios across the new logical locations and
> stitch them back together at the end.  Thanks,
>
> Josef
>


* Re: [PATCH 7/8] btrfs: add code to support the block group root
  2021-11-10  7:13           ` Qu Wenruo
@ 2021-11-10 13:54             ` Josef Bacik
  0 siblings, 0 replies; 21+ messages in thread
From: Josef Bacik @ 2021-11-10 13:54 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs, kernel-team

On Wed, Nov 10, 2021 at 03:13:37PM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/11/10 03:24, Josef Bacik wrote:
> > On Tue, Nov 09, 2021 at 09:14:06AM +0800, Qu Wenruo wrote:
> > > 
> > > 
> > > On 2021/11/9 03:36, Josef Bacik wrote:
> > > > On Sat, Nov 06, 2021 at 09:11:44AM +0800, Qu Wenruo wrote:
> > > > > 
> > > > > 
> > > > > On 2021/11/6 04:49, Josef Bacik wrote:
> > > > > > This code adds the on disk structures for the block group root, which
> > > > > > will hold the block group items for extent tree v2.
> > > > > > 
> > > > > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > > > > > ---
> > > > > >     fs/btrfs/ctree.h                | 26 ++++++++++++++++-
> > > > > >     fs/btrfs/disk-io.c              | 49 ++++++++++++++++++++++++++++-----
> > > > > >     fs/btrfs/disk-io.h              |  2 ++
> > > > > >     fs/btrfs/print-tree.c           |  1 +
> > > > > >     include/trace/events/btrfs.h    |  1 +
> > > > > >     include/uapi/linux/btrfs_tree.h |  3 ++
> > > > > >     6 files changed, 74 insertions(+), 8 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > > > > > index 8ec2f068a1c2..b57367141b95 100644
> > > > > > --- a/fs/btrfs/ctree.h
> > > > > > +++ b/fs/btrfs/ctree.h
> > > > > > @@ -271,8 +271,13 @@ struct btrfs_super_block {
> > > > > >     	/* the UUID written into btree blocks */
> > > > > >     	u8 metadata_uuid[BTRFS_FSID_SIZE];
> > > > > > 
> > > > > > +	__le64 block_group_root;
> > > > > > +	__le64 block_group_root_generation;
> > > > > > +	u8 block_group_root_level;
> > > > > > +
> > > > > 
> > > > > Is there any special reason that, block group root can't be put into
> > > > > root tree?
> > > > > 
> > > > 
> > > > Yes, I'm so glad you asked!
> > > > 
> > > > One of the planned changes with extent-tree-v2 is how we do relocation.  With no
> > > > longer being able to track metadata in the extent tree, relocation becomes much
> > > > more of a pain in the ass.
> > > 
> > > I'm even surprised that relocation can even be done without proper metadata
> > > tracking in the new extent tree(s).
> > > 
> > > > 
> > > > In addition, relocation currently has a pretty big problem, it can generate
> > > > unlimited delayed refs because it absolutely has to update all paths that point
> > > > to a relocated block in a single transaction.
> > > 
> > > Yep, that's also the biggest problem I attacked for the qgroup balance
> > > optimization.
> > > 
> > > > 
> > > > I'm fixing both of these problems with a new relocation thing, which will walk
> > > > through a block group, copy those extents to a new block group, and then update
> > > > a tree that maps the old logical address to the new logical address.
> > > 
> > > That sounds like the proposal from Johannes for zoned support of RAID56.
> > > An FTL-like layer.
> > > 
> > > But I'm still not sure how we could even get all the tree blocks in one
> > > block group in the first place, as there is no longer backref in the extent
> > > tree(s).
> > > 
> > > By iterating all tree blocks? That doesn't sound sane to me...
> > > 
> > 
> > No, iterating the free areas in the free space tree.  We no longer care about
> > the metadata itself, just the space that is utilized in the block group.  We
> > will mark the block group as read only, search through the free space tree for
> > that block group to find extents, copy them to new locations, insert a mapping
> > object for that block group to say "X range is now at Y".
> > 
> > As extents are freed, their respective new ranges are freed.  Once a relocated
> > block group's ->used hits 0, its mapping items are deleted.
> > 
> > > > 
> > > > Because of this we could end up with blocks in the tree root that need to be
> > > > remapped from a relocated block group into a new block group.  Thus we need to
> > > > be able to know what that mapping is before we go read the tree root.  This
> > > > means we have to store the block group root (and the new mapping root I'll
> > > > introduce later) in the super block.
> > > 
> > > Wouldn't the new mapping root become a new bottleneck then?
> > > 
> > > If we relocate the full fs, then the mapping root (block group root) would
> > > be no different than an old extent tree?
> > > 
> > > Especially since the mapping is done at the extent level, not the chunk level,
> > > it can cause tons of mapping entries, which is really not much better than
> > > the old extent tree.
> > > 
> > 
> > Except the problem with the old extent tree is we are constantly modifying it.
> 
> I have another question related to this block group tree.
> 
> AFAIK your new extent-tree-v2 will greatly reduce the number of extent
> items by:
> 
> - Skipping all backref items for global trees
> 
> - Skipping backref items for non-shared subvolumes,
>   as they act just like global trees (until being snapshotted).
> 
> I'm wondering if the above modification is enough to make the extent tree so
> cold that we don't even need the block group tree?
> 

We still need it to be separate, because we have to reach it from the super
block and pre-load it.  Only then can we load the mapping tree and do the
logical->logical translation for the new relocation scheme.

Also the extent tree is still going to have data backrefs, so we'll still end up
with a huge spread.  Thanks,

Josef
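
To make the logical->logical translation mentioned above more concrete, here is
a minimal userspace sketch.  The mapping tree format is not part of this series,
so everything here is an assumption made for illustration: `struct remap_sketch`,
`remap_logical()`, and the sorted-array layout are hypothetical, not kernel
code.  It only shows the idea of "X range is now at Y" being resolved before a
block is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping item: [old_start, old_start + length) now lives at new_start. */
struct remap_sketch {
	uint64_t old_start;
	uint64_t length;
	uint64_t new_start;
};

/*
 * Translate a logical address through a table that is sorted by old_start and
 * has non-overlapping ranges.  Addresses that fall outside every remapped
 * range are returned unchanged.  A binary search stands in for a btree lookup.
 */
static uint64_t remap_logical(const struct remap_sketch *map, size_t nr,
			      uint64_t logical)
{
	size_t lo = 0, hi = nr;

	assert(map != NULL || nr == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (logical < map[mid].old_start)
			hi = mid;		/* target is left of this range */
		else if (logical - map[mid].old_start >= map[mid].length)
			lo = mid + 1;		/* target is right of this range */
		else
			return map[mid].new_start +
			       (logical - map[mid].old_start);
	}
	return logical;
}
```

In this sketch a lookup never modifies the table, which matches the point made
in the thread that mappings are only written when a range is relocated.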


* [PATCH 8/8] btrfs: add support for multiple global roots
  2021-11-05 20:49 [PATCH 0/8] btrfs: extent tree v2, support for global roots Josef Bacik
                   ` (6 preceding siblings ...)
  2021-11-05 20:49 ` [PATCH 7/8] btrfs: add code to support the block group root Josef Bacik
@ 2021-11-05 20:49 ` Josef Bacik
  2021-11-06  1:18   ` Qu Wenruo
  7 siblings, 1 reply; 21+ messages in thread
From: Josef Bacik @ 2021-11-05 20:49 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

With extent tree v2 you will be able to create multiple csum, extent,
and free space trees.  They will be used based on the block group, which
will now use the block_group_item->chunk_objectid to point to the set of
global roots that it will use.  When allocating new block groups we'll
simply mod the gigabyte offset of the block group against the number of
global roots we have, and that will be the block group's global id.

From there we can take the bytenr that we're modifying in the respective
tree, look up the block group and get that block group's corresponding
global root id.  With that id we can get to the appropriate global root
for that bytenr.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c     | 11 +++++++--
 fs/btrfs/block-group.h     |  1 +
 fs/btrfs/ctree.h           |  2 ++
 fs/btrfs/disk-io.c         | 49 +++++++++++++++++++++++++++++++-------
 fs/btrfs/free-space-tree.c |  2 ++
 fs/btrfs/transaction.c     | 15 ++++++++++++
 fs/btrfs/tree-checker.c    | 21 ++++++++++++++--
 7 files changed, 88 insertions(+), 13 deletions(-)
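
The id assignment and lookup described in the commit message can be sketched in
plain userspace C.  This is an illustration only, under stated assumptions:
`struct bg_sketch`, `assign_global_root_id()` and `lookup_global_root_id()` are
hypothetical names, and a linear scan replaces the kernel's block group lookup.
The special case for bytenr 0 in btrfs_global_root_id() is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/* Hypothetical stand-in for struct btrfs_block_group; only the fields used here. */
struct bg_sketch {
	uint64_t start;			/* logical start of the block group */
	uint64_t length;		/* size of the block group in bytes */
	uint64_t global_root_id;	/* stored on disk in chunk_objectid */
};

/* Same arithmetic as the EXTENT_TREE_V2 branch in btrfs_make_block_group(). */
static uint64_t assign_global_root_id(uint64_t bg_start, uint64_t nr_global_roots)
{
	assert(nr_global_roots > 0);
	return (bg_start / SZ_1G) % nr_global_roots;
}

/*
 * Find the block group containing bytenr and return its global root id.
 * Returns 0 when no block group matches, as the patch does on a failed lookup.
 */
static uint64_t lookup_global_root_id(const struct bg_sketch *bgs, size_t nr,
				      uint64_t bytenr)
{
	for (size_t i = 0; i < nr; i++) {
		if (bytenr >= bgs[i].start &&
		    bytenr - bgs[i].start < bgs[i].length)
			return bgs[i].global_root_id;
	}
	return 0;
}
```

For example, with 4 global roots a block group starting at 5 GiB gets id 1,
and the returned id is what ends up in the .offset of the root key passed to
btrfs_global_root().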

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 7eb0a8632a01..85516f2fd5da 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2002,6 +2002,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
 	cache->length = key->offset;
 	cache->used = btrfs_stack_block_group_used(bgi);
 	cache->flags = btrfs_stack_block_group_flags(bgi);
+	cache->global_root_id = btrfs_stack_block_group_chunk_objectid(bgi);
 
 	set_free_space_tree_thresholds(cache);
 
@@ -2284,7 +2285,7 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans,
 	spin_lock(&block_group->lock);
 	btrfs_set_stack_block_group_used(&bgi, block_group->used);
 	btrfs_set_stack_block_group_chunk_objectid(&bgi,
-				BTRFS_FIRST_CHUNK_TREE_OBJECTID);
+						   block_group->global_root_id);
 	btrfs_set_stack_block_group_flags(&bgi, block_group->flags);
 	key.objectid = block_group->start;
 	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
@@ -2460,6 +2461,12 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
 	cache->flags = type;
 	cache->last_byte_to_unpin = (u64)-1;
 	cache->cached = BTRFS_CACHE_FINISHED;
+	cache->global_root_id = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+
+	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
+		cache->global_root_id = div64_u64(cache->start, SZ_1G) %
+			fs_info->nr_global_roots;
+
 	if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
 		cache->needs_free_space = 1;
 
@@ -2676,7 +2683,7 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
 	bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
 	btrfs_set_stack_block_group_used(&bgi, cache->used);
 	btrfs_set_stack_block_group_chunk_objectid(&bgi,
-			BTRFS_FIRST_CHUNK_TREE_OBJECTID);
+						   cache->global_root_id);
 	btrfs_set_stack_block_group_flags(&bgi, cache->flags);
 	write_extent_buffer(leaf, &bgi, bi, sizeof(bgi));
 	btrfs_mark_buffer_dirty(leaf);
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 5878b7ce3b78..93aabc68bb6a 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -68,6 +68,7 @@ struct btrfs_block_group {
 	u64 bytes_super;
 	u64 flags;
 	u64 cache_generation;
+	u64 global_root_id;
 
 	/*
 	 * If the free space extent count exceeds this number, convert the block
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b57367141b95..7de0cd2b87ec 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1057,6 +1057,8 @@ struct btrfs_fs_info {
 	spinlock_t relocation_bg_lock;
 	u64 data_reloc_bg;
 
+	u64 nr_global_roots;
+
 	spinlock_t zone_active_bgs_lock;
 	struct list_head zone_active_bgs;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 45b2bde43150..a8bc00d17b26 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1295,13 +1295,33 @@ struct btrfs_root *btrfs_global_root(struct btrfs_fs_info *fs_info,
 	return root;
 }
 
+static u64 btrfs_global_root_id(struct btrfs_fs_info *fs_info, u64 bytenr)
+{
+	struct btrfs_block_group *block_group;
+	u64 ret;
+
+	if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
+		return 0;
+
+	if (likely(bytenr))
+		block_group = btrfs_lookup_block_group(fs_info, bytenr);
+	else
+		block_group = btrfs_lookup_first_block_group(fs_info, bytenr);
+	ASSERT(block_group);
+	if (!block_group)
+		return 0;
+	ret = block_group->global_root_id;
+	btrfs_put_block_group(block_group);
+	return ret;
+}
+
 struct btrfs_root *btrfs_csum_root(struct btrfs_fs_info *fs_info,
 				   u64 bytenr)
 {
 	struct btrfs_key key = {
 		.objectid = BTRFS_CSUM_TREE_OBJECTID,
 		.type = BTRFS_ROOT_ITEM_KEY,
-		.offset = 0,
+		.offset = btrfs_global_root_id(fs_info, bytenr),
 	};
 
 	return btrfs_global_root(fs_info, &key);
@@ -1313,7 +1333,7 @@ struct btrfs_root *btrfs_extent_root(struct btrfs_fs_info *fs_info,
 	struct btrfs_key key = {
 		.objectid = BTRFS_EXTENT_TREE_OBJECTID,
 		.type = BTRFS_ROOT_ITEM_KEY,
-		.offset = 0,
+		.offset = btrfs_global_root_id(fs_info, bytenr),
 	};
 
 	return btrfs_global_root(fs_info, &key);
@@ -2094,7 +2114,6 @@ static void backup_super_roots(struct btrfs_fs_info *info)
 {
 	const int next_backup = info->backup_root_index;
 	struct btrfs_root_backup *root_backup;
-	struct btrfs_root *csum_root = btrfs_csum_root(info, 0);
 
 	root_backup = info->super_for_commit->super_roots + next_backup;
 
@@ -2128,6 +2147,7 @@ static void backup_super_roots(struct btrfs_fs_info *info)
 			btrfs_header_level(info->block_group_root->node));
 	} else {
 		struct btrfs_root *extent_root = btrfs_extent_root(info, 0);
+		struct btrfs_root *csum_root = btrfs_csum_root(info, 0);
 
 		btrfs_set_backup_extent_root(root_backup,
 					     extent_root->node->start);
@@ -2135,6 +2155,12 @@ static void backup_super_roots(struct btrfs_fs_info *info)
 				btrfs_header_generation(extent_root->node));
 		btrfs_set_backup_extent_root_level(root_backup,
 					btrfs_header_level(extent_root->node));
+
+		btrfs_set_backup_csum_root(root_backup, csum_root->node->start);
+		btrfs_set_backup_csum_root_gen(root_backup,
+					       btrfs_header_generation(csum_root->node));
+		btrfs_set_backup_csum_root_level(root_backup,
+						 btrfs_header_level(csum_root->node));
 	}
 
 	/*
@@ -2156,12 +2182,6 @@ static void backup_super_roots(struct btrfs_fs_info *info)
 	btrfs_set_backup_dev_root_level(root_backup,
 				       btrfs_header_level(info->dev_root->node));
 
-	btrfs_set_backup_csum_root(root_backup, csum_root->node->start);
-	btrfs_set_backup_csum_root_gen(root_backup,
-				       btrfs_header_generation(csum_root->node));
-	btrfs_set_backup_csum_root_level(root_backup,
-					 btrfs_header_level(csum_root->node));
-
 	btrfs_set_backup_total_bytes(root_backup,
 			     btrfs_super_total_bytes(info->super_copy));
 	btrfs_set_backup_bytes_used(root_backup,
@@ -2550,6 +2570,7 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
 {
 	struct btrfs_fs_info *fs_info = tree_root->fs_info;
 	struct btrfs_root *root;
+	u64 max_global_id = 0;
 	int ret;
 	struct btrfs_key key = {
 		.objectid = objectid,
@@ -2586,6 +2607,13 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
 			break;
 		btrfs_release_path(path);
 
+		/*
+		 * Just worry about this for extent tree, it'll be the same for
+		 * everybody.
+		 */
+		if (objectid == BTRFS_EXTENT_TREE_OBJECTID)
+			max_global_id = max(max_global_id, key.offset);
+
 		found = true;
 		root = read_tree_root_path(tree_root, path, &key);
 		if (IS_ERR(root)) {
@@ -2603,6 +2631,9 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
 	}
 	btrfs_release_path(path);
 
+	if (objectid == BTRFS_EXTENT_TREE_OBJECTID)
+		fs_info->nr_global_roots = max_global_id + 1;
+
 	if (!found || ret) {
 		if (objectid == BTRFS_CSUM_TREE_OBJECTID)
 			set_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
index cf227450f356..60a73bcffaf1 100644
--- a/fs/btrfs/free-space-tree.c
+++ b/fs/btrfs/free-space-tree.c
@@ -24,6 +24,8 @@ static struct btrfs_root *btrfs_free_space_root(
 		.type = BTRFS_ROOT_ITEM_KEY,
 		.offset = 0,
 	};
+	if (btrfs_fs_incompat(block_group->fs_info, EXTENT_TREE_V2))
+		key.offset = block_group->global_root_id;
 	return btrfs_global_root(block_group->fs_info, &key);
 }
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index ba8dd90ac3ce..e343ff8db05d 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1827,6 +1827,14 @@ static void update_super_roots(struct btrfs_fs_info *fs_info)
 		super->cache_generation = 0;
 	if (test_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags))
 		super->uuid_tree_generation = root_item->generation;
+
+	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+		root_item = &fs_info->block_group_root->root_item;
+
+		super->block_group_root = root_item->bytenr;
+		super->block_group_root_generation = root_item->generation;
+		super->block_group_root_level = root_item->level;
+	}
 }
 
 int btrfs_transaction_in_commit(struct btrfs_fs_info *info)
@@ -2261,6 +2269,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	list_add_tail(&fs_info->chunk_root->dirty_list,
 		      &cur_trans->switch_commits);
 
+	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+		btrfs_set_root_node(&fs_info->block_group_root->root_item,
+				    fs_info->block_group_root->node);
+		list_add_tail(&fs_info->block_group_root->dirty_list,
+			      &cur_trans->switch_commits);
+	}
+
 	switch_commit_roots(trans);
 
 	ASSERT(list_empty(&cur_trans->dirty_bgs));
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 1c33dd0e4afc..572f52d78297 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -639,8 +639,10 @@ static void block_group_err(const struct extent_buffer *eb, int slot,
 static int check_block_group_item(struct extent_buffer *leaf,
 				  struct btrfs_key *key, int slot)
 {
+	struct btrfs_fs_info *fs_info = leaf->fs_info;
 	struct btrfs_block_group_item bgi;
 	u32 item_size = btrfs_item_size_nr(leaf, slot);
+	u64 chunk_objectid;
 	u64 flags;
 	u64 type;
 
@@ -663,8 +665,23 @@ static int check_block_group_item(struct extent_buffer *leaf,
 
 	read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
 			   sizeof(bgi));
-	if (unlikely(btrfs_stack_block_group_chunk_objectid(&bgi) !=
-		     BTRFS_FIRST_CHUNK_TREE_OBJECTID)) {
+	chunk_objectid = btrfs_stack_block_group_chunk_objectid(&bgi);
+	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+		/*
+		 * We don't init the nr_global_roots until we load the global
+		 * roots, so this could be 0 at mount time.  If it's 0 we'll
+		 * just assume we're fine, and later we'll check against our
+		 * actual value.
+		 */
+		if (unlikely(fs_info->nr_global_roots &&
+			     chunk_objectid >= fs_info->nr_global_roots)) {
+			block_group_err(leaf, slot,
+	"invalid block group global root id, have %llu, needs to be < %llu",
+					chunk_objectid,
+					fs_info->nr_global_roots);
+			return -EUCLEAN;
+		}
+	} else if (unlikely(chunk_objectid != BTRFS_FIRST_CHUNK_TREE_OBJECTID)) {
 		block_group_err(leaf, slot,
 		"invalid block group chunk objectid, have %llu expect %llu",
 				btrfs_stack_block_group_chunk_objectid(&bgi),
-- 
2.26.3
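
As a reading aid for the disk-io.c and tree-checker.c hunks, the following
userspace sketch restates how nr_global_roots is derived and how a block
group's id is validated against it.  The function names are hypothetical and
the array of key offsets stands in for walking the extent root items in the
tree root; this is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mirrors load_global_roots_objectid(): the count is the highest extent root
 * key offset seen, plus one.  With no roots found this still yields 1, since
 * the running maximum starts at 0.
 */
static uint64_t count_global_roots(const uint64_t *extent_root_offsets, size_t nr)
{
	uint64_t max_id = 0;

	assert(extent_root_offsets != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (extent_root_offsets[i] > max_id)
			max_id = extent_root_offsets[i];
	}
	return max_id + 1;
}

/*
 * Mirrors the EXTENT_TREE_V2 branch of check_block_group_item(): a count of 0
 * means the global roots are not loaded yet, so the id is accepted for now.
 * Otherwise ids at or above the count are rejected.
 */
static bool global_root_id_valid(uint64_t id, uint64_t nr_global_roots)
{
	if (nr_global_roots == 0)
		return true;
	return id < nr_global_roots;
}
```

With extent roots at offsets 0..3 the count is 4, so ids 0 through 3 pass and
id 4 would be reported as -EUCLEAN by the checker.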



* Re: [PATCH 8/8] btrfs: add support for multiple global roots
  2021-11-05 20:49 ` [PATCH 8/8] btrfs: add support for multiple global roots Josef Bacik
@ 2021-11-06  1:18   ` Qu Wenruo
  2021-11-06  1:51     ` Qu Wenruo
  0 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2021-11-06  1:18 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 2021/11/6 04:49, Josef Bacik wrote:
> With extent tree v2 you will be able to create multiple csum, extent,
> and free space trees.  They will be used based on the block group, which
> will now use the block_group_item->chunk_objectid to point to the set of
> global roots that it will use.  When allocating new block groups we'll
> simply mod the gigabyte offset of the block group against the number of
> global roots we have, and that will be the block group's global id.
>
> From there we can take the bytenr that we're modifying in the respective
> tree, look up the block group and get that block group's corresponding
> global root id.  With that id we can get to the appropriate global root
> for that bytenr.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/block-group.c     | 11 +++++++--
>   fs/btrfs/block-group.h     |  1 +
>   fs/btrfs/ctree.h           |  2 ++
>   fs/btrfs/disk-io.c         | 49 +++++++++++++++++++++++++++++++-------
>   fs/btrfs/free-space-tree.c |  2 ++
>   fs/btrfs/transaction.c     | 15 ++++++++++++
>   fs/btrfs/tree-checker.c    | 21 ++++++++++++++--
>   7 files changed, 88 insertions(+), 13 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 7eb0a8632a01..85516f2fd5da 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -2002,6 +2002,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
>   	cache->length = key->offset;
>   	cache->used = btrfs_stack_block_group_used(bgi);
>   	cache->flags = btrfs_stack_block_group_flags(bgi);
> +	cache->global_root_id = btrfs_stack_block_group_chunk_objectid(bgi);
>
>   	set_free_space_tree_thresholds(cache);
>
> @@ -2284,7 +2285,7 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans,
>   	spin_lock(&block_group->lock);
>   	btrfs_set_stack_block_group_used(&bgi, block_group->used);
>   	btrfs_set_stack_block_group_chunk_objectid(&bgi,
> -				BTRFS_FIRST_CHUNK_TREE_OBJECTID);
> +						   block_group->global_root_id);
>   	btrfs_set_stack_block_group_flags(&bgi, block_group->flags);
>   	key.objectid = block_group->start;
>   	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
> @@ -2460,6 +2461,12 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
>   	cache->flags = type;
>   	cache->last_byte_to_unpin = (u64)-1;
>   	cache->cached = BTRFS_CACHE_FINISHED;
> +	cache->global_root_id = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
> +
> +	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
> +		cache->global_root_id = div64_u64(cache->start, SZ_1G) %
> +			fs_info->nr_global_roots;
> +

Any special reason for this complex global_root_id calculation?

My initial assumption for global trees is pretty simple, just something
like (CSUM_TREE, ROOT_ITEM, bg bytenr) or (EXTENT_TREE, ROOT_ITEM, bg
bytenr) as their root key items.

But this is definitely not the case here.

Thus I'm wondering why we're not using something simpler.

Thanks,
Qu

>   	if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
>   		cache->needs_free_space = 1;
>
> @@ -2676,7 +2683,7 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
>   	bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
>   	btrfs_set_stack_block_group_used(&bgi, cache->used);
>   	btrfs_set_stack_block_group_chunk_objectid(&bgi,
> -			BTRFS_FIRST_CHUNK_TREE_OBJECTID);
> +						   cache->global_root_id);
>   	btrfs_set_stack_block_group_flags(&bgi, cache->flags);
>   	write_extent_buffer(leaf, &bgi, bi, sizeof(bgi));
>   	btrfs_mark_buffer_dirty(leaf);
> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> index 5878b7ce3b78..93aabc68bb6a 100644
> --- a/fs/btrfs/block-group.h
> +++ b/fs/btrfs/block-group.h
> @@ -68,6 +68,7 @@ struct btrfs_block_group {
>   	u64 bytes_super;
>   	u64 flags;
>   	u64 cache_generation;
> +	u64 global_root_id;
>
>   	/*
>   	 * If the free space extent count exceeds this number, convert the block
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index b57367141b95..7de0cd2b87ec 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1057,6 +1057,8 @@ struct btrfs_fs_info {
>   	spinlock_t relocation_bg_lock;
>   	u64 data_reloc_bg;
>
> +	u64 nr_global_roots;
> +
>   	spinlock_t zone_active_bgs_lock;
>   	struct list_head zone_active_bgs;
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 45b2bde43150..a8bc00d17b26 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1295,13 +1295,33 @@ struct btrfs_root *btrfs_global_root(struct btrfs_fs_info *fs_info,
>   	return root;
>   }
>
> +static u64 btrfs_global_root_id(struct btrfs_fs_info *fs_info, u64 bytenr)
> +{
> +	struct btrfs_block_group *block_group;
> +	u64 ret;
> +
> +	if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
> +		return 0;
> +
> +	if (likely(bytenr))
> +		block_group = btrfs_lookup_block_group(fs_info, bytenr);
> +	else
> +		block_group = btrfs_lookup_first_block_group(fs_info, bytenr);
> +	ASSERT(block_group);
> +	if (!block_group)
> +		return 0;
> +	ret = block_group->global_root_id;
> +	btrfs_put_block_group(block_group);
> +	return ret;
> +}
> +
>   struct btrfs_root *btrfs_csum_root(struct btrfs_fs_info *fs_info,
>   				   u64 bytenr)
>   {
>   	struct btrfs_key key = {
>   		.objectid = BTRFS_CSUM_TREE_OBJECTID,
>   		.type = BTRFS_ROOT_ITEM_KEY,
> -		.offset = 0,
> +		.offset = btrfs_global_root_id(fs_info, bytenr),
>   	};
>
>   	return btrfs_global_root(fs_info, &key);
> @@ -1313,7 +1333,7 @@ struct btrfs_root *btrfs_extent_root(struct btrfs_fs_info *fs_info,
>   	struct btrfs_key key = {
>   		.objectid = BTRFS_EXTENT_TREE_OBJECTID,
>   		.type = BTRFS_ROOT_ITEM_KEY,
> -		.offset = 0,
> +		.offset = btrfs_global_root_id(fs_info, bytenr),
>   	};
>
>   	return btrfs_global_root(fs_info, &key);
> @@ -2094,7 +2114,6 @@ static void backup_super_roots(struct btrfs_fs_info *info)
>   {
>   	const int next_backup = info->backup_root_index;
>   	struct btrfs_root_backup *root_backup;
> -	struct btrfs_root *csum_root = btrfs_csum_root(info, 0);
>
>   	root_backup = info->super_for_commit->super_roots + next_backup;
>
> @@ -2128,6 +2147,7 @@ static void backup_super_roots(struct btrfs_fs_info *info)
>   			btrfs_header_level(info->block_group_root->node));
>   	} else {
>   		struct btrfs_root *extent_root = btrfs_extent_root(info, 0);
> +		struct btrfs_root *csum_root = btrfs_csum_root(info, 0);
>
>   		btrfs_set_backup_extent_root(root_backup,
>   					     extent_root->node->start);
> @@ -2135,6 +2155,12 @@ static void backup_super_roots(struct btrfs_fs_info *info)
>   				btrfs_header_generation(extent_root->node));
>   		btrfs_set_backup_extent_root_level(root_backup,
>   					btrfs_header_level(extent_root->node));
> +
> +		btrfs_set_backup_csum_root(root_backup, csum_root->node->start);
> +		btrfs_set_backup_csum_root_gen(root_backup,
> +					       btrfs_header_generation(csum_root->node));
> +		btrfs_set_backup_csum_root_level(root_backup,
> +						 btrfs_header_level(csum_root->node));
>   	}
>
>   	/*
> @@ -2156,12 +2182,6 @@ static void backup_super_roots(struct btrfs_fs_info *info)
>   	btrfs_set_backup_dev_root_level(root_backup,
>   				       btrfs_header_level(info->dev_root->node));
>
> -	btrfs_set_backup_csum_root(root_backup, csum_root->node->start);
> -	btrfs_set_backup_csum_root_gen(root_backup,
> -				       btrfs_header_generation(csum_root->node));
> -	btrfs_set_backup_csum_root_level(root_backup,
> -					 btrfs_header_level(csum_root->node));
> -
>   	btrfs_set_backup_total_bytes(root_backup,
>   			     btrfs_super_total_bytes(info->super_copy));
>   	btrfs_set_backup_bytes_used(root_backup,
> @@ -2550,6 +2570,7 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
>   {
>   	struct btrfs_fs_info *fs_info = tree_root->fs_info;
>   	struct btrfs_root *root;
> +	u64 max_global_id = 0;
>   	int ret;
>   	struct btrfs_key key = {
>   		.objectid = objectid,
> @@ -2586,6 +2607,13 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
>   			break;
>   		btrfs_release_path(path);
>
> +		/*
> +		 * Just worry about this for extent tree, it'll be the same for
> +		 * everybody.
> +		 */
> +		if (objectid == BTRFS_EXTENT_TREE_OBJECTID)
> +			max_global_id = max(max_global_id, key.offset);
> +
>   		found = true;
>   		root = read_tree_root_path(tree_root, path, &key);
>   		if (IS_ERR(root)) {
> @@ -2603,6 +2631,9 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
>   	}
>   	btrfs_release_path(path);
>
> +	if (objectid == BTRFS_EXTENT_TREE_OBJECTID)
> +		fs_info->nr_global_roots = max_global_id + 1;
> +
>   	if (!found || ret) {
>   		if (objectid == BTRFS_CSUM_TREE_OBJECTID)
>   			set_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
> diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
> index cf227450f356..60a73bcffaf1 100644
> --- a/fs/btrfs/free-space-tree.c
> +++ b/fs/btrfs/free-space-tree.c
> @@ -24,6 +24,8 @@ static struct btrfs_root *btrfs_free_space_root(
>   		.type = BTRFS_ROOT_ITEM_KEY,
>   		.offset = 0,
>   	};
> +	if (btrfs_fs_incompat(block_group->fs_info, EXTENT_TREE_V2))
> +		key.offset = block_group->global_root_id;
>   	return btrfs_global_root(block_group->fs_info, &key);
>   }
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index ba8dd90ac3ce..e343ff8db05d 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1827,6 +1827,14 @@ static void update_super_roots(struct btrfs_fs_info *fs_info)
>   		super->cache_generation = 0;
>   	if (test_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags))
>   		super->uuid_tree_generation = root_item->generation;
> +
> +	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
> +		root_item = &fs_info->block_group_root->root_item;
> +
> +		super->block_group_root = root_item->bytenr;
> +		super->block_group_root_generation = root_item->generation;
> +		super->block_group_root_level = root_item->level;
> +	}
>   }
>
>   int btrfs_transaction_in_commit(struct btrfs_fs_info *info)
> @@ -2261,6 +2269,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>   	list_add_tail(&fs_info->chunk_root->dirty_list,
>   		      &cur_trans->switch_commits);
>
> +	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
> +		btrfs_set_root_node(&fs_info->block_group_root->root_item,
> +				    fs_info->block_group_root->node);
> +		list_add_tail(&fs_info->block_group_root->dirty_list,
> +			      &cur_trans->switch_commits);
> +	}
> +
>   	switch_commit_roots(trans);
>
>   	ASSERT(list_empty(&cur_trans->dirty_bgs));
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 1c33dd0e4afc..572f52d78297 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -639,8 +639,10 @@ static void block_group_err(const struct extent_buffer *eb, int slot,
>   static int check_block_group_item(struct extent_buffer *leaf,
>   				  struct btrfs_key *key, int slot)
>   {
> +	struct btrfs_fs_info *fs_info = leaf->fs_info;
>   	struct btrfs_block_group_item bgi;
>   	u32 item_size = btrfs_item_size_nr(leaf, slot);
> +	u64 chunk_objectid;
>   	u64 flags;
>   	u64 type;
>
> @@ -663,8 +665,23 @@ static int check_block_group_item(struct extent_buffer *leaf,
>
>   	read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
>   			   sizeof(bgi));
> -	if (unlikely(btrfs_stack_block_group_chunk_objectid(&bgi) !=
> -		     BTRFS_FIRST_CHUNK_TREE_OBJECTID)) {
> +	chunk_objectid = btrfs_stack_block_group_chunk_objectid(&bgi);
> +	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
> +		/*
> +		 * We don't init the nr_global_roots until we load the global
> +		 * roots, so this could be 0 at mount time.  If it's 0 we'll
> +		 * just assume we're fine, and later we'll check against our
> +		 * actual value.
> +		 */
> +		if (unlikely(fs_info->nr_global_roots &&
> +			     chunk_objectid >= fs_info->nr_global_roots)) {
> +			block_group_err(leaf, slot,
> +	"invalid block group global root id, have %llu, needs to be <= %llu",
> +					chunk_objectid,
> +					fs_info->nr_global_roots);
> +			return -EUCLEAN;
> +		}
> +	} else if (unlikely(chunk_objectid != BTRFS_FIRST_CHUNK_TREE_OBJECTID)) {
>   		block_group_err(leaf, slot,
>   		"invalid block group chunk objectid, have %llu expect %llu",
>   				btrfs_stack_block_group_chunk_objectid(&bgi),
>


* Re: [PATCH 8/8] btrfs: add support for multiple global roots
  2021-11-06  1:18   ` Qu Wenruo
@ 2021-11-06  1:51     ` Qu Wenruo
  2021-11-08 19:39       ` Josef Bacik
  0 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2021-11-06  1:51 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 2021/11/6 09:18, Qu Wenruo wrote:
> 
> 
> On 2021/11/6 04:49, Josef Bacik wrote:
>> With extent tree v2 you will be able to create multiple csum, extent,
>> and free space trees.  They will be used based on the block group, which
>> will now use the block_group_item->chunk_objectid to point to the set of
>> global roots that it will use.  When allocating new block groups we'll
>> simply mod the gigabyte offset of the block group against the number of
>> global roots we have and that will be the block groups global id.
>>
>> From there we can take the bytenr that we're modifying in the respective
>> tree, look up the block group and get that block groups corresponding
>> global root id.  From there we can get to the appropriate global root
>> for that bytenr.
>>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>>   fs/btrfs/block-group.c     | 11 +++++++--
>>   fs/btrfs/block-group.h     |  1 +
>>   fs/btrfs/ctree.h           |  2 ++
>>   fs/btrfs/disk-io.c         | 49 +++++++++++++++++++++++++++++++-------
>>   fs/btrfs/free-space-tree.c |  2 ++
>>   fs/btrfs/transaction.c     | 15 ++++++++++++
>>   fs/btrfs/tree-checker.c    | 21 ++++++++++++++--
>>   7 files changed, 88 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>> index 7eb0a8632a01..85516f2fd5da 100644
>> --- a/fs/btrfs/block-group.c
>> +++ b/fs/btrfs/block-group.c
>> @@ -2002,6 +2002,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
>>       cache->length = key->offset;
>>       cache->used = btrfs_stack_block_group_used(bgi);
>>       cache->flags = btrfs_stack_block_group_flags(bgi);
>> +    cache->global_root_id = btrfs_stack_block_group_chunk_objectid(bgi);
>>
>>       set_free_space_tree_thresholds(cache);
>>
>> @@ -2284,7 +2285,7 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans,
>>       spin_lock(&block_group->lock);
>>       btrfs_set_stack_block_group_used(&bgi, block_group->used);
>>       btrfs_set_stack_block_group_chunk_objectid(&bgi,
>> -                BTRFS_FIRST_CHUNK_TREE_OBJECTID);
>> +                           block_group->global_root_id);
>>       btrfs_set_stack_block_group_flags(&bgi, block_group->flags);
>>       key.objectid = block_group->start;
>>       key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
>> @@ -2460,6 +2461,12 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
>>       cache->flags = type;
>>       cache->last_byte_to_unpin = (u64)-1;
>>       cache->cached = BTRFS_CACHE_FINISHED;
>> +    cache->global_root_id = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
>> +
>> +    if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
>> +        cache->global_root_id = div64_u64(cache->start, SZ_1G) %
>> +            fs_info->nr_global_roots;
>> +
> 
> Any special reason for this complex global_root_id calculation?
> 
> My initial assumption for global trees is pretty simple, just something
> like (CSUM_TREE, ROOT_ITEM, bg bytenr) or (EXTENT_TREE, ROOT_ITEM, bg
> bytenr) as their root key items.
> 
> But this is definitely not the case here.
> 
> Thus I'm wondering why we're not using something more simple.

I also forgot that on smaller filesystems a metadata block group can be
only a few megabytes in size.

In that case, several metadata block groups would fall inside the same
gigabyte and so share the same global_root_id, which is definitely not a
good idea.

Thanks,
Qu
> 
> Thanks,
> Qu
> 
>>       if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
>>           cache->needs_free_space = 1;
>>
>> @@ -2676,7 +2683,7 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
>>       bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
>>       btrfs_set_stack_block_group_used(&bgi, cache->used);
>>       btrfs_set_stack_block_group_chunk_objectid(&bgi,
>> -            BTRFS_FIRST_CHUNK_TREE_OBJECTID);
>> +                           cache->global_root_id);
>>       btrfs_set_stack_block_group_flags(&bgi, cache->flags);
>>       write_extent_buffer(leaf, &bgi, bi, sizeof(bgi));
>>       btrfs_mark_buffer_dirty(leaf);
>> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
>> index 5878b7ce3b78..93aabc68bb6a 100644
>> --- a/fs/btrfs/block-group.h
>> +++ b/fs/btrfs/block-group.h
>> @@ -68,6 +68,7 @@ struct btrfs_block_group {
>>       u64 bytes_super;
>>       u64 flags;
>>       u64 cache_generation;
>> +    u64 global_root_id;
>>
>>       /*
>>        * If the free space extent count exceeds this number, convert the block
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index b57367141b95..7de0cd2b87ec 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1057,6 +1057,8 @@ struct btrfs_fs_info {
>>       spinlock_t relocation_bg_lock;
>>       u64 data_reloc_bg;
>>
>> +    u64 nr_global_roots;
>> +
>>       spinlock_t zone_active_bgs_lock;
>>       struct list_head zone_active_bgs;
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 45b2bde43150..a8bc00d17b26 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1295,13 +1295,33 @@ struct btrfs_root *btrfs_global_root(struct btrfs_fs_info *fs_info,
>>       return root;
>>   }
>>
>> +static u64 btrfs_global_root_id(struct btrfs_fs_info *fs_info, u64 bytenr)
>> +{
>> +    struct btrfs_block_group *block_group;
>> +    u64 ret;
>> +
>> +    if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
>> +        return 0;
>> +
>> +    if (likely(bytenr))
>> +        block_group = btrfs_lookup_block_group(fs_info, bytenr);
>> +    else
>> +        block_group = btrfs_lookup_first_block_group(fs_info, bytenr);
>> +    ASSERT(block_group);
>> +    if (!block_group)
>> +        return 0;
>> +    ret = block_group->global_root_id;
>> +    btrfs_put_block_group(block_group);
>> +    return ret;
>> +}
>> +
>>   struct btrfs_root *btrfs_csum_root(struct btrfs_fs_info *fs_info,
>>                      u64 bytenr)
>>   {
>>       struct btrfs_key key = {
>>           .objectid = BTRFS_CSUM_TREE_OBJECTID,
>>           .type = BTRFS_ROOT_ITEM_KEY,
>> -        .offset = 0,
>> +        .offset = btrfs_global_root_id(fs_info, bytenr),
>>       };
>>
>>       return btrfs_global_root(fs_info, &key);
>> @@ -1313,7 +1333,7 @@ struct btrfs_root *btrfs_extent_root(struct btrfs_fs_info *fs_info,
>>       struct btrfs_key key = {
>>           .objectid = BTRFS_EXTENT_TREE_OBJECTID,
>>           .type = BTRFS_ROOT_ITEM_KEY,
>> -        .offset = 0,
>> +        .offset = btrfs_global_root_id(fs_info, bytenr),
>>       };
>>
>>       return btrfs_global_root(fs_info, &key);
>> @@ -2094,7 +2114,6 @@ static void backup_super_roots(struct btrfs_fs_info *info)
>>   {
>>       const int next_backup = info->backup_root_index;
>>       struct btrfs_root_backup *root_backup;
>> -    struct btrfs_root *csum_root = btrfs_csum_root(info, 0);
>>
>>       root_backup = info->super_for_commit->super_roots + next_backup;
>>
>> @@ -2128,6 +2147,7 @@ static void backup_super_roots(struct btrfs_fs_info *info)
>>               btrfs_header_level(info->block_group_root->node));
>>       } else {
>>           struct btrfs_root *extent_root = btrfs_extent_root(info, 0);
>> +        struct btrfs_root *csum_root = btrfs_csum_root(info, 0);
>>
>>           btrfs_set_backup_extent_root(root_backup,
>>                            extent_root->node->start);
>> @@ -2135,6 +2155,12 @@ static void backup_super_roots(struct btrfs_fs_info *info)
>>                   btrfs_header_generation(extent_root->node));
>>           btrfs_set_backup_extent_root_level(root_backup,
>>                       btrfs_header_level(extent_root->node));
>> +
>> +        btrfs_set_backup_csum_root(root_backup, csum_root->node->start);
>> +        btrfs_set_backup_csum_root_gen(root_backup,
>> +                           btrfs_header_generation(csum_root->node));
>> +        btrfs_set_backup_csum_root_level(root_backup,
>> +                         btrfs_header_level(csum_root->node));
>>       }
>>
>>       /*
>> @@ -2156,12 +2182,6 @@ static void backup_super_roots(struct btrfs_fs_info *info)
>>       btrfs_set_backup_dev_root_level(root_backup,
>>                          btrfs_header_level(info->dev_root->node));
>>
>> -    btrfs_set_backup_csum_root(root_backup, csum_root->node->start);
>> -    btrfs_set_backup_csum_root_gen(root_backup,
>> -                       btrfs_header_generation(csum_root->node));
>> -    btrfs_set_backup_csum_root_level(root_backup,
>> -                     btrfs_header_level(csum_root->node));
>> -
>>       btrfs_set_backup_total_bytes(root_backup,
>>                    btrfs_super_total_bytes(info->super_copy));
>>       btrfs_set_backup_bytes_used(root_backup,
>> @@ -2550,6 +2570,7 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
>>   {
>>       struct btrfs_fs_info *fs_info = tree_root->fs_info;
>>       struct btrfs_root *root;
>> +    u64 max_global_id = 0;
>>       int ret;
>>       struct btrfs_key key = {
>>           .objectid = objectid,
>> @@ -2586,6 +2607,13 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
>>               break;
>>           btrfs_release_path(path);
>>
>> +        /*
>> +         * Just worry about this for extent tree, it'll be the same for
>> +         * everybody.
>> +         */
>> +        if (objectid == BTRFS_EXTENT_TREE_OBJECTID)
>> +            max_global_id = max(max_global_id, key.offset);
>> +
>>           found = true;
>>           root = read_tree_root_path(tree_root, path, &key);
>>           if (IS_ERR(root)) {
>> @@ -2603,6 +2631,9 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
>>       }
>>       btrfs_release_path(path);
>>
>> +    if (objectid == BTRFS_EXTENT_TREE_OBJECTID)
>> +        fs_info->nr_global_roots = max_global_id + 1;
>> +
>>       if (!found || ret) {
>>           if (objectid == BTRFS_CSUM_TREE_OBJECTID)
>>               set_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
>> diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
>> index cf227450f356..60a73bcffaf1 100644
>> --- a/fs/btrfs/free-space-tree.c
>> +++ b/fs/btrfs/free-space-tree.c
>> @@ -24,6 +24,8 @@ static struct btrfs_root *btrfs_free_space_root(
>>           .type = BTRFS_ROOT_ITEM_KEY,
>>           .offset = 0,
>>       };
>> +    if (btrfs_fs_incompat(block_group->fs_info, EXTENT_TREE_V2))
>> +        key.offset = block_group->global_root_id;
>>       return btrfs_global_root(block_group->fs_info, &key);
>>   }
>>
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index ba8dd90ac3ce..e343ff8db05d 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -1827,6 +1827,14 @@ static void update_super_roots(struct btrfs_fs_info *fs_info)
>>           super->cache_generation = 0;
>>       if (test_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags))
>>           super->uuid_tree_generation = root_item->generation;
>> +
>> +    if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
>> +        root_item = &fs_info->block_group_root->root_item;
>> +
>> +        super->block_group_root = root_item->bytenr;
>> +        super->block_group_root_generation = root_item->generation;
>> +        super->block_group_root_level = root_item->level;
>> +    }
>>   }
>>
>>   int btrfs_transaction_in_commit(struct btrfs_fs_info *info)
>> @@ -2261,6 +2269,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>>       list_add_tail(&fs_info->chunk_root->dirty_list,
>>                 &cur_trans->switch_commits);
>>
>> +    if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
>> +        btrfs_set_root_node(&fs_info->block_group_root->root_item,
>> +                    fs_info->block_group_root->node);
>> +        list_add_tail(&fs_info->block_group_root->dirty_list,
>> +                  &cur_trans->switch_commits);
>> +    }
>> +
>>       switch_commit_roots(trans);
>>
>>       ASSERT(list_empty(&cur_trans->dirty_bgs));
>> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
>> index 1c33dd0e4afc..572f52d78297 100644
>> --- a/fs/btrfs/tree-checker.c
>> +++ b/fs/btrfs/tree-checker.c
>> @@ -639,8 +639,10 @@ static void block_group_err(const struct extent_buffer *eb, int slot,
>>   static int check_block_group_item(struct extent_buffer *leaf,
>>                     struct btrfs_key *key, int slot)
>>   {
>> +    struct btrfs_fs_info *fs_info = leaf->fs_info;
>>       struct btrfs_block_group_item bgi;
>>       u32 item_size = btrfs_item_size_nr(leaf, slot);
>> +    u64 chunk_objectid;
>>       u64 flags;
>>       u64 type;
>>
>> @@ -663,8 +665,23 @@ static int check_block_group_item(struct extent_buffer *leaf,
>>
>>       read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
>>                  sizeof(bgi));
>> -    if (unlikely(btrfs_stack_block_group_chunk_objectid(&bgi) !=
>> -             BTRFS_FIRST_CHUNK_TREE_OBJECTID)) {
>> +    chunk_objectid = btrfs_stack_block_group_chunk_objectid(&bgi);
>> +    if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
>> +        /*
>> +         * We don't init the nr_global_roots until we load the global
>> +         * roots, so this could be 0 at mount time.  If it's 0 we'll
>> +         * just assume we're fine, and later we'll check against our
>> +         * actual value.
>> +         */
>> +        if (unlikely(fs_info->nr_global_roots &&
>> +                 chunk_objectid >= fs_info->nr_global_roots)) {
>> +            block_group_err(leaf, slot,
>> +    "invalid block group global root id, have %llu, needs to be <= %llu",
>> +                    chunk_objectid,
>> +                    fs_info->nr_global_roots);
>> +            return -EUCLEAN;
>> +        }
>> +    } else if (unlikely(chunk_objectid != BTRFS_FIRST_CHUNK_TREE_OBJECTID)) {
>>           block_group_err(leaf, slot,
>>           "invalid block group chunk objectid, have %llu expect %llu",
>>                   btrfs_stack_block_group_chunk_objectid(&bgi),
>>


* Re: [PATCH 8/8] btrfs: add support for multiple global roots
  2021-11-06  1:51     ` Qu Wenruo
@ 2021-11-08 19:39       ` Josef Bacik
  0 siblings, 0 replies; 21+ messages in thread
From: Josef Bacik @ 2021-11-08 19:39 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, kernel-team

On Sat, Nov 06, 2021 at 09:51:20AM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/11/6 09:18, Qu Wenruo wrote:
> > 
> > 
> > On 2021/11/6 04:49, Josef Bacik wrote:
> > > With extent tree v2 you will be able to create multiple csum, extent,
> > > and free space trees.  They will be used based on the block group, which
> > > will now use the block_group_item->chunk_objectid to point to the set of
> > > global roots that it will use.  When allocating new block groups we'll
> > > simply mod the gigabyte offset of the block group against the number of
> > > global roots we have and that will be the block groups global id.
> > > 
> > > From there we can take the bytenr that we're modifying in the respective
> > > tree, look up the block group and get that block groups corresponding
> > > global root id.  From there we can get to the appropriate global root
> > > for that bytenr.
> > > 
> > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > > ---
> > >   fs/btrfs/block-group.c     | 11 +++++++--
> > >   fs/btrfs/block-group.h     |  1 +
> > >   fs/btrfs/ctree.h           |  2 ++
> > >   fs/btrfs/disk-io.c         | 49 +++++++++++++++++++++++++++++++-------
> > >   fs/btrfs/free-space-tree.c |  2 ++
> > >   fs/btrfs/transaction.c     | 15 ++++++++++++
> > >   fs/btrfs/tree-checker.c    | 21 ++++++++++++++--
> > >   7 files changed, 88 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > > index 7eb0a8632a01..85516f2fd5da 100644
> > > --- a/fs/btrfs/block-group.c
> > > +++ b/fs/btrfs/block-group.c
> > > @@ -2002,6 +2002,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
> > >       cache->length = key->offset;
> > >       cache->used = btrfs_stack_block_group_used(bgi);
> > >       cache->flags = btrfs_stack_block_group_flags(bgi);
> > > +    cache->global_root_id = btrfs_stack_block_group_chunk_objectid(bgi);
> > > 
> > >       set_free_space_tree_thresholds(cache);
> > > 
> > > @@ -2284,7 +2285,7 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans,
> > >       spin_lock(&block_group->lock);
> > >       btrfs_set_stack_block_group_used(&bgi, block_group->used);
> > >       btrfs_set_stack_block_group_chunk_objectid(&bgi,
> > > -                BTRFS_FIRST_CHUNK_TREE_OBJECTID);
> > > +                           block_group->global_root_id);
> > >       btrfs_set_stack_block_group_flags(&bgi, block_group->flags);
> > >       key.objectid = block_group->start;
> > >       key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
> > > @@ -2460,6 +2461,12 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
> > >       cache->flags = type;
> > >       cache->last_byte_to_unpin = (u64)-1;
> > >       cache->cached = BTRFS_CACHE_FINISHED;
> > > +    cache->global_root_id = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
> > > +
> > > +    if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
> > > +        cache->global_root_id = div64_u64(cache->start, SZ_1G) %
> > > +            fs_info->nr_global_roots;
> > > +
> > 
> > Any special reason for this complex global_root_id calculation?
> > 
> > My initial assumption for global trees is pretty simple, just something
> > like (CSUM_TREE, ROOT_ITEM, bg bytenr) or (EXTENT_TREE, ROOT_ITEM, bg
> > bytenr) as their root key items.
> > 
> > But this is definitely not the case here.
> > 
> > Thus I'm wondering why we're not using something more simple.
> 

Because we don't have a global tree per block group.  We have N global roots
that have to be round-robined across the block groups.  We could do something
smarter in the future, but for now plain round-robin is easy to wrap your
head around, and is a decent default.

> And I also forgot that, for smaller fs, we could have metadata only sized
> several megabytes.
> 
> In that case, several metadata block groups would share the same
> global_root_id, which is definitely not a good idea.
> 

Sure, we can adjust this logic; I'll change it to scale down for smaller
file systems.  Thanks,

Josef


end of thread (~2021-11-10 13:57 UTC)

Thread overview: 21+ messages
2021-11-05 20:49 [PATCH 0/8] btrfs: extent tree v2, support for global roots Josef Bacik
2021-11-05 20:49 ` [PATCH 1/8] btrfs: add definition for EXTENT_TREE_V2 Josef Bacik
2021-11-05 20:49 ` [PATCH 2/8] btrfs: disable balance for extent tree v2 for now Josef Bacik
2021-11-05 20:49 ` [PATCH 3/8] btrfs: disable qgroups in extent tree v2 Josef Bacik
2021-11-05 20:49 ` [PATCH 4/8] btrfs: use metadata usage for global block rsv " Josef Bacik
2021-11-05 20:49 ` [PATCH 5/8] btrfs: tree-checker: don't fail on empty extent roots for " Josef Bacik
2021-11-06  1:05   ` Qu Wenruo
2021-11-05 20:49 ` [PATCH 6/8] btrfs: abstract out loading the tree root Josef Bacik
2021-11-05 20:49 ` [PATCH 7/8] btrfs: add code to support the block group root Josef Bacik
2021-11-06  1:11   ` Qu Wenruo
2021-11-08 19:36     ` Josef Bacik
2021-11-09  1:14       ` Qu Wenruo
2021-11-09 19:24         ` Josef Bacik
2021-11-09 23:44           ` Qu Wenruo
2021-11-10 13:57             ` Josef Bacik
2021-11-10  7:13           ` Qu Wenruo
2021-11-10 13:54             ` Josef Bacik
2021-11-05 20:49 ` [PATCH 8/8] btrfs: add support for multiple global roots Josef Bacik
2021-11-06  1:18   ` Qu Wenruo
2021-11-06  1:51     ` Qu Wenruo
2021-11-08 19:39       ` Josef Bacik
