[PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched
@ 2022-10-12  9:12 Qu Wenruo
  2022-10-12  9:12 ` [PATCH v2 01/15] btrfs: initialize fs_info->sb at the very beginning of open_ctree() Qu Wenruo
                   ` (15 more replies)
  0 siblings, 16 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:12 UTC (permalink / raw)
  To: linux-btrfs

[Changelog]
v2:
- Rebased to latest misc-next
  Most conflicts comes from the new function btrfs_check_features().

Just like init_btrfs_fs(), open_ctree() also has tons of different
labels for its error handling.

And unsurprisingly the error handling labels are not matched correctly,
e.g. we always call btrfs_mapping_tree_free() even we didn't reach
sys chunk array read.

And every time we need to add some new function, it will be a disaster
just to understand where the new function should be put and how the
error handling should be done.

This patchset will follow the init_btrfs_fs() method, by introducing
an open_ctree_seq[] array, which contains the following sections:

- btree_inode init/exit
- super block read and verification
- mount options and features check
- workqueues init/exit
- chunk tree init/exit
- tree roots init/exit
- mount time check and various item load
- sysfs init/exit
- block group tree init/exit
- subvolume trees init/exit
- kthread init/exit
- qgroup init/exit

The remaining part of open_ctree() is only less than 50 lines, and are
all related to the very end of the mount progress, including log-replay,
uuid tree check.

Also to do better testing, for DEBUG build there will be a new mount
option, "fail_mount=%u" to allow open_ctree() to fail at certain stage
of open_ctree_seq[].

Unfortunately since that mount option can only be parsed in
open_ctree_features_init(), this means we can only fail after stage 2.
But this should still provide much better testing coverage.

Qu Wenruo (15):
  btrfs: initialize fs_info->sb at the very beginning of open_ctree()
  btrfs: remove @fs_devices argument from open_ctree()
  btrfs: extract btree inode init code into its own init/exit helpers
  btrfs: extract super block read code into its own init helper
  btrfs: extract mount options and features init code into its own init
    helper
  btrfs: move btrfs_init_workqueus() and btrfs_stop_all_workers() into
    open_ctree_seq[]
  btrfs: extract chunk tree read code into its own init/exit helpers
  btrfs: extract tree roots and zone info initialization into init/exit
    helpers
  btrfs: extract mount time checks and items load code into its init
    helper
  btrfs: extract sysfs init into its own helper
  btrfs: extra block groups read code into its own init/exit helpers
  btrfs: move the fs root related code into its own init/exit helpers
  btrfs: extract kthread code into its own init/exit helpers
  btrfs: move qgroup init/exit code into open_ctree_seq[] array
  btrfs: introduce a debug mount option to do error injection for each
    stage of open_ctree()

 fs/btrfs/ctree.h   |   7 +
 fs/btrfs/disk-io.c | 611 +++++++++++++++++++++++++++++----------------
 fs/btrfs/disk-io.h |   4 +-
 fs/btrfs/super.c   |  18 +-
 4 files changed, 418 insertions(+), 222 deletions(-)

-- 
2.37.3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 01/15] btrfs: initialize fs_info->sb at the very beginning of open_ctree()
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
@ 2022-10-12  9:12 ` Qu Wenruo
  2022-10-12  9:12 ` [PATCH v2 02/15] btrfs: remove @fs_devices argument from open_ctree() Qu Wenruo
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:12 UTC (permalink / raw)
  To: linux-btrfs

Currently at open_ctree(), sb->s_fs_info is already initialized to the
fs_info we want.

On the other hand, fs_info->sb is not initialized until
init_mount_fs_info().

This patch will initialize fs_info->sb at the very beginning of
open_ctree(), so later code can use fs_info->sb to grab the super block.

This does not only remove the @sb parameter for init_mount_fs_info(),
but also provides the basis for later open_ctree() refactor which
requires everything to be accessible from a single @fs_info pointer.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9f526841c68b..c00999212c91 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3102,13 +3102,12 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	INIT_WORK(&fs_info->reclaim_bgs_work, btrfs_reclaim_bgs_work);
 }
 
-static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct super_block *sb)
+static int init_mount_fs_info(struct btrfs_fs_info *fs_info)
 {
 	int ret;
 
-	fs_info->sb = sb;
-	sb->s_blocksize = BTRFS_BDEV_BLOCKSIZE;
-	sb->s_blocksize_bits = blksize_bits(BTRFS_BDEV_BLOCKSIZE);
+	fs_info->sb->s_blocksize = BTRFS_BDEV_BLOCKSIZE;
+	fs_info->sb->s_blocksize_bits = blksize_bits(BTRFS_BDEV_BLOCKSIZE);
 
 	ret = percpu_counter_init(&fs_info->ordered_bytes, 0, GFP_KERNEL);
 	if (ret)
@@ -3136,7 +3135,7 @@ static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct super_block
 		return -ENOMEM;
 	btrfs_init_delayed_root(fs_info->delayed_root);
 
-	if (sb_rdonly(sb))
+	if (sb_rdonly(fs_info->sb))
 		set_bit(BTRFS_FS_STATE_RO, &fs_info->fs_state);
 
 	return btrfs_alloc_stripe_hash_table(fs_info);
@@ -3420,7 +3419,9 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	int err = -EINVAL;
 	int level;
 
-	ret = init_mount_fs_info(fs_info, sb);
+	fs_info->sb = sb;
+
+	ret = init_mount_fs_info(fs_info);
 	if (ret) {
 		err = ret;
 		goto fail;
@@ -3438,7 +3439,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail;
 	}
 
-	fs_info->btree_inode = new_inode(sb);
+	fs_info->btree_inode = new_inode(fs_info->sb);
 	if (!fs_info->btree_inode) {
 		err = -ENOMEM;
 		goto fail;
@@ -3546,13 +3547,13 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	fs_info->csums_per_leaf = BTRFS_MAX_ITEM_SIZE(fs_info) / fs_info->csum_size;
 	fs_info->stripesize = stripesize;
 
-	ret = btrfs_parse_options(fs_info, options, sb->s_flags);
+	ret = btrfs_parse_options(fs_info, options, fs_info->sb->s_flags);
 	if (ret) {
 		err = ret;
 		goto fail_alloc;
 	}
 
-	ret = btrfs_check_features(fs_info, sb);
+	ret = btrfs_check_features(fs_info, fs_info->sb);
 	if (ret < 0) {
 		err = ret;
 		goto fail_alloc;
@@ -3588,12 +3589,12 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_sb_buffer;
 	}
 
-	sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
-	sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
+	fs_info->sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
+	fs_info->sb->s_bdi->ra_pages = max(fs_info->sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
 
-	sb->s_blocksize = sectorsize;
-	sb->s_blocksize_bits = blksize_bits(sectorsize);
-	memcpy(&sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE);
+	fs_info->sb->s_blocksize = sectorsize;
+	fs_info->sb->s_blocksize_bits = blksize_bits(sectorsize);
+	memcpy(&fs_info->sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE);
 
 	mutex_lock(&fs_info->chunk_mutex);
 	ret = btrfs_read_sys_array(fs_info);
@@ -3790,7 +3791,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_qgroup;
 	}
 
-	if (sb_rdonly(sb))
+	if (sb_rdonly(fs_info->sb))
 		goto clear_oneshot;
 
 	ret = btrfs_start_pre_rw_mount(fs_info);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 02/15] btrfs: remove @fs_devices argument from open_ctree()
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
  2022-10-12  9:12 ` [PATCH v2 01/15] btrfs: initialize fs_info->sb at the very beginning of open_ctree() Qu Wenruo
@ 2022-10-12  9:12 ` Qu Wenruo
  2022-10-12  9:12 ` [PATCH v2 03/15] btrfs: extract btree inode init code into its own init/exit helpers Qu Wenruo
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:12 UTC (permalink / raw)
  To: linux-btrfs

At the timing of open_ctree(), we have already initialized
fs_info->fs_devics, thus no need to pass a dedicated @fs_devices
argument into open_ctree().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 7 +++++--
 fs/btrfs/disk-io.h | 4 +---
 fs/btrfs/super.c   | 5 ++---
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c00999212c91..ba9fe0041e6f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3402,8 +3402,7 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, struct super_block *sb)
 	return 0;
 }
 
-int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices,
-		      char *options)
+int __cold open_ctree(struct super_block *sb, char *options)
 {
 	u32 sectorsize;
 	u32 nodesize;
@@ -3413,6 +3412,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	u16 csum_type;
 	struct btrfs_super_block *disk_super;
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	struct btrfs_root *tree_root;
 	struct btrfs_root *chunk_root;
 	int ret;
@@ -3421,6 +3421,9 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 
 	fs_info->sb = sb;
 
+	/* Caller should have already initialized fs_info->fs_devices. */
+	ASSERT(fs_info->fs_devices);
+
 	ret = init_mount_fs_info(fs_info);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 0a77948bb703..15a1a56258c8 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -42,9 +42,7 @@ struct extent_buffer *btrfs_find_create_tree_block(
 void btrfs_clean_tree_block(struct extent_buffer *buf);
 void btrfs_clear_oneshot_options(struct btrfs_fs_info *fs_info);
 int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info);
-int __cold open_ctree(struct super_block *sb,
-	       struct btrfs_fs_devices *fs_devices,
-	       char *options);
+int __cold open_ctree(struct super_block *sb, char *options);
 void __cold close_ctree(struct btrfs_fs_info *fs_info);
 int btrfs_validate_super(struct btrfs_fs_info *fs_info,
 			 struct btrfs_super_block *sb, int mirror_num);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 688f7704d3fd..bbdbd2a6e3bc 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1429,7 +1429,6 @@ static int get_default_subvol_objectid(struct btrfs_fs_info *fs_info, u64 *objec
 }
 
 static int btrfs_fill_super(struct super_block *sb,
-			    struct btrfs_fs_devices *fs_devices,
 			    void *data)
 {
 	struct inode *inode;
@@ -1458,7 +1457,7 @@ static int btrfs_fill_super(struct super_block *sb,
 		return err;
 	}
 
-	err = open_ctree(sb, fs_devices, (char *)data);
+	err = open_ctree(sb, (char *)data);
 	if (err) {
 		btrfs_err(fs_info, "open_ctree failed");
 		return err;
@@ -1826,7 +1825,7 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
 		btrfs_sb(s)->bdev_holder = fs_type;
 		if (!strstr(crc32c_impl(), "generic"))
 			set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags);
-		error = btrfs_fill_super(s, fs_devices, data);
+		error = btrfs_fill_super(s, data);
 	}
 	if (!error)
 		error = security_sb_set_mnt_opts(s, new_sec_opts, 0, NULL);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 03/15] btrfs: extract btree inode init code into its own init/exit helpers
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
  2022-10-12  9:12 ` [PATCH v2 01/15] btrfs: initialize fs_info->sb at the very beginning of open_ctree() Qu Wenruo
  2022-10-12  9:12 ` [PATCH v2 02/15] btrfs: remove @fs_devices argument from open_ctree() Qu Wenruo
@ 2022-10-12  9:12 ` Qu Wenruo
  2022-10-12  9:13 ` [PATCH v2 04/15] btrfs: extract super block read code into its own init helper Qu Wenruo
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:12 UTC (permalink / raw)
  To: linux-btrfs

Just like how we handle init_btrfs_fs(), here we also use an array to
handle the open_ctree() sequence.

This patch will do the first step by introducing the arrays, and put
btree_inode init/exit code into two helpers:

- open_ctree_btree_inode_init
- open_ctree_btree_inode_exit

[EXPOSED LABEL MISMATCH]
- Bad btrfs_mapping_tree_free() call
  Currently in fail_alloc tag, we call not only iput() on btree_inode, but
  also free the fs_info->mapping_tree, which is not correct.

  As the first time we touch mapping_tree is at btrfs_read_sys_array(),
  thus the btrfs_mapping_tree_free() call is already a label mismatch.

  This will be addressed when we refactor the chunk tree init code.

- Bad invalidate_inode_pages2() call
  After initiliazing the btree inode, we should invalidate all the page
  cache for metadata before we put the btree inode.

  But the old code is calling invalidate_inode_pages2() before stopping
  all workers.

  This is addressed by this patch as we're really the last exit function
  to be executed, thus we're safe to call invalidate_inode_pages2() then
  iput().

[SPECIAL HANDLING]
After init_mount_fs_info() if we hit some error, we don't need to undo
the work of init_mount_fs_info(), as caller will call btrfs_free_fs_info()
to free it anyway.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 111 +++++++++++++++++++++++++++++++--------------
 1 file changed, 77 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ba9fe0041e6f..1c1c767f2bf1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3402,6 +3402,64 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, struct super_block *sb)
 	return 0;
 }
 
+static int open_ctree_btree_inode_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	/*
+	 * The members initialized inside init_mount_fs_info() will be
+	 * handled in btrfs_free_fs_info() by the caller.
+	 * Thus we don't need to handle its error here.
+	 */
+	ret = init_mount_fs_info(fs_info);
+	if (ret < 0)
+		return ret;
+
+	/* These need to be init'ed before we start creating inodes and such. */
+	fs_info->tree_root = btrfs_alloc_root(fs_info, BTRFS_ROOT_TREE_OBJECTID,
+					      GFP_KERNEL);
+	fs_info->chunk_root = btrfs_alloc_root(fs_info, BTRFS_CHUNK_TREE_OBJECTID,
+					       GFP_KERNEL);
+	if (!fs_info->tree_root || !fs_info->chunk_root)
+		goto enomem;
+	fs_info->btree_inode = new_inode(fs_info->sb);
+	if (!fs_info->btree_inode)
+		goto enomem;
+
+	mapping_set_gfp_mask(fs_info->btree_inode->i_mapping, GFP_NOFS);
+	btrfs_init_btree_inode(fs_info);
+	invalidate_bdev(fs_info->fs_devices->latest_dev->bdev);
+	return 0;
+enomem:
+	btrfs_put_root(fs_info->tree_root);
+	btrfs_put_root(fs_info->chunk_root);
+	fs_info->tree_root = NULL;
+	fs_info->chunk_root = NULL;
+	return -ENOMEM;
+}
+
+static void open_ctree_btree_inode_exit(struct btrfs_fs_info *fs_info)
+{
+	invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
+	iput(fs_info->btree_inode);
+	btrfs_put_root(fs_info->tree_root);
+	btrfs_put_root(fs_info->chunk_root);
+	fs_info->tree_root = NULL;
+	fs_info->chunk_root = NULL;
+}
+
+struct init_sequence {
+	int (*init_func)(struct btrfs_fs_info *fs_info);
+	void (*exit_func)(struct btrfs_fs_info *fs_info);
+};
+
+static const struct init_sequence open_ctree_seq[] = {
+	{
+		.init_func = open_ctree_btree_inode_init,
+		.exit_func = open_ctree_btree_inode_exit,
+	}
+};
+
 int __cold open_ctree(struct super_block *sb, char *options)
 {
 	u32 sectorsize;
@@ -3410,47 +3468,26 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	u64 generation;
 	u64 features;
 	u16 csum_type;
+	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
 	struct btrfs_super_block *disk_super;
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
-	struct btrfs_root *tree_root;
-	struct btrfs_root *chunk_root;
 	int ret;
 	int err = -EINVAL;
 	int level;
+	int i;
 
 	fs_info->sb = sb;
 
 	/* Caller should have already initialized fs_info->fs_devices. */
 	ASSERT(fs_info->fs_devices);
 
-	ret = init_mount_fs_info(fs_info);
-	if (ret) {
-		err = ret;
-		goto fail;
-	}
-
-	/* These need to be init'ed before we start creating inodes and such. */
-	tree_root = btrfs_alloc_root(fs_info, BTRFS_ROOT_TREE_OBJECTID,
-				     GFP_KERNEL);
-	fs_info->tree_root = tree_root;
-	chunk_root = btrfs_alloc_root(fs_info, BTRFS_CHUNK_TREE_OBJECTID,
-				      GFP_KERNEL);
-	fs_info->chunk_root = chunk_root;
-	if (!tree_root || !chunk_root) {
-		err = -ENOMEM;
-		goto fail;
-	}
-
-	fs_info->btree_inode = new_inode(fs_info->sb);
-	if (!fs_info->btree_inode) {
-		err = -ENOMEM;
-		goto fail;
+	for (i = 0; i < ARRAY_SIZE(open_ctree_seq); i++) {
+		ret = open_ctree_seq[i].init_func(fs_info);
+		if (ret < 0)
+			goto fail;
+		open_ctree_res[i] = true;
 	}
-	mapping_set_gfp_mask(fs_info->btree_inode->i_mapping, GFP_NOFS);
-	btrfs_init_btree_inode(fs_info);
-
-	invalidate_bdev(fs_devices->latest_dev->bdev);
 
 	/*
 	 * Read super block and check the signature bytes only
@@ -3609,14 +3646,15 @@ int __cold open_ctree(struct super_block *sb, char *options)
 
 	generation = btrfs_super_chunk_root_generation(disk_super);
 	level = btrfs_super_chunk_root_level(disk_super);
-	ret = load_super_root(chunk_root, btrfs_super_chunk_root(disk_super),
+	ret = load_super_root(fs_info->chunk_root,
+			      btrfs_super_chunk_root(disk_super),
 			      generation, level);
 	if (ret) {
 		btrfs_err(fs_info, "failed to read chunk root");
 		goto fail_tree_roots;
 	}
 
-	read_extent_buffer(chunk_root->node, fs_info->chunk_tree_uuid,
+	read_extent_buffer(fs_info->chunk_root->node, fs_info->chunk_tree_uuid,
 			   offsetof(struct btrfs_header, chunk_tree_uuid),
 			   BTRFS_UUID_SIZE);
 
@@ -3740,7 +3778,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		goto fail_sysfs;
 
 	fs_info->transaction_kthread = kthread_run(transaction_kthread,
-						   tree_root,
+						   fs_info->tree_root,
 						   "btrfs-transaction");
 	if (IS_ERR(fs_info->transaction_kthread))
 		goto fail_cleaner;
@@ -3855,17 +3893,22 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	if (fs_info->data_reloc_root)
 		btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
 	free_root_pointers(fs_info, true);
-	invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
 
 fail_sb_buffer:
 	btrfs_stop_all_workers(fs_info);
 	btrfs_free_block_groups(fs_info);
 fail_alloc:
 	btrfs_mapping_tree_free(&fs_info->mapping_tree);
-
-	iput(fs_info->btree_inode);
 fail:
+	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
+		if (!open_ctree_res[i] || !open_ctree_seq[i].exit_func)
+			continue;
+		open_ctree_seq[i].exit_func(fs_info);
+		open_ctree_res[i] = false;
+	}
 	btrfs_close_devices(fs_info->fs_devices);
+	if (ret < 0)
+		err = ret;
 	return err;
 }
 ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 04/15] btrfs: extract super block read code into its own init helper
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-10-12  9:12 ` [PATCH v2 03/15] btrfs: extract btree inode init code into its own init/exit helpers Qu Wenruo
@ 2022-10-12  9:13 ` Qu Wenruo
  2022-10-12  9:13 ` [PATCH v2 05/15] btrfs: extract mount options and features init " Qu Wenruo
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:13 UTC (permalink / raw)
  To: linux-btrfs

This patch will extract the super block read and cached members
(sectorsize/nodesize/etc) initialization into a helper,
open_ctree_super_init().

This extraction also did the following non-functional change:

- Add an error message for super_root == 0 case
  Previously we just goto fail_alloc, with ret == 0.
  This can be very confusing and would cause problems since we didn't
  finish the mount at all.

- Move sb->s_blocksize and sb->s_bdi initialization into the new helper
  Since at this stage we already have valid super block and its
  sectorsize, we can directly initialize them here.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 185 +++++++++++++++++++++++----------------------
 1 file changed, 93 insertions(+), 92 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1c1c767f2bf1..f8be0a74d07a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3448,55 +3448,18 @@ static void open_ctree_btree_inode_exit(struct btrfs_fs_info *fs_info)
 	fs_info->chunk_root = NULL;
 }
 
-struct init_sequence {
-	int (*init_func)(struct btrfs_fs_info *fs_info);
-	void (*exit_func)(struct btrfs_fs_info *fs_info);
-};
-
-static const struct init_sequence open_ctree_seq[] = {
-	{
-		.init_func = open_ctree_btree_inode_init,
-		.exit_func = open_ctree_btree_inode_exit,
-	}
-};
-
-int __cold open_ctree(struct super_block *sb, char *options)
+static int open_ctree_super_init(struct btrfs_fs_info *fs_info)
 {
-	u32 sectorsize;
+	struct btrfs_super_block *disk_super;
 	u32 nodesize;
-	u32 stripesize;
-	u64 generation;
-	u64 features;
+	u32 sectorsize;
 	u16 csum_type;
-	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
-	struct btrfs_super_block *disk_super;
-	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
-	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	int ret;
-	int err = -EINVAL;
-	int level;
-	int i;
 
-	fs_info->sb = sb;
-
-	/* Caller should have already initialized fs_info->fs_devices. */
-	ASSERT(fs_info->fs_devices);
-
-	for (i = 0; i < ARRAY_SIZE(open_ctree_seq); i++) {
-		ret = open_ctree_seq[i].init_func(fs_info);
-		if (ret < 0)
-			goto fail;
-		open_ctree_res[i] = true;
-	}
-
-	/*
-	 * Read super block and check the signature bytes only
-	 */
-	disk_super = btrfs_read_dev_super(fs_devices->latest_dev->bdev);
-	if (IS_ERR(disk_super)) {
-		err = PTR_ERR(disk_super);
-		goto fail_alloc;
-	}
+	/* Read super block and check the signature bytes only */
+	disk_super = btrfs_read_dev_super(fs_info->fs_devices->latest_dev->bdev);
+	if (IS_ERR(disk_super))
+		return PTR_ERR(disk_super);
 
 	/*
 	 * Verify the type first, if that or the checksum value are
@@ -3506,19 +3469,15 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	if (!btrfs_supported_super_csum(csum_type)) {
 		btrfs_err(fs_info, "unsupported checksum algorithm: %u",
 			  csum_type);
-		err = -EINVAL;
-		btrfs_release_disk_super(disk_super);
-		goto fail_alloc;
+		ret = -EINVAL;
+		goto error;
 	}
 
 	fs_info->csum_size = btrfs_super_csum_size(disk_super);
 
 	ret = btrfs_init_csum_hash(fs_info, csum_type);
-	if (ret) {
-		err = ret;
-		btrfs_release_disk_super(disk_super);
-		goto fail_alloc;
-	}
+	if (ret)
+		goto error;
 
 	/*
 	 * We want to check superblock checksum, the type is stored inside.
@@ -3526,9 +3485,8 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 */
 	if (btrfs_check_super_csum(fs_info, (u8 *)disk_super)) {
 		btrfs_err(fs_info, "superblock checksum mismatch");
-		err = -EINVAL;
-		btrfs_release_disk_super(disk_super);
-		goto fail_alloc;
+		ret = -EINVAL;
+		goto error;
 	}
 
 	/*
@@ -3536,33 +3494,30 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 * following bytes up to INFO_SIZE, the checksum is calculated from
 	 * the whole block of INFO_SIZE
 	 */
-	memcpy(fs_info->super_copy, disk_super, sizeof(*fs_info->super_copy));
+	memcpy(fs_info->super_copy, disk_super, BTRFS_SUPER_INFO_SIZE);
 	btrfs_release_disk_super(disk_super);
 
-	disk_super = fs_info->super_copy;
-
-
-	features = btrfs_super_flags(disk_super);
-	if (features & BTRFS_SUPER_FLAG_CHANGING_FSID_V2) {
-		features &= ~BTRFS_SUPER_FLAG_CHANGING_FSID_V2;
-		btrfs_set_super_flags(disk_super, features);
+	if (btrfs_super_flags(fs_info->super_copy) &
+	    BTRFS_SUPER_FLAG_CHANGING_FSID_V2) {
+		btrfs_set_super_flags(fs_info->super_copy,
+				btrfs_super_flags(fs_info->super_copy) &
+				~BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
 		btrfs_info(fs_info,
 			"found metadata UUID change in progress flag, clearing");
 	}
-
 	memcpy(fs_info->super_for_commit, fs_info->super_copy,
-	       sizeof(*fs_info->super_for_commit));
-
+	       BTRFS_SUPER_INFO_SIZE);
 	ret = btrfs_validate_mount_super(fs_info);
-	if (ret) {
+	if (ret < 0) {
 		btrfs_err(fs_info, "superblock contains fatal errors");
-		err = -EINVAL;
-		goto fail_alloc;
+		return -EINVAL;
 	}
 
-	if (!btrfs_super_root(disk_super))
-		goto fail_alloc;
-
+	if (!btrfs_super_root(fs_info->super_copy)) {
+		btrfs_err(fs_info,
+		"invalid super root bytenr, should have non-zero bytenr");
+		return -EINVAL;
+	}
 	/* check FS state, whether FS is broken. */
 	if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_ERROR)
 		set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
@@ -3573,11 +3528,9 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 */
 	fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
 
-
 	/* Set up fs_info before parsing mount options */
 	nodesize = btrfs_super_nodesize(disk_super);
 	sectorsize = btrfs_super_sectorsize(disk_super);
-	stripesize = sectorsize;
 	fs_info->dirty_metadata_batch = nodesize * (1 + ilog2(nr_cpu_ids));
 	fs_info->delalloc_batch = sectorsize * 512 * (1 + ilog2(nr_cpu_ids));
 
@@ -3585,7 +3538,60 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	fs_info->sectorsize = sectorsize;
 	fs_info->sectorsize_bits = ilog2(sectorsize);
 	fs_info->csums_per_leaf = BTRFS_MAX_ITEM_SIZE(fs_info) / fs_info->csum_size;
-	fs_info->stripesize = stripesize;
+	fs_info->stripesize = sectorsize;
+
+	fs_info->sb->s_bdi->ra_pages *= btrfs_super_num_devices(fs_info->super_copy);
+	fs_info->sb->s_bdi->ra_pages = max(fs_info->sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
+
+	fs_info->sb->s_blocksize = sectorsize;
+	fs_info->sb->s_blocksize_bits = blksize_bits(sectorsize);
+	memcpy(&fs_info->sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE);
+
+	return 0;
+
+error:
+	btrfs_release_disk_super(disk_super);
+	return ret;
+}
+
+struct init_sequence {
+	int (*init_func)(struct btrfs_fs_info *fs_info);
+	void (*exit_func)(struct btrfs_fs_info *fs_info);
+};
+
+static const struct init_sequence open_ctree_seq[] = {
+	{
+		.init_func = open_ctree_btree_inode_init,
+		.exit_func = open_ctree_btree_inode_exit,
+	}, {
+		.init_func = open_ctree_super_init,
+		.exit_func = NULL,
+	}
+};
+
+
+int __cold open_ctree(struct super_block *sb, char *options)
+{
+	u64 generation;
+	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
+	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	int ret;
+	int err = -EINVAL;
+	int level;
+	int i;
+
+	fs_info->sb = sb;
+
+	/* Caller should have already initialized fs_info->fs_devices. */
+	ASSERT(fs_info->fs_devices);
+
+	for (i = 0; i < ARRAY_SIZE(open_ctree_seq); i++) {
+		ret = open_ctree_seq[i].init_func(fs_info);
+		if (ret < 0)
+			goto fail;
+		open_ctree_res[i] = true;
+	}
 
 	ret = btrfs_parse_options(fs_info, options, fs_info->sb->s_flags);
 	if (ret) {
@@ -3599,7 +3605,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		goto fail_alloc;
 	}
 
-	if (sectorsize < PAGE_SIZE) {
+	if (fs_info->sectorsize < PAGE_SIZE) {
 		struct btrfs_subpage_info *subpage_info;
 
 		/*
@@ -3611,15 +3617,15 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		btrfs_clear_opt(fs_info->mount_opt, SPACE_CACHE);
 		btrfs_set_and_info(fs_info, FREE_SPACE_TREE,
 			"forcing free space tree for sector size %u with page size %lu",
-			sectorsize, PAGE_SIZE);
+			fs_info->sectorsize, PAGE_SIZE);
 
 		btrfs_warn(fs_info,
 		"read-write for sector size %u with page size %lu is experimental",
-			   sectorsize, PAGE_SIZE);
+			   fs_info->sectorsize, PAGE_SIZE);
 		subpage_info = kzalloc(sizeof(*subpage_info), GFP_KERNEL);
 		if (!subpage_info)
 			goto fail_alloc;
-		btrfs_init_subpage_info(subpage_info, sectorsize);
+		btrfs_init_subpage_info(subpage_info, fs_info->sectorsize);
 		fs_info->subpage_info = subpage_info;
 	}
 
@@ -3629,13 +3635,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		goto fail_sb_buffer;
 	}
 
-	fs_info->sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
-	fs_info->sb->s_bdi->ra_pages = max(fs_info->sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
-
-	fs_info->sb->s_blocksize = sectorsize;
-	fs_info->sb->s_blocksize_bits = blksize_bits(sectorsize);
-	memcpy(&fs_info->sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE);
-
 	mutex_lock(&fs_info->chunk_mutex);
 	ret = btrfs_read_sys_array(fs_info);
 	mutex_unlock(&fs_info->chunk_mutex);
@@ -3644,10 +3643,10 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		goto fail_sb_buffer;
 	}
 
-	generation = btrfs_super_chunk_root_generation(disk_super);
-	level = btrfs_super_chunk_root_level(disk_super);
+	generation = btrfs_super_chunk_root_generation(fs_info->super_copy);
+	level = btrfs_super_chunk_root_level(fs_info->super_copy);
 	ret = load_super_root(fs_info->chunk_root,
-			      btrfs_super_chunk_root(disk_super),
+			      btrfs_super_chunk_root(fs_info->super_copy),
 			      generation, level);
 	if (ret) {
 		btrfs_err(fs_info, "failed to read chunk root");
@@ -3703,7 +3702,8 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 * even though it was perfectly fine.
 	 */
 	if (fs_info->uuid_root && !btrfs_test_opt(fs_info, RESCAN_UUID_TREE) &&
-	    fs_info->generation == btrfs_super_uuid_tree_generation(disk_super))
+	    fs_info->generation ==
+	    btrfs_super_uuid_tree_generation(fs_info->super_copy))
 		set_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags);
 
 	ret = btrfs_verify_dev_extents(fs_info);
@@ -3814,7 +3814,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		btrfs_err(fs_info, "couldn't build ref tree");
 
 	/* do not make disk changes in broken FS or nologreplay is given */
-	if (btrfs_super_log_root(disk_super) != 0 &&
+	if (btrfs_super_log_root(fs_info->super_copy) != 0 &&
 	    !btrfs_test_opt(fs_info, NOLOGREPLAY)) {
 		btrfs_info(fs_info, "start tree-log replay");
 		ret = btrfs_replay_log(fs_info, fs_devices);
@@ -3844,7 +3844,8 @@ int __cold open_ctree(struct super_block *sb, char *options)
 
 	if (fs_info->uuid_root &&
 	    (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) ||
-	     fs_info->generation != btrfs_super_uuid_tree_generation(disk_super))) {
+	     fs_info->generation !=
+	     btrfs_super_uuid_tree_generation(fs_info->super_copy))) {
 		btrfs_info(fs_info, "checking UUID tree");
 		ret = btrfs_check_uuid_tree(fs_info);
 		if (ret) {
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 05/15] btrfs: extract mount options and features init code into its own init helper
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (3 preceding siblings ...)
  2022-10-12  9:13 ` [PATCH v2 04/15] btrfs: extract super block read code into its own init helper Qu Wenruo
@ 2022-10-12  9:13 ` Qu Wenruo
  2022-10-12  9:13 ` [PATCH v2 06/15] btrfs: move btrfs_init_workqueus() and btrfs_stop_all_workers() into open_ctree_seq[] Qu Wenruo
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:13 UTC (permalink / raw)
  To: linux-btrfs

This patch will extract the mount option parsing and features checking
code into a new helper, open_ctree_features_init().

This extraction also did the following non-functional change:

- Add btrfs_fs_info::__options member
  Currently the open_ctree_* helpers can only accept a single @fs_info
  parameter, to parse the mount options we have to use a temporary
  pointer for this purpose.

  Thankfully we don't need to do anything like freeing it.

- Move ssd optimization check into open_ctree_features_init()

- Move check_int related code into open_ctree_features_init()
  The mount time integrity check doesn't rely on any tree blocks, thus
  is safe to be called at feature init time.

- Separate @features variable into @incompat and @compat_ro
  So there will be no confusion, and compiler should be clever enough
  to optimize them out anyway.

- Properly return error for subpage initialization failure
  Preivously we just goto fail_alloc label, relying on the default
  -EINVAL error.
  Now we can return a proper -ENOMEM error instead.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h   |  6 +++
 fs/btrfs/disk-io.c | 96 ++++++++++++++++++++++++----------------------
 2 files changed, 57 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a8b629a166be..a4557075b5c2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -837,6 +837,12 @@ struct btrfs_fs_info {
 	struct lockdep_map btrfs_trans_pending_ordered_map;
 	struct lockdep_map btrfs_ordered_extent_map;
 
+	/*
+	 * Temporary pointer to the mount option string.
+	 * This is to workaround the fact that all open_ctree() init
+	 * functions can only accept a single @fs_info pointer.
+	 */
+	char *__options;
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f8be0a74d07a..650eabb3d144 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3554,6 +3554,50 @@ static int open_ctree_super_init(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
+static int open_ctree_features_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	ret = btrfs_parse_options(fs_info, fs_info->__options,
+				  fs_info->sb->s_flags);
+	if (ret)
+		return ret;
+
+	ret = btrfs_check_features(fs_info, fs_info->sb);
+	if (ret < 0)
+		return ret;
+
+	if (fs_info->sectorsize < PAGE_SIZE) {
+		struct btrfs_subpage_info *subpage_info;
+
+		/*
+		 * V1 space cache has some hardcoded PAGE_SIZE usage, and is
+		 * going to be deprecated.
+		 *
+		 * Force to use v2 cache for subpage case.
+		 */
+		btrfs_clear_opt(fs_info->mount_opt, SPACE_CACHE);
+		btrfs_set_and_info(fs_info, FREE_SPACE_TREE,
+			"forcing free space tree for sector size %u with page size %lu",
+			fs_info->sectorsize, PAGE_SIZE);
+
+		btrfs_warn(fs_info,
+		"read-write for sector size %u with page size %lu is experimental",
+			   fs_info->sectorsize, PAGE_SIZE);
+		subpage_info = kzalloc(sizeof(*subpage_info), GFP_KERNEL);
+		if (!subpage_info)
+			return -ENOMEM;
+		btrfs_init_subpage_info(subpage_info, fs_info->sectorsize);
+		fs_info->subpage_info = subpage_info;
+	}
+
+	if (!btrfs_test_opt(fs_info, NOSSD) &&
+	    !fs_info->fs_devices->rotating)
+		btrfs_set_and_info(fs_info, SSD, "enabling ssd optimizations");
+
+	return 0;
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3566,22 +3610,25 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_super_init,
 		.exit_func = NULL,
+	}, {
+		.init_func = open_ctree_features_init,
+		.exit_func = NULL,
 	}
 };
 
-
 int __cold open_ctree(struct super_block *sb, char *options)
 {
 	u64 generation;
-	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
 	int ret;
 	int err = -EINVAL;
 	int level;
 	int i;
 
 	fs_info->sb = sb;
+	fs_info->__options = options;
 
 	/* Caller should have already initialized fs_info->fs_devices. */
 	ASSERT(fs_info->fs_devices);
@@ -3593,42 +3640,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = btrfs_parse_options(fs_info, options, fs_info->sb->s_flags);
-	if (ret) {
-		err = ret;
-		goto fail_alloc;
-	}
-
-	ret = btrfs_check_features(fs_info, fs_info->sb);
-	if (ret < 0) {
-		err = ret;
-		goto fail_alloc;
-	}
-
-	if (fs_info->sectorsize < PAGE_SIZE) {
-		struct btrfs_subpage_info *subpage_info;
-
-		/*
-		 * V1 space cache has some hardcoded PAGE_SIZE usage, and is
-		 * going to be deprecated.
-		 *
-		 * Force to use v2 cache for subpage case.
-		 */
-		btrfs_clear_opt(fs_info->mount_opt, SPACE_CACHE);
-		btrfs_set_and_info(fs_info, FREE_SPACE_TREE,
-			"forcing free space tree for sector size %u with page size %lu",
-			fs_info->sectorsize, PAGE_SIZE);
-
-		btrfs_warn(fs_info,
-		"read-write for sector size %u with page size %lu is experimental",
-			   fs_info->sectorsize, PAGE_SIZE);
-		subpage_info = kzalloc(sizeof(*subpage_info), GFP_KERNEL);
-		if (!subpage_info)
-			goto fail_alloc;
-		btrfs_init_subpage_info(subpage_info, fs_info->sectorsize);
-		fs_info->subpage_info = subpage_info;
-	}
-
 	ret = btrfs_init_workqueues(fs_info);
 	if (ret) {
 		err = ret;
@@ -3783,11 +3794,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	if (IS_ERR(fs_info->transaction_kthread))
 		goto fail_cleaner;
 
-	if (!btrfs_test_opt(fs_info, NOSSD) &&
-	    !fs_info->fs_devices->rotating) {
-		btrfs_set_and_info(fs_info, SSD, "enabling ssd optimizations");
-	}
-
 	/*
 	 * Mount does not set all options immediately, we can do it now and do
 	 * not have to wait for transaction commit
@@ -3796,7 +3802,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
 	if (btrfs_test_opt(fs_info, CHECK_INTEGRITY)) {
-		ret = btrfsic_mount(fs_info, fs_devices,
+		ret = btrfsic_mount(fs_info, fs_info->fs_devices,
 				    btrfs_test_opt(fs_info,
 					CHECK_INTEGRITY_DATA) ? 1 : 0,
 				    fs_info->check_integrity_print_mask);
@@ -3806,6 +3812,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 				ret);
 	}
 #endif
+
 	ret = btrfs_read_qgroup_config(fs_info);
 	if (ret)
 		goto fail_trans_kthread;
@@ -3898,7 +3905,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 fail_sb_buffer:
 	btrfs_stop_all_workers(fs_info);
 	btrfs_free_block_groups(fs_info);
-fail_alloc:
 	btrfs_mapping_tree_free(&fs_info->mapping_tree);
 fail:
 	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 06/15] btrfs: move btrfs_init_workqueus() and btrfs_stop_all_workers() into open_ctree_seq[]
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (4 preceding siblings ...)
  2022-10-12  9:13 ` [PATCH v2 05/15] btrfs: extract mount options and features init " Qu Wenruo
@ 2022-10-12  9:13 ` Qu Wenruo
  2022-10-12  9:13 ` [PATCH v2 07/15] btrfs: extract chunk tree read code into its own init/exit helpers Qu Wenruo
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:13 UTC (permalink / raw)
  To: linux-btrfs

Those two helpers are already doing all the work, just move them into
the open_ctree_seq[] array.

There is only one small change:

- Call btrfs_stop_all_workers() inside btrfs_init_workqueus() for error
  handling
  Since open_ctree_seq[] makes all error path to call the exit function
  if and only if the corresponding init function finished without error.

  This means, if btrfs_init_workqueus() failed due to -ENOMEM, then we
  won't call btrfs_stop_all_workers() to cleanup whatever is already
  allocatd.

  To fix this problem, call btrfs_stop_all_workers() inside
  btrfs_init_workqueus() when we hit errors.
  Function btrfs_stop_all_workers() already has the checks to
  handle NULL pointers.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 650eabb3d144..59775f37368f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2324,11 +2324,12 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
 	      fs_info->endio_freespace_worker && fs_info->rmw_workers &&
 	      fs_info->caching_workers && fs_info->fixup_workers &&
 	      fs_info->delayed_workers && fs_info->qgroup_rescan_workers &&
-	      fs_info->discard_ctl.discard_workers)) {
-		return -ENOMEM;
-	}
-
+	      fs_info->discard_ctl.discard_workers))
+		goto error;
 	return 0;
+error:
+	btrfs_stop_all_workers(fs_info);
+	return -ENOMEM;
 }
 
 static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
@@ -3613,6 +3614,9 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_features_init,
 		.exit_func = NULL,
+	}, {
+		.init_func = btrfs_init_workqueues,
+		.exit_func = btrfs_stop_all_workers,
 	}
 };
 
@@ -3640,12 +3644,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = btrfs_init_workqueues(fs_info);
-	if (ret) {
-		err = ret;
-		goto fail_sb_buffer;
-	}
-
 	mutex_lock(&fs_info->chunk_mutex);
 	ret = btrfs_read_sys_array(fs_info);
 	mutex_unlock(&fs_info->chunk_mutex);
@@ -3903,7 +3901,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	free_root_pointers(fs_info, true);
 
 fail_sb_buffer:
-	btrfs_stop_all_workers(fs_info);
 	btrfs_free_block_groups(fs_info);
 	btrfs_mapping_tree_free(&fs_info->mapping_tree);
 fail:
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 07/15] btrfs: extract chunk tree read code into its own init/exit helpers
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (5 preceding siblings ...)
  2022-10-12  9:13 ` [PATCH v2 06/15] btrfs: move btrfs_init_workqueus() and btrfs_stop_all_workers() into open_ctree_seq[] Qu Wenruo
@ 2022-10-12  9:13 ` Qu Wenruo
  2022-10-12  9:13 ` [PATCH v2 08/15] btrfs: extract tree roots and zone info initialization into " Qu Wenruo
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:13 UTC (permalink / raw)
  To: linux-btrfs

Involved functional changes:

- Properly free the chunk map and chunk root ebs at error handling
  Previously we rely the final close_ctree() to properly free the chunk
  root extent buffers.

  With the more strict open_ctree_seq[] requirement, since we're the
  first one to fully populate chunk root extent buffers, at error
  we should also free the extent buffers.

  Note, the tree root and chunk root themselves are first allocated by
  open_ctree_btree_inode_init(), thus we should not free the chunk_root
  pointer, but just the extent buffers.

- Do degradable check immediately after loading chunk tree
  The degradable check only requires the full chunk mapping, can be done
  immediately after btrfs_read_chunk_tree().

This also exposed one exiting label mismatch, at chunk tree read, we
didn't create block group items at all, but at the old fail_sb_buffer:
label we call btrfs_free_block_groups().

It doesn't hurt but just shows how bad the original code labels are
managed.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 153 ++++++++++++++++++++++++++-------------------
 1 file changed, 87 insertions(+), 66 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 59775f37368f..54c7a2d66322 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3599,6 +3599,90 @@ static int open_ctree_features_init(struct btrfs_fs_info *fs_info)
 	return 0;
 }
 
+static int open_ctree_chunk_tree_init(struct btrfs_fs_info *fs_info)
+{
+	u64 generation;
+	int level;
+	int ret;
+
+	mutex_lock(&fs_info->chunk_mutex);
+	ret = btrfs_read_sys_array(fs_info);
+	mutex_unlock(&fs_info->chunk_mutex);
+	if (ret) {
+		btrfs_err(fs_info, "failed to read the system array: %d", ret);
+		goto free_mapping;
+	}
+
+	generation = btrfs_super_chunk_root_generation(fs_info->super_copy);
+	level = btrfs_super_chunk_root_level(fs_info->super_copy);
+	ret = load_super_root(fs_info->chunk_root,
+			      btrfs_super_chunk_root(fs_info->super_copy),
+			      generation, level);
+	if (ret) {
+		btrfs_err(fs_info, "failed to read chunk root");
+		goto free_root;
+	}
+
+	read_extent_buffer(fs_info->chunk_root->node, fs_info->chunk_tree_uuid,
+			   offsetof(struct btrfs_header, chunk_tree_uuid),
+			   BTRFS_UUID_SIZE);
+
+	ret = btrfs_read_chunk_tree(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to read chunk tree: %d", ret);
+		goto free_root;
+	}
+
+	/*
+	 * At this point we know all the devices that make this filesystem,
+	 * including the seed devices but we don't know yet if the replace
+	 * target is required. So free devices that are not part of this
+	 * filesystem but skip the replace target device which is checked
+	 * below in btrfs_init_dev_replace().
+	 */
+	btrfs_free_extra_devids(fs_info->fs_devices);
+	if (!fs_info->fs_devices->latest_dev->bdev) {
+		btrfs_err(fs_info, "failed to read devices");
+		goto free_root;
+	}
+
+	/* We have full chunk tree loaded, can do degradable check now. */
+	if (!sb_rdonly(fs_info->sb) && fs_info->fs_devices->missing_devices &&
+	    !btrfs_check_rw_degradable(fs_info, NULL)) {
+		btrfs_warn(fs_info,
+		"writable mount is not allowed due to too many missing devices");
+		ret = -EIO;
+		goto free_root;
+	}
+
+#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
+	/* And integrity check also relies on fully loaded chunk tree. */
+	if (btrfs_test_opt(fs_info, CHECK_INTEGRITY)) {
+		ret = btrfsic_mount(fs_info, fs_info->fs_devices,
+				    btrfs_test_opt(fs_info,
+					CHECK_INTEGRITY_DATA) ? 1 : 0,
+				    fs_info->check_integrity_print_mask);
+		if (ret)
+			btrfs_warn(fs_info,
+				"failed to initialize integrity check module: %d",
+				ret);
+	}
+#endif
+	return 0;
+
+free_root:
+	free_root_extent_buffers(fs_info->chunk_root);
+free_mapping:
+	btrfs_mapping_tree_free(&fs_info->mapping_tree);
+	return ret;
+}
+
+static void open_ctree_chunk_tree_exit(struct btrfs_fs_info *fs_info)
+{
+	free_root_extent_buffers(fs_info->chunk_root);
+	btrfs_mapping_tree_free(&fs_info->mapping_tree);
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3617,18 +3701,19 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = btrfs_init_workqueues,
 		.exit_func = btrfs_stop_all_workers,
+	}, {
+		.init_func = open_ctree_chunk_tree_init,
+		.exit_func = open_ctree_chunk_tree_exit,
 	}
 };
 
 int __cold open_ctree(struct super_block *sb, char *options)
 {
-	u64 generation;
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
 	int ret;
 	int err = -EINVAL;
-	int level;
 	int i;
 
 	fs_info->sb = sb;
@@ -3644,47 +3729,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	mutex_lock(&fs_info->chunk_mutex);
-	ret = btrfs_read_sys_array(fs_info);
-	mutex_unlock(&fs_info->chunk_mutex);
-	if (ret) {
-		btrfs_err(fs_info, "failed to read the system array: %d", ret);
-		goto fail_sb_buffer;
-	}
-
-	generation = btrfs_super_chunk_root_generation(fs_info->super_copy);
-	level = btrfs_super_chunk_root_level(fs_info->super_copy);
-	ret = load_super_root(fs_info->chunk_root,
-			      btrfs_super_chunk_root(fs_info->super_copy),
-			      generation, level);
-	if (ret) {
-		btrfs_err(fs_info, "failed to read chunk root");
-		goto fail_tree_roots;
-	}
-
-	read_extent_buffer(fs_info->chunk_root->node, fs_info->chunk_tree_uuid,
-			   offsetof(struct btrfs_header, chunk_tree_uuid),
-			   BTRFS_UUID_SIZE);
-
-	ret = btrfs_read_chunk_tree(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to read chunk tree: %d", ret);
-		goto fail_tree_roots;
-	}
-
-	/*
-	 * At this point we know all the devices that make this filesystem,
-	 * including the seed devices but we don't know yet if the replace
-	 * target is required. So free devices that are not part of this
-	 * filesystem but skip the replace target device which is checked
-	 * below in btrfs_init_dev_replace().
-	 */
-	btrfs_free_extra_devids(fs_devices);
-	if (!fs_devices->latest_dev->bdev) {
-		btrfs_err(fs_info, "failed to read devices");
-		goto fail_tree_roots;
-	}
-
 	ret = init_tree_roots(fs_info);
 	if (ret)
 		goto fail_tree_roots;
@@ -3774,13 +3818,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 
 	btrfs_free_zone_cache(fs_info);
 
-	if (!sb_rdonly(sb) && fs_info->fs_devices->missing_devices &&
-	    !btrfs_check_rw_degradable(fs_info, NULL)) {
-		btrfs_warn(fs_info,
-		"writable mount is not allowed due to too many missing devices");
-		goto fail_sysfs;
-	}
-
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, fs_info,
 					       "btrfs-cleaner");
 	if (IS_ERR(fs_info->cleaner_kthread))
@@ -3798,19 +3835,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 */
 	btrfs_apply_pending_changes(fs_info);
 
-#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
-	if (btrfs_test_opt(fs_info, CHECK_INTEGRITY)) {
-		ret = btrfsic_mount(fs_info, fs_info->fs_devices,
-				    btrfs_test_opt(fs_info,
-					CHECK_INTEGRITY_DATA) ? 1 : 0,
-				    fs_info->check_integrity_print_mask);
-		if (ret)
-			btrfs_warn(fs_info,
-				"failed to initialize integrity check module: %d",
-				ret);
-	}
-#endif
-
 	ret = btrfs_read_qgroup_config(fs_info);
 	if (ret)
 		goto fail_trans_kthread;
@@ -3899,10 +3923,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	if (fs_info->data_reloc_root)
 		btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
 	free_root_pointers(fs_info, true);
-
-fail_sb_buffer:
 	btrfs_free_block_groups(fs_info);
-	btrfs_mapping_tree_free(&fs_info->mapping_tree);
 fail:
 	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
 		if (!open_ctree_res[i] || !open_ctree_seq[i].exit_func)
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 08/15] btrfs: extract tree roots and zone info initialization into init/exit helpers
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (6 preceding siblings ...)
  2022-10-12  9:13 ` [PATCH v2 07/15] btrfs: extract chunk tree read code into its own init/exit helpers Qu Wenruo
@ 2022-10-12  9:13 ` Qu Wenruo
  2022-10-12  9:13 ` [PATCH v2 09/15] btrfs: extract mount time checks and items load code into its init helper Qu Wenruo
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:13 UTC (permalink / raw)
  To: linux-btrfs

No functional change, but one special thing to notice:

- No need to cleanup zone info
  As we call btrfs_close_devices() at error path to cleanup all
  zone device info, thus we don't need to handle it at
  open_ctree_tree_roots_init().

  This is a break of the init/exit layer, but since devices info is
  not per-device, but shared for the whole btrfs module, such break
  should still be acceptable.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 88 ++++++++++++++++++++++++++++------------------
 1 file changed, 53 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 54c7a2d66322..cfed53675359 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3683,6 +3683,56 @@ static void open_ctree_chunk_tree_exit(struct btrfs_fs_info *fs_info)
 	btrfs_mapping_tree_free(&fs_info->mapping_tree);
 }
 
+static int open_ctree_tree_roots_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	ret = init_tree_roots(fs_info);
+	if (ret)
+		goto error;
+
+	/*
+	 * Get zone type information of zoned block devices. This will also
+	 * handle emulation of a zoned filesystem if a regular device has the
+	 * zoned incompat feature flag set.
+	 */
+	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
+	if (ret) {
+		btrfs_err(fs_info,
+			  "zoned: failed to read device zone info: %d",
+			  ret);
+		goto error;
+	}
+
+	/*
+	 * If we have a uuid root and we're not being told to rescan we need to
+	 * check the generation here so we can set the
+	 * BTRFS_FS_UPDATE_UUID_TREE_GEN bit.  Otherwise we could commit the
+	 * transaction during a balance or the log replay without updating the
+	 * uuid generation, and then if we crash we would rescan the uuid tree,
+	 * even though it was perfectly fine.
+	 */
+	if (fs_info->uuid_root && !btrfs_test_opt(fs_info, RESCAN_UUID_TREE) &&
+	    fs_info->generation ==
+	    btrfs_super_uuid_tree_generation(fs_info->super_copy))
+		set_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags);
+
+	return 0;
+
+error:
+	if (fs_info->data_reloc_root)
+		btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
+	free_root_pointers(fs_info, true);
+	return ret;
+}
+
+static void open_ctree_tree_roots_exit(struct btrfs_fs_info *fs_info)
+{
+	if (fs_info->data_reloc_root)
+		btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
+	free_root_pointers(fs_info, true);
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3704,6 +3754,9 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_chunk_tree_init,
 		.exit_func = open_ctree_chunk_tree_exit,
+	}, {
+		.init_func = open_ctree_tree_roots_init,
+		.exit_func = open_ctree_tree_roots_exit,
 	}
 };
 
@@ -3729,36 +3782,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = init_tree_roots(fs_info);
-	if (ret)
-		goto fail_tree_roots;
-
-	/*
-	 * Get zone type information of zoned block devices. This will also
-	 * handle emulation of a zoned filesystem if a regular device has the
-	 * zoned incompat feature flag set.
-	 */
-	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
-	if (ret) {
-		btrfs_err(fs_info,
-			  "zoned: failed to read device zone info: %d",
-			  ret);
-		goto fail_block_groups;
-	}
-
-	/*
-	 * If we have a uuid root and we're not being told to rescan we need to
-	 * check the generation here so we can set the
-	 * BTRFS_FS_UPDATE_UUID_TREE_GEN bit.  Otherwise we could commit the
-	 * transaction during a balance or the log replay without updating the
-	 * uuid generation, and then if we crash we would rescan the uuid tree,
-	 * even though it was perfectly fine.
-	 */
-	if (fs_info->uuid_root && !btrfs_test_opt(fs_info, RESCAN_UUID_TREE) &&
-	    fs_info->generation ==
-	    btrfs_super_uuid_tree_generation(fs_info->super_copy))
-		set_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags);
-
 	ret = btrfs_verify_dev_extents(fs_info);
 	if (ret) {
 		btrfs_err(fs_info,
@@ -3918,11 +3941,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 
 fail_block_groups:
 	btrfs_put_block_group_cache(fs_info);
-
-fail_tree_roots:
-	if (fs_info->data_reloc_root)
-		btrfs_drop_and_free_fs_root(fs_info, fs_info->data_reloc_root);
-	free_root_pointers(fs_info, true);
 	btrfs_free_block_groups(fs_info);
 fail:
 	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 09/15] btrfs: extract mount time checks and items load code into its init helper
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (7 preceding siblings ...)
  2022-10-12  9:13 ` [PATCH v2 08/15] btrfs: extract tree roots and zone info initialization into " Qu Wenruo
@ 2022-10-12  9:13 ` Qu Wenruo
  2022-10-12  9:13 ` [PATCH v2 10/15] btrfs: extract sysfs init into its own helper Qu Wenruo
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:13 UTC (permalink / raw)
  To: linux-btrfs

One thing to notice is, since we're also initializing zoned mode, also
move later btrfs_free_zone_cache() call into the helper to concentrace
the zoned code.

As later I found it pretty hard to find any logical connection around
that btrfs_free_zone_cache() call.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 81 +++++++++++++++++++++++++++-------------------
 1 file changed, 47 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index cfed53675359..bee6204d357d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3733,6 +3733,50 @@ static void open_ctree_tree_roots_exit(struct btrfs_fs_info *fs_info)
 	free_root_pointers(fs_info, true);
 }
 
+/* Load various items for balance/replace, and do various mount time check. */
+static int open_ctree_load_items_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	/*
+	 * Dev extents can only be verified after both dev tree and chunk tree
+	 * being initialized.
+	 */
+	ret = btrfs_verify_dev_extents(fs_info);
+	if (ret) {
+		btrfs_err(fs_info,
+			  "failed to verify dev extents against chunks: %d",
+			  ret);
+		return ret;
+	}
+	ret = btrfs_recover_balance(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to recover balance: %d", ret);
+		return ret;
+	}
+
+	ret = btrfs_init_dev_stats(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to init dev_stats: %d", ret);
+		return ret;
+	}
+
+	ret = btrfs_init_dev_replace(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to init dev_replace: %d", ret);
+		return ret;
+	}
+
+	ret = btrfs_check_zoned_mode(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to initialize zoned mode: %d", ret);
+		return ret;
+	}
+	btrfs_free_zone_cache(fs_info);
+
+	return 0;
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3757,6 +3801,9 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_tree_roots_init,
 		.exit_func = open_ctree_tree_roots_exit,
+	}, {
+		.init_func = open_ctree_load_items_init,
+		.exit_func = NULL,
 	}
 };
 
@@ -3782,38 +3829,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = btrfs_verify_dev_extents(fs_info);
-	if (ret) {
-		btrfs_err(fs_info,
-			  "failed to verify dev extents against chunks: %d",
-			  ret);
-		goto fail_block_groups;
-	}
-	ret = btrfs_recover_balance(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to recover balance: %d", ret);
-		goto fail_block_groups;
-	}
-
-	ret = btrfs_init_dev_stats(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to init dev_stats: %d", ret);
-		goto fail_block_groups;
-	}
-
-	ret = btrfs_init_dev_replace(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to init dev_replace: %d", ret);
-		goto fail_block_groups;
-	}
-
-	ret = btrfs_check_zoned_mode(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to initialize zoned mode: %d",
-			  ret);
-		goto fail_block_groups;
-	}
-
 	ret = btrfs_sysfs_add_fsid(fs_devices);
 	if (ret) {
 		btrfs_err(fs_info, "failed to init sysfs fsid interface: %d",
@@ -3839,8 +3854,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		goto fail_sysfs;
 	}
 
-	btrfs_free_zone_cache(fs_info);
-
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, fs_info,
 					       "btrfs-cleaner");
 	if (IS_ERR(fs_info->cleaner_kthread))
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 10/15] btrfs: extract sysfs init into its own helper
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (8 preceding siblings ...)
  2022-10-12  9:13 ` [PATCH v2 09/15] btrfs: extract mount time checks and items load code into its init helper Qu Wenruo
@ 2022-10-12  9:13 ` Qu Wenruo
  2022-10-12  9:13 ` [PATCH v2 11/15] btrfs: extra block groups read code into its own init/exit helpers Qu Wenruo
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:13 UTC (permalink / raw)
  To: linux-btrfs

The three functions, btrfs_sysfs_add_fsid(), btrfs_sysfs_add_mounted()
and btrfs_init_space_info() are all doing sysfs related code.

The last one can only be called after fsid sysfs entry created, thus
they are all put into the same helper.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 66 ++++++++++++++++++++++++++++------------------
 1 file changed, 41 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index bee6204d357d..3fa618c25e60 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3777,6 +3777,44 @@ static int open_ctree_load_items_init(struct btrfs_fs_info *fs_info)
 	return 0;
 }
 
+static int open_ctree_sysfs_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	ret = btrfs_sysfs_add_fsid(fs_info->fs_devices);
+	if (ret) {
+		btrfs_err(fs_info, "failed to init sysfs fsid interface: %d",
+				ret);
+		return ret;
+	}
+
+	ret = btrfs_sysfs_add_mounted(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to init sysfs interface: %d", ret);
+		goto free_fsid;
+	}
+
+	/* This can only be called after the fsid entry being added. */
+	ret = btrfs_init_space_info(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to initialize space info: %d", ret);
+		goto free_mounted;
+	}
+	return 0;
+
+free_mounted:
+	btrfs_sysfs_remove_mounted(fs_info);
+free_fsid:
+	btrfs_sysfs_remove_fsid(fs_info->fs_devices);
+	return ret;
+}
+
+static void open_ctree_sysfs_exit(struct btrfs_fs_info *fs_info)
+{
+	btrfs_sysfs_remove_mounted(fs_info);
+	btrfs_sysfs_remove_fsid(fs_info->fs_devices);
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3804,6 +3842,9 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_load_items_init,
 		.exit_func = NULL,
+	}, {
+		.init_func = open_ctree_sysfs_init,
+		.exit_func = open_ctree_sysfs_exit,
 	}
 };
 
@@ -3829,25 +3870,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = btrfs_sysfs_add_fsid(fs_devices);
-	if (ret) {
-		btrfs_err(fs_info, "failed to init sysfs fsid interface: %d",
-				ret);
-		goto fail_block_groups;
-	}
-
-	ret = btrfs_sysfs_add_mounted(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to init sysfs interface: %d", ret);
-		goto fail_fsdev_sysfs;
-	}
-
-	ret = btrfs_init_space_info(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to initialize space info: %d", ret);
-		goto fail_sysfs;
-	}
-
 	ret = btrfs_read_block_groups(fs_info);
 	if (ret) {
 		btrfs_err(fs_info, "failed to read block groups: %d", ret);
@@ -3947,12 +3969,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	filemap_write_and_wait(fs_info->btree_inode->i_mapping);
 
 fail_sysfs:
-	btrfs_sysfs_remove_mounted(fs_info);
-
-fail_fsdev_sysfs:
-	btrfs_sysfs_remove_fsid(fs_info->fs_devices);
-
-fail_block_groups:
 	btrfs_put_block_group_cache(fs_info);
 	btrfs_free_block_groups(fs_info);
 fail:
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 11/15] btrfs: extra block groups read code into its own init/exit helpers
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (9 preceding siblings ...)
  2022-10-12  9:13 ` [PATCH v2 10/15] btrfs: extract sysfs init into its own helper Qu Wenruo
@ 2022-10-12  9:13 ` Qu Wenruo
  2022-10-12  9:13 ` [PATCH v2 12/15] btrfs: move the fs root related " Qu Wenruo
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:13 UTC (permalink / raw)
  To: linux-btrfs

The only special handling is:

- Need error cleanup even in open_ctree_block_group_init()
  As btrfs_read_block_groups() can error out with some block groups
  already inserted.

  Thus here we have to do the cleanup manually, as exit helper will
  not be called if the init helper failed.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3fa618c25e60..33700753f915 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3815,6 +3815,31 @@ static void open_ctree_sysfs_exit(struct btrfs_fs_info *fs_info)
 	btrfs_sysfs_remove_fsid(fs_info->fs_devices);
 }
 
+static int open_ctree_block_groups_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	ret = btrfs_read_block_groups(fs_info);
+
+	/*
+	 * Even if btrfs_read_block_groups() failed, we may still have
+	 * inserted some block groups.
+	 * Thus we have to do cleanup here manually for error path, as
+	 * our exit function won't be executed for error path.
+	 */
+	if (ret < 0) {
+		btrfs_put_block_group_cache(fs_info);
+		btrfs_free_block_groups(fs_info);
+	}
+	return ret;
+}
+
+static void open_ctree_block_groups_exit(struct btrfs_fs_info *fs_info)
+{
+	btrfs_put_block_group_cache(fs_info);
+	btrfs_free_block_groups(fs_info);
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3845,6 +3870,9 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_sysfs_init,
 		.exit_func = open_ctree_sysfs_exit,
+	}, {
+		.init_func = open_ctree_block_groups_init,
+		.exit_func = open_ctree_block_groups_exit,
 	}
 };
 
@@ -3870,16 +3898,10 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = btrfs_read_block_groups(fs_info);
-	if (ret) {
-		btrfs_err(fs_info, "failed to read block groups: %d", ret);
-		goto fail_sysfs;
-	}
-
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, fs_info,
 					       "btrfs-cleaner");
 	if (IS_ERR(fs_info->cleaner_kthread))
-		goto fail_sysfs;
+		goto fail;
 
 	fs_info->transaction_kthread = kthread_run(transaction_kthread,
 						   fs_info->tree_root,
@@ -3968,9 +3990,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	 */
 	filemap_write_and_wait(fs_info->btree_inode->i_mapping);
 
-fail_sysfs:
-	btrfs_put_block_group_cache(fs_info);
-	btrfs_free_block_groups(fs_info);
 fail:
 	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
 		if (!open_ctree_res[i] || !open_ctree_seq[i].exit_func)
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 12/15] btrfs: move the fs root related code into its own init/exit helpers
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (10 preceding siblings ...)
  2022-10-12  9:13 ` [PATCH v2 11/15] btrfs: extra block groups read code into its own init/exit helpers Qu Wenruo
@ 2022-10-12  9:13 ` Qu Wenruo
  2022-10-12  9:13 ` [PATCH v2 13/15] btrfs: extract kthread " Qu Wenruo
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:13 UTC (permalink / raw)
  To: linux-btrfs

The most important change in this patch is the timing change.

The existing code put fs root read very late, after
kthread/qgroup-rescan/log-replay, but put btrfs_free_fs_roots() very
early, as kthread/qgroup/log-replacey can all populate the fs roots.

Thus this patch will change the timing, by reading fs root early.
The fs root read part is not that important, but the cleanup part is.

After the timing change, the fs root would be the first subvolume to be
read, and its exit call can be ensured to cover all later possible
subvolume loads.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 33700753f915..71a9572b45f9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3840,6 +3840,19 @@ static void open_ctree_block_groups_exit(struct btrfs_fs_info *fs_info)
 	btrfs_free_block_groups(fs_info);
 }
 
+static int open_ctree_fs_root_init(struct btrfs_fs_info *fs_info)
+{
+	fs_info->fs_root = btrfs_get_fs_root(fs_info, BTRFS_FS_TREE_OBJECTID, true);
+	if (IS_ERR(fs_info->fs_root)) {
+		int ret = PTR_ERR(fs_info->fs_root);
+
+		btrfs_warn(fs_info, "failed to read fs tree: %d", ret);
+		fs_info->fs_root = NULL;
+		return ret;
+	}
+	return 0;
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3873,6 +3886,17 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_block_groups_init,
 		.exit_func = open_ctree_block_groups_exit,
+	}, {
+		/*
+		 * This fs roots related code should be called before anything
+		 * which may try to read a subvolume, including cleanup/commit
+		 * kthread, qgroup rescan, log replay etc.
+		 *
+		 * The main reason is for the exit function to be called for
+		 * any stage which may read some subvolume trees.
+		 */
+		.init_func = open_ctree_fs_root_init,
+		.exit_func = btrfs_free_fs_roots,
 	}
 };
 
@@ -3933,14 +3957,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		}
 	}
 
-	fs_info->fs_root = btrfs_get_fs_root(fs_info, BTRFS_FS_TREE_OBJECTID, true);
-	if (IS_ERR(fs_info->fs_root)) {
-		err = PTR_ERR(fs_info->fs_root);
-		btrfs_warn(fs_info, "failed to read fs tree: %d", err);
-		fs_info->fs_root = NULL;
-		goto fail_qgroup;
-	}
-
 	if (sb_rdonly(fs_info->sb))
 		goto clear_oneshot;
 
@@ -3980,7 +3996,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 fail_trans_kthread:
 	kthread_stop(fs_info->transaction_kthread);
 	btrfs_cleanup_transaction(fs_info);
-	btrfs_free_fs_roots(fs_info);
 fail_cleaner:
 	kthread_stop(fs_info->cleaner_kthread);
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 13/15] btrfs: extract kthread code into its own init/exit helpers
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (11 preceding siblings ...)
  2022-10-12  9:13 ` [PATCH v2 12/15] btrfs: move the fs root related " Qu Wenruo
@ 2022-10-12  9:13 ` Qu Wenruo
  2022-10-12  9:13 ` [PATCH v2 14/15] btrfs: move qgroup init/exit code into open_ctree_seq[] array Qu Wenruo
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:13 UTC (permalink / raw)
  To: linux-btrfs

There are several changes involved:

- Change the timing of btrfs_cleanup_transaction()
  That call is to address any unfinished transaction mostly caused by
  the cleaner/commit kthread.

  Thus at exit function and error handling path, we should stop all
  kthread, then cleanup the unfinished transaction.

  Not calling it before stopping cleaner thread.

- Remove the filemap_write_and_wait() call
  Now we have open_ctree_btree_inode_exit() call, which will invalidate
  all dirty pages of btree inode.
  Thus there is no need to writeback those dirtied tree blocks anymore.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 79 +++++++++++++++++++++++++++++-----------------
 1 file changed, 50 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 71a9572b45f9..2d1f178cdecd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3853,6 +3853,52 @@ static int open_ctree_fs_root_init(struct btrfs_fs_info *fs_info)
 	return 0;
 }
 
+static int open_ctree_kthread_init(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, fs_info,
+					       "btrfs-cleaner");
+	if (IS_ERR(fs_info->cleaner_kthread)) {
+		ret = PTR_ERR(fs_info->cleaner_kthread);
+		return ret;
+	}
+
+	fs_info->transaction_kthread = kthread_run(transaction_kthread,
+						   fs_info->tree_root,
+						   "btrfs-transaction");
+	if (IS_ERR(fs_info->transaction_kthread)) {
+		kthread_stop(fs_info->cleaner_kthread);
+
+		/*
+		 * Cleanup thread may have already started a trans.
+		 * The dirtied tree blocks will be invalidated at
+		 * open_ctree_btree_inode_exit() thus we don't need to bother.
+		 */
+		btrfs_cleanup_transaction(fs_info);
+
+		ret = PTR_ERR(fs_info->cleaner_kthread);
+		return ret;
+	}
+	/*
+	 * Mount does not set all options immediately, we can do it now and do
+	 * not have to wait for transaction commit
+	 */
+	btrfs_apply_pending_changes(fs_info);
+	return 0;
+}
+
+static void open_ctree_kthread_exit(struct btrfs_fs_info *fs_info)
+{
+	kthread_stop(fs_info->transaction_kthread);
+	kthread_stop(fs_info->cleaner_kthread);
+	/*
+	 * Cleanup any unfinished transaction started by transaction/cleaner
+	 * kthread.
+	 */
+	btrfs_cleanup_transaction(fs_info);
+}
+
 struct init_sequence {
 	int (*init_func)(struct btrfs_fs_info *fs_info);
 	void (*exit_func)(struct btrfs_fs_info *fs_info);
@@ -3897,6 +3943,9 @@ static const struct init_sequence open_ctree_seq[] = {
 		 */
 		.init_func = open_ctree_fs_root_init,
 		.exit_func = btrfs_free_fs_roots,
+	}, {
+		.init_func = open_ctree_kthread_init,
+		.exit_func = open_ctree_kthread_exit,
 	}
 };
 
@@ -3922,26 +3971,9 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, fs_info,
-					       "btrfs-cleaner");
-	if (IS_ERR(fs_info->cleaner_kthread))
-		goto fail;
-
-	fs_info->transaction_kthread = kthread_run(transaction_kthread,
-						   fs_info->tree_root,
-						   "btrfs-transaction");
-	if (IS_ERR(fs_info->transaction_kthread))
-		goto fail_cleaner;
-
-	/*
-	 * Mount does not set all options immediately, we can do it now and do
-	 * not have to wait for transaction commit
-	 */
-	btrfs_apply_pending_changes(fs_info);
-
 	ret = btrfs_read_qgroup_config(fs_info);
 	if (ret)
-		goto fail_trans_kthread;
+		goto fail;
 
 	if (btrfs_build_ref_tree(fs_info))
 		btrfs_err(fs_info, "couldn't build ref tree");
@@ -3993,17 +4025,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 
 fail_qgroup:
 	btrfs_free_qgroup_config(fs_info);
-fail_trans_kthread:
-	kthread_stop(fs_info->transaction_kthread);
-	btrfs_cleanup_transaction(fs_info);
-fail_cleaner:
-	kthread_stop(fs_info->cleaner_kthread);
-
-	/*
-	 * make sure we're done with the btree inode before we stop our
-	 * kthreads
-	 */
-	filemap_write_and_wait(fs_info->btree_inode->i_mapping);
 
 fail:
 	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 14/15] btrfs: move qgroup init/exit code into open_ctree_seq[] array
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (12 preceding siblings ...)
  2022-10-12  9:13 ` [PATCH v2 13/15] btrfs: extract kthread " Qu Wenruo
@ 2022-10-12  9:13 ` Qu Wenruo
  2022-10-12  9:13 ` [PATCH v2 15/15] btrfs: introduce a debug mount option to do error injection for each stage of open_ctree() Qu Wenruo
  2022-10-24 13:47 ` [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched David Sterba
  15 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:13 UTC (permalink / raw)
  To: linux-btrfs

The qgroup related code is already extracted into two functions,
btrfs_read_qgroup_config() and btrfs_free_qgroup_config().

They are perfect matches for open_ctree_seq[], so just move them into
open_ctree_seq[] array.

And with the usage of open_ctree_seq[], there is no more need for @err
variable, just remove it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2d1f178cdecd..8e49a6dee207 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3946,6 +3946,9 @@ static const struct init_sequence open_ctree_seq[] = {
 	}, {
 		.init_func = open_ctree_kthread_init,
 		.exit_func = open_ctree_kthread_exit,
+	}, {
+		.init_func = btrfs_read_qgroup_config,
+		.exit_func = btrfs_free_qgroup_config,
 	}
 };
 
@@ -3955,7 +3958,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	bool open_ctree_res[ARRAY_SIZE(open_ctree_seq)] = {0};
 	int ret;
-	int err = -EINVAL;
 	int i;
 
 	fs_info->sb = sb;
@@ -3971,10 +3973,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = true;
 	}
 
-	ret = btrfs_read_qgroup_config(fs_info);
-	if (ret)
-		goto fail;
-
 	if (btrfs_build_ref_tree(fs_info))
 		btrfs_err(fs_info, "couldn't build ref tree");
 
@@ -3983,10 +3981,8 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	    !btrfs_test_opt(fs_info, NOLOGREPLAY)) {
 		btrfs_info(fs_info, "start tree-log replay");
 		ret = btrfs_replay_log(fs_info, fs_devices);
-		if (ret) {
-			err = ret;
-			goto fail_qgroup;
-		}
+		if (ret)
+			goto fail;
 	}
 
 	if (sb_rdonly(fs_info->sb))
@@ -4023,9 +4019,6 @@ int __cold open_ctree(struct super_block *sb, char *options)
 	btrfs_clear_oneshot_options(fs_info);
 	return 0;
 
-fail_qgroup:
-	btrfs_free_qgroup_config(fs_info);
-
 fail:
 	for (i = ARRAY_SIZE(open_ctree_seq) - 1; i >= 0; i--) {
 		if (!open_ctree_res[i] || !open_ctree_seq[i].exit_func)
@@ -4034,9 +4027,7 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		open_ctree_res[i] = false;
 	}
 	btrfs_close_devices(fs_info->fs_devices);
-	if (ret < 0)
-		err = ret;
-	return err;
+	return ret;
 }
 ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 15/15] btrfs: introduce a debug mount option to do error injection for each stage of open_ctree()
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (13 preceding siblings ...)
  2022-10-12  9:13 ` [PATCH v2 14/15] btrfs: move qgroup init/exit code into open_ctree_seq[] array Qu Wenruo
@ 2022-10-12  9:13 ` Qu Wenruo
  2022-10-14 12:43   ` Josef Bacik
  2022-10-24 13:47 ` [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched David Sterba
  15 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2022-10-12  9:13 UTC (permalink / raw)
  To: linux-btrfs

With the new open_ctree_seq[] array, we can afford a debug mount option
to do all the error inject at different stages to have a much better
coverage for the error path.

The new "fail_mount=%u" mount option will be hidden behind
CONFIG_BTRFS_DEBUG option, and when enabled it will cause mount failure
just after the init function of specified stage.

This can be verified by the following script:

 mkfs.btrfs -f $dev
 for (( i=0;; i++ )) do
	mount -o fail_mount=$i $dev $mnt
	ret=$?
	if [ $ret -eq 0 ]; then
		umount $mnt
		exit
	fi
 done
 umount $mnt

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/disk-io.c | 14 ++++++++++++++
 fs/btrfs/super.c   | 13 +++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a4557075b5c2..6e0cd5b5bc61 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -854,6 +854,7 @@ struct btrfs_fs_info {
 
 	spinlock_t eb_leak_lock;
 	struct list_head allocated_ebs;
+	int fail_stage;
 #endif
 };
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8e49a6dee207..065d13891866 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3019,6 +3019,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	INIT_LIST_HEAD(&fs_info->allocated_roots);
 	INIT_LIST_HEAD(&fs_info->allocated_ebs);
 	spin_lock_init(&fs_info->eb_leak_lock);
+	fs_info->fail_stage = -1;
 #endif
 	extent_map_tree_init(&fs_info->mapping_tree);
 	btrfs_init_block_rsv(&fs_info->global_block_rsv,
@@ -3971,6 +3972,19 @@ int __cold open_ctree(struct super_block *sb, char *options)
 		if (ret < 0)
 			goto fail;
 		open_ctree_res[i] = true;
+#ifdef CONFIG_BTRFS_DEBUG
+		/*
+		 * This is not the best timing, as fail_stage will only be
+		 * initialized after open_ctree_features_init().
+		 * But this is still better to cover more error paths.
+		 */
+		if (fs_info->fail_stage >= 0 && i >= fs_info->fail_stage) {
+			btrfs_info(fs_info,
+				"error injected at open ctree stage %u", i);
+			ret = -ECANCELED;
+			goto fail;
+		}
+#endif
 	}
 
 	if (btrfs_build_ref_tree(fs_info))
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index bbdbd2a6e3bc..c25220bae232 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -447,6 +447,7 @@ enum {
 	Opt_enospc_debug, Opt_noenospc_debug,
 #ifdef CONFIG_BTRFS_DEBUG
 	Opt_fragment_data, Opt_fragment_metadata, Opt_fragment_all,
+	Opt_fail_mount,
 #endif
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	Opt_ref_verify,
@@ -521,6 +522,7 @@ static const match_table_t tokens = {
 	{Opt_fragment_data, "fragment=data"},
 	{Opt_fragment_metadata, "fragment=metadata"},
 	{Opt_fragment_all, "fragment=all"},
+	{Opt_fail_mount, "fail_mount=%u"},
 #endif
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	{Opt_ref_verify, "ref_verify"},
@@ -1106,6 +1108,17 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 			btrfs_info(info, "fragmenting data");
 			btrfs_set_opt(info->mount_opt, FRAGMENT_DATA);
 			break;
+		case Opt_fail_mount:
+			ret = match_int(&args[0], &intarg);
+			if (ret) {
+				btrfs_err(info, "unrecognized fail_mount value %s",
+					  args[0].from);
+				goto out;
+			}
+			btrfs_info(info, "fail mount at open_ctree() stage %u",
+				   intarg);
+			info->fail_stage = intarg;
+			break;
 #endif
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 		case Opt_ref_verify:
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 15/15] btrfs: introduce a debug mount option to do error injection for each stage of open_ctree()
  2022-10-12  9:13 ` [PATCH v2 15/15] btrfs: introduce a debug mount option to do error injection for each stage of open_ctree() Qu Wenruo
@ 2022-10-14 12:43   ` Josef Bacik
  2022-10-14 23:04     ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: Josef Bacik @ 2022-10-14 12:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 12, 2022 at 05:13:11PM +0800, Qu Wenruo wrote:
> With the new open_ctree_seq[] array, we can afford a debug mount option
> to do all the error inject at different stages to have a much better
> coverage for the error path.
> 
> The new "fail_mount=%u" mount option will be hidden behind
> CONFIG_BTRFS_DEBUG option, and when enabled it will cause mount failure
> just after the init function of specified stage.
> 
> This can be verified by the following script:
> 
>  mkfs.btrfs -f $dev
>  for (( i=0;; i++ )) do
> 	mount -o fail_mount=$i $dev $mnt
> 	ret=$?
> 	if [ $ret -eq 0 ]; then
> 		umount $mnt
> 		exit
> 	fi
>  done
>  umount $mnt
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Death to all mount options, especially for this.  I'd rather see something like
this inserted in the main loop

bool btrfs_mail_fail_init(struct btrfs_fs_info *fs_info, int seq)
{
	return false;
}
ALLOW_ERROR_INJECTION(btrfs_may_fail_init);

and then we can error inject that way.  Alternatively you could just use
ALLOW_ERROR_INJECTION for every one of the init/exit functions and acheive the
same thing.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 15/15] btrfs: introduce a debug mount option to do error injection for each stage of open_ctree()
  2022-10-14 12:43   ` Josef Bacik
@ 2022-10-14 23:04     ` Qu Wenruo
  0 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-14 23:04 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo; +Cc: linux-btrfs



On 2022/10/14 20:43, Josef Bacik wrote:
> On Wed, Oct 12, 2022 at 05:13:11PM +0800, Qu Wenruo wrote:
>> With the new open_ctree_seq[] array, we can afford a debug mount option
>> to do all the error inject at different stages to have a much better
>> coverage for the error path.
>>
>> The new "fail_mount=%u" mount option will be hidden behind
>> CONFIG_BTRFS_DEBUG option, and when enabled it will cause mount failure
>> just after the init function of specified stage.
>>
>> This can be verified by the following script:
>>
>>   mkfs.btrfs -f $dev
>>   for (( i=0;; i++ )) do
>> 	mount -o fail_mount=$i $dev $mnt
>> 	ret=$?
>> 	if [ $ret -eq 0 ]; then
>> 		umount $mnt
>> 		exit
>> 	fi
>>   done
>>   umount $mnt
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Death to all mount options, especially for this.  I'd rather see something like
> this inserted in the main loop
>
> bool btrfs_mail_fail_init(struct btrfs_fs_info *fs_info, int seq)
> {
> 	return false;
> }
> ALLOW_ERROR_INJECTION(btrfs_may_fail_init);

IIRC the error injection system can not do a proper conditional
injection according to the @seq parameter.

>
> and then we can error inject that way.  Alternatively you could just use
> ALLOW_ERROR_INJECTION for every one of the init/exit functions and acheive the
> same thing.  Thanks,

That is even worse.

The problem of injecting error directly into init functions is

- No exit will be called if init failed
   As we all expect the init function to cleanup itself if something
   wrong happened

Another problem is, it's much harder to test, we need to inject errors
to different functions, while a debug only mount option can easily test
the whole thing just with different fail stage number.

Thanks,
Qu

>
> Josef

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched
  2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
                   ` (14 preceding siblings ...)
  2022-10-12  9:13 ` [PATCH v2 15/15] btrfs: introduce a debug mount option to do error injection for each stage of open_ctree() Qu Wenruo
@ 2022-10-24 13:47 ` David Sterba
  2022-10-24 23:02   ` Qu Wenruo
  15 siblings, 1 reply; 20+ messages in thread
From: David Sterba @ 2022-10-24 13:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 12, 2022 at 05:12:56PM +0800, Qu Wenruo wrote:
> [Changelog]
> v2:
> - Rebased to latest misc-next
>   Most conflicts comes from the new function btrfs_check_features().
> 
> 
> Just like init_btrfs_fs(), open_ctree() also has tons of different
> labels for its error handling.
> 
> And unsurprisingly the error handling labels are not matched correctly,
> e.g. we always call btrfs_mapping_tree_free() even we didn't reach
> sys chunk array read.
> 
> And every time we need to add some new function, it will be a disaster
> just to understand where the new function should be put and how the
> error handling should be done.
> 
> This patchset will follow the init_btrfs_fs() method, by introducing
> an open_ctree_seq[] array, which contains the following sections:
> 
> - btree_inode init/exit
> - super block read and verification
> - mount options and features check
> - workqueues init/exit
> - chunk tree init/exit
> - tree roots init/exit
> - mount time check and various item load
> - sysfs init/exit
> - block group tree init/exit
> - subvolume trees init/exit
> - kthread init/exit
> - qgroup init/exit
> 
> The remaining part of open_ctree() is only less than 50 lines, and are
> all related to the very end of the mount progress, including log-replay,
> uuid tree check.

I'm not sure it's a good idea to split the open_ctree to the sequence if
initializers, some of the code looks like it's not isolated the same way
as it was in the module init/exit. The readability is IMHO also worse,
verifying that some parts depend on each other requires jumping in the
file. Maybe some parts can be put into more helpers and we can make the
exit sequence robust enough so we don't need tons of labels and the
whole can be called regardless of from where it would be called.

This is similar to the array based approach but keeps the code in one
function. As it is implemented in this patchset I think it's taking it
too far.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched
  2022-10-24 13:47 ` [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched David Sterba
@ 2022-10-24 23:02   ` Qu Wenruo
  0 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2022-10-24 23:02 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs



On 2022/10/24 21:47, David Sterba wrote:
> On Wed, Oct 12, 2022 at 05:12:56PM +0800, Qu Wenruo wrote:
>> [Changelog]
>> v2:
>> - Rebased to latest misc-next
>>    Most conflicts comes from the new function btrfs_check_features().
>>
>>
>> Just like init_btrfs_fs(), open_ctree() also has tons of different
>> labels for its error handling.
>>
>> And unsurprisingly the error handling labels are not matched correctly,
>> e.g. we always call btrfs_mapping_tree_free() even we didn't reach
>> sys chunk array read.
>>
>> And every time we need to add some new function, it will be a disaster
>> just to understand where the new function should be put and how the
>> error handling should be done.
>>
>> This patchset will follow the init_btrfs_fs() method, by introducing
>> an open_ctree_seq[] array, which contains the following sections:
>>
>> - btree_inode init/exit
>> - super block read and verification
>> - mount options and features check
>> - workqueues init/exit
>> - chunk tree init/exit
>> - tree roots init/exit
>> - mount time check and various item load
>> - sysfs init/exit
>> - block group tree init/exit
>> - subvolume trees init/exit
>> - kthread init/exit
>> - qgroup init/exit
>>
>> The remaining part of open_ctree() is only less than 50 lines, and are
>> all related to the very end of the mount progress, including log-replay,
>> uuid tree check.
>
> I'm not sure it's a good idea to split the open_ctree to the sequence if
> initializers, some of the code looks like it's not isolated the same way
> as it was in the module init/exit. The readability is IMHO also worse,
> verifying that some parts depend on each other requires jumping in the
> file. Maybe some parts can be put into more helpers and we can make the
> exit sequence robust enough so we don't need tons of labels and the
> whole can be called regardless of from where it would be called.
>
> This is similar to the array based approach but keeps the code in one
> function. As it is implemented in this patchset I think it's taking it
> too far.

All right, then I'd manually fix the wrongly matched tags.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-10-25  0:36 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-12  9:12 [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched Qu Wenruo
2022-10-12  9:12 ` [PATCH v2 01/15] btrfs: initialize fs_info->sb at the very beginning of open_ctree() Qu Wenruo
2022-10-12  9:12 ` [PATCH v2 02/15] btrfs: remove @fs_devices argument from open_ctree() Qu Wenruo
2022-10-12  9:12 ` [PATCH v2 03/15] btrfs: extract btree inode init code into its own init/exit helpers Qu Wenruo
2022-10-12  9:13 ` [PATCH v2 04/15] btrfs: extract super block read code into its own init helper Qu Wenruo
2022-10-12  9:13 ` [PATCH v2 05/15] btrfs: extract mount options and features init " Qu Wenruo
2022-10-12  9:13 ` [PATCH v2 06/15] btrfs: move btrfs_init_workqueus() and btrfs_stop_all_workers() into open_ctree_seq[] Qu Wenruo
2022-10-12  9:13 ` [PATCH v2 07/15] btrfs: extract chunk tree read code into its own init/exit helpers Qu Wenruo
2022-10-12  9:13 ` [PATCH v2 08/15] btrfs: extract tree roots and zone info initialization into " Qu Wenruo
2022-10-12  9:13 ` [PATCH v2 09/15] btrfs: extract mount time checks and items load code into its init helper Qu Wenruo
2022-10-12  9:13 ` [PATCH v2 10/15] btrfs: extract sysfs init into its own helper Qu Wenruo
2022-10-12  9:13 ` [PATCH v2 11/15] btrfs: extra block groups read code into its own init/exit helpers Qu Wenruo
2022-10-12  9:13 ` [PATCH v2 12/15] btrfs: move the fs root related " Qu Wenruo
2022-10-12  9:13 ` [PATCH v2 13/15] btrfs: extract kthread " Qu Wenruo
2022-10-12  9:13 ` [PATCH v2 14/15] btrfs: move qgroup init/exit code into open_ctree_seq[] array Qu Wenruo
2022-10-12  9:13 ` [PATCH v2 15/15] btrfs: introduce a debug mount option to do error injection for each stage of open_ctree() Qu Wenruo
2022-10-14 12:43   ` Josef Bacik
2022-10-14 23:04     ` Qu Wenruo
2022-10-24 13:47 ` [PATCH v2 00/15] btrfs: make open_ctree() init/exit sequence strictly matched David Sterba
2022-10-24 23:02   ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).