Linux Btrfs filesystem development
 help / color / mirror / Atom feed
* [PATCH 0/2] btrfs: implement blk_holder_ops call backs
@ 2025-06-06  5:47 Qu Wenruo
  2025-06-06  5:47 ` [PATCH 1/2] btrfs: use fs_info as the block device holder Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Qu Wenruo @ 2025-06-06  5:47 UTC (permalink / raw)
  To: linux-btrfs

Although test case generic/730 is going to be skipped for btrfs due to
the missing shutdown support, it still exposed a problem that btrfs has
no implementation for blk_holder_ops, thus even if lower level driver
wants to notify btrfs there is a device that is going to be removed,
btrfs can do nothing but ignore such critical info.

This series will implement all 4 callbacks for blk_holder_ops, 3 of them
are straightforward, just call the full-fs version of
sync/freeze/unfreeze.

The trickier one is the mark_dead(), all other fses with shutdown
support will shutdown the fs, making all operations to return error,
even if there is a cached page.

For btrfs we do not yet have full shutdown support, but at least we can
notify such problem through dmesg, then check if the fs can still
maintain its RW operations.
If not then mark the fs error, so no more new writes will happen,
preventing further data loss.

Now btrfs will output something like this if the only device of a btrfs
is removed, instead of aborting transaction later due to IO error:

 BTRFS info (device sda): devid 1 device sda path /dev/sda is going to be removed
 BTRFS: error (device sda) in btrfs_dev_mark_dead:294: errno=-5 IO failure (btrfs can no longer maintain read-write due to missing device(s))
 BTRFS info (device sda state E): forced readonly

Future full-fs shutdown will depend on this feature to properly pass
generic/730.

Qu Wenruo (2):
  btrfs: use fs_info as the block device holder
  btrfs: add a simple dead device detection mechanism

 fs/btrfs/dev-replace.c |  2 +-
 fs/btrfs/fs.h          |  2 -
 fs/btrfs/super.c       |  7 ++--
 fs/btrfs/super.h       |  2 +
 fs/btrfs/volumes.c     | 90 ++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h     |  6 +++
 6 files changed, 99 insertions(+), 10 deletions(-)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] btrfs: use fs_info as the block device holder
  2025-06-06  5:47 [PATCH 0/2] btrfs: implement blk_holder_ops call backs Qu Wenruo
@ 2025-06-06  5:47 ` Qu Wenruo
  2025-06-06  5:47 ` [PATCH 2/2] btrfs: add a simple dead device detection mechanism Qu Wenruo
  2025-06-06  8:18 ` [PATCH 0/2] btrfs: implement blk_holder_ops call backs Qu Wenruo
  2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2025-06-06  5:47 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs uses "btrfs_fs_type" as the bdev holder for all opened
device, which means all btrfses shares the same holder value.

That's only fine when there is no blk_holder_ops provided, but we're
going to implement blk_holder_ops soon, so replace the "btrfs_fs_type"
holder usage, and replace it with a proper fs_info instead.

This means we can remove the btrfs_fs_info::bdev_holder completely.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/dev-replace.c | 2 +-
 fs/btrfs/fs.h          | 2 --
 fs/btrfs/super.c       | 3 +--
 fs/btrfs/volumes.c     | 4 ++--
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 2decb9fff445..cf63f4b29327 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -250,7 +250,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	}
 
 	bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
-					fs_info->bdev_holder, NULL);
+					   fs_info, NULL);
 	if (IS_ERR(bdev_file)) {
 		btrfs_err(fs_info, "target device %s is invalid!", device_path);
 		return PTR_ERR(bdev_file);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index b239e4b8421c..d90304d4e32c 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -715,8 +715,6 @@ struct btrfs_fs_info {
 	u32 data_chunk_allocations;
 	u32 metadata_ratio;
 
-	void *bdev_holder;
-
 	/* Private scrub information */
 	struct mutex scrub_lock;
 	atomic_t scrubs_running;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 2d0d8c6e77b4..c1efd20166cc 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1865,7 +1865,7 @@ static int btrfs_get_tree_super(struct fs_context *fc)
 	fs_devices = device->fs_devices;
 	fs_info->fs_devices = fs_devices;
 
-	ret = btrfs_open_devices(fs_devices, mode, &btrfs_fs_type);
+	ret = btrfs_open_devices(fs_devices, mode, fs_info);
 	mutex_unlock(&uuid_mutex);
 	if (ret)
 		return ret;
@@ -1905,7 +1905,6 @@ static int btrfs_get_tree_super(struct fs_context *fc)
 	} else {
 		snprintf(sb->s_id, sizeof(sb->s_id), "%pg", bdev);
 		shrinker_debugfs_rename(sb->s_shrink, "sb-btrfs:%s", sb->s_id);
-		btrfs_sb(sb)->bdev_holder = &btrfs_fs_type;
 		ret = btrfs_fill_super(sb, fs_devices);
 		if (ret) {
 			deactivate_locked_super(sb);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1535a425e8f9..dae946ee6b07 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2705,7 +2705,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		return -EROFS;
 
 	bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
-					fs_info->bdev_holder, NULL);
+					   fs_info, NULL);
 	if (IS_ERR(bdev_file))
 		return PTR_ERR(bdev_file);
 
@@ -7174,7 +7174,7 @@ static struct btrfs_fs_devices *open_seed_devices(struct btrfs_fs_info *fs_info,
 	if (IS_ERR(fs_devices))
 		return fs_devices;
 
-	ret = open_fs_devices(fs_devices, BLK_OPEN_READ, fs_info->bdev_holder);
+	ret = open_fs_devices(fs_devices, BLK_OPEN_READ, fs_info);
 	if (ret) {
 		free_fs_devices(fs_devices);
 		return ERR_PTR(ret);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] btrfs: add a simple dead device detection mechanism
  2025-06-06  5:47 [PATCH 0/2] btrfs: implement blk_holder_ops call backs Qu Wenruo
  2025-06-06  5:47 ` [PATCH 1/2] btrfs: use fs_info as the block device holder Qu Wenruo
@ 2025-06-06  5:47 ` Qu Wenruo
  2025-06-06  8:18 ` [PATCH 0/2] btrfs: implement blk_holder_ops call backs Qu Wenruo
  2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2025-06-06  5:47 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs doesn't have a real mechanism to detect a dead device at
runtime.

This makes btrfs to treat intentionally or unintentionally removed
device as usual, making test case generic/730 to fail as btrfs still
return the cached data from page cache.
(The root cause is btrfs has no shutdown support for test cases
requiring shutdown)

Add a very basic and simple dead device detection mechanism for btrfs,
it utilized the blk_holder_ops for all opened btrfs devices, if a device
is removed unexpectedly or is going to be removed, btrfs will:

- Output a warning or info line for that device
- Mark that device as missing
- If the fs can not maintain RW operations mark the fs as error

Furthermore to grab the btrfs_device just from a block device, introduce
one extra member, btrfs_dev_lookup_args::devt.

Unlike other members, if this member is provided it's enough to uniquely
locate a btrfs_device without any other members populated.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/dev-replace.c |  2 +-
 fs/btrfs/super.c       |  4 +-
 fs/btrfs/super.h       |  2 +
 fs/btrfs/volumes.c     | 88 +++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/volumes.h     |  6 +++
 5 files changed, 97 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index cf63f4b29327..c7f724dc7892 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -250,7 +250,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	}
 
 	bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
-					   fs_info, NULL);
+					   fs_info, &btrfs_bdev_ops);
 	if (IS_ERR(bdev_file)) {
 		btrfs_err(fs_info, "target device %s is invalid!", device_path);
 		return PTR_ERR(bdev_file);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c1efd20166cc..2b03f54aac67 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2273,7 +2273,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
 	return ret;
 }
 
-static int btrfs_freeze(struct super_block *sb)
+int btrfs_freeze(struct super_block *sb)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 
@@ -2339,7 +2339,7 @@ static int check_dev_super(struct btrfs_device *dev)
 	return ret;
 }
 
-static int btrfs_unfreeze(struct super_block *sb)
+int btrfs_unfreeze(struct super_block *sb)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_device *device;
diff --git a/fs/btrfs/super.h b/fs/btrfs/super.h
index d80a86acfbbe..1d4a029a6042 100644
--- a/fs/btrfs/super.h
+++ b/fs/btrfs/super.h
@@ -14,6 +14,8 @@ bool btrfs_check_options(const struct btrfs_fs_info *info,
 			 unsigned long long *mount_opt,
 			 unsigned long flags);
 int btrfs_sync_fs(struct super_block *sb, int wait);
+int btrfs_freeze(struct super_block *sb);
+int btrfs_unfreeze(struct super_block *sb);
 char *btrfs_get_subvol_name_from_objectid(struct btrfs_fs_info *fs_info,
 					  u64 subvol_objectid);
 void btrfs_set_free_space_cache_settings(struct btrfs_fs_info *fs_info);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dae946ee6b07..96ee84ef0be0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -256,6 +256,88 @@ void btrfs_describe_block_groups(u64 bg_flags, char *buf, u32 size_buf)
 out_overflow:;
 }
 
+static void btrfs_dev_mark_dead(struct block_device *bdev, bool surprise)
+	__releases(&bdev->bd_holder_lock)
+{
+	struct btrfs_fs_info *fs_info = bdev->bd_holder;
+	struct btrfs_dev_lookup_args args = { .devt = bdev->bd_dev, };
+	struct btrfs_device *device;
+
+	mutex_lock(&fs_info->fs_devices->device_list_mutex);
+	device = btrfs_find_device(fs_info->fs_devices, &args);
+	if (unlikely(!device)) {
+		btrfs_crit(fs_info, "can't find a btrfs_device for %pg", bdev);
+		mutex_unlock(&bdev->bd_holder_lock);
+		goto out;
+	}
+	if (surprise)
+		btrfs_warn_in_rcu(fs_info, "devid %llu device %pg path %s is dead",
+				  device->devid, device->bdev, btrfs_dev_name(device));
+	else
+		btrfs_info_in_rcu(fs_info, "devid %llu device %pg path %s is going to be removed",
+				  device->devid, device->bdev, btrfs_dev_name(device));
+	mutex_unlock(&bdev->bd_holder_lock);
+	set_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state);
+	device->fs_devices->missing_devices++;
+	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
+		list_del_init(&device->dev_alloc_list);
+		clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
+		device->fs_devices->rw_devices--;
+	}
+	/*
+	 * If we can no longer maintain the RW opeartions for the fs, mark the
+	 * fs error.
+	 */
+	if (!btrfs_check_rw_degradable(fs_info, device)) {
+		btrfs_handle_fs_error(fs_info, -EIO,
+			"btrfs can no longer maintain read-write due to missing device(s)");
+	} else  {
+		btrfs_set_opt(fs_info->mount_opt, DEGRADED);
+		btrfs_warn(fs_info, "filesystem degraded due to missing device(s)");
+	}
+out:
+	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+}
+
+static void btrfs_dev_sync(struct block_device *bdev)
+	__releases(&bdev->bd_holder_lock)
+{
+	struct btrfs_fs_info *fs_info = bdev->bd_holder;
+	int ret;
+
+	mutex_unlock(&bdev->bd_holder_lock);
+	ret = btrfs_start_delalloc_roots(fs_info, LONG_MAX, false);
+	if (ret)
+		return;
+	ret = btrfs_sync_fs(fs_info->sb, 1);
+	wake_up_process(fs_info->cleaner_kthread);
+}
+
+static int btrfs_dev_freeze(struct block_device *bdev)
+	__releases(&bdev->bd_holder_lock)
+{
+	struct btrfs_fs_info *fs_info = bdev->bd_holder;
+
+	mutex_unlock(&bdev->bd_holder_lock);
+	return btrfs_freeze(fs_info->sb);
+}
+
+static int btrfs_dev_unfreeze(struct block_device *bdev)
+	__releases(&bdev->bd_holder_lock)
+{
+	struct btrfs_fs_info *fs_info = bdev->bd_holder;
+
+	mutex_unlock(&bdev->bd_holder_lock);
+	return btrfs_unfreeze(fs_info->sb);
+}
+
+const struct blk_holder_ops btrfs_bdev_ops = {
+	.mark_dead = btrfs_dev_mark_dead,
+	.sync = btrfs_dev_sync,
+	.freeze = btrfs_dev_freeze,
+	.thaw = btrfs_dev_unfreeze,
+};
+
 static int init_first_rw_device(struct btrfs_trans_handle *trans);
 static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info);
 static void btrfs_dev_stat_print_on_load(struct btrfs_device *device);
@@ -473,7 +555,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 	struct block_device *bdev;
 	int ret;
 
-	*bdev_file = bdev_file_open_by_path(device_path, flags, holder, NULL);
+	*bdev_file = bdev_file_open_by_path(device_path, flags, holder, &btrfs_bdev_ops);
 
 	if (IS_ERR(*bdev_file)) {
 		ret = PTR_ERR(*bdev_file);
@@ -2705,7 +2787,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		return -EROFS;
 
 	bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
-					   fs_info, NULL);
+					   fs_info, &btrfs_bdev_ops);
 	if (IS_ERR(bdev_file))
 		return PTR_ERR(bdev_file);
 
@@ -6798,6 +6880,8 @@ static bool dev_args_match_device(const struct btrfs_dev_lookup_args *args,
 			return true;
 		return false;
 	}
+	if (args->devt)
+		return device->devt == args->devt;
 
 	if (device->devid != args->devid)
 		return false;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6d8b1f38e3ee..a16c6109154b 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -607,6 +607,7 @@ struct btrfs_raid_attr {
 };
 
 extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES];
+extern const struct blk_holder_ops btrfs_bdev_ops;
 
 struct btrfs_chunk_map {
 	struct rb_node rb_node;
@@ -652,6 +653,11 @@ struct btrfs_dev_lookup_args {
 	u64 devid;
 	u8 *uuid;
 	u8 *fsid;
+	/*
+	 * If @devt is set (non-zero), then args will be ignored since the
+	 * non-zero dev_t can locate the device.
+	 */
+	dev_t devt;
 	bool missing;
 };
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 0/2] btrfs: implement blk_holder_ops call backs
  2025-06-06  5:47 [PATCH 0/2] btrfs: implement blk_holder_ops call backs Qu Wenruo
  2025-06-06  5:47 ` [PATCH 1/2] btrfs: use fs_info as the block device holder Qu Wenruo
  2025-06-06  5:47 ` [PATCH 2/2] btrfs: add a simple dead device detection mechanism Qu Wenruo
@ 2025-06-06  8:18 ` Qu Wenruo
  2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2025-06-06  8:18 UTC (permalink / raw)
  To: linux-btrfs



在 2025/6/6 15:17, Qu Wenruo 写道:
> Although test case generic/730 is going to be skipped for btrfs due to
> the missing shutdown support, it still exposed a problem that btrfs has
> no implementation for blk_holder_ops, thus even if lower level driver
> wants to notify btrfs there is a device that is going to be removed,
> btrfs can do nothing but ignore such critical info.
> 
> This series will implement all 4 callbacks for blk_holder_ops, 3 of them
> are straightforward, just call the full-fs version of
> sync/freeze/unfreeze.
> 
> The trickier one is the mark_dead(), all other fses with shutdown
> support will shutdown the fs, making all operations to return error,
> even if there is a cached page.
> 
> For btrfs we do not yet have full shutdown support, but at least we can
> notify such problem through dmesg, then check if the fs can still
> maintain its RW operations.
> If not then mark the fs error, so no more new writes will happen,
> preventing further data loss.
> 
> Now btrfs will output something like this if the only device of a btrfs
> is removed, instead of aborting transaction later due to IO error:
> 
>   BTRFS info (device sda): devid 1 device sda path /dev/sda is going to be removed
>   BTRFS: error (device sda) in btrfs_dev_mark_dead:294: errno=-5 IO failure (btrfs can no longer maintain read-write due to missing device(s))
>   BTRFS info (device sda state E): forced readonly
> 
> Future full-fs shutdown will depend on this feature to properly pass
> generic/730.
> 
> Qu Wenruo (2):
>    btrfs: use fs_info as the block device holder
>    btrfs: add a simple dead device detection mechanism

The series is triggering crash at generic/085.

It looks like it may not be a good idea to use fs_info directly as the 
block device holder, under heavy mount/unmount and per-device 
freeze/unfreeze workload, this can lead to various use-after-free problems.

Will get this reworked.

Thanks,
Qu

> 
>   fs/btrfs/dev-replace.c |  2 +-
>   fs/btrfs/fs.h          |  2 -
>   fs/btrfs/super.c       |  7 ++--
>   fs/btrfs/super.h       |  2 +
>   fs/btrfs/volumes.c     | 90 ++++++++++++++++++++++++++++++++++++++++--
>   fs/btrfs/volumes.h     |  6 +++
>   6 files changed, 99 insertions(+), 10 deletions(-)
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-06-06  8:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-06  5:47 [PATCH 0/2] btrfs: implement blk_holder_ops call backs Qu Wenruo
2025-06-06  5:47 ` [PATCH 1/2] btrfs: use fs_info as the block device holder Qu Wenruo
2025-06-06  5:47 ` [PATCH 2/2] btrfs: add a simple dead device detection mechanism Qu Wenruo
2025-06-06  8:18 ` [PATCH 0/2] btrfs: implement blk_holder_ops call backs Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox