* [PATCH 1/2] btrfs: use fs_info as the block device holder
2025-06-06 5:47 [PATCH 0/2] btrfs: implement blk_holder_ops call backs Qu Wenruo
@ 2025-06-06 5:47 ` Qu Wenruo
2025-06-06 5:47 ` [PATCH 2/2] btrfs: add a simple dead device detection mechanism Qu Wenruo
2025-06-06 8:18 ` [PATCH 0/2] btrfs: implement blk_holder_ops call backs Qu Wenruo
2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2025-06-06 5:47 UTC (permalink / raw)
To: linux-btrfs
Currently btrfs uses "btrfs_fs_type" as the bdev holder for all opened
device, which means all btrfses shares the same holder value.
That's only fine when there is no blk_holder_ops provided, but we're
going to implement blk_holder_ops soon, so replace the "btrfs_fs_type"
holder usage, and replace it with a proper fs_info instead.
This means we can remove the btrfs_fs_info::bdev_holder completely.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/dev-replace.c | 2 +-
fs/btrfs/fs.h | 2 --
fs/btrfs/super.c | 3 +--
fs/btrfs/volumes.c | 4 ++--
4 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 2decb9fff445..cf63f4b29327 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -250,7 +250,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
}
bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
- fs_info->bdev_holder, NULL);
+ fs_info, NULL);
if (IS_ERR(bdev_file)) {
btrfs_err(fs_info, "target device %s is invalid!", device_path);
return PTR_ERR(bdev_file);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index b239e4b8421c..d90304d4e32c 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -715,8 +715,6 @@ struct btrfs_fs_info {
u32 data_chunk_allocations;
u32 metadata_ratio;
- void *bdev_holder;
-
/* Private scrub information */
struct mutex scrub_lock;
atomic_t scrubs_running;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 2d0d8c6e77b4..c1efd20166cc 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1865,7 +1865,7 @@ static int btrfs_get_tree_super(struct fs_context *fc)
fs_devices = device->fs_devices;
fs_info->fs_devices = fs_devices;
- ret = btrfs_open_devices(fs_devices, mode, &btrfs_fs_type);
+ ret = btrfs_open_devices(fs_devices, mode, fs_info);
mutex_unlock(&uuid_mutex);
if (ret)
return ret;
@@ -1905,7 +1905,6 @@ static int btrfs_get_tree_super(struct fs_context *fc)
} else {
snprintf(sb->s_id, sizeof(sb->s_id), "%pg", bdev);
shrinker_debugfs_rename(sb->s_shrink, "sb-btrfs:%s", sb->s_id);
- btrfs_sb(sb)->bdev_holder = &btrfs_fs_type;
ret = btrfs_fill_super(sb, fs_devices);
if (ret) {
deactivate_locked_super(sb);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1535a425e8f9..dae946ee6b07 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2705,7 +2705,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
return -EROFS;
bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
- fs_info->bdev_holder, NULL);
+ fs_info, NULL);
if (IS_ERR(bdev_file))
return PTR_ERR(bdev_file);
@@ -7174,7 +7174,7 @@ static struct btrfs_fs_devices *open_seed_devices(struct btrfs_fs_info *fs_info,
if (IS_ERR(fs_devices))
return fs_devices;
- ret = open_fs_devices(fs_devices, BLK_OPEN_READ, fs_info->bdev_holder);
+ ret = open_fs_devices(fs_devices, BLK_OPEN_READ, fs_info);
if (ret) {
free_fs_devices(fs_devices);
return ERR_PTR(ret);
--
2.49.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* [PATCH 2/2] btrfs: add a simple dead device detection mechanism
2025-06-06 5:47 [PATCH 0/2] btrfs: implement blk_holder_ops call backs Qu Wenruo
2025-06-06 5:47 ` [PATCH 1/2] btrfs: use fs_info as the block device holder Qu Wenruo
@ 2025-06-06 5:47 ` Qu Wenruo
2025-06-06 8:18 ` [PATCH 0/2] btrfs: implement blk_holder_ops call backs Qu Wenruo
2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2025-06-06 5:47 UTC (permalink / raw)
To: linux-btrfs
Currently btrfs doesn't have a real mechanism to detect a dead device at
runtime.
This makes btrfs to treat intentionally or unintentionally removed
device as usual, making test case generic/730 to fail as btrfs still
return the cached data from page cache.
(The root cause is btrfs has no shutdown support for test cases
requiring shutdown)
Add a very basic and simple dead device detection mechanism for btrfs,
it utilized the blk_holder_ops for all opened btrfs devices, if a device
is removed unexpectedly or is going to be removed, btrfs will:
- Output a warning or info line for that device
- Mark that device as missing
- If the fs can not maintain RW operations mark the fs as error
Furthermore to grab the btrfs_device just from a block device, introduce
one extra member, btrfs_dev_lookup_args::devt.
Unlike other members, if this member is provided it's enough to uniquely
locate a btrfs_device without any other members populated.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/dev-replace.c | 2 +-
fs/btrfs/super.c | 4 +-
fs/btrfs/super.h | 2 +
fs/btrfs/volumes.c | 88 +++++++++++++++++++++++++++++++++++++++++-
fs/btrfs/volumes.h | 6 +++
5 files changed, 97 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index cf63f4b29327..c7f724dc7892 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -250,7 +250,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
}
bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
- fs_info, NULL);
+ fs_info, &btrfs_bdev_ops);
if (IS_ERR(bdev_file)) {
btrfs_err(fs_info, "target device %s is invalid!", device_path);
return PTR_ERR(bdev_file);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c1efd20166cc..2b03f54aac67 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2273,7 +2273,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
return ret;
}
-static int btrfs_freeze(struct super_block *sb)
+int btrfs_freeze(struct super_block *sb)
{
struct btrfs_fs_info *fs_info = btrfs_sb(sb);
@@ -2339,7 +2339,7 @@ static int check_dev_super(struct btrfs_device *dev)
return ret;
}
-static int btrfs_unfreeze(struct super_block *sb)
+int btrfs_unfreeze(struct super_block *sb)
{
struct btrfs_fs_info *fs_info = btrfs_sb(sb);
struct btrfs_device *device;
diff --git a/fs/btrfs/super.h b/fs/btrfs/super.h
index d80a86acfbbe..1d4a029a6042 100644
--- a/fs/btrfs/super.h
+++ b/fs/btrfs/super.h
@@ -14,6 +14,8 @@ bool btrfs_check_options(const struct btrfs_fs_info *info,
unsigned long long *mount_opt,
unsigned long flags);
int btrfs_sync_fs(struct super_block *sb, int wait);
+int btrfs_freeze(struct super_block *sb);
+int btrfs_unfreeze(struct super_block *sb);
char *btrfs_get_subvol_name_from_objectid(struct btrfs_fs_info *fs_info,
u64 subvol_objectid);
void btrfs_set_free_space_cache_settings(struct btrfs_fs_info *fs_info);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dae946ee6b07..96ee84ef0be0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -256,6 +256,88 @@ void btrfs_describe_block_groups(u64 bg_flags, char *buf, u32 size_buf)
out_overflow:;
}
+static void btrfs_dev_mark_dead(struct block_device *bdev, bool surprise)
+ __releases(&bdev->bd_holder_lock)
+{
+ struct btrfs_fs_info *fs_info = bdev->bd_holder;
+ struct btrfs_dev_lookup_args args = { .devt = bdev->bd_dev, };
+ struct btrfs_device *device;
+
+ mutex_lock(&fs_info->fs_devices->device_list_mutex);
+ device = btrfs_find_device(fs_info->fs_devices, &args);
+ if (unlikely(!device)) {
+ btrfs_crit(fs_info, "can't find a btrfs_device for %pg", bdev);
+ mutex_unlock(&bdev->bd_holder_lock);
+ goto out;
+ }
+ if (surprise)
+ btrfs_warn_in_rcu(fs_info, "devid %llu device %pg path %s is dead",
+ device->devid, device->bdev, btrfs_dev_name(device));
+ else
+ btrfs_info_in_rcu(fs_info, "devid %llu device %pg path %s is going to be removed",
+ device->devid, device->bdev, btrfs_dev_name(device));
+ mutex_unlock(&bdev->bd_holder_lock);
+ set_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state);
+ device->fs_devices->missing_devices++;
+ if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
+ list_del_init(&device->dev_alloc_list);
+ clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
+ device->fs_devices->rw_devices--;
+ }
+ /*
+ * If we can no longer maintain the RW opeartions for the fs, mark the
+ * fs error.
+ */
+ if (!btrfs_check_rw_degradable(fs_info, device)) {
+ btrfs_handle_fs_error(fs_info, -EIO,
+ "btrfs can no longer maintain read-write due to missing device(s)");
+ } else {
+ btrfs_set_opt(fs_info->mount_opt, DEGRADED);
+ btrfs_warn(fs_info, "filesystem degraded due to missing device(s)");
+ }
+out:
+ mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+}
+
+static void btrfs_dev_sync(struct block_device *bdev)
+ __releases(&bdev->bd_holder_lock)
+{
+ struct btrfs_fs_info *fs_info = bdev->bd_holder;
+ int ret;
+
+ mutex_unlock(&bdev->bd_holder_lock);
+ ret = btrfs_start_delalloc_roots(fs_info, LONG_MAX, false);
+ if (ret)
+ return;
+ ret = btrfs_sync_fs(fs_info->sb, 1);
+ wake_up_process(fs_info->cleaner_kthread);
+}
+
+static int btrfs_dev_freeze(struct block_device *bdev)
+ __releases(&bdev->bd_holder_lock)
+{
+ struct btrfs_fs_info *fs_info = bdev->bd_holder;
+
+ mutex_unlock(&bdev->bd_holder_lock);
+ return btrfs_freeze(fs_info->sb);
+}
+
+static int btrfs_dev_unfreeze(struct block_device *bdev)
+ __releases(&bdev->bd_holder_lock)
+{
+ struct btrfs_fs_info *fs_info = bdev->bd_holder;
+
+ mutex_unlock(&bdev->bd_holder_lock);
+ return btrfs_unfreeze(fs_info->sb);
+}
+
+const struct blk_holder_ops btrfs_bdev_ops = {
+ .mark_dead = btrfs_dev_mark_dead,
+ .sync = btrfs_dev_sync,
+ .freeze = btrfs_dev_freeze,
+ .thaw = btrfs_dev_unfreeze,
+};
+
static int init_first_rw_device(struct btrfs_trans_handle *trans);
static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info);
static void btrfs_dev_stat_print_on_load(struct btrfs_device *device);
@@ -473,7 +555,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
struct block_device *bdev;
int ret;
- *bdev_file = bdev_file_open_by_path(device_path, flags, holder, NULL);
+ *bdev_file = bdev_file_open_by_path(device_path, flags, holder, &btrfs_bdev_ops);
if (IS_ERR(*bdev_file)) {
ret = PTR_ERR(*bdev_file);
@@ -2705,7 +2787,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
return -EROFS;
bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
- fs_info, NULL);
+ fs_info, &btrfs_bdev_ops);
if (IS_ERR(bdev_file))
return PTR_ERR(bdev_file);
@@ -6798,6 +6880,8 @@ static bool dev_args_match_device(const struct btrfs_dev_lookup_args *args,
return true;
return false;
}
+ if (args->devt)
+ return device->devt == args->devt;
if (device->devid != args->devid)
return false;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6d8b1f38e3ee..a16c6109154b 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -607,6 +607,7 @@ struct btrfs_raid_attr {
};
extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES];
+extern const struct blk_holder_ops btrfs_bdev_ops;
struct btrfs_chunk_map {
struct rb_node rb_node;
@@ -652,6 +653,11 @@ struct btrfs_dev_lookup_args {
u64 devid;
u8 *uuid;
u8 *fsid;
+ /*
+ * If @devt is set (non-zero), then args will be ignored since the
+ * non-zero dev_t can locate the device.
+ */
+ dev_t devt;
bool missing;
};
--
2.49.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH 0/2] btrfs: implement blk_holder_ops call backs
2025-06-06 5:47 [PATCH 0/2] btrfs: implement blk_holder_ops call backs Qu Wenruo
2025-06-06 5:47 ` [PATCH 1/2] btrfs: use fs_info as the block device holder Qu Wenruo
2025-06-06 5:47 ` [PATCH 2/2] btrfs: add a simple dead device detection mechanism Qu Wenruo
@ 2025-06-06 8:18 ` Qu Wenruo
2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2025-06-06 8:18 UTC (permalink / raw)
To: linux-btrfs
在 2025/6/6 15:17, Qu Wenruo 写道:
> Although test case generic/730 is going to be skipped for btrfs due to
> the missing shutdown support, it still exposed a problem that btrfs has
> no implementation for blk_holder_ops, thus even if lower level driver
> wants to notify btrfs there is a device that is going to be removed,
> btrfs can do nothing but ignore such critical info.
>
> This series will implement all 4 callbacks for blk_holder_ops, 3 of them
> are straightforward, just call the full-fs version of
> sync/freeze/unfreeze.
>
> The trickier one is the mark_dead(), all other fses with shutdown
> support will shutdown the fs, making all operations to return error,
> even if there is a cached page.
>
> For btrfs we do not yet have full shutdown support, but at least we can
> notify such problem through dmesg, then check if the fs can still
> maintain its RW operations.
> If not then mark the fs error, so no more new writes will happen,
> preventing further data loss.
>
> Now btrfs will output something like this if the only device of a btrfs
> is removed, instead of aborting transaction later due to IO error:
>
> BTRFS info (device sda): devid 1 device sda path /dev/sda is going to be removed
> BTRFS: error (device sda) in btrfs_dev_mark_dead:294: errno=-5 IO failure (btrfs can no longer maintain read-write due to missing device(s))
> BTRFS info (device sda state E): forced readonly
>
> Future full-fs shutdown will depend on this feature to properly pass
> generic/730.
>
> Qu Wenruo (2):
> btrfs: use fs_info as the block device holder
> btrfs: add a simple dead device detection mechanism
The series is triggering crash at generic/085.
It looks like it may not be a good idea to use fs_info directly as the
block device holder, under heavy mount/unmount and per-device
freeze/unfreeze workload, this can lead to various use-after-free problems.
Will get this reworked.
Thanks,
Qu
>
> fs/btrfs/dev-replace.c | 2 +-
> fs/btrfs/fs.h | 2 -
> fs/btrfs/super.c | 7 ++--
> fs/btrfs/super.h | 2 +
> fs/btrfs/volumes.c | 90 ++++++++++++++++++++++++++++++++++++++++--
> fs/btrfs/volumes.h | 6 +++
> 6 files changed, 99 insertions(+), 10 deletions(-)
>
^ permalink raw reply [flat|nested] 4+ messages in thread