* [PATCH 1/6] block: clear the holder releated fields when deleting the kobjects
2022-10-20 16:45 fix delayed holder tracking Christoph Hellwig
@ 2022-10-20 16:46 ` Christoph Hellwig
2022-10-21 3:12 ` Yu Kuai
2022-10-20 16:46 ` [PATCH 2/6] dm: remove free_table_devices Christoph Hellwig
` (4 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2022-10-20 16:46 UTC (permalink / raw)
To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block
Zero out the pointers to the holder related kobjects so that the holder
code doesn't incorrectly when called by dm for the delayed holder
registration.
Fixes: 89f871af1b26 ("dm: delay registering the gendisk")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/genhd.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/block/genhd.c b/block/genhd.c
index 17b33c62423df..cd90df6c775c2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -528,8 +528,10 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
blk_unregister_queue(disk);
out_put_slave_dir:
kobject_put(disk->slave_dir);
+ disk->slave_dir = NULL;
out_put_holder_dir:
kobject_put(disk->part0->bd_holder_dir);
+ disk->part0->bd_holder_dir = NULL;
out_del_integrity:
blk_integrity_del(disk);
out_del_block_link:
@@ -623,7 +625,9 @@ void del_gendisk(struct gendisk *disk)
blk_unregister_queue(disk);
kobject_put(disk->part0->bd_holder_dir);
+ disk->part0->bd_holder_dir = NULL;
kobject_put(disk->slave_dir);
+ disk->slave_dir = NULL;
part_stat_set_all(disk->part0, 0);
disk->part0->bd_stamp = 0;
--
2.30.2
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH 1/6] block: clear the holder releated fields when deleting the kobjects
2022-10-20 16:46 ` [PATCH 1/6] block: clear the holder releated fields when deleting the kobjects Christoph Hellwig
@ 2022-10-21 3:12 ` Yu Kuai
2022-10-21 3:22 ` Yu Kuai
0 siblings, 1 reply; 10+ messages in thread
From: Yu Kuai @ 2022-10-21 3:12 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Alasdair Kergon, Mike Snitzer
Cc: Yu Kuai, dm-devel, linux-block, yukuai (C)
Hi, Christoph
在 2022/10/21 0:46, Christoph Hellwig 写道:
> Zero out the pointers to the holder related kobjects so that the holder
> code doesn't incorrectly when called by dm for the delayed holder
> registration.
>
> Fixes: 89f871af1b26 ("dm: delay registering the gendisk")
> Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/genhd.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/block/genhd.c b/block/genhd.c
> index 17b33c62423df..cd90df6c775c2 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -528,8 +528,10 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
> blk_unregister_queue(disk);
> out_put_slave_dir:
> kobject_put(disk->slave_dir);
> + disk->slave_dir = NULL;
> out_put_holder_dir:
> kobject_put(disk->part0->bd_holder_dir);
> + disk->part0->bd_holder_dir = NULL;
> out_del_integrity:
> blk_integrity_del(disk);
> out_del_block_link:
> @@ -623,7 +625,9 @@ void del_gendisk(struct gendisk *disk)
> blk_unregister_queue(disk);
>
> kobject_put(disk->part0->bd_holder_dir);
> + disk->part0->bd_holder_dir = NULL;
I don't think this is enough. There is still no guarantee that
bd_link_disk_holder() won't access freed bd_holder_dir. It's still
possible that bd_link_disk_holer() read bd_holder_dir first, and then
del_gendisk() free and reset it.
By the way, I still think that the problem for the bd_holder_dir uaf is
not just related to dm.
Thanks,
Kuai
> kobject_put(disk->slave_dir);
> + disk->slave_dir = NULL;
>
> part_stat_set_all(disk->part0, 0);
> disk->part0->bd_stamp = 0;
>
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH 1/6] block: clear the holder releated fields when deleting the kobjects
2022-10-21 3:12 ` Yu Kuai
@ 2022-10-21 3:22 ` Yu Kuai
2022-10-30 15:24 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: Yu Kuai @ 2022-10-21 3:22 UTC (permalink / raw)
To: Yu Kuai, Christoph Hellwig, Jens Axboe, Alasdair Kergon,
Mike Snitzer
Cc: dm-devel, linux-block, yukuai (C)
在 2022/10/21 11:12, Yu Kuai 写道:
> Hi, Christoph
>
> 在 2022/10/21 0:46, Christoph Hellwig 写道:
>> Zero out the pointers to the holder related kobjects so that the holder
>> code doesn't incorrectly when called by dm for the delayed holder
>> registration.
>>
>> Fixes: 89f871af1b26 ("dm: delay registering the gendisk")
>> Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>> block/genhd.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/block/genhd.c b/block/genhd.c
>> index 17b33c62423df..cd90df6c775c2 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -528,8 +528,10 @@ int __must_check device_add_disk(struct device
>> *parent, struct gendisk *disk,
>> blk_unregister_queue(disk);
>> out_put_slave_dir:
>> kobject_put(disk->slave_dir);
>> + disk->slave_dir = NULL;
>> out_put_holder_dir:
>> kobject_put(disk->part0->bd_holder_dir);
>> + disk->part0->bd_holder_dir = NULL;
>> out_del_integrity:
>> blk_integrity_del(disk);
>> out_del_block_link:
>> @@ -623,7 +625,9 @@ void del_gendisk(struct gendisk *disk)
>> blk_unregister_queue(disk);
>> kobject_put(disk->part0->bd_holder_dir);
>> + disk->part0->bd_holder_dir = NULL;
>
> I don't think this is enough. There is still no guarantee that
> bd_link_disk_holder() won't access freed bd_holder_dir. It's still
> possible that bd_link_disk_holer() read bd_holder_dir first, and then
> del_gendisk() free and reset it.
I just verify that with this patchset applied, and together with the
following patch to add some delay:
diff --git a/block/holder.c b/block/holder.c
index b058bda757c1..b7d87d47afee 100644
--- a/block/holder.c
+++ b/block/holder.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/blkdev.h>
#include <linux/slab.h>
+#include <linux/delay.h>
struct bd_holder_disk {
struct list_head list;
@@ -32,11 +33,16 @@ static void del_symlink(struct kobject *from, struct
kobject *to)
static int __link_disk_holder(struct block_device *bdev, struct
gendisk *disk)
{
int ret;
+ struct kobject *holder = bdev->bd_holder_dir;
ret = add_symlink(disk->slave_dir, bdev_kobj(bdev));
if (ret)
return ret;
- ret = add_symlink(bdev->bd_holder_dir, &disk_to_dev(disk)->kobj);
+
+ printk("%s: delay 5s\n", __func__);
+ msleep(5000);
+
+ ret = add_symlink(holder, &disk_to_dev(disk)->kobj);
if (ret)
del_symlink(disk->slave_dir, bdev_kobj(bdev));
return ret;
Fowlling uaf can be triggered 100%:
[ 115.675974] md: personality for level 10 is not loaded!
[ 121.365398] md: personality for level 10 is not loaded!
[ 123.011503] __link_disk_holder: delay 5s
[ 128.086884]
==================================================================
[ 128.092786] BUG: KASAN: use-after-free in sysfs_create_link+0x28/0x80
[ 128.094489] Read of size 8 at addr ffff888101435e30 by task dmsetup/935
[ 128.095629]
[ 128.095955] CPU: 7 PID: 935 Comm: dmsetup Not tainted
6.0.0-next-20221017-00018-gec99c641bf1
[ 128.097550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS ?-20190727_073836-b4
[ 128.098939] Call Trace:
[ 128.099293] <TASK>
[ 128.099602] ? dump_stack_lvl+0x73/0x9f
[ 128.100638] ? print_report+0x249/0x746
[ 128.101115] ? __virt_addr_valid+0xd4/0x200
[ 128.101669] ? sysfs_create_link+0x28/0x80
[ 128.102206] ? kasan_report+0xc0/0x120
[ 128.102687] ? sysfs_create_link+0x28/0x80
[ 128.103254] ? __asan_load8+0x74/0x110
[ 128.103729] ? sysfs_create_link+0x28/0x80
[ 128.104272] ? bd_link_disk_holder.cold+0x7d/0x1b3
[ 128.104905] ? dm_setup_md_queue+0x1d5/0x340
[ 128.105461] ? dm_table_complete+0x590/0xdd0
[ 128.106010] ? dm_get_immutable_target_type+0x30/0x30
[ 128.106702] ? memset+0x4a/0x70
[ 128.107136] ? kvfree+0x44/0x50
[ 128.107632] ? dm_table_create+0x1a3/0x240
[ 128.108341] ? table_load+0x469/0x710
[ 128.108971] ? list_devices+0x4c0/0x4c0
[ 128.109629] ? kvmalloc_node+0x7d/0x160
[ 128.110269] ? __kmalloc_node+0x185/0x2b0
[ 128.110928] ? ctl_ioctl+0x388/0x7b0
[ 128.111539] ? list_devices+0x4c0/0x4c0
[ 128.112216] ? free_params+0x50/0x50
[ 128.112825] ? do_vfs_ioctl+0x931/0x10d0
[ 128.113690] ? handle_mm_fault+0x3aa/0x610
[ 128.114392] ? __kasan_check_read+0x1d/0x30
[ 128.115089] ? __fget_light+0xc2/0x370
[ 128.115741] ? dm_ctl_ioctl+0x12/0x20
[ 128.116376] ? __x64_sys_ioctl+0xd5/0x150
[ 128.117100] ? do_syscall_64+0x35/0x80
[ 128.117746] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 128.118546] </TASK>
[ 128.118940]
[ 128.119246] Allocated by task 179:
[ 128.119799] kasan_save_stack+0x26/0x60
[ 128.120419] kasan_set_track+0x29/0x40
[ 128.120915] kasan_save_alloc_info+0x1f/0x40
[ 128.121508] __kasan_kmalloc+0xcb/0xe0
[ 128.122127] kmalloc_trace+0x7e/0x150
[ 128.122755] kobject_create_and_add+0x3d/0xc0
[ 128.123475] device_add_disk+0x3d1/0x830
[ 128.124154] sd_probe+0x603/0x8e0
[ 128.124727] really_probe+0x4f2/0x730
[ 128.125351] __driver_probe_device+0x223/0x320
[ 128.126080] driver_probe_device+0x69/0x140
[ 128.126765] __device_attach_driver+0x10a/0x200
[ 128.127475] bus_for_each_drv+0x10e/0x1b0
[ 128.128145] __device_attach_async_helper+0x175/0x230
[ 128.128944] async_run_entry_fn+0x73/0x300
[ 128.129608] process_one_work+0x47b/0x9a0
[ 128.130275] worker_thread+0x30c/0x8b0
[ 128.130905] kthread+0x1e5/0x250
[ 128.131476] ret_from_fork+0x1f/0x30
[ 128.132080]
[ 128.132379] Freed by task 823:
[ 128.132887] kasan_save_stack+0x26/0x60
[ 128.133547] kasan_set_track+0x29/0x40
[ 128.134187] kasan_save_free_info+0x32/0x60
[ 128.134853] __kasan_slab_free+0x172/0x2c0
[ 128.135528] __kmem_cache_free+0x11c/0x560
[ 128.136208] kfree+0xd3/0x240
[ 128.136724] dynamic_kobj_release+0x1e/0x60
[ 128.137417] kobject_put+0x192/0x410
[ 128.138027] del_gendisk+0x227/0x670
[ 128.138611] sd_remove+0x65/0xa0
[ 128.139168] device_remove+0xbe/0xe0
[ 128.139755] device_release_driver_internal+0x161/0x2e0
[ 128.140573] device_release_driver+0x16/0x20
[ 128.141285] bus_remove_device+0x1d3/0x310
[ 128.141958] device_del+0x310/0x7e0
[ 128.142555] __scsi_remove_device+0x26c/0x360
[ 128.143280] scsi_remove_device+0x38/0x60
[ 128.143959] sdev_store_delete+0x73/0xf0
[ 128.144631] dev_attr_store+0x40/0x70
[ 128.145271] sysfs_kf_write+0x89/0xc0
[ 128.145910] kernfs_fop_write_iter+0x21d/0x330
[ 128.146653] vfs_write+0x60e/0x840
[ 128.147244] ksys_write+0xcd/0x1e0
[ 128.147840] __x64_sys_write+0x46/0x60
[ 128.148483] do_syscall_64+0x35/0x80
[ 128.149134] entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> By the way, I still think that the problem for the bd_holder_dir uaf is
> not just related to dm.
>
> Thanks,
> Kuai
>
>> kobject_put(disk->slave_dir);
>> + disk->slave_dir = NULL;
>> part_stat_set_all(disk->part0, 0);
>> disk->part0->bd_stamp = 0;
>>
>
> .
>
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH 1/6] block: clear the holder releated fields when deleting the kobjects
2022-10-21 3:22 ` Yu Kuai
@ 2022-10-30 15:24 ` Christoph Hellwig
0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2022-10-30 15:24 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, Jens Axboe, Alasdair Kergon, Mike Snitzer,
dm-devel, linux-block, yukuai (C)
On Fri, Oct 21, 2022 at 11:22:05AM +0800, Yu Kuai wrote:
> I just verify that with this patchset applied, and together with the
> following patch to add some delay:
Yes, that's because we should only blow away the slave_dir and not
bd_holder_dir.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/6] dm: remove free_table_devices
2022-10-20 16:45 fix delayed holder tracking Christoph Hellwig
2022-10-20 16:46 ` [PATCH 1/6] block: clear the holder releated fields when deleting the kobjects Christoph Hellwig
@ 2022-10-20 16:46 ` Christoph Hellwig
2022-10-20 16:46 ` [PATCH 3/6] dm: cleanup open_table_device Christoph Hellwig
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2022-10-20 16:46 UTC (permalink / raw)
To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block
free_table_devices just warns and frees all table_device structures when
the target removal did not remove them. This should never happen, but
if it did, just freeing the structure without deleting them from the
list or cleaning up the resources would not help at all. So just WARN on
a non-empty list instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/dm.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 60549b65c799c..36189a32329aa 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -833,19 +833,6 @@ void dm_put_table_device(struct mapped_device *md, struct dm_dev *d)
mutex_unlock(&md->table_devices_lock);
}
-static void free_table_devices(struct list_head *devices)
-{
- struct list_head *tmp, *next;
-
- list_for_each_safe(tmp, next, devices) {
- struct table_device *td = list_entry(tmp, struct table_device, list);
-
- DMWARN("dm_destroy: %s still exists with %d references",
- td->dm_dev.name, refcount_read(&td->count));
- kfree(td);
- }
-}
-
/*
* Get the geometry associated with a dm device
*/
@@ -2123,7 +2110,7 @@ static void free_dev(struct mapped_device *md)
cleanup_mapped_device(md);
- free_table_devices(&md->table_devices);
+ WARN_ON_ONCE(!list_empty(&md->table_devices));
dm_stats_cleanup(&md->stats);
free_minor(minor);
--
2.30.2
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 3/6] dm: cleanup open_table_device
2022-10-20 16:45 fix delayed holder tracking Christoph Hellwig
2022-10-20 16:46 ` [PATCH 1/6] block: clear the holder releated fields when deleting the kobjects Christoph Hellwig
2022-10-20 16:46 ` [PATCH 2/6] dm: remove free_table_devices Christoph Hellwig
@ 2022-10-20 16:46 ` Christoph Hellwig
2022-10-20 16:46 ` [PATCH 4/6] dm: cleanup close_table_device Christoph Hellwig
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2022-10-20 16:46 UTC (permalink / raw)
To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block
Move all the logic for allocation the table_device and linking it into
the list into the open_table_device. This keeps the code tidy and
ensures that the table_devices only exist in fully initialized state.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/dm.c | 56 ++++++++++++++++++++++++-------------------------
1 file changed, 27 insertions(+), 29 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 36189a32329aa..f60d9e8fc3abf 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -732,28 +732,41 @@ static char *_dm_claim_ptr = "I belong to device-mapper";
/*
* Open a table device so we can use it as a map destination.
*/
-static int open_table_device(struct table_device *td, dev_t dev,
- struct mapped_device *md)
+static struct table_device *open_table_device(struct mapped_device *md,
+ dev_t dev, fmode_t mode)
{
+ struct table_device *td;
struct block_device *bdev;
u64 part_off;
int r;
- BUG_ON(td->dm_dev.bdev);
+ td = kmalloc_node(sizeof(*td), GFP_KERNEL, md->numa_node_id);
+ if (!td)
+ return ERR_PTR(-ENOMEM);
+ refcount_set(&td->count, 1);
- bdev = blkdev_get_by_dev(dev, td->dm_dev.mode | FMODE_EXCL, _dm_claim_ptr);
- if (IS_ERR(bdev))
- return PTR_ERR(bdev);
+ bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr);
+ if (IS_ERR(bdev)) {
+ r = PTR_ERR(bdev);
+ goto out_free_td;
+ }
r = bd_link_disk_holder(bdev, dm_disk(md));
- if (r) {
- blkdev_put(bdev, td->dm_dev.mode | FMODE_EXCL);
- return r;
- }
+ if (r)
+ goto out_blkdev_put;
+ td->dm_dev.mode = mode;
td->dm_dev.bdev = bdev;
td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev, &part_off, NULL, NULL);
- return 0;
+ format_dev_t(td->dm_dev.name, dev);
+ list_add(&td->list, &md->table_devices);
+ return td;
+
+out_blkdev_put:
+ blkdev_put(bdev, mode | FMODE_EXCL);
+out_free_td:
+ kfree(td);
+ return ERR_PTR(r);
}
/*
@@ -786,31 +799,16 @@ static struct table_device *find_table_device(struct list_head *l, dev_t dev,
int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
struct dm_dev **result)
{
- int r;
struct table_device *td;
mutex_lock(&md->table_devices_lock);
td = find_table_device(&md->table_devices, dev, mode);
if (!td) {
- td = kmalloc_node(sizeof(*td), GFP_KERNEL, md->numa_node_id);
- if (!td) {
- mutex_unlock(&md->table_devices_lock);
- return -ENOMEM;
- }
-
- td->dm_dev.mode = mode;
- td->dm_dev.bdev = NULL;
-
- if ((r = open_table_device(td, dev, md))) {
+ td = open_table_device(md, dev, mode);
+ if (IS_ERR(td)) {
mutex_unlock(&md->table_devices_lock);
- kfree(td);
- return r;
+ return PTR_ERR(td);
}
-
- format_dev_t(td->dm_dev.name, dev);
-
- refcount_set(&td->count, 1);
- list_add(&td->list, &md->table_devices);
} else {
refcount_inc(&td->count);
}
--
2.30.2
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 4/6] dm: cleanup close_table_device
2022-10-20 16:45 fix delayed holder tracking Christoph Hellwig
` (2 preceding siblings ...)
2022-10-20 16:46 ` [PATCH 3/6] dm: cleanup open_table_device Christoph Hellwig
@ 2022-10-20 16:46 ` Christoph Hellwig
2022-10-20 16:46 ` [PATCH 5/6] dm: track per-add_disk holder relations in DM Christoph Hellwig
2022-10-20 16:46 ` [PATCH 6/6] block: remove delayed holder registration Christoph Hellwig
5 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2022-10-20 16:46 UTC (permalink / raw)
To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block
Take the list unlink and free into close_table_device so that no half
torn down table_devices exist. Also remove the check for a NULL bdev
as that can't happen - open_table_device never adds a table_device to
the list that does not have a valid block_device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/dm.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f60d9e8fc3abf..f592d34bd20ed 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -774,14 +774,11 @@ static struct table_device *open_table_device(struct mapped_device *md,
*/
static void close_table_device(struct table_device *td, struct mapped_device *md)
{
- if (!td->dm_dev.bdev)
- return;
-
bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
put_dax(td->dm_dev.dax_dev);
- td->dm_dev.bdev = NULL;
- td->dm_dev.dax_dev = NULL;
+ list_del(&td->list);
+ kfree(td);
}
static struct table_device *find_table_device(struct list_head *l, dev_t dev,
@@ -823,11 +820,8 @@ void dm_put_table_device(struct mapped_device *md, struct dm_dev *d)
struct table_device *td = container_of(d, struct table_device, dm_dev);
mutex_lock(&md->table_devices_lock);
- if (refcount_dec_and_test(&td->count)) {
+ if (refcount_dec_and_test(&td->count))
close_table_device(td, md);
- list_del(&td->list);
- kfree(td);
- }
mutex_unlock(&md->table_devices_lock);
}
--
2.30.2
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 5/6] dm: track per-add_disk holder relations in DM
2022-10-20 16:45 fix delayed holder tracking Christoph Hellwig
` (3 preceding siblings ...)
2022-10-20 16:46 ` [PATCH 4/6] dm: cleanup close_table_device Christoph Hellwig
@ 2022-10-20 16:46 ` Christoph Hellwig
2022-10-20 16:46 ` [PATCH 6/6] block: remove delayed holder registration Christoph Hellwig
5 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2022-10-20 16:46 UTC (permalink / raw)
To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block
dm is a bit special in that it opens the underlying devices. Commit
89f871af1b26 ("dm: delay registering the gendisk") tried to accomodate
that by allowing to add the holder to the list before add_gendisk and
then just add them to sysfs once add_disk is called. But that leads to
really odd lifetime problems and error handling problems as we can't
know the state of the kobjects and don't unwind properly. To fix this
switch to just registering all existing table_devices with the holder
code right after add_disk, and remove them before calling del_gendisk.
Fixes: 89f871af1b26 ("dm: delay registering the gendisk")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/dm.c | 45 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 37 insertions(+), 8 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f592d34bd20ed..a9c750abd5365 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -751,9 +751,16 @@ static struct table_device *open_table_device(struct mapped_device *md,
goto out_free_td;
}
- r = bd_link_disk_holder(bdev, dm_disk(md));
- if (r)
- goto out_blkdev_put;
+ /*
+ * We can be called before the dm disk is added. In that case we can't
+ * register the holder relation here. It will be done once add_disk was
+ * called.
+ */
+ if (md->disk->slave_dir) {
+ r = bd_link_disk_holder(bdev, md->disk);
+ if (r)
+ goto out_blkdev_put;
+ }
td->dm_dev.mode = mode;
td->dm_dev.bdev = bdev;
@@ -774,7 +781,8 @@ static struct table_device *open_table_device(struct mapped_device *md,
*/
static void close_table_device(struct table_device *td, struct mapped_device *md)
{
- bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
+ if (md->disk->slave_dir)
+ bd_unlink_disk_holder(td->dm_dev.bdev, md->disk);
blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
put_dax(td->dm_dev.dax_dev);
list_del(&td->list);
@@ -1951,7 +1959,13 @@ static void cleanup_mapped_device(struct mapped_device *md)
md->disk->private_data = NULL;
spin_unlock(&_minor_lock);
if (dm_get_md_type(md) != DM_TYPE_NONE) {
+ struct table_device *td;
+
dm_sysfs_exit(md);
+ list_for_each_entry(td, &md->table_devices, list) {
+ bd_unlink_disk_holder(td->dm_dev.bdev,
+ md->disk);
+ }
del_gendisk(md->disk);
}
dm_queue_destroy_crypto_profile(md->queue);
@@ -2285,6 +2299,7 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
{
enum dm_queue_mode type = dm_table_get_type(t);
struct queue_limits limits;
+ struct table_device *td;
int r;
switch (type) {
@@ -2317,13 +2332,27 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
if (r)
return r;
- r = dm_sysfs_init(md);
- if (r) {
- del_gendisk(md->disk);
- return r;
+ /*
+ * Register the holder relationship for devices added before the disk
+ * was live.
+ */
+ list_for_each_entry(td, &md->table_devices, list) {
+ r = bd_link_disk_holder(td->dm_dev.bdev, md->disk);
+ if (r)
+ goto out_undo_holders;
}
+
+ r = dm_sysfs_init(md);
+ if (r)
+ goto out_undo_holders;
md->type = type;
return 0;
+
+out_undo_holders:
+ list_for_each_entry_continue_reverse(td, &md->table_devices, list)
+ bd_unlink_disk_holder(td->dm_dev.bdev, md->disk);
+ del_gendisk(md->disk);
+ return r;
}
struct mapped_device *dm_get_md(dev_t dev)
--
2.30.2
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 6/6] block: remove delayed holder registration
2022-10-20 16:45 fix delayed holder tracking Christoph Hellwig
` (4 preceding siblings ...)
2022-10-20 16:46 ` [PATCH 5/6] dm: track per-add_disk holder relations in DM Christoph Hellwig
@ 2022-10-20 16:46 ` Christoph Hellwig
5 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2022-10-20 16:46 UTC (permalink / raw)
To: Jens Axboe, Alasdair Kergon, Mike Snitzer; +Cc: Yu Kuai, dm-devel, linux-block
Now that dm has been fixed to track of holder registrations before
add_disk, the somewhat buggy block layer code can be safely removed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/genhd.c | 4 ----
block/holder.c | 40 +++++++++++-----------------------------
include/linux/blkdev.h | 5 -----
3 files changed, 11 insertions(+), 38 deletions(-)
diff --git a/block/genhd.c b/block/genhd.c
index cd90df6c775c2..7c86b161fbd0f 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -478,10 +478,6 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
goto out_put_holder_dir;
}
- ret = bd_register_pending_holders(disk);
- if (ret < 0)
- goto out_put_slave_dir;
-
ret = blk_register_queue(disk);
if (ret)
goto out_put_slave_dir;
diff --git a/block/holder.c b/block/holder.c
index 4923a77ebecdc..c183553503b07 100644
--- a/block/holder.c
+++ b/block/holder.c
@@ -79,6 +79,9 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
struct bd_holder_disk *holder;
int ret = 0;
+ if (WARN_ON_ONCE(!disk->slave_dir))
+ return -EINVAL;
+
mutex_lock(&disk->open_mutex);
WARN_ON_ONCE(!bdev->bd_holder);
@@ -98,12 +101,10 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
INIT_LIST_HEAD(&holder->list);
holder->bdev = bdev;
holder->refcnt = 1;
- if (disk->slave_dir) {
- ret = __link_disk_holder(bdev, disk);
- if (ret) {
- kfree(holder);
- goto out_unlock;
- }
+ ret = __link_disk_holder(bdev, disk);
+ if (ret) {
+ kfree(holder);
+ goto out_unlock;
}
list_add(&holder->list, &disk->slave_bdevs);
@@ -140,11 +141,13 @@ void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
{
struct bd_holder_disk *holder;
+ if (WARN_ON_ONCE(!disk->slave_dir))
+ return;
+
mutex_lock(&disk->open_mutex);
holder = bd_find_holder_disk(bdev, disk);
if (!WARN_ON_ONCE(holder == NULL) && !--holder->refcnt) {
- if (disk->slave_dir)
- __unlink_disk_holder(bdev, disk);
+ __unlink_disk_holder(bdev, disk);
kobject_put(bdev->bd_holder_dir);
list_del_init(&holder->list);
kfree(holder);
@@ -152,24 +155,3 @@ void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
mutex_unlock(&disk->open_mutex);
}
EXPORT_SYMBOL_GPL(bd_unlink_disk_holder);
-
-int bd_register_pending_holders(struct gendisk *disk)
-{
- struct bd_holder_disk *holder;
- int ret;
-
- mutex_lock(&disk->open_mutex);
- list_for_each_entry(holder, &disk->slave_bdevs, list) {
- ret = __link_disk_holder(holder->bdev, disk);
- if (ret)
- goto out_undo;
- }
- mutex_unlock(&disk->open_mutex);
- return 0;
-
-out_undo:
- list_for_each_entry_continue_reverse(holder, &disk->slave_bdevs, list)
- __unlink_disk_holder(holder->bdev, disk);
- mutex_unlock(&disk->open_mutex);
- return ret;
-}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50e358a19d986..5a208d6def879 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -839,7 +839,6 @@ void set_capacity(struct gendisk *disk, sector_t size);
#ifdef CONFIG_BLOCK_HOLDER_DEPRECATED
int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk);
-int bd_register_pending_holders(struct gendisk *disk);
#else
static inline int bd_link_disk_holder(struct block_device *bdev,
struct gendisk *disk)
@@ -850,10 +849,6 @@ static inline void bd_unlink_disk_holder(struct block_device *bdev,
struct gendisk *disk)
{
}
-static inline int bd_register_pending_holders(struct gendisk *disk)
-{
- return 0;
-}
#endif /* CONFIG_BLOCK_HOLDER_DEPRECATED */
dev_t part_devt(struct gendisk *disk, u8 partno);
--
2.30.2
^ permalink raw reply related [flat|nested] 10+ messages in thread