* [PATCH] block: fix kobject double initialization in add_disk
@ 2025-08-07 7:20 Zheng Qixing
2025-08-07 8:42 ` Yu Kuai
2025-08-07 11:47 ` Nilay Shroff
0 siblings, 2 replies; 8+ messages in thread
From: Zheng Qixing @ 2025-08-07 7:20 UTC (permalink / raw)
To: axboe
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun, houtao1,
zhengqixing
From: Zheng Qixing <zhengqixing@huawei.com>
Device-mapper can call add_disk() multiple times for the same gendisk
due to its two-phase creation process (dm create + dm load). This leads
to kobject double initialization errors when the underlying iSCSI devices
become temporarily unavailable and then reappear.
However, if the first add_disk() call fails and is retried, the queue_kobj
gets initialized twice, causing:
kobject: kobject (ffff88810c27bb90): tried to init an initialized object,
something is seriously wrong.
Call Trace:
<TASK>
dump_stack_lvl+0x5b/0x80
kobject_init.cold+0x43/0x51
blk_register_queue+0x46/0x280
add_disk_fwnode+0xb5/0x280
dm_setup_md_queue+0x194/0x1c0
table_load+0x297/0x2d0
ctl_ioctl+0x2a2/0x480
dm_ctl_ioctl+0xe/0x20
__x64_sys_ioctl+0xc7/0x110
do_syscall_64+0x72/0x390
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fix this by separating kobject initialization from sysfs registration:
- Initialize queue_kobj early during gendisk allocation
- add_disk() only adds the already-initialized kobject to sysfs
- del_gendisk() removes from sysfs but doesn't destroy the kobject
- Final cleanup happens when the disk is released
Fixes: 2bd85221a625 ("block: untangle request_queue refcounting from sysfs")
Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
Closes: https://lore.kernel.org/all/83591d0b-2467-433c-bce0-5581298eb161@huawei.com/
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
---
block/blk-sysfs.c | 4 +---
block/blk.h | 1 +
block/genhd.c | 2 ++
3 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 396cded255ea..37d8654faff9 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -847,7 +847,7 @@ static void blk_queue_release(struct kobject *kobj)
/* nothing to do here, all data is associated with the parent gendisk */
}
-static const struct kobj_type blk_queue_ktype = {
+const struct kobj_type blk_queue_ktype = {
.default_groups = blk_queue_attr_groups,
.sysfs_ops = &queue_sysfs_ops,
.release = blk_queue_release,
@@ -875,7 +875,6 @@ int blk_register_queue(struct gendisk *disk)
struct request_queue *q = disk->queue;
int ret;
- kobject_init(&disk->queue_kobj, &blk_queue_ktype);
ret = kobject_add(&disk->queue_kobj, &disk_to_dev(disk)->kobj, "queue");
if (ret < 0)
goto out_put_queue_kobj;
@@ -986,5 +985,4 @@ void blk_unregister_queue(struct gendisk *disk)
elevator_set_none(q);
blk_debugfs_remove(disk);
- kobject_put(&disk->queue_kobj);
}
diff --git a/block/blk.h b/block/blk.h
index 0a2eccf28ca4..46f566f9b126 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -29,6 +29,7 @@ struct elevator_tags;
/* Max future timer expiry for timeouts */
#define BLK_MAX_TIMEOUT (5 * HZ)
+extern const struct kobj_type blk_queue_ktype;
extern struct dentry *blk_debugfs_root;
struct blk_flush_queue {
diff --git a/block/genhd.c b/block/genhd.c
index c26733f6324b..9bbc38d12792 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1303,6 +1303,7 @@ static void disk_release(struct device *dev)
disk_free_zone_resources(disk);
xa_destroy(&disk->part_tbl);
+ kobject_put(&disk->queue_kobj);
disk->queue->disk = NULL;
blk_put_queue(disk->queue);
@@ -1486,6 +1487,7 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
INIT_LIST_HEAD(&disk->slave_bdevs);
#endif
mutex_init(&disk->rqos_state_mutex);
+ kobject_init(&disk->queue_kobj, &blk_queue_ktype);
return disk;
out_erase_part0:
--
2.39.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] block: fix kobject double initialization in add_disk
2025-08-07 7:20 [PATCH] block: fix kobject double initialization in add_disk Zheng Qixing
@ 2025-08-07 8:42 ` Yu Kuai
2025-08-07 11:47 ` Nilay Shroff
1 sibling, 0 replies; 8+ messages in thread
From: Yu Kuai @ 2025-08-07 8:42 UTC (permalink / raw)
To: Zheng Qixing, axboe
Cc: linux-block, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing, yukuai (C)
在 2025/08/07 15:20, Zheng Qixing 写道:
> From: Zheng Qixing<zhengqixing@huawei.com>
>
> Device-mapper can call add_disk() multiple times for the same gendisk
> due to its two-phase creation process (dm create + dm load). This leads
> to kobject double initialization errors when the underlying iSCSI devices
> become temporarily unavailable and then reappear.
>
> However, if the first add_disk() call fails and is retried, the queue_kobj
> gets initialized twice, causing:
>
> kobject: kobject (ffff88810c27bb90): tried to init an initialized object,
> something is seriously wrong.
> Call Trace:
> <TASK>
> dump_stack_lvl+0x5b/0x80
> kobject_init.cold+0x43/0x51
> blk_register_queue+0x46/0x280
> add_disk_fwnode+0xb5/0x280
> dm_setup_md_queue+0x194/0x1c0
> table_load+0x297/0x2d0
> ctl_ioctl+0x2a2/0x480
> dm_ctl_ioctl+0xe/0x20
> __x64_sys_ioctl+0xc7/0x110
> do_syscall_64+0x72/0x390
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Fix this by separating kobject initialization from sysfs registration:
> - Initialize queue_kobj early during gendisk allocation
> - add_disk() only adds the already-initialized kobject to sysfs
> - del_gendisk() removes from sysfs but doesn't destroy the kobject
> - Final cleanup happens when the disk is released
>
> Fixes: 2bd85221a625 ("block: untangle request_queue refcounting from sysfs")
> Reported-by: Li Lingfeng<lilingfeng3@huawei.com>
> Closes:https://lore.kernel.org/all/83591d0b-2467-433c-bce0-5581298eb161@huawei.com/
> Signed-off-by: Zheng Qixing<zhengqixing@huawei.com>
> ---
> block/blk-sysfs.c | 4 +---
> block/blk.h | 1 +
> block/genhd.c | 2 ++
> 3 files changed, 4 insertions(+), 3 deletions(-)
LGTM, the kobject_init() is called when queue is allocated before the
fix tag.
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] block: fix kobject double initialization in add_disk
2025-08-07 7:20 [PATCH] block: fix kobject double initialization in add_disk Zheng Qixing
2025-08-07 8:42 ` Yu Kuai
@ 2025-08-07 11:47 ` Nilay Shroff
2025-08-07 13:44 ` Zheng Qixing
1 sibling, 1 reply; 8+ messages in thread
From: Nilay Shroff @ 2025-08-07 11:47 UTC (permalink / raw)
To: Zheng Qixing, axboe
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun, houtao1,
zhengqixing
On 8/7/25 12:50 PM, Zheng Qixing wrote:
> From: Zheng Qixing <zhengqixing@huawei.com>
>
> Device-mapper can call add_disk() multiple times for the same gendisk
> due to its two-phase creation process (dm create + dm load). This leads
> to kobject double initialization errors when the underlying iSCSI devices
> become temporarily unavailable and then reappear.
>
> However, if the first add_disk() call fails and is retried, the queue_kobj
> gets initialized twice, causing:
>
> kobject: kobject (ffff88810c27bb90): tried to init an initialized object,
> something is seriously wrong.
> Call Trace:
> <TASK>
> dump_stack_lvl+0x5b/0x80
> kobject_init.cold+0x43/0x51
> blk_register_queue+0x46/0x280
> add_disk_fwnode+0xb5/0x280
> dm_setup_md_queue+0x194/0x1c0
> table_load+0x297/0x2d0
> ctl_ioctl+0x2a2/0x480
> dm_ctl_ioctl+0xe/0x20
> __x64_sys_ioctl+0xc7/0x110
> do_syscall_64+0x72/0x390
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Fix this by separating kobject initialization from sysfs registration:
> - Initialize queue_kobj early during gendisk allocation
> - add_disk() only adds the already-initialized kobject to sysfs
> - del_gendisk() removes from sysfs but doesn't destroy the kobject
> - Final cleanup happens when the disk is released
>
> Fixes: 2bd85221a625 ("block: untangle request_queue refcounting from sysfs")
> Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
> Closes: https://lore.kernel.org/all/83591d0b-2467-433c-bce0-5581298eb161@huawei.com/
> Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
> ---
> block/blk-sysfs.c | 4 +---
> block/blk.h | 1 +
> block/genhd.c | 2 ++
> 3 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 396cded255ea..37d8654faff9 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -847,7 +847,7 @@ static void blk_queue_release(struct kobject *kobj)
> /* nothing to do here, all data is associated with the parent gendisk */
> }
>
> -static const struct kobj_type blk_queue_ktype = {
> +const struct kobj_type blk_queue_ktype = {
> .default_groups = blk_queue_attr_groups,
> .sysfs_ops = &queue_sysfs_ops,
> .release = blk_queue_release,
> @@ -875,7 +875,6 @@ int blk_register_queue(struct gendisk *disk)
> struct request_queue *q = disk->queue;
> int ret;
>
> - kobject_init(&disk->queue_kobj, &blk_queue_ktype);
> ret = kobject_add(&disk->queue_kobj, &disk_to_dev(disk)->kobj, "queue");
> if (ret < 0)
> goto out_put_queue_kobj;
If the kobject_add() fails here, then we jump to the label out_put_queue_kobj,
where we release/put disk->queue_kobj. That would decrement the kref of
disk->queue_kobj and possibly bring it to zero.
Next time, when we call add_disk() again without invoking kobject_init()
(because the initialization is now moved outside add_disk()), the refcount
of disk->queue_kobj — which was previously released — would now go for a
toss. Wouldn't that lead to use-after-free or inconsistent state?
> @@ -986,5 +985,4 @@ void blk_unregister_queue(struct gendisk *disk)
> elevator_set_none(q);
>
> blk_debugfs_remove(disk);
> - kobject_put(&disk->queue_kobj);
> }
I'm thinking a case where add_disk() fails after the queue is registered.
In that case, we call blk_unregister_queue() — which would ideally put()
the disk->queue_kobj.
But if we skip that put() in blk_unregister_queue() (and that's what we do
above), and then later retry add_disk(), wouldn’t kobject_add() from
blk_register_queue() complain loudly — since we’re trying to add a kobject
that was already added previously?
Thanks,
--Nilay
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] block: fix kobject double initialization in add_disk
2025-08-07 11:47 ` Nilay Shroff
@ 2025-08-07 13:44 ` Zheng Qixing
2025-08-08 0:48 ` Yu Kuai
0 siblings, 1 reply; 8+ messages in thread
From: Zheng Qixing @ 2025-08-07 13:44 UTC (permalink / raw)
To: Nilay Shroff, axboe
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun, houtao1,
zhengqixing
Hi,
在 2025/8/7 19:47, Nilay Shroff 写道:
>
> On 8/7/25 12:50 PM, Zheng Qixing wrote:
>> From: Zheng Qixing <zhengqixing@huawei.com>
>>
>> Device-mapper can call add_disk() multiple times for the same gendisk
>> due to its two-phase creation process (dm create + dm load). This leads
>> to kobject double initialization errors when the underlying iSCSI devices
>> become temporarily unavailable and then reappear.
>>
>> However, if the first add_disk() call fails and is retried, the queue_kobj
>> gets initialized twice, causing:
>>
>> kobject: kobject (ffff88810c27bb90): tried to init an initialized object,
>> something is seriously wrong.
>> Call Trace:
>> <TASK>
>> dump_stack_lvl+0x5b/0x80
>> kobject_init.cold+0x43/0x51
>> blk_register_queue+0x46/0x280
>> add_disk_fwnode+0xb5/0x280
>> dm_setup_md_queue+0x194/0x1c0
>> table_load+0x297/0x2d0
>> ctl_ioctl+0x2a2/0x480
>> dm_ctl_ioctl+0xe/0x20
>> __x64_sys_ioctl+0xc7/0x110
>> do_syscall_64+0x72/0x390
>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Fix this by separating kobject initialization from sysfs registration:
>> - Initialize queue_kobj early during gendisk allocation
>> - add_disk() only adds the already-initialized kobject to sysfs
>> - del_gendisk() removes from sysfs but doesn't destroy the kobject
>> - Final cleanup happens when the disk is released
>>
>> Fixes: 2bd85221a625 ("block: untangle request_queue refcounting from sysfs")
>> Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
>> Closes: https://lore.kernel.org/all/83591d0b-2467-433c-bce0-5581298eb161@huawei.com/
>> Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
>> ---
>> block/blk-sysfs.c | 4 +---
>> block/blk.h | 1 +
>> block/genhd.c | 2 ++
>> 3 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index 396cded255ea..37d8654faff9 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -847,7 +847,7 @@ static void blk_queue_release(struct kobject *kobj)
>> /* nothing to do here, all data is associated with the parent gendisk */
>> }
>>
>> -static const struct kobj_type blk_queue_ktype = {
>> +const struct kobj_type blk_queue_ktype = {
>> .default_groups = blk_queue_attr_groups,
>> .sysfs_ops = &queue_sysfs_ops,
>> .release = blk_queue_release,
>> @@ -875,7 +875,6 @@ int blk_register_queue(struct gendisk *disk)
>> struct request_queue *q = disk->queue;
>> int ret;
>>
>> - kobject_init(&disk->queue_kobj, &blk_queue_ktype);
>> ret = kobject_add(&disk->queue_kobj, &disk_to_dev(disk)->kobj, "queue");
>> if (ret < 0)
>> goto out_put_queue_kobj;
> If the kobject_add() fails here, then we jump to the label out_put_queue_kobj,
> where we release/put disk->queue_kobj. That would decrement the kref of
> disk->queue_kobj and possibly bring it to zero.
Since we remove the kobject_init() into alloc disk, when the
kobject_add() fails here,
it should return without kobject_del/put().
If kobject_add() succeeds but later steps fail, we should call
kobject_del() to rollback.
The current error handling with kobject_put() in blk_register_queue() is
indeed problematic.
> Next time, when we call add_disk() again without invoking kobject_init()
> (because the initialization is now moved outside add_disk()), the refcount
> of disk->queue_kobj — which was previously released — would now go for a
> toss. Wouldn't that lead to use-after-free or inconsistent state?
>
>> @@ -986,5 +985,4 @@ void blk_unregister_queue(struct gendisk *disk)
>> elevator_set_none(q);
>>
>> blk_debugfs_remove(disk);
>> - kobject_put(&disk->queue_kobj);
>> }
> I'm thinking a case where add_disk() fails after the queue is registered.
> In that case, we call blk_unregister_queue() — which would ideally put()
> the disk->queue_kobj.
> But if we skip that put() in blk_unregister_queue() (and that's what we do
> above), and then later retry add_disk(), wouldn’t kobject_add() from
> blk_register_queue() complain loudly — since we’re trying to add a kobject
> that was already added previously?
blk_unregister_queue() calls kobject_del(), then the sysfs state is
properly cleaned up
and retry should work fine.
>
> Thanks,
> --Nilay
Thanks,
Qixing
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] block: fix kobject double initialization in add_disk
2025-08-07 13:44 ` Zheng Qixing
@ 2025-08-08 0:48 ` Yu Kuai
2025-08-08 8:09 ` Nilay Shroff
0 siblings, 1 reply; 8+ messages in thread
From: Yu Kuai @ 2025-08-08 0:48 UTC (permalink / raw)
To: Zheng Qixing, Nilay Shroff, axboe
Cc: linux-block, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing, yukuai (C)
Hi,
在 2025/08/07 21:44, Zheng Qixing 写道:
> Hi,
>
>
> 在 2025/8/7 19:47, Nilay Shroff 写道:
>>
>> On 8/7/25 12:50 PM, Zheng Qixing wrote:
>>> From: Zheng Qixing <zhengqixing@huawei.com>
>>>
>>> Device-mapper can call add_disk() multiple times for the same gendisk
>>> due to its two-phase creation process (dm create + dm load). This leads
>>> to kobject double initialization errors when the underlying iSCSI
>>> devices
>>> become temporarily unavailable and then reappear.
>>>
>>> However, if the first add_disk() call fails and is retried, the
>>> queue_kobj
>>> gets initialized twice, causing:
>>>
>>> kobject: kobject (ffff88810c27bb90): tried to init an initialized
>>> object,
>>> something is seriously wrong.
>>> Call Trace:
>>> <TASK>
>>> dump_stack_lvl+0x5b/0x80
>>> kobject_init.cold+0x43/0x51
>>> blk_register_queue+0x46/0x280
>>> add_disk_fwnode+0xb5/0x280
>>> dm_setup_md_queue+0x194/0x1c0
>>> table_load+0x297/0x2d0
>>> ctl_ioctl+0x2a2/0x480
>>> dm_ctl_ioctl+0xe/0x20
>>> __x64_sys_ioctl+0xc7/0x110
>>> do_syscall_64+0x72/0x390
>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> Fix this by separating kobject initialization from sysfs registration:
>>> - Initialize queue_kobj early during gendisk allocation
>>> - add_disk() only adds the already-initialized kobject to sysfs
>>> - del_gendisk() removes from sysfs but doesn't destroy the kobject
>>> - Final cleanup happens when the disk is released
>>>
>>> Fixes: 2bd85221a625 ("block: untangle request_queue refcounting from
>>> sysfs")
>>> Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
>>> Closes:
>>> https://lore.kernel.org/all/83591d0b-2467-433c-bce0-5581298eb161@huawei.com/
>>>
>>> Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
>>> ---
>>> block/blk-sysfs.c | 4 +---
>>> block/blk.h | 1 +
>>> block/genhd.c | 2 ++
>>> 3 files changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>>> index 396cded255ea..37d8654faff9 100644
>>> --- a/block/blk-sysfs.c
>>> +++ b/block/blk-sysfs.c
>>> @@ -847,7 +847,7 @@ static void blk_queue_release(struct kobject *kobj)
>>> /* nothing to do here, all data is associated with the parent
>>> gendisk */
>>> }
>>> -static const struct kobj_type blk_queue_ktype = {
>>> +const struct kobj_type blk_queue_ktype = {
>>> .default_groups = blk_queue_attr_groups,
>>> .sysfs_ops = &queue_sysfs_ops,
>>> .release = blk_queue_release,
>>> @@ -875,7 +875,6 @@ int blk_register_queue(struct gendisk *disk)
>>> struct request_queue *q = disk->queue;
>>> int ret;
>>> - kobject_init(&disk->queue_kobj, &blk_queue_ktype);
>>> ret = kobject_add(&disk->queue_kobj, &disk_to_dev(disk)->kobj,
>>> "queue");
>>> if (ret < 0)
>>> goto out_put_queue_kobj;
>> If the kobject_add() fails here, then we jump to the label
>> out_put_queue_kobj,
>> where we release/put disk->queue_kobj. That would decrement the kref of
>> disk->queue_kobj and possibly bring it to zero.
>
>
> Since we remove the kobject_init() into alloc disk, when the
> kobject_add() fails here,
>
> it should return without kobject_del/put().
Yes, sorry I didn't noticed.
>
>
> If kobject_add() succeeds but later steps fail, we should call
> kobject_del() to rollback.
>
>
> The current error handling with kobject_put() in blk_register_queue() is
> indeed problematic.
>
>
>> Next time, when we call add_disk() again without invoking kobject_init()
>> (because the initialization is now moved outside add_disk()), the
>> refcount
>> of disk->queue_kobj — which was previously released — would now go for a
>> toss. Wouldn't that lead to use-after-free or inconsistent state?
>>
>>> @@ -986,5 +985,4 @@ void blk_unregister_queue(struct gendisk *disk)
>>> elevator_set_none(q);
>>> blk_debugfs_remove(disk);
>>> - kobject_put(&disk->queue_kobj);
>>> }
>> I'm thinking a case where add_disk() fails after the queue is registered.
>> In that case, we call blk_unregister_queue() — which would ideally put()
>> the disk->queue_kobj.
>> But if we skip that put() in blk_unregister_queue() (and that's what
>> we do
>> above), and then later retry add_disk(), wouldn’t kobject_add() from
>> blk_register_queue() complain loudly — since we’re trying to add a
>> kobject
>> that was already added previously?
This is exactly the problem reported orginally, now is the same
procedures before 2bd85221a625:
1) allocate memory: kobject_init
2) register queue: kobject_add
3) unregister queue: kobject_del
4) free memory: kobject_put
Noted that kobject_add is corresponding to kobject_del, and they don't
grab/release kobject reference. 2) and 3) can be executed multiple
times, the only thing that I noticed is the following uevent:
kobject_uevent(&disk->queue_kobj, KOBJ_ADD);
Looks like the uevent is only valid for the first one, if first add_disk
failed and then queue is registered again, there won't be uevent again,
see state_add_uevent_sent.
However, this is probably fine.
>
>
> blk_unregister_queue() calls kobject_del(), then the sysfs state is
> properly cleaned up
>
> and retry should work fine.
Thanks,
Kuai
>
>
>>
>> Thanks,
>> --Nilay
>
>
> Thanks,
>
> Qixing
>
> .
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] block: fix kobject double initialization in add_disk
2025-08-08 0:48 ` Yu Kuai
@ 2025-08-08 8:09 ` Nilay Shroff
2025-08-08 8:34 ` Yu Kuai
0 siblings, 1 reply; 8+ messages in thread
From: Nilay Shroff @ 2025-08-08 8:09 UTC (permalink / raw)
To: Yu Kuai, Zheng Qixing, axboe
Cc: linux-block, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing, yukuai (C)
On 8/8/25 6:18 AM, Yu Kuai wrote:
> Hi,
>
> 在 2025/08/07 21:44, Zheng Qixing 写道:
>> Hi,
>>
>>
>> 在 2025/8/7 19:47, Nilay Shroff 写道:
>>>
>>> On 8/7/25 12:50 PM, Zheng Qixing wrote:
>>>> From: Zheng Qixing <zhengqixing@huawei.com>
>>>>
>>>> Device-mapper can call add_disk() multiple times for the same gendisk
>>>> due to its two-phase creation process (dm create + dm load). This leads
>>>> to kobject double initialization errors when the underlying iSCSI devices
>>>> become temporarily unavailable and then reappear.
>>>>
>>>> However, if the first add_disk() call fails and is retried, the queue_kobj
>>>> gets initialized twice, causing:
>>>>
>>>> kobject: kobject (ffff88810c27bb90): tried to init an initialized object,
>>>> something is seriously wrong.
>>>> Call Trace:
>>>> <TASK>
>>>> dump_stack_lvl+0x5b/0x80
>>>> kobject_init.cold+0x43/0x51
>>>> blk_register_queue+0x46/0x280
>>>> add_disk_fwnode+0xb5/0x280
>>>> dm_setup_md_queue+0x194/0x1c0
>>>> table_load+0x297/0x2d0
>>>> ctl_ioctl+0x2a2/0x480
>>>> dm_ctl_ioctl+0xe/0x20
>>>> __x64_sys_ioctl+0xc7/0x110
>>>> do_syscall_64+0x72/0x390
>>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>
>>>> Fix this by separating kobject initialization from sysfs registration:
>>>> - Initialize queue_kobj early during gendisk allocation
>>>> - add_disk() only adds the already-initialized kobject to sysfs
>>>> - del_gendisk() removes from sysfs but doesn't destroy the kobject
>>>> - Final cleanup happens when the disk is released
>>>>
>>>> Fixes: 2bd85221a625 ("block: untangle request_queue refcounting from sysfs")
>>>> Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
>>>> Closes: https://lore.kernel.org/all/83591d0b-2467-433c-bce0-5581298eb161@huawei.com/
>>>> Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
>>>> ---
>>>> block/blk-sysfs.c | 4 +---
>>>> block/blk.h | 1 +
>>>> block/genhd.c | 2 ++
>>>> 3 files changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>>>> index 396cded255ea..37d8654faff9 100644
>>>> --- a/block/blk-sysfs.c
>>>> +++ b/block/blk-sysfs.c
>>>> @@ -847,7 +847,7 @@ static void blk_queue_release(struct kobject *kobj)
>>>> /* nothing to do here, all data is associated with the parent gendisk */
>>>> }
>>>> -static const struct kobj_type blk_queue_ktype = {
>>>> +const struct kobj_type blk_queue_ktype = {
>>>> .default_groups = blk_queue_attr_groups,
>>>> .sysfs_ops = &queue_sysfs_ops,
>>>> .release = blk_queue_release,
>>>> @@ -875,7 +875,6 @@ int blk_register_queue(struct gendisk *disk)
>>>> struct request_queue *q = disk->queue;
>>>> int ret;
>>>> - kobject_init(&disk->queue_kobj, &blk_queue_ktype);
>>>> ret = kobject_add(&disk->queue_kobj, &disk_to_dev(disk)->kobj, "queue");
>>>> if (ret < 0)
>>>> goto out_put_queue_kobj;
>>> If the kobject_add() fails here, then we jump to the label out_put_queue_kobj,
>>> where we release/put disk->queue_kobj. That would decrement the kref of
>>> disk->queue_kobj and possibly bring it to zero.
>>
>>
>> Since we remove the kobject_init() into alloc disk, when the kobject_add() fails here,
>>
>> it should return without kobject_del/put().
>
> Yes, sorry I didn't noticed.
>>
>>
>> If kobject_add() succeeds but later steps fail, we should call kobject_del() to rollback.
>>
>>
>> The current error handling with kobject_put() in blk_register_queue() is indeed problematic.
>>
>>
>>> Next time, when we call add_disk() again without invoking kobject_init()
>>> (because the initialization is now moved outside add_disk()), the refcount
>>> of disk->queue_kobj — which was previously released — would now go for a
>>> toss. Wouldn't that lead to use-after-free or inconsistent state?
>>>
>>>> @@ -986,5 +985,4 @@ void blk_unregister_queue(struct gendisk *disk)
>>>> elevator_set_none(q);
>>>> blk_debugfs_remove(disk);
>>>> - kobject_put(&disk->queue_kobj);
>>>> }
>>> I'm thinking a case where add_disk() fails after the queue is registered.
>>> In that case, we call blk_unregister_queue() — which would ideally put()
>>> the disk->queue_kobj.
>>> But if we skip that put() in blk_unregister_queue() (and that's what we do
>>> above), and then later retry add_disk(), wouldn’t kobject_add() from
>>> blk_register_queue() complain loudly — since we’re trying to add a kobject
>>> that was already added previously?
>
> This is exactly the problem reported orginally, now is the same
> procedures before 2bd85221a625:
>
> 1) allocate memory: kobject_init
> 2) register queue: kobject_add
> 3) unregister queue: kobject_del
> 4) free memory: kobject_put
>
> Noted that kobject_add is corresponding to kobject_del, and they don't
> grab/release kobject reference. 2) and 3) can be executed multiple
> times, the only thing that I noticed is the following uevent:
>
> kobject_uevent(&disk->queue_kobj, KOBJ_ADD);
>
> Looks like the uevent is only valid for the first one, if first add_disk
> failed and then queue is registered again, there won't be uevent again,
> see state_add_uevent_sent.
>
Why do you think so? We always send the "add" uevent when we register
queue and then send "remove" uevent when the queue is unregistered. This
behavior should remain intact with this change. The kobj->state_add_uevent_sent
is only used for sending "remove" event during automatic cleanup of kobject.
It shouldn't have any side effect while we re-register queue or retry
add_disk(). Or am I missing something here?
Thanks,
--Nilay
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] block: fix kobject double initialization in add_disk
2025-08-08 8:09 ` Nilay Shroff
@ 2025-08-08 8:34 ` Yu Kuai
2025-08-08 9:42 ` Nilay Shroff
0 siblings, 1 reply; 8+ messages in thread
From: Yu Kuai @ 2025-08-08 8:34 UTC (permalink / raw)
To: Nilay Shroff, Yu Kuai, Zheng Qixing, axboe
Cc: linux-block, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing, yukuai (C)
Hi,
在 2025/08/08 16:09, Nilay Shroff 写道:
>
>
> On 8/8/25 6:18 AM, Yu Kuai wrote:
>> Hi,
>>
>> 在 2025/08/07 21:44, Zheng Qixing 写道:
>>> Hi,
>>>
>>>
>>> 在 2025/8/7 19:47, Nilay Shroff 写道:
>>>>
>>>> On 8/7/25 12:50 PM, Zheng Qixing wrote:
>>>>> From: Zheng Qixing <zhengqixing@huawei.com>
>>>>>
>>>>> Device-mapper can call add_disk() multiple times for the same gendisk
>>>>> due to its two-phase creation process (dm create + dm load). This leads
>>>>> to kobject double initialization errors when the underlying iSCSI devices
>>>>> become temporarily unavailable and then reappear.
>>>>>
>>>>> However, if the first add_disk() call fails and is retried, the queue_kobj
>>>>> gets initialized twice, causing:
>>>>>
>>>>> kobject: kobject (ffff88810c27bb90): tried to init an initialized object,
>>>>> something is seriously wrong.
>>>>> Call Trace:
>>>>> <TASK>
>>>>> dump_stack_lvl+0x5b/0x80
>>>>> kobject_init.cold+0x43/0x51
>>>>> blk_register_queue+0x46/0x280
>>>>> add_disk_fwnode+0xb5/0x280
>>>>> dm_setup_md_queue+0x194/0x1c0
>>>>> table_load+0x297/0x2d0
>>>>> ctl_ioctl+0x2a2/0x480
>>>>> dm_ctl_ioctl+0xe/0x20
>>>>> __x64_sys_ioctl+0xc7/0x110
>>>>> do_syscall_64+0x72/0x390
>>>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>
>>>>> Fix this by separating kobject initialization from sysfs registration:
>>>>> - Initialize queue_kobj early during gendisk allocation
>>>>> - add_disk() only adds the already-initialized kobject to sysfs
>>>>> - del_gendisk() removes from sysfs but doesn't destroy the kobject
>>>>> - Final cleanup happens when the disk is released
>>>>>
>>>>> Fixes: 2bd85221a625 ("block: untangle request_queue refcounting from sysfs")
>>>>> Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
>>>>> Closes: https://lore.kernel.org/all/83591d0b-2467-433c-bce0-5581298eb161@huawei.com/
>>>>> Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
>>>>> ---
>>>>> block/blk-sysfs.c | 4 +---
>>>>> block/blk.h | 1 +
>>>>> block/genhd.c | 2 ++
>>>>> 3 files changed, 4 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>>>>> index 396cded255ea..37d8654faff9 100644
>>>>> --- a/block/blk-sysfs.c
>>>>> +++ b/block/blk-sysfs.c
>>>>> @@ -847,7 +847,7 @@ static void blk_queue_release(struct kobject *kobj)
>>>>> /* nothing to do here, all data is associated with the parent gendisk */
>>>>> }
>>>>> -static const struct kobj_type blk_queue_ktype = {
>>>>> +const struct kobj_type blk_queue_ktype = {
>>>>> .default_groups = blk_queue_attr_groups,
>>>>> .sysfs_ops = &queue_sysfs_ops,
>>>>> .release = blk_queue_release,
>>>>> @@ -875,7 +875,6 @@ int blk_register_queue(struct gendisk *disk)
>>>>> struct request_queue *q = disk->queue;
>>>>> int ret;
>>>>> - kobject_init(&disk->queue_kobj, &blk_queue_ktype);
>>>>> ret = kobject_add(&disk->queue_kobj, &disk_to_dev(disk)->kobj, "queue");
>>>>> if (ret < 0)
>>>>> goto out_put_queue_kobj;
>>>> If the kobject_add() fails here, then we jump to the label out_put_queue_kobj,
>>>> where we release/put disk->queue_kobj. That would decrement the kref of
>>>> disk->queue_kobj and possibly bring it to zero.
>>>
>>>
>>> Since we remove the kobject_init() into alloc disk, when the kobject_add() fails here,
>>>
>>> it should return without kobject_del/put().
>>
>> Yes, sorry I didn't noticed.
>>>
>>>
>>> If kobject_add() succeeds but later steps fail, we should call kobject_del() to rollback.
>>>
>>>
>>> The current error handling with kobject_put() in blk_register_queue() is indeed problematic.
>>>
>>>
>>>> Next time, when we call add_disk() again without invoking kobject_init()
>>>> (because the initialization is now moved outside add_disk()), the refcount
>>>> of disk->queue_kobj — which was previously released — would now go for a
>>>> toss. Wouldn't that lead to use-after-free or inconsistent state?
>>>>
>>>>> @@ -986,5 +985,4 @@ void blk_unregister_queue(struct gendisk *disk)
>>>>> elevator_set_none(q);
>>>>> blk_debugfs_remove(disk);
>>>>> - kobject_put(&disk->queue_kobj);
>>>>> }
>>>> I'm thinking a case where add_disk() fails after the queue is registered.
>>>> In that case, we call blk_unregister_queue() — which would ideally put()
>>>> the disk->queue_kobj.
>>>> But if we skip that put() in blk_unregister_queue() (and that's what we do
>>>> above), and then later retry add_disk(), wouldn’t kobject_add() from
>>>> blk_register_queue() complain loudly — since we’re trying to add a kobject
>>>> that was already added previously?
>>
>> This is exactly the problem reported orginally, now is the same
>> procedures before 2bd85221a625:
>>
>> 1) allocate memory: kobject_init
>> 2) register queue: kobject_add
>> 3) unregister queue: kobject_del
>> 4) free memory: kobject_put
>>
>> Noted that kobject_add is corresponding to kobject_del, and they don't
>> grab/release kobject reference. 2) and 3) can be executed multiple
>> times, the only thing that I noticed is the following uevent:
>>
>> kobject_uevent(&disk->queue_kobj, KOBJ_ADD);
>>
>> Looks like the uevent is only valid for the first one, if first add_disk
>> failed and then queue is registered again, there won't be uevent again,
>> see state_add_uevent_sent.
>>
> Why do you think so? We always send the "add" uevent when we register
> queue and then send "remove" uevent when the queue is unregistered. This
> behavior should remain intact with this change. The kobj->state_add_uevent_sent
> is only used for sending "remove" event during automatic cleanup of kobject.
> It shouldn't have any side effect while we re-register queue or retry
> add_disk(). Or am I missing something here?
Yes, turns out I misread the code after I tested and found that user
didn't get the queue uevent. Just dig deeper and realize this is due
to dev_uevent_filter() from the top devices_kset stop the queue kobj
uevent.
So, if I don't misread the code now, looks like the kobject_uevent()
call to queue_kobj is useless.
Thanks,
Kuai
>
> Thanks,
> --Nilay
> .
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] block: fix kobject double initialization in add_disk
2025-08-08 8:34 ` Yu Kuai
@ 2025-08-08 9:42 ` Nilay Shroff
0 siblings, 0 replies; 8+ messages in thread
From: Nilay Shroff @ 2025-08-08 9:42 UTC (permalink / raw)
To: Yu Kuai, Zheng Qixing, axboe
Cc: linux-block, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing, yukuai (C)
On 8/8/25 2:04 PM, Yu Kuai wrote:
> Hi,
>
> 在 2025/08/08 16:09, Nilay Shroff 写道:
>>
>>
>> On 8/8/25 6:18 AM, Yu Kuai wrote:
>>> Hi,
>>>
>>> 在 2025/08/07 21:44, Zheng Qixing 写道:
>>>> Hi,
>>>>
>>>>
>>>> 在 2025/8/7 19:47, Nilay Shroff 写道:
>>>>>
>>>>> On 8/7/25 12:50 PM, Zheng Qixing wrote:
>>>>>> From: Zheng Qixing <zhengqixing@huawei.com>
>>>>>>
>>>>>> Device-mapper can call add_disk() multiple times for the same gendisk
>>>>>> due to its two-phase creation process (dm create + dm load). This leads
>>>>>> to kobject double initialization errors when the underlying iSCSI devices
>>>>>> become temporarily unavailable and then reappear.
>>>>>>
>>>>>> However, if the first add_disk() call fails and is retried, the queue_kobj
>>>>>> gets initialized twice, causing:
>>>>>>
>>>>>> kobject: kobject (ffff88810c27bb90): tried to init an initialized object,
>>>>>> something is seriously wrong.
>>>>>> Call Trace:
>>>>>> <TASK>
>>>>>> dump_stack_lvl+0x5b/0x80
>>>>>> kobject_init.cold+0x43/0x51
>>>>>> blk_register_queue+0x46/0x280
>>>>>> add_disk_fwnode+0xb5/0x280
>>>>>> dm_setup_md_queue+0x194/0x1c0
>>>>>> table_load+0x297/0x2d0
>>>>>> ctl_ioctl+0x2a2/0x480
>>>>>> dm_ctl_ioctl+0xe/0x20
>>>>>> __x64_sys_ioctl+0xc7/0x110
>>>>>> do_syscall_64+0x72/0x390
>>>>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>>
>>>>>> Fix this by separating kobject initialization from sysfs registration:
>>>>>> - Initialize queue_kobj early during gendisk allocation
>>>>>> - add_disk() only adds the already-initialized kobject to sysfs
>>>>>> - del_gendisk() removes from sysfs but doesn't destroy the kobject
>>>>>> - Final cleanup happens when the disk is released
>>>>>>
>>>>>> Fixes: 2bd85221a625 ("block: untangle request_queue refcounting from sysfs")
>>>>>> Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
>>>>>> Closes: https://lore.kernel.org/all/83591d0b-2467-433c-bce0-5581298eb161@huawei.com/
>>>>>> Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
>>>>>> ---
>>>>>> block/blk-sysfs.c | 4 +---
>>>>>> block/blk.h | 1 +
>>>>>> block/genhd.c | 2 ++
>>>>>> 3 files changed, 4 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>>>>>> index 396cded255ea..37d8654faff9 100644
>>>>>> --- a/block/blk-sysfs.c
>>>>>> +++ b/block/blk-sysfs.c
>>>>>> @@ -847,7 +847,7 @@ static void blk_queue_release(struct kobject *kobj)
>>>>>> /* nothing to do here, all data is associated with the parent gendisk */
>>>>>> }
>>>>>> -static const struct kobj_type blk_queue_ktype = {
>>>>>> +const struct kobj_type blk_queue_ktype = {
>>>>>> .default_groups = blk_queue_attr_groups,
>>>>>> .sysfs_ops = &queue_sysfs_ops,
>>>>>> .release = blk_queue_release,
>>>>>> @@ -875,7 +875,6 @@ int blk_register_queue(struct gendisk *disk)
>>>>>> struct request_queue *q = disk->queue;
>>>>>> int ret;
>>>>>> - kobject_init(&disk->queue_kobj, &blk_queue_ktype);
>>>>>> ret = kobject_add(&disk->queue_kobj, &disk_to_dev(disk)->kobj, "queue");
>>>>>> if (ret < 0)
>>>>>> goto out_put_queue_kobj;
>>>>> If the kobject_add() fails here, then we jump to the label out_put_queue_kobj,
>>>>> where we release/put disk->queue_kobj. That would decrement the kref of
>>>>> disk->queue_kobj and possibly bring it to zero.
>>>>
>>>>
>>>> Since we remove the kobject_init() into alloc disk, when the kobject_add() fails here,
>>>>
>>>> it should return without kobject_del/put().
>>>
>>> Yes, sorry I didn't noticed.
>>>>
>>>>
>>>> If kobject_add() succeeds but later steps fail, we should call kobject_del() to rollback.
>>>>
>>>>
>>>> The current error handling with kobject_put() in blk_register_queue() is indeed problematic.
>>>>
>>>>
>>>>> Next time, when we call add_disk() again without invoking kobject_init()
>>>>> (because the initialization is now moved outside add_disk()), the refcount
>>>>> of disk->queue_kobj — which was previously released — would now go for a
>>>>> toss. Wouldn't that lead to use-after-free or inconsistent state?
>>>>>
>>>>>> @@ -986,5 +985,4 @@ void blk_unregister_queue(struct gendisk *disk)
>>>>>> elevator_set_none(q);
>>>>>> blk_debugfs_remove(disk);
>>>>>> - kobject_put(&disk->queue_kobj);
>>>>>> }
>>>>> I'm thinking a case where add_disk() fails after the queue is registered.
>>>>> In that case, we call blk_unregister_queue() — which would ideally put()
>>>>> the disk->queue_kobj.
>>>>> But if we skip that put() in blk_unregister_queue() (and that's what we do
>>>>> above), and then later retry add_disk(), wouldn’t kobject_add() from
>>>>> blk_register_queue() complain loudly — since we’re trying to add a kobject
>>>>> that was already added previously?
>>>
>>> This is exactly the problem reported orginally, now is the same
>>> procedures before 2bd85221a625:
>>>
>>> 1) allocate memory: kobject_init
>>> 2) register queue: kobject_add
>>> 3) unregister queue: kobject_del
>>> 4) free memory: kobject_put
>>>
>>> Noted that kobject_add is corresponding to kobject_del, and they don't
>>> grab/release kobject reference. 2) and 3) can be executed multiple
>>> times, the only thing that I noticed is the following uevent:
>>>
>>> kobject_uevent(&disk->queue_kobj, KOBJ_ADD);
>>>
>>> Looks like the uevent is only valid for the first one, if first add_disk
>>> failed and then queue is registered again, there won't be uevent again,
>>> see state_add_uevent_sent.
>>>
>> Why do you think so? We always send the "add" uevent when we register
>> queue and then send "remove" uevent when the queue is unregistered. This
>> behavior should remain intact with this change. The kobj->state_add_uevent_sent
>> is only used for sending "remove" event during automatic cleanup of kobject.
>> It shouldn't have any side effect while we re-register queue or retry
>> add_disk(). Or am I missing something here?
>
> Yes, turns out I misread the code after I tested and found that user
> didn't get the queue uevent. Just dig deeper and realize this is due
> to dev_uevent_filter() from the top devices_kset stop the queue kobj
> uevent.
>
> So, if I don't misread the code now, looks like the kobject_uevent()
> call to queue_kobj is useless.
>
Yes that's correct. And so this patch doesn't alter the queue_kobj uevent behavior and so we're good here.
Thanks,
--Nilay
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-08-08 9:43 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-07 7:20 [PATCH] block: fix kobject double initialization in add_disk Zheng Qixing
2025-08-07 8:42 ` Yu Kuai
2025-08-07 11:47 ` Nilay Shroff
2025-08-07 13:44 ` Zheng Qixing
2025-08-08 0:48 ` Yu Kuai
2025-08-08 8:09 ` Nilay Shroff
2025-08-08 8:34 ` Yu Kuai
2025-08-08 9:42 ` Nilay Shroff
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).