public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/2] block: bugfix for bd_link_disk_holder()
@ 2022-11-03  2:55 Yu Kuai
  2022-11-03  2:55 ` [PATCH 1/2] block: don't allow a disk link holder to itself Yu Kuai
  2022-11-03  2:55 ` [PATCH 2/2] block: fix use after free for bd_holder_dir Yu Kuai
  0 siblings, 2 replies; 8+ messages in thread
From: Yu Kuai @ 2022-11-03  2:55 UTC (permalink / raw)
  To: hch, axboe; +Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Yu Kuai (2):
  block: don't allow a disk link holder to itself
  block: fix use after free for bd_holder_dir

 block/holder.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] block: don't allow a disk link holder to itself
  2022-11-03  2:55 [PATCH 0/2] block: bugfix for bd_link_disk_holder() Yu Kuai
@ 2022-11-03  2:55 ` Yu Kuai
  2022-11-03  8:06   ` Christoph Hellwig
  2022-11-03  2:55 ` [PATCH 2/2] block: fix use after free for bd_holder_dir Yu Kuai
  1 sibling, 1 reply; 8+ messages in thread
From: Yu Kuai @ 2022-11-03  2:55 UTC (permalink / raw)
  To: hch, axboe; +Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Test procedures:

1) dmsetup create test --table "xxx sda", assume dm-0 is created
2) dmsetup suspend test
3) dmsetup reload test --table "xxx dm-0"
4) dmsetup resume test
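
For illustration, a concrete instance of this procedure might be (the
linear target and sector count are assumptions standing in for the
elided "xxx"):

  dmsetup create test --table "0 2097152 linear /dev/sda 0"   # dm-0 is created
  dmsetup suspend test
  dmsetup reload test --table "0 2097152 linear /dev/dm-0 0"  # table now references dm-0 itself
  dmsetup resume test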

Test result:

BUG: TASK stack guard page was hit at 00000000736a261f (stack is 000000008d12c88d..00000000c8dd82d5)
stack guard page: 0000 [#1] PREEMPT SMP
CPU: 29 PID: 946 Comm: systemd-udevd Not tainted 6.1.0-rc3-next-20221101-00006-g17640ca3b0ee #1295
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
RIP: 0010:dm_prepare_ioctl+0xf/0x1e0
Code: da 48 83 05 4a 7c 99 0b 01 41 89 c4 eb cd e8 b8 1f 40 00 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 48 83 05 a1 5a 99 0b 01 <41> 56 49 89 d6 41 55 4c 8d af 90 02 00 00 9
RSP: 0018:ffffc90002090000 EFLAGS: 00010206
RAX: ffff8881049d6800 RBX: ffff88817e589000 RCX: 0000000000000000
RDX: ffffc90002090010 RSI: ffffc9000209001c RDI: ffff88817e589000
RBP: 00000000484a101d R08: 0000000000000000 R09: 0000000000000007
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000005331
R13: 0000000000005331 R14: 0000000000000000 R15: 0000000000000000
FS:  00007fddf9609200(0000) GS:ffff889fbfd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000208fff8 CR3: 0000000179043000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 dm_blk_ioctl+0x50/0x1c0
 ? dm_prepare_ioctl+0xe0/0x1e0
 dm_blk_ioctl+0x88/0x1c0
 dm_blk_ioctl+0x88/0x1c0
 ...... (many identical lines omitted)
 dm_blk_ioctl+0x88/0x1c0
 dm_blk_ioctl+0x88/0x1c0
 blkdev_ioctl+0x184/0x3e0
 __x64_sys_ioctl+0xa3/0x110
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fddf7306577
Code: b3 66 90 48 8b 05 11 89 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 88 8
RSP: 002b:00007ffd0b2ec318 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00005634ef478320 RCX: 00007fddf7306577
RDX: 0000000000000000 RSI: 0000000000005331 RDI: 0000000000000007
RBP: 0000000000000007 R08: 00005634ef4843e0 R09: 0000000000000080
R10: 00007fddf75cfb38 R11: 0000000000000246 R12: 00000000030d4000
R13: 0000000000000000 R14: 0000000000000000 R15: 00005634ef48b800
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:dm_prepare_ioctl+0xf/0x1e0
Code: da 48 83 05 4a 7c 99 0b 01 41 89 c4 eb cd e8 b8 1f 40 00 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 48 83 05 a1 5a 99 0b 01 <41> 56 49 89 d6 41 55 4c 8d af 90 02 00 00 9
RSP: 0018:ffffc90002090000 EFLAGS: 00010206
RAX: ffff8881049d6800 RBX: ffff88817e589000 RCX: 0000000000000000
RDX: ffffc90002090010 RSI: ffffc9000209001c RDI: ffff88817e589000
RBP: 00000000484a101d R08: 0000000000000000 R09: 0000000000000007
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000005331
R13: 0000000000005331 R14: 0000000000000000 R15: 0000000000000000
FS:  00007fddf9609200(0000) GS:ffff889fbfd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000208fff8 CR3: 0000000179043000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

After creating a dm device, the user can reload the dm device with itself
as a table target. Any subsequent ioctl then triggers infinite recursion,
because dm keeps forwarding the request to itself (as the stack trace
above shows).

Fix the problem by forbidding a disk from creating a link to itself.
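
With this check in place, step 3) of the procedure above now fails at
table load instead of installing a self-referencing map; a sketch of the
expected interaction (the exact error text is an assumption):

  $ dmsetup reload test --table "0 2097152 linear /dev/dm-0 0"
  device-mapper: reload ioctl on test failed: Invalid argument
  Command failed.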

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/holder.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/holder.c b/block/holder.c
index 5283bc804cc1..5fc68238ce3a 100644
--- a/block/holder.c
+++ b/block/holder.c
@@ -75,6 +75,9 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 	struct bd_holder_disk *holder;
 	int ret = 0;
 
+	if (bdev->bd_disk == disk)
+		return -EINVAL;
+
 	mutex_lock(&disk->open_mutex);
 
 	WARN_ON_ONCE(!bdev->bd_holder);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] block: fix use after free for bd_holder_dir
  2022-11-03  2:55 [PATCH 0/2] block: bugfix for bd_link_disk_holder() Yu Kuai
  2022-11-03  2:55 ` [PATCH 1/2] block: don't allow a disk link holder to itself Yu Kuai
@ 2022-11-03  2:55 ` Yu Kuai
  2022-11-03  8:12   ` Christoph Hellwig
  1 sibling, 1 reply; 8+ messages in thread
From: Yu Kuai @ 2022-11-03  2:55 UTC (permalink / raw)
  To: hch, axboe; +Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Currently, the caller of bd_link_disk_holder() gets 'bdev' from
blkdev_get_by_dev(), which looks up 'bdev' by the device number 'dev'.
However, it's possible that del_gendisk() is called concurrently, and
'bd_holder_dir' can be freed before bd_link_disk_holder() accesses it,
thus a use-after-free is triggered.

t1:				t2:
bdev = blkdev_get_by_dev
				del_gendisk
				 kobject_put(bd_holder_dir)
				  kobject_free()
bd_link_disk_holder

Fix the problem by checking that the disk is still live and grabbing a
reference to 'bd_holder_dir' first in bd_link_disk_holder().
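
A hypothetical way to hit this race from userspace is to delete the
underlying SCSI disk (a disposable test disk; 'sda' is an assumption)
while a dm table referencing it is being loaded; without an injected
delay the window is tiny, so this rarely fires as-is:

  echo 1 > /sys/block/sda/device/delete &
  dmsetup create test --table "0 2097152 linear /dev/sda 0"
  wait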

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/holder.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/block/holder.c b/block/holder.c
index 5fc68238ce3a..1c6c5b132a92 100644
--- a/block/holder.c
+++ b/block/holder.c
@@ -78,19 +78,32 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 	if (bdev->bd_disk == disk)
 		return -EINVAL;
 
-	mutex_lock(&disk->open_mutex);
+	/*
+	 * del_gendisk drops the initial reference to bd_holder_dir, so we
+	 * need to keep our own here to allow for cleanup past that point.
+	 */
+	mutex_lock(&bdev->bd_disk->open_mutex);
+	if (!disk_live(bdev->bd_disk)) {
+		mutex_unlock(&bdev->bd_disk->open_mutex);
+		return -ENODEV;
+	}
 
-	WARN_ON_ONCE(!bdev->bd_holder);
+	kobject_get(bdev->bd_holder_dir);
+	mutex_unlock(&bdev->bd_disk->open_mutex);
 
+	mutex_lock(&disk->open_mutex);
+	WARN_ON_ONCE(!bdev->bd_holder);
 	holder = bd_find_holder_disk(bdev, disk);
 	if (holder) {
 		holder->refcnt++;
+		kobject_put(bdev->bd_holder_dir);
 		goto out_unlock;
 	}
 
 	holder = kzalloc(sizeof(*holder), GFP_KERNEL);
 	if (!holder) {
 		ret = -ENOMEM;
+		kobject_put(bdev->bd_holder_dir);
 		goto out_unlock;
 	}
 
@@ -101,16 +114,12 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 		ret = __link_disk_holder(bdev, disk);
 		if (ret) {
 			kfree(holder);
+			kobject_put(bdev->bd_holder_dir);
 			goto out_unlock;
 		}
 	}
 
 	list_add(&holder->list, &disk->slave_bdevs);
-	/*
-	 * del_gendisk drops the initial reference to bd_holder_dir, so we need
-	 * to keep our own here to allow for cleanup past that point.
-	 */
-	kobject_get(bdev->bd_holder_dir);
 
 out_unlock:
 	mutex_unlock(&disk->open_mutex);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] block: don't allow a disk link holder to itself
  2022-11-03  2:55 ` [PATCH 1/2] block: don't allow a disk link holder to itself Yu Kuai
@ 2022-11-03  8:06   ` Christoph Hellwig
  2022-11-03  9:17     ` Yu Kuai
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2022-11-03  8:06 UTC (permalink / raw)
  To: Yu Kuai; +Cc: hch, axboe, linux-block, linux-kernel, yukuai3, yi.zhang

Yes, this is a good one.  Please add a single-sentence blurb explaining
the issue before the Test procedure?

With that this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

On Thu, Nov 03, 2022 at 10:55:40AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Test procedures:
> 
> 1) dmsetup create test --table "xxx sda", assume dm-0 is created
> 2) dmsetup suspend test
> 3) dmsetup reload test --table "xxx dm-0"
> 4) dmsetup resume test

Can you wire this up for blktests?
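
A minimal blktests-style sketch of such a test (the dm test group,
helper names, and loop-device setup follow blktests conventions but are
assumptions here):

  #!/bin/bash
  # SPDX-License-Identifier: GPL-3.0+
  #
  # Regression test: reloading a dm device with a table that references
  # itself must be rejected instead of recursing until the stack overflows.

  . tests/dm/rc

  DESCRIPTION="reload dm device with a self-referencing table"
  QUICK=1

  requires() {
  	_have_program dmsetup
  }

  test() {
  	echo "Running ${TEST_NAME}"

  	# Back the dm device with a scratch loop device so the test is
  	# self-contained.
  	local img="$TMPDIR/self.img" loop
  	truncate -s 64M "$img"
  	loop="$(losetup -f --show "$img")"

  	dmsetup create blktests-self --table "0 131072 linear $loop 0"
  	dmsetup suspend blktests-self
  	# Must be rejected (-EINVAL); before the fix this installed a
  	# self-referencing table and later ioctls recursed.
  	if dmsetup reload blktests-self \
  		--table "0 131072 linear /dev/mapper/blktests-self 0" \
  		&>/dev/null; then
  		echo "self-referencing reload unexpectedly succeeded"
  	fi
  	dmsetup resume blktests-self
  	dmsetup remove blktests-self
  	losetup -d "$loop"
  	rm -f "$img"

  	echo "Test complete"
  }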

I've also been wondering how we could wire your other two test cases
up as I think they'd be very useful.  Hopefully I can find some time
to inject delays with eBPF or something like it.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] block: fix use after free for bd_holder_dir
  2022-11-03  2:55 ` [PATCH 2/2] block: fix use after free for bd_holder_dir Yu Kuai
@ 2022-11-03  8:12   ` Christoph Hellwig
  2022-11-03  9:45     ` Yu Kuai
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2022-11-03  8:12 UTC (permalink / raw)
  To: Yu Kuai; +Cc: hch, axboe, linux-block, linux-kernel, yukuai3, yi.zhang

On Thu, Nov 03, 2022 at 10:55:41AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Currently, the caller of bd_link_disk_holder() gets 'bdev' from
> blkdev_get_by_dev(), which looks up 'bdev' by the device number 'dev'.
> However, it's possible that del_gendisk() is called concurrently, and
> 'bd_holder_dir' can be freed before bd_link_disk_holder() accesses it,
> thus a use-after-free is triggered.
> 
> t1:				t2:
> bdev = blkdev_get_by_dev
> 				del_gendisk
> 				 kobject_put(bd_holder_dir)
> 				  kobject_free()
> bd_link_disk_holder
> 
> Fix the problem by checking that the disk is still live and grabbing a
> reference to 'bd_holder_dir' first in bd_link_disk_holder().

Looks good with some minor stylistic nitpicks:

> +	if (!disk_live(bdev->bd_disk)) {
> +		mutex_unlock(&bdev->bd_disk->open_mutex);
> +		return -ENODEV;
> +	}

This can use a goto out_unlock;

>  		holder->refcnt++;
> +		kobject_put(bdev->bd_holder_dir);
>  		goto out_unlock;
>  	}
>  
>  	holder = kzalloc(sizeof(*holder), GFP_KERNEL);
>  	if (!holder) {
>  		ret = -ENOMEM;
> +		kobject_put(bdev->bd_holder_dir);
>  		goto out_unlock;
>  	}
>  
> @@ -101,16 +114,12 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
>  		ret = __link_disk_holder(bdev, disk);
>  		if (ret) {
>  			kfree(holder);
> +			kobject_put(bdev->bd_holder_dir);

And I think a goto out_put_holder and out_free_holder would clean this up
nicely.

>  	list_add(&holder->list, &disk->slave_bdevs);
> -	/*
> -	 * del_gendisk drops the initial reference to bd_holder_dir, so we need
> -	 * to keep our own here to allow for cleanup past that point.
> -	 */
> -	kobject_get(bdev->bd_holder_dir);

.. with this then jumping straight to out_unlock.


We should repost a series with my first 7 patches and your two.  I can do
that, but it might take some time as I just got through (minor) knee
surgery and am still at the hospital, so if you have spare cycles feel
free to do it.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] block: don't allow a disk link holder to itself
  2022-11-03  8:06   ` Christoph Hellwig
@ 2022-11-03  9:17     ` Yu Kuai
  0 siblings, 0 replies; 8+ messages in thread
From: Yu Kuai @ 2022-11-03  9:17 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: axboe, linux-block, linux-kernel, yi.zhang, yukuai (C)

Hi,

On 2022/11/03 16:06, Christoph Hellwig wrote:
> Yes, this is a good one.  Please add a single-sentence blurb explaining
> the issue before the Test procedure?
Ok, I'll do that.
> 
> With that this looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> On Thu, Nov 03, 2022 at 10:55:40AM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Test procedures:
>>
>> 1) dmsetup create test --table "xxx sda", assume dm-0 is created
>> 2) dmsetup suspend test
>> 3) dmsetup reload test --table "xxx dm-0"
>> 4) dmsetup resume test
> 
> Can you wire this up for blktests?

Of course I can. I'm not very familiar with how to add a new test to
blktests, so it might take me some time.
> 
> I've also been wondering how we could wire your other two test cases
> up as I think they'd be very useful.  Hopefully I can find some time
> to inject delays with eBPF or something like it.

The problem was originally found without the delay; the delay just makes
sure it reproduces 100% of the time, but it can still trigger without it.
Perhaps we can add the test without the delay for now.

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] block: fix use after free for bd_holder_dir
  2022-11-03  8:12   ` Christoph Hellwig
@ 2022-11-03  9:45     ` Yu Kuai
  2022-11-03  9:57       ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Yu Kuai @ 2022-11-03  9:45 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: axboe, linux-block, linux-kernel, yi.zhang, yukuai (C)

Hi,

On 2022/11/03 16:12, Christoph Hellwig wrote:
> On Thu, Nov 03, 2022 at 10:55:41AM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Currently, the caller of bd_link_disk_holder() gets 'bdev' from
>> blkdev_get_by_dev(), which looks up 'bdev' by the device number 'dev'.
>> However, it's possible that del_gendisk() is called concurrently, and
>> 'bd_holder_dir' can be freed before bd_link_disk_holder() accesses it,
>> thus a use-after-free is triggered.
>>
>> t1:				t2:
>> bdev = blkdev_get_by_dev
>> 				del_gendisk
>> 				 kobject_put(bd_holder_dir)
>> 				  kobject_free()
>> bd_link_disk_holder
>>
>> Fix the problem by checking that the disk is still live and grabbing a
>> reference to 'bd_holder_dir' first in bd_link_disk_holder().
> 
> Looks good with some minor stylistic nitpicks:
> 
>> +	if (!disk_live(bdev->bd_disk)) {
>> +		mutex_unlock(&bdev->bd_disk->open_mutex);
>> +		return -ENODEV;
>> +	}
> 
> This can use a goto out_unlock;

This lock is different from the current 'out_unlock'; adding a new label
would make the code more complex, I think.
> 
>>   		holder->refcnt++;
>> +		kobject_put(bdev->bd_holder_dir);
>>   		goto out_unlock;
>>   	}
>>   
>>   	holder = kzalloc(sizeof(*holder), GFP_KERNEL);
>>   	if (!holder) {
>>   		ret = -ENOMEM;
>> +		kobject_put(bdev->bd_holder_dir);
>>   		goto out_unlock;
>>   	}
>>   
>> @@ -101,16 +114,12 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
>>   		ret = __link_disk_holder(bdev, disk);
>>   		if (ret) {
>>   			kfree(holder);
>> +			kobject_put(bdev->bd_holder_dir);
> 
> And I think a goto out_put_holder and out_free_holder would clean this up
> nicely.

Yes, you're right.
> 
>>   	list_add(&holder->list, &disk->slave_bdevs);
>> -	/*
>> -	 * del_gendisk drops the initial reference to bd_holder_dir, so we need
>> -	 * to keep our own here to allow for cleanup past that point.
>> -	 */
>> -	kobject_get(bdev->bd_holder_dir);
> 
> .. with this then jumping straight to out_unlock.

Ok, I'll do that in next version.
> 
> 
> We should repost a series with my first 7 patches and your two.  I can do
> that, but it might take some time as I just got through (minor) knee
> surgery and am still at the hospital, so if you have spare cycles feel
> free to do it.

I'm glad to do that, and have a good rest 😄

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] block: fix use after free for bd_holder_dir
  2022-11-03  9:45     ` Yu Kuai
@ 2022-11-03  9:57       ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2022-11-03  9:57 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, axboe, linux-block, linux-kernel, yi.zhang,
	yukuai (C)

On Thu, Nov 03, 2022 at 05:45:25PM +0800, Yu Kuai wrote:
> This lock is different from the current 'out_unlock'; adding a new label
> would make the code more complex, I think.

Of course, same mistake as last time...

^ permalink raw reply	[flat|nested] 8+ messages in thread
