* [PATCH 0/2] block: bugfix for bd_link_disk_holder()
@ 2022-11-03  2:55 Yu Kuai
  2022-11-03  2:55 ` [PATCH 1/2] block: don't allow a disk link holder to itself Yu Kuai
  2022-11-03  2:55 ` [PATCH 2/2] block: fix use after free for bd_holder_dir Yu Kuai
  0 siblings, 2 replies; 8+ messages in thread

From: Yu Kuai @ 2022-11-03  2:55 UTC
  To: hch, axboe; +Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Yu Kuai (2):
  block: don't allow a disk link holder to itself
  block: fix use after free for bd_holder_dir

 block/holder.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

-- 
2.31.1
* [PATCH 1/2] block: don't allow a disk link holder to itself
  2022-11-03  2:55 [PATCH 0/2] block: bugfix for bd_link_disk_holder() Yu Kuai
@ 2022-11-03  2:55 ` Yu Kuai
  2022-11-03  8:06   ` Christoph Hellwig
  2022-11-03  2:55 ` [PATCH 2/2] block: fix use after free for bd_holder_dir Yu Kuai
  1 sibling, 1 reply; 8+ messages in thread

From: Yu Kuai @ 2022-11-03  2:55 UTC
  To: hch, axboe; +Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Test procedures:

1) dmsetup create test --table "xxx sda", assume dm-0 is created
2) dmsetup suspend test
3) dmsetup reload test --table "xxx dm-0"
4) dmsetup resume test

Test result:

BUG: TASK stack guard page was hit at 00000000736a261f (stack is 000000008d12c88d..00000000c8dd82d5)
stack guard page: 0000 [#1] PREEMPT SMP
CPU: 29 PID: 946 Comm: systemd-udevd Not tainted 6.1.0-rc3-next-20221101-00006-g17640ca3b0ee #1295
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
RIP: 0010:dm_prepare_ioctl+0xf/0x1e0
Code: da 48 83 05 4a 7c 99 0b 01 41 89 c4 eb cd e8 b8 1f 40 00 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 48 83 05 a1 5a 99 0b 01 <41> 56 49 89 d6 41 55 4c 8d af 90 02 00 00 9
RSP: 0018:ffffc90002090000 EFLAGS: 00010206
RAX: ffff8881049d6800 RBX: ffff88817e589000 RCX: 0000000000000000
RDX: ffffc90002090010 RSI: ffffc9000209001c RDI: ffff88817e589000
RBP: 00000000484a101d R08: 0000000000000000 R09: 0000000000000007
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000005331
R13: 0000000000005331 R14: 0000000000000000 R15: 0000000000000000
FS:  00007fddf9609200(0000) GS:ffff889fbfd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000208fff8 CR3: 0000000179043000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 dm_blk_ioctl+0x50/0x1c0
 ? dm_prepare_ioctl+0xe0/0x1e0
 dm_blk_ioctl+0x88/0x1c0
 dm_blk_ioctl+0x88/0x1c0
 ...... (many identical frames omitted)
 dm_blk_ioctl+0x88/0x1c0
 dm_blk_ioctl+0x88/0x1c0
 blkdev_ioctl+0x184/0x3e0
 __x64_sys_ioctl+0xa3/0x110
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fddf7306577
Code: b3 66 90 48 8b 05 11 89 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 88 8
RSP: 002b:00007ffd0b2ec318 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00005634ef478320 RCX: 00007fddf7306577
RDX: 0000000000000000 RSI: 0000000000005331 RDI: 0000000000000007
RBP: 0000000000000007 R08: 00005634ef4843e0 R09: 0000000000000080
R10: 00007fddf75cfb38 R11: 0000000000000246 R12: 00000000030d4000
R13: 0000000000000000 R14: 0000000000000000 R15: 00005634ef48b800
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:dm_prepare_ioctl+0xf/0x1e0
Code: da 48 83 05 4a 7c 99 0b 01 41 89 c4 eb cd e8 b8 1f 40 00 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 48 83 05 a1 5a 99 0b 01 <41> 56 49 89 d6 41 55 4c 8d af 90 02 00 00 9
RSP: 0018:ffffc90002090000 EFLAGS: 00010206
RAX: ffff8881049d6800 RBX: ffff88817e589000 RCX: 0000000000000000
RDX: ffffc90002090010 RSI: ffffc9000209001c RDI: ffff88817e589000
RBP: 00000000484a101d R08: 0000000000000000 R09: 0000000000000007
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000005331
R13: 0000000000005331 R14: 0000000000000000 R15: 0000000000000000
FS:  00007fddf9609200(0000) GS:ffff889fbfd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000208fff8 CR3: 0000000179043000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

After creating a dm device, a user can reload it with a table that maps
the device to itself. Every ioctl on the device is then forwarded back
to the device itself, so dm recurses endlessly until the kernel stack
overflows.

Fix the problem by forbidding a disk from linking a holder to itself.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/holder.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/holder.c b/block/holder.c
index 5283bc804cc1..5fc68238ce3a 100644
--- a/block/holder.c
+++ b/block/holder.c
@@ -75,6 +75,9 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 	struct bd_holder_disk *holder;
 	int ret = 0;
 
+	if (bdev->bd_disk == disk)
+		return -EINVAL;
+
 	mutex_lock(&disk->open_mutex);
 
 	WARN_ON_ONCE(!bdev->bd_holder);
-- 
2.31.1
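To make the failure mode concrete: dm forwards an ioctl on a mapped device
down to the table's underlying device. When the reloaded table maps dm-0
onto dm-0 itself, each forwarded call re-enters the same handler with no
progress until the stack guard page is hit. Below is a minimal userspace
model of that recursion; the names (stacked_dev, forward_ioctl) are
hypothetical, and only the shape of the call chain mirrors dm_blk_ioctl().
The command value 0x5331 is apparently the ioctl visible in RSI/R12 of the
trace above.

#include <stddef.h>

/* Hypothetical model of a stacked block device; not real kernel code. */
struct stacked_dev {
	const char *name;
	struct stacked_dev *target;	/* device the table maps to, NULL if none */
};

/* Mirrors the shape of dm_blk_ioctl(): pass the command down the stack. */
static int forward_ioctl(struct stacked_dev *dev, unsigned int cmd)
{
	if (dev->target == NULL)
		return 0;	/* bottom of the stack: handle cmd here */
	/*
	 * If dev->target == dev, this never terminates: one stack frame
	 * per hop until the guard page is hit, as in the oops above.
	 */
	return forward_ioctl(dev->target, cmd);
}

int main(void)
{
	struct stacked_dev sda = { "sda", NULL };
	struct stacked_dev dm0 = { "dm-0", &sda };

	forward_ioctl(&dm0, 0x5331);	/* fine: dm-0 -> sda, two frames */

	dm0.target = &dm0;		/* models step 3, reload with "xxx dm-0" */
	forward_ioctl(&dm0, 0x5331);	/* unbounded recursion, stack overflow */
	return 0;
}

The fix cuts this off at setup time rather than in the ioctl path: with the
self-link rejected in bd_link_disk_holder(), the degenerate table can never
be activated.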
* Re: [PATCH 1/2] block: don't allow a disk link holder to itself
  2022-11-03  2:55 ` [PATCH 1/2] block: don't allow a disk link holder to itself Yu Kuai
@ 2022-11-03  8:06   ` Christoph Hellwig
  2022-11-03  9:17     ` Yu Kuai
  0 siblings, 1 reply; 8+ messages in thread

From: Christoph Hellwig @ 2022-11-03  8:06 UTC
  To: Yu Kuai; +Cc: hch, axboe, linux-block, linux-kernel, yukuai3, yi.zhang

Yes, this is a good one. Please add a single-sentence blurb explaining
the issue before the test procedure?

With that this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

On Thu, Nov 03, 2022 at 10:55:40AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Test procedures:
>
> 1) dmsetup create test --table "xxx sda", assume dm-0 is created
> 2) dmsetup suspend test
> 3) dmsetup reload test --table "xxx dm-0"
> 4) dmsetup resume test

Can you wire this up for blktests?

I've also been wondering how we could wire your other two test cases
up, as I think they'd be very useful. Hopefully I can find some time
to inject delays with eBPF or something like it.
* Re: [PATCH 1/2] block: don't allow a disk link holder to itself
  2022-11-03  8:06 ` Christoph Hellwig
@ 2022-11-03  9:17   ` Yu Kuai
  0 siblings, 0 replies; 8+ messages in thread

From: Yu Kuai @ 2022-11-03  9:17 UTC
  To: Christoph Hellwig, Yu Kuai
  Cc: axboe, linux-block, linux-kernel, yi.zhang, yukuai (C)

Hi,

On 2022/11/03 16:06, Christoph Hellwig wrote:
> Yes, this is a good one. Please add a single-sentence blurb explaining
> the issue before the test procedure?

Ok, I'll do that.

> With that this looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
> On Thu, Nov 03, 2022 at 10:55:40AM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Test procedures:
>>
>> 1) dmsetup create test --table "xxx sda", assume dm-0 is created
>> 2) dmsetup suspend test
>> 3) dmsetup reload test --table "xxx dm-0"
>> 4) dmsetup resume test
>
> Can you wire this up for blktests?

Of course I can. I'm not very familiar with adding new tests to
blktests, so it might take me some time.

> I've also been wondering how we could wire your other two test cases
> up, as I think they'd be very useful. Hopefully I can find some time
> to inject delays with eBPF or something like it.

The problem was originally found without the delay; the delay just makes
the reproduction 100% reliable, but the race can still hit without it.
Perhaps we can add the test without the delay for now.

Thanks,
Kuai
* [PATCH 2/2] block: fix use after free for bd_holder_dir
  2022-11-03  2:55 [PATCH 0/2] block: bugfix for bd_link_disk_holder() Yu Kuai
  2022-11-03  2:55 ` [PATCH 1/2] block: don't allow a disk link holder to itself Yu Kuai
@ 2022-11-03  2:55 ` Yu Kuai
  2022-11-03  8:12   ` Christoph Hellwig
  1 sibling, 1 reply; 8+ messages in thread

From: Yu Kuai @ 2022-11-03  2:55 UTC
  To: hch, axboe; +Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

Currently, the caller of bd_link_disk_holder() gets 'bdev' via
blkdev_get_by_dev(), which looks 'bdev' up by its device number 'dev'.
However, del_gendisk() can run concurrently and free 'bd_holder_dir'
before bd_link_disk_holder() accesses it, triggering a use after free:

t1:                             t2:
bdev = blkdev_get_by_dev
                                del_gendisk
                                 kobject_put(bd_holder_dir)
                                  kobject_free()
bd_link_disk_holder

Fix the problem by checking that the disk is still live and grabbing a
reference to 'bd_holder_dir' first in bd_link_disk_holder().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/holder.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/block/holder.c b/block/holder.c
index 5fc68238ce3a..1c6c5b132a92 100644
--- a/block/holder.c
+++ b/block/holder.c
@@ -78,19 +78,32 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 	if (bdev->bd_disk == disk)
 		return -EINVAL;
 
-	mutex_lock(&disk->open_mutex);
+	/*
+	 * del_gendisk drops the initial reference to bd_holder_dir, so we
+	 * need to keep our own here to allow for cleanup past that point.
+	 */
+	mutex_lock(&bdev->bd_disk->open_mutex);
+	if (!disk_live(bdev->bd_disk)) {
+		mutex_unlock(&bdev->bd_disk->open_mutex);
+		return -ENODEV;
+	}
 
-	WARN_ON_ONCE(!bdev->bd_holder);
+	kobject_get(bdev->bd_holder_dir);
+	mutex_unlock(&bdev->bd_disk->open_mutex);
 
+	mutex_lock(&disk->open_mutex);
+	WARN_ON_ONCE(!bdev->bd_holder);
 	holder = bd_find_holder_disk(bdev, disk);
 	if (holder) {
 		holder->refcnt++;
+		kobject_put(bdev->bd_holder_dir);
 		goto out_unlock;
 	}
 
 	holder = kzalloc(sizeof(*holder), GFP_KERNEL);
 	if (!holder) {
 		ret = -ENOMEM;
+		kobject_put(bdev->bd_holder_dir);
 		goto out_unlock;
 	}
 
@@ -101,16 +114,12 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 		ret = __link_disk_holder(bdev, disk);
 		if (ret) {
 			kfree(holder);
+			kobject_put(bdev->bd_holder_dir);
 			goto out_unlock;
 		}
 	}
 
 	list_add(&holder->list, &disk->slave_bdevs);
-	/*
-	 * del_gendisk drops the initial reference to bd_holder_dir, so we need
-	 * to keep our own here to allow for cleanup past that point.
-	 */
-	kobject_get(bdev->bd_holder_dir);
 out_unlock:
 	mutex_unlock(&disk->open_mutex);
-- 
2.31.1
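The essence of the fix is a "check liveness, then pin" pattern: under the
mutex that del_gendisk() also takes, verify teardown has not started, then
take a private reference so the object outlives the initial kobject_put().
The patch open-codes this; the helper below is a hedged restatement, and
the name pin_holder_dir() is hypothetical rather than a real block-layer
API.

/*
 * Hypothetical summary of the fix: disk_live() goes false under
 * open_mutex in del_gendisk(), so checking it under the same mutex and
 * taking a reference before unlocking guarantees 'bd_holder_dir' cannot
 * be freed underneath us. Every failure path in the caller must drop
 * this reference again; only the success path keeps it.
 */
static int pin_holder_dir(struct block_device *bdev)
{
	mutex_lock(&bdev->bd_disk->open_mutex);
	if (!disk_live(bdev->bd_disk)) {
		/* del_gendisk() already ran; bd_holder_dir may be freed */
		mutex_unlock(&bdev->bd_disk->open_mutex);
		return -ENODEV;
	}
	kobject_get(bdev->bd_holder_dir);
	mutex_unlock(&bdev->bd_disk->open_mutex);
	return 0;
}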
* Re: [PATCH 2/2] block: fix use after free for bd_holder_dir
  2022-11-03  2:55 ` [PATCH 2/2] block: fix use after free for bd_holder_dir Yu Kuai
@ 2022-11-03  8:12   ` Christoph Hellwig
  2022-11-03  9:45     ` Yu Kuai
  0 siblings, 1 reply; 8+ messages in thread

From: Christoph Hellwig @ 2022-11-03  8:12 UTC
  To: Yu Kuai; +Cc: hch, axboe, linux-block, linux-kernel, yukuai3, yi.zhang

On Thu, Nov 03, 2022 at 10:55:41AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Currently, the caller of bd_link_disk_holder() gets 'bdev' via
> blkdev_get_by_dev(), which looks 'bdev' up by its device number 'dev'.
> However, del_gendisk() can run concurrently and free 'bd_holder_dir'
> before bd_link_disk_holder() accesses it, triggering a use after free:
>
> t1:                             t2:
> bdev = blkdev_get_by_dev
>                                 del_gendisk
>                                  kobject_put(bd_holder_dir)
>                                   kobject_free()
> bd_link_disk_holder
>
> Fix the problem by checking that the disk is still live and grabbing a
> reference to 'bd_holder_dir' first in bd_link_disk_holder().

Looks good, with some minor stylistic nitpicks:

> +	if (!disk_live(bdev->bd_disk)) {
> +		mutex_unlock(&bdev->bd_disk->open_mutex);
> +		return -ENODEV;
> +	}

This can use a goto out_unlock;

> 		holder->refcnt++;
> +		kobject_put(bdev->bd_holder_dir);
> 		goto out_unlock;
> 	}
>
> 	holder = kzalloc(sizeof(*holder), GFP_KERNEL);
> 	if (!holder) {
> 		ret = -ENOMEM;
> +		kobject_put(bdev->bd_holder_dir);
> 		goto out_unlock;
> 	}
>
> @@ -101,16 +114,12 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
> 		ret = __link_disk_holder(bdev, disk);
> 		if (ret) {
> 			kfree(holder);
> +			kobject_put(bdev->bd_holder_dir);

And I think a goto out_put_holder and out_free_holder would clean this up
nicely.

> 	list_add(&holder->list, &disk->slave_bdevs);
> -	/*
> -	 * del_gendisk drops the initial reference to bd_holder_dir, so we need
> -	 * to keep our own here to allow for cleanup past that point.
> -	 */
> -	kobject_get(bdev->bd_holder_dir);

... with this then jumping straight to out_unlock.

We should repost a series with my first 7 patches and your two. I can do
that, but it might take some time as I just got through (minor) knee
surgery and am still at the hospital, so if you have spare cycles feel
free to do it.
* Re: [PATCH 2/2] block: fix use after free for bd_holder_dir
  2022-11-03  8:12 ` Christoph Hellwig
@ 2022-11-03  9:45   ` Yu Kuai
  2022-11-03  9:57     ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread

From: Yu Kuai @ 2022-11-03  9:45 UTC
  To: Christoph Hellwig, Yu Kuai
  Cc: axboe, linux-block, linux-kernel, yi.zhang, yukuai (C)

Hi,

On 2022/11/03 16:12, Christoph Hellwig wrote:
> On Thu, Nov 03, 2022 at 10:55:41AM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Currently, the caller of bd_link_disk_holder() gets 'bdev' via
>> blkdev_get_by_dev(), which looks 'bdev' up by its device number 'dev'.
>> However, del_gendisk() can run concurrently and free 'bd_holder_dir'
>> before bd_link_disk_holder() accesses it, triggering a use after free:
>>
>> t1:                             t2:
>> bdev = blkdev_get_by_dev
>>                                 del_gendisk
>>                                  kobject_put(bd_holder_dir)
>>                                   kobject_free()
>> bd_link_disk_holder
>>
>> Fix the problem by checking that the disk is still live and grabbing a
>> reference to 'bd_holder_dir' first in bd_link_disk_holder().
>
> Looks good, with some minor stylistic nitpicks:
>
>> +	if (!disk_live(bdev->bd_disk)) {
>> +		mutex_unlock(&bdev->bd_disk->open_mutex);
>> +		return -ENODEV;
>> +	}
>
> This can use a goto out_unlock;

This lock is different from the one released at the current 'out_unlock',
so I think adding a new label here would make the code more complex.

>> 		holder->refcnt++;
>> +		kobject_put(bdev->bd_holder_dir);
>> 		goto out_unlock;
>> 	}
>>
>> 	holder = kzalloc(sizeof(*holder), GFP_KERNEL);
>> 	if (!holder) {
>> 		ret = -ENOMEM;
>> +		kobject_put(bdev->bd_holder_dir);
>> 		goto out_unlock;
>> 	}
>>
>> @@ -101,16 +114,12 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
>> 		ret = __link_disk_holder(bdev, disk);
>> 		if (ret) {
>> 			kfree(holder);
>> +			kobject_put(bdev->bd_holder_dir);
>
> And I think a goto out_put_holder and out_free_holder would clean this up
> nicely.

Yes, you're right.

>> 	list_add(&holder->list, &disk->slave_bdevs);
>> -	/*
>> -	 * del_gendisk drops the initial reference to bd_holder_dir, so we need
>> -	 * to keep our own here to allow for cleanup past that point.
>> -	 */
>> -	kobject_get(bdev->bd_holder_dir);
>
> ... with this then jumping straight to out_unlock.

Ok, I'll do that in the next version.

> We should repost a series with my first 7 patches and your two. I can do
> that, but it might take some time as I just got through (minor) knee
> surgery and am still at the hospital, so if you have spare cycles feel
> free to do it.

I'm glad to do that, and have a good rest 😄

Thanks,
Kuai
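Combining the two review points above (out_free_holder/out_put_holder
labels for the error paths, but a plain early return for the disk_live()
check, since that path holds a different mutex), a v2 might be shaped like
the sketch below. This is an assumption about the reworked function, not
the final committed code, and the initialization of 'holder' in the middle
is inferred from the hunk context rather than quoted in this thread.

int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
{
	struct bd_holder_disk *holder;
	int ret = 0;

	if (bdev->bd_disk == disk)
		return -EINVAL;

	/*
	 * del_gendisk drops the initial reference to bd_holder_dir, so we
	 * need to keep our own here to allow for cleanup past that point.
	 */
	mutex_lock(&bdev->bd_disk->open_mutex);
	if (!disk_live(bdev->bd_disk)) {
		/* different mutex, so no goto out_unlock here */
		mutex_unlock(&bdev->bd_disk->open_mutex);
		return -ENODEV;
	}
	kobject_get(bdev->bd_holder_dir);
	mutex_unlock(&bdev->bd_disk->open_mutex);

	mutex_lock(&disk->open_mutex);
	WARN_ON_ONCE(!bdev->bd_holder);

	holder = bd_find_holder_disk(bdev, disk);
	if (holder) {
		holder->refcnt++;
		goto out_put_holder;	/* existing link already holds a ref */
	}

	holder = kzalloc(sizeof(*holder), GFP_KERNEL);
	if (!holder) {
		ret = -ENOMEM;
		goto out_put_holder;
	}

	/* assumed initialization, mirroring the context of the diff */
	INIT_LIST_HEAD(&holder->list);
	holder->bdev = bdev;
	holder->refcnt = 1;
	if (disk->slave_dir) {
		ret = __link_disk_holder(bdev, disk);
		if (ret)
			goto out_free_holder;
	}

	list_add(&holder->list, &disk->slave_bdevs);
	goto out_unlock;	/* success: keep the bd_holder_dir reference */

out_free_holder:
	kfree(holder);
out_put_holder:
	kobject_put(bdev->bd_holder_dir);
out_unlock:
	mutex_unlock(&disk->open_mutex);
	return ret;
}

The fallthrough from out_free_holder into out_put_holder keeps each error
path to a single goto while the success path alone retains the reference.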
* Re: [PATCH 2/2] block: fix use after free for bd_holder_dir
  2022-11-03  9:45 ` Yu Kuai
@ 2022-11-03  9:57   ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread

From: Christoph Hellwig @ 2022-11-03  9:57 UTC
  To: Yu Kuai
  Cc: Christoph Hellwig, axboe, linux-block, linux-kernel, yi.zhang, yukuai (C)

On Thu, Nov 03, 2022 at 05:45:25PM +0800, Yu Kuai wrote:
> This lock is different from the one released at the current 'out_unlock',
> so I think adding a new label here would make the code more complex.

Of course, same mistake as last time..
end of thread, other threads:[~2022-11-03  9:57 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-03  2:55 [PATCH 0/2] block: bugfix for bd_link_disk_holder() Yu Kuai
2022-11-03  2:55 ` [PATCH 1/2] block: don't allow a disk link holder to itself Yu Kuai
2022-11-03  8:06   ` Christoph Hellwig
2022-11-03  9:17     ` Yu Kuai
2022-11-03  2:55 ` [PATCH 2/2] block: fix use after free for bd_holder_dir Yu Kuai
2022-11-03  8:12   ` Christoph Hellwig
2022-11-03  9:45     ` Yu Kuai
2022-11-03  9:57       ` Christoph Hellwig