* Kernel null pointer dereference on stopping raid device
From: Jain, Ayush @ 2023-06-13 20:12 UTC
To: sfr, Linux Kernel Mailing List, Linux Next Mailing List

Hello All,

On next-20230613, stopping a RAID device right after creating it hits a
kernel NULL pointer dereference on AMD x86 systems.

Kernel: 6.4.0-rc6-next-20230613
Commit: 1f6ce8392d6ff48

$ mdadm --create --assume-clean /dev/md/mdsraid --level=0 --raid-devices=1 /dev/loop0 --metadata=1.2 --verbose --force
$ mdadm --stop /dev/md/mdsraid

Kernel trace attached below:

[   32.260763] PEFILE: Unsigned PE binary
[  117.236671] block device autoloading is deprecated and will be removed.
[  117.262329] md127: detected capacity change from 0 to 25581568
[  180.249007] md127: detected capacity change from 25581568 to 0
[  180.255540] md: md127 stopped.
[  180.268433] BUG: kernel NULL pointer dereference, address: 00000000000000a4
[  180.276210] #PF: supervisor read access in kernel mode
[  180.281947] #PF: error_code(0x0000) - not-present page
[  180.287676] PGD 0 P4D 0
[  180.290508] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  180.295374] CPU: 5 PID: 7674 Comm: mdadm Kdump: loaded Not tainted 6.4.0-rc6-next-20230613 #1
[  180.315092] RIP: 0010:export_rdev+0xb2/0x1f0
[  180.319869] Code: c7 43 40 00 00 00 00 48 8d bb 48 01 00 00 e8 c5 c0 c5 ff 48 8b 83 b8 00 00 00 a8 10 74 0c 48 8b 43 30 8b 78 34 e8 ae fe ff ff <83> bd a4 00 00 00 fe 48 c7 c6 c0 f9 aa 9d 48 8b 7b 30 48 0f 45 f3
[  180.340820] RSP: 0018:ffffb1dadc677da0 EFLAGS: 00010246
[  180.346655] RAX: 0000000000000002 RBX: ffff9ca944130e00 RCX: 0000000080080007
[  180.354622] RDX: 0000000080080008 RSI: fffffc7fc20f2c00 RDI: 0000000000000000
[  180.362588] RBP: 0000000000000000 R08: ffff9d0943cb0000 R09: 0000000080080007
[  180.370553] R10: 0000000040000000 R11: 0000000000000001 R12: 0000000000000000
[  180.378512] R13: 0000000000000000 R14: ffff9d0943cb21d8 R15: ffff9ca94307c400
[  180.386470] FS:  00007f2a63448740(0000) GS:ffff9ca8fef40000(0000) knlGS:0000000000000000
[  180.395502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  180.401917] CR2: 00000000000000a4 CR3: 0000000102fcc000 CR4: 00000000003506e0
[  180.409875] Call Trace:
[  180.412608]  <TASK>
[  180.414957]  ? __die+0x24/0x70
[  180.418372]  ? page_fault_oops+0x82/0x150
[  180.422852]  ? exc_page_fault+0x69/0x150
[  180.427237]  ? asm_exc_page_fault+0x26/0x30
[  180.431916]  ? export_rdev+0xb2/0x1f0
[  180.436005]  ? md_kick_rdev_from_array+0x118/0x150
[  180.441354]  do_md_stop+0x28e/0x580
[  180.445241]  ? security_capable+0x3a/0x60
[  180.449721]  md_ioctl+0x540/0x940
[  180.453423]  ? selinux_bprm_creds_for_exec+0x291/0x2a0
[  180.459163]  blkdev_ioctl+0x142/0x280
[  180.463255]  __x64_sys_ioctl+0x91/0xd0
[  180.467447]  do_syscall_64+0x3f/0x90
[  180.471440]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  180.477081] RIP: 0033:0x7f2a6323ec6b
[  180.481073] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
[  180.502032] RSP: 002b:00007ffc29d52238 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  180.510484] RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f2a6323ec6b
[  180.518449] RDX: 0000000000000000 RSI: 0000000000000932 RDI: 0000000000000003
[  180.526415] RBP: 0000000000000003 R08: 0000000000000207 R09: 00007ffc29d51eb5
[  180.534373] R10: 000000000000007f R11: 0000000000000246 R12: 0000555c79876280
[  180.542338] R13: 00007ffc29d55379 R14: 00007ffc29d52330 R15: 00007ffc29d523d0
[  180.550305]  </TASK>

Thanks & Regards,
Ayush Jain
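One way to read the trace above: the bytes at the faulting RIP, "<83> bd a4 00 00 00 fe", look like a 32-bit compare of an immediate against memory at a displacement from a base register. The sketch below decodes just that one encoding (opcode 0x83 with a ModRM byte, a 32-bit displacement and an 8-bit immediate). The struct and function names are made up for illustration, and the decoder deliberately rejects every other instruction form.

```c
#include <stdint.h>

/* Fields of an x86-64 "83 /r disp32 imm8" instruction (illustrative only). */
struct decoded_insn {
    unsigned mod;   /* ModRM bits 7..6: 2 means "base register + disp32" */
    unsigned reg;   /* ModRM bits 5..3: opcode extension, 7 selects CMP  */
    unsigned rm;    /* ModRM bits 2..0: base register, 5 is rBP          */
    uint32_t disp;  /* little-endian 32-bit displacement                 */
    int8_t imm;     /* sign-extended 8-bit immediate                     */
};

/* Decode the single form needed here; return -1 for anything else. */
static int decode_83_disp32_imm8(const uint8_t *code, struct decoded_insn *out)
{
    if (code[0] != 0x83)
        return -1;

    out->mod = code[1] >> 6;
    out->reg = (code[1] >> 3) & 7;
    out->rm = code[1] & 7;

    /* Only "mod == 2" without a SIB byte (rm != 4) is handled. */
    if (out->mod != 2 || out->rm == 4)
        return -1;

    out->disp = (uint32_t)code[2] |
                (uint32_t)code[3] << 8 |
                (uint32_t)code[4] << 16 |
                (uint32_t)code[5] << 24;
    out->imm = (int8_t)code[6];
    return 0;
}
```

Fed the seven bytes from the Code: line, this yields a compare against -2 at displacement 0xa4 from rBP. With RBP = 0 in the register dump, the effective address is 0xa4, which matches CR2 and is consistent with a field being read through a NULL structure pointer.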
* Re: Kernel null pointer dereference on stopping raid device
From: Ayush Jain @ 2023-06-14 7:10 UTC
To: sfr, Linux Kernel Mailing List, Linux Next Mailing List
Cc: Wyes Karny, hch, Jens Axboe, linux-block

Hello,

On 6/14/2023 1:42 AM, Jain, Ayush wrote:
> Hello All,
>
> On next-20230613 release after creation of raid devices while stopping
> the same hitting kernel NULL pointer dereference situation on
> AMD x86 systems.
>
> Kernel: 6.4.0-rc6-next-20230613
> Commit: 1f6ce8392d6ff48
>
> $ mdadm --create --assume-clean /dev/md/mdsraid --level=0 --raid-devices=1 /dev/loop0 --metadata=1.2 --verbose --force
> $ mdadm --stop /dev/md/mdsraid
>
>
> Attaching Kernel trace below
> [   32.260763] PEFILE: Unsigned PE binary
> [  117.236671] block device autoloading is deprecated and will be removed.
> [  117.262329] md127: detected capacity change from 0 to 25581568
> [  180.249007] md127: detected capacity change from 25581568 to 0
> [  180.255540] md: md127 stopped.
> [  180.268433] BUG: kernel NULL pointer dereference, address: 00000000000000a4
> [...]
> [  180.315092] RIP: 0010:export_rdev+0xb2/0x1f0
> [...]
>

After reverting commit

  2736e8eeb0ccdc71d1f4256c9c9a28f58cc43307
  Author: Christoph Hellwig <hch@lst.de>
  Date:   Thu Jun 8 13:02:43 2023 +0200

      block: use the holder as indication for exclusive opens

the problem is resolved. Christoph, can you please look into this issue?

> Thanks & Regards,
> Ayush Jain

Thanks & Regards,
Ayush Jain
* Re: Kernel null pointer dereference on stopping raid device
From: Christoph Hellwig @ 2023-06-14 7:22 UTC
To: Ayush Jain
Cc: sfr, Linux Kernel Mailing List, Linux Next Mailing List, Wyes Karny, hch, Jens Axboe, linux-block

Hi Ayush,

can you try this patch?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ca0de7ddd9434d..828c4e6b9c5013 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2460,7 +2460,7 @@ static void export_rdev(struct md_rdev *rdev, struct mddev *mddev)
 	if (test_bit(AutoDetected, &rdev->flags))
 		md_autodetect_dev(rdev->bdev->bd_dev);
 #endif
-	blkdev_put(rdev->bdev, mddev->major_version == -2 ? &claim_rdev : rdev);
+	blkdev_put(rdev->bdev, &claim_rdev);
 	rdev->bdev = NULL;
 	kobject_put(&rdev->kobj);
 }
@@ -3644,7 +3644,7 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
 		goto out_clear_rdev;
 
 	rdev->bdev = blkdev_get_by_dev(newdev, BLK_OPEN_READ | BLK_OPEN_WRITE,
-			super_format == -2 ? &claim_rdev : rdev, NULL);
+			&claim_rdev, NULL);
 	if (IS_ERR(rdev->bdev)) {
 		pr_warn("md: could not open device unknown-block(%u,%u).\n",
 			MAJOR(newdev), MINOR(newdev));
@@ -3681,7 +3681,7 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
 	return rdev;
 
 out_blkdev_put:
-	blkdev_put(rdev->bdev, super_format == -2 ? &claim_rdev : rdev);
+	blkdev_put(rdev->bdev, &claim_rdev);
 out_clear_rdev:
 	md_rdev_clear(rdev);
 out_free_rdev:
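The patch above touches both the open side (blkdev_get_by_dev) and the release side (blkdev_put) because each has to present the same holder value. The following is a toy user-space model of that pairing, not the block layer API: toy_bdev, toy_rdev, toy_get, toy_put, pick_holder and claim_token are all hypothetical names, and the matching rule is an assumption drawn from the reverted commit's title, "block: use the holder as indication for exclusive opens".

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy device that remembers which holder token claimed it (NULL = free). */
struct toy_bdev {
    const void *holder;
};

/* Toy member device; its address can serve as a per-device holder token. */
struct toy_rdev {
    int unused;
};

/* Shared token, playing the role that claim_rdev plays in the diff. */
static struct toy_rdev claim_token;

/* Mirrors the ternary in the diff: format -2 uses the shared token. */
static const void *pick_holder(int super_format, const struct toy_rdev *rdev)
{
    return super_format == -2 ? (const void *)&claim_token
                              : (const void *)rdev;
}

/* Claim the device; fails if a different holder already owns it. */
static bool toy_get(struct toy_bdev *bdev, const void *holder)
{
    if (bdev->holder != NULL && bdev->holder != holder)
        return false;
    bdev->holder = holder;
    return true;
}

/* Release the device; only the token used for the claim is accepted. */
static bool toy_put(struct toy_bdev *bdev, const void *holder)
{
    if (bdev->holder != holder)
        return false;
    bdev->holder = NULL;
    return true;
}
```

In this model, a release that computes its token from different inputs than the claim did is rejected. That is why the release path needs the format information at all, and why always passing the shared token on both sides, as the quick patch does, avoids reading it.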
* Re: Kernel null pointer dereference on stopping raid device
From: Christoph Hellwig @ 2023-06-14 14:01 UTC
To: Jain, Ayush
Cc: Christoph Hellwig, sfr@canb.auug.org.au, Linux Kernel Mailing List, Linux Next Mailing List, Karny, Wyes, Jens Axboe, linux-block@vger.kernel.org, V, Narasimhan, Shetty, Kalpana, Shukla, Santosh

On Wed, Jun 14, 2023 at 09:54:07AM +0000, Jain, Ayush wrote:
> Patch applied cleanly on next-20230614 and resolved the issue.
>
> Reported-by: Ayush Jain <ayush.jain3@amd.com>
> Tested-by: Ayush Jain <ayush.jain3@amd.com>

That was just a quick hack to verify the problem. I think this is
the proper fix, can you try it as well?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ca0de7ddd9434d..da523e80a4e990 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2467,10 +2467,12 @@ static void export_rdev(struct md_rdev *rdev, struct mddev *mddev)
 
 static void md_kick_rdev_from_array(struct md_rdev *rdev)
 {
-	bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
+	struct mddev *mddev = rdev->mddev;
+
+	bd_unlink_disk_holder(rdev->bdev, mddev->gendisk);
 	list_del_rcu(&rdev->same_set);
 	pr_debug("md: unbind<%pg>\n", rdev->bdev);
-	mddev_destroy_serial_pool(rdev->mddev, rdev, false);
+	mddev_destroy_serial_pool(mddev, rdev, false);
 	rdev->mddev = NULL;
 	sysfs_remove_link(&rdev->kobj, "block");
 	sysfs_put(rdev->sysfs_state);
@@ -2488,7 +2490,7 @@ static void md_kick_rdev_from_array(struct md_rdev *rdev)
 	INIT_WORK(&rdev->del_work, rdev_delayed_delete);
 	kobject_get(&rdev->kobj);
 	queue_work(md_rdev_misc_wq, &rdev->del_work);
-	export_rdev(rdev, rdev->mddev);
+	export_rdev(rdev, mddev);
 }
 
 static void export_array(struct mddev *mddev)
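The diff above fixes an ordering problem: md_kick_rdev_from_array() sets rdev->mddev to NULL and only later passes rdev->mddev to export_rdev(), which then reads mddev->major_version. The user-space sketch below reproduces that pattern with heavily simplified stand-ins. mddev_sketch, rdev_sketch and the three functions are hypothetical and only model the pointer handling, not the real md structures. The NULL check in the export stand-in exists only so the sketch can report the problem; the kernel function has no such check and faults.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct mddev and struct md_rdev. */
struct mddev_sketch {
    int major_version;
};

struct rdev_sketch {
    struct mddev_sketch *mddev;   /* back-pointer to the owning array */
};

/*
 * Stand-in for export_rdev(). Returns -1 where the kernel would oops,
 * 1 if the shared holder would be chosen (major_version == -2), else 0.
 */
static int export_rdev_sketch(struct rdev_sketch *rdev,
                              struct mddev_sketch *mddev)
{
    (void)rdev;
    if (mddev == NULL)
        return -1;
    return mddev->major_version == -2 ? 1 : 0;
}

/* Buggy ordering: the back-pointer is cleared, then re-read and passed on. */
static int kick_rdev_buggy(struct rdev_sketch *rdev)
{
    rdev->mddev = NULL;
    return export_rdev_sketch(rdev, rdev->mddev);
}

/* Fixed ordering: cache the pointer in a local before clearing the field. */
static int kick_rdev_fixed(struct rdev_sketch *rdev)
{
    struct mddev_sketch *mddev = rdev->mddev;

    rdev->mddev = NULL;
    return export_rdev_sketch(rdev, mddev);
}
```

Both variants leave rdev->mddev cleared. Only the fixed one still hands a valid array pointer to the export step, which is the effect of introducing the local mddev variable in the patch.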
* Re: Kernel null pointer dereference on stopping raid device
From: Ayush Jain @ 2023-06-15 5:44 UTC
To: Christoph Hellwig
Cc: sfr@canb.auug.org.au, Linux Kernel Mailing List, Linux Next Mailing List, Karny, Wyes, Jens Axboe, linux-block@vger.kernel.org, V, Narasimhan, Shetty, Kalpana, Shukla, Santosh

On 6/14/2023 7:31 PM, Christoph Hellwig wrote:
> On Wed, Jun 14, 2023 at 09:54:07AM +0000, Jain, Ayush wrote:
>> Patch applied cleanly on next-20230614 and resolved the issue.
>>
>> Reported-by: Ayush Jain <ayush.jain3@amd.com>
>> Tested-by: Ayush Jain <ayush.jain3@amd.com>
>
> That was just a quick hack to verify the problem. I think this is
> the proper fix, can you try it as well?
>

Sure, this works on my machine.

Tested-by: Ayush Jain <ayush.jain3@amd.com>

> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index ca0de7ddd9434d..da523e80a4e990 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -2467,10 +2467,12 @@ static void export_rdev(struct md_rdev *rdev, struct mddev *mddev)
>
>  static void md_kick_rdev_from_array(struct md_rdev *rdev)
>  {
> -	bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
> +	struct mddev *mddev = rdev->mddev;
> +
> +	bd_unlink_disk_holder(rdev->bdev, mddev->gendisk);
>  	list_del_rcu(&rdev->same_set);
>  	pr_debug("md: unbind<%pg>\n", rdev->bdev);
> -	mddev_destroy_serial_pool(rdev->mddev, rdev, false);
> +	mddev_destroy_serial_pool(mddev, rdev, false);
>  	rdev->mddev = NULL;
>  	sysfs_remove_link(&rdev->kobj, "block");
>  	sysfs_put(rdev->sysfs_state);
> @@ -2488,7 +2490,7 @@ static void md_kick_rdev_from_array(struct md_rdev *rdev)
>  	INIT_WORK(&rdev->del_work, rdev_delayed_delete);
>  	kobject_get(&rdev->kobj);
>  	queue_work(md_rdev_misc_wq, &rdev->del_work);
> -	export_rdev(rdev, rdev->mddev);
> +	export_rdev(rdev, mddev);
>  }
>
>  static void export_array(struct mddev *mddev)

Thanks & Regards,
Ayush Jain
* Re: Kernel null pointer dereference on stopping raid device
From: Christoph Hellwig @ 2023-06-15 6:03 UTC
To: Ayush Jain
Cc: Christoph Hellwig, sfr@canb.auug.org.au, Linux Kernel Mailing List, Linux Next Mailing List, Karny, Wyes, Jens Axboe, linux-block@vger.kernel.org, V, Narasimhan, Shetty, Kalpana, Shukla, Santosh

On Thu, Jun 15, 2023 at 11:14:02AM +0530, Ayush Jain wrote:
> > That was just a quick hack to verify the problem. I think this is
> > the proper fix, can you try it as well?
> >
>
> Sure, this works on my machine.
>
> Tested-by: Ayush Jain <ayush.jain3@amd.com>

So it turns out that Jens merged the md pull request for 6.5 yesterday,
and that includes an equivalent change in

  3ce94ce5d05ae89190a23f6187f64d8f4b2d3782
  Author: Yu Kuai <yukuai3@huawei.com>
  Date:   Tue May 23 09:27:27 2023 +0800

      md: fix duplicate filename for rdev

With that I think we don't need an extra fix. Sorry for all the extra
work.