public inbox for linux-next@vger.kernel.org
* Kernel null pointer dereference on stopping raid device
@ 2023-06-13 20:12 Jain, Ayush
  2023-06-14  7:10 ` Ayush Jain
  0 siblings, 1 reply; 6+ messages in thread
From: Jain, Ayush @ 2023-06-13 20:12 UTC
  To: sfr, Linux Kernel Mailing List, Linux Next Mailing List

Hello All,

On the next-20230613 release, stopping a RAID device after creating it
triggers a kernel NULL pointer dereference on AMD x86 systems.

Kernel: 6.4.0-rc6-next-20230613
Commit: 1f6ce8392d6ff48

  $ mdadm --create --assume-clean /dev/md/mdsraid --level=0 --raid-devices=1 /dev/loop0 --metadata=1.2 --verbose --force
  $ mdadm --stop /dev/md/mdsraid


Kernel trace attached below:
   
[   32.260763] PEFILE: Unsigned PE binary
[  117.236671] block device autoloading is deprecated and will be removed.
[  117.262329] md127: detected capacity change from 0 to 25581568
[  180.249007] md127: detected capacity change from 25581568 to 0
[  180.255540] md: md127 stopped.
[  180.268433] BUG: kernel NULL pointer dereference, address: 00000000000000a4
[  180.276210] #PF: supervisor read access in kernel mode
[  180.281947] #PF: error_code(0x0000) - not-present page
[  180.287676] PGD 0 P4D 0
[  180.290508] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  180.295374] CPU: 5 PID: 7674 Comm: mdadm Kdump: loaded Not tainted 6.4.0-rc6-next-20230613 #1
[  180.315092] RIP: 0010:export_rdev+0xb2/0x1f0
[  180.319869] Code: c7 43 40 00 00 00 00 48 8d bb 48 01 00 00 e8 c5 c0 c5 ff 48 8b 83 b8 00 00 00 a8 10 74 0c 48 8b 43 30 8b 78 34 e8 ae fe ff ff <83> bd a4 00 00 00 fe 48 c7 c6 c0 f9 aa 9d 48 8b 7b 30 48 0f 45 f3
[  180.340820] RSP: 0018:ffffb1dadc677da0 EFLAGS: 00010246
[  180.346655] RAX: 0000000000000002 RBX: ffff9ca944130e00 RCX: 0000000080080007
[  180.354622] RDX: 0000000080080008 RSI: fffffc7fc20f2c00 RDI: 0000000000000000
[  180.362588] RBP: 0000000000000000 R08: ffff9d0943cb0000 R09: 0000000080080007
[  180.370553] R10: 0000000040000000 R11: 0000000000000001 R12: 0000000000000000
[  180.378512] R13: 0000000000000000 R14: ffff9d0943cb21d8 R15: ffff9ca94307c400
[  180.386470] FS:  00007f2a63448740(0000) GS:ffff9ca8fef40000(0000) knlGS:0000000000000000
[  180.395502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  180.401917] CR2: 00000000000000a4 CR3: 0000000102fcc000 CR4: 00000000003506e0
[  180.409875] Call Trace:
[  180.412608]  <TASK>
[  180.414957]  ? __die+0x24/0x70
[  180.418372]  ? page_fault_oops+0x82/0x150
[  180.422852]  ? exc_page_fault+0x69/0x150
[  180.427237]  ? asm_exc_page_fault+0x26/0x30
[  180.431916]  ? export_rdev+0xb2/0x1f0
[  180.436005]  ? md_kick_rdev_from_array+0x118/0x150
[  180.441354]  do_md_stop+0x28e/0x580
[  180.445241]  ? security_capable+0x3a/0x60
[  180.449721]  md_ioctl+0x540/0x940
[  180.453423]  ? selinux_bprm_creds_for_exec+0x291/0x2a0
[  180.459163]  blkdev_ioctl+0x142/0x280
[  180.463255]  __x64_sys_ioctl+0x91/0xd0
[  180.467447]  do_syscall_64+0x3f/0x90
[  180.471440]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  180.477081] RIP: 0033:0x7f2a6323ec6b
[  180.481073] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
[  180.502032] RSP: 002b:00007ffc29d52238 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  180.510484] RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f2a6323ec6b
[  180.518449] RDX: 0000000000000000 RSI: 0000000000000932 RDI: 0000000000000003
[  180.526415] RBP: 0000000000000003 R08: 0000000000000207 R09: 00007ffc29d51eb5
[  180.534373] R10: 000000000000007f R11: 0000000000000246 R12: 0000555c79876280
[  180.542338] R13: 00007ffc29d55379 R14: 00007ffc29d52330 R15: 00007ffc29d523d0
[  180.550305]  </TASK>

Thanks & Regards,
Ayush Jain


* Re: Kernel null pointer dereference on stopping raid device
  2023-06-13 20:12 Kernel null pointer dereference on stopping raid device Jain, Ayush
@ 2023-06-14  7:10 ` Ayush Jain
  2023-06-14  7:22   ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Ayush Jain @ 2023-06-14  7:10 UTC
  To: sfr, Linux Kernel Mailing List, Linux Next Mailing List
  Cc: Wyes Karny, hch, Jens Axboe, linux-block

Hello,

On 6/14/2023 1:42 AM, Jain, Ayush wrote:
> Hello All,
> 
> On the next-20230613 release, stopping a RAID device after creating it
> triggers a kernel NULL pointer dereference on AMD x86 systems.
> 
> Kernel: 6.4.0-rc6-next-20230613
> Commit: 1f6ce8392d6ff48
> 
>   $ mdadm --create --assume-clean /dev/md/mdsraid --level=0 --raid-devices=1 /dev/loop0 --metadata=1.2 --verbose --force
>   $ mdadm --stop /dev/md/mdsraid
> 
> 
> Kernel trace attached below:
> [   32.260763] PEFILE: Unsigned PE binary
> [  117.236671] block device autoloading is deprecated and will be removed.
> [  117.262329] md127: detected capacity change from 0 to 25581568
> [  180.249007] md127: detected capacity change from 25581568 to 0
> [  180.255540] md: md127 stopped.
> [  180.268433] BUG: kernel NULL pointer dereference, address: 00000000000000a4
> [  180.276210] #PF: supervisor read access in kernel mode
> [  180.281947] #PF: error_code(0x0000) - not-present page
> [  180.287676] PGD 0 P4D 0
> [  180.290508] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [  180.295374] CPU: 5 PID: 7674 Comm: mdadm Kdump: loaded Not tainted 6.4.0-rc6-next-20230613 #1
> [  180.315092] RIP: 0010:export_rdev+0xb2/0x1f0
> [  180.319869] Code: c7 43 40 00 00 00 00 48 8d bb 48 01 00 00 e8 c5 c0 c5 ff 48 8b 83 b8 00 00 00 a8 10 74 0c 48 8b 43 30 8b 78 34 e8 ae fe ff ff <83> bd a4 00 00 00 fe 48 c7 c6 c0 f9 aa 9d 48 8b 7b 30 48 0f 45 f3
> [  180.340820] RSP: 0018:ffffb1dadc677da0 EFLAGS: 00010246
> [  180.346655] RAX: 0000000000000002 RBX: ffff9ca944130e00 RCX: 0000000080080007
> [  180.354622] RDX: 0000000080080008 RSI: fffffc7fc20f2c00 RDI: 0000000000000000
> [  180.362588] RBP: 0000000000000000 R08: ffff9d0943cb0000 R09: 0000000080080007
> [  180.370553] R10: 0000000040000000 R11: 0000000000000001 R12: 0000000000000000
> [  180.378512] R13: 0000000000000000 R14: ffff9d0943cb21d8 R15: ffff9ca94307c400
> [  180.386470] FS:  00007f2a63448740(0000) GS:ffff9ca8fef40000(0000) knlGS:0000000000000000
> [  180.395502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  180.401917] CR2: 00000000000000a4 CR3: 0000000102fcc000 CR4: 00000000003506e0
> [  180.409875] Call Trace:
> [  180.412608]  <TASK>
> [  180.414957]  ? __die+0x24/0x70
> [  180.418372]  ? page_fault_oops+0x82/0x150
> [  180.422852]  ? exc_page_fault+0x69/0x150
> [  180.427237]  ? asm_exc_page_fault+0x26/0x30
> [  180.431916]  ? export_rdev+0xb2/0x1f0
> [  180.436005]  ? md_kick_rdev_from_array+0x118/0x150
> [  180.441354]  do_md_stop+0x28e/0x580
> [  180.445241]  ? security_capable+0x3a/0x60
> [  180.449721]  md_ioctl+0x540/0x940
> [  180.453423]  ? selinux_bprm_creds_for_exec+0x291/0x2a0
> [  180.459163]  blkdev_ioctl+0x142/0x280
> [  180.463255]  __x64_sys_ioctl+0x91/0xd0
> [  180.467447]  do_syscall_64+0x3f/0x90
> [  180.471440]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [  180.477081] RIP: 0033:0x7f2a6323ec6b
> [  180.481073] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
> [  180.502032] RSP: 002b:00007ffc29d52238 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [  180.510484] RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f2a6323ec6b
> [  180.518449] RDX: 0000000000000000 RSI: 0000000000000932 RDI: 0000000000000003
> [  180.526415] RBP: 0000000000000003 R08: 0000000000000207 R09: 00007ffc29d51eb5
> [  180.534373] R10: 000000000000007f R11: 0000000000000246 R12: 0000555c79876280
> [  180.542338] R13: 00007ffc29d55379 R14: 00007ffc29d52330 R15: 00007ffc29d523d0
> [  180.550305]  </TASK>
> 

Reverting the following commit resolves the problem:

commit 2736e8eeb0ccdc71d1f4256c9c9a28f58cc43307
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Jun 8 13:02:43 2023 +0200

    block: use the holder as indication for exclusive opens

Christoph, could you please take a look at this issue?

> Thanks & Regards,
> Ayush Jain

Thanks & Regards,
Ayush Jain


* Re: Kernel null pointer dereference on stopping raid device
  2023-06-14  7:10 ` Ayush Jain
@ 2023-06-14  7:22   ` Christoph Hellwig
       [not found]     ` <IA1PR12MB61375A452083D65B5FB815DBBA5AA@IA1PR12MB6137.namprd12.prod.outlook.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2023-06-14  7:22 UTC
  To: Ayush Jain
  Cc: sfr, Linux Kernel Mailing List, Linux Next Mailing List,
	Wyes Karny, hch, Jens Axboe, linux-block

Hi Ayush,

can you try this patch?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ca0de7ddd9434d..828c4e6b9c5013 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2460,7 +2460,7 @@ static void export_rdev(struct md_rdev *rdev, struct mddev *mddev)
 	if (test_bit(AutoDetected, &rdev->flags))
 		md_autodetect_dev(rdev->bdev->bd_dev);
 #endif
-	blkdev_put(rdev->bdev, mddev->major_version == -2 ? &claim_rdev : rdev);
+	blkdev_put(rdev->bdev, &claim_rdev);
 	rdev->bdev = NULL;
 	kobject_put(&rdev->kobj);
 }
@@ -3644,7 +3644,7 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
 		goto out_clear_rdev;
 
 	rdev->bdev = blkdev_get_by_dev(newdev, BLK_OPEN_READ | BLK_OPEN_WRITE,
-			super_format == -2 ? &claim_rdev : rdev, NULL);
+			&claim_rdev, NULL);
 	if (IS_ERR(rdev->bdev)) {
 		pr_warn("md: could not open device unknown-block(%u,%u).\n",
 			MAJOR(newdev), MINOR(newdev));
@@ -3681,7 +3681,7 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
 	return rdev;
 
 out_blkdev_put:
-	blkdev_put(rdev->bdev, super_format == -2 ? &claim_rdev : rdev);
+	blkdev_put(rdev->bdev, &claim_rdev);
 out_clear_rdev:
 	md_rdev_clear(rdev);
 out_free_rdev:


* Re: Kernel null pointer dereference on stopping raid device
       [not found]     ` <IA1PR12MB61375A452083D65B5FB815DBBA5AA@IA1PR12MB6137.namprd12.prod.outlook.com>
@ 2023-06-14 14:01       ` Christoph Hellwig
  2023-06-15  5:44         ` Ayush Jain
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2023-06-14 14:01 UTC
  To: Jain, Ayush
  Cc: Christoph Hellwig, sfr@canb.auug.org.au,
	Linux Kernel Mailing List, Linux Next Mailing List, Karny, Wyes,
	Jens Axboe, linux-block@vger.kernel.org, V, Narasimhan,
	Shetty, Kalpana, Shukla, Santosh

On Wed, Jun 14, 2023 at 09:54:07AM +0000, Jain, Ayush wrote:
> Patch applied cleanly on next-20230614 and resolved the issue.
> 
> Reported-by: Ayush Jain <ayush.jain3@amd.com>
> Tested-by: Ayush Jain <ayush.jain3@amd.com>

That was just a quick hack to verify the problem.  I think this is
the proper fix, can you try it as well?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ca0de7ddd9434d..da523e80a4e990 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2467,10 +2467,12 @@ static void export_rdev(struct md_rdev *rdev, struct mddev *mddev)
 
 static void md_kick_rdev_from_array(struct md_rdev *rdev)
 {
-	bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
+	struct mddev *mddev = rdev->mddev;
+
+	bd_unlink_disk_holder(rdev->bdev, mddev->gendisk);
 	list_del_rcu(&rdev->same_set);
 	pr_debug("md: unbind<%pg>\n", rdev->bdev);
-	mddev_destroy_serial_pool(rdev->mddev, rdev, false);
+	mddev_destroy_serial_pool(mddev, rdev, false);
 	rdev->mddev = NULL;
 	sysfs_remove_link(&rdev->kobj, "block");
 	sysfs_put(rdev->sysfs_state);
@@ -2488,7 +2490,7 @@ static void md_kick_rdev_from_array(struct md_rdev *rdev)
 	INIT_WORK(&rdev->del_work, rdev_delayed_delete);
 	kobject_get(&rdev->kobj);
 	queue_work(md_rdev_misc_wq, &rdev->del_work);
-	export_rdev(rdev, rdev->mddev);
+	export_rdev(rdev, mddev);
 }
 
 static void export_array(struct mddev *mddev)


* Re: Kernel null pointer dereference on stopping raid device
  2023-06-14 14:01       ` Christoph Hellwig
@ 2023-06-15  5:44         ` Ayush Jain
  2023-06-15  6:03           ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Ayush Jain @ 2023-06-15  5:44 UTC
  To: Christoph Hellwig
  Cc: sfr@canb.auug.org.au, Linux Kernel Mailing List,
	Linux Next Mailing List, Karny, Wyes, Jens Axboe,
	linux-block@vger.kernel.org, V, Narasimhan, Shetty, Kalpana,
	Shukla, Santosh

On 6/14/2023 7:31 PM, Christoph Hellwig wrote:
> On Wed, Jun 14, 2023 at 09:54:07AM +0000, Jain, Ayush wrote:
>> Patch applied cleanly on next-20230614 and resolved the issue.
>>
>> Reported-by: Ayush Jain <ayush.jain3@amd.com>
>> Tested-by: Ayush Jain <ayush.jain3@amd.com>
> 
> That was just a quick hack to verify the problem.  I think this is
> the proper fix, can you try it as well?
> 

Sure, this works on my machine.

Tested-by: Ayush Jain <ayush.jain3@amd.com>

> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index ca0de7ddd9434d..da523e80a4e990 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -2467,10 +2467,12 @@ static void export_rdev(struct md_rdev *rdev, struct mddev *mddev)
>   
>   static void md_kick_rdev_from_array(struct md_rdev *rdev)
>   {
> -	bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
> +	struct mddev *mddev = rdev->mddev;
> +
> +	bd_unlink_disk_holder(rdev->bdev, mddev->gendisk);
>   	list_del_rcu(&rdev->same_set);
>   	pr_debug("md: unbind<%pg>\n", rdev->bdev);
> -	mddev_destroy_serial_pool(rdev->mddev, rdev, false);
> +	mddev_destroy_serial_pool(mddev, rdev, false);
>   	rdev->mddev = NULL;
>   	sysfs_remove_link(&rdev->kobj, "block");
>   	sysfs_put(rdev->sysfs_state);
> @@ -2488,7 +2490,7 @@ static void md_kick_rdev_from_array(struct md_rdev *rdev)
>   	INIT_WORK(&rdev->del_work, rdev_delayed_delete);
>   	kobject_get(&rdev->kobj);
>   	queue_work(md_rdev_misc_wq, &rdev->del_work);
> -	export_rdev(rdev, rdev->mddev);
> +	export_rdev(rdev, mddev);
>   }
>   
>   static void export_array(struct mddev *mddev)

Thanks & Regards,
Ayush Jain


* Re: Kernel null pointer dereference on stopping raid device
  2023-06-15  5:44         ` Ayush Jain
@ 2023-06-15  6:03           ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2023-06-15  6:03 UTC
  To: Ayush Jain
  Cc: Christoph Hellwig, sfr@canb.auug.org.au,
	Linux Kernel Mailing List, Linux Next Mailing List, Karny, Wyes,
	Jens Axboe, linux-block@vger.kernel.org, V, Narasimhan,
	Shetty, Kalpana, Shukla, Santosh

On Thu, Jun 15, 2023 at 11:14:02AM +0530, Ayush Jain wrote:
> > That was just a quick hack to verify the problem.  I think this is
> > the proper fix, can you try it as well?
> > 
> 
> Sure, this works on my machine.
> 
> Tested-by: Ayush Jain <ayush.jain3@amd.com>

So it turns out that Jens merged the md pull request for 6.5 yesterday,
and that includes an equivalent change in

3ce94ce5d05ae89190a23f6187f64d8f4b2d3782
Author: Yu Kuai <yukuai3@huawei.com>
Date:   Tue May 23 09:27:27 2023 +0800

    md: fix duplicate filename for rdev

With that I think we don't need an extra fix.  Sorry for all the
extra work.

