Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Kernel OOPS while creating a NVMe Namespace
@ 2024-06-10  7:51 Venkat Rao Bagalkote
  2024-06-10  9:43 ` Hillf Danton
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Venkat Rao Bagalkote @ 2024-06-10  7:51 UTC (permalink / raw)
  To: kbusch, sagi; +Cc: linux-block, linux-kernel, linux-nvme, sachinp

Greetings!!!

Observing a kernel oops while creating a namespace on an NVMe device.

[  140.209777] BUG: Unable to handle kernel data access at 
0x18d7003065646fee
[  140.209792] Faulting instruction address: 0xc00000000023b45c
[  140.209798] Oops: Kernel access of bad area, sig: 11 [#1]
[  140.209802] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
[  140.209809] Modules linked in: rpadlpar_io rpaphp xsk_diag 
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet 
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat 
bonding nf_conntrack tls nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set 
nf_tables nfnetlink vmx_crypto pseries_rng binfmt_misc fuse xfs 
libcrc32c sd_mod sg ibmvscsi scsi_transport_srp ibmveth nvme nvme_core 
t10_pi crc64_rocksoft_generic crc64_rocksoft crc64
[  140.209864] CPU: 2 PID: 129 Comm: kworker/u65:3 Kdump: loaded Not 
tainted 6.10.0-rc3 #2
[  140.209870] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202 
0xf000005 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
[  140.209876] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[  140.209889] NIP:  c00000000023b45c LR: c008000006a96b20 CTR: 
c00000000023b42c
[  140.209894] REGS: c0000000506078a0 TRAP: 0380   Not tainted (6.10.0-rc3)
[  140.209899] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  
CR: 24000244  XER: 00000000
[  140.209915] CFAR: c008000006aa80ac IRQMASK: 0
[  140.209915] GPR00: c008000006a96b20 c000000050607b40 c000000001573700 
c000000004291ee0
[  140.209915] GPR04: 0000000000000000 c000000006150080 00000000c0080005 
fffffffffffe0000
[  140.209915] GPR08: 0000000000000000 18d7003065646f6e 0000000000000000 
c008000006aa8098
[  140.209915] GPR12: c00000000023b42c c00000000f7cdf00 c0000000001a151c 
c000000004f2be80
[  140.209915] GPR16: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[  140.209915] GPR20: c000000004dbcc00 0000000000000006 0000000000000002 
c000000004911270
[  140.209915] GPR24: 0000000000000000 0000000000000000 c0000000ee254ffc 
c0000000049111f0
[  140.209915] GPR28: 0000000000000000 c000000004911260 c000000004291ee0 
c000000004911260
[  140.209975] NIP [c00000000023b45c] synchronize_srcu+0x30/0x1c0
[  140.209984] LR [c008000006a96b20] nvme_ns_remove+0x80/0x2d8 [nvme_core]
[  140.209994] Call Trace:
[  140.209997] [c000000050607b90] [c008000006a96b20] 
nvme_ns_remove+0x80/0x2d8 [nvme_core]
[  140.210008] [c000000050607bd0] [c008000006a972b4] 
nvme_remove_invalid_namespaces+0x144/0x1ac [nvme_core]
[  140.210020] [c000000050607c60] [c008000006a9dbd4] 
nvme_scan_ns_list+0x19c/0x370 [nvme_core]
[  140.210032] [c000000050607d70] [c008000006a9dfc8] 
nvme_scan_work+0xc8/0x278 [nvme_core]
[  140.210043] [c000000050607e40] [c00000000019414c] 
process_one_work+0x20c/0x4f4
[  140.210051] [c000000050607ef0] [c0000000001950cc] 
worker_thread+0x378/0x544
[  140.210058] [c000000050607f90] [c0000000001a164c] kthread+0x138/0x140
[  140.210065] [c000000050607fe0] [c00000000000df98] 
start_kernel_thread+0x14/0x18
[  140.210072] Code: 3c4c0134 384282d4 7c0802a6 60000000 7c0802a6 
fbc1fff0 fba1ffe8 fbe1fff8 7c7e1b78 f8010010 f821ffb1 e9230010 
<e9290080> 7c2004ac 71290003 41820008
[  140.210093] ---[ end trace 0000000000000000 ]---


The issue is introduced by commit be647e2c76b27f409cdd520f66c95be888b553a3.


With that commit reverted, the issue is not seen.


Regards,

Venkat.




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel OOPS while creating a NVMe Namespace
  2024-06-10  7:51 Kernel OOPS while creating a NVMe Namespace Venkat Rao Bagalkote
@ 2024-06-10  9:43 ` Hillf Danton
  2024-06-10  9:57 ` Sagi Grimberg
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Hillf Danton @ 2024-06-10  9:43 UTC (permalink / raw)
  To: Venkat Rao Bagalkote
  Cc: kbusch, sagi, linux-block, linux-kernel, linux-nvme, sachinp

On Mon, 10 Jun 2024 13:21:00 +0530 Venkat Rao Bagalkote wrote:
> Greetings!!!
> 
> Observing Kernel OOPS, while creating namespace on a NVMe device.
> 
> [  140.209777] BUG: Unable to handle kernel data access at 
> 0x18d7003065646fee
> [  140.209792] Faulting instruction address: 0xc00000000023b45c
> [  140.209798] Oops: Kernel access of bad area, sig: 11 [#1]
> [  140.209802] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
> [  140.209864] CPU: 2 PID: 129 Comm: kworker/u65:3 Kdump: loaded Not 
> tainted 6.10.0-rc3 #2
> [  140.209870] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202 
> 0xf000005 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
> [  140.209876] Workqueue: nvme-wq nvme_scan_work [nvme_core]
> [  140.209889] NIP:  c00000000023b45c LR: c008000006a96b20 CTR: 
> c00000000023b42c
> [  140.209894] REGS: c0000000506078a0 TRAP: 0380   Not tainted (6.10.0-rc3)
> [  140.209899] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  
> CR: 24000244  XER: 00000000
> [  140.209975] NIP [c00000000023b45c] synchronize_srcu+0x30/0x1c0
> [  140.209984] LR [c008000006a96b20] nvme_ns_remove+0x80/0x2d8 [nvme_core]
> [  140.209994] Call Trace:
> [  140.209997] [c000000050607b90] [c008000006a96b20] 
> nvme_ns_remove+0x80/0x2d8 [nvme_core]
> [  140.210008] [c000000050607bd0] [c008000006a972b4] 
> nvme_remove_invalid_namespaces+0x144/0x1ac [nvme_core]
> [  140.210020] [c000000050607c60] [c008000006a9dbd4] 
> nvme_scan_ns_list+0x19c/0x370 [nvme_core]
> [  140.210032] [c000000050607d70] [c008000006a9dfc8] 
> nvme_scan_work+0xc8/0x278 [nvme_core]
> [  140.210043] [c000000050607e40] [c00000000019414c] 
> process_one_work+0x20c/0x4f4
> [  140.210051] [c000000050607ef0] [c0000000001950cc] 
> worker_thread+0x378/0x544
> [  140.210058] [c000000050607f90] [c0000000001a164c] kthread+0x138/0x140
> [  140.210065] [c000000050607fe0] [c00000000000df98] 
> start_kernel_thread+0x14/0x18
> [  140.210072] Code: 3c4c0134 384282d4 7c0802a6 60000000 7c0802a6 
> fbc1fff0 fba1ffe8 fbe1fff8 7c7e1b78 f8010010 f821ffb1 e9230010 
> <e9290080> 7c2004ac 71290003 41820008
> [  140.210093] ---[ end trace 0000000000000000 ]---
> 
> 
> Issue is introduced by the patch: be647e2c76b27f409cdd520f66c95be888b553a3.
> 
> Reverting it, issue is not seen.

See if a refcount leak existed before be647e2c76b2:

--- x/drivers/nvme/host/core.c
+++ y/drivers/nvme/host/core.c
@@ -4078,6 +4078,7 @@ static void nvme_scan_work(struct work_s
 		return;
 	}
 
+	nvme_get_ctrl(ctrl);
 	if (test_and_clear_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events)) {
 		dev_info(ctrl->device, "rescanning namespaces.\n");
 		nvme_clear_changed_ns_log(ctrl);
@@ -4097,6 +4098,7 @@ static void nvme_scan_work(struct work_s
 			nvme_scan_ns_sequential(ctrl);
 	}
 	mutex_unlock(&ctrl->scan_lock);
+	nvme_put_ctrl(ctrl);
 }
 
 /*
--


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel OOPS while creating a NVMe Namespace
  2024-06-10  7:51 Kernel OOPS while creating a NVMe Namespace Venkat Rao Bagalkote
  2024-06-10  9:43 ` Hillf Danton
@ 2024-06-10  9:57 ` Sagi Grimberg
  2024-06-10 15:24   ` Keith Busch
  2024-06-10 18:32 ` Chaitanya Kulkarni
  2024-06-10 18:53 ` Keith Busch
  3 siblings, 1 reply; 11+ messages in thread
From: Sagi Grimberg @ 2024-06-10  9:57 UTC (permalink / raw)
  To: Venkat Rao Bagalkote, kbusch
  Cc: linux-block, linux-kernel, linux-nvme, sachinp



On 10/06/2024 10:51, Venkat Rao Bagalkote wrote:
> Greetings!!!
>
> Observing Kernel OOPS, while creating namespace on a NVMe device.
>
> [  140.209777] BUG: Unable to handle kernel data access at 
> 0x18d7003065646fee
> [  140.209792] Faulting instruction address: 0xc00000000023b45c
> [  140.209798] Oops: Kernel access of bad area, sig: 11 [#1]
> [  140.209802] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
> [  140.209864] CPU: 2 PID: 129 Comm: kworker/u65:3 Kdump: loaded Not 
> tainted 6.10.0-rc3 #2
> [  140.209870] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202 
> 0xf000005 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
> [  140.209876] Workqueue: nvme-wq nvme_scan_work [nvme_core]
> [  140.209889] NIP:  c00000000023b45c LR: c008000006a96b20 CTR: 
> c00000000023b42c
> [  140.209894] REGS: c0000000506078a0 TRAP: 0380   Not tainted 
> (6.10.0-rc3)
> [  140.209899] MSR:  800000000280b033 
> <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24000244  XER: 00000000
> [  140.209975] NIP [c00000000023b45c] synchronize_srcu+0x30/0x1c0
> [  140.209984] LR [c008000006a96b20] nvme_ns_remove+0x80/0x2d8 
> [nvme_core]
> [  140.209994] Call Trace:
> [  140.209997] [c000000050607b90] [c008000006a96b20] 
> nvme_ns_remove+0x80/0x2d8 [nvme_core]
> [  140.210008] [c000000050607bd0] [c008000006a972b4] 
> nvme_remove_invalid_namespaces+0x144/0x1ac [nvme_core]
> [  140.210020] [c000000050607c60] [c008000006a9dbd4] 
> nvme_scan_ns_list+0x19c/0x370 [nvme_core]
> [  140.210032] [c000000050607d70] [c008000006a9dfc8] 
> nvme_scan_work+0xc8/0x278 [nvme_core]
> [  140.210043] [c000000050607e40] [c00000000019414c] 
> process_one_work+0x20c/0x4f4
> [  140.210051] [c000000050607ef0] [c0000000001950cc] 
> worker_thread+0x378/0x544
> [  140.210058] [c000000050607f90] [c0000000001a164c] kthread+0x138/0x140
> [  140.210065] [c000000050607fe0] [c00000000000df98] 
> start_kernel_thread+0x14/0x18
> [  140.210072] Code: 3c4c0134 384282d4 7c0802a6 60000000 7c0802a6 
> fbc1fff0 fba1ffe8 fbe1fff8 7c7e1b78 f8010010 f821ffb1 e9230010 
> <e9290080> 7c2004ac 71290003 41820008
> [  140.210093] ---[ end trace 0000000000000000 ]---
>
>
> Issue is introduced by the patch: 
> be647e2c76b27f409cdd520f66c95be888b553a3.

Exactly this was the concern when introducing a behavior change in a
sensitive area of the code just to silence lockdep...

I'm assuming that the bad dereference is:
         synchronize_srcu(&ns->ctrl->srcu);

Btw, looking at the code again, I suspect that synchronizing SRCU on
every ns removal greatly slows down batch removal of many namespaces...

>
>
> Reverting it, issue is not seen.
>
>
> Regards,
>
> Venkat.
>
>



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel OOPS while creating a NVMe Namespace
  2024-06-10  9:57 ` Sagi Grimberg
@ 2024-06-10 15:24   ` Keith Busch
  0 siblings, 0 replies; 11+ messages in thread
From: Keith Busch @ 2024-06-10 15:24 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Venkat Rao Bagalkote, linux-block, linux-kernel, linux-nvme,
	sachinp

On Mon, Jun 10, 2024 at 12:57:02PM +0300, Sagi Grimberg wrote:
> On 10/06/2024 10:51, Venkat Rao Bagalkote wrote:
> > 
> > [  140.209777] BUG: Unable to handle kernel data access at 0x18d7003065646fee
> > [  140.209792] Faulting instruction address: 0xc00000000023b45c
> > [  140.209798] Oops: Kernel access of bad area, sig: 11 [#1]
> > [  140.209802] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
> > [  140.209864] CPU: 2 PID: 129 Comm: kworker/u65:3 Kdump: loaded Not tainted 6.10.0-rc3 #2
> > [  140.209870] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
> > [  140.209876] Workqueue: nvme-wq nvme_scan_work [nvme_core]
> > [  140.209889] NIP:  c00000000023b45c LR: c008000006a96b20 CTR: c00000000023b42c
> > [  140.209894] REGS: c0000000506078a0 TRAP: 0380   Not tainted (6.10.0-rc3)
> > [  140.209899] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24000244  XER: 00000000
> > [  140.209975] NIP [c00000000023b45c] synchronize_srcu+0x30/0x1c0
> > [  140.209984] LR [c008000006a96b20] nvme_ns_remove+0x80/0x2d8 [nvme_core]
> > [  140.209994] Call Trace:
> > [  140.209997] [c000000050607b90] [c008000006a96b20] nvme_ns_remove+0x80/0x2d8 [nvme_core]
> > [  140.210008] [c000000050607bd0] [c008000006a972b4] nvme_remove_invalid_namespaces+0x144/0x1ac [nvme_core]
> > [  140.210020] [c000000050607c60] [c008000006a9dbd4] nvme_scan_ns_list+0x19c/0x370 [nvme_core]
> > [  140.210032] [c000000050607d70] [c008000006a9dfc8] nvme_scan_work+0xc8/0x278 [nvme_core]
> > [  140.210043] [c000000050607e40] [c00000000019414c] process_one_work+0x20c/0x4f4
> > [  140.210051] [c000000050607ef0] [c0000000001950cc] worker_thread+0x378/0x544
> > [  140.210058] [c000000050607f90] [c0000000001a164c] kthread+0x138/0x140
> > [  140.210065] [c000000050607fe0] [c00000000000df98] start_kernel_thread+0x14/0x18
> > [  140.210072] Code: 3c4c0134 384282d4 7c0802a6 60000000 7c0802a6 fbc1fff0 fba1ffe8 fbe1fff8 7c7e1b78 f8010010 f821ffb1 e9230010 <e9290080> 7c2004ac 71290003 41820008
> > [  140.210093] ---[ end trace 0000000000000000 ]---
> > 
> > Issue is introduced by the patch:
> > be647e2c76b27f409cdd520f66c95be888b553a3.
> 
> Exactly this was the concern when introducing a behavior change in a
> sensitive area of the code
> to silence lockdep...

No risk, no reward. :)

If we really can't figure this out, we can always revert and revisit for
the next merge.

> I'm assuming that the bad dereference is:
>         synchronize_srcu(&ns->ctrl->srcu);

That would have to be it, based on the report. Not sure what the problem
could be, though: the ns->ctrl must have been valid or we would have
failed earlier, and the srcu struct can't be released while the
controller is still in use by any namespace.

Anyway, I tested this path quite a bit, but I'll revisit with dynamic
attachments instead and see if that helps.

> btw, looking at the code again, I'm assuming that synchronizing srcu in
> every ns remove
> slows down batch removal of many namespaces greatly...

I may need to test this out, but I thought srcu sync was pretty quick if
there were no active readers, which there shouldn't be here outside
unusual cases.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel OOPS while creating a NVMe Namespace
  2024-06-10  7:51 Kernel OOPS while creating a NVMe Namespace Venkat Rao Bagalkote
  2024-06-10  9:43 ` Hillf Danton
  2024-06-10  9:57 ` Sagi Grimberg
@ 2024-06-10 18:32 ` Chaitanya Kulkarni
  2024-06-10 18:53 ` Keith Busch
  3 siblings, 0 replies; 11+ messages in thread
From: Chaitanya Kulkarni @ 2024-06-10 18:32 UTC (permalink / raw)
  To: Venkat Rao Bagalkote
  Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org, sachinp@linux.vnet.com,
	kbusch@kernel.org, sagi@grimberg.me

On 6/10/24 00:51, Venkat Rao Bagalkote wrote:
> Greetings!!!
>
> Observing Kernel OOPS, while creating namespace on a NVMe device.
>
> [  140.209777] BUG: Unable to handle kernel data access at 
> 0x18d7003065646fee
> [  140.209792] Faulting instruction address: 0xc00000000023b45c
> [  140.209798] Oops: Kernel access of bad area, sig: 11 [#1]
> [  140.209802] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
> [  140.209864] CPU: 2 PID: 129 Comm: kworker/u65:3 Kdump: loaded Not 
> tainted 6.10.0-rc3 #2
> [  140.209870] Hardware name: IBM,9009-42A POWER9 (raw) 0x4e0202 
> 0xf000005 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
> [  140.209876] Workqueue: nvme-wq nvme_scan_work [nvme_core]
> [  140.209889] NIP:  c00000000023b45c LR: c008000006a96b20 CTR: 
> c00000000023b42c
> [  140.209894] REGS: c0000000506078a0 TRAP: 0380   Not tainted 
> (6.10.0-rc3)
> [  140.209899] MSR:  800000000280b033 
> <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24000244  XER: 00000000
> [  140.209975] NIP [c00000000023b45c] synchronize_srcu+0x30/0x1c0
> [  140.209984] LR [c008000006a96b20] nvme_ns_remove+0x80/0x2d8 
> [nvme_core]
> [  140.209994] Call Trace:
> [  140.209997] [c000000050607b90] [c008000006a96b20] 
> nvme_ns_remove+0x80/0x2d8 [nvme_core]
> [  140.210008] [c000000050607bd0] [c008000006a972b4] 
> nvme_remove_invalid_namespaces+0x144/0x1ac [nvme_core]
> [  140.210020] [c000000050607c60] [c008000006a9dbd4] 
> nvme_scan_ns_list+0x19c/0x370 [nvme_core]
> [  140.210032] [c000000050607d70] [c008000006a9dfc8] 
> nvme_scan_work+0xc8/0x278 [nvme_core]
> [  140.210043] [c000000050607e40] [c00000000019414c] 
> process_one_work+0x20c/0x4f4
> [  140.210051] [c000000050607ef0] [c0000000001950cc] 
> worker_thread+0x378/0x544
> [  140.210058] [c000000050607f90] [c0000000001a164c] kthread+0x138/0x140
> [  140.210065] [c000000050607fe0] [c00000000000df98] 
> start_kernel_thread+0x14/0x18
> [  140.210072] Code: 3c4c0134 384282d4 7c0802a6 60000000 7c0802a6 
> fbc1fff0 fba1ffe8 fbe1fff8 7c7e1b78 f8010010 f821ffb1 e9230010 
> <e9290080> 7c2004ac 71290003 41820008
> [  140.210093] ---[ end trace 0000000000000000 ]---
>
>
> Issue is introduced by the patch: 
> be647e2c76b27f409cdd520f66c95be888b553a3.
>
>
> Reverting it, issue is not seen.
>
>
> Regards,
>
> Venkat.
>
>
>

Do you have steps that you can share?

Did you find this using blktests? If not, can you please submit a
blktests case for this issue? This clearly needs to be tested regularly,
since people are working on this sensitive area...

-ck



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel OOPS while creating a NVMe Namespace
  2024-06-10  7:51 Kernel OOPS while creating a NVMe Namespace Venkat Rao Bagalkote
                   ` (2 preceding siblings ...)
  2024-06-10 18:32 ` Chaitanya Kulkarni
@ 2024-06-10 18:53 ` Keith Busch
  2024-06-10 19:05   ` Sagi Grimberg
  3 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2024-06-10 18:53 UTC (permalink / raw)
  To: Venkat Rao Bagalkote; +Cc: sagi, linux-block, linux-kernel, linux-nvme, sachinp

On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
> 
> Issue is introduced by the patch: be647e2c76b27f409cdd520f66c95be888b553a3.

My mistake. The namespace remove list appears to be getting corrupted
because I'm using the wrong APIs to replace a "list_move_tail". This
fixes the issue on my end:

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7c9f91314d366..c667290de5133 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 
 	mutex_lock(&ctrl->namespaces_lock);
 	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
-		if (ns->head->ns_id > nsid)
-			list_splice_init_rcu(&ns->list, &rm_list,
-					     synchronize_rcu);
+		if (ns->head->ns_id > nsid) {
+			list_del_rcu(&ns->list);
+			list_add_tail_rcu(&ns->list, &rm_list);
+		}
 	}
 	mutex_unlock(&ctrl->namespaces_lock);
 	synchronize_srcu(&ctrl->srcu);
--


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Kernel OOPS while creating a NVMe Namespace
  2024-06-10 18:53 ` Keith Busch
@ 2024-06-10 19:05   ` Sagi Grimberg
  2024-06-10 19:15     ` Keith Busch
  0 siblings, 1 reply; 11+ messages in thread
From: Sagi Grimberg @ 2024-06-10 19:05 UTC (permalink / raw)
  To: Keith Busch, Venkat Rao Bagalkote
  Cc: linux-block, linux-kernel, linux-nvme, sachinp



On 10/06/2024 21:53, Keith Busch wrote:
> On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
>> Issue is introduced by the patch: be647e2c76b27f409cdd520f66c95be888b553a3.
> My mistake. The namespace remove list appears to be getting corrupted
> because I'm using the wrong APIs to replace a "list_move_tail". This is
> fixing the issue on my end:
>
> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 7c9f91314d366..c667290de5133 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>   
>   	mutex_lock(&ctrl->namespaces_lock);
>   	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> -		if (ns->head->ns_id > nsid)
> -			list_splice_init_rcu(&ns->list, &rm_list,
> -					     synchronize_rcu);
> +		if (ns->head->ns_id > nsid) {
> +			list_del_rcu(&ns->list);
> +			list_add_tail_rcu(&ns->list, &rm_list);
> +		}
>   	}
>   	mutex_unlock(&ctrl->namespaces_lock);
>   	synchronize_srcu(&ctrl->srcu);
> --

Can we add a reproducer for this in blktests? I'm assuming that we can
easily trigger this by adding/removing nvmet namespaces?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel OOPS while creating a NVMe Namespace
  2024-06-10 19:05   ` Sagi Grimberg
@ 2024-06-10 19:15     ` Keith Busch
  2024-06-10 19:17       ` Sagi Grimberg
  0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2024-06-10 19:15 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Venkat Rao Bagalkote, linux-block, linux-kernel, linux-nvme,
	sachinp

On Mon, Jun 10, 2024 at 10:05:00PM +0300, Sagi Grimberg wrote:
> 
> 
> On 10/06/2024 21:53, Keith Busch wrote:
> > On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
> > > Issue is introduced by the patch: be647e2c76b27f409cdd520f66c95be888b553a3.
> > My mistake. The namespace remove list appears to be getting corrupted
> > because I'm using the wrong APIs to replace a "list_move_tail". This is
> > fixing the issue on my end:
> > 
> > ---
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 7c9f91314d366..c667290de5133 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
> >   	mutex_lock(&ctrl->namespaces_lock);
> >   	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> > -		if (ns->head->ns_id > nsid)
> > -			list_splice_init_rcu(&ns->list, &rm_list,
> > -					     synchronize_rcu);
> > +		if (ns->head->ns_id > nsid) {
> > +			list_del_rcu(&ns->list);
> > +			list_add_tail_rcu(&ns->list, &rm_list);
> > +		}
> >   	}
> >   	mutex_unlock(&ctrl->namespaces_lock);
> >   	synchronize_srcu(&ctrl->srcu);
> > --
> 
> Can we add a reproducer for this in blktests? I'm assuming that we can
> easily trigger this
> with adding/removing nvmet namespaces?

I'm testing this with Namespace Management commands, which nvmet doesn't
support. You can recreate the issue by detaching the last namespace.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel OOPS while creating a NVMe Namespace
  2024-06-10 19:15     ` Keith Busch
@ 2024-06-10 19:17       ` Sagi Grimberg
  2024-06-10 19:33         ` Keith Busch
  0 siblings, 1 reply; 11+ messages in thread
From: Sagi Grimberg @ 2024-06-10 19:17 UTC (permalink / raw)
  To: Keith Busch
  Cc: Venkat Rao Bagalkote, linux-block, linux-kernel, linux-nvme,
	sachinp



On 10/06/2024 22:15, Keith Busch wrote:
> On Mon, Jun 10, 2024 at 10:05:00PM +0300, Sagi Grimberg wrote:
>>
>> On 10/06/2024 21:53, Keith Busch wrote:
>>> On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
>>>> Issue is introduced by the patch: be647e2c76b27f409cdd520f66c95be888b553a3.
>>> My mistake. The namespace remove list appears to be getting corrupted
>>> because I'm using the wrong APIs to replace a "list_move_tail". This is
>>> fixing the issue on my end:
>>>
>>> ---
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 7c9f91314d366..c667290de5133 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>>>    	mutex_lock(&ctrl->namespaces_lock);
>>>    	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
>>> -		if (ns->head->ns_id > nsid)
>>> -			list_splice_init_rcu(&ns->list, &rm_list,
>>> -					     synchronize_rcu);
>>> +		if (ns->head->ns_id > nsid) {
>>> +			list_del_rcu(&ns->list);
>>> +			list_add_tail_rcu(&ns->list, &rm_list);
>>> +		}
>>>    	}
>>>    	mutex_unlock(&ctrl->namespaces_lock);
>>>    	synchronize_srcu(&ctrl->srcu);
>>> --
>> Can we add a reproducer for this in blktests? I'm assuming that we can
>> easily trigger this
>> with adding/removing nvmet namespaces?
> I'm testing this with Namespace Manamgent commands, which nvmet doesn't
> support. You can recreate the issue by detaching the last namespace.
>

I think the same will happen in a test that creates two namespaces and
then does "echo 0 > ns/enable".


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel OOPS while creating a NVMe Namespace
  2024-06-10 19:17       ` Sagi Grimberg
@ 2024-06-10 19:33         ` Keith Busch
  2024-06-17  9:10           ` Nilay Shroff
  0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2024-06-10 19:33 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Venkat Rao Bagalkote, linux-block, linux-kernel, linux-nvme,
	sachinp

On Mon, Jun 10, 2024 at 10:17:42PM +0300, Sagi Grimberg wrote:
> On 10/06/2024 22:15, Keith Busch wrote:
> > On Mon, Jun 10, 2024 at 10:05:00PM +0300, Sagi Grimberg wrote:
> > > 
> > > On 10/06/2024 21:53, Keith Busch wrote:
> > > > On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
> > > > > Issue is introduced by the patch: be647e2c76b27f409cdd520f66c95be888b553a3.
> > > > My mistake. The namespace remove list appears to be getting corrupted
> > > > because I'm using the wrong APIs to replace a "list_move_tail". This is
> > > > fixing the issue on my end:
> > > > 
> > > > ---
> > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > > > index 7c9f91314d366..c667290de5133 100644
> > > > --- a/drivers/nvme/host/core.c
> > > > +++ b/drivers/nvme/host/core.c
> > > > @@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
> > > >    	mutex_lock(&ctrl->namespaces_lock);
> > > >    	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> > > > -		if (ns->head->ns_id > nsid)
> > > > -			list_splice_init_rcu(&ns->list, &rm_list,
> > > > -					     synchronize_rcu);
> > > > +		if (ns->head->ns_id > nsid) {
> > > > +			list_del_rcu(&ns->list);
> > > > +			list_add_tail_rcu(&ns->list, &rm_list);
> > > > +		}
> > > >    	}
> > > >    	mutex_unlock(&ctrl->namespaces_lock);
> > > >    	synchronize_srcu(&ctrl->srcu);
> > > > --
> > > Can we add a reproducer for this in blktests? I'm assuming that we can
> > > easily trigger this
> > > with adding/removing nvmet namespaces?
> > I'm testing this with Namespace Manamgent commands, which nvmet doesn't
> > support. You can recreate the issue by detaching the last namespace.
> > 
> 
> I think the same will happen in a test that creates two namespaces and then
> echo 0 > ns/enable.

Looks like nvme/016 tests this. It's reporting as "passed" on my end,
but I don't think it's actually testing the driver as intended. Still
messing with it.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel OOPS while creating a NVMe Namespace
  2024-06-10 19:33         ` Keith Busch
@ 2024-06-17  9:10           ` Nilay Shroff
  0 siblings, 0 replies; 11+ messages in thread
From: Nilay Shroff @ 2024-06-17  9:10 UTC (permalink / raw)
  To: Keith Busch, Sagi Grimberg
  Cc: Venkat Rao Bagalkote, linux-block, linux-kernel, linux-nvme,
	sachinp, Chaitanya Kulkarni, shinichiro.kawasaki



On 6/11/24 01:03, Keith Busch wrote:
> On Mon, Jun 10, 2024 at 10:17:42PM +0300, Sagi Grimberg wrote:
>> On 10/06/2024 22:15, Keith Busch wrote:
>>> On Mon, Jun 10, 2024 at 10:05:00PM +0300, Sagi Grimberg wrote:
>>>>
>>>> On 10/06/2024 21:53, Keith Busch wrote:
>>>>> On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
>>>>>> Issue is introduced by the patch: be647e2c76b27f409cdd520f66c95be888b553a3.
>>>>> My mistake. The namespace remove list appears to be getting corrupted
>>>>> because I'm using the wrong APIs to replace a "list_move_tail". This is
>>>>> fixing the issue on my end:
>>>>>
>>>>> ---
>>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>>> index 7c9f91314d366..c667290de5133 100644
>>>>> --- a/drivers/nvme/host/core.c
>>>>> +++ b/drivers/nvme/host/core.c
>>>>> @@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>>>>>    	mutex_lock(&ctrl->namespaces_lock);
>>>>>    	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
>>>>> -		if (ns->head->ns_id > nsid)
>>>>> -			list_splice_init_rcu(&ns->list, &rm_list,
>>>>> -					     synchronize_rcu);
>>>>> +		if (ns->head->ns_id > nsid) {
>>>>> +			list_del_rcu(&ns->list);
>>>>> +			list_add_tail_rcu(&ns->list, &rm_list);
>>>>> +		}
>>>>>    	}
>>>>>    	mutex_unlock(&ctrl->namespaces_lock);
>>>>>    	synchronize_srcu(&ctrl->srcu);
>>>>> --
>>>> Can we add a reproducer for this in blktests? I'm assuming that we can
>>>> easily trigger this
>>>> with adding/removing nvmet namespaces?
>>> I'm testing this with Namespace Manamgent commands, which nvmet doesn't
>>> support. You can recreate the issue by detaching the last namespace.
>>>
>>
>> I think the same will happen in a test that creates two namespaces and then
>> echo 0 > ns/enable.
> 
> Looks like nvme/016 tess this. It's reporting as "passed" on my end, but
> I don't think it's actually testing the driver as intended. Still
> messing with it.
> 
I believe nvme/016 creates and deletes the namespace; however, there is
no backstore associated with the loop device, and hence nvme/016 is
unable to recreate this issue.

To recreate it, we need to associate a backstore (either a block device
or a regular file) with the loop device and then use it for creating and
then deleting the namespace.

I wrote a blktests case for this specific regression and was able to
trigger the crash. I will submit it in a separate email.

Thanks,
--Nilay




^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-06-17  9:11 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-10  7:51 Kernel OOPS while creating a NVMe Namespace Venkat Rao Bagalkote
2024-06-10  9:43 ` Hillf Danton
2024-06-10  9:57 ` Sagi Grimberg
2024-06-10 15:24   ` Keith Busch
2024-06-10 18:32 ` Chaitanya Kulkarni
2024-06-10 18:53 ` Keith Busch
2024-06-10 19:05   ` Sagi Grimberg
2024-06-10 19:15     ` Keith Busch
2024-06-10 19:17       ` Sagi Grimberg
2024-06-10 19:33         ` Keith Busch
2024-06-17  9:10           ` Nilay Shroff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox