linux-kernel.vger.kernel.org archive mirror
* [blktest/nvme/058] Kernel OOPs while running nvme/058 tests
@ 2025-08-26  8:30 Venkat Rao Bagalkote
  2025-08-26  9:08 ` Ming Lei
  0 siblings, 1 reply; 4+ messages in thread
From: Venkat Rao Bagalkote @ 2025-08-26  8:30 UTC (permalink / raw)
  To: LKML, linux-nvme, Nilay Shroff

Greetings!!!


IBM CI has reported a kernel oops while running the blktests suite
(nvme/058 test).


Kernel Repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


Traces:


[37496.800225] BUG: Kernel NULL pointer dereference at 0x00000000
[37496.800230] Faulting instruction address: 0xc0000000008a34b0
[37496.800235] Oops: Kernel access of bad area, sig: 11 [#1]
[37496.800239] LE PAGE_SIZE=64K MMU=Hash  SMP NR_CPUS=8192 NUMA pSeries
[37496.800245] Modules linked in: nvme_loop(E) nft_compat(E) bonding(E) 
nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) 
nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) 
nft_ct(E) nft_chain_nat(E) rfkill(E) ip_set(E) mlx5_ib(E) ib_uverbs(E) 
ib_core(E) pseries_rng(E) vmx_crypto(E) drm(E) 
drm_panel_orientation_quirks(E) xfs(E) sr_mod(E) cdrom(E) sd_mod(E) 
sg(E) lpfc(E) nvmet_fc(E) ibmvscsi(E) mlx5_core(E) ibmveth(E) 
scsi_transport_srp(E) nvmet(E) nvme_fc(E) nvme_fabrics(E) mlxfw(E) 
nvme_core(E) tls(E) scsi_transport_fc(E) psample(E) fuse(E) [last 
unloaded: nvme_loop(E)]
[37496.800309] CPU: 40 UID: 0 PID: 417 Comm: kworker/40:1H Kdump: loaded 
Tainted: G            E       6.17.0-rc3-gb6add54ba618 #1 VOLUNTARY
[37496.800317] Tainted: [E]=UNSIGNED_MODULE
[37496.800320] Hardware name: IBM,8375-42A POWER9 (architected) 0x4e0202 
0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries
[37496.800326] Workqueue: kblockd nvme_requeue_work [nvme_core]
[37496.800339] NIP:  c0000000008a34b0 LR: c000000000869654 CTR: 
c00000000086954c
[37496.800344] REGS: c000000015e97970 TRAP: 0380   Tainted: G       E    
     (6.17.0-rc3-gb6add54ba618)
[37496.800349] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  
CR: 4800020f  XER: 0000000f
[37496.800365] CFAR: c000000000869650 IRQMASK: 0
[37496.800365] GPR00: c000000000869654 c000000015e97c10 c000000001c88100 
0000000000000000
[37496.800365] GPR04: c0000000b005c400 c000000015e979d0 c000000015e979c8 
0000001037ba0000
[37496.800365] GPR08: 0000000000000100 c00000013923db18 0000000000000140 
c008000005bba2d0
[37496.800365] GPR12: c00000000086954c c000000017f7e700 c0000000001aab20 
c000000015b533c0
[37496.800365] GPR16: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[37496.800365] GPR20: c000001039e1c418 c000000002e075d0 c0000000027a3818 
fffffffffffffef7
[37496.800365] GPR24: 0000000000000402 c000000015b72000 c0000001c0374850 
c000000134492010
[37496.800365] GPR28: 0000000000000001 0000000000000001 c0000000b005c400 
0000000000000000
[37496.800424] NIP [c0000000008a34b0] __rq_qos_done_bio+0x3c/0x88
[37496.800433] LR [c000000000869654] bio_endio+0x108/0x2b8
[37496.800440] Call Trace:
[37496.800442] [c000000015e97c40] [c000000000869654] bio_endio+0x108/0x2b8
[37496.800450] [c000000015e97c80] [c008000005bb6794] 
nvme_ns_head_submit_bio+0x25c/0x358 [nvme_core]
[37496.800462] [c000000015e97d10] [c00000000087320c] 
__submit_bio+0x150/0x304
[37496.800469] [c000000015e97da0] [c000000000873444] 
__submit_bio_noacct+0x84/0x250
[37496.800476] [c000000015e97e10] [c008000005bb5540] 
nvme_requeue_work+0x94/0xd8 [nvme_core]
[37496.800488] [c000000015e97e40] [c00000000019c72c] 
process_one_work+0x1fc/0x4ac
[37496.800497] [c000000015e97ef0] [c00000000019d6dc] 
worker_thread+0x340/0x510
[37496.800504] [c000000015e97f90] [c0000000001aac4c] kthread+0x134/0x164
[37496.800512] [c000000015e97fe0] [c00000000000df78] 
start_kernel_thread+0x14/0x18
[37496.800518] Code: 60000000 7c0802a6 fbc1fff0 fbe1fff8 7c9e2378 
7c7f1b78 f8010010 f821ffd1 f8410018 60000000 60000000 60000000 
<e93f0000> 7fe3fb78 7fc4f378 e9290030
[37496.800540] ---[ end trace 0000000000000000 ]---
[37496.827403] pstore: backend (nvram) writing error (-1)
[37496.827409]


If you happen to fix this, please add the below tag.


Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>


Regards,

Venkat.



* Re: [blktest/nvme/058] Kernel OOPs while running nvme/058 tests
  2025-08-26  8:30 [blktest/nvme/058] Kernel OOPs while running nvme/058 tests Venkat Rao Bagalkote
@ 2025-08-26  9:08 ` Ming Lei
  2025-08-26  9:56   ` Nilay Shroff
  0 siblings, 1 reply; 4+ messages in thread
From: Ming Lei @ 2025-08-26  9:08 UTC (permalink / raw)
  To: Venkat Rao Bagalkote; +Cc: LKML, linux-nvme, Nilay Shroff, linux-block

On Tue, Aug 26, 2025 at 02:00:56PM +0530, Venkat Rao Bagalkote wrote:
> Greetings!!!
> 
> 
> IBM CI has reported a kernel oops while running the blktests suite
> (nvme/058 test).
> 
> 
> Kernel Repo:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> 
> 
> Traces:
> 
> 
> [37496.800225] BUG: Kernel NULL pointer dereference at 0x00000000
> [37496.800230] Faulting instruction address: 0xc0000000008a34b0
> [37496.800235] Oops: Kernel access of bad area, sig: 11 [#1]

...

> [37496.800365] GPR28: 0000000000000001 0000000000000001 c0000000b005c400
> 0000000000000000
> [37496.800424] NIP [c0000000008a34b0] __rq_qos_done_bio+0x3c/0x88

It looks like a regression from 370ac285f23a ("block: avoid cpu_hotplug_lock depedency on freeze_lock").
For nvme mpath, the same bio crosses two drivers, so the QUEUE_FLAG_QOS_ENABLED & q->rq_qos
checks can't be skipped.


Thanks,
Ming



* Re: [blktest/nvme/058] Kernel OOPs while running nvme/058 tests
  2025-08-26  9:08 ` Ming Lei
@ 2025-08-26  9:56   ` Nilay Shroff
  2025-08-26 14:49     ` Ming Lei
  0 siblings, 1 reply; 4+ messages in thread
From: Nilay Shroff @ 2025-08-26  9:56 UTC (permalink / raw)
  To: Ming Lei, Venkat Rao Bagalkote; +Cc: LKML, linux-nvme, linux-block



On 8/26/25 2:38 PM, Ming Lei wrote:
> On Tue, Aug 26, 2025 at 02:00:56PM +0530, Venkat Rao Bagalkote wrote:
>> Greetings!!!
>>
>>
>> IBM CI has reported a kernel oops while running the blktests suite
>> (nvme/058 test).
>>
>>
>> Kernel Repo:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>
>>
>> Traces:
>>
>>
>> [37496.800225] BUG: Kernel NULL pointer dereference at 0x00000000
>> [37496.800230] Faulting instruction address: 0xc0000000008a34b0
>> [37496.800235] Oops: Kernel access of bad area, sig: 11 [#1]
> 
> ...
> 
>> [37496.800365] GPR28: 0000000000000001 0000000000000001 c0000000b005c400
>> 0000000000000000
>> [37496.800424] NIP [c0000000008a34b0] __rq_qos_done_bio+0x3c/0x88
> 
> It looks like a regression from 370ac285f23a ("block: avoid cpu_hotplug_lock depedency on freeze_lock").
> For nvme mpath, the same bio crosses two drivers, so the QUEUE_FLAG_QOS_ENABLED & q->rq_qos
> checks can't be skipped.
> 
Thanks, Ming, for looking at it. And yes, you are correct: we can't skip
the QUEUE_FLAG_QOS_ENABLED & q->rq_qos checks for NVMe. However, this
issue only manifests with NVMe multipath enabled, as that creates the
stacked NVMe devices. So shall I send the fix, or are you going to send
the patch with the fix?

Thanks,
--Nilay


* Re: [blktest/nvme/058] Kernel OOPs while running nvme/058 tests
  2025-08-26  9:56   ` Nilay Shroff
@ 2025-08-26 14:49     ` Ming Lei
  0 siblings, 0 replies; 4+ messages in thread
From: Ming Lei @ 2025-08-26 14:49 UTC (permalink / raw)
  To: Nilay Shroff; +Cc: Venkat Rao Bagalkote, LKML, linux-nvme, linux-block

On Tue, Aug 26, 2025 at 03:26:02PM +0530, Nilay Shroff wrote:
> 
> 
> On 8/26/25 2:38 PM, Ming Lei wrote:
> > On Tue, Aug 26, 2025 at 02:00:56PM +0530, Venkat Rao Bagalkote wrote:
> >> Greetings!!!
> >>
> >>
> >> IBM CI has reported a kernel oops while running the blktests suite
> >> (nvme/058 test).
> >>
> >>
> >> Kernel Repo:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> >>
> >>
> >> Traces:
> >>
> >>
> >> [37496.800225] BUG: Kernel NULL pointer dereference at 0x00000000
> >> [37496.800230] Faulting instruction address: 0xc0000000008a34b0
> >> [37496.800235] Oops: Kernel access of bad area, sig: 11 [#1]
> > 
> > ...
> > 
> >> [37496.800365] GPR28: 0000000000000001 0000000000000001 c0000000b005c400
> >> 0000000000000000
> >> [37496.800424] NIP [c0000000008a34b0] __rq_qos_done_bio+0x3c/0x88
> > 
> > It looks like a regression from 370ac285f23a ("block: avoid cpu_hotplug_lock depedency on freeze_lock").
> > For nvme mpath, the same bio crosses two drivers, so the QUEUE_FLAG_QOS_ENABLED & q->rq_qos
> > checks can't be skipped.
> > 
> Thanks, Ming, for looking at it. And yes, you are correct: we can't skip
> the QUEUE_FLAG_QOS_ENABLED & q->rq_qos checks for NVMe. However, this
> issue only manifests with NVMe multipath enabled, as that creates the
> stacked NVMe devices. So shall I send the fix, or are you going to send
> the patch with the fix?

Yeah, please go ahead and prepare the fix.


Thanks,
Ming



