LinuxPPC-Dev Archive on lore.kernel.org
* [linux-next20260529] kernel BUG at kernel/sched/core.c:7512!
@ 2026-06-01  6:41 Venkat Rao Bagalkote
  2026-06-01  9:16 ` Shrikanth Hegde
  0 siblings, 1 reply; 8+ messages in thread
From: Venkat Rao Bagalkote @ 2026-06-01  6:41 UTC (permalink / raw)
  To: Peter Zijlstra, Shrikanth Hegde, Srikar Dronamraju,
	Madhavan Srinivasan
  Cc: linuxppc-dev, LKML

Greetings!!!


I hit a kernel BUG on a linux-next kernel running on ppc64le (Power11 
LPAR). The issue was observed once in CI (Avocado tests) and I haven’t 
been able to reproduce it reliably yet.

Architecture: ppc64le (Power11, pSeries)
Kernel: 7.1.0-rc5-next-20260529
Config: PREEMPT(lazy)
CPUs: large system (NR_CPUS=8192)


So far, I have not reproduced the crash, but I am trying to stress 
similar conditions using:

parallel read workloads (fio / dd)
memory pressure


Traces:

  (5/8) /home/upstreamci/avocado-fvt-wrapper/tests/avocado-misc-tests/cpu/ppc64_cpu_test.py:PPC64Test.test_smt_loop;run-run_type-upstream-9cfe: STARTED
[ 1885.176400] crash hp: kexec_trylock() failed, kdump image may be inaccurate
[ 1885.296164] crash hp: kexec_trylock() failed, kdump image may be inaccurate
[ 1885.386120] crash hp: kexec_trylock() failed, kdump image may be inaccurate
[ 1885.556134] crash hp: kexec_trylock() failed, kdump image may be inaccurate
[ 1886.576119] crash hp: kexec_trylock() failed, kdump image may be inaccurate
[ 1886.806060] crash hp: kexec_trylock() failed, kdump image may be inaccurate
[ 1887.026051] crash hp: kexec_trylock() failed, kdump image may be inaccurate
[ 1887.456075] ------------[ cut here ]------------
[ 1887.456101] kernel BUG at kernel/sched/core.c:7512!
[ 1887.456107] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1887.456111] LE PAGE_SIZE=4K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
[ 1887.456116] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding tls ip_set rfkill nf_tables fsdev_dax kmem device_dax pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2 sd_mod nd_pmem papr_scm sg libnvdimm ibmvscsi ibmveth scsi_transport_srp pseries_wdt
[ 1887.456173] CPU: 28 UID: 0 PID: 85305 Comm: kexec Not tainted 7.1.0-rc5-next-20260529 #1 PREEMPT(lazy)
[ 1887.456180] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[ 1887.456185] NIP:  c0000000013a8e8c LR: c0000000003483bc CTR: 0000000000000000
[ 1887.456190] REGS: c000000069f03070 TRAP: 0700   Not tainted (7.1.0-rc5-next-20260529)
[ 1887.456195] MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24428222  XER: 0000005a
[ 1887.456208] CFAR: c0000000003483b8 IRQMASK: 0
[ 1887.456208] GPR00: c0000000003483bc c000000069f03330 c000000001a82100 c000000069f033e0
[ 1887.456208] GPR04: 0000000000000000 0000000000000001 0000000000000001 c000000006dd3b00
[ 1887.456208] GPR08: ffffffffffffff00 0000000000000001 0000000000000000 0000000024428220
[ 1887.456208] GPR12: 0000000000000300 c000000effdbef00 0000000000000000 0000000000000000
[ 1887.456208] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 1887.456208] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 1887.456208] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 1887.456208] GPR28: 0000000000000000 0000000000000000 0000000000000000 c000000069f033e0
[ 1887.456265] NIP [c0000000013a8e8c] preempt_schedule_irq+0x44/0x118
[ 1887.456274] LR [c0000000003483bc] dynamic_irqentry_exit_cond_resched+0x40/0x1a4
[ 1887.456282] Call Trace:
[ 1887.456284] [c000000069f03360] [c0000000003483bc] dynamic_irqentry_exit_cond_resched+0x40/0x1a4
[ 1887.456291] [c000000069f03380] [c00000000014f3bc] do_page_fault+0xc0/0x104
[ 1887.456298] [c000000069f033b0] [c000000000008be0] data_access_common_virt+0x210/0x220
[ 1887.456306] ---- interrupt: 300 at __copy_tofrom_user_base+0xac/0x5a4
[ 1887.456313] NIP:  c00000000017fc38 LR: c000000000aaa684 CTR: 0000000000000000
[ 1887.456317] REGS: c000000069f033e0 TRAP: 0300   Not tainted (7.1.0-rc5-next-20260529)
[ 1887.456322] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24428220  XER: 2004005a
[ 1887.456334] CFAR: c00000000017fc34 DAR: 00003fff879a8000 DSISR: 42000000 IRQMASK: 0
[ 1887.456334] GPR00: 0000000000000000 c000000069f036a0 c000000001a82100 00003fff879a8000
[ 1887.456334] GPR04: c0000000bb314ff0 0000000000001000 69f0000606480600 0200c4080368f028
[ 1887.456334] GPR08: 09036af00005d9c4 0600000200e80803 0000000000000000 0000000000000030
[ 1887.456334] GPR12: 0000000000000040 c000000effdbef00 0000000000000000 000000000000000e
[ 1887.456334] GPR16: 0000000004a00000 000000000000001f c000000069f038a0 c00000006e73e500
[ 1887.456334] GPR20: c00000006f0ff6a8 0000000000000000 c00000006f0ff540 0000000000000001
[ 1887.456334] GPR24: 000000001816ce60 c0000000bb314000 c000000002e48730 c000000069f03a30
[ 1887.456334] GPR28: c0000000bb314000 00003fff879a7010 0000000000000010 0000000000001000
[ 1887.456393] NIP [c00000000017fc38] __copy_tofrom_user_base+0xac/0x5a4
[ 1887.456399] LR [c000000000aaa684] raw_copy_to_user+0x12c/0x314
[ 1887.456405] ---- interrupt: 300
[ 1887.456408] [c000000069f036a0] [c000000000aaa5f4] raw_copy_to_user+0x9c/0x314 (unreliable)
[ 1887.456416] [c000000069f036e0] [c000000000aacd08] _copy_to_iter+0xe4/0x79c
[ 1887.456423] [c000000069f037a0] [c000000000ab01ec] copy_page_to_iter+0xd4/0x1a4
[ 1887.456429] [c000000069f037f0] [c0000000005ddc34] filemap_read+0x420/0x4f0
[ 1887.456436] [c000000069f039c0] [c0080000043443e0] ext4_file_read_iter+0x78/0x31c [ext4]
[ 1887.456517] [c000000069f03a10] [c000000000796498] vfs_read+0x2a8/0x3c8
[ 1887.456524] [c000000069f03ac0] [c00000000079726c] ksys_read+0x88/0x140
[ 1887.456530] [c000000069f03b10] [c000000000032f98] system_call_exception+0x198/0x4e0
[ 1887.456537] [c000000069f03e30] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
[ 1887.456544] ---- interrupt: 3000 at 0x3fff9b133cf4
[ 1887.456549] NIP:  00003fff9b133cf4 LR: 00003fff9b133cf4 CTR: 0000000000000000
[ 1887.456554] REGS: c000000069f03e60 TRAP: 3000   Not tainted (7.1.0-rc5-next-20260529)
[ 1887.456558] MSR:  800000000000f033 <SF,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 44424402  XER: 00000000
[ 1887.456572] IRQMASK: 0
[ 1887.456572] GPR00: 0000000000000003 00003fffe5fb4190 0000000105087f00 0000000000000003
[ 1887.456572] GPR04: 00003fff82e93010 000000001816ce60 0000000000000022 0000000000000000
[ 1887.456572] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 1887.456572] GPR12: 0000000000000000 00003fff9b4cd860 000000010507f588 0000000000000000
[ 1887.456572] GPR16: ffffffffffffffff 0000000000000000 0000000000000006 0000000000000000
[ 1887.456572] GPR20: 0000000000000001 00003fff9b23039c 00003fff9b2303a0 00003fffe5fb5ee7
[ 1887.456572] GPR24: 0000000000000000 0000000000000000 00003fffe5fb5ee7 00003fffe5fb42d0
[ 1887.456572] GPR28: 0000000000000003 00003fff82e93010 000000001816ce60 0000000000000000
[ 1887.456626] NIP [00003fff9b133cf4] 0x3fff9b133cf4
[ 1887.456630] LR [00003fff9b133cf4] 0x3fff9b133cf4
[ 1887.456634] ---- interrupt: 3000
[ 1887.456637] Code: fbe1fff8 e92d0128 f8010010 f821ffd1 81490000 39200001 2c0a0000 40820014 892d0152 552907fe 7d290034 5529d97e <0b090000> 60000000 3bc00000 ebed0128
[ 1887.456657] ---[ end trace 0000000000000000 ]---
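
As an aid for reading the register dumps in the trace, here is a minimal user-space sketch (not kernel code) that expands an MSR value into the flag names printed in the oops. The bit positions are taken from arch/powerpc/include/asm/reg.h and only the bits that appear in this trace are listed. The helper names msr_decode() and msr_irqs_hard_enabled() are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subset of powerpc MSR bits (positions as in asm/reg.h). */
#define MSR_SF  (1ULL << 63)  /* 64-bit mode */
#define MSR_VEC (1ULL << 25)  /* AltiVec available */
#define MSR_EE  (1ULL << 15)  /* external interrupts enabled */
#define MSR_PR  (1ULL << 14)  /* problem state (user mode) */
#define MSR_FP  (1ULL << 13)  /* floating point available */
#define MSR_ME  (1ULL << 12)  /* machine check enabled */
#define MSR_IR  (1ULL << 5)   /* instruction relocation */
#define MSR_DR  (1ULL << 4)   /* data relocation */
#define MSR_RI  (1ULL << 1)   /* recoverable interrupt */
#define MSR_LE  (1ULL << 0)   /* little endian */

struct msr_flag {
	uint64_t mask;
	const char *name;
};

/* Listed from the highest bit to the lowest, matching the oops output. */
static const struct msr_flag msr_flags[] = {
	{ MSR_SF, "SF" }, { MSR_VEC, "VEC" }, { MSR_EE, "EE" },
	{ MSR_PR, "PR" }, { MSR_FP, "FP" },   { MSR_ME, "ME" },
	{ MSR_IR, "IR" }, { MSR_DR, "DR" },   { MSR_RI, "RI" },
	{ MSR_LE, "LE" },
};

/* Hypothetical helper: write comma-separated flag names into buf. */
static void msr_decode(uint64_t msr, char *buf, size_t len)
{
	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(msr_flags) / sizeof(msr_flags[0]); i++) {
		if (!(msr & msr_flags[i].mask))
			continue;
		if (buf[0] != '\0')
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, msr_flags[i].name, len - strlen(buf) - 1);
	}
}

/* Hypothetical helper: true if MSR[EE] is set. */
static bool msr_irqs_hard_enabled(uint64_t msr)
{
	return (msr & MSR_EE) != 0;
}
```

Applied to the MSR of the TRAP 0700 frame (8000000000029033), this gives SF,EE,ME,IR,DR,RI,LE. EE being set, together with IRQMASK: 0, suggests interrupts were enabled at the point where the BUG fired.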


If you happen to fix this, please add the tag below.

Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>


Regards,

Venkat.





* Re: [linux-next20260529] kernel BUG at kernel/sched/core.c:7512!
  2026-06-01  6:41 [linux-next20260529] kernel BUG at kernel/sched/core.c:7512! Venkat Rao Bagalkote
@ 2026-06-01  9:16 ` Shrikanth Hegde
  2026-06-01  9:56   ` Peter Zijlstra
  2026-06-01 13:33   ` Venkat
  0 siblings, 2 replies; 8+ messages in thread
From: Shrikanth Hegde @ 2026-06-01  9:16 UTC (permalink / raw)
  To: Venkat Rao Bagalkote, Madhavan Srinivasan,
	Mukesh Kumar Chaurasiya, Ritesh Harjani
  Cc: linuxppc-dev, LKML, Srikar Dronamraju, Peter Zijlstra

Hi Venkat. Thanks for the report.

+ mukesh, ritesh

On 6/1/26 12:11 PM, Venkat Rao Bagalkote wrote:
> Greetings!!!
> 
> 
> I hit a kernel BUG on a linux-next kernel running on ppc64le (Power11 
> LPAR). The issue was observed once in CI (Avocado tests) and I haven’t 
> been able to reproduce it reliably yet.
> 

Can you run with lockdep and see if you can hit it?

> Architecture: ppc64le (Power11, pSeries)
> Kernel: 7.1.0-rc5-next-20260529
> Config: PREEMPT(lazy)
> CPUs: large system (NR_CPUS=8192)
> 

This is with GENERIC_ENTRY.

> 
> So far, I have not reproduced the crash, but I am trying to stress 
> similar conditions using:
> 
> parallel read workloads (fio / dd)
> memory pressure
> 
> 
> Traces:
> 
>   (5/8) /home/upstreamci/avocado-fvt-wrapper/tests/avocado-misc-tests/ 
> cpu/ppc64_cpu_test.py:PPC64Test.test_smt_loop;run-run_type- 
> upstream-9cfe: STARTED
> [ 1885.176400] crash hp: kexec_trylock() failed, kdump image may be 
> inaccurate
> [ 1885.296164] crash hp: kexec_trylock() failed, kdump image may be 
> inaccurate
> [ 1885.386120] crash hp: kexec_trylock() failed, kdump image may be 
> inaccurate
> [ 1885.556134] crash hp: kexec_trylock() failed, kdump image may be 
> inaccurate
> [ 1886.576119] crash hp: kexec_trylock() failed, kdump image may be 
> inaccurate
> [ 1886.806060] crash hp: kexec_trylock() failed, kdump image may be 
> inaccurate
> [ 1887.026051] crash hp: kexec_trylock() failed, kdump image may be 
> inaccurate
> [ 1887.456075] ------------[ cut here ]------------
> [ 1887.456101] kernel BUG at kernel/sched/core.c:7512!
> [ 1887.456107] Oops: Exception in kernel mode, sig: 5 [#1]
> [ 1887.456111] LE PAGE_SIZE=4K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
> [ 1887.456116] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 
> nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct 
> nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding 
> tls ip_set rfkill nf_tables fsdev_dax kmem device_dax pseries_rng 
> vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2 sd_mod nd_pmem papr_scm 
> sg libnvdimm ibmvscsi ibmveth scsi_transport_srp pseries_wdt
> [ 1887.456173] CPU: 28 UID: 0 PID: 85305 Comm: kexec Not tainted 7.1.0- 
> rc5-next-20260529 #1 PREEMPT(lazy)
> [ 1887.456180] Hardware name: IBM,9080-HEX Power11 (architected) 
> 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
> [ 1887.456185] NIP:  c0000000013a8e8c LR: c0000000003483bc CTR: 
> 0000000000000000
> [ 1887.456190] REGS: c000000069f03070 TRAP: 0700   Not tainted (7.1.0- 
> rc5-next-20260529)
> [ 1887.456195] MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 
> 24428222  XER: 0000005a
> [ 1887.456208] CFAR: c0000000003483b8 IRQMASK: 0
> [ 1887.456208] GPR00: c0000000003483bc c000000069f03330 c000000001a82100 
> c000000069f033e0
> [ 1887.456208] GPR04: 0000000000000000 0000000000000001 0000000000000001 
> c000000006dd3b00
> [ 1887.456208] GPR08: ffffffffffffff00 0000000000000001 0000000000000000 
> 0000000024428220
> [ 1887.456208] GPR12: 0000000000000300 c000000effdbef00 0000000000000000 
> 0000000000000000
> [ 1887.456208] GPR16: 0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000
> [ 1887.456208] GPR20: 0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000
> [ 1887.456208] GPR24: 0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000
> [ 1887.456208] GPR28: 0000000000000000 0000000000000000 0000000000000000 
> c000000069f033e0
> [ 1887.456265] NIP [c0000000013a8e8c] preempt_schedule_irq+0x44/0x118
> [ 1887.456274] LR [c0000000003483bc] 
> dynamic_irqentry_exit_cond_resched+0x40/0x1a4
> [ 1887.456282] Call Trace:
> [ 1887.456284] [c000000069f03360] [c0000000003483bc] 
> dynamic_irqentry_exit_cond_resched+0x40/0x1a4
> [ 1887.456291] [c000000069f03380] [c00000000014f3bc] 
> do_page_fault+0xc0/0x104
> [ 1887.456298] [c000000069f033b0] [c000000000008be0] 
> data_access_common_virt+0x210/0x220
> [ 1887.456306] ---- interrupt: 300 at __copy_tofrom_user_base+0xac/0x5a4
> [ 1887.456313] NIP:  c00000000017fc38 LR: c000000000aaa684 CTR: 
> 0000000000000000
> [ 1887.456317] REGS: c000000069f033e0 TRAP: 0300   Not tainted (7.1.0- 
> rc5-next-20260529)
> [ 1887.456322] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 
> 24428220  XER: 2004005a
> [ 1887.456334] CFAR: c00000000017fc34 DAR: 00003fff879a8000 DSISR: 
> 42000000 IRQMASK: 0
> [ 1887.456334] GPR00: 0000000000000000 c000000069f036a0 c000000001a82100 
> 00003fff879a8000
> [ 1887.456334] GPR04: c0000000bb314ff0 0000000000001000 69f0000606480600 
> 0200c4080368f028
> [ 1887.456334] GPR08: 09036af00005d9c4 0600000200e80803 0000000000000000 
> 0000000000000030
> [ 1887.456334] GPR12: 0000000000000040 c000000effdbef00 0000000000000000 
> 000000000000000e
> [ 1887.456334] GPR16: 0000000004a00000 000000000000001f c000000069f038a0 
> c00000006e73e500
> [ 1887.456334] GPR20: c00000006f0ff6a8 0000000000000000 c00000006f0ff540 
> 0000000000000001
> [ 1887.456334] GPR24: 000000001816ce60 c0000000bb314000 c000000002e48730 
> c000000069f03a30
> [ 1887.456334] GPR28: c0000000bb314000 00003fff879a7010 0000000000000010 
> 0000000000001000
> [ 1887.456393] NIP [c00000000017fc38] __copy_tofrom_user_base+0xac/0x5a4
> [ 1887.456399] LR [c000000000aaa684] raw_copy_to_user+0x12c/0x314
> [ 1887.456405] ---- interrupt: 300
> [ 1887.456408] [c000000069f036a0] [c000000000aaa5f4] 
> raw_copy_to_user+0x9c/0x314 (unreliable)
> [ 1887.456416] [c000000069f036e0] [c000000000aacd08] 
> _copy_to_iter+0xe4/0x79c
> [ 1887.456423] [c000000069f037a0] [c000000000ab01ec] 
> copy_page_to_iter+0xd4/0x1a4
> [ 1887.456429] [c000000069f037f0] [c0000000005ddc34] 
> filemap_read+0x420/0x4f0
> [ 1887.456436] [c000000069f039c0] [c0080000043443e0] 
> ext4_file_read_iter+0x78/0x31c [ext4]
> [ 1887.456517] [c000000069f03a10] [c000000000796498] vfs_read+0x2a8/0x3c8
> [ 1887.456524] [c000000069f03ac0] [c00000000079726c] ksys_read+0x88/0x140
> [ 1887.456530] [c000000069f03b10] [c000000000032f98] 
> system_call_exception+0x198/0x4e0
> [ 1887.456537] [c000000069f03e30] [c00000000000d05c] 
> system_call_vectored_common+0x15c/0x2ec
> [ 1887.456544] ---- interrupt: 3000 at 0x3fff9b133cf4
> [ 1887.456549] NIP:  00003fff9b133cf4 LR: 00003fff9b133cf4 CTR: 
> 0000000000000000
> [ 1887.456554] REGS: c000000069f03e60 TRAP: 3000   Not tainted (7.1.0- 
> rc5-next-20260529)
> [ 1887.456558] MSR:  800000000000f033 <SF,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 
> 44424402  XER: 00000000
> [ 1887.456572] IRQMASK: 0
> [ 1887.456572] GPR00: 0000000000000003 00003fffe5fb4190 0000000105087f00 
> 0000000000000003
> [ 1887.456572] GPR04: 00003fff82e93010 000000001816ce60 0000000000000022 
> 0000000000000000
> [ 1887.456572] GPR08: 0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000
> [ 1887.456572] GPR12: 0000000000000000 00003fff9b4cd860 000000010507f588 
> 0000000000000000
> [ 1887.456572] GPR16: ffffffffffffffff 0000000000000000 0000000000000006 
> 0000000000000000
> [ 1887.456572] GPR20: 0000000000000001 00003fff9b23039c 00003fff9b2303a0 
> 00003fffe5fb5ee7
> [ 1887.456572] GPR24: 0000000000000000 0000000000000000 00003fffe5fb5ee7 
> 00003fffe5fb42d0
> [ 1887.456572] GPR28: 0000000000000003 00003fff82e93010 000000001816ce60 
> 0000000000000000
> [ 1887.456626] NIP [00003fff9b133cf4] 0x3fff9b133cf4
> [ 1887.456630] LR [00003fff9b133cf4] 0x3fff9b133cf4
> [ 1887.456634] ---- interrupt: 3000
> [ 1887.456637] Code: fbe1fff8 e92d0128 f8010010 f821ffd1 81490000 
> 39200001 2c0a0000 40820014 892d0152 552907fe 7d290034 5529d97e 
> <0b090000> 60000000 3bc00000 ebed0128
> [ 1887.456657] ---[ end trace 0000000000000000 ]---
> 
> 
> If you happen to fix this, please add below tag.
> 
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> 

Ritesh, Mukesh, is the scenario below possible?

do_page_fault seems to enable IRQs in the interrupt handler.
Is that expected? If so, one might see:

-- do_page_fault (enter kernel mode)
    -- enables interrupts
    -- gets interrupt - sets need_resched.
       -- irqentry_exit - sees it is kernel mode, just checks preempt count
			 and calls preempt_schedule_irq, which BUGs if
			 preempt_count is non-zero or IRQs are not disabled.
			 Hence the panic?

Should do_page_fault do preempt_disable when it enables the interrupts?
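
For reference, the check that fires at the top of preempt_schedule_irq() is BUG_ON(preempt_count() || !irqs_disabled()). The following is a minimal user-space sketch of that condition, not kernel code. struct cpu_state and preempt_schedule_irq_would_bug() are hypothetical names used only to model the two inputs.

```c
#include <stdbool.h>

/* Hypothetical model of the per-CPU state the check looks at. */
struct cpu_state {
	unsigned int preempt_count;  /* non-zero: preemption is not allowed */
	bool irqs_disabled;          /* true: local interrupts are off */
};

/*
 * Mirrors the condition of
 *   BUG_ON(preempt_count() || !irqs_disabled());
 * The caller must arrive with a zero preempt count AND interrupts disabled.
 */
static bool preempt_schedule_irq_would_bug(const struct cpu_state *s)
{
	return s->preempt_count != 0 || !s->irqs_disabled;
}
```

In this model, a caller that arrives with preempt_count == 0 but interrupts still enabled trips the check, which is the state the scenario above describes.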



* Re: [linux-next20260529] kernel BUG at kernel/sched/core.c:7512!
  2026-06-01  9:16 ` Shrikanth Hegde
@ 2026-06-01  9:56   ` Peter Zijlstra
  2026-06-02  7:56     ` Shrikanth Hegde
  2026-06-01 13:33   ` Venkat
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-06-01  9:56 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Venkat Rao Bagalkote, Madhavan Srinivasan,
	Mukesh Kumar Chaurasiya, Ritesh Harjani, linuxppc-dev, LKML,
	Srikar Dronamraju

On Mon, Jun 01, 2026 at 02:46:24PM +0530, Shrikanth Hegde wrote:

> Ritesh, Mukesh, Is below possible scenario?
> 
> do_page_fault seems to enable irq's in the interrupt handler?
> is that expected? if so, one might see
> 
> -- do_page_fault (enter kernel mode)
>    -- enables interrupts
>    -- gets interrupt - Sets need_resched.
>       -- irqentry_exit - Sees it is kernel mode. Just checks preempt count
> 			 and calls preempt_schedule_irq, which catches both
> 			 preempt_count and !irqs_disabled. Hence the panic?
> 
> Should do_page_fault do preempt_disable when it enables the interrupts?

No, it is expected for page-fault to be able to schedule. Specifically,
it must be able to sleep to support loading pages from disk.

Please check the value of preempt_count() (does it perchance have
HARDIRQ_OFFSET?). Also, if the fault handler does enable IRQs, it must
also disable them again once done.

Notably, I see ___do_page_fault() do interrupt_cond_local_irq_enable(),
but I'm not seeing a local_irq_disable() to match!
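
To check a preempt_count() value for HARDIRQ_OFFSET, it can be split into its fields. This is a minimal user-space sketch and assumes the field layout from mainline include/linux/preempt.h (8 preempt bits, 8 softirq bits, 4 hardirq bits, 4 NMI bits); the linux-next tree in question may differ. struct preempt_fields and decode_preempt_count() are hypothetical names.

```c
#include <assert.h>

/* Field widths, assumed to match mainline include/linux/preempt.h. */
#define PREEMPT_BITS   8
#define SOFTIRQ_BITS   8
#define HARDIRQ_BITS   4
#define NMI_BITS       4

#define PREEMPT_SHIFT  0
#define SOFTIRQ_SHIFT  (PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT  (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define NMI_SHIFT      (HARDIRQ_SHIFT + HARDIRQ_BITS)

#define FIELD_MASK(bits) ((1U << (bits)) - 1)
#define HARDIRQ_OFFSET   (1U << HARDIRQ_SHIFT)

static_assert(HARDIRQ_OFFSET == 0x10000U, "hardirq count starts at bit 16");

/* Hypothetical container for the decoded counters. */
struct preempt_fields {
	unsigned int preempt;  /* preempt_disable() nesting depth */
	unsigned int softirq;  /* softirq field */
	unsigned int hardirq;  /* hardirq nesting depth */
	unsigned int nmi;      /* NMI nesting depth */
};

/* Split a raw preempt_count value into its per-context fields. */
static struct preempt_fields decode_preempt_count(unsigned int pc)
{
	struct preempt_fields f = {
		.preempt = (pc >> PREEMPT_SHIFT) & FIELD_MASK(PREEMPT_BITS),
		.softirq = (pc >> SOFTIRQ_SHIFT) & FIELD_MASK(SOFTIRQ_BITS),
		.hardirq = (pc >> HARDIRQ_SHIFT) & FIELD_MASK(HARDIRQ_BITS),
		.nmi     = (pc >> NMI_SHIFT) & FIELD_MASK(NMI_BITS),
	};
	return f;
}
```

A non-zero hardirq field in the decoded value would mean the count still carries HARDIRQ_OFFSET.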



* Re: [linux-next20260529] kernel BUG at kernel/sched/core.c:7512!
  2026-06-01  9:16 ` Shrikanth Hegde
  2026-06-01  9:56   ` Peter Zijlstra
@ 2026-06-01 13:33   ` Venkat
  1 sibling, 0 replies; 8+ messages in thread
From: Venkat @ 2026-06-01 13:33 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Madhavan Srinivasan, Mukesh Kumar Chaurasiya, Ritesh Harjani,
	linuxppc-dev, LKML, Srikar Dronamraju, Peter Zijlstra



> On 1 Jun 2026, at 2:46 PM, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
> 
> Hi Venkat. Thanks for the report.
> 
> + mukesh, ritesh
> 
> On 6/1/26 12:11 PM, Venkat Rao Bagalkote wrote:
>> Greetings!!!
>> I hit a kernel BUG on a linux-next kernel running on ppc64le (Power11 LPAR). The issue was observed once in CI (Avocado tests) and I haven’t been able to reproduce it reliably yet.
> 
> Can you run with lockdep and see if you can hit it?

I ran with lockdep and did not hit this issue. I will try a few more times and report back here if I hit it.

Meanwhile, I hit another boot warning with lockdep enabled, which is reported here [1].

[1]: https://lore.kernel.org/all/b44aefc5-e066-478b-8d34-50d2d0deab6b@linux.ibm.com/

Regards,
Venkat.
> 
>> Architecture: ppc64le (Power11, pSeries)
>> Kernel: 7.1.0-rc5-next-20260529
>> Config: PREEMPT(lazy)
>> CPUs: large system (NR_CPUS=8192)
> 
> This is with GENERIC_ENTRY.
> 
>> So far, I have not reproduced the crash, but I am trying to stress similar conditions using:
>> parallel read workloads (fio / dd)
>> memory pressure
>> Traces:
>>  (5/8) /home/upstreamci/avocado-fvt-wrapper/tests/avocado-misc-tests/ cpu/ppc64_cpu_test.py:PPC64Test.test_smt_loop;run-run_type- upstream-9cfe: STARTED
>> [ 1885.176400] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1885.296164] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1885.386120] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1885.556134] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1886.576119] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1886.806060] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1887.026051] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1887.456075] ------------[ cut here ]------------
>> [ 1887.456101] kernel BUG at kernel/sched/core.c:7512!
>> [ 1887.456107] Oops: Exception in kernel mode, sig: 5 [#1]
>> [ 1887.456111] LE PAGE_SIZE=4K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
>> [ 1887.456116] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding tls ip_set rfkill nf_tables fsdev_dax kmem device_dax pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2 sd_mod nd_pmem papr_scm sg libnvdimm ibmvscsi ibmveth scsi_transport_srp pseries_wdt
>> [ 1887.456173] CPU: 28 UID: 0 PID: 85305 Comm: kexec Not tainted 7.1.0- rc5-next-20260529 #1 PREEMPT(lazy)
>> [ 1887.456180] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
>> [ 1887.456185] NIP:  c0000000013a8e8c LR: c0000000003483bc CTR: 0000000000000000
>> [ 1887.456190] REGS: c000000069f03070 TRAP: 0700   Not tainted (7.1.0- rc5-next-20260529)
>> [ 1887.456195] MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24428222  XER: 0000005a
>> [ 1887.456208] CFAR: c0000000003483b8 IRQMASK: 0
>> [ 1887.456208] GPR00: c0000000003483bc c000000069f03330 c000000001a82100 c000000069f033e0
>> [ 1887.456208] GPR04: 0000000000000000 0000000000000001 0000000000000001 c000000006dd3b00
>> [ 1887.456208] GPR08: ffffffffffffff00 0000000000000001 0000000000000000 0000000024428220
>> [ 1887.456208] GPR12: 0000000000000300 c000000effdbef00 0000000000000000 0000000000000000
>> [ 1887.456208] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 1887.456208] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 1887.456208] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 1887.456208] GPR28: 0000000000000000 0000000000000000 0000000000000000 c000000069f033e0
>> [ 1887.456265] NIP [c0000000013a8e8c] preempt_schedule_irq+0x44/0x118
>> [ 1887.456274] LR [c0000000003483bc] dynamic_irqentry_exit_cond_resched+0x40/0x1a4
>> [ 1887.456282] Call Trace:
>> [ 1887.456284] [c000000069f03360] [c0000000003483bc] dynamic_irqentry_exit_cond_resched+0x40/0x1a4
>> [ 1887.456291] [c000000069f03380] [c00000000014f3bc] do_page_fault+0xc0/0x104
>> [ 1887.456298] [c000000069f033b0] [c000000000008be0] data_access_common_virt+0x210/0x220
>> [ 1887.456306] ---- interrupt: 300 at __copy_tofrom_user_base+0xac/0x5a4
>> [ 1887.456313] NIP:  c00000000017fc38 LR: c000000000aaa684 CTR: 0000000000000000
>> [ 1887.456317] REGS: c000000069f033e0 TRAP: 0300   Not tainted (7.1.0- rc5-next-20260529)
>> [ 1887.456322] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24428220  XER: 2004005a
>> [ 1887.456334] CFAR: c00000000017fc34 DAR: 00003fff879a8000 DSISR: 42000000 IRQMASK: 0
>> [ 1887.456334] GPR00: 0000000000000000 c000000069f036a0 c000000001a82100 00003fff879a8000
>> [ 1887.456334] GPR04: c0000000bb314ff0 0000000000001000 69f0000606480600 0200c4080368f028
>> [ 1887.456334] GPR08: 09036af00005d9c4 0600000200e80803 0000000000000000 0000000000000030
>> [ 1887.456334] GPR12: 0000000000000040 c000000effdbef00 0000000000000000 000000000000000e
>> [ 1887.456334] GPR16: 0000000004a00000 000000000000001f c000000069f038a0 c00000006e73e500
>> [ 1887.456334] GPR20: c00000006f0ff6a8 0000000000000000 c00000006f0ff540 0000000000000001
>> [ 1887.456334] GPR24: 000000001816ce60 c0000000bb314000 c000000002e48730 c000000069f03a30
>> [ 1887.456334] GPR28: c0000000bb314000 00003fff879a7010 0000000000000010 0000000000001000
>> [ 1887.456393] NIP [c00000000017fc38] __copy_tofrom_user_base+0xac/0x5a4
>> [ 1887.456399] LR [c000000000aaa684] raw_copy_to_user+0x12c/0x314
>> [ 1887.456405] ---- interrupt: 300
>> [ 1887.456408] [c000000069f036a0] [c000000000aaa5f4] raw_copy_to_user+0x9c/0x314 (unreliable)
>> [ 1887.456416] [c000000069f036e0] [c000000000aacd08] _copy_to_iter+0xe4/0x79c
>> [ 1887.456423] [c000000069f037a0] [c000000000ab01ec] copy_page_to_iter+0xd4/0x1a4
>> [ 1887.456429] [c000000069f037f0] [c0000000005ddc34] filemap_read+0x420/0x4f0
>> [ 1887.456436] [c000000069f039c0] [c0080000043443e0] ext4_file_read_iter+0x78/0x31c [ext4]
>> [ 1887.456517] [c000000069f03a10] [c000000000796498] vfs_read+0x2a8/0x3c8
>> [ 1887.456524] [c000000069f03ac0] [c00000000079726c] ksys_read+0x88/0x140
>> [ 1887.456530] [c000000069f03b10] [c000000000032f98] system_call_exception+0x198/0x4e0
>> [ 1887.456537] [c000000069f03e30] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
>> [ 1887.456544] ---- interrupt: 3000 at 0x3fff9b133cf4
>> [ 1887.456549] NIP:  00003fff9b133cf4 LR: 00003fff9b133cf4 CTR: 0000000000000000
>> [ 1887.456554] REGS: c000000069f03e60 TRAP: 3000   Not tainted (7.1.0- rc5-next-20260529)
>> [ 1887.456558] MSR:  800000000000f033 <SF,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 44424402  XER: 00000000
>> [ 1887.456572] IRQMASK: 0
>> [ 1887.456572] GPR00: 0000000000000003 00003fffe5fb4190 0000000105087f00 0000000000000003
>> [ 1887.456572] GPR04: 00003fff82e93010 000000001816ce60 0000000000000022 0000000000000000
>> [ 1887.456572] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 1887.456572] GPR12: 0000000000000000 00003fff9b4cd860 000000010507f588 0000000000000000
>> [ 1887.456572] GPR16: ffffffffffffffff 0000000000000000 0000000000000006 0000000000000000
>> [ 1887.456572] GPR20: 0000000000000001 00003fff9b23039c 00003fff9b2303a0 00003fffe5fb5ee7
>> [ 1887.456572] GPR24: 0000000000000000 0000000000000000 00003fffe5fb5ee7 00003fffe5fb42d0
>> [ 1887.456572] GPR28: 0000000000000003 00003fff82e93010 000000001816ce60 0000000000000000
>> [ 1887.456626] NIP [00003fff9b133cf4] 0x3fff9b133cf4
>> [ 1887.456630] LR [00003fff9b133cf4] 0x3fff9b133cf4
>> [ 1887.456634] ---- interrupt: 3000
>> [ 1887.456637] Code: fbe1fff8 e92d0128 f8010010 f821ffd1 81490000 39200001 2c0a0000 40820014 892d0152 552907fe 7d290034 5529d97e <0b090000> 60000000 3bc00000 ebed0128
>> [ 1887.456657] ---[ end trace 0000000000000000 ]---
>> If you happen to fix this, please add below tag.
>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> 
> Ritesh, Mukesh, Is below possible scenario?
> 
> do_page_fault seems to enable irq's in the interrupt handler?
> is that expected? if so, one might see
> 
> -- do_page_fault (enter kernel mode)
>   -- enables interrupts
>   -- gets interrupt - Sets need_resched.
>      -- irqentry_exit - Sees it is kernel mode. Just checks preempt count
> and calls preempt_schedule_irq, which catches both
> preempt_count and !irqs_disabled. Hence the panic?
> 
> Should do_page_fault do preempt_disable when it enables the interrupts?




* Re: [linux-next20260529] kernel BUG at kernel/sched/core.c:7512!
  2026-06-01  9:56   ` Peter Zijlstra
@ 2026-06-02  7:56     ` Shrikanth Hegde
  2026-06-02  8:18       ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Shrikanth Hegde @ 2026-06-02  7:56 UTC (permalink / raw)
  To: Peter Zijlstra, Venkat Rao Bagalkote
  Cc: Madhavan Srinivasan, Mukesh Kumar Chaurasiya, Ritesh Harjani,
	linuxppc-dev, LKML, Srikar Dronamraju



On 6/1/26 3:26 PM, Peter Zijlstra wrote:
> On Mon, Jun 01, 2026 at 02:46:24PM +0530, Shrikanth Hegde wrote:
> 
>> Ritesh, Mukesh, Is below possible scenario?
>>
>> do_page_fault seems to enable irq's in the interrupt handler?
>> is that expected? if so, one might see
>>
>> -- do_page_fault (enter kernel mode)
>>     -- enables interrupts
>>     -- gets interrupt - Sets need_resched.
>>        -- irqentry_exit - Sees it is kernel mode. Just checks preempt count
>> 			 and calls preempt_schedule_irq, which catches both
>> 			 preempt_count and !irqs_disabled. Hence the panic?
>>
>> Should do_page_fault do preempt_disable when it enables the interrupts?
> 
> No, it is expected for page-fault to be able to schedule. Specifically,
> it must be able to sleep to support loading pages from disk.

Oh yes. Ok. Thanks for taking a look.

> 
> Please check the value of preempt_count() (does it perchance have
> HARDIRQ_OFFSET?). Also, if the fault handler does enable IRQs, it must
> also disable them again once done.

Will check it.

> 
> Notably, I see ___do_page_fault() do interrupt_cond_loadl_irq_enable(),
> but I'm not seeing a local_irq_disable() to match!

Yes, that's likely the culprit. ___do_page_fault() can run for a while, and need_resched
may get set in the meantime. If the fault came from kernel mode, interrupts are not disabled
again on the way out, and the subsequent irqentry_exit then panics.

BTW, I was able to consistently reproduce this on P9 with hackbench as below.

for i in {0..10}; do ./hackbench 10 process 10000 loops; done;
for i in {0..10}; do ./hackbench 20 process 10000 loops; done;
for i in {0..10}; do ./hackbench 30 process 10000 loops; done;
for i in {0..10}; do ./hackbench 40 process 10000 loops; done;    << usually panics here.
for i in {0..10}; do ./hackbench 10 thread 10000 loops; done;
for i in {0..10}; do ./hackbench 20 thread 10000 loops; done;
for i in {0..10}; do ./hackbench -pipe 10 process 10000 loops; done;
for i in {0..10}; do ./hackbench -pipe 20 process 10000 loops; done;
for i in {0..10}; do ./hackbench -pipe 30 process 10000 loops; done;
for i in {0..10}; do ./hackbench -pipe 40 process 10000 loops; done;
for i in {0..10}; do ./hackbench -pipe 10 thread 10000 loops; done;
for i in {0..10}; do ./hackbench -pipe 20 thread 10000 loops; done;

Note, if I run ./hackbench 40 process 10000 loops alone, it doesn't panic.
Likely some continuous stress is needed to get into this case.

The diff below fixes it: with it, the test passes. Hackbench numbers regress a bit compared
to baseline, but at least there is no panic. I have also changed the BUG_ON to WARN_ON, since
interrupts are disabled right after anyway. We could still fix the call sites if the warning is seen.

diff --git a/arch/powerpc/include/asm/entry-common.h b/arch/powerpc/include/asm/entry-common.h
index de5601282755..7da373a56813 100644
--- a/arch/powerpc/include/asm/entry-common.h
+++ b/arch/powerpc/include/asm/entry-common.h
@@ -253,16 +253,17 @@ static inline void arch_interrupt_enter_prepare(struct pt_regs *regs)
  static inline void arch_interrupt_exit_prepare(struct pt_regs *regs)
  {
         if (user_mode(regs)) {
-               BUG_ON(regs_is_unrecoverable(regs));
-               BUG_ON(regs_irqs_disabled(regs));
+               WARN_ON(regs_is_unrecoverable(regs));
+               WARN_ON(regs_irqs_disabled(regs));
                 /*
                  * We don't need to restore AMR on the way back to userspace for KUAP.
                  * AMR can only have been unlocked if we interrupted the kernel.
                  */
                 kuap_assert_locked();
-
-               local_irq_disable();
         }
+
+       /* irqentry_exit expects to be called with interrupts disabled */
+       local_irq_disable();
  }
  
  static inline void arch_interrupt_async_enter_prepare(struct pt_regs *regs)
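
The effect of moving local_irq_disable() out of the user_mode() branch can be modelled in user space. This is a simplified sketch, not the kernel code: struct cpu_state and the exit_prepare_*() functions are hypothetical stand-ins for the before and after versions of arch_interrupt_exit_prepare(), and the WARN_ON/KUAP checks are left out.

```c
#include <stdbool.h>

/* Hypothetical model: only tracks whether local interrupts are off. */
struct cpu_state {
	bool irqs_disabled;
};

/* Before the diff: interrupts are disabled only when returning to user mode. */
static void exit_prepare_before(struct cpu_state *s, bool user_mode)
{
	if (user_mode)
		s->irqs_disabled = true;
}

/* After the diff: interrupts are disabled on every exit path. */
static void exit_prepare_after(struct cpu_state *s, bool user_mode)
{
	(void)user_mode;  /* no longer affects the outcome */
	s->irqs_disabled = true;
}

/* Per the comment in the diff, irqentry_exit expects interrupts disabled. */
static bool exit_invariant_holds(const struct cpu_state *s)
{
	return s->irqs_disabled;
}
```

For a kernel-mode fault whose handler left interrupts enabled, exit_prepare_before() leaves the invariant violated, while exit_prepare_after() restores it.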




* Re: [linux-next20260529] kernel BUG at kernel/sched/core.c:7512!
  2026-06-02  7:56     ` Shrikanth Hegde
@ 2026-06-02  8:18       ` Peter Zijlstra
  2026-06-02  9:56         ` Shrikanth Hegde
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2026-06-02  8:18 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Venkat Rao Bagalkote, Madhavan Srinivasan,
	Mukesh Kumar Chaurasiya, Ritesh Harjani, linuxppc-dev, LKML,
	Srikar Dronamraju

On Tue, Jun 02, 2026 at 01:26:48PM +0530, Shrikanth Hegde wrote:
> 
> 
> On 6/1/26 3:26 PM, Peter Zijlstra wrote:
> > On Mon, Jun 01, 2026 at 02:46:24PM +0530, Shrikanth Hegde wrote:
> > 
> > > Ritesh, Mukesh, Is below possible scenario?
> > > 
> > > do_page_fault seems to enable irq's in the interrupt handler?
> > > is that expected? if so, one might see
> > > 
> > > -- do_page_fault (enter kernel mode)
> > >     -- enables interrupts
> > >     -- gets interrupt - Sets need_resched.
> > >        -- irqentry_exit - Sees it is kernel mode. Just checks preempt count
> > > 			 and calls preempt_schedule_irq, which catches both
> > > 			 preempt_count and !irqs_disabled. Hence the panic?
> > > 
> > > Should do_page_fault do preempt_disable when it enables the interrupts?
> > 
> > No, it is expected for page-fault to be able to schedule. Specifically,
> > it must be able to sleep to support loading pages from disk.
> 
> Oh yes. Ok. Thanks for taking a look.
> 
> > 
> > Please check the value of preempt_count() (does it perchance have
> > HARDIRQ_OFFSET?). Also, if the fault handler does enable IRQs, it must
> > also disable them again once done.
> 
> Will check it.
> 
> > 
> > Notably, I see ___do_page_fault() do interrupt_cond_local_irq_enable(),
> > but I'm not seeing a local_irq_disable() to match!
> 
> Yes, that's likely the culprit. It is possible that ___do_page_fault runs for longer
> and it may set need_resched. If it was in kernel mode, then it may not disable the
> interrupt and then subsequent irqentry_exit panics.
> 
> BTW I was able to consistently repro this on P9 with hackbench as below.
> 
> for i in {0..10}; do ./hackbench 10 process 10000 loops; done;
> for i in {0..10}; do ./hackbench 20 process 10000 loops; done;
> for i in {0..10}; do ./hackbench 30 process 10000 loops; done;
> for i in {0..10}; do ./hackbench 40 process 10000 loops; done;    << usually panics here.
> for i in {0..10}; do ./hackbench 10 thread 10000 loops; done;
> for i in {0..10}; do ./hackbench 20 thread 10000 loops; done;
> for i in {0..10}; do ./hackbench -pipe 10 process 10000 loops; done;
> for i in {0..10}; do ./hackbench -pipe 20 process 10000 loops; done;
> for i in {0..10}; do ./hackbench -pipe 30 process 10000 loops; done;
> for i in {0..10}; do ./hackbench -pipe 40 process 10000 loops; done;
> for i in {0..10}; do ./hackbench -pipe 10 thread 10000 loops; done;
> for i in {0..10}; do ./hackbench -pipe 20 thread 10000 loops; done;
> 
> Note, if i run ./hackbench 40 process 10000 loops alone, it doesn't panic.
> Likely some continous stressing needed to get into this case.
> 
> Below diff helps to fix it. With it see test passes. Hackbench numbers aren't super happy
> about it. It is regressing a bit compared to baseline. But no panic atleast.
> AND i have changed the BUG_ON to WARN_ON as irq_disabled right after. We could still fix the
> call sites if the warning is seen.
> 
> diff --git a/arch/powerpc/include/asm/entry-common.h b/arch/powerpc/include/asm/entry-common.h
> index de5601282755..7da373a56813 100644
> --- a/arch/powerpc/include/asm/entry-common.h
> +++ b/arch/powerpc/include/asm/entry-common.h
> @@ -253,16 +253,17 @@ static inline void arch_interrupt_enter_prepare(struct pt_regs *regs)
>  static inline void arch_interrupt_exit_prepare(struct pt_regs *regs)
>  {
>         if (user_mode(regs)) {
> -               BUG_ON(regs_is_unrecoverable(regs));
> -               BUG_ON(regs_irqs_disabled(regs));
> +               WARN_ON(regs_is_unrecoverable(regs));
> +               WARN_ON(regs_irqs_disabled(regs));
>                 /*
>                  * We don't need to restore AMR on the way back to userspace for KUAP.
>                  * AMR can only have been unlocked if we interrupted the kernel.
>                  */
>                 kuap_assert_locked();
> -
> -               local_irq_disable();
>         }
> +
> +       /* irqentry_exit expects to be called with interrupts disabled */
> +       local_irq_disable();
>  }
>  static inline void arch_interrupt_async_enter_prepare(struct pt_regs *regs)
> 

I would suggest trying something a little more focussed like so:

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 806c74e0d5ab..b002c179415c 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -589,6 +589,7 @@ static __always_inline void __do_page_fault(struct pt_regs *regs)
 	err = ___do_page_fault(regs, regs->dar, regs->dsisr);
 	if (unlikely(err))
 		bad_page_fault(regs, err);
+	local_irq_disable();
 }
 
 DEFINE_INTERRUPT_HANDLER(do_page_fault)

Since only ___do_page_fault() will enable interrupts, you only need to
disable them again on its return path. 
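
To make the pairing concrete, here is a minimal userspace sketch of the idea;
it is not kernel code. The global flag stands in for the real interrupt state,
all model_* and check_* names are hypothetical, and the conditional enable is
assumed to key off whether the interrupted context had interrupts enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable state. */
static bool irqs_on;

/*
 * Models ___do_page_fault(): enables interrupts only when the
 * interrupted context had them enabled (assumption), and returns
 * without disabling them again.
 */
static int model_inner_fault(bool interrupted_irqs_on)
{
	if (interrupted_irqs_on)
		irqs_on = true;
	/* ... fault handling that may sleep would happen here ... */
	return 0;
}

/* Models __do_page_fault(); with_fix adds the disable on the return path. */
static void model_fault(bool interrupted_irqs_on, bool with_fix)
{
	int err = model_inner_fault(interrupted_irqs_on);

	(void)err;	/* error handling is omitted in this sketch */
	if (with_fix)
		irqs_on = false;
}

/*
 * Enters the handler with interrupts off, runs it, and reports
 * whether the exit path would see interrupts disabled as it expects.
 */
static bool fault_then_exit_ok(bool interrupted_irqs_on, bool with_fix)
{
	irqs_on = false;
	model_fault(interrupted_irqs_on, with_fix);
	return !irqs_on;
}

/* Exercises the three interesting combinations. */
static void check_focused_fix(void)
{
	assert(!fault_then_exit_ok(true, false));	/* unfixed: IRQs leak out */
	assert(fault_then_exit_ok(true, true));		/* fixed: paired disable */
	assert(fault_then_exit_ok(false, false));	/* never enabled, nothing to undo */
}
```

Calling check_focused_fix() from a test harness shows that only the case
where the handler enabled interrupts and did not undo it breaks the exit-path
expectation.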




* Re: [linux-next20260529] kernel BUG at kernel/sched/core.c:7512!
  2026-06-02  8:18       ` Peter Zijlstra
@ 2026-06-02  9:56         ` Shrikanth Hegde
  2026-06-02 10:03           ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Shrikanth Hegde @ 2026-06-02  9:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Venkat Rao Bagalkote, Madhavan Srinivasan,
	Mukesh Kumar Chaurasiya, Ritesh Harjani, linuxppc-dev, LKML,
	Srikar Dronamraju

Hi Peter.

On 6/2/26 1:48 PM, Peter Zijlstra wrote:
> On Tue, Jun 02, 2026 at 01:26:48PM +0530, Shrikanth Hegde wrote:
>>
>>
>> On 6/1/26 3:26 PM, Peter Zijlstra wrote:
>>> On Mon, Jun 01, 2026 at 02:46:24PM +0530, Shrikanth Hegde wrote:
>>>
>>>> Ritesh, Mukesh, Is below possible scenario?
>>>>
>>>> do_page_fault seems to enable irq's in the interrupt handler?
>>>> is that expected? if so, one might see
>>>>
>>>> -- do_page_fault (enter kernel mode)
>>>>      -- enables interrupts
>>>>      -- gets interrupt - Sets need_resched.
>>>>         -- irqentry_exit - Sees it is kernel mode. Just checks preempt count
>>>> 			 and calls preempt_schedule_irq, which catches both
>>>> 			 preempt_count and !irqs_disabled. Hence the panic?
>>>>
>>>> Should do_page_fault do preempt_disable when it enables the interrupts?
>>>
>>> No, it is expected for page-fault to be able to schedule. Specifically,
>>> it must be able to sleep to support loading pages from disk.
>>
>> Oh yes. Ok. Thanks for taking a look.
>>
>>>
>>> Please check the value of preempt_count() (does it perchance have
>>> HARDIRQ_OFFSET?). Also, if the fault handler does enable IRQs, it must
>>> also disable them again once done.
>>
>> Will check it.
>>
>>>
>>> Notably, I see ___do_page_fault() do interrupt_cond_local_irq_enable(),
>>> but I'm not seeing a local_irq_disable() to match!
>>
>> Yes, that's likely the culprit. It is possible that ___do_page_fault runs for longer
>> and it may set need_resched. If it was in kernel mode, then it may not disable the
>> interrupt and then subsequent irqentry_exit panics.
>>
>> BTW I was able to consistently repro this on P9 with hackbench as below.
>>
>> for i in {0..10}; do ./hackbench 10 process 10000 loops; done;
>> for i in {0..10}; do ./hackbench 20 process 10000 loops; done;
>> for i in {0..10}; do ./hackbench 30 process 10000 loops; done;
>> for i in {0..10}; do ./hackbench 40 process 10000 loops; done;    << usually panics here.
>> for i in {0..10}; do ./hackbench 10 thread 10000 loops; done;
>> for i in {0..10}; do ./hackbench 20 thread 10000 loops; done;
>> for i in {0..10}; do ./hackbench -pipe 10 process 10000 loops; done;
>> for i in {0..10}; do ./hackbench -pipe 20 process 10000 loops; done;
>> for i in {0..10}; do ./hackbench -pipe 30 process 10000 loops; done;
>> for i in {0..10}; do ./hackbench -pipe 40 process 10000 loops; done;
>> for i in {0..10}; do ./hackbench -pipe 10 thread 10000 loops; done;
>> for i in {0..10}; do ./hackbench -pipe 20 thread 10000 loops; done;
>>
>> Note, if i run ./hackbench 40 process 10000 loops alone, it doesn't panic.
>> Likely some continous stressing needed to get into this case.
>>
>> Below diff helps to fix it. With it see test passes. Hackbench numbers aren't super happy
>> about it. It is regressing a bit compared to baseline. But no panic atleast.
>> AND i have changed the BUG_ON to WARN_ON as irq_disabled right after. We could still fix the
>> call sites if the warning is seen.
>>
>> diff --git a/arch/powerpc/include/asm/entry-common.h b/arch/powerpc/include/asm/entry-common.h
>> index de5601282755..7da373a56813 100644
>> --- a/arch/powerpc/include/asm/entry-common.h
>> +++ b/arch/powerpc/include/asm/entry-common.h
>> @@ -253,16 +253,17 @@ static inline void arch_interrupt_enter_prepare(struct pt_regs *regs)
>>   static inline void arch_interrupt_exit_prepare(struct pt_regs *regs)
>>   {
>>          if (user_mode(regs)) {
>> -               BUG_ON(regs_is_unrecoverable(regs));
>> -               BUG_ON(regs_irqs_disabled(regs));
>> +               WARN_ON(regs_is_unrecoverable(regs));
>> +               WARN_ON(regs_irqs_disabled(regs));
>>                  /*
>>                   * We don't need to restore AMR on the way back to userspace for KUAP.
>>                   * AMR can only have been unlocked if we interrupted the kernel.
>>                   */
>>                  kuap_assert_locked();
>> -
>> -               local_irq_disable();
>>          }
>> +
>> +       /* irqentry_exit expects to be called with interrupts disabled */
>> +       local_irq_disable();
>>   }
>>   static inline void arch_interrupt_async_enter_prepare(struct pt_regs *regs)
>>
> 
> I would suggest trying something a little more focussed like so:
> 
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 806c74e0d5ab..b002c179415c 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -589,6 +589,7 @@ static __always_inline void __do_page_fault(struct pt_regs *regs)
>   	err = ___do_page_fault(regs, regs->dar, regs->dsisr);
>   	if (unlikely(err))
>   		bad_page_fault(regs, err);
> +	local_irq_disable();
>   }
>   
>   DEFINE_INTERRUPT_HANDLER(do_page_fault)
> 
> Since only ___do_page_fault() will enable interrupts, you only need to
> disable them again on its return path.
> 

Seems there are more...

do_program_check (called by program_check_exception, emulation_assist_interrupt)
alignment_exception
SPEFloatingPointException
facility_unavailable_exception


Many of these look like they can recover only if hit in userspace.
Hence I thought it would make sense to put it in arch_interrupt_exit_prepare,
which is called just before irqentry_exit.
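
As a rough illustration of why one disable on the common exit path covers all
of these, here is a userspace sketch; it is not kernel code. The handlers,
the irq_state_on flag and every function name in it are hypothetical
stand-ins, and which real handlers leave interrupts enabled is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU's interrupt-enable state. */
static bool irq_state_on;

typedef void (*handler_fn)(void);

/* Two handlers that enable interrupts and return without disabling them. */
static void leaky_handler_a(void) { irq_state_on = true; }
static void leaky_handler_b(void) { irq_state_on = true; }

/* A handler that pairs its enable with a disable. */
static void tidy_handler(void)
{
	irq_state_on = true;
	irq_state_on = false;
}

/* Models an exit-prepare hook that disables interrupts unconditionally. */
static void model_exit_prepare(void)
{
	irq_state_on = false;
}

/* Runs one handler and reports whether interrupts are off at exit. */
static bool run_one(handler_fn handler, bool central_fix)
{
	irq_state_on = false;	/* handlers are entered with interrupts off */
	handler();
	if (central_fix)
		model_exit_prepare();
	return !irq_state_on;
}

/* Counts handlers that would reach the exit path with interrupts on. */
static size_t count_violations(bool central_fix)
{
	static const handler_fn handlers[] = {
		leaky_handler_a, leaky_handler_b, tidy_handler,
	};
	size_t bad = 0;

	for (size_t i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
		if (!run_one(handlers[i], central_fix))
			bad++;
	return bad;
}

/* Without the central disable two handlers violate; with it, none do. */
static void check_central_fix(void)
{
	assert(count_violations(false) == 2);
	assert(count_violations(true) == 0);
}
```

The trade-off in this model is that the central disable also runs for
handlers that never enabled interrupts, which is the extra work a per-handler
fix would avoid.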



* Re: [linux-next20260529] kernel BUG at kernel/sched/core.c:7512!
  2026-06-02  9:56         ` Shrikanth Hegde
@ 2026-06-02 10:03           ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2026-06-02 10:03 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Venkat Rao Bagalkote, Madhavan Srinivasan,
	Mukesh Kumar Chaurasiya, Ritesh Harjani, linuxppc-dev, LKML,
	Srikar Dronamraju

On Tue, Jun 02, 2026 at 03:26:11PM +0530, Shrikanth Hegde wrote:

> > I would suggest trying something a little more focussed like so:
> > 
> > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> > index 806c74e0d5ab..b002c179415c 100644
> > --- a/arch/powerpc/mm/fault.c
> > +++ b/arch/powerpc/mm/fault.c
> > @@ -589,6 +589,7 @@ static __always_inline void __do_page_fault(struct pt_regs *regs)
> >   	err = ___do_page_fault(regs, regs->dar, regs->dsisr);
> >   	if (unlikely(err))
> >   		bad_page_fault(regs, err);
> > +	local_irq_disable();
> >   }
> >   DEFINE_INTERRUPT_HANDLER(do_page_fault)
> > 
> > Since only ___do_page_fault() will enable interrupts, you only need to
> > disable them again on its return path.
> > 
> 
> Seems there are more...
> 
> do_program_check (called by program_check_exception, emulation_assist_interrupt)
> alignment_exception
> SPEFloatingPointException
> facility_unavailable_exception
> 
> 
> Many looks like it can recover only if hit in userspace.
> Hence i though it would make sense to put it under arch_interrupt_exit_prepare
> which is called just before irqentry_exit.

Ah, fair enough.



end of thread, other threads:[~2026-06-02 10:04 UTC | newest]

Thread overview: 8+ messages
2026-06-01  6:41 [linux-next20260529] kernel BUG at kernel/sched/core.c:7512! Venkat Rao Bagalkote
2026-06-01  9:16 ` Shrikanth Hegde
2026-06-01  9:56   ` Peter Zijlstra
2026-06-02  7:56     ` Shrikanth Hegde
2026-06-02  8:18       ` Peter Zijlstra
2026-06-02  9:56         ` Shrikanth Hegde
2026-06-02 10:03           ` Peter Zijlstra
2026-06-01 13:33   ` Venkat
