* [BUG] powerpc: perf record crash
From: Sukadev Bhattiprolu @ 2012-05-11 0:23 UTC
To: linuxppc-dev
I get this crash when I run the following command on 3.4.0-rc5 on a P6 (POWER6):
./perf record -a -d -- ./perf bench sched all
I rebuilt 'perf' locally after building the kernel 3.4.0-rc5.
Sometimes it occurs on the first attempt, sometimes on the second.
Please let me know if I can provide any other debug info.
----
Red Hat Enterprise Linux Server release 6.2 (Santiago)
Kernel 3.4.0-rc5-mainline on an ppc64
stormy2 login: kernel BUG at arch/powerpc/kernel/irq.c:188!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: ipv6 bnx2 ses enclosure sg ehea ext4 jbd2 mbcache sd_mod crc_t10dif ipr radeon drm_kms_helper ttm drm i2c_algo_bit i2c_core power_supply dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
NIP: c00000000000ea9c LR: c000000000010548 CTR: 0000000000000000
REGS: c00000006de879f0 TRAP: 0700 Not tainted (3.4.0-rc5-mainline)
MSR: 8000000000021032 <SF,ME,IR,DR,RI> CR: 22000088 XER: 00000000
SOFTE: 0
CFAR: 0000000000003318
TASK = c00000006de62d50[0] 'swapper/7' THREAD: c00000006de84000 CPU: 7
GPR00: 0000000000000001 c00000006de87c70 c000000000c47d88 0000000000000500
GPR04: 000000000bf87ff4 0000000000000000 0000000000000007 000e1cfe2fc642ff
GPR08: 00000000007e0000 c000000006df1500 0000000000000001 0000000000000000
GPR12: 0000000042000088 c000000006df1500 c00000006de87f90 0000000006f042c0
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 8000000000009032
GPR24: c00000006de84000 0000000000000000 c000000000e356d0 c000000000cd0318
GPR28: c0000000006229b8 c000000000b55f58 c000000000bce190 c000000000ff8eb0
NIP [c00000000000ea9c] .__check_irq_replay+0x7c/0x90
LR [c000000000010548] .arch_local_irq_restore+0x38/0x90
Call Trace:
[c00000006de87c70] [00000000832e13f4] 0x832e13f4 (unreliable)
[c00000006de87ce0] [c0000000004ca118] .cpuidle_idle_call+0x1f8/0x380
[c00000006de87da0] [c000000000055450] .pSeries_idle+0x10/0x40
[c00000006de87e10] [c000000000017ce8] .cpu_idle+0x178/0x290
[c00000006de87ed0] [c0000000006081f0] .start_secondary+0x350/0x35c
[c00000006de87f90] [c00000000000936c] .start_secondary_prolog+0x10/0x14
Instruction dump:
4e800020 7da96b78 8809022b 794bf7e3 38600500 5400e87e 5400183e 9809022b
4082ffdc 880d022b 7c0000d0 78000fe0 <0b000000> 38600000 4bffffc4 60000000
---[ end trace 39f49db612f9ab0f ]---
Kernel panic - not syncing: Attempted to kill the idle task!
panic occurred, switching back to text console
Dumping ftrace buffer:
(ftrace buffer empty)
Current pid: 6199 comm: perf / Idle pid: 0 comm: swapper/7
------------[ cut here ]------------
WARNING: at kernel/rcutree.c:465
Modules linked in: ipv6 bnx2 ses enclosure sg ehea ext4 jbd2 mbcache sd_mod crc_t10dif ipr radeon drm_kms_helper ttm drm i2c_algo_bit i2c_core power_supply dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
NIP: c000000000110aa0 LR: c000000000110a9c CTR: c000000000060f40
REGS: c000000068b96e00 TRAP: 0700 Tainted: G D (3.4.0-rc5-mainline)
MSR: 8000000000021032 <SF,ME,IR,DR,RI> CR: 28022482 XER: 0000000a
SOFTE: 0
CFAR: c000000000602cc4
TASK = c000000059620090[6199] 'perf' THREAD: c000000068b94000 CPU: 7
GPR00: c000000000110a9c c000000068b97080 c000000000c47d88 000000000000003d
GPR04: 0000000000000000 0000000000000000 0000000000000000 800000000b2a3d80
GPR08: 0000000000000000 c000000000804208 000000000002ef30 00000000007e0000
GPR12: 0000000024022488 c000000006df1500 c000000069c8db00 c000000000815b80
GPR16: c000000000815b80 0000000000000000 c000000068b97750 c000000000caed60
GPR20: c000000068b94080 0000000000000001 0000000000000000 0000000000000004
GPR24: c000000000815b80 c000000000cacad8 c000000068b94000 0000000000000000
GPR28: c0000000008155f0 c00000006de62d50 c000000000bd5c40 c000000000b88053
NIP [c000000000110aa0] .rcu_idle_exit_common+0xc0/0x100
LR [c000000000110a9c] .rcu_idle_exit_common+0xbc/0x100
Call Trace:
[c000000068b97080] [c000000000110a9c] .rcu_idle_exit_common+0xbc/0x100 (unreliable)
[c000000068b97110] [c000000000110b98] .rcu_irq_enter+0xb8/0xe0
[c000000068b97190] [c00000000007b5ec] .irq_enter+0x1c/0xc0
[c000000068b97220] [c000000000010024] .do_IRQ+0x64/0x2e0
[c000000068b972e0] [c0000000000038c0] hardware_interrupt_common+0x140/0x180
--- Exception: 501 at .arch_local_irq_restore+0x74/0x90
LR = .arch_local_irq_restore+0x74/0x90
[c000000068b975d0] [c0000000000b35e8] .update_rq_clock+0x48/0x80 (unreliable)
[c000000068b97640] [c0000000000b319c] .finish_task_switch+0x7c/0x150
[c000000068b976e0] [c0000000005f62b0] .__schedule+0x2f0/0x6e0
[c000000068b97960] [c0000000001d0ac4] .pipe_wait+0x64/0xa0
[c000000068b97a20] [c0000000001d1578] .pipe_read+0x3d8/0x670
[c000000068b97b50] [c0000000001c35a4] .do_sync_read+0xb4/0x140
[c000000068b97ce0] [c0000000001c475c] .vfs_read+0xec/0x1e0
[c000000068b97d80] [c0000000001c4978] .SyS_read+0x58/0xd0
[c000000068b97e30] [c0000000000097dc] syscall_exit+0x0/0x38
Instruction dump:
881f000e 2f800001 41feffbc e92d0200 e8ad0200 e889022e e8dd022e e87e8120
38a503c8 38fd03c8 484f21fd 60000000 <0fe00000> 38000001 981f000e 4bffff88
---[ end trace 39f49db612f9ab10 ]---
Current pid: 6199 comm: perf / Idle pid: 0 comm: swapper/7
------------[ cut here ]------------
WARNING: at kernel/rcutree.c:355
Modules linked in: ipv6 bnx2 ses enclosure sg ehea ext4 jbd2 mbcache sd_mod crc_t10dif ipr radeon drm_kms_helper ttm drm i2c_algo_bit i2c_core power_supply dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
NIP: c000000000110d80 LR: c000000000110d7c CTR: c000000000060f40
REGS: c000000068b96e10 TRAP: 0700 Tainted: G D W (3.4.0-rc5-mainline)
MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 28022482 XER: 0000000a
SOFTE: 0
CFAR: c000000000602cc4
TASK = c000000059620090[6199] 'perf' THREAD: c000000068b94000 CPU: 7
GPR00: c000000000110d7c c000000068b97090 c000000000c47d88 000000000000003d
GPR04: 0000000000000000 0000000000000000 0000000000000000 800000000b2a3d80
GPR08: 0000000000000000 c000000000804208 00000000000399f0 00000000007e0000
GPR12: 0000000028022488 c000000006df1500 c000000069c8db00 c000000000815b80
GPR16: c000000000815b80 0000000000000000 c000000068b97750 c000000000caed60
GPR20: c000000068b94080 0000000000000001 0000000000000010 c000000068b94100
GPR24: 0000000000000000 c000000068b94000 c0000000e7fac000 0000000000000000
GPR28: c00000006de62d50 c000000000b88053 c000000000bd5c40 c000000000fe4bf8
NIP [c000000000110d80] .rcu_idle_enter_common+0xd0/0x110
LR [c000000000110d7c] .rcu_idle_enter_common+0xcc/0x110
Call Trace:
[c000000068b97090] [c000000000110d7c] .rcu_idle_enter_common+0xcc/0x110 (unreliable)
[c000000068b97120] [c000000000110e74] .rcu_irq_exit+0xb4/0xe0
[c000000068b971a0] [c00000000007b3ec] .irq_exit+0x8c/0xf0
[c000000068b97220] [c000000000010094] .do_IRQ+0xd4/0x2e0
[c000000068b972e0] [c0000000000038c0] hardware_interrupt_common+0x140/0x180
--- Exception: 501 at .arch_local_irq_restore+0x74/0x90
LR = .arch_local_irq_restore+0x74/0x90
[c000000068b975d0] [c0000000000b35e8] .update_rq_clock+0x48/0x80 (unreliable)
[c000000068b97640] [c0000000000b319c] .finish_task_switch+0x7c/0x150
[c000000068b976e0] [c0000000005f62b0] .__schedule+0x2f0/0x6e0
[c000000068b97960] [c0000000001d0ac4] .pipe_wait+0x64/0xa0
[c000000068b97a20] [c0000000001d1578] .pipe_read+0x3d8/0x670
[c000000068b97b50] [c0000000001c35a4] .do_sync_read+0xb4/0x140
[c000000068b97ce0] [c0000000001c475c] .vfs_read+0xec/0x1e0
[c000000068b97d80] [c0000000001c4978] .SyS_read+0x58/0xd0
[c000000068b97e30] [c0000000000097dc] syscall_exit+0x0/0x38
Instruction dump:
881d0011 2f800001 41feff8c e92d0200 e8ad0200 e889022e e8dc022e e87e8120
38a503c8 38fc03c8 484f1f1d 60000000 <0fe00000> 38000001 981d0011 4bffff58
---[ end trace 39f49db612f9ab11 ]---
Rebooting in 10 seconds..
* Re: [BUG] powerpc: perf record crash
From: Benjamin Herrenschmidt @ 2012-05-11 0:37 UTC
To: Sukadev Bhattiprolu; +Cc: linuxppc-dev
On Thu, 2012-05-10 at 17:23 -0700, Sukadev Bhattiprolu wrote:
> I get this crash when I run the following command on 3.4.0-rc5 on a P6 (POWER6):
>
> ./perf record -a -d -- ./perf bench sched all
>
> I rebuilt 'perf' locally after building the kernel 3.4.0-rc5.
>
> Sometimes it occurs on the first attempt, sometimes on the second.
> Please let me know if I can provide any other debug info.
It looks like the same bug that Wang Sheng-Hui <shhuiw@gmail.com>
reported; I'm still tracking it. I pushed a patch upstream that I
thought fixed it, but according to Wang it's still around.
Looking at it now.
Cheers,
Ben.
* [PATCH] powerpc/irq: Fix another case of lazy IRQ state getting out of sync
From: Benjamin Herrenschmidt @ 2012-05-11 2:12 UTC
To: Sukadev Bhattiprolu, Wang Sheng-Hui; +Cc: Paul Mackerras, linuxppc-dev
So we have another case of paca->irq_happened getting out of
sync with the HW irq state. This can happen when a perfmon
interrupt occurs while soft disabled, as it will return to a
soft disabled but hard enabled context while leaving a stale
PACA_IRQ_HARD_DIS flag set.
This patch fixes it, and also adds a test for the condition
of those flags being out of sync in arch_local_irq_restore()
when CONFIG_TRACE_IRQFLAGS is enabled.
This helps catch those gremlins faster (and so far I
can't seem to see any anymore, so that's good news).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
Please test ASAP as I need to send that to Linus today
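To make the failure mode described above concrete, here is a minimal user-space C model of the lazy interrupt state. It is a sketch under stated assumptions, not kernel code: `struct cpu_model`, `struct saved_ctx`, the `model_*` helpers and the flag value `MODEL_IRQ_HARD_DIS` are hypothetical stand-ins for `paca->soft_enabled`, `paca->irq_happened`, MSR[EE] and `PACA_IRQ_HARD_DIS`. It also assumes, as the description implies, that exception entry records the hard-disable flag, and that the pre-patch exit path, when returning to an already soft-disabled context, restored the saved MSR without touching that flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for PACA_IRQ_HARD_DIS; the real value may differ. */
#define MODEL_IRQ_HARD_DIS 0x01u

/* Hypothetical model of the per-CPU lazy IRQ state. */
struct cpu_model {
	bool soft_enabled;          /* stands in for paca->soft_enabled */
	unsigned int irq_happened;  /* stands in for paca->irq_happened */
	bool msr_ee;                /* stands in for the hardware MSR[EE] bit */
};

/* Interrupted context as saved on the stack (SOFTE and the EE bit of MSR). */
struct saved_ctx {
	bool softe;
	bool msr_ee;
};

/* Exception entry (e.g. a perfmon interrupt): the hardware clears EE,
 * and the kernel soft disables and records that it is hard disabled. */
static struct saved_ctx model_exception_entry(struct cpu_model *c)
{
	struct saved_ctx s = { c->soft_enabled, c->msr_ee };

	c->msr_ee = false;
	c->soft_enabled = false;
	c->irq_happened |= MODEL_IRQ_HARD_DIS;
	return s;
}

/* Assumed pre-patch exit when returning to a soft-disabled context:
 * SOFTE and MSR are restored, but irq_happened is left as it is. */
static void model_exit_unpatched(struct cpu_model *c, struct saved_ctx s)
{
	c->soft_enabled = s.softe;
	c->msr_ee = s.msr_ee;   /* the return restores the saved MSR, EE included */
}

/* The inconsistency: the flag claims "hard disabled" while EE is set. */
static bool model_hard_dis_stale(const struct cpu_model *c)
{
	return (c->irq_happened & MODEL_IRQ_HARD_DIS) && c->msr_ee;
}
```

Starting from a CPU that is soft disabled but still hard enabled (the lazy state left by `local_irq_disable()`), calling `model_exception_entry()` and then `model_exit_unpatched()` leaves `model_hard_dis_stale()` returning true, which corresponds to the stale PACA_IRQ_HARD_DIS flag the patch description refers to.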
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index f8a7a1a..293e283 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -588,23 +588,19 @@ _GLOBAL(ret_from_except_lite)
fast_exc_return_irq:
restore:
/*
- * This is the main kernel exit path, we first check if we
- * have to change our interrupt state.
+ * This is the main kernel exit path. First we check if we
+ * are about to re-enable interrupts
*/
ld r5,SOFTE(r1)
lbz r6,PACASOFTIRQEN(r13)
- cmpwi cr1,r5,0
- cmpw cr0,r5,r6
- beq cr0,4f
+ cmpwi cr0,r5,0
+ beq restore_irq_off
- /* We do, handle disable first, which is easy */
- bne cr1,3f;
- li r0,0
- stb r0,PACASOFTIRQEN(r13);
- TRACE_DISABLE_INTS
- b 4f
+ /* We are enabling, were we already enabled ? Yes, just return */
+ cmpwi cr0,r6,1
+ beq cr0,do_restore
-3: /*
+ /*
* We are about to soft-enable interrupts (we are hard disabled
* at this point). We check if there's anything that needs to
* be replayed first.
@@ -626,7 +622,7 @@ restore_no_replay:
/*
* Final return path. BookE is handled in a different file
*/
-4:
+do_restore:
#ifdef CONFIG_PPC_BOOK3E
b .exception_return_book3e
#else
@@ -700,6 +696,25 @@ fast_exception_return:
#endif /* CONFIG_PPC_BOOK3E */
/*
+ * We are returning to a context with interrupts soft disabled.
+ *
+ * However, we may also be about to hard enable, so we need to
+ * make sure that in this case, we also clear PACA_IRQ_HARD_DIS
+ * or that bit can get out of sync and bad things will happen
+ */
+restore_irq_off:
+ ld r3,_MSR(r1)
+ lbz r7,PACAIRQHAPPENED(r13)
+ andi. r0,r3,MSR_EE
+ beq 1f
+ rlwinm r7,r7,0,~PACA_IRQ_HARD_DIS
+ stb r7,PACAIRQHAPPENED(r13)
+1: li r0,0
+ stb r0,PACASOFTIRQEN(r13);
+ TRACE_DISABLE_INTS
+ b do_restore
+
+ /*
* Something did happen, check if a re-emit is needed
* (this also clears paca->irq_happened)
*/
@@ -748,6 +763,9 @@ restore_check_irq_replay:
#endif /* CONFIG_PPC_BOOK3E */
1: b .ret_from_except /* What else to do here ? */
+
+
+3:
do_work:
#ifdef CONFIG_PREEMPT
andi. r0,r3,MSR_PR /* Returning to user mode? */
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 5ec1b23..fe8cf8e 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -229,6 +229,19 @@ notrace void arch_local_irq_restore(unsigned long en)
*/
if (unlikely(irq_happened != PACA_IRQ_HARD_DIS))
__hard_irq_disable();
+#ifdef CONFIG_TRACE_IRQFLAGS
+ else {
+ /*
+ * We should already be hard disabled here. We had bugs
+ * where that wasn't the case so let's dbl check it and
+ * warn if we are wrong. Only do that when IRQ tracing
+ * is enabled as mfmsr() can be costly.
+ */
+ if (WARN_ON(mfmsr() & MSR_EE))
+ __hard_irq_disable();
+ }
+#endif /* CONFIG_TRACE_IRQFLAGS */
+
set_soft_enabled(0);
/*
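The reworked exit path and the new consistency check can be read as a small decision tree. The following is a minimal user-space C sketch, not kernel code: `struct cpu_model`, `enum exit_action`, `model_exit_patched()`, `model_debug_check()` and `MODEL_IRQ_HARD_DIS` are hypothetical stand-ins for `paca->soft_enabled`, `paca->irq_happened`, MSR[EE], `PACA_IRQ_HARD_DIS` and the `restore:` / `restore_irq_off` / `do_restore` labels, and the interrupt replay logic is reduced to a single return value. `model_debug_check()` only mimics the intent of the new `WARN_ON(mfmsr() & MSR_EE)` test in `arch_local_irq_restore()`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for PACA_IRQ_HARD_DIS; the real value may differ. */
#define MODEL_IRQ_HARD_DIS 0x01u

struct cpu_model {
	bool soft_enabled;          /* stands in for paca->soft_enabled */
	unsigned int irq_happened;  /* stands in for paca->irq_happened */
	bool msr_ee;                /* stands in for the hardware MSR[EE] bit */
};

/* Which branch the patched exit path takes (simplified). */
enum exit_action {
	EXIT_RESTORE_IRQ_OFF,  /* returning soft disabled: restore_irq_off */
	EXIT_ALREADY_ENABLED,  /* already soft enabled: straight to do_restore */
	EXIT_CHECK_REPLAY,     /* about to soft enable: check for replays */
};

/* saved_softe and saved_ee model SOFTE(r1) and the EE bit of _MSR(r1). */
static enum exit_action model_exit_patched(struct cpu_model *c,
					   bool saved_softe, bool saved_ee)
{
	if (!saved_softe) {
		/* restore_irq_off: if the return will hard enable, clear the
		 * hard-disable flag so it stays in sync with MSR[EE]. */
		if (saved_ee)
			c->irq_happened &= ~MODEL_IRQ_HARD_DIS;
		c->soft_enabled = false;
		c->msr_ee = saved_ee;  /* the return restores the saved MSR */
		return EXIT_RESTORE_IRQ_OFF;
	}
	if (c->soft_enabled) {
		c->msr_ee = saved_ee;
		return EXIT_ALREADY_ENABLED;
	}
	/* Replaying pending interrupts is not modeled here. */
	return EXIT_CHECK_REPLAY;
}

/* Mimics the shape of the debug check, assuming irq_happened is non-zero
 * (the real function returns earlier otherwise). Returns true when the
 * WARN_ON would fire; in every case the model ends up hard disabled. */
static bool model_debug_check(struct cpu_model *c)
{
	if (c->irq_happened != MODEL_IRQ_HARD_DIS) {
		c->msr_ee = false;     /* models __hard_irq_disable(): clears EE only */
		return false;
	}
	if (c->msr_ee) {           /* flag says hard disabled, but EE is set */
		c->msr_ee = false;
		return true;
	}
	return false;
}
```

For example, a perfmon interrupt taken from a soft-disabled, hard-enabled context returns through `EXIT_RESTORE_IRQ_OFF` with the flag cleared, so the state no longer matches the pattern (`irq_happened` equal to the hard-disable flag while EE is set) that `model_debug_check()` reports.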
* Re: [PATCH] powerpc/irq: Fix another case of lazy IRQ state getting out of sync
From: Sukadev Bhattiprolu @ 2012-05-11 17:04 UTC
To: Benjamin Herrenschmidt; +Cc: Paul Mackerras, linuxppc-dev, Wang Sheng-Hui
Benjamin Herrenschmidt [benh@kernel.crashing.org] wrote:
| So we have another case of paca->irq_happened getting out of
| sync with the HW irq state. This can happen when a perfmon
| interrupt occurs while soft disabled, as it will return to a
| soft disabled but hard enabled context while leaving a stale
| PACA_IRQ_HARD_DIS flag set.
|
| This patch fixes it, and also adds a test for the condition
| of those flags being out of sync in arch_local_irq_restore()
| when CONFIG_TRACE_IRQFLAGS is enabled.
|
| This helps catch those gremlins faster (and so far I
| can't seem to see any anymore, so that's good news).
|
| Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| ---
|
| Please test ASAP as I need to send that to Linus today
Works for me. I was able to run my script over 100 times. Without
the patch, it reliably fails on the first or second attempt.
Thanks for fixing it quickly.
Sukadev
* Re: [PATCH] powerpc/irq: Fix another case of lazy IRQ state getting out of sync
From: Wang Sheng-Hui @ 2012-05-14 0:53 UTC
To: Benjamin Herrenschmidt; +Cc: Paul Mackerras, Sukadev Bhattiprolu, linuxppc-dev
On 2012-05-11 10:12, Benjamin Herrenschmidt wrote:
> So we have another case of paca->irq_happened getting out of
> sync with the HW irq state. This can happen when a perfmon
> interrupt occurs while soft disabled, as it will return to a
> soft disabled but hard enabled context while leaving a stale
> PACA_IRQ_HARD_DIS flag set.
>
> This patch fixes it, and also adds a test for the condition
> of those flags being out of sync in arch_local_irq_restore()
> when CONFIG_TRACE_IRQFLAGS is enabled.
>
> This helps catch those gremlins faster (and so far I
> can't seem to see any anymore, so that's good news).
This patch works on my system.
Verified.
Thanks,