linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc/powernv: Fix race in updating core_idle_state
@ 2015-07-01  6:34 Shreyas B. Prabhu
  2015-07-06  4:03 ` Michael Ellerman
  0 siblings, 1 reply; 3+ messages in thread
From: Shreyas B. Prabhu @ 2015-07-01  6:34 UTC (permalink / raw)
  To: Paul Mackerras, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, mahesh, Shreyas B. Prabhu

core_idle_state is maintained for each core. It uses 0-7 bits to track
whether a thread in the core has entered fastsleep or winkle. 8th bit is
used as a lock bit.
The lock bit is set in these 2 scenarios-
 - The thread is first in subcore to wakeup from sleep/winkle.
 - If its the last thread in the core about to enter sleep/winkle

While the lock bit is set, if any other thread in the core wakes up, it
loops until the lock bit is cleared before proceeding in the wakeup
path. This helps prevent race conditions w.r.t fastsleep workaround and
prevents threads from switching to process context before core/subcore
resources are restored.

But, in the path to sleep/winkle entry, we currently don't check for
lock-bit. This exposes us to following race when running with subcore
on-

First thread in the subcorea		Another thread in the same
waking up		   		core entering sleep/winkle

lwarx   r15,0,r14
ori     r15,r15,PNV_CORE_IDLE_LOCK_BIT
stwcx.  r15,0,r14
[Code to restore subcore state]

						lwarx   r15,0,r14
						[clear thread bit]
						stwcx.  r15,0,r14

andi.   r15,r15,PNV_CORE_IDLE_THREAD_BITS
stw     r15,0(r14)

Here, after the thread entering sleep clears its thread bit in
core_idle_state, the value is overwritten by the thread waking up.
This patch fixes the above race by looping on the lock bit even while
entering the idle states.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/idle_power7.S | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index ccde8f0..48c7d4f 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -144,12 +144,26 @@ power7_enter_nap_mode:
 	bge	cr3,2f
 	IDLE_STATE_ENTER_SEQ(PPC_NAP)
 	/* No return */
+
+core_idle_lock_held_entry:
+       HMT_LOW
+core_idle_lock_loop_entry:
+       lwz     r15,0(r14)
+       andi.   r9,r15,PNV_CORE_IDLE_LOCK_BIT
+       bne     core_idle_lock_loop_entry
+       HMT_MEDIUM
+       b       lwarx_loop1
+
 2:
 	/* Sleep or winkle */
 	lbz	r7,PACA_THREAD_MASK(r13)
 	ld	r14,PACA_CORE_IDLE_STATE_PTR(r13)
 lwarx_loop1:
 	lwarx	r15,0,r14
+
+	andi.   r9,r15,PNV_CORE_IDLE_LOCK_BIT
+	bne     core_idle_lock_held_entry
+
 	andc	r15,r15,r7			/* Clear thread bit */
 
 	andi.	r15,r15,PNV_CORE_IDLE_THREAD_BITS
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: powerpc/powernv: Fix race in updating core_idle_state
  2015-07-01  6:34 [PATCH] powerpc/powernv: Fix race in updating core_idle_state Shreyas B. Prabhu
@ 2015-07-06  4:03 ` Michael Ellerman
  2015-07-06  4:32   ` Shreyas B Prabhu
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Ellerman @ 2015-07-06  4:03 UTC (permalink / raw)
  To: Shreyas B. Prabhu, Paul Mackerras
  Cc: mahesh, linuxppc-dev, linux-kernel, Shreyas B. Prabhu

On Wed, 2015-01-07 at 06:34:10 UTC, "Shreyas B. Prabhu" wrote:
> core_idle_state is maintained for each core. It uses 0-7 bits to track
> whether a thread in the core has entered fastsleep or winkle. 8th bit is
> used as a lock bit.
> The lock bit is set in these 2 scenarios-
>  - The thread is first in subcore to wakeup from sleep/winkle.
>  - If its the last thread in the core about to enter sleep/winkle
> 
> While the lock bit is set, if any other thread in the core wakes up, it
> loops until the lock bit is cleared before proceeding in the wakeup
> path. This helps prevent race conditions w.r.t fastsleep workaround and
> prevents threads from switching to process context before core/subcore
> resources are restored.
> 
> But, in the path to sleep/winkle entry, we currently don't check for
> lock-bit. This exposes us to following race when running with subcore
> on-
> 
> First thread in the subcorea		Another thread in the same
> waking up		   		core entering sleep/winkle
> 
> lwarx   r15,0,r14
> ori     r15,r15,PNV_CORE_IDLE_LOCK_BIT
> stwcx.  r15,0,r14
> [Code to restore subcore state]
> 
> 						lwarx   r15,0,r14
> 						[clear thread bit]
> 						stwcx.  r15,0,r14
> 
> andi.   r15,r15,PNV_CORE_IDLE_THREAD_BITS
> stw     r15,0(r14)
> 
> Here, after the thread entering sleep clears its thread bit in
> core_idle_state, the value is overwritten by the thread waking up.
> This patch fixes the above race by looping on the lock bit even while
> entering the idle states.

What are the symptoms of this bug?

I assume they're not good. In which case this should go to stable, shouldn't
it? If so which versions?

And which commit introduced the bug?

cheers

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: powerpc/powernv: Fix race in updating core_idle_state
  2015-07-06  4:03 ` Michael Ellerman
@ 2015-07-06  4:32   ` Shreyas B Prabhu
  0 siblings, 0 replies; 3+ messages in thread
From: Shreyas B Prabhu @ 2015-07-06  4:32 UTC (permalink / raw)
  To: Michael Ellerman, Paul Mackerras; +Cc: mahesh, linuxppc-dev, linux-kernel


> 
> What are the symptoms of this bug?
> 
In the cases where we hit this race and the core enters fastsleep, 
code mistakes an idle thread as running. Because of this, the first
thread waking up from fastsleep which is supposed to resync timebase
 skips it. So we can end up having a core with stale timebase value.

We suspect this is causing soft lockups with call stacks similar to this-

[126529.208714] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [opal_errd:7722]
[126529.208849] CPU: 8 PID: 7722 Comm: opal_errd
[126529.208853] task: c00000bf67803a80 ti: c00000bf6788c000 task.ti: c00000bf6788c000
[126529.208856] NIP: c00000000015a180 LR: c00000000015a0d0 CTR: c00000000001ed70
[126529.208859] REGS: c00000bf6788faa0 TRAP: 0901   Not tainted  (3.18.13-336.el7_1.pkvm3_1_0.2000.1.ppc64le)
[126529.208860] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004824  XER: 20000000
[126529.208871] CFAR: c00000000015a194 SOFTE: 1 
GPR00: c0000000002db9e8 c00000bf6788fd20 c0000000012b1800 00003af5b88f569e 
GPR04: 0000000000d3dbb8 00003af5c236ca0b ffffffffffffffff 000000000001ee28 
GPR08: 000000003b9ac9ff 5bfc723fba82c8f9 00000000c06f2b88 c0000000009908c8 
GPR12: c00000000001ed70 c000000007da4c00 
[126529.208896] NIP [c00000000015a180] ktime_get_ts64+0x130/0x1f0
[126529.208899] LR [c00000000015a0d0] ktime_get_ts64+0x80/0x1f0
[126529.208902] Call Trace:
[126529.208909] [c00000bf6788fd20] [c00000000019c0e4] __audit_syscall_exit+0x214/0x2e0 (unreliable)
[126529.208916] [c00000bf6788fda0] [c0000000002db9e8] poll_select_set_timeout+0x98/0xe0
[126529.208919] [c00000bf6788fde0] [c0000000002dcf7c] SyS_poll+0x8c/0x160
[126529.208925] [c00000bf6788fe30] [c000000000009358] syscall_exit+0x0/0x98
[126529.208927] Instruction dump:
[126529.208930] 7d29ea14 6108c9ff 39400000 7fa94040 409d0038 4800001c 60000000 60000000 
[126529.208936] 60000000 60000000 60000000 60420000 <3d29c465> 394a0001 39293600 794a0020 

> I assume they're not good. In which case this should go to stable, shouldn't
> it? If so which versions?
> 
Yes this should go into stable. 3.19+

> And which commit introduced the bug?
> 
77b54e9f213f76a powernv/powerpc: Add winkle support for offline cpus


Thanks,
Shreyas

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-07-06  4:33 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-01  6:34 [PATCH] powerpc/powernv: Fix race in updating core_idle_state Shreyas B. Prabhu
2015-07-06  4:03 ` Michael Ellerman
2015-07-06  4:32   ` Shreyas B Prabhu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).