From: Shreyas B Prabhu <shreyas@linux.vnet.ibm.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>,
"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>,
Paul Mackerras <paulus@samba.org>
Subject: Re: offlining cpus breakage
Date: Wed, 14 Jan 2015 16:33:00 +0530
Message-ID: <54B64CE4.6050806@linux.vnet.ibm.com>
In-Reply-To: <54ACFE6D.3070308@ozlabs.ru>
On Wednesday 07 January 2015 03:07 PM, Alexey Kardashevskiy wrote:
> Hi!
>
> "ppc64_cpu --smt=off" produces multiple error on the latest upstream kernel
> (sha1 bdec419):
>
> NMI watchdog: BUG: soft lockup - CPU#20 stuck for 23s! [swapper/20:0]
>
> or
>
> INFO: rcu_sched detected stalls on CPUs/tasks: { 2 7 8 9 10 11 12 13 14 15
> 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31} (detected by 6,
> t=2102 jiffies, g=1617, c=1616, q=1441)
>
> and many others, all about lockups
>
> I did bisecting and found out that reverting these helps:
>
> 77b54e9f213f76a23736940cf94bcd765fc00f40 powernv/powerpc: Add winkle
> support for offline cpus
> 7cba160ad789a3ad7e68b92bf20eaad6ed171f80 powernv/cpuidle: Redesign idle
> states management
> 8eb8ac89a364305d05ad16be983b7890eb462cc3 powerpc/powernv: Enable Offline
> CPUs to enter deep idle states
>
> btw reverting just two of them produces a compile error.
>
> It is pseries_le_defconfig, POWER8 machine:
> timebase : 512000000
> platform : PowerNV
> model : palmetto
> machine : PowerNV palmetto
> firmware : OPAL v3
>
>
The bug scenario is as follows:

The decrementer state is not maintained in fastsleep, so a cpu entering
fastsleep offloads its timer to a different cpu (let's call this the
broadcast cpu). If the broadcast cpu is offlined, it hands the task of
broadcasting over to a new cpu.

If this new cpu is one of the cpus that had entered fastsleep, its
decrementer is in an invalid state. The cpu was woken by a need-resched
IPI (to take up the task of broadcasting) rather than by a broadcast IPI,
and the decrementer state is fixed only on a broadcast IPI. As a result
its timers do not fire, and it cannot wake any cpu that relies on a
broadcast IPI.
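
The sequence above can be sketched as a small toy model. This is only an
illustration of the reasoning, not kernel code: the Cpu class, the
function names and the four-cpu setup are all hypothetical, and a wakeup
is reduced to a single flag update.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    """Hypothetical model of one cpu; not a kernel data structure."""
    cpu_id: int
    online: bool = True
    in_fastsleep: bool = False
    decrementer_valid: bool = True

    def enter_fastsleep(self):
        # Decrementer state is lost in fastsleep, so the cpu depends on
        # the broadcast cpu to wake it up.
        self.in_fastsleep = True
        self.decrementer_valid = False

    def wake(self, reason):
        # Only a broadcast IPI fixes the decrementer state; a
        # need-resched IPI wakes the cpu but leaves it invalid.
        self.in_fastsleep = False
        if reason == "broadcast":
            self.decrementer_valid = True


def offline_broadcast_cpu(cpus, broadcast_id, new_id):
    """Offline the broadcast cpu, hand its duty to new_id, and return
    the ids of online cpus that remain stuck in fastsleep."""
    cpus[broadcast_id].online = False
    new = cpus[new_id]
    if new.in_fastsleep:
        new.wake("resched")
    # The new broadcast cpu can wake others only if its own timers fire.
    if new.decrementer_valid:
        for cpu in cpus.values():
            if cpu.online and cpu.in_fastsleep:
                cpu.wake("broadcast")
    return sorted(c.cpu_id for c in cpus.values()
                  if c.online and c.in_fastsleep)


def run_scenario(new_broadcast_id):
    """cpu 0 is the broadcast cpu; cpus 2 and 3 are in fastsleep."""
    cpus = {i: Cpu(i) for i in range(4)}
    cpus[2].enter_fastsleep()
    cpus[3].enter_fastsleep()
    stranded = offline_broadcast_cpu(cpus, 0, new_broadcast_id)
    return stranded, cpus[new_broadcast_id].decrementer_valid


if __name__ == "__main__":
    print("handoff to awake cpu 1:   ", run_scenario(1))
    print("handoff to sleeping cpu 2:", run_scenario(2))
```

In this model, handing the duty to an awake cpu leaves nothing stranded,
while handing it to a cpu that was in fastsleep leaves the remaining
sleeping cpu without a wakeup source.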

A cpu taking up the task of broadcasting while it is itself in fastsleep
is a corner case. It almost never happens on machines with a larger
number of cores, which explains why Alexey was able to hit it easily on
palmetto.

We will post a fix for this soon.

Thanks,
Shreyas