From: Michael Bringmann <mwb@linux.vnet.ibm.com>
To: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: ego@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org, Nicholas Piggin <npiggin@gmail.com>,
Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Subject: Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Date: Tue, 11 Dec 2018 16:11:28 -0600
Message-ID: <f93ec711-a7c6-e754-2002-2dad2a893005@linux.vnet.ibm.com>
In-Reply-To: <87a7ld3yz4.fsf@morokweng.localdomain>
Note from Scott Mayes on latest crash:
Michael,
Since the partition crashed, I was able to get the last 0.2 seconds' worth of RTAS call trace leading up to the crash.
Best I could tell from that bit of trace was that the removal of a processor involved the following steps:
-- Call to stop-self for a given thread
-- Repeated calls to query-cpu-stopped-state (which eventually indicated the thread was stopped)
-- Call to get-sensor-state for the thread to check its entity-state (9003) sensor which returned 'dr-entity-present'
-- Call to set-indicator to set the isolation-state (9001) indicator to ISOLATE state
-- Call to set-indicator to set the allocation-state (9003) indicator to UNUSABLE state
I noticed one example of thread x28 getting through all of these steps just fine, but for thread x20, although the
query-cpu-stopped-state call returned status 0 (STOPPED), a subsequent call to set-indicator to ISOLATE
failed. This failure was near the end of the trace, but was not the very last RTAS call made in the trace.
The set-indicator failure reported to Linux was a -9001 (Valid outstanding translation) which was mapped
from a 0x502 (Invalid thread state) return code from PHYP's H_SET_DR_STATE h-call.
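For reference, the removal sequence described in the note can be sketched as a small user-space simulation. This is only an illustration of the ordering of the calls, not the kernel's code: every name here (sim_thread, sim_query_cpu_stopped_state, sim_set_indicator, remove_thread, SIM_ERR_TIMEOUT) is hypothetical, the RTAS calls are stubs, and the "not stopped" value and the poll limit are assumptions. Only the 9001/9003 tokens, the 0 (STOPPED) status and the -9001 failure come from the trace.

```c
#include <stdio.h>

/* Tokens quoted in the note above. */
#define SIM_ISOLATION_STATE   9001
#define SIM_ALLOCATION_STATE  9003

#define SIM_CPU_STOPPED       0     /* status 0 == STOPPED, per the note */
#define SIM_CPU_NOT_STOPPED   1     /* assumption: any non-zero "not yet" value */
#define SIM_ERR_TIMEOUT       (-1)  /* hypothetical: gave up polling */

/* Hypothetical stand-in for one hardware thread as seen by firmware. */
struct sim_thread {
    unsigned int id;
    int polls_until_stopped;  /* queries that still report "not stopped" */
    int isolate_status;       /* status the simulated ISOLATE request returns */
};

/* Stub for query-cpu-stopped-state. */
static int sim_query_cpu_stopped_state(struct sim_thread *t)
{
    if (t->polls_until_stopped > 0) {
        t->polls_until_stopped--;
        return SIM_CPU_NOT_STOPPED;
    }
    return SIM_CPU_STOPPED;
}

/* Stub for set-indicator: only the isolation-state request can fail here. */
static int sim_set_indicator(struct sim_thread *t, int token)
{
    return token == SIM_ISOLATION_STATE ? t->isolate_status : 0;
}

/* Walk the five steps from the trace; return 0 or the first failing status. */
static int remove_thread(struct sim_thread *t, int max_polls)
{
    int polls, rc;

    /* 1. stop-self is issued by the thread being removed (not simulated). */
    printf("thread x%x: stop-self\n", t->id);

    /* 2. Poll query-cpu-stopped-state until it reports STOPPED. */
    for (polls = 0; polls < max_polls; polls++) {
        if (sim_query_cpu_stopped_state(t) == SIM_CPU_STOPPED)
            break;
    }
    if (polls == max_polls)
        return SIM_ERR_TIMEOUT;

    /* 3. get-sensor-state (9003): the stub always reports dr-entity-present. */
    printf("thread x%x: dr-entity-present\n", t->id);

    /* 4. set-indicator isolation-state (9001) to ISOLATE. */
    rc = sim_set_indicator(t, SIM_ISOLATION_STATE);
    if (rc != 0) {
        printf("thread x%x: ISOLATE failed, status %d\n", t->id, rc);
        return rc;
    }

    /* 5. set-indicator allocation-state (9003) to UNUSABLE. */
    return sim_set_indicator(t, SIM_ALLOCATION_STATE);
}
```

In this model, a thread set up like x28 (isolate_status of 0) gets through all five steps, while one set up like x20 (isolate_status of -9001) reports STOPPED in step 2 and then fails in step 4, which is the pattern seen in the trace.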
On 12/10/2018 02:31 PM, Thiago Jung Bauermann wrote:
>
> Hello Michael,
>
> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>
>> I have asked Scott Mayes to take a look at one of these crashes from
>> the phyp side. I will let you know if he finds anything notable.
>
> Thanks! It might make sense to test whether booting with
> cede_offline=off makes the bug go away.
Scott is looking at the system. I will try once he is finished.
>
> One suspicion I have is regarding the code handling CPU_STATE_INACTIVE.
> From what I understand, it is a powerpc-specific CPU state and from the
> perspective of the generic CPU hotplug state machine, inactive CPUs are
> already fully offline. Which means that the locking performed by the
> generic code state machine doesn't apply to transitioning CPUs from
> INACTIVE to OFFLINE state. Perhaps the bug is that there is more than
> one CPU making that transition at the same time? That would cause two
> CPUs to call RTAS stop-self.
>
> I haven't checked whether this is really possible or not, though. It's
> just a conjecture.
Michael
>
> --
> Thiago Jung Bauermann
> IBM Linux Technology Center
>
>
--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
mwb@linux.vnet.ibm.com