From: Michael Bringmann <mwb@linux.vnet.ibm.com>
To: ego@linux.vnet.ibm.com, Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
Nicholas Piggin <npiggin@gmail.com>,
Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Subject: Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Date: Mon, 14 Jan 2019 12:11:44 -0600 [thread overview]
Message-ID: <74b2262b-48e7-b2a5-7d20-dc7f590958d9@linux.vnet.ibm.com> (raw)
In-Reply-To: <20190109060823.GA20248@in.ibm.com>
On 1/9/19 12:08 AM, Gautham R Shenoy wrote:
> I did some testing during the holidays. Here are the observations:
>
> 1) With just your patch (without any additional debug patch), if I run
> DLPAR on /off operations on a system that has SMT=off, I am able to
> see a crash involving RTAS stack corruption within an hour's time.
>
> 2) With the debug patch (appended below) which has additional debug to
> capture the callers of stop-self, start-cpu, set-power-levels, the
> system is able to perform DLPAR on/off operations on a system with
> SMT=off for three days. And then, it crashed with the dead CPU showing
> a "Bad kernel stack pointer". From this log, I can clearly
> see that there were no concurrent calls to stop-self, start-cpu,
> set-power-levels. The only concurrent RTAS calls were the dying CPU
> calling "stop-self", and the CPU running the DLPAR operation calling
> "query-cpu-stopped-state". The crash signature is appended below as
> well.
>
> 3) Modifying your patch to remove the udelay and increase the loop
> count from 25 to 1000 doesn't improve the situation. We are still able
> to see the crash.
>
> 4) With my patch, even without any additional debug, I was able to
> observe the system run the tests successfully for over a week (I
> started the tests before the Christmas weekend, and forgot to turn it
> off!)
So does this mean that the problem is fixed with your patch?
>
> It appears that there is a narrow race window involving rtas-stop-self
> and rtas-query-cpu-stopped-state calls that can be observed with your
> patch. Adding any printk's seems to reduce the probability of hitting
> this race window. It might be worth the while to check with RTAS
> folks, if they suspect something here.
What would the RTAS folks be looking at here? The 'narrow race window'
is with respect to a patch that it sound like we should not be using.
Thanks.
Michael
--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
mwb@linux.vnet.ibm.com
next prev parent reply other threads:[~2019-01-14 18:13 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-06 11:31 [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop Gautham R. Shenoy
2018-12-06 17:28 ` Thiago Jung Bauermann
2018-12-07 10:43 ` Gautham R Shenoy
2018-12-07 12:03 ` Gautham R Shenoy
2018-12-08 2:40 ` Thiago Jung Bauermann
2018-12-10 20:16 ` Michael Bringmann
2018-12-10 20:31 ` Thiago Jung Bauermann
2018-12-11 22:11 ` Michael Bringmann
2019-01-09 6:08 ` Gautham R Shenoy
2019-01-14 18:11 ` Michael Bringmann [this message]
2019-01-17 6:03 ` Gautham R Shenoy
2022-07-07 9:51 ` Christophe Leroy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=74b2262b-48e7-b2a5-7d20-dc7f590958d9@linux.vnet.ibm.com \
--to=mwb@linux.vnet.ibm.com \
--cc=bauerman@linux.ibm.com \
--cc=ego@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=npiggin@gmail.com \
--cc=tyreld@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).