From: Nathan Lynch <nathanl@linux.ibm.com>
To: "Nicholas Piggin" <npiggin@gmail.com>,
"Leonardo Brás" <leobras.c@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"
Date: Mon, 19 Sep 2022 08:51:16 -0500 [thread overview]
Message-ID: <87illjqxpn.fsf@linux.ibm.com> (raw)
In-Reply-To: <87h717t24d.fsf@linux.ibm.com>
Nathan Lynch <nathanl@linux.ibm.com> writes:
> "Nicholas Piggin" <npiggin@gmail.com> writes:
>> On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
>>> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
>>> > Leonardo Brás <leobras.c@gmail.com> writes:
>>> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
>>> > > > Leonardo Brás <leobras.c@gmail.com> writes:
>>> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
>>> > > > > > At the time this was submitted by Leonardo, I confirmed -- or thought
>>> > > > > > I had confirmed -- with PowerVM partition firmware development that
>>> > > > > > the following RTAS functions:
>>> > > > > >
>>> > > > > > - ibm,get-xive
>>> > > > > > - ibm,int-off
>>> > > > > > - ibm,int-on
>>> > > > > > - ibm,set-xive
>>> > > > > >
>>> > > > > > were safe to call on multiple CPUs simultaneously, not only with
>>> > > > > > respect to themselves as indicated by PAPR, but with arbitrary other
>>> > > > > > RTAS calls:
>>> > > > > >
>>> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
>>> > > > > >
>>> > > > > > Recent discussion with firmware development makes it clear that this
>>> > > > > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
>>> > > > > > Implement reentrant rtas call") is unsafe, likely explaining several
>>> > > > > > strange bugs we've seen in internal testing involving DLPAR and
>>> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal state
>>> > > > > > can be corrupted by the concurrent use of the "reentrant" functions,
>>> > > > > > leading to symptoms like endless busy statuses from RTAS.
>>> > > > >
>>> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
>>> > > >
>>> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
>>> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
>>> > > > property described in the spec applies only to the individual RTAS
>>> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
>>> > > > simultaneously, but it must adhere to the more general requirement to
>>> > > > serialize with other RTAS functions.
>>> > > >
>>> > >
>>> > > I see. Thanks for explaining that part!
>>> > > I agree: reentrant calls that way don't look as useful on Linux than I
>>> > > previously thought.
>>> > >
>>> > > OTOH, I think that instead of reverting the change, we could make use of the
>>> > > correct information and fix the current implementation. (This could help when we
>>> > > do the same rtas call in multiple cpus)
>>> >
>>> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
>>> > that. I'm not seeing the need.
>>> >
>>> > > I have an idea of a patch to fix this.
>>> > > Do you think it would be ok if I sent that, to prospect being an alternative to
>>> > > this reversion?
>>> >
>>> > It is my preference, and I believe it is more common, to revert to the
>>> > well-understood prior state, imperfect as it may be. The revert can be
>>> > backported to -stable and distros while development and review of
>>> > another approach proceeds.
>>>
>>> Ok then, as long as you are aware of the kdump bug, I'm good.
>>>
>>> FWIW:
>>> Reviewed-by: Leonardo Bras <leobras.c@gmail.com>
>>
>> A shame. I guess a reader/writer lock would not be much help because
>> the crash is probably more likely to hit longer running rtas calls?
>>
>> Alternative is just cheat and do this...?
[...]
>
> I wonder - would it be worth making the panic path use a separate
> "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
> RTAS at panic time, then leaving rtas.args untouched might make the
> ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
> incur on the panic path more likely to succeed.
Regardless, I request that we proceed with the revert while the crash
path hardening gets sorted out. If I understand the motivation behind
commit b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call"),
then it looks like it was incomplete anyway? rtas_os_term() still takes
the lock when calling ibm,os-term.
next prev parent reply other threads:[~2022-09-19 13:52 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-07 22:01 [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call" Nathan Lynch
2022-09-08 7:56 ` Laurent Dufour
[not found] ` <1d76891ee052112ee1547a4027e358d5cbcac23d.camel@gmail.com>
2022-09-09 14:04 ` Nathan Lynch
2022-09-12 15:22 ` Leonardo Brás
2022-09-12 19:58 ` Nathan Lynch
2022-09-13 17:39 ` Leonardo Brás
2022-09-16 1:31 ` Nicholas Piggin
2022-09-16 21:56 ` Nathan Lynch
2022-09-19 13:51 ` Nathan Lynch [this message]
2022-09-19 23:45 ` Michael Ellerman
2022-09-20 3:54 ` Nicholas Piggin
2022-09-21 15:54 ` Nathan Lynch
2023-04-14 14:20 ` Michal Suchánek
2023-04-17 13:55 ` Nathan Lynch
2022-09-23 10:57 ` Michael Ellerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87illjqxpn.fsf@linux.ibm.com \
--to=nathanl@linux.ibm.com \
--cc=leobras.c@gmail.com \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=npiggin@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.