Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

From: Leonardo Bras <leonardo@linux.ibm.com>
To: Nicholas Piggin <npiggin@gmail.com>,
	Alexios Zavras <alexios.zavras@intel.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Christophe Leroy <christophe.leroy@c-s.fr>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Enrico Weigelt <info@metux.net>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Paul Mackerras <paulus@samba.org>,
	peterz@infradead.org, Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash
Date: Tue, 07 Apr 2020 23:36:57 -0300	[thread overview]
Message-ID: <fb98f346a4d6a9d689ae64dae33cbd45d2f8b0df.camel@linux.ibm.com> (raw)
In-Reply-To: <1585895551.7o9oa0ey62.astroid@bobo.none>

[-- Attachment #1: Type: text/plain, Size: 3648 bytes --]

Hello Nick, Michael,

On Fri, 2020-04-03 at 16:41 +1000, Nicholas Piggin wrote:
[...]
> > > PAPR says we are not allowed to have multiple CPUs calling RTAS at once,
> > > except for a very small list of RTAS calls. So if we bust the RTAS lock
> > > there's a risk we violate that part of PAPR and crash even harder.
> > 
> > Interesting, I was not aware.
> > 
> > > Also it's not specific to kdump, we can't even get through a normal
> > > reboot if we crash with the RTAS lock held.
> > > 
> > > Anyway here's a patch with some ideas. That allows me to get from a
> > > crash with the RTAS lock held through kdump into the 2nd kernel. But it
> > > only works if it's the crashing CPU that holds the RTAS lock.
> > > 
> > 
> > Nice idea. 
> > But my test environment is just triggering a crash from sysrq, so I
> > think it would not improve the result, given that this thread is
> > probably not holding the lock by the time.
> 
> Crash paths should not take that RTAS lock, it's a massive pain. I'm 
> fixing it for machine check, for other crashes I think it can be removed 
> too, it just needs to be unpicked. The good thing with crashing is that 
> you can reasonably *know* that you're single threaded, so you can 
> usually reason through situations like above.
> 
> > I noticed that when rtas is locked, irqs and preemption are also
> > disabled.
> > 
> > Should the IPI send by crash be able to interrupt a thread with
> > disabled irqs?
> 
> Yes. It's been a bit painful, but in the long term it means that a CPU 
> which hangs with interrupts off can be debugged, and it means we can 
> take it offline to crash without risking that it will be clobbering what 
> we're doing.
> 
> Arguably what I should have done is try a regular IPI first, wait a few 
> seconds, then NMI IPI.
> 
> A couple of problems with that. Firstly it probably avoids this issue 
> you hit almost all the time, so it won't get fixed. So when we really 
> need the NMI IPI in the field, it'll still be riddled with deadlocks.
> 
> Secondly, sending the IPI first in theory can be more intrusive to the 
> state that we want to debug. It uses the currently running stack, paca 
> save areas, ec. NMI IPI uses its own stack and save regions so it's a 
> little more isolated. Maybe this is only a small advantage but I'd like 
> to have it if we can.  
> 
> Thanks,
> Nick


I think the printk issue is solved (sent a patch on that), now what is
missing is the rtas call spinlock.

I noticed that rtas.lock is taken on machine_kexec_mask_interrupts(),
which happen after crashing the other threads and getting into
realmode. 

The following rtas are called each IRQ with valid interrupt descriptor:
ibm,int-off : Reset mask bit for that interrupt
ibm,set_xive : Set XIVE priority to 0xff

By what I could understand, these rtas calls happen to put the next
kexec kernel (kdump kernel) in a safer environment, so I think it's not
safe to just remove them.
 
(See commit d6c1a9081080c6c4658acf2a06d851feb2855933)

On the other hand, busting the rtas.lock could be dangerous, because
it's code we can't control.

According with LoPAR, for both of these rtas-calls, we have:

For the PowerPC External Interrupt option: The call must be reentrant
to the number of processors on the platform.
For the PowerPC External Interrupt option: The argument call buffer for
each simultaneous call must be physically unique.

Which I think means this rtas-calls can be done simultaneously.
Would it mean that busting the rtas.lock for these calls would be safe?

Best regards,
Leonardo Bras

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

next prev parent reply	other threads:[~2020-04-08  2:42 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-01  0:00 [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash Leonardo Bras
2020-04-01  3:07 ` kbuild test robot
2020-04-01  9:26 ` Peter Zijlstra
2020-04-01 23:53   ` Leonardo Bras
2020-04-02 11:28 ` Michael Ellerman
2020-04-03  0:37   ` Leonardo Bras
2020-04-03  6:41     ` Nicholas Piggin
2020-04-08  2:36       ` Leonardo Bras [this message]
2020-04-08 12:21         ` Michael Ellerman
2020-04-08 16:48           ` Leonardo Bras
2020-04-08 18:00           ` Leonardo Bras
2020-04-08 22:55           ` Leonardo Bras
2020-04-09  0:27           ` Paul Mackerras
2020-05-12  3:48             ` Leonardo Bras
2020-05-12 10:42             ` Michael Ellerman
2020-04-06 18:46   ` Leonardo Bras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fb98f346a4d6a9d689ae64dae33cbd45d2f8b0df.camel@linux.ibm.com \
    --to=leonardo@linux.ibm.com \
    --cc=alexios.zavras@intel.com \
    --cc=benh@kernel.crashing.org \
    --cc=christophe.leroy@c-s.fr \
    --cc=gregkh@linuxfoundation.org \
    --cc=info@metux.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).