linuxppc-dev.lists.ozlabs.org archive mirror
From: Nicholas Piggin <npiggin@gmail.com>
To: Christophe LEROY <christophe.leroy@c-s.fr>
Cc: Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
Date: Tue, 9 Oct 2018 22:14:46 +1000
Message-ID: <20181009221446.33b926e3@roar.ozlabs.ibm.com>
In-Reply-To: <e4c9e983-db3e-ab50-c30b-9d538e202147@c-s.fr>

On Tue, 9 Oct 2018 14:01:37 +0200
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> On 09/10/2018 at 13:16, Nicholas Piggin wrote:
> > On Tue, 9 Oct 2018 09:36:18 +0000
> > Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >   
> >> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:  
> >>> On Tue, 9 Oct 2018 06:46:30 +0200
> >>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> >>>      
> >>>> On 09/10/2018 at 06:32, Nicholas Piggin wrote:
> >>>>> On Mon, 8 Oct 2018 17:39:11 +0200
> >>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> >>>>>         
> >>>>>> Hi Nick,
> >>>>>>
> >>>>>> On 19/07/2017 at 08:59, Nicholas Piggin wrote:
> >>>>>>> Use nmi_enter() as the system reset interrupt does. This uses the
> >>>>>>> printk NMI buffers and turns off various debugging facilities, which
> >>>>>>> helps avoid tripping on ourselves or other CPUs.
> >>>>>>>
> >>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>>>>>> ---
> >>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---
> >>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >>>>>>> index 2849c4f50324..6d31f9d7c333 100644
> >>>>>>> --- a/arch/powerpc/kernel/traps.c
> >>>>>>> +++ b/arch/powerpc/kernel/traps.c
> >>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
> >>>>>>>      
> >>>>>>>      void machine_check_exception(struct pt_regs *regs)
> >>>>>>>      {
> >>>>>>> -	enum ctx_state prev_state = exception_enter();
> >>>>>>>      	int recover = 0;
> >>>>>>> +	bool nested = in_nmi();
> >>>>>>> +	if (!nested)
> >>>>>>> +		nmi_enter();  
> >>>>>>
> >>>>>> This alters preempt_count, so when die() is called
> >>>>>> in_interrupt() returns true although the trap didn't happen in
> >>>>>> interrupt context, and oops_end() panics with "Fatal exception in
> >>>>>> interrupt" instead of gently sending SIGBUS to the faulting app.
> >>>>>
> >>>>> Thanks for tracking that down.
> >>>>>         
> >>>>>> Any idea how to fix this?
> >>>>>
> >>>>> I would say we have to deliver the sigbus by hand.
> >>>>>
> >>>>>        if (user_mode(regs))
> >>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> >>>>>        else
> >>>>>            die("Machine check", regs, SIGBUS);
> >>>>>         
> >>>>
> >>>> And what about all the other things done by die()?
> >>>>
> >>>> And what if it is a kernel thread?
> >>>>
> >>>> On one of my boards, I have a kernel thread regularly checking the HW,
> >>>> and if it gets a machine check I expect it to stop gently and the die
> >>>> notification to be delivered to all registered notifiers.
> >>>>
> >>>> Before this patch, it was working well.
> >>>
> >>> I guess the alternative is that we could check regs->trap for a machine
> >>> check in the die test. The complication is having to account for an MCE
> >>> taken inside an interrupt handler.
> >>>
> >>>          if (in_interrupt()) {
> >>>                  if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
> >>>                          panic("Fatal exception in interrupt");
> >>>          }
> >>>
> >>> Something like that might work for you? We need a ppc64 macro for the
> >>> MCE, and can probably add something like in_nmi_from_interrupt() for
> >>> the second part of the test.
> >>
> >> I don't know. I'm away from home on a business trip, so I won't be able
> >> to test anything before next week. However, it looks more or less like a
> >> hack, doesn't it?
> > 
> > I thought it seemed okay (with the right functions added). Actually it
> > could be a bit nicer to do this, which then works generally:
> > 
> >           if (in_interrupt()) {
> >                   if (!in_nmi() || in_nmi_from_interrupt())
> >                           panic("Fatal exception in interrupt");
> >           }
> 
> 
> Yes, looks nice, but:
> 1/ What is in_nmi_from_interrupt()? Is it (in_nmi() && (in_irq() ||
> in_softirq()))?

  return (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)) != 0;

(basically just in_interrupt() with the nmi_enter undone)
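
To make the arithmetic concrete, here is a minimal userspace sketch in C; it is
a model, not kernel code. It assumes the preempt_count layout of kernels from
that period (8 preempt bits, 8 softirq bits, 4 hardirq bits, 1 NMI bit) and
that nmi_enter() adds NMI_OFFSET + HARDIRQ_OFFSET. model_preempt_count and
should_panic() are names invented for illustration, and
in_nmi_from_interrupt() is only the helper proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed preempt_count layout: preempt 0-7, softirq 8-15, hardirq 16-19, NMI 20. */
#define SOFTIRQ_OFFSET (1u << 8)
#define HARDIRQ_OFFSET (1u << 16)
#define NMI_OFFSET     (1u << 20)
#define SOFTIRQ_MASK   (0xffu << 8)
#define HARDIRQ_MASK   (0xfu << 16)
#define NMI_MASK       (0x1u << 20)

/* Hypothetical stand-in for the per-task value returned by preempt_count(). */
static unsigned int model_preempt_count;

static inline unsigned int irq_count(void)
{
	return model_preempt_count & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK);
}

static inline bool in_interrupt(void) { return irq_count() != 0; }
static inline bool in_nmi(void) { return (model_preempt_count & NMI_MASK) != 0; }

/* Accounting part of nmi_enter(): bumps both the NMI and the hardirq field. */
static inline void nmi_enter(void)
{
	assert(!in_nmi());	/* this model forbids nested nmi_enter() */
	model_preempt_count += NMI_OFFSET + HARDIRQ_OFFSET;
}

static inline void nmi_exit(void)
{
	assert(in_nmi());
	model_preempt_count -= NMI_OFFSET + HARDIRQ_OFFSET;
}

/* Proposed helper: in_interrupt() with the nmi_enter() undone.
 * Only meaningful while in_nmi() is true. */
static inline bool in_nmi_from_interrupt(void)
{
	return (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)) != 0;
}

/* The proposed test for the die path: panic unless the NMI interrupted
 * a context that was not itself an interrupt. */
static inline bool should_panic(void)
{
	return in_interrupt() && (!in_nmi() || in_nmi_from_interrupt());
}
```

Starting from a zero counter, nmi_enter() makes in_interrupt() true while
should_panic() stays false. Starting from a hardirq (counter equal to
HARDIRQ_OFFSET), should_panic() is true both before and after nmi_enter().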

> 2/ What about in_nmi_from_nmi()? How do we detect that?

Oh, good point, I'm not sure. I guess we could call irq_enter() in the
nested case; I think that would make in_nmi_from_interrupt()
return true.
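
The nested case can be sketched the same way. This is a userspace model in C
under the same layout assumptions as the kernel of that period, and it reduces
irq_enter() to its counter update (adding HARDIRQ_OFFSET); the real function
does more. nest_count, mce_entry() and mce_exit() are hypothetical names that
mirror the "nested = in_nmi()" entry sequence from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define HARDIRQ_OFFSET (1u << 16)
#define NMI_OFFSET     (1u << 20)	/* single NMI bit, so also the NMI mask */
/* softirq, hardirq and NMI fields together, as used by irq_count() */
#define IRQ_COUNT_MASK ((0xffu << 8) | (0xfu << 16) | (0x1u << 20))

/* Hypothetical stand-in for the value returned by preempt_count(). */
static unsigned int nest_count;

static inline bool nest_in_nmi(void)
{
	return (nest_count & NMI_OFFSET) != 0;
}

/* Proposed helper from the thread, applied to the model counter. */
static inline bool nest_in_nmi_from_interrupt(void)
{
	return ((nest_count & IRQ_COUNT_MASK) - (NMI_OFFSET + HARDIRQ_OFFSET)) != 0;
}

/* Handler entry. Returns whether this entry was nested inside an NMI.
 * use_irq_enter selects the variant suggested for the nested case. */
static inline bool mce_entry(bool use_irq_enter)
{
	bool nested = nest_in_nmi();

	if (!nested)
		nest_count += NMI_OFFSET + HARDIRQ_OFFSET;	/* nmi_enter() */
	else if (use_irq_enter)
		nest_count += HARDIRQ_OFFSET;	/* counter part of irq_enter() */
	return nested;
}

/* Handler exit: undo exactly what the matching entry did. */
static inline void mce_exit(bool nested, bool use_irq_enter)
{
	if (!nested)
		nest_count -= NMI_OFFSET + HARDIRQ_OFFSET;
	else if (use_irq_enter)
		nest_count -= HARDIRQ_OFFSET;
}
```

Without the extra increment, a machine check nested inside an NMI leaves the
counter unchanged, so the helper cannot tell it apart from the outer NMI. With
it, the hardirq field holds 2 and the subtraction leaves a non-zero remainder.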

Thanks,
Nick


Thread overview: 21+ messages
2017-07-19  6:59 [PATCH v2 0/3] machine check handling improvements Nicholas Piggin
2017-07-19  6:59 ` [PATCH v2 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart Nicholas Piggin
2017-07-19  7:16   ` Nicholas Piggin
2017-07-20  5:39   ` Mahesh Jagannath Salgaonkar
2017-08-31 11:36   ` [v2, " Michael Ellerman
2017-07-19  6:59 ` [PATCH v2 2/3] powerpc/powernv: machine check use kernel crash path Nicholas Piggin
2017-07-20  7:14   ` Mahesh Jagannath Salgaonkar
2017-07-19  6:59 ` [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt Nicholas Piggin
2018-10-08 15:39   ` Christophe LEROY
2018-10-09  4:32     ` Nicholas Piggin
2018-10-09  4:46       ` Christophe LEROY
2018-10-09  5:30         ` Nicholas Piggin
2018-10-09  9:36           ` Christophe Leroy
2018-10-09 11:16             ` Nicholas Piggin
2018-10-09 12:01               ` Christophe LEROY
2018-10-09 12:14                 ` Nicholas Piggin [this message]
2018-10-11 14:23                   ` Christophe LEROY
2018-10-11 14:31               ` Christophe LEROY
2018-10-13  8:29                 ` Christophe Leroy
2018-10-13  8:48                   ` Nicholas Piggin
2018-10-13  8:56                     ` Christophe LEROY
