Re: Machine Check in P2010(e500v2)

From: Scott Wood <oss@buserror.net>
To: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"laurentiu.tudor@nxp.com" <laurentiu.tudor@nxp.com>
Subject: Re: Machine Check in P2010(e500v2)
Date: Thu, 07 Sep 2017 20:56:50 -0500
Message-ID: <1504835810.17625.4.camel@buserror.net>
In-Reply-To: <1504692964.27247.68.camel@infinera.com>

On Wed, 2017-09-06 at 10:16 +0000, Joakim Tjernlund wrote:
> On Wed, 2017-09-06 at 10:05 +0000, Laurentiu Tudor wrote:
> > Hi Jocke,
> > 
> > On 09/01/2017 02:32 PM, Joakim Tjernlund wrote:
> > > I am trying to debug a Machine Check for a P2010 (e500v2) CPU:
> > > 
> > > [   28.111816] Caused by (from MCSR=10008): Bus - Read Data Bus Error
> > > [   28.117998] Oops: Machine check, sig: 7 [#1]
> > > [   28.122263] P1010 RDB
> > > > [   28.124529] Modules linked in: linux_bcm_knet(PO) linux_user_bde(PO) linux_kernel_bde(PO)
> > > > [   28.132718] CPU: 0 PID: 470 Comm: emxp2_hw_bl Tainted: P           O    4.1.38+ #49
> > > [   28.140376] task: db16cd10 ti: df128000 task.ti: df128000
> > > [   28.145770] NIP: 00000000 LR: 10a4e404 CTR: 10046c38
> > > > [   28.150730] REGS: df129f10 TRAP: 0204   Tainted: P           O     (4.1.38+)
> > > [   28.157776] MSR: 0002d000 <CE,EE,PR,ME>  CR: 44002428  XER: 00000000
> > > [   28.164140] DEAR: b7187000 ESR: 00000000
> > > > GPR00: 10a4e404 bf86ea30 b7ca94a0 132f9fa8 07006000 07000000 00000000 132f9fd8
> > > > GPR08: b7149000 b7159000 0003e000 bf86ea20 24004424 11d6cf7c 00000000 00000000
> > > > GPR16: 10f6e29c 10f6c872 10f6db01 0000b541 0000b541 11d92fcc 00000011 00000001
> > > > GPR24: 01a4d12d 132ffbf0 11d60000 00000000 07006000 00000000 132f9fa8 00000000
> > > [   28.196375] NIP [00000000]   (null)
> > > [   28.199859] LR [10a4e404] 0x10a4e404
> > > [   28.203426] Call Trace:
> > > [   28.205866] ---[ end trace f456255ddf9bee83 ]---
> > > 
> > > > I cannot figure out why NIP is NULL. It looks like NIP is set from
> > > > MCSRR0 early on, but maybe it is lost somehow?
> > > 
> > > Anyhow, looking at entry_32.S:
> > > 	.globl	mcheck_transfer_to_handler
> > > mcheck_transfer_to_handler:
> > > 	mfspr	r0,SPRN_DSRR0
> > > 	stw	r0,_DSRR0(r11)
> > > 	mfspr	r0,SPRN_DSRR1
> > > 	stw	r0,_DSRR1(r11)
> > > 	/* fall through */
> > > 
> > > 	.globl	debug_transfer_to_handler
> > > debug_transfer_to_handler:
> > > 	mfspr	r0,SPRN_CSRR0
> > > 	stw	r0,_CSRR0(r11)
> > > 	mfspr	r0,SPRN_CSRR1
> > > 	stw	r0,_CSRR1(r11)
> > > 	/* fall through */
> > > 
> > > 	.globl	crit_transfer_to_handler
> > > crit_transfer_to_handler:
> > > 
> > > > It looks odd that DSRRx is saved in mcheck and CSRRx in debug, while
> > > > crit saves none. Should these assignments not be shifted down one level?
> > > 
> > 
> > > This does indeed look weird. Have you tried moving the SPRN_CSRR*
> > > saving into the crit section? Any results?
> 
> After looking at this somewhat, I think this is intentional and OK.
> I sorted out NIP == NULL too:
> @@ -996,7 +998,7 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
>         if (is_in_pci_mem_space(addr)) {
>                 if (user_mode(regs)) {
>                         pagefault_disable();
> -                       ret = get_user(regs->nip, &inst);
> +                       ret = get_user(inst, (__u32 __user *)regs->nip);
>                         pagefault_enable();
>                 } else {
>                         ret = probe_kernel_address(regs->nip, inst);

:-(

> 
> But after this, the CPU is still locked after a Machine Check. Is this
> to be expected? I figured the user space process would get a SIGBUS and
> the kernel would resume normal operations.
> 
> Scott, maybe you have some idea?

The userspace process should exit with SIGBUS (not quite the same as receiving
a SIGBUS that can be handled).  Maybe whatever is causing the machine check
ends up causing more problems that lead to the hang.

-Scott
