From: Scott Wood <oss@buserror.net>
To: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>,
"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
"laurentiu.tudor@nxp.com" <laurentiu.tudor@nxp.com>
Subject: Re: Machine Check in P2010(e500v2)
Date: Thu, 07 Sep 2017 20:56:50 -0500
Message-ID: <1504835810.17625.4.camel@buserror.net>
In-Reply-To: <1504692964.27247.68.camel@infinera.com>
On Wed, 2017-09-06 at 10:16 +0000, Joakim Tjernlund wrote:
> On Wed, 2017-09-06 at 10:05 +0000, Laurentiu Tudor wrote:
> > Hi Jocke,
> >
> > On 09/01/2017 02:32 PM, Joakim Tjernlund wrote:
> > > I am trying to debug a Machine Check for a P2010 (e500v2) CPU:
> > >
> > > [ 28.111816] Caused by (from MCSR=10008): Bus - Read Data Bus Error
> > > [ 28.117998] Oops: Machine check, sig: 7 [#1]
> > > [ 28.122263] P1010 RDB
> > > [ 28.124529] Modules linked in: linux_bcm_knet(PO) linux_user_bde(PO)
> > > linux_kernel_bde(PO)
> > > [ 28.132718] CPU: 0 PID: 470 Comm: emxp2_hw_bl Tainted:
> > > P O 4.1.38+ #49
> > > [ 28.140376] task: db16cd10 ti: df128000 task.ti: df128000
> > > [ 28.145770] NIP: 00000000 LR: 10a4e404 CTR: 10046c38
> > > [ 28.150730] REGS: df129f10 TRAP: 0204 Tainted:
> > > P O (4.1.38+)
> > > [ 28.157776] MSR: 0002d000 <CE,EE,PR,ME> CR: 44002428 XER: 00000000
> > > [ 28.164140] DEAR: b7187000 ESR: 00000000
> > > GPR00: 10a4e404 bf86ea30 b7ca94a0 132f9fa8 07006000 07000000 00000000
> > > 132f9fd8
> > > GPR08: b7149000 b7159000 0003e000 bf86ea20 24004424 11d6cf7c 00000000
> > > 00000000
> > > GPR16: 10f6e29c 10f6c872 10f6db01 0000b541 0000b541 11d92fcc 00000011
> > > 00000001
> > > GPR24: 01a4d12d 132ffbf0 11d60000 00000000 07006000 00000000 132f9fa8
> > > 00000000
> > > [ 28.196375] NIP [00000000] (null)
> > > [ 28.199859] LR [10a4e404] 0x10a4e404
> > > [ 28.203426] Call Trace:
> > > [ 28.205866] ---[ end trace f456255ddf9bee83 ]---
> > >
> > > I cannot figure out why NIP is NULL. It looks like NIP is set to
> > > MCSRR0 early on, but maybe it is lost somehow?
> > >
> > > Anyhow, looking at entry_32.S:
> > > .globl mcheck_transfer_to_handler
> > > mcheck_transfer_to_handler:
> > > mfspr r0,SPRN_DSRR0
> > > stw r0,_DSRR0(r11)
> > > mfspr r0,SPRN_DSRR1
> > > stw r0,_DSRR1(r11)
> > > /* fall through */
> > >
> > > .globl debug_transfer_to_handler
> > > debug_transfer_to_handler:
> > > mfspr r0,SPRN_CSRR0
> > > stw r0,_CSRR0(r11)
> > > mfspr r0,SPRN_CSRR1
> > > stw r0,_CSRR1(r11)
> > > /* fall through */
> > >
> > > .globl crit_transfer_to_handler
> > > crit_transfer_to_handler:
> > >
> > > It looks odd that DSRRx is saved in mcheck and CSRRx in debug, while
> > > crit saves none. Shouldn't this assignment be shifted down one level?
> > >
> >
> > This does indeed look weird. Have you tried moving the SPRN_CSRR*
> > saving into the crit section? Any results?
>
> After looking at this somewhat I think this is intentional and OK.
> I sorted NIP == NULL too:
> @@ -996,7 +998,7 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> if (is_in_pci_mem_space(addr)) {
> if (user_mode(regs)) {
> pagefault_disable();
> - ret = get_user(regs->nip, &inst);
> + ret = get_user(inst, (__u32 __user *)regs->nip);
> pagefault_enable();
> } else {
> ret = probe_kernel_address(regs->nip, inst);
:-(
>
> But after this, the CPU is still locked after a Machine Check. Is this
> to be expected? I figured the user space process would get a SIGBUS and
> the kernel would resume normal operations.
>
> Scott, maybe you have some idea?
The userspace process should exit with SIGBUS (not quite the same as receiving
a SIGBUS that can be handled). Maybe whatever is causing the machine check
ends up causing more problems that lead to the hang.
-Scott
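
For readers following the quoted diff: get_user(x, ptr) takes the
destination variable first and the source user pointer second, and it
zeroes the destination when the access fails. The user-space sketch below
mimics that contract to show why the swapped arguments would clobber
regs->nip, which would be consistent with the NIP of 00000000 in the oops.
Everything prefixed mock_, plus buggy_fetch, fixed_fetch and demo, is a
hypothetical stand-in invented for illustration and is not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pt_regs: only the field used here. */
struct mock_regs {
    uintptr_t nip;              /* address of the faulting instruction */
};

/* Hypothetical "user memory": the one word this mock treats as readable. */
uint32_t mock_user_insn = 0x80640000u;  /* arbitrary illustrative value */

/* Mock access check: only &mock_user_insn counts as a valid user pointer. */
int mock_access_ok(const void *p)
{
    return p == (const void *)&mock_user_insn;
}

/*
 * Mock of get_user(x, ptr): x is the DESTINATION variable, ptr is the
 * SOURCE user pointer. It returns 0 on success and -EFAULT on failure,
 * and sets x to 0 in the failure case.
 */
#define mock_get_user(x, ptr)                       \
    (mock_access_ok((const void *)(ptr))            \
         ? ((x) = *(ptr), 0)                        \
         : ((x) = 0, -EFAULT))

/* Argument order before the fix: the destination is regs->nip. */
int buggy_fetch(struct mock_regs *regs, uint32_t *inst)
{
    return mock_get_user(regs->nip, inst);
}

/* Argument order after the fix: read the instruction found at regs->nip. */
int fixed_fetch(struct mock_regs *regs, uint32_t *inst)
{
    return mock_get_user(*inst, (uint32_t *)regs->nip);
}

/* Small self-check exercising both orders. */
void demo(void)
{
    struct mock_regs regs = { .nip = (uintptr_t)&mock_user_insn };
    uint32_t inst = 0;

    /* Fixed order: instruction is fetched, nip is left alone. */
    assert(fixed_fetch(&regs, &inst) == 0 && inst == mock_user_insn);
    assert(regs.nip == (uintptr_t)&mock_user_insn);

    /* Buggy order: &inst is not a user pointer, so nip is zeroed. */
    assert(buggy_fetch(&regs, &inst) == -EFAULT);
    assert(regs.nip == 0);
}
```

Calling demo() from a test program runs both paths. In the mock, the
buggy order fails the access check because the address of a local
variable is not a user address, so the "destination" (nip) is zeroed and
the instruction is never read.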