From: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
To: "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
"leoyang.li@nxp.com" <leoyang.li@nxp.com>,
"york.sun@nxp.com" <york.sun@nxp.com>
Subject: Re: Machine Check in P2010(e500v2)
Date: Thu, 7 Sep 2017 08:41:20 +0000
Message-ID: <1504773676.31322.2.camel@infinera.com>
In-Reply-To: <1504738204.27247.133.camel@infinera.com>

On Thu, 2017-09-07 at 00:50 +0200, Joakim Tjernlund wrote:
> On Wed, 2017-09-06 at 21:13 +0000, Leo Li wrote:
> > > -----Original Message-----
> > > From: Joakim Tjernlund [mailto:Joakim.Tjernlund@infinera.com]
> > > Sent: Wednesday, September 06, 2017 3:54 PM
> > > To: linuxppc-dev@lists.ozlabs.org; Leo Li <leoyang.li@nxp.com>; York Sun <york.sun@nxp.com>
> > > Subject: Re: Machine Check in P2010(e500v2)
> > >
> > > On Wed, 2017-09-06 at 20:28 +0000, Leo Li wrote:
> > > > > -----Original Message-----
> > > > > From: Joakim Tjernlund [mailto:Joakim.Tjernlund@infinera.com]
> > > > > Sent: Wednesday, September 06, 2017 3:17 PM
> > > > > To: linuxppc-dev@lists.ozlabs.org; Leo Li <leoyang.li@nxp.com>; York Sun <york.sun@nxp.com>
> > > > > Subject: Re: Machine Check in P2010(e500v2)
> > > > >
> > > > > On Wed, 2017-09-06 at 19:31 +0000, Leo Li wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: York Sun
> > > > > > > Sent: Wednesday, September 06, 2017 10:38 AM
> > > > > > > To: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org; Leo Li <leoyang.li@nxp.com>
> > > > > > > Subject: Re: Machine Check in P2010(e500v2)
> > > > > > >
> > > > > > > Scott is no longer with Freescale/NXP. Adding Leo.
> > > > > > >
> > > > > > > On 09/05/2017 01:40 AM, Joakim Tjernlund wrote:
> > > > > > > > So after some debugging I found this bug:
> > > > > > > > @@ -996,7 +998,7 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> > > > > > > >  	if (is_in_pci_mem_space(addr)) {
> > > > > > > >  		if (user_mode(regs)) {
> > > > > > > >  			pagefault_disable();
> > > > > > > > -			ret = get_user(regs->nip, &inst);
> > > > > > > > +			ret = get_user(inst, (__u32 __user *)regs->nip);
> > > > > > > >  			pagefault_enable();
> > > > > > > >  		} else {
> > > > > > > >  			ret = probe_kernel_address(regs->nip, inst);
> > > > > > > >
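To make the argument-order bug in the diff concrete: the kernel's get_user(x, ptr) reads through the user pointer ptr into x, so the original call named regs->nip as the destination and &inst (a kernel stack address) as the source, and the instruction word at NIP was never fetched. The sketch below models only the copy direction in plain user-space C. model_get_user, struct model_regs, fetch_buggy and fetch_fixed are hypothetical names for illustration; the macro performs none of get_user's access checks or -EFAULT handling, so in the kernel the exact result of the swapped call also depends on those checks.

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's get_user(x, ptr):
 * copy the value stored at ptr into x and evaluate to 0 (success).
 * The real macro also validates ptr and can fail with -EFAULT. */
#define model_get_user(x, ptr) ((x) = *(ptr), 0)

/* Hypothetical stand-in for struct pt_regs: only NIP matters here. */
struct model_regs {
	uintptr_t nip; /* address of the instruction that took the machine check */
};

/* Original (buggy) order: 'inst' is the source and regs->nip the
 * destination, so the saved NIP is overwritten and nothing is fetched. */
static uint32_t fetch_buggy(struct model_regs *regs)
{
	uint32_t inst = 0;

	(void)model_get_user(regs->nip, &inst);
	return inst;
}

/* Corrected order: read the 32-bit instruction word that NIP points at. */
static uint32_t fetch_fixed(const struct model_regs *regs)
{
	uint32_t inst = 0;

	(void)model_get_user(inst, (const uint32_t *)regs->nip);
	return inst;
}
```

With NIP pointing at a word holding 0x80640000 (lwz r3,0(r4)), fetch_fixed returns that word and leaves NIP untouched, while fetch_buggy returns 0 and leaves NIP clobbered.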
> > > > > > > > However, the kernel still locked up after fixing that.
> > > > > > > > Now I wonder why this fixup is there in the first place? The
> > > > > > > > routine does not really fix up the insn; it just returns 0xffffffff
> > > > > > > > for the failing read and then advances the process NIP.
> > > > > >
> > > > > > You are right. The code here only gives 0xffffffff to the load
> > > > > > instruction and continues with the next instruction when the load
> > > > > > instruction causes the machine check. This prevents a system lockup
> > > > > > when reading from a PCI/RapidIO device whose link is down.
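The fixup described here can be modelled outside the kernel. The sketch below is a simplified, hypothetical version of the idea, not the in-tree helper: it decodes a few PowerPC D-form zero-extending loads (lwz, lhz, lbz and their update forms), writes all ones into the target register, applies the base-register update for the update forms, and advances NIP by 4. The thread only states the 0xffffffff word case; using 0xffff and 0xff for halfword and byte loads is an assumption of this sketch, and struct model_regs and fake_failed_load are illustrative names.

```c
#include <stdint.h>

/* Hypothetical register model; the kernel uses struct pt_regs. */
struct model_regs {
	uint32_t gpr[32];
	uint32_t nip;
};

/* PowerPC D-form primary opcodes for zero-extending loads. */
enum {
	OP_LWZ = 32, OP_LWZU = 33,
	OP_LBZ = 34, OP_LBZU = 35,
	OP_LHZ = 40, OP_LHZU = 41,
};

/* Fake the result of a load that took a machine check: put all ones of
 * the access width into RT, update RA for the "u" forms, and step NIP
 * past the instruction. Returns 1 if handled, 0 otherwise (e.g. stores). */
static int fake_failed_load(struct model_regs *regs, uint32_t inst)
{
	unsigned int op = inst >> 26;
	unsigned int rt = (inst >> 21) & 0x1f;
	unsigned int ra = (inst >> 16) & 0x1f;
	int32_t d = (int16_t)(inst & 0xffff); /* sign-extended displacement */

	switch (op) {
	case OP_LWZ: case OP_LWZU:
		regs->gpr[rt] = 0xffffffffu;
		break;
	case OP_LHZ: case OP_LHZU:
		regs->gpr[rt] = 0xffffu;
		break;
	case OP_LBZ: case OP_LBZU:
		regs->gpr[rt] = 0xffu;
		break;
	default:
		return 0; /* not a load this model understands */
	}

	if (op == OP_LWZU || op == OP_LHZU || op == OP_LBZU)
		regs->gpr[ra] += (uint32_t)d; /* update form: RA <- RA + D */

	regs->nip += 4; /* resume at the next instruction */
	return 1;
}
```

For example, lwz r3,0(r4) encodes as 0x80640000; passing it in sets gpr[3] to 0xffffffff and moves NIP forward one instruction, while a store such as stw r3,0(r4) (0x90640000) is left unhandled, which is why a faulting write is not covered by this kind of fixup.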
> > > > > >
> > > > > > I don't know what the actual problem is in your case. Maybe it is a
> > > > > > write instruction instead of a read? Or is the code in an infinite
> > > > > > loop waiting for a valid read result? Are you able to do some further
> > > > > > debugging with the NIP correctly printed?
> > > > > >
> > > > >
> > > > > According to the MC it is a read, and the NIP also leads to a read in
> > > > > the program.
> > > > > ATM, I have disabled the fixup, but I will enable it again.
> > > > > Question: is it safe to add a small printk when this MC happens (after
> > > > > fixing up)? I need to see that it has happened, as the error is
> > > > > somewhat random.
> > > >
> > > > I think it is safe to add a printk, as the current machine check
> > > > handlers are also using printk.
> > >
> > > I hope so, but if the fixup fires there is no printk at all, so I was a
> > > bit unsure.
> > > I don't like this fixup, though. Is there not a better way than faking a
> > > read to user space (or kernel, for that matter)?
> >
> > I don't have a better idea. Without the fixup, the offending load
> > instruction will never finish if there is anything wrong with the backing
> > device, and it will freeze the whole system. Do you have any suggestion in
> > mind?
> >
>
> But it never finishes the load; it just fakes a load of 0xffffffff. For user
> space I would rather have it signal a SIGBUS, but that does not seem to work
> either, at least not for us, though that could be a bug in the general MC
> code.
> This fixup might be valid for the kernel only, as it has never worked for
> user space due to the bug I found.
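The alternative suggested here (keep the fake-load fixup for kernel accesses and report user-space faults with SIGBUS) can be written as a small decision function. This is only a sketch of the proposal in this message, not code from the kernel or the thread; the enum and choose_mcheck_action are hypothetical names, and the three inputs stand for checks fsl_pci_mcheck_exception already makes: is_in_pci_mem_space(), user_mode(), and whether the faulting instruction is a load the fixup understands.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a machine check on a PCI memory access. */
enum mcheck_action {
	MCHECK_NOT_HANDLED, /* fall through to the generic machine check path */
	MCHECK_FAKE_LOAD,   /* emulate the load as returning all ones, skip it */
	MCHECK_SIGBUS,      /* deliver SIGBUS to the offending process */
};

/* Sketch of the proposed policy: kernel-mode loads keep the fake-load
 * fixup, user-mode faults get a signal instead of invented data. */
static enum mcheck_action choose_mcheck_action(bool in_pci_mem_space,
					       bool from_user_mode,
					       bool insn_is_load)
{
	if (!in_pci_mem_space)
		return MCHECK_NOT_HANDLED;
	if (from_user_mode)
		return MCHECK_SIGBUS;
	return insn_is_load ? MCHECK_FAKE_LOAD : MCHECK_NOT_HANDLED;
}
```

Keeping the policy in one place like this makes it easy to see that user space would never observe a fabricated 0xffffffff value.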
>
> Where can I read about this erratum?
I have looked high and low and cannot find an erratum that maps to this fixup.
The closest I get is A-005125, which seems to have a different workaround. I
cannot find any evidence that this workaround has been applied in Linux; can
you?
Jocke