From: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
To: "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
"leoyang.li@nxp.com" <leoyang.li@nxp.com>,
"york.sun@nxp.com" <york.sun@nxp.com>
Subject: Re: Machine Check in P2010(e500v2)
Date: Thu, 7 Sep 2017 08:41:20 +0000
Message-ID: <1504773676.31322.2.camel@infinera.com>
In-Reply-To: <1504738204.27247.133.camel@infinera.com>
On Thu, 2017-09-07 at 00:50 +0200, Joakim Tjernlund wrote:
> On Wed, 2017-09-06 at 21:13 +0000, Leo Li wrote:
> > > -----Original Message-----
> > > From: Joakim Tjernlund [mailto:Joakim.Tjernlund@infinera.com]
> > > Sent: Wednesday, September 06, 2017 3:54 PM
> > > To: linuxppc-dev@lists.ozlabs.org; Leo Li <leoyang.li@nxp.com>; York Sun
> > > <york.sun@nxp.com>
> > > Subject: Re: Machine Check in P2010(e500v2)
> > >
> > > On Wed, 2017-09-06 at 20:28 +0000, Leo Li wrote:
> > > > > -----Original Message-----
> > > > > From: Joakim Tjernlund [mailto:Joakim.Tjernlund@infinera.com]
> > > > > Sent: Wednesday, September 06, 2017 3:17 PM
> > > > > To: linuxppc-dev@lists.ozlabs.org; Leo Li <leoyang.li@nxp.com>; York
> > > > > Sun <york.sun@nxp.com>
> > > > > Subject: Re: Machine Check in P2010(e500v2)
> > > > >
> > > > > On Wed, 2017-09-06 at 19:31 +0000, Leo Li wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: York Sun
> > > > > > > Sent: Wednesday, September 06, 2017 10:38 AM
> > > > > > > To: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>;
> > > > > > > linuxppc-dev@lists.ozlabs.org; Leo Li <leoyang.li@nxp.com>
> > > > > > > Subject: Re: Machine Check in P2010(e500v2)
> > > > > > >
> > > > > > > Scott is no longer with Freescale/NXP. Adding Leo.
> > > > > > >
> > > > > > > On 09/05/2017 01:40 AM, Joakim Tjernlund wrote:
> > > > > > > > So after some debugging I found this bug:
> > > > > > > > @@ -996,7 +998,7 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> > > > > > > >      if (is_in_pci_mem_space(addr)) {
> > > > > > > >          if (user_mode(regs)) {
> > > > > > > >              pagefault_disable();
> > > > > > > > -            ret = get_user(regs->nip, &inst);
> > > > > > > > +            ret = get_user(inst, (__u32 __user *)regs->nip);
> > > > > > > >              pagefault_enable();
> > > > > > > >          } else {
> > > > > > > >              ret = probe_kernel_address(regs->nip, inst);
> > > > > > > >
> > > > > > > > However, the kernel still locked up after fixing that.
> > > > > > > > Now I wonder why this fixup is there in the first place? The
> > > > > > > > routine will not really fixup the insn, just return 0xffffffff
> > > > > > > > for the failing read and then advance the process NIP.
> > > > > >
> > > > > > You are right. The code here only gives 0xffffffff to the load
> > > > > > instruction and continues with the next instruction when the load
> > > > > > instruction is causing the machine check. This will prevent a system
> > > > > > lockup when reading from a PCI/RapidIO device whose link is down.
> > > > > >
> > > > > > I don't know what the actual problem is in your case. Maybe it is a
> > > > > > write instruction instead of a read? Or the code is in an infinite loop
> > > > > > waiting for a valid read result? Are you able to do some further
> > > > > > debugging with the NIP correctly printed?
> > > > > >
> > > > >
> > > > > According to the MC it is a read, and the NIP also leads to a read in the program.
> > > > > ATM, I have disabled the fixup but I will enable that again.
> > > > > Question: is it safe to add a small printk when this MC happens (after
> > > > > fixing up)? I need to see that it has happened, as the error is somewhat random.
> > > >
> > > > I think it is safe to add printk as the current machine check handlers are also using printk.
> > >
> > > I hope so, but if the fixup fires there is no printk at all so I was a bit unsure.
> > > Don't like this fixup though, is there not a better way than faking a read to user
> > > space (or kernel for that matter)?
> >
> > I don't have a better idea. Without the fixup, the offending load instruction will never finish if there is anything wrong with the backing device and freeze the whole system. Do you have any suggestion in mind?
> >
>
> But it never finishes the load, it just fakes a load of 0xffffffff. For user space I would rather have it signal
> a SIGBUS, but that does not seem to work either, at least not for us, but that could be a bug in the general MC code
> maybe.
> This fixup might be valid for the kernel only, as it has never worked for user space due to the bug I found.
>
> Where can I read about this erratum?

I have looked high and low and cannot find an erratum that maps to this fixup.
The closest I get is A-005125, which seems to have a different workaround. I cannot find
any evidence that this workaround has been applied in Linux, can you?

Jocke
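
For reference, here is a minimal user-space sketch of the behaviour discussed above: the fixup does not complete the failing load, it writes 0xffffffff into the destination register and steps the NIP past the instruction. This is not the kernel's code. The struct `fake_regs` and the function `fake_load_fixup` are hypothetical names made up for illustration, and only the D-form `lwz` encoding (primary opcode 32) is handled; which instructions the real handler covers is not stated in this thread.

```c
#include <stdint.h>

#define OP_LWZ 32u /* PowerPC primary opcode of lwz (D-form load word) */

/* Hypothetical stand-in for the parts of struct pt_regs used here. */
struct fake_regs {
    uint32_t gpr[32]; /* general purpose registers */
    uint32_t nip;     /* address of the faulting instruction */
};

/*
 * Pretend the load at regs->nip completed with all-ones data.
 * Returns 1 if inst was recognised and "fixed up", 0 otherwise
 * (in which case regs is left untouched).
 */
static int fake_load_fixup(struct fake_regs *regs, uint32_t inst)
{
    uint32_t opcode = inst >> 26;        /* bits 0..5: primary opcode */
    uint32_t rd = (inst >> 21) & 0x1fu;  /* bits 6..10: destination register */

    if (opcode != OP_LWZ)
        return 0;                        /* not a load this sketch knows */

    regs->gpr[rd] = 0xffffffffu;         /* fake the read result */
    regs->nip += 4;                      /* skip the faulting instruction */
    return 1;
}
```

With `lwz r3,0(r4)` (encoded as 0x80640000) the sketch sets r3 to 0xffffffff and moves the NIP forward by 4; a store such as `stw r3,0(r4)` (0x90640000) is left alone. The caller only ever sees a plausible-looking value, which is why signalling SIGBUS to user space looks like the more honest option.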