linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Nicholas Piggin <npiggin@gmail.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@ozlabs.org
Subject: Re: [PATCH] powerpc/64s: Report SLB multi-hit rather than parity error
Date: Sun, 17 Jun 2018 22:07:57 +1000	[thread overview]
Message-ID: <20180617220757.3bafd4c2@roar.ozlabs.ibm.com> (raw)
In-Reply-To: <87muvw9t78.fsf@concordia.ellerman.id.au>

On Fri, 15 Jun 2018 21:37:15 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
> > On Wed, 13 Jun 2018 23:24:14 +1000
> > Michael Ellerman <mpe@ellerman.id.au> wrote:
> >  
> >> When we take an SLB multi-hit on bare metal, we see both the multi-hit
> >> and parity error bits set in DSISR. The user manuals indicates this is
> >> expected to always happen on Power8, whereas on Power9 it says a
> >> multi-hit will "usually" also cause a parity error.
> >> 
> >> We decide what to do based on the various error tables in mce_power.c,
> >> and because we process them in order and only report the first, we
> >> currently always report a parity error but not the multi-hit, eg:
> >> 
> >>   Severe Machine check interrupt [Recovered]
> >>     Initiator: CPU
> >>     Error type: SLB [Parity]
> >>       Effective address: c000000ffffd4300
> >> 
> >> Although this is correct, it leaves the user wondering why they got a
> >> parity error. It would be clearer instead if we reported the
> >> multi-hit because that is more likely to be simply a software bug,
> >> whereas a true parity error is possibly an indication of a bad core.
> >> 
> >> We can do that simply by reordering the error tables so that multi-hit
> >> appears before parity. That doesn't affect the error recovery at all,
> >> because we flush the SLB either way.  
> >
> > Yeah this is a good idea. I wonder if there are any other conditions
> > like this that should be reordered.  
> 
> Yeah good point, this one just caught my eye because I was testing it.
> Ideally it wouldn't matter and we could actually report multiple, but
> that would be a bit of a bigger change.

Yep this patch looks fine for a minimal fix.

> 
> > I think the i-side should not have to be changed here because it
> > matches the value not bits, so that shouldn't matter.  
> 
> Ah OK, will check.
> 
> > A bit of a shame we don't report i/d side, and ideally we'd be able
> > to report multiple conditions. The reporting APIs really want to be
> > massaged a bit, but for now this is a good step.  
> 
> Ah snap, yep, more detail & multiple conditions would be nice.
> 
> I don't really understand the way we do the reporting now. The
> struct machine_check_event is all carefully laid out with reserved
> fields and a version number and everything as if it's an ABI. But AFAICS
> it's purely internal to the kernel.
> 
> And then we have struct mce_error_info, but that's a separate thing and
> struct machine_check_event doesn't contain one of them?

Yeah I noticed that too a while back, was it an old OPAL API or maybe a
proposed new API that was never implemented? I would like to end up
doing most MCE decoding in firmware at some point, but I don't think
it's worth keeping this existing ABI thing around for it.

Thanks,
Nick

  reply	other threads:[~2018-06-17 12:08 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-13 13:24 [PATCH] powerpc/64s: Report SLB multi-hit rather than parity error Michael Ellerman
2018-06-13 14:40 ` Nicholas Piggin
2018-06-15 11:37   ` Michael Ellerman
2018-06-17 12:07     ` Nicholas Piggin [this message]
2018-07-19  6:07 ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180617220757.3bafd4c2@roar.ozlabs.ibm.com \
    --to=npiggin@gmail.com \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=mpe@ellerman.id.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).