linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Nicholas Piggin <npiggin@gmail.com>
To: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: linuxppc-dev <linuxppc-dev@ozlabs.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>, <skiboot@lists.ozlabs.org>
Subject: Re: [RFC PATCH 1/7] powerpc/book3s: Move machine check event structure to opal-api.h
Date: Tue, 21 Feb 2017 12:35:20 +1000	[thread overview]
Message-ID: <20170221123520.594ad9df@roar.ozlabs.ibm.com> (raw)
In-Reply-To: <148764191593.19289.166429326262713768.stgit@jupiter.in.ibm.com>

On Tue, 21 Feb 2017 07:21:56 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> +enum MCE_TlbErrorType {
> +	MCE_TLB_ERROR_INDETERMINATE = 0,
> +	MCE_TLB_ERROR_PARITY = 1,
> +	MCE_TLB_ERROR_MULTIHIT = 2,
> +	MCE_TLB_ERROR_TLBIEL_PROG_ERROR = 3,
> +};

The new TLBIE error isn't really a TLB error as such. Not a hardware error.
I added a new "user" type for it.

I don't think we can handle it just by flushing TLB because it can also be
raised in response to invalid non-local tlbie. We could flush all TLBs maybe
but I think also have to advance nip to return to.

> +
> +enum MCE_NestErrorType {
> +	MCE_NEST_ERROR_ABRT_IFETCH = 0,
> +	MCE_NEST_ERROR_ABRT_IFETCH_TABLEWALK = 1,
> +	MCE_NEST_ERROR_ABRT_LOAD = 2,
> +	MCE_NEST_ERROR_ABRT_LOAD_TABLEWALK = 3,
> +};
> +
> +enum MCE_CrespErrorType {
> +	MCE_CRESP_ERROR_BAD_RADDR_IFETCH = 0,
> +	MCE_CRESP_ERROR_BAD_RADDR_IFETCH_TABLEWALK = 1,
> +	MCE_CRESP_ERROR_BAD_RADDR_LOAD = 2,
> +	MCE_CRESP_ERROR_BAD_RADDR_LOAD_TABLEWALK = 3,
> +};
> +
> +enum MCE_FspaceErrorType {
> +	MCE_FSPACE_ERROR_IFETCH = 0,
> +	MCE_FSPACE_ERROR_IFETCH_TABLEWALK = 1,
> +	MCE_FSPACE_ERROR_RADDR_TRANSLATION = 2,
> +	MCE_FSPACE_ERROR_RADDR_LOAD = 3,
> +};
> +
> +enum MCE_AsyncErrorType {
> +	MCE_ASYNC_ERROR_REAL_ADDR_STORE = 0,
> +	MCE_ASYNC_ERROR_NEST_ABRT_STORE = 1,
> +};
> +
> +struct OpalMachineCheckEvent {

Can we have more of a think about this structure and error types
before making it an OPAL API?

Errors don't always fit neatly into a simple classification like
this. For example "async" is not really an error. It's a property
of how the error is reported. The error is a timeout or real
address error. And it's caused by a store. And initiated by nest
or cResp... Other errors are caused by a table walk that was
caused by a store, etc.

I shoehorned these async errors into realaddr/link types in my
patch along with a different severity (i.e., not SYNC). But I
think we can do a lot better with a clean slate for OPAL.

More general thing is, I wonder how much we need to know of the
implementation details in this API? This still seems like it's
unnecessarily split between OS and FW. I think it would be much
nicer if we just return a set of things that the OS can usefully
respond to and have firmware construct the detailed messages for
logging.

That way we'll have much fewer new types of errors we don't know
how to handle, and never have to report unknown error.

Thanks,
Nick

  reply	other threads:[~2017-02-21  2:35 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-21  1:51 [RFC PATCH 0/7] Machine check handling for Power9 with bacward compatibility Mahesh J Salgaonkar
2017-02-21  1:51 ` [RFC PATCH 1/7] powerpc/book3s: Move machine check event structure to opal-api.h Mahesh J Salgaonkar
2017-02-21  2:35   ` Nicholas Piggin [this message]
2017-02-21  6:51     ` Mahesh Jagannath Salgaonkar
2017-02-21  1:52 ` [RFC PATCH 2/7] powerpc/book3s: mce: Call opal mce handler to extract MCE error reason Mahesh J Salgaonkar
2017-02-21  1:52 ` [RFC PATCH 3/7] powerpc/book3s: mce: Process the MCE event and recover if possible Mahesh J Salgaonkar
2017-02-21  1:52 ` [RFC PATCH 4/7] powerpc/book3s: Print additional MCE errors introduced in power9 Mahesh J Salgaonkar
2017-02-21  1:52 ` [RFC PATCH 5/7] powerpc/book3s: Don't turn on the MSR[ME] bit until opal processes the reason Mahesh J Salgaonkar
2017-02-21  2:47   ` Nicholas Piggin
2017-02-21  4:17     ` Mahesh Jagannath Salgaonkar
2017-02-21  4:43       ` Nicholas Piggin
2017-02-21  1:53 ` [RFC PATCH 6/7] powerpc/book3s: Display more info for MCE error console log Mahesh J Salgaonkar
2017-02-21  1:53 ` [RFC PATCH 7/7] powerpc/book3s: Display task info for MCE error in user mode Mahesh J Salgaonkar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170221123520.594ad9df@roar.ozlabs.ibm.com \
    --to=npiggin@gmail.com \
    --cc=benh@kernel.crashing.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=mahesh@linux.vnet.ibm.com \
    --cc=paulus@samba.org \
    --cc=skiboot@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).