linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Frederic Barrat <fbarrat@linux.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>,
	linuxppc-dev@lists.ozlabs.org, ajd@linux.ibm.com,
	arbab@linux.ibm.com, aik@ozlabs.ru
Cc: clombard@linux.ibm.com, groug@kaod.org
Subject: Re: [PATCH] powerpc/powernv: Show checkstop reason for NPU2 HMIs
Date: Thu, 23 May 2019 16:52:55 +0200	[thread overview]
Message-ID: <5cf24147-006c-264f-462e-65961c4db0c6@linux.ibm.com> (raw)
In-Reply-To: <87blztqo7u.fsf@concordia.ellerman.id.au>



Le 23/05/2019 à 15:45, Michael Ellerman a écrit :
> Frederic Barrat <fbarrat@linux.ibm.com> writes:
> 
>> If the kernel is notified of an HMI caused by the NPU2, it's currently
>> not being recognized and it logs the default message:
>>
>>      Unknown Malfunction Alert of type 3
>>
>> The NPU on Power 9 has 3 Fault Isolation Registers, so that's a lot of
>> possible causes, but we should at least log that it's an NPU problem
>> and report which FIR and which bit were raised if opal gave us the
>> information.
>>
>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
>> ---
>>
>> Could be merged independently from (the opal-api.h change is already
>> in the skiboot tree), but works better with, the matching skiboot
>> change:
>> http://patchwork.ozlabs.org/patch/1104076/
> 
> Well it *must* work with or without the skiboot change, because old/new
> kernels will run on old/new skiboots.
> 
> It looks like it will work fine, we just won't get any extra information
> in xstop_reason, right?


Yes, that's understood, and it was tested. On an old skiboot, we're now 
printing that we got an NPU checkstop (instead of the "unknown 
malfunction alert"), we just won't have the extra FIR info. That's what 
I meant by "works better with the skiboot patch".

   Fred



> cheers
> 
>> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
>> index e1577cfa7186..2492fe248e1e 100644
>> --- a/arch/powerpc/include/asm/opal-api.h
>> +++ b/arch/powerpc/include/asm/opal-api.h
>> @@ -568,6 +568,7 @@ enum OpalHMI_XstopType {
>>   	CHECKSTOP_TYPE_UNKNOWN	=	0,
>>   	CHECKSTOP_TYPE_CORE	=	1,
>>   	CHECKSTOP_TYPE_NX	=	2,
>> +	CHECKSTOP_TYPE_NPU	=	3
>>   };
>>   
>>   enum OpalHMI_CoreXstopReason {
>> diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c b/arch/powerpc/platforms/powernv/opal-hmi.c
>> index 586ec71a4e17..de12a240b477 100644
>> --- a/arch/powerpc/platforms/powernv/opal-hmi.c
>> +++ b/arch/powerpc/platforms/powernv/opal-hmi.c
>> @@ -149,6 +149,43 @@ static void print_nx_checkstop_reason(const char *level,
>>   					xstop_reason[i].description);
>>   }
>>   
>> +static void print_npu_checkstop_reason(const char *level,
>> +					struct OpalHMIEvent *hmi_evt)
>> +{
>> +	uint8_t reason, reason_count, i;
>> +
>> +	/*
>> +	 * We may not have a checkstop reason on some combination of
>> +	 * hardware and/or skiboot version
>> +	 */
>> +	if (!hmi_evt->u.xstop_error.xstop_reason) {
>> +		printk("%s	NPU checkstop on chip %x\n", level,
>> +			be32_to_cpu(hmi_evt->u.xstop_error.u.chip_id));
>> +		return;
>> +	}
>> +
>> +	/*
>> +	 * NPU2 has 3 FIRs. Reason encoded on a byte as:
>> +	 *   2 bits for the FIR number
>> +	 *   6 bits for the bit number
>> +	 * It may be possible to find several reasons.
>> +	 *
>> +	 * We don't display a specific message per FIR bit as there
>> +	 * are too many and most are meaningless without the workbook
>> +	 * and/or hw team help anyway.
>> +	 */
>> +	reason_count = sizeof(hmi_evt->u.xstop_error.xstop_reason) /
>> +		sizeof(reason);
>> +	for (i = 0; i < reason_count; i++) {
>> +		reason = (hmi_evt->u.xstop_error.xstop_reason >> (8 * i)) & 0xFF;
>> +		if (reason)
>> +			printk("%s	NPU checkstop on chip %x: FIR%d bit %d is set\n",
>> +				level,
>> +				be32_to_cpu(hmi_evt->u.xstop_error.u.chip_id),
>> +				reason >> 6, reason & 0x3F);
>> +	}
>> +}
>> +
>>   static void print_checkstop_reason(const char *level,
>>   					struct OpalHMIEvent *hmi_evt)
>>   {
>> @@ -160,6 +197,9 @@ static void print_checkstop_reason(const char *level,
>>   	case CHECKSTOP_TYPE_NX:
>>   		print_nx_checkstop_reason(level, hmi_evt);
>>   		break;
>> +	case CHECKSTOP_TYPE_NPU:
>> +		print_npu_checkstop_reason(level, hmi_evt);
>> +		break;
>>   	default:
>>   		printk("%s	Unknown Malfunction Alert of type %d\n",
>>   		       level, type);
>> -- 
>> 2.21.0
> 


  reply	other threads:[~2019-05-23 14:54 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-23 12:28 [PATCH] powerpc/powernv: Show checkstop reason for NPU2 HMIs Frederic Barrat
2019-05-23 13:45 ` Michael Ellerman
2019-05-23 14:52   ` Frederic Barrat [this message]
2019-05-30  2:00 ` Andrew Donnellan
2019-06-03 12:32 ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5cf24147-006c-264f-462e-65961c4db0c6@linux.ibm.com \
    --to=fbarrat@linux.ibm.com \
    --cc=aik@ozlabs.ru \
    --cc=ajd@linux.ibm.com \
    --cc=arbab@linux.ibm.com \
    --cc=clombard@linux.ibm.com \
    --cc=groug@kaod.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).