From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66627C282DD for ; Thu, 23 May 2019 12:30:11 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id ADC0220851 for ; Thu, 23 May 2019 12:30:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ADC0220851 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.ibm.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 458pjr27r1zDqcQ for ; Thu, 23 May 2019 22:30:08 +1000 (AEST) Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=linux.ibm.com (client-ip=148.163.158.5; helo=mx0a-001b2d01.pphosted.com; envelope-from=fbarrat@linux.ibm.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 458pgf73gdzDqbp for ; Thu, 23 May 2019 22:28:13 +1000 (AEST) Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x4NC98iC004630 for ; Thu, 23 May 2019 08:28:11 -0400 Received: from e06smtp05.uk.ibm.com (e06smtp05.uk.ibm.com [195.75.94.101]) by mx0a-001b2d01.pphosted.com with ESMTP id 2snsy1vqfr-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 23 May 2019 08:28:11 -0400 Received: from localhost by e06smtp05.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 23 May 2019 13:28:09 +0100 Received: from b06cxnps4075.portsmouth.uk.ibm.com (9.149.109.197) by e06smtp05.uk.ibm.com (192.168.101.135) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 23 May 2019 13:28:06 +0100 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x4NCS5ui25821196 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 23 May 2019 12:28:05 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 38F6D11C052; Thu, 23 May 2019 12:28:05 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E359711C054; Thu, 23 May 2019 12:28:04 +0000 (GMT) Received: from bali.lab.toulouse-stg.fr.ibm.com (unknown [9.101.4.17]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 23 May 2019 12:28:04 +0000 (GMT) From: Frederic Barrat To: linuxppc-dev@lists.ozlabs.org, ajd@linux.ibm.com, arbab@linux.ibm.com, aik@ozlabs.ru Subject: [PATCH] powerpc/powernv: Show checkstop reason for NPU2 HMIs Date: Thu, 23 May 2019 14:28:04 +0200 X-Mailer: git-send-email 2.21.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 x-cbid: 19052312-0020-0000-0000-0000033FACC8 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19052312-0021-0000-0000-000021929701 Message-Id: <20190523122804.4801-1-fbarrat@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-05-23_10:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1905230087 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: clombard@linux.ibm.com, groug@kaod.org Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" If the kernel is notified of an HMI caused by the NPU2, it's currently not being recognized and it logs the default message: Unknown Malfunction Alert of type 3 The NPU on Power 9 has 3 Fault Isolation Registers, so that's a lot of possible causes, but we should at least log that it's an NPU problem and report which FIR and which bit were raised if opal gave us the information. Signed-off-by: Frederic Barrat --- Could be merged independently from (the opal-api.h change is already in the skiboot tree), but works better with, the matching skiboot change: http://patchwork.ozlabs.org/patch/1104076/ arch/powerpc/include/asm/opal-api.h | 1 + arch/powerpc/platforms/powernv/opal-hmi.c | 40 +++++++++++++++++++++++ 2 files changed, 41 insertions(+) diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index e1577cfa7186..2492fe248e1e 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -568,6 +568,7 @@ enum OpalHMI_XstopType { CHECKSTOP_TYPE_UNKNOWN = 0, CHECKSTOP_TYPE_CORE = 1, CHECKSTOP_TYPE_NX = 2, + CHECKSTOP_TYPE_NPU = 3 }; enum OpalHMI_CoreXstopReason { diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c b/arch/powerpc/platforms/powernv/opal-hmi.c index 586ec71a4e17..de12a240b477 100644 --- a/arch/powerpc/platforms/powernv/opal-hmi.c +++ b/arch/powerpc/platforms/powernv/opal-hmi.c @@ -149,6 +149,43 @@ static void print_nx_checkstop_reason(const char *level, xstop_reason[i].description); } +static void print_npu_checkstop_reason(const char *level, + struct OpalHMIEvent *hmi_evt) +{ + uint8_t reason, reason_count, i; + + /* + * We may not have a checkstop reason on some combination of + * hardware and/or skiboot version + */ + if (!hmi_evt->u.xstop_error.xstop_reason) { + printk("%s NPU checkstop on chip %x\n", level, + be32_to_cpu(hmi_evt->u.xstop_error.u.chip_id)); + return; + } + + /* + * NPU2 has 3 FIRs. Reason encoded on a byte as: + * 2 bits for the FIR number + * 6 bits for the bit number + * It may be possible to find several reasons. + * + * We don't display a specific message per FIR bit as there + * are too many and most are meaningless without the workbook + * and/or hw team help anyway. + */ + reason_count = sizeof(hmi_evt->u.xstop_error.xstop_reason) / + sizeof(reason); + for (i = 0; i < reason_count; i++) { + reason = (hmi_evt->u.xstop_error.xstop_reason >> (8 * i)) & 0xFF; + if (reason) + printk("%s NPU checkstop on chip %x: FIR%d bit %d is set\n", + level, + be32_to_cpu(hmi_evt->u.xstop_error.u.chip_id), + reason >> 6, reason & 0x3F); + } +} + static void print_checkstop_reason(const char *level, struct OpalHMIEvent *hmi_evt) { @@ -160,6 +197,9 @@ static void print_checkstop_reason(const char *level, case CHECKSTOP_TYPE_NX: print_nx_checkstop_reason(level, hmi_evt); break; + case CHECKSTOP_TYPE_NPU: + print_npu_checkstop_reason(level, hmi_evt); + break; default: printk("%s Unknown Malfunction Alert of type %d\n", level, type); -- 2.21.0