From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B163C43603 for ; Sun, 8 Dec 2019 13:58:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5BA64206E0 for ; Sun, 8 Dec 2019 13:58:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726607AbfLHN6p (ORCPT ); Sun, 8 Dec 2019 08:58:45 -0500 Received: from shadbolt.e.decadent.org.uk ([88.96.1.126]:60012 "EHLO shadbolt.e.decadent.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726599AbfLHNyk (ORCPT ); Sun, 8 Dec 2019 08:54:40 -0500 Received: from [192.168.4.242] (helo=deadeye) by shadbolt.decadent.org.uk with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1idx1B-0007dL-81; Sun, 08 Dec 2019 13:54:37 +0000 Received: from ben by deadeye with local (Exim 4.93-RC1) (envelope-from ) id 1idx1A-0002LM-O3; Sun, 08 Dec 2019 13:54:36 +0000 Content-Type: text/plain; charset="UTF-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit MIME-Version: 1.0 From: Ben Hutchings To: linux-kernel@vger.kernel.org, stable@vger.kernel.org CC: akpm@linux-foundation.org, Denis Kirjanov , "Xiaofei Tan" , "James Morse" , "Ard Biesheuvel" Date: Sun, 08 Dec 2019 13:52:56 +0000 Message-ID: X-Mailer: LinuxStableQueue (scripts by bwh) X-Patchwork-Hint: ignore Subject: [PATCH 3.16 12/72] efi: cper: print AER info of PCIe fatal error In-Reply-To: X-SA-Exim-Connect-IP: 192.168.4.242 X-SA-Exim-Mail-From: ben@decadent.org.uk X-SA-Exim-Scanned: No (on shadbolt.decadent.org.uk); SAEximRunCond expanded to false Sender: stable-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org 3.16.79-rc1 review patch. If anyone has any objections, please let me know. ------------------ From: Xiaofei Tan commit b194a77fcc4001dc40aecdd15d249648e8a436d1 upstream. AER info of PCIe fatal error is not printed in the current driver. Because APEI driver will panic directly for fatal error, and can't run to the place of printing AER info. An example log is as following: {763}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 11 {763}[Hardware Error]: event severity: fatal {763}[Hardware Error]: Error 0, type: fatal {763}[Hardware Error]: section_type: PCIe error {763}[Hardware Error]: port_type: 0, PCIe end point {763}[Hardware Error]: version: 4.0 {763}[Hardware Error]: command: 0x0000, status: 0x0010 {763}[Hardware Error]: device_id: 0000:82:00.0 {763}[Hardware Error]: slot: 0 {763}[Hardware Error]: secondary_bus: 0x00 {763}[Hardware Error]: vendor_id: 0x8086, device_id: 0x10fb {763}[Hardware Error]: class_code: 000002 Kernel panic - not syncing: Fatal hardware error! This issue was imported by the patch, '37448adfc7ce ("aerdrv: Move cper_print_aer() call out of interrupt context")'. To fix this issue, this patch adds print of AER info in cper_print_pcie() for fatal error. Here is the example log after this patch applied: {24}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 10 {24}[Hardware Error]: event severity: fatal {24}[Hardware Error]: Error 0, type: fatal {24}[Hardware Error]: section_type: PCIe error {24}[Hardware Error]: port_type: 0, PCIe end point {24}[Hardware Error]: version: 4.0 {24}[Hardware Error]: command: 0x0546, status: 0x4010 {24}[Hardware Error]: device_id: 0000:01:00.0 {24}[Hardware Error]: slot: 0 {24}[Hardware Error]: secondary_bus: 0x00 {24}[Hardware Error]: vendor_id: 0x15b3, device_id: 0x1019 {24}[Hardware Error]: class_code: 000002 {24}[Hardware Error]: aer_uncor_status: 0x00040000, aer_uncor_mask: 0x00000000 {24}[Hardware Error]: aer_uncor_severity: 0x00062010 {24}[Hardware Error]: TLP Header: 000000c0 01010000 00000001 00000000 Kernel panic - not syncing: Fatal hardware error! Fixes: 37448adfc7ce ("aerdrv: Move cper_print_aer() call out of interrupt context") Signed-off-by: Xiaofei Tan Reviewed-by: James Morse [ardb: put parens around terms of && operator] Signed-off-by: Ard Biesheuvel Signed-off-by: Ben Hutchings --- drivers/firmware/efi/cper.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) --- a/drivers/firmware/efi/cper.c +++ b/drivers/firmware/efi/cper.c @@ -295,6 +295,21 @@ static void cper_print_pcie(const char * printk( "%s""bridge: secondary_status: 0x%04x, control: 0x%04x\n", pfx, pcie->bridge.secondary_status, pcie->bridge.control); + + /* Fatal errors call __ghes_panic() before AER handler prints this */ + if ((pcie->validation_bits & CPER_PCIE_VALID_AER_INFO) && + (gdata->error_severity & CPER_SEV_FATAL)) { + struct aer_capability_regs *aer; + + aer = (struct aer_capability_regs *)pcie->aer_info; + printk("%saer_uncor_status: 0x%08x, aer_uncor_mask: 0x%08x\n", + pfx, aer->uncor_status, aer->uncor_mask); + printk("%saer_uncor_severity: 0x%08x\n", + pfx, aer->uncor_severity); + printk("%sTLP Header: %08x %08x %08x %08x\n", pfx, + aer->header_log.dw0, aer->header_log.dw1, + aer->header_log.dw2, aer->header_log.dw3); + } } static void cper_estatus_print_section(