From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C1D08C001E0 for ; Mon, 23 Oct 2023 11:35:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234041AbjJWLfS (ORCPT ); Mon, 23 Oct 2023 07:35:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234038AbjJWLfP (ORCPT ); Mon, 23 Oct 2023 07:35:15 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C46FE4 for ; Mon, 23 Oct 2023 04:35:13 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPSA id D13E4C433C9; Mon, 23 Oct 2023 11:35:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1698060913; bh=323e7ZPBxUerWjSU6JDOH1uHcQCs5HNegQ0iZeElMLc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=1bweaR8xrgqUdN8ShJ5+Ft85Hc9P1p0NmZXXm23HRpbZl9jFtDrBPFxE1b/nnJ9us Q3tjqt1/MP1jauVr+gz+slQ5LxcfKN3y+Vk6UJQZCtOPcMy33j6oE4IG58bAjCt1jj UR8jsVIxQRBdczzt8/L6H6xsxSro+GrxQ3/2cL/E= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Vishal Agrawal , Jay Vosburgh , Przemek Kitszel , Jesse Brandeburg , Jakub Kicinski , Pucha Himasekhar Reddy Subject: [PATCH 5.15 011/137] ice: reset first in crash dump kernels Date: Mon, 23 Oct 2023 12:56:08 +0200 Message-ID: <20231023104821.282489932@linuxfoundation.org> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20231023104820.849461819@linuxfoundation.org> References: <20231023104820.849461819@linuxfoundation.org> User-Agent: quilt/0.67 X-stable: review X-Patchwork-Hint: ignore MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org 5.15-stable review patch. If anyone has any objections, please let me know. ------------------ From: Jesse Brandeburg commit 0288c3e709e5fabd51e84715c5c798a02f43061a upstream. When the system boots into the crash dump kernel after a panic, the ice networking device may still have pending transactions that can cause errors or machine checks when the device is re-enabled. This can prevent the crash dump kernel from loading the driver or collecting the crash data. To avoid this issue, perform a function level reset (FLR) on the ice device via PCIe config space before enabling it on the crash kernel. This will clear any outstanding transactions and stop all queues and interrupts. Restore the config space after the FLR, otherwise it was found in testing that the driver wouldn't load successfully. The following sequence causes the original issue: - Load the ice driver with modprobe ice - Enable SR-IOV with 2 VFs: echo 2 > /sys/class/net/eth0/device/sriov_num_vfs - Trigger a crash with echo c > /proc/sysrq-trigger - Load the ice driver again (or let it load automatically) with modprobe ice - The system crashes again during pcim_enable_device() Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series") Reported-by: Vishal Agrawal Reviewed-by: Jay Vosburgh Reviewed-by: Przemek Kitszel Signed-off-by: Jesse Brandeburg Tested-by: Pucha Himasekhar Reddy (A Contingent worker at Intel) Link: https://lore.kernel.org/r/20231011233334.336092-3-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -6,6 +6,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include +#include #include "ice.h" #include "ice_base.h" #include "ice_lib.h" @@ -4255,6 +4256,20 @@ ice_probe(struct pci_dev *pdev, const st return -EINVAL; } + /* when under a kdump kernel initiate a reset before enabling the + * device in order to clear out any pending DMA transactions. These + * transactions can cause some systems to machine check when doing + * the pcim_enable_device() below. + */ + if (is_kdump_kernel()) { + pci_save_state(pdev); + pci_clear_master(pdev); + err = pcie_flr(pdev); + if (err) + return err; + pci_restore_state(pdev); + } + /* this driver uses devres, see * Documentation/driver-api/driver-model/devres.rst */