From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: pmenzel@molgen.mpg.de, Vishal Agrawal <vagrawal@redhat.com>,
linux-pci@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
netdev@vger.kernel.org, jkc@redhat.com,
Przemek Kitszel <przemyslaw.kitszel@intel.com>
Subject: Re: [Intel-wired-lan] [PATCH iwl-net v2] ice: reset first in crash dump kernels
Date: Mon, 02 Oct 2023 16:49:46 -0700
Message-ID: <17923.1696290586@famine>
In-Reply-To: <20231002200232.3682771-1-jesse.brandeburg@intel.com>
Jesse Brandeburg <jesse.brandeburg@intel.com> wrote:
>When the system boots into the crash dump kernel after a panic, the ice
>networking device may still have pending transactions that can cause errors
>or machine checks when the device is re-enabled. This can prevent the crash
>dump kernel from loading the driver or collecting the crash data.
>
>To avoid this issue, perform a function level reset (FLR) on the ice device
>via PCIe config space before enabling it on the crash kernel. This will
>clear any outstanding transactions and stop all queues and interrupts.
>Restore the config space after the FLR, otherwise it was found in testing
>that the driver wouldn't load successfully.
How does this differ from adding "reset_devices" to the crash
kernel command line, per Documentation/admin-guide/kdump/kdump.rst?
-J
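
For context on the question above: the patch keys its reset on
is_kdump_kernel(), whereas "reset_devices" is an explicit opt-in on the
crash kernel command line. A minimal user-space sketch of checking
whether a given boot used that parameter follows. It assumes the command
line is readable from /proc/cmdline; the helper names
cmdline_has_param() and file_has_param() are hypothetical, and this is
not how the kernel itself parses the option.

```c
#include <stdio.h>
#include <string.h>

/* Separator test for kernel command line tokens. */
int is_sep(char c)
{
	return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper (not a kernel API): return 1 if `param` occurs in
 * `cmdline` as a whole whitespace-delimited token, else 0.
 */
int cmdline_has_param(const char *cmdline, const char *param)
{
	size_t plen = strlen(param);
	const char *p = cmdline;

	if (plen == 0)
		return 0;

	while ((p = strstr(p, param)) != NULL) {
		int at_start = (p == cmdline) || is_sep(p[-1]);
		int at_end = p[plen] == '\0' || is_sep(p[plen]);

		if (at_start && at_end)
			return 1;
		p += plen;	/* keep scanning past this partial match */
	}
	return 0;
}

/* Hypothetical helper: read the first line of `path` (normally
 * /proc/cmdline) and test it for `param`. Returns -1 if the file
 * cannot be opened or read.
 */
int file_has_param(const char *path, const char *param)
{
	char buf[4096];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return cmdline_has_param(buf, param);
}
```

Calling file_has_param("/proc/cmdline", "reset_devices") from inside the
crash kernel would then report whether the option was passed at all,
which is the case the question is comparing against.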
>The following sequence causes the original issue:
>- Load the ice driver with modprobe ice
>- Enable SR-IOV with 2 VFs: echo 2 > /sys/class/net/eth0/device/sriov_num_vfs
>- Trigger a crash with echo c > /proc/sysrq-trigger
>- Load the ice driver again (or let it load automatically) with modprobe ice
>- The system crashes again during pcim_enable_device()
>
>Reported-by: Vishal Agrawal <vagrawal@redhat.com>
>Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
>Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>---
>v2: respond to list comments and update commit message
>v1: initial version
>---
> drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
>diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
>index c8286adae946..6550c46e4e36 100644
>--- a/drivers/net/ethernet/intel/ice/ice_main.c
>+++ b/drivers/net/ethernet/intel/ice/ice_main.c
>@@ -6,6 +6,7 @@
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> #include <generated/utsrelease.h>
>+#include <linux/crash_dump.h>
> #include "ice.h"
> #include "ice_base.h"
> #include "ice_lib.h"
>@@ -5014,6 +5015,20 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
> 		return -EINVAL;
> 	}
>
>+	/* when under a kdump kernel initiate a reset before enabling the
>+	 * device in order to clear out any pending DMA transactions. These
>+	 * transactions can cause some systems to machine check when doing
>+	 * the pcim_enable_device() below.
>+	 */
>+	if (is_kdump_kernel()) {
>+		pci_save_state(pdev);
>+		pci_clear_master(pdev);
>+		err = pcie_flr(pdev);
>+		if (err)
>+			return err;
>+		pci_restore_state(pdev);
>+	}
>+
> 	/* this driver uses devres, see
> 	 * Documentation/driver-api/driver-model/devres.rst
> 	 */
>
>base-commit: 6a70e5cbedaf8ad10528ac9ac114f3ec20f422df
>--
>2.39.3
>
---
-Jay Vosburgh, jay.vosburgh@canonical.com