* [Intel-wired-lan] [PATCH iwl-net v1] ice: reset first in crash dump kernels
@ 2023-09-19 21:29 Jesse Brandeburg
2023-09-20 5:18 ` Paul Menzel
0 siblings, 1 reply; 3+ messages in thread
From: Jesse Brandeburg @ 2023-09-19 21:29 UTC (permalink / raw)
To: intel-wired-lan; +Cc: Przemek Kitszel, jkc, Jesse Brandeburg
When booting into the crash dump kernels there are cases where upon
enabling the device, the system under test will panic or machine check.
One such test is to
- load ice driver
$ modprobe ice
- enable SR-IOV (2 VFs)
$ echo 2 > /sys/class/net/eth0/device/sriov_num_vfs
- crash
echo c > /proc/sysrq-trigger
- load ice driver (or happens automatically)
modprobe ice
- crash during pcim_enable_device()
Avoid this problem by issuing a FLR to the device via PCIe config space
on the crash kernel, to clear out any outstanding transactions and stop
all queues and interrupts. Restore config space afterword because the
driver won't load successfully otherwise.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index c8286adae946..6550c46e4e36 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6,6 +6,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <generated/utsrelease.h>
+#include <linux/crash_dump.h>
#include "ice.h"
#include "ice_base.h"
#include "ice_lib.h"
@@ -5014,6 +5015,20 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
return -EINVAL;
}
+ /* when under a kdump kernel initiate a reset before enabling the
+ * device in order to clear out any pending DMA transactions. These
+ * transactions can cause some systems to machine check when doing
+ * the pcim_enable_device() below.
+ */
+ if (is_kdump_kernel()) {
+ pci_save_state(pdev);
+ pci_clear_master(pdev);
+ err = pcie_flr(pdev);
+ if (err)
+ return err;
+ pci_restore_state(pdev);
+ }
+
/* this driver uses devres, see
* Documentation/driver-api/driver-model/devres.rst
*/
--
2.39.3
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-net v1] ice: reset first in crash dump kernels
2023-09-19 21:29 [Intel-wired-lan] [PATCH iwl-net v1] ice: reset first in crash dump kernels Jesse Brandeburg
@ 2023-09-20 5:18 ` Paul Menzel
2023-10-02 18:16 ` Jesse Brandeburg
0 siblings, 1 reply; 3+ messages in thread
From: Paul Menzel @ 2023-09-20 5:18 UTC (permalink / raw)
To: Jesse Brandeburg; +Cc: Przemek Kitszel, jkc, intel-wired-lan
Dear Jesse,
Thank you for your patch.
Am 19.09.23 um 23:29 schrieb Jesse Brandeburg:
> When booting into the crash dump kernels there are cases where upon
> enabling the device, the system under test will panic or machine check.
>
> One such test is to
> - load ice driver
> $ modprobe ice
> - enable SR-IOV (2 VFs)
> $ echo 2 > /sys/class/net/eth0/device/sriov_num_vfs
> - crash
> echo c > /proc/sysrq-trigger
Above you prepended a $.
> - load ice driver (or happens automatically)
> modprobe ice
> - crash during pcim_enable_device()
>
> Avoid this problem by issuing a FLR to the device via PCIe config space
> on the crash kernel, to clear out any outstanding transactions and stop
> all queues and interrupts. Restore config space afterword because the
afterw*a*rd
> driver won't load successfully otherwise.
Excuse my ignorance, could you please add, what the crashdump kernel
does differently from the “normal” kernel, so this special handling is
needed?
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
> drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index c8286adae946..6550c46e4e36 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -6,6 +6,7 @@
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> #include <generated/utsrelease.h>
> +#include <linux/crash_dump.h>
> #include "ice.h"
> #include "ice_base.h"
> #include "ice_lib.h"
> @@ -5014,6 +5015,20 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
> return -EINVAL;
> }
>
> + /* when under a kdump kernel initiate a reset before enabling the
> + * device in order to clear out any pending DMA transactions. These
> + * transactions can cause some systems to machine check when doing
> + * the pcim_enable_device() below.
> + */
> + if (is_kdump_kernel()) {
> + pci_save_state(pdev);
> + pci_clear_master(pdev);
> + err = pcie_flr(pdev);
> + if (err)
> + return err;
> + pci_restore_state(pdev);
> + }
> +
Should this be added to the common PCI code? Maybe loop the PCI
subsystem folks in?
> /* this driver uses devres, see
> * Documentation/driver-api/driver-model/devres.rst
> */
Kind regards,
Paul
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-net v1] ice: reset first in crash dump kernels
2023-09-20 5:18 ` Paul Menzel
@ 2023-10-02 18:16 ` Jesse Brandeburg
0 siblings, 0 replies; 3+ messages in thread
From: Jesse Brandeburg @ 2023-10-02 18:16 UTC (permalink / raw)
To: Paul Menzel; +Cc: Przemek Kitszel, jkc, intel-wired-lan
On 9/19/2023 10:18 PM, Paul Menzel wrote:
> Dear Jesse,
>
>
> Thank you for your patch.
>
> Am 19.09.23 um 23:29 schrieb Jesse Brandeburg:
>> When booting into the crash dump kernels there are cases where upon
>> enabling the device, the system under test will panic or machine check.
>>
>> One such test is to
>> - load ice driver
>> $ modprobe ice
>> - enable SR-IOV (2 VFs)
>> $ echo 2 > /sys/class/net/eth0/device/sriov_num_vfs
>> - crash
>> echo c > /proc/sysrq-trigger
>
> Above you prepended a $.
Fixed in v2.
>
>> - load ice driver (or happens automatically)
>> modprobe ice
>> - crash during pcim_enable_device()
>>
>> Avoid this problem by issuing a FLR to the device via PCIe config space
>> on the crash kernel, to clear out any outstanding transactions and stop
>> all queues and interrupts. Restore config space afterword because the
>
> afterw*a*rd
Fixed in v2.
>
>> driver won't load successfully otherwise.
>
> Excuse my ignorance, could you please add, what the crashdump kernel
> does differently from the “normal” kernel, so this special handling is
> needed?
I added more description in the v2 commit message, I hope that helps.
In summary: the crashdump kernel is starting up on "dirty" state of
hardware, due to the surprise crash of the previously running kernel
that had running devices when it "panicked"
>
>> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
>> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> ---
>> drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++
>> 1 file changed, 15 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c
>> b/drivers/net/ethernet/intel/ice/ice_main.c
>> index c8286adae946..6550c46e4e36 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_main.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
>> @@ -6,6 +6,7 @@
>> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> #include <generated/utsrelease.h>
>> +#include <linux/crash_dump.h>
>> #include "ice.h"
>> #include "ice_base.h"
>> #include "ice_lib.h"
>> @@ -5014,6 +5015,20 @@ ice_probe(struct pci_dev *pdev, const struct
>> pci_device_id __always_unused *ent)
>> return -EINVAL;
>> }
>> + /* when under a kdump kernel initiate a reset before enabling the
>> + * device in order to clear out any pending DMA transactions. These
>> + * transactions can cause some systems to machine check when doing
>> + * the pcim_enable_device() below.
>> + */
>> + if (is_kdump_kernel()) {
>> + pci_save_state(pdev);
>> + pci_clear_master(pdev);
>> + err = pcie_flr(pdev);
>> + if (err)
>> + return err;
>> + pci_restore_state(pdev);
>> + }
>> +
>
> Should this be added to the common PCI code? Maybe loop the PCI
> subsystem folks in?
Ok, I'll cc: linux-pci when I send v2.
>
>> /* this driver uses devres, see
>> * Documentation/driver-api/driver-model/devres.rst
>> */
>
>
> Kind regards,
>
> Paul
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-10-02 19:06 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-19 21:29 [Intel-wired-lan] [PATCH iwl-net v1] ice: reset first in crash dump kernels Jesse Brandeburg
2023-09-20 5:18 ` Paul Menzel
2023-10-02 18:16 ` Jesse Brandeburg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox