* [PATCH net 1/3] i40e: prevent crash on probe if hw registers have invalid values
2023-10-11 23:33 [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-10-11 (i40e, ice) Jacob Keller
@ 2023-10-11 23:33 ` Jacob Keller
2023-10-11 23:33 ` [PATCH net 2/3] ice: reset first in crash dump kernels Jacob Keller
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Jacob Keller @ 2023-10-11 23:33 UTC
To: netdev, David Miller, Jakub Kicinski
Cc: Michal Schmidt, Simon Horman, Pucha Himasekhar Reddy
From: Michal Schmidt <mschmidt@redhat.com>
The hardware provides the indexes of the first and the last available
queue and VF. From the indexes, the driver calculates the numbers of
queues and VFs. In theory, a faulty device might say the last index is
smaller than the first index. In that case, the driver's calculation
would underflow, and the driver would attempt to write to non-existent
registers outside of the ioremapped range and crash.
I ran into this not because of a faulty device, but through operator error.
I accidentally ran a QE test meant for i40e devices on an ice device.
The test used 'echo i40e > /sys/...ice PCI device.../driver_override',
bound the driver to the device and crashed in one of the wr32 calls in
i40e_clear_hw.
Add checks to prevent underflows in the calculations of num_queues and
num_vfs. With this fix, probing the wrong device reports errors and
returns a failure without crashing.
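The underflow described above can be reproduced outside the driver. The following is a minimal userspace sketch, assuming the counts and indexes are 32-bit unsigned values; the helper names count_from_indexes() and count_unguarded() are hypothetical and do not exist in i40e.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the i40e driver: returns how many
 * entries lie in the inclusive range [first, last], or 0 when the
 * register contents are unusable.
 */
static uint32_t count_from_indexes(bool valid, uint32_t first, uint32_t last)
{
	/* Without the last >= first check, (last - first) + 1 wraps
	 * around to a huge unsigned value when last < first.
	 */
	if (valid && last >= first)
		return (last - first) + 1;
	return 0;
}

/* The unguarded form, shown only to illustrate the wraparound. */
static uint32_t count_unguarded(uint32_t first, uint32_t last)
{
	return (last - first) + 1;
}
```

With first = 10 and last = 5, the unguarded form yields 0xFFFFFFFC, while the guarded form returns 0, so a loop bounded by the count never touches registers that do not exist.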
Fixes: 838d41d92a90 ("i40e: clear all queues and interrupts")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
---
drivers/net/ethernet/intel/i40e/i40e_common.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index eeef20f77106..1b493854f522 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -1082,7 +1082,7 @@ void i40e_clear_hw(struct i40e_hw *hw)
 	    I40E_PFLAN_QALLOC_FIRSTQ_SHIFT;
 	j = (val & I40E_PFLAN_QALLOC_LASTQ_MASK) >>
 	    I40E_PFLAN_QALLOC_LASTQ_SHIFT;
-	if (val & I40E_PFLAN_QALLOC_VALID_MASK)
+	if (val & I40E_PFLAN_QALLOC_VALID_MASK && j >= base_queue)
 		num_queues = (j - base_queue) + 1;
 	else
 		num_queues = 0;
@@ -1092,7 +1092,7 @@ void i40e_clear_hw(struct i40e_hw *hw)
 	    I40E_PF_VT_PFALLOC_FIRSTVF_SHIFT;
 	j = (val & I40E_PF_VT_PFALLOC_LASTVF_MASK) >>
 	    I40E_PF_VT_PFALLOC_LASTVF_SHIFT;
-	if (val & I40E_PF_VT_PFALLOC_VALID_MASK)
+	if (val & I40E_PF_VT_PFALLOC_VALID_MASK && j >= i)
 		num_vfs = (j - i) + 1;
 	else
 		num_vfs = 0;
--
2.41.0
* [PATCH net 2/3] ice: reset first in crash dump kernels
2023-10-11 23:33 [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-10-11 (i40e, ice) Jacob Keller
2023-10-11 23:33 ` [PATCH net 1/3] i40e: prevent crash on probe if hw registers have invalid values Jacob Keller
@ 2023-10-11 23:33 ` Jacob Keller
2023-10-11 23:33 ` [PATCH net 3/3] ice: Fix safe mode when DDP is missing Jacob Keller
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Jacob Keller @ 2023-10-11 23:33 UTC
To: netdev, David Miller, Jakub Kicinski
Cc: Jesse Brandeburg, Vishal Agrawal, Jay Vosburgh, Przemek Kitszel,
Pucha Himasekhar Reddy
From: Jesse Brandeburg <jesse.brandeburg@intel.com>
When the system boots into the crash dump kernel after a panic, the ice
networking device may still have pending transactions that can cause errors
or machine checks when the device is re-enabled. This can prevent the crash
dump kernel from loading the driver or collecting the crash data.
To avoid this issue, perform a function level reset (FLR) on the ice device
via PCIe config space before enabling it on the crash kernel. This will
clear any outstanding transactions and stop all queues and interrupts.
Restore the config space after the FLR; testing showed that the driver
would not load successfully otherwise.
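The ordering of the reset steps described above (save config space, stop bus mastering, FLR, restore config space) can be illustrated with a small userspace mock. This is only a sketch under stated assumptions: struct mock_dev and every mock_* function are hypothetical stand-ins, not kernel APIs, and the FLR is modeled, as a simplification, as dropping pending transactions and zeroing the config words.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CFG_WORDS 4

/* Hypothetical stand-in for a PCI device; not a kernel structure. */
struct mock_dev {
	uint32_t cfg[CFG_WORDS];   /* live config space */
	uint32_t saved[CFG_WORDS]; /* copy taken before the reset */
	bool bus_master;           /* kept outside cfg[] only to keep the mock small */
	int pending_dma;           /* transactions left over from the crashed kernel */
};

static void mock_save_state(struct mock_dev *d)
{
	memcpy(d->saved, d->cfg, sizeof(d->cfg));
}

static void mock_clear_master(struct mock_dev *d)
{
	d->bus_master = false;
}

/* Simplified model of an FLR: pending work is dropped, config is wiped. */
static int mock_flr(struct mock_dev *d)
{
	d->pending_dma = 0;
	memset(d->cfg, 0, sizeof(d->cfg));
	return 0;
}

static void mock_restore_state(struct mock_dev *d)
{
	memcpy(d->cfg, d->saved, sizeof(d->cfg));
}

/* Mirrors the order of calls in the patch: save, clear master, FLR, restore. */
static int mock_kdump_reset(struct mock_dev *d)
{
	int err;

	mock_save_state(d);
	mock_clear_master(d);
	err = mock_flr(d);
	if (err)
		return err;
	mock_restore_state(d);
	return 0;
}
```

In this mock, skipping mock_restore_state() would leave cfg[] zeroed after the reset, which is loosely analogous to the load failure the commit message reports when the config space is not restored.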
The following sequence causes the original issue:
- Load the ice driver with modprobe ice
- Enable SR-IOV with 2 VFs: echo 2 > /sys/class/net/eth0/device/sriov_num_vfs
- Trigger a crash with echo c > /proc/sysrq-trigger
- Load the ice driver again (or let it load automatically) with modprobe ice
- The system crashes again during pcim_enable_device()
Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series")
Reported-by: Vishal Agrawal <vagrawal@redhat.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
---
drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index c8286adae946..6550c46e4e36 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6,6 +6,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

 #include <generated/utsrelease.h>
+#include <linux/crash_dump.h>
 #include "ice.h"
 #include "ice_base.h"
 #include "ice_lib.h"
@@ -5014,6 +5015,20 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 		return -EINVAL;
 	}

+	/* when under a kdump kernel initiate a reset before enabling the
+	 * device in order to clear out any pending DMA transactions. These
+	 * transactions can cause some systems to machine check when doing
+	 * the pcim_enable_device() below.
+	 */
+	if (is_kdump_kernel()) {
+		pci_save_state(pdev);
+		pci_clear_master(pdev);
+		err = pcie_flr(pdev);
+		if (err)
+			return err;
+		pci_restore_state(pdev);
+	}
+
 	/* this driver uses devres, see
 	 * Documentation/driver-api/driver-model/devres.rst
 	 */
--
2.41.0
* [PATCH net 3/3] ice: Fix safe mode when DDP is missing
2023-10-11 23:33 [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-10-11 (i40e, ice) Jacob Keller
2023-10-11 23:33 ` [PATCH net 1/3] i40e: prevent crash on probe if hw registers have invalid values Jacob Keller
2023-10-11 23:33 ` [PATCH net 2/3] ice: reset first in crash dump kernels Jacob Keller
@ 2023-10-11 23:33 ` Jacob Keller
2023-10-14 0:56 ` [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-10-11 (i40e, ice) Jakub Kicinski
2023-10-14 1:10 ` patchwork-bot+netdevbpf
4 siblings, 0 replies; 6+ messages in thread
From: Jacob Keller @ 2023-10-11 23:33 UTC
To: netdev, David Miller, Jakub Kicinski
Cc: Mateusz Pacuszka, Jan Sokolowski, Przemek Kitszel,
Pucha Himasekhar Reddy
From: Mateusz Pacuszka <mateuszx.pacuszka@intel.com>
One thing is broken in safe mode: ice_deinit_features()
is executed even though ice_init_features() was not,
causing a stack trace during pci_unregister_driver().
Add a check at the top of the function.
Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
Signed-off-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
---
drivers/net/ethernet/intel/ice/ice_main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 6550c46e4e36..7784135160fd 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4684,6 +4684,9 @@ static void ice_init_features(struct ice_pf *pf)

 static void ice_deinit_features(struct ice_pf *pf)
 {
+	if (ice_is_safe_mode(pf))
+		return;
+
 	ice_deinit_lag(pf);
 	if (test_bit(ICE_FLAG_DCB_CAPABLE, pf->flags))
 		ice_cfg_lldp_mib_change(&pf->hw, false);
--
2.41.0
* Re: [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-10-11 (i40e, ice)
2023-10-11 23:33 [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-10-11 (i40e, ice) Jacob Keller
` (2 preceding siblings ...)
2023-10-11 23:33 ` [PATCH net 3/3] ice: Fix safe mode when DDP is missing Jacob Keller
@ 2023-10-14 0:56 ` Jakub Kicinski
2023-10-14 1:10 ` patchwork-bot+netdevbpf
4 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2023-10-14 0:56 UTC
To: Jacob Keller; +Cc: netdev, David Miller
On Wed, 11 Oct 2023 16:33:31 -0700 Jacob Keller wrote:
> The following are changes since commit a950a5921db450c74212327f69950ff03419483a:
> net/smc: Fix pos miscalculation in statistics
>
> I'm covering for Tony Nguyen while he's out, and don't have access to create
> a pull request branch on his net-queue, so these are sent via mail only.
No worries, next time do fix the subject not to say "pull request", tho;)
* Re: [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-10-11 (i40e, ice)
2023-10-11 23:33 [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-10-11 (i40e, ice) Jacob Keller
` (3 preceding siblings ...)
2023-10-14 0:56 ` [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-10-11 (i40e, ice) Jakub Kicinski
@ 2023-10-14 1:10 ` patchwork-bot+netdevbpf
4 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-10-14 1:10 UTC
To: Jacob Keller; +Cc: netdev, davem, kuba
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 11 Oct 2023 16:33:31 -0700 you wrote:
> This series contains fixes for the i40e and ice drivers.
>
> Jesse adds handling to the ice driver which resets the device when loading
> on a crash kernel, preventing stale transactions from causing machine check
> exceptions which could prevent capturing crash data.
>
> Mateusz fixes a bug in the ice driver 'Safe mode' logic for handling the
> device when the DDP is missing.
>
> [...]
Here is the summary with links:
- [net,1/3] i40e: prevent crash on probe if hw registers have invalid values
https://git.kernel.org/netdev/net/c/fc6f716a5069
- [net,2/3] ice: reset first in crash dump kernels
https://git.kernel.org/netdev/net/c/0288c3e709e5
- [net,3/3] ice: Fix safe mode when DDP is missing
https://git.kernel.org/netdev/net/c/42066c4d5d34
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html