* [Intel-wired-lan] [PATCH iwl-net V3, 0/2] Fix repeated EEH reports in MSI domain
@ 2024-05-14 20:21 ` Thinh Tran
0 siblings, 0 replies; 13+ messages in thread
From: Thinh Tran @ 2024-05-14 20:21 UTC (permalink / raw)
To: netdev, kuba, anthony.l.nguyen, aleksandr.loktionov,
przemyslaw.kitszel, horms
Cc: edumazet, rob.thomas, Thinh Tran, intel-wired-lan, pabeni, davem
The patch fixes an issue where repeated EEH reports with a single error
on the bus of Intel X710 4-port 10G Base-T adapter in the MSI domain
causes the device to be permanently disabled. It fully resets and
restarts the device when handling the PCI EEH error.
v3: moved text commit messages from the cover letter to appropriate
patches.
v2: fixed typos and split into two commits
Thinh Tran (2):
i40e: fractoring out i40e_suspend/i40e_resume
i40e: Fully suspend and resume IO operations in EEH case
drivers/net/ethernet/intel/i40e/i40e_main.c | 257 +++++++++++---------
1 file changed, 140 insertions(+), 117 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH iwl-net V3,0/2] Fix repeated EEH reports in MSI domain @ 2024-05-14 20:21 ` Thinh Tran 0 siblings, 0 replies; 13+ messages in thread From: Thinh Tran @ 2024-05-14 20:21 UTC (permalink / raw) To: netdev, kuba, anthony.l.nguyen, aleksandr.loktionov, przemyslaw.kitszel, horms Cc: jesse.brandeburg, davem, edumazet, pabeni, intel-wired-lan, rob.thomas, Thinh Tran The patch fixes an issue where repeated EEH reports with a single error on the bus of Intel X710 4-port 10G Base-T adapter in the MSI domain causes the device to be permanently disabled. It fully resets and restarts the device when handling the PCI EEH error. v3: moved text commit messages from the cover letter to appropriate patches. v2: fixed typos and split into two commits Thinh Tran (2): i40e: fractoring out i40e_suspend/i40e_resume i40e: Fully suspend and resume IO operations in EEH case drivers/net/ethernet/intel/i40e/i40e_main.c | 257 +++++++++++--------- 1 file changed, 140 insertions(+), 117 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Intel-wired-lan] [PATCH iwl-net V3, 1/2] i40e: fractoring out i40e_suspend/i40e_resume 2024-05-14 20:21 ` [PATCH iwl-net V3,0/2] " Thinh Tran @ 2024-05-14 20:21 ` Thinh Tran -1 siblings, 0 replies; 13+ messages in thread From: Thinh Tran @ 2024-05-14 20:21 UTC (permalink / raw) To: netdev, kuba, anthony.l.nguyen, aleksandr.loktionov, przemyslaw.kitszel, horms Cc: edumazet, rob.thomas, Thinh Tran, intel-wired-lan, pabeni, davem Two new functions, i40e_io_suspend() and i40e_io_resume(), have been introduced. These functions were factored out from the existing i40e_suspend() and i40e_resume() respectively. This factoring was done due to concerns about the logic of the I40E_SUSPENSED state, which caused the device not able to recover. The functions are now used in the EEH handling for device suspend/resume callbacks. Reordered the function, i40e_enable_mc_magic_wake() has been moved ahead of i40e_io_suspend() to ensure it is declared before being used. Tested-by: Robert Thomas <rob.thomas@ibm.com> Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com> --- drivers/net/ethernet/intel/i40e/i40e_main.c | 248 +++++++++++--------- 1 file changed, 134 insertions(+), 114 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index ffb9f9f15c52..281c8ec27af2 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -16303,6 +16303,138 @@ static void i40e_remove(struct pci_dev *pdev) pci_disable_device(pdev); } +/** + * i40e_enable_mc_magic_wake - enable multicast magic packet wake up + * using the mac_address_write admin q function + * @pf: pointer to i40e_pf struct + **/ +static void i40e_enable_mc_magic_wake(struct i40e_pf *pf) +{ + struct i40e_hw *hw = &pf->hw; + u8 mac_addr[6]; + u16 flags = 0; + int ret; + + /* Get current MAC address in case it's an LAA */ + if (pf->vsi[pf->lan_vsi] && pf->vsi[pf->lan_vsi]->netdev) { + ether_addr_copy(mac_addr, + pf->vsi[pf->lan_vsi]->netdev->dev_addr); + } else { + dev_err(&pf->pdev->dev, + "Failed to retrieve MAC address; using default\n"); + ether_addr_copy(mac_addr, hw->mac.addr); + } + + /* The FW expects the mac address write cmd to first be called with + * one of these flags before calling it again with the multicast + * enable flags. + */ + flags = I40E_AQC_WRITE_TYPE_LAA_WOL; + + if (hw->func_caps.flex10_enable && hw->partition_id != 1) + flags = I40E_AQC_WRITE_TYPE_LAA_ONLY; + + ret = i40e_aq_mac_address_write(hw, flags, mac_addr, NULL); + if (ret) { + dev_err(&pf->pdev->dev, + "Failed to update MAC address registers; cannot enable Multicast Magic packet wake up"); + return; + } + + flags = I40E_AQC_MC_MAG_EN + | I40E_AQC_WOL_PRESERVE_ON_PFR + | I40E_AQC_WRITE_TYPE_UPDATE_MC_MAG; + ret = i40e_aq_mac_address_write(hw, flags, mac_addr, NULL); + if (ret) + dev_err(&pf->pdev->dev, + "Failed to enable Multicast Magic Packet wake up\n"); +} + +/** + * i40e_io_suspend - suspend all IO operations + * @pf: pointer to i40e_pf struct + * + **/ +static int i40e_io_suspend(struct i40e_pf *pf) +{ + struct i40e_hw *hw = &pf->hw; + + set_bit(__I40E_DOWN, pf->state); + + /* Ensure service task will not be running */ + del_timer_sync(&pf->service_timer); + cancel_work_sync(&pf->service_task); + + /* Client close must be called explicitly here because the timer + * has been stopped. + */ + i40e_notify_client_of_netdev_close(pf->vsi[pf->lan_vsi], false); + + if (test_bit(I40E_HW_CAP_WOL_MC_MAGIC_PKT_WAKE, pf->hw.caps) && + pf->wol_en) + i40e_enable_mc_magic_wake(pf); + + /* Since we're going to destroy queues during the + * i40e_clear_interrupt_scheme() we should hold the RTNL lock for this + * whole section + */ + rtnl_lock(); + + i40e_prep_for_reset(pf); + + wr32(hw, I40E_PFPM_APM, (pf->wol_en ? I40E_PFPM_APM_APME_MASK : 0)); + wr32(hw, I40E_PFPM_WUFC, (pf->wol_en ? I40E_PFPM_WUFC_MAG_MASK : 0)); + + /* Clear the interrupt scheme and release our IRQs so that the system + * can safely hibernate even when there are a large number of CPUs. + * Otherwise hibernation might fail when mapping all the vectors back + * to CPU0. + */ + i40e_clear_interrupt_scheme(pf); + + rtnl_unlock(); + + return 0; +} + +/** + * i40e_io_resume - resume IO operations + * @pf: pointer to i40e_pf struct + * + **/ +static int i40e_io_resume(struct i40e_pf *pf) +{ + int err; + + /* We need to hold the RTNL lock prior to restoring interrupt schemes, + * since we're going to be restoring queues + */ + rtnl_lock(); + + /* We cleared the interrupt scheme when we suspended, so we need to + * restore it now to resume device functionality. + */ + err = i40e_restore_interrupt_scheme(pf); + if (err) { + dev_err(&pf->pdev->dev, "Cannot restore interrupt scheme: %d\n", + err); + } + + clear_bit(__I40E_DOWN, pf->state); + i40e_reset_and_rebuild(pf, false, true); + + rtnl_unlock(); + + /* Clear suspended state last after everything is recovered */ + clear_bit(__I40E_SUSPENDED, pf->state); + + /* Restart the service task */ + mod_timer(&pf->service_timer, + round_jiffies(jiffies + pf->service_timer_period)); + + return 0; +} + /** * i40e_pci_error_detected - warning that something funky happened in PCI land * @pdev: PCI device information struct @@ -16415,53 +16547,6 @@ static void i40e_pci_error_resume(struct pci_dev *pdev) i40e_handle_reset_warning(pf, false); } -/** - * i40e_enable_mc_magic_wake - enable multicast magic packet wake up - * using the mac_address_write admin q function - * @pf: pointer to i40e_pf struct - **/ -static void i40e_enable_mc_magic_wake(struct i40e_pf *pf) -{ - struct i40e_hw *hw = &pf->hw; - u8 mac_addr[6]; - u16 flags = 0; - int ret; - - /* Get current MAC address in case it's an LAA */ - if (pf->vsi[pf->lan_vsi] && pf->vsi[pf->lan_vsi]->netdev) { - ether_addr_copy(mac_addr, - pf->vsi[pf->lan_vsi]->netdev->dev_addr); - } else { - dev_err(&pf->pdev->dev, - "Failed to retrieve MAC address; using default\n"); - ether_addr_copy(mac_addr, hw->mac.addr); - } - - /* The FW expects the mac address write cmd to first be called with - * one of these flags before calling it again with the multicast - * enable flags. - */ - flags = I40E_AQC_WRITE_TYPE_LAA_WOL; - - if (hw->func_caps.flex10_enable && hw->partition_id != 1) - flags = I40E_AQC_WRITE_TYPE_LAA_ONLY; - - ret = i40e_aq_mac_address_write(hw, flags, mac_addr, NULL); - if (ret) { - dev_err(&pf->pdev->dev, - "Failed to update MAC address registers; cannot enable Multicast Magic packet wake up"); - return; - } - - flags = I40E_AQC_MC_MAG_EN - | I40E_AQC_WOL_PRESERVE_ON_PFR - | I40E_AQC_WRITE_TYPE_UPDATE_MC_MAG; - ret = i40e_aq_mac_address_write(hw, flags, mac_addr, NULL); - if (ret) - dev_err(&pf->pdev->dev, - "Failed to enable Multicast Magic Packet wake up\n"); -} - /** * i40e_shutdown - PCI callback for shutting down * @pdev: PCI device information struct @@ -16521,48 +16606,11 @@ static void i40e_shutdown(struct pci_dev *pdev) static int __maybe_unused i40e_suspend(struct device *dev) { struct i40e_pf *pf = dev_get_drvdata(dev); - struct i40e_hw *hw = &pf->hw; /* If we're already suspended, then there is nothing to do */ if (test_and_set_bit(__I40E_SUSPENDED, pf->state)) return 0; - - set_bit(__I40E_DOWN, pf->state); - - /* Ensure service task will not be running */ - del_timer_sync(&pf->service_timer); - cancel_work_sync(&pf->service_task); - - /* Client close must be called explicitly here because the timer - * has been stopped. - */ - i40e_notify_client_of_netdev_close(pf->vsi[pf->lan_vsi], false); - - if (test_bit(I40E_HW_CAP_WOL_MC_MAGIC_PKT_WAKE, pf->hw.caps) && - pf->wol_en) - i40e_enable_mc_magic_wake(pf); - - /* Since we're going to destroy queues during the - * i40e_clear_interrupt_scheme() we should hold the RTNL lock for this - * whole section - */ - rtnl_lock(); - - i40e_prep_for_reset(pf); - - wr32(hw, I40E_PFPM_APM, (pf->wol_en ? I40E_PFPM_APM_APME_MASK : 0)); - wr32(hw, I40E_PFPM_WUFC, (pf->wol_en ? I40E_PFPM_WUFC_MAG_MASK : 0)); - - /* Clear the interrupt scheme and release our IRQs so that the system - * can safely hibernate even when there are a large number of CPUs. - * Otherwise hibernation might fail when mapping all the vectors back - * to CPU0. - */ - i40e_clear_interrupt_scheme(pf); - - rtnl_unlock(); - - return 0; + return i40e_io_suspend(pf); } /** @@ -16572,39 +16620,11 @@ static int __maybe_unused i40e_suspend(struct device *dev) static int __maybe_unused i40e_resume(struct device *dev) { struct i40e_pf *pf = dev_get_drvdata(dev); - int err; /* If we're not suspended, then there is nothing to do */ if (!test_bit(__I40E_SUSPENDED, pf->state)) return 0; - - /* We need to hold the RTNL lock prior to restoring interrupt schemes, - * since we're going to be restoring queues - */ - rtnl_lock(); - - /* We cleared the interrupt scheme when we suspended, so we need to - * restore it now to resume device functionality. - */ - err = i40e_restore_interrupt_scheme(pf); - if (err) { - dev_err(dev, "Cannot restore interrupt scheme: %d\n", - err); - } - - clear_bit(__I40E_DOWN, pf->state); - i40e_reset_and_rebuild(pf, false, true); - - rtnl_unlock(); - - /* Clear suspended state last after everything is recovered */ - clear_bit(__I40E_SUSPENDED, pf->state); - - /* Restart the service task */ - mod_timer(&pf->service_timer, - round_jiffies(jiffies + pf->service_timer_period)); - - return 0; + return i40e_io_resume(pf); } static const struct pci_error_handlers i40e_err_handler = { -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH iwl-net V3,1/2] i40e: fractoring out i40e_suspend/i40e_resume @ 2024-05-14 20:21 ` Thinh Tran 0 siblings, 0 replies; 13+ messages in thread From: Thinh Tran @ 2024-05-14 20:21 UTC (permalink / raw) To: netdev, kuba, anthony.l.nguyen, aleksandr.loktionov, przemyslaw.kitszel, horms Cc: jesse.brandeburg, davem, edumazet, pabeni, intel-wired-lan, rob.thomas, Thinh Tran Two new functions, i40e_io_suspend() and i40e_io_resume(), have been introduced. These functions were factored out from the existing i40e_suspend() and i40e_resume() respectively. This factoring was done due to concerns about the logic of the I40E_SUSPENSED state, which caused the device not able to recover. The functions are now used in the EEH handling for device suspend/resume callbacks. Reordered the function, i40e_enable_mc_magic_wake() has been moved ahead of i40e_io_suspend() to ensure it is declared before being used. Tested-by: Robert Thomas <rob.thomas@ibm.com> Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com> --- drivers/net/ethernet/intel/i40e/i40e_main.c | 248 +++++++++++--------- 1 file changed, 134 insertions(+), 114 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index ffb9f9f15c52..281c8ec27af2 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -16303,6 +16303,138 @@ static void i40e_remove(struct pci_dev *pdev) pci_disable_device(pdev); } +/** + * i40e_enable_mc_magic_wake - enable multicast magic packet wake up + * using the mac_address_write admin q function + * @pf: pointer to i40e_pf struct + **/ +static void i40e_enable_mc_magic_wake(struct i40e_pf *pf) +{ + struct i40e_hw *hw = &pf->hw; + u8 mac_addr[6]; + u16 flags = 0; + int ret; + + /* Get current MAC address in case it's an LAA */ + if (pf->vsi[pf->lan_vsi] && pf->vsi[pf->lan_vsi]->netdev) { + ether_addr_copy(mac_addr, + pf->vsi[pf->lan_vsi]->netdev->dev_addr); + } else { + dev_err(&pf->pdev->dev, + "Failed to retrieve MAC address; using default\n"); + ether_addr_copy(mac_addr, hw->mac.addr); + } + + /* The FW expects the mac address write cmd to first be called with + * one of these flags before calling it again with the multicast + * enable flags. + */ + flags = I40E_AQC_WRITE_TYPE_LAA_WOL; + + if (hw->func_caps.flex10_enable && hw->partition_id != 1) + flags = I40E_AQC_WRITE_TYPE_LAA_ONLY; + + ret = i40e_aq_mac_address_write(hw, flags, mac_addr, NULL); + if (ret) { + dev_err(&pf->pdev->dev, + "Failed to update MAC address registers; cannot enable Multicast Magic packet wake up"); + return; + } + + flags = I40E_AQC_MC_MAG_EN + | I40E_AQC_WOL_PRESERVE_ON_PFR + | I40E_AQC_WRITE_TYPE_UPDATE_MC_MAG; + ret = i40e_aq_mac_address_write(hw, flags, mac_addr, NULL); + if (ret) + dev_err(&pf->pdev->dev, + "Failed to enable Multicast Magic Packet wake up\n"); +} + +/** + * i40e_io_suspend - suspend all IO operations + * @pf: pointer to i40e_pf struct + * + **/ +static int i40e_io_suspend(struct i40e_pf *pf) +{ + struct i40e_hw *hw = &pf->hw; + + set_bit(__I40E_DOWN, pf->state); + + /* Ensure service task will not be running */ + del_timer_sync(&pf->service_timer); + cancel_work_sync(&pf->service_task); + + /* Client close must be called explicitly here because the timer + * has been stopped. + */ + i40e_notify_client_of_netdev_close(pf->vsi[pf->lan_vsi], false); + + if (test_bit(I40E_HW_CAP_WOL_MC_MAGIC_PKT_WAKE, pf->hw.caps) && + pf->wol_en) + i40e_enable_mc_magic_wake(pf); + + /* Since we're going to destroy queues during the + * i40e_clear_interrupt_scheme() we should hold the RTNL lock for this + * whole section + */ + rtnl_lock(); + + i40e_prep_for_reset(pf); + + wr32(hw, I40E_PFPM_APM, (pf->wol_en ? I40E_PFPM_APM_APME_MASK : 0)); + wr32(hw, I40E_PFPM_WUFC, (pf->wol_en ? I40E_PFPM_WUFC_MAG_MASK : 0)); + + /* Clear the interrupt scheme and release our IRQs so that the system + * can safely hibernate even when there are a large number of CPUs. + * Otherwise hibernation might fail when mapping all the vectors back + * to CPU0. + */ + i40e_clear_interrupt_scheme(pf); + + rtnl_unlock(); + + return 0; +} + +/** + * i40e_io_resume - resume IO operations + * @pf: pointer to i40e_pf struct + * + **/ +static int i40e_io_resume(struct i40e_pf *pf) +{ + int err; + + /* We need to hold the RTNL lock prior to restoring interrupt schemes, + * since we're going to be restoring queues + */ + rtnl_lock(); + + /* We cleared the interrupt scheme when we suspended, so we need to + * restore it now to resume device functionality. + */ + err = i40e_restore_interrupt_scheme(pf); + if (err) { + dev_err(&pf->pdev->dev, "Cannot restore interrupt scheme: %d\n", + err); + } + + clear_bit(__I40E_DOWN, pf->state); + i40e_reset_and_rebuild(pf, false, true); + + rtnl_unlock(); + + /* Clear suspended state last after everything is recovered */ + clear_bit(__I40E_SUSPENDED, pf->state); + + /* Restart the service task */ + mod_timer(&pf->service_timer, + round_jiffies(jiffies + pf->service_timer_period)); + + return 0; +} + /** * i40e_pci_error_detected - warning that something funky happened in PCI land * @pdev: PCI device information struct @@ -16415,53 +16547,6 @@ static void i40e_pci_error_resume(struct pci_dev *pdev) i40e_handle_reset_warning(pf, false); } -/** - * i40e_enable_mc_magic_wake - enable multicast magic packet wake up - * using the mac_address_write admin q function - * @pf: pointer to i40e_pf struct - **/ -static void i40e_enable_mc_magic_wake(struct i40e_pf *pf) -{ - struct i40e_hw *hw = &pf->hw; - u8 mac_addr[6]; - u16 flags = 0; - int ret; - - /* Get current MAC address in case it's an LAA */ - if (pf->vsi[pf->lan_vsi] && pf->vsi[pf->lan_vsi]->netdev) { - ether_addr_copy(mac_addr, - pf->vsi[pf->lan_vsi]->netdev->dev_addr); - } else { - dev_err(&pf->pdev->dev, - "Failed to retrieve MAC address; using default\n"); - ether_addr_copy(mac_addr, hw->mac.addr); - } - - /* The FW expects the mac address write cmd to first be called with - * one of these flags before calling it again with the multicast - * enable flags. - */ - flags = I40E_AQC_WRITE_TYPE_LAA_WOL; - - if (hw->func_caps.flex10_enable && hw->partition_id != 1) - flags = I40E_AQC_WRITE_TYPE_LAA_ONLY; - - ret = i40e_aq_mac_address_write(hw, flags, mac_addr, NULL); - if (ret) { - dev_err(&pf->pdev->dev, - "Failed to update MAC address registers; cannot enable Multicast Magic packet wake up"); - return; - } - - flags = I40E_AQC_MC_MAG_EN - | I40E_AQC_WOL_PRESERVE_ON_PFR - | I40E_AQC_WRITE_TYPE_UPDATE_MC_MAG; - ret = i40e_aq_mac_address_write(hw, flags, mac_addr, NULL); - if (ret) - dev_err(&pf->pdev->dev, - "Failed to enable Multicast Magic Packet wake up\n"); -} - /** * i40e_shutdown - PCI callback for shutting down * @pdev: PCI device information struct @@ -16521,48 +16606,11 @@ static void i40e_shutdown(struct pci_dev *pdev) static int __maybe_unused i40e_suspend(struct device *dev) { struct i40e_pf *pf = dev_get_drvdata(dev); - struct i40e_hw *hw = &pf->hw; /* If we're already suspended, then there is nothing to do */ if (test_and_set_bit(__I40E_SUSPENDED, pf->state)) return 0; - - set_bit(__I40E_DOWN, pf->state); - - /* Ensure service task will not be running */ - del_timer_sync(&pf->service_timer); - cancel_work_sync(&pf->service_task); - - /* Client close must be called explicitly here because the timer - * has been stopped. - */ - i40e_notify_client_of_netdev_close(pf->vsi[pf->lan_vsi], false); - - if (test_bit(I40E_HW_CAP_WOL_MC_MAGIC_PKT_WAKE, pf->hw.caps) && - pf->wol_en) - i40e_enable_mc_magic_wake(pf); - - /* Since we're going to destroy queues during the - * i40e_clear_interrupt_scheme() we should hold the RTNL lock for this - * whole section - */ - rtnl_lock(); - - i40e_prep_for_reset(pf); - - wr32(hw, I40E_PFPM_APM, (pf->wol_en ? I40E_PFPM_APM_APME_MASK : 0)); - wr32(hw, I40E_PFPM_WUFC, (pf->wol_en ? I40E_PFPM_WUFC_MAG_MASK : 0)); - - /* Clear the interrupt scheme and release our IRQs so that the system - * can safely hibernate even when there are a large number of CPUs. - * Otherwise hibernation might fail when mapping all the vectors back - * to CPU0. - */ - i40e_clear_interrupt_scheme(pf); - - rtnl_unlock(); - - return 0; + return i40e_io_suspend(pf); } /** @@ -16572,39 +16620,11 @@ static int __maybe_unused i40e_suspend(struct device *dev) static int __maybe_unused i40e_resume(struct device *dev) { struct i40e_pf *pf = dev_get_drvdata(dev); - int err; /* If we're not suspended, then there is nothing to do */ if (!test_bit(__I40E_SUSPENDED, pf->state)) return 0; - - /* We need to hold the RTNL lock prior to restoring interrupt schemes, - * since we're going to be restoring queues - */ - rtnl_lock(); - - /* We cleared the interrupt scheme when we suspended, so we need to - * restore it now to resume device functionality. - */ - err = i40e_restore_interrupt_scheme(pf); - if (err) { - dev_err(dev, "Cannot restore interrupt scheme: %d\n", - err); - } - - clear_bit(__I40E_DOWN, pf->state); - i40e_reset_and_rebuild(pf, false, true); - - rtnl_unlock(); - - /* Clear suspended state last after everything is recovered */ - clear_bit(__I40E_SUSPENDED, pf->state); - - /* Restart the service task */ - mod_timer(&pf->service_timer, - round_jiffies(jiffies + pf->service_timer_period)); - - return 0; + return i40e_io_resume(pf); } static const struct pci_error_handlers i40e_err_handler = { -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-net V3, 1/2] i40e: fractoring out i40e_suspend/i40e_resume 2024-05-14 20:21 ` [PATCH iwl-net V3,1/2] " Thinh Tran (?) @ 2024-05-14 21:30 ` Jacob Keller -1 siblings, 0 replies; 13+ messages in thread From: Jacob Keller @ 2024-05-14 21:30 UTC (permalink / raw) To: intel-wired-lan On 5/14/2024 1:21 PM, Thinh Tran wrote: > Two new functions, i40e_io_suspend() and i40e_io_resume(), have been > introduced. These functions were factored out from the existing > i40e_suspend() and i40e_resume() respectively. This factoring was > done due to concerns about the logic of the I40E_SUSPENSED state, which > caused the device not able to recover. The functions are now used in the > EEH handling for device suspend/resume callbacks. > > Reordered the function, i40e_enable_mc_magic_wake() has been moved > ahead of i40e_io_suspend() to ensure it is declared before being used. > > Tested-by: Robert Thomas <rob.thomas@ibm.com> > Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com> > --- Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-net V3, 1/2] i40e: fractoring out i40e_suspend/i40e_resume 2024-05-14 20:21 ` [PATCH iwl-net V3,1/2] " Thinh Tran @ 2024-05-15 5:08 ` Paul Menzel -1 siblings, 0 replies; 13+ messages in thread From: Paul Menzel @ 2024-05-15 5:08 UTC (permalink / raw) To: Thinh Tran Cc: pabeni, netdev, rob.thomas, aleksandr.loktionov, edumazet, anthony.l.nguyen, horms, przemyslaw.kitszel, kuba, intel-wired-lan, davem Dear Thinh, Thank you for your patch. Two minor comments for the title: Please use imperative mood and fix a typo: Factor out i40e_suspend/i40e_resume Kind regards, Paul ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-net V3, 1/2] i40e: fractoring out i40e_suspend/i40e_resume @ 2024-05-15 5:08 ` Paul Menzel 0 siblings, 0 replies; 13+ messages in thread From: Paul Menzel @ 2024-05-15 5:08 UTC (permalink / raw) To: Thinh Tran Cc: netdev, kuba, anthony.l.nguyen, aleksandr.loktionov, przemyslaw.kitszel, horms, edumazet, rob.thomas, intel-wired-lan, pabeni, davem Dear Thinh, Thank you for your patch. Two minor comments for the title: Please use imperative mood and fix a typo: Factor out i40e_suspend/i40e_resume Kind regards, Paul ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-net V3, 1/2] i40e: fractoring out i40e_suspend/i40e_resume 2024-05-15 5:08 ` Paul Menzel @ 2024-05-15 13:41 ` Thinh Tran -1 siblings, 0 replies; 13+ messages in thread From: Thinh Tran @ 2024-05-15 13:41 UTC (permalink / raw) To: Paul Menzel Cc: pabeni, netdev, rob.thomas, aleksandr.loktionov, edumazet, anthony.l.nguyen, horms, przemyslaw.kitszel, kuba, intel-wired-lan, davem On 5/15/2024 12:08 AM, Paul Menzel wrote: > Please use imperative mood and fix a typo I apologize for the oversight. Thank you for your feedback. I'll correct that. Thinh Tran ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-net V3, 1/2] i40e: fractoring out i40e_suspend/i40e_resume @ 2024-05-15 13:41 ` Thinh Tran 0 siblings, 0 replies; 13+ messages in thread From: Thinh Tran @ 2024-05-15 13:41 UTC (permalink / raw) To: Paul Menzel Cc: netdev, kuba, anthony.l.nguyen, aleksandr.loktionov, przemyslaw.kitszel, horms, edumazet, rob.thomas, intel-wired-lan, pabeni, davem On 5/15/2024 12:08 AM, Paul Menzel wrote: > Please use imperative mood and fix a typo I apologize for the oversight. Thank you for your feedback. I'll correct that. Thinh Tran ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Intel-wired-lan] [PATCH iwl-net V3, 2/2] i40e: Fully suspend and resume IO operations in EEH case 2024-05-14 20:21 ` [PATCH iwl-net V3,0/2] " Thinh Tran @ 2024-05-14 20:21 ` Thinh Tran -1 siblings, 0 replies; 13+ messages in thread From: Thinh Tran @ 2024-05-14 20:21 UTC (permalink / raw) To: netdev, kuba, anthony.l.nguyen, aleksandr.loktionov, przemyslaw.kitszel, horms Cc: edumazet, rob.thomas, Thinh Tran, intel-wired-lan, pabeni, davem When EEH events occurs, the callback functions in the i40e, which are managed by the EEH driver, will completely suspend and resume all IO operations. - In the PCI error detected callback, replaced i40e_prep_for_reset() with i40e_io_suspend(). The change is to fully suspend all I/O operations - In the PCI error slot reset callback, replaced pci_enable_device_mem() with pci_enable_device(). This change enables both I/O and memory of the device. - In the PCI error resume callback, replaced i40e_handle_reset_warning() with i40e_io_resume(). This change allows the system to resume I/O operations Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains") Tested-by: Robert Thomas <rob.thomas@ibm.com> Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com> --- drivers/net/ethernet/intel/i40e/i40e_main.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 281c8ec27af2..9f71a61e0c52 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -11138,6 +11138,8 @@ static void i40e_reset_and_rebuild(struct i40e_pf *pf, bool reinit, ret = i40e_reset(pf); if (!ret) i40e_rebuild(pf, reinit, lock_acquired); + else + dev_err(&pf->pdev->dev, "%s: i40e_reset() FAILED", __func__); } /** @@ -16459,7 +16461,7 @@ static pci_ers_result_t i40e_pci_error_detected(struct pci_dev *pdev, /* shutdown all operations */ if (!test_bit(__I40E_SUSPENDED, pf->state)) - i40e_prep_for_reset(pf); + i40e_io_suspend(pf); /* Request a slot reset */ return PCI_ERS_RESULT_NEED_RESET; @@ -16481,7 +16483,8 @@ static pci_ers_result_t i40e_pci_error_slot_reset(struct pci_dev *pdev) u32 reg; dev_dbg(&pdev->dev, "%s\n", __func__); - if (pci_enable_device_mem(pdev)) { + /* enable I/O and memory of the device */ + if (pci_enable_device(pdev)) { dev_info(&pdev->dev, "Cannot re-enable PCI device after reset.\n"); result = PCI_ERS_RESULT_DISCONNECT; @@ -16544,7 +16547,7 @@ static void i40e_pci_error_resume(struct pci_dev *pdev) if (test_bit(__I40E_SUSPENDED, pf->state)) return; - i40e_handle_reset_warning(pf, false); + i40e_io_resume(pf); } /** -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH iwl-net V3,2/2] i40e: Fully suspend and resume IO operations in EEH case @ 2024-05-14 20:21 ` Thinh Tran 0 siblings, 0 replies; 13+ messages in thread From: Thinh Tran @ 2024-05-14 20:21 UTC (permalink / raw) To: netdev, kuba, anthony.l.nguyen, aleksandr.loktionov, przemyslaw.kitszel, horms Cc: jesse.brandeburg, davem, edumazet, pabeni, intel-wired-lan, rob.thomas, Thinh Tran When EEH events occurs, the callback functions in the i40e, which are managed by the EEH driver, will completely suspend and resume all IO operations. - In the PCI error detected callback, replaced i40e_prep_for_reset() with i40e_io_suspend(). The change is to fully suspend all I/O operations - In the PCI error slot reset callback, replaced pci_enable_device_mem() with pci_enable_device(). This change enables both I/O and memory of the device. - In the PCI error resume callback, replaced i40e_handle_reset_warning() with i40e_io_resume(). This change allows the system to resume I/O operations Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains") Tested-by: Robert Thomas <rob.thomas@ibm.com> Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com> --- drivers/net/ethernet/intel/i40e/i40e_main.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 281c8ec27af2..9f71a61e0c52 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -11138,6 +11138,8 @@ static void i40e_reset_and_rebuild(struct i40e_pf *pf, bool reinit, ret = i40e_reset(pf); if (!ret) i40e_rebuild(pf, reinit, lock_acquired); + else + dev_err(&pf->pdev->dev, "%s: i40e_reset() FAILED", __func__); } /** @@ -16459,7 +16461,7 @@ static pci_ers_result_t i40e_pci_error_detected(struct pci_dev *pdev, /* shutdown all operations */ if (!test_bit(__I40E_SUSPENDED, pf->state)) - i40e_prep_for_reset(pf); + i40e_io_suspend(pf); /* Request a slot reset */ return PCI_ERS_RESULT_NEED_RESET; @@ -16481,7 +16483,8 @@ static pci_ers_result_t i40e_pci_error_slot_reset(struct pci_dev *pdev) u32 reg; dev_dbg(&pdev->dev, "%s\n", __func__); - if (pci_enable_device_mem(pdev)) { + /* enable I/O and memory of the device */ + if (pci_enable_device(pdev)) { dev_info(&pdev->dev, "Cannot re-enable PCI device after reset.\n"); result = PCI_ERS_RESULT_DISCONNECT; @@ -16544,7 +16547,7 @@ static void i40e_pci_error_resume(struct pci_dev *pdev) if (test_bit(__I40E_SUSPENDED, pf->state)) return; - i40e_handle_reset_warning(pf, false); + i40e_io_resume(pf); } /** -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-net V3, 2/2] i40e: Fully suspend and resume IO operations in EEH case 2024-05-14 20:21 ` [PATCH iwl-net V3,2/2] " Thinh Tran @ 2024-05-14 21:30 ` Jacob Keller -1 siblings, 0 replies; 13+ messages in thread From: Jacob Keller @ 2024-05-14 21:30 UTC (permalink / raw) To: Thinh Tran, netdev, kuba, anthony.l.nguyen, aleksandr.loktionov, przemyslaw.kitszel, horms Cc: intel-wired-lan, edumazet, pabeni, rob.thomas, davem On 5/14/2024 1:21 PM, Thinh Tran wrote: > When EEH events occurs, the callback functions in the i40e, which are > managed by the EEH driver, will completely suspend and resume all IO > operations. > > - In the PCI error detected callback, replaced i40e_prep_for_reset() > with i40e_io_suspend(). The change is to fully suspend all I/O > operations > - In the PCI error slot reset callback, replaced pci_enable_device_mem() > with pci_enable_device(). This change enables both I/O and memory of > the device. > - In the PCI error resume callback, replaced i40e_handle_reset_warning() > with i40e_io_resume(). This change allows the system to resume I/O > operations > > Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains") > Tested-by: Robert Thomas <rob.thomas@ibm.com> > Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com> > --- Thanks for moving this info to the commit message. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-net V3, 2/2] i40e: Fully suspend and resume IO operations in EEH case @ 2024-05-14 21:30 ` Jacob Keller 0 siblings, 0 replies; 13+ messages in thread From: Jacob Keller @ 2024-05-14 21:30 UTC (permalink / raw) To: Thinh Tran, netdev, kuba, anthony.l.nguyen, aleksandr.loktionov, przemyslaw.kitszel, horms Cc: edumazet, rob.thomas, intel-wired-lan, pabeni, davem On 5/14/2024 1:21 PM, Thinh Tran wrote: > When EEH events occurs, the callback functions in the i40e, which are > managed by the EEH driver, will completely suspend and resume all IO > operations. > > - In the PCI error detected callback, replaced i40e_prep_for_reset() > with i40e_io_suspend(). The change is to fully suspend all I/O > operations > - In the PCI error slot reset callback, replaced pci_enable_device_mem() > with pci_enable_device(). This change enables both I/O and memory of > the device. > - In the PCI error resume callback, replaced i40e_handle_reset_warning() > with i40e_io_resume(). This change allows the system to resume I/O > operations > > Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains") > Tested-by: Robert Thomas <rob.thomas@ibm.com> > Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com> > --- Thanks for moving this info to the commit message. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-05-15 13:41 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-05-14 20:21 [Intel-wired-lan] [PATCH iwl-net V3, 0/2] Fix repeated EEH reports in MSI domain Thinh Tran 2024-05-14 20:21 ` [PATCH iwl-net V3,0/2] " Thinh Tran 2024-05-14 20:21 ` [Intel-wired-lan] [PATCH iwl-net V3, 1/2] i40e: fractoring out i40e_suspend/i40e_resume Thinh Tran 2024-05-14 20:21 ` [PATCH iwl-net V3,1/2] " Thinh Tran 2024-05-14 21:30 ` [Intel-wired-lan] [PATCH iwl-net V3, 1/2] " Jacob Keller 2024-05-15 5:08 ` Paul Menzel 2024-05-15 5:08 ` Paul Menzel 2024-05-15 13:41 ` Thinh Tran 2024-05-15 13:41 ` Thinh Tran 2024-05-14 20:21 ` [Intel-wired-lan] [PATCH iwl-net V3, 2/2] i40e: Fully suspend and resume IO operations in EEH case Thinh Tran 2024-05-14 20:21 ` [PATCH iwl-net V3,2/2] " Thinh Tran 2024-05-14 21:30 ` [Intel-wired-lan] [PATCH iwl-net V3, 2/2] " Jacob Keller 2024-05-14 21:30 ` Jacob Keller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.