public inbox for netdev@vger.kernel.org
From: Emil Tantilov <emil.s.tantilov@intel.com>
To: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, przemyslaw.kitszel@intel.com,
	jay.bhat@intel.com, ivan.d.barrera@intel.com,
	aleksandr.loktionov@intel.com, larysa.zaremba@intel.com,
	anthony.l.nguyen@intel.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, aleksander.lobakin@intel.com,
	linux-pci@vger.kernel.org, madhu.chittim@intel.com,
	decot@google.com, willemb@google.com, sheenamo@google.com
Subject: [PATCH iwl-next 2/2] idpf: implement pci error handlers
Date: Fri, 10 Apr 2026 17:39:59 -0700
Message-ID: <20260411003959.30959-3-emil.s.tantilov@intel.com>
In-Reply-To: <20260411003959.30959-1-emil.s.tantilov@intel.com>

Add callbacks to handle PCI errors and Function Level Reset (FLR). When
preparing for a reset on the bus, the driver must stop all operations that
can lead to MMIO access in order to prevent HW errors. To accomplish this,
introduce a helper, idpf_reset_prepare(), that is called prior to FLR or
when a PCI error is detected. Upon resume, recovery is done through the
existing reset path by starting the event task.

The following callbacks are implemented:
.reset_prepare runs the first portion of the generic reset path, up to the
point where the driver waits for the reset to complete.
.reset_done/.resume run the recovery part of the reset handling.
.error_detected handles PCI errors. As in the prepare call, all operations
are stopped prior to attempting a recovery.
.slot_reset attempts to restore the device, provided a PCI reset was
initiated by the AER driver.
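
The two entry paths described above (PCI error vs. FLR) can be sketched as a
small userspace model. This is only an illustration under stated assumptions:
every model_* name is hypothetical and not part of the driver, a plain bool
stands in for the IDPF_PCI_CB_RESET bit, and counters stand in for the forced
function reset and for queuing the event task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the driver. */
enum model_result { MODEL_NEED_RESET, MODEL_RECOVERED, MODEL_DISCONNECT };

struct model_adapter {
	bool pci_cb_reset;	/* stands in for the IDPF_PCI_CB_RESET bit */
	bool ops_stopped;	/* set once the prepare step has run */
	int forced_resets;	/* function resets forced by resume */
	int recoveries_queued;	/* times the event task was queued */
};

/* Common prepare step: stop everything that could touch MMIO. */
static void model_reset_prepare(struct model_adapter *a)
{
	assert(a != NULL);
	a->ops_stopped = true;
}

/* .error_detected: always quiesce, then decide whether to recover. */
static enum model_result model_error_detected(struct model_adapter *a,
					      bool perm_failure)
{
	model_reset_prepare(a);
	if (perm_failure)
		return MODEL_DISCONNECT;

	/* Remember that resume must force a function reset. */
	a->pci_cb_reset = true;
	return MODEL_NEED_RESET;
}

/* .slot_reset: an all-ones register read means the device is unreachable. */
static enum model_result model_slot_reset(uint32_t status_reg)
{
	return status_reg != 0xFFFFFFFFu ? MODEL_RECOVERED : MODEL_DISCONNECT;
}

/* .resume and .reset_done share this step (test-and-set on the flag). */
static void model_resume(struct model_adapter *a)
{
	bool was_set;

	assert(a != NULL);
	was_set = a->pci_cb_reset;
	a->pci_cb_reset = true;

	/* Only the PCI error path set the flag earlier, so only it resets. */
	if (was_set)
		a->forced_resets++;

	a->recoveries_queued++;
}
```

In this model, calling model_error_detected() followed by model_resume()
forces one reset, whereas model_reset_prepare() followed by model_resume()
(the FLR path) forces none; both leave the flag set for the recovery task.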

Previously, the init logic guaranteed that netdevs existed during reset.
Calling idpf_detach_and_close() from the PCI callbacks flow makes it
possible for the function to be called without netdevs. Add a check to
avoid a NULL pointer dereference in that case.
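
The guard can be illustrated with a minimal userspace sketch. The names
model_netdev and model_detach_and_close() are hypothetical stand-ins, and the
"closed" field merely marks that a slot was processed; the real function does
considerably more per netdev.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a netdev slot; not a driver type. */
struct model_netdev {
	int closed;
};

/* Walk every vport slot and skip the ones that hold no netdev. */
static int model_detach_and_close(struct model_netdev **netdevs,
				  int max_vports)
{
	int handled = 0;

	assert(netdevs != NULL);

	for (int i = 0; i < max_vports; i++) {
		struct model_netdev *netdev = netdevs[i];

		/* Empty slot: nothing to detach, and nothing to dereference. */
		if (!netdev)
			continue;

		netdev->closed = 1;
		handled++;
	}

	return handled;
}
```

With slots { &a, NULL, &b, NULL } the sketch processes two entries and never
touches the empty ones, which is the behavior the added check is meant to
provide.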

Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Jay Bhat <jay.bhat@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
---
 drivers/net/ethernet/intel/idpf/idpf.h      |   3 +
 drivers/net/ethernet/intel/idpf/idpf_lib.c  |  13 ++-
 drivers/net/ethernet/intel/idpf/idpf_main.c | 114 ++++++++++++++++++++
 3 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
index 1d0e32e47e87..164d2f3e233a 100644
--- a/drivers/net/ethernet/intel/idpf/idpf.h
+++ b/drivers/net/ethernet/intel/idpf/idpf.h
@@ -88,6 +88,7 @@ enum idpf_state {
  * @IDPF_REMOVE_IN_PROG: Driver remove in progress
  * @IDPF_MB_INTR_MODE: Mailbox in interrupt mode
  * @IDPF_VC_CORE_INIT: virtchnl core has been init
+ * @IDPF_PCI_CB_RESET: Reset via the PCI callbacks
  * @IDPF_FLAGS_NBITS: Must be last
  */
 enum idpf_flags {
@@ -97,6 +98,7 @@ enum idpf_flags {
 	IDPF_REMOVE_IN_PROG,
 	IDPF_MB_INTR_MODE,
 	IDPF_VC_CORE_INIT,
+	IDPF_PCI_CB_RESET,
 	IDPF_FLAGS_NBITS,
 };
 
@@ -1012,4 +1014,5 @@ void idpf_idc_vdev_mtu_event(struct iidc_rdma_vport_dev_info *vdev_info,
 int idpf_add_del_fsteer_filters(struct idpf_adapter *adapter,
 				struct virtchnl2_flow_rule_add_del *rule,
 				enum virtchnl2_op opcode);
+void idpf_detach_and_close(struct idpf_adapter *adapter);
 #endif /* !_IDPF_H_ */
diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index 7988836fbae0..1e706beb0098 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -758,13 +758,16 @@ static int idpf_init_mac_addr(struct idpf_vport *vport,
 	return 0;
 }
 
-static void idpf_detach_and_close(struct idpf_adapter *adapter)
+void idpf_detach_and_close(struct idpf_adapter *adapter)
 {
 	int max_vports = adapter->max_vports;
 
 	for (int i = 0; i < max_vports; i++) {
 		struct net_device *netdev = adapter->netdevs[i];
 
+		if (!netdev)
+			continue;
+
 		/* If the interface is in detached state, that means the
 		 * previous reset was not handled successfully for this
 		 * vport.
@@ -1908,6 +1911,10 @@ static void idpf_init_hard_reset(struct idpf_adapter *adapter)
 
 	dev_info(dev, "Device HW Reset initiated\n");
 
+	/* Reset has already happened, skip to recovery. */
+	if (test_and_clear_bit(IDPF_PCI_CB_RESET, adapter->flags))
+		goto check_rst_complete;
+
 	/* Prepare for reset */
 	if (test_bit(IDPF_HR_DRV_LOAD, adapter->flags)) {
 		reg_ops->trigger_reset(adapter, IDPF_HR_DRV_LOAD);
@@ -1925,6 +1932,7 @@ static void idpf_init_hard_reset(struct idpf_adapter *adapter)
 		goto unlock_mutex;
 	}
 
+check_rst_complete:
 	/* Wait for reset to complete */
 	err = idpf_check_reset_complete(adapter, &adapter->reset_reg);
 	if (err) {
@@ -1984,7 +1992,8 @@ void idpf_vc_event_task(struct work_struct *work)
 	if (test_bit(IDPF_HR_FUNC_RESET, adapter->flags))
 		goto func_reset;
 
-	if (test_bit(IDPF_HR_DRV_LOAD, adapter->flags))
+	if (test_bit(IDPF_HR_DRV_LOAD, adapter->flags) ||
+	    test_bit(IDPF_PCI_CB_RESET, adapter->flags))
 		goto drv_load;
 
 	return;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
index d99f759c55e1..cd467695047e 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_main.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
@@ -234,6 +234,7 @@ static int idpf_cfg_device(struct idpf_adapter *adapter)
 	if (err)
 		pci_dbg(pdev, "PCIe PTM is not supported by PCIe bus/controller\n");
 
+	pci_save_state(pdev);
 	pci_set_drvdata(pdev, adapter);
 
 	return 0;
@@ -360,6 +361,118 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	return err;
 }
 
+static void idpf_reset_prepare(struct idpf_adapter *adapter)
+{
+	pci_dbg(adapter->pdev, "resetting\n");
+	set_bit(IDPF_HR_RESET_IN_PROG, adapter->flags);
+	cancel_delayed_work_sync(&adapter->serv_task);
+	cancel_delayed_work_sync(&adapter->vc_event_task);
+	idpf_detach_and_close(adapter);
+	idpf_idc_issue_reset_event(adapter->cdev_info);
+	idpf_vc_core_deinit(adapter);
+}
+
+/**
+ * idpf_pci_err_detected - PCI error detected, about to attempt recovery
+ * @pdev: PCI device struct
+ * @err: err detected
+ *
+ * Return: %PCI_ERS_RESULT_NEED_RESET to attempt recovery,
+ * %PCI_ERS_RESULT_DISCONNECT if recovery is not possible.
+ */
+static pci_ers_result_t
+idpf_pci_err_detected(struct pci_dev *pdev, pci_channel_state_t err)
+{
+	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
+
+	/* Shut down the mailbox if PCI I/O is in a bad state to avoid MBX
+	 * timeouts during the prepare stage.
+	 */
+	if (pci_channel_offline(pdev))
+		libie_ctlq_xn_shutdown(adapter->xnm);
+
+	idpf_reset_prepare(adapter);
+
+	if (err == pci_channel_io_perm_failure)
+		return PCI_ERS_RESULT_DISCONNECT;
+
+	/* When called due to PCI error, driver will have to force PFR on
+	 * resume, in order to complete the recovery via the event task.
+	 */
+	set_bit(IDPF_PCI_CB_RESET, adapter->flags);
+
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * idpf_pci_err_slot_reset - PCI undergoing reset
+ * @pdev: PCI device struct
+ *
+ * Reset PCI state and use a register read to see if we're good.
+ *
+ * Return: %PCI_ERS_RESULT_RECOVERED on success,
+ * %PCI_ERS_RESULT_DISCONNECT on failure.
+ */
+static pci_ers_result_t
+idpf_pci_err_slot_reset(struct pci_dev *pdev)
+{
+	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
+
+	pci_restore_state(pdev);
+	pci_set_master(pdev);
+	pci_wake_from_d3(pdev, false);
+	if (readl(adapter->reset_reg.rstat) != 0xFFFFFFFF) {
+		pci_save_state(pdev);
+		return PCI_ERS_RESULT_RECOVERED;
+	}
+
+	return PCI_ERS_RESULT_DISCONNECT;
+}
+
+/**
+ * idpf_pci_err_resume - Resume operations after PCI error recovery
+ * @pdev: PCI device struct
+ */
+static void idpf_pci_err_resume(struct pci_dev *pdev)
+{
+	struct idpf_adapter *adapter = pci_get_drvdata(pdev);
+
+	/* Force a PFR when resuming from PCI error. */
+	if (test_and_set_bit(IDPF_PCI_CB_RESET, adapter->flags))
+		adapter->dev_ops.reg_ops.trigger_reset(adapter, IDPF_HR_FUNC_RESET);
+
+	queue_delayed_work(adapter->vc_event_wq,
+			   &adapter->vc_event_task,
+			   msecs_to_jiffies(300));
+}
+
+/**
+ * idpf_pci_err_reset_prepare - Prepare driver for PCI reset
+ * @pdev: PCI device struct
+ */
+static void idpf_pci_err_reset_prepare(struct pci_dev *pdev)
+{
+	idpf_reset_prepare(pci_get_drvdata(pdev));
+}
+
+/**
+ * idpf_pci_err_reset_done - PCI err reset recovery complete
+ * @pdev: PCI device struct
+ */
+static void idpf_pci_err_reset_done(struct pci_dev *pdev)
+{
+	pci_dbg(pdev, "reset: done\n");
+	idpf_pci_err_resume(pdev);
+}
+
+static const struct pci_error_handlers idpf_pci_err_handler = {
+	.error_detected = idpf_pci_err_detected,
+	.slot_reset = idpf_pci_err_slot_reset,
+	.reset_prepare = idpf_pci_err_reset_prepare,
+	.reset_done = idpf_pci_err_reset_done,
+	.resume = idpf_pci_err_resume,
+};
+
 /* idpf_pci_tbl - PCI Dev idpf ID Table
  */
 static const struct pci_device_id idpf_pci_tbl[] = {
@@ -377,5 +490,6 @@ static struct pci_driver idpf_driver = {
 	.sriov_configure	= idpf_sriov_configure,
 	.remove			= idpf_remove,
 	.shutdown		= idpf_shutdown,
+	.err_handler		= &idpf_pci_err_handler,
 };
 module_pci_driver(idpf_driver);
-- 
2.37.3


