[RFC PATCH] e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Alexander Duyck <alexander.duyck@gmail.com>
To: zdai@linux.vnet.ibm.com
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	alexander.duyck@gmail.com, intel-wired-lan@lists.osuosl.org,
	jeffrey.t.kirsher@intel.com, zdai@us.ibm.com,
	davem@davemloft.net
Subject: [RFC PATCH] e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm
Date: Fri, 04 Oct 2019 16:36:28 -0700	[thread overview]
Message-ID: <20191004233052.28865.1609.stgit@localhost.localdomain> (raw)
In-Reply-To: <1570208647.1250.55.camel@oc5348122405>

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

This patch is meant to address possible race conditions that can exist
between network configuration and power management. A similar issue was
fixed for igb in commit 9474933caf21 ("igb: close/suspend race in
netif_device_detach").

In addition it consolidates the code so that the PCI error handling code
will essentially perform the power management freeze on the device prior to
attempting a reset, and will thaw the device afterwards if that is what it
is planning to do. Otherwise when we call close on the interface it should
see it is detached and not attempt to call the logic to down the interface
and free the IRQs again.

>From what I can tell the check that was adding the check for __E1000_DOWN
in e1000e_close was added when runtime power management was added. However
it should not be relevant for us as we perform a call to
pm_runtime_get_sync before we call e1000_down/free_irq so it should always
be back up before we call into this anyway.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---

I'm putting this out as an RFC for now. I haven't had a chance to do much
testing yet, but I have verified no build issues, and the driver appears
to load, link, and pass traffic without problems.

This should address issues seen with either double freeing or never freeing
IRQs that have been seen on this and similar drivers in the past.

I'll submit this formally after testing it over the weekend assuming there
are no issues.

 drivers/net/ethernet/intel/e1000e/netdev.c |   33 ++++++++++++++--------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index d7d56e42a6aa..182a2c8f12d8 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4715,12 +4715,12 @@ int e1000e_close(struct net_device *netdev)
 
 	pm_runtime_get_sync(&pdev->dev);
 
-	if (!test_bit(__E1000_DOWN, &adapter->state)) {
+	if (netif_device_present(netdev)) {
 		e1000e_down(adapter, true);
 		e1000_free_irq(adapter);
 
 		/* Link status message must follow this format */
-		pr_info("%s NIC Link is Down\n", adapter->netdev->name);
+		pr_info("%s NIC Link is Down\n", netdev->name);
 	}
 
 	napi_disable(&adapter->napi);
@@ -6299,6 +6299,7 @@ static int e1000e_pm_freeze(struct device *dev)
 	struct net_device *netdev = dev_get_drvdata(dev);
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 
+	rtnl_lock();
 	netif_device_detach(netdev);
 
 	if (netif_running(netdev)) {
@@ -6313,6 +6314,8 @@ static int e1000e_pm_freeze(struct device *dev)
 		e1000e_down(adapter, false);
 		e1000_free_irq(adapter);
 	}
+	rtnl_unlock();
+
 	e1000e_reset_interrupt_capability(adapter);
 
 	/* Allow time for pending master requests to run */
@@ -6626,27 +6629,30 @@ static int __e1000_resume(struct pci_dev *pdev)
 	return 0;
 }
 
-#ifdef CONFIG_PM_SLEEP
 static int e1000e_pm_thaw(struct device *dev)
 {
 	struct net_device *netdev = dev_get_drvdata(dev);
 	struct e1000_adapter *adapter = netdev_priv(netdev);
+	int rc = 0;
 
 	e1000e_set_interrupt_capability(adapter);
-	if (netif_running(netdev)) {
-		u32 err = e1000_request_irq(adapter);
 
-		if (err)
-			return err;
+	rtnl_lock();
+	if (netif_running(netdev)) {
+		rc = e1000_request_irq(adapter);
+		if (rc)
+			goto err_irq;
 
 		e1000e_up(adapter);
 	}
 
 	netif_device_attach(netdev);
-
-	return 0;
+	rtnl_unlock();
+err_irq:
+	return rc;
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int e1000e_pm_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -6821,13 +6827,11 @@ static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev,
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 
-	netif_device_detach(netdev);
+	e1000e_pm_freeze(&pdev->dev);
 
 	if (state == pci_channel_io_perm_failure)
 		return PCI_ERS_RESULT_DISCONNECT;
 
-	if (netif_running(netdev))
-		e1000e_down(adapter, true);
 	pci_disable_device(pdev);
 
 	/* Request a slot slot reset. */
@@ -6893,10 +6897,7 @@ static void e1000_io_resume(struct pci_dev *pdev)
 
 	e1000_init_manageability_pt(adapter);
 
-	if (netif_running(netdev))
-		e1000e_up(adapter);
-
-	netif_device_attach(netdev);
+	e1000e_pm_thaw(&pdev->dev);
 
 	/* If the controller has AMT, do not set DRV_LOAD until the interface
 	 * is up.  For all other cases, let the f/w know that the h/w is now

next prev parent reply	other threads:[~2019-10-04 23:36 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-03 16:54 [v1] e1000e: EEH on e1000e adapter detects io perm failure can trigger crash David Dai
2019-10-03 17:39 ` Alexander Duyck
2019-10-03 18:50   ` David Z. Dai
2019-10-03 20:39     ` Alexander Duyck
2019-10-04  0:02       ` David Z. Dai
2019-10-04 14:35         ` Alexander Duyck
2019-10-04 17:04           ` David Z. Dai
2019-10-04 23:36             ` Alexander Duyck [this message]
2019-10-05  2:18               ` [RFC PATCH] e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm David Z. Dai
2019-10-05 17:22                 ` Alexander Duyck
2019-10-07 15:50                   ` David Z. Dai
2019-10-07 17:02                     ` Alexander Duyck
2019-10-07 17:12                       ` David Z. Dai
2019-10-07 17:23                         ` Alexander Duyck
2019-10-07 17:27                           ` [RFC PATCH v2] " Alexander Duyck
2019-10-08 20:49                             ` David Z. Dai
2020-02-25  9:42                             ` Kai-Heng Feng
2020-02-25 20:46                               ` Greg Kroah-Hartman

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:d7d56e42a6a dfblob:182a2c8f12d )
 OR (
bs:"[RFC PATCH] e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191004233052.28865.1609.stgit@localhost.localdomain \
    --to=alexander.duyck@gmail.com \
    --cc=davem@davemloft.net \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=zdai@linux.vnet.ibm.com \
    --cc=zdai@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).