* [PATCH net v2] tg3: prevent ifup/ifdown during PCI error recovery
From: Ivan Vecera @ 2014-09-01 12:21 UTC
  To: netdev; +Cc: Prashant Sreedharan, Michael Chan

The patch fixes race conditions between PCI error recovery callbacks and
potential ifup/ifdown.

First, if ifup (tg3_open) is called between tg3_io_error_detected() and
tg3_io_resume() then tp->timer is armed twice before expiry. Once during
tg3_open() and again during tg3_io_resume(). This results in BUG
at kernel/time/timer.c:945.

Second, if ifdown (tg3_close) is called between tg3_io_error_detected()
and tg3_io_resume() then tg3_napi_disable() is called twice without
a tg3_napi_enable between. Once during tg3_io_error_detected() and again
during tg3_close(). The tg3_io_resume() then hangs on rtnl_lock().

v2: Added logging messages per Prashant's request

Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 16 ++++++++++++++++
 drivers/net/ethernet/broadcom/tg3.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 3ac5d23..cb77ae9 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -11617,6 +11617,12 @@ static int tg3_open(struct net_device *dev)
 	struct tg3 *tp = netdev_priv(dev);
 	int err;
 
+	if (tp->pcierr_recovery) {
+		netdev_err(dev, "Failed to open device. PCI error recovery "
+			   "in progress\n");
+		return -EAGAIN;
+	}
+
 	if (tp->fw_needed) {
 		err = tg3_request_firmware(tp);
 		if (tg3_asic_rev(tp) == ASIC_REV_57766) {
@@ -11674,6 +11680,12 @@ static int tg3_close(struct net_device *dev)
 {
 	struct tg3 *tp = netdev_priv(dev);
 
+	if (tp->pcierr_recovery) {
+		netdev_err(dev, "Failed to close device. PCI error recovery "
+			   "in progress\n");
+		return -EAGAIN;
+	}
+
 	tg3_ptp_fini(tp);
 
 	tg3_stop(tp);
@@ -17561,6 +17573,7 @@ static int tg3_init_one(struct pci_dev *pdev,
 	tp->rx_mode = TG3_DEF_RX_MODE;
 	tp->tx_mode = TG3_DEF_TX_MODE;
 	tp->irq_sync = 1;
+	tp->pcierr_recovery = false;
 
 	if (tg3_debug > 0)
 		tp->msg_enable = tg3_debug;
@@ -18071,6 +18084,8 @@ static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev,
 
 	rtnl_lock();
 
+	tp->pcierr_recovery = true;
+
 	/* We probably don't have netdev yet */
 	if (!netdev || !netif_running(netdev))
 		goto done;
@@ -18195,6 +18210,7 @@ static void tg3_io_resume(struct pci_dev *pdev)
 	tg3_phy_start(tp);
 
 done:
+	tp->pcierr_recovery = false;
 	rtnl_unlock();
 }
 
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index 461acca..31c9f82 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -3407,6 +3407,7 @@ struct tg3 {
 
 	struct device			*hwmon_dev;
 	bool				link_up;
+	bool				pcierr_recovery;
 };
 
 /* Accessor macros for chip and asic attributes
-- 
1.8.5.5
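
The guard this patch adds can be modelled outside the kernel. The following is a minimal userspace sketch, not driver code: `struct model_dev` and the `model_*` functions are hypothetical stand-ins for `struct tg3`, tg3_open(), tg3_close(), tg3_io_error_detected() and tg3_io_resume(), and the counters exist only so the effect can be checked. It assumes, as in the patch, that all four entry points are serialized by a single lock (rtnl_lock() in the driver), so the flag needs no extra synchronization.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct tg3; only what the model needs. */
struct model_dev {
	bool pcierr_recovery;	/* true between error_detected and resume */
	int opens;		/* successful opens, for checking only */
	int closes;		/* successful closes, for checking only */
};

/* Models the check added to tg3_open(). */
static int model_open(struct model_dev *d)
{
	if (d->pcierr_recovery)
		return -EAGAIN;	/* recovery in progress, refuse ifup */
	d->opens++;
	return 0;
}

/* Models the check added to tg3_close(). */
static int model_close(struct model_dev *d)
{
	if (d->pcierr_recovery)
		return -EAGAIN;	/* recovery in progress, refuse ifdown */
	d->closes++;
	return 0;
}

/* Models tg3_io_error_detected(): the recovery window opens here. */
static void model_error_detected(struct model_dev *d)
{
	d->pcierr_recovery = true;
}

/* Models tg3_io_resume(): the recovery window closes here. */
static void model_resume(struct model_dev *d)
{
	assert(d->pcierr_recovery);
	d->pcierr_recovery = false;
}
```

Calling model_open() or model_close() between model_error_detected() and model_resume() returns -EAGAIN and leaves the counters untouched; after model_resume() both calls succeed again.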


* Re: [PATCH net v2] tg3: prevent ifup/ifdown during PCI error recovery
From: Prashant Sreedharan @ 2014-09-02 17:32 UTC
  To: Ivan Vecera; +Cc: netdev, Michael Chan

On Mon, 2014-09-01 at 14:21 +0200, Ivan Vecera wrote:
> The patch fixes race conditions between PCI error recovery callbacks and
> potential ifup/ifdown.
> 
> First, if ifup (tg3_open) is called between tg3_io_error_detected() and
> tg3_io_resume() then tp->timer is armed twice before expiry. Once during
> tg3_open() and again during tg3_io_resume(). This results in BUG
> at kernel/time/timer.c:945.
> 
> Second, if ifdown (tg3_close) is called between tg3_io_error_detected()
> and tg3_io_resume() then tg3_napi_disable() is called twice without
> a tg3_napi_enable between. Once during tg3_io_error_detected() and again
> during tg3_close(). The tg3_io_resume() then hangs on rtnl_lock().
> 
> v2: Added logging messages per Prashant's request
> 
> Cc: Prashant Sreedharan <prashant@broadcom.com>
> Cc: Michael Chan <mchan@broadcom.com>
> 
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Acked-by: Prashant Sreedharan <prashant@broadcom.com>



* Re: [PATCH net v2] tg3: prevent ifup/ifdown during PCI error recovery
From: David Miller @ 2014-09-02 20:02 UTC
  To: ivecera; +Cc: netdev, prashant, mchan

From: Ivan Vecera <ivecera@redhat.com>
Date: Mon,  1 Sep 2014 14:21:57 +0200

> The patch fixes race conditions between PCI error recovery callbacks and
> potential ifup/ifdown.
> 
> First, if ifup (tg3_open) is called between tg3_io_error_detected() and
> tg3_io_resume() then tp->timer is armed twice before expiry. Once during
> tg3_open() and again during tg3_io_resume(). This results in BUG
> at kernel/time/timer.c:945.
> 
> Second, if ifdown (tg3_close) is called between tg3_io_error_detected()
> and tg3_io_resume() then tg3_napi_disable() is called twice without
> a tg3_napi_enable between. Once during tg3_io_error_detected() and again
> during tg3_close(). The tg3_io_resume() then hangs on rtnl_lock().
> 
> v2: Added logging messages per Prashant's request
> 
> Cc: Prashant Sreedharan <prashant@broadcom.com>
> Cc: Michael Chan <mchan@broadcom.com>
> 
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Applied, thanks.


* RE: [PATCH net v2] tg3: prevent ifup/ifdown during PCI error recovery
From: Yuval Mintz @ 2014-09-03 16:25 UTC
  To: David Miller, ivecera@redhat.com
  Cc: netdev, prashant@broadcom.com, mchan@broadcom.com

> > The patch fixes race conditions between PCI error recovery callbacks
> > and potential ifup/ifdown.
> >
> > First, if ifup (tg3_open) is called between tg3_io_error_detected()
> > and tg3_io_resume() then tp->timer is armed twice before expiry. Once
> > during tg3_open() and again during tg3_io_resume(). This results in
> > BUG at kernel/time/timer.c:945.
> >
> > Second, if ifdown (tg3_close) is called between
> > tg3_io_error_detected() and tg3_io_resume() then tg3_napi_disable() is
> > called twice without a tg3_napi_enable between. Once during
> > tg3_io_error_detected() and again during tg3_close(). The
> > tg3_io_resume() then hangs on rtnl_lock().

Hi, sorry for the late response, but I have only just seen this patch.
This sounds like a general problem, i.e., one that could affect multiple drivers
[not exactly the same issue, but a similar one].

PCI error recovery flows are bad enough without the netdevice changing
state in between the PCI callbacks.
Perhaps we should consider a general solution, e.g., preventing state
transitions during such a flow [i.e., until the PCI error recovery fully completes]?
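
One way to picture such a general solution is a gate that sits in front of every driver's open/stop callbacks, so individual drivers need no flag of their own. The sketch below is a userspace illustration only and does not correspond to any existing kernel interface: `struct gate_ops`, `struct gated_dev`, `gate_recovery_begin()`, `gate_recovery_end()` and `gate_set_up()` are all hypothetical names, and it assumes callers are already serialized the way rtnl_lock() serializes them in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical driver callbacks, standing in for a driver's open/stop. */
struct gate_ops {
	int (*open)(void *priv);
	int (*stop)(void *priv);
};

/* Hypothetical core-side state kept per device. */
struct gated_dev {
	const struct gate_ops *ops;
	void *priv;
	bool in_pci_recovery;	/* set for the whole recovery flow */
	bool up;		/* current administrative state */
};

/* Would be called when the first PCI error callback runs. */
static void gate_recovery_begin(struct gated_dev *d)
{
	d->in_pci_recovery = true;
}

/* Would be called once recovery has fully completed. */
static void gate_recovery_end(struct gated_dev *d)
{
	assert(d->in_pci_recovery);
	d->in_pci_recovery = false;
}

/* Single entry point for ifup/ifdown; refuses transitions mid-recovery. */
static int gate_set_up(struct gated_dev *d, bool up)
{
	int err;

	if (d->in_pci_recovery)
		return -EAGAIN;
	if (d->up == up)
		return 0;	/* already in the requested state */

	err = up ? d->ops->open(d->priv) : d->ops->stop(d->priv);
	if (!err)
		d->up = up;
	return err;
}

/* Trivial example driver that only counts how often it is called. */
struct count_priv {
	int opens;
	int stops;
};

static int count_open(void *priv)
{
	((struct count_priv *)priv)->opens++;
	return 0;
}

static int count_stop(void *priv)
{
	((struct count_priv *)priv)->stops++;
	return 0;
}

static const struct gate_ops count_ops = {
	.open = count_open,
	.stop = count_stop,
};
```

With this shape the driver callbacks are never entered while the flag is set, which is the property the tg3 patch enforces per driver. Whether returning -EAGAIN or deferring the transition until recovery ends is the better policy is left open here.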

> >
> > v2: Added logging messages per Prashant's request
> >
> > Cc: Prashant Sreedharan <prashant@broadcom.com>
> > Cc: Michael Chan <mchan@broadcom.com>
> >
> > Signed-off-by: Ivan Vecera <ivecera@redhat.com>
> 
> Applied, thanks.

