netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 2.6.28 1/1] cxgb3 - fix race in EEH
@ 2008-09-26  0:05 Divy Le Ray
  2008-10-03  1:00 ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: Divy Le Ray @ 2008-09-26  0:05 UTC (permalink / raw)
  To: jeff; +Cc: netdev, linux-kernel, swise

From: Divy Le Ray <divy@chelsio.com>

A SGE queue set timer might access registers while in EEH recovery,
triggering an EEH error loop. Stop all timers early in EEH process.

Signed-off-by: Divy Le Ray <divy@chelsio.com>
---

 drivers/net/cxgb3/adapter.h    |    1 +
 drivers/net/cxgb3/cxgb3_main.c |    5 +++++
 drivers/net/cxgb3/sge.c        |   21 ++++++++++++++++++---
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/net/cxgb3/adapter.h b/drivers/net/cxgb3/adapter.h
index 2711404..b1a694b 100644
--- a/drivers/net/cxgb3/adapter.h
+++ b/drivers/net/cxgb3/adapter.h
@@ -285,6 +285,7 @@ void t3_os_link_changed(struct adapter *adapter, int port_id, int link_status,
 
 void t3_sge_start(struct adapter *adap);
 void t3_sge_stop(struct adapter *adap);
+void t3_stop_sge_timers(struct adapter *adap);
 void t3_free_sge_resources(struct adapter *adap);
 void t3_sge_err_intr_handler(struct adapter *adapter);
 irq_handler_t t3_intr_handler(struct adapter *adap, int polling);
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index 5447f3e..d355c82 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -479,6 +479,7 @@ static int setup_sge_qsets(struct adapter *adap)
 							     irq_idx,
 				&adap->params.sge.qset[qset_idx], ntxq, dev);
 			if (err) {
+				t3_stop_sge_timers(adap);
 				t3_free_sge_resources(adap);
 				return err;
 			}
@@ -2449,6 +2450,9 @@ static pci_ers_result_t t3_io_error_detected(struct pci_dev *pdev,
 	    test_bit(OFFLOAD_DEVMAP_BIT, &adapter->open_device_map))
 		offload_close(&adapter->tdev);
 
+	/* Stop SGE timers */
+	t3_stop_sge_timers(adapter);
+
 	adapter->flags &= ~FULL_INIT_DONE;
 
 	pci_disable_device(pdev);
@@ -2801,6 +2805,7 @@ static void __devexit remove_one(struct pci_dev *pdev)
 		    if (test_bit(i, &adapter->registered_device_map))
 			unregister_netdev(adapter->port[i]);
 
+		t3_stop_sge_timers(adapter);
 		t3_free_sge_resources(adapter);
 		cxgb_disable_msi(adapter);
 
diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index f78a42c..52f4138 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -603,9 +603,6 @@ static void t3_free_qset(struct adapter *adapter, struct sge_qset *q)
 	int i;
 	struct pci_dev *pdev = adapter->pdev;
 
-	if (q->tx_reclaim_timer.function)
-		del_timer_sync(&q->tx_reclaim_timer);
-
 	for (i = 0; i < SGE_RXQ_PER_SET; ++i)
 		if (q->fl[i].desc) {
 			spin_lock_irq(&adapter->sge.reg_lock);
@@ -3008,6 +3005,24 @@ err:
 }
 
 /**
+ *	t3_stop_sge_timers - stop SGE timer call backs
+ *	@adap: the adapter
+ *
+ *	Stops each SGE queue set's timer call back
+ */
+void t3_stop_sge_timers(struct adapter *adap)
+{
+	int i;
+
+	for (i = 0; i < SGE_QSETS; ++i) {
+		struct sge_qset *q = &adap->sge.qs[i];
+
+		if (q->tx_reclaim_timer.function)
+			del_timer_sync(&q->tx_reclaim_timer);
+	}
+}
+
+/**
  *	t3_free_sge_resources - free SGE resources
  *	@adap: the adapter
  *


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 2.6.28 1/1] cxgb3 - fix race in EEH
  2008-09-26  0:05 [PATCH 2.6.28 1/1] cxgb3 - fix race in EEH Divy Le Ray
@ 2008-10-03  1:00 ` Andrew Morton
  2008-10-09  0:41   ` Divy Le Ray
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2008-10-03  1:00 UTC (permalink / raw)
  To: Divy Le Ray; +Cc: jeff, netdev, linux-kernel, swise

On Thu, 25 Sep 2008 17:05:28 -0700
Divy Le Ray <divy@chelsio.com> wrote:

> A SGE queue set timer might access registers while in EEH recovery,
> triggering an EEH error loop. Stop all timers early in EEH process.

<looks>

It's deeply weird that t3_reset_qset() does

	memset(&q->tx_reclaim_timer, 0, sizeof(q->tx_reclaim_timer));

There are lots of things in the timer_list which the driver has no
business modifying.  For example, this might break the metadata in
Thomas's debugobjects stuff, which attempts to catch things being done
in the wrong order (I don't think it will, but still...).

Rerunning init_timer() should repair the damage, but I suspect a simple

	q->tx_reclaim_timer.function = NULL;	/* explanation goes here */

would suffice here.


t3_sge_alloc_qset() could use the newer setup_timer().

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 2.6.28 1/1] cxgb3 - fix race in EEH
  2008-10-03  1:00 ` Andrew Morton
@ 2008-10-09  0:41   ` Divy Le Ray
  0 siblings, 0 replies; 3+ messages in thread
From: Divy Le Ray @ 2008-10-09  0:41 UTC (permalink / raw)
  To: Andrew Morton; +Cc: jeff, netdev, linux-kernel, Steve Wise

Andrew Morton wrote:
> On Thu, 25 Sep 2008 17:05:28 -0700
> Divy Le Ray <divy@chelsio.com> wrote:
>
>   
>> A SGE queue set timer might access registers while in EEH recovery,
>> triggering an EEH error loop. Stop all timers early in EEH process.
>>     
>
> <looks>
>
> It's deeply weird that t3_reset_qset() does
>
> 	memset(&q->tx_reclaim_timer, 0, sizeof(q->tx_reclaim_timer));
>
> There are lots of things in the timer_list which the driver has no
> business modifying.  For example, this might break the metadata in
> Thomas's debugobjects stuff, which attempts to catch things being done
> in the wrong order (I don't think it will, but still...).
>
> Rerunning init_timer() should repair the damage, but I suspect a simple
>
> 	q->tx_reclaim_timer.function = NULL;	/* explanation goes here */
>
> would suffice here.
>
>
> t3_sge_alloc_qset() could use the newer setup_timer().
>
>   
Hi Andrew,

Your suggestion is implemented in the first patch of the series I just 
posted.
I apologize for the delayed reply.

Cheers,
Divy

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2008-10-09  0:42 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-26  0:05 [PATCH 2.6.28 1/1] cxgb3 - fix race in EEH Divy Le Ray
2008-10-03  1:00 ` Andrew Morton
2008-10-09  0:41   ` Divy Le Ray

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).