Re: via_rhine kernel crashes in 2.6.32

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: via_rhine kernel crashes in 2.6.32
       [not found] ` <20091220200343.GA5427@wrars-comp.wrarsdomain>
@ 2009-12-21 12:03   ` Christian Kujau
  2009-12-21 12:36     ` Andrey Rahmatullin
  2009-12-21 18:18     ` Andrey Rahmatullin
  0 siblings, 2 replies; 14+ messages in thread
From: Christian Kujau @ 2009-12-21 12:03 UTC (permalink / raw)
  To: Andrey Rahmatullin; +Cc: LKML, Roger Luethi, netdev

[netdev Cc'ed]

On Mon, 21 Dec 2009 at 01:03, Andrey Rahmatullin wrote:
> Here are the proper screenshots from 2.6.33-rc1:
> http://wrar.name/temp/P1010724.JPG
> http://wrar.name/temp/P1010725.JPG

Wow, I had to rotate my laptop here. In you first posting [0] you said you 
"upgraded to .32" but had crashes in .30 too. What was the last working 
kernel?

In the .jpgs one can see something along the lines of 
schedule_timeout, napi_disable, rhine_tx_timeout - the only patch I could 
find touching these things [1] altogether is rather old, from 2.6.24. Is 
2.6.29 working for you? 

Christian.

[0] http://lkml.org/lkml/2009/12/19/36
[1] http://lists.openwall.net/netdev/2007/08/28/13
-- 
BOFH excuse #19:

floating point processor overflow

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-21 12:03   ` via_rhine kernel crashes in 2.6.32 Christian Kujau
@ 2009-12-21 12:36     ` Andrey Rahmatullin
  2009-12-21 18:18     ` Andrey Rahmatullin
  1 sibling, 0 replies; 14+ messages in thread
From: Andrey Rahmatullin @ 2009-12-21 12:36 UTC (permalink / raw)
  To: Christian Kujau; +Cc: LKML, Roger Luethi, netdev

[-- Attachment #1: Type: text/plain, Size: 641 bytes --]

On Mon, Dec 21, 2009 at 04:03:06AM -0800, Christian Kujau wrote:
> Wow, I had to rotate my laptop here. In you first posting [0] you said you 
> "upgraded to .32" but had crashes in .30 too. What was the last working 
> kernel?
It was 2.6.31, but as I've already said I'm not sure it was because of
kernel version alone.

> In the .jpgs one can see something along the lines of 
> schedule_timeout, napi_disable, rhine_tx_timeout - the only patch I could 
> find touching these things [1] altogether is rather old, from 2.6.24. Is 
> 2.6.29 working for you? 
Well, I can try some old versions.

-- 
WBR, wRAR (ALT Linux Team)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-21 12:03   ` via_rhine kernel crashes in 2.6.32 Christian Kujau
  2009-12-21 12:36     ` Andrey Rahmatullin
@ 2009-12-21 18:18     ` Andrey Rahmatullin
  2009-12-21 19:32       ` Christian Kujau
  1 sibling, 1 reply; 14+ messages in thread
From: Andrey Rahmatullin @ 2009-12-21 18:18 UTC (permalink / raw)
  To: Christian Kujau; +Cc: LKML, Roger Luethi, netdev

[-- Attachment #1: Type: text/plain, Size: 1057 bytes --]

On Mon, Dec 21, 2009 at 04:03:06AM -0800, Christian Kujau wrote:
> In the .jpgs one can see something along the lines of 
> schedule_timeout, napi_disable, rhine_tx_timeout - the only patch I could 
> find touching these things [1] altogether is rather old, from 2.6.24. Is 
> 2.6.29 working for you? 
I've installed 2.6.27, the earliest kernel supported by udev 149. I
started to download two large files to my second machine (maybe 700 or
1000 Kbyte/s combined). Nothing happened. I stopped the downloads and
started deluged. The kernel showed "netdev watchdog timeout", but nothing
else happened. deluged opened something like 100 TCP connections and
started to upload some data at ~20 Kbyte/s. Nothing happened. I resumed
one of downloads, waited for some time, nothing happened. I resumed the
second download, the kernel crashed into an endless stream of backtraces
(did 2.6.27 support pause_on_oops?), containing "whatever from the idle
thread", or smth like that, which was also in other crash logs.

-- 
WBR, wRAR (ALT Linux Team)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-21 18:18     ` Andrey Rahmatullin
@ 2009-12-21 19:32       ` Christian Kujau
  2009-12-22 12:32         ` Jarek Poplawski
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Kujau @ 2009-12-21 19:32 UTC (permalink / raw)
  To: Andrey Rahmatullin; +Cc: LKML, Roger Luethi, netdev

On Mon, 21 Dec 2009 at 23:18, Andrey Rahmatullin wrote:
> I've installed 2.6.27, the earliest kernel supported by udev 149. I
[...]
> one of downloads, waited for some time, nothing happened. I resumed the
> second download, the kernel crashed into an endless stream of backtraces
> (did 2.6.27 support pause_on_oops?), containing "whatever from the idle
> thread", or smth like that, which was also in other crash logs.

So, 2.6.27 crashed as well. Was the backtrace similar to those on 2.6.30? 
I know it's a long shot, but since you seem to be able to reproduce this 
pretty reliably, can you try 2.6.23? Or at least something before 
bea3348eef27e6044b6161fd04c3152215f96411 [0]?

Christian.

[0] I'm *really* guessing here, if some netdev guru has some better
    understanding of the backtraces Andrey sent, please step forward.
-- 
BOFH excuse #433:

error: one bad user found in front of screen

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-21 19:32       ` Christian Kujau
@ 2009-12-22 12:32         ` Jarek Poplawski
  2009-12-22 13:21           ` Jarek Poplawski
  0 siblings, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-22 12:32 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Andrey Rahmatullin, LKML, Roger Luethi, netdev

On 21-12-2009 20:32, Christian Kujau wrote:
> On Mon, 21 Dec 2009 at 23:18, Andrey Rahmatullin wrote:
>> I've installed 2.6.27, the earliest kernel supported by udev 149. I
> [...]
>> one of downloads, waited for some time, nothing happened. I resumed the
>> second download, the kernel crashed into an endless stream of backtraces
>> (did 2.6.27 support pause_on_oops?), containing "whatever from the idle
>> thread", or smth like that, which was also in other crash logs.
> 
> So, 2.6.27 crashed as well. Was the backtrace similar to those on 2.6.30? 
> I know it's a long shot, but since you seem to be able to reproduce this 
> pretty reliably, can you try 2.6.23? Or at least something before 
> bea3348eef27e6044b6161fd04c3152215f96411 [0]?
> 
> Christian.
> 
> [0] I'm *really* guessing here, if some netdev guru has some better
>     understanding of the backtraces Andrey sent, please step forward.

It looks like napi_disable() should be illegal in ndo_tx_timeout().
Here is a patch which moves most of the timeout work to a workqueue,
similarly to tg3 etc. It should prevent at least one of reported
bugs. Alas I can't even check-compile it at the moment, so let me
know on any problems.

Jarek P.

---

 drivers/net/via-rhine.c |   41 ++++++++++++++++++++++++++++-------------
 1 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 593e01f..125406b 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32;
 #include <linux/ethtool.h>
 #include <linux/crc32.h>
 #include <linux/bitops.h>
+#include <linux/workqueue.h>
 #include <asm/processor.h>	/* Processor type for cache alignment. */
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -389,6 +390,7 @@ struct rhine_private {
 	struct net_device *dev;
 	struct napi_struct napi;
 	spinlock_t lock;
+	struct work_struct reset_task;
 
 	/* Frequently used values: keep some adjacent for cache effect. */
 	u32 quirks;
@@ -407,6 +409,7 @@ struct rhine_private {
 static int  mdio_read(struct net_device *dev, int phy_id, int location);
 static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
 static int  rhine_open(struct net_device *dev);
+static void rhine_reset_task(struct work_struct *work);
 static void rhine_tx_timeout(struct net_device *dev);
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev);
@@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev,
 	dev->irq = pdev->irq;
 
 	spin_lock_init(&rp->lock);
+	INIT_WORK(&rp->reset_task, rhine_reset_task);
+
 	rp->mii_if.dev = dev;
 	rp->mii_if.mdio_read = mdio_read;
 	rp->mii_if.mdio_write = mdio_write;
@@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev)
 	return 0;
 }
 
-static void rhine_tx_timeout(struct net_device *dev)
+static void rhine_reset_task(struct work_struct *work)
 {
-	struct rhine_private *rp = netdev_priv(dev);
-	void __iomem *ioaddr = rp->base;
-
-	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
-	       "%4.4x, resetting...\n",
-	       dev->name, ioread16(ioaddr + IntrStatus),
-	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+	struct rhine_private *rp = container_of(work, struct rhine_private,
+					        reset_task);
+	struct net_device *dev = rp->dev;
 
 	/* protect against concurrent rx interrupts */
 	disable_irq(rp->pdev->irq);
 
 	napi_disable(&rp->napi);
 
-	spin_lock(&rp->lock);
+	spin_lock_irq(&rp->lock);
 
 	/* clear all descriptors */
 	free_tbufs(dev);
@@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev)
 	rhine_chip_reset(dev);
 	init_registers(dev);
 
-	spin_unlock(&rp->lock);
+	spin_unlock_irq(&rp->lock);
 	enable_irq(rp->pdev->irq);
 
 	dev->trans_start = jiffies;
@@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev)
 	netif_wake_queue(dev);
 }
 
+static void rhine_tx_timeout(struct net_device *dev)
+{
+	struct rhine_private *rp = netdev_priv(dev);
+	void __iomem *ioaddr = rp->base;
+
+	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
+	       "%4.4x, resetting...\n",
+	       dev->name, ioread16(ioaddr + IntrStatus),
+	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+
+	schedule_work(&rp->reset_task);
+}
+
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev)
 {
@@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev)
 	struct rhine_private *rp = netdev_priv(dev);
 	void __iomem *ioaddr = rp->base;
 
-	spin_lock_irq(&rp->lock);
-
-	netif_stop_queue(dev);
 	napi_disable(&rp->napi);
+	cancel_work_sync(&rp->reset_task);
+	netif_stop_queue(dev);
+
+	spin_lock_irq(&rp->lock);
 
 	if (debug > 1)
 		printk(KERN_DEBUG "%s: Shutting down ethercard, "

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-22 12:32         ` Jarek Poplawski
@ 2009-12-22 13:21           ` Jarek Poplawski
  2009-12-22 13:38             ` Jarek Poplawski
  0 siblings, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-22 13:21 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Andrey Rahmatullin, LKML, Roger Luethi, netdev

On Tue, Dec 22, 2009 at 12:32:11PM +0000, Jarek Poplawski wrote:
> It looks like napi_disable() should be illegal in ndo_tx_timeout().
> Here is a patch which moves most of the timeout work to a workqueue,
> similarly to tg3 etc. It should prevent at least one of reported
> bugs. Alas I can't even check-compile it at the moment, so let me
> know on any problems.

It seems I needlessly changed locking btw, so here it is again.

Jarek P.

--- (take 2)

 drivers/net/via-rhine.c |   37 ++++++++++++++++++++++++++-----------
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 593e01f..15a4063 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32;
 #include <linux/ethtool.h>
 #include <linux/crc32.h>
 #include <linux/bitops.h>
+#include <linux/workqueue.h>
 #include <asm/processor.h>	/* Processor type for cache alignment. */
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -389,6 +390,7 @@ struct rhine_private {
 	struct net_device *dev;
 	struct napi_struct napi;
 	spinlock_t lock;
+	struct work_struct reset_task;
 
 	/* Frequently used values: keep some adjacent for cache effect. */
 	u32 quirks;
@@ -407,6 +409,7 @@ struct rhine_private {
 static int  mdio_read(struct net_device *dev, int phy_id, int location);
 static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
 static int  rhine_open(struct net_device *dev);
+static void rhine_reset_task(struct work_struct *work);
 static void rhine_tx_timeout(struct net_device *dev);
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev);
@@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev,
 	dev->irq = pdev->irq;
 
 	spin_lock_init(&rp->lock);
+	INIT_WORK(&rp->reset_task, rhine_reset_task);
+
 	rp->mii_if.dev = dev;
 	rp->mii_if.mdio_read = mdio_read;
 	rp->mii_if.mdio_write = mdio_write;
@@ -1179,15 +1184,11 @@ static int rhine_open(struct net_device *dev)
 	return 0;
 }
 
-static void rhine_tx_timeout(struct net_device *dev)
+static void rhine_reset_task(struct work_struct *work)
 {
-	struct rhine_private *rp = netdev_priv(dev);
-	void __iomem *ioaddr = rp->base;
-
-	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
-	       "%4.4x, resetting...\n",
-	       dev->name, ioread16(ioaddr + IntrStatus),
-	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+	struct rhine_private *rp = container_of(work, struct rhine_private,
+					        reset_task);
+	struct net_device *dev = rp->dev;
 
 	/* protect against concurrent rx interrupts */
 	disable_irq(rp->pdev->irq);
@@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev)
 	netif_wake_queue(dev);
 }
 
+static void rhine_tx_timeout(struct net_device *dev)
+{
+	struct rhine_private *rp = netdev_priv(dev);
+	void __iomem *ioaddr = rp->base;
+
+	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
+	       "%4.4x, resetting...\n",
+	       dev->name, ioread16(ioaddr + IntrStatus),
+	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+
+	schedule_work(&rp->reset_task);
+}
+
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev)
 {
@@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev)
 	struct rhine_private *rp = netdev_priv(dev);
 	void __iomem *ioaddr = rp->base;
 
-	spin_lock_irq(&rp->lock);
-
-	netif_stop_queue(dev);
 	napi_disable(&rp->napi);
+	cancel_work_sync(&rp->reset_task);
+	netif_stop_queue(dev);
+
+	spin_lock_irq(&rp->lock);
 
 	if (debug > 1)
 		printk(KERN_DEBUG "%s: Shutting down ethercard, "

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-22 13:21           ` Jarek Poplawski
@ 2009-12-22 13:38             ` Jarek Poplawski
  2009-12-22 15:00               ` Andrey Rahmatullin
  0 siblings, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-22 13:38 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Andrey Rahmatullin, LKML, Roger Luethi, netdev

On Tue, Dec 22, 2009 at 01:21:07PM +0000, Jarek Poplawski wrote:
> On Tue, Dec 22, 2009 at 12:32:11PM +0000, Jarek Poplawski wrote:
> > It looks like napi_disable() should be illegal in ndo_tx_timeout().
> > Here is a patch which moves most of the timeout work to a workqueue,
> > similarly to tg3 etc. It should prevent at least one of reported
> > bugs. Alas I can't even check-compile it at the moment, so let me
> > know on any problems.
> 
> It seems I needlessly changed locking btw, so here it is again.

Hmm... On the other hand, it definitely needs at least _bh now...

Sorry,
Jarek P.
 
--- (take 3)

 drivers/net/via-rhine.c |   41 ++++++++++++++++++++++++++++-------------
 1 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 593e01f..125406b 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32;
 #include <linux/ethtool.h>
 #include <linux/crc32.h>
 #include <linux/bitops.h>
+#include <linux/workqueue.h>
 #include <asm/processor.h>	/* Processor type for cache alignment. */
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -389,6 +390,7 @@ struct rhine_private {
 	struct net_device *dev;
 	struct napi_struct napi;
 	spinlock_t lock;
+	struct work_struct reset_task;
 
 	/* Frequently used values: keep some adjacent for cache effect. */
 	u32 quirks;
@@ -407,6 +409,7 @@ struct rhine_private {
 static int  mdio_read(struct net_device *dev, int phy_id, int location);
 static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
 static int  rhine_open(struct net_device *dev);
+static void rhine_reset_task(struct work_struct *work);
 static void rhine_tx_timeout(struct net_device *dev);
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev);
@@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev,
 	dev->irq = pdev->irq;
 
 	spin_lock_init(&rp->lock);
+	INIT_WORK(&rp->reset_task, rhine_reset_task);
+
 	rp->mii_if.dev = dev;
 	rp->mii_if.mdio_read = mdio_read;
 	rp->mii_if.mdio_write = mdio_write;
@@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev)
 	return 0;
 }
 
-static void rhine_tx_timeout(struct net_device *dev)
+static void rhine_reset_task(struct work_struct *work)
 {
-	struct rhine_private *rp = netdev_priv(dev);
-	void __iomem *ioaddr = rp->base;
-
-	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
-	       "%4.4x, resetting...\n",
-	       dev->name, ioread16(ioaddr + IntrStatus),
-	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+	struct rhine_private *rp = container_of(work, struct rhine_private,
+					        reset_task);
+	struct net_device *dev = rp->dev;
 
 	/* protect against concurrent rx interrupts */
 	disable_irq(rp->pdev->irq);
 
 	napi_disable(&rp->napi);
 
-	spin_lock(&rp->lock);
+	spin_lock_bh(&rp->lock);
 
 	/* clear all descriptors */
 	free_tbufs(dev);
@@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev)
 	rhine_chip_reset(dev);
 	init_registers(dev);
 
-	spin_unlock(&rp->lock);
+	spin_unlock_bh(&rp->lock);
 	enable_irq(rp->pdev->irq);
 
 	dev->trans_start = jiffies;
@@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev)
 	netif_wake_queue(dev);
 }
 
+static void rhine_tx_timeout(struct net_device *dev)
+{
+	struct rhine_private *rp = netdev_priv(dev);
+	void __iomem *ioaddr = rp->base;
+
+	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
+	       "%4.4x, resetting...\n",
+	       dev->name, ioread16(ioaddr + IntrStatus),
+	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+
+	schedule_work(&rp->reset_task);
+}
+
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev)
 {
@@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev)
 	struct rhine_private *rp = netdev_priv(dev);
 	void __iomem *ioaddr = rp->base;
 
-	spin_lock_irq(&rp->lock);
-
-	netif_stop_queue(dev);
 	napi_disable(&rp->napi);
+	cancel_work_sync(&rp->reset_task);
+	netif_stop_queue(dev);
+
+	spin_lock_irq(&rp->lock);
 
 	if (debug > 1)
 		printk(KERN_DEBUG "%s: Shutting down ethercard, "

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-22 13:38             ` Jarek Poplawski
@ 2009-12-22 15:00               ` Andrey Rahmatullin
  2009-12-22 15:26                 ` Roger Luethi
  0 siblings, 1 reply; 14+ messages in thread
From: Andrey Rahmatullin @ 2009-12-22 15:00 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Christian Kujau, LKML, Roger Luethi, netdev

[-- Attachment #1: Type: text/plain, Size: 645 bytes --]

On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote:
> > > It looks like napi_disable() should be illegal in ndo_tx_timeout().
> > > Here is a patch which moves most of the timeout work to a workqueue,
> > > similarly to tg3 etc. It should prevent at least one of reported
> > > bugs. Alas I can't even check-compile it at the moment, so let me
> > > know on any problems.
> > It seems I needlessly changed locking btw, so here it is again.
> Hmm... On the other hand, it definitely needs at least _bh now...
I've tried this patch. There are lots of "Transmit timed out", but no
crashes.

-- 
WBR, wRAR (ALT Linux Team)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-22 15:00               ` Andrey Rahmatullin
@ 2009-12-22 15:26                 ` Roger Luethi
  2009-12-22 17:36                   ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
  2009-12-23  9:52                   ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
  0 siblings, 2 replies; 14+ messages in thread
From: Roger Luethi @ 2009-12-22 15:26 UTC (permalink / raw)
  To: Andrey Rahmatullin; +Cc: Jarek Poplawski, Christian Kujau, LKML, netdev

On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote:
> On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote:
> > > > It looks like napi_disable() should be illegal in ndo_tx_timeout().
> > > > Here is a patch which moves most of the timeout work to a workqueue,
> > > > similarly to tg3 etc. It should prevent at least one of reported
> > > > bugs. Alas I can't even check-compile it at the moment, so let me
> > > > know on any problems.
> > > It seems I needlessly changed locking btw, so here it is again.
> > Hmm... On the other hand, it definitely needs at least _bh now...
> I've tried this patch. There are lots of "Transmit timed out", but no
> crashes.

ACK. Looks like you guys tracked down the crashing and fixed it (thanks!).
I suspect we shouldn't have to reset due to timeouts that often, but that's
another story.

Roger

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] net/via-rhine: Fix scheduling while atomic bugs
  2009-12-22 15:26                 ` Roger Luethi
@ 2009-12-22 17:36                   ` Jarek Poplawski
  2009-12-24  5:54                     ` David Miller
  2009-12-23  9:52                   ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
  1 sibling, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-22 17:36 UTC (permalink / raw)
  To: David Miller
  Cc: Roger Luethi, Andrey Rahmatullin, Christian Kujau, LKML, netdev

On Tue, Dec 22, 2009 at 04:26:59PM +0100, Roger Luethi wrote:
> On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote:
> > On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote:
> > > > > It looks like napi_disable() should be illegal in ndo_tx_timeout().
> > > > > Here is a patch which moves most of the timeout work to a workqueue,
> > > > > similarly to tg3 etc. It should prevent at least one of reported
> > > > > bugs. Alas I can't even check-compile it at the moment, so let me
> > > > > know on any problems.
> > > > It seems I needlessly changed locking btw, so here it is again.
> > > Hmm... On the other hand, it definitely needs at least _bh now...
> > I've tried this patch. There are lots of "Transmit timed out", but no
> > crashes.
> 
> ACK. Looks like you guys tracked down the crashing and fixed it (thanks!).
> I suspect we shouldn't have to reset due to timeouts that often, but that's
> another story.
> 
> Roger

Thanks everybody,
Jarek P.
------------------->

There are BUGs "scheduling while atomic" triggered by the timer
rhine_tx_timeout(). They are caused by calling napi_disable() (with
msleep()). This patch fixes it by moving most of the timer content to
the workqueue function (similarly to other drivers, like tg3), with
spin_lock() changed to BH version.

Additionally, there is spin_lock_irq() moved in rhine_close() to
exclude napi_disable() etc., also tg3's way.

Reported-by: Andrey Rahmatullin <wrar@altlinux.org>
Tested-by: Andrey Rahmatullin <wrar@altlinux.org>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Roger Luethi <rl@hellgate.ch>
---

 drivers/net/via-rhine.c |   41 ++++++++++++++++++++++++++++-------------
 1 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 593e01f..125406b 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32;
 #include <linux/ethtool.h>
 #include <linux/crc32.h>
 #include <linux/bitops.h>
+#include <linux/workqueue.h>
 #include <asm/processor.h>	/* Processor type for cache alignment. */
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -389,6 +390,7 @@ struct rhine_private {
 	struct net_device *dev;
 	struct napi_struct napi;
 	spinlock_t lock;
+	struct work_struct reset_task;
 
 	/* Frequently used values: keep some adjacent for cache effect. */
 	u32 quirks;
@@ -407,6 +409,7 @@ struct rhine_private {
 static int  mdio_read(struct net_device *dev, int phy_id, int location);
 static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
 static int  rhine_open(struct net_device *dev);
+static void rhine_reset_task(struct work_struct *work);
 static void rhine_tx_timeout(struct net_device *dev);
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev);
@@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev,
 	dev->irq = pdev->irq;
 
 	spin_lock_init(&rp->lock);
+	INIT_WORK(&rp->reset_task, rhine_reset_task);
+
 	rp->mii_if.dev = dev;
 	rp->mii_if.mdio_read = mdio_read;
 	rp->mii_if.mdio_write = mdio_write;
@@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev)
 	return 0;
 }
 
-static void rhine_tx_timeout(struct net_device *dev)
+static void rhine_reset_task(struct work_struct *work)
 {
-	struct rhine_private *rp = netdev_priv(dev);
-	void __iomem *ioaddr = rp->base;
-
-	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
-	       "%4.4x, resetting...\n",
-	       dev->name, ioread16(ioaddr + IntrStatus),
-	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+	struct rhine_private *rp = container_of(work, struct rhine_private,
+						reset_task);
+	struct net_device *dev = rp->dev;
 
 	/* protect against concurrent rx interrupts */
 	disable_irq(rp->pdev->irq);
 
 	napi_disable(&rp->napi);
 
-	spin_lock(&rp->lock);
+	spin_lock_bh(&rp->lock);
 
 	/* clear all descriptors */
 	free_tbufs(dev);
@@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev)
 	rhine_chip_reset(dev);
 	init_registers(dev);
 
-	spin_unlock(&rp->lock);
+	spin_unlock_bh(&rp->lock);
 	enable_irq(rp->pdev->irq);
 
 	dev->trans_start = jiffies;
@@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev)
 	netif_wake_queue(dev);
 }
 
+static void rhine_tx_timeout(struct net_device *dev)
+{
+	struct rhine_private *rp = netdev_priv(dev);
+	void __iomem *ioaddr = rp->base;
+
+	printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
+	       "%4.4x, resetting...\n",
+	       dev->name, ioread16(ioaddr + IntrStatus),
+	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+
+	schedule_work(&rp->reset_task);
+}
+
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
 				  struct net_device *dev)
 {
@@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev)
 	struct rhine_private *rp = netdev_priv(dev);
 	void __iomem *ioaddr = rp->base;
 
-	spin_lock_irq(&rp->lock);
-
-	netif_stop_queue(dev);
 	napi_disable(&rp->napi);
+	cancel_work_sync(&rp->reset_task);
+	netif_stop_queue(dev);
+
+	spin_lock_irq(&rp->lock);
 
 	if (debug > 1)
 		printk(KERN_DEBUG "%s: Shutting down ethercard, "

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-22 15:26                 ` Roger Luethi
  2009-12-22 17:36                   ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
@ 2009-12-23  9:52                   ` Jarek Poplawski
  2009-12-23 16:21                     ` Andrey Rahmatullin
  1 sibling, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-23  9:52 UTC (permalink / raw)
  To: Roger Luethi; +Cc: Andrey Rahmatullin, Christian Kujau, LKML, netdev

On 22-12-2009 16:26, Roger Luethi wrote:
> On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote:
>> On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote:
>>>>> It looks like napi_disable() should be illegal in ndo_tx_timeout().
>>>>> Here is a patch which moves most of the timeout work to a workqueue,
>>>>> similarly to tg3 etc. It should prevent at least one of reported
>>>>> bugs. Alas I can't even check-compile it at the moment, so let me
>>>>> know on any problems.
>>>> It seems I needlessly changed locking btw, so here it is again.
>>> Hmm... On the other hand, it definitely needs at least _bh now...
>> I've tried this patch. There are lots of "Transmit timed out", but no
>> crashes.
> 
> ACK. Looks like you guys tracked down the crashing and fixed it (thanks!).
> I suspect we shouldn't have to reset due to timeouts that often, but that's
> another story.

BTW, it seems a change in 2.6.31 might trigger these timeouts more
often than before. Andrey, could you try if this matters here?

Thanks,
Jarek P.

--- (on top of net-2.6 with the previous "Fix scheduling..." patch)

diff -Nurp a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
--- a/drivers/net/via-rhine.c	2009-12-23 09:28:25.000000000 +0000
+++ b/drivers/net/via-rhine.c	2009-12-23 09:33:57.000000000 +0000
@@ -1226,6 +1226,7 @@ static void rhine_tx_timeout(struct net_
 	       mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
 
 	schedule_work(&rp->reset_task);
+	netdev_get_tx_queue(dev, 0)->trans_start = jiffies;
 }
 
 static netdev_tx_t rhine_start_tx(struct sk_buff *skb,

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-23  9:52                   ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
@ 2009-12-23 16:21                     ` Andrey Rahmatullin
  2009-12-23 16:30                       ` Jarek Poplawski
  0 siblings, 1 reply; 14+ messages in thread
From: Andrey Rahmatullin @ 2009-12-23 16:21 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Roger Luethi, Christian Kujau, LKML, netdev

On Wed, Dec 23, 2009 at 09:52:03AM +0000, Jarek Poplawski wrote:
> BTW, it seems a change in 2.6.31 might trigger these timeouts more
> often than before. Andrey, could you try if this matters here?
They appear once per 4 seconds with or without this patch.

-- 
WBR, wRAR (ALT Linux Team)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: via_rhine kernel crashes in 2.6.32
  2009-12-23 16:21                     ` Andrey Rahmatullin
@ 2009-12-23 16:30                       ` Jarek Poplawski
  0 siblings, 0 replies; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-23 16:30 UTC (permalink / raw)
  To: Andrey Rahmatullin; +Cc: Roger Luethi, Christian Kujau, LKML, netdev

Andrey Rahmatullin wrote, On 12/23/2009 05:21 PM:

> On Wed, Dec 23, 2009 at 09:52:03AM +0000, Jarek Poplawski wrote:
>> BTW, it seems a change in 2.6.31 might trigger these timeouts more
>> often than before. Andrey, could you try if this matters here?
> They appear once per 4 seconds with or without this patch.
> 

OK, thanks for checking this.

Jarek P.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net/via-rhine: Fix scheduling while atomic bugs
  2009-12-22 17:36                   ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
@ 2009-12-24  5:54                     ` David Miller
  0 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2009-12-24  5:54 UTC (permalink / raw)
  To: jarkao2; +Cc: rl, wrar, lists, linux-kernel, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 22 Dec 2009 18:36:42 +0100

> There are BUGs "scheduling while atomic" triggered by the timer
> rhine_tx_timeout(). They are caused by calling napi_disable() (with
> msleep()). This patch fixes it by moving most of the timer content to
> the workqueue function (similarly to other drivers, like tg3), with
> spin_lock() changed to BH version.
> 
> Additionally, there is spin_lock_irq() moved in rhine_close() to
> exclude napi_disable() etc., also tg3's way.
> 
> Reported-by: Andrey Rahmatullin <wrar@altlinux.org>
> Tested-by: Andrey Rahmatullin <wrar@altlinux.org>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

Applied, thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-12-24  5:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20091219111206.GA5307@wrars-comp.wrarsdomain>
     [not found] ` <20091220200343.GA5427@wrars-comp.wrarsdomain>
2009-12-21 12:03   ` via_rhine kernel crashes in 2.6.32 Christian Kujau
2009-12-21 12:36     ` Andrey Rahmatullin
2009-12-21 18:18     ` Andrey Rahmatullin
2009-12-21 19:32       ` Christian Kujau
2009-12-22 12:32         ` Jarek Poplawski
2009-12-22 13:21           ` Jarek Poplawski
2009-12-22 13:38             ` Jarek Poplawski
2009-12-22 15:00               ` Andrey Rahmatullin
2009-12-22 15:26                 ` Roger Luethi
2009-12-22 17:36                   ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
2009-12-24  5:54                     ` David Miller
2009-12-23  9:52                   ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
2009-12-23 16:21                     ` Andrey Rahmatullin
2009-12-23 16:30                       ` Jarek Poplawski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).