* Re: via_rhine kernel crashes in 2.6.32
[not found] ` <20091220200343.GA5427@wrars-comp.wrarsdomain>
@ 2009-12-21 12:03 ` Christian Kujau
2009-12-21 12:36 ` Andrey Rahmatullin
2009-12-21 18:18 ` Andrey Rahmatullin
0 siblings, 2 replies; 14+ messages in thread
From: Christian Kujau @ 2009-12-21 12:03 UTC (permalink / raw)
To: Andrey Rahmatullin; +Cc: LKML, Roger Luethi, netdev
[netdev Cc'ed]
On Mon, 21 Dec 2009 at 01:03, Andrey Rahmatullin wrote:
> Here are the proper screenshots from 2.6.33-rc1:
> http://wrar.name/temp/P1010724.JPG
> http://wrar.name/temp/P1010725.JPG
Wow, I had to rotate my laptop here. In you first posting [0] you said you
"upgraded to .32" but had crashes in .30 too. What was the last working
kernel?
In the .jpgs one can see something along the lines of
schedule_timeout, napi_disable, rhine_tx_timeout - the only patch I could
find touching these things [1] altogether is rather old, from 2.6.24. Is
2.6.29 working for you?
Christian.
[0] http://lkml.org/lkml/2009/12/19/36
[1] http://lists.openwall.net/netdev/2007/08/28/13
--
BOFH excuse #19:
floating point processor overflow
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32
2009-12-21 12:03 ` via_rhine kernel crashes in 2.6.32 Christian Kujau
@ 2009-12-21 12:36 ` Andrey Rahmatullin
2009-12-21 18:18 ` Andrey Rahmatullin
1 sibling, 0 replies; 14+ messages in thread
From: Andrey Rahmatullin @ 2009-12-21 12:36 UTC (permalink / raw)
To: Christian Kujau; +Cc: LKML, Roger Luethi, netdev
[-- Attachment #1: Type: text/plain, Size: 641 bytes --]
On Mon, Dec 21, 2009 at 04:03:06AM -0800, Christian Kujau wrote:
> Wow, I had to rotate my laptop here. In you first posting [0] you said you
> "upgraded to .32" but had crashes in .30 too. What was the last working
> kernel?
It was 2.6.31, but as I've already said I'm not sure it was because of
kernel version alone.
> In the .jpgs one can see something along the lines of
> schedule_timeout, napi_disable, rhine_tx_timeout - the only patch I could
> find touching these things [1] altogether is rather old, from 2.6.24. Is
> 2.6.29 working for you?
Well, I can try some old versions.
--
WBR, wRAR (ALT Linux Team)
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32
2009-12-21 12:03 ` via_rhine kernel crashes in 2.6.32 Christian Kujau
2009-12-21 12:36 ` Andrey Rahmatullin
@ 2009-12-21 18:18 ` Andrey Rahmatullin
2009-12-21 19:32 ` Christian Kujau
1 sibling, 1 reply; 14+ messages in thread
From: Andrey Rahmatullin @ 2009-12-21 18:18 UTC (permalink / raw)
To: Christian Kujau; +Cc: LKML, Roger Luethi, netdev
[-- Attachment #1: Type: text/plain, Size: 1057 bytes --]
On Mon, Dec 21, 2009 at 04:03:06AM -0800, Christian Kujau wrote:
> In the .jpgs one can see something along the lines of
> schedule_timeout, napi_disable, rhine_tx_timeout - the only patch I could
> find touching these things [1] altogether is rather old, from 2.6.24. Is
> 2.6.29 working for you?
I've installed 2.6.27, the earliest kernel supported by udev 149. I
started to download two large files to my second machine (maybe 700 or
1000 Kbyte/s combined). Nothing happened. I stopped the downloads and
started deluged. The kernel showed "netdev watchdog timeout", but nothing
else happened. deluged opened something like 100 TCP connections and
started to upload some data at ~20 Kbyte/s. Nothing happened. I resumed
one of downloads, waited for some time, nothing happened. I resumed the
second download, the kernel crashed into an endless stream of backtraces
(did 2.6.27 support pause_on_oops?), containing "whatever from the idle
thread", or smth like that, which was also in other crash logs.
--
WBR, wRAR (ALT Linux Team)
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32
2009-12-21 18:18 ` Andrey Rahmatullin
@ 2009-12-21 19:32 ` Christian Kujau
2009-12-22 12:32 ` Jarek Poplawski
0 siblings, 1 reply; 14+ messages in thread
From: Christian Kujau @ 2009-12-21 19:32 UTC (permalink / raw)
To: Andrey Rahmatullin; +Cc: LKML, Roger Luethi, netdev
On Mon, 21 Dec 2009 at 23:18, Andrey Rahmatullin wrote:
> I've installed 2.6.27, the earliest kernel supported by udev 149. I
[...]
> one of downloads, waited for some time, nothing happened. I resumed the
> second download, the kernel crashed into an endless stream of backtraces
> (did 2.6.27 support pause_on_oops?), containing "whatever from the idle
> thread", or smth like that, which was also in other crash logs.
So, 2.6.27 crashed as well. Was the backtrace similar to those on 2.6.30?
I know it's a long shot, but since you seem to be able to reproduce this
pretty reliably, can you try 2.6.23? Or at least something before
bea3348eef27e6044b6161fd04c3152215f96411 [0]?
Christian.
[0] I'm *really* guessing here, if some netdev guru has some better
understanding of the backtraces Andrey sent, please step forward.
--
BOFH excuse #433:
error: one bad user found in front of screen
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32
2009-12-21 19:32 ` Christian Kujau
@ 2009-12-22 12:32 ` Jarek Poplawski
2009-12-22 13:21 ` Jarek Poplawski
0 siblings, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-22 12:32 UTC (permalink / raw)
To: Christian Kujau; +Cc: Andrey Rahmatullin, LKML, Roger Luethi, netdev
On 21-12-2009 20:32, Christian Kujau wrote:
> On Mon, 21 Dec 2009 at 23:18, Andrey Rahmatullin wrote:
>> I've installed 2.6.27, the earliest kernel supported by udev 149. I
> [...]
>> one of downloads, waited for some time, nothing happened. I resumed the
>> second download, the kernel crashed into an endless stream of backtraces
>> (did 2.6.27 support pause_on_oops?), containing "whatever from the idle
>> thread", or smth like that, which was also in other crash logs.
>
> So, 2.6.27 crashed as well. Was the backtrace similar to those on 2.6.30?
> I know it's a long shot, but since you seem to be able to reproduce this
> pretty reliably, can you try 2.6.23? Or at least something before
> bea3348eef27e6044b6161fd04c3152215f96411 [0]?
>
> Christian.
>
> [0] I'm *really* guessing here, if some netdev guru has some better
> understanding of the backtraces Andrey sent, please step forward.
It looks like napi_disable() should be illegal in ndo_tx_timeout().
Here is a patch which moves most of the timeout work to a workqueue,
similarly to tg3 etc. It should prevent at least one of reported
bugs. Alas I can't even check-compile it at the moment, so let me
know on any problems.
Jarek P.
---
drivers/net/via-rhine.c | 41 ++++++++++++++++++++++++++++-------------
1 files changed, 28 insertions(+), 13 deletions(-)
diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 593e01f..125406b 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32;
#include <linux/ethtool.h>
#include <linux/crc32.h>
#include <linux/bitops.h>
+#include <linux/workqueue.h>
#include <asm/processor.h> /* Processor type for cache alignment. */
#include <asm/io.h>
#include <asm/irq.h>
@@ -389,6 +390,7 @@ struct rhine_private {
struct net_device *dev;
struct napi_struct napi;
spinlock_t lock;
+ struct work_struct reset_task;
/* Frequently used values: keep some adjacent for cache effect. */
u32 quirks;
@@ -407,6 +409,7 @@ struct rhine_private {
static int mdio_read(struct net_device *dev, int phy_id, int location);
static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
static int rhine_open(struct net_device *dev);
+static void rhine_reset_task(struct work_struct *work);
static void rhine_tx_timeout(struct net_device *dev);
static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
struct net_device *dev);
@@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev,
dev->irq = pdev->irq;
spin_lock_init(&rp->lock);
+ INIT_WORK(&rp->reset_task, rhine_reset_task);
+
rp->mii_if.dev = dev;
rp->mii_if.mdio_read = mdio_read;
rp->mii_if.mdio_write = mdio_write;
@@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev)
return 0;
}
-static void rhine_tx_timeout(struct net_device *dev)
+static void rhine_reset_task(struct work_struct *work)
{
- struct rhine_private *rp = netdev_priv(dev);
- void __iomem *ioaddr = rp->base;
-
- printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
- "%4.4x, resetting...\n",
- dev->name, ioread16(ioaddr + IntrStatus),
- mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+ struct rhine_private *rp = container_of(work, struct rhine_private,
+ reset_task);
+ struct net_device *dev = rp->dev;
/* protect against concurrent rx interrupts */
disable_irq(rp->pdev->irq);
napi_disable(&rp->napi);
- spin_lock(&rp->lock);
+ spin_lock_irq(&rp->lock);
/* clear all descriptors */
free_tbufs(dev);
@@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev)
rhine_chip_reset(dev);
init_registers(dev);
- spin_unlock(&rp->lock);
+ spin_unlock_irq(&rp->lock);
enable_irq(rp->pdev->irq);
dev->trans_start = jiffies;
@@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev)
netif_wake_queue(dev);
}
+static void rhine_tx_timeout(struct net_device *dev)
+{
+ struct rhine_private *rp = netdev_priv(dev);
+ void __iomem *ioaddr = rp->base;
+
+ printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
+ "%4.4x, resetting...\n",
+ dev->name, ioread16(ioaddr + IntrStatus),
+ mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+
+ schedule_work(&rp->reset_task);
+}
+
static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
struct net_device *dev)
{
@@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev)
struct rhine_private *rp = netdev_priv(dev);
void __iomem *ioaddr = rp->base;
- spin_lock_irq(&rp->lock);
-
- netif_stop_queue(dev);
napi_disable(&rp->napi);
+ cancel_work_sync(&rp->reset_task);
+ netif_stop_queue(dev);
+
+ spin_lock_irq(&rp->lock);
if (debug > 1)
printk(KERN_DEBUG "%s: Shutting down ethercard, "
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32
2009-12-22 12:32 ` Jarek Poplawski
@ 2009-12-22 13:21 ` Jarek Poplawski
2009-12-22 13:38 ` Jarek Poplawski
0 siblings, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-22 13:21 UTC (permalink / raw)
To: Christian Kujau; +Cc: Andrey Rahmatullin, LKML, Roger Luethi, netdev
On Tue, Dec 22, 2009 at 12:32:11PM +0000, Jarek Poplawski wrote:
> It looks like napi_disable() should be illegal in ndo_tx_timeout().
> Here is a patch which moves most of the timeout work to a workqueue,
> similarly to tg3 etc. It should prevent at least one of reported
> bugs. Alas I can't even check-compile it at the moment, so let me
> know on any problems.
It seems I needlessly changed locking btw, so here it is again.
Jarek P.
--- (take 2)
drivers/net/via-rhine.c | 37 ++++++++++++++++++++++++++-----------
1 files changed, 26 insertions(+), 11 deletions(-)
diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 593e01f..15a4063 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32;
#include <linux/ethtool.h>
#include <linux/crc32.h>
#include <linux/bitops.h>
+#include <linux/workqueue.h>
#include <asm/processor.h> /* Processor type for cache alignment. */
#include <asm/io.h>
#include <asm/irq.h>
@@ -389,6 +390,7 @@ struct rhine_private {
struct net_device *dev;
struct napi_struct napi;
spinlock_t lock;
+ struct work_struct reset_task;
/* Frequently used values: keep some adjacent for cache effect. */
u32 quirks;
@@ -407,6 +409,7 @@ struct rhine_private {
static int mdio_read(struct net_device *dev, int phy_id, int location);
static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
static int rhine_open(struct net_device *dev);
+static void rhine_reset_task(struct work_struct *work);
static void rhine_tx_timeout(struct net_device *dev);
static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
struct net_device *dev);
@@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev,
dev->irq = pdev->irq;
spin_lock_init(&rp->lock);
+ INIT_WORK(&rp->reset_task, rhine_reset_task);
+
rp->mii_if.dev = dev;
rp->mii_if.mdio_read = mdio_read;
rp->mii_if.mdio_write = mdio_write;
@@ -1179,15 +1184,11 @@ static int rhine_open(struct net_device *dev)
return 0;
}
-static void rhine_tx_timeout(struct net_device *dev)
+static void rhine_reset_task(struct work_struct *work)
{
- struct rhine_private *rp = netdev_priv(dev);
- void __iomem *ioaddr = rp->base;
-
- printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
- "%4.4x, resetting...\n",
- dev->name, ioread16(ioaddr + IntrStatus),
- mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+ struct rhine_private *rp = container_of(work, struct rhine_private,
+ reset_task);
+ struct net_device *dev = rp->dev;
/* protect against concurrent rx interrupts */
disable_irq(rp->pdev->irq);
@@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev)
netif_wake_queue(dev);
}
+static void rhine_tx_timeout(struct net_device *dev)
+{
+ struct rhine_private *rp = netdev_priv(dev);
+ void __iomem *ioaddr = rp->base;
+
+ printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
+ "%4.4x, resetting...\n",
+ dev->name, ioread16(ioaddr + IntrStatus),
+ mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+
+ schedule_work(&rp->reset_task);
+}
+
static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
struct net_device *dev)
{
@@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev)
struct rhine_private *rp = netdev_priv(dev);
void __iomem *ioaddr = rp->base;
- spin_lock_irq(&rp->lock);
-
- netif_stop_queue(dev);
napi_disable(&rp->napi);
+ cancel_work_sync(&rp->reset_task);
+ netif_stop_queue(dev);
+
+ spin_lock_irq(&rp->lock);
if (debug > 1)
printk(KERN_DEBUG "%s: Shutting down ethercard, "
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32
2009-12-22 13:21 ` Jarek Poplawski
@ 2009-12-22 13:38 ` Jarek Poplawski
2009-12-22 15:00 ` Andrey Rahmatullin
0 siblings, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-22 13:38 UTC (permalink / raw)
To: Christian Kujau; +Cc: Andrey Rahmatullin, LKML, Roger Luethi, netdev
On Tue, Dec 22, 2009 at 01:21:07PM +0000, Jarek Poplawski wrote:
> On Tue, Dec 22, 2009 at 12:32:11PM +0000, Jarek Poplawski wrote:
> > It looks like napi_disable() should be illegal in ndo_tx_timeout().
> > Here is a patch which moves most of the timeout work to a workqueue,
> > similarly to tg3 etc. It should prevent at least one of reported
> > bugs. Alas I can't even check-compile it at the moment, so let me
> > know on any problems.
>
> It seems I needlessly changed locking btw, so here it is again.
Hmm... On the other hand, it definitely needs at least _bh now...
Sorry,
Jarek P.
--- (take 3)
drivers/net/via-rhine.c | 41 ++++++++++++++++++++++++++++-------------
1 files changed, 28 insertions(+), 13 deletions(-)
diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 593e01f..125406b 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32;
#include <linux/ethtool.h>
#include <linux/crc32.h>
#include <linux/bitops.h>
+#include <linux/workqueue.h>
#include <asm/processor.h> /* Processor type for cache alignment. */
#include <asm/io.h>
#include <asm/irq.h>
@@ -389,6 +390,7 @@ struct rhine_private {
struct net_device *dev;
struct napi_struct napi;
spinlock_t lock;
+ struct work_struct reset_task;
/* Frequently used values: keep some adjacent for cache effect. */
u32 quirks;
@@ -407,6 +409,7 @@ struct rhine_private {
static int mdio_read(struct net_device *dev, int phy_id, int location);
static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
static int rhine_open(struct net_device *dev);
+static void rhine_reset_task(struct work_struct *work);
static void rhine_tx_timeout(struct net_device *dev);
static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
struct net_device *dev);
@@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev,
dev->irq = pdev->irq;
spin_lock_init(&rp->lock);
+ INIT_WORK(&rp->reset_task, rhine_reset_task);
+
rp->mii_if.dev = dev;
rp->mii_if.mdio_read = mdio_read;
rp->mii_if.mdio_write = mdio_write;
@@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev)
return 0;
}
-static void rhine_tx_timeout(struct net_device *dev)
+static void rhine_reset_task(struct work_struct *work)
{
- struct rhine_private *rp = netdev_priv(dev);
- void __iomem *ioaddr = rp->base;
-
- printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
- "%4.4x, resetting...\n",
- dev->name, ioread16(ioaddr + IntrStatus),
- mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+ struct rhine_private *rp = container_of(work, struct rhine_private,
+ reset_task);
+ struct net_device *dev = rp->dev;
/* protect against concurrent rx interrupts */
disable_irq(rp->pdev->irq);
napi_disable(&rp->napi);
- spin_lock(&rp->lock);
+ spin_lock_bh(&rp->lock);
/* clear all descriptors */
free_tbufs(dev);
@@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev)
rhine_chip_reset(dev);
init_registers(dev);
- spin_unlock(&rp->lock);
+ spin_unlock_bh(&rp->lock);
enable_irq(rp->pdev->irq);
dev->trans_start = jiffies;
@@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev)
netif_wake_queue(dev);
}
+static void rhine_tx_timeout(struct net_device *dev)
+{
+ struct rhine_private *rp = netdev_priv(dev);
+ void __iomem *ioaddr = rp->base;
+
+ printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
+ "%4.4x, resetting...\n",
+ dev->name, ioread16(ioaddr + IntrStatus),
+ mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+
+ schedule_work(&rp->reset_task);
+}
+
static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
struct net_device *dev)
{
@@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev)
struct rhine_private *rp = netdev_priv(dev);
void __iomem *ioaddr = rp->base;
- spin_lock_irq(&rp->lock);
-
- netif_stop_queue(dev);
napi_disable(&rp->napi);
+ cancel_work_sync(&rp->reset_task);
+ netif_stop_queue(dev);
+
+ spin_lock_irq(&rp->lock);
if (debug > 1)
printk(KERN_DEBUG "%s: Shutting down ethercard, "
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32
2009-12-22 13:38 ` Jarek Poplawski
@ 2009-12-22 15:00 ` Andrey Rahmatullin
2009-12-22 15:26 ` Roger Luethi
0 siblings, 1 reply; 14+ messages in thread
From: Andrey Rahmatullin @ 2009-12-22 15:00 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Christian Kujau, LKML, Roger Luethi, netdev
[-- Attachment #1: Type: text/plain, Size: 645 bytes --]
On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote:
> > > It looks like napi_disable() should be illegal in ndo_tx_timeout().
> > > Here is a patch which moves most of the timeout work to a workqueue,
> > > similarly to tg3 etc. It should prevent at least one of reported
> > > bugs. Alas I can't even check-compile it at the moment, so let me
> > > know on any problems.
> > It seems I needlessly changed locking btw, so here it is again.
> Hmm... On the other hand, it definitely needs at least _bh now...
I've tried this patch. There are lots of "Transmit timed out", but no
crashes.
--
WBR, wRAR (ALT Linux Team)
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32
2009-12-22 15:00 ` Andrey Rahmatullin
@ 2009-12-22 15:26 ` Roger Luethi
2009-12-22 17:36 ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
2009-12-23 9:52 ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
0 siblings, 2 replies; 14+ messages in thread
From: Roger Luethi @ 2009-12-22 15:26 UTC (permalink / raw)
To: Andrey Rahmatullin; +Cc: Jarek Poplawski, Christian Kujau, LKML, netdev
On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote:
> On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote:
> > > > It looks like napi_disable() should be illegal in ndo_tx_timeout().
> > > > Here is a patch which moves most of the timeout work to a workqueue,
> > > > similarly to tg3 etc. It should prevent at least one of reported
> > > > bugs. Alas I can't even check-compile it at the moment, so let me
> > > > know on any problems.
> > > It seems I needlessly changed locking btw, so here it is again.
> > Hmm... On the other hand, it definitely needs at least _bh now...
> I've tried this patch. There are lots of "Transmit timed out", but no
> crashes.
ACK. Looks like you guys tracked down the crashing and fixed it (thanks!).
I suspect we shouldn't have to reset due to timeouts that often, but that's
another story.
Roger
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] net/via-rhine: Fix scheduling while atomic bugs
2009-12-22 15:26 ` Roger Luethi
@ 2009-12-22 17:36 ` Jarek Poplawski
2009-12-24 5:54 ` David Miller
2009-12-23 9:52 ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
1 sibling, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-22 17:36 UTC (permalink / raw)
To: David Miller
Cc: Roger Luethi, Andrey Rahmatullin, Christian Kujau, LKML, netdev
On Tue, Dec 22, 2009 at 04:26:59PM +0100, Roger Luethi wrote:
> On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote:
> > On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote:
> > > > > It looks like napi_disable() should be illegal in ndo_tx_timeout().
> > > > > Here is a patch which moves most of the timeout work to a workqueue,
> > > > > similarly to tg3 etc. It should prevent at least one of reported
> > > > > bugs. Alas I can't even check-compile it at the moment, so let me
> > > > > know on any problems.
> > > > It seems I needlessly changed locking btw, so here it is again.
> > > Hmm... On the other hand, it definitely needs at least _bh now...
> > I've tried this patch. There are lots of "Transmit timed out", but no
> > crashes.
>
> ACK. Looks like you guys tracked down the crashing and fixed it (thanks!).
> I suspect we shouldn't have to reset due to timeouts that often, but that's
> another story.
>
> Roger
Thanks everybody,
Jarek P.
------------------->
There are BUGs "scheduling while atomic" triggered by the timer
rhine_tx_timeout(). They are caused by calling napi_disable() (with
msleep()). This patch fixes it by moving most of the timer content to
the workqueue function (similarly to other drivers, like tg3), with
spin_lock() changed to BH version.
Additionally, there is spin_lock_irq() moved in rhine_close() to
exclude napi_disable() etc., also tg3's way.
Reported-by: Andrey Rahmatullin <wrar@altlinux.org>
Tested-by: Andrey Rahmatullin <wrar@altlinux.org>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Roger Luethi <rl@hellgate.ch>
---
drivers/net/via-rhine.c | 41 ++++++++++++++++++++++++++++-------------
1 files changed, 28 insertions(+), 13 deletions(-)
diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 593e01f..125406b 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32;
#include <linux/ethtool.h>
#include <linux/crc32.h>
#include <linux/bitops.h>
+#include <linux/workqueue.h>
#include <asm/processor.h> /* Processor type for cache alignment. */
#include <asm/io.h>
#include <asm/irq.h>
@@ -389,6 +390,7 @@ struct rhine_private {
struct net_device *dev;
struct napi_struct napi;
spinlock_t lock;
+ struct work_struct reset_task;
/* Frequently used values: keep some adjacent for cache effect. */
u32 quirks;
@@ -407,6 +409,7 @@ struct rhine_private {
static int mdio_read(struct net_device *dev, int phy_id, int location);
static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
static int rhine_open(struct net_device *dev);
+static void rhine_reset_task(struct work_struct *work);
static void rhine_tx_timeout(struct net_device *dev);
static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
struct net_device *dev);
@@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev,
dev->irq = pdev->irq;
spin_lock_init(&rp->lock);
+ INIT_WORK(&rp->reset_task, rhine_reset_task);
+
rp->mii_if.dev = dev;
rp->mii_if.mdio_read = mdio_read;
rp->mii_if.mdio_write = mdio_write;
@@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev)
return 0;
}
-static void rhine_tx_timeout(struct net_device *dev)
+static void rhine_reset_task(struct work_struct *work)
{
- struct rhine_private *rp = netdev_priv(dev);
- void __iomem *ioaddr = rp->base;
-
- printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
- "%4.4x, resetting...\n",
- dev->name, ioread16(ioaddr + IntrStatus),
- mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+ struct rhine_private *rp = container_of(work, struct rhine_private,
+ reset_task);
+ struct net_device *dev = rp->dev;
/* protect against concurrent rx interrupts */
disable_irq(rp->pdev->irq);
napi_disable(&rp->napi);
- spin_lock(&rp->lock);
+ spin_lock_bh(&rp->lock);
/* clear all descriptors */
free_tbufs(dev);
@@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev)
rhine_chip_reset(dev);
init_registers(dev);
- spin_unlock(&rp->lock);
+ spin_unlock_bh(&rp->lock);
enable_irq(rp->pdev->irq);
dev->trans_start = jiffies;
@@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev)
netif_wake_queue(dev);
}
+static void rhine_tx_timeout(struct net_device *dev)
+{
+ struct rhine_private *rp = netdev_priv(dev);
+ void __iomem *ioaddr = rp->base;
+
+ printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status "
+ "%4.4x, resetting...\n",
+ dev->name, ioread16(ioaddr + IntrStatus),
+ mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
+
+ schedule_work(&rp->reset_task);
+}
+
static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
struct net_device *dev)
{
@@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev)
struct rhine_private *rp = netdev_priv(dev);
void __iomem *ioaddr = rp->base;
- spin_lock_irq(&rp->lock);
-
- netif_stop_queue(dev);
napi_disable(&rp->napi);
+ cancel_work_sync(&rp->reset_task);
+ netif_stop_queue(dev);
+
+ spin_lock_irq(&rp->lock);
if (debug > 1)
printk(KERN_DEBUG "%s: Shutting down ethercard, "
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32
2009-12-22 15:26 ` Roger Luethi
2009-12-22 17:36 ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
@ 2009-12-23 9:52 ` Jarek Poplawski
2009-12-23 16:21 ` Andrey Rahmatullin
1 sibling, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-23 9:52 UTC (permalink / raw)
To: Roger Luethi; +Cc: Andrey Rahmatullin, Christian Kujau, LKML, netdev
On 22-12-2009 16:26, Roger Luethi wrote:
> On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote:
>> On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote:
>>>>> It looks like napi_disable() should be illegal in ndo_tx_timeout().
>>>>> Here is a patch which moves most of the timeout work to a workqueue,
>>>>> similarly to tg3 etc. It should prevent at least one of reported
>>>>> bugs. Alas I can't even check-compile it at the moment, so let me
>>>>> know on any problems.
>>>> It seems I needlessly changed locking btw, so here it is again.
>>> Hmm... On the other hand, it definitely needs at least _bh now...
>> I've tried this patch. There are lots of "Transmit timed out", but no
>> crashes.
>
> ACK. Looks like you guys tracked down the crashing and fixed it (thanks!).
> I suspect we shouldn't have to reset due to timeouts that often, but that's
> another story.
BTW, it seems a change in 2.6.31 might trigger these timeouts more
often than before. Andrey, could you try if this matters here?
Thanks,
Jarek P.
--- (on top of net-2.6 with the previous "Fix scheduling..." patch)
diff -Nurp a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
--- a/drivers/net/via-rhine.c 2009-12-23 09:28:25.000000000 +0000
+++ b/drivers/net/via-rhine.c 2009-12-23 09:33:57.000000000 +0000
@@ -1226,6 +1226,7 @@ static void rhine_tx_timeout(struct net_
mdio_read(dev, rp->mii_if.phy_id, MII_BMSR));
schedule_work(&rp->reset_task);
+ netdev_get_tx_queue(dev, 0)->trans_start = jiffies;
}
static netdev_tx_t rhine_start_tx(struct sk_buff *skb,
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32
2009-12-23 9:52 ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
@ 2009-12-23 16:21 ` Andrey Rahmatullin
2009-12-23 16:30 ` Jarek Poplawski
0 siblings, 1 reply; 14+ messages in thread
From: Andrey Rahmatullin @ 2009-12-23 16:21 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Roger Luethi, Christian Kujau, LKML, netdev
On Wed, Dec 23, 2009 at 09:52:03AM +0000, Jarek Poplawski wrote:
> BTW, it seems a change in 2.6.31 might trigger these timeouts more
> often than before. Andrey, could you try if this matters here?
They appear once per 4 seconds with or without this patch.
--
WBR, wRAR (ALT Linux Team)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32
2009-12-23 16:21 ` Andrey Rahmatullin
@ 2009-12-23 16:30 ` Jarek Poplawski
0 siblings, 0 replies; 14+ messages in thread
From: Jarek Poplawski @ 2009-12-23 16:30 UTC (permalink / raw)
To: Andrey Rahmatullin; +Cc: Roger Luethi, Christian Kujau, LKML, netdev
Andrey Rahmatullin wrote, On 12/23/2009 05:21 PM:
> On Wed, Dec 23, 2009 at 09:52:03AM +0000, Jarek Poplawski wrote:
>> BTW, it seems a change in 2.6.31 might trigger these timeouts more
>> often than before. Andrey, could you try if this matters here?
> They appear once per 4 seconds with or without this patch.
>
OK, thanks for checking this.
Jarek P.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] net/via-rhine: Fix scheduling while atomic bugs
2009-12-22 17:36 ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
@ 2009-12-24 5:54 ` David Miller
0 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2009-12-24 5:54 UTC (permalink / raw)
To: jarkao2; +Cc: rl, wrar, lists, linux-kernel, netdev
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 22 Dec 2009 18:36:42 +0100
> There are BUGs "scheduling while atomic" triggered by the timer
> rhine_tx_timeout(). They are caused by calling napi_disable() (with
> msleep()). This patch fixes it by moving most of the timer content to
> the workqueue function (similarly to other drivers, like tg3), with
> spin_lock() changed to BH version.
>
> Additionally, there is spin_lock_irq() moved in rhine_close() to
> exclude napi_disable() etc., also tg3's way.
>
> Reported-by: Andrey Rahmatullin <wrar@altlinux.org>
> Tested-by: Andrey Rahmatullin <wrar@altlinux.org>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Applied, thanks!
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2009-12-24 5:54 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20091219111206.GA5307@wrars-comp.wrarsdomain>
[not found] ` <20091220200343.GA5427@wrars-comp.wrarsdomain>
2009-12-21 12:03 ` via_rhine kernel crashes in 2.6.32 Christian Kujau
2009-12-21 12:36 ` Andrey Rahmatullin
2009-12-21 18:18 ` Andrey Rahmatullin
2009-12-21 19:32 ` Christian Kujau
2009-12-22 12:32 ` Jarek Poplawski
2009-12-22 13:21 ` Jarek Poplawski
2009-12-22 13:38 ` Jarek Poplawski
2009-12-22 15:00 ` Andrey Rahmatullin
2009-12-22 15:26 ` Roger Luethi
2009-12-22 17:36 ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
2009-12-24 5:54 ` David Miller
2009-12-23 9:52 ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
2009-12-23 16:21 ` Andrey Rahmatullin
2009-12-23 16:30 ` Jarek Poplawski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).