* Re: via_rhine kernel crashes in 2.6.32 [not found] ` <20091220200343.GA5427@wrars-comp.wrarsdomain> @ 2009-12-21 12:03 ` Christian Kujau 2009-12-21 12:36 ` Andrey Rahmatullin 2009-12-21 18:18 ` Andrey Rahmatullin 0 siblings, 2 replies; 14+ messages in thread From: Christian Kujau @ 2009-12-21 12:03 UTC (permalink / raw) To: Andrey Rahmatullin; +Cc: LKML, Roger Luethi, netdev [netdev Cc'ed] On Mon, 21 Dec 2009 at 01:03, Andrey Rahmatullin wrote: > Here are the proper screenshots from 2.6.33-rc1: > http://wrar.name/temp/P1010724.JPG > http://wrar.name/temp/P1010725.JPG Wow, I had to rotate my laptop here. In you first posting [0] you said you "upgraded to .32" but had crashes in .30 too. What was the last working kernel? In the .jpgs one can see something along the lines of schedule_timeout, napi_disable, rhine_tx_timeout - the only patch I could find touching these things [1] altogether is rather old, from 2.6.24. Is 2.6.29 working for you? Christian. [0] http://lkml.org/lkml/2009/12/19/36 [1] http://lists.openwall.net/netdev/2007/08/28/13 -- BOFH excuse #19: floating point processor overflow ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32 2009-12-21 12:03 ` via_rhine kernel crashes in 2.6.32 Christian Kujau @ 2009-12-21 12:36 ` Andrey Rahmatullin 2009-12-21 18:18 ` Andrey Rahmatullin 1 sibling, 0 replies; 14+ messages in thread From: Andrey Rahmatullin @ 2009-12-21 12:36 UTC (permalink / raw) To: Christian Kujau; +Cc: LKML, Roger Luethi, netdev [-- Attachment #1: Type: text/plain, Size: 641 bytes --] On Mon, Dec 21, 2009 at 04:03:06AM -0800, Christian Kujau wrote: > Wow, I had to rotate my laptop here. In you first posting [0] you said you > "upgraded to .32" but had crashes in .30 too. What was the last working > kernel? It was 2.6.31, but as I've already said I'm not sure it was because of kernel version alone. > In the .jpgs one can see something along the lines of > schedule_timeout, napi_disable, rhine_tx_timeout - the only patch I could > find touching these things [1] altogether is rather old, from 2.6.24. Is > 2.6.29 working for you? Well, I can try some old versions. -- WBR, wRAR (ALT Linux Team) [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 490 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32 2009-12-21 12:03 ` via_rhine kernel crashes in 2.6.32 Christian Kujau 2009-12-21 12:36 ` Andrey Rahmatullin @ 2009-12-21 18:18 ` Andrey Rahmatullin 2009-12-21 19:32 ` Christian Kujau 1 sibling, 1 reply; 14+ messages in thread From: Andrey Rahmatullin @ 2009-12-21 18:18 UTC (permalink / raw) To: Christian Kujau; +Cc: LKML, Roger Luethi, netdev [-- Attachment #1: Type: text/plain, Size: 1057 bytes --] On Mon, Dec 21, 2009 at 04:03:06AM -0800, Christian Kujau wrote: > In the .jpgs one can see something along the lines of > schedule_timeout, napi_disable, rhine_tx_timeout - the only patch I could > find touching these things [1] altogether is rather old, from 2.6.24. Is > 2.6.29 working for you? I've installed 2.6.27, the earliest kernel supported by udev 149. I started to download two large files to my second machine (maybe 700 or 1000 Kbyte/s combined). Nothing happened. I stopped the downloads and started deluged. The kernel showed "netdev watchdog timeout", but nothing else happened. deluged opened something like 100 TCP connections and started to upload some data at ~20 Kbyte/s. Nothing happened. I resumed one of downloads, waited for some time, nothing happened. I resumed the second download, the kernel crashed into an endless stream of backtraces (did 2.6.27 support pause_on_oops?), containing "whatever from the idle thread", or smth like that, which was also in other crash logs. -- WBR, wRAR (ALT Linux Team) [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 490 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32 2009-12-21 18:18 ` Andrey Rahmatullin @ 2009-12-21 19:32 ` Christian Kujau 2009-12-22 12:32 ` Jarek Poplawski 0 siblings, 1 reply; 14+ messages in thread From: Christian Kujau @ 2009-12-21 19:32 UTC (permalink / raw) To: Andrey Rahmatullin; +Cc: LKML, Roger Luethi, netdev On Mon, 21 Dec 2009 at 23:18, Andrey Rahmatullin wrote: > I've installed 2.6.27, the earliest kernel supported by udev 149. I [...] > one of downloads, waited for some time, nothing happened. I resumed the > second download, the kernel crashed into an endless stream of backtraces > (did 2.6.27 support pause_on_oops?), containing "whatever from the idle > thread", or smth like that, which was also in other crash logs. So, 2.6.27 crashed as well. Was the backtrace similar to those on 2.6.30? I know it's a long shot, but since you seem to be able to reproduce this pretty reliably, can you try 2.6.23? Or at least something before bea3348eef27e6044b6161fd04c3152215f96411 [0]? Christian. [0] I'm *really* guessing here, if some netdev guru has some better understanding of the backtraces Andrey sent, please step forward. -- BOFH excuse #433: error: one bad user found in front of screen ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32 2009-12-21 19:32 ` Christian Kujau @ 2009-12-22 12:32 ` Jarek Poplawski 2009-12-22 13:21 ` Jarek Poplawski 0 siblings, 1 reply; 14+ messages in thread From: Jarek Poplawski @ 2009-12-22 12:32 UTC (permalink / raw) To: Christian Kujau; +Cc: Andrey Rahmatullin, LKML, Roger Luethi, netdev On 21-12-2009 20:32, Christian Kujau wrote: > On Mon, 21 Dec 2009 at 23:18, Andrey Rahmatullin wrote: >> I've installed 2.6.27, the earliest kernel supported by udev 149. I > [...] >> one of downloads, waited for some time, nothing happened. I resumed the >> second download, the kernel crashed into an endless stream of backtraces >> (did 2.6.27 support pause_on_oops?), containing "whatever from the idle >> thread", or smth like that, which was also in other crash logs. > > So, 2.6.27 crashed as well. Was the backtrace similar to those on 2.6.30? > I know it's a long shot, but since you seem to be able to reproduce this > pretty reliably, can you try 2.6.23? Or at least something before > bea3348eef27e6044b6161fd04c3152215f96411 [0]? > > Christian. > > [0] I'm *really* guessing here, if some netdev guru has some better > understanding of the backtraces Andrey sent, please step forward. It looks like napi_disable() should be illegal in ndo_tx_timeout(). Here is a patch which moves most of the timeout work to a workqueue, similarly to tg3 etc. It should prevent at least one of reported bugs. Alas I can't even check-compile it at the moment, so let me know on any problems. Jarek P. --- drivers/net/via-rhine.c | 41 ++++++++++++++++++++++++++++------------- 1 files changed, 28 insertions(+), 13 deletions(-) diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c index 593e01f..125406b 100644 --- a/drivers/net/via-rhine.c +++ b/drivers/net/via-rhine.c @@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32; #include <linux/ethtool.h> #include <linux/crc32.h> #include <linux/bitops.h> +#include <linux/workqueue.h> #include <asm/processor.h> /* Processor type for cache alignment. */ #include <asm/io.h> #include <asm/irq.h> @@ -389,6 +390,7 @@ struct rhine_private { struct net_device *dev; struct napi_struct napi; spinlock_t lock; + struct work_struct reset_task; /* Frequently used values: keep some adjacent for cache effect. */ u32 quirks; @@ -407,6 +409,7 @@ struct rhine_private { static int mdio_read(struct net_device *dev, int phy_id, int location); static void mdio_write(struct net_device *dev, int phy_id, int location, int value); static int rhine_open(struct net_device *dev); +static void rhine_reset_task(struct work_struct *work); static void rhine_tx_timeout(struct net_device *dev); static netdev_tx_t rhine_start_tx(struct sk_buff *skb, struct net_device *dev); @@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev, dev->irq = pdev->irq; spin_lock_init(&rp->lock); + INIT_WORK(&rp->reset_task, rhine_reset_task); + rp->mii_if.dev = dev; rp->mii_if.mdio_read = mdio_read; rp->mii_if.mdio_write = mdio_write; @@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev) return 0; } -static void rhine_tx_timeout(struct net_device *dev) +static void rhine_reset_task(struct work_struct *work) { - struct rhine_private *rp = netdev_priv(dev); - void __iomem *ioaddr = rp->base; - - printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status " - "%4.4x, resetting...\n", - dev->name, ioread16(ioaddr + IntrStatus), - mdio_read(dev, rp->mii_if.phy_id, MII_BMSR)); + struct rhine_private *rp = container_of(work, struct rhine_private, + reset_task); + struct net_device *dev = rp->dev; /* protect against concurrent rx interrupts */ disable_irq(rp->pdev->irq); napi_disable(&rp->napi); - spin_lock(&rp->lock); + spin_lock_irq(&rp->lock); /* clear all descriptors */ free_tbufs(dev); @@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev) rhine_chip_reset(dev); init_registers(dev); - spin_unlock(&rp->lock); + spin_unlock_irq(&rp->lock); enable_irq(rp->pdev->irq); dev->trans_start = jiffies; @@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev) netif_wake_queue(dev); } +static void rhine_tx_timeout(struct net_device *dev) +{ + struct rhine_private *rp = netdev_priv(dev); + void __iomem *ioaddr = rp->base; + + printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status " + "%4.4x, resetting...\n", + dev->name, ioread16(ioaddr + IntrStatus), + mdio_read(dev, rp->mii_if.phy_id, MII_BMSR)); + + schedule_work(&rp->reset_task); +} + static netdev_tx_t rhine_start_tx(struct sk_buff *skb, struct net_device *dev) { @@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev) struct rhine_private *rp = netdev_priv(dev); void __iomem *ioaddr = rp->base; - spin_lock_irq(&rp->lock); - - netif_stop_queue(dev); napi_disable(&rp->napi); + cancel_work_sync(&rp->reset_task); + netif_stop_queue(dev); + + spin_lock_irq(&rp->lock); if (debug > 1) printk(KERN_DEBUG "%s: Shutting down ethercard, " ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32 2009-12-22 12:32 ` Jarek Poplawski @ 2009-12-22 13:21 ` Jarek Poplawski 2009-12-22 13:38 ` Jarek Poplawski 0 siblings, 1 reply; 14+ messages in thread From: Jarek Poplawski @ 2009-12-22 13:21 UTC (permalink / raw) To: Christian Kujau; +Cc: Andrey Rahmatullin, LKML, Roger Luethi, netdev On Tue, Dec 22, 2009 at 12:32:11PM +0000, Jarek Poplawski wrote: > It looks like napi_disable() should be illegal in ndo_tx_timeout(). > Here is a patch which moves most of the timeout work to a workqueue, > similarly to tg3 etc. It should prevent at least one of reported > bugs. Alas I can't even check-compile it at the moment, so let me > know on any problems. It seems I needlessly changed locking btw, so here it is again. Jarek P. --- (take 2) drivers/net/via-rhine.c | 37 ++++++++++++++++++++++++++----------- 1 files changed, 26 insertions(+), 11 deletions(-) diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c index 593e01f..15a4063 100644 --- a/drivers/net/via-rhine.c +++ b/drivers/net/via-rhine.c @@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32; #include <linux/ethtool.h> #include <linux/crc32.h> #include <linux/bitops.h> +#include <linux/workqueue.h> #include <asm/processor.h> /* Processor type for cache alignment. */ #include <asm/io.h> #include <asm/irq.h> @@ -389,6 +390,7 @@ struct rhine_private { struct net_device *dev; struct napi_struct napi; spinlock_t lock; + struct work_struct reset_task; /* Frequently used values: keep some adjacent for cache effect. */ u32 quirks; @@ -407,6 +409,7 @@ struct rhine_private { static int mdio_read(struct net_device *dev, int phy_id, int location); static void mdio_write(struct net_device *dev, int phy_id, int location, int value); static int rhine_open(struct net_device *dev); +static void rhine_reset_task(struct work_struct *work); static void rhine_tx_timeout(struct net_device *dev); static netdev_tx_t rhine_start_tx(struct sk_buff *skb, struct net_device *dev); @@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev, dev->irq = pdev->irq; spin_lock_init(&rp->lock); + INIT_WORK(&rp->reset_task, rhine_reset_task); + rp->mii_if.dev = dev; rp->mii_if.mdio_read = mdio_read; rp->mii_if.mdio_write = mdio_write; @@ -1179,15 +1184,11 @@ static int rhine_open(struct net_device *dev) return 0; } -static void rhine_tx_timeout(struct net_device *dev) +static void rhine_reset_task(struct work_struct *work) { - struct rhine_private *rp = netdev_priv(dev); - void __iomem *ioaddr = rp->base; - - printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status " - "%4.4x, resetting...\n", - dev->name, ioread16(ioaddr + IntrStatus), - mdio_read(dev, rp->mii_if.phy_id, MII_BMSR)); + struct rhine_private *rp = container_of(work, struct rhine_private, + reset_task); + struct net_device *dev = rp->dev; /* protect against concurrent rx interrupts */ disable_irq(rp->pdev->irq); @@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev) netif_wake_queue(dev); } +static void rhine_tx_timeout(struct net_device *dev) +{ + struct rhine_private *rp = netdev_priv(dev); + void __iomem *ioaddr = rp->base; + + printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status " + "%4.4x, resetting...\n", + dev->name, ioread16(ioaddr + IntrStatus), + mdio_read(dev, rp->mii_if.phy_id, MII_BMSR)); + + schedule_work(&rp->reset_task); +} + static netdev_tx_t rhine_start_tx(struct sk_buff *skb, struct net_device *dev) { @@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev) struct rhine_private *rp = netdev_priv(dev); void __iomem *ioaddr = rp->base; - spin_lock_irq(&rp->lock); - - netif_stop_queue(dev); napi_disable(&rp->napi); + cancel_work_sync(&rp->reset_task); + netif_stop_queue(dev); + + spin_lock_irq(&rp->lock); if (debug > 1) printk(KERN_DEBUG "%s: Shutting down ethercard, " ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32 2009-12-22 13:21 ` Jarek Poplawski @ 2009-12-22 13:38 ` Jarek Poplawski 2009-12-22 15:00 ` Andrey Rahmatullin 0 siblings, 1 reply; 14+ messages in thread From: Jarek Poplawski @ 2009-12-22 13:38 UTC (permalink / raw) To: Christian Kujau; +Cc: Andrey Rahmatullin, LKML, Roger Luethi, netdev On Tue, Dec 22, 2009 at 01:21:07PM +0000, Jarek Poplawski wrote: > On Tue, Dec 22, 2009 at 12:32:11PM +0000, Jarek Poplawski wrote: > > It looks like napi_disable() should be illegal in ndo_tx_timeout(). > > Here is a patch which moves most of the timeout work to a workqueue, > > similarly to tg3 etc. It should prevent at least one of reported > > bugs. Alas I can't even check-compile it at the moment, so let me > > know on any problems. > > It seems I needlessly changed locking btw, so here it is again. Hmm... On the other hand, it definitely needs at least _bh now... Sorry, Jarek P. --- (take 3) drivers/net/via-rhine.c | 41 ++++++++++++++++++++++++++++------------- 1 files changed, 28 insertions(+), 13 deletions(-) diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c index 593e01f..125406b 100644 --- a/drivers/net/via-rhine.c +++ b/drivers/net/via-rhine.c @@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32; #include <linux/ethtool.h> #include <linux/crc32.h> #include <linux/bitops.h> +#include <linux/workqueue.h> #include <asm/processor.h> /* Processor type for cache alignment. */ #include <asm/io.h> #include <asm/irq.h> @@ -389,6 +390,7 @@ struct rhine_private { struct net_device *dev; struct napi_struct napi; spinlock_t lock; + struct work_struct reset_task; /* Frequently used values: keep some adjacent for cache effect. */ u32 quirks; @@ -407,6 +409,7 @@ struct rhine_private { static int mdio_read(struct net_device *dev, int phy_id, int location); static void mdio_write(struct net_device *dev, int phy_id, int location, int value); static int rhine_open(struct net_device *dev); +static void rhine_reset_task(struct work_struct *work); static void rhine_tx_timeout(struct net_device *dev); static netdev_tx_t rhine_start_tx(struct sk_buff *skb, struct net_device *dev); @@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev, dev->irq = pdev->irq; spin_lock_init(&rp->lock); + INIT_WORK(&rp->reset_task, rhine_reset_task); + rp->mii_if.dev = dev; rp->mii_if.mdio_read = mdio_read; rp->mii_if.mdio_write = mdio_write; @@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev) return 0; } -static void rhine_tx_timeout(struct net_device *dev) +static void rhine_reset_task(struct work_struct *work) { - struct rhine_private *rp = netdev_priv(dev); - void __iomem *ioaddr = rp->base; - - printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status " - "%4.4x, resetting...\n", - dev->name, ioread16(ioaddr + IntrStatus), - mdio_read(dev, rp->mii_if.phy_id, MII_BMSR)); + struct rhine_private *rp = container_of(work, struct rhine_private, + reset_task); + struct net_device *dev = rp->dev; /* protect against concurrent rx interrupts */ disable_irq(rp->pdev->irq); napi_disable(&rp->napi); - spin_lock(&rp->lock); + spin_lock_bh(&rp->lock); /* clear all descriptors */ free_tbufs(dev); @@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev) rhine_chip_reset(dev); init_registers(dev); - spin_unlock(&rp->lock); + spin_unlock_bh(&rp->lock); enable_irq(rp->pdev->irq); dev->trans_start = jiffies; @@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev) netif_wake_queue(dev); } +static void rhine_tx_timeout(struct net_device *dev) +{ + struct rhine_private *rp = netdev_priv(dev); + void __iomem *ioaddr = rp->base; + + printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status " + "%4.4x, resetting...\n", + dev->name, ioread16(ioaddr + IntrStatus), + mdio_read(dev, rp->mii_if.phy_id, MII_BMSR)); + + schedule_work(&rp->reset_task); +} + static netdev_tx_t rhine_start_tx(struct sk_buff *skb, struct net_device *dev) { @@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev) struct rhine_private *rp = netdev_priv(dev); void __iomem *ioaddr = rp->base; - spin_lock_irq(&rp->lock); - - netif_stop_queue(dev); napi_disable(&rp->napi); + cancel_work_sync(&rp->reset_task); + netif_stop_queue(dev); + + spin_lock_irq(&rp->lock); if (debug > 1) printk(KERN_DEBUG "%s: Shutting down ethercard, " ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32 2009-12-22 13:38 ` Jarek Poplawski @ 2009-12-22 15:00 ` Andrey Rahmatullin 2009-12-22 15:26 ` Roger Luethi 0 siblings, 1 reply; 14+ messages in thread From: Andrey Rahmatullin @ 2009-12-22 15:00 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Christian Kujau, LKML, Roger Luethi, netdev [-- Attachment #1: Type: text/plain, Size: 645 bytes --] On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote: > > > It looks like napi_disable() should be illegal in ndo_tx_timeout(). > > > Here is a patch which moves most of the timeout work to a workqueue, > > > similarly to tg3 etc. It should prevent at least one of reported > > > bugs. Alas I can't even check-compile it at the moment, so let me > > > know on any problems. > > It seems I needlessly changed locking btw, so here it is again. > Hmm... On the other hand, it definitely needs at least _bh now... I've tried this patch. There are lots of "Transmit timed out", but no crashes. -- WBR, wRAR (ALT Linux Team) [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 490 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32 2009-12-22 15:00 ` Andrey Rahmatullin @ 2009-12-22 15:26 ` Roger Luethi 2009-12-22 17:36 ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski 2009-12-23 9:52 ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski 0 siblings, 2 replies; 14+ messages in thread From: Roger Luethi @ 2009-12-22 15:26 UTC (permalink / raw) To: Andrey Rahmatullin; +Cc: Jarek Poplawski, Christian Kujau, LKML, netdev On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote: > On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote: > > > > It looks like napi_disable() should be illegal in ndo_tx_timeout(). > > > > Here is a patch which moves most of the timeout work to a workqueue, > > > > similarly to tg3 etc. It should prevent at least one of reported > > > > bugs. Alas I can't even check-compile it at the moment, so let me > > > > know on any problems. > > > It seems I needlessly changed locking btw, so here it is again. > > Hmm... On the other hand, it definitely needs at least _bh now... > I've tried this patch. There are lots of "Transmit timed out", but no > crashes. ACK. Looks like you guys tracked down the crashing and fixed it (thanks!). I suspect we shouldn't have to reset due to timeouts that often, but that's another story. Roger ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] net/via-rhine: Fix scheduling while atomic bugs 2009-12-22 15:26 ` Roger Luethi @ 2009-12-22 17:36 ` Jarek Poplawski 2009-12-24 5:54 ` David Miller 2009-12-23 9:52 ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski 1 sibling, 1 reply; 14+ messages in thread From: Jarek Poplawski @ 2009-12-22 17:36 UTC (permalink / raw) To: David Miller Cc: Roger Luethi, Andrey Rahmatullin, Christian Kujau, LKML, netdev On Tue, Dec 22, 2009 at 04:26:59PM +0100, Roger Luethi wrote: > On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote: > > On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote: > > > > > It looks like napi_disable() should be illegal in ndo_tx_timeout(). > > > > > Here is a patch which moves most of the timeout work to a workqueue, > > > > > similarly to tg3 etc. It should prevent at least one of reported > > > > > bugs. Alas I can't even check-compile it at the moment, so let me > > > > > know on any problems. > > > > It seems I needlessly changed locking btw, so here it is again. > > > Hmm... On the other hand, it definitely needs at least _bh now... > > I've tried this patch. There are lots of "Transmit timed out", but no > > crashes. > > ACK. Looks like you guys tracked down the crashing and fixed it (thanks!). > I suspect we shouldn't have to reset due to timeouts that often, but that's > another story. > > Roger Thanks everybody, Jarek P. -------------------> There are BUGs "scheduling while atomic" triggered by the timer rhine_tx_timeout(). They are caused by calling napi_disable() (with msleep()). This patch fixes it by moving most of the timer content to the workqueue function (similarly to other drivers, like tg3), with spin_lock() changed to BH version. Additionally, there is spin_lock_irq() moved in rhine_close() to exclude napi_disable() etc., also tg3's way. Reported-by: Andrey Rahmatullin <wrar@altlinux.org> Tested-by: Andrey Rahmatullin <wrar@altlinux.org> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Roger Luethi <rl@hellgate.ch> --- drivers/net/via-rhine.c | 41 ++++++++++++++++++++++++++++------------- 1 files changed, 28 insertions(+), 13 deletions(-) diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c index 593e01f..125406b 100644 --- a/drivers/net/via-rhine.c +++ b/drivers/net/via-rhine.c @@ -102,6 +102,7 @@ static const int multicast_filter_limit = 32; #include <linux/ethtool.h> #include <linux/crc32.h> #include <linux/bitops.h> +#include <linux/workqueue.h> #include <asm/processor.h> /* Processor type for cache alignment. */ #include <asm/io.h> #include <asm/irq.h> @@ -389,6 +390,7 @@ struct rhine_private { struct net_device *dev; struct napi_struct napi; spinlock_t lock; + struct work_struct reset_task; /* Frequently used values: keep some adjacent for cache effect. */ u32 quirks; @@ -407,6 +409,7 @@ struct rhine_private { static int mdio_read(struct net_device *dev, int phy_id, int location); static void mdio_write(struct net_device *dev, int phy_id, int location, int value); static int rhine_open(struct net_device *dev); +static void rhine_reset_task(struct work_struct *work); static void rhine_tx_timeout(struct net_device *dev); static netdev_tx_t rhine_start_tx(struct sk_buff *skb, struct net_device *dev); @@ -775,6 +778,8 @@ static int __devinit rhine_init_one(struct pci_dev *pdev, dev->irq = pdev->irq; spin_lock_init(&rp->lock); + INIT_WORK(&rp->reset_task, rhine_reset_task); + rp->mii_if.dev = dev; rp->mii_if.mdio_read = mdio_read; rp->mii_if.mdio_write = mdio_write; @@ -1179,22 +1184,18 @@ static int rhine_open(struct net_device *dev) return 0; } -static void rhine_tx_timeout(struct net_device *dev) +static void rhine_reset_task(struct work_struct *work) { - struct rhine_private *rp = netdev_priv(dev); - void __iomem *ioaddr = rp->base; - - printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status " - "%4.4x, resetting...\n", - dev->name, ioread16(ioaddr + IntrStatus), - mdio_read(dev, rp->mii_if.phy_id, MII_BMSR)); + struct rhine_private *rp = container_of(work, struct rhine_private, + reset_task); + struct net_device *dev = rp->dev; /* protect against concurrent rx interrupts */ disable_irq(rp->pdev->irq); napi_disable(&rp->napi); - spin_lock(&rp->lock); + spin_lock_bh(&rp->lock); /* clear all descriptors */ free_tbufs(dev); @@ -1206,7 +1207,7 @@ static void rhine_tx_timeout(struct net_device *dev) rhine_chip_reset(dev); init_registers(dev); - spin_unlock(&rp->lock); + spin_unlock_bh(&rp->lock); enable_irq(rp->pdev->irq); dev->trans_start = jiffies; @@ -1214,6 +1215,19 @@ static void rhine_tx_timeout(struct net_device *dev) netif_wake_queue(dev); } +static void rhine_tx_timeout(struct net_device *dev) +{ + struct rhine_private *rp = netdev_priv(dev); + void __iomem *ioaddr = rp->base; + + printk(KERN_WARNING "%s: Transmit timed out, status %4.4x, PHY status " + "%4.4x, resetting...\n", + dev->name, ioread16(ioaddr + IntrStatus), + mdio_read(dev, rp->mii_if.phy_id, MII_BMSR)); + + schedule_work(&rp->reset_task); +} + static netdev_tx_t rhine_start_tx(struct sk_buff *skb, struct net_device *dev) { @@ -1830,10 +1844,11 @@ static int rhine_close(struct net_device *dev) struct rhine_private *rp = netdev_priv(dev); void __iomem *ioaddr = rp->base; - spin_lock_irq(&rp->lock); - - netif_stop_queue(dev); napi_disable(&rp->napi); + cancel_work_sync(&rp->reset_task); + netif_stop_queue(dev); + + spin_lock_irq(&rp->lock); if (debug > 1) printk(KERN_DEBUG "%s: Shutting down ethercard, " ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] net/via-rhine: Fix scheduling while atomic bugs 2009-12-22 17:36 ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski @ 2009-12-24 5:54 ` David Miller 0 siblings, 0 replies; 14+ messages in thread From: David Miller @ 2009-12-24 5:54 UTC (permalink / raw) To: jarkao2; +Cc: rl, wrar, lists, linux-kernel, netdev From: Jarek Poplawski <jarkao2@gmail.com> Date: Tue, 22 Dec 2009 18:36:42 +0100 > There are BUGs "scheduling while atomic" triggered by the timer > rhine_tx_timeout(). They are caused by calling napi_disable() (with > msleep()). This patch fixes it by moving most of the timer content to > the workqueue function (similarly to other drivers, like tg3), with > spin_lock() changed to BH version. > > Additionally, there is spin_lock_irq() moved in rhine_close() to > exclude napi_disable() etc., also tg3's way. > > Reported-by: Andrey Rahmatullin <wrar@altlinux.org> > Tested-by: Andrey Rahmatullin <wrar@altlinux.org> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Applied, thanks! ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32 2009-12-22 15:26 ` Roger Luethi 2009-12-22 17:36 ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski @ 2009-12-23 9:52 ` Jarek Poplawski 2009-12-23 16:21 ` Andrey Rahmatullin 1 sibling, 1 reply; 14+ messages in thread From: Jarek Poplawski @ 2009-12-23 9:52 UTC (permalink / raw) To: Roger Luethi; +Cc: Andrey Rahmatullin, Christian Kujau, LKML, netdev On 22-12-2009 16:26, Roger Luethi wrote: > On Tue, 22 Dec 2009 20:00:45 +0500, Andrey Rahmatullin wrote: >> On Tue, Dec 22, 2009 at 01:38:17PM +0000, Jarek Poplawski wrote: >>>>> It looks like napi_disable() should be illegal in ndo_tx_timeout(). >>>>> Here is a patch which moves most of the timeout work to a workqueue, >>>>> similarly to tg3 etc. It should prevent at least one of reported >>>>> bugs. Alas I can't even check-compile it at the moment, so let me >>>>> know on any problems. >>>> It seems I needlessly changed locking btw, so here it is again. >>> Hmm... On the other hand, it definitely needs at least _bh now... >> I've tried this patch. There are lots of "Transmit timed out", but no >> crashes. > > ACK. Looks like you guys tracked down the crashing and fixed it (thanks!). > I suspect we shouldn't have to reset due to timeouts that often, but that's > another story. BTW, it seems a change in 2.6.31 might trigger these timeouts more often than before. Andrey, could you try if this matters here? Thanks, Jarek P. --- (on top of net-2.6 with the previous "Fix scheduling..." patch) diff -Nurp a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c --- a/drivers/net/via-rhine.c 2009-12-23 09:28:25.000000000 +0000 +++ b/drivers/net/via-rhine.c 2009-12-23 09:33:57.000000000 +0000 @@ -1226,6 +1226,7 @@ static void rhine_tx_timeout(struct net_ mdio_read(dev, rp->mii_if.phy_id, MII_BMSR)); schedule_work(&rp->reset_task); + netdev_get_tx_queue(dev, 0)->trans_start = jiffies; } static netdev_tx_t rhine_start_tx(struct sk_buff *skb, ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32 2009-12-23 9:52 ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski @ 2009-12-23 16:21 ` Andrey Rahmatullin 2009-12-23 16:30 ` Jarek Poplawski 0 siblings, 1 reply; 14+ messages in thread From: Andrey Rahmatullin @ 2009-12-23 16:21 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Roger Luethi, Christian Kujau, LKML, netdev On Wed, Dec 23, 2009 at 09:52:03AM +0000, Jarek Poplawski wrote: > BTW, it seems a change in 2.6.31 might trigger these timeouts more > often than before. Andrey, could you try if this matters here? They appear once per 4 seconds with or without this patch. -- WBR, wRAR (ALT Linux Team) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: via_rhine kernel crashes in 2.6.32 2009-12-23 16:21 ` Andrey Rahmatullin @ 2009-12-23 16:30 ` Jarek Poplawski 0 siblings, 0 replies; 14+ messages in thread From: Jarek Poplawski @ 2009-12-23 16:30 UTC (permalink / raw) To: Andrey Rahmatullin; +Cc: Roger Luethi, Christian Kujau, LKML, netdev Andrey Rahmatullin wrote, On 12/23/2009 05:21 PM: > On Wed, Dec 23, 2009 at 09:52:03AM +0000, Jarek Poplawski wrote: >> BTW, it seems a change in 2.6.31 might trigger these timeouts more >> often than before. Andrey, could you try if this matters here? > They appear once per 4 seconds with or without this patch. > OK, thanks for checking this. Jarek P. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2009-12-24 5:54 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20091219111206.GA5307@wrars-comp.wrarsdomain>
[not found] ` <20091220200343.GA5427@wrars-comp.wrarsdomain>
2009-12-21 12:03 ` via_rhine kernel crashes in 2.6.32 Christian Kujau
2009-12-21 12:36 ` Andrey Rahmatullin
2009-12-21 18:18 ` Andrey Rahmatullin
2009-12-21 19:32 ` Christian Kujau
2009-12-22 12:32 ` Jarek Poplawski
2009-12-22 13:21 ` Jarek Poplawski
2009-12-22 13:38 ` Jarek Poplawski
2009-12-22 15:00 ` Andrey Rahmatullin
2009-12-22 15:26 ` Roger Luethi
2009-12-22 17:36 ` [PATCH] net/via-rhine: Fix scheduling while atomic bugs Jarek Poplawski
2009-12-24 5:54 ` David Miller
2009-12-23 9:52 ` via_rhine kernel crashes in 2.6.32 Jarek Poplawski
2009-12-23 16:21 ` Andrey Rahmatullin
2009-12-23 16:30 ` Jarek Poplawski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).