* bug: mutex_lock() in interrupt conntext via phy_stop() in gianfar @ 2008-07-18 12:10 Sebastian Siewior 2008-07-21 22:57 ` Nate Case 2008-07-22 7:54 ` Wolfram Sang 0 siblings, 2 replies; 9+ messages in thread From: Sebastian Siewior @ 2008-07-18 12:10 UTC (permalink / raw) To: Andy Fleming; +Cc: Nate Case, netdev, linuxppc-dev, Vitaly Bordug, Li Yang Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping] changed the phydev->lock from spinlock into a mutex. Now, the following code path got triggered while NFS was unavailable: |[ 21.287359] nfs: server 10.11.3.47 not responding, still trying |[ 38.891373] nfs: server 10.11.3.47 not responding, still trying |[ 148.179592] INFO: task udevd:1762 blocked for more than 120 seconds. |[ 148.185967] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. |[ 148.193810] udevd D 0fef1dd8 0 1762 1761 |[ 148.199055] Call Trace: |[ 148.201504] [cecdda80] [c00071e4] __switch_to+0x6c/0x84 |[ 148.206764] [cecddaa0] [c025973c] schedule+0x46c/0x4cc |[ 148.211937] [cecddad0] [c00f3d84] nfs_wait_schedule+0x24/0x38 |[ 148.217712] [cecddae0] [c0259b74] __wait_on_bit_lock+0x68/0xcc |[ 148.223576] [cecddb00] [c0259c4c] out_of_line_wait_on_bit_lock+0x74/0x88 |[ 148.230300] [cecddb50] [c00f3e6c] __nfs_revalidate_inode+0xd4/0x264 |[ 148.236597] [cecddc20] [c00f1298] nfs_lookup_revalidate+0x1bc/0x3d4 |[ 148.243071] [cecddd80] [c0081db8] do_lookup+0x148/0x1a0 |[ 148.248361] [cecdddb0] [c0083bac] __link_path_walk+0x930/0xe24 |[ 148.254219] [cecdde00] [c00840e8] path_walk+0x48/0xa8 |[ 148.259293] [cecdde30] [c008442c] do_path_lookup+0x160/0x194 |[ 148.264982] [cecdde60] [c0084fe0] __path_lookup_intent_open+0x58/0xa4 |[ 148.271444] [cecdde80] [c007ea54] open_exec+0x2c/0xdc |[ 148.276525] [cecddef0] [c007efa4] do_execve+0x58/0x1c4 |[ 148.281704] [cecddf20] [c0007568] sys_execve+0x58/0x84 |[ 148.286873] [cecddf40] [c000df58] ret_from_syscall+0x0/0x3c |[ 169.651632] INFO: task udevsettle:1053 blocked for more 
than 120 seconds. some more of this and now the interresting part: |[ 194.859659] NETDEV WATCHDOG: eth0: transmit timed out |[ 194.864733] BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87 |[ 194.875529] in_atomic():1, irqs_disabled():0 |[ 194.879805] Call Trace: |[ 194.882250] [c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable) |[ 194.888649] [c0383db0] [c001e938] __might_sleep+0xe0/0xf4 |[ 194.894069] [c0383dc0] [c025a43c] mutex_lock+0x24/0x3c |[ 194.899234] [c0383de0] [c019005c] phy_stop+0x20/0x70 |[ 194.904234] [c0383df0] [c018d4ec] stop_gfar+0x28/0xf4 |[ 194.909305] [c0383e10] [c018e8c4] gfar_timeout+0x30/0x60 |[ 194.914638] [c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144 |[ 194.920064] [c0383e30] [c002f93c] run_timer_softirq+0x148/0x1c8 |[ 194.926008] [c0383e60] [c002b084] __do_softirq+0x5c/0xc4 |[ 194.931350] [c0383e80] [c00046fc] do_softirq+0x3c/0x54 |[ 194.936515] [c0383e90] [c002ac60] irq_exit+0x3c/0x5c |[ 194.941499] [c0383ea0] [c000b378] timer_interrupt+0xe0/0xf8 |[ 194.947097] [c0383ec0] [c000e5ac] ret_from_except+0x0/0x18 |[ 194.952610] [c0383f80] [c000804c] cpu_idle+0xcc/0xdc |[ 194.957592] [c0383fa0] [c025c07c] etext+0x7c/0x90 |[ 194.962322] [c0383fc0] [c0338960] start_kernel+0x294/0x2a8 |[ 194.967839] [c0383ff0] [c00003dc] skpinv+0x304/0x340 |[ 194.972833] ------------[ cut here ]------------ |[ 194.977450] Badness at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:134 |[ 194.984589] NIP: c025a268 LR: c025a250 CTR: c017e224 |[ 194.989557] REGS: c0383cf0 TRAP: 0700 Not tainted (2.6.26) |[ 194.995302] MSR: 00029000 <EE,ME> CR: 28002022 XER: 00000000 |[ 195.001167] TASK = c035e500[0] 'swapper' THREAD: c0382000 |[ 195.006390] GPR00: 00000000 c0383da0 c035e500 00000001 c035e500 00000010 00000000 c0360000 |[ 195.014798] GPR08: 00000000 c0390000 00000001 c0360000 00006353 628a87a2 0ffe8600 00000000 |[ 195.023206] GPR16: cab54ee3 00000000 00000000 0ffe7384 00000000 00000000 0ff904a0 00000000 
|[ 195.031612] GPR24: 00000000 00000000 c038e5a4 d1058000 c035e500 cf86b570 cf9c3888 cf9c3888 |[ 195.040199] NIP [c025a268] __mutex_lock_slowpath+0x44/0x1f4 |[ 195.045783] LR [c025a250] __mutex_lock_slowpath+0x2c/0x1f4 |[ 195.051277] Call Trace: |[ 195.053721] [c0383da0] [cf9c3888] 0xcf9c3888 (unreliable) |[ 195.059146] [c0383de0] [c019005c] phy_stop+0x20/0x70 |[ 195.064135] [c0383df0] [c018d4ec] stop_gfar+0x28/0xf4 |[ 195.069202] [c0383e10] [c018e8c4] gfar_timeout+0x30/0x60 |[ 195.074529] [c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144 |[ 195.079946] [c0383e30] [c002f93c] run_timer_softirq+0x148/0x1c8 |[ 195.085885] [c0383e60] [c002b084] __do_softirq+0x5c/0xc4 |[ 195.091219] [c0383e80] [c00046fc] do_softirq+0x3c/0x54 |[ 195.096374] [c0383e90] [c002ac60] irq_exit+0x3c/0x5c |[ 195.101353] [c0383ea0] [c000b378] timer_interrupt+0xe0/0xf8 |[ 195.106944] [c0383ec0] [c000e5ac] ret_from_except+0x0/0x18 |[ 195.112447] [c0383f80] [c000804c] cpu_idle+0xcc/0xdc |[ 195.117426] [c0383fa0] [c025c07c] etext+0x7c/0x90 |[ 195.122147] [c0383fc0] [c0338960] start_kernel+0x294/0x2a8 |[ 195.127655] [c0383ff0] [c00003dc] skpinv+0x304/0x340 |[ 195.132633] Instruction dump: |[ 195.135422] 90010044 7c5c1378 8009000c 5409012f 41a20024 4bee78dd 2f830000 419e0018 |[ 195.143222] 3d20c039 80098714 2f800000 409e0008 <0fe00000> 7fc000a6 7c000146 801f0024 I found out that the same code path may be trigger in - drivers/net/ucc_geth.c - drivers/net/fec_mpc52xx.c - drivers/net/fs_enet/fs_enet-main.c other drivers use phy_stop() in ->close only. Sebastian ^ permalink raw reply [flat|nested] 9+ messages in thread
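The second trace boils down to a sleeping lock being taken while in_atomic() is true. A rough userspace model of that check follows. It is not kernel code, and every `model_*` name plus the two counters are made up for illustration; it only mirrors the call chain visible in the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's preempt/softirq count. */
static int atomic_depth;
/* How often the "sleeping function called from invalid context" check fired. */
static int sleep_violations;

static void model_enter_softirq(void)
{
	atomic_depth++;
}

static void model_exit_softirq(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Roughly what __might_sleep() reports: complain if the caller is atomic. */
static bool model_might_sleep(const char *caller)
{
	if (atomic_depth > 0) {
		sleep_violations++;
		printf("BUG: sleeping function called from invalid context (%s)\n",
		       caller);
		return false;
	}
	return true;
}

/* A mutex may sleep, so taking it starts with the check above. */
static void model_mutex_lock(void)
{
	model_might_sleep("mutex_lock");
}

/* phy_stop() takes phydev->lock, which is a mutex since commit 35b5f6b1a. */
static void model_phy_stop(void)
{
	model_mutex_lock();
}

static void model_stop_gfar(void)
{
	model_phy_stop();
}

static void model_gfar_timeout(void)
{
	model_stop_gfar();
}

/* The watchdog runs from the timer softirq, as in the trace. */
static void model_dev_watchdog(void)
{
	model_enter_softirq();
	model_gfar_timeout();
	model_exit_softirq();
}
```

Calling `model_phy_stop()` directly (the ->close case) passes the check, while reaching it through `model_dev_watchdog()` trips it once. That matches the observation that drivers using phy_stop() only in ->close are not affected.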
* Re: bug: mutex_lock() in interrupt conntext via phy_stop() in gianfar
  2008-07-18 12:10 bug: mutex_lock() in interrupt conntext via phy_stop() in gianfar Sebastian Siewior
@ 2008-07-21 22:57 ` Nate Case
  2008-07-22 20:59 ` Sebastian Siewior
  2008-07-23 22:12 ` bug: mutex_lock() in interrupt conntext via phy_stop() " Benjamin Herrenschmidt
  2008-07-22 7:54 ` Wolfram Sang
  1 sibling, 2 replies; 9+ messages in thread
From: Nate Case @ 2008-07-21 22:57 UTC (permalink / raw)
  To: Sebastian Siewior; +Cc: netdev, linuxppc-dev, Vitaly Bordug, Li Yang

On Fri, 2008-07-18 at 14:10 +0200, Sebastian Siewior wrote:
> Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
> changed the phydev->lock from spinlock into a mutex. Now, the following
> code path got triggered while NFS was unavailable:
>
[snip]
> |[ 194.864733] BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
> |[ 194.875529] in_atomic():1, irqs_disabled():0
> |[ 194.879805] Call Trace:
> |[ 194.882250] [c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
> |[ 194.888649] [c0383db0] [c001e938] __might_sleep+0xe0/0xf4
> |[ 194.894069] [c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
> |[ 194.899234] [c0383de0] [c019005c] phy_stop+0x20/0x70
> |[ 194.904234] [c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
> |[ 194.909305] [c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
> |[ 194.914638] [c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144

Hmm.. I'm not sure what the best solution is to this. Make the
stop_gfar() call happen in a workqueue, and make a similar change to
ucc_geth, fec_mpc52xx, and fs_enet? Modify phy_stop() to do the work in
a workqueue conditionally if in interrupt context? Between these two
I'd lean toward the latter.

Does anyone have any better ideas?

--
Nate Case <ncase@xes-inc.com>

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: bug: mutex_lock() in interrupt conntext via phy_stop() in gianfar
  2008-07-21 22:57 ` Nate Case
@ 2008-07-22 20:59 ` Sebastian Siewior
  2008-07-23 20:03 ` [PATCH / RFC] net: don't grab a mutex within a timer context " Sebastian Siewior
  2008-07-23 22:12 ` bug: mutex_lock() in interrupt conntext via phy_stop() " Benjamin Herrenschmidt
  1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Siewior @ 2008-07-22 20:59 UTC (permalink / raw)
  To: Nate Case; +Cc: netdev, Sebastian Siewior, linuxppc-dev, Vitaly Bordug, Li Yang

* Nate Case | 2008-07-21 17:57:08 [-0500]:

>On Fri, 2008-07-18 at 14:10 +0200, Sebastian Siewior wrote:
>> Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
>> changed the phydev->lock from spinlock into a mutex. Now, the following
>> code path got triggered while NFS was unavailable:
>>
>[snip]
>> |[ 194.864733] BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
>> |[ 194.875529] in_atomic():1, irqs_disabled():0
>> |[ 194.879805] Call Trace:
>> |[ 194.882250] [c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
>> |[ 194.888649] [c0383db0] [c001e938] __might_sleep+0xe0/0xf4
>> |[ 194.894069] [c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
>> |[ 194.899234] [c0383de0] [c019005c] phy_stop+0x20/0x70
>> |[ 194.904234] [c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
>> |[ 194.909305] [c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
>> |[ 194.914638] [c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
>
>Hmm.. I'm not sure what the best solution is to this. Make the
>stop_gfar() call happen in a workqueue, and make a similar change to
>ucc_geth, fec_mpc52xx, and fs_enet? Modify phy_stop() to do the work in
>a workqueue conditionally if in interrupt context? Between these two
>I'd lean toward the latter.
>
>Does anyone have any better ideas?

If I look at tg3.c, then exactly this is done there. Others call it only
on close(). I guess this depends very much on the driver's logic :)
If nobody minds, then I would assume that tg3.c is a good example and I
would move the timeout path into a workqueue.

Sebastian

^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH / RFC] net: don't grab a mutex within a timer context in gianfar 2008-07-22 20:59 ` Sebastian Siewior @ 2008-07-23 20:03 ` Sebastian Siewior 2008-07-25 14:16 ` Nate Case 2008-07-25 19:02 ` Andy Fleming 0 siblings, 2 replies; 9+ messages in thread From: Sebastian Siewior @ 2008-07-23 20:03 UTC (permalink / raw) To: Andy Fleming Cc: Nate Case, netdev, linuxppc-dev, Vitaly Bordug, Li Yang, Jeff Garzik From: Sebastian Siewior <bigeasy@linutronix.de> I got the following backtrace while network was unavailble: |NETDEV WATCHDOG: eth0: transmit timed out |BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87 |in_atomic():1, irqs_disabled():0 |Call Trace: |[c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable) |[c0383db0] [c001e938] __might_sleep+0xe0/0xf4 |[c0383dc0] [c025a43c] mutex_lock+0x24/0x3c |[c0383de0] [c019005c] phy_stop+0x20/0x70 |[c0383df0] [c018d4ec] stop_gfar+0x28/0xf4 |[c0383e10] [c018e8c4] gfar_timeout+0x30/0x60 |[c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144 |[c0383e30] [c002f93c] run_timer_softirq+0x148/0x1c8 |[c0383e60] [c002b084] __do_softirq+0x5c/0xc4 |[c0383e80] [c00046fc] do_softirq+0x3c/0x54 |[c0383e90] [c002ac60] irq_exit+0x3c/0x5c |[c0383ea0] [c000b378] timer_interrupt+0xe0/0xf8 |[c0383ec0] [c000e5ac] ret_from_except+0x0/0x18 |[c0383f80] [c000804c] cpu_idle+0xcc/0xdc |[c0383fa0] [c025c07c] etext+0x7c/0x90 |[c0383fc0] [c0338960] start_kernel+0x294/0x2a8 |[c0383ff0] [c00003dc] skpinv+0x304/0x340 |------------[ cut here ]------------ The phylock was once a spinlock but got changed into a mutex via commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping] Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de> --- bug report @ http://marc.info/?l=linux-netdev&m=121638307116389&w=2 I moved it into a workqueue, this is what tg3 does. I would convert the other three drivers unless $dude suggests a better method or somebody else takes care.... 
drivers/net/gianfar.c | 22 ++++++++++++++++++---- drivers/net/gianfar.h | 2 ++ 2 files changed, 20 insertions(+), 4 deletions(-) diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 25bdd08..caa6cbd 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -112,6 +112,7 @@ const char gfar_driver_version[] = "1.3"; static int gfar_enet_open(struct net_device *dev); static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev); +static void gfar_reset_task(struct work_struct *work); static void gfar_timeout(struct net_device *dev); static int gfar_close(struct net_device *dev); struct sk_buff *gfar_new_skb(struct net_device *dev); @@ -216,6 +217,7 @@ static int gfar_probe(struct platform_device *pdev) spin_lock_init(&priv->txlock); spin_lock_init(&priv->rxlock); + INIT_WORK(&priv->reset_task, gfar_reset_task); platform_set_drvdata(pdev, dev); @@ -1132,6 +1134,7 @@ static int gfar_close(struct net_device *dev) napi_disable(&priv->napi); #endif + cancel_work_sync(&priv->reset_task); stop_gfar(dev); /* Disconnect from the PHY */ @@ -1246,13 +1249,16 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu) return 0; } -/* gfar_timeout gets called when a packet has not been +/* gfar_reset_task gets scheduled when a packet has not been * transmitted after a set amount of time. * For now, assume that clearing out all the structures, and - * starting over will fix the problem. */ -static void gfar_timeout(struct net_device *dev) + * starting over will fix the problem. 
+ */ +static void gfar_reset_task(struct work_struct *work) { - dev->stats.tx_errors++; + struct gfar_private *priv = container_of(work, struct gfar_private, + reset_task); + struct net_device *dev = priv->dev; if (dev->flags & IFF_UP) { stop_gfar(dev); @@ -1262,6 +1268,14 @@ static void gfar_timeout(struct net_device *dev) netif_schedule(dev); } +static void gfar_timeout(struct net_device *dev) +{ + struct gfar_private *priv = netdev_priv(dev); + + dev->stats.tx_errors++; + schedule_work(&priv->reset_task); +} + /* Interrupt Handler for Transmit complete */ static int gfar_clean_tx_ring(struct net_device *dev) { diff --git a/drivers/net/gianfar.h b/drivers/net/gianfar.h index 27f37c8..d983a6a 100644 --- a/drivers/net/gianfar.h +++ b/drivers/net/gianfar.h @@ -759,6 +759,8 @@ struct gfar_private { uint32_t msg_enable; + struct work_struct reset_task; + /* Network Statistics */ struct gfar_extra_stats extra_stats; }; -- 1.5.5.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
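For readers who want to see the shape of the change without the driver around it, here is a small userspace model of the pattern the patch uses: the timeout handler only accounts the error and queues work, the reset itself runs later, and the close path drops anything still queued. This is only a sketch. `struct model_work`, `struct model_priv` and all `model_*` functions are invented for illustration and are not the kernel's workqueue API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-slot stand-in for struct work_struct. */
struct model_work {
	void (*func)(struct model_work *work);
	bool pending;
};

struct model_priv {
	struct model_work reset_task;
	unsigned long tx_errors;
	int resets;      /* how often the reset actually ran */
	bool in_atomic;  /* true while "inside the timer softirq" */
};

/* Recover the containing struct, in the spirit of container_of(). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The part that may sleep (stop and restart the device in the patch). */
static void model_reset_task(struct model_work *work)
{
	struct model_priv *priv =
		model_container_of(work, struct model_priv, reset_task);

	assert(!priv->in_atomic); /* must never run from the timeout path */
	priv->resets++;
}

static void model_init(struct model_priv *priv)
{
	priv->reset_task.func = model_reset_task;
	priv->reset_task.pending = false;
	priv->tx_errors = 0;
	priv->resets = 0;
	priv->in_atomic = false;
}

/* Queue the work unless it is already queued. */
static bool model_schedule_work(struct model_work *work)
{
	if (work->pending)
		return false;
	work->pending = true;
	return true;
}

/* Timeout handler: atomic context, so only account and defer. */
static void model_timeout(struct model_priv *priv)
{
	priv->in_atomic = true;
	priv->tx_errors++;
	model_schedule_work(&priv->reset_task);
	priv->in_atomic = false;
}

/* What a worker thread would do later, in process context. */
static void model_run_pending(struct model_priv *priv)
{
	struct model_work *work = &priv->reset_task;

	if (work->pending) {
		work->pending = false;
		work->func(work);
	}
}

/*
 * Close path: drop queued work so it cannot run after teardown.
 * The real cancel_work_sync() also waits for work that is already
 * running; this single-threaded model has nothing to wait for.
 */
static void model_close(struct model_priv *priv)
{
	priv->reset_task.pending = false;
}
```

Two timeouts in a row before the worker runs bump `tx_errors` twice but lead to a single reset, and a timeout followed by `model_close()` leads to no reset at all.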
* Re: [PATCH / RFC] net: don't grab a mutex within a timer context in gianfar
  2008-07-23 20:03 ` [PATCH / RFC] net: don't grab a mutex within a timer context " Sebastian Siewior
@ 2008-07-25 14:16 ` Nate Case
  2008-07-25 19:02 ` Andy Fleming
  1 sibling, 0 replies; 9+ messages in thread
From: Nate Case @ 2008-07-25 14:16 UTC (permalink / raw)
  To: Sebastian Siewior
  Cc: netdev, linuxppc-dev, Vitaly Bordug, Li Yang, Jeff Garzik

On Wed, 2008-07-23 at 22:03 +0200, Sebastian Siewior wrote:
> I moved it into a workqueue, this is what tg3 does.
> I would convert the other three drivers unless $dude suggests a better
> method or somebody else takes care....
>
> drivers/net/gianfar.c | 22 ++++++++++++++++++----
> drivers/net/gianfar.h | 2 ++
> 2 files changed, 20 insertions(+), 4 deletions(-)

This looks good to me.

--
Nate Case <ncase@xes-inc.com>

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH / RFC] net: don't grab a mutex within a timer context in gianfar
  2008-07-23 20:03 ` [PATCH / RFC] net: don't grab a mutex within a timer context " Sebastian Siewior
  2008-07-25 14:16 ` Nate Case
@ 2008-07-25 19:02 ` Andy Fleming
  1 sibling, 0 replies; 9+ messages in thread
From: Andy Fleming @ 2008-07-25 19:02 UTC (permalink / raw)
  To: Sebastian Siewior
  Cc: Nate Case, netdev, linuxppc-dev, Vitaly Bordug, Li Yang, Jeff Garzik

On Jul 23, 2008, at 16:03, Sebastian Siewior wrote:

> From: Sebastian Siewior <bigeasy@linutronix.de>
>
> I got the following backtrace while network was unavailble:
>
> |NETDEV WATCHDOG: eth0: transmit timed out
> |BUG: sleeping function called from invalid context at /home/bigeasy/
> git/linux-2.6-powerpc/kernel/mutex.c:87
> |in_atomic():1, irqs_disabled():0
> |Call Trace:
> |[c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
> |[c0383db0] [c001e938] __might_sleep+0xe0/0xf4
> |[c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
> |[c0383de0] [c019005c] phy_stop+0x20/0x70
> |[c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
> |[c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
> |[c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
> |[c0383e30] [c002f93c] run_timer_softirq+0x148/0x1c8
> |[c0383e60] [c002b084] __do_softirq+0x5c/0xc4
> |[c0383e80] [c00046fc] do_softirq+0x3c/0x54
> |[c0383e90] [c002ac60] irq_exit+0x3c/0x5c
> |[c0383ea0] [c000b378] timer_interrupt+0xe0/0xf8
> |[c0383ec0] [c000e5ac] ret_from_except+0x0/0x18
> |[c0383f80] [c000804c] cpu_idle+0xcc/0xdc
> |[c0383fa0] [c025c07c] etext+0x7c/0x90
> |[c0383fc0] [c0338960] start_kernel+0x294/0x2a8
> |[c0383ff0] [c00003dc] skpinv+0x304/0x340
> |------------[ cut here ]------------
>
> The phylock was once a spinlock but got changed into a mutex via
> commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially
> sleeping]
>
> Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
> ---

Looks good to me. Thanks for taking care of this.

Acked-by: Andy Fleming <afleming@freescale.com>

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: bug: mutex_lock() in interrupt conntext via phy_stop() in gianfar
  2008-07-21 22:57 ` Nate Case
  2008-07-22 20:59 ` Sebastian Siewior
@ 2008-07-23 22:12 ` Benjamin Herrenschmidt
  2008-07-24 7:27 ` Sebastian Siewior
  1 sibling, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-23 22:12 UTC (permalink / raw)
  To: Nate Case; +Cc: linuxppc-dev, netdev, Li Yang, Vitaly Bordug, Sebastian Siewior

On Mon, 2008-07-21 at 17:57 -0500, Nate Case wrote:
> On Fri, 2008-07-18 at 14:10 +0200, Sebastian Siewior wrote:
> > Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
> > changed the phydev->lock from spinlock into a mutex. Now, the following
> > code path got triggered while NFS was unavailable:
> >
> [snip]
> > |[ 194.864733] BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
> > |[ 194.875529] in_atomic():1, irqs_disabled():0
> > |[ 194.879805] Call Trace:
> > |[ 194.882250] [c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
> > |[ 194.888649] [c0383db0] [c001e938] __might_sleep+0xe0/0xf4
> > |[ 194.894069] [c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
> > |[ 194.899234] [c0383de0] [c019005c] phy_stop+0x20/0x70
> > |[ 194.904234] [c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
> > |[ 194.909305] [c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
> > |[ 194.914638] [c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
>
> Hmm.. I'm not sure what the best solution is to this. Make the
> stop_gfar() call happen in a workqueue, and make a similar change to
> ucc_geth, fec_mpc52xx, and fs_enet? Modify phy_stop() to do the work in
> a workqueue conditionally if in interrupt context? Between these two
> I'd lean toward the latter.
>
> Does anyone have any better ideas?

Move the reset task to a workqueue.

Cheers,
Ben.

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: bug: mutex_lock() in interrupt conntext via phy_stop() in gianfar
  2008-07-23 22:12 ` bug: mutex_lock() in interrupt conntext via phy_stop() " Benjamin Herrenschmidt
@ 2008-07-24 7:27 ` Sebastian Siewior
  0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Siewior @ 2008-07-24 7:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: netdev, Li Yang, Nate Case, Vitaly Bordug, linuxppc-dev

* Benjamin Herrenschmidt | 2008-07-24 08:12:48 [+1000]:

>On Mon, 2008-07-21 at 17:57 -0500, Nate Case wrote:
>> On Fri, 2008-07-18 at 14:10 +0200, Sebastian Siewior wrote:
>> > Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
>> > changed the phydev->lock from spinlock into a mutex. Now, the following
>> > code path got triggered while NFS was unavailable:
>> >
>> [snip]
>> > |[ 194.864733] BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
>> > |[ 194.875529] in_atomic():1, irqs_disabled():0
>> > |[ 194.879805] Call Trace:
>> > |[ 194.882250] [c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
>> > |[ 194.888649] [c0383db0] [c001e938] __might_sleep+0xe0/0xf4
>> > |[ 194.894069] [c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
>> > |[ 194.899234] [c0383de0] [c019005c] phy_stop+0x20/0x70
>> > |[ 194.904234] [c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
>> > |[ 194.909305] [c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
>> > |[ 194.914638] [c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
>>
>> Hmm.. I'm not sure what the best solution is to this. Make the
>> stop_gfar() call happen in a workqueue, and make a similar change to
>> ucc_geth, fec_mpc52xx, and fs_enet? Modify phy_stop() to do the work in
>> a workqueue conditionally if in interrupt context? Between these two
>> I'd lean toward the latter.
>>
>> Does anyone have any better ideas?
>
>Move the reset task to a workqueue.

Done in [1] Ben.

[1] http://marc.info/?l=linux-netdev&m=121684347609062&w=2

>Cheers,
>Ben.

Sebastian

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: bug: mutex_lock() in interrupt conntext via phy_stop() in gianfar
  2008-07-18 12:10 bug: mutex_lock() in interrupt conntext via phy_stop() in gianfar Sebastian Siewior
  2008-07-21 22:57 ` Nate Case
@ 2008-07-22 7:54 ` Wolfram Sang
  1 sibling, 0 replies; 9+ messages in thread
From: Wolfram Sang @ 2008-07-22 7:54 UTC (permalink / raw)
  To: Sebastian Siewior; +Cc: Nate Case, netdev, linuxppc-dev, Vitaly Bordug, Li Yang

[-- Attachment #1: Type: text/plain, Size: 1994 bytes --]

Hi,

On Fri, Jul 18, 2008 at 02:10:08PM +0200, Sebastian Siewior wrote:
> Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
> changed the phydev->lock from spinlock into a mutex. Now, the following
> code path got triggered while NFS was unavailable:
[...]
> I found out that the same code path may be trigger in
> - drivers/net/ucc_geth.c
> - drivers/net/fec_mpc52xx.c

Recently, I described a (I think) similar problem:
(http://ozlabs.org/pipermail/linuxppc-dev/2008-July/059686.html)

===

Hello,

today, I was debugging a kernel crash on a board with a MPC5200B using
2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:

static irqreturn_t mpc52xx_fec_interrupt(int irq, void *dev_id)
{
[...]
	/* on fifo error, soft-reset fec */
	if (ievent & (FEC_IEVENT_RFIFO_ERROR | FEC_IEVENT_XFIFO_ERROR)) {

		if (net_ratelimit() && (ievent & FEC_IEVENT_RFIFO_ERROR))
			dev_warn(&dev->dev, "FEC_IEVENT_RFIFO_ERROR\n");
		if (net_ratelimit() && (ievent & FEC_IEVENT_XFIFO_ERROR))
			dev_warn(&dev->dev, "FEC_IEVENT_XFIFO_ERROR\n");

		mpc52xx_fec_reset(dev);

		netif_wake_queue(dev);
		return IRQ_HANDLED;
	}
[...]
}

Calling mpc52xx_fec_reset() from interrupt context is bad, at least
because

a) it calls phy_write, which contains BUG_ON(in_interrupt())
b) it calls mpc52xx_fec_hw_init, which has a delay-loop to check if
   the reset was successful (1..50 us)

I assume the proper thing to do is to set a flag in the ISR and handle
the soft reset later in some other context. Having never dealt with the
network core and its drivers so far, I am not sure which place would be
the right one to perform the soft reset. To not make things worse, I
hope people with more insight to network stuff can deliver a suitable
solution to this problem.

All the best,

Wolfram

===

--
Dipl.-Ing. Wolfram Sang | http://www.pengutronix.de
Pengutronix - Linux Solutions for Science and Industry

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 9+ messages in thread
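The "set a flag in the ISR and handle the soft reset later" idea from the message above can be sketched in plain C. This is a userspace model under loud assumptions: `reset_requested`, `in_isr` and the `model_*` functions are hypothetical names, C11 atomics stand in for whatever primitive a driver would really use, and nothing here is taken from fec_mpc52xx.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Set by the interrupt handler, consumed by the deferred handler. */
static atomic_bool reset_requested;
/* Counts how often the slow reset path actually ran. */
static int resets_done;
/* Models "we are currently inside the interrupt handler". */
static bool in_isr;

/* The slow part: PHY writes and a delay loop in the real driver. */
static void model_fec_reset(void)
{
	assert(!in_isr); /* must not be reached from interrupt context */
	resets_done++;
}

/* Interrupt handler: note the FIFO error and return quickly. */
static void model_fec_interrupt(bool fifo_error)
{
	in_isr = true;
	if (fifo_error)
		atomic_store(&reset_requested, true);
	in_isr = false;
}

/* Called later from a context that is allowed to sleep. */
static void model_deferred_handler(void)
{
	/* Test and clear in one step so a request is handled only once. */
	if (atomic_exchange(&reset_requested, false))
		model_fec_reset();
}
```

Several FIFO errors arriving before the deferred handler runs collapse into one reset. Whether that coalescing is acceptable for the real hardware is a question for the driver authors, not something this sketch can answer.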