bug: mutex_lock() in interrupt context via phy

linuxppc-dev.lists.ozlabs.org archive mirror

* bug: mutex_lock() in interrupt context via phy_stop() in gianfar
@ 2008-07-18 12:10 Sebastian Siewior
  2008-07-21 22:57 ` Nate Case
  2008-07-22  7:54 ` Wolfram Sang
  0 siblings, 2 replies; 9+ messages in thread
From: Sebastian Siewior @ 2008-07-18 12:10 UTC (permalink / raw)
  To: Andy Fleming; +Cc: Nate Case, netdev, linuxppc-dev, Vitaly Bordug, Li Yang

Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
changed phydev->lock from a spinlock into a mutex. As a result, the following
code path was triggered while NFS was unavailable:

|[   21.287359] nfs: server 10.11.3.47 not responding, still trying
|[   38.891373] nfs: server 10.11.3.47 not responding, still trying
|[  148.179592] INFO: task udevd:1762 blocked for more than 120 seconds.
|[  148.185967] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
|[  148.193810] udevd         D 0fef1dd8     0  1762   1761
|[  148.199055] Call Trace:
|[  148.201504] [cecdda80] [c00071e4] __switch_to+0x6c/0x84
|[  148.206764] [cecddaa0] [c025973c] schedule+0x46c/0x4cc
|[  148.211937] [cecddad0] [c00f3d84] nfs_wait_schedule+0x24/0x38
|[  148.217712] [cecddae0] [c0259b74] __wait_on_bit_lock+0x68/0xcc
|[  148.223576] [cecddb00] [c0259c4c] out_of_line_wait_on_bit_lock+0x74/0x88
|[  148.230300] [cecddb50] [c00f3e6c] __nfs_revalidate_inode+0xd4/0x264
|[  148.236597] [cecddc20] [c00f1298] nfs_lookup_revalidate+0x1bc/0x3d4
|[  148.243071] [cecddd80] [c0081db8] do_lookup+0x148/0x1a0
|[  148.248361] [cecdddb0] [c0083bac] __link_path_walk+0x930/0xe24
|[  148.254219] [cecdde00] [c00840e8] path_walk+0x48/0xa8
|[  148.259293] [cecdde30] [c008442c] do_path_lookup+0x160/0x194
|[  148.264982] [cecdde60] [c0084fe0] __path_lookup_intent_open+0x58/0xa4
|[  148.271444] [cecdde80] [c007ea54] open_exec+0x2c/0xdc
|[  148.276525] [cecddef0] [c007efa4] do_execve+0x58/0x1c4
|[  148.281704] [cecddf20] [c0007568] sys_execve+0x58/0x84
|[  148.286873] [cecddf40] [c000df58] ret_from_syscall+0x0/0x3c
|[  169.651632] INFO: task udevsettle:1053 blocked for more than 120 seconds.

Some more of these follow, and now the interesting part:

|[  194.859659] NETDEV WATCHDOG: eth0: transmit timed out
|[  194.864733] BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
|[  194.875529] in_atomic():1, irqs_disabled():0
|[  194.879805] Call Trace:
|[  194.882250] [c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
|[  194.888649] [c0383db0] [c001e938] __might_sleep+0xe0/0xf4
|[  194.894069] [c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
|[  194.899234] [c0383de0] [c019005c] phy_stop+0x20/0x70
|[  194.904234] [c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
|[  194.909305] [c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
|[  194.914638] [c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
|[  194.920064] [c0383e30] [c002f93c] run_timer_softirq+0x148/0x1c8
|[  194.926008] [c0383e60] [c002b084] __do_softirq+0x5c/0xc4
|[  194.931350] [c0383e80] [c00046fc] do_softirq+0x3c/0x54
|[  194.936515] [c0383e90] [c002ac60] irq_exit+0x3c/0x5c
|[  194.941499] [c0383ea0] [c000b378] timer_interrupt+0xe0/0xf8
|[  194.947097] [c0383ec0] [c000e5ac] ret_from_except+0x0/0x18
|[  194.952610] [c0383f80] [c000804c] cpu_idle+0xcc/0xdc
|[  194.957592] [c0383fa0] [c025c07c] etext+0x7c/0x90
|[  194.962322] [c0383fc0] [c0338960] start_kernel+0x294/0x2a8
|[  194.967839] [c0383ff0] [c00003dc] skpinv+0x304/0x340
|[  194.972833] ------------[ cut here ]------------
|[  194.977450] Badness at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:134
|[  194.984589] NIP: c025a268 LR: c025a250 CTR: c017e224
|[  194.989557] REGS: c0383cf0 TRAP: 0700   Not tainted  (2.6.26)
|[  194.995302] MSR: 00029000 <EE,ME>  CR: 28002022  XER: 00000000
|[  195.001167] TASK = c035e500[0] 'swapper' THREAD: c0382000
|[  195.006390] GPR00: 00000000 c0383da0 c035e500 00000001 c035e500 00000010 00000000 c0360000 
|[  195.014798] GPR08: 00000000 c0390000 00000001 c0360000 00006353 628a87a2 0ffe8600 00000000 
|[  195.023206] GPR16: cab54ee3 00000000 00000000 0ffe7384 00000000 00000000 0ff904a0 00000000 
|[  195.031612] GPR24: 00000000 00000000 c038e5a4 d1058000 c035e500 cf86b570 cf9c3888 cf9c3888 
|[  195.040199] NIP [c025a268] __mutex_lock_slowpath+0x44/0x1f4
|[  195.045783] LR [c025a250] __mutex_lock_slowpath+0x2c/0x1f4
|[  195.051277] Call Trace:
|[  195.053721] [c0383da0] [cf9c3888] 0xcf9c3888 (unreliable)
|[  195.059146] [c0383de0] [c019005c] phy_stop+0x20/0x70
|[  195.064135] [c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
|[  195.069202] [c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
|[  195.074529] [c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
|[  195.079946] [c0383e30] [c002f93c] run_timer_softirq+0x148/0x1c8
|[  195.085885] [c0383e60] [c002b084] __do_softirq+0x5c/0xc4
|[  195.091219] [c0383e80] [c00046fc] do_softirq+0x3c/0x54
|[  195.096374] [c0383e90] [c002ac60] irq_exit+0x3c/0x5c
|[  195.101353] [c0383ea0] [c000b378] timer_interrupt+0xe0/0xf8
|[  195.106944] [c0383ec0] [c000e5ac] ret_from_except+0x0/0x18
|[  195.112447] [c0383f80] [c000804c] cpu_idle+0xcc/0xdc
|[  195.117426] [c0383fa0] [c025c07c] etext+0x7c/0x90
|[  195.122147] [c0383fc0] [c0338960] start_kernel+0x294/0x2a8
|[  195.127655] [c0383ff0] [c00003dc] skpinv+0x304/0x340
|[  195.132633] Instruction dump:
|[  195.135422] 90010044 7c5c1378 8009000c 5409012f 41a20024 4bee78dd 2f830000 419e0018 
|[  195.143222] 3d20c039 80098714 2f800000 409e0008 <0fe00000> 7fc000a6 7c000146 801f0024 

I found that the same code path may be triggered in
- drivers/net/ucc_geth.c
- drivers/net/fec_mpc52xx.c
- drivers/net/fs_enet/fs_enet-main.c

Other drivers call phy_stop() in ->close only.
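
The failure mode can be modelled in userspace. The following is a minimal
sketch, not kernel code: atomic_depth, model_might_sleep() and
model_mutex_lock() are hypothetical stand-ins for the kernel's preempt count,
__might_sleep() and mutex_lock(). It only illustrates why taking a sleeping
lock from the watchdog timer (softirq) path trips the check, while the same
call from ->close does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
static int atomic_depth;      /* > 0 while "in softirq/atomic context" */
static int sleep_violations;  /* sleeping calls made from atomic context */

/* Stand-in for __might_sleep(): count a violation instead of printing BUG. */
static void model_might_sleep(void)
{
	if (atomic_depth > 0)
		sleep_violations++;
}

/* Stand-in for mutex_lock(): may sleep, so it checks the context first. */
static void model_mutex_lock(bool *locked)
{
	model_might_sleep();
	assert(!*locked);	/* single-threaded model: lock must be free */
	*locked = true;
}

static void model_mutex_unlock(bool *locked)
{
	*locked = false;
}

/* Stand-in for phy_stop(): takes the (now sleeping) PHY lock. */
static void model_phy_stop(bool *phy_lock)
{
	model_mutex_lock(phy_lock);
	/* ... change PHY state ... */
	model_mutex_unlock(phy_lock);
}

/* Process context, e.g. ->close: sleeping is allowed. */
static void model_close(bool *phy_lock)
{
	model_phy_stop(phy_lock);
}

/* Timer softirq, e.g. the TX watchdog: sleeping is not allowed. */
static void model_tx_timeout(bool *phy_lock)
{
	atomic_depth++;
	model_phy_stop(phy_lock);
	atomic_depth--;
}
```

In this model, model_close() leaves sleep_violations untouched, while
model_tx_timeout() increments it once, which corresponds to the
"sleeping function called from invalid context" report above.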

Sebastian


* Re: bug: mutex_lock() in interrupt context via phy_stop() in gianfar
  2008-07-18 12:10 bug: mutex_lock() in interrupt context via phy_stop() in gianfar Sebastian Siewior
@ 2008-07-21 22:57 ` Nate Case
  2008-07-22 20:59   ` Sebastian Siewior
  2008-07-23 22:12   ` bug: mutex_lock() in interrupt context via phy_stop() " Benjamin Herrenschmidt
  2008-07-22  7:54 ` Wolfram Sang
  1 sibling, 2 replies; 9+ messages in thread
From: Nate Case @ 2008-07-21 22:57 UTC (permalink / raw)
  To: Sebastian Siewior; +Cc: netdev, linuxppc-dev, Vitaly Bordug, Li Yang

On Fri, 2008-07-18 at 14:10 +0200, Sebastian Siewior wrote:
> Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
> changed the phydev->lock from spinlock into a mutex. Now, the following
> code path got triggered while NFS was unavailable:
> 
[snip]
> |[  194.864733] BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
> |[  194.875529] in_atomic():1, irqs_disabled():0
> |[  194.879805] Call Trace:
> |[  194.882250] [c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
> |[  194.888649] [c0383db0] [c001e938] __might_sleep+0xe0/0xf4
> |[  194.894069] [c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
> |[  194.899234] [c0383de0] [c019005c] phy_stop+0x20/0x70
> |[  194.904234] [c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
> |[  194.909305] [c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
> |[  194.914638] [c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144

Hmm..  I'm not sure what the best solution is to this.  Make the
stop_gfar() call happen in a workqueue, and make a similar change to
ucc_geth, fec_mpc52xx, and fs_enet? Modify phy_stop() to do the work in
a workqueue conditionally if in interrupt context?  Between these two
I'd lean toward the latter.

Does anyone have any better ideas?
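
The second option (deferring inside phy_stop() when called from interrupt
context) could look roughly like the userspace sketch below. This is only a
model under stated assumptions: model_in_interrupt stands in for
in_interrupt(), and struct model_phy, model_phy_stop() and
model_phy_stop_work() are hypothetical names, not the phylib API.

```c
#include <stdbool.h>

/* Hypothetical model of the second option; not the real phy_stop(). */
struct model_phy {
	bool halted;		/* PHY has actually been stopped */
	bool stop_deferred;	/* stop was queued for later */
};

static bool model_in_interrupt;	/* stand-in for in_interrupt() */

/* Returns true if the PHY was stopped before returning, false if deferred. */
static bool model_phy_stop(struct model_phy *phy)
{
	if (model_in_interrupt) {
		phy->stop_deferred = true;
		return false;
	}
	phy->halted = true;
	return true;
}

/* Worker side: finish any deferred stop from process context. */
static void model_phy_stop_work(struct model_phy *phy)
{
	if (phy->stop_deferred) {
		phy->stop_deferred = false;
		phy->halted = true;
	}
}
```

One consequence visible in the model: a caller in interrupt context cannot
assume the PHY is halted when model_phy_stop() returns, so any follow-up
work that depends on a stopped PHY would have to be deferred as well.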

-- 
Nate Case <ncase@xes-inc.com>


* Re: bug: mutex_lock() in interrupt context via phy_stop() in gianfar
  2008-07-21 22:57 ` Nate Case
@ 2008-07-22 20:59   ` Sebastian Siewior
  2008-07-23 20:03     ` [PATCH / RFC] net: don't grab a mutex within a timer context " Sebastian Siewior
  2008-07-23 22:12   ` bug: mutex_lock() in interrupt context via phy_stop() " Benjamin Herrenschmidt
  1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Siewior @ 2008-07-22 20:59 UTC (permalink / raw)
  To: Nate Case; +Cc: netdev, Sebastian Siewior, linuxppc-dev, Vitaly Bordug, Li Yang

* Nate Case | 2008-07-21 17:57:08 [-0500]:

>On Fri, 2008-07-18 at 14:10 +0200, Sebastian Siewior wrote:
>> Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
>> changed the phydev->lock from spinlock into a mutex. Now, the following
>> code path got triggered while NFS was unavailable:
>> 
>[snip]
>> |[  194.864733] BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
>> |[  194.875529] in_atomic():1, irqs_disabled():0
>> |[  194.879805] Call Trace:
>> |[  194.882250] [c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
>> |[  194.888649] [c0383db0] [c001e938] __might_sleep+0xe0/0xf4
>> |[  194.894069] [c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
>> |[  194.899234] [c0383de0] [c019005c] phy_stop+0x20/0x70
>> |[  194.904234] [c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
>> |[  194.909305] [c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
>> |[  194.914638] [c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
>
>Hmm..  I'm not sure what the best solution is to this.  Make the
>stop_gfar() call happen in a workqueue, and make a similar change to
>ucc_geth, fec_mpc52xx, and fs_enet? Modify phy_stop() to do the work in
>a workqueue conditionally if in interrupt context?  Between these two
>I'd lean toward the latter.
>
>Does anyone have any better ideas?

If I look at tg3.c, then exactly this is done there. Others call it only in
close(). I guess this depends very much on the driver's logic :)
If nobody minds, then I would take tg3.c as a good example and move the
timeout path into a workqueue.

Sebastian


* [PATCH / RFC] net: don't grab a mutex within a timer context in gianfar
  2008-07-22 20:59   ` Sebastian Siewior
@ 2008-07-23 20:03     ` Sebastian Siewior
  2008-07-25 14:16       ` Nate Case
  2008-07-25 19:02       ` Andy Fleming
  0 siblings, 2 replies; 9+ messages in thread
From: Sebastian Siewior @ 2008-07-23 20:03 UTC (permalink / raw)
  To: Andy Fleming
  Cc: Nate Case, netdev, linuxppc-dev, Vitaly Bordug, Li Yang,
	Jeff Garzik

From: Sebastian Siewior <bigeasy@linutronix.de>

I got the following backtrace while the network was unavailable:

|NETDEV WATCHDOG: eth0: transmit timed out
|BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
|in_atomic():1, irqs_disabled():0
|Call Trace:
|[c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
|[c0383db0] [c001e938] __might_sleep+0xe0/0xf4
|[c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
|[c0383de0] [c019005c] phy_stop+0x20/0x70
|[c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
|[c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
|[c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
|[c0383e30] [c002f93c] run_timer_softirq+0x148/0x1c8
|[c0383e60] [c002b084] __do_softirq+0x5c/0xc4
|[c0383e80] [c00046fc] do_softirq+0x3c/0x54
|[c0383e90] [c002ac60] irq_exit+0x3c/0x5c
|[c0383ea0] [c000b378] timer_interrupt+0xe0/0xf8
|[c0383ec0] [c000e5ac] ret_from_except+0x0/0x18
|[c0383f80] [c000804c] cpu_idle+0xcc/0xdc
|[c0383fa0] [c025c07c] etext+0x7c/0x90
|[c0383fc0] [c0338960] start_kernel+0x294/0x2a8
|[c0383ff0] [c00003dc] skpinv+0x304/0x340
|------------[ cut here ]------------

The phylock was once a spinlock but got changed into a mutex via
commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]

Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
---
bug report @ http://marc.info/?l=linux-netdev&m=121638307116389&w=2
I moved it into a workqueue; this is what tg3 does.
I would convert the other three drivers unless $dude suggests a better
method or somebody else takes care of it....

 drivers/net/gianfar.c |   22 ++++++++++++++++++----
 drivers/net/gianfar.h |    2 ++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 25bdd08..caa6cbd 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -112,6 +112,7 @@ const char gfar_driver_version[] = "1.3";
 
 static int gfar_enet_open(struct net_device *dev);
 static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev);
+static void gfar_reset_task(struct work_struct *work);
 static void gfar_timeout(struct net_device *dev);
 static int gfar_close(struct net_device *dev);
 struct sk_buff *gfar_new_skb(struct net_device *dev);
@@ -216,6 +217,7 @@ static int gfar_probe(struct platform_device *pdev)
 
 	spin_lock_init(&priv->txlock);
 	spin_lock_init(&priv->rxlock);
+	INIT_WORK(&priv->reset_task, gfar_reset_task);
 
 	platform_set_drvdata(pdev, dev);
 
@@ -1132,6 +1134,7 @@ static int gfar_close(struct net_device *dev)
 	napi_disable(&priv->napi);
 #endif
 
+	cancel_work_sync(&priv->reset_task);
 	stop_gfar(dev);
 
 	/* Disconnect from the PHY */
@@ -1246,13 +1249,16 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
-/* gfar_timeout gets called when a packet has not been
+/* gfar_reset_task gets scheduled when a packet has not been
  * transmitted after a set amount of time.
  * For now, assume that clearing out all the structures, and
- * starting over will fix the problem. */
-static void gfar_timeout(struct net_device *dev)
+ * starting over will fix the problem.
+ */
+static void gfar_reset_task(struct work_struct *work)
 {
-	dev->stats.tx_errors++;
+	struct gfar_private *priv = container_of(work, struct gfar_private,
+			reset_task);
+	struct net_device *dev = priv->dev;
 
 	if (dev->flags & IFF_UP) {
 		stop_gfar(dev);
@@ -1262,6 +1268,14 @@ static void gfar_timeout(struct net_device *dev)
 	netif_schedule(dev);
 }
 
+static void gfar_timeout(struct net_device *dev)
+{
+	struct gfar_private *priv = netdev_priv(dev);
+
+	dev->stats.tx_errors++;
+	schedule_work(&priv->reset_task);
+}
+
 /* Interrupt Handler for Transmit complete */
 static int gfar_clean_tx_ring(struct net_device *dev)
 {
diff --git a/drivers/net/gianfar.h b/drivers/net/gianfar.h
index 27f37c8..d983a6a 100644
--- a/drivers/net/gianfar.h
+++ b/drivers/net/gianfar.h
@@ -759,6 +759,8 @@ struct gfar_private {
 
 	uint32_t msg_enable;
 
+	struct work_struct reset_task;
+
 	/* Network Statistics */
 	struct gfar_extra_stats extra_stats;
 };
-- 
1.5.5.2
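
The control flow of the patch can be exercised in userspace with the
single-threaded sketch below. It is a model under stated assumptions, not
the kernel workqueue API: struct model_work, model_schedule_work(),
model_run_pending() and model_cancel_work() are hypothetical stand-ins for
struct work_struct, schedule_work(), the worker thread and
cancel_work_sync(). The real cancel_work_sync() also waits for a running
work item, which a single-threaded model cannot show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel workqueue API. */
struct model_work {
	void (*fn)(struct model_work *work);
	bool pending;
};

struct model_priv {
	struct model_work reset_task;
	unsigned long tx_errors;
	unsigned int resets;
	bool up;
};

/* Recover the containing structure from the embedded work item. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Queue the work once; a second request while pending is a no-op. */
static bool model_schedule_work(struct model_work *work)
{
	if (work->pending)
		return false;
	work->pending = true;
	return true;
}

/* Runs later in "process context", where sleeping locks are fine. */
static void model_run_pending(struct model_work *work)
{
	if (!work->pending)
		return;
	assert(work->fn != NULL);
	work->pending = false;
	work->fn(work);
}

/* Drop a queued but not yet started reset, as ->close has to. */
static void model_cancel_work(struct model_work *work)
{
	work->pending = false;
}

static void model_reset_task(struct model_work *work)
{
	struct model_priv *priv =
		model_container_of(work, struct model_priv, reset_task);

	if (priv->up)
		priv->resets++;	/* stands in for stop + restart of the device */
}

/* Timeout handler: count the error and defer everything else. */
static void model_timeout(struct model_priv *priv)
{
	priv->tx_errors++;
	model_schedule_work(&priv->reset_task);
}
```

In the model, two timeouts before the worker runs are counted as two TX
errors but cause a single reset, and a reset that is cancelled before it
starts never runs.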


* Re: [PATCH / RFC] net: don't grab a mutex within a timer context in gianfar
  2008-07-23 20:03     ` [PATCH / RFC] net: don't grab a mutex within a timer context " Sebastian Siewior
@ 2008-07-25 14:16       ` Nate Case
  2008-07-25 19:02       ` Andy Fleming
  1 sibling, 0 replies; 9+ messages in thread
From: Nate Case @ 2008-07-25 14:16 UTC (permalink / raw)
  To: Sebastian Siewior
  Cc: netdev, linuxppc-dev, Vitaly Bordug, Li Yang, Jeff Garzik

On Wed, 2008-07-23 at 22:03 +0200, Sebastian Siewior wrote:
> I moved it into a workqueue, this is what tg3 does.
> I would convert the other three drivers unless $dude suggests a better
> method or somebody else takes care....
> 
>  drivers/net/gianfar.c |   22 ++++++++++++++++++----
>  drivers/net/gianfar.h |    2 ++
>  2 files changed, 20 insertions(+), 4 deletions(-)

This looks good to me.

-- 
Nate Case <ncase@xes-inc.com>


* Re: [PATCH / RFC] net: don't grab a mutex within a timer context in gianfar
  2008-07-23 20:03     ` [PATCH / RFC] net: don't grab a mutex within a timer context " Sebastian Siewior
  2008-07-25 14:16       ` Nate Case
@ 2008-07-25 19:02       ` Andy Fleming
  1 sibling, 0 replies; 9+ messages in thread
From: Andy Fleming @ 2008-07-25 19:02 UTC (permalink / raw)
  To: Sebastian Siewior
  Cc: Nate Case, netdev, linuxppc-dev, Vitaly Bordug, Li Yang,
	Jeff Garzik


On Jul 23, 2008, at 16:03, Sebastian Siewior wrote:

> From: Sebastian Siewior <bigeasy@linutronix.de>
>
> I got the following backtrace while the network was unavailable:
>
> |NETDEV WATCHDOG: eth0: transmit timed out
> |BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
> |in_atomic():1, irqs_disabled():0
> |Call Trace:
> |[c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
> |[c0383db0] [c001e938] __might_sleep+0xe0/0xf4
> |[c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
> |[c0383de0] [c019005c] phy_stop+0x20/0x70
> |[c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
> |[c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
> |[c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
> |[c0383e30] [c002f93c] run_timer_softirq+0x148/0x1c8
> |[c0383e60] [c002b084] __do_softirq+0x5c/0xc4
> |[c0383e80] [c00046fc] do_softirq+0x3c/0x54
> |[c0383e90] [c002ac60] irq_exit+0x3c/0x5c
> |[c0383ea0] [c000b378] timer_interrupt+0xe0/0xf8
> |[c0383ec0] [c000e5ac] ret_from_except+0x0/0x18
> |[c0383f80] [c000804c] cpu_idle+0xcc/0xdc
> |[c0383fa0] [c025c07c] etext+0x7c/0x90
> |[c0383fc0] [c0338960] start_kernel+0x294/0x2a8
> |[c0383ff0] [c00003dc] skpinv+0x304/0x340
> |------------[ cut here ]------------
>
> The phylock was once a spinlock but got changed into a mutex via
> commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
>
> Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
> ---


Looks good to me.  Thanks for taking care of this.

Acked-by: Andy Fleming <afleming@freescale.com>


* Re: bug: mutex_lock() in interrupt context via phy_stop() in gianfar
  2008-07-21 22:57 ` Nate Case
  2008-07-22 20:59   ` Sebastian Siewior
@ 2008-07-23 22:12   ` Benjamin Herrenschmidt
  2008-07-24  7:27     ` Sebastian Siewior
  1 sibling, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-23 22:12 UTC (permalink / raw)
  To: Nate Case; +Cc: linuxppc-dev, netdev, Li Yang, Vitaly Bordug, Sebastian Siewior

On Mon, 2008-07-21 at 17:57 -0500, Nate Case wrote:
> On Fri, 2008-07-18 at 14:10 +0200, Sebastian Siewior wrote:
> > Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
> > changed the phydev->lock from spinlock into a mutex. Now, the following
> > code path got triggered while NFS was unavailable:
> > 
> [snip]
> > |[  194.864733] BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
> > |[  194.875529] in_atomic():1, irqs_disabled():0
> > |[  194.879805] Call Trace:
> > |[  194.882250] [c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
> > |[  194.888649] [c0383db0] [c001e938] __might_sleep+0xe0/0xf4
> > |[  194.894069] [c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
> > |[  194.899234] [c0383de0] [c019005c] phy_stop+0x20/0x70
> > |[  194.904234] [c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
> > |[  194.909305] [c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
> > |[  194.914638] [c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
> 
> Hmm..  I'm not sure what the best solution is to this.  Make the
> stop_gfar() call happen in a workqueue, and make a similar change to
> ucc_geth, fec_mpc52xx, and fs_enet? Modify phy_stop() to do the work in
> a workqueue conditionally if in interrupt context?  Between these two
> I'd lean toward the latter.
> 
> Does anyone have any better ideas?

Move the reset task to a workqueue.

Cheers,
Ben.


* Re: bug: mutex_lock() in interrupt context via phy_stop() in gianfar
  2008-07-23 22:12   ` bug: mutex_lock() in interrupt context via phy_stop() " Benjamin Herrenschmidt
@ 2008-07-24  7:27     ` Sebastian Siewior
  0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Siewior @ 2008-07-24  7:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: netdev, Li Yang, Nate Case, Vitaly Bordug, linuxppc-dev

* Benjamin Herrenschmidt | 2008-07-24 08:12:48 [+1000]:

>On Mon, 2008-07-21 at 17:57 -0500, Nate Case wrote:
>> On Fri, 2008-07-18 at 14:10 +0200, Sebastian Siewior wrote:
>> > Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
>> > changed the phydev->lock from spinlock into a mutex. Now, the following
>> > code path got triggered while NFS was unavailable:
>> > 
>> [snip]
>> > |[  194.864733] BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
>> > |[  194.875529] in_atomic():1, irqs_disabled():0
>> > |[  194.879805] Call Trace:
>> > |[  194.882250] [c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
>> > |[  194.888649] [c0383db0] [c001e938] __might_sleep+0xe0/0xf4
>> > |[  194.894069] [c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
>> > |[  194.899234] [c0383de0] [c019005c] phy_stop+0x20/0x70
>> > |[  194.904234] [c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
>> > |[  194.909305] [c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
>> > |[  194.914638] [c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
>> 
>> Hmm..  I'm not sure what the best solution is to this.  Make the
>> stop_gfar() call happen in a workqueue, and make a similar change to
>> ucc_geth, fec_mpc52xx, and fs_enet? Modify phy_stop() to do the work in
>> a workqueue conditionally if in interrupt context?  Between these two
>> I'd lean toward the latter.
>> 
>> Does anyone have any better ideas?
>
>Move the reset task to a workqueue.

Done in [1], Ben.

[1] http://marc.info/?l=linux-netdev&m=121684347609062&w=2

>Cheers,
>Ben.

Sebastian


* Re: bug: mutex_lock() in interrupt context via phy_stop() in gianfar
  2008-07-18 12:10 bug: mutex_lock() in interrupt context via phy_stop() in gianfar Sebastian Siewior
  2008-07-21 22:57 ` Nate Case
@ 2008-07-22  7:54 ` Wolfram Sang
  1 sibling, 0 replies; 9+ messages in thread
From: Wolfram Sang @ 2008-07-22  7:54 UTC (permalink / raw)
  To: Sebastian Siewior; +Cc: Nate Case, netdev, linuxppc-dev, Vitaly Bordug, Li Yang


Hi,

On Fri, Jul 18, 2008 at 02:10:08PM +0200, Sebastian Siewior wrote:
> Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
> changed the phydev->lock from spinlock into a mutex. Now, the following
> code path got triggered while NFS was unavailable:
[...]
> I found out that the same code path may be trigger in
> - drivers/net/ucc_geth.c
> - drivers/net/fec_mpc52xx.c

Recently, I described a (I think) similar problem:
(http://ozlabs.org/pipermail/linuxppc-dev/2008-July/059686.html)

===

Hello,

today, I was debugging a kernel crash on a board with a MPC5200B using
2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:

static irqreturn_t mpc52xx_fec_interrupt(int irq, void *dev_id)
{
[...]
	/* on fifo error, soft-reset fec */
	if (ievent & (FEC_IEVENT_RFIFO_ERROR | FEC_IEVENT_XFIFO_ERROR)) {

		if (net_ratelimit() && (ievent & FEC_IEVENT_RFIFO_ERROR))
			dev_warn(&dev->dev, "FEC_IEVENT_RFIFO_ERROR\n");
		if (net_ratelimit() && (ievent & FEC_IEVENT_XFIFO_ERROR))
			dev_warn(&dev->dev, "FEC_IEVENT_XFIFO_ERROR\n");

		mpc52xx_fec_reset(dev);

		netif_wake_queue(dev);
		return IRQ_HANDLED;
	}
[...]
}

Calling mpc52xx_fec_reset() from interrupt context is bad, at least
because

a) it calls phy_write, which contains BUG_ON(in_interrupt())
b) it calls mpc52xx_fec_hw_init, which has a delay-loop to check
   if the reset was successful (1..50 us)

I assume the proper thing to do is to set a flag in the ISR and handle
the soft reset later in some other context. Having never dealt with the
network core and its drivers so far, I am not sure which place would be
the right one to perform the soft reset. To not make things worse, I
hope people with more insight to network stuff can deliver a suitable
solution to this problem.
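
The "set a flag in the ISR and handle the soft reset later" idea can be
sketched in userspace with a C11 atomic flag. This is a minimal model under
stated assumptions: reset_requested, model_fifo_error_irq() and
model_handle_deferred_reset() are hypothetical names and do not exist in
fec_mpc52xx.c.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: the "ISR" only records that a reset is needed. */
static atomic_bool reset_requested;	/* zero-initialized: no request */
static unsigned int soft_resets;

/* Interrupt side: no PHY access, no delay loop, just set the flag. */
static void model_fifo_error_irq(void)
{
	atomic_store(&reset_requested, true);
}

/* Later, in a context that may sleep: consume the flag and reset once. */
static bool model_handle_deferred_reset(void)
{
	if (!atomic_exchange(&reset_requested, false))
		return false;
	soft_resets++;	/* stands in for the actual soft reset */
	return true;
}
```

Several FIFO errors arriving before the deferred handler runs collapse into
a single reset in this model, because the flag is consumed with one atomic
exchange.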

All the best,

   Wolfram

===

-- 
  Dipl.-Ing. Wolfram Sang | http://www.pengutronix.de
 Pengutronix - Linux Solutions for Science and Industry


