linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR
@ 2008-07-10 12:39 Wolfram Sang
  2008-07-10 15:31 ` Grant Likely
  2008-07-27  1:31 ` Grant Likely
  0 siblings, 2 replies; 6+ messages in thread
From: Wolfram Sang @ 2008-07-10 12:39 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: domen.puncer

[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]

Hello,

today, I was debugging a kernel crash on a board with a MPC5200B using
2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:

static irqreturn_t mpc52xx_fec_interrupt(int irq, void *dev_id)
{
[...]
	/* on fifo error, soft-reset fec */
	if (ievent & (FEC_IEVENT_RFIFO_ERROR | FEC_IEVENT_XFIFO_ERROR)) {

		if (net_ratelimit() && (ievent & FEC_IEVENT_RFIFO_ERROR))
			dev_warn(&dev->dev, "FEC_IEVENT_RFIFO_ERROR\n");
		if (net_ratelimit() && (ievent & FEC_IEVENT_XFIFO_ERROR))
			dev_warn(&dev->dev, "FEC_IEVENT_XFIFO_ERROR\n");

		mpc52xx_fec_reset(dev);

		netif_wake_queue(dev);
		return IRQ_HANDLED;
	}
[...]
}

Calling mpc52xx_fec_reset() from interrupt context is bad, at least
because

a) it calls phy_write, which contains BUG_ON(in_interrupt())
b) it calls mpc52xx_fec_hw_init, which has a delay-loop to check
   if the reset was successful (1..50 us)

I assume the proper thing to do is to set a flag in the ISR and handle
the soft reset later in some other context. Having never dealt with the
network core and its drivers so far, I am not sure which place would be
the right one to perform the soft reset. To not make things worse, I
hope people with more insight to network stuff can deliver a suitable
solution to this problem.

All the best,

   Wolfram

-- 
  Dipl.-Ing. Wolfram Sang | http://www.pengutronix.de
 Pengutronix - Linux Solutions for Science and Industry

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR
  2008-07-10 12:39 [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR Wolfram Sang
@ 2008-07-10 15:31 ` Grant Likely
  2008-08-13 13:48   ` Juergen Beisert
  2008-07-27  1:31 ` Grant Likely
  1 sibling, 1 reply; 6+ messages in thread
From: Grant Likely @ 2008-07-10 15:31 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: linuxppc-dev, domen.puncer

On Thu, Jul 10, 2008 at 02:39:09PM +0200, Wolfram Sang wrote:
> Hello,
> 
> today, I was debugging a kernel crash on a board with a MPC5200B using
> 2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:
<snip>
> I assume the proper thing to do is to set a flag in the ISR and handle
> the soft reset later in some other context. Having never dealt with the
> network core and its drivers so far, I am not sure which place would be
> the right one to perform the soft reset. To not make things worse, I
> hope people with more insight to network stuff can deliver a suitable
> solution to this problem.

Thanks for the bug report.  I'll take a look.

g.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR
  2008-07-10 12:39 [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR Wolfram Sang
  2008-07-10 15:31 ` Grant Likely
@ 2008-07-27  1:31 ` Grant Likely
  1 sibling, 0 replies; 6+ messages in thread
From: Grant Likely @ 2008-07-27  1:31 UTC (permalink / raw)
  To: Wolfram Sang; +Cc: linuxppc-dev, domen.puncer

On Thu, Jul 10, 2008 at 02:39:09PM +0200, Wolfram Sang wrote:
> Hello,
> 
> today, I was debugging a kernel crash on a board with a MPC5200B using
> 2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:
> 
> static irqreturn_t mpc52xx_fec_interrupt(int irq, void *dev_id)
> {
> [...]
> 	/* on fifo error, soft-reset fec */
> 	if (ievent & (FEC_IEVENT_RFIFO_ERROR | FEC_IEVENT_XFIFO_ERROR)) {
> 
> 		if (net_ratelimit() && (ievent & FEC_IEVENT_RFIFO_ERROR))
> 			dev_warn(&dev->dev, "FEC_IEVENT_RFIFO_ERROR\n");
> 		if (net_ratelimit() && (ievent & FEC_IEVENT_XFIFO_ERROR))
> 			dev_warn(&dev->dev, "FEC_IEVENT_XFIFO_ERROR\n");
> 
> 		mpc52xx_fec_reset(dev);
> 
> 		netif_wake_queue(dev);
> 		return IRQ_HANDLED;
> 	}
> [...]
> }
> 
> Calling mpc52xx_fec_reset() from interrupt context is bad, at least
> because
> 
> a) it calls phy_write, which contains BUG_ON(in_interrupt())
> b) it calls mpc52xx_fec_hw_init, which has a delay-loop to check
>    if the reset was successful (1..50 us)
> 
> I assume the proper thing to do is to set a flag in the ISR and handle
> the soft reset later in some other context. Having never dealt with the
> network core and its drivers so far, I am not sure which place would be
> the right one to perform the soft reset. To not make things worse, I
> hope people with more insight to network stuff can deliver a suitable
> solution to this problem.
> 
> All the best,
> 
>    Wolfram

Hi Wolfram,

I'm not an expert on the FEC driver, but I'll take a look when I get back
home to Calgary.  There has also been other churn in this area and I
need to get myself up to speed.

g.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR
  2008-07-10 15:31 ` Grant Likely
@ 2008-08-13 13:48   ` Juergen Beisert
  2008-08-13 14:12     ` Wolfgang Grandegger
  0 siblings, 1 reply; 6+ messages in thread
From: Juergen Beisert @ 2008-08-13 13:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Wolfram Sang, domen.puncer

On Donnerstag, 10. Juli 2008, Grant Likely wrote:
> On Thu, Jul 10, 2008 at 02:39:09PM +0200, Wolfram Sang wrote:
> > Hello,
> >
> > today, I was debugging a kernel crash on a board with a MPC5200B using
> > 2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:
>
> <snip>
>
> > I assume the proper thing to do is to set a flag in the ISR and handle
> > the soft reset later in some other context. Having never dealt with the
> > network core and its drivers so far, I am not sure which place would be
> > the right one to perform the soft reset. To not make things worse, I
> > hope people with more insight to network stuff can deliver a suitable
> > solution to this problem.
>
> Thanks for the bug report.  I'll take a look.

Some update:

Enabling XLB pipelining let occure this error less often. Kernel disables t=
his=20
feature by default yet.
The comment talks about "cfr errate 292." that is valid for MPC5200A, but=20
_it_seems_ no longer for MPC5200B. Has anybody experience if we can enablin=
g=20
pipelining on MPC5200B CPUs without triggering this bug?

We currently are playing with this setting:

Index: arch/powerpc/platforms/52xx/mpc52xx_common.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
=2D-- arch/powerpc/platforms/52xx/mpc52xx_common.c.orig
+++ arch/powerpc/platforms/52xx/mpc52xx_common.c
@@ -99,11 +99,11 @@
 	out_be32(&xlb->master_pri_enable, 0xff);
 	out_be32(&xlb->master_priority, 0x11111111);
=20
=2D	/* Disable XLB pipelining
=2D	 * (cfr errate 292. We could do this only just before ATA PIO
=2D	 *  transaction and re-enable it afterwards ...)
+	/*
+	 * Enable pipelining, fixes FEC problems. The previous workaround seems
+	 * not needed, as we have an MPC5200B (not A).
 	 */
=2D	out_be32(&xlb->config, in_be32(&xlb->config) | MPC52xx_XLB_CFG_PLDIS);
+	out_be32(&xlb->config, in_be32(&xlb->config) & ~MPC52xx_XLB_CFG_PLDIS);
=20
 	iounmap(xlb);
 }

jbe

=2D-=20
Dipl.-Ing. Juergen Beisert | http://www.pengutronix.de
=A0Pengutronix - Linux Solutions for Science and Industry
=A0   Handelsregister: Amtsgericht Hildesheim, HRA 2686
=A0 =A0 =A0    Vertretung Sued/Muenchen, Germany
   Phone: +49-8766-939 228 |  Fax: +49-5121-206917-9

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR
  2008-08-13 13:48   ` Juergen Beisert
@ 2008-08-13 14:12     ` Wolfgang Grandegger
  2008-08-15 11:45       ` Wolfram Sang
  0 siblings, 1 reply; 6+ messages in thread
From: Wolfgang Grandegger @ 2008-08-13 14:12 UTC (permalink / raw)
  To: Juergen Beisert; +Cc: linuxppc-dev, Wolfram Sang, domen.puncer

Hi Jürgen,

Juergen Beisert wrote:
> On Donnerstag, 10. Juli 2008, Grant Likely wrote:
>> On Thu, Jul 10, 2008 at 02:39:09PM +0200, Wolfram Sang wrote:
>>> Hello,
>>>
>>> today, I was debugging a kernel crash on a board with a MPC5200B using
>>> 2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:
>> <snip>
>>
>>> I assume the proper thing to do is to set a flag in the ISR and handle
>>> the soft reset later in some other context. Having never dealt with the
>>> network core and its drivers so far, I am not sure which place would be
>>> the right one to perform the soft reset. To not make things worse, I
>>> hope people with more insight to network stuff can deliver a suitable
>>> solution to this problem.
>> Thanks for the bug report.  I'll take a look.
> 
> Some update:
> 
> Enabling XLB pipelining let occure this error less often. Kernel disables this 
> feature by default yet.
> The comment talks about "cfr errate 292." that is valid for MPC5200A, but 
> _it_seems_ no longer for MPC5200B. Has anybody experience if we can enabling 
> pipelining on MPC5200B CPUs without triggering this bug?

No, I haven't...

> We currently are playing with this setting:
> 
> Index: arch/powerpc/platforms/52xx/mpc52xx_common.c
> ===================================================================
> --- arch/powerpc/platforms/52xx/mpc52xx_common.c.orig
> +++ arch/powerpc/platforms/52xx/mpc52xx_common.c
> @@ -99,11 +99,11 @@
>  	out_be32(&xlb->master_pri_enable, 0xff);
>  	out_be32(&xlb->master_priority, 0x11111111);
>  
> -	/* Disable XLB pipelining
> -	 * (cfr errate 292. We could do this only just before ATA PIO
> -	 *  transaction and re-enable it afterwards ...)
> +	/*
> +	 * Enable pipelining, fixes FEC problems. The previous workaround seems
> +	 * not needed, as we have an MPC5200B (not A).
>  	 */
> -	out_be32(&xlb->config, in_be32(&xlb->config) | MPC52xx_XLB_CFG_PLDIS);
> +	out_be32(&xlb->config, in_be32(&xlb->config) & ~MPC52xx_XLB_CFG_PLDIS);
>  
>  	iounmap(xlb);
>  }

...but I prepared a patch to do the reset in the process context. Would be
nice if you could give the patch below a try.

Wolfgang.

===================================================================
From: Wolfgang Grandegger <wg@grandegger.com>
Subject: [PATCH] powerpc: fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR

Calling mpc52xx_fec_reset() in the ISR is a bug. This patch puts a task
for resetting the FEC into the kernel-global workqueue to be processed
lateron savely in process context.

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
---
 drivers/net/fec_mpc52xx.c |   31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

Index: linux-2.6.26/drivers/net/fec_mpc52xx.c
===================================================================
--- linux-2.6.26.orig/drivers/net/fec_mpc52xx.c
+++ linux-2.6.26/drivers/net/fec_mpc52xx.c
@@ -24,6 +24,7 @@
 #include <linux/crc32.h>
 #include <linux/hardirq.h>
 #include <linux/delay.h>
+#include <linux/workqueue.h>
 #include <linux/of_device.h>
 #include <linux/of_platform.h>
 
@@ -57,6 +58,8 @@ struct mpc52xx_fec_priv {
 	struct bcom_task *tx_dmatsk;
 	spinlock_t lock;
 	int msg_enable;
+	struct work_struct reset_task;
+	struct net_device *ndev;
 
 	/* MDIO link details */
 	int phy_addr;
@@ -83,6 +86,19 @@ static int debug = -1;	/* the above defa
 module_param(debug, int, 0);
 MODULE_PARM_DESC(debug, "debugging messages level");
 
+static void mpc52xx_fec_reset_task(struct work_struct *work)
+{
+	struct mpc52xx_fec_priv *priv =
+		container_of(work, struct mpc52xx_fec_priv, reset_task);
+	struct net_device *dev = priv->ndev;
+
+	dev_warn(&dev->dev, "deferred FEC reset due to errors\n");
+
+	mpc52xx_fec_reset(dev);
+
+	netif_wake_queue(dev);
+}
+
 static void mpc52xx_fec_tx_timeout(struct net_device *dev)
 {
 	dev_warn(&dev->dev, "transmit timed out\n");
@@ -355,6 +371,8 @@ static int mpc52xx_fec_close(struct net_
 {
 	struct mpc52xx_fec_priv *priv = netdev_priv(dev);
 
+	flush_scheduled_work();
+
 	netif_stop_queue(dev);
 
 	mpc52xx_fec_stop(dev);
@@ -522,9 +540,9 @@ static irqreturn_t mpc52xx_fec_interrupt
 		if (net_ratelimit() && (ievent & FEC_IEVENT_XFIFO_ERROR))
 			dev_warn(&dev->dev, "FEC_IEVENT_XFIFO_ERROR\n");
 
-		mpc52xx_fec_reset(dev);
-
-		netif_wake_queue(dev);
+		netif_stop_queue(dev);
+		netif_carrier_off(dev);
+		schedule_work(&priv->reset_task);
 		return IRQ_HANDLED;
 	}
 
@@ -934,7 +952,9 @@ mpc52xx_fec_probe(struct of_device *op, 
 
 	priv->t_irq = priv->r_irq = ndev->irq = NO_IRQ; /* IRQ are free for now */
 
-	spin_lock_init(&priv->lock);
+	priv->ndev = ndev;
+ 	spin_lock_init(&priv->lock);
+	INIT_WORK(&priv->reset_task, mpc52xx_fec_reset_task);
 
 	/* ioremap the zones */
 	priv->fec = ioremap(mem.start, sizeof(struct mpc52xx_fec));
@@ -1068,6 +1088,9 @@ mpc52xx_fec_remove(struct of_device *op)
 	ndev = dev_get_drvdata(&op->dev);
 	priv = netdev_priv(ndev);
 
+	if (netif_running(ndev))
+		mpc52xx_fec_close(ndev);
+
 	unregister_netdev(ndev);
 
 	irq_dispose_mapping(ndev->irq);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR
  2008-08-13 14:12     ` Wolfgang Grandegger
@ 2008-08-15 11:45       ` Wolfram Sang
  0 siblings, 0 replies; 6+ messages in thread
From: Wolfram Sang @ 2008-08-15 11:45 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: linuxppc-dev, domen.puncer

[-- Attachment #1: Type: text/plain, Size: 960 bytes --]

Hello Wolfgang (and all),

On Wed, Aug 13, 2008 at 04:12:17PM +0200, Wolfgang Grandegger wrote:

> ...but I prepared a patch to do the reset in the process context. Would be
> nice if you could give the patch below a try.
Will do later. Thanks!

Still, I think it might be useful to discuss if a complete reset
is not overkill anyhow. The documentation says that only the FIFO
and the Bestcom needs to be reset. Or, if we take the "big hammer"
solution, it would be good to audit if this won't cause any
side-effects (do all related states get updated...).

Remember that there lately have been patches removing some improper
usage of netif_* calls; furthermore, I also found some questionable
areas in this code (mails will be sent later). So, this driver
needs some careful attention IMHO.

All the best,

   Wolfram

-- 
  Dipl.-Ing. Wolfram Sang | http://www.pengutronix.de
 Pengutronix - Linux Solutions for Science and Industry

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-08-15 11:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-10 12:39 [BUG] fec_mpc52xx: Don't call mpc52xx_fec_reset() in ISR Wolfram Sang
2008-07-10 15:31 ` Grant Likely
2008-08-13 13:48   ` Juergen Beisert
2008-08-13 14:12     ` Wolfgang Grandegger
2008-08-15 11:45       ` Wolfram Sang
2008-07-27  1:31 ` Grant Likely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).