public inbox for linux-kernel@vger.kernel.org
* Soft lockup in sungem on Netra AC200 when switching interface up
@ 2009-02-06 11:29 Ilkka Virta
  2009-02-06 19:45 ` Alexander Clouter
  2009-02-07  6:01 ` David Miller
  0 siblings, 2 replies; 4+ messages in thread
From: Ilkka Virta @ 2009-02-06 11:29 UTC
  To: linux-kernel, netdev

What ho, chaps

The sungem network driver seems to be broken with the integrated
Ethernet ports of a Sun Netra T1 AC200. On that machine, bringing the
interface up while the link is up leads to a soft lockup. However,
bringing the interface up with no link and only then connecting the
cable works, as does the same driver on seemingly identical hardware in
a Sun Blade 1000.

lspci doesn't show any real differences between the gems on the Netra
and on the Blade, both are these: 
0000:00:05.1 Ethernet controller [0200]: Sun Microsystems Computer
             Corp. RIO 10/100 Ethernet [eri] [108e:1101] (rev 01)

Earlier reports of the same problem indicate that the driver was
broken by commit bea3348eef27e6044b6161fd04c3152215f96411 around
2.6.24, and the problem still exists in 2.6.28.

http://kerneltrap.org/mailarchive/linux-kernel/2008/8/7/2856094
http://bugzilla.kernel.org/show_bug.cgi?id=10309
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508151

Now, I didn't find any ready-made cure for this, so I poked around the
driver a bit to see what happens. What follows is very much only
guesswork, since I don't really know anything about Linux network
drivers.

In the lockup situation the driver seems to go off into an endless storm
of interrupts right after calling request_irq(). The interrupt handler
runs, but doesn't actually do anything interesting. Since connecting the
cable afterwards works, something later in initialization must fix this.

Looking at gem_do_start() and gem_open(), it seems that the only thing
done while opening the device after the request_irq(), is a call to
napi_enable().

I don't know what the ordering requirements are for the
initialization, but I boldly tried to move the napi_enable() call
inside gem_do_start() before the link state is checked and interrupts
subsequently enabled, and it seems to work for me. Doesn't even break
anything too obvious...

Any ideas on how this really should be fixed?

--- linux-2.6.28.2/drivers/net/sungem.c.orig	2009-01-25 02:42:07.000000000 +0200
+++ linux-2.6.28.2/drivers/net/sungem.c	2009-02-05 20:46:23.000000000 +0200
@@ -2222,6 +2222,8 @@ static int gem_do_start(struct net_devic
 
 	gp->running = 1;
 
+	napi_enable(&gp->napi);
+
 	if (gp->lstate == link_up) {
 		netif_carrier_on(gp->dev);
 		gem_set_link_modes(gp);
@@ -2239,6 +2241,8 @@ static int gem_do_start(struct net_devic
 		spin_lock_irqsave(&gp->lock, flags);
 		spin_lock(&gp->tx_lock);
 
+		napi_disable(&gp->napi);
+
 		gp->running =  0;
 		gem_reset(gp);
 		gem_clean_rings(gp);
@@ -2339,8 +2343,6 @@ static int gem_open(struct net_device *d
 	if (!gp->asleep)
 		rc = gem_do_start(dev);
 	gp->opened = (rc == 0);
-	if (gp->opened)
-		napi_enable(&gp->napi);
 
 	mutex_unlock(&gp->pm_mutex);
 
@@ -2477,8 +2479,6 @@ static int gem_resume(struct pci_dev *pd
 
 		/* Re-attach net device */
 		netif_device_attach(dev);
-
-		napi_enable(&gp->napi);
 	}
 
 	spin_lock_irqsave(&gp->lock, flags);

-- 
Ilkka Virta / itvirta at iki dot fi / ilkkachu@IRCNet
 ()  ascii ribbon campaign - against HTML mail and attachments in 
 /\                          closed file formats


* Re: Soft lockup in sungem on Netra AC200 when switching interface up
  2009-02-06 11:29 Soft lockup in sungem on Netra AC200 when switching interface up Ilkka Virta
@ 2009-02-06 19:45 ` Alexander Clouter
  2009-02-07  6:01 ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: Alexander Clouter @ 2009-02-06 19:45 UTC
  To: linux-kernel

* Ilkka Virta <itvirta@iki.fi> [Fri, 6 Feb 2009 13:29:02 +0200]:
> What ho, chaps
>
> The sungem network driver seems to be broken with the integrated
> Ethernet ports of a Sun Netra T1 AC200. On that machine, switching the
> interface up when link is up leads to a soft lockup. However,
> switching the interface up with no link, and only then connecting the
> cable works; as does the same driver on seemingly same hardware on a
> Sun Blade 1000.
>
Et tu Brute?

> lspci doesn't show any real differences between the gems on the Netra
> and on the Blade, both are these: 
> 0000:00:05.1 Ethernet controller [0200]: Sun Microsystems Computer
>              Corp. RIO 10/100 Ethernet [eri] [108e:1101] (rev 01)
>
> Earlier reports of the same problem indicate that the driver was
> broken by commit bea3348eef27e6044b6161fd04c3152215f96411 in around
> 2.6.24, but the problem still exists in 2.6.28.
>
> http://kerneltrap.org/mailarchive/linux-kernel/2008/8/7/2856094
> http://bugzilla.kernel.org/show_bug.cgi?id=10309
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508151
>
> Now, I didn't find any ready-made cure for this, so I poked around the
> driver a bit to see what happens. What follows is very much only
> guesswork, since I don't really know anything about Linux network
> drivers.
>
I live with the following:
----
# The primary network interface (right hand side port)
auto eth0
iface eth0 inet static
  address 77.75.105.223
  netmask 255.255.255.0
  gateway 77.75.105.1

  pre-up ethtool -s eth0 autoneg off
  pre-up ethtool -s eth1 autoneg off
  pre-up ethtool -s eth0 autoneg on
----

Seems to make it work... autonegotiation caused it to sulk if the cable
was plugged in.  The above has worked ever since I started using it.  We
are running 2.6.24, but I'm pretty sure the problem was already there on
2.6.18.  I have a spare Netra in my attic for testing, but the last time I
reported this no one seemed to care, so I lived with my 'workaround'.

As a side note, do you find the cpufreq scaling locks up your machine 
instantly without even an oops?

Cheers

-- 
Alexander Clouter
.sigmonster says: Hold the MAYO & pass the COSMIC AWARENESS ...



* Re: Soft lockup in sungem on Netra AC200 when switching interface up
  2009-02-06 11:29 Soft lockup in sungem on Netra AC200 when switching interface up Ilkka Virta
  2009-02-06 19:45 ` Alexander Clouter
@ 2009-02-07  6:01 ` David Miller
  2009-02-07 12:24   ` Jarek Poplawski
  1 sibling, 1 reply; 4+ messages in thread
From: David Miller @ 2009-02-07  6:01 UTC
  To: itvirta; +Cc: linux-kernel, netdev

From: Ilkka Virta <itvirta@iki.fi>
Date: Fri, 6 Feb 2009 13:29:02 +0200

> Looking at gem_do_start() and gem_open(), it seems that the only thing
> done while opening the device after the request_irq(), is a call to
> napi_enable().
> 
> I don't know what the ordering requirements are for the
> initialization, but I boldly tried to move the napi_enable() call
> inside gem_do_start() before the link state is checked and interrupts
> subsequently enabled, and it seems to work for me. Doesn't even break
> anything too obvious...
> 
> Any ideas on how this really should be fixed?

Actually your fix looks good, I'll apply this :-)

Thanks!


* Re: Soft lockup in sungem on Netra AC200 when switching interface up
  2009-02-07  6:01 ` David Miller
@ 2009-02-07 12:24   ` Jarek Poplawski
  0 siblings, 0 replies; 4+ messages in thread
From: Jarek Poplawski @ 2009-02-07 12:24 UTC
  To: David Miller; +Cc: itvirta, linux-kernel, netdev

David Miller wrote, On 02/07/2009 07:01 AM:

> From: Ilkka Virta <itvirta@iki.fi>
> Date: Fri, 6 Feb 2009 13:29:02 +0200
> 
>> Looking at gem_do_start() and gem_open(), it seems that the only thing
>> done while opening the device after the request_irq(), is a call to
>> napi_enable().
>>
>> I don't know what the ordering requirements are for the
>> initialization, but I boldly tried to move the napi_enable() call
>> inside gem_do_start() before the link state is checked and interrupts
>> subsequently enabled, and it seems to work for me. Doesn't even break
>> anything too obvious...
>>
>> Any ideas on how this really should be fixed?
> 
> Actually your fix looks good, I'll apply this :-)

Alas, it may not be enough. It seems this problem is caused by
interrupts not being serviced while NAPI is disabled. This patch added
napi_enable() on one path, but e.g. here:

static int gem_close(struct net_device *dev)
{
        struct gem *gp = netdev_priv(dev);

        mutex_lock(&gp->pm_mutex);

        napi_disable(&gp->napi);

        gp->opened = 0;
        if (!gp->asleep)
                gem_do_stop(dev, 0);
...

a similar storm can happen if an interrupt is triggered just after
napi_disable().
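
To make the suspected mechanism concrete, here is a minimal userspace
sketch. It is a toy model, not sungem code: struct toy_nic,
toy_irq_handler() and toy_deliver() are hypothetical names, and it
assumes a level-triggered interrupt line plus a handler that only
acknowledges the device when polling can be scheduled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not from sungem.c. */
struct toy_nic {
	bool irq_asserted;   /* level-triggered line: high until serviced */
	bool napi_enabled;   /* stands in for napi_enable()/napi_disable() state */
	bool poll_scheduled; /* set once the handler hands off to polling */
};

/* Assumed handler shape: the device is only serviced if polling can run. */
static void toy_irq_handler(struct toy_nic *nic)
{
	if (!nic->napi_enabled)
		return;             /* nothing acknowledged, line stays asserted */

	nic->irq_asserted = false;  /* mask/ack the source */
	nic->poll_scheduled = true; /* leave the real work to the poll routine */
}

/*
 * Re-invoke the handler for as long as the line stays asserted, giving up
 * after max_calls. Returns the number of handler invocations.
 */
static int toy_deliver(struct toy_nic *nic, int max_calls)
{
	int calls = 0;

	assert(max_calls >= 0);
	while (nic->irq_asserted && calls < max_calls) {
		toy_irq_handler(nic);
		calls++;
	}
	return calls;
}
```

With napi_enabled false and irq_asserted true, toy_deliver() runs into
its max_calls limit, which is the storm in miniature; with napi_enabled
true the line drops after a single call. In this model, any window where
the device can still raise interrupts while polling is disabled has the
same problem, whichever path opens it.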

Jarek P.

^ permalink raw reply	[flat|nested] 4+ messages in thread
