Patch: Fix SMP hang on modem close

linuxppc-dev.lists.ozlabs.org archive mirror

* Patch: Fix SMP hang on modem close
@ 2002-04-05 20:20 roger blofeld
  2002-04-06 10:24 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: roger blofeld @ 2002-04-05 20:20 UTC
  To: benh; +Cc: linuxppc-dev


Ben,
 This patch removes two dangling LOCK() statements for
core99/pangea. The core99 one hung my dual g4.
-roger

--- linux/arch/ppc/kernel/pmac_feature.c.orig   Tue Apr  2 08:17:31 2002
+++ linux/arch/ppc/kernel/pmac_feature.c        Fri Apr  5 14:02:13 2002
@@ -788,7 +788,7 @@
                UNLOCK(flags); mdelay(250); LOCK(flags);
                MACIO_OUT8(KL_GPIO_MODEM_RESET, gpio | KEYLARGO_GPIO_OUTOUT_DATA);
                (void)MACIO_IN8(KL_GPIO_MODEM_RESET);
-               UNLOCK(flags); mdelay(250); LOCK(flags);
+               UNLOCK(flags); mdelay(250);
        }
        return 0;
 }
@@ -1445,7 +1445,7 @@
                UNLOCK(flags); mdelay(250); LOCK(flags);
                MACIO_OUT8(KL_GPIO_MODEM_RESET, gpio | KEYLARGO_GPIO_OUTOUT_DATA);
                (void)MACIO_IN8(KL_GPIO_MODEM_RESET);
-               UNLOCK(flags); mdelay(250); LOCK(flags);
+               UNLOCK(flags); mdelay(250);
        }
        return 0;
 }
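Why the leftover LOCK(flags) hangs an SMP machine, as a minimal user-space sketch. It assumes, as the names suggest, that LOCK()/UNLOCK() take and release a spinlock; here the lock is replaced by a hypothetical depth counter so the balance can be checked. modem_gpio_write_and_flush() and delay_ms() are hypothetical stand-ins for the MACIO_OUT8/MACIO_IN8 pair and mdelay(). With the patch, the function returns with the lock released; with the old trailing LOCK(flags) it would return still holding it, and the next LOCK() would spin forever.

```c
#include <assert.h>

/* Hypothetical stand-in for the feature spinlock: a depth counter, so a
 * missing UNLOCK shows up as a non-zero depth instead of a hang. */
static int lock_depth;

#define LOCK(flags)   do { (void)(flags); lock_depth++; } while (0)
#define UNLOCK(flags) do { (void)(flags); assert(lock_depth > 0); lock_depth--; } while (0)

/* Hypothetical stand-ins for the MACIO_OUT8/MACIO_IN8 pair and mdelay(). */
static void modem_gpio_write_and_flush(void) { }
static void delay_ms(int ms) { (void)ms; }

/* Simplified shape of the patched modem reset sequence. */
int modem_reset_fixed(void)
{
    unsigned long flags = 0;

    LOCK(flags);
    modem_gpio_write_and_flush();
    UNLOCK(flags); delay_ms(250);   /* lock is dropped around the 250 ms delay */
    LOCK(flags);
    modem_gpio_write_and_flush();
    UNLOCK(flags); delay_ms(250);   /* pre-patch code re-took the lock here... */
    return 0;                       /* ...and returned with it still held */
}
```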


** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: Patch: Fix SMP hang on modem close
  2002-04-05 20:20 Patch: Fix SMP hang on modem close roger blofeld
@ 2002-04-06 10:24 ` Benjamin Herrenschmidt
  2002-06-06 19:25   ` Sungem bug or something else? roger blofeld
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2002-04-06 10:24 UTC
  To: roger blofeld; +Cc: linuxppc-dev


>
>Ben,
> This patch removes two dangling LOCK() statements for
>core99/pangea. The core99 one hung my dual g4.
>-roger

Good catch! That would indeed have caused SMP lockups
when using the modem.

Thanks,
Ben.


* Sungem bug or something else?
  2002-04-06 10:24 ` Benjamin Herrenschmidt
@ 2002-06-06 19:25   ` roger blofeld
  2002-06-06 19:30     ` Tom Rini
  2002-06-06 19:45     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 10+ messages in thread
From: roger blofeld @ 2002-06-06 19:25 UTC
  To: linuxppc-dev


I encounter an oops during boot bringing up a sungem
interface. (smp g4 450/gcc 3.1/glibc 2.2.5) If I defer
bringing up the network at boot, I can successfully
start eth0 (sungem) if I start eth1 (tulip) first, so
it may not be the sungem driver itself. This happens
on benh 2.4.19-Bpre10, and pre9.

The area which fails (according to ksymoops) is in
sungem.c <__phy_read+54/a4>

static u16 __phy_read(struct gem *gp, int reg, int phy_addr)
{
    u32 cmd;
    int limit = 10000;

    cmd  = (1 << 30);
    cmd |= (2 << 28);
    cmd |= (phy_addr << 23) & MIF_FRAME_PHYAD;
    cmd |= (reg << 18) & MIF_FRAME_REGAD;
    cmd |= (MIF_FRAME_TAMSB);
    writel(cmd, gp->regs + MIF_FRAME);

    while (limit--) {
        cmd = readl(gp->regs + MIF_FRAME);   *** here ***
        if (cmd & MIF_FRAME_TALSB)
            break;

        udelay(10);
    }

    if (!limit)
        cmd = 0xffff;

    return cmd & MIF_FRAME_DATA;
}
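The command word built at the top of __phy_read() follows the IEEE 802.3 clause 22 management frame. The sketch below rebuilds it in plain C; the mask values are our assumption, derived from that frame layout (ST:2, OP:2, PHYAD:5, REGAD:5, TA:2, DATA:16), not copied from sungem.h. It also uses a polling loop whose timeout test can actually fire: in the loop as quoted, a timeout leaves limit at -1, so `if (!limit)` does not catch it and the last value read is returned instead of 0xffff. read_frame is a hypothetical callback standing in for readl(gp->regs + MIF_FRAME).

```c
#include <stdint.h>

/* Masks assumed from the clause 22 frame layout; names mirror sungem's. */
#define MIF_FRAME_PHYAD 0x0f800000u   /* bits 27..23 */
#define MIF_FRAME_REGAD 0x007c0000u   /* bits 22..18 */
#define MIF_FRAME_TAMSB 0x00020000u   /* turnaround, bit 17 */
#define MIF_FRAME_TALSB 0x00010000u   /* turnaround, bit 16: set when done */
#define MIF_FRAME_DATA  0x0000ffffu   /* bits 15..0 */

/* Same command word that __phy_read() writes to MIF_FRAME. */
uint32_t mif_read_cmd(int phy_addr, int reg)
{
    uint32_t cmd = 1u << 30;                              /* ST = 01 */
    cmd |= 2u << 28;                                      /* OP = 10 (read) */
    cmd |= ((uint32_t)phy_addr << 23) & MIF_FRAME_PHYAD;
    cmd |= ((uint32_t)reg << 18) & MIF_FRAME_REGAD;
    cmd |= MIF_FRAME_TAMSB;
    return cmd;
}

/* Poll until TALSB is set, giving up after 10000 reads. Success returns
 * from inside the loop, so falling out of it always means a timeout. */
uint16_t mif_poll(uint32_t (*read_frame)(void *ctx), void *ctx)
{
    for (int limit = 10000; limit > 0; limit--) {
        uint32_t cmd = read_frame(ctx);
        if (cmd & MIF_FRAME_TALSB)
            return (uint16_t)(cmd & MIF_FRAME_DATA);
        /* the driver does udelay(10) here */
    }
    return 0xffff;
}

/* Demo stand-ins for the hardware register. */
uint32_t never_ready(void *ctx) { (void)ctx; return 0; }
uint32_t ready_with_0x796d(void *ctx) { (void)ctx; return MIF_FRAME_TALSB | 0x796d; }
```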

The actual faulting address is 0xe20d920c (the value
of gpr0; gpr31 is 0)

0xc00de2a4 <__phy_read+80>:     lwbrx   r31,r0,r0
0xc00de2a8 <__phy_read+84>:     eieio
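In `lwbrx r31,r0,r0`, an rA field of r0 reads as the literal value 0, so the effective address is simply the contents of gpr0, which is why the faulting address equals gpr0. lwbrx is a byte-reversed word load: on big-endian PowerPC, readl() uses it (followed by eieio to keep I/O accesses ordered) because PCI registers are little-endian. A portable C equivalent of the load itself, as a sketch:

```c
#include <stdint.h>

/* Assemble a 32-bit value least-significant byte first, i.e. the value
 * lwbrx produces from the four bytes at p on a big-endian CPU. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```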


Any clues where I should look?
Thanks
-roger



* Re: Sungem bug or something else?
  2002-06-06 19:25   ` Sungem bug or something else? roger blofeld
@ 2002-06-06 19:30     ` Tom Rini
  2002-06-06 19:45     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 10+ messages in thread
From: Tom Rini @ 2002-06-06 19:30 UTC
  To: roger blofeld; +Cc: linuxppc-dev


On Thu, Jun 06, 2002 at 12:25:10PM -0700, roger blofeld wrote:

> I encounter an oops during boot bringing up a sungem
> interface. (smp g4 450/gcc 3.1/glibc 2.2.5) If I defer
> bringing up the network at boot, I can successfully
> start eth0 (sungem) if I start eth1 (tulip) first, so
> it may not be the sungem driver itself. This happens
> on benh 2.4.19-Bpre10, and pre9.

Have you tried gcc-3.0 or gcc-2.95 ?

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/


* Re: Sungem bug or something else?
  2002-06-06 19:25   ` Sungem bug or something else? roger blofeld
  2002-06-06 19:30     ` Tom Rini
@ 2002-06-06 19:45     ` Benjamin Herrenschmidt
  2002-06-06 20:35       ` roger blofeld
                         ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2002-06-06 19:45 UTC
  To: roger blofeld, linuxppc-dev

>I encounter an oops during boot bringing up a sungem
>interface. (smp g4 450/gcc 3.1/glibc 2.2.5) If I defer
>bringing up the network at boot, I can successfully
>start eth0 (sungem) if I start eth1 (tulip) first, so
>it may not be the sungem driver itself. This happens
>on benh 2.4.19-Bpre10, and pre9.

What kind of error is it? A Machine Check?

Looking at your backtrace, it looks like the driver is
trying to access the PHY chip. That can sometimes happen
if you have some tool like mii-tool or ethtool trying to
get at the link status while the chip isn't powered up.

The problem here is that sungem on Apple HW only powers
the chip when the interface is brought up, and powers it
down about 10 seconds after bringing the interface down.

This improves power management, but breaks link monitoring
tools.
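A hypothetical model of that power policy makes the failure window concrete: the chip counts as powered while the interface is up and for about 10 seconds after it goes down, and unpowered otherwise, including before the interface has ever been brought up. The function name and the exact 10-second constant are illustrative assumptions, not taken from sungem.c.

```c
#include <stdbool.h>

#define GEM_POWERDOWN_DELAY_S 10.0   /* "about 10 seconds" */

/* if_up: interface currently up.
 * ever_up: interface has been up at least once since boot.
 * secs_since_down: time since it was last brought down. */
bool gem_chip_powered(bool if_up, bool ever_up, double secs_since_down)
{
    if (if_up)
        return true;
    if (!ever_up)
        return false;   /* never brought up: chip was never powered */
    return secs_since_down < GEM_POWERDOWN_DELAY_S;
}
```

A link monitor that polls the PHY while this model returns false is the case described here.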

There may also be a bug in the driver that causes it to
access the PHY registers when the chip is powered down
and it receives ethtool ioctls.

Ben.


* Re: Sungem bug or something else?
  2002-06-06 20:41       ` Kevin B. Hendricks
@ 2002-06-06 20:25         ` benh
  2002-06-06 21:02         ` roger blofeld
  1 sibling, 0 replies; 10+ messages in thread
From: benh @ 2002-06-06 20:25 UTC
  To: Kevin B. Hendricks; +Cc: roger blofeld, linuxppc-dev


>Hi,
>
>Does sungem use autonegotiation to determine its interface type and speed
>(like some of the more advanced interface drivers) or does it look at
>the ROM or use a table?
>
>If it autonegotiates, does the driver actually wait long enough for the
>autonegotiation to fully complete before returning the first time?

It autonegotiates first, then tries fixed speeds, etc.
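For context, the "autonegotiate" step ends by picking the best mode both ends advertise. The sketch below shows the generic IEEE 802.3 priority resolution for 10/100 modes (100FD > 100HD > 10FD > 10HD) using the standard clause 22 ability bits; it is not sungem's code, and the order in which sungem then tries fixed speeds is not shown here. The ADV_* and enum names are our own.

```c
#include <stdint.h>

/* Clause 22 ability bits (same positions in ADVERTISE and LPA registers). */
#define ADV_10HALF  0x0020
#define ADV_10FULL  0x0040
#define ADV_100HALF 0x0080
#define ADV_100FULL 0x0100

enum link_mode { LINK_NONE, LINK_10HALF, LINK_10FULL, LINK_100HALF, LINK_100FULL };

/* Best mode advertised by both ends. LINK_NONE means no common mode,
 * the point at which a driver would fall back to forcing a speed. */
enum link_mode resolve_autoneg(uint16_t advert, uint16_t lpa)
{
    uint16_t common = advert & lpa;

    if (common & ADV_100FULL) return LINK_100FULL;
    if (common & ADV_100HALF) return LINK_100HALF;
    if (common & ADV_10FULL)  return LINK_10FULL;
    if (common & ADV_10HALF)  return LINK_10HALF;
    return LINK_NONE;
}
```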

>Under some tulip drivers, I noticed something very similar (but no oops,
>just an inability to use the driver until I rmmod and then insmod it
>once).  I think it happens because the autonegotiation results were
>handled asynchronously and the main driver routine simply started it and
>returned before the autonegotiation actually completed and the interface
>and speed were properly determined.  The problem was that right after
>bringing up the network in the boot sequence, things tried to use it (the
>appletalk drivers, etc).  So if I waited to insert the module for the
>driver until after everything else was started (at the end of the
>boot sequence), all was well.
>
>This is all just a wag, but it is something to look at.

Nah, it's clearly the chip being powered down. I know I have a bug
in the driver that doesn't prevent HW access to the PHY via the
ethtool ioctls when the chip is down, and that will cause a Machine
Check. I just haven't taken the time to fix it yet.
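The missing check could look roughly like the sketch below. struct gem_sketch, its chip_powered flag and the -EAGAIN return are hypothetical choices for illustration, not the fix that went into sungem; a real driver would also read the flag under its own lock.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver state; field names are illustrative only. */
struct gem_sketch {
    bool chip_powered;
    uint16_t (*phy_read)(struct gem_sketch *gp, int reg);
};

/* MII register read for an ioctl handler: refuse rather than touch the PHY
 * registers of a powered-down chip, which is what ends in a Machine Check. */
int gem_mii_read_guarded(struct gem_sketch *gp, int reg, uint16_t *val_out)
{
    if (!gp->chip_powered)
        return -EAGAIN;          /* error code is an assumption */
    *val_out = gp->phy_read(gp, reg);
    return 0;
}

/* Demo stand-in for a PHY register read. */
uint16_t demo_phy_read(struct gem_sketch *gp, int reg)
{
    (void)gp;
    (void)reg;
    return 0x796d;
}
```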

Ben.



* Re: Sungem bug or something else?
  2002-06-06 19:45     ` Benjamin Herrenschmidt
@ 2002-06-06 20:35       ` roger blofeld
  2002-06-06 20:41       ` Kevin B. Hendricks
  2002-06-07  0:51       ` roger blofeld
  2 siblings, 0 replies; 10+ messages in thread
From: roger blofeld @ 2002-06-06 20:35 UTC
  To: Benjamin Herrenschmidt, linuxppc-dev


--- Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> There may be also a bug in the driver causing it to try
> to access the PHY registers when the chip is in down
> mode & getting the ethtool ioctl's
>
Ben,
 I suspect your last thought may be correct. From the
oops:
Machine check in kernel mode.
Oops: machine check, sig: 7
NIP: C00DE2A8 XER: 20000000 LR: C00E319C SP: DDBB5E10
REGS: ddbb5d60 TRAP: 0200    Not tainted
MSR: 00049030 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = ddbb4000[753] 'mii-tool' Last syscall: 54
last math 00000000 last altivec 00000000 CPU: 0

Note mii-tool is running.

The backtrace:

Trace; 02000000 Before first symbol
Trace; c00e319c <gem_ioctl+158/178>
Trace; c01785d0 <dev_ifsioc+414/484>
Trace; c0178850 <dev_ioctl+210/39c>
Trace; c01b6f08 <inet_ioctl+200/20c>
Trace; c016de8c <sock_ioctl+40/ac>
Trace; c005785c <sys_ioctl+13c/338>
Trace; c000601c <ret_from_syscall_1+0/b4>
Trace; 7ffff9e0 Before first symbol
Trace; 100012cc Before first symbol
Trace; 1000195c Before first symbol
Trace; 0fed8d94 Before first symbol
Trace; 00000000 Before first symbol

shows clearly that an ioctl is in progress.
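For reference, the user-space side of that trace: mii-tool reads PHY registers through MII socket ioctls such as SIOCGMIIPHY/SIOCGMIIREG, which the kernel hands via dev_ioctl()/dev_ifsioc() to the driver's handler (gem_ioctl() here). The sketch issues that pair of calls; struct mii_regs is our own redeclaration of the four 16-bit fields of the kernel's struct mii_ioctl_data, and read_mii_reg() is a hypothetical helper, not mii-tool's code. Register 1 is the basic mode status register (BMSR).

```c
#define _DEFAULT_SOURCE          /* for struct ifreq in glibc's <net/if.h> */
#include <net/if.h>
#include <linux/sockios.h>       /* SIOCGMIIPHY, SIOCGMIIREG */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Same field order as the kernel's struct mii_ioctl_data. */
struct mii_regs {
    uint16_t phy_id;
    uint16_t reg_num;
    uint16_t val_in;
    uint16_t val_out;
};

/* Hypothetical helper: read PHY register `reg` of `ifname` via MII ioctls.
 * Returns the 16-bit value, or -1 on any failure. */
int read_mii_reg(const char *ifname, uint16_t reg)
{
    struct ifreq ifr;
    struct mii_regs mii;
    int fd, ret = -1;

    _Static_assert(sizeof(struct mii_regs) <= sizeof(ifr.ifr_ifru),
                   "MII payload must fit inside struct ifreq");

    if (strlen(ifname) >= IFNAMSIZ)
        return -1;
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof ifr);
    strcpy(ifr.ifr_name, ifname);

    /* Ask the driver which PHY address it uses (fills in phy_id). */
    if (ioctl(fd, SIOCGMIIPHY, &ifr) < 0)
        goto out;

    /* The MII payload lives inline in the ifreq union. */
    memcpy(&mii, &ifr.ifr_ifru, sizeof mii);
    mii.reg_num = reg;
    memcpy(&ifr.ifr_ifru, &mii, sizeof mii);

    /* Presumably the kind of request that reached __phy_read() above. */
    if (ioctl(fd, SIOCGMIIREG, &ifr) < 0)
        goto out;

    memcpy(&mii, &ifr.ifr_ifru, sizeof mii);
    ret = mii.val_out;
out:
    close(fd);
    return ret;
}
```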

-roger


* Re: Sungem bug or something else?
  2002-06-06 19:45     ` Benjamin Herrenschmidt
  2002-06-06 20:35       ` roger blofeld
@ 2002-06-06 20:41       ` Kevin B. Hendricks
  2002-06-06 20:25         ` benh
  2002-06-06 21:02         ` roger blofeld
  2002-06-07  0:51       ` roger blofeld
  2 siblings, 2 replies; 10+ messages in thread
From: Kevin B. Hendricks @ 2002-06-06 20:41 UTC
  To: Benjamin Herrenschmidt; +Cc: roger blofeld, linuxppc-dev

Hi,

Does sungem use autonegotiation to determine its interface type and speed
(like some of the more advanced interface drivers) or does it look at
the ROM or use a table?

If it autonegotiates, does the driver actually wait long enough for the
autonegotiation to fully complete before returning the first time?

Under some tulip drivers, I noticed something very similar (but no oops,
just an inability to use the driver until I rmmod and then insmod it
once).  I think it happens because the autonegotiation results were
handled asynchronously and the main driver routine simply started it and
returned before the autonegotiation actually completed and the interface
and speed were properly determined.  The problem was that right after
bringing up the network in the boot sequence, things tried to use it (the
appletalk drivers, etc).  So if I waited to insert the module for the
driver until after everything else was started (at the end of the
boot sequence), all was well.

This is all just a wag, but it is something to look at.
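One way a driver can avoid handing back an interface whose negotiation is still in flight, in the spirit of the question above, is to poll the clause 22 BMSR autonegotiation-complete bit (bit 5, 0x0020) for a bounded time. A minimal sketch with a hypothetical register-read callback and a fake PHY for demonstration; a real driver would sleep or re-arm a timer between reads rather than spin.

```c
#include <stdbool.h>
#include <stdint.h>

#define BMSR_ANEGCOMPLETE 0x0020   /* clause 22 BMSR bit 5 */

typedef uint16_t (*bmsr_reader)(void *ctx);

/* True once autonegotiation reports complete, false if it has not
 * completed after max_polls reads. */
bool wait_for_autoneg(bmsr_reader read_bmsr, void *ctx, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (read_bmsr(ctx) & BMSR_ANEGCOMPLETE)
            return true;
        /* a real driver would sleep or re-arm a timer here */
    }
    return false;
}

/* Demo PHY that reports completion starting with its Nth read. */
struct fake_phy { int completes_on_read; int reads; };

uint16_t fake_read_bmsr(void *ctx)
{
    struct fake_phy *p = ctx;
    p->reads++;
    return p->reads >= p->completes_on_read ? BMSR_ANEGCOMPLETE : 0;
}
```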

Kevin


* Re: Sungem bug or something else?
  2002-06-06 20:41       ` Kevin B. Hendricks
  2002-06-06 20:25         ` benh
@ 2002-06-06 21:02         ` roger blofeld
  1 sibling, 0 replies; 10+ messages in thread
From: roger blofeld @ 2002-06-06 21:02 UTC
  To: Kevin B. Hendricks, Benjamin Herrenschmidt; +Cc: roger blofeld, linuxppc-dev


Kevin,
 That is a good thought. After getting everything
working, I get:

# mii-tool eth0
eth0: autonegotiation failed, link ok

so I require the maximum timeout.
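In MII terms, "link ok" and the outcome of autonegotiation come from different bits: link status is BMSR bit 2 (0x0004) and autonegotiation-complete is bit 5 (0x0020), so a link can be up even when negotiation did not settle on a mode. A small decoding sketch with hypothetical names; the link-status bit latches low, so reading the current state takes a second read.

```c
#include <stdbool.h>
#include <stdint.h>

#define BMSR_LSTATUS      0x0004   /* link status, latches low on link loss */
#define BMSR_ANEGCOMPLETE 0x0020   /* autonegotiation complete */

struct bmsr_view {
    bool link_up;
    bool aneg_complete;
};

/* Decode one BMSR value. Because link status latches low, callers wanting
 * the current state should read the register twice and decode the second. */
struct bmsr_view decode_bmsr(uint16_t bmsr)
{
    struct bmsr_view v;
    v.link_up = (bmsr & BMSR_LSTATUS) != 0;
    v.aneg_complete = (bmsr & BMSR_ANEGCOMPLETE) != 0;
    return v;
}
```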
-roger

* Re: Sungem bug or something else?
  2002-06-06 19:45     ` Benjamin Herrenschmidt
  2002-06-06 20:35       ` roger blofeld
  2002-06-06 20:41       ` Kevin B. Hendricks
@ 2002-06-07  0:51       ` roger blofeld
  2 siblings, 0 replies; 10+ messages in thread
From: roger blofeld @ 2002-06-07  0:51 UTC
  To: linuxppc-dev


The problem was triggered by upgrading to
initscripts-6.76. The new ifup script calls the
network-scripts function check_link_down, which in
turn calls 'ip link set up eth0' and then mii-tool.
Apparently the PHY is not powered at boot time,
which causes the oops.
Work-around: remove the link check from ifup.
-roger



Thread overview: 10+ messages
2002-04-05 20:20 Patch: Fix SMP hang on modem close roger blofeld
2002-04-06 10:24 ` Benjamin Herrenschmidt
2002-06-06 19:25   ` Sungem bug or something else? roger blofeld
2002-06-06 19:30     ` Tom Rini
2002-06-06 19:45     ` Benjamin Herrenschmidt
2002-06-06 20:35       ` roger blofeld
2002-06-06 20:41       ` Kevin B. Hendricks
2002-06-06 20:25         ` benh
2002-06-06 21:02         ` roger blofeld
2002-06-07  0:51       ` roger blofeld
