440gx ethernet lockup

linuxppc-dev.lists.ozlabs.org archive mirror

* 440gx ethernet lockup
@ 2003-11-21  1:54 Brian Kuschak
  2003-11-21  2:36 ` Eugene Surovegin
  0 siblings, 1 reply; 8+ messages in thread
From: Brian Kuschak @ 2003-11-21  1:54 UTC (permalink / raw)
  To: linuxppc-embedded

Is anyone actively working with Ethernet on the 440GX?
I'm seeing TX lockups when stressing the 10/100 EMAC
with heavy NFS traffic:

find /nfs_mnt -type f |xargs grep blahblahblah

This is the only way I can make it happen, but it does
happen quickly.

The 'get_next_packet' bit is set but never clears.
The EMAC_ISR doesn't have any unusual errors (except
for some deferrals).  The 'dead_bit' is _not_
asserted.  The MAL channels are enabled, and so is the
EMAC.  No TXDE interrupts have occurred.  The BD ring
is filled with packets ready to send.  The same code
on a 440GP works fine.  This version of the CPU (PVR
0x51b21851) is supposed to have all the EMAC-related
errata fixed, but this smells like a silicon bug to
me.
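The state described above (descriptors queued in the BD ring, MAL channels and EMAC enabled, but transmission never progressing) amounts to a TX ring that has work but makes no progress. A driver could notice that condition with a periodic check along the lines of the minimal C sketch below. The structure `struct tx_watch`, the function `tx_ring_stalled`, and the pending/completed counters are hypothetical illustrations, not code from the 440GX EMAC driver; the sketch only detects the stall and says nothing about its cause.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring bookkeeping; names are illustrative only. */
struct tx_watch {
    uint32_t last_done;   /* completion count seen at the previous check */
    unsigned stalled;     /* consecutive checks with work but no progress */
};

/* Call periodically (e.g. from a timer).
 * 'pending' = descriptors handed to the hardware but not yet completed.
 * 'done'    = running count of completed TX descriptors.
 * Returns true once work has been queued with no completions for
 * 'limit' consecutive checks. */
bool tx_ring_stalled(struct tx_watch *w, uint32_t pending, uint32_t done,
                     unsigned limit)
{
    if (pending == 0 || done != w->last_done) {
        /* Idle ring, or the hardware completed something: not stuck. */
        w->stalled = 0;
        w->last_done = done;
        return false;
    }
    /* Work is queued but nothing has completed since the last check. */
    return ++w->stalled >= limit;
}
```

Comparing the completion count for equality, rather than ordering, keeps the check correct when the counter wraps around.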

I'm still waiting on a response from IBM...  Any
ideas?

Thanks,
Brian

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


* Re: 440gx ethernet lockup
  2003-11-21  1:54 440gx ethernet lockup Brian Kuschak
@ 2003-11-21  2:36 ` Eugene Surovegin
  2003-11-21  6:32   ` Brian Kuschak
  0 siblings, 1 reply; 8+ messages in thread
From: Eugene Surovegin @ 2003-11-21  2:36 UTC (permalink / raw)
  To: Brian Kuschak; +Cc: linuxppc-embedded


On Thu, Nov 20, 2003 at 05:54:49PM -0800, Brian Kuschak wrote:
>
> Is anyone actively working with ethernet on the 440GX?
>  I'm seeing TX lockups when stressing the 10/100 EMAC
> with heavy NFS traffic:
>
> find /nfs_mnt -type f |xargs grep blahblahblah
>
> This is the only way I can make it happen, but it does
> happen quickly.
>
> The 'get_next_packet' bit is set but never clears.
> The EMAC_ISR doesn't have any unusual errors (except
> for some deferrals).  The 'dead_bit' is _not_
> asserted.  The MAL channels are enabled, and so is the
> EMAC.  No TXDE interrupts have occurred.  The BD ring
> is filled with packets ready to send.  The same code
> on a 440GP works fine.  This version of the CPU (PVR
> 0x51b21851) is supposed to have all the EMAC-related
> errata fixed, but this smells like a silicon bug to
> me.

What kernel version? What board? Eval or your custom one?
If the custom one, did you try your test on eval?

How long does it usually take to get into the lockup state?

I've just run your "find" command for 10 minutes on our 440GX board
without any problems.

What clock mode are you using (533/152, 500/166 or something else)?
Do you have L2C enabled? If so, please check L2C0_SR for parity
errors (we have had problems with several of our 440GX boards).

Eugene.



* Re: 440gx ethernet lockup
  2003-11-21  2:36 ` Eugene Surovegin
@ 2003-11-21  6:32   ` Brian Kuschak
  2003-11-21  7:11     ` Eugene Surovegin
  0 siblings, 1 reply; 8+ messages in thread
From: Brian Kuschak @ 2003-11-21  6:32 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-embedded

>
> What kernel version? What board? Eval or your custom one?
> If the custom one, did you try your test on eval?
>

It's based on 2.4.19 linuxppc_2_4 plus IBM's
440gx_nova_fp patch.  Custom hardware.  During
previous testing on the eval board there was a loss of
connectivity, which in hindsight could have been this
problem.  I'm working on reproducing it on the eval
board.

> How long does it usually take to get into the lockup state?
>
> I've just run your "find" command for 10 minutes on our 440GX board
> without any problems.
>

With the aforementioned command, it takes under 5,000
packets with the MSWM bit disabled, or 300,000 packets
with it enabled.  It only happens if NFS is mounted over
TCP; UDP mode doesn't seem to trigger it, or at least
not as quickly.

> What clock mode are you using (533/152, 500/166 or something else)?
> Do you have L2C enabled? If so, please check L2C0_SR for parity
> errors (we have had problems with several of our 440GX boards).
>

666/166.
CONFIG_440GX_L2_INSTRUCTION=y
CONFIG_440GXL2_CACHE=y

I'll check on the parity errors, although I'm not sure
how that could lock up the EMAC.

Brian

> Eugene.
>
>
>


* Re: 440gx ethernet lockup
  2003-11-21  6:32   ` Brian Kuschak
@ 2003-11-21  7:11     ` Eugene Surovegin
  2003-12-02 21:01       ` Brian Kuschak
  0 siblings, 1 reply; 8+ messages in thread
From: Eugene Surovegin @ 2003-11-21  7:11 UTC (permalink / raw)
  To: Brian Kuschak; +Cc: linuxppc-embedded


On Thu, Nov 20, 2003 at 10:32:47PM -0800, Brian Kuschak wrote:
>
> >
> > What kernel version? What board? Eval or your custom one?
> > If the custom one, did you try your test on eval?
> >
>
> It's based on 2.4.19 linuxppc_2_4 plus IBM's
> 440gx_nova_fp patch.

I got the impression that IBM's patch (the one against MVL 2.4.17)
wasn't of good quality. The linuxppc-2.4 tree has some changes for
EMAC4; I'm not sure they were in IBM's patch.

> Custom hardware.  During
> previous testing on the eval board there was a loss of
> connectivity, which in hindsight could have been this
> problem.  I'm working on reproducing it on the eval
> board.
>
> > How long does it usually take to get into the lockup state?
> >
> > I've just run your "find" command for 10 minutes on our 440GX board
> > without any problems.
> >
>
> With the aforementioned command, it takes under 5,000
> packets with the MSWM bit disabled, or 300,000 packets
> with it enabled.  It only happens if NFS is mounted over
> TCP; UDP mode doesn't seem to trigger it, or at least
> not as quickly.

Hmm, I was testing NFS over UDP for 40 min. Not sure I can easily test
NFS over TCP. What about netperf?

> > What clock mode are you using (533/152, 500/166 or something else)?
> > Do you have L2C enabled? If so, please check L2C0_SR for parity
> > errors (we have had problems with several of our 440GX boards).
> >
>
> 666/166.

You may want to try lower speeds. It may help isolate the problem :)

> CONFIG_440GX_L2_INSTRUCTION=y
> CONFIG_440GXL2_CACHE=y
>
> I'll check on the parity errors, although I'm not sure
> how that could lock up the EMAC.

Well, you can get corrupted code with unpredictable results. We are
still investigating these L2C parity errors. Several times I have seen
something similar to your situation, with the EMAC stuck on a faulty
board, although I didn't look into it closely.

Eugene.


* Re: 440gx ethernet lockup
  2003-11-21  7:11     ` Eugene Surovegin
@ 2003-12-02 21:01       ` Brian Kuschak
  2003-12-02 21:57         ` Cort Dougan
  2003-12-03  3:43         ` Eugene Surovegin
  0 siblings, 2 replies; 8+ messages in thread
From: Brian Kuschak @ 2003-12-02 21:01 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-embedded


I found the problem.  Our SEEPROM was misconfigured to
set the MAL clock to PLB/4.  This resulted in a MAL
clock of 41MHz, apparently too slow...  After changing
the MAL clock to PLB/1, the system ran for days
without any problems.  Unfortunately I couldn't find
any IBM docs which specify requirements for the MAL
clock, and IBM still hasn't responded.

BTW, is anyone else having trouble with ppcsupp
lately?

Brian



* Re: 440gx ethernet lockup
  2003-12-02 21:01       ` Brian Kuschak
@ 2003-12-02 21:57         ` Cort Dougan
  2003-12-03  3:43         ` Eugene Surovegin
  1 sibling, 0 replies; 8+ messages in thread
From: Cort Dougan @ 2003-12-02 21:57 UTC (permalink / raw)
  To: Brian Kuschak; +Cc: Eugene Surovegin, linuxppc-embedded


ppcsup seems to be pretty busy lately.  I've pushed them on some important
issues and they do take action, but there seem to be a lot of very
important issues right now.

} I found the problem.  Our SEEPROM was misconfigured to
} set the MAL clock to PLB/4.  This resulted in a MAL
} clock of 41MHz, apparently too slow...  After changing
} the MAL clock to PLB/1, the system ran for days
} without any problems.  Unfortunately I couldn't find
} any IBM docs which specify requirements for the MAL
} clock, and IBM still hasn't responded.
}
} BTW, is anyone else having trouble with ppcsupp
} lately?
}
} Brian

--
Cort Dougan
Director of Engineering, FSMLabs
Office: (505) 838-9109


* Re: 440gx ethernet lockup
  2003-12-02 21:01       ` Brian Kuschak
  2003-12-02 21:57         ` Cort Dougan
@ 2003-12-03  3:43         ` Eugene Surovegin
  2003-12-03 18:28           ` Brian Kuschak
  1 sibling, 1 reply; 8+ messages in thread
From: Eugene Surovegin @ 2003-12-03  3:43 UTC (permalink / raw)
  To: Brian Kuschak; +Cc: linuxppc-embedded


On Tue, Dec 02, 2003 at 01:01:18PM -0800, Brian Kuschak wrote:
>
> I found the problem.  Our SEEPROM was misconfigured to
> set the MAL clock to PLB/4.  This resulted in a MAL
> clock of 41MHz, apparently too slow...  After changing
> the MAL clock to PLB/1, the system ran for days
> without any problems.  Unfortunately I couldn't find
> any IBM docs which specify requirements for the MAL
> clock, and IBM still hasn't responded.
>

Well, I just checked the 440GX manual, and it does state that the minimum
MAL clock is 45MHz (see the CPR0_MALD register definition).
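The arithmetic behind the fix can be checked with a small C sketch. It assumes a PLB clock of 166 MHz (from the 666/166 clock mode mentioned earlier in the thread) and takes the 45MHz minimum from the CPR0_MALD note cited here; the helper names are hypothetical, and the code does not read or program the real CPR0_MALD register, whose field layout is not described in this thread.

```c
#include <stdio.h>

/* Minimum MAL clock in kHz; the 45 MHz figure comes from the CPR0_MALD
 * note cited in this thread and is an assumption of this sketch. */
#define MAL_MIN_KHZ 45000u

/* Hypothetical helper: MAL clock for a PLB clock and an integer divider
 * (PLB/1, PLB/2, ...). Integer kHz avoids floating point.
 * A divider of 0 is treated as invalid and yields 0. */
unsigned int mal_clock_khz(unsigned int plb_khz, unsigned int div)
{
    return div ? plb_khz / div : 0u;
}

/* Returns 1 if the resulting MAL clock meets the minimum, else 0. */
int mal_clock_ok(unsigned int plb_khz, unsigned int div)
{
    return mal_clock_khz(plb_khz, div) >= MAL_MIN_KHZ;
}

/* Print the MAL clock for dividers 1..4 at the given PLB clock. */
void print_mal_table(unsigned int plb_khz)
{
    for (unsigned int div = 1; div <= 4; div++) {
        unsigned int khz = mal_clock_khz(plb_khz, div);
        printf("PLB/%u: %u.%03u MHz %s\n", div, khz / 1000, khz % 1000,
               mal_clock_ok(plb_khz, div) ? "ok" : "below minimum");
    }
}
```

Calling print_mal_table(166000) reports PLB/4 as 41.500 MHz, below the minimum, which matches the roughly 41MHz seen with the misconfigured SEEPROM; PLB/1 gives 166.000 MHz.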

Eugene.


* Re: 440gx ethernet lockup
  2003-12-03  3:43         ` Eugene Surovegin
@ 2003-12-03 18:28           ` Brian Kuschak
  0 siblings, 0 replies; 8+ messages in thread
From: Brian Kuschak @ 2003-12-03 18:28 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-embedded


>
> Well, I just checked the 440GX manual, and it does state that the
> minimum MAL clock is 45MHz (see the CPR0_MALD register definition).
>
> Eugene.

Hmmm... Mine doesn't say that.  Which revision of the
manual are you using?  Paper or PDF?

That is good information, though.  Thanks.

Brian




Thread overview: 8+ messages
2003-11-21  1:54 440gx ethernet lockup Brian Kuschak
2003-11-21  2:36 ` Eugene Surovegin
2003-11-21  6:32   ` Brian Kuschak
2003-11-21  7:11     ` Eugene Surovegin
2003-12-02 21:01       ` Brian Kuschak
2003-12-02 21:57         ` Cort Dougan
2003-12-03  3:43         ` Eugene Surovegin
2003-12-03 18:28           ` Brian Kuschak
