* 440gx ethernet lockup
From: Brian Kuschak @ 2003-11-21  1:54 UTC
To: linuxppc-embedded

Is anyone actively working with ethernet on the 440GX?
I'm seeing TX lockups when stressing the 10/100 EMAC
with heavy NFS traffic:

  find /nfs_mnt -type f |xargs grep blahblahblah

This is the only way I can make it happen, but it does
happen quickly.

The 'get_next_packet' bit is set but never clears.
The EMAC_ISR doesn't have any unusual errors (except
for some deferrals).  The 'dead_bit' is _not_
asserted.  The MAL channels are enabled, and so is the
EMAC.  No TXDE interrupts have occurred.  The BD ring
is filled with packets ready to send.  The same code
on a 440GP works fine.  This version of the CPU (PVR
0x51b21851) is supposed to have all the EMAC-related
errata fixed, but this smells like a silicon bug to
me.  I'm still waiting on a response from IBM...

Any ideas?

Thanks,
Brian

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
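[Editorial sketch] The symptom list in the message above can be written down as a simple check. This is only an illustration of the reasoning, not driver code: the `TxSnapshot` fields and the function name are hypothetical, and nothing here reads real EMAC or MAL registers. It assumes someone has already sampled the relevant state a few times.

```python
from dataclasses import dataclass


@dataclass
class TxSnapshot:
    """One reading of the transmit-side state (all field names are hypothetical)."""
    get_next_packet_set: bool  # 'get_next_packet' bit is currently set
    dead_bit_set: bool         # 'dead_bit' is asserted
    txde_seen: bool            # a TXDE interrupt has been recorded
    channels_enabled: bool     # MAL channels and the EMAC are enabled
    ready_descriptors: int     # BDs in the ring still marked ready to send


def looks_like_silent_tx_stall(snapshots):
    """Return True if every snapshot matches the reported symptoms:
    'get_next_packet' stays set, no dead bit, no TXDE, everything
    enabled, and packets still waiting in the BD ring."""
    if len(snapshots) < 2:
        # A single reading cannot show that a bit "never clears".
        return False
    return all(
        s.get_next_packet_set
        and not s.dead_bit_set
        and not s.txde_seen
        and s.channels_enabled
        and s.ready_descriptors > 0
        for s in snapshots
    )
```

The point of requiring several snapshots is that a set 'get_next_packet' bit is normal while a frame is being fetched; only a bit that stays set across readings, with no error indication, matches the lockup described.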
* Re: 440gx ethernet lockup
From: Eugene Surovegin @ 2003-11-21  2:36 UTC
To: Brian Kuschak
Cc: linuxppc-embedded

On Thu, Nov 20, 2003 at 05:54:49PM -0800, Brian Kuschak wrote:
>
> Is anyone actively working with ethernet on the 440GX?
> I'm seeing TX lockups when stressing the 10/100 EMAC
> with heavy NFS traffic:
>
>   find /nfs_mnt -type f |xargs grep blahblahblah
>
> This is the only way I can make it happen, but it does
> happen quickly.
>
> The 'get_next_packet' bit is set but never clears.
> The EMAC_ISR doesn't have any unusual errors (except
> for some deferrals).  The 'dead_bit' is _not_
> asserted.  The MAL channels are enabled, and so is the
> EMAC.  No TXDE interrupts have occurred.  The BD ring
> is filled with packets ready to send.  The same code
> on a 440GP works fine.  This version of the CPU (PVR
> 0x51b21851) is supposed to have all the EMAC-related
> errata fixed, but this smells like a silicon bug to
> me.

What kernel version? What board? Eval or your custom one?
If the custom one, did you try your test on eval?

How long does it usually take to get into the lockup state?

I've just run your "find" cmd for 10 minutes on our 440GX board
without any problems.

What clock mode are you using (533/152, 500/166 or something else)?
Do you have L2C enabled? If yes, please check L2C0_SR for parity
errors (we have some problems with several of our 440GX boards).

Eugene.
* Re: 440gx ethernet lockup
From: Brian Kuschak @ 2003-11-21  6:32 UTC
To: Eugene Surovegin
Cc: linuxppc-embedded

> What kernel version? What board? Eval or your custom one?
> If the custom one, did you try your test on eval?

It's based on 2.4.19 linuxppc_2_4 plus IBM's
440gx_nova_fp patch.  Custom hardware.  During
previous testing on the eval board there was a loss of
connectivity, which in hindsight could have been this
problem.  I'm working on reproducing it on the eval
board.

> How long does it usually take to get into the lockup state?
>
> I've just run your "find" cmd for 10 minutes on our 440GX board
> without any problems.

With the aforementioned command, <5000 packets if the
MSWM bit is disabled, 300,000 if it is enabled.
It only happens if NFS is mounted over TCP; UDP mode
doesn't seem to trigger it, or at least not as quickly.

> What clock mode are you using (533/152, 500/166 or something else)?
> Do you have L2C enabled? If yes, please check L2C0_SR for parity
> errors (we have some problems with several of our 440GX boards).

666/166.
CONFIG_440GX_L2_INSTRUCTION=y
CONFIG_440GXL2_CACHE=y

I'll check on the parity errors, although I'm not sure
how that could lock up the EMAC.

Brian
* Re: 440gx ethernet lockup
From: Eugene Surovegin @ 2003-11-21  7:11 UTC
To: Brian Kuschak
Cc: linuxppc-embedded

On Thu, Nov 20, 2003 at 10:32:47PM -0800, Brian Kuschak wrote:
>
> > What kernel version? What board? Eval or your custom one?
> > If the custom one, did you try your test on eval?
>
> It's based on 2.4.19 linuxppc_2_4 plus IBM's
> 440gx_nova_fp patch.

I got the impression that the IBM patch (the one against mvl 2.4.17)
wasn't of good quality.
The linuxppc-2.4 tree has some changes for EMAC4; I'm not sure they
were in IBM's patch.

> Custom hardware.  During
> previous testing on the eval board there was a loss of
> connectivity, which in hindsight could have been this
> problem.  I'm working on reproducing it on the eval
> board.
>
> > How long does it usually take to get into the lockup state?
> >
> > I've just run your "find" cmd for 10 minutes on our 440GX board
> > without any problems.
>
> With the aforementioned command, <5000 packets if the
> MSWM bit is disabled, 300,000 if it is enabled.
> It only happens if NFS is mounted over TCP; UDP mode
> doesn't seem to trigger it, or at least not as quickly.

Hmm, I was testing NFS over UDP for 40 min. Not sure I can easily test
NFS over TCP. What about netperf?

> > What clock mode are you using (533/152, 500/166 or something else)?
> > Do you have L2C enabled? If yes, please check L2C0_SR for parity
> > errors (we have some problems with several of our 440GX boards).
>
> 666/166.

You may want to try lower speeds. It may help to isolate the problem :)

> CONFIG_440GX_L2_INSTRUCTION=y
> CONFIG_440GXL2_CACHE=y
>
> I'll check on the parity errors, although I'm not sure
> how that could lock up the EMAC.

Well, you can get corrupted code with unpredictable results. We are
still investigating these L2C parity errors. Several times I saw
something similar to your situation, where the EMAC was stuck on a
faulty board, although I didn't look at it hard.

Eugene.
* Re: 440gx ethernet lockup
From: Brian Kuschak @ 2003-12-02 21:01 UTC
To: Eugene Surovegin
Cc: linuxppc-embedded

I found the problem.  Our SEEPROM was misconfigured to
set the MAL clock to PLB/4.  This resulted in a MAL
clock of 41MHz, apparently too slow...  After changing
the MAL clock to PLB/1, the system ran for days
without any problems.  Unfortunately I couldn't find
any IBM docs which specify requirements for the MAL
clock, and IBM still hasn't responded.

BTW, is anyone else having trouble with ppcsupp
lately?

Brian
* Re: 440gx ethernet lockup
From: Cort Dougan @ 2003-12-02 21:57 UTC
To: Brian Kuschak
Cc: Eugene Surovegin, linuxppc-embedded

ppcsup seems to be pretty busy lately.  I've pushed them on some important
issues and they take action but it seems that there are a lot of very
important issues right now.

} I found the problem.  Our SEEPROM was misconfigured to
} set the MAL clock to PLB/4.  This resulted in a MAL
} clock of 41MHz, apparently too slow...  After changing
} the MAL clock to PLB/1, the system ran for days
} without any problems.  Unfortunately I couldn't find
} any IBM docs which specify requirements for the MAL
} clock, and IBM still hasn't responded.
}
} BTW, is anyone else having trouble with ppcsupp
} lately?
}
} Brian

--
Cort Dougan
Director of Engineering, FSMLabs
Office: (505) 838-9109
* Re: 440gx ethernet lockup
From: Eugene Surovegin @ 2003-12-03  3:43 UTC
To: Brian Kuschak
Cc: linuxppc-embedded

On Tue, Dec 02, 2003 at 01:01:18PM -0800, Brian Kuschak wrote:
>
> I found the problem.  Our SEEPROM was misconfigured to
> set the MAL clock to PLB/4.  This resulted in a MAL
> clock of 41MHz, apparently too slow...  After changing
> the MAL clock to PLB/1, the system ran for days
> without any problems.  Unfortunately I couldn't find
> any IBM docs which specify requirements for the MAL
> clock, and IBM still hasn't responded.

Well, I just checked the 440GX manual and it does state that the minimum
MAL clock is 45MHz (see the CPR0_MALD register definition).

Eugene.
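[Editorial sketch] The arithmetic behind the fix can be checked in a few lines. The 45 MHz minimum is the figure Eugene quotes from the CPR0_MALD definition; the 166 MHz PLB frequency is an assumption taken from the "666/166" clock mode Brian reported earlier, read as CPU/PLB in MHz. The function names are made up for illustration.

```python
# Minimum MAL clock as quoted in this thread from the CPR0_MALD definition.
MIN_MAL_CLOCK_MHZ = 45.0


def mal_clock_mhz(plb_mhz, divider):
    """MAL clock derived from the PLB clock by an integer divider (PLB/divider)."""
    if divider <= 0:
        raise ValueError("divider must be a positive integer")
    return plb_mhz / divider


def mal_clock_ok(plb_mhz, divider, minimum_mhz=MIN_MAL_CLOCK_MHZ):
    """True if the resulting MAL clock meets the quoted minimum."""
    return mal_clock_mhz(plb_mhz, divider) >= minimum_mhz


# Assumed PLB frequency of 166 MHz (from the "666/166" clock mode).
for divider in (1, 4):
    mhz = mal_clock_mhz(166.0, divider)
    print(f"PLB/{divider}: {mhz:.1f} MHz, ok={mal_clock_ok(166.0, divider)}")
```

Under these assumptions PLB/4 gives 41.5 MHz, consistent with the roughly 41MHz Brian measured and below the quoted 45 MHz minimum, while PLB/1 is well above it.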
* Re: 440gx ethernet lockup
From: Brian Kuschak @ 2003-12-03 18:28 UTC
To: Eugene Surovegin
Cc: linuxppc-embedded

> Well, I just checked 440GX manual and it does states
> that minimum MAL
> clock is 45MHz (see CPR0_MALD register definition).
>
> Eugene.

Hmmm...  Mine doesn't say that.  Which revision of the
manual are you using?  Paper or PDF?

That is good information, though.  Thanks.

Brian