* Kernel Mode Software Emulation NIP: 00001FFC - cache coherency problem on m8xx processors
@ 2004-03-25 23:13 Marcelo Tosatti
2004-03-26 6:51 ` Dan Malek
2004-03-26 7:23 ` LC Geldenhuys
0 siblings, 2 replies; 6+ messages in thread
From: Marcelo Tosatti @ 2004-03-25 23:13 UTC (permalink / raw)
To: linuxppc-embedded
Hi,
We encountered a problem with our MPC855T-based appliances under heavy
load. The crashes looked like this:
Oops: Kernel Mode Software FPU Emulation, sig: 8
NIP: 00001FFC XER: 20000000 LR: 00000590 SP: C0D99DC0 REGS: c0d99d10 TRAP:1000 Not tainted
MSR: 00001000 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 00
TASK = c0d98000[115] 'webs' Last syscall: 102
last math 00000000 last altivec 00000000
GPR00: 00000001 C0D99DC0 C0D98000 C0D99DD0 00000000 00000001 000005A8
00000000
GPR08: C59ED4F0 00000000 00000000 00000002 00000000 1007335C 00000000
00000000
GPR16: 00000000 00000000 00000000 00000000 00001032 00D99DC0 C00CFFFC
00009032
GPR24: C00027C0 10049C90 00000000 00000002 000005A8 C653D000 C59ED630
C653D0D8
Call backtrace:
C00E2470 C00CFFBC C00C4610 C00E25A4 C00A2A00 C00A3024 C000281C
00000001 100330A4 10033B1C 10041F40 10029418 1002E530 1003FA7C
1003F240 1003F140 1003485C 100346A4 1000234C 0FBE7FDC 00000000
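For readers decoding the register dump above, here is a minimal Python sketch of how the MSR value maps to the EE/PR/FP/ME/IR/DR fields the oops prints. The bit masks are the standard 32-bit PowerPC MSR definitions; the helper name decode_msr is hypothetical and is not taken from the kernel source.

```python
# MSR bit masks for 32-bit PowerPC, matching the fields the oops prints.
MSR_BITS = {
    "EE": 0x8000,  # external interrupts enabled
    "PR": 0x4000,  # problem (user) state
    "FP": 0x2000,  # floating point available
    "ME": 0x1000,  # machine check enabled
    "IR": 0x0020,  # instruction address translation
    "DR": 0x0010,  # data address translation
}


def decode_msr(msr):
    """Return a dict mapping each flag name to 0 or 1 (hypothetical helper)."""
    return {name: int(bool(msr & mask)) for name, mask in MSR_BITS.items()}


flags = decode_msr(0x00001000)  # MSR value reported in the oops
print(flags)
```

With IR and DR both 0, address translation was off when the fault was taken, which is consistent with the crash happening in the low-level exception entry path rather than in ordinary kernel code.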
The kernel crashed trying to execute address "00001FFC". I have seen similar
reports in the Linux PPC list archives. The problem is that "bl transfer_to_handler"
(transfer_to_handler is at "2000") was jumping to "1FFC" instead, on some rare occasions
(only under heavy network/memory activity).
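To make the "2000" versus "1FFC" distinction concrete, the following Python sketch encodes and decodes a relative PowerPC "bl" (I-form, opcode 18, AA=0, LK=1). It assumes the branch sat at 0x58C, inferred from LR = 00000590 in the oops (LR holds the address following the "bl"); that address and the helper names are assumptions for illustration, not values confirmed from the kernel image.

```python
def encode_bl(pc, target):
    """Encode a relative `bl` at address pc (I-form: opcode 18, AA=0, LK=1)."""
    offset = target - pc
    # The offset must be word aligned and fit in a signed 26-bit field.
    assert offset % 4 == 0 and -(1 << 25) <= offset < (1 << 25)
    return (18 << 26) | (offset & 0x03FFFFFC) | 0x1


def branch_target(pc, insn):
    """Decode the target of a relative I-form branch located at pc."""
    li = insn & 0x03FFFFFC
    if li & 0x02000000:  # sign-extend the 26-bit byte offset
        li -= 1 << 26
    return (pc + li) & 0xFFFFFFFF


BL_ADDR = 0x58C  # assumption: LR (0x590) minus 4

good = encode_bl(BL_ADDR, 0x2000)  # branch to the intended target
bad = encode_bl(BL_ADDR, 0x1FFC)   # branch to the address that crashed
print(hex(good), hex(bad), hex(good ^ bad))
```

Under that assumption the two encodings are 0x48001a75 and 0x48001a71, so the observed target is exactly one instruction word short of the intended one. This is only arithmetic on the reported addresses; it does not by itself show where the wrong value came from.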
After thinking for a while and talking to Dan Malek, it seems "isync" instructions before
"bl transfer_to_handler" are required to avoid cache coherency problems.
I'm not exactly sure why we were jumping to "1FFC" instead of "2000",
but adding "isync" before "bl transfer_to_handler" in both DecrementTimer
and HardwareInterrupt fixed the problem for us.
In the following patch against 2.4.25 I also add "isync" to the FINISH_EXCEPTION define, for safety.
The performance impact of this is hardly noticeable.
It seems Dan is OK with including this in the linuxppc-2.4 repository. We also want
to add this to 2.6 when the m8xx support gets fixed.
Regards,
--- head_8xx.S.orig 2004-03-25 18:30:49.323575664 -0300
+++ head_8xx.S 2004-03-25 18:32:00.464760560 -0300
@@ -172,6 +172,7 @@
  */
 #define FINISH_EXCEPTION(func) \
+        isync; \
         bl transfer_to_handler; \
         .long func; \
         .long ret_from_except
@@ -228,6 +229,7 @@
         addi r3,r1,STACK_FRAME_OVERHEAD
         li r20,MSR_KERNEL
         li r4,0
+        isync
         bl transfer_to_handler
         .globl do_IRQ_intercept
 do_IRQ_intercept:
@@ -265,6 +267,7 @@
         EXCEPTION_PROLOG
         addi r3,r1,STACK_FRAME_OVERHEAD
         li r20,MSR_KERNEL
+        isync
         bl transfer_to_handler
         .globl timer_interrupt_intercept
 timer_interrupt_intercept:
----- End forwarded message -----
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
* Re: Kernel Mode Software Emulation NIP: 00001FFC - cache coherency problem on m8xx processors
2004-03-25 23:13 Kernel Mode Software Emulation NIP: 00001FFC - cache coherency problem on m8xx processors Marcelo Tosatti
@ 2004-03-26 6:51 ` Dan Malek
2004-03-26 9:04 ` Wolfgang Denk
2004-03-26 7:23 ` LC Geldenhuys
1 sibling, 1 reply; 6+ messages in thread
From: Dan Malek @ 2004-03-26 6:51 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linuxppc-embedded
Marcelo Tosatti wrote:
> We encountered a problem with our MPC855T based appliances under heavy
> load. The crashes looked like this:
> The kernel crashed trying to execute address "00001FFC". I have seen similar
> reports in the Linux PPC list archives. The problem is that "bl transfer_to_handler"
> (transfer_to_handler is at "2000") was jumping to "1FFC" instead, on some rare occasions
> (only under heavy network/memory activity).
Here is my standard answer to bad things happening under heavy network
activity. Something is likely wrong with the SDRAM UPM Burst Mode programming.
The only way you can get back to back burst mode bus operations is with the
core very busy and the CPM or FEC performing DMA. Neither one on their own
can generate this special case bus cycle. I've seen this myself, and the
cause was always the same. It's a PITA to debug, but I still suspect that is
the problem.
I don't remember the details of our IRC discussion, but one thing I would suggest
to test this is setting the Burst Inhibit (BI) flag in the memory controller
for the SDRAM chip select.
> After thinking for a while and talking to Dan Malek, it seems "isync" instructions before
> "bl transfer_to_handler" are required to avoid cache coherency problems.
I was actually thinking of a different interrupt controller problem. I am
surprised this works. This isn't a cache coherency problem.
> I'm not exactly sure why we were jumping to "1FFC" instead of "2000",
> but adding "isync" before "bl transfer_to_handler" in both DecrementTimer
> and HardwareInterrupt fixed the problem for us.
That's just too weird. We need to understand why this happens. Here is another
test. At about line 652, change the:
. = 0x2000
to:
. = 0x1ffc
nop
Let's see if it happens to jump to any other location or if this one is
special.
> On the following patch against 2.4.25 I also add "isync" .....
Let's put a big comment around this. Indicate it was a problem for
one person with an 855T. I don't have any 855T parts; if anyone else has
some and can do some heavy network testing, I'd appreciate knowing the
results. Like I keep saying, I've seen similar problems on the 860T parts,
but it was clearly my fault programming the UPM. Once that was fixed,
problem solved.
Thanks.
-- Dan
* Re: Kernel Mode Software Emulation NIP: 00001FFC - cache coherency problem on m8xx processors
2004-03-26 6:51 ` Dan Malek
@ 2004-03-26 9:04 ` Wolfgang Denk
2004-03-26 13:44 ` Dan Malek
0 siblings, 1 reply; 6+ messages in thread
From: Wolfgang Denk @ 2004-03-26 9:04 UTC (permalink / raw)
To: Dan Malek; +Cc: Marcelo Tosatti, linuxppc-embedded
In message <4063D300.2060006@embeddededge.com> Dan Malek wrote:
>
> Let's put a big comment around this. Indicate it was a problem for one
> person with an 855T. I don't have any 855T parts, if anyone else has
> some and can do some heavy network testing, I'd appreciate knowing the
Can you define which sort of network load is suitable to sufficiently
stress the system? We have many kinds of systems of all types in the
lab, and I can easily run such a test.
But I actually doubt that we would see any problem, as we perform
such testing on a regular basis, and never had any such problems on
8xx systems.
> results. Like I keep saying, I've seen similar problems on the 860T
> parts, but it was clearly my fault programming the UPM. Once that was
> fixed, problem solved.
I agree 100%.
Best regards,
Wolfgang Denk
--
Software Engineering: Embedded and Realtime Systems, Embedded Linux
Phone: (+49)-8142-4596-87 Fax: (+49)-8142-4596-88 Email: wd@denx.de
Quantum Mechanics is God's version of "Trust me."
* Re: Kernel Mode Software Emulation NIP: 00001FFC - cache coherency problem on m8xx processors
2004-03-26 9:04 ` Wolfgang Denk
@ 2004-03-26 13:44 ` Dan Malek
0 siblings, 0 replies; 6+ messages in thread
From: Dan Malek @ 2004-03-26 13:44 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Marcelo Tosatti, linuxppc-embedded
Wolfgang Denk wrote:
> Can you define which sort of network load is suitable to sufficiently
> stress the system? We have many kinds of systems of all types in the
> lab, and I can easily run such a test.
Copying files on a root NFS usually works for me. :-)
> But I actually doubt that we would see any problem,
I just don't have an 855T handy, although I doubt this problem would
be unique to that part. I just wanted to cover that base.
Thanks.
-- Dan
* RE: Kernel Mode Software Emulation NIP: 00001FFC - cache coherency problem on m8xx processors
2004-03-25 23:13 Kernel Mode Software Emulation NIP: 00001FFC - cache coherency problem on m8xx processors Marcelo Tosatti
2004-03-26 6:51 ` Dan Malek
@ 2004-03-26 7:23 ` LC Geldenhuys
2004-03-26 8:07 ` Dan Malek
1 sibling, 1 reply; 6+ messages in thread
From: LC Geldenhuys @ 2004-03-26 7:23 UTC (permalink / raw)
To: linuxppc-embedded
> -----Original Message-----
> From: owner-linuxppc-embedded@lists.linuxppc.org
> [mailto:owner-linuxppc-embedded@lists.linuxppc.org] On Behalf
> Of Marcelo Tosatti
> Sent: 26 March 2004 01:14 AM
> To: linuxppc-embedded@lists.linuxppc.org
> Subject: Kernel Mode Software Emulation NIP: 00001FFC - cache
> coherency problem on m8xx processors
<snip>
> I'm not exactly sure why we were jumping to "1FFC" instead of "2000",
> but adding "isync" before "bl transfer_to_handler" in both
> DecrementTimer
> and HardwareInterrupt fixed the problem for us.
This sounds distinctly similar to the CPU13 erratum. Do you have
instruction fetch show cycles enabled in ICTRL[IST_SER]? See:
http://e-www.motorola.com/files/32bit/doc/errata/MPC860CE.pdf
Cheers,
Lourens
* Re: Kernel Mode Software Emulation NIP: 00001FFC - cache coherency problem on m8xx processors
2004-03-26 7:23 ` LC Geldenhuys
@ 2004-03-26 8:07 ` Dan Malek
0 siblings, 0 replies; 6+ messages in thread
From: Dan Malek @ 2004-03-26 8:07 UTC (permalink / raw)
To: lourens; +Cc: linuxppc-embedded
LC Geldenhuys wrote:
> This sounds distinctly similar to the CPU13 erratum.
Not exactly. This isn't the first instruction of the exception
handler. It's the first instruction of the general handler
that all of them call after executing about 30 instructions of
context saving.
There are, however, several errata associated with the setting
of the ICTRL[IST_SER] and page boundary instruction execution.
Thanks for pointing out that the ICTRL should be set for normal,
"no show" operation. It's worth checking that, too. The
reset default is not what you want here; the boot ROM should
set this register to a value suitable for normal operation.
-- Dan