linuxppc-dev.lists.ozlabs.org archive mirror
* Software Emulation Kernel Panic
@ 2000-06-12 18:16 Lucinda Schafer
  2000-06-12 19:43 ` Dan Malek
  0 siblings, 1 reply; 6+ messages in thread
From: Lucinda Schafer @ 2000-06-12 18:16 UTC
  To: linuxppc-embedded, cort, paulus


Greetings:

On our MPC823-based custom boards, we are experiencing the "Kernel Mode
Software FPU Emulation" panic called from the SoftwareEmulation function in
/linux/arch/ppc/kernel/traps.c in the 2.2.13 kernel. We see this on boot-up
or shortly after.

Could you shed some light on some situations where this may happen? Most
other situations (unimplemented and illegal instructions, floating point,
etc.) would cause the user_mode(regs) call to return a TRUE condition, but
ours apparently returns a FALSE condition, thus the panic. Why does one get
a software emulation exception with the PR bit of the MSR equal to 0?
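The PR test can be modeled in a few lines of C. This is a minimal sketch, assuming user_mode() is a plain test of the MSR[PR] bit (mask 0x4000 on 32-bit PowerPC) in the MSR saved at exception entry; `struct fake_regs`, `fake_user_mode()` and `fake_software_emulation()` are hypothetical names, not the 2.2.13 kernel code. A trap taken with PR=0 means the offending instruction was fetched while the CPU was running kernel code, so there is no process to signal and the handler can only panic.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR[PR] (problem state) on 32-bit PowerPC: 1 = user, 0 = supervisor. */
#define MSR_PR 0x00004000u

/* Hypothetical, trimmed-down stand-in for the saved exception frame. */
struct fake_regs {
    uint32_t nip;   /* address of the instruction that trapped */
    uint32_t msr;   /* MSR value saved on exception entry */
};

/* The bit test assumed to be behind the kernel's user_mode(regs). */
static bool fake_user_mode(const struct fake_regs *regs)
{
    return (regs->msr & MSR_PR) != 0;
}

enum emu_action {
    HANDLE_FOR_PROCESS, /* emulate the instruction or signal the process */
    KERNEL_PANIC        /* trap came from kernel code: nothing to signal */
};

/* Hypothetical model of the decision made in SoftwareEmulation(). */
static enum emu_action fake_software_emulation(const struct fake_regs *regs)
{
    return fake_user_mode(regs) ? HANDLE_FOR_PROCESS : KERNEL_PANIC;
}
```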

Any suggestions highly appreciated.

Lucinda Schafer
Staff Software Engineer
Adaptive Micro-Ware, Inc.

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


* Re: Software Emulation Kernel Panic
  2000-06-12 18:16 Software Emulation Kernel Panic Lucinda Schafer
@ 2000-06-12 19:43 ` Dan Malek
  0 siblings, 0 replies; 6+ messages in thread
From: Dan Malek @ 2000-06-12 19:43 UTC
  To: Lucinda Schafer; +Cc: linuxppc-embedded, cort, paulus


Lucinda Schafer wrote:

> On our MPC823-based custom boards, we are experiencing the "Kernel Mode
> Software FPU Emulation" panic


This has little to do with floating point.  Nearly all instructions
the processor can't decode are vectored to this function.  It assumes
the primary reason you are here is to emulate floating point instructions.
If the function can't decode the instruction as a floating point operation,
it is really something the processor can't execute, so the panic message
spews forth.
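The dispatch described above can be sketched as a primary-opcode check. This is a simplified sketch, assuming the handler only needs to recognize FP loads/stores (primary opcodes 48-55) and FP arithmetic (primary opcodes 59 and 63); the indexed FP loads/stores under opcode 31 are deliberately left out, and `looks_like_fp()` is a hypothetical name. Anything else, including an all-zero word fetched from misbehaving memory, would fall through to the panic path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode: the top 6 bits of the 32-bit instruction word. */
static unsigned primary_opcode(uint32_t insn)
{
    return insn >> 26;
}

/* Rough test for a floating-point instruction. Covers only D-form FP
 * loads/stores (48-55) and the FP arithmetic groups (59 single
 * precision, 63 double precision); X-form FP loads/stores are ignored. */
static bool looks_like_fp(uint32_t insn)
{
    unsigned op = primary_opcode(insn);
    return (op >= 48 && op <= 55) || op == 59 || op == 63;
}
```

For example, `lfd f1,0(r3)` (0xC8230000) and `fadd f1,f2,f3` (0xFC22182A) are classified as FP, while `mflr r3` (0x7C6802A6) and 0x00000000 are not.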

> Could you shed some light on some situations where this may happen?

This can be either a software or hardware bug.  If it is a software
bug, just unravel the stack backtrace and debug it.  It could be a
trashed stack frame, resulting in a bad function return address, or
some indirect function call that was not properly computed.
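Unraveling the backtrace by hand means following the stack back chain. This sketch assumes the 32-bit PowerPC SysV frame layout (the caller's stack pointer at offset 0 of each frame, and the return address into a function saved at offset 4 of that function's frame) and reads from a hypothetical simulated memory array instead of real target memory. The sanity checks are heuristics for spotting a trashed frame, not an exhaustive test.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated stack: words[i] holds the word at base + 4*i.
 * A real debugger would read target memory instead. */
struct fake_stack {
    uint32_t base;
    const uint32_t *words;
    size_t nwords;
};

/* Read the word at addr; returns 0 on success, -1 if out of range or
 * misaligned. */
static int read_word(const struct fake_stack *s, uint32_t addr, uint32_t *out)
{
    if (addr < s->base || (addr & 3u) != 0)
        return -1;
    size_t idx = (addr - s->base) / 4;
    if (idx >= s->nwords)
        return -1;
    *out = s->words[idx];
    return 0;
}

/* Walk the back chain from sp (r1), storing up to max return addresses
 * in ret[]. Returns the count, or -(count + 1) if the chain looks
 * corrupt at that point. */
static int walk_back_chain(const struct fake_stack *s, uint32_t sp,
                           uint32_t *ret, int max)
{
    int n = 0;
    while (n < max) {
        uint32_t next, lr;
        if (read_word(s, sp, &next) != 0)
            return -(n + 1);            /* sp itself is bogus */
        if (next == 0)
            break;                      /* outermost frame reached */
        if (next <= sp)                 /* stack grows down: chain must go up */
            return -(n + 1);
        if (read_word(s, next + 4, &lr) != 0 || (lr & 3u) != 0)
            return -(n + 1);            /* LR save slot unreadable or misaligned */
        ret[n++] = lr;
        sp = next;
    }
    return n;
}
```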

It could also happen because of a hardware bug while fetching instructions
from memory.  Verify that the instruction word at NIP that it tried to
decode is really what is supposed to be at that location in memory.  This
is a typical failure when the UPM is not programmed correctly.  Since this
is a custom board, have you verified all memory cycles?  Disable the cache
and try again; you will probably get a different result.
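Checking the word at NIP against what should be in memory comes down to an XOR. A minimal sketch, assuming you already have the expected word (for example from the kernel image at the corresponding address) and the observed word (from the register dump or a debugger); the helper names are hypothetical. Bits or byte lanes that are consistently wrong across failures could point at a marginal data line or a memory timing problem, such as a UPM issue.

```c
#include <stdint.h>

/* Mask of bits that differ between the expected and observed word;
 * 0 means the fetch looked correct. */
static uint32_t nip_word_diff(uint32_t expected, uint32_t observed)
{
    return expected ^ observed;
}

/* Number of set bits in x (clears the lowest set bit each pass). */
static int count_bits(uint32_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Bit i set in the result means byte lane i differs; lane 0 is the
 * most significant byte (lowest address on big-endian PowerPC). */
static unsigned differing_lanes(uint32_t diff)
{
    unsigned lanes = 0;
    for (int i = 0; i < 4; i++)
        if (diff & (0xFF000000u >> (8 * i)))
            lanes |= 1u << i;
    return lanes;
}
```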


	-- Dan


* RE: Software Emulation Kernel Panic
@ 2000-06-12 19:49 Wohlgemuth, Jason
  0 siblings, 0 replies; 6+ messages in thread
From: Wohlgemuth, Jason @ 2000-06-12 19:49 UTC
  To: 'Lucinda Schafer',
	'linuxppc-embedded@lists.linuxppc.org'


Hmmm... Does the kernel completely boot-up?  (As in a shell prompt)  Does
this occur when loading the shell?  What libraries are you using?  Where did
you get those libraries or did you build them yourself?  Or, is this
occurring before the kernel is completely booted?

Jason


* RE: Software Emulation Kernel Panic
@ 2000-06-12 20:16 Lucinda Schafer
  2000-06-13  0:47 ` Dan Malek
  0 siblings, 1 reply; 6+ messages in thread
From: Lucinda Schafer @ 2000-06-12 20:16 UTC
  To: Dan Malek, Wolfgang Denk, Wohlgemuth, Jason, linuxppc-embedded


Thanks for answering so promptly.

Some more information...

We are using the mpc8xx-2.2.13 Kernel (Dan's) with the FPU and Memory
Management patches. We are using glibc 2.1 (compiled ourselves) for our
application (linking statically) and libc 1.99 for everything else.

Our system is stand-alone, running from flash, although at times we have the
ethernet connected for mounting our development box's drive. We have
multiple boards running. Some boards do seem more likely than others to
panic.

This is a very intermittent problem. I have a special application program
that will run automatically at bootup and shut the processor power down for
a minute, power up, and then have the reboot sequence start over again. I've
seen it run 21 hours like this before a panic occurs.

I have another system (2 boards that communicate every 4 minutes using
AX.25) that ran all weekend without a panic, but I did not reboot it during
that time; I just kept the application running.

This seems to be a boot-related problem. We have usually observed the panic
when running a program from bash, although one time I saw it panic in the
init process (exec from main.c). The panic usually, if not always, occurs on
the FIRST run of a program. If it works once, it will never panic during
that particular bootup.

Wolfgang's suggestion makes a lot of sense to me:
"Usually this means that you are running on an old  mask  revision  of
the  CPU,  which  still  has  the  (in)famous  "Cache Corruption When
Writing to Special Registers" bug."

I read in the MPC823 user's manual that accessing some off-core SPRs can
cause the software emulation exception with PR=0 if there is a bus error
(why should there be a bus error? The manual doesn't say), which would
produce the particular message "Kernel Mode Software FPU Emulation".
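To see whether the faulting instruction at NIP is in fact an SPR access, the SPR number can be decoded from the mfspr/mtspr word. This sketch relies on the standard PowerPC encoding (primary opcode 31, extended opcode 339 for mfspr and 467 for mtspr, with the two 5-bit halves of the SPR number swapped inside the instruction); `decode_spr_access()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define OP_31     31u
#define XO_MFSPR  339u
#define XO_MTSPR  467u

/* If insn is mfspr/mtspr, store the SPR number and whether it is a
 * write, and return true. The low 5 bits of the SPR number sit in
 * instruction bits 11-15 and the high 5 bits in bits 16-20, so the
 * halves are swapped back here. */
static bool decode_spr_access(uint32_t insn, unsigned *spr, bool *is_write)
{
    if ((insn >> 26) != OP_31)
        return false;
    unsigned xo = (insn >> 1) & 0x3FFu;
    if (xo != XO_MFSPR && xo != XO_MTSPR)
        return false;
    *spr = ((insn >> 16) & 0x1Fu) | (((insn >> 11) & 0x1Fu) << 5);
    *is_write = (xo == XO_MTSPR);
    return true;
}
```

For example, `mflr r3` (0x7C6802A6) decodes as a read of SPR 8, and 0x7C7E9AA6 decodes as a read of SPR 638.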

We will try turning off the cache as a test.

Lucinda Schafer
Adaptive Micro-Ware, Inc.
Staff Software Engineer


* Re: Software Emulation Kernel Panic
  2000-06-12 20:16 Lucinda Schafer
@ 2000-06-13  0:47 ` Dan Malek
  2000-06-13  7:07   ` Wolfgang Denk
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Malek @ 2000-06-13  0:47 UTC
  To: Lucinda Schafer
  Cc: Dan Malek, Wolfgang Denk, Wohlgemuth, Jason, linuxppc-embedded


Lucinda Schafer wrote:
>
> Thanks for answering so promptly.

Enjoy it when you can, it doesn't always happen that way :-).

> We are using the mpc8xx-2.2.13 Kernel ......

You may want to try taking the entire CDK from the MontaVista FTP
server.  It is pretty nice to have a complete set of tools, kernel,
applications, filesystem that has seen QA and doesn't need patches
taken from the Internet.

> ..... Some boards do seem more likely than others to
> panic.

Some QA people would interpret this as a hardware DVT failure.  Manufacturing
variations expose borderline design decisions.

> ........ I have a special application program
> that will run automatically at bootup and shut the processor power down for
> a minute, powerup, and then have the reboot sequence starts over again.


A minute isn't very long.  If you suspect temperature-related problems,
put these boards in an environmental chamber and use temperature cycles
that actually affect operation.

> This seems to be a boot related problem,....
> ..... If it works once, it will
> not panic on this particular bootup ever.

So, are these identical systems?  Same processor silicon, same memory
devices, same lot of boards, same boot rom?  Are you properly initializing
the processor cache/mmu/debug registers from power up?

> Wolfgang's suggestion makes a lot of sense to me:
> "Usually this means that you are running on an old  mask  revision  of
> the  CPU,  which  still  has  the  (in)famous  "Cache Corruption When
> Writing to Special Registers" bug."

I believe that only affected some 860(T) processors.  I don't remember
that listed for any other processor model or silicon.


> ....software emulation exception with PR=0 if there is a bus error
> (why should there be a bus error?? this isn't answered),

There are lots of SPRs listed for PowerPC cores that the MPC8xx family
doesn't support (or need).  If you have some generic software that
would access these, you have to emulate the function in software.  There
isn't any Linux software that would access SPRs that don't exist.
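Emulating a missing SPR in the trap handler amounts to decoding the instruction, writing a substitute value into the target register, and stepping past the instruction. A minimal sketch, assuming the caller decides which SPR to emulate and what value to return; `struct emu_regs` and `emulate_mfspr()` are hypothetical, only mfspr is handled, and the SPR-field decode is repeated so the sketch stands alone.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down saved register frame. */
struct emu_regs {
    uint32_t gpr[32];
    uint32_t nip;
};

/* Emulate "mfspr rD, target_spr" by loading value into rD.
 * Returns 0 if emulated, -1 if insn is not an mfspr of target_spr
 * (the caller would then signal the process or panic). */
static int emulate_mfspr(struct emu_regs *regs, uint32_t insn,
                         unsigned target_spr, uint32_t value)
{
    if ((insn >> 26) != 31u || ((insn >> 1) & 0x3FFu) != 339u)
        return -1;                              /* not mfspr */
    unsigned spr = ((insn >> 16) & 0x1Fu) | (((insn >> 11) & 0x1Fu) << 5);
    if (spr != target_spr)
        return -1;
    regs->gpr[(insn >> 21) & 0x1Fu] = value;    /* rD field */
    regs->nip += 4;                             /* skip the emulated instruction */
    return 0;
}
```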



	-- Dan


* Re: Software Emulation Kernel Panic
  2000-06-13  0:47 ` Dan Malek
@ 2000-06-13  7:07   ` Wolfgang Denk
  0 siblings, 0 replies; 6+ messages in thread
From: Wolfgang Denk @ 2000-06-13  7:07 UTC
  To: Dan Malek; +Cc: Lucinda Schafer, linuxppc-embedded


In message <39458490.234545FC@embeddededge.com> Dan Malek wrote:
>
> > Wolfgang's suggestion makes a lot of sense to me:
> > "Usually this means that you are running on an old  mask  revision  of
> > the  CPU,  which  still  has  the  (in)famous  "Cache Corruption When
> > Writing to Special Registers" bug."
>
> I believe that only affected some 860(T) processors.  I don't remember
> that listed for any other processor model or silicon.

Unfortunately,  this  silicon  bug  is  present  on  the  early  mask
revisions of _all_ MPC8xx CPUs.

On the MPC823 it's listed as "CPU6. Possible  Data  Cache  Corruption
With  Special Purpose Register Access Located in Data Cache, Data MMU
or SIU" up to and including CPU revision 0.3  (mask  set  3F98S);  it
seems fixed in revision A (mask set 0H98G, 1H98G) and later.

For the MPC850 it's listed as "CPU7. Possible Data  Cache  Corruption
When  Writing  SPRs"  up  to and including CPU revision 0.3 (mask set
3F98S); it seems fixed in revision A  (mask  set  0H98G,  2H98G)  and
later. I also have a few samples of MPC850 CPUs labeled 7F98S, a mask set
not listed by Motorola, which are affected as well.
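A boot-time check against the mask sets listed above could be a simple table lookup. This sketch uses only the mask sets named in this message, treats "seems fixed" as reported rather than confirmed, and does not distinguish MPC823 from MPC850; how the mask set string is obtained is left open, and the names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum spr_bug_status {
    SPR_BUG_AFFECTED,        /* mask set named above as affected */
    SPR_BUG_REPORTED_FIXED,  /* mask set named above as (seemingly) fixed */
    SPR_BUG_UNKNOWN          /* not listed here: check the errata instead */
};

static enum spr_bug_status spr_bug_status(const char *mask_set)
{
    static const char *const affected[] = { "3F98S", "7F98S" };
    static const char *const fixed[] = { "0H98G", "1H98G", "2H98G" };

    for (size_t i = 0; i < sizeof affected / sizeof affected[0]; i++)
        if (strcmp(mask_set, affected[i]) == 0)
            return SPR_BUG_AFFECTED;
    for (size_t i = 0; i < sizeof fixed / sizeof fixed[0]; i++)
        if (strcmp(mask_set, fixed[i]) == 0)
            return SPR_BUG_REPORTED_FIXED;
    return SPR_BUG_UNKNOWN;
}
```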

Wolfgang Denk

--
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88  Email: wd@denx.de
"No matter where you go, there you are..."          - Buckaroo Banzai
