linuxppc-dev.lists.ozlabs.org archive mirror
* Software Emulation Kernel Panic--specific information
@ 2000-06-21 17:26 Lucinda Schafer
  2000-06-21 20:08 ` Wolfgang Denk
  0 siblings, 1 reply; 5+ messages in thread
From: Lucinda Schafer @ 2000-06-21 17:26 UTC (permalink / raw)
  To: linuxppc-embedded


Greetings!

When we have our Kernel Mode Software FPU Emulation panic, we get the
following information.

The NIP, LR, and TRAP always seem to be set to the same values (others
vary). I assume those values are physical addresses.

If so, from System.map, the NIP seems to be between c0001f00 t Trap_1f and
c0002000 T transfer_to_handler.
The LR seems to be between c0000900 t Decrementer and c0000a00 t Trap_0a.
The TRAP is set to our Software Emulation trap, c0001000 t SoftEmu.
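
The System.map lookup described above can be sketched as follows. This is a minimal illustration, not the kernel's own tooling: it assumes the dumped NIP and LR values are offsets from a kernel base of 0xC0000000 (as the comparison above does), and the map excerpt holds only the five symbols quoted in this message. The names parse_system_map and lookup are hypothetical helpers.

```python
import bisect

# Excerpt holding only the five System.map entries quoted in this message.
SYSTEM_MAP = """\
c0000900 t Decrementer
c0000a00 t Trap_0a
c0001000 t SoftEmu
c0001f00 t Trap_1f
c0002000 T transfer_to_handler
"""

# Assumption: the kernel is linked at this virtual base address.
KERNELBASE = 0xC0000000


def parse_system_map(text):
    """Return a sorted list of (address, symbol name) pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, addr):
    """Find the nearest symbol at or below addr; return (name, offset) or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base


symbols = parse_system_map(SYSTEM_MAP)
for label, value in (("NIP", 0x00001FFC), ("LR", 0x00000988)):
    name, offset = lookup(symbols, KERNELBASE + value)
    print(f"{label} {value:08X} -> {name}+0x{offset:x}")
```

With a full System.map the nearest symbol could differ from what this excerpt reports, so the result is only as good as the map that is fed in.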

I can't make sense of this, since I don't fully understand the
relationship between NIP and LR.  Does the LR address refer to the return
address after the exception? Does this mean the exception happened in the
Decrementer timer interrupt? Why is the NIP set to a value between Trap_1f
and transfer_to_handler?

I need your expertise!

NIP: 00001FFC XER: 8000FF7F LR: 00000988 REGS: c0e87c90 TRAP: 1000
MSR: 00001000 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 00
TASK = c0e86000[7] 'rc.sysinit' mm->pgd c0e8b000 Last syscall: 67
last math 00000000
GPR00: 00000000 C0E87D40 C0E86000 C0E87D50 0189EE34 C0133404 C0E88278 C0E8B000
GPR08: C00DA634 00FBA9E1 00FBA9E1 00FBA9E1 3555F593 018A31D0 00000000 00000D08
GPR16: 000000C1 01005000 0100A400 00300008 00001032 00E87D40 C00162F0 00009032
GPR24: 0189EE34 C0E88278 C0615760 00FBA8A1 C0FBA000 C0E86000 C0F5F000 C0132510
Call backtrace:
C0E87DF0 C0016204 C0016A98 C0009098 C0002544 018A4490 300E88A4
0180C6D4 0180C5D8 01803250 01802B64 01801D80
Instruction DUMP:
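
The flag fields on the MSR line of the dump can be reproduced from the raw MSR value. The sketch below uses the standard 32-bit PowerPC MSR bit masks; decode_msr is a hypothetical helper name, and this is an illustration rather than the kernel's own printing code.

```python
# Bit masks for the 32-bit PowerPC MSR fields that the dump prints.
MSR_BITS = {
    "EE": 0x8000,  # external interrupts enabled
    "PR": 0x4000,  # problem (user) state
    "FP": 0x2000,  # floating point available
    "ME": 0x1000,  # machine check enabled
    "IR": 0x0020,  # instruction address translation enabled
    "DR": 0x0010,  # data address translation enabled
}


def decode_msr(msr):
    """Return a dict mapping each flag name to 0 or 1."""
    return {name: int(bool(msr & mask)) for name, mask in MSR_BITS.items()}


flags = decode_msr(0x00001000)
print(flags)
# → {'EE': 0, 'PR': 0, 'FP': 0, 'ME': 1, 'IR': 0, 'DR': 0}
```

For the value in this dump only ME is set; IR and DR are both clear, which matches the "IR/DR: 00" field and means address translation was off in the saved MSR.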

Lucinda Schafer
Staff Software Engineer
Adaptive Micro-Ware, Inc.

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


* RE: Software Emulation Kernel Panic--specific information
@ 2000-06-21 19:13 Lucinda Schafer
  0 siblings, 0 replies; 5+ messages in thread
From: Lucinda Schafer @ 2000-06-21 19:13 UTC (permalink / raw)
  To: diekema, linuxppc-embedded


- What processor are you using? MPC823
- Have you tried running with caches disabled? No, is this important?
- When does this problem happen? Just at the end of boot-up, in 1 of every
1000 bootups
- Can you boot to a stand alone shell? Yes, except when this happens
- Can you boot to multi-user? Yes, except when this happens
- What Linux bits are you using? Not sure which bits you are
referring to...

-----Original Message-----
From: diekema@bucks.si.com [mailto:diekema@bucks.si.com]
Sent: Wednesday, June 21, 2000 2:09 PM
To: lucsch@adaptivemicro.com
Subject: Re: Software Emulation Kernel Panic--specific information


> The results of the backtrace are as follows (manual, not by your script):

> C0016204  put_dirty_page
> C0016A98  handle_mm_fault
> C0009098  do_page_fault
> C0002544  _switch
> 018A4490  ???
> 300E88A4  ???
> 0180C6D4  ???
> 0180C5D8  ???
> 01803250  ???
> 01802B64  ???
> 01801D80  ???

> Does this tell us that we have a page that is not handled properly? Why?

I haven't been paying too much attention to your problem.  So, you
might have already answered these questions.

- What processor are you using?
- Have you tried running with caches disabled?
- When does this problem happen?
- Can you boot to a stand alone shell?
- Can you boot to multi-user?
- What Linux bits are you using?


* RE: Software Emulation Kernel Panic--specific information
@ 2000-06-21 19:18 Lucinda Schafer
  0 siblings, 0 replies; 5+ messages in thread
From: Lucinda Schafer @ 2000-06-21 19:18 UTC (permalink / raw)
  To: diekema, linuxppc-embedded


Oops. One of my coworkers corrected me and asks:

I looked into the data cache setting in the MPC823 and, as far as I know, it
is disabled. Can you ask if there is something specific that we can look at
to determine if the caches are disabled and if there is something in Linux
that should be disabled?


Thanks.



* RE: Software Emulation Kernel Panic--specific information
@ 2000-06-21 19:52 Lucinda Schafer
  0 siblings, 0 replies; 5+ messages in thread
From: Lucinda Schafer @ 2000-06-21 19:52 UTC (permalink / raw)
  To: diekema, linuxppc-embedded


We came across this before and got all excited until we looked at our mask
revision (3H97G) and searched through the errata. That particular erratum is
no longer listed (I thought, though, that it is CPU6, not CPU5). There is a
fix (or workaround) for CPU6 in the Montavista kernel. So far that hasn't
fixed our problem, though.

Thanks for all your ideas.

Lucinda Schafer
Staff Software Engineer
Adaptive Micro-Ware, Inc.


-----Original Message-----
From: diekema@bucks.si.com [mailto:diekema@bucks.si.com]
Sent: Wednesday, June 21, 2000 2:29 PM
To: lucsch@adaptivemicro.com
Subject: Re: Software Emulation Kernel Panic--specific information


This should help...

>From wd@denx.de Sat Feb 26 19:57:50 2000
To: diekema@bucks.si.com (diekema_jon)
From: Wolfgang Denk <wd@denx.de>
Subject: Re: 27.75.228.193 vs. 207.75.228.193: I think I found the problem
Date: Sun, 27 Feb 2000 01:56:38 +0100

On the other hand, while this obviously was the immediate problem,
I'm not sure if you now will be much luckier. These other crashes are
not explained by the bad IP address.

And I have similar problems when running a more extensive set of
tests on my module. Problems which look very familiar to me... :-(

I'm reading a Microprocessor Mask Set of 0160 in my IMMR - which is a
bit strange, since it is not listed in the Motorola docs (for
instance http://www.mot.com/SPS/RISC/netcomm/docs/pubs/860.html); but
the printing on the chip reads "XPC860TZP50B5", which indicates a
B.5/3J21M mask - and the errata sheet (MPC860err.pdf on
http://www.mot.com/SPS/RISC/netcomm/support/index.html) still
contains the CPU Bug #5: "Possible Data Cache Corruption When Writing
SPR's".

This is a known problem, and it will KILL you during context switches
and/or interrupts with any OS which uses the MMU with virtual memory
- including Linux.

Please check your CPU - I'm afraid that TQ shipped you an old CPU, too.
If this should be the case you need to get a replacement, because you
will never be able to run a normal Linux at full speed on such a
system. [As a workaround and for verification that this is indeed the
problem, you can disable the data cache - please modify
arch/ppc/kernel/head.S as follows:

--- arch/ppc/kernel/head.S.OLD	Tue Feb 15 01:10:40 2000
+++ arch/ppc/kernel/head.S	Sun Feb 27 01:49:16 2000
@@ -425,14 +425,6 @@
 	mtspr	IC_CST, r8
 #if 0
 	mtspr	DC_CST, r8
-#else
-	/* For a debug option, I left this here to easily enable
-	 * the write through cache mode
-	 */
-	lis	r8, DC_SFWT@h
-	mtspr	DC_CST, r8
-	lis	r8, IDC_ENABLE@h
-	mtspr	DC_CST, r8
 #endif

 /* We now have the lower 8 Meg mapped into TLB entries, and the caches


This will completely disable the data cache. Your system will be a bit
slower than normal, but it should not crash any more.]


Seems we found both a software bug *and* a hardware problem :-(

I'm sorry, but I don't have any better news for you.

Wolfgang Denk

--
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88  Email: wd@denx.de
In C we had to code our own bugs, in C++ we can inherit them.



* Re: Software Emulation Kernel Panic--specific information
  2000-06-21 17:26 Software Emulation Kernel Panic--specific information Lucinda Schafer
@ 2000-06-21 20:08 ` Wolfgang Denk
  0 siblings, 0 replies; 5+ messages in thread
From: Wolfgang Denk @ 2000-06-21 20:08 UTC (permalink / raw)
  To: Lucinda Schafer; +Cc: linuxppc-embedded


In message <A109131318C4D1119AC20060088DECE330F493@amwmail.adaptivemicro.com>
Lucinda Schafer wrote:
>
> The NIP, LR, and TRAP always seem to be set to the same values (others
> vary). I assume those values are physical addresses.

No. All these are virtual addresses.

> If so, from System.map, the NIP seems to be between c0001f00 t Trap_1f and
> c0002000 T transfer_to_handler.

It should be obvious that 0xCxxxxxxx is a virtual address.

> I can't make sense of this, since I don't fully understand what the
> relationship of NIP and LR are.  Does the LR address refer to the return
> address after the exception? Does this mean the exception happened in the
> Decrementer timer interrupt? Why is the NIP set to a value between Trap_1f
> and transfer_to_handler?

NIP means "Next Instruction Pointer" and contains the address of the
instruction following the one that caused the exception; LR is the link
register and contains the return address = the address where execution
continues when you return from the current function.

So in simple words: NIP-4 gives the IP (Instruction Pointer) aka PC
(Program Counter), and LR-4 is the place where your current function
was called.
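
As a worked example of that arithmetic, applied to the NIP and LR values from the original dump (a sketch only; previous_insn is a hypothetical helper, and it relies on every PowerPC instruction being 4 bytes long):

```python
INSN_SIZE = 4  # every PowerPC instruction is 4 bytes long


def previous_insn(addr):
    """Address of the instruction just before addr, wrapped to 32 bits."""
    return (addr - INSN_SIZE) & 0xFFFFFFFF


nip, lr = 0x00001FFC, 0x00000988  # values from the panic dump
print(f"NIP-4 (instruction pointer): {previous_insn(nip):08X}")
print(f"LR-4  (call site):           {previous_insn(lr):08X}")
```

These two results are the addresses to look up in System.map or in a disassembly of the kernel image.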

Hope this helps,

Wolfgang Denk

--
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88  Email: wd@denx.de
To get something done, a committee should consist  of  no  more  than
three men, two of them absent.

