linuxppc-dev.lists.ozlabs.org archive mirror
* kernel crashes at InstructionTLBMiss
@ 2000-06-04  4:40 Daniel Wu
  2000-06-05  2:32 ` Dan A. Dickey
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Daniel Wu @ 2000-06-04  4:40 UTC (permalink / raw)
  To: linuxppc-embedded


Hi,

I'm still having a few problems with my Linux port (860T-based board), so I hope
someone can give me some fresh ideas on how to track down the problem. When I
boot the target, I get the following output and nothing more.

loaded at:     00800000 0080B1D8
relocated to:  00B00000 00B0B1D8
board data at: 00B00190 00B001B8
relocated to:  007F0100 007F0128
zimage at:     00806000 0087C6C1
initrd at:     0087C6C1 00A53511
avail ram:     00A54000 02000000

Linux/PPC load:
Uncompressing Linux...done.
Now booting the kernel
Linux version 2.2.13 (aaluser@c1rb) (gcc version 2.95.2 19991024 (release)) #97 Fri Jun 2 18:18:27 EST 2000
Boot arguments: root=/dev/ram
time_init: decrementer frequency = 187500000/60
Calibrating delay loop... 49.77 BogoMIPS
Memory: 29308k available (852k kernel code, 688k data, 32k init)
[c0000000,c2000000]
DENTRY hash table entries: 262144 (order: 9, 2097152 bytes)
Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 8192 (order: 3, 32768 bytes)
POSIX conformance testing by UNIFIX

I then ran the same code using a BDM debugger and it is showing that the code
is crashing at InstructionTLBMiss:

InstructionTLBMiss:
#ifndef NO_MPC8xxBUG_CPU6
        stw     r3, 8(r0)
        li      r3, M_TW_ADDR
        stw     r3, 12(r0)
        lwz     r3, 12(r0)
        mtspr   M_TW, r20       /* Save a couple of working registers */
        mfcr    r20
        stw     r20, 0(r0)
        stw     r21, 4(r0)
        mfspr   r20, SRR0       /* Get effective address of fault */
        li      r3, MD_EPN_ADDR
        stw     r3, 12(r0)
        lwz     r3, 12(r0)
#else /* NO_MPC8xxBUG_CPU6 */
        mtspr   M_TW, r20       /* Save a couple of working registers */
        mfcr    r20
        stw     r20, 0(r0)
        stw     r21, 4(r0)
        mfspr   r20, SRR0       /* Get effective address of fault */
#endif /* NO_MPC8xxBUG_CPU6 */
        mtspr   MD_EPN, r20     /* Have to use MD_EPN for walk, MI_EPN can't */

        mfspr   r20, M_TWB      /* Get level 1 table entry address */
==>        lwz     r21, 0(r20)     /* Get the level 1 entry */
        rlwinm. r20, r21,0,0,20 /* Extract page descriptor page address */

Note that I've applied the patch by Marcus Sundberg but either way, the same
thing happens.

The values of the general registers at the crash point are:

r0: 00a54230 c0a55dc0 c0a54000 00003780 c0a54230 00000000 c00f2000 00000319
r8: 0000001f 400f1000 0000000b c00f5b5c 84000028 00000000 00000000 00000000
r16: 00000000 00000000 00000000 00000000 400f1c00 000f4c20 00000000 00000000
r24: c0002284 00000000 00000000 c00f4bf0 00000001 c0a54000 c00f2ca8 c00f4be8

As you can see, r20 is 400f1c00, which looks wrong, but why? Any suggestions?

Thanks,
Daniel


** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: kernel crashes at InstructionTLBMiss
  2000-06-04  4:40 kernel crashes at InstructionTLBMiss Daniel Wu
@ 2000-06-05  2:32 ` Dan A. Dickey
  2000-06-05  8:19 ` 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss ) Murray Jensen
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Dan A. Dickey @ 2000-06-05  2:32 UTC (permalink / raw)
  To: Daniel Wu; +Cc: linuxppc-embedded


Daniel Wu wrote:
...
> I then ran the same code using a BDM debugger and it is showing that the code
> is crashing at InstructionTLBMiss:

Daniel,
your problem sounds suspiciously like a problem I was having with
the mpc8bug debugger.  Issuing a 'rms der 0' command before the
'go ...' worked well for me.  Why don't you give that a try?
	-Dan


* 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
  2000-06-04  4:40 kernel crashes at InstructionTLBMiss Daniel Wu
  2000-06-05  2:32 ` Dan A. Dickey
@ 2000-06-05  8:19 ` Murray Jensen
  2000-06-05 20:37   ` Dan Malek
  2000-06-05 14:51 ` kernel crashes at InstructionTLBMiss Dan Malek
  2000-06-30  6:17 ` Debug information for elf format Kwansuk Kim
  3 siblings, 1 reply; 22+ messages in thread
From: Murray Jensen @ 2000-06-05  8:19 UTC (permalink / raw)
  To: Daniel Wu; +Cc: linuxppc-embedded


>        mfspr   r20, M_TWB      /* Get level 1 table entry address */
...
>As you can see, r20 is 400f1c00, which looks wrong, but why? Any suggestions?

At this point, the MMU is disabled so r20, which is loaded from the MMU Table
Walk Base register, should be a physical address - 400f1c00 is not a likely
physical address for something in RAM (unless you have a weird disjoint RAM
setup), so yes it certainly looks wrong. Incorrect TWB contents is a disaster.
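
To make "looks wrong" a bit more precise: assuming the M_TWB layout as I read it in the MPC860 User's Manual (level-1 table base in the upper 20 bits; once MD_EPN has been written, EA[0:9] of the faulting address is filled in as the entry index, so the register reads back as the address of the level-1 entry), the value in r20 splits into two parts. The helper names below are hypothetical and the bit layout is an assumption. Decoding 400f1c00 gives index 0x300, which corresponds to an effective address between 0xC0000000 and 0xC03FFFFF, i.e. kernel text, so the index is plausible. It is the base, 0x400f1000, that cannot be a RAM address on a board whose RAM ends at 0x02000000.

```c
#include <stdint.h>

/* Hypothetical helpers; assumes the M_TWB layout described above. */

/* Upper 20 bits: the level-1 table base (a 4 KB-aligned pgd). */
static uint32_t twb_l1_base(uint32_t twb_read)
{
	return twb_read & 0xFFFFF000u;
}

/* Bits 2-11: EA[0:9] of the faulting address, i.e. the index of the
 * level-1 entry (one 4-byte entry per 4 MB of address space). */
static uint32_t twb_l1_index(uint32_t twb_read)
{
	return (twb_read & 0x00000FFCu) >> 2;
}

/* What M_TWB should read back after MD_EPN is written with 'ea',
 * given a correct physical pgd base. */
static uint32_t twb_expected(uint32_t pgd_phys, uint32_t ea)
{
	return (pgd_phys & 0xFFFFF000u) | ((ea >> 22) << 2);
}
```

For comparison, a pgd at the (hypothetical) physical address 0x000f1000 and a miss on 0xc0002284 would make M_TWB read back as 0x000f1c00.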

Here we come to a dilemma that I have had since I started with this stuff.
I have never been able to get an 8xx kernel running without adding a patch
to update the Table Walk Base register at the time that a new mm context is
activated.

Let me explain: normally the TWB is loaded at context switch time which makes
sense because a different task with a different virtual memory context will be
running. This is done in the following code in the _switch function in
arch/ppc/kernel/entry.S:

>	tophys(r0,r4)
>	mtspr	SPRG3,r0	/* Update current THREAD phys addr */
>#ifdef CONFIG_8xx
>	/* XXX it would be nice to find a SPRGx for this on 6xx,7xx too */
>	lwz	r9,PGDIR(r4)	/* cache the page table root */
>        tophys(r9,r9)		/* convert to phys addr */
>        mtspr   M_TWB,r9	/* Update MMU base address */
>	tlbia
>	SYNC
>#endif /* CONFIG_8xx */

The contents of the TWB should be the address stored in current->thread.pgdir
converted to a physical address. The above code is the only place that the
TWB is written to, anywhere in the kernel (that I can find). The TWB is then
used in the TLB miss handlers to load the TLB entry (assuming a mapping exists
- if not, do_page_fault() is called to fill it in).
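
A minimal C model of what the CONFIG_8xx block in _switch computes. It assumes KERNELBASE is 0xC0000000 (consistent with the [c0000000,c2000000] range in the boot log above) and that tophys() simply subtracts it; struct thread_model, m_twb and switch_load_twb() are hypothetical stand-ins, not kernel code.

```c
#include <stdint.h>

/* Hypothetical model of the _switch TWB update; not kernel code. */
#define KERNELBASE 0xC0000000u   /* assumed kernel virtual base */

/* tophys(): kernel virtual address -> physical address. */
static uint32_t tophys(uint32_t virt)
{
	return virt - KERNELBASE;
}

struct thread_model {
	uint32_t pgdir;   /* current->thread.pgdir, a kernel virtual address */
};

static uint32_t m_twb;   /* stands in for the M_TWB register */

/* lwz r9,PGDIR(r4); tophys(r9,r9); mtspr M_TWB,r9 */
static void switch_load_twb(const struct thread_model *next)
{
	m_twb = tophys(next->pgdir);
	/* tlbia and SYNC follow in the real code; not modelled here */
}
```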

But I have found that there is a situation during "exec()" where a newly
created mm context is "activated" (via activate_mm() in asm/mmu_context.h)
before the task is actually "switch"ed to (presumably to copy the arguments
and environment etc from the old task - which is being overwritten) i.e. the
TWB is not updated because a switch hasn't occurred [NOTE: this is only my
theory - I am not an expert on this stuff]

Without my patch, the exec of "/sbin/init" hangs in an endless TLBMiss handler
loop, where a virtual address is accessed which causes a TLB miss, the TWB
has contents of the old pgdir which does not have a mapping for that virtual
address so do_page_fault() is called to fill it in, but do_page_fault()
decides that that mapping exists and everything is ok so why the hell did
you call me, I'll just return doing nothing! - the access is re-tried which
causes a TLB miss again at the same virtual address. The kernel is in a
dead hang (although later 2.[34].* kernels exhibit different symptoms, which
mystifies me a bit - i.e. characters typed on the console are echoed, and I
know timer interrupts are occurring, because I have a rotating thingy on the
LCD display which updates once a second via the timer interrupt handler, so it
is not a complete dead hang).
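
The loop described above can be reproduced in miniature. This is a toy model, not kernel code: every name in it is hypothetical, a page table is reduced to one "present" bit per page, and the retry bound stands in for "forever". The point it illustrates is that the hardware-assisted walk starts from whatever M_TWB points at, while do_page_fault() consults current->mm, so if activate_mm() changes only the latter, the two disagree and nothing changes between retries.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the stale-TWB livelock; all names are hypothetical. */
#define NPAGES 16

struct mm_model {
	bool mapped[NPAGES];   /* one "present" bit per virtual page */
};

struct cpu_model {
	const struct mm_model *twb;     /* the table M_TWB points at */
	const struct mm_model *current; /* current->mm as the kernel sees it */
	bool tlb[NPAGES];               /* one "TLB" entry per page */
};

/* The behaviour described above: look at current->mm, find the page
 * present, conclude there is nothing to do, and return. */
static void do_page_fault(struct cpu_model *cpu, size_t page)
{
	if (cpu->current->mapped[page])
		return;
	/* demand paging of a genuinely absent page is not modelled */
}

/* Retry an access until it hits in the TLB or max_tries misses occur. */
static bool access_page(struct cpu_model *cpu, size_t page, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (cpu->tlb[page])
			return true;
		/* TLB miss: the table walk starts from M_TWB */
		if (cpu->twb->mapped[page])
			cpu->tlb[page] = true;
		else
			do_page_fault(cpu, page);
	}
	return false;
}

/* activate_mm(): always changes current->mm, M_TWB only if asked to. */
static void activate_mm(struct cpu_model *cpu, const struct mm_model *mm,
			bool update_twb)
{
	cpu->current = mm;
	if (update_twb)
		cpu->twb = mm;
	for (size_t i = 0; i < NPAGES; i++)
		cpu->tlb[i] = false;   /* tlbia */
}
```

With update_twb false, an access to a page that exists only in the new mm never succeeds; with it true, the second retry hits.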

The patch I always have to add to arch/ppc/kernel/head_8xx.S is:

  */
 _GLOBAL(set_context)
         mtspr   M_CASID,r3		/* Update context */
+	lwz	r3, THREAD+PGDIR(r2)
+	tophys(r3, r3)
+	mtspr	M_TWB, r3
         tlbia
 	SYNC
 	blr

I know this is wrong, but it seems to work for me (unless the TWB can be
considered to be part of the MMU context, and therefore it is legitimate
to update it in set_context()? I don't know).

I have tried other things e.g. adding a "set_context_and_twb()" function,
just after the set_context() function (without above patch), e.g.:

--- arch/ppc/kernel/head_8xx.S	2000/04/28 06:35:05	1.1.1.5
+++ arch/ppc/kernel/head_8xx.S	2000/06/05 07:51:50
@@ -905,6 +905,19 @@
 	SYNC
 	blr

+/*
+ * the 8xx tablewalk base register (M_TWB) must be consistent with
+ * the currently active mm. This is called from switch_mm() and
+ * activate_mm() in include/asm-ppc/mmu_context.h
+ */
+_GLOBAL(set_context_and_twb)
+        mtspr   M_CASID,r3		/* Update context */
+	tophys(r4, r4)
+	mtspr	M_TWB, r4
+	tlbia
+	SYNC
+	blr
+
 /* Jump into the system reset for the rom.
  * We first disable the MMU, and then jump to the ROM reset address.
  *

Then doing something like this:

--- include/asm-ppc/mmu_context.h	2000/03/07 03:59:54	1.1.1.2
+++ include/asm-ppc/mmu_context.h	2000/06/05 07:46:35
@@ -52,6 +52,11 @@
 extern void set_context(int context);

 #ifdef CONFIG_8xx
+/* same as above plus loads the 8xx tablewalk base register also */
+extern void set_context_and_twb(int, void *);
+#endif
+
+#ifdef CONFIG_8xx
 extern inline void mmu_context_overflow(void)
 {
 	atomic_set(&next_mmu_context, -1);
@@ -85,7 +90,10 @@
 {
 	tsk->thread.pgdir = next->pgd;
 	get_mmu_context(next);
-	set_context(next->context);
+	if (tsk == current)
+		set_context_and_twb(next->context, tsk->thread.pgdir);
+	else
+		set_context(next->context);
 }

 /*
@@ -96,7 +104,7 @@
 {
 	current->thread.pgdir = mm->pgd;
 	get_mmu_context(mm);
-	set_context(mm->context);
+	set_context_and_twb(mm->context, current->thread.pgdir);
 }

 /*

This works also, though I'm not sure about it. I was thinking that maybe the
set_context() in switch_mm() should only be done if the switch_mm() is being
performed on the "current" task. e.g.

	tsk->thread.pgdir = next->pgd;
	get_mmu_context(next);
	if (tsk == current)
		set_context(next->context);

Then set_context() could simply update the TWB with current->thread.pgdir.
But I think the only place switch_mm() is called is in the task context
switch code anyway, which means current is about to change, and also means
I get confused :-) But I know activate_mm() is used in other places -
something to do with "lazy tlb" mode, and also in exec(). I give up.
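
The conditional update being considered here can be written out as a small model. This is a sketch only: switch_mm_model(), the C versions of set_context() and set_context_and_twb(), current_task, and the m_casid/m_twb variables are hypothetical stand-ins for the assembly routines and SPRs in the patch above. KERNELBASE is assumed to be 0xC0000000, and get_mmu_context() is left out.

```c
#include <stdint.h>

/* Hypothetical model of the guarded switch_mm(); not kernel code. */
#define KERNELBASE 0xC0000000u   /* assumed */

struct mm_model   { int context; uint32_t pgd; };  /* pgd: kernel virtual */
struct task_model { uint32_t pgdir; };             /* thread.pgdir */

static struct task_model *current_task;   /* stands in for 'current' */
static uint32_t m_casid, m_twb;           /* stand in for the two SPRs */

static uint32_t tophys(uint32_t virt) { return virt - KERNELBASE; }

static void set_context(int ctx) { m_casid = (uint32_t)ctx; }

static void set_context_and_twb(int ctx, uint32_t pgdir)
{
	m_casid = (uint32_t)ctx;
	m_twb = tophys(pgdir);   /* tlbia; SYNC not modelled */
}

/* switch_mm() with the tsk == current guard from the patch above. */
static void switch_mm_model(struct mm_model *next, struct task_model *tsk)
{
	tsk->pgdir = next->pgd;
	if (tsk == current_task)
		set_context_and_twb(next->context, tsk->pgdir);
	else
		set_context(next->context);
}
```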

One thing I think is for certain in all this - do_page_fault() should
*NEVER* return without having done something - anything - to ensure
that the same fault does not re-occur after the handler returns - if it
can't handle the fault, it should either kill the task if it is in user
mode, or panic if in kernel mode.
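
That rule, plus a cheap debugging aid for catching the kind of livelock described earlier, might look like this in C. Both functions are hypothetical illustrations, not existing kernel interfaces; fixed_up means the handler actually changed something (installed a PTE, reloaded the TLB, ...) that will make the retried access succeed.

```c
#include <stdbool.h>

/* Hypothetical sketch of the "never return without progress" rule. */
enum fault_action { FAULT_FIXED, FAULT_KILL_TASK, FAULT_PANIC };

/* Returning quietly is only allowed if something was actually fixed;
 * otherwise kill the task (user mode) or panic (kernel mode). */
static enum fault_action fault_policy(bool fixed_up, bool user_mode)
{
	if (fixed_up)
		return FAULT_FIXED;
	return user_mode ? FAULT_KILL_TASK : FAULT_PANIC;
}

/* Debugging aid: count consecutive faults on the same address. A handler
 * that really fixed something should not see that address again at once. */
struct fault_tracker {
	unsigned long last_addr;
	unsigned int repeats;
};

static bool fault_is_repeating(struct fault_tracker *t, unsigned long addr,
			       unsigned int limit)
{
	if (addr == t->last_addr) {
		t->repeats++;
	} else {
		t->last_addr = addr;
		t->repeats = 0;
	}
	return t->repeats >= limit;
}
```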

One thing that bothers me is why this behaviour only occurs for me. I have
no idea, but obviously it is only me, otherwise no-one would have a working
embedded 8xx 2.[34].* kernel. I suspect I am doing something else which
triggers this bug, or else there is something I don't understand
(not unlikely :-).

Note: I have only ever tried the 2.[34].* series of kernels. I have not
tried the 2.2.* kernels, but some code snippets I have seen in the list
archives suggest ... I just searched the list and found the following
comment from Dan Malek on 16 Dec 98:

	> BTW, why must the M_TWB be set in SET_PAGE_DIR ?

	The M_TWB points to the first level page table (Linux pgd_t)
	and is used in the mpc8xx page fault handler.  When Linux
	deletes or otherwise modifies the memory map object such that
	the first level page table is modified (as during exec), it
	uses SET_PAGE_DIR.  Since the first level table has potentially
	moved to a new memory location, we have to set M_TWB at
	this time.  If we don't, a process exec without an intervening
	context switch will cause us to use a bogus M_TWB when
	trying to find page tables.


	    -- Dan

OK - where is SET_PAGE_DIR() in the 2.[34].* kernels? Following the threads
it appears that this discussion was had a long time ago, but in the other
direction - the TWB was being updated too often, and the consensus was that
it should only be updated when the SET_PAGE_DIR macro was setting the page
dir for the current task. Now it is not setting it at all.

I think I'd better shut up now and let other more experienced people tell me
what I have missed or where I have gone wrong :-) Cheers!
								Murray...
--
Murray Jensen, CSIRO Manufacturing Sci & Tech,         Phone: +61 3 9662 7763
Locked Bag No. 9, Preston, Vic, 3072, Australia.         Fax: +61 3 9662 7853
Internet: Murray.Jensen@cmst.csiro.au  (old address was mjj@mlb.dmt.csiro.au)


* Re: kernel crashes at InstructionTLBMiss
  2000-06-04  4:40 kernel crashes at InstructionTLBMiss Daniel Wu
  2000-06-05  2:32 ` Dan A. Dickey
  2000-06-05  8:19 ` 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss ) Murray Jensen
@ 2000-06-05 14:51 ` Dan Malek
  2000-06-05 15:55   ` Dan Malek
  2000-06-06  3:56   ` Daniel Wu
  2000-06-30  6:17 ` Debug information for elf format Kwansuk Kim
  3 siblings, 2 replies; 22+ messages in thread
From: Dan Malek @ 2000-06-05 14:51 UTC (permalink / raw)
  To: Daniel Wu; +Cc: linuxppc-embedded


Daniel Wu wrote:

> boot the target, I get the following output and nothing more.
>
> loaded at:     00800000 0080B1D8
> relocated to:  00B00000 00B0B1D8
> board data at: 00B00190 00B001B8
> relocated to:  007F0100 007F0128
> zimage at:     00806000 0087C6C1
> initrd at:     0087C6C1 00A53511
> avail ram:     00A54000 02000000

There are several things to watch for.  First, I am surprised you see
this much output.  You have obviously changed link addresses in the
Makefile, which you shouldn't do.  Because of the early kernel
mapping, everything should reside in the lower 8Mbytes of memory.  The
zImage support loader (arch/ppc/mbxboot/...stuff...) should link to
low memory, 0x00100000.  You should load the image either just above
that, at 0x00200000 or in very high ROM addresses ( > 16 Mbyte).
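
The loader's printout can be checked against that rule quickly. This sketch assumes the two numbers on each line are a start address and an exclusive end address, and that the limit is exactly 0x00800000; in_early_map() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check; assumes an 8 MB early mapping starting at 0. */
#define EARLY_MAP_LIMIT 0x00800000u

/* True if [start, end) lies entirely inside the early kernel mapping. */
static bool in_early_map(uint32_t start, uint32_t end)
{
	return start < end && end <= EARLY_MAP_LIMIT;
}
```

Applied to the output above, the "loaded at" and "initrd at" ranges both fall outside the first 8 MB, while the relocated board data (007F0100 to 007F0128) is inside it.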

You are also running an 860T at 50 MHz, so you are likely to discover
the "CPU6" silicon errata.  You need all of the patches for this.

Go to the MontaVista ftp site (ftp.mvista.com), /pub/CDK/wip/ppc_8xx/RPMS.
Get the kernel sources/headers from there (along with any other tools
you may want or need).  This is a 2.2.13 kernel with all patches and
the option to include the "CPU6" patch.  Don't apply any other patches
from anywhere.  Just use it and make the minimal changes for your
board.

Using a BDM is more likely to cause trouble than help.  This kernel
has XMON and KGDB options.  Use them instead of BDM.


	-- Dan


* Re: kernel crashes at InstructionTLBMiss
  2000-06-05 14:51 ` kernel crashes at InstructionTLBMiss Dan Malek
@ 2000-06-05 15:55   ` Dan Malek
  2000-06-05 16:19     ` Dan Malek
  2000-06-06  3:59     ` Graham Stoney
  2000-06-06  3:56   ` Daniel Wu
  1 sibling, 2 replies; 22+ messages in thread
From: Dan Malek @ 2000-06-05 15:55 UTC (permalink / raw)
  To: Dan Malek; +Cc: Daniel Wu, linuxppc-embedded

[-- Attachment #1: Type: text/plain, Size: 326 bytes --]

Dan Malek wrote:

> ..... This is a 2.2.13 kernel with all patches and
> the option to include the "CPU6" patch.  Don't apply any other patches
> from anywhere.


I was just reminded of a patch to correct a mistake I made in this
particular kernel.  It is attached.  Apply just this one to the
MontaVista kernel :-).


	-- Dan

[-- Attachment #2: mv-cpu6-3.patch --]
[-- Type: text/plain, Size: 726 bytes --]

diff -Nru linux-2.2.13.orig/arch/ppc/kernel/head.S linux-2.2.13/arch/ppc/kernel/head.S
--- linux-2.2.13.orig/arch/ppc/kernel/head.S	Mon Jun  5 11:44:56 2000
+++ linux-2.2.13/arch/ppc/kernel/head.S	Mon Jun  5 11:45:59 2000
@@ -2452,11 +2452,11 @@
 	SYNC			/* Some chip revs need this... */
 	mtmsr	r6
 	SYNC
-	lis	r7, cmd_line@h
-	ori	r7, r7, cmd_line@l
+	lis	r7, cpu6_bug@h
+	ori	r7, r7, cpu6_bug@l
 	li	r4, 0x2c00
-	stw	r4, 12(r7)
-	lwz	r4, 12(r7)
+	stw	r4, 0(r7)
+	lwz	r4, 0(r7)
         mtspr   22, r3		/* Update Decrementer */
 	SYNC
 	mtmsr	r5
@@ -2899,6 +2899,10 @@
 	.globl	cmd_line
 cmd_line:
 	.space	512
+
+#ifdef CONFIG_8xx_CPU6
+	.space	4
+#endif

 /*
  * An undocumented "feature" of 604e requires that the v bit


* Re: kernel crashes at InstructionTLBMiss
  2000-06-05 15:55   ` Dan Malek
@ 2000-06-05 16:19     ` Dan Malek
  2000-06-06  3:59     ` Graham Stoney
  1 sibling, 0 replies; 22+ messages in thread
From: Dan Malek @ 2000-06-05 16:19 UTC (permalink / raw)
  To: Dan Malek; +Cc: Daniel Wu, linuxppc-embedded

[-- Attachment #1: Type: text/plain, Size: 294 bytes --]

Dan Malek wrote:

> I was just reminded of a patch to correct a mistake I made in this
> particular kernel.  It is attached.  Apply just this one to the
> MontaVista kernel :-).

Nope, not that one......try this one, sorry.  Too many windows open
and didn't see the build error......



	-- Dan

[-- Attachment #2: mv-cpu6-1.patch --]
[-- Type: text/plain, Size: 737 bytes --]

diff -Nru linux-2.2.13.orig/arch/ppc/kernel/head.S linux-2.2.13/arch/ppc/kernel/head.S
--- linux-2.2.13.orig/arch/ppc/kernel/head.S	Fri Mar 24 23:43:32 2000
+++ linux-2.2.13/arch/ppc/kernel/head.S	Fri Mar 24 23:51:19 2000
@@ -2452,11 +2452,11 @@
 	SYNC			/* Some chip revs need this... */
 	mtmsr	r6
 	SYNC
-	lis	r7, cmd_line@h
-	ori	r7, r7, cmd_line@l
+	lis	r7, cpu6_bug@h
+	ori	r7, r7, cpu6_bug@l
 	li	r4, 0x2c00
-	stw	r4, 12(r7)
-	lwz	r4, 12(r7)
+	stw	r4, 0(r7)
+	lwz	r4, 0(r7)
         mtspr   22, r3		/* Update Decrementer */
 	SYNC
 	mtmsr	r5
@@ -2899,6 +2899,10 @@
 	.globl	cmd_line
 cmd_line:
 	.space	512
+
+#ifdef CONFIG_8xx_CPU6
+cpu6_bug:
+	.space	4
+#endif

 /*
  * An undocumented "feature" of 604e requires that the v bit


* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
  2000-06-05  8:19 ` 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss ) Murray Jensen
@ 2000-06-05 20:37   ` Dan Malek
  2000-06-06  6:31     ` Murray Jensen
  2000-06-06 17:03     ` net driver receive problems Tom Roberts
  0 siblings, 2 replies; 22+ messages in thread
From: Dan Malek @ 2000-06-05 20:37 UTC (permalink / raw)
  To: Murray Jensen; +Cc: Daniel Wu, linuxppc-embedded


Murray Jensen wrote:

> Here we come to a dilemma that I have had since I started with this stuff.
> I have never been able to get an 8xx kernel running without adding a patch
> to update the Table Walk Base register at the time that a new mm context is
> activated.


After reading your diatribe perhaps I should provide a little information.
There are many subtle changes to context switching that happen during
the minor updates (which could be weekly).  There are several patches
floating around (and probably more kernel sources) that certainly
are not correct.  I don't know where you get your source code, but there
are exactly two consistent and working kernel sources that I have ever
provided.  One is in ftp://linuxppc.cs.nmt.edu/pub/linuxppc/embedded,
the mpc8xx-2.2.13.tgz tarball.  A better and completely up to date
kernel is in ftp.mvista.com/pub/CDK/wip/ppc_8xx/RPMS (along with
everything else to build an 8xx embedded system).  Everyone should be
using the kernel from MontaVista, and if something isn't in there
that you want, send me patches against that.

There are patches posted against that original tarball, and make sure
you are not mixing kernel versions and patches.

Finally, lots of bugs associated with porting to new hardware manifest
themselves as "problems" in any VM related function.  Since many people
don't understand the subtle interactions of all of these functions (as
evidenced by your message) you become convinced the problem is associated
with this complexity and fail to unravel the clues to the real cause.
This could be as simple as intrusive debugging hardware, some silicon
bug not understood, or prototype hardware not working correctly.

There are lots of products and systems in development running this software,
so you have to approach this generic software from the assumption that
it is first likely to be working.  You seldom hear from those people.
Are there possible bugs?  Sure, and you have to provide minimal information
for the rest of us to help out.  Where did you get the sources? What
patches did you apply?  What are your hardware details?  What
modifications did you make?


As for 2.4.xx, the 8xx still doesn't work correctly.  However, I
discovered it failed to work after the 403 additions, so I am now
learning about the 403 in an effort to make everything live happily
together again.  Note, this has nothing to do with M_TWB......


	-- Dan


* Re: kernel crashes at InstructionTLBMiss
  2000-06-05 14:51 ` kernel crashes at InstructionTLBMiss Dan Malek
  2000-06-05 15:55   ` Dan Malek
@ 2000-06-06  3:56   ` Daniel Wu
  2000-06-06 20:18     ` Dan Malek
  2000-08-10 12:05     ` too few RAM? Wojciech Kromer
  1 sibling, 2 replies; 22+ messages in thread
From: Daniel Wu @ 2000-06-06  3:56 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-embedded


Dan,

Dan Malek wrote:

> > boot the target, I get the following output and nothing more.
> >
> > loaded at:     00800000 0080B1D8
> > relocated to:  00B00000 00B0B1D8
> > board data at: 00B00190 00B001B8
> > relocated to:  007F0100 007F0128
> > zimage at:     00806000 0087C6C1
> > initrd at:     0087C6C1 00A53511
> > avail ram:     00A54000 02000000
>
> There are several things to watch for.  First, I am surprised you see
> this much output.  You have obviously changed link addresses in the
> Makefile, which you shouldn't do.  Because of the early kernel
> mapping, everything should reside in the lower 8Mbytes of memory.  The
> zImage support loader (arch/ppc/mbxboot/...stuff...) should link to
> low memory, 0x00100000.  You should load the image either just above
> that, at 0x00200000 or in very high ROM addresses ( > 16 Mbyte).
>

The reason why I changed the address was because my uncompressed kernel is
about 1.3M. This means if I load at 0x100000 (the default), then the board data
gets trashed and I get nothing after the kernel has finished decompressing. I
initially moved the address to 0x200000, but then I was getting other strange
errors so I moved the whole thing higher into memory.
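
The collision described here is a half-open range overlap. This sketch assumes, for illustration, that the kernel is decompressed to physical address 0 and occupies about 1.3 MB (rounded up to 0x150000), and takes the loader's size (0xB1D8 bytes) from the "loaded at" line in the original message; struct range and ranges_overlap() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: half-open physical address range [start, end). */
struct range {
	uint32_t start;
	uint32_t end;
};

static bool ranges_overlap(struct range a, struct range b)
{
	return a.start < b.end && b.start < a.end;
}
```

Under those assumptions a loader at 0x00100000 overlaps the decompressed kernel, while one at 0x00200000 does not.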

BTW, I found the problem that caused the crash in InstructionTLBMiss, partly
thanks to Murray Jensen. I did not implement his patches but while reading his
email, and trying to follow it, I realised that the M_TWB was not initialised
properly in the first place! There was some code that was not supposed to be
there - probably introduced while trying to patch the file from various
sources.

I now get further, but unfortunately I'm not there yet. The code stops after
the RAM disk driver inits. Anyway, I'm thinking of starting from scratch with
the kernel and patches at the MontaVista site. Come to think of it, all my
changes are in one file so it should not be too difficult to port. I will let
you know how I go.

>
> You are also running an 860T at 50 MHz, so you are likely to discover
> the "CPU6" silicon errata.  You need all of the patches for this.
>
> Go to the MontaVista ftp site (ftp.mvista.com), /pub/CDK/wip/ppc_8xx/RPMS.
> Get the kernel sources/headers from there (along with any other tools
> you may want or need).  This is a 2.2.13 kernel with all patches and
> the option to include the "CPU6" patch.  Don't apply any other patches
> from anywhere.  Just use it and make the minimal changes for your
> board.
>
> Using a BDM is more likely to cause trouble than help.  This kernel
> has XMON and KGDB options.  Use them instead of BDM.

Unfortunately, if you get no output, BDM is the _only_ option you have - at
least it will give you some details of the registers, although you can't step
through code due to the virtual addresses.

Thanks for everyone's suggestions.

Regards,
Daniel



* Re: kernel crashes at InstructionTLBMiss
  2000-06-05 15:55   ` Dan Malek
  2000-06-05 16:19     ` Dan Malek
@ 2000-06-06  3:59     ` Graham Stoney
  1 sibling, 0 replies; 22+ messages in thread
From: Graham Stoney @ 2000-06-06  3:59 UTC (permalink / raw)
  To: Dan Malek; +Cc: LinuxPPC Embedded Mailing List


Dan Malek writes:
> I was just reminded of a patch to correct a mistake I made in this
> particular kernel.  It is attached.  Apply just this one to the
> MontaVista kernel :-).

I'd like to suggest the following patch as a more complete fix to avoid any
possible cmd_line corruption due to the CPU6 workaround.
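
Why the old workaround could corrupt the command line: it used the word at cmd_line+12 as the scratch location for the CPU6 store/load, so bytes 12 to 15 of the command line were overwritten. The model below is a hypothetical C rendering of the two variants (cpu6_scratch_old() and cpu6_scratch_new() are made-up names). It copies the scratch value with memcpy, so the exact bytes written depend on host endianness, but either way the string changes in the old variant and not in the new one.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the two CPU6 scratch locations in head.S. */
static char cmd_line[512];
static uint32_t cpu6_bug;   /* the dedicated scratch word */

/* Old: stw/lwz through 12(cmd_line) clobbers command-line bytes 12-15. */
static uint32_t cpu6_scratch_old(uint32_t code)
{
	uint32_t back;
	memcpy(cmd_line + 12, &code, sizeof code);   /* stw rX, 12(rY) */
	memcpy(&back, cmd_line + 12, sizeof back);   /* lwz rX, 12(rY) */
	return back;
}

/* New: stw/lwz through 0(cpu6_bug) touches nothing else. */
static uint32_t cpu6_scratch_new(uint32_t code)
{
	volatile uint32_t *p = &cpu6_bug;
	*p = code;    /* stw rX, 0(rY) */
	return *p;    /* lwz rX, 0(rY) */
}
```

With the boot arguments from the original report ("root=/dev/ram"), byte 12 is the final 'm', so the old variant damages the root device name.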

Regards,
Graham

Index: arch/ppc/kernel/head.S
===================================================================
retrieving revision 1.1.1.3
diff -u -r1.1.1.3 head.S
--- arch/ppc/kernel/head.S	2000/03/10 01:11:12	1.1.1.3
+++ arch/ppc/kernel/head.S	2000/06/06 03:53:32
@@ -2286,15 +2286,15 @@
         lwz     r9,PGD(r9)              /* get new->mm->pgd */
         addis   r9,r9,-KERNELBASE@h     /* convert to phys addr */
 #ifdef CONFIG_8xx_CPU6
-	lis	r6, cmd_line@h
-	ori	r6, r6, cmd_line@l
+	lis	r6, cpu6_bug@h
+	ori	r6, r6, cpu6_bug@l
 	li	r7, 0x3980
-	stw	r7, 12(r6)
-	lwz	r7, 12(r6)
+	stw	r7, 0(r6)
+	lwz	r7, 0(r6)
         mtspr   M_TWB, r9               /* Update MMU base address */
 	li	r7, 0x3380
-	stw	r7, 12(r6)
-	lwz	r7, 12(r6)
+	stw	r7, 0(r6)
+	lwz	r7, 0(r6)
         mtspr   M_CASID, r5             /* Update context */
 #else
         mtspr   M_TWB, r9               /* Update MMU base address */
@@ -2432,11 +2432,11 @@
 	SYNC			/* Some chip revs need this... */
 	mtmsr	r6
 	SYNC
-	lis	r7, cmd_line@h
-	ori	r7, r7, cmd_line@l
+	lis	r7, cpu6_bug@h
+	ori	r7, r7, cpu6_bug@l
 	li	r4, 0x3980
-	stw	r4, 12(r7)
-	lwz	r4, 12(r7)
+	stw	r4, 0(r7)
+	lwz	r4, 0(r7)
         mtspr   M_TWB, r3               /* Update MMU base address */
 	SYNC
 	mtmsr	r5
@@ -2452,11 +2452,11 @@
 	SYNC			/* Some chip revs need this... */
 	mtmsr	r6
 	SYNC
-	lis	r7, cmd_line@h
-	ori	r7, r7, cmd_line@l
+	lis	r7, cpu6_bug@h
+	ori	r7, r7, cpu6_bug@l
 	li	r4, 0x2c00
-	stw	r4, 12(r7)
-	lwz	r4, 12(r7)
+	stw	r4, 0(r7)
+	lwz	r4, 0(r7)
         mtspr   22, r3		/* Update Decrementer */
 	SYNC
 	mtmsr	r5
@@ -2899,6 +2899,11 @@
 	.globl	cmd_line
 cmd_line:
 	.space	512
+
+#ifdef CONFIG_8xx_CPU6
+cpu6_bug:
+	.space 4
+#endif

 /*
  * An undocumented "feature" of 604e requires that the v bit
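
The constants stored to the scratch word before each mtspr (0x3980 before M_TWB, 0x3380 before M_CASID, 0x2c00 before the decrementer) are not arbitrary. Assuming the SPR numbers 22 (DEC), 793 (M_CASID), 795 (MD_EPN) and 796 (M_TWB), as I recall them from the MPC860 manual, each constant equals the SPR number with its two 5-bit halves swapped (the form the SPR field takes inside an mtspr instruction), shifted left by 4. The shift is inferred from the constants themselves, and cpu6_spr_code() is a hypothetical name. As a cross-check, 0x3780 is exactly the value of r3 in the register dump from the original report, which had just been loaded via MD_EPN_ADDR.

```c
#include <stdint.h>

/* Hypothetical helper: swap the 5-bit halves of a 10-bit SPR number,
 * as in the mtspr SPR field, then shift left by 4 (inferred from the
 * constants used in the CPU6 patches). */
static uint32_t cpu6_spr_code(uint32_t spr)
{
	uint32_t lo = spr & 0x1Fu;
	uint32_t hi = (spr >> 5) & 0x1Fu;
	return ((lo << 5) | hi) << 4;
}
```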


--
Graham Stoney
Principal Hardware/Software Engineer
Canon Information Systems Research Australia
Ph: +61 2 9805 2909  Fax: +61 2 9805 2929


* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
  2000-06-05 20:37   ` Dan Malek
@ 2000-06-06  6:31     ` Murray Jensen
  2000-06-06 20:05       ` Dan Malek
  2000-06-07  3:02       ` Dan A. Dickey
  2000-06-06 17:03     ` net driver receive problems Tom Roberts
  1 sibling, 2 replies; 22+ messages in thread
From: Murray Jensen @ 2000-06-06  6:31 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-embedded


On Mon, 05 Jun 2000 16:37:55 -0400, Dan Malek <dan@netx4.com> writes:
>Murray Jensen wrote:
>
>> Here we come to a dilemma that I have had since I started with this stuff.
>> I have never been able to get an 8xx kernel running without adding a patch
>> to update the Table Walk Base register at the time that a new mm context is
>> activated.
>
>
>After reading your diatribe

Diatribe? Hmm.. Sorry, I didn't mean to offend you - I thought I was being
reasonably clear, and definitely polite. I wasn't being at all critical of
anyone associated with Linux/PPC or the 8xx embedded version - I think you
and they all do a great job, and I am very impressed. In my eagerness I left
out some information I should have provided, sorry. I will try to correct
that now.

I use the linuxppc_2_3 bitkeeper repository at hq.fsmlabs.com as the
base for my local changes. I use a Sun Ultra 60 dual cpu sparc workstation
running Solaris 2.7 as my host o/s, with gcc-2.95.2, the latest binutils from
the CVS repository at :pserver:anoncvs@anoncvs.cygnus.com:/cvs/src, and
glibc-2.1.3 configured as an mpc8xx cross-compiler for Solaris. I build
my own root filesystem, based on sources from the net. When I compile the
kernel, I build zImage.initrd and download it to the target using the GDB
protocol via a serial port.

My hardware is a Cogent CMA102 motherboard, with CMA286-60 CPU module
(MPC860 cpu - rev no. XPC860MHZP66C1), and CMA302 I/O module with 8Mb
flash. The motherboard has 32Mb RAM, 2 serial and 1 parallel ports, and
LCD display. The cpu module has a 128K boot eprom, which I load with a
small ROM monitor I wrote based on the GDB eprom stubs configuration of
eCos (embedded cygnus operating system - which supports the cogent
platform). The monitor supports downloading via the serial port (at
230400bps) into RAM using the GDB protocol, programming flash from a
RAM image, and booting an image that resides in flash, among other
things (I call it ELILO :-).

Modifications I make to the kernel are minimal - just drivers for devices
on the cogent platform (including the I/O mappings, which differ from the
MBX in that they reside in the lower half of the address space; this
required me to use ioremap() correctly, by setting ioremap_base, saving
its return value and using that to access my devices) and some other minor
changes, which I believe are not relevant. The only major change I have had
to make to the kernel is the one I discussed in my previous message.

I checked this out again, and one other change was moving most of the code
at _start in head_8xx.S to after the exception handlers because the extra
mappings required for the Cogent devices caused this code to exceed 0x100
bytes. The other thing I added was making use of the MPC860 watchdog
which I could do because I had control of the boot eprom code (if the
kernel hangs I get a watchdog reset in some circumstances, depending
on the type of hang).

>There are many subtle changes to context switching that happen during
>the minor updates (which could be weekly).

I usually update daily, or every couple of days, a local copy of the
bitkeeper repository (using rsync, but I also maintain a read-only
anonymous bitkeeper clone which I bk pull at the same time, because I
like to use bk sccstool to follow the changes), which I then "import"
into a vendor branch of a local CVS repository. My local changes are
maintained in the HEAD revision. I also maintain a "stable" branch
which is a working kernel, based on repository as at October 1999.

>There are several patches
>floating around (and probably more kernel sources) that certainly
>are not correct.

I don't use any patches from the net - all changes made are local.

>I don't know where you get your source code, but there
>are exactly two consistent and working kernel sources that I have ever
>provided.  One is in ftp://linuxppc.cs.nmt.edu/pub/linuxppc/embedded,
>the mpc8xx-2.2.13.tgz tarball.  A better and completely up to date
>kernel is in ftp.mvista.com/pub/CDK/wip/ppc_8xx/RPMS (along with
>everything else to build an 8xx embedded system).  Everyone should be
>using the kernel from MontaVista, and if something isn't in there
>that you want, send me patches against that.

These are all 2.2.x, no? I believe I need 2.[34].x because I want to use
the latest RT-Linux stuff eventually, which only works with the 2.3.x, or
later, kernels.

>There are patches posted against that original tarball, and make sure
>you are not mixing kernel versions and patches.

As I say, I use a pristine 2.[34].x kernel with local changes only.

>Finally, lots of bugs associated with porting to new hardware manifest
>themselves as "problems" in any VM related function.  Since many people
>don't understand the subtle interactions of all of these functions (as
>evidenced by your message) you become convinced the problem is associated
>with this complexity and fail to unravel the clues to the real cause.

I don't think I deserve this sort of belittling. Treating potential
contributors in this way can only have a negative effect on open
source development. I admit I don't yet fully understand the PowerPC
architecture, or the MPC8xx implementation of it, but I am learning,
and with nearly 20 years' experience in computer science I believe I
should be able to pick it up eventually (I've "seen it all before" :-).

>This could be as simple as intrusive debugging hardware,

I use kgdb.

>some silicon
>bug not understood,

I included my chip revision above. It appears to be a C1 revision chip.

>or prototype hardware not working correctly.

Definitely.

>There are lots of products and systems in development running this software,
>so you have to approach this generic software from the assumption that
>it is first likely to be working.

I did. I said I was intrigued as to why this problem only affected me. And
once I make the described change, the "generic software" works for me also
(at least an older revision works - current revisions still crash, something
to do with the memory allocation stuff, I believe).

As I said in my previous message, I suspect something else I am doing is
triggering this bug (that much is obvious), but there are two possibilities:
either I am doing something wrong in my local changes, or the "generic
software" has a bug which does not show up in anyone else's implementation. I
was wondering whether the latter was the case (I wasn't blaming anyone, I was
excited that maybe I had discovered a long existing hidden fault in the
software, that may explain some mysterious failure modes, that someone else
might be getting - other developers may then post, saying "yeah, that would
explain my problem, blah blah", and so the discussion goes on. Upon searching
the archives, I found that a similar problem had been discussed for the 2.2.x
kernels, so maybe the fix or fixes didn't make their way into the 2.[34].x
kernels. I don't know, anything is possible, that's why we have these
discussion groups).

>Are there possible bugs?  Sure, and you have to provide minimal information
>for the rest of us to help out.

Again, apologies for not providing enough information in my message - I made
assumptions I shouldn't have. Obviously, on my first post I should have been
completely anal, because no-one knows me from a bar of soap. I can then start
to be less exacting after I have been around for a while.

>Where did you get the sources? What
>patches did you apply?  What are your hardware details?  What
>modifications did you make?

See above.

>As for 2.4.xx, the 8xx still doesn't work correctly.  However, I
>discovered it failed to work after the 403 additions, so I am now
>learning about the 403 in an effort to make everything live happily
>together again.

It was my feeling that the problems were to do with the new memory allocation
stuff introduced a couple of months ago.

>Note, this has nothing to do with M_TWB......

I know. Now that we have gotten past treating me like a dill, please can you
re-read my original message and see if I am making any sense at all? I would
very much appreciate some insights and even constructive criticism. Cheers!
								Murray...

PS: I haven't contributed the Cogent platform changes yet, because I wasn't
happy that I had done everything properly. This was really my first foray
into taking part in the Linux/PPC embedded development community - I can't
say it has been particularly successful (despite my good feelings about
contributing a small fix a couple of days ago). I will try not to be too
discouraged.
--
Murray Jensen, CSIRO Manufacturing Sci & Tech,         Phone: +61 3 9662 7763
Locked Bag No. 9, Preston, Vic, 3072, Australia.         Fax: +61 3 9662 7853
Internet: Murray.Jensen@cmst.csiro.au  (old address was mjj@mlb.dmt.csiro.au)

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


* net driver receive problems
  2000-06-05 20:37   ` Dan Malek
  2000-06-06  6:31     ` Murray Jensen
@ 2000-06-06 17:03     ` Tom Roberts
  1 sibling, 0 replies; 22+ messages in thread
From: Tom Roberts @ 2000-06-06 17:03 UTC (permalink / raw)
  To: linuxppc-embedded


Does anybody know how to write a net driver for a 2.2 kernel?

Rubini's book _Linux_Device_Drivers_ only covers the 2.1 kernel,
and significant changes have been made since then.

I have essentially identical drivers on board and host, but
they behave differently -- the PowerPC version works but the i386
does not.

In particular, I have looked through the kernel code and made the
driver no longer crash the kernel. But I cannot get the host's
network stack to accept packets received by my driver; the powerpc
Linux stack accepts them just fine.

My configuration is that I boot a PowerPC board (with Linux
2.2.15-2.9.0) from an i386 host running RedHat Linux 2.2.14-5.0.
I have a SIO driver on both host and board which becomes the console
of the board's Linux, and on the host the boot program becomes a
cheap terminal emulator so I can issue commands to the board's Linux
and see its printk's and responses. The on-board linux comes up
fine and my init script configures and starts the network device.

ifconfig of my device on the host shows packets received, but
netstat -s shows IP did not get them. When I dump the skb the data
it contains looks at least superficially like a valid IP datagram
(the 3rd word is the IP address of the board, and the 4th word is
the IP address of the host [count words from 0]).

The weird thing is that when I ping my PowerPC board from the
i386 host, the packets are received on the PowerPC Linux just fine,
and they are returned just fine, but the host does not see them
after the driver calls netif_rx(). On the PowerPC "netstat -s"
shows all packets received and sent by both IP and ICMP; on the
host neither IP nor ICMP sees any packets. On both PowerPC and
host, "ifconfig lspsnet" shows the right number of packets sent
and received. And debugging printk-s of the skb just before the
call to netif_rx() are quite similar on the PowerPC and host --
the IP addresses are interchanged as expected.


What I think are the relevant details: this is not an ethernet;
my do_rcvpkt() is called every tick using a timer; it checks for
a packet arrival (in a memory buffer), and when one arrives:
	// data = pointer to the packet data
	// len = length of the packet data (# bytes)
	/* allocate and fill a skb */
        len4 = (len+3) & ~3;
        skb = dev_alloc_skb(len4);
        if(!skb) return (npkt ? 0 : -ENOMEM);
	memcpy(skb_put(skb,len),data,len);
        skb->dev = dev;
        skb->protocol = ETH_P_IP;
        skb->pkt_type = PACKET_HOST;
        skb->ip_summed = CHECKSUM_UNNECESSARY;
        /* all packets received are for us -- fake mac addr */
        skb->mac.raw = skb_push(skb,dev->addr_len);
        memcpy(skb->mac.raw,dev->dev_addr,dev->addr_len);
        skb_pull(skb,dev->addr_len);
	// deliver skb to higher layers
        netif_rx(skb);
	// update counter and statistics
	++npkt;
	dev->last_rx = jiffies;
        ++Enet_stats.rx_packets;
        Enet_stats.rx_bytes += len;

Note the attempt to fake out the mac address (mac.raw must point
between skb->head and skb->data). My dev->addr_len is 1, but
changing it to 6 did not help. Printing the skb contents shows that
the drivers do indeed transfer the data unchanged, and I think
I set all the required skb fields above.... WHAT AM I MISSING???
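A hedged sketch of one likely cause. It assumes that the 2.2 receive path matches skb->protocol against each registered packet handler's type, and that the IP handler registers its type as htons(ETH_P_IP), i.e. in network byte order. On big-endian PowerPC, htons() is a no-op, so storing the host-order constant ETH_P_IP happens to match. On little-endian i386 it does not, which would fit the symptom of the device counting packets that IP never sees. This is a userspace model, not kernel code: MODEL_ETH_P_IP simply mirrors the value 0x0800 of ETH_P_IP, and all function names are hypothetical.

```c
#include <arpa/inet.h> /* htons() */
#include <stdint.h>

/* Mirrors the value of ETH_P_IP from <linux/if_ether.h>. */
#define MODEL_ETH_P_IP 0x0800

/* Model of the dispatch check: the IP handler's type is stored in network
 * byte order and compared directly with skb->protocol. */
static int model_ip_handler_matches(uint16_t skb_protocol)
{
    const uint16_t ip_type = htons(MODEL_ETH_P_IP);
    return skb_protocol == ip_type;
}

/* What the driver above stores: the constant in host byte order. */
static uint16_t model_protocol_as_posted(void)
{
    return MODEL_ETH_P_IP;
}

/* Suggested change, i.e. skb->protocol = htons(ETH_P_IP); */
static uint16_t model_protocol_network_order(void)
{
    return htons(MODEL_ETH_P_IP);
}

/* Nonzero on a big-endian host, where htons() leaves values unchanged. */
static int model_host_is_big_endian(void)
{
    return htons(1) == 1;
}
```

In a real driver the conventional line would be `skb->protocol = htons(ETH_P_IP);`. Ethernet drivers normally use `eth_type_trans()` instead, but this link has no Ethernet header, so setting the field explicitly is the relevant case here.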


Tom Roberts	tjroberts@lucent.com


* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
  2000-06-06  6:31     ` Murray Jensen
@ 2000-06-06 20:05       ` Dan Malek
  2000-06-07  3:05         ` Dan A. Dickey
  2000-06-07  9:17         ` Murray Jensen
  2000-06-07  3:02       ` Dan A. Dickey
  1 sibling, 2 replies; 22+ messages in thread
From: Dan Malek @ 2000-06-06 20:05 UTC (permalink / raw)
  To: Murray Jensen; +Cc: Dan Malek, linuxppc-embedded


Murray Jensen wrote:

> I use the linuxppc_2_3 bitkeeper repository at hq.fsmlabs.com as the
> base for my local changes.

This has not run correctly on the 8xx for quite some time.  It won't
boot since the addition of the IBM403 changes.

> .... (including the I/O mappings, which are different
> to the MBX in that they reside in the lower half of the address space which
> required me to use ioremap() correctly by setting ioremap_base and saving
> its return value and using this to access my devices) and some other minor
> changes, which I believe are not relevant.

Not again......Did you read any of my past postings about memory
mapping on the 8xx?  You can't change ioremap_base, and any memory
mapping change is highly relevant.

> I checked this out again, and one other change was moving most of the code
> at _start in head_8xx.S


Oh geeze.....Let me quickly paraphrase what I have written in the past.
You should not be changing _any_ code in head_8xx.S.  This code will
minimally map some memory and the IMMR.  This is all that is required
to boot the kernel into further initialization functions.  If there
are some devices that you must use early (such as board control/status
registers), you ioremap() these in arch/ppc/mm/init.c.  These physical
hardware addresses must reside outside of the user and kernel text/data
virtual addresses.
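A hedged illustration of this constraint follows. If an I/O region is mapped 1:1 (virtual address equal to physical address), its physical range must not overlap user space or the kernel's mapping of RAM. The sketch assumes user space is [0, 0x80000000) and kernel RAM is mapped from 0xc0000000. These constants and the function names are assumptions for illustration only and are not taken from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 32-bit 8xx address-space layout (illustrative values only). */
#define MODEL_TASK_SIZE  0x80000000ULL /* user space is [0, TASK_SIZE) */
#define MODEL_KERNELBASE 0xc0000000ULL /* RAM is mapped starting here */

/* Half-open ranges [a, a+alen) and [b, b+blen); 64-bit math avoids wraparound. */
static bool model_ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

/* True if mapping [phys, phys+len) with virtual == physical would collide
 * with user space or with the kernel's mapping of lowmem_size bytes of RAM. */
static bool model_one_to_one_collides(uint64_t phys, uint64_t len, uint64_t lowmem_size)
{
    return model_ranges_overlap(phys, len, 0, MODEL_TASK_SIZE) ||
           model_ranges_overlap(phys, len, MODEL_KERNELBASE, lowmem_size);
}
```

For example, with 32 MB of RAM, a 32 MB device region at physical 0x02000000 collides with user space, while a small region at 0xf0000000 does not.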


> ..... to after the exception handlers because the extra
> mappings required for the Cogent devices caused this code to exceed 0x100
> bytes.

All of this mapping should be done inside of the device drivers, not
part of the early kernel initialization.

> These are all 2.2.x, no? I believe I need 2.[34].x because I want to use
> the latest RT-Linux stuff eventually, which only works with the 2.3.x, or
> later, kernels.

Yes, but 2.4.xx doesn't work right now.  I am trying to get that
working among other things.  You have to back up to a much older
version of 2.3.xx if you want to use this baseline right now.


	-- Dan


* Re: kernel crashes at InstructionTLBMiss
  2000-06-06  3:56   ` Daniel Wu
@ 2000-06-06 20:18     ` Dan Malek
  2000-08-10 12:05     ` too few RAM? Wojciech Kromer
  1 sibling, 0 replies; 22+ messages in thread
From: Dan Malek @ 2000-06-06 20:18 UTC (permalink / raw)
  To: Daniel Wu; +Cc: Dan Malek, linuxppc-embedded


Daniel Wu wrote:

> The reason why I changed the address was because my uncompressed kernel is
> about 1.3M. This means if I load at 0x100000 (the default), then the board data
> gets trashed

Yes, but just move it up a little.  I know you are running the 2.2.xx
kernel, and in the 2.3/2.4 kernel I moved this to 0x180000, which is
the only change necessary.
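Here is the arithmetic behind moving the load address, as a hedged sketch. It assumes the boot wrapper decompresses the kernel to physical address 0, so the uncompressed image occupies [0, kernel_size), and anything the loader placed below kernel_size is overwritten. With a kernel of about 1.3 MB, a load address of 0x100000 (1 MB) falls inside that range, while 0x180000 (1.5 MB) clears it. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the kernel is decompressed to physical address 0, so it
 * occupies [0, kernel_size). Data loaded at or above kernel_size survives. */
static bool model_load_addr_is_clear(uint32_t load_addr, uint32_t kernel_size)
{
    return load_addr >= kernel_size;
}
```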

> .... I realised that the M_TWB was not initialised
> properly in the first place!

I would believe this in a 2.3.xx kernel, but not 2.2......

> ... Anyway, I'm thinking of starting from scratch with
> the kernel and patches at the MontaVista site.

Please do.  I know that runs on many platforms.

> Unfortunately, if you get no output, BDM is the _only_ option you have - at
> least it will give you some details of the registers, although you can't step
> through code due to the virtual addresses.

A good boot rom is a better debugging tool than a BDM.  The BDM is only
useful for the first few instructions of the kernel.  Dumping out key
kernel data structures is more useful than the contents of registers
at the time a BDM catches a trap.

Porting to a new 8xx board is almost a no-brainer with the MontaVista
2.2.13 and later kernels.  All you need to do is properly set the IMMR
and the processor clock speed in the board information structure.  Any
board will boot far enough to get console output and attach KGDB or XMON.
When you start changing lots of code before this point, you are just
asking for trouble.
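A hedged sketch of why the clock value in the board information structure matters. The 8xx time setup is assumed here to take the timebase/decrementer rate as the CPU clock divided by 16, and to report it as a fraction over 60. With an assumed 50 MHz clock, this gives 187500000/60 (3,125,000 ticks per second), the same form as the "decrementer frequency" line in the boot log at the top of this thread. The structure and field names below are hypothetical and are not the kernel's bd_t.

```c
#include <stdint.h>

/* Hypothetical subset of a board information structure. */
struct model_board_info {
    uint32_t immr_base;  /* physical base of the internal memory map */
    uint32_t intfreq_hz; /* CPU (internal) clock in Hz */
};

struct model_decr_freq {
    uint32_t numerator;
    uint32_t divisor;
};

/* Assumption: the timebase/decrementer advances once every 16 CPU clocks. */
static struct model_decr_freq model_decrementer_freq(const struct model_board_info *bd)
{
    struct model_decr_freq f;
    f.divisor = 60;                                  /* report as a fraction over 60 */
    f.numerator = (bd->intfreq_hz / 16) * f.divisor; /* ticks per second, times 60 */
    return f;
}
```

If the clock value is wrong, the timer tick rate (and anything calibrated against jiffies) is wrong too.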


	-- Dan


* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
  2000-06-07  3:02       ` Dan A. Dickey
@ 2000-06-06 21:37         ` Steve Tarr
  0 siblings, 0 replies; 22+ messages in thread
From: Steve Tarr @ 2000-06-06 21:37 UTC (permalink / raw)
  To: Dan A. Dickey; +Cc: Murray Jensen, Dan Malek, linuxppc-embedded


"Dan A. Dickey" wrote:
Clip, clip, clip....
> > discussion groups).
>
> Murray,
> as far as I know - you are maybe the only one running 2.3.x on
> a powerpc.  Most of the kernels that one can find lying about
> are 2.2.x (13/14? Can't remember at the moment).
>
> I, as well as others, definitely want to see 2.3.x or 2.4.0 running
> on an embedded powerpc.
>

Hey, it does. I have 2.3.99-pre7 hacked and running on an MPC8260.
Actually, it was a pretty clean port, with the exception of handling the
SCC as the console.

> ...
>
> > Again, apologies for not providing enough information in my message - I made
> > assumptions I shouldn't have. Obviously, on my first post I should have been
> > completely anal, because no-one knows me from a bar of soap. I can then start
> > to be less exacting after I have been around for a while.
>
> Everyone enjoys sarcasm... :)  (Don't they?)
>
> > >Where did you get the sources? What
> > >patches did you apply?  What are your hardware details?  What
> > >modifications did you make?
> >
> > See above.
> >
> > >As for 2.4.xx, the 8xx still doesn't work correctly.  However, I
> > >discovered it failed to work after the 403 additions, so I am now
> > >learning about the 403 in an effort to make everything live happily
> > >together again.
> >
> > It was my feeling that the problems were to do with the new memory allocation
> > stuff introduced a couple of months ago.
> >
> > >Note, this has nothing to do with M_TWB......
> >
> > I know. Now that we have gotten past treating me like a dill, please can you
> > re-read my original message and see if I am making any sense at all? I would
> > very much appreciate some insights and even constructive criticism. Cheers!
> >                                                                 Murray...
> >
> > PS: I haven't contributed the Cogent platform changes yet, because I wasn't
> > happy that I had done everything properly. This was really my first foray
> > into taking part in the Linux/PPC embedded development community - I can't
> > say it has been particularly successful (despite my good feelings about
> > contributing a small fix a couple of days ago). I will try not to be too
> > discouraged.
>
> That's the spirit!
>

I get more done asking stupid questions than I do pondering the
elusive answer. Elephant hide and a reasonable reputation for getting
things done help. Hang tough and have fun.......

Cheers --
tarr

>         -Dan    (A different one).
>

--
Steven Tarr
Lucent Technologies - Bell Labs
303-538-4056
tarr@lucent.com


* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
  2000-06-06  6:31     ` Murray Jensen
  2000-06-06 20:05       ` Dan Malek
@ 2000-06-07  3:02       ` Dan A. Dickey
  2000-06-06 21:37         ` Steve Tarr
  1 sibling, 1 reply; 22+ messages in thread
From: Dan A. Dickey @ 2000-06-07  3:02 UTC (permalink / raw)
  To: Murray Jensen; +Cc: Dan Malek, linuxppc-embedded


Murray Jensen wrote:
>
> On Mon, 05 Jun 2000 16:37:55 -0400, Dan Malek <dan@netx4.com> writes:
...
> >After reading your diatribe
>
> Diatribe? Hmm.. Sorry, I didn't mean to offend you - I thought I was being
> reasonably clear, and definitely polite.
...
> >Finally, lots of bugs associated with porting to new hardware manifest
> >themselves as "problems" in any VM related function.  Since many people
> >don't understand the subtle interactions of all of these functions (as
> >evidenced by your message) you become convinced the problem is associated
> >with this complexity and fail to unravel the clues to the real cause.
>
> I don't think I deserve this sort of belittling. Treating potential
> contributors in this way can only have a negative effect on open
> source development.

Murray,
please - hang in there.  We need more people like you.
Cut Dan some slack - he appears to be a genius at programming,
but maybe is a little short on people skills.  He means no harm,
but calls them as he sees them.  And as in baseball, not everyone
always agrees with the umpire.  :)
(At least; this is the impression
I've gathered in the relatively short time I've made his acquaintance
and have been reading this list).

>
> >some silicon
> >bug not understood,
>
> I included my chip revision above. It appears to be a C1 revision chip.
>
> >or prototype hardware not working correctly.
>
> Definitely.
>
> >There are lots of products and systems in development running this software,
> >so you have to approach this generic software from the assumption that
> >it is first likely to be working.
>
> I did. I said I was intrigued as to why this problem only affected me. And
> once I make the described change, the "generic software" works for me also
> (at least an older revision works - current revisions still crash, something
> to do with the memory allocation stuff, I believe).
>
> As I said in my previous message, I suspect something else I am doing is
> triggering this bug (that much is obvious), but there are two possibilities:
> either I am doing something wrong in my local changes, or the "generic
> software" has a bug which does not show up in anyone else's implementation. I
> was wondering whether the latter was the case (I wasn't blaming anyone, I was
> excited that maybe I had discovered a long existing hidden fault in the
> software, that may explain some mysterious failure modes, that someone else
> might be getting - other developers may then post, saying "yeah, that would
> explain my problem, blah blah", and so the discussion goes on. Upon searching
> the archives, I found that a similar problem had been discussed for the 2.2.x
> kernels, so maybe the fix or fixes didn't make their way into the 2.[34].x
> kernels. I don't know, anything is possible, that's why we have these
> discussion groups).

Murray,
as far as I know - you are maybe the only one running 2.3.x on
a powerpc.  Most of the kernels that one can find lying about
are 2.2.x (13/14? Can't remember at the moment).

I, as well as others, definitely want to see 2.3.x or 2.4.0 running
on an embedded powerpc.

...

> Again, apologies for not providing enough information in my message - I made
> assumptions I shouldn't have. Obviously, on my first post I should have been
> completely anal, because no-one knows me from a bar of soap. I can then start
> to be less exacting after I have been around for a while.

Everyone enjoys sarcasm... :)  (Don't they?)

> >Where did you get the sources? What
> >patches did you apply?  What are your hardware details?  What
> >modifications did you make?
>
> See above.
>
> >As for 2.4.xx, the 8xx still doesn't work correctly.  However, I
> >discovered it failed to work after the 403 additions, so I am now
> >learning about the 403 in an effort to make everything live happily
> >together again.
>
> It was my feeling that the problems were to do with the new memory allocation
> stuff introduced a couple of months ago.
>
> >Note, this has nothing to do with M_TWB......
>
> I know. Now that we have gotten past treating me like a dill, please can you
> re-read my original message and see if I am making any sense at all? I would
> very much appreciate some insights and even constructive criticism. Cheers!
>                                                                 Murray...
>
> PS: I haven't contributed the Cogent platform changes yet, because I wasn't
> happy that I had done everything properly. This was really my first foray
> into taking part in the Linux/PPC embedded development community - I can't
> say it has been particularly successful (despite my good feelings about
> contributing a small fix a couple of days ago). I will try not to be too
> discouraged.

That's the spirit!

	-Dan	(A different one).


* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
  2000-06-06 20:05       ` Dan Malek
@ 2000-06-07  3:05         ` Dan A. Dickey
  2000-06-07  9:17         ` Murray Jensen
  1 sibling, 0 replies; 22+ messages in thread
From: Dan A. Dickey @ 2000-06-07  3:05 UTC (permalink / raw)
  To: Dan Malek; +Cc: Murray Jensen, linuxppc-embedded


Dan Malek wrote:
...
> Yes, but 2.4.xx doesn't work right now.  I am trying to get that
> working among other things.

Dan,
is there any way others can help?  (Sure there are,
just let us know how...)
	-Dan


* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
  2000-06-06 20:05       ` Dan Malek
  2000-06-07  3:05         ` Dan A. Dickey
@ 2000-06-07  9:17         ` Murray Jensen
  1 sibling, 0 replies; 22+ messages in thread
From: Murray Jensen @ 2000-06-07  9:17 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-embedded


On Tue, 06 Jun 2000 16:05:42 -0400, Dan Malek <dan@netx4.com> writes:
>Murray Jensen wrote:
>
>> I use the linuxppc_2_3 bitkeeper repository at hq.fsmlabs.com as the
>> base for my local changes.
>
>This has not run correctly on the 8xx for quite some time.

I know - I said as much in my message. I have a working 2.3.x kernel
from some months ago (October 1999).

>It won't
>boot since the addition of the IBM403 changes.

It boots fine for me, but eventually crashes with the following:

	kmem_alloc: Bad slab magic (corrupt) (name=buffer_head)

As far as I can tell, a completely new method of memory allocation was
introduced a few months ago and it hasn't worked since.

>> .... (including the I/O mappings, which are different
>> to the MBX in that they reside in the lower half of the address space which
>> required me to use ioremap() correctly by setting ioremap_base and saving
>> its return value and using this to access my devices) and some other minor
>> changes, which I believe are not relevant.
>
>Not again......Did you read any of my past postings about memory
>mapping on the 8xx?

I scanned the archives as best I could. I only found the linuxppc mailing
lists a couple of weeks ago (I don't know how I managed to overlook them
before).

>You can't change ioremap_base,

I have changed ioremap_base and it runs fine with a kernel based on 2.3.x
as at October 1999.

The way it was being done before is, in my opinion, incorrect. The return
value from ioremap() was being ignored, which is the same as assuming that
physical address == virtual address for all I/O mappings (because ioremap()
uses the physical address as the virtual address if the physical address is
greater than or equal to ioremap_base, but because the 8xx port does not set
ioremap_base, it defaults to zero, hence all I/O mappings are done in this
fashion).

The Cogent CMA286-60 by default has I/O devices starting at 0x02000000.
One of my problems in the early days of porting to Cogent was that I blindly
copied the way I/O mappings were being done for other platforms. When it
didn't work I had to find out why - of course it was because having an I/O
device mapped to kernel virtual address 0x02000000 was not a good idea.

I could move the location of these I/O devices in the physical address space
by manipulating the PowerPC hardware in the boot rom, but this would be
confusing at best (because the Cogent documentation says otherwise) and the
kernel would then be reliant upon being booted from a compatible boot ROM,
or to make the kernel independent, I could change the hardware mappings at
kernel boot time, but this would require hacking head_8xx.S and I didn't want
to change anything in there at the time.

So I instead chose to do the ioremap()'ing correctly, by setting ioremap_base
to a sensible value (I chose 0xf8000000, which isn't to say this is a sensible
value, just the value I chose), storing the return value from ioremap() and
using that as the base virtual address for access to the cogent I/O devices.
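The address selection described above can be modelled in a few lines. This is a simplified, hedged sketch and not the kernel's ioremap(). It ignores BAT mappings, page rounding and the post-boot vmalloc path, and it assumes ioremap_bot starts at ioremap_base and grows downward. It shows why the return value must be kept. With ioremap_base at zero, every device is mapped 1:1, so the Cogent region lands at virtual 0x02000000. With ioremap_base at 0xf8000000, the same region gets a virtual window just below that base.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model state; names follow the kernel variables discussed here,
 * but this is not kernel code. */
struct model_ioremap_state {
    uint32_t ioremap_base; /* physical addresses >= this are mapped 1:1 */
    uint32_t ioremap_bot;  /* lowest virtual address handed out so far */
};

static void model_ioremap_init(struct model_ioremap_state *s, uint32_t base)
{
    s->ioremap_base = base;
    s->ioremap_bot = base; /* assumption: allocation starts at the base */
}

/* Returns the virtual address to use for [phys, phys+size). */
static uint32_t model_ioremap(struct model_ioremap_state *s, uint32_t phys, uint32_t size)
{
    assert((size & 0xfffu) == 0); /* model assumes page-sized requests */
    if (phys >= s->ioremap_base)
        return phys;              /* virtual == physical */
    s->ioremap_bot -= size;       /* carve a new window below the last one */
    return s->ioremap_bot;
}
```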

>and any memory
>mapping change is highly relevant.

OK, if you say so (and it makes sense), but I don't believe my I/O mappings
are causing any problems. However, I will change my boot rom and add a command
that will change the hardware mappings so that the cogent devices are up high
in the physical address space (by programming the base and option registers in
the memory controller), then I can test a kernel with a pristine
arch/ppc/mm/init.c and see how much difference it makes (this will take me a
while). I don't consider this a high priority though, since I have a working
kernel using these memory mappings.

>> I checked this out again, and one other change was moving most of the code
>> at _start in head_8xx.S
>
>Oh geeze.....Let me quickly paraphrase what I have written in the past.
>You should not be changing _any_ code in head_8xx.S.  This code will
>minimally map some memory and the IMMR.  This is all that is required
>to boot the kernel into further initialization functions.  If there
>are some devices that you must use early (such as board control/status
>registers), you ioremap() these in arch/ppc/mm/init.c.

I wanted to access the Cogent LCD display for diagnostic purposes, before
MMU_init was called. I simply added a second 8MB temporary TLB entry (almost
identical to the one for the IMMR). This TLB entry would have been invalidated
after the first tlbia, same as for the IMMR. This was the only change to
head_8xx.S (I am very careful making changes in there, if I do it at all), but
it meant the code went over the available 0x100 bytes, so I moved it to 0x2000
(by the same method that is used to transfer execution to "start_here").

In any case, I believe this does not affect anything else because I have
run with and without that change and it appears to make no difference (other
than that I cannot access the LCD display). My kernel (the working one)
runs fine in either case.

>These physical
>hardware addresses must reside outside of the user and kernel text/data
>virtual addresses.

Only because the ioremap()'ing in arch/ppc/mm/init.c is not done correctly.
My working kernel runs fine with the Cogent I/O devices located at 0x02000000
in the physical address space. They are not at that location in the virtual
address space, but this is hidden (by indirection).

>> ..... to after the exception handlers because the extra
>> mappings required for the Cogent devices caused this code to exceed 0x100
>> bytes.
>
>All of this mapping should be done inside of the device drivers, not
>part of the early kernel initialization.

Hmm.. I do all I/O mapping in MMU_init() using ioremap() - is there another
way? I suppose I could map each individual device in the driver initialisation
routines (probably the usual way?), but the Cogent has the concept of I/O
slots, which have a fixed location and size in the physical address space
(by default), so I simply map the entire range (32MB) for the slot, and then
each device driver treats I/O addresses as offsets from the I/O slot's virtual
base address, as returned by ioremap() (it's actually done generically by
macros in the board specific header). This is wasteful in the page map (even
the I/O slot that has the flash only uses 8MB; it could have had 16MB of
flash on it, but I only got the 8MB version), but conceptually simpler.
However, the device registers are fairly sparsely arranged within the 32MB
address ranges, especially for the motherboard I/O area, so I reckon the
saving from mapping it bit by bit wouldn't really be worth the extra
complication.
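A hedged sketch of the slot scheme described above. Each slot's 32 MB physical window is ioremap()'d once, the returned virtual base is recorded, and drivers form register addresses as offsets from that base. All names and the slot count are hypothetical, since the actual board header is not shown in the thread. A function is used instead of a macro so the range check is explicit, and uintptr_t stands in for kernel pointer types so the sketch runs in userspace.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NUM_SLOTS 4             /* hypothetical slot count */
#define MODEL_SLOT_SIZE 0x02000000UL  /* 32 MB window per I/O slot */

/* Virtual base of each slot, recorded from ioremap()'s return value at init. */
static uintptr_t model_slot_virt[MODEL_NUM_SLOTS];

static void model_set_slot_base(int slot, uintptr_t virt)
{
    model_slot_virt[slot] = virt;
}

/* Register address = slot virtual base + offset within the slot. Returns 0
 * for an out-of-range slot or offset instead of a bogus address. */
static uintptr_t model_io_addr(int slot, size_t off)
{
    if (slot < 0 || slot >= MODEL_NUM_SLOTS || off >= MODEL_SLOT_SIZE)
        return 0;
    return model_slot_virt[slot] + off;
}
```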

>> These are all 2.2.x, no? I believe I need 2.[34].x because I want to use
>> the latest RT-Linux stuff eventually, which only works with the 2.3.x, or
>> later, kernels.
>
>Yes, but 2.4.xx doesn't work right now.

I know, I have been tracking it, but it doesn't seem to be getting much
better.

>I am trying to get that working among other things.

I too am trying various things (but it's not a priority at the moment).

>You have to back up to a much older
>version of 2.3.xx if you want to use this baseline right now.

Yep, I have done that - I backed up to October 1999 and it works. I could
try later versions, but each attempt is fairly arduous and I have one that
works, so I didn't bother.

Now back to my original post - updating the TWB: here is the relevant
code in include/asm-ppc/mmu_context.h:

	/*
	 * After we have set current->mm to a new value, this activates
	 * the context for the new mm so we see the new mappings.
	 */
	static inline void activate_mm(struct mm_struct *active_mm, struct mm_struct *mm)
	{
		current->thread.pgdir = mm->pgd;
		get_mmu_context(mm);
		set_context(mm->context);
	}

I believe it is wrong to change current->thread.pgdir, without mirroring
that change in the MMU TWB register. This is the gist of my (long winded?)
first posting. Is this true or not?
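A userspace model of this concern, with hypothetical names. A variable stands in for the M_TWB register, and the structures hold only the page-directory pointers. The quoted activate_mm() updates the task's pgdir but not the table-walk base, so the two disagree afterwards. The proposed variant mirrors the change. In the real kernel, the register would presumably be loaded with the page directory's physical address via mtspr; that detail is an assumption and is not modelled.

```c
#include <stdint.h>

typedef uint32_t model_pgd_t;

struct model_mm     { model_pgd_t *pgd; };
struct model_thread { model_pgd_t *pgdir; };

/* Stands in for the 8xx M_TWB special-purpose register. */
static model_pgd_t *model_m_twb;

/* As quoted: pgdir changes, the table-walk base does not
 * (get_mmu_context()/set_context() are omitted from the model). */
static void model_activate_mm_as_quoted(struct model_thread *t, const struct model_mm *mm)
{
    t->pgdir = mm->pgd;
}

/* Proposed: mirror the new pgdir into the table-walk base. */
static void model_activate_mm_proposed(struct model_thread *t, const struct model_mm *mm)
{
    t->pgdir = mm->pgd;
    model_m_twb = t->pgdir;
}

/* Nonzero when a hardware table walk would use the task's current pgdir. */
static int model_twb_matches(const struct model_thread *t)
{
    return model_m_twb == t->pgdir;
}
```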

Similarly, this code:

	static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
				     struct task_struct *tsk, int cpu)
	{
		tsk->thread.pgdir = next->pgd;
		get_mmu_context(next);
		set_context(next->context);
	}

Surely:

	1. the set_context() should not be done, unless (tsk == current)
	   is true;

	2. if (tsk == current) is true, then the TWB should be updated
	   with the contents of tsk->thread.pgdir.
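The two points can be sketched the same way, again as a userspace model with hypothetical names. Variables stand in for current, the M_TWB register and the hardware context written by set_context(). get_mmu_context() is assumed to have already given next a valid context, and the cpu argument is omitted.

```c
#include <stdint.h>

struct model_mm   { uint32_t *pgd; unsigned long context; };
struct model_task { uint32_t *pgdir; };

static struct model_task *model_current; /* stands in for `current` */
static uint32_t *model_m_twb;            /* stands in for M_TWB */
static unsigned long model_hw_context;   /* context register set by set_context() */

static void model_switch_mm_proposed(struct model_mm *prev, struct model_mm *next,
                                     struct model_task *tsk)
{
    (void)prev;                 /* unused, as in the quoted code */
    tsk->pgdir = next->pgd;     /* always record the new page directory */
    if (tsk == model_current) { /* point 1: only touch the MMU for the running task */
        model_hw_context = next->context; /* set_context(next->context) */
        model_m_twb = tsk->pgdir;         /* point 2: keep the table-walk base in step */
    }
}
```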

However, in this second case, switch_mm() is only called inside _switch()
(as far as I can see) and therefore the TWB will be updated anyway when the
task switch happens, so this second case is not that important (other than
the case when someone thinks "oh, all I have to do here is call switch_mm()
and that will save me a lot of work" but instead all hell breaks loose because
the code isn't right).

But I believe the first case will cause problems. In an exec, a new mm context
is created, and the current one is destroyed (after copying arguments and
environment etc). It looks to me like this is done using activate_mm() i.e.
the new mm context is activated using this function (makes sense - no point
creating a whole new task, just use the one we have - this is the entire point
of exec). But the call is not happening inside _switch() as in the other
case, so it will only be a fluke if the TWB maintains the correct value (e.g.
maybe a task switch occurs before any damage happens in all but the most
exceptional circumstances).

I would like to hear people's opinions on this.

Finally, is the "Wrath of Dan" some sort of juvenile initiation rite that
all new members of the elite "Linux/PPC Embedded" gang have to go through?
Twice now you have treated me with contempt or in a condescending way. I
should be able to ignore it, because *I know* that I have some skill in this
area (I was hacking drivers for 4.2BSD on a VAX 15 years ago), but others
might be put off by your attitude and open development in this area might
suffer as a result. Please try to accept this as constructive criticism
(despite my sarcastic crack above - as Maxwell Smart would say, "I hope I
wasn't outta-line with that crack about the gang" :-). I want to learn from
you and others, and I hope I will be able to give some knowledge/experience
back. Cheers!
								Murray...
--
Murray Jensen, CSIRO Manufacturing Sci & Tech,         Phone: +61 3 9662 7763
Locked Bag No. 9, Preston, Vic, 3072, Australia.         Fax: +61 3 9662 7853
Internet: Murray.Jensen@cmst.csiro.au  (old address was mjj@mlb.dmt.csiro.au)

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/

* Debug information for elf format
  2000-06-04  4:40 kernel crashes at InstructionTLBMiss Daniel Wu
                   ` (2 preceding siblings ...)
  2000-06-05 14:51 ` kernel crashes at InstructionTLBMiss Dan Malek
@ 2000-06-30  6:17 ` Kwansuk Kim
  2000-06-30  6:46   ` sungyeon
  3 siblings, 1 reply; 22+ messages in thread
From: Kwansuk Kim @ 2000-06-30  6:17 UTC (permalink / raw)
  To: linuxppc-embedded


Hi, everyone,

I use the SMC BDM tool to load the Linux kernel on my custom mpu860 board.

I am trying to modify the source to run on the custom board, but for the C files this is very difficult. The SMC BDM tool supports only the ELF file format, but according to the GCC HOWTO, GCC doesn't produce debug information for ELF, only stabs, DWARF or COFF.

How can I debug C code with BDM?

For now I'm debugging by printing data on the serial console (SMC2), but this is painful because I compile the kernel under Linux but work under Win98, so I reboot more than twenty times a day.  :-(


* Re: Debug information for elf format
  2000-06-30  6:17 ` Debug information for elf format Kwansuk Kim
@ 2000-06-30  6:46   ` sungyeon
  0 siblings, 0 replies; 22+ messages in thread
From: sungyeon @ 2000-06-30  6:46 UTC (permalink / raw)
  To: Kwansuk Kim, linuxppc-embedded


Hi.
You can create DWARF debug information using the "-gdwarf" option.
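DWARF information is itself carried inside ELF sections (".debug" for the original DWARF, ".debug_info", ".debug_line" and friends for DWARF 2), so ELF output and DWARF debug information are not mutually exclusive. Before loading an image into the BDM tool, binutils' readelf -S or objdump -h will list its sections; the C sketch below does the same check by hand. The function name elf32_count_debug_sections is hypothetical, and the sketch only handles 32-bit ELF files (either byte order).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read an n-byte field in the file's byte order (EI_DATA). */
static unsigned long elf_rd(const unsigned char *p, int nbytes, int big_endian)
{
    unsigned long v = 0;
    for (int i = 0; i < nbytes; i++) {
        int idx = big_endian ? i : nbytes - 1 - i;   /* most significant byte first */
        v = (v << 8) | p[idx];
    }
    return v;
}

/*
 * Hypothetical helper: count the sections of an in-memory ELF32 image whose
 * names start with ".debug". Returns -1 if the buffer is not a well-formed
 * ELF32 image (bad magic, bad class/byte order, out-of-range tables).
 */
int elf32_count_debug_sections(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 52 || memcmp(buf, "\177ELF", 4) != 0 || buf[4] != 1 /* ELFCLASS32 */)
        return -1;
    if (buf[5] != 1 && buf[5] != 2)               /* ELFDATA2LSB / ELFDATA2MSB */
        return -1;
    int be = (buf[5] == 2);                       /* PowerPC images are big-endian */

    unsigned long shoff     = elf_rd(buf + 32, 4, be);   /* e_shoff */
    unsigned long shentsize = elf_rd(buf + 46, 2, be);   /* e_shentsize */
    unsigned long shnum     = elf_rd(buf + 48, 2, be);   /* e_shnum */
    unsigned long shstrndx  = elf_rd(buf + 50, 2, be);   /* e_shstrndx */
    if (shentsize < 40 || shstrndx >= shnum || shoff > len ||
        shnum * shentsize > len - shoff)
        return -1;

    /* Locate the section-name string table (.shstrtab). */
    const unsigned char *strhdr = buf + shoff + shstrndx * shentsize;
    unsigned long stroff  = elf_rd(strhdr + 16, 4, be);  /* sh_offset */
    unsigned long strsize = elf_rd(strhdr + 20, 4, be);  /* sh_size */
    if (stroff > len || strsize > len - stroff)
        return -1;

    int count = 0;
    for (unsigned long i = 0; i < shnum; i++) {
        unsigned long name = elf_rd(buf + shoff + i * shentsize, 4, be); /* sh_name */
        if (name < strsize && strsize - name >= 6 &&
            memcmp(buf + stroff + name, ".debug", 6) == 0)
            count++;
    }
    return count;
}
```

A result of 0 on a kernel image built with -gdwarf would suggest the debug sections were stripped somewhere in the build.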

----- Original Message -----
From: "Kwansuk Kim" <kskim@neowave.co.kr>
To: <linuxppc-embedded@lists.linuxppc.org>
Sent: Friday, June 30, 2000 3:17 PM
Subject: Debug information for elf format


>
> Hi, everyone,
>
> I use SMC BDM tool to load linux kernel on my custom mpu860 board.
>
> I try to fix the source to operate on the custom board. But in case of C file it's too difficult. SMC BDM tool supports only ELF file format. But according to the GCC howto, it doesn't produce debug information for elf format but stab, DWARF or COFF.
>
> What should I do to debug C code on BDM?
>
> I'm debugging with data prompted on serial console (SMC2). But it's too hard because I compile the kernel on Linux and operate on win98. Reboot over twenty times a day.  :-(
>
>
>
>

* too few RAM?
  2000-06-06  3:56   ` Daniel Wu
  2000-06-06 20:18     ` Dan Malek
@ 2000-08-10 12:05     ` Wojciech Kromer
  2000-08-10 14:49       ` Dan Malek
  1 sibling, 1 reply; 22+ messages in thread
From: Wojciech Kromer @ 2000-08-10 12:05 UTC (permalink / raw)
  To: Daniel Wu, linuxppc-embedded


I'm trying to run mpc8xx-2.2.13 (with all the patches I found)
 - with 8xxrom (0.3.0)
 - on MPC8XXFADS
    .MPC823e
    .4MB RAM
    .2MB FLASH

This is my boot-time output:
==================

entry 0x100000, phoff 0x34, shoff 0x75a60
phnum 0x1, shnum 0x9
p_offset 0x10000, p_vaddr 0x100000, p_paddr 0x100000
p_filesz 0x530c, p_memsz 0xb1cc
Loading at 0x10c000
Size 486060
475 blocks
Starting 0x11c000
loaded at:     0011C000 001271CC
relocated to:  00100000 0010B1CC
board data at: 003F0000 003F001C
relocated to:  0010C100 0010C11C
zimage at:     00122000 00181A24
avail ram:     00182000 00400000

Linux/PPC load:
Uncompressing Linux...done.
Now booting the kernel

And here is what I get:
====================
exception: Implementation Specific Instruction TLB miss
0xc00d95bc  can't read memory address

f823Bug> md 0d95b0 :i
0x000d95b0:             bb010010 lmw    r24, 0x10(r1)
0x000d95b4:             38210030 addi   r1,r1, 0x30
0x000d95b8:             4e800020 bclr   0x14,0
0x000d95bc:             9421ffd0 stwu   r1,-0x30(r1)
0x000d95c0:             7c0802a6 mfspr  r0,LR
0x000d95c4:             bf810020 stmw   r28, 0x20(r1)
0x000d95c8:             90010034 stw    r0, 0x34(r1)
0x000d95cc:             3d20c00d addis  r9,r0,0xc00d
0x000d95d0:             80697420 lwz    r3, 0x7420(r9)
0x000d95d4:             3fa0c00d addis  r29,r0,0xc00d
0x000d95d8:             4bf36f79 bl     0x00010550
0x000d95dc:             3f80c00d addis  r28,r0,0xc00d
0x000d95e0:             38610008 addi   r3,r1, 0x8
0x000d95e4:             38bc7408 addi   r5,r28, 0x7408
0x000d95e8:             389d7404 addi   r4,r29, 0x7404
0x000d95ec:             480004d9 bl     0x000d9ac4

====================
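The faulting address 0xc00d95bc is a kernel virtual address, while the md command above was given the physical address 0d95b0. Assuming the usual PowerPC layout, in which the kernel is linked at 0xC0000000 and low memory is mapped linearly from physical address 0 (consistent with the kernel range [c0000000,c2000000] printed in the first message of this thread), the translation is a plain subtraction. A minimal sketch; ASSUMED_KERNELBASE, kernel_virt_to_phys() and align_down() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: kernel linked at 0xC0000000, lowmem mapped from physical 0. */
#define ASSUMED_KERNELBASE 0xC0000000u

/* Translate a kernel lowmem virtual address to its physical address. */
static uint32_t kernel_virt_to_phys(uint32_t vaddr)
{
    assert(vaddr >= ASSUMED_KERNELBASE);
    return vaddr - ASSUMED_KERNELBASE;
}

/* Round an address down to a power-of-two boundary (e.g. a 16-byte dump line). */
static uint32_t align_down(uint32_t addr, uint32_t align)
{
    return addr & ~(align - 1u);
}
```

Under that assumption 0xc00d95bc corresponds to physical 0x000d95bc, the stwu r1,-0x30(r1) in the dump, right after the blr (bclr 0x14,0) that ends the preceding function.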
Q1: Is there not enough RAM to run this?

Q2: Does anyone have ALL the patched files needed to run on my board?
  (HHL does not support FADS!)


PS: Please reply to my private address too (krom@dgt-lab.com.pl).

* * * * * * * * * * * * *
* per pedes ad astra !  *
* * * * * * * * * * * * *    mailto:krom@softomat.com.pl


* Re: too few RAM?
  2000-08-10 12:05     ` too few RAM? Wojciech Kromer
@ 2000-08-10 14:49       ` Dan Malek
  2000-08-17 11:49         ` Wojciech Kromer
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Malek @ 2000-08-10 14:49 UTC (permalink / raw)
  To: Wojciech Kromer; +Cc: Daniel Wu, linuxppc-embedded


Wojciech Kromer wrote:
>
> i'm trying to run  mpc8xx-2.2.13 (with all patches I found)
>  - with 8xxrom (0.3.0)
>  - on MPC8XXFADS
>     .MPC823e
>     .4MB RAM
>     .2MB FLASH

Yes, sorry, this is too little RAM.  A long time ago in a land
far, far away (before initrd support), you could boot in less than 8
Mbytes.  Not any more.


	-- Dan

* Re: too few RAM?
  2000-08-10 14:49       ` Dan Malek
@ 2000-08-17 11:49         ` Wojciech Kromer
  0 siblings, 0 replies; 22+ messages in thread
From: Wojciech Kromer @ 2000-08-17 11:49 UTC (permalink / raw)
  To: Dan Malek, linuxppc-embedded


Dan Malek wrote:
>
> Wojciech Kromer wrote:
> >
> > i'm trying to run  mpc8xx-2.2.13 (with all patches I found)
> >  - with 8xxrom (0.3.0)
> >  - on MPC8XXFADS
> >     .MPC823e
> >     .4MB RAM
> >     .2MB FLASH
>
> Yes, sorry, this is too little RAM.  A long time ago in a land
> far, far away (before initrd support), you could boot in less than 8
> Mbytes.  Not any more.
>
>         -- Dan
>


Now I have 8MB RAM, but my kernel still hangs when trying to run any
application.

My kernel reports: 'Using ELF interpreter /lib/ld.so.1'

I tried to compile ld.so, but got these errors:
lddstub.S:19: #error Only know how to support i386, m68k and sparc architectures
lddstub.S:36: #error Only know how to support i386, m68k and sparc architectures

The ld.so... versions from the net don't seem to work
(I tried the one from HHL and the one from the mbxroot.min.tgz file).

Q1: Where can I get FULL BINARY versions of the following for my board:
 - 8xxrom (or something like it)
 - a kernel that boots from NFS
 - a root file system (to put on the NFS server)


Q2: Are there any source packages for ALL that I need to recompile?


--
* * * * * * * * * * * * *
* per pedes ad astra !  *
* * * * * * * * * * * * *    mailto:krom@dgt-lab.com.pl

Thread overview: 22+ messages
2000-06-04  4:40 kernel crashes at InstructionTLBMiss Daniel Wu
2000-06-05  2:32 ` Dan A. Dickey
2000-06-05  8:19 ` 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss ) Murray Jensen
2000-06-05 20:37   ` Dan Malek
2000-06-06  6:31     ` Murray Jensen
2000-06-06 20:05       ` Dan Malek
2000-06-07  3:05         ` Dan A. Dickey
2000-06-07  9:17         ` Murray Jensen
2000-06-07  3:02       ` Dan A. Dickey
2000-06-06 21:37         ` Steve Tarr
2000-06-06 17:03     ` net driver receive problems Tom Roberts
2000-06-05 14:51 ` kernel crashes at InstructionTLBMiss Dan Malek
2000-06-05 15:55   ` Dan Malek
2000-06-05 16:19     ` Dan Malek
2000-06-06  3:59     ` Graham Stoney
2000-06-06  3:56   ` Daniel Wu
2000-06-06 20:18     ` Dan Malek
2000-08-10 12:05     ` too few RAM? Wojciech Kromer
2000-08-10 14:49       ` Dan Malek
2000-08-17 11:49         ` Wojciech Kromer
2000-06-30  6:17 ` Debug information for elf format Kwansuk Kim
2000-06-30  6:46   ` sungyeon
