* kernel crashes at InstructionTLBMiss
@ 2000-06-04 4:40 Daniel Wu
2000-06-05 2:32 ` Dan A. Dickey
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: Daniel Wu @ 2000-06-04 4:40 UTC (permalink / raw)
To: linuxppc-embedded
Hi,
I'm still having a few problems with my Linux port (860T-based board), so I hope
someone can give me some fresh ideas on how to track down the problem. When I
boot the target, I get the following output and nothing more.
loaded at: 00800000 0080B1D8
relocated to: 00B00000 00B0B1D8
board data at: 00B00190 00B001B8
relocated to: 007F0100 007F0128
zimage at: 00806000 0087C6C1
initrd at: 0087C6C1 00A53511
avail ram: 00A54000 02000000
Linux/PPC load:
Uncompressing Linux...done.
Now booting the kernel
Linux version 2.2.13 (aaluser@c1rb) (gcc version 2.95.2 19991024 (release)) #97 Fri Jun 2 18:18:27 EST 2000
Boot arguments: root=/dev/ram
time_init: decrementer frequency = 187500000/60
Calibrating delay loop... 49.77 BogoMIPS
Memory: 29308k available (852k kernel code, 688k data, 32k init) [c0000000,c2000000]
DENTRY hash table entries: 262144 (order: 9, 2097152 bytes)
Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 8192 (order: 3, 32768 bytes)
POSIX conformance testing by UNIFIX
I then ran the same code using a BDM debugger and it is showing that the code
is crashing at InstructionTLBMiss:
InstructionTLBMiss:
#ifndef NO_MPC8xxBUG_CPU6
        stw     r3, 8(r0)
        li      r3, M_TW_ADDR
        stw     r3, 12(r0)
        lwz     r3, 12(r0)
        mtspr   M_TW, r20       /* Save a couple of working registers */
        mfcr    r20
        stw     r20, 0(r0)
        stw     r21, 4(r0)
        mfspr   r20, SRR0       /* Get effective address of fault */
        li      r3, MD_EPN_ADDR
        stw     r3, 12(r0)
        lwz     r3, 12(r0)
#else /* NO_MPC8xxBUG_CPU6 */
        mtspr   M_TW, r20       /* Save a couple of working registers */
        mfcr    r20
        stw     r20, 0(r0)
        stw     r21, 4(r0)
        mfspr   r20, SRR0       /* Get effective address of fault */
#endif /* NO_MPC8xxBUG_CPU6 */
        mtspr   MD_EPN, r20     /* Have to use MD_EPN for walk, MI_EPN can't */
        mfspr   r20, M_TWB      /* Get level 1 table entry address */
==>     lwz     r21, 0(r20)     /* Get the level 1 entry */
        rlwinm. r20, r21,0,0,20 /* Extract page descriptor page address */
Note that I've applied the patch by Marcus Sundberg but either way, the same
thing happens.
The values of the general registers at the crash point are:
r0: 00a54230 c0a55dc0 c0a54000 00003780 c0a54230 00000000 c00f2000 00000319
r8: 0000001f 400f1000 0000000b c00f5b5c 84000028 00000000 00000000 00000000
r16: 00000000 00000000 00000000 00000000 400f1c00 000f4c20 00000000 00000000
r24: c0002284 00000000 00000000 c00f4bf0 00000001 c0a54000 c00f2ca8 c00f4be8
As you can see, r20 is 400f1c00, which looks wrong, but why? Any suggestions?
Thanks,
Daniel
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
* Re: kernel crashes at InstructionTLBMiss
2000-06-04 4:40 kernel crashes at InstructionTLBMiss Daniel Wu
@ 2000-06-05 2:32 ` Dan A. Dickey
2000-06-05 8:19 ` 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss ) Murray Jensen
` (2 subsequent siblings)
3 siblings, 0 replies; 22+ messages in thread
From: Dan A. Dickey @ 2000-06-05 2:32 UTC (permalink / raw)
To: Daniel Wu; +Cc: linuxppc-embedded

Daniel Wu wrote:
...
> I then ran the same code using a BDM debugger and it is showing that the code
> is crashing at InstructionTLBMiss:

Daniel, your problem sounds suspiciously like a problem I was having
with the mpc8bug debugger. Issuing a 'rms der 0' command before
the 'go ...' worked well for me. Why don't you give that a try?
-Dan
* 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
2000-06-04 4:40 kernel crashes at InstructionTLBMiss Daniel Wu
2000-06-05 2:32 ` Dan A. Dickey
@ 2000-06-05 8:19 ` Murray Jensen
2000-06-05 20:37 ` Dan Malek
2000-06-05 14:51 ` kernel crashes at InstructionTLBMiss Dan Malek
2000-06-30 6:17 ` Debug information for elf format Kwansuk Kim
3 siblings, 1 reply; 22+ messages in thread
From: Murray Jensen @ 2000-06-05 8:19 UTC (permalink / raw)
To: Daniel Wu; +Cc: linuxppc-embedded

>        mfspr   r20, M_TWB      /* Get level 1 table entry address */
...
>As you can see, r20 is 400f1c00, which looks wrong, but why? Any suggestions?

At this point, the MMU is disabled so r20, which is loaded from the MMU
Table Walk Base register, should be a physical address - 400f1c00 is not a
likely physical address for something in RAM (unless you have a weird
disjoint RAM setup), so yes it certainly looks wrong. Incorrect TWB contents
is a disaster.

Here we come to a dilemma that I have had since I started with this stuff.
I have never been able to get an 8xx kernel running without adding a patch
to update the Table Walk Base register at the time that a new mm context is
activated.

Let me explain: normally the TWB is loaded at context switch time which
makes sense because a different task with a different virtual memory context
will be running. This is done in the following code in the _switch function
in arch/ppc/kernel/entry.S:

>        tophys(r0,r4)
>        mtspr   SPRG3,r0        /* Update current THREAD phys addr */
>#ifdef CONFIG_8xx
>        /* XXX it would be nice to find a SPRGx for this on 6xx,7xx too */
>        lwz     r9,PGDIR(r4)    /* cache the page table root */
>        tophys(r9,r9)           /* convert to phys addr */
>        mtspr   M_TWB,r9        /* Update MMU base address */
>        tlbia
>        SYNC
>#endif /* CONFIG_8xx */

The contents of the TWB should be the address stored in
current->thread.pgdir converted to a physical address. The above code is
the only place that the TWB is written to, anywhere in the kernel (that I
can find).
The TWB is then used in the TLB miss handlers to load the TLB entry
(assuming a mapping exists - if not, do_page_fault() is called to fill it
in).

But I have found that there is a situation during "exec()" where a newly
created mm context is "activated" (via activate_mm() in asm/mmu_context.h)
before the task is actually "switch"ed to (presumably to copy the arguments
and environment etc from the old task - which is being overwritten) i.e. the
TWB is not updated because a switch hasn't occurred [NOTE: this is only my
theory - I am not an expert on this stuff]

Without my patch, the exec of "/sbin/init" hangs in an endless TLBMiss
handler loop, where a virtual address is accessed which causes a TLB miss,
the TWB has contents of the old pgdir which does not have a mapping for that
virtual address so do_page_fault() is called to fill it in, but
do_page_fault() decides that that mapping exists and everything is ok so why
the hell did you call me, I'll just return doing nothing! - the access is
re-tried which causes a TLB miss again at the same virtual address. The
kernel is in a dead hang (although later 2.[34].* kernels exhibit different
symptoms, which mystifies me a bit - i.e. characters typed on the console
are echoed, and I know timer interrupts are occurring, because I have a
rotating thingy on the LCD display which updates once a second via the timer
interrupt handler, so it is not a complete dead hang).

The patch I always have to add to arch/ppc/kernel/head_8xx.S is:

  */
 _GLOBAL(set_context)
        mtspr   M_CASID,r3      /* Update context */
+       lwz     r3, THREAD+PGDIR(r2)
+       tophys(r3, r3)
+       mtspr   M_TWB, r3
        tlbia
        SYNC
        blr

I know this is wrong, but it seems to work for me (unless the TWB can be
considered to be part of the MMU context, and therefore it is legitimate
to update it in set_context()? I don't know).

I have tried other things e.g. adding a "set_context_and_twb()" function,
just after the set_context() function (without above patch), e.g.:

--- arch/ppc/kernel/head_8xx.S  2000/04/28 06:35:05     1.1.1.5
+++ arch/ppc/kernel/head_8xx.S  2000/06/05 07:51:50
@@ -905,6 +905,19 @@
        SYNC
        blr
 
+/*
+ * the 8xx tablewalk base register (M_TWB) must be consistent with
+ * the currently active mm. This is called from switch_mm() and
+ * activate_mm() in include/asm-ppc/mmu_context.h
+ */
+_GLOBAL(set_context_and_twb)
+       mtspr   M_CASID,r3      /* Update context */
+       tophys(r4, r4)
+       mtspr   M_TWB, r4
+       tlbia
+       SYNC
+       blr
+
 /* Jump into the system reset for the rom.
  * We first disable the MMU, and then jump to the ROM reset address.
  *

Then doing something like this:

--- include/asm-ppc/mmu_context.h       2000/03/07 03:59:54     1.1.1.2
+++ include/asm-ppc/mmu_context.h       2000/06/05 07:46:35
@@ -52,6 +52,11 @@
 extern void set_context(int context);
 
 #ifdef CONFIG_8xx
+/* same as above plus loads the 8xx tablewalk base register also */
+extern void set_context_and_twb(int, void *);
+#endif
+
+#ifdef CONFIG_8xx
 extern inline void mmu_context_overflow(void)
 {
        atomic_set(&next_mmu_context, -1);
@@ -85,7 +90,10 @@
 {
        tsk->thread.pgdir = next->pgd;
        get_mmu_context(next);
-       set_context(next->context);
+       if (tsk == current)
+               set_context_and_twb(next->context, tsk->thread.pgdir);
+       else
+               set_context(next->context);
 }
 
 /*
@@ -96,7 +104,7 @@
 {
        current->thread.pgdir = mm->pgd;
        get_mmu_context(mm);
-       set_context(mm->context);
+       set_context_and_twb(mm->context, current->thread.pgdir);
 }
 
 /*

This works also, though I'm not sure about it.

I was thinking that maybe the set_context() in switch_mm() should only be
done if the switch_mm() is being performed on the "current" task. e.g.

        tsk->thread.pgdir = next->pgd;
        get_mmu_context(next);
        if (tsk == current)
                set_context(next->context);

Then set_context() could simply update the TWB with current->thread.pgdir.
But I think the only place switch_mm() is called is in the task context
switch code anyway, which means current is about to change, and also means
I get confused :-) But I know activate_mm() is used in other places -
something to do with "lazy tlb" mode, and also in exec(). I give up.

One thing I think is for certain in all this - do_page_fault() should
*NEVER* return without having done something - anything - to ensure that
the same fault does not re-occur after the handler returns - if it can't
handle the fault, it should either kill the task if it is in user mode, or
panic if in kernel mode.

One thing that bothers me is why this behaviour only occurs for me? I have
no idea, but obviously it is only me, otherwise no-one would have a working
embedded 8xx 2.[34].* kernel. I suspect I am doing something else which
triggers this bug, or else there is something I don't understand (not
unlikely :-).

Note: I have only ever tried the 2.[34].* series of kernels. I have not
tried the 2.2.* kernels, but some code snippets I have seen in the list
archives suggest ...

I just searched the list and found the following comment from Dan Malek
on 16 Dec 98:

    > BTW, why must the M_TWB be set in SET_PAGE_DIR ?

    The M_TWB points to the first level page table (Linux pgd_t) and is
    used in the mpc8xx page fault handler. When Linux deletes or otherwise
    modifies the memory map object such that the first level page table is
    modified (as during exec), it uses SET_PAGE_DIR. Since the first level
    table has potentially moved to a new memory location, we have to set
    M_TWB at this time. If we don't, a process exec without an intervening
    context switch will cause us to use a bogus M_TWB when trying to find
    page tables.

        -- Dan

OK - where is SET_PAGE_DIR() in the 2.[34].* kernels?

Following the threads it appears that this discussion was had a long time
ago, but in the other direction - the TWB was being updated too often, and
the consensus was that it should only be updated when the SET_PAGE_DIR
macro was setting the page dir for the current task. Now it is not setting
it at all.

I think I'd better shut up now and let other more experienced people tell
me what I have missed or where I have gone wrong :-) Cheers!
Murray...
--
Murray Jensen, CSIRO Manufacturing Sci & Tech, Phone: +61 3 9662 7763
Locked Bag No. 9, Preston, Vic, 3072, Australia. Fax: +61 3 9662 7853
Internet: Murray.Jensen@cmst.csiro.au (old address was mjj@mlb.dmt.csiro.au)
* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
2000-06-05 8:19 ` 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss ) Murray Jensen
@ 2000-06-05 20:37 ` Dan Malek
2000-06-06 6:31 ` Murray Jensen
2000-06-06 17:03 ` net driver receive problems Tom Roberts
0 siblings, 2 replies; 22+ messages in thread
From: Dan Malek @ 2000-06-05 20:37 UTC (permalink / raw)
To: Murray Jensen; +Cc: Daniel Wu, linuxppc-embedded

Murray Jensen wrote:

> Here we come to a dilemma that I have had since I started with this stuff.
> I have never been able to get an 8xx kernel running without adding a patch
> to update the Table Walk Base register at the time that a new mm context is
> activated.

After reading your diatribe perhaps I should provide a little information.

There are many subtle changes to context switching that happen during
the minor updates (which could be weekly). There are several patches
floating around (and probably more kernel sources) that certainly
are not correct. I don't know where you get your source code, but there
are exactly two consistent and working kernel sources that I have ever
provided. One is in ftp://linuxppc.cs.nmt.edu/pub/linuxppc/embedded,
the mpc8xx-2.2.13.tgz tarball. A better and completely up to date
kernel is in ftp.mvista.com/pub/CDK/wip/ppc_8xx/RPMS (along with
everything else to build an 8xx embedded system). Everyone should be
using the kernel from MontaVista, and if something isn't in there
that you want, send me patches against that.

There are patches posted against that original tarball, and make sure
you are not mixing kernel versions and patches.

Finally, lots of bugs associated with porting to new hardware manifest
themselves as "problems" in any VM related function. Since many people
don't understand the subtle interactions of all of these functions (as
evidenced by your message) you become convinced the problem is associated
with this complexity and fail to unravel the clues to the real cause.
This could be as simple as intrusive debugging hardware, some silicon
bug not understood, or prototype hardware not working correctly.

There are lots of products and systems in development running this software,
so you have to approach this generic software from the assumption that
it is first likely to be working. You seldom hear from those people.

Are there possible bugs? Sure, and you have to provide minimal information
for the rest of us to help out. Where did you get the sources? What
patches did you apply? What are your hardware details? What
modifications did you make?

As for 2.4.xx, the 8xx still doesn't work correctly. However, I
discovered it failed to work after the 403 additions, so I am now
learning about the 403 in an effort to make everything live happily
together again. Note, this has nothing to do with M_TWB......

        -- Dan
* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
2000-06-05 20:37 ` Dan Malek
@ 2000-06-06 6:31 ` Murray Jensen
2000-06-06 20:05 ` Dan Malek
2000-06-07 3:02 ` Dan A. Dickey
2000-06-06 17:03 ` net driver receive problems Tom Roberts
1 sibling, 2 replies; 22+ messages in thread
From: Murray Jensen @ 2000-06-06 6:31 UTC (permalink / raw)
To: Dan Malek; +Cc: linuxppc-embedded

On Mon, 05 Jun 2000 16:37:55 -0400, Dan Malek <dan@netx4.com> writes:
>Murray Jensen wrote:
>
>> Here we come to a dilemma that I have had since I started with this stuff.
>> I have never been able to get an 8xx kernel running without adding a patch
>> to update the Table Walk Base register at the time that a new mm context is
>> activated.
>
>
>After reading your diatribe

Diatribe? Hmm.. Sorry, I didn't mean to offend you - I thought I was being
reasonably clear, and definitely polite. I wasn't being at all critical of
anyone associated with Linux/PPC or the 8xx embedded version - I think you
and they all do a great job, and I am very impressed.

In my eagerness I left out some information I should have provided, sorry.
I will try to correct that now.

I use the linuxppc_2_3 bitkeeper repository at hq.fsmlabs.com as the base
for my local changes. I use a Sun Ultra 60 dual cpu sparc workstation
running Solaris 2.7 as my host o/s, with gcc-2.95.2, the latest binutils
from the CVS repository at :pserver:anoncvs@anoncvs.cygnus.com:/cvs/src,
and glibc-2.1.3 configured as an mpc8xx cross-compiler for Solaris. I build
my own root filesystem, based on sources from the net. When I compile the
kernel, I build zImage.initrd and download it to the target using the GDB
protocol via a serial port.

My hardware is a Cogent CMA102 motherboard, with CMA286-60 CPU module
(MPC860 cpu - rev no. XPC860MHZP66C1), and CMA302 I/O module with 8Mb flash.
The motherboard has 32Mb RAM, 2 serial and 1 parallel ports, and LCD
display. The cpu module has a 128K boot eprom, which I load with a small
ROM monitor I wrote based on the GDB eprom stubs configuration of eCos
(embedded cygnus operating system - which supports the cogent platform).
The monitor supports downloading via the serial port (at 230400bps) into
RAM using the GDB protocol, programming flash from a RAM image, and booting
an image that resides in flash, among other things (I call it ELILO :-).

Modifications I make to the kernel are minimal - just drivers for devices
on the cogent platform (including the I/O mappings, which are different
to the MBX in that they reside in the lower half of the address space which
required me to use ioremap() correctly by setting ioremap_base and saving
its return value and using this to access my devices) and some other minor
changes, which I believe are not relevant. The only major change I have had
to make to the kernel is the one I discussed in my previous message.

I checked this out again, and one other change was moving most of the code
at _start in head_8xx.S to after the exception handlers because the extra
mappings required for the Cogent devices caused this code to exceed 0x100
bytes. The other thing I added was making use of the MPC860 watchdog which
I could do because I had control of the boot eprom code (if the kernel
hangs I get a watchdog reset in some circumstances, depending on the type
of hang).

>There are many subtle changes to context switching that happen during
>the minor updates (which could be weekly).

I usually update daily, or every couple of days, a local copy of the
bitkeeper repository (using rsync, but I also maintain a read-only
anonymous bitkeeper clone which I bk pull at the same time, because I like
to use bk sccstool to follow the changes), which I then "import" into a
vendor branch of a local CVS repository. My local changes are maintained in
the HEAD revision. I also maintain a "stable" branch which is a working
kernel, based on repository as at October 1999.

>There are several patches
>floating around (and probably more kernel sources) that certainly
>are not correct.

I don't use any patches from the net - all changes made are local.

>I don't know where you get your source code, but there
>are exactly two consistent and working kernel sources that I have ever
>provided. One is in ftp://linuxppc.cs.nmt.edu/pub/linuxppc/embedded,
>the mpc8xx-2.2.13.tgz tarball. A better and completely up to date
>kernel is in ftp.mvista.com/pub/CDK/wip/ppc_8xx/RPMS (along with
>everything else to build an 8xx embedded system). Everyone should be
>using the kernel from MontaVista, and if something isn't in there
>that you want, send me patches against that.

These are all 2.2.x, no? I believe I need 2.[34].x because I want to use
the latest RT-Linux stuff eventually, which only works with the 2.3.x, or
later, kernels.

>There are patches posted against that original tarball, and make sure
>you are not mixing kernel versions and patches.

As I say, I use a pristine 2.[34].x kernel with local changes only.

>Finally, lots of bugs associated with porting to new hardware manifest
>themselves as "problems" in any VM related function. Since many people
>don't understand the subtle interactions of all of these functions (as
>evidenced by your message) you become convinced the problem is associated
>with this complexity and fail to unravel the clues to the real cause.

I don't think I deserve this sort of belittling. Treating potential
contributors in this way can only have a negative effect on open source
development. I admit I don't yet fully understand the PowerPC architecture,
or the MPC8xx implementation of it, but I am learning, and with nearly 20
years experience in computer science I believe I should be able to pick it
up eventually (I've "seen it all before" :-).

>This could be as simple as intrusive debugging hardware,

I use kgdb.

>some silicon
>bug not understood,

I included my chip revision above. It appears to be a C1 revision chip.

>or prototype hardware not working correctly.

Definitely.

>There are lots of products and systems in development running this software,
>so you have to approach this generic software from the assumption that
>it is first likely to be working.

I did. I said I was intrigued as to why this problem only affected me. And
once I make the described change, the "generic software" works for me also
(at least an older revision works - current revisions still crash,
something to do with the memory allocation stuff, I believe).

As I said in my previous message, I suspect something else I am doing is
triggering this bug (that much is obvious), but there are two
possibilities: either I am doing something wrong in my local changes, or
the "generic software" has a bug which does not show up in anyone else's
implementation. I was wondering whether the latter was the case (I wasn't
blaming anyone, I was excited that maybe I had discovered a long existing
hidden fault in the software, that may explain some mysterious failure
modes, that someone else might be getting - other developers may then post,
saying "yeah, that would explain my problem, blah blah", and so the
discussion goes on. Upon searching the archives, I found that a similar
problem had been discussed for the 2.2.x kernels, so maybe the fix or fixes
didn't make their way into the 2.[34].x kernels. I don't know, anything is
possible, that's why we have these discussion groups).

>Are there possible bugs? Sure, and you have to provide minimal information
>for the rest of us to help out.

Again, apologies for not providing enough information in my message - I
made assumptions I shouldn't have. Obviously, on my first post I should
have been completely anal, because no-one knows me from a bar of soap. I
can then start to be less exacting after I have been around for a while.

>Where did you get the sources? What
>patches did you apply? What are your hardware details? What
>modifications did you make?

See above.

>As for 2.4.xx, the 8xx still doesn't work correctly. However, I
>discovered it failed to work after the 403 additions, so I am now
>learning about the 403 in an effort to make everything live happily
>together again.

It was my feeling that the problems were to do with the new memory
allocation stuff introduced a couple of months ago.

>Note, this has nothing to do with M_TWB......

I know.

Now that we have gotten past treating me like a dill, please can you
re-read my original message and see if I am making any sense at all? I
would very much appreciate some insights and even constructive criticism.
Cheers!
Murray...

PS: I haven't contributed the Cogent platform changes yet, because I wasn't
happy that I had done everything properly. This was really my first foray
into taking part in the Linux/PPC embedded development community - I can't
say it has been particularly successful (despite my good feelings about
contributing a small fix a couple of days ago). I will try not to be too
discouraged.
--
Murray Jensen, CSIRO Manufacturing Sci & Tech, Phone: +61 3 9662 7763
Locked Bag No. 9, Preston, Vic, 3072, Australia. Fax: +61 3 9662 7853
Internet: Murray.Jensen@cmst.csiro.au (old address was mjj@mlb.dmt.csiro.au)
* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
2000-06-06 6:31 ` Murray Jensen
@ 2000-06-06 20:05 ` Dan Malek
2000-06-07 3:05 ` Dan A. Dickey
2000-06-07 9:17 ` Murray Jensen
2000-06-07 3:02 ` Dan A. Dickey
1 sibling, 2 replies; 22+ messages in thread
From: Dan Malek @ 2000-06-06 20:05 UTC (permalink / raw)
To: Murray Jensen; +Cc: Dan Malek, linuxppc-embedded

Murray Jensen wrote:

> I use the linuxppc_2_3 bitkeeper repository at hq.fsmlabs.com as the
> base for my local changes.

This has not run correctly on the 8xx for quite some time. It won't
boot since the addition of the IBM403 changes.

> .... (including the I/O mappings, which are different
> to the MBX in that they reside in the lower half of the address space which
> required me to use ioremap() correctly by setting ioremap_base and saving
> its return value and using this to access my devices) and some other minor
> changes, which I believe are not relevant.

Not again......Did you read any of my past postings about memory
mapping on the 8xx? You can't change ioremap_base, and any memory
mapping change is highly relevant.

> I checked this out again, and one other change was moving most of the code
> at _start in head_8xx.S

Oh geeze.....Let me quickly paraphrase what I have written in the past.
You should not be changing _any_ code in head_8xx.S. This code will
minimally map some memory and the IMMR. This is all that is required
to boot the kernel into further initialization functions. If there
are some devices that you must use early (such as board control/status
registers), you ioremap() these in arch/ppc/mm/init.c. These physical
hardware addresses must reside outside of the user and kernel text/data
virtual addresses.

> ..... to after the exception handlers because the extra
> mappings required for the Cogent devices caused this code to exceed 0x100
> bytes.

All of this mapping should be done inside of the device drivers, not
part of the early kernel initialization.

> These are all 2.2.x, no? I believe I need 2.[34].x because I want to use
> the latest RT-Linux stuff eventually, which only works with the 2.3.x, or
> later, kernels.

Yes, but 2.4.xx doesn't work right now. I am trying to get that
working among other things. You have to back up to a much older
version of 2.3.xx if you want to use this baseline right now.

        -- Dan
* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
2000-06-06 20:05 ` Dan Malek
@ 2000-06-07 3:05 ` Dan A. Dickey
2000-06-07 9:17 ` Murray Jensen
1 sibling, 0 replies; 22+ messages in thread
From: Dan A. Dickey @ 2000-06-07 3:05 UTC (permalink / raw)
To: Dan Malek; +Cc: Murray Jensen, linuxppc-embedded

Dan Malek wrote:
...
> Yes, but 2.4.xx doesn't work right now. I am trying to get that
> working among other things.

Dan, is there any way others can help?
(Sure there are, just let us know how...)
-Dan
* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss ) 2000-06-06 20:05 ` Dan Malek 2000-06-07 3:05 ` Dan A. Dickey @ 2000-06-07 9:17 ` Murray Jensen 1 sibling, 0 replies; 22+ messages in thread From: Murray Jensen @ 2000-06-07 9:17 UTC (permalink / raw) To: Dan Malek; +Cc: linuxppc-embedded On Tue, 06 Jun 2000 16:05:42 -0400, Dan Malek <dan@netx4.com> writes: >Murray Jensen wrote: > >> I use the linuxppc_2_3 bitkeeper repository at hq.fsmlabs.com as the >> base for my local changes. > >This has not run correctly on the 8xx for quite some time. I know - I said as much in my message. I have a working 2.3.x kernel from some months ago (October 1999). >It won't >boot since the addition of the IBM403 changes. It boots fine for me, but eventually crashes with the following: kmem_alloc: Bad slab magic (corrupt) (name=buffer_head) As far as I can tell, a completely new method of memory allocation was introduced a few months ago and it hasn't worked since. >> .... (including the I/O mappings, which are different >> to the MBX in that they reside in the lower half of the address space which >> required me to use ioremap() correctly by setting ioremap_base and saving >> its return value and using this to access my devices) and some other minor >> changes, which I believe are not relevant. > >Not again......Did you read any of my past postings about memory >mapping on the 8xx? I scanned the archives as best I could. I only found the linuxppc mailing lists a couple of weeks ago (I don't know how I managed to overlook them before). >You can't change ioremap_base, I have changed ioremap_base and it runs fine with a kernel based on 2.3.x as at October 1999. The way it was being done before is, in my opinion, incorrect. 
The return value from ioremap() was being ignored, which is the same as assuming that physical address == virtual address for all I/O mappings (because ioremap() uses the physical address as the virtual address if the physical address is greater than or equal to ioremap_base, but because the 8xx port does not set ioremap_base, it defaults to zero, hence all I/O mappings are done in this fashion). The Cogent CMA286-60 by default has I/O devices starting at 0x02000000. One of my problems in the early days of porting to Cogent was that I blindly copied the way I/O mappings were being done for other platforms. When it didn't work I had to find out why - of course it was because having an I/O device mapped to kernel virtual address 0x02000000 was not a good idea. I could move the location of these I/O devices in the physical address space by manipulating the PowerPC hardware in the boot rom, but this would be confusing at best (because the Cogent documentation says otherwise) and the kernel would then be reliant upon being booted from a compatible boot ROM, or to make the kernel independent, I could change the hardware mappings at kernel boot time, but this would require hacking head_8xx.S and I didn't want to change anything in there at the time. So I instead chose to do the ioremap()'ing correctly, by setting ioremap_base to a sensible value (I chose 0xf8000000, which isn't to say this is a sensible value, just the value I chose), storing the return value from ioremap() and using that as the base virtual address for access to the cogent I/O devices. >and any memory >mapping change is highly relevant. OK, if you say so (and it makes sense), but I don't believe my I/O mappings are causing any problems. 
However, I will change my boot rom and add a command that will change the hardware mappings so that the cogent devices are up high in the physical address space (by programming the base and option registers in the memory controller), then I can test a kernel with a pristine arch/ppc/mm/init.c and see how much difference it makes (this will take me a while). I don't consider this a high priority though, since I have a working kernel using these memory mappings. >> I checked this out again, and one other change was moving most of the code >> at _start in head_8xx.S > >Oh geeze.....Let me quickly paraphrase what I have written in the past. >You should not be changing _any_ code in head_8xx.S. This code will >minimally map some memory and the IMMR. This is all that is required >to boot the kernel into further initialization functions. If there >are some devices that you must use early (such as board control/status >registers), you ioremap() these in arch/ppc/mm/init.c. I wanted to access the Cogent LCD display for diagnostic purposes, before MMU_init was called. I simply added a second 8Mb temporary TLB entry (almost identical to the one for the IMMR). This TLB entry would have been invalidated after the first tlbia, same as for the IMMR. This was the only change to head_8xx.S (I am very careful making changes in there, if I do it at all), but it meant the code went over the available 0x100 bytes, so I moved it to 0x2000 (by the same method that is used to transfer execution to "start_here"). In any case, I believe this does not affect anything else because I have run with and without that change and it appears to make no difference (other than that I cannot access the LCD display). My kernel (the working one) runs fine in either case. >These physical >hardware addresses must reside outside of the user and kernel text/data >virtual addresses. Only because the ioremap()'ing in arch/ppc/mm/init.c is not done correctly. 
My working kernel runs fine with the Cogent I/O devices located at 0x02000000 in the physical address space. They are not at that location in the virtual address space, but this is hidden (by indirection). >> ..... to after the exception handlers because the extra >> mappings required for the Cogent devices caused this code to exceed 0x100 >> bytes. > >All of this mapping should be done inside of the device drivers, not >part of the early kernel initialization. Hmm.. I do all I/O mapping in MMU_init() using ioremap() - is there another way? I suppose I could map each individual device in the driver initialisation routines (probably the usual way?), but the Cogent has the concept of I/O slots, which have a fixed location and size in the physical address space (by default), so I simply map the entire range (32Mb) for the slot, and then each device driver treats I/O addresses as offsets from the I/O slot's virtual base address, as returned by ioremap() (it's actually done generically by macros in the board specific header). This is wasteful in the page map (even the I/O slot that has the flash, only uses 8Mb - although it could have had 16Mb flash on it - I only got the 8Mb version), but conceptually simpler. However, the device registers are fairly sparsely arranged within the 32Mb address ranges, especially for the motherboard I/O area, so I reckon the saving trying to do it bit by bit wouldn't really be worth the extra complication. >> These are all 2.2.x, no? I believe I need 2.[34].x because I want to use >> the latest RT-Linux stuff eventually, which only works with the 2.3.x, or >> later, kernels. > >Yes, but 2.4.xx doesn't work right now. I know, I have been tracking it, but it doesn't seem to be getting much better. >I am trying to get that working among other things. I too am trying various things (but its not a priority at the moment). >You have to back up to a much older >version of 2.3.xx if you want to use this baseline right now. 
Yep, I have done that - I backed up to October 1999 and it works. I could try
later versions, but each attempt is fairly arduous and I have one that works,
so I didn't bother.

Now back to my original post - updating the TWB: here is the relevant code in
include/asm-ppc/mmu_context.h:

/*
 * After we have set current->mm to a new value, this activates
 * the context for the new mm so we see the new mappings.
 */
static inline void activate_mm(struct mm_struct *active_mm, struct mm_struct *mm)
{
	current->thread.pgdir = mm->pgd;
	get_mmu_context(mm);
	set_context(mm->context);
}

I believe it is wrong to change current->thread.pgdir, without mirroring that
change in the MMU TWB register. This is the gist of my (long winded?) first
posting. Is this true or not? Similarly, this code:

static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
			     struct task_struct *tsk, int cpu)
{
	tsk->thread.pgdir = next->pgd;
	get_mmu_context(next);
	set_context(next->context);
}

Surely:

1. the set_context() should not be done, unless (tsk == current) is true;
2. if (tsk == current) is true, then the TWB should be updated with
   the contents of tsk->thread.pgdir.

However, in this second case, switch_mm() is only called inside _switch() (as
far as I can see) and therefore the TWB will be updated anyway when the task
switch happens, so this second case is not that important (other than the case
when someone thinks "oh, all I have to do here is call switch_mm() and that
will save me a lot of work" but instead all hell breaks loose because the code
isn't right).

But I believe the first case (activate_mm()) will cause problems. In an exec,
a new mm context is created, and the current one is destroyed (after copying
arguments and environment etc). It looks to me like this is done using
activate_mm() i.e. the new mm context is activated using this function (makes
sense - no point creating a whole new task, just use the one we have - this is
the entire point of exec).
But the call is not happening inside _switch() as with the other case, and so
it will only be a fluke if the TWB maintains the correct value (e.g. maybe a
task switch occurs before any damage happens in all but the most exceptional
circumstances). I would like to hear people's opinions on this.

Finally, is the "Wrath of Dan" some sort of juvenile initiation rite that all
new members of the elite "Linux/PPC Embedded" gang have to go through? Twice
now you have treated me with contempt or in a condescending way. I should be
able to ignore it, because *I know* that I have some skill in this area (I was
hacking drivers for 4.2BSD on a VAX 15 years ago), but others might be put off
by your attitude and open development in this area might suffer as a result.
Please try to accept this as constructive criticism (despite my sarcastic crack
above - as Maxwell Smart would say, "I hope I wasn't outta-line with that crack
about the gang" :-).

I want to learn from you and others, and I hope I will be able to give some
knowledge/experience back. Cheers!
Murray...
--
Murray Jensen, CSIRO Manufacturing Sci & Tech, Phone: +61 3 9662 7763
Locked Bag No. 9, Preston, Vic, 3072, Australia. Fax: +61 3 9662 7853
Internet: Murray.Jensen@cmst.csiro.au (old address was mjj@mlb.dmt.csiro.au)
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss ) 2000-06-06 6:31 ` Murray Jensen 2000-06-06 20:05 ` Dan Malek @ 2000-06-07 3:02 ` Dan A. Dickey 2000-06-06 21:37 ` Steve Tarr 1 sibling, 1 reply; 22+ messages in thread From: Dan A. Dickey @ 2000-06-07 3:02 UTC (permalink / raw) To: Murray Jensen; +Cc: Dan Malek, linuxppc-embedded Murray Jensen wrote: > > On Mon, 05 Jun 2000 16:37:55 -0400, Dan Malek <dan@netx4.com> writes: ... > >After reading your diatribe > > Diatribe? Hmm.. Sorry, I didn't mean to offend you - I thought I was being > reasonably clear, and definitely polite. ... > >Finally, lots of bugs associated with porting to new hardware manifest > >themselves as "problems" in any VM related function. Since many people > >don't understand the subtle interactions of all of these functions (as > >evidenced by your message) you become convinced the problem is associated > >with this complexity and fail to unravel the clues to the real cause. > > I don't think I deserve this sort of belittling. Treating potential > contributors in this way can only have a negative effect on open > source development. Murray, please - hang in there. We need more people like you. Cut Dan some slack - he appears to be a genius at programming, but maybe is a little short on people skills. He means no harm, but calls them as he sees them. And as in baseball, not everyone always agrees with the umpire. :) (At least; this is the impression I've gathered in the relatively short time I've made his acquaintance and have been reading this list). > > >some silicon > >bug not understood, > > I included my chip revision above. It appears to be a C1 revision chip. > > >or prototype hardware not working correctly. > > Definitely. > > >There are lots of products and systems in development running this software, > >so you have to approach this generic software from the assumption that > >it is first likely to be working. > > I did. 
I said I was intrigued as to why this problem only affected me. And > once I make the described change, the "generic software" works for me also > (at least an older revision works - current revisions still crash, something > to do with the memory allocation stuff, I believe). > > As I said in my previous message, I suspect something else I am doing is > triggering this bug (that much is obvious), but there are two possibilities: > either I am doing something wrong in my local changes, or the "generic > software" has a bug which does not show up in anyone else's implementation. I > was wondering whether the latter was the case (I wasn't blaming anyone, I was > excited that maybe I had discovered a long existing hidden fault in the > software, that may explain some mysterious failure modes, that someone else > might be getting - other developers may then post, saying "yeah, that would > explain my problem, blah blah", and so the discussion goes on. Upon searching > the archives, I found that a similar problem had been discussed for the 2.2.x > kernels, so maybe the fix or fixes didn't make their way into the 2.[34].x > kernels. I don't know, anything is possible, that's why we have these > discussion groups). Murray, as far as I know - you are maybe the only one running 2.3.x on a powerpc. Most of the kernels that one can find lying about are 2.2.x (13/14? Can't remember at the moment). I, as well as others, definitely want to see 2.3.x or 2.4.0 running on an embedded powerpc. ... > Again, apologies for not providing enough information in my message - I made > assumptions I shouldn't have. Obviously, on my first post I should have been > completely anal, because no-one knows me from a bar of soap. I can then start > to be less exacting after I have been around for a while. Everyone enjoys sarcasm... :) (Don't they?) > >Where did you get the sources? What > >patches did you apply? What are your hardware details? What > >modifications did you make? > > See above. 
> > >As for 2.4.xx, the 8xx still doesn't work correctly. However, I > >discovered it failed to work after the 403 additions, so I am now > >learning about the 403 in an effort to make everything live happily > >together again. > > It was my feeling that the problems were to do with the new memory allocation > stuff introduced a couple of months ago. > > >Note, this has nothing to do with M_TWB...... > > I know. Now that we have gotten past treating me like a dill, please can you > re-read my original message and see if I am making any sense at all? I would > very much appreciate some insights and even constructive criticism. Cheers! > Murray... > > PS: I haven't contributed the Cogent platform changes yet, because I wasn't > happy that I had done everything properly. This was really my first foray > into taking part in the Linux/PPC embedded development community - I can't > say it has been particularly successful (despite my good feelings about > contributing a small fix a couple of days ago). I will try not to be too > discouraged. That's the spirit! -Dan (A different one). ** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
2000-06-07 3:02 ` Dan A. Dickey
@ 2000-06-06 21:37 ` Steve Tarr
0 siblings, 0 replies; 22+ messages in thread
From: Steve Tarr @ 2000-06-06 21:37 UTC (permalink / raw)
To: Dan A. Dickey; +Cc: Murray Jensen, Dan Malek, linuxppc-embedded
"Dan A. Dickey" wrote:

Clip, clip, clip....

> > discussion groups).
>
> Murray,
> as far as I know - you are maybe the only one running 2.3.x on
> a powerpc. Most of the kernels that one can find lying about
> are 2.2.x (13/14? Can't remember at the moment).
>
> I, as well as others, definitely want to see 2.3.x or 2.4.0 running
> on an embedded powerpc.
>

Hey, it does. I have 2.3.99-pre7 hacked and running on a MPC8260.
Actually, a pretty clean port with the exception of handling the SCC
as the console.

> ...
>
> > Again, apologies for not providing enough information in my message - I made
> > assumptions I shouldn't have. Obviously, on my first post I should have been
> > completely anal, because no-one knows me from a bar of soap. I can then start
> > to be less exacting after I have been around for a while.
>
> Everyone enjoys sarcasm... :) (Don't they?)
>
> > >Where did you get the sources? What
> > >patches did you apply? What are your hardware details? What
> > >modifications did you make?
> >
> > See above.
> >
> > >As for 2.4.xx, the 8xx still doesn't work correctly. However, I
> > >discovered it failed to work after the 403 additions, so I am now
> > >learning about the 403 in an effort to make everything live happily
> > >together again.
> >
> > It was my feeling that the problems were to do with the new memory allocation
> > stuff introduced a couple of months ago.
> >
> > >Note, this has nothing to do with M_TWB......
> >
> > I know. Now that we have gotten past treating me like a dill, please can you
> > re-read my original message and see if I am making any sense at all? I would
> > very much appreciate some insights and even constructive criticism. Cheers!
> > Murray...
> >
> > PS: I haven't contributed the Cogent platform changes yet, because I wasn't
> > happy that I had done everything properly. This was really my first foray
> > into taking part in the Linux/PPC embedded development community - I can't
> > say it has been particularly successful (despite my good feelings about
> > contributing a small fix a couple of days ago). I will try not to be too
> > discouraged.
>
> That's the spirit!
>

I get more done asking stupid questions than I do pondering the
elusive answer. Elephant hide and a reasonable reputation of getting
things done helps. Hang tough and have fun.......

Cheers -- tarr

> -Dan (A different one).
>

--
Steven Tarr
Lucent Technologies - Bell Labs
303-538-4056
tarr@lucent.com
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 22+ messages in thread
* net driver receive problems 2000-06-05 20:37 ` Dan Malek 2000-06-06 6:31 ` Murray Jensen @ 2000-06-06 17:03 ` Tom Roberts 1 sibling, 0 replies; 22+ messages in thread From: Tom Roberts @ 2000-06-06 17:03 UTC (permalink / raw) To: linuxppc-embedded Does anybody know how to write a net driver for a 2.2 kernel? Rubini's book _Linux_Device_Drivers_ only covers the 2.1 kernel, and significant changes have been made since then. I have essentially identical drivers on board and host, but they behave differently -- the PowerPC version works but the i386 does not. In particular, I have looked through the kernel code and made the driver no longer crash the kernel. But I cannot get the host's network stack to accept packets received by my driver; the powerpc Linux stack accepts them just fine. My configuration is that I boot a PowerPC board (with Linux 2.2.15-2.9.0) from an i386 host running RedHat Linux 2.2.14-5.0. I have a SIO driver on both host and board which becomes the console of the board's Linux, and on the host the boot program becomes a cheap terminal emulator so I can issue commands to the board's Linux and see its printk's and responses. The on-board linux comes up fine and my init script configures and starts the network device. ifconfig of my device on the host shows packets received, but netstat -s shows IP did not get them. When I dump the skb the data it contains looks at least superficially like a valid IP datagram (the 3rd word is the IP address of the board, and the 4th word is the IP address of the host [count words from 0]). The weird thing is that when I ping my PowerPC board from the i386 host, the packets are received on the PowerPC Linux just fine, and they are returned just fine, but the host does not see them after the driver calls netif_rx(). On the PowerPC "netstat -s" shows all packets received and sent by both IP and ICMP; on the host neither IP nor ICMP sees any packets. 
On both PowerPC and host, "ifconfig lspsnet" shows the right number of packets
sent and received. And debugging printk-s of the skb just before the call to
netif_rx() are quite similar on the PowerPC and host -- the IP addresses are
interchanged as expected.

What I think are the relevant details: this is not an ethernet; my do_rcvpkt()
is called every tick using a timer; it checks for a packet arrival (in a memory
buffer), and when one arrives:

	// data = pointer to the packet data
	// len = length of the packet data (# bytes)

	/* allocate and fill a skb */
	len4 = (len+3) & ~3;
	skb = dev_alloc_skb(len4);
	if(!skb) return (npkt ? 0 : -ENOMEM);
	memcpy(skb_put(skb,len),data,len);
	skb->dev = dev;
	skb->protocol = ETH_P_IP;
	skb->pkt_type = PACKET_HOST;
	skb->ip_summed = CHECKSUM_UNNECESSARY;

	/* all packets received are for us -- fake mac addr */
	skb->mac.raw = skb_push(skb,dev->addr_len);
	memcpy(skb->mac.raw,dev->dev_addr,dev->addr_len);
	skb_pull(skb,dev->addr_len);

	// deliver skb to higher layers
	netif_rx(skb);

	// update counter and statistics
	++npkt;
	dev->last_rx = jiffies;
	++Enet_stats.rx_packets;
	Enet_stats.rx_bytes += len;

Note the attempt to fake out the mac address (mac.raw must point between
skb->head and skb->data). My dev->addr_len is 1, but changing it to 6 did not
help. Printing the skb contents shows that the drivers do indeed transfer the
data unchanged, and I think I set all the required skb fields above....

WHAT AM I MISSING???

Tom Roberts tjroberts@lucent.com
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: kernel crashes at InstructionTLBMiss 2000-06-04 4:40 kernel crashes at InstructionTLBMiss Daniel Wu 2000-06-05 2:32 ` Dan A. Dickey 2000-06-05 8:19 ` 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss ) Murray Jensen @ 2000-06-05 14:51 ` Dan Malek 2000-06-05 15:55 ` Dan Malek 2000-06-06 3:56 ` Daniel Wu 2000-06-30 6:17 ` Debug information for elf format Kwansuk Kim 3 siblings, 2 replies; 22+ messages in thread From: Dan Malek @ 2000-06-05 14:51 UTC (permalink / raw) To: Daniel Wu; +Cc: linuxppc-embedded Daniel Wu wrote: > boot the target, I get the following output and nothing more. > > loaded at: 00800000 0080B1D8 > relocated to: 00B00000 00B0B1D8 > board data at: 00B00190 00B001B8 > relocated to: 007F0100 007F0128 > zimage at: 00806000 0087C6C1 > initrd at: 0087C6C1 00A53511 > avail ram: 00A54000 02000000 There are several things to watch for. First, I am surprised you see this much output. You have obviously changed link addresses in the Makefile, which you shouldn't do. Because of the early kernel mapping, everything should reside in the lower 8Mbytes of memory. The zImage support loader (arch/ppc/mbxboot/...stuff...) should link to low memory, 0x00100000. You should load the image either just above that, at 0x00200000 or in very high ROM addresses ( > 16 Mbyte). You are also running an 860T at 50 MHz, so you are likely to discover the "CPU6" silicon errata. You need all of the patches for this. Go to the MontaVista ftp site (ftp.mvista.com), /pub/CDK/wip/ppc_8xx/RPMS. Get the kernel sources/headers from there (along with any other tools you may want or need). This is a 2.2.13 kernel with all patches and the option to include the "CPU6" patch. Don't apply any other patches from anywhere. Just use it and make the minimal changes for your board. Using a BDM is more likely to cause trouble than help. This kernel has XMON and KGDB options. Use them instead of BDM. -- Dan ** Sent via the linuxppc-embedded mail list. 
See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: kernel crashes at InstructionTLBMiss
2000-06-05 14:51 ` kernel crashes at InstructionTLBMiss Dan Malek
@ 2000-06-05 15:55 ` Dan Malek
2000-06-05 16:19 ` Dan Malek
2000-06-06 3:59 ` Graham Stoney
2000-06-06 3:56 ` Daniel Wu
1 sibling, 2 replies; 22+ messages in thread
From: Dan Malek @ 2000-06-05 15:55 UTC (permalink / raw)
To: Dan Malek; +Cc: Daniel Wu, linuxppc-embedded
[-- Attachment #1: Type: text/plain, Size: 326 bytes --]
Dan Malek wrote:
> ..... This is a 2.2.13 kernel with all patches and
> the option to include the "CPU6" patch. Don't apply any other patches
> from anywhere.

I was just reminded of a patch to correct a mistake I made in this
particular kernel. It is attached. Apply just this one to the
MontaVista kernel :-).

-- Dan
[-- Attachment #2: mv-cpu6-3.patch --]
[-- Type: text/plain, Size: 726 bytes --]
diff -Nru linux-2.2.13.orig/arch/ppc/kernel/head.S linux-2.2.13/arch/ppc/kernel/head.S
--- linux-2.2.13.orig/arch/ppc/kernel/head.S	Mon Jun 5 11:44:56 2000
+++ linux-2.2.13/arch/ppc/kernel/head.S	Mon Jun 5 11:45:59 2000
@@ -2452,11 +2452,11 @@
 	SYNC				/* Some chip revs need this... */
 	mtmsr	r6
 	SYNC
-	lis	r7, cmd_line@h
-	ori	r7, r7, cmd_line@l
+	lis	r7, cpu6_bug@h
+	ori	r7, r7, cpu6_bug@l
 	li	r4, 0x2c00
-	stw	r4, 12(r7)
-	lwz	r4, 12(r7)
+	stw	r4, 0(r7)
+	lwz	r4, 0(r7)
 	mtspr	22, r3		/* Update Decrementer */
 	SYNC
 	mtmsr	r5
@@ -2899,6 +2899,10 @@
 	.globl	cmd_line
 cmd_line:
 	.space	512
+
+#ifdef CONFIG_8xx_CPU6
+	.space	4
+#endif

 /*
  * An undocumented "feature" of 604e requires that the v bit
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: kernel crashes at InstructionTLBMiss
2000-06-05 15:55 ` Dan Malek
@ 2000-06-05 16:19 ` Dan Malek
2000-06-06 3:59 ` Graham Stoney
1 sibling, 0 replies; 22+ messages in thread
From: Dan Malek @ 2000-06-05 16:19 UTC (permalink / raw)
To: Dan Malek; +Cc: Daniel Wu, linuxppc-embedded
[-- Attachment #1: Type: text/plain, Size: 294 bytes --]
Dan Malek wrote:
> I was just reminded of a patch to correct a mistake I made in this
> particular kernel. It is attached. Apply just this one to the
> MontaVista kernel :-).

Nope, not that one......try this one, sorry. Too many windows open
and didn't see the build error......

-- Dan
[-- Attachment #2: mv-cpu6-1.patch --]
[-- Type: text/plain, Size: 737 bytes --]
diff -Nru linux-2.2.13.orig/arch/ppc/kernel/head.S linux-2.2.13/arch/ppc/kernel/head.S
--- linux-2.2.13.orig/arch/ppc/kernel/head.S	Fri Mar 24 23:43:32 2000
+++ linux-2.2.13/arch/ppc/kernel/head.S	Fri Mar 24 23:51:19 2000
@@ -2452,11 +2452,11 @@
 	SYNC				/* Some chip revs need this... */
 	mtmsr	r6
 	SYNC
-	lis	r7, cmd_line@h
-	ori	r7, r7, cmd_line@l
+	lis	r7, cpu6_bug@h
+	ori	r7, r7, cpu6_bug@l
 	li	r4, 0x2c00
-	stw	r4, 12(r7)
-	lwz	r4, 12(r7)
+	stw	r4, 0(r7)
+	lwz	r4, 0(r7)
 	mtspr	22, r3		/* Update Decrementer */
 	SYNC
 	mtmsr	r5
@@ -2899,6 +2899,10 @@
 	.globl	cmd_line
 cmd_line:
 	.space	512
+
+#ifdef CONFIG_8xx_CPU6
+cpu6_bug:
+	.space	4
+#endif

 /*
  * An undocumented "feature" of 604e requires that the v bit
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: kernel crashes at InstructionTLBMiss
2000-06-05 15:55 ` Dan Malek
2000-06-05 16:19 ` Dan Malek
@ 2000-06-06 3:59 ` Graham Stoney
1 sibling, 0 replies; 22+ messages in thread
From: Graham Stoney @ 2000-06-06 3:59 UTC (permalink / raw)
To: Dan Malek; +Cc: LinuxPPC Embedded Mailing List
Dan Malek writes:
> I was just reminded of a patch to correct a mistake I made in this
> particular kernel. It is attached. Apply just this one to the
> MontaVista kernel :-).

I'd like to suggest the following patch as a more complete fix to avoid any
possible cmd_line corruption due to the CPU6 workaround.

Regards,
Graham

Index: arch/ppc/kernel/head.S
===================================================================
retrieving revision 1.1.1.3
diff -u -r1.1.1.3 head.S
--- arch/ppc/kernel/head.S	2000/03/10 01:11:12	1.1.1.3
+++ arch/ppc/kernel/head.S	2000/06/06 03:53:32
@@ -2286,15 +2286,15 @@
 	lwz	r9,PGD(r9)	/* get new->mm->pgd */
 	addis	r9,r9,-KERNELBASE@h	/* convert to phys addr */
 #ifdef CONFIG_8xx_CPU6
-	lis	r6, cmd_line@h
-	ori	r6, r6, cmd_line@l
+	lis	r6, cpu6_bug@h
+	ori	r6, r6, cpu6_bug@l
 	li	r7, 0x3980
-	stw	r7, 12(r6)
-	lwz	r7, 12(r6)
+	stw	r7, 0(r6)
+	lwz	r7, 0(r6)
 	mtspr	M_TWB, r9	/* Update MMU base address */
 	li	r7, 0x3380
-	stw	r7, 12(r6)
-	lwz	r7, 12(r6)
+	stw	r7, 0(r6)
+	lwz	r7, 0(r6)
 	mtspr	M_CASID, r5	/* Update context */
 #else
 	mtspr	M_TWB, r9	/* Update MMU base address */
@@ -2432,11 +2432,11 @@
 	SYNC				/* Some chip revs need this... */
 	mtmsr	r6
 	SYNC
-	lis	r7, cmd_line@h
-	ori	r7, r7, cmd_line@l
+	lis	r7, cpu6_bug@h
+	ori	r7, r7, cpu6_bug@l
 	li	r4, 0x3980
-	stw	r4, 12(r7)
-	lwz	r4, 12(r7)
+	stw	r4, 0(r7)
+	lwz	r4, 0(r7)
 	mtspr	M_TWB, r3	/* Update MMU base address */
 	SYNC
 	mtmsr	r5
@@ -2452,11 +2452,11 @@
 	SYNC				/* Some chip revs need this... */
 	mtmsr	r6
 	SYNC
-	lis	r7, cmd_line@h
-	ori	r7, r7, cmd_line@l
+	lis	r7, cpu6_bug@h
+	ori	r7, r7, cpu6_bug@l
 	li	r4, 0x2c00
-	stw	r4, 12(r7)
-	lwz	r4, 12(r7)
+	stw	r4, 0(r7)
+	lwz	r4, 0(r7)
 	mtspr	22, r3		/* Update Decrementer */
 	SYNC
 	mtmsr	r5
@@ -2899,6 +2899,11 @@
 	.globl	cmd_line
 cmd_line:
 	.space	512
+
+#ifdef CONFIG_8xx_CPU6
+cpu6_bug:
+	.space	4
+#endif

 /*
  * An undocumented "feature" of 604e requires that the v bit

--
Graham Stoney
Principal Hardware/Software Engineer
Canon Information Systems Research Australia
Ph: +61 2 9805 2909 Fax: +61 2 9805 2929
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: kernel crashes at InstructionTLBMiss
2000-06-05 14:51 ` kernel crashes at InstructionTLBMiss Dan Malek
2000-06-05 15:55 ` Dan Malek
@ 2000-06-06 3:56 ` Daniel Wu
2000-06-06 20:18 ` Dan Malek
2000-08-10 12:05 ` too few RAM? Wojciech Kromer
1 sibling, 2 replies; 22+ messages in thread
From: Daniel Wu @ 2000-06-06 3:56 UTC (permalink / raw)
To: Dan Malek; +Cc: linuxppc-embedded
Dan,

Dan Malek wrote:
> > boot the target, I get the following output and nothing more.
> >
> > loaded at: 00800000 0080B1D8
> > relocated to: 00B00000 00B0B1D8
> > board data at: 00B00190 00B001B8
> > relocated to: 007F0100 007F0128
> > zimage at: 00806000 0087C6C1
> > initrd at: 0087C6C1 00A53511
> > avail ram: 00A54000 02000000
>
> There are several things to watch for. First, I am surprised you see
> this much output. You have obviously changed link addresses in the
> Makefile, which you shouldn't do. Because of the early kernel
> mapping, everything should reside in the lower 8Mbytes of memory. The
> zImage support loader (arch/ppc/mbxboot/...stuff...) should link to
> low memory, 0x00100000. You should load the image either just above
> that, at 0x00200000 or in very high ROM addresses ( > 16 Mbyte).
>

The reason why I changed the address was because my uncompressed kernel is
about 1.3M. This means if I load at 0x100000 (the default), then the board data
gets trashed and I get nothing after the kernel has finished decompressing. I
initially moved the address to 0x200000, but then I was getting other strange
errors so I moved the whole thing higher into memory.

BTW, I found the problem that caused the crash in InstructionTLBMiss, partly
thanks to Murray Jensen. I did not implement his patches but while reading his
email, and trying to follow it, I realised that the M_TWB was not initialised
properly in the first place! There was some code that was not supposed to be
there - probably introduced while trying to patch the file from various
sources. I now get further, but unfortunately I'm not there yet. The code stops
after the RAM disk driver inits.

Anyway, I'm thinking of starting from scratch with the kernel and patches at
the MontaVista site. Come to think of it, all my changes are in one file so it
should not be too difficult to port. I will let you know how I go.

>
> You are also running an 860T at 50 MHz, so you are likely to discover
> the "CPU6" silicon errata. You need all of the patches for this.
>
> Go to the MontaVista ftp site (ftp.mvista.com), /pub/CDK/wip/ppc_8xx/RPMS.
> Get the kernel sources/headers from there (along with any other tools
> you may want or need). This is a 2.2.13 kernel with all patches and
> the option to include the "CPU6" patch. Don't apply any other patches
> from anywhere. Just use it and make the minimal changes for your
> board.
>
> Using a BDM is more likely to cause trouble than help. This kernel
> has XMON and KGDB options. Use them instead of BDM.

Unfortunately, if you get no output, BDM is the _only_ option you have - at
least it will give you some details of the registers, although you can't step
through code due to the virtual addresses.

Thanks for everyone's suggestions.

Regards,
Daniel
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: kernel crashes at InstructionTLBMiss 2000-06-06 3:56 ` Daniel Wu @ 2000-06-06 20:18 ` Dan Malek 2000-08-10 12:05 ` too few RAM? Wojciech Kromer 1 sibling, 0 replies; 22+ messages in thread From: Dan Malek @ 2000-06-06 20:18 UTC (permalink / raw) To: Daniel Wu; +Cc: Dan Malek, linuxppc-embedded Daniel Wu wrote: > The reason why I changed the address was because my uncompressed kernel is > about 1.3M. This means if I load at 0x100000 (the default), then the board data > gets trashed Yes, but just move it up a little. I know you are running the 2.2.xx kernel, and in the 2.3/2.4 kernel I moved this to 0x180000, which is the only change necessary. > .... I realised that the M_TWB was not initialised > properly in the first place! I would believe this in a 2.3.xx kernel, but not 2.2...... > ... Anyway, I'm thinking of starting from scratch with > the kernel and patches at the MontaVista site. Please do. I know that runs on many platforms. > Unfortunately, if you get no output, BDM is the _only_ option you have - at > least it will give you some details of the registers, although you can't step > through code die to the virtual addresses. A good boot rom is a better debugging tool than a BDM. The BDM is only useful for the first few instructions of the kernel. Dumping out key kernel data structures is more useful than the contents of registers at the time a BDM catches a trap. Porting to a new 8xx board is almost a no brainer with the MontaVista 2.2.13 and later kernels. All you need to do is properly set the IMMR and the processor clock speed in the board information structure. Any board will boot far enough to get console output and attach KGDB or XMON. When you start changing lots of code before this point, you are just asking for trouble. -- Dan ** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* too few RAM?
2000-06-06 3:56 ` Daniel Wu
2000-06-06 20:18 ` Dan Malek
@ 2000-08-10 12:05 ` Wojciech Kromer
2000-08-10 14:49 ` Dan Malek
1 sibling, 1 reply; 22+ messages in thread
From: Wojciech Kromer @ 2000-08-10 12:05 UTC (permalink / raw)
To: Daniel Wu, linuxppc-embedded
I'm trying to run mpc8xx-2.2.13 (with all patches I found)
- with 8xxrom (0.3.0)
- on MPC8XXFADS
  .MPC823e
  .4MB RAM
  .2MB FLASH

This is my boot-time output:
==================
entry 0x100000, phoff 0x34, shoff 0x75a60
phnum 0x1, shnum 0x9
p_offset 0x10000, p_vaddr 0x100000, p_paddr 0x100000
p_filesz 0x530c, p_memsz 0xb1cc
Loading at 0x10c000
Size 486060
475 blocks
Starting 0x11c000
loaded at: 0011C000 001271CC
relocated to: 00100000 0010B1CC
board data at: 003F0000 003F001C
relocated to: 0010C100 0010C11C
zimage at: 00122000 00181A24
avail ram: 00182000 00400000
Linux/PPC load:
Uncompressing Linux...done.
Now booting the kernel

and here is what happens:
====================
exception: Implementation Specific Instruction TLB miss 0xc00d95bc
can't read memory address
f823Bug> md 0d95b0 :i
0x000d95b0: bb010010 lmw r24, 0x10(r1)
0x000d95b4: 38210030 addi r1,r1, 0x30
0x000d95b8: 4e800020 bclr 0x14,0
0x000d95bc: 9421ffd0 stwu r1,-0x30(r1)
0x000d95c0: 7c0802a6 mfspr r0,LR
0x000d95c4: bf810020 stmw r28, 0x20(r1)
0x000d95c8: 90010034 stw r0, 0x34(r1)
0x000d95cc: 3d20c00d addis r9,r0,0xc00d
0x000d95d0: 80697420 lwz r3, 0x7420(r9)
0x000d95d4: 3fa0c00d addis r29,r0,0xc00d
0x000d95d8: 4bf36f79 bl 0x00010550
0x000d95dc: 3f80c00d addis r28,r0,0xc00d
0x000d95e0: 38610008 addi r3,r1, 0x8
0x000d95e4: 38bc7408 addi r5,r28, 0x7408
0x000d95e8: 389d7404 addi r4,r29, 0x7404
0x000d95ec: 480004d9 bl 0x000d9ac4
====================

Q1: is it not enough RAM to run this stuff?
Q2: does anyone have ALL patched files to run with my board?
    (HHL does not support FADS!)

PS please reply to my private address too (krom@dgt-lab.com.pl)

* * * * * * * * * * * * * *
per pedes ad astra !
* * * * * * * * * * * * * *
mailto:krom@softomat.com.pl
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: too few RAM? 2000-08-10 12:05 ` too few RAM? Wojciech Kromer @ 2000-08-10 14:49 ` Dan Malek 2000-08-17 11:49 ` Wojciech Kromer 0 siblings, 1 reply; 22+ messages in thread From: Dan Malek @ 2000-08-10 14:49 UTC (permalink / raw) To: Wojciech Kromer; +Cc: Daniel Wu, linuxppc-embedded Wojciech Kromer wrote: > > i'm trying to run mpc8xx-2.2.13 (with all patches I found) > - with 8xxrom (0.3.0) > - on MPC8XXFADS > .MPC823e > .4MB RAM > .2MB FLASH Yes, sorry, this is too little RAM. A long time ago in a land far, far away (before initrd support), you could boot in less than 8 Mbytes. Not any more. -- Dan ** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: too few RAM?
2000-08-10 14:49 ` Dan Malek
@ 2000-08-17 11:49 ` Wojciech Kromer
0 siblings, 0 replies; 22+ messages in thread
From: Wojciech Kromer @ 2000-08-17 11:49 UTC
To: Dan Malek, linuxppc-embedded

Dan Malek wrote:
>
> Wojciech Kromer wrote:
> >
> > i'm trying to run mpc8xx-2.2.13 (with all patches I found)
> > - with 8xxrom (0.3.0)
> > - on MPC8XXFADS
> > .MPC823e
> > .4MB RAM
> > .2MB FLASH
>
> Yes, sorry, this is too little RAM. A long time ago in a land
> far, far away (before initrd support), you could boot in less than 8
> Mbytes. Not any more.
>
> -- Dan
>

Now I have 8MB RAM, but my kernel still hangs when trying to run any
application. The kernel output is:
'Using ELF interpreter /lib/ld.so.1'

I tried to compile ld.so, but there are errors:
lddstub.S:19: #error Only know how to support i386, m68k and sparc architectures
lddstub.S:36: #error Only know how to support i386, m68k and sparc architectures

The ld.so versions from the net don't seem to work
(I tried the one from HHL and the one from the mbxroot.min.tgz file).

Q1: Where can I get FULL BINARY versions of:
- 8xxrom (or something like it)
- a kernel that boots from NFS
- a root file system (to put on an NFS server)
for my board?

Q2: Are there any source packages for ALL that I need to recompile?

--
* * * * * * * * * * * * * *
per pedes ad astra !
* * * * * * * * * * * * * *
mailto:krom@dgt-lab.com.pl
* Debug information for elf format
2000-06-04 4:40 kernel crashes at InstructionTLBMiss Daniel Wu
` (2 preceding siblings ...)
2000-06-05 14:51 ` kernel crashes at InstructionTLBMiss Dan Malek
@ 2000-06-30 6:17 ` Kwansuk Kim
2000-06-30 6:46 ` sungyeon
3 siblings, 1 reply; 22+ messages in thread
From: Kwansuk Kim @ 2000-06-30 6:17 UTC
To: linuxppc-embedded

Hi, everyone,

I use the SMC BDM tool to load the Linux kernel on my custom mpu860 board.

I am trying to fix the source so that it runs on the custom board, but for the C files this is too difficult. The SMC BDM tool supports only the ELF file format. But according to the GCC HOWTO, GCC does not produce debug information in an ELF format, only in stabs, DWARF or COFF.

What should I do to debug C code on BDM?

I'm debugging with data printed on the serial console (SMC2). But it's too hard because I compile the kernel on Linux and operate on Win98. I reboot over twenty times a day. :-(
* Re: Debug information for elf format
2000-06-30 6:17 ` Debug information for elf format Kwansuk Kim
@ 2000-06-30 6:46 ` sungyeon
0 siblings, 0 replies; 22+ messages in thread
From: sungyeon @ 2000-06-30 6:46 UTC
To: Kwansuk Kim, linuxppc-embedded

Hi.
You can create DWARF using the "-gdwarf" option.

----- Original Message -----
From: "Kwansuk Kim" <kskim@neowave.co.kr>
To: <linuxppc-embedded@lists.linuxppc.org>
Sent: Friday, June 30, 2000 3:17 PM
Subject: Debug information for elf format

>
> Hi, everyone,
>
> I use SMC BDM tool to load linux kernel on my custom mpu860 board.
>
> I try to fix the source to operate on the custom board. But in case of C file it's too difficult. SMC BDM tool supports only ELF file format. But according to the GCC howto, it doesn't produce debug information for elf format but stab, DWARF or COFF.
>
> What should I do to debug C code on BDM?
>
> I'm debugging with data prompted on serial console (SMC2). But it's too hard because I compile the kernel on Linux and operate on win98. Reboot over twenty times a day. :-(
>
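ELF is a container format, and DWARF or stabs debug information is stored in sections inside it, so an ELF image built with a debug option still counts as ELF. The sketch below (Python, not part of the original messages) lists the section names of a 32-bit ELF image and reports whether any of them look like DWARF sections. The function names are hypothetical, only ELF32 is handled, and whether the SMC BDM tool can actually read DWARF is not something this thread confirms.

```python
import struct
from typing import List


def elf32_section_names(data: bytes) -> List[str]:
    """Return the section names of a 32-bit ELF image held in memory."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1:  # EI_CLASS: 1 means ELFCLASS32
        raise ValueError("only 32-bit ELF is handled in this sketch")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 little, 2 big endian

    # ELF32 header: e_shoff at byte 32; e_shentsize, e_shnum, e_shstrndx at 46.
    (e_shoff,) = struct.unpack_from(endian + "I", data, 32)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from(endian + "HHH", data, 46)

    def header(index):
        # ELF32 section header: sh_name at +0, sh_offset at +16, sh_size at +20.
        base = e_shoff + index * e_shentsize
        (sh_name,) = struct.unpack_from(endian + "I", data, base)
        sh_offset, sh_size = struct.unpack_from(endian + "II", data, base + 16)
        return sh_name, sh_offset, sh_size

    # The section-name string table is itself a section, found via e_shstrndx.
    _, str_off, str_size = header(e_shstrndx)
    strtab = data[str_off:str_off + str_size]

    names = []
    for index in range(e_shnum):
        name_off, _, _ = header(index)
        end = strtab.index(b"\0", name_off)  # names are NUL-terminated
        names.append(strtab[name_off:end].decode("ascii"))
    return names


def has_dwarf_sections(names: List[str]) -> bool:
    """True if any section name starts with '.debug' (DWARF naming)."""
    return any(name.startswith(".debug") for name in names)
```

As a usage example, after building with the "-gdwarf" option mentioned in the reply, one could read the resulting image into `data` and call `has_dwarf_sections(elf32_section_names(data))`; which file that is depends on the build and is left open here.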
end of thread, other threads:[~2000-08-17 11:49 UTC | newest]

Thread overview: 22+ messages
2000-06-04 4:40 kernel crashes at InstructionTLBMiss Daniel Wu
2000-06-05 2:32 ` Dan A. Dickey
2000-06-05 8:19 ` 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss ) Murray Jensen
2000-06-05 20:37 ` Dan Malek
2000-06-06 6:31 ` Murray Jensen
2000-06-06 20:05 ` Dan Malek
2000-06-07 3:05 ` Dan A. Dickey
2000-06-07 9:17 ` Murray Jensen
2000-06-07 3:02 ` Dan A. Dickey
2000-06-06 21:37 ` Steve Tarr
2000-06-06 17:03 ` net driver receive problems Tom Roberts
2000-06-05 14:51 ` kernel crashes at InstructionTLBMiss Dan Malek
2000-06-05 15:55 ` Dan Malek
2000-06-05 16:19 ` Dan Malek
2000-06-06 3:59 ` Graham Stoney
2000-06-06 3:56 ` Daniel Wu
2000-06-06 20:18 ` Dan Malek
2000-08-10 12:05 ` too few RAM? Wojciech Kromer
2000-08-10 14:49 ` Dan Malek
2000-08-17 11:49 ` Wojciech Kromer
2000-06-30 6:17 ` Debug information for elf format Kwansuk Kim
2000-06-30 6:46 ` sungyeon