From mboxrd@z Thu Jan 1 00:00:00 1970
To: Daniel Wu
cc: linuxppc-embedded@lists.linuxppc.org
Subject: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
In-Reply-To: Message from Daniel Wu of "Sun, 04 Jun 2000 14:40:31 +1000." <00Jun4.144038est.115228@border.alcanet.com.au>
Date: Mon, 05 Jun 2000 18:19:31 +1000
Message-ID: <21966.960193171@msa.cmst.csiro.au>
From: Murray Jensen
Sender: owner-linuxppc-embedded@lists.linuxppc.org
List-Id:

>	mfspr	r20, M_TWB	/* Get level 1 table entry address */
...
>As you can see, r20 is 400f1c00, which looks wrong, but why? Any suggestions?

At this point, the MMU is disabled so r20, which is loaded from the MMU
Table Walk Base register, should be a physical address - 400f1c00 is not
a likely physical address for something in RAM (unless you have a weird
disjoint RAM setup), so yes it certainly looks wrong. Incorrect TWB
contents is a disaster.

Here we come to a dilemma that I have had since I started with this
stuff. I have never been able to get an 8xx kernel running without
adding a patch to update the Table Walk Base register at the time that a
new mm context is activated.

Let me explain: normally the TWB is loaded at context switch time which
makes sense because a different task with a different virtual memory
context will be running. This is done in the following code in the
_switch function in arch/ppc/kernel/entry.S:

>	tophys(r0,r4)
>	mtspr	SPRG3,r0	/* Update current THREAD phys addr */
>#ifdef CONFIG_8xx
>	/* XXX it would be nice to find a SPRGx for this on 6xx,7xx too */
>	lwz	r9,PGDIR(r4)	/* cache the page table root */
>	tophys(r9,r9)		/* convert to phys addr */
>	mtspr	M_TWB,r9	/* Update MMU base address */
>	tlbia
>	SYNC
>#endif /* CONFIG_8xx */

The contents of the TWB should be the address stored in
current->thread.pgdir converted to a physical address. The above code is
the only place that the TWB is written to, anywhere in the kernel (that
I can find).
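To put numbers on "not a likely physical address", here is a small C
sketch (not kernel code). It assumes the usual 32-bit PowerPC Linux
layout with KERNELBASE at 0xC0000000, a hypothetical board with 16 MB of
RAM at physical address 0, and a 4 KB aligned level 1 table, so that the
low 12 bits of the value read back from M_TWB are the offset of the
level 1 entry. RAM_START, RAM_SIZE and the function names are made up
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: kernel virtual addresses map linearly from KERNELBASE. */
#define KERNELBASE 0xC0000000u

/* Assumption: hypothetical board, 16 MB of RAM at physical address 0. */
#define RAM_START  0x00000000u
#define RAM_SIZE   0x01000000u

/* C equivalent of the tophys() step: kernel virtual -> physical. */
static uint32_t tophys(uint32_t vaddr)
{
	assert(vaddr >= KERNELBASE);	/* only valid for kernel addresses */
	return vaddr - KERNELBASE;
}

/* Assumption: 4 KB aligned level 1 table, so the upper 20 bits are the base. */
static uint32_t twb_table_base(uint32_t twb)
{
	return twb & 0xFFFFF000u;
}

/* Low 12 bits: byte offset of the level 1 entry within the table. */
static uint32_t twb_entry_offset(uint32_t twb)
{
	return twb & 0x00000FFFu;
}

/* True if the table base lies inside the assumed RAM window.
 * The unsigned subtraction also rejects addresses below RAM_START. */
static bool plausible_twb(uint32_t twb)
{
	return (uint32_t)(twb_table_base(twb) - RAM_START) < RAM_SIZE;
}
```

Under these assumptions plausible_twb(0x400f1c00) is false, while a page
directory at the made-up kernel virtual address 0xC00F1000 converts to
physical 0x000F1000, which passes the check.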
The TWB is then used in the TLB miss handlers to load the TLB entry
(assuming a mapping exists - if not, do_page_fault() is called to fill
it in).

But I have found that there is a situation during "exec()" where a newly
created mm context is "activated" (via activate_mm() in
asm/mmu_context.h) before the task is actually "switch"ed to (presumably
to copy the arguments and environment etc from the old task - which is
being overwritten) i.e. the TWB is not updated because a switch hasn't
occurred [NOTE: this is only my theory - I am not an expert on this
stuff]

Without my patch, the exec of "/sbin/init" hangs in an endless TLBMiss
handler loop, where a virtual address is accessed which causes a TLB
miss, the TWB has contents of the old pgdir which does not have a
mapping for that virtual address so do_page_fault() is called to fill it
in, but do_page_fault() decides that that mapping exists and everything
is ok so why the hell did you call me, I'll just return doing nothing! -
the access is re-tried which causes a TLB miss again at the same virtual
address. The kernel is in a dead hang (although later 2.[34].* kernels
exhibit different symptoms, which mystifies me a bit - i.e. characters
typed on the console are echoed, and I know timer interrupts are
occurring, because I have a rotating thingy on the LCD display which
updates once a second via the timer interrupt handler, so it is not a
complete dead hang).

The patch I always have to add to arch/ppc/kernel/head_8xx.S is:

  */
 _GLOBAL(set_context)
 	mtspr	M_CASID,r3		/* Update context */
+	lwz	r3, THREAD+PGDIR(r2)
+	tophys(r3, r3)
+	mtspr	M_TWB, r3
 	tlbia
 	SYNC
 	blr

I know this is wrong, but it seems to work for me (unless the TWB can be
considered to be part of the MMU context, and therefore it is legitimate
to update it in set_context()? I don't know). I have tried other things
e.g.
adding a "set_context_and_twb()" function, just after the set_context()
function (without above patch), e.g.:

--- arch/ppc/kernel/head_8xx.S	2000/04/28 06:35:05	1.1.1.5
+++ arch/ppc/kernel/head_8xx.S	2000/06/05 07:51:50
@@ -905,6 +905,19 @@
 	SYNC
 	blr
 
+/*
+ * the 8xx tablewalk base register (M_TWB) must be consistent with
+ * the currently active mm. This is called from switch_mm() and
+ * activate_mm() in include/asm-ppc/mmu_context.h
+ */
+_GLOBAL(set_context_and_twb)
+	mtspr	M_CASID,r3		/* Update context */
+	tophys(r4, r4)
+	mtspr	M_TWB, r4
+	tlbia
+	SYNC
+	blr
+
 /* Jump into the system reset for the rom.
  * We first disable the MMU, and then jump to the ROM reset address.
  *

Then doing something like this:

--- include/asm-ppc/mmu_context.h	2000/03/07 03:59:54	1.1.1.2
+++ include/asm-ppc/mmu_context.h	2000/06/05 07:46:35
@@ -52,6 +52,11 @@
 extern void set_context(int context);
 
 #ifdef CONFIG_8xx
+/* same as above plus loads the 8xx tablewalk base register also */
+extern void set_context_and_twb(int, void *);
+#endif
+
+#ifdef CONFIG_8xx
 extern inline void mmu_context_overflow(void)
 {
 	atomic_set(&next_mmu_context, -1);
@@ -85,7 +90,10 @@
 {
 	tsk->thread.pgdir = next->pgd;
 	get_mmu_context(next);
-	set_context(next->context);
+	if (tsk == current)
+		set_context_and_twb(next->context, tsk->thread.pgdir);
+	else
+		set_context(next->context);
 }
 
 /*
@@ -96,7 +104,7 @@
 {
 	current->thread.pgdir = mm->pgd;
 	get_mmu_context(mm);
-	set_context(mm->context);
+	set_context_and_twb(mm->context, current->thread.pgdir);
 }
 
 /*

This works also, though I'm not sure about it.

I was thinking that maybe the set_context() in switch_mm() should only
be done if the switch_mm() is being performed on the "current" task.
e.g.

	tsk->thread.pgdir = next->pgd;
	get_mmu_context(next);
	if (tsk == current)
		set_context(next->context);

Then set_context() could simply update the TWB with
current->thread.pgdir.
But I think the only place switch_mm() is called is in the task context
switch code anyway, which means current is about to change, and also
means I get confused :-) But I know activate_mm() is used in other
places - something to do with "lazy tlb" mode, and also in exec(). I
give up.

One thing I think is for certain in all this - do_page_fault() should
*NEVER* return without having done something - anything - to ensure that
the same fault does not re-occur after the handler returns - if it can't
handle the fault, it should either kill the task if it is in user mode,
or panic if in kernel mode.

One thing that bothers me is why this behaviour only occurs for me? I
have no idea, but obviously it is only me, otherwise no-one would have a
working embedded 8xx 2.[34].* kernel. I suspect I am doing something
else which triggers this bug, or else there is something I don't
understand (not unlikely :-).

Note: I have only ever tried the 2.[34].* series of kernels. I have not
tried the 2.2.* kernels, but some code snippets I have seen in the list
archives suggest ...

I just searched the list and found the following comment from Dan Malek
on 16 Dec 98:

> BTW, why must the M_TWB be set in SET_PAGE_DIR ?

The M_TWB points to the first level page table (Linux pgd_t) and is used
in the mpc8xx page fault handler. When Linux deletes or otherwise
modifies the memory map object such that the first level page table is
modified (as during exec), it uses SET_PAGE_DIR. Since the first level
table has potentially moved to a new memory location, we have to set
M_TWB at this time. If we don't, a process exec without an intervening
context switch will cause us to use a bogus M_TWB when trying to find
page tables.

	-- Dan

OK - where is SET_PAGE_DIR() in the 2.[34].* kernels?
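The "exec without an intervening context switch" sequence can be written
down as a toy C model. This is only a sketch of the theory above, not
kernel code: the struct layouts, the hw_casid/hw_twb variables standing
in for M_CASID and M_TWB, and all model_* names are hypothetical, and
the virtual/physical distinction is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures involved. */
struct mm   { int context; const void *pgd; };
struct task { const void *pgdir; };	/* models tsk->thread.pgdir */

static int         hw_casid;	/* models the M_CASID register */
static const void *hw_twb;	/* models the M_TWB register */

/* Models _switch: in the unpatched kernel the only writer of M_TWB. */
static void model_switch(const struct task *next)
{
	assert(next != NULL);
	hw_twb = next->pgdir;
}

/* Models unpatched activate_mm(): the context changes, the TWB does not. */
static void model_activate_mm(struct task *cur, const struct mm *mm)
{
	assert(cur != NULL && mm != NULL);
	cur->pgdir = mm->pgd;
	hw_casid = mm->context;
}

/* Models the set_context_and_twb() variant: the TWB follows the new pgdir. */
static void model_activate_mm_twb(struct task *cur, const struct mm *mm)
{
	model_activate_mm(cur, mm);
	hw_twb = cur->pgdir;
}

/* The invariant the TLB miss handlers rely on. */
static bool twb_consistent(const struct task *cur)
{
	return hw_twb == cur->pgdir;
}
```

In this model, calling model_switch() and then model_activate_mm() with
a new mm leaves twb_consistent() false until the next model_switch(),
which is the window in which the miss handler would walk the old page
directory. model_activate_mm_twb() closes that window.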
Following the threads it appears that this discussion was had a long
time ago, but in the other direction - the TWB was being updated too
often, and the consensus was that it should only be updated when the
SET_PAGE_DIR macro was setting the page dir for the current task. Now it
is not setting it at all.

I think I'd better shut up now and let other more experienced people
tell me what I have missed or where I have gone wrong :-) Cheers!

								Murray...
-- 
Murray Jensen, CSIRO Manufacturing Sci & Tech,         Phone: +61 3 9662 7763
Locked Bag No. 9, Preston, Vic, 3072, Australia.         Fax: +61 3 9662 7853
Internet: Murray.Jensen@cmst.csiro.au  (old address was mjj@mlb.dmt.csiro.au)

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/