LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 0/6] PowerPc 8xx TLB/MMU fixes
From: Joakim Tjernlund @ 2009-10-05 23:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1254782248.7122.49.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 06/10/2009 00:37:28:
>
> On Tue, 2009-10-06 at 00:31 +0200, Joakim Tjernlund wrote:
> >
> > regs or regs->nip is NULL? Either one does not make sense
> > In any case it might be a secondary problem as DAR is NULL already
> > when you
> > enter the page fault.
> > >
> > >                 insn = *((unsigned long *)regs->nip);
> > > c000e110:       80 a9 00 00     lwz     r5,0(r9)
> > >
> > > fails.
> >
> > hmm, I wonder if you managed to invalidate the a kernel TLB?
> > Are you using pinned kernel TLBs?
>
> You should not dereference a user address like that. Use get_user !

So how does this look? Does it change anything?
It should as the previous way was way off :(

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index c33c6de..08a392f 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -153,7 +153,7 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
 #ifdef DEBUG_DCBX
 		const char *istr = NULL;

-		insn = *((unsigned long *)regs->nip);
+		__get_user(insn, (unsigned long __user *)regs->nip);
 		if (((insn >> (31-5)) & 0x3f) == 31) {
 			if (((insn >> 1) & 0x3ff) == 1014) /* dcbz ? 0x3f6 */
 				istr = "dcbz";
@@ -178,11 +178,12 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
 					       ra, rb, dar);
 					is_write = 0;
 				}
-
+#if 0
 				if (trap == 0x300 && address != dar) {
 					__asm__ ("mtdar %0" : : "r" (dar));
 					return 0;
 				}
+#endif
 			}
 		}
 #endif
@@ -191,7 +192,7 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,

 			/* This is from a dcbX or icbi insn gone bad, these
 			 * insn do not set DAR so we have to do it here instead */
-			insn = *((unsigned long *)regs->nip);
+			__get_user(insn, (unsigned long __user *)regs->nip);

 			ra = (insn >> (31-15)) & 0x1f; /* Reg RA */
 			rb = (insn >> (31-20)) & 0x1f; /* Reg RB */

^ permalink raw reply related

* Re: [PATCH 3/6] 8xx: get rid of _PAGE_HWWRITE dependency in MMU.
From: Joakim Tjernlund @ 2009-10-05 23:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1254784539.7122.52.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 06/10/2009 01:15:39:
>
> On Tue, 2009-10-06 at 00:55 +0200, Joakim Tjernlund wrote:
> > > Sure but if dirty is cleared by the kernel, then you also need to
> > remove
> > > write permission in the TLB or it will miss setting dirty on the
> > next
> > > store to the page.
> >
> > Dirty map to HW changed bit so if kernel clears it, the next
> > store will trap to DTLB error. There I will check RW and clear it
> > again
> > without trapping to C. Is that OK? Not if I read you correcly below ..
>
> Well, if the HW has the ability to enforce trap when store with !DIRTY,

Yes, provided that the kernel invalidates the TLB too so the next access
will provoke a TLB Miss, which will then provoke a TLB error. The TLB
error routine checks VALID, RW and USER(if not a kernel access), then sets
ACCESSED & DIRTY and writes the TLB(RPN reg).

Perhaps the missing invalidate is haunting us here?

> then that's fine, just map it that way, but you shouldn't have to handle
> it in the DTLB error neither, the kernel will fix it up for you in
> handle_pte_fault().

Does not all ppc have the Changed bit?

>
> I think I need to get myself an 8xx manual and figure out what that damn
> MMU actually does :-)

Please do, get the mpc862 users manual :)

>
> > No it's really !(RW _AND_ DIRTY) -> no store permitted, and
> >
> > ..  hmm, I don't do that. I should do a
> >     if ( store && !(pte & (RW | DIRTY))
> >        goto DSI
> >  in DTLB error?
>
> Well, if the HW does the test of DIRTY, then it's fine, as you seem to
> suggest above. _something_ needs to do it, either SW or HW, ie, we need
> to catch any attempt to store to a non-dirty page to set dirty.

Yes, I think I do this.

>
> > > !(PRESENT _AND_ ACCESSED) -> no access permitted.
> >
> > Yes, how about those pinned kernel ptes. Do they have ACCESSED from
> > start?
>
> They should.
>
> Cheers,
> Ben.
>
>
>
>

^ permalink raw reply

* Re: [PATCH 3/6] 8xx: get rid of _PAGE_HWWRITE dependency in MMU.
From: Benjamin Herrenschmidt @ 2009-10-05 23:15 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <OF346FF2E6.3CCDA0FA-ONC1257646.007C13A0-C1257646.007DF396@transmode.se>

On Tue, 2009-10-06 at 00:55 +0200, Joakim Tjernlund wrote:
> > Sure but if dirty is cleared by the kernel, then you also need to
> remove
> > write permission in the TLB or it will miss setting dirty on the
> next
> > store to the page.
> 
> Dirty map to HW changed bit so if kernel clears it, the next
> store will trap to DTLB error. There I will check RW and clear it
> again
> without trapping to C. Is that OK? Not if I read you correcly below ..

Well, if the HW has the ability to enforce trap when store with !DIRTY,
then that's fine, just map it that way, but you shouldn't have to handle
it in the DTLB error neither, the kernel will fix it up for you in
handle_pte_fault().

I think I need to get myself an 8xx manual and figure out what that damn
MMU actually does :-)

> No it's really !(RW _AND_ DIRTY) -> no store permitted, and
> 
> ..  hmm, I don't do that. I should do a
>     if ( store && !(pte & (RW | DIRTY))
>        goto DSI
>  in DTLB error?

Well, if the HW does the test of DIRTY, then it's fine, as you seem to
suggest above. _something_ needs to do it, either SW or HW, ie, we need
to catch any attempt to store to a non-dirty page to set dirty.

> > !(PRESENT _AND_ ACCESSED) -> no access permitted.
> 
> Yes, how about those pinned kernel ptes. Do they have ACCESSED from
> start?

They should.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 0/6] PowerPc 8xx TLB/MMU fixes
From: Joakim Tjernlund @ 2009-10-05 22:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1254782248.7122.49.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 06/10/2009 00:37:28:
>
> On Tue, 2009-10-06 at 00:31 +0200, Joakim Tjernlund wrote:
> >
> > regs or regs->nip is NULL? Either one does not make sense
> > In any case it might be a secondary problem as DAR is NULL already
> > when you
> > enter the page fault.
> > >
> > >                 insn = *((unsigned long *)regs->nip);
> > > c000e110:       80 a9 00 00     lwz     r5,0(r9)
> > >
> > > fails.
> >
> > hmm, I wonder if you managed to invalidate the a kernel TLB?
> > Are you using pinned kernel TLBs?
>
> You should not dereference a user address like that. Use get_user !

Ah, forgot about that. Will change

>
> Obviously you got 0 in SRR0 for some reason (somebody tried to jump
> to 0, either intentionally or as a result of some other problem) and
> the above will crash the kernel when it happens.
>
> Cheers,
> Ben.
>
>
>

^ permalink raw reply

* Re: [PATCH 0/6] PowerPc 8xx TLB/MMU fixes
From: Joakim Tjernlund @ 2009-10-05 23:00 UTC (permalink / raw)
  To: Rex Feany; +Cc: Scott Wood, linuxppc-dev@ozlabs.org
In-Reply-To: <20091005224218.GA5273@compile2.chatsunix.int.mrv.com>

Rex Feany <RFeany@mrv.com> wrote on 06/10/2009 00:42:18:
>
> Thus spake Joakim Tjernlund (joakim.tjernlund@transmode.se):
>
> > > I got this oops:
> > >
> > > Unable to handle kernel paging request for data at address 0x00000000
> > > Faulting instruction address: 0xc000e110
> > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > MRV NM2
> > > NIP: c000e110 LR: c000d520 CTR: 1006bf40
> > > REGS: c2203dd0 TRAP: 0300   Not tainted  (2.6.32-rc3-00014-gcea49b0-dirty)
> > > MSR: 00009032 <EE,ME,IR,DR>  CR: 44022422  XER: 20000000
> > > DAR: 00000000, DSISR: c0000000
> > > TASK = c21c18c0[69] 'rc.sysinit' THREAD: c2202000
> > > GPR00: 00000401 c2203e80 c21c18c0 c2203f50 00000000 4000d032 00000000 00000000
> > > GPR08: 10093eb4 00000000 00009032 c000d514 021c1ad0 10095c50 03ffb000 10006500
> > > GPR16: 10006508 00000000 00000000 00000000 100a2fc8 021c4238 00000000 00000400
> > > GPR24: c21a2700 00000002 40000000 00000000 00000000 c2203f50 00000000 00000011
> > > NIP [c000e110] do_page_fault+0x44/0x5fc
> > > LR [c000d520] handle_page_fault+0xc/0x80
> > > Call Trace:
> > > [c2203e80] [c000e58c] do_page_fault+0x4c0/0x5fc (unreliable)
> > > [c2203f40] [c000d520] handle_page_fault+0xc/0x80
> > > Instruction dump:
> > > 800300a0 7cba2b78 54170036 2f970400 7c9c2378 830200e0 54b6018c 40be000c
> > > 74ba4820 3ac00000 813d0080 3bc00000 <80a90000> 54a036be 2f80001f 40be0100
> > > ---[ end trace 99a4d88f7e2f1b60 ]---
> > >
> > > this happens in do_page_fault becaose regs->nip is null, so
> >
> > regs or regs->nip is NULL? Either one does not make sense
> > In any case it might be a secondary problem as DAR is NULL already when you
> > enter the page fault.
>
> I assumed it was NIP because ... I'm not sure why. TRAP() above
> dereferences regs, and that didn't fail, but I didn't see that until
> now.

It probably was and I need to use get_user() instead.

>
> > >                 insn = *((unsigned long *)regs->nip);
> > > c000e110:       80 a9 00 00     lwz     r5,0(r9)
> >
> > hmm, I wonder if you managed to invalidate the a kernel TLB?
> > Are you using pinned kernel TLBs?
>
> I'm not using pinned kernel TLBs, should I be?

Pinned TLBs is usally faster unless you know better or so I think.

^ permalink raw reply

* Re: [PATCH 3/6] 8xx: get rid of _PAGE_HWWRITE dependency in MMU.
From: Joakim Tjernlund @ 2009-10-05 22:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1254780556.7122.47.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 06/10/2009 00:09:16:
>
> On Tue, 2009-10-06 at 00:00 +0200, Joakim Tjernlund wrote:
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 05/10/2009 23:37:23:
> > >
> > > On Mon, 2009-10-05 at 23:25 +0200, Joakim Tjernlund wrote:
> > > >
> > > > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 05/10/2009 22:17:04:
> > > > >
> > > > > On Mon, 2009-10-05 at 14:16 +0200, Joakim Tjernlund wrote:
> > > > > > Update the TLB asm to make proper use of _PAGE_DIRY and _PAGE_ACCESSED.
> > > > > > Pros:
> > > > > >  - I/D TLB Miss never needs to write to the linux pte.
> > > > > >  - _PAGE_ACCESSED is only set on I/D TLB Error fixing accounting
> > > > > >  - _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
> > > > > >     when a page has been made dirty.
> > > > >
> > > > > Not sure here. You seem to still set those from asm. Is that necessary ?
> > > >
> > > > Well, yes. head_32.S also sets ACCESSED and DIRTY from asm so why not me?
> > >
> > > Don't look at the hash 32 code :-)
> >
> > So what to look at then? :)
>
> head_440.S on a recent 2.6

OK, will look

>
> > > Most of the time you do anyways since the PTE isn't populated at all. At
> > > which point, Linux will populate a PTE with DIRTY and ACCESSED already
> > > set. It should be reasonably rare to actually fault because DIRTY and/or
> > > ACCESSED are missing.
> >
> > I tried to unconditionally trap to C in DTLB error but it just hung if I did
> > that so the asm has to do something.
>
> Sure, the question is what and how can we get away without it ? :-)

You tell me :)

>
> > > > > The approach I take on BookE is to simply not set these from asm, -and-
> > > > > (this is important) not set write permission if dirty is not set in
> >
> > Did you really mean "if dirty is not" ? I test RW for write permission.
> > Dirty is just set when the first write happens after the permission check.
>
> Sure but if dirty is cleared by the kernel, then you also need to remove
> write permission in the TLB or it will miss setting dirty on the next
> store to the page.

Dirty map to HW changed bit so if kernel clears it, the next
store will trap to DTLB error. There I will check RW and clear it again
without trapping to C. Is that OK? Not if I read you correcly below ..

>
> So dirty basically acts as a filter on RW, and accessed as a filter on
> valid if you want.
>
> > > Mostly only !ACCESSED -> no access permitted and !DIRTY -> no store
> > > permitted and don't write anything back if !VALID.
> >
> > That should be !RW and not !DIRTY I hope? Then trap
> > first store and set DIRTY (without trapping to C)
>
> No it's really !(RW _AND_ DIRTY) -> no store permitted, and

..  hmm, I don't do that. I should do a
    if ( store && !(pte & (RW | DIRTY))
       goto DSI
 in DTLB error?

> !(PRESENT _AND_ ACCESSED) -> no access permitted.

Yes, how about those pinned kernel ptes. Do they have ACCESSED from start?

>
> Cheers,
> Ben.
>
> > >
> > > Cheers,
> > > Ben.
> > >
> > > >  Jocke
> > > >
> > > > >
> > > > > Cheers,
> > > > > Ben.
> > > > >
> > > > > >  - Proper RO/RW mapping of user space.
> > > > > >  - Free up 2 SW TLB bits in the linux pte(add back _PAGE_WRITETHRU ?)
> > > > > > Cons:
> > > > > >  - 4 more instructions in I/D TLB Miss, but the since the linux pte is
> > > > > >    not written anymore, it should still be a win.
>
>
>
>

^ permalink raw reply

* Re: [PATCH 0/6] PowerPc 8xx TLB/MMU fixes
From: Rex Feany @ 2009-10-05 22:42 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org
In-Reply-To: <OF9AD6CA21.DCD59A1B-ONC1257646.007AE622-C1257646.007BB767@transmode.se>

Thus spake Joakim Tjernlund (joakim.tjernlund@transmode.se):

> > I got this oops:
> >
> > Unable to handle kernel paging request for data at address 0x00000000
> > Faulting instruction address: 0xc000e110
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > MRV NM2
> > NIP: c000e110 LR: c000d520 CTR: 1006bf40
> > REGS: c2203dd0 TRAP: 0300   Not tainted  (2.6.32-rc3-00014-gcea49b0-dirty)
> > MSR: 00009032 <EE,ME,IR,DR>  CR: 44022422  XER: 20000000
> > DAR: 00000000, DSISR: c0000000
> > TASK = c21c18c0[69] 'rc.sysinit' THREAD: c2202000
> > GPR00: 00000401 c2203e80 c21c18c0 c2203f50 00000000 4000d032 00000000 00000000
> > GPR08: 10093eb4 00000000 00009032 c000d514 021c1ad0 10095c50 03ffb000 10006500
> > GPR16: 10006508 00000000 00000000 00000000 100a2fc8 021c4238 00000000 00000400
> > GPR24: c21a2700 00000002 40000000 00000000 00000000 c2203f50 00000000 00000011
> > NIP [c000e110] do_page_fault+0x44/0x5fc
> > LR [c000d520] handle_page_fault+0xc/0x80
> > Call Trace:
> > [c2203e80] [c000e58c] do_page_fault+0x4c0/0x5fc (unreliable)
> > [c2203f40] [c000d520] handle_page_fault+0xc/0x80
> > Instruction dump:
> > 800300a0 7cba2b78 54170036 2f970400 7c9c2378 830200e0 54b6018c 40be000c
> > 74ba4820 3ac00000 813d0080 3bc00000 <80a90000> 54a036be 2f80001f 40be0100
> > ---[ end trace 99a4d88f7e2f1b60 ]---
> >
> > this happens in do_page_fault becaose regs->nip is null, so
> 
> regs or regs->nip is NULL? Either one does not make sense
> In any case it might be a secondary problem as DAR is NULL already when you
> enter the page fault.

I assumed it was NIP because ... I'm not sure why. TRAP() above
dereferences regs, and that didn't fail, but I didn't see that until
now.

> >                 insn = *((unsigned long *)regs->nip);
> > c000e110:       80 a9 00 00     lwz     r5,0(r9)
> 
> hmm, I wonder if you managed to invalidate the a kernel TLB?
> Are you using pinned kernel TLBs?

I'm not using pinned kernel TLBs, should I be?

take care!
/rex.

^ permalink raw reply

* Re: [PATCH 0/6] PowerPc 8xx TLB/MMU fixes
From: Benjamin Herrenschmidt @ 2009-10-05 22:37 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <OF9AD6CA21.DCD59A1B-ONC1257646.007AE622-C1257646.007BB767@transmode.se>

On Tue, 2009-10-06 at 00:31 +0200, Joakim Tjernlund wrote:
> 
> regs or regs->nip is NULL? Either one does not make sense
> In any case it might be a secondary problem as DAR is NULL already
> when you
> enter the page fault.
> >
> >                 insn = *((unsigned long *)regs->nip);
> > c000e110:       80 a9 00 00     lwz     r5,0(r9)
> >
> > fails.
> 
> hmm, I wonder if you managed to invalidate the a kernel TLB?
> Are you using pinned kernel TLBs? 

You should not dereference a user address like that. Use get_user !

Obviously you got 0 in SRR0 for some reason (somebody tried to jump
to 0, either intentionally or as a result of some other problem) and
the above will crash the kernel when it happens.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 0/6] PowerPc 8xx TLB/MMU fixes
From: Joakim Tjernlund @ 2009-10-05 22:31 UTC (permalink / raw)
  To: Rex Feany; +Cc: Scott Wood, linuxppc-dev@ozlabs.org
In-Reply-To: <20091005220420.GA27923@compile2.chatsunix.int.mrv.com>

Rex Feany <RFeany@mrv.com> wrote on 06/10/2009 00:04:20:
>
> Thus spake Joakim Tjernlund (Joakim.Tjernlund@transmode.se):
>
> > Scott and Rex, please disregard other patches from me and
> > try these out instead.
>
> I have results similar to Scott's. I tried both with and without patch
> 5 & 6, and I also need the tlbia_va in ptep_set_access_flags().
>
> I got this oops:
>
> Unable to handle kernel paging request for data at address 0x00000000
> Faulting instruction address: 0xc000e110
> Oops: Kernel access of bad area, sig: 11 [#1]
> MRV NM2
> NIP: c000e110 LR: c000d520 CTR: 1006bf40
> REGS: c2203dd0 TRAP: 0300   Not tainted  (2.6.32-rc3-00014-gcea49b0-dirty)
> MSR: 00009032 <EE,ME,IR,DR>  CR: 44022422  XER: 20000000
> DAR: 00000000, DSISR: c0000000
> TASK = c21c18c0[69] 'rc.sysinit' THREAD: c2202000
> GPR00: 00000401 c2203e80 c21c18c0 c2203f50 00000000 4000d032 00000000 00000000
> GPR08: 10093eb4 00000000 00009032 c000d514 021c1ad0 10095c50 03ffb000 10006500
> GPR16: 10006508 00000000 00000000 00000000 100a2fc8 021c4238 00000000 00000400
> GPR24: c21a2700 00000002 40000000 00000000 00000000 c2203f50 00000000 00000011
> NIP [c000e110] do_page_fault+0x44/0x5fc
> LR [c000d520] handle_page_fault+0xc/0x80
> Call Trace:
> [c2203e80] [c000e58c] do_page_fault+0x4c0/0x5fc (unreliable)
> [c2203f40] [c000d520] handle_page_fault+0xc/0x80
> Instruction dump:
> 800300a0 7cba2b78 54170036 2f970400 7c9c2378 830200e0 54b6018c 40be000c
> 74ba4820 3ac00000 813d0080 3bc00000 <80a90000> 54a036be 2f80001f 40be0100
> ---[ end trace 99a4d88f7e2f1b60 ]---
>
> this happens in do_page_fault becaose regs->nip is null, so

regs or regs->nip is NULL? Either one does not make sense
In any case it might be a secondary problem as DAR is NULL already when you
enter the page fault.
>
>                 insn = *((unsigned long *)regs->nip);
> c000e110:       80 a9 00 00     lwz     r5,0(r9)
>
> fails.

hmm, I wonder if you managed to invalidate the a kernel TLB?
Are you using pinned kernel TLBs?

   Jocke

^ permalink raw reply

* Re: [PATCH 3/6] 8xx: get rid of _PAGE_HWWRITE dependency in MMU.
From: Benjamin Herrenschmidt @ 2009-10-05 22:09 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <OF16069D5F.03682AAC-ONC1257646.00774BB8-C1257646.0078E24B@transmode.se>

On Tue, 2009-10-06 at 00:00 +0200, Joakim Tjernlund wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 05/10/2009 23:37:23:
> >
> > On Mon, 2009-10-05 at 23:25 +0200, Joakim Tjernlund wrote:
> > >
> > > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 05/10/2009 22:17:04:
> > > >
> > > > On Mon, 2009-10-05 at 14:16 +0200, Joakim Tjernlund wrote:
> > > > > Update the TLB asm to make proper use of _PAGE_DIRY and _PAGE_ACCESSED.
> > > > > Pros:
> > > > >  - I/D TLB Miss never needs to write to the linux pte.
> > > > >  - _PAGE_ACCESSED is only set on I/D TLB Error fixing accounting
> > > > >  - _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
> > > > >     when a page has been made dirty.
> > > >
> > > > Not sure here. You seem to still set those from asm. Is that necessary ?
> > >
> > > Well, yes. head_32.S also sets ACCESSED and DIRTY from asm so why not me?
> >
> > Don't look at the hash 32 code :-)
> 
> So what to look at then? :)

head_440.S on a recent 2.6

> > Most of the time you do anyways since the PTE isn't populated at all. At
> > which point, Linux will populate a PTE with DIRTY and ACCESSED already
> > set. It should be reasonably rare to actually fault because DIRTY and/or
> > ACCESSED are missing.
> 
> I tried to unconditionally trap to C in DTLB error but it just hung if I did
> that so the asm has to do something.

Sure, the question is what and how can we get away without it ? :-)

> > > > The approach I take on BookE is to simply not set these from asm, -and-
> > > > (this is important) not set write permission if dirty is not set in
> 
> Did you really mean "if dirty is not" ? I test RW for write permission.
> Dirty is just set when the first write happens after the permission check.

Sure but if dirty is cleared by the kernel, then you also need to remove
write permission in the TLB or it will miss setting dirty on the next
store to the page.

So dirty basically acts as a filter on RW, and accessed as a filter on
valid if you want.

> > Mostly only !ACCESSED -> no access permitted and !DIRTY -> no store
> > permitted and don't write anything back if !VALID.
> 
> That should be !RW and not !DIRTY I hope? Then trap
> first store and set DIRTY (without trapping to C)

No it's really !(RW _AND_ DIRTY) -> no store permitted, and
!(PRESENT _AND_ ACCESSED) -> no access permitted.

Cheers,
Ben.

> >
> > Cheers,
> > Ben.
> >
> > >  Jocke
> > >
> > > >
> > > > Cheers,
> > > > Ben.
> > > >
> > > > >  - Proper RO/RW mapping of user space.
> > > > >  - Free up 2 SW TLB bits in the linux pte(add back _PAGE_WRITETHRU ?)
> > > > > Cons:
> > > > >  - 4 more instructions in I/D TLB Miss, but the since the linux pte is
> > > > >    not written anymore, it should still be a win.

^ permalink raw reply

* Re: [PATCH 0/6] PowerPc 8xx TLB/MMU fixes
From: Rex Feany @ 2009-10-05 22:04 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org
In-Reply-To: <1254744999-3158-1-git-send-email-Joakim.Tjernlund@transmode.se>

Thus spake Joakim Tjernlund (Joakim.Tjernlund@transmode.se):

> Scott and Rex, please disregard other patches from me and
> try these out instead.

I have results similar to Scott's. I tried both with and without patch 
5 & 6, and I also need the tlbia_va in ptep_set_access_flags().

I got this oops:

Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000e110
Oops: Kernel access of bad area, sig: 11 [#1]
MRV NM2
NIP: c000e110 LR: c000d520 CTR: 1006bf40
REGS: c2203dd0 TRAP: 0300   Not tainted  (2.6.32-rc3-00014-gcea49b0-dirty)
MSR: 00009032 <EE,ME,IR,DR>  CR: 44022422  XER: 20000000
DAR: 00000000, DSISR: c0000000
TASK = c21c18c0[69] 'rc.sysinit' THREAD: c2202000
GPR00: 00000401 c2203e80 c21c18c0 c2203f50 00000000 4000d032 00000000 00000000
GPR08: 10093eb4 00000000 00009032 c000d514 021c1ad0 10095c50 03ffb000 10006500
GPR16: 10006508 00000000 00000000 00000000 100a2fc8 021c4238 00000000 00000400
GPR24: c21a2700 00000002 40000000 00000000 00000000 c2203f50 00000000 00000011
NIP [c000e110] do_page_fault+0x44/0x5fc
LR [c000d520] handle_page_fault+0xc/0x80
Call Trace:
[c2203e80] [c000e58c] do_page_fault+0x4c0/0x5fc (unreliable)
[c2203f40] [c000d520] handle_page_fault+0xc/0x80
Instruction dump:
800300a0 7cba2b78 54170036 2f970400 7c9c2378 830200e0 54b6018c 40be000c
74ba4820 3ac00000 813d0080 3bc00000 <80a90000> 54a036be 2f80001f 40be0100
---[ end trace 99a4d88f7e2f1b60 ]---

this happens in do_page_fault becaose regs->nip is null, so

                insn = *((unsigned long *)regs->nip);
c000e110:       80 a9 00 00     lwz     r5,0(r9)

fails.

take care!
/rex.

^ permalink raw reply

* Re: [PATCH 3/6] 8xx: get rid of _PAGE_HWWRITE dependency in MMU.
From: Joakim Tjernlund @ 2009-10-05 22:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1254778643.7122.43.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 05/10/2009 23:37:23:
>
> On Mon, 2009-10-05 at 23:25 +0200, Joakim Tjernlund wrote:
> >
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 05/10/2009 22:17:04:
> > >
> > > On Mon, 2009-10-05 at 14:16 +0200, Joakim Tjernlund wrote:
> > > > Update the TLB asm to make proper use of _PAGE_DIRY and _PAGE_ACCESSED.
> > > > Pros:
> > > >  - I/D TLB Miss never needs to write to the linux pte.
> > > >  - _PAGE_ACCESSED is only set on I/D TLB Error fixing accounting
> > > >  - _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
> > > >     when a page has been made dirty.
> > >
> > > Not sure here. You seem to still set those from asm. Is that necessary ?
> >
> > Well, yes. head_32.S also sets ACCESSED and DIRTY from asm so why not me?
>
> Don't look at the hash 32 code :-)

So what to look at then? :)

>
> > The basic method I use is:
> > TLB gets mapped with NoAccess, then the first access will trap
> > into TLB error, where ACCESSED will be set if permission to access
> > the page is OK (and RW too). On first store 8xx will trap
> > into DTLB error and permissions is OK, DIRTY will be set too.
> > Is this wrong?
>
> >From your explanation it looks ok. IE. as long as !DIRTY -> not
> writeable (will fault on store) and !ACCESSED means not accessible (will
> fault on any access) then you are ok.

Yes, that is what I have (tried) to do.

>
> > Do I have to trap to C for first store?
>
> Most of the time you do anyways since the PTE isn't populated at all. At
> which point, Linux will populate a PTE with DIRTY and ACCESSED already
> set. It should be reasonably rare to actually fault because DIRTY and/or
> ACCESSED are missing.

I tried to unconditionally trap to C in DTLB error but it just hung if I did
that so the asm has to do something.

>
> > > The approach I take on BookE is to simply not set these from asm, -and-
> > > (this is important) not set write permission if dirty is not set in

Did you really mean "if dirty is not" ? I test RW for write permission.
Dirty is just set when the first write happens after the permission check.

> > > the TLB miss and set no access permission at all when accessed is not
> > > set. This is important or we'll miss dirty updates which can
> > > be fatal.
> >
> > not sure, but this seems similar to what I do. DIRTY will be set,
> > in asm, on first store.
>
> Don't set DIRTY if !VALID

I don't. !VALID trap to C

>
> > ACCESSED will only be set iff (USER && VALID)
>
> My point is that you should be able to simplify the code here, have only
> the TLB miss look at the PTE, not the data access and instruction
> access, and have the later be a boring exception going straight to C.

I do this, TLB Miss only looks at TLB and then transforms it into a HW pte.
The HW pte do not map 1:1 so I need to some magic to fit the linux pte
into a HW pte.

>
> > >
> > > The C code in handle_pte_fault() will fixup missing access and dirty
> > > if necessary and flush.
> > >
> > > Also look at the 440 code, I think you could simplify your
> > > implementation using similar techniques, such as andc of PTE against
> > > requested bits etc... and thus maybe save a couple of insns.
> >
> > Great, but first I want to make sure I doing it right :)
> >
> > So is there some golden rule I have to follow?
>
> Mostly only !ACCESSED -> no access permitted and !DIRTY -> no store
> permitted and don't write anything back if !VALID.

That should be !RW and not !DIRTY I hope? Then trap
first store and set DIRTY (without trapping to C)

>
> Cheers,
> Ben.
>
> >  Jocke
> >
> > >
> > > Cheers,
> > > Ben.
> > >
> > > >  - Proper RO/RW mapping of user space.
> > > >  - Free up 2 SW TLB bits in the linux pte(add back _PAGE_WRITETHRU ?)
> > > > Cons:
> > > >  - 4 more instructions in I/D TLB Miss, but the since the linux pte is
> > > >    not written anymore, it should still be a win.

^ permalink raw reply

* Re: [PATCH 0/6] PowerPc 8xx TLB/MMU fixes
From: Scott Wood @ 2009-10-05 21:46 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Rex Feany, linuxppc-dev@ozlabs.org
In-Reply-To: <OF17005BC2.01F7F762-ONC1257646.0076C3C3-C1257646.00772EBF@transmode.se>

Joakim Tjernlund wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 05/10/2009 23:31:39:
> 
>>> Yes, every ld.so uses dcbX and icbi insn when relocatin code.
>> Even with -msecure-plt ?
> 
> hmm, maybe not. Can't remember now. But perhaps LAZY relocs still
> need dcbX? Easiest is to check in uClibc. I impl. it, but was to long
> time ago for me to remember. Scott are you using uClibc? I don't
> think so because of the dynamic check of the cache line size

No, this is glibc.

-Scott

^ permalink raw reply

* Re: [PATCH 0/6] PowerPc 8xx TLB/MMU fixes
From: Joakim Tjernlund @ 2009-10-05 21:41 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1254778299.7122.39.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 05/10/2009 23:31:39:

> >
> > Yes, every ld.so uses dcbX and icbi insn when relocatin code.
>
> Even with -msecure-plt ?

hmm, maybe not. Can't remember now. But perhaps LAZY relocs still
need dcbX? Easiest is to check in uClibc. I impl. it, but was to long
time ago for me to remember. Scott are you using uClibc? I don't
think so because of the dynamic check of the cache line size

>
> > Maybe you see some version of the dcbX bug, but my fault.c should
> > fix them up. My bet would be the 32 byte cache line, it will miss
> > out every second line and so the results are unreliable.
>
> Cheers,
> Ben.

^ permalink raw reply

* Re: [PATCH 3/6] 8xx: get rid of _PAGE_HWWRITE dependency in MMU.
From: Benjamin Herrenschmidt @ 2009-10-05 21:37 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <OF3182684B.BB80DBCF-ONC1257646.0073D59B-C1257646.0075AB6F@transmode.se>

On Mon, 2009-10-05 at 23:25 +0200, Joakim Tjernlund wrote:
> 
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 05/10/2009 22:17:04:
> >
> > On Mon, 2009-10-05 at 14:16 +0200, Joakim Tjernlund wrote:
> > > Update the TLB asm to make proper use of _PAGE_DIRY and _PAGE_ACCESSED.
> > > Pros:
> > >  - I/D TLB Miss never needs to write to the linux pte.
> > >  - _PAGE_ACCESSED is only set on I/D TLB Error fixing accounting
> > >  - _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
> > >     when a page has been made dirty.
> >
> > Not sure here. You seem to still set those from asm. Is that necessary ?
> 
> Well, yes. head_32.S also sets ACCESSED and DIRTY from asm so why not me?

Don't look at the hash 32 code :-)

> The basic method I use is:
> TLB gets mapped with NoAccess, then the first access will trap
> into TLB error, where ACCESSED will be set if permission to access
> the page is OK (and RW too). On first store 8xx will trap
> into DTLB error and permissions is OK, DIRTY will be set too.
> Is this wrong?

>From your explanation it looks ok. IE. as long as !DIRTY -> not
writeable (will fault on store) and !ACCESSED means not accessible (will
fault on any access) then you are ok.

> Do I have to trap to C for first store?

Most of the time you do anyways since the PTE isn't populated at all. At
which point, Linux will populate a PTE with DIRTY and ACCESSED already
set. It should be reasonably rare to actually fault because DIRTY and/or
ACCESSED are missing.

> > The approach I take on BookE is to simply not set these from asm, -and-
> > (this is important) not set write permission if dirty is not set in
> > the TLB miss and set no access permission at all when accessed is not
> > set. This is important or we'll miss dirty updates which can
> > be fatal.
> 
> not sure, but this seems similar to what I do. DIRTY will be set,
> in asm, on first store.

Don't set DIRTY if !VALID

> ACCESSED will only be set iff (USER && VALID)

My point is that you should be able to simplify the code here, have only
the TLB miss look at the PTE, not the data access and instruction
access, and have the later be a boring exception going straight to C.

> >
> > The C code in handle_pte_fault() will fixup missing access and dirty
> > if necessary and flush.
> >
> > Also look at the 440 code, I think you could simplify your
> > implementation using similar techniques, such as andc of PTE against
> > requested bits etc... and thus maybe save a couple of insns.
> 
> Great, but first I want to make sure I doing it right :)
> 
> So is there some golden rule I have to follow?

Mostly only !ACCESSED -> no access permitted and !DIRTY -> no store
permitted and don't write anything back if !VALID.

Cheers,
Ben.

>  Jocke
> 
> >
> > Cheers,
> > Ben.
> >
> > >  - Proper RO/RW mapping of user space.
> > >  - Free up 2 SW TLB bits in the linux pte(add back _PAGE_WRITETHRU ?)
> > > Cons:
> > >  - 4 more instructions in I/D TLB Miss, but the since the linux pte is
> > >    not written anymore, it should still be a win.
> > > ---
> > >  arch/powerpc/include/asm/pte-8xx.h |    9 +-
> > >  arch/powerpc/kernel/head_8xx.S     |  163 ++++++++++++++++++++++++++----------
> > >  2 files changed, 122 insertions(+), 50 deletions(-)
> > >
> > > diff --git a/arch/powerpc/include/asm/pte-8xx.h b/arch/powerpc/include/asm/pte-8xx.h
> > > index 8c6e312..af541a2 100644
> > > --- a/arch/powerpc/include/asm/pte-8xx.h
> > > +++ b/arch/powerpc/include/asm/pte-8xx.h
> > > @@ -32,22 +32,21 @@
> > >  #define _PAGE_FILE   0x0002   /* when !present: nonlinear file mapping */
> > >  #define _PAGE_NO_CACHE   0x0002   /* I: cache inhibit */
> > >  #define _PAGE_SHARED   0x0004   /* No ASID (context) compare */
> > > +#define _PAGE_DIRTY   0x0100   /* C: page changed */
> > >
> > >  /* These five software bits must be masked out when the entry is loaded
> > >   * into the TLB.
> > >   */
> > >  #define _PAGE_EXEC   0x0008   /* software: i-cache coherency required */
> > >  #define _PAGE_GUARDED   0x0010   /* software: guarded access */
> > > -#define _PAGE_DIRTY   0x0020   /* software: page changed */
> > > -#define _PAGE_RW   0x0040   /* software: user write access allowed */
> > > -#define _PAGE_ACCESSED   0x0080   /* software: page referenced */
> > > +#define _PAGE_USER   0x0020   /* software: User space access */
> > >
> > >  /* Setting any bits in the nibble with the follow two controls will
> > >   * require a TLB exception handler change.  It is assumed unused bits
> > >   * are always zero.
> > >   */
> > > -#define _PAGE_HWWRITE   0x0100   /* h/w write enable: never set in Linux PTE */
> > > -#define _PAGE_USER   0x0800   /* One of the PP bits, the other is USER&~RW */
> > > +#define _PAGE_RW   0x0400   /* lsb PP bits */
> > > +#define _PAGE_ACCESSED   0x0800   /* msb PP bits */
> > >
> > >  #define _PMD_PRESENT   0x0001
> > >  #define _PMD_BAD   0x0ff0
> > > diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> > > index 118bb05..b1f72d9 100644
> > > --- a/arch/powerpc/kernel/head_8xx.S
> > > +++ b/arch/powerpc/kernel/head_8xx.S
> > > @@ -333,21 +333,15 @@ InstructionTLBMiss:
> > >     mfspr   r11, SPRN_MD_TWC   /* ....and get the pte address */
> > >     lwz   r10, 0(r11)   /* Get the pte */
> > >
> > > -#ifdef CONFIG_SWAP
> > > -   /* do not set the _PAGE_ACCESSED bit of a non-present page */
> > > -   andi.   r11, r10, _PAGE_PRESENT
> > > -   beq   4f
> > > -   ori   r10, r10, _PAGE_ACCESSED
> > > -   mfspr   r11, SPRN_MD_TWC   /* get the pte address again */
> > > -   stw   r10, 0(r11)
> > > -4:
> > > -#else
> > > -   ori   r10, r10, _PAGE_ACCESSED
> > > -   stw   r10, 0(r11)
> > > -#endif
> > > -
> > > +   andi.   r11, r10, _PAGE_USER | _PAGE_ACCESSED
> > > +   cmpwi   cr0, r11, _PAGE_USER | _PAGE_ACCESSED
> > > +   beq+   cr0, 5f   /* branch if access allowed */
> > > +   rlwinm   r10, r10, 0, 22, 19 /* r10 &= ~(_PAGE_ACCESSED | _PAGE_RW) */
> > > +   b   6f
> > > +5:   xori   r10, r10, _PAGE_RW /* invert RW bit */
> > > +6:
> > >     /* The Linux PTE won't go exactly into the MMU TLB.
> > > -    * Software indicator bits 21, 22 and 28 must be clear.
> > > +    * Software indicator bit 28 must be clear.
> > >      * Software indicator bits 24, 25, 26, and 27 must be
> > >      * set.  All other Linux PTE bits control the behavior
> > >      * of the MMU.
> > > @@ -409,21 +403,15 @@ DataStoreTLBMiss:
> > >     DO_8xx_CPU6(0x3b80, r3)
> > >     mtspr   SPRN_MD_TWC, r11
> > >
> > > -#ifdef CONFIG_SWAP
> > > -   /* do not set the _PAGE_ACCESSED bit of a non-present page */
> > > -   andi.   r11, r10, _PAGE_PRESENT
> > > -   beq   4f
> > > -   ori   r10, r10, _PAGE_ACCESSED
> > > -4:
> > > -   /* and update pte in table */
> > > -#else
> > > -   ori   r10, r10, _PAGE_ACCESSED
> > > -#endif
> > > -   mfspr   r11, SPRN_MD_TWC   /* get the pte address again */
> > > -   stw   r10, 0(r11)
> > > -
> > > +   andi.   r11, r10, _PAGE_USER | _PAGE_ACCESSED
> > > +   cmpwi   cr0, r11, _PAGE_USER | _PAGE_ACCESSED
> > > +   beq+   cr0, 5f   /* branch if access allowed */
> > > +   rlwinm   r10, r10, 0, 22, 19 /* r10 &= ~(_PAGE_ACCESSED | _PAGE_RW) */
> > > +   b   6f
> > > +5:   xori   r10, r10, _PAGE_RW /* invert RW bit */
> > > +6:
> > >     /* The Linux PTE won't go exactly into the MMU TLB.
> > > -    * Software indicator bits 21, 22 and 28 must be clear.
> > > +    * Software indicator bit 28 must be clear.
> > >      * Software indicator bits 24, 25, 26, and 27 must be
> > >      * set.  All other Linux PTE bits control the behavior
> > >      * of the MMU.
> > > @@ -448,6 +436,91 @@ DataStoreTLBMiss:
> > >   */
> > >     . = 0x1300
> > >  InstructionTLBError:
> > > +#ifdef CONFIG_8xx_CPU6
> > > +   stw   r3, 8(r0)
> > > +#endif
> > > +   DO_8xx_CPU6(0x3f80, r3)
> > > +   mtspr   SPRN_M_TW, r10   /* Save a couple of working registers */
> > > +   mfcr   r10
> > > +   stw   r10, 0(r0)
> > > +   stw   r11, 4(r0)
> > > +
> > > +   mfspr   r11, SPRN_SRR1
> > > +   andis.   r11, r11, 0x5000   /* no translation, guarded */
> > > +   bne   2f
> > > +
> > > +   mfspr   r10, SPRN_SRR0   /* Get effective address of fault */
> > > +#ifdef CONFIG_8xx_CPU15
> > > +   addi   r11, r10, 0x1000
> > > +   tlbie   r11
> > > +   addi   r11, r10, -0x1000
> > > +   tlbie   r11
> > > +#endif
> > > +   DO_8xx_CPU6(0x3780, r3)
> > > +   mtspr   SPRN_MD_EPN, r10   /* Have to use MD_EPN for walk, MI_EPN can't */
> > > +   mfspr   r10, SPRN_M_TWB   /* Get level 1 table entry address */
> > > +
> > > +   /* If we are faulting a kernel address, we have to use the
> > > +    * kernel page tables.
> > > +    */
> > > +   andi.   r11, r10, 0x0800   /* Address >= 0x80000000 */
> > > +   beq   3f
> > > +   lis   r11, swapper_pg_dir@h
> > > +   ori   r11, r11, swapper_pg_dir@l
> > > +   rlwimi   r10, r11, 0, 2, 19
> > > +3:
> > > +   lwz   r11, 0(r10)   /* Get the level 1 entry */
> > > +   rlwinm.   r10, r11,0,0,19   /* Extract page descriptor page address */
> > > +   beq   2f      /* If zero, don't try to find a pte */
> > > +
> > > +   /* We have a pte table, so load the MI_TWC with the attributes
> > > +    * for this "segment."
> > > +    */
> > > +   ori   r11,r11,1      /* Set valid bit */
> > > +   DO_8xx_CPU6(0x2b80, r3)
> > > +   mtspr   SPRN_MI_TWC, r11   /* Set segment attributes */
> > > +   DO_8xx_CPU6(0x3b80, r3)
> > > +   mtspr   SPRN_MD_TWC, r11   /* Load pte table base address */
> > > +
> > > +   mfspr   r11, SPRN_SRR1
> > > +   andi.   r11, r11, 0x4000 /* MSR[PR] */
> > > +   mfspr   r11, SPRN_MD_TWC   /* ....and get the pte address */
> > > +   lwz   r10, 0(r11)   /* Get the pte */
> > > +   beq   5f /* Kernel access always OK */
> > > +   andi.   r11,r10, _PAGE_USER
> > > +   beq   2f
> > > +5:   ori   r10, r10, _PAGE_ACCESSED
> > > +   mfspr   r21, SPRN_MD_TWC   /* ....and get the pte address */
> > > +   stw   r10, 0(r11)
> > > +   xori   r10, r10, _PAGE_RW /* invert RW bit */
> > > +
> > > +   /* The Linux PTE won't go exactly into the MMU TLB.
> > > +    * Software indicator bit 28 must be clear.
> > > +    * Software indicator bits 24, 25, 26, and 27 must be
> > > +    * set.  All other Linux PTE bits control the behavior
> > > +    * of the MMU.
> > > +    */
> > > +   li   r11, 0x00f0
> > > +   rlwimi   r10, r11, 0, 24, 28   /* Set 24-27, clear 28 */
> > > +   DO_8xx_CPU6(0x2d80, r3)
> > > +   mtspr   SPRN_MI_RPN, r10   /* Update TLB entry */
> > > +
> > > +   mfspr   r10, SPRN_M_TW   /* Restore registers */
> > > +   lwz   r11, 0(r0)
> > > +   mtcr   r11
> > > +   lwz   r11, 4(r0)
> > > +#ifdef CONFIG_8xx_CPU6
> > > +   lwz   r3, 8(r0)
> > > +#endif
> > > +   rfi
> > > +
> > > +2:   mfspr   r10, SPRN_M_TW   /* Restore registers */
> > > +   lwz   r11, 0(r0)
> > > +   mtcr   r11
> > > +   lwz   r11, 4(r0)
> > > +#ifdef CONFIG_8xx_CPU6
> > > +   lwz   r3, 8(r0)
> > > +#endif
> > >     b   InstructionAccess
> > >
> > >  /* This is the data TLB error on the MPC8xx.  This could be due to
> > > @@ -472,8 +545,8 @@ DataTLBError:
> > >     /* First, make sure this was a store operation.
> > >     */
> > >     mfspr   r10, SPRN_DSISR
> > > -   andis.   r11, r10, 0x4800 /* no translation, no permission. */
> > > -   bne   2f   /* branch if either is set */
> > > +   andis.   r11, r10, 0x4000 /* no translation */
> > > +   bne   2f   /* branch if set */
> > >
> > >     /* The EA of a data TLB miss is automatically stored in the MD_EPN
> > >      * register.  The EA of a data TLB error is automatically stored in
> > > @@ -522,26 +595,26 @@ DataTLBError:
> > >     mfspr   r11, SPRN_MD_TWC      /* ....and get the pte address */
> > >     lwz   r10, 0(r11)      /* Get the pte */
> > >
> > > -   andi.   r11, r10, _PAGE_RW   /* Is it writeable? */
> > > -   beq   2f         /* Bail out if not */
> > > +   mfspr   r11, SPRN_SRR1
> > > +   andi.   r11, r11, 0x4000 /* MSR[PR] */
> > > +   beq   5f /* Kernel access always OK */
> > > +   andi.   r11,r10, _PAGE_USER
> > > +   beq   2f
> > > +5:   mfspr   r11, SPRN_DSISR
> > > +   andis.   r11, r11, 0x0200   /* store */
> > > +   beq   6f
> > > +   andi.   r11, r10, _PAGE_RW   /* writeable? */
> > > +   beq   2f /* branch if not */
> > > +   ori   r10, r10, _PAGE_DIRTY | _PAGE_HWWRITE
> > > +6:   ori   r10, r10, _PAGE_ACCESSED
> > >
> > > -   /* Update 'changed', among others.
> > > -   */
> > > -#ifdef CONFIG_SWAP
> > > -   ori   r10, r10, _PAGE_DIRTY|_PAGE_HWWRITE
> > > -   /* do not set the _PAGE_ACCESSED bit of a non-present page */
> > > -   andi.   r11, r10, _PAGE_PRESENT
> > > -   beq   4f
> > > -   ori   r10, r10, _PAGE_ACCESSED
> > > -4:
> > > -#else
> > > -   ori   r10, r10, _PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_HWWRITE
> > > -#endif
> > >     mfspr   r11, SPRN_MD_TWC      /* Get pte address again */
> > >     stw   r10, 0(r11)      /* and update pte in table */
> > >
> > > +   xori   r10, r10, _PAGE_RW   /* Invert RW bit */
> > > +
> > >     /* The Linux PTE won't go exactly into the MMU TLB.
> > > -    * Software indicator bits 21, 22 and 28 must be clear.
> > > +    * Software indicator bit 28 must be clear.
> > >      * Software indicator bits 24, 25, 26, and 27 must be
> > >      * set.  All other Linux PTE bits control the behavior
> > >      * of the MMU.
> >
> >
> >
> >

^ permalink raw reply

* Re: [PATCH 0/6] PowerPc 8xx TLB/MMU fixes
From: Benjamin Herrenschmidt @ 2009-10-05 21:31 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <OFFC147C60.2DA4B63D-ONC1257646.00733843-C1257646.0073CFFD@transmode.se>

On Mon, 2009-10-05 at 23:04 +0200, Joakim Tjernlund wrote:
> 
> Yes, every ld.so uses dcbX and icbi insn when relocatin code.

Even with -msecure-plt ?

> Maybe you see some version of the dcbX bug, but my fault.c should
> fix them up. My bet would be the 32 byte cache line, it will miss
> out every second line and so the results are unreliable.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 0/6] PowerPc 8xx TLB/MMU fixes
From: Scott Wood @ 2009-10-05 21:31 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Rex Feany, linuxppc-dev@ozlabs.org
In-Reply-To: <OFFC147C60.2DA4B63D-ONC1257646.00733843-C1257646.0073CFFD@transmode.se>

Joakim Tjernlund wrote:
> Yes, every ld.so uses dcbX and icbi insn when relocatin code.
> Maybe you see some version of the dcbX bug, but my fault.c should
> fix them up. My bet would be the 32 byte cache line, it will miss
> out every second line and so the results are unreliable.

I found the 32-byte stuff from an objdump; it looks like there's a 
runtime check for the cache line size that avoids running that code if 
it's different.

-Scott

^ permalink raw reply

* Re: [PATCH 3/6] 8xx: get rid of _PAGE_HWWRITE dependency in MMU.
From: Joakim Tjernlund @ 2009-10-05 21:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1254773824.7122.33.camel@pasglop>



Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 05/10/2009 22:17:04:
>
> On Mon, 2009-10-05 at 14:16 +0200, Joakim Tjernlund wrote:
> > Update the TLB asm to make proper use of _PAGE_DIRY and _PAGE_ACCESSED.
> > Pros:
> >  - I/D TLB Miss never needs to write to the linux pte.
> >  - _PAGE_ACCESSED is only set on I/D TLB Error fixing accounting
> >  - _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
> >     when a page has been made dirty.
>
> Not sure here. You seem to still set those from asm. Is that necessary ?

Well, yes. head_32.S also sets ACCESSED and DIRTY from asm so why not me?
The basic method I use is:
TLB gets mapped with NoAccess, then the first access will trap
into TLB error, where ACCESSED will be set if permission to access
the page is OK (and RW too). On first store 8xx will trap
into DTLB error and permissions is OK, DIRTY will be set too.
Is this wrong?
Do I have to trap to C for first store?

>
> The approach I take on BookE is to simply not set these from asm, -and-
> (this is important) not set write permission if dirty is not set in
> the TLB miss and set no access permission at all when accessed is not
> set. This is important or we'll miss dirty updates which can
> be fatal.

not sure, but this seems similar to what I do. DIRTY will be set,
in asm, on first store.
ACCESSED will only be set iff (USER && VALID)

>
> The C code in handle_pte_fault() will fixup missing access and dirty
> if necessary and flush.
>
> Also look at the 440 code, I think you could simplify your
> implementation using similar techniques, such as andc of PTE against
> requested bits etc... and thus maybe save a couple of insns.

Great, but first I want to make sure I doing it right :)

So is there some golden rule I have to follow?

 Jocke

>
> Cheers,
> Ben.
>
> >  - Proper RO/RW mapping of user space.
> >  - Free up 2 SW TLB bits in the linux pte(add back _PAGE_WRITETHRU ?)
> > Cons:
> >  - 4 more instructions in I/D TLB Miss, but the since the linux pte is
> >    not written anymore, it should still be a win.
> > ---
> >  arch/powerpc/include/asm/pte-8xx.h |    9 +-
> >  arch/powerpc/kernel/head_8xx.S     |  163 ++++++++++++++++++++++++++----------
> >  2 files changed, 122 insertions(+), 50 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/pte-8xx.h b/arch/powerpc/include/asm/pte-8xx.h
> > index 8c6e312..af541a2 100644
> > --- a/arch/powerpc/include/asm/pte-8xx.h
> > +++ b/arch/powerpc/include/asm/pte-8xx.h
> > @@ -32,22 +32,21 @@
> >  #define _PAGE_FILE   0x0002   /* when !present: nonlinear file mapping */
> >  #define _PAGE_NO_CACHE   0x0002   /* I: cache inhibit */
> >  #define _PAGE_SHARED   0x0004   /* No ASID (context) compare */
> > +#define _PAGE_DIRTY   0x0100   /* C: page changed */
> >
> >  /* These five software bits must be masked out when the entry is loaded
> >   * into the TLB.
> >   */
> >  #define _PAGE_EXEC   0x0008   /* software: i-cache coherency required */
> >  #define _PAGE_GUARDED   0x0010   /* software: guarded access */
> > -#define _PAGE_DIRTY   0x0020   /* software: page changed */
> > -#define _PAGE_RW   0x0040   /* software: user write access allowed */
> > -#define _PAGE_ACCESSED   0x0080   /* software: page referenced */
> > +#define _PAGE_USER   0x0020   /* software: User space access */
> >
> >  /* Setting any bits in the nibble with the follow two controls will
> >   * require a TLB exception handler change.  It is assumed unused bits
> >   * are always zero.
> >   */
> > -#define _PAGE_HWWRITE   0x0100   /* h/w write enable: never set in Linux PTE */
> > -#define _PAGE_USER   0x0800   /* One of the PP bits, the other is USER&~RW */
> > +#define _PAGE_RW   0x0400   /* lsb PP bits */
> > +#define _PAGE_ACCESSED   0x0800   /* msb PP bits */
> >
> >  #define _PMD_PRESENT   0x0001
> >  #define _PMD_BAD   0x0ff0
> > diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> > index 118bb05..b1f72d9 100644
> > --- a/arch/powerpc/kernel/head_8xx.S
> > +++ b/arch/powerpc/kernel/head_8xx.S
> > @@ -333,21 +333,15 @@ InstructionTLBMiss:
> >     mfspr   r11, SPRN_MD_TWC   /* ....and get the pte address */
> >     lwz   r10, 0(r11)   /* Get the pte */
> >
> > -#ifdef CONFIG_SWAP
> > -   /* do not set the _PAGE_ACCESSED bit of a non-present page */
> > -   andi.   r11, r10, _PAGE_PRESENT
> > -   beq   4f
> > -   ori   r10, r10, _PAGE_ACCESSED
> > -   mfspr   r11, SPRN_MD_TWC   /* get the pte address again */
> > -   stw   r10, 0(r11)
> > -4:
> > -#else
> > -   ori   r10, r10, _PAGE_ACCESSED
> > -   stw   r10, 0(r11)
> > -#endif
> > -
> > +   andi.   r11, r10, _PAGE_USER | _PAGE_ACCESSED
> > +   cmpwi   cr0, r11, _PAGE_USER | _PAGE_ACCESSED
> > +   beq+   cr0, 5f   /* branch if access allowed */
> > +   rlwinm   r10, r10, 0, 22, 19 /* r10 &= ~(_PAGE_ACCESSED | _PAGE_RW) */
> > +   b   6f
> > +5:   xori   r10, r10, _PAGE_RW /* invert RW bit */
> > +6:
> >     /* The Linux PTE won't go exactly into the MMU TLB.
> > -    * Software indicator bits 21, 22 and 28 must be clear.
> > +    * Software indicator bit 28 must be clear.
> >      * Software indicator bits 24, 25, 26, and 27 must be
> >      * set.  All other Linux PTE bits control the behavior
> >      * of the MMU.
> > @@ -409,21 +403,15 @@ DataStoreTLBMiss:
> >     DO_8xx_CPU6(0x3b80, r3)
> >     mtspr   SPRN_MD_TWC, r11
> >
> > -#ifdef CONFIG_SWAP
> > -   /* do not set the _PAGE_ACCESSED bit of a non-present page */
> > -   andi.   r11, r10, _PAGE_PRESENT
> > -   beq   4f
> > -   ori   r10, r10, _PAGE_ACCESSED
> > -4:
> > -   /* and update pte in table */
> > -#else
> > -   ori   r10, r10, _PAGE_ACCESSED
> > -#endif
> > -   mfspr   r11, SPRN_MD_TWC   /* get the pte address again */
> > -   stw   r10, 0(r11)
> > -
> > +   andi.   r11, r10, _PAGE_USER | _PAGE_ACCESSED
> > +   cmpwi   cr0, r11, _PAGE_USER | _PAGE_ACCESSED
> > +   beq+   cr0, 5f   /* branch if access allowed */
> > +   rlwinm   r10, r10, 0, 22, 19 /* r10 &= ~(_PAGE_ACCESSED | _PAGE_RW) */
> > +   b   6f
> > +5:   xori   r10, r10, _PAGE_RW /* invert RW bit */
> > +6:
> >     /* The Linux PTE won't go exactly into the MMU TLB.
> > -    * Software indicator bits 21, 22 and 28 must be clear.
> > +    * Software indicator bit 28 must be clear.
> >      * Software indicator bits 24, 25, 26, and 27 must be
> >      * set.  All other Linux PTE bits control the behavior
> >      * of the MMU.
> > @@ -448,6 +436,91 @@ DataStoreTLBMiss:
> >   */
> >     . = 0x1300
> >  InstructionTLBError:
> > +#ifdef CONFIG_8xx_CPU6
> > +   stw   r3, 8(r0)
> > +#endif
> > +   DO_8xx_CPU6(0x3f80, r3)
> > +   mtspr   SPRN_M_TW, r10   /* Save a couple of working registers */
> > +   mfcr   r10
> > +   stw   r10, 0(r0)
> > +   stw   r11, 4(r0)
> > +
> > +   mfspr   r11, SPRN_SRR1
> > +   andis.   r11, r11, 0x5000   /* no translation, guarded */
> > +   bne   2f
> > +
> > +   mfspr   r10, SPRN_SRR0   /* Get effective address of fault */
> > +#ifdef CONFIG_8xx_CPU15
> > +   addi   r11, r10, 0x1000
> > +   tlbie   r11
> > +   addi   r11, r10, -0x1000
> > +   tlbie   r11
> > +#endif
> > +   DO_8xx_CPU6(0x3780, r3)
> > +   mtspr   SPRN_MD_EPN, r10   /* Have to use MD_EPN for walk, MI_EPN can't */
> > +   mfspr   r10, SPRN_M_TWB   /* Get level 1 table entry address */
> > +
> > +   /* If we are faulting a kernel address, we have to use the
> > +    * kernel page tables.
> > +    */
> > +   andi.   r11, r10, 0x0800   /* Address >= 0x80000000 */
> > +   beq   3f
> > +   lis   r11, swapper_pg_dir@h
> > +   ori   r11, r11, swapper_pg_dir@l
> > +   rlwimi   r10, r11, 0, 2, 19
> > +3:
> > +   lwz   r11, 0(r10)   /* Get the level 1 entry */
> > +   rlwinm.   r10, r11,0,0,19   /* Extract page descriptor page address */
> > +   beq   2f      /* If zero, don't try to find a pte */
> > +
> > +   /* We have a pte table, so load the MI_TWC with the attributes
> > +    * for this "segment."
> > +    */
> > +   ori   r11,r11,1      /* Set valid bit */
> > +   DO_8xx_CPU6(0x2b80, r3)
> > +   mtspr   SPRN_MI_TWC, r11   /* Set segment attributes */
> > +   DO_8xx_CPU6(0x3b80, r3)
> > +   mtspr   SPRN_MD_TWC, r11   /* Load pte table base address */
> > +
> > +   mfspr   r11, SPRN_SRR1
> > +   andi.   r11, r11, 0x4000 /* MSR[PR] */
> > +   mfspr   r11, SPRN_MD_TWC   /* ....and get the pte address */
> > +   lwz   r10, 0(r11)   /* Get the pte */
> > +   beq   5f /* Kernel access always OK */
> > +   andi.   r11,r10, _PAGE_USER
> > +   beq   2f
> > +5:   ori   r10, r10, _PAGE_ACCESSED
> > +   mfspr   r21, SPRN_MD_TWC   /* ....and get the pte address */
> > +   stw   r10, 0(r11)
> > +   xori   r10, r10, _PAGE_RW /* invert RW bit */
> > +
> > +   /* The Linux PTE won't go exactly into the MMU TLB.
> > +    * Software indicator bit 28 must be clear.
> > +    * Software indicator bits 24, 25, 26, and 27 must be
> > +    * set.  All other Linux PTE bits control the behavior
> > +    * of the MMU.
> > +    */
> > +   li   r11, 0x00f0
> > +   rlwimi   r10, r11, 0, 24, 28   /* Set 24-27, clear 28 */
> > +   DO_8xx_CPU6(0x2d80, r3)
> > +   mtspr   SPRN_MI_RPN, r10   /* Update TLB entry */
> > +
> > +   mfspr   r10, SPRN_M_TW   /* Restore registers */
> > +   lwz   r11, 0(r0)
> > +   mtcr   r11
> > +   lwz   r11, 4(r0)
> > +#ifdef CONFIG_8xx_CPU6
> > +   lwz   r3, 8(r0)
> > +#endif
> > +   rfi
> > +
> > +2:   mfspr   r10, SPRN_M_TW   /* Restore registers */
> > +   lwz   r11, 0(r0)
> > +   mtcr   r11
> > +   lwz   r11, 4(r0)
> > +#ifdef CONFIG_8xx_CPU6
> > +   lwz   r3, 8(r0)
> > +#endif
> >     b   InstructionAccess
> >
> >  /* This is the data TLB error on the MPC8xx.  This could be due to
> > @@ -472,8 +545,8 @@ DataTLBError:
> >     /* First, make sure this was a store operation.
> >     */
> >     mfspr   r10, SPRN_DSISR
> > -   andis.   r11, r10, 0x4800 /* no translation, no permission. */
> > -   bne   2f   /* branch if either is set */
> > +   andis.   r11, r10, 0x4000 /* no translation */
> > +   bne   2f   /* branch if set */
> >
> >     /* The EA of a data TLB miss is automatically stored in the MD_EPN
> >      * register.  The EA of a data TLB error is automatically stored in
> > @@ -522,26 +595,26 @@ DataTLBError:
> >     mfspr   r11, SPRN_MD_TWC      /* ....and get the pte address */
> >     lwz   r10, 0(r11)      /* Get the pte */
> >
> > -   andi.   r11, r10, _PAGE_RW   /* Is it writeable? */
> > -   beq   2f         /* Bail out if not */
> > +   mfspr   r11, SPRN_SRR1
> > +   andi.   r11, r11, 0x4000 /* MSR[PR] */
> > +   beq   5f /* Kernel access always OK */
> > +   andi.   r11,r10, _PAGE_USER
> > +   beq   2f
> > +5:   mfspr   r11, SPRN_DSISR
> > +   andis.   r11, r11, 0x0200   /* store */
> > +   beq   6f
> > +   andi.   r11, r10, _PAGE_RW   /* writeable? */
> > +   beq   2f /* branch if not */
> > +   ori   r10, r10, _PAGE_DIRTY | _PAGE_HWWRITE
> > +6:   ori   r10, r10, _PAGE_ACCESSED
> >
> > -   /* Update 'changed', among others.
> > -   */
> > -#ifdef CONFIG_SWAP
> > -   ori   r10, r10, _PAGE_DIRTY|_PAGE_HWWRITE
> > -   /* do not set the _PAGE_ACCESSED bit of a non-present page */
> > -   andi.   r11, r10, _PAGE_PRESENT
> > -   beq   4f
> > -   ori   r10, r10, _PAGE_ACCESSED
> > -4:
> > -#else
> > -   ori   r10, r10, _PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_HWWRITE
> > -#endif
> >     mfspr   r11, SPRN_MD_TWC      /* Get pte address again */
> >     stw   r10, 0(r11)      /* and update pte in table */
> >
> > +   xori   r10, r10, _PAGE_RW   /* Invert RW bit */
> > +
> >     /* The Linux PTE won't go exactly into the MMU TLB.
> > -    * Software indicator bits 21, 22 and 28 must be clear.
> > +    * Software indicator bit 28 must be clear.
> >      * Software indicator bits 24, 25, 26, and 27 must be
> >      * set.  All other Linux PTE bits control the behavior
> >      * of the MMU.
>
>
>
>

^ permalink raw reply

* Re: [PATCH 0/6] PowerPc 8xx TLB/MMU fixes
From: Joakim Tjernlund @ 2009-10-05 21:04 UTC (permalink / raw)
  To: Scott Wood; +Cc: Rex Feany, linuxppc-dev@ozlabs.org
In-Reply-To: <20091005200941.GD3576@loki.buserror.net>

Scott Wood <scottwood@freescale.com> wrote on 05/10/2009 22:09:41:
>
> On Mon, Oct 05, 2009 at 08:27:39PM +0200, Joakim Tjernlund wrote:
> > > After resolving the conflict, without adding tlbil_va in do_page_fault(), I
> > > get the same stuck behavior as before.
> >
> > Expected, I havn't not tried to fix the missing tlbil_va(). That is
> > different problem that you and Ben needs to sort out.
>
> Right, I just wanted to be clear about which results were from which
> modifications.
>
> > >  With tlbil_va, I get this (not sure
> > > if it's related to using the wrong base tree):
> >
> > Could be the debug code doing something bad or the
> > "Fixup DAR from buggy dcbX instructions". Could you back that one out?
>
> That gets rid of the segfaults.  The bootup gets stuck running udev, though
> -- the system is either idle, or it's stuck doing some syscall.

Ok, that just means that my asm version of fixing up dcbX insn does not work.

>
> I see it sometimes (but less reliably) without this patchset, so it may just
> be changing the circumstances to expose the issue more consistently.  Or
> maybe I'm seeing the dcbX bug?  There do seem to be dcbX instructions in the
> dynamic linker I'm using (including a disturbing section that appears to be
> assuming 32 byte cache lines, even though this is supposed to be an 8xx
> RFS).

Yes, every ld.so uses dcbX and icbi insn when relocatin code.
Maybe you see some version of the dcbX bug, but my fault.c should
fix them up. My bet would be the 32 byte cache line, it will miss
out every second line and so the results are unreliable.

>
> I'll look into it.
>
> > and if that does not help, backout:
> > "start using dcbX instructions in various copy routines" too.
>
> No difference.

Good, it seems possible to uses the dcbX insn now. I will
have to fixup fault a bit more though.

^ permalink raw reply

* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
From: Scott Wood @ 2009-10-05 21:04 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1254774555.7122.38.camel@pasglop>

Benjamin Herrenschmidt wrote:
> On Mon, 2009-10-05 at 14:28 -0500, Scott Wood wrote:
>> Yes, it hits ptep_set_access_flags() and adding _tlbil_va there helps (I
>> didn't put it in the filter function, because that doesn't take address as a
>> parameter).  I'd misread your suggestion as referring to the various changes
>> to set_pte_filter() that were posted.
>>
>> As for unconditionally invalidating in set_pte_filter(), that doesn't boot
>> for me unless I check for kernel addresses -- I guess we populate page
>> tables that overlap the pinned large page region? 
> 
> Good point. I think we do on 8xx. Does doing it in set_pte_filter()
> (with the kernel check) makes any difference ? faster ? slower ? no
> visible change ?

ptep_set_access_flags() is enough to make the worst of it go away.  I'm 
having another intermittent problem that seems to happen more often 
without the change in set_pte_filter(), but I've yet to narrow down 
what's actually going on there.

-Scott

^ permalink raw reply

* Re: [FTRACE] Enabling function_graph causes OOPS
From: Steven Rostedt @ 2009-10-05 20:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1254774039.7122.35.camel@pasglop>

On Tue, 2009-10-06 at 07:20 +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2009-10-05 at 09:25 -0400, Steven Rostedt wrote:
> > > > Sachin, can you give me more details on how you built that kernel ? (or
> > > > give them again in case I missed them the first time around :-), ie,
> > > > what toolchain, options, etc... or even better, give me remote access to
> > > > the build host ?
> > > 
> > > Ok, got access and had a quick look... seems to be a toolchain problem
> > > to me. I'll investigate more tomorrow.
> > 
> > Hi Ben,
> > 
> > Any more word on this issue? 
> 
> Didn't you fix it using a TOC access ?
> 
> Unless I'm confusing things, I think the problem is the usage
> of LOAD_REG_IMMEDIATE which generates relocs that we don't support
> when CONFIG_RELOCATABLE is set.
> 
> I've merged a patch that post-processes the kernel now, to check
> for such relocs so at least you should be warned at build time.

I thought we had two issues. One was the use of the relocs that did
cause issues. But then there was still crashes reported after that.
IIRC. I'm still suffering jetlag, so my memory is not that fresh about
this.

-- Steve

^ permalink raw reply

* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
From: Benjamin Herrenschmidt @ 2009-10-05 20:29 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <20091005192806.GC3576@loki.buserror.net>

On Mon, 2009-10-05 at 14:28 -0500, Scott Wood wrote:
> Yes, it hits ptep_set_access_flags() and adding _tlbil_va there helps (I
> didn't put it in the filter function, because that doesn't take address as a
> parameter).  I'd misread your suggestion as referring to the various changes
> to set_pte_filter() that were posted.
> 
> As for unconditionally invalidating in set_pte_filter(), that doesn't boot
> for me unless I check for kernel addresses -- I guess we populate page
> tables that overlap the pinned large page region? 

Good point. I think we do on 8xx. Does doing it in set_pte_filter()
(with the kernel check) makes any difference ? faster ? slower ? no
visible change ?

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
From: Benjamin Herrenschmidt @ 2009-10-05 20:28 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <OFA4FCE21A.BE62CF0F-ONC1257646.0069703B-C1257646.0069E5EB@transmode.se>

On Mon, 2009-10-05 at 21:16 +0200, Joakim Tjernlund wrote:
> Ben, for my understanding: It seems to that the TLB Miss routines in
> head_32.S are less than optimal as it too touches the pte every time
> it hits. Would it not be better to test if ACCESSED and friends are
> already set
> and skip storing the same pte over and over again?

I wouldn't think it's a big deal, but then, the 32-bit hash code
has to also update _PAGE_HASHPTE etc... overall I wouldn't touch
it for now.

However, 8xx should instead look at what I do in recent versions
of head_44x.S or what Kumar did in head_fsl_booke.S

Cheers,
Ben.

^ permalink raw reply

* Re: [FTRACE] Enabling function_graph causes OOPS
From: Benjamin Herrenschmidt @ 2009-10-05 20:20 UTC (permalink / raw)
  To: rostedt; +Cc: linuxppc-dev
In-Reply-To: <1254749155.13160.6.camel@gandalf.stny.rr.com>

On Mon, 2009-10-05 at 09:25 -0400, Steven Rostedt wrote:
> > > Sachin, can you give me more details on how you built that kernel ? (or
> > > give them again in case I missed them the first time around :-), ie,
> > > what toolchain, options, etc... or even better, give me remote access to
> > > the build host ?
> > 
> > Ok, got access and had a quick look... seems to be a toolchain problem
> > to me. I'll investigate more tomorrow.
> 
> Hi Ben,
> 
> Any more word on this issue? 

Didn't you fix it using a TOC access ?

Unless I'm confusing things, I think the problem is the usage
of LOAD_REG_IMMEDIATE which generates relocs that we don't support
when CONFIG_RELOCATABLE is set.

I've merged a patch that post-processes the kernel now, to check
for such relocs so at least you should be warned at build time.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 3/6] 8xx: get rid of _PAGE_HWWRITE dependency in MMU.
From: Benjamin Herrenschmidt @ 2009-10-05 20:17 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev@ozlabs.org, Rex Feany
In-Reply-To: <1254744999-3158-4-git-send-email-Joakim.Tjernlund@transmode.se>

On Mon, 2009-10-05 at 14:16 +0200, Joakim Tjernlund wrote:
> Update the TLB asm to make proper use of _PAGE_DIRY and _PAGE_ACCESSED.
> Pros:
>  - I/D TLB Miss never needs to write to the linux pte.
>  - _PAGE_ACCESSED is only set on I/D TLB Error fixing accounting
>  - _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
>     when a page has been made dirty.

Not sure here. You seem to still set those from asm. Is that necessary ?

The approach I take on BookE is to simply not set these from asm, -and-
(this is important) not set write permission if dirty is not set in
the TLB miss and set no access permission at all when accessed is not
set. This is important or we'll miss dirty updates which can
be fatal.

The C code in handle_pte_fault() will fixup missing access and dirty
if necessary and flush.

Also look at the 440 code, I think you could simplify your
implementation using similar techniques, such as andc of PTE against
requested bits etc... and thus maybe save a couple of insns.

Cheers,
Ben.

>  - Proper RO/RW mapping of user space.
>  - Free up 2 SW TLB bits in the linux pte(add back _PAGE_WRITETHRU ?)
> Cons:
>  - 4 more instructions in I/D TLB Miss, but the since the linux pte is
>    not written anymore, it should still be a win.
> ---
>  arch/powerpc/include/asm/pte-8xx.h |    9 +-
>  arch/powerpc/kernel/head_8xx.S     |  163 ++++++++++++++++++++++++++----------
>  2 files changed, 122 insertions(+), 50 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pte-8xx.h b/arch/powerpc/include/asm/pte-8xx.h
> index 8c6e312..af541a2 100644
> --- a/arch/powerpc/include/asm/pte-8xx.h
> +++ b/arch/powerpc/include/asm/pte-8xx.h
> @@ -32,22 +32,21 @@
>  #define _PAGE_FILE	0x0002	/* when !present: nonlinear file mapping */
>  #define _PAGE_NO_CACHE	0x0002	/* I: cache inhibit */
>  #define _PAGE_SHARED	0x0004	/* No ASID (context) compare */
> +#define _PAGE_DIRTY	0x0100	/* C: page changed */
>  
>  /* These five software bits must be masked out when the entry is loaded
>   * into the TLB.
>   */
>  #define _PAGE_EXEC	0x0008	/* software: i-cache coherency required */
>  #define _PAGE_GUARDED	0x0010	/* software: guarded access */
> -#define _PAGE_DIRTY	0x0020	/* software: page changed */
> -#define _PAGE_RW	0x0040	/* software: user write access allowed */
> -#define _PAGE_ACCESSED	0x0080	/* software: page referenced */
> +#define _PAGE_USER	0x0020	/* software: User space access */
>  
>  /* Setting any bits in the nibble with the follow two controls will
>   * require a TLB exception handler change.  It is assumed unused bits
>   * are always zero.
>   */
> -#define _PAGE_HWWRITE	0x0100	/* h/w write enable: never set in Linux PTE */
> -#define _PAGE_USER	0x0800	/* One of the PP bits, the other is USER&~RW */
> +#define _PAGE_RW	0x0400	/* lsb PP bits */
> +#define _PAGE_ACCESSED	0x0800	/* msb PP bits */
>  
>  #define _PMD_PRESENT	0x0001
>  #define _PMD_BAD	0x0ff0
> diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> index 118bb05..b1f72d9 100644
> --- a/arch/powerpc/kernel/head_8xx.S
> +++ b/arch/powerpc/kernel/head_8xx.S
> @@ -333,21 +333,15 @@ InstructionTLBMiss:
>  	mfspr	r11, SPRN_MD_TWC	/* ....and get the pte address */
>  	lwz	r10, 0(r11)	/* Get the pte */
>  
> -#ifdef CONFIG_SWAP
> -	/* do not set the _PAGE_ACCESSED bit of a non-present page */
> -	andi.	r11, r10, _PAGE_PRESENT
> -	beq	4f
> -	ori	r10, r10, _PAGE_ACCESSED
> -	mfspr	r11, SPRN_MD_TWC	/* get the pte address again */
> -	stw	r10, 0(r11)
> -4:
> -#else
> -	ori	r10, r10, _PAGE_ACCESSED
> -	stw	r10, 0(r11)
> -#endif
> -
> +	andi.	r11, r10, _PAGE_USER | _PAGE_ACCESSED
> +	cmpwi	cr0, r11, _PAGE_USER | _PAGE_ACCESSED
> +	beq+	cr0, 5f	/* branch if access allowed */
> +	rlwinm	r10, r10, 0, 22, 19 /* r10 &= ~(_PAGE_ACCESSED | _PAGE_RW) */
> +	b	6f
> +5:	xori	r10, r10, _PAGE_RW /* invert RW bit */
> +6:
>  	/* The Linux PTE won't go exactly into the MMU TLB.
> -	 * Software indicator bits 21, 22 and 28 must be clear.
> +	 * Software indicator bit 28 must be clear.
>  	 * Software indicator bits 24, 25, 26, and 27 must be
>  	 * set.  All other Linux PTE bits control the behavior
>  	 * of the MMU.
> @@ -409,21 +403,15 @@ DataStoreTLBMiss:
>  	DO_8xx_CPU6(0x3b80, r3)
>  	mtspr	SPRN_MD_TWC, r11
>  
> -#ifdef CONFIG_SWAP
> -	/* do not set the _PAGE_ACCESSED bit of a non-present page */
> -	andi.	r11, r10, _PAGE_PRESENT
> -	beq	4f
> -	ori	r10, r10, _PAGE_ACCESSED
> -4:
> -	/* and update pte in table */
> -#else
> -	ori	r10, r10, _PAGE_ACCESSED
> -#endif
> -	mfspr	r11, SPRN_MD_TWC	/* get the pte address again */
> -	stw	r10, 0(r11)
> -
> +	andi.	r11, r10, _PAGE_USER | _PAGE_ACCESSED
> +	cmpwi	cr0, r11, _PAGE_USER | _PAGE_ACCESSED
> +	beq+	cr0, 5f	/* branch if access allowed */
> +	rlwinm	r10, r10, 0, 22, 19 /* r10 &= ~(_PAGE_ACCESSED | _PAGE_RW) */
> +	b	6f
> +5:	xori	r10, r10, _PAGE_RW /* invert RW bit */
> +6:
>  	/* The Linux PTE won't go exactly into the MMU TLB.
> -	 * Software indicator bits 21, 22 and 28 must be clear.
> +	 * Software indicator bit 28 must be clear.
>  	 * Software indicator bits 24, 25, 26, and 27 must be
>  	 * set.  All other Linux PTE bits control the behavior
>  	 * of the MMU.
> @@ -448,6 +436,91 @@ DataStoreTLBMiss:
>   */
>  	. = 0x1300
>  InstructionTLBError:
> +#ifdef CONFIG_8xx_CPU6
> +	stw	r3, 8(r0)
> +#endif
> +	DO_8xx_CPU6(0x3f80, r3)
> +	mtspr	SPRN_M_TW, r10	/* Save a couple of working registers */
> +	mfcr	r10
> +	stw	r10, 0(r0)
> +	stw	r11, 4(r0)
> +
> +	mfspr	r11, SPRN_SRR1
> +	andis.	r11, r11, 0x5000	/* no translation, guarded */
> +	bne	2f
> +
> +	mfspr	r10, SPRN_SRR0	/* Get effective address of fault */
> +#ifdef CONFIG_8xx_CPU15
> +	addi	r11, r10, 0x1000
> +	tlbie	r11
> +	addi	r11, r10, -0x1000
> +	tlbie	r11
> +#endif
> +	DO_8xx_CPU6(0x3780, r3)
> +	mtspr	SPRN_MD_EPN, r10	/* Have to use MD_EPN for walk, MI_EPN can't */
> +	mfspr	r10, SPRN_M_TWB	/* Get level 1 table entry address */
> +
> +	/* If we are faulting a kernel address, we have to use the
> +	 * kernel page tables.
> +	 */
> +	andi.	r11, r10, 0x0800	/* Address >= 0x80000000 */
> +	beq	3f
> +	lis	r11, swapper_pg_dir@h
> +	ori	r11, r11, swapper_pg_dir@l
> +	rlwimi	r10, r11, 0, 2, 19
> +3:
> +	lwz	r11, 0(r10)	/* Get the level 1 entry */
> +	rlwinm.	r10, r11,0,0,19	/* Extract page descriptor page address */
> +	beq	2f		/* If zero, don't try to find a pte */
> +
> +	/* We have a pte table, so load the MI_TWC with the attributes
> +	 * for this "segment."
> +	 */
> +	ori	r11,r11,1		/* Set valid bit */
> +	DO_8xx_CPU6(0x2b80, r3)
> +	mtspr	SPRN_MI_TWC, r11	/* Set segment attributes */
> +	DO_8xx_CPU6(0x3b80, r3)
> +	mtspr	SPRN_MD_TWC, r11	/* Load pte table base address */
> +
> +	mfspr	r11, SPRN_SRR1
> +	andi.	r11, r11, 0x4000 /* MSR[PR] */
> +	mfspr	r11, SPRN_MD_TWC	/* ....and get the pte address */
> +	lwz	r10, 0(r11)	/* Get the pte */
> +	beq	5f /* Kernel access always OK */
> +	andi.	r11,r10, _PAGE_USER
> +	beq	2f
> +5:	ori	r10, r10, _PAGE_ACCESSED
> +	mfspr	r21, SPRN_MD_TWC	/* ....and get the pte address */
> +	stw	r10, 0(r11)
> +	xori	r10, r10, _PAGE_RW /* invert RW bit */
> +
> +	/* The Linux PTE won't go exactly into the MMU TLB.
> +	 * Software indicator bit 28 must be clear.
> +	 * Software indicator bits 24, 25, 26, and 27 must be
> +	 * set.  All other Linux PTE bits control the behavior
> +	 * of the MMU.
> +	 */
> +	li	r11, 0x00f0
> +	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
> +	DO_8xx_CPU6(0x2d80, r3)
> +	mtspr	SPRN_MI_RPN, r10	/* Update TLB entry */
> +
> +	mfspr	r10, SPRN_M_TW	/* Restore registers */
> +	lwz	r11, 0(r0)
> +	mtcr	r11
> +	lwz	r11, 4(r0)
> +#ifdef CONFIG_8xx_CPU6
> +	lwz	r3, 8(r0)
> +#endif
> +	rfi
> +
> +2:	mfspr	r10, SPRN_M_TW	/* Restore registers */
> +	lwz	r11, 0(r0)
> +	mtcr	r11
> +	lwz	r11, 4(r0)
> +#ifdef CONFIG_8xx_CPU6
> +	lwz	r3, 8(r0)
> +#endif
>  	b	InstructionAccess
>  
>  /* This is the data TLB error on the MPC8xx.  This could be due to
> @@ -472,8 +545,8 @@ DataTLBError:
>  	/* First, make sure this was a store operation.
>  	*/
>  	mfspr	r10, SPRN_DSISR
> -	andis.	r11, r10, 0x4800 /* no translation, no permission. */
> -	bne	2f	/* branch if either is set */
> +	andis.	r11, r10, 0x4000 /* no translation */
> +	bne	2f	/* branch if set */
>  
>  	/* The EA of a data TLB miss is automatically stored in the MD_EPN
>  	 * register.  The EA of a data TLB error is automatically stored in
> @@ -522,26 +595,26 @@ DataTLBError:
>  	mfspr	r11, SPRN_MD_TWC		/* ....and get the pte address */
>  	lwz	r10, 0(r11)		/* Get the pte */
>  
> -	andi.	r11, r10, _PAGE_RW	/* Is it writeable? */
> -	beq	2f			/* Bail out if not */
> +	mfspr	r11, SPRN_SRR1
> +	andi.	r11, r11, 0x4000 /* MSR[PR] */
> +	beq	5f /* Kernel access always OK */
> +	andi.	r11,r10, _PAGE_USER
> +	beq	2f
> +5:	mfspr	r11, SPRN_DSISR
> +	andis.	r11, r11, 0x0200	/* store */
> +	beq	6f
> +	andi.	r11, r10, _PAGE_RW	/* writeable? */
> +	beq	2f /* branch if not */
> +	ori	r10, r10, _PAGE_DIRTY | _PAGE_HWWRITE
> +6:	ori	r10, r10, _PAGE_ACCESSED
>  
> -	/* Update 'changed', among others.
> -	*/
> -#ifdef CONFIG_SWAP
> -	ori	r10, r10, _PAGE_DIRTY|_PAGE_HWWRITE
> -	/* do not set the _PAGE_ACCESSED bit of a non-present page */
> -	andi.	r11, r10, _PAGE_PRESENT
> -	beq	4f
> -	ori	r10, r10, _PAGE_ACCESSED
> -4:
> -#else
> -	ori	r10, r10, _PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_HWWRITE
> -#endif
>  	mfspr	r11, SPRN_MD_TWC		/* Get pte address again */
>  	stw	r10, 0(r11)		/* and update pte in table */
>  
> +	xori	r10, r10, _PAGE_RW	/* Invert RW bit */
> +
>  	/* The Linux PTE won't go exactly into the MMU TLB.
> -	 * Software indicator bits 21, 22 and 28 must be clear.
> +	 * Software indicator bit 28 must be clear.
>  	 * Software indicator bits 24, 25, 26, and 27 must be
>  	 * set.  All other Linux PTE bits control the behavior
>  	 * of the MMU.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox