[PATCH] Handle I-TLB Error and Miss separately on 8xx

linuxppc-dev.lists.ozlabs.org archive mirror

* [PATCH] Handle I-TLB Error and Miss separately on 8xx
@ 2005-01-07 16:22 Tom Rini
  0 siblings, 0 replies; 21+ messages in thread
From: Tom Rini @ 2005-01-07 16:22 UTC (permalink / raw)
  To: linuxppc-embedded

As the code stands currently, there is a bug in the 2.4 and 2.6 handling
of I-TLB Miss and Error exceptions on 8xx.  The problem is that since we
treat both of them as the same exception when we hit do_page_fault,
there is a case where we can incorrectly find that a protection fault
has occurred, when it hasn't.  This is because we check bit 4 of SRR1 in
both cases, but in the case of an I-TLB Miss, this bit is always set,
and it only indicates a protection fault on an I-TLB Error.
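
The distinction described above can be sketched in C. This is only an illustration under stated assumptions, not the kernel code: the function and macro names are hypothetical, while the mask 0x08000000 (bit 4 of SRR1, counting from the most significant bit) and the trap values 0x400 (where an I-TLB Miss ends up) and 0x1300 (I-TLB Error) are the ones used in the patch in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values are the ones used in the patch. */
#define SRR1_PROT_BIT   0x08000000u /* bit 4 of SRR1 (bit 0 is the MSB) */
#define TRAP_ITLB_MISS  0x400u      /* I-TLB Miss is reported as 0x400 */
#define TRAP_ITLB_ERROR 0x1300u     /* I-TLB Error */

/* Old behaviour: the bit is trusted no matter which exception occurred. */
bool old_is_protection_fault(uint32_t trap, uint32_t srr1)
{
	(void)trap; /* the trap number is not consulted at all */
	return (srr1 & SRR1_PROT_BIT) != 0;
}

/* Patched behaviour: the bit only counts when this is not an I-TLB Miss. */
bool new_is_protection_fault(uint32_t trap, uint32_t srr1)
{
	assert(trap == TRAP_ITLB_MISS || trap == TRAP_ITLB_ERROR);
	return (srr1 & SRR1_PROT_BIT) != 0 && trap != TRAP_ITLB_MISS;
}
```

With the bit set on an I-TLB Miss, the old check reports a protection fault and the new one does not; on an I-TLB Error both agree.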

Originally from Grigori Tolstolytkin <gtolstolytkin@ru.mvista.com>.
Signed-off-by: Tom Rini <trini@kernel.crashing.org>

Patch vs 2.4-current:
--- 1.19/arch/ppc/kernel/head_8xx.S	2003-10-27 12:31:25 -07:00
+++ edited/arch/ppc/kernel/head_8xx.S	2005-01-07 08:57:31 -07:00
@@ -501,10 +501,18 @@
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error fault.
+ * But we can't just jump from the InstructionAccess fault (0x400) as
+ * do_page_fault() needs to know.
  */
 	. = 0x1300
 InstructionTLBError:
-	b	InstructionAccess
+	EXCEPTION_PROLOG
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	mr	r4,r22
+	mr	r5,r23
+	li	r20,MSR_KERNEL
+	rlwimi	r20,r23,0,16,16		/* copy EE bit from saved MSR */
+	FINISH_EXCEPTION(do_page_fault)
 
 /* This is the data TLB error on the MPC8xx.  This could be due to
  * many reasons, including a dirty update to a pte.  We can catch that
--- 1.15/arch/ppc/mm/fault.c	2003-08-29 03:37:49 -07:00
+++ edited/arch/ppc/mm/fault.c	2005-01-07 08:59:25 -07:00
@@ -91,7 +91,8 @@
  * For 600- and 800-family processors, the error_code parameter is DSISR
  * for a data fault, SRR1 for an instruction fault. For 400-family processors
  * the error_code parameter is ESR for a data fault, 0 for an instruction
- * fault.
+ * fault.  On 800-family processors, we fudge an I-TLB Miss (0x1100) as
+ * being at 0x400 for space reasons.
  */
 void do_page_fault(struct pt_regs *regs, unsigned long address,
 		   unsigned long error_code)
@@ -111,7 +112,11 @@
 	 * bits we are interested in.  But there are some bits which
 	 * indicate errors in DSISR but can validly be set in SRR1.
 	 */
+#ifdef CONFIG_8xx
+	if (regs->trap == 0x400 || regs->trap == 0x1300)
+#else
 	if (regs->trap == 0x400)
+#endif
 		error_code &= 0x48200000;
 	else
 		is_write = error_code & 0x02000000;
@@ -204,8 +209,17 @@
 			goto bad_area;
 	/* a read */
 	} else {
-		/* protection fault */
+		/*
+		 * On non-8xx, a protection fault.  On 8xx, this bit is
+		 * always set on I-TLB Miss, but indicates a protection
+		 * fault on an I-TLB Error.  So we only check this bit
+		 * if we aren't an I-TLB Miss.
+		 */
+#ifdef CONFIG_8xx
+		if ((error_code & 0x08000000) && regs->trap != 0x400)
+#else
 		if (error_code & 0x08000000)
+#endif
 			goto bad_area;
 		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
 			goto bad_area;

Patch vs 2.6-current:
--- 1.18/arch/ppc/kernel/head_8xx.S	2004-11-11 01:25:53 -07:00
+++ edited/arch/ppc/kernel/head_8xx.S	2005-01-07 09:13:05 -07:00
@@ -445,10 +445,15 @@
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error fault.
+ * But we can't just jump from the InstructionAccess fault (0x400) as
+ * do_page_fault() needs to know.
  */
 	. = 0x1300
 InstructionTLBError:
-	b	InstructionAccess
+	EXCEPTION_PROLOG
+	mr	r4,r12
+	mr	r5,r9
+	EXC_XFER_EE_LITE(0x1300, handle_page_fault)
 
 /* This is the data TLB error on the MPC8xx.  This could be due to
  * many reasons, including a dirty update to a pte.  We can catch that
--- 1.21/arch/ppc/mm/fault.c	2004-07-26 14:43:22 -07:00
+++ edited/arch/ppc/mm/fault.c	2005-01-07 09:11:44 -07:00
@@ -90,7 +90,8 @@
  * For 600- and 800-family processors, the error_code parameter is DSISR
  * for a data fault, SRR1 for an instruction fault. For 400-family processors
  * the error_code parameter is ESR for a data fault, 0 for an instruction
- * fault.
+ * fault.  On 800-family processors, we fudge an I-TLB Miss (0x1100) as
+ * being at 0x400 for space reasons.
  */
 int do_page_fault(struct pt_regs *regs, unsigned long address,
 		  unsigned long error_code)
@@ -110,7 +111,11 @@
 	 * bits we are interested in.  But there are some bits which
 	 * indicate errors in DSISR but can validly be set in SRR1.
 	 */
-	if (TRAP(regs) == 0x400)
+#ifdef CONFIG_8xx
+	if (TRAP(regs) == 0x400 || TRAP(regs) == 0x1300)
+#else
+	if (TRAP(regs) == 0x400)
+#endif
 		error_code &= 0x48200000;
 	else
 		is_write = error_code & 0x02000000;
@@ -235,8 +240,17 @@
 #endif
 	/* a read */
 	} else {
-		/* protection fault */
+		/*
+		 * On non-8xx, a protection fault.  On 8xx, this bit is
+		 * always set on I-TLB Miss, but indicates a protection
+		 * fault on an I-TLB Error.  So we only check this bit
+		 * if we aren't an I-TLB Miss.
+		 */
+#ifdef CONFIG_8xx
+		if ((error_code & 0x08000000) && TRAP(regs) != 0x400)
+#else
 		if (error_code & 0x08000000)
+#endif
 			goto bad_area;
 		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
 			goto bad_area;

-- 
Tom Rini
http://gate.crashing.org/~trini/


* Re: [PATCH] Handle I-TLB Error and Miss separately on 8xx
@ 2005-01-12  7:53 Joakim Tjernlund
  2005-01-12 14:06 ` Tom Rini
  0 siblings, 1 reply; 21+ messages in thread
From: Joakim Tjernlund @ 2005-01-12  7:53 UTC (permalink / raw)
  To: Tom Rini, Linuxppc-Embedded@Ozlabs. Org

> As the code stands currently, there is a bug in the 2.4 and 2.6 handling
> of I-TLB Miss and Error exceptions on 8xx.  The problem is that since we
> treat both of them as the same exception when we hit do_page_fault,
> there is a case where we can incorrectly find that a protection fault
> has occurred, when it hasn't.  This is because we check bit 4 of SRR1 in
> both cases, but in the case of an I-TLB Miss, this bit is always set,
> and it only indicates a protection fault on an I-TLB Error.

Patch looks good to me, but I want to ask when this error
can be triggered in practice?

I have never seen it happen and it makes me wonder if the test
for a null pte in the I-TLB Miss handler is needed?

In linuxppc-2.4 there is a special case for pinned tlbs where
one could remove 4 instructions if the test for null ptes is removed.

I believe SPRG2 is free in 2.6 and if combined with the special case for pinned
tlbs in linuxppc-2.4 one can remove all memory references used for temporary
storage in the I-TLB Miss handler. That will save a cache line load&store.

 Jocke


* Re: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-12  7:53 [PATCH] Handle I-TLB Error and Miss separately on 8xx Joakim Tjernlund
@ 2005-01-12 14:06 ` Tom Rini
  2005-01-12 14:17   ` Joakim Tjernlund
  0 siblings, 1 reply; 21+ messages in thread
From: Tom Rini @ 2005-01-12 14:06 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linuxppc-Embedded@Ozlabs. Org

On Wed, Jan 12, 2005 at 08:53:17AM +0100, Joakim Tjernlund wrote:
> > As the code stands currently, there is a bug in the 2.4 and 2.6 handling
> > of I-TLB Miss and Error exceptions on 8xx.  The problem is that since we
> > treat both of them as the same exception when we hit do_page_fault,
> > there is a case where we can incorrectly find that a protection fault
> > has occurred, when it hasn't.  This is because we check bit 4 of SRR1 in
> > both cases, but in the case of an I-TLB Miss, this bit is always set,
> > and it only indicates a protection fault on an I-TLB Error.
> 
> Patch looks good to me, but I want to ask when this error
> can be triggered in practice?

It is possible to see this in the real world, as we (<hat=mvista>) found
this with a customer's app.

-- 
Tom Rini
http://gate.crashing.org/~trini/


* RE: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-12 14:06 ` Tom Rini
@ 2005-01-12 14:17   ` Joakim Tjernlund
  2005-01-12 15:15     ` Tom Rini
  0 siblings, 1 reply; 21+ messages in thread
From: Joakim Tjernlund @ 2005-01-12 14:17 UTC (permalink / raw)
  To: Tom Rini; +Cc: Linuxppc-Embedded@Ozlabs. Org

> On Wed, Jan 12, 2005 at 08:53:17AM +0100, Joakim Tjernlund wrote:
> > > As the code stands currently, there is a bug in the 2.4 and 2.6 handling
> > > of I-TLB Miss and Error exceptions on 8xx.  The problem is that since we
> > > treat both of them as the same exception when we hit do_page_fault,
> > > there is a case where we can incorrectly find that a protection fault
> > > has occurred, when it hasn't.  This is because we check bit 4 of SRR1 in
> > > both cases, but in the case of an I-TLB Miss, this bit is always set,
> > > and it only indicates a protection fault on an I-TLB Error.
> > 
> > Patch looks good to me, but I want to ask when this error
> > can be triggered in practice?
> 
> It is possible to see this in the real world, as we (<hat=mvista>) found
> this with a customer's app.

hmm, this app must have been doing something pretty special. Any idea what
caused it?

 Jocke


* Re: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-12 14:17   ` Joakim Tjernlund
@ 2005-01-12 15:15     ` Tom Rini
  2005-01-13 15:16       ` Tom Rini
  0 siblings, 1 reply; 21+ messages in thread
From: Tom Rini @ 2005-01-12 15:15 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linuxppc-Embedded@Ozlabs. Org

On Wed, Jan 12, 2005 at 03:17:11PM +0100, Joakim Tjernlund wrote:
> > On Wed, Jan 12, 2005 at 08:53:17AM +0100, Joakim Tjernlund wrote:
> > > > As the code stands currently, there is a bug in the 2.4 and 2.6 handling
> > > > of I-TLB Miss and Error exceptions on 8xx.  The problem is that since we
> > > > treat both of them as the same exception when we hit do_page_fault,
> > > > there is a case where we can incorrectly find that a protection fault
> > > > has occurred, when it hasn't.  This is because we check bit 4 of SRR1 in
> > > > both cases, but in the case of an I-TLB Miss, this bit is always set,
> > > > and it only indicates a protection fault on an I-TLB Error.
> > > 
> > > Patch looks good to me, but I want to ask when this error
> > > can be triggered in practice?
> > 
> > It is possible to see this in the real world, as we (<hat=mvista>) found
> > this with a customer's app.
> 
> hmm, this app must have been doing something pretty special. Any idea what
> caused it?

Only vaguely.  I'll poke the folks who did the investigation to see if
they recall (the app is quite large) and follow up with details, I hope.

-- 
Tom Rini
http://gate.crashing.org/~trini/


* Re: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-12 15:15     ` Tom Rini
@ 2005-01-13 15:16       ` Tom Rini
  2005-01-14 14:03         ` Joakim Tjernlund
  0 siblings, 1 reply; 21+ messages in thread
From: Tom Rini @ 2005-01-13 15:16 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linuxppc-Embedded@Ozlabs. Org

On Wed, Jan 12, 2005 at 08:15:08AM -0700, Tom Rini wrote:
> On Wed, Jan 12, 2005 at 03:17:11PM +0100, Joakim Tjernlund wrote:
> > > On Wed, Jan 12, 2005 at 08:53:17AM +0100, Joakim Tjernlund wrote:
[snip]
> > > > Patch looks good to me, but I want to ask when this error
> > > > can be triggered in practice?
> > > 
> > > It is possible to see this in the real world, as we (<hat=mvista>) found
> > > this with a customer's app.
> > 
> > hmm, this app must have been doing something pretty special. Any idea what
> > caused it?
> 
> Only vaguely.  I'll poke the folks who did the investigation to see if
> they recall (the app is quite large) and follow up with details, I hope.

First, we couldn't get this issue to happen w/ anything but the custom
app.  It would generate a lot of I-TLB Error exceptions, with bit 1 of
SRR1 set, and these went fine, the I-TLB got updated, and execution
continued.  But then at some point, and we aren't sure why exactly, an
0x1100 is generated, and we crash.  We don't know what went and caused
an 0x1100 to be generated instead of an 0x1300 (my wild-ass-guess is the
code jumped very very far ahead).

-- 
Tom Rini
http://gate.crashing.org/~trini/


* RE: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-13 15:16       ` Tom Rini
@ 2005-01-14 14:03         ` Joakim Tjernlund
  2005-01-14 17:38           ` Joakim Tjernlund
  0 siblings, 1 reply; 21+ messages in thread
From: Joakim Tjernlund @ 2005-01-14 14:03 UTC (permalink / raw)
  To: Tom Rini; +Cc: Linuxppc-Embedded@Ozlabs. Org

> -----Original Message-----
> From: Tom Rini [mailto:trini@kernel.crashing.org]
> On Wed, Jan 12, 2005 at 08:15:08AM -0700, Tom Rini wrote:
> > On Wed, Jan 12, 2005 at 03:17:11PM +0100, Joakim Tjernlund wrote:
> > > > On Wed, Jan 12, 2005 at 08:53:17AM +0100, Joakim Tjernlund wrote:
> [snip]
> > > > > Patch looks good to me, but I want to ask when this error
> > > > > can be triggered in practice?
> > > > 
> > > > It is possible to see this in the real world, as we (<hat=mvista>) found
> > > > this with a customer's app.
> > > 
> > > hmm, this app must have been doing something pretty special. Any idea what
> > > caused it?
> > 
> > Only vaguely.  I'll poke the folks who did the investigation to see if
> > they recall (the app is quite large) and follow up with details, I hope.
> 
> First, we couldn't get this issue to happen w/ anything but the custom
> app.  It would generate a lot of I-TLB Error exceptions, with bit 1 of
> SRR1 set, and these went fine, the I-TLB got updated, and execution
> continued.  But then at some point, and we aren't sure why exactly, an
> 0x1100 is generated, and we crash.  We don't know what went and caused
> an 0x1100 to be generated instead of an 0x1300 (my wild-ass-guess is the
> code jumped very very far ahead).

To me this looks like you entered the I-TLB Miss handler with a NULL pte, which
is something that never happens in my system. I don't know why this is so, but I am
guessing that the kernel populates all instruction ptes at exec time. On the
other hand, I don't understand why there are so many I-TLB errors. Is that normal?

Does the app modify its own code or construct a code trampoline which it jumps to? I am not
sure how that would be handled by the kernel w.r.t. NULL ptes.

 Jocke


* RE: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-14 14:03         ` Joakim Tjernlund
@ 2005-01-14 17:38           ` Joakim Tjernlund
  2005-01-14 17:47             ` Tom Rini
  0 siblings, 1 reply; 21+ messages in thread
From: Joakim Tjernlund @ 2005-01-14 17:38 UTC (permalink / raw)
  To: Tom Rini; +Cc: Linuxppc-Embedded@Ozlabs. Org

> > -----Original Message-----
> > From: Tom Rini [mailto:trini@kernel.crashing.org]
> > On Wed, Jan 12, 2005 at 08:15:08AM -0700, Tom Rini wrote:
> > > On Wed, Jan 12, 2005 at 03:17:11PM +0100, Joakim Tjernlund wrote:
> > > > > On Wed, Jan 12, 2005 at 08:53:17AM +0100, Joakim Tjernlund wrote:
> > [snip]
> > > > > > Patch looks good to me, but I want to ask when this error
> > > > > > can be triggered in practice?
> > > > > 
> > > > > It is possible to see this in the real world, as we (<hat=mvista>) found
> > > > > this with a customer's app.
> > > > 
> > > > hmm, this app must have been doing something pretty special. Any idea what
> > > > caused it?
> > > 
> > > Only vaguely.  I'll poke the folks who did the investigation to see if
> > > they recall (the app is quite large) and follow up with details, I hope.
> > 
> > First, we couldn't get this issue to happen w/ anything but the custom
> > app.  It would generate a lot of I-TLB Error exceptions, with bit 1 of
> > SRR1 set, and these went fine, the I-TLB got updated, and execution
> > continued.  But then at some point, and we aren't sure why exactly, an
> > 0x1100 is generated, and we crash.  We don't know what went and caused
> > an 0x1100 to be generated instead of an 0x1300 (my wild-ass-guess is the
> > code jumped very very far ahead).
> 
> To me this looks like you entered the I-TLB Miss handler with a NULL pte which
> is something that never happens in my system, don't know why this is so but I am
> guessing that the kernel populates all instruction pte's at exec time. On the
> other hand I don't understand why there are so many I-TLB errors, is that normal?
> 
> Does the app modify its own code or construct a code trampoline which it jumps to? Not
> sure how that would be handled by the kernel w.r.t NULL pte's
> 
>  Jocke

I think I have figured this out. The first TLB misses that happen at app startup are Data
TLB misses. These will then hit the NULL L1 entry and end up in do_page_fault(), which
will populate the L1 entry. But when you have a very large app that spans more than one
L1 entry (16 MB I think), it may happen that the first miss on one of the L1 entries is an
I-TLB Miss, which will make the I-TLB handler bail out to do_page_fault(), and the app
crashes (SEGV).
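
The scenario above can be modelled with a toy L1 table. This is a sketch under stated assumptions, not kernel code: all names are hypothetical, and the span covered by one L1 entry is a parameter (`L1_SHIFT`), set here to 4 MB purely for illustration, while the message above guesses 16 MB. It shows that a data access populates only the L1 entry covering its own address, so an instruction fetch in the next L1 region still finds a NULL entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: each L1 entry covers 1 << L1_SHIFT bytes. */
#define L1_SHIFT   22u
#define L1_ENTRIES (1u << (32u - L1_SHIFT))

/* Toy model: true means the L1 entry points at a page table. */
struct l1_table {
	bool populated[L1_ENTRIES];
};

unsigned l1_index(uint32_t addr)
{
	return addr >> L1_SHIFT;
}

/* A data miss on a NULL L1 entry goes through the page fault path,
 * which populates the entry covering that address. */
void data_access(struct l1_table *t, uint32_t addr)
{
	assert(t != 0);
	t->populated[l1_index(addr)] = true;
}

/* True if an instruction fetch at addr would find a NULL L1 entry. */
bool ifetch_hits_null_l1(const struct l1_table *t, uint32_t addr)
{
	assert(t != 0);
	return !t->populated[l1_index(addr)];
}
```

An app small enough to fit inside one L1 region never sees the NULL entry on an instruction fetch in this model, because an earlier data access has already populated it.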

Your patch will fix this. 
I haven't seen it go in yet, will you submit the patch to Linus/Marcelo?

 Jocke


* Re: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-14 17:38           ` Joakim Tjernlund
@ 2005-01-14 17:47             ` Tom Rini
  2005-01-14 17:56               ` Joakim Tjernlund
  0 siblings, 1 reply; 21+ messages in thread
From: Tom Rini @ 2005-01-14 17:47 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linuxppc-Embedded@Ozlabs. Org

On Fri, Jan 14, 2005 at 06:38:35PM +0100, Joakim Tjernlund wrote:
> > > -----Original Message-----
> > > From: Tom Rini [mailto:trini@kernel.crashing.org]
> > > On Wed, Jan 12, 2005 at 08:15:08AM -0700, Tom Rini wrote:
> > > > On Wed, Jan 12, 2005 at 03:17:11PM +0100, Joakim Tjernlund wrote:
> > > > > > On Wed, Jan 12, 2005 at 08:53:17AM +0100, Joakim Tjernlund wrote:
> > > [snip]
> > > > > > > Patch looks good to me, but I want to ask when this error
> > > > > > > can be triggered in practice?
> > > > > > 
> > > > > > It is possible to see this in the real world, as we (<hat=mvista>) found
> > > > > > this with a customer's app.
> > > > > 
> > > > > hmm, this app must have been doing something pretty special. Any idea what
> > > > > caused it?
> > > > 
> > > > Only vaguely.  I'll poke the folks who did the investigation to see if
> > > > they recall (the app is quite large) and follow up with details, I hope.
> > > 
> > > First, we couldn't get this issue to happen w/ anything but the custom
> > > app.  It would generate a lot of I-TLB Error exceptions, with bit 1 of
> > > SRR1 set, and these went fine, the I-TLB got updated, and execution
> > > continued.  But then at some point, and we aren't sure why exactly, an
> > > 0x1100 is generated, and we crash.  We don't know what went and caused
> > > an 0x1100 to be generated instead of an 0x1300 (my wild-ass-guess is the
> > > code jumped very very far ahead).
> > 
> > To me this looks like you entered the I-TLB Miss handler with a NULL pte which
> > is something that never happens in my system, don't know why this is so but I am
> > guessing that the kernel populates all instruction pte's at exec time. On the
> > other hand I don't understand why there are so many I-TLB errors, is that normal?
> > 
> > Does the app modify its own code or construct a code trampoline which it jumps to? Not
> > sure how that would be handled by the kernel w.r.t NULL pte's
> > 
> >  Jocke
> 
> I think I have figured this out. The first TLB misses that happen at app startup are Data
> TLB misses. These will then hit the NULL L1 entry and end up in do_page_fault(), which
> will populate the L1 entry. But when you have a very large app that spans more than one
> L1 entry (16 MB I think), it may happen that the first miss on one of the L1 entries is an
> I-TLB Miss, which will make the I-TLB handler bail out to do_page_fault(), and the app
> crashes (SEGV).

Yes, that sounds like it.  Thanks.

> Your patch will fix this. 
> I haven't seen it go in yet, will you submit the patch to Linus/Marcelo?

I was hoping Marcelo would pick this up since I thought he was on the
list.  I'll re-poke him.  For 2.6, the app in question crashes
differently, prior to hitting this bug, but I do want to get it pushed
out.  I've just been swamped lately.

-- 
Tom Rini
http://gate.crashing.org/~trini/


* RE: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-14 17:47             ` Tom Rini
@ 2005-01-14 17:56               ` Joakim Tjernlund
  2005-01-14 18:05                 ` Tom Rini
  0 siblings, 1 reply; 21+ messages in thread
From: Joakim Tjernlund @ 2005-01-14 17:56 UTC (permalink / raw)
  To: Tom Rini; +Cc: Linuxppc-Embedded@Ozlabs. Org

> > Your patch will fix this. 
> > I haven't seen it go in yet, will you submit the patch to Linus/Marcelo?
> 
> I was hoping Marcelo would pick this up since I thought he was on the
> list.  I'll re-poke him.  For 2.6, the app in question crashes
> differently, prior to hitting this bug, but I do want to get it pushed
> out.  I've just been swamped lately.

Is 2.6 on 8xx stable for you (except for this app)? I was under the impression that
2.6 was a bit flaky due to an unidentified MM bug.

 Jocke


* Re: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-14 17:56               ` Joakim Tjernlund
@ 2005-01-14 18:05                 ` Tom Rini
  2005-01-14 19:51                   ` Joakim Tjernlund
  0 siblings, 1 reply; 21+ messages in thread
From: Tom Rini @ 2005-01-14 18:05 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linuxppc-Embedded@Ozlabs. Org

On Fri, Jan 14, 2005 at 06:56:58PM +0100, Joakim Tjernlund wrote:
> > > Your patch will fix this. 
> > > I haven't seen it go in yet, will you submit the patch to Linus/Marcelo?
> > 
> > I was hoping Marcelo would pick this up since I thought he was on the
> > list.  I'll re-poke him.  For 2.6, the app in question crashes
> > differently, prior to hitting this bug, but I do want to get it pushed
> > out.  I've just been swamped lately.
> 
> Is 2.6 on 8xx stable for you (except for this app)? I was under the impression that
> 2.6 was a bit flaky due to an unidentified MM bug.

No, I don't believe it's very stable (this was the first chance I've had
in a while to fire up my rpxlite).

-- 
Tom Rini
http://gate.crashing.org/~trini/


* RE: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-14 18:05                 ` Tom Rini
@ 2005-01-14 19:51                   ` Joakim Tjernlund
  2005-01-18 20:09                     ` Tom Rini
  0 siblings, 1 reply; 21+ messages in thread
From: Joakim Tjernlund @ 2005-01-14 19:51 UTC (permalink / raw)
  To: Tom Rini; +Cc: Linuxppc-Embedded@Ozlabs. Org

> 
> On Fri, Jan 14, 2005 at 06:56:58PM +0100, Joakim Tjernlund wrote:
> > > > Your patch will fix this. 
> > > > I haven't seen it go in yet, will you submit the patch to Linus/Marcelo?
> > > 
> > > I was hoping Marcelo would pick this up since I thought he was on the
> > > list.  I'll re-poke him.  For 2.6, the app in question crashes
> > > differently, prior to hitting this bug, but I do want to get it pushed
> > > out.  I've just been swamped lately.
> > 
> > Is 2.6 on 8xx stable for you (except for this app)? I was under the impression that
> > 2.6 was a bit flaky due to an unidentified MM bug.
> 
> No, I don't believe it's very stable (this was the first chance I've had
> in a while to fire up my rpxlite).

BTW, there is a simpler fix to the TLB Miss problem.
In the TLB Miss handlers, just move the 2: label a few instr. upwards to
the same line as the "li	r21, 0x00f0". That way you will force a 
TLB error. You can do this for both Data and Instr. Miss handlers.
The code after where the 2: label used to be can be deleted. 

 Jocke


* Re: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-14 19:51                   ` Joakim Tjernlund
@ 2005-01-18 20:09                     ` Tom Rini
  2005-01-18 22:02                       ` Joakim Tjernlund
  2005-01-19  0:30                       ` Joakim Tjernlund
  0 siblings, 2 replies; 21+ messages in thread
From: Tom Rini @ 2005-01-18 20:09 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Linuxppc-Embedded@Ozlabs. Org

On Fri, Jan 14, 2005 at 08:51:44PM +0100, Joakim Tjernlund wrote:

> BTW, there is a simpler fix to the TLB Miss problem.
> In the TLB Miss handlers, just move the 2: label a few instr. upwards to
> the same line as the "li	r21, 0x00f0". That way you will force a 
> TLB error. You can do this for both Data and Instr. Miss handlers.
> The code after where the 2: label used to be can be deleted. 

Like this?  Only lightly tested on my rpxlite on 2.6:


---

 linux-2.6-current-trini/arch/ppc/kernel/head_8xx.S |   22 +--------------------
 1 files changed, 2 insertions(+), 20 deletions(-)

diff -puN arch/ppc/kernel/head_8xx.S~ppc32-mpc8xx-itlbmiss-fix arch/ppc/kernel/head_8xx.S
--- linux-2.6-current/arch/ppc/kernel/head_8xx.S~ppc32-mpc8xx-itlbmiss-fix	2005-01-18 11:55:34.000000000 -0700
+++ linux-2.6-current-trini/arch/ppc/kernel/head_8xx.S	2005-01-18 11:57:11.000000000 -0700
@@ -343,7 +343,7 @@ InstructionTLBMiss:
 	 * set.  All other Linux PTE bits control the behavior
 	 * of the MMU.
 	 */
-	li	r11, 0x00f0
+2:	li	r11, 0x00f0
 	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
 	DO_8xx_CPU6(0x2d80, r3)
 	mtspr	MI_RPN, r10	/* Update TLB entry */
@@ -357,15 +357,6 @@ InstructionTLBMiss:
 #endif
 	rfi
 
-2:	mfspr	r10, M_TW	/* Restore registers */
-	lwz	r11, 0(r0)
-	mtcr	r11
-	lwz	r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
-	lwz	r3, 8(r0)
-#endif
-	b	InstructionAccess
-
 	. = 0x1200
 DataStoreTLBMiss:
 #ifdef CONFIG_8xx_CPU6
@@ -419,7 +410,7 @@ DataStoreTLBMiss:
 	 * set.  All other Linux PTE bits control the behavior
 	 * of the MMU.
 	 */
-	li	r11, 0x00f0
+2:	li	r11, 0x00f0
 	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
 	DO_8xx_CPU6(0x3d80, r3)
 	mtspr	MD_RPN, r10	/* Update TLB entry */
@@ -433,15 +424,6 @@ DataStoreTLBMiss:
 #endif
 	rfi
 
-2:	mfspr	r10, M_TW	/* Restore registers */
-	lwz	r11, 0(r0)
-	mtcr	r11
-	lwz	r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
-	lwz	r3, 8(r0)
-#endif
-	b	DataAccess
-
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error fault.
_

-- 
Tom Rini
http://gate.crashing.org/~trini/


* RE: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-18 20:09                     ` Tom Rini
@ 2005-01-18 22:02                       ` Joakim Tjernlund
  2005-01-19 17:25                         ` Dan Malek
  2005-01-19  0:30                       ` Joakim Tjernlund
  1 sibling, 1 reply; 21+ messages in thread
From: Joakim Tjernlund @ 2005-01-18 22:02 UTC (permalink / raw)
  To: Tom Rini; +Cc: Linuxppc-Embedded@Ozlabs. Org

> 
> On Fri, Jan 14, 2005 at 08:51:44PM +0100, Joakim Tjernlund wrote:
> 
> > BTW, there is a simpler fix to the TLB Miss problem.
> > In the TLB Miss handlers, just move the 2: label a few instr. upwards to
> > the same line as the "li	r21, 0x00f0". That way you will force a 
> > TLB error. You can do this for both Data and Instr. Miss handlers.
> > The code after where the 2: label used to be can be deleted. 
> 
> Like this?  Only lightly tested on my rpxlite on 2.6:

Yes, that way you will load a zero pte (except for the bits that must always be ones)
into the MMU, and a TLB error will follow.
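
What the moved label feeds into the MMU can be worked out by modelling the `rlwimi r10, r11, 0, 24, 28` instruction from the patch in C. This is a minimal sketch: the helper names are made up for illustration, and the mask helper deliberately handles only the non-wrapping case (MB <= ME) that the patch uses. Bits are numbered the PowerPC way, with bit 0 as the most significant bit.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits mb..me set, PowerPC numbering (bit 0 is the MSB).
 * Only the non-wrapping case mb <= me is modelled here. */
uint32_t ppc_mask(unsigned mb, unsigned me)
{
	assert(mb <= me && me <= 31u);
	return (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31u - me));
}

/* rlwimi ra, rs, sh, mb, me: rotate rs left by sh, then insert the
 * masked bits into ra, leaving the other bits of ra untouched. */
uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
	assert(sh < 32u);
	uint32_t rot = sh ? (rs << sh) | (rs >> (32u - sh)) : rs;
	uint32_t m = ppc_mask(mb, me);
	return (rot & m) | (ra & ~m);
}
```

With r11 = 0x00f0 and a zero pte in r10, the result is 0x000000f0: bits 24-27 are forced to one, bit 28 is cleared, and everything else stays zero, which is the "zero pte except for the bits that must be ones" described above.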

You can also move beq 2f instr. 2 lines down and change it to beq- 2f to let branch
prediction do its thing.

Then all that is missing is my dcbX patch and support for 16K pages :)

 Jocke 
> 
> 
> ---
> 
>  linux-2.6-current-trini/arch/ppc/kernel/head_8xx.S |   22 +--------------------
>  1 files changed, 2 insertions(+), 20 deletions(-)
> 
> diff -puN arch/ppc/kernel/head_8xx.S~ppc32-mpc8xx-itlbmiss-fix arch/ppc/kernel/head_8xx.S
> --- linux-2.6-current/arch/ppc/kernel/head_8xx.S~ppc32-mpc8xx-itlbmiss-fix	2005-01-18 11:55:34.000000000 -0700
> +++ linux-2.6-current-trini/arch/ppc/kernel/head_8xx.S	2005-01-18 11:57:11.000000000 -0700
> @@ -343,7 +343,7 @@ InstructionTLBMiss:
>  	 * set.  All other Linux PTE bits control the behavior
>  	 * of the MMU.
>  	 */
> -	li	r11, 0x00f0
> +2:	li	r11, 0x00f0
>  	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
>  	DO_8xx_CPU6(0x2d80, r3)
>  	mtspr	MI_RPN, r10	/* Update TLB entry */
> @@ -357,15 +357,6 @@ InstructionTLBMiss:
>  #endif
>  	rfi
>  
> -2:	mfspr	r10, M_TW	/* Restore registers */
> -	lwz	r11, 0(r0)
> -	mtcr	r11
> -	lwz	r11, 4(r0)
> -#ifdef CONFIG_8xx_CPU6
> -	lwz	r3, 8(r0)
> -#endif
> -	b	InstructionAccess
> -
>  	. = 0x1200
>  DataStoreTLBMiss:
>  #ifdef CONFIG_8xx_CPU6
> @@ -419,7 +410,7 @@ DataStoreTLBMiss:
>  	 * set.  All other Linux PTE bits control the behavior
>  	 * of the MMU.
>  	 */
> -	li	r11, 0x00f0
> +2:	li	r11, 0x00f0
>  	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
>  	DO_8xx_CPU6(0x3d80, r3)
>  	mtspr	MD_RPN, r10	/* Update TLB entry */
> @@ -433,15 +424,6 @@ DataStoreTLBMiss:
>  #endif
>  	rfi
>  
> -2:	mfspr	r10, M_TW	/* Restore registers */
> -	lwz	r11, 0(r0)
> -	mtcr	r11
> -	lwz	r11, 4(r0)
> -#ifdef CONFIG_8xx_CPU6
> -	lwz	r3, 8(r0)
> -#endif
> -	b	DataAccess
> -
>  /* This is an instruction TLB error on the MPC8xx.  This could be due
>   * to many reasons, such as executing guarded memory or illegal instruction
>   * addresses.  There is nothing to do but handle a big time error fault.
> _
> 
> -- 
> Tom Rini
> http://gate.crashing.org/~trini/


* RE: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-18 20:09                     ` Tom Rini
  2005-01-18 22:02                       ` Joakim Tjernlund
@ 2005-01-19  0:30                       ` Joakim Tjernlund
  2005-01-19 17:28                         ` Dan Malek
  1 sibling, 1 reply; 21+ messages in thread
From: Joakim Tjernlund @ 2005-01-19  0:30 UTC (permalink / raw)
  To: Tom Rini; +Cc: Linuxppc-Embedded@Ozlabs. Org

> On Fri, Jan 14, 2005 at 08:51:44PM +0100, Joakim Tjernlund wrote:
> 
> > BTW, there is a simpler fix to the TLB Miss problem.
> > In the TLB Miss handlers, just move the 2: label a few instr. upwards to
> > the same line as the "li	r21, 0x00f0". That way you will force a 
> > TLB error. You can do this for both Data and Instr. Miss handlers.
> > The code after where the 2: label used to be can be deleted. 
> 
> Like this?  Only lightly tested on my rpxlite on 2.6:

Something related I wonder about. Is it necessary to update the ACCESSED
bit in the pte?
	ori	r10, r10, _PAGE_ACCESSED
	stw	r10, 0(r11)
2 instr. and a cache line write will be saved in each TLB Miss handler
if this step can be omitted. Any MM gurus around?

 Jocke


* Re: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-18 22:02                       ` Joakim Tjernlund
@ 2005-01-19 17:25                         ` Dan Malek
  2005-01-19 18:06                           ` Joakim Tjernlund
  0 siblings, 1 reply; 21+ messages in thread
From: Dan Malek @ 2005-01-19 17:25 UTC (permalink / raw)
  To: Joakim.Tjernlund; +Cc: Tom Rini, Linuxppc-Embedded@Ozlabs. Org

On Jan 18, 2005, at 5:02 PM, Joakim Tjernlund wrote:

> Then all that is missing is my dcbX patch and support for 16K pages :)

We'll get to that.  I'd prefer to support 4K and 8M pages, though.  I had
started on doing that, some code remnants are still there, but I never
got it to work without adding too much code to the I-tlbmiss handler.
I want a tlbmiss handler like the example in the manual, about 10
lines of straight code.  I don't like to be mucking around with PTE entries
that a higher level function could/should have set for us in a less
critical path.

	-- Dan


* Re: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-19  0:30                       ` Joakim Tjernlund
@ 2005-01-19 17:28                         ` Dan Malek
  2005-01-19 18:12                           ` Joakim Tjernlund
  0 siblings, 1 reply; 21+ messages in thread
From: Dan Malek @ 2005-01-19 17:28 UTC (permalink / raw)
  To: Joakim.Tjernlund; +Cc: Tom Rini, Linuxppc-Embedded@Ozlabs. Org

On Jan 18, 2005, at 7:30 PM, Joakim Tjernlund wrote:

> Something related I wonder about. Is it necessary to update the ACCESSED

Yes.  I used to take some shortcuts in the past that always marked pages
accessed and dirty (if necessary) to eliminate code in the tlbmiss handler.
Paulus made me fix that, so page stealing and swapping would work
more effectively.  However, I'm still considering making a configuration
option because on embedded systems where the stealing and swapping
don't occur you shouldn't have to pay the price of the management in
the tlbmiss handler.
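
The trade-off can be modelled in C. This is only a sketch, not the kernel's code: the switch name TRACK_ACCESSED, the function name and the bit value are hypothetical and chosen for illustration. The two statements inside the #if correspond to the ori/stw pair asked about earlier in the thread.

```c
#include <stdint.h>

/* Illustrative placeholder for the ACCESSED bit; check the real
 * _PAGE_ACCESSED definition for the target before relying on it. */
#define PAGE_ACCESSED 0x0020u

/* Hypothetical build-time switch: 1 keeps the bookkeeping that page
 * stealing and swapping rely on, 0 drops it from the miss path. */
#ifndef TRACK_ACCESSED
#define TRACK_ACCESSED 1
#endif

/* Model of the PTE step in a TLB miss handler: read the PTE, optionally
 * mark it accessed and write it back, then return the value that would
 * be loaded into the TLB. */
uint32_t tlbmiss_load_pte(uint32_t *ptep)
{
    uint32_t pte = *ptep;
#if TRACK_ACCESSED
    pte |= PAGE_ACCESSED;   /* ori r10, r10, _PAGE_ACCESSED */
    *ptep = pte;            /* stw r10, 0(r11): the extra memory write */
#endif
    return pte;
}
```

With TRACK_ACCESSED set to 0 the function neither modifies nor writes the PTE, which is the saving being discussed; the cost is that the kernel can no longer tell which pages were recently used.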

	-- Dan


* RE: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-19 17:25                         ` Dan Malek
@ 2005-01-19 18:06                           ` Joakim Tjernlund
  2005-01-19 18:37                             ` Dan Malek
  0 siblings, 1 reply; 21+ messages in thread
From: Joakim Tjernlund @ 2005-01-19 18:06 UTC (permalink / raw)
  To: Dan Malek; +Cc: Tom Rini, Linuxppc-Embedded@Ozlabs. Org

> 
> On Jan 18, 2005, at 5:02 PM, Joakim Tjernlund wrote:
> 
> > Then all that is missing is my dcbX patch and support for 16K pages :)
> 
> We'll get to that.  I'd prefer to support 4K and 8M pages, though.  I had

8MB pages for user space? Isn't that a bit big?

> started on doing that, some code remnants are still there, but I never
> got it to work without adding too much code to the I-tlbmiss handler.
> I want a tlbmiss handler like the example in the manual, about 10
> lines of straight code.  I don't like to be mucking around with PTE entries
> that a higher level function could/should have set for us in a less
> critical path.

yes, that would be nice :)

> 
> 	-- Dan
> 


* RE: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-19 17:28                         ` Dan Malek
@ 2005-01-19 18:12                           ` Joakim Tjernlund
  0 siblings, 0 replies; 21+ messages in thread
From: Joakim Tjernlund @ 2005-01-19 18:12 UTC (permalink / raw)
  To: Dan Malek; +Cc: Tom Rini, Linuxppc-Embedded@Ozlabs. Org

> 
> On Jan 18, 2005, at 7:30 PM, Joakim Tjernlund wrote:
> 
> > Something related I wonder about. Is it necessary to update the ACCESSED
> 
> Yes.  I used to take some shortcuts in the past that always marked pages
> accessed and dirty (if necessary) to eliminate code in the tlbmiss handler.
> Paulus made me fix that, so page stealing and swapping would work
> more effectively.  However, I'm still considering making a configuration
> option because on embedded systems where the stealing and swapping
> don't occur you shouldn't have to pay the price of the management in
> the tlbmiss handler.

Hmm, I don't understand the page stealing part. Which configs don't
steal pages? All those that don't swap?

 Jocke


* Re: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-19 18:06                           ` Joakim Tjernlund
@ 2005-01-19 18:37                             ` Dan Malek
  2005-01-19 19:58                               ` Joakim Tjernlund
  0 siblings, 1 reply; 21+ messages in thread
From: Dan Malek @ 2005-01-19 18:37 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Tom Rini, Linuxppc-Embedded@Ozlabs. Org

On Jan 19, 2005, at 1:06 PM, Joakim Tjernlund wrote:

> 8MB pages for user space? Isn't that a bit big?

Just for kernel space.  I think eliminating lots of TLB updates for the
kernel will be one of the easiest ways to get a boost of performance
in this area.  This way, system calls or interrupts won't pollute the TLB
for the application, which can then just continue to run without having
to reload the TLB after such events.
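
As a rough illustration of the pollution argument, the C sketch below counts how many distinct pages (and therefore TLB entries) a set of kernel addresses touches at a given page size. The function name and any addresses fed to it are hypothetical; nothing here is measured on real hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Count how many distinct pages a list of addresses touches when pages
 * are (1 << page_shift) bytes.  Each distinct page needs its own TLB
 * entry, so a smaller count means less TLB pollution.  The quadratic
 * scan is fine for a handful of addresses. */
size_t distinct_pages(const uint32_t *addrs, size_t n, unsigned page_shift)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t page = addrs[i] >> page_shift;
        int seen = 0;

        /* Has an earlier address already landed on this page? */
        for (size_t j = 0; j < i; j++) {
            if ((addrs[j] >> page_shift) == page) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}
```

For example, four made-up kernel addresses spread over the first 4 MB above 0xC0000000 occupy four entries with 4K pages (page_shift 12) but share a single entry with 8M pages (page_shift 23).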

Part of my graduate work years ago was MMU overhead with various
page sizes and replacement algorithms.  Increasing page sizes from
something like 1K to 8M made a difference you could measure, but
going from 1K to 32K made little difference.   Of course,
you can always write some application that just thrashes the TLB and
proves any improvement incorrect, but that would also thrash caches and
other system resources.  I'll always contend that if you can measure
the difference between a 4K and 16K page, something else is grossly
wrong with the system in general and you should be looking for
performance problems elsewhere.

	-- Dan


* RE: [PATCH] Handle I-TLB Error and Miss separately on 8xx
  2005-01-19 18:37                             ` Dan Malek
@ 2005-01-19 19:58                               ` Joakim Tjernlund
  0 siblings, 0 replies; 21+ messages in thread
From: Joakim Tjernlund @ 2005-01-19 19:58 UTC (permalink / raw)
  To: Dan Malek; +Cc: Tom Rini, Linuxppc-Embedded@Ozlabs. Org

> On Jan 19, 2005, at 1:06 PM, Joakim Tjernlund wrote:
> 
> > 8MB pages for user space? Isn't that a bit big?
> 
> Just for kernel space.  I think eliminating lots of TLB updates for the
> kernel will be one of the easiest ways to get a boost of performance
> in this area.  This way, system calls or interrupts won't pollute the TLB
> for the application, which can then just continue to run without having
> to reload the TLB after such events.

I see, similar to the pinned TLB entries on 8xx but without the "pin".
Would this include vmalloc space and ioremapped space as well?

> 
> Part of my graduate work years ago was MMU overhead with various
> page sizes and replacement algorithms.  Increasing page sizes from
> something like 1K to 8M made a difference you could measure, but
> going from 1K to 32K made little difference.   Of course,
> you can always write some application that just thrashes the TLB and
> proves any improvement incorrect, but that would also thrash caches and
> other system resources.  I'll always contend that if you can measure
> the difference between a 4K and 16K page, something else is grossly
> wrong with the system in general and you should be looking for
> performance problems elsewhere.

Hmm, did you measure this on an 8xx, which has few TLB entries?
I haven't done any measurements; it just seemed like a good idea for
systems with few TLB entries and lots of RAM.
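
The intuition is just TLB reach, i.e. entries times page size. A minimal C sketch follows; the function name is made up, and the entry count is a parameter because the right number depends on the particular 8xx part.

```c
#include <stdint.h>

/* Amount of memory that can be mapped at once when every TLB entry
 * holds one page of the given size. */
uint64_t tlb_reach_bytes(unsigned entries, uint64_t page_size)
{
    return (uint64_t)entries * page_size;
}
```

Assuming a 32-entry TLB (an assumption, check the manual for the actual part), 4K pages reach 128 KB and 16K pages reach 512 KB. Whether that fourfold increase shows up as a measurable speedup is exactly the open question above.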

 Jocke


end of thread, other threads:[~2005-01-19 19:58 UTC | newest]

Thread overview: 21+ messages
2005-01-12  7:53 [PATCH] Handle I-TLB Error and Miss separately on 8xx Joakim Tjernlund
2005-01-12 14:06 ` Tom Rini
2005-01-12 14:17   ` Joakim Tjernlund
2005-01-12 15:15     ` Tom Rini
2005-01-13 15:16       ` Tom Rini
2005-01-14 14:03         ` Joakim Tjernlund
2005-01-14 17:38           ` Joakim Tjernlund
2005-01-14 17:47             ` Tom Rini
2005-01-14 17:56               ` Joakim Tjernlund
2005-01-14 18:05                 ` Tom Rini
2005-01-14 19:51                   ` Joakim Tjernlund
2005-01-18 20:09                     ` Tom Rini
2005-01-18 22:02                       ` Joakim Tjernlund
2005-01-19 17:25                         ` Dan Malek
2005-01-19 18:06                           ` Joakim Tjernlund
2005-01-19 18:37                             ` Dan Malek
2005-01-19 19:58                               ` Joakim Tjernlund
2005-01-19  0:30                       ` Joakim Tjernlund
2005-01-19 17:28                         ` Dan Malek
2005-01-19 18:12                           ` Joakim Tjernlund
  -- strict thread matches above, loose matches on Subject: below --
2005-01-07 16:22 Tom Rini
