* 8xx v2.6 TLB problems and suggested workaround
@ 2005-04-04 19:17 Marcelo Tosatti
2005-04-04 20:09 ` Marcelo Tosatti
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Marcelo Tosatti @ 2005-04-04 19:17 UTC (permalink / raw)
To: linux-ppc-embedded, Dan Malek, Pantelis Antoniou; +Cc: Paul Mackerras
(need volunteers to test the patch below on 8xx)
Hi,
I've been investigating the 8xx update_mmu_cache() oops for the last few weeks, and
here is what I have gathered.
Oops: kernel access of bad area, sig: 11 [#1]
NIP: C00049E8 LR: C000A5D0 SP: C4F53E10 REGS: c4f53d60 TRAP: 0300 Not tainted
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 100113A0, DSISR: C2000000
TASK = c53f17e0[1224] 'a' THREAD: c4f52000
Last syscall: 47
GPR00: C783D2A0 C4F53E10 C53F17E0 10050000 00000100 0009F0A0 10050000 00000000
GPR08: 00075925 C783D2A0 C53F17E0 00000000 00076924 10077178 00000000 100B4338
GPR16: 100BBDE8 0ED792CE 7FFFF670 00000000 00000000 00000000 00000000 C4F41100
GPR24: 00000000 C4F3CAD4 C783D2A0 1005078C C4EB9140 C53861D0 04F85889 C034A0A0
NIP [c00049e8] __flush_dcache_icache+0x14/0x40
LR [c000a5d0] update_mmu_cache+0x64/0x98
Call trace:
[c003fa7c] do_no_page+0x2f8/0x370
[c003fc44] handle_mm_fault+0x88/0x160
[c0009b58] do_page_fault+0x168/0x394
[c0002c28] handle_page_fault+0xc/0x80
What is happening here is that update_mmu_cache() calls __flush_dcache_icache()
to sync the d-cache with memory and invalidate any stale i-cache entries for
the address being faulted in.
The problem is that the "dcbst" instruction will _sometimes_ (the failure/success ratio is about 1/4
with my test application) fault as a _write_ operation on the data.
The address in question is always at the very beginning of the read-only data section,
thus the write fault (as can be verified by the store bit, 0x02000000, being set in DSISR) is rejected
because the vma is marked read-only (VM_WRITE is not set in vma->vm_flags).
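To make the DSISR reading above concrete, here is a minimal C sketch that decodes the value from the oops. It assumes the PEM layout for a DSI exception with IBM bit numbering (bit 0 is the most significant bit), which may not match every 8xx revision. The helper names and `SKETCH_VM_WRITE` are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask for bit n of a 32-bit register in IBM numbering (bit 0 = MSB). */
static uint32_t ibm_mask32(unsigned n)
{
    assert(n < 32);
    return UINT32_C(1) << (31u - n);
}

/* DSISR bit positions as described by the PEM (assumed to apply here). */
#define DSISR_BIT_NO_TRANSLATION 1u /* translation not found (page fault) */
#define DSISR_BIT_STORE          6u /* set for a store, clear for a load  */

/* Hypothetical stand-in for the kernel's VM_WRITE flag. */
#define SKETCH_VM_WRITE 0x2u

static bool dsisr_is_page_fault(uint32_t dsisr)
{
    return (dsisr & ibm_mask32(DSISR_BIT_NO_TRANSLATION)) != 0;
}

static bool dsisr_is_store(uint32_t dsisr)
{
    return (dsisr & ibm_mask32(DSISR_BIT_STORE)) != 0;
}

/* Model of the rejection: a store fault on a mapping without write
 * permission cannot be satisfied, so the fault is refused. */
static bool write_fault_rejected(uint32_t dsisr, uint32_t vm_flags)
{
    return dsisr_is_store(dsisr) && (vm_flags & SKETCH_VM_WRITE) == 0;
}
```

With the DSISR value from the oops (0xC2000000) and a read-only mapping, `write_fault_rejected()` returns true, which is the path that ends in the SIGSEGV shown above.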
8xx machines running v2.6 are operating at the moment with a "tlbie()" call in
update_mmu_cache() just before __flush_dcache_icache(), which works around the problem.
I've been able to watch the "problematic" TLB entry just before update_mmu_cache().
Here it is:
SPR 824 : 0x10011f0b 268508939
BDI>rds 825
SPR 825 : 0x000001e0 480
BDI>rds 826
SPR 826 : 0x00001f00 7936
As you can see from bit 18 of the D-TLB debugging register MD_RAM1 (SPR 826), this entry
is marked as invalid, which will invoke DataTLBError in case of an access at this point
and handle the fault properly in most cases.
This is expected, and is how the sequence "DataTLBMiss" (no effective address in TLB entry) ->
"DataTLBError" (existent EA but valid bit not set) works on 8xx.
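The bit-18 check on the dumped MD_RAM1 value can be reproduced with a few lines of C. This is only a sketch: it takes the bit position from the message (bit 18, IBM numbering, bit 0 being the most significant bit), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Return bit n of a 32-bit register in IBM numbering (bit 0 = MSB). */
static unsigned ibm_bit32(uint32_t value, unsigned n)
{
    assert(n < 32);
    return (value >> (31u - n)) & 1u;
}

/* Position named in the message as the valid bit of MD_RAM1 (SPR 826). */
#define MD_RAM1_VALID_BIT 18u

/* Non-zero when the dumped D-TLB entry is marked valid. */
static int md_ram1_entry_valid(uint32_t md_ram1)
{
    return ibm_bit32(md_ram1, MD_RAM1_VALID_BIT) != 0;
}
```

For the dumped value 0x00001f00, IBM bit 18 corresponds to mask 0x00002000, which is clear, so `md_ram1_entry_valid()` reports the entry as invalid.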
Kumar Gala suggested inspection of memory which holds __flush_dcache_icache().
With the BDI I could verify that the instruction sequence is there, intact.
I'm unable to determine why a "dcbst" fault is incorrectly being treated as a WRITE operation.
That seems to be the real problem. Likely to be Yet Another CPU bug?
I've come up with a workaround which looks acceptable (unlike the tlbie one).
The solution is to jump directly from the data TLB miss exception to DataAccess, which
in turn calls do_page_fault() and friends.
This prevents dcbst from being executed to sync an address that has an "invalid" TLB entry.
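The decision the patch adds to the data TLB miss handler can be modelled in C as follows. This is an illustrative model of the control flow only, not kernel code. The enum and function names are hypothetical, and the real handler works on registers and SPRs in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum miss_action {
    MISS_LOAD_TLB,    /* PTE is non-zero: continue at label 4 and load the TLB */
    MISS_DATA_ACCESS  /* PTE is zero: branch to DataAccess (page fault path)   */
};

/* Models "lwz r10, 0(r10)" followed by the added compare against zero. */
static enum miss_action data_tlb_miss_action(const uint32_t *pte_addr)
{
    assert(pte_addr != NULL);
    uint32_t pte = *pte_addr;      /* Get the pte */
    if (pte == 0)                  /* no valid address in the pte */
        return MISS_DATA_ACCESS;   /* skip creating an invalid TLB entry */
    return MISS_LOAD_TLB;
}
```

The point of the early exit is that no TLB entry with the valid bit cleared is created for an empty PTE, so a later dcbst never finds one.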
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
--- a/arch/ppc/kernel/head_8xx.S.orig 2005-04-04 19:43:23.000000000 -0300
+++ b/arch/ppc/kernel/head_8xx.S 2005-04-04 19:47:40.000000000 -0300
@@ -359,9 +359,7 @@
. = 0x1200
DataStoreTLBMiss:
-#ifdef CONFIG_8xx_CPU6
stw r3, 8(r0)
-#endif
DO_8xx_CPU6(0x3f80, r3)
mtspr M_TW, r10 /* Save a couple of working registers */
mfcr r10
@@ -390,6 +388,16 @@
mfspr r10, MD_TWC /* ....and get the pte address */
lwz r10, 0(r10) /* Get the pte */
+ li r3, 0
+ cmpw r10, r3 /* does the pte contain a valid address? */
+ bne 4f
+ mfspr r10, M_TW /* Restore registers */
+ lwz r11, 0(r0)
+ mtcr r11
+ lwz r11, 4(r0)
+ lwz r3, 8(r0)
+ b DataAccess
+4:
/* Insert the Guarded flag into the TWC from the Linux PTE.
* It is bit 27 of both the Linux PTE and the TWC (at least
* I got that right :-). It will be better when we can put
@@ -419,9 +427,7 @@
lwz r11, 0(r0)
mtcr r11
lwz r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
lwz r3, 8(r0)
-#endif
rfi
/* This is an instruction TLB error on the MPC8xx. This could be due
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-04 19:17 8xx v2.6 TLB problems and suggested workaround Marcelo Tosatti
@ 2005-04-04 20:09 ` Marcelo Tosatti
2005-04-05 7:08 ` Pantelis Antoniou
2005-04-05 1:11 ` Kumar Gala
2005-04-05 15:58 ` 8xx v2.6 TLB problems and suggested workaround Dan Malek
2 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2005-04-04 20:09 UTC (permalink / raw)
To: linux-ppc-embedded, Dan Malek, Pantelis Antoniou; +Cc: Paul Mackerras
Hum,
The machine seems to be acting strange, but it boots normally
and applications run (more importantly there is no TLB entry
which could cause dcbst fault strangeness).
Some "dd" runs hang until I press "ctrl+c", others just work. Really strange.
G'night, I'll look at it tomorrow.
[root@(none) /]# time dd if=/dev/zero of=file bs=16k count=400
400+0 records in
400+0 records out
real 0m4.261s
user 0m0.040s
sys 0m1.240s
[root@(none) /]# time dd if=/dev/zero of=file bs=32k count=400
real 0m50.369s
user 0m0.040s
sys 0m1.680s (ctrl+c)
[root@(none) /]#
> --- a/arch/ppc/kernel/head_8xx.S.orig 2005-04-04 19:43:23.000000000 -0300
> +++ b/arch/ppc/kernel/head_8xx.S 2005-04-04 19:47:40.000000000 -0300
> @@ -359,9 +359,7 @@
>
> . = 0x1200
> DataStoreTLBMiss:
> -#ifdef CONFIG_8xx_CPU6
> stw r3, 8(r0)
> -#endif
> DO_8xx_CPU6(0x3f80, r3)
> mtspr M_TW, r10 /* Save a couple of working registers */
> mfcr r10
> @@ -390,6 +388,16 @@
> mfspr r10, MD_TWC /* ....and get the pte address */
> lwz r10, 0(r10) /* Get the pte */
>
> + li r3, 0
> + cmpw r10, r3 /* does the pte contain a valid address? */
> + bne 4f
> + mfspr r10, M_TW /* Restore registers */
> + lwz r11, 0(r0)
> + mtcr r11
> + lwz r11, 4(r0)
> + lwz r3, 8(r0)
> + b DataAccess
> +4:
> /* Insert the Guarded flag into the TWC from the Linux PTE.
> * It is bit 27 of both the Linux PTE and the TWC (at least
> * I got that right :-). It will be better when we can put
> @@ -419,9 +427,7 @@
> lwz r11, 0(r0)
> mtcr r11
> lwz r11, 4(r0)
> -#ifdef CONFIG_8xx_CPU6
> lwz r3, 8(r0)
> -#endif
> rfi
>
> /* This is an instruction TLB error on the MPC8xx. This could be due
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-04 20:09 ` Marcelo Tosatti
@ 2005-04-05 7:08 ` Pantelis Antoniou
0 siblings, 0 replies; 28+ messages in thread
From: Pantelis Antoniou @ 2005-04-05 7:08 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Paul Mackerras, linux-ppc-embedded
Marcelo Tosatti wrote:
> Hum,
>
> The machine seems to be acting strange, but it boots normally
> and applications run (more importantly there is no TLB entry
> which could cause dcbst fault strangeness).
>
> Some "dd" hangs till I press "ctrl+c", others just work. Really strange.
>
> G'night, I'll look at it tomorrow.
>
> [root@(none) /]# time dd if=/dev/zero of=file bs=16k count=400
> 400+0 records in
> 400+0 records out
>
> real 0m4.261s
> user 0m0.040s
> sys 0m1.240s
> [root@(none) /]# time dd if=/dev/zero of=file bs=32k count=400
>
>
> real 0m50.369s
> user 0m0.040s
> sys 0m1.680s (ctrl+c)
> [root@(none) /]#
>
>
I can confirm that the patch works.
I no longer need the tlbie in update_mmu_cache.
/tmp # time dd if=/dev/zero of=file bs=16k count=400
400+0 records in
400+0 records out
real 0m 0.55s
user 0m 0.01s
sys 0m 0.52s
/tmp is tmpfs
Well done Marcelo!
Regards
Pantelis
P.S. CPU errata perhaps?
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-04 19:17 8xx v2.6 TLB problems and suggested workaround Marcelo Tosatti
2005-04-04 20:09 ` Marcelo Tosatti
@ 2005-04-05 1:11 ` Kumar Gala
2005-04-05 3:14 ` PPC linux v2.6.11 network configuration hangs Pari Subramaniam
2005-04-05 15:58 ` 8xx v2.6 TLB problems and suggested workaround Dan Malek
2 siblings, 1 reply; 28+ messages in thread
From: Kumar Gala @ 2005-04-05 1:11 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Paul Mackerras, linux-ppc-embedded
Marcelo,
It would be useful to add a comment explaining why we are doing this, so that if it
ends up being a CPU erratum we at least know why.
- kumar
* PPC linux v2.6.11 network configuration hangs
2005-04-05 1:11 ` Kumar Gala
@ 2005-04-05 3:14 ` Pari Subramaniam
0 siblings, 0 replies; 28+ messages in thread
From: Pari Subramaniam @ 2005-04-05 3:14 UTC (permalink / raw)
To: 'linux-ppc-embedded'
Hi,
We have an 8540-based board running PPC port ver-2.4.30-pre1. When I tried to
upgrade to ver-2.6.11, the network interface (only the TSEC is enabled) loops
indefinitely in gfar_probe() at the following while loop:
/* Stop the DMA engine now, in case it was running before */
/* (The firmware could have used it, and left it running). */
/* To do this, we write Graceful Receive Stop and Graceful */
/* Transmit Stop, and then wait until the corresponding bits */
/* in IEVENT indicate the stops have completed. */
tempval = gfar_read(&priv->regs->dmactrl);
tempval &= ~(DMACTRL_GRS | DMACTRL_GTS);
gfar_write(&priv->regs->dmactrl, tempval);
tempval = gfar_read(&priv->regs->dmactrl);
tempval |= (DMACTRL_GRS | DMACTRL_GTS);
gfar_write(&priv->regs->dmactrl, tempval);
/* ---------------- stays in this loop forever ---------------- */
while (!(gfar_read(&priv->regs->ievent) & (IEVENT_GRSC | IEVENT_GTSC)))
	cpu_relax();
/* ------------------------------------------------------------ */
The MPC8540-based system runs the U-Boot boot loader, version 1.1.2. The TSEC port
has been tested from the boot loader. The same behavior is observed on all the boards.
I would appreciate any help in this regard.
Thanks in advance
regards
-pari
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-04 19:17 8xx v2.6 TLB problems and suggested workaround Marcelo Tosatti
2005-04-04 20:09 ` Marcelo Tosatti
2005-04-05 1:11 ` Kumar Gala
@ 2005-04-05 15:58 ` Dan Malek
2005-04-05 11:41 ` Marcelo Tosatti
2005-04-06 6:00 ` Pantelis Antoniou
2 siblings, 2 replies; 28+ messages in thread
From: Dan Malek @ 2005-04-05 15:58 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Paul Mackerras, linux-ppc-embedded
On Apr 4, 2005, at 3:17 PM, Marcelo Tosatti wrote:
> Problem is that the "dcbst" instruction will, _sometimes_ (the
> failure/success rate is about 1/4
> with my test application) fault as a _write_ operation on the data.
Oh, geeze .... It's all coming back to me now ....
The 8xx cache operations don't always operate as defined in the PEM.
There are likely to be some archive discussions within the Freescale
knowledge data base that describe the different behaviors I've seen
with the chip variants and revisions. I can't find any of those e-mail
discussions, so I'll try to recall from memory.
The PEM cache instructions are all implemented in a microcode that
uses the 8xx unique cache control SPRs. Depending upon the state
of the cache and MMU, it seems in some cases the EA translation is
subject to a "normal" protection match instead of a load operation
match.
The behavior of these operations isn't consistent across all of the 8xx
processor revisions, especially with early silicon if people are still
using those. During conversations with Freescale engineers, it seems
the only guaranteed operation was to use the 8xx unique SPRs, but
I think I only did that in 8xx specific functions.
We have way too much code in the TLB exception handlers already,
so let's just try a tlbie of the EA in the update_mmu_cache, with an #ifdef
for the 8xx. It seems if the dcbst causes a TLB miss during execution,
it does the right thing. We may want to make the dcbxxx instructions some
kind of macro, so on 8xx we can include such operations in otherwise
"standard" software.
Thanks for the great work!
-- Dan
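The alternative suggested in this message (invalidate the TLB entry for the EA before the cache flush, only on 8xx) could look roughly like the C sketch below. It builds outside the kernel, so the two `stub_` functions are hypothetical stand-ins that only record the order of calls, and `SKETCH_8XX` is a made-up switch playing the role of the 8xx #ifdef.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call recorder so the ordering can be inspected. */
enum op { OP_TLB_INVALIDATE, OP_CACHE_FLUSH };
static enum op call_log[8];
static size_t call_count;

static void log_op(enum op o)
{
    assert(call_count < sizeof(call_log) / sizeof(call_log[0]));
    call_log[call_count++] = o;
}

/* Stand-ins for the TLB invalidate and the dcbst/icbi flush routine. */
static void stub_tlb_invalidate(unsigned long ea) { (void)ea; log_op(OP_TLB_INVALIDATE); }
static void stub_flush_dcache_icache(unsigned long ea) { (void)ea; log_op(OP_CACHE_FLUSH); }

#define SKETCH_8XX 1 /* made-up switch standing in for the 8xx #ifdef */

static void sync_caches_for_fault(unsigned long ea)
{
#if SKETCH_8XX
    /* Drop any (possibly invalid) TLB entry so dcbst takes a clean miss. */
    stub_tlb_invalidate(ea);
#endif
    stub_flush_dcache_icache(ea);
}
```

The trade-off is that the invalidate runs on every call, including callers that have already invalidated the entry.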
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-05 15:58 ` 8xx v2.6 TLB problems and suggested workaround Dan Malek
@ 2005-04-05 11:41 ` Marcelo Tosatti
2005-04-05 20:26 ` Marcelo Tosatti
2005-04-06 6:00 ` Pantelis Antoniou
1 sibling, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2005-04-05 11:41 UTC (permalink / raw)
To: Dan Malek; +Cc: Paul Mackerras, linux-ppc-embedded
On Tue, Apr 05, 2005 at 11:58:17AM -0400, Dan Malek wrote:
>
> On Apr 4, 2005, at 3:17 PM, Marcelo Tosatti wrote:
>
> >Problem is that the "dcbst" instruction will, _sometimes_ (the
> >failure/success rate is about 1/4
> >with my test application) fault as a _write_ operation on the data.
>
> Oh, geeze .... It's all coming back to me now ....
>
> The 8xx cache operations don't always operate as defined in the PEM.
> There are likely to be some archive discussions within the Freescale
> knowledge data base that describe the different behaviors I've seen
> with the chip variants and revisions. I can't find any of those e-mail
> discussions, so I'll try to recall from memory.
>
> The PEM cache instructions are all implemented in a microcode that
> uses the 8xx unique cache control SPRs. Depending upon the state
> of the cache and MMU, it seems in some cases the EA translation is
> subject to a "normal" protection match instead of a load operation
> match.
>
> The behavior of these operations isn't consistent across all of the 8xx
> processor revisions, especially with early silicon if people are still
> using those. During conversations with Freescale engineers, it seems
> the only guaranteed operation was to use the 8xx unique SPRs, but
> I think I only did that in 8xx specific functions.
How sweet. :)
> We have way too much code in the TLB exception handlers already,
> so let's just try a tlbia of the EA in the update_mmu_cache, with an
> #ifdef
> for the 8xx.
Are you sure this is the best solution?
The problem is that update_mmu_cache() is called from other contexts where
the TLB invalidate is not necessary (because the entry has already been invalidated).
For example, all ptep_set_access_flags() (which does the TLB invalidate) ->
update_mmu_cache() sequences.
Moreover jumping directly from DataTLBMiss to the page fault handler
shortcuts the process: there is no need to jump back to execution if we
know in advance that DataTLBError exception is going to happen.
But hey, you are the boss. Even with the above facts you prefer
to leave the DataTLBMiss untouched?
About size: I think it is the smallest exception handler present.
> It seems if the dcbst causes a TLB miss during execution,
> it does the right thing.
It should always cause a miss because the TLB entry is marked as invalid
(DataTLBMiss just created the invalid TLB entry).
So even when a miss happens, it can do the wrong thing.
Right?
> We may want to make the dcbxxx instructions
> some
> kind of macro, so on 8xx we can include such operations in otherwise
> "standard" software.
I'm a bit lost here: you're talking about the kernel side of things only
or userspace also?
The latter would require "GNU as" dcbxxx macro? Hum...
> Thanks for the great work!
Your help has been invaluable!
I feel very good after many days of debugging pain. :)
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-05 11:41 ` Marcelo Tosatti
@ 2005-04-05 20:26 ` Marcelo Tosatti
0 siblings, 0 replies; 28+ messages in thread
From: Marcelo Tosatti @ 2005-04-05 20:26 UTC (permalink / raw)
To: Dan Malek; +Cc: Paul Mackerras, linux-ppc-embedded
On Tue, Apr 05, 2005 at 08:41:09AM -0300, Marcelo Tosatti wrote:
> On Tue, Apr 05, 2005 at 11:58:17AM -0400, Dan Malek wrote:
> >
> > On Apr 4, 2005, at 3:17 PM, Marcelo Tosatti wrote:
> >
> > >Problem is that the "dcbst" instruction will, _sometimes_ (the
> > >failure/success rate is about 1/4
> > >with my test application) fault as a _write_ operation on the data.
> >
> > Oh, geeze .... It's all coming back to me now ....
> >
> > The 8xx cache operations don't always operate as defined in the PEM.
> > There are likely to be some archive discussions within the Freescale
> > knowledge data base that describe the different behaviors I've seen
> > with the chip variants and revisions. I can't find any of those e-mail
> > discussions, so I'll try to recall from memory.
> >
> > The PEM cache instructions are all implemented in a microcode that
> > uses the 8xx unique cache control SPRs. Depending upon the state
> > of the cache and MMU, it seems in some cases the EA translation is
> > subject to a "normal" protection match instead of a load operation
> > match.
> >
> > The behavior of these operations isn't consistent across all of the 8xx
> > processor revisions, especially with early silicon if people are still
> > using those. During conversations with Freescale engineers, it seems
> > the only guaranteed operation was to use the 8xx unique SPRs, but
> > I think I only did that in 8xx specific functions.
>
> How sweet. :)
>
> > We have way too much code in the TLB exception handlers already,
> > so let's just try a tlbia of the EA in the update_mmu_cache, with an
> > #ifdef
> > for the 8xx.
>
> Are you sure this is the best solution ?
>
> Problem is that update_mmu_cache() is called from other context's where
> the tlb invalidate is not necessary (because it has already been invalidated).
>
> For example all ptep_set_access_flags() (which does the tlb invalidate) ->
> update_mmu_cache() sequences.
>
> Moreover jumping directly from DataTLBMiss to the page fault handler
> shortcuts the process: there is no need to jump back to execution if we
> know in advance that DataTLBError exception is going to happen.
>
> But hey, you are the boss. Even with the above facts you prefer
> to leave the DataTLBMiss untouched?
>
> About size: I think it is the smaller expection handler present.
Well, you know what you're talking about. Whatever you prefer.
Can we just ask someone to send the _tlbie patch around #ifdef CONFIG_M8XX?
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-05 15:58 ` 8xx v2.6 TLB problems and suggested workaround Dan Malek
2005-04-05 11:41 ` Marcelo Tosatti
@ 2005-04-06 6:00 ` Pantelis Antoniou
1 sibling, 0 replies; 28+ messages in thread
From: Pantelis Antoniou @ 2005-04-06 6:00 UTC (permalink / raw)
To: Dan Malek; +Cc: Paul Mackerras, linux-ppc-embedded
Dan Malek wrote:
>
> On Apr 4, 2005, at 3:17 PM, Marcelo Tosatti wrote:
>
>> Problem is that the "dcbst" instruction will, _sometimes_ (the
>> failure/success rate is about 1/4
>> with my test application) fault as a _write_ operation on the data.
>
>
> Oh, geeze .... It's all coming back to me now ....
>
> The 8xx cache operations don't always operate as defined in the PEM.
> There are likely to be some archive discussions within the Freescale
> knowledge data base that describe the different behaviors I've seen
> with the chip variants and revisions. I can't find any of those e-mail
> discussions, so I'll try to recall from memory.
>
> The PEM cache instructions are all implemented in a microcode that
> uses the 8xx unique cache control SPRs. Depending upon the state
> of the cache and MMU, it seems in some cases the EA translation is
> subject to a "normal" protection match instead of a load operation match.
>
OK, maybe we should make 8xx-specific cache flushing functions that
use the SPRs, and forget about this mess.
However, is this problem also triggered by user space? If it is, we should
try to maintain compatibility...
> The behavior of these operations isn't consistent across all of the 8xx
> processor revisions, especially with early silicon if people are still
> using those. During conversations with Freescale engineers, it seems
> the only guaranteed operation was to use the 8xx unique SPRs, but
> I think I only did that in 8xx specific functions.
>
> We have way too much code in the TLB exception handlers already,
> so let's just try a tlbia of the EA in the update_mmu_cache, with an #ifdef
> for the 8xx. It seems if the dcbst causes a TLB miss during execution,
> it does the right thing. We may want to make the dcbxxx instructions some
> kind of macro, so on 8xx we can include such operations in otherwise
> "standard" software.
>
> Thanks for the great work!
>
>
> -- Dan
>
>
>
Regards
Pantelis
* 8xx v2.6 TLB problems and suggested workaround
@ 2005-04-05 21:51 Joakim Tjernlund
2005-04-06 12:16 ` Marcelo Tosatti
0 siblings, 1 reply; 28+ messages in thread
From: Joakim Tjernlund @ 2005-04-05 21:51 UTC (permalink / raw)
To: marcelo.tosatti, linuxppc-embedded
Hi Marcelo
Reading your report it doesn't sound likely, but I will ask anyway:
Is it possible that the problem you are seeing is caused by the
"famous" CPU bug mentioned here:
http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016351.html
The DTLB error handler needs DAR to be set correctly, and since the
dcbX instructions don't set DAR in either DTLB Miss or DTLB Error you
may end up trying to fix the wrong address.
Jocke
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-05 21:51 Joakim Tjernlund
@ 2005-04-06 12:16 ` Marcelo Tosatti
2005-04-06 21:24 ` Joakim Tjernlund
0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2005-04-06 12:16 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-embedded
On Tue, Apr 05, 2005 at 11:51:42PM +0200, Joakim Tjernlund wrote:
> Hi Marcelo
>
> Reading your report it doesn't sound likely but I will ask anyway:
> Is it possible that the problem you are seeing isn't caused by the
> "famous" CPU bug mentioned here:
> http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016351.html
>
> The DTLB error handler needs DAR to be set correctly and since the
> dcbX instructions doesn't set DAR in either DTLB Miss nor DTLB Error you
> may end up trying to fix the wrong address.
Hi Joakim,
First of all, thanks for your care!
Well, I don't think the above issue is exactly what we're hitting, because
DAR is correctly updated in our case with "dcbst".
The problem is that it is treated as a write operation, but it shouldn't be.
Maybe it is related to dcbst's inability to set DAR?
BTW, about the CPU15 bug fix, has there been any effort to port/merge
it in v2.6 ?
* RE: 8xx v2.6 TLB problems and suggested workaround
2005-04-06 12:16 ` Marcelo Tosatti
@ 2005-04-06 21:24 ` Joakim Tjernlund
2005-04-07 12:00 ` Marcelo Tosatti
0 siblings, 1 reply; 28+ messages in thread
From: Joakim Tjernlund @ 2005-04-06 21:24 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linuxppc-embedded
> On Tue, Apr 05, 2005 at 11:51:42PM +0200, Joakim Tjernlund wrote:
> > Hi Marcelo
> >
> > Reading your report it doesn't sound likely but I will ask anyway:
> > Is it possible that the problem you are seeing isn't caused by the
> > "famous" CPU bug mentioned here:
> > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016351.html
> >
> > The DTLB error handler needs DAR to be set correctly and since the
> > dcbX instructions doesn't set DAR in either DTLB Miss nor DTLB Error you
> > may end up trying to fix the wrong address.
>
> Hi Joakim,
>
> First of all, thanks your care!
NP, I want to be able to run 8xx on 2.6 in the future.
>
> Well, I dont think the above issue is exactly what we're hitting because
> DAR is correctly updated on our case with "dcbst".
Are you sure? I can't remember all the details, but this looks a bit strange to me:
SPR 826 : 0x00001f00 7936
Isn't 0x00001 supposed to be the physical page?
Also DSISR: C2000000 looks strange and "impossible". Are you sure this value
is correct?
I don't understand why the "tlbie()" call works around the problem. Can you
explain that a bit more?
>
> The problem is that it is treated as a write operation, but shouldnt.
>
> Maybe it is related to dcbst's inability to set DAR?
Could be, but even if it isn't, you are in trouble when dcbX instructions
generate DTLB Misses/Errors. Sooner or later you will end up with
strange SEGVs or hangs.
>
> BTW, about the CPU15 bug fix, has there been any effort to port/merge
> it in v2.6 ?
None that I know.
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-06 21:24 ` Joakim Tjernlund
@ 2005-04-07 12:00 ` Marcelo Tosatti
2005-04-07 20:35 ` Joakim Tjernlund
0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2005-04-07 12:00 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-embedded
On Wed, Apr 06, 2005 at 11:24:46PM +0200, Joakim Tjernlund wrote:
> > On Tue, Apr 05, 2005 at 11:51:42PM +0200, Joakim Tjernlund wrote:
> > > Hi Marcelo
> > >
> > > Reading your report it doesn't sound likely but I will ask anyway:
> > > Is it possible that the problem you are seeing isn't caused by the
> > > "famous" CPU bug mentioned here:
> > > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016351.html
> > >
> > > The DTLB error handler needs DAR to be set correctly and since the
> > > dcbX instructions doesn't set DAR in either DTLB Miss nor DTLB Error you
> > > may end up trying to fix the wrong address.
> >
> > Hi Joakim,
> >
> > First of all, thanks your care!
>
> NP, I want to be able to run 8xx on 2.6 in the future.
>
> >
> > Well, I dont think the above issue is exactly what we're hitting because
> > DAR is correctly updated on our case with "dcbst".
>
> Are you sure? Cant remeber all details but this looks a bit strange to me
> SPR 826 : 0x00001f00 7936
> is not 0x00001 supposed to be the physical page?
SPR 826 contains the page attributes, not Physical Page Number (which is held
by SPR 825).
> Also DSISR: C2000000 looks strange and "impossible". Are you sure this value
> is correct?
As defined by the PEM, bit 1 indicates "data-store error exception", bit 2
indicates:
"Set if the translation of an attempted access is not found in the primary hash
table entry group (HTEG), or in the rehashed secondary HTEG, or in the range of a
DBAT register (page fault condition); otherwise cleared."
And bit 6 indicates a store operation (shouldn't be set).
> Don't understand why the "tlbie()" call works around the problem. Can you
> explain that a bit more?
It must be because the TLB entry is now removed from the cache, which keeps
dcbst from faulting as a store.
There must be some relation between the invalid-but-present TLB entry and the
dcbst misbehaviour.
I didn't check what happens with the TLB after tlbie(); I should do that.
But I suppose the entry gets wiped?
> > The problem is that it is treated as a write operation, but shouldnt.
> >
> > Maybe it is related to dcbst's inability to set DAR?
>
> Could be, but even if it isn't you are in trouble when dcbX instr.
> generates DTLB Misses/Errors Sooner or later you will end up with
> strange SEGV or hangs.
Hangs due to the dcbX misbehaviour wrt DAR setting, you mean? (which your
patch corrects).
Yep, that makes sense.
> > BTW, about the CPU15 bug fix, has there been any effort to port/merge
> > it in v2.6 ?
>
> None that I know.
* RE: 8xx v2.6 TLB problems and suggested workaround
2005-04-07 12:00 ` Marcelo Tosatti
@ 2005-04-07 20:35 ` Joakim Tjernlund
2005-04-07 19:38 ` Marcelo Tosatti
0 siblings, 1 reply; 28+ messages in thread
From: Joakim Tjernlund @ 2005-04-07 20:35 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linuxppc-embedded
> -----Original Message-----
> From: Marcelo Tosatti [mailto:marcelo.tosatti@cyclades.com]
> Sent: den 7 april 2005 14:00
> On Wed, Apr 06, 2005 at 11:24:46PM +0200, Joakim Tjernlund wrote:
> > > On Tue, Apr 05, 2005 at 11:51:42PM +0200, Joakim Tjernlund wrote:
> > > > Hi Marcelo
> > > >
> > > > Reading your report it doesn't sound likely but I will ask anyway:
> > > > Is it possible that the problem you are seeing isn't caused by the
> > > > "famous" CPU bug mentioned here:
> > > > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016351.html
> > > >
> > > > The DTLB error handler needs DAR to be set correctly and since the
> > > > dcbX instructions doesn't set DAR in either DTLB Miss nor DTLB Error you
> > > > may end up trying to fix the wrong address.
> > >
> > > Hi Joakim,
> > >
> > > First of all, thanks your care!
> >
> > NP, I want to be able to run 8xx on 2.6 in the future.
> >
> > >
> > > Well, I dont think the above issue is exactly what we're hitting because
> > > DAR is correctly updated on our case with "dcbst".
> >
> > Are you sure? Cant remeber all details but this looks a bit strange to me
> > SPR 826 : 0x00001f00 7936
> > is not 0x00001 supposed to be the physical page?
>
> SPR 826 contains the page attributes, not Physical Page Number (which is held
> by SPR 825).
Yes, my memory is getting really bad :)
Does SPR 825 hold the correct physical page? 0x000001e0 looks like
zero to me (I should probably bring the manual home so I don't have to rely on
my bad memory :)
>
> > Also DSISR: C2000000 looks strange and "impossible". Are you sure this value
> > is correct?
>
> As defined by the PEM, bit 1 indicates "data-store error exception", bit 2
> indicates:
>
> "Set if the translation of an attempted access is not found in the primary hash
> table entry group (HTEG), or in the rehashed secondary HTEG, or in the range of a
> DBAT register (page fault condition); otherwise cleared."
>
> And bit 6 indicates a store operation (shouldnt be set).
Yes, but bit 0 is also set and if I remember correctly (don't have the manual handy)
it should always be zero?
>
> > Don't understand why the "tlbie()" call works around the problem. Can you
> > explain that a bit more?
>
> It must be because the TLB entry is now removed from the cache, which avoids
> dcbst from faulting as a store.
>
> There must be some relation to the invalid present TLB entry and dcbst
> misbehaviour.
>
> I didnt check what happens with the TLB after tlbie(), I should do that.
> But I suppose it gets wiped off?
Unless the pte gets populated (valid) before the next TLB miss, I think you
will repeat the same sequence that caused the error in the first place. So
why does that work?
>
> > > The problem is that it is treated as a write operation, but shouldnt.
> > >
> > > Maybe it is related to dcbst's inability to set DAR?
> >
> > Could be, but even if it isn't you are in trouble when dcbX instr.
> > generates DTLB Misses/Errors Sooner or later you will end up with
> > strange SEGV or hangs.
>
> Hangs due to the dcbX misbehaviour wrt DAR setting, you mean? (which your
> patch corrects).
Yes.
>
> Yep, that makes sense.
>
> > > BTW, about the CPU15 bug fix, has there been any effort to port/merge
> > > it in v2.6 ?
> >
> > None that I know.
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-07 20:35 ` Joakim Tjernlund
@ 2005-04-07 19:38 ` Marcelo Tosatti
2005-04-08 2:09 ` Dan Malek
2005-04-08 8:01 ` Joakim Tjernlund
0 siblings, 2 replies; 28+ messages in thread
From: Marcelo Tosatti @ 2005-04-07 19:38 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-embedded
Joakim,
On Thu, Apr 07, 2005 at 10:35:30PM +0200, Joakim Tjernlund wrote:
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:marcelo.tosatti@cyclades.com]
> > Sent: den 7 april 2005 14:00
> > On Wed, Apr 06, 2005 at 11:24:46PM +0200, Joakim Tjernlund wrote:
> > > > On Tue, Apr 05, 2005 at 11:51:42PM +0200, Joakim Tjernlund wrote:
> > > > > Hi Marcelo
> > > > >
> > > > > Reading your report it doesn't sound likely but I will ask anyway:
> > > > > Is it possible that the problem you are seeing isn't caused by the
> > > > > "famous" CPU bug mentioned here:
> > > > > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016351.html
> > > > >
> > > > > The DTLB error handler needs DAR to be set correctly and since the
> > > > > dcbX instructions doesn't set DAR in either DTLB Miss nor DTLB Error you
> > > > > may end up trying to fix the wrong address.
> > > >
> > > > Hi Joakim,
> > > >
> > > > First of all, thanks your care!
> > >
> > > NP, I want to be able to run 8xx on 2.6 in the future.
> > >
> > > >
> > > > Well, I dont think the above issue is exactly what we're hitting because
> > > > DAR is correctly updated on our case with "dcbst".
> > >
> > > Are you sure? Cant remeber all details but this looks a bit strange to me
> > > SPR 826 : 0x00001f00 7936
> > > is not 0x00001 supposed to be the physical page?
> >
> > SPR 826 contains the page attributes, not Physical Page Number (which is held
> > by SPR 825).
>
> Yes, my memory is getting really bad :)
>
> Does SPR 825 hould the correct physical page? 0x000001e0 looks like
> Zero to me(I should probably bring the manual home so i don't have the rely on
> my bad memory :)
Yes, it is zero. That is because there is no pte entry for the page yet (DataStoreTLBMiss
loads the TLB from the pte even if it's zero). That's when DataTLBError (EA present in TLB
entry but valid bit not set) gets called.
> > > Also DSISR: C2000000 looks strange and "impossible". Are you sure this value
> > > is correct?
> >
> > As defined by the PEM, bit 1 indicates "data-store error exception", bit 2
> > indicates:
I meant "bit 0 and bit 1".
> > "Set if the translation of an attempted access is not found in the primary hash
> > table entry group (HTEG), or in the rehashed secondary HTEG, or in the range of a
> > DBAT register (page fault condition); otherwise cleared."
> >
> > And bit 6 indicates a store operation (shouldnt be set).
>
> Yes, but bit 0 is also set and if I remember correctly(don't have the manual handy)
> it should always be zero?
Well, bit 0 and bit 1 are set.
> > > Don't understand why the "tlbie()" call works around the problem. Can you
> > > explain that a bit more?
> >
> > It must be because the TLB entry is now removed from the cache, which avoids
> > dcbst from faulting as a store.
> >
> > There must be some relation to the invalid present TLB entry and dcbst
> > misbehaviour.
> >
> > I didnt check what happens with the TLB after tlbie(), I should do that.
> > But I suppose it gets wiped off?
>
> Unless the pte gets populated(valid) before the next TLB miss I think you
> will repeat the same sequence that caused the error in the first place.
> So why does that work?
It does get populated.
The sequence is:
1) userspace access triggers DataTLBMiss
2) DataTLBMiss sets TLB from Linux pte. At this stage pte entry is still
zeroed (pte table entry clear). That's why PPN points to page "00000".
3) DataTLBError (TLB EA match but valid bit not set) - jumps to page fault
handler
4) do_no_page()
- allocates a page
- set pte accordingly
- update_mmu_cache() (dcbst access faults as a write)
So, there must be some relation between dcbst's misbehaviour and the _invalid_
zero RPN TLB entry.
Thing is, dcbst is not supposed to fault as a store operation, from what the PEM
indicates.
As I understand it, the 8xx deviates from other PPCs in many respects. Dan says:
"The PEM cache instructions are all implemented in a microcode that
uses the 8xx unique cache control SPRs. Depending upon the state
of the cache and MMU, it seems in some cases the EA translation is
subject to a "normal" protection match instead of a load operation
match.
The behavior of these operations isn't consistent across all of the 8xx
processor revisions, especially with early silicon if people are still
using those. During conversations with Freescale engineers, it seems
the only guaranteed operation was to use the 8xx unique SPRs, but
I think I only did that in 8xx specific functions."
I'll check what the tlbie does precisely (tomorrow). I suppose it wipes the TLB
entry completely.
Would be nice to have someone from the 8xx team look into this?
> > > > The problem is that it is treated as a write operation, but shouldnt.
> > > >
> > > > Maybe it is related to dcbst's inability to set DAR?
> > >
> > > Could be, but even if it isn't you are in trouble when dcbX instr.
> > > generates DTLB Misses/Errors Sooner or later you will end up with
> > > strange SEGV or hangs.
> >
> > Hangs due to the dcbX misbehaviour wrt DAR setting, you mean? (which your
> > patch corrects).
>
> Yes.
>
> >
> > Yep, that makes sense.
> >
> > > > BTW, about the CPU15 bug fix, has there been any effort to port/merge
> > > > it in v2.6 ?
> > >
> > > None that I know.
I'll try cpu15.c on v2.6 tomorrow.
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-07 19:38 ` Marcelo Tosatti
@ 2005-04-08 2:09 ` Dan Malek
2005-04-08 11:07 ` Marcelo Tosatti
2005-04-08 8:01 ` Joakim Tjernlund
1 sibling, 1 reply; 28+ messages in thread
From: Dan Malek @ 2005-04-08 2:09 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Joakim Tjernlund, linuxppc-embedded
On Apr 7, 2005, at 3:38 PM, Marcelo Tosatti wrote:
> Would be nice to have someone from 8xx team look into this?
I'll look into it and find some solution. I suspect it is an
interaction with the previous TLB miss and the behavior
of the dcbst TLB look up. Perhaps, if we ensure the
TLB entry is not valid at the time of the dcbst, it will work.
This is why the tlbie() I added as a hack a long time
ago made the "problem" disappear. The other dcbxx
instructions in the code work on already existing pages,
while this one is a special case of a miss on a page
that doesn't exist.
Thanks.
-- Dan
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-08 2:09 ` Dan Malek
@ 2005-04-08 11:07 ` Marcelo Tosatti
2005-04-09 5:16 ` Dan Malek
0 siblings, 1 reply; 28+ messages in thread
From: Marcelo Tosatti @ 2005-04-08 11:07 UTC (permalink / raw)
To: Dan Malek; +Cc: Joakim Tjernlund, linuxppc-embedded
Hi Dan,
On Thu, Apr 07, 2005 at 10:09:58PM -0400, Dan Malek wrote:
>
> On Apr 7, 2005, at 3:38 PM, Marcelo Tosatti wrote:
>
> >Would be nice to have someone from 8xx team look into this?
>
> I'll look into it and find some solution.
What do you envision as another solution?
We have two now:
1) _tlbie() in update_mmu_cache(), surrounded by a CONFIG_8xx #ifdef.
Did you give up on it?
2) Jump directly from DataTLBMiss to the fault handler.
You seem to dislike it.
What else do you think can be done?
> I suspect it is an interaction with the previous TLB miss and the behavior
> of the dcbst TLB look up. Perhaps, if we ensure the
> TLB entry is not valid at the time of the dcbst, it will work.
Note that the TLB entry is _not valid_ at the time of the dcbst:
BDI>rds 824
SPR 824 : 0x10011f05 268508933
BDI>rds 825
SPR 825 : 0x000001e0 480
BDI>rds 826
SPR 826 : 0x00001f00 7936
Bit 18 (the valid bit) of SPR 826 is not set.
So even with the TLB entry invalid, dcbst misbehaves.
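The valid-bit reading of the SPR 826 dump can be reproduced in a few lines
of C. This is only a minimal sketch, assuming PowerPC bit numbering (bit 0 is
the most significant bit of the 32-bit register); PPC_BIT, TLB_ENTRY_VALID and
tlb_entry_is_valid() are names made up for illustration, not kernel symbols:

```c
#include <stdint.h>

/* PowerPC numbers bits from the MSB: bit 0 is 0x80000000, bit 31 is 0x1. */
#define PPC_BIT(n) (UINT32_C(1) << (31 - (n)))

/* Bit 18 of SPR 826 is taken to be the TLB entry's valid bit, as stated
 * in this message; in a conventional mask that is 0x00002000. */
#define TLB_ENTRY_VALID PPC_BIT(18)

/* Hypothetical helper: returns 1 if a dumped SPR 826 value has the
 * valid bit set, 0 otherwise. */
static int tlb_entry_is_valid(uint32_t spr826)
{
    return (spr826 & TLB_ENTRY_VALID) != 0;
}
```

For the dumped value 0x00001f00 the helper returns 0, since only the mask
bits 0x1000 through 0x0100 are set.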
> This is why the tlbie() I added as a hack a long time
> ago made the "problem" disappear. The other dcbxx
> instructions in the code work on already existing pages,
> while this one is a special case of a miss on a page
> that doesn't exist.
>
> Thanks.
>
> -- Dan
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-08 11:07 ` Marcelo Tosatti
@ 2005-04-09 5:16 ` Dan Malek
2005-04-09 19:03 ` Joakim Tjernlund
0 siblings, 1 reply; 28+ messages in thread
From: Dan Malek @ 2005-04-09 5:16 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Joakim Tjernlund, linuxppc-embedded
On Apr 8, 2005, at 7:07 AM, Marcelo Tosatti wrote:
> 1) _tlbie() on update_mmu_cache() surrounded by CONFIG_8xx #ifdef
> Did you give up about it?
I think a tlbia() of the vaddr should work here. No sense blowing
away the whole TLB cache for this.
> What else you think can be done?
It would be interesting to change __flush_dcache_icache()
to use the 8xx SPR cache operations instead of the dcbst instruction.
I wouldn't be surprised if it worked differently, but I'd not be
able to explain it :-)
Thanks.
-- Dan
* RE: 8xx v2.6 TLB problems and suggested workaround
2005-04-09 5:16 ` Dan Malek
@ 2005-04-09 19:03 ` Joakim Tjernlund
2005-04-09 22:37 ` Marcelo Tosatti
2005-04-23 21:55 ` Dan Malek
0 siblings, 2 replies; 28+ messages in thread
From: Joakim Tjernlund @ 2005-04-09 19:03 UTC (permalink / raw)
To: Dan Malek, Marcelo Tosatti; +Cc: linuxppc-embedded
>
> On Apr 8, 2005, at 7:07 AM, Marcelo Tosatti wrote:
>
> > 1) _tlbie() on update_mmu_cache() surrounded by CONFIG_8xx #ifdef
> > Did you give up about it?
>
> I think a tlbia() of the vaddr should work here. No sense blowing
> away the whole TLB cache for this.
Umm, isn't it the other way around? tlbie flushes one TLB entry whereas tlbia
flushes all TLB entries.
> > What else you think can be done?
>
> It would be interesting to change __flush_dcache_icache()
> to use the 8xx SPR cache operations instead of the dcbst instruction.
Yes, but I think these operate on physical addresses, which makes it a bit harder.
I still think this can be resolved in fault.c. Replace
    andis.  r11, r10, 0x0200   /* If set, indicates store op */
    beq     2f
in the DTLB Error handler with
    andis.  r11, r10, 0x4800   /* If set, indicates invalid pte or protection violation */
    bne     2f
In fault.c you can check whether both store and invalid are set simultaneously. If they
are, clear the store flag and continue as usual.
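The fault.c side of this idea might look roughly like the C below. It is a
sketch under stated assumptions, not the actual kernel change: the macro
names and fixup_8xx_dsisr() are hypothetical, and the bit meanings (bit 1 =
invalid TLB, bit 4 = protection violation, bit 6 = store) follow the reading
of the MPC860 manual given elsewhere in this thread.

```c
#include <stdint.h>

/* DSISR bits for a DTLB Error, as masks (PowerPC bit 0 is the MSB). */
#define DSISR_INVALID_TLB 0x40000000u /* bit 1 */
#define DSISR_PROTECTION  0x08000000u /* bit 4 */
#define DSISR_STORE       0x02000000u /* bit 6 */

/* Hypothetical fixup: when "store" and "invalid" are reported together,
 * drop the store flag so the fault is handled as a read of a page that
 * is not present. Other bits are left untouched. */
static uint32_t fixup_8xx_dsisr(uint32_t dsisr)
{
    if ((dsisr & DSISR_INVALID_TLB) && (dsisr & DSISR_STORE))
        dsisr &= ~DSISR_STORE;
    return dsisr;
}
```

With the DSISR from the oops, 0xC2000000, this yields 0xC0000000. A genuine
store to a page that is not yet mapped would presumably show the same bit
pair, so it would be handled as a read fault first and then fault again for
the write.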
> I wouldn't be surprised if it worked differently, but I'd not be
> able to explain it :-)
>
> Thanks.
>
> -- Dan
>
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-09 19:03 ` Joakim Tjernlund
@ 2005-04-09 22:37 ` Marcelo Tosatti
2005-04-10 10:08 ` Joakim Tjernlund
2005-04-22 17:14 ` Marcelo Tosatti
2005-04-23 21:55 ` Dan Malek
1 sibling, 2 replies; 28+ messages in thread
From: Marcelo Tosatti @ 2005-04-09 22:37 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-embedded
On Sat, Apr 09, 2005 at 09:03:54PM +0200, Joakim Tjernlund wrote:
> >
> > On Apr 8, 2005, at 7:07 AM, Marcelo Tosatti wrote:
> >
> > > 1) _tlbie() on update_mmu_cache() surrounded by CONFIG_8xx #ifdef
> > > Did you give up about it?
> >
> > I think a tlbia() of the vaddr should work here. No sense blowing
> > away the whole TLB cache for this.
>
> Umm, isn't it the other way around? tlbie flushes one TLB whereas tlbia flushes
> all TLBs.
Yep
> > > What else you think can be done?
> >
> > It would be interesting to change __flush_dcache_icache()
> > to use the 8xx SPR cache operations instead of the dcbst instruction.
>
> yes, but I think these operates on physical addresses which makes it a bit harder.
Apart from that, there is the issue of userspace dcbst users.
> I still think this can be resolved in fault.c. Replace
> andis. r11, r10, 0x0200 /* If set, indicates store op */
> beq 2f
> in the DTLB Error handler with
> andis. r11, r10, 0x4800 /* If set, indicates invalid pte or protection violation */
> bne 2f
Why does the current code jump to the page fault handler in the case of a store operation?
Out of curiosity, aren't there any other valid bit combinations for DSISR other
than 0x4800 which should allow a fastpath DataTLBError?
I can't find the DSISR settings in MPC860UM.pdf, nor in the paper manual. AFAICS it
always refers to the PEM when talking about DSISR bit assignments.
I can't find section "7-15" as you mentioned in the other email.
> In fault.c you can check if both store and invalid is set simultaneously. If it is, clear
> the store flag and continue as usual.
One point is that changing the in-kernel dcbst implementation still leaves
userspace vulnerable to the problem.
Fixing the exception handler to deal with such bogosity, as Joakim proposes, is
complete: it also handles userspace dcbst callers.
> > I wouldn't be surprised if it worked differently, but I'd not be
> > able to explain it :-)
> >
> > Thanks.
> >
> > -- Dan
> >
* RE: 8xx v2.6 TLB problems and suggested workaround
2005-04-09 22:37 ` Marcelo Tosatti
@ 2005-04-10 10:08 ` Joakim Tjernlund
2005-04-22 17:14 ` Marcelo Tosatti
1 sibling, 0 replies; 28+ messages in thread
From: Joakim Tjernlund @ 2005-04-10 10:08 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linuxppc-embedded
> -----Original Message-----
> From: Marcelo Tosatti [mailto:marcelo.tosatti@cyclades.com]
> On Sat, Apr 09, 2005 at 09:03:54PM +0200, Joakim Tjernlund wrote:
> > >
> > > On Apr 8, 2005, at 7:07 AM, Marcelo Tosatti wrote:
[SNIP]
> > I still think this can be resolved in fault.c. Replace
> > andis. r11, r10, 0x0200 /* If set, indicates store op */
> > beq 2f
> > in the DTLB Error handler with
> > andis. r11, r10, 0x4800 /* If set, indicates invalid pte or protection violation */
> > bne 2f
>
> Why does the current code jump to page fault handler in case of store operation?
It doesn't. It jumps if some other bit is set as well. In your dcbst case it seems like invalid
is set too. I hope that will be enough to work something out in fault.c.
>
> Out of curiosity, aren't there any other valid bit combinations for DSISR other
> than 0x4800 which should allow a fastpath DataTLBError ?
Don't know, hopefully Dan knows.
>
> I can't find DSISR settings in MPC860UM.pdf neither paper manual. AFAICS it
> always refer to the PEM when talking about DSISR bit assignments.
>
> I can't find section "7-15" as you mentioned in the other email.
It is page 7-15, and I think (not having the manual handy) that chapter 7 describes the
different exceptions, including the DTLB Error.
>
> > In fault.c you can check if both store and invalid is set simultaneously. If it is, clear
> > the store flag and continue as usual.
>
> One point is that by changing the in-kernel dcbst implementation userspace is
> still vulnerable to the problem.
>
> Now fixing the exception handler to deal with such boggosity as Joakim proposes is
> complete - it handles userspace dcbst callers.
Yes, ldso uses dcbst and icbi. Currently it works fine because a normal store
has been performed before dcbst/icbi is executed.
Jocke
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-09 22:37 ` Marcelo Tosatti
2005-04-10 10:08 ` Joakim Tjernlund
@ 2005-04-22 17:14 ` Marcelo Tosatti
1 sibling, 0 replies; 28+ messages in thread
From: Marcelo Tosatti @ 2005-04-22 17:14 UTC (permalink / raw)
To: Joakim Tjernlund, Dan Malek; +Cc: linuxppc-embedded
Dan,
I haven't heard your opinion on Joakim's proposed change yet.
It looks plausible, and it's more complete than reimplementing dcbst
with 8xx-specific cache functions (because it also covers userspace dcbst
callers).
I would love to see this getting fixed in v2.6 mainline.
Thanks
On Sat, Apr 09, 2005 at 07:37:21PM -0300, Marcelo Tosatti wrote:
> On Sat, Apr 09, 2005 at 09:03:54PM +0200, Joakim Tjernlund wrote:
> > >
> > > On Apr 8, 2005, at 7:07 AM, Marcelo Tosatti wrote:
> > >
> > > > 1) _tlbie() on update_mmu_cache() surrounded by CONFIG_8xx #ifdef
> > > > Did you give up about it?
> > >
> > > I think a tlbia() of the vaddr should work here. No sense blowing
> > > away the whole TLB cache for this.
> >
> > Umm, isn't it the other way around? tlbie flushes one TLB whereas tlbia flushes
> > all TLBs.
>
> Yep
>
> > > > What else you think can be done?
> > >
> > > It would be interesting to change __flush_dcache_icache()
> > > to use the 8xx SPR cache operations instead of the dcbst instruction.
> >
> > yes, but I think these operates on physical addresses which makes it a bit harder.
>
> Other than the fact of userspace dcbst users.
>
> > I still think this can be resolved in fault.c. Replace
> > andis. r11, r10, 0x0200 /* If set, indicates store op */
> > beq 2f
> > in the DTLB Error handler with
> > andis. r11, r10, 0x4800 /* If set, indicates invalid pte or protection violation */
> > bne 2f
>
> Why does the current code jump to page fault handler in case of store operation?
>
> Out of curiosity, aren't there any other valid bit combinations for DSISR other
> than 0x4800 which should allow a fastpath DataTLBError ?
>
> I can't find DSISR settings in MPC860UM.pdf neither paper manual. AFAICS it
> always refer to the PEM when talking about DSISR bit assignments.
>
> I can't find section "7-15" as you mentioned in the other email.
>
> > In fault.c you can check if both store and invalid is set simultaneously. If it is, clear
> > the store flag and continue as usual.
>
> One point is that by changing the in-kernel dcbst implementation userspace is
> still vulnerable to the problem.
>
> Now fixing the exception handler to deal with such boggosity as Joakim proposes is
> complete - it handles userspace dcbst callers.
>
> > > I wouldn't be surprised if it worked differently, but I'd not be
> > > able to explain it :-)
> > >
> > > Thanks.
> > >
> > > -- Dan
> > >
> _______________________________________________
> Linuxppc-embedded mailing list
> Linuxppc-embedded@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-embedded
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-09 19:03 ` Joakim Tjernlund
2005-04-09 22:37 ` Marcelo Tosatti
@ 2005-04-23 21:55 ` Dan Malek
2005-04-23 22:07 ` Joakim Tjernlund
1 sibling, 1 reply; 28+ messages in thread
From: Dan Malek @ 2005-04-23 21:55 UTC (permalink / raw)
To: Joakim.Tjernlund; +Cc: linuxppc-embedded
On Apr 9, 2005, at 3:03 PM, Joakim Tjernlund wrote:
> yes, but I think these operates on physical addresses which makes it a
> bit harder.
> I still think this can be resolved in fault.c. Replace
> andis. r11, r10, 0x0200 /* If set, indicates store op */
> beq 2f
> in the DTLB Error handler with
> andis. r11, r10, 0x4800 /* If set, indicates invalid pte or
> protection violation */
> bne 2f
> In fault.c you can check if both store and invalid is set
> simultaneously. If it is, clear
> the store flag and continue as usual.
The purpose of the code in TLB Error is to create a fast path for tracking
writable pages as dirty. I think we should stop writing all of this assembler
code and if we find anything that isn't simply updating the "dirty" flags, we
should bail out to fault.c and do whatever is necessary. This includes
simulating any cache instructions that fail.
As a further performance enhancement, which used to be the standard
mode for 8xx, we should allow the option to mark all writable pages as
dirty when the PTE is created. This eliminates the TLB Error fault in
this case completely. Since most 8xx systems don't do page swapping, this
has no other effect. If it does page swapping, it may swap more pages
than necessary, or the option can be disabled to do proper paging.
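The pre-dirtying option can be pictured with a small C sketch. Everything
named here is hypothetical: PAGE_RW_FLAG, PAGE_DIRTY_FLAG and their values,
and make_pte_flags() are placeholders for illustration, not the real 8xx
PTE layout or a kernel interface.

```c
#include <stdint.h>

/* Placeholder flag values, chosen only for this illustration. */
#define PAGE_RW_FLAG    0x0400u
#define PAGE_DIRTY_FLAG 0x0100u

/* Hypothetical helper: when the pre-dirty option is on, a writable PTE
 * is created already marked dirty, so the first store does not need a
 * TLB Error fault to set the dirty flag. Read-only pages are unchanged. */
static uint32_t make_pte_flags(uint32_t flags, int predirty)
{
    if (predirty && (flags & PAGE_RW_FLAG))
        flags |= PAGE_DIRTY_FLAG;
    return flags;
}
```

The cost is the one named above: with swapping enabled, every writable
page looks dirty and may be written out even if it was never modified.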
Thanks.
-- Dan
* RE: 8xx v2.6 TLB problems and suggested workaround
2005-04-23 21:55 ` Dan Malek
@ 2005-04-23 22:07 ` Joakim Tjernlund
2005-04-23 22:23 ` Dan Malek
0 siblings, 1 reply; 28+ messages in thread
From: Joakim Tjernlund @ 2005-04-23 22:07 UTC (permalink / raw)
To: Dan Malek; +Cc: linuxppc-embedded
> On Apr 9, 2005, at 3:03 PM, Joakim Tjernlund wrote:
>
> > yes, but I think these operates on physical addresses which makes it a
> > bit harder.
> > I still think this can be resolved in fault.c. Replace
> > andis. r11, r10, 0x0200 /* If set, indicates store op */
> > beq 2f
> > in the DTLB Error handler with
> > andis. r11, r10, 0x4800 /* If set, indicates invalid pte or
> > protection violation */
> > bne 2f
> > In fault.c you can check if both store and invalid is set
> > simultaneously. If it is, clear
> > the store flag and continue as usual.
>
> The purpose for the code in TLB Error is to create fast path for
> tracking
> writable pages as dirty. I think we should stop writing all of this
> assembler
Do you refer to the assembler code above or some other assembler code such
as my dcbX workaround? I once had a C version of that workaround that lived
in fault.c. Sadly I think it is gone now, not sure if I ever sent it to the list.
> code and if we find anything that isn't simply updating the "dirty"
> flags, we
> should bail out to the fault.c and do whatever is necessary. This
> includes
> simulating any cache instructions that fail.
>
> As a further performance enhancement, which used to be the standard
> mode for 8xx, we should allow the option to mark all writable pages as
> dirty when the PTE is created. This eliminates the TLB Error fault in
> this
> case completely. Since most 8xx systems don't do page swapping, this
> has no other effect. If it does page swapping, it may swap more pages
> than necessary, or the option can be disabled to do proper paging.
>
> Thanks.
>
>
> -- Dan
>
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-23 22:07 ` Joakim Tjernlund
@ 2005-04-23 22:23 ` Dan Malek
0 siblings, 0 replies; 28+ messages in thread
From: Dan Malek @ 2005-04-23 22:23 UTC (permalink / raw)
To: Joakim.Tjernlund; +Cc: linuxppc-embedded
On Apr 23, 2005, at 6:07 PM, Joakim Tjernlund wrote:
> Do you refer to the assembler code above or some other assembler code
> such
> as my dcbX workaround? I once had a C version of that workaround that
> lived
> in fault.c. Sadly it think it is gone now, not sure if I ever sent it
> to the list.
Yes, the dcbx and anything else we may have to place in there. At most,
we should emulate whatever is necessary in C code and just invalidate
the TLB entry for reloading, if the PTE has changed. We shouldn't be
trying to load TLB entries.
Thanks.
-- Dan
* RE: 8xx v2.6 TLB problems and suggested workaround
2005-04-07 19:38 ` Marcelo Tosatti
2005-04-08 2:09 ` Dan Malek
@ 2005-04-08 8:01 ` Joakim Tjernlund
2005-04-08 13:39 ` Dan Malek
1 sibling, 1 reply; 28+ messages in thread
From: Joakim Tjernlund @ 2005-04-08 8:01 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linuxppc-embedded
Hi Marcelo
> -----Original Message-----
> From: Marcelo Tosatti [mailto:marcelo.tosatti@cyclades.com]
> Joakim,
>
> On Thu, Apr 07, 2005 at 10:35:30PM +0200, Joakim Tjernlund wrote:
> > > -----Original Message-----
> > > From: Marcelo Tosatti [mailto:marcelo.tosatti@cyclades.com]
> > > Sent: den 7 april 2005 14:00
> > > On Wed, Apr 06, 2005 at 11:24:46PM +0200, Joakim Tjernlund wrote:
> > > > > On Tue, Apr 05, 2005 at 11:51:42PM +0200, Joakim Tjernlund wrote:
> > > > > > Hi Marcelo
> > > > > >
> > > > > > Reading your report it doesn't sound likely but I will ask anyway:
> > > > > > Is it possible that the problem you are seeing isn't caused by the
> > > > > > "famous" CPU bug mentioned here:
> > > > > > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016351.html
> > > > > >
> > > > > > The DTLB error handler needs DAR to be set correctly and since the
> > > > > > dcbX instructions doesn't set DAR in either DTLB Miss nor DTLB Error you
> > > > > > may end up trying to fix the wrong address.
> > > > >
> > > > > Hi Joakim,
> > > > >
> > > > > First of all, thanks your care!
> > > >
> > > > NP, I want to be able to run 8xx on 2.6 in the future.
> > > >
> > > > >
> > > > > Well, I dont think the above issue is exactly what we're hitting because
> > > > > DAR is correctly updated on our case with "dcbst".
> > > >
> > > > Are you sure? Cant remeber all details but this looks a bit strange to me
> > > > SPR 826 : 0x00001f00 7936
> > > > is not 0x00001 supposed to be the physical page?
> > >
> > > SPR 826 contains the page attributes, not Physical Page Number (which is held
> > > by SPR 825).
> >
> > Yes, my memory is getting really bad :)
> >
> > Does SPR 825 hould the correct physical page? 0x000001e0 looks like
> > Zero to me(I should probably bring the manual home so i don't have the rely on
> > my bad memory :)
>
> Yes, it is zero. That is because there is no pte entry for the page yet (DataStoreTLBMiss
> sets the pte even if its zero). Thats when DataTLBError (EA present in TLB entry but valid
> bit not set) gets called.
I see.
>
> > > > Also DSISR: C2000000 looks strange and "impossible". Are you sure this value
> > > > is correct?
> > >
> > > As defined by the PEM, bit 1 indicates "data-store error exception", bit 2
> > > indicates:
>
> I meant "bit 0 and bit 1".
>
> > > "Set if the translation of an attempted access is not found in the primary hash
> > > table entry group (HTEG), or in the rehashed secondary HTEG, or in the range of a
> > > DBAT register (page fault condition); otherwise cleared."
> > >
> > > And bit 6 indicates a store operation (shouldnt be set).
> >
> > Yes, but bit 0 is also set and if I remember correctly(don't have the manual handy)
> > it should always be zero?
I was looking at the DTLB Error exception (p. 7-15) in the MPC860 User's Manual. There
    bit 1 = invalid TLB
    bit 4 = protection violation
    bit 6 = store operation
the rest is always zero.
That's why DSISR = C2000000 looks somewhat impossible, since bit 0 is set and that one
should always be zero.
Hmm, I just remembered that there is a comment in mm/fault.c that 8xx always sets bit 0 even
though it shouldn't. There is also an 8xx-specific test with bit 3 (0x10000000) in fault.c, but
bit 3 is always zero according to the MPC860 Manual for a DTLB Error.
Then we end up with bit 1 (invalid TLB) and bit 6 (store operation) set. Maybe one
could make the DTLB Error handler test if bit 1 is set and then branch to
DataAccess and then deal with the problem in fault.c?
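To make the bit reading concrete, here is a small C decoder for the DSISR
value from the oops. It is a sketch that assumes the three bit meanings
listed above and PowerPC bit numbering (bit 0 is the most significant bit);
PPC_BIT, struct dtlb_error_bits and decode_dsisr() are illustrative names
only.

```c
#include <stdint.h>

/* PowerPC numbers bits from the MSB: bit 0 is 0x80000000. */
#define PPC_BIT(n) (UINT32_C(1) << (31 - (n)))

/* Decoded view of DSISR after a DTLB Error (hypothetical type). */
struct dtlb_error_bits {
    int invalid_tlb;   /* bit 1 */
    int protection;    /* bit 4 */
    int store;         /* bit 6 */
    uint32_t other;    /* anything the manual says should be zero */
};

static struct dtlb_error_bits decode_dsisr(uint32_t dsisr)
{
    struct dtlb_error_bits d;

    d.invalid_tlb = (dsisr & PPC_BIT(1)) != 0;
    d.protection  = (dsisr & PPC_BIT(4)) != 0;
    d.store       = (dsisr & PPC_BIT(6)) != 0;
    /* Keep whatever is left so unexpected bits stay visible. */
    d.other = dsisr & ~(PPC_BIT(1) | PPC_BIT(4) | PPC_BIT(6));
    return d;
}
```

Decoding 0xC2000000 gives invalid_tlb = 1, protection = 0, store = 1, and
other = 0x80000000, which is the stray bit 0.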
>
> Well, bit 0 and bit 1 are set.
>
> > > > Don't understand why the "tlbie()" call works around the problem. Can you
> > > > explain that a bit more?
> > >
> > > It must be because the TLB entry is now removed from the cache, which avoids
> > > dcbst from faulting as a store.
> > >
> > > There must be some relation to the invalid present TLB entry and dcbst
> > > misbehaviour.
> > >
> > > I didnt check what happens with the TLB after tlbie(), I should do that.
> > > But I suppose it gets wiped off?
> >
> > Unless the pte gets populated(valid) before the next TLB miss I think you
> > will repeat the same sequence that caused the error in the first place.
> > So why does that work?
>
> It does get populated.
>
> The sequence is:
>
> 1) userspace access triggers DataTLBMiss
>
> 2) DataTLBMiss sets TLB from Linux pte. At this stage the pte entry is still
> zeroed (pte table entry clear). That's why PPN points to page "00000".
>
> 3) DataTLBError (TLB EA match but valid bit not set) - jumps to page fault
> handler
>
> 4) do_no_page()
> - allocates a page
> - set pte accordingly
> - update_mmu_cache() (dcbst access faults as a write)
>
> So, there must be some relation between dcbst's misbehaviour and the _invalid_
> zero RPN TLB entry.
>
> The thing is, dcbst is not supposed to fault as a store operation, from what the PEM
> indicates.
Ah, now I get it. Thanks.
Jocke
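The four-step sequence quoted above, and the reason a tlbie() before the flush avoids the fault, can be illustrated with a toy model. This is only a sketch of the hypothesis discussed in this thread: it assumes that dcbst faults whenever it hits a stale invalid TLB entry, which is not documented behaviour, and it is deterministic, whereas the real failure is reported as intermittent. All types and function names are invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PTE_PRESENT 0x1u

/* Toy model only: one Linux pte and one TLB slot for the faulting address. */
struct toy_mmu {
    uint32_t pte;      /* 0 means "not populated yet" */
    bool tlb_present;  /* an entry matching the address is loaded */
    bool tlb_valid;    /* valid bit of that entry */
};

/* Step 2: the miss handler copies whatever the pte holds into the TLB. */
static void toy_tlb_miss(struct toy_mmu *m)
{
    m->tlb_present = true;
    m->tlb_valid = (m->pte & TOY_PTE_PRESENT) != 0;
}

/* tlbie(): drop the entry so the next access reloads it from the pte. */
static void toy_tlbie(struct toy_mmu *m)
{
    m->tlb_present = false;
    m->tlb_valid = false;
}

/* Step 4: do_no_page() fills in the pte; the TLB itself is not touched. */
static void toy_do_no_page(struct toy_mmu *m, uint32_t pfn)
{
    m->pte = (pfn << 12) | TOY_PTE_PRESENT;
}

/* Modelled dcbst: returns true if it would fault (hypothesis, see above). */
static bool toy_dcbst_faults(struct toy_mmu *m)
{
    if (!m->tlb_present)
        toy_tlb_miss(m);   /* reload from the pte, which is valid by now */
    return !m->tlb_valid;
}

/* Run steps 1 to 4; returns true if the final dcbst faults. */
static bool toy_fault_sequence(bool use_tlbie)
{
    struct toy_mmu m = {0};

    toy_tlb_miss(&m);                       /* steps 1-2: pte still zero */
    assert(m.tlb_present && !m.tlb_valid);  /* step 3: DataTLBError case */
    toy_do_no_page(&m, 0x4f85);             /* arbitrary page frame number */
    if (use_tlbie)
        toy_tlbie(&m);                      /* the workaround */
    return toy_dcbst_faults(&m);
}
```

In this model toy_fault_sequence(false) leaves the stale invalid entry in place and the dcbst faults, while toy_fault_sequence(true) forces a reload from the populated pte and it does not.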
* Re: 8xx v2.6 TLB problems and suggested workaround
2005-04-08 8:01 ` Joakim Tjernlund
@ 2005-04-08 13:39 ` Dan Malek
2005-04-08 14:29 ` Joakim Tjernlund
0 siblings, 1 reply; 28+ messages in thread
From: Dan Malek @ 2005-04-08 13:39 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-embedded
On Apr 8, 2005, at 4:01 AM, Joakim Tjernlund wrote:
> I was looking at the DTLB Error exception (p. 7-15) in the MPC860 User's
> Manual. There
Yeah, well what the manual says and what really happens seems to be
two different things.
> .... There is also an 8xx-specific test of bit 3 (0x10000000) in
> fault.c, but bit 3 is always zero according to the MPC860 manual
> for a DTLB Error.
Read the comment. It really happens. I spent lots and lots of time sorting
out how the 8xx works, setting up precise test cases and examining the
results. Stop reading the manual too closely and create test cases to
see what exactly happens.
> Then we end up with bit 1 (invalid TLB) and bit 6 (store operation) set.
> Maybe one could make the DTLB Error handler test if bit 1 is set and
> then branch to DataAccess and then deal with the problem in fault.c?
No. That is adding even more code to the "normal" path. The TLB miss
should simply take a value from memory and load it into the TLB. Nothing
more. It should emulate what a hardware implementation would do ...
eight instructions, no branches, if done properly :-)
Thanks.
-- Dan
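As a rough illustration of "take a value from memory and load it into the TLB", here is a straight-line two-level lookup. It is a sketch under stated assumptions, not the 8xx handler: it assumes 4 KB pages with a 10/10/12 split of the 32-bit effective address, and it assumes every unused first-level slot points at a shared all-zero pte page so that no NULL check (and hence no branch) is needed. The structure and function names are hypothetical.

```c
#include <assert.h>  /* for the checks mentioned in the note below */
#include <stdint.h>

#define TOY_PGD_ENTRIES 1024u  /* assumed: top 10 bits of the address */
#define TOY_PTE_ENTRIES 1024u  /* assumed: next 10 bits of the address */

/* One second-level page of ptes. */
struct toy_pte_page {
    uint32_t pte[TOY_PTE_ENTRIES];
};

/* First level: every slot must point somewhere, possibly at a zero page. */
struct toy_pgd {
    const struct toy_pte_page *l1[TOY_PGD_ENTRIES];
};

/*
 * Return the value a miss handler would write into the TLB for ea.
 * Two dependent loads and some shifts, with no conditional branch.
 * A zero result stands for an invalid entry, which is left to the
 * TLB Error path to sort out.
 */
static uint32_t toy_tlb_reload_value(const struct toy_pgd *pgd, uint32_t ea)
{
    const struct toy_pte_page *l2 = pgd->l1[ea >> 22];

    return l2->pte[(ea >> 12) & 0x3ffu];
}
```

With such a layout an unmapped address simply yields a zero pte, which corresponds to the "PPN points to page 00000" entry described earlier in the thread; any further decision is made outside the miss path.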
* RE: 8xx v2.6 TLB problems and suggested workaround
2005-04-08 13:39 ` Dan Malek
@ 2005-04-08 14:29 ` Joakim Tjernlund
0 siblings, 0 replies; 28+ messages in thread
From: Joakim Tjernlund @ 2005-04-08 14:29 UTC (permalink / raw)
To: Dan Malek; +Cc: linuxppc-embedded
> -----Original Message-----
> From: Dan Malek [mailto:dan@embeddededge.com]
>
> On Apr 8, 2005, at 4:01 AM, Joakim Tjernlund wrote:
>
> > I was looking at the DTLB Error exception (p. 7-15) in the MPC860 User's
> > Manual. There
>
> Yeah, well what the manual says and what really happens seems to be
> two different things.
Yep, the manual really sucks sometimes. The manual errata is huge,
so it would be more than welcome if Freescale actually updated the
manual with the errata items.
>
> > .... There is also an 8xx-specific test of bit 3 (0x10000000) in
> > fault.c, but bit 3 is always zero according to the MPC860 manual
> > for a DTLB Error.
>
> Read the comment. It really happens. I spent lots and lots of time sorting
> out how the 8xx works, setting up precise test cases and examining the
> results. Stop reading the manual too closely and create test cases to
> see what exactly happens.
OK, fine. Just wanted to hear that.
>
> > Then we end up with bit 1 (invalid TLB) and bit 6 (store operation) set.
> > Maybe one could make the DTLB Error handler test if bit 1 is set and
> > then branch to DataAccess and then deal with the problem in fault.c?
>
> No. That is adding even more code to the "normal" path. The TLB miss
> should simply take a value from memory and load it into the TLB. Nothing
> more. It should emulate what a hardware implementation would do ...
> eight instructions, no branches, if done properly :-)
I was talking about the TLB Error handler. The TLB Miss handler should be kept as small
as possible.
Any ideas on how to fix the problem Marcelo reported?
Jocke
>
> Thanks.
>
>
> -- Dan
>