LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] 8xx: Large page(8MB) support for 2.4
From: Joakim Tjernlund @ 2011-10-10 11:38 UTC (permalink / raw)
  To: Dan Malek, linuxppc-dev, Scott Wood, Willy Tarreau

This adds Large page support for 8xx and uses it
for all kernel RAM.

Further usage is possible, IMAP_ADDR and on board
flash comes to mind.

There is one bit free the pte which could be used for
selecting different large page sizes but that is for another
day.

- Dan, what do you think :)

Joakim Tjernlund (3):
  8xx: replace _PAGE_EXEC with _PAGE_PSE
  8xx: Support LARGE pages in TLB code.
  8xx: Use LARGE pages for kernel RAM.

 arch/ppc/kernel/head_8xx.S |   30 +++++++++++++++++++-----------
 arch/ppc/mm/pgtable.c      |    4 +++-
 include/asm-ppc/pgtable.h  |    6 +++++-
 3 files changed, 27 insertions(+), 13 deletions(-)

-- 
1.7.3.4

^ permalink raw reply

* [PATCH 14/14] 8xx: The TLB miss handler manages ACCESSED correctly.
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

The new MMU/TLB code no longer sets ACCESSED unconditionally
so remove the exception.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 include/asm-ppc/pgtable.h |   10 ----------
 1 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/include/asm-ppc/pgtable.h b/include/asm-ppc/pgtable.h
index 6cfc5fc..b94e8a8 100644
--- a/include/asm-ppc/pgtable.h
+++ b/include/asm-ppc/pgtable.h
@@ -318,16 +318,6 @@ extern unsigned long vmalloc_start;
 #define _PMD_PAGE_MASK	0x000c
 #define _PMD_PAGE_8M	0x000c
 
-/*
- * The 8xx TLB miss handler allegedly sets _PAGE_ACCESSED in the PTE
- * for an address even if _PAGE_PRESENT is not set, as a performance
- * optimization.  This is a bug if you ever want to use swap unless
- * _PAGE_ACCESSED is 2, which it isn't, or unless you have 8xx-specific
- * definitions for __swp_entry etc. below, which would be gross.
- *  -- paulus
- */
-#define _PTE_NONE_MASK	_PAGE_ACCESSED
-
 #else /* CONFIG_6xx */
 /* Definitions for 60x, 740/750, etc. */
 #define _PAGE_PRESENT	0x001	/* software: pte contains a translation */
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 13/14] 8xx: Optimize TLB Miss handlers
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

Only update pte w.r.t ACCESSED if it isn't already set
Wrap ACCESSED with #ifndef NO_SWAP for too ease optimization.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/head_8xx.S |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/ppc/kernel/head_8xx.S b/arch/ppc/kernel/head_8xx.S
index 0f2101d..36089cc 100644
--- a/arch/ppc/kernel/head_8xx.S
+++ b/arch/ppc/kernel/head_8xx.S
@@ -377,10 +377,14 @@ InstructionTLBMiss:
 	mfspr	r21, MD_TWC	/* ....and get the pte address */
 	lwz	r20, 0(r21)	/* Get the pte */
 
-#if 1
+#ifndef NO_SWAP
 	/* if !swap, you can delete this */
+	andi.	r21, r20, _PAGE_ACCESSED	/* test ACCESSED bit */
+	bne+	4f		/* Branch if set */
+	mfspr	r21, MD_TWC	/* get the pte address */
 	rlwimi	r20, r20, 5, _PAGE_PRESENT<<5	/* Copy PRESENT to ACCESSED */
 	stw	r20, 0(r21)	/* Update pte */
+4:
 #endif
 	/* The Linux PTE won't go exactly into the MMU TLB.
 	 * Software indicator bits 21 and 28 must be clear.
@@ -450,11 +454,14 @@ DataStoreTLBMiss:
 	DO_8xx_CPU6(0x3b80, r3)
 	mtspr	MD_TWC, r21
 
-#if 1
+#ifndef NO_SWAP
 	/* if !swap, you can delete this */
+	andi.	r21, r20, _PAGE_ACCESSED	/* test ACCESSED bit */
+	bne+	4f		/* Branch if set */
 	mfspr	r21, MD_TWC	/* get the pte address */
 	rlwimi	r20, r20, 5, _PAGE_PRESENT<<5	/* Copy PRESENT to ACCESSED */
 	stw	r20, 0(r21)	/* Update pte */
+4:
 #endif
 
 	/* Honour kernel RO, User NA */
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 11/14] 8xx: start using dcbX instructions in various copy routines
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

Now that 8xx can fixup dcbX instructions, start using them
where possible like every other PowerPc arch do.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/misc.S |   18 ------------------
 arch/ppc/lib/string.S  |   17 -----------------
 2 files changed, 0 insertions(+), 35 deletions(-)

diff --git a/arch/ppc/kernel/misc.S b/arch/ppc/kernel/misc.S
index c616098..c291005 100644
--- a/arch/ppc/kernel/misc.S
+++ b/arch/ppc/kernel/misc.S
@@ -662,15 +662,7 @@ _GLOBAL(__flush_dcache_icache)
 _GLOBAL(clear_page)
 	li	r0,4096/L1_CACHE_LINE_SIZE
 	mtctr	r0
-#ifdef CONFIG_8xx
-	li	r4, 0
-1:	stw	r4, 0(r3)
-	stw	r4, 4(r3)
-	stw	r4, 8(r3)
-	stw	r4, 12(r3)
-#else
 1:	dcbz	0,r3
-#endif
 	addi	r3,r3,L1_CACHE_LINE_SIZE
 	bdnz	1b
 	blr
@@ -695,15 +687,6 @@ _GLOBAL(copy_page)
 	addi	r3,r3,-4
 	addi	r4,r4,-4
 
-#ifdef CONFIG_8xx
-	/* don't use prefetch on 8xx */
-    	li	r0,4096/L1_CACHE_LINE_SIZE
-	mtctr	r0
-1:	COPY_16_BYTES
-	bdnz	1b
-	blr
-
-#else	/* not 8xx, we can prefetch */
 	li	r5,4
 
 #if MAX_COPY_PREFETCH > 1
@@ -744,7 +727,6 @@ _GLOBAL(copy_page)
 	li	r0,MAX_COPY_PREFETCH
 	li	r11,4
 	b	2b
-#endif	/* CONFIG_8xx */
 
 /*
  * Atomic [test&set] exchange
diff --git a/arch/ppc/lib/string.S b/arch/ppc/lib/string.S
index 6ca54b4..b6ea44b 100644
--- a/arch/ppc/lib/string.S
+++ b/arch/ppc/lib/string.S
@@ -159,14 +159,7 @@ _GLOBAL(cacheable_memzero)
 	bdnz	4b
 3:	mtctr	r9
 	li	r7,4
-#if !defined(CONFIG_8xx)
 10:	dcbz	r7,r6
-#else
-10:	stw	r4, 4(r6)
-	stw	r4, 8(r6)
-	stw	r4, 12(r6)
-	stw	r4, 16(r6)
-#endif
 	addi	r6,r6,CACHELINE_BYTES
 	bdnz	10b
 	clrlwi	r5,r8,32-LG_CACHELINE_BYTES
@@ -261,9 +254,7 @@ _GLOBAL(cacheable_memcpy)
 	mtctr	r0
 	beq	63f
 53:
-#if !defined(CONFIG_8xx)
 	dcbz	r11,r6
-#endif
 	COPY_16_BYTES
 #if L1_CACHE_LINE_SIZE >= 32
 	COPY_16_BYTES
@@ -443,13 +434,6 @@ _GLOBAL(__copy_tofrom_user)
 	li	r11,4
 	beq	63f
 
-#ifdef CONFIG_8xx
-	/* Don't use prefetch on 8xx */
-	mtctr	r0
-53:	COPY_16_BYTES_WITHEX(0)
-	bdnz	53b
-
-#else /* not CONFIG_8xx */
 	/* Here we decide how far ahead to prefetch the source */
 	li	r3,4
 	cmpwi	r0,1
@@ -502,7 +486,6 @@ _GLOBAL(__copy_tofrom_user)
 	li	r3,4
 	li	r7,0
 	bne	114b
-#endif /* CONFIG_8xx */	
 
 63:	srwi.	r0,r5,2
 	mtctr	r0
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 12/14] 8xx: Use symbolic constants in TLB asm
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

Use the PTE #defines where possible instead of
hardcoded constants.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/head_8xx.S |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/ppc/kernel/head_8xx.S b/arch/ppc/kernel/head_8xx.S
index 4bcd9b3..0f2101d 100644
--- a/arch/ppc/kernel/head_8xx.S
+++ b/arch/ppc/kernel/head_8xx.S
@@ -442,11 +442,11 @@ DataStoreTLBMiss:
 	 * this into the Linux pgd/pmd and load it in the operation
 	 * above.
 	 */
-	rlwimi	r21, r20, 0, 27, 27
+	rlwimi	r21, r20, 0, _PAGE_GUARDED
 	/* Insert the WriteThru flag into the TWC from the Linux PTE.
 	 * It is bit 25 in the Linux PTE and bit 30 in the TWC
 	 */
-	rlwimi	r21, r20, 32-5, 30, 30
+	rlwimi	r21, r20, 32-5, _PAGE_WRITETHRU>>5
 	DO_8xx_CPU6(0x3b80, r3)
 	mtspr	MD_TWC, r21
 
@@ -460,9 +460,9 @@ DataStoreTLBMiss:
 	/* Honour kernel RO, User NA */
 	/* 0x200 == Extended encoding, bit 22 */
 	/* r20 |=  (r20 & _PAGE_USER) >> 2 */
-	rlwimi	r20, r20, 32-2, 0x200
+	rlwimi	r20, r20, 32-2, _PAGE_USER>>2 /* Copy USER to Encoding */
 	/* r21 =  (r20 & _PAGE_RW) >> 1 */
-	rlwinm	r21, r20, 32-1, 0x200
+	rlwinm	r21, r20, 32-1, _PAGE_RW>>1
 	or	r20, r21, r20
 	/* invert RW and 0x200 bits */
 	xori	r20, r20, _PAGE_RW | 0x200
@@ -582,11 +582,11 @@ DARFixed:
 	/* Insert the Guarded flag into the TWC from the Linux PTE.
 	 * It is bit 27 of both the Linux PTE and the TWC
 	 */
-	rlwimi	r21, r20, 0, 27, 27
+	rlwimi	r21, r20, 0, _PAGE_GUARDED
 	/* Insert the WriteThru flag into the TWC from the Linux PTE.
 	 * It is bit 25 in the Linux PTE and bit 30 in the TWC
 	 */
-	rlwimi	r21, r20, 32-5, 30, 30
+	rlwimi	r21, r20, 32-5, _PAGE_WRITETHRU>>5
 	DO_8xx_CPU6(0x3b80, r3)
 	mtspr	MD_TWC, r21
 	mfspr	r21, MD_TWC		/* get the pte address again */
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 10/14] 8xx: Set correct HW pte flags in DTLB Error too
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

DTLB Error needs to adjust the HW PTE bits as DTLB Miss
does.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/head_8xx.S |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/ppc/kernel/head_8xx.S b/arch/ppc/kernel/head_8xx.S
index 402158d..4bcd9b3 100644
--- a/arch/ppc/kernel/head_8xx.S
+++ b/arch/ppc/kernel/head_8xx.S
@@ -592,7 +592,12 @@ DARFixed:
 	mfspr	r21, MD_TWC		/* get the pte address again */
 	ori	r20, r20, _PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_HWWRITE
 	stw	r20, 0(r21)		/* and update pte in table */
-	xori	r20, r20, _PAGE_RW	/* RW bit is inverted */
+	rlwimi	r20, r20, 32-2, _PAGE_USER>>2 /* Copy USER to Encoding */
+	/* r21 =  (r20 & _PAGE_RW) >> 1 */
+	rlwinm	r21, r20, 32-1, _PAGE_RW>>1
+	or	r20, r21, r20
+	/* invert RW and 0x200 bits */
+	xori	r20, r20, _PAGE_RW | 0x200
 	b	finish_DTLB
 2:
 	mfspr	r20, M_TW	/* Restore registers */
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 09/14] 8xx: Restore _PAGE_WRITETHRU
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

8xx has not had WRITETHRU due to lack of bits in the pte.
After the recent rewrite of the 8xx TLB code, there are
two bits left. Use one of them to WRITETHRU.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/head_8xx.S |    8 ++++++++
 include/asm-ppc/pgtable.h  |    5 +++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/ppc/kernel/head_8xx.S b/arch/ppc/kernel/head_8xx.S
index 86bc727..402158d 100644
--- a/arch/ppc/kernel/head_8xx.S
+++ b/arch/ppc/kernel/head_8xx.S
@@ -443,6 +443,10 @@ DataStoreTLBMiss:
 	 * above.
 	 */
 	rlwimi	r21, r20, 0, 27, 27
+	/* Insert the WriteThru flag into the TWC from the Linux PTE.
+	 * It is bit 25 in the Linux PTE and bit 30 in the TWC
+	 */
+	rlwimi	r21, r20, 32-5, 30, 30
 	DO_8xx_CPU6(0x3b80, r3)
 	mtspr	MD_TWC, r21
 
@@ -579,6 +583,10 @@ DARFixed:
 	 * It is bit 27 of both the Linux PTE and the TWC
 	 */
 	rlwimi	r21, r20, 0, 27, 27
+	/* Insert the WriteThru flag into the TWC from the Linux PTE.
+	 * It is bit 25 in the Linux PTE and bit 30 in the TWC
+	 */
+	rlwimi	r21, r20, 32-5, 30, 30
 	DO_8xx_CPU6(0x3b80, r3)
 	mtspr	MD_TWC, r21
 	mfspr	r21, MD_TWC		/* get the pte address again */
diff --git a/include/asm-ppc/pgtable.h b/include/asm-ppc/pgtable.h
index 2ba37d3..6cfc5fc 100644
--- a/include/asm-ppc/pgtable.h
+++ b/include/asm-ppc/pgtable.h
@@ -298,12 +298,13 @@ extern unsigned long vmalloc_start;
 #define _PAGE_NO_CACHE	0x0002	/* I: cache inhibit */
 #define _PAGE_SHARED	0x0004	/* No ASID (context) compare */
 
-/* These three software bits must be masked out when the entry is loaded
- * into the TLB, 2 SW bits free.
+/* These four software bits must be masked out when the entry is loaded
+ * into the TLB, 1 SW bits left(0x0080).
  */
 #define _PAGE_EXEC	0x0008	/* software: i-cache coherency required */
 #define _PAGE_GUARDED	0x0010	/* software: guarded access */
 #define _PAGE_ACCESSED	0x0020	/* software: page referenced */
+#define _PAGE_WRITETHRU	0x0040	/* software: caching is write through */
 
 /* Setting any bits in the nibble with the follow two controls will
  * require a TLB exception handler change.  It is assumed unused bits
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 08/14] 8xx: Add missing Guarded setting in DTLB Error.
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

only DTLB Miss did set this bit, DTLB Error needs too otherwise
the setting is lost when the page becomes dirty.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/head_8xx.S |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/ppc/kernel/head_8xx.S b/arch/ppc/kernel/head_8xx.S
index 367fec0..86bc727 100644
--- a/arch/ppc/kernel/head_8xx.S
+++ b/arch/ppc/kernel/head_8xx.S
@@ -573,9 +573,15 @@ DARFixed:
 	ori	r21, r21, 1		/* Set valid bit in physical L2 page */
 	DO_8xx_CPU6(0x3b80, r3)
 	mtspr	MD_TWC, r21		/* Load pte table base address */
-	mfspr	r21, MD_TWC		/* ....and get the pte address */
-	lwz	r20, 0(r21)		/* Get the pte */
-
+	mfspr	r20, MD_TWC		/* ....and get the pte address */
+	lwz	r20, 0(r20)		/* Get the pte */
+	/* Insert the Guarded flag into the TWC from the Linux PTE.
+	 * It is bit 27 of both the Linux PTE and the TWC
+	 */
+	rlwimi	r21, r20, 0, 27, 27
+	DO_8xx_CPU6(0x3b80, r3)
+	mtspr	MD_TWC, r21
+	mfspr	r21, MD_TWC		/* get the pte address again */
 	ori	r20, r20, _PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_HWWRITE
 	stw	r20, 0(r21)		/* and update pte in table */
 	xori	r20, r20, _PAGE_RW	/* RW bit is inverted */
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 07/14] 8xx: CPU6 errata make DTLB error too big to fit.
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

branch to common code in DTLB Miss instead.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/head_8xx.S |   23 ++---------------------
 1 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/arch/ppc/kernel/head_8xx.S b/arch/ppc/kernel/head_8xx.S
index 0891b96..367fec0 100644
--- a/arch/ppc/kernel/head_8xx.S
+++ b/arch/ppc/kernel/head_8xx.S
@@ -469,6 +469,7 @@ DataStoreTLBMiss:
 	 * set.  All other Linux PTE bits control the behavior
 	 * of the MMU.
 	 */
+finish_DTLB:
 2:	li	r21, 0x00f0
 	mtspr	DAR, r21	/* Tag DAR */
 	rlwimi	r20, r21, 0, 24, 28	/* Set 24-27, clear 28 */
@@ -578,27 +579,7 @@ DARFixed:
 	ori	r20, r20, _PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_HWWRITE
 	stw	r20, 0(r21)		/* and update pte in table */
 	xori	r20, r20, _PAGE_RW	/* RW bit is inverted */
-
-	/* The Linux PTE won't go exactly into the MMU TLB.
-	 * Software indicator bits 22 and 28 must be clear.
-	 * Software indicator bits 24, 25, 26, and 27 must be
-	 * set.  All other Linux PTE bits control the behavior
-	 * of the MMU.
-	 */
-	li	r21, 0x00f0
-	mtspr	DAR, r21	/* Tag DAR */
-	rlwimi	r20, r21, 0, 24, 28	/* Set 24-27, clear 28 */
-	DO_8xx_CPU6(0x3d80, r3)
-	mtspr	MD_RPN, r20	/* Update TLB entry */
-
-	mfspr	r20, M_TW	/* Restore registers */
-	lwz	r21, 0(r0)
-	mtcr	r21
-	lwz	r21, 4(r0)
-#ifdef CONFIG_8xx_CPU6
-	lwz	r3, 8(r0)
-#endif
-	rfi
+	b	finish_DTLB
 2:
 	mfspr	r20, M_TW	/* Restore registers */
 	lwz	r21, 0(r0)
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 05/14] 8xx: Update TLB asm so it behaves as linux mm expects.
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

Update the TLB asm to make proper use of _PAGE_DIRTY and _PAGE_ACCESSED.
Get rid of _PAGE_HWWRITE too.
Pros:
  - PRESENT is copied to ACCESSED, fixing accounting
  - DIRTY is mapped to 0x100, the changed bit, and is set directly
    when a page has been made dirty.
  - Proper RO/RW mapping of user space.
  - Free up 2 SW TLB bits in the linux pte(add back _PAGE_WRITETHRU ?)
  - kernel RO/user NA support. Not sure this is really needed, would save
    a few insn if not required.
Cons:
  - A few more instructions in the DTLB Miss routine.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/head_8xx.S |   53 ++++++++++++++++++++++++++-----------------
 include/asm-ppc/pgtable.h  |   15 +++++------
 2 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/arch/ppc/kernel/head_8xx.S b/arch/ppc/kernel/head_8xx.S
index 9d8a1b5..c9770b6 100644
--- a/arch/ppc/kernel/head_8xx.S
+++ b/arch/ppc/kernel/head_8xx.S
@@ -369,25 +369,27 @@ InstructionTLBMiss:
 	 */
 	tophys(r21,r21)
 	ori	r21,r21,1		/* Set valid bit */
-	beq-	2f			/* If zero, don't try to find a pte */
 	DO_8xx_CPU6(0x2b80, r3)
 	mtspr	MI_TWC, r21	/* Set segment attributes */
+	beq-	2f		/* If zero, don't try to find a pte */
 	DO_8xx_CPU6(0x3b80, r3)
 	mtspr	MD_TWC, r21	/* Load pte table base address */
 	mfspr	r21, MD_TWC	/* ....and get the pte address */
 	lwz	r20, 0(r21)	/* Get the pte */
 
-	ori	r20, r20, _PAGE_ACCESSED
-	stw	r20, 0(r21)
-
+#if 1
+	/* if !swap, you can delete this */
+	rlwimi	r20, r20, 5, _PAGE_PRESENT<<5	/* Copy PRESENT to ACCESSED */
+	stw	r20, 0(r21)	/* Update pte */
+#endif
 	/* The Linux PTE won't go exactly into the MMU TLB.
-	 * Software indicator bits 21, 22 and 28 must be clear.
+	 * Software indicator bits 21 and 28 must be clear.
 	 * Software indicator bits 24, 25, 26, and 27 must be
 	 * set.  All other Linux PTE bits control the behavior
 	 * of the MMU.
 	 */
 2:	li	r21, 0x00f0
-	rlwimi	r20, r21, 0, 24, 28	/* Set 24-27, clear 28 */
+	rlwimi	r20, r21, 0, 0x07f8	/* Set 24-27, clear 21-23,28 */
 	DO_8xx_CPU6(0x2d80, r3)
 	mtspr	MI_RPN, r20	/* Update TLB entry */
 
@@ -444,12 +446,25 @@ DataStoreTLBMiss:
 	DO_8xx_CPU6(0x3b80, r3)
 	mtspr	MD_TWC, r21
 
-	mfspr	r21, MD_TWC	/* get the pte address again */
-	ori	r20, r20, _PAGE_ACCESSED
-	stw	r20, 0(r21)
+#if 1
+	/* if !swap, you can delete this */
+	mfspr	r21, MD_TWC	/* get the pte address */
+	rlwimi	r20, r20, 5, _PAGE_PRESENT<<5	/* Copy PRESENT to ACCESSED */
+	stw	r20, 0(r21)	/* Update pte */
+#endif
+
+	/* Honour kernel RO, User NA */
+	/* 0x200 == Extended encoding, bit 22 */
+	/* r20 |=  (r20 & _PAGE_USER) >> 2 */
+	rlwimi	r20, r20, 32-2, 0x200
+	/* r21 =  (r20 & _PAGE_RW) >> 1 */
+	rlwinm	r21, r20, 32-1, 0x200
+	or	r20, r21, r20
+	/* invert RW and 0x200 bits */
+	xori	r20, r20, _PAGE_RW | 0x200
 
 	/* The Linux PTE won't go exactly into the MMU TLB.
-	 * Software indicator bits 21, 22 and 28 must be clear.
+	 * Software indicator bits 22 and 28 must be clear.
 	 * Software indicator bits 24, 25, 26, and 27 must be
 	 * set.  All other Linux PTE bits control the behavior
 	 * of the MMU.
@@ -496,11 +511,12 @@ DataTLBError:
 	stw	r20, 0(r0)
 	stw	r21, 4(r0)
 
-	/* First, make sure this was a store operation.
-	*/
 	mfspr	r20, DSISR
-	andis.	r21, r20, 0x0200	/* If set, indicates store op */
-	beq	2f
+	andis.	r21, r20, 0x4800	/* !translation or protection */
+	bne-	2f
+	/* Only Change bit left now, do it here as it is faster
+	 * than trapping to the C fault handler.
+ 	 */
 
 	/* The EA of a data TLB miss is automatically stored in the MD_EPN
 	 * register.  The EA of a data TLB error is automatically stored in
@@ -550,17 +566,12 @@ DataTLBError:
 	mfspr	r21, MD_TWC		/* ....and get the pte address */
 	lwz	r20, 0(r21)		/* Get the pte */
 
-	andi.	r21, r20, _PAGE_RW	/* Is it writeable? */
-	beq	2f			/* Bail out if not */
-
-	/* Update 'changed', among others.
-	*/
 	ori	r20, r20, _PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_HWWRITE
-	mfspr	r21, MD_TWC		/* Get pte address again */
 	stw	r20, 0(r21)		/* and update pte in table */
+	xori	r20, r20, _PAGE_RW	/* RW bit is inverted */
 
 	/* The Linux PTE won't go exactly into the MMU TLB.
-	 * Software indicator bits 21, 22 and 28 must be clear.
+	 * Software indicator bits 22 and 28 must be clear.
 	 * Software indicator bits 24, 25, 26, and 27 must be
 	 * set.  All other Linux PTE bits control the behavior
 	 * of the MMU.
diff --git a/include/asm-ppc/pgtable.h b/include/asm-ppc/pgtable.h
index 71b2165..2ba37d3 100644
--- a/include/asm-ppc/pgtable.h
+++ b/include/asm-ppc/pgtable.h
@@ -298,21 +298,20 @@ extern unsigned long vmalloc_start;
 #define _PAGE_NO_CACHE	0x0002	/* I: cache inhibit */
 #define _PAGE_SHARED	0x0004	/* No ASID (context) compare */
 
-/* These five software bits must be masked out when the entry is loaded
- * into the TLB.
+/* These three software bits must be masked out when the entry is loaded
+ * into the TLB, 2 SW bits free.
  */
 #define _PAGE_EXEC	0x0008	/* software: i-cache coherency required */
 #define _PAGE_GUARDED	0x0010	/* software: guarded access */
-#define _PAGE_DIRTY	0x0020	/* software: page changed */
-#define _PAGE_RW	0x0040	/* software: user write access allowed */
-#define _PAGE_ACCESSED	0x0080	/* software: page referenced */
+#define _PAGE_ACCESSED	0x0020	/* software: page referenced */
 
 /* Setting any bits in the nibble with the follow two controls will
  * require a TLB exception handler change.  It is assumed unused bits
- * are always zero.
+ * are always zero, encoding(bit 22).
  */
-#define _PAGE_HWWRITE	0x0100	/* h/w write enable: never set in Linux PTE */
-#define _PAGE_USER	0x0800	/* One of the PP bits, the other is USER&~RW */
+#define _PAGE_DIRTY	0x0100	/* Changed: page changed */
+#define _PAGE_RW	0x0400	/* PP lsb(bit 21), user write access allowed */
+#define _PAGE_USER	0x0800	/* PP msb(bit 20), user access allowed */
 
 #define _PMD_PRESENT	PAGE_MASK
 #define _PMD_PAGE_MASK	0x000c
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 06/14] 8xx: Fixup DAR from buggy dcbX instructions.
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

This is an assembler version to fixup DAR not being set
by dcbX, icbi instructions. There are two versions, one
uses selfmodifing code, the other uses a
jump table but is much bigger(default).

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/head_8xx.S |  149 +++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 146 insertions(+), 3 deletions(-)

diff --git a/arch/ppc/kernel/head_8xx.S b/arch/ppc/kernel/head_8xx.S
index c9770b6..0891b96 100644
--- a/arch/ppc/kernel/head_8xx.S
+++ b/arch/ppc/kernel/head_8xx.S
@@ -511,8 +511,17 @@ DataTLBError:
 	stw	r20, 0(r0)
 	stw	r21, 4(r0)
 
-	mfspr	r20, DSISR
-	andis.	r21, r20, 0x4800	/* !translation or protection */
+	mfspr	r20, DAR
+	cmpwi	cr0, r20, 0x00f0
+	beq-	FixupDAR	/* must be a buggy dcbX, icbi insn. */
+DARFixed:
+	/* As the DAR fixup may clear store we may have all 3 states zero.
+	 * Make sure only 0x0200(store) falls down into DIRTY handling
+	 */
+	mfspr	r21, DSISR
+	andis.	r21, r21, 0x4a00	/* !translation, protection or store */
+	srwi	r21, r21, 16
+	cmpwi	cr0, r21, 0x0200	/* just store ? */
 	bne-	2f
 	/* Only Change bit left now, do it here as it is faster
 	 * than trapping to the C fault handler.
@@ -534,7 +543,7 @@ DataTLBError:
 	 * are initialized in mapin_ram().  This will avoid the problem,
 	 * assuming we only use the dcbi instruction on kernel addresses.
 	 */
-	mfspr	r20, DAR
+	/* DAR is in r20 already */
 	rlwinm	r21, r20, 0, 0, 19
 	ori	r21, r21, MD_EVALID
 	mfspr	r20, M_CASID
@@ -618,6 +627,140 @@ DataTLBError:
 	STD_EXCEPTION(0x1f00, Trap_1f, UnknownException)
 
 	. = 0x2000
+/* This is the procedure to calculate the data EA for buggy dcbx,dcbi instructions
+ * by decoding the registers used by the dcbx instruction and adding them.
+ * DAR is set to the calculated address and r10 also holds the EA on exit.
+ */
+ /* define if you don't want to use self modifying code */
+#define NO_SELF_MODIFYING_CODE
+FixupDAR:/* Entry point for dcbx workaround. */
+	/* fetch instruction from memory. */
+	mfspr	r20, SRR0
+	andis.	r21, r20, 0x8000	/* Address >= 0x80000000 */
+	DO_8xx_CPU6(0x3780, r3)
+	mtspr	MD_EPN, r20
+	mfspr	r21, M_TWB	/* Get level 1 table entry address */
+	beq-	3f		/* Branch if user space */
+	lis	r21, (swapper_pg_dir-PAGE_OFFSET)@h
+	ori	r21, r21, (swapper_pg_dir-PAGE_OFFSET)@l
+	rlwimi	r21, r20, 32-20, 0xffc /* r21 = r21&~0xffc|(r20>>20)&0xffc */
+3:	lwz	r21, 0(r21)	/* Get the level 1 entry */
+	tophys  (r21, r21)
+	DO_8xx_CPU6(0x3b80, r3)
+	mtspr	MD_TWC, r21	/* Load pte table base address */
+	mfspr	r21, MD_TWC	/* ....and get the pte address */
+	lwz	r21, 0(r21)	/* Get the pte */
+	/* concat physical page address(r21) and page offset(r20) */
+	rlwimi	r21, r20, 0, 20, 31
+	lwz	r21,0(r21)
+/* Check if it really is a dcbx instruction. */
+/* dcbt and dcbtst does not generate DTLB Misses/Errors,
+ * no need to include them here */
+	srwi	r20, r21, 26	/* check if major OP code is 31 */
+	cmpwi	cr0, r20, 31
+	bne-	141f
+	rlwinm	r20, r21, 0, 21, 30
+	cmpwi	cr0, r20, 2028	/* Is dcbz? */
+	beq+	142f
+	cmpwi	cr0, r20, 940	/* Is dcbi? */
+	beq+	142f
+	cmpwi	cr0, r20, 108	/* Is dcbst? */
+	beq+	144f		/* Fix up store bit! */
+	cmpwi	cr0, r20, 172	/* Is dcbf? */
+	beq+	142f
+	cmpwi	cr0, r20, 1964	/* Is icbi? */
+	beq+	142f
+141:	mfspr	r20, DAR	/* r20 must hold DAR at exit */
+	b	DARFixed	/* Nope, go back to normal TLB processing */
+
+144:	mfspr	r20, DSISR
+	rlwinm	r20, r20,0,7,5	/* Clear store bit for buggy dcbst insn */
+	mtspr	DSISR, r20
+142:	/* continue, it was a dcbx, dcbi instruction. */
+#ifdef CONFIG_8xx_CPU6
+	lwz	r3, 8(r0)	/* restore r3 from memory */
+#endif
+#ifndef NO_SELF_MODIFYING_CODE
+	andis.	r20,r21,0x1f	/* test if reg RA is r0 */
+	li	r20,modified_instr@l
+	dcbtst	r0,r20		/* touch for store */
+	rlwinm	r21,r21,0,0,20	/* Zero lower 10 bits */
+	oris	r21,r21,640	/* Transform instr. to a "add r20,RA,RB" */
+	ori	r21,r21,532
+	stw	r21,0(r20)	/* store add/and instruction */
+	dcbf	0,r20		/* flush new instr. to memory. */
+	icbi	0,r20		/* invalidate instr. cache line */
+	lwz	r21, 4(r0)	/* restore r21 from memory */
+	mfspr	r20, M_TW	/* restore r20 from M_TW */
+	isync			/* Wait until new instr is loaded from memory */
+modified_instr:
+	.space	4		/* this is where the add instr. is stored */
+	bne+	143f
+	subf	r20,r0,r20	/* r20=r20-r0, only if reg RA is r0 */
+143:	mtdar	r20		/* store faulting EA in DAR */
+	b	DARFixed	/* Go back to normal TLB handling */
+#else
+	mfctr	r20
+	mtdar	r20			/* save ctr reg in DAR */
+	rlwinm	r20, r21, 24, 24, 28	/* offset into jump table for reg RB */
+	addi	r20, r20, 150f@l	/* add start of table */
+	mtctr	r20			/* load ctr with jump address */
+	xor	r20, r20, r20		/* sum starts at zero */
+	bctr				/* jump into table */
+150:
+	add	r20, r20, r0	;b	151f
+	add	r20, r20, r1	;b	151f
+	add	r20, r20, r2	;b	151f
+	add	r20, r20, r3	;b	151f
+	add	r20, r20, r4	;b	151f
+	add	r20, r20, r5	;b	151f
+	add	r20, r20, r6	;b	151f
+	add	r20, r20, r7	;b	151f
+	add	r20, r20, r8	;b	151f
+	add	r20, r20, r9	;b	151f
+	add	r20, r20, r10	;b	151f
+	add	r20, r20, r11	;b	151f
+	add	r20, r20, r12	;b	151f
+	add	r20, r20, r13	;b	151f
+	add	r20, r20, r14	;b	151f
+	add	r20, r20, r15	;b	151f
+	add	r20, r20, r16	;b	151f
+	add	r20, r20, r17	;b	151f
+	add	r20, r20, r18	;b	151f
+	add	r20, r20, r19	;b	151f
+	mtctr	r21	;b	154f	/* r20 needs special handling */
+	mtctr	r21	;b	153f	/* r21 needs special handling */
+	add	r20, r20, r22	;b	151f
+	add	r20, r20, r23	;b	151f
+	add	r20, r20, r24	;b	151f
+	add	r20, r20, r25	;b	151f
+	add	r20, r20, r26	;b	151f
+	add	r20, r20, r27	;b	151f
+	add	r20, r20, r28	;b	151f
+	add	r20, r20, r29	;b	151f
+	add	r20, r20, r30	;b	151f
+	add	r20, r20, r31
+151:
+	rlwinm. r21,r21,19,24,28	/* offset into jump table for reg RA */
+	beq	152f			/* if reg RA is zero, don't add it */ 
+	addi	r21, r21, 150b@l	/* add start of table */
+	mtctr	r21			/* load ctr with jump address */
+	rlwinm	r21,r21,0,16,10		/* make sure we don't execute this more than once */
+	bctr				/* jump into table */
+152:
+	mfdar	r21
+	mtctr	r21			/* restore ctr reg from DAR */
+	mtdar	r20			/* save fault EA to DAR */
+	b	DARFixed		/* Go back to normal TLB handling */
+
+	/* special handling for r20,r21 since these are modified already */
+153:	lwz	r21, 4(r0)	/* load r21 from memory */
+	b	155f
+154:	mfspr	r21, M_TW	/* load r20 from M_TW */
+155:	add	r20, r20, r21	/* add it */
+	mfctr	r21		/* restore r21 */
+	b	151b
+#endif
 
 /*
  * This code finishes saving the registers to the exception frame
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 04/14] 8xx: Fix CONFIG_PIN_TLB
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

The wrong register was loaded into MD_RPN.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/head_8xx.S |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ppc/kernel/head_8xx.S b/arch/ppc/kernel/head_8xx.S
index b3aff21..9d8a1b5 100644
--- a/arch/ppc/kernel/head_8xx.S
+++ b/arch/ppc/kernel/head_8xx.S
@@ -848,13 +848,13 @@ initial_mmu:
 	mtspr	MD_TWC, r9
 	li	r11, MI_BOOTINIT	/* Create RPN for address 0 */
 	addis	r11, r11, 0x0080	/* Add 8M */
-	mtspr	MD_RPN, r8
+	mtspr	MD_RPN, r11
 
 	addis	r8, r8, 0x0080		/* Add 8M */
 	mtspr	MD_EPN, r8
 	mtspr	MD_TWC, r9
 	addis	r11, r11, 0x0080	/* Add 8M */
-	mtspr	MD_RPN, r8
+	mtspr	MD_RPN, r11
 #endif
 
 	/* Since the cache is enabled according to the information we
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 03/14] 8xx: invalidate non present TLBs
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

8xx sometimes need to load a invalid/non-present TLBs in
it DTLB asm handler.
These must be invalidated separately as 8xx MMU don't.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/head_8xx.S |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/ppc/kernel/head_8xx.S b/arch/ppc/kernel/head_8xx.S
index 57858ce..b3aff21 100644
--- a/arch/ppc/kernel/head_8xx.S
+++ b/arch/ppc/kernel/head_8xx.S
@@ -221,7 +221,11 @@ DataAccess:
 	mr	r5,r20
 	mfspr	r4,DAR
 	stw	r4,_DAR(r21)
-	li	r20,0x00f0
+	/* invalidate ~PRESENT TLBs, 8xx MMU don't do this */
+	andis.	r20,r5,0x4000
+	beq+	1f
+	tlbie	r4
+1:	li	r20,0x00f0
 	mtspr	DAR,r20	/* Tag DAR */
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	li	r20,MSR_KERNEL
@@ -238,7 +242,11 @@ InstructionAccess:
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	mr	r4,r22
 	mr	r5,r23
-	li	r20,MSR_KERNEL
+	/* invalidate ~PRESENT TLBs, 8xx MMU don't do this */
+	andis.	r20,r5,0x4000
+	beq+	1f
+	tlbie	r4
+1:	li	r20,MSR_KERNEL
 	rlwimi	r20,r23,0,16,16		/* copy EE bit from saved MSR */
 	FINISH_EXCEPTION(do_page_fault)
 
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 01/14] 8xx: Use a macro to simpliy CPU6 errata code.
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>


Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/head_8xx.S |   84 +++++++++++--------------------------------
 1 files changed, 22 insertions(+), 62 deletions(-)

diff --git a/arch/ppc/kernel/head_8xx.S b/arch/ppc/kernel/head_8xx.S
index f9a30f3..ba05a57 100644
--- a/arch/ppc/kernel/head_8xx.S
+++ b/arch/ppc/kernel/head_8xx.S
@@ -31,6 +31,15 @@
 #include <asm/ppc_asm.h>
 #include "ppc_defs.h"
 
+/* Macro to make the code more readable. */
+#ifdef CONFIG_8xx_CPU6
+  #define DO_8xx_CPU6(val, reg) \
+	li	reg, val; \
+	stw	reg, 12(r0); \
+	lwz	reg, 12(r0);
+#else
+  #define DO_8xx_CPU6(val, reg)
+#endif
 	.text
 	.globl	_stext
 _stext:
@@ -310,20 +319,14 @@ SystemCall:
 InstructionTLBMiss:
 #ifdef CONFIG_8xx_CPU6
 	stw	r3, 8(r0)
-	li	r3, 0x3f80
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
 #endif
+	DO_8xx_CPU6(0x3f80, r3)
 	mtspr	M_TW, r20	/* Save a couple of working registers */
 	mfcr	r20
 	stw	r20, 0(r0)
 	stw	r21, 4(r0)
 	mfspr	r20, SRR0	/* Get effective address of fault */
-#ifdef CONFIG_8xx_CPU6
-	li	r3, 0x3780
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
-#endif
+	DO_8xx_CPU6(0x3780, r3)
 	mtspr	MD_EPN, r20	/* Have to use MD_EPN for walk, MI_EPN can't */
 	mfspr	r20, M_TWB	/* Get level 1 table entry address */
 
@@ -345,17 +348,9 @@ InstructionTLBMiss:
 	tophys(r21,r21)
 	ori	r21,r21,1		/* Set valid bit */
 	beq-	2f			/* If zero, don't try to find a pte */
-#ifdef CONFIG_8xx_CPU6
-	li	r3, 0x2b80
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
-#endif
+	DO_8xx_CPU6(0x2b80, r3)
 	mtspr	MI_TWC, r21	/* Set segment attributes */
-#ifdef CONFIG_8xx_CPU6
-	li	r3, 0x3b80
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
-#endif
+	DO_8xx_CPU6(0x3b80, r3)
 	mtspr	MD_TWC, r21	/* Load pte table base address */
 	mfspr	r21, MD_TWC	/* ....and get the pte address */
 	lwz	r20, 0(r21)	/* Get the pte */
@@ -371,12 +366,7 @@ InstructionTLBMiss:
 	 */
 2:	li	r21, 0x00f0
 	rlwimi	r20, r21, 0, 24, 28	/* Set 24-27, clear 28 */
-
-#ifdef CONFIG_8xx_CPU6
-	li	r3, 0x2d80
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
-#endif
+	DO_8xx_CPU6(0x2d80, r3)
 	mtspr	MI_RPN, r20	/* Update TLB entry */
 
 	mfspr	r20, M_TW	/* Restore registers */
@@ -392,10 +382,8 @@ InstructionTLBMiss:
 DataStoreTLBMiss:
 #ifdef CONFIG_8xx_CPU6
 	stw	r3, 8(r0)
-	li	r3, 0x3f80
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
 #endif
+	DO_8xx_CPU6(0x3f80, r3)
 	mtspr	M_TW, r20	/* Save a couple of working registers */
 	mfcr	r20
 	stw	r20, 0(r0)
@@ -419,11 +407,7 @@ DataStoreTLBMiss:
 	tophys(r21, r21)
 	ori	r21, r21, 1	/* Set valid bit in physical L2 page */
 	beq-	2f		/* If zero, don't try to find a pte */
-#ifdef CONFIG_8xx_CPU6
-	li	r3, 0x3b80
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
-#endif
+	DO_8xx_CPU6(0x3b80, r3)
 	mtspr	MD_TWC, r21	/* Load pte table base address */
 	mfspr	r20, MD_TWC	/* ....and get the pte address */
 	lwz	r20, 0(r20)	/* Get the pte */
@@ -435,11 +419,7 @@ DataStoreTLBMiss:
 	 * above.
 	 */
 	rlwimi	r21, r20, 0, 27, 27
-#ifdef CONFIG_8xx_CPU6
-	li	r3, 0x3b80
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
-#endif
+	DO_8xx_CPU6(0x3b80, r3)
 	mtspr	MD_TWC, r21
 
 	mfspr	r21, MD_TWC	/* get the pte address again */
@@ -454,12 +434,7 @@ DataStoreTLBMiss:
 	 */
 2:	li	r21, 0x00f0
 	rlwimi	r20, r21, 0, 24, 28	/* Set 24-27, clear 28 */
-
-#ifdef CONFIG_8xx_CPU6
-	li	r3, 0x3d80
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
-#endif
+	DO_8xx_CPU6(0x3d80, r3)
 	mtspr	MD_RPN, r20	/* Update TLB entry */
 
 	mfspr	r20, M_TW	/* Restore registers */
@@ -491,10 +466,8 @@ InstructionTLBError:
 DataTLBError:
 #ifdef CONFIG_8xx_CPU6
 	stw	r3, 8(r0)
-	li	r3, 0x3f80
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
 #endif
+	DO_8xx_CPU6(0x3f80, r3)
 	mtspr	M_TW, r20	/* Save a couple of working registers */
 	mfcr	r20
 	stw	r20, 0(r0)
@@ -527,11 +500,7 @@ DataTLBError:
 	ori	r21, r21, MD_EVALID
 	mfspr	r20, M_CASID
 	rlwimi	r21, r20, 0, 28, 31
-#ifdef CONFIG_8xx_CPU6
-	li	r3, 0x3780
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
-#endif
+	DO_8xx_CPU6(0x3780, r3)
 	mtspr	MD_EPN, r21
 
 	mfspr	r20, M_TWB	/* Get level 1 table entry address */
@@ -553,11 +522,7 @@ DataTLBError:
 	 */
 	tophys(r21, r21)
 	ori	r21, r21, 1		/* Set valid bit in physical L2 page */
-#ifdef CONFIG_8xx_CPU6
-	li	r3, 0x3b80
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
-#endif
+	DO_8xx_CPU6(0x3b80, r3)
 	mtspr	MD_TWC, r21		/* Load pte table base address */
 	mfspr	r21, MD_TWC		/* ....and get the pte address */
 	lwz	r20, 0(r21)		/* Get the pte */
@@ -579,12 +544,7 @@ DataTLBError:
 	 */
 	li	r21, 0x00f0
 	rlwimi	r20, r21, 0, 24, 28	/* Set 24-27, clear 28 */
-
-#ifdef CONFIG_8xx_CPU6
-	li	r3, 0x3d80
-	stw	r3, 12(r0)
-	lwz	r3, 12(r0)
-#endif
+	DO_8xx_CPU6(0x3d80, r3)
 	mtspr	MD_RPN, r20	/* Update TLB entry */
 
 	mfspr	r20, M_TW	/* Restore registers */
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 02/14] 8xx: Tag DAR with 0x00f0 to catch buggy instructions.
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek
In-Reply-To: <1318246220-4839-1-git-send-email-Joakim.Tjernlund@transmode.se>

dcbz, dcbf, dcbi, dcbst and icbi do not set DAR when they
cause a DTLB Error. Dectect this by tagging DAR with 0x00f0
at every exception exit that modifies DAR.
This also fixes MachineCheck to pass DAR and DSISR as well.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/ppc/kernel/head_8xx.S |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/arch/ppc/kernel/head_8xx.S b/arch/ppc/kernel/head_8xx.S
index ba05a57..57858ce 100644
--- a/arch/ppc/kernel/head_8xx.S
+++ b/arch/ppc/kernel/head_8xx.S
@@ -197,7 +197,17 @@ label:						\
 	STD_EXCEPTION(0x100, Reset, UnknownException)
 
 /* Machine check */
-	STD_EXCEPTION(0x200, MachineCheck, MachineCheckException)
+	. = 0x200
+MachineCheck:
+	EXCEPTION_PROLOG
+	mfspr	r20,DSISR
+	stw	r20,_DSISR(r21)
+	mfspr	r20,DAR
+	stw	r20,_DAR(r21)
+	li	r20,0x00f0
+	mtspr	DAR,r20	/* Tag DAR */
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	FINISH_EXCEPTION(MachineCheckException)
 
 /* Data access exception.
  * This is "never generated" by the MPC8xx.  We jump to it for other
@@ -211,6 +221,8 @@ DataAccess:
 	mr	r5,r20
 	mfspr	r4,DAR
 	stw	r4,_DAR(r21)
+	li	r20,0x00f0
+	mtspr	DAR,r20	/* Tag DAR */
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	li	r20,MSR_KERNEL
 	rlwimi	r20,r23,0,16,16		/* copy EE bit from saved MSR */
@@ -249,6 +261,8 @@ Alignment:
 	EXCEPTION_PROLOG
 	mfspr	r4,DAR
 	stw	r4,_DAR(r21)
+	li	r20,0x00f0
+	mtspr	DAR,r20	/* Tag DAR */
 	mfspr	r5,DSISR
 	stw	r5,_DSISR(r21)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
@@ -433,6 +447,7 @@ DataStoreTLBMiss:
 	 * of the MMU.
 	 */
 2:	li	r21, 0x00f0
+	mtspr	DAR, r21	/* Tag DAR */
 	rlwimi	r20, r21, 0, 24, 28	/* Set 24-27, clear 28 */
 	DO_8xx_CPU6(0x3d80, r3)
 	mtspr	MD_RPN, r20	/* Update TLB entry */
@@ -543,6 +558,7 @@ DataTLBError:
 	 * of the MMU.
 	 */
 	li	r21, 0x00f0
+	mtspr	DAR, r21	/* Tag DAR */
 	rlwimi	r20, r21, 0, 24, 28	/* Set 24-27, clear 28 */
 	DO_8xx_CPU6(0x3d80, r3)
 	mtspr	MD_RPN, r20	/* Update TLB entry */
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH 00/14] Backport 8xx TLB to 2.4
From: Joakim Tjernlund @ 2011-10-10 11:30 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood, Willy Tarreau, Dan Malek

This is a backport from 2.6 which I did to overcome 8xx CPU
bugs. 8xx does not update the DAR register when taking a TLB
error caused by dcbX and icbi insns which makes it very
tricky to use these insns. Also the dcbst wrongly sets the
the store bit when faulting into DTLB error.
A few more bugs very found during development.

I know 2.4 is in strict maintenance mode and 8xx is obsolete
but as it is still in use I wanted 8xx to age with grace.

Addendum:
I have now ported our 8xx custom board to 2.4.37.11 and
tested these patches there.

V2:
 - Remove mandatory pinning of kernel ITLB. It is not
   needed in 2.4

8 MB Large page support will follow.

Joakim Tjernlund (14):
  8xx: Use a macro to simpliy CPU6 errata code.
  8xx: Tag DAR with 0x00f0 to catch buggy instructions.
  8xx: invalidate non present TLBs
  8xx: Fix CONFIG_PIN_TLB
  8xx: Update TLB asm so it behaves as linux mm expects.
  8xx: Fixup DAR from buggy dcbX instructions.
  8xx: CPU6 errata make DTLB error too big to fit.
  8xx: Add missing Guarded setting in DTLB Error.
  8xx: Restore _PAGE_WRITETHRU
  8xx: Set correct HW pte flags in DTLB Error too
  8xx: start using dcbX instructions in various copy routines
  8xx: Use symbolic constants in TLB asm
  8xx: Optimize TLB Miss handlers
  8xx: The TLB miss handler manages ACCESSED correctly.

 arch/ppc/kernel/head_8xx.S |  367 ++++++++++++++++++++++++++++++-------------
 arch/ppc/kernel/misc.S     |   18 --
 arch/ppc/lib/string.S      |   17 --
 include/asm-ppc/pgtable.h  |   26 +--
 4 files changed, 264 insertions(+), 164 deletions(-)

-- 
1.7.3.4

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-10 10:18 UTC (permalink / raw)
  To: Eli Cohen
  Cc: netdev, Yevgeny Petrilin, Eli Cohen, David Laight,
	Thadeu Lima de Souza Cascardo, linuxppc-dev
In-Reply-To: <20111010092926.GO2681@mtldesk30>

On Mon, 2011-10-10 at 11:29 +0200, Eli Cohen wrote:
> On Mon, Oct 10, 2011 at 11:24:05AM +0200, Benjamin Herrenschmidt wrote:
> > On Mon, 2011-10-10 at 11:16 +0200, Eli Cohen wrote:
> > 
> > > Until then I think we need to have the logic working right on ppc and
> > > measure if blue flame buys us any improvement in ppc. If that's not
> > > the case (e.g because write combining is not working), then maybe we
> > > should avoid using blueflame in ppc.
> > > Could any of the guys from IBM check this and give us feedback?
> > 
> > I don't have the necessary hardware myself to test that but maybe Thadeu
> > can.
> > 
> > Note that for WC to work, things must be mapped non-guarded. You can do
> > that by using ioremap_prot() with pgprot_noncached_wc(PAGE_KERNEL) or
> > ioremap_wc() (dunno how "generic" the later is).
> 
> I use the io mapping API:
> 
> at driver statrt:
>         priv->bf_mapping = io_mapping_create_wc(bf_start, bf_len);
>         if (!priv->bf_mapping)
>                 err = -ENOMEM;
> 
> and then:
>         uar->bf_map = io_mapping_map_wc(priv->bf_mapping, uar->index << PAGE_SHIFT);
> 
>         
> Will this work on ppc?

That API has never been tested on ppc I suspect. We don't have
CONFIG_HAVE_ATOMIC_IOMAP (mostly because we never needed it, it
was designed and only ever used for Intel graphics before), so
it will fallback to:

static inline struct io_mapping *
io_mapping_create_wc(resource_size_t base, unsigned long size)
{
	return (struct io_mapping __force *) ioremap_wc(base, size);
}

Which should work (hopefully :-)

Cheers,
Ben.

^ permalink raw reply

* [PATCH 3/3] [44x] Enable CRASH_DUMP for 440x
From: Suzuki K. Poulose @ 2011-10-10  9:57 UTC (permalink / raw)
  To: linux ppc dev
  Cc: Michal Simek, tmarri, Mahesh Jagannath Salgaonkar, Dave Hansen,
	David Laight, Suzuki K. Poulose, Scott Wood, linuxppc-dev,
	Vivek Goyal
In-Reply-To: <20111010094627.16589.52367.stgit@suzukikp.in.ibm.com>

Now that we have relocatable kernel, supporting CRASH_DUMP only requires
turning the switches on for UP machines.

We don't have kexec support on 47x yet. Enabling SMP support would be done
as part of enabling the PPC_47x support.


Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc:	Josh Boyer <jwboyer@gmail.com>
Cc:	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
---

 arch/powerpc/Kconfig |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 99558d6..fc41ce5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -362,8 +362,8 @@ config KEXEC
 
 config CRASH_DUMP
 	bool "Build a kdump crash kernel"
-	depends on PPC64 || 6xx || FSL_BOOKE
-	select RELOCATABLE if PPC64 || FSL_BOOKE
+	depends on PPC64 || 6xx || FSL_BOOKE || (44x && !SMP)
+	select RELOCATABLE if PPC64 || FSL_BOOKE || 44x
 	help
 	  Build a kernel suitable for use as a kdump capture kernel.
 	  The same kernel binary can be used as production kernel and dump

^ permalink raw reply related

* [PATCH 2/3] [44x] Enable CONFIG_RELOCATABLE for PPC44x
From: Suzuki K. Poulose @ 2011-10-10  9:56 UTC (permalink / raw)
  To: linux ppc dev
  Cc: Michal Simek, tmarri, Mahesh Jagannath Salgaonkar, Dave Hansen,
	David Laight, Suzuki K. Poulose, Scott Wood, Paul Mackerras,
	linuxppc-dev, Vivek Goyal
In-Reply-To: <20111010094627.16589.52367.stgit@suzukikp.in.ibm.com>

The following patch adds relocatable support for PPC44x kernel.

We find the runtime address of _stext and relocate ourselves based
on the following calculation.

	virtual_base = ALIGN(KERNELBASE,256M) +
			MODULO(_stext.run,256M)

relocate() is called with the Effective Virtual Base Address (as
shown below)

            | Phys. Addr| Virt. Addr |
Page (256M) |------------------------|
Boundary    |           |            |
            |           |            |
            |           |            |
Kernel Load |___________|_ __ _ _ _ _|<- Effective
Addr(_stext)|           |      ^     |Virt. Base Addr
            |           |      |     |
            |           |      |     |
            |           |reloc_offset|
            |           |      |     |
            |           |      |     |
            |           |______v_____|<-(KERNELBASE)%256M
            |           |            |
            |           |            |
            |           |            |
Page(256M)  |-----------|------------|
Boundary    |           |            |


On BookE, we need __va() & __pa() early in the boot process to access
the device tree.

Currently this has been defined as :

#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) -
						PHYSICAL_START + KERNELBASE)
where:
 PHYSICAL_START is kernstart_addr - a variable updated at runtime.
 KERNELBASE	is the compile time Virtual base address of kernel.

This won't work for us, as kernstart_addr is dynamic and will yield different
results for __va()/__pa() for same mapping.

e.g.,

Let the kernel be loaded at 64MB and KERNELBASE be 0xc0000000 (same as
PAGE_OFFSET).

In this case, we would be mapping 0 to 0xc0000000, and kernstart_addr = 64M

Now __va(1MB) = (0x100000) - (0x4000000) + 0xc0000000
		= 0xbc100000 , which is wrong.

it should be : 0xc0000000 + 0x100000 = 0xc0100000

On PPC_47x (which is based on 44x), the kernel could be loaded at highmem.
Hence we cannot always depend on the compile time constants for mapping.

Here are the possible solutions:

1) Update kernstart_addr(PHSYICAL_START) to match the Physical address of
compile time KERNELBASE value, instead of the actual Physical_Address(_stext).

The disadvantage is that we may break other users of PHYSICAL_START. They
could be replaced with __pa(_stext).

2) Redefine __va() & __pa() with relocation offset


#if defined(CONFIG_RELOCATABLE) && defined(CONFIG_44x)
#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) - PHYSICAL_START + (KERNELBASE + RELOC_OFFSET)))
#define __pa(x) ((unsigned long)(x) + PHYSICAL_START - (KERNELBASE + RELOC_OFFSET))
#endif

where, RELOC_OFFSET could be

  a) A variable, say relocation_offset (like kernstart_addr), updated
     at boot time. This impacts performance, as we have to load an additional
     variable from memory.

		OR

  b) #define RELOC_OFFSET ((PHYSICAL_START & PPC_PIN_SIZE_OFFSET_MASK) - \
                      (KERNELBASE & PPC_PIN_SIZE_OFFSET_MASK))

   This introduces more calculations for doing the translation.

3) Redefine __va() & __pa() with a new variable

i.e,

#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + VIRT_PHYS_OFFSET))

where VIRT_PHYS_OFFSET :

#ifdef CONFIG_44x
#define VIRT_PHYS_OFFSET virt_phys_offset
#else
#define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START)
#endif /* 44x */

where virt_phy_offset is updated at runtime to :

	Effective KERNELBASE - kernstart_addr.

Taking our example, above:

virt_phys_offset = effective_kernelstart_vaddr - kernstart_addr
		 = 0xc0400000 - 0x400000
		 = 0xc0000000
	and

	__va(0x100000) = 0xc0000000 + 0x100000 = 0xc0100000
	 which is what we want.

I have implemented (3) in the following patch which has same cost of
operation as the existing one.

I have tested the patches on 440x platforms only. However this should
work fine for PPC_47x also, as we only depend on the runtime address
and the current TLB XLAT entry for the startup code, which is available
in r25. I don't have access to a 47x board yet. So, it would be great if
somebody could test this on 47x.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc:	Paul Mackerras <paulus@samba.org>
Cc:	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc:	Kumar Gala <galak@kernel.crashing.org>
Cc:	Tony Breeds <tony@bakeyournoodle.com>
Cc:	Josh Boyer <jwboyer@gmail.com>
Cc:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
---

 arch/powerpc/Kconfig            |    2 -
 arch/powerpc/Makefile           |    1 
 arch/powerpc/include/asm/page.h |   84 +++++++++++++++++++++++++++++-
 arch/powerpc/kernel/head_44x.S  |  111 ++++++++++++++++++++++++++++++++++-----
 arch/powerpc/mm/init_32.c       |    7 ++
 5 files changed, 187 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9eb2e60..99558d6 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -843,7 +843,7 @@ config LOWMEM_CAM_NUM
 
 config RELOCATABLE
 	bool "Build a relocatable kernel (EXPERIMENTAL)"
-	depends on EXPERIMENTAL && ADVANCED_OPTIONS && FLATMEM && (FSL_BOOKE || PPC_47x)
+	depends on EXPERIMENTAL && ADVANCED_OPTIONS && FLATMEM && (FSL_BOOKE || 44x || PPC_47x)
 	help
 	  This builds a kernel image that is capable of running at the
 	  location the kernel is loaded at (some alignment restrictions may
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 57af16e..632b3dd 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -65,6 +65,7 @@ endif
 
 LDFLAGS_vmlinux-yy := -Bstatic
 LDFLAGS_vmlinux-$(CONFIG_PPC64)$(CONFIG_RELOCATABLE) := -pie
+LDFLAGS_vmlinux-$(CONFIG_44x)$(CONFIG_RELOCATABLE) := -pie
 LDFLAGS_vmlinux	:= $(LDFLAGS_vmlinux-yy)
 
 CFLAGS-$(CONFIG_PPC64)	:= -mminimal-toc -mtraceback=no -mcall-aixdesc
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index dd9c4fd..6898542 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -97,10 +97,25 @@ extern unsigned int HPAGE_SHIFT;
 
 extern phys_addr_t memstart_addr;
 extern phys_addr_t kernstart_addr;
+
+#ifdef CONFIG_44x
+extern long long virt_phys_offset;
 #endif
+
+#endif /* __ASSEMBLY__ */
 #define PHYSICAL_START	kernstart_addr
+
+
+/* See Description below for VIRT_PHYS_OFFSET */
+#ifdef CONFIG_44x
+#define VIRT_PHYS_OFFSET virt_phys_offset
 #else
+#define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START)
+#endif /* 44x */
+
+#else	/* !CONFIG_RELOCATABLE */
 #define PHYSICAL_START	ASM_CONST(CONFIG_PHYSICAL_START)
+#define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START)
 #endif
 
 #ifdef CONFIG_PPC64
@@ -125,12 +140,77 @@ extern phys_addr_t kernstart_addr;
  * determine MEMORY_START until then.  However we can determine PHYSICAL_START
  * from information at hand (program counter, TLB lookup).
  *
+ *  Relocation on 44x
+ *
+ *  On 44x, we support loading the kernel at any physical address without
+ *  any restriction on the page alignment.
+ *
+ *  We find the runtime address of _stext and relocate ourselves based on 
+ *  the following calculation:
+ *
+ *  	virtual_base = ALIGN_DOWN(KERNELBASE,256M) +
+ *  				MODULO(_stext.run,256M)
+ *  and create the following mapping:
+ *
+ * 	 ALIGN_DOWN(_stext.run,256M) => ALIGN_DOWN(KERNELBASE,256M)
+ *
+ * When we process relocations, we cannot depend on the
+ * existing equation for the __va()/__pa() translations:
+ *
+ * 	 __va(x) = (x)  - PHYSICAL_START + KERNELBASE
+ *
+ *  Where:
+ *  	PHYSICAL_START = kernstart_addr = Physical address of _stext
+ *  	KERNELBASE = Compiled virtual address of _stext.
+ *
+ * This formula holds true iff, kernel load address is TLB page aligned.
+ *
+ * In our case, we need to also account for the shift in the kernel Virtual 
+ * address.
+ *
+ * E.g.,
+ *
+ * Let the kernel be loaded at 64MB and KERNELBASE be 0xc0000000 (same as PAGE_OFFSET).
+ * In this case, we would be mapping 0 to 0xc0000000, and kernstart_addr = 64M
+ *
+ * Now __va(1MB) = (0x100000) - (0x4000000) + 0xc0000000
+ *               = 0xbc100000 , which is wrong.
+ *
+ * Rather, it should be : 0xc0000000 + 0x100000 = 0xc0100000
+ * 	according to our mapping.
+ *
+ * Hence we use the following formula to get the translations right:
+ *
+ * 	__va(x) = (x) - [ PHYSICAL_START - Effective KERNELBASE ]
+ *
+ * 	Where :
+ * 		PHYSICAL_START = dynamic load address.(kernstart_addr variable)
+ * 		Effective KERNELBASE = virtual_base =
+ * 				     = ALIGN_DOWN(KERNELBASE,256M) +
+ * 						MODULO(PHYSICAL_START,256M)
+ *
+ * 	To make the cost of __va() / __pa() more light weight, we introduce
+ * 	a new variable virt_phys_offset, which will hold :
+ *
+ * 	virt_phys_offset = Effective KERNELBASE - PHYSICAL_START
+ * 			 = ALIGN_DOWN(KERNELBASE,256M) - 
+ * 			 	ALIGN_DOWN(PHYSICALSTART,256M)
+ *
+ * 	Hence :
+ *
+ * 	__va(x) = x - PHYSICAL_START + Effective KERNELBASE
+ * 		= x + virt_phys_offset
+ *
+ * 		and
+ * 	__pa(x) = x + PHYSICAL_START - Effective KERNELBASE
+ * 		= x - virt_phys_offset
+ * 		
  * On non-Book-E PPC64 PAGE_OFFSET and MEMORY_START are constants so use
  * the other definitions for __va & __pa.
  */
 #ifdef CONFIG_BOOKE
-#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) - PHYSICAL_START + KERNELBASE))
-#define __pa(x) ((unsigned long)(x) + PHYSICAL_START - KERNELBASE)
+#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + VIRT_PHYS_OFFSET))
+#define __pa(x) ((unsigned long)(x) - VIRT_PHYS_OFFSET)
 #else
 #define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + PAGE_OFFSET - MEMORY_START))
 #define __pa(x) ((unsigned long)(x) - PAGE_OFFSET + MEMORY_START)
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index b725dab..8f57c31 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -64,6 +64,35 @@ _ENTRY(_start);
 	mr	r31,r3		/* save device tree ptr */
 	li	r24,0		/* CPU number */
 
+#if defined(CONFIG_RELOCATABLE)
+/*
+ * Relocate ourselves to the current runtime address.
+ * This is called only by the Boot CPU.
+ * "relocate" is called with our current runtime virutal
+ * address.
+ * r21 will be loaded with the physical runtime address of _stext
+ */
+	bl	0f				/* Get our runtime address */
+0:	mflr	r21				/* Make it accessible */
+	addis	r21,r21,(_stext - 0b)@ha
+	addi	r21,r21,(_stext - 0b)@l 	/* Get our current runtime base */
+
+	/*
+	 * We have the runtime (virutal) address of our base.
+	 * We calculate our shift of offset from a 256M page.
+	 * We could map the 256M page we belong to at PAGE_OFFSET and
+	 * get going from there.
+	 */
+	lis	r4,KERNELBASE@h
+	ori	r4,r4,KERNELBASE@l
+	rlwinm	r6,r21,0,4,31			/* r6 = PHYS_START % 256M */
+	rlwinm	r5,r4,0,4,31			/* r5 = KERNELBASE % 256M */
+	subf	r3,r5,r6			/* r3 = r6 - r5 */
+	add	r3,r4,r3			/* Required Virutal Address */
+
+	bl	relocate
+#endif
+
 	bl	init_cpu_state
 
 	/*
@@ -88,27 +117,60 @@ _ENTRY(_start);
 
 #ifdef CONFIG_RELOCATABLE
 	/*
-	 * r25 will contain RPN/ERPN for the start address of memory
-	 *
-	 * Add the difference between KERNELBASE and PAGE_OFFSET to the
-	 * start of physical memory to get kernstart_addr.
+	 * When we reach here :
+	 * r25 holds RPN/ERPN for the start address of memory
+	 * r21 contain the physical address of _stext
 	 */
 	lis	r3,kernstart_addr@ha
 	la	r3,kernstart_addr@l(r3)
 
-	lis	r4,KERNELBASE@h
-	ori	r4,r4,KERNELBASE@l
-	lis	r5,PAGE_OFFSET@h
-	ori	r5,r5,PAGE_OFFSET@l
-	subf	r4,r5,r4
-
-	rlwinm	r6,r25,0,28,31	/* ERPN */
+	/*
+	 * Compute the kernstart_addr.
+	 * kernstart_addr => (r6,r8)
+	 * kernstart_addr & ~0xfffffff => (r6,r7)
+	 */
+	rlwinm	r6,r25,0,28,31	/* ERPN. Bits 32-35 of Address */
 	rlwinm	r7,r25,0,0,3	/* RPN - assuming 256 MB page size */
-	add	r7,r7,r4
+	rlwinm	r8,r21,0,4,31	/* r8 = (_stext & 0xfffffff) */
+	or	r8,r7,r8	/* Compute the lower 32bit of kernstart_addr */
+
+	/* Store kernstart_addr */
+	stw	r6,0(r3)	/* higher 32bit */
+	stw	r8,4(r3)	/* lower 32bit  */
+
+	/* 
+	 * Compute the virt_phys_offset :
+	 * virt_phys_offset = stext.run - kernstart_addr
+	 * 
+	 * stext.run = (KERNELBASE & ~0xfffffff) + (kernstart_addr & 0xfffffff)
+	 * When we relocate, we have :
+	 *
+	 *	(kernstart_addr & 0xfffffff) = (stext.run & 0xfffffff) 
+	 *
+	 * hence:
+	 *  virt_phys_offset = (KERNELBASE & ~0xfffffff) - (kernstart_addr & ~0xfffffff)
+	 * 
+	 */
 
-	stw	r6,0(r3)
-	stw	r7,4(r3)
-#endif
+	/* KERNELBASE&~0xfffffff => (r4,r5) */
+	li	r4, 0		/* higer 32bit */
+	lis	r5,KERNELBASE@h
+	rlwinm	r5,r5,0,0,3	/* Align to 256M, lower 32bit */
+
+	/* 
+	 * 64bit subtraction.
+	 */ 
+	subfc	r5,r7,r5
+	subfe	r4,r6,r4
+
+	/* Store virt_phys_offset */
+	lis	r3,virt_phys_offset@ha
+	la	r3,virt_phys_offset@l(r3)
+
+	stw	r4,0(r3)
+	stw	r5,4(r3)
+
+#endif	/* CONFIG_RELOCATABLE */
 
 /*
  * Decide what sort of machine this is and initialize the MMU.
@@ -801,11 +863,30 @@ skpinv:	addi	r4,r4,1				/* Increment */
  * Configure and load pinned entry into TLB slot 63.
  */
 
+#ifdef CONFIG_RELOCATABLE
+	/*
+	 * Stores the XLAT entry for this code at r25.
+	 * Uses the mapping where we are loaded.
+	 */
+
+	tlbre	r25,r23,PPC44x_TLB_XLAT		/* Read our XLAT entry in r25 */
+
+	/* PAGEID fields for mapping */
+	lis	r3,KERNELBASE@h
+	rlwinm	r3,r3,0,0,3			/* Round to 256M page boundary */
+
+	/* Use the current XLAT entry */
+	mr	r4,r25
+#else
+
+
 	lis	r3,PAGE_OFFSET@h
 	ori	r3,r3,PAGE_OFFSET@l
 
 	/* Kernel is at the base of RAM */
 	li r4, 0			/* Load the kernel physical address */
+#endif
+
 
 	/* Load the kernel PID = 0 */
 	li	r0,0
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 161cefd..a249edb 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -65,6 +65,13 @@ phys_addr_t memstart_addr = (phys_addr_t)~0ull;
 EXPORT_SYMBOL(memstart_addr);
 phys_addr_t kernstart_addr;
 EXPORT_SYMBOL(kernstart_addr);
+
+#if	defined(CONFIG_44x) && defined(CONFIG_RELOCATABLE)
+/* Used in __va()/__pa() for 44x */
+long long virt_phys_offset;
+EXPORT_SYMBOL(virt_phys_offset);
+#endif
+
 phys_addr_t lowmem_end_addr;
 
 int boot_mapsize;

^ permalink raw reply related

* [PATCH 1/3] [powerpc32] Process dynamic relocations for kernel
From: Suzuki K. Poulose @ 2011-10-10  9:55 UTC (permalink / raw)
  To: linux ppc dev
  Cc: Michal Simek, tmarri, Mahesh Jagannath Salgaonkar, Alan Modra,
	Dave Hansen, David Laight, Suzuki K. Poulose, Scott Wood,
	Paul Mackerras, linuxppc-dev, Vivek Goyal
In-Reply-To: <20111010094627.16589.52367.stgit@suzukikp.in.ibm.com>

The following patch implements the dynamic relocation processing for
PPC32 kernel. relocate() accepts the target virtual address and relocates
 the kernel image to the same.

Currently the following relocation types are handled :

	R_PPC_RELATIVE
	R_PPC_ADDR16_LO
	R_PPC_ADDR16_HI
	R_PPC_ADDR16_HA

The last 3 relocations in the above list depends on value of Symbol indexed
whose index is encoded in the Relocation entry. Hence we need the Symbol
Table for processing such relocations.

Note: The GNU ld for ppc32 produces buggy relocations for relocation types
that depend on symbols. The value of the symbols with STB_LOCAL scope
should be assumed to be zero. - Alan Modra

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc:	Paul Mackerras <paulus@samba.org>
Cc:	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc:	Alan Modra <amodra@au1.ibm.com>
Cc:	Kumar Gala <galak@kernel.crashing.org>
Cc:	Josh Boyer <jwboyer@gmail.com>
Cc:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
---

 arch/powerpc/Kconfig              |    4 +
 arch/powerpc/kernel/Makefile      |    2 
 arch/powerpc/kernel/reloc_32.S    |  194 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/vmlinux.lds.S |    8 +-
 4 files changed, 207 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/kernel/reloc_32.S

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8523bd1..9eb2e60 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -859,6 +859,10 @@ config RELOCATABLE
 	  setting can still be useful to bootwrappers that need to know the
 	  load location of the kernel (eg. u-boot/mkimage).
 
+config RELOCATABLE_PPC32
+	def_bool y
+	depends on PPC32 && RELOCATABLE
+
 config PAGE_OFFSET_BOOL
 	bool "Set custom page offset address"
 	depends on ADVANCED_OPTIONS
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index ce4f7f1..ee728e4 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -85,6 +85,8 @@ extra-$(CONFIG_FSL_BOOKE)	:= head_fsl_booke.o
 extra-$(CONFIG_8xx)		:= head_8xx.o
 extra-y				+= vmlinux.lds
 
+obj-$(CONFIG_RELOCATABLE_PPC32)	+= reloc_32.o
+
 obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o
 obj-$(CONFIG_PPC64)		+= dma-iommu.o iommu.o
 obj-$(CONFIG_KGDB)		+= kgdb.o
diff --git a/arch/powerpc/kernel/reloc_32.S b/arch/powerpc/kernel/reloc_32.S
new file mode 100644
index 0000000..045d61e
--- /dev/null
+++ b/arch/powerpc/kernel/reloc_32.S
@@ -0,0 +1,194 @@
+/*
+ * Code to process dynamic relocations for PPC32.
+ *
+ * Copyrights (C) IBM Corporation, 2011.
+ *	Author: Suzuki Poulose <suzuki@in.ibm.com>
+ *
+ *  - Based on ppc64 code - reloc_64.S
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/ppc_asm.h>
+
+/* Dynamic section table entry tags */
+DT_RELA = 7			/* Tag for Elf32_Rela section */
+DT_RELASZ = 8			/* Size of the Rela relocs */
+DT_RELAENT = 9			/* Size of one Rela reloc entry */
+
+STN_UNDEF = 0			/* Undefined symbol index */
+STB_LOCAL = 0			/* Local binding for the symbol */
+
+R_PPC_ADDR16_LO = 4		/* Lower half of (S+A) */
+R_PPC_ADDR16_HI = 5		/* Upper half of (S+A) */
+R_PPC_ADDR16_HA = 6		/* High Adjusted (S+A) */
+R_PPC_RELATIVE = 22
+
+/*
+ * r3 = desired final address
+ */
+
+_GLOBAL(relocate)
+
+	mflr	r0
+	bl	0f		/* Find our current runtime address */
+0:	mflr	r12		/* Make it accessible */
+	mtlr	r0
+
+	lwz	r11, (p_dyn - 0b)(r12)
+	add	r11, r11, r12	/* runtime address of .dynamic section */
+	lwz	r9, (p_rela - 0b)(r12)
+	add	r9, r9, r12	/* runtime address of .rela.dyn section */
+	lwz	r10, (p_st - 0b)(r12)
+	add	r10, r10, r12	/* runtime address of _stext section */
+	lwz	r13, (p_sym - 0b)(r12)
+	add	r13, r13, r12	/* runtime address of .dynsym section */
+
+	/*
+	 * Scan the dynamic section for RELA, RELASZ entries
+	 */
+	li	r6, 0
+	li	r7, 0
+	li	r8, 0
+1:	lwz	r5, 0(r11)	/* ELF_Dyn.d_tag */
+	cmpwi	r5, 0		/* End of ELF_Dyn[] */
+	beq	eodyn
+	cmpwi	r5, DT_RELA
+	bne	relasz
+	lwz	r7, 4(r11)	/* r7 = rela.link */
+	b	skip
+relasz:
+	cmpwi	r5, DT_RELASZ
+	bne	relaent
+	lwz	r8, 4(r11)	/* r8 = Total Rela relocs size */
+	b	skip
+relaent:
+	cmpwi	r5, DT_RELAENT
+	bne	skip
+	lwz	r6, 4(r11)	/* r6 = Size of one Rela reloc */
+skip:
+	addi	r11, r11, 8
+	b	1b
+eodyn:				/* End of Dyn Table scan */
+
+	/* Check if we have found all the entries */
+	cmpwi	r7, 0
+	beq	done
+	cmpwi	r8, 0
+	beq	done
+	cmpwi	r6, 0
+	beq	done
+
+
+	/*
+	 * Work out the current offset from the link time address of .rela
+	 * section.
+	 *  cur_offset[r7] = rela.run[r9] - rela.link [r7]
+	 *  _stext.link[r10] = _stext.run[r10] - cur_offset[r7]
+	 *  final_offset[r3] = _stext.final[r3] - _stext.link[r10]
+	 */
+	subf	r7, r7, r9	/* cur_offset */
+	subf	r10, r7, r10
+	subf	r3, r10, r3	/* final_offset */
+
+	subf	r8, r6, r8	/* relaz -= relaent */
+	/*
+	 * Scan through the .rela table and process each entry
+	 * r9	- points to the current .rela table entry
+	 * r13	- points to the symbol table
+	 */
+
+	/*
+	 * Check if we have a relocation based on symbol
+	 * r5 will hold the value of the symbol.
+	 */
+applyrela:
+	lwz	r4, 4(r9)
+	srwi	r5, r4, 8		/* ELF32_R_SYM(r_info) */
+	cmpwi	r5, STN_UNDEF	/* sym == STN_UNDEF ? */
+	beq	get_type	/* value = 0 */
+	/* Find the value of the symbol at index(r5) */
+	slwi	r5, r5, 4		/* r5 = r5 * sizeof(Elf32_Sym) */
+	add	r12, r13, r5	/* r12 = &__dyn_sym[Index] */
+
+	/*
+	 * GNU ld has a bug, where dynamic relocs based on
+	 * STB_LOCAL symbols, the value should be assumed
+	 * to be zero. - Alan Modra
+	 */
+	/* XXX: Do we need to check if we are using GNU ld ? */
+	lbz	r5, 12(r12)	/* r5 = dyn_sym[Index].st_info */
+	extrwi	r5, r5, 4, 24	/* r5 = ELF32_ST_BIND(r5) */
+	cmpwi	r5, STB_LOCAL	/* st_value = 0, ld bug */
+	beq	get_type	/* We have r5 = 0 */
+	lwz	r5, 4(r12)	/* r5 = __dyn_sym[Index].st_value */
+
+get_type:
+	/* r4 holds the relocation type */
+	extrwi	r4, r4, 8, 24	/* r4 = ((char*)r4)[3] */
+
+	/* R_PPC_RELATIVE */
+	cmpwi	r4, R_PPC_RELATIVE
+	bne	hi16
+	lwz	r4, 0(r9)	/* r_offset */
+	lwz	r0, 8(r9)	/* r_addend */
+	add	r0, r0, r3	/* final addend */
+	stwx	r0, r4, r7	/* memory[r4+r7]) = (u32)r0 */
+	b	nxtrela		/* continue */
+
+	/* R_PPC_ADDR16_HI */
+hi16:
+	cmpwi	r4, R_PPC_ADDR16_HI
+	bne	ha16
+	lwz	r4, 0(r9)	/* r_offset */
+	lwz	r0, 8(r9)	/* r_addend */
+	add	r0, r0, r3
+	add	r0, r0, r5	/* r0 = (S+A+Offset) */
+	extrwi	r0, r0, 16, 0	/* r0 = (r0 >> 16) */
+	b	store_half
+
+	/* R_PPC_ADDR16_HA */
+ha16:
+	cmpwi	r4, R_PPC_ADDR16_HA
+	bne	lo16
+	lwz	r4, 0(r9)	/* r_offset */
+	lwz	r0, 8(r9)	/* r_addend */
+	add	r0, r0, r3
+	add	r0, r0, r5	/* r0 = (S+A+Offset) */
+	extrwi	r5, r0, 1, 16	/* Extract bit 16 */
+	extrwi	r0, r0, 16, 0	/* r0 = (r0 >> 16) */
+	add	r0, r0, r5	/* Add it to r0 */
+	b	store_half
+
+	/* R_PPC_ADDR16_LO */
+lo16:
+	cmpwi	r4, R_PPC_ADDR16_LO
+	bne	nxtrela
+	lwz	r4, 0(r9)	/* r_offset */
+	lwz	r0, 8(r9)	/* r_addend */
+	add	r0, r0, r3
+	add	r0, r0, r5	/* r0 = (S+A+Offset) */
+	extrwi	r0, r0, 16, 16	/* r0 &= 0xffff */
+	/* Fall through to */
+
+	/* Store half word */
+store_half:
+	sthx	r0, r4, r7	/* memory[r4+r7] = (u16)r0 */
+
+nxtrela:
+	cmpwi	r8, 0		/* relasz = 0 ? */
+	ble	done
+	add	r9, r9, r6	/* move to next entry in the .rela table */
+	subf	r8, r6, r8	/* relasz -= relaent */
+	b	applyrela
+
+done:	blr
+
+
+p_dyn:		.long	__dynamic_start - 0b
+p_rela:		.long	__rela_dyn_start - 0b
+p_sym:		.long	__dynamic_symtab - 0b
+p_st:		.long	_stext - 0b
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index 920276c..710a540 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -170,7 +170,13 @@ SECTIONS
 	}
 #ifdef CONFIG_RELOCATABLE
 	. = ALIGN(8);
-	.dynsym : AT(ADDR(.dynsym) - LOAD_OFFSET) { *(.dynsym) }
+	.dynsym : AT(ADDR(.dynsym) - LOAD_OFFSET)
+	{
+#ifdef CONFIG_RELOCATABLE_PPC32
+		__dynamic_symtab = .;
+#endif
+		*(.dynsym)
+	}
 	.dynstr : AT(ADDR(.dynstr) - LOAD_OFFSET) { *(.dynstr) }
 	.dynamic : AT(ADDR(.dynamic) - LOAD_OFFSET)
 	{

^ permalink raw reply related

* [PATCH 0/3] Kdump support for PPC440x
From: Suzuki K. Poulose @ 2011-10-10  9:54 UTC (permalink / raw)
  To: linux ppc dev
  Cc: Michal Simek, tmarri, Mahesh Jagannath Salgaonkar, Dave Hansen,
	David Laight, Scott Wood, Vivek Goyal

The following series implements CRASH_DUMP support for PPC440x. The
patches apply on top of power-next tree. This set also adds support
for CONFIG_RELOCATABLE on 44x.

I have tested the patches on Ebony and Virtex(QEMU Emulated). Testing
these patches would require latest snapshot of kexec-tools git tree and
(preferrably) the following patch for kexec-tools :

	http://lists.infradead.org/pipermail/kexec/2011-October/005552.html

---

Suzuki K. Poulose (3):
      [44x] Enable CRASH_DUMP for 440x
      [44x] Enable CONFIG_RELOCATABLE for PPC44x
      [powerpc32] Process dynamic relocations for kernel


 arch/powerpc/Kconfig              |   10 +-
 arch/powerpc/Makefile             |    1 
 arch/powerpc/include/asm/page.h   |   84 ++++++++++++++++
 arch/powerpc/kernel/Makefile      |    2 
 arch/powerpc/kernel/head_44x.S    |  111 ++++++++++++++++++---
 arch/powerpc/kernel/reloc_32.S    |  194 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/vmlinux.lds.S |    8 +-
 arch/powerpc/mm/init_32.c         |    7 +
 8 files changed, 396 insertions(+), 21 deletions(-)
 create mode 100644 arch/powerpc/kernel/reloc_32.S

-- 
Thanks
Suzuki

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Eli Cohen @ 2011-10-10  9:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: netdev, Yevgeny Petrilin, Eli Cohen, David Laight,
	Thadeu Lima de Souza Cascardo, linuxppc-dev
In-Reply-To: <1318238645.29415.426.camel@pasglop>

On Mon, Oct 10, 2011 at 11:24:05AM +0200, Benjamin Herrenschmidt wrote:
> On Mon, 2011-10-10 at 11:16 +0200, Eli Cohen wrote:
> 
> > Until then I think we need to have the logic working right on ppc and
> > measure if blue flame buys us any improvement in ppc. If that's not
> > the case (e.g because write combining is not working), then maybe we
> > should avoid using blueflame in ppc.
> > Could any of the guys from IBM check this and give us feedback?
> 
> I don't have the necessary hardware myself to test that but maybe Thadeu
> can.
> 
> Note that for WC to work, things must be mapped non-guarded. You can do
> that by using ioremap_prot() with pgprot_noncached_wc(PAGE_KERNEL) or
> ioremap_wc() (dunno how "generic" the later is).

I use the io mapping API:

at driver statrt:
        priv->bf_mapping = io_mapping_create_wc(bf_start, bf_len);
        if (!priv->bf_mapping)
                err = -ENOMEM;

and then:
        uar->bf_map = io_mapping_map_wc(priv->bf_mapping, uar->index << PAGE_SHIFT);

        
Will this work on ppc?

> 
> >From there, you should get write combining provided that you don't have
> barriers between every access (ie those copy operations in their current
> form should do the trick).
> 
> Cheers,
> Ben.
> 
> > > Maybe it's time for us to revive those discussions about providing a
> > > good set of relaxed MMIO accessors with explicit barriers :-)
> > > 
> > > Cheers,
> > > Ben.
> > >  
> 

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-10  9:24 UTC (permalink / raw)
  To: Eli Cohen
  Cc: netdev, Yevgeny Petrilin, Eli Cohen, David Laight,
	Thadeu Lima de Souza Cascardo, linuxppc-dev
In-Reply-To: <20111010091611.GN2681@mtldesk30>

On Mon, 2011-10-10 at 11:16 +0200, Eli Cohen wrote:

> Until then I think we need to have the logic working right on ppc and
> measure if blue flame buys us any improvement in ppc. If that's not
> the case (e.g because write combining is not working), then maybe we
> should avoid using blueflame in ppc.
> Could any of the guys from IBM check this and give us feedback?

I don't have the necessary hardware myself to test that but maybe Thadeu
can.

Note that for WC to work, things must be mapped non-guarded. You can do
that by using ioremap_prot() with pgprot_noncached_wc(PAGE_KERNEL) or
ioremap_wc() (dunno how "generic" the later is).

>From there, you should get write combining provided that you don't have
barriers between every access (ie those copy operations in their current
form should do the trick).

Cheers,
Ben.

> > Maybe it's time for us to revive those discussions about providing a
> > good set of relaxed MMIO accessors with explicit barriers :-)
> > 
> > Cheers,
> > Ben.
> >  

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Eli Cohen @ 2011-10-10  9:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: netdev, Yevgeny Petrilin, Eli Cohen, David Laight,
	Thadeu Lima de Souza Cascardo, linuxppc-dev
In-Reply-To: <1318237284.29415.422.camel@pasglop>

On Mon, Oct 10, 2011 at 11:01:24AM +0200, Benjamin Herrenschmidt wrote:
> 
> The case where things get a bit more nasty is when you try to use MMIO
> for low latency small-data type transfers instead of DMA, in which case
> you do want the ability for the chipset to write-combine and control the
> barriers more precisely.
> 
> However, this is hard and Linux doesn't provide very good accessors to
> do so, thus you need to be extra careful (see my example about wmb()
> 
> In the case of the iomap "copy" operations, my problem is that they
> don't properly advertise their lack of ordering since normal iomap does
> have full ordering.
> 
> I believe they should provide ordering with a barrier before & a barrier
> after, eventually with _relaxed variants or _raw variants for those who
> "know what they are doing".

Until then I think we need to have the logic working right on ppc and
measure if blue flame buys us any improvement in ppc. If that's not
the case (e.g because write combining is not working), then maybe we
should avoid using blueflame in ppc.
Could any of the guys from IBM check this and give us feedback?
> 
> Maybe it's time for us to revive those discussions about providing a
> good set of relaxed MMIO accessors with explicit barriers :-)
> 
> Cheers,
> Ben.
>  

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-10  9:01 UTC (permalink / raw)
  To: Eli Cohen
  Cc: netdev, Yevgeny Petrilin, Eli Cohen, David Laight,
	Thadeu Lima de Souza Cascardo, linuxppc-dev
In-Reply-To: <20111010084726.GM2681@mtldesk30>

On Mon, 2011-10-10 at 10:47 +0200, Eli Cohen wrote:
> On Mon, Oct 10, 2011 at 09:40:17AM +0100, David Laight wrote:
> > 
> > Actually memory barriers shouldn't really be added to
> > any of these 'accessor' functions.
> > (Or, at least, ones without barriers should be provided.)
> > 
> > The driver may want to to a series of writes, then a
> > single barrier, before a final write of a command (etc).
> > 
> > in_le32() from io.h is specially horrid!
> > 
> > 	David
> > 
> The driver would like to control if and when we want to put a memory
> barrier. We really don't want it to be done under the hood. In this
> respect we prefer raw functions which are still available to all
> platforms.

 ... but not necessarily the corresponding barriers.

That's why on powerpc we had to make all rmb,wmb and mb the same, aka a
full sync, because our weaker barriers don't order cachable vs.
non-cachable.

In any case, the raw functions are a bit nasty to use because they both
don't have barriers -and- don't handle endianness. So you have to be
extra careful.

In 90% of the cases, the barriers are what you want anyway. For example
in the else case of the driver, the doorbell MMIO typically wants it, so
using writel() is fine (or iowrite32be) and will have the necessary
barriers.

The case where things get a bit more nasty is when you try to use MMIO
for low latency small-data type transfers instead of DMA, in which case
you do want the ability for the chipset to write-combine and control the
barriers more precisely.

However, this is hard and Linux doesn't provide very good accessors to
do so, thus you need to be extra careful (see my example about wmb()

In the case of the iomap "copy" operations, my problem is that they
don't properly advertise their lack of ordering since normal iomap does
have full ordering.

I believe they should provide ordering with a barrier before & a barrier
after, eventually with _relaxed variants or _raw variants for those who
"know what they are doing".

Maybe it's time for us to revive those discussions about providing a
good set of relaxed MMIO accessors with explicit barriers :-)

Cheers,
Ben.
 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox