LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH 2/7] powerpc/mm: 64-bit 4k: use a PMD-based virtual page table
From: Benjamin Herrenschmidt @ 2011-05-18 21:33 UTC
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20110518210528.GA29524@schlenkerla.am.freescale.net>

On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> Loads with non-linear access patterns were producing a very high
> ratio of recursive pt faults to regular tlb misses.  Rather than
> choose between a 4-level table walk or a 1-level virtual page table
> lookup, use a hybrid scheme with a virtual linear pmd, followed by a
> 2-level lookup in the normal handler.
> 
> This adds about 5 cycles (assuming no cache misses, and e5500 timing)
> to a normal TLB miss, but greatly reduces the recursive fault rate
> for loads which don't have locality within 2 MiB regions but do have
> significant locality within 1 GiB regions.  Improvements of close to 50%
> were seen on such benchmarks.

Can you publish benchmarks that compare these two with no virtual page
table at all (four full loads)?

Cheers,
Ben.

> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
>  arch/powerpc/mm/tlb_low_64e.S |   23 +++++++++++++++--------
>  1 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
> index af08922..17726d3 100644
> --- a/arch/powerpc/mm/tlb_low_64e.S
> +++ b/arch/powerpc/mm/tlb_low_64e.S
> @@ -24,7 +24,7 @@
>  #ifdef CONFIG_PPC_64K_PAGES
>  #define VPTE_PMD_SHIFT	(PTE_INDEX_SIZE+1)
>  #else
> -#define VPTE_PMD_SHIFT	(PTE_INDEX_SIZE)
> +#define VPTE_PMD_SHIFT	0
>  #endif
>  #define VPTE_PUD_SHIFT	(VPTE_PMD_SHIFT + PMD_INDEX_SIZE)
>  #define VPTE_PGD_SHIFT	(VPTE_PUD_SHIFT + PUD_INDEX_SIZE)
> @@ -185,7 +185,7 @@ normal_tlb_miss:
>  	/* Insert the bottom bits in */
>  	rlwimi	r14,r15,0,16,31
>  #else
> -	rldicl	r14,r16,64-(PAGE_SHIFT-3),PAGE_SHIFT-3+4
> +	rldicl	r14,r16,64-(PMD_SHIFT-3),PMD_SHIFT-3+4
>  #endif
>  	sldi	r15,r10,60
>  	clrrdi	r14,r14,3
> @@ -202,6 +202,16 @@ MMU_FTR_SECTION_ELSE
>  	ld	r14,0(r10)
>  ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_USE_TLBRSRV)
>  
> +#ifndef CONFIG_PPC_64K_PAGES
> +	rldicl	r15,r16,64-PAGE_SHIFT+3,64-PTE_INDEX_SIZE-3
> +	clrrdi	r15,r15,3
> +
> +	cmpldi	cr0,r14,0
> +	beq	normal_tlb_miss_access_fault
> +
> +	ldx	r14,r14,r15
> +#endif
> +
>  finish_normal_tlb_miss:
>  	/* Check if required permissions are met */
>  	andc.	r15,r11,r14
> @@ -353,14 +363,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_TLBRSRV)
>  #ifndef CONFIG_PPC_64K_PAGES
>  	/* Get to PUD entry */
>  	rldicl	r11,r16,64-VPTE_PUD_SHIFT,64-PUD_INDEX_SIZE-3
> -	clrrdi	r10,r11,3
> -	ldx	r15,r10,r15
> -	cmpldi	cr0,r15,0
> -	beq	virt_page_table_tlb_miss_fault
> -#endif /* CONFIG_PPC_64K_PAGES */
> -
> +#else
>  	/* Get to PMD entry */
>  	rldicl	r11,r16,64-VPTE_PMD_SHIFT,64-PMD_INDEX_SIZE-3
> +#endif
> +
>  	clrrdi	r10,r11,3
>  	ldx	r15,r10,r15
>  	cmpldi	cr0,r15,0
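
The rotate-and-mask pair added to normal_tlb_miss above (rldicl followed
by clrrdi) can be modelled in C to see what offset it produces.  This is
only a sketch: the C helpers rotl64(), rldicl(), clrrdi(),
pte_offset_asm() and pte_offset_c() are invented here for illustration,
and PAGE_SHIFT = 12 and PTE_INDEX_SIZE = 9 are assumed to be the 4K-page
values.

```c
#include <stdint.h>

#define PAGE_SHIFT      12	/* assumed: 4K pages */
#define PTE_INDEX_SIZE   9	/* assumed: value from pgtable-ppc64-4k.h */

/* Rotate a 64-bit value left by n bits (n taken modulo 64). */
static uint64_t rotl64(uint64_t x, unsigned n)
{
	n &= 63;
	return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Model of "rldicl RA,RS,SH,MB": rotate left by SH, then clear the
 * MB high-order bits.  Valid for MB in 0..63. */
static uint64_t rldicl(uint64_t rs, unsigned sh, unsigned mb)
{
	return rotl64(rs, sh) & (~UINT64_C(0) >> mb);
}

/* Model of "clrrdi RA,RS,n": clear the n low-order bits (n < 64). */
static uint64_t clrrdi(uint64_t rs, unsigned n)
{
	return rs & ~((UINT64_C(1) << n) - 1);
}

/* The two instructions from the patch, applied to an effective address. */
uint64_t pte_offset_asm(uint64_t ea)
{
	uint64_t r15 = rldicl(ea, 64 - PAGE_SHIFT + 3, 64 - PTE_INDEX_SIZE - 3);

	return clrrdi(r15, 3);
}

/* The same quantity written the obvious way: PTE index times 8 bytes. */
uint64_t pte_offset_c(uint64_t ea)
{
	uint64_t index = (ea >> PAGE_SHIFT) & ((UINT64_C(1) << PTE_INDEX_SIZE) - 1);

	return index * sizeof(uint64_t);
}
```

Under those assumptions both functions return the byte offset of the PTE
within its PTE page, which is what the following ldx adds to the PMD
entry loaded into r14.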


* Re: [PATCH 1/7] powerpc/mm: 64-bit 4k: use page-sized PMDs
From: Benjamin Herrenschmidt @ 2011-05-18 21:32 UTC
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20110518210453.GA29500@schlenkerla.am.freescale.net>

On Wed, 2011-05-18 at 16:04 -0500, Scott Wood wrote:
> This allows a virtual page table to be used at the PMD rather than
> the PTE level.
> 
> Rather than adjust the constant in pgd_index() (or ignore it, as
> too-large values don't hurt as long as overly large addresses aren't
> passed in), go back to using PTRS_PER_PGD.  The overflow comment seems to
> apply to a very old implementation of free_pgtables that used pgd_index()
> (unfortunately the commit message, if you seek it out in the historic
> tree, doesn't mention any details about the overflow).  The existing
> value was numerically identical to the old 4K-page PTRS_PER_PGD, so
> using it shouldn't produce an overflow where it's not otherwise possible.
> 
> Also get rid of the incorrect comment at the top of pgtable-ppc64-4k.h.

Why do you want to create a virtual page table at the PMD level?  Also,
you are changing the geometry of the page tables, which I don't think we
want.  We chose that geometry so that the levels match the segment sizes
on server.  I think changing it may have an impact on the hugetlbfs code
(check with David).  It was also meant as a way to implement shared page
tables on hash64, though we never published that.
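
To make the geometry change concrete, here is a small C sketch that
derives the per-level shifts from the index sizes in the quoted diff
(old: PTE 9, PMD 7, PUD 7, PGD 9; new: PTE 9, PMD 9, PUD 7, PGD 7).  The
struct and function names are made up for illustration, and
PAGE_SHIFT = 12 is assumed for 4K pages.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumed: 4K pages */

/* Index sizes, in bits, for each page-table level. */
struct geometry {
	unsigned pte, pmd, pud, pgd;
};

static const struct geometry old_4k = { 9, 7, 7, 9 };	/* before the patch */
static const struct geometry new_4k = { 9, 9, 7, 7 };	/* after the patch  */

/* log2 of the region mapped by one PMD entry */
unsigned pmd_shift(const struct geometry *g)
{
	return PAGE_SHIFT + g->pte;
}

/* log2 of the region mapped by one PUD entry */
unsigned pud_shift(const struct geometry *g)
{
	return pmd_shift(g) + g->pmd;
}

/* log2 of the region mapped by one PGD entry */
unsigned pgdir_shift(const struct geometry *g)
{
	return pud_shift(g) + g->pud;
}

/* total number of virtual address bits covered by the tree */
unsigned va_bits(const struct geometry *g)
{
	return pgdir_shift(g) + g->pgd;
}
```

With these numbers a PMD entry maps 2 MiB in both layouts and both cover
44 bits of address space.  A PUD entry maps 2^28 bytes (256 MiB) in the
old layout and 2^30 bytes (1 GiB) in the new one, and a 9-bit PMD index
with 8-byte entries makes each PMD table exactly one 4K page.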

Cheers,
Ben.

> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
>  arch/powerpc/include/asm/pgtable-ppc64-4k.h |   12 ++++--------
>  arch/powerpc/include/asm/pgtable-ppc64.h    |    3 +--
>  2 files changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pgtable-ppc64-4k.h b/arch/powerpc/include/asm/pgtable-ppc64-4k.h
> index 6eefdcf..194005e 100644
> --- a/arch/powerpc/include/asm/pgtable-ppc64-4k.h
> +++ b/arch/powerpc/include/asm/pgtable-ppc64-4k.h
> @@ -1,14 +1,10 @@
>  #ifndef _ASM_POWERPC_PGTABLE_PPC64_4K_H
>  #define _ASM_POWERPC_PGTABLE_PPC64_4K_H
> -/*
> - * Entries per page directory level.  The PTE level must use a 64b record
> - * for each page table entry.  The PMD and PGD level use a 32b record for
> - * each entry by assuming that each entry is page aligned.
> - */
> +
>  #define PTE_INDEX_SIZE  9
> -#define PMD_INDEX_SIZE  7
> +#define PMD_INDEX_SIZE  9
>  #define PUD_INDEX_SIZE  7
> -#define PGD_INDEX_SIZE  9
> +#define PGD_INDEX_SIZE  7
>  
>  #ifndef __ASSEMBLY__
>  #define PTE_TABLE_SIZE	(sizeof(pte_t) << PTE_INDEX_SIZE)
> @@ -19,7 +15,7 @@
>  
>  #define PTRS_PER_PTE	(1 << PTE_INDEX_SIZE)
>  #define PTRS_PER_PMD	(1 << PMD_INDEX_SIZE)
> -#define PTRS_PER_PUD	(1 << PMD_INDEX_SIZE)
> +#define PTRS_PER_PUD	(1 << PUD_INDEX_SIZE)
>  #define PTRS_PER_PGD	(1 << PGD_INDEX_SIZE)
>  
>  /* PMD_SHIFT determines what a second-level page table entry can map */
> diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
> index 2b09cd5..8bd1cd9 100644
> --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> @@ -181,8 +181,7 @@
>   * Find an entry in a page-table-directory.  We combine the address region
>   * (the high order N bits) and the pgd portion of the address.
>   */
> -/* to avoid overflow in free_pgtables we don't use PTRS_PER_PGD here */
> -#define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & 0x1ff)
> +#define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & (PTRS_PER_PGD - 1))
>  
>  #define pgd_offset(mm, address)	 ((mm)->pgd + pgd_index(address))
>  


* RE: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic
From: Benjamin Herrenschmidt @ 2011-05-18 21:30 UTC
  To: Moore, Eric
  Cc: Prakash, Sathya, Desai, Kashyap, linux-scsi@vger.kernel.org,
	Matthew Wilcox, Milton Miller, James Bottomley, paulus@samba.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <4565AEA676113A449269C2F3A549520F80B66280@cosmail03.lsi.com>

On Wed, 2011-05-18 at 09:35 -0600, Moore, Eric wrote:
> I worked the original defect a couple of months ago, and Kashyap is now
> getting around to posting my patches.
> 
> This original defect has nothing to do with PPC64.  The original
> problem was only on x86.  It only became a problem on PPC64 when I
> tried to fix the original x86 issue by copying the writeq code from
> the Linux headers, which then broke PPC64.  I doubt that broken patch
> was ever posted.  Anyway, back to the original defect.  The reason it
> became a problem on x86 is that the kernel has an implementation of
> writeq in the arch/x86 headers, which means our internal
> implementation of writeq is not being used.  The writeq
> implementation in the kernel is totally wrong for arch/x86 because it
> does not take a spin lock, and if two processors simultaneously do
> two separate 32-bit PCI writes, then what the controller firmware
> receives is out of order.  This change occurred between Red Hat RHEL5
> and RHEL6.  In RHEL5, writeq was not implemented in the arch/x86
> headers, and our driver's internal implementation of writeq was used.
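
The interleaving described above can be shown with a single-threaded toy
model.  Everything here is hypothetical: struct toy_dev and its helpers
are not mpt2sas code, and the assumption that the device consumes a
request when the high half arrives is made up purely for illustration.

```c
#include <stdint.h>

/* Toy device: a 64-bit request register written as two 32-bit halves.
 * In this model the request is consumed when the high half arrives. */
struct toy_dev {
	uint32_t lo_latch;	/* last low half written          */
	uint64_t last_request;	/* request the "firmware" received */
};

void toy_write_lo(struct toy_dev *d, uint32_t v)
{
	d->lo_latch = v;
}

void toy_write_hi(struct toy_dev *d, uint32_t v)
{
	d->last_request = ((uint64_t)v << 32) | d->lo_latch;
}

/* No lock: CPU B's low half lands between CPU A's two halves. */
uint64_t interleaved_request(uint64_t a, uint64_t b)
{
	struct toy_dev d = { 0, 0 };

	toy_write_lo(&d, (uint32_t)a);		/* CPU A, first half  */
	toy_write_lo(&d, (uint32_t)b);		/* CPU B, first half  */
	toy_write_hi(&d, (uint32_t)(a >> 32));	/* CPU A, second half */
	return d.last_request;
}

/* With a lock held across both halves, nothing can slip in between. */
uint64_t serialized_request(uint64_t a)
{
	struct toy_dev d = { 0, 0 };

	toy_write_lo(&d, (uint32_t)a);
	toy_write_hi(&d, (uint32_t)(a >> 32));
	return d.last_request;
}
```

In the unlocked sequence the device sees a request built from A's high
half and B's low half, which is neither of the values the CPUs intended
to post.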

You may also want to look at Milton's comments: calling
init_completion() and then immediately wait_for_completion() looks racy.

You should initialize the completion before you issue the I/O that will
eventually cause complete() to be called.
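
A single-threaded toy model shows why the ordering matters.  The toy_*
names are invented for this sketch and only mimic the counter inside a
kernel completion; toy_try_wait() is a non-blocking stand-in that
reports whether a real wait would have returned.

```c
/* Minimal model of a completion: a counter of pending complete() calls. */
struct toy_completion {
	unsigned int done;
};

void toy_init_completion(struct toy_completion *c)
{
	c->done = 0;
}

void toy_complete(struct toy_completion *c)
{
	c->done++;
}

/* Returns 1 if a completion was pending (a wait would return),
 * 0 if the caller would block. */
int toy_try_wait(struct toy_completion *c)
{
	if (!c->done)
		return 0;
	c->done--;
	return 1;
}

/* Racy order: the I/O is issued first and happens to finish before
 * the completion is initialized, so the init wipes out the event. */
int racy_order(void)
{
	struct toy_completion c = { 0 };

	toy_complete(&c);		/* I/O finished early    */
	toy_init_completion(&c);	/* resets done back to 0 */
	return toy_try_wait(&c);	/* would block forever   */
}

/* Safe order: initialize first, then issue the I/O. */
int safe_order(void)
{
	struct toy_completion c = { 0 };

	toy_init_completion(&c);
	toy_complete(&c);		/* I/O finished          */
	return toy_try_wait(&c);	/* completion is seen    */
}
```

In the racy order the waiter never sees the completion, which in a real
driver would show up as a hang or a timeout.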

Cheers,
Ben.


* [PATCH 7/7] [RFC] SMP support code
From: Eric Van Hensbergen @ 2011-05-18 21:24 UTC
  To: linux-kernel; +Cc: linuxppc-dev, bg-linux
In-Reply-To: <1305753895-24845-1-git-send-email-ericvh@gmail.com>

This patch adds the necessary core code to enable SMP support on BlueGene/P.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
---
 arch/powerpc/kernel/head_44x.S         |   72 +++++++++++++++++++++++++++++
 arch/powerpc/mm/fault.c                |   77 ++++++++++++++++++++++++++++++++
 arch/powerpc/platforms/Kconfig.cputype |    2 +-
 3 files changed, 150 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 1f7ae60..57d4483 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1133,6 +1133,70 @@ clear_utlb_entry:
 
 #endif /* CONFIG_PPC_47x */
 
+#if defined(CONFIG_BGP) && defined(CONFIG_SMP)
+_GLOBAL(start_secondary_bgp)
+	/* U2 will be enabled in TLBs. */
+        lis     r7,PPC44x_MMUCR_U2@h
+        mtspr   SPRN_MMUCR,r7
+        li      r7,0
+        mtspr   SPRN_PID,r7
+        sync
+        lis     r8,KERNELBASE@h
+
+        /* The tlb_44x_hwater global var (setup by cpu#0) reveals how many
+         * 256M TLBs we need to map.
+         */
+        lis     r9, tlb_44x_hwater@ha
+        lwz     r9, tlb_44x_hwater@l(r9)
+
+        li      r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+						PPC44x_TLB_M|PPC44x_TLB_U2)
+        oris    r5, r5, PPC44x_TLB_WL1@h
+
+        /* tlb_44x_hwater is the biggest TLB slot number for regular TLBs.
+           TLB 63 covers kernel base mapping(256MB) and TLB 62 covers CNS.
+           With 768MB lowmem, it is set to 59.
+        */
+2:
+        addi    r9, r9, 1
+        cmpwi   r9,62                  /* Stop at entry 62 which is the fw */
+        beq     3f
+        addis   r7,r7,0x1000           /* add 256M */
+        addis   r8,r8,0x1000
+        ori     r6,r8,PPC44x_TLB_VALID | PPC44x_TLB_256M
+
+        tlbwe   r6,r9,PPC44x_TLB_PAGEID /* Load the pageid fields */
+        tlbwe   r7,r9,PPC44x_TLB_XLAT   /* Load the translation fields */
+        tlbwe   r5,r9,PPC44x_TLB_ATTRIB /* Load the attrib/access fields */
+        b       2b
+
+3:      isync
+
+        /* Setup context from global var secondary_ti */
+        lis     r1, secondary_ti@ha
+        lwz     r1, secondary_ti@l(r1)
+        lwz     r2, TI_TASK(r1)         /*  r2 = task_info */
+
+        addi    r3,r2,THREAD    /* init task's THREAD */
+        mtspr   SPRN_SPRG3,r3
+
+        li      r0,0
+        stwu    r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
+
+        /* Let's move on */
+        lis     r4,start_secondary@h
+        ori     r4,r4,start_secondary@l
+        lis     r3,MSR_KERNEL@h
+        ori     r3,r3,MSR_KERNEL@l
+        mtspr   SPRN_SRR0,r4
+        mtspr   SPRN_SRR1,r3
+        rfi                     /* change context and jump to start_secondary */
+
+_GLOBAL(start_secondary_resume)
+	/* I don't think this currently happens on BGP */
+	b       .
+#endif /* CONFIG_BGP && CONFIG_SMP */
+
 /*
  * Here we are back to code that is common between 44x and 47x
  *
@@ -1144,6 +1208,14 @@ head_start_common:
 	lis	r4,interrupt_base@h	/* IVPR only uses the high 16-bits */
 	mtspr	SPRN_IVPR,r4
 
+#if defined(CONFIG_BGP) && defined(CONFIG_SMP)
+	/* are we an additional CPU */
+	li	r0, 0
+	mfspr	r4, SPRN_PIR
+	cmpw	r4, r0
+	bgt	start_secondary_bgp
+#endif /* CONFIG_BGP && CONFIG_SMP */
+
 	addis	r22,r22,KERNELBASE@h
 	mtlr	r22
 	isync
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 54f4fb9..0e73244 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -103,6 +103,77 @@ static int store_updates_sp(struct pt_regs *regs)
 	return 0;
 }
 
+#ifdef CONFIG_BGP
+/*
+ * The icbi instruction does not broadcast to all cpus in the ppc450
+ * processor used by Blue Gene/P.  It is unlikely this problem will
+ * be exhibited in other processors so this remains ifdef'ed for BGP
+ * specifically.
+ *
+ * We deal with this by marking executable pages either writable, or
+ * executable, but never both.  The permissions will fault back and
+ * forth if the thread is actively writing to executable sections.
+ * Each time we fault to become executable we flush the dcache into
+ * icache on all cpus.
+ */
+struct bgp_fixup_parm {
+	struct page		*page;
+	unsigned long		address;
+	struct vm_area_struct	*vma;
+};
+
+static void bgp_fixup_cache_tlb(void *parm)
+{
+	struct bgp_fixup_parm	*p = parm;
+
+	if (!PageHighMem(p->page))
+		flush_dcache_icache_page(p->page);
+	local_flush_tlb_page(p->vma, p->address);
+}
+
+static void bgp_fixup_access_perms(struct vm_area_struct *vma,
+				  unsigned long address,
+				  int is_write, int is_exec)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pte_t *ptep = NULL;
+	pmd_t *pmdp;
+
+	if (get_pteptr(mm, address, &ptep, &pmdp)) {
+		spinlock_t *ptl = pte_lockptr(mm, pmdp);
+		pte_t old;
+
+		spin_lock(ptl);
+		old = *ptep;
+		if (pte_present(old)) {
+			struct page *page = pte_page(old);
+
+			if (is_exec) {
+				struct bgp_fixup_parm param = {
+					.page		= page,
+					.address	= address,
+					.vma		= vma,
+				};
+				pte_update(ptep, _PAGE_HWWRITE, 0);
+				on_each_cpu(bgp_fixup_cache_tlb, &param, 1);
+				pte_update(ptep, 0, _PAGE_EXEC);
+				pte_unmap_unlock(ptep, ptl);
+				return;
+			}
+			if (is_write &&
+			    (pte_val(old) & _PAGE_RW) &&
+			    (pte_val(old) & _PAGE_DIRTY) &&
+			    !(pte_val(old) & _PAGE_HWWRITE)) {
+				pte_update(ptep, _PAGE_EXEC, _PAGE_HWWRITE);
+			}
+		}
+		if (!pte_same(old, *ptep))
+			flush_tlb_page(vma, address);
+		pte_unmap_unlock(ptep, ptl);
+	}
+}
+#endif /* CONFIG_BGP */
+
 /*
  * For 600- and 800-family processors, the error_code parameter is DSISR
  * for a data fault, SRR1 for an instruction fault. For 400-family processors
@@ -333,6 +404,12 @@ good_area:
 		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, 0,
 				     regs, address);
 	}
+
+#ifdef CONFIG_BGP
+	/* Fixup _PAGE_EXEC and _PAGE_HWWRITE if necessary */
+	bgp_fixup_access_perms(vma, address, is_write, is_exec);
+#endif /* CONFIG_BGP */
+
 	up_read(&mm->mmap_sem);
 	return 0;
 
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 3a3c711..b77a25f 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -300,7 +300,7 @@ config PPC_PERF_CTRS
          This enables the powerpc-specific perf_event back-end.
 
 config SMP
-	depends on PPC_BOOK3S || PPC_BOOK3E || FSL_BOOKE || PPC_47x
+	depends on PPC_BOOK3S || PPC_BOOK3E || FSL_BOOKE || PPC_47x || BGP
 	bool "Symmetric multi-processing support"
 	---help---
 	  This enables support for systems with more than one CPU. If you have
-- 
1.7.4.1
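
The write-or-execute toggling described in the fault.c comment of this
patch can be summarized as a small state function.  This is a sketch
only: the TOY_PAGE_* bit values are hypothetical (the kernel's _PAGE_*
constants differ), and the cache and TLB flushing that the patch does on
every CPU is reduced to a comment.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define TOY_PAGE_EXEC		0x1u
#define TOY_PAGE_HWWRITE	0x2u
#define TOY_PAGE_RW		0x4u
#define TOY_PAGE_DIRTY		0x8u

/* Returns the new PTE flags after a fault, mirroring the order of the
 * checks in bgp_fixup_access_perms(): exec faults are handled first. */
uint32_t toy_fixup(uint32_t pte, int is_write, int is_exec)
{
	if (is_exec) {
		pte &= ~TOY_PAGE_HWWRITE;	/* stop hardware writes      */
		/* the patch flushes dcache to icache and the TLB here */
		pte |= TOY_PAGE_EXEC;		/* then allow execution      */
		return pte;
	}
	if (is_write &&
	    (pte & TOY_PAGE_RW) &&
	    (pte & TOY_PAGE_DIRTY) &&
	    !(pte & TOY_PAGE_HWWRITE)) {
		pte &= ~TOY_PAGE_EXEC;		/* drop execute permission   */
		pte |= TOY_PAGE_HWWRITE;	/* and allow hardware writes */
	}
	return pte;
}
```

Starting from a writable, dirty page, an exec fault yields a page that
is executable but not hardware-writable, and a later write fault flips
it back, so the two bits are never set together by this function.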


* [PATCH 5/7] [RFC] force 32-byte aligned kmallocs
From: Eric Van Hensbergen @ 2011-05-18 21:24 UTC
  To: linux-kernel; +Cc: linuxppc-dev, bg-linux
In-Reply-To: <1305753895-24845-1-git-send-email-ericvh@gmail.com>

On BG/P, torus DMA is simpler when kmalloc() returns 32-byte aligned
buffers, so define ARCH_DMA_MINALIGN (as L1_CACHE_BYTES) when CONFIG_BGP
is set.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
---
 arch/powerpc/include/asm/page_32.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/page_32.h b/arch/powerpc/include/asm/page_32.h
index 68d73b2..fb0a7ae 100644
--- a/arch/powerpc/include/asm/page_32.h
+++ b/arch/powerpc/include/asm/page_32.h
@@ -9,7 +9,7 @@
 
 #define VM_DATA_DEFAULT_FLAGS	VM_DATA_DEFAULT_FLAGS32
 
-#ifdef CONFIG_NOT_COHERENT_CACHE
+#if defined(CONFIG_NOT_COHERENT_CACHE) || defined(CONFIG_BGP)
 #define ARCH_DMA_MINALIGN	L1_CACHE_BYTES
 #endif
 
-- 
1.7.4.1
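
As a rough illustration of what a 32-byte minimum alignment means for
callers, the sketch below rounds sizes up to the alignment and tests a
pointer against it.  TOY_DMA_MINALIGN and both helpers are invented for
this example; the value 32 is taken from the patch description, not from
any header.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_DMA_MINALIGN 32	/* assumed, per the patch description */

/* Round n up to the next multiple of TOY_DMA_MINALIGN (a power of two). */
size_t toy_align_up(size_t n)
{
	return (n + TOY_DMA_MINALIGN - 1) & ~(size_t)(TOY_DMA_MINALIGN - 1);
}

/* Non-zero if the address is a multiple of TOY_DMA_MINALIGN. */
int toy_is_dma_aligned(uintptr_t addr)
{
	return (addr & (TOY_DMA_MINALIGN - 1)) == 0;
}
```

A 1-byte request therefore occupies a full 32-byte unit, and a 33-byte
request occupies two.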


* [PATCH 6/7] [RFC] enable early TLBs for BG/P
From: Eric Van Hensbergen @ 2011-05-18 21:24 UTC
  To: linux-kernel; +Cc: linuxppc-dev, bg-linux
In-Reply-To: <1305753895-24845-1-git-send-email-ericvh@gmail.com>

BG/P maps its firmware with an early TLB entry, so reserve two early TLB
entries when CONFIG_BGP is set.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
---
 arch/powerpc/include/asm/mmu-44x.h |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
index ca1b90c..2807d6e 100644
--- a/arch/powerpc/include/asm/mmu-44x.h
+++ b/arch/powerpc/include/asm/mmu-44x.h
@@ -115,8 +115,12 @@ typedef struct {
 #endif /* !__ASSEMBLY__ */
 
 #ifndef CONFIG_PPC_EARLY_DEBUG_44x
+#ifndef CONFIG_BGP
 #define PPC44x_EARLY_TLBS	1
-#else
+#else /* CONFIG_BGP */
+#define PPC44x_EARLY_TLBS	2
+#endif /* CONFIG_BGP */
+#else /* CONFIG_PPC_EARLY_DEBUG_44x */
 #define PPC44x_EARLY_TLBS	2
 #define PPC44x_EARLY_DEBUG_VIRTADDR	(ASM_CONST(0xf0000000) \
 	| (ASM_CONST(CONFIG_PPC_EARLY_DEBUG_44x_PHYSLOW) & 0xffff))
-- 
1.7.4.1


* [PATCH 3/7] [RFC] add support for BlueGene/P FPU
From: Eric Van Hensbergen @ 2011-05-18 21:24 UTC
  To: linux-kernel; +Cc: linuxppc-dev, bg-linux
In-Reply-To: <1305753895-24845-1-git-send-email-ericvh@gmail.com>

This patch adds save/restore register support for the BlueGene/P
double hummer FPU.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
---
 arch/powerpc/include/asm/ppc_asm.h |   39 ++++++++++++++++++++++++-----------
 arch/powerpc/kernel/fpu.S          |    8 +++---
 arch/powerpc/platforms/44x/Kconfig |    9 ++++++++
 3 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 9821006..daa22bb 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -88,6 +88,13 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 				REST_10GPRS(22, base)
 #endif
 
+#ifdef CONFIG_BGP
+#define LFPDX(frt, ra, rb)	.long (31<<26)|((frt)<<21)|((ra)<<16)| \
+							((rb)<<11)|(462<<1)
+#define STFPDX(frt, ra, rb)	.long (31<<26)|((frt)<<21)|((ra)<<16)| \
+							((rb)<<11)|(974<<1)
+#endif /* CONFIG_BGP */
+
 #define SAVE_2GPRS(n, base)	SAVE_GPR(n, base); SAVE_GPR(n+1, base)
 #define SAVE_4GPRS(n, base)	SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
 #define SAVE_8GPRS(n, base)	SAVE_4GPRS(n, base); SAVE_4GPRS(n+4, base)
@@ -97,18 +104,26 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 #define REST_8GPRS(n, base)	REST_4GPRS(n, base); REST_4GPRS(n+4, base)
 #define REST_10GPRS(n, base)	REST_8GPRS(n, base); REST_2GPRS(n+8, base)
 
-#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
-#define SAVE_2FPRS(n, base)	SAVE_FPR(n, base); SAVE_FPR(n+1, base)
-#define SAVE_4FPRS(n, base)	SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
-#define SAVE_8FPRS(n, base)	SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
-#define SAVE_16FPRS(n, base)	SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
-#define SAVE_32FPRS(n, base)	SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
-#define REST_2FPRS(n, base)	REST_FPR(n, base); REST_FPR(n+1, base)
-#define REST_4FPRS(n, base)	REST_2FPRS(n, base); REST_2FPRS(n+2, base)
-#define REST_8FPRS(n, base)	REST_4FPRS(n, base); REST_4FPRS(n+4, base)
-#define REST_16FPRS(n, base)	REST_8FPRS(n, base); REST_8FPRS(n+8, base)
-#define REST_32FPRS(n, base)	REST_16FPRS(n, base); REST_16FPRS(n+16, base)
+#ifdef CONFIG_BGP
+#define SAVE_FPR(n, b, base)	li b, THREAD_FPR0+(16*(n)); STFPDX(n, base, b)
+#define REST_FPR(n, b, base)	li b, THREAD_FPR0+(16*(n)); LFPDX(n, base, b)
+#else /* CONFIG_BGP */
+#define SAVE_FPR(n, b, base)	stfd	n, THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
+#define REST_FPR(n, b, base)	lfd	n, THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
+#endif /* CONFIG_BGP */
+
+#define SAVE_2FPRS(n, b, base)	SAVE_FPR(n, b, base); SAVE_FPR(n+1, b, base)
+#define SAVE_4FPRS(n, b, base)	SAVE_2FPRS(n, b, base); SAVE_2FPRS(n+2, b, base)
+#define SAVE_8FPRS(n, b, base)	SAVE_4FPRS(n, b, base); SAVE_4FPRS(n+4, b, base)
+#define SAVE_16FPRS(n, b, base)	SAVE_8FPRS(n, b, base); SAVE_8FPRS(n+8, b, base)
+#define SAVE_32FPRS(n, b, base)	SAVE_16FPRS(n, b, base); \
+				SAVE_16FPRS(n+16, b, base)
+#define REST_2FPRS(n, b, base)	REST_FPR(n, b, base); REST_FPR(n+1, b, base)
+#define REST_4FPRS(n, b, base)	REST_2FPRS(n, b, base); REST_2FPRS(n+2, b, base)
+#define REST_8FPRS(n, b, base)	REST_4FPRS(n, b, base); REST_4FPRS(n+4, b, base)
+#define REST_16FPRS(n, b, base)	REST_8FPRS(n, b, base); REST_8FPRS(n+8, b, base)
+#define REST_32FPRS(n, b, base)	REST_16FPRS(n, b, base); \
+				REST_16FPRS(n+16, b, base)
 
 #define SAVE_VR(n,b,base)	li b,THREAD_VR0+(16*(n));  stvx n,base,b
 #define SAVE_2VRS(n,b,base)	SAVE_VR(n,b,base); SAVE_VR(n+1,b,base)
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index de36955..9f11c66 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -30,7 +30,7 @@
 BEGIN_FTR_SECTION							\
 	b	2f;							\
 END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
-	REST_32FPRS(n,base);						\
+	REST_32FPRS(n,c,base);						\
 	b	3f;							\
 2:	REST_32VSRS(n,c,base);						\
 3:
@@ -39,13 +39,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
 BEGIN_FTR_SECTION							\
 	b	2f;							\
 END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
-	SAVE_32FPRS(n,base);						\
+	SAVE_32FPRS(n,c,base);						\
 	b	3f;							\
 2:	SAVE_32VSRS(n,c,base);						\
 3:
 #else
-#define REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
-#define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
+#define REST_32FPVSRS(n,b,base)	REST_32FPRS(n,b,base)
+#define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n,b,base)
 #endif
 
 /*
diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
index f485fc5f..24a515e 100644
--- a/arch/powerpc/platforms/44x/Kconfig
+++ b/arch/powerpc/platforms/44x/Kconfig
@@ -169,6 +169,15 @@ config YOSEMITE
 	help
 	  This option enables support for the AMCC PPC440EP evaluation board.
 
+config	BGP
+	bool "Blue Gene/P"
+	depends on 44x
+	default n
+	select PPC_FPU
+	select PPC_DOUBLE_FPU
+	help
+	  This option enables support for the IBM BlueGene/P supercomputer.
+
 config ISS4xx
 	bool "ISS 4xx Simulator"
 	depends on (44x || 40x)
-- 
1.7.4.1
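
The LFPDX and STFPDX macros in this patch hand-assemble the instruction
word with ".long".  The same bit layout can be written as a C function
to check individual encodings.  The toy_* function names are invented
for this sketch; the field positions and the constants 31, 462 and 974
are copied from the macros above.

```c
#include <stdint.h>

/* Build an instruction word with the layout used by the macros:
 * primary opcode 31, then FRT, RA, RB and a 10-bit extended opcode. */
uint32_t toy_xform(uint32_t frt, uint32_t ra, uint32_t rb, uint32_t xo)
{
	return (UINT32_C(31) << 26) | (frt << 21) | (ra << 16) |
	       (rb << 11) | (xo << 1);
}

/* Same value as LFPDX(frt, ra, rb) */
uint32_t toy_lfpdx(uint32_t frt, uint32_t ra, uint32_t rb)
{
	return toy_xform(frt, ra, rb, 462);
}

/* Same value as STFPDX(frt, ra, rb) */
uint32_t toy_stfpdx(uint32_t frt, uint32_t ra, uint32_t rb)
{
	return toy_xform(frt, ra, rb, 974);
}
```

For example, toy_lfpdx(1, 3, 4) gives 0x7C23239C and toy_stfpdx(1, 3, 4)
gives 0x7C23279C, which is what the assembler would emit for
LFPDX(1, 3, 4) and STFPDX(1, 3, 4).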


* [PATCH 4/7] [RFC] enable L1_WRITETHROUGH mode for BG/P
From: Eric Van Hensbergen @ 2011-05-18 21:24 UTC
  To: linux-kernel; +Cc: linuxppc-dev, bg-linux
In-Reply-To: <1305753895-24845-1-git-send-email-ericvh@gmail.com>

BG/P nodes need to be configured for writethrough to work in SMP
configurations.  This patch adds the right hooks in the MMU code
to make sure L1_WRITETHROUGH configurations are set up for BG/P.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
---
 arch/powerpc/include/asm/mmu-44x.h     |    2 ++
 arch/powerpc/kernel/head_44x.S         |   24 ++++++++++++++++++++++--
 arch/powerpc/kernel/misc_32.S          |   15 +++++++++++++++
 arch/powerpc/lib/copy_32.S             |   10 ++++++++++
 arch/powerpc/mm/44x_mmu.c              |    7 +++++--
 arch/powerpc/platforms/Kconfig         |    5 +++++
 arch/powerpc/platforms/Kconfig.cputype |    4 ++++
 7 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
index bf52d70..ca1b90c 100644
--- a/arch/powerpc/include/asm/mmu-44x.h
+++ b/arch/powerpc/include/asm/mmu-44x.h
@@ -8,6 +8,7 @@
 
 #define PPC44x_MMUCR_TID	0x000000ff
 #define PPC44x_MMUCR_STS	0x00010000
+#define PPC44x_MMUCR_U2		0x00200000
 
 #define	PPC44x_TLB_PAGEID	0
 #define	PPC44x_TLB_XLAT		1
@@ -32,6 +33,7 @@
 
 /* Storage attribute and access control fields */
 #define PPC44x_TLB_ATTR_MASK	0x0000ff80
+#define PPC44x_TLB_WL1		0x00100000	/* Write-through L1 */
 #define PPC44x_TLB_U0		0x00008000      /* User 0 */
 #define PPC44x_TLB_U1		0x00004000      /* User 1 */
 #define PPC44x_TLB_U2		0x00002000      /* User 2 */
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 5e12b74..1f7ae60 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -429,7 +429,16 @@ finish_tlb_load_44x:
 	andi.	r10,r12,_PAGE_USER		/* User page ? */
 	beq	1f				/* nope, leave U bits empty */
 	rlwimi	r11,r11,3,26,28			/* yes, copy S bits to U */
-1:	tlbwe	r11,r13,PPC44x_TLB_ATTRIB	/* Write ATTRIB */
+1:
+#ifdef CONFIG_L1_WRITETHROUGH
+	andi.	r10, r11, PPC44x_TLB_I
+	bne	2f
+	oris    r11,r11,PPC44x_TLB_WL1@h	/* Add coherency for */
+						/* non-inhibited */
+	ori	r11,r11,PPC44x_TLB_U2|PPC44x_TLB_M
+2:
+#endif /* CONFIG_L1_WRITETHROUGH */
+	tlbwe	r11,r13,PPC44x_TLB_ATTRIB	/* Write ATTRIB */
 
 	/* Done...restore registers and get out of here.
 	*/
@@ -799,7 +808,11 @@ skpinv:	addi	r4,r4,1				/* Increment */
 	sync
 
 	/* Initialize MMUCR */
+#ifdef CONFIG_L1_WRITETHROUGH
+	lis	r5, PPC44x_MMUCR_U2@h
+#else
 	li	r5,0
+#endif /* CONFIG_L1_WRITETHROUGH */
 	mtspr	SPRN_MMUCR,r5
 	sync
 
@@ -814,7 +827,14 @@ skpinv:	addi	r4,r4,1				/* Increment */
 	/* attrib fields */
 	/* Added guarded bit to protect against speculative loads/stores */
 	li	r5,0
-	ori	r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G)
+#ifdef CONFIG_L1_WRITETHROUGH
+	ori	r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+						PPC44x_TLB_G | PPC44x_TLB_U2)
+	oris	r5,r5,PPC44x_TLB_WL1@h
+#else
+	ori	r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+			PPC44x_TLB_G)
+#endif /* CONFIG_L1_WRITETHROUGH */
 
         li      r0,63                    /* TLB slot 63 */
 
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 094bd98..d88369b 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -506,7 +506,20 @@ _GLOBAL(clear_pages)
 	li	r0,PAGE_SIZE/L1_CACHE_BYTES
 	slw	r0,r0,r4
 	mtctr	r0
+#ifdef CONFIG_L1_WRITETHROUGH
+	/* assuming 32 byte cacheline */
+	li      r4, 0
+1:	stw     r4, 0(r3)
+	stw     r4, 4(r3)
+	stw     r4, 8(r3)
+	stw     r4, 12(r3)
+	stw     r4, 16(r3)
+	stw     r4, 20(r3)
+	stw     r4, 24(r3)
+	stw     r4, 28(r3)
+#else
 1:	dcbz	0,r3
+#endif /* CONFIG_L1_WRITETHROUGH */
 	addi	r3,r3,L1_CACHE_BYTES
 	bdnz	1b
 	blr
@@ -550,7 +563,9 @@ _GLOBAL(copy_page)
 	mtctr	r0
 1:
 	dcbt	r11,r4
+#ifndef CONFIG_L1_WRITETHROUGH
 	dcbz	r5,r3
+#endif
 	COPY_16_BYTES
 #if L1_CACHE_BYTES >= 32
 	COPY_16_BYTES
diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
index 55f19f9..98a07e3 100644
--- a/arch/powerpc/lib/copy_32.S
+++ b/arch/powerpc/lib/copy_32.S
@@ -98,7 +98,11 @@ _GLOBAL(cacheable_memzero)
 	bdnz	4b
 3:	mtctr	r9
 	li	r7,4
+#ifdef CONFIG_L1_WRITETHROUGH
+10:
+#else
 10:	dcbz	r7,r6
+#endif /* CONFIG_L1_WRITETHROUGH */
 	addi	r6,r6,CACHELINE_BYTES
 	bdnz	10b
 	clrlwi	r5,r8,32-LG_CACHELINE_BYTES
@@ -187,7 +191,9 @@ _GLOBAL(cacheable_memcpy)
 	mtctr	r0
 	beq	63f
 53:
+#ifndef CONFIG_L1_WRITETHROUGH
 	dcbz	r11,r6
+#endif /* CONFIG_L1_WRITETHROUGH */
 	COPY_16_BYTES
 #if L1_CACHE_BYTES >= 32
 	COPY_16_BYTES
@@ -368,7 +374,11 @@ _GLOBAL(__copy_tofrom_user)
 	mtctr	r8
 
 53:	dcbt	r3,r4
+#ifdef CONFIG_L1_WRITETHROUGH
+54:
+#else
 54:	dcbz	r11,r6
+#endif
 	.section __ex_table,"a"
 	.align	2
 	.long	54b,105f
diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
index 024acab..b684c8a 100644
--- a/arch/powerpc/mm/44x_mmu.c
+++ b/arch/powerpc/mm/44x_mmu.c
@@ -80,9 +80,12 @@ static void __init ppc44x_pin_tlb(unsigned int virt, unsigned int phys)
 	:
 #ifdef CONFIG_PPC47x
 	: "r" (PPC47x_TLB2_S_RWX),
-#else
+#elif defined(CONFIG_L1_WRITETHROUGH)
+	: "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_WL1 \
+		| PPC44x_TLB_U2 | PPC44x_TLB_M),
+#else /* neither CONFIG_PPC47x or CONFIG_L1_WRITETHROUGH */
 	: "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G),
-#endif
+#endif /* CONFIG_PPC47x */
 	  "r" (phys),
 	  "r" (virt | PPC44x_TLB_VALID | PPC44x_TLB_256M),
 	  "r" (entry),
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index f7b0772..684a281 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -348,4 +348,9 @@ config XILINX_PCI
 	bool "Xilinx PCI host bridge support"
 	depends on PCI && XILINX_VIRTEX
 
+config L1_WRITETHROUGH
+	bool "Blue Gene/P enabled writethrough mode"
+	depends on BGP
+	default y
+
 endmenu
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 111138c..3a3c711 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -329,9 +329,13 @@ config NOT_COHERENT_CACHE
 	bool
 	depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
 	default n if PPC_47x
+	default n if BGP
 	default y
 
 config CHECK_CACHE_COHERENCY
 	bool
 
+config L1_WRITETHROUGH
+	bool
+
 endmenu
-- 
1.7.4.1
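
In clear_pages the patch replaces dcbz with eight explicit word stores
per 32-byte cache line.  A C rendering of that loop might look like the
following sketch; the toy_* names are invented here, and the 32-byte
line size is the one assumed by the comment in the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_L1_CACHE_BYTES 32	/* assumed 32-byte cache line */

/* Zero nlines cache lines starting at p, eight 32-bit stores per line,
 * mirroring the stw sequence used when CONFIG_L1_WRITETHROUGH is set. */
void toy_clear_lines(uint32_t *p, size_t nlines)
{
	while (nlines--) {
		p[0] = 0;
		p[1] = 0;
		p[2] = 0;
		p[3] = 0;
		p[4] = 0;
		p[5] = 0;
		p[6] = 0;
		p[7] = 0;
		p += TOY_L1_CACHE_BYTES / sizeof(uint32_t);
	}
}

/* Small self-check: fill two lines with ones, clear them, verify. */
int toy_clear_lines_works(void)
{
	uint32_t buf[16];
	size_t i;

	for (i = 0; i < 16; i++)
		buf[i] = 0xffffffffu;
	toy_clear_lines(buf, 2);
	for (i = 0; i < 16; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

The result is the same as dcbz as far as memory contents go; the cost is
eight stores per line instead of one cache operation.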


* [PATCH 2/7] [RFC] add bluegene entry to cputable
From: Eric Van Hensbergen @ 2011-05-18 21:24 UTC
  To: linux-kernel; +Cc: linuxppc-dev, bg-linux
In-Reply-To: <1305753895-24845-1-git-send-email-ericvh@gmail.com>

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
---
 arch/powerpc/kernel/cputable.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index b9602ee..0eb245e 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -1732,6 +1732,20 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.machine_check		= machine_check_440A,
 		.platform		= "ppc440",
 	},
+	{ /* Blue Gene/P */
+		.pvr_mask		= 0xfffffff0,
+		.pvr_value		= 0x52131880,
+		.cpu_name		= "450 Blue Gene/P",
+		.cpu_features		= CPU_FTRS_440x6,
+		.cpu_user_features	= COMMON_USER_BOOKE |
+						PPC_FEATURE_HAS_FPU,
+		.mmu_features		= MMU_FTR_TYPE_44x,
+		.icache_bsize		= 32,
+		.dcache_bsize		= 32,
+		.cpu_setup		= __setup_cpu_460gt,
+		.machine_check		= machine_check_440A,
+		.platform		= "ppc440",
+	},
 	{ /* 460EX */
 		.pvr_mask		= 0xffff0006,
 		.pvr_value		= 0x13020002,
-- 
1.7.4.1
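
Entries in cpu_specs[] are selected by masking the processor version
register and comparing it with pvr_value.  A minimal sketch of that
matching, using the two mask/value pairs visible in this diff, is shown
below.  The toy_* names and the cut-down struct are assumptions made for
the example, and taking the first matching entry is an assumption about
how the real table is searched.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for struct cpu_spec. */
struct toy_cpu_spec {
	uint32_t pvr_mask;
	uint32_t pvr_value;
	const char *cpu_name;
};

static const struct toy_cpu_spec toy_specs[] = {
	{ 0xfffffff0, 0x52131880, "450 Blue Gene/P" },
	{ 0xffff0006, 0x13020002, "460EX" },
};

/* Return the name of the first entry whose masked PVR matches,
 * or NULL if no entry matches. */
const char *toy_identify(uint32_t pvr)
{
	size_t i;

	for (i = 0; i < sizeof(toy_specs) / sizeof(toy_specs[0]); i++)
		if ((pvr & toy_specs[i].pvr_mask) == toy_specs[i].pvr_value)
			return toy_specs[i].cpu_name;
	return NULL;
}
```

With a mask of 0xfffffff0 only the low four bits of the PVR are ignored,
so 0x52131880 through 0x5213188f all select the Blue Gene/P entry.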


* [PATCH 1/7] [RFC] Mainline BG/P platform support
From: Eric Van Hensbergen @ 2011-05-18 21:24 UTC
  To: linux-kernel; +Cc: linuxppc-dev, bg-linux

The Linux kernel patches for the IBM BlueGene/P have been open-sourced
for quite some time, but haven't been integrated into the mainline Linux
kernel source tree.  This is the first patch series of several where I
will attempt to clean up and mainline the already public patches.  I
welcome feedback as well as any help I can get.  I'm drawing on
the patches available for the IBM Compute Node kernel, the ZeptoOS project
and the Kittyhawk project.
(all available from http://wiki.bg.anl-external.org)

I'll be prioritizing core patches which are harder to keep current with
mainline due to merge conflicts and then slowly incorporating the drivers
and other extensions (if acceptable after community review).

I'll be maintaining the patchset in my kernel.org repository
(/pub/scm/linux/kernel/git/ericvh/bluegene.git) under the bluegene
branch with the source repos (zepto, kittyhawk, ibmcn) available in
respective branches.  Ben - if you would prefer me to send pull requests
once we get rolling, I can switch to that -- otherwise I'll stick to
just submitting patches to the list assuming you'll pull them when they
become acceptable.  Thanks for your attention reviewing these patches.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
---
 MAINTAINERS |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 69f19f1..3ffca88 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3863,6 +3863,14 @@ S:	Maintained
 F:	arch/powerpc/platforms/40x/
 F:	arch/powerpc/platforms/44x/
 
+LINUX FOR POWERPC BLUEGENE/P
+M:	Eric Van Hensbergen <ericvh@gmail.com>
+W:	http://bg-linux.anl-external.org/wiki/index.php/Main_Page
+L:	bg-linux@lists.anl-external.org
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/bluegene.git
+S:	Maintained
+F:	arch/powerpc/platforms/44x/bgp*
+
 LINUX FOR POWERPC EMBEDDED XILINX VIRTEX
 M:	Grant Likely <grant.likely@secretlab.ca>
 W:	http://wiki.secretlab.ca/index.php/Linux_on_Xilinx_Virtex
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 5/7] powerpc/mm: 64-bit: don't handle non-standard page sizes
From: Scott Wood @ 2011-05-18 21:05 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev
In-Reply-To: <20110518210453.GA29500@schlenkerla.am.freescale.net>

I don't see where any non-standard page size will be set in the
kernel page tables, so don't waste time checking for it.  It wouldn't
work with TLB0 on an FSL MMU anyway, so if there's something I missed
(or which is out-of-tree), it's relying on implementation-specific
behavior.  If there's an out-of-tree need for occasional 4K mappings
with CONFIG_PPC_64K_PAGES, perhaps this check could only be done when
that is defined.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/mm/tlb_low_64e.S |   13 -------------
 1 files changed, 0 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index 922fece..e782023 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -232,19 +232,6 @@ finish_normal_tlb_miss:
 	rlwimi	r11,r14,32-19,27,31	/* Insert WIMGE */
 	mtspr	SPRN_MAS2,r11
 
-	/* Check page size, if not standard, update MAS1 */
-	rldicl	r11,r14,64-8,64-8
-#ifdef CONFIG_PPC_64K_PAGES
-	cmpldi	cr0,r11,BOOK3E_PAGESZ_64K
-#else
-	cmpldi	cr0,r11,BOOK3E_PAGESZ_4K
-#endif
-	beq-	1f
-	mfspr	r11,SPRN_MAS1
-	rlwimi	r11,r14,31,21,24
-	rlwinm	r11,r11,0,21,19
-	mtspr	SPRN_MAS1,r11
-1:
 	/* Move RPN in position */
 	rldicr	r11,r14,64-(PTE_RPN_SHIFT-PAGE_SHIFT),63-PAGE_SHIFT
 	clrldi	r15,r11,12		/* Clear crap at the top */
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 4/7] powerpc/mm: 64-bit: Don't load PACA in normal TLB miss exceptions
From: Scott Wood @ 2011-05-18 21:05 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev
In-Reply-To: <20110518210453.GA29500@schlenkerla.am.freescale.net>

Load it only when needed, in recursive/linear/indirect faults,
and in the stats code.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/include/asm/exception-64e.h |   28 +++++++++---------
 arch/powerpc/mm/tlb_low_64e.S            |   43 +++++++++++++++++------------
 2 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h
index 6921261..9b57a27 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -80,9 +80,9 @@ exc_##label##_book3e:
  *
  * This prolog handles re-entrancy (up to 3 levels supported in the PACA
  * though we currently don't test for overflow). It provides you with a
- * re-entrancy safe working space of r10...r16 and CR with r12 being used
- * as the exception area pointer in the PACA for that level of re-entrancy
- * and r13 containing the PACA pointer.
+ * re-entrancy safe working space of r10...r16 (except r13) and CR with r12
+ * being used as the exception area pointer in the PACA for that level of
+ * re-entrancy.
  *
  * SRR0 and SRR1 are saved, but DEAR and ESR are not, since they don't apply
  * as-is for instruction exceptions. It's up to the actual exception code
@@ -95,8 +95,6 @@ exc_##label##_book3e:
 	mfcr	r10;							    \
 	std	r11,EX_TLB_R11(r12);					    \
 	mfspr	r11,SPRN_SPRG_TLB_SCRATCH;				    \
-	std	r13,EX_TLB_R13(r12);					    \
-	ld	r13,EX_TLB_PACA(r12);					    \
 	std	r14,EX_TLB_R14(r12);					    \
 	addi	r14,r12,EX_TLB_SIZE;					    \
 	std	r15,EX_TLB_R15(r12);					    \
@@ -135,7 +133,6 @@ exc_##label##_book3e:
 	mtspr	SPRN_SPRG_TLB_EXFRAME,freg;				    \
 	ld	r11,EX_TLB_R11(r12);					    \
 	mtcr	r14;							    \
-	ld	r13,EX_TLB_R13(r12);					    \
 	ld	r14,EX_TLB_R14(r12);					    \
 	mtspr	SPRN_SRR0,r15;						    \
 	ld	r15,EX_TLB_R15(r12);					    \
@@ -148,11 +145,13 @@ exc_##label##_book3e:
 	TLB_MISS_RESTORE(r12)
 
 #define TLB_MISS_EPILOG_ERROR						    \
-	addi	r12,r13,PACA_EXTLB;					    \
+	ld	r10,EX_TLB_PACA(r12);					    \
+	addi	r12,r10,PACA_EXTLB;					    \
 	TLB_MISS_RESTORE(r12)
 
 #define TLB_MISS_EPILOG_ERROR_SPECIAL					    \
-	addi	r11,r13,PACA_EXTLB;					    \
+	ld	r10,EX_TLB_PACA(r12);					    \
+	addi	r11,r10,PACA_EXTLB;					    \
 	TLB_MISS_RESTORE(r11)
 
 #ifdef CONFIG_BOOK3E_MMU_TLB_STATS
@@ -160,25 +159,26 @@ exc_##label##_book3e:
 	mflr	r10;							    \
 	std	r8,EX_TLB_R8(r12);					    \
 	std	r9,EX_TLB_R9(r12);					    \
-	std	r10,EX_TLB_LR(r12);
+	std	r10,EX_TLB_LR(r12);					    \
+	ld	r9,EX_TLB_PACA(r12);
 #define TLB_MISS_RESTORE_STATS					            \
 	ld	r16,EX_TLB_LR(r12);					    \
 	ld	r9,EX_TLB_R9(r12);					    \
 	ld	r8,EX_TLB_R8(r12);					    \
 	mtlr	r16;
 #define TLB_MISS_STATS_D(name)						    \
-	addi	r9,r13,MMSTAT_DSTATS+name;				    \
+	addi	r9,r9,MMSTAT_DSTATS+name;				    \
 	bl	.tlb_stat_inc;
 #define TLB_MISS_STATS_I(name)						    \
-	addi	r9,r13,MMSTAT_ISTATS+name;				    \
+	addi	r9,r9,MMSTAT_ISTATS+name;				    \
 	bl	.tlb_stat_inc;
 #define TLB_MISS_STATS_X(name)						    \
-	ld	r8,PACA_EXTLB+EX_TLB_ESR(r13);				    \
+	ld	r8,PACA_EXTLB+EX_TLB_ESR(r9);				    \
 	cmpdi	cr2,r8,-1;						    \
 	beq	cr2,61f;						    \
-	addi	r9,r13,MMSTAT_DSTATS+name;				    \
+	addi	r9,r9,MMSTAT_DSTATS+name;				    \
 	b	62f;							    \
-61:	addi	r9,r13,MMSTAT_ISTATS+name;				    \
+61:	addi	r9,r9,MMSTAT_ISTATS+name;				    \
 62:	bl	.tlb_stat_inc;
 #define TLB_MISS_STATS_SAVE_INFO					    \
 	std	r14,EX_TLB_ESR(r12);	/* save ESR */			    \
diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index 17726d3..922fece 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -160,7 +160,6 @@
  * r16 = faulting address
  * r15 = region ID
  * r14 = crap (free to use)
- * r13 = PACA
  * r12 = TLB exception frame in PACA
  * r11 = PTE permission mask
  * r10 = crap (free to use)
@@ -299,8 +298,7 @@ normal_tlb_miss_access_fault:
  *
  * r16 = virtual page table faulting address
  * r15 = region (top 4 bits of address)
- * r14 = crap (free to use)
- * r13 = PACA
+ * r14 = crap (we load with PACA)
  * r12 = TLB exception frame in PACA
  * r11 = crap (free to use)
  * r10 = crap (free to use)
@@ -318,6 +316,8 @@ normal_tlb_miss_access_fault:
  * so we could probably optimize things a bit
  */
 virt_page_table_tlb_miss:
+	ld	r14,EX_TLB_PACA(r12)
+
 	/* Are we hitting a kernel page table ? */
 	andi.	r10,r15,0x8
 
@@ -325,7 +325,7 @@ virt_page_table_tlb_miss:
 	 * and we happen to have the swapper_pg_dir at offset 8 from the user
 	 * pgdir in the PACA :-).
 	 */
-	add	r11,r10,r13
+	add	r11,r10,r14
 
 	/* If kernel, we need to clear MAS1 TID */
 	beq	1f
@@ -417,12 +417,12 @@ virt_page_table_tlb_miss_done:
 	 * offset the return address by -4 in order to replay the tlbsrx
 	 * instruction there
 	 */
-	subf	r10,r13,r12
+	subf	r10,r14,r12
 	cmpldi	cr0,r10,PACA_EXTLB+EX_TLB_SIZE
 	bne-	1f
-	ld	r11,PACA_EXTLB+EX_TLB_SIZE+EX_TLB_SRR0(r13)
+	ld	r11,PACA_EXTLB+EX_TLB_SIZE+EX_TLB_SRR0(r14)
 	addi	r10,r11,-4
-	std	r10,PACA_EXTLB+EX_TLB_SIZE+EX_TLB_SRR0(r13)
+	std	r10,PACA_EXTLB+EX_TLB_SIZE+EX_TLB_SRR0(r14)
 1:
 END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_TLBRSRV)
 	/* Return to caller, normal case */
@@ -449,13 +449,13 @@ virt_page_table_tlb_miss_fault:
 	 * level as well. Since we are doing that, we don't need to clear or
 	 * restore the TLB reservation neither.
 	 */
-	subf	r10,r13,r12
+	subf	r10,r14,r12
 	cmpldi	cr0,r10,PACA_EXTLB+EX_TLB_SIZE
 	bne-	virt_page_table_tlb_miss_whacko_fault
 
 	/* We dig the original DEAR and ESR from slot 0 */
-	ld	r15,EX_TLB_DEAR+PACA_EXTLB(r13)
-	ld	r16,EX_TLB_ESR+PACA_EXTLB(r13)
+	ld	r15,EX_TLB_DEAR+PACA_EXTLB(r14)
+	ld	r16,EX_TLB_ESR+PACA_EXTLB(r14)
 
 	/* We check for the "special" ESR value for instruction faults */
 	cmpdi	cr0,r16,-1
@@ -489,6 +489,8 @@ virt_page_table_tlb_miss_whacko_fault:
 	START_EXCEPTION(data_tlb_miss_htw)
 	TLB_MISS_PROLOG
 
+	ld	r15,EX_TLB_PACA(r12)
+
 	/* Now we handle the fault proper. We only save DEAR in normal
 	 * fault case since that's the only interesting values here.
 	 * We could probably also optimize by not saving SRR0/1 in the
@@ -503,8 +505,12 @@ virt_page_table_tlb_miss_whacko_fault:
 
 	/* We do the user/kernel test for the PID here along with the RW test
 	 */
+	/* The cool thing now is that r11 contains 0 for user and 8 for kernel,
+	 * and we happen to have the swapper_pg_dir at offset 8 from the user
+	 * pgdir in the PACA :-).
+	 */
 	cmpldi	cr0,r11,0		/* Check for user region */
-	ld	r15,PACAPGD(r13)	/* Load user pgdir */
+	add	r15,r15,r11
 	beq	htw_tlb_miss
 
 	/* XXX replace the RMW cycles with immediate loads + writes */
@@ -512,7 +518,6 @@ virt_page_table_tlb_miss_whacko_fault:
 	cmpldi	cr0,r11,8		/* Check for vmalloc region */
 	rlwinm	r10,r10,0,16,1		/* Clear TID */
 	mtspr	SPRN_MAS1,r10
-	ld	r15,PACA_KERNELPGD(r13)	/* Load kernel pgdir */
 	beq+	htw_tlb_miss
 
 	/* We got a crappy address, just fault with whatever DEAR and ESR
@@ -526,6 +531,8 @@ virt_page_table_tlb_miss_whacko_fault:
 	START_EXCEPTION(instruction_tlb_miss_htw)
 	TLB_MISS_PROLOG
 
+	ld	r15,EX_TLB_PACA(r12)
+
 	/* If we take a recursive fault, the second level handler may need
 	 * to know whether we are handling a data or instruction fault in
 	 * order to get to the right store fault handler. We provide that
@@ -548,7 +555,7 @@ virt_page_table_tlb_miss_whacko_fault:
 	/* We do the user/kernel test for the PID here along with the RW test
 	 */
 	cmpldi	cr0,r11,0			/* Check for user region */
-	ld	r15,PACAPGD(r13)		/* Load user pgdir */
+	add	r15,r15,r11
 	beq	htw_tlb_miss
 
 	/* XXX replace the RMW cycles with immediate loads + writes */
@@ -556,7 +563,6 @@ virt_page_table_tlb_miss_whacko_fault:
 	cmpldi	cr0,r11,8			/* Check for vmalloc region */
 	rlwinm	r10,r10,0,16,1			/* Clear TID */
 	mtspr	SPRN_MAS1,r10
-	ld	r15,PACA_KERNELPGD(r13)		/* Load kernel pgdir */
 	beq+	htw_tlb_miss
 
 	/* We got a crappy address, just fault */
@@ -570,9 +576,8 @@ virt_page_table_tlb_miss_whacko_fault:
  * misses. We are entered with:
  *
  * r16 = virtual page table faulting address
- * r15 = PGD pointer
+ * r15 = pointer to PGD pointer
  * r14 = ESR
- * r13 = PACA
  * r12 = TLB exception frame in PACA
  * r11 = crap (free to use)
  * r10 = crap (free to use)
@@ -581,6 +586,8 @@ virt_page_table_tlb_miss_whacko_fault:
  * avoid too much complication, it will save/restore things for us
  */
 htw_tlb_miss:
+	ld	r15,0(r15)
+
 	/* Search if we already have a TLB entry for that virtual address, and
 	 * if we do, bail out.
 	 *
@@ -692,7 +699,6 @@ htw_tlb_miss_fault:
  * r16 = faulting address
  * r15 = crap (free to use)
  * r14 = ESR (data) or -1 (instruction)
- * r13 = PACA
  * r12 = TLB exception frame in PACA
  * r11 = crap (free to use)
  * r10 = crap (free to use)
@@ -714,7 +720,8 @@ tlb_load_linear:
 	 * we only use 1G pages for now. That might have to be changed in a
 	 * final implementation, especially when dealing with hypervisors
 	 */
-	ld	r11,PACATOC(r13)
+	ld	r11,EX_TLB_PACA(r12)
+	ld	r11,PACATOC(r11)
 	ld	r11,linear_map_top@got(r11)
 	ld	r10,0(r11)
 	cmpld	cr0,r10,r16
-- 
1.7.4.1

^ permalink raw reply related
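
The code comment added in the patch above describes selecting the user or kernel page directory by adding 0 or 8 to a base pointer. As a rough illustration only (not code from the patch), here is a minimal C sketch of that idea. The struct `paca_sketch`, its two fields, and `select_pgdir()` are hypothetical stand-ins; the only things taken from the patch are the `0x8` region test and the stated assumption that the kernel pgdir sits 8 bytes after the user pgdir.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two adjacent PACA fields. */
struct paca_sketch {
    uint64_t pgd;        /* user page directory */
    uint64_t kernel_pgd; /* assumed to sit exactly 8 bytes after pgd */
};

/* The trick only works if the two fields are adjacent. */
_Static_assert(offsetof(struct paca_sketch, kernel_pgd) ==
               offsetof(struct paca_sketch, pgd) + 8,
               "kernel_pgd must follow pgd by 8 bytes");

/* Pick the page directory without a branch: the region is the top 4 bits
 * of the address, and bit 3 of the region (value 8) marks a kernel address,
 * mirroring "andi. r10,r15,0x8" in the handler. */
static inline uint64_t select_pgdir(const struct paca_sketch *paca, uint64_t ea)
{
    uint64_t region = ea >> 60;
    uint64_t sel = region & 0x8; /* 0 for user, 8 for kernel */
    uint64_t pgdir;

    memcpy(&pgdir,
           (const unsigned char *)paca + offsetof(struct paca_sketch, pgd) + sel,
           sizeof(pgdir));
    return pgdir;
}
```

The point of the design is that one add replaces two conditional loads, at the cost of a layout dependency that is easy to break silently, hence the static assertion in the sketch.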

* [PATCH 7/7] powerpc/e5500: set MMU_FTR_USE_PAIRED_MAS
From: Scott Wood @ 2011-05-18 21:05 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev
In-Reply-To: <20110518210453.GA29500@schlenkerla.am.freescale.net>

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
Is there any 64-bit book3e chip that doesn't support this?  It
doesn't appear to be optional in the ISA.

 arch/powerpc/kernel/cputable.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 34d2722..a3b8eeb 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -1981,7 +1981,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_features		= CPU_FTRS_E5500,
 		.cpu_user_features	= COMMON_USER_BOOKE,
 		.mmu_features		= MMU_FTR_TYPE_FSL_E | MMU_FTR_BIG_PHYS |
-			MMU_FTR_USE_TLBILX,
+			MMU_FTR_USE_TLBILX | MMU_FTR_USE_PAIRED_MAS,
 		.icache_bsize		= 64,
 		.dcache_bsize		= 64,
 		.num_pmcs		= 4,
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 6/7] powerpc/mm: 64-bit: tlb handler micro-optimization
From: Scott Wood @ 2011-05-18 21:05 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev
In-Reply-To: <20110518210453.GA29500@schlenkerla.am.freescale.net>

A little more speed up measured on e5500.

Setting of U0-3 is dropped as it is not used by Linux as far as I can
see.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/mm/tlb_low_64e.S |   21 ++++++++-------------
 1 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index e782023..a94c87b 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -47,10 +47,10 @@
 	 * We could probably also optimize by not saving SRR0/1 in the
 	 * linear mapping case but I'll leave that for later
 	 */
-	mfspr	r14,SPRN_ESR
 	mfspr	r16,SPRN_DEAR		/* get faulting address */
 	srdi	r15,r16,60		/* get region */
 	cmpldi	cr0,r15,0xc		/* linear mapping ? */
+	mfspr	r14,SPRN_ESR
 	TLB_MISS_STATS_SAVE_INFO
 	beq	tlb_load_linear		/* yes -> go to linear map load */
 
@@ -62,11 +62,11 @@
 	andi.	r10,r15,0x1
 	bne-	virt_page_table_tlb_miss
 
-	std	r14,EX_TLB_ESR(r12);	/* save ESR */
-	std	r16,EX_TLB_DEAR(r12);	/* save DEAR */
+	/* We need _PAGE_PRESENT and  _PAGE_ACCESSED set */
 
-	 /* We need _PAGE_PRESENT and  _PAGE_ACCESSED set */
+	std	r14,EX_TLB_ESR(r12);	/* save ESR */
 	li	r11,_PAGE_PRESENT
+	std	r16,EX_TLB_DEAR(r12);	/* save DEAR */
 	oris	r11,r11,_PAGE_ACCESSED@h
 
 	/* We do the user/kernel test for the PID here along with the RW test
@@ -225,21 +225,16 @@ finish_normal_tlb_miss:
 	 *                 yet implemented for now
 	 * MAS 2   :	Defaults not useful, need to be redone
 	 * MAS 3+7 :	Needs to be done
-	 *
-	 * TODO: mix up code below for better scheduling
 	 */
 	clrrdi	r11,r16,12		/* Clear low crap in EA */
+	rldicr	r15,r14,64-(PTE_RPN_SHIFT-PAGE_SHIFT),63-PAGE_SHIFT
 	rlwimi	r11,r14,32-19,27,31	/* Insert WIMGE */
+	clrldi	r15,r15,12		/* Clear crap at the top */
 	mtspr	SPRN_MAS2,r11
-
-	/* Move RPN in position */
-	rldicr	r11,r14,64-(PTE_RPN_SHIFT-PAGE_SHIFT),63-PAGE_SHIFT
-	clrldi	r15,r11,12		/* Clear crap at the top */
-	rlwimi	r15,r14,32-8,22,25	/* Move in U bits */
+	andi.	r11,r14,_PAGE_DIRTY
 	rlwimi	r15,r14,32-2,26,31	/* Move in BAP bits */
 
 	/* Mask out SW and UW if !DIRTY (XXX optimize this !) */
-	andi.	r11,r14,_PAGE_DIRTY
 	bne	1f
 	li	r11,MAS3_SW|MAS3_UW
 	andc	r15,r15,r11
@@ -483,10 +478,10 @@ virt_page_table_tlb_miss_whacko_fault:
 	 * We could probably also optimize by not saving SRR0/1 in the
 	 * linear mapping case but I'll leave that for later
 	 */
-	mfspr	r14,SPRN_ESR
 	mfspr	r16,SPRN_DEAR		/* get faulting address */
 	srdi	r11,r16,60		/* get region */
 	cmpldi	cr0,r11,0xc		/* linear mapping ? */
+	mfspr	r14,SPRN_ESR
 	TLB_MISS_STATS_SAVE_INFO
 	beq	tlb_load_linear		/* yes -> go to linear map load */
 
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 2/7] powerpc/mm: 64-bit 4k: use a PMD-based virtual page table
From: Scott Wood @ 2011-05-18 21:05 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev
In-Reply-To: <20110518210453.GA29500@schlenkerla.am.freescale.net>

Loads with non-linear access patterns were producing a very high
ratio of recursive pt faults to regular tlb misses.  Rather than
choose between a 4-level table walk or a 1-level virtual page table
lookup, use a hybrid scheme with a virtual linear pmd, followed by a
2-level lookup in the normal handler.

This adds about 5 cycles (assuming no cache misses, and e5500 timing)
to a normal TLB miss, but greatly reduces the recursive fault rate
for loads which don't have locality within 2 MiB regions but do have
significant locality within 1 GiB regions.  Improvements of close to 50%
were seen on such benchmarks.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/mm/tlb_low_64e.S |   23 +++++++++++++++--------
 1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index af08922..17726d3 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -24,7 +24,7 @@
 #ifdef CONFIG_PPC_64K_PAGES
 #define VPTE_PMD_SHIFT	(PTE_INDEX_SIZE+1)
 #else
-#define VPTE_PMD_SHIFT	(PTE_INDEX_SIZE)
+#define VPTE_PMD_SHIFT	0
 #endif
 #define VPTE_PUD_SHIFT	(VPTE_PMD_SHIFT + PMD_INDEX_SIZE)
 #define VPTE_PGD_SHIFT	(VPTE_PUD_SHIFT + PUD_INDEX_SIZE)
@@ -185,7 +185,7 @@ normal_tlb_miss:
 	/* Insert the bottom bits in */
 	rlwimi	r14,r15,0,16,31
 #else
-	rldicl	r14,r16,64-(PAGE_SHIFT-3),PAGE_SHIFT-3+4
+	rldicl	r14,r16,64-(PMD_SHIFT-3),PMD_SHIFT-3+4
 #endif
 	sldi	r15,r10,60
 	clrrdi	r14,r14,3
@@ -202,6 +202,16 @@ MMU_FTR_SECTION_ELSE
 	ld	r14,0(r10)
 ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_USE_TLBRSRV)
 
+#ifndef CONFIG_PPC_64K_PAGES
+	rldicl	r15,r16,64-PAGE_SHIFT+3,64-PTE_INDEX_SIZE-3
+	clrrdi	r15,r15,3
+
+	cmpldi	cr0,r14,0
+	beq	normal_tlb_miss_access_fault
+
+	ldx	r14,r14,r15
+#endif
+
 finish_normal_tlb_miss:
 	/* Check if required permissions are met */
 	andc.	r15,r11,r14
@@ -353,14 +363,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_TLBRSRV)
 #ifndef CONFIG_PPC_64K_PAGES
 	/* Get to PUD entry */
 	rldicl	r11,r16,64-VPTE_PUD_SHIFT,64-PUD_INDEX_SIZE-3
-	clrrdi	r10,r11,3
-	ldx	r15,r10,r15
-	cmpldi	cr0,r15,0
-	beq	virt_page_table_tlb_miss_fault
-#endif /* CONFIG_PPC_64K_PAGES */
-
+#else
 	/* Get to PMD entry */
 	rldicl	r11,r16,64-VPTE_PMD_SHIFT,64-PMD_INDEX_SIZE-3
+#endif
+
 	clrrdi	r10,r11,3
 	ldx	r15,r10,r15
 	cmpldi	cr0,r15,0
-- 
1.7.4.1

^ permalink raw reply related
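
To make the hybrid scheme in the patch above easier to follow, here is a minimal C sketch of the two dependent loads it describes: one into a linear array of PMD entries, then one into the PTE page that entry points to. This is an illustration under assumptions, not the handler itself: `PAGE_SHIFT = 12` is assumed (4K pages), the function names are hypothetical, and a plain array stands in for the virtual linear PMD, whereas the real first load goes through a TLB-mapped virtual address and can itself miss.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT     12 /* assumption: 4 KiB pages */
#define PTE_INDEX_SIZE 9  /* from pgtable-ppc64-4k.h */
#define PMD_SHIFT      (PAGE_SHIFT + PTE_INDEX_SIZE) /* 21, i.e. 2 MiB */

/* Byte offset into the linear PMD array: drop the 4 region bits, take the
 * 2 MiB-granular index, scale by 8 bytes per entry. This mirrors the
 * rldicl/clrrdi pair that uses PMD_SHIFT in the diff. */
static inline uint64_t vpmd_offset(uint64_t ea)
{
    return ((ea & 0x0fffffffffffffffULL) >> PMD_SHIFT) << 3;
}

/* Byte offset of the PTE within its PTE page (9 index bits, 8 bytes each). */
static inline uint64_t pte_offset(uint64_t ea)
{
    return ((ea >> PAGE_SHIFT) & ((1u << PTE_INDEX_SIZE) - 1)) << 3;
}

/* Two dependent loads. Returns 0 when the PMD entry is empty, which is
 * where the handler branches to its access-fault path. */
static inline uint64_t hybrid_lookup(const uint64_t *const *vpmd, uint64_t ea)
{
    const uint64_t *pte_page = vpmd[vpmd_offset(ea) >> 3];

    if (pte_page == NULL)
        return 0;
    return pte_page[pte_offset(ea) >> 3];
}
```

With these shifts, one mapping of the linear PMD array covers far more address space than one mapping of a linear PTE array would, which is the locality trade-off the commit message describes.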

* [PATCH 3/7] powerpc/mm: 64-bit tlb miss: get PACA from memory rather than SPR
From: Scott Wood @ 2011-05-18 21:05 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev
In-Reply-To: <20110518210453.GA29500@schlenkerla.am.freescale.net>

This saves a few cycles, at least on e5500.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/include/asm/exception-64e.h |   16 +++++++---------
 arch/powerpc/kernel/paca.c               |    5 +++++
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h
index 6d53f31..6921261 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -62,16 +62,14 @@
 #define EX_TLB_ESR	( 9 * 8) /* Level 0 and 2 only */
 #define EX_TLB_SRR0	(10 * 8)
 #define EX_TLB_SRR1	(11 * 8)
-#define EX_TLB_MMUCR0	(12 * 8) /* Level 0 */
-#define EX_TLB_MAS1	(12 * 8) /* Level 0 */
-#define EX_TLB_MAS2	(13 * 8) /* Level 0 */
+#define EX_TLB_PACA	(12 * 8)
 #ifdef CONFIG_BOOK3E_MMU_TLB_STATS
-#define EX_TLB_R8	(14 * 8)
-#define EX_TLB_R9	(15 * 8)
-#define EX_TLB_LR	(16 * 8)
-#define EX_TLB_SIZE	(17 * 8)
+#define EX_TLB_R8	(13 * 8)
+#define EX_TLB_R9	(14 * 8)
+#define EX_TLB_LR	(15 * 8)
+#define EX_TLB_SIZE	(16 * 8)
 #else
-#define EX_TLB_SIZE	(14 * 8)
+#define EX_TLB_SIZE	(13 * 8)
 #endif
 
 #define	START_EXCEPTION(label)						\
@@ -98,7 +96,7 @@ exc_##label##_book3e:
 	std	r11,EX_TLB_R11(r12);					    \
 	mfspr	r11,SPRN_SPRG_TLB_SCRATCH;				    \
 	std	r13,EX_TLB_R13(r12);					    \
-	mfspr	r13,SPRN_SPRG_PACA;					    \
+	ld	r13,EX_TLB_PACA(r12);					    \
 	std	r14,EX_TLB_R14(r12);					    \
 	addi	r14,r12,EX_TLB_SIZE;					    \
 	std	r15,EX_TLB_R15(r12);					    \
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 102244e..814dae2 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -151,6 +151,11 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
 #ifdef CONFIG_PPC_STD_MMU_64
 	new_paca->slb_shadow_ptr = &slb_shadow[cpu];
 #endif /* CONFIG_PPC_STD_MMU_64 */
+#ifdef CONFIG_PPC_BOOK3E
+	new_paca->extlb[0][EX_TLB_PACA / 8] = (u64)new_paca;
+	new_paca->extlb[1][EX_TLB_PACA / 8] = (u64)new_paca;
+	new_paca->extlb[2][EX_TLB_PACA / 8] = (u64)new_paca;
+#endif
 }
 
 /* Put the paca pointer into r13 and SPRG_PACA */
-- 
1.7.4.1

^ permalink raw reply related
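
The patch above stores a back-pointer to the PACA in each TLB exception frame so the handler can reach it with an ordinary load. A minimal C sketch of that layout follows, assuming the non-stats frame size; `struct paca_sketch`, `init_extlb_paca()` and `paca_from_frame()` are hypothetical names, and only the `EX_TLB_PACA` and `EX_TLB_SIZE` values and the three frame levels come from the diff.

```c
#include <stdint.h>

#define EX_TLB_PACA   (12 * 8) /* byte offset of the PACA slot, from the patch */
#define EX_TLB_SIZE   (13 * 8) /* frame size without CONFIG_BOOK3E_MMU_TLB_STATS */
#define EX_TLB_LEVELS 3        /* three re-entrancy levels, as in paca.c */

/* Hypothetical reduced PACA holding only the TLB exception frames. */
struct paca_sketch {
    uint64_t extlb[EX_TLB_LEVELS][EX_TLB_SIZE / 8];
};

/* Same effect as the three assignments added to initialise_paca(). */
static inline void init_extlb_paca(struct paca_sketch *paca)
{
    for (int level = 0; level < EX_TLB_LEVELS; level++)
        paca->extlb[level][EX_TLB_PACA / 8] = (uint64_t)(uintptr_t)paca;
}

/* What "ld r13,EX_TLB_PACA(r12)" computes, with r12 as the frame pointer. */
static inline struct paca_sketch *paca_from_frame(const uint64_t *frame)
{
    return (struct paca_sketch *)(uintptr_t)frame[EX_TLB_PACA / 8];
}
```

Because every level carries the same pointer, the handler does not need to know which re-entrancy level it is running at before it can find the PACA.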

* [PATCH 1/7] powerpc/mm: 64-bit 4k: use page-sized PMDs
From: Scott Wood @ 2011-05-18 21:04 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev

This allows a virtual page table to be used at the PMD rather than
the PTE level.

Rather than adjust the constant in pgd_index() (or ignore it, as
too-large values don't hurt as long as overly large addresses aren't
passed in), go back to using PTRS_PER_PGD.  The overflow comment seems to
apply to a very old implementation of free_pgtables that used pgd_index()
(unfortunately the commit message, if you seek it out in the historic
tree, doesn't mention any details about the overflow).  The existing
value was numerically identical to the old 4K-page PTRS_PER_PGD, so
using it shouldn't produce an overflow where it's not otherwise possible.

Also get rid of the incorrect comment at the top of pgtable-ppc64-4k.h.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/include/asm/pgtable-ppc64-4k.h |   12 ++++--------
 arch/powerpc/include/asm/pgtable-ppc64.h    |    3 +--
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64-4k.h b/arch/powerpc/include/asm/pgtable-ppc64-4k.h
index 6eefdcf..194005e 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64-4k.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64-4k.h
@@ -1,14 +1,10 @@
 #ifndef _ASM_POWERPC_PGTABLE_PPC64_4K_H
 #define _ASM_POWERPC_PGTABLE_PPC64_4K_H
-/*
- * Entries per page directory level.  The PTE level must use a 64b record
- * for each page table entry.  The PMD and PGD level use a 32b record for
- * each entry by assuming that each entry is page aligned.
- */
+
 #define PTE_INDEX_SIZE  9
-#define PMD_INDEX_SIZE  7
+#define PMD_INDEX_SIZE  9
 #define PUD_INDEX_SIZE  7
-#define PGD_INDEX_SIZE  9
+#define PGD_INDEX_SIZE  7
 
 #ifndef __ASSEMBLY__
 #define PTE_TABLE_SIZE	(sizeof(pte_t) << PTE_INDEX_SIZE)
@@ -19,7 +15,7 @@
 
 #define PTRS_PER_PTE	(1 << PTE_INDEX_SIZE)
 #define PTRS_PER_PMD	(1 << PMD_INDEX_SIZE)
-#define PTRS_PER_PUD	(1 << PMD_INDEX_SIZE)
+#define PTRS_PER_PUD	(1 << PUD_INDEX_SIZE)
 #define PTRS_PER_PGD	(1 << PGD_INDEX_SIZE)
 
 /* PMD_SHIFT determines what a second-level page table entry can map */
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 2b09cd5..8bd1cd9 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -181,8 +181,7 @@
  * Find an entry in a page-table-directory.  We combine the address region
  * (the high order N bits) and the pgd portion of the address.
  */
-/* to avoid overflow in free_pgtables we don't use PTRS_PER_PGD here */
-#define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & 0x1ff)
+#define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & (PTRS_PER_PGD - 1))
 
 #define pgd_offset(mm, address)	 ((mm)->pgd + pgd_index(address))
 
-- 
1.7.4.1

^ permalink raw reply related
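
As a quick check of the arithmetic behind the patch above, here is a minimal C sketch of the new index sizes. It assumes 4 KiB pages (`PAGE_SHIFT = 12`) and 8-byte table entries, neither of which appears in the diff hunks; `pmd_table_bytes()` and `pgd_index_sketch()` are hypothetical helper names. Under those assumptions a 9-bit PMD index gives a table of exactly one page, and the derived shifts match the 2 MiB and 1 GiB granularities mentioned in patch 2/7.

```c
#include <stdint.h>

/* Index sizes after the patch (from pgtable-ppc64-4k.h in the diff). */
#define PTE_INDEX_SIZE 9
#define PMD_INDEX_SIZE 9
#define PUD_INDEX_SIZE 7
#define PGD_INDEX_SIZE 7

/* Assumptions: 4 KiB base pages and 8-byte entries at every level. */
#define PAGE_SHIFT  12
#define ENTRY_BYTES 8u

#define PMD_SHIFT   (PAGE_SHIFT + PTE_INDEX_SIZE) /* 21: one PMD entry maps 2 MiB */
#define PUD_SHIFT   (PMD_SHIFT + PMD_INDEX_SIZE)  /* 30: one PUD entry maps 1 GiB */
#define PGDIR_SHIFT (PUD_SHIFT + PUD_INDEX_SIZE)  /* 37 */

#define PTRS_PER_PGD (1u << PGD_INDEX_SIZE)

/* Size of one PMD table in bytes: 8 << 9 = 4096, i.e. one page. */
static inline uint64_t pmd_table_bytes(void)
{
    return (uint64_t)ENTRY_BYTES << PMD_INDEX_SIZE;
}

/* pgd_index() as rewritten by the patch, masking with PTRS_PER_PGD - 1. */
static inline unsigned pgd_index_sketch(uint64_t address)
{
    return (unsigned)((address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}
```

Deriving the mask from `PTRS_PER_PGD` keeps `pgd_index()` in step with the table size if the index sizes change again, which a hard-coded `0x1ff` would not.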

* RE: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic
From: Moore, Eric @ 2011-05-18 19:11 UTC (permalink / raw)
  To: Milton Miller, Hitoshi Mitake, Sam Ravnborg, Ingo Molnar,
	Ingo Molnar, Desai, Kashyap, Prakash, Sathya
  Cc: linux-arch, linux scsi dev, Matthew Wilcox, linux kernel,
	James Bottomley, paulus@samba.org, linux pci, linux powerpc dev
In-Reply-To: <x86-32-writeq-is-broken@mdm.bga.com>

On Wednesday, May 18, 2011 12:31 PM Milton Miller wrote:
> Ingo I would propose the following commits added in 2.6.29 be reverted.
> I think the current consensus is drivers must know if the writeq is
> not atomic so they can provide their own locking or other workaround.
>


Exactly.

^ permalink raw reply

* RE: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic
From: Milton Miller @ 2011-05-18 18:31 UTC (permalink / raw)
  To: Hitoshi Mitake, Sam Ravnborg, Ingo Molnar, Ingo Molnar,
	Desai, Kashyap, Prakash, Sathya
  Cc: linux-arch, linux scsi dev, Matthew Wilcox, linux kernel,
	James Bottomley, paulus@samba.org, linux pci, linux powerpc dev
In-Reply-To: <4565AEA676113A449269C2F3A549520F80B66280@cosmail03.lsi.com>

On Wed, 18 May 2011 about 09:35:56 -0600, Eric Moore wrote:
> On Wednesday, May 18, 2011 2:24 AM, Milton Miller wrote:
> > On Wed, 18 May 2011 around 17:00:10 +1000, Benjamin Herrenschmidt wrote:
> > > (Just adding Milton to the CC list, he suspects races in the
> > > driver instead).
> > >
> > > On Wed, 2011-05-18 at 08:23 +0400, James Bottomley wrote:
> > > > On Tue, 2011-05-17 at 22:15 -0600, Matthew Wilcox wrote:
> > > > > On Wed, May 18, 2011 at 09:37:08AM +0530, Desai, Kashyap wrote:
> > > > > > On Wed, 2011-05-04 at 17:23 +0530, Kashyap, Desai wrote:
> > > > > > > The following code seems to be there in
> > /usr/src/linux/arch/x86/include/asm/io.h.
> > > > > > > This is not going to work.
> > > > > > >
> > > > > > > static inline void writeq(__u64 val, volatile void __iomem *addr)
> > > > > > > {
> > > > > > >         writel(val, addr);
> > > > > > >         writel(val >> 32, addr+4);
> > > > > > > }
> > > > > > >
> > > > > > > So with this code turned on in the kernel, there is going to be
> > race condition
> > > > > > > where multiple cpus can be writing to the request descriptor at
> > the same time.
> > > > > > >
> > > > > > > Meaning this could happen:
> > > > > > > > (A) CPU A does 32 bit write
> > > > > > > (B) CPU B does 32 bit write
> > > > > > > (C) CPU A does 32 bit write
> > > > > > > (D) CPU B does 32 bit write
> > > > > > >
> > > > > > > We need the 64 bit completed in one access pci memory write, else
> > spin lock is required.
> > > > > > > Since it's going to be difficult to know which writeq was
> > implemented in the kernel,
> > > > > > > the driver is going to have to always acquire a spin lock each
> > time we do 64bit write.
> > > > > > >
> > > > > > > Cc: stable@kernle.org
> > > > > > > Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
> > > > > > > ---
> > > > > > > diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c
> > b/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > > > index efa0255..5778334 100644
> > > > > > > --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > > > +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > > > @@ -1558,7 +1558,6 @@ mpt2sas_base_free_smid(struct
> > MPT2SAS_ADAPTER *ioc, u16 smid)
> > > > > > >   * care of 32 bit environment where its not quarenteed to send
> > the entire word
> > > > > > >   * in one transfer.
> > > > > > >   */
> > > > > > > -#ifndef writeq
> > > > > >
> > > > > > Why not make this #ifndef CONFIG_64BIT?  You know that all 64 bit
> > > > > > systems have writeq implemented correctly; you suspect 32 bit
> > systems
> > > > > > don't.
> > > > > >
> > > > > > James
> > > > > >
> > > > > > James, This issue was observed on PPC64 system. So what you have
> > suggested will not solve this issue.
> > > > > > If we are sure that writeq() is atomic across all architecture, we
> > can use it safely. As we have seen issue on ppc64, we are not confident to
> > use
> > > > > > "writeq" call.
> > > > >
> > > > > So have you told the powerpc people that they have a broken writeq?
> > > >
> > > > I'm just in the process of finding them now on IRC so I can demand an
> > > > explanation: this is a really serious API problem because writeq is
> > > > supposed to be atomic on 64 bit.
> > > >
> > > > > And why do you obfuscate your report by talking about i386 when it's
> > > > > really about powerpc64?
> > > >
> > > > James
> >
> > I checked the assembly for my compiled output and it ends up with
> > a single std (store doubleword aka 64 bits) instruction with offset
> > 192 decimal (0xc0) from the base register obtained from the structure.
> >
> > An aligned doubleword store is atomic on 64 bit powerpc.
> >
> > So I would really like more details if you are blaming 64 bit
> > powerpc of a non-atomic store.
> >
> > That said, the patch will affect the code by adding barriers.
> > Specifically, while powerpc has a sync before doing the store as part
> > of writeq, wrapping in a spinlock adds a sync before releasing the lock
> > whenever a writeq (or writex x=b,w,d,q) was issued inside the lock.
> >
> > (sync orders all reads and all writes to both memory and devices from
> > that cpu).
> >
> > But looking further at the code, I see such things as:
> >
> > drivers/scsi/mpt2sas/mpt2sas_base.c  line 2944
> >
> >         mpt2sas_base_put_smid_default(ioc, smid);
> >         init_completion(&ioc->base_cmds.done);
> >         timeleft = wait_for_completion_timeout(&ioc->base_cmds.done,
> >
> > where mpt2sas_base_put_smid_default is a routine that has a call to
> > _base_writeq.  This will initiate io to the adapter, then initialize
> > the completion, then hope that the timeout is long enough to let the io
> > complete and be marked done but short enough to not be a problem when
> > the timeout occurs because we initialized the completion after the irq
> > came in.
> >
> > The code then looks at a status flag, but there is no indication how
> > the access to that field is serialized between the interrupt handler
> > and the submission routine.  It may mostly work due to barriers in
> > the primitives but I don't see any statement of rules.
> >
> > Also, while I see a few wmb before writel in _base_interrupt, I don't
> > see any rmb, which I would expect between establishing an element is
> > valid and reading other fields in that element.
> >
> > So I'd really like to hear more about what your symptoms were and how
> > you determined writeq on 64 bit powerpc was not atomic.
> >
> > Milton
> 
> 
> I worked the original defect a couple months ago, and Kashyap is now
> getting around to posting my patches.
> 
> This original defect has nothing to do with PPC64.  The original
> problem was only on x86.    It only became a problem on PPC64 when I
> tried to fix the original x86 issue by copying the writeq code from
> the linux headers, then it broke PPC64.   I doubt that broken patch
> was ever posted. Anyways, back to the original defect.  The reason
> it became a problem for x86 is because the kernel headers had an
> implementation of writeq in the arch/x86 headers, which means our
> internal implementation of writeq is not being used.  The writeq
> implementation in the kernel is totally wrong for arch/x86 because it
> does not have spin locks, and if two processors are simultaneously doing
> two separate 32bit pci writes, then what is received by controller
> firmware is out of order.   This change occurs between Red Hat RHEL5
> and RHEL6.  In RHEL5, this writeq was not implemented in arch/x86
> headers, and our driver internal implementation of writeq was used.
> 
> Eric

So the real question should be why is x86-32 supplying a broken writeq
instead of letting drivers work out what to do when needed?

It was added in 2.6.29 without any pci or other io acks in
the change log.

Also the only reference I find to HAVE_WRITEQ and HAVE_READQ is the
x86 selects, so I think those can just be removed (vs move to x86_64).

The original changelog is:
    Impact: add new API for drivers
    
    Add implementation of readq/writeq to x86_32, and add config value to
    the x86 architecture to determine existence of readq/writeq.

Ingo, I would propose the following commits added in 2.6.29 be reverted.
I think the current consensus is drivers must know if the writeq is
not atomic so they can provide their own locking or other workaround.

2c5643b1c5c7fbb13f340d4c58944d9642f41796
	x86: provide readq()/writeq() on 32-bit too
a0b1131e479e5af32eefac8bc54c9742e23d638e
	x86: provide readq()/writeq() on 32-bit too, cleanup
93093d099e5dd0c258fd530c12668e828c20df41
	x86: provide readq()/writeq() on 32-bit too, complete

milton

* Re: powerpc: mpc85xx regression since 2.6.39-rc2, one cpu core lame
From: Milton Miller @ 2011-05-18 17:19 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20110518120320.GA3025@riccoc20.at.omicron.at>

Does this patch help?  If so please reply to that thread so patchwork
will see it in addition to here.

http://patchwork.ozlabs.org/patch/96146/

milton

* RE: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic
From: Moore, Eric @ 2011-05-18 15:35 UTC (permalink / raw)
  To: Milton Miller, Desai, Kashyap, Prakash, Sathya
  Cc: linux-scsi@vger.kernel.org, Matthew Wilcox, James Bottomley,
	paulus@samba.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <mpt2sas-writeq-powerpc64@mdm.bga.com>

On Wednesday, May 18, 2011 2:24 AM, Milton Miller wrote:
> On Wed, 18 May 2011 around 17:00:10 +1000, Benjamin Herrenschmidt wrote:
> > (Just adding Milton to the CC list, he suspects races in the
> > driver instead).
> >
> > On Wed, 2011-05-18 at 08:23 +0400, James Bottomley wrote:
> > > On Tue, 2011-05-17 at 22:15 -0600, Matthew Wilcox wrote:
> > > > On Wed, May 18, 2011 at 09:37:08AM +0530, Desai, Kashyap wrote:
> > > > > On Wed, 2011-05-04 at 17:23 +0530, Kashyap, Desai wrote:
> > > > > > The following code seems to be there in
> /usr/src/linux/arch/x86/include/asm/io.h.
> > > > > > This is not going to work.
> > > > > >
> > > > > > > static inline void writeq(__u64 val, volatile void __iomem *addr)
> > > > > > {
> > > > > >         writel(val, addr);
> > > > > >         writel(val >> 32, addr+4);
> > > > > > }
> > > > > >
> > > > > > So with this code turned on in the kernel, there is going to be
> race condition
> > > > > > where multiple cpus can be writing to the request descriptor at
> the same time.
> > > > > >
> > > > > > Meaning this could happen:
> > > > > > > (A) CPU A does 32 bit write
> > > > > > (B) CPU B does 32 bit write
> > > > > > (C) CPU A does 32 bit write
> > > > > > (D) CPU B does 32 bit write
> > > > > >
> > > > > > > We need the 64 bit completed in one access pci memory write, else
> > > > > > > spin lock is required.
> > > > > > Since it's going to be difficult to know which writeq was
> implemented in the kernel,
> > > > > > the driver is going to have to always acquire a spin lock each
> time we do 64bit write.
> > > > > >
> > > > > > Cc: stable@kernle.org
> > > > > > Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
> > > > > > ---
> > > > > > diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c
> b/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > > index efa0255..5778334 100644
> > > > > > --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > > +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
> > > > > > @@ -1558,7 +1558,6 @@ mpt2sas_base_free_smid(struct
> MPT2SAS_ADAPTER *ioc, u16 smid)
> > > > > >   * care of 32 bit environment where its not quarenteed to send
> the entire word
> > > > > >   * in one transfer.
> > > > > >   */
> > > > > > -#ifndef writeq
> > > > >
> > > > > Why not make this #ifndef CONFIG_64BIT?  You know that all 64 bit
> > > > > systems have writeq implemented correctly; you suspect 32 bit
> systems
> > > > > don't.
> > > > >
> > > > > James
> > > > >
> > > > > James, This issue was observed on PPC64 system. So what you have
> suggested will not solve this issue.
> > > > > > If we are sure that writeq() is atomic across all architectures, we
> > > > > > can use it safely. As we have seen an issue on ppc64, we are not
> > > > > > confident to use the "writeq" call.
> > > >
> > > > So have you told the powerpc people that they have a broken writeq?
> > >
> > > I'm just in the process of finding them now on IRC so I can demand an
> > > explanation: this is a really serious API problem because writeq is
> > > supposed to be atomic on 64 bit.
> > >
> > > > > And why do you obfuscate your report by talking about i386 when it's
> > > > really about powerpc64?
> > >
> > > James
>
> I checked the assembly for my compiled output and it ends up with
> a single std (store doubleword aka 64 bits) instruction with offset
> 192 decimal (0xc0) from the base register obtained from the structure.
>
> An aligned doubleword store is atomic on 64 bit powerpc.
>
> So I would really like more details if you are blaming 64 bit
> powerpc of a non-atomic store.
>
> That said, the patch will affect the code by adding barriers.
> Specifically, while powerpc has a sync before doing the store as part
> of writeq, wrapping in a spinlock adds a sync before releasing the lock
> whenever a writeq (or writex x=b,w,d,q) was issued inside the lock.
>
> (sync orders all reads and all writes to both memory and devices from
> that cpu).
>
> But looking further at the code, I see such things as:
>
> drivers/scsi/mpt2sas/mpt2sas_base.c  line 2944
>
>         mpt2sas_base_put_smid_default(ioc, smid);
>         init_completion(&ioc->base_cmds.done);
>         timeleft = wait_for_completion_timeout(&ioc->base_cmds.done,
>
> where mpt2sas_base_put_smid_default is a routine that has a call to
> _base_writeq.  This will initiate io to the adapter, then initialize
> the completion, then hope that the timeout is long enough to let the io
> complete and be marked done but short enough to not be a problem when
> the timeout occurs because we initialized the completion after the irq
> came in.
>
> The code then looks at a status flag, but there is no indication how
> the access to that field is serialized between the interrupt handler
> and the submission routine.  It may mostly work due to barriers in
> the primitives but I don't see any statement of rules.
>
> Also, while I see a few wmb before writel in _base_interrupt, I don't
> see any rmb, which I would expect between establishing an element is
> valid and reading other fields in that element.
>
> So I'd really like to hear more about what your symptoms were and how
> you determined writeq on 64 bit powerpc was not atomic.
>
> Milton


I worked the original defect a couple of months ago, and Kashyap is now
getting around to posting my patches.

This original defect has nothing to do with PPC64.  The original
problem was only on x86.  It only became a problem on PPC64 when I
tried to fix the original x86 issue by copying the writeq code from
the linux headers, which then broke PPC64.  I doubt that broken patch
was ever posted.  Anyway, back to the original defect.  The reason
it became a problem for x86 is that the kernel headers had an
implementation of writeq in the arch/x86 headers, which means our
internal implementation of writeq is not being used.  The writeq
implementation in the kernel is totally wrong for arch/x86 because it
does not have spin locks, and if two processors are simultaneously doing
two separate 32bit pci writes, then what is received by controller
firmware is out of order.  This change occurred between Red Hat RHEL5
and RHEL6.  In RHEL5, this writeq was not implemented in arch/x86
headers, and our driver's internal implementation of writeq was used.

Eric

* Re: Kernel cannot see PCI device
From: Bjorn Helgaas @ 2011-05-18 14:14 UTC (permalink / raw)
  To: Prashant Bhole; +Cc: linux-pci, linuxppc-dev
In-Reply-To: <BANLkTi=tvyOPoN3f3v_C+NuVOwr+YKaRJA@mail.gmail.com>

On Wed, May 18, 2011 at 4:02 AM, Prashant Bhole
<prashantsmailcenter@gmail.com> wrote:
> On Mon, May 2, 2011 at 10:21 AM, Prashant Bhole
> <prashantsmailcenter@gmail.com> wrote:
>>
>> Hi,
>> I have a custom made powerpc 460EX board. On that board u-boot
>> can see a PCI device but Linux kernel cannot see it. What could be the problem?
>>
>> On u-boot "pci 2" commands displays following device:
>> Scanning PCI devices on bus 2
>> BusDevFun  VendorId   DeviceId   Device Class       Sub-Class
>> _____________________________________________________________
>> 02.00.00   0x1000     0x0072     Mass storage controller 0x00
>>
>> And when the kernel is booted, there is only one pci device (bridge):
>> #ls /sys/bus/pci/devices
>> 0000:80:00.0
>>
>
> I am still facing this problem.
>
> A call to pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, &l) returns
> a positive value in the function pci_scan_device(), which means reading
> VENDOR_ID failed. I could not find the reason. Any hints?

Hmm...  probably powerpc-related, so I added linuxppc-dev.

My guess would be that Linux didn't find the host bridge to the
hierarchy containing bus 2.  I would guess the host bridge info is
supposed to come from OF.  More information, like the complete u-boot
PCI scan and the kernel dmesg log, would be useful.  And maybe u-boot
has a way to dump the OF device tree?

* [PATCH] [klibc] ppc64: Fix build failure with stricter as
From: maximilian attems @ 2011-05-18 13:41 UTC (permalink / raw)
  To: klibc; +Cc: Matthias Klose, linuxppc-dev, maximilian attems, Paul Mackerras

From: Matthias Klose <doko@ubuntu.com>

Landed in Ubuntu klibc version 1.5.20-1ubuntu3.


Signed-off-by: maximilian attems <max@stro.at>
---
 usr/klibc/arch/ppc64/crt0.S |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/usr/klibc/arch/ppc64/crt0.S b/usr/klibc/arch/ppc64/crt0.S
index a7776a1..c976d5c 100644
--- a/usr/klibc/arch/ppc64/crt0.S
+++ b/usr/klibc/arch/ppc64/crt0.S
@@ -12,16 +12,17 @@
 	.section ".toc","aw"
 .LC0:	.tc	environ[TC],environ
 
+	.text
+	.align 4
+
 	.section ".opd","aw"
-	.align 3
-	.globl _start
 _start:
-	.quad	._start
-	.quad	.TOC.@tocbase, 0
-
-	.text
-	.globl	._start
+	.quad	._start, .TOC.@tocbase, 0
+	.previous
+	.size	_start, 24
 	.type	._start,@function
+	.globl	_start
+	.globl	._start
 ._start:
 	stdu    %r1,-32(%r1)
 	addi    %r3,%r1,32
@@ -29,4 +30,4 @@ _start:
 	b 	.__libc_init
 	nop
 
-	.size _start,.-_start
+	.size ._start,.-._start
-- 
1.7.4.4

* Re: powerpc: mpc85xx regression since 2.6.39-rc2, one cpu core lame
From: Richard Cochran @ 2011-05-18 12:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1305668416.2781.23.camel@pasglop>

On Wed, May 18, 2011 at 07:40:16AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2011-05-17 at 18:28 +0200, Richard Cochran wrote:
> > Ben,
> > 
> > Recent 2.6.39-rc kernels behave strangely on the Freescale dual core
> > mpc8572 and p2020. There is a long pause (like 2 seconds) in the boot
> > sequence after "mpic: requesting IPIs..."
> > 
> > When the system comes up, only one core shows in /proc/cpuinfo. Later
> > on, lots of messages appear like the following:
> > 
> >    INFO: task ksoftirqd/1:9 blocked for more than 120 seconds.
> > 
> > I bisected [1] the problem to:
> > 
> >    commit c56e58537d504706954a06570b4034c04e5b7500
> >    Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> >    Date:   Tue Mar 8 14:40:04 2011 +1100
> > 
> >        powerpc/smp: Create idle threads on demand and properly reset them
> > 
> > I don't see from that commit what had gone wrong. Perhaps you can
> > help resolve this?
> 
> Hrm, odd. Kumar, care to have a look ? That's what happens when you
> don't get me HW to test with :-)

(I get the feeling that I am the only one testing recent kernels with
the mpc85xx.)

Anyhow, I see that this commit was one of a series. For my own use,
can I simply revert this one commit independently?

Thanks,
Richard

* [PATCH] PPC_47x SMP fix
From: Kerstin Jonsson @ 2011-05-18  9:57 UTC (permalink / raw)
  To: benh, linuxppc-dev, linux-kernel
  Cc: Kerstin Jonsson, Michael Neuling, Paul Mackerras, Will Schmidt

 commit c56e58537d504706954a06570b4034c04e5b7500 breaks SMP support in PPC_47x chip.
 secondary_ti must be set to current thread info before calling kick_cpu or else
 start_secondary_47x will jump into void when trying to return to c-code.
 In the current setup secondary_ti is initialized before the CPU idle task is started
 and only the boot core will start. I am not sure this is the correct solution, but it
 makes SMP possible in my chip.
 Note! The HOTPLUG support probably needs some fixing too. There is no trampoline code
 available in head_44x.S - start_secondary_resume?


Signed-off-by: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Will Schmidt <will_schmidt@vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index cbdbb14..f2dcab7 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -410,8 +410,6 @@ int __cpuinit __cpu_up(unsigned int cpu)
 {
 	int rc, c;
 
-	secondary_ti = current_set[cpu];
-
 	if (smp_ops == NULL ||
 	    (smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)))
 		return -EINVAL;
@@ -421,6 +419,8 @@ int __cpuinit __cpu_up(unsigned int cpu)
 	if (rc)
 		return rc;
 
+	secondary_ti = current_set[cpu];
+
 	/* Make sure callin-map entry is 0 (can be leftover a CPU
 	 * hotplug
 	 */
-- 
1.7.2.3
