LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
From: Scott Wood @ 2012-05-31 17:47 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linuxppc-dev, Dan Malek, Bob Cochran, Support
In-Reply-To: <OF1016B1AA.DF19B646-ONC1257A0F.003647EF-C1257A0F.003699E5@transmode.se>

On 05/31/2012 04:56 AM, Joakim Tjernlund wrote:
> Abatron Support <support@abatron.ch> wrote on 2012/05/31 11:30:57:
>>
>>
>>> Abatron Support <support@abatron.ch> wrote on 2012/05/30 14:08:26:
>>>>
>>>>>> I have tested this briefly with BDI2000 on P2010(e500) and
>>>>>> it works for me. I don't know if there are any bad side effects,
>>>>>> therfore
>>>>>> this RFC.
>>>>
>>>>> We used to have MSR_DE surrounded by CONFIG_something
>>>>> to ensure it wasn't set under normal operation.  IIRC, if MSR_DE
>>>>> is set, you will have problems with software debuggers that
>>>>> utilize the the debugging registers in the chip itself.  You only want
>>>>> to force this to be set when using the BDI, not at other times.
>>>>
>>>> This MSR_DE is also of interest and used for software debuggers that
>>>> make use of the debug registers. Only if MSR_DE is set then debug
>>>> interrupts are generated. If a debug event leads to a debug interrupt
>>>> handled by a software debugger or if it leads to a debug halt handled
>>>> by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM.
>>>>
>>>> The "e500 Core Family Reference Manual" chapter "Chapter 8
>>>> Debug Support" explains in detail the effect of MSR_DE.
>>
>>> So what is the verdict on this? I don't buy into Dan argument without some
>>> hard data.
>>
>> What I tried to mention is that handling the MSR_DE correct is not only
>> an emulator (JTAG debugger) requirement. Also a software debugger may
>> depend on a correct handled MSR_DE bit.
> 
> Yes, that made sense to me too. How would SW debuggers work if the kernel keeps
> turning off MSR_DE first chance it gets?

The kernel selectively enables MSR_DE when it wants to debug.  I'm not
sure if anything will be bothered by leaving it on all the time.  This
is something we need for virtualization as well, so a hypervisor can
debug the guest.

-Scott

^ permalink raw reply

* [Early RFC 0/6] arch/powerpc: Add 64TB support to ppc64
From: Aneesh Kumar K.V @ 2012-05-31 19:50 UTC (permalink / raw)
  To: benh, paulus, michael, anton; +Cc: linuxppc-dev

Hi,

This patchset include preparatory patches for supporting 64TB with ppc64. I haven't
completed the actual patch that bump the USER_ESID bits. I wanted the share the
changes early so that I can get feedback on the approach. The changes itself
contains few FIXME!! which I will be addressing in the later updates.

Thanks,
-aneesh

^ permalink raw reply

* [Early RFC 1/6] arch/powerpc: Use hpt_va to compute virtual address
From: Aneesh Kumar K.V @ 2012-05-31 19:50 UTC (permalink / raw)
  To: benh, paulus, michael, anton; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1338493841-9926-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Don't open code the same

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/cell/beat_htab.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/cell/beat_htab.c b/arch/powerpc/platforms/cell/beat_htab.c
index 943c9d3..b83077e 100644
--- a/arch/powerpc/platforms/cell/beat_htab.c
+++ b/arch/powerpc/platforms/cell/beat_htab.c
@@ -259,7 +259,7 @@ static void beat_lpar_hpte_updateboltedpp(unsigned long newpp,
 	u64 dummy0, dummy1;
 
 	vsid = get_kernel_vsid(ea, MMU_SEGSIZE_256M);
-	va = (vsid << 28) | (ea & 0x0fffffff);
+	va = hpt_va(ea, vsid, MMU_SEGSIZE_256M);
 
 	raw_spin_lock(&beat_htab_lock);
 	slot = beat_lpar_hpte_find(va, psize);
-- 
1.7.10

^ permalink raw reply related

* [Early RFC 2/6] arch/powerpc: Convert virtual address to a struct
From: Aneesh Kumar K.V @ 2012-05-31 19:50 UTC (permalink / raw)
  To: benh, paulus, michael, anton; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1338493841-9926-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This is in preparation to the conversion of 64 bit powerpc virtual address
to the max 78 bits. Later patch will switch struct virt_addr to a struct
of virtual segment id and segment offset.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s.h   |    2 +-
 arch/powerpc/include/asm/machdep.h      |    6 +--
 arch/powerpc/include/asm/mmu-hash64.h   |   24 ++++++----
 arch/powerpc/include/asm/tlbflush.h     |    4 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c   |    3 +-
 arch/powerpc/mm/hash_native_64.c        |   76 +++++++++++++++++--------------
 arch/powerpc/mm/hash_utils_64.c         |   12 ++---
 arch/powerpc/mm/hugetlbpage-hash64.c    |    3 +-
 arch/powerpc/mm/tlb_hash64.c            |    3 +-
 arch/powerpc/platforms/cell/beat_htab.c |   17 +++----
 arch/powerpc/platforms/ps3/htab.c       |    6 +--
 arch/powerpc/platforms/pseries/lpar.c   |   30 ++++++------
 12 files changed, 103 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index fd07f43..374b75d 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -59,7 +59,7 @@ struct hpte_cache {
 	struct hlist_node list_vpte;
 	struct hlist_node list_vpte_long;
 	struct rcu_head rcu_head;
-	u64 host_va;
+	struct virt_addr host_va;
 	u64 pfn;
 	ulong slot;
 	struct kvmppc_pte pte;
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 42ce570..b34d0a9 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -34,19 +34,19 @@ struct machdep_calls {
 	char		*name;
 #ifdef CONFIG_PPC64
 	void            (*hpte_invalidate)(unsigned long slot,
-					   unsigned long va,
+					   struct virt_addr va,
 					   int psize, int ssize,
 					   int local);
 	long		(*hpte_updatepp)(unsigned long slot, 
 					 unsigned long newpp, 
-					 unsigned long va,
+					 struct virt_addr va,
 					 int psize, int ssize,
 					 int local);
 	void            (*hpte_updateboltedpp)(unsigned long newpp, 
 					       unsigned long ea,
 					       int psize, int ssize);
 	long		(*hpte_insert)(unsigned long hpte_group,
-				       unsigned long va,
+				       struct virt_addr va,
 				       unsigned long prpn,
 				       unsigned long rflags,
 				       unsigned long vflags,
diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 1c65a59..5ff936b 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -143,6 +143,10 @@ struct mmu_psize_def
 	unsigned long	sllp;	/* SLB L||LP (exact mask to use in slbmte) */
 };
 
+struct virt_addr {
+	unsigned long addr;
+};
+
 #endif /* __ASSEMBLY__ */
 
 /*
@@ -183,11 +187,11 @@ extern int mmu_ci_restrictions;
  * This function sets the AVPN and L fields of the HPTE  appropriately
  * for the page size
  */
-static inline unsigned long hpte_encode_v(unsigned long va, int psize,
+static inline unsigned long hpte_encode_v(struct virt_addr va, int psize,
 					  int ssize)
 {
 	unsigned long v;
-	v = (va >> 23) & ~(mmu_psize_defs[psize].avpnm);
+	v = (va.addr >> 23) & ~(mmu_psize_defs[psize].avpnm);
 	v <<= HPTE_V_AVPN_SHIFT;
 	if (psize != MMU_PAGE_4K)
 		v |= HPTE_V_LARGE;
@@ -218,28 +222,30 @@ static inline unsigned long hpte_encode_r(unsigned long pa, int psize)
 /*
  * Build a VA given VSID, EA and segment size
  */
-static inline unsigned long hpt_va(unsigned long ea, unsigned long vsid,
+static inline struct virt_addr hpt_va(unsigned long ea, unsigned long vsid,
 				   int ssize)
 {
+	struct virt_addr va;
 	if (ssize == MMU_SEGSIZE_256M)
-		return (vsid << 28) | (ea & 0xfffffffUL);
-	return (vsid << 40) | (ea & 0xffffffffffUL);
+		va.addr = (vsid << 28) | (ea & 0xfffffffUL);
+	va.addr = (vsid << 40) | (ea & 0xffffffffffUL);
+	return va;
 }
 
 /*
  * This hashes a virtual address
  */
 
-static inline unsigned long hpt_hash(unsigned long va, unsigned int shift,
+static inline unsigned long hpt_hash(struct virt_addr va, unsigned int shift,
 				     int ssize)
 {
 	unsigned long hash, vsid;
 
 	if (ssize == MMU_SEGSIZE_256M) {
-		hash = (va >> 28) ^ ((va & 0x0fffffffUL) >> shift);
+		hash = (va.addr >> 28) ^ ((va.addr & 0x0fffffffUL) >> shift);
 	} else {
-		vsid = va >> 40;
-		hash = vsid ^ (vsid << 25) ^ ((va & 0xffffffffffUL) >> shift);
+		vsid = va.addr >> 40;
+		hash = vsid ^ (vsid << 25) ^ ((va.addr & 0xffffffffffUL) >> shift);
 	}
 	return hash & 0x7fffffffffUL;
 }
diff --git a/arch/powerpc/include/asm/tlbflush.h b/arch/powerpc/include/asm/tlbflush.h
index 81143fc..2c8ad50 100644
--- a/arch/powerpc/include/asm/tlbflush.h
+++ b/arch/powerpc/include/asm/tlbflush.h
@@ -95,7 +95,7 @@ struct ppc64_tlb_batch {
 	unsigned long		index;
 	struct mm_struct	*mm;
 	real_pte_t		pte[PPC64_TLB_BATCH_NR];
-	unsigned long		vaddr[PPC64_TLB_BATCH_NR];
+	struct virt_addr	vaddr[PPC64_TLB_BATCH_NR];
 	unsigned int		psize;
 	int			ssize;
 };
@@ -127,7 +127,7 @@ static inline void arch_leave_lazy_mmu_mode(void)
 #define arch_flush_lazy_mmu_mode()      do {} while (0)
 
 
-extern void flush_hash_page(unsigned long va, real_pte_t pte, int psize,
+extern void flush_hash_page(struct virt_addr va, real_pte_t pte, int psize,
 			    int ssize, int local);
 extern void flush_hash_range(unsigned long number, int local);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 10fc8ec..933b117 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -80,8 +80,9 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid)
 
 int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
 {
+	struct virt_addr va;
 	pfn_t hpaddr;
-	ulong hash, hpteg, va;
+	ulong hash, hpteg;
 	u64 vsid;
 	int ret;
 	int rflags = 0x192;
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 90039bc..cab3892 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -39,62 +39,67 @@
 
 DEFINE_RAW_SPINLOCK(native_tlbie_lock);
 
-static inline void __tlbie(unsigned long va, int psize, int ssize)
+/* Verify docs says 14 .. 14+i bits */
+static inline void __tlbie(struct virt_addr va, int psize, int ssize)
 {
+	unsigned long vaddr = va.addr;
 	unsigned int penc;
 
+	vaddr &= ~(0xffffULL << 48);
+
 	/* clear top 16 bits, non SLS segment */
-	va &= ~(0xffffULL << 48);
+	vaddr &= ~(0xffffULL << 48);
 
 	switch (psize) {
 	case MMU_PAGE_4K:
-		va &= ~0xffful;
-		va |= ssize << 8;
+		vaddr &= ~0xffful;
+		vaddr |= ssize << 8;
 		asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%1,%0), %2)
-			     : : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
+			     : : "r" (vaddr), "r"(0), "i" (CPU_FTR_ARCH_206)
 			     : "memory");
 		break;
 	default:
 		penc = mmu_psize_defs[psize].penc;
-		va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
-		va |= penc << 12;
-		va |= ssize << 8;
-		va |= 1; /* L */
+		vaddr &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
+		vaddr |= penc << 12;
+		vaddr |= ssize << 8;
+		vaddr |= 1; /* L */
 		asm volatile(ASM_FTR_IFCLR("tlbie %0,1", PPC_TLBIE(%1,%0), %2)
-			     : : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
+			     : : "r" (vaddr), "r"(0), "i" (CPU_FTR_ARCH_206)
 			     : "memory");
 		break;
 	}
 }
 
-static inline void __tlbiel(unsigned long va, int psize, int ssize)
+/* Verify docs says 14 .. 14+i bits */
+static inline void __tlbiel(struct virt_addr va, int psize, int ssize)
 {
+	unsigned long vaddr = va.addr;
 	unsigned int penc;
 
-	/* clear top 16 bits, non SLS segment */
-	va &= ~(0xffffULL << 48);
+	vaddr &= ~(0xffffULL << 48);
 
 	switch (psize) {
 	case MMU_PAGE_4K:
-		va &= ~0xffful;
-		va |= ssize << 8;
+		vaddr &= ~0xffful;
+		vaddr |= ssize << 8;
 		asm volatile(".long 0x7c000224 | (%0 << 11) | (0 << 21)"
-			     : : "r"(va) : "memory");
+			     : : "r"(vaddr) : "memory");
 		break;
 	default:
 		penc = mmu_psize_defs[psize].penc;
-		va &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
-		va |= penc << 12;
-		va |= ssize << 8;
-		va |= 1; /* L */
+		vaddr &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
+		vaddr |= penc << 12;
+		vaddr |= ssize << 8;
+		vaddr |= 1; /* L */
 		asm volatile(".long 0x7c000224 | (%0 << 11) | (1 << 21)"
-			     : : "r"(va) : "memory");
+			     : : "r"(vaddr) : "memory");
 		break;
 	}
 
 }
 
-static inline void tlbie(unsigned long va, int psize, int ssize, int local)
+static inline void tlbie(struct virt_addr va, int psize, int ssize, int local)
 {
 	unsigned int use_local = local && mmu_has_feature(MMU_FTR_TLBIEL);
 	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
@@ -134,7 +139,7 @@ static inline void native_unlock_hpte(struct hash_pte *hptep)
 	clear_bit_unlock(HPTE_LOCK_BIT, word);
 }
 
-static long native_hpte_insert(unsigned long hpte_group, unsigned long va,
+static long native_hpte_insert(unsigned long hpte_group, struct virt_addr va,
 			unsigned long pa, unsigned long rflags,
 			unsigned long vflags, int psize, int ssize)
 {
@@ -225,7 +230,7 @@ static long native_hpte_remove(unsigned long hpte_group)
 }
 
 static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
-				 unsigned long va, int psize, int ssize,
+				 struct virt_addr va, int psize, int ssize,
 				 int local)
 {
 	struct hash_pte *hptep = htab_address + slot;
@@ -259,7 +264,7 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
 	return ret;
 }
 
-static long native_hpte_find(unsigned long va, int psize, int ssize)
+static long native_hpte_find(struct virt_addr va, int psize, int ssize)
 {
 	struct hash_pte *hptep;
 	unsigned long hash;
@@ -295,7 +300,8 @@ static long native_hpte_find(unsigned long va, int psize, int ssize)
 static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
 				       int psize, int ssize)
 {
-	unsigned long vsid, va;
+	struct virt_addr va;
+	unsigned long vsid;
 	long slot;
 	struct hash_pte *hptep;
 
@@ -315,7 +321,7 @@ static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
 	tlbie(va, psize, ssize, 0);
 }
 
-static void native_hpte_invalidate(unsigned long slot, unsigned long va,
+static void native_hpte_invalidate(unsigned long slot, struct virt_addr va,
 				   int psize, int ssize, int local)
 {
 	struct hash_pte *hptep = htab_address + slot;
@@ -349,7 +355,7 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long va,
 #define LP_MASK(i)	((0xFF >> (i)) << LP_SHIFT)
 
 static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
-			int *psize, int *ssize, unsigned long *va)
+			int *psize, int *ssize, struct virt_addr *va)
 {
 	unsigned long hpte_r = hpte->r;
 	unsigned long hpte_v = hpte->v;
@@ -403,7 +409,7 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
 		avpn |= (vpi << mmu_psize_defs[size].shift);
 	}
 
-	*va = avpn;
+	va->addr = avpn;
 	*psize = size;
 	*ssize = hpte_v >> HPTE_V_SSIZE_SHIFT;
 }
@@ -418,9 +424,10 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
  */
 static void native_hpte_clear(void)
 {
+	struct virt_addr va;
 	unsigned long slot, slots, flags;
 	struct hash_pte *hptep = htab_address;
-	unsigned long hpte_v, va;
+	unsigned long hpte_v;
 	unsigned long pteg_count;
 	int psize, ssize;
 
@@ -465,7 +472,8 @@ static void native_hpte_clear(void)
  */
 static void native_flush_hash_range(unsigned long number, int local)
 {
-	unsigned long va, hash, index, hidx, shift, slot;
+	struct virt_addr va;
+	unsigned long hash, index, hidx, shift, slot;
 	struct hash_pte *hptep;
 	unsigned long hpte_v;
 	unsigned long want_v;
@@ -482,7 +490,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 		va = batch->vaddr[i];
 		pte = batch->pte[i];
 
-		pte_iterate_hashed_subpages(pte, psize, va, index, shift) {
+		pte_iterate_hashed_subpages(pte, psize, va.addr, index, shift) {
 			hash = hpt_hash(va, shift, ssize);
 			hidx = __rpte_to_hidx(pte, index);
 			if (hidx & _PTEIDX_SECONDARY)
@@ -508,7 +516,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 			va = batch->vaddr[i];
 			pte = batch->pte[i];
 
-			pte_iterate_hashed_subpages(pte, psize, va, index,
+			pte_iterate_hashed_subpages(pte, psize, va.addr, index,
 						    shift) {
 				__tlbiel(va, psize, ssize);
 			} pte_iterate_hashed_end();
@@ -525,7 +533,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 			va = batch->vaddr[i];
 			pte = batch->pte[i];
 
-			pte_iterate_hashed_subpages(pte, psize, va, index,
+			pte_iterate_hashed_subpages(pte, psize, va.addr, index,
 						    shift) {
 				__tlbie(va, psize, ssize);
 			} pte_iterate_hashed_end();
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 377e5cb..2429d53 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -192,7 +192,7 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
 	     vaddr += step, paddr += step) {
 		unsigned long hash, hpteg;
 		unsigned long vsid = get_kernel_vsid(vaddr, ssize);
-		unsigned long va = hpt_va(vaddr, vsid, ssize);
+		struct virt_addr va = hpt_va(vaddr, vsid, ssize);
 		unsigned long tprot = prot;
 
 		/* Make kernel text executable */
@@ -1153,13 +1153,13 @@ void hash_preload(struct mm_struct *mm, unsigned long ea,
 /* WARNING: This is called from hash_low_64.S, if you change this prototype,
  *          do not forget to update the assembly call site !
  */
-void flush_hash_page(unsigned long va, real_pte_t pte, int psize, int ssize,
+void flush_hash_page(struct virt_addr va, real_pte_t pte, int psize, int ssize,
 		     int local)
 {
 	unsigned long hash, index, shift, hidx, slot;
 
-	DBG_LOW("flush_hash_page(va=%016lx)\n", va);
-	pte_iterate_hashed_subpages(pte, psize, va, index, shift) {
+	DBG_LOW("flush_hash_page(va=%016lx)\n", va.addr);
+	pte_iterate_hashed_subpages(pte, psize, va.addr, index, shift) {
 		hash = hpt_hash(va, shift, ssize);
 		hidx = __rpte_to_hidx(pte, index);
 		if (hidx & _PTEIDX_SECONDARY)
@@ -1208,7 +1208,7 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
 {
 	unsigned long hash, hpteg;
 	unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
-	unsigned long va = hpt_va(vaddr, vsid, mmu_kernel_ssize);
+	struct virt_addr va = hpt_va(vaddr, vsid, mmu_kernel_ssize);
 	unsigned long mode = htab_convert_pte_flags(PAGE_KERNEL);
 	int ret;
 
@@ -1229,7 +1229,7 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
 {
 	unsigned long hash, hidx, slot;
 	unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
-	unsigned long va = hpt_va(vaddr, vsid, mmu_kernel_ssize);
+	struct virt_addr va = hpt_va(vaddr, vsid, mmu_kernel_ssize);
 
 	hash = hpt_hash(va, PAGE_SHIFT, mmu_kernel_ssize);
 	spin_lock(&linear_map_hash_lock);
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
index cc5c273..47b4ed0 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -18,8 +18,9 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 		     pte_t *ptep, unsigned long trap, int local, int ssize,
 		     unsigned int shift, unsigned int mmu_psize)
 {
+	struct virt_addr va;
 	unsigned long old_pte, new_pte;
-	unsigned long va, rflags, pa, sz;
+	unsigned long rflags, pa, sz;
 	long slot;
 
 	BUG_ON(shift != mmu_psize_defs[mmu_psize].shift);
diff --git a/arch/powerpc/mm/tlb_hash64.c b/arch/powerpc/mm/tlb_hash64.c
index 31f1820..b830466 100644
--- a/arch/powerpc/mm/tlb_hash64.c
+++ b/arch/powerpc/mm/tlb_hash64.c
@@ -42,8 +42,9 @@ DEFINE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);
 void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
 		     pte_t *ptep, unsigned long pte, int huge)
 {
+	struct virt_addr vaddr;
 	struct ppc64_tlb_batch *batch = &get_cpu_var(ppc64_tlb_batch);
-	unsigned long vsid, vaddr;
+	unsigned long vsid;
 	unsigned int psize;
 	int ssize;
 	real_pte_t rpte;
diff --git a/arch/powerpc/platforms/cell/beat_htab.c b/arch/powerpc/platforms/cell/beat_htab.c
index b83077e..a2dfc22 100644
--- a/arch/powerpc/platforms/cell/beat_htab.c
+++ b/arch/powerpc/platforms/cell/beat_htab.c
@@ -88,7 +88,7 @@ static inline unsigned int beat_read_mask(unsigned hpte_group)
 }
 
 static long beat_lpar_hpte_insert(unsigned long hpte_group,
-				  unsigned long va, unsigned long pa,
+				  struct virt_addr va, unsigned long pa,
 				  unsigned long rflags, unsigned long vflags,
 				  int psize, int ssize)
 {
@@ -184,7 +184,7 @@ static void beat_lpar_hptab_clear(void)
  */
 static long beat_lpar_hpte_updatepp(unsigned long slot,
 				    unsigned long newpp,
-				    unsigned long va,
+				    struct virt_addr va,
 				    int psize, int ssize, int local)
 {
 	unsigned long lpar_rc;
@@ -220,7 +220,7 @@ static long beat_lpar_hpte_updatepp(unsigned long slot,
 	return 0;
 }
 
-static long beat_lpar_hpte_find(unsigned long va, int psize)
+static long beat_lpar_hpte_find(struct virt_addr va, int psize)
 {
 	unsigned long hash;
 	unsigned long i, j;
@@ -255,7 +255,8 @@ static void beat_lpar_hpte_updateboltedpp(unsigned long newpp,
 					  unsigned long ea,
 					  int psize, int ssize)
 {
-	unsigned long lpar_rc, slot, vsid, va;
+	struct virt_addr va;
+	unsigned long lpar_rc, slot, vsid;
 	u64 dummy0, dummy1;
 
 	vsid = get_kernel_vsid(ea, MMU_SEGSIZE_256M);
@@ -272,7 +273,7 @@ static void beat_lpar_hpte_updateboltedpp(unsigned long newpp,
 	BUG_ON(lpar_rc != 0);
 }
 
-static void beat_lpar_hpte_invalidate(unsigned long slot, unsigned long va,
+static void beat_lpar_hpte_invalidate(unsigned long slot, struct virt_addr va,
 					 int psize, int ssize, int local)
 {
 	unsigned long want_v;
@@ -311,7 +312,7 @@ void __init hpte_init_beat(void)
 }
 
 static long beat_lpar_hpte_insert_v3(unsigned long hpte_group,
-				  unsigned long va, unsigned long pa,
+				  struct virt_addr va, unsigned long pa,
 				  unsigned long rflags, unsigned long vflags,
 				  int psize, int ssize)
 {
@@ -364,7 +365,7 @@ static long beat_lpar_hpte_insert_v3(unsigned long hpte_group,
  */
 static long beat_lpar_hpte_updatepp_v3(unsigned long slot,
 				    unsigned long newpp,
-				    unsigned long va,
+				    struct virt_addr va,
 				    int psize, int ssize, int local)
 {
 	unsigned long lpar_rc;
@@ -392,7 +393,7 @@ static long beat_lpar_hpte_updatepp_v3(unsigned long slot,
 	return 0;
 }
 
-static void beat_lpar_hpte_invalidate_v3(unsigned long slot, unsigned long va,
+static void beat_lpar_hpte_invalidate_v3(unsigned long slot, struct virt_addr va,
 					 int psize, int ssize, int local)
 {
 	unsigned long want_v;
diff --git a/arch/powerpc/platforms/ps3/htab.c b/arch/powerpc/platforms/ps3/htab.c
index 3124cf7..6e27576 100644
--- a/arch/powerpc/platforms/ps3/htab.c
+++ b/arch/powerpc/platforms/ps3/htab.c
@@ -43,7 +43,7 @@ enum ps3_lpar_vas_id {
 
 static DEFINE_SPINLOCK(ps3_htab_lock);
 
-static long ps3_hpte_insert(unsigned long hpte_group, unsigned long va,
+static long ps3_hpte_insert(unsigned long hpte_group, struct virt_addr va,
 	unsigned long pa, unsigned long rflags, unsigned long vflags,
 	int psize, int ssize)
 {
@@ -107,7 +107,7 @@ static long ps3_hpte_remove(unsigned long hpte_group)
 }
 
 static long ps3_hpte_updatepp(unsigned long slot, unsigned long newpp,
-	unsigned long va, int psize, int ssize, int local)
+	struct virt_addr va, int psize, int ssize, int local)
 {
 	int result;
 	u64 hpte_v, want_v, hpte_rs;
@@ -159,7 +159,7 @@ static void ps3_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
 	panic("ps3_hpte_updateboltedpp() not implemented");
 }
 
-static void ps3_hpte_invalidate(unsigned long slot, unsigned long va,
+static void ps3_hpte_invalidate(unsigned long slot, struct virt_addr va,
 	int psize, int ssize, int local)
 {
 	unsigned long flags;
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 5f3ef87..b4e9641 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -108,9 +108,9 @@ void vpa_init(int cpu)
 }
 
 static long pSeries_lpar_hpte_insert(unsigned long hpte_group,
- 			      unsigned long va, unsigned long pa,
- 			      unsigned long rflags, unsigned long vflags,
-			      int psize, int ssize)
+				     struct virt_addr va, unsigned long pa,
+				     unsigned long rflags, unsigned long vflags,
+				     int psize, int ssize)
 {
 	unsigned long lpar_rc;
 	unsigned long flags;
@@ -120,7 +120,7 @@ static long pSeries_lpar_hpte_insert(unsigned long hpte_group,
 	if (!(vflags & HPTE_V_BOLTED))
 		pr_devel("hpte_insert(group=%lx, va=%016lx, pa=%016lx, "
 			 "rflags=%lx, vflags=%lx, psize=%d)\n",
-			 hpte_group, va, pa, rflags, vflags, psize);
+			 hpte_group, va.addr, pa, rflags, vflags, psize);
 
 	hpte_v = hpte_encode_v(va, psize, ssize) | vflags | HPTE_V_VALID;
 	hpte_r = hpte_encode_r(pa, psize) | rflags;
@@ -231,12 +231,12 @@ static void pSeries_lpar_hptab_clear(void)
  * for use when we want to match an existing PTE.  The bottom 7 bits
  * of the returned value are zero.
  */
-static inline unsigned long hpte_encode_avpn(unsigned long va, int psize,
+static inline unsigned long hpte_encode_avpn(struct virt_addr va, int psize,
 					     int ssize)
 {
 	unsigned long v;
 
-	v = (va >> 23) & ~(mmu_psize_defs[psize].avpnm);
+	v = (va.addr >> 23) & ~(mmu_psize_defs[psize].avpnm);
 	v <<= HPTE_V_AVPN_SHIFT;
 	v |= ((unsigned long) ssize) << HPTE_V_SSIZE_SHIFT;
 	return v;
@@ -250,7 +250,7 @@ static inline unsigned long hpte_encode_avpn(unsigned long va, int psize,
  */
 static long pSeries_lpar_hpte_updatepp(unsigned long slot,
 				       unsigned long newpp,
-				       unsigned long va,
+				       struct virt_addr va,
 				       int psize, int ssize, int local)
 {
 	unsigned long lpar_rc;
@@ -295,7 +295,7 @@ static unsigned long pSeries_lpar_hpte_getword0(unsigned long slot)
 	return dword0;
 }
 
-static long pSeries_lpar_hpte_find(unsigned long va, int psize, int ssize)
+static long pSeries_lpar_hpte_find(struct virt_addr va, int psize, int ssize)
 {
 	unsigned long hash;
 	unsigned long i;
@@ -323,7 +323,8 @@ static void pSeries_lpar_hpte_updateboltedpp(unsigned long newpp,
 					     unsigned long ea,
 					     int psize, int ssize)
 {
-	unsigned long lpar_rc, slot, vsid, va, flags;
+	struct virt_addr va;
+	unsigned long lpar_rc, slot, vsid, flags;
 
 	vsid = get_kernel_vsid(ea, ssize);
 	va = hpt_va(ea, vsid, ssize);
@@ -337,7 +338,7 @@ static void pSeries_lpar_hpte_updateboltedpp(unsigned long newpp,
 	BUG_ON(lpar_rc != H_SUCCESS);
 }
 
-static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long va,
+static void pSeries_lpar_hpte_invalidate(unsigned long slot, struct virt_addr va,
 					 int psize, int ssize, int local)
 {
 	unsigned long want_v;
@@ -345,7 +346,7 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long va,
 	unsigned long dummy1, dummy2;
 
 	pr_devel("    inval : slot=%lx, va=%016lx, psize: %d, local: %d\n",
-		 slot, va, psize, local);
+		 slot, va.addr, psize, local);
 
 	want_v = hpte_encode_avpn(va, psize, ssize);
 	lpar_rc = plpar_pte_remove(H_AVPN, slot, want_v, &dummy1, &dummy2);
@@ -358,7 +359,8 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long va,
 static void pSeries_lpar_hpte_removebolted(unsigned long ea,
 					   int psize, int ssize)
 {
-	unsigned long slot, vsid, va;
+	struct virt_addr va;
+	unsigned long slot, vsid;
 
 	vsid = get_kernel_vsid(ea, ssize);
 	va = hpt_va(ea, vsid, ssize);
@@ -382,12 +384,12 @@ static void pSeries_lpar_hpte_removebolted(unsigned long ea,
  */
 static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 {
+	struct virt_addr va;
 	unsigned long i, pix, rc;
 	unsigned long flags = 0;
 	struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch);
 	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
 	unsigned long param[9];
-	unsigned long va;
 	unsigned long hash, index, shift, hidx, slot;
 	real_pte_t pte;
 	int psize, ssize;
@@ -401,7 +403,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	for (i = 0; i < number; i++) {
 		va = batch->vaddr[i];
 		pte = batch->pte[i];
-		pte_iterate_hashed_subpages(pte, psize, va, index, shift) {
+		pte_iterate_hashed_subpages(pte, psize, va.addr, index, shift) {
 			hash = hpt_hash(va, shift, ssize);
 			hidx = __rpte_to_hidx(pte, index);
 			if (hidx & _PTEIDX_SECONDARY)
-- 
1.7.10

^ permalink raw reply related

* [Early RFC 3/6] arch/powerpc: Simplify hpte_decode
From: Aneesh Kumar K.V @ 2012-05-31 19:50 UTC (permalink / raw)
  To: benh, paulus, michael, anton; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1338493841-9926-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patch simplify hpte_decode for easy switching of virtual address to
vsid and segment offset combination in the later patch

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hash_native_64.c |   51 ++++++++++++++++++++++----------------
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index cab3892..76c2574 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -357,9 +357,10 @@ static void native_hpte_invalidate(unsigned long slot, struct virt_addr va,
 static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
 			int *psize, int *ssize, struct virt_addr *va)
 {
+	unsigned long avpn, pteg, vpi;
 	unsigned long hpte_r = hpte->r;
 	unsigned long hpte_v = hpte->v;
-	unsigned long avpn;
+	unsigned long vsid, seg_off;
 	int i, size, shift, penc;
 
 	if (!(hpte_v & HPTE_V_LARGE))
@@ -386,32 +387,40 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
 	}
 
 	/* This works for all page sizes, and for 256M and 1T segments */
+	*ssize = hpte_v >> HPTE_V_SSIZE_SHIFT;
 	shift = mmu_psize_defs[size].shift;
-	avpn = (HPTE_V_AVPN_VAL(hpte_v) & ~mmu_psize_defs[size].avpnm) << 23;
-
-	if (shift < 23) {
-		unsigned long vpi, vsid, pteg;
 
-		pteg = slot / HPTES_PER_GROUP;
-		if (hpte_v & HPTE_V_SECONDARY)
-			pteg = ~pteg;
-		switch (hpte_v >> HPTE_V_SSIZE_SHIFT) {
-		case MMU_SEGSIZE_256M:
-			vpi = ((avpn >> 28) ^ pteg) & htab_hash_mask;
-			break;
-		case MMU_SEGSIZE_1T:
-			vsid = avpn >> 40;
+	avpn = (HPTE_V_AVPN_VAL(hpte_v) & ~mmu_psize_defs[size].avpnm);
+	pteg = slot / HPTES_PER_GROUP;
+	if (hpte_v & HPTE_V_SECONDARY)
+		pteg = ~pteg;
+
+	switch (*ssize) {
+	case MMU_SEGSIZE_256M:
+		/* We only have 28 - 23 bits of seg_off in avpn */
+		seg_off = (avpn & 0x1f) << 23;
+		vsid    =  avpn >> 5;
+		/* We can find more bits from the pteg value */
+		if (shift < 23) {
+			vpi = (vsid ^ pteg) & htab_hash_mask;
+			seg_off |= vpi << shift;
+		}
+		va->addr = vsid << 28 | seg_off;
+	case MMU_SEGSIZE_1T:
+		/* We only have 40 - 23 bits of seg_off in avpn */
+		seg_off = (avpn & 0x1ffff) << 23;
+		vsid    = avpn >> 17;
+		if (shift < 23) {
 			vpi = (vsid ^ (vsid << 25) ^ pteg) & htab_hash_mask;
-			break;
-		default:
-			avpn = vpi = size = 0;
+			seg_off |= vpi << shift;
 		}
-		avpn |= (vpi << mmu_psize_defs[size].shift);
+		va->addr = vsid << 40 | seg_off;
+	default:
+		seg_off = 0;
+		vsid    = 0;
+		va->addr = 0;
 	}
-
-	va->addr = avpn;
 	*psize = size;
-	*ssize = hpte_v >> HPTE_V_SSIZE_SHIFT;
 }
 
 /*
-- 
1.7.10

^ permalink raw reply related

* [Early RFC 4/6] arch/powerpc: Use vsid and segment offset to represent virtual address
From: Aneesh Kumar K.V @ 2012-05-31 19:50 UTC (permalink / raw)
  To: benh, paulus, michael, anton; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1338493841-9926-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patch enables us to have 78 bit virtual address.

With 1TB segments we use 40 bits of virtual adress as segment offset and
the remaining 24 bits (of the current 64 bit virtual address) are used
to index the virtual segment. Out of the 24 bits we currently use 19 bits
for user context and that leave us with only 4 bits for effective segment
ID. In-order to support more than 16TB of memory we would require more than
4 ESID bits. This patch splits the virtual address to two unsigned long
components, vsid and segment offset thereby allowing us to support 78 bit
virtual address.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mmu-hash64.h |   62 ++++++++---
 arch/powerpc/mm/hash_low_64.S         |  191 ++++++++++++++++++---------------
 arch/powerpc/mm/hash_native_64.c      |   34 +++---
 arch/powerpc/mm/hash_utils_64.c       |    6 +-
 arch/powerpc/platforms/ps3/htab.c     |   13 +--
 arch/powerpc/platforms/pseries/lpar.c |   29 ++---
 6 files changed, 191 insertions(+), 144 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 5ff936b..e563bd2 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -143,8 +143,10 @@ struct mmu_psize_def
 	unsigned long	sllp;	/* SLB L||LP (exact mask to use in slbmte) */
 };
 
+/* 78 bit power virtual address */
 struct virt_addr {
-	unsigned long addr;
+	unsigned long vsid;
+	unsigned long seg_off;
 };
 
 #endif /* __ASSEMBLY__ */
@@ -161,6 +163,13 @@ struct virt_addr {
 
 #ifndef __ASSEMBLY__
 
+static inline int segment_shift(int ssize)
+{
+	if (ssize == MMU_SEGSIZE_256M)
+		return SID_SHIFT;
+	return SID_SHIFT_1T;
+}
+
 /*
  * The current system page and segment sizes
  */
@@ -184,6 +193,32 @@ extern unsigned long tce_alloc_start, tce_alloc_end;
 extern int mmu_ci_restrictions;
 
 /*
+ * This computes the AVPN and B fields of the first dword of a HPTE,
+ * for use when we want to match an existing PTE.  The bottom 7 bits
+ * of the returned value are zero.
+ */
+static inline unsigned long hpte_encode_avpn(struct virt_addr va, int psize,
+					     int ssize)
+{
+	unsigned long v;
+
+	/*
+	 * The AVA field omits the low-order 23 bits of the 78 bits VA.
+	 * These bits are not needed in the PTE, because the
+	 * low-order b of these bits are part of the byte offset
+	 * into the virtual page and, if b < 23, the high-order
+	 * 23-b of these bits are always used in selecting the
+	 * PTEGs to be searched
+	 */
+	v = va.seg_off >> 23;
+	v |= va.vsid << (segment_shift(ssize) - 23);
+	v &= ~(mmu_psize_defs[psize].avpnm);
+	v <<= HPTE_V_AVPN_SHIFT;
+	v |= ((unsigned long) ssize) << HPTE_V_SSIZE_SHIFT;
+	return v;
+}
+
+/*
  * This function sets the AVPN and L fields of the HPTE  appropriately
  * for the page size
  */
@@ -191,11 +226,9 @@ static inline unsigned long hpte_encode_v(struct virt_addr va, int psize,
 					  int ssize)
 {
 	unsigned long v;
-	v = (va.addr >> 23) & ~(mmu_psize_defs[psize].avpnm);
-	v <<= HPTE_V_AVPN_SHIFT;
+	v = hpte_encode_avpn(va, psize, ssize);
 	if (psize != MMU_PAGE_4K)
 		v |= HPTE_V_LARGE;
-	v |= ((unsigned long) ssize) << HPTE_V_SSIZE_SHIFT;
 	return v;
 }
 
@@ -222,30 +255,31 @@ static inline unsigned long hpte_encode_r(unsigned long pa, int psize)
 /*
  * Build a VA given VSID, EA and segment size
  */
-static inline struct virt_addr hpt_va(unsigned long ea, unsigned long vsid,
-				   int ssize)
+static inline struct virt_addr hpt_va(unsigned long ea, unsigned long vsid, int ssize)
 {
 	struct virt_addr va;
+
+	va.vsid    = vsid;
 	if (ssize == MMU_SEGSIZE_256M)
-		va.addr = (vsid << 28) | (ea & 0xfffffffUL);
-	va.addr = (vsid << 40) | (ea & 0xffffffffffUL);
+		va.seg_off = ea & 0xfffffffUL;
+	else
+		va.seg_off = ea & 0xffffffffffUL;
 	return va;
 }
 
 /*
  * This hashes a virtual address
  */
-
-static inline unsigned long hpt_hash(struct virt_addr va, unsigned int shift,
-				     int ssize)
+/* Verify */
+static inline unsigned long hpt_hash(struct virt_addr va, unsigned int shift, int ssize)
 {
 	unsigned long hash, vsid;
 
 	if (ssize == MMU_SEGSIZE_256M) {
-		hash = (va.addr >> 28) ^ ((va.addr & 0x0fffffffUL) >> shift);
+		hash = va.vsid ^ (va.seg_off >> shift);
 	} else {
-		vsid = va.addr >> 40;
-		hash = vsid ^ (vsid << 25) ^ ((va.addr & 0xffffffffffUL) >> shift);
+		vsid = va.vsid;
+		hash = vsid ^ (vsid << 25) ^ (va.seg_off >> shift);
 	}
 	return hash & 0x7fffffffffUL;
 }
diff --git a/arch/powerpc/mm/hash_low_64.S b/arch/powerpc/mm/hash_low_64.S
index a242b5d..cf66a0a 100644
--- a/arch/powerpc/mm/hash_low_64.S
+++ b/arch/powerpc/mm/hash_low_64.S
@@ -71,10 +71,12 @@ _GLOBAL(__hash_page_4K)
 	/* Save non-volatile registers.
 	 * r31 will hold "old PTE"
 	 * r30 is "new PTE"
-	 * r29 is "va"
+	 * r29 is vsid
 	 * r28 is a hash value
 	 * r27 is hashtab mask (maybe dynamic patched instead ?)
+	 * r26 is seg_off
 	 */
+	std	r26,STK_REG(r26)(r1)
 	std	r27,STK_REG(r27)(r1)
 	std	r28,STK_REG(r28)(r1)
 	std	r29,STK_REG(r29)(r1)
@@ -119,10 +121,9 @@ BEGIN_FTR_SECTION
 	cmpdi	r9,0			/* check segment size */
 	bne	3f
 END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
-	/* Calc va and put it in r29 */
-	rldicr	r29,r5,28,63-28
-	rldicl	r3,r3,0,36
-	or	r29,r3,r29
+	/* r29 is virtual address and r26 is seg_off */
+	mr	r29,r5			/* vsid */
+	rldicl	r26,r3,0,36		/* ea & 0x000000000fffffffUL */
 
 	/* Calculate hash value for primary slot and store it in r28 */
 	rldicl	r5,r5,0,25		/* vsid & 0x0000007fffffffff */
@@ -130,14 +131,17 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 	xor	r28,r5,r0
 	b	4f
 
-3:	/* Calc VA and hash in r29 and r28 for 1T segment */
-	sldi	r29,r5,40		/* vsid << 40 */
-	clrldi	r3,r3,24		/* ea & 0xffffffffff */
+3:	/* r29 is virtual address and r26 is seg_off */
+	mr	r29,r5			/* vsid */
+	rldicl	r26,r3,0,24		/* ea & 0xffffffffff */
+	/*
+	 * calculate hash value for primary slot and
+	 * store it in r28 for 1T segment
+	 */
 	rldic	r28,r5,25,25		/* (vsid << 25) & 0x7fffffffff */
 	clrldi	r5,r5,40		/* vsid & 0xffffff */
 	rldicl	r0,r3,64-12,36		/* (ea >> 12) & 0xfffffff */
 	xor	r28,r28,r5
-	or	r29,r3,r29		/* VA */
 	xor	r28,r28,r0		/* hash */
 
 	/* Convert linux PTE bits into HW equivalents */
@@ -183,20 +187,21 @@ htab_insert_pte:
 	andc	r30,r30,r0
 	ori	r30,r30,_PAGE_HASHPTE
 
-	/* physical address r5 */
-	rldicl	r5,r31,64-PTE_RPN_SHIFT,PTE_RPN_SHIFT
-	sldi	r5,r5,PAGE_SHIFT
+	/* physical address r6 */
+	rldicl	r6,r31,64-PTE_RPN_SHIFT,PTE_RPN_SHIFT
+	sldi	r6,r6,PAGE_SHIFT
 
 	/* Calculate primary group hash */
 	and	r0,r28,r27
 	rldicr	r3,r0,3,63-3		/* r3 = (hash & mask) << 3 */
 
 	/* Call ppc_md.hpte_insert */
-	ld	r6,STK_PARM(r4)(r1)	/* Retrieve new pp bits */
-	mr	r4,r29			/* Retrieve va */
-	li	r7,0			/* !bolted, !secondary */
-	li	r8,MMU_PAGE_4K		/* page size */
-	ld	r9,STK_PARM(r9)(r1)	/* segment size */
+	ld	r7,STK_PARM(r4)(r1)	/* Retrieve new pp bits */
+	mr	r4,r29			/* Retrieve vsid */
+	mr	r5,r26			/* seg_off */
+	li	r8,0			/* !bolted, !secondary */
+	li	r9,MMU_PAGE_4K		/* page size */
+	ld	r10,STK_PARM(r9)(r1)	/* segment size */
 _GLOBAL(htab_call_hpte_insert1)
 	bl	.			/* Patched by htab_finish_init() */
 	cmpdi	0,r3,0
@@ -206,20 +211,21 @@ _GLOBAL(htab_call_hpte_insert1)
 
 	/* Now try secondary slot */
 	
-	/* physical address r5 */
-	rldicl	r5,r31,64-PTE_RPN_SHIFT,PTE_RPN_SHIFT
-	sldi	r5,r5,PAGE_SHIFT
+	/* physical address r6 */
+	rldicl	r6,r31,64-PTE_RPN_SHIFT,PTE_RPN_SHIFT
+	sldi	r6,r6,PAGE_SHIFT
 
 	/* Calculate secondary group hash */
 	andc	r0,r27,r28
 	rldicr	r3,r0,3,63-3	/* r0 = (~hash & mask) << 3 */
 	
 	/* Call ppc_md.hpte_insert */
-	ld	r6,STK_PARM(r4)(r1)	/* Retrieve new pp bits */
-	mr	r4,r29			/* Retrieve va */
-	li	r7,HPTE_V_SECONDARY	/* !bolted, secondary */
-	li	r8,MMU_PAGE_4K		/* page size */
-	ld	r9,STK_PARM(r9)(r1)	/* segment size */
+	ld	r7,STK_PARM(r4)(r1)	/* Retrieve new pp bits */
+	mr	r4,r29			/* Retrieve vsid */
+	mr	r5,r26			/* seg_off */
+	li	r8,HPTE_V_SECONDARY	/* !bolted, secondary */
+	li	r9,MMU_PAGE_4K		/* page size */
+	ld	r10,STK_PARM(r9)(r1)	/* segment size */
 _GLOBAL(htab_call_hpte_insert2)
 	bl	.			/* Patched by htab_finish_init() */
 	cmpdi	0,r3,0
@@ -286,13 +292,13 @@ htab_modify_pte:
 	add	r3,r0,r3	/* add slot idx */
 
 	/* Call ppc_md.hpte_updatepp */
-	mr	r5,r29			/* va */
-	li	r6,MMU_PAGE_4K		/* page size */
-	ld	r7,STK_PARM(r9)(r1)	/* segment size */
-	ld	r8,STK_PARM(r8)(r1)	/* get "local" param */
+	mr	r5,r29			/* vsid */
+	mr	r6,r26			/* seg off */
+	li	r7,MMU_PAGE_4K		/* page size */
+	ld	r8,STK_PARM(r9)(r1)	/* segment size */
+	ld	r9,STK_PARM(r8)(r1)	/* get "local" param */
 _GLOBAL(htab_call_hpte_updatepp)
 	bl	.			/* Patched by htab_finish_init() */
-
 	/* if we failed because typically the HPTE wasn't really here
 	 * we try an insertion. 
 	 */
@@ -347,12 +353,14 @@ _GLOBAL(__hash_page_4K)
 	/* Save non-volatile registers.
 	 * r31 will hold "old PTE"
 	 * r30 is "new PTE"
-	 * r29 is "va"
+	 * r29 is vsid
 	 * r28 is a hash value
 	 * r27 is hashtab mask (maybe dynamic patched instead ?)
 	 * r26 is the hidx mask
 	 * r25 is the index in combo page
+	 * r24 is seg_off
 	 */
+	std	r24,STK_REG(r24)(r1)
 	std	r25,STK_REG(r25)(r1)
 	std	r26,STK_REG(r26)(r1)
 	std	r27,STK_REG(r27)(r1)
@@ -402,10 +410,9 @@ BEGIN_FTR_SECTION
 	cmpdi	r9,0			/* check segment size */
 	bne	3f
 END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
-	/* Calc va and put it in r29 */
-	rldicr	r29,r5,28,63-28		/* r29 = (vsid << 28) */
-	rldicl	r3,r3,0,36		/* r3 = (ea & 0x0fffffff) */
-	or	r29,r3,r29		/* r29 = va */
+	/* r29 is virtual address and r24 is seg_off */
+	mr	r29,r5			/* vsid */
+	rldicl	r24,r3,0,36		/* ea & 0x000000000fffffffUL */
 
 	/* Calculate hash value for primary slot and store it in r28 */
 	rldicl	r5,r5,0,25		/* vsid & 0x0000007fffffffff */
@@ -413,14 +420,17 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 	xor	r28,r5,r0
 	b	4f
 
-3:	/* Calc VA and hash in r29 and r28 for 1T segment */
-	sldi	r29,r5,40		/* vsid << 40 */
-	clrldi	r3,r3,24		/* ea & 0xffffffffff */
+3:	/* r29 is virtual address and r24 is seg_off */
+	mr	r29,r5			/* vsid */
+	rldicl	r24,r3,0,24		/* ea & 0xffffffffff */
+	/*
+	 * Calculate hash value for primary slot and
+	 * store it in r28  for 1T segment
+	 */
 	rldic	r28,r5,25,25		/* (vsid << 25) & 0x7fffffffff */
 	clrldi	r5,r5,40		/* vsid & 0xffffff */
 	rldicl	r0,r3,64-12,36		/* (ea >> 12) & 0xfffffff */
 	xor	r28,r28,r5
-	or	r29,r3,r29		/* VA */
 	xor	r28,r28,r0		/* hash */
 
 	/* Convert linux PTE bits into HW equivalents */
@@ -481,25 +491,26 @@ END_FTR_SECTION(CPU_FTR_NOEXECUTE|CPU_FTR_COHERENT_ICACHE, CPU_FTR_NOEXECUTE)
 	bne	htab_modify_pte
 
 htab_insert_pte:
-	/* real page number in r5, PTE RPN value + index */
+	/* real page number in r6, PTE RPN value + index */
 	andis.	r0,r31,_PAGE_4K_PFN@h
 	srdi	r5,r31,PTE_RPN_SHIFT
 	bne-	htab_special_pfn
 	sldi	r5,r5,PAGE_SHIFT-HW_PAGE_SHIFT
 	add	r5,r5,r25
 htab_special_pfn:
-	sldi	r5,r5,HW_PAGE_SHIFT
+	sldi	r6,r5,HW_PAGE_SHIFT
 
 	/* Calculate primary group hash */
 	and	r0,r28,r27
 	rldicr	r3,r0,3,63-3		/* r0 = (hash & mask) << 3 */
 
 	/* Call ppc_md.hpte_insert */
-	ld	r6,STK_PARM(r4)(r1)	/* Retrieve new pp bits */
-	mr	r4,r29			/* Retrieve va */
-	li	r7,0			/* !bolted, !secondary */
-	li	r8,MMU_PAGE_4K		/* page size */
-	ld	r9,STK_PARM(r9)(r1)	/* segment size */
+	ld	r7,STK_PARM(r4)(r1)	/* Retrieve new pp bits */
+	mr	r4,r29			/* Retrieve vsid */
+	mr      r5,r24			/* seg off */
+	li	r8,0			/* !bolted, !secondary */
+	li	r9,MMU_PAGE_4K		/* page size */
+	ld	r10,STK_PARM(r9)(r1)	/* segment size */
 _GLOBAL(htab_call_hpte_insert1)
 	bl	.			/* patched by htab_finish_init() */
 	cmpdi	0,r3,0
@@ -515,18 +526,19 @@ _GLOBAL(htab_call_hpte_insert1)
 	bne-	3f
 	sldi	r5,r5,PAGE_SHIFT-HW_PAGE_SHIFT
 	add	r5,r5,r25
-3:	sldi	r5,r5,HW_PAGE_SHIFT
+3:	sldi	r6,r5,HW_PAGE_SHIFT
 
 	/* Calculate secondary group hash */
 	andc	r0,r27,r28
 	rldicr	r3,r0,3,63-3		/* r0 = (~hash & mask) << 3 */
 
 	/* Call ppc_md.hpte_insert */
-	ld	r6,STK_PARM(r4)(r1)	/* Retrieve new pp bits */
-	mr	r4,r29			/* Retrieve va */
-	li	r7,HPTE_V_SECONDARY	/* !bolted, secondary */
-	li	r8,MMU_PAGE_4K		/* page size */
-	ld	r9,STK_PARM(r9)(r1)	/* segment size */
+	ld	r7,STK_PARM(r4)(r1)	/* Retrieve new pp bits */
+	mr	r4,r29			/* Retrieve vsid */
+	mr      r5,r24			/* seg off */
+	li	r8,HPTE_V_SECONDARY	/* !bolted, secondary */
+	li	r9,MMU_PAGE_4K		/* page size */
+	ld	r10,STK_PARM(r9)(r1)	/* segment size */
 _GLOBAL(htab_call_hpte_insert2)
 	bl	.			/* patched by htab_finish_init() */
 	cmpdi	0,r3,0
@@ -628,13 +640,13 @@ htab_modify_pte:
 	add	r3,r0,r3	/* add slot idx */
 
 	/* Call ppc_md.hpte_updatepp */
-	mr	r5,r29			/* va */
-	li	r6,MMU_PAGE_4K		/* page size */
-	ld	r7,STK_PARM(r9)(r1)	/* segment size */
-	ld	r8,STK_PARM(r8)(r1)	/* get "local" param */
+	mr	r5,r29			/* vsid */
+	mr      r6,r24			/* seg off */
+	li	r7,MMU_PAGE_4K		/* page size */
+	ld	r8,STK_PARM(r9)(r1)	/* segment size */
+	ld	r9,STK_PARM(r8)(r1)	/* get "local" param */
 _GLOBAL(htab_call_hpte_updatepp)
 	bl	.			/* patched by htab_finish_init() */
-
 	/* if we failed because typically the HPTE wasn't really here
 	 * we try an insertion.
 	 */
@@ -684,10 +696,12 @@ _GLOBAL(__hash_page_64K)
 	/* Save non-volatile registers.
 	 * r31 will hold "old PTE"
 	 * r30 is "new PTE"
-	 * r29 is "va"
+	 * r29 is vsid
 	 * r28 is a hash value
 	 * r27 is hashtab mask (maybe dynamic patched instead ?)
+	 * r26 is seg off
 	 */
+	std	r26,STK_REG(r26)(r1)
 	std	r27,STK_REG(r27)(r1)
 	std	r28,STK_REG(r28)(r1)
 	std	r29,STK_REG(r29)(r1)
@@ -737,10 +751,9 @@ BEGIN_FTR_SECTION
 	cmpdi	r9,0			/* check segment size */
 	bne	3f
 END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
-	/* Calc va and put it in r29 */
-	rldicr	r29,r5,28,63-28
-	rldicl	r3,r3,0,36
-	or	r29,r3,r29
+	/* r29 is virtual address and r26 is seg_off */
+	mr	r29,r5			/* vsid */
+	rldicl	r26,r3,0,36		/* ea & 0x000000000fffffffUL */
 
 	/* Calculate hash value for primary slot and store it in r28 */
 	rldicl	r5,r5,0,25		/* vsid & 0x0000007fffffffff */
@@ -748,14 +761,17 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 	xor	r28,r5,r0
 	b	4f
 
-3:	/* Calc VA and hash in r29 and r28 for 1T segment */
-	sldi	r29,r5,40		/* vsid << 40 */
-	clrldi	r3,r3,24		/* ea & 0xffffffffff */
+3:	/* r29 is virtual address and r26 is seg_off */
+	mr	r29,r5			/* vsid */
+	rldicl	r26,r3,0,24		/* ea & 0xffffffffff */
+	/*
+	 * calculate hash value for primary slot and
+	 * store it in r28 for 1T segment
+	 */
 	rldic	r28,r5,25,25		/* (vsid << 25) & 0x7fffffffff */
 	clrldi	r5,r5,40		/* vsid & 0xffffff */
 	rldicl	r0,r3,64-16,40		/* (ea >> 16) & 0xffffff */
 	xor	r28,r28,r5
-	or	r29,r3,r29		/* VA */
 	xor	r28,r28,r0		/* hash */
 
 	/* Convert linux PTE bits into HW equivalents */
@@ -804,20 +820,21 @@ ht64_insert_pte:
 #else
 	ori	r30,r30,_PAGE_HASHPTE
 #endif
-	/* Phyical address in r5 */
-	rldicl	r5,r31,64-PTE_RPN_SHIFT,PTE_RPN_SHIFT
-	sldi	r5,r5,PAGE_SHIFT
+	/* Phyical address in r6 */
+	rldicl	r6,r31,64-PTE_RPN_SHIFT,PTE_RPN_SHIFT
+	sldi	r6,r6,PAGE_SHIFT
 
 	/* Calculate primary group hash */
 	and	r0,r28,r27
 	rldicr	r3,r0,3,63-3	/* r0 = (hash & mask) << 3 */
 
 	/* Call ppc_md.hpte_insert */
-	ld	r6,STK_PARM(r4)(r1)	/* Retrieve new pp bits */
-	mr	r4,r29			/* Retrieve va */
-	li	r7,0			/* !bolted, !secondary */
-	li	r8,MMU_PAGE_64K
-	ld	r9,STK_PARM(r9)(r1)	/* segment size */
+	ld	r7,STK_PARM(r4)(r1)	/* Retrieve new pp bits */
+	mr	r4,r29			/* Retrieve vsid */
+	mr	r5,r26			/* seg_off */
+	li	r8,0			/* !bolted, !secondary */
+	li	r9,MMU_PAGE_64K
+	ld	r10,STK_PARM(r9)(r1)	/* segment size */
 _GLOBAL(ht64_call_hpte_insert1)
 	bl	.			/* patched by htab_finish_init() */
 	cmpdi	0,r3,0
@@ -827,20 +844,21 @@ _GLOBAL(ht64_call_hpte_insert1)
 
 	/* Now try secondary slot */
 
-	/* Phyical address in r5 */
-	rldicl	r5,r31,64-PTE_RPN_SHIFT,PTE_RPN_SHIFT
-	sldi	r5,r5,PAGE_SHIFT
+	/* Phyical address in r6 */
+	rldicl	r6,r31,64-PTE_RPN_SHIFT,PTE_RPN_SHIFT
+	sldi	r6,r6,PAGE_SHIFT
 
 	/* Calculate secondary group hash */
 	andc	r0,r27,r28
 	rldicr	r3,r0,3,63-3	/* r0 = (~hash & mask) << 3 */
 
 	/* Call ppc_md.hpte_insert */
-	ld	r6,STK_PARM(r4)(r1)	/* Retrieve new pp bits */
-	mr	r4,r29			/* Retrieve va */
-	li	r7,HPTE_V_SECONDARY	/* !bolted, secondary */
-	li	r8,MMU_PAGE_64K
-	ld	r9,STK_PARM(r9)(r1)	/* segment size */
+	ld	r7,STK_PARM(r4)(r1)	/* Retrieve new pp bits */
+	mr	r4,r29			/* Retrieve vsid */
+	mr	r5,r26			/* seg_off */
+	li	r8,HPTE_V_SECONDARY	/* !bolted, secondary */
+	li	r9,MMU_PAGE_64K
+	ld	r10,STK_PARM(r9)(r1)	/* segment size */
 _GLOBAL(ht64_call_hpte_insert2)
 	bl	.			/* patched by htab_finish_init() */
 	cmpdi	0,r3,0
@@ -907,10 +925,11 @@ ht64_modify_pte:
 	add	r3,r0,r3	/* add slot idx */
 
 	/* Call ppc_md.hpte_updatepp */
-	mr	r5,r29			/* va */
-	li	r6,MMU_PAGE_64K
-	ld	r7,STK_PARM(r9)(r1)	/* segment size */
-	ld	r8,STK_PARM(r8)(r1)	/* get "local" param */
+	mr	r5,r29			/* vsid */
+	mr	r6,r26			/* seg off */
+	li	r7,MMU_PAGE_64K
+	ld	r8,STK_PARM(r9)(r1)	/* segment size */
+	ld	r9,STK_PARM(r8)(r1)	/* get "local" param */
 _GLOBAL(ht64_call_hpte_updatepp)
 	bl	.			/* patched by htab_finish_init() */
 
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 76c2574..f3628f8 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -42,10 +42,12 @@ DEFINE_RAW_SPINLOCK(native_tlbie_lock);
 /* Verify docs says 14 .. 14+i bits */
 static inline void __tlbie(struct virt_addr va, int psize, int ssize)
 {
-	unsigned long vaddr = va.addr;
+	unsigned long vaddr;
 	unsigned int penc;
 
-	vaddr &= ~(0xffffULL << 48);
+	/* We need only lower 48 bit of va, non SLS segment */
+	vaddr = va.vsid << segment_shift(ssize);
+	vaddr |= va.seg_off;
 
 	/* clear top 16 bits, non SLS segment */
 	vaddr &= ~(0xffffULL << 48);
@@ -74,9 +76,13 @@ static inline void __tlbie(struct virt_addr va, int psize, int ssize)
 /* Verify docs says 14 .. 14+i bits */
 static inline void __tlbiel(struct virt_addr va, int psize, int ssize)
 {
-	unsigned long vaddr = va.addr;
+	unsigned long vaddr;
 	unsigned int penc;
 
+	/* We need only lower 48 bit of va, non SLS segment */
+	vaddr = va.vsid << segment_shift(ssize);
+	vaddr |= va.seg_off;
+
 	vaddr &= ~(0xffffULL << 48);
 
 	switch (psize) {
@@ -148,9 +154,9 @@ static long native_hpte_insert(unsigned long hpte_group, struct virt_addr va,
 	int i;
 
 	if (!(vflags & HPTE_V_BOLTED)) {
-		DBG_LOW("    insert(group=%lx, va=%016lx, pa=%016lx,"
-			" rflags=%lx, vflags=%lx, psize=%d)\n",
-			hpte_group, va, pa, rflags, vflags, psize);
+		DBG_LOW("    insert(group=%lx, vsid=%016lx, seg_off=%016lx, pa=%016lx,"
+			" rflags=%lx, vflags=%lx, psize=%d)\n", hpte_group,
+			va.vsid, va.seg_off, pa, rflags, vflags, psize);
 	}
 
 	for (i = 0; i < HPTES_PER_GROUP; i++) {
@@ -239,8 +245,9 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
 
 	want_v = hpte_encode_v(va, psize, ssize);
 
-	DBG_LOW("    update(va=%016lx, avpnv=%016lx, hash=%016lx, newpp=%x)",
-		va, want_v & HPTE_V_AVPN, slot, newpp);
+	DBG_LOW("    update(vsid=%016lx, seg_off=%016lx, avpnv=%016lx, "
+		"hash=%016lx, newpp=%x)", va.vsid, va.seg_off,
+		want_v & HPTE_V_AVPN, slot, newpp);
 
 	native_lock_hpte(hptep);
 
@@ -405,7 +412,6 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
 			vpi = (vsid ^ pteg) & htab_hash_mask;
 			seg_off |= vpi << shift;
 		}
-		va->addr = vsid << 28 | seg_off;
 	case MMU_SEGSIZE_1T:
 		/* We only have 40 - 23 bits of seg_off in avpn */
 		seg_off = (avpn & 0x1ffff) << 23;
@@ -414,12 +420,12 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
 			vpi = (vsid ^ (vsid << 25) ^ pteg) & htab_hash_mask;
 			seg_off |= vpi << shift;
 		}
-		va->addr = vsid << 40 | seg_off;
 	default:
 		seg_off = 0;
 		vsid    = 0;
-		va->addr = 0;
 	}
+	va->vsid = vsid;
+	va->seg_off = seg_off;
 	*psize = size;
 }
 
@@ -499,7 +505,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 		va = batch->vaddr[i];
 		pte = batch->pte[i];
 
-		pte_iterate_hashed_subpages(pte, psize, va.addr, index, shift) {
+		pte_iterate_hashed_subpages(pte, psize, va.seg_off, index, shift) {
 			hash = hpt_hash(va, shift, ssize);
 			hidx = __rpte_to_hidx(pte, index);
 			if (hidx & _PTEIDX_SECONDARY)
@@ -525,7 +531,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 			va = batch->vaddr[i];
 			pte = batch->pte[i];
 
-			pte_iterate_hashed_subpages(pte, psize, va.addr, index,
+			pte_iterate_hashed_subpages(pte, psize, va.seg_off, index,
 						    shift) {
 				__tlbiel(va, psize, ssize);
 			} pte_iterate_hashed_end();
@@ -542,7 +548,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 			va = batch->vaddr[i];
 			pte = batch->pte[i];
 
-			pte_iterate_hashed_subpages(pte, psize, va.addr, index,
+			pte_iterate_hashed_subpages(pte, psize, va.seg_off, index,
 						    shift) {
 				__tlbie(va, psize, ssize);
 			} pte_iterate_hashed_end();
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 2429d53..8b5d3c2 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1158,8 +1158,10 @@ void flush_hash_page(struct virt_addr va, real_pte_t pte, int psize, int ssize,
 {
 	unsigned long hash, index, shift, hidx, slot;
 
-	DBG_LOW("flush_hash_page(va=%016lx)\n", va.addr);
-	pte_iterate_hashed_subpages(pte, psize, va.addr, index, shift) {
+	DBG_LOW("flush_hash_page(vsid=%016lx seg_off=%016lx)\n",
+		va.vsid, va.seg_off);
+	/* since we won't cross segments, use seg_off for iteration */
+	pte_iterate_hashed_subpages(pte, psize, va.seg_off, index, shift) {
 		hash = hpt_hash(va, shift, ssize);
 		hidx = __rpte_to_hidx(pte, index);
 		if (hidx & _PTEIDX_SECONDARY)
diff --git a/arch/powerpc/platforms/ps3/htab.c b/arch/powerpc/platforms/ps3/htab.c
index 6e27576..4aa969d 100644
--- a/arch/powerpc/platforms/ps3/htab.c
+++ b/arch/powerpc/platforms/ps3/htab.c
@@ -75,8 +75,9 @@ static long ps3_hpte_insert(unsigned long hpte_group, struct virt_addr va,
 
 	if (result) {
 		/* all entries bolted !*/
-		pr_info("%s:result=%d va=%lx pa=%lx ix=%lx v=%llx r=%llx\n",
-			__func__, result, va, pa, hpte_group, hpte_v, hpte_r);
+		pr_info("%s:result=%d vsid=%lx seg_off=%lx pa=%lx ix=%lx "
+			"v=%llx r=%llx\n", __func__, result, va.vsid,
+			va.seg_off, pa, hpte_group, hpte_v, hpte_r);
 		BUG();
 	}
 
@@ -125,8 +126,8 @@ static long ps3_hpte_updatepp(unsigned long slot, unsigned long newpp,
 				       &hpte_rs);
 
 	if (result) {
-		pr_info("%s: res=%d read va=%lx slot=%lx psize=%d\n",
-			__func__, result, va, slot, psize);
+		pr_info("%s: res=%d read vsid=%lx seg_off=%lx slot=%lx psize=%d\n",
+			__func__, result, va.vsid, va.seg_off, slot, psize);
 		BUG();
 	}
 
@@ -170,8 +171,8 @@ static void ps3_hpte_invalidate(unsigned long slot, struct virt_addr va,
 	result = lv1_write_htab_entry(PS3_LPAR_VAS_ID_CURRENT, slot, 0, 0);
 
 	if (result) {
-		pr_info("%s: res=%d va=%lx slot=%lx psize=%d\n",
-			__func__, result, va, slot, psize);
+		pr_info("%s: res=%d vsid=%lx seg_off=%lx slot=%lx psize=%d\n",
+			__func__, result, va.vsid, va.seg_off, slot, psize);
 		BUG();
 	}
 
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index b4e9641..4c0848f 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -118,9 +118,10 @@ static long pSeries_lpar_hpte_insert(unsigned long hpte_group,
 	unsigned long hpte_v, hpte_r;
 
 	if (!(vflags & HPTE_V_BOLTED))
-		pr_devel("hpte_insert(group=%lx, va=%016lx, pa=%016lx, "
-			 "rflags=%lx, vflags=%lx, psize=%d)\n",
-			 hpte_group, va.addr, pa, rflags, vflags, psize);
+		pr_devel("hpte_insert(group=%lx, vsid=%016lx, segoff=%016lx, "
+			 "pa=%016lx, rflags=%lx, vflags=%lx, psize=%d)\n",
+			 hpte_group, va.vsid, va.seg_off,
+			 pa, rflags, vflags, psize);
 
 	hpte_v = hpte_encode_v(va, psize, ssize) | vflags | HPTE_V_VALID;
 	hpte_r = hpte_encode_r(pa, psize) | rflags;
@@ -227,22 +228,6 @@ static void pSeries_lpar_hptab_clear(void)
 }
 
 /*
- * This computes the AVPN and B fields of the first dword of a HPTE,
- * for use when we want to match an existing PTE.  The bottom 7 bits
- * of the returned value are zero.
- */
-static inline unsigned long hpte_encode_avpn(struct virt_addr va, int psize,
-					     int ssize)
-{
-	unsigned long v;
-
-	v = (va.addr >> 23) & ~(mmu_psize_defs[psize].avpnm);
-	v <<= HPTE_V_AVPN_SHIFT;
-	v |= ((unsigned long) ssize) << HPTE_V_SSIZE_SHIFT;
-	return v;
-}
-
-/*
  * NOTE: for updatepp ops we are fortunate that the linux "newpp" bits and
  * the low 3 bits of flags happen to line up.  So no transform is needed.
  * We can probably optimize here and assume the high bits of newpp are
@@ -345,8 +330,8 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, struct virt_addr va
 	unsigned long lpar_rc;
 	unsigned long dummy1, dummy2;
 
-	pr_devel("    inval : slot=%lx, va=%016lx, psize: %d, local: %d\n",
-		 slot, va.addr, psize, local);
+	pr_devel("    inval : slot=%lx, vsid=%016lx, seg_off=%016lx, psize: %d, local: %d\n",
+		 slot, va.vsid, va.seg_off, psize, local);
 
 	want_v = hpte_encode_avpn(va, psize, ssize);
 	lpar_rc = plpar_pte_remove(H_AVPN, slot, want_v, &dummy1, &dummy2);
@@ -403,7 +388,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	for (i = 0; i < number; i++) {
 		va = batch->vaddr[i];
 		pte = batch->pte[i];
-		pte_iterate_hashed_subpages(pte, psize, va.addr, index, shift) {
+		pte_iterate_hashed_subpages(pte, psize, va.seg_off, index, shift) {
 			hash = hpt_hash(va, shift, ssize);
 			hidx = __rpte_to_hidx(pte, index);
 			if (hidx & _PTEIDX_SECONDARY)
-- 
1.7.10

^ permalink raw reply related

* [Early RFC 5/6] arch/powerpc: Make KERN_VIRT_SIZE not dependend on PGTABLE_RANGE
From: Aneesh Kumar K.V @ 2012-05-31 19:50 UTC (permalink / raw)
  To: benh, paulus, michael, anton; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1338493841-9926-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

As we keep increasing PGTABLE_RANGE we need not increase the virual
map area for kernel.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pgtable-ppc64.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index c420561..8af1cf2 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -41,7 +41,7 @@
 #else
 #define KERN_VIRT_START ASM_CONST(0xD000000000000000)
 #endif
-#define KERN_VIRT_SIZE	PGTABLE_RANGE
+#define KERN_VIRT_SIZE	ASM_CONST(0x0000100000000000)
 
 /*
  * The vmalloc space starts at the beginning of that region, and
-- 
1.7.10

^ permalink raw reply related

* [Early RFC 6/6] arch/powerpc: Increase the slice range to 64TB
From: Aneesh Kumar K.V @ 2012-05-31 19:50 UTC (permalink / raw)
  To: benh, paulus, michael, anton; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <1338493841-9926-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This patch makes the high psizes mask as an unsigned char array
so that we can have more than 16TB. Currently we support upto
64TB

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mmu-hash64.h |    7 ++-
 arch/powerpc/include/asm/page_64.h    |    7 ++-
 arch/powerpc/mm/hash_utils_64.c       |   15 +++---
 arch/powerpc/mm/slb_low.S             |   35 ++++++++----
 arch/powerpc/mm/slice.c               |   95 +++++++++++++++++++++------------
 5 files changed, 109 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index e563bd2..0f8b10d 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -456,7 +456,12 @@ typedef struct {
 
 #ifdef CONFIG_PPC_MM_SLICES
 	u64 low_slices_psize;	/* SLB page size encodings */
-	u64 high_slices_psize;  /* 4 bits per slice for now */
+	/*
+	 * FIXME!! it should be derived from PGTABLE_RANGE
+	 * Right now we support 64TB and 4 bits for each
+	 * 1TB slice we need 32 bytes for 64TB.
+	 */
+	unsigned char high_slices_psize[32];  /* 4 bits per slice for now */
 #else
 	u16 sllp;		/* SLB page size encoding */
 #endif
diff --git a/arch/powerpc/include/asm/page_64.h b/arch/powerpc/include/asm/page_64.h
index fed85e6..8806e87 100644
--- a/arch/powerpc/include/asm/page_64.h
+++ b/arch/powerpc/include/asm/page_64.h
@@ -82,7 +82,12 @@ extern u64 ppc64_pft_size;
 
 struct slice_mask {
 	u16 low_slices;
-	u16 high_slices;
+	/*
+	 * FIXME!!
+	 * This should be derived out of PGTABLE_RANGE. For the current
+	 * max 64TB, u64 should be ok.
+	 */
+	u64 high_slices;
 };
 
 struct mm_struct;
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 8b5d3c2..beace0b 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -804,16 +804,19 @@ unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
 #ifdef CONFIG_PPC_MM_SLICES
 unsigned int get_paca_psize(unsigned long addr)
 {
-	unsigned long index, slices;
+	u64 lpsizes;
+	unsigned char *hpsizes;
+	unsigned long index, mask_index;
 
 	if (addr < SLICE_LOW_TOP) {
-		slices = get_paca()->context.low_slices_psize;
+		lpsizes = get_paca()->context.low_slices_psize;
 		index = GET_LOW_SLICE_INDEX(addr);
-	} else {
-		slices = get_paca()->context.high_slices_psize;
-		index = GET_HIGH_SLICE_INDEX(addr);
+		return (lpsizes >> (index * 4)) & 0xF;
 	}
-	return (slices >> (index * 4)) & 0xF;
+	hpsizes = get_paca()->context.high_slices_psize;
+	index = GET_HIGH_SLICE_INDEX(addr) >> 1;
+	mask_index = GET_HIGH_SLICE_INDEX(addr) - (index << 1);
+	return (hpsizes[index] >> (mask_index * 4)) & 0xF;
 }
 
 #else
diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S
index b9ee79ce..c355af6 100644
--- a/arch/powerpc/mm/slb_low.S
+++ b/arch/powerpc/mm/slb_low.S
@@ -108,17 +108,34 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_1T_SEGMENT)
 	 * between 4k and 64k standard page size
 	 */
 #ifdef CONFIG_PPC_MM_SLICES
+	/* r10 have esid */
 	cmpldi	r10,16
-
-	/* Get the slice index * 4 in r11 and matching slice size mask in r9 */
-	ld	r9,PACALOWSLICESPSIZE(r13)
-	sldi	r11,r10,2
+	/* below SLICE_LOW_TOP */
 	blt	5f
-	ld	r9,PACAHIGHSLICEPSIZE(r13)
-	srdi	r11,r10,(SLICE_HIGH_SHIFT - SLICE_LOW_SHIFT - 2)
-	andi.	r11,r11,0x3c
-
-5:	/* Extract the psize and multiply to get an array offset */
+	/*
+	 * Handle hpsizes,
+	 * r9 is get_paca()->context.high_slices_psize[index], r11 is mask_index
+	 * We use r10 here, later we restore it to esid.
+	 * Can we use other register instead of r10 ?
+	 */
+	srdi    r10,r10,(SLICE_HIGH_SHIFT - SLICE_LOW_SHIFT) /* index */
+	srdi	r11,r10,1			/* r11 is array index */
+	addi	r9,r11,PACAHIGHSLICEPSIZE
+	lbzx	r9,r9,r13			/* r9 is hpsizes[r11] */
+	sldi    r11,r11,1
+	subf	r11,r11,r10	/* mask_index = index - (array_index << 1) */
+	srdi	r10,r3,28	/* restore r10 with esid */
+	b	6f
+5:
+	/*
+	 * Handle lpsizes
+	 * r9 is get_paca()->context.low_slices_psize, r11 is index
+	 */
+	ld	r9,PACALOWSLICESPSIZE(r13)
+	mr	r11,r10
+6:
+	sldi	r11,r11,2  /* index * 4 */
+	/* Extract the psize and multiply to get an array offset */
 	srd	r9,r9,r11
 	andi.	r9,r9,0xf
 	mulli	r9,r9,MMUPSIZEDEFSIZE
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 73709f7..302a481 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -42,7 +42,7 @@ int _slice_debug = 1;
 
 static void slice_print_mask(const char *label, struct slice_mask mask)
 {
-	char	*p, buf[16 + 3 + 16 + 1];
+	char	*p, buf[16 + 3 + 64 + 1];
 	int	i;
 
 	if (!_slice_debug)
@@ -142,19 +142,24 @@ static struct slice_mask slice_mask_for_free(struct mm_struct *mm)
 
 static struct slice_mask slice_mask_for_size(struct mm_struct *mm, int psize)
 {
+	unsigned char *hpsizes;
+	int index, mask_index;
 	struct slice_mask ret = { 0, 0 };
 	unsigned long i;
-	u64 psizes;
+	u64 lpsizes;
 
-	psizes = mm->context.low_slices_psize;
+	lpsizes = mm->context.low_slices_psize;
 	for (i = 0; i < SLICE_NUM_LOW; i++)
-		if (((psizes >> (i * 4)) & 0xf) == psize)
+		if (((lpsizes >> (i * 4)) & 0xf) == psize)
 			ret.low_slices |= 1u << i;
 
-	psizes = mm->context.high_slices_psize;
-	for (i = 0; i < SLICE_NUM_HIGH; i++)
-		if (((psizes >> (i * 4)) & 0xf) == psize)
+	hpsizes = mm->context.high_slices_psize;
+	for (i = 0; i < SLICE_NUM_HIGH; i++) {
+		index = i >> 1;
+		mask_index = i - (index << 1);
+		if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
 			ret.high_slices |= 1u << i;
+	}
 
 	return ret;
 }
@@ -183,8 +188,10 @@ static void slice_flush_segments(void *parm)
 
 static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psize)
 {
+	int index, mask_index;
 	/* Write the new slice psize bits */
-	u64 lpsizes, hpsizes;
+	unsigned char *hpsizes;
+	u64 lpsizes;
 	unsigned long i, flags;
 
 	slice_dbg("slice_convert(mm=%p, psize=%d)\n", mm, psize);
@@ -201,14 +208,18 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 			lpsizes = (lpsizes & ~(0xful << (i * 4))) |
 				(((unsigned long)psize) << (i * 4));
 
+	/* Assign the value back */
+	mm->context.low_slices_psize = lpsizes;
+
 	hpsizes = mm->context.high_slices_psize;
-	for (i = 0; i < SLICE_NUM_HIGH; i++)
+	for (i = 0; i < SLICE_NUM_HIGH; i++) {
+		index = i >> 1;
+		mask_index = i - (index << 1);
 		if (mask.high_slices & (1u << i))
-			hpsizes = (hpsizes & ~(0xful << (i * 4))) |
-				(((unsigned long)psize) << (i * 4));
-
-	mm->context.low_slices_psize = lpsizes;
-	mm->context.high_slices_psize = hpsizes;
+			hpsizes[index] = (hpsizes[index] &
+					  ~(0xf << (mask_index * 4))) |
+				(((unsigned long)psize) << (mask_index * 4));
+	}
 
 	slice_dbg(" lsps=%lx, hsps=%lx\n",
 		  mm->context.low_slices_psize,
@@ -587,18 +598,19 @@ unsigned long arch_get_unmapped_area_topdown(struct file *filp,
 
 unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr)
 {
-	u64 psizes;
-	int index;
+	unsigned char *hpsizes;
+	int index, mask_index;
 
 	if (addr < SLICE_LOW_TOP) {
-		psizes = mm->context.low_slices_psize;
+		u64 lpsizes;
+		lpsizes = mm->context.low_slices_psize;
 		index = GET_LOW_SLICE_INDEX(addr);
-	} else {
-		psizes = mm->context.high_slices_psize;
-		index = GET_HIGH_SLICE_INDEX(addr);
+		return (lpsizes >> (index * 4)) & 0xf;
 	}
-
-	return (psizes >> (index * 4)) & 0xf;
+	hpsizes = mm->context.high_slices_psize;
+	index = GET_HIGH_SLICE_INDEX(addr) >> 1;
+	mask_index = GET_HIGH_SLICE_INDEX(addr) - (index << 1);
+	return (hpsizes[index] >> (mask_index * 4)) & 0xf;
 }
 EXPORT_SYMBOL_GPL(get_slice_psize);
 
@@ -618,7 +630,9 @@ EXPORT_SYMBOL_GPL(get_slice_psize);
  */
 void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
 {
-	unsigned long flags, lpsizes, hpsizes;
+	int index, mask_index;
+	unsigned char *hpsizes;
+	unsigned long flags, lpsizes;
 	unsigned int old_psize;
 	int i;
 
@@ -639,15 +653,21 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
 		if (((lpsizes >> (i * 4)) & 0xf) == old_psize)
 			lpsizes = (lpsizes & ~(0xful << (i * 4))) |
 				(((unsigned long)psize) << (i * 4));
+	/* Assign the value back */
+	mm->context.low_slices_psize = lpsizes;
 
 	hpsizes = mm->context.high_slices_psize;
-	for (i = 0; i < SLICE_NUM_HIGH; i++)
-		if (((hpsizes >> (i * 4)) & 0xf) == old_psize)
-			hpsizes = (hpsizes & ~(0xful << (i * 4))) |
-				(((unsigned long)psize) << (i * 4));
+	for (i = 0; i < SLICE_NUM_HIGH; i++) {
+		index = i >> 1;
+		mask_index = i - (index << 1);
+		if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == old_psize)
+			hpsizes[index] = (hpsizes[index] &
+					  ~(0xf << (mask_index * 4))) |
+				(((unsigned long)psize) << (mask_index * 4));
+	}
+
+
 
-	mm->context.low_slices_psize = lpsizes;
-	mm->context.high_slices_psize = hpsizes;
 
 	slice_dbg(" lsps=%lx, hsps=%lx\n",
 		  mm->context.low_slices_psize,
@@ -660,18 +680,27 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
 void slice_set_psize(struct mm_struct *mm, unsigned long address,
 		     unsigned int psize)
 {
+	unsigned char *hpsizes;
 	unsigned long i, flags;
-	u64 *p;
+	u64 *lpsizes;
 
 	spin_lock_irqsave(&slice_convert_lock, flags);
 	if (address < SLICE_LOW_TOP) {
 		i = GET_LOW_SLICE_INDEX(address);
-		p = &mm->context.low_slices_psize;
+		lpsizes = &mm->context.low_slices_psize;
+		*lpsizes = (*lpsizes & ~(0xful << (i * 4))) |
+			((unsigned long) psize << (i * 4));
 	} else {
+		int index, mask_index;
 		i = GET_HIGH_SLICE_INDEX(address);
-		p = &mm->context.high_slices_psize;
+		hpsizes = mm->context.high_slices_psize;
+		index = i >> 1;
+		mask_index = i - (index << 1);
+		hpsizes[index] = (hpsizes[index] &
+				  ~(0xf << (mask_index * 4))) |
+			(((unsigned long)psize) << (mask_index * 4));
 	}
-	*p = (*p & ~(0xful << (i * 4))) | ((unsigned long) psize << (i * 4));
+
 	spin_unlock_irqrestore(&slice_convert_lock, flags);
 
 #ifdef CONFIG_SPU_BASE
-- 
1.7.10

^ permalink raw reply related

* Re: [Early RFC 0/6] arch/powerpc: Add 64TB support to ppc64
From: Aneesh Kumar K.V @ 2012-05-31 19:58 UTC (permalink / raw)
  To: benh, paulus, michael, anton; +Cc: linuxppc-dev
In-Reply-To: <1338493841-9926-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> Hi,
>
> This patchset include preparatory patches for supporting 64TB with ppc64. I haven't
> completed the actual patch that bump the USER_ESID bits. I wanted the share the
> changes early so that I can get feedback on the approach. The changes itself
> contains few FIXME!! which I will be addressing in the later updates.
>

Here is the patch that update USER_ESID_BITS. I get a machine check
exception with this changes. That is why I didn't include this in the patch
series. 

commit 5fff8ff606bc136c510350a40528431196f60001
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Wed May 23 16:59:36 2012 +0530

    arch/powerpc: Add 64TB support
    
    Increase max addressable range to 64TB. This is not tested on
    real hardware yet.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 0f8b10d..9f9a2a1 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -375,8 +375,8 @@ extern void slb_set_size(u16 size);
 #define VSID_MODULUS_1T		((1UL<<VSID_BITS_1T)-1)
 
 #define CONTEXT_BITS		19
-#define USER_ESID_BITS		16
-#define USER_ESID_BITS_1T	4
+#define USER_ESID_BITS		18
+#define USER_ESID_BITS_1T	6
 
 #define USER_VSID_RANGE	(1UL << (USER_ESID_BITS + SID_SHIFT))
 
diff --git a/arch/powerpc/include/asm/pgtable-ppc64-4k.h b/arch/powerpc/include/asm/pgtable-ppc64-4k.h
index 6eefdcf..b3eccf2 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64-4k.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64-4k.h
@@ -7,7 +7,7 @@
  */
 #define PTE_INDEX_SIZE  9
 #define PMD_INDEX_SIZE  7
-#define PUD_INDEX_SIZE  7
+#define PUD_INDEX_SIZE  9
 #define PGD_INDEX_SIZE  9
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/include/asm/pgtable-ppc64-64k.h b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
index 90533dd..be4e287 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64-64k.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
@@ -7,7 +7,7 @@
 #define PTE_INDEX_SIZE  12
 #define PMD_INDEX_SIZE  12
 #define PUD_INDEX_SIZE	0
-#define PGD_INDEX_SIZE  4
+#define PGD_INDEX_SIZE  6
 
 #ifndef __ASSEMBLY__
 #define PTE_TABLE_SIZE	(sizeof(real_pte_t) << PTE_INDEX_SIZE)
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 8e2d037..426ed13 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -100,8 +100,8 @@ extern struct task_struct *last_task_used_spe;
 #endif
 
 #ifdef CONFIG_PPC64
-/* 64-bit user address space is 44-bits (16TB user VM) */
-#define TASK_SIZE_USER64 (0x0000100000000000UL)
+/* 64-bit user address space is 46-bits (64TB user VM) */
+#define TASK_SIZE_USER64 (0x0000400000000000UL)
 
 /* 
  * 32-bit user address space is 4GB - 1 page 
diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
index 0c5fa31..f6fc0ee 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -10,8 +10,8 @@
  */
 #define SECTION_SIZE_BITS       24
 
-#define MAX_PHYSADDR_BITS       44
-#define MAX_PHYSMEM_BITS        44
+#define MAX_PHYSADDR_BITS       46
+#define MAX_PHYSMEM_BITS        46
 
 #endif /* CONFIG_SPARSEMEM */
 

^ permalink raw reply related

* Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
From: Joakim Tjernlund @ 2012-05-31 21:38 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Dan Malek, Bob Cochran, Support
In-Reply-To: <4FC7AEC9.5050203@freescale.com>

Scott Wood <scottwood@freescale.com> wrote on 2012/05/31 19:47:53:
>
> On 05/31/2012 04:56 AM, Joakim Tjernlund wrote:
> > Abatron Support <support@abatron.ch> wrote on 2012/05/31 11:30:57:
> >>
> >>
> >>> Abatron Support <support@abatron.ch> wrote on 2012/05/30 14:08:26:
> >>>>
> >>>>>> I have tested this briefly with BDI2000 on P2010(e500) and
> >>>>>> it works for me. I don't know if there are any bad side effects,
> >>>>>> therfore
> >>>>>> this RFC.
> >>>>
> >>>>> We used to have MSR_DE surrounded by CONFIG_something
> >>>>> to ensure it wasn't set under normal operation.  IIRC, if MSR_DE
> >>>>> is set, you will have problems with software debuggers that
> >>>>> utilize the the debugging registers in the chip itself.  You only want
> >>>>> to force this to be set when using the BDI, not at other times.
> >>>>
> >>>> This MSR_DE is also of interest and used for software debuggers that
> >>>> make use of the debug registers. Only if MSR_DE is set then debug
> >>>> interrupts are generated. If a debug event leads to a debug interrupt
> >>>> handled by a software debugger or if it leads to a debug halt handled
> >>>> by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM.
> >>>>
> >>>> The "e500 Core Family Reference Manual" chapter "Chapter 8
> >>>> Debug Support" explains in detail the effect of MSR_DE.
> >>
> >>> So what is the verdict on this? I don't buy into Dan argument without some
> >>> hard data.
> >>
> >> What I tried to mention is that handling the MSR_DE correct is not only
> >> an emulator (JTAG debugger) requirement. Also a software debugger may
> >> depend on a correct handled MSR_DE bit.
> >
> > Yes, that made sense to me too. How would SW debuggers work if the kernel keeps
> > turning off MSR_DE first chance it gets?
>
> The kernel selectively enables MSR_DE when it wants to debug.  I'm not
> sure if anything will be bothered by leaving it on all the time.  This
> is something we need for virtualization as well, so a hypervisor can
> debug the guest.

hmm, I read that as you as in favour of the patch?

^ permalink raw reply

* Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
From: Scott Wood @ 2012-05-31 21:43 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linuxppc-dev, Dan Malek, Bob Cochran, Support
In-Reply-To: <OF3AA5925F.E17EF6A4-ONC1257A0F.0076C23F-C1257A0F.0076D836@transmode.se>

On 05/31/2012 04:38 PM, Joakim Tjernlund wrote:
> Scott Wood <scottwood@freescale.com> wrote on 2012/05/31 19:47:53:
>>
>> On 05/31/2012 04:56 AM, Joakim Tjernlund wrote:
>>> Abatron Support <support@abatron.ch> wrote on 2012/05/31 11:30:57:
>>>>
>>>>
>>>>> Abatron Support <support@abatron.ch> wrote on 2012/05/30 14:08:26:
>>>>>>
>>>>>>>> I have tested this briefly with BDI2000 on P2010(e500) and
>>>>>>>> it works for me. I don't know if there are any bad side effects,
>>>>>>>> therfore
>>>>>>>> this RFC.
>>>>>>
>>>>>>> We used to have MSR_DE surrounded by CONFIG_something
>>>>>>> to ensure it wasn't set under normal operation.  IIRC, if MSR_DE
>>>>>>> is set, you will have problems with software debuggers that
>>>>>>> utilize the the debugging registers in the chip itself.  You only want
>>>>>>> to force this to be set when using the BDI, not at other times.
>>>>>>
>>>>>> This MSR_DE is also of interest and used for software debuggers that
>>>>>> make use of the debug registers. Only if MSR_DE is set then debug
>>>>>> interrupts are generated. If a debug event leads to a debug interrupt
>>>>>> handled by a software debugger or if it leads to a debug halt handled
>>>>>> by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM.
>>>>>>
>>>>>> The "e500 Core Family Reference Manual" chapter "Chapter 8
>>>>>> Debug Support" explains in detail the effect of MSR_DE.
>>>>
>>>>> So what is the verdict on this? I don't buy into Dan argument without some
>>>>> hard data.
>>>>
>>>> What I tried to mention is that handling the MSR_DE correct is not only
>>>> an emulator (JTAG debugger) requirement. Also a software debugger may
>>>> depend on a correct handled MSR_DE bit.
>>>
>>> Yes, that made sense to me too. How would SW debuggers work if the kernel keeps
>>> turning off MSR_DE first chance it gets?
>>
>> The kernel selectively enables MSR_DE when it wants to debug.  I'm not
>> sure if anything will be bothered by leaving it on all the time.  This
>> is something we need for virtualization as well, so a hypervisor can
>> debug the guest.
> 
> hmm, I read that as you as in favour of the patch?

I'd want some confirmation that it doesn't break anything, and that
there aren't any other places that need MSR_DE that this doesn't cover,
but in general yes.

-Scott

^ permalink raw reply

* Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
From: Joakim Tjernlund @ 2012-05-31 22:14 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Dan Malek, Bob Cochran, Support
In-Reply-To: <4FC7E606.1070205@freescale.com>

Scott Wood <scottwood@freescale.com> wrote on 2012/05/31 23:43:34:
>
> On 05/31/2012 04:38 PM, Joakim Tjernlund wrote:
> > Scott Wood <scottwood@freescale.com> wrote on 2012/05/31 19:47:53:
> >>
> >> On 05/31/2012 04:56 AM, Joakim Tjernlund wrote:
> >>> Abatron Support <support@abatron.ch> wrote on 2012/05/31 11:30:57:
> >>>>
> >>>>
> >>>>> Abatron Support <support@abatron.ch> wrote on 2012/05/30 14:08:26:
> >>>>>>
> >>>>>>>> I have tested this briefly with BDI2000 on P2010(e500) and
> >>>>>>>> it works for me. I don't know if there are any bad side effects,
> >>>>>>>> therfore
> >>>>>>>> this RFC.
> >>>>>>
> >>>>>>> We used to have MSR_DE surrounded by CONFIG_something
> >>>>>>> to ensure it wasn't set under normal operation.  IIRC, if MSR_DE
> >>>>>>> is set, you will have problems with software debuggers that
> >>>>>>> utilize the the debugging registers in the chip itself.  You only want
> >>>>>>> to force this to be set when using the BDI, not at other times.
> >>>>>>
> >>>>>> This MSR_DE is also of interest and used for software debuggers that
> >>>>>> make use of the debug registers. Only if MSR_DE is set then debug
> >>>>>> interrupts are generated. If a debug event leads to a debug interrupt
> >>>>>> handled by a software debugger or if it leads to a debug halt handled
> >>>>>> by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM.
> >>>>>>
> >>>>>> The "e500 Core Family Reference Manual" chapter "Chapter 8
> >>>>>> Debug Support" explains in detail the effect of MSR_DE.
> >>>>
> >>>>> So what is the verdict on this? I don't buy into Dan argument without some
> >>>>> hard data.
> >>>>
> >>>> What I tried to mention is that handling the MSR_DE correct is not only
> >>>> an emulator (JTAG debugger) requirement. Also a software debugger may
> >>>> depend on a correct handled MSR_DE bit.
> >>>
> >>> Yes, that made sense to me too. How would SW debuggers work if the kernel keeps
> >>> turning off MSR_DE first chance it gets?
> >>
> >> The kernel selectively enables MSR_DE when it wants to debug.  I'm not
> >> sure if anything will be bothered by leaving it on all the time.  This
> >> is something we need for virtualization as well, so a hypervisor can
> >> debug the guest.
> >
> > hmm, I read that as you as in favour of the patch?
>
> I'd want some confirmation that it doesn't break anything, and that
> there aren't any other places that need MSR_DE that this doesn't cover,
> but in general yes.

Then you need to test drive the patch :)

 Jocke

^ permalink raw reply

* Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
From: Scott Wood @ 2012-05-31 22:16 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linuxppc-dev, Dan Malek, Bob Cochran, Support
In-Reply-To: <OF43D61D1E.31CC8410-ONC1257A0F.0079F3A4-C1257A0F.007A237A@transmode.se>

On 05/31/2012 05:14 PM, Joakim Tjernlund wrote:
> Scott Wood <scottwood@freescale.com> wrote on 2012/05/31 23:43:34:
>>
>> On 05/31/2012 04:38 PM, Joakim Tjernlund wrote:
>>> Scott Wood <scottwood@freescale.com> wrote on 2012/05/31 19:47:53:
>>>>
>>>> On 05/31/2012 04:56 AM, Joakim Tjernlund wrote:
>>>>> Abatron Support <support@abatron.ch> wrote on 2012/05/31 11:30:57:
>>>>>>
>>>>>>
>>>>>>> Abatron Support <support@abatron.ch> wrote on 2012/05/30 14:08:26:
>>>>>>>>
>>>>>>>>>> I have tested this briefly with BDI2000 on P2010(e500) and
>>>>>>>>>> it works for me. I don't know if there are any bad side effects,
>>>>>>>>>> therfore
>>>>>>>>>> this RFC.
>>>>>>>>
>>>>>>>>> We used to have MSR_DE surrounded by CONFIG_something
>>>>>>>>> to ensure it wasn't set under normal operation.  IIRC, if MSR_DE
>>>>>>>>> is set, you will have problems with software debuggers that
>>>>>>>>> utilize the the debugging registers in the chip itself.  You only want
>>>>>>>>> to force this to be set when using the BDI, not at other times.
>>>>>>>>
>>>>>>>> This MSR_DE is also of interest and used for software debuggers that
>>>>>>>> make use of the debug registers. Only if MSR_DE is set then debug
>>>>>>>> interrupts are generated. If a debug event leads to a debug interrupt
>>>>>>>> handled by a software debugger or if it leads to a debug halt handled
>>>>>>>> by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM.
>>>>>>>>
>>>>>>>> The "e500 Core Family Reference Manual" chapter "Chapter 8
>>>>>>>> Debug Support" explains in detail the effect of MSR_DE.
>>>>>>
>>>>>>> So what is the verdict on this? I don't buy into Dan argument without some
>>>>>>> hard data.
>>>>>>
>>>>>> What I tried to mention is that handling the MSR_DE correct is not only
>>>>>> an emulator (JTAG debugger) requirement. Also a software debugger may
>>>>>> depend on a correct handled MSR_DE bit.
>>>>>
>>>>> Yes, that made sense to me too. How would SW debuggers work if the kernel keeps
>>>>> turning off MSR_DE first chance it gets?
>>>>
>>>> The kernel selectively enables MSR_DE when it wants to debug.  I'm not
>>>> sure if anything will be bothered by leaving it on all the time.  This
>>>> is something we need for virtualization as well, so a hypervisor can
>>>> debug the guest.
>>>
>>> hmm, I read that as you as in favour of the patch?
>>
>> I'd want some confirmation that it doesn't break anything, and that
>> there aren't any other places that need MSR_DE that this doesn't cover,
>> but in general yes.
> 
> Then you need to test drive the patch :)

I was thinking more along the lines of someone who's more familiar with
the relevant parts of the code confirming that it's really OK, not just
testing that it doesn't blow up in my face.

-Scott

^ permalink raw reply

* Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
From: Joakim Tjernlund @ 2012-05-31 22:33 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Dan Malek, Bob Cochran, Support
In-Reply-To: <4FC7EDD5.8090302@freescale.com>

Scott Wood <scottwood@freescale.com> wrote on 2012/06/01 00:16:53:
>
> On 05/31/2012 05:14 PM, Joakim Tjernlund wrote:
> > Scott Wood <scottwood@freescale.com> wrote on 2012/05/31 23:43:34:
> >>
> >> On 05/31/2012 04:38 PM, Joakim Tjernlund wrote:
> >>> Scott Wood <scottwood@freescale.com> wrote on 2012/05/31 19:47:53:
> >>>>
> >>>> On 05/31/2012 04:56 AM, Joakim Tjernlund wrote:
> >>>>> Abatron Support <support@abatron.ch> wrote on 2012/05/31 11:30:57:
> >>>>>>
> >>>>>>
> >>>>>>> Abatron Support <support@abatron.ch> wrote on 2012/05/30 14:08:26:
> >>>>>>>>
> >>>>>>>>>> I have tested this briefly with BDI2000 on P2010(e500) and
> >>>>>>>>>> it works for me. I don't know if there are any bad side effects,
> >>>>>>>>>> therfore
> >>>>>>>>>> this RFC.
> >>>>>>>>
> >>>>>>>>> We used to have MSR_DE surrounded by CONFIG_something
> >>>>>>>>> to ensure it wasn't set under normal operation.  IIRC, if MSR_DE
> >>>>>>>>> is set, you will have problems with software debuggers that
> >>>>>>>>> utilize the the debugging registers in the chip itself.  You only want
> >>>>>>>>> to force this to be set when using the BDI, not at other times.
> >>>>>>>>
> >>>>>>>> This MSR_DE is also of interest and used for software debuggers that
> >>>>>>>> make use of the debug registers. Only if MSR_DE is set then debug
> >>>>>>>> interrupts are generated. If a debug event leads to a debug interrupt
> >>>>>>>> handled by a software debugger or if it leads to a debug halt handled
> >>>>>>>> by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM.
> >>>>>>>>
> >>>>>>>> The "e500 Core Family Reference Manual" chapter "Chapter 8
> >>>>>>>> Debug Support" explains in detail the effect of MSR_DE.
> >>>>>>
> >>>>>>> So what is the verdict on this? I don't buy into Dan argument without some
> >>>>>>> hard data.
> >>>>>>
> >>>>>> What I tried to mention is that handling the MSR_DE correct is not only
> >>>>>> an emulator (JTAG debugger) requirement. Also a software debugger may
> >>>>>> depend on a correct handled MSR_DE bit.
> >>>>>
> >>>>> Yes, that made sense to me too. How would SW debuggers work if the kernel keeps
> >>>>> turning off MSR_DE first chance it gets?
> >>>>
> >>>> The kernel selectively enables MSR_DE when it wants to debug.  I'm not
> >>>> sure if anything will be bothered by leaving it on all the time.  This
> >>>> is something we need for virtualization as well, so a hypervisor can
> >>>> debug the guest.
> >>>
> >>> hmm, I read that as you as in favour of the patch?
> >>
> >> I'd want some confirmation that it doesn't break anything, and that
> >> there aren't any other places that need MSR_DE that this doesn't cover,
> >> but in general yes.
> >
> > Then you need to test drive the patch :)
>
> I was thinking more along the lines of someone who's more familiar with
> the relevant parts of the code confirming that it's really OK, not just
> testing that it doesn't blow up in my face.

Still needs a test run, just throw it in :)

^ permalink raw reply

* Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
From: Joakim Tjernlund @ 2012-05-31 22:35 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Dan Malek, Bob Cochran, Support
In-Reply-To: <4FC7EDD5.8090302@freescale.com>

Scott Wood <scottwood@freescale.com> wrote on 2012/06/01 00:16:53:
>
> On 05/31/2012 05:14 PM, Joakim Tjernlund wrote:
> > Scott Wood <scottwood@freescale.com> wrote on 2012/05/31 23:43:34:
> >>
> >> On 05/31/2012 04:38 PM, Joakim Tjernlund wrote:
> >>> Scott Wood <scottwood@freescale.com> wrote on 2012/05/31 19:47:53:
> >>>>
> >>>> On 05/31/2012 04:56 AM, Joakim Tjernlund wrote:
> >>>>> Abatron Support <support@abatron.ch> wrote on 2012/05/31 11:30:57:
> >>>>>>
> >>>>>>
> >>>>>>> Abatron Support <support@abatron.ch> wrote on 2012/05/30 14:08:26:
> >>>>>>>>
> >>>>>>>>>> I have tested this briefly with BDI2000 on P2010(e500) and
> >>>>>>>>>> it works for me. I don't know if there are any bad side effects,
> >>>>>>>>>> therfore
> >>>>>>>>>> this RFC.
> >>>>>>>>
> >>>>>>>>> We used to have MSR_DE surrounded by CONFIG_something
> >>>>>>>>> to ensure it wasn't set under normal operation.  IIRC, if MSR_DE
> >>>>>>>>> is set, you will have problems with software debuggers that
> >>>>>>>>> utilize the the debugging registers in the chip itself.  You only want
> >>>>>>>>> to force this to be set when using the BDI, not at other times.
> >>>>>>>>
> >>>>>>>> This MSR_DE is also of interest and used for software debuggers that
> >>>>>>>> make use of the debug registers. Only if MSR_DE is set then debug
> >>>>>>>> interrupts are generated. If a debug event leads to a debug interrupt
> >>>>>>>> handled by a software debugger or if it leads to a debug halt handled
> >>>>>>>> by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM.
> >>>>>>>>
> >>>>>>>> The "e500 Core Family Reference Manual" chapter "Chapter 8
> >>>>>>>> Debug Support" explains in detail the effect of MSR_DE.
> >>>>>>
> >>>>>>> So what is the verdict on this? I don't buy into Dan argument without some
> >>>>>>> hard data.
> >>>>>>
> >>>>>> What I tried to mention is that handling the MSR_DE correct is not only
> >>>>>> an emulator (JTAG debugger) requirement. Also a software debugger may
> >>>>>> depend on a correct handled MSR_DE bit.
> >>>>>
> >>>>> Yes, that made sense to me too. How would SW debuggers work if the kernel keeps
> >>>>> turning off MSR_DE first chance it gets?
> >>>>
> >>>> The kernel selectively enables MSR_DE when it wants to debug.  I'm not
> >>>> sure if anything will be bothered by leaving it on all the time.  This
> >>>> is something we need for virtualization as well, so a hypervisor can
> >>>> debug the guest.
> >>>
> >>> hmm, I read that as you as in favour of the patch?
> >>
> >> I'd want some confirmation that it doesn't break anything, and that
> >> there aren't any other places that need MSR_DE that this doesn't cover,
> >> but in general yes.
> >
> > Then you need to test drive the patch :)
>
> I was thinking more along the lines of someone who's more familiar with
> the relevant parts of the code confirming that it's really OK, not just
> testing that it doesn't blow up in my face.

It just occurred to me that you guys have this already in your Linux SDK so it can't be that bad.

 Jocke

^ permalink raw reply

* Re: [PATCH] powerpc/pci: cleanup on duplicate assignment
From: Richard Yang @ 2012-06-01  1:25 UTC (permalink / raw)
  To: Gavin Shan; +Cc: michael, linuxppc-dev
In-Reply-To: <1338445049-20520-1-git-send-email-shangw@linux.vnet.ibm.com>

Agree, this is a duplication.

On Thu, May 31, 2012 at 02:17:29PM +0800, Gavin Shan wrote:
>While creating the PCI root bus through function pci_create_root_bus()
>of PCI core, it should have assigned the secondary bus number for the
>newly created PCI root bus. Thus we needn't do the explicit assignment
>for the secondary bus number again in pcibios_scan_phb().
>
>Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>---
> arch/powerpc/kernel/pci-common.c |    1 -
> 1 file changed, 1 deletion(-)
>
>diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>index 8e78e93..0f75bd5 100644
>--- a/arch/powerpc/kernel/pci-common.c
>+++ b/arch/powerpc/kernel/pci-common.c
>@@ -1646,7 +1646,6 @@ void __devinit pcibios_scan_phb(struct pci_controller *hose)
> 		pci_free_resource_list(&resources);
> 		return;
> 	}
>-	bus->secondary = hose->first_busno;
> 	hose->bus = bus;
>
> 	/* Get probe mode and perform scan */
>-- 
>1.7.9.5
>
>_______________________________________________
>Linuxppc-dev mailing list
>Linuxppc-dev@lists.ozlabs.org
>https://lists.ozlabs.org/listinfo/linuxppc-dev

-- 
Richard Yang
Help you, Help me

^ permalink raw reply

* Re: [PATCH 5/9] blackfin: A couple of task->mm handling fixes
From: Mike Frysinger @ 2012-06-01  4:36 UTC (permalink / raw)
  To: Anton Vorontsov
  Cc: linaro-kernel, Russell King, Peter Zijlstra,
	user-mode-linux-devel, linux-sh, patches, Oleg Nesterov,
	linux-kernel, linux-mm, Paul Mundt, John Stultz, KOSAKI Motohiro,
	Richard Weinberger, uclinux-dist-devel, Andrew Morton,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20120423070901.GE30752@lizard>

[-- Attachment #1: Type: Text/Plain, Size: 1106 bytes --]

On Monday 23 April 2012 03:09:01 Anton Vorontsov wrote:
> 1. Working with task->mm w/o getting mm or grabing the task lock is
>    dangerous as ->mm might disappear (exit_mm() assigns NULL under
>    task_lock(), so tasklist lock is not enough).

that isn't a problem for this code as it specifically checks if it's in an 
atomic section.  if it is, then task->mm can't go away on us.

>    We can't use get_task_mm()/mmput() pair as mmput() might sleep,
>    so we have to take the task lock while handle its mm.

if we're not in an atomic section, then sleeping is fine.

> 2. Checking for process->mm is not enough because process' main
>    thread may exit or detach its mm via use_mm(), but other threads
>    may still have a valid mm.

i don't think it matters for this code (per the reasons above).

>    To catch this we use find_lock_task_mm(), which walks up all
>    threads and returns an appropriate task (with task lock held).

certainly fine for the non-atomic code path.  i guess we'll notice in crashes 
if it causes a problem in atomic code paths as well.
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* linux-next: boot failuresfor next-20120601
From: Stephen Rothwell @ 2012-06-01  6:59 UTC (permalink / raw)
  To: linux-next; +Cc: ppc-dev, LKML

[-- Attachment #1: Type: text/plain, Size: 3296 bytes --]

Hi all,

Today's linux-next fails to boot on my PowerPC boxes like this (one
example):

calling  .ipr_init+0x0/0x68 @ 1
ipr: IBM Power RAID SCSI Device Driver version: 2.5.3 (March 10, 2012)
ipr 0000:01:01.0: Found IOA with IRQ: 26
ipr 0000:01:01.0: Starting IOA initialization sequence.
ipr 0000:01:01.0: Adapter firmware version: 06160039
ipr 0000:01:01.0: IOA initialized.
scsi0 : IBM 572E Storage Adapter
async_waiting @ 1
async_continuing @ 1 after 0 usec
scsi 0:0:1:0: Direct-Access     IBM-ESXS MAY2036RC        T106 PQ: 0 ANSI: 5
async_waiting @ 1
async_continuing @ 1 after 0 usec
async_waiting @ 1
async_continuing @ 1 after 0 usec
	.
	.	(lots and lots of these)
	.
async_waiting @ 1
async_continuing @ 1 after 0 usec
scsi: unknown device type 31
scsi 0:255:255:255: No Device         IBM      572E001          0150 PQ: 0 ANSI: 0
initcall .ipr_init+0x0/0x68 returned 0 after 967815 usecs
	.
	.
	.
initcall 1_.sd_probe_async+0x0/0x1d0 returned 0 after 220331 usecs
INFO: rcu_sched self-detected stall on CPU { 2}  (t=6001 jiffies)
Call Trace:
[c00000007897b2a0] [c000000000013b40] .show_stack+0x70/0x1c0 (unreliable)
[c00000007897b350] [c00000000012937c] .__rcu_pending+0x20c/0x580
[c00000007897b410] [c000000000129844] .rcu_check_callbacks+0x154/0x250
[c00000007897b4b0] [c0000000000b1e64] .update_process_times+0x44/0xa0
[c00000007897b540] [c0000000000fbc6c] .tick_sched_timer+0x7c/0x100
[c00000007897b5e0] [c0000000000cee88] .__run_hrtimer+0xb8/0x270
[c00000007897b690] [c0000000000cf3a8] .hrtimer_interrupt+0x128/0x2c0
[c00000007897b7a0] [c00000000001cc98] .timer_interrupt+0x108/0x290
[c00000007897b850] [c000000000003b80] decrementer_common+0x180/0x200
--- Exception: 901 at .arch_local_irq_restore+0x74/0x90
    LR = .arch_local_irq_restore+0x74/0x90
[c00000007897bb40] [c000000000010108] .arch_local_irq_restore+0x38/0x90 (unreliable)
[c00000007897bbb0] [c00000000009de30] .console_unlock+0x370/0x460
[c00000007897bca0] [c00000000041c1ec] .fb_flashcursor+0x7c/0x180
[c00000007897bd50] [c0000000000c30ec] .process_one_work+0x19c/0x590
[c00000007897be10] [c0000000000c3a48] .worker_thread+0x188/0x4b0
[c00000007897bed0] [c0000000000c9cdc] .kthread+0xbc/0xd0
[c00000007897bf90] [c000000000020eb8] .kernel_thread+0x54/0x70
	.
	.	
	.
calling  .initialize_hashrnd+0x0/0x40 @ 1
initcall .initialize_hashrnd+0x0/0x40 returned 0 after 1 usecs
async_waiting @ 1
async_continuing @ 1 after 0 usec
Freeing unused kernel memory: 468k freed
/init: 71: mknod: Permission denied
/init: 88: mknod: Permission denied
/init: 88: mknod: Permission denied
/init: 88: mknod: Permission denied
/init: 88: mknod: Permission denied
/init: 99: cannot open /proc/cmdline: No such file
/init: 103: mkdir: Permission denied
/init: 104: mkdir: Permission denied
/init: 108: udevadm: Permission denied
dracut: FATAL: No or empty root= argument
dracut: Refusing to continue


Signal caught!

Boot has failed, sleeping forever.
/init: 1: sleep: Permission denied
/init: 1: sleep: Permission denied
/init: 1: sleep: Permission denied
	.
	.
	.

Another machine did not use the ipr device and only got the init problems
at the end.

Anyone got any ideas?

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: Re[2]: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
From: Benjamin Herrenschmidt @ 2012-06-01  9:12 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linuxppc-dev, Dan Malek, Bob Cochran, Support
In-Reply-To: <OF53CDE18C.59DD377E-ONC1257A0F.0031C7A0-C1257A0F.0031EA81@transmode.se>

On Thu, 2012-05-31 at 11:05 +0200, Joakim Tjernlund wrote:
> Abatron Support <support@abatron.ch> wrote on 2012/05/30 14:08:26:
> >
> > >> I have tested this briefly with BDI2000 on P2010(e500) and
> > >> it works for me. I don't know if there are any bad side effects,
> > >> therfore
> > >> this RFC.
> >
> > > We used to have MSR_DE surrounded by CONFIG_something
> > > to ensure it wasn't set under normal operation.  IIRC, if MSR_DE
> > > is set, you will have problems with software debuggers that
> > > utilize the the debugging registers in the chip itself.  You only want
> > > to force this to be set when using the BDI, not at other times.
> >
> > This MSR_DE is also of interest and used for software debuggers that
> > make use of the debug registers. Only if MSR_DE is set then debug
> > interrupts are generated. If a debug event leads to a debug interrupt
> > handled by a software debugger or if it leads to a debug halt handled
> > by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM.
> >
> > The "e500 Core Family Reference Manual" chapter "Chapter 8
> > Debug Support" explains in detail the effect of MSR_DE.
> 
> So what is the verdict on this? I don't buy into Dan argument without some
> hard data.

The kernel normally controls when to set or not set MSR:DE, at least
when using SW breakpoints. Setting it globally should remain some kind
of specific debug option.

In fact on some CPUs, we even leave user set dbcr settings and rely on
DE being off in kernel space to avoid user->kernel attacks via the debug
registers (I think we still do that on 64-bit BookE though it should
eventually change).

Cheers,
Ben.

>  Jocke
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* Re: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
From: Benjamin Herrenschmidt @ 2012-06-01  9:14 UTC (permalink / raw)
  To: Bob Cochran; +Cc: linuxppc-dev, support
In-Reply-To: <4FC62018.3040404@mindchasers.com>

On Wed, 2012-05-30 at 09:26 -0400, Bob Cochran wrote:
> I believe that additional patches are required for CodeWarrior to
> work 
> properly (e.g., assembly start up).  I think the patches should come 
> from Freescale.  For whatever reason, they include them in their SDK, 
> but haven't submitted them for inclusion in the mainline.
> 
> As a developer on Freescale Power products, I would like to see 
> Freescale offer up a CodeWarrior patch set, so I don't have to manage 
> the patches myself when working outside the SDK (i.e., on a more
> recent 
> kernel).

Such patches would have a hard time getting upstream considering that
codewarrior is a commercial product.

Ben.

^ permalink raw reply

* Re: kernel panic during kernel module load (powerpc specific part)
From: Benjamin Herrenschmidt @ 2012-06-01  9:18 UTC (permalink / raw)
  To: Gabriel Paubert
  Cc: Wrobel Heinz-R39252, Michael Ellerman, Steffen Rumler,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20120531110439.GA17373@visitor2.iram.es>

On Thu, 2012-05-31 at 13:04 +0200, Gabriel Paubert wrote:

> I believe that the basic premise is that you should provide a directly reachable copy 
> of the save/rstore functions, even if this means that you need several copies of the functions.

I just fixed a very similar problem with grub2 in fact. It was using r0
and trashing the saved LR that way.

The real fix is indeed to statically link those gcc "helpers", we
shouldn't generate things like cross-module calls inside function
prologs and epilogues, when stackframes aren't even guaranteed to be
reliable.

However, in the grub2 case, it was easier to just use r12 :-)

Cheers,
Ben.

> > 
> > Unfortunately the same doc and predecessors show r11 in all basic examples for PLT/trampoline code AFAICS, which is likely why all trampoline code uses r11 in any known case.
> > 
> > I would guess that it was never envisioned that compiler generated code would be in a different section than save/restore functions, i.e., the Linux module "__init" assumptions for Power break the ABI. Or does the ABI break the __init concept?!
> > 
> > Using r12 in the trampoline seems to be the obvious solution for module loading.
> > 
> > But what about other code loading done? If, e.g., a user runs any app from bash it gets loaded and relocated and trampolines might get set up somehow.
> 
> I don't think so. The linker/whatever should generate a copy of the save/restore functions for every
> executable code area (shared library), and probably more than one copy if the text becomes too large.
> 
> For 64 bit code, these functions are actually inserted by the linker.
> 
> [Ok, I just recompiled my 64 bit kernel with -Os and I see that vmlinux gets one copy
> of the save/restore functions and every module also gets its copy.]
> 
> This makes sense, really these functions are there for a compromise between locality 
> and performance, there should be one per code section, otherwise the cache line 
> used by the trampoline negates a large part of their advantage.
> 
> > 
> > Wouldn't we have to find fix ANY trampoline code generator remotely related to a basic Power Architecture Linux?
> > Or is it a basic assumption for anything but modules that compiler generated code may never ever be outside the .text section? I am not sure that would be a safe assumption.
> > 
> > Isn't this problem going beyond just module loading for Power Architecture Linux?
> 
> I don't think so. It really seems to be a 32 bit kernel problem.
> 
> 	Regards,
> 	Gabriel
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* [PATCH 18/27] powerpc, smpboot: Use generic SMP booting infrastructure
From: Srivatsa S. Bhat @ 2012-06-01  9:14 UTC (permalink / raw)
  To: tglx, peterz, paulmck
  Cc: linux-arch, Thomas Gleixner, Nikunj A. Dadhania, Paul Gortmaker,
	rusty, vatsa, linux-kernel, rjw, Yong Zhang, Paul Mackerras,
	srivatsa.bhat, akpm, linuxppc-dev, mingo
In-Reply-To: <20120601090952.31979.24799.stgit@srivatsabhat.in.ibm.com>

From: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>

Convert powerpc to use the generic framework to boot secondary CPUs.

Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yong Zhang <yong.zhang0@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/powerpc/kernel/smp.c |   26 +++++++++++++++-----------
 1 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 1928058a..96c3718 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -544,16 +544,18 @@ static struct device_node *cpu_to_l2cache(int cpu)
 /* Activate a secondary processor. */
 void __devinit start_secondary(void *unused)
 {
+	smpboot_start_secondary(unused);
+}
+
+void __cpuinit __cpu_pre_starting(void *unused)
+{
 	unsigned int cpu = smp_processor_id();
-	struct device_node *l2_cache;
-	int i, base;
 
 	atomic_inc(&init_mm.mm_count);
 	current->active_mm = &init_mm;
 
 	smp_store_cpu_info(cpu);
 	set_dec(tb_ticks_per_jiffy);
-	preempt_disable();
 	cpu_callin_map[cpu] = 1;
 
 	if (smp_ops->setup_cpu)
@@ -567,8 +569,16 @@ void __devinit start_secondary(void *unused)
 	if (system_state == SYSTEM_RUNNING)
 		vdso_data->processorCount++;
 #endif
-	notify_cpu_starting(cpu);
-	set_cpu_online(cpu, true);
+}
+
+void __cpuinit __cpu_post_online(void *unused)
+{
+	unsigned int cpu;
+	struct device_node *l2_cache;
+	int i, base;
+
+	cpu = smp_processor_id();
+
 	/* Update sibling maps */
 	base = cpu_first_thread_sibling(cpu);
 	for (i = 0; i < threads_per_core; i++) {
@@ -596,12 +606,6 @@ void __devinit start_secondary(void *unused)
 		of_node_put(np);
 	}
 	of_node_put(l2_cache);
-
-	local_irq_enable();
-
-	cpu_idle();
-
-	BUG();
 }
 
 int setup_profiling_timer(unsigned int multiplier)

^ permalink raw reply related

* [PATCH] powerpc/time: Sanity check of decrementer expiration is necessary
From: Paul Mackerras @ 2012-06-01 10:01 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Anton Blanchard

This reverts 68568add2c ("powerpc/time: Remove unnecessary sanity check
of decrementer expiration").  We do need to check whether we have reached
the expiration time of the next event, because we sometimes get an early
decrementer interrupt, most notably when we set the decrementer to 1 in
arch_irq_work_raise().  The effect of not having the sanity check is that
if timer_interrupt() gets called early, we leave the decrementer set to
its maximum value, which means we then don't get any more decrementer
interrupts for about 4 seconds (or longer, depending on timebase
frequency).  I saw these pauses as a consequence of getting a stray
hypervisor decrementer interrupt left over from exiting a KVM guest.

This isn't quite a straight revert because of changes to the surrounding
code, but it restores the same algorithm as was previously used.

Cc: stable@kernel.org
Cc: Anton Blanchard <anton@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
If there are no objections, I'll send this to Linus shortly.  This
regression is present in 3.3 and 3.4 as well as current upstream.

 arch/powerpc/kernel/time.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 99a995c..be171ee 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -475,6 +475,7 @@ void timer_interrupt(struct pt_regs * regs)
 	struct pt_regs *old_regs;
 	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
 	struct clock_event_device *evt = &__get_cpu_var(decrementers);
+	u64 now;

 	/* Ensure a positive value is written to the decrementer, or else
 	 * some CPUs will continue to take decrementer exceptions.
@@ -509,9 +510,16 @@ void timer_interrupt(struct pt_regs * regs)
 		irq_work_run();
 	}

-	*next_tb = ~(u64)0;
-	if (evt->event_handler)
-		evt->event_handler(evt);
+	now = get_tb_or_rtc();
+	if (now >= *next_tb) {
+		*next_tb = ~(u64)0;
+		if (evt->event_handler)
+			evt->event_handler(evt);
+	} else {
+		now = *next_tb - now;
+		if (now <= DECREMENTER_MAX)
+			set_dec((int)now);
+	}

 #ifdef CONFIG_PPC64
 	/* collect purr register values often, for accurate calculations */
-- 
1.7.10

^ permalink raw reply related

* power management patch set for mpc85xx
From: Zhao Chenhui-B35336 @ 2012-06-01 10:29 UTC (permalink / raw)
  To: benh@kernel.crashing.org, paulus@samba.org
  Cc: linuxppc-dev@lists.ozlabs.org, Li Yang-R58472

Hi Ben and Paul,

I am sorry to trouble you. It seems that Kumar is busy recently.

Could you have a review on the following patches? These patches
implement the power management support on MPC85xx platform.

http://patchwork.ozlabs.org/patch/158484/
http://patchwork.ozlabs.org/patch/158485/
http://patchwork.ozlabs.org/patch/158487/
http://patchwork.ozlabs.org/patch/158486/
http://patchwork.ozlabs.org/patch/158488/

Thanks.

Best Regards,
Chenhui

^ permalink raw reply

* Re: Re[2]: [RFC] [PATCH] powerpc: Add MSR_DE to MSR_KERNEL
From: Joakim Tjernlund @ 2012-06-01 10:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Dan Malek, Bob Cochran, Support
In-Reply-To: <1338541971.16119.47.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 2012/06/01 11:12:51:
>
> On Thu, 2012-05-31 at 11:05 +0200, Joakim Tjernlund wrote:
> > Abatron Support <support@abatron.ch> wrote on 2012/05/30 14:08:26:
> > >
> > > >> I have tested this briefly with BDI2000 on P2010(e500) and
> > > >> it works for me. I don't know if there are any bad side effects,
> > > >> therfore
> > > >> this RFC.
> > >
> > > > We used to have MSR_DE surrounded by CONFIG_something
> > > > to ensure it wasn't set under normal operation.  IIRC, if MSR_DE
> > > > is set, you will have problems with software debuggers that
> > > > utilize the the debugging registers in the chip itself.  You only want
> > > > to force this to be set when using the BDI, not at other times.
> > >
> > > This MSR_DE is also of interest and used for software debuggers that
> > > make use of the debug registers. Only if MSR_DE is set then debug
> > > interrupts are generated. If a debug event leads to a debug interrupt
> > > handled by a software debugger or if it leads to a debug halt handled
> > > by a JTAG tool is selected with DBCR0_EDM / DBCR0_IDM.
> > >
> > > The "e500 Core Family Reference Manual" chapter "Chapter 8
> > > Debug Support" explains in detail the effect of MSR_DE.
> >
> > So what is the verdict on this? I don't buy into Dan argument without some
> > hard data.
>
> The kernel normally controls when to set or not set MSR:DE, at least
> when using SW breakpoints. Setting it globally should remain some kind
> of specific debug option.
>
> In fact on some CPUs, we even leave user set dbcr settings and rely on
> DE being off in kernel space to avoid user->kernel attacks via the debug
> registers (I think we still do that on 64-bit BookE though it should
> eventually change).

hmm, would it not be better to always clear out/control dbcr settings and always have MSR:DE
on? It would be much easier to control dbcr, even dynamically, than MSR:DE

 Jocke

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox