* [PATCH V2 00/10] Reduce the pte fragment size.
From: Aneesh Kumar K.V @ 2015-11-23 10:33 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov
  Cc: linuxppc-dev, Aneesh Kumar K.V

Hi,

This patch series updates the 4K subpage tracking in the PTE page, thereby
reducing the PTE fragment size. This results in us allocating fewer
pgtable_t pages for an application. One side effect is that we now make an
hcall to find out whether a 4K subpage is present in the hash page table or
not. We try to optimize that in the patch "powerpc/mm: Optimize the hashed
subpage iteration".
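
For reference, a back-of-the-envelope sketch of the fragment arithmetic (the
derivation via PAGE_SHIFT below is only illustrative; the patches define
PTE_FRAG_NR directly):

	/*
	 * 64K base page, so PAGE_SHIFT == 16:
	 *   before: PTE_FRAG_SIZE_SHIFT = 13 -> 8K fragments -> 64K / 8K  =  8 per page
	 *   after:  PTE_FRAG_SIZE_SHIFT = 11 -> 2K fragments -> 64K / 2K  = 32 per page
	 */
	#define PAGE_SHIFT		16
	#define PTE_FRAG_SIZE_SHIFT	11	/* new value from this series */
	#define PTE_FRAG_NR		(1UL << (PAGE_SHIFT - PTE_FRAG_SIZE_SHIFT)) /* == 32 */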

Changes from V1:
* rebased on top of 4.3 + change pte format series
* Use H_READ_4 so that we read 4 hpte slot entries in a single hcall.

Aneesh Kumar K.V (10):
  powerpc/mm: Don't hardcode page table size
  powerpc/mm: Don't hardcode the hash pte slot shift
  powerpc/nohash: Update 64K nohash config to have 32 pte fragments
  powerpc/nohash: we don't use real_pte_t for nohash
  powerpc/mm: Use H_READ with H_READ_4
  powerpc/mm: Don't track 4k subpage information with 64k linux page
    size
  powerpc/mm: update PTE frag size
  powerpc/mm: Update pte_iterate_hashed_subpages args
  powerpc/mm: Drop real_pte_t usage
  powerpc/mm: Optimize the hashed subpage iteration

 arch/powerpc/include/asm/book3s/64/hash-64k.h    |  82 ++++++++-----------
 arch/powerpc/include/asm/book3s/64/pgtable.h     |  35 ++++----
 arch/powerpc/include/asm/machdep.h               |   1 +
 arch/powerpc/include/asm/nohash/64/pgtable-64k.h |  21 ++++-
 arch/powerpc/include/asm/nohash/64/pgtable.h     |  33 --------
 arch/powerpc/include/asm/page.h                  |  15 ----
 arch/powerpc/include/asm/pgalloc-64.h            |  10 ---
 arch/powerpc/include/asm/plpar_wrappers.h        |  17 ++++
 arch/powerpc/include/asm/tlbflush.h              |   4 +-
 arch/powerpc/mm/hash64_64k.c                     | 100 ++++++++++++++---------
 arch/powerpc/mm/hash_native_64.c                 |  55 +++++++++++--
 arch/powerpc/mm/hash_utils_64.c                  |  13 +--
 arch/powerpc/mm/init_64.c                        |   7 +-
 arch/powerpc/mm/pgtable_64.c                     |   6 +-
 arch/powerpc/mm/tlb_hash64.c                     |  15 ++--
 arch/powerpc/platforms/pseries/lpar.c            |  90 +++++++++++++-------
 16 files changed, 279 insertions(+), 225 deletions(-)

-- 
2.5.0

* [PATCH V2 01/10] powerpc/mm: Don't hardcode page table size
From: Aneesh Kumar K.V @ 2015-11-23 10:33 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov
  Cc: linuxppc-dev, Aneesh Kumar K.V

The pte and pmd table sizes are dependent on config items. Don't
hardcode them. This makes sure we use the right value when masking
pmd entries and also when checking pmd_bad.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h    | 30 ++++++++++++++++++------
 arch/powerpc/include/asm/nohash/64/pgtable-64k.h | 22 +++++++++++++----
 arch/powerpc/include/asm/pgalloc-64.h            | 10 --------
 arch/powerpc/mm/init_64.c                        |  4 ----
 4 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 957d66d13a97..565f9418c25f 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -25,12 +25,6 @@
 #define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
 #define PGDIR_MASK	(~(PGDIR_SIZE-1))
 
-/* Bits to mask out from a PMD to get to the PTE page */
-/* PMDs point to PTE table fragments which are 4K aligned.  */
-#define PMD_MASKED_BITS		0xfff
-/* Bits to mask out from a PGD/PUD to get to the PMD page */
-#define PUD_MASKED_BITS		0x1ff
-
 #define _PAGE_COMBO	0x00020000 /* this is a combo 4k page */
 #define _PAGE_4K_PFN	0x00040000 /* PFN is for a single 4k page */
 
@@ -44,6 +38,24 @@
  * of addressable physical space, or 46 bits for the special 4k PFNs.
  */
 #define PTE_RPN_SHIFT	(30)
+/*
+ * we support 8 fragments per PTE page of 64K size.
+ */
+#define PTE_FRAG_NR	8
+/*
+ * We use a 2K PTE page fragment and another 4K for storing
+ * real_pte_t hash index. Rounding the entire thing to 8K
+ */
+#define PTE_FRAG_SIZE_SHIFT  13
+#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
+
+/*
+ * Bits to mask out from a PMD to get to the PTE page
+ * PMDs point to PTE table fragments which are PTE_FRAG_SIZE aligned.
+ */
+#define PMD_MASKED_BITS		(PTE_FRAG_SIZE - 1)
+/* Bits to mask out from a PGD/PUD to get to the PMD page */
+#define PUD_MASKED_BITS		0x1ff
 
 #ifndef __ASSEMBLY__
 
@@ -112,8 +124,12 @@ static inline bool __rpte_sub_valid(real_pte_t rpte, unsigned long index)
 		remap_pfn_range((vma), (addr), (pfn), PAGE_SIZE,	\
 			__pgprot(pgprot_val((prot)) | _PAGE_4K_PFN)))
 
-#define PTE_TABLE_SIZE	(sizeof(real_pte_t) << PTE_INDEX_SIZE)
+#define PTE_TABLE_SIZE	PTE_FRAG_SIZE
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define PMD_TABLE_SIZE	((sizeof(pmd_t) << PMD_INDEX_SIZE) + (sizeof(unsigned long) << PMD_INDEX_SIZE))
+#else
 #define PMD_TABLE_SIZE	(sizeof(pmd_t) << PMD_INDEX_SIZE)
+#endif
 #define PGD_TABLE_SIZE	(sizeof(pgd_t) << PGD_INDEX_SIZE)
 
 #define pgd_pte(pgd)	(pud_pte(((pud_t){ pgd })))
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
index a44660d76096..1d8e26e8167b 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
@@ -9,8 +9,20 @@
 #define PUD_INDEX_SIZE	0
 #define PGD_INDEX_SIZE  12
 
+/*
+ * we support 8 fragments per PTE page of 64K size
+ */
+#define PTE_FRAG_NR	8
+/*
+ * We use a 2K PTE page fragment and another 4K for storing
+ * real_pte_t hash index. Rounding the entire thing to 8K
+ */
+#define PTE_FRAG_SIZE_SHIFT  13
+#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
+
+
 #ifndef __ASSEMBLY__
-#define PTE_TABLE_SIZE	(sizeof(real_pte_t) << PTE_INDEX_SIZE)
+#define PTE_TABLE_SIZE	PTE_FRAG_SIZE
 #define PMD_TABLE_SIZE	(sizeof(pmd_t) << PMD_INDEX_SIZE)
 #define PGD_TABLE_SIZE	(sizeof(pgd_t) << PGD_INDEX_SIZE)
 #endif	/* __ASSEMBLY__ */
@@ -32,9 +44,11 @@
 #define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
 #define PGDIR_MASK	(~(PGDIR_SIZE-1))
 
-/* Bits to mask out from a PMD to get to the PTE page */
-/* PMDs point to PTE table fragments which are 4K aligned.  */
-#define PMD_MASKED_BITS		0xfff
+/*
+ * Bits to mask out from a PMD to get to the PTE page
+ * PMDs point to PTE table fragments which are PTE_FRAG_SIZE aligned.
+ */
+#define PMD_MASKED_BITS		(PTE_FRAG_SIZE - 1)
 /* Bits to mask out from a PGD/PUD to get to the PMD page */
 #define PUD_MASKED_BITS		0x1ff
 
diff --git a/arch/powerpc/include/asm/pgalloc-64.h b/arch/powerpc/include/asm/pgalloc-64.h
index 4f1cc6c46728..69ef28a81733 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -163,16 +163,6 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
 }
 
 #else /* if CONFIG_PPC_64K_PAGES */
-/*
- * we support 8 fragments per PTE page.
- */
-#define PTE_FRAG_NR	8
-/*
- * We use a 2K PTE page fragment and another 4K for storing
- * real_pte_t hash index. Rounding the entire thing to 8K
- */
-#define PTE_FRAG_SIZE_SHIFT  13
-#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
 
 extern pte_t *page_table_alloc(struct mm_struct *, unsigned long, int);
 extern void page_table_free(struct mm_struct *, unsigned long *, int);
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index d747dd7bc90b..379a6a90644b 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -87,11 +87,7 @@ static void pgd_ctor(void *addr)
 
 static void pmd_ctor(void *addr)
 {
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	memset(addr, 0, PMD_TABLE_SIZE * 2);
-#else
 	memset(addr, 0, PMD_TABLE_SIZE);
-#endif
 }
 
 struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
-- 
2.5.0

* [PATCH V2 02/10] powerpc/mm: Don't hardcode the hash pte slot shift
From: Aneesh Kumar K.V @ 2015-11-23 10:33 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov
  Cc: linuxppc-dev, Aneesh Kumar K.V

Use the _PAGE_F_GIX_SHIFT #define instead of open-coding the shift value.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 2 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 2 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 565f9418c25f..681657cabbe4 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -71,7 +71,7 @@ static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long index)
 {
 	if ((pte_val(rpte.pte) & _PAGE_COMBO))
 		return (unsigned long) rpte.hidx[index] >> 4;
-	return (pte_val(rpte.pte) >> 12) & 0xf;
+	return (pte_val(rpte.pte) >> _PAGE_F_GIX_SHIFT) & 0xf;
 }
 
 static inline pte_t __rpte_to_pte(real_pte_t rpte)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 0b43ca60dcb9..64ef7316ff88 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -50,7 +50,7 @@
 #define __real_pte(a,e,p)	(e)
 #define __rpte_to_pte(r)	(__pte(r))
 #endif
-#define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >> 12)
+#define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >>_PAGE_F_GIX_SHIFT)
 
 #define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)       \
 	do {							         \
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index c4dff4d41c26..8969b4c93c4f 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -121,7 +121,7 @@
 #define __real_pte(a,e,p)	(e)
 #define __rpte_to_pte(r)	(__pte(r))
 #endif
-#define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >> 12)
+#define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >> _PAGE_F_GIX_SHIFT)
 
 #define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)       \
 	do {							         \
-- 
2.5.0

* [PATCH V2 03/10] powerpc/nohash: Update 64K nohash config to have 32 pte fragments
From: Aneesh Kumar K.V @ 2015-11-23 10:33 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov
  Cc: linuxppc-dev, Aneesh Kumar K.V

Nohash configs don't need to track 4K subpage slot details and hence don't
need the second half of pgtable_t.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/nohash/64/pgtable-64k.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
index 1d8e26e8167b..dbd9de9264c2 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
@@ -10,14 +10,14 @@
 #define PGD_INDEX_SIZE  12
 
 /*
- * we support 8 fragments per PTE page of 64K size
+ * we support 32 fragments per PTE page of 64K size
  */
-#define PTE_FRAG_NR	8
+#define PTE_FRAG_NR	32
 /*
  * We use a 2K PTE page fragment and another 4K for storing
  * real_pte_t hash index. Rounding the entire thing to 8K
  */
-#define PTE_FRAG_SIZE_SHIFT  13
+#define PTE_FRAG_SIZE_SHIFT  11
 #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
 
 
-- 
2.5.0

* [PATCH V2 04/10] powerpc/nohash: we don't use real_pte_t for nohash
From: Aneesh Kumar K.V @ 2015-11-23 10:33 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov
  Cc: linuxppc-dev, Aneesh Kumar K.V

Remove the related functions and #defines

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/nohash/64/pgtable.h | 33 ----------------------------
 1 file changed, 33 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 8969b4c93c4f..b9f734dd5b81 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -106,39 +106,6 @@
 #endif /* CONFIG_PPC_MM_SLICES */
 
 #ifndef __ASSEMBLY__
-
-/*
- * This is the default implementation of various PTE accessors, it's
- * used in all cases except Book3S with 64K pages where we have a
- * concept of sub-pages
- */
-#ifndef __real_pte
-
-#ifdef CONFIG_STRICT_MM_TYPECHECKS
-#define __real_pte(a,e,p)	((real_pte_t){(e)})
-#define __rpte_to_pte(r)	((r).pte)
-#else
-#define __real_pte(a,e,p)	(e)
-#define __rpte_to_pte(r)	(__pte(r))
-#endif
-#define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >> _PAGE_F_GIX_SHIFT)
-
-#define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)       \
-	do {							         \
-		index = 0;					         \
-		shift = mmu_psize_defs[psize].shift;		         \
-
-#define pte_iterate_hashed_end() } while(0)
-
-/*
- * We expect this to be called only for user addresses or kernel virtual
- * addresses other than the linear mapping.
- */
-#define pte_pagesize_index(mm, addr, pte)	MMU_PAGE_4K
-
-#endif /* __real_pte */
-
-
 /* pte_clear moved to later in this file */
 
 #define PMD_BAD_BITS		(PTE_TABLE_SIZE-1)
-- 
2.5.0

* [PATCH V2 05/10] powerpc/mm: Use H_READ with H_READ_4
From: Aneesh Kumar K.V @ 2015-11-23 10:33 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov
  Cc: linuxppc-dev, Aneesh Kumar K.V

This bulk-reads 4 hash pte slot entries in a single hcall and should reduce
the number of loop iterations when scanning a hash group.
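
A minimal caller sketch (it mirrors __pSeries_lpar_hpte_find() below and
assumes HPTES_PER_GROUP == 8, so scanning one hash group now costs 2 hcalls
instead of 8):

	struct {
		unsigned long pteh;
		unsigned long ptel;
	} ptes[4];
	unsigned long i;

	for (i = 0; i < HPTES_PER_GROUP; i += 4) {
		if (plpar_pte_read_4(0, hpte_group + i, (void *)ptes) != H_SUCCESS)
			continue;
		/* ptes[0..3] now hold 4 consecutive HPTE (v, r) pairs */
	}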

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/plpar_wrappers.h | 17 ++++++++++
 arch/powerpc/platforms/pseries/lpar.c     | 54 +++++++++++++++----------------
 2 files changed, 44 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index 67859edbf8fd..1b394247afc2 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -202,6 +202,23 @@ static inline long plpar_pte_read_raw(unsigned long flags, unsigned long ptex,
 }
 
 /*
+ * ptes must be 8*sizeof(unsigned long)
+ */
+static inline long plpar_pte_read_4(unsigned long flags, unsigned long ptex,
+				    unsigned long *ptes)
+
+{
+	long rc;
+	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+
+	rc = plpar_hcall9(H_READ, retbuf, flags | H_READ_4, ptex);
+
+	memcpy(ptes, retbuf, 8*sizeof(unsigned long));
+
+	return rc;
+}
+
+/*
  * plpar_pte_read_4_raw can be called in real mode.
  * ptes must be 8*sizeof(unsigned long)
  */
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 6d46547871aa..477290ad855e 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -315,48 +315,48 @@ static long pSeries_lpar_hpte_updatepp(unsigned long slot,
 	return 0;
 }
 
-static unsigned long pSeries_lpar_hpte_getword0(unsigned long slot)
+static long __pSeries_lpar_hpte_find(unsigned long want_v, unsigned long hpte_group)
 {
-	unsigned long dword0;
-	unsigned long lpar_rc;
-	unsigned long dummy_word1;
-	unsigned long flags;
+	long lpar_rc;
+	unsigned long i, j;
+	struct {
+		unsigned long pteh;
+		unsigned long ptel;
+	} ptes[4];
 
-	/* Read 1 pte at a time                        */
-	/* Do not need RPN to logical page translation */
-	/* No cross CEC PFT access                     */
-	flags = 0;
+	for (i = 0; i < HPTES_PER_GROUP; i += 4, hpte_group += 4) {
 
-	lpar_rc = plpar_pte_read(flags, slot, &dword0, &dummy_word1);
+		lpar_rc = plpar_pte_read_4(0, hpte_group, (void *)ptes);
+		if (lpar_rc != H_SUCCESS)
+			continue;
 
-	BUG_ON(lpar_rc != H_SUCCESS);
+		for (j = 0; j < 4; j++) {
+			if (HPTE_V_COMPARE(ptes[j].pteh, want_v) &&
+			    (ptes[j].pteh & HPTE_V_VALID))
+				return i + j;
+		}
+	}
 
-	return dword0;
+	return -1;
 }
 
 static long pSeries_lpar_hpte_find(unsigned long vpn, int psize, int ssize)
 {
-	unsigned long hash;
-	unsigned long i;
 	long slot;
-	unsigned long want_v, hpte_v;
+	unsigned long hash;
+	unsigned long want_v;
+	unsigned long hpte_group;
 
 	hash = hpt_hash(vpn, mmu_psize_defs[psize].shift, ssize);
 	want_v = hpte_encode_avpn(vpn, psize, ssize);
 
 	/* Bolted entries are always in the primary group */
-	slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-	for (i = 0; i < HPTES_PER_GROUP; i++) {
-		hpte_v = pSeries_lpar_hpte_getword0(slot);
-
-		if (HPTE_V_COMPARE(hpte_v, want_v) && (hpte_v & HPTE_V_VALID))
-			/* HPTE matches */
-			return slot;
-		++slot;
-	}
-
-	return -1;
-} 
+	hpte_group = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+	slot = __pSeries_lpar_hpte_find(want_v, hpte_group);
+	if (slot < 0)
+		return -1;
+	return hpte_group + slot;
+}
 
 static void pSeries_lpar_hpte_updateboltedpp(unsigned long newpp,
 					     unsigned long ea,
-- 
2.5.0

* [PATCH V2 06/10] powerpc/mm: Don't track 4k subpage information with 64k linux page size
From: Aneesh Kumar K.V @ 2015-11-23 10:33 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov
  Cc: linuxppc-dev, Aneesh Kumar K.V

We now search the hash table to find the slot information. This slows down
the lookup, but we only do that for the 4K subpage (_PAGE_COMBO) case.
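
The value returned by the new ppc_md.get_hpte_slot() hook is consumed the
same way the old hidx was; a minimal sketch, mirroring the callers in this
patch (_PTEIDX_* come from the existing hash MMU headers):

	long hidx;

	hidx = ppc_md.get_hpte_slot(want_v, hash);
	if (hidx < 0)
		return;				/* not present in the hash table */
	if (hidx & _PTEIDX_SECONDARY)		/* bit 3: entry is in the secondary group */
		hash = ~hash;
	slot  = (hash & htab_hash_mask) * HPTES_PER_GROUP;
	slot += hidx & _PTEIDX_GROUP_IX;	/* bits 0-2: slot index within the group */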

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 33 ++----------------
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 11 +++++-
 arch/powerpc/include/asm/machdep.h            |  1 +
 arch/powerpc/include/asm/page.h               |  4 +--
 arch/powerpc/mm/hash64_64k.c                  | 44 ++++++++++++++----------
 arch/powerpc/mm/hash_native_64.c              | 49 ++++++++++++++++++++++++++-
 arch/powerpc/mm/hash_utils_64.c               |  5 ++-
 arch/powerpc/mm/pgtable_64.c                  |  6 ++--
 arch/powerpc/platforms/pseries/lpar.c         | 30 +++++++++++++++-
 9 files changed, 127 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 681657cabbe4..5062c6d423fd 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -67,51 +67,22 @@
  */
 #define __real_pte __real_pte
 extern real_pte_t __real_pte(unsigned long addr, pte_t pte, pte_t *ptep);
-static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long index)
-{
-	if ((pte_val(rpte.pte) & _PAGE_COMBO))
-		return (unsigned long) rpte.hidx[index] >> 4;
-	return (pte_val(rpte.pte) >> _PAGE_F_GIX_SHIFT) & 0xf;
-}
-
+extern unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long hash,
+				    unsigned long vpn, int ssize, bool *valid);
 static inline pte_t __rpte_to_pte(real_pte_t rpte)
 {
 	return rpte.pte;
 }
 /*
- * we look at the second half of the pte page to determine whether
- * the sub 4k hpte is valid. We use 8 bits per each index, and we have
- * 16 index mapping full 64K page. Hence for each
- * 64K linux page we use 128 bit from the second half of pte page.
- * The encoding in the second half of the page is as below:
- * [ index 15 ] .........................[index 0]
- * [bit 127 ..................................bit 0]
- * fomat of each index
- * bit 7 ........ bit0
- * [one bit secondary][ 3 bit hidx][1 bit valid][000]
- */
-static inline bool __rpte_sub_valid(real_pte_t rpte, unsigned long index)
-{
-	unsigned char index_val = rpte.hidx[index];
-
-	if ((index_val >> 3) & 0x1)
-		return true;
-	return false;
-}
-
-/*
  * Trick: we set __end to va + 64k, which happens works for
  * a 16M page as well as we want only one iteration
  */
 #define pte_iterate_hashed_subpages(rpte, psize, vpn, index, shift)	\
 	do {								\
 		unsigned long __end = vpn + (1UL << (PAGE_SHIFT - VPN_SHIFT));	\
-		unsigned __split = (psize == MMU_PAGE_4K ||		\
-				    psize == MMU_PAGE_64K_AP);		\
 		shift = mmu_psize_defs[psize].shift;			\
 		for (index = 0; vpn < __end; index++,			\
 			     vpn += (1L << (shift - VPN_SHIFT))) {	\
-			if (!__split || __rpte_sub_valid(rpte, index))	\
 				do {
 
 #define pte_iterate_hashed_end() } while(0); } } while(0)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 64ef7316ff88..875b2ca3d0a9 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -50,7 +50,16 @@
 #define __real_pte(a,e,p)	(e)
 #define __rpte_to_pte(r)	(__pte(r))
 #endif
-#define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >>_PAGE_F_GIX_SHIFT)
+static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long hash,
+					   unsigned long vpn, int ssize, bool *valid)
+{
+	*valid = false;
+	if (pte_val(__rpte_to_pte(rpte)) & _PAGE_HASHPTE) {
+		*valid = true;
+		return (pte_val(__rpte_to_pte(rpte)) >> _PAGE_F_GIX_SHIFT) & 0xf;
+	}
+	return 0;
+}
 
 #define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)       \
 	do {							         \
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 3f191f573d4f..0ad49663fc87 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -61,6 +61,7 @@ struct machdep_calls {
 					       unsigned long addr,
 					       unsigned char *hpte_slot_array,
 					       int psize, int ssize, int local);
+	long (*get_hpte_slot)(unsigned long want_v, unsigned long hash);
 	/*
 	 * Special for kexec.
 	 * To be called in real mode with interrupts disabled. No locks are
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index f63b2761cdd0..bbdf9e6cc8b1 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -295,7 +295,7 @@ static inline pte_basic_t pte_val(pte_t x)
  * the "second half" part of the PTE for pseudo 64k pages
  */
 #if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned char *hidx; } real_pte_t;
+typedef struct { pte_t pte; } real_pte_t;
 #else
 typedef struct { pte_t pte; } real_pte_t;
 #endif
@@ -347,7 +347,7 @@ static inline pte_basic_t pte_val(pte_t pte)
 }
 
 #if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned char *hidx; } real_pte_t;
+typedef struct { pte_t pte; } real_pte_t;
 #else
 typedef pte_t real_pte_t;
 #endif
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 84867a1491a2..983574e6f8d5 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -18,29 +18,40 @@
 
 real_pte_t __real_pte(unsigned long addr, pte_t pte, pte_t *ptep)
 {
-	int indx;
 	real_pte_t rpte;
-	pte_t *pte_headp;
 
 	rpte.pte = pte;
-	rpte.hidx = NULL;
-	if (pte_val(pte) & _PAGE_COMBO) {
-		indx = pte_index(addr);
-		pte_headp = ptep - indx;
-		/*
-		 * Make sure we order the hidx load against the _PAGE_COMBO
-		 * check. The store side ordering is done in __hash_page_4K
-		 */
-		smp_rmb();
-		rpte.hidx = (unsigned char *)(pte_headp + PTRS_PER_PTE) + (16 * indx);
-	}
 	return rpte;
 }
 
+unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long hash,
+			     unsigned long vpn, int ssize, bool *valid)
+{
+	long slot;
+	unsigned long want_v;
+
+	*valid = false;
+	if ((pte_val(rpte.pte) & _PAGE_COMBO)) {
+
+		want_v = hpte_encode_avpn(vpn, MMU_PAGE_4K, ssize);
+		slot = ppc_md.get_hpte_slot(want_v, hash);
+		if (slot < 0)
+			return 0;
+		*valid = true;
+		return slot;
+	}
+	if (pte_val(rpte.pte) & _PAGE_HASHPTE) {
+		*valid = true;
+		return (pte_val(rpte.pte) >> _PAGE_F_GIX_SHIFT) & 0xf;
+	}
+	return 0;
+}
+
 int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 		   pte_t *ptep, unsigned long trap, unsigned long flags,
 		   int ssize, int subpg_prot)
 {
+	bool valid_slot;
 	real_pte_t rpte;
 	unsigned long hpte_group;
 	unsigned int subpg_index;
@@ -111,11 +122,11 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 	/*
 	 * Check for sub page valid and update
 	 */
-	if (__rpte_sub_valid(rpte, subpg_index)) {
+	hash = hpt_hash(vpn, shift, ssize);
+	hidx = __rpte_to_hidx(rpte, hash, vpn, ssize, &valid_slot);
+	if (valid_slot) {
 		int ret;
 
-		hash = hpt_hash(vpn, shift, ssize);
-		hidx = __rpte_to_hidx(rpte, subpg_index);
 		if (hidx & _PTEIDX_SECONDARY)
 			hash = ~hash;
 		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
@@ -191,7 +202,6 @@ repeat:
 	 * Since we have _PAGE_BUSY set on ptep, we can be sure
 	 * nobody is undating hidx.
 	 */
-	rpte.hidx[subpg_index] = (unsigned char)(slot << 4 | 0x1 << 3);
 	new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE | _PAGE_COMBO;
 	/*
 	 * check __real_pte for details on matching smp_rmb()
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 8eaac81347fd..63074bc031b1 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -659,12 +659,16 @@ static void native_flush_hash_range(unsigned long number, int local)
 	local_irq_save(flags);
 
 	for (i = 0; i < number; i++) {
+		bool valid_slot;
+
 		vpn = batch->vpn[i];
 		pte = batch->pte[i];
 
 		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
 			hash = hpt_hash(vpn, shift, ssize);
-			hidx = __rpte_to_hidx(pte, index);
+			hidx = __rpte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
+			if (!valid_slot)
+				continue;
 			if (hidx & _PTEIDX_SECONDARY)
 				hash = ~hash;
 			slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
@@ -690,6 +694,9 @@ static void native_flush_hash_range(unsigned long number, int local)
 
 			pte_iterate_hashed_subpages(pte, psize,
 						    vpn, index, shift) {
+				/*
+				 * We are not looking at subpage valid here
+				 */
 				__tlbiel(vpn, psize, psize, ssize);
 			} pte_iterate_hashed_end();
 		}
@@ -707,6 +714,9 @@ static void native_flush_hash_range(unsigned long number, int local)
 
 			pte_iterate_hashed_subpages(pte, psize,
 						    vpn, index, shift) {
+				/*
+				 * We are not looking at subpage valid here
+				 */
 				__tlbie(vpn, psize, psize, ssize);
 			} pte_iterate_hashed_end();
 		}
@@ -719,6 +729,42 @@ static void native_flush_hash_range(unsigned long number, int local)
 	local_irq_restore(flags);
 }
 
+/*
+ * return the slot (3 bits) details in a hash pte group. For secondary
+ * hash we also set the top bit.
+ */
+static long native_get_hpte_slot(unsigned long want_v, unsigned long hash)
+{
+	int i;
+	unsigned long slot;
+	unsigned long hpte_v;
+	struct hash_pte *hptep;
+
+	/*
+	 * try primary first
+	 */
+	slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+	for (i = 0; i < HPTES_PER_GROUP; i++) {
+		hptep = htab_address + slot;
+		hpte_v = be64_to_cpu(hptep->v);
+		if (HPTE_V_COMPARE(hpte_v, want_v) && (hpte_v & HPTE_V_VALID))
+			return i;
+		++slot;
+	}
+
+	/* try secondary */
+	slot = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
+	for (i = 0; i < HPTES_PER_GROUP; i++) {
+		hptep = htab_address + slot;
+		hpte_v = be64_to_cpu(hptep->v);
+		if (HPTE_V_COMPARE(hpte_v, want_v) && (hpte_v & HPTE_V_VALID))
+			/* Add secondary bit */
+			return i | (1 << 3);
+		++slot;
+	}
+	return -1;
+}
+
 void __init hpte_init_native(void)
 {
 	ppc_md.hpte_invalidate	= native_hpte_invalidate;
@@ -729,4 +775,5 @@ void __init hpte_init_native(void)
 	ppc_md.hpte_clear_all	= native_hpte_clear;
 	ppc_md.flush_hash_range = native_flush_hash_range;
 	ppc_md.hugepage_invalidate   = native_hugepage_invalidate;
+	ppc_md.get_hpte_slot	= native_get_hpte_slot;
 }
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 3d261bc6fef8..f3d113b32c5e 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1297,13 +1297,16 @@ out_exit:
 void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize,
 		     unsigned long flags)
 {
+	bool valid_slot;
 	unsigned long hash, index, shift, hidx, slot;
 	int local = flags & HPTE_LOCAL_UPDATE;
 
 	DBG_LOW("flush_hash_page(vpn=%016lx)\n", vpn);
 	pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
 		hash = hpt_hash(vpn, shift, ssize);
-		hidx = __rpte_to_hidx(pte, index);
+		hidx = __rpte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
+		if (!valid_slot)
+			continue;
 		if (hidx & _PTEIDX_SECONDARY)
 			hash = ~hash;
 		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index ea6bc31debb0..835c6a4ded90 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -417,9 +417,11 @@ pte_t *page_table_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel)
 
 	pte = get_from_cache(mm);
 	if (pte)
-		return pte;
+		goto out;
 
-	return __alloc_for_cache(mm, kernel);
+	pte = __alloc_for_cache(mm, kernel);
+out:
+	return pte;
 }
 
 void page_table_free(struct mm_struct *mm, unsigned long *table, int kernel)
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 477290ad855e..828e298f6ce6 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -545,11 +545,15 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	ssize = batch->ssize;
 	pix = 0;
 	for (i = 0; i < number; i++) {
+		bool valid_slot;
+
 		vpn = batch->vpn[i];
 		pte = batch->pte[i];
 		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
 			hash = hpt_hash(vpn, shift, ssize);
-			hidx = __rpte_to_hidx(pte, index);
+			hidx = __rpte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
+			if (!valid_slot)
+				continue;
 			if (hidx & _PTEIDX_SECONDARY)
 				hash = ~hash;
 			slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
@@ -588,6 +592,29 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 		spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
 }
 
+static long pSeries_lpar_get_hpte_slot(unsigned long want_v, unsigned long hash)
+{
+	long slot;
+	unsigned long hpte_group;
+
+	/*
+	 * try primary first
+	 */
+	hpte_group = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+	slot = __pSeries_lpar_hpte_find(want_v, hpte_group);
+	if (slot >= 0)
+		return slot;
+
+	/* try secondary */
+	hpte_group = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
+	slot = __pSeries_lpar_hpte_find(want_v, hpte_group);
+	if (slot >= 0)
+		return slot | (1 << 3);
+	return -1;
+}
+
+
+
 static int __init disable_bulk_remove(char *str)
 {
 	if (strcmp(str, "off") == 0 &&
@@ -611,6 +638,7 @@ void __init hpte_init_lpar(void)
 	ppc_md.flush_hash_range	= pSeries_lpar_flush_hash_range;
 	ppc_md.hpte_clear_all   = pSeries_lpar_hptab_clear;
 	ppc_md.hugepage_invalidate = pSeries_lpar_hugepage_invalidate;
+	ppc_md.get_hpte_slot	= pSeries_lpar_get_hpte_slot;
 }
 
 #ifdef CONFIG_PPC_SMLPAR
-- 
2.5.0

* [PATCH V2 07/10] powerpc/mm: update PTE frag size
From: Aneesh Kumar K.V @ 2015-11-23 10:33 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov
  Cc: linuxppc-dev, Aneesh Kumar K.V

Now that we don't track 4K subpage information, we can use 2K PTE
fragments.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 5062c6d423fd..a28dbfe2baed 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -39,14 +39,14 @@
  */
 #define PTE_RPN_SHIFT	(30)
 /*
- * we support 8 fragments per PTE page of 64K size.
+ * we support 32 fragments per PTE page of 64K size.
  */
-#define PTE_FRAG_NR	8
+#define PTE_FRAG_NR	32
 /*
  * We use a 2K PTE page fragment and another 4K for storing
  * real_pte_t hash index. Rounding the entire thing to 8K
  */
-#define PTE_FRAG_SIZE_SHIFT  13
+#define PTE_FRAG_SIZE_SHIFT  11
 #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
 
 /*
-- 
2.5.0

* [PATCH V2 08/10] powerpc/mm: Update pte_iterate_hashed_subpages args
From: Aneesh Kumar K.V @ 2015-11-23 10:33 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov
  Cc: linuxppc-dev, Aneesh Kumar K.V

Now that we don't really use real_pte_t, drop it from the iterator argument
list. The follow-up patch will remove real_pte_t completely.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  5 +++--
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  7 +++----
 arch/powerpc/mm/hash_native_64.c              | 10 ++++------
 arch/powerpc/mm/hash_utils_64.c               |  6 +++---
 arch/powerpc/platforms/pseries/lpar.c         |  4 ++--
 5 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index a28dbfe2baed..19e0afb36fa8 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -77,9 +77,10 @@ static inline pte_t __rpte_to_pte(real_pte_t rpte)
  * Trick: we set __end to va + 64k, which happens works for
  * a 16M page as well as we want only one iteration
  */
-#define pte_iterate_hashed_subpages(rpte, psize, vpn, index, shift)	\
+#define pte_iterate_hashed_subpages(vpn, psize, shift)			\
 	do {								\
-		unsigned long __end = vpn + (1UL << (PAGE_SHIFT - VPN_SHIFT));	\
+		unsigned long index;					\
+		unsigned long __end = vpn + (1UL << (PAGE_SHIFT - VPN_SHIFT)); \
 		shift = mmu_psize_defs[psize].shift;			\
 		for (index = 0; vpn < __end; index++,			\
 			     vpn += (1L << (shift - VPN_SHIFT))) {	\
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 875b2ca3d0a9..63120d4025d7 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -61,10 +61,9 @@ static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long hash,
 	return 0;
 }
 
-#define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)       \
-	do {							         \
-		index = 0;					         \
-		shift = mmu_psize_defs[psize].shift;		         \
+#define pte_iterate_hashed_subpages(vpn, psize, shift)		\
+	do {							\
+		shift = mmu_psize_defs[psize].shift;		\
 
 #define pte_iterate_hashed_end() } while(0)
 
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 63074bc031b1..15c92279953d 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -645,7 +645,7 @@ static void native_hpte_clear(void)
 static void native_flush_hash_range(unsigned long number, int local)
 {
 	unsigned long vpn;
-	unsigned long hash, index, hidx, shift, slot;
+	unsigned long hash, hidx, shift, slot;
 	struct hash_pte *hptep;
 	unsigned long hpte_v;
 	unsigned long want_v;
@@ -664,7 +664,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 		vpn = batch->vpn[i];
 		pte = batch->pte[i];
 
-		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+		pte_iterate_hashed_subpages(vpn, psize, shift) {
 			hash = hpt_hash(vpn, shift, ssize);
 			hidx = __rpte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
 			if (!valid_slot)
@@ -692,8 +692,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 			vpn = batch->vpn[i];
 			pte = batch->pte[i];
 
-			pte_iterate_hashed_subpages(pte, psize,
-						    vpn, index, shift) {
+			pte_iterate_hashed_subpages(vpn, psize, shift) {
 				/*
 				 * We are not looking at subpage valid here
 				 */
@@ -712,8 +711,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 			vpn = batch->vpn[i];
 			pte = batch->pte[i];
 
-			pte_iterate_hashed_subpages(pte, psize,
-						    vpn, index, shift) {
+			pte_iterate_hashed_subpages(vpn, psize, shift) {
 				/*
 				 * We are not looking at subpage valid here
 				 */
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index f3d113b32c5e..99a9de74993e 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1298,11 +1298,11 @@ void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize,
 		     unsigned long flags)
 {
 	bool valid_slot;
-	unsigned long hash, index, shift, hidx, slot;
+	unsigned long hash, shift, hidx, slot;
 	int local = flags & HPTE_LOCAL_UPDATE;
 
 	DBG_LOW("flush_hash_page(vpn=%016lx)\n", vpn);
-	pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+	pte_iterate_hashed_subpages(vpn, psize, shift) {
 		hash = hpt_hash(vpn, shift, ssize);
 		hidx = __rpte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
 		if (!valid_slot)
@@ -1311,7 +1311,7 @@ void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize,
 			hash = ~hash;
 		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
 		slot += hidx & _PTEIDX_GROUP_IX;
-		DBG_LOW(" sub %ld: hash=%lx, hidx=%lx\n", index, slot, hidx);
+		DBG_LOW(" hash=%lx, hidx=%lx\n", slot, hidx);
 		/*
 		 * We use same base page size and actual psize, because we don't
 		 * use these functions for hugepage
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 828e298f6ce6..e3c20ea64ec8 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -534,7 +534,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
 	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
 	unsigned long param[9];
-	unsigned long hash, index, shift, hidx, slot;
+	unsigned long hash, shift, hidx, slot;
 	real_pte_t pte;
 	int psize, ssize;
 
@@ -549,7 +549,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 
 		vpn = batch->vpn[i];
 		pte = batch->pte[i];
-		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+		pte_iterate_hashed_subpages(vpn, psize, shift) {
 			hash = hpt_hash(vpn, shift, ssize);
 			hidx = __rpte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
 			if (!valid_slot)
-- 
2.5.0

* [PATCH V2 09/10] powerpc/mm: Drop real_pte_t usage
From: Aneesh Kumar K.V @ 2015-11-23 10:33 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov
  Cc: linuxppc-dev, Aneesh Kumar K.V

Now that we don't track 4K subpage slot details, get rid of real_pte_t.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h    | 15 +++--------
 arch/powerpc/include/asm/book3s/64/pgtable.h     | 33 ++++++++++--------------
 arch/powerpc/include/asm/nohash/64/pgtable-64k.h |  3 +--
 arch/powerpc/include/asm/page.h                  | 15 -----------
 arch/powerpc/include/asm/tlbflush.h              |  4 +--
 arch/powerpc/mm/hash64_64k.c                     | 29 ++++++---------------
 arch/powerpc/mm/hash_native_64.c                 |  4 +--
 arch/powerpc/mm/hash_utils_64.c                  |  4 +--
 arch/powerpc/mm/init_64.c                        |  3 +--
 arch/powerpc/mm/tlb_hash64.c                     | 15 +++++------
 arch/powerpc/platforms/pseries/lpar.c            |  4 +--
 11 files changed, 43 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 19e0afb36fa8..5f18801ae722 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -43,8 +43,7 @@
  */
 #define PTE_FRAG_NR	32
 /*
- * We use a 2K PTE page fragment and another 4K for storing
- * real_pte_t hash index. Rounding the entire thing to 8K
+ * We use a 2K PTE page fragment
  */
 #define PTE_FRAG_SIZE_SHIFT  11
 #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
@@ -58,21 +57,15 @@
 #define PUD_MASKED_BITS		0x1ff
 
 #ifndef __ASSEMBLY__
-
 /*
  * With 64K pages on hash table, we have a special PTE format that
  * uses a second "half" of the page table to encode sub-page information
  * in order to deal with 64K made of 4K HW pages. Thus we override the
  * generic accessors and iterators here
  */
-#define __real_pte __real_pte
-extern real_pte_t __real_pte(unsigned long addr, pte_t pte, pte_t *ptep);
-extern unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long hash,
-				    unsigned long vpn, int ssize, bool *valid);
-static inline pte_t __rpte_to_pte(real_pte_t rpte)
-{
-	return rpte.pte;
-}
+#define pte_to_hidx pte_to_hidx
+extern unsigned long pte_to_hidx(pte_t pte, unsigned long hash,
+				 unsigned long vpn, int ssize, bool *valid);
 /*
  * Trick: we set __end to va + 64k, which happens works for
  * a 16M page as well as we want only one iteration
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 63120d4025d7..74be69c8e5de 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -35,45 +35,40 @@
 #define __HAVE_ARCH_PTE_SPECIAL
 
 #ifndef __ASSEMBLY__
-
 /*
  * This is the default implementation of various PTE accessors, it's
  * used in all cases except Book3S with 64K pages where we have a
  * concept of sub-pages
  */
-#ifndef __real_pte
-
-#ifdef CONFIG_STRICT_MM_TYPECHECKS
-#define __real_pte(a,e,p)	((real_pte_t){(e)})
-#define __rpte_to_pte(r)	((r).pte)
-#else
-#define __real_pte(a,e,p)	(e)
-#define __rpte_to_pte(r)	(__pte(r))
-#endif
-static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long hash,
-					   unsigned long vpn, int ssize, bool *valid)
+#ifndef pte_to_hidx
+static inline unsigned long pte_to_hidx(pte_t pte, unsigned long hash,
+					unsigned long vpn, int ssize, bool *valid)
 {
 	*valid = false;
-	if (pte_val(__rpte_to_pte(rpte)) & _PAGE_HASHPTE) {
+	if (pte_val(pte) & _PAGE_HASHPTE) {
 		*valid = true;
-		return (pte_val(__rpte_to_pte(rpte)) >> _PAGE_F_GIX_SHIFT) & 0xf;
+		return (pte_val(pte) >> _PAGE_F_GIX_SHIFT) & 0xf;
 	}
 	return 0;
 }
+#define pte_to_hidx pte_to_hidx
+#endif
 
-#define pte_iterate_hashed_subpages(vpn, psize, shift)		\
-	do {							\
-		shift = mmu_psize_defs[psize].shift;		\
+#ifndef pte_iterate_hashed_subpages
+#define pte_iterate_hashed_subpages(vpn, psize, shift)	\
+	do {						\
+		shift = mmu_psize_defs[psize].shift;	\
 
 #define pte_iterate_hashed_end() } while(0)
+#endif
 
 /*
  * We expect this to be called only for user addresses or kernel virtual
  * addresses other than the linear mapping.
  */
+#ifndef pte_pagesize_index
 #define pte_pagesize_index(mm, addr, pte)	MMU_PAGE_4K
-
-#endif /* __real_pte */
+#endif
 
 static inline void pmd_set(pmd_t *pmdp, unsigned long val)
 {
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
index dbd9de9264c2..0f075799ae97 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
@@ -14,8 +14,7 @@
  */
 #define PTE_FRAG_NR	32
 /*
- * We use a 2K PTE page fragment and another 4K for storing
- * real_pte_t hash index. Rounding the entire thing to 8K
+ * We use a 2K PTE page fragment
  */
 #define PTE_FRAG_SIZE_SHIFT  11
 #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index bbdf9e6cc8b1..ac30cfd6f9c1 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -291,14 +291,6 @@ static inline pte_basic_t pte_val(pte_t x)
 	return x.pte;
 }
 
-/* 64k pages additionally define a bigger "real PTE" type that gathers
- * the "second half" part of the PTE for pseudo 64k pages
- */
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; } real_pte_t;
-#else
-typedef struct { pte_t pte; } real_pte_t;
-#endif
 
 /* PMD level */
 #ifdef CONFIG_PPC64
@@ -346,13 +338,6 @@ static inline pte_basic_t pte_val(pte_t pte)
 	return pte;
 }
 
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; } real_pte_t;
-#else
-typedef pte_t real_pte_t;
-#endif
-
-
 #ifdef CONFIG_PPC64
 typedef unsigned long pmd_t;
 #define __pmd(x)	(x)
diff --git a/arch/powerpc/include/asm/tlbflush.h b/arch/powerpc/include/asm/tlbflush.h
index 23d351ca0303..1a4824fabcad 100644
--- a/arch/powerpc/include/asm/tlbflush.h
+++ b/arch/powerpc/include/asm/tlbflush.h
@@ -94,7 +94,7 @@ struct ppc64_tlb_batch {
 	int			active;
 	unsigned long		index;
 	struct mm_struct	*mm;
-	real_pte_t		pte[PPC64_TLB_BATCH_NR];
+	pte_t			pte[PPC64_TLB_BATCH_NR];
 	unsigned long		vpn[PPC64_TLB_BATCH_NR];
 	unsigned int		psize;
 	int			ssize;
@@ -124,7 +124,7 @@ static inline void arch_leave_lazy_mmu_mode(void)
 #define arch_flush_lazy_mmu_mode()      do {} while (0)
 
 
-extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize,
+extern void flush_hash_page(unsigned long vpn, pte_t pte, int psize,
 			    int ssize, unsigned long flags);
 extern void flush_hash_range(unsigned long number, int local);
 extern void flush_hash_hugepage(unsigned long vsid, unsigned long addr,
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 983574e6f8d5..c0bed3d01c1c 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -16,23 +16,14 @@
 #include <asm/machdep.h>
 #include <asm/mmu.h>
 
-real_pte_t __real_pte(unsigned long addr, pte_t pte, pte_t *ptep)
-{
-	real_pte_t rpte;
-
-	rpte.pte = pte;
-	return rpte;
-}
-
-unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long hash,
-			     unsigned long vpn, int ssize, bool *valid)
+unsigned long pte_to_hidx(pte_t pte, unsigned long hash,
+			  unsigned long vpn, int ssize, bool *valid)
 {
 	long slot;
 	unsigned long want_v;
 
 	*valid = false;
-	if ((pte_val(rpte.pte) & _PAGE_COMBO)) {
-
+	if ((pte_val(pte) & _PAGE_COMBO)) {
 		want_v = hpte_encode_avpn(vpn, MMU_PAGE_4K, ssize);
 		slot = ppc_md.get_hpte_slot(want_v, hash);
 		if (slot < 0)
@@ -40,9 +31,9 @@ unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long hash,
 		*valid = true;
 		return slot;
 	}
-	if (pte_val(rpte.pte) & _PAGE_HASHPTE) {
+	if (pte_val(pte) & _PAGE_HASHPTE) {
 		*valid = true;
-		return (pte_val(rpte.pte) >> _PAGE_F_GIX_SHIFT) & 0xf;
+		return (pte_val(pte) >> _PAGE_F_GIX_SHIFT) & 0xf;
 	}
 	return 0;
 }
@@ -52,7 +43,6 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 		   int ssize, int subpg_prot)
 {
 	bool valid_slot;
-	real_pte_t rpte;
 	unsigned long hpte_group;
 	unsigned int subpg_index;
 	unsigned long rflags, pa, hidx;
@@ -101,10 +91,6 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 
 	subpg_index = (ea & (PAGE_SIZE - 1)) >> shift;
 	vpn  = hpt_vpn(ea, vsid, ssize);
-	if (!(old_pte & _PAGE_COMBO))
-		rpte = __real_pte(ea, __pte(old_pte | _PAGE_COMBO), ptep);
-	else
-		rpte = __real_pte(ea, __pte(old_pte), ptep);
 	/*
 	 *None of the sub 4k page is hashed
 	 */
@@ -115,7 +101,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 	 * as a 64k HW page, and invalidate the 64k HPTE if so.
 	 */
 	if (!(old_pte & _PAGE_COMBO)) {
-		flush_hash_page(vpn, rpte, MMU_PAGE_64K, ssize, flags);
+		flush_hash_page(vpn, __pte(old_pte), MMU_PAGE_64K, ssize, flags);
 		old_pte &= ~_PAGE_HASHPTE | _PAGE_F_GIX | _PAGE_F_SECOND;
 		goto htab_insert_hpte;
 	}
@@ -123,7 +109,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 	 * Check for sub page valid and update
 	 */
 	hash = hpt_hash(vpn, shift, ssize);
-	hidx = __rpte_to_hidx(rpte, hash, vpn, ssize, &valid_slot);
+	hidx = pte_to_hidx(__pte(old_pte), hash, vpn, ssize, &valid_slot);
 	if (valid_slot) {
 		int ret;
 
@@ -205,6 +191,7 @@ repeat:
 	new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE | _PAGE_COMBO;
 	/*
 	 * check __real_pte for details on matching smp_rmb()
+	 * FIXME!! We can possibly get rid of this ?
 	 */
 	smp_wmb();
 	*ptep = __pte(new_pte & ~_PAGE_BUSY);
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 15c92279953d..9f7f6673e726 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -650,7 +650,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 	unsigned long hpte_v;
 	unsigned long want_v;
 	unsigned long flags;
-	real_pte_t pte;
+	pte_t pte;
 	struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
 	unsigned long psize = batch->psize;
 	int ssize = batch->ssize;
@@ -666,7 +666,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 
 		pte_iterate_hashed_subpages(vpn, psize, shift) {
 			hash = hpt_hash(vpn, shift, ssize);
-			hidx = __rpte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
+			hidx = pte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
 			if (!valid_slot)
 				continue;
 			if (hidx & _PTEIDX_SECONDARY)
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 99a9de74993e..80e71ccc9474 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1294,7 +1294,7 @@ out_exit:
 /* WARNING: This is called from hash_low_64.S, if you change this prototype,
  *          do not forget to update the assembly call site !
  */
-void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize,
+void flush_hash_page(unsigned long vpn, pte_t pte, int psize, int ssize,
 		     unsigned long flags)
 {
 	bool valid_slot;
@@ -1304,7 +1304,7 @@ void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize,
 	DBG_LOW("flush_hash_page(vpn=%016lx)\n", vpn);
 	pte_iterate_hashed_subpages(vpn, psize, shift) {
 		hash = hpt_hash(vpn, shift, ssize);
-		hidx = __rpte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
+		hidx = pte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
 		if (!valid_slot)
 			continue;
 		if (hidx & _PTEIDX_SECONDARY)
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 379a6a90644b..6478c4970c2d 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -95,8 +95,7 @@ struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
 /*
  * Create a kmem_cache() for pagetables.  This is not used for PTE
  * pages - they're linked to struct page, come from the normal free
- * pages pool and have a different entry size (see real_pte_t) to
- * everything else.  Caches created by this function are used for all
+ * pages pool. Caches created by this function are used for all
  * the higher level pagetables, and for hugepage pagetables.
  */
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
diff --git a/arch/powerpc/mm/tlb_hash64.c b/arch/powerpc/mm/tlb_hash64.c
index dd0fd1783bcc..5fa78b1ab7d3 100644
--- a/arch/powerpc/mm/tlb_hash64.c
+++ b/arch/powerpc/mm/tlb_hash64.c
@@ -41,14 +41,14 @@ DEFINE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);
  * batch on it.
  */
 void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
-		     pte_t *ptep, unsigned long pte, int huge)
+		     pte_t *ptep, unsigned long ptev, int huge)
 {
 	unsigned long vpn;
 	struct ppc64_tlb_batch *batch = &get_cpu_var(ppc64_tlb_batch);
 	unsigned long vsid;
 	unsigned int psize;
 	int ssize;
-	real_pte_t rpte;
+	pte_t pte;
 	int i;
 
 	i = batch->index;
@@ -67,10 +67,10 @@ void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
 		addr &= ~((1UL << mmu_psize_defs[psize].shift) - 1);
 #else
 		BUG();
-		psize = pte_pagesize_index(mm, addr, pte); /* shutup gcc */
+		psize = pte_pagesize_index(mm, addr, ptev); /* shutup gcc */
 #endif
 	} else {
-		psize = pte_pagesize_index(mm, addr, pte);
+		psize = pte_pagesize_index(mm, addr, ptev);
 		/* Mask the address for the standard page size.  If we
 		 * have a 64k page kernel, but the hardware does not
 		 * support 64k pages, this might be different from the
@@ -89,8 +89,7 @@ void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
 	}
 	WARN_ON(vsid == 0);
 	vpn = hpt_vpn(addr, vsid, ssize);
-	rpte = __real_pte(addr, __pte(pte), ptep);
-
+	pte = __pte(ptev);
 	/*
 	 * Check if we have an active batch on this CPU. If not, just
 	 * flush now and return. For now, we don global invalidates
@@ -98,7 +97,7 @@ void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
 	 * and decide to use local invalidates instead...
 	 */
 	if (!batch->active) {
-		flush_hash_page(vpn, rpte, psize, ssize, 0);
+		flush_hash_page(vpn, pte, psize, ssize, 0);
 		put_cpu_var(ppc64_tlb_batch);
 		return;
 	}
@@ -123,7 +122,7 @@ void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
 		batch->psize = psize;
 		batch->ssize = ssize;
 	}
-	batch->pte[i] = rpte;
+	batch->pte[i] = pte;
 	batch->vpn[i] = vpn;
 	batch->index = ++i;
 	if (i >= PPC64_TLB_BATCH_NR)
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index e3c20ea64ec8..1708cab20fc8 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -535,7 +535,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
 	unsigned long param[9];
 	unsigned long hash, shift, hidx, slot;
-	real_pte_t pte;
+	pte_t pte;
 	int psize, ssize;
 
 	if (lock_tlbie)
@@ -551,7 +551,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 		pte = batch->pte[i];
 		pte_iterate_hashed_subpages(vpn, psize, shift) {
 			hash = hpt_hash(vpn, shift, ssize);
-			hidx = __rpte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
+			hidx = pte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
 			if (!valid_slot)
 				continue;
 			if (hidx & _PTEIDX_SECONDARY)
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V2 10/10] powerpc/mm: Optmize the hashed subpage iteration
  2015-11-23 10:33 [PATCH V2 00/10] Reduce the pte framgment size Aneesh Kumar K.V
                   ` (8 preceding siblings ...)
  2015-11-23 10:33 ` [PATCH V2 09/10] powerpc/mm: Drop real_pte_t usage Aneesh Kumar K.V
@ 2015-11-23 10:33 ` Aneesh Kumar K.V
  9 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2015-11-23 10:33 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov
  Cc: linuxppc-dev, Aneesh Kumar K.V

If we have _PAGE_COMBO set, we overload the _PAGE_F_GIX and
_PAGE_F_SECOND bits. Together that gives us 4 bits, each of them used
to indicate whether any of the four 4K subpages in that group is
valid, i.e.:

[ group 1 bit ]    [ group 2 bit ]   ..... [ group 4 bit ]
[ subpage 1 - 4 ]  [ subpage 5 - 8 ]  ..... [ subpage 13 - 16 ]
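
To make the grouping concrete, the subpage index maps to a group-valid
bit as below (an illustrative sketch with a made-up helper name, not
part of the patch; the real check is pte_or_subptegroup_valid() in the
diff):

    /* subpage index 0-15 -> one of the 4 group-valid bits */
    static inline unsigned long subpg_group_bit(unsigned long index)
    {
            /* subpages 0-3 -> bit 0, 4-7 -> bit 1, ..., 12-15 -> bit 3 */
            return 0x1UL << (index >> 2);
    }

mark_subptegroup_valid() in the diff then shifts this group bit up by
_PAGE_F_GIX_SHIFT before storing it in the pte.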

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  9 ++++-
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  6 ++--
 arch/powerpc/mm/hash64_64k.c                  | 51 ++++++++++++++++++++-------
 arch/powerpc/mm/hash_native_64.c              | 12 ++-----
 arch/powerpc/mm/hash_utils_64.c               |  2 +-
 arch/powerpc/platforms/pseries/lpar.c         |  2 +-
 6 files changed, 55 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 5f18801ae722..9ae5eb82fb85 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -27,6 +27,11 @@
 
 #define _PAGE_COMBO	0x00020000 /* this is a combo 4k page */
 #define _PAGE_4K_PFN	0x00040000 /* PFN is for a single 4k page */
+/*
+ * Used to track subpage group valid if _PAGE_COMBO is set
+ * This overloads _PAGE_F_GIX and _PAGE_F_SECOND
+ */
+#define _PAGE_COMBO_VALID	(_PAGE_F_GIX | _PAGE_F_SECOND)
 
 /* PTE flags to conserve for HPTE identification */
 #define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_F_SECOND | \
@@ -66,17 +71,19 @@
 #define pte_to_hidx pte_to_hidx
 extern unsigned long pte_to_hidx(pte_t pte, unsigned long hash,
 				 unsigned long vpn, int ssize, bool *valid);
+extern bool pte_or_subptegroup_valid(pte_t pte, unsigned long index);
 /*
  * Trick: we set __end to va + 64k, which happens works for
  * a 16M page as well as we want only one iteration
  */
-#define pte_iterate_hashed_subpages(vpn, psize, shift)			\
+#define pte_iterate_hashed_subpages(pte, vpn, psize, shift)		\
 	do {								\
 		unsigned long index;					\
 		unsigned long __end = vpn + (1UL << (PAGE_SHIFT - VPN_SHIFT)); \
 		shift = mmu_psize_defs[psize].shift;			\
 		for (index = 0; vpn < __end; index++,			\
 			     vpn += (1L << (shift - VPN_SHIFT))) {	\
+			if (pte_or_subptegroup_valid(pte, index))		\
 				do {
 
 #define pte_iterate_hashed_end() } while(0); } } while(0)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 74be69c8e5de..9000884cf715 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -55,9 +55,9 @@ static inline unsigned long pte_to_hidx(pte_t pte, unsigned long hash,
 #endif
 
 #ifndef pte_iterate_hashed_subpages
-#define pte_iterate_hashed_subpages(vpn, psize, shift)	\
-	do {						\
-		shift = mmu_psize_defs[psize].shift;	\
+#define pte_iterate_hashed_subpages(pte, vpn, psize, shift)	\
+        do {                                                    \
+                shift = mmu_psize_defs[psize].shift;            \
 
 #define pte_iterate_hashed_end() } while(0)
 #endif
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index c0bed3d01c1c..4c38ccd1e52f 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -16,6 +16,43 @@
 #include <asm/machdep.h>
 #include <asm/mmu.h>
 
+/*
+ * index from 0 - 15
+ */
+bool pte_or_subptegroup_valid(pte_t pte, unsigned long index)
+{
+	unsigned long ptev = pte_val(pte);
+
+	if (!(ptev & _PAGE_HASHPTE))
+		return false;
+	if (ptev & _PAGE_COMBO) {
+		unsigned long g_idx;
+
+		g_idx = (ptev & _PAGE_COMBO_VALID) >> _PAGE_F_GIX_SHIFT;
+		index = index >> 2;
+		if (g_idx & (0x1 << index))
+			return true;
+		else
+			return false;
+	}
+	return true;
+}
+
+/*
+ * index from 0 - 15
+ */
+static unsigned long mark_subptegroup_valid(unsigned long ptev, unsigned long index)
+{
+	unsigned long g_idx;
+
+	if (!(ptev & _PAGE_COMBO))
+		return ptev;
+	index = index >> 2;
+	g_idx = 0x1 << index;
+
+	return ptev | (g_idx << _PAGE_F_GIX_SHIFT);
+}
+
 unsigned long pte_to_hidx(pte_t pte, unsigned long hash,
 			  unsigned long vpn, int ssize, bool *valid)
 {
@@ -182,18 +219,8 @@ repeat:
 				   MMU_PAGE_4K, MMU_PAGE_4K, old_pte);
 		return -1;
 	}
-	/*
-	 * Insert slot number & secondary bit in PTE second half,
-	 * clear _PAGE_BUSY and set appropriate HPTE slot bit
-	 * Since we have _PAGE_BUSY set on ptep, we can be sure
-	 * nobody is undating hidx.
-	 */
-	new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE | _PAGE_COMBO;
-	/*
-	 * check __real_pte for details on matching smp_rmb()
-	 * FIXME!! We can possibly get rid of this ?
-	 */
-	smp_wmb();
+	new_pte = mark_subptegroup_valid(new_pte, subpg_index);
+	new_pte |=  _PAGE_HASHPTE;
 	*ptep = __pte(new_pte & ~_PAGE_BUSY);
 	return 0;
 }
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 9f7f6673e726..9bd0c6f505f0 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -664,7 +664,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 		vpn = batch->vpn[i];
 		pte = batch->pte[i];
 
-		pte_iterate_hashed_subpages(vpn, psize, shift) {
+		pte_iterate_hashed_subpages(pte, vpn, psize, shift) {
 			hash = hpt_hash(vpn, shift, ssize);
 			hidx = pte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
 			if (!valid_slot)
@@ -692,10 +692,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 			vpn = batch->vpn[i];
 			pte = batch->pte[i];
 
-			pte_iterate_hashed_subpages(vpn, psize, shift) {
-				/*
-				 * We are not looking at subpage valid here
-				 */
+			pte_iterate_hashed_subpages(pte, vpn, psize, shift) {
 				__tlbiel(vpn, psize, psize, ssize);
 			} pte_iterate_hashed_end();
 		}
@@ -711,10 +708,7 @@ static void native_flush_hash_range(unsigned long number, int local)
 			vpn = batch->vpn[i];
 			pte = batch->pte[i];
 
-			pte_iterate_hashed_subpages(vpn, psize, shift) {
-				/*
-				 * We are not looking at subpage valid here
-				 */
+			pte_iterate_hashed_subpages(pte, vpn, psize, shift) {
 				__tlbie(vpn, psize, psize, ssize);
 			} pte_iterate_hashed_end();
 		}
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 80e71ccc9474..d2fa41effa23 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1302,7 +1302,7 @@ void flush_hash_page(unsigned long vpn, pte_t pte, int psize, int ssize,
 	int local = flags & HPTE_LOCAL_UPDATE;
 
 	DBG_LOW("flush_hash_page(vpn=%016lx)\n", vpn);
-	pte_iterate_hashed_subpages(vpn, psize, shift) {
+	pte_iterate_hashed_subpages(pte, vpn, psize, shift) {
 		hash = hpt_hash(vpn, shift, ssize);
 		hidx = pte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
 		if (!valid_slot)
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 1708cab20fc8..06af06420a35 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -549,7 +549,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 
 		vpn = batch->vpn[i];
 		pte = batch->pte[i];
-		pte_iterate_hashed_subpages(vpn, psize, shift) {
+		pte_iterate_hashed_subpages(pte, vpn, psize, shift) {
 			hash = hpt_hash(vpn, shift, ssize);
 			hidx = pte_to_hidx(pte, hash, vpn, ssize, &valid_slot);
 			if (!valid_slot)
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 07/10] powerpc/mm: update PTE frag size
  2015-11-23 10:33 ` [PATCH V2 07/10] powerpc/mm: update PTE frag size Aneesh Kumar K.V
@ 2015-11-27  7:27   ` Aneesh Kumar K.V
  2015-11-27 11:56     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2015-11-27  7:27 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov; +Cc: linuxppc-dev

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> Now that we don't track 4k subpage information we can use 2K PTE
> fragments.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/hash-64k.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index 5062c6d423fd..a28dbfe2baed 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> @@ -39,14 +39,14 @@
>   */
>  #define PTE_RPN_SHIFT	(30)
>  /*
> - * we support 8 fragments per PTE page of 64K size.
> + * we support 32 fragments per PTE page of 64K size.
>   */
> -#define PTE_FRAG_NR	8
> +#define PTE_FRAG_NR	32
>  /*
>   * We use a 2K PTE page fragment and another 4K for storing
>   * real_pte_t hash index. Rounding the entire thing to 8K
>   */
> -#define PTE_FRAG_SIZE_SHIFT  13
> +#define PTE_FRAG_SIZE_SHIFT  11
>  #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
>


This breaks THP with 4K hpte support, because we need to track
information for 4096 subpages and we have only 2048 bytes after this
change.
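
For reference, a quick sketch of where those two numbers come from
(assuming the 16M huge page size and the 2K fragment from this series;
the names below are made up for illustration):

    #define THP_SIZE_SKETCH      (16UL << 20)   /* 16M transparent huge page */
    #define SUBPAGE_SIZE_SKETCH  (4UL << 10)    /* 4K hash pte */
    /* 16M / 4K = 4096 subpages whose state would need tracking */
    #define SUBPAGES_SKETCH      (THP_SIZE_SKETCH / SUBPAGE_SIZE_SKETCH)
    /* PTE_FRAG_SIZE_SHIFT = 11 -> a fragment is only 1 << 11 = 2048 bytes */
    #define FRAG_BYTES_SKETCH    (1UL << 11)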

Another thing I noticed is the impact of not tracking subpage
information. We do see a significant impact, as shown by the mmtests
results below. The plan now is to go back to 4K pte fragments, but
instead of using 16 bits to track the 4K subpage valid bits in the
pte, we use only 4 bits, as the last patch in this series does
("[PATCH V2 10/10] powerpc/mm: Optmize the hashed subpage iteration").
We will track the secondary and slot information in the second half.
This will result in us using hidx value 0x0 wrongly: that value
actually indicates the primary hash with slot number zero, but since
we are not going to track individual 4K subpage information we may end
up using slot 0 wrongly. I checked the existing code and we should be
able to handle that case gracefully.
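
That case is tolerable because the invalidate paths compare the HPTE in
the guessed slot against the expected value before touching it, so a
slot 0 that holds some other (or no) mapping is simply skipped. A
simplified sketch of that check, with made-up names rather than the
actual kernel helpers:

    /*
     * want_v identifies the mapping we are trying to invalidate;
     * hpte_v is whatever the guessed slot currently contains.
     */
    static int should_invalidate(unsigned long hpte_v, unsigned long want_v,
                                 unsigned long valid_bit)
    {
            if (!(hpte_v & valid_bit))
                    return 0;       /* slot is empty, nothing to do */
            if ((hpte_v & ~valid_bit) != (want_v & ~valid_bit))
                    return 0;       /* slot holds some other mapping, skip */
            return 1;               /* genuinely ours, invalidate as usual */
    }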

aim9
                                         guest                       guest
                                 without-patch                  with-patch
Min      page_test   467386.67 (  0.00%)   330480.00 (-29.29%)
Min      brk_test   4231312.46 (  0.00%)  4217133.33 ( -0.34%)
Min      exec_test     1015.66 (  0.00%)      610.33 (-39.91%)
Min      fork_test     2208.92 (  0.00%)     1556.89 (-29.52%)
Hmean    page_test   475149.52 (  0.00%)   334539.38 (-29.59%)
Hmean    brk_test   4277644.85 (  0.00%)  4294441.01 (  0.39%)
Hmean    exec_test     1042.32 (  0.00%)      625.09 (-40.03%)
Hmean    fork_test     2391.28 (  0.00%)     1663.53 (-30.43%)
Stddev   page_test     3315.58 (  0.00%)     1978.01 ( 40.34%)
Stddev   brk_test     42167.57 (  0.00%)    76647.45 (-81.77%)
Stddev   exec_test       18.02 (  0.00%)        8.16 ( 54.73%)
Stddev   fork_test       77.69 (  0.00%)       49.06 ( 36.84%)
CoeffVar page_test        0.70 (  0.00%)        0.59 ( 15.27%)
CoeffVar brk_test         0.99 (  0.00%)        1.78 (-81.02%)
CoeffVar exec_test        1.73 (  0.00%)        1.30 ( 24.50%)
CoeffVar fork_test        3.25 (  0.00%)        2.95 (  9.20%)
Max      page_test   479513.33 (  0.00%)   338526.67 (-29.40%)
Max      brk_test   4412066.67 (  0.00%)  4454430.38 (  0.96%)
Max      exec_test     1071.62 (  0.00%)      637.00 (-40.56%)
Max      fork_test     2500.00 (  0.00%)     1732.18 (-30.71%)

               guest       guest
        without-patch  with-patch
User            0.23        4.49
System          0.44        2.43
Elapsed       723.30      726.21

                                 guest       guest
                          without-patch  with-patch
Minor Faults                  38893956    25714580
Major Faults                        25         228
Swap Ins                             0           0
Swap Outs                            0           0
Allocation stalls                    0           0
DMA allocs                    12760829     8717992
DMA32 allocs                         0           0
Normal allocs                        0           0
Movable allocs                       0           0
Direct pages scanned                 0           0
Kswapd pages scanned                 0           0
Kswapd pages reclaimed               0           0
Direct pages reclaimed               0           0
Kswapd efficiency                 100%        100%
Kswapd velocity                  0.000       0.000
Direct efficiency                 100%        100%
Direct velocity                  0.000       0.000
Percentage direct scans             0%          0%
Zone normal velocity             0.000       0.000
Zone dma32 velocity              0.000       0.000
Zone dma velocity                0.000       0.000
Page writes by reclaim           0.000       0.000
Page writes file                     0           0
Page writes anon                     0           0
Page reclaim immediate               0           0
Sector Reads                      4252       27096
Sector Writes                    69488       69948
Page rescued immediate               0           0
Slabs scanned                        0           0
Direct inode steals                  0           0
Kswapd inode steals                  0           0
Kswapd skipped wait                  0           0
THP fault alloc                      0           0
THP collapse alloc                   0           0
THP splits                           0           0
THP fault fallback                   0           0
THP collapse fail                    0           0
Compaction stalls                    0           0
Compaction success                   0           0
Compaction failures                  0           0
Page migrate success                 0           0
Page migrate failure                 0           0
Compaction pages isolated            0           0
Compaction migrate scanned           0           0
Compaction free scanned              0           0
Compaction cost                      0           0
NUMA alloc hit                12757558     8716737
NUMA alloc miss                      0           0
NUMA interleave hit                  0           0
NUMA alloc local              12757558     8716737
NUMA base PTE updates              646         648
NUMA huge PMD updates                0           0
NUMA page range updates            646         648
NUMA hint faults                878759      610567
NUMA hint local faults          878759      610567
NUMA hint local percent            100         100
NUMA pages migrated                  0           0
AutoNUMA cost                    4393%       3052%

pft timings
                                         guest                       guest
                                 without-patch                  with-patch
Min      system-1        0.0900 (  0.00%)       0.0600 ( 33.33%)
Min      system-3        0.1500 (  0.00%)       0.1300 ( 13.33%)
Min      system-5        0.2400 (  0.00%)       0.2000 ( 16.67%)
Min      system-7        0.2700 (  0.00%)       0.2600 (  3.70%)
Min      system-8        0.2400 (  0.00%)       0.2400 (  0.00%)
Min      elapsed-1       0.1000 (  0.00%)       0.0800 ( 20.00%)
Min      elapsed-3       0.0700 (  0.00%)       0.0600 ( 14.29%)
Min      elapsed-5       0.0600 (  0.00%)       0.0600 (  0.00%)
Min      elapsed-7       0.0600 (  0.00%)       0.0600 (  0.00%)
Min      elapsed-8       0.0600 (  0.00%)       0.0600 (  0.00%)
Amean    system-1        0.1896 (  0.00%)       0.1071 ( 43.51%)
Amean    system-3        0.1805 (  0.00%)       0.1640 (  9.14%)
Amean    system-5        0.3372 (  0.00%)       0.3261 (  3.30%)
Amean    system-7        0.3609 (  0.00%)       0.4526 (-25.42%)
Amean    system-8        0.4686 (  0.00%)       0.4309 (  8.06%)
Amean    elapsed-1       0.2313 (  0.00%)       0.1281 ( 44.59%)
Amean    elapsed-3       0.0779 (  0.00%)       0.0714 (  8.35%)
Amean    elapsed-5       0.0916 (  0.00%)       0.0898 (  2.05%)
Amean    elapsed-7       0.0711 (  0.00%)       0.0877 (-23.37%)
Amean    elapsed-8       0.0786 (  0.00%)       0.0749 (  4.77%)
Stddev   system-1        0.3466 (  0.00%)       0.2874 ( 17.07%)
Stddev   system-3        0.0228 (  0.00%)       0.0278 (-21.83%)
Stddev   system-5        0.1883 (  0.00%)       0.1865 (  1.01%)
Stddev   system-7        0.0437 (  0.00%)       0.3484 (-696.61%)
Stddev   system-8        0.2732 (  0.00%)       0.2877 ( -5.33%)
Stddev   elapsed-1       0.4832 (  0.00%)       0.3672 ( 24.01%)
Stddev   elapsed-3       0.0085 (  0.00%)       0.0095 (-11.61%)
Stddev   elapsed-5       0.0464 (  0.00%)       0.0515 (-11.10%)
Stddev   elapsed-7       0.0063 (  0.00%)       0.0645 (-920.30%)
Stddev   elapsed-8       0.0379 (  0.00%)       0.0430 (-13.49%)
CoeffVar system-1      182.7586 (  0.00%)     268.2875 (-46.80%)
CoeffVar system-3       12.6305 (  0.00%)      16.9365 (-34.09%)
CoeffVar system-5       55.8482 (  0.00%)      57.1714 ( -2.37%)
CoeffVar system-7       12.1199 (  0.00%)      76.9777 (-535.14%)
CoeffVar system-8       58.2889 (  0.00%)      66.7748 (-14.56%)
CoeffVar elapsed-1     208.9340 (  0.00%)     286.5759 (-37.16%)
CoeffVar elapsed-3      10.8759 (  0.00%)      13.2441 (-21.78%)
CoeffVar elapsed-5      50.6194 (  0.00%)      57.4107 (-13.42%)
CoeffVar elapsed-7       8.8904 (  0.00%)      73.5236 (-727.00%)
CoeffVar elapsed-8      48.1691 (  0.00%)      57.4055 (-19.18%)
Max      system-1        2.9900 (  0.00%)       2.6600 ( 11.04%)
Max      system-3        0.2900 (  0.00%)       0.2700 (  6.90%)
Max      system-5        1.5200 (  0.00%)       1.4000 (  7.89%)
Max      system-7        0.5700 (  0.00%)       2.6200 (-359.65%)
Max      system-8        1.8700 (  0.00%)       1.9300 ( -3.21%)
Max      elapsed-1       4.1200 (  0.00%)       3.3900 ( 17.72%)
Max      elapsed-3       0.1100 (  0.00%)       0.1000 (  9.09%)
Max      elapsed-5       0.3800 (  0.00%)       0.3700 (  2.63%)
Max      elapsed-7       0.1000 (  0.00%)       0.4900 (-390.00%)
Max      elapsed-8       0.2700 (  0.00%)       0.2900 ( -7.41%)

pft faults
                                            guest                       guest
                                    without-patch                  with-patch
Min      faults/cpu-1    4339.4990 (  0.00%)    4883.1470 ( 12.53%)
Min      faults/cpu-3   42093.3690 (  0.00%)   45489.7530 (  8.07%)
Min      faults/cpu-5    8458.7940 (  0.00%)    9107.9580 (  7.67%)
Min      faults/cpu-7   21747.4000 (  0.00%)    4824.7950 (-77.81%)
Min      faults/cpu-8    6778.2410 (  0.00%)    6542.3320 ( -3.48%)
Min      faults/sec-1    3168.3390 (  0.00%)    3844.8860 ( 21.35%)
Min      faults/sec-3  122912.2040 (  0.00%)  124417.2460 (  1.22%)
Min      faults/sec-5   33891.4780 (  0.00%)   35242.9170 (  3.99%)
Min      faults/sec-7  126138.4850 (  0.00%)   26380.8510 (-79.09%)
Min      faults/sec-8   47493.9370 (  0.00%)   44882.3140 ( -5.50%)
Hmean    faults/cpu-1   65728.7295 (  0.00%)  112975.9206 ( 71.88%)
Hmean    faults/cpu-3   67715.8785 (  0.00%)   74340.0153 (  9.78%)
Hmean    faults/cpu-5   36339.2283 (  0.00%)   37396.4486 (  2.91%)
Hmean    faults/cpu-7   33959.8828 (  0.00%)   27199.2785 (-19.91%)
Hmean    faults/cpu-8   26145.9644 (  0.00%)   28495.1995 (  8.99%)
Hmean    faults/sec-1   56465.3726 (  0.00%)  100362.9738 ( 77.74%)
Hmean    faults/sec-3  166282.3786 (  0.00%)  183015.5259 ( 10.06%)
Hmean    faults/sec-5  142135.4993 (  0.00%)  144915.9134 (  1.96%)
Hmean    faults/sec-7  182191.8573 (  0.00%)  147016.2008 (-19.31%)
Hmean    faults/sec-8  166712.4745 (  0.00%)  175603.5422 (  5.33%)
Stddev   faults/cpu-1   21957.2462 (  0.00%)   23554.5092 ( -7.27%)
Stddev   faults/cpu-3    7091.4813 (  0.00%)    9985.8867 (-40.82%)
Stddev   faults/cpu-5    7541.9566 (  0.00%)    8947.1204 (-18.63%)
Stddev   faults/cpu-7    3980.6287 (  0.00%)    8638.2507 (-117.01%)
Stddev   faults/cpu-8    6429.2045 (  0.00%)    7858.0479 (-22.22%)
Stddev   faults/sec-1   21569.0114 (  0.00%)   21640.5914 ( -0.33%)
Stddev   faults/sec-3   14715.2173 (  0.00%)   21064.4060 (-43.15%)
Stddev   faults/sec-5   30084.3765 (  0.00%)   34040.2952 (-13.15%)
Stddev   faults/sec-7   13592.3120 (  0.00%)   45590.2568 (-235.41%)
Stddev   faults/sec-8   31470.9091 (  0.00%)   40231.8893 (-27.84%)
CoeffVar faults/cpu-1      23.2163 (  0.00%)      14.9965 ( 35.41%)
CoeffVar faults/cpu-3      10.3387 (  0.00%)      13.1506 (-27.20%)
CoeffVar faults/cpu-5      18.9342 (  0.00%)      21.4360 (-13.21%)
CoeffVar faults/cpu-7      11.5648 (  0.00%)      26.3385 (-127.75%)
CoeffVar faults/cpu-8      22.0578 (  0.00%)      23.8259 ( -8.02%)
CoeffVar faults/sec-1      23.8979 (  0.00%)      14.7121 ( 38.44%)
CoeffVar faults/sec-3       8.7752 (  0.00%)      11.3428 (-29.26%)
CoeffVar faults/sec-5      19.3505 (  0.00%)      21.0165 ( -8.61%)
CoeffVar faults/sec-7       7.4160 (  0.00%)      25.8994 (-249.24%)
CoeffVar faults/sec-8      17.3858 (  0.00%)      20.3632 (-17.13%)
Max      faults/cpu-1  135058.1560 (  0.00%)  179611.8510 ( 32.99%)
Max      faults/cpu-3   81575.7040 (  0.00%)   91510.1490 ( 12.18%)
Max      faults/cpu-5   50969.3560 (  0.00%)   62628.1740 ( 22.87%)
Max      faults/cpu-7   45106.2770 (  0.00%)   47090.8720 (  4.40%)
Max      faults/cpu-8   50540.8730 (  0.00%)   51169.9640 (  1.24%)
Max      faults/sec-1  130267.4950 (  0.00%)  159347.5220 ( 22.32%)
Max      faults/sec-3  193635.1080 (  0.00%)  223454.3190 ( 15.40%)
Max      faults/sec-5  204045.3880 (  0.00%)  219350.7390 (  7.50%)
Max      faults/sec-7  217209.2480 (  0.00%)  225028.8150 (  3.60%)
Max      faults/sec-8  215116.6780 (  0.00%)  231008.5060 (  7.39%)

               guest       guest
        without-patch  with-patch
User           15.01       16.37
System        146.03      164.43
Elapsed        52.21       57.82

                                 guest       guest
                          without-patch  with-patch
Minor Faults                   5481327     5493919
Major Faults                         0          90
Swap Ins                             0           0
Swap Outs                            0           0
Allocation stalls                    0           0
DMA allocs                     5271143     5277377
DMA32 allocs                         0           0
Normal allocs                        0           0
Movable allocs                       0           0
Direct pages scanned                 0           0
Kswapd pages scanned                 0           0
Kswapd pages reclaimed               0           0
Direct pages reclaimed               0           0
Kswapd efficiency                 100%        100%
Kswapd velocity                  0.000       0.000
Direct efficiency                 100%        100%
Direct velocity                  0.000       0.000
Percentage direct scans             0%          0%
Zone normal velocity             0.000       0.000
Zone dma32 velocity              0.000       0.000
Zone dma velocity                0.000       0.000
Page writes by reclaim           0.000       0.000
Page writes file                     0           0
Page writes anon                     0           0
Page reclaim immediate               0           0
Sector Reads                        72       14632
Sector Writes                      508        3760
Page rescued immediate               0           0
Slabs scanned                        0           0
Direct inode steals                  0           0
Kswapd inode steals                  0           0
Kswapd skipped wait                  0           0
THP fault alloc                      0           0
THP collapse alloc                   0           0
THP splits                           0           0
THP fault fallback                   0           0
THP collapse fail                    0           0
Compaction stalls                    0           0
Compaction success                   0           0
Compaction failures                  0           0
Page migrate success                 0           0
Page migrate failure                 0           0
Compaction pages isolated            0           0
Compaction migrate scanned           0           0
Compaction free scanned              0           0
Compaction cost                      0           0
NUMA alloc hit                 5271051     5277260
NUMA alloc miss                      0           0
NUMA interleave hit                  0           0
NUMA alloc local               5271051     5277260
NUMA base PTE updates            19661       11194
NUMA huge PMD updates                0           0
NUMA page range updates          19661       11194
NUMA hint faults                    19          14
NUMA hint local faults              19          14
NUMA hint local percent            100         100
NUMA pages migrated                  0           0
AutoNUMA cost                       0%          0%

ebizzy Overall Throughput
                                       guest                       guest
                               without-patch                  with-patch
Min      Rsec-1      6614.00 (  0.00%)     4366.00 (-33.99%)
Min      Rsec-3     10621.00 (  0.00%)     7221.00 (-32.01%)
Min      Rsec-5     10792.00 (  0.00%)     7634.00 (-29.26%)
Min      Rsec-7     10994.00 (  0.00%)     7649.00 (-30.43%)
Min      Rsec-12    13824.00 (  0.00%)     7520.00 (-45.60%)
Min      Rsec-18    12508.00 (  0.00%)     7465.00 (-40.32%)
Min      Rsec-24    14684.00 (  0.00%)     9897.00 (-32.60%)
Min      Rsec-30    14917.00 (  0.00%)    10430.00 (-30.08%)
Min      Rsec-32    14566.00 (  0.00%)    10135.00 (-30.42%)
Hmean    Rsec-1      6693.18 (  0.00%)     4393.27 (-34.36%)
Hmean    Rsec-3     10900.07 (  0.00%)     7536.06 (-30.86%)
Hmean    Rsec-5     11428.67 (  0.00%)     7776.80 (-31.95%)
Hmean    Rsec-7     11632.77 (  0.00%)     7862.60 (-32.41%)
Hmean    Rsec-12    14513.82 (  0.00%)     7719.10 (-46.82%)
Hmean    Rsec-18    13291.53 (  0.00%)     7785.99 (-41.42%)
Hmean    Rsec-24    14897.25 (  0.00%)    10232.64 (-31.31%)
Hmean    Rsec-30    15259.81 (  0.00%)    10658.01 (-30.16%)
Hmean    Rsec-32    14884.59 (  0.00%)    10651.59 (-28.44%)
Stddev   Rsec-1        64.40 (  0.00%)       24.20 ( 62.41%)
Stddev   Rsec-3       178.18 (  0.00%)      170.17 (  4.49%)
Stddev   Rsec-5       374.77 (  0.00%)       78.30 ( 79.11%)
Stddev   Rsec-7       495.33 (  0.00%)      124.71 ( 74.82%)
Stddev   Rsec-12      825.33 (  0.00%)      113.93 ( 86.20%)
Stddev   Rsec-18      733.83 (  0.00%)      393.07 ( 46.44%)
Stddev   Rsec-24      118.49 (  0.00%)      279.08 (-135.54%)
Stddev   Rsec-30      270.38 (  0.00%)      145.14 ( 46.32%)
Stddev   Rsec-32      342.51 (  0.00%)      441.92 (-29.03%)
CoeffVar Rsec-1         0.96 (  0.00%)        0.55 ( 42.73%)
CoeffVar Rsec-3         1.63 (  0.00%)        2.26 (-38.10%)
CoeffVar Rsec-5         3.28 (  0.00%)        1.01 ( 69.27%)
CoeffVar Rsec-7         4.25 (  0.00%)        1.59 ( 62.69%)
CoeffVar Rsec-12        5.67 (  0.00%)        1.48 ( 73.97%)
CoeffVar Rsec-18        5.50 (  0.00%)        5.04 (  8.51%)
CoeffVar Rsec-24        0.80 (  0.00%)        2.73 (-242.68%)
CoeffVar Rsec-30        1.77 (  0.00%)        1.36 ( 23.13%)
CoeffVar Rsec-32        2.30 (  0.00%)        4.14 (-80.09%)
Max      Rsec-1      6792.00 (  0.00%)     4429.00 (-34.79%)
Max      Rsec-3     11131.00 (  0.00%)     7700.00 (-30.82%)
Max      Rsec-5     11933.00 (  0.00%)     7842.00 (-34.28%)
Max      Rsec-7     12508.00 (  0.00%)     8027.00 (-35.83%)
Max      Rsec-12    16002.00 (  0.00%)     7847.00 (-50.96%)
Max      Rsec-18    14516.00 (  0.00%)     8567.00 (-40.98%)
Max      Rsec-24    15013.00 (  0.00%)    10599.00 (-29.40%)
Max      Rsec-30    15656.00 (  0.00%)    10853.00 (-30.68%)
Max      Rsec-32    15385.00 (  0.00%)    11381.00 (-26.03%)

ebizzy Per-thread
                                       guest                       guest
                               without-patch                  with-patch
Min      Rsec-1      6614.00 (  0.00%)     4366.00 (-33.99%)
Min      Rsec-3      3464.00 (  0.00%)     2387.00 (-31.09%)
Min      Rsec-5      2100.00 (  0.00%)     1486.00 (-29.24%)
Min      Rsec-7      1494.00 (  0.00%)     1016.00 (-31.99%)
Min      Rsec-12     1010.00 (  0.00%)      586.00 (-41.98%)
Min      Rsec-18      571.00 (  0.00%)      373.00 (-34.68%)
Min      Rsec-24      473.00 (  0.00%)      330.00 (-30.23%)
Min      Rsec-30      398.00 (  0.00%)      283.00 (-28.89%)
Min      Rsec-32      364.00 (  0.00%)      250.00 (-31.32%)
Hmean    Rsec-1      6693.18 (  0.00%)     4393.27 (-34.36%)
Hmean    Rsec-3      3627.65 (  0.00%)     2504.84 (-30.95%)
Hmean    Rsec-5      2283.84 (  0.00%)     1554.53 (-31.93%)
Hmean    Rsec-7      1637.70 (  0.00%)     1121.58 (-31.51%)
Hmean    Rsec-12     1192.25 (  0.00%)      642.41 (-46.12%)
Hmean    Rsec-18      687.06 (  0.00%)      428.11 (-37.69%)
Hmean    Rsec-24      606.35 (  0.00%)      415.99 (-31.39%)
Hmean    Rsec-30      497.96 (  0.00%)      349.99 (-29.72%)
Hmean    Rsec-32      455.21 (  0.00%)      323.05 (-29.03%)
Stddev   Rsec-1        64.40 (  0.00%)       24.20 (-62.41%)
Stddev   Rsec-3       156.31 (  0.00%)      147.43 ( -5.68%)
Stddev   Rsec-5        95.61 (  0.00%)       30.04 (-68.58%)
Stddev   Rsec-7       286.41 (  0.00%)       40.64 (-85.81%)
Stddev   Rsec-12      178.07 (  0.00%)       19.11 (-89.27%)
Stddev   Rsec-18      250.94 (  0.00%)       69.96 (-72.12%)
Stddev   Rsec-24       97.93 (  0.00%)       69.89 (-28.63%)
Stddev   Rsec-30       81.68 (  0.00%)       42.31 (-48.20%)
Stddev   Rsec-32       70.76 (  0.00%)       62.86 (-11.16%)
CoeffVar Rsec-1         0.96 (  0.00%)        0.55 ( 42.73%)
CoeffVar Rsec-3         4.30 (  0.00%)        5.87 (-36.40%)
CoeffVar Rsec-5         4.18 (  0.00%)        1.93 ( 53.78%)
CoeffVar Rsec-7        17.21 (  0.00%)        3.62 ( 78.97%)
CoeffVar Rsec-12       14.68 (  0.00%)        2.97 ( 79.76%)
CoeffVar Rsec-18       33.90 (  0.00%)       16.15 ( 52.35%)
CoeffVar Rsec-24       15.79 (  0.00%)       16.40 ( -3.86%)
CoeffVar Rsec-30       16.07 (  0.00%)       11.92 ( 25.79%)
CoeffVar Rsec-32       15.22 (  0.00%)       18.88 (-24.05%)
Max      Rsec-1      6792.00 (  0.00%)     4429.00 (-34.79%)
Max      Rsec-3      3968.00 (  0.00%)     2818.00 (-28.98%)
Max      Rsec-5      2480.00 (  0.00%)     1607.00 (-35.20%)
Max      Rsec-7      3290.00 (  0.00%)     1193.00 (-63.74%)
Max      Rsec-12     1842.00 (  0.00%)      678.00 (-63.19%)
Max      Rsec-18     1495.00 (  0.00%)     1074.00 (-28.16%)
Max      Rsec-24      987.00 (  0.00%)      738.00 (-25.23%)
Max      Rsec-30      824.00 (  0.00%)      456.00 (-44.66%)
Max      Rsec-32      690.00 (  0.00%)      581.00 (-15.80%)

ebizzy Thread spread
                                         guest                       guest
                                 without-patch                  with-patch
Min      spread-1         0.00 (  0.00%)        0.00 (  0.00%)
Min      spread-3        69.00 (  0.00%)       38.00 ( 44.93%)
Min      spread-5       126.00 (  0.00%)       54.00 ( 57.14%)
Min      spread-7        79.00 (  0.00%)       78.00 (  1.27%)
Min      spread-12      262.00 (  0.00%)       36.00 ( 86.26%)
Min      spread-18      667.00 (  0.00%)       39.00 ( 94.15%)
Min      spread-24      300.00 (  0.00%)      156.00 ( 48.00%)
Min      spread-30      330.00 (  0.00%)      100.00 ( 69.70%)
Min      spread-32      220.00 (  0.00%)      158.00 ( 28.18%)
Hmean    spread-1         0.00 (  0.00%)        0.00 (  0.00%)
Hmean    spread-3       162.44 (  0.00%)      125.30 ( 22.86%)
Hmean    spread-5       161.03 (  0.00%)       65.41 ( 59.38%)
Hmean    spread-7       143.51 (  0.00%)      102.37 ( 28.67%)
Hmean    spread-12      376.64 (  0.00%)       53.48 ( 85.80%)
Hmean    spread-18      758.21 (  0.00%)       58.48 ( 92.29%)
Hmean    spread-24      348.60 (  0.00%)      215.23 ( 38.26%)
Hmean    spread-30      358.66 (  0.00%)      126.49 ( 64.73%)
Hmean    spread-32      264.70 (  0.00%)      199.06 ( 24.80%)
Stddev   spread-1         0.00 (  0.00%)        0.00 (  0.00%)
Stddev   spread-3       160.65 (  0.00%)      134.27 ( 16.42%)
Stddev   spread-5        20.56 (  0.00%)       14.92 ( 27.43%)
Stddev   spread-7       667.59 (  0.00%)       23.11 ( 96.54%)
Stddev   spread-12      190.86 (  0.00%)       11.03 ( 94.22%)
Stddev   spread-18       64.39 (  0.00%)      244.20 (-279.25%)
Stddev   spread-24       63.41 (  0.00%)       78.27 (-23.44%)
Stddev   spread-30       34.19 (  0.00%)       25.58 ( 25.20%)
Stddev   spread-32       27.74 (  0.00%)       63.89 (-130.27%)
CoeffVar spread-1         0.00 (  0.00%)        0.00 (  0.00%)
CoeffVar spread-3        58.00 (  0.00%)       50.71 (-12.57%)
CoeffVar spread-5        12.54 (  0.00%)       21.81 ( 73.99%)
CoeffVar spread-7       144.19 (  0.00%)       21.60 (-85.02%)
CoeffVar spread-12       42.68 (  0.00%)       19.63 (-54.00%)
CoeffVar spread-18        8.43 (  0.00%)      139.55 (1554.83%)
CoeffVar spread-24       17.71 (  0.00%)       33.28 ( 87.89%)
CoeffVar spread-30        9.46 (  0.00%)       19.49 (106.17%)
CoeffVar spread-32       10.36 (  0.00%)       29.61 (185.76%)
Max      spread-1         0.00 (  0.00%)        0.00 (  0.00%)
Max      spread-3       492.00 (  0.00%)      386.00 ( 21.54%)
Max      spread-5       185.00 (  0.00%)       89.00 ( 51.89%)
Max      spread-7      1796.00 (  0.00%)      146.00 ( 91.87%)
Max      spread-12      686.00 (  0.00%)       69.00 ( 89.94%)
Max      spread-18      869.00 (  0.00%)      662.00 ( 23.82%)
Max      spread-24      479.00 (  0.00%)      384.00 ( 19.83%)
Max      spread-30      426.00 (  0.00%)      173.00 ( 59.39%)
Max      spread-32      302.00 (  0.00%)      312.00 ( -3.31%)

               guest       guest
        without-patch  with-patch
User         1585.46      996.79
System       6802.57     7388.01
Elapsed      1352.70     1355.62

                                 guest       guest
                          without-patch  with-patch
Minor Faults                 102375790    67314425
Major Faults                         0           1
Swap Ins                             0           0
Swap Outs                            0           0
Allocation stalls                    0           0
DMA allocs                   102297177    67229359
DMA32 allocs                         0           0
Normal allocs                        0           0
Movable allocs                       0           0
Direct pages scanned                 0           0
Kswapd pages scanned                 0           0
Kswapd pages reclaimed               0           0
Direct pages reclaimed               0           0
Kswapd efficiency                 100%        100%
Kswapd velocity                  0.000       0.000
Direct efficiency                 100%        100%
Direct velocity                  0.000       0.000
Percentage direct scans             0%          0%
Zone normal velocity             0.000       0.000
Zone dma32 velocity              0.000       0.000
Zone dma velocity                0.000       0.000
Page writes by reclaim           0.000       0.000
Page writes file                     0           0
Page writes anon                     0           0
Page reclaim immediate               0           0
Sector Reads                       428         620
Sector Writes                     4164        4620
Page rescued immediate               0           0
Slabs scanned                        0           0
Direct inode steals                  0           0
Kswapd inode steals                  0           0
Kswapd skipped wait                  0           0
THP fault alloc                      0           0
THP collapse alloc                   0           0
THP splits                           0           0
THP fault fallback                   0           0
THP collapse fail                    0           0
Compaction stalls                    0           0
Compaction success                   0           0
Compaction failures                  0           0
Page migrate success                 0           0
Page migrate failure                 0           0
Compaction pages isolated            0           0
Compaction migrate scanned           0           0
Compaction free scanned              0           0
Compaction cost                      0           0
NUMA alloc hit               102297213    67229443
NUMA alloc miss                      0           0
NUMA interleave hit                  0           0
NUMA alloc local             102297213    67229443
NUMA base PTE updates            43814       45596
NUMA huge PMD updates                0           0
NUMA page range updates          43814       45596
NUMA hint faults                 35181       36243
NUMA hint local faults           35181       36243
NUMA hint local percent            100         100
NUMA pages migrated                  0           0
AutoNUMA cost                     176%        181%

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 07/10] powerpc/mm: update PTE frag size
  2015-11-27  7:27   ` Aneesh Kumar K.V
@ 2015-11-27 11:56     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2015-11-27 11:56 UTC (permalink / raw)
  To: benh, paulus, mpe, Scott Wood, Denis Kirjanov; +Cc: linuxppc-dev

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
>
>> Now that we don't track 4k subpage information we can use 2K PTE
>> fragments.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/book3s/64/hash-64k.h | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
>> index 5062c6d423fd..a28dbfe2baed 100644
>> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
>> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
>> @@ -39,14 +39,14 @@
>>   */
>>  #define PTE_RPN_SHIFT	(30)
>>  /*
>> - * we support 8 fragments per PTE page of 64K size.
>> + * we support 32 fragments per PTE page of 64K size.
>>   */
>> -#define PTE_FRAG_NR	8
>> +#define PTE_FRAG_NR	32
>>  /*
>>   * We use a 2K PTE page fragment and another 4K for storing
>>   * real_pte_t hash index. Rounding the entire thing to 8K
>>   */
>> -#define PTE_FRAG_SIZE_SHIFT  13
>> +#define PTE_FRAG_SIZE_SHIFT  11
>>  #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
>>
>
>
> This breaks THP with 4K hpte support, because we need to track
> information for 4096 subpages and we have only 2048 bytes after this
> change.
>
> Another thing I noticed is the impact of not tracking subpage
> information. We do see a significant impact, as shown by the mmtests
> results below. The plan now is to go back to 4K pte fragments, but
> instead of using 16 bits to track the 4K subpage valid bits in the
> pte, we use only 4 bits, as the last patch in this series does
> ("[PATCH V2 10/10] powerpc/mm: Optmize the hashed subpage iteration").
> We will track the secondary and slot information in the second half.
> This will result in us using hidx value 0x0 wrongly: that value
> actually indicates the primary hash with slot number zero, but since
> we are not going to track individual 4K subpage information we may
> end up using slot 0 wrongly. I checked the existing code and we
> should be able to handle that case gracefully.

I pushed the changes to 

https://github.com/kvaneesh/linux/commits/book3s-pte-format-v2

This needs a full round of testing; I have only done a sanity test with
the 4K hash pte config. I will send an updated series once I finish
testing. Meanwhile, if you are interested, please take a look.

-aneesh

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads: [~2015-11-27 11:57 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-23 10:33 [PATCH V2 00/10] Reduce the pte framgment size Aneesh Kumar K.V
2015-11-23 10:33 ` [PATCH V2 01/10] powerpc/mm: Don't hardcode page table size Aneesh Kumar K.V
2015-11-23 10:33 ` [PATCH V2 02/10] powerpc/mm: Don't hardcode the hash pte slot shift Aneesh Kumar K.V
2015-11-23 10:33 ` [PATCH V2 03/10] powerpc/nohash: Update 64K nohash config to have 32 pte fragement Aneesh Kumar K.V
2015-11-23 10:33 ` [PATCH V2 04/10] powerpc/nohash: we don't use real_pte_t for nohash Aneesh Kumar K.V
2015-11-23 10:33 ` [PATCH V2 05/10] powerpc/mm: Use H_READ with H_READ_4 Aneesh Kumar K.V
2015-11-23 10:33 ` [PATCH V2 06/10] powerpc/mm: Don't track 4k subpage information with 64k linux page size Aneesh Kumar K.V
2015-11-23 10:33 ` [PATCH V2 07/10] powerpc/mm: update PTE frag size Aneesh Kumar K.V
2015-11-27  7:27   ` Aneesh Kumar K.V
2015-11-27 11:56     ` Aneesh Kumar K.V
2015-11-23 10:33 ` [PATCH V2 08/10] powerpc/mm: Update pte_iterate_hashed_subpages args Aneesh Kumar K.V
2015-11-23 10:33 ` [PATCH V2 09/10] powerpc/mm: Drop real_pte_t usage Aneesh Kumar K.V
2015-11-23 10:33 ` [PATCH V2 10/10] powerpc/mm: Optmize the hashed subpage iteration Aneesh Kumar K.V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).