linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH v2 00/10] powerpc/mm/slice: improve slice speed and stack use
@ 2018-03-07  1:37 Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 01/10] powerpc/mm/slice: Simplify and optimise slice context initialisation Nicholas Piggin
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Nicholas Piggin @ 2018-03-07  1:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

Overall on POWER8, this series increases vfork+exec+exit
microbenchmark rate by 15.6%, and mmap+munmap rate by 81%. Slice
code/data size is reduced by 1kB, and max stack overhead through the
slice_get_unmapped_area call goes from 992 to 448 bytes. The cost is
288 bytes added to the mm_context_t per mm for the slice masks on
Book3S.

Since v1:
- Fixed a couple of bugs and compile errors on 8xx.
- Addressed all of Christophe's review feedback, hopefully.
- Got rid of unrelated "cleanup" hunks, and split one to its own patch.
- Dropped patch to dynamically limit bitmap operations. This may be
  revisited after Aneesh's 4TB patches.

Thanks,
Nick

Nicholas Piggin (10):
  powerpc/mm/slice: Simplify and optimise slice context initialisation
  powerpc/mm/slice: tidy lpsizes and hpsizes update loops
  powerpc/mm/slice: pass pointers to struct slice_mask where possible
  powerpc/mm/slice: implement a slice mask cache
  powerpc/mm/slice: implement slice_check_range_fits
  powerpc/mm/slice: Switch to 3-operand slice bitops helpers
  powerpc/mm/slice: remove dead code
  powerpc/mm/slice: Use const pointers to cached slice masks where
    possible
  powerpc/mm/slice: remove radix calls to the slice code
  powerpc/mm/slice: use the dynamic high slice size to limit bitmap
    operations

 arch/powerpc/include/asm/book3s/64/mmu.h |  18 ++
 arch/powerpc/include/asm/hugetlb.h       |   7 +-
 arch/powerpc/include/asm/mmu-8xx.h       |  10 +
 arch/powerpc/include/asm/slice.h         |   8 +-
 arch/powerpc/mm/hugetlbpage.c            |   6 +-
 arch/powerpc/mm/mmu_context_book3s64.c   |   9 +-
 arch/powerpc/mm/mmu_context_nohash.c     |   5 +-
 arch/powerpc/mm/slice.c                  | 461 ++++++++++++++++---------------
 8 files changed, 277 insertions(+), 247 deletions(-)

-- 
2.16.1


* [PATCH v2 01/10] powerpc/mm/slice: Simplify and optimise slice context initialisation
  2018-03-07  1:37 [PATCH v2 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
@ 2018-03-07  1:37 ` Nicholas Piggin
  2018-03-14  9:28   ` [v2, " Michael Ellerman
  2018-03-07  1:37 ` [PATCH v2 02/10] powerpc/mm/slice: tidy lpsizes and hpsizes update loops Nicholas Piggin
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Nicholas Piggin @ 2018-03-07  1:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

The slice state of an mm gets zeroed then initialised upon exec.
This is now the only caller of slice_set_user_psize, so that function
can be removed and replaced with a faster, simpler approach that
requires no locking and no checking of existing state.

This speeds up vfork+exec+exit performance on POWER8 by 3%.
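
As a rough illustration of why a plain memset is enough here: the psize
arrays pack two 4-bit page size indexes per byte, so filling every byte
with (psize << 4) | psize sets every slice to psize. The sketch below is
a userspace approximation, not the kernel code; DEMO_NUM_SLICES and the
demo_* names are made up for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical slice count; the real SLICE_NUM_LOW/HIGH are per platform. */
#define DEMO_NUM_SLICES 16

/* Read the 4-bit page size index of slice i (two slices per byte). */
unsigned int demo_get_psize(const unsigned char *psizes, unsigned int i)
{
	return (psizes[i >> 1] >> ((i & 0x1) * 4)) & 0xf;
}

/* Set every slice to psize with a single memset over the packed array. */
void demo_init_psizes(unsigned char *psizes, unsigned int psize)
{
	assert(psize <= 0xf);	/* must fit in one nibble */
	memset(psizes, (int)((psize << 4) | psize), DEMO_NUM_SLICES >> 1);
}
```

Because the context is freshly zeroed at exec, there is no old psize to
compare against, which is what lets the per-slice loop and the lock go.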

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/slice.h       |  8 ++--
 arch/powerpc/mm/mmu_context_book3s64.c |  9 +----
 arch/powerpc/mm/mmu_context_nohash.c   |  5 +--
 arch/powerpc/mm/slice.c                | 72 +++++++++-------------------------
 4 files changed, 23 insertions(+), 71 deletions(-)

diff --git a/arch/powerpc/include/asm/slice.h b/arch/powerpc/include/asm/slice.h
index 172711fadb1c..e40406cf5628 100644
--- a/arch/powerpc/include/asm/slice.h
+++ b/arch/powerpc/include/asm/slice.h
@@ -28,15 +28,13 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 
 unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr);
 
-void slice_set_user_psize(struct mm_struct *mm, unsigned int psize);
 void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
 			   unsigned long len, unsigned int psize);
-#endif /* __ASSEMBLY__ */
 
-#else /* CONFIG_PPC_MM_SLICES */
+void slice_init_new_context_exec(struct mm_struct *mm);
+
+#endif /* __ASSEMBLY__ */
 
-#define slice_set_range_psize(mm, start, len, psize)	\
-	slice_set_user_psize((mm), (psize))
 #endif /* CONFIG_PPC_MM_SLICES */
 
 #endif /* _ASM_POWERPC_SLICE_H */
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 929d9ef7083f..80acad52b006 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -93,13 +93,6 @@ static int hash__init_new_context(struct mm_struct *mm)
 	if (index < 0)
 		return index;
 
-	/*
-	 * In the case of exec, use the default limit,
-	 * otherwise inherit it from the mm we are duplicating.
-	 */
-	if (!mm->context.slb_addr_limit)
-		mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW_USER64;
-
 	/*
 	 * The old code would re-promote on fork, we don't do that when using
 	 * slices as it could cause problem promoting slices that have been
@@ -115,7 +108,7 @@ static int hash__init_new_context(struct mm_struct *mm)
 	 * check against 0 is OK.
 	 */
 	if (mm->context.id == 0)
-		slice_set_user_psize(mm, mmu_virtual_psize);
+		slice_init_new_context_exec(mm);
 
 	subpage_prot_init_new_context(mm);
 
diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index d98f7e5c141b..be8f5c9d4d08 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -332,9 +332,6 @@ int init_new_context(struct task_struct *t, struct mm_struct *mm)
 	pr_hard("initing context for mm @%p\n", mm);
 
 #ifdef	CONFIG_PPC_MM_SLICES
-	if (!mm->context.slb_addr_limit)
-		mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW;
-
 	/*
 	 * We have MMU_NO_CONTEXT set to be ~0. Hence check
 	 * explicitly against context.id == 0. This ensures that we properly
@@ -343,7 +340,7 @@ int init_new_context(struct task_struct *t, struct mm_struct *mm)
 	 * will have id != 0).
 	 */
 	if (mm->context.id == 0)
-		slice_set_user_psize(mm, mmu_virtual_psize);
+		slice_init_new_context_exec(mm);
 #endif
 	mm->context.id = MMU_NO_CONTEXT;
 	mm->context.active = 0;
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 5e9e1e57d580..7b51f962ce0c 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -671,70 +671,34 @@ unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr)
 }
 EXPORT_SYMBOL_GPL(get_slice_psize);
 
-/*
- * This is called by hash_page when it needs to do a lazy conversion of
- * an address space from real 64K pages to combo 4K pages (typically
- * when hitting a non cacheable mapping on a processor or hypervisor
- * that won't allow them for 64K pages).
- *
- * This is also called in init_new_context() to change back the user
- * psize from whatever the parent context had it set to
- * N.B. This may be called before mm->context.id has been set.
- *
- * This function will only change the content of the {low,high)_slice_psize
- * masks, it will not flush SLBs as this shall be handled lazily by the
- * caller.
- */
-void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
+void slice_init_new_context_exec(struct mm_struct *mm)
 {
-	int index, mask_index;
 	unsigned char *hpsizes, *lpsizes;
-	unsigned long flags;
-	unsigned int old_psize;
-	int i;
+	unsigned int psize = mmu_virtual_psize;
 
-	slice_dbg("slice_set_user_psize(mm=%p, psize=%d)\n", mm, psize);
+	slice_dbg("slice_init_new_context_exec(mm=%p)\n", mm);
 
-	VM_BUG_ON(radix_enabled());
-	spin_lock_irqsave(&slice_convert_lock, flags);
-
-	old_psize = mm->context.user_psize;
-	slice_dbg(" old_psize=%d\n", old_psize);
-	if (old_psize == psize)
-		goto bail;
+	/*
+	 * In the case of exec, use the default limit. In the
+	 * case of fork it is just inherited from the mm being
+	 * duplicated.
+	 */
+#ifdef CONFIG_PPC64
+	mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW_USER64;
+#else
+	mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW;
+#endif
 
 	mm->context.user_psize = psize;
-	wmb();
 
+	/*
+	 * Set all slice psizes to the default.
+	 */
 	lpsizes = mm->context.low_slices_psize;
-	for (i = 0; i < SLICE_NUM_LOW; i++) {
-		mask_index = i & 0x1;
-		index = i >> 1;
-		if (((lpsizes[index] >> (mask_index * 4)) & 0xf) == old_psize)
-			lpsizes[index] = (lpsizes[index] &
-					  ~(0xf << (mask_index * 4))) |
-				(((unsigned long)psize) << (mask_index * 4));
-	}
+	memset(lpsizes, (psize << 4) | psize, SLICE_NUM_LOW >> 1);
 
 	hpsizes = mm->context.high_slices_psize;
-	for (i = 0; i < SLICE_NUM_HIGH; i++) {
-		mask_index = i & 0x1;
-		index = i >> 1;
-		if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == old_psize)
-			hpsizes[index] = (hpsizes[index] &
-					  ~(0xf << (mask_index * 4))) |
-				(((unsigned long)psize) << (mask_index * 4));
-	}
-
-
-
-
-	slice_dbg(" lsps=%lx, hsps=%lx\n",
-		  (unsigned long)mm->context.low_slices_psize,
-		  (unsigned long)mm->context.high_slices_psize);
-
- bail:
-	spin_unlock_irqrestore(&slice_convert_lock, flags);
+	memset(hpsizes, (psize << 4) | psize, SLICE_NUM_HIGH >> 1);
 }
 
 void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
-- 
2.16.1


* [PATCH v2 02/10] powerpc/mm/slice: tidy lpsizes and hpsizes update loops
  2018-03-07  1:37 [PATCH v2 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 01/10] powerpc/mm/slice: Simplify and optimise slice context initialisation Nicholas Piggin
@ 2018-03-07  1:37 ` Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 03/10] powerpc/mm/slice: pass pointers to struct slice_mask where possible Nicholas Piggin
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2018-03-07  1:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

Make these loops look the same, and change their form so the
important part is not wrapped over so many lines.
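
For readers who do not want to parse the hunk, the loop shape this patch
settles on looks roughly like the following. This is a standalone sketch
with a hypothetical DEMO_NUM_LOW and a plain 64-bit mask, not the actual
slice_convert code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_LOW 16	/* hypothetical number of low slices */

/* For each slice selected in mask, overwrite its 4-bit psize entry. */
void demo_convert_low(unsigned char *lpsizes, uint64_t mask, unsigned int psize)
{
	unsigned int i;

	assert(psize <= 0xf);
	for (i = 0; i < DEMO_NUM_LOW; i++) {
		unsigned int shift, index;

		if (!(mask & (1ull << i)))
			continue;	/* slice not selected: leave it alone */

		shift = (i & 0x1) * 4;	/* low or high nibble of the byte */
		index = i >> 1;		/* two slices per byte */
		lpsizes[index] = (unsigned char)
			((lpsizes[index] & ~(0xfu << shift)) | (psize << shift));
	}
}
```

The early continue keeps the nibble update at one indentation level, so
the same body can be used for both the low and high loops.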

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 7b51f962ce0c..432c328b3e94 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -232,22 +232,24 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 	spin_lock_irqsave(&slice_convert_lock, flags);
 
 	lpsizes = mm->context.low_slices_psize;
-	for (i = 0; i < SLICE_NUM_LOW; i++)
-		if (mask.low_slices & (1u << i)) {
-			mask_index = i & 0x1;
-			index = i >> 1;
-			lpsizes[index] = (lpsizes[index] &
-					  ~(0xf << (mask_index * 4))) |
+	for (i = 0; i < SLICE_NUM_LOW; i++) {
+		if (!(mask.low_slices & (1u << i)))
+			continue;
+
+		mask_index = i & 0x1;
+		index = i >> 1;
+		lpsizes[index] = (lpsizes[index] & ~(0xf << (mask_index * 4))) |
 				(((unsigned long)psize) << (mask_index * 4));
-		}
+	}
 
 	hpsizes = mm->context.high_slices_psize;
 	for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
+		if (!test_bit(i, mask.high_slices))
+			continue;
+
 		mask_index = i & 0x1;
 		index = i >> 1;
-		if (test_bit(i, mask.high_slices))
-			hpsizes[index] = (hpsizes[index] &
-					  ~(0xf << (mask_index * 4))) |
+		hpsizes[index] = (hpsizes[index] & ~(0xf << (mask_index * 4))) |
 				(((unsigned long)psize) << (mask_index * 4));
 	}
 
-- 
2.16.1


* [PATCH v2 03/10] powerpc/mm/slice: pass pointers to struct slice_mask where possible
  2018-03-07  1:37 [PATCH v2 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 01/10] powerpc/mm/slice: Simplify and optimise slice context initialisation Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 02/10] powerpc/mm/slice: tidy lpsizes and hpsizes update loops Nicholas Piggin
@ 2018-03-07  1:37 ` Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 04/10] powerpc/mm/slice: implement a slice mask cache Nicholas Piggin
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2018-03-07  1:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

Pass around const pointers to struct slice_mask where possible, rather
than copies of slice_mask, to reduce stack and call overhead.

checkstack.pl gives, before:
0x00000d1c slice_get_unmapped_area [slice.o]:		592
0x00001864 is_hugepage_only_range [slice.o]:		448
0x00000754 slice_find_area_topdown [slice.o]:		400
0x00000484 slice_find_area_bottomup.isra.1 [slice.o]:	272
0x000017b4 slice_set_range_psize [slice.o]:		224
0x00000a4c slice_find_area [slice.o]:			128
0x00000160 slice_check_fit [slice.o]:			112

after:
0x00000ad0 slice_get_unmapped_area [slice.o]:		448
0x00001464 is_hugepage_only_range [slice.o]:		288
0x000006c0 slice_find_area [slice.o]:			144
0x0000016c slice_check_fit [slice.o]:			128
0x00000528 slice_find_area_bottomup.isra.2 [slice.o]:	128
0x000013e4 slice_set_range_psize [slice.o]:		128

This increases vfork+exec+exit performance by 1.5%.

Reduces time to mmap+munmap a 64kB page by 17%.
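
To make the stack saving concrete, here is a small userspace sketch of
the two calling styles. struct demo_mask is a hypothetical stand-in for
struct slice_mask with an arbitrary size; how a by-value struct is
actually passed depends on the ABI, but a large one typically ends up
copied to the stack for each call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct slice_mask; the size is illustrative. */
struct demo_mask {
	uint64_t low;
	uint64_t high[8];
};

/* By const pointer: two pointers are passed and the callee cannot modify. */
bool demo_fits(const struct demo_mask *mask, const struct demo_mask *avail)
{
	size_t i;

	assert(mask && avail);
	if ((mask->low & avail->low) != mask->low)
		return false;
	for (i = 0; i < sizeof(mask->high) / sizeof(mask->high[0]); i++)
		if ((mask->high[i] & avail->high[i]) != mask->high[i])
			return false;
	return true;
}

/* By value: both 72-byte structs are copied for the call. */
bool demo_fits_by_value(struct demo_mask mask, struct demo_mask avail)
{
	return demo_fits(&mask, &avail);
}
```

Both functions return the same answer; only the amount of data moved per
call differs.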

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 84 ++++++++++++++++++++++++++-----------------------
 1 file changed, 45 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 432c328b3e94..420d791f0e18 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -50,19 +50,21 @@ struct slice_mask {
 #ifdef DEBUG
 int _slice_debug = 1;
 
-static void slice_print_mask(const char *label, struct slice_mask mask)
+static void slice_print_mask(const char *label, const struct slice_mask *mask)
 {
 	if (!_slice_debug)
 		return;
-	pr_devel("%s low_slice: %*pbl\n", label, (int)SLICE_NUM_LOW, &mask.low_slices);
-	pr_devel("%s high_slice: %*pbl\n", label, (int)SLICE_NUM_HIGH, mask.high_slices);
+	pr_devel("%s low_slice: %*pbl\n", label,
+			(int)SLICE_NUM_LOW, &mask->low_slices);
+	pr_devel("%s high_slice: %*pbl\n", label,
+			(int)SLICE_NUM_HIGH, mask->high_slices);
 }
 
 #define slice_dbg(fmt...) do { if (_slice_debug) pr_devel(fmt); } while (0)
 
 #else
 
-static void slice_print_mask(const char *label, struct slice_mask mask) {}
+static void slice_print_mask(const char *label, const struct slice_mask *mask) {}
 #define slice_dbg(fmt...)
 
 #endif
@@ -179,7 +181,8 @@ static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_ma
 }
 
 static int slice_check_fit(struct mm_struct *mm,
-			   struct slice_mask mask, struct slice_mask available)
+			   const struct slice_mask *mask,
+			   const struct slice_mask *available)
 {
 	DECLARE_BITMAP(result, SLICE_NUM_HIGH);
 	/*
@@ -189,14 +192,14 @@ static int slice_check_fit(struct mm_struct *mm,
 	unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
 
 	if (!SLICE_NUM_HIGH)
-		return (mask.low_slices & available.low_slices) ==
-		       mask.low_slices;
+		return (mask->low_slices & available->low_slices) ==
+		       mask->low_slices;
 
-	bitmap_and(result, mask.high_slices,
-		   available.high_slices, slice_count);
+	bitmap_and(result, mask->high_slices,
+		   available->high_slices, slice_count);
 
-	return (mask.low_slices & available.low_slices) == mask.low_slices &&
-		bitmap_equal(result, mask.high_slices, slice_count);
+	return (mask->low_slices & available->low_slices) == mask->low_slices &&
+		bitmap_equal(result, mask->high_slices, slice_count);
 }
 
 static void slice_flush_segments(void *parm)
@@ -216,7 +219,8 @@ static void slice_flush_segments(void *parm)
 #endif
 }
 
-static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psize)
+static void slice_convert(struct mm_struct *mm,
+				const struct slice_mask *mask, int psize)
 {
 	int index, mask_index;
 	/* Write the new slice psize bits */
@@ -233,7 +237,7 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 
 	lpsizes = mm->context.low_slices_psize;
 	for (i = 0; i < SLICE_NUM_LOW; i++) {
-		if (!(mask.low_slices & (1u << i)))
+		if (!(mask->low_slices & (1u << i)))
 			continue;
 
 		mask_index = i & 0x1;
@@ -244,7 +248,7 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 
 	hpsizes = mm->context.high_slices_psize;
 	for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
-		if (!test_bit(i, mask.high_slices))
+		if (!test_bit(i, mask->high_slices))
 			continue;
 
 		mask_index = i & 0x1;
@@ -270,26 +274,25 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
  * 'available' slice_mark.
  */
 static bool slice_scan_available(unsigned long addr,
-				 struct slice_mask available,
-				 int end,
-				 unsigned long *boundary_addr)
+				 const struct slice_mask *available,
+				 int end, unsigned long *boundary_addr)
 {
 	unsigned long slice;
 	if (addr < SLICE_LOW_TOP) {
 		slice = GET_LOW_SLICE_INDEX(addr);
 		*boundary_addr = (slice + end) << SLICE_LOW_SHIFT;
-		return !!(available.low_slices & (1u << slice));
+		return !!(available->low_slices & (1u << slice));
 	} else {
 		slice = GET_HIGH_SLICE_INDEX(addr);
 		*boundary_addr = (slice + end) ?
 			((slice + end) << SLICE_HIGH_SHIFT) : SLICE_LOW_TOP;
-		return !!test_bit(slice, available.high_slices);
+		return !!test_bit(slice, available->high_slices);
 	}
 }
 
 static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
 					      unsigned long len,
-					      struct slice_mask available,
+					      const struct slice_mask *available,
 					      int psize, unsigned long high_limit)
 {
 	int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
@@ -335,7 +338,7 @@ static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
 
 static unsigned long slice_find_area_topdown(struct mm_struct *mm,
 					     unsigned long len,
-					     struct slice_mask available,
+					     const struct slice_mask *available,
 					     int psize, unsigned long high_limit)
 {
 	int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
@@ -393,7 +396,7 @@ static unsigned long slice_find_area_topdown(struct mm_struct *mm,
 
 
 static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
-				     struct slice_mask mask, int psize,
+				     const struct slice_mask *mask, int psize,
 				     int topdown, unsigned long high_limit)
 {
 	if (topdown)
@@ -402,7 +405,8 @@ static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
 		return slice_find_area_bottomup(mm, len, mask, psize, high_limit);
 }
 
-static inline void slice_or_mask(struct slice_mask *dst, struct slice_mask *src)
+static inline void slice_or_mask(struct slice_mask *dst,
+					const struct slice_mask *src)
 {
 	dst->low_slices |= src->low_slices;
 	if (!SLICE_NUM_HIGH)
@@ -411,7 +415,8 @@ static inline void slice_or_mask(struct slice_mask *dst, struct slice_mask *src)
 		  SLICE_NUM_HIGH);
 }
 
-static inline void slice_andnot_mask(struct slice_mask *dst, struct slice_mask *src)
+static inline void slice_andnot_mask(struct slice_mask *dst,
+					const struct slice_mask *src)
 {
 	dst->low_slices &= ~src->low_slices;
 
@@ -501,7 +506,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	 * already
 	 */
 	slice_mask_for_size(mm, psize, &good_mask, high_limit);
-	slice_print_mask(" good_mask", good_mask);
+	slice_print_mask(" good_mask", &good_mask);
 
 	/*
 	 * Here "good" means slices that are already the right page size,
@@ -535,12 +540,12 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	if (addr != 0 || fixed) {
 		/* Build a mask for the requested range */
 		slice_range_to_mask(addr, len, &mask);
-		slice_print_mask(" mask", mask);
+		slice_print_mask(" mask", &mask);
 
 		/* Check if we fit in the good mask. If we do, we just return,
 		 * nothing else to do
 		 */
-		if (slice_check_fit(mm, mask, good_mask)) {
+		if (slice_check_fit(mm, &mask, &good_mask)) {
 			slice_dbg(" fits good !\n");
 			return addr;
 		}
@@ -548,7 +553,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 		/* Now let's see if we can find something in the existing
 		 * slices for that size
 		 */
-		newaddr = slice_find_area(mm, len, good_mask,
+		newaddr = slice_find_area(mm, len, &good_mask,
 					  psize, topdown, high_limit);
 		if (newaddr != -ENOMEM) {
 			/* Found within the good mask, we don't have to setup,
@@ -564,9 +569,10 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	 */
 	slice_mask_for_free(mm, &potential_mask, high_limit);
 	slice_or_mask(&potential_mask, &good_mask);
-	slice_print_mask(" potential", potential_mask);
+	slice_print_mask(" potential", &potential_mask);
 
-	if ((addr != 0 || fixed) && slice_check_fit(mm, mask, potential_mask)) {
+	if ((addr != 0 || fixed) &&
+			slice_check_fit(mm, &mask, &potential_mask)) {
 		slice_dbg(" fits potential !\n");
 		goto convert;
 	}
@@ -581,7 +587,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	 * anywhere in the good area.
 	 */
 	if (addr) {
-		addr = slice_find_area(mm, len, good_mask,
+		addr = slice_find_area(mm, len, &good_mask,
 				       psize, topdown, high_limit);
 		if (addr != -ENOMEM) {
 			slice_dbg(" found area at 0x%lx\n", addr);
@@ -592,14 +598,14 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	/* Now let's see if we can find something in the existing slices
 	 * for that size plus free slices
 	 */
-	addr = slice_find_area(mm, len, potential_mask,
+	addr = slice_find_area(mm, len, &potential_mask,
 			       psize, topdown, high_limit);
 
 #ifdef CONFIG_PPC_64K_PAGES
 	if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
 		/* retry the search with 4k-page slices included */
 		slice_or_mask(&potential_mask, &compat_mask);
-		addr = slice_find_area(mm, len, potential_mask,
+		addr = slice_find_area(mm, len, &potential_mask,
 				       psize, topdown, high_limit);
 	}
 #endif
@@ -609,7 +615,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 
 	slice_range_to_mask(addr, len, &mask);
 	slice_dbg(" found potential area at 0x%lx\n", addr);
-	slice_print_mask(" mask", mask);
+	slice_print_mask(" mask", &mask);
 
  convert:
 	slice_andnot_mask(&mask, &good_mask);
@@ -617,7 +623,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	if (mask.low_slices ||
 	    (SLICE_NUM_HIGH &&
 	     !bitmap_empty(mask.high_slices, SLICE_NUM_HIGH))) {
-		slice_convert(mm, mask, psize);
+		slice_convert(mm, &mask, psize);
 		if (psize > MMU_PAGE_BASE)
 			on_each_cpu(slice_flush_segments, mm, 1);
 	}
@@ -711,7 +717,7 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
 	VM_BUG_ON(radix_enabled());
 
 	slice_range_to_mask(start, len, &mask);
-	slice_convert(mm, mask, psize);
+	slice_convert(mm, &mask, psize);
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
@@ -758,9 +764,9 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 #if 0 /* too verbose */
 	slice_dbg("is_hugepage_only_range(mm=%p, addr=%lx, len=%lx)\n",
 		 mm, addr, len);
-	slice_print_mask(" mask", mask);
-	slice_print_mask(" available", available);
+	slice_print_mask(" mask", &mask);
+	slice_print_mask(" available", &available);
 #endif
-	return !slice_check_fit(mm, mask, available);
+	return !slice_check_fit(mm, &mask, &available);
 }
 #endif
-- 
2.16.1


* [PATCH v2 04/10] powerpc/mm/slice: implement a slice mask cache
  2018-03-07  1:37 [PATCH v2 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (2 preceding siblings ...)
  2018-03-07  1:37 ` [PATCH v2 03/10] powerpc/mm/slice: pass pointers to struct slice_mask where possible Nicholas Piggin
@ 2018-03-07  1:37 ` Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 05/10] powerpc/mm/slice: implement slice_check_range_fits Nicholas Piggin
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2018-03-07  1:37 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Benjamin Herrenschmidt, Anton Blanchard

Calculating the slice mask can become a significant overhead for
get_unmapped_area. This patch adds a struct slice_mask for
each page size in the mm_context, and keeps these in sync with
the slices psize arrays and slb_addr_limit.

On Book3S/64 this adds 288 bytes to the mm_context_t for the
slice mask caches.

On POWER8, this increases vfork+exec+exit performance by 9.9%
and reduces time to mmap+munmap a 64kB page by 28%.

Reduces time to mmap+munmap by about 10% on 8xx.
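
The caching idea can be sketched in userspace as follows. The packed
psize array stays the source of truth, and every conversion moves the
slice's bit from the old page size's mask to the new one, so a lookup
becomes a plain read instead of a scan. All demo_* names and the two
DEMO_* constants are hypothetical, and a single 64-bit word stands in
for the low/high split of the real struct slice_mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CACHE_SLICES 16	/* hypothetical number of slices */
#define DEMO_NUM_PSIZES 4	/* hypothetical number of page sizes */

struct demo_ctx {
	unsigned char psizes[DEMO_CACHE_SLICES >> 1];	/* source of truth */
	uint64_t mask[DEMO_NUM_PSIZES];			/* cached mask per psize */
};

unsigned int demo_ctx_psize(const struct demo_ctx *c, unsigned int i)
{
	return (c->psizes[i >> 1] >> ((i & 0x1) * 4)) & 0xf;
}

/* Slow path: rebuild a mask by scanning, which the cache avoids. */
uint64_t demo_scan_mask(const struct demo_ctx *c, unsigned int psize)
{
	uint64_t m = 0;
	unsigned int i;

	for (i = 0; i < DEMO_CACHE_SLICES; i++)
		if (demo_ctx_psize(c, i) == psize)
			m |= 1ull << i;
	return m;
}

/* Fresh context: all slices use psize, so only that cache is filled. */
void demo_init(struct demo_ctx *c, unsigned int psize)
{
	assert(psize < DEMO_NUM_PSIZES);
	memset(c, 0, sizeof(*c));
	memset(c->psizes, (int)((psize << 4) | psize), sizeof(c->psizes));
	c->mask[psize] = (1ull << DEMO_CACHE_SLICES) - 1;
}

/* Convert the selected slices, updating the cache and array together. */
void demo_convert(struct demo_ctx *c, uint64_t sel, unsigned int psize)
{
	unsigned int i;

	assert(psize < DEMO_NUM_PSIZES);
	for (i = 0; i < DEMO_CACHE_SLICES; i++) {
		unsigned int shift = (i & 0x1) * 4, index = i >> 1;

		if (!(sel & (1ull << i)))
			continue;

		/* Update the cached masks before overwriting the old psize. */
		c->mask[demo_ctx_psize(c, i)] &= ~(1ull << i);
		c->mask[psize] |= 1ull << i;

		c->psizes[index] = (unsigned char)
			((c->psizes[index] & ~(0xfu << shift)) | (psize << shift));
	}
}
```

The invariant to keep in mind is that mask[p] always equals what
demo_scan_mask() would compute for p.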

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Anton Blanchard <anton@samba.org>
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  18 +++++
 arch/powerpc/include/asm/mmu-8xx.h       |  10 +++
 arch/powerpc/mm/slice.c                  | 112 +++++++++++++++++++------------
 3 files changed, 98 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index bef6e39ed63a..777778579305 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -80,6 +80,16 @@ struct spinlock;
 /* Maximum possible number of NPUs in a system. */
 #define NV_MAX_NPUS 8
 
+/*
+ * One bit per slice. We have lower slices which cover 256MB segments
+ * upto 4G range. That gets us 16 low slices. For the rest we track slices
+ * in 1TB size.
+ */
+struct slice_mask {
+	u64 low_slices;
+	DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
+};
+
 typedef struct {
 	mm_context_id_t id;
 	u16 user_psize;		/* page size index */
@@ -95,6 +105,14 @@ typedef struct {
 	unsigned char low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE];
 	unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
 	unsigned long slb_addr_limit;
+# ifdef CONFIG_PPC_64K_PAGES
+	struct slice_mask mask_64k;
+# endif
+	struct slice_mask mask_4k;
+# ifdef CONFIG_HUGETLB_PAGE
+	struct slice_mask mask_16m;
+	struct slice_mask mask_16g;
+# endif
 #else
 	u16 sllp;		/* SLB page size encoding */
 #endif
diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index d3d7e79140c6..4f547752ae79 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -192,6 +192,11 @@
 #endif
 
 #ifndef __ASSEMBLY__
+struct slice_mask {
+	u64 low_slices;
+	DECLARE_BITMAP(high_slices, 0);
+};
+
 typedef struct {
 	unsigned int id;
 	unsigned int active;
@@ -201,6 +206,11 @@ typedef struct {
 	unsigned char low_slices_psize[SLICE_ARRAY_SIZE];
 	unsigned char high_slices_psize[0];
 	unsigned long slb_addr_limit;
+	struct slice_mask mask_base_psize; /* 4k or 16k */
+# ifdef CONFIG_HUGETLB_PAGE
+	struct slice_mask mask_512k;
+	struct slice_mask mask_8m;
+# endif
 #endif
 } mm_context_t;
 
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 420d791f0e18..3e199b9cbbfd 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -37,15 +37,6 @@
 #include <asm/hugetlb.h>
 
 static DEFINE_SPINLOCK(slice_convert_lock);
-/*
- * One bit per slice. We have lower slices which cover 256MB segments
- * upto 4G range. That gets us 16 low slices. For the rest we track slices
- * in 1TB size.
- */
-struct slice_mask {
-	u64 low_slices;
-	DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
-};
 
 #ifdef DEBUG
 int _slice_debug = 1;
@@ -149,36 +140,39 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
 			__set_bit(i, ret->high_slices);
 }
 
-static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_mask *ret,
-				unsigned long high_limit)
+#ifdef CONFIG_PPC_BOOK3S_64
+static struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
 {
-	unsigned char *hpsizes, *lpsizes;
-	int index, mask_index;
-	unsigned long i;
-
-	ret->low_slices = 0;
-	if (SLICE_NUM_HIGH)
-		bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
-
-	lpsizes = mm->context.low_slices_psize;
-	for (i = 0; i < SLICE_NUM_LOW; i++) {
-		mask_index = i & 0x1;
-		index = i >> 1;
-		if (((lpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
-			ret->low_slices |= 1u << i;
-	}
-
-	if (high_limit <= SLICE_LOW_TOP)
-		return;
-
-	hpsizes = mm->context.high_slices_psize;
-	for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) {
-		mask_index = i & 0x1;
-		index = i >> 1;
-		if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
-			__set_bit(i, ret->high_slices);
-	}
+#ifdef CONFIG_PPC_64K_PAGES
+	if (psize == MMU_PAGE_64K)
+		return &mm->context.mask_64k;
+#endif
+	if (psize == MMU_PAGE_4K)
+		return &mm->context.mask_4k;
+#ifdef CONFIG_HUGETLB_PAGE
+	if (psize == MMU_PAGE_16M)
+		return &mm->context.mask_16m;
+	if (psize == MMU_PAGE_16G)
+		return &mm->context.mask_16g;
+#endif
+	BUG();
 }
+#elif defined(CONFIG_PPC_8xx)
+static struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
+{
+	if (psize == mmu_virtual_psize)
+		return &mm->context.mask_base_psize;
+#ifdef CONFIG_HUGETLB_PAGE
+	if (psize == MMU_PAGE_512K)
+		return &mm->context.mask_512k;
+	if (psize == MMU_PAGE_8M)
+		return &mm->context.mask_8m;
+#endif
+	BUG();
+}
+#else
+#error "Must define the slice masks for page sizes supported by the platform"
+#endif
 
 static int slice_check_fit(struct mm_struct *mm,
 			   const struct slice_mask *mask,
@@ -225,11 +219,15 @@ static void slice_convert(struct mm_struct *mm,
 	int index, mask_index;
 	/* Write the new slice psize bits */
 	unsigned char *hpsizes, *lpsizes;
+	struct slice_mask *psize_mask, *old_mask;
 	unsigned long i, flags;
+	int old_psize;
 
 	slice_dbg("slice_convert(mm=%p, psize=%d)\n", mm, psize);
 	slice_print_mask(" mask", mask);
 
+	psize_mask = slice_mask_for_size(mm, psize);
+
 	/* We need to use a spinlock here to protect against
 	 * concurrent 64k -> 4k demotion ...
 	 */
@@ -242,6 +240,14 @@ static void slice_convert(struct mm_struct *mm,
 
 		mask_index = i & 0x1;
 		index = i >> 1;
+
+		/* Update the slice_mask */
+		old_psize = (lpsizes[index] >> (mask_index * 4)) & 0xf;
+		old_mask = slice_mask_for_size(mm, old_psize);
+		old_mask->low_slices &= ~(1u << i);
+		psize_mask->low_slices |= 1u << i;
+
+		/* Update the sizes array */
 		lpsizes[index] = (lpsizes[index] & ~(0xf << (mask_index * 4))) |
 				(((unsigned long)psize) << (mask_index * 4));
 	}
@@ -253,6 +259,14 @@ static void slice_convert(struct mm_struct *mm,
 
 		mask_index = i & 0x1;
 		index = i >> 1;
+
+		/* Update the slice_mask */
+		old_psize = (hpsizes[index] >> (mask_index * 4)) & 0xf;
+		old_mask = slice_mask_for_size(mm, old_psize);
+		__clear_bit(i, old_mask->high_slices);
+		__set_bit(i, psize_mask->high_slices);
+
+		/* Update the sizes array */
 		hpsizes[index] = (hpsizes[index] & ~(0xf << (mask_index * 4))) |
 				(((unsigned long)psize) << (mask_index * 4));
 	}
@@ -463,7 +477,13 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	}
 
 	if (high_limit > mm->context.slb_addr_limit) {
+		/*
+		 * Increasing the slb_addr_limit does not require
+		 * slice mask cache to be recalculated because it should
+		 * be already initialised beyond the old address limit.
+		 */
 		mm->context.slb_addr_limit = high_limit;
+
 		on_each_cpu(slice_flush_segments, mm, 1);
 	}
 
@@ -505,7 +525,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	/* First make up a "good" mask of slices that have the right size
 	 * already
 	 */
-	slice_mask_for_size(mm, psize, &good_mask, high_limit);
+	good_mask = *slice_mask_for_size(mm, psize);
 	slice_print_mask(" good_mask", &good_mask);
 
 	/*
@@ -530,7 +550,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 #ifdef CONFIG_PPC_64K_PAGES
 	/* If we support combo pages, we can allow 64k pages in 4k slices */
 	if (psize == MMU_PAGE_64K) {
-		slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit);
+		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
 		if (fixed)
 			slice_or_mask(&good_mask, &compat_mask);
 	}
@@ -682,6 +702,7 @@ EXPORT_SYMBOL_GPL(get_slice_psize);
 void slice_init_new_context_exec(struct mm_struct *mm)
 {
 	unsigned char *hpsizes, *lpsizes;
+	struct slice_mask *mask;
 	unsigned int psize = mmu_virtual_psize;
 
 	slice_dbg("slice_init_new_context_exec(mm=%p)\n", mm);
@@ -707,6 +728,14 @@ void slice_init_new_context_exec(struct mm_struct *mm)
 
 	hpsizes = mm->context.high_slices_psize;
 	memset(hpsizes, (psize << 4) | psize, SLICE_NUM_HIGH >> 1);
+
+	/*
+	 * Slice mask cache starts zeroed, fill the default size cache.
+	 */
+	mask = slice_mask_for_size(mm, psize);
+	mask->low_slices = ~0UL;
+	if (SLICE_NUM_HIGH)
+		bitmap_fill(mask->high_slices, SLICE_NUM_HIGH);
 }
 
 void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
@@ -745,18 +774,17 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 {
 	struct slice_mask mask, available;
 	unsigned int psize = mm->context.user_psize;
-	unsigned long high_limit = mm->context.slb_addr_limit;
 
 	if (radix_enabled())
 		return 0;
 
 	slice_range_to_mask(addr, len, &mask);
-	slice_mask_for_size(mm, psize, &available, high_limit);
+	available = *slice_mask_for_size(mm, psize);
 #ifdef CONFIG_PPC_64K_PAGES
 	/* We need to account for 4k slices too */
 	if (psize == MMU_PAGE_64K) {
 		struct slice_mask compat_mask;
-		slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit);
+		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
 		slice_or_mask(&available, &compat_mask);
 	}
 #endif
-- 
2.16.1


* [PATCH v2 05/10] powerpc/mm/slice: implement slice_check_range_fits
  2018-03-07  1:37 [PATCH v2 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (3 preceding siblings ...)
  2018-03-07  1:37 ` [PATCH v2 04/10] powerpc/mm/slice: implement a slice mask cache Nicholas Piggin
@ 2018-03-07  1:37 ` Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 06/10] powerpc/mm/slice: Switch to 3-operand slice bitops helpers Nicholas Piggin
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2018-03-07  1:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

Rather than building a slice mask from a range and then checking that
mask for fit against a candidate mask, implement slice_check_range_fits,
which checks directly whether a range fits in a mask.

This allows several structures to be removed from the stack. Most
callers are also not expected to pass a huge range, so building and
comparing a full mask is more expensive than testing just the one or
two bits that the range covers.

On POWER8, this increases vfork+exec+exit performance by 0.3%
and reduces time to mmap+munmap a 64kB page by 5%.
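
As a rough userspace illustration of the direct check (not the kernel
code itself), the sketch below tests only the bits a range covers. The
demo_ names, the single-word high mask and the constants are assumptions
made for the example; the real helper uses the kernel's slice macros
and bitmap API, and len is assumed to be non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants, assumed for this sketch only. */
#define DEMO_LOW_SHIFT	28		/* 256MB low slices */
#define DEMO_LOW_TOP	(1ULL << 32)	/* low slices cover the first 4GB */
#define DEMO_HIGH_SHIFT	40		/* 1TB high slices */
#define DEMO_NUM_HIGH	64		/* hypothetical: high slices fit one word */

struct demo_mask {
	uint64_t low;	/* bit i set: low slice i is available */
	uint64_t high;	/* bit i set: high slice i is available */
};

/* Return true if every slice touched by [start, start + len) is available. */
static bool demo_range_fits(const struct demo_mask *avail,
			    uint64_t start, uint64_t len)
{
	uint64_t end = start + len - 1;
	uint64_t i;

	if (start < DEMO_LOW_TOP) {
		uint64_t mend = end < DEMO_LOW_TOP - 1 ? end : DEMO_LOW_TOP - 1;
		/* Bits for the first..last low slice, built without a loop. */
		uint64_t want = (1ULL << ((mend >> DEMO_LOW_SHIFT) + 1)) -
				(1ULL << (start >> DEMO_LOW_SHIFT));

		if ((want & avail->low) != want)
			return false;
	}

	if (end >= DEMO_LOW_TOP) {
		uint64_t first = start >> DEMO_HIGH_SHIFT;
		uint64_t last = end >> DEMO_HIGH_SHIFT;

		if (last >= DEMO_NUM_HIGH)
			return false;	/* outside the modelled address space */

		/* Usually one or two iterations: test the bits directly. */
		for (i = first; i <= last; i++) {
			if (!((avail->high >> i) & 1))
				return false;
		}
	}

	return true;
}
```

No temporary mask is built for the range, which is where the stack and
time savings described above would come from.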

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 62 +++++++++++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 3e199b9cbbfd..0a5efa40e739 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -174,26 +174,36 @@ static struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
 #error "Must define the slice masks for page sizes supported by the platform"
 #endif
 
-static int slice_check_fit(struct mm_struct *mm,
-			   const struct slice_mask *mask,
-			   const struct slice_mask *available)
+static bool slice_check_range_fits(struct mm_struct *mm,
+			   const struct slice_mask *available,
+			   unsigned long start, unsigned long len)
 {
-	DECLARE_BITMAP(result, SLICE_NUM_HIGH);
-	/*
-	 * Make sure we just do bit compare only to the max
-	 * addr limit and not the full bit map size.
-	 */
-	unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
+	unsigned long end = start + len - 1;
+	u64 low_slices = 0;
 
-	if (!SLICE_NUM_HIGH)
-		return (mask->low_slices & available->low_slices) ==
-		       mask->low_slices;
+	if (start < SLICE_LOW_TOP) {
+		unsigned long mend = min(end,
+					 (unsigned long)(SLICE_LOW_TOP - 1));
 
-	bitmap_and(result, mask->high_slices,
-		   available->high_slices, slice_count);
+		low_slices = (1u << (GET_LOW_SLICE_INDEX(mend) + 1))
+				- (1u << GET_LOW_SLICE_INDEX(start));
+	}
+	if ((low_slices & available->low_slices) != low_slices)
+		return false;
 
-	return (mask->low_slices & available->low_slices) == mask->low_slices &&
-		bitmap_equal(result, mask->high_slices, slice_count);
+	if (SLICE_NUM_HIGH && ((start + len) > SLICE_LOW_TOP)) {
+		unsigned long start_index = GET_HIGH_SLICE_INDEX(start);
+		unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT));
+		unsigned long count = GET_HIGH_SLICE_INDEX(align_end) - start_index;
+		unsigned long i;
+
+		for (i = start_index; i < start_index + count; i++) {
+			if (!test_bit(i, available->high_slices))
+				return false;
+		}
+	}
+
+	return true;
 }
 
 static void slice_flush_segments(void *parm)
@@ -558,14 +568,10 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 
 	/* First check hint if it's valid or if we have MAP_FIXED */
 	if (addr != 0 || fixed) {
-		/* Build a mask for the requested range */
-		slice_range_to_mask(addr, len, &mask);
-		slice_print_mask(" mask", &mask);
-
 		/* Check if we fit in the good mask. If we do, we just return,
 		 * nothing else to do
 		 */
-		if (slice_check_fit(mm, &mask, &good_mask)) {
+		if (slice_check_range_fits(mm, &good_mask, addr, len)) {
 			slice_dbg(" fits good !\n");
 			return addr;
 		}
@@ -591,10 +597,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	slice_or_mask(&potential_mask, &good_mask);
 	slice_print_mask(" potential", &potential_mask);
 
-	if ((addr != 0 || fixed) &&
-			slice_check_fit(mm, &mask, &potential_mask)) {
-		slice_dbg(" fits potential !\n");
-		goto convert;
+	if (addr != 0 || fixed) {
+		if (slice_check_range_fits(mm, &potential_mask, addr, len)) {
+			slice_dbg(" fits potential !\n");
+			goto convert;
+		}
 	}
 
 	/* If we have MAP_FIXED and failed the above steps, then error out */
@@ -772,13 +779,12 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
 int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 			   unsigned long len)
 {
-	struct slice_mask mask, available;
+	struct slice_mask available;
 	unsigned int psize = mm->context.user_psize;
 
 	if (radix_enabled())
 		return 0;
 
-	slice_range_to_mask(addr, len, &mask);
 	available = *slice_mask_for_size(mm, psize);
 #ifdef CONFIG_PPC_64K_PAGES
 	/* We need to account for 4k slices too */
@@ -795,6 +801,6 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 	slice_print_mask(" mask", &mask);
 	slice_print_mask(" available", &available);
 #endif
-	return !slice_check_fit(mm, &mask, &available);
+	return !slice_check_range_fits(mm, &available, addr, len);
 }
 #endif
-- 
2.16.1


* [PATCH v2 06/10] powerpc/mm/slice: Switch to 3-operand slice bitops helpers
  2018-03-07  1:37 [PATCH v2 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (4 preceding siblings ...)
  2018-03-07  1:37 ` [PATCH v2 05/10] powerpc/mm/slice: implement slice_check_range_fits Nicholas Piggin
@ 2018-03-07  1:37 ` Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 07/10] powerpc/mm/slice: remove dead code Nicholas Piggin
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2018-03-07  1:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

This converts the slice_mask bit operation helpers to the usual
3-operand form, which lets two inputs produce a separate output
without an extra copy. A later patch in this series relies on that.

Also add slice_copy_mask, which is used by the same later patch.
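
A minimal userspace sketch of the 3-operand shape, for illustration
only. The demo_ names and the two-word high bitmap are assumptions; the
kernel helpers use bitmap_or() and bitmap_andnot() over SLICE_NUM_HIGH
bits.

```c
#include <stdint.h>

#define DEMO_HIGH_WORDS 2	/* hypothetical size of the high-slice bitmap */

struct demo_mask {
	uint64_t low;
	uint64_t high[DEMO_HIGH_WORDS];
};

/* dst = src1 | src2; dst may alias either source. */
static void demo_or_mask(struct demo_mask *dst,
			 const struct demo_mask *src1,
			 const struct demo_mask *src2)
{
	int i;

	dst->low = src1->low | src2->low;
	for (i = 0; i < DEMO_HIGH_WORDS; i++)
		dst->high[i] = src1->high[i] | src2->high[i];
}

/* dst = src1 & ~src2; dst may alias either source. */
static void demo_andnot_mask(struct demo_mask *dst,
			     const struct demo_mask *src1,
			     const struct demo_mask *src2)
{
	int i;

	dst->low = src1->low & ~src2->low;
	for (i = 0; i < DEMO_HIGH_WORDS; i++)
		dst->high[i] = src1->high[i] & ~src2->high[i];
}
```

Passing the destination as the first source reproduces the old
2-operand behaviour, while passing three distinct masks writes the
result to a new location and leaves both inputs untouched.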

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 0a5efa40e739..4b2fd37b727a 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -429,25 +429,33 @@ static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
 		return slice_find_area_bottomup(mm, len, mask, psize, high_limit);
 }
 
-static inline void slice_or_mask(struct slice_mask *dst,
+static inline void slice_copy_mask(struct slice_mask *dst,
 					const struct slice_mask *src)
 {
-	dst->low_slices |= src->low_slices;
+	dst->low_slices = src->low_slices;
 	if (!SLICE_NUM_HIGH)
 		return;
-	bitmap_or(dst->high_slices, dst->high_slices, src->high_slices,
-		  SLICE_NUM_HIGH);
+	bitmap_copy(dst->high_slices, src->high_slices, SLICE_NUM_HIGH);
 }
 
-static inline void slice_andnot_mask(struct slice_mask *dst,
-					const struct slice_mask *src)
+static inline void slice_or_mask(struct slice_mask *dst,
+					const struct slice_mask *src1,
+					const struct slice_mask *src2)
 {
-	dst->low_slices &= ~src->low_slices;
+	dst->low_slices = src1->low_slices | src2->low_slices;
+	if (!SLICE_NUM_HIGH)
+		return;
+	bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
+}
 
+static inline void slice_andnot_mask(struct slice_mask *dst,
+					const struct slice_mask *src1,
+					const struct slice_mask *src2)
+{
+	dst->low_slices = src1->low_slices & ~src2->low_slices;
 	if (!SLICE_NUM_HIGH)
 		return;
-	bitmap_andnot(dst->high_slices, dst->high_slices, src->high_slices,
-		      SLICE_NUM_HIGH);
+	bitmap_andnot(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
 }
 
 #ifdef CONFIG_PPC_64K_PAGES
@@ -562,7 +570,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	if (psize == MMU_PAGE_64K) {
 		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
 		if (fixed)
-			slice_or_mask(&good_mask, &compat_mask);
+			slice_or_mask(&good_mask, &good_mask, &compat_mask);
 	}
 #endif
 
@@ -594,7 +602,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	 * empty and thus can be converted
 	 */
 	slice_mask_for_free(mm, &potential_mask, high_limit);
-	slice_or_mask(&potential_mask, &good_mask);
+	slice_or_mask(&potential_mask, &potential_mask, &good_mask);
 	slice_print_mask(" potential", &potential_mask);
 
 	if (addr != 0 || fixed) {
@@ -631,7 +639,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 #ifdef CONFIG_PPC_64K_PAGES
 	if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
 		/* retry the search with 4k-page slices included */
-		slice_or_mask(&potential_mask, &compat_mask);
+		slice_or_mask(&potential_mask, &potential_mask, &compat_mask);
 		addr = slice_find_area(mm, len, &potential_mask,
 				       psize, topdown, high_limit);
 	}
@@ -645,8 +653,8 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	slice_print_mask(" mask", &mask);
 
  convert:
-	slice_andnot_mask(&mask, &good_mask);
-	slice_andnot_mask(&mask, &compat_mask);
+	slice_andnot_mask(&mask, &mask, &good_mask);
+	slice_andnot_mask(&mask, &mask, &compat_mask);
 	if (mask.low_slices ||
 	    (SLICE_NUM_HIGH &&
 	     !bitmap_empty(mask.high_slices, SLICE_NUM_HIGH))) {
@@ -791,7 +799,7 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 	if (psize == MMU_PAGE_64K) {
 		struct slice_mask compat_mask;
 		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
-		slice_or_mask(&available, &compat_mask);
+		slice_or_mask(&available, &available, &compat_mask);
 	}
 #endif
 
-- 
2.16.1


* [PATCH v2 07/10] powerpc/mm/slice: remove dead code
  2018-03-07  1:37 [PATCH v2 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (5 preceding siblings ...)
  2018-03-07  1:37 ` [PATCH v2 06/10] powerpc/mm/slice: Switch to 3-operand slice bitops helpers Nicholas Piggin
@ 2018-03-07  1:37 ` Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 08/10] powerpc/mm/slice: Use const pointers to cached slice masks where possible Nicholas Piggin
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2018-03-07  1:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

This code is never compiled in, and it gets broken by the next
patch, so remove it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 4b2fd37b727a..c4cb4de1fab5 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -803,12 +803,6 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 	}
 #endif
 
-#if 0 /* too verbose */
-	slice_dbg("is_hugepage_only_range(mm=%p, addr=%lx, len=%lx)\n",
-		 mm, addr, len);
-	slice_print_mask(" mask", &mask);
-	slice_print_mask(" available", &available);
-#endif
 	return !slice_check_range_fits(mm, &available, addr, len);
 }
 #endif
-- 
2.16.1


* [PATCH v2 08/10] powerpc/mm/slice: Use const pointers to cached slice masks where possible
  2018-03-07  1:37 [PATCH v2 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (6 preceding siblings ...)
  2018-03-07  1:37 ` [PATCH v2 07/10] powerpc/mm/slice: remove dead code Nicholas Piggin
@ 2018-03-07  1:37 ` Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 09/10] powerpc/mm/slice: remove radix calls to the slice code Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 10/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations Nicholas Piggin
  9 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2018-03-07  1:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

The slice_mask cache was a basic conversion which copied the cached
mask into the caller's structures, because that is how the original
code worked. In most cases the pointer to the cached mask can be used
directly instead, saving a copy and an on-stack structure.

On POWER8, this increases vfork+exec+exit performance by 0.3%
and reduces time to mmap+munmap a 64kB page by 2%.
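
The idea can be sketched in userspace as follows. This is an
illustration under assumptions, not the patch itself: demo_available
and the single-word masks are hypothetical, and a NULL compat pointer
stands for the case where no 4k-compatible mask applies.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_mask {
	uint64_t low;
	uint64_t high;
};

/*
 * Return the mask to test against. The cached mask is returned by
 * pointer when no combination is needed; scratch is only written
 * when a compat mask has to be ORed in.
 */
static const struct demo_mask *demo_available(const struct demo_mask *cached,
					      const struct demo_mask *compat,
					      struct demo_mask *scratch)
{
	if (!compat)
		return cached;	/* no copy, no on-stack structure needed */

	scratch->low = cached->low | compat->low;
	scratch->high = cached->high | compat->high;
	return scratch;
}
```

Because the returned pointer is const, a caller cannot modify the cache
through it by accident.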

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 79 ++++++++++++++++++++++++-------------------------
 1 file changed, 38 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index c4cb4de1fab5..b3b465c37224 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -468,10 +468,10 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 				      unsigned long flags, unsigned int psize,
 				      int topdown)
 {
-	struct slice_mask mask;
 	struct slice_mask good_mask;
 	struct slice_mask potential_mask;
-	struct slice_mask compat_mask;
+	const struct slice_mask *maskp;
+	const struct slice_mask *compat_maskp = NULL;
 	int fixed = (flags & MAP_FIXED);
 	int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
 	unsigned long page_size = 1UL << pshift;
@@ -505,22 +505,6 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 		on_each_cpu(slice_flush_segments, mm, 1);
 	}
 
-	/*
-	 * init different masks
-	 */
-	mask.low_slices = 0;
-
-	/* silence stupid warning */;
-	potential_mask.low_slices = 0;
-
-	compat_mask.low_slices = 0;
-
-	if (SLICE_NUM_HIGH) {
-		bitmap_zero(mask.high_slices, SLICE_NUM_HIGH);
-		bitmap_zero(potential_mask.high_slices, SLICE_NUM_HIGH);
-		bitmap_zero(compat_mask.high_slices, SLICE_NUM_HIGH);
-	}
-
 	/* Sanity checks */
 	BUG_ON(mm->task_size == 0);
 	BUG_ON(mm->context.slb_addr_limit == 0);
@@ -543,8 +527,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	/* First make up a "good" mask of slices that have the right size
 	 * already
 	 */
-	good_mask = *slice_mask_for_size(mm, psize);
-	slice_print_mask(" good_mask", &good_mask);
+	maskp = slice_mask_for_size(mm, psize);
 
 	/*
 	 * Here "good" means slices that are already the right page size,
@@ -565,14 +548,24 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	 *	search in good | compat | free, found => convert free.
 	 */
 
-#ifdef CONFIG_PPC_64K_PAGES
-	/* If we support combo pages, we can allow 64k pages in 4k slices */
-	if (psize == MMU_PAGE_64K) {
-		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
+	/*
+	 * If we support combo pages, we can allow 64k pages in 4k slices
+	 * The mask copies could be avoided in most cases here if we had
+	 * a pointer to good mask for the next code to use.
+	 */
+	if (IS_ENABLED(CONFIG_PPC_64K_PAGES) && psize == MMU_PAGE_64K) {
+		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
 		if (fixed)
-			slice_or_mask(&good_mask, &good_mask, &compat_mask);
+			slice_or_mask(&good_mask, maskp, compat_maskp);
+		else
+			slice_copy_mask(&good_mask, maskp);
+	} else {
+		slice_copy_mask(&good_mask, maskp);
 	}
-#endif
+
+	slice_print_mask(" good_mask", &good_mask);
+	if (compat_maskp)
+		slice_print_mask(" compat_mask", compat_maskp);
 
 	/* First check hint if it's valid or if we have MAP_FIXED */
 	if (addr != 0 || fixed) {
@@ -639,7 +632,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 #ifdef CONFIG_PPC_64K_PAGES
 	if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
 		/* retry the search with 4k-page slices included */
-		slice_or_mask(&potential_mask, &potential_mask, &compat_mask);
+		slice_or_mask(&potential_mask, &potential_mask, compat_maskp);
 		addr = slice_find_area(mm, len, &potential_mask,
 				       psize, topdown, high_limit);
 	}
@@ -648,17 +641,18 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	if (addr == -ENOMEM)
 		return -ENOMEM;
 
-	slice_range_to_mask(addr, len, &mask);
+	slice_range_to_mask(addr, len, &potential_mask);
 	slice_dbg(" found potential area at 0x%lx\n", addr);
-	slice_print_mask(" mask", &mask);
+	slice_print_mask(" mask", &potential_mask);
 
  convert:
-	slice_andnot_mask(&mask, &mask, &good_mask);
-	slice_andnot_mask(&mask, &mask, &compat_mask);
-	if (mask.low_slices ||
-	    (SLICE_NUM_HIGH &&
-	     !bitmap_empty(mask.high_slices, SLICE_NUM_HIGH))) {
-		slice_convert(mm, &mask, psize);
+	slice_andnot_mask(&potential_mask, &potential_mask, &good_mask);
+	if (compat_maskp && !fixed)
+		slice_andnot_mask(&potential_mask, &potential_mask, compat_maskp);
+	if (potential_mask.low_slices ||
+		(SLICE_NUM_HIGH &&
+		 !bitmap_empty(potential_mask.high_slices, SLICE_NUM_HIGH))) {
+		slice_convert(mm, &potential_mask, psize);
 		if (psize > MMU_PAGE_BASE)
 			on_each_cpu(slice_flush_segments, mm, 1);
 	}
@@ -787,22 +781,25 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
 int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 			   unsigned long len)
 {
-	struct slice_mask available;
+	const struct slice_mask *maskp;
 	unsigned int psize = mm->context.user_psize;
 
 	if (radix_enabled())
 		return 0;
 
-	available = *slice_mask_for_size(mm, psize);
+	maskp = slice_mask_for_size(mm, psize);
 #ifdef CONFIG_PPC_64K_PAGES
 	/* We need to account for 4k slices too */
 	if (psize == MMU_PAGE_64K) {
-		struct slice_mask compat_mask;
-		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
-		slice_or_mask(&available, &available, &compat_mask);
+		const struct slice_mask *compat_maskp;
+		struct slice_mask available;
+
+		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
+		slice_or_mask(&available, maskp, compat_maskp);
+		return !slice_check_range_fits(mm, &available, addr, len);
 	}
 #endif
 
-	return !slice_check_range_fits(mm, &available, addr, len);
+	return !slice_check_range_fits(mm, maskp, addr, len);
 }
 #endif
-- 
2.16.1


* [PATCH v2 09/10] powerpc/mm/slice: remove radix calls to the slice code
  2018-03-07  1:37 [PATCH v2 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (7 preceding siblings ...)
  2018-03-07  1:37 ` [PATCH v2 08/10] powerpc/mm/slice: Use const pointers to cached slice masks where possible Nicholas Piggin
@ 2018-03-07  1:37 ` Nicholas Piggin
  2018-03-07  1:37 ` [PATCH v2 10/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations Nicholas Piggin
  9 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2018-03-07  1:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

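This is a tidy-up which removes calls into the slice code when the
radix MMU is in use. Callers now check radix_enabled() themselves, and
the slice functions assert !radix_enabled() with VM_BUG_ON.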
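
The shape of the change can be sketched in userspace like this. All
demo_ names are stand-ins invented for the example: DEMO_MM_SLICES
models IS_ENABLED(CONFIG_PPC_MM_SLICES) and demo_radix models
radix_enabled().

```c
#include <stdbool.h>

#define DEMO_MM_SLICES 1	/* stand-in for IS_ENABLED(CONFIG_PPC_MM_SLICES) */

static bool demo_radix;		/* stand-in for radix_enabled() */
static int demo_slice_calls;	/* counts calls that reach the slice code */

/* Slice-side function: callers must not reach it under radix. */
static int demo_slice_is_hugepage_only_range(unsigned long addr,
					     unsigned long len)
{
	(void)addr;
	(void)len;
	demo_slice_calls++;
	return 1;	/* arbitrary answer for the sketch */
}

/* Generic wrapper: the radix check lives here, outside the slice code. */
static int demo_is_hugepage_only_range(unsigned long addr, unsigned long len)
{
	if (DEMO_MM_SLICES && !demo_radix)
		return demo_slice_is_hugepage_only_range(addr, len);
	return 0;
}
```

With the check in the wrapper, the slice function is never entered
under radix, so it can treat that condition as a bug.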

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/hugetlb.h |  7 ++++---
 arch/powerpc/mm/hugetlbpage.c      |  6 ++++--
 arch/powerpc/mm/slice.c            | 17 ++++-------------
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 1a4847f67ea8..9e168407cd1e 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -90,16 +90,17 @@ pte_t *huge_pte_offset_and_shift(struct mm_struct *mm,
 void flush_dcache_icache_hugepage(struct page *page);
 
 #if defined(CONFIG_PPC_MM_SLICES)
-int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
+int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 			   unsigned long len);
-#else
+#endif
 static inline int is_hugepage_only_range(struct mm_struct *mm,
 					 unsigned long addr,
 					 unsigned long len)
 {
+	if (IS_ENABLED(CONFIG_PPC_MM_SLICES) && !radix_enabled())
+		return slice_is_hugepage_only_range(mm, addr, len);
 	return 0;
 }
-#endif
 
 void book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea,
 			    pte_t pte);
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 590be3fa0ce2..f4153f21d214 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -565,10 +565,12 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
 {
 #ifdef CONFIG_PPC_MM_SLICES
-	unsigned int psize = get_slice_psize(vma->vm_mm, vma->vm_start);
 	/* With radix we don't use slice, so derive it from vma*/
-	if (!radix_enabled())
+	if (!radix_enabled()) {
+		unsigned int psize = get_slice_psize(vma->vm_mm, vma->vm_start);
+
 		return 1UL << mmu_psize_to_shift(psize);
+	}
 #endif
 	if (!is_vm_hugetlb_page(vma))
 		return PAGE_SIZE;
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index b3b465c37224..1297b3ad7dd2 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -686,16 +686,8 @@ unsigned int get_slice_psize(struct mm_struct *mm, unsigned long addr)
 	unsigned char *psizes;
 	int index, mask_index;
 
-	/*
-	 * Radix doesn't use slice, but can get enabled along with MMU_SLICE
-	 */
-	if (radix_enabled()) {
-#ifdef CONFIG_PPC_64K_PAGES
-		return MMU_PAGE_64K;
-#else
-		return MMU_PAGE_4K;
-#endif
-	}
+	VM_BUG_ON(radix_enabled());
+
 	if (addr < SLICE_LOW_TOP) {
 		psizes = mm->context.low_slices_psize;
 		index = GET_LOW_SLICE_INDEX(addr);
@@ -778,14 +770,13 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
  * for now as we only use slices with hugetlbfs enabled. This should
  * be fixed as the generic code gets fixed.
  */
-int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
+int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 			   unsigned long len)
 {
 	const struct slice_mask *maskp;
 	unsigned int psize = mm->context.user_psize;
 
-	if (radix_enabled())
-		return 0;
+	VM_BUG_ON(radix_enabled());
 
 	maskp = slice_mask_for_size(mm, psize);
 #ifdef CONFIG_PPC_64K_PAGES
-- 
2.16.1


* [PATCH v2 10/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations
  2018-03-07  1:37 [PATCH v2 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
                   ` (8 preceding siblings ...)
  2018-03-07  1:37 ` [PATCH v2 09/10] powerpc/mm/slice: remove radix calls to the slice code Nicholas Piggin
@ 2018-03-07  1:37 ` Nicholas Piggin
  2018-03-07  3:22   ` Nicholas Piggin
  9 siblings, 1 reply; 14+ messages in thread
From: Nicholas Piggin @ 2018-03-07  1:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy

The number of high slices a process might use now depends on its
address space size and on the allocation address it has requested.

This patch uses that limit throughout call chains where possible,
rather than using the fixed SLICE_NUM_HIGH for bitmap operations.
This saves some cost for processes that don't use very large address
spaces.

Performance numbers aren't changed significantly. This may change
with larger address spaces or different mmap access patterns that
require more slice mask building.
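
To illustrate where the saving would come from, here is a userspace
sketch of a bitmap OR bounded by a dynamic bit count. The demo_ names,
the 1TB slice size and the 512-slice maximum are assumptions for the
example, not values taken from this patch.

```c
#include <stdint.h>

#define DEMO_HIGH_SHIFT	40	/* 1TB high slices, assumed for this sketch */
#define DEMO_NUM_HIGH	512	/* hypothetical fixed maximum of high slices */
#define DEMO_WORDS(n)	(((n) + 63) / 64)

/* OR two bitmaps, touching only the words needed for nbits bits. */
static unsigned int demo_bitmap_or(uint64_t *dst, const uint64_t *a,
				   const uint64_t *b, unsigned int nbits)
{
	unsigned int i, words = DEMO_WORDS(nbits);

	for (i = 0; i < words; i++)
		dst[i] = a[i] | b[i];

	return words;	/* number of 64-bit words processed */
}

/* Number of high slices below an address limit. */
static unsigned int demo_high_slices(uint64_t limit)
{
	return (unsigned int)(limit >> DEMO_HIGH_SHIFT);
}
```

Under these assumptions a 128TB limit gives 128 high slices, so each
operation processes 2 words instead of the 8 needed for the full
512-bit map. Words beyond the limit are left untouched, which is why
every consumer of the mask has to use the same bound.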

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/slice.c | 75 +++++++++++++++++++++++++++++--------------------
 1 file changed, 45 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 1297b3ad7dd2..09b95e976de9 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -61,14 +61,12 @@ static void slice_print_mask(const char *label, const struct slice_mask *mask) {
 #endif
 
 static void slice_range_to_mask(unsigned long start, unsigned long len,
-				struct slice_mask *ret)
+				struct slice_mask *ret,
+				unsigned long high_slices)
 {
 	unsigned long end = start + len - 1;
 
 	ret->low_slices = 0;
-	if (SLICE_NUM_HIGH)
-		bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
-
 	if (start < SLICE_LOW_TOP) {
 		unsigned long mend = min(end,
 					 (unsigned long)(SLICE_LOW_TOP - 1));
@@ -77,6 +75,11 @@ static void slice_range_to_mask(unsigned long start, unsigned long len,
 			- (1u << GET_LOW_SLICE_INDEX(start));
 	}
 
+	if (!SLICE_NUM_HIGH)
+		return;
+
+#error XXX: bitmap_zero with non-const size? Good code?
+	bitmap_zero(ret->high_slices, high_slices);
 	if ((start + len) > SLICE_LOW_TOP) {
 		unsigned long start_index = GET_HIGH_SLICE_INDEX(start);
 		unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT));
@@ -120,22 +123,20 @@ static int slice_high_has_vma(struct mm_struct *mm, unsigned long slice)
 }
 
 static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
-				unsigned long high_limit)
+				unsigned long high_slices)
 {
 	unsigned long i;
 
 	ret->low_slices = 0;
-	if (SLICE_NUM_HIGH)
-		bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
-
 	for (i = 0; i < SLICE_NUM_LOW; i++)
 		if (!slice_low_has_vma(mm, i))
 			ret->low_slices |= 1u << i;
 
-	if (high_limit <= SLICE_LOW_TOP)
+	if (!SLICE_NUM_HIGH || !high_slices)
 		return;
 
-	for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++)
+	bitmap_zero(ret->high_slices, high_slices);
+	for (i = 0; i < high_slices; i++)
 		if (!slice_high_has_vma(mm, i))
 			__set_bit(i, ret->high_slices);
 }
@@ -228,6 +229,7 @@ static void slice_convert(struct mm_struct *mm,
 {
 	int index, mask_index;
 	/* Write the new slice psize bits */
+	unsigned long high_slices;
 	unsigned char *hpsizes, *lpsizes;
 	struct slice_mask *psize_mask, *old_mask;
 	unsigned long i, flags;
@@ -263,7 +265,8 @@ static void slice_convert(struct mm_struct *mm,
 	}
 
 	hpsizes = mm->context.high_slices_psize;
-	for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
+	high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
+	for (i = 0; SLICE_NUM_HIGH && i < high_slices; i++) {
 		if (!test_bit(i, mask->high_slices))
 			continue;
 
@@ -430,32 +433,35 @@ static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
 }
 
 static inline void slice_copy_mask(struct slice_mask *dst,
-					const struct slice_mask *src)
+					const struct slice_mask *src,
+					unsigned long high_slices)
 {
 	dst->low_slices = src->low_slices;
 	if (!SLICE_NUM_HIGH)
 		return;
-	bitmap_copy(dst->high_slices, src->high_slices, SLICE_NUM_HIGH);
+	bitmap_copy(dst->high_slices, src->high_slices, high_slices);
 }
 
 static inline void slice_or_mask(struct slice_mask *dst,
 					const struct slice_mask *src1,
-					const struct slice_mask *src2)
+					const struct slice_mask *src2,
+					unsigned long high_slices)
 {
 	dst->low_slices = src1->low_slices | src2->low_slices;
 	if (!SLICE_NUM_HIGH)
 		return;
-	bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
+	bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices, high_slices);
 }
 
 static inline void slice_andnot_mask(struct slice_mask *dst,
 					const struct slice_mask *src1,
-					const struct slice_mask *src2)
+					const struct slice_mask *src2,
+					unsigned long high_slices)
 {
 	dst->low_slices = src1->low_slices & ~src2->low_slices;
 	if (!SLICE_NUM_HIGH)
 		return;
-	bitmap_andnot(dst->high_slices, src1->high_slices, src2->high_slices, SLICE_NUM_HIGH);
+	bitmap_andnot(dst->high_slices, src1->high_slices, src2->high_slices, high_slices);
 }
 
 #ifdef CONFIG_PPC_64K_PAGES
@@ -478,6 +484,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	struct mm_struct *mm = current->mm;
 	unsigned long newaddr;
 	unsigned long high_limit;
+	unsigned long high_slices;
 
 	high_limit = DEFAULT_MAP_WINDOW;
 	if (addr >= high_limit || (fixed && (addr + len > high_limit)))
@@ -494,6 +501,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 			return -ENOMEM;
 	}
 
+	high_slices = GET_HIGH_SLICE_INDEX(high_limit);
 	if (high_limit > mm->context.slb_addr_limit) {
 		/*
 		 * Increasing the slb_addr_limit does not require
@@ -556,11 +564,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	if (IS_ENABLED(CONFIG_PPC_64K_PAGES) && psize == MMU_PAGE_64K) {
 		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
 		if (fixed)
-			slice_or_mask(&good_mask, maskp, compat_maskp);
+			slice_or_mask(&good_mask, maskp, compat_maskp, high_slices);
 		else
-			slice_copy_mask(&good_mask, maskp);
+			slice_copy_mask(&good_mask, maskp, high_slices);
 	} else {
-		slice_copy_mask(&good_mask, maskp);
+		slice_copy_mask(&good_mask, maskp, high_slices);
 	}
 
 	slice_print_mask(" good_mask", &good_mask);
@@ -594,8 +602,8 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	 * We don't fit in the good mask, check what other slices are
 	 * empty and thus can be converted
 	 */
-	slice_mask_for_free(mm, &potential_mask, high_limit);
-	slice_or_mask(&potential_mask, &potential_mask, &good_mask);
+	slice_mask_for_free(mm, &potential_mask, high_slices);
+	slice_or_mask(&potential_mask, &potential_mask, &good_mask, high_slices);
 	slice_print_mask(" potential", &potential_mask);
 
 	if (addr != 0 || fixed) {
@@ -632,7 +640,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 #ifdef CONFIG_PPC_64K_PAGES
 	if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
 		/* retry the search with 4k-page slices included */
-		slice_or_mask(&potential_mask, &potential_mask, compat_maskp);
+		slice_or_mask(&potential_mask, &potential_mask, compat_maskp, high_slices);
 		addr = slice_find_area(mm, len, &potential_mask,
 				       psize, topdown, high_limit);
 	}
@@ -641,17 +649,18 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	if (addr == -ENOMEM)
 		return -ENOMEM;
 
-	slice_range_to_mask(addr, len, &potential_mask);
+	slice_range_to_mask(addr, len, &potential_mask, high_slices);
 	slice_dbg(" found potential area at 0x%lx\n", addr);
 	slice_print_mask(" mask", &potential_mask);
 
  convert:
-	slice_andnot_mask(&potential_mask, &potential_mask, &good_mask);
+	slice_andnot_mask(&potential_mask, &potential_mask, &good_mask, high_slices);
 	if (compat_maskp && !fixed)
-		slice_andnot_mask(&potential_mask, &potential_mask, compat_maskp);
+		slice_andnot_mask(&potential_mask, &potential_mask, compat_maskp, high_slices);
+#error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
 	if (potential_mask.low_slices ||
 		(SLICE_NUM_HIGH &&
-		 !bitmap_empty(potential_mask.high_slices, SLICE_NUM_HIGH))) {
+		 !bitmap_empty(potential_mask.high_slices, high_slices))) {
 		slice_convert(mm, &potential_mask, psize);
 		if (psize > MMU_PAGE_BASE)
 			on_each_cpu(slice_flush_segments, mm, 1);
@@ -722,7 +731,9 @@ void slice_init_new_context_exec(struct mm_struct *mm)
 	mm->context.user_psize = psize;
 
 	/*
-	 * Set all slice psizes to the default.
+	 * Set all slice psizes to the default. High slices could
+	 * be initialised up to slb_addr_limit if we ensure to
+	 * initialise the rest of them as slb_addr_limit is expanded.
 	 */
 	lpsizes = mm->context.low_slices_psize;
 	memset(lpsizes, (psize << 4) | psize, SLICE_NUM_LOW >> 1);
@@ -743,10 +754,12 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
 			   unsigned long len, unsigned int psize)
 {
 	struct slice_mask mask;
+	unsigned long high_slices;
 
 	VM_BUG_ON(radix_enabled());
 
-	slice_range_to_mask(start, len, &mask);
+	high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
+	slice_range_to_mask(start, len, &mask, high_slices);
 	slice_convert(mm, &mask, psize);
 }
 
@@ -784,9 +797,11 @@ int slice_is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 	if (psize == MMU_PAGE_64K) {
 		const struct slice_mask *compat_maskp;
 		struct slice_mask available;
+		unsigned long high_slices;
 
 		compat_maskp = slice_mask_for_size(mm, MMU_PAGE_4K);
-		slice_or_mask(&available, maskp, compat_maskp);
+		high_slices = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
+		slice_or_mask(&available, maskp, compat_maskp, high_slices);
 		return !slice_check_range_fits(mm, &available, addr, len);
 	}
 #endif
-- 
2.16.1
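
The #error note in the convert: path above asks whether high_slices can
be 0 while SLICE_NUM_HIGH is non-zero. A minimal user-space sketch of why
that matters, assuming an emptiness test that treats a zero-bit range as
empty. sketch_bitmap_empty() and BITS_PER_WORD are hypothetical names for
illustration, not the kernel's bitmap_empty() implementation:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical stand-in for a bitmap emptiness test: returns true when
 * none of the first nbits bits of map are set.
 */
static bool sketch_bitmap_empty(const unsigned long *map, size_t nbits)
{
	size_t full_words = nbits / BITS_PER_WORD;
	size_t tail_bits = nbits % BITS_PER_WORD;
	size_t i;

	assert(map != NULL);

	/* Whole words inside the range must be zero. */
	for (i = 0; i < full_words; i++)
		if (map[i])
			return false;

	/* Only the low tail_bits bits of the last partial word count. */
	if (tail_bits) {
		unsigned long tail_mask = (1UL << tail_bits) - 1;

		if (map[full_words] & tail_mask)
			return false;
	}

	/* nbits == 0 falls through to here: vacuously empty. */
	return true;
}
```

With nbits == 0 the sketch returns true whatever the words contain, so a
check of the form !empty(mask, high_slices) would never fire for the high
slices in that case.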

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 10/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations
  2018-03-07  1:37 ` [PATCH v2 10/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations Nicholas Piggin
@ 2018-03-07  3:22   ` Nicholas Piggin
  2018-03-07 10:45     ` Michael Ellerman
  0 siblings, 1 reply; 14+ messages in thread
From: Nicholas Piggin @ 2018-03-07  3:22 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Aneesh Kumar K . V, Christophe Leroy

On Wed,  7 Mar 2018 11:37:18 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> The number of high slices a process might use now depends on its
> address space size, and what allocation address it has requested.
> 
> This patch uses that limit throughout call chains where possible,
> rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
> This saves some cost for processes that don't use very large address
> spaces.
> 
> Performance numbers aren't changed significantly; this may change
> with larger address spaces or different mmap access patterns that
> require more slice mask building.


Ignore this patch in the series. I didn't intend to send it.

Thanks,
Nick
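
For readers following the quoted description, the idea of limiting bitmap
work to a per-call bit count rather than a compile-time maximum can be
sketched in user-space C. This is only an illustration under stated
assumptions: struct sketch_mask, sketch_or_mask(), WORDS_FOR() and the
value of SKETCH_NUM_HIGH are hypothetical and are not taken from the
patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD   (sizeof(unsigned long) * CHAR_BIT)
#define WORDS_FOR(bits) (((bits) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Assumed compile-time maximum, chosen for illustration only. */
#define SKETCH_NUM_HIGH 512

struct sketch_mask {
	unsigned long high[WORDS_FOR(SKETCH_NUM_HIGH)];
};

/*
 * dst = src1 | src2, but only over the words needed to cover the first
 * nbits bits. Returns the number of words touched so the saving is
 * visible to the caller.
 */
static size_t sketch_or_mask(struct sketch_mask *dst,
			     const struct sketch_mask *src1,
			     const struct sketch_mask *src2,
			     size_t nbits)
{
	size_t words = WORDS_FOR(nbits);
	size_t i;

	assert(nbits <= SKETCH_NUM_HIGH);

	for (i = 0; i < words; i++)
		dst->high[i] = src1->high[i] | src2->high[i];

	return words;
}
```

Words beyond the limit are left untouched in dst, so in this sketch every
reader of the result has to apply the same limit, otherwise it may see
stale upper words.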

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 10/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations
  2018-03-07  3:22   ` Nicholas Piggin
@ 2018-03-07 10:45     ` Michael Ellerman
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Ellerman @ 2018-03-07 10:45 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V

Nicholas Piggin <npiggin@gmail.com> writes:
> On Wed,  7 Mar 2018 11:37:18 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
>
>> The number of high slices a process might use now depends on its
>> address space size, and what allocation address it has requested.
>> 
>> This patch uses that limit throughout call chains where possible,
>> rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
>> This saves some cost for processes that don't use very large address
>> spaces.
>> 
>> Performance numbers aren't changed significantly; this may change
>> with larger address spaces or different mmap access patterns that
>> require more slice mask building.
>
>
> Ignore this patch in the series. I didn't intend to send it.

Oops :D

I'll drop it and rebuild.

cheers

kisskb: Failed 206/268
http://kisskb.ellerman.id.au/kisskb/head/82bc47f26969f6ed290cda529b9893941923c0f4/
  Failed: powerpc-next/pseries_le_defconfig+NO_NUMA/ppc64le		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296349/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_le_defconfig+NO_SPLPAR/ppc64le		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296347/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_defconfig+NO_SPLPAR/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296346/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_defconfig+NO_SPLPAR/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296345/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pmac32_defconfig+KVM/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296343/log/)
    /kisskb/src/arch/powerpc/kvm/powerpc.c:1361:2: error: 'emulated' may be used uninitialized in this function [-Werror=uninitialized]
  Failed: powerpc-next/allmodconfig+64K_PAGES/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296342/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/allmodconfig+64K_PAGES/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296341/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/skiroot_defconfig/ppc64le		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296339/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/powernv_defconfig+NO_NUMA/ppc64le		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296338/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/powernv_defconfig+STRICT_RWX/ppc64le		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296337/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/corenet64_smp_defconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296332/log/)
    /kisskb/src/arch/powerpc/include/asm/hugetlb.h:101:10: error: implicit declaration of function 'slice_is_hugepage_only_range' [-Werror=implicit-function-declaration]
  Failed: powerpc-next/corenet32_smp_defconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296331/log/)
    /kisskb/src/arch/powerpc/include/asm/hugetlb.h:101:10: error: implicit declaration of function 'slice_is_hugepage_only_range' [-Werror=implicit-function-declaration]
  Failed: powerpc-next/ppc64_defconfig+NO_HUGETLB/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296322/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64_defconfig+NO_KVM/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296321/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_defconfig+NO_MEMORY_HOTPLUG/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296319/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/powerpc-randconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296317/log/)
    /kisskb/src/arch/powerpc/kernel/watchdog.c:168:3: error: implicit declaration of function 'smp_flush_nmi_ipi' [-Werror=implicit-function-declaration]
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/kernel/watchdog.c:166:4: error: implicit declaration of function 'smp_send_nmi_ipi' [-Werror=implicit-function-declaration]
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64_defconfig+NO_TM/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296316/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_defconfig+NO_MEMORY_HOTREMOVE/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296292/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/mpc85xx_smp_defconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296256/log/)
    /kisskb/src/arch/powerpc/include/asm/hugetlb.h:101:10: error: implicit declaration of function 'slice_is_hugepage_only_range' [-Werror=implicit-function-declaration]
  Failed: powerpc-next/ppc64_defconfig+UP/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296248/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64_defconfig+NO_ALTIVEC/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296247/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/powerpc-allyesconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296246/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/powerpc-allmodconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296243/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64_defconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296239/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/mpc85xx_defconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296233/log/)
    /kisskb/src/arch/powerpc/include/asm/hugetlb.h:101:10: error: implicit declaration of function 'slice_is_hugepage_only_range' [-Werror=implicit-function-declaration]
  Failed: powerpc-next/pasemi_defconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296230/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ps3_defconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296229/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/maple_defconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296228/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/g5_defconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296227/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/cell_defconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296225/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_defconfig/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296222/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/powernv_defconfig/ppc64le		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296221/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64_defconfig+NO_ALTIVEC/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296219/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
    /kisskb/src/arch/powerpc/kvm/powerpc.c:1611:1: error: label 'out' defined but not used [-Werror=unused-label]
  Failed: powerpc-next/corenet64_smp_defconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296213/log/)
    /kisskb/src/arch/powerpc/include/asm/hugetlb.h:101:3: error: implicit declaration of function 'slice_is_hugepage_only_range' [-Werror=implicit-function-declaration]
  Failed: powerpc-next/corenet32_smp_defconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296212/log/)
    /kisskb/src/arch/powerpc/include/asm/hugetlb.h:101:3: error: implicit declaration of function 'slice_is_hugepage_only_range' [-Werror=implicit-function-declaration]
  Failed: powerpc-next/pseries_defconfig+FA_DUMP/powerpc-5.3		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296201/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64_defconfig+NO_HUGETLB/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296200/log/)
    /kisskb/src/arch/powerpc/kvm/powerpc.c:1361:2: error: 'emulated' may be used uninitialized in this function [-Werror=uninitialized]
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/allmodconfig+ppc64le/ppc64le		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296199/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_defconfig+FA_DUMP/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296198/log/)
    /kisskb/src/arch/powerpc/kvm/powerpc.c:1361:2: error: 'emulated' may be used uninitialized in this function [-Werror=uninitialized]
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64le_defconfig+NO_KVM/ppc64le		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296197/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64_defconfig+NO_KVM/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296196/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_le_defconfig/ppc64le		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296195/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64le_defconfig/ppc64le		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296194/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64_defconfig+NO_TM/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296193/log/)
    /kisskb/src/arch/powerpc/kvm/powerpc.c:1361:2: error: 'emulated' may be used uninitialized in this function [-Werror=uninitialized]
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64_defconfig+UP/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296181/log/)
    /kisskb/src/arch/powerpc/kvm/powerpc.c:1361:2: error: 'emulated' may be used uninitialized in this function [-Werror=uninitialized]
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64_defconfig+NO_RADIX/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296177/log/)
    /kisskb/src/arch/powerpc/kvm/powerpc.c:1361:2: error: 'emulated' may be used uninitialized in this function [-Werror=uninitialized]
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_defconfig+NO_MEMORY_HOTPLUG/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296175/log/)
    /kisskb/src/arch/powerpc/kvm/powerpc.c:1361:2: error: 'emulated' may be used uninitialized in this function [-Werror=uninitialized]
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/powernv_defconfig+NO_RADIX/ppc64le		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296173/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/maple_defconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296170/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/mpc85xx_smp_defconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296120/log/)
    /kisskb/src/arch/powerpc/include/asm/hugetlb.h:101:3: error: implicit declaration of function 'slice_is_hugepage_only_range' [-Werror=implicit-function-declaration]
  Failed: powerpc-next/powerpc-randconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296110/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/mpc85xx_defconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296105/log/)
    /kisskb/src/arch/powerpc/include/asm/hugetlb.h:101:3: error: implicit declaration of function 'slice_is_hugepage_only_range' [-Werror=implicit-function-declaration]
  Failed: powerpc-next/powerpc-allyesconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296104/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/powerpc-allmodconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296103/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pasemi_defconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296101/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ppc64_defconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296100/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/ps3_defconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296096/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/g5_defconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296095/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/cell_defconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296093/log/)
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?
  Failed: powerpc-next/pseries_defconfig/powerpc		(http://kisskb.ellerman.id.au/kisskb/buildresult/13296091/log/)
    /kisskb/src/arch/powerpc/kvm/powerpc.c:1361:2: error: 'emulated' may be used uninitialized in this function [-Werror=uninitialized]
    /kisskb/src/arch/powerpc/mm/slice.c:81:2: error: #error XXX: bitmap_zero with non-const size? Good code?
    /kisskb/src/arch/powerpc/mm/slice.c:660:2: error: #error XXX: are we sure high_slices is > 0 here when SLICE_NUM_HIGH? Perhaps not for 32-bit tasks on 64?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [v2, 01/10] powerpc/mm/slice: Simplify and optimise slice context initialisation
  2018-03-07  1:37 ` [PATCH v2 01/10] powerpc/mm/slice: Simplify and optimise slice context initialisation Nicholas Piggin
@ 2018-03-14  9:28   ` Michael Ellerman
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Ellerman @ 2018-03-14  9:28 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin

On Wed, 2018-03-07 at 01:37:09 UTC, Nicholas Piggin wrote:
> The slice state of an mm gets zeroed then initialised upon exec.
> This is the only caller of slice_set_user_psize now, so that can be
> removed and instead implement a faster and simplified approach that
> requires no locking or checking existing state.
> 
> This speeds up vfork+exec+exit performance on POWER8 by 3%.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Patches 1-9 applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1753dd1830367709144f68f539554d

cheers
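
The lock-free initialisation described in the quoted commit message rests
on the psize arrays holding two 4-bit entries per byte, so one memset()
with (psize << 4) | psize sets every entry at once. A minimal user-space
sketch of that packing follows. The names sketch_init_psizes(),
sketch_get_psize() and SKETCH_NUM_LOW are hypothetical, and the nibble
order used for the read-back (even index in the low nibble) is an
assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NUM_LOW 16	/* assumed slice count, for illustration */

/* Fill every 4-bit entry of a nibble-packed array with psize. */
static void sketch_init_psizes(unsigned char *psizes, size_t nslices,
			       unsigned int psize)
{
	assert(psize < 16);		/* must fit in one nibble */
	assert(nslices % 2 == 0);	/* two entries per byte */

	/* Both nibbles of each byte get the same value. */
	memset(psizes, (int)((psize << 4) | psize), nslices >> 1);
}

/* Read entry idx back: even entries in the low nibble, odd in the high. */
static unsigned int sketch_get_psize(const unsigned char *psizes, size_t idx)
{
	unsigned char byte = psizes[idx >> 1];

	return (idx & 1) ? (unsigned int)(byte >> 4)
			 : (unsigned int)(byte & 0xf);
}
```

Because both nibbles carry the same value after initialisation, the
result does not depend on which nibble order is assumed.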

^ permalink raw reply	[flat|nested] 14+ messages in thread

Thread overview: 14+ messages
2018-03-07  1:37 [PATCH v2 00/10] powerpc/mm/slice: improve slice speed and stack use Nicholas Piggin
2018-03-07  1:37 ` [PATCH v2 01/10] powerpc/mm/slice: Simplify and optimise slice context initialisation Nicholas Piggin
2018-03-14  9:28   ` [v2, " Michael Ellerman
2018-03-07  1:37 ` [PATCH v2 02/10] powerpc/mm/slice: tidy lpsizes and hpsizes update loops Nicholas Piggin
2018-03-07  1:37 ` [PATCH v2 03/10] powerpc/mm/slice: pass pointers to struct slice_mask where possible Nicholas Piggin
2018-03-07  1:37 ` [PATCH v2 04/10] powerpc/mm/slice: implement a slice mask cache Nicholas Piggin
2018-03-07  1:37 ` [PATCH v2 05/10] powerpc/mm/slice: implement slice_check_range_fits Nicholas Piggin
2018-03-07  1:37 ` [PATCH v2 06/10] powerpc/mm/slice: Switch to 3-operand slice bitops helpers Nicholas Piggin
2018-03-07  1:37 ` [PATCH v2 07/10] powerpc/mm/slice: remove dead code Nicholas Piggin
2018-03-07  1:37 ` [PATCH v2 08/10] powerpc/mm/slice: Use const pointers to cached slice masks where possible Nicholas Piggin
2018-03-07  1:37 ` [PATCH v2 09/10] powerpc/mm/slice: remove radix calls to the slice code Nicholas Piggin
2018-03-07  1:37 ` [PATCH v2 10/10] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations Nicholas Piggin
2018-03-07  3:22   ` Nicholas Piggin
2018-03-07 10:45     ` Michael Ellerman
