linux-mm.kvack.org archive mirror
* [PATCH V3 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
@ 2014-04-28  9:01 Madhavan Srinivasan
  2014-04-28  9:01 ` [PATCH V3 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan
  2014-04-28  9:01 ` [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan
  0 siblings, 2 replies; 14+ messages in thread
From: Madhavan Srinivasan @ 2014-04-28  9:01 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86
  Cc: benh, paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak,
	peterz, mingo, dave.hansen, Madhavan Srinivasan

Commit 8c6e50b029 from Kirill A. Shutemov introduced
vm_ops->map_pages() for mapping easily accessible pages around the
fault address, in the hope of reducing the number of minor page faults.

This patch set adds the infrastructure to modify the FAULT_AROUND_ORDER
value via mm/Kconfig, enabling architecture maintainers to choose a
suitable FAULT_AROUND_ORDER value based on performance data for their
architecture.  The first patch also defaults the FAULT_AROUND_ORDER
Kconfig option to 4.  The second patch lists performance numbers for
powerpc (pseries platform) and initializes the fault-around order
variable for the pseries platform of powerpc.

V3 Changes:
  Replaced the FAULT_AROUND_ORDER macro with a variable, to support
   architectures that have sub-platforms.
  Reworded commit messages.

V2 Changes:
  Created a Kconfig parameter for FAULT_AROUND_ORDER
  Added a check in do_read_fault() to handle a FAULT_AROUND_ORDER value of 0
  Reworded commit messages.

Madhavan Srinivasan (2):
  mm: move FAULT_AROUND_ORDER to arch/
  powerpc/pseries: init fault_around_order for pseries

 arch/powerpc/platforms/pseries/setup.c |    5 +++++
 mm/Kconfig                             |    8 ++++++++
 mm/memory.c                            |   11 ++++-------
 3 files changed, 17 insertions(+), 7 deletions(-)

-- 
1.7.10.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH V3 1/2] mm: move FAULT_AROUND_ORDER to arch/
  2014-04-28  9:01 [PATCH V3 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan
@ 2014-04-28  9:01 ` Madhavan Srinivasan
  2014-04-28  9:06   ` Peter Zijlstra
  2014-04-28  9:36   ` Kirill A. Shutemov
  2014-04-28  9:01 ` [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan
  1 sibling, 2 replies; 14+ messages in thread
From: Madhavan Srinivasan @ 2014-04-28  9:01 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86
  Cc: benh, paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak,
	peterz, mingo, dave.hansen, Madhavan Srinivasan

Commit 8c6e50b029 from Kirill A. Shutemov introduced
vm_ops->map_pages() for mapping easily accessible pages around the
fault address, in the hope of reducing the number of minor page faults.

This patch adds the infrastructure to modify the FAULT_AROUND_ORDER
value via mm/Kconfig, enabling architecture maintainers to choose a
suitable FAULT_AROUND_ORDER value based on performance data for their
architecture.  The patch also defaults the FAULT_AROUND_ORDER Kconfig
option to 4.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 mm/Kconfig  |    8 ++++++++
 mm/memory.c |   11 ++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index ebe5880..c7fc4f1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -176,6 +176,14 @@ config MOVABLE_NODE
 config HAVE_BOOTMEM_INFO_NODE
 	def_bool n
 
+#
+# Fault around order is a control knob that decides how many pages are
+# faulted around.  The default is 4, but an arch can override it as desired.
+#
+config FAULT_AROUND_ORDER
+	int
+	default	4
+
 # eventually, we can have this option just 'select SPARSEMEM'
 config MEMORY_HOTPLUG
 	bool "Allow for memory hot-add"
diff --git a/mm/memory.c b/mm/memory.c
index d0f0bef..457436d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3382,11 +3382,9 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 	update_mmu_cache(vma, address, pte);
 }
 
-#define FAULT_AROUND_ORDER 4
+unsigned int fault_around_order = CONFIG_FAULT_AROUND_ORDER;
 
 #ifdef CONFIG_DEBUG_FS
-static unsigned int fault_around_order = FAULT_AROUND_ORDER;
-
 static int fault_around_order_get(void *data, u64 *val)
 {
 	*val = fault_around_order;
@@ -3395,7 +3393,6 @@ static int fault_around_order_get(void *data, u64 *val)
 
 static int fault_around_order_set(void *data, u64 val)
 {
-	BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
 	if (1UL << val > PTRS_PER_PTE)
 		return -EINVAL;
 	fault_around_order = val;
@@ -3430,14 +3427,14 @@ static inline unsigned long fault_around_pages(void)
 {
 	unsigned long nr_pages;
 
-	nr_pages = 1UL << FAULT_AROUND_ORDER;
+	nr_pages = 1UL << fault_around_order;
 	BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
 	return nr_pages;
 }
 
 static inline unsigned long fault_around_mask(void)
 {
-	return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
+	return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1);
 }
 #endif
 
@@ -3495,7 +3492,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * if page by the offset is not ready to be mapped (cold cache or
 	 * something).
 	 */
-	if (vma->vm_ops->map_pages) {
+	if ((vma->vm_ops->map_pages) && (fault_around_order)) {
 		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 		do_fault_around(vma, address, pte, pgoff, flags);
 		if (!pte_same(*pte, orig_pte))
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries
  2014-04-28  9:01 [PATCH V3 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan
  2014-04-28  9:01 ` [PATCH V3 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan
@ 2014-04-28  9:01 ` Madhavan Srinivasan
  2014-04-29  2:18   ` Rusty Russell
  2014-04-29  7:06   ` Ingo Molnar
  1 sibling, 2 replies; 14+ messages in thread
From: Madhavan Srinivasan @ 2014-04-28  9:01 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86
  Cc: benh, paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak,
	peterz, mingo, dave.hansen, Madhavan Srinivasan

Performance data for different FAULT_AROUND_ORDER values from 4 socket
Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
v3.15-rc1 for different fault around order values.

FAULT_AROUND_ORDER      Baseline        1               3               4               5               8

Linux build (make -j64)
minor-faults            47,437,359      35,279,286      25,425,347      23,461,275      22,002,189      21,435,836
times in seconds        347.302528420   344.061588460   340.974022391   348.193508116   348.673900158   350.986543618
 stddev for time        ( +-  1.50% )   ( +-  0.73% )   ( +-  1.13% )   ( +-  1.01% )   ( +-  1.89% )   ( +-  1.55% )
 %chg time to baseline                  -0.9%           -1.8%           0.2%            0.39%           1.06%

Linux rebuild (make -j64)
minor-faults            941,552         718,319         486,625         440,124         410,510         397,416
times in seconds        30.569834718    31.219637539    31.319370649    31.434285472    31.972367174    31.443043580
 stddev for time        ( +-  1.07% )   ( +-  0.13% )   ( +-  0.43% )   ( +-  0.18% )   ( +-  0.95% )   ( +-  0.58% )
 %chg time to baseline                  2.1%            2.4%            2.8%            4.58%           2.85%

Binutils build (make all -j64 )
minor-faults            474,821         371,380         269,463         247,715         235,255         228,337
times in seconds        53.882492432    53.584289348    53.882773216    53.755816431    53.607824348    53.423759642
 stddev for time        ( +-  0.08% )   ( +-  0.56% )   ( +-  0.17% )   ( +-  0.11% )   ( +-  0.60% )   ( +-  0.69% )
 %chg time to baseline                  -0.55%          0.0%            -0.23%          -0.51%          -0.85%

Two synthetic tests: access every word in file in sequential/random order.

Sequential access 16GiB file
FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
1 thread
       minor-faults     263,148         131,166         32,908          16,514          8,260           1,093
       times in seconds 53.091138345    53.113191672    53.188776177    53.233017218    53.206841347    53.429979442
       stddev for time  ( +-  0.06% )   ( +-  0.07% )   ( +-  0.08% )   ( +-  0.09% )   ( +-  0.03% )   ( +-  0.03% )
       %chg time to baseline            0.04%           0.18%           0.26%           0.21%           0.63%
8 threads
       minor-faults     2,097,267       1,048,753       262,237         131,397         65,621          8,274
       times in seconds 55.173790028    54.591880790    54.824623287    54.802162211    54.969680503    54.790387715
       stddev for time  ( +-  0.78% )   ( +-  0.09% )   ( +-  0.08% )   ( +-  0.07% )   ( +-  0.28% )   ( +-  0.05% )
       %chg time to baseline            -1.05%          -0.63%          -0.67%          -0.36%          -0.69%
32 threads
       minor-faults     8,388,751       4,195,621       1,049,664       525,461         262,535         32,924
       times in seconds 60.431573046    60.669110744    60.485336388    60.697789706    60.077959564    60.588855032
       stddev for time  ( +-  0.44% )   ( +-  0.27% )   ( +-  0.46% )   ( +-  0.67% )   ( +-  0.31% )   ( +-  0.49% )
       %chg time to baseline            0.39%           0.08%           0.44%           -0.58%          0.25%
64 threads
       minor-faults     16,777,409      8,607,527       2,289,766       1,202,264       598,405         67,587
       times in seconds 96.932617720    100.675418760   102.109880836   103.881733383   102.580199555   105.751194041
       stddev for time  ( +-  1.39% )   ( +-  1.06% )   ( +-  0.99% )   ( +-  0.76% )   ( +-  1.65% )   ( +-  1.60% )
       %chg time to baseline            3.86%           5.34%           7.16%           5.82%           9.09%
128 threads
       minor-faults     33,554,705      17,375,375      4,682,462       2,337,245       1,179,007       134,819
       times in seconds 128.766704495   115.659225437   120.353046307   115.291871270   115.450886036   113.991902150
       stddev for time  ( +-  2.93% )   ( +-  0.30% )   ( +-  2.93% )   ( +-  1.24% )   ( +-  1.03% )   ( +-  0.70% )
       %chg time to baseline            -10.17%         -6.53%          -10.46%         -10.34%         -11.47%

Random access 1GiB file
FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
1 thread
       minor-faults     17,155          8,678           2,126           1,097           581             134
       times in seconds 51.904430523    51.658017987    51.919270792    51.560531738    52.354431597    51.976469502
       stddev for time  ( +-  3.19% )   ( +-  1.35% )   ( +-  1.56% )   ( +-  0.91% )   ( +-  1.70% )   ( +-  2.02% )
       %chg time to baseline            -0.47%          0.02%           -0.66%          0.86%           0.13%
8 threads
       minor-faults     131,844         70,705          17,457          8,505           4,251           598
       times in seconds 58.162813956    54.991706305    54.952675791    55.323057492    54.755587379    53.376722828
       stddev for time  ( +-  1.44% )   ( +-  0.69% )   ( +-  1.23% )   ( +-  2.78% )   ( +-  1.90% )   ( +-  2.91% )
       %chg time to baseline            -5.45%          -5.52%          -4.88%          -5.86%          -8.22%
32 threads
       minor-faults     524,437         270,760         67,069          33,414          16,641          2,204
       times in seconds 69.981777072    76.539570015    79.753578505    76.245943618    77.254258344    79.072596831
       stddev for time  ( +-  2.81% )   ( +-  1.95% )   ( +-  2.66% )   ( +-  0.99% )   ( +-  2.35% )   ( +-  3.22% )
       %chg time to baseline            9.37%           13.96%          8.95%           10.39%          12.98%
64 threads
       minor-faults     1,049,117       527,451         134,016         66,638          33,391          4,559
       times in seconds 108.024517536   117.575067996   115.322659914   111.943998437   115.049450815   119.218450840
       stddev for time  ( +-  2.40% )   ( +-  1.77% )   ( +-  1.19% )   ( +-  3.29% )   ( +-  2.32% )   ( +-  1.42% )
       %chg time to baseline            8.84%           6.75%           3.62%           6.5%            10.3%
128 threads
       minor-faults     2,097,440       1,054,360       267,042         133,328         66,532          8,652
       times in seconds 155.055861167   153.059625968   152.449492156   151.024005282   150.844647770   155.954366718
       stddev for time  ( +-  1.32% )   ( +-  1.14% )   ( +-  1.32% )   ( +-  0.81% )   ( +-  0.75% )   ( +-  0.72% )
       %chg time to baseline            -1.28%          -1.68%          -2.59%          -2.71%          0.57%

In the case of kernel compilation, fault-around order (fao) values of 1 and 3 give faster
compilation times than a value of 4.  On closer look, fao 3 has the higher gains.  In the
sequential-access synthetic tests fao 1 has the higher gains, and in the random-access test
fao 3 has marginal gains.  Going by compilation time, an fao value of 3 is suggested in this
patch for the pseries platform.

Worst-case scenario: we touch one page every 16M to demonstrate the overhead.

Touch only one page in page table in 16GiB file
FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
1 thread
       minor-faults     1,104           1,090           1,071           1,068           1,065           1,063
       times in seconds 0.006583298     0.008531502     0.019733795     0.036033763     0.062300553     0.406857086
       stddev for time  ( +-  2.79% )   ( +-  2.42% )   ( +-  3.47% )   ( +-  2.81% )   ( +-  2.01% )   ( +-  1.33% )
8 threads
       minor-faults     8,279           8,264           8,245           8,243           8,239           8,240
       times in seconds 0.044572398     0.057211811     0.107606306     0.205626815     0.381679120     2.647979955
       stddev for time  ( +-  1.95% )   ( +-  2.98% )   ( +-  1.74% )   ( +-  2.80% )   ( +-  2.01% )   ( +-  1.86% )
32 threads
       minor-faults     32,879          32,864          32,849          32,845          32,839          32,843
       times in seconds 0.197659343     0.218486087     0.445116407     0.694235883     1.296894038     9.127517045
       stddev for time  ( +-  3.05% )   ( +-  3.05% )   ( +-  4.33% )   ( +-  3.08% )   ( +-  3.75% )   ( +-  0.56% )
64 threads
       minor-faults     65,680          65,664          65,646          65,645          65,640          65,647
       times in seconds 0.455537304     0.489688780     0.866490093     1.427393118     2.379628982     17.059295051
       stddev for time  ( +-  4.01% )   ( +-  4.13% )   ( +-  2.92% )   ( +-  1.68% )   ( +-  1.79% )   ( +-  0.48% )
128 threads
       minor-faults     131,279         131,265         131,250         131,245         131,241         131,254
       times in seconds 1.026880651     1.095327536     1.721728274     2.808233068     4.662729948     31.732848290
       stddev for time  ( +-  6.85% )   ( +-  4.09% )   ( +-  1.71% )   ( +-  3.45% )   ( +-  2.40% )   ( +-  0.68% )

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/setup.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 2db8cc6..c87e6b6 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -74,6 +74,8 @@ int CMO_SecPSP = -1;
 unsigned long CMO_PageSize = (ASM_CONST(1) << IOMMU_PAGE_SHIFT_4K);
 EXPORT_SYMBOL(CMO_PageSize);
 
+extern unsigned int fault_around_order;
+
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 
 static struct device_node *pSeries_mpic_node;
@@ -465,6 +467,9 @@ static void __init pSeries_setup_arch(void)
 {
 	set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
 
+	/* Measured on a 4 socket Power7 system (128 Threads and 128GB memory) */
+	fault_around_order = 3;
+
 	/* Discover PIC type and setup ppc_md accordingly */
 	pseries_discover_pic();
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH V3 1/2] mm: move FAULT_AROUND_ORDER to arch/
  2014-04-28  9:01 ` [PATCH V3 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan
@ 2014-04-28  9:06   ` Peter Zijlstra
  2014-04-29  9:33     ` Madhavan Srinivasan
  2014-04-28  9:36   ` Kirill A. Shutemov
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2014-04-28  9:06 UTC (permalink / raw)
  To: Madhavan Srinivasan
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak, mingo,
	dave.hansen

On Mon, Apr 28, 2014 at 02:31:29PM +0530, Madhavan Srinivasan wrote:
> +unsigned int fault_around_order = CONFIG_FAULT_AROUND_ORDER;

__read_mostly?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH V3 1/2] mm: move FAULT_AROUND_ORDER to arch/
  2014-04-28  9:01 ` [PATCH V3 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan
  2014-04-28  9:06   ` Peter Zijlstra
@ 2014-04-28  9:36   ` Kirill A. Shutemov
  2014-04-29  9:35     ` Madhavan Srinivasan
  1 sibling, 1 reply; 14+ messages in thread
From: Kirill A. Shutemov @ 2014-04-28  9:36 UTC (permalink / raw)
  To: Madhavan Srinivasan
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak, peterz,
	mingo, dave.hansen

Madhavan Srinivasan wrote:
> Kirill A. Shutemov with 8c6e50b029 commit introduced
> vm_ops->map_pages() for mapping easy accessible pages around
> fault address in hope to reduce number of minor page faults.
> 
> This patch creates infrastructure to modify the FAULT_AROUND_ORDER
> value using mm/Kconfig. This will enable architecture maintainers
> to decide on suitable FAULT_AROUND_ORDER value based on
> performance data for that architecture. Patch also defaults
> FAULT_AROUND_ORDER Kconfig element to 4.
> 
> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
> ---
>  mm/Kconfig  |    8 ++++++++
>  mm/memory.c |   11 ++++-------
>  2 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index ebe5880..c7fc4f1 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -176,6 +176,14 @@ config MOVABLE_NODE
>  config HAVE_BOOTMEM_INFO_NODE
>  	def_bool n
>  
> +#
> +# Fault around order is a control knob to decide the fault around pages.
> +# Default value is set to 4 , but the arch can override it as desired.
> +#
> +config FAULT_AROUND_ORDER
> +	int
> +	default	4
> +
>  # eventually, we can have this option just 'select SPARSEMEM'
>  config MEMORY_HOTPLUG
>  	bool "Allow for memory hot-add"
> diff --git a/mm/memory.c b/mm/memory.c
> index d0f0bef..457436d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3382,11 +3382,9 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
>  	update_mmu_cache(vma, address, pte);
>  }
>  
> -#define FAULT_AROUND_ORDER 4
> +unsigned int fault_around_order = CONFIG_FAULT_AROUND_ORDER;
>  
>  #ifdef CONFIG_DEBUG_FS
> -static unsigned int fault_around_order = FAULT_AROUND_ORDER;
> -
>  static int fault_around_order_get(void *data, u64 *val)
>  {
>  	*val = fault_around_order;
> @@ -3395,7 +3393,6 @@ static int fault_around_order_get(void *data, u64 *val)
>  
>  static int fault_around_order_set(void *data, u64 val)
>  {
> -	BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
>  	if (1UL << val > PTRS_PER_PTE)
>  		return -EINVAL;
>  	fault_around_order = val;
> @@ -3430,14 +3427,14 @@ static inline unsigned long fault_around_pages(void)
>  {
>  	unsigned long nr_pages;
>  
> -	nr_pages = 1UL << FAULT_AROUND_ORDER;
> +	nr_pages = 1UL << fault_around_order;
>  	BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);

This BUILD_BUG_ON() doesn't make sense any more, since the compiler
usually can't prove anything about an extern variable.

Drop it, or change it to VM_BUG_ON() or something.

>  	return nr_pages;
>  }
>  
>  static inline unsigned long fault_around_mask(void)
>  {
> -	return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
> +	return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1);

fault_around_pages() and fault_around_mask() should be moved outside of
#ifdef CONFIG_DEBUG_FS and consolidated, since they are practically
identical in both branches after the changes.

>  }
>  #endif
>  
> @@ -3495,7 +3492,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 * if page by the offset is not ready to be mapped (cold cache or
>  	 * something).
>  	 */
> -	if (vma->vm_ops->map_pages) {
> +	if ((vma->vm_ops->map_pages) && (fault_around_order)) {

Way too many parentheses.

>  		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
>  		do_fault_around(vma, address, pte, pgoff, flags);
>  		if (!pte_same(*pte, orig_pte))
-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries
  2014-04-28  9:01 ` [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan
@ 2014-04-29  2:18   ` Rusty Russell
  2014-04-29  9:36     ` Madhavan Srinivasan
  2014-04-29  7:06   ` Ingo Molnar
  1 sibling, 1 reply; 14+ messages in thread
From: Rusty Russell @ 2014-04-29  2:18 UTC (permalink / raw)
  To: Madhavan Srinivasan, linux-kernel, linuxppc-dev, linux-mm,
	linux-arch, x86
  Cc: benh, paulus, kirill.shutemov, akpm, riel, mgorman, ak, peterz,
	mingo, dave.hansen

Madhavan Srinivasan <maddy@linux.vnet.ibm.com> writes:
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index 2db8cc6..c87e6b6 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -74,6 +74,8 @@ int CMO_SecPSP = -1;
>  unsigned long CMO_PageSize = (ASM_CONST(1) << IOMMU_PAGE_SHIFT_4K);
>  EXPORT_SYMBOL(CMO_PageSize);
>  
> +extern unsigned int fault_around_order;
> +

It's considered bad form to do this.  Put the declaration in linux/mm.h.

Thanks,
Rusty.
PS.  But we're getting there! :)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries
  2014-04-28  9:01 ` [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan
  2014-04-29  2:18   ` Rusty Russell
@ 2014-04-29  7:06   ` Ingo Molnar
  2014-04-29 10:35     ` Madhavan Srinivasan
  2014-04-30  7:04     ` Rusty Russell
  1 sibling, 2 replies; 14+ messages in thread
From: Ingo Molnar @ 2014-04-29  7:06 UTC (permalink / raw)
  To: Madhavan Srinivasan
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak, peterz,
	dave.hansen, Linus Torvalds


* Madhavan Srinivasan <maddy@linux.vnet.ibm.com> wrote:

> Performance data for different FAULT_AROUND_ORDER values from 4 socket
> Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
> is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
> v3.15-rc1 for different fault around order values.
> 
> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
> 
> Linux build (make -j64)
> minor-faults            47,437,359      35,279,286      25,425,347      23,461,275      22,002,189      21,435,836
> times in seconds        347.302528420   344.061588460   340.974022391   348.193508116   348.673900158   350.986543618
>  stddev for time        ( +-  1.50% )   ( +-  0.73% )   ( +-  1.13% )   ( +-  1.01% )   ( +-  1.89% )   ( +-  1.55% )
>  %chg time to baseline                  -0.9%           -1.8%           0.2%            0.39%           1.06%

Probably too noisy.

> Linux rebuild (make -j64)
> minor-faults            941,552         718,319         486,625         440,124         410,510         397,416
> times in seconds        30.569834718    31.219637539    31.319370649    31.434285472    31.972367174    31.443043580
>  stddev for time        ( +-  1.07% )   ( +-  0.13% )   ( +-  0.43% )   ( +-  0.18% )   ( +-  0.95% )   ( +-  0.58% )
>  %chg time to baseline                  2.1%            2.4%            2.8%            4.58%           2.85%

Here it looks like a speedup. Optimal value: 5+.

> Binutils build (make all -j64 )
> minor-faults            474,821         371,380         269,463         247,715         235,255         228,337
> times in seconds        53.882492432    53.584289348    53.882773216    53.755816431    53.607824348    53.423759642
>  stddev for time        ( +-  0.08% )   ( +-  0.56% )   ( +-  0.17% )   ( +-  0.11% )   ( +-  0.60% )   ( +-  0.69% )
>  %chg time to baseline                  -0.55%          0.0%            -0.23%          -0.51%          -0.85%

Probably too noisy, but looks like a potential slowdown?

> Two synthetic tests: access every word in file in sequential/random order.
> 
> Sequential access 16GiB file
> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
> 1 thread
>        minor-faults     263,148         131,166         32,908          16,514          8,260           1,093
>        times in seconds 53.091138345    53.113191672    53.188776177    53.233017218    53.206841347    53.429979442
>        stddev for time  ( +-  0.06% )   ( +-  0.07% )   ( +-  0.08% )   ( +-  0.09% )   ( +-  0.03% )   ( +-  0.03% )
>        %chg time to baseline            0.04%           0.18%           0.26%           0.21%           0.63%

Speedup, optimal value: 8+.

> 8 threads
>        minor-faults     2,097,267       1,048,753       262,237         131,397         65,621          8,274
>        times in seconds 55.173790028    54.591880790    54.824623287    54.802162211    54.969680503    54.790387715
>        stddev for time  ( +-  0.78% )   ( +-  0.09% )   ( +-  0.08% )   ( +-  0.07% )   ( +-  0.28% )   ( +-  0.05% )
>        %chg time to baseline            -1.05%          -0.63%          -0.67%          -0.36%          -0.69%

Looks like a regression?

> 32 threads
>        minor-faults     8,388,751       4,195,621       1,049,664       525,461         262,535         32,924
>        times in seconds 60.431573046    60.669110744    60.485336388    60.697789706    60.077959564    60.588855032
>        stddev for time  ( +-  0.44% )   ( +-  0.27% )   ( +-  0.46% )   ( +-  0.67% )   ( +-  0.31% )   ( +-  0.49% )
>        %chg time to baseline            0.39%           0.08%           0.44%           -0.58%          0.25%

Probably too noisy.

> 64 threads
>        minor-faults     16,777,409      8,607,527       2,289,766       1,202,264       598,405         67,587
>        times in seconds 96.932617720    100.675418760   102.109880836   103.881733383   102.580199555   105.751194041
>        stddev for time  ( +-  1.39% )   ( +-  1.06% )   ( +-  0.99% )   ( +-  0.76% )   ( +-  1.65% )   ( +-  1.60% )
>        %chg time to baseline            3.86%           5.34%           7.16%           5.82%           9.09%

Speedup, optimal value: 4+

> 128 threads
>        minor-faults     33,554,705      17,375,375      4,682,462       2,337,245       1,179,007       134,819
>        times in seconds 128.766704495   115.659225437   120.353046307   115.291871270   115.450886036   113.991902150
>        stddev for time  ( +-  2.93% )   ( +-  0.30% )   ( +-  2.93% )   ( +-  1.24% )   ( +-  1.03% )   ( +-  0.70% )
>        %chg time to baseline            -10.17%         -6.53%          -10.46%         -10.34%         -11.47%

Rather significant regression at order 1 already.

> Random access 1GiB file
> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
> 1 thread
>        minor-faults     17,155          8,678           2,126           1,097           581             134
>        times in seconds 51.904430523    51.658017987    51.919270792    51.560531738    52.354431597    51.976469502
>        stddev for time  ( +-  3.19% )   ( +-  1.35% )   ( +-  1.56% )   ( +-  0.91% )   ( +-  1.70% )   ( +-  2.02% )
>        %chg time to baseline            -0.47%          0.02%           -0.66%          0.86%           0.13%

Probably too noisy.

> 8 threads
>        minor-faults     131,844         70,705          17,457          8,505           4,251           598
>        times in seconds 58.162813956    54.991706305    54.952675791    55.323057492    54.755587379    53.376722828
>        stddev for time  ( +-  1.44% )   ( +-  0.69% )   ( +-  1.23% )   ( +-  2.78% )   ( +-  1.90% )   ( +-  2.91% )
>        %chg time to baseline            -5.45%          -5.52%          -4.88%          -5.86%          -8.22%

Regression.

> 32 threads
>        minor-faults     524,437         270,760         67,069          33,414          16,641          2,204
>        times in seconds 69.981777072    76.539570015    79.753578505    76.245943618    77.254258344    79.072596831
>        stddev for time  ( +-  2.81% )   ( +-  1.95% )   ( +-  2.66% )   ( +-  0.99% )   ( +-  2.35% )   ( +-  3.22% )
>        %chg time to baseline            9.37%           13.96%          8.95%           10.39%          12.98%

Speedup, optimal value hard to tell due to noise - 3+ or 8+.

> 64 threads
>        minor-faults     1,049,117       527,451         134,016         66,638          33,391          4,559
>        times in seconds 108.024517536   117.575067996   115.322659914   111.943998437   115.049450815   119.218450840
>        stddev for time  ( +-  2.40% )   ( +-  1.77% )   ( +-  1.19% )   ( +-  3.29% )   ( +-  2.32% )   ( +-  1.42% )
>        %chg time to baseline            8.84%           6.75%           3.62%           6.5%            10.3%

Speedup, optimal value again hard to tell due to noise.

> 128 threads
>        minor-faults     2,097,440       1,054,360       267,042         133,328         66,532          8,652
>        times in seconds 155.055861167   153.059625968   152.449492156   151.024005282   150.844647770   155.954366718
>        stddev for time  ( +-  1.32% )   ( +-  1.14% )   ( +-  1.32% )   ( +-  0.81% )   ( +-  0.75% )   ( +-  0.72% )
>        %chg time to baseline            -1.28%          -1.68%          -2.59%          -2.71%          0.57%

Slowdown for most orders.

> In case of kernel compilation, fault around order (fao) of 1 and 3 
> provides fast compilation time when compared to a value of 4. On 
> closer look, fao of 3 has higher gains. In case of sequential access 
> synthetic tests fao of 1 has higher gains and in the random access test, 
> fao of 3 has marginal gains. Going by compilation time, an fao value of 
> 3 is suggested in this patch for the pseries platform.

So I'm really at a loss to understand where you get the optimal value of 
'3' from. The data does not seem to match your claim that '1 and 3 
provides fast compilation time when compared to a value of 4':

> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
> 
> Linux rebuild (make -j64)
> minor-faults            941,552         718,319         486,625         440,124         410,510         397,416
> times in seconds        30.569834718    31.219637539    31.319370649    31.434285472    31.972367174    31.443043580
>  stddev for time        ( +-  1.07% )   ( +-  0.13% )   ( +-  0.43% )   ( +-  0.18% )   ( +-  0.95% )   ( +-  0.58% )
>  %chg time to baseline                  2.1%            2.4%            2.8%            4.58%           2.85%

5 and 8, and probably 6 and 7, are better than 4.

3 is probably _slower_ than the current default - but it's hard to 
tell due to inherent noise.

But the other two build tests were too noisy, and if anything they 
showed a slowdown.

> Worst case scenario: we touch one page every 16M to demonstrate overhead.
> 
> Touch only one page in page table in 16GiB file
> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
> 1 thread
>        minor-faults     1,104           1,090           1,071           1,068           1,065           1,063
>        times in seconds 0.006583298     0.008531502     0.019733795     0.036033763     0.062300553     0.406857086
>        stddev for time  ( +-  2.79% )   ( +-  2.42% )   ( +-  3.47% )   ( +-  2.81% )   ( +-  2.01% )   ( +-  1.33% )
> 8 threads
>        minor-faults     8,279           8,264           8,245           8,243           8,239           8,240
>        times in seconds 0.044572398     0.057211811     0.107606306     0.205626815     0.381679120     2.647979955
>        stddev for time  ( +-  1.95% )   ( +-  2.98% )   ( +-  1.74% )   ( +-  2.80% )   ( +-  2.01% )   ( +-  1.86% )
> 32 threads
>        minor-faults     32,879          32,864          32,849          32,845          32,839          32,843
>        times in seconds 0.197659343     0.218486087     0.445116407     0.694235883     1.296894038     9.127517045
>        stddev for time  ( +-  3.05% )   ( +-  3.05% )   ( +-  4.33% )   ( +-  3.08% )   ( +-  3.75% )   ( +-  0.56% )
> 64 threads
>        minor-faults     65,680          65,664          65,646          65,645          65,640          65,647
>        times in seconds 0.455537304     0.489688780     0.866490093     1.427393118     2.379628982     17.059295051
>        stddev for time  ( +-  4.01% )   ( +-  4.13% )   ( +-  2.92% )   ( +-  1.68% )   ( +-  1.79% )   ( +-  0.48% )
> 128 threads
>        minor-faults     131,279         131,265         131,250         131,245         131,241         131,254
>        times in seconds 1.026880651     1.095327536     1.721728274     2.808233068     4.662729948     31.732848290
>        stddev for time  ( +-  6.85% )   ( +-  4.09% )   ( +-  1.71% )   ( +-  3.45% )   ( +-  2.40% )   ( +-  0.68% )

There are no '%change' values shown, but the slowdown looks significant, 
and it's the worst case: for example with 1 thread, order 3 takes about 
3x as long as order 0.

All in all, looking at your latest data, I don't think the conclusion 
from the first version of this optimization patch, from a month ago, 
holds anymore:

> +	/* Measured on a 4 socket Power7 system (128 Threads and 128GB memory) */
> +	fault_around_order = 3;

The data is rather conflicting and inconclusive, and if it shows a 
sweet spot it's not at order 3. New data should in general trigger a 
reanalysis of your earlier optimization value.

I'm starting to suspect that maybe workloads ought to be given a 
choice in this matter, via madvise() or such.

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V3 1/2] mm: move FAULT_AROUND_ORDER to arch/
  2014-04-28  9:06   ` Peter Zijlstra
@ 2014-04-29  9:33     ` Madhavan Srinivasan
  0 siblings, 0 replies; 14+ messages in thread
From: Madhavan Srinivasan @ 2014-04-29  9:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak, mingo,
	dave.hansen

On Monday 28 April 2014 02:36 PM, Peter Zijlstra wrote:
> On Mon, Apr 28, 2014 at 02:31:29PM +0530, Madhavan Srinivasan wrote:
>> +unsigned int fault_around_order = CONFIG_FAULT_AROUND_ORDER;
> 
> __read_mostly?
> 
Agreed. Will add it.

Thanks for review.
With regards
Maddy



* Re: [PATCH V3 1/2] mm: move FAULT_AROUND_ORDER to arch/
  2014-04-28  9:36   ` Kirill A. Shutemov
@ 2014-04-29  9:35     ` Madhavan Srinivasan
  0 siblings, 0 replies; 14+ messages in thread
From: Madhavan Srinivasan @ 2014-04-29  9:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, rusty, akpm, riel, mgorman, ak, peterz, mingo,
	dave.hansen

On Monday 28 April 2014 03:06 PM, Kirill A. Shutemov wrote:
> Madhavan Srinivasan wrote:
>> Kirill A. Shutemov with 8c6e50b029 commit introduced
>> vm_ops->map_pages() for mapping easy accessible pages around
>> fault address in hope to reduce number of minor page faults.
>>
>> This patch creates infrastructure to modify the FAULT_AROUND_ORDER
>> value using mm/Kconfig. This will enable architecture maintainers
>> to decide on suitable FAULT_AROUND_ORDER value based on
>> performance data for that architecture. Patch also defaults
>> FAULT_AROUND_ORDER Kconfig element to 4.
>>
>> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
>> ---
>>  mm/Kconfig  |    8 ++++++++
>>  mm/memory.c |   11 ++++-------
>>  2 files changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index ebe5880..c7fc4f1 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -176,6 +176,14 @@ config MOVABLE_NODE
>>  config HAVE_BOOTMEM_INFO_NODE
>>  	def_bool n
>>  
>> +#
>> +# Fault around order is a control knob to decide the fault around pages.
>> +# Default value is set to 4 , but the arch can override it as desired.
>> +#
>> +config FAULT_AROUND_ORDER
>> +	int
>> +	default	4
>> +
>>  # eventually, we can have this option just 'select SPARSEMEM'
>>  config MEMORY_HOTPLUG
>>  	bool "Allow for memory hot-add"
>> diff --git a/mm/memory.c b/mm/memory.c
>> index d0f0bef..457436d 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3382,11 +3382,9 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
>>  	update_mmu_cache(vma, address, pte);
>>  }
>>  
>> -#define FAULT_AROUND_ORDER 4
>> +unsigned int fault_around_order = CONFIG_FAULT_AROUND_ORDER;
>>  
>>  #ifdef CONFIG_DEBUG_FS
>> -static unsigned int fault_around_order = FAULT_AROUND_ORDER;
>> -
>>  static int fault_around_order_get(void *data, u64 *val)
>>  {
>>  	*val = fault_around_order;
>> @@ -3395,7 +3393,6 @@ static int fault_around_order_get(void *data, u64 *val)
>>  
>>  static int fault_around_order_set(void *data, u64 val)
>>  {
>> -	BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
>>  	if (1UL << val > PTRS_PER_PTE)
>>  		return -EINVAL;
>>  	fault_around_order = val;
>> @@ -3430,14 +3427,14 @@ static inline unsigned long fault_around_pages(void)
>>  {
>>  	unsigned long nr_pages;
>>  
>> -	nr_pages = 1UL << FAULT_AROUND_ORDER;
>> +	nr_pages = 1UL << fault_around_order;
>>  	BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
> 
> This BUILD_BUG_ON() doesn't make sense any more since compiler usually can't
> prove anything about extern variable.
> 
> Drop it or change to VM_BUG_ON() or something.
> 

Ok.

>>  	return nr_pages;
>>  }
>>  
>>  static inline unsigned long fault_around_mask(void)
>>  {
>> -	return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
>> +	return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1);
> 
> fault_around_pages() and fault_around_mask() should be moved outside of
> #ifdef CONFIG_DEBUG_FS and consolidated since they are practically identical
> both branches after the changes.
> 

Agreed. Will change it.

>>  }
>>  #endif
>>  
>> @@ -3495,7 +3492,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>>  	 * if page by the offset is not ready to be mapped (cold cache or
>>  	 * something).
>>  	 */
>> -	if (vma->vm_ops->map_pages) {
>> +	if ((vma->vm_ops->map_pages) && (fault_around_order)) {
> 
> Way too many parentheses.
> 

Sure. Will modify it.

Thanks for review
with regards
Maddy
>>  		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
>>  		do_fault_around(vma, address, pte, pgoff, flags);
>>  		if (!pte_same(*pte, orig_pte))



* Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries
  2014-04-29  2:18   ` Rusty Russell
@ 2014-04-29  9:36     ` Madhavan Srinivasan
  0 siblings, 0 replies; 14+ messages in thread
From: Madhavan Srinivasan @ 2014-04-29  9:36 UTC (permalink / raw)
  To: Rusty Russell, linux-kernel, linuxppc-dev, linux-mm, linux-arch,
	x86
  Cc: benh, paulus, kirill.shutemov, akpm, riel, mgorman, ak, peterz,
	mingo, dave.hansen

On Tuesday 29 April 2014 07:48 AM, Rusty Russell wrote:
> Madhavan Srinivasan <maddy@linux.vnet.ibm.com> writes:
>> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
>> index 2db8cc6..c87e6b6 100644
>> --- a/arch/powerpc/platforms/pseries/setup.c
>> +++ b/arch/powerpc/platforms/pseries/setup.c
>> @@ -74,6 +74,8 @@ int CMO_SecPSP = -1;
>>  unsigned long CMO_PageSize = (ASM_CONST(1) << IOMMU_PAGE_SHIFT_4K);
>>  EXPORT_SYMBOL(CMO_PageSize);
>>  
>> +extern unsigned int fault_around_order;
>> +
> 
> It's considered bad form to do this.  Put the declaration in linux/mm.h.
> 

ok. Will change it.

Thanks for review
With regards
Maddy

> Thanks,
> Rusty.
> PS.  But we're getting there! :)
> 



* Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries
  2014-04-29  7:06   ` Ingo Molnar
@ 2014-04-29 10:35     ` Madhavan Srinivasan
  2014-04-30  7:04     ` Rusty Russell
  1 sibling, 0 replies; 14+ messages in thread
From: Madhavan Srinivasan @ 2014-04-29 10:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak, peterz,
	dave.hansen, Linus Torvalds

On Tuesday 29 April 2014 12:36 PM, Ingo Molnar wrote:
> 
> * Madhavan Srinivasan <maddy@linux.vnet.ibm.com> wrote:
> 
>> Performance data for different FAULT_AROUND_ORDER values from 4 socket
>> Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
>> is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
>> v3.15-rc1 for different fault around order values.
>>
>> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
>>
>> Linux build (make -j64)
>> minor-faults            47,437,359      35,279,286      25,425,347      23,461,275      22,002,189      21,435,836
>> times in seconds        347.302528420   344.061588460   340.974022391   348.193508116   348.673900158   350.986543618
>>  stddev for time        ( +-  1.50% )   ( +-  0.73% )   ( +-  1.13% )   ( +-  1.01% )   ( +-  1.89% )   ( +-  1.55% )
>>  %chg time to baseline                  -0.9%           -1.8%           0.2%            0.39%           1.06%
> 
> Probably too noisy.

Ok. I should have added the formula used for %change to clarify the data
presented. My bad.

Just to clarify, %change here is calculated with this formula:

((new value - baseline) / baseline)

So a negative %change means a drop in time, and a positive value
means an increase in time, when compared to baseline.

With regards
Maddy

> 
>> Linux rebuild (make -j64)
>> minor-faults            941,552         718,319         486,625         440,124         410,510         397,416
>> times in seconds        30.569834718    31.219637539    31.319370649    31.434285472    31.972367174    31.443043580
>>  stddev for time        ( +-  1.07% )   ( +-  0.13% )   ( +-  0.43% )   ( +-  0.18% )   ( +-  0.95% )   ( +-  0.58% )
>>  %chg time to baseline                  2.1%            2.4%            2.8%            4.58%           2.85%
> 
> Here it looks like a speedup. Optimal value: 5+.
> 
>> Binutils build (make all -j64 )
>> minor-faults            474,821         371,380         269,463         247,715         235,255         228,337
>> times in seconds        53.882492432    53.584289348    53.882773216    53.755816431    53.607824348    53.423759642
>>  stddev for time        ( +-  0.08% )   ( +-  0.56% )   ( +-  0.17% )   ( +-  0.11% )   ( +-  0.60% )   ( +-  0.69% )
>>  %chg time to baseline                  -0.55%          0.0%            -0.23%          -0.51%          -0.85%
> 
> Probably too noisy, but looks like a potential slowdown?
> 
>> Two synthetic tests: access every word in file in sequential/random order.
>>
>> Sequential access 16GiB file
>> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
>> 1 thread
>>        minor-faults     263,148         131,166         32,908          16,514          8,260           1,093
>>        times in seconds 53.091138345    53.113191672    53.188776177    53.233017218    53.206841347    53.429979442
>>        stddev for time  ( +-  0.06% )   ( +-  0.07% )   ( +-  0.08% )   ( +-  0.09% )   ( +-  0.03% )   ( +-  0.03% )
>>        %chg time to baseline            0.04%           0.18%           0.26%           0.21%           0.63%
> 
> Speedup, optimal value: 8+.
> 
>> 8 threads
>>        minor-faults     2,097,267       1,048,753       262,237         131,397         65,621          8,274
>>        times in seconds 55.173790028    54.591880790    54.824623287    54.802162211    54.969680503    54.790387715
>>        stddev for time  ( +-  0.78% )   ( +-  0.09% )   ( +-  0.08% )   ( +-  0.07% )   ( +-  0.28% )   ( +-  0.05% )
>>        %chg time to baseline            -1.05%          -0.63%          -0.67%          -0.36%          -0.69%
> 
> Looks like a regression?
> 
>> 32 threads
>>        minor-faults     8,388,751       4,195,621       1,049,664       525,461         262,535         32,924
>>        times in seconds 60.431573046    60.669110744    60.485336388    60.697789706    60.077959564    60.588855032
>>        stddev for time  ( +-  0.44% )   ( +-  0.27% )   ( +-  0.46% )   ( +-  0.67% )   ( +-  0.31% )   ( +-  0.49% )
>>        %chg time to baseline            0.39%           0.08%           0.44%           -0.58%          0.25%
> 
> Probably too noisy.
> 
>> 64 threads
>>        minor-faults     16,777,409      8,607,527       2,289,766       1,202,264       598,405         67,587
>>        times in seconds 96.932617720    100.675418760   102.109880836   103.881733383   102.580199555   105.751194041
>>        stddev for time  ( +-  1.39% )   ( +-  1.06% )   ( +-  0.99% )   ( +-  0.76% )   ( +-  1.65% )   ( +-  1.60% )
>>        %chg time to baseline            3.86%           5.34%           7.16%           5.82%           9.09%
> 
> Speedup, optimal value: 4+
> 
>> 128 threads
>>        minor-faults     33,554,705      17,375,375      4,682,462       2,337,245       1,179,007       134,819
>>        times in seconds 128.766704495   115.659225437   120.353046307   115.291871270   115.450886036   113.991902150
>>        stddev for time  ( +-  2.93% )   ( +-  0.30% )   ( +-  2.93% )   ( +-  1.24% )   ( +-  1.03% )   ( +-  0.70% )
>>        %chg time to baseline            -10.17%         -6.53%          -10.46%         -10.34%         -11.47%
> 
> Rather significant regression at order 1 already.
> 
>> Random access 1GiB file
>> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
>> 1 thread
>>        minor-faults     17,155          8,678           2,126           1,097           581             134
>>        times in seconds 51.904430523    51.658017987    51.919270792    51.560531738    52.354431597    51.976469502
>>        stddev for time  ( +-  3.19% )   ( +-  1.35% )   ( +-  1.56% )   ( +-  0.91% )   ( +-  1.70% )   ( +-  2.02% )
>>        %chg time to baseline            -0.47%          0.02%           -0.66%          0.86%           0.13%
> 
> Probably too noisy.
> 
>> 8 threads
>>        minor-faults     131,844         70,705          17,457          8,505           4,251           598
>>        times in seconds 58.162813956    54.991706305    54.952675791    55.323057492    54.755587379    53.376722828
>>        stddev for time  ( +-  1.44% )   ( +-  0.69% )   ( +-  1.23% )   ( +-  2.78% )   ( +-  1.90% )   ( +-  2.91% )
>>        %chg time to baseline            -5.45%          -5.52%          -4.88%          -5.86%          -8.22%
> 
> Regression.
> 
>> 32 threads
>>        minor-faults     524,437         270,760         67,069          33,414          16,641          2,204
>>        times in seconds 69.981777072    76.539570015    79.753578505    76.245943618    77.254258344    79.072596831
>>        stddev for time  ( +-  2.81% )   ( +-  1.95% )   ( +-  2.66% )   ( +-  0.99% )   ( +-  2.35% )   ( +-  3.22% )
>>        %chg time to baseline            9.37%           13.96%          8.95%           10.39%          12.98%
> 
> Speedup, optimal value hard to tell due to noise - 3+ or 8+.
> 
>> 64 threads
>>        minor-faults     1,049,117       527,451         134,016         66,638          33,391          4,559
>>        times in seconds 108.024517536   117.575067996   115.322659914   111.943998437   115.049450815   119.218450840
>>        stddev for time  ( +-  2.40% )   ( +-  1.77% )   ( +-  1.19% )   ( +-  3.29% )   ( +-  2.32% )   ( +-  1.42% )
>>        %chg time to baseline            8.84%           6.75%           3.62%           6.5%            10.3%
> 
> Speedup, optimal value again hard to tell due to noise.
> 
>> 128 threads
>>        minor-faults     2,097,440       1,054,360       267,042         133,328         66,532          8,652
>>        times in seconds 155.055861167   153.059625968   152.449492156   151.024005282   150.844647770   155.954366718
>>        stddev for time  ( +-  1.32% )   ( +-  1.14% )   ( +-  1.32% )   ( +-  0.81% )   ( +-  0.75% )   ( +-  0.72% )
>>        %chg time to baseline            -1.28%          -1.68%          -2.59%          -2.71%          0.57%
> 
> Slowdown for most orders.
> 
>> In case of kernel compilation, fault around order (fao) of 1 and 3 
>> provides fast compilation time when compared to a value of 4. On 
>> closer look, fao of 3 has higher gains. In case of sequential access 
>> synthetic tests fao of 1 has higher gains and in the random access test, 
>> fao of 3 has marginal gains. Going by compilation time, an fao value of 
>> 3 is suggested in this patch for the pseries platform.
> 
> So I'm really at a loss to understand where you get the optimal value of 
> '3' from. The data does not seem to match your claim that '1 and 3 
> provides fast compilation time when compared to a value of 4':
> 



>> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
>>
>> Linux rebuild (make -j64)
>> minor-faults            941,552         718,319         486,625         440,124         410,510         397,416
>> times in seconds        30.569834718    31.219637539    31.319370649    31.434285472    31.972367174    31.443043580
>>  stddev for time        ( +-  1.07% )   ( +-  0.13% )   ( +-  0.43% )   ( +-  0.18% )   ( +-  0.95% )   ( +-  0.58% )
>>  %chg time to baseline                  2.1%            2.4%            2.8%            4.58%           2.85%
> 
> 5 and 8, and probably 6 and 7, are better than 4.
> 
> 3 is probably _slower_ than the current default - but it's hard to 
> tell due to inherent noise.
> 
> But the other two build tests were too noisy, and if anything they 
> showed a slowdown.
> 
>> Worst case scenario: we touch one page every 16M to demonstrate overhead.
>>
>> Touch only one page in page table in 16GiB file
>> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
>> 1 thread
>>        minor-faults     1,104           1,090           1,071           1,068           1,065           1,063
>>        times in seconds 0.006583298     0.008531502     0.019733795     0.036033763     0.062300553     0.406857086
>>        stddev for time  ( +-  2.79% )   ( +-  2.42% )   ( +-  3.47% )   ( +-  2.81% )   ( +-  2.01% )   ( +-  1.33% )
>> 8 threads
>>        minor-faults     8,279           8,264           8,245           8,243           8,239           8,240
>>        times in seconds 0.044572398     0.057211811     0.107606306     0.205626815     0.381679120     2.647979955
>>        stddev for time  ( +-  1.95% )   ( +-  2.98% )   ( +-  1.74% )   ( +-  2.80% )   ( +-  2.01% )   ( +-  1.86% )
>> 32 threads
>>        minor-faults     32,879          32,864          32,849          32,845          32,839          32,843
>>        times in seconds 0.197659343     0.218486087     0.445116407     0.694235883     1.296894038     9.127517045
>>        stddev for time  ( +-  3.05% )   ( +-  3.05% )   ( +-  4.33% )   ( +-  3.08% )   ( +-  3.75% )   ( +-  0.56% )
>> 64 threads
>>        minor-faults     65,680          65,664          65,646          65,645          65,640          65,647
>>        times in seconds 0.455537304     0.489688780     0.866490093     1.427393118     2.379628982     17.059295051
>>        stddev for time  ( +-  4.01% )   ( +-  4.13% )   ( +-  2.92% )   ( +-  1.68% )   ( +-  1.79% )   ( +-  0.48% )
>> 128 threads
>>        minor-faults     131,279         131,265         131,250         131,245         131,241         131,254
>>        times in seconds 1.026880651     1.095327536     1.721728274     2.808233068     4.662729948     31.732848290
>>        stddev for time  ( +-  6.85% )   ( +-  4.09% )   ( +-  1.71% )   ( +-  3.45% )   ( +-  2.40% )   ( +-  0.68% )
> 
> There are no '%change' values shown, but the slowdown looks significant, 
> and it's the worst case: for example with 1 thread, order 3 takes about 
> 3x as long as order 0.
> 
> All in all, looking at your latest data, I don't think the conclusion 
> from the first version of this optimization patch, from a month ago, 
> holds anymore:
> 
>> +	/* Measured on a 4 socket Power7 system (128 Threads and 128GB memory) */
>> +	fault_around_order = 3;
> 
> The data is rather conflicting and inconclusive, and if it shows a 
> sweet spot it's not at order 3. New data should in general trigger a 
> reanalysis of your earlier optimization value.
>
> I'm starting to suspect that maybe workloads ought to be given a 
> choice in this matter, via madvise() or such.
> 
> Thanks,
> 
> 	Ingo
> 



* Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries
  2014-04-29  7:06   ` Ingo Molnar
  2014-04-29 10:35     ` Madhavan Srinivasan
@ 2014-04-30  7:04     ` Rusty Russell
  2014-04-30  8:15       ` Madhavan Srinivasan
  2014-05-06 11:29       ` Ingo Molnar
  1 sibling, 2 replies; 14+ messages in thread
From: Rusty Russell @ 2014-04-30  7:04 UTC (permalink / raw)
  To: Ingo Molnar, Madhavan Srinivasan
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, kirill.shutemov, akpm, riel, mgorman, ak, peterz,
	dave.hansen, Linus Torvalds

Ingo Molnar <mingo@kernel.org> writes:
> * Madhavan Srinivasan <maddy@linux.vnet.ibm.com> wrote:
>
>> Performance data for different FAULT_AROUND_ORDER values from 4 socket
>> Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
>> is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
>> v3.15-rc1 for different fault around order values.
>> 
>> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
>> 
>> Linux build (make -j64)
>> minor-faults            47,437,359      35,279,286      25,425,347      23,461,275      22,002,189      21,435,836
>> times in seconds        347.302528420   344.061588460   340.974022391   348.193508116   348.673900158   350.986543618
>>  stddev for time        ( +-  1.50% )   ( +-  0.73% )   ( +-  1.13% )   ( +-  1.01% )   ( +-  1.89% )   ( +-  1.55% )
>>  %chg time to baseline                  -0.9%           -1.8%           0.2%            0.39%           1.06%
>
> Probably too noisy.

A little, but 3 still looks like the winner.

>> Linux rebuild (make -j64)
>> minor-faults            941,552         718,319         486,625         440,124         410,510         397,416
>> times in seconds        30.569834718    31.219637539    31.319370649    31.434285472    31.972367174    31.443043580
>>  stddev for time        ( +-  1.07% )   ( +-  0.13% )   ( +-  0.43% )   ( +-  0.18% )   ( +-  0.95% )   ( +-  0.58% )
>>  %chg time to baseline                  2.1%            2.4%            2.8%            4.58%           2.85%
>
> Here it looks like a speedup. Optimal value: 5+.

No, lower time is better.  Baseline (no faultaround) wins.


etc.

It's not a huge surprise that a 64k page arch wants a smaller value than
a 4k system.  But I agree: I don't see much upside for FAO > 0, but I do
see downside.

Most extreme results:
Order 1: 2% loss on recompile.  10% win / 4% loss on seq.  9% loss on random.
Order 3: 2% loss on recompile.  6% win / 5% loss on seq.  14% loss on random.
Order 4: 2.8% loss on recompile.  10% win / 7% loss on seq.  9% loss on random.

> I'm starting to suspect that maybe workloads ought to be given a 
> choice in this matter, via madvise() or such.

I really don't think they'll be able to use it; it'll change far too
much with machine and kernel updates.  I think we should apply patch #1
(with fixes) to make it a variable, then set it to 0 for PPC.

Cheers,
Rusty.



* Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries
  2014-04-30  7:04     ` Rusty Russell
@ 2014-04-30  8:15       ` Madhavan Srinivasan
  2014-05-06 11:29       ` Ingo Molnar
  1 sibling, 0 replies; 14+ messages in thread
From: Madhavan Srinivasan @ 2014-04-30  8:15 UTC (permalink / raw)
  To: Rusty Russell, Ingo Molnar
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, kirill.shutemov, akpm, riel, mgorman, ak, peterz,
	dave.hansen, Linus Torvalds

On Wednesday 30 April 2014 12:34 PM, Rusty Russell wrote:
> Ingo Molnar <mingo@kernel.org> writes:
>> * Madhavan Srinivasan <maddy@linux.vnet.ibm.com> wrote:
>>
>>> Performance data for different FAULT_AROUND_ORDER values from 4 socket
>>> Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
>>> is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
>>> v3.15-rc1 for different fault around order values.
>>>
>>> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
>>>
>>> Linux build (make -j64)
>>> minor-faults            47,437,359      35,279,286      25,425,347      23,461,275      22,002,189      21,435,836
>>> times in seconds        347.302528420   344.061588460   340.974022391   348.193508116   348.673900158   350.986543618
>>>  stddev for time        ( +-  1.50% )   ( +-  0.73% )   ( +-  1.13% )   ( +-  1.01% )   ( +-  1.89% )   ( +-  1.55% )
>>>  %chg time to baseline                  -0.9%           -1.8%           0.2%            0.39%           1.06%
>>
>> Probably too noisy.
> 
> A little, but 3 still looks like the winner.
> 
>>> Linux rebuild (make -j64)
>>> minor-faults            941,552         718,319         486,625         440,124         410,510         397,416
>>> times in seconds        30.569834718    31.219637539    31.319370649    31.434285472    31.972367174    31.443043580
>>>  stddev for time        ( +-  1.07% )   ( +-  0.13% )   ( +-  0.43% )   ( +-  0.18% )   ( +-  0.95% )   ( +-  0.58% )
>>>  %chg time to baseline                  2.1%            2.4%            2.8%            4.58%           2.85%
>>
>> Here it looks like a speedup. Optimal value: 5+.
> 
> No, lower time is better.  Baseline (no faultaround) wins.
> 
> 
> etc.
> 
> It's not a huge surprise that a 64k page arch wants a smaller value than
> a 4k system.  But I agree: I don't see much upside for FAO > 0, but I do
> see downside.
> 
> Most extreme results:
> Order 1: 2% loss on recompile.  10% win / 4% loss on seq.  9% loss on random.
> Order 3: 2% loss on recompile.  6% win / 5% loss on seq.  14% loss on random.
> Order 4: 2.8% loss on recompile.  10% win / 7% loss on seq.  9% loss on random.
> 
>> I'm starting to suspect that maybe workloads ought to be given a 
>> choice in this matter, via madvise() or such.
> 
> I really don't think they'll be able to use it; it'll change far too
> much with machine and kernel updates.  I think we should apply patch #1
> (with fixes) to make it a variable, then set it to 0 for PPC.
> 

Ok. Will do.

Thanks for review
With regards
Maddy


> Cheers,
> Rusty.
> 



* Re: [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries
  2014-04-30  7:04     ` Rusty Russell
  2014-04-30  8:15       ` Madhavan Srinivasan
@ 2014-05-06 11:29       ` Ingo Molnar
  1 sibling, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2014-05-06 11:29 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Madhavan Srinivasan, linux-kernel, linuxppc-dev, linux-mm,
	linux-arch, x86, benh, paulus, kirill.shutemov, akpm, riel,
	mgorman, ak, peterz, dave.hansen, Linus Torvalds


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> Ingo Molnar <mingo@kernel.org> writes:
> > * Madhavan Srinivasan <maddy@linux.vnet.ibm.com> wrote:
> >
> >> Performance data for different FAULT_AROUND_ORDER values from 4 socket
> >> Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
> >> is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
> >> v3.15-rc1 for different fault around order values.
> >> 
> >> FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
> >> 
> >> Linux build (make -j64)
> >> minor-faults            47,437,359      35,279,286      25,425,347      23,461,275      22,002,189      21,435,836
> >> times in seconds        347.302528420   344.061588460   340.974022391   348.193508116   348.673900158   350.986543618
> >>  stddev for time        ( +-  1.50% )   ( +-  0.73% )   ( +-  1.13% )   ( +-  1.01% )   ( +-  1.89% )   ( +-  1.55% )
> >>  %chg time to baseline                  -0.9%           -1.8%           0.2%            0.39%           1.06%
> >
> > Probably too noisy.
> 
> A little, but 3 still looks like the winner.
> 
> >> Linux rebuild (make -j64)
> >> minor-faults            941,552         718,319         486,625         440,124         410,510         397,416
> >> times in seconds        30.569834718    31.219637539    31.319370649    31.434285472    31.972367174    31.443043580
> >>  stddev for time        ( +-  1.07% )   ( +-  0.13% )   ( +-  0.43% )   ( +-  0.18% )   ( +-  0.95% )   ( +-  0.58% )
> >>  %chg time to baseline                  2.1%            2.4%            2.8%            4.58%           2.85%
> >
> > Here it looks like a speedup. Optimal value: 5+.
> 
> No, lower time is better.  Baseline (no faultaround) wins.
> 
> 
> etc.

ah, yeah, you are right. Brainfart of the week...

> It's not a huge surprise that a 64k page arch wants a smaller value 
> than a 4k system.  But I agree: I don't see much upside for FAO > 0, 
> but I do see downside.
> 
> Most extreme results:
> Order 1: 2% loss on recompile.  10% win 4% loss on seq.  9% loss random.
> Order 3: 2% loss on recompile.  6% win 5% loss on seq.  14% loss on random.
> Order 4: 2.8% loss on recompile. 10% win 7% loss on seq.  9% loss on random.
> 
> > I'm starting to suspect that maybe workloads ought to be given a 
> > choice in this matter, via madvise() or such.
> 
> I really don't think they'll be able to use it; it'll change far too 
> much with machine and kernel updates. [...]

Do we know that?

> [...] I think we should apply patch
> #1 (with fixes) to make it a variable, then set it to 0 for PPC.

Ok, agreed - at least until contrary data comes around.

Thanks,

	Ingo


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-05-06 11:29 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-28  9:01 [PATCH V3 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan
2014-04-28  9:01 ` [PATCH V3 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan
2014-04-28  9:06   ` Peter Zijlstra
2014-04-29  9:33     ` Madhavan Srinivasan
2014-04-28  9:36   ` Kirill A. Shutemov
2014-04-29  9:35     ` Madhavan Srinivasan
2014-04-28  9:01 ` [PATCH V3 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan
2014-04-29  2:18   ` Rusty Russell
2014-04-29  9:36     ` Madhavan Srinivasan
2014-04-29  7:06   ` Ingo Molnar
2014-04-29 10:35     ` Madhavan Srinivasan
2014-04-30  7:04     ` Rusty Russell
2014-04-30  8:15       ` Madhavan Srinivasan
2014-05-06 11:29       ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).