* [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc @ 2014-05-08 9:28 Madhavan Srinivasan 2014-05-08 9:28 ` [PATCH V4 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Madhavan Srinivasan @ 2014-05-08 9:28 UTC (permalink / raw) To: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86 Cc: benh, paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak, peterz, mingo, dave.hansen, Madhavan Srinivasan Kirill A. Shutemov with 8c6e50b029 commit introduced vm_ops->map_pages() for mapping easy accessible pages around fault address in hope to reduce number of minor page faults. This patch creates infrastructure to modify the FAULT_AROUND_ORDER value using mm/Kconfig. This will enable architecture maintainers to decide on suitable FAULT_AROUND_ORDER value based on performance data for that architecture. First patch also defaults FAULT_AROUND_ORDER Kconfig element to 4. Second patch list out the performance numbers for powerpc (platform pseries) and initialize the fault around order variable for pseries platform of powerpc. V4 Changes: Replaced the BUILD_BUG_ON with VM_BUG_ON. Moved fault_around_pages() and fault_around_mask() functions outside of #ifdef CONFIG_DEBUG_FS. V3 Changes: Replaced FAULT_AROUND_ORDER macro to a variable to support arch's that supports sub platforms. Made changes in commit messages. V2 Changes: Created Kconfig parameter for FAULT_AROUND_ORDER Added check in do_read_fault to handle FAULT_AROUND_ORDER value of 0 Made changes in commit messages. Madhavan Srinivasan (2): mm: move FAULT_AROUND_ORDER to arch/ powerpc/pseries: init fault_around_order for pseries arch/powerpc/platforms/pseries/pseries.h | 2 ++ arch/powerpc/platforms/pseries/setup.c | 5 +++++ mm/Kconfig | 8 ++++++++ mm/memory.c | 25 ++++++------------------- 4 files changed, 21 insertions(+), 19 deletions(-) -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH V4 1/2] mm: move FAULT_AROUND_ORDER to arch/ 2014-05-08 9:28 [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan @ 2014-05-08 9:28 ` Madhavan Srinivasan 2014-05-08 9:28 ` [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan 2014-05-15 8:25 ` [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan 2 siblings, 0 replies; 26+ messages in thread From: Madhavan Srinivasan @ 2014-05-08 9:28 UTC (permalink / raw) To: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86 Cc: benh, paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak, peterz, mingo, dave.hansen, Madhavan Srinivasan Kirill A. Shutemov with 8c6e50b029 commit introduced vm_ops->map_pages() for mapping easy accessible pages around fault address in hope to reduce number of minor page faults. This patch creates infrastructure to modify the FAULT_AROUND_ORDER value using mm/Kconfig. This will enable architecture maintainers to decide on suitable FAULT_AROUND_ORDER value based on performance data for that architecture. Patch also defaults FAULT_AROUND_ORDER Kconfig element to 4. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> --- mm/Kconfig | 8 ++++++++ mm/memory.c | 25 ++++++------------------- 2 files changed, 14 insertions(+), 19 deletions(-) diff --git a/mm/Kconfig b/mm/Kconfig index ebe5880..c7fc4f1 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -176,6 +176,14 @@ config MOVABLE_NODE config HAVE_BOOTMEM_INFO_NODE def_bool n +# +# Fault around order is a control knob to decide the fault around pages. +# Default value is set to 4 , but the arch can override it as desired. +# +config FAULT_AROUND_ORDER + int + default 4 + # eventually, we can have this option just 'select SPARSEMEM' config MEMORY_HOTPLUG bool "Allow for memory hot-add" diff --git a/mm/memory.c b/mm/memory.c index 037b812..e3931ef 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3402,11 +3402,9 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address, update_mmu_cache(vma, address, pte); } -#define FAULT_AROUND_ORDER 4 +unsigned int fault_around_order __read_mostly = CONFIG_FAULT_AROUND_ORDER; #ifdef CONFIG_DEBUG_FS -static unsigned int fault_around_order = FAULT_AROUND_ORDER; - static int fault_around_order_get(void *data, u64 *val) { *val = fault_around_order; @@ -3415,7 +3413,6 @@ static int fault_around_order_get(void *data, u64 *val) static int fault_around_order_set(void *data, u64 val) { - BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE); if (1UL << val > PTRS_PER_PTE) return -EINVAL; fault_around_order = val; @@ -3435,31 +3432,21 @@ static int __init fault_around_debugfs(void) return 0; } late_initcall(fault_around_debugfs); +#endif static inline unsigned long fault_around_pages(void) { - return 1UL << fault_around_order; -} - -static inline unsigned long fault_around_mask(void) -{ - return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1); -} -#else -static inline unsigned long fault_around_pages(void) -{ unsigned long nr_pages; - nr_pages = 1UL << FAULT_AROUND_ORDER; - BUILD_BUG_ON(nr_pages > PTRS_PER_PTE); + nr_pages = 1UL << fault_around_order; + VM_BUG_ON(nr_pages > PTRS_PER_PTE); return nr_pages; } static inline unsigned long fault_around_mask(void) { - return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1); + return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1); } -#endif static void do_fault_around(struct vm_area_struct *vma, unsigned long address, pte_t *pte, pgoff_t pgoff, unsigned int flags) @@ -3515,7 +3502,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma, * if page by the offset is not ready to be mapped (cold cache or * something). */ - if (vma->vm_ops->map_pages) { + if ((vma->vm_ops->map_pages) && fault_around_order) { pte = pte_offset_map_lock(mm, pmd, address, &ptl); do_fault_around(vma, address, pte, pgoff, flags); if (!pte_same(*pte, orig_pte)) -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries 2014-05-08 9:28 [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan 2014-05-08 9:28 ` [PATCH V4 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan @ 2014-05-08 9:28 ` Madhavan Srinivasan 2014-05-20 7:28 ` Andrew Morton 2014-05-15 8:25 ` [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan 2 siblings, 1 reply; 26+ messages in thread From: Madhavan Srinivasan @ 2014-05-08 9:28 UTC (permalink / raw) To: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86 Cc: benh, paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak, peterz, mingo, dave.hansen, Madhavan Srinivasan Performance data for different FAULT_AROUND_ORDER values from 4 socket Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5 is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and v3.15-rc1 for different fault around order values. %change here is calculated in this method ((new value - baseline)/baseline). And negative %change says its a drop in time. FAULT_AROUND_ORDER Baseline 1 3 4 5 8 Linux build (make -j64) minor-faults 47,437,359 35,279,286 25,425,347 23,461,275 22,002,189 21,435,836 times in seconds 347.302528420 344.061588460 340.974022391 348.193508116 348.673900158 350.986543618 stddev for time ( +- 1.50% ) ( +- 0.73% ) ( +- 1.13% ) ( +- 1.01% ) ( +- 1.89% ) ( +- 1.55% ) %chg time to baseline -0.9% -1.8% 0.2% 0.39% 1.06% Linux rebuild (make -j64) minor-faults 941,552 718,319 486,625 440,124 410,510 397,416 times in seconds 30.569834718 31.219637539 31.319370649 31.434285472 31.972367174 31.443043580 stddev for time ( +- 1.07% ) ( +- 0.13% ) ( +- 0.43% ) ( +- 0.18% ) ( +- 0.95% ) ( +- 0.58% ) %chg time to baseline 2.1% 2.4% 2.8% 4.58% 2.85% Binutils build (make all -j64 ) minor-faults 474,821 371,380 269,463 247,715 235,255 228,337 times in seconds 53.882492432 53.584289348 53.882773216 53.755816431 53.607824348 53.423759642 stddev for time ( +- 0.08% ) ( +- 0.56% ) ( +- 0.17% ) ( +- 0.11% ) ( +- 0.60% ) ( +- 0.69% ) %chg time to baseline -0.55% 0.0% -0.23% -0.51% -0.85% Two synthetic tests: access every word in file in sequential/random order. Sequential access 16GiB file FAULT_AROUND_ORDER Baseline 1 3 4 5 8 1 thread minor-faults 263,148 131,166 32,908 16,514 8,260 1,093 times in seconds 53.091138345 53.113191672 53.188776177 53.233017218 53.206841347 53.429979442 stddev for time ( +- 0.06% ) ( +- 0.07% ) ( +- 0.08% ) ( +- 0.09% ) ( +- 0.03% ) ( +- 0.03% ) %chg time to baseline 0.04% 0.18% 0.26% 0.21% 0.63% 8 threads minor-faults 2,097,267 1,048,753 262,237 131,397 65,621 8,274 times in seconds 55.173790028 54.591880790 54.824623287 54.802162211 54.969680503 54.790387715 stddev for time ( +- 0.78% ) ( +- 0.09% ) ( +- 0.08% ) ( +- 0.07% ) ( +- 0.28% ) ( +- 0.05% ) %chg time to baseline -1.05% -0.63% -0.67% -0.36% -0.69% 32 threads minor-faults 8,388,751 4,195,621 1,049,664 525,461 262,535 32,924 times in seconds 60.431573046 60.669110744 60.485336388 60.697789706 60.077959564 60.588855032 stddev for time ( +- 0.44% ) ( +- 0.27% ) ( +- 0.46% ) ( +- 0.67% ) ( +- 0.31% ) ( +- 0.49% ) %chg time to baseline 0.39% 0.08% 0.44% -0.58% 0.25% 64 threads minor-faults 16,777,409 8,607,527 2,289,766 1,202,264 598,405 67,587 times in seconds 96.932617720 100.675418760 102.109880836 103.881733383 102.580199555 105.751194041 stddev for time ( +- 1.39% ) ( +- 1.06% ) ( +- 0.99% ) ( +- 0.76% ) ( +- 1.65% ) ( +- 1.60% ) %chg time to baseline 3.86% 5.34% 7.16% 5.82% 9.09% 128 threads minor-faults 33,554,705 17,375,375 4,682,462 2,337,245 1,179,007 134,819 times in seconds 128.766704495 115.659225437 120.353046307 115.291871270 115.450886036 113.991902150 stddev for time ( +- 2.93% ) ( +- 0.30% ) ( +- 2.93% ) ( +- 1.24% ) ( +- 1.03% ) ( +- 0.70% ) %chg time to baseline -10.17% -6.53% -10.46% -10.34% -11.47% Random access 1GiB file FAULT_AROUND_ORDER Baseline 1 3 4 5 8 1 thread minor-faults 17,155 8,678 2,126 1,097 581 134 times in seconds 51.904430523 51.658017987 51.919270792 51.560531738 52.354431597 51.976469502 stddev for time ( +- 3.19% ) ( +- 1.35% ) ( +- 1.56% ) ( +- 0.91% ) ( +- 1.70% ) ( +- 2.02% ) %chg time to baseline -0.47% 0.02% -0.66% 0.86% 0.13% 8 threads minor-faults 131,844 70,705 17,457 8,505 4,251 598 times in seconds 58.162813956 54.991706305 54.952675791 55.323057492 54.755587379 53.376722828 stddev for time ( +- 1.44% ) ( +- 0.69% ) ( +- 1.23% ) ( +- 2.78% ) ( +- 1.90% ) ( +- 2.91% ) %chg time to baseline -5.45% -5.52% -4.88% -5.86% -8.22% 32 threads minor-faults 524,437 270,760 67,069 33,414 16,641 2,204 times in seconds 69.981777072 76.539570015 79.753578505 76.245943618 77.254258344 79.072596831 stddev for time ( +- 2.81% ) ( +- 1.95% ) ( +- 2.66% ) ( +- 0.99% ) ( +- 2.35% ) ( +- 3.22% ) %chg time to baseline 9.37% 13.96% 8.95% 10.39% 12.98% 64 threads minor-faults 1,049,117 527,451 134,016 66,638 33,391 4,559 times in seconds 108.024517536 117.575067996 115.322659914 111.943998437 115.049450815 119.218450840 stddev for time ( +- 2.40% ) ( +- 1.77% ) ( +- 1.19% ) ( +- 3.29% ) ( +- 2.32% ) ( +- 1.42% ) %chg time to baseline 8.84% 6.75% 3.62% 6.5% 10.3% 128 threads minor-faults 2,097,440 1,054,360 267,042 133,328 66,532 8,652 times in seconds 155.055861167 153.059625968 152.449492156 151.024005282 150.844647770 155.954366718 stddev for time ( +- 1.32% ) ( +- 1.14% ) ( +- 1.32% ) ( +- 0.81% ) ( +- 0.75% ) ( +- 0.72% ) %chg time to baseline -1.28% -1.68% -2.59% -2.71% 0.57% In case of kernel build, fault around order (fao) value of 1 and 3 wins when compared to 4 (but bit noisy). Incase of kernel rebuild, slowdown for fao > 0 is seen. Incase of synthetic test, there are sporadic agains, but mostly slowdown. No clear sweet spot fao value that can be suggested for the ppc64/pseries with the current performance data. Hence, patch suggest value of zero to the fao. Worst case scenario: we touch one page every 16M to demonstrate overhead. Touch only one page in page table in 16GiB file FAULT_AROUND_ORDER Baseline 1 3 4 5 8 1 thread minor-faults 1,104 1,090 1,071 1,068 1,065 1,063 times in seconds 0.006583298 0.008531502 0.019733795 0.036033763 0.062300553 0.406857086 stddev for time ( +- 2.79% ) ( +- 2.42% ) ( +- 3.47% ) ( +- 2.81% ) ( +- 2.01% ) ( +- 1.33% ) 8 threads minor-faults 8,279 8,264 8,245 8,243 8,239 8,240 times in seconds 0.044572398 0.057211811 0.107606306 0.205626815 0.381679120 2.647979955 stddev for time ( +- 1.95% ) ( +- 2.98% ) ( +- 1.74% ) ( +- 2.80% ) ( +- 2.01% ) ( +- 1.86% ) 32 threads minor-faults 32,879 32,864 32,849 32,845 32,839 32,843 times in seconds 0.197659343 0.218486087 0.445116407 0.694235883 1.296894038 9.127517045 stddev for time ( +- 3.05% ) ( +- 3.05% ) ( +- 4.33% ) ( +- 3.08% ) ( +- 3.75% ) ( +- 0.56% ) 64 threads minor-faults 65,680 65,664 65,646 65,645 65,640 65,647 times in seconds 0.455537304 0.489688780 0.866490093 1.427393118 2.379628982 17.059295051 stddev for time ( +- 4.01% ) ( +- 4.13% ) ( +- 2.92% ) ( +- 1.68% ) ( +- 1.79% ) ( +- 0.48% ) 128 threads minor-faults 131,279 131,265 131,250 131,245 131,241 131,254 times in seconds 1.026880651 1.095327536 1.721728274 2.808233068 4.662729948 31.732848290 stddev for time ( +- 6.85% ) ( +- 4.09% ) ( +- 1.71% ) ( +- 3.45% ) ( +- 2.40% ) ( +- 0.68% ) Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> --- arch/powerpc/platforms/pseries/pseries.h | 2 ++ arch/powerpc/platforms/pseries/setup.c | 5 +++++ 2 files changed, 7 insertions(+) diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h index 9921953..6e6c993 100644 --- a/arch/powerpc/platforms/pseries/pseries.h +++ b/arch/powerpc/platforms/pseries/pseries.h @@ -17,6 +17,8 @@ struct device_node; extern void request_event_sources_irqs(struct device_node *np, irq_handler_t handler, const char *name); +extern unsigned int fault_around_order; + #include <linux/of.h> extern void __init fw_hypertas_feature_init(const char *hypertas, diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 2db8cc6..4391c3c 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -465,6 +465,11 @@ static void __init pSeries_setup_arch(void) { set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT); + /* + * Defaulting to zero since no sweet spot value found in the performance test. + */ + fault_around_order = 0; + /* Discover PIC type and setup ppc_md accordingly */ pseries_discover_pic(); -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries 2014-05-08 9:28 ` [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan @ 2014-05-20 7:28 ` Andrew Morton 2014-05-20 8:03 ` Madhavan Srinivasan 0 siblings, 1 reply; 26+ messages in thread From: Andrew Morton @ 2014-05-20 7:28 UTC (permalink / raw) To: Madhavan Srinivasan Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, kirill.shutemov, rusty, riel, mgorman, ak, peterz, mingo, dave.hansen On Thu, 8 May 2014 14:58:16 +0530 Madhavan Srinivasan <maddy@linux.vnet.ibm.com> wrote: > --- a/arch/powerpc/platforms/pseries/pseries.h > +++ b/arch/powerpc/platforms/pseries/pseries.h > @@ -17,6 +17,8 @@ struct device_node; > extern void request_event_sources_irqs(struct device_node *np, > irq_handler_t handler, const char *name); > > +extern unsigned int fault_around_order; This isn't an appropriate header file for exporting something from core mm - what happens if arch/mn10300 wants it?. I guess include/linux/mm.h is the place. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries 2014-05-20 7:28 ` Andrew Morton @ 2014-05-20 8:03 ` Madhavan Srinivasan 0 siblings, 0 replies; 26+ messages in thread From: Madhavan Srinivasan @ 2014-05-20 8:03 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, kirill.shutemov, rusty, riel, mgorman, ak, peterz, mingo, dave.hansen On Tuesday 20 May 2014 12:58 PM, Andrew Morton wrote: > On Thu, 8 May 2014 14:58:16 +0530 Madhavan Srinivasan <maddy@linux.vnet.ibm.com> wrote: > >> --- a/arch/powerpc/platforms/pseries/pseries.h >> +++ b/arch/powerpc/platforms/pseries/pseries.h >> @@ -17,6 +17,8 @@ struct device_node; >> extern void request_event_sources_irqs(struct device_node *np, >> irq_handler_t handler, const char *name); >> >> +extern unsigned int fault_around_order; > > This isn't an appropriate header file for exporting something from core > mm - what happens if arch/mn10300 wants it?. > > I guess include/linux/mm.h is the place. > Rusty already suggested this. My bad. Reason for adding it here was that, I did the performance test for this platform. Will change and send it out. Thanks for review Regards Maddy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-08 9:28 [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan 2014-05-08 9:28 ` [PATCH V4 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan 2014-05-08 9:28 ` [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan @ 2014-05-15 8:25 ` Madhavan Srinivasan 2014-05-15 17:28 ` Hugh Dickins 2 siblings, 1 reply; 26+ messages in thread From: Madhavan Srinivasan @ 2014-05-15 8:25 UTC (permalink / raw) To: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86 Cc: benh, paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak, peterz, mingo, dave.hansen Hi Ingo, Do you have any comments for the latest version of the patchset. If not, kindly can you pick it up as is. With regards Maddy > Kirill A. Shutemov with 8c6e50b029 commit introduced > vm_ops->map_pages() for mapping easy accessible pages around > fault address in hope to reduce number of minor page faults. > > This patch creates infrastructure to modify the FAULT_AROUND_ORDER > value using mm/Kconfig. This will enable architecture maintainers > to decide on suitable FAULT_AROUND_ORDER value based on > performance data for that architecture. First patch also defaults > FAULT_AROUND_ORDER Kconfig element to 4. Second patch list > out the performance numbers for powerpc (platform pseries) and > initialize the fault around order variable for pseries platform of > powerpc. > > V4 Changes: > Replaced the BUILD_BUG_ON with VM_BUG_ON. > Moved fault_around_pages() and fault_around_mask() functions outside of > #ifdef CONFIG_DEBUG_FS. > > V3 Changes: > Replaced FAULT_AROUND_ORDER macro to a variable to support arch's that > supports sub platforms. > Made changes in commit messages. > > V2 Changes: > Created Kconfig parameter for FAULT_AROUND_ORDER > Added check in do_read_fault to handle FAULT_AROUND_ORDER value of 0 > Made changes in commit messages. > > Madhavan Srinivasan (2): > mm: move FAULT_AROUND_ORDER to arch/ > powerpc/pseries: init fault_around_order for pseries > > arch/powerpc/platforms/pseries/pseries.h | 2 ++ > arch/powerpc/platforms/pseries/setup.c | 5 +++++ > mm/Kconfig | 8 ++++++++ > mm/memory.c | 25 ++++++------------------- > 4 files changed, 21 insertions(+), 19 deletions(-) > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-15 8:25 ` [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan @ 2014-05-15 17:28 ` Hugh Dickins 2014-05-19 0:12 ` Rusty Russell 0 siblings, 1 reply; 26+ messages in thread From: Hugh Dickins @ 2014-05-15 17:28 UTC (permalink / raw) To: Madhavan Srinivasan Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak, peterz, mingo, dave.hansen On Thu, 15 May 2014, Madhavan Srinivasan wrote: > > Hi Ingo, > > Do you have any comments for the latest version of the patchset. If > not, kindly can you pick it up as is. > > > With regards > Maddy > > > Kirill A. Shutemov with 8c6e50b029 commit introduced > > vm_ops->map_pages() for mapping easy accessible pages around > > fault address in hope to reduce number of minor page faults. > > > > This patch creates infrastructure to modify the FAULT_AROUND_ORDER > > value using mm/Kconfig. This will enable architecture maintainers > > to decide on suitable FAULT_AROUND_ORDER value based on > > performance data for that architecture. First patch also defaults > > FAULT_AROUND_ORDER Kconfig element to 4. Second patch list > > out the performance numbers for powerpc (platform pseries) and > > initialize the fault around order variable for pseries platform of > > powerpc. Sorry for not commenting earlier - just reminded by this ping to Ingo. I didn't study your numbers, but nowhere did I see what PAGE_SIZE you use. arch/powerpc/Kconfig suggests that Power supports base page size of 4k, 16k, 64k or 256k. I would expect your optimal fault_around_order to depend very much on the base page size. Perhaps fault_around_size would provide a more useful default? Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-15 17:28 ` Hugh Dickins @ 2014-05-19 0:12 ` Rusty Russell 2014-05-19 3:05 ` Madhavan Srinivasan 0 siblings, 1 reply; 26+ messages in thread From: Rusty Russell @ 2014-05-19 0:12 UTC (permalink / raw) To: Hugh Dickins, Madhavan Srinivasan Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, kirill.shutemov, akpm, riel, mgorman, ak, peterz, mingo, dave.hansen Hugh Dickins <hughd@google.com> writes: > On Thu, 15 May 2014, Madhavan Srinivasan wrote: >> >> Hi Ingo, >> >> Do you have any comments for the latest version of the patchset. If >> not, kindly can you pick it up as is. >> >> >> With regards >> Maddy >> >> > Kirill A. Shutemov with 8c6e50b029 commit introduced >> > vm_ops->map_pages() for mapping easy accessible pages around >> > fault address in hope to reduce number of minor page faults. >> > >> > This patch creates infrastructure to modify the FAULT_AROUND_ORDER >> > value using mm/Kconfig. This will enable architecture maintainers >> > to decide on suitable FAULT_AROUND_ORDER value based on >> > performance data for that architecture. First patch also defaults >> > FAULT_AROUND_ORDER Kconfig element to 4. Second patch list >> > out the performance numbers for powerpc (platform pseries) and >> > initialize the fault around order variable for pseries platform of >> > powerpc. > > Sorry for not commenting earlier - just reminded by this ping to Ingo. > > I didn't study your numbers, but nowhere did I see what PAGE_SIZE you use. > > arch/powerpc/Kconfig suggests that Power supports base page size of > 4k, 16k, 64k or 256k. > > I would expect your optimal fault_around_order to depend very much on > the base page size. It was 64k, which is what PPC64 uses on all the major distributions. You really only get a choice of 4k and 64k with 64 bit power. > Perhaps fault_around_size would provide a more useful default? That seems to fit. With 4k pages and order 4, you're asking for 64k. Maddy's result shows 64k is also reasonable for 64k pages. Perhaps we try to generalize from two data points (a slight improvement over doing it from 1!), eg: /* 4 seems good for 4k-page x86, 0 seems good for 64k page ppc64, so: */ unsigned int fault_around_order __read_mostly = (16 - PAGE_SHIFT < 0 ? 0 : 16 - PAGE_SHIFT); Cheers, Rusty. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-19 0:12 ` Rusty Russell @ 2014-05-19 3:05 ` Madhavan Srinivasan 2014-05-19 23:23 ` Hugh Dickins 0 siblings, 1 reply; 26+ messages in thread From: Madhavan Srinivasan @ 2014-05-19 3:05 UTC (permalink / raw) To: Rusty Russell, Hugh Dickins Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, kirill.shutemov, akpm, riel, mgorman, ak, peterz, mingo, dave.hansen On Monday 19 May 2014 05:42 AM, Rusty Russell wrote: > Hugh Dickins <hughd@google.com> writes: >> On Thu, 15 May 2014, Madhavan Srinivasan wrote: >>> >>> Hi Ingo, >>> >>> Do you have any comments for the latest version of the patchset. If >>> not, kindly can you pick it up as is. >>> >>> >>> With regards >>> Maddy >>> >>>> Kirill A. Shutemov with 8c6e50b029 commit introduced >>>> vm_ops->map_pages() for mapping easy accessible pages around >>>> fault address in hope to reduce number of minor page faults. >>>> >>>> This patch creates infrastructure to modify the FAULT_AROUND_ORDER >>>> value using mm/Kconfig. This will enable architecture maintainers >>>> to decide on suitable FAULT_AROUND_ORDER value based on >>>> performance data for that architecture. First patch also defaults >>>> FAULT_AROUND_ORDER Kconfig element to 4. Second patch list >>>> out the performance numbers for powerpc (platform pseries) and >>>> initialize the fault around order variable for pseries platform of >>>> powerpc. >> >> Sorry for not commenting earlier - just reminded by this ping to Ingo. >> >> I didn't study your numbers, but nowhere did I see what PAGE_SIZE you use. >> >> arch/powerpc/Kconfig suggests that Power supports base page size of >> 4k, 16k, 64k or 256k. >> >> I would expect your optimal fault_around_order to depend very much on >> the base page size. > > It was 64k, which is what PPC64 uses on all the major distributions. > You really only get a choice of 4k and 64k with 64 bit power. > This is true. PPC64 support multiple pagesize and yes the default page size of 64k, is taken as base pagesize for the tests. >> Perhaps fault_around_size would provide a more useful default? > > That seems to fit. With 4k pages and order 4, you're asking for 64k. > Maddy's result shows 64k is also reasonable for 64k pages. > > Perhaps we try to generalize from two data points (a slight improvement > over doing it from 1!), eg: > > /* 4 seems good for 4k-page x86, 0 seems good for 64k page ppc64, so: */ > unsigned int fault_around_order __read_mostly = > (16 - PAGE_SHIFT < 0 ? 0 : 16 - PAGE_SHIFT); > This may be right. But these are the concerns, will not this make other arch to pick default without any tuning and also this will remove the compile time option to disable the feature? Thanks for review With regards Maddy > Cheers, > Rusty. > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-19 3:05 ` Madhavan Srinivasan @ 2014-05-19 23:23 ` Hugh Dickins 2014-05-19 23:43 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Hugh Dickins @ 2014-05-19 23:23 UTC (permalink / raw) To: Madhavan Srinivasan Cc: Rusty Russell, Kirill A. Shutemov, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, akpm, riel, mgorman, ak, peterz, mingo, dave.hansen On Mon, 19 May 2014, Madhavan Srinivasan wrote: > On Monday 19 May 2014 05:42 AM, Rusty Russell wrote: > > Hugh Dickins <hughd@google.com> writes: > >> On Thu, 15 May 2014, Madhavan Srinivasan wrote: > >>> > >>> Hi Ingo, > >>> > >>> Do you have any comments for the latest version of the patchset. If > >>> not, kindly can you pick it up as is. > >>> > >>> > >>> With regards > >>> Maddy > >>> > >>>> Kirill A. Shutemov with 8c6e50b029 commit introduced > >>>> vm_ops->map_pages() for mapping easy accessible pages around > >>>> fault address in hope to reduce number of minor page faults. > >>>> > >>>> This patch creates infrastructure to modify the FAULT_AROUND_ORDER > >>>> value using mm/Kconfig. This will enable architecture maintainers > >>>> to decide on suitable FAULT_AROUND_ORDER value based on > >>>> performance data for that architecture. First patch also defaults > >>>> FAULT_AROUND_ORDER Kconfig element to 4. Second patch list > >>>> out the performance numbers for powerpc (platform pseries) and > >>>> initialize the fault around order variable for pseries platform of > >>>> powerpc. > >> > >> Sorry for not commenting earlier - just reminded by this ping to Ingo. > >> > >> I didn't study your numbers, but nowhere did I see what PAGE_SIZE you use. > >> > >> arch/powerpc/Kconfig suggests that Power supports base page size of > >> 4k, 16k, 64k or 256k. > >> > >> I would expect your optimal fault_around_order to depend very much on > >> the base page size. > > > > It was 64k, which is what PPC64 uses on all the major distributions. > > You really only get a choice of 4k and 64k with 64 bit power. > > > This is true. PPC64 support multiple pagesize and yes the default page > size of 64k, is taken as base pagesize for the tests. > > >> Perhaps fault_around_size would provide a more useful default? > > > > That seems to fit. With 4k pages and order 4, you're asking for 64k. > > Maddy's result shows 64k is also reasonable for 64k pages. > > > > Perhaps we try to generalize from two data points (a slight improvement > > over doing it from 1!), eg: > > > > /* 4 seems good for 4k-page x86, 0 seems good for 64k page ppc64, so: */ > > unsigned int fault_around_order __read_mostly = > > (16 - PAGE_SHIFT < 0 ? 0 : 16 - PAGE_SHIFT); Rusty's bimodal answer doesn't seem the right starting point to me. Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be the order of the fault-around size in bytes, and fault_around_pages() use 1UL << (fault_around_order - PAGE_SHIFT) - when that doesn't wrap, of course! That would at least have a better chance of being appropriate for architectures with 8k and 16k pages (Itanium springs to mind). Not necessarily right for them, since each architecture may have different faulting overheads; but a better chance of being right than blindly assuming 4k or 64k pages for everyone. I'd be glad to see that change go into v3.15: what do you think, Kirill, are we too late to make such a change now? Or do you see some objection to it? > This may be right. But these are the concerns, will not this make other > arch to pick default without any tuning Wasn't FAULT_AROUND_ORDER 4 chosen solely on the basis of x86 4k pages? Did other architectures, with other page sizes, back that default? Clearly not powerpc. > and also this will remove the > compile time option to disable the feature? Compile time option meaning your FAULT_AROUND_ORDER in mm/Kconfig for v3.16? I'm not sure whether Rusty was arguing against that or not. I think we are all three concerned to have a more sensible default than what's there at present. I don't feel very strongly about your Kconfig option: I've no objection, if it were to default to byte order 16. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-19 23:23 ` Hugh Dickins @ 2014-05-19 23:43 ` Andrew Morton 2014-05-20 0:44 ` Kirill A. Shutemov 2014-05-20 1:14 ` Rusty Russell 2014-05-20 2:06 ` Madhavan Srinivasan 2 siblings, 1 reply; 26+ messages in thread From: Andrew Morton @ 2014-05-19 23:43 UTC (permalink / raw) To: Hugh Dickins Cc: Madhavan Srinivasan, Rusty Russell, Kirill A. Shutemov, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo, dave.hansen On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be > the order of the fault-around size in bytes, and fault_around_pages() > use 1UL << (fault_around_order - PAGE_SHIFT) Yes. And shame on me for missing it (this time!) at review. There's still time to fix this. Patches, please. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-19 23:43 ` Andrew Morton @ 2014-05-20 0:44 ` Kirill A. Shutemov 2014-05-20 6:22 ` Rusty Russell 0 siblings, 1 reply; 26+ messages in thread From: Kirill A. Shutemov @ 2014-05-20 0:44 UTC (permalink / raw) To: Andrew Morton Cc: Hugh Dickins, Madhavan Srinivasan, Rusty Russell, Kirill A. Shutemov, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo, dave.hansen Andrew Morton wrote: > On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: > > > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be > > the order of the fault-around size in bytes, and fault_around_pages() > > use 1UL << (fault_around_order - PAGE_SHIFT) > > Yes. And shame on me for missing it (this time!) at review. > > There's still time to fix this. Patches, please. Here it is. Made at 3.30 AM, build tested only. I'll sign it off tomorrow after testing. diff --git a/mm/memory.c b/mm/memory.c index 037b812a9531..9d6941c9a9e4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3402,62 +3402,62 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address, update_mmu_cache(vma, address, pte); } -#define FAULT_AROUND_ORDER 4 +#define FAULT_AROUND_BYTES 65536 #ifdef CONFIG_DEBUG_FS -static unsigned int fault_around_order = FAULT_AROUND_ORDER; +static unsigned int fault_around_bytes = FAULT_AROUND_BYTES; -static int fault_around_order_get(void *data, u64 *val) +static int fault_around_bytes_get(void *data, u64 *val) { - *val = fault_around_order; + *val = fault_around_bytes; return 0; } -static int fault_around_order_set(void *data, u64 val) +static int fault_around_bytes_set(void *data, u64 val) { - BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE); - if (1UL << val > PTRS_PER_PTE) + BUILD_BUG_ON(FAULT_AROUND_BYTES / PAGE_SIZE > PTRS_PER_PTE); + if (val / PAGE_SIZE > PTRS_PER_PTE) return -EINVAL; - fault_around_order = val; + fault_around_bytes = val; return 0; } -DEFINE_SIMPLE_ATTRIBUTE(fault_around_order_fops, - fault_around_order_get, fault_around_order_set, "%llu\n"); +DEFINE_SIMPLE_ATTRIBUTE(fault_around_bytes_fops, + fault_around_bytes_get, fault_around_bytes_set, "%llu\n"); static int __init fault_around_debugfs(void) { void *ret; - ret = debugfs_create_file("fault_around_order", 0644, NULL, NULL, - &fault_around_order_fops); + ret = debugfs_create_file("fault_around_bytes", 0644, NULL, NULL, + &fault_around_bytes_fops); if (!ret) - pr_warn("Failed to create fault_around_order in debugfs"); + pr_warn("Failed to create fault_around_bytes in debugfs"); return 0; } late_initcall(fault_around_debugfs); static inline unsigned long fault_around_pages(void) { - return 1UL << fault_around_order; + return fault_around_bytes / PAGE_SIZE; } static inline unsigned long fault_around_mask(void) { - return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1); + return ~(round_down(fault_around_bytes, PAGE_SIZE) - 1); } #else static inline unsigned long fault_around_pages(void) { unsigned long nr_pages; - nr_pages = 1UL << FAULT_AROUND_ORDER; + nr_pages = FAULT_AROUND_BYTES / PAGE_SIZE; BUILD_BUG_ON(nr_pages > PTRS_PER_PTE); return nr_pages; } static inline unsigned long fault_around_mask(void) { - return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1); + return ~(round_down(FAULT_AROUND_BYTES, PAGE_SIZE) - 1); } #endif @@ -3515,7 +3515,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma, * if page by the offset is not ready to be mapped (cold cache or * something). */ - if (vma->vm_ops->map_pages) { + if (vma->vm_ops->map_pages && fault_around_pages() > 1) { pte = pte_offset_map_lock(mm, pmd, address, &ptl); do_fault_around(vma, address, pte, pgoff, flags); if (!pte_same(*pte, orig_pte)) -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-20 0:44 ` Kirill A. Shutemov @ 2014-05-20 6:22 ` Rusty Russell 2014-05-20 7:32 ` Andrew Morton 2014-05-20 10:27 ` Kirill A. Shutemov 0 siblings, 2 replies; 26+ messages in thread From: Rusty Russell @ 2014-05-20 6:22 UTC (permalink / raw) To: Kirill A. Shutemov, Andrew Morton Cc: Hugh Dickins, Madhavan Srinivasan, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo, dave.hansen "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: > Andrew Morton wrote: >> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: >> >> > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be >> > the order of the fault-around size in bytes, and fault_around_pages() >> > use 1UL << (fault_around_order - PAGE_SHIFT) >> >> Yes. And shame on me for missing it (this time!) at review. >> >> There's still time to fix this. Patches, please. > > Here it is. Made at 3.30 AM, build tested only. Prefer on top of Maddy's patch which makes it always a variable, rather than CONFIG_DEBUG_FS. It's got enough hair as it is. Cheers, Rusty. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-20 6:22 ` Rusty Russell @ 2014-05-20 7:32 ` Andrew Morton 2014-05-20 7:53 ` Madhavan Srinivasan 2014-05-20 10:27 ` Kirill A. Shutemov 1 sibling, 1 reply; 26+ messages in thread From: Andrew Morton @ 2014-05-20 7:32 UTC (permalink / raw) To: Rusty Russell Cc: Kirill A. Shutemov, Hugh Dickins, Madhavan Srinivasan, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo, dave.hansen On Tue, 20 May 2014 15:52:07 +0930 Rusty Russell <rusty@rustcorp.com.au> wrote: > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: > > Andrew Morton wrote: > >> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: > >> > >> > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be > >> > the order of the fault-around size in bytes, and fault_around_pages() > >> > use 1UL << (fault_around_order - PAGE_SHIFT) > >> > >> Yes. And shame on me for missing it (this time!) at review. > >> > >> There's still time to fix this. Patches, please. > > > > Here it is. Made at 3.30 AM, build tested only. > > Prefer on top of Maddy's patch which makes it always a variable, rather > than CONFIG_DEBUG_FS. It's got enough hair as it is. > We're at 3.15-rc5 and this interface should be finalised for 3.16. So Kirrill's patch is pretty urgent and should come first. Well. It's only a debugfs interface at this stage so we are allowed to change it later, but it's better not to. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-20 7:32 ` Andrew Morton @ 2014-05-20 7:53 ` Madhavan Srinivasan 0 siblings, 0 replies; 26+ messages in thread From: Madhavan Srinivasan @ 2014-05-20 7:53 UTC (permalink / raw) To: Andrew Morton, Rusty Russell Cc: Kirill A. Shutemov, Hugh Dickins, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo, dave.hansen On Tuesday 20 May 2014 01:02 PM, Andrew Morton wrote: > On Tue, 20 May 2014 15:52:07 +0930 Rusty Russell <rusty@rustcorp.com.au> wrote: > >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: >>> Andrew Morton wrote: >>>> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: >>>> >>>>> Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be >>>>> the order of the fault-around size in bytes, and fault_around_pages() >>>>> use 1UL << (fault_around_order - PAGE_SHIFT) >>>> >>>> Yes. And shame on me for missing it (this time!) at review. >>>> >>>> There's still time to fix this. Patches, please. >>> >>> Here it is. Made at 3.30 AM, build tested only. >> >> Prefer on top of Maddy's patch which makes it always a variable, rather >> than CONFIG_DEBUG_FS. It's got enough hair as it is. >> > > We're at 3.15-rc5 and this interface should be finalised for 3.16. So > Kirrill's patch is pretty urgent and should come first. > > Well. It's only a debugfs interface at this stage so we are allowed to > change it later, but it's better not to. > My patchset does not change the interface, but uses the current fault around order variable from CONFIG_DEBUG_FS block to allow changes at runtime, instead of having a constant and some cleanup. Thanks for review Regards --Maddy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-20 6:22 ` Rusty Russell 2014-05-20 7:32 ` Andrew Morton @ 2014-05-20 10:27 ` Kirill A. Shutemov 2014-05-20 19:59 ` Andrew Morton 2014-05-27 6:24 ` Madhavan Srinivasan 1 sibling, 2 replies; 26+ messages in thread From: Kirill A. Shutemov @ 2014-05-20 10:27 UTC (permalink / raw) To: Rusty Russell Cc: Kirill A. Shutemov, Andrew Morton, Hugh Dickins, Madhavan Srinivasan Rusty Russell wrote: > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: > > Andrew Morton wrote: > >> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: > >> > >> > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be > >> > the order of the fault-around size in bytes, and fault_around_pages() > >> > use 1UL << (fault_around_order - PAGE_SHIFT) > >> > >> Yes. And shame on me for missing it (this time!) at review. > >> > >> There's still time to fix this. Patches, please. > > > > Here it is. Made at 3.30 AM, build tested only. > > Prefer on top of Maddy's patch which makes it always a variable, rather > than CONFIG_DEBUG_FS. It's got enough hair as it is. Something like this? From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Date: Tue, 20 May 2014 13:02:03 +0300 Subject: [PATCH] mm: nominate faultaround area in bytes rather then page order There are evidences that faultaround feature is less relevant on architectures with page size bigger then 4k. Which makes sense since page fault overhead per byte of mapped area should be less there. Let's rework the feature to specify faultaround area in bytes instead of page order. It's 64 kilobytes for now. The patch effectively disables faultaround on architectures with page size >= 64k (like ppc64). It's possible that some other size of faultaround area is relevant for a platform. We can expose `fault_around_bytes' variable to arch-specific code once such platforms will be found. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> --- mm/memory.c | 62 +++++++++++++++++++++++-------------------------------------- 1 file changed, 23 insertions(+), 39 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 037b812a9531..252b319e8cdf 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address, update_mmu_cache(vma, address, pte); } -#define FAULT_AROUND_ORDER 4 +static unsigned long fault_around_bytes = 65536; + +static inline unsigned long fault_around_pages(void) +{ + return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE; +} + +static inline unsigned long fault_around_mask(void) +{ + return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK; +} -#ifdef CONFIG_DEBUG_FS -static unsigned int fault_around_order = FAULT_AROUND_ORDER; -static int fault_around_order_get(void *data, u64 *val) +#ifdef CONFIG_DEBUG_FS +static int fault_around_bytes_get(void *data, u64 *val) { - *val = fault_around_order; + *val = fault_around_bytes; return 0; } -static int fault_around_order_set(void *data, u64 val) +static int fault_around_bytes_set(void *data, u64 val) { - BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE); - if (1UL << val > PTRS_PER_PTE) + if (val / PAGE_SIZE > PTRS_PER_PTE) return -EINVAL; - fault_around_order = val; + fault_around_bytes = val; return 0; } -DEFINE_SIMPLE_ATTRIBUTE(fault_around_order_fops, - fault_around_order_get, fault_around_order_set, "%llu\n"); +DEFINE_SIMPLE_ATTRIBUTE(fault_around_bytes_fops, + fault_around_bytes_get, fault_around_bytes_set, "%llu\n"); static int __init fault_around_debugfs(void) { void *ret; - ret = debugfs_create_file("fault_around_order", 0644, NULL, NULL, - &fault_around_order_fops); + ret = debugfs_create_file("fault_around_bytes", 0644, NULL, NULL, + &fault_around_bytes_fops); if (!ret) - pr_warn("Failed to create fault_around_order in debugfs"); + pr_warn("Failed to create fault_around_bytes in debugfs"); return 0; } late_initcall(fault_around_debugfs); - -static inline unsigned long fault_around_pages(void) -{ - return 1UL << fault_around_order; -} - -static inline unsigned long fault_around_mask(void) -{ - return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1); -} -#else -static inline unsigned long fault_around_pages(void) -{ - unsigned long nr_pages; - - nr_pages = 1UL << FAULT_AROUND_ORDER; - BUILD_BUG_ON(nr_pages > PTRS_PER_PTE); - return nr_pages; -} - -static inline unsigned long fault_around_mask(void) -{ - return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1); -} #endif static void do_fault_around(struct vm_area_struct *vma, unsigned long address, @@ -3515,7 +3499,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma, * if page by the offset is not ready to be mapped (cold cache or * something). */ - if (vma->vm_ops->map_pages) { + if (vma->vm_ops->map_pages && fault_around_pages() > 1) { pte = pte_offset_map_lock(mm, pmd, address, &ptl); do_fault_around(vma, address, pte, pgoff, flags); if (!pte_same(*pte, orig_pte)) -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-20 10:27 ` Kirill A. Shutemov @ 2014-05-20 19:59 ` Andrew Morton 2014-05-21 13:40 ` Kirill A. Shutemov 2014-05-27 6:24 ` Madhavan Srinivasan 1 sibling, 1 reply; 26+ messages in thread From: Andrew Morton @ 2014-05-20 19:59 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Rusty Russell, Hugh Dickins, Madhavan Srinivasan, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo, dave.hansen On Tue, 20 May 2014 13:27:38 +0300 (EEST) "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote: > Rusty Russell wrote: > > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: > > > Andrew Morton wrote: > > >> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: > > >> > > >> > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be > > >> > the order of the fault-around size in bytes, and fault_around_pages() > > >> > use 1UL << (fault_around_order - PAGE_SHIFT) > > >> > > >> Yes. And shame on me for missing it (this time!) at review. > > >> > > >> There's still time to fix this. Patches, please. > > > > > > Here it is. Made at 3.30 AM, build tested only. > > > > Prefer on top of Maddy's patch which makes it always a variable, rather > > than CONFIG_DEBUG_FS. It's got enough hair as it is. > > Something like this? This appears to be against mainline, not against Madhavan's patch. As mentioned previously, I'd prefer it that way but confused. > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Date: Tue, 20 May 2014 13:02:03 +0300 > Subject: [PATCH] mm: nominate faultaround area in bytes rather then page order > > There are evidences that faultaround feature is less relevant on > architectures with page size bigger then 4k. Which makes sense since > page fault overhead per byte of mapped area should be less there. > > Let's rework the feature to specify faultaround area in bytes instead of > page order. It's 64 kilobytes for now. > > The patch effectively disables faultaround on architectures with > page size >= 64k (like ppc64). > > It's possible that some other size of faultaround area is relevant for a > platform. We can expose `fault_around_bytes' variable to arch-specific > code once such platforms will be found. > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > --- > mm/memory.c | 62 +++++++++++++++++++++++-------------------------------------- > 1 file changed, 23 insertions(+), 39 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 037b812a9531..252b319e8cdf 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address, > update_mmu_cache(vma, address, pte); > } > > -#define FAULT_AROUND_ORDER 4 > +static unsigned long fault_around_bytes = 65536; > + > +static inline unsigned long fault_around_pages(void) > +{ > + return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE; > +} I think we should round up, not down. So if the user asks for 1kb, they get one page. So this becomes return PAGE_ALIGN(fault_around_bytes) / PAGE_SIZE; > +static inline unsigned long fault_around_mask(void) > +{ > + return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK; > +} And this has me a bit stumped. It's not helpful that do_fault_around() is undocumented. Does it fault in N/2 pages ahead and N/2 pages behind? Or does it align the address down to the highest multiple of fault_around_bytes? It appears to be the latter, so the location of the faultaround window around the fault address is basically random, depending on what address userspace happened to pick. I don't know why we did this :( Or something. Can we please get some code commentary over do_fault_around() describing this design decision and explaining the reasoning behind it? Also, "neast" is not a word. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-20 19:59 ` Andrew Morton @ 2014-05-21 13:40 ` Kirill A. Shutemov 2014-05-21 20:34 ` Andrew Morton 0 siblings, 1 reply; 26+ messages in thread From: Kirill A. Shutemov @ 2014-05-21 13:40 UTC (permalink / raw) To: Andrew Morton Cc: Kirill A. Shutemov, Rusty Russell, Hugh Dickins, Madhavan Srinivasan, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo, dave.hansen Andrew Morton wrote: > On Tue, 20 May 2014 13:27:38 +0300 (EEST) "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote: > > > Rusty Russell wrote: > > > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: > > > > Andrew Morton wrote: > > > >> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: > > > >> > > > >> > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be > > > >> > the order of the fault-around size in bytes, and fault_around_pages() > > > >> > use 1UL << (fault_around_order - PAGE_SHIFT) > > > >> > > > >> Yes. And shame on me for missing it (this time!) at review. > > > >> > > > >> There's still time to fix this. Patches, please. > > > > > > > > Here it is. Made at 3.30 AM, build tested only. > > > > > > Prefer on top of Maddy's patch which makes it always a variable, rather > > > than CONFIG_DEBUG_FS. It's got enough hair as it is. > > > > Something like this? > > This appears to be against mainline, not against Madhavan's patch. As > mentioned previously, I'd prefer it that way but confused. > > > > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > > Date: Tue, 20 May 2014 13:02:03 +0300 > > Subject: [PATCH] mm: nominate faultaround area in bytes rather then page order > > > > There are evidences that faultaround feature is less relevant on > > architectures with page size bigger then 4k. Which makes sense since > > page fault overhead per byte of mapped area should be less there. > > > > Let's rework the feature to specify faultaround area in bytes instead of > > page order. It's 64 kilobytes for now. > > > > The patch effectively disables faultaround on architectures with > > page size >= 64k (like ppc64). > > > > It's possible that some other size of faultaround area is relevant for a > > platform. We can expose `fault_around_bytes' variable to arch-specific > > code once such platforms will be found. > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > --- > > mm/memory.c | 62 +++++++++++++++++++++++-------------------------------------- > > 1 file changed, 23 insertions(+), 39 deletions(-) > > > > diff --git a/mm/memory.c b/mm/memory.c > > index 037b812a9531..252b319e8cdf 100644 > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address, > > update_mmu_cache(vma, address, pte); > > } > > > > -#define FAULT_AROUND_ORDER 4 > > +static unsigned long fault_around_bytes = 65536; > > + > > +static inline unsigned long fault_around_pages(void) > > +{ > > + return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE; > > +} > > I think we should round up, not down. So if the user asks for 1kb, > they get one page. > > So this becomes > > return PAGE_ALIGN(fault_around_bytes) / PAGE_SIZE; See below. > > +static inline unsigned long fault_around_mask(void) > > +{ > > + return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK; > > +} > > And this has me a bit stumped. It's not helpful that do_fault_around() > is undocumented. Does it fault in N/2 pages ahead and N/2 pages > behind? Or does it align the address down to the highest multiple of > fault_around_bytes? It appears to be the latter, so the location of > the faultaround window around the fault address is basically random, > depending on what address userspace happened to pick. I don't know why > we did this :( When we call ->map_pages() we need to make sure that we stay within VMA and the page table. We don't want to cross page table boundary, because page table is what ptlock covers in split ptlock case. I've designed the feature with fault area nominated in page order in mind and I found it's easier to make sure we don't cross boundaries, if we would align virtual address of fault around area to PAGE_SIZE << FAULT_AROUND_ORDER. And yes fault address may be anywhere within the area. You can think about this as a virtual page with size PAGE_SIZE << FAULT_AROUND_ORDER: no matter what is fault address, we handle area naturally aligned to page size which fault address belong to. I've used rounddown_pow_of_two() in the patch to align to nearest page order, not to page size, because that's what current do_fault_around() expect to see. And roundup is not an option: nobody expects fault around area to be 128k if fault_around_bytes set to 64k + 1 bytes. If you think we need this I can rework do_fault_around() to handle non-pow-of-two fault_around_pages(), but I don't think it's good idea to do this for v3.15. Anyway, patch I've proposed allows change fault_around_bytes only from DEBUG_FS and roundown should be good enough there. > Or something. Can we please get some code commentary over > do_fault_around() describing this design decision and explaining the > reasoning behind it? I'll do this. But if do_fault_around() rework is needed, I want to do that first. > Also, "neast" is not a word. :facepalm: From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Date: Wed, 21 May 2014 16:36:42 +0300 Subject: [PATCH] mm: fix typo in comment in do_fault_around() Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> --- mm/memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index 252b319e8cdf..f76663c31da6 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3460,7 +3460,7 @@ static void do_fault_around(struct vm_area_struct *vma, unsigned long address, /* * max_pgoff is either end of page table or end of vma - * or fault_around_pages() from pgoff, depending what is neast. + * or fault_around_pages() from pgoff, depending what is nearest. */ max_pgoff = pgoff - ((start_addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)) + PTRS_PER_PTE - 1; -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-21 13:40 ` Kirill A. Shutemov @ 2014-05-21 20:34 ` Andrew Morton 2014-05-23 12:28 ` Kirill A. Shutemov 0 siblings, 1 reply; 26+ messages in thread From: Andrew Morton @ 2014-05-21 20:34 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Rusty Russell, Hugh Dickins, Madhavan Srinivasan, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo, dave.hansen On Wed, 21 May 2014 16:40:27 +0300 (EEST) "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote: > > Or something. Can we please get some code commentary over > > do_fault_around() describing this design decision and explaining the > > reasoning behind it? > > I'll do this. But if do_fault_around() rework is needed, I want to do that > first. This sort of thing should be at least partially driven by observation and I don't have the data for that. My seat of the pants feel is that after the first fault, accesses at higher addresses are more common/probable than accesses at lower addresses. In which case we should see improvements by centering the window at some higher address than the fault. Much instrumentation and downstream analysis is needed and the returns will be pretty small! But we don't need to do all that right now. Let's get the current implementation wrapped up for 3.15: get the interface finalized (bytes, not pages!) and get the current design decisions appropriately documented. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-21 20:34 ` Andrew Morton @ 2014-05-23 12:28 ` Kirill A. Shutemov 0 siblings, 0 replies; 26+ messages in thread From: Kirill A. Shutemov @ 2014-05-23 12:28 UTC (permalink / raw) To: Andrew Morton Cc: Kirill A. Shutemov, Rusty Russell, Hugh Dickins, Madhavan Srinivasan, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo, dave.hansen Andrew Morton wrote: > On Wed, 21 May 2014 16:40:27 +0300 (EEST) "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote: > > > > Or something. Can we please get some code commentary over > > > do_fault_around() describing this design decision and explaining the > > > reasoning behind it? > > > > I'll do this. But if do_fault_around() rework is needed, I want to do that > > first. > > This sort of thing should be at least partially driven by observation > and I don't have the data for that. My seat of the pants feel is that > after the first fault, accesses at higher addresses are more > common/probable than accesses at lower addresses. It's probably true for data, but the feature is mostly targeted to code pages and situation is not that obvious to me with all jumps. > But we don't need to do all that right now. Let's get the current > implementation wrapped up for 3.15: get the interface finalized (bytes, > not pages!) The patch above by thread is okay for that, right? > and get the current design decisions appropriately documented. Here it is. Based on patch to convert order->bytes. From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Date: Fri, 23 May 2014 15:16:47 +0300 Subject: [PATCH] mm: document do_fault_around() feature Some clarification on how faultaround works. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> --- mm/memory.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/mm/memory.c b/mm/memory.c index 252b319e8cdf..8d723b8d3c86 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3404,6 +3404,10 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address, static unsigned long fault_around_bytes = 65536; +/* + * fault_around_pages() and fault_around_mask() round down fault_around_bytes + * to nearest page order. It's what do_fault_around() expects to see. + */ static inline unsigned long fault_around_pages(void) { return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE; @@ -3445,6 +3449,29 @@ static int __init fault_around_debugfs(void) late_initcall(fault_around_debugfs); #endif +/* + * do_fault_around() tries to map few pages around the fault address. The hope + * is that the pages will be needed soon and this would lower the number of + * faults to handle. + * + * It uses vm_ops->map_pages() to map the pages, which skips the page if it's + * not ready to be mapped: not up-to-date, locked, etc. + * + * This function is called with the page table lock taken. In the split ptlock + * case the page table lock only protects only those entries which belong to + * page table corresponding to the fault address. + * + * This function don't cross the VMA boundaries in order to call map_pages() + * only once. + * + * fault_around_pages() defines how many pages we'll try to map. + * do_fault_around() expects it to be power of two and less or equal to + * PTRS_PER_PTE. + * + * The virtual address of the area that we map is naturally aligned to the + * fault_around_pages() (and therefore to page order). This way it's easier to + * guarantee that we don't cross the page table boundaries. + */ static void do_fault_around(struct vm_area_struct *vma, unsigned long address, pte_t *pte, pgoff_t pgoff, unsigned int flags) { -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-20 10:27 ` Kirill A. Shutemov 2014-05-20 19:59 ` Andrew Morton @ 2014-05-27 6:24 ` Madhavan Srinivasan 2014-05-27 10:21 ` Kirill A. Shutemov 1 sibling, 1 reply; 26+ messages in thread From: Madhavan Srinivasan @ 2014-05-27 6:24 UTC (permalink / raw) To: Kirill A. Shutemov, Rusty Russell Cc: Andrew Morton, Hugh Dickins, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo, dave.hansen On Tuesday 20 May 2014 03:57 PM, Kirill A. Shutemov wrote: > Rusty Russell wrote: >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: >>> Andrew Morton wrote: >>>> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: >>>> >>>>> Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be >>>>> the order of the fault-around size in bytes, and fault_around_pages() >>>>> use 1UL << (fault_around_order - PAGE_SHIFT) >>>> >>>> Yes. And shame on me for missing it (this time!) at review. >>>> >>>> There's still time to fix this. Patches, please. >>> >>> Here it is. Made at 3.30 AM, build tested only. >> >> Prefer on top of Maddy's patch which makes it always a variable, rather >> than CONFIG_DEBUG_FS. It's got enough hair as it is. > > Something like this? > > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Date: Tue, 20 May 2014 13:02:03 +0300 > Subject: [PATCH] mm: nominate faultaround area in bytes rather then page order > > There are evidences that faultaround feature is less relevant on > architectures with page size bigger then 4k. Which makes sense since > page fault overhead per byte of mapped area should be less there. > > Let's rework the feature to specify faultaround area in bytes instead of > page order. It's 64 kilobytes for now. > > The patch effectively disables faultaround on architectures with > page size >= 64k (like ppc64). > > It's possible that some other size of faultaround area is relevant for a > platform. We can expose `fault_around_bytes' variable to arch-specific > code once such platforms will be found. > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > --- > mm/memory.c | 62 +++++++++++++++++++++++-------------------------------------- > 1 file changed, 23 insertions(+), 39 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 037b812a9531..252b319e8cdf 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address, > update_mmu_cache(vma, address, pte); > } > > -#define FAULT_AROUND_ORDER 4 > +static unsigned long fault_around_bytes = 65536; > + > +static inline unsigned long fault_around_pages(void) > +{ > + return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE; > +} > + > +static inline unsigned long fault_around_mask(void) > +{ > + return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK; > +} > > -#ifdef CONFIG_DEBUG_FS > -static unsigned int fault_around_order = FAULT_AROUND_ORDER; > > -static int fault_around_order_get(void *data, u64 *val) > +#ifdef CONFIG_DEBUG_FS > +static int fault_around_bytes_get(void *data, u64 *val) > { > - *val = fault_around_order; > + *val = fault_around_bytes; > return 0; > } > > -static int fault_around_order_set(void *data, u64 val) > +static int fault_around_bytes_set(void *data, u64 val) > { Kindly ignore the question if not relevant. Even though we need root access to alter the value, will we be fine with negative value?. Regards Maddy > - BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE); > - if (1UL << val > PTRS_PER_PTE) > + if (val / PAGE_SIZE > PTRS_PER_PTE) > return -EINVAL; > - fault_around_order = val; > + fault_around_bytes = val; > return 0; > } > -DEFINE_SIMPLE_ATTRIBUTE(fault_around_order_fops, > - fault_around_order_get, fault_around_order_set, "%llu\n"); > +DEFINE_SIMPLE_ATTRIBUTE(fault_around_bytes_fops, > + fault_around_bytes_get, fault_around_bytes_set, "%llu\n"); > > static int __init fault_around_debugfs(void) > { > void *ret; > > - ret = debugfs_create_file("fault_around_order", 0644, NULL, NULL, > - &fault_around_order_fops); > + ret = debugfs_create_file("fault_around_bytes", 0644, NULL, NULL, > + &fault_around_bytes_fops); > if (!ret) > - pr_warn("Failed to create fault_around_order in debugfs"); > + pr_warn("Failed to create fault_around_bytes in debugfs"); > return 0; > } > late_initcall(fault_around_debugfs); > - > -static inline unsigned long fault_around_pages(void) > -{ > - return 1UL << fault_around_order; > -} > - > -static inline unsigned long fault_around_mask(void) > -{ > - return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1); > -} > -#else > -static inline unsigned long fault_around_pages(void) > -{ > - unsigned long nr_pages; > - > - nr_pages = 1UL << FAULT_AROUND_ORDER; > - BUILD_BUG_ON(nr_pages > PTRS_PER_PTE); > - return nr_pages; > -} > - > -static inline unsigned long fault_around_mask(void) > -{ > - return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1); > -} > #endif > > static void do_fault_around(struct vm_area_struct *vma, unsigned long address, > @@ -3515,7 +3499,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma, > * if page by the offset is not ready to be mapped (cold cache or > * something). > */ > - if (vma->vm_ops->map_pages) { > + if (vma->vm_ops->map_pages && fault_around_pages() > 1) { > pte = pte_offset_map_lock(mm, pmd, address, &ptl); > do_fault_around(vma, address, pte, pgoff, flags); > if (!pte_same(*pte, orig_pte)) > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-27 6:24 ` Madhavan Srinivasan @ 2014-05-27 10:21 ` Kirill A. Shutemov 2014-05-27 10:44 ` Madhavan Srinivasan 0 siblings, 1 reply; 26+ messages in thread From: Kirill A. Shutemov @ 2014-05-27 10:21 UTC (permalink / raw) To: Madhavan Srinivasan Cc: Kirill A. Shutemov, Rusty Russell, Andrew Morton, Hugh Dickins, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo, dave.hansen Madhavan Srinivasan wrote: > On Tuesday 20 May 2014 03:57 PM, Kirill A. Shutemov wrote: > > Rusty Russell wrote: > >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: > >>> Andrew Morton wrote: > >>>> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: > >>>> > >>>>> Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be > >>>>> the order of the fault-around size in bytes, and fault_around_pages() > >>>>> use 1UL << (fault_around_order - PAGE_SHIFT) > >>>> > >>>> Yes. And shame on me for missing it (this time!) at review. > >>>> > >>>> There's still time to fix this. Patches, please. > >>> > >>> Here it is. Made at 3.30 AM, build tested only. > >> > >> Prefer on top of Maddy's patch which makes it always a variable, rather > >> than CONFIG_DEBUG_FS. It's got enough hair as it is. > > > > Something like this? > > > > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > > Date: Tue, 20 May 2014 13:02:03 +0300 > > Subject: [PATCH] mm: nominate faultaround area in bytes rather then page order > > > > There are evidences that faultaround feature is less relevant on > > architectures with page size bigger then 4k. Which makes sense since > > page fault overhead per byte of mapped area should be less there. > > > > Let's rework the feature to specify faultaround area in bytes instead of > > page order. It's 64 kilobytes for now. > > > > The patch effectively disables faultaround on architectures with > > page size >= 64k (like ppc64). > > > > It's possible that some other size of faultaround area is relevant for a > > platform. We can expose `fault_around_bytes' variable to arch-specific > > code once such platforms will be found. > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > --- > > mm/memory.c | 62 +++++++++++++++++++++++-------------------------------------- > > 1 file changed, 23 insertions(+), 39 deletions(-) > > > > diff --git a/mm/memory.c b/mm/memory.c > > index 037b812a9531..252b319e8cdf 100644 > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address, > > update_mmu_cache(vma, address, pte); > > } > > > > -#define FAULT_AROUND_ORDER 4 > > +static unsigned long fault_around_bytes = 65536; > > + > > +static inline unsigned long fault_around_pages(void) > > +{ > > + return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE; > > +} > > + > > +static inline unsigned long fault_around_mask(void) > > +{ > > + return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK; > > +} > > > > -#ifdef CONFIG_DEBUG_FS > > -static unsigned int fault_around_order = FAULT_AROUND_ORDER; > > > > -static int fault_around_order_get(void *data, u64 *val) > > +#ifdef CONFIG_DEBUG_FS > > +static int fault_around_bytes_get(void *data, u64 *val) > > { > > - *val = fault_around_order; > > + *val = fault_around_bytes; > > return 0; > > } > > > > -static int fault_around_order_set(void *data, u64 val) > > +static int fault_around_bytes_set(void *data, u64 val) > > { > > Kindly ignore the question if not relevant. Even though we need root > access to alter the value, will we be fine with > negative value?. val is u64. or I miss something? -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-27 10:21 ` Kirill A. Shutemov @ 2014-05-27 10:44 ` Madhavan Srinivasan 0 siblings, 0 replies; 26+ messages in thread From: Madhavan Srinivasan @ 2014-05-27 10:44 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Rusty Russell, Andrew Morton, Hugh Dickins, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo, dave.hansen On Tuesday 27 May 2014 03:51 PM, Kirill A. Shutemov wrote: > Madhavan Srinivasan wrote: >> On Tuesday 20 May 2014 03:57 PM, Kirill A. Shutemov wrote: >>> Rusty Russell wrote: >>>> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: >>>>> Andrew Morton wrote: >>>>>> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: >>>>>> >>>>>>> Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be >>>>>>> the order of the fault-around size in bytes, and fault_around_pages() >>>>>>> use 1UL << (fault_around_order - PAGE_SHIFT) >>>>>> >>>>>> Yes. And shame on me for missing it (this time!) at review. >>>>>> >>>>>> There's still time to fix this. Patches, please. >>>>> >>>>> Here it is. Made at 3.30 AM, build tested only. >>>> >>>> Prefer on top of Maddy's patch which makes it always a variable, rather >>>> than CONFIG_DEBUG_FS. It's got enough hair as it is. >>> >>> Something like this? >>> >>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> >>> Date: Tue, 20 May 2014 13:02:03 +0300 >>> Subject: [PATCH] mm: nominate faultaround area in bytes rather then page order >>> >>> There are evidences that faultaround feature is less relevant on >>> architectures with page size bigger then 4k. Which makes sense since >>> page fault overhead per byte of mapped area should be less there. >>> >>> Let's rework the feature to specify faultaround area in bytes instead of >>> page order. It's 64 kilobytes for now. >>> >>> The patch effectively disables faultaround on architectures with >>> page size >= 64k (like ppc64). >>> >>> It's possible that some other size of faultaround area is relevant for a >>> platform. We can expose `fault_around_bytes' variable to arch-specific >>> code once such platforms will be found. >>> >>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> >>> --- >>> mm/memory.c | 62 +++++++++++++++++++++++-------------------------------------- >>> 1 file changed, 23 insertions(+), 39 deletions(-) >>> >>> diff --git a/mm/memory.c b/mm/memory.c >>> index 037b812a9531..252b319e8cdf 100644 >>> --- a/mm/memory.c >>> +++ b/mm/memory.c >>> @@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address, >>> update_mmu_cache(vma, address, pte); >>> } >>> >>> -#define FAULT_AROUND_ORDER 4 >>> +static unsigned long fault_around_bytes = 65536; >>> + >>> +static inline unsigned long fault_around_pages(void) >>> +{ >>> + return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE; >>> +} >>> + >>> +static inline unsigned long fault_around_mask(void) >>> +{ >>> + return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK; >>> +} >>> >>> -#ifdef CONFIG_DEBUG_FS >>> -static unsigned int fault_around_order = FAULT_AROUND_ORDER; >>> >>> -static int fault_around_order_get(void *data, u64 *val) >>> +#ifdef CONFIG_DEBUG_FS >>> +static int fault_around_bytes_get(void *data, u64 *val) >>> { >>> - *val = fault_around_order; >>> + *val = fault_around_bytes; >>> return 0; >>> } >>> >>> -static int fault_around_order_set(void *data, u64 val) >>> +static int fault_around_bytes_set(void *data, u64 val) >>> { >> >> Kindly ignore the question if not relevant. Even though we need root >> access to alter the value, will we be fine with >> negative value?. > ppc > val is u64. or I miss something? > My Bad. What I wanted to check was for all 0xf input and guess we are fine. Sorry about that. Regards Maddy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-19 23:23 ` Hugh Dickins 2014-05-19 23:43 ` Andrew Morton @ 2014-05-20 1:14 ` Rusty Russell 2014-05-20 2:34 ` Hugh Dickins 2014-05-20 2:06 ` Madhavan Srinivasan 2 siblings, 1 reply; 26+ messages in thread From: Rusty Russell @ 2014-05-20 1:14 UTC (permalink / raw) To: Hugh Dickins, Madhavan Srinivasan Cc: Kirill A. Shutemov, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, akpm, riel, mgorman, ak, peterz, mingo, dave.hansen Hugh Dickins <hughd@google.com> writes: > On Mon, 19 May 2014, Madhavan Srinivasan wrote: >> On Monday 19 May 2014 05:42 AM, Rusty Russell wrote: >> > Hugh Dickins <hughd@google.com> writes: >> >> On Thu, 15 May 2014, Madhavan Srinivasan wrote: >> >>> >> >>> Hi Ingo, >> >>> >> >>> Do you have any comments for the latest version of the patchset. If >> >>> not, kindly can you pick it up as is. >> >>> >> >>> >> >>> With regards >> >>> Maddy >> >>> >> >>>> Kirill A. Shutemov with 8c6e50b029 commit introduced >> >>>> vm_ops->map_pages() for mapping easy accessible pages around >> >>>> fault address in hope to reduce number of minor page faults. >> >>>> >> >>>> This patch creates infrastructure to modify the FAULT_AROUND_ORDER >> >>>> value using mm/Kconfig. This will enable architecture maintainers >> >>>> to decide on suitable FAULT_AROUND_ORDER value based on >> >>>> performance data for that architecture. First patch also defaults >> >>>> FAULT_AROUND_ORDER Kconfig element to 4. Second patch list >> >>>> out the performance numbers for powerpc (platform pseries) and >> >>>> initialize the fault around order variable for pseries platform of >> >>>> powerpc. >> >> >> >> Sorry for not commenting earlier - just reminded by this ping to Ingo. >> >> >> >> I didn't study your numbers, but nowhere did I see what PAGE_SIZE you use. >> >> >> >> arch/powerpc/Kconfig suggests that Power supports base page size of >> >> 4k, 16k, 64k or 256k. >> >> >> >> I would expect your optimal fault_around_order to depend very much on >> >> the base page size. >> > >> > It was 64k, which is what PPC64 uses on all the major distributions. >> > You really only get a choice of 4k and 64k with 64 bit power. >> > >> This is true. PPC64 support multiple pagesize and yes the default page >> size of 64k, is taken as base pagesize for the tests. >> >> >> Perhaps fault_around_size would provide a more useful default? >> > >> > That seems to fit. With 4k pages and order 4, you're asking for 64k. >> > Maddy's result shows 64k is also reasonable for 64k pages. >> > >> > Perhaps we try to generalize from two data points (a slight improvement >> > over doing it from 1!), eg: >> > >> > /* 4 seems good for 4k-page x86, 0 seems good for 64k page ppc64, so: */ >> > unsigned int fault_around_order __read_mostly = >> > (16 - PAGE_SHIFT < 0 ? 0 : 16 - PAGE_SHIFT); > > Rusty's bimodal answer doesn't seem the right starting point to me. ? It's not bimodal, it's graded. I think you misread? > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be > the order of the fault-around size in bytes, and fault_around_pages() > use 1UL << (fault_around_order - PAGE_SHIFT) > - when that doesn't wrap, of course! > > That would at least have a better chance of being appropriate for > architectures with 8k and 16k pages (Itanium springs to mind). Well, from our two data points it seems that we want to fault in 64k at a time whatever our page size. Perhaps it's clearer if the code expresses itself that way. > Wasn't FAULT_AROUND_ORDER 4 chosen solely on the basis of x86 4k pages? > Did other architectures, with other page sizes, back that default? > Clearly not powerpc. Yeah, BenH flagged it as "we should test this" for powerpc, which is what Maddy then did. >> and also this will remove the >> compile time option to disable the feature? > > Compile time option meaning your FAULT_AROUND_ORDER in mm/Kconfig > for v3.16? > > I'm not sure whether Rusty was arguing against that or not. I think > we are all three concerned to have a more sensible default than what's > there at present. I don't feel very strongly about your Kconfig > option: I've no objection, if it were to default to byte order 16. I don't mind either. Cheers, Rusty. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-20 1:14 ` Rusty Russell @ 2014-05-20 2:34 ` Hugh Dickins 0 siblings, 0 replies; 26+ messages in thread From: Hugh Dickins @ 2014-05-20 2:34 UTC (permalink / raw) To: Rusty Russell Cc: Madhavan Srinivasan, Kirill A. Shutemov, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, akpm, riel, mgorman, ak, peterz, mingo, dave.hansen On Tue, 20 May 2014, Rusty Russell wrote: > Hugh Dickins <hughd@google.com> writes: > >> On Monday 19 May 2014 05:42 AM, Rusty Russell wrote: > >> > > >> > Perhaps we try to generalize from two data points (a slight improvement > >> > over doing it from 1!), eg: > >> > > >> > /* 4 seems good for 4k-page x86, 0 seems good for 64k page ppc64, so: */ > >> > unsigned int fault_around_order __read_mostly = > >> > (16 - PAGE_SHIFT < 0 ? 0 : 16 - PAGE_SHIFT); > > > > Rusty's bimodal answer doesn't seem the right starting point to me. > > ? It's not bimodal, it's graded. I think you misread? Yikes, worse than misread, more like I was too rude even to read: sorry! Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc 2014-05-19 23:23 ` Hugh Dickins 2014-05-19 23:43 ` Andrew Morton 2014-05-20 1:14 ` Rusty Russell @ 2014-05-20 2:06 ` Madhavan Srinivasan 2 siblings, 0 replies; 26+ messages in thread From: Madhavan Srinivasan @ 2014-05-20 2:06 UTC (permalink / raw) To: Hugh Dickins Cc: Rusty Russell, Kirill A. Shutemov, linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, akpm, riel, mgorman, ak, peterz, mingo, dave.hansen On Tuesday 20 May 2014 04:53 AM, Hugh Dickins wrote: > On Mon, 19 May 2014, Madhavan Srinivasan wrote: >> On Monday 19 May 2014 05:42 AM, Rusty Russell wrote: >>> Hugh Dickins <hughd@google.com> writes: >>>> On Thu, 15 May 2014, Madhavan Srinivasan wrote: >>>>> >>>>> Hi Ingo, >>>>> >>>>> Do you have any comments for the latest version of the patchset. If >>>>> not, kindly can you pick it up as is. >>>>> >>>>> >>>>> With regards >>>>> Maddy >>>>> >>>>>> Kirill A. Shutemov with 8c6e50b029 commit introduced >>>>>> vm_ops->map_pages() for mapping easy accessible pages around >>>>>> fault address in hope to reduce number of minor page faults. >>>>>> >>>>>> This patch creates infrastructure to modify the FAULT_AROUND_ORDER >>>>>> value using mm/Kconfig. This will enable architecture maintainers >>>>>> to decide on suitable FAULT_AROUND_ORDER value based on >>>>>> performance data for that architecture. First patch also defaults >>>>>> FAULT_AROUND_ORDER Kconfig element to 4. Second patch list >>>>>> out the performance numbers for powerpc (platform pseries) and >>>>>> initialize the fault around order variable for pseries platform of >>>>>> powerpc. >>>> >>>> Sorry for not commenting earlier - just reminded by this ping to Ingo. >>>> >>>> I didn't study your numbers, but nowhere did I see what PAGE_SIZE you use. >>>> >>>> arch/powerpc/Kconfig suggests that Power supports base page size of >>>> 4k, 16k, 64k or 256k. >>>> >>>> I would expect your optimal fault_around_order to depend very much on >>>> the base page size. >>> >>> It was 64k, which is what PPC64 uses on all the major distributions. >>> You really only get a choice of 4k and 64k with 64 bit power. >>> >> This is true. PPC64 support multiple pagesize and yes the default page >> size of 64k, is taken as base pagesize for the tests. >> >>>> Perhaps fault_around_size would provide a more useful default? >>> >>> That seems to fit. With 4k pages and order 4, you're asking for 64k. >>> Maddy's result shows 64k is also reasonable for 64k pages. >>> >>> Perhaps we try to generalize from two data points (a slight improvement >>> over doing it from 1!), eg: >>> >>> /* 4 seems good for 4k-page x86, 0 seems good for 64k page ppc64, so: */ >>> unsigned int fault_around_order __read_mostly = >>> (16 - PAGE_SHIFT < 0 ? 0 : 16 - PAGE_SHIFT); > > Rusty's bimodal answer doesn't seem the right starting point to me. > > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be > the order of the fault-around size in bytes, and fault_around_pages() > use 1UL << (fault_around_order - PAGE_SHIFT) > - when that doesn't wrap, of course! > > That would at least have a better chance of being appropriate for > architectures with 8k and 16k pages (Itanium springs to mind). > > Not necessarily right for them, since each architecture may have > different faulting overheads; but a better chance of being right > than blindly assuming 4k or 64k pages for everyone. > > I'd be glad to see that change go into v3.15: what do you think, > Kirill, are we too late to make such a change now? > Or do you see some objection to it? > >> This may be right. But these are the concerns, will not this make other >> arch to pick default without any tuning > > Wasn't FAULT_AROUND_ORDER 4 chosen solely on the basis of x86 4k pages? > Did other architectures, with other page sizes, back that default? > Clearly not powerpc. Ok. > >> and also this will remove the >> compile time option to disable the feature? > > Compile time option meaning your FAULT_AROUND_ORDER in mm/Kconfig > for v3.16? > > I'm not sure whether Rusty was arguing against that or not I think > we are all three concerned to have a more sensible default than what's > there at present. I don't feel very strongly about your Kconfig Added it as one way to reset or disable the default value. But then I guess we decided on having FAULT_AROUND_ORDER as a variable which is more important than Kconfig option. > option: I've no objection, if it were to default to byte order 16. > Thanks for review With regards Maddy > Hugh > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2014-05-27 10:44 UTC | newest] Thread overview: 26+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-05-08 9:28 [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan 2014-05-08 9:28 ` [PATCH V4 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan 2014-05-08 9:28 ` [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan 2014-05-20 7:28 ` Andrew Morton 2014-05-20 8:03 ` Madhavan Srinivasan 2014-05-15 8:25 ` [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan 2014-05-15 17:28 ` Hugh Dickins 2014-05-19 0:12 ` Rusty Russell 2014-05-19 3:05 ` Madhavan Srinivasan 2014-05-19 23:23 ` Hugh Dickins 2014-05-19 23:43 ` Andrew Morton 2014-05-20 0:44 ` Kirill A. Shutemov 2014-05-20 6:22 ` Rusty Russell 2014-05-20 7:32 ` Andrew Morton 2014-05-20 7:53 ` Madhavan Srinivasan 2014-05-20 10:27 ` Kirill A. Shutemov 2014-05-20 19:59 ` Andrew Morton 2014-05-21 13:40 ` Kirill A. Shutemov 2014-05-21 20:34 ` Andrew Morton 2014-05-23 12:28 ` Kirill A. Shutemov 2014-05-27 6:24 ` Madhavan Srinivasan 2014-05-27 10:21 ` Kirill A. Shutemov 2014-05-27 10:44 ` Madhavan Srinivasan 2014-05-20 1:14 ` Rusty Russell 2014-05-20 2:34 ` Hugh Dickins 2014-05-20 2:06 ` Madhavan Srinivasan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).