From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 31 Jan 2014 14:52:24 -0800
From: Andrew Morton
Subject: Re: [PATCH 1/3] Revert "thp: make MADV_HUGEPAGE check for mm->def_flags"
Message-Id: <20140131145224.7f8efc67d882a2e1a89b0778@linux-foundation.org>
In-Reply-To: <1391192628-113858-3-git-send-email-athorlton@sgi.com>
References: <1391192628-113858-1-git-send-email-athorlton@sgi.com>
	<1391192628-113858-3-git-send-email-athorlton@sgi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Alex Thorlton
Cc: linux-kernel@vger.kernel.org, Oleg Nesterov, Gerald Schaefer,
	Martin Schwidefsky, Heiko Carstens, Christian Borntraeger,
	Paolo Bonzini, "Kirill A. Shutemov", Mel Gorman, Rik van Riel,
	Ingo Molnar, Peter Zijlstra, Sasha Levin, linux390@de.ibm.com,
	linux-s390@vger.kernel.org, linux-mm@kvack.org

On Fri, 31 Jan 2014 12:23:43 -0600 Alex Thorlton wrote:

> This reverts commit 8e72033f2a489b6c98c4e3c7cc281b1afd6cb85cm, and adds

'm' is not a hex digit ;)

> in code to fix up any issues caused by the revert.
>
> The revert is necessary because hugepage_madvise would return -EINVAL
> when VM_NOHUGEPAGE is set, which will break subsequent chunks of this
> patch set.

This is a bit skimpy.  Why doesn't the patch re-break kvm-on-s390?

it would be nice to have a lot more detail here, please.  What was the
intent of 8e72033f2a48, how this patch retains 8e72033f2a48's behavior,
etc.

> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -504,6 +504,9 @@ static int gmap_connect_pgtable(unsigned long address, unsigned long segment,
>  	if (!pmd_present(*pmd) &&
>  	    __pte_alloc(mm, vma, pmd, vmaddr))
>  		return -ENOMEM;
> +	/* large pmds cannot yet be handled */
> +	if (pmd_large(*pmd))
> +		return -EFAULT;

This bit wasn't in 8e72033f2a48.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alex Thorlton
Subject: [PATCHv3 0/3] Add mm flag to control THP
Date: Fri, 31 Jan 2014 12:23:42 -0600
Message-Id: <1391192628-113858-2-git-send-email-athorlton@sgi.com>
In-Reply-To: <1391192628-113858-1-git-send-email-athorlton@sgi.com>
References: <1391192628-113858-1-git-send-email-athorlton@sgi.com>
Sender: owner-linux-mm@kvack.org
To: linux-kernel@vger.kernel.org
Cc: Alex Thorlton, Alexander Viro, Andrew Morton, Christian Borntraeger,
	"Eric W. Biederman", Heiko Carstens, Ingo Molnar, Jiang Liu,
	Kees Cook, "Kirill A. Shutemov", Martin Schwidefsky, Mel Gorman,
	Oleg Nesterov, Paolo Bonzini, Peter Zijlstra, Rik van Riel,
	Robin Holt, linux390@de.ibm.com, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-s390@vger.kernel.org

This patch is based on some of my work combined with some
suggestions/patches given by Oleg Nesterov.  The main goal here is to
add a prctl switch to allow us to disable THP on a per mm_struct
basis.

Changes for v3:

* Pulled in Oleg's idea to use mm->def_flags and the VM_NOHUGEPAGE
  flag, which will get copied down to each vma, instead of adding in a
  whole new MMF_THP_DISABLE flag to mm->flags.  This also creates a
  VM_INIT_DEF_MASK which allows the VM_NOHUGEPAGE flag to get carried
  down from def_flags.
  - Main benefit of implementing the flag this way is that, if a user
    specifically requests THP via madvise, that request can still be
    respected in vmas where necessary; however, for all other vmas we
    can have THP turned off.
  - This also prevents us from having to check for a new flag in
    multiple locations, since the VM_NOHUGEPAGE flag is already
    respected wherever necessary.

* Made some adjustments to the way that the prctl call returns
  information, made sure to return -EINVAL when unnecessary arguments
  are passed for PRCTL_GET/SET_THP_DISABLE.

* Reverted/added some code for s390 arch that was needed to get the
  VM_INIT_DEF_MASK idea working.

The main motivation behind this patch is to provide a way to disable
THP for jobs where the code cannot be modified, and using a malloc hook
with madvise is not an option (i.e. statically allocated data).  This
patch allows us to do just that, without affecting other jobs running on
the system.

We need to do this sort of thing for jobs where THP hurts performance,
due to the possibility of increased remote memory accesses that can be
created by situations such as the following:

When you touch 1 byte of an untouched, contiguous 2MB chunk, a THP will
be handed out, and the THP will be stuck on whatever node the chunk was
originally referenced from.  If many remote nodes need to do work on
that same chunk, they'll be making remote accesses.

With THP disabled, 4K pages can be handed out to separate nodes as
they're needed, greatly reducing the number of remote accesses to
memory.

First with the flag unset:

# perf stat -a ./prctl_wrapper_mmv3 0 ./thp_pthread -C 0 -m 0 -c 512 -b 256g
Setting thp_disabled for this task...
thp_disable: 0
Set thp_disabled state to 0
Process pid = 18027

                                                                                                   PF/
               MAX    MIN                        TOTCPU/      TOT_PF/               TOT_PF/      WSEC/
TYPE:  CPUS   WALL   WALL    SYS   USER  TOTCPU      CPU     WALL_SEC               SYS_SEC        CPU  NODES
        512  1.120  0.060  0.000  0.110   0.110    0.000  28571428864  -9223372036854775808   55803572     23

 Performance counter stats for './prctl_wrapper_mmv3_hack 0 ./thp_pthread -C 0 -m 0 -c 512 -b 256g':

   273719072.841402 task-clock                #  641.026 CPUs utilized           [100.00%]
          1,008,986 context-switches          #    0.000 M/sec                   [100.00%]
              7,717 CPU-migrations            #    0.000 M/sec                   [100.00%]
          1,698,932 page-faults               #    0.000 M/sec
355,222,544,890,379 cycles                    #    1.298 GHz                     [100.00%]
536,445,412,234,588 stalled-cycles-frontend   #  151.02% frontend cycles idle    [100.00%]
409,110,531,310,223 stalled-cycles-backend    #  115.17% backend cycles idle     [100.00%]
148,286,797,266,411 instructions              #    0.42  insns per cycle
                                              #    3.62  stalled cycles per insn [100.00%]
 27,061,793,159,503 branches                  #   98.867 M/sec                   [100.00%]
      1,188,655,196 branch-misses             #    0.00% of all branches

      427.001706337 seconds time elapsed

Now with the flag set:

# perf stat -a ./prctl_wrapper_mmv3 1 ./thp_pthread -C 0 -m 0 -c 512 -b 256g
Setting thp_disabled for this task...
thp_disable: 1
Set thp_disabled state to 1
Process pid = 144957

                                                                                                   PF/
               MAX    MIN                        TOTCPU/      TOT_PF/               TOT_PF/      WSEC/
TYPE:  CPUS   WALL   WALL    SYS   USER  TOTCPU      CPU     WALL_SEC               SYS_SEC        CPU  NODES
        512  0.620  0.260  0.250  0.320   0.570    0.001  51612901376          128000000000  100806448     23

 Performance counter stats for './prctl_wrapper_mmv3_hack 1 ./thp_pthread -C 0 -m 0 -c 512 -b 256g':

   138789390.540183 task-clock                #  641.959 CPUs utilized           [100.00%]
            534,205 context-switches          #    0.000 M/sec                   [100.00%]
              4,595 CPU-migrations            #    0.000 M/sec                   [100.00%]
         63,133,119 page-faults               #    0.000 M/sec
147,977,747,269,768 cycles                    #    1.066 GHz                     [100.00%]
200,524,196,493,108 stalled-cycles-frontend   #  135.51% frontend cycles idle    [100.00%]
105,175,163,716,388 stalled-cycles-backend    #   71.07% backend cycles idle     [100.00%]
180,916,213,503,160 instructions              #    1.22  insns per cycle
                                              #    1.11  stalled cycles per insn [100.00%]
 26,999,511,005,868 branches                  #  194.536 M/sec                   [100.00%]
        714,066,351 branch-misses             #    0.00% of all branches

      216.196778807 seconds time elapsed

As with previous versions of the patch, we're getting about a 2x
performance increase here.  Here's a link to the test case I used, along
with the little wrapper to activate the flag:

http://oss.sgi.com/projects/memtests/thp_pthread_mmprctlv3.tar.gz

Let me know if anybody has any further suggestions here.  Thanks!

Alex Thorlton (3):
  Revert "thp: make MADV_HUGEPAGE check for mm->def_flags"
  Add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE
  exec: kill the unnecessary mm->def_flags setting in load_elf_binary()

Cc: Alexander Viro
Cc: Andrew Morton
Cc: Christian Borntraeger
Cc: "Eric W. Biederman"
Cc: Heiko Carstens
Cc: Ingo Molnar
Cc: Jiang Liu
Cc: Kees Cook
Cc: "Kirill A.
Shutemov"
Cc: Martin Schwidefsky
Cc: Mel Gorman
Cc: Oleg Nesterov
Cc: Paolo Bonzini
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Robin Holt
Cc: linux390@de.ibm.com
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org

 arch/s390/mm/pgtable.c     |  3 +++
 fs/binfmt_elf.c            |  4 ----
 include/linux/mm.h         |  2 ++
 include/uapi/linux/prctl.h |  3 +++
 kernel/fork.c              | 11 ++++++++---
 kernel/sys.c               | 17 +++++++++++++++++
 mm/huge_memory.c           |  4 ----
 7 files changed, 33 insertions(+), 11 deletions(-)

--
1.7.12.4

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alex Thorlton
Subject: [PATCH 1/3] Revert "thp: make MADV_HUGEPAGE check for mm->def_flags"
Date: Fri, 31 Jan 2014 12:23:43 -0600
Message-Id: <1391192628-113858-3-git-send-email-athorlton@sgi.com>
In-Reply-To: <1391192628-113858-1-git-send-email-athorlton@sgi.com>
References: <1391192628-113858-1-git-send-email-athorlton@sgi.com>
Sender: owner-linux-mm@kvack.org
To: linux-kernel@vger.kernel.org
Cc: Alex Thorlton, Oleg Nesterov, Gerald Schaefer, Martin Schwidefsky,
	Heiko Carstens, Christian Borntraeger, Andrew Morton,
	Paolo Bonzini, "Kirill A. Shutemov", Mel Gorman, Rik van Riel,
	Ingo Molnar, Peter Zijlstra, Sasha Levin, linux390@de.ibm.com,
	linux-s390@vger.kernel.org, linux-mm@kvack.org

This reverts commit 8e72033f2a489b6c98c4e3c7cc281b1afd6cb85cm, and adds
in code to fix up any issues caused by the revert.

The revert is necessary because hugepage_madvise would return -EINVAL
when VM_NOHUGEPAGE is set, which will break subsequent chunks of this
patch set.

Signed-off-by: Alex Thorlton
Suggested-by: Oleg Nesterov
Cc: Oleg Nesterov
Cc: Gerald Schaefer
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Christian Borntraeger
Cc: Andrew Morton
Cc: Paolo Bonzini
Cc: "Kirill A. Shutemov"
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Sasha Levin
Cc: linux390@de.ibm.com
Cc: linux-s390@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
---
 arch/s390/mm/pgtable.c | 3 +++
 mm/huge_memory.c       | 4 ----
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 3584ed9..a87cdb4 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -504,6 +504,9 @@ static int gmap_connect_pgtable(unsigned long address, unsigned long segment,
 	if (!pmd_present(*pmd) &&
 	    __pte_alloc(mm, vma, pmd, vmaddr))
 		return -ENOMEM;
+	/* large pmds cannot yet be handled */
+	if (pmd_large(*pmd))
+		return -EFAULT;
 	/* pmd now points to a valid segment table entry. */
 	rmap = kmalloc(sizeof(*rmap), GFP_KERNEL|__GFP_REPEAT);
 	if (!rmap)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 82166bf..a4310a5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1968,8 +1968,6 @@ out:
 int hugepage_madvise(struct vm_area_struct *vma,
 		     unsigned long *vm_flags, int advice)
 {
-	struct mm_struct *mm = vma->vm_mm;
-
 	switch (advice) {
 	case MADV_HUGEPAGE:
 		/*
@@ -1977,8 +1975,6 @@ int hugepage_madvise(struct vm_area_struct *vma,
 		 */
 		if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
 			return -EINVAL;
-		if (mm->def_flags & VM_NOHUGEPAGE)
-			return -EINVAL;
 		*vm_flags &= ~VM_NOHUGEPAGE;
 		*vm_flags |= VM_HUGEPAGE;
 		/*
--
1.7.12.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 31 Jan 2014 12:25:50 -0600
From: Alex Thorlton
Subject: Re: [PATCHv3 0/3] Add mm flag to control THP
Message-ID: <20140131182550.GB21948@sgi.com>
References: <1391192628-113858-1-git-send-email-athorlton@sgi.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1391192628-113858-1-git-send-email-athorlton@sgi.com>
Sender: owner-linux-mm@kvack.org
To: linux-kernel@vger.kernel.org
Cc: Alexander Viro, Andrew Morton, Christian Borntraeger,
	"Eric W. Biederman", Heiko Carstens, Ingo Molnar, Jiang Liu,
	Kees Cook, "Kirill A. Shutemov", Martin Schwidefsky, Mel Gorman,
	Oleg Nesterov, Paolo Bonzini, Peter Zijlstra, Rik van Riel,
	Robin Holt, linux390@de.ibm.com, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-s390@vger.kernel.org

Ugh.  Screwed up the git send-email somehow.  Sorry for the duplicates
in the thread.  I'll get it right one of these days...

- Alex

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 3 Feb 2014 14:53:21 +0100
From: Gerald Schaefer
Subject: Re: [PATCH 1/3] Revert "thp: make MADV_HUGEPAGE check for mm->def_flags"
Message-ID: <20140203145321.7de2bcf1@thinkpad>
In-Reply-To: <20140131145224.7f8efc67d882a2e1a89b0778@linux-foundation.org>
References: <1391192628-113858-1-git-send-email-athorlton@sgi.com>
	<1391192628-113858-3-git-send-email-athorlton@sgi.com>
	<20140131145224.7f8efc67d882a2e1a89b0778@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Andrew Morton
Cc: Alex Thorlton, linux-kernel@vger.kernel.org, Oleg Nesterov,
	Martin Schwidefsky, Heiko Carstens, Christian Borntraeger,
	Paolo Bonzini, "Kirill A. Shutemov", Mel Gorman, Rik van Riel,
	Ingo Molnar, Peter Zijlstra, Sasha Levin, linux390@de.ibm.com,
	linux-s390@vger.kernel.org, linux-mm@kvack.org

On Fri, 31 Jan 2014 14:52:24 -0800
Andrew Morton wrote:

> On Fri, 31 Jan 2014 12:23:43 -0600 Alex Thorlton wrote:
>
> > This reverts commit 8e72033f2a489b6c98c4e3c7cc281b1afd6cb85cm, and adds
>
> 'm' is not a hex digit ;)
>
> > in code to fix up any issues caused by the revert.
> >
> > The revert is necessary because hugepage_madvise would return -EINVAL
> > when VM_NOHUGEPAGE is set, which will break subsequent chunks of this
> > patch set.
>
> This is a bit skimpy.  Why doesn't the patch re-break kvm-on-s390?
>
> it would be nice to have a lot more detail here, please.  What was the
> intent of 8e72033f2a48, how this patch retains 8e72033f2a48's behavior,
> etc.

The intent of 8e72033f2a48 was to guard against any future programming
errors that may result in an madvise(MADV_HUGEPAGE) on guest mappings,
which would crash the kernel.
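
For readers following along, the userspace call in question can be
sketched as below.  This is only an illustration and not code from the
patch: advise_hugepage() is a hypothetical helper name, and the caller
is assumed to own the mapping it passes in.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not part of the patch): ask the kernel to back
 * [addr, addr + len) with transparent huge pages.
 * Returns 0 on success or a negative errno value on failure.
 */
static int advise_hugepage(void *addr, size_t len)
{
#ifdef MADV_HUGEPAGE
	if (madvise(addr, len, MADV_HUGEPAGE) == 0)
		return 0;
	return -errno;
#else
	/* Headers without MADV_HUGEPAGE: report the request as invalid. */
	(void)addr;
	(void)len;
	return -EINVAL;
#endif
}
```

A caller would typically mmap() an anonymous region, hand it to the
helper, and treat a negative return such as -EINVAL as "no THP for this
mapping" rather than as a fatal error.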

Martin suggested adding the bit to arch/s390/mm/pgtable.c if
8e72033f2a48 was to be reverted, because that check will also prevent
a kernel crash in the case described above; it will now send a SIGSEGV
instead.

This would now also allow doing the madvise on other parts, if needed,
so it is a more flexible approach.  One could also say that it would
have been better to do it this way right from the beginning...

> > --- a/arch/s390/mm/pgtable.c
> > +++ b/arch/s390/mm/pgtable.c
> > @@ -504,6 +504,9 @@ static int gmap_connect_pgtable(unsigned long address, unsigned long segment,
> >  	if (!pmd_present(*pmd) &&
> >  	    __pte_alloc(mm, vma, pmd, vmaddr))
> >  		return -ENOMEM;
> > +	/* large pmds cannot yet be handled */
> > +	if (pmd_large(*pmd))
> > +		return -EFAULT;
>
> This bit wasn't in 8e72033f2a48.

Yes, in order to be on the safe side regarding potential distribution
backports, it would be good to have the revert and the "replacement"
in the same patch.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 3 Feb 2014 11:14:12 -0600
From: Alex Thorlton
Subject: Re: [PATCH 1/3] Revert "thp: make MADV_HUGEPAGE check for mm->def_flags"
Message-ID: <20140203171412.GA3034@sgi.com>
References: <1391192628-113858-1-git-send-email-athorlton@sgi.com>
	<1391192628-113858-3-git-send-email-athorlton@sgi.com>
	<20140131145224.7f8efc67d882a2e1a89b0778@linux-foundation.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140131145224.7f8efc67d882a2e1a89b0778@linux-foundation.org>
Sender: owner-linux-mm@kvack.org
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, Oleg Nesterov, Gerald Schaefer,
	Martin Schwidefsky, Heiko Carstens, Christian Borntraeger,
	Paolo Bonzini, "Kirill A. Shutemov", Mel Gorman, Rik van Riel,
	Ingo Molnar, Peter Zijlstra, Sasha Levin, linux390@de.ibm.com,
	linux-s390@vger.kernel.org, linux-mm@kvack.org

On Fri, Jan 31, 2014 at 02:52:24PM -0800, Andrew Morton wrote:
> On Fri, 31 Jan 2014 12:23:43 -0600 Alex Thorlton wrote:
>
> > This reverts commit 8e72033f2a489b6c98c4e3c7cc281b1afd6cb85cm, and adds
>
> 'm' is not a hex digit ;)

My mistake!  Sorry about that.

> > in code to fix up any issues caused by the revert.
> >
> > The revert is necessary because hugepage_madvise would return -EINVAL
> > when VM_NOHUGEPAGE is set, which will break subsequent chunks of this
> > patch set.
>
> This is a bit skimpy.  Why doesn't the patch re-break kvm-on-s390?
>
> it would be nice to have a lot more detail here, please.  What was the
> intent of 8e72033f2a48, how this patch retains 8e72033f2a48's behavior,
> etc.

I'm actually not too sure about this, offhand.  I just know that we
couldn't have it in there because of the check for VM_NOHUGEPAGE.  The
s390 guys approved the revert, as long as we added in the following
piece:

> > --- a/arch/s390/mm/pgtable.c
> > +++ b/arch/s390/mm/pgtable.c
> > @@ -504,6 +504,9 @@ static int gmap_connect_pgtable(unsigned long address, unsigned long segment,
> >  	if (!pmd_present(*pmd) &&
> >  	    __pte_alloc(mm, vma, pmd, vmaddr))
> >  		return -ENOMEM;
> > +	/* large pmds cannot yet be handled */
> > +	if (pmd_large(*pmd))
> > +		return -EFAULT;
>
> This bit wasn't in 8e72033f2a48.

I added the fix-up code in with the revert, so that it would all be in
one place; I wasn't sure what the standard was for this sort of thing.
If it's preferable to see this code in a separate patch, that's easy
enough to do.

I'll look into exactly what the original commit was intended to do, and
get a better description of what's going on here.  Let me know if I
should split the two changes into separate patches.

- Alex
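
As a closing illustration of the prctl switch this series proposes, here
is a minimal userspace sketch.  It rests on assumptions the v3 posting
does not confirm: the thread calls the interface
PRCTL_GET/SET_THP_DISABLE, while the sketch uses the names
PR_SET_THP_DISABLE and PR_GET_THP_DISABLE (values 41 and 42) as found in
mainline's <linux/prctl.h>.  The helper names set_thp_disable() and
get_thp_disable() are hypothetical.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; assumed to match mainline's values. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif
#ifndef PR_GET_THP_DISABLE
#define PR_GET_THP_DISABLE 42
#endif

/*
 * Hypothetical helper: turn the per-mm "THP disabled" switch on
 * (disable != 0) or off (disable == 0) for the calling process.
 * Unused prctl arguments are passed as zero; the cover letter says
 * unnecessary arguments are rejected with -EINVAL.
 * Returns 0 on success or a negative errno value.
 */
static int set_thp_disable(int disable)
{
	if (prctl(PR_SET_THP_DISABLE, disable ? 1UL : 0UL, 0UL, 0UL, 0UL) == 0)
		return 0;
	return -errno;
}

/*
 * Hypothetical helper: query the current setting.
 * Returns the kernel's non-negative answer, or a negative errno value
 * if the running kernel does not support the call.
 */
static int get_thp_disable(void)
{
	int ret = prctl(PR_GET_THP_DISABLE, 0UL, 0UL, 0UL, 0UL);

	return ret >= 0 ? ret : -errno;
}
```

A wrapper in the spirit of prctl_wrapper_mmv3 would presumably call
set_thp_disable(1) and then execvp() the real job, on the assumption
(implied by the wrapper-based test in the cover letter) that the
setting carries over to the executed program.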