* [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages [not found] <1060163918.101411.1293793346203.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> @ 2010-12-31 11:08 ` CAI Qian 2011-01-04 9:56 ` Michal Hocko 0 siblings, 1 reply; 18+ messages in thread From: CAI Qian @ 2010-12-31 11:08 UTC (permalink / raw) To: linux-mm Hi, Problem: nr_overcommit_hugepages for 1gb hugepage went crazy. Symptom: 1) setup 1gb hugepages. # cat /proc/cmdline default_hugepagesz=1g hugepagesz=1g hugepages=1 # cat /proc/meminfo ... HugePages_Total: 1 HugePages_Free: 1 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB ... 2) set nr_overcommit_hugepages # echo 1 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 1 3) overcommit 2gb hugepages. mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 ENOMEM (Cannot allocate memory) # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 18446744071589420672 As you can see from the above, it did not allow overcommit despite nr_overcommit_hugepages value. Also, nr_overcommit_hugepages was overwritten with such a strange value after overcommit failure. Should we just remove this file from sysfs for simplicity? Thanks. CAI Qian -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2010-12-31 11:08 ` [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages CAI Qian @ 2011-01-04 9:56 ` Michal Hocko 2011-01-04 10:21 ` CAI Qian 0 siblings, 1 reply; 18+ messages in thread From: Michal Hocko @ 2011-01-04 9:56 UTC (permalink / raw) To: CAI Qian; +Cc: linux-mm On Fri 31-12-10 06:08:21, CAI Qian wrote: > Hi, Hi, > > Problem: nr_overcommit_hugepages for 1gb hugepage went crazy. > > Symptom: > 1) setup 1gb hugepages. > # cat /proc/cmdline > default_hugepagesz=1g hugepagesz=1g hugepages=1 > # cat /proc/meminfo > ... > HugePages_Total: 1 > HugePages_Free: 1 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > Hugepagesize: 1048576 kB > ... > > 2) set nr_overcommit_hugepages > # echo 1 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages > # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages > 1 > > 3) overcommit 2gb hugepages. > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 ENOMEM (Cannot allocate memory) Hmm, you are trying to reserve/mmap a lot of memory (17179869182 1GB huge pages). > # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages > 18446744071589420672 > > As you can see from the above, it did not allow overcommit despite nr_overcommit_hugepages value. You are trying to allocate much more than your overcommit allows. > Also, nr_overcommit_hugepages was overwritten with such a strange > value after overcommit failure. Should we just remove this file from > sysfs for simplicity? This is strange. The value is set only in hugetlb_overcommit_handler which is a sysctl handler. Are you sure that you are not changing the value by the /sys interface somewhere (there is no check for the value so you can set what-ever value you like)? I fail to see any mmap code path which would change this value. Btw. which kernel version are you using. > > Thanks. > > CAI Qian Regards -- Michal Hocko L3 team SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-04 9:56 ` Michal Hocko @ 2011-01-04 10:21 ` CAI Qian 2011-01-04 10:52 ` Michal Hocko 2011-01-04 17:21 ` [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages Eric B Munson 0 siblings, 2 replies; 18+ messages in thread From: CAI Qian @ 2011-01-04 10:21 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm > > 3) overcommit 2gb hugepages. > > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED, > > 3, 0) = -1 ENOMEM (Cannot allocate memory) > > Hmm, you are trying to reserve/mmap a lot of memory (17179869182 1GB > huge pages). That is strange - the test code merely did this, addr = mmap(ADDR, 2<<30, PROTECTION, FLAGS, fd, 0); Do you know if overcommit was designed for 1GB pages? At least, read this from Documentation/kernel-parameters.txt, hugepagesz= ... Note that 1GB pages can only be allocated at boot time using hugepages= and not freed afterwards. How does it allow to be overcommitted for only being able to allocate at boot time? > > Also, nr_overcommit_hugepages was overwritten with such a strange > > value after overcommit failure. Should we just remove this file from > > sysfs for simplicity? > > This is strange. The value is set only in hugetlb_overcommit_handler > which is a sysctl handler. > > Are you sure that you are not changing the value by the /sys interface > somewhere (there is no check for the value so you can set what-ever > value you like)? I fail to see any mmap code path which would change > this value. I could double-check here, but it is not important if the fact is that overcommit is not supported for 1GB pages. > Btw. which kernel version are you using. mmotm 2010-12-02-16-34 version 2.6.37-rc4-mm1+. This problem is also present in 2.6.18. Thanks. CAI Qian -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-04 10:21 ` CAI Qian @ 2011-01-04 10:52 ` Michal Hocko 2011-01-05 4:52 ` CAI Qian 2011-01-04 17:21 ` [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages Eric B Munson 1 sibling, 1 reply; 18+ messages in thread From: Michal Hocko @ 2011-01-04 10:52 UTC (permalink / raw) To: CAI Qian; +Cc: linux-mm On Tue 04-01-11 05:21:46, CAI Qian wrote: > > > > 3) overcommit 2gb hugepages. > > > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED, > > > 3, 0) = -1 ENOMEM (Cannot allocate memory) > > > > Hmm, you are trying to reserve/mmap a lot of memory (17179869182 1GB > > huge pages). > That is strange - the test code merely did this, > addr = mmap(ADDR, 2<<30, PROTECTION, FLAGS, fd, 0); Didn't you want 1<<30 instead? > > Do you know if overcommit was designed for 1GB pages? At least, read this > from Documentation/kernel-parameters.txt, > > hugepagesz= > ... > Note that 1GB pages can only be allocated at boot time > using hugepages= and not freed afterwards. > > How does it allow to be overcommitted for only being able to allocate at > boot time? Sorry, I am not very much familiar with 1GB pages but the hugetlb code is not page size specific AFAICS so if there are no other background things than it should just work. > > > > Also, nr_overcommit_hugepages was overwritten with such a strange > > > value after overcommit failure. Should we just remove this file from > > > sysfs for simplicity? > > > > This is strange. The value is set only in hugetlb_overcommit_handler > > which is a sysctl handler. > > > > Are you sure that you are not changing the value by the /sys interface > > somewhere (there is no check for the value so you can set what-ever > > value you like)? I fail to see any mmap code path which would change > > this value. > I could double-check here, but it is not important if the fact is that > overcommit is not supported for 1GB pages. What is the complete test case? > > > Btw. which kernel version are you using. > mmotm 2010-12-02-16-34 version 2.6.37-rc4-mm1+. This problem is also present > in 2.6.18. > > Thanks. > > CAI Qian > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Michal Hocko L3 team SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-04 10:52 ` Michal Hocko @ 2011-01-05 4:52 ` CAI Qian 2011-01-05 8:43 ` Michal Hocko 0 siblings, 1 reply; 18+ messages in thread From: CAI Qian @ 2011-01-05 4:52 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm ----- Original Message ----- > On Tue 04-01-11 05:21:46, CAI Qian wrote: > > > > > > 3) overcommit 2gb hugepages. > > > > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, > > > > MAP_SHARED, > > > > 3, 0) = -1 ENOMEM (Cannot allocate memory) > > > > > > Hmm, you are trying to reserve/mmap a lot of memory (17179869182 > > > 1GB > > > huge pages). > > That is strange - the test code merely did this, > > addr = mmap(ADDR, 2<<30, PROTECTION, FLAGS, fd, 0); > > Didn't you want 1<<30 instead? No, it was expecting to use both the allocate + overcommited 1GB pages. > > > Are you sure that you are not changing the value by the /sys > > > interface > > > somewhere (there is no check for the value so you can set > > > what-ever > > > value you like)? I fail to see any mmap code path which would > > > change > > > this value. > > I could double-check here, but it is not important if the fact is > > that > > overcommit is not supported for 1GB pages. > > What is the complete test case? Here is the reproducer I was using. The trick to reproduce this is to run at the end. echo "" >/proc/sys/vm/nr_overcommit_hugepages CAI Qian --------------------- #include <sys/mman.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/mount.h> #include <sys/shm.h> #include <sys/ipc.h> #include <unistd.h> #include <errno.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <string.h> #include <ctype.h> #define PROTECTION (PROT_READ | PROT_WRITE) #define ADDR (void *)(0x0UL) #define FLAGS (MAP_SHARED) static void setup(void); static void cleanup(void); static void overcommit(void); int main(int argc, char *argv[]) { setup(); overcommit(); cleanup(); return 0; } static void overcommit(void) { void *addr = NULL; int fd = -1; char s[BUFSIZ]; snprintf(s, BUFSIZ, "/mnt/hugemmap05/file"); fd = open(s, O_CREAT | O_RDWR, 0755); if (fd == -1) perror("open"); addr = mmap(ADDR, 2UL<<30, PROTECTION, FLAGS, fd, 0); if (addr == MAP_FAILED) { perror("mmap"); cleanup(); } close(fd); unlink(s); } static void cleanup(void) { system("echo "" >/proc/sys/vm/nr_overcommit_hugepages"); system("umount /mnt/hugemmap05"); exit(1); } static void setup(void) { system("echo 1 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages"); system("mkdir /mnt/hugemmap05"); system("mount none -t hugetlbfs /mnt/hugemmap05"); } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-05 4:52 ` CAI Qian @ 2011-01-05 8:43 ` Michal Hocko 2011-01-05 8:54 ` CAI Qian ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Michal Hocko @ 2011-01-05 8:43 UTC (permalink / raw) To: CAI Qian; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan [Let's CC Andrew to pick up the patch - please see bellow] On Tue 04-01-11 23:52:42, CAI Qian wrote: > ----- Original Message ----- > > On Tue 04-01-11 05:21:46, CAI Qian wrote: > > > > > > > > 3) overcommit 2gb hugepages. > > > > > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, > > > > > MAP_SHARED, > > > > > 3, 0) = -1 ENOMEM (Cannot allocate memory) > > > > > > > > Hmm, you are trying to reserve/mmap a lot of memory (17179869182 > > > > 1GB > > > > huge pages). > > > That is strange - the test code merely did this, > > > addr = mmap(ADDR, 2<<30, PROTECTION, FLAGS, fd, 0); > > > > Didn't you want 1<<30 instead? > No, it was expecting to use both the allocate + overcommited 1GB pages. Then you propably wanted 2*1UL<<30 rather than 2<<30 which is something different than you want, I guess. Anyway this is not related to the bogus value in nr_overcommit_hugepages after your testcase. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-05 8:43 ` Michal Hocko @ 2011-01-05 8:54 ` CAI Qian 2011-01-05 9:51 ` Michal Hocko 2011-01-05 15:36 ` CAI Qian 2011-01-05 20:59 ` Andrew Morton 2 siblings, 1 reply; 18+ messages in thread From: CAI Qian @ 2011-01-05 8:54 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan > OK, this explains the bogus value because hugetlb_overcommit_handler > doesn't check the return value of proc_doulongvec_minmax which fails > for > "\n" string which you are writing to the file so we end up setting a > random value from the stack. The following patch should fix this: > > Btw. what did you want to achieve by this command? Just to do some testing for robustness. :) CAI Qian -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-05 8:54 ` CAI Qian @ 2011-01-05 9:51 ` Michal Hocko 0 siblings, 0 replies; 18+ messages in thread From: Michal Hocko @ 2011-01-05 9:51 UTC (permalink / raw) To: CAI Qian; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan On Wed 05-01-11 03:54:16, CAI Qian wrote: > > > OK, this explains the bogus value because hugetlb_overcommit_handler > > doesn't check the return value of proc_doulongvec_minmax which fails > > for > > "\n" string which you are writing to the file so we end up setting a > > random value from the stack. The following patch should fix this: > > > > Btw. what did you want to achieve by this command? > Just to do some testing for robustness. :) OK, you hit the nail ;) I have just noticed I forgot to add your Reported-by: tag and I guess this is also a stable material. I will repost the patch once Andrew says he is going to take it. Thanks > > CAI Qian > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Michal Hocko L3 team SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-05 8:43 ` Michal Hocko 2011-01-05 8:54 ` CAI Qian @ 2011-01-05 15:36 ` CAI Qian 2011-01-05 15:59 ` Michal Hocko 2011-01-05 20:59 ` Andrew Morton 2 siblings, 1 reply; 18+ messages in thread From: CAI Qian @ 2011-01-05 15:36 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan I have a new patch to address boarder issues. --- ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-05 15:36 ` CAI Qian @ 2011-01-05 15:59 ` Michal Hocko 2011-01-05 16:42 ` CAI Qian 0 siblings, 1 reply; 18+ messages in thread From: Michal Hocko @ 2011-01-05 15:59 UTC (permalink / raw) To: CAI Qian; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan On Wed 05-01-11 10:36:47, CAI Qian wrote: > From f90b54f9f5607128e375bd78d21e751c433b3cf6 Mon Sep 17 00:00:00 2001 > From: CAI Qian <caiqian@redhat.com> > Date: Wed, 5 Jan 2011 23:26:57 +0800 > Subject: [PATCH] hugetlbfs: check invalid nr_hugepages and nr_overcommit_hugepages > > First, nr_*hugepages* in procfs and sysfs do not check for invalid > input like "". Second, when using oversize pages, nr_*hugepages* are > expected to be allocated during boot time. Therefore, return -EINVAL > for those cases. I think that the two things should be split into two patches - one for the proper input data handling and the other one for the size check. Albeit, I am not sure about the size check because this is a thing that is just a current implementation limitation and can be change later. > > Signed-off-by: CAI Qian <caiqian@redhat.com> > --- > fs/sysfs/file.c | 2 ++ > mm/hugetlb.c | 18 ++++++++++++++++-- > 2 files changed, 18 insertions(+), 2 deletions(-) > > diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c > index da3fefe..9f4ea67 100644 > --- a/fs/sysfs/file.c > +++ b/fs/sysfs/file.c > @@ -207,6 +207,8 @@ flush_write_buffer(struct dentry * dentry, struct sysfs_buffer * buffer, size_t > return -ENODEV; > > rc = ops->store(kobj, attr_sd->s_attr.attr, buffer->page, count); > + if (!rc) > + return -EINVAL; This doesn't look correct, you would imbalance sysfs_{get,put}_active. [...] -- Michal Hocko L3 team SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-05 15:59 ` Michal Hocko @ 2011-01-05 16:42 ` CAI Qian 2011-01-05 16:44 ` Michal Hocko 0 siblings, 1 reply; 18+ messages in thread From: CAI Qian @ 2011-01-05 16:42 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan ----- Original Message ----- > On Wed 05-01-11 10:36:47, CAI Qian wrote: > > From f90b54f9f5607128e375bd78d21e751c433b3cf6 Mon Sep 17 00:00:00 > > 2001 > > From: CAI Qian <caiqian@redhat.com> > > Date: Wed, 5 Jan 2011 23:26:57 +0800 > > Subject: [PATCH] hugetlbfs: check invalid nr_hugepages and > > nr_overcommit_hugepages > > > > First, nr_*hugepages* in procfs and sysfs do not check for invalid > > input like "". Second, when using oversize pages, nr_*hugepages* are > > expected to be allocated during boot time. Therefore, return -EINVAL > > for those cases. > > I think that the two things should be split into two patches - one for > the proper input data handling and the other one for the size check. > Albeit, I am not sure about the size check because this is a thing > that is just a current implementation limitation and can be change later. OK, will do two patches. > > Signed-off-by: CAI Qian <caiqian@redhat.com> > > --- > > fs/sysfs/file.c | 2 ++ > > mm/hugetlb.c | 18 ++++++++++++++++-- > > 2 files changed, 18 insertions(+), 2 deletions(-) > > > > diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c > > index da3fefe..9f4ea67 100644 > > --- a/fs/sysfs/file.c > > +++ b/fs/sysfs/file.c > > @@ -207,6 +207,8 @@ flush_write_buffer(struct dentry * dentry, > > struct sysfs_buffer * buffer, size_t > > return -ENODEV; > > > > rc = ops->store(kobj, attr_sd->s_attr.attr, buffer->page, count); > > + if (!rc) > > + return -EINVAL; > > This doesn't look correct, you would imbalance sysfs_{get,put}_active. Right, I'll fix this up. Thanks. CAI Qian -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-05 16:42 ` CAI Qian @ 2011-01-05 16:44 ` Michal Hocko 2011-01-05 17:00 ` CAI Qian 0 siblings, 1 reply; 18+ messages in thread From: Michal Hocko @ 2011-01-05 16:44 UTC (permalink / raw) To: CAI Qian; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan On Wed 05-01-11 11:42:01, CAI Qian wrote: > > > ----- Original Message ----- > > On Wed 05-01-11 10:36:47, CAI Qian wrote: > > > From f90b54f9f5607128e375bd78d21e751c433b3cf6 Mon Sep 17 00:00:00 > > > 2001 > > > From: CAI Qian <caiqian@redhat.com> > > > Date: Wed, 5 Jan 2011 23:26:57 +0800 > > > Subject: [PATCH] hugetlbfs: check invalid nr_hugepages and > > > nr_overcommit_hugepages > > > > > > First, nr_*hugepages* in procfs and sysfs do not check for invalid > > > input like "". Second, when using oversize pages, nr_*hugepages* are > > > expected to be allocated during boot time. Therefore, return -EINVAL > > > for those cases. > > > > I think that the two things should be split into two patches - one for > > the proper input data handling and the other one for the size check. > > Albeit, I am not sure about the size check because this is a thing > > that is just a current implementation limitation and can be change later. > OK, will do two patches. Then you can use the patch I sent earlier and make just a size check patch on top of it, right? > > > > Signed-off-by: CAI Qian <caiqian@redhat.com> > > > --- > > > fs/sysfs/file.c | 2 ++ > > > mm/hugetlb.c | 18 ++++++++++++++++-- > > > 2 files changed, 18 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c > > > index da3fefe..9f4ea67 100644 > > > --- a/fs/sysfs/file.c > > > +++ b/fs/sysfs/file.c > > > @@ -207,6 +207,8 @@ flush_write_buffer(struct dentry * dentry, > > > struct sysfs_buffer * buffer, size_t > > > return -ENODEV; > > > > > > rc = ops->store(kobj, attr_sd->s_attr.attr, buffer->page, count); > > > + if (!rc) > > > + return -EINVAL; > > > > This doesn't look correct, you would imbalance sysfs_{get,put}_active. > Right, I'll fix this up. > > Thanks. > > CAI Qian -- Michal Hocko L3 team SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-05 16:44 ` Michal Hocko @ 2011-01-05 17:00 ` CAI Qian 0 siblings, 0 replies; 18+ messages in thread From: CAI Qian @ 2011-01-05 17:00 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan > Then you can use the patch I sent earlier and make just a size check > patch on top of it, right? There is a sysfs path needs to be checked for invalid input too. Eric said he had patchs ready, so I can just let him to post it. CAI Qian -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-05 8:43 ` Michal Hocko 2011-01-05 8:54 ` CAI Qian 2011-01-05 15:36 ` CAI Qian @ 2011-01-05 20:59 ` Andrew Morton 2011-01-06 10:04 ` PATCH: hugetlb: handle NODEMASK_ALLOC failure correctly Michal Hocko 2 siblings, 1 reply; 18+ messages in thread From: Andrew Morton @ 2011-01-05 20:59 UTC (permalink / raw) To: Michal Hocko; +Cc: CAI Qian, linux-mm, Nishanth Aravamudan On Wed, 5 Jan 2011 09:43:57 +0100 Michal Hocko <mhocko@suse.cz> wrote: > ... > > proc_doulongvec_minmax may fail if the given buffer doesn't represent > a valid number. If we provide something invalid we will initialize the > resulting value (nr_overcommit_huge_pages in this case) to a random > value from the stack. > > The issue was introduced by a3d0c6aa when the default handler has been > replaced by the helper function where we do not check the return value. > > Reproducer: > echo "" > /proc/sys/vm/nr_overcommit_hugepages > > ... > > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1928,7 +1928,8 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy, > > table->data = &tmp; > table->maxlen = sizeof(unsigned long); > - proc_doulongvec_minmax(table, write, buffer, length, ppos); > + if (proc_doulongvec_minmax(table, write, buffer, length, ppos)) > + return -EINVAL; proc_doulongvec_minmax() can return -EFAULT or -ENOMEM. It is incorrect to unconditionally convert those into -EINVAL. > if (write) { > NODEMASK_ALLOC(nodemask_t, nodes_allowed, hm, the code doesn't check that NODEMASK_ALLOC succeeded. That NODEMASK_ALLOC conversion was quite sloppy. --- a/mm/hugetlb.c~hugetlb-check-the-return-value-of-string-conversion-in-sysctl-handler-fix +++ a/mm/hugetlb.c @@ -1859,14 +1859,16 @@ static int hugetlb_sysctl_handler_common { struct hstate *h = &default_hstate; unsigned long tmp; + int ret; if (!write) tmp = h->max_huge_pages; table->data = &tmp; table->maxlen = sizeof(unsigned long); - if (proc_doulongvec_minmax(table, write, buffer, length, ppos)) - return -EINVAL; + ret = proc_doulongvec_minmax(table, write, buffer, length, ppos); + if (ret) + goto out; if (write) { NODEMASK_ALLOC(nodemask_t, nodes_allowed, @@ -1881,8 +1883,8 @@ static int hugetlb_sysctl_handler_common if (nodes_allowed != &node_states[N_HIGH_MEMORY]) NODEMASK_FREE(nodes_allowed); } - - return 0; +out: + return ret; } int hugetlb_sysctl_handler(struct ctl_table *table, int write, @@ -1920,22 +1922,24 @@ int hugetlb_overcommit_handler(struct ct { struct hstate *h = &default_hstate; unsigned long tmp; + int ret; if (!write) tmp = h->nr_overcommit_huge_pages; table->data = &tmp; table->maxlen = sizeof(unsigned long); - if (proc_doulongvec_minmax(table, write, buffer, length, ppos)) - return -EINVAL; + ret = proc_doulongvec_minmax(table, write, buffer, length, ppos); + if (ret) + goto out; if (write) { spin_lock(&hugetlb_lock); h->nr_overcommit_huge_pages = tmp; spin_unlock(&hugetlb_lock); } - - return 0; +out: + return ret; } #endif /* CONFIG_SYSCTL */ _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* PATCH: hugetlb: handle NODEMASK_ALLOC failure correctly 2011-01-05 20:59 ` Andrew Morton @ 2011-01-06 10:04 ` Michal Hocko 2011-01-06 20:38 ` Andrew Morton 0 siblings, 1 reply; 18+ messages in thread From: Michal Hocko @ 2011-01-06 10:04 UTC (permalink / raw) To: Andrew Morton; +Cc: CAI Qian, linux-mm, Nishanth Aravamudan On Wed 05-01-11 12:59:59, Andrew Morton wrote: > On Wed, 5 Jan 2011 09:43:57 +0100 > Michal Hocko <mhocko@suse.cz> wrote: [...] > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -1928,7 +1928,8 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy, > > > > table->data = &tmp; > > table->maxlen = sizeof(unsigned long); > > - proc_doulongvec_minmax(table, write, buffer, length, ppos); > > + if (proc_doulongvec_minmax(table, write, buffer, length, ppos)) > > + return -EINVAL; > > proc_doulongvec_minmax() can return -EFAULT or -ENOMEM. It is > incorrect to unconditionally convert those into -EINVAL. You are right, I have missed that. Thanks for fixing that up > > > if (write) { > > NODEMASK_ALLOC(nodemask_t, nodes_allowed, > > hm, the code doesn't check that NODEMASK_ALLOC succeeded. That > NODEMASK_ALLOC conversion was quite sloppy. What do you think about the patch bellow? I have based it on top of you mm patches (I was CCed): hugetlb-check-the-return-value-of-string-conversion-in-sysctl-handler.patch hugetlb-check-the-return-value-of-string-conversion-in-sysctl-handler-fix.patch hugetlb-do-not-allow-pagesize-=-max_order-pool-adjustment.patch hugetlb-do-not-allow-pagesize-=-max_order-pool-adjustment-fix.patch hugetlb-fix-handling-of-parse-errors-in-sysfs.patch Some of them didn't apply cleanly so I had to tweak them a bit so maybe I am missing some other patches. --- ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PATCH: hugetlb: handle NODEMASK_ALLOC failure correctly 2011-01-06 10:04 ` PATCH: hugetlb: handle NODEMASK_ALLOC failure correctly Michal Hocko @ 2011-01-06 20:38 ` Andrew Morton 2011-01-06 22:23 ` Michal Hocko 0 siblings, 1 reply; 18+ messages in thread From: Andrew Morton @ 2011-01-06 20:38 UTC (permalink / raw) To: Michal Hocko; +Cc: CAI Qian, linux-mm, Nishanth Aravamudan On Thu, 6 Jan 2011 11:04:39 +0100 Michal Hocko <mhocko@suse.cz> wrote: > NODEMASK_ALLOC can use kmalloc if nodemask_t > 256 bytes so it might > fail with NULL as a result. Let's check the resulting variable and > fail with -ENOMEM if NODEMASK_ALLOC failed. > > Signed-off-by: Michal Hocko <mhocko@suse.cz> > --- > mm/hugetlb.c | 18 +++++++++++++++--- > 1 files changed, 15 insertions(+), 3 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 4c0606c..21f31b2 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1439,14 +1439,19 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy, > struct hstate *h; > NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY); > > + if (!nodes_allowed) { > + err = -ENOMEM; > + goto out; > + } Looks good to me. I was going to complain that it adds extra unneeded instructions in the case where the nodemasks are allocated on the stack. But it appears that gcc assumes that stack-based variables cannot have address zero, so if gcc sees this: { nodemask_t foo; if (!&foo) { stuff } } if just removes it all for us. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: PATCH: hugetlb: handle NODEMASK_ALLOC failure correctly 2011-01-06 20:38 ` Andrew Morton @ 2011-01-06 22:23 ` Michal Hocko 0 siblings, 0 replies; 18+ messages in thread From: Michal Hocko @ 2011-01-06 22:23 UTC (permalink / raw) To: Andrew Morton; +Cc: CAI Qian, linux-mm, Nishanth Aravamudan On Thu 06-01-11 12:38:58, Andrew Morton wrote: > On Thu, 6 Jan 2011 11:04:39 +0100 > Michal Hocko <mhocko@suse.cz> wrote: > > > NODEMASK_ALLOC can use kmalloc if nodemask_t > 256 bytes so it might > > fail with NULL as a result. Let's check the resulting variable and > > fail with -ENOMEM if NODEMASK_ALLOC failed. > > > > Signed-off-by: Michal Hocko <mhocko@suse.cz> > > --- > > mm/hugetlb.c | 18 +++++++++++++++--- > > 1 files changed, 15 insertions(+), 3 deletions(-) > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index 4c0606c..21f31b2 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -1439,14 +1439,19 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy, > > struct hstate *h; > > NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY); > > > > + if (!nodes_allowed) { > > + err = -ENOMEM; > > + goto out; > > + } > > Looks good to me. I was going to complain that it adds extra unneeded > instructions in the case where the nodemasks are allocated on the > stack. But it appears that gcc assumes that stack-based variables > cannot have address zero, so if gcc sees this: > > { > nodemask_t foo; > > if (!&foo) { > stuff > } > } > > if just removes it all for us. Yes, I cannot find it anywhere (maybe just a bad searching pattern) but this seems to be the case for -O1, -O2 and -Os. Anyway it makes a perfect sense to me. Thanks for the review. > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Michal Hocko L3 team SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages 2011-01-04 10:21 ` CAI Qian 2011-01-04 10:52 ` Michal Hocko @ 2011-01-04 17:21 ` Eric B Munson 1 sibling, 0 replies; 18+ messages in thread From: Eric B Munson @ 2011-01-04 17:21 UTC (permalink / raw) To: CAI Qian; +Cc: Michal Hocko, linux-mm [-- Attachment #1: Type: text/plain, Size: 2417 bytes --] On Tue, 04 Jan 2011, CAI Qian wrote: > > > > 3) overcommit 2gb hugepages. > > > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED, > > > 3, 0) = -1 ENOMEM (Cannot allocate memory) > > > > Hmm, you are trying to reserve/mmap a lot of memory (17179869182 1GB > > huge pages). > That is strange - the test code merely did this, > addr = mmap(ADDR, 2<<30, PROTECTION, FLAGS, fd, 0); > > Do you know if overcommit was designed for 1GB pages? At least, read this > from Documentation/kernel-parameters.txt, > > hugepagesz= > ... > Note that 1GB pages can only be allocated at boot time > using hugepages= and not freed afterwards. > > How does it allow to be overcommitted for only being able to allocate at > boot time? It does not, huge page sizes that have to be allocated at boot can not be overcommitted as the pool size cannot change after boot. > > > > Also, nr_overcommit_hugepages was overwritten with such a strange > > > value after overcommit failure. Should we just remove this file from > > > sysfs for simplicity? I don't think having pagesize+arch specific logic here is going to scale (we would need to check for 16GB pages on ppc64 as well because they have the same restrictions as 1GB pages on x86_64) but 1GB pages might be fine to overcommit on ia64. Perhaps the documentation needs to change to call this out specifically. > > > > This is strange. The value is set only in hugetlb_overcommit_handler > > which is a sysctl handler. > > > > Are you sure that you are not changing the value by the /sys interface > > somewhere (there is no check for the value so you can set what-ever > > value you like)? I fail to see any mmap code path which would change > > this value. > I could double-check here, but it is not important if the fact is that > overcommit is not supported for 1GB pages. > > > Btw. which kernel version are you using. > mmotm 2010-12-02-16-34 version 2.6.37-rc4-mm1+. This problem is also present > in 2.6.18. > > Thanks. > > CAI Qian > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 490 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2011-01-06 22:23 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <1060163918.101411.1293793346203.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> 2010-12-31 11:08 ` [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages CAI Qian 2011-01-04 9:56 ` Michal Hocko 2011-01-04 10:21 ` CAI Qian 2011-01-04 10:52 ` Michal Hocko 2011-01-05 4:52 ` CAI Qian 2011-01-05 8:43 ` Michal Hocko 2011-01-05 8:54 ` CAI Qian 2011-01-05 9:51 ` Michal Hocko 2011-01-05 15:36 ` CAI Qian 2011-01-05 15:59 ` Michal Hocko 2011-01-05 16:42 ` CAI Qian 2011-01-05 16:44 ` Michal Hocko 2011-01-05 17:00 ` CAI Qian 2011-01-05 20:59 ` Andrew Morton 2011-01-06 10:04 ` PATCH: hugetlb: handle NODEMASK_ALLOC failure correctly Michal Hocko 2011-01-06 20:38 ` Andrew Morton 2011-01-06 22:23 ` Michal Hocko 2011-01-04 17:21 ` [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages Eric B Munson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).