linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
       [not found] <1060163918.101411.1293793346203.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
@ 2010-12-31 11:08 ` CAI Qian
  2011-01-04  9:56   ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: CAI Qian @ 2010-12-31 11:08 UTC (permalink / raw)
  To: linux-mm

Hi,

Problem: nr_overcommit_hugepages for 1gb hugepage went crazy.

Symptom:
1) setup 1gb hugepages.
# cat /proc/cmdline
default_hugepagesz=1g hugepagesz=1g hugepages=1
# cat /proc/meminfo
...
HugePages_Total:       1
HugePages_Free:        1
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
...

2) set nr_overcommit_hugepages
# echo 1 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
1

3) overcommit 2gb hugepages.
mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 ENOMEM (Cannot allocate memory)
# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
18446744071589420672

As you can see from the above, it did not allow overcommit despite nr_overcommit_hugepages value. Also, nr_overcommit_hugepages was overwritten with such a strange value after overcommit failure. Should we just remove this file from sysfs for simplicity?

Thanks.

CAI Qian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2010-12-31 11:08 ` [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages CAI Qian
@ 2011-01-04  9:56   ` Michal Hocko
  2011-01-04 10:21     ` CAI Qian
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2011-01-04  9:56 UTC (permalink / raw)
  To: CAI Qian; +Cc: linux-mm

On Fri 31-12-10 06:08:21, CAI Qian wrote:
> Hi,

Hi,

> 
> Problem: nr_overcommit_hugepages for 1gb hugepage went crazy.
> 
> Symptom:
> 1) setup 1gb hugepages.
> # cat /proc/cmdline
> default_hugepagesz=1g hugepagesz=1g hugepages=1
> # cat /proc/meminfo
> ...
> HugePages_Total:       1
> HugePages_Free:        1
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:    1048576 kB
> ...
> 
> 2) set nr_overcommit_hugepages
> # echo 1 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> 1
> 
> 3) overcommit 2gb hugepages.
> mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 ENOMEM (Cannot allocate memory)

Hmm, you are trying to reserve/mmap a lot of memory (17179869182 1GB huge
pages).

> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> 18446744071589420672
> 
> As you can see from the above, it did not allow overcommit despite nr_overcommit_hugepages value.

You are trying to allocate much more than your overcommit allows.

> Also, nr_overcommit_hugepages was overwritten with such a strange
> value after overcommit failure. Should we just remove this file from
> sysfs for simplicity?

This is strange. The value is set only in hugetlb_overcommit_handler
which is a sysctl handler.

Are you sure that you are not changing the value by the /sys interface
somewhere (there is no check for the value so you can set what-ever
value you like)? I fail to see any mmap code path which would change
this value.

Btw. which kernel version are you using.

> 
> Thanks.
> 
> CAI Qian

Regards
-- 
Michal Hocko
L3 team 
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-04  9:56   ` Michal Hocko
@ 2011-01-04 10:21     ` CAI Qian
  2011-01-04 10:52       ` Michal Hocko
  2011-01-04 17:21       ` [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages Eric B Munson
  0 siblings, 2 replies; 18+ messages in thread
From: CAI Qian @ 2011-01-04 10:21 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm


> > 3) overcommit 2gb hugepages.
> > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED,
> > 3, 0) = -1 ENOMEM (Cannot allocate memory)
> 
> Hmm, you are trying to reserve/mmap a lot of memory (17179869182 1GB
> huge pages).
That is strange - the test code merely did this,
addr = mmap(ADDR, 2<<30, PROTECTION, FLAGS, fd, 0);

Do you know if overcommit was designed for 1GB pages? At least, read this
from Documentation/kernel-parameters.txt,

hugepagesz=
              ...
             Note that 1GB pages can only be allocated at boot time
             using hugepages= and not freed afterwards.

How does it allow to be overcommitted for only being able to allocate at
boot time?

> > Also, nr_overcommit_hugepages was overwritten with such a strange
> > value after overcommit failure. Should we just remove this file from
> > sysfs for simplicity?
> 
> This is strange. The value is set only in hugetlb_overcommit_handler
> which is a sysctl handler.
> 
> Are you sure that you are not changing the value by the /sys interface
> somewhere (there is no check for the value so you can set what-ever
> value you like)? I fail to see any mmap code path which would change
> this value.
I could double-check here, but it is not important if the fact is that
overcommit is not supported for 1GB pages.

> Btw. which kernel version are you using.
mmotm 2010-12-02-16-34 version 2.6.37-rc4-mm1+. This problem is also present
in 2.6.18.

Thanks.

CAI Qian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-04 10:21     ` CAI Qian
@ 2011-01-04 10:52       ` Michal Hocko
  2011-01-05  4:52         ` CAI Qian
  2011-01-04 17:21       ` [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages Eric B Munson
  1 sibling, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2011-01-04 10:52 UTC (permalink / raw)
  To: CAI Qian; +Cc: linux-mm

On Tue 04-01-11 05:21:46, CAI Qian wrote:
> 
> > > 3) overcommit 2gb hugepages.
> > > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED,
> > > 3, 0) = -1 ENOMEM (Cannot allocate memory)
> > 
> > Hmm, you are trying to reserve/mmap a lot of memory (17179869182 1GB
> > huge pages).
> That is strange - the test code merely did this,
> addr = mmap(ADDR, 2<<30, PROTECTION, FLAGS, fd, 0);

Didn't you want 1<<30 instead?

> 
> Do you know if overcommit was designed for 1GB pages? At least, read this
> from Documentation/kernel-parameters.txt,
> 
> hugepagesz=
>               ...
>              Note that 1GB pages can only be allocated at boot time
>              using hugepages= and not freed afterwards.
> 
> How does it allow to be overcommitted for only being able to allocate at
> boot time?

Sorry, I am not very much familiar with 1GB pages but the hugetlb code
is not page size specific AFAICS so if there are no other background
things than it should just work.

> 
> > > Also, nr_overcommit_hugepages was overwritten with such a strange
> > > value after overcommit failure. Should we just remove this file from
> > > sysfs for simplicity?
> > 
> > This is strange. The value is set only in hugetlb_overcommit_handler
> > which is a sysctl handler.
> > 
> > Are you sure that you are not changing the value by the /sys interface
> > somewhere (there is no check for the value so you can set what-ever
> > value you like)? I fail to see any mmap code path which would change
> > this value.
> I could double-check here, but it is not important if the fact is that
> overcommit is not supported for 1GB pages.

What is the complete test case?

> 
> > Btw. which kernel version are you using.
> mmotm 2010-12-02-16-34 version 2.6.37-rc4-mm1+. This problem is also present
> in 2.6.18.
> 
> Thanks.
> 
> CAI Qian
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
L3 team 
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-04 10:21     ` CAI Qian
  2011-01-04 10:52       ` Michal Hocko
@ 2011-01-04 17:21       ` Eric B Munson
  1 sibling, 0 replies; 18+ messages in thread
From: Eric B Munson @ 2011-01-04 17:21 UTC (permalink / raw)
  To: CAI Qian; +Cc: Michal Hocko, linux-mm

[-- Attachment #1: Type: text/plain, Size: 2417 bytes --]

On Tue, 04 Jan 2011, CAI Qian wrote:

> 
> > > 3) overcommit 2gb hugepages.
> > > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED,
> > > 3, 0) = -1 ENOMEM (Cannot allocate memory)
> > 
> > Hmm, you are trying to reserve/mmap a lot of memory (17179869182 1GB
> > huge pages).
> That is strange - the test code merely did this,
> addr = mmap(ADDR, 2<<30, PROTECTION, FLAGS, fd, 0);
> 
> Do you know if overcommit was designed for 1GB pages? At least, read this
> from Documentation/kernel-parameters.txt,
> 
> hugepagesz=
>               ...
>              Note that 1GB pages can only be allocated at boot time
>              using hugepages= and not freed afterwards.
> 
> How does it allow to be overcommitted for only being able to allocate at
> boot time?

It does not, huge page sizes that have to be allocated at boot can not be
overcommitted as the pool size cannot change after boot.
> 
> > > Also, nr_overcommit_hugepages was overwritten with such a strange
> > > value after overcommit failure. Should we just remove this file from
> > > sysfs for simplicity?

I don't think having pagesize+arch specific logic here is going to scale (we
would need to check for 16GB pages on ppc64 as well because they have the same
restrictions as 1GB pages on x86_64) but 1GB pages might be fine to overcommit
on ia64.  Perhaps the documentation needs to change to call this out specifically.

> > 
> > This is strange. The value is set only in hugetlb_overcommit_handler
> > which is a sysctl handler.
> > 
> > Are you sure that you are not changing the value by the /sys interface
> > somewhere (there is no check for the value so you can set what-ever
> > value you like)? I fail to see any mmap code path which would change
> > this value.
> I could double-check here, but it is not important if the fact is that
> overcommit is not supported for 1GB pages.
> 
> > Btw. which kernel version are you using.
> mmotm 2010-12-02-16-34 version 2.6.37-rc4-mm1+. This problem is also present
> in 2.6.18.
> 
> Thanks.
> 
> CAI Qian
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-04 10:52       ` Michal Hocko
@ 2011-01-05  4:52         ` CAI Qian
  2011-01-05  8:43           ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: CAI Qian @ 2011-01-05  4:52 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm



----- Original Message -----
> On Tue 04-01-11 05:21:46, CAI Qian wrote:
> >
> > > > 3) overcommit 2gb hugepages.
> > > > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE,
> > > > MAP_SHARED,
> > > > 3, 0) = -1 ENOMEM (Cannot allocate memory)
> > >
> > > Hmm, you are trying to reserve/mmap a lot of memory (17179869182
> > > 1GB
> > > huge pages).
> > That is strange - the test code merely did this,
> > addr = mmap(ADDR, 2<<30, PROTECTION, FLAGS, fd, 0);
> 
> Didn't you want 1<<30 instead?
No, it was expecting to use both the allocate + overcommited 1GB pages.

> > > Are you sure that you are not changing the value by the /sys
> > > interface
> > > somewhere (there is no check for the value so you can set
> > > what-ever
> > > value you like)? I fail to see any mmap code path which would
> > > change
> > > this value.
> > I could double-check here, but it is not important if the fact is
> > that
> > overcommit is not supported for 1GB pages.
> 
> What is the complete test case?
Here is the reproducer I was using. The trick to reproduce this is to run at the end.

echo "" >/proc/sys/vm/nr_overcommit_hugepages

CAI Qian

---------------------
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <sys/shm.h>
#include <sys/ipc.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <ctype.h>

#define PROTECTION		(PROT_READ | PROT_WRITE)

#define ADDR (void *)(0x0UL)
#define FLAGS (MAP_SHARED)

static void setup(void);
static void cleanup(void);
static void overcommit(void);

int main(int argc, char *argv[])
{
	setup();
	overcommit();
	cleanup();

	return 0;
}

static void overcommit(void)
{
	void *addr = NULL;
	int fd = -1;
	char s[BUFSIZ];

	snprintf(s, BUFSIZ, "/mnt/hugemmap05/file");
	fd = open(s, O_CREAT | O_RDWR, 0755);
	if (fd == -1)
		perror("open");

	addr = mmap(ADDR, 2UL<<30, PROTECTION, FLAGS, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		cleanup();
	}
	close(fd);
	unlink(s);
}

static void cleanup(void)
{
	system("echo "" >/proc/sys/vm/nr_overcommit_hugepages");
	system("umount /mnt/hugemmap05");
	exit(1);
}

static void setup(void)
{
	system("echo 1 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages");
	system("mkdir /mnt/hugemmap05");
	system("mount none -t hugetlbfs /mnt/hugemmap05");
}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-05  4:52         ` CAI Qian
@ 2011-01-05  8:43           ` Michal Hocko
  2011-01-05  8:54             ` CAI Qian
                               ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Michal Hocko @ 2011-01-05  8:43 UTC (permalink / raw)
  To: CAI Qian; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan

[Let's CC Andrew to pick up the patch - please see bellow]

On Tue 04-01-11 23:52:42, CAI Qian wrote:
> ----- Original Message -----
> > On Tue 04-01-11 05:21:46, CAI Qian wrote:
> > >
> > > > > 3) overcommit 2gb hugepages.
> > > > > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE,
> > > > > MAP_SHARED,
> > > > > 3, 0) = -1 ENOMEM (Cannot allocate memory)
> > > >
> > > > Hmm, you are trying to reserve/mmap a lot of memory (17179869182
> > > > 1GB
> > > > huge pages).
> > > That is strange - the test code merely did this,
> > > addr = mmap(ADDR, 2<<30, PROTECTION, FLAGS, fd, 0);
> > 
> > Didn't you want 1<<30 instead?
> No, it was expecting to use both the allocate + overcommited 1GB pages.

Then you propably wanted 2*1UL<<30 rather than 2<<30 which is something
different than you want, I guess. Anyway this is not related to the
bogus value in nr_overcommit_hugepages after your testcase.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-05  8:43           ` Michal Hocko
@ 2011-01-05  8:54             ` CAI Qian
  2011-01-05  9:51               ` Michal Hocko
  2011-01-05 15:36             ` CAI Qian
  2011-01-05 20:59             ` Andrew Morton
  2 siblings, 1 reply; 18+ messages in thread
From: CAI Qian @ 2011-01-05  8:54 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan


> OK, this explains the bogus value because hugetlb_overcommit_handler
> doesn't check the return value of proc_doulongvec_minmax which fails
> for
> "\n" string which you are writing to the file so we end up setting a
> random value from the stack. The following patch should fix this:
> 
> Btw. what did you want to achieve by this command?
Just to do some testing for robustness. :)

CAI Qian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-05  8:54             ` CAI Qian
@ 2011-01-05  9:51               ` Michal Hocko
  0 siblings, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2011-01-05  9:51 UTC (permalink / raw)
  To: CAI Qian; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan

On Wed 05-01-11 03:54:16, CAI Qian wrote:
> 
> > OK, this explains the bogus value because hugetlb_overcommit_handler
> > doesn't check the return value of proc_doulongvec_minmax which fails
> > for
> > "\n" string which you are writing to the file so we end up setting a
> > random value from the stack. The following patch should fix this:
> > 
> > Btw. what did you want to achieve by this command?
> Just to do some testing for robustness. :)

OK, you hit the nail ;)
I have just noticed I forgot to add your Reported-by: tag and I guess
this is also a stable material. I will repost the patch once Andrew says
he is going to take it.

Thanks

> 
> CAI Qian
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
L3 team 
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-05  8:43           ` Michal Hocko
  2011-01-05  8:54             ` CAI Qian
@ 2011-01-05 15:36             ` CAI Qian
  2011-01-05 15:59               ` Michal Hocko
  2011-01-05 20:59             ` Andrew Morton
  2 siblings, 1 reply; 18+ messages in thread
From: CAI Qian @ 2011-01-05 15:36 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan

I have a new patch to address boarder issues.

---

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-05 15:36             ` CAI Qian
@ 2011-01-05 15:59               ` Michal Hocko
  2011-01-05 16:42                 ` CAI Qian
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2011-01-05 15:59 UTC (permalink / raw)
  To: CAI Qian; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan

On Wed 05-01-11 10:36:47, CAI Qian wrote:
> From f90b54f9f5607128e375bd78d21e751c433b3cf6 Mon Sep 17 00:00:00 2001
> From: CAI Qian <caiqian@redhat.com>
> Date: Wed, 5 Jan 2011 23:26:57 +0800
> Subject: [PATCH] hugetlbfs: check invalid nr_hugepages and nr_overcommit_hugepages
> 
> First, nr_*hugepages* in procfs and sysfs do not check for invalid
> input like "". Second, when using oversize pages, nr_*hugepages* are
> expected to be allocated during boot time. Therefore, return -EINVAL
> for those cases.

I think that the two things should be split into two patches - one for
the proper input data handling and the other one for the size check.
Albeit, I am not sure about the size check because this is a thing that
is just a current implementation limitation and can be change later.

> 
> Signed-off-by: CAI Qian <caiqian@redhat.com>
> ---
>  fs/sysfs/file.c |    2 ++
>  mm/hugetlb.c    |   18 ++++++++++++++++--
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
> index da3fefe..9f4ea67 100644
> --- a/fs/sysfs/file.c
> +++ b/fs/sysfs/file.c
> @@ -207,6 +207,8 @@ flush_write_buffer(struct dentry * dentry, struct sysfs_buffer * buffer, size_t
>  		return -ENODEV;
>  
>  	rc = ops->store(kobj, attr_sd->s_attr.attr, buffer->page, count);
> +	if (!rc)
> +		return -EINVAL;

This doesn't look correct, you would imbalance sysfs_{get,put}_active.

[...]
-- 
Michal Hocko
L3 team 
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-05 15:59               ` Michal Hocko
@ 2011-01-05 16:42                 ` CAI Qian
  2011-01-05 16:44                   ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: CAI Qian @ 2011-01-05 16:42 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan



----- Original Message -----
> On Wed 05-01-11 10:36:47, CAI Qian wrote:
> > From f90b54f9f5607128e375bd78d21e751c433b3cf6 Mon Sep 17 00:00:00
> > 2001
> > From: CAI Qian <caiqian@redhat.com>
> > Date: Wed, 5 Jan 2011 23:26:57 +0800
> > Subject: [PATCH] hugetlbfs: check invalid nr_hugepages and
> > nr_overcommit_hugepages
> >
> > First, nr_*hugepages* in procfs and sysfs do not check for invalid
> > input like "". Second, when using oversize pages, nr_*hugepages* are
> > expected to be allocated during boot time. Therefore, return -EINVAL
> > for those cases.
> 
> I think that the two things should be split into two patches - one for
> the proper input data handling and the other one for the size check.
> Albeit, I am not sure about the size check because this is a thing
> that is just a current implementation limitation and can be change later.
OK, will do two patches.

> > Signed-off-by: CAI Qian <caiqian@redhat.com>
> > ---
> >  fs/sysfs/file.c | 2 ++
> >  mm/hugetlb.c | 18 ++++++++++++++++--
> >  2 files changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
> > index da3fefe..9f4ea67 100644
> > --- a/fs/sysfs/file.c
> > +++ b/fs/sysfs/file.c
> > @@ -207,6 +207,8 @@ flush_write_buffer(struct dentry * dentry,
> > struct sysfs_buffer * buffer, size_t
> >  		return -ENODEV;
> >
> >  	rc = ops->store(kobj, attr_sd->s_attr.attr, buffer->page, count);
> > + if (!rc)
> > + return -EINVAL;
> 
> This doesn't look correct, you would imbalance sysfs_{get,put}_active.
Right, I'll fix this up.

Thanks.

CAI Qian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-05 16:42                 ` CAI Qian
@ 2011-01-05 16:44                   ` Michal Hocko
  2011-01-05 17:00                     ` CAI Qian
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2011-01-05 16:44 UTC (permalink / raw)
  To: CAI Qian; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan

On Wed 05-01-11 11:42:01, CAI Qian wrote:
> 
> 
> ----- Original Message -----
> > On Wed 05-01-11 10:36:47, CAI Qian wrote:
> > > From f90b54f9f5607128e375bd78d21e751c433b3cf6 Mon Sep 17 00:00:00
> > > 2001
> > > From: CAI Qian <caiqian@redhat.com>
> > > Date: Wed, 5 Jan 2011 23:26:57 +0800
> > > Subject: [PATCH] hugetlbfs: check invalid nr_hugepages and
> > > nr_overcommit_hugepages
> > >
> > > First, nr_*hugepages* in procfs and sysfs do not check for invalid
> > > input like "". Second, when using oversize pages, nr_*hugepages* are
> > > expected to be allocated during boot time. Therefore, return -EINVAL
> > > for those cases.
> > 
> > I think that the two things should be split into two patches - one for
> > the proper input data handling and the other one for the size check.
> > Albeit, I am not sure about the size check because this is a thing
> > that is just a current implementation limitation and can be change later.
> OK, will do two patches.

Then you can use the patch I sent earlier and make just a size check
patch on top of it, right?

> 
> > > Signed-off-by: CAI Qian <caiqian@redhat.com>
> > > ---
> > >  fs/sysfs/file.c | 2 ++
> > >  mm/hugetlb.c | 18 ++++++++++++++++--
> > >  2 files changed, 18 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
> > > index da3fefe..9f4ea67 100644
> > > --- a/fs/sysfs/file.c
> > > +++ b/fs/sysfs/file.c
> > > @@ -207,6 +207,8 @@ flush_write_buffer(struct dentry * dentry,
> > > struct sysfs_buffer * buffer, size_t
> > >  		return -ENODEV;
> > >
> > >  	rc = ops->store(kobj, attr_sd->s_attr.attr, buffer->page, count);
> > > + if (!rc)
> > > + return -EINVAL;
> > 
> > This doesn't look correct, you would imbalance sysfs_{get,put}_active.
> Right, I'll fix this up.
> 
> Thanks.
> 
> CAI Qian

-- 
Michal Hocko
L3 team 
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-05 16:44                   ` Michal Hocko
@ 2011-01-05 17:00                     ` CAI Qian
  0 siblings, 0 replies; 18+ messages in thread
From: CAI Qian @ 2011-01-05 17:00 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Andrew Morton, Nishanth Aravamudan


> Then you can use the patch I sent earlier and make just a size check
> patch on top of it, right?
There is a sysfs path needs to be checked for invalid input too. Eric
said he had patchs ready, so I can just let him to post it.

CAI Qian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
  2011-01-05  8:43           ` Michal Hocko
  2011-01-05  8:54             ` CAI Qian
  2011-01-05 15:36             ` CAI Qian
@ 2011-01-05 20:59             ` Andrew Morton
  2011-01-06 10:04               ` PATCH: hugetlb: handle NODEMASK_ALLOC failure correctly Michal Hocko
  2 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2011-01-05 20:59 UTC (permalink / raw)
  To: Michal Hocko; +Cc: CAI Qian, linux-mm, Nishanth Aravamudan

On Wed, 5 Jan 2011 09:43:57 +0100
Michal Hocko <mhocko@suse.cz> wrote:

> ...
>
> proc_doulongvec_minmax may fail if the given buffer doesn't represent
> a valid number. If we provide something invalid we will initialize the
> resulting value (nr_overcommit_huge_pages in this case) to a random
> value from the stack.
> 
> The issue was introduced by a3d0c6aa when the default handler has been
> replaced by the helper function where we do not check the return value.
> 
> Reproducer:
> echo "" > /proc/sys/vm/nr_overcommit_hugepages
> 
> ...
>
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1928,7 +1928,8 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
>  
>  	table->data = &tmp;
>  	table->maxlen = sizeof(unsigned long);
> -	proc_doulongvec_minmax(table, write, buffer, length, ppos);
> +	if (proc_doulongvec_minmax(table, write, buffer, length, ppos))
> +		return -EINVAL;

proc_doulongvec_minmax() can return -EFAULT or -ENOMEM.  It is
incorrect to unconditionally convert those into -EINVAL.

>  	if (write) {
>  		NODEMASK_ALLOC(nodemask_t, nodes_allowed,

hm, the code doesn't check that NODEMASK_ALLOC succeeded.  That
NODEMASK_ALLOC conversion was quite sloppy.


--- a/mm/hugetlb.c~hugetlb-check-the-return-value-of-string-conversion-in-sysctl-handler-fix
+++ a/mm/hugetlb.c
@@ -1859,14 +1859,16 @@ static int hugetlb_sysctl_handler_common
 {
 	struct hstate *h = &default_hstate;
 	unsigned long tmp;
+	int ret;
 
 	if (!write)
 		tmp = h->max_huge_pages;
 
 	table->data = &tmp;
 	table->maxlen = sizeof(unsigned long);
-	if (proc_doulongvec_minmax(table, write, buffer, length, ppos))
-		return -EINVAL;
+	ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
+	if (ret)
+		goto out;
 
 	if (write) {
 		NODEMASK_ALLOC(nodemask_t, nodes_allowed,
@@ -1881,8 +1883,8 @@ static int hugetlb_sysctl_handler_common
 		if (nodes_allowed != &node_states[N_HIGH_MEMORY])
 			NODEMASK_FREE(nodes_allowed);
 	}
-
-	return 0;
+out:
+	return ret;
 }
 
 int hugetlb_sysctl_handler(struct ctl_table *table, int write,
@@ -1920,22 +1922,24 @@ int hugetlb_overcommit_handler(struct ct
 {
 	struct hstate *h = &default_hstate;
 	unsigned long tmp;
+	int ret;
 
 	if (!write)
 		tmp = h->nr_overcommit_huge_pages;
 
 	table->data = &tmp;
 	table->maxlen = sizeof(unsigned long);
-	if (proc_doulongvec_minmax(table, write, buffer, length, ppos))
-		return -EINVAL;
+	ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
+	if (ret)
+		goto out;
 
 	if (write) {
 		spin_lock(&hugetlb_lock);
 		h->nr_overcommit_huge_pages = tmp;
 		spin_unlock(&hugetlb_lock);
 	}
-
-	return 0;
+out:
+	return ret;
 }
 
 #endif /* CONFIG_SYSCTL */
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* PATCH: hugetlb: handle NODEMASK_ALLOC failure correctly
  2011-01-05 20:59             ` Andrew Morton
@ 2011-01-06 10:04               ` Michal Hocko
  2011-01-06 20:38                 ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2011-01-06 10:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: CAI Qian, linux-mm, Nishanth Aravamudan

On Wed 05-01-11 12:59:59, Andrew Morton wrote:
> On Wed, 5 Jan 2011 09:43:57 +0100
> Michal Hocko <mhocko@suse.cz> wrote:
[...]
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1928,7 +1928,8 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
> >  
> >  	table->data = &tmp;
> >  	table->maxlen = sizeof(unsigned long);
> > -	proc_doulongvec_minmax(table, write, buffer, length, ppos);
> > +	if (proc_doulongvec_minmax(table, write, buffer, length, ppos))
> > +		return -EINVAL;
> 
> proc_doulongvec_minmax() can return -EFAULT or -ENOMEM.  It is
> incorrect to unconditionally convert those into -EINVAL.

You are right, I have missed that. Thanks for fixing that up
> 
> >  	if (write) {
> >  		NODEMASK_ALLOC(nodemask_t, nodes_allowed,
> 
> hm, the code doesn't check that NODEMASK_ALLOC succeeded.  That
> NODEMASK_ALLOC conversion was quite sloppy.

What do you think about the patch bellow? I have based it on top of
you mm patches (I was CCed):
hugetlb-check-the-return-value-of-string-conversion-in-sysctl-handler.patch
hugetlb-check-the-return-value-of-string-conversion-in-sysctl-handler-fix.patch
hugetlb-do-not-allow-pagesize-=-max_order-pool-adjustment.patch
hugetlb-do-not-allow-pagesize-=-max_order-pool-adjustment-fix.patch
hugetlb-fix-handling-of-parse-errors-in-sysfs.patch

Some of them didn't apply cleanly so I had to tweak them a bit so maybe
I am missing some other patches.
---

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PATCH: hugetlb: handle NODEMASK_ALLOC failure correctly
  2011-01-06 10:04               ` PATCH: hugetlb: handle NODEMASK_ALLOC failure correctly Michal Hocko
@ 2011-01-06 20:38                 ` Andrew Morton
  2011-01-06 22:23                   ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2011-01-06 20:38 UTC (permalink / raw)
  To: Michal Hocko; +Cc: CAI Qian, linux-mm, Nishanth Aravamudan

On Thu, 6 Jan 2011 11:04:39 +0100
Michal Hocko <mhocko@suse.cz> wrote:

> NODEMASK_ALLOC can use kmalloc if nodemask_t > 256 bytes so it might
> fail with NULL as a result. Let's check the resulting variable and
> fail with -ENOMEM if NODEMASK_ALLOC failed.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  mm/hugetlb.c |   18 +++++++++++++++---
>  1 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4c0606c..21f31b2 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1439,14 +1439,19 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
>  	struct hstate *h;
>  	NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);
>  
> +	if (!nodes_allowed) {
> +		err = -ENOMEM;
> +		goto out;
> +	}

Looks good to me.  I was going to complain that it adds extra unneeded
instructions in the case where the nodemasks are allocated on the
stack.  But it appears that gcc assumes that stack-based variables
cannot have address zero, so if gcc sees this:

	{
		nodemask_t foo;

		if (!&foo) {
			stuff
		}
	}

if just removes it all for us.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PATCH: hugetlb: handle NODEMASK_ALLOC failure correctly
  2011-01-06 20:38                 ` Andrew Morton
@ 2011-01-06 22:23                   ` Michal Hocko
  0 siblings, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2011-01-06 22:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: CAI Qian, linux-mm, Nishanth Aravamudan

On Thu 06-01-11 12:38:58, Andrew Morton wrote:
> On Thu, 6 Jan 2011 11:04:39 +0100
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > NODEMASK_ALLOC can use kmalloc if nodemask_t > 256 bytes so it might
> > fail with NULL as a result. Let's check the resulting variable and
> > fail with -ENOMEM if NODEMASK_ALLOC failed.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > ---
> >  mm/hugetlb.c |   18 +++++++++++++++---
> >  1 files changed, 15 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 4c0606c..21f31b2 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1439,14 +1439,19 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
> >  	struct hstate *h;
> >  	NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);
> >  
> > +	if (!nodes_allowed) {
> > +		err = -ENOMEM;
> > +		goto out;
> > +	}
> 
> Looks good to me.  I was going to complain that it adds extra unneeded
> instructions in the case where the nodemasks are allocated on the
> stack.  But it appears that gcc assumes that stack-based variables
> cannot have address zero, so if gcc sees this:
> 
> 	{
> 		nodemask_t foo;
> 
> 		if (!&foo) {
> 			stuff
> 		}
> 	}
> 
> if just removes it all for us.

Yes, I cannot find it anywhere (maybe just a bad searching pattern) but
this seems to be the case for -O1, -O2 and -Os. Anyway it makes a
perfect sense to me.

Thanks for the review.

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
L3 team 
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2011-01-06 22:23 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1060163918.101411.1293793346203.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-12-31 11:08 ` [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages CAI Qian
2011-01-04  9:56   ` Michal Hocko
2011-01-04 10:21     ` CAI Qian
2011-01-04 10:52       ` Michal Hocko
2011-01-05  4:52         ` CAI Qian
2011-01-05  8:43           ` Michal Hocko
2011-01-05  8:54             ` CAI Qian
2011-01-05  9:51               ` Michal Hocko
2011-01-05 15:36             ` CAI Qian
2011-01-05 15:59               ` Michal Hocko
2011-01-05 16:42                 ` CAI Qian
2011-01-05 16:44                   ` Michal Hocko
2011-01-05 17:00                     ` CAI Qian
2011-01-05 20:59             ` Andrew Morton
2011-01-06 10:04               ` PATCH: hugetlb: handle NODEMASK_ALLOC failure correctly Michal Hocko
2011-01-06 20:38                 ` Andrew Morton
2011-01-06 22:23                   ` Michal Hocko
2011-01-04 17:21       ` [RFC] /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages Eric B Munson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).