* [PATCH] hugetlb: remove overcommit sysfs for 1GB pages
From: CAI Qian @ 2011-01-04 7:42 UTC
To: linux-mm

1GB pages cannot be over-commited, attempting to do so results in corruption,
so remove those files for simplicity.

Symptoms:
1) setup 1gb hugepages.

cat /proc/cmdline
...default_hugepagesz=1g hugepagesz=1g hugepages=1...

cat /proc/meminfo
...
HugePages_Total: 1
HugePages_Free: 1
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
...

2) set nr_overcommit_hugepages

echo 1 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
1

3) overcommit 2gb hugepages.

mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED, 3,
0) = -1 ENOMEM (Cannot allocate memory)

cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
18446744071589420672

Signed-off-by: CAI Qian <caiqian@redhat.com>
---
 mm/hugetlb.c | 23 +++++++++++++++++++++--
 1 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c4a3558..adc9a9f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1587,6 +1587,20 @@ static struct attribute_group hstate_attr_group = {
 	.attrs = hstate_attrs,
 };
 
+static struct attribute *hstate_1gb_attrs[] = {
+	&nr_hugepages_attr.attr,
+	&free_hugepages_attr.attr,
+	&resv_hugepages_attr.attr,
+#ifdef CONFIG_NUMA
+	&nr_hugepages_mempolicy_attr.attr,
+#endif
+	NULL,
+};
+
+static struct attribute_group hstate_1gb_attr_group = {
+	.attrs = hstate_1gb_attrs,
+};
+
 static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
 				    struct kobject **hstate_kobjs,
 				    struct attribute_group *hstate_attr_group)
@@ -1615,8 +1629,13 @@ static void __init hugetlb_sysfs_init(void)
 		return;
 
 	for_each_hstate(h) {
-		err = hugetlb_sysfs_add_hstate(h, hugepages_kobj,
-					 hstate_kobjs, &hstate_attr_group);
+		/* 1GB pages can not be over-committed, so don't need those files. */
+		if (huge_page_size(h) == 1UL << 30)
+			err = hugetlb_sysfs_add_hstate(h, hugepages_kobj,
+					 hstate_kobjs, &hstate_1gb_attr_group);
+		else
+			err = hugetlb_sysfs_add_hstate(h, hugepages_kobj,
+					 hstate_kobjs, &hstate_attr_group);
 		if (err)
 			printk(KERN_ERR "Hugetlb: Unable to add hstate %s",
 								h->name);
-- 
1.7.3.2

[-- Attachment #2: 0001-hugetlb-remove-overcommit-sysfs-for-1GB-pages.patch --]
From c88209f7a21ed0c257cc215a7874df50d5e525d5 Mon Sep 17 00:00:00 2001
From: CAI Qian <caiqian@redhat.com>
Date: Tue, 4 Jan 2011 15:30:00 +0800
Subject: [PATCH] hugetlb: remove overcommit sysfs for 1GB pages
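A note on the two large numbers in the symptoms: read as 64-bit two's-complement values, both sit just below 2^64. The sketch below is an arithmetic illustration only and is not part of the patch; the helper name twos_complement_magnitude is hypothetical. It shows that the mmap() length equals 2^64 - 2^31, which is what a 2 GiB value overflowing a signed 32-bit integer would become after sign extension. That is an assumption about the reproducer, which the thread does not show.

```c
#include <stdint.h>

/*
 * Hypothetical helper for illustration: the distance of v below 2^64,
 * i.e. the magnitude of v when read as a negative 64-bit
 * two's-complement number. Unsigned arithmetic wraps, so this is
 * well defined.
 */
static uint64_t twos_complement_magnitude(uint64_t v)
{
	return ~v + 1;
}

/* Length passed to mmap() in step 3 of the report. */
static const uint64_t mmap_len = 18446744071562067968ULL;

/* Value read back from nr_overcommit_hugepages afterwards. */
static const uint64_t overcommit_after = 18446744071589420672ULL;
```

Here twos_complement_magnitude(mmap_len) is 2147483648 (2^31), and overcommit_after is 0xffffffff81a15e80 in hex. That second value lies in the range x86_64 kernels commonly use for kernel addresses, which would fit the report of corruption, but this is an inference from the number alone.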
* Re: [PATCH] hugetlb: remove overcommit sysfs for 1GB pages
From: Eric B Munson @ 2011-01-04 17:56 UTC
To: CAI Qian; +Cc: linux-mm, mel

On Tue, 04 Jan 2011, CAI Qian wrote:

> 1GB pages cannot be over-commited, attempting to do so results in corruption,
> so remove those files for simplicity.
>
> Symptoms:
> 1) setup 1gb hugepages.
>
> cat /proc/cmdline
> ...default_hugepagesz=1g hugepagesz=1g hugepages=1...
>
> cat /proc/meminfo
> ...
> HugePages_Total: 1
> HugePages_Free: 1
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 1048576 kB
> ...
>
> 2) set nr_overcommit_hugepages
>
> echo 1 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> 1
>
> 3) overcommit 2gb hugepages.
>
> mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED, 3,
> 0) = -1 ENOMEM (Cannot allocate memory)
>
> cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> 18446744071589420672
>
> Signed-off-by: CAI Qian <caiqian@redhat.com>

There are a couple of issues here: first, I think the overcommit value being
overwritten is a bug and this needs to be addressed and fixed before we cover
it by removing the sysfs file.

Second, will it be easier for userspace to work with some huge page sizes
having the overcommit file and others not, or with the kernel handing EINVAL
back when nr_overcommit is changed for an unsupported page size?

Finally, this is a problem for more than 1GB pages on x86_64. It is true for
all pages > 1 << MAX_ORDER. Once the overcommit bug is fixed and the second
issue is answered, the solution that is used (either EINVAL or no overcommit
file) needs to happen for all cases where it applies, not just the 1GB case.
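The MAX_ORDER point can be illustrated with a small userspace model. This is a sketch, not kernel code: MODEL_PAGE_SHIFT = 12 and MODEL_MAX_ORDER = 11 are assumed values (common x86_64 defaults of that period), and both helper names are hypothetical. The idea is that a huge page whose order reaches the buddy allocator's limit cannot be allocated at run time, so surplus pages of that size cannot be created on demand.

```c
/* Assumed values for illustration; real kernels set these per architecture. */
#define MODEL_PAGE_SHIFT 12	/* 4 KiB base pages */
#define MODEL_MAX_ORDER  11	/* buddy allocator serves orders 0..10 */

/*
 * Order of a power-of-two huge page size: log2(size / base page size).
 * Only meant for power-of-two sizes well below 2^63.
 */
static unsigned int model_huge_page_order(unsigned long long size)
{
	unsigned int order = 0;

	while ((1ULL << (MODEL_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/* Nonzero when the page size is too large for run-time buddy allocation. */
static int model_cannot_overcommit(unsigned long long size)
{
	return model_huge_page_order(size) >= MODEL_MAX_ORDER;
}
```

Under these assumptions a 2 MiB page has order 9 and stays below the limit, while a 1 GiB page has order 18 and does not. Any other size at or above the limit would be caught by the same test, which is the generalisation asked for above.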
* Re: [PATCH] hugetlb: remove overcommit sysfs for 1GB pages
From: CAI Qian @ 2011-01-05 5:02 UTC
To: Eric B Munson; +Cc: linux-mm, mel

> There are a couple of issues here: first, I think the overcommit value
> being overwritten is a bug and this needs to be addressed and fixed
> before we cover it by removing the sysfs file.
I have a reproducer mentioned in another thread. The trick is to run this
command at the end,

echo "" >/proc/sys/vm/nr_overcommit_hugepages

> Second, will it be easier for userspace to work with some huge page
> sizes having the overcommit file and others not or making the kernel
> hand EINVAL back when nr_overcommit is changed for an unsupported
> page size?
I am not sure if it is normal for sysfs and procfs entries to return EINVAL.
At least, the nr_hugepages files cannot return EINVAL for the 1GB pages case
either. They merely keep the value intact when trying to change it. I was
also wondering if it is possible to modify those files' permissions based on
the page size, but it looks hard to implement since sysctl file permissions
are pretty much static.

> Finally, this is a problem for more than 1GB pages on x86_64. It is
> true for all pages > 1 << MAX_ORDER. Once the overcommit bug is fixed
> and the second issue is answered, the solution that is used (either
> EINVAL or no overcommit file) needs to happen for all cases where it
> applies, not just the 1GB case.
OK, good point.

Thanks.

CAI Qian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH] hugetlb: remove overcommit sysfs for 1GB pages
From: CAI Qian @ 2011-01-05 7:31 UTC
To: Eric B Munson; +Cc: linux-mm, mel

----- Original Message -----
> > There are a couple of issues here: first, I think the overcommit value
> > being overwritten is a bug and this needs to be addressed and fixed
> > before we cover it by removing the sysfs file.
> I have a reproducer mentioned in another thread. The trick is to run
> this command at the end,
>
> echo "" >/proc/sys/vm/nr_overcommit_hugepages
This caused the process to hang.

# echo "" >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
# echo t >/proc/sysrq-trigger
...
bash R running task 0 3189 3183 0x00000080
ffff8804196bfe58 ffffffff8149fcab 00007f4ab98c1700 ffffffff81130a40
ffff8804194495c0 0000000000014d80 0000000000000246 ffff8804196be010
ffff8804196bffd8 0000000000000000 00007f4ab98c1700 0000000000000000
Call Trace:
[<ffffffff81130a40>] ? nr_overcommit_hugepages_store+0x0/0x70
[<ffffffff8100c9ae>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff81130a40>] ? nr_overcommit_hugepages_store+0x0/0x70
[<ffffffff81226236>] ? strict_strtoul+0x46/0x70
[<ffffffff81130a7a>] ? nr_overcommit_hugepages_store+0x3a/0x70
[<ffffffff811e047b>] ? selinux_file_permission+0xfb/0x150
[<ffffffff811d9473>] ? security_file_permission+0x23/0x90
[<ffffffff811b9ae5>] ? sysfs_write_file+0x115/0x180
[<ffffffff811504f8>] ? vfs_write+0xc8/0x190
[<ffffffff81150d61>] ? sys_write+0x51/0x90
[<ffffffff8100c0f4>] ? sysret_audit+0x16/0x20
...

CAI Qian
* Re: [PATCH] hugetlb: remove overcommit sysfs for 1GB pages
From: CAI Qian @ 2011-01-05 11:36 UTC
To: Eric B Munson; +Cc: linux-mm, mel

> This caused the process hung.
> # echo "" >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> # echo t >/proc/sysrq-trigger
> ...
> bash R running task 0 3189 3183 0x00000080
> ffff8804196bfe58 ffffffff8149fcab 00007f4ab98c1700 ffffffff81130a40
> ffff8804194495c0 0000000000014d80 0000000000000246 ffff8804196be010
> ffff8804196bffd8 0000000000000000 00007f4ab98c1700 0000000000000000
> Call Trace:
> [<ffffffff81130a40>] ? nr_overcommit_hugepages_store+0x0/0x70
> [<ffffffff8100c9ae>] ? apic_timer_interrupt+0xe/0x20
> [<ffffffff81130a40>] ? nr_overcommit_hugepages_store+0x0/0x70
> [<ffffffff81226236>] ? strict_strtoul+0x46/0x70
> [<ffffffff81130a7a>] ? nr_overcommit_hugepages_store+0x3a/0x70
> [<ffffffff811e047b>] ? selinux_file_permission+0xfb/0x150
> [<ffffffff811d9473>] ? security_file_permission+0x23/0x90
> [<ffffffff811b9ae5>] ? sysfs_write_file+0x115/0x180
> [<ffffffff811504f8>] ? vfs_write+0xc8/0x190
> [<ffffffff81150d61>] ? sys_write+0x51/0x90
> [<ffffffff8100c0f4>] ? sysret_audit+0x16/0x20
Looks like it is looping here...

...
audit_syscall_exit
sys_write
vfs_write
sysfs_write_file
nr_overcommit_hugepages_store
audit_syscall_exit
audit_syscall_exit
sys_write
vfs_write
sysfs_write_file
nr_overcommit_hugepages_store
audit_syscall_exit
...

CAI Qian
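One possible reading of the repeating sys_write sequence: if the store handler reports that it consumed 0 bytes, a writer that retries until every byte is accepted never makes progress. The following is a minimal userspace model of that pattern. It is an assumption about the cause rather than something established at this point in the thread; all names are hypothetical, and the call limit exists only so the demonstration terminates.

```c
#include <stddef.h>

/* Hypothetical stand-in for a store handler: returns bytes consumed or < 0. */
typedef long (*model_store_fn)(const char *buf, size_t len);

/* Models a handler that rejects its input but reports "0 bytes consumed". */
static long model_store_returns_zero(const char *buf, size_t len)
{
	(void)buf;
	(void)len;
	return 0;
}

/* Models a well-behaved handler that consumes everything it is given. */
static long model_store_consumes_all(const char *buf, size_t len)
{
	(void)buf;
	return (long)len;
}

/*
 * Models a writer that keeps calling the handler until all bytes are
 * accepted or an error is returned. Gives up after max_calls calls.
 * Returns the number of calls made.
 */
static int model_write_all(model_store_fn store, const char *buf,
			   size_t len, int max_calls)
{
	size_t done = 0;
	int calls = 0;

	while (done < len && calls < max_calls) {
		long n = store(buf + done, len - done);

		calls++;
		if (n < 0)
			break;		/* a real error stops the retries */
		done += (size_t)n;	/* n == 0 means no progress at all */
	}
	return calls;
}
```

With model_store_consumes_all the one-byte write of "\n" finishes after a single call. With model_store_returns_zero the loop only stops because of the artificial limit, which matches the shape of the repeating trace.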
* Re: [PATCH] hugetlb: remove overcommit sysfs for 1GB pages
From: CAI Qian @ 2011-01-05 15:02 UTC
To: Eric B Munson; +Cc: linux-mm, mel

----- Original Message -----
> On Tue, 04 Jan 2011, CAI Qian wrote:
>
> > 1GB pages cannot be over-commited, attempting to do so results in
> > corruption, so remove those files for simplicity.
> >
> > Symptoms:
> > 1) setup 1gb hugepages.
> >
> > cat /proc/cmdline
> > ...default_hugepagesz=1g hugepagesz=1g hugepages=1...
> >
> > cat /proc/meminfo
> > ...
> > HugePages_Total: 1
> > HugePages_Free: 1
> > HugePages_Rsvd: 0
> > HugePages_Surp: 0
> > Hugepagesize: 1048576 kB
> > ...
> >
> > 2) set nr_overcommit_hugepages
> >
> > echo 1 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> > cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> > 1
> >
> > 3) overcommit 2gb hugepages.
> >
> > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED, 3,
> > 0) = -1 ENOMEM (Cannot allocate memory)
> >
> > cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> > 18446744071589420672
> >
> > Signed-off-by: CAI Qian <caiqian@redhat.com>
>
> There are a couple of issues here: first, I think the overcommit value
> being overwritten is a bug and this needs to be addressed and fixed
> before we cover it by removing the sysfs file.
>
> Second, will it be easier for userspace to work with some huge page
> sizes having the overcommit file and others not or making the kernel
> hand EINVAL back when nr_overcommit is changed for an unsupported
> page size?
>
> Finally, this is a problem for more than 1GB pages on x86_64. It is
> true for all pages > 1 << MAX_ORDER. Once the overcommit bug is fixed
> and the second issue is answered, the solution that is used (either
> EINVAL or no overcommit file) needs to happen for all cases where it
> applies, not just the 1GB case.
I have a new patch ready to return EINVAL for both sysfs/procfs, and will
reject changing of nr_hugepages. Do you know if nr_hugepages_mempolicy
is supposed to be able to change in this case? It is not possible currently.

# cat /proc/sys/vm/nr_hugepages_mempolicy
1
# echo 0 >/proc/sys/vm/nr_hugepages_mempolicy
# cat /proc/sys/vm/nr_hugepages_mempolicy
1

Thanks.

CAI Qian
* Re: [PATCH] hugetlb: remove overcommit sysfs for 1GB pages
From: Eric B Munson @ 2011-01-05 16:44 UTC
To: CAI Qian; +Cc: linux-mm, mel

On Wed, 05 Jan 2011, CAI Qian wrote:

>
> ----- Original Message -----
> > On Tue, 04 Jan 2011, CAI Qian wrote:
> >
> > > 1GB pages cannot be over-commited, attempting to do so results in
> > > corruption, so remove those files for simplicity.
> > >
> > > Symptoms:
> > > 1) setup 1gb hugepages.
> > >
> > > cat /proc/cmdline
> > > ...default_hugepagesz=1g hugepagesz=1g hugepages=1...
> > >
> > > cat /proc/meminfo
> > > ...
> > > HugePages_Total: 1
> > > HugePages_Free: 1
> > > HugePages_Rsvd: 0
> > > HugePages_Surp: 0
> > > Hugepagesize: 1048576 kB
> > > ...
> > >
> > > 2) set nr_overcommit_hugepages
> > >
> > > echo 1 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> > > cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> > > 1
> > >
> > > 3) overcommit 2gb hugepages.
> > >
> > > mmap(NULL, 18446744071562067968, PROT_READ|PROT_WRITE, MAP_SHARED, 3,
> > > 0) = -1 ENOMEM (Cannot allocate memory)
> > >
> > > cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages
> > > 18446744071589420672
> > >
> > > Signed-off-by: CAI Qian <caiqian@redhat.com>
> >
> > There are a couple of issues here: first, I think the overcommit value
> > being overwritten is a bug and this needs to be addressed and fixed
> > before we cover it by removing the sysfs file.
> >
> > Second, will it be easier for userspace to work with some huge page
> > sizes having the overcommit file and others not or making the kernel
> > hand EINVAL back when nr_overcommit is changed for an unsupported
> > page size?
> >
> > Finally, this is a problem for more than 1GB pages on x86_64. It is
> > true for all pages > 1 << MAX_ORDER. Once the overcommit bug is fixed
> > and the second issue is answered, the solution that is used (either
> > EINVAL or no overcommit file) needs to happen for all cases where it
> > applies, not just the 1GB case.
> I have a new patch ready to return EINVAL for both sysfs/procfs, and will
> reject changing of nr_hugepages. Do you know if nr_hugepages_mempolicy
> is supposed to be able to change in this case? It is not possible currently.
>
> # cat /proc/sys/vm/nr_hugepages_mempolicy
> 1
> # echo 0 >/proc/sys/vm/nr_hugepages_mempolicy
> # cat /proc/sys/vm/nr_hugepages_mempolicy
> 1

nr_hugepages_mempolicy should follow all the same rules WRT MAX_ORDER as
nr_hugepages. The difference is nr_hugepages_mempolicy respects the NUMA
allocation policy that is set.

I have a pair of patches that do about the same thing but instead of altering
flush_write_buffer, they make the functions that use strict_strtoul in
hugetlb.c return -EINVAL on error instead of 0. The second patch is the same
as your check for MAX_ORDER. I think that returning -EINVAL from hugetlb.c
makes better sense than changing the behavior of flush_write_buffer.

Patches will be on the way as soon as I am sure they build.
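A minimal userspace sketch of the approach described above, in which a failed parse yields -EINVAL instead of 0. This is not the kernel code: model_store is a hypothetical name, the standard C strtoul stands in for the kernel's strict_strtoul, and the buffer is assumed to be NUL-terminated.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical model of a store handler. On success it stores the parsed
 * number in *value and returns len (all bytes consumed). On any parse
 * failure it returns -EINVAL and leaves *value untouched, rather than
 * returning 0.
 */
static long model_store(const char *buf, size_t len, unsigned long *value)
{
	char *end;
	unsigned long parsed;

	/* Reject empty input, a bare newline, signs and leading blanks. */
	if (!isdigit((unsigned char)buf[0]))
		return -EINVAL;

	errno = 0;
	parsed = strtoul(buf, &end, 10);
	if (errno == ERANGE)
		return -EINVAL;

	/* Allow one trailing newline, as produced by echo. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*value = parsed;
	return (long)len;
}
```

In this model, writing "1\n" stores 1 and reports 2 bytes consumed, while the "\n" produced by `echo ""` is rejected with -EINVAL, so the caller sees an error instead of a zero-length write.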