* [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl
@ 2007-12-13 7:41 Nishanth Aravamudan
2007-12-13 7:42 ` [RFC][PATCH 2/2] Revert "hugetlb: Add hugetlb_dynamic_pool sysctl" Nishanth Aravamudan
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Nishanth Aravamudan @ 2007-12-13 7:41 UTC (permalink / raw)
To: agl; +Cc: wli, mel, apw, akpm, lee.schermerhorn, linux-mm
While examining the code to support /proc/sys/vm/hugetlb_dynamic_pool, I
became convinced that having a boolean sysctl was insufficient:
1) To support per-node control of hugepages, I have previously submitted
patches to add a sysfs attribute related to nr_hugepages. However, with
a boolean global value and per-mount quota enforcement constraining the
dynamic pool, adding corresponding control of the dynamic pool on a
per-node basis seems inconsistent to me.
2) Administration of the hugetlb dynamic pool with multiple hugetlbfs
mount points is, arguably, more arduous than it needs to be. Each quota
would need to be set separately, and the sum would need to be monitored.
To ease the administration, and to help pave the way for per-node
control of the static & dynamic hugepage pool, I added a separate
sysctl, nr_overcommit_hugepages. This value serves as a high watermark
for the overall hugepage pool, while nr_hugepages serves as a low
watermark. The boolean sysctl can then be removed, as the condition
nr_overcommit_hugepages > 0
indicates the same administrative setting as
hugetlb_dynamic_pool == 1
Quotas still serve as local enforcement of the size of the pool on a
per-mount basis.
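
As a rough illustration of the semantics described above (not part of
the patch), here is a minimal user-space sketch in C. The struct and
helper names are hypothetical. The upper bound of
nr_hugepages + nr_overcommit_hugepages is taken from the documentation
update posted later in this thread.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the two sysctls; not kernel code. */
struct hugepage_limits {
	unsigned long nr_hugepages;            /* static pool: low watermark */
	unsigned long nr_overcommit_hugepages; /* max surplus pages allowed */
};

/* The dynamic pool is enabled exactly when overcommit is non-zero. */
static bool dynamic_pool_enabled(const struct hugepage_limits *l)
{
	return l->nr_overcommit_hugepages > 0;
}

/* Largest pool size the two sysctls permit (the high watermark). */
static unsigned long max_pool_size(const struct hugepage_limits *l)
{
	return l->nr_hugepages + l->nr_overcommit_hugepages;
}
```

With nr_hugepages = 10 and nr_overcommit_hugepages = 0 the dynamic pool
is off and the pool is capped at 10 pages; raising the overcommit value
to 5 enables the dynamic pool and raises the cap to 15.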
A few caveats:
1) There is a race whereby the global surplus huge page counter is
incremented before a hugepage has been allocated. Another process could
then try to grow the pool, and fail to convert a surplus huge page to a normal
huge page and instead allocate a fresh huge page. I believe this is
benign, as no memory is leaked (the actual pages are still tracked
correctly) and the counters won't go out of sync.
2) Shrinking the static pool while a surplus is in effect will allow the
number of surplus huge pages to exceed the overcommit value. As long as
this condition holds, however, no more surplus huge pages will be
allowed on the system until one of the two sysctls is increased
sufficiently, or the surplus huge pages go out of use and are freed.
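
The counter handling behind both caveats can be modelled in user space.
This is only a single-threaded sketch under simplifying assumptions: the
struct and function names are hypothetical, locking is omitted, and the
alloc_ok flag stands in for the result of alloc_pages().

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of the global counters. */
struct pool {
	unsigned long nr_huge_pages;      /* total pages in the pool */
	unsigned long surplus_huge_pages; /* pages above the static pool */
	unsigned long nr_overcommit;      /* nr_overcommit_hugepages */
};

/*
 * Same order of operations as the patch: reserve a surplus slot in the
 * global counters first, attempt the allocation, and roll the counters
 * back if the allocation fails.
 */
static bool alloc_surplus(struct pool *p, bool alloc_ok)
{
	if (p->surplus_huge_pages >= p->nr_overcommit)
		return false;		/* at or above the overcommit limit */

	p->nr_huge_pages++;		/* optimistic reservation */
	p->surplus_huge_pages++;

	if (!alloc_ok) {		/* allocation failed: undo */
		p->nr_huge_pages--;
		p->surplus_huge_pages--;
		return false;
	}
	return true;
}

/*
 * Caveat 2, simplified: shrinking the static pool by n in-use pages
 * turns them into surplus pages regardless of the overcommit value.
 */
static void shrink_static_pool(struct pool *p, unsigned long n)
{
	p->surplus_huge_pages += n;
}
```

Starting from 4 pages with an overcommit of 1, one surplus allocation
succeeds and the next is refused. After shrinking the static pool by 2
in-use pages the surplus is 3, above the limit of 1, and further surplus
allocations keep failing until the surplus drops or a sysctl is raised.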
Successfully tested on x86_64 with the current libhugetlbfs snapshot,
modified to use the new sysctl.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
Andrew, since 2.6.24 would be the first release with the dynamic_pool
sysctl, I think these might deserve consideration for inclusion, even so
late in the cycle. It all depends on how significant a user-space
visible change removing the sysctl in 2.6.25 would be.
Obviously pending comments, acks, nacks from Adam et al.
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2496879..f7bc869 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -34,6 +34,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
extern unsigned long max_huge_pages;
extern unsigned long hugepages_treat_as_movable;
extern int hugetlb_dynamic_pool;
+extern unsigned long nr_overcommit_huge_pages;
extern const unsigned long hugetlb_zero, hugetlb_infinity;
extern int sysctl_hugetlb_shm_group;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8ac5171..b85a128 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -912,6 +912,14 @@ static struct ctl_table vm_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "nr_overcommit_hugepages",
+ .data = &nr_overcommit_huge_pages,
+ .maxlen = sizeof(nr_overcommit_huge_pages),
+ .mode = 0644,
+ .proc_handler = &proc_doulongvec_minmax,
+ },
#endif
{
.ctl_name = VM_LOWMEM_RESERVE_RATIO,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6f97821..3a79065 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -32,6 +32,7 @@ static unsigned int surplus_huge_pages_node[MAX_NUMNODES];
static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
unsigned long hugepages_treat_as_movable;
int hugetlb_dynamic_pool;
+unsigned long nr_overcommit_huge_pages;
static int hugetlb_next_nid;
/*
@@ -227,22 +228,62 @@ static struct page *alloc_buddy_huge_page(struct vm_area_struct *vma,
unsigned long address)
{
struct page *page;
+ unsigned int nid;
/* Check if the dynamic pool is enabled */
if (!hugetlb_dynamic_pool)
return NULL;
+ /*
+ * Assume we will successfully allocate the surplus page to
+ * prevent racing processes from causing the surplus to exceed
+ * overcommit
+ *
+ * This however introduces a different race, where a process B
+ * tries to grow the static hugepage pool while alloc_pages() is
+ * called by process A. B will only examine the per-node
+ * counters in determining if surplus huge pages can be
+ * converted to normal huge pages in adjust_pool_surplus(). A
+ * won't be able to increment the per-node counter, until the
+ * lock is dropped by B, but B doesn't drop hugetlb_lock until
+ * no more huge pages can be converted from surplus to normal
+ * state (and doesn't try to convert again). Thus, we have a
+ * case where a surplus huge page exists, the pool is grown, and
+ * the surplus huge page still exists after, even though it
+ * should just have been converted to a normal huge page. This
+ * does not leak memory, though, as the hugepage will be freed
+ * once it is out of use. It also does not allow the counters to
+ * go out of whack in adjust_pool_surplus() as we don't modify
+ * the node values until we've gotten the hugepage and only the
+ * per-node value is checked there.
+ */
+ spin_lock(&hugetlb_lock);
+ if (surplus_huge_pages >= nr_overcommit_huge_pages) {
+ spin_unlock(&hugetlb_lock);
+ return NULL;
+ } else {
+ nr_huge_pages++;
+ surplus_huge_pages++;
+ }
+ spin_unlock(&hugetlb_lock);
+
page = alloc_pages(htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN,
HUGETLB_PAGE_ORDER);
+
+ spin_lock(&hugetlb_lock);
if (page) {
+ nid = page_to_nid(page);
set_compound_page_dtor(page, free_huge_page);
- spin_lock(&hugetlb_lock);
- nr_huge_pages++;
- nr_huge_pages_node[page_to_nid(page)]++;
- surplus_huge_pages++;
- surplus_huge_pages_node[page_to_nid(page)]++;
- spin_unlock(&hugetlb_lock);
+ /*
+ * We incremented the global counters already
+ */
+ nr_huge_pages_node[nid]++;
+ surplus_huge_pages_node[nid]++;
+ } else {
+ nr_huge_pages--;
+ surplus_huge_pages--;
}
+ spin_unlock(&hugetlb_lock);
return page;
}
@@ -481,6 +522,12 @@ static unsigned long set_max_huge_pages(unsigned long count)
* Increase the pool size
* First take pages out of surplus state. Then make up the
* remaining difference by allocating fresh huge pages.
+ *
+ * We might race with alloc_buddy_huge_page() here and be unable
+ * to convert a surplus huge page to a normal huge page. That is
+ * not critical, though, it just means the overall size of the
+ * pool might be one hugepage larger than it needs to be, but
+ * within all the constraints specified by the sysctls.
*/
spin_lock(&hugetlb_lock);
while (surplus_huge_pages && count > persistent_huge_pages) {
@@ -509,6 +556,14 @@ static unsigned long set_max_huge_pages(unsigned long count)
* to keep enough around to satisfy reservations). Then place
* pages into surplus state as needed so the pool will shrink
* to the desired size as pages become free.
+ *
+ * By placing pages into the surplus state independent of the
+ * overcommit value, we are allowing the surplus pool size to
+ * exceed overcommit. There are few sane options here. Since
+ * alloc_buddy_huge_page() is checking the global counter,
+ * though, we'll note that we're not allowed to exceed surplus
+ * and won't grow the pool anywhere else. Not until one of the
+ * sysctls is changed, or the surplus pages go out of use.
*/
min_count = resv_huge_pages + nr_huge_pages - free_huge_pages;
min_count = max(count, min_count);
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [RFC][PATCH 2/2] Revert "hugetlb: Add hugetlb_dynamic_pool sysctl"
2007-12-13 7:41 [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl Nishanth Aravamudan
@ 2007-12-13 7:42 ` Nishanth Aravamudan
2007-12-13 8:53 ` William Lee Irwin III
2007-12-13 22:14 ` Adam Litke
2007-12-13 16:17 ` [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl Dave Hansen
` (2 subsequent siblings)
3 siblings, 2 replies; 18+ messages in thread
From: Nishanth Aravamudan @ 2007-12-13 7:42 UTC (permalink / raw)
To: agl; +Cc: wli, mel, apw, akpm, lee.schermerhorn, linux-mm
Revert "hugetlb: Add hugetlb_dynamic_pool sysctl"
This reverts commit 54f9f80d6543fb7b157d3b11e2e7911dc1379790.
Given the new sysctl nr_overcommit_hugepages, the boolean dynamic pool
sysctl is not needed, as its semantics can be expressed by 0 in the
overcommit sysctl (no dynamic pool) and non-0 in the overcommit sysctl
(pool enabled).
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index f7bc869..30d606a 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -33,7 +33,6 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
extern unsigned long max_huge_pages;
extern unsigned long hugepages_treat_as_movable;
-extern int hugetlb_dynamic_pool;
extern unsigned long nr_overcommit_huge_pages;
extern const unsigned long hugetlb_zero, hugetlb_infinity;
extern int sysctl_hugetlb_shm_group;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index b85a128..1135de7 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -906,14 +906,6 @@ static struct ctl_table vm_table[] = {
},
{
.ctl_name = CTL_UNNUMBERED,
- .procname = "hugetlb_dynamic_pool",
- .data = &hugetlb_dynamic_pool,
- .maxlen = sizeof(hugetlb_dynamic_pool),
- .mode = 0644,
- .proc_handler = &proc_dointvec,
- },
- {
- .ctl_name = CTL_UNNUMBERED,
.procname = "nr_overcommit_hugepages",
.data = &nr_overcommit_huge_pages,
.maxlen = sizeof(nr_overcommit_huge_pages),
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3a79065..7224a4f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -31,7 +31,6 @@ static unsigned int free_huge_pages_node[MAX_NUMNODES];
static unsigned int surplus_huge_pages_node[MAX_NUMNODES];
static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
unsigned long hugepages_treat_as_movable;
-int hugetlb_dynamic_pool;
unsigned long nr_overcommit_huge_pages;
static int hugetlb_next_nid;
@@ -230,10 +229,6 @@ static struct page *alloc_buddy_huge_page(struct vm_area_struct *vma,
struct page *page;
unsigned int nid;
- /* Check if the dynamic pool is enabled */
- if (!hugetlb_dynamic_pool)
- return NULL;
-
/*
* Assume we will successfully allocate the surplus page to
* prevent racing processes from causing the surplus to exceed
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 2/2] Revert "hugetlb: Add hugetlb_dynamic_pool sysctl"
2007-12-13 7:42 ` [RFC][PATCH 2/2] Revert "hugetlb: Add hugetlb_dynamic_pool sysctl" Nishanth Aravamudan
@ 2007-12-13 8:53 ` William Lee Irwin III
2007-12-13 16:47 ` Nishanth Aravamudan
2007-12-13 22:14 ` Adam Litke
1 sibling, 1 reply; 18+ messages in thread
From: William Lee Irwin III @ 2007-12-13 8:53 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: agl, mel, apw, akpm, lee.schermerhorn, linux-mm
On Wed, Dec 12, 2007 at 11:42:59PM -0800, Nishanth Aravamudan wrote:
> Revert "hugetlb: Add hugetlb_dynamic_pool sysctl"
> This reverts commit 54f9f80d6543fb7b157d3b11e2e7911dc1379790.
> Given the new sysctl nr_overcommit_hugepages, the boolean dynamic pool
> sysctl is not needed, as its semantics can be expressed by 0 in the
> overcommit sysctl (no dynamic pool) and non-0 in the overcommit sysctl
> (pool enabled).
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
This is recent enough that dependencies shouldn't have developed, but
it'd be nice to stage user-visible API/ABI changes more consciously
and carefully in the future. Or at least we should try to avoid the
sorts of situations where we end up changing recently introduced
user/kernel ABI's and API's shortly after merging. We'll run the risk
of getting stuck with a user/kernel ABI we can't abandon for years on
account of not fixing it up before dependencies develop if this happens
too often.
-- wli
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl
2007-12-13 7:41 [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl Nishanth Aravamudan
2007-12-13 7:42 ` [RFC][PATCH 2/2] Revert "hugetlb: Add hugetlb_dynamic_pool sysctl" Nishanth Aravamudan
@ 2007-12-13 16:17 ` Dave Hansen
2007-12-13 16:44 ` Nishanth Aravamudan
2007-12-13 19:24 ` [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl Nishanth Aravamudan
2007-12-13 22:14 ` Adam Litke
3 siblings, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2007-12-13 16:17 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: agl, wli, mel, apw, akpm, lee.schermerhorn, linux-mm
On Wed, 2007-12-12 at 23:41 -0800, Nishanth Aravamudan wrote:
> While examining the code to support /proc/sys/vm/hugetlb_dynamic_pool, I
> became convinced that having a boolean sysctl was insufficient:
>
> 1) To support per-node control of hugepages, I have previously submitted
> patches to add a sysfs attribute related to nr_hugepages. However, with
> a boolean global value and per-mount quota enforcement constraining the
> dynamic pool, adding corresponding control of the dynamic pool on a
> per-node basis seems inconsistent to me.
Documentation/sysctl, please :)
-- Dave
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl
2007-12-13 16:17 ` [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl Dave Hansen
@ 2007-12-13 16:44 ` Nishanth Aravamudan
2007-12-13 16:49 ` Nishanth Aravamudan
2007-12-13 17:02 ` Dave Hansen
0 siblings, 2 replies; 18+ messages in thread
From: Nishanth Aravamudan @ 2007-12-13 16:44 UTC (permalink / raw)
To: Dave Hansen; +Cc: agl, wli, mel, apw, akpm, lee.schermerhorn, linux-mm
On 13.12.2007 [08:17:08 -0800], Dave Hansen wrote:
> On Wed, 2007-12-12 at 23:41 -0800, Nishanth Aravamudan wrote:
> > While examining the code to support /proc/sys/vm/hugetlb_dynamic_pool, I
> > became convinced that having a boolean sysctl was insufficient:
> >
> > 1) To support per-node control of hugepages, I have previously submitted
> > patches to add a sysfs attribute related to nr_hugepages. However, with
> > a boolean global value and per-mount quota enforcement constraining the
> > dynamic pool, adding corresponding control of the dynamic pool on a
> > per-node basis seems inconsistent to me.
>
> Documentation/sysctl, please :)
Err, yes, will need to update that. I note that the old sysctl is not
there...nor is nr_hugepages, for that matter. So maybe I'll just add a
3rd patch to fix the Documentation? I really just wanted to get the
patches out there as soon as I got them tested...
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 2/2] Revert "hugetlb: Add hugetlb_dynamic_pool sysctl"
2007-12-13 8:53 ` William Lee Irwin III
@ 2007-12-13 16:47 ` Nishanth Aravamudan
2007-12-13 17:37 ` William Lee Irwin III
0 siblings, 1 reply; 18+ messages in thread
From: Nishanth Aravamudan @ 2007-12-13 16:47 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: agl, mel, apw, akpm, lee.schermerhorn, linux-mm
On 13.12.2007 [00:53:46 -0800], William Lee Irwin III wrote:
> On Wed, Dec 12, 2007 at 11:42:59PM -0800, Nishanth Aravamudan wrote:
> > Revert "hugetlb: Add hugetlb_dynamic_pool sysctl"
> > This reverts commit 54f9f80d6543fb7b157d3b11e2e7911dc1379790.
> > Given the new sysctl nr_overcommit_hugepages, the boolean dynamic pool
> > sysctl is not needed, as its semantics can be expressed by 0 in the
> > overcommit sysctl (no dynamic pool) and non-0 in the overcommit sysctl
> > (pool enabled).
> > Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
>
> This is recent enough that dependencies shouldn't have developed, but
> it'd be nice to stage user-visible API/ABI changes more consciously
> and carefully in the future. Or at least we should try to avoid the
> sorts of situations where we end up changing recently introduced
> user/kernel ABI's and API's shortly after merging. We'll run the risk
> of getting stuck with a user/kernel ABI we can't abandon for years on
> account of not fixing it up before dependencies develop if this
> happens too often.
I agree and I apologize if I'm making things hard for everyone. However,
I hadn't fully considered the implications of the dynamic pool for my
other patches. The patches moved from -mm to -linus rather quickly,
iirc. No excuse, however, I should have been paying more attention.
If folks really don't want things to change, I guess we could also just
make the sysctl's corresponding per-node attribute a boolean too. It
just seems less flexible than this approach.
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl
2007-12-13 16:44 ` Nishanth Aravamudan
@ 2007-12-13 16:49 ` Nishanth Aravamudan
2007-12-13 17:03 ` Dave Hansen
2007-12-13 17:02 ` Dave Hansen
1 sibling, 1 reply; 18+ messages in thread
From: Nishanth Aravamudan @ 2007-12-13 16:49 UTC (permalink / raw)
To: Dave Hansen; +Cc: agl, wli, mel, apw, akpm, lee.schermerhorn, linux-mm
On 13.12.2007 [08:44:53 -0800], Nishanth Aravamudan wrote:
> On 13.12.2007 [08:17:08 -0800], Dave Hansen wrote:
> > On Wed, 2007-12-12 at 23:41 -0800, Nishanth Aravamudan wrote:
> > > While examining the code to support /proc/sys/vm/hugetlb_dynamic_pool, I
> > > became convinced that having a boolean sysctl was insufficient:
> > >
> > > 1) To support per-node control of hugepages, I have previously submitted
> > > patches to add a sysfs attribute related to nr_hugepages. However, with
> > > a boolean global value and per-mount quota enforcement constraining the
> > > dynamic pool, adding corresponding control of the dynamic pool on a
> > > per-node basis seems inconsistent to me.
> >
> > Documentation/sysctl, please :)
>
> Err, yes, will need to update that. I note that the old sysctl is not
> there...nor is nr_hugepages, for that matter. So maybe I'll just add a
> 3rd patch to fix the Documentation? I really just wanted to get the
> patches out there as soon as I got them tested...
Hrm, nr_hugepages is documented in vm/hugetlbpage.txt and not
sysctl/vm.txt. Should I document this sysctl there too?
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl
2007-12-13 16:44 ` Nishanth Aravamudan
2007-12-13 16:49 ` Nishanth Aravamudan
@ 2007-12-13 17:02 ` Dave Hansen
2007-12-13 18:01 ` [RFC][PATCH 3/3] Documetation: update hugetlb information Nishanth Aravamudan
1 sibling, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2007-12-13 17:02 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: agl, wli, mel, apw, akpm, lee.schermerhorn, linux-mm
On Thu, 2007-12-13 at 08:44 -0800, Nishanth Aravamudan wrote:
> Err, yes, will need to update that. I note that the old sysctl is not
> there...nor is nr_hugepages, for that matter. So maybe I'll just add a
> 3rd patch to fix the Documentation? I really just wanted to get the
> patches out there as soon as I got them tested...
Yeah, that should be fine. Adding nr_hugepages will probably get you
bonus points. :)
-- Dave
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl
2007-12-13 16:49 ` Nishanth Aravamudan
@ 2007-12-13 17:03 ` Dave Hansen
0 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2007-12-13 17:03 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: agl, wli, mel, apw, akpm, lee.schermerhorn, linux-mm
On Thu, 2007-12-13 at 08:49 -0800, Nishanth Aravamudan wrote:
> Hrm, nr_hugepages is documented in vm/hugetlbpage.txt and not
> sysctl/vm.txt. Should I document this sysctl there too?
You might just want to add a pointer to the sysctl file pointing to the
VM documentation.
-- Dave
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 2/2] Revert "hugetlb: Add hugetlb_dynamic_pool sysctl"
2007-12-13 16:47 ` Nishanth Aravamudan
@ 2007-12-13 17:37 ` William Lee Irwin III
0 siblings, 0 replies; 18+ messages in thread
From: William Lee Irwin III @ 2007-12-13 17:37 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: agl, mel, apw, akpm, lee.schermerhorn, linux-mm
On 13.12.2007 [00:53:46 -0800], William Lee Irwin III wrote:
>> This is recent enough that dependencies shouldn't have developed, but
>> it'd be nice to stage user-visible API/ABI changes more consciously
>> and carefully in the future. Or at least we should try to avoid the
>> sorts of situations where we end up changing recently introduced
>> user/kernel ABI's and API's shortly after merging. We'll run the risk
>> of getting stuck with a user/kernel ABI we can't abandon for years on
>> account of not fixing it up before dependencies develop if this
>> happens too often.
On Thu, Dec 13, 2007 at 08:47:27AM -0800, Nishanth Aravamudan wrote:
> I agree and I apologize if I'm making things hard for everyone. However,
> I hadn't fully considered the implications of the dynamic pool for my
> other patches. The patches moved from -mm to -linus rather quickly,
> iirc. No excuse, however, I should have been paying more attention.
> If folks really don't want things to change, I guess we could also just
> make the sysctl's corresponding per-node attribute a boolean too. It
> just seems less flexible than this approach.
I'm fine with this getting changed over since we've not spanned a point
release with the old nomenclature. If we had spanned a point release
there would be trouble. I've no particular preference about old vs. new
nomenclature apart from following user/kernel ABI/API stability rules.
-- wli
^ permalink raw reply [flat|nested] 18+ messages in thread
* [RFC][PATCH 3/3] Documetation: update hugetlb information
2007-12-13 17:02 ` Dave Hansen
@ 2007-12-13 18:01 ` Nishanth Aravamudan
2007-12-13 18:01 ` Nishanth Aravamudan
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Nishanth Aravamudan @ 2007-12-13 18:01 UTC (permalink / raw)
To: Dave Hansen; +Cc: agl, wli, mel, apw, akpm, lee.schermerhorn, linux-mm
On 13.12.2007 [09:02:44 -0800], Dave Hansen wrote:
> On Thu, 2007-12-13 at 08:44 -0800, Nishanth Aravamudan wrote:
> > Err, yes, will need to update that. I note that the old sysctl is not
> > there...nor is nr_hugepages, for that matter. So maybe I'll just add a
> > 3rd patch to fix the Documentation? I really just wanted to get the
> > patches out there as soon as I got them tested...
>
> Yeah, that should be fine. Adding nr_hugepages will probably get you
> bonus points. :)
Documentation: updated hugetlb information
The hugetlb documentation has gotten a bit out of sync with the current
code. Update the sysctl file to refer to
Documentation/vm/hugetlbpage.txt. Update that file to contain the
current state of affairs (with the newly named sysctl in place).
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
This is 3/3 because it depends on 1/2 and 2/2 ... Not sure if this is
complete enough, either. Adam, do you have any input?
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index b89570c..6f31f0a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -34,6 +34,8 @@ Currently, these files are in /proc/sys/vm:
- oom_kill_allocating_task
- mmap_min_address
- numa_zonelist_order
+- nr_hugepages
+- nr_overcommit_hugepages
==============================================================
@@ -305,3 +307,20 @@ will select "node" order in following case.
Otherwise, "zone" order will be selected. Default order is recommended unless
this is causing problems for your system/application.
+
+==============================================================
+
+nr_hugepages
+
+Change the minimum size of the hugepage pool.
+
+See Documentation/vm/hugetlbpage.txt
+
+==============================================================
+
+nr_overcommit_hugepages
+
+Change the maximum size of the hugepage pool. The maximum is
+nr_hugepages + nr_overcommit_hugepages.
+
+See Documentation/vm/hugetlbpage.txt
diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
index 51ccc48..f962d01 100644
--- a/Documentation/vm/hugetlbpage.txt
+++ b/Documentation/vm/hugetlbpage.txt
@@ -30,9 +30,10 @@ alignment and size of the arguments to the above system calls.
The output of "cat /proc/meminfo" will have lines like:
.....
-HugePages_Total: xxx
-HugePages_Free: yyy
-HugePages_Rsvd: www
+HugePages_Total: vvv
+HugePages_Free: www
+HugePages_Rsvd: xxx
+HugePages_Surp: yyy
Hugepagesize: zzz kB
where:
@@ -42,6 +43,10 @@ allocated.
HugePages_Rsvd is short for "reserved," and is the number of hugepages
for which a commitment to allocate from the pool has been made, but no
allocation has yet been made. It's vaguely analogous to overcommit.
+HugePages_Surp is short for "surplus," and is the number of hugepages in
+the pool above the value in /proc/sys/vm/nr_hugepages. The maximum
+number of surplus hugepages is controlled by
+/proc/sys/vm/nr_overcommit_hugepages.
/proc/filesystems should also show a filesystem of type "hugetlbfs" configured
in the kernel.
@@ -71,7 +76,25 @@ or failure of allocation depends on the amount of physically contiguous
memory that is preset in system at this time. System administrators may want
to put this command in one of the local rc init files. This will enable the
kernel to request huge pages early in the boot process (when the possibility
-of getting physical contiguous pages is still very high).
+of getting physical contiguous pages is still very high). In either
+case, administrators will want to verify the number of hugepages actually
+allocated by checking the sysctl or meminfo.
+
+/proc/sys/vm/nr_overcommit_hugepages indicates how large the pool of
+hugepages can grow, if more hugepages than /proc/sys/vm/nr_hugepages are
+requested by applications. echo'ing any non-zero value into this file
+indicates that the hugetlb subsystem is allowed to try to obtain
+hugepages from the buddy allocator, if the normal pool is exhausted. As
+these surplus hugepages go out of use, they are freed back to the buddy
+allocator.
+
+Caveat: Shrinking the pool via nr_hugepages while a surplus is in effect
+will allow the number of surplus huge pages to exceed the overcommit
+value, as the pool hugepages (which must have been in use for surplus
+hugepages to be allocated) will become surplus hugepages. As long as
+this condition holds, however, no more surplus huge pages will be
+allowed on the system until one of the two sysctls is increased
+sufficiently, or the surplus huge pages go out of use and are freed.
If the user applications are going to request hugepages using mmap system
call, then it is required that system administrator mount a file system of
@@ -94,8 +117,8 @@ provided on command line then no limits are set. For size and nr_inodes
options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For
example, size=2K has the same meaning as size=2048.
-read and write system calls are not supported on files that reside on hugetlb
-file systems.
+While read system calls are supported on files that reside on hugetlb
+file systems, write system calls are not.
Regular chown, chgrp, and chmod commands (with right permissions) could be
used to change the file attributes on hugetlbfs.
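
To check the counters described in the documentation above, a monitoring
tool might parse individual /proc/meminfo lines. The following is a
minimal sketch in C, assuming the "Key: value" layout shown in
hugetlbpage.txt. parse_meminfo_line() is a hypothetical helper, not an
existing API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/meminfo line. Returns 1 and stores the number in
 * *value if the line starts with "<key>:", otherwise returns 0.
 */
static int parse_meminfo_line(const char *line, const char *key,
			      unsigned long *value)
{
	size_t len = strlen(key);

	/* The key must match exactly and be followed by a colon. */
	if (strncmp(line, key, len) != 0 || line[len] != ':')
		return 0;

	/* %lu skips the padding spaces between the colon and the number. */
	return sscanf(line + len + 1, "%lu", value) == 1;
}
```

A caller would read /proc/meminfo line by line with fgets() and pass
each line together with a key such as "HugePages_Surp".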
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 3/3] Documetation: update hugetlb information
2007-12-13 18:01 ` [RFC][PATCH 3/3] Documetation: update hugetlb information Nishanth Aravamudan
@ 2007-12-13 18:01 ` Nishanth Aravamudan
2007-12-13 19:04 ` Dave Hansen
2007-12-13 22:17 ` Adam Litke
2 siblings, 0 replies; 18+ messages in thread
From: Nishanth Aravamudan @ 2007-12-13 18:01 UTC (permalink / raw)
To: Dave Hansen; +Cc: agl, wli, mel, apw, akpm, lee.schermerhorn, linux-mm
On 13.12.2007 [10:01:16 -0800], Nishanth Aravamudan wrote:
> On 13.12.2007 [09:02:44 -0800], Dave Hansen wrote:
> > On Thu, 2007-12-13 at 08:44 -0800, Nishanth Aravamudan wrote:
> > > Err, yes, will need to update that. I note that the old sysctl is not
> > > there...nor is nr_hugepages, for that matter. So maybe I'll just add a
> > > 3rd patch to fix the Documentation? I really just wanted to get the
> > > patches out there as soon as I got them tested...
> >
> > Yeah, that should be fine. Adding nr_hugepages will probably get you
> > bonus points. :)
>
> Documentation: updated hugetlb information
Clearly this is what the subject of the mail should have been too. Sorry
for the typo...
-Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 3/3] Documetation: update hugetlb information
2007-12-13 18:01 ` [RFC][PATCH 3/3] Documetation: update hugetlb information Nishanth Aravamudan
2007-12-13 18:01 ` Nishanth Aravamudan
@ 2007-12-13 19:04 ` Dave Hansen
2007-12-13 19:20 ` Nishanth Aravamudan
2007-12-13 22:17 ` Adam Litke
2 siblings, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2007-12-13 19:04 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: agl, wli, mel, apw, akpm, lee.schermerhorn, linux-mm
On Thu, 2007-12-13 at 10:01 -0800, Nishanth Aravamudan wrote:
> +Caveat: Shrinking the pool via nr_hugepages while a surplus is in effect
> +will allow the number of surplus huge pages to exceed the overcommit
> +value, as the pool hugepages (which must have been in use for surplus
> +hugepages to be allocated) will become surplus hugepages. As long as
> +this condition holds, however, no more surplus huge pages will be
> +allowed on the system until one of the two sysctls is increased
> +sufficiently, or the surplus huge pages go out of use and are freed.
I guess you could, in theory, disallow the writes to the sysctl and
return -EINVAL or -ENOSPC or something. But, I think documenting it
like this is probably OK by itself and is pretty sane behavior given the
circumstances.
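
The caveat quoted above can be made concrete with a toy userspace model of the pool counters. This is only a sketch under stated assumptions: the struct and the pool_* helpers are hypothetical names invented for illustration, not kernel code, and the model ignores reservations, NUMA nodes, and freeing of in-use pages.

```c
#include <stdbool.h>

/* Toy model of the hugepage pool counters (hypothetical, not kernel code). */
struct hugepage_pool {
    unsigned long nr_hugepages;  /* low watermark: persistent pool size */
    unsigned long nr_overcommit; /* cap on surplus pages */
    unsigned long total;         /* pages currently held by the pool */
    unsigned long free;          /* held pages that are not in use */
    unsigned long surplus;       /* held pages above nr_hugepages */
};

static void pool_init(struct hugepage_pool *p, unsigned long nr,
                      unsigned long overcommit)
{
    p->nr_hugepages = nr;
    p->nr_overcommit = overcommit;
    p->total = nr;
    p->free = nr;
    p->surplus = 0;
}

/* Take a free pool page if one exists, else try to allocate a surplus page. */
static bool pool_alloc(struct hugepage_pool *p)
{
    if (p->free > 0) {
        p->free--;
        return true;
    }
    if (p->surplus < p->nr_overcommit) {
        p->total++;
        p->surplus++;
        return true;
    }
    return false;
}

/* Model of a write to nr_hugepages. */
static void pool_set_nr_hugepages(struct hugepage_pool *p, unsigned long new_nr)
{
    /* Shrink: only unused pages can be released. */
    while (p->total > new_nr && p->free > 0) {
        p->total--;
        p->free--;
    }
    /* Grow: add fresh free pages if the pool is below the target. */
    while (p->total < new_nr) {
        p->total++;
        p->free++;
    }
    p->nr_hugepages = new_nr;
    /* In-use pages above the new target are now accounted as surplus. */
    p->surplus = p->total - new_nr;
}
```

In this model, starting with nr_hugepages = 4 and an overcommit of 2, six allocations leave two surplus pages. Shrinking nr_hugepages to 2 while all six pages are in use turns two pool pages into surplus pages, so the surplus (4) exceeds the overcommit value (2) and further allocations fail until a limit is raised or pages are freed.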
-- Dave
* Re: [RFC][PATCH 3/3] Documetation: update hugetlb information
2007-12-13 19:04 ` Dave Hansen
@ 2007-12-13 19:20 ` Nishanth Aravamudan
0 siblings, 0 replies; 18+ messages in thread
From: Nishanth Aravamudan @ 2007-12-13 19:20 UTC (permalink / raw)
To: Dave Hansen; +Cc: agl, wli, mel, apw, akpm, lee.schermerhorn, linux-mm
On 13.12.2007 [11:04:28 -0800], Dave Hansen wrote:
> On Thu, 2007-12-13 at 10:01 -0800, Nishanth Aravamudan wrote:
> > +Caveat: Shrinking the pool via nr_hugepages while a surplus is in effect
> > +will allow the number of surplus huge pages to exceed the overcommit
> > +value, as the pool hugepages (which must have been in use for surplus
> > +hugepages to be allocated) will become surplus hugepages. As long as
> > +this condition holds, however, no more surplus huge pages will be
> > +allowed on the system until one of the two sysctls is increased
> > +sufficiently, or the surplus huge pages go out of use and are freed.
>
> I guess you could, in theory, disallow the writes to the sysctl and
> return -EINVAL or -ENOSPC or something. But, I think documenting it
> like this is probably OK by itself and is pretty sane behavior given
> the circumstances.
That's true, though it would complicate the sysctl callback, which is
currently able to just use one of the generic functions.
I'm willing to investigate changing this, if there is interest.
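
For illustration only, the kind of check a custom callback would need might look like the sketch below. Everything here is an assumption: check_nr_hugepages_write() is a hypothetical helper, not an existing kernel function, and the choice of -EINVAL simply follows one of the suggestions above.

```c
#include <errno.h>

/*
 * Hypothetical validation for a write to nr_hugepages.
 * in_use is the number of pool pages currently in use; those cannot be
 * freed by a shrink, so any of them above new_nr would become surplus.
 * Returns 0 if the write is acceptable, -EINVAL if the resulting
 * surplus would exceed nr_overcommit.
 */
static int check_nr_hugepages_write(unsigned long new_nr,
                                    unsigned long in_use,
                                    unsigned long nr_overcommit)
{
    unsigned long surplus_after = in_use > new_nr ? in_use - new_nr : 0;

    return surplus_after > nr_overcommit ? -EINVAL : 0;
}
```

A handler built this way could no longer be one of the generic proc_* functions, because it has to consult pool state before accepting the value.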
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
* Re: [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl
2007-12-13 7:41 [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl Nishanth Aravamudan
2007-12-13 7:42 ` [RFC][PATCH 2/2] Revert "hugetlb: Add hugetlb_dynamic_pool sysctl" Nishanth Aravamudan
2007-12-13 16:17 ` [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl Dave Hansen
@ 2007-12-13 19:24 ` Nishanth Aravamudan
2007-12-13 22:14 ` Adam Litke
3 siblings, 0 replies; 18+ messages in thread
From: Nishanth Aravamudan @ 2007-12-13 19:24 UTC (permalink / raw)
To: agl; +Cc: wli, mel, apw, akpm, lee.schermerhorn, linux-mm
On 12.12.2007 [23:41:56 -0800], Nishanth Aravamudan wrote:
> hugetlb: introduce nr_overcommit_hugepages sysctl
<snip>
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 8ac5171..b85a128 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -912,6 +912,14 @@ static struct ctl_table vm_table[] = {
> .mode = 0644,
> .proc_handler = &proc_dointvec,
> },
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "nr_overcommit_hugepages",
> + .data = &nr_overcommit_huge_pages,
> + .maxlen = sizeof(nr_overcommit_huge_pages),
> + .mode = 0644,
> + .proc_handler = &proc_doulongvec_minmax,
> + },
Dave's reply regarding the sysctl documentation, while unrelated to this
hunk, did remind me of something I wanted to ask. Having looked at
proc_doulongvec_minmax() a bit, it seems like I'm ok not specifying a
min and max, as the code checks to see if the min and max are specified.
Essentially, I want to allow any unsigned long value. Does this seem ok?
(There doesn't seem to be a proc_doulongvec() like there is a
proc_dointvec().)
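
The "optional bounds" behaviour described above can be sketched in userspace as follows. This is a simplified model of the check, not the kernel source: ulong_write_accepted() is a hypothetical name, and it assumes that an unset bound is represented by a NULL pointer, which is how the message above reads the handler.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of an optional min/max check: a NULL bound means "no limit",
 * so with both bounds NULL every unsigned long value is accepted.
 */
static bool ulong_write_accepted(unsigned long val,
                                 const unsigned long *min,
                                 const unsigned long *max)
{
    if (min && val < *min)
        return false;
    if (max && val > *max)
        return false;
    return true;
}
```

With both pointers NULL, even ULONG_MAX passes the check, which matches the stated goal of allowing any unsigned long value.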
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
* Re: [RFC][PATCH 2/2] Revert "hugetlb: Add hugetlb_dynamic_pool sysctl"
2007-12-13 7:42 ` [RFC][PATCH 2/2] Revert "hugetlb: Add hugetlb_dynamic_pool sysctl" Nishanth Aravamudan
2007-12-13 8:53 ` William Lee Irwin III
@ 2007-12-13 22:14 ` Adam Litke
1 sibling, 0 replies; 18+ messages in thread
From: Adam Litke @ 2007-12-13 22:14 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: wli, mel, apw, akpm, lee.schermerhorn, linux-mm
On Wed, 2007-12-12 at 23:42 -0800, Nishanth Aravamudan wrote:
> Revert "hugetlb: Add hugetlb_dynamic_pool sysctl"
>
> This reverts commit 54f9f80d6543fb7b157d3b11e2e7911dc1379790.
>
> Given the new sysctl nr_overcommit_hugepages, the boolean dynamic pool
> sysctl is not needed, as its semantics can be expressed by 0 in the
> overcommit sysctl (no dynamic pool) and non-0 in the overcommit sysctl
> (pool enabled).
>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Adam Litke <agl@us.ibm.com>
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
* Re: [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl
2007-12-13 7:41 [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl Nishanth Aravamudan
` (2 preceding siblings ...)
2007-12-13 19:24 ` [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl Nishanth Aravamudan
@ 2007-12-13 22:14 ` Adam Litke
3 siblings, 0 replies; 18+ messages in thread
From: Adam Litke @ 2007-12-13 22:14 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: wli, mel, apw, akpm, lee.schermerhorn, linux-mm
On Wed, 2007-12-12 at 23:41 -0800, Nishanth Aravamudan wrote:
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Adam Litke <agl@us.ibm.com>
> ---
> Andrew, since 2.6.24 would be the first release with the dynamic_pool
> sysctl, I think these might deserve consideration for inclusion, even so
> late in the cycle? It all depends on how close to an (important?)
> user-space visible change removing the sysctl might be in 2.6.25?
> Obviously pending comments, acks, nacks from Adam et al.
Also tested on powerpc. I do think this is a more comprehensive
interface and would advocate a conversion to it in time for 2.6.24 if
possible.
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
* Re: [RFC][PATCH 3/3] Documetation: update hugetlb information
2007-12-13 18:01 ` [RFC][PATCH 3/3] Documetation: update hugetlb information Nishanth Aravamudan
2007-12-13 18:01 ` Nishanth Aravamudan
2007-12-13 19:04 ` Dave Hansen
@ 2007-12-13 22:17 ` Adam Litke
2 siblings, 0 replies; 18+ messages in thread
From: Adam Litke @ 2007-12-13 22:17 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Dave Hansen, wli, mel, apw, akpm, lee.schermerhorn, linux-mm
Additionally, I have some diagrams that illustrate the relationship of
the various pool counters and the numerous valid states a hugetlb
pool-managed page can go through. Due to problems with the linux-mm
wiki authentication, I have not been able to post them yet :(
On Thu, 2007-12-13 at 10:01 -0800, Nishanth Aravamudan wrote:
> On 13.12.2007 [09:02:44 -0800], Dave Hansen wrote:
> > On Thu, 2007-12-13 at 08:44 -0800, Nishanth Aravamudan wrote:
> > > Err, yes, will need to update that. I note that the old sysctl is not
> > > there...nor is nr_hugepages, for that matter. So maybe I'll just add a
> > > 3rd patch to fix the Documentation? I really just wanted to get the
> > > patches out there as soon as I got them tested...
> >
> > Yeah, that should be fine. Adding nr_hugepages will probably get you
> > bonus points. :)
>
> Documentation: updated hugetlb information
>
> The hugetlb documentation has gotten a bit out of sync with the current
> code. Update the sysctl file to refer to
> Documentation/vm/hugetlbpage.txt. Update that file to contain the
> current state of affairs (with the newer named sysctl in place).
>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
>
> ---
> This is 3/3 because it depends on 1/2 and 2/2 ... Not sure if this is
> complete enough, either. Adam, do you have any input?
>
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index b89570c..6f31f0a 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -34,6 +34,8 @@ Currently, these files are in /proc/sys/vm:
> - oom_kill_allocating_task
> - mmap_min_address
> - numa_zonelist_order
> +- nr_hugepages
> +- nr_overcommit_hugepages
>
> ==============================================================
>
> @@ -305,3 +307,20 @@ will select "node" order in following case.
>
> Otherwise, "zone" order will be selected. Default order is recommended unless
> this is causing problems for your system/application.
> +
> +==============================================================
> +
> +nr_hugepages
> +
> +Change the minimum size of the hugepage pool.
> +
> +See Documentation/vm/hugetlbpage.txt
> +
> +==============================================================
> +
> +nr_overcommit_hugepages
> +
> +Change the maximum size of the hugepage pool. The maximum is
> +nr_hugepages + nr_overcommit_hugepages.
> +
> +See Documentation/vm/hugetlbpage.txt
> diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
> index 51ccc48..f962d01 100644
> --- a/Documentation/vm/hugetlbpage.txt
> +++ b/Documentation/vm/hugetlbpage.txt
> @@ -30,9 +30,10 @@ alignment and size of the arguments to the above system calls.
> The output of "cat /proc/meminfo" will have lines like:
>
> .....
> -HugePages_Total: xxx
> -HugePages_Free: yyy
> -HugePages_Rsvd: www
> +HugePages_Total: vvv
> +HugePages_Free: www
> +HugePages_Rsvd: xxx
> +HugePages_Surp: yyy
> Hugepagesize: zzz kB
>
> where:
> @@ -42,6 +43,10 @@ allocated.
> HugePages_Rsvd is short for "reserved," and is the number of hugepages
> for which a commitment to allocate from the pool has been made, but no
> allocation has yet been made. It's vaguely analogous to overcommit.
> +HugePages_Surp is short for "surplus," and is the number of hugepages in
> +the pool above the value in /proc/sys/vm/nr_hugepages. The maximum
> +number of surplus hugepages is controlled by
> +/proc/sys/vm/nr_overcommit_hugepages.
>
> /proc/filesystems should also show a filesystem of type "hugetlbfs" configured
> in the kernel.
> @@ -71,7 +76,25 @@ or failure of allocation depends on the amount of physically contiguous
> memory that is preset in system at this time. System administrators may want
> to put this command in one of the local rc init files. This will enable the
> kernel to request huge pages early in the boot process (when the possibility
> -of getting physical contiguous pages is still very high).
> +of getting physical contiguous pages is still very high). In either
> +case, administrators will want to verify the number of hugepages actually
> +allocated by checking the sysctl or meminfo.
> +
> +/proc/sys/vm/nr_overcommit_hugepages indicates how large the pool of
> +hugepages can grow, if more hugepages than /proc/sys/vm/nr_hugepages are
> +requested by applications. echo'ing any non-zero value into this file
> +indicates that the hugetlb subsystem is allowed to try to obtain
> +hugepages from the buddy allocator, if the normal pool is exhausted. As
> +these surplus hugepages go out of use, they are freed back to the buddy
> +allocator.
> +
> +Caveat: Shrinking the pool via nr_hugepages while a surplus is in effect
> +will allow the number of surplus huge pages to exceed the overcommit
> +value, as the pool hugepages (which must have been in use for surplus
> +hugepages to be allocated) will become surplus hugepages. As long as
> +this condition holds, however, no more surplus huge pages will be
> +allowed on the system until one of the two sysctls is increased
> +sufficiently, or the surplus huge pages go out of use and are freed.
>
> If the user applications are going to request hugepages using mmap system
> call, then it is required that system administrator mount a file system of
> @@ -94,8 +117,8 @@ provided on command line then no limits are set. For size and nr_inodes
> options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For
> example, size=2K has the same meaning as size=2048.
>
> -read and write system calls are not supported on files that reside on hugetlb
> -file systems.
> +While read system calls are supported on files that reside on hugetlb
> +file systems, write system calls are not.
>
> Regular chown, chgrp, and chmod commands (with right permissions) could be
> used to change the file attributes on hugetlbfs.
>
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
end of thread, other threads: [~2007-12-13 22:15 UTC]
Thread overview: 18+ messages
2007-12-13 7:41 [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl Nishanth Aravamudan
2007-12-13 7:42 ` [RFC][PATCH 2/2] Revert "hugetlb: Add hugetlb_dynamic_pool sysctl" Nishanth Aravamudan
2007-12-13 8:53 ` William Lee Irwin III
2007-12-13 16:47 ` Nishanth Aravamudan
2007-12-13 17:37 ` William Lee Irwin III
2007-12-13 22:14 ` Adam Litke
2007-12-13 16:17 ` [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl Dave Hansen
2007-12-13 16:44 ` Nishanth Aravamudan
2007-12-13 16:49 ` Nishanth Aravamudan
2007-12-13 17:03 ` Dave Hansen
2007-12-13 17:02 ` Dave Hansen
2007-12-13 18:01 ` [RFC][PATCH 3/3] Documetation: update hugetlb information Nishanth Aravamudan
2007-12-13 18:01 ` Nishanth Aravamudan
2007-12-13 19:04 ` Dave Hansen
2007-12-13 19:20 ` Nishanth Aravamudan
2007-12-13 22:17 ` Adam Litke
2007-12-13 19:24 ` [RFC][PATCH 1/2] hugetlb: introduce nr_overcommit_hugepages sysctl Nishanth Aravamudan
2007-12-13 22:14 ` Adam Litke