* [PATCH 1/4] hugetlb: search harder for memory in alloc_fresh_huge_page()
@ 2007-09-06 18:21 Nishanth Aravamudan
From: Nishanth Aravamudan @ 2007-09-06 18:21 UTC
To: clameter; +Cc: wli, agl, lee.schermerhorn, linux-mm
Currently, alloc_fresh_huge_page() returns NULL when it cannot allocate
a huge page on the current node, as selected by its private interleave
counter. Its callers, though, treat that failure as meaning no hugepages
can be allocated anywhere on the system. That is not necessarily true:
on an uneven NUMA system, for instance, the allocation (made with
__GFP_THISNODE) may land on a node with less memory and fail, while
plenty of free memory remains on the other nodes.
To correct this, make alloc_fresh_huge_page() search through all online
nodes before deciding no hugepages can be allocated, and add a helper
function that performs the actual per-node allocation. Also, although
__GFP_THISNODE now enforces the expected semantics -- the allocation
will not go off-node -- keep using page_to_nid() for the per-node
counters so the accounting cannot be skewed.
Tested on 4-node ppc64, 2-node ia64 and 4-node x86_64.
Before this patch on a 4-node ppc64 with the following memory
characteristics:
Node 0 MemTotal: 1310720 kB
Node 1 MemTotal: 1048576 kB
Node 2 MemTotal: 1048576 kB
Node 3 MemTotal: 786432 kB
Trying to clear the hugetlb pool
Done. 0 free
Trying to resize the pool to 100
Node 0 HugePages_Free: 25
Node 1 HugePages_Free: 25
Node 2 HugePages_Free: 25
Node 3 HugePages_Free: 25
Done. Initially 100 free
Trying to resize the pool to 200
Node 0 HugePages_Free: 50
Node 1 HugePages_Free: 57
Node 2 HugePages_Free: 52
Node 3 HugePages_Free: 41
Done. 200 free
After:
Trying to clear the hugetlb pool
Done. 0 free
Trying to resize the pool to 100
Node 0 HugePages_Free: 25
Node 1 HugePages_Free: 25
Node 2 HugePages_Free: 25
Node 3 HugePages_Free: 25
Done. Initially 100 free
Trying to resize the pool to 200
Node 0 HugePages_Free: 53
Node 1 HugePages_Free: 53
Node 2 HugePages_Free: 52
Node 3 HugePages_Free: 42
Done. 200 free
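(For reference, the per-node numbers above can be reproduced by resizing
the pool and reading each node's meminfo -- an illustrative sketch, not
necessarily the exact test script used:)
  # resize the pool, then show how it spread across the nodes
  echo 200 > /proc/sys/vm/nr_hugepages
  grep HugePages_Free /sys/devices/system/node/node*/meminfo
  grep HugePages_Free /proc/meminfo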
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c53bd5a..edb2100 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -101,26 +101,13 @@ static void free_huge_page(struct page *page)
spin_unlock(&hugetlb_lock);
}
-static int alloc_fresh_huge_page(void)
+static struct page *alloc_fresh_huge_page_node(int nid)
{
- static int prev_nid;
struct page *page;
- int nid;
-
- /*
- * Copy static prev_nid to local nid, work on that, then copy it
- * back to prev_nid afterwards: otherwise there's a window in which
- * a racer might pass invalid nid MAX_NUMNODES to alloc_pages_node.
- * But we don't need to use a spin_lock here: it really doesn't
- * matter if occasionally a racer chooses the same nid as we do.
- */
- nid = next_node(prev_nid, node_online_map);
- if (nid == MAX_NUMNODES)
- nid = first_node(node_online_map);
- prev_nid = nid;
- page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN,
- HUGETLB_PAGE_ORDER);
+ page = alloc_pages_node(nid,
+ htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|__GFP_NOWARN,
+ HUGETLB_PAGE_ORDER);
if (page) {
set_compound_page_dtor(page, free_huge_page);
spin_lock(&hugetlb_lock);
@@ -128,9 +115,45 @@ static int alloc_fresh_huge_page(void)
nr_huge_pages_node[page_to_nid(page)]++;
spin_unlock(&hugetlb_lock);
put_page(page); /* free it into the hugepage allocator */
- return 1;
}
- return 0;
+
+ return page;
+}
+
+static int alloc_fresh_huge_page(void)
+{
+ static int nid = -1;
+ struct page *page;
+ int start_nid;
+ int next_nid;
+ int ret = 0;
+
+ if (nid < 0)
+ nid = first_node(node_online_map);
+ start_nid = nid;
+
+ do {
+ page = alloc_fresh_huge_page_node(nid);
+ if (page)
+ ret = 1;
+ /*
+ * Use a helper variable to find the next node and then
+ * copy it back to nid afterwards: otherwise there's
+ * a window in which a racer might pass invalid nid
+ * MAX_NUMNODES to alloc_pages_node. But we don't need
+ * to use a spin_lock here: it really doesn't matter if
+ * occasionally a racer chooses the same nid as we do.
+ * Move nid forward in the mask even if we just
+ * successfully allocated a hugepage so that the next
+ * caller gets hugepages on the next node.
+ */
+ next_nid = next_node(nid, node_online_map);
+ if (next_nid == MAX_NUMNODES)
+ next_nid = first_node(node_online_map);
+ nid = next_nid;
+ } while (!page && nid != start_nid);
+
+ return ret;
}
static struct page *alloc_huge_page(struct vm_area_struct *vma,
--

* [PATCH 2/4] hugetlb: fix pool allocation with empty nodes
@ 2007-09-06 18:24 Nishanth Aravamudan
From: Nishanth Aravamudan @ 2007-09-06 18:24 UTC
To: clameter; +Cc: anton, wli, agl, lee.schermerhorn, linux-mm

Anton found a problem with the hugetlb pool allocation when some nodes
have no memory (http://marc.info/?l=linux-mm&m=118133042025995&w=2). Lee
worked on versions that tried to fix it, but none were accepted.
Christoph has created a set of patches which allow GFP_THISNODE
allocations to fail if the node has no memory, and which export a
nodemask indicating which nodes have memory. Simply interleave across
this nodemask rather than the online nodemask.
Tested on 4-node ppc64, 2-node ia64 and 4-node x86_64.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
My 4-node ppc64 box with memoryless nodes is having issues with
2.6.23-rc4-mm1, so I'm unable to test. Lee, could you give this a spin?

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index edb2100..cc875c6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -129,7 +129,7 @@ static int alloc_fresh_huge_page(void)
 int ret = 0;
 if (nid < 0)
- nid = first_node(node_online_map);
+ nid = first_node(node_states[N_HIGH_MEMORY]);
 start_nid = nid;
 do {
@@ -147,9 +147,9 @@ static int alloc_fresh_huge_page(void)
 * successfully allocated a hugepage so that the next
 * caller gets hugepages on the next node.
 */
- next_nid = next_node(nid, node_online_map);
+ next_nid = next_node(nid, node_states[N_HIGH_MEMORY]);
 if (next_nid == MAX_NUMNODES)
- next_nid = first_node(node_online_map);
+ next_nid = first_node(node_states[N_HIGH_MEMORY]);
 nid = next_nid;
 } while (!page && nid != start_nid);
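
(The distinction the patch relies on: a node can be online yet own no
memory at all, so interleaving over node_online_map lets the pool
allocator spin on nodes that can never satisfy a __GFP_THISNODE request.
A minimal sketch of the two masks, assuming the node_states API from the
-mm memoryless-nodes series and a hypothetical memoryless node 1:)

	/* illustrative only: node 1 stands in for a memoryless node */
	if (node_online(1))
		; /* true: the node exists and is online */
	if (node_state(1, N_HIGH_MEMORY))
		; /* false: the node owns no memory of any kind */

	/* so the pool code should walk only nodes that can back a hugepage */
	int nid;
	for_each_node_state(nid, N_HIGH_MEMORY)
		alloc_fresh_huge_page_node(nid);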

* [PATCH 3/4] hugetlb: interleave dequeueing of huge pages
@ 2007-09-06 18:27 Nishanth Aravamudan
From: Nishanth Aravamudan @ 2007-09-06 18:27 UTC
To: clameter; +Cc: wli, agl, lee.schermerhorn, linux-mm

Currently, when shrinking the hugetlb pool, we free all of the pages on
node 0, then all of the pages on node 1, etc. Instead, interleave the
freeing over the nodes with memory. If some particular node should be
cleared first, the to-be-introduced sysfs allocator can be used for
finer-grained control. This also helps keep the pool balanced as it is
resized at run-time.
Tested on 4-node ppc64, 2-node ia64 and 4-node x86_64.
Before, on the same ppc64 box as 1/4:
Trying to resize the pool to 200
Node 0 HugePages_Free: 53
Node 1 HugePages_Free: 53
Node 2 HugePages_Free: 53
Node 3 HugePages_Free: 41
Done. 200 free
Trying to resize the pool back to 100
Node 0 HugePages_Free: 0
Node 1 HugePages_Free: 6
Node 2 HugePages_Free: 53
Node 3 HugePages_Free: 41
Done. 100 free
After:
Trying to resize the pool to 200
Node 0 HugePages_Free: 53
Node 1 HugePages_Free: 52
Node 2 HugePages_Free: 52
Node 3 HugePages_Free: 43
Done. 200 free
Trying to resize the pool back to 100
Node 0 HugePages_Free: 28
Node 1 HugePages_Free: 27
Node 2 HugePages_Free: 27
Node 3 HugePages_Free: 18
Done. 100 free
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cc875c6..6a732bb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -66,11 +66,56 @@ static void enqueue_huge_page(struct page *page)
 free_huge_pages_node[nid]++;
 }
-static struct page *dequeue_huge_page(struct vm_area_struct *vma,
+static struct page *dequeue_huge_page_node(int nid)
+{
+ struct page *page;
+
+ page = list_entry(hugepage_freelists[nid].next,
+ struct page, lru);
+ list_del(&page->lru);
+ free_huge_pages--;
+ free_huge_pages_node[nid]--;
+ return page;
+}
+
+static struct page *dequeue_huge_page(void)
+{
+ static int nid = -1;
+ struct page *page = NULL;
+ int start_nid;
+ int next_nid;
+
+ if (nid < 0)
+ nid = first_node(node_states[N_HIGH_MEMORY]);
+ start_nid = nid;
+
+ do {
+ if (!list_empty(&hugepage_freelists[nid]))
+ page = dequeue_huge_page_node(nid);
+ /*
+ * Use a helper variable to find the next node and then
+ * copy it back to nid afterwards: otherwise there's
+ * a window in which a racer might pass invalid nid
+ * MAX_NUMNODES to dequeue_huge_page_node. But we don't
+ * need to use a spin_lock here: it really doesn't
+ * matter if occasionally a racer chooses the same nid
+ * as we do. Move nid forward in the mask even if we
+ * just successfully dequeued a hugepage so that the
+ * next caller frees hugepages on the next node.
+ */
+ next_nid = next_node(nid, node_states[N_HIGH_MEMORY]);
+ if (next_nid == MAX_NUMNODES)
+ next_nid = first_node(node_states[N_HIGH_MEMORY]);
+ nid = next_nid;
+ } while (!page && nid != start_nid);
+
+ return page;
+}
+
+static struct page *dequeue_huge_page_vma(struct vm_area_struct *vma,
 unsigned long address)
 {
 int nid;
- struct page *page = NULL;
 struct zonelist *zonelist = huge_zonelist(vma, address,
 htlb_alloc_mask);
 struct zone **z;
@@ -79,15 +124,10 @@ static struct page *dequeue_huge_page(struct vm_area_struct *vma,
 nid = zone_to_nid(*z);
 if (cpuset_zone_allowed_softwall(*z, htlb_alloc_mask) &&
 !list_empty(&hugepage_freelists[nid])) {
- page = list_entry(hugepage_freelists[nid].next,
- struct page, lru);
- list_del(&page->lru);
- free_huge_pages--;
- free_huge_pages_node[nid]--;
- break;
+ return dequeue_huge_page_node(nid);
 }
 }
- return page;
+ return NULL;
 }
 static void free_huge_page(struct page *page)
@@ -167,7 +207,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 else if (free_huge_pages <= resv_huge_pages)
 goto fail;
- page = dequeue_huge_page(vma, addr);
+ page = dequeue_huge_page_vma(vma, addr);
 if (!page)
 goto fail;
@@ -275,7 +315,7 @@ static unsigned long set_max_huge_pages(unsigned long count)
 count = max(count, resv_huge_pages);
 try_to_free_low(count);
 while (count < nr_huge_pages) {
- struct page *page = dequeue_huge_page(NULL, 0);
+ struct page *page = dequeue_huge_page();
 if (!page)
 break;
 update_and_free_page(page);
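
(Both directions can be exercised from userspace through the existing
sysctl -- a hedged sketch of how the before/after numbers above would
be gathered:)
	echo 200 > /proc/sys/vm/nr_hugepages	# grow: interleaved since 1/4
	echo 100 > /proc/sys/vm/nr_hugepages	# shrink: now also interleaved
	grep HugePages_Free /sys/devices/system/node/node*/meminfo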

* [PATCH 4/4] hugetlb: add per-node nr_hugepages sysfs attribute
@ 2007-09-06 18:28 Nishanth Aravamudan
From: Nishanth Aravamudan @ 2007-09-06 18:28 UTC
To: clameter; +Cc: wli, agl, lee.schermerhorn, linux-mm

Allow specifying the number of hugepages to allocate on a particular
node. Our current global sysctl will try its best to put hugepages
equally on each node, but that may not always be desired. This allows
the admin to control the layout of hugepage allocation at a finer level
(while not breaking the existing interface).
Add callbacks in the sysfs node registration and unregistration
functions into hugetlb to add the nr_hugepages attribute, which is a
no-op if !NUMA or !HUGETLB.
Tested on 4-node ppc64, 2-node ia64 and 4-node x86_64.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

diff --git a/drivers/base/node.c b/drivers/base/node.c
index cae346e..c9d531f 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -151,6 +151,7 @@ int register_node(struct node *node, int num, struct node *parent)
 sysdev_create_file(&node->sysdev, &attr_meminfo);
 sysdev_create_file(&node->sysdev, &attr_numastat);
 sysdev_create_file(&node->sysdev, &attr_distance);
+ hugetlb_register_node(node);
 }
 return error;
 }
@@ -168,6 +169,7 @@ void unregister_node(struct node *node)
 sysdev_remove_file(&node->sysdev, &attr_meminfo);
 sysdev_remove_file(&node->sysdev, &attr_numastat);
 sysdev_remove_file(&node->sysdev, &attr_distance);
+ hugetlb_unregister_node(node);
 sysdev_unregister(&node->sysdev);
 }
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3a19b03..f8260ac 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -6,7 +6,9 @@
 #ifdef CONFIG_HUGETLB_PAGE
 #include <linux/mempolicy.h>
+#include <linux/node.h>
 #include <linux/shm.h>
+#include <linux/sysdev.h>
 #include <asm/tlbflush.h>
 struct ctl_table;
@@ -25,6 +27,13 @@ void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long)
 int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
 int hugetlb_report_meminfo(char *);
 int hugetlb_report_node_meminfo(int, char *);
+#ifdef CONFIG_NUMA
+int hugetlb_register_node(struct node *);
+void hugetlb_unregister_node(struct node *);
+#else
+#define hugetlb_register_node(node) 0
+#define hugetlb_unregister_node(node) do {} while(0)
+#endif
 unsigned long hugetlb_total_pages(void);
 int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 unsigned long address, int write_access);
@@ -112,6 +121,8 @@ static inline unsigned long hugetlb_total_pages(void)
 #define unmap_hugepage_range(vma, start, end) BUG()
 #define hugetlb_report_meminfo(buf) 0
 #define hugetlb_report_node_meminfo(n, buf) 0
+#define hugetlb_register_node(node) 0
+#define hugetlb_unregister_node(node) do {} while(0)
 #define follow_huge_pmd(mm, addr, pmd, write) NULL
 #define prepare_hugepage_range(addr,len) (-EINVAL)
 #define pmd_huge(x) 0
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6a732bb..58306cd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -261,12 +261,11 @@ static unsigned int cpuset_mems_nr(unsigned int *array)
 return nr;
 }
-#ifdef CONFIG_SYSCTL
-static void update_and_free_page(struct page *page)
+static void update_and_free_page(int nid, struct page *page)
 {
 int i;
 nr_huge_pages--;
- nr_huge_pages_node[page_to_nid(page)]--;
+ nr_huge_pages_node[nid]--;
 for (i = 0; i < (HPAGE_SIZE / PAGE_SIZE); i++) {
 page[i].flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
 1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
@@ -278,30 +277,42 @@ static void update_and_free_page(struct page *page)
 }
 #ifdef CONFIG_HIGHMEM
+static void try_to_free_low_node(int nid, unsigned long count)
+{
+ struct page *page, *next;
+ list_for_each_entry_safe(page, next, &hugepage_freelists[nid], lru) {
+ if (PageHighMem(page))
+ continue;
+ list_del(&page->lru);
+ update_and_free_page(nid, page);
+ free_huge_pages--;
+ free_huge_pages_node[nid]--;
+ if (count >= nr_huge_pages_node[nid])
+ return;
+ }
+}
+
 static void try_to_free_low(unsigned long count)
 {
 int i;
 for (i = 0; i < MAX_NUMNODES; ++i) {
- struct page *page, *next;
- list_for_each_entry_safe(page, next, &hugepage_freelists[i], lru) {
- if (PageHighMem(page))
- continue;
- list_del(&page->lru);
- update_and_free_page(page);
- free_huge_pages--;
- free_huge_pages_node[page_to_nid(page)]--;
- if (count >= nr_huge_pages)
- return;
- }
+ try_to_free_low_node(i, count);
+ if (count >= nr_huge_pages)
+ return;
 }
 }
 #else
+static inline void try_to_free_low_node(int nid, unsigned long count)
+{
+}
+
 static inline void try_to_free_low(unsigned long count)
 {
 }
 #endif
+#ifdef CONFIG_SYSCTL
 static unsigned long set_max_huge_pages(unsigned long count)
 {
 while (count > nr_huge_pages) {
@@ -318,7 +329,7 @@ static unsigned long set_max_huge_pages(unsigned long count)
 struct page *page = dequeue_huge_page();
 if (!page)
 break;
- update_and_free_page(page);
+ update_and_free_page(page_to_nid(page), page);
 }
 spin_unlock(&hugetlb_lock);
 return nr_huge_pages;
@@ -369,6 +380,67 @@ int hugetlb_report_node_meminfo(int nid, char *buf)
 nid, free_huge_pages_node[nid]);
 }
+#ifdef CONFIG_NUMA
+static ssize_t hugetlb_read_nr_hugepages_node(struct sys_device *dev,
+ char *buf)
+{
+ return sprintf(buf, "%u\n", nr_huge_pages_node[dev->id]);
+}
+
+static ssize_t hugetlb_write_nr_hugepages_node(struct sys_device *dev,
+ const char *buf, size_t count)
+{
+ int nid = dev->id;
+ unsigned long target;
+ unsigned long free_on_other_nodes;
+ unsigned long nr_huge_pages_req = simple_strtoul(buf, NULL, 10);
+
+ while (nr_huge_pages_req > nr_huge_pages_node[nid]) {
+ if (!alloc_fresh_huge_page_node(nid))
+ return count;
+ }
+ if (nr_huge_pages_req >= nr_huge_pages_node[nid])
+ return count;
+
+ /* need to ensure that our counts are accurate */
+ spin_lock(&hugetlb_lock);
+ free_on_other_nodes = free_huge_pages - free_huge_pages_node[nid];
+ if (free_on_other_nodes >= resv_huge_pages) {
+ /* other nodes can satisfy reserve */
+ target = nr_huge_pages_req;
+ } else {
+ /* this node needs some free to satisfy reserve */
+ target = max((resv_huge_pages - free_on_other_nodes),
+ nr_huge_pages_req);
+ }
+ try_to_free_low_node(nid, target);
+ while (target < nr_huge_pages_node[nid]) {
+ struct page *page = dequeue_huge_page_node(nid);
+ if (!page)
+ break;
+ update_and_free_page(nid, page);
+ }
+ spin_unlock(&hugetlb_lock);
+
+ return count;
+}
+
+static SYSDEV_ATTR(nr_hugepages, S_IRUGO | S_IWUSR,
+ hugetlb_read_nr_hugepages_node,
+ hugetlb_write_nr_hugepages_node);
+
+int hugetlb_register_node(struct node *node)
+{
+ return sysdev_create_file(&node->sysdev, &attr_nr_hugepages);
+}
+
+void hugetlb_unregister_node(struct node *node)
+{
+ sysdev_remove_file(&node->sysdev, &attr_nr_hugepages);
+}
+
+#endif
+
 /* Return the number pages of memory we physically have, in PAGE_SIZE
 units. */
 unsigned long hugetlb_total_pages(void)
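
(Usage sketch for the new per-node attribute -- the path follows from
the node sysdev registration above, though the exact layout depends on
the running kernel:)
	# ask for 50 hugepages on node 2 only, then read the count back
	echo 50 > /sys/devices/system/node/node2/nr_hugepages
	cat /sys/devices/system/node/node2/nr_hugepages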

* Re: [PATCH 4/4] hugetlb: add per-node nr_hugepages sysfs attribute
@ 2007-09-14 18:56 Christoph Lameter
From: Christoph Lameter @ 2007-09-14 18:56 UTC
To: Nishanth Aravamudan; +Cc: wli, agl, lee.schermerhorn, linux-mm

On Thu, 6 Sep 2007, Nishanth Aravamudan wrote:
> hugetlb: add per-node nr_hugepages sysfs attribute
Looks good. Nice new functionality.

* Re: [PATCH 3/4] hugetlb: interleave dequeueing of huge pages
@ 2007-09-14 18:54 Christoph Lameter
From: Christoph Lameter @ 2007-09-14 18:54 UTC
To: Nishanth Aravamudan; +Cc: wli, agl, lee.schermerhorn, linux-mm

On Thu, 6 Sep 2007, Nishanth Aravamudan wrote:
> +static struct page *dequeue_huge_page(void)
> +{
> + static int nid = -1;
> + struct page *page = NULL;
> + int start_nid;
> + int next_nid;
> +
> + if (nid < 0)
> + nid = first_node(node_states[N_HIGH_MEMORY]);
> + start_nid = nid;
nid is -1 so the tests are useless.

* Re: [PATCH 3/4] hugetlb: interleave dequeueing of huge pages
@ 2007-09-14 19:03 Lee Schermerhorn
From: Lee Schermerhorn @ 2007-09-14 19:03 UTC
To: Christoph Lameter; +Cc: Nishanth Aravamudan, wli, agl, linux-mm

On Fri, 2007-09-14 at 11:54 -0700, Christoph Lameter wrote:
> > +static struct page *dequeue_huge_page(void)
> > +{
> > + static int nid = -1;
> > + struct page *page = NULL;
> > + int start_nid;
> > + int next_nid;
> > +
> > + if (nid < 0)
> > + nid = first_node(node_states[N_HIGH_MEMORY]);
> > + start_nid = nid;
>
> nid is -1 so the tests are useless.
start_nid is a [private] static variable. It is initialized to -1 at
boot, and thereafter loops around the nodes on each call, as huge pages
are allocated. It is only == -1 on the very first call to this
function. I think it has worked like this since hugetlbfs was added.
Lee

* Re: [PATCH 3/4] hugetlb: interleave dequeueing of huge pages
@ 2007-09-14 19:42 Christoph Lameter
From: Christoph Lameter @ 2007-09-14 19:42 UTC
To: Lee Schermerhorn; +Cc: Nishanth Aravamudan, wli, agl, linux-mm

On Fri, 14 Sep 2007, Lee Schermerhorn wrote:
> start_nid is a [private] static variable. It is initialized to -1 at
> boot, and thereafter loops around the nodes on each call, as huge
> pages are allocated. It is only == -1 on the very first call to this
> function. I think it has worked like this since hugetlbfs was added.
Ahh, not start_nid but nid is a static variable that is initialized to
-1. Could we move that out of dequeue_huge_page? It's confusing.

* Re: [PATCH 3/4] hugetlb: interleave dequeueing of huge pages
@ 2007-09-14 20:09 Lee Schermerhorn
From: Lee Schermerhorn @ 2007-09-14 20:09 UTC
To: Christoph Lameter; +Cc: Nishanth Aravamudan, wli, agl, linux-mm

On Fri, 2007-09-14 at 12:42 -0700, Christoph Lameter wrote:
> Ahh, not start_nid but nid is a static variable that is initialized
> to -1. Could we move that out of dequeue_huge_page? It's confusing.
Yeah, I mistyped... But nid IS private to that function. This is a
valid use of static. But perhaps it could use a comment to call
attention to it.
Lee

* Re: [PATCH 3/4] hugetlb: interleave dequeueing of huge pages
@ 2007-09-14 20:16 Christoph Lameter
From: Christoph Lameter @ 2007-09-14 20:16 UTC
To: Lee Schermerhorn; +Cc: Nishanth Aravamudan, wli, agl, linux-mm

On Fri, 14 Sep 2007, Lee Schermerhorn wrote:
> Yeah, I mistyped... But nid IS private to that function. This is a
> valid use of static. But perhaps it could use a comment to call
> attention to it.
I think it's best to move nid outside of the function and give it a
longer name that is distinctive from the names we use for local
variables. F.e.
last_allocated_node
?

* Re: [PATCH 3/4] hugetlb: interleave dequeueing of huge pages
@ 2007-09-14 20:33 Lee Schermerhorn
From: Lee Schermerhorn @ 2007-09-14 20:33 UTC
To: Christoph Lameter; +Cc: Nishanth Aravamudan, wli, agl, linux-mm

On Fri, 2007-09-14 at 13:16 -0700, Christoph Lameter wrote:
> I think it's best to move nid outside of the function and give it a
> longer name that is distinctive from the names we use for local
> variables. F.e.
>
> last_allocated_node
>
> ?
I do like to see variables' [and functions'] visibility kept within the
minimum necessary scope, and moving it outside of the function violates
this. Nothing else in the source file needs it. But if Nish agrees, I
guess I don't feel that strongly about it. I like the suggested name,
tho'
Lee
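
(A userspace analogue of the idiom under discussion -- a file-scope
static acting as a round-robin cursor -- using Christoph's suggested
name; illustrative only, not the kernel code that was eventually merged:)

	#include <stdio.h>

	#define MAX_NUMNODES 4

	/* file scope, with a name distinct from any local variable */
	static int last_allocated_node = -1;

	static int next_node_to_use(void)
	{
		int nid;

		if (last_allocated_node < 0)	/* only true on the first call */
			last_allocated_node = 0;
		nid = last_allocated_node;
		/* advance the cursor, wrapping around the node mask */
		last_allocated_node = (nid + 1) % MAX_NUMNODES;
		return nid;
	}

	int main(void)
	{
		int i;

		for (i = 0; i < 6; i++)
			printf("operate on node %d\n", next_node_to_use());
		return 0;
	}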

* Re: [PATCH 3/4] hugetlb: interleave dequeueing of huge pages
@ 2007-09-24 23:23 Nishanth Aravamudan
From: Nishanth Aravamudan @ 2007-09-24 23:23 UTC
To: Lee Schermerhorn; +Cc: Christoph Lameter, wli, agl, linux-mm

On 14.09.2007 [16:33:00 -0400], Lee Schermerhorn wrote:
> I do like to see variables' [and functions'] visibility kept within
> the minimum necessary scope, and moving it outside of the function
> violates this. Nothing else in the source file needs it. But if Nish
> agrees, I guess I don't feel that strongly about it. I like the
> suggested name, tho'
I've changed the name, but I don't see how moving the scope helps. I
guess I could make it globally static -- as opposed to local to the
function -- and then it would be easier to dequeue based upon the
global's value (something Lee asked for earlier). However, that would
require locking to avoid races between two processes both echo'ing
values into the sysctl? I guess it's not a serious race with the sanity
check that Andrew has in there, it just means sometimes a node might
get skipped in the interleaving...
Thanks,
Nish

* Re: [PATCH 3/4] hugetlb: interleave dequeueing of huge pages
@ 2007-09-24 23:29 Nishanth Aravamudan
From: Nishanth Aravamudan @ 2007-09-24 23:29 UTC
To: Lee Schermerhorn; +Cc: Christoph Lameter, wli, agl, linux-mm

On 24.09.2007 [16:23:46 -0700], Nishanth Aravamudan wrote:
> However, that would require locking to avoid races between two
> processes both echo'ing values into the sysctl? I guess it's not a
> serious race with the sanity check that Andrew has in there, it just
> means sometimes a node might get skipped in the interleaving...
err, not skipped, but allocated to twice. Then again, we already have a
comment to that effect now. So I'll go ahead and test this out.
Thanks,
Nish

* Re: [PATCH 2/4] hugetlb: fix pool allocation with empty nodes
@ 2007-09-14 18:53 Christoph Lameter
From: Christoph Lameter @ 2007-09-14 18:53 UTC
To: Nishanth Aravamudan; +Cc: anton, wli, agl, lee.schermerhorn, linux-mm

On Thu, 6 Sep 2007, Nishanth Aravamudan wrote:
> if (nid < 0)
> - nid = first_node(node_online_map);
> + nid = first_node(node_states[N_HIGH_MEMORY]);
> start_nid = nid;
Can huge pages live in high memory? Otherwise I think we could use
N_REGULAR_MEMORY here. There may be issues on 32 bit NUMA if we attempt
to allocate memory from the highmem nodes.

* Re: [PATCH 2/4] hugetlb: fix pool allocation with empty nodes
@ 2007-09-14 18:57 Christoph Lameter
From: Christoph Lameter @ 2007-09-14 18:57 UTC
To: Nishanth Aravamudan; +Cc: anton, wli, agl, lee.schermerhorn, linux-mm

Actually we may want to introduce a new nodemask, N_HUGEPAGES or so?
That could contain the nodemask determined at boot?

* Re: [PATCH 2/4] hugetlb: fix pool allocation with empty nodes
@ 2007-10-02 22:47 Nishanth Aravamudan
From: Nishanth Aravamudan @ 2007-10-02 22:47 UTC
To: Christoph Lameter; +Cc: anton, wli, agl, lee.schermerhorn, linux-mm

On 14.09.2007 [11:53:25 -0700], Christoph Lameter wrote:
> Can huge pages live in high memory? Otherwise I think we could use
> N_REGULAR_MEMORY here. There may be issues on 32 bit NUMA if we
> attempt to allocate memory from the highmem nodes.
hugepages are allocated with:
 htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|__GFP_NOWARN
where
 static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
which, in turn, is:
 #define GFP_HIGHUSER \
  (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | __GFP_HIGHMEM)
So, yes, they can come from HIGHMEM, AFAICT. And I've tested this
patchset (at some point in the past, admittedly) on NUMA-Q.
But I'm confused by your question altogether now. Looking at
2.6.23-rc8-mm2:
memoryless-nodes-introduce-mask-of-nodes-with-memory
(74a0f5ea5609629a07fd73d59bde255a56a57fa5):
 A node has its bit in N_HIGH_MEMORY set if it has any memory,
 regardless of the type of memory. If a node has memory then it has
 at least one zone defined in its pgdat structure that is located
 in the pgdat itself.
And, indeed, if CONFIG_HIGHMEM is off, N_HIGH_MEMORY == N_NORMAL_MEMORY.
So I think I'm ok? I'll make sure to test on 32-bit NUMA (well, if
2.6.23-rc8-mm2 works on it, of course. Looks like -mm1 did and -mm2 is
still pending.)
Thanks,
Nish

* Re: [PATCH 2/4] hugetlb: fix pool allocation with empty nodes
@ 2007-10-02 23:39 Christoph Lameter
From: Christoph Lameter @ 2007-10-02 23:39 UTC
To: Nishanth Aravamudan; +Cc: anton, wli, agl, lee.schermerhorn, linux-mm

On Tue, 2 Oct 2007, Nishanth Aravamudan wrote:
> A node has its bit in N_HIGH_MEMORY set if it has any memory,
> regardless of the type of memory. If a node has memory then it has at
> least one zone defined in its pgdat structure that is located in the
> pgdat itself.
>
> And, indeed, if CONFIG_HIGHMEM is off, N_HIGH_MEMORY == N_NORMAL_MEMORY.
>
> So I think I'm ok?
Yes, that reasoning sounds sane.

* Re: [PATCH 1/4] hugetlb: search harder for memory in alloc_fresh_huge_page()
@ 2007-09-14 17:26 Nishanth Aravamudan
From: Nishanth Aravamudan @ 2007-09-14 17:26 UTC
To: clameter; +Cc: wli, agl, lee.schermerhorn, akpm, linux-mm

On 06.09.2007 [11:21:34 -0700], Nishanth Aravamudan wrote:
> hugetlb: search harder for memory in alloc_fresh_huge_page()
>
> Currently, alloc_fresh_huge_page() returns NULL when it cannot
> allocate a huge page on the current node, as selected by its private
> interleave counter. Its callers, though, treat that failure as
> meaning no hugepages can be allocated anywhere on the system. That is
> not necessarily true: on an uneven NUMA system, for instance, the
> allocation (made with __GFP_THISNODE) may land on a node with less
> memory and fail, while plenty of free memory remains on the other
> nodes.
>
> To correct this, make alloc_fresh_huge_page() search through all
> online nodes before deciding no hugepages can be allocated, and add a
> helper function that performs the actual per-node allocation. Also,
> although __GFP_THISNODE now enforces the expected semantics -- the
> allocation will not go off-node -- keep using page_to_nid() for the
> per-node counters so the accounting cannot be skewed.
Christoph, Lee, ping? I haven't heard any response on these patches
this time around. Would it be acceptable to ask Andrew to pick them up
for the next -mm?
Andrew, there probably will be conflicts with Lee's nodes_state patches
and perhaps other patches queued for -mm; let me know if you'd like me
to rebase/retest before you pick them up.
Thanks,
Nish

* Re: [PATCH 1/4] hugetlb: search harder for memory in alloc_fresh_huge_page()
@ 2007-09-14 17:43 Christoph Lameter
From: Christoph Lameter @ 2007-09-14 17:43 UTC
To: Nishanth Aravamudan; +Cc: wli, agl, lee.schermerhorn, akpm, linux-mm

On Fri, 14 Sep 2007, Nishanth Aravamudan wrote:
> Christoph, Lee, ping? I haven't heard any response on these patches
> this time around. Would it be acceptable to ask Andrew to pick them
> up for the next -mm?
I am sorry but there is some churn already going on with other core
memory management patches. Could we hold this off until the dust
settles on those and then rebase?

* Re: [PATCH 1/4] hugetlb: search harder for memory in alloc_fresh_huge_page()
@ 2007-09-14 18:20 Lee Schermerhorn
From: Lee Schermerhorn @ 2007-09-14 18:20 UTC
To: Christoph Lameter; +Cc: Nishanth Aravamudan, wli, agl, akpm, linux-mm

On Fri, 2007-09-14 at 10:43 -0700, Christoph Lameter wrote:
> I am sorry but there is some churn already going on with other core
> memory management patches. Could we hold this off until the dust
> settles on those and then rebase?
Hi, Nish: Sorry not to have responded sooner. I have been building your
patches atop my memory policy changes, and I did test them on my
platform. They seem to work. There was one conflict with my memory
policy reference counting fix, but that was easy to resolve.
I'd have no problem with these going in. They probably will conflict
with Mel's patches, but again this should be easy to resolve. Earlier
Christoph said he didn't think Mel's 'one zonelist' series would make
.24. I think that's still under discussion, but if Mel's patches don't
make .24, then I think these should go in.
So, I'll go ahead and ACK them as they are, against 23-rc4-mm1. Still,
I think it would be a good idea for you to grab Mel's patches and check
out the conflicts. Whether to rebase Mel's atop yours or vice versa is
a more difficult question.
Lee

* Re: [PATCH 1/4] hugetlb: search harder for memory in alloc_fresh_huge_page()
@ 2007-09-24 16:22 Nishanth Aravamudan
From: Nishanth Aravamudan @ 2007-09-24 16:22 UTC
To: Christoph Lameter; +Cc: wli, agl, lee.schermerhorn, akpm, linux-mm

On 14.09.2007 [10:43:20 -0700], Christoph Lameter wrote:
> I am sorry but there is some churn already going on with other core
> memory management patches. Could we hold this off until the dust
> settles on those and then rebase?
Yes, I'll keep tracking -mm with my series. I wonder, though, if it
would be possible to at least get the bugfixes for memoryless nodes in
the hugetlb code (patches 1 and 2) into -mm sooner rather than later (I
can fix your issues with the static variable, I hope). The other two
patches are more feature-like, so can be postponed for now.
Thanks,
Nish

* Re: [PATCH 1/4] hugetlb: search harder for memory in alloc_fresh_huge_page()
@ 2007-09-24 19:07 Christoph Lameter
From: Christoph Lameter @ 2007-09-24 19:07 UTC
To: Nishanth Aravamudan; +Cc: wli, agl, lee.schermerhorn, akpm, linux-mm

On Mon, 24 Sep 2007, Nishanth Aravamudan wrote:
> Yes, I'll keep tracking -mm with my series. I wonder, though, if it
> would be possible to at least get the bugfixes for memoryless nodes
> in the hugetlb code (patches 1 and 2) into -mm sooner rather than
> later.
Sure. Please post them and CC me.

* Re: [PATCH 1/4] hugetlb: search harder for memory in alloc_fresh_huge_page()
@ 2007-09-14 18:51 Christoph Lameter
From: Christoph Lameter @ 2007-09-14 18:51 UTC
To: Nishanth Aravamudan; +Cc: wli, agl, lee.schermerhorn, linux-mm

On Thu, 6 Sep 2007, Nishanth Aravamudan wrote:
> although __GFP_THISNODE now enforces the expected semantics -- the
> allocation will not go off-node -- keep using page_to_nid() for the
> per-node counters so the accounting cannot be skewed.
Hmmm..... Suspicious?
> +static int alloc_fresh_huge_page(void)
> +{
> + static int nid = -1;
> + struct page *page;
> + int start_nid;
> + int next_nid;
> + int ret = 0;
> +
> + if (nid < 0)
nid was set to -1 so why the if statement?
> + nid = first_node(node_online_map);
> + start_nid = nid;
Replace the above with:
start_nid = first_node(node_online_map)