From: Nishanth Aravamudan <nacc@us.ibm.com>
To: clameter@sgi.com
Cc: wli@holomorphy.com, agl@us.ibm.com, lee.schermerhorn@hp.com,
linux-mm@kvack.org
Subject: [PATCH 1/4] hugetlb: search harder for memory in alloc_fresh_huge_page()
Date: Thu, 6 Sep 2007 11:21:34 -0700
Message-ID: <20070906182134.GA7779@us.ibm.com>
Currently, alloc_fresh_huge_page() returns failure (0) when it cannot
allocate a huge page on the current node, as selected by its static
interleave variable. The callers of this function, though, treat a
failure from alloc_fresh_huge_page() as meaning that no hugepages can be
allocated anywhere on the system. That is not necessarily true: on an
uneven NUMA system, we may try to allocate a hugepage (with
__GFP_THISNODE) on a node with relatively little memory and fail, while
plenty of free memory remains on the other nodes.
To correct this, make alloc_fresh_huge_page() search through all online
nodes before deciding that no hugepages can be allocated. Add a helper
function that does the actual allocation for a given node. Also, even
though the newly enforced __GFP_THISNODE semantics guarantee that the
allocation will not go off-node, still use page_to_nid() on the returned
page so that the per-node accounting cannot be thrown off.
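To make the new control flow easy to follow in isolation, here is a
minimal user-space mock of the same round-robin search (illustration
only, not kernel code; try_alloc_on_node() and NR_NODES are hypothetical
stand-ins for alloc_fresh_huge_page_node() and the online node map):

	/* Illustration only: user-space mock of the round-robin search. */
	#include <stdbool.h>
	#include <stdio.h>

	#define NR_NODES 4

	/* Pretend node 3 is out of memory, as on the uneven system below. */
	static bool try_alloc_on_node(int nid)
	{
		return nid != 3;
	}

	static int mock_alloc_fresh_huge_page(void)
	{
		static int nid = -1;
		int start_nid;
		bool ok;

		if (nid < 0)
			nid = 0;
		start_nid = nid;

		do {
			ok = try_alloc_on_node(nid);
			/*
			 * Advance even on success, so the next caller
			 * starts on the next node.
			 */
			nid = (nid + 1) % NR_NODES;
		} while (!ok && nid != start_nid);

		return ok ? 1 : 0;
	}

	int main(void)
	{
		int i;

		for (i = 0; i < 6; i++)
			printf("attempt %d: %s\n", i,
			       mock_alloc_fresh_huge_page() ? "ok" : "failed");
		return 0;
	}

Note that nid advances even when the allocation succeeds; this is what
keeps successive callers interleaving across nodes instead of filling
one node first.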
Tested on 4-node ppc64, 2-node ia64 and 4-node x86_64.
Before this patch, on a 4-node ppc64 with the following memory
characteristics:
Node 0 MemTotal: 1310720 kB
Node 1 MemTotal: 1048576 kB
Node 2 MemTotal: 1048576 kB
Node 3 MemTotal: 786432 kB
Trying to clear the hugetlb pool
Done. 0 free
Trying to resize the pool to 100
Node 0 HugePages_Free: 25
Node 1 HugePages_Free: 25
Node 2 HugePages_Free: 25
Node 3 HugePages_Free: 25
Done. Initially 100 free
Trying to resize the pool to 200
Node 0 HugePages_Free: 50
Node 1 HugePages_Free: 57
Node 2 HugePages_Free: 52
Node 3 HugePages_Free: 41
Done. 200 free
After:
Trying to clear the hugetlb pool
Done. 0 free
Trying to resize the pool to 100
Node 0 HugePages_Free: 25
Node 1 HugePages_Free: 25
Node 2 HugePages_Free: 25
Node 3 HugePages_Free: 25
Done. Initially 100 free
Trying to resize the pool to 200
Node 0 HugePages_Free: 53
Node 1 HugePages_Free: 53
Node 2 HugePages_Free: 52
Node 3 HugePages_Free: 42
Done. 200 free
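For reference, the resize test above can be driven from user space
along these lines (a sketch with error handling kept minimal, assuming
the standard /proc/sys/vm/nr_hugepages interface; the per-node
HugePages_Free counts were read from
/sys/devices/system/node/node*/meminfo):

	/* Sketch of the pool-resize test driver; illustration only. */
	#include <stdio.h>

	static void set_pool_size(int nr)
	{
		FILE *f = fopen("/proc/sys/vm/nr_hugepages", "w");

		if (!f) {
			perror("nr_hugepages");
			return;
		}
		fprintf(f, "%d\n", nr);
		fclose(f);
	}

	int main(void)
	{
		/* Needs root to write the sysctl. */
		set_pool_size(0);	/* clear the pool */
		set_pool_size(100);	/* resize to 100 */
		set_pool_size(200);	/* resize to 200 */
		return 0;
	}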
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c53bd5a..edb2100 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -101,26 +101,13 @@ static void free_huge_page(struct page *page)
spin_unlock(&hugetlb_lock);
}
-static int alloc_fresh_huge_page(void)
+static struct page *alloc_fresh_huge_page_node(int nid)
{
- static int prev_nid;
struct page *page;
- int nid;
-
- /*
- * Copy static prev_nid to local nid, work on that, then copy it
- * back to prev_nid afterwards: otherwise there's a window in which
- * a racer might pass invalid nid MAX_NUMNODES to alloc_pages_node.
- * But we don't need to use a spin_lock here: it really doesn't
- * matter if occasionally a racer chooses the same nid as we do.
- */
- nid = next_node(prev_nid, node_online_map);
- if (nid == MAX_NUMNODES)
- nid = first_node(node_online_map);
- prev_nid = nid;
- page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN,
- HUGETLB_PAGE_ORDER);
+ page = alloc_pages_node(nid,
+ htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|__GFP_NOWARN,
+ HUGETLB_PAGE_ORDER);
if (page) {
set_compound_page_dtor(page, free_huge_page);
spin_lock(&hugetlb_lock);
@@ -128,9 +115,45 @@ static int alloc_fresh_huge_page(void)
nr_huge_pages_node[page_to_nid(page)]++;
spin_unlock(&hugetlb_lock);
put_page(page); /* free it into the hugepage allocator */
- return 1;
}
- return 0;
+
+ return page;
+}
+
+static int alloc_fresh_huge_page(void)
+{
+ static int nid = -1;
+ struct page *page;
+ int start_nid;
+ int next_nid;
+ int ret = 0;
+
+ if (nid < 0)
+ nid = first_node(node_online_map);
+ start_nid = nid;
+
+ do {
+ page = alloc_fresh_huge_page_node(nid);
+ if (page)
+ ret = 1;
+ /*
+ * Use a helper variable to find the next node and then
+ * copy it back to nid afterwards: otherwise there's
+ * a window in which a racer might pass invalid nid
+ * MAX_NUMNODES to alloc_pages_node. But we don't need
+ * to use a spin_lock here: it really doesn't matter if
+ * occasionally a racer chooses the same nid as we do.
+ * Move nid forward in the mask even if we just
+ * successfully allocated a hugepage so that the next
+ * caller gets hugepages on the next node.
+ */
+ next_nid = next_node(nid, node_online_map);
+ if (next_nid == MAX_NUMNODES)
+ next_nid = first_node(node_online_map);
+ nid = next_nid;
+ } while (!page && nid != start_nid);
+
+ return ret;
}
static struct page *alloc_huge_page(struct vm_area_struct *vma,
--