[PATCH 1/2] hugetlb: Delay page zeroing for faulted pages

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Adam Litke <agl@us.ibm.com>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 1/2] hugetlb: Delay page zeroing for faulted pages
Date: Wed, 11 Jan 2006 16:02:40 -0600	[thread overview]
Message-ID: <1137016960.9672.5.camel@localhost.localdomain> (raw)
In-Reply-To: <1136920951.23288.5.camel@localhost.localdomain>

I've come up with a much better idea to resolve the issue I mention
below.  The attached patch changes hugetlb_no_page to allocate unzeroed
huge pages initially.  For shared mappings, we wait until after
inserting the page into the page_cache succeeds before we zero it.  This
has a side benefit of preventing the wasted zeroing that happened often
in the original code.  The page_lock should guard against someone else
using the page before it has been zeroed (but correct me if I am wrong
here).  The patch doesn't completely close the race (there is a much
smaller window without the zeroing though).  The next patch should close
the race window completely.

On Tue, 2006-01-10 at 13:22 -0600, Adam Litke wrote: 
> The race occurs when multiple threads shmat a hugetlb area and begin
> faulting in it's pages.  During a hugetlb fault, hugetlb_no_page checks
> for the page in the page cache.  If not found, it allocates (and zeroes)
> a new page and tries to add it to the page cache.  If this fails, the
> huge page is freed and we retry the page cache lookup (assuming someone
> else beat us to the add_to_page_cache call).
> 
> The above works fine, but due to the large window (while zeroing the
> huge page) it is possible that many threads could be "borrowing" pages
> only to return them later.  This causes free_hugetlb_pages to be lower
> than the logical number of free pages and some threads trying to shmat
> can falsely fail the accounting check.

Signed-off-by: Adam Litke <agl@us.ibm.com>

 hugetlb.c |   26 +++++++++++++++++++++++---
 1 files changed, 23 insertions(+), 3 deletions(-)
diff -upN reference/mm/hugetlb.c current/mm/hugetlb.c
--- reference/mm/hugetlb.c
+++ current/mm/hugetlb.c
@@ -92,10 +92,10 @@ void free_huge_page(struct page *page)
 	spin_unlock(&hugetlb_lock);
 }
 
-struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr)
+struct page *alloc_unzeroed_huge_page(struct vm_area_struct *vma,
+					unsigned long addr)
 {
 	struct page *page;
-	int i;
 
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page(vma, addr);
@@ -106,8 +106,26 @@ struct page *alloc_huge_page(struct vm_a
 	spin_unlock(&hugetlb_lock);
 	set_page_count(page, 1);
 	page[1].mapping = (void *)free_huge_page;
+
+	return page;
+}
+	
+void zero_huge_page(struct page *page)
+{
+	int i;
+	
 	for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i)
 		clear_highpage(&page[i]);
+}
+
+struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr)
+{
+	struct page *page;
+
+	page = alloc_unzeroed_huge_page(vma, addr);
+	if (page)
+		zero_huge_page(page);
+	
 	return page;
 }
 
@@ -441,7 +459,7 @@ retry:
 	if (!page) {
 		if (hugetlb_get_quota(mapping))
 			goto out;
-		page = alloc_huge_page(vma, address);
+		page = alloc_unzeroed_huge_page(vma, address);
 		if (!page) {
 			hugetlb_put_quota(mapping);
 			goto out;
@@ -460,6 +478,8 @@ retry:
 			}
 		} else
 			lock_page(page);
+
+		zero_huge_page(page);
 	}
 
 	spin_lock(&mm->page_table_lock);


-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

WARNING: multiple messages have this Message-ID (diff)

From: Adam Litke <agl@us.ibm.com>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 1/2] hugetlb: Delay page zeroing for faulted pages
Date: Wed, 11 Jan 2006 16:02:40 -0600	[thread overview]
Message-ID: <1137016960.9672.5.camel@localhost.localdomain> (raw)
In-Reply-To: <1136920951.23288.5.camel@localhost.localdomain>

I've come up with a much better idea to resolve the issue I mention
below.  The attached patch changes hugetlb_no_page to allocate unzeroed
huge pages initially.  For shared mappings, we wait until after
inserting the page into the page_cache succeeds before we zero it.  This
has a side benefit of preventing the wasted zeroing that happened often
in the original code.  The page_lock should guard against someone else
using the page before it has been zeroed (but correct me if I am wrong
here).  The patch doesn't completely close the race (there is a much
smaller window without the zeroing though).  The next patch should close
the race window completely.

On Tue, 2006-01-10 at 13:22 -0600, Adam Litke wrote: 
> The race occurs when multiple threads shmat a hugetlb area and begin
> faulting in it's pages.  During a hugetlb fault, hugetlb_no_page checks
> for the page in the page cache.  If not found, it allocates (and zeroes)
> a new page and tries to add it to the page cache.  If this fails, the
> huge page is freed and we retry the page cache lookup (assuming someone
> else beat us to the add_to_page_cache call).
> 
> The above works fine, but due to the large window (while zeroing the
> huge page) it is possible that many threads could be "borrowing" pages
> only to return them later.  This causes free_hugetlb_pages to be lower
> than the logical number of free pages and some threads trying to shmat
> can falsely fail the accounting check.

Signed-off-by: Adam Litke <agl@us.ibm.com>

 hugetlb.c |   26 +++++++++++++++++++++++---
 1 files changed, 23 insertions(+), 3 deletions(-)
diff -upN reference/mm/hugetlb.c current/mm/hugetlb.c
--- reference/mm/hugetlb.c
+++ current/mm/hugetlb.c
@@ -92,10 +92,10 @@ void free_huge_page(struct page *page)
 	spin_unlock(&hugetlb_lock);
 }
 
-struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr)
+struct page *alloc_unzeroed_huge_page(struct vm_area_struct *vma,
+					unsigned long addr)
 {
 	struct page *page;
-	int i;
 
 	spin_lock(&hugetlb_lock);
 	page = dequeue_huge_page(vma, addr);
@@ -106,8 +106,26 @@ struct page *alloc_huge_page(struct vm_a
 	spin_unlock(&hugetlb_lock);
 	set_page_count(page, 1);
 	page[1].mapping = (void *)free_huge_page;
+
+	return page;
+}
+	
+void zero_huge_page(struct page *page)
+{
+	int i;
+	
 	for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i)
 		clear_highpage(&page[i]);
+}
+
+struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr)
+{
+	struct page *page;
+
+	page = alloc_unzeroed_huge_page(vma, addr);
+	if (page)
+		zero_huge_page(page);
+	
 	return page;
 }
 
@@ -441,7 +459,7 @@ retry:
 	if (!page) {
 		if (hugetlb_get_quota(mapping))
 			goto out;
-		page = alloc_huge_page(vma, address);
+		page = alloc_unzeroed_huge_page(vma, address);
 		if (!page) {
 			hugetlb_put_quota(mapping);
 			goto out;
@@ -460,6 +478,8 @@ retry:
 			}
 		} else
 			lock_page(page);
+
+		zero_huge_page(page);
 	}
 
 	spin_lock(&mm->page_table_lock);


-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2006-01-11 22:02 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-01-10 19:22 Hugetlb: Shared memory race Adam Litke
2006-01-10 19:22 ` Adam Litke
2006-01-10 19:44 ` William Lee Irwin III
2006-01-10 19:44   ` William Lee Irwin III
2006-01-11 22:02 ` Adam Litke [this message]
2006-01-11 22:02   ` [PATCH 1/2] hugetlb: Delay page zeroing for faulted pages Adam Litke
2006-01-11 22:24   ` [PATCH 2/2] hugetlb: synchronize alloc with page cache insert Adam Litke
2006-01-11 22:24     ` Adam Litke
2006-01-11 22:52     ` William Lee Irwin III
2006-01-11 22:52       ` William Lee Irwin III
2006-01-11 23:03       ` Adam Litke
2006-01-11 23:03         ` Adam Litke
2006-01-11 23:24         ` William Lee Irwin III
2006-01-11 23:24           ` William Lee Irwin III
2006-01-11 23:46         ` Chen, Kenneth W
2006-01-11 23:46           ` Chen, Kenneth W
2006-01-12  0:40     ` Chen, Kenneth W
2006-01-12  0:40       ` Chen, Kenneth W
2006-01-12  1:05       ` William Lee Irwin III
2006-01-12  1:05         ` William Lee Irwin III
2006-01-12 17:26         ` Adam Litke
2006-01-12 17:26           ` Adam Litke
2006-01-12 19:07           ` Chen, Kenneth W
2006-01-12 19:07             ` Chen, Kenneth W
2006-01-12 19:48             ` Adam Litke
2006-01-12 19:48               ` Adam Litke
2006-01-12 20:06               ` Chen, Kenneth W
2006-01-12 20:06                 ` Chen, Kenneth W
2006-01-11 22:42   ` [PATCH 1/2] hugetlb: Delay page zeroing for faulted pages William Lee Irwin III
2006-01-11 22:42     ` William Lee Irwin III

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1137016960.9672.5.camel@localhost.localdomain \
    --to=agl@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.