RFC: Cleanup / small fixes to hugetlb fault handling

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Gibson <david@gibson.dropbear.id.au>
To: Adam Litke <agl@us.ibm.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	hugh@veritas.com, William Irwin <wli@holomorphy.com>
Subject: RFC: Cleanup / small fixes to hugetlb fault handling
Date: Wed, 26 Oct 2005 12:00:55 +1000	[thread overview]
Message-ID: <20051026020055.GA17191@localhost.localdomain> (raw)

Hi, Adam, Bill, Hugh,

Does this look like a reasonable patch to send to akpm for -mm.

This patch makes some slight tweaks / cleanups to the fault handling
path for huge pages in -mm.  My main motivation is to make it simpler
to fit COW in, but along the way it addresses a few minor problems
with the existing code:

- The check against i_size was duplicated: once in
  find_lock_huge_page() and again in hugetlb_fault() after taking the
  page_table_lock.  We only really need the locked one, so remove the
  other.

- find_lock_huge_page() didn't, in fact, lock the page if it newly
  allocated one, rather than finding it in the page cache already.  As
  far as I can tell this is a bug, so the patch corrects it.

- find_lock_huge_page() isn't a great name, since it does extra things
  not analagous to find_lock_page().  Rename it
  find_or_alloc_huge_page() which is closer to the mark.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Index: working-2.6/mm/hugetlb.c
===================================================================
--- working-2.6.orig/mm/hugetlb.c	2005-10-26 11:18:39.000000000 +1000
+++ working-2.6/mm/hugetlb.c	2005-10-26 11:34:32.000000000 +1000
@@ -336,8 +336,8 @@
 	flush_tlb_range(vma, start, end);
 }
 
-static struct page *find_lock_huge_page(struct address_space *mapping,
-			unsigned long idx)
+static struct page *find_or_alloc_huge_page(struct address_space *mapping,
+					    unsigned long idx)
 {
 	struct page *page;
 	int err;
@@ -347,21 +347,19 @@
 retry:
 	page = find_lock_page(mapping, idx);
 	if (page)
-		goto out;
-
-	/* Check to make sure the mapping hasn't been truncated */
-	size = i_size_read(inode) >> HPAGE_SHIFT;
-	if (idx >= size)
-		goto out;
+		return page;
 
 	if (hugetlb_get_quota(mapping))
-		goto out;
+		return NULL;
+
 	page = alloc_huge_page();
 	if (!page) {
 		hugetlb_put_quota(mapping);
-		goto out;
+		return NULL;
 	}
 
+	lock_page(page);
+
 	err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
 	if (err) {
 		put_page(page);
@@ -370,50 +368,49 @@
 			goto retry;
 		page = NULL;
 	}
-out:
+
 	return page;
 }
 
-int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-			unsigned long address, int write_access)
+int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
+		    unsigned long address, int write_access, pte_t *ptep)
 {
-	int ret = VM_FAULT_SIGBUS;
+	int ret;
 	unsigned long idx;
 	unsigned long size;
-	pte_t *pte;
 	struct page *page;
 	struct address_space *mapping;
 
-	pte = huge_pte_alloc(mm, address);
-	if (!pte)
-		goto out;
-
 	mapping = vma->vm_file->f_mapping;
 	idx = ((address - vma->vm_start) >> HPAGE_SHIFT)
 		+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
 
-	/*
-	 * Use page lock to guard against racing truncation
-	 * before we get page_table_lock.
-	 */
-	page = find_lock_huge_page(mapping, idx);
+	/* This returns a locked page, which keeps us safe in the
+	 * event of a race with truncate() */
+	page = find_or_alloc_huge_page(mapping, idx);
 	if (!page)
-		goto out;
+		return VM_FAULT_SIGBUS;
 
 	spin_lock(&mm->page_table_lock);
+
+	ret = VM_FAULT_SIGBUS;
+
 	size = i_size_read(mapping->host) >> HPAGE_SHIFT;
 	if (idx >= size)
 		goto backout;
 
 	ret = VM_FAULT_MINOR;
+
 	if (!pte_none(*pte))
+		/* oops, someone instantiated this PTE before us */
 		goto backout;
 
 	add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE);
 	set_huge_pte_at(mm, address, pte, make_huge_pte(vma, page));
+
 	spin_unlock(&mm->page_table_lock);
 	unlock_page(page);
-out:
+
 	return ret;
 
 backout:
@@ -421,7 +418,29 @@
 	hugetlb_put_quota(mapping);
 	unlock_page(page);
 	put_page(page);
-	goto out;
+
+	return ret;
+}
+
+int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
+		  unsigned long address, int write_access)
+{
+	pte_t *ptep;
+	pte_t entry;
+
+	ptep = huge_pte_alloc(mm, address);
+	if (! ptep)
+		goto out;
+
+	entry = *ptep;
+
+	if (pte_none(entry))
+		return hugetlb_no_page(mm, vma, address, ptep);
+
+	/* we could get here if another thread instantiated the pte
+	 * before the test above */
+
+	return VM_FAULT_SIGBUS;
 }
 
 int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/people/dgibson

WARNING: multiple messages have this Message-ID (diff)

From: David Gibson <david@gibson.dropbear.id.au>
To: Adam Litke <agl@us.ibm.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	hugh@veritas.com, William Irwin <wli@holomorphy.com>
Subject: RFC: Cleanup / small fixes to hugetlb fault handling
Date: Wed, 26 Oct 2005 12:00:55 +1000	[thread overview]
Message-ID: <20051026020055.GA17191@localhost.localdomain> (raw)

Hi, Adam, Bill, Hugh,

Does this look like a reasonable patch to send to akpm for -mm.

This patch makes some slight tweaks / cleanups to the fault handling
path for huge pages in -mm.  My main motivation is to make it simpler
to fit COW in, but along the way it addresses a few minor problems
with the existing code:

- The check against i_size was duplicated: once in
  find_lock_huge_page() and again in hugetlb_fault() after taking the
  page_table_lock.  We only really need the locked one, so remove the
  other.

- find_lock_huge_page() didn't, in fact, lock the page if it newly
  allocated one, rather than finding it in the page cache already.  As
  far as I can tell this is a bug, so the patch corrects it.

- find_lock_huge_page() isn't a great name, since it does extra things
  not analagous to find_lock_page().  Rename it
  find_or_alloc_huge_page() which is closer to the mark.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Index: working-2.6/mm/hugetlb.c
===================================================================
--- working-2.6.orig/mm/hugetlb.c	2005-10-26 11:18:39.000000000 +1000
+++ working-2.6/mm/hugetlb.c	2005-10-26 11:34:32.000000000 +1000
@@ -336,8 +336,8 @@
 	flush_tlb_range(vma, start, end);
 }
 
-static struct page *find_lock_huge_page(struct address_space *mapping,
-			unsigned long idx)
+static struct page *find_or_alloc_huge_page(struct address_space *mapping,
+					    unsigned long idx)
 {
 	struct page *page;
 	int err;
@@ -347,21 +347,19 @@
 retry:
 	page = find_lock_page(mapping, idx);
 	if (page)
-		goto out;
-
-	/* Check to make sure the mapping hasn't been truncated */
-	size = i_size_read(inode) >> HPAGE_SHIFT;
-	if (idx >= size)
-		goto out;
+		return page;
 
 	if (hugetlb_get_quota(mapping))
-		goto out;
+		return NULL;
+
 	page = alloc_huge_page();
 	if (!page) {
 		hugetlb_put_quota(mapping);
-		goto out;
+		return NULL;
 	}
 
+	lock_page(page);
+
 	err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
 	if (err) {
 		put_page(page);
@@ -370,50 +368,49 @@
 			goto retry;
 		page = NULL;
 	}
-out:
+
 	return page;
 }
 
-int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-			unsigned long address, int write_access)
+int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
+		    unsigned long address, int write_access, pte_t *ptep)
 {
-	int ret = VM_FAULT_SIGBUS;
+	int ret;
 	unsigned long idx;
 	unsigned long size;
-	pte_t *pte;
 	struct page *page;
 	struct address_space *mapping;
 
-	pte = huge_pte_alloc(mm, address);
-	if (!pte)
-		goto out;
-
 	mapping = vma->vm_file->f_mapping;
 	idx = ((address - vma->vm_start) >> HPAGE_SHIFT)
 		+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
 
-	/*
-	 * Use page lock to guard against racing truncation
-	 * before we get page_table_lock.
-	 */
-	page = find_lock_huge_page(mapping, idx);
+	/* This returns a locked page, which keeps us safe in the
+	 * event of a race with truncate() */
+	page = find_or_alloc_huge_page(mapping, idx);
 	if (!page)
-		goto out;
+		return VM_FAULT_SIGBUS;
 
 	spin_lock(&mm->page_table_lock);
+
+	ret = VM_FAULT_SIGBUS;
+
 	size = i_size_read(mapping->host) >> HPAGE_SHIFT;
 	if (idx >= size)
 		goto backout;
 
 	ret = VM_FAULT_MINOR;
+
 	if (!pte_none(*pte))
+		/* oops, someone instantiated this PTE before us */
 		goto backout;
 
 	add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE);
 	set_huge_pte_at(mm, address, pte, make_huge_pte(vma, page));
+
 	spin_unlock(&mm->page_table_lock);
 	unlock_page(page);
-out:
+
 	return ret;
 
 backout:
@@ -421,7 +418,29 @@
 	hugetlb_put_quota(mapping);
 	unlock_page(page);
 	put_page(page);
-	goto out;
+
+	return ret;
+}
+
+int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
+		  unsigned long address, int write_access)
+{
+	pte_t *ptep;
+	pte_t entry;
+
+	ptep = huge_pte_alloc(mm, address);
+	if (! ptep)
+		goto out;
+
+	entry = *ptep;
+
+	if (pte_none(entry))
+		return hugetlb_no_page(mm, vma, address, ptep);
+
+	/* we could get here if another thread instantiated the pte
+	 * before the test above */
+
+	return VM_FAULT_SIGBUS;
 }
 
 int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/people/dgibson

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next             reply	other threads:[~2005-10-26  2:01 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-10-26  2:00 David Gibson [this message]
2005-10-26  2:00 ` RFC: Cleanup / small fixes to hugetlb fault handling David Gibson
2005-10-26  2:48 ` David Gibson
2005-10-26  2:48   ` David Gibson
2005-10-26  5:06   ` __alloc_pages: 0-order allocation failed (gfp=0x1d2/0) salve vandana
2005-10-26  5:06     ` salve vandana
2005-10-26 11:12     ` salve vandana
2005-10-26 11:12       ` salve vandana
2005-10-26 18:44   ` RFC: Cleanup / small fixes to hugetlb fault handling Chen, Kenneth W
2005-10-26 18:44     ` Chen, Kenneth W
2005-10-27  0:05     ` 'David Gibson'
2005-10-27  0:05       ` 'David Gibson'
2005-10-27  0:16       ` Chen, Kenneth W
2005-10-27  0:16         ` Chen, Kenneth W
2005-10-27  0:23         ` 'David Gibson'
2005-10-27  0:23           ` 'David Gibson'
2005-10-27  6:37         ` 'David Gibson'
2005-10-27  6:37           ` 'David Gibson'
2005-10-26 20:12   ` Chen, Kenneth W
2005-10-26 20:12     ` Chen, Kenneth W
2005-10-27  0:14     ` 'David Gibson'
2005-10-27  0:14       ` 'David Gibson'
2005-10-26 21:46   ` Adam Litke
2005-10-26 21:46     ` Adam Litke
2005-10-27  0:20     ` David Gibson
2005-10-27  0:20       ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20051026020055.GA17191@localhost.localdomain \
    --to=david@gibson.dropbear.id.au \
    --cc=agl@us.ibm.com \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.