[RFC 1/2] Hugetlb fault fixes and reorg

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Adam Litke <agl@us.ibm.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org,
	David Gibson <david@gibson.dropbear.id.au>,
	hugh@veritas.com, rohit.seth@intel.com, "Chen,
	Kenneth W" <kenneth.w.chen@intel.com>,
	akpm@osdl.org
Subject: [RFC 1/2] Hugetlb fault fixes and reorg
Date: Mon, 07 Nov 2005 15:38:16 -0600	[thread overview]
Message-ID: <1131399496.25133.103.camel@localhost.localdomain> (raw)
In-Reply-To: <1131397841.25133.90.camel@localhost.localdomain>

[RFC] Cleanup / small fixes to hugetlb fault handling
(Patch originally from David Gibson <david@gibson.dropbear.id.au>)
Initial Post: Tue. 25 Oct 2005

On Thu, 2005-10-27 at 16:37 +1000, 'David Gibson' wrote:
> This patch makes some slight tweaks / cleanups to the fault handling
> path for huge pages in -mm.  My main motivation is to make it simpler
> to fit COW in, but along the way it addresses a few minor problems
> with the existing code:
> 
> - The check against i_size was duplicated: once in
>   find_lock_huge_page() and again in hugetlb_fault() after taking the
>   page_table_lock.  We only really need the locked one, so remove the
>   other.
> 
> - find_lock_huge_page() isn't a great name, since it does extra things
>   not analagous to find_lock_page().  Rename it
>   find_or_alloc_huge_page() which is closer to the mark.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Adam Litke <agl@us.ibm.com>

---
 hugetlb.c |   77 +++++++++++++++++++++++++++++++++++++-------------------------
 1 files changed, 46 insertions(+), 31 deletions(-)
diff -upN reference/mm/hugetlb.c current/mm/hugetlb.c
--- reference/mm/hugetlb.c
+++ current/mm/hugetlb.c
@@ -339,30 +339,24 @@ void unmap_hugepage_range(struct vm_area
 	flush_tlb_range(vma, start, end);
 }
 
-static struct page *find_lock_huge_page(struct address_space *mapping,
-			unsigned long idx)
+static struct page *find_or_alloc_huge_page(struct address_space *mapping,
+					    unsigned long idx)
 {
 	struct page *page;
 	int err;
-	struct inode *inode = mapping->host;
-	unsigned long size;
 
 retry:
 	page = find_lock_page(mapping, idx);
 	if (page)
-		goto out;
-
-	/* Check to make sure the mapping hasn't been truncated */
-	size = i_size_read(inode) >> HPAGE_SHIFT;
-	if (idx >= size)
-		goto out;
+		return page;
 
 	if (hugetlb_get_quota(mapping))
-		goto out;
+		return NULL;
+
 	page = alloc_huge_page();
 	if (!page) {
 		hugetlb_put_quota(mapping);
-		goto out;
+		return NULL;
 	}
 
 	err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
@@ -373,50 +367,49 @@ retry:
 			goto retry;
 		page = NULL;
 	}
-out:
+
 	return page;
 }
 
-int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-			unsigned long address, int write_access)
+int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
+		    unsigned long address, pte_t *ptep)
 {
-	int ret = VM_FAULT_SIGBUS;
+	int ret;
 	unsigned long idx;
 	unsigned long size;
-	pte_t *pte;
 	struct page *page;
 	struct address_space *mapping;
 
-	pte = huge_pte_alloc(mm, address);
-	if (!pte)
-		goto out;
-
 	mapping = vma->vm_file->f_mapping;
 	idx = ((address - vma->vm_start) >> HPAGE_SHIFT)
 		+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
 
-	/*
-	 * Use page lock to guard against racing truncation
-	 * before we get page_table_lock.
-	 */
-	page = find_lock_huge_page(mapping, idx);
+	/* This returns a locked page, which keeps us safe in the
+	 * event of a race with truncate() */
+	page = find_or_alloc_huge_page(mapping, idx);
 	if (!page)
-		goto out;
+		return VM_FAULT_SIGBUS;
 
 	spin_lock(&mm->page_table_lock);
+
+	ret = VM_FAULT_SIGBUS;
+
 	size = i_size_read(mapping->host) >> HPAGE_SHIFT;
 	if (idx >= size)
 		goto backout;
 
 	ret = VM_FAULT_MINOR;
-	if (!pte_none(*pte))
+
+	if (!pte_none(*ptep))
+		/* oops, someone instantiated this PTE before us */
 		goto backout;
 
 	add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE);
-	set_huge_pte_at(mm, address, pte, make_huge_pte(vma, page));
+	set_huge_pte_at(mm, address, ptep, make_huge_pte(vma, page));
+
 	spin_unlock(&mm->page_table_lock);
 	unlock_page(page);
-out:
+
 	return ret;
 
 backout:
@@ -424,7 +417,29 @@ backout:
 	hugetlb_put_quota(mapping);
 	unlock_page(page);
 	put_page(page);
-	goto out;
+
+	return ret;
+}
+
+int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
+		  unsigned long address, int write_access)
+{
+	pte_t *ptep;
+	pte_t entry;
+
+	ptep = huge_pte_alloc(mm, address);
+	if (! ptep)
+		return VM_FAULT_OOM;
+
+	entry = *ptep;
+
+	if (pte_none(entry))
+		return hugetlb_no_page(mm, vma, address, ptep);
+
+	/* we could get here if another thread instantiated the pte
+	 * before the test above */
+
+	return VM_FAULT_MINOR;
 }
 
 int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

WARNING: multiple messages have this Message-ID (diff)

From: Adam Litke <agl@us.ibm.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org,
	David Gibson <david@gibson.dropbear.id.au>,
	hugh@veritas.com, rohit.seth@intel.com, "Chen,
	Kenneth W" <kenneth.w.chen@intel.com>,
	akpm@osdl.org
Subject: [RFC 1/2] Hugetlb fault fixes and reorg
Date: Mon, 07 Nov 2005 15:38:16 -0600	[thread overview]
Message-ID: <1131399496.25133.103.camel@localhost.localdomain> (raw)
In-Reply-To: <1131397841.25133.90.camel@localhost.localdomain>

[RFC] Cleanup / small fixes to hugetlb fault handling
(Patch originally from David Gibson <david@gibson.dropbear.id.au>)
Initial Post: Tue. 25 Oct 2005

On Thu, 2005-10-27 at 16:37 +1000, 'David Gibson' wrote:
> This patch makes some slight tweaks / cleanups to the fault handling
> path for huge pages in -mm.  My main motivation is to make it simpler
> to fit COW in, but along the way it addresses a few minor problems
> with the existing code:
> 
> - The check against i_size was duplicated: once in
>   find_lock_huge_page() and again in hugetlb_fault() after taking the
>   page_table_lock.  We only really need the locked one, so remove the
>   other.
> 
> - find_lock_huge_page() isn't a great name, since it does extra things
>   not analagous to find_lock_page().  Rename it
>   find_or_alloc_huge_page() which is closer to the mark.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Adam Litke <agl@us.ibm.com>

---
 hugetlb.c |   77 +++++++++++++++++++++++++++++++++++++-------------------------
 1 files changed, 46 insertions(+), 31 deletions(-)
diff -upN reference/mm/hugetlb.c current/mm/hugetlb.c
--- reference/mm/hugetlb.c
+++ current/mm/hugetlb.c
@@ -339,30 +339,24 @@ void unmap_hugepage_range(struct vm_area
 	flush_tlb_range(vma, start, end);
 }
 
-static struct page *find_lock_huge_page(struct address_space *mapping,
-			unsigned long idx)
+static struct page *find_or_alloc_huge_page(struct address_space *mapping,
+					    unsigned long idx)
 {
 	struct page *page;
 	int err;
-	struct inode *inode = mapping->host;
-	unsigned long size;
 
 retry:
 	page = find_lock_page(mapping, idx);
 	if (page)
-		goto out;
-
-	/* Check to make sure the mapping hasn't been truncated */
-	size = i_size_read(inode) >> HPAGE_SHIFT;
-	if (idx >= size)
-		goto out;
+		return page;
 
 	if (hugetlb_get_quota(mapping))
-		goto out;
+		return NULL;
+
 	page = alloc_huge_page();
 	if (!page) {
 		hugetlb_put_quota(mapping);
-		goto out;
+		return NULL;
 	}
 
 	err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
@@ -373,50 +367,49 @@ retry:
 			goto retry;
 		page = NULL;
 	}
-out:
+
 	return page;
 }
 
-int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-			unsigned long address, int write_access)
+int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
+		    unsigned long address, pte_t *ptep)
 {
-	int ret = VM_FAULT_SIGBUS;
+	int ret;
 	unsigned long idx;
 	unsigned long size;
-	pte_t *pte;
 	struct page *page;
 	struct address_space *mapping;
 
-	pte = huge_pte_alloc(mm, address);
-	if (!pte)
-		goto out;
-
 	mapping = vma->vm_file->f_mapping;
 	idx = ((address - vma->vm_start) >> HPAGE_SHIFT)
 		+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
 
-	/*
-	 * Use page lock to guard against racing truncation
-	 * before we get page_table_lock.
-	 */
-	page = find_lock_huge_page(mapping, idx);
+	/* This returns a locked page, which keeps us safe in the
+	 * event of a race with truncate() */
+	page = find_or_alloc_huge_page(mapping, idx);
 	if (!page)
-		goto out;
+		return VM_FAULT_SIGBUS;
 
 	spin_lock(&mm->page_table_lock);
+
+	ret = VM_FAULT_SIGBUS;
+
 	size = i_size_read(mapping->host) >> HPAGE_SHIFT;
 	if (idx >= size)
 		goto backout;
 
 	ret = VM_FAULT_MINOR;
-	if (!pte_none(*pte))
+
+	if (!pte_none(*ptep))
+		/* oops, someone instantiated this PTE before us */
 		goto backout;
 
 	add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE);
-	set_huge_pte_at(mm, address, pte, make_huge_pte(vma, page));
+	set_huge_pte_at(mm, address, ptep, make_huge_pte(vma, page));
+
 	spin_unlock(&mm->page_table_lock);
 	unlock_page(page);
-out:
+
 	return ret;
 
 backout:
@@ -424,7 +417,29 @@ backout:
 	hugetlb_put_quota(mapping);
 	unlock_page(page);
 	put_page(page);
-	goto out;
+
+	return ret;
+}
+
+int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
+		  unsigned long address, int write_access)
+{
+	pte_t *ptep;
+	pte_t entry;
+
+	ptep = huge_pte_alloc(mm, address);
+	if (! ptep)
+		return VM_FAULT_OOM;
+
+	entry = *ptep;
+
+	if (pte_none(entry))
+		return hugetlb_no_page(mm, vma, address, ptep);
+
+	/* we could get here if another thread instantiated the pte
+	 * before the test above */
+
+	return VM_FAULT_MINOR;
 }
 
 int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2005-11-07 21:39 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-11-07 21:10 [RFC 0/2] Copy on write for hugetlbfs Adam Litke
2005-11-07 21:10 ` Adam Litke
2005-11-07 21:38 ` Adam Litke [this message]
2005-11-07 21:38   ` [RFC 1/2] Hugetlb fault fixes and reorg Adam Litke
2005-11-07 23:30   ` William Lee Irwin III
2005-11-07 23:30     ` William Lee Irwin III
2005-11-08  1:21     ` David Gibson
2005-11-08  1:21       ` David Gibson
2005-11-07 21:38 ` [RFC 2/2] Hugetlb COW Adam Litke
2005-11-07 21:38   ` Adam Litke
2005-11-07 21:47   ` Adam Litke
2005-11-07 21:47     ` Adam Litke
2005-11-08  3:44     ` David Gibson
2005-11-08  3:44       ` David Gibson
2005-11-08  4:17     ` David Gibson
2005-11-08  4:17       ` David Gibson
2005-11-07 23:35   ` William Lee Irwin III
2005-11-07 23:35     ` William Lee Irwin III
2005-11-08  2:53     ` David Gibson
2005-11-08  2:53       ` David Gibson
2005-11-08  3:46   ` David Gibson
2005-11-08  3:46     ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1131399496.25133.103.camel@localhost.localdomain \
    --to=agl@us.ibm.com \
    --cc=akpm@osdl.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=hugh@veritas.com \
    --cc=kenneth.w.chen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rohit.seth@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.