From: "'David Gibson'" <david@gibson.dropbear.id.au>
To: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: Adam Litke <agl@us.ibm.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
hugh@veritas.com, William Irwin <wli@holomorphy.com>
Subject: Re: RFC: Cleanup / small fixes to hugetlb fault handling
Date: Thu, 27 Oct 2005 16:37:02 +1000 [thread overview]
Message-ID: <20051027063702.GB7176@localhost.localdomain> (raw)
In-Reply-To: <200510270016.j9R0Gdg26347@unix-os.sc.intel.com>
On Wed, Oct 26, 2005 at 05:16:39PM -0700, Chen, Kenneth W wrote:
> David Gibson wrote on Wednesday, October 26, 2005 5:05 PM
> > On Wed, Oct 26, 2005 at 11:44:52AM -0700, Chen, Kenneth W wrote:
> > > David Gibson wrote on Tuesday, October 25, 2005 7:49 PM
> > > > +int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > > > + unsigned long address, int write_access)
> > > > +{
> > > > + pte_t *ptep;
> > > > + pte_t entry;
> > > > +
> > > > + ptep = huge_pte_alloc(mm, address);
> > > > + if (! ptep)
> > > > + /* OOM */
> > > > + return VM_FAULT_SIGBUS;
> > > > +
> > > > + entry = *ptep;
> > > > +
> > > > + if (pte_none(entry))
> > > > + return hugetlb_no_page(mm, vma, address, ptep);
> > > > +
> > > > + /* we could get here if another thread instantiated the pte
> > > > + * before the test above */
> > > > +
> > > > + return VM_FAULT_SIGBUS;
> > > > }
> > >
> > > Are you sure about the last return? Looks like a typo to me, if *ptep
> > > is present, it should return VM_FAULT_MINOR.
> >
> > Oops, yes, thinko. Corrected patch shortly.
>
> While you at it, I think it would be preferable that the first return be
> VM_FAULT_OOM, your thoughts?
Ok, here's the revised patch with this change, and the others
mentioned elsewhere.
RFC: Cleanup / small fixes to hugetlb fault handling
This patch makes some slight tweaks / cleanups to the fault handling
path for huge pages in -mm. My main motivation is to make it simpler
to fit COW in, but along the way it addresses a few minor problems
with the existing code:
- The check against i_size was duplicated: once in
find_lock_huge_page() and again in hugetlb_fault() after taking the
page_table_lock. We only really need the locked one, so remove the
other.
- find_lock_huge_page() isn't a great name, since it does extra things
not analagous to find_lock_page(). Rename it
find_or_alloc_huge_page() which is closer to the mark.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Index: working-2.6/mm/hugetlb.c
===================================================================
--- working-2.6.orig/mm/hugetlb.c 2005-10-27 16:34:20.000000000 +1000
+++ working-2.6/mm/hugetlb.c 2005-10-27 16:34:55.000000000 +1000
@@ -336,30 +336,24 @@
flush_tlb_range(vma, start, end);
}
-static struct page *find_lock_huge_page(struct address_space *mapping,
- unsigned long idx)
+static struct page *find_or_alloc_huge_page(struct address_space *mapping,
+ unsigned long idx)
{
struct page *page;
int err;
- struct inode *inode = mapping->host;
- unsigned long size;
retry:
page = find_lock_page(mapping, idx);
if (page)
- goto out;
-
- /* Check to make sure the mapping hasn't been truncated */
- size = i_size_read(inode) >> HPAGE_SHIFT;
- if (idx >= size)
- goto out;
+ return page;
if (hugetlb_get_quota(mapping))
- goto out;
+ return NULL;
+
page = alloc_huge_page();
if (!page) {
hugetlb_put_quota(mapping);
- goto out;
+ return NULL;
}
err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
@@ -370,50 +364,49 @@
goto retry;
page = NULL;
}
-out:
+
return page;
}
-int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long address, int write_access)
+int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep)
{
- int ret = VM_FAULT_SIGBUS;
+ int ret;
unsigned long idx;
unsigned long size;
- pte_t *pte;
struct page *page;
struct address_space *mapping;
- pte = huge_pte_alloc(mm, address);
- if (!pte)
- goto out;
-
mapping = vma->vm_file->f_mapping;
idx = ((address - vma->vm_start) >> HPAGE_SHIFT)
+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
- /*
- * Use page lock to guard against racing truncation
- * before we get page_table_lock.
- */
- page = find_lock_huge_page(mapping, idx);
+ /* This returns a locked page, which keeps us safe in the
+ * event of a race with truncate() */
+ page = find_or_alloc_huge_page(mapping, idx);
if (!page)
- goto out;
+ return VM_FAULT_SIGBUS;
spin_lock(&mm->page_table_lock);
+
+ ret = VM_FAULT_SIGBUS;
+
size = i_size_read(mapping->host) >> HPAGE_SHIFT;
if (idx >= size)
goto backout;
ret = VM_FAULT_MINOR;
- if (!pte_none(*pte))
+
+ if (!pte_none(*ptep))
+ /* oops, someone instantiated this PTE before us */
goto backout;
add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE);
- set_huge_pte_at(mm, address, pte, make_huge_pte(vma, page));
+ set_huge_pte_at(mm, address, ptep, make_huge_pte(vma, page));
+
spin_unlock(&mm->page_table_lock);
unlock_page(page);
-out:
+
return ret;
backout:
@@ -421,7 +414,29 @@
hugetlb_put_quota(mapping);
unlock_page(page);
put_page(page);
- goto out;
+
+ return ret;
+}
+
+int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, int write_access)
+{
+ pte_t *ptep;
+ pte_t entry;
+
+ ptep = huge_pte_alloc(mm, address);
+ if (! ptep)
+ return VM_FAULT_OOM;
+
+ entry = *ptep;
+
+ if (pte_none(entry))
+ return hugetlb_no_page(mm, vma, address, ptep);
+
+ /* we could get here if another thread instantiated the pte
+ * before the test above */
+
+ return VM_FAULT_MINOR;
}
int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/people/dgibson
WARNING: multiple messages have this Message-ID (diff)
From: 'David Gibson' <david@gibson.dropbear.id.au>
To: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: Adam Litke <agl@us.ibm.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
hugh@veritas.com, William Irwin <wli@holomorphy.com>
Subject: Re: RFC: Cleanup / small fixes to hugetlb fault handling
Date: Thu, 27 Oct 2005 16:37:02 +1000 [thread overview]
Message-ID: <20051027063702.GB7176@localhost.localdomain> (raw)
In-Reply-To: <200510270016.j9R0Gdg26347@unix-os.sc.intel.com>
On Wed, Oct 26, 2005 at 05:16:39PM -0700, Chen, Kenneth W wrote:
> David Gibson wrote on Wednesday, October 26, 2005 5:05 PM
> > On Wed, Oct 26, 2005 at 11:44:52AM -0700, Chen, Kenneth W wrote:
> > > David Gibson wrote on Tuesday, October 25, 2005 7:49 PM
> > > > +int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > > > + unsigned long address, int write_access)
> > > > +{
> > > > + pte_t *ptep;
> > > > + pte_t entry;
> > > > +
> > > > + ptep = huge_pte_alloc(mm, address);
> > > > + if (! ptep)
> > > > + /* OOM */
> > > > + return VM_FAULT_SIGBUS;
> > > > +
> > > > + entry = *ptep;
> > > > +
> > > > + if (pte_none(entry))
> > > > + return hugetlb_no_page(mm, vma, address, ptep);
> > > > +
> > > > + /* we could get here if another thread instantiated the pte
> > > > + * before the test above */
> > > > +
> > > > + return VM_FAULT_SIGBUS;
> > > > }
> > >
> > > Are you sure about the last return? Looks like a typo to me, if *ptep
> > > is present, it should return VM_FAULT_MINOR.
> >
> > Oops, yes, thinko. Corrected patch shortly.
>
> While you at it, I think it would be preferable that the first return be
> VM_FAULT_OOM, your thoughts?
Ok, here's the revised patch with this change, and the others
mentioned elsewhere.
RFC: Cleanup / small fixes to hugetlb fault handling
This patch makes some slight tweaks / cleanups to the fault handling
path for huge pages in -mm. My main motivation is to make it simpler
to fit COW in, but along the way it addresses a few minor problems
with the existing code:
- The check against i_size was duplicated: once in
find_lock_huge_page() and again in hugetlb_fault() after taking the
page_table_lock. We only really need the locked one, so remove the
other.
- find_lock_huge_page() isn't a great name, since it does extra things
not analagous to find_lock_page(). Rename it
find_or_alloc_huge_page() which is closer to the mark.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Index: working-2.6/mm/hugetlb.c
===================================================================
--- working-2.6.orig/mm/hugetlb.c 2005-10-27 16:34:20.000000000 +1000
+++ working-2.6/mm/hugetlb.c 2005-10-27 16:34:55.000000000 +1000
@@ -336,30 +336,24 @@
flush_tlb_range(vma, start, end);
}
-static struct page *find_lock_huge_page(struct address_space *mapping,
- unsigned long idx)
+static struct page *find_or_alloc_huge_page(struct address_space *mapping,
+ unsigned long idx)
{
struct page *page;
int err;
- struct inode *inode = mapping->host;
- unsigned long size;
retry:
page = find_lock_page(mapping, idx);
if (page)
- goto out;
-
- /* Check to make sure the mapping hasn't been truncated */
- size = i_size_read(inode) >> HPAGE_SHIFT;
- if (idx >= size)
- goto out;
+ return page;
if (hugetlb_get_quota(mapping))
- goto out;
+ return NULL;
+
page = alloc_huge_page();
if (!page) {
hugetlb_put_quota(mapping);
- goto out;
+ return NULL;
}
err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
@@ -370,50 +364,49 @@
goto retry;
page = NULL;
}
-out:
+
return page;
}
-int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long address, int write_access)
+int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, pte_t *ptep)
{
- int ret = VM_FAULT_SIGBUS;
+ int ret;
unsigned long idx;
unsigned long size;
- pte_t *pte;
struct page *page;
struct address_space *mapping;
- pte = huge_pte_alloc(mm, address);
- if (!pte)
- goto out;
-
mapping = vma->vm_file->f_mapping;
idx = ((address - vma->vm_start) >> HPAGE_SHIFT)
+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
- /*
- * Use page lock to guard against racing truncation
- * before we get page_table_lock.
- */
- page = find_lock_huge_page(mapping, idx);
+ /* This returns a locked page, which keeps us safe in the
+ * event of a race with truncate() */
+ page = find_or_alloc_huge_page(mapping, idx);
if (!page)
- goto out;
+ return VM_FAULT_SIGBUS;
spin_lock(&mm->page_table_lock);
+
+ ret = VM_FAULT_SIGBUS;
+
size = i_size_read(mapping->host) >> HPAGE_SHIFT;
if (idx >= size)
goto backout;
ret = VM_FAULT_MINOR;
- if (!pte_none(*pte))
+
+ if (!pte_none(*ptep))
+ /* oops, someone instantiated this PTE before us */
goto backout;
add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE);
- set_huge_pte_at(mm, address, pte, make_huge_pte(vma, page));
+ set_huge_pte_at(mm, address, ptep, make_huge_pte(vma, page));
+
spin_unlock(&mm->page_table_lock);
unlock_page(page);
-out:
+
return ret;
backout:
@@ -421,7 +414,29 @@
hugetlb_put_quota(mapping);
unlock_page(page);
put_page(page);
- goto out;
+
+ return ret;
+}
+
+int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, int write_access)
+{
+ pte_t *ptep;
+ pte_t entry;
+
+ ptep = huge_pte_alloc(mm, address);
+ if (! ptep)
+ return VM_FAULT_OOM;
+
+ entry = *ptep;
+
+ if (pte_none(entry))
+ return hugetlb_no_page(mm, vma, address, ptep);
+
+ /* we could get here if another thread instantiated the pte
+ * before the test above */
+
+ return VM_FAULT_MINOR;
}
int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/people/dgibson
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2005-10-27 6:39 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-10-26 2:00 RFC: Cleanup / small fixes to hugetlb fault handling David Gibson
2005-10-26 2:00 ` David Gibson
2005-10-26 2:48 ` David Gibson
2005-10-26 2:48 ` David Gibson
2005-10-26 5:06 ` __alloc_pages: 0-order allocation failed (gfp=0x1d2/0) salve vandana
2005-10-26 5:06 ` salve vandana
2005-10-26 11:12 ` salve vandana
2005-10-26 11:12 ` salve vandana
2005-10-26 18:44 ` RFC: Cleanup / small fixes to hugetlb fault handling Chen, Kenneth W
2005-10-26 18:44 ` Chen, Kenneth W
2005-10-27 0:05 ` 'David Gibson'
2005-10-27 0:05 ` 'David Gibson'
2005-10-27 0:16 ` Chen, Kenneth W
2005-10-27 0:16 ` Chen, Kenneth W
2005-10-27 0:23 ` 'David Gibson'
2005-10-27 0:23 ` 'David Gibson'
2005-10-27 6:37 ` 'David Gibson' [this message]
2005-10-27 6:37 ` 'David Gibson'
2005-10-26 20:12 ` Chen, Kenneth W
2005-10-26 20:12 ` Chen, Kenneth W
2005-10-27 0:14 ` 'David Gibson'
2005-10-27 0:14 ` 'David Gibson'
2005-10-26 21:46 ` Adam Litke
2005-10-26 21:46 ` Adam Litke
2005-10-27 0:20 ` David Gibson
2005-10-27 0:20 ` David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20051027063702.GB7176@localhost.localdomain \
--to=david@gibson.dropbear.id.au \
--cc=agl@us.ibm.com \
--cc=hugh@veritas.com \
--cc=kenneth.w.chen@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=wli@holomorphy.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.