linux-mm.kvack.org archive mirror
* [PATCH mmotm] thp: transparent hugepage core fixlet
@ 2011-01-11  0:55 Hugh Dickins
  2011-01-11  1:57 ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2011-01-11  0:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Arcangeli, linux-mm

If you configure THP in addition to HUGETLB_PAGE on x86_32 without PAE,
the p?d-folding works out that munlock_vma_pages_range() can crash to
follow_page()'s pud_huge() BUG_ON(flags & FOLL_GET): it needs the same
VM_HUGETLB check already there on the pmd_huge() line.  Conveniently,
openSUSE provides a "blogd" which tests this out at startup!

Signed-off-by: Hugh Dickins <hughd@google.com>
---
This massive rework belongs just after thp-transparent-hugepage-core.patch

 mm/memory.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- mmotm.orig/mm/memory.c	2011-01-10 16:31:29.000000000 -0800
+++ mmotm/mm/memory.c	2011-01-10 16:33:16.000000000 -0800
@@ -1288,7 +1288,7 @@ struct page *follow_page(struct vm_area_
 	pud = pud_offset(pgd, address);
 	if (pud_none(*pud))
 		goto no_page_table;
-	if (pud_huge(*pud)) {
+	if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
 		BUG_ON(flags & FOLL_GET);
 		page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE);
 		goto out;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH mmotm] thp: transparent hugepage core fixlet
  2011-01-11  0:55 [PATCH mmotm] thp: transparent hugepage core fixlet Hugh Dickins
@ 2011-01-11  1:57 ` Andrea Arcangeli
  2011-01-11  2:29   ` Hugh Dickins
  0 siblings, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2011-01-11  1:57 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, linux-mm

Hi Hugh,

On Mon, Jan 10, 2011 at 04:55:53PM -0800, Hugh Dickins wrote:
> If you configure THP in addition to HUGETLB_PAGE on x86_32 without PAE,
> the p?d-folding works out that munlock_vma_pages_range() can crash to
> follow_page()'s pud_huge() BUG_ON(flags & FOLL_GET): it needs the same
> VM_HUGETLB check already there on the pmd_huge() line.  Conveniently,
> openSUSE provides a "blogd" which tests this out at startup!
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> This massive rework belongs just after thp-transparent-hugepage-core.patch
> 
>  mm/memory.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- mmotm.orig/mm/memory.c	2011-01-10 16:31:29.000000000 -0800
> +++ mmotm/mm/memory.c	2011-01-10 16:33:16.000000000 -0800
> @@ -1288,7 +1288,7 @@ struct page *follow_page(struct vm_area_
>  	pud = pud_offset(pgd, address);
>  	if (pud_none(*pud))
>  		goto no_page_table;
> -	if (pud_huge(*pud)) {
> +	if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
>  		BUG_ON(flags & FOLL_GET);
>  		page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE);
>  		goto out;

How is THP related to this? pud_trans_huge doesn't exist, if pud_huge
is true, vma is already guaranteed to belong to hugetlbfs without
requiring the additional check.

I added the check to pmd_huge already, there it is needed, but for
pud_huge it isn't as far as I can tell.

* Re: [PATCH mmotm] thp: transparent hugepage core fixlet
  2011-01-11  1:57 ` Andrea Arcangeli
@ 2011-01-11  2:29   ` Hugh Dickins
  2011-01-11 14:04     ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2011-01-11  2:29 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, linux-mm

On Mon, Jan 10, 2011 at 5:57 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Mon, Jan 10, 2011 at 04:55:53PM -0800, Hugh Dickins wrote:
>> If you configure THP in addition to HUGETLB_PAGE on x86_32 without PAE,
>> the p?d-folding works out that munlock_vma_pages_range() can crash to
>> follow_page()'s pud_huge() BUG_ON(flags & FOLL_GET): it needs the same
>> VM_HUGETLB check already there on the pmd_huge() line.  Conveniently,
>> openSUSE provides a "blogd" which tests this out at startup!
>
> How is THP related to this? pud_trans_huge doesn't exist, if pud_huge
> is true, vma is already guaranteed to belong to hugetlbfs without
> requiring the additional check.

THP puts in pmds that are huge.  In this configuration the "folding" is
such that the puds are the pmds.  So the pud_huge test passes and
the BUG_ON hits.  I hope I've explained that correctly, agreed that
it's confusing!
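
The folding described above can be modelled in userspace. What follows is
only a minimal sketch, not kernel code: every model_* name and MODEL_*
constant is hypothetical, the MODEL_VM_HUGETLB value is arbitrary, and the
only detail taken from x86 is that the page-size bit (_PAGE_PSE) is 0x080.
It assumes pud and pmd are views of one and the same entry, as on x86_32
without PAE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_PAGE_PSE   0x080u  /* x86 page-size bit, as in _PAGE_PSE */
#define MODEL_VM_HUGETLB 0x1u    /* stand-in for VM_HUGETLB, value arbitrary */

/* With folded levels, the pmd is just a wrapper around the pud entry. */
typedef struct { uint32_t val; } model_pud_t;
typedef struct { model_pud_t pud; } model_pmd_t;

/* Mimics pmd_offset() on a folded level: no new table, same entry. */
static model_pmd_t model_pmd_of(model_pud_t pud)
{
	model_pmd_t pmd;

	pmd.pud = pud;
	return pmd;
}

static int model_pmd_huge(model_pmd_t pmd)
{
	return !!(pmd.pud.val & MODEL_PAGE_PSE);
}

static int model_pud_huge(model_pud_t pud)
{
	return !!(pud.val & MODEL_PAGE_PSE);
}

/* An entry as THP might install it: present and huge, in an ordinary vma. */
static model_pud_t model_thp_entry(void)
{
	model_pud_t pud = { MODEL_PAGE_PSE | 0x1u };

	return pud;
}

/* Returns 1 if the pud-level hugetlbfs block (home of the BUG_ON) is entered. */
static int model_enters_pud_block_old(model_pud_t pud, unsigned int vm_flags)
{
	(void)vm_flags;	/* the old test ignores the vma entirely */
	return model_pud_huge(pud);
}

/* Same decision with the extra VM_HUGETLB test from the patch. */
static int model_enters_pud_block_new(model_pud_t pud, unsigned int vm_flags)
{
	return model_pud_huge(pud) && (vm_flags & MODEL_VM_HUGETLB);
}
```

In this model, an entry made huge at the pmd level also looks huge at the
pud level, so the old test enters the hugetlbfs block for a vma that is not
VM_HUGETLB, while the new test skips it.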

>
> I added the check to pmd_huge already, there it is needed, but for
> pud_huge it isn't as far as I can tell.

Crashing on that BUG_ON suggests otherwise ;)

Hugh

* Re: [PATCH mmotm] thp: transparent hugepage core fixlet
  2011-01-11  2:29   ` Hugh Dickins
@ 2011-01-11 14:04     ` Andrea Arcangeli
  2011-01-11 16:31       ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2011-01-11 14:04 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, linux-mm

On Mon, Jan 10, 2011 at 06:29:29PM -0800, Hugh Dickins wrote:
> On Mon, Jan 10, 2011 at 5:57 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> > On Mon, Jan 10, 2011 at 04:55:53PM -0800, Hugh Dickins wrote:
> >> If you configure THP in addition to HUGETLB_PAGE on x86_32 without PAE,
> >> the p?d-folding works out that munlock_vma_pages_range() can crash to
> >> follow_page()'s pud_huge() BUG_ON(flags & FOLL_GET): it needs the same
> >> VM_HUGETLB check already there on the pmd_huge() line.  Conveniently,
> >> openSUSE provides a "blogd" which tests this out at startup!
> >
> > How is THP related to this? pud_trans_huge doesn't exist, if pud_huge
> > is true, vma is already guaranteed to belong to hugetlbfs without
> > requiring the additional check.
> 
> THP puts in pmds that are huge.  In this configuration the "folding" is
> such that the puds are the pmds.  So the pud_huge test passes and
> the BUG_ON hits.  I hope I've explained that correctly, agreed that
> it's confusing!
> 
> >
> > I added the check to pmd_huge already, there it is needed, but for
> > pud_huge it isn't as far as I can tell.
> 
> Crashing on that BUG_ON suggests otherwise ;)

I think I see what you mean, pgd=pud=pmd with 2 levels only, but if
pud_huge can return 1 on x86_32 without PAE, that sounds like an
architectural bug to me. Why can't pud_huge simply return 0 for
x86_32? Any other place dealing with hugepages and calling pud_huge on
x86 noPAE would be at risk, otherwise, no?

* Re: [PATCH mmotm] thp: transparent hugepage core fixlet
  2011-01-11 14:04     ` Andrea Arcangeli
@ 2011-01-11 16:31       ` Andrea Arcangeli
  2011-01-11 22:59         ` Hugh Dickins
  0 siblings, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2011-01-11 16:31 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, linux-mm

On Tue, Jan 11, 2011 at 03:04:21PM +0100, Andrea Arcangeli wrote:
> architectural bug to me. Why can't pud_huge simply return 0 for
> x86_32? Any other place dealing with hugepages and calling pud_huge on
> x86 noPAE would be at risk, otherwise, no?

Isn't this a better solution?

======
Subject: avoid confusing hugetlbfs code when pmd_trans_huge is set

From: Andrea Arcangeli <aarcange@redhat.com>

If pmd is set huge by THP, pud_huge shouldn't return 1 when pud doesn't exist
and it's just a 1:1 bypass over the pmd (as happens on 32-bit x86, which has
only 2 or 3 levels of pagetables). Only pmd_huge can return 1.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -227,7 +227,15 @@ int pmd_huge(pmd_t pmd)
 
 int pud_huge(pud_t pud)
 {
+#ifdef CONFIG_X86_64
 	return !!(pud_val(pud) & _PAGE_PSE);
+#else
+	/*
+	 * pud is a bypass with 2 or 3 level pagetables, only pmd_huge
+	 * can return 1.
+	 */
+	return 0;
+#endif
 }
 
 struct page *

* Re: [PATCH mmotm] thp: transparent hugepage core fixlet
  2011-01-11 16:31       ` Andrea Arcangeli
@ 2011-01-11 22:59         ` Hugh Dickins
  2011-01-12  2:02           ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2011-01-11 22:59 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andi Kleen, Jeremy Fitzhardinge, Andrew Morton, linux-mm

On Tue, 11 Jan 2011, Andrea Arcangeli wrote:
> On Tue, Jan 11, 2011 at 03:04:21PM +0100, Andrea Arcangeli wrote:
> > architectural bug to me. Why can't pud_huge simply return 0 for
> > x86_32? Any other place dealing with hugepages and calling pud_huge on
> > x86 noPAE would be at risk, otherwise, no?
> 
> Isn't this a better solution?

[Better solution than my patch to follow_page() in mmotm, to fix crash
with Transparent Huge Pages by duplicating Andrea's pmd_huge VM_HUGETLB
check to the pud_huge line too.]

The truth is, I'm sure one of the solutions is better than the other,
but I'm too confused by p?d folding to know which is which ;)

Certainly I don't oppose your patch as a replacement for mine,
if you're sure yours is better.

There are only two places which are using pud_huge() anyway:
follow_page() and apply_to_pmd_range().  Is the latter's
BUG_ON(pud_huge) safe?  Safe in the THP world?

And I never quite understood why we have both pmd_huge and pmd_large,
pud_huge and pud_large.

There are answers to these questions, but it would take me hours and
hours of easily-confused research (across several arches) to decide.

I'm hoping someone else has a surer grasp: Andi introduced pud_huge(),
and Jeremy is the most active in the pagetable layers nowadays -
perhaps they can tell us more quickly.

Hugh

> 
> ======
> Subject: avoid confusing hugetlbfs code when pmd_trans_huge is set
> 
> From: Andrea Arcangeli <aarcange@redhat.com>
> 
> If pmd is set huge by THP, pud_huge shouldn't return 1 when pud doesn't exist
> and it's just a 1:1 bypass over the pmd (as happens on 32-bit x86, which has
> only 2 or 3 levels of pagetables). Only pmd_huge can return 1.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
> 
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -227,7 +227,15 @@ int pmd_huge(pmd_t pmd)
>  
>  int pud_huge(pud_t pud)
>  {
> +#ifdef CONFIG_X86_64
>  	return !!(pud_val(pud) & _PAGE_PSE);
> +#else
> +	/*
> +	 * pud is a bypass with 2 or 3 level pagetables, only pmd_huge
> +	 * can return 1.
> +	 */
> +	return 0;
> +#endif
>  }
>  
>  struct page *

* Re: [PATCH mmotm] thp: transparent hugepage core fixlet
  2011-01-11 22:59         ` Hugh Dickins
@ 2011-01-12  2:02           ` Andrea Arcangeli
  0 siblings, 0 replies; 7+ messages in thread
From: Andrea Arcangeli @ 2011-01-12  2:02 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andi Kleen, Jeremy Fitzhardinge, Andrew Morton, linux-mm

On Tue, Jan 11, 2011 at 02:59:43PM -0800, Hugh Dickins wrote:
> On Tue, 11 Jan 2011, Andrea Arcangeli wrote:
> > On Tue, Jan 11, 2011 at 03:04:21PM +0100, Andrea Arcangeli wrote:
> > > architectural bug to me. Why can't pud_huge simply return 0 for
> > > x86_32? Any other place dealing with hugepages and calling pud_huge on
> > > x86 noPAE would be at risk, otherwise, no?
> > 
> > Isn't this a better solution?
> 
> [Better solution than my patch to follow_page() in mmotm, to fix crash
> with Transparent Huge Pages by duplicating Andrea's pmd_huge VM_HUGETLB
> check to the pud_huge line too.]
> 
> The truth is, I'm sure one of the solutions is better than the other,
> but I'm too confused by p?d folding to know which is which ;)
> 
> Certainly I don't oppose your patch as a replacement for mine,
> if you're sure yours is better.
> 
> There are only two places which are using pud_huge() anyway:
> follow_page() and apply_to_pmd_range().  Is the latter's
> BUG_ON(pud_huge) safe?  Safe in the THP world?

The latter BUG_ON should be safe in the THP world: there's a pmd_huge
BUG_ON too, so it can't be a problem there.

> And I never quite understood why we have both pmd_huge and pmd_large,
> pud_huge and pud_large.
> 
> There are answers to these questions, but it would take me hours and
> hours of easily-confused research (across several arches) to decide.
> 
> I'm hoping someone else has a surer grasp: Andi introduced pud_huge(),
> and Jeremy is the most active in the pagetable layers nowadays -
> perhaps they can tell us more quickly.

I'd like their opinion too, but for exactly the same reason you
asked yourself whether the latter BUG_ON is safe, I think my patch
reduces the risk from a practical perspective.

When THP uses pmd_mkhuge it's counterintuitive that pud_huge returns
1, and there's no benefit to that at all other than risking trouble
like this. In fact my patch eliminates a branch and a block of
follow_page at compile time, whereas your patch adds one more branch
and can't eliminate the block of code, even though we know that block
can never be reached.
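
The compile-time argument can be sketched in userspace. This is only an
illustration under stated assumptions: MODEL_HAS_REAL_PUD is a hypothetical
stand-in for CONFIG_X86_64, and every model_* name is invented for the
example. Because the test is a constant 0 when the macro is undefined, an
optimizing compiler is free to drop the branch and the block behind it.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_PSE 0x080u	/* x86 page-size bit, as in _PAGE_PSE */

typedef struct { uint32_t val; } model_pud_t;

/* MODEL_HAS_REAL_PUD is a hypothetical stand-in for CONFIG_X86_64. */
static inline int model_pud_huge(model_pud_t pud)
{
#ifdef MODEL_HAS_REAL_PUD
	return !!(pud.val & MODEL_PAGE_PSE);
#else
	/* The pud is only a bypass here, so it can never be huge. */
	(void)pud;
	return 0;
#endif
}

enum model_path { MODEL_PATH_HUGETLB_PUD, MODEL_PATH_PMD };

/* Reports which path a follow_page()-like walker would take. */
static enum model_path model_follow(model_pud_t pud)
{
	if (model_pud_huge(pud))
		return MODEL_PATH_HUGETLB_PUD;	/* dead code when the test is 0 */
	return MODEL_PATH_PMD;
}
```

Built without MODEL_HAS_REAL_PUD, model_follow() always takes the pmd path,
whatever bits the entry carries.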

I consider this an arch bug, not a common code issue. These are the THP
modifications to the code, and I didn't expect to have to alter the
pud_huge check in addition to the one below. I thought this should be
enough if the arch is correct (and optimal).

@@ -1273,11 +1301,32 @@ struct page *follow_page(struct vm_area_
        pmd = pmd_offset(pud, address);
        if (pmd_none(*pmd))
                goto no_page_table;
-       if (pmd_huge(*pmd)) {
+       if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
                BUG_ON(flags & FOLL_GET);
                page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
                goto out;

I think x86 3level should work OK with follow_huge_pmd (it's
basically identical to follow_huge_pud, so it won't notice the
difference), so I hope it doesn't break anything, and it will speed up
follow_page too (even when THP is off).

Across the whole tree, if you grep for pmd_offset you'll find all the
places where you have to care about THP. I'd still like not to have to
care about the result of pud_offset (having to care about pmd_offset is
more than enough ;).

Other archs implementing pud_huge should also return 0 if there are
only 3 levels. If they introduce THP, this will have the benefit of
optimizing follow_page for them as well, even when THP is off.

Your patch is OK if this is not considered an arch bug (to avoid
mistakes I think pud_huge should be implemented by pgtable-nopud.h,
but that's a somewhat bigger cleanup that I haven't done yet).

Thread overview: 7+ messages
2011-01-11  0:55 [PATCH mmotm] thp: transparent hugepage core fixlet Hugh Dickins
2011-01-11  1:57 ` Andrea Arcangeli
2011-01-11  2:29   ` Hugh Dickins
2011-01-11 14:04     ` Andrea Arcangeli
2011-01-11 16:31       ` Andrea Arcangeli
2011-01-11 22:59         ` Hugh Dickins
2011-01-12  2:02           ` Andrea Arcangeli
