xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [Linux PATCH] Fix to hugepages to work around new PWT handling
@ 2010-06-09 14:02 Dave McCracken
  2010-06-09 18:21 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 7+ messages in thread
From: Dave McCracken @ 2010-06-09 14:02 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Xen Developers List

Recent changes to Linux include code to set new flags in the pte,
including _PAGE_PAT and _PAGE_PWT.  That change conflicts with hugepage
using the pte macros to set up its pmd entries.  This patch resolves
that problem.

An additional fix here is to make sure the _PAGE_PRESENT bit is set
before hugepages does a mk_pte(), since Xen depends on that bit to
trigger the pfn->mfn translation.

Signed-off-by: Dave McCracken <dave.mccracken@oracle.com>

--------


--- stable-2.6.32.x//arch/x86/include/asm/hugetlb.h	2010-06-04 12:19:31.000000000 -0500
+++ 2.6.32-hfix//arch/x86/include/asm/hugetlb.h	2010-06-08 12:23:53.000000000 -0500
@@ -44,7 +44,7 @@ static inline pte_t huge_ptep_get(pte_t
 static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 				   pte_t *ptep, pte_t pte)
 {
-	set_pmd((pmd_t *)ptep, __pmd(pte_val(pte)));
+	set_pmd((pmd_t *)ptep, native_make_pmd(native_pte_val(pte)));
 }
 
 static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
--- stable-2.6.32.x//mm/hugetlb.c	2010-06-04 12:19:36.000000000 -0500
+++ 2.6.32-hfix//mm/hugetlb.c	2010-06-07 07:11:34.000000000 -0500
@@ -1732,12 +1732,14 @@ static pte_t make_huge_pte(struct vm_are
 				int writable)
 {
 	pte_t entry;
+	pgprot_t pgprot;
 
+	pgprot = __pgprot(pgprot_val(vma->vm_page_prot) | _PAGE_PRESENT);
 	if (writable) {
 		entry =
-		    pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
+		    pte_mkwrite(pte_mkdirty(mk_pte(page, pgprot)));
 	} else {
-		entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot));
+		entry = huge_pte_wrprotect(mk_pte(page, pgprot));
 	}
 	entry = pte_mkyoung(entry);
 	entry = pte_mkhuge(entry);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Linux PATCH] Fix to hugepages to work around new PWT handling
  2010-06-09 14:02 [Linux PATCH] Fix to hugepages to work around new PWT handling Dave McCracken
@ 2010-06-09 18:21 ` Jeremy Fitzhardinge
  2010-06-09 18:35   ` Dave McCracken
  0 siblings, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-09 18:21 UTC (permalink / raw)
  To: Dave McCracken; +Cc: Xen Developers List

On 06/09/2010 07:02 AM, Dave McCracken wrote:
> Recent changes to Linux include code to set new flags in the pte,
> including _PAGE_PAT and _PAGE_PWT.  That change conflicts with hugepage
> using the pte macros to set up its pmd entries.  This patch resolves
> that problem.
>   

Could you explain this a bit more clearly?  Why is using __pmd not
working in this case?  Is it because the kernel is now setting PAT and
PWT on huge pages?  But PAT isn't even the same flag for huge pages...

> An additional fix here is to make sure the _PAGE_PRESENT bit is set
> before hugepages does a mk_pte(), since Xen depends on that bit to
> trigger the pfn->mfn translation.
>   

Why is the kernel creating a non-present mapping?  If it isn't present,
why does it matter whether we do the pfn->mfn conversion?

    J

> Signed-off-by: Dave McCracken <dave.mccracken@oracle.com>
>
> --------
>
>
> --- stable-2.6.32.x//arch/x86/include/asm/hugetlb.h	2010-06-04 12:19:31.000000000 -0500
> +++ 2.6.32-hfix//arch/x86/include/asm/hugetlb.h	2010-06-08 12:23:53.000000000 -0500
> @@ -44,7 +44,7 @@ static inline pte_t huge_ptep_get(pte_t
>  static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>  				   pte_t *ptep, pte_t pte)
>  {
> -	set_pmd((pmd_t *)ptep, __pmd(pte_val(pte)));
> +	set_pmd((pmd_t *)ptep, native_make_pmd(native_pte_val(pte)));
>  }
>  
>  static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> --- stable-2.6.32.x//mm/hugetlb.c	2010-06-04 12:19:36.000000000 -0500
> +++ 2.6.32-hfix//mm/hugetlb.c	2010-06-07 07:11:34.000000000 -0500
> @@ -1732,12 +1732,14 @@ static pte_t make_huge_pte(struct vm_are
>  				int writable)
>  {
>  	pte_t entry;
> +	pgprot_t pgprot;
>  
> +	pgprot = __pgprot(pgprot_val(vma->vm_page_prot) | _PAGE_PRESENT);
>  	if (writable) {
>  		entry =
> -		    pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
> +		    pte_mkwrite(pte_mkdirty(mk_pte(page, pgprot)));
>  	} else {
> -		entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot));
> +		entry = huge_pte_wrprotect(mk_pte(page, pgprot));
>  	}
>  	entry = pte_mkyoung(entry);
>  	entry = pte_mkhuge(entry);
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
>   

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Linux PATCH] Fix to hugepages to work around new PWT handling
  2010-06-09 18:21 ` Jeremy Fitzhardinge
@ 2010-06-09 18:35   ` Dave McCracken
  2010-06-09 18:54     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 7+ messages in thread
From: Dave McCracken @ 2010-06-09 18:35 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Xen Developers List

On Wednesday, June 09, 2010, Jeremy Fitzhardinge wrote:
> > Recent changes to Linux include code to set new flags in the pte,
> > including _PAGE_PAT and _PAGE_PWT.  That change conflicts with hugepage
> > using the pte macros to set up its pmd entries.  This patch resolves
> > that problem.
> >
> >   
> 
> Could you explain this a bit more clearly?  Why is using __pmd not
> working in this case?  Is it because the kernel is now setting PAT and
> PWT on huge pages?  But PAT isn't even the same flag for huge pages...

For some reason the latest xen_make_pte() and xen_pte_val() are attempting to 
do some magic with PAT, PCD, and PWT.  The previous version of the code in 
set_huge_pte_at() did "__pmd(pte_val(pte))", and the end result was that PSE 
got turned off and PWT was turned on.  Doing the native versions of those calls 
avoids this issue, since all the code is really trying to do here is a 
typecast.

I realize the more complete fix probably involves something like converting 
hugepages to use pmd throughout instead of pte, but that's a much bigger 
change and this solves the immediate problem.

> > An additional fix here is to make sure the _PAGE_PRESENT bit is set
> > before hugepages does a mk_pte(), since Xen depends on that bit to
> > trigger the pfn->mfn translation.
> >
> >   
> 
> Why is the kernel creating a non-present mapping?  If it isn't present,
> why does it matter whether we do the pfn->mfn conversion?

The hugepage function make_huge_pte() called mk_pte() to turn a page and a 
pgprot into a pte before it set PRESENT.  The PRESENT flag was set after the 
pte was made.  This meant that the Xen version of the macro did not see 
PRESENT so did not do the pfn_to_mfn().  My patch sets PRESENT first so the 
right thing will happen.

Dave McCracken
Oracle Corp.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Linux PATCH] Fix to hugepages to work around new PWT handling
  2010-06-09 18:35   ` Dave McCracken
@ 2010-06-09 18:54     ` Jeremy Fitzhardinge
  2010-06-09 19:26       ` Dave McCracken
  0 siblings, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-09 18:54 UTC (permalink / raw)
  To: Dave McCracken; +Cc: Xen Developers List

On 06/09/2010 11:35 AM, Dave McCracken wrote:
> On Wednesday, June 09, 2010, Jeremy Fitzhardinge wrote:
>   
>>> Recent changes to Linux include code to set new flags in the pte,
>>> including _PAGE_PAT and _PAGE_PWT.  That change conflicts with hugepage
>>> using the pte macros to set up its pmd entries.  This patch resolves
>>> that problem.
>>>
>>>   
>>>       
>> Could you explain this a bit more clearly?  Why is using __pmd not
>> working in this case?  Is it because the kernel is now setting PAT and
>> PWT on huge pages?  But PAT isn't even the same flag for huge pages...
>>     
> For some reason the latest xen_make_pte() and xen_pte_val() are attempting to 
> do some magic with PAT, PCD, and PWT.  The previous version of the code in 
> set_huge_pte_at() did "__pmd(pte_val(pte))", and the end result was that PSE 
> got turned off and PWT was turned on.  Doing the native versions of those calls 
> avoids this issue, since all the code is really trying to do here is a 
> typecast.
>   

Yes, the pte code is converting the Linux PAT encodings to the Xen ones,
so that the resulting mappings have the right page properties.

Does the Linux hugepage code set PAT_LARGE on huge ptes, and does Xen
support huge mappings with PAT properties?  If so, you'll probably need
to do the same thing.

> I realize the more complete fix probably involves something like converting 
> hugepages to use pmd throughout instead of pte, but that's a much bigger 
> change and this solves the immediate problem.
>   

I think that's actually pretty straightforward.  It shouldn't be using
the plain pte/pmd macros anyway, but the huge_ counterparts.  huge_pte
should map to pmd.

But this portion of the patch looks OK for now.

>>> An additional fix here is to make sure the _PAGE_PRESENT bit is set
>>> before hugepages does a mk_pte(), since Xen depends on that bit to
>>> trigger the pfn->mfn translation.
>>>
>>>   
>>>       
>> Why is the kernel creating a non-present mapping?  If it isn't present,
>> why does it matter whether we do the pfn->mfn conversion?
>>     
> The hugepage function make_huge_pte() called mk_pte() to turn a page and a 
> pgprot into a pte before it set PRESENT.  The PRESENT flag was set after the 
> pte was made.  This meant that the Xen version of the macro did not see 
> PRESENT so did not do the pfn_to_mfn().  My patch sets PRESENT first so the 
> right thing will happen.
>   

But in general kernel code shouldn't be just nakedly setting present on
the pte without also remaking the whole thing.  That doesn't happen with
normal ptes, and it probably shouldn't happen with huge ptes.  Forcing
present on a pte at this level seems very bogus.   Why not change the
upper code to set present if that's want it wants?

I'll skip this chunk for now.

    J

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Linux PATCH] Fix to hugepages to work around new PWT handling
  2010-06-09 18:54     ` Jeremy Fitzhardinge
@ 2010-06-09 19:26       ` Dave McCracken
  2010-06-24 10:05         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 7+ messages in thread
From: Dave McCracken @ 2010-06-09 19:26 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Xen Developers List

On Wednesday, June 09, 2010, Jeremy Fitzhardinge wrote:
> >>> An additional fix here is to make sure the _PAGE_PRESENT bit is set
> >>> before hugepages does a mk_pte(), since Xen depends on that bit to
> >>> trigger the pfn->mfn translation.
> >>>
> >>>   
> >>>       
> >>>
> >> Why is the kernel creating a non-present mapping?  If it isn't present,
> >> why does it matter whether we do the pfn->mfn conversion?
> >>
> >>     
> >>
> > The hugepage function make_huge_pte() called mk_pte() to turn a page and
> > a  pgprot into a pte before it set PRESENT.  The PRESENT flag was set
> > after the pte was made.  This meant that the Xen version of the macro
> > did not see PRESENT so did not do the pfn_to_mfn().  My patch sets
> > PRESENT first so the right thing will happen.
> >
> >   
> 
> But in general kernel code shouldn't be just nakedly setting present on
> the pte without also remaking the whole thing.  That doesn't happen with
> normal ptes, and it probably shouldn't happen with huge ptes.  Forcing
> present on a pte at this level seems very bogus.   Why not change the
> upper code to set present if that's want it wants?
> 
> I'll skip this chunk for now.

Um, this is the upper level code.  The entire purpose of make_huge_pte is to 
construct a present huge pte from page and pgprot. The problem is that the 
original code makes the pte, then sets the present bit via pte_mkhuge().  This 
means the Xen-specific macro that triggers on present is misled and doesn't do 
the pfn_to_mfn().  Without this patch hugepages is handing pfns to the 
hypervisor to map instead of mfns.

Dave

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Linux PATCH] Fix to hugepages to work around new PWT handling
  2010-06-09 19:26       ` Dave McCracken
@ 2010-06-24 10:05         ` Jeremy Fitzhardinge
  2010-06-24 22:38           ` Dave McCracken
  0 siblings, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-24 10:05 UTC (permalink / raw)
  To: Dave McCracken; +Cc: Xen Developers List

On 06/09/2010 08:26 PM, Dave McCracken wrote:
>> But in general kernel code shouldn't be just nakedly setting present on
>> the pte without also remaking the whole thing.  That doesn't happen with
>> normal ptes, and it probably shouldn't happen with huge ptes.  Forcing
>> present on a pte at this level seems very bogus.   Why not change the
>> upper code to set present if that's want it wants?
>>
>> I'll skip this chunk for now.
>>     
> Um, this is the upper level code.  The entire purpose of make_huge_pte is to 
> construct a present huge pte from page and pgprot. The problem is that the 
> original code makes the pte, then sets the present bit via pte_mkhuge().  This 
> means the Xen-specific macro that triggers on present is misled and doesn't do 
> the pfn_to_mfn().  Without this patch hugepages is handing pfns to the 
> hypervisor to map instead of mfns.
>   

In principle, setting present should cause the pte to be converted from
pfn to mfn, but I don't think that ever happens with normal ptes (since
non-present ptes contain swap info).  But I don't see where a huge pte
gets present set; pte_mkhuge itself doesn't do anything except set PSE.

    J

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Linux PATCH] Fix to hugepages to work around new PWT handling
  2010-06-24 10:05         ` Jeremy Fitzhardinge
@ 2010-06-24 22:38           ` Dave McCracken
  0 siblings, 0 replies; 7+ messages in thread
From: Dave McCracken @ 2010-06-24 22:38 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Xen Developers List

On Thursday, June 24, 2010, Jeremy Fitzhardinge wrote:
> > Um, this is the upper level code.  The entire purpose of make_huge_pte is
> > to  construct a present huge pte from page and pgprot. The problem is
> > that the original code makes the pte, then sets the present bit via
> > pte_mkhuge().  This means the Xen-specific macro that triggers on
> > present is misled and doesn't do the pfn_to_mfn().  Without this patch
> > hugepages is handing pfns to the hypervisor to map instead of mfns.
> >
> >   
> 
> In principle, setting present should cause the pte to be converted from
> pfn to mfn, but I don't think that ever happens with normal ptes (since
> non-present ptes contain swap info).  But I don't see where a huge pte
> gets present set; pte_mkhuge itself doesn't do anything except set PSE.

Wow.  I just dug through the code.  The landscape has sure changed since the 
last time I followed this path.

It used to be that vma->vm_page_prot only contained the various read/write 
flags for that vma.  At that time pte_mkhuge() did in fact add _PAGE_PRESENT|
_PAGE_PSE to the pte.

Now it appears that vma->vm_page_prot does include _PAGE_PRESENT in all its 
various states.  So this part of the patch is in fact unnecessary.

It's what I get for not rechecking my facts to be sure they haven't changed.  
Sorry.

Dave McCracken
Oracle Corp.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-06-24 22:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-09 14:02 [Linux PATCH] Fix to hugepages to work around new PWT handling Dave McCracken
2010-06-09 18:21 ` Jeremy Fitzhardinge
2010-06-09 18:35   ` Dave McCracken
2010-06-09 18:54     ` Jeremy Fitzhardinge
2010-06-09 19:26       ` Dave McCracken
2010-06-24 10:05         ` Jeremy Fitzhardinge
2010-06-24 22:38           ` Dave McCracken

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).