* 2MB page PV guest support clarification
@ 2009-02-27 23:01 Mick Jordan
  2009-02-27 23:28 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 13+ messages in thread
From: Mick Jordan @ 2009-02-27 23:01 UTC (permalink / raw)
  To: xen-devel

I was inspired by the talk from Ben Serebrin at this week's summit to
investigate using 2MB pages in my Java VM on Xen for x86-64.

I think I have done everything correctly, but Xen (3.1.4) rejects my 
attempt to set the PSE bit in the L2 frame for the 2MB page. Looking at
the Xen code, the L2_DISALLOW_MASK (0xFF800180U) check simply rejects the
update if the PSE bit is set.
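
As a rough illustration of why the update is refused, the mask test can be sketched as below. This is a minimal sketch, not Xen's actual code: the mask value is the one quoted above, 0x080 is the architectural x86 PSE bit (bit 7), and the helper names are hypothetical. How Xen derives its flags word from a pagetable entry is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 flag: bit 7 marks a large (PSE) mapping in an L2 entry. */
#define PAGE_PSE          0x080U
/* Mask value quoted in the message above (Xen 3.1.4, x86-64). */
#define L2_DISALLOW_MASK  0xFF800180U

/* Hypothetical helper: true if none of the disallowed flag bits are set. */
bool l2_flags_allowed(uint32_t flags)
{
    return (flags & L2_DISALLOW_MASK) == 0;
}

/* Sanity check: the quoted mask really does cover the PSE bit. */
void check_mask_covers_pse(void)
{
    assert((L2_DISALLOW_MASK & PAGE_PSE) != 0);
}
```

Under these assumptions, an entry with only present and writable set (0x003) passes, while the same entry with PAGE_PSE added is rejected.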

I found some posts from quite a while ago on xen-devel discussing 
patches to allow large pages, so my question is for clarification on
the status of this feature. I.e., is it in any stable release and if so 
what version?

Thanks
Mick

* Re: 2MB page PV guest support clarification
  2009-02-27 23:01 2MB page PV guest support clarification Mick Jordan
@ 2009-02-27 23:28 ` Jeremy Fitzhardinge
  2009-02-27 23:54   ` Mick Jordan
  2009-02-28  0:03   ` Ian Pratt
  0 siblings, 2 replies; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-27 23:28 UTC (permalink / raw)
  To: Mick.Jordan; +Cc: xen-devel

Mick Jordan wrote:
> I was inspired by the talk from Ben Serebrin at this weeks' summit to 
> investigate using 2MB pages in my Java VM on Xen for x86-64.
>
> I think I have done everything correctly, but Xen (3.1.4) rejects my 
> attempt to set the PSE bit in the L2 frame for the 2MB page. Looking 
> at the Xen code the L2_DISALLOW_MASK (0xFF800180U) simply rejects the 
> update if the PSE bit is set.

Yes.  Xen doesn't support large mappings for PV guests.  However, 
there's a lot less to worry about for PV guests compared to hvm guests. 
A PV guest directly uses the CPU's pagetable+tlb hardware, and so a tlb 
miss results in a single simple walk of the pagetable, and the overall 
tlb pressure is a lot less.  The desire to use large pages for hvm 
guests is driven by the cost of a tlb miss when you have 4k guest pages 
layered on 4k host pages, resulting in 24 memory accesses in the worst 
case; a PV tlb miss is no more expensive than a native tlb miss by 
comparison.
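
The figure of 24 follows from the usual two-dimensional walk arithmetic, sketched below. This assumes four pagetable levels on both the guest and host side and no caching of intermediate translations; the function names are hypothetical.

```c
#include <assert.h>

/* Worst-case memory references for a TLB miss when a guest pagetable of
 * guest_levels is layered on a host pagetable of host_levels.
 * Each guest level's table address, plus the final guest-physical address,
 * needs a full host walk (host_levels references each), and reading each
 * guest level costs one more reference:
 *   (guest_levels + 1) * host_levels + guest_levels
 * which equals (guest_levels + 1) * (host_levels + 1) - 1. */
unsigned nested_walk_refs(unsigned guest_levels, unsigned host_levels)
{
    assert(guest_levels > 0 && host_levels > 0);
    return (guest_levels + 1) * (host_levels + 1) - 1;
}

/* A native (or PV) miss walks one pagetable: one reference per level. */
unsigned native_walk_refs(unsigned levels)
{
    assert(levels > 0);
    return levels;
}
```

With four levels on each side this gives 24 references against 4 for a native or PV walk; dropping one level on each side (2MB pages in both guest and host) gives 15.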

Large pages could potentially reduce the cost of a PV tlb miss as well, 
but also pose quite a few tradeoffs.  You can't generally use large
mappings for the kernel, as you can natively, because of all the pages
which need RO mappings (pagetables, gdt, etc).  Also, IO and the balloon 
driver operate at 4k page resolution, so breaking a contiguous 2M page 
would require the mapping to be shattered.

> I found some posts from quite a while ago on xen-devel discussing 
> patches to allow large pages, so my question is for clarification  on 
> the status of this feature. I.e., is it in any stable release and if 
> so what version?

It's a work in progress, but there's nothing usable yet, as far as I know.

    J

* Re: 2MB page PV guest support clarification
  2009-02-27 23:28 ` Jeremy Fitzhardinge
@ 2009-02-27 23:54   ` Mick Jordan
  2009-02-28  0:03   ` Ian Pratt
  1 sibling, 0 replies; 13+ messages in thread
From: Mick Jordan @ 2009-02-27 23:54 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel

On 02/27/09 15:28, Jeremy Fitzhardinge wrote:
> Mick Jordan wrote:
>> I was inspired by the talk from Ben Serebrin at this weeks' summit to 
>> investigate using 2MB pages in my Java VM on Xen for x86-64.
>>
>> I think I have done everything correctly, but Xen (3.1.4) rejects my 
>> attempt to set the PSE bit in the L2 frame for the 2MB page. Looking 
>> at the Xen code the L2_DISALLOW_MASK (0xFF800180U) simply rejects the 
>> update if the PSE bit is set.
>
> Yes.  Xen doesn't support large mappings for PV guests.  However, 
> there's a lot less to worry about for PV guests compared to hvm 
> guests. A PV guest directly uses the CPU's pagetable+tlb hardware, and 
> so a tlb miss results in a single simple walk of the pagetable, and 
> the overall tlb pressure is a lot less.  The desire to use large pages 
> for hvm guests is driven by the cost of a tlb miss when you have 4k 
> guest pages layered on 4k host pages, resulting in 24 memory accesses 
> in the worst case; a PV tlb miss is no more expensive than a native 
> tlb miss by comparison.
>
> Large pages could potentially reduce the cost of a PV tlb miss as 
> well, but also pose quite a few tradeoffs.  You can't generally use 
> large mappings for the kernel, as you can native, because of all the 
> pages which need RO mappings (pagetables, gdt, etc).  Also, IO and the 
> balloon driver operate at 4k page resolution, so breaking a contiguous 
> 2M page would require the mapping to be shattered.

Well that's disappointing! The Java heap is a perfect candidate for
large pages and, since the heap tends to be large, using them would cut
the number of TLB entries needed to map it by a factor of 512, thereby
reducing the misses. I have the luxury of a lot more semantics on memory
usage than a typical OS, so would only use large pages where it makes
sense (heap and runtime compiled code). I have my own equivalent of the
balloon driver.
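
The factor of 512 is just the ratio of the two page sizes (2MB / 4KB). A small sketch of the arithmetic, with a hypothetical helper name and a 1GB heap chosen purely as an example:

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_PAGE_SIZE (4ULL * 1024)          /* 4 KB */
#define LARGE_PAGE_SIZE (2ULL * 1024 * 1024)   /* 2 MB */

/* Number of mappings (and so TLB entries) needed to cover `bytes` of heap
 * with pages of `page_size` bytes, rounding up to whole pages. */
uint64_t entries_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}
```

For a 1GB heap this gives 262144 entries with 4KB pages and 512 entries with 2MB pages.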

Mick

* RE: 2MB page PV guest support clarification
  2009-02-27 23:28 ` Jeremy Fitzhardinge
  2009-02-27 23:54   ` Mick Jordan
@ 2009-02-28  0:03   ` Ian Pratt
  2009-02-28  0:42     ` Mick Jordan
  2009-02-28 11:12     ` Keir Fraser
  1 sibling, 2 replies; 13+ messages in thread
From: Ian Pratt @ 2009-02-28  0:03 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Mick.Jordan@sun.com
  Cc: Ian Pratt, xen-devel@lists.xensource.com

> > I found some posts from quite a while ago on xen-devel discussing
> > patches to allow large pages, so my question is for clarification  on
> > the status of this feature. I.e., is it in any stable release and if
> > so what version?
> 
> Its a work in progress, but there's nothing usable yet, as far as I
> know.

Oracle have been working on PV 2MB page support, and I expect they'll pitch in with an update.

Over the last 18 months or so there have been a number of changes to Xen's PV PT handling that make support of 2MB pages significantly easier than it was previously. However, the guest has to be careful how it uses them as it can't alias any memory that may be used for storing pagetable pages (that must be RO).

Ian

* Re: 2MB page PV guest support clarification
  2009-02-28  0:03   ` Ian Pratt
@ 2009-02-28  0:42     ` Mick Jordan
  2009-02-28  1:28       ` Ian Pratt
  2009-02-28 11:12     ` Keir Fraser
  1 sibling, 1 reply; 13+ messages in thread
From: Mick Jordan @ 2009-02-28  0:42 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com

On 02/27/09 16:03, Ian Pratt wrote:
>>> I found some posts from quite a while ago on xen-devel discussing
>>> patches to allow large pages, so my question is for clarification  on
>>> the status of this feature. I.e., is it in any stable release and if
>>> so what version?
>>>       
>> Its a work in progress, but there's nothing usable yet, as far as I
>> know.
>>     
>
> Oracle have been working on PV 2MB page support, and I expect they'll pitch in with an update.
>
> Over the last 18 months or so there have been a number of changes to xen's PV PT handling that make support of 2MB pages significantly easier than it was previously. However, the guest has to be careful how it uses them as it can't alias any memory that may be used for storing pagetables pages (that must be RO).
>
>   
Thanks for the update. I'll wait to hear from the Oracle guys.

Your remark about aliasing prompts me to ask a general question about
that. I am currently mapping physical to virtual 1-1 (because that is
what mini-os has always done) as well as mapping parts of that to other
areas in virtual memory. Both of these are RW mappings. Is that ok? It
is perfectly possible for me to unmap the 1-1 mappings or make them RO
if I have to.

Mick

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* RE: 2MB page PV guest support clarification
  2009-02-28  0:42     ` Mick Jordan
@ 2009-02-28  1:28       ` Ian Pratt
  2009-02-28  1:37         ` Mick Jordan
  0 siblings, 1 reply; 13+ messages in thread
From: Ian Pratt @ 2009-02-28  1:28 UTC (permalink / raw)
  To: Mick.Jordan@sun.com
  Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com, Ian Pratt

> You remark about aliasing prompts me to ask a general question about
> that. I am currently mapping physical to virtual 1-1 (because that is
> what minis-os has always done) as well as mapping parts of that to
> other areas in virtual memory. Both of these are RW mappings. Is that
> ok? It perfectly possible for me to unmap the 1-1 mappings or make them
> RO if I have to.

Any page that is part of a pagetable must be mapped RO in every mapping to it. Attempting to add a page that has RW mappings to a pagetable will fail (either when you make the hypercall to add the PTE, or when you pin a constructed pagetable or try switching to it).

Thus, you need to be careful with 1:1 maps to remove pages that may become PT pages. It's best to have a PT page allocator that tries to allocate PTs from contiguous regions and then recycles them.
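
One way such an allocator could look is sketched below. This is only an illustration under stated assumptions: the pool size, the structure and all function names are hypothetical, frames are represented as plain frame numbers, and the step of keeping the region free of RW mappings is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define PT_POOL_FRAMES 64             /* hypothetical pool size */
#define PT_NO_FRAME    ((size_t)-1)   /* returned when the pool is exhausted */

/* Frames [base, base + PT_POOL_FRAMES) form one contiguous region that the
 * guest reserves for pagetables and never maps RW. */
struct pt_pool {
    size_t base;                        /* first frame number of the region */
    size_t next_fresh;                  /* count of never-used frames handed out */
    size_t free_stack[PT_POOL_FRAMES];  /* recycled frames, last in first out */
    size_t free_count;
};

void pt_pool_init(struct pt_pool *p, size_t base)
{
    p->base = base;
    p->next_fresh = 0;
    p->free_count = 0;
}

/* Prefer a recycled frame; otherwise take the next frame of the region. */
size_t pt_pool_alloc(struct pt_pool *p)
{
    if (p->free_count > 0)
        return p->free_stack[--p->free_count];
    if (p->next_fresh < PT_POOL_FRAMES)
        return p->base + p->next_fresh++;
    return PT_NO_FRAME;
}

/* Return a frame to the pool so it is reused as a pagetable later. */
void pt_pool_free(struct pt_pool *p, size_t frame)
{
    assert(frame >= p->base && frame < p->base + p->next_fresh);
    assert(p->free_count < PT_POOL_FRAMES);
    p->free_stack[p->free_count++] = frame;
}
```

Because freed frames go back to the pool rather than to the general allocator, a frame that has been a pagetable once is reused as a pagetable again, so it does not drift back into RW use.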

Ian

* Re: 2MB page PV guest support clarification
  2009-02-28  1:28       ` Ian Pratt
@ 2009-02-28  1:37         ` Mick Jordan
  2009-03-02 10:44           ` Rolf Neugebauer
  0 siblings, 1 reply; 13+ messages in thread
From: Mick Jordan @ 2009-02-28  1:37 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel@lists.xensource.com

On 02/27/09 17:28, Ian Pratt wrote:
>> You remark about aliasing prompts me to ask a general question about
>> that. I am currently mapping physical to virtual 1-1 (because that is
>> what minis-os has always done) as well as mapping parts of that to
>> other areas in virtual memory. Both of these are RW mappings. Is that
>> ok? It perfectly possible for me to unmap the 1-1 mappings or make them
>> RO if I have to.
>>     
>
> Any page that is part of a pagetable must be mapped RO in every mapping to it. Attempting to add a page that has RW mappings to a pagetable will fail (either when you make the hypercall to add the PTE, or when you pin a constructed pagetable or try switching to it).
>
>   
> Thus, you need to be careful with 1:1 maps to remove pages that may become PT pages. It's best to have a PT page allocator that tries to allocate PT's from contiguous regions and then recycles them.
>
>   
Ok. I need to check this. Certainly I am at some point taking already 
mapped pages and using them as pagetables. However, I am not getting any 
errors when adding the PTE. So perhaps the code does the mapping change 
already.

Mick

* Re: 2MB page PV guest support clarification
  2009-02-28  0:03   ` Ian Pratt
  2009-02-28  0:42     ` Mick Jordan
@ 2009-02-28 11:12     ` Keir Fraser
  2009-03-02 13:45       ` Dave McCracken
  2009-03-02 16:23       ` Mick Jordan
  1 sibling, 2 replies; 13+ messages in thread
From: Keir Fraser @ 2009-02-28 11:12 UTC (permalink / raw)
  To: Ian Pratt, Jeremy Fitzhardinge, Mick.Jordan@sun.com
  Cc: xen-devel@lists.xensource.com

On 28/02/2009 00:03, "Ian Pratt" <Ian.Pratt@eu.citrix.com> wrote:

>> Its a work in progress, but there's nothing usable yet, as far as I
>> know.
> 
> Oracle have been working on PV 2MB page support, and I expect they'll pitch in
> with an update.
> 
> Over the last 18 months or so there have been a number of changes to xen's PV
> PT handling that make support of 2MB pages significantly easier than it was
> previously. However, the guest has to be careful how it uses them as it can't
> alias any memory that may be used for storing pagetables pages (that must be
> RO).

Oracle already got their code checked in. You have to specify
'allowhugepage' on Xen's command line to enable it. It has limitations:
save/restore doesn't work, for example.

 -- Keir

* Re: 2MB page PV guest support clarification
  2009-02-28  1:37         ` Mick Jordan
@ 2009-03-02 10:44           ` Rolf Neugebauer
  0 siblings, 0 replies; 13+ messages in thread
From: Rolf Neugebauer @ 2009-03-02 10:44 UTC (permalink / raw)
  To: Mick.Jordan; +Cc: Ian Pratt, xen-devel@lists.xensource.com



Mick Jordan wrote:
> On 02/27/09 17:28, Ian Pratt wrote:
>>> You remark about aliasing prompts me to ask a general question about
>>> that. I am currently mapping physical to virtual 1-1 (because that is
>>> what minis-os has always done) as well as mapping parts of that to
>>> other areas in virtual memory. Both of these are RW mappings. Is that
>>> ok? It perfectly possible for me to unmap the 1-1 mappings or make them
>>> RO if I have to.
>>>     
>>
>> Any page that is part of a pagetable must be mapped RO in every mapping to it. Attempting to add a page that has RW mappings to a pagetable will fail (either when you make the hypercall to add the PTE, or when you pin a constructed pagetable or try switching to it).
>>
>>   
>> Thus, you need to be careful with 1:1 maps to remove pages that may become PT pages. It's best to have a PT page allocator that tries to allocate PT's from contiguous regions and then recycles them.
>>
>>   
> Ok. I need to check this. Certainly I am at some point taking already 
> mapped pages and using them as pagetables. However, I am not getting any 
> errors when adding the PTE. So perhaps the code does the mapping change 
> already.

In mini-os, new_pt_frame() will update the 1:1 mapping to mark a PT page 
RO before hooking it into the page table.
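
The bit manipulation involved in marking a mapping RO can be sketched as follows. This is not the mini-os code: the helper name is hypothetical, only the architectural x86 present (bit 0) and writable (bit 1) flags are used, and in a PV guest the new entry value would have to be installed through a hypercall rather than written directly.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_PRESENT 0x001ULL   /* architectural x86 bit 0 */
#define PAGE_RW      0x002ULL   /* architectural x86 bit 1 */

/* Hypothetical helper: return the same entry with write permission removed. */
uint64_t pte_make_readonly(uint64_t pte)
{
    assert(pte & PAGE_PRESENT);   /* only meaningful for a present mapping */
    return pte & ~PAGE_RW;
}
```

The frame address and all other flags are left unchanged, and applying the helper twice gives the same result as applying it once.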

rolf


> 
> Mick
* Re: 2MB page PV guest support clarification
  2009-02-28 11:12     ` Keir Fraser
@ 2009-03-02 13:45       ` Dave McCracken
  2009-03-02 16:38         ` Mick Jordan
  2009-03-02 16:23       ` Mick Jordan
  1 sibling, 1 reply; 13+ messages in thread
From: Dave McCracken @ 2009-03-02 13:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Pratt, Mick.Jordan@sun.com, Jeremy Fitzhardinge, Keir Fraser

On Saturday 28 February 2009, Keir Fraser wrote:
> On 28/02/2009 00:03, "Ian Pratt" <Ian.Pratt@eu.citrix.com> wrote:
> >> Its a work in progress, but there's nothing usable yet, as far as I
> >> know.
> >
> > Oracle have been working on PV 2MB page support, and I expect they'll
> > pitch in with an update.
> >
> > Over the last 18 months or so there have been a number of changes to
> > xen's PV PT handling that make support of 2MB pages significantly easier
> > than it was previously. However, the guest has to be careful how it uses
> > them as it can't alias any memory that may be used for storing pagetables
> > pages (that must be RO).
>
> Oracle already got their code checked in. You have to specify
> 'allowhugepage' on Xen's command line to enable it. It has limitations,
> such as save/restore doesn't work.

I am the person at Oracle working on PV guest support for 2MB pages.  I did 
get an initial patch accepted into the Xen hypervisor that enables basic 2MB 
page support.  As Keir said, it requires 'allowhugepage' on the Xen 
hypervisor command line.  It supports the basic ability to specify PSE in the 
page table, and takes care of the associated type and reference tracking for 
the mapped page(s).

What this patch does not do is make any guarantee about the alignment of the 
mapped page, which is a hardware requirement.  The solution I am working on 
for this is to create domains with 2MB pages.  The hypervisor already 
supports populating a domain with larger pages.  I am working on supporting 
2MB page domains at creation time and restore time.  This approach will also 
require that balloon drivers understand and work with 2MB pages.

Dave McCracken
Oracle Corp.

* Re: 2MB page PV guest support clarification
  2009-02-28 11:12     ` Keir Fraser
  2009-03-02 13:45       ` Dave McCracken
@ 2009-03-02 16:23       ` Mick Jordan
  2009-03-02 16:34         ` Keir Fraser
  1 sibling, 1 reply; 13+ messages in thread
From: Mick Jordan @ 2009-03-02 16:23 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Ian Pratt, xen-devel@lists.xensource.com

On 02/28/09 03:12, Keir Fraser wrote:
> Oracle already got their code checked in. You have to specify
> 'allowhugepage' on Xen's command line to enable it. It has limitations, such
> as save/restore doesn't work.
>
>  -- Keir
>
>   
Checked into xen-unstable, I presume, and not, say, 3.3.x? So what stable
release will this make it into?
Mick

* Re: 2MB page PV guest support clarification
  2009-03-02 16:23       ` Mick Jordan
@ 2009-03-02 16:34         ` Keir Fraser
  0 siblings, 0 replies; 13+ messages in thread
From: Keir Fraser @ 2009-03-02 16:34 UTC (permalink / raw)
  To: Mick.Jordan@sun.com; +Cc: Ian Pratt, xen-devel@lists.xensource.com

On 02/03/2009 16:23, "Mick Jordan" <Mick.Jordan@sun.com> wrote:

> On 02/28/09 03:12, Keir Fraser wrote:
>> Oracle already got their code checked in. You have to specify
>> 'allowhugepage' on Xen's command line to enable it. It has limitations, such
>> as save/restore doesn't work.
>> 
>>  -- Keir
>> 
>>   
> Checked into xen-unstable I presume and not, say 3.3.x? So what stable
> release will this make it into?

3.4.0. It's not generally useful enough to get backported to the 3.3 branch. And
if the extra support to make it useful does get checked into xen-unstable,
it's almost certainly then going to be too invasive for the 3.3 branch.

 -- Keir

* Re: 2MB page PV guest support clarification
  2009-03-02 13:45       ` Dave McCracken
@ 2009-03-02 16:38         ` Mick Jordan
  0 siblings, 0 replies; 13+ messages in thread
From: Mick Jordan @ 2009-03-02 16:38 UTC (permalink / raw)
  To: Dave McCracken; +Cc: Ian Pratt, xen-devel, Mick Jordan, Keir Fraser

On 03/02/09 05:45, Dave McCracken wrote:
>
> I am the person at Oracle working on PV guest support for 2MB pages.  I did 
> get an initial patch accepted into the Xen hypervisor that enables basic 2MB 
> page support.  As Keir said, it requires 'allowhugepage' on the Xen 
> hypervisor command line.  It supports the basic ability to specify PSE in the 
> page table, and takes care of the associated type and reference tracking for 
> the mapped page(s).
>
> What this patch does not do is make any guarantee about the alignment of the 
> mapped page, which is a hardware requirement.  The solution I am working on 
> for this is to create domains with 2MB pages.  The hypervisor already 
> supports populating a domain with larger pages.  I am working on supporting 
> 2MB page domains at creation time and restore time.  This approach will also 
> require that balloon drivers understand and work with 2MB pages.
>   
In my world, I make sure I allocate aligned contiguous machine 2MB
pages. Of course that may not always be possible, depending on what you
get from Xen. And I've seen some wild outlier cases, such as swiss cheese
memory with every other page missing and no physical run longer than two
4K pages!
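
A scan for usable 2MB candidates might look like the sketch below. It assumes the guest has its machine frame numbers available as a sorted array, which is an assumption made for illustration; the function name is hypothetical. A candidate is a run of 512 consecutive frame numbers whose first frame is 512-aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAMES_PER_2MB 512U   /* 2 MB / 4 KB */

/* Count 2 MB candidates in `mfns` (sorted ascending, length n): runs of 512
 * consecutive machine frame numbers starting on a 512-frame boundary. */
size_t count_2mb_runs(const uint64_t *mfns, size_t n)
{
    size_t count = 0;
    size_t i = 0;

    assert(mfns != NULL || n == 0);
    while (i + FRAMES_PER_2MB <= n) {
        if (mfns[i] % FRAMES_PER_2MB != 0) {
            i++;                      /* not aligned: cannot start a run here */
            continue;
        }
        size_t k = 1;                 /* length of the consecutive run so far */
        while (k < FRAMES_PER_2MB && mfns[i + k] == mfns[i] + k)
            k++;
        if (k == FRAMES_PER_2MB) {
            count++;
            i += FRAMES_PER_2MB;      /* whole run consumed */
        } else {
            i += k;                   /* skipped frames cannot be aligned */
        }
    }
    return count;
}
```

On a fully contiguous, aligned range of 1024 frames this finds two candidates; on the swiss cheese layout described above (every other frame missing) it finds none.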

Mick
