* Design question for PV superpage support
@ 2009-03-02 13:54 Dave McCracken
  2009-03-02 13:58 ` Keir Fraser
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Dave McCracken @ 2009-03-02 13:54 UTC (permalink / raw)
  To: Xen Developers List


The solution I am working on for how to support Linux hugepages (Xen 
superpages) involves creating domains made up entirely of superpages.  I can 
create a working domain with superpages and am in the process of supporting 
it in save/restore.

For this to work properly this should be an attribute of a domain, specified 
somewhere in domain configuration and attached to that domain for its 
lifetime.  This way it could be checked at memory populate time, save/restore 
time, and by the various balloon drivers.

My question is for those of you who best know the overall Xen design 
principles.  Where should the flag be specified by the user?  Where should it 
best be set in the running domain?  I've seen some examples of flags being 
passed around, but would like some guidance on the best place to put it to 
fit into the Xen design.

Thanks,
Dave McCracken
Oracle Corp.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Design question for PV superpage support
  2009-03-02 13:54 Design question for PV superpage support Dave McCracken
@ 2009-03-02 13:58 ` Keir Fraser
  2009-03-02 16:43 ` Mick Jordan
  2009-03-03  1:15 ` Jeremy Fitzhardinge
  2 siblings, 0 replies; 28+ messages in thread
From: Keir Fraser @ 2009-03-02 13:58 UTC (permalink / raw)
  To: Dave McCracken, Xen Developers List

On 02/03/2009 13:54, "Dave McCracken" <dcm@mccr.org> wrote:

> My question is for those of you who best know the overall Xen design
> principles.  Where should the flag be specified by the user?  Where should it
> best be set in the running domain?  I've seen some examples of flags being
> passed around, but would like some guidance on the best place to put it to
> fit into the Xen design.

Specify in domain config file, and also stick it in xenstore somewhere. From
there it should be possible to get it picked up and packed into the
save/restore file pretty much automatically I think, and also you can make
it accessible there by balloon drivers.

 -- Keir
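A minimal sketch of the toolstack side of Keir's suggestion. The node name `memory/superpages` under the domain's `/local/domain/<domid>` directory and its "1"/absent encoding are hypothetical, chosen only for illustration; reading and writing the node would go through the normal xenstore client library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node name, relative to the domain's xenstore home. */
#define SUPERPAGE_NODE "memory/superpages"

/* Write "/local/domain/<domid>/memory/superpages" into buf.
 * Returns false if buf is too small to hold the whole path. */
static bool superpage_node_path(unsigned int domid, char *buf, size_t len)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "/local/domain/%u/%s", domid, SUPERPAGE_NODE);
    return n > 0 && (size_t)n < len;
}

/* Interpret a value read back from the node: "1" means the domain was
 * built from superpages; a missing node (NULL) or anything else means not. */
static bool superpage_flag_enabled(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}
```

Balloon drivers and the save/restore code would then consult the same node, so the attribute stays attached to the domain for its lifetime, as Dave asked.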

* Re: Design question for PV superpage support
  2009-03-02 13:54 Design question for PV superpage support Dave McCracken
  2009-03-02 13:58 ` Keir Fraser
@ 2009-03-02 16:43 ` Mick Jordan
  2009-03-02 17:06   ` Keir Fraser
                     ` (2 more replies)
  2009-03-03  1:15 ` Jeremy Fitzhardinge
  2 siblings, 3 replies; 28+ messages in thread
From: Mick Jordan @ 2009-03-02 16:43 UTC (permalink / raw)
  To: Dave McCracken; +Cc: Xen Developers List

On 03/02/09 05:54, Dave McCracken wrote:
> The solution I am working on for how to support Linux hugepages (Xen 
> superpages) involves creating domains made up entirely of superpages.  I can 
> create a working domain with superpages and am in the process of supporting 
> it in save/restore.
>
>   
This wouldn't work too well for me in the case of thread stacks because 
we need to map out parts of the stack and, although we want large 
virtual stacks, we don't want to dedicate that much physical memory. Is 
it really difficult to support mixed page sizes in the general case, 
e.g., save/restore etc.?

Mick

* Re: Design question for PV superpage support
  2009-03-02 16:43 ` Mick Jordan
@ 2009-03-02 17:06   ` Keir Fraser
  2009-03-02 18:02     ` Mick Jordan
  2009-03-02 17:29   ` Dave McCracken
  2009-03-02 17:45   ` Mick Jordan
  2 siblings, 1 reply; 28+ messages in thread
From: Keir Fraser @ 2009-03-02 17:06 UTC (permalink / raw)
  To: Mick.Jordan@sun.com, Dave McCracken; +Cc: Xen Developers List

On 02/03/2009 16:43, "Mick Jordan" <Mick.Jordan@sun.com> wrote:

> On 03/02/09 05:54, Dave McCracken wrote:
>> The solution I am working on for how to support Linux hugepages (Xen
>> superpages) involves creating domains made up entirely of superpages.  I can
>> create a working domain with superpages and am in the process of supporting
>> it in save/restore.
>>   
> This wouldn't work too well for me in the case of thread stacks because
> we need to map out parts of the stack and, although we want large
> virtual stacks, we don't want to dedicate that much physical memory. Is
> it really difficult to support mixed page sizes in the general case,
> e.g., save/restore etc.?

You can still make 4kB mappings of subsections of 2MB physical extents. And
the guest kernel will still be able to allocate subsections of 2MB physical
extents for various uses. Isn't that all you need for e.g., this thread
stack situation?

Presumably Dave McCracken will be implementing a 'best effort' mode for
domains where we try to allocate superpages but we get by at reduced
performance if we have to allocate some discontiguous extents due to lack of
contiguous available memory. That would be reasonably sensible.

 -- Keir
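A sketch of the 'best effort' populate mode Keir describes, under stated assumptions: `p2m` is a flat pfn-indexed array, and `alloc_extent_fn` is a hypothetical allocator hook standing in for the hypervisor's real one. Each 2 MiB chunk is first requested as one order-9 extent; in best-effort mode a failure falls back to 512 single frames, in strict mode it is an error. A toy bump allocator is included only so the sketch can be exercised.

```c
#include <assert.h>
#include <stdbool.h>

#define SUPERPAGE_ORDER     9                      /* 2 MiB = 2^9 * 4 KiB */
#define PAGES_PER_SUPERPAGE (1UL << SUPERPAGE_ORDER)
#define INVALID_MFN         (~0UL)

/* Hypothetical allocator hook: returns the first MFN of a naturally aligned,
 * contiguous extent of 2^order frames, or INVALID_MFN on failure. */
typedef unsigned long (*alloc_extent_fn)(unsigned int order, void *ctx);

/* Fill p2m[0..nr_pages) chunk by chunk. Returns how many chunks got real
 * superpages, or -1 on failure (frames already handed out are not
 * returned to the allocator in this sketch). */
static long populate_superpages(unsigned long *p2m, unsigned long nr_pages,
                                bool strict, alloc_extent_fn alloc, void *ctx)
{
    long superpages = 0;

    assert(nr_pages % PAGES_PER_SUPERPAGE == 0);
    for (unsigned long pfn = 0; pfn < nr_pages; pfn += PAGES_PER_SUPERPAGE) {
        unsigned long base = alloc(SUPERPAGE_ORDER, ctx);

        if (base != INVALID_MFN) {
            for (unsigned long i = 0; i < PAGES_PER_SUPERPAGE; i++)
                p2m[pfn + i] = base + i;   /* still recorded per 4 KiB frame */
            superpages++;
            continue;
        }
        if (strict)
            return -1;
        for (unsigned long i = 0; i < PAGES_PER_SUPERPAGE; i++) {
            unsigned long mfn = alloc(0, ctx);

            if (mfn == INVALID_MFN)
                return -1;
            p2m[pfn + i] = mfn;
        }
    }
    return superpages;
}

/* Toy allocator for trying the sketch out: a bump pointer over machine
 * frames that refuses order-9 requests once big_left reaches zero. */
struct toy_heap {
    unsigned long next_mfn;
    unsigned int big_left;
};

static unsigned long toy_alloc(unsigned int order, void *ctx)
{
    struct toy_heap *h = ctx;
    unsigned long n = 1UL << order;
    unsigned long mfn;

    if (order == SUPERPAGE_ORDER) {
        if (h->big_left == 0)
            return INVALID_MFN;
        h->big_left--;
    }
    mfn = (h->next_mfn + n - 1) & ~(n - 1);   /* round up to natural alignment */
    h->next_mfn = mfn + n;
    return mfn;
}
```

The `strict` flag corresponds to the 'must get 2MB extents' meaning that Keir later says he would accept for now; the returned count is the kind of figure a tool could report when some chunks fall back to 4 KiB frames.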

* Re: Design question for PV superpage support
  2009-03-02 16:43 ` Mick Jordan
  2009-03-02 17:06   ` Keir Fraser
@ 2009-03-02 17:29   ` Dave McCracken
  2009-03-02 17:52     ` Keir Fraser
  2009-03-02 17:45   ` Mick Jordan
  2 siblings, 1 reply; 28+ messages in thread
From: Dave McCracken @ 2009-03-02 17:29 UTC (permalink / raw)
  To: Mick.Jordan, Xen Developers List

On Monday 02 March 2009, Mick Jordan wrote:
> > The solution I am working on for how to support Linux hugepages (Xen
> > superpages) involves creating domains made up entirely of superpages.  I
> > can create a working domain with superpages and am in the process of
> > supporting it in save/restore.
>
> This wouldn't work too well for me in the case of thread stacks because
> we need to map out parts of the stack and, although we want large
> virtual stacks, we don't want to dedicate that much physical memory. Is
> it really difficult to support mixed page sizes in the general case,
> e.g., save/restore etc.?

What I am doing is populating the domain with 2M pages.  The hypervisor fills 
in all its internal arrays as if they were regular 4K pages.  The guest is 
then free to use mixed size pages.  The only significant difference is that 
when a guest does allocate a 2M page, it's guaranteed to be properly aligned 
at the machine page level so it can be mapped as a hugepage.   All 4K page 
allocations will continue to work.

Dave McCracken
Oracle Corp.
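To illustrate why the machine-level alignment guarantee matters, here is a sketch of building an x86 (PAE / 64-bit) page-directory entry that maps a 2 MiB machine extent. The bit values are the architectural ones (Present = bit 0, R/W = bit 1, PS = bit 7); the function name is hypothetical, and in a PV guest the entry would still be validated by Xen (for example through an `mmu_update` hypercall) rather than simply written.

```c
#include <assert.h>
#include <stdint.h>

#define PAGES_PER_SUPERPAGE 512UL
#define PAGE_SHIFT          12
#define PTE_PRESENT         0x001ULL
#define PTE_RW              0x002ULL
#define PDE_PSE             0x080ULL   /* PDE bit 7: entry maps a 2 MiB page */

/* Build a PDE mapping the 2 MiB machine extent starting at base_mfn.
 * Returns 0 (not present) if the extent is not 2 MiB aligned, because the
 * low address bits would otherwise land in PAT/reserved bit positions. */
static uint64_t make_superpage_pde(unsigned long base_mfn, uint64_t flags)
{
    assert((flags & ~0xfffULL) == 0);   /* only low flag bits expected here */
    if (base_mfn % PAGES_PER_SUPERPAGE != 0)
        return 0;
    return ((uint64_t)base_mfn << PAGE_SHIFT) | flags | PDE_PSE;
}
```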

* Re: Design question for PV superpage support
  2009-03-02 16:43 ` Mick Jordan
  2009-03-02 17:06   ` Keir Fraser
  2009-03-02 17:29   ` Dave McCracken
@ 2009-03-02 17:45   ` Mick Jordan
  2009-03-02 17:54     ` Keir Fraser
  2009-03-02 18:00     ` Dave McCracken
  2 siblings, 2 replies; 28+ messages in thread
From: Mick Jordan @ 2009-03-02 17:45 UTC (permalink / raw)
  To: Dave McCracken; +Cc: Xen Developers List

On 03/02/09 08:43, Mick Jordan wrote:
> On 03/02/09 05:54, Dave McCracken wrote:
>> The solution I am working on for how to support Linux hugepages (Xen 
>> superpages) involves creating domains made up entirely of 
>> superpages.  I can create a working domain with superpages and am in 
>> the process of supporting it in save/restore.
>>
>>   
I'm assuming that this means that everything is upgraded from 4K to 2MB. 
E.g. pfn 0 = 0, pfn 1 = 2MB, etc., and the mfn<->pfn maps also.
> This wouldn't work too well for me in the case of thread stacks 
> because we need to map out parts of the stack and, although we want 
> large virtual stacks, we don't want to dedicate that much physical 
> memory. Is it really difficult to support mixed page sizes in the 
> general case, e.g., save/restore etc.?
Save/restore is definitely important for me and we do support it at 
present. I'm wondering if I might be able to "reapply" my 2MB mappings 
after a restore on a 4K system, given that these are just layered on a 
1-1 mapping between physical/virtual for all allocated memory.

Mick

* Re: Design question for PV superpage support
  2009-03-02 17:29   ` Dave McCracken
@ 2009-03-02 17:52     ` Keir Fraser
  2009-03-02 18:03       ` Dan Magenheimer
  0 siblings, 1 reply; 28+ messages in thread
From: Keir Fraser @ 2009-03-02 17:52 UTC (permalink / raw)
  To: Dave McCracken, Mick.Jordan@sun.com, Xen Developers List

On 02/03/2009 17:29, "Dave McCracken" <dcm@mccr.org> wrote:

>> This wouldn't work too well for me in the case of thread stacks because
>> we need to map out parts of the stack and, although we want large
>> virtual stacks, we don't want to dedicate that much physical memory. Is
>> it really difficult to support mixed page sizes in the general case,
>> e.g., save/restore etc.?
> 
> What I am doing is populating the domain with 2M pages.  The hypervisor fills
> in all its internal arrays as if they were regular 4K pages.  The guest is
> then free to use mixed size pages.  The only significant difference is that
> when a guest does allocate a 2M page, it's guaranteed to be properly aligned
> at the machine page level so it can be mapped as a hugepage.   All 4K page
> allocations will continue to work.

It'd be nice to fall back to the case of not being able to guarantee all 2MB
extents are aligned and contiguous. So for example being able to migrate to
or restore on a system that currently doesn't have enough contiguous memory.

 -- Keir

* Re: Design question for PV superpage support
  2009-03-02 17:45   ` Mick Jordan
@ 2009-03-02 17:54     ` Keir Fraser
  2009-03-02 18:00     ` Dave McCracken
  1 sibling, 0 replies; 28+ messages in thread
From: Keir Fraser @ 2009-03-02 17:54 UTC (permalink / raw)
  To: Mick.Jordan@sun.com, Dave McCracken; +Cc: Xen Developers List

On 02/03/2009 17:45, "Mick Jordan" <Mick.Jordan@sun.com> wrote:

>>> The solution I am working on for how to support Linux hugepages (Xen
>>> superpages) involves creating domains made up entirely of
>>> superpages.  I can create a working domain with superpages and am in
>>> the process of supporting it in save/restore.
> I'm assuming that this means that everything is upgraded from 4K to 2MB.
> E.g. pfn 0 = 0, pfn 1 = 2MB, etc., and the mfn<->pfn maps also.

No, it doesn't mean that, which will be clear from Dave's response just now.

 K.

>> This wouldn't work too well for me in the case of thread stacks
>> because we need to map out parts of the stack and, although we want
>> large virtual stacks, we don't want to dedicate that much physical
>> memory. Is it really difficult to support mixed page sizes in the
>> general case, e.g., save/restore etc.?
> Save/restore is definitely important for me and we do support it at
> present. I'm wondering if I might be able to "reapply" my 2MB mappings
> after a restore on a 4K system, given that these are just layered on a
> 1-1 mapping between physical/virtual for all allocated memory.

* Re: Design question for PV superpage support
  2009-03-02 17:45   ` Mick Jordan
  2009-03-02 17:54     ` Keir Fraser
@ 2009-03-02 18:00     ` Dave McCracken
  2009-03-02 18:14       ` Mick Jordan
  2009-03-03  1:32       ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 28+ messages in thread
From: Dave McCracken @ 2009-03-02 18:00 UTC (permalink / raw)
  To: Mick.Jordan; +Cc: Xen Developers List

On Monday 02 March 2009, Mick Jordan wrote:
> On 03/02/09 08:43, Mick Jordan wrote:
> > On 03/02/09 05:54, Dave McCracken wrote:
> >> The solution I am working on for how to support Linux hugepages (Xen
> >> superpages) involves creating domains made up entirely of
> >> superpages.  I can create a working domain with superpages and am in
> >> the process of supporting it in save/restore.
>
> I'm assuming that this means that everything is upgraded from 4K to 2MB.
> E.g. pfn 0 = 0, pfn 1 = 2MB, etc., and the mfn<->pfn maps also.

No, actually, it doesn't do that.  The hypervisor allocates 2M pages, then 
expands them into 4K pages for the mfn<->pfn maps, etc.

The only effective difference is that any given 2M-aligned range of pfns is 
guaranteed to map to a contiguous 2M-aligned range of mfns.  Therefore the 
guest can safely allocate 2M pages.

Dave McCracken
Oracle Corp.
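Dave's invariant can be written as a small check over the guest's pfn-to-mfn table. This is a sketch: `p2m` is assumed to be a flat array indexed by pfn, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_SUPERPAGE 512UL   /* 2 MiB / 4 KiB */

/* True if the 2 MiB-aligned pfn range starting at first_pfn is backed by a
 * contiguous, 2 MiB-aligned run of machine frames, i.e. the guest may map
 * it with a single large-page entry. */
static bool pfn_range_is_superpage(const unsigned long *p2m,
                                   unsigned long first_pfn)
{
    unsigned long base;

    assert(p2m != NULL);
    if (first_pfn % PAGES_PER_SUPERPAGE != 0)
        return false;
    base = p2m[first_pfn];
    if (base % PAGES_PER_SUPERPAGE != 0)
        return false;
    for (unsigned long i = 1; i < PAGES_PER_SUPERPAGE; i++)
        if (p2m[first_pfn + i] != base + i)
            return false;
    return true;
}
```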

* Re: Design question for PV superpage support
  2009-03-02 17:06   ` Keir Fraser
@ 2009-03-02 18:02     ` Mick Jordan
  0 siblings, 0 replies; 28+ messages in thread
From: Mick Jordan @ 2009-03-02 18:02 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Dave McCracken, Xen Developers List


On 03/02/09 09:06, Keir Fraser wrote:
> On 02/03/2009 16:43, "Mick Jordan" <Mick.Jordan@sun.com> wrote:
>
>   
>> On 03/02/09 05:54, Dave McCracken wrote:
>>     
>>> The solution I am working on for how to support Linux hugepages (Xen
>>> superpages) involves creating domains made up entirely of superpages.  I can
>>> create a working domain with superpages and am in the process of supporting
>>> it in save/restore.
>>>   
>>>       
>> This wouldn't work too well for me in the case of thread stacks because
>> we need to map out parts of the stack and, although we want large
>> virtual stacks, we don't want to dedicate that much physical memory. Is
>> it really difficult to support mixed page sizes in the general case,
>> e.g., save/restore etc.?
>>     
>
> You can still make 4kB mappings of subsections of 2MB physical extents. And
> the guest kernel will still be able to allocate subsections of 2MB physical
> extents for various uses. Isn't that all you need for e.g., this thread
> stack situation?
>
>   
Yes, that would work, assuming it doesn't cause problems in other ways, 
e.g. save/restore, given that this reintroduces mixed mappings. I'd 
appreciate someone explaining the problems for save/restore with the 
earlier patch that simply allowed 2MB pages in PTEs.
> Presumably Dave McCracken will be implementing a 'best effort' mode for
> domains where we try to allocate superpages but we get by at reduced
> performance if we have to allocate some discontiguous extents due to lack of
> contiguous available memory. That would be reasonably sensible.
>
>   
Indeed.
Mick


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* RE: Design question for PV superpage support
  2009-03-02 17:52     ` Keir Fraser
@ 2009-03-02 18:03       ` Dan Magenheimer
  2009-03-02 18:30         ` Keir Fraser
  0 siblings, 1 reply; 28+ messages in thread
From: Dan Magenheimer @ 2009-03-02 18:03 UTC (permalink / raw)
  To: Keir Fraser, Dave McCracken, Mick.Jordan, Xen Developers List

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> 
> It'd be nice to fall back to the case of not being able to 
> guarantee all 2MB
> extents are aligned and contiguous. So for example being able 
> to migrate to
> or restore on a system that currently doesn't have enough 
> contiguous memory.

Well, yes and no.  I believe the ONLY reason to use 2MB
pages is to achieve a significant performance advantage.
And I suspect emulating 2MB "virtual pages" on 4KB physical
pages will perform at least slightly worse than just
4KB-on-4KB, true?

* Re: Design question for PV superpage support
  2009-03-02 18:00     ` Dave McCracken
@ 2009-03-02 18:14       ` Mick Jordan
  2009-03-02 19:14         ` Dave McCracken
  2009-03-03  1:32       ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 28+ messages in thread
From: Mick Jordan @ 2009-03-02 18:14 UTC (permalink / raw)
  To: Dave McCracken; +Cc: Xen Developers List

On 03/02/09 10:00, Dave McCracken wrote:
> No, actually, it doesn't do that.  The hypervisor allocates 2M pages, then 
> expands them into 4K pages for the mfn<->pfn maps, etc.
>
> The only effective difference is that any given 2M-aligned range of pfns is 
> guaranteed to map to a contiguous 2M-aligned range of mfns.  Therefore the 
> guest can safely allocate 2M pages.
>   
OK. So I want to reiterate my question from a previous post. After the 
patch allowing mixed mappings, what exactly went wrong on save/restore? 
And would my special case of 1-1 physical/virtual mappings, with 
additional 2MB VM mappings added after domain start, suffer in that case?

From my (brief) experience, I think the problems of finding enough 
contiguous machine memory to allocate an all 2MB domain might be 
prohibitive. And when the memory is not fragmented I did not find it 
hard to "find" contiguous aligned 2MB machine pages even with the usual 
(seemingly random) pfn -> mfn mappings. It's a bit more code and runtime 
overhead, but it doesn't happen enough to worry about that.

Mick

* Re: Design question for PV superpage support
  2009-03-02 18:03       ` Dan Magenheimer
@ 2009-03-02 18:30         ` Keir Fraser
  2009-03-02 18:46           ` Mick Jordan
  2009-03-02 18:48           ` Dan Magenheimer
  0 siblings, 2 replies; 28+ messages in thread
From: Keir Fraser @ 2009-03-02 18:30 UTC (permalink / raw)
  To: Dan Magenheimer, Dave McCracken, Mick.Jordan@sun.com,
	Xen Developers List

On 02/03/2009 18:03, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>> 
>> It'd be nice to fall back to the case of not being able to
>> guarantee all 2MB
>> extents are aligned and contiguous. So for example being able
>> to migrate to
>> or restore on a system that currently doesn't have enough
>> contiguous memory.
> 
> Well, yes and no.  I believe the ONLY reason to use 2MB
> pages is to achieve a significant performance advantage.
> And I suspect emulating 2MB "virtual pages" on 4KB physical
> pages will perform at least slightly worse than just
> 4KB-on-4KB, true?

If you make this constraint then you risk creating domains that you cannot
always conveniently restore. Obviously you would allocate 2MB extents
wherever possible, since that is the whole point of this drawn out exercise.

 -- Keir

* Re: Design question for PV superpage support
  2009-03-02 18:30         ` Keir Fraser
@ 2009-03-02 18:46           ` Mick Jordan
  2009-03-02 18:48           ` Dan Magenheimer
  1 sibling, 0 replies; 28+ messages in thread
From: Mick Jordan @ 2009-03-02 18:46 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Dan Magenheimer, Dave McCracken, Xen Developers List


On 03/02/09 10:30, Keir Fraser wrote:
> On 02/03/2009 18:03, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>
>   
>>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>>>
>>> It'd be nice to fall back to the case of not being able to
>>> guarantee all 2MB
>>> extents are aligned and contiguous. So for example being able
>>> to migrate to
>>> or restore on a system that currently doesn't have enough
>>> contiguous memory.
>>>       
>> Well, yes and no.  I believe the ONLY reason to use 2MB
>> pages is to achieve a significant performance advantage.
>> And I suspect emulating 2MB "virtual pages" on 4KB physical
>> pages will perform at least slightly worse than just
>> 4KB-on-4KB, true?
>>     
>
> If you make this constraint then you risk creating domains that you cannot
> always conveniently restore. Obviously you would allocate 2MB extents
> wherever possible, since that is the whole point of this drawn out exercise.
>   
Indeed, performance is the issue: fewer TLB misses. I'm happy to use 2MB 
pages when I can and fall back on 4K when I can't. I just want Xen not 
to fall over and save/restore to work.

Mick
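Mick's point about TLB misses can be made concrete with TLB reach, the amount of memory a TLB can map at once: the number of entries times the page size. The 64-entry figure below is an assumed, illustrative size; real TLBs differ between CPU models and often have separate, smaller arrays for large pages.

```latex
\text{reach} = E \times S_{\text{page}}, \qquad
64 \times 4\,\text{KiB} = 256\,\text{KiB}, \qquad
64 \times 2\,\text{MiB} = 128\,\text{MiB}
```

A 2MB region that ends up mapped with 4KB entries only gets the smaller figure, which is the basis of Dan's concern about silently falling back.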




* RE: Design question for PV superpage support
  2009-03-02 18:30         ` Keir Fraser
  2009-03-02 18:46           ` Mick Jordan
@ 2009-03-02 18:48           ` Dan Magenheimer
  2009-03-02 19:04             ` Keir Fraser
  1 sibling, 1 reply; 28+ messages in thread
From: Dan Magenheimer @ 2009-03-02 18:48 UTC (permalink / raw)
  To: Keir Fraser, Dave McCracken, Mick.Jordan, Xen Developers List

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> 
> On 02/03/2009 18:03, "Dan Magenheimer" 
> <dan.magenheimer@oracle.com> wrote:
> 
> >> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> >> 
> >> It'd be nice to fall back to the case of not being able to
> >> guarantee all 2MB
> >> extents are aligned and contiguous. So for example being able
> >> to migrate to
> >> or restore on a system that currently doesn't have enough
> >> contiguous memory.
> > 
> > Well, yes and no.  I believe the ONLY reason to use 2MB
> > pages is to achieve a significant performance advantage.
> > And I suspect emulating 2MB "virtual pages" on 4KB physical
> > pages will perform at least slightly worse than just
> > 4KB-on-4KB, true?
> 
> If you make this constraint then you risk creating domains 
> that you cannot
> always conveniently restore. Obviously you would allocate 2MB extents
> wherever possible, since that is the whole point of this 
> drawn out exercise.

Understood.  This is a case where convenience and the primary objective
conflict.  I can't think offhand of a way to do it, but restoring
or migrating a 2MB-assumed domain into an environment where the
vast majority of 2MB pages are emulated should probably raise a
bright red flag somehow.  Or there needs to be some tool that
can at least be queried as to how many 2MB pages are being emulated.

But probably the right long-term answer is a 2MB Xen with a 2MB
Linux when applications assume/prefer 2MB pages.

Dan
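A sketch of the kind of query Dan asks for: walk the pfn-to-mfn table in 2MB-aligned chunks and count how many are really backed by an aligned, contiguous machine extent versus how many are only 4KB frames ("emulated" in Dan's terms). The struct, function name, and flat `p2m` array are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_SUPERPAGE 512UL   /* 2 MiB / 4 KiB */

struct superpage_stats {
    unsigned long backed;     /* chunks mappable with one 2 MiB entry */
    unsigned long emulated;   /* chunks needing 512 separate 4 KiB entries */
};

/* Classify every complete 2 MiB-aligned pfn chunk in p2m[0..nr_pages);
 * a trailing partial chunk is ignored. */
static struct superpage_stats count_superpages(const unsigned long *p2m,
                                               unsigned long nr_pages)
{
    struct superpage_stats s = { 0, 0 };

    assert(p2m != NULL || nr_pages == 0);
    for (unsigned long pfn = 0; pfn + PAGES_PER_SUPERPAGE <= nr_pages;
         pfn += PAGES_PER_SUPERPAGE) {
        unsigned long base = p2m[pfn];
        bool ok = base % PAGES_PER_SUPERPAGE == 0;

        for (unsigned long i = 1; ok && i < PAGES_PER_SUPERPAGE; i++)
            ok = p2m[pfn + i] == base + i;
        if (ok)
            s.backed++;
        else
            s.emulated++;
    }
    return s;
}
```

A toolstack could expose the two counters after restore or migration, so that a domain whose chunks are mostly "emulated" does not go unnoticed.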

* Re: Design question for PV superpage support
  2009-03-02 18:48           ` Dan Magenheimer
@ 2009-03-02 19:04             ` Keir Fraser
  0 siblings, 0 replies; 28+ messages in thread
From: Keir Fraser @ 2009-03-02 19:04 UTC (permalink / raw)
  To: Dan Magenheimer, Dave McCracken, Mick.Jordan@sun.com,
	Xen Developers List

On 02/03/2009 18:48, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> If you make this constraint then you risk creating domains
>> that you cannot
>> always conveniently restore. Obviously you would allocate 2MB extents
>> wherever possible, since that is the whole point of this
>> drawn out exercise.
> 
> Understood.  This is a case where convenience and the primary objective
> conflict.  I can't think offhand of a way to do it, but restoring
> or migrating a 2MB-assumed domain into an environment where the
> vast majority of 2MB pages are emulated should probably raise a
> bright red flag somehow.  Or there needs to be some tool that
> can at least be queried as to how many 2MB pages are being emulated.
> 
> But probably the right long-term answer is a 2MB Xen with a 2MB
> Linux when applications assume/prefer 2MB pages.

I'd certainly be okay with this new config option meaning 'must get 2MB
extents' for now. It can be improved if the apparent downsides bite in
practice.

 -- Keir

* Re: Design question for PV superpage support
  2009-03-02 18:14       ` Mick Jordan
@ 2009-03-02 19:14         ` Dave McCracken
  2009-03-03  1:37           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 28+ messages in thread
From: Dave McCracken @ 2009-03-02 19:14 UTC (permalink / raw)
  To: Mick.Jordan; +Cc: Xen Developers List

On Monday 02 March 2009, Mick Jordan wrote:
> OK. So I want to reiterate my question from a previous post. After the
> patch allowing mixed mappings, what exactly went wrong on save/restore?
> And would my special case of 1-1 physical/virtual mappings, with
> additional 2MB VM mappings added after domain start, suffer in that case?

My understanding of save/restore is that it will save your carefully selected 
2M pages, cheerfully restore them onto a random set of mfns, then expect your 
guest to continue running.  I haven't studied it enough to know whether your 
guest at least gets a chance to intervene and fix things after the restore.

Dave McCracken
Oracle Corp.

* Re: Design question for PV superpage support
  2009-03-02 13:54 Design question for PV superpage support Dave McCracken
  2009-03-02 13:58 ` Keir Fraser
  2009-03-02 16:43 ` Mick Jordan
@ 2009-03-03  1:15 ` Jeremy Fitzhardinge
  2 siblings, 0 replies; 28+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-03  1:15 UTC (permalink / raw)
  To: Dave McCracken; +Cc: Xen Developers List

Dave McCracken wrote:
> The solution I am working on for how to support Linux hugepages (Xen 
> superpages) involves creating domains made up entirely of superpages.  I can 
> create a working domain with superpages and am in the process of supporting 
> it in save/restore.
>
> For this to work properly this should be an attribute of a domain, specified 
> somewhere in domain configuration and attached to that domain for its 
> lifetime.  This way it could be checked at memory populate time, save/restore 
> time, and by the various balloon drivers.
>
> My question is for those of you who best know the overall Xen design 
> principles.  Where should the flag be specified by the user?  Where should it 
> best be set in the running domain?  I've seen some examples of flags being 
> passed around, but would like some guidance on the best place to put it to 
> fit into the Xen design.

One thing I'm not quite sure about: when you support 2M pages for a 
domain, is it fully supported, to the extent you can safely set PSE in 
cpuid, and allow the guest kernel to use 2M mappings as it usually 
would?  Or are there further restrictions?

You should support a feature flag in the guest kernel's ELF notes to say 
that it supports large PV pages.  If the kernel asks for it, then you can 
enable PSE in cpuid, or have some other mechanism for the kernel to 
query that the feature is available.

    J
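A sketch of the opt-in Jeremy describes: only expose PSE (CPUID leaf 1, EDX bit 3, an architectural bit) to a PV guest whose kernel advertised large-page support and whose domain was actually built from superpages. `GUEST_FEAT_LARGE_PV_PAGES` is a hypothetical stand-in for whatever ELF-note flag would be defined; it is not an existing Xen constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_EDX_PSE (1u << 3)   /* CPUID.01H:EDX bit 3: Page Size Extension */

/* Hypothetical bit parsed from the guest kernel's ELF notes. */
#define GUEST_FEAT_LARGE_PV_PAGES (1u << 0)

/* Return the leaf-1 EDX value to show the guest: the host's PSE bit is
 * passed through only when the kernel opted in and the domain has
 * superpage backing; otherwise it is masked off. */
static uint32_t guest_cpuid1_edx(uint32_t host_edx, uint32_t guest_features,
                                 bool domain_has_superpages)
{
    bool expose = domain_has_superpages &&
                  (guest_features & GUEST_FEAT_LARGE_PV_PAGES) != 0;

    return expose ? host_edx : (host_edx & ~CPUID1_EDX_PSE);
}
```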

* Re: Design question for PV superpage support
  2009-03-02 18:00     ` Dave McCracken
  2009-03-02 18:14       ` Mick Jordan
@ 2009-03-03  1:32       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 28+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-03  1:32 UTC (permalink / raw)
  To: Dave McCracken; +Cc: Xen Developers List, Mick.Jordan

Dave McCracken wrote:
> No, actually, it doesn't do that.  The hypervisor allocates 2M pages, then 
> expands them into 4K pages for the mfn<->pfn maps, etc.
>   

What happens if you start using MMU_MACHPHYS_UPDATE on pages which are 
part of a 2M mapping?

    J

* Re: Design question for PV superpage support
  2009-03-02 19:14         ` Dave McCracken
@ 2009-03-03  1:37           ` Jeremy Fitzhardinge
  2009-03-03  3:59             ` Mick Jordan
  0 siblings, 1 reply; 28+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-03  1:37 UTC (permalink / raw)
  To: Dave McCracken; +Cc: Xen Developers List, Mick.Jordan

Dave McCracken wrote:
> My understanding of save/restore is that it will save your carefully selected 
> 2M pages, cheerfully restore them onto a random set of mfns, then expect your 
> guest to continue running.  I haven't studied it enough to know whether your 
> guest at least gets a chance to intervene and fix things after the restore.

Your guest would need to be in the position to allocate an extra 512 L1 
pte pages to replace each shattered 2M page, which could be awkward - 
and wouldn't have any realistic way to continue if it fails to do so.  
Perhaps some kind of special pool of pages could be provided to the 
domain to help it satisfy its memory needs in recovering from a 
restore-shattered large page.

    J
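A sketch of the reserve-pool idea: the guest sets aside empty L1 tables in advance, and after a restore rebuilds a shattered large mapping as 4KB PTEs looked up one by one in the new pfn-to-mfn table (the machine frames are no longer guaranteed contiguous). One 4KB L1 table holds exactly the 512 eight-byte PTEs that cover 2MB. The pool structure and function name are hypothetical, PTE flag handling is simplified, and the hypercalls needed to pin the new table and update the L2 entry are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTES_PER_TABLE 512   /* 4 KiB table / 8-byte PTE = 2 MiB coverage */
#define PAGE_SHIFT     12
#define PTE_PRESENT    0x001ULL
#define PTE_RW         0x002ULL

/* Hypothetical pool of empty L1 tables reserved ahead of time, so that
 * recovering from a restore never has to allocate memory. */
struct l1_pool {
    uint64_t (*tables)[PTES_PER_TABLE];
    size_t free;                     /* tables[0..free) are still unused */
};

/* Fill a reserved L1 table with 4 KiB PTEs for pfns
 * [first_pfn, first_pfn + 512), using the post-restore p2m[] so each PTE
 * points at wherever that frame now lives. Returns the table, or NULL if
 * the pool is exhausted. */
static uint64_t *shatter_large_mapping(unsigned long first_pfn,
                                       uint64_t pte_flags,
                                       const unsigned long *p2m,
                                       struct l1_pool *pool)
{
    uint64_t *l1;

    assert(first_pfn % PTES_PER_TABLE == 0);
    if (pool->free == 0)
        return NULL;
    l1 = pool->tables[--pool->free];
    for (size_t i = 0; i < PTES_PER_TABLE; i++)
        l1[i] = ((uint64_t)p2m[first_pfn + i] << PAGE_SHIFT) | pte_flags;
    return l1;
}
```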

* Re: Design question for PV superpage support
  2009-03-03  1:37           ` Jeremy Fitzhardinge
@ 2009-03-03  3:59             ` Mick Jordan
  2009-03-03 14:33               ` Dan Magenheimer
  2009-03-03 17:26               ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 28+ messages in thread
From: Mick Jordan @ 2009-03-03  3:59 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Dave McCracken, Xen Developers List

On 03/02/09 17:37, Jeremy Fitzhardinge wrote:
> Dave McCracken wrote:
>> My understanding of save/restore is that it will save your carefully 
>> selected 2M pages, cheerfully restore them onto a random set of mfns, 
>> then expect your guest to continue running.  I haven't studied it 
>> enough to know whether your guest at least gets a chance to intervene 
>> and fix things after the restore.
>
Since restore already requires quite a lot of resetting, e.g., grant table 
mappings, on the part of the guest, it seems that checking the validity 
of any large page mappings should be possible at the same time. 
Obviously you could get in a big mess if you mapped the code that is 
going to do the fixup on a large page, but that is unlikely and easily 
avoidable.
> Your guest would need to be in the position to allocate an extra 512 
> L1 pte pages to replace each shattered 2M page, which could be awkward 
> - and wouldn't have any realistic way to continue if it fails to do 
> so.  Perhaps some kind of special pool of pages could be provided to 
> the domain to help it satisfy its memory needs in recovering from a 
> restore-shattered large page.
In general, I think the guest should assume that large page mappings are 
merely an optimization that (a) might not be possible on domain start 
due to machine memory fragmentation and (b) that this condition might 
also occur on restore. Given these, it must always be prepared to 
function with 4K pages, which implies that it would need to preserve 
enough page table frame memory to be able to revert from large to small pages.

Mick
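The size of the reserve Mick describes follows from one L1 page-table frame (4KB, holding 512 eight-byte PTEs on PAE / 64-bit) per large mapping that might have to be demoted. The 4GB guest below is an illustrative assumption.

```latex
\text{reserve} = N_{\text{large}} \times 4\,\text{KiB}, \qquad
\frac{4\,\text{KiB}}{2\,\text{MiB}} = \frac{1}{512} \approx 0.2\%

\text{e.g. } 4\,\text{GiB} / 2\,\text{MiB} = 2048
\;\Rightarrow\; 2048 \times 4\,\text{KiB} = 8\,\text{MiB}
```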

* RE: Design question for PV superpage support
  2009-03-03  3:59             ` Mick Jordan
@ 2009-03-03 14:33               ` Dan Magenheimer
  2009-03-03 17:06                 ` Mick Jordan
  2009-03-03 17:28                 ` Jeremy Fitzhardinge
  2009-03-03 17:26               ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 28+ messages in thread
From: Dan Magenheimer @ 2009-03-03 14:33 UTC (permalink / raw)
  To: Mick.Jordan, Jeremy Fitzhardinge; +Cc: Dave McCracken, Xen Developers List

> In general, I think the guest should assume that large page 
> mappings are 
> merely an optimization that (a) might not be possible on domain start 
> due to machine memory fragmentation and (b) that this condition might 
> also occur on restore. Given these, it must always be prepared to 
> function with 4K pages, which implies that it would need to preserve 
> enough page table frame memory to be able to revert from large 
> to small pages.
> 
> Mick

Do you disagree with my assertion that use of 2MB pages is
almost always an attempt to eke out a performance improvement,
that emulating 2MB pages with fragmented 4KB pages is likely
slower than just using 4KB pages to start with, and thus
that "must always be prepared to function with 4KB pages"
should NOT occur silently (if at all)?

BTW, thinking ahead to ballooning with 2MB pages, are we prepared
to assume that a relinquished 2MB page can't be fragmented?
While this may be appealing for systems where nearly all
guests are using 2MB pages, systems where the 2MB guest is
an odd duck might suffer substantially by making that
assumption.

Dan

* Re: Design question for PV superpage support
  2009-03-03 14:33               ` Dan Magenheimer
@ 2009-03-03 17:06                 ` Mick Jordan
  2009-03-03 17:23                   ` Jeremy Fitzhardinge
  2009-03-03 18:10                   ` Keir Fraser
  2009-03-03 17:28                 ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 28+ messages in thread
From: Mick Jordan @ 2009-03-03 17:06 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: Jeremy Fitzhardinge, Dave McCracken, Xen Developers List


On 03/03/09 06:33, Dan Magenheimer wrote:
>> In general, I think the guest should assume that large page 
>> mappings are 
>> merely an optimization that (a) might not be possible on domain start 
>> due to machine memory fragmentation and (b) that this condition might 
>> also occur on restore. Given these, it must always be prepared to 
>> function with 4K pages, which implies that it would need to preserve 
>> enough page table frame memory to be able revert from large 
>> to small pages.
>>
>> Mick
>>     
>
> Do you disagree with my assertion that use of 2MB pages is
> almost always an attempt to eke out a performance improvement,
> that emulating 2MB pages with fragmented 4KB pages is likely
> slower than just using 4KB pages to start with, and thus
> that "must always be prepared to function with 4KB pages"
> should NOT occur silently (if at all)?
>   
I agree with the first statement. I'm not sure what you mean by "emulate 
2MB pages with fragmented 4K pages" unless you assume nested page table 
support or you just mean falling back to 4K pages. As for whether a 
change should be silent, I'm less clear on that. I certainly wouldn't 
consider it a fatal condition requiring domain termination. That 
position is consistent with the "optimization not correctness" view of 
using large pages. However, a guest might want to indicate in some way 
that it has downgraded.

> BTW, thinking ahead to ballooning with 2MB pages, are we prepared
> to assume that a relinquished 2MB page can't be fragmented?
> While this may be appealing for systems where nearly all
> guests are using 2MB pages, systems where the 2MB guest is
> an odd duck might suffer substantially by making that
> assumption.
>   
Agreed. All of this really only becomes an issue when memory is 
overcommitted. Unfortunately, that is precisely when 2MB 
machine-contiguous pages are likely to be difficult to find.

Mick


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* Re: Design question for PV superpage support
  2009-03-03 17:06                 ` Mick Jordan
@ 2009-03-03 17:23                   ` Jeremy Fitzhardinge
  2009-03-03 18:10                   ` Keir Fraser
  1 sibling, 0 replies; 28+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-03 17:23 UTC (permalink / raw)
  To: Mick.Jordan; +Cc: Dan Magenheimer, Dave McCracken, Xen Developers List

Mick Jordan wrote:
> On 03/03/09 06:33, Dan Magenheimer wrote:
>>> In general, I think the guest should assume that large page 
>>> mappings are 
>>> merely an optimization that (a) might not be possible on domain start 
>>> due to machine memory fragmentation and (b) that this condition might 
>>> also occur on restore. Given these, it must always be prepared to 
>>> function with 4K pages, which implies that it would need to preserve 
>>> enough page table frame memory to be able revert from large 
>>> to small pages.
>>>
>>> Mick
>>>     
>>
>> Do you disagree with my assertion that use of 2MB pages is
>> almost always an attempt to eke out a performance improvement,
>> that emulating 2MB pages with fragmented 4KB pages is likely
>> slower than just using 4KB pages to start with, and thus
>> that "must always be prepared to function with 4KB pages"
>> should NOT occur silently (if at all)?
>>   
> I agree with the first statement. I'm not sure what you mean by 
> "emulate 2MB pages with fragmented 4K pages" unless you assume nested 
> page table support or you just mean falling back to 4K pages. As for 
> whether a change should be silent, I'm less clear on that. I certainly 
> wouldn't consider it a fatal condition requiring domain termination, 
> That position is consistent with the "optimization not correctness" 
> view of using large tables. However, a guest might want to indicate in 
> some way that it has downgraded

The tradeoff is between the performance gain one might get from using 
large pages vs the intrusiveness of changes to a PV kernel.  Given that, 
when paravirtualizing this, we're going to be making small changes to the 
kernel's existing large-page support rather than adding a new or 
separate large-page mechanism, we need to make sure that as many of the 
guest's existing assumptions as possible can be satisfied.

The requirement that a guest be able to come up with enough L1 pagetable 
pages to be able to map all the shattered 2M mappings at any time 
definitely doesn't fall into that category.  You'd need to:

   1. Have an interface for Xen to tell the guest which pages need to be
      remapped.  Presumably this would be in terms of once-contiguous
      pfn ranges which are now backed by discontiguous mfns.
   2. Get the guest to remap those pfns to the new mfns, which will
      require walking every pagetable of every process searching for
      those pfns, allocating memory for the new pagetable level.
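A minimal C sketch of what the two steps above involve for a single 512-pfn range. It first checks whether the new pfn-to-mfn assignment is still machine-contiguous and 2 MiB aligned, in which case the large mapping can stay. Otherwise it builds a replacement L1 table with one 4 KiB PTE per frame. Some assumptions and omissions:

- The `mfns[]` input stands in for the hypothetical Xen-to-guest notification of step 1.
- The function names are made up for illustration.
- A real PV guest would install the result through Xen's MMU-update hypercalls rather than writing page tables directly.
- The hard part described above, finding every page table that maps the range and allocating the L1 frames, is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTES_PER_TABLE 512
#define PAGE_SHIFT     12

/* Architectural x86 PTE bits: present and writable. */
#define PTE_PRESENT    0x001ULL
#define PTE_RW         0x002ULL

/* Hypothetical: mfns[i] is the machine frame now backing the i-th pfn of a
 * range the guest previously mapped with a single 2 MiB entry. */
static bool superpage_still_contiguous(const uint64_t mfns[PTES_PER_TABLE])
{
    if (mfns[0] % PTES_PER_TABLE != 0)   /* base must be 2 MiB aligned */
        return false;
    for (size_t i = 1; i < PTES_PER_TABLE; i++)
        if (mfns[i] != mfns[0] + i)      /* every frame must follow on */
            return false;
    return true;
}

/* Fill a newly allocated L1 table with one 4 KiB PTE per backing frame.
 * The L2 entry would then point at this table instead of being a 2 MiB
 * (PSE) mapping. */
static void shatter_into_l1(uint64_t l1[PTES_PER_TABLE],
                            const uint64_t mfns[PTES_PER_TABLE],
                            uint64_t flags)
{
    for (size_t i = 0; i < PTES_PER_TABLE; i++)
        l1[i] = (mfns[i] << PAGE_SHIFT) | flags;
}
```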

However, the main use of 2M mappings in Linux is to map the kernel text 
and data.  That's clearly not going to be possible if we need to run 
kernel code to put things together after a restore.  Hm, given that, I 
guess we could just kludge it into hugetlbfs, but that really does make 
for a very narrow set of users.

>> BTW, thinking ahead to ballooning with 2MB pages, are we prepared
>> to assume that a relinquished 2MB page can't be fragmented?
>> While this may be appealing for systems where nearly all
>> guests are using 2MB pages, systems where the 2MB guest is
>> an odd duck might suffer substantially by making that
>> assumption.
>>   
> Agreed. All of this really only becomes an issue when memory is 
> overcommitted. Unfortunately, that is precisely when 2MB machine 
> contiguous pages are likely to be difficult to find.

If 2M pages are becoming more important, then we should change Xen to do 
all domain allocations in 2M units, while reserving separate superpages 
specifically for fragmenting into 4k allocations.  It's certainly 
sensible to always round a domain's initial size up to 2M (most will 
already be a 2M multiple, I suspect).  Ballooning is the obvious exception, 
but I would argue that ballooning in less than 2M units is a lot of 
fiddly makework.  The difference between giving a domain 128MB vs 
126MB is already pretty trivial; dealing with 4k changes in domain size 
is laughably small.
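The 2M-granularity policy suggested above amounts to simple rounding. Here is a hypothetical sketch with made-up names. Initial domain sizes are rounded up to whole superpages. A balloon target is converted into a count of whole 2 MiB extents to release, rounding down so the domain never drops below the requested size.

```c
#include <assert.h>
#include <stdint.h>

#define MIB            (1024ULL * 1024ULL)
#define SUPERPAGE_SIZE (2 * MIB)

/* Round a requested domain size up to a whole number of 2 MiB superpages. */
static uint64_t round_up_to_superpage(uint64_t bytes)
{
    return (bytes + SUPERPAGE_SIZE - 1) / SUPERPAGE_SIZE * SUPERPAGE_SIZE;
}

/* Hypothetical balloon policy: number of whole 2 MiB extents to give back
 * to move from `current` toward `target`, rounding down so the domain
 * never ends up smaller than requested. */
static uint64_t superpages_to_release(uint64_t current, uint64_t target)
{
    assert(current % SUPERPAGE_SIZE == 0);
    if (target >= current)
        return 0;
    return (current - target) / SUPERPAGE_SIZE;
}
```

With this rounding:

- A 127MB request becomes 128MB.
- Shrinking a 128MB domain to 126MB releases one superpage.
- A 4k reduction releases nothing.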

(Now Keir brings up all the difficulties...)

    J

* Re: Design question for PV superpage support
  2009-03-03  3:59             ` Mick Jordan
  2009-03-03 14:33               ` Dan Magenheimer
@ 2009-03-03 17:26               ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 28+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-03 17:26 UTC (permalink / raw)
  To: Mick.Jordan; +Cc: Dave McCracken, Xen Developers List

Mick Jordan wrote:
> Since restore already requires quite a lot of reset, e.g., grant table 
> mappings, on the part of the guest, it seems that checking the 
> validity of any large page mappings should be possible at the same 
> time. Obviously you could get in a big mess if you mapped the code 
> that is going to do the fixup on a large page, but that is unlikely 
> and easily avoidable.

That's actually the most likely case in Linux.  Not being able to use 2M 
mappings for kernel code+data removes about 95% of the utility.

> In general, I think the guest should assume that large page mappings 
> are merely an optimization that (a) might not be possible on domain 
> start due to machine memory fragmentation and (b) that this condition 
> might also occur on restore. Given these, it must always be prepared 
> to function with 4K pages, which implies that it would need to 
> preserve enough page table frame memory to be able revert from large 
> to small pages.

I think that's too intrusive.  I'd want to see some very convincing 
measurements to justify doing these kinds of changes to pvops Linux, for 
example.

    J

* Re: Design question for PV superpage support
  2009-03-03 14:33               ` Dan Magenheimer
  2009-03-03 17:06                 ` Mick Jordan
@ 2009-03-03 17:28                 ` Jeremy Fitzhardinge
  2009-03-03 18:09                   ` Dan Magenheimer
  1 sibling, 1 reply; 28+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-03 17:28 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: Dave McCracken, Mick.Jordan, Xen Developers List

Dan Magenheimer wrote:
> BTW, thinking ahead to ballooning with 2MB pages, are we prepared
> to assume that a relinquished 2MB page can't be fragmented?
> While this may be appealing for systems where nearly all
> guests are using 2MB pages, systems where the 2MB guest is
> an odd duck might suffer substantially by making that
> assumption.
>   

Well, I still think that 4k pages are a ludicrously tiny unit of memory 
for Xen to be dealing with, and it shouldn't bother to get out of bed 
for less than 2M.  If we treat 4k pages as the special case then keeping 
the Xen heap unfragmented at the 2M level should be fairly easy, no?
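One way to read "treat 4k pages as the special case" is as a heap with these rules:

- Memory is handed out in 2M chunks.
- 4k pages are carved only out of chunks that are already split.
- A fresh chunk is broken up only when every split chunk is full.
- A split chunk whose 4k pages have all been freed becomes an intact 2M chunk again.

The following is a toy, single-threaded model of that policy over a fixed array of chunks. The names are hypothetical, and it is not how Xen's heap allocator is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CHUNKS    8     /* toy heap: 8 chunks of 2 MiB each */
#define PAGES_PER_2M 512

enum chunk_state { CHUNK_FREE, CHUNK_WHOLE, CHUNK_SPLIT };

struct chunk {
    enum chunk_state state;
    unsigned used;                 /* 4K pages in use while SPLIT */
    bool page_used[PAGES_PER_2M];
};

static struct chunk heap[NR_CHUNKS];   /* zero-init: all CHUNK_FREE */

/* Returns a chunk index, or -1 if no intact 2M chunk is left. */
static int alloc_2m(void)
{
    for (int c = 0; c < NR_CHUNKS; c++)
        if (heap[c].state == CHUNK_FREE) {
            heap[c].state = CHUNK_WHOLE;
            return c;
        }
    return -1;
}

static void free_2m(int c)
{
    assert(heap[c].state == CHUNK_WHOLE);
    heap[c].state = CHUNK_FREE;
}

/* Take the first free 4K slot of chunk c; returns a global page number. */
static long take_slot(int c)
{
    for (int s = 0; s < PAGES_PER_2M; s++)
        if (!heap[c].page_used[s]) {
            heap[c].page_used[s] = true;
            heap[c].used++;
            return (long)c * PAGES_PER_2M + s;
        }
    return -1;
}

static long alloc_4k(void)
{
    /* Prefer chunks that are already split... */
    for (int c = 0; c < NR_CHUNKS; c++)
        if (heap[c].state == CHUNK_SPLIT && heap[c].used < PAGES_PER_2M)
            return take_slot(c);
    /* ...and break up an intact chunk only as a last resort. */
    for (int c = 0; c < NR_CHUNKS; c++)
        if (heap[c].state == CHUNK_FREE) {
            heap[c].state = CHUNK_SPLIT;
            heap[c].used = 0;
            memset(heap[c].page_used, 0, sizeof heap[c].page_used);
            return take_slot(c);
        }
    return -1;
}

static void free_4k(long pg)
{
    int c = (int)(pg / PAGES_PER_2M), s = (int)(pg % PAGES_PER_2M);
    assert(heap[c].state == CHUNK_SPLIT && heap[c].page_used[s]);
    heap[c].page_used[s] = false;
    /* A fully drained split chunk becomes an intact 2M chunk again. */
    if (--heap[c].used == 0)
        heap[c].state = CHUNK_FREE;
}
```

Because 4k allocations pile into the same split chunk, two 4k pages leave every other chunk available for 2M allocations, and freeing them restores the split chunk as a whole superpage.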

    J

* RE: Design question for PV superpage support
  2009-03-03 17:28                 ` Jeremy Fitzhardinge
@ 2009-03-03 18:09                   ` Dan Magenheimer
  0 siblings, 0 replies; 28+ messages in thread
From: Dan Magenheimer @ 2009-03-03 18:09 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Dave McCracken, Mick.Jordan, Xen Developers List

> Dan Magenheimer wrote:
> > BTW, thinking ahead to ballooning with 2MB pages, are we prepared
> > to assume that a relinquished 2MB page can't be fragmented?
> > While this may be appealing for systems where nearly all
> > guests are using 2MB pages, systems where the 2MB guest is
> > an odd duck might suffer substantially by making that
> > assumption.
> 
> Well, I still think that 4k pages are a ludicrously tiny unit 
> of memory 
> for Xen to be dealing with, and it shouldn't bother to get out of bed 
> for less than 2M.  If we treat 4k pages as the special case 
> then keeping 
> the Xen heap unfragmented at the 2M level should be fairly easy, no?

Probably true, though I suspect this is harder than it sounds.

* Re: Design question for PV superpage support
  2009-03-03 17:06                 ` Mick Jordan
  2009-03-03 17:23                   ` Jeremy Fitzhardinge
@ 2009-03-03 18:10                   ` Keir Fraser
  1 sibling, 0 replies; 28+ messages in thread
From: Keir Fraser @ 2009-03-03 18:10 UTC (permalink / raw)
  To: Mick.Jordan@sun.com, Dan Magenheimer
  Cc: Jeremy Fitzhardinge, Dave McCracken, Xen Developers List

On 03/03/2009 17:06, "Mick Jordan" <Mick.Jordan@sun.com> wrote:

> I agree with the first statement. I'm not sure what you mean by "emulate 2MB
> pages with fragmented 4K pages" unless you assume nested page table support or
> you just mean falling back to 4K pages. As for whether a change should be
> silent, I'm less clear on that. I certainly wouldn't consider it a fatal
> condition requiring domain termination, That position is consistent with the
> "optimization not correctness" view of using large tables. However, a guest
> might want to indicate in some way that it has downgraded

Yeah, I somehow forgot about this actually. Of course it is hard to
downgrade to non-2MB pages across save/restore, because the guest-owned
pagetables have the superpage mappings baked into them. Oh well, that makes
such graceful downgrade much less attractive to implement, so I withdraw the
suggestion!
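The "baked in" problem can be seen in a single entry. A 2 MiB mapping is an L2 entry with the PSE bit (bit 7) set, whose bits 21-51 hold the 2 MiB-aligned base frame. A restore-time fixup can therefore only rewrite it if the new machine frames for all 512 pfns are contiguous and aligned, because there is no L1 table in the guest's page tables to fall back on. This sketch makes two hypothetical assumptions: the saved entry has been canonicalized to hold a pfn, and `p2m[]` is the new pfn-to-mfn table. The function name is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define PTE_PSE       (1ULL << 7)               /* x86: L2 entry maps 2 MiB */
#define L2_FRAME_MASK 0x000FFFFFFFE00000ULL     /* bits 51..21: 2 MiB base */

/* Hypothetical restore-time fixup for one canonicalized L2 entry.
 * Returns false if the superpage can no longer be expressed by a single
 * L2 entry, i.e. the guest would need an L1 table it does not have. */
static bool rewrite_l2_superpage(uint64_t *l2e, const uint64_t *p2m)
{
    if (!(*l2e & PTE_PSE))
        return true;   /* points at an L1 table: not a superpage */

    uint64_t base_pfn = (*l2e & L2_FRAME_MASK) >> PAGE_SHIFT;
    uint64_t base_mfn = p2m[base_pfn];

    if (base_mfn % 512 != 0)                 /* new base not 2 MiB aligned */
        return false;
    for (uint64_t i = 1; i < 512; i++)
        if (p2m[base_pfn + i] != base_mfn + i)
            return false;                    /* backing is fragmented */

    /* Keep the flag bits, swap in the new machine base address. */
    *l2e = (*l2e & ~L2_FRAME_MASK) | (base_mfn << PAGE_SHIFT);
    return true;
}
```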

 -- Keir

end of thread

Thread overview: 28+ messages
2009-03-02 13:54 Design question for PV superpage support Dave McCracken
2009-03-02 13:58 ` Keir Fraser
2009-03-02 16:43 ` Mick Jordan
2009-03-02 17:06   ` Keir Fraser
2009-03-02 18:02     ` Mick Jordan
2009-03-02 17:29   ` Dave McCracken
2009-03-02 17:52     ` Keir Fraser
2009-03-02 18:03       ` Dan Magenheimer
2009-03-02 18:30         ` Keir Fraser
2009-03-02 18:46           ` Mick Jordan
2009-03-02 18:48           ` Dan Magenheimer
2009-03-02 19:04             ` Keir Fraser
2009-03-02 17:45   ` Mick Jordan
2009-03-02 17:54     ` Keir Fraser
2009-03-02 18:00     ` Dave McCracken
2009-03-02 18:14       ` Mick Jordan
2009-03-02 19:14         ` Dave McCracken
2009-03-03  1:37           ` Jeremy Fitzhardinge
2009-03-03  3:59             ` Mick Jordan
2009-03-03 14:33               ` Dan Magenheimer
2009-03-03 17:06                 ` Mick Jordan
2009-03-03 17:23                   ` Jeremy Fitzhardinge
2009-03-03 18:10                   ` Keir Fraser
2009-03-03 17:28                 ` Jeremy Fitzhardinge
2009-03-03 18:09                   ` Dan Magenheimer
2009-03-03 17:26               ` Jeremy Fitzhardinge
2009-03-03  1:32       ` Jeremy Fitzhardinge
2009-03-03  1:15 ` Jeremy Fitzhardinge
