* Design question for PV superpage support
@ 2009-03-02 13:54 Dave McCracken
2009-03-02 13:58 ` Keir Fraser
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Dave McCracken @ 2009-03-02 13:54 UTC (permalink / raw)
To: Xen Developers List
The solution I am working on for how to support Linux hugepages (Xen
superpages) involves creating domains made up entirely of superpages. I can
create a working domain with superpages and am in the process of supporting
it in save/restore.
For this to work properly this should be an attribute of a domain, specified
somewhere in domain configuration and attached to that domain for its
lifetime. This way it could be checked at memory populate time, save/restore
time, and by the various balloon drivers.
My question is for those of you who best know the overall Xen design
principles. Where should the flag be specified by the user? Where should it
best be set in the running domain? I've seen some examples of flags being
passed around, but would like some guidance on the best place to put it to
fit into the Xen design.
Thanks,
Dave McCracken
Oracle Corp.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 13:54 Design question for PV superpage support Dave McCracken
@ 2009-03-02 13:58 ` Keir Fraser
2009-03-02 16:43 ` Mick Jordan
2009-03-03 1:15 ` Jeremy Fitzhardinge
2 siblings, 0 replies; 28+ messages in thread
From: Keir Fraser @ 2009-03-02 13:58 UTC (permalink / raw)
To: Dave McCracken, Xen Developers List
On 02/03/2009 13:54, "Dave McCracken" <dcm@mccr.org> wrote:
> My question is for those of you who best know the overall Xen design
> principles. Where should the flag be specified by the user? Where should it
> best be set in the running domain? I've seen some examples of flags being
> passed around, but would like some guidance on the best place to put it to
> fit into the Xen design.
Specify in domain config file, and also stick it in xenstore somewhere. From
there it should be possible to get it picked up and packed into the
save/restore file pretty much automatically I think, and also you can make
it accessible there by balloon drivers.
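A minimal sketch of that flow, for illustration only. The config key "superpages" and the "memory/superpages" leaf are hypothetical names, and xenstore is modelled here as a plain dict rather than the real store:

```python
# Hypothetical sketch: carry a per-domain superpage flag from the domain
# config into a xenstore-like key/value store. The option name and the
# leaf path are assumptions made for this example, not real Xen names.

def parse_superpages_flag(config: dict) -> bool:
    """Read the (hypothetical) 'superpages' option, defaulting to off."""
    return bool(int(config.get("superpages", 0)))


def flag_path(domid: int) -> str:
    """Per-domain location of the flag in the modelled store."""
    return f"/local/domain/{domid}/memory/superpages"


def publish_flag(store: dict, domid: int, enabled: bool) -> None:
    """Record the flag so save/restore and balloon drivers can find it."""
    store[flag_path(domid)] = "1" if enabled else "0"


def domain_uses_superpages(store: dict, domid: int) -> bool:
    """What a consumer such as a balloon driver would query."""
    return store.get(flag_path(domid), "0") == "1"


config = {"name": "guest1", "memory": 1024, "superpages": 1}
xenstore = {}
publish_flag(xenstore, 7, parse_superpages_flag(config))
```

Keeping the flag in one well-known place means every consumer reads the same value for the lifetime of the domain.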
-- Keir
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 13:54 Design question for PV superpage support Dave McCracken
2009-03-02 13:58 ` Keir Fraser
@ 2009-03-02 16:43 ` Mick Jordan
2009-03-02 17:06 ` Keir Fraser
` (2 more replies)
2009-03-03 1:15 ` Jeremy Fitzhardinge
2 siblings, 3 replies; 28+ messages in thread
From: Mick Jordan @ 2009-03-02 16:43 UTC (permalink / raw)
To: Dave McCracken; +Cc: Xen Developers List
On 03/02/09 05:54, Dave McCracken wrote:
> The solution I am working on for how to support Linux hugepages (Xen
> superpages) involves creating domains made up entirely of superpages. I can
> create a working domain with superpages and am in the process of supporting
> it in save/restore.
>
>
This wouldn't work too well for me in the case of thread stacks because
we need to map out parts of the stack and, although we want large
virtual stacks, we don't want to dedicate that much physical memory. Is
it really difficult to support mixed page sizes in the general case,
e.g., save/restore etc.?
Mick
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 16:43 ` Mick Jordan
@ 2009-03-02 17:06 ` Keir Fraser
2009-03-02 18:02 ` Mick Jordan
2009-03-02 17:29 ` Dave McCracken
2009-03-02 17:45 ` Mick Jordan
2 siblings, 1 reply; 28+ messages in thread
From: Keir Fraser @ 2009-03-02 17:06 UTC (permalink / raw)
To: Mick.Jordan@sun.com, Dave McCracken; +Cc: Xen Developers List
On 02/03/2009 16:43, "Mick Jordan" <Mick.Jordan@sun.com> wrote:
> On 03/02/09 05:54, Dave McCracken wrote:
>> The solution I am working on for how to support Linux hugepages (Xen
>> superpages) involves creating domains made up entirely of superpages. I can
>> create a working domain with superpages and am in the process of supporting
>> it in save/restore.
>>
> This wouldn't work too well for me in the case of thread stacks because
> we need to map out parts of the stack and, although we want large
> virtual stacks, we don't want to dedicate that much physical memory. Is
> it really difficult to support mixed page sizes in the general case,
> e.g., save/restore etc.?
You can still make 4kB mappings of subsections of 2MB physical extents. And
the guest kernel will still be able to allocate subsections of 2MB physical
extents for various uses. Isn't that all you need for e.g., this thread
stack situation?
Presumably Dave McCracken will be implementing a 'best effort' mode for
domains where we try to allocate superpages but we get by at reduced
performance if we have to allocate some discontiguous extents due to lack of
contiguous available memory. That would be reasonably sensible.
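A rough sketch of what such a 'best effort' policy could look like, under stated assumptions: free machine memory is modelled as a set of free 4kB frame numbers, and the function name and the `strict` switch are invented for this example rather than taken from the Xen allocator.

```python
FRAMES_PER_SUPERPAGE = 512  # 2MB / 4kB


def alloc_extent(free: set, strict: bool):
    """Allocate one 2MB extent worth of 4kB frames from `free`.

    Returns (mfns, contiguous). With strict=True the call fails unless an
    aligned, contiguous 2MB run exists; otherwise it falls back to any
    512 free frames and reports contiguous=False.
    """
    # First choice: an aligned run of 512 free frames.
    for base in sorted(m for m in free if m % FRAMES_PER_SUPERPAGE == 0):
        run = range(base, base + FRAMES_PER_SUPERPAGE)
        if all(m in free for m in run):
            free.difference_update(run)
            return list(run), True

    if strict:
        raise MemoryError("no aligned 2MB extent available")
    if len(free) < FRAMES_PER_SUPERPAGE:
        raise MemoryError("not enough free memory")

    # Fallback: discontiguous frames, usable only through 4kB mappings.
    picked = sorted(free)[:FRAMES_PER_SUPERPAGE]
    free.difference_update(picked)
    return picked, False
```

The caller can count how many extents came back with contiguous=False to judge how much of the domain is running at reduced performance.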
-- Keir
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 16:43 ` Mick Jordan
2009-03-02 17:06 ` Keir Fraser
@ 2009-03-02 17:29 ` Dave McCracken
2009-03-02 17:52 ` Keir Fraser
2009-03-02 17:45 ` Mick Jordan
2 siblings, 1 reply; 28+ messages in thread
From: Dave McCracken @ 2009-03-02 17:29 UTC (permalink / raw)
To: Mick.Jordan, Xen Developers List
On Monday 02 March 2009, Mick Jordan wrote:
> > The solution I am working on for how to support Linux hugepages (Xen
> > superpages) involves creating domains made up entirely of superpages. I
> > can create a working domain with superpages and am in the process of
> > supporting it in save/restore.
>
> This wouldn't work too well for me in the case of thread stacks because
> we need to map out parts of the stack and, although we want large
> virtual stacks, we don't want to dedicate that much physical memory. Is
> it really difficult to support mixed page sizes in the general case,
> e.g., save/restore etc.?
What I am doing is populating the domain with 2M pages. The hypervisor fills
in all its internal arrays as if they were regular 4K pages. The guest is
then free to use mixed size pages. The only significant difference is that
when a guest does allocate a 2M page, it's guaranteed to be properly aligned
at the machine page level so it can be mapped as a hugepage. All 4K page
allocations will continue to work.
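As an illustration of the populate step described above, here is a small sketch. It is not the hypervisor code; the pfn-to-mfn table is modelled as a Python list indexed by pfn, and the function name is made up for the example.

```python
FRAMES_PER_SUPERPAGE = 512  # number of 4K frames in one 2M page


def populate_p2m(superpage_base_mfns):
    """Build a per-4K pfn->mfn table from a list of 2M machine extents.

    Each entry of `superpage_base_mfns` is the first mfn of a 2M-aligned
    machine extent. The result has one entry per 4K pfn, exactly as if
    the domain had been populated with regular 4K pages.
    """
    p2m = []
    for base in superpage_base_mfns:
        if base % FRAMES_PER_SUPERPAGE:
            raise ValueError(f"mfn {base} is not 2M-aligned")
        # Expand one 2M extent into 512 consecutive 4K entries.
        p2m.extend(range(base, base + FRAMES_PER_SUPERPAGE))
    return p2m


# Two 2M extents placed at unrelated machine addresses.
p2m = populate_p2m([4096, 1024])
```

The guest sees an ordinary 4K-granular table; the alignment guarantee is a property of how the entries were filled in, not of the table format.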
Dave McCracken
Oracle Corp.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 16:43 ` Mick Jordan
2009-03-02 17:06 ` Keir Fraser
2009-03-02 17:29 ` Dave McCracken
@ 2009-03-02 17:45 ` Mick Jordan
2009-03-02 17:54 ` Keir Fraser
2009-03-02 18:00 ` Dave McCracken
2 siblings, 2 replies; 28+ messages in thread
From: Mick Jordan @ 2009-03-02 17:45 UTC (permalink / raw)
To: Dave McCracken; +Cc: Xen Developers List
On 03/02/09 08:43, Mick Jordan wrote:
> On 03/02/09 05:54, Dave McCracken wrote:
>> The solution I am working on for how to support Linux hugepages (Xen
>> superpages) involves creating domains made up entirely of
>> superpages. I can create a working domain with superpages and am in
>> the process of supporting it in save/restore.
>>
>>
I'm assuming that this means that everything is upgraded from 4K to 2MB.
E.g. pfn 0 = 0, pfn 1 = 2MB., etc., and the mfn<->pfn maps also.
> This wouldn't work too well for me in the case of thread stacks
> because we need to map out parts of the stack and, although we want
> large virtual stacks, we don't want to dedicate that much physical
> memory. Is it really difficult to support mixed page sizes in the
> general case, e.g., save/restore etc.?
Save/restore is definitely important for me and we do support it at
present. I'm wondering if I might be able to "reapply" my 2MB mappings
after a restore on a 4K system, given that these are just layered on a
1-1 mapping between physical/virtual for all allocated memory.
Mick
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 17:29 ` Dave McCracken
@ 2009-03-02 17:52 ` Keir Fraser
2009-03-02 18:03 ` Dan Magenheimer
0 siblings, 1 reply; 28+ messages in thread
From: Keir Fraser @ 2009-03-02 17:52 UTC (permalink / raw)
To: Dave McCracken, Mick.Jordan@sun.com, Xen Developers List
On 02/03/2009 17:29, "Dave McCracken" <dcm@mccr.org> wrote:
>> This wouldn't work too well for me in the case of thread stacks because
>> we need to map out parts of the stack and, although we want large
>> virtual stacks, we don't want to dedicate that much physical memory. Is
>> it really difficult to support mixed page sizes in the general case,
>> e.g., save/restore etc.?
>
> What I am doing is populating the domain with 2M pages. The hypervisor fills
> in all its internal arrays as if they were regular 4K pages. The guest is
> then free to use mixed size pages. The only significant difference is that
> when a guest does allocate a 2M page, it's guaranteed to be properly aligned
> at the machine page level so it can be mapped as a hugepage. All 4K page
> allocations will continue to work.
It'd be nice to fall back to the case of not being able to guarantee all 2MB
extents are aligned and contiguous. So for example being able to migrate to
or restore on a system that currently doesn't have enough contiguous memory.
-- Keir
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 17:45 ` Mick Jordan
@ 2009-03-02 17:54 ` Keir Fraser
2009-03-02 18:00 ` Dave McCracken
1 sibling, 0 replies; 28+ messages in thread
From: Keir Fraser @ 2009-03-02 17:54 UTC (permalink / raw)
To: Mick.Jordan@sun.com, Dave McCracken; +Cc: Xen Developers List
On 02/03/2009 17:45, "Mick Jordan" <Mick.Jordan@sun.com> wrote:
>>> The solution I am working on for how to support Linux hugepages (Xen
>>> superpages) involves creating domains made up entirely of
>>> superpages. I can create a working domain with superpages and am in
>>> the process of supporting it in save/restore.
> I'm assuming that this means that everything is upgraded from 4K to 2MB.
> E.g. pfn 0 = 0, pfn 1 = 2MB., etc., and the mfn<->pfn maps also.
No, it doesn't mean that, which will be clear from Dave's response just now.
K.
>> This wouldn't work too well for me in the case of thread stacks
>> because we need to map out parts of the stack and, although we want
>> large virtual stacks, we don't want to dedicate that much physical
>> memory. Is it really difficult to support mixed page sizes in the
>> general case, e.g., save/restore etc.?
> Save/restore is definitely important for me and we do support it at
> present. I'm wondering if I might be able to "reapply" my 2MB mappings
> after a restore on a 4K system, given that these are just layered on a
> 1-1 mapping between physical/virtual for all allocated memory.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 17:45 ` Mick Jordan
2009-03-02 17:54 ` Keir Fraser
@ 2009-03-02 18:00 ` Dave McCracken
2009-03-02 18:14 ` Mick Jordan
2009-03-03 1:32 ` Jeremy Fitzhardinge
1 sibling, 2 replies; 28+ messages in thread
From: Dave McCracken @ 2009-03-02 18:00 UTC (permalink / raw)
To: Mick.Jordan; +Cc: Xen Developers List
On Monday 02 March 2009, Mick Jordan wrote:
> On 03/02/09 08:43, Mick Jordan wrote:
> > On 03/02/09 05:54, Dave McCracken wrote:
> >> The solution I am working on for how to support Linux hugepages (Xen
> >> superpages) involves creating domains made up entirely of
> >> superpages. I can create a working domain with superpages and am in
> >> the process of supporting it in save/restore.
>
> I'm assuming that this means that everything is upgraded from 4K to 2MB.
> E.g. pfn 0 = 0, pfn 1 = 2MB., etc., and the mfn<->pfn maps also.
No, actually, it doesn't do that. The hypervisor allocates 2M pages, then
expands them into 4K pages for the mfn<->pfn maps, etc.
The only effective difference is that any given 2M-aligned range of pfns is
guaranteed to map to a contiguous 2M-aligned range of mfns. Therefore the
guest can safely allocate 2M pages.
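The guarantee can be written down as a check. This is only a sketch of the invariant, with the pfn->mfn map modelled as a list indexed by pfn and a function name chosen for the example:

```python
FRAMES_PER_SUPERPAGE = 512  # 4K frames per 2M page


def is_superpage_backed(p2m, pfn):
    """True if the 2M-aligned pfn range containing `pfn` maps to a
    contiguous, 2M-aligned range of mfns."""
    start = pfn - pfn % FRAMES_PER_SUPERPAGE
    chunk = p2m[start:start + FRAMES_PER_SUPERPAGE]
    if len(chunk) < FRAMES_PER_SUPERPAGE:
        return False  # range runs off the end of the domain
    base = chunk[0]
    return base % FRAMES_PER_SUPERPAGE == 0 and all(
        mfn == base + i for i, mfn in enumerate(chunk)
    )
```

A guest could only map a range as a 2M page when this holds; in an all-superpage domain it would hold for every 2M-aligned range.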
Dave McCracken
Oracle Corp.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 17:06 ` Keir Fraser
@ 2009-03-02 18:02 ` Mick Jordan
0 siblings, 0 replies; 28+ messages in thread
From: Mick Jordan @ 2009-03-02 18:02 UTC (permalink / raw)
To: Keir Fraser; +Cc: Dave McCracken, Xen Developers List
On 03/02/09 09:06, Keir Fraser wrote:
> On 02/03/2009 16:43, "Mick Jordan" <Mick.Jordan@sun.com> wrote:
>
>
>> On 03/02/09 05:54, Dave McCracken wrote:
>>
>>> The solution I am working on for how to support Linux hugepages (Xen
>>> superpages) involves creating domains made up entirely of superpages. I can
>>> create a working domain with superpages and am in the process of supporting
>>> it in save/restore.
>>>
>>>
>> This wouldn't work too well for me in the case of thread stacks because
>> we need to map out parts of the stack and, although we want large
>> virtual stacks, we don't want to dedicate that much physical memory. Is
>> it really difficult to support mixed page sizes in the general case,
>> e.g., save/restore etc.?
>>
>
> You can still make 4kB mappings of subsections of 2MB physical extents. And
> the guest kernel will still be able to allocate subsections of 2MB physical
> extents for various uses. Isn't that all you need for e.g., this thread
> stack situation?
>
>
Yes, that would work. Assuming it doesn't cause problems in other ways,
e.g. save/restore, given that this re-introduces mixed mappings. I'd
appreciate someone explaining the problems for save/restore with the
earlier patch that simply allowed 2MB pages in PTEs.
> Presumably Dave McCracken will be implementing a 'best effort' mode for
> domains where we try to allocate superpages but we get by at reduced
> performance if we have to allocate some discontiguous extents due to lack of
> contiguous available memory. That would be reasonably sensible.
>
>
Indeed.
Mick
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: Design question for PV superpage support
2009-03-02 17:52 ` Keir Fraser
@ 2009-03-02 18:03 ` Dan Magenheimer
2009-03-02 18:30 ` Keir Fraser
0 siblings, 1 reply; 28+ messages in thread
From: Dan Magenheimer @ 2009-03-02 18:03 UTC (permalink / raw)
To: Keir Fraser, Dave McCracken, Mick.Jordan, Xen Developers List
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>
> It'd be nice to fall back to the case of not being able to
> guarantee all 2MB
> extents are aligned and contiguous. So for example being able
> to migrate to
> or restore on a system that currently doesn't have enough
> contiguous memory.
Well, yes and no. I believe the ONLY reason to use 2MB
pages is to achieve a significant performance advantage.
And I suspect emulating 2MB "virtual pages" on 4KB physical
pages will perform at least slightly worse than just
4KB-on-4KB, true?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 18:00 ` Dave McCracken
@ 2009-03-02 18:14 ` Mick Jordan
2009-03-02 19:14 ` Dave McCracken
2009-03-03 1:32 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 28+ messages in thread
From: Mick Jordan @ 2009-03-02 18:14 UTC (permalink / raw)
To: Dave McCracken; +Cc: Xen Developers List
On 03/02/09 10:00, Dave McCracken wrote:
> No, actually, it doesn't do that. The hypervisor allocates 2M pages, then
> expands them into 4K pages for the mfn<->pfn maps, etc.
>
> The only effective difference is that any given 2M-aligned range of pfns is
> guaranteed to map to a contiguous 2M-aligned range of mfns. Therefore the
> guest can safely allocate 2M pages.
>
Ok. So I want to re-iterate my question from a previous post. After the
patch allowing mixed mappings, what exactly went wrong on save/restore?
And would my special case of 1-1 physical/virtual mappings with
additional 2MB VM mappings added after domain start suffer in that case?
From my (brief) experience, I think the problems of finding enough
contiguous machine memory to allocate an all 2MB domain might be
prohibitive. And when the memory is not fragmented I did not find it
hard to "find" contiguous aligned 2MB machine pages even with the usual
(seemingly random) pfn -> mfn mappings. It's a bit more code and runtime
overhead, but it doesn't happen enough to worry about that.
Mick
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 18:03 ` Dan Magenheimer
@ 2009-03-02 18:30 ` Keir Fraser
2009-03-02 18:46 ` Mick Jordan
2009-03-02 18:48 ` Dan Magenheimer
0 siblings, 2 replies; 28+ messages in thread
From: Keir Fraser @ 2009-03-02 18:30 UTC (permalink / raw)
To: Dan Magenheimer, Dave McCracken, Mick.Jordan@sun.com,
Xen Developers List
On 02/03/2009 18:03, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>>
>> It'd be nice to fall back to the case of not being able to
>> guarantee all 2MB
>> extents are aligned and contiguous. So for example being able
>> to migrate to
>> or restore on a system that currently doesn't have enough
>> contiguous memory.
>
> Well, yes and no. I believe the ONLY reason to use 2MB
> pages is to achieve a significant performance advantage.
> And I suspect emulating 2MB "virtual pages" on 4KB physical
> pages will perform at least slightly worse than just
> 4KB-on-4KB, true?
If you make this constraint then you risk creating domains that you cannot
always conveniently restore. Obviously you would allocate 2MB extents
wherever possible, since that is the whole point of this drawn out exercise.
-- Keir
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 18:30 ` Keir Fraser
@ 2009-03-02 18:46 ` Mick Jordan
2009-03-02 18:48 ` Dan Magenheimer
1 sibling, 0 replies; 28+ messages in thread
From: Mick Jordan @ 2009-03-02 18:46 UTC (permalink / raw)
To: Keir Fraser; +Cc: Dan Magenheimer, Dave McCracken, Xen Developers List
On 03/02/09 10:30, Keir Fraser wrote:
> On 02/03/2009 18:03, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>
>
>>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>>>
>>> It'd be nice to fall back to the case of not being able to
>>> guarantee all 2MB
>>> extents are aligned and contiguous. So for example being able
>>> to migrate to
>>> or restore on a system that currently doesn't have enough
>>> contiguous memory.
>>>
>> Well, yes and no. I believe the ONLY reason to use 2MB
>> pages is to achieve a significant performance advantage.
>> And I suspect emulating 2MB "virtual pages" on 4KB physical
>> pages will perform at least slightly worse than just
>> 4KB-on-4KB, true?
>>
>
> If you make this constraint then you risk creating domains that you cannot
> always conveniently restore. Obviously you would allocate 2MB extents
> wherever possible, since that is the whole point of this drawn out exercise.
>
Indeed, performance is the issue: fewer TLB misses. I'm happy to use 2MB
pages when I can and fall back on 4K when I can't. I just want Xen not
to fall over and save/restore to work.
Mick
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: Design question for PV superpage support
2009-03-02 18:30 ` Keir Fraser
2009-03-02 18:46 ` Mick Jordan
@ 2009-03-02 18:48 ` Dan Magenheimer
2009-03-02 19:04 ` Keir Fraser
1 sibling, 1 reply; 28+ messages in thread
From: Dan Magenheimer @ 2009-03-02 18:48 UTC (permalink / raw)
To: Keir Fraser, Dave McCracken, Mick.Jordan, Xen Developers List
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>
> On 02/03/2009 18:03, "Dan Magenheimer"
> <dan.magenheimer@oracle.com> wrote:
>
> >> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> >>
> >> It'd be nice to fall back to the case of not being able to
> >> guarantee all 2MB
> >> extents are aligned and contiguous. So for example being able
> >> to migrate to
> >> or restore on a system that currently doesn't have enough
> >> contiguous memory.
> >
> > Well, yes and no. I believe the ONLY reason to use 2MB
> > pages is to achieve a significant performance advantage.
> > And I suspect emulating 2MB "virtual pages" on 4KB physical
> > pages will perform at least slightly worse than just
> > 4KB-on-4KB, true?
>
> If you make this constraint then you risk creating domains
> that you cannot
> always conveniently restore. Obviously you would allocate 2MB extents
> wherever possible, since that is the whole point of this
> drawn out exercise.
Understood. This is a case where convenience and the primary objective
conflict. I can't think offhand of a way to do it, but restoring
or migrating a 2MB-assumed domain into an environment where the
vast majority of 2MB pages are emulated should probably raise a
bright red flag somehow. Or there needs to be some tool that
can at least be queried as to how many 2MB pages are being emulated.
But probably the right long-term answer is a 2MB Xen with a 2MB
Linux when applications assume/prefer 2MB pages.
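To make the "tool that can be queried" idea concrete, here is a sketch of such a report. Everything in it is an assumption for illustration: the pfn->mfn map is a plain list, and the function name and the warning threshold are invented.

```python
FRAMES_PER_SUPERPAGE = 512  # 4KB frames per 2MB page


def superpage_report(p2m, warn_fraction=0.5):
    """Count how many 2MB pfn ranges are still backed by aligned,
    contiguous machine memory and how many are only emulated.

    `warn_fraction` is a hypothetical threshold for raising the
    "bright red flag" after a restore or migration.
    """
    total = len(p2m) // FRAMES_PER_SUPERPAGE
    intact = 0
    for n in range(total):
        start = n * FRAMES_PER_SUPERPAGE
        base = p2m[start]
        if base % FRAMES_PER_SUPERPAGE == 0 and all(
            p2m[start + i] == base + i for i in range(FRAMES_PER_SUPERPAGE)
        ):
            intact += 1
    emulated = total - intact
    return {
        "total": total,
        "intact": intact,
        "emulated": emulated,
        "warn": total > 0 and emulated / total > warn_fraction,
    }
```

Run right after restore, a report like this would tell an administrator whether the domain still gets the performance it was configured for.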
Dan
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 18:48 ` Dan Magenheimer
@ 2009-03-02 19:04 ` Keir Fraser
0 siblings, 0 replies; 28+ messages in thread
From: Keir Fraser @ 2009-03-02 19:04 UTC (permalink / raw)
To: Dan Magenheimer, Dave McCracken, Mick.Jordan@sun.com,
Xen Developers List
On 02/03/2009 18:48, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>> If you make this constraint then you risk creating domains
>> that you cannot
>> always conveniently restore. Obviously you would allocate 2MB extents
>> wherever possible, since that is the whole point of this
>> drawn out exercise.
>
> Understood. This is a case where convenience and the primary objective
> conflict. I can't think offhand of a way to do it, but restoring
> or migrating a 2MB-assumed domain into an environment where the
> vast majority of 2MB pages are emulated should probably raise a
> bright red flag somehow. Or there needs to be some tool that
> can at least be queried as to how many 2MB pages are being emulated.
>
> But probably the right long-term answer is a 2MB Xen with a 2MB
> Linux when applications assume/prefer 2MB pages.
I'd certainly be okay with this new config option meaning 'must get 2MB
extents' for now. It can be improved if the apparent downsides bite in
practice.
-- Keir
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 18:14 ` Mick Jordan
@ 2009-03-02 19:14 ` Dave McCracken
2009-03-03 1:37 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 28+ messages in thread
From: Dave McCracken @ 2009-03-02 19:14 UTC (permalink / raw)
To: Mick.Jordan; +Cc: Xen Developers List
On Monday 02 March 2009, Mick Jordan wrote:
> Ok. So I want to re-iterate my question from a previous post. After the
> patch allowing mixed mappings, what exactly went wrong on save/restore?
> And would my special case of 1-1 physical/virtual mappings with
> additional 2MB VM mappings added after domain start suffer in that case?
My understanding of save/restore is that it will save your carefully selected
2M pages, cheerfully restore them onto a random set of mfns, then expect your
guest to continue running. I haven't studied it enough to know whether your
guest at least gets a chance to intervene and fix things after the restore.
Dave McCracken
Oracle Corp.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 13:54 Design question for PV superpage support Dave McCracken
2009-03-02 13:58 ` Keir Fraser
2009-03-02 16:43 ` Mick Jordan
@ 2009-03-03 1:15 ` Jeremy Fitzhardinge
2 siblings, 0 replies; 28+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-03 1:15 UTC (permalink / raw)
To: Dave McCracken; +Cc: Xen Developers List
Dave McCracken wrote:
> The solution I am working on for how to support Linux hugepages (Xen
> superpages) involves creating domains made up entirely of superpages. I can
> create a working domain with superpages and am in the process of supporting
> it in save/restore.
>
> For this to work properly this should be an attribute of a domain, specified
> somewhere in domain configuration and attached to that domain for its
> lifetime. This way it could be checked at memory populate time, save/restore
> time, and by the various balloon drivers.
>
> My question is for those of you who best know the overall Xen design
> principles. Where should the flag be specified by the user? Where should it
> best be set in the running domain? I've seen some examples of flags being
> passed around, but would like some guidance on the best place to put it to
> fit into the Xen design.
One thing I'm not quite sure about: when you support 2M pages for a
domain, is it fully-supported, to the extent you can safely set PSE in
cpuid, and allow the guest kernel to use 2M mappings as it usually
would? Or are there further restrictions?
You should support a feature flag in the guest kernel's ELF notes to say
that it supports large PV pages. If the kernel asks for it, then you can
enable PSE in cpuid, or have some other mechanism for the kernel to
query that the feature is available.
J
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 18:00 ` Dave McCracken
2009-03-02 18:14 ` Mick Jordan
@ 2009-03-03 1:32 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 28+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-03 1:32 UTC (permalink / raw)
To: Dave McCracken; +Cc: Xen Developers List, Mick.Jordan
Dave McCracken wrote:
> No, actually, it doesn't do that. The hypervisor allocates 2M pages, then
> expands them into 4K pages for the mfn<->pfn maps, etc.
>
What happens if you start using MMU_MACHPHYS_UPDATE on pages which are
part of a 2M mapping?
J
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-02 19:14 ` Dave McCracken
@ 2009-03-03 1:37 ` Jeremy Fitzhardinge
2009-03-03 3:59 ` Mick Jordan
0 siblings, 1 reply; 28+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-03 1:37 UTC (permalink / raw)
To: Dave McCracken; +Cc: Xen Developers List, Mick.Jordan
Dave McCracken wrote:
> My understanding of save/restore is that it will save your carefully selected
> 2M pages, cheerfully restore them onto a random set of mfns, then expect your
> guest to continue running. I haven't studied it enough to know whether your
> guest at least gets a chance to intervene and fix things after the restore.
Your guest would need to be in the position to allocate an extra 512 L1
pte pages to replace each shattered 2M page, which could be awkward -
and wouldn't have any realistic way to continue if it fails to do so.
Perhaps some kind of special pool of pages could be provided to the
domain to help it satisfy its memory needs in recovering from a
restore-shattered large page.
J
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-03 1:37 ` Jeremy Fitzhardinge
@ 2009-03-03 3:59 ` Mick Jordan
2009-03-03 14:33 ` Dan Magenheimer
2009-03-03 17:26 ` Jeremy Fitzhardinge
0 siblings, 2 replies; 28+ messages in thread
From: Mick Jordan @ 2009-03-03 3:59 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Dave McCracken, Xen Developers List
On 03/02/09 17:37, Jeremy Fitzhardinge wrote:
> Dave McCracken wrote:
>> My understanding of save/restore is that it will save your carefully
>> selected 2M pages, cheerfully restore them onto a random set of mfns,
>> then expect your guest to continue running. I haven't studied it
>> enough to know whether your guest at least gets a chance to intervene
>> and fix things after the restore.
>
Since restore already requires quite a lot of reset, e.g., grant table
mappings, on the part of the guest, it seems that checking the validity
of any large page mappings should be possible at the same time.
Obviously you could get in a big mess if you mapped the code that is
going to do the fixup on a large page, but that is unlikely and easily
avoidable.
> Your guest would need to be in the position to allocate an extra 512
> L1 pte pages to replace each shattered 2M page, which could be awkward
> - and wouldn't have any realistic way to continue if it fails to do
> so. Perhaps some kind of special pool of pages could be provided to
> the domain to help it satisfy its memory needs in recovering from a
> restore-shattered large page.
In general, I think the guest should assume that large page mappings are
merely an optimization that (a) might not be possible on domain start
due to machine memory fragmentation and (b) that this condition might
also occur on restore. Given these, it must always be prepared to
function with 4K pages, which implies that it would need to preserve
enough page table frame memory to be able to revert from large to small pages.
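A back-of-the-envelope sketch of that reserve, assuming x86-64 style page tables (4K frames holding 512 eight-byte entries), so that one L1 table frame holds the 512 PTEs that replace a single 2MB mapping. The function names are invented for the example, and the pfn->mfn map is modelled as a list.

```python
PAGE_SIZE = 4096                          # bytes per 4K frame
PTE_SIZE = 8                              # bytes per entry (assumed x86-64)
PTES_PER_L1 = PAGE_SIZE // PTE_SIZE       # 512 entries per L1 table
SUPERPAGE_SIZE = PTES_PER_L1 * PAGE_SIZE  # 2MB covered by one L1 table


def reserve_for_demotion(num_large_mappings):
    """Frames (and bytes) to hold back so that every 2MB mapping can be
    replaced by an L1 table of 4K mappings: one frame per mapping."""
    frames = num_large_mappings
    return frames, frames * PAGE_SIZE


def demote(p2m, first_pfn):
    """Return the 512 mfns to write into the new L1 table for the 2MB
    range starting at `first_pfn`, taken from the post-restore p2m."""
    if first_pfn % PTES_PER_L1:
        raise ValueError("first_pfn must be 2MB-aligned")
    return [p2m[first_pfn + i] for i in range(PTES_PER_L1)]
```

Under these assumptions the reserve is small: 256 large mappings (512MB of mapped memory) need 256 frames, which is 1MB.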
Mick
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: Design question for PV superpage support
2009-03-03 3:59 ` Mick Jordan
@ 2009-03-03 14:33 ` Dan Magenheimer
2009-03-03 17:06 ` Mick Jordan
2009-03-03 17:28 ` Jeremy Fitzhardinge
2009-03-03 17:26 ` Jeremy Fitzhardinge
1 sibling, 2 replies; 28+ messages in thread
From: Dan Magenheimer @ 2009-03-03 14:33 UTC (permalink / raw)
To: Mick.Jordan, Jeremy Fitzhardinge; +Cc: Dave McCracken, Xen Developers List
> In general, I think the guest should assume that large page
> mappings are
> merely an optimization that (a) might not be possible on domain start
> due to machine memory fragmentation and (b) that this condition might
> also occur on restore. Given these, it must always be prepared to
> function with 4K pages, which implies that it would need to preserve
> enough page table frame memory to be able to revert from large
> to small pages.
>
> Mick
Do you disagree with my assertion that use of 2MB pages is
almost always an attempt to eke out a performance improvement,
that emulating 2MB pages with fragmented 4KB pages is likely
slower than just using 4KB pages to start with, and thus
that "must always be prepared to function with 4KB pages"
should NOT occur silently (if at all)?
BTW, thinking ahead to ballooning with 2MB pages, are we prepared
to assume that a relinquished 2MB page can't be fragmented?
While this may be appealing for systems where nearly all
guests are using 2MB pages, systems where the 2MB guest is
an odd duck might suffer substantially by making that
assumption.
Dan
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-03 14:33 ` Dan Magenheimer
@ 2009-03-03 17:06 ` Mick Jordan
2009-03-03 17:23 ` Jeremy Fitzhardinge
2009-03-03 18:10 ` Keir Fraser
2009-03-03 17:28 ` Jeremy Fitzhardinge
1 sibling, 2 replies; 28+ messages in thread
From: Mick Jordan @ 2009-03-03 17:06 UTC (permalink / raw)
To: Dan Magenheimer; +Cc: Jeremy Fitzhardinge, Dave McCracken, Xen Developers List
On 03/03/09 06:33, Dan Magenheimer wrote:
>> In general, I think the guest should assume that large page
>> mappings are
>> merely an optimization that (a) might not be possible on domain start
>> due to machine memory fragmentation and (b) that this condition might
>> also occur on restore. Given these, it must always be prepared to
>> function with 4K pages, which implies that it would need to preserve
>> enough page table frame memory to be able to revert from large
>> to small pages.
>>
>> Mick
>>
>
> Do you disagree with my assertion that use of 2MB pages is
> almost always an attempt to eke out a performance improvement,
> that emulating 2MB pages with fragmented 4KB pages is likely
> slower than just using 4KB pages to start with, and thus
> that "must always be prepared to function with 4KB pages"
> should NOT occur silently (if at all)?
>
I agree with the first statement. I'm not sure what you mean by "emulate
2MB pages with fragmented 4K pages" unless you assume nested page table
support or you just mean falling back to 4K pages. As for whether a
change should be silent, I'm less clear on that. I certainly wouldn't
consider it a fatal condition requiring domain termination. That
position is consistent with the "optimization not correctness" view of
using large tables. However, a guest might want to indicate in some way
that it has downgraded.
> BTW, thinking ahead to ballooning with 2MB pages, are we prepared
> to assume that a relinquished 2MB page can't be fragmented?
> While this may be appealing for systems where nearly all
> guests are using 2MB pages, systems where the 2MB guest is
> an odd duck might suffer substantially by making that
> assumption.
>
Agreed. All of this really only becomes an issue when memory is
overcommitted. Unfortunately, that is precisely when 2MB machine
contiguous pages are likely to be difficult to find.
Mick
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-03 17:06 ` Mick Jordan
@ 2009-03-03 17:23 ` Jeremy Fitzhardinge
2009-03-03 18:10 ` Keir Fraser
1 sibling, 0 replies; 28+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-03 17:23 UTC (permalink / raw)
To: Mick.Jordan; +Cc: Dan Magenheimer, Dave McCracken, Xen Developers List
Mick Jordan wrote:
> On 03/03/09 06:33, Dan Magenheimer wrote:
>>> In general, I think the guest should assume that large page
>>> mappings are
>>> merely an optimization that (a) might not be possible on domain start
>>> due to machine memory fragmentation and (b) that this condition might
>>> also occur on restore. Given these, it must always be prepared to
>>> function with 4K pages, which implies that it would need to preserve
>>> enough page table frame memory to be able to revert from large
>>> to small pages.
>>>
>>> Mick
>>>
>>
>> Do you disagree with my assertion that use of 2MB pages is
>> almost always an attempt to eke out a performance improvement,
>> that emulating 2MB pages with fragmented 4KB pages is likely
>> slower than just using 4KB pages to start with, and thus
>> that "must always be prepared to function with 4KB pages"
>> should NOT occur silently (if at all)?
>>
> I agree with the first statement. I'm not sure what you mean by
> "emulate 2MB pages with fragmented 4K pages" unless you assume nested
> page table support or you just mean falling back to 4K pages. As for
> whether a change should be silent, I'm less clear on that. I certainly
> wouldn't consider it a fatal condition requiring domain termination.
> That position is consistent with the "optimization not correctness"
> view of using large tables. However, a guest might want to indicate in
> some way that it has downgraded.
The tradeoff is between the performance gain one might get from using
large pages vs the intrusiveness of changes to a PV kernel. Given that
when paravirtualizing this we're going to be making small changes to the
kernel's existing large page support, rather than adding it anew or adding a
separate large-page mechanism, we need to make sure that as many of the
guest's existing assumptions as possible can be satisfied.
The requirement that a guest be able to come up with enough L1 pagetable
pages to be able to map all the shattered 2M mappings at any time
definitely doesn't fall into that category. You'd need to:
1. Have an interface for Xen to tell the guest which pages need to be
remapped. Presumably this would be in terms of once-contiguous
pfn ranges which are now backed with discontiguous mfns.
2. Get the guest to remap those pfns to the new mfns, which will
require walking every pagetable of every process searching for
those pfns, allocating memory for the new pagetable level.
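
To put a rough number on the pagetable memory involved, here is a minimal sketch. It assumes x86-64 style pagetables (4K pages and 8-byte entries, so 512 entries per L1 page). The function names are hypothetical and are not Xen or Linux interfaces, and each 2M mapping is counted only once; a range mapped in several address spaces would need one L1 page per mapping.

```python
PAGE_SIZE = 4096                     # 4K base page
SUPERPAGE_SIZE = 2 * 1024 * 1024     # 2M superpage
PTE_SIZE = 8                         # assumed bytes per pagetable entry
ENTRIES_PER_L1 = PAGE_SIZE // PTE_SIZE


def shatter(base_pfn, new_mfns):
    """Build the L1 entries (pfn -> mfn) replacing one former 2M mapping.

    base_pfn must be 2M-aligned and new_mfns must supply one mfn per 4K
    page; the mfns are allowed to be discontiguous.
    """
    if base_pfn % ENTRIES_PER_L1 != 0:
        raise ValueError("base pfn is not 2M-aligned")
    if len(new_mfns) != ENTRIES_PER_L1:
        raise ValueError("need exactly one mfn per 4K page")
    return {base_pfn + i: mfn for i, mfn in enumerate(new_mfns)}


def l1_reserve_bytes(guest_bytes):
    """Memory for L1 pages if every 2M mapping of guest_bytes is shattered.

    One full L1 pagetable page is needed per shattered 2M mapping.
    """
    mappings = guest_bytes // SUPERPAGE_SIZE
    return mappings * PAGE_SIZE
```

Under these assumptions the reserve is 1/512 of the 2M-mapped memory per mapping, so the raw memory cost is small; the intrusive part is the pagetable walk and remap described in step 2.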
However the main use of 2M mappings in Linux is to map the kernel text
and data. That's clearly not going to be possible if we need to run
kernel code to put things together after a restore. Hm, given that, I
guess we could just kludge it into hugetlbfs, but it really does make it
a very narrow set of users.
>> BTW, thinking ahead to ballooning with 2MB pages, are we prepared
>> to assume that a relinquished 2MB page can't be fragmented?
>> While this may be appealing for systems where nearly all
>> guests are using 2MB pages, systems where the 2MB guest is
>> an odd duck might suffer substantially by making that
>> assumption.
>>
> Agreed. All of this really only becomes an issue when memory is
> overcommitted. Unfortunately, that is precisely when 2MB machine
> contiguous pages are likely to be difficult to find.
If 2M pages are becoming more important, then we should change Xen to do
all domain allocations in 2M units, while reserving separate superpages
specifically for fragmenting into 4k allocations. It's certainly
sensible to always round a domain's initial size up to 2M (most will
already be a 2M multiple, I suspect). Balloon is the obvious exception,
but I would argue that ballooning in less than 2M units is a lot of
fiddly makework. The difference between giving a domain 128MB vs
126MB is already pretty trivial; a 4k change in domain size
is laughably small.
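
A minimal sketch of what 2M-granular sizing and ballooning could look like. The helper names are hypothetical, not an existing Xen or balloon-driver interface, and the current domain size is assumed to be a 2M multiple already.

```python
SUPERPAGE = 2 * 1024 * 1024  # 2M allocation unit
MiB = 1024 * 1024


def round_up_2m(nbytes):
    """Round a size in bytes up to the next 2M multiple."""
    return (nbytes + SUPERPAGE - 1) // SUPERPAGE * SUPERPAGE


def balloon_delta(current_bytes, requested_bytes):
    """Snap a balloon request to whole superpages.

    Returns (new_size_bytes, superpages_to_add); a negative count means
    that many whole superpages are handed back. current_bytes is assumed
    to be 2M-aligned.
    """
    new_size = round_up_2m(requested_bytes)
    return new_size, (new_size - current_bytes) // SUPERPAGE
```

With these helpers a request for 127MB is treated as 128MB, and shrinking a 128MB domain to 126MB releases exactly one superpage.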
(Now Keir brings up all the difficulties...)
J
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-03 3:59 ` Mick Jordan
2009-03-03 14:33 ` Dan Magenheimer
@ 2009-03-03 17:26 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 28+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-03 17:26 UTC (permalink / raw)
To: Mick.Jordan; +Cc: Dave McCracken, Xen Developers List
Mick Jordan wrote:
> Since restore already requires quite a lot of reset, e.g., grant table
> mappings, on the part of the guest, it seems that checking the
> validity of any large page mappings should be possible at the same
> time. Obviously you could get in a big mess if you mapped the code
> that is going to do the fixup on a large page, but that is unlikely
> and easily avoidable.
That's actually the most likely case in Linux. Not being able to use 2M
mappings for kernel code+data removes about 95% of the utility.
> In general, I think the guest should assume that large page mappings
> are merely an optimization that (a) might not be possible on domain
> start due to machine memory fragmentation and (b) that this condition
> might also occur on restore. Given these, it must always be prepared
> to function with 4K pages, which implies that it would need to
> preserve enough page table frame memory to be able to revert from large
> to small pages.
I think that's too intrusive. I'd want to see some very convincing
measurements to justify doing these kinds of changes to pvops Linux, for
example.
J
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-03 14:33 ` Dan Magenheimer
2009-03-03 17:06 ` Mick Jordan
@ 2009-03-03 17:28 ` Jeremy Fitzhardinge
2009-03-03 18:09 ` Dan Magenheimer
1 sibling, 1 reply; 28+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-03 17:28 UTC (permalink / raw)
To: Dan Magenheimer; +Cc: Dave McCracken, Mick.Jordan, Xen Developers List
Dan Magenheimer wrote:
> BTW, thinking ahead to ballooning with 2MB pages, are we prepared
> to assume that a relinquished 2MB page can't be fragmented?
> While this may be appealing for systems where nearly all
> guests are using 2MB pages, systems where the 2MB guest is
> an odd duck might suffer substantially by making that
> assumption.
>
Well, I still think that 4k pages are a ludicrously tiny unit of memory
for Xen to be dealing with, and it shouldn't bother to get out of bed
for less than 2M. If we treat 4k pages as the special case then keeping
the Xen heap unfragmented at the 2M level should be fairly easy, no?
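
A toy sketch of that idea, assuming a heap that hands out whole 2M frames by default and carves 4k pages only from superpages it has deliberately broken up. The class and method names are hypothetical; this is not how the Xen heap allocator is actually written.

```python
PAGES_PER_SUPERPAGE = 512  # 4k pages in one 2M superpage


class Heap2M:
    """Heap that treats 2M as the normal unit and 4k as the special case."""

    def __init__(self, num_superpages):
        # Indices of completely free 2M frames.
        self.free_superpages = list(range(num_superpages))
        # Broken-up superpages: superpage index -> set of free 4k slots.
        self.small_pool = {}

    def alloc_2m(self):
        if not self.free_superpages:
            raise MemoryError("no free 2M frame")
        return self.free_superpages.pop()

    def free_2m(self, sp):
        self.free_superpages.append(sp)

    def alloc_4k(self):
        # Reuse an already-fragmented superpage before breaking a new one.
        for sp, slots in self.small_pool.items():
            if slots:
                return sp, slots.pop()
        sp = self.alloc_2m()
        slots = set(range(PAGES_PER_SUPERPAGE))
        self.small_pool[sp] = slots
        return sp, slots.pop()

    def free_4k(self, sp, slot):
        slots = self.small_pool[sp]
        slots.add(slot)
        # Once every 4k slot is back, the superpage is whole again.
        if len(slots) == PAGES_PER_SUPERPAGE:
            del self.small_pool[sp]
            self.free_superpages.append(sp)
```

The sketch ignores the hard part, which is long-lived 4k allocations that pin an otherwise empty superpage in the small pool.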
J
^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: Design question for PV superpage support
2009-03-03 17:28 ` Jeremy Fitzhardinge
@ 2009-03-03 18:09 ` Dan Magenheimer
0 siblings, 0 replies; 28+ messages in thread
From: Dan Magenheimer @ 2009-03-03 18:09 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Dave McCracken, Mick.Jordan, Xen Developers List
> Dan Magenheimer wrote:
> > BTW, thinking ahead to ballooning with 2MB pages, are we prepared
> > to assume that a relinquished 2MB page can't be fragmented?
> > While this may be appealing for systems where nearly all
> > guests are using 2MB pages, systems where the 2MB guest is
> > an odd duck might suffer substantially by making that
> > assumption.
>
> Well, I still think that 4k pages are a ludicrously tiny unit
> of memory
> for Xen to be dealing with, and it shouldn't bother to get out of bed
> for less than 2M. If we treat 4k pages as the special case
> then keeping
> the Xen heap unfragmented at the 2M level should be fairly easy, no?
Probably true, though I suspect this is harder than it sounds.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Design question for PV superpage support
2009-03-03 17:06 ` Mick Jordan
2009-03-03 17:23 ` Jeremy Fitzhardinge
@ 2009-03-03 18:10 ` Keir Fraser
1 sibling, 0 replies; 28+ messages in thread
From: Keir Fraser @ 2009-03-03 18:10 UTC (permalink / raw)
To: Mick.Jordan@sun.com, Dan Magenheimer
Cc: Jeremy Fitzhardinge, Dave McCracken, Xen Developers List
On 03/03/2009 17:06, "Mick Jordan" <Mick.Jordan@sun.com> wrote:
> I agree with the first statement. I'm not sure what you mean by "emulate 2MB
> pages with fragmented 4K pages" unless you assume nested page table support or
> you just mean falling back to 4K pages. As for whether a change should be
> silent, I'm less clear on that. I certainly wouldn't consider it a fatal
> condition requiring domain termination. That position is consistent with the
> "optimization not correctness" view of using large tables. However, a guest
> might want to indicate in some way that it has downgraded.
Yeah, I somehow forgot about this actually. Of course it is hard to
downgrade to non-2MB pages across save/restore, because the guest-owned
pagetables have the superpage mappings baked into them. Oh well, that makes
such graceful downgrade much less attractive to implement, so I withdraw the
suggestion!
-- Keir
^ permalink raw reply [flat|nested] 28+ messages in thread
Thread overview: 28+ messages
2009-03-02 13:54 Design question for PV superpage support Dave McCracken
2009-03-02 13:58 ` Keir Fraser
2009-03-02 16:43 ` Mick Jordan
2009-03-02 17:06 ` Keir Fraser
2009-03-02 18:02 ` Mick Jordan
2009-03-02 17:29 ` Dave McCracken
2009-03-02 17:52 ` Keir Fraser
2009-03-02 18:03 ` Dan Magenheimer
2009-03-02 18:30 ` Keir Fraser
2009-03-02 18:46 ` Mick Jordan
2009-03-02 18:48 ` Dan Magenheimer
2009-03-02 19:04 ` Keir Fraser
2009-03-02 17:45 ` Mick Jordan
2009-03-02 17:54 ` Keir Fraser
2009-03-02 18:00 ` Dave McCracken
2009-03-02 18:14 ` Mick Jordan
2009-03-02 19:14 ` Dave McCracken
2009-03-03 1:37 ` Jeremy Fitzhardinge
2009-03-03 3:59 ` Mick Jordan
2009-03-03 14:33 ` Dan Magenheimer
2009-03-03 17:06 ` Mick Jordan
2009-03-03 17:23 ` Jeremy Fitzhardinge
2009-03-03 18:10 ` Keir Fraser
2009-03-03 17:28 ` Jeremy Fitzhardinge
2009-03-03 18:09 ` Dan Magenheimer
2009-03-03 17:26 ` Jeremy Fitzhardinge
2009-03-03 1:32 ` Jeremy Fitzhardinge
2009-03-03 1:15 ` Jeremy Fitzhardinge