From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Mick.Jordan@sun.com
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>,
Dave McCracken <dcm@mccr.org>,
Xen Developers List <xen-devel@lists.xensource.com>
Subject: Re: Design question for PV superpage support
Date: Tue, 03 Mar 2009 09:23:36 -0800
Message-ID: <49AD6798.8040303@goop.org>
In-Reply-To: <49AD6397.9030803@Sun.COM>
Mick Jordan wrote:
> On 03/03/09 06:33, Dan Magenheimer wrote:
>>> In general, I think the guest should assume that large page
>>> mappings are
>>> merely an optimization that (a) might not be possible on domain start
>>> due to machine memory fragmentation and (b) that this condition might
>>> also occur on restore. Given these, it must always be prepared to
>>> function with 4K pages, which implies that it would need to preserve
>>> enough page table frame memory to be able revert from large
>>> to small pages.
>>>
>>> Mick
>>>
>>
>> Do you disagree with my assertion that use of 2MB pages is
>> almost always an attempt to eke out a performance improvement,
>> that emulating 2MB pages with fragmented 4KB pages is likely
>> slower than just using 4KB pages to start with, and thus
>> that "must always be prepared to function with 4KB pages"
>> should NOT occur silently (if at all)?
>>
> I agree with the first statement. I'm not sure what you mean by
> "emulate 2MB pages with fragmented 4K pages" unless you assume nested
> page table support or you just mean falling back to 4K pages. As for
> whether a change should be silent, I'm less clear on that. I certainly
> wouldn't consider it a fatal condition requiring domain termination.
> That position is consistent with the "optimization not correctness"
> view of using large tables. However, a guest might want to indicate in
> some way that it has downgraded.
The tradeoff is between the performance gain one might get from using
large pages and the intrusiveness of the changes to a PV kernel. Given that
paravirtualizing this means making small changes to the kernel's
existing large page support, rather than adding it anew or adding a
separate large-page mechanism, we need to make sure that as many of the
guest's existing assumptions as possible can be satisfied.
The requirement that a guest be able to come up with enough L1 pagetable
pages to map all the shattered 2M mappings at any time
definitely doesn't fall into that category. You'd need to:
   1. Have an interface for Xen to tell the guest which pages need to be
      remapped. Presumably this would be in terms of once-contiguous
      pfn ranges which are now backed with discontiguous mfns.
   2. Get the guest to remap those pfns to the new mfns, which will
      require walking every pagetable of every process searching for
      those pfns, and allocating memory for the new pagetable level.
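
As a rough sketch of the memory side of the two steps above: assuming
x86-64 paging (512 eight-byte entries per 4K pagetable page), every
shattered 2M mapping needs one fresh L1 page in each address space that
maps it. The struct and function names below are hypothetical
illustrations, not an existing Xen or Linux interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096ULL
#define PTES_PER_L1  512ULL   /* x86-64 assumption: 4096 / 8-byte entries */

/* Hypothetical descriptor: a once-contiguous pfn range whose backing
 * mfns are no longer contiguous after restore. */
struct shattered_range {
    uint64_t first_pfn;  /* first 4K pfn of the range, 2M-aligned */
    uint64_t nr_2m;      /* number of 2M mappings covered */
};

/* L1 pagetable pages needed to remap every listed 2M mapping with 4K
 * entries, for ONE address space mapping all of the ranges. */
static uint64_t l1_pages_needed(const struct shattered_range *r, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* A 2M mapping starts on a 512-pfn boundary. */
        assert(r[i].first_pfn % PTES_PER_L1 == 0);
        total += r[i].nr_2m;  /* one L1 page replaces one 2M entry */
    }
    return total;
}

/* Bytes the guest would have to keep in reserve for those L1 pages. */
static uint64_t l1_reserve_bytes(const struct shattered_range *r, size_t n)
{
    return l1_pages_needed(r, n) * PAGE_SIZE_4K;
}
```

The reserve itself is small (4K per 2M mapped, per address space); the
cost that matters is the pagetable walk across every process to find
the affected mappings.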
However, the main use of 2M mappings in Linux is to map the kernel text
and data. That's clearly not going to be possible if we need to run
kernel code to put things together after a restore. Hm, given that, I
guess we could just kludge it into hugetlbfs, but that really does
restrict it to a very narrow set of users.
>> BTW, thinking ahead to ballooning with 2MB pages, are we prepared
>> to assume that a relinquished 2MB page can't be fragmented?
>> While this may be appealing for systems where nearly all
>> guests are using 2MB pages, systems where the 2MB guest is
>> an odd duck might suffer substantially by making that
>> assumption.
>>
> Agreed. All of this really only becomes an issue when memory is
> overcommitted. Unfortunately, that is precisely when 2MB machine
> contiguous pages are likely to be difficult to find.
If 2M pages are becoming more important, then we should change Xen to do
all domain allocations in 2M units, while reserving separate superpages
specifically for fragmenting into 4k allocations. It's certainly
sensible to always round a domain's initial size up to 2M (most will
already be a 2M multiple, I suspect). Ballooning is the obvious exception,
but I would argue that ballooning in less than 2M units is a lot of
fiddly makework. The difference between giving a domain 128MB vs
126MB is already pretty trivial; a 4k change in domain size
is laughably small.
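
To make the rounding concrete, here is a minimal sketch of sizing a
domain in 2M units. The helper names are hypothetical and this is not
existing Xen allocator code.

```c
#include <assert.h>
#include <stdint.h>

#define SUPERPAGE_SHIFT 21
#define SUPERPAGE_SIZE  (1ULL << SUPERPAGE_SHIFT)   /* 2M */

/* Round a requested size in bytes up to the next 2M multiple. */
static uint64_t round_up_2m(uint64_t bytes)
{
    /* Guard against wrap-around in the addition below. */
    assert(bytes <= UINT64_MAX - (SUPERPAGE_SIZE - 1));
    return (bytes + SUPERPAGE_SIZE - 1) & ~(SUPERPAGE_SIZE - 1);
}

/* Number of 2M superpages needed to back a domain of the given size. */
static uint64_t superpages_for(uint64_t bytes)
{
    return round_up_2m(bytes) >> SUPERPAGE_SHIFT;
}
```

With this rounding, a 126MB request is already a 2M multiple and stays
at 63 superpages, while 126MB plus a single 4k page rounds up to 127
MB... that is, to the next 2M boundary at 128MB (64 superpages).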
(Now Keir brings up all the difficulties...)
J