From: Yu Zhang <yu.c.zhang@linux.intel.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
Juergen Gross <jgross@suse.com>,
xen-devel <xen-devel@lists.xenproject.org>
Cc: "Zhang, Yu C" <yu.c.zhang@intel.com>, Jan Beulich <JBeulich@suse.com>
Subject: Re: [XenSummit 2017] Notes from the 5-level-paging session
Date: Thu, 20 Jul 2017 18:36:48 +0800
Message-ID: <6816708e-ac32-0ef3-3fd7-3af8f4ddc876@linux.intel.com>
In-Reply-To: <e887be4f-a351-99c9-1c75-3ee90410a303@citrix.com>
On 7/20/2017 6:42 PM, Andrew Cooper wrote:
> On 20/07/17 11:10, Yu Zhang wrote:
>>
>>
>> On 7/17/2017 6:53 PM, Juergen Gross wrote:
>>> Hey,
>>>
>>> I took a few notes at the 5-level-paging session at the summit.
>>> I hope nothing major is missing...
>>>
>>> Participants (at least naming the active ones): Andrew Cooper,
>>> Jan Beulich, Yu Zhang and myself (the list is just from my memory).
>>>
>>> The following topics have been discussed in the session:
>>>
>>>
>>> 1. Do we need support for 5-level-paging PV guests?
>>>
>>> There is no urgent need for 5-level-paging PV guests for the
>>> following reasons:
>>>
>>> - Guests with more than 64 TB of memory (the upper limit for
>>> 4-level-paging Linux) can be PVH or HVM.
>>>
>>> - A 5-level-paging host supports up to 4 PB of physical memory. A
>>> 4-level-paging PV Dom0 can support that, at least theoretically: the
>>> M2P map for 4 PB of memory needs 8 TB of space, which just fits into
>>> the hypervisor-reserved memory area in the Linux kernel. Any other
>>> hypervisor data and/or code can live in the additional virtual address
>>> space that 5-level-paging mode provides. (A quick sanity check of the
>>> sizing follows below.)
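>>>
>>> A minimal sanity check of that arithmetic, assuming 4 KiB pages and
>>> 8-byte M2P entries (the macro names are made up for illustration):
>>>
>>>     /* 4 PB = 2^52 bytes; at 2^12 bytes per page that is 2^40 frames.
>>>      * One 8-byte M2P entry per frame => 2^43 bytes = 8 TB of M2P. */
>>>     #define PHYS_BYTES     (1ULL << 52)            /* 4 PB  */
>>>     #define PAGE_SIZE_4K   (1ULL << 12)            /* 4 KiB */
>>>     #define M2P_ENTRY_SIZE sizeof(unsigned long)   /* 8 bytes on x86-64 */
>>>     #define M2P_BYTES      (PHYS_BYTES / PAGE_SIZE_4K * M2P_ENTRY_SIZE)
>>>     /* M2P_BYTES == 1ULL << 43 == 8 TB */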
>>>
>>> There was agreement that we don't need support for 5-level-paging PV
>>> guests right now. There is a need, however, to support 4-level-paging
>>> PV guests located anywhere in the 52-bit physical space of a
>>> 5-level-paging host (right now they would have to be in the bottom
>>> 64 TB, as the Linux kernel masks away any MFN bits above 64 TB). I
>>> will send patches to support this.
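>>>
>>> For reference, the frame-number arithmetic behind that limit (the
>>> constants below are illustrative, not actual kernel symbols):
>>>
>>>     /* 64 TB = 2^46 bytes => 2^34 frames: MFNs masked to 34 bits.
>>>      * 4 PB  = 2^52 bytes => 2^40 frames: the mask must widen to 40. */
>>>     #define MFN_BITS_64TB   34
>>>     #define MFN_BITS_4PB    40
>>>     #define MFN_MASK(bits)  ((1ULL << (bits)) - 1)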
>>>
>>>
>>> 2. Do we need 5-level-paging shadow mode support?
>>>
>>> While shadow mode is strictly required only for PV guests, and no
>>> 5-level-paging PV guests are to be supported, we will still need
>>> 5-level-paging shadow mode in the long run. This is necessary because
>>> even for a 4-level-paging PV guest (or a 32-bit PV guest) the processor
>>> will run in 5-level-paging mode on a huge host, as switching between
>>> the paging modes is rather complicated and should be avoided. It is
>>> much easier to run shadow mode for the whole page table tree instead of
>>> for two subtrees only.
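>>>
>>> As a rough illustration of what "shadowing the whole tree" could look
>>> like (every name below is hypothetical, not existing Xen code):
>>>
>>>     /* Keep the CPU in 5-level mode: parent the guest's shadowed
>>>      * 4-level root under a single Xen-owned level-5 table, rather
>>>      * than switching paging modes when entering the guest. */
>>>     l5_table[0] = make_l5_entry(shadow_l4_root_mfn, PRESENT | RW);
>>>     switch_cr3(virt_to_maddr(l5_table));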
>>>
>>> OTOH the first step when implementing 5-level-paging in the hypervisor
>>> doesn't require shadow mode to be working, so it can be omitted in the
>>> beginning.
>>>
>>>
>>> 3. Is it possible to support 5-level-paging in Xen via a specific
>>> binary for the first step?
>>>
>>> Yu Zhang asked for implementing 5-level-paging via a Kconfig option
>>> instead of dynamic switching at boot time for the first prototype.
>>> This request was accepted in order to reduce the complexity of the
>>> initial patches. Boot time switching should be available for the
>>> final solution, though.
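>>>
>>> In other words, the prototype could select the mode at build time with
>>> something like this (CONFIG_5LEVEL_PAGING is a made-up symbol):
>>>
>>>     /* Hypothetical build-time switch for the prototype: */
>>>     #ifdef CONFIG_5LEVEL_PAGING
>>>     # define PAGING_LEVELS 5
>>>     #else
>>>     # define PAGING_LEVELS 4
>>>     #endif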
>>>
>>>
>>> I hope I didn't miss anything.
>>
>> Thanks a lot for your help and for the summary, Juergen. And I really
>> need to say thank you to quite a lot of people who joined this
>> discussion. It was quite enlightening. :)
>>
>> One thing I can recall is about wr{fs,gs}base for PV guests. IIRC, our
>> agreement was to turn off FSGSBASE in CR4 for PV guests and to emulate
>> rd{fs,gs}base and wr{fs,gs}base in the #UD handler.
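>>
>> Roughly along these lines, I imagine (just a sketch; every helper name
>> below is made up):
>>
>>     /* With CR4.FSGSBASE clear, RD/WR{FS,GS}BASE raise #UD, so the
>>      * #UD handler can decode and emulate them for the PV guest. */
>>     uint64_t val;
>>     switch ( decode_fsgsbase_insn(regs, &val) )
>>     {
>>     case INSN_WRFSBASE: write_fs_base(val);                break;
>>     case INSN_WRGSBASE: write_gs_base(val);                break;
>>     case INSN_RDFSBASE: set_dst_reg(regs, read_fs_base()); break;
>>     case INSN_RDGSBASE: set_dst_reg(regs, read_gs_base()); break;
>>     default:            forward_ud_to_guest(regs);         break;
>>     }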
>>
>> But please correct me if I misunderstood. :)
>
> Yes, that matches my understanding.
>
> A second piece of emulation which needs to happen is to modify the #PF
> handler to notice if a PV guest takes a fault with %cr2 being va57
> canonical but not va48 canonical. In this case, we need to decode the
> instruction as far as working out the segment of the memory operand,
> and inject #GP[0]/#SS[0] as appropriate.
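>
> Something like the following, perhaps (a sketch only; the two
> canonical-check helpers and the operand decoder are hypothetical):
>
>     /* A VA that is canonical for 57 bits but not for 48 bits would
>      * have faulted with #GP[0]/#SS[0] on real 4-level hardware, so
>      * turn the 5-level #PF back into what the PV guest expects. */
>     unsigned long addr = read_cr2();
>     if ( is_canonical_va57(addr) && !is_canonical_va48(addr) )
>     {
>         enum x86_segment seg = decode_memop_segment(regs);
>         pv_inject_hw_exception(seg == x86_seg_ss ? TRAP_stack_error
>                                                  : TRAP_gp_fault, 0);
>     }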
Thanks, Andrew. So working out the segment is only to decide whether #GP
or #SS is to be injected, right?

And I'm wondering: even when the PV guest and the hypervisor are both
running in 4-level-paging mode, it is possible for a #PF to occur on a
va48-canonical address with no #GP/#SS injected, so handling it is left
to the PV guest kernel, I guess?

And if the answer is yes, to whom shall we inject the fault in the
5-level case? The PV guest kernel shall not handle this fault, right?
B.R.
Yu
> ~Andrew
>