Subject: [XenSummit 2017] Notes from the 5-level-paging session
From: Juergen Gross @ 2017-07-17 10:53 UTC
To: xen-devel; +Cc: Andrew Cooper, Zhang, Yu C, Jan Beulich
Hey,
I took a few notes at the 5-level-paging session at the summit.
I hope nothing major is missing...
Participants (at least naming the active ones): Andrew Cooper,
Jan Beulich, Yu Zhang and myself (the list is just from my memory).
The following topics were discussed in the session:
1. Do we need support for 5-level-paging PV guests?
There is no urgent need for 5-level-paging PV guests for the
following reasons:
- Guests larger than 64 TB (the upper limit for 4-level-paging Linux) can
be PVH or HVM.
- A 5-level-paging host supports up to 4 PB of physical memory. A
4-level-paging PV-Dom0 can support that theoretically: the M2P map for
4 PB of memory needs 8 TB of space, which just fits into the
hypervisor-reserved memory area in the Linux kernel (see the arithmetic
sketched below). Any other hypervisor data and/or code can live in the
additional virtual space available in 5-level-paging mode.
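
Spelled out, the sizing argument is just the following arithmetic (nothing
below is taken from the Xen or Linux sources, it is only the numbers made
explicit):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t phys_bytes = 1ULL << 52;       /* 4 PB: 52-bit physical address space */
        uint64_t frames     = phys_bytes >> 12; /* 4 KiB machine frames -> 2^40 frames */
        uint64_t m2p_bytes  = frames * 8;       /* one 8-byte M2P entry per frame      */

        /* Prints "M2P for 4 PB needs 8 TiB". */
        printf("M2P for 4 PB needs %llu TiB\n",
               (unsigned long long)(m2p_bytes >> 40));
        return 0;
    }
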
There was agreement that we don't need support for 5-level-paging PV
guests right now. There is a need, however, to support 4-level-paging PV
guests located anywhere in the 52-bit physical address space of a
5-level-paging host (right now they would have to be in the bottom 64 TB,
as the Linux kernel masks away any MFN bits above 64 TB). I will send
patches to support this.
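
The 64 TB boundary corresponds directly to the width of the machine frame
number; a rough illustration of that relationship (the mask below is made
up for the example and is not the actual Linux definition):

    #include <stdbool.h>
    #include <stdint.h>

    /* 64 TB = 2^46 bytes = 2^34 frames of 4 KiB, so such MFNs fit in 34 bits,
     * while a 4 PB host has machine frame numbers of up to 40 bits. */
    #define MFN_MASK_64TB ((1ULL << 34) - 1)

    /* An MFN pointing above 64 TB does not survive a 34-bit mask. */
    static bool mfn_survives_mask(uint64_t mfn)
    {
        return (mfn & MFN_MASK_64TB) == mfn;
    }
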
2. Do we need 5-level-paging shadow mode support?
While shadow mode is strictly required only for PV guests, and no
5-level-paging PV guests are to be supported, we will still need
5-level-paging shadow mode in the long run. This is necessary because
even for a 4-level-paging PV guest (or a 32-bit PV guest) the processor
will run in 5-level-paging mode on a huge host, as switching between the
paging modes is rather complicated and should be avoided. It is much
easier to run shadow mode for the whole page table tree instead of for
two subtrees only.
OTOH the first step when implementing 5-level-paging in the hypervisor
doesn't require shadow mode to be working, so it can be omitted in the
beginning.
3. Is it possible to support 5-level-paging in Xen via a specific
binary for the first step?
Yu Zhang asked to implement 5-level-paging via a Kconfig option instead
of dynamic switching at boot time for the first prototype. This request
was accepted in order to reduce the complexity of the initial patches.
Boot-time switching should be available for the final solution, though.
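
In practice that means the prototype would fix the number of paging
levels at build time, roughly along the lines of the sketch below (the
option and macro names are made up for illustration and are not existing
Xen symbols):

    /* Hypothetical compile-time selection of the paging depth. */
    #ifdef CONFIG_PROTO_LA57            /* would be a Kconfig option */
    #define XEN_PAGING_LEVELS 5
    #define XEN_VADDR_BITS    57
    #else
    #define XEN_PAGING_LEVELS 4
    #define XEN_VADDR_BITS    48
    #endif
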
I hope I didn't miss anything.
Juergen
Subject: Re: [XenSummit 2017] Notes from the 5-level-paging session
From: Yu Zhang @ 2017-07-20 10:10 UTC
To: Juergen Gross, xen-devel; +Cc: Andrew Cooper, Jan Beulich, Zhang, Yu C
On 7/17/2017 6:53 PM, Juergen Gross wrote:
> [...]
>
> I hope I didn't miss anything.
Thanks a lot for your help and for the summary, Juergen.
And I really need to say thank you to quite a lot of people who joined
this discussion. It's quite enlightening. :)

One thing I can recall is about wr{fs,gs}base for PV guests. IIRC, our
agreement is to turn off FSGSBASE in CR4 for PV guests and try to
emulate rd{fs,gs}base and wr{fs,gs}base in the #UD handler.

But please correct me if I misunderstood. :)
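
Just to make sure we mean the same thing, here is a very rough sketch of
the #UD path I have in mind (all helper names are made up, and the decode
is simplified, e.g. REX prefix handling is omitted):

    #include <stdbool.h>
    #include <stdint.h>

    #define MSR_FS_BASE 0xc0000100
    #define MSR_GS_BASE 0xc0000101

    /* Illustrative hypervisor helpers, not real Xen functions. */
    uint64_t guest_reg_read(unsigned int reg);
    void     guest_reg_write(unsigned int reg, uint64_t val);
    uint64_t vcpu_msr_read(uint32_t msr);
    void     vcpu_msr_write(uint32_t msr, uint64_t val);

    /*
     * RD/WR{FS,GS}BASE share the encoding F3 0F AE /r (mod == 3), with the
     * ModRM reg field selecting /0 RDFSBASE, /1 RDGSBASE, /2 WRFSBASE,
     * /3 WRGSBASE.  Called from the #UD handler once that pattern matched.
     */
    static bool emulate_fsgsbase(uint8_t modrm)
    {
        unsigned int op  = (modrm >> 3) & 7; /* which of the four operations */
        unsigned int reg = modrm & 7;        /* the GPR operand              */

        switch ( op )
        {
        case 0: guest_reg_write(reg, vcpu_msr_read(MSR_FS_BASE)); break;
        case 1: guest_reg_write(reg, vcpu_msr_read(MSR_GS_BASE)); break;
        case 2: vcpu_msr_write(MSR_FS_BASE, guest_reg_read(reg)); break;
        case 3: vcpu_msr_write(MSR_GS_BASE, guest_reg_read(reg)); break;
        default: return false;               /* leave the #UD to the guest   */
        }
        return true;
    }
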
B.R.
Yu
Subject: Re: [XenSummit 2017] Notes from the 5-level-paging session
From: Yu Zhang @ 2017-07-20 10:36 UTC
To: Andrew Cooper, Juergen Gross, xen-devel; +Cc: Zhang, Yu C, Jan Beulich
On 7/20/2017 6:42 PM, Andrew Cooper wrote:
> [...]
>
> Yes, that matches my understanding.
>
> A second piece of emulation which needs to happen is to modify the #PF
> handler to notice if a PV guest takes a fault with %cr2 being va57
> canonical but not va48 canonical. In this case, we need to decode the
> instruction as far as working out the segment of the memory operand,
> and inject #GP[0]/#SS[0] as appropriate.
Thanks, Andrew. So working out the segment is only to decide whether #GP
or #SS is to be injected, right?

And I'm wondering: even when the PV guest and the hypervisor are both
running in 4-level paging mode, it could be that a #PF has a
va48 canonical address but no #GP/#SS is injected. So it is left to the
PV guest kernel, I guess?

And if the answer is yes, in the 5-level case, to whom shall we inject
the fault? The PV guest kernel shall not handle this fault, right?
B.R.
Yu
Subject: Re: [XenSummit 2017] Notes from the 5-level-paging session
From: Andrew Cooper @ 2017-07-20 10:42 UTC
To: Yu Zhang, Juergen Gross, xen-devel; +Cc: Jan Beulich, Zhang, Yu C
On 20/07/17 11:10, Yu Zhang wrote:
> [...]
>
> One thing I can recall is about wr{fs,gs}base for PV guests. IIRC, our
> agreement is to turn off FSGSBASE in CR4 for PV guests and try to
> emulate rd{fs,gs}base and wr{fs,gs}base in the #UD handler.
>
> But please correct me if I misunderstood. :)
Yes, that matches my understanding.
A second piece of emulation which needs to happen is to modify the #PF
handler to notice if a PV guest takes a fault with %cr2 being va57
canonical but not va48 canonical. In this case, we need to decode the
instruction as far as working out the segment of the memory operand, and
inject #GP[0]/#SS[0] as appropriate.
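
To be concrete, the check I have in mind is along these lines (purely
illustrative, not actual Xen code):

    #include <stdbool.h>
    #include <stdint.h>

    /* An address is canonical for a given VA width if sign-extending it
     * from that width reproduces the original value. */
    static bool is_canonical(uint64_t addr, unsigned int va_bits)
    {
        unsigned int shift = 64 - va_bits;

        return (uint64_t)((int64_t)(addr << shift) >> shift) == addr;
    }

    /* A PV guest #PF whose %cr2 is va57 canonical but not va48 canonical
     * needs converting into #GP[0]/#SS[0] instead of being forwarded. */
    static bool pf_needs_conversion(uint64_t cr2)
    {
        return is_canonical(cr2, 57) && !is_canonical(cr2, 48);
    }
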
~Andrew
Subject: Re: [XenSummit 2017] Notes from the 5-level-paging session
From: Andrew Cooper @ 2017-07-20 11:24 UTC
To: Yu Zhang, Juergen Gross, xen-devel; +Cc: Zhang, Yu C, Jan Beulich
On 20/07/17 11:36, Yu Zhang wrote:
>> [...]
>>
>> A second piece of emulation which needs to happen is to modify the
>> #PF handler to notice if a PV guest takes a fault with %cr2 being
>> va57 canonical but not va48 canonical. In this case, we need to
>> decode the instruction as far as working out the segment of the
>> memory operand, and inject #GP[0]/#SS[0] as appropriate.
>
> Thanks, Andrew. So working out the segment is only to decide whether #GP
> or #SS is to be injected, right?
Correct. Any memory reference with an explicit %ss override, or which
uses %rsp/%rbp as a base register needs to be #SS. Everything else is #GP.
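
That is, once the decode has established which segment the memory operand
uses, the choice reduces to something like this (illustrative only, not
the actual emulator code):

    /* Which exception to raise for a memory reference outside the va48
     * range, based on the operand's effective segment.  Sketch only. */
    enum x86_segment { x86_seg_es, x86_seg_cs, x86_seg_ss, x86_seg_ds,
                       x86_seg_fs, x86_seg_gs };

    static unsigned int fault_for_segment(enum x86_segment seg)
    {
        /* %ss-relative accesses (explicit override, or the default for
         * %rsp/%rbp-based operands) take #SS[0]; everything else #GP[0]. */
        return seg == x86_seg_ss ? 12 /* #SS */ : 13 /* #GP */;
    }
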
>
> And I'm wondering: even when the PV guest and the hypervisor are both
> running in 4-level paging mode, it could be that a #PF has a
> va48 canonical address but no #GP/#SS is injected. So it is left to the
> PV guest kernel, I guess?
Most pagefaults get handled either by Xen, or by Xen deeming that the
pagefault was caused by the guest, and passing the pagefault on to the
guest kernel. Xen doesn't really care at this point; it is the guest
kernel's job to work out what the correct next action is.
>
> And if the answer is yes, in the 5-level case, to whom shall we inject
> the fault? The PV guest kernel shall not handle this fault, right?
The problem we need to fix is new with Xen running in 5-level.
Previously with Xen running in 4-level, any non-va48 canonical address
would yield #GP/#SS and Xen would handle these directly (usually by
passing them off to the guest kernel like we do with #PF).

When Xen is running in 5-level, a 64bit PV guest still running in
4-levels can actually use memory references in the va57 canonical
range, because the hardware is actually operating in 5 levels.
In the context of a 64bit PV guest, the vast majority of the va57 range
will be not-present (as Xen is handling the L5 table on behalf of the
unaware PV guest), while the area Xen resides in will be mapped
supervisor and take a #PF that way.
If we were to do the simple thing and hand the #PF to the guest kernel,
that would be architecturally wrong, because the guest kernel is
expecting to be running in 4 levels. Therefore, Xen needs to emulate
the difference by converting the #PF to #GP/#SS for the guest kernel, so
the behaviour as observed by the guest kernel matches what is expected
from 4-levels.
I hope this is a clearer way of explaining the problem.
~Andrew
Subject: Re: [XenSummit 2017] Notes from the 5-level-paging session
From: Yu Zhang @ 2017-07-20 12:07 UTC
To: Andrew Cooper, Juergen Gross, xen-devel; +Cc: Jan Beulich, Zhang, Yu C
On 7/20/2017 7:24 PM, Andrew Cooper wrote:
> [...]
>
> The problem we need to fix is new with Xen running in 5-level.
>
> Previously with Xen running in 4-level, any non-va48 canonical address
> would yield #GP/#SS and Xen would handle these directly (usually by
> passing them off to the guest kernel like we do with #PF).
>
> When Xen is running in 5-level, a 64bit PV guest still running in
> 4-levels can actually use memory references in the va57 canonical
> range, because the hardware is actually operating in 5 levels.
>
> In the context of a 64bit PV guest, the vast majority of the va57
> range will be not-present (as Xen is handling the L5 table on behalf
> of the unaware PV guest), while the area Xen resides in will be mapped
> supervisor and take a #PF that way.
>
> If we were to do the simple thing and hand the #PF to the guest
> kernel, that would be architecturally wrong, because the guest kernel
> is expecting to be running in 4 levels. Therefore, Xen needs to
> emulate the difference by converting the #PF to #GP/#SS for the guest
> kernel, so the behaviour as observed by the guest kernel matches what
> is expected from 4-levels.
>
Oh, right. We have talked about this before. Thanks, Andrew! :-)
Yu