* [PATCH v3 1/4] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT
2016-11-30 15:03 [PATCH v3 0/4] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT Ladi Prosek
@ 2016-11-30 15:03 ` Ladi Prosek
2016-12-21 7:31 ` Wanpeng Li
2016-11-30 15:03 ` [PATCH v3 2/4] KVM: nVMX: propagate errors from prepare_vmcs02 Ladi Prosek
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Ladi Prosek @ 2016-11-30 15:03 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, rkrcmar, bsd
KVM does not correctly handle L1 hypervisors that emulate L2 real mode with
PAE and EPT, such as Hyper-V. In this mode, the L1 hypervisor populates guest
PDPTE VMCS fields and leaves guest CR3 uninitialized because it is not used
(see 26.3.2.4 Loading Page-Directory-Pointer-Table Entries). KVM always
dereferences CR3 and tries to load PDPTEs if PAE is on. This leads to two
related issues:
1) On the first nested vmentry, the guest PDPTEs, as populated by L1, are
overwritten in ept_load_pdptrs because the registers are believed to have
been loaded in load_pdptrs as part of kvm_set_cr3. This is incorrect. L2 is
running with PAE enabled but PDPTRs have been set up by L1.
2) When L2 is about to enable paging and loads its CR3, we, again, attempt
to load PDPTEs in load_pdptrs called from kvm_set_cr3. There are no guarantees
that this will succeed (it's just a CR3 load, paging is not enabled yet) and
if it doesn't, kvm_set_cr3 returns early without persisting the CR3 which is
then lost and L2 crashes right after it enables paging.
This patch replaces the kvm_set_cr3 call with a simple register write if PAE
and EPT are both on. CR3 is not to be interpreted in this case.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
arch/x86/kvm/vmx.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 81fbda0..034dc7a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9808,6 +9808,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 exec_control;
+ bool nested_ept_enabled = false;
vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -9972,6 +9973,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmcs12->guest_intr_status);
}
+ nested_ept_enabled = (exec_control & SECONDARY_EXEC_ENABLE_EPT) != 0;
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
}
@@ -10113,8 +10115,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmx_set_cr4(vcpu, vmcs12->guest_cr4);
vmcs_writel(CR4_READ_SHADOW, nested_read_cr4(vmcs12));
- /* shadow page tables on either EPT or shadow page tables */
- kvm_set_cr3(vcpu, vmcs12->guest_cr3);
+ /*
+ * Shadow page tables on either EPT or shadow page tables.
+ * If PAE and EPT are both on, CR3 is not used by the CPU and must not
+ * be dereferenced.
+ */
+ if (is_pae(vcpu) && is_paging(vcpu) && !is_long_mode(vcpu) &&
+ nested_ept_enabled) {
+ vcpu->arch.cr3 = vmcs12->guest_cr3;
+ __set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
+ } else
+ kvm_set_cr3(vcpu, vmcs12->guest_cr3);
+
kvm_mmu_reset_context(vcpu);
if (!enable_ept)
--
2.7.4
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH v3 1/4] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT
2016-11-30 15:03 ` [PATCH v3 1/4] " Ladi Prosek
@ 2016-12-21 7:31 ` Wanpeng Li
2016-12-21 8:09 ` Ladi Prosek
0 siblings, 1 reply; 10+ messages in thread
From: Wanpeng Li @ 2016-12-21 7:31 UTC (permalink / raw)
To: Ladi Prosek; +Cc: kvm, Paolo Bonzini, Radim Krcmar, Bandan Das
2016-11-30 23:03 GMT+08:00 Ladi Prosek <lprosek@redhat.com>:
> KVM does not correctly handle L1 hypervisors that emulate L2 real mode with
> PAE and EPT, such as Hyper-V. In this mode, the L1 hypervisor populates guest
> PDPTE VMCS fields and leaves guest CR3 uninitialized because it is not used
How L1 hypervisor populates guest PDPTE VMCS fields if undereference CR3?
> (see 26.3.2.4 Loading Page-Directory-Pointer-Table Entries). KVM always
> dereferences CR3 and tries to load PDPTEs if PAE is on. This leads to two
> related issues:
>
> 1) On the first nested vmentry, the guest PDPTEs, as populated by L1, are
> overwritten in ept_load_pdptrs because the registers are believed to have
> been loaded in load_pdptrs as part of kvm_set_cr3. This is incorrect. L2 is
> running with PAE enabled but PDPTRs have been set up by L1.
>
> 2) When L2 is about to enable paging and loads its CR3, we, again, attempt
> to load PDPTEs in load_pdptrs called from kvm_set_cr3. There are no guarantees
> that this will succeed (it's just a CR3 load, paging is not enabled yet) and
> if it doesn't, kvm_set_cr3 returns early without persisting the CR3 which is
> then lost and L2 crashes right after it enables paging.
>
> This patch replaces the kvm_set_cr3 call with a simple register write if PAE
> and EPT are both on. CR3 is not to be interpreted in this case.
>
> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> ---
> arch/x86/kvm/vmx.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 81fbda0..034dc7a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9808,6 +9808,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> u32 exec_control;
> + bool nested_ept_enabled = false;
>
> vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
> vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
> @@ -9972,6 +9973,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> vmcs12->guest_intr_status);
> }
>
> + nested_ept_enabled = (exec_control & SECONDARY_EXEC_ENABLE_EPT) != 0;
> vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
> }
>
> @@ -10113,8 +10115,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> vmx_set_cr4(vcpu, vmcs12->guest_cr4);
> vmcs_writel(CR4_READ_SHADOW, nested_read_cr4(vmcs12));
>
> - /* shadow page tables on either EPT or shadow page tables */
> - kvm_set_cr3(vcpu, vmcs12->guest_cr3);
> + /*
> + * Shadow page tables on either EPT or shadow page tables.
> + * If PAE and EPT are both on, CR3 is not used by the CPU and must not
> + * be dereferenced.
> + */
> + if (is_pae(vcpu) && is_paging(vcpu) && !is_long_mode(vcpu) &&
> + nested_ept_enabled) {
> + vcpu->arch.cr3 = vmcs12->guest_cr3;
> + __set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
> + } else
> + kvm_set_cr3(vcpu, vmcs12->guest_cr3);
> +
> kvm_mmu_reset_context(vcpu);
>
> if (!enable_ept)
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH v3 1/4] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT
2016-12-21 7:31 ` Wanpeng Li
@ 2016-12-21 8:09 ` Ladi Prosek
2016-12-21 12:20 ` Wanpeng Li
0 siblings, 1 reply; 10+ messages in thread
From: Ladi Prosek @ 2016-12-21 8:09 UTC (permalink / raw)
To: Wanpeng Li; +Cc: kvm, Paolo Bonzini, Radim Krcmar, Bandan Das
On Wed, Dec 21, 2016 at 8:31 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
> 2016-11-30 23:03 GMT+08:00 Ladi Prosek <lprosek@redhat.com>:
>> KVM does not correctly handle L1 hypervisors that emulate L2 real mode with
>> PAE and EPT, such as Hyper-V. In this mode, the L1 hypervisor populates guest
>> PDPTE VMCS fields and leaves guest CR3 uninitialized because it is not used
>
> How L1 hypervisor populates guest PDPTE VMCS fields if undereference CR3?
L2 is free to write anything to CR3. From its perspective it's still
running in real mode so CR3 is effectively a general purpose register.
If L1 intercepts CR3 accesses, it just shadows the value in the guest
CR3 VMCS field so it's preserved across vmexists. It does not and
should not dereference it.
Let me know if I misunderstood the question.
>> (see 26.3.2.4 Loading Page-Directory-Pointer-Table Entries). KVM always
>> dereferences CR3 and tries to load PDPTEs if PAE is on. This leads to two
>> related issues:
>>
>> 1) On the first nested vmentry, the guest PDPTEs, as populated by L1, are
>> overwritten in ept_load_pdptrs because the registers are believed to have
>> been loaded in load_pdptrs as part of kvm_set_cr3. This is incorrect. L2 is
>> running with PAE enabled but PDPTRs have been set up by L1.
>>
>> 2) When L2 is about to enable paging and loads its CR3, we, again, attempt
>> to load PDPTEs in load_pdptrs called from kvm_set_cr3. There are no guarantees
>> that this will succeed (it's just a CR3 load, paging is not enabled yet) and
>> if it doesn't, kvm_set_cr3 returns early without persisting the CR3 which is
>> then lost and L2 crashes right after it enables paging.
>>
>> This patch replaces the kvm_set_cr3 call with a simple register write if PAE
>> and EPT are both on. CR3 is not to be interpreted in this case.
>>
>> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
>> ---
>> arch/x86/kvm/vmx.c | 16 ++++++++++++++--
>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 81fbda0..034dc7a 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -9808,6 +9808,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>> {
>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>> u32 exec_control;
>> + bool nested_ept_enabled = false;
>>
>> vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
>> vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
>> @@ -9972,6 +9973,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>> vmcs12->guest_intr_status);
>> }
>>
>> + nested_ept_enabled = (exec_control & SECONDARY_EXEC_ENABLE_EPT) != 0;
>> vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
>> }
>>
>> @@ -10113,8 +10115,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>> vmx_set_cr4(vcpu, vmcs12->guest_cr4);
>> vmcs_writel(CR4_READ_SHADOW, nested_read_cr4(vmcs12));
>>
>> - /* shadow page tables on either EPT or shadow page tables */
>> - kvm_set_cr3(vcpu, vmcs12->guest_cr3);
>> + /*
>> + * Shadow page tables on either EPT or shadow page tables.
>> + * If PAE and EPT are both on, CR3 is not used by the CPU and must not
>> + * be dereferenced.
>> + */
>> + if (is_pae(vcpu) && is_paging(vcpu) && !is_long_mode(vcpu) &&
>> + nested_ept_enabled) {
>> + vcpu->arch.cr3 = vmcs12->guest_cr3;
>> + __set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
>> + } else
>> + kvm_set_cr3(vcpu, vmcs12->guest_cr3);
>> +
>> kvm_mmu_reset_context(vcpu);
>>
>> if (!enable_ept)
>> --
>> 2.7.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH v3 1/4] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT
2016-12-21 8:09 ` Ladi Prosek
@ 2016-12-21 12:20 ` Wanpeng Li
0 siblings, 0 replies; 10+ messages in thread
From: Wanpeng Li @ 2016-12-21 12:20 UTC (permalink / raw)
To: Ladi Prosek; +Cc: kvm, Paolo Bonzini, Radim Krcmar, Bandan Das
2016-12-21 16:09 GMT+08:00 Ladi Prosek <lprosek@redhat.com>:
> On Wed, Dec 21, 2016 at 8:31 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> 2016-11-30 23:03 GMT+08:00 Ladi Prosek <lprosek@redhat.com>:
>>> KVM does not correctly handle L1 hypervisors that emulate L2 real mode with
>>> PAE and EPT, such as Hyper-V. In this mode, the L1 hypervisor populates guest
>>> PDPTE VMCS fields and leaves guest CR3 uninitialized because it is not used
>>
>> How L1 hypervisor populates guest PDPTE VMCS fields if undereference CR3?
>
> L2 is free to write anything to CR3. From its perspective it's still
> running in real mode so CR3 is effectively a general purpose register.
> If L1 intercepts CR3 accesses, it just shadows the value in the guest
> CR3 VMCS field so it's preserved across vmexists. It does not and
> should not dereference it.
>
> Let me know if I misunderstood the question.
I just confuse how PDPTE VMCS field is set when CR3 is uninitialized,
I think it should be from something like ept_identity_map_addr in
Hyper-V.
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 2/4] KVM: nVMX: propagate errors from prepare_vmcs02
2016-11-30 15:03 [PATCH v3 0/4] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT Ladi Prosek
2016-11-30 15:03 ` [PATCH v3 1/4] " Ladi Prosek
@ 2016-11-30 15:03 ` Ladi Prosek
2016-11-30 15:03 ` [PATCH v3 3/4] KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry Ladi Prosek
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Ladi Prosek @ 2016-11-30 15:03 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, rkrcmar, bsd
It is possible that prepare_vmcs02 fails to load the guest state. This
patch adds the proper error handling for such a case. L1 will receive
an INVALID_STATE vmexit with the appropriate exit qualification if it
happens.
A failure to set guest CR3 is the only error propagated from prepare_vmcs02
at the moment.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
arch/x86/kvm/vmx.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 034dc7a..70598a1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9803,8 +9803,11 @@ static int nested_vmx_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
* needs. In addition to modifying the active vmcs (which is vmcs02), this
* function also has additional necessary side-effects, like setting various
* vcpu->arch fields.
+ * Returns 0 on success, 1 on failure. Invalid state exit qualification code
+ * is assigned to entry_failure_code on failure.
*/
-static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
+static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
+ unsigned long *entry_failure_code)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 exec_control;
@@ -10124,8 +10127,12 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
nested_ept_enabled) {
vcpu->arch.cr3 = vmcs12->guest_cr3;
__set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
- } else
- kvm_set_cr3(vcpu, vmcs12->guest_cr3);
+ } else {
+ if (kvm_set_cr3(vcpu, vmcs12->guest_cr3)) {
+ *entry_failure_code = ENTRY_FAIL_DEFAULT;
+ return 1;
+ }
+ }
kvm_mmu_reset_context(vcpu);
@@ -10144,6 +10151,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->guest_rsp);
kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->guest_rip);
+ return 0;
}
/*
@@ -10158,6 +10166,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
struct loaded_vmcs *vmcs02;
bool ia32e;
u32 msr_entry_idx;
+ unsigned long exit_qualification;
if (!nested_vmx_check_permission(vcpu) ||
!nested_vmx_check_vmcs12(vcpu))
@@ -10313,7 +10322,13 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
vmx_segment_cache_clear(vmx);
- prepare_vmcs02(vcpu, vmcs12);
+ if (prepare_vmcs02(vcpu, vmcs12, &exit_qualification)) {
+ leave_guest_mode(vcpu);
+ vmx_load_vmcs01(vcpu);
+ nested_vmx_entry_failure(vcpu, vmcs12,
+ EXIT_REASON_INVALID_STATE, exit_qualification);
+ return 1;
+ }
msr_entry_idx = nested_vmx_load_msr(vcpu,
vmcs12->vm_entry_msr_load_addr,
--
2.7.4
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH v3 3/4] KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry
2016-11-30 15:03 [PATCH v3 0/4] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT Ladi Prosek
2016-11-30 15:03 ` [PATCH v3 1/4] " Ladi Prosek
2016-11-30 15:03 ` [PATCH v3 2/4] KVM: nVMX: propagate errors from prepare_vmcs02 Ladi Prosek
@ 2016-11-30 15:03 ` Ladi Prosek
2016-11-30 15:03 ` [PATCH v3 4/4] KVM: nVMX: check host CR3 on vmentry and vmexit Ladi Prosek
2016-12-01 14:18 ` [PATCH v3 0/4] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT Radim Krčmář
4 siblings, 0 replies; 10+ messages in thread
From: Ladi Prosek @ 2016-11-30 15:03 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, rkrcmar, bsd
Loading CR3 as part of emulating vmentry is different from regular CR3 loads,
as implemented in kvm_set_cr3, in several ways.
* different rules are followed to check CR3 and it is desirable for the caller
to distinguish between the possible failures
* PDPTRs are not loaded if PAE paging and nested EPT are both enabled
* many MMU operations are not necessary
This patch introduces nested_vmx_load_cr3 suitable for CR3 loads as part of
nested vmentry and vmexit, and makes use of it on the nested vmentry path.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx.c | 59 +++++++++++++++++++++++++++++------------
arch/x86/kvm/x86.c | 3 ++-
3 files changed, 45 insertions(+), 18 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bdde807..9ec06a3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1062,6 +1062,7 @@ unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
+bool pdptrs_changed(struct kvm_vcpu *vcpu);
int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
const void *val, int bytes);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 70598a1..58dc7f1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9796,6 +9796,44 @@ static int nested_vmx_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
}
/*
+ * Load guest's/host's cr3 at nested entry/exit. nested_ept is true if we are
+ * emulating VM entry into a guest with EPT enabled.
+ * Returns 0 on success, 1 on failure. Invalid state exit qualification code
+ * is assigned to entry_failure_code on failure.
+ */
+static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
+ unsigned long *entry_failure_code)
+{
+ unsigned long invalid_mask;
+
+ if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
+ invalid_mask = (~0ULL) << cpuid_maxphyaddr(vcpu);
+ if (cr3 & invalid_mask) {
+ *entry_failure_code = ENTRY_FAIL_DEFAULT;
+ return 1;
+ }
+
+ /*
+ * If PAE paging and EPT are both on, CR3 is not used by the CPU and
+ * must not be dereferenced.
+ */
+ if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
+ !nested_ept) {
+ if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
+ *entry_failure_code = ENTRY_FAIL_PDPTE;
+ return 1;
+ }
+ }
+
+ vcpu->arch.cr3 = cr3;
+ __set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
+ }
+
+ kvm_mmu_reset_context(vcpu);
+ return 0;
+}
+
+/*
* prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
* L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
* with L0's requirements for its guest (a.k.a. vmcs01), so we can run the L2
@@ -10118,23 +10156,10 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
vmx_set_cr4(vcpu, vmcs12->guest_cr4);
vmcs_writel(CR4_READ_SHADOW, nested_read_cr4(vmcs12));
- /*
- * Shadow page tables on either EPT or shadow page tables.
- * If PAE and EPT are both on, CR3 is not used by the CPU and must not
- * be dereferenced.
- */
- if (is_pae(vcpu) && is_paging(vcpu) && !is_long_mode(vcpu) &&
- nested_ept_enabled) {
- vcpu->arch.cr3 = vmcs12->guest_cr3;
- __set_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail);
- } else {
- if (kvm_set_cr3(vcpu, vmcs12->guest_cr3)) {
- *entry_failure_code = ENTRY_FAIL_DEFAULT;
- return 1;
- }
- }
-
- kvm_mmu_reset_context(vcpu);
+ /* Shadow page tables on either EPT or shadow page tables. */
+ if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_ept_enabled,
+ entry_failure_code))
+ return 1;
if (!enable_ept)
vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 78f6061..e4e06f9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -573,7 +573,7 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
}
EXPORT_SYMBOL_GPL(load_pdptrs);
-static bool pdptrs_changed(struct kvm_vcpu *vcpu)
+bool pdptrs_changed(struct kvm_vcpu *vcpu)
{
u64 pdpte[ARRAY_SIZE(vcpu->arch.walk_mmu->pdptrs)];
bool changed = true;
@@ -599,6 +599,7 @@ static bool pdptrs_changed(struct kvm_vcpu *vcpu)
return changed;
}
+EXPORT_SYMBOL_GPL(pdptrs_changed);
int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
{
--
2.7.4
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH v3 4/4] KVM: nVMX: check host CR3 on vmentry and vmexit
2016-11-30 15:03 [PATCH v3 0/4] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT Ladi Prosek
` (2 preceding siblings ...)
2016-11-30 15:03 ` [PATCH v3 3/4] KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry Ladi Prosek
@ 2016-11-30 15:03 ` Ladi Prosek
2016-12-01 14:18 ` [PATCH v3 0/4] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT Radim Krčmář
4 siblings, 0 replies; 10+ messages in thread
From: Ladi Prosek @ 2016-11-30 15:03 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, rkrcmar, bsd
This commit adds missing host CR3 checks. Before entering guest mode, the value
of CR3 is checked for reserved bits. After returning, nested_vmx_load_cr3 is
called to set the new CR3 value and check and load PDPTRs.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
arch/x86/include/uapi/asm/vmx.h | 1 +
arch/x86/kvm/vmx.c | 25 ++++++++++++++++++-------
2 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index f9dea4f..1445865 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -133,6 +133,7 @@
{ EXIT_REASON_XRSTORS, "XRSTORS" }
#define VMX_ABORT_SAVE_GUEST_MSR_FAIL 1
+#define VMX_ABORT_LOAD_HOST_PDPTE_FAIL 2
#define VMX_ABORT_LOAD_HOST_MSR_FAIL 4
#endif /* _UAPIVMX_H */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 58dc7f1..7d301de 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9795,6 +9795,14 @@ static int nested_vmx_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
return 0;
}
+static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
+{
+ unsigned long invalid_mask;
+
+ invalid_mask = (~0ULL) << cpuid_maxphyaddr(vcpu);
+ return (val & invalid_mask) == 0;
+}
+
/*
* Load guest's/host's cr3 at nested entry/exit. nested_ept is true if we are
* emulating VM entry into a guest with EPT enabled.
@@ -9804,11 +9812,8 @@ static int nested_vmx_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
unsigned long *entry_failure_code)
{
- unsigned long invalid_mask;
-
if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
- invalid_mask = (~0ULL) << cpuid_maxphyaddr(vcpu);
- if (cr3 & invalid_mask) {
+ if (!nested_cr3_valid(vcpu, cr3)) {
*entry_failure_code = ENTRY_FAIL_DEFAULT;
return 1;
}
@@ -10267,7 +10272,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
}
if (((vmcs12->host_cr0 & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON) ||
- ((vmcs12->host_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) {
+ ((vmcs12->host_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON) ||
+ !nested_cr3_valid(vcpu, vmcs12->host_cr3)) {
nested_vmx_failValid(vcpu,
VMXERR_ENTRY_INVALID_HOST_STATE_FIELD);
return 1;
@@ -10685,6 +10691,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
struct kvm_segment seg;
+ unsigned long entry_failure_code;
if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
vcpu->arch.efer = vmcs12->host_ia32_efer;
@@ -10722,8 +10729,12 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
nested_ept_uninit_mmu_context(vcpu);
- kvm_set_cr3(vcpu, vmcs12->host_cr3);
- kvm_mmu_reset_context(vcpu);
+ /*
+ * Only PDPTE load can fail as the value of cr3 was checked on entry and
+ * couldn't have changed.
+ */
+ if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
+ nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
if (!enable_ept)
vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
--
2.7.4
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH v3 0/4] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT
2016-11-30 15:03 [PATCH v3 0/4] KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT Ladi Prosek
` (3 preceding siblings ...)
2016-11-30 15:03 ` [PATCH v3 4/4] KVM: nVMX: check host CR3 on vmentry and vmexit Ladi Prosek
@ 2016-12-01 14:18 ` Radim Krčmář
2016-12-01 15:00 ` Ladi Prosek
4 siblings, 1 reply; 10+ messages in thread
From: Radim Krčmář @ 2016-12-01 14:18 UTC (permalink / raw)
To: Ladi Prosek; +Cc: kvm, pbonzini, bsd
2016-11-30 16:03+0100, Ladi Prosek:
> The series addresses a specific issue with guest CR3 load on nested vmentry
> as well as a few additional changes related to handling guest and host CR3.
Applied to kvm/queue, thanks. Two other series touched nVMX code in the
meantime and I did the minimum to resolve conflicts. Please check that
you are happy with the result.
^ permalink raw reply [flat|nested] 10+ messages in thread