* [PATCH 0/10] KVM: nVMX: shadow VMCS support, v2
@ 2013-04-17 11:50 Abel Gordon
2013-04-17 11:51 ` [PATCH 01/10] KVM: nVMX: Shadow-vmcs control fields/bits Abel Gordon
` (9 more replies)
0 siblings, 10 replies; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 11:50 UTC (permalink / raw)
To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg
This series of patches implements shadow-vmcs capability for nested VMX.
Shadow-vmcs - background and overview:
In Intel VMX, the privileged vmread and vmwrite instructions are used by the
hypervisor to read and modify the guest and host specifications (VMCS). In a
nested virtualization environment, L1 executes multiple vmread and vmwrite
instructions to handle a single L2 exit. Each vmread and vmwrite executed by L1
traps (causes an exit) to the L0 hypervisor (KVM). L0 emulates the instruction
behaviour and resumes L1 execution.
Removing the need to trap and emulate these special instructions reduces the
number of exits and improves nested virtualization performance. As first
evaluated in [1], exit-less vmread and vmwrite can reduce nested virtualization
overhead by up to 40%.
Intel introduced a new processor feature called shadow-vmcs. Using
shadow-vmcs, L0 can configure the processor to let L1, running in guest mode,
access VMCS12 fields using vmread and vmwrite instructions without causing
an exit to L0. The VMCS12 field data is stored in a shadow-vmcs controlled
by L0.
Shadow-vmcs - design considerations:
A shadow-vmcs is processor-dependent and must be accessed by L0 or L1 using
vmread and vmwrite instructions. With nested virtualization we aim to abstract
the hardware from the L1 hypervisor. Thus, to avoid hardware dependencies we
preferred to keep the software-defined VMCS12 format as part of L1 address space
and hold the processor-specific shadow-vmcs format only in L0 address space.
In other words, the shadow-vmcs is used by L0 as an accelerator, but its format
and content are never exposed to L1 directly. L0 syncs the content of the
processor-specific shadow vmcs with the content of the software-controlled
VMCS12 format.
We could have kept the processor-specific shadow-vmcs format in L1 address
space to avoid using the software-defined VMCS12 format; however, this type of
design/implementation would have created hardware dependencies and
would complicate other capabilities (e.g. live migration of L1).
Changes since v1:
1) Added sync_shadow_vmcs flag used to indicate when the content of VMCS12
must be copied to the shadow vmcs. The flag value is checked during
vmx_vcpu_run.
2) Code quality improvements
Acknowledgments:
Many thanks to
"Xu, Dongxiao" <dongxiao.xu@intel.com>
"Nakajima, Jun" <jun.nakajima@intel.com>
"Har'El, Nadav" <nadav@harel.org.il>
for the insightful discussions, comments and reviews.
These patches were easily created and maintained using
Patchouli -- patch creator
http://patchouli.sourceforge.net/
[1] "The Turtles Project: Design and Implementation of Nested Virtualization",
http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 01/10] KVM: nVMX: Shadow-vmcs control fields/bits
2013-04-17 11:50 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v2 Abel Gordon
@ 2013-04-17 11:51 ` Abel Gordon
2013-04-17 11:51 ` [PATCH 02/10] KVM: nVMX: Detect shadow-vmcs capability Abel Gordon
` (8 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 11:51 UTC (permalink / raw)
To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg
Add definitions for all the vmcs control fields/bits
required to enable vmcs-shadowing
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
arch/x86/include/asm/vmx.h | 3 +++
arch/x86/include/uapi/asm/msr-index.h | 2 ++
2 files changed, 5 insertions(+)
--- .before/arch/x86/include/asm/vmx.h 2013-04-17 14:20:49.000000000 +0300
+++ .after/arch/x86/include/asm/vmx.h 2013-04-17 14:20:49.000000000 +0300
@@ -65,6 +65,7 @@
#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200
#define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400
#define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000
+#define SECONDARY_EXEC_SHADOW_VMCS 0x00004000
#define PIN_BASED_EXT_INTR_MASK 0x00000001
@@ -150,6 +151,8 @@ enum vmcs_field {
EOI_EXIT_BITMAP2_HIGH = 0x00002021,
EOI_EXIT_BITMAP3 = 0x00002022,
EOI_EXIT_BITMAP3_HIGH = 0x00002023,
+ VMREAD_BITMAP = 0x00002026,
+ VMWRITE_BITMAP = 0x00002028,
GUEST_PHYSICAL_ADDRESS = 0x00002400,
GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401,
VMCS_LINK_POINTER = 0x00002800,
--- .before/arch/x86/include/uapi/asm/msr-index.h 2013-04-17 14:20:49.000000000 +0300
+++ .after/arch/x86/include/uapi/asm/msr-index.h 2013-04-17 14:20:49.000000000 +0300
@@ -522,6 +522,8 @@
#define VMX_BASIC_MEM_TYPE_WB 6LLU
#define VMX_BASIC_INOUT 0x0040000000000000LLU
+/* MSR_IA32_VMX_MISC bits */
+#define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
/* AMD-V MSRs */
#define MSR_VM_CR 0xc0010114
* [PATCH 02/10] KVM: nVMX: Detect shadow-vmcs capability
2013-04-17 11:50 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v2 Abel Gordon
2013-04-17 11:51 ` [PATCH 01/10] KVM: nVMX: Shadow-vmcs control fields/bits Abel Gordon
@ 2013-04-17 11:51 ` Abel Gordon
2013-04-17 13:51 ` Gleb Natapov
2013-04-17 11:52 ` [PATCH 03/10] KVM: nVMX: Introduce vmread and vmwrite bitmaps Abel Gordon
` (7 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 11:51 UTC (permalink / raw)
To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg
Add logic required to detect if shadow-vmcs is supported by the
processor. Introduce a new kernel module parameter to specify if L0 should use
shadow vmcs (or not) to run L1.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
arch/x86/kvm/vmx.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
--- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:49.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
@@ -87,6 +87,8 @@ module_param(fasteoi, bool, S_IRUGO);
static bool __read_mostly enable_apicv = 1;
module_param(enable_apicv, bool, S_IRUGO);
+static bool __read_mostly enable_shadow_vmcs = 1;
+module_param_named(enable_shadow_vmcs, enable_shadow_vmcs, bool, S_IRUGO);
/*
* If nested=1, nested virtualization is supported, i.e., guests may use
* VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -940,6 +942,18 @@ static inline bool cpu_has_vmx_wbinvd_ex
SECONDARY_EXEC_WBINVD_EXITING;
}
+static inline bool cpu_has_vmx_shadow_vmcs(void)
+{
+ u64 vmx_msr;
+ rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
+ /* check if the cpu supports writing r/o exit information fields */
+ if (!(vmx_msr & (1u << 29)))
+ return false;
+
+ return vmcs_config.cpu_based_2nd_exec_ctrl &
+ SECONDARY_EXEC_SHADOW_VMCS;
+}
+
static inline bool report_flexpriority(void)
{
return flexpriority_enabled;
@@ -2632,7 +2646,8 @@ static __init int setup_vmcs_config(stru
SECONDARY_EXEC_RDTSCP |
SECONDARY_EXEC_ENABLE_INVPCID |
SECONDARY_EXEC_APIC_REGISTER_VIRT |
- SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
+ SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
+ SECONDARY_EXEC_SHADOW_VMCS;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
&_cpu_based_2nd_exec_control) < 0)
@@ -2833,6 +2848,8 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_vpid())
enable_vpid = 0;
+ if (!cpu_has_vmx_shadow_vmcs())
+ enable_shadow_vmcs = 0;
if (!cpu_has_vmx_ept() ||
!cpu_has_vmx_ept_4levels()) {
@@ -4075,6 +4092,12 @@ static u32 vmx_secondary_exec_control(st
exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
+ /* SECONDARY_EXEC_SHADOW_VMCS is enabled when L1 executes VMPTRLD
+ (handle_vmptrld).
+ We can NOT enable shadow_vmcs here because we don't have yet
+ a current VMCS12
+ */
+ exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
return exec_control;
}
* [PATCH 03/10] KVM: nVMX: Introduce vmread and vmwrite bitmaps
2013-04-17 11:50 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v2 Abel Gordon
2013-04-17 11:51 ` [PATCH 01/10] KVM: nVMX: Shadow-vmcs control fields/bits Abel Gordon
2013-04-17 11:51 ` [PATCH 02/10] KVM: nVMX: Detect shadow-vmcs capability Abel Gordon
@ 2013-04-17 11:52 ` Abel Gordon
2013-04-17 11:52 ` [PATCH 04/10] KVM: nVMX: Refactor handle_vmwrite Abel Gordon
` (6 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 11:52 UTC (permalink / raw)
To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg
Prepare vmread and vmwrite bitmaps according to a pre-specified list of fields.
These lists are intended to specify the most frequently accessed fields so we
can minimize the number of fields that are copied from/to the software-controlled
VMCS12 format to/from the processor-specific shadow vmcs. The lists were built
by measuring the VMCS field access rate after an L2 Ubuntu 12.04 guest booted,
running on top of L1 KVM, also Ubuntu 12.04. Note that during boot there were
additional fields which were frequently modified, but they were not added to
these lists because after boot these fields were no longer accessed by L1.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
arch/x86/kvm/vmx.c | 85 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 84 insertions(+), 1 deletion(-)
--- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
@@ -484,6 +484,59 @@ static inline struct vcpu_vmx *to_vmx(st
#define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \
[number##_HIGH] = VMCS12_OFFSET(name)+4
+
+static const unsigned long shadow_read_only_fields[] = {
+ VM_EXIT_REASON,
+ VM_EXIT_INTR_INFO,
+ VM_EXIT_INSTRUCTION_LEN,
+ /*
+ * we do NOT shadow VM_INSTRUCTION_ERROR because it is read
+ * by L1 only as part of the error path.
+ * Note the code assumes this logic. If for some reason
+ * we start shadowing VM_INSTRUCTION_ERROR then we need to
+ * force a shadow sync in nested_vmx_failValid
+ */
+ IDT_VECTORING_INFO_FIELD,
+ IDT_VECTORING_ERROR_CODE,
+ VM_EXIT_INTR_ERROR_CODE,
+ EXIT_QUALIFICATION,
+ GUEST_LINEAR_ADDRESS,
+ GUEST_PHYSICAL_ADDRESS
+};
+static const int max_shadow_read_only_fields =
+ ARRAY_SIZE(shadow_read_only_fields);
+
+static const unsigned long shadow_read_write_fields[] = {
+ GUEST_RIP,
+ GUEST_RSP,
+ GUEST_CR0,
+ GUEST_CR3,
+ GUEST_CR4,
+ GUEST_INTERRUPTIBILITY_INFO,
+ GUEST_RFLAGS,
+ GUEST_CS_SELECTOR,
+ GUEST_CS_AR_BYTES,
+ GUEST_CS_LIMIT,
+ GUEST_CS_BASE,
+ GUEST_ES_BASE,
+ CR0_GUEST_HOST_MASK,
+ CR0_READ_SHADOW,
+ CR4_READ_SHADOW,
+ TSC_OFFSET,
+ EXCEPTION_BITMAP,
+ CPU_BASED_VM_EXEC_CONTROL,
+ VM_ENTRY_EXCEPTION_ERROR_CODE,
+ VM_ENTRY_INTR_INFO_FIELD,
+ VM_ENTRY_INSTRUCTION_LEN,
+ VM_ENTRY_EXCEPTION_ERROR_CODE,
+ HOST_FS_BASE,
+ HOST_GS_BASE,
+ HOST_FS_SELECTOR,
+ HOST_GS_SELECTOR
+};
+static const int max_shadow_read_write_fields =
+ ARRAY_SIZE(shadow_read_write_fields);
+
static const unsigned short vmcs_field_to_offset_table[] = {
FIELD(VIRTUAL_PROCESSOR_ID, virtual_processor_id),
FIELD(GUEST_ES_SELECTOR, guest_es_selector),
@@ -675,6 +728,8 @@ static unsigned long *vmx_msr_bitmap_leg
static unsigned long *vmx_msr_bitmap_longmode;
static unsigned long *vmx_msr_bitmap_legacy_x2apic;
static unsigned long *vmx_msr_bitmap_longmode_x2apic;
+static unsigned long *vmx_vmread_bitmap;
+static unsigned long *vmx_vmwrite_bitmap;
static bool cpu_has_load_ia32_efer;
static bool cpu_has_load_perf_global_ctrl;
@@ -4126,6 +4181,10 @@ static int vmx_vcpu_setup(struct vcpu_vm
vmcs_write64(IO_BITMAP_A, __pa(vmx_io_bitmap_a));
vmcs_write64(IO_BITMAP_B, __pa(vmx_io_bitmap_b));
+ if (enable_shadow_vmcs) {
+ vmcs_write64(VMREAD_BITMAP, __pa(vmx_vmread_bitmap));
+ vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap));
+ }
if (cpu_has_vmx_msr_bitmap())
vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy));
@@ -7936,6 +7995,24 @@ static int __init vmx_init(void)
(unsigned long *)__get_free_page(GFP_KERNEL);
if (!vmx_msr_bitmap_longmode_x2apic)
goto out4;
+ vmx_vmread_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
+ if (!vmx_vmread_bitmap)
+ goto out5;
+
+ vmx_vmwrite_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
+ if (!vmx_vmwrite_bitmap)
+ goto out6;
+
+ memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE);
+ memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE);
+ /* shadowed read/write fields */
+ for (i = 0; i < max_shadow_read_write_fields; i++) {
+ clear_bit(shadow_read_write_fields[i], vmx_vmwrite_bitmap);
+ clear_bit(shadow_read_write_fields[i], vmx_vmread_bitmap);
+ }
+ /* shadowed read only fields */
+ for (i = 0; i < max_shadow_read_only_fields; i++)
+ clear_bit(shadow_read_only_fields[i], vmx_vmread_bitmap);
/*
* Allow direct access to the PC debug port (it is often used for I/O
@@ -7954,7 +8031,7 @@ static int __init vmx_init(void)
r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
__alignof__(struct vcpu_vmx), THIS_MODULE);
if (r)
- goto out5;
+ goto out7;
#ifdef CONFIG_KEXEC
rcu_assign_pointer(crash_vmclear_loaded_vmcss,
@@ -8002,6 +8079,10 @@ static int __init vmx_init(void)
return 0;
+out7:
+ free_page((unsigned long)vmx_vmwrite_bitmap);
+out6:
+ free_page((unsigned long)vmx_vmread_bitmap);
out5:
free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
out4:
@@ -8025,6 +8106,8 @@ static void __exit vmx_exit(void)
free_page((unsigned long)vmx_msr_bitmap_longmode);
free_page((unsigned long)vmx_io_bitmap_b);
free_page((unsigned long)vmx_io_bitmap_a);
+ free_page((unsigned long)vmx_vmwrite_bitmap);
+ free_page((unsigned long)vmx_vmread_bitmap);
#ifdef CONFIG_KEXEC
rcu_assign_pointer(crash_vmclear_loaded_vmcss, NULL);
* [PATCH 04/10] KVM: nVMX: Refactor handle_vmwrite
2013-04-17 11:50 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v2 Abel Gordon
` (2 preceding siblings ...)
2013-04-17 11:52 ` [PATCH 03/10] KVM: nVMX: Introduce vmread and vmwrite bitmaps Abel Gordon
@ 2013-04-17 11:52 ` Abel Gordon
2013-04-17 11:53 ` [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs Abel Gordon
` (5 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 11:52 UTC (permalink / raw)
To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg
Refactor existing code so we can re-use vmcs12_write_any to copy fields from
the shadow vmcs specified by the link pointer (used by the processor,
implementation-specific) to the VMCS12 software format used by L0 to hold
the fields in L1 memory address space.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
arch/x86/kvm/vmx.c | 52 +++++++++++++++++++++++--------------------
1 file changed, 28 insertions(+), 24 deletions(-)
--- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
@@ -5835,6 +5835,33 @@ static inline bool vmcs12_read_any(struc
}
}
+
+static inline bool vmcs12_write_any(struct kvm_vcpu *vcpu,
+ unsigned long field, u64 field_value){
+ short offset = vmcs_field_to_offset(field);
+ char *p = ((char *) get_vmcs12(vcpu)) + offset;
+ if (offset < 0)
+ return false;
+
+ switch (vmcs_field_type(field)) {
+ case VMCS_FIELD_TYPE_U16:
+ *(u16 *)p = field_value;
+ return true;
+ case VMCS_FIELD_TYPE_U32:
+ *(u32 *)p = field_value;
+ return true;
+ case VMCS_FIELD_TYPE_U64:
+ *(u64 *)p = field_value;
+ return true;
+ case VMCS_FIELD_TYPE_NATURAL_WIDTH:
+ *(natural_width *)p = field_value;
+ return true;
+ default:
+ return false; /* can never happen. */
+ }
+
+}
+
/*
* VMX instructions which assume a current vmcs12 (i.e., that VMPTRLD was
* used before) all generate the same failure when it is missing.
@@ -5899,8 +5926,6 @@ static int handle_vmwrite(struct kvm_vcp
gva_t gva;
unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
- char *p;
- short offset;
/* The value to write might be 32 or 64 bits, depending on L1's long
* mode, and eventually we need to write that into a field of several
* possible lengths. The code below first zero-extends the value to 64
@@ -5937,28 +5962,7 @@ static int handle_vmwrite(struct kvm_vcp
return 1;
}
- offset = vmcs_field_to_offset(field);
- if (offset < 0) {
- nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
- skip_emulated_instruction(vcpu);
- return 1;
- }
- p = ((char *) get_vmcs12(vcpu)) + offset;
-
- switch (vmcs_field_type(field)) {
- case VMCS_FIELD_TYPE_U16:
- *(u16 *)p = field_value;
- break;
- case VMCS_FIELD_TYPE_U32:
- *(u32 *)p = field_value;
- break;
- case VMCS_FIELD_TYPE_U64:
- *(u64 *)p = field_value;
- break;
- case VMCS_FIELD_TYPE_NATURAL_WIDTH:
- *(natural_width *)p = field_value;
- break;
- default:
+ if (!vmcs12_write_any(vcpu, field, field_value)) {
nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
skip_emulated_instruction(vcpu);
return 1;
* [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs
2013-04-17 11:50 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v2 Abel Gordon
` (3 preceding siblings ...)
2013-04-17 11:52 ` [PATCH 04/10] KVM: nVMX: Refactor handle_vmwrite Abel Gordon
@ 2013-04-17 11:53 ` Abel Gordon
2013-04-17 14:10 ` Gleb Natapov
2013-04-17 11:53 ` [PATCH 06/10] KVM: nVMX: Release " Abel Gordon
` (4 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 11:53 UTC (permalink / raw)
To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg
Allocate a shadow vmcs used by the processor to shadow part of the fields
stored in the software defined VMCS12 (let L1 access fields without causing
exits). Note we keep a shadow vmcs only for the current vmcs12. Once a vmcs12
becomes non-current, its shadow vmcs is released.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
arch/x86/kvm/vmx.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
--- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
@@ -355,6 +355,7 @@ struct nested_vmx {
/* The host-usable pointer to the above */
struct page *current_vmcs12_page;
struct vmcs12 *current_vmcs12;
+ struct vmcs *current_shadow_vmcs;
/* vmcs02_list cache of VMCSs recently used to run L2 guests */
struct list_head vmcs02_pool;
@@ -5980,6 +5981,7 @@ static int handle_vmptrld(struct kvm_vcp
gva_t gva;
gpa_t vmptr;
struct x86_exception e;
+ struct vmcs *shadow_vmcs;
if (!nested_vmx_check_permission(vcpu))
return 1;
@@ -6026,6 +6028,19 @@ static int handle_vmptrld(struct kvm_vcp
vmx->nested.current_vmptr = vmptr;
vmx->nested.current_vmcs12 = new_vmcs12;
vmx->nested.current_vmcs12_page = page;
+ if (enable_shadow_vmcs) {
+ shadow_vmcs = alloc_vmcs();
+ if (!shadow_vmcs) {
+ nested_vmx_failInvalid(vcpu);
+ skip_emulated_instruction(vcpu);
+ return 1;
+ }
+ /* mark vmcs as shadow */
+ shadow_vmcs->revision_id |= (1u << 31);
+ /* init shadow vmcs */
+ vmcs_clear(shadow_vmcs);
+ vmx->nested.current_shadow_vmcs = shadow_vmcs;
+ }
}
nested_vmx_succeed(vcpu);
* [PATCH 06/10] KVM: nVMX: Release shadow vmcs
2013-04-17 11:50 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v2 Abel Gordon
` (4 preceding siblings ...)
2013-04-17 11:53 ` [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs Abel Gordon
@ 2013-04-17 11:53 ` Abel Gordon
2013-04-17 11:54 ` [PATCH 07/10] KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12 Abel Gordon
` (3 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 11:53 UTC (permalink / raw)
To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg
Unmap vmcs12 and release the corresponding shadow vmcs
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
arch/x86/kvm/vmx.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
--- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
@@ -5581,6 +5581,17 @@ static int nested_vmx_check_permission(s
return 1;
}
+static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
+{
+ if (enable_shadow_vmcs) {
+ if (vmx->nested.current_vmcs12 != NULL) {
+ free_vmcs(vmx->nested.current_shadow_vmcs);
+ }
+ }
+ kunmap(vmx->nested.current_vmcs12_page);
+ nested_release_page(vmx->nested.current_vmcs12_page);
+}
+
/*
* Free whatever needs to be freed from vmx->nested when L1 goes down, or
* just stops using VMX.
@@ -5591,8 +5602,7 @@ static void free_nested(struct vcpu_vmx
return;
vmx->nested.vmxon = false;
if (vmx->nested.current_vmptr != -1ull) {
- kunmap(vmx->nested.current_vmcs12_page);
- nested_release_page(vmx->nested.current_vmcs12_page);
+ nested_release_vmcs12(vmx);
vmx->nested.current_vmptr = -1ull;
vmx->nested.current_vmcs12 = NULL;
}
@@ -5736,8 +5746,7 @@ static int handle_vmclear(struct kvm_vcp
}
if (vmptr == vmx->nested.current_vmptr) {
- kunmap(vmx->nested.current_vmcs12_page);
- nested_release_page(vmx->nested.current_vmcs12_page);
+ nested_release_vmcs12(vmx);
vmx->nested.current_vmptr = -1ull;
vmx->nested.current_vmcs12 = NULL;
}
@@ -6020,10 +6029,8 @@ static int handle_vmptrld(struct kvm_vcp
skip_emulated_instruction(vcpu);
return 1;
}
- if (vmx->nested.current_vmptr != -1ull) {
- kunmap(vmx->nested.current_vmcs12_page);
- nested_release_page(vmx->nested.current_vmcs12_page);
- }
+ if (vmx->nested.current_vmptr != -1ull)
+ nested_release_vmcs12(vmx);
vmx->nested.current_vmptr = vmptr;
vmx->nested.current_vmcs12 = new_vmcs12;
* [PATCH 07/10] KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12
2013-04-17 11:50 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v2 Abel Gordon
` (5 preceding siblings ...)
2013-04-17 11:53 ` [PATCH 06/10] KVM: nVMX: Release " Abel Gordon
@ 2013-04-17 11:54 ` Abel Gordon
2013-04-17 11:54 ` [PATCH 08/10] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs Abel Gordon
` (2 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 11:54 UTC (permalink / raw)
To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg
Introduce a function used to copy fields from the processor-specific shadow
vmcs to the software controlled VMCS12
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
arch/x86/kvm/vmx.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
--- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
@@ -713,6 +713,7 @@ static void vmx_get_segment(struct kvm_v
static bool guest_state_valid(struct kvm_vcpu *vcpu);
static u32 vmx_segment_access_rights(struct kvm_segment *var);
static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
+static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
static DEFINE_PER_CPU(struct vmcs *, vmxarea);
static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -5872,6 +5873,40 @@ static inline bool vmcs12_write_any(stru
}
+static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
+{
+ int i;
+ unsigned long field;
+ u64 field_value;
+ struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
+ unsigned long *fields = (unsigned long *)shadow_read_write_fields;
+ int num_fields = max_shadow_read_write_fields;
+
+ vmcs_load(shadow_vmcs);
+
+ for (i = 0; i < num_fields; i++) {
+ field = fields[i];
+ switch (vmcs_field_type(field)) {
+ case VMCS_FIELD_TYPE_U16:
+ field_value = vmcs_read16(field);
+ break;
+ case VMCS_FIELD_TYPE_U32:
+ field_value = vmcs_read32(field);
+ break;
+ case VMCS_FIELD_TYPE_U64:
+ field_value = vmcs_read64(field);
+ break;
+ case VMCS_FIELD_TYPE_NATURAL_WIDTH:
+ field_value = vmcs_readl(field);
+ break;
+ }
+ vmcs12_write_any(&vmx->vcpu, field, field_value);
+ }
+
+ vmcs_clear(shadow_vmcs);
+ vmcs_load(vmx->loaded_vmcs->vmcs);
+}
+
/*
* VMX instructions which assume a current vmcs12 (i.e., that VMPTRLD was
* used before) all generate the same failure when it is missing.
* [PATCH 08/10] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs
2013-04-17 11:50 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v2 Abel Gordon
` (6 preceding siblings ...)
2013-04-17 11:54 ` [PATCH 07/10] KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12 Abel Gordon
@ 2013-04-17 11:54 ` Abel Gordon
2013-04-17 11:55 ` [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the " Abel Gordon
2013-04-17 11:55 ` [PATCH 10/10] KVM: nVMX: Enable and disable shadow vmcs functionality Abel Gordon
9 siblings, 0 replies; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 11:54 UTC (permalink / raw)
To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg
Introduce a function used to copy fields from the software controlled VMCS12
to the processor-specific shadow vmcs
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
arch/x86/kvm/vmx.c | 45 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
--- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
@@ -713,6 +713,7 @@ static void vmx_get_segment(struct kvm_v
static bool guest_state_valid(struct kvm_vcpu *vcpu);
static u32 vmx_segment_access_rights(struct kvm_segment *var);
static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
+static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
static DEFINE_PER_CPU(struct vmcs *, vmxarea);
@@ -5907,6 +5908,50 @@ static void copy_shadow_to_vmcs12(struct
vmcs_load(vmx->loaded_vmcs->vmcs);
}
+static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
+{
+ unsigned long *fields[] = {
+ (unsigned long *)shadow_read_write_fields,
+ (unsigned long *)shadow_read_only_fields
+ };
+ int num_lists = ARRAY_SIZE(fields);
+ int max_fields[] = {
+ max_shadow_read_write_fields,
+ max_shadow_read_only_fields
+ };
+ int i, q;
+ unsigned long field;
+ u64 field_value = 0;
+ struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
+
+ vmcs_load(shadow_vmcs);
+
+ for (q = 0; q < num_lists; q++) {
+ for (i = 0; i < max_fields[q]; i++) {
+ field = fields[q][i];
+ vmcs12_read_any(&vmx->vcpu, field, &field_value);
+
+ switch (vmcs_field_type(field)) {
+ case VMCS_FIELD_TYPE_U16:
+ vmcs_write16(field, (u16)field_value);
+ break;
+ case VMCS_FIELD_TYPE_U32:
+ vmcs_write32(field, (u32)field_value);
+ break;
+ case VMCS_FIELD_TYPE_U64:
+ vmcs_write64(field, (u64)field_value);
+ break;
+ case VMCS_FIELD_TYPE_NATURAL_WIDTH:
+ vmcs_writel(field, (long)field_value);
+ break;
+ }
+ }
+ }
+
+ vmcs_clear(shadow_vmcs);
+ vmcs_load(vmx->loaded_vmcs->vmcs);
+}
+
/*
* VMX instructions which assume a current vmcs12 (i.e., that VMPTRLD was
* used before) all generate the same failure when it is missing.
* [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
2013-04-17 11:50 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v2 Abel Gordon
` (7 preceding siblings ...)
2013-04-17 11:54 ` [PATCH 08/10] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs Abel Gordon
@ 2013-04-17 11:55 ` Abel Gordon
2013-04-17 14:34 ` Gleb Natapov
2013-04-17 11:55 ` [PATCH 10/10] KVM: nVMX: Enable and disable shadow vmcs functionality Abel Gordon
9 siblings, 1 reply; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 11:55 UTC (permalink / raw)
To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg
Synchronize between the VMCS12 software controlled structure and the
processor-specific shadow vmcs
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
arch/x86/kvm/vmx.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
--- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
@@ -356,6 +356,11 @@ struct nested_vmx {
struct page *current_vmcs12_page;
struct vmcs12 *current_vmcs12;
struct vmcs *current_shadow_vmcs;
+ /*
+ * Indicates if the shadow vmcs must be updated with the
+	 * data held by vmcs12
+ */
+ bool sync_shadow_vmcs;
/* vmcs02_list cache of VMCSs recently used to run L2 guests */
struct list_head vmcs02_pool;
@@ -5587,6 +5592,10 @@ static inline void nested_release_vmcs12
{
if (enable_shadow_vmcs) {
if (vmx->nested.current_vmcs12 != NULL) {
+ /* copy to memory all shadowed fields in case
+ they were modified */
+ copy_shadow_to_vmcs12(vmx);
+ vmx->nested.sync_shadow_vmcs = false;
free_vmcs(vmx->nested.current_shadow_vmcs);
}
}
@@ -5716,6 +5725,10 @@ static void nested_vmx_failValid(struct
X86_EFLAGS_SF | X86_EFLAGS_OF))
| X86_EFLAGS_ZF);
get_vmcs12(vcpu)->vm_instruction_error = vm_instruction_error;
+ /*
+ * We don't need to force a shadow sync because
+	 * VM_INSTRUCTION_ERROR is not shadowed
+ */
}
/* Emulate the VMCLEAR instruction */
@@ -6127,6 +6140,7 @@ static int handle_vmptrld(struct kvm_vcp
/* init shadow vmcs */
vmcs_clear(shadow_vmcs);
vmx->nested.current_shadow_vmcs = shadow_vmcs;
+ vmx->nested.sync_shadow_vmcs = true;
}
}
@@ -6876,6 +6890,10 @@ static void __noclone vmx_vcpu_run(struc
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
unsigned long debugctlmsr;
+ if (vmx->nested.sync_shadow_vmcs) {
+ copy_vmcs12_to_shadow(vmx);
+ vmx->nested.sync_shadow_vmcs = false;
+ }
/* Record the guest's net vcpu time for enforced NMI injections. */
if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
@@ -7496,6 +7514,8 @@ static int nested_vmx_run(struct kvm_vcp
skip_emulated_instruction(vcpu);
vmcs12 = get_vmcs12(vcpu);
+ if (enable_shadow_vmcs)
+ copy_shadow_to_vmcs12(vmx);
/*
* The nested entry process starts with enforcing various prerequisites
* on vmcs12 as required by the Intel SDM, and act appropriately when
@@ -7938,6 +7958,8 @@ static void nested_vmx_vmexit(struct kvm
nested_vmx_failValid(vcpu, vmcs_read32(VM_INSTRUCTION_ERROR));
} else
nested_vmx_succeed(vcpu);
+ if (enable_shadow_vmcs)
+ vmx->nested.sync_shadow_vmcs = true;
}
/*
@@ -7955,6 +7977,8 @@ static void nested_vmx_entry_failure(str
vmcs12->vm_exit_reason = reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
vmcs12->exit_qualification = qualification;
nested_vmx_succeed(vcpu);
+ if (enable_shadow_vmcs)
+ to_vmx(vcpu)->nested.sync_shadow_vmcs = true;
}
static int vmx_check_intercept(struct kvm_vcpu *vcpu,
* [PATCH 10/10] KVM: nVMX: Enable and disable shadow vmcs functionality
2013-04-17 11:50 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v2 Abel Gordon
` (8 preceding siblings ...)
2013-04-17 11:55 ` [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the " Abel Gordon
@ 2013-04-17 11:55 ` Abel Gordon
2013-04-17 14:41 ` Gleb Natapov
9 siblings, 1 reply; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 11:55 UTC (permalink / raw)
To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg
Once L1 loads VMCS12 we enable shadow-vmcs capability and copy all the VMCS12
shadowed fields to the shadow vmcs. When we release the VMCS12, we also
disable shadow-vmcs capability.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
arch/x86/kvm/vmx.c | 11 +++++++++++
1 file changed, 11 insertions(+)
--- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
@@ -5590,12 +5590,17 @@ static int nested_vmx_check_permission(s
static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
{
+ u32 exec_control;
if (enable_shadow_vmcs) {
if (vmx->nested.current_vmcs12 != NULL) {
/* copy to memory all shadowed fields in case
they were modified */
copy_shadow_to_vmcs12(vmx);
vmx->nested.sync_shadow_vmcs = false;
+ exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+ exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
+ vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
+ vmcs_write64(VMCS_LINK_POINTER, -1ull);
free_vmcs(vmx->nested.current_shadow_vmcs);
}
}
@@ -6084,6 +6089,7 @@ static int handle_vmptrld(struct kvm_vcp
gpa_t vmptr;
struct x86_exception e;
struct vmcs *shadow_vmcs;
+ u32 exec_control;
if (!nested_vmx_check_permission(vcpu))
return 1;
@@ -6140,6 +6146,11 @@ static int handle_vmptrld(struct kvm_vcp
/* init shadow vmcs */
vmcs_clear(shadow_vmcs);
vmx->nested.current_shadow_vmcs = shadow_vmcs;
+ exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+ exec_control |= SECONDARY_EXEC_SHADOW_VMCS;
+ vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
+ vmcs_write64(VMCS_LINK_POINTER,
+ __pa(shadow_vmcs));
vmx->nested.sync_shadow_vmcs = true;
}
}
* Re: [PATCH 02/10] KVM: nVMX: Detect shadow-vmcs capability
2013-04-17 11:51 ` [PATCH 02/10] KVM: nVMX: Detect shadow-vmcs capability Abel Gordon
@ 2013-04-17 13:51 ` Gleb Natapov
2013-04-17 14:33 ` Abel Gordon
0 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-04-17 13:51 UTC (permalink / raw)
To: Abel Gordon; +Cc: kvm, owasserm, nadav, jun.nakajima, dongxiao.xu
On Wed, Apr 17, 2013 at 02:51:40PM +0300, Abel Gordon wrote:
> Add logic required to detect if shadow-vmcs is supported by the
> processor. Introduce a new kernel module parameter to specify if L0 should use
> shadow vmcs (or not) to run L1.
>
> Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> ---
> arch/x86/kvm/vmx.c | 25 ++++++++++++++++++++++++-
> 1 file changed, 24 insertions(+), 1 deletion(-)
>
> --- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:49.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
> @@ -87,6 +87,8 @@ module_param(fasteoi, bool, S_IRUGO);
> static bool __read_mostly enable_apicv = 1;
> module_param(enable_apicv, bool, S_IRUGO);
>
> +static bool __read_mostly enable_shadow_vmcs = 1;
> +module_param_named(enable_shadow_vmcs, enable_shadow_vmcs, bool, S_IRUGO);
> /*
> * If nested=1, nested virtualization is supported, i.e., guests may use
> * VMX and be a hypervisor for its own guests. If nested=0, guests may not
> @@ -940,6 +942,18 @@ static inline bool cpu_has_vmx_wbinvd_ex
> SECONDARY_EXEC_WBINVD_EXITING;
> }
>
> +static inline bool cpu_has_vmx_shadow_vmcs(void)
> +{
> + u64 vmx_msr;
> + rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
> + /* check if the cpu supports writing r/o exit information fields */
> + if (!(vmx_msr & (1u << 29)))
I think you were going to use MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS
here.
> + return false;
> +
> + return vmcs_config.cpu_based_2nd_exec_ctrl &
> + SECONDARY_EXEC_SHADOW_VMCS;
> +}
> +
> static inline bool report_flexpriority(void)
> {
> return flexpriority_enabled;
> @@ -2632,7 +2646,8 @@ static __init int setup_vmcs_config(stru
> SECONDARY_EXEC_RDTSCP |
> SECONDARY_EXEC_ENABLE_INVPCID |
> SECONDARY_EXEC_APIC_REGISTER_VIRT |
> - SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
> + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
> + SECONDARY_EXEC_SHADOW_VMCS;
> if (adjust_vmx_controls(min2, opt2,
> MSR_IA32_VMX_PROCBASED_CTLS2,
> &_cpu_based_2nd_exec_control) < 0)
> @@ -2833,6 +2848,8 @@ static __init int hardware_setup(void)
>
> if (!cpu_has_vmx_vpid())
> enable_vpid = 0;
> + if (!cpu_has_vmx_shadow_vmcs())
> + enable_shadow_vmcs = 0;
>
> if (!cpu_has_vmx_ept() ||
> !cpu_has_vmx_ept_4levels()) {
> @@ -4075,6 +4092,12 @@ static u32 vmx_secondary_exec_control(st
> exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
> exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
> + /* SECONDARY_EXEC_SHADOW_VMCS is enabled when L1 executes VMPTRLD
> + (handle_vmptrld).
> + We can NOT enable shadow_vmcs here because we don't have yet
> + a current VMCS12
> + */
> + exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
> return exec_control;
> }
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Gleb.
* Re: [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs
2013-04-17 11:53 ` [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs Abel Gordon
@ 2013-04-17 14:10 ` Gleb Natapov
2013-04-17 14:41 ` Abel Gordon
0 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-04-17 14:10 UTC (permalink / raw)
To: Abel Gordon; +Cc: kvm, owasserm, nadav, jun.nakajima, dongxiao.xu
On Wed, Apr 17, 2013 at 02:53:10PM +0300, Abel Gordon wrote:
> Allocate a shadow vmcs used by the processor to shadow part of the fields
> stored in the software defined VMCS12 (let L1 access fields without causing
> exits). Note we keep a shadow vmcs only for the current vmcs12. Once a vmcs12
> becomes non-current, its shadow vmcs is released.
>
>
> Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> ---
> arch/x86/kvm/vmx.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> --- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
> @@ -355,6 +355,7 @@ struct nested_vmx {
> /* The host-usable pointer to the above */
> struct page *current_vmcs12_page;
> struct vmcs12 *current_vmcs12;
> + struct vmcs *current_shadow_vmcs;
>
> /* vmcs02_list cache of VMCSs recently used to run L2 guests */
> struct list_head vmcs02_pool;
> @@ -5980,6 +5981,7 @@ static int handle_vmptrld(struct kvm_vcp
> gva_t gva;
> gpa_t vmptr;
> struct x86_exception e;
> + struct vmcs *shadow_vmcs;
>
> if (!nested_vmx_check_permission(vcpu))
> return 1;
> @@ -6026,6 +6028,19 @@ static int handle_vmptrld(struct kvm_vcp
> vmx->nested.current_vmptr = vmptr;
> vmx->nested.current_vmcs12 = new_vmcs12;
> vmx->nested.current_vmcs12_page = page;
> + if (enable_shadow_vmcs) {
> + shadow_vmcs = alloc_vmcs();
The next patch frees vmx->nested.current_shadow_vmcs a couple of lines above.
What about reusing the previous page instead of allocating a new one each
time?
> + if (!shadow_vmcs) {
> + nested_vmx_failInvalid(vcpu);
> + skip_emulated_instruction(vcpu);
> + return 1;
> + }
> + /* mark vmcs as shadow */
> + shadow_vmcs->revision_id |= (1u << 31);
> + /* init shadow vmcs */
> + vmcs_clear(shadow_vmcs);
> + vmx->nested.current_shadow_vmcs = shadow_vmcs;
> + }
> }
>
> nested_vmx_succeed(vcpu);
>
> --
--
Gleb.
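The `shadow_vmcs->revision_id |= (1u << 31)` line reviewed above is what distinguishes a shadow VMCS from an ordinary one: bit 31 of the 32-bit revision identifier at the start of the VMCS region marks the region as a shadow. A standalone sketch of just that marking step, using an illustrative header struct rather than the kernel's struct vmcs:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout: a real VMCS region begins with a 32-bit revision
 * identifier; bit 31 of it marks the region as a shadow VMCS. */
struct vmcs_header {
	uint32_t revision_id;
};

/* Mirrors "shadow_vmcs->revision_id |= (1u << 31)" from the patch. */
static void mark_vmcs_shadow(struct vmcs_header *vmcs)
{
	vmcs->revision_id |= (1u << 31);
}

static int is_shadow_vmcs(const struct vmcs_header *vmcs)
{
	return (vmcs->revision_id & (1u << 31)) != 0;
}
```

Note the low 31 bits (the actual revision number reported by IA32_VMX_BASIC) are left untouched by the marking.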
* Re: [PATCH 02/10] KVM: nVMX: Detect shadow-vmcs capability
2013-04-17 13:51 ` Gleb Natapov
@ 2013-04-17 14:33 ` Abel Gordon
0 siblings, 0 replies; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 14:33 UTC (permalink / raw)
To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm
Gleb Natapov <gleb@redhat.com> wrote on 17/04/2013 04:51:16 PM:
> > +static inline bool cpu_has_vmx_shadow_vmcs(void)
> > +{
> > + u64 vmx_msr;
> > + rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
> > + /* check if the cpu supports writing r/o exit information fields */
> > + if (!(vmx_msr & (1u << 29)))
> I think you were going to use MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS
> here.
>
Right, will be fixed. Sorry :)
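For reference, the corrected check boils down to testing bit 29 of IA32_VMX_MISC through the named constant instead of the magic number. A minimal standalone sketch of just that predicate (the rdmsrl() read is replaced by a plain parameter so the snippet is self-contained; the constant mirrors the kernel define discussed above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 29 of IA32_VMX_MISC reports whether vmwrite to the read-only
 * VM-exit information fields is supported, a prerequisite for shadow
 * VMCS.  Mirrors the kernel's named define. */
#define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)

/* Illustrative stand-in for the rdmsrl()-based check: the MSR value is
 * passed in as a parameter instead of read from the real MSR. */
static bool misc_supports_shadow_ro_fields(uint64_t vmx_misc)
{
	return (vmx_misc & MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS) != 0;
}
```

In the kernel itself this predicate is only half of cpu_has_vmx_shadow_vmcs(); the other half checks SECONDARY_EXEC_SHADOW_VMCS in the adjusted secondary controls, as the patch shows.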
* Re: [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
2013-04-17 11:55 ` [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the " Abel Gordon
@ 2013-04-17 14:34 ` Gleb Natapov
2013-04-17 14:59 ` Abel Gordon
0 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-04-17 14:34 UTC (permalink / raw)
To: Abel Gordon; +Cc: kvm, owasserm, nadav, jun.nakajima, dongxiao.xu
On Wed, Apr 17, 2013 at 02:55:10PM +0300, Abel Gordon wrote:
> Synchronize between the VMCS12 software controlled structure and the
> processor-specific shadow vmcs
>
> Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> ---
> arch/x86/kvm/vmx.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> --- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
> @@ -356,6 +356,11 @@ struct nested_vmx {
> struct page *current_vmcs12_page;
> struct vmcs12 *current_vmcs12;
> struct vmcs *current_shadow_vmcs;
> + /*
> + * Indicates if the shadow vmcs must be updated with the
> + * data hold by vmcs12
> + */
> + bool sync_shadow_vmcs;
>
> /* vmcs02_list cache of VMCSs recently used to run L2 guests */
> struct list_head vmcs02_pool;
> @@ -5587,6 +5592,10 @@ static inline void nested_release_vmcs12
> {
> if (enable_shadow_vmcs) {
> if (vmx->nested.current_vmcs12 != NULL) {
> + /* copy to memory all shadowed fields in case
> + they were modified */
> + copy_shadow_to_vmcs12(vmx);
> + vmx->nested.sync_shadow_vmcs = false;
> free_vmcs(vmx->nested.current_shadow_vmcs);
> }
> }
> @@ -5716,6 +5725,10 @@ static void nested_vmx_failValid(struct
> X86_EFLAGS_SF | X86_EFLAGS_OF))
> | X86_EFLAGS_ZF);
> get_vmcs12(vcpu)->vm_instruction_error = vm_instruction_error;
> + /*
> + * We don't need to force a shadow sync because
> + * VM_INSTRUCTION_ERROR is not shdowed
---------------------------------------^ shadowed.
But let's just request sync. This is a slow path anyway.
> + */
> }
>
> /* Emulate the VMCLEAR instruction */
> @@ -6127,6 +6140,7 @@ static int handle_vmptrld(struct kvm_vcp
> /* init shadow vmcs */
> vmcs_clear(shadow_vmcs);
> vmx->nested.current_shadow_vmcs = shadow_vmcs;
> + vmx->nested.sync_shadow_vmcs = true;
> }
> }
>
> @@ -6876,6 +6890,10 @@ static void __noclone vmx_vcpu_run(struc
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> unsigned long debugctlmsr;
Leave free line here and move it after if(vmx->emulation_required).
> + if (vmx->nested.sync_shadow_vmcs) {
> + copy_vmcs12_to_shadow(vmx);
> + vmx->nested.sync_shadow_vmcs = false;
> + }
>
> /* Record the guest's net vcpu time for enforced NMI injections. */
> if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
> @@ -7496,6 +7514,8 @@ static int nested_vmx_run(struct kvm_vcp
> skip_emulated_instruction(vcpu);
> vmcs12 = get_vmcs12(vcpu);
>
> + if (enable_shadow_vmcs)
> + copy_shadow_to_vmcs12(vmx);
And free line here.
> /*
> * The nested entry process starts with enforcing various prerequisites
> * on vmcs12 as required by the Intel SDM, and act appropriately when
> @@ -7938,6 +7958,8 @@ static void nested_vmx_vmexit(struct kvm
> nested_vmx_failValid(vcpu, vmcs_read32(VM_INSTRUCTION_ERROR));
> } else
> nested_vmx_succeed(vcpu);
> + if (enable_shadow_vmcs)
> + vmx->nested.sync_shadow_vmcs = true;
> }
>
> /*
> @@ -7955,6 +7977,8 @@ static void nested_vmx_entry_failure(str
> vmcs12->vm_exit_reason = reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
> vmcs12->exit_qualification = qualification;
> nested_vmx_succeed(vcpu);
> + if (enable_shadow_vmcs)
> + to_vmx(vcpu)->nested.sync_shadow_vmcs = true;
> }
>
> static int vmx_check_intercept(struct kvm_vcpu *vcpu,
>
> --
--
Gleb.
* Re: [PATCH 10/10] KVM: nVMX: Enable and disable shadow vmcs functionality
2013-04-17 11:55 ` [PATCH 10/10] KVM: nVMX: Enable and disable shadow vmcs functionality Abel Gordon
@ 2013-04-17 14:41 ` Gleb Natapov
2013-04-17 15:18 ` Abel Gordon
0 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-04-17 14:41 UTC (permalink / raw)
To: Abel Gordon; +Cc: kvm, owasserm, nadav, jun.nakajima, dongxiao.xu
On Wed, Apr 17, 2013 at 02:55:40PM +0300, Abel Gordon wrote:
> Once L1 loads VMCS12 we enable shadow-vmcs capability and copy all the VMCS12
> shadowed fields to the shadow vmcs. When we release the VMCS12, we also
> disable shadow-vmcs capability.
>
> Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> ---
> arch/x86/kvm/vmx.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> --- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
> @@ -5590,12 +5590,17 @@ static int nested_vmx_check_permission(s
>
> static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
> {
> + u32 exec_control;
> if (enable_shadow_vmcs) {
> if (vmx->nested.current_vmcs12 != NULL) {
> /* copy to memory all shadowed fields in case
> they were modified */
> copy_shadow_to_vmcs12(vmx);
> vmx->nested.sync_shadow_vmcs = false;
> + exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> + exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
> + vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
> + vmcs_write64(VMCS_LINK_POINTER, -1ull);
> free_vmcs(vmx->nested.current_shadow_vmcs);
> }
> }
> @@ -6084,6 +6089,7 @@ static int handle_vmptrld(struct kvm_vcp
> gpa_t vmptr;
> struct x86_exception e;
> struct vmcs *shadow_vmcs;
> + u32 exec_control;
>
> if (!nested_vmx_check_permission(vcpu))
> return 1;
> @@ -6140,6 +6146,11 @@ static int handle_vmptrld(struct kvm_vcp
> /* init shadow vmcs */
> vmcs_clear(shadow_vmcs);
> vmx->nested.current_shadow_vmcs = shadow_vmcs;
> + exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> + exec_control |= SECONDARY_EXEC_SHADOW_VMCS;
> + vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
> + vmcs_write64(VMCS_LINK_POINTER,
> + __pa(shadow_vmcs));
How hard would it be to disable shadowing for individual vmcs if shadow
vmcs allocation fails? It bothers me a little that we can fail a perfectly
valid vmptrld() because of a failed allocation.
> vmx->nested.sync_shadow_vmcs = true;
> }
> }
>
> --
--
Gleb.
* Re: [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs
2013-04-17 14:10 ` Gleb Natapov
@ 2013-04-17 14:41 ` Abel Gordon
2013-04-17 14:44 ` Gleb Natapov
0 siblings, 1 reply; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 14:41 UTC (permalink / raw)
To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm
Gleb Natapov <gleb@redhat.com> wrote on 17/04/2013 05:10:28 PM:
> On Wed, Apr 17, 2013 at 02:53:10PM +0300, Abel Gordon wrote:
> > Allocate a shadow vmcs used by the processor to shadow part of the fields
> > stored in the software defined VMCS12 (let L1 access fields without causing
> > exits). Note we keep a shadow vmcs only for the current vmcs12.
> > Once a vmcs12 becomes non-current, its shadow vmcs is released.
> >
> >
> > Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> > ---
> > arch/x86/kvm/vmx.c | 15 +++++++++++++++
> > 1 file changed, 15 insertions(+)
> >
> > --- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
> > +++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
> > @@ -355,6 +355,7 @@ struct nested_vmx {
> > /* The host-usable pointer to the above */
> > struct page *current_vmcs12_page;
> > struct vmcs12 *current_vmcs12;
> > + struct vmcs *current_shadow_vmcs;
> >
> > /* vmcs02_list cache of VMCSs recently used to run L2 guests */
> > struct list_head vmcs02_pool;
> > @@ -5980,6 +5981,7 @@ static int handle_vmptrld(struct kvm_vcp
> > gva_t gva;
> > gpa_t vmptr;
> > struct x86_exception e;
> > + struct vmcs *shadow_vmcs;
> >
> > if (!nested_vmx_check_permission(vcpu))
> > return 1;
> > @@ -6026,6 +6028,19 @@ static int handle_vmptrld(struct kvm_vcp
> > vmx->nested.current_vmptr = vmptr;
> > vmx->nested.current_vmcs12 = new_vmcs12;
> > vmx->nested.current_vmcs12_page = page;
> > + if (enable_shadow_vmcs) {
> > + shadow_vmcs = alloc_vmcs();
> The next patch frees vmx->nested.current_shadow_vmcs a couple of lines above.
> What about reusing the previous page instead of allocating a new one each
> time?
Yes, we could have a single shadow vmcs per L1 vcpu that is used to shadow
multiple L2 vcpus. I preferred not to do that because I didn't want to
share the same page (physical vmcs) for different vmcs12s. However, this is
not an issue because we overwrite the shadowed fields every time we sync
the content.
It's your call. If you prefer to re-use, I'll send a new version that
does that. Please confirm.
* Re: [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs
2013-04-17 14:41 ` Abel Gordon
@ 2013-04-17 14:44 ` Gleb Natapov
0 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-04-17 14:44 UTC (permalink / raw)
To: Abel Gordon; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm
On Wed, Apr 17, 2013 at 05:41:19PM +0300, Abel Gordon wrote:
>
>
> Gleb Natapov <gleb@redhat.com> wrote on 17/04/2013 05:10:28 PM:
>
>
> > On Wed, Apr 17, 2013 at 02:53:10PM +0300, Abel Gordon wrote:
> > > Allocate a shadow vmcs used by the processor to shadow part of the fields
> > > stored in the software defined VMCS12 (let L1 access fields without causing
> > > exits). Note we keep a shadow vmcs only for the current vmcs12.
> > > Once a vmcs12 becomes non-current, its shadow vmcs is released.
> > >
> > >
> > > Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> > > ---
> > > arch/x86/kvm/vmx.c | 15 +++++++++++++++
> > > 1 file changed, 15 insertions(+)
> > >
> > > --- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
> > > +++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:50.000000000 +0300
> > > @@ -355,6 +355,7 @@ struct nested_vmx {
> > > /* The host-usable pointer to the above */
> > > struct page *current_vmcs12_page;
> > > struct vmcs12 *current_vmcs12;
> > > + struct vmcs *current_shadow_vmcs;
> > >
> > > /* vmcs02_list cache of VMCSs recently used to run L2 guests */
> > > struct list_head vmcs02_pool;
> > > @@ -5980,6 +5981,7 @@ static int handle_vmptrld(struct kvm_vcp
> > > gva_t gva;
> > > gpa_t vmptr;
> > > struct x86_exception e;
> > > + struct vmcs *shadow_vmcs;
> > >
> > > if (!nested_vmx_check_permission(vcpu))
> > > return 1;
> > > @@ -6026,6 +6028,19 @@ static int handle_vmptrld(struct kvm_vcp
> > > vmx->nested.current_vmptr = vmptr;
> > > vmx->nested.current_vmcs12 = new_vmcs12;
> > > vmx->nested.current_vmcs12_page = page;
> > > + if (enable_shadow_vmcs) {
> > > + shadow_vmcs = alloc_vmcs();
> > The next patch frees vmx->nested.current_shadow_vmcs a couple of lines above.
> > What about reusing the previous page instead of allocating a new one each
> > time?
>
> Yes, we could have a single shadow vmcs per L1 vcpu that is used to shadow
> multiple L2 vcpus. I preferred not to do that because I didn't want to
> share the same page (physical vmcs) for different vmcs12s. However, this is
> not an issue because we overwrite the shadowed fields every time we sync
> the content.
> It's your call. If you prefer to re-use, I'll send a new version that
> does that. Please confirm.
Yes, I prefer to reuse explicitly. There is a big chance that the same
page will be reused even if you go through freeing and allocation.
--
Gleb.
* Re: [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
2013-04-17 14:34 ` Gleb Natapov
@ 2013-04-17 14:59 ` Abel Gordon
2013-04-17 15:39 ` Gleb Natapov
0 siblings, 1 reply; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 14:59 UTC (permalink / raw)
To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm
Gleb Natapov <gleb@redhat.com> wrote on 17/04/2013 05:34:49 PM:
> > @@ -5716,6 +5725,10 @@ static void nested_vmx_failValid(struct
> > X86_EFLAGS_SF | X86_EFLAGS_OF))
> > | X86_EFLAGS_ZF);
> > get_vmcs12(vcpu)->vm_instruction_error = vm_instruction_error;
> > + /*
> > + * We don't need to force a shadow sync because
> > + * VM_INSTRUCTION_ERROR is not shdowed
> ---------------------------------------^ shadowed.
> > But let's just request sync. This is a slow path anyway.
Why then?
Note this will require calling copy_shadow_to_vmcs12
because nested_vmx_failValid can be called while L0 handles a L1
exit (for vmx instruction emulation) and the shadow vmcs could
have been modified by L1 before the exit.
>
> > + */
> > }
> >
> > /* Emulate the VMCLEAR instruction */
> > @@ -6127,6 +6140,7 @@ static int handle_vmptrld(struct kvm_vcp
> > /* init shadow vmcs */
> > vmcs_clear(shadow_vmcs);
> > vmx->nested.current_shadow_vmcs = shadow_vmcs;
> > + vmx->nested.sync_shadow_vmcs = true;
> > }
> > }
> >
> > @@ -6876,6 +6890,10 @@ static void __noclone vmx_vcpu_run(struc
> > {
> > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > unsigned long debugctlmsr;
> Leave free line here and move it after if(vmx->emulation_required).
Will do
> > + if (vmx->nested.sync_shadow_vmcs) {
> > + copy_vmcs12_to_shadow(vmx);
> > + vmx->nested.sync_shadow_vmcs = false;
> > + }
> >
> > /* Record the guest's net vcpu time for enforced NMI injections. */
> > if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
> > @@ -7496,6 +7514,8 @@ static int nested_vmx_run(struct kvm_vcp
> > skip_emulated_instruction(vcpu);
> > vmcs12 = get_vmcs12(vcpu);
> >
> > + if (enable_shadow_vmcs)
> > + copy_shadow_to_vmcs12(vmx);
> And free line here.
Will do
* Re: [PATCH 10/10] KVM: nVMX: Enable and disable shadow vmcs functionality
2013-04-17 14:41 ` Gleb Natapov
@ 2013-04-17 15:18 ` Abel Gordon
2013-04-17 15:20 ` Gleb Natapov
0 siblings, 1 reply; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 15:18 UTC (permalink / raw)
To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm
Gleb Natapov <gleb@redhat.com> wrote on 17/04/2013 05:41:07 PM:
> On Wed, Apr 17, 2013 at 02:55:40PM +0300, Abel Gordon wrote:
> > Once L1 loads VMCS12 we enable shadow-vmcs capability and copy all the
> > VMCS12 shadowed fields to the shadow vmcs. When we release the VMCS12,
> > we also disable shadow-vmcs capability.
> >
> > Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> > ---
> > arch/x86/kvm/vmx.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > --- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
> > +++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
> > @@ -5590,12 +5590,17 @@ static int nested_vmx_check_permission(s
> >
> > static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
> > {
> > + u32 exec_control;
> > if (enable_shadow_vmcs) {
> > if (vmx->nested.current_vmcs12 != NULL) {
> > /* copy to memory all shadowed fields in case
> > they were modified */
> > copy_shadow_to_vmcs12(vmx);
> > vmx->nested.sync_shadow_vmcs = false;
> > + exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> > + exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
> > + vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
> > + vmcs_write64(VMCS_LINK_POINTER, -1ull);
> > free_vmcs(vmx->nested.current_shadow_vmcs);
> > }
> > }
> > @@ -6084,6 +6089,7 @@ static int handle_vmptrld(struct kvm_vcp
> > gpa_t vmptr;
> > struct x86_exception e;
> > struct vmcs *shadow_vmcs;
> > + u32 exec_control;
> >
> > if (!nested_vmx_check_permission(vcpu))
> > return 1;
> > @@ -6140,6 +6146,11 @@ static int handle_vmptrld(struct kvm_vcp
> > /* init shadow vmcs */
> > vmcs_clear(shadow_vmcs);
> > vmx->nested.current_shadow_vmcs = shadow_vmcs;
> > + exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> > + exec_control |= SECONDARY_EXEC_SHADOW_VMCS;
> > + vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
> > + vmcs_write64(VMCS_LINK_POINTER,
> > + __pa(shadow_vmcs));
> How hard would it be to disable shadowing for individual vmcs if shadow
> vmcs allocation fails? It bothers me a little that we can fail a perfectly
> valid vmptrld() because of a failed allocation.
That's really a corner case... IMHO, if we fail to allocate a shadow vmcs
we may experience bigger issues, like failing to allocate VMCS02.
Anyway, if we reuse the shadow vmcs as you requested, then we can allocate
the shadow vmcs once in handle_vmon. In this case, handle_vmon will fail
and not handle_vmptrld.
* Re: [PATCH 10/10] KVM: nVMX: Enable and disable shadow vmcs functionality
2013-04-17 15:18 ` Abel Gordon
@ 2013-04-17 15:20 ` Gleb Natapov
0 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-04-17 15:20 UTC (permalink / raw)
To: Abel Gordon; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm
On Wed, Apr 17, 2013 at 06:18:27PM +0300, Abel Gordon wrote:
>
>
> Gleb Natapov <gleb@redhat.com> wrote on 17/04/2013 05:41:07 PM:
>
> > On Wed, Apr 17, 2013 at 02:55:40PM +0300, Abel Gordon wrote:
> > > Once L1 loads VMCS12 we enable shadow-vmcs capability and copy all the
> > > VMCS12 shadowed fields to the shadow vmcs. When we release the VMCS12,
> > > we also disable shadow-vmcs capability.
> > >
> > > Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> > > ---
> > > arch/x86/kvm/vmx.c | 11 +++++++++++
> > > 1 file changed, 11 insertions(+)
> > >
> > > --- .before/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
> > > +++ .after/arch/x86/kvm/vmx.c 2013-04-17 14:20:51.000000000 +0300
> > > @@ -5590,12 +5590,17 @@ static int nested_vmx_check_permission(s
> > >
> > > static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
> > > {
> > > + u32 exec_control;
> > > if (enable_shadow_vmcs) {
> > > if (vmx->nested.current_vmcs12 != NULL) {
> > > /* copy to memory all shadowed fields in case
> > > they were modified */
> > > copy_shadow_to_vmcs12(vmx);
> > > vmx->nested.sync_shadow_vmcs = false;
> > > + exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> > > + exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
> > > + vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
> > > + vmcs_write64(VMCS_LINK_POINTER, -1ull);
> > > free_vmcs(vmx->nested.current_shadow_vmcs);
> > > }
> > > }
> > > @@ -6084,6 +6089,7 @@ static int handle_vmptrld(struct kvm_vcp
> > > gpa_t vmptr;
> > > struct x86_exception e;
> > > struct vmcs *shadow_vmcs;
> > > + u32 exec_control;
> > >
> > > if (!nested_vmx_check_permission(vcpu))
> > > return 1;
> > > @@ -6140,6 +6146,11 @@ static int handle_vmptrld(struct kvm_vcp
> > > /* init shadow vmcs */
> > > vmcs_clear(shadow_vmcs);
> > > vmx->nested.current_shadow_vmcs = shadow_vmcs;
> > > + exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> > > + exec_control |= SECONDARY_EXEC_SHADOW_VMCS;
> > > + vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
> > > + vmcs_write64(VMCS_LINK_POINTER,
> > > + __pa(shadow_vmcs));
> > How hard would it be to disable shadowing for individual vmcs if shadow
> > vmcs allocation fails? It bothers me a little that we can fail a perfectly
> > valid vmptrld() because of a failed allocation.
>
> That's really a corner case... IMHO, if we fail to allocate a shadow vmcs
> we may experience bigger issues, like failing to allocate VMCS02.
> Anyway, if we reuse the shadow vmcs as you requested, then we can allocate
> the shadow vmcs once in handle_vmon. In this case, handle_vmon will fail
> and not handle_vmptrld.
Yes, I agree that with shadow vmcs reuse the issue is almost non-existent.
--
Gleb.
* Re: [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
2013-04-17 14:59 ` Abel Gordon
@ 2013-04-17 15:39 ` Gleb Natapov
2013-04-17 16:03 ` Abel Gordon
0 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-04-17 15:39 UTC (permalink / raw)
To: Abel Gordon; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm
On Wed, Apr 17, 2013 at 05:59:50PM +0300, Abel Gordon wrote:
>
>
> Gleb Natapov <gleb@redhat.com> wrote on 17/04/2013 05:34:49 PM:
>
> > > @@ -5716,6 +5725,10 @@ static void nested_vmx_failValid(struct
> > > X86_EFLAGS_SF | X86_EFLAGS_OF))
> > > | X86_EFLAGS_ZF);
> > > get_vmcs12(vcpu)->vm_instruction_error = vm_instruction_error;
> > > + /*
> > > + * We don't need to force a shadow sync because
> > > + * VM_INSTRUCTION_ERROR is not shdowed
> > ---------------------------------------^ shadowed.
> > > But let's just request sync. This is a slow path anyway.
>
> Why then?
Because we do not care how fast the error case is handled, so no point
breaking the rules for it.
> Note this will require calling copy_shadow_to_vmcs12
> because nested_vmx_failValid can be called while L0 handles a L1
> exit (for vmx instruction emulation) and the shadow vmcs could
> have been modified by L1 before the exit.
>
Right, not a big deal if this is the only case when it happens. When we
discussed the accessors vs. sync_shadow_vmcs flag approach I said that the flag
will work only if no vmcs12 fields are changed outside of vmexit or
vmwrite emulation. This one is such a field, unfortunately. Hope it is
the only one.
--
Gleb.
* Re: [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
2013-04-17 15:39 ` Gleb Natapov
@ 2013-04-17 16:03 ` Abel Gordon
2013-04-17 21:59 ` Paolo Bonzini
0 siblings, 1 reply; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 16:03 UTC (permalink / raw)
To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm
Gleb Natapov <gleb@redhat.com> wrote on 17/04/2013 06:39:01 PM:
> On Wed, Apr 17, 2013 at 05:59:50PM +0300, Abel Gordon wrote:
> >
> >
> > Gleb Natapov <gleb@redhat.com> wrote on 17/04/2013 05:34:49 PM:
> >
> > > > @@ -5716,6 +5725,10 @@ static void nested_vmx_failValid(struct
> > > > X86_EFLAGS_SF | X86_EFLAGS_OF))
> > > > | X86_EFLAGS_ZF);
> > > > get_vmcs12(vcpu)->vm_instruction_error = vm_instruction_error;
> > > > + /*
> > > > + * We don't need to force a shadow sync because
> > > > + * VM_INSTRUCTION_ERROR is not shdowed
> > > ---------------------------------------^ shadowed.
> > > But let's just request sync. This is a slow path anyway.
> >
> > Why then?
> Because we do not care how fast the error case is handled, so no point
> breaking the rules for it.
Yep, so let's just keep it as a non-shadowed field. We don't break
the rules, we just don't use an acceleration feature for the error
path (trap and emulate vmreads for this field as usual).
> > Note this will require to call copy_shadow_to_vmcs12
> > because nested_vmx_failValid can be called while L0 handles a L1
> > exit (for vmx instruction emulation) and the shadow vmcs could
> > have been modified by L1 before the exit.
> >
> Right, not a big deal if this is the only case when it happens. When we
> discussed accessors vs sync_shadow_vmcs flag approach I said that flag
> will work only if no vmcs12 fields are changed not as part of vmexit or
> vmwrite emulations. This one is such a field unfortunately. Hope it is
> the only one.
Yep, I remember that. I answered that L0 should NOT change VMCS12 fields
if L1 is running and L1 didn't execute any vmlaunch, vmresume, vmwrite...
(any vmx instruction. Sorry if I wasn't clear).
nested_vmx_failValid is called ONLY when L1 executes vmx instructions
which L0 traps and emulate.
So, can we keep this part of the code as is?
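The protocol being negotiated in this subthread can be modelled with a small dirty-flag sketch (all names and types here are illustrative, not kernel code): emulation paths that modify vmcs12 only set sync_shadow_vmcs, and the shadow copy is refreshed once on the next guest entry, mirroring the vmx_vcpu_run() hunk from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for vmcs12 and the hardware shadow VMCS. */
struct toy_vmcs { uint32_t exit_reason; };

struct toy_nested {
	struct toy_vmcs vmcs12;   /* software-defined copy in L0 memory */
	struct toy_vmcs shadow;   /* models the processor's shadow VMCS */
	bool sync_shadow_vmcs;    /* vmcs12 changed; shadow is stale */
};

/* L0 modifies vmcs12 while emulating a nested vmexit: mark dirty only,
 * deferring the expensive copy. */
static void emulate_nested_vmexit(struct toy_nested *n, uint32_t reason)
{
	n->vmcs12.exit_reason = reason;
	n->sync_shadow_vmcs = true;
}

/* Run-path hook mirroring the vmx_vcpu_run() hunk: sync at most once
 * per entry, then clear the flag. */
static void before_guest_entry(struct toy_nested *n)
{
	if (n->sync_shadow_vmcs) {
		n->shadow = n->vmcs12;        /* copy_vmcs12_to_shadow() */
		n->sync_shadow_vmcs = false;
	}
}
```

The point under debate maps directly onto this model: any field L0 writes without setting the flag (such as vm_instruction_error above) must simply never be shadowed, or the deferred copy would miss it.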
* [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs
2013-04-17 17:05 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v3 Abel Gordon
@ 2013-04-17 17:07 ` Abel Gordon
2013-04-18 6:38 ` Gleb Natapov
0 siblings, 1 reply; 32+ messages in thread
From: Abel Gordon @ 2013-04-17 17:07 UTC (permalink / raw)
To: kvm; +Cc: owasserm, nadav, jun.nakajima, dongxiao.xu, abelg
Allocate a shadow vmcs used by the processor to shadow part of the fields
stored in the software defined VMCS12 (let L1 access fields without causing
exits). Note we keep a shadow vmcs only for the current vmcs12. Once a vmcs12
becomes non-current, its shadow vmcs is released.
Signed-off-by: Abel Gordon <abelg@il.ibm.com>
---
arch/x86/kvm/vmx.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- .before/arch/x86/kvm/vmx.c 2013-04-17 19:58:32.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c 2013-04-17 19:58:32.000000000 +0300
@@ -355,6 +355,7 @@ struct nested_vmx {
/* The host-usable pointer to the above */
struct page *current_vmcs12_page;
struct vmcs12 *current_vmcs12;
+ struct vmcs *current_shadow_vmcs;
/* vmcs02_list cache of VMCSs recently used to run L2 guests */
struct list_head vmcs02_pool;
@@ -5517,6 +5518,7 @@ static int handle_vmon(struct kvm_vcpu *
{
struct kvm_segment cs;
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ struct vmcs *shadow_vmcs;
/* The Intel VMX Instruction Reference lists a bunch of bits that
* are prerequisite to running VMXON, most notably cr4.VMXE must be
@@ -5540,6 +5542,16 @@ static int handle_vmon(struct kvm_vcpu *
kvm_inject_gp(vcpu, 0);
return 1;
}
+ if (enable_shadow_vmcs) {
+ shadow_vmcs = alloc_vmcs();
+ if (!shadow_vmcs)
+ return -ENOMEM;
+ /* mark vmcs as shadow */
+ shadow_vmcs->revision_id |= (1u << 31);
+ /* init shadow vmcs */
+ vmcs_clear(shadow_vmcs);
+ vmx->nested.current_shadow_vmcs = shadow_vmcs;
+ }
INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
vmx->nested.vmcs02_num = 0;
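For reference, the bit set in the hunk above is the shadow-VMCS indicator: the first dword of a VMCS region holds the 31-bit revision identifier, and bit 31 distinguishes a shadow VMCS from an ordinary one. A self-contained toy model of that marking (hypothetical struct and helper names, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the first dword of a VMCS region: the low
 * 31 bits hold the revision identifier, bit 31 flags a shadow VMCS. */
struct toy_vmcs {
	uint32_t revision_id;
};

/* Same operation as the patch: set bit 31, keep the revision intact. */
static void mark_as_shadow(struct toy_vmcs *vmcs)
{
	vmcs->revision_id |= (1u << 31);
}

static int is_shadow(const struct toy_vmcs *vmcs)
{
	return (vmcs->revision_id >> 31) & 1;
}

static uint32_t revision(const struct toy_vmcs *vmcs)
{
	return vmcs->revision_id & ~(1u << 31);
}
```

The revision identifier itself survives the marking, which is what lets the processor still validate the region.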
* Re: [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
2013-04-17 16:03 ` Abel Gordon
@ 2013-04-17 21:59 ` Paolo Bonzini
2013-04-18 6:24 ` Abel Gordon
0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2013-04-17 21:59 UTC (permalink / raw)
To: Abel Gordon; +Cc: Gleb Natapov, dongxiao.xu, jun.nakajima, kvm, nadav, owasserm
On 17/04/2013 18:03, Abel Gordon wrote:
>> > Right, not a big deal if this is the only case when it happens. When we
>> > discussed accessors vs sync_shadow_vmcs flag approach I said that flag
>> > will work only if no vmcs12 fields are changed not as part of vmexit or
>> > vmwrite emulations. This one is such a field unfortunately. Hope it is
>> > the only one.
> Yep, remember that. I answered that L0 should NOT change VMCS12 fields
> if L1 is running and L1 didn't execute any vmlaunch, vmresume, vmwrite...
> (any vmx instruction. Sorry if I wasn't clear).
> nested_vmx_failValid is called ONLY when L1 executes vmx instructions
> which L0 traps and emulate.
>
> So, can we keep this part of the code as is ?
I think so. Not shadowing the field is just as good a solution as
forcing the copy.
Perhaps at the top of the field lists you can replace the comment about
VM_INSTRUCTION_ERROR with one that is more generic, and mentions that
fields that are changed as part of vmexit or vmwrite emulation must not
be shadowed, or alternatively *insert explanation here*...
Paolo
* Re: [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
2013-04-17 21:59 ` Paolo Bonzini
@ 2013-04-18 6:24 ` Abel Gordon
2013-04-18 6:54 ` Gleb Natapov
0 siblings, 1 reply; 32+ messages in thread
From: Abel Gordon @ 2013-04-18 6:24 UTC (permalink / raw)
To: Paolo Bonzini
Cc: dongxiao.xu, Gleb Natapov, jun.nakajima, kvm, nadav, owasserm,
Paolo Bonzini
Paolo Bonzini <paolo.bonzini@gmail.com> wrote on 18/04/2013 12:59:48 AM:
> On 17/04/2013 18:03, Abel Gordon wrote:
> >> > Right, not a big deal if this is the only case when it happens. When we
> >> > discussed accessors vs sync_shadow_vmcs flag approach I said that flag
> >> > will work only if no vmcs12 fields are changed not as part of vmexit or
> >> > vmwrite emulations. This one is such a field unfortunately. Hope it is
> >> > the only one.
> > Yep, remember that. I answered that L0 should NOT change VMCS12 fields
> > if L1 is running and L1 didn't execute any vmlaunch, vmresume, vmwrite...
> > (any vmx instruction. Sorry if I wasn't clear).
> > nested_vmx_failValid is called ONLY when L1 executes vmx instructions
> > which L0 traps and emulate.
> >
> > So, can we keep this part of the code as is ?
>
> I think so. Not shadowing the field is just as good a solution as
> forcing the copy.
Ok, then I'll keep the code as is (not shadowing).
> Perhaps at the top of the field lists you can replace the comment about
> VM_INSTRUCTION_ERROR with one that is more generic, and mentions that
> fields that are changed as part of vmexit or vmwrite emulation must not
> be shadowed, or alternatively *insert explanation here*...
Good idea, I will change the comment to be more generic.
Note I already sent v3, which addresses all the other suggestions Gleb
made. Let me know if I should wait for another review or just
re-send v4 so you can apply the patches.
Right now, the only pending change for v4 is generalizing the
VM_INSTRUCTION_ERROR comment.
* Re: [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs
2013-04-17 17:07 ` [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs Abel Gordon
@ 2013-04-18 6:38 ` Gleb Natapov
2013-04-18 7:07 ` Abel Gordon
0 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-04-18 6:38 UTC (permalink / raw)
To: Abel Gordon; +Cc: kvm, owasserm, nadav, jun.nakajima, dongxiao.xu
On Wed, Apr 17, 2013 at 08:07:40PM +0300, Abel Gordon wrote:
> Allocate a shadow vmcs used by the processor to shadow part of the fields
> stored in the software defined VMCS12 (let L1 access fields without causing
> exits). Note we keep a shadow vmcs only for the current vmcs12. Once a vmcs12
> becomes non-current, its shadow vmcs is released.
>
>
> Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> ---
> arch/x86/kvm/vmx.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> --- .before/arch/x86/kvm/vmx.c 2013-04-17 19:58:32.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c 2013-04-17 19:58:32.000000000 +0300
> @@ -355,6 +355,7 @@ struct nested_vmx {
> /* The host-usable pointer to the above */
> struct page *current_vmcs12_page;
> struct vmcs12 *current_vmcs12;
> + struct vmcs *current_shadow_vmcs;
>
> /* vmcs02_list cache of VMCSs recently used to run L2 guests */
> struct list_head vmcs02_pool;
> @@ -5517,6 +5518,7 @@ static int handle_vmon(struct kvm_vcpu *
> {
> struct kvm_segment cs;
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> + struct vmcs *shadow_vmcs;
>
> /* The Intel VMX Instruction Reference lists a bunch of bits that
> * are prerequisite to running VMXON, most notably cr4.VMXE must be
> @@ -5540,6 +5542,16 @@ static int handle_vmon(struct kvm_vcpu *
> kvm_inject_gp(vcpu, 0);
> return 1;
> }
> + if (enable_shadow_vmcs) {
> + shadow_vmcs = alloc_vmcs();
> + if (!shadow_vmcs)
> + return -ENOMEM;
> + /* mark vmcs as shadow */
> + shadow_vmcs->revision_id |= (1u << 31);
> + /* init shadow vmcs */
> + vmcs_clear(shadow_vmcs);
> + vmx->nested.current_shadow_vmcs = shadow_vmcs;
> + }
>
A guest can DoS the host by calling vmxon repeatedly, causing the host to
leak memory. This points to a bug in the vmxon implementation: vmxon
should call nested_vmx_failInvalid() if (vmx->nested.vmxon).
> INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
> vmx->nested.vmcs02_num = 0;
>
--
Gleb.
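The leak described above comes from re-running the allocation path on every vmxon exit. The shape of the missing guard can be modeled outside the kernel as a small, hypothetical sketch (toy types and a fake allocator, not the actual KVM code):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the double-VMXON problem: without the early guard,
 * every vmxon exit would allocate a fresh shadow vmcs and leak the
 * previous one. */
struct toy_nested {
	int vmxon;		/* already in VMX operation? */
	void *shadow_vmcs;	/* currently allocated shadow vmcs */
	int allocations;	/* how many regions we handed out */
};

static char fake_region[64];	/* stands in for alloc_vmcs() */

/* Returns 1 if vmxon was emulated, 0 if the instruction failed. */
static int handle_vmon(struct toy_nested *n)
{
	if (n->vmxon)
		return 0;	/* fail the instruction, allocate nothing */
	n->shadow_vmcs = fake_region;
	n->allocations++;
	n->vmxon = 1;
	return 1;
}
```

With the guard in place, repeated vmxon from the guest never reaches the allocation, so nothing can leak.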
* Re: [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
2013-04-18 6:24 ` Abel Gordon
@ 2013-04-18 6:54 ` Gleb Natapov
2013-04-18 6:59 ` Paolo Bonzini
0 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-04-18 6:54 UTC (permalink / raw)
To: Abel Gordon
Cc: Paolo Bonzini, dongxiao.xu, jun.nakajima, kvm, nadav, owasserm,
Paolo Bonzini
On Thu, Apr 18, 2013 at 09:24:20AM +0300, Abel Gordon wrote:
>
>
> Paolo Bonzini <paolo.bonzini@gmail.com> wrote on 18/04/2013 12:59:48 AM:
>
>
> > On 17/04/2013 18:03, Abel Gordon wrote:
> > >> > Right, not a big deal if this is the only case when it happens. When we
> > >> > discussed accessors vs sync_shadow_vmcs flag approach I said that flag
> > >> > will work only if no vmcs12 fields are changed not as part of vmexit or
> > >> > vmwrite emulations. This one is such a field unfortunately. Hope it is
> > >> > the only one.
> > > Yep, remember that. I answered that L0 should NOT change VMCS12 fields
> > > if L1 is running and L1 didn't execute any vmlaunch, vmresume, vmwrite...
> > > (any vmx instruction. Sorry if I wasn't clear).
> > > nested_vmx_failValid is called ONLY when L1 executes vmx instructions
> > > which L0 traps and emulate.
> > >
> > > So, can we keep this part of the code as is ?
> >
> > I think so. Not shadowing the field is just as good a solution as
> > forcing the copy.
>
> Ok, then I'll keep the code as is (not shadowing).
>
Paolo is right that forcing the copy is no less error prone, since each
case where the vmcs is changed outside of vmwrite emulation or vmexit must
be tracked manually. Accessors are the only way to make the code more or
less error free without reviewing each case manually. But I am OK with the
current approach for now. To get rid of the unnecessary copying we will
have to move to accessors some day anyway.
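The accessor idea mentioned here would funnel every vmcs12 field write through a helper that records that the shadow vmcs is stale, so no call site can be forgotten. A hypothetical sketch of that pattern (invented names; nothing like this is in the patch set):

```c
#include <assert.h>
#include <stdint.h>

/* Toy vmcs12 with a single field; the write accessor marks the shadow
 * vmcs stale so it would be re-synced before L1 can vmread it. */
struct toy_vmcs12 {
	uint32_t vm_instruction_error;
	int shadow_stale;	/* set whenever a field changes */
};

static void vmcs12_write_error(struct toy_vmcs12 *v, uint32_t err)
{
	v->vm_instruction_error = err;
	v->shadow_stale = 1;	/* force a copy-to-shadow later */
}
```

The trade-off discussed in the thread is exactly this: accessors make staleness tracking automatic, at the cost of touching every existing assignment.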
> > Perhaps at the top of the field lists you can replace the comment about
> > VM_INSTRUCTION_ERROR with one that is more generic, and mentions that
> > fields that are changed as part of vmexit or vmwrite emulation must not
> > be shadowed, or alternatively *insert explanation here*...
>
> Good idea, I will change the comment to be more generic.
I like the comment since it provides an example that can be looked at to
understand the problem, so if you make the comment more generic, please
leave the example :)
>
> Note I already sent v3 which targets all the other suggestions Gleb
> wrote. Let me know if I should wait for another review or just
> re-send v4 so you can apply the patches.
> Right now, the only pending change for v4 is generalizing the
> VM_INSTRUCTION_ERROR comment.
--
Gleb.
* Re: [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the shadow vmcs
2013-04-18 6:54 ` Gleb Natapov
@ 2013-04-18 6:59 ` Paolo Bonzini
0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2013-04-18 6:59 UTC (permalink / raw)
To: Gleb Natapov
Cc: Abel Gordon, Paolo Bonzini, dongxiao.xu, jun.nakajima, kvm, nadav,
owasserm, Paolo Bonzini
On 18/04/2013 08:54, Gleb Natapov wrote:
> I like the comment since it provides example that can be looked at to
> understand the problem, so if you make the comment more generic leave
> the example please :)
Agreed. :)
Paolo
* Re: [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs
2013-04-18 6:38 ` Gleb Natapov
@ 2013-04-18 7:07 ` Abel Gordon
2013-04-18 7:11 ` Gleb Natapov
0 siblings, 1 reply; 32+ messages in thread
From: Abel Gordon @ 2013-04-18 7:07 UTC (permalink / raw)
To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm
Gleb Natapov <gleb@redhat.com> wrote on 18/04/2013 09:38:43 AM:
> On Wed, Apr 17, 2013 at 08:07:40PM +0300, Abel Gordon wrote:
> > Allocate a shadow vmcs used by the processor to shadow part of the fields
> > stored in the software defined VMCS12 (let L1 access fields without causing
> > exits). Note we keep a shadow vmcs only for the current vmcs12. Once a vmcs12
> > becomes non-current, its shadow vmcs is released.
> >
> >
> > Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> > ---
> > arch/x86/kvm/vmx.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > --- .before/arch/x86/kvm/vmx.c 2013-04-17 19:58:32.000000000 +0300
> > +++ .after/arch/x86/kvm/vmx.c 2013-04-17 19:58:32.000000000 +0300
> > @@ -355,6 +355,7 @@ struct nested_vmx {
> > /* The host-usable pointer to the above */
> > struct page *current_vmcs12_page;
> > struct vmcs12 *current_vmcs12;
> > + struct vmcs *current_shadow_vmcs;
> >
> > /* vmcs02_list cache of VMCSs recently used to run L2 guests */
> > struct list_head vmcs02_pool;
> > @@ -5517,6 +5518,7 @@ static int handle_vmon(struct kvm_vcpu *
> > {
> > struct kvm_segment cs;
> > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > + struct vmcs *shadow_vmcs;
> >
> > /* The Intel VMX Instruction Reference lists a bunch of bits that
> > * are prerequisite to running VMXON, most notably cr4.VMXE must be
> > @@ -5540,6 +5542,16 @@ static int handle_vmon(struct kvm_vcpu *
> > kvm_inject_gp(vcpu, 0);
> > return 1;
> > }
> > + if (enable_shadow_vmcs) {
> > + shadow_vmcs = alloc_vmcs();
> > + if (!shadow_vmcs)
> > + return -ENOMEM;
> > + /* mark vmcs as shadow */
> > + shadow_vmcs->revision_id |= (1u << 31);
> > + /* init shadow vmcs */
> > + vmcs_clear(shadow_vmcs);
> > + vmx->nested.current_shadow_vmcs = shadow_vmcs;
> > + }
> >
> Guest can ddos host by calling vmxon repeatedly causing host to leak
> memory. This point to a bug in vmxon implementation. vmxon should call
> nested_vmx_failInvalid() if (vmx->nested.vmxon).
Good point. I just checked the spec (VMXON pseudo-code) to verify
the right emulation.
According to the pseudo-code, we should do:
ELSE VMfail(“VMXON executed in VMX root operation”), which means:
VMfail(ErrorNumber):
IF VMCS pointer is valid
THEN VMfailValid(ErrorNumber);
ELSE VMfailInvalid;
FI;
So, I'll call nested_vmx_failValid if nested.current_vmptr != -1ull;
otherwise, I'll call nested_vmx_failInvalid.
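The VMfail pseudo-code above boils down to a dispatch on whether a current-VMCS pointer is loaded. A self-contained toy model of that dispatch (hypothetical names, not the KVM helpers):

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_VMPTR (~0ull)	/* models nested.current_vmptr == -1ull */

enum toy_vmfail { TOY_FAIL_INVALID, TOY_FAIL_VALID };

/* Model of the SDM's VMfail(ErrorNumber): VMfailValid needs a valid
 * current-VMCS pointer to store the error number into; without one,
 * the instruction can only report VMfailInvalid. */
static enum toy_vmfail vmfail(uint64_t current_vmptr)
{
	if (current_vmptr != INVALID_VMPTR)
		return TOY_FAIL_VALID;
	return TOY_FAIL_INVALID;
}
```

The reason for the split is that the error number lives in the VM_INSTRUCTION_ERROR field of the current VMCS, so there is nowhere to put it when no VMCS is current.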
* Re: [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs
2013-04-18 7:07 ` Abel Gordon
@ 2013-04-18 7:11 ` Gleb Natapov
2013-04-18 7:15 ` Abel Gordon
0 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-04-18 7:11 UTC (permalink / raw)
To: Abel Gordon; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm
On Thu, Apr 18, 2013 at 10:07:11AM +0300, Abel Gordon wrote:
>
>
> Gleb Natapov <gleb@redhat.com> wrote on 18/04/2013 09:38:43 AM:
>
> > On Wed, Apr 17, 2013 at 08:07:40PM +0300, Abel Gordon wrote:
> > > Allocate a shadow vmcs used by the processor to shadow part of the fields
> > > stored in the software defined VMCS12 (let L1 access fields without causing
> > > exits). Note we keep a shadow vmcs only for the current vmcs12. Once a vmcs12
> > > becomes non-current, its shadow vmcs is released.
> > >
> > >
> > > Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> > > ---
> > > arch/x86/kvm/vmx.c | 12 ++++++++++++
> > > 1 file changed, 12 insertions(+)
> > >
> > > --- .before/arch/x86/kvm/vmx.c 2013-04-17 19:58:32.000000000 +0300
> > > +++ .after/arch/x86/kvm/vmx.c 2013-04-17 19:58:32.000000000 +0300
> > > @@ -355,6 +355,7 @@ struct nested_vmx {
> > > /* The host-usable pointer to the above */
> > > struct page *current_vmcs12_page;
> > > struct vmcs12 *current_vmcs12;
> > > + struct vmcs *current_shadow_vmcs;
> > >
> > > /* vmcs02_list cache of VMCSs recently used to run L2 guests */
> > > struct list_head vmcs02_pool;
> > > @@ -5517,6 +5518,7 @@ static int handle_vmon(struct kvm_vcpu *
> > > {
> > > struct kvm_segment cs;
> > > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > + struct vmcs *shadow_vmcs;
> > >
> > > /* The Intel VMX Instruction Reference lists a bunch of bits that
> > > * are prerequisite to running VMXON, most notably cr4.VMXE must be
> > > @@ -5540,6 +5542,16 @@ static int handle_vmon(struct kvm_vcpu *
> > > kvm_inject_gp(vcpu, 0);
> > > return 1;
> > > }
> > > + if (enable_shadow_vmcs) {
> > > + shadow_vmcs = alloc_vmcs();
> > > + if (!shadow_vmcs)
> > > + return -ENOMEM;
> > > + /* mark vmcs as shadow */
> > > + shadow_vmcs->revision_id |= (1u << 31);
> > > + /* init shadow vmcs */
> > > + vmcs_clear(shadow_vmcs);
> > > + vmx->nested.current_shadow_vmcs = shadow_vmcs;
> > > + }
> > >
> > Guest can ddos host by calling vmxon repeatedly causing host to leak
> > memory. This point to a bug in vmxon implementation. vmxon should call
> > nested_vmx_failInvalid() if (vmx->nested.vmxon).
>
> Good point. I just checked the spec (VMXON pseudo-code) to verify
> the right emulation:
> According to the pseudo-code we should:
> ELSE VMfail(“VMXON executed in VMX root operation”) which means:
>
> VMfail(ErrorNumber):
> IF VMCS pointer is valid
> THEN VMfailValid(ErrorNumber);
> ELSE VMfailInvalid;
> FI;
>
>
> So, I'll call nested_vmx_failValid if nested.current_vmptr != -1ull
> Otherwise, I'll call nested_vmx_failInvalid.
>
Just call nested_vmx_failValid(). It does that internally.
--
Gleb.
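The convenience Gleb points out, that the fail-valid helper degrades to the fail-invalid path on its own, can be modeled with a toy sketch (invented types; the real nested_vmx_failValid works on the vcpu):

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_VMPTR (~0ull)

/* Toy nested-VMX state: valid == 1 means a VMfailValid error number was
 * recorded, valid == 0 means the instruction failed as VMfailInvalid. */
struct toy_vmx {
	uint64_t current_vmptr;
	uint32_t error;
	int valid;
};

static void toy_fail_invalid(struct toy_vmx *v)
{
	v->valid = 0;
}

/* Mirrors the described behaviour: when no VMCS is current, fall back
 * to the fail-invalid path internally, so callers need one call only. */
static void toy_fail_valid(struct toy_vmx *v, uint32_t err)
{
	if (v->current_vmptr == INVALID_VMPTR) {
		toy_fail_invalid(v);
		return;
	}
	v->error = err;
	v->valid = 1;
}
```

With the fallback folded into the helper, the vmxon emulation does not need to re-check the VMCS pointer itself.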
* Re: [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs
2013-04-18 7:11 ` Gleb Natapov
@ 2013-04-18 7:15 ` Abel Gordon
0 siblings, 0 replies; 32+ messages in thread
From: Abel Gordon @ 2013-04-18 7:15 UTC (permalink / raw)
To: Gleb Natapov; +Cc: dongxiao.xu, jun.nakajima, kvm, nadav, owasserm
Gleb Natapov <gleb@redhat.com> wrote on 18/04/2013 10:11:22 AM:
> On Thu, Apr 18, 2013 at 10:07:11AM +0300, Abel Gordon wrote:
> >
> >
> > Gleb Natapov <gleb@redhat.com> wrote on 18/04/2013 09:38:43 AM:
> >
> > > On Wed, Apr 17, 2013 at 08:07:40PM +0300, Abel Gordon wrote:
> > > > Allocate a shadow vmcs used by the processor to shadow part of the fields
> > > > stored in the software defined VMCS12 (let L1 access fields without causing
> > > > exits). Note we keep a shadow vmcs only for the current vmcs12. Once a vmcs12
> > > > becomes non-current, its shadow vmcs is released.
> > > >
> > > >
> > > > Signed-off-by: Abel Gordon <abelg@il.ibm.com>
> > > > ---
> > > > arch/x86/kvm/vmx.c | 12 ++++++++++++
> > > > 1 file changed, 12 insertions(+)
> > > >
> > > > --- .before/arch/x86/kvm/vmx.c 2013-04-17 19:58:32.000000000 +0300
> > > > +++ .after/arch/x86/kvm/vmx.c 2013-04-17 19:58:32.000000000 +0300
> > > > @@ -355,6 +355,7 @@ struct nested_vmx {
> > > > /* The host-usable pointer to the above */
> > > > struct page *current_vmcs12_page;
> > > > struct vmcs12 *current_vmcs12;
> > > > + struct vmcs *current_shadow_vmcs;
> > > >
> > > > /* vmcs02_list cache of VMCSs recently used to run L2 guests */
> > > > struct list_head vmcs02_pool;
> > > > @@ -5517,6 +5518,7 @@ static int handle_vmon(struct kvm_vcpu *
> > > > {
> > > > struct kvm_segment cs;
> > > > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > > + struct vmcs *shadow_vmcs;
> > > >
> > > > /* The Intel VMX Instruction Reference lists a bunch of bits that
> > > > * are prerequisite to running VMXON, most notably cr4.VMXE must be
> > > > @@ -5540,6 +5542,16 @@ static int handle_vmon(struct kvm_vcpu *
> > > > kvm_inject_gp(vcpu, 0);
> > > > return 1;
> > > > }
> > > > + if (enable_shadow_vmcs) {
> > > > + shadow_vmcs = alloc_vmcs();
> > > > + if (!shadow_vmcs)
> > > > + return -ENOMEM;
> > > > + /* mark vmcs as shadow */
> > > > + shadow_vmcs->revision_id |= (1u << 31);
> > > > + /* init shadow vmcs */
> > > > + vmcs_clear(shadow_vmcs);
> > > > + vmx->nested.current_shadow_vmcs = shadow_vmcs;
> > > > + }
> > > >
> > > Guest can ddos host by calling vmxon repeatedly causing host to leak
> > > memory. This point to a bug in vmxon implementation. vmxon should call
> > > nested_vmx_failInvalid() if (vmx->nested.vmxon).
> >
> > Good point. I just checked the spec (VMXON pseudo-code) to verify
> > the right emulation:
> > According to the pseudo-code we should:
> > ELSE VMfail(“VMXON executed in VMX root operation”) which means:
> >
> > VMfail(ErrorNumber):
> > IF VMCS pointer is valid
> > THEN VMfailValid(ErrorNumber);
> > ELSE VMfailInvalid;
> > FI;
> >
> >
> > So, I'll call nested_vmx_failValid if nested.current_vmptr != -1ull
> > Otherwise, I'll call nested_vmx_failInvalid.
> >
> Just call nested_vmx_failValid(). It does that internally.
Right, forgot that part :)
end of thread, other threads:[~2013-04-18 7:15 UTC | newest]
Thread overview: 32+ messages
2013-04-17 11:50 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v2 Abel Gordon
2013-04-17 11:51 ` [PATCH 01/10] KVM: nVMX: Shadow-vmcs control fields/bits Abel Gordon
2013-04-17 11:51 ` [PATCH 02/10] KVM: nVMX: Detect shadow-vmcs capability Abel Gordon
2013-04-17 13:51 ` Gleb Natapov
2013-04-17 14:33 ` Abel Gordon
2013-04-17 11:52 ` [PATCH 03/10] KVM: nVMX: Introduce vmread and vmwrite bitmaps Abel Gordon
2013-04-17 11:52 ` [PATCH 04/10] KVM: nVMX: Refactor handle_vmwrite Abel Gordon
2013-04-17 11:53 ` [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs Abel Gordon
2013-04-17 14:10 ` Gleb Natapov
2013-04-17 14:41 ` Abel Gordon
2013-04-17 14:44 ` Gleb Natapov
2013-04-17 11:53 ` [PATCH 06/10] KVM: nVMX: Release " Abel Gordon
2013-04-17 11:54 ` [PATCH 07/10] KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12 Abel Gordon
2013-04-17 11:54 ` [PATCH 08/10] KVM: nVMX: Copy VMCS12 to processor-specific shadow vmcs Abel Gordon
2013-04-17 11:55 ` [PATCH 09/10] KVM: nVMX: Synchronize VMCS12 content with the " Abel Gordon
2013-04-17 14:34 ` Gleb Natapov
2013-04-17 14:59 ` Abel Gordon
2013-04-17 15:39 ` Gleb Natapov
2013-04-17 16:03 ` Abel Gordon
2013-04-17 21:59 ` Paolo Bonzini
2013-04-18 6:24 ` Abel Gordon
2013-04-18 6:54 ` Gleb Natapov
2013-04-18 6:59 ` Paolo Bonzini
2013-04-17 11:55 ` [PATCH 10/10] KVM: nVMX: Enable and disable shadow vmcs functionality Abel Gordon
2013-04-17 14:41 ` Gleb Natapov
2013-04-17 15:18 ` Abel Gordon
2013-04-17 15:20 ` Gleb Natapov
-- strict thread matches above, loose matches on Subject: below --
2013-04-17 17:05 [PATCH 0/10] KVM: nVMX: shadow VMCS support, v3 Abel Gordon
2013-04-17 17:07 ` [PATCH 05/10] KVM: nVMX: Allocate shadow vmcs Abel Gordon
2013-04-18 6:38 ` Gleb Natapov
2013-04-18 7:07 ` Abel Gordon
2013-04-18 7:11 ` Gleb Natapov
2013-04-18 7:15 ` Abel Gordon