* [PATCH 0/9] Add support for nested SVM (kernel) v3
@ 2008-09-17 13:41 Alexander Graf
2008-09-17 13:41 ` [PATCH 1/9] Add CPUID feature flag for SVM v3 Alexander Graf
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Alexander Graf @ 2008-09-17 13:41 UTC (permalink / raw)
To: kvm; +Cc: joro, anthony, avi, Alexander Graf
The current generation of virtualization extensions only supports one VM layer.
While we can't change that, it is pretty easy to emulate the CPU's behavior
and implement the virtualization opcodes ourselves.
This patchset does exactly this for SVM. Using it, KVM can run within a VM.
Since we're emulating the real CPU's behavior, this should also enable other
VMMs to run within KVM.
So far I've only tested running KVM inside the VM, though.
Currently, running an SMP level 2 guest is broken. I have yet to find out why.
As always, comments and suggestions are highly welcome.
v2 takes most comments from Avi into account. Two things are missing:
- save/restore of hflags
- #VMEXIT on save/restore
I'd rather submit them as later patches though, so we don't clutter
this already big patchset (I'm slowly going crazy using git to manage it).
v3 addresses Joerg's comments, including
- V_INTR_MASKING support
- a generic permission checking helper
To be usable, this patchset requires the two simple changes to the userspace
part that I sent to the list with the first version.
Thanks for reviewing!
Alex
Alexander Graf (9):
Add CPUID feature flag for SVM v3
Clean up VINTR setting v3
Add helper functions for nested SVM v3
Implement GIF, clgi and stgi v3
Implement hsave v3
Add VMLOAD and VMSAVE handlers v3
Add VMRUN handler v3
Add VMEXIT handler and intercepts v3
Allow setting the SVME bit v3
arch/x86/kvm/kvm_svm.h | 11 +
arch/x86/kvm/svm.c | 743 +++++++++++++++++++++++++++++++++++++++++-
include/asm-x86/cpufeature.h | 1 +
include/asm-x86/kvm_host.h | 5 +
4 files changed, 748 insertions(+), 12 deletions(-)
^ permalink raw reply [flat|nested] 30+ messages in thread* [PATCH 1/9] Add CPUID feature flag for SVM v3 2008-09-17 13:41 [PATCH 0/9] Add support for nested SVM (kernel) v3 Alexander Graf @ 2008-09-17 13:41 ` Alexander Graf 2008-09-17 13:41 ` [PATCH 2/9] Clean up VINTR setting v3 Alexander Graf 2008-09-19 14:36 ` [PATCH 0/9] Add support for nested SVM (kernel) v3 Joerg Roedel 2008-09-19 21:48 ` First performance numbers Joerg Roedel 2 siblings, 1 reply; 30+ messages in thread From: Alexander Graf @ 2008-09-17 13:41 UTC (permalink / raw) To: kvm; +Cc: joro, anthony, avi, Alexander Graf This patch adds the CPUID feature flag for SVM in the x86 Linux headers. Signed-off-by: Alexander Graf <agraf@suse.de> --- include/asm-x86/cpufeature.h | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/include/asm-x86/cpufeature.h b/include/asm-x86/cpufeature.h index 762f6a6..bfb49fc 100644 --- a/include/asm-x86/cpufeature.h +++ b/include/asm-x86/cpufeature.h @@ -108,6 +108,7 @@ /* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */ #define X86_FEATURE_LAHF_LM (6*32+ 0) /* LAHF/SAHF in long mode */ #define X86_FEATURE_CMP_LEGACY (6*32+ 1) /* If yes HyperThreading not valid */ +#define X86_FEATURE_SVM (6*32+ 2) /* Secure Virtual Machine */ #define X86_FEATURE_IBS (6*32+ 10) /* Instruction Based Sampling */ /* -- 1.5.6 ^ permalink raw reply related [flat|nested] 30+ messages in thread
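For illustration, a minimal sketch of how the new flag is typically tested on the host side (the helper name below is hypothetical; boot_cpu_has() is the standard cpufeature test macro):

#include <asm/cpufeature.h>

/* Hypothetical helper, illustration only: true if the host CPU advertises
 * SVM in CPUID 0x8000_0001 ECX bit 2, i.e. the flag added above. */
static inline bool host_has_svm(void)
{
	return boot_cpu_has(X86_FEATURE_SVM);
}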
* [PATCH 2/9] Clean up VINTR setting v3 2008-09-17 13:41 ` [PATCH 1/9] Add CPUID feature flag for SVM v3 Alexander Graf @ 2008-09-17 13:41 ` Alexander Graf 2008-09-17 13:41 ` [PATCH 3/9] Add helper functions for nested SVM v3 Alexander Graf 0 siblings, 1 reply; 30+ messages in thread From: Alexander Graf @ 2008-09-17 13:41 UTC (permalink / raw) To: kvm; +Cc: joro, anthony, avi, Alexander Graf The current VINTR intercept setters don't look clean to me. To make the code easier to read and enable the possibilty to trap on a VINTR set, this uses a helper function to set the VINTR intercept. v2 uses two distinct functions for setting and clearing the bit Signed-off-by: Alexander Graf <agraf@suse.de> --- arch/x86/kvm/svm.c | 20 +++++++++++++++----- 1 files changed, 15 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 961dc3b..d9b18cc 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -729,6 +729,16 @@ static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) to_svm(vcpu)->vmcb->save.rflags = rflags; } +static void svm_set_vintr(struct vcpu_svm *svm) +{ + svm->vmcb->control.intercept |= 1ULL << INTERCEPT_VINTR; +} + +static void svm_clear_vintr(struct vcpu_svm *svm) +{ + svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_VINTR); +} + static struct vmcb_seg *svm_seg(struct kvm_vcpu *vcpu, int seg) { struct vmcb_save_area *save = &to_svm(vcpu)->vmcb->save; @@ -1363,7 +1373,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm, { KVMTRACE_0D(PEND_INTR, &svm->vcpu, handler); - svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_VINTR); + svm_clear_vintr(svm); svm->vmcb->control.int_ctl &= ~V_IRQ_MASK; /* * If the user space waits to inject interrupts, exit as soon as @@ -1575,7 +1585,7 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu) (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) || (vmcb->control.event_inj & SVM_EVTINJ_VALID)) { /* unable to deliver irq, set pending irq */ - vmcb->control.intercept |= (1ULL << INTERCEPT_VINTR); + svm_set_vintr(svm); svm_inject_irq(svm, 0x0); goto out; } @@ -1635,9 +1645,9 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu, */ if (!svm->vcpu.arch.interrupt_window_open && (svm->vcpu.arch.irq_summary || kvm_run->request_interrupt_window)) - control->intercept |= 1ULL << INTERCEPT_VINTR; - else - control->intercept &= ~(1ULL << INTERCEPT_VINTR); + svm_set_vintr(svm); + else + svm_clear_vintr(svm); } static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr) -- 1.5.6 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 3/9] Add helper functions for nested SVM v3 2008-09-17 13:41 ` [PATCH 2/9] Clean up VINTR setting v3 Alexander Graf @ 2008-09-17 13:41 ` Alexander Graf 2008-09-17 13:41 ` [PATCH 4/9] Implement GIF, clgi and stgi v3 Alexander Graf 0 siblings, 1 reply; 30+ messages in thread From: Alexander Graf @ 2008-09-17 13:41 UTC (permalink / raw) To: kvm; +Cc: joro, anthony, avi, Alexander Graf These are helpers for the nested SVM implementation. - nsvm_printk implements a debug printk variant - nested_svm_do calls a handler that can accesses gpa-based memory v3 makes use of the new permission checker Signed-off-by: Alexander Graf <agraf@suse.de> --- arch/x86/kvm/svm.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 88 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index d9b18cc..c72e728 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -48,6 +48,16 @@ MODULE_LICENSE("GPL"); #define DEBUGCTL_RESERVED_BITS (~(0x3fULL)) +/* Turn on to get debugging output*/ +/* #define NESTED_DEBUG */ + +#ifdef NESTED_DEBUG +#define nsvm_printk(fmt, args...) printk(KERN_INFO fmt, ## args) +#else +static inline void nsvm_printk(char *fmt, ...) { +} +#endif + /* enable NPT for AMD64 and X86 with PAE */ #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) static bool npt_enabled = true; @@ -1139,6 +1149,84 @@ static int vmmcall_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) return 1; } +static int nested_svm_check_permissions(struct vcpu_svm *svm) +{ + if (svm->vmcb->save.cpl) { + printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n", + __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu)); + kvm_queue_exception(&svm->vcpu, GP_VECTOR); + return 1; + } + + if (!(svm->vcpu.arch.shadow_efer & MSR_EFER_SVME_MASK) + || !is_paging(&svm->vcpu)) { + kvm_queue_exception(&svm->vcpu, UD_VECTOR); + return 1; + } + + return 0; +} + +static struct page *nested_svm_get_page(struct vcpu_svm *svm, u64 gpa) +{ + struct page *page; + + down_read(¤t->mm->mmap_sem); + page = gfn_to_page(svm->vcpu.kvm, gpa >> PAGE_SHIFT); + up_read(¤t->mm->mmap_sem); + + if (is_error_page(page)) { + printk(KERN_ERR "%s: could not find page at 0x%llx\n", + __func__, gpa); + kvm_release_page_clean(page); + kvm_queue_exception(&svm->vcpu, GP_VECTOR); + return NULL; + } + return page; +} + +static int nested_svm_do(struct vcpu_svm *svm, + u64 arg1_gpa, u64 arg2_gpa, void *opaque, + int (*handler)(struct vcpu_svm *svm, + void *arg1, + void *arg2, + void *opaque)) +{ + struct page *arg1_page; + struct page *arg2_page = NULL; + void *arg1; + void *arg2 = NULL; + int retval; + + arg1_page = nested_svm_get_page(svm, arg1_gpa); + if(arg1_page == NULL) + return 1; + + if (arg2_gpa) { + arg2_page = nested_svm_get_page(svm, arg2_gpa); + if(arg2_page == NULL) { + kvm_release_page_clean(arg1_page); + return 1; + } + } + + arg1 = kmap_atomic(arg1_page, KM_USER0); + if (arg2_gpa) + arg2 = kmap_atomic(arg2_page, KM_USER1); + + retval = handler(svm, arg1, arg2, opaque); + + kunmap_atomic(arg1, KM_USER0); + if (arg2_gpa) + kunmap_atomic(arg2, KM_USER1); + + kvm_release_page_dirty(arg1_page); + if (arg2_gpa) + kvm_release_page_dirty(arg2_page); + + return retval; +} + static int invalid_op_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) { -- 1.5.6 ^ permalink raw reply related [flat|nested] 30+ messages in thread
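A hedged usage sketch of the helper (the handler below is hypothetical and only meant to show the calling convention): nested_svm_do() resolves the guest-physical arguments and hands the handler kernel-virtual pointers, so handlers never touch gfn_to_page() or kmap_atomic() themselves.

/* Hypothetical handler, illustration only: arg1 is the kernel mapping of
 * the page at arg1_gpa; arg2 and opaque are unused here. */
static int nested_svm_dump_rip(struct vcpu_svm *svm, void *arg1,
			       void *arg2, void *opaque)
{
	struct vmcb *nested_vmcb = arg1;

	nsvm_printk("nested vmcb rip: 0x%llx\n", nested_vmcb->save.rip);
	return 0;
}

/* caller side, e.g. with the nested VMCB's GPA held in the guest's RAX: */
nested_svm_do(svm, svm->vmcb->save.rax, 0, NULL, nested_svm_dump_rip);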
* [PATCH 4/9] Implement GIF, clgi and stgi v3 2008-09-17 13:41 ` [PATCH 3/9] Add helper functions for nested SVM v3 Alexander Graf @ 2008-09-17 13:41 ` Alexander Graf 2008-09-17 13:41 ` [PATCH 5/9] Implement hsave v3 Alexander Graf 2008-09-25 18:47 ` [PATCH 4/9] Implement GIF, clgi and stgi v3 Joerg Roedel 0 siblings, 2 replies; 30+ messages in thread From: Alexander Graf @ 2008-09-17 13:41 UTC (permalink / raw) To: kvm; +Cc: joro, anthony, avi, Alexander Graf This patch implements the GIF flag and the clgi and stgi instructions that set this flag. Only if the flag is set (default), interrupts can be received by the CPU. To keep the information about that somewhere, this patch adds a new hidden flags vector. that is used to store information that does not go into the vmcb, but is SVM specific. v2 moves the hflags to x86 generic code v3 makes use of the new permission helper Signed-off-by: Alexander Graf <agraf@suse.de> --- arch/x86/kvm/svm.c | 42 +++++++++++++++++++++++++++++++++++++++--- include/asm-x86/kvm_host.h | 3 +++ 2 files changed, 42 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index c72e728..469ecc5 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -612,6 +612,8 @@ static void init_vmcb(struct vcpu_svm *svm) save->cr4 = 0; } force_new_asid(&svm->vcpu); + + svm->vcpu.arch.hflags = HF_GIF_MASK; } static int svm_vcpu_reset(struct kvm_vcpu *vcpu) @@ -1227,6 +1229,36 @@ static int nested_svm_do(struct vcpu_svm *svm, return retval; } +static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) +{ + if (nested_svm_check_permissions(svm)) + return 1; + + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; + skip_emulated_instruction(&svm->vcpu); + + svm->vcpu.arch.hflags |= HF_GIF_MASK; + + return 1; +} + +static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) +{ + if (nested_svm_check_permissions(svm)) + return 1; + + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; + skip_emulated_instruction(&svm->vcpu); + + svm->vcpu.arch.hflags &= ~HF_GIF_MASK; + + /* After a CLGI no interrupts should come */ + svm_clear_vintr(svm); + svm->vmcb->control.int_ctl &= ~V_IRQ_MASK; + + return 1; +} + static int invalid_op_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) { @@ -1521,8 +1553,8 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm, [SVM_EXIT_VMMCALL] = vmmcall_interception, [SVM_EXIT_VMLOAD] = invalid_op_interception, [SVM_EXIT_VMSAVE] = invalid_op_interception, - [SVM_EXIT_STGI] = invalid_op_interception, - [SVM_EXIT_CLGI] = invalid_op_interception, + [SVM_EXIT_STGI] = stgi_interception, + [SVM_EXIT_CLGI] = clgi_interception, [SVM_EXIT_SKINIT] = invalid_op_interception, [SVM_EXIT_WBINVD] = emulate_on_interception, [SVM_EXIT_MONITOR] = invalid_op_interception, @@ -1669,6 +1701,9 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu) if (!kvm_cpu_has_interrupt(vcpu)) goto out; + if (!(svm->vcpu.arch.hflags & HF_GIF_MASK)) + goto out; + if (!(vmcb->save.rflags & X86_EFLAGS_IF) || (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) || (vmcb->control.event_inj & SVM_EVTINJ_VALID)) { @@ -1720,7 +1755,8 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu, svm->vcpu.arch.interrupt_window_open = (!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) && - (svm->vmcb->save.rflags & X86_EFLAGS_IF)); + (svm->vmcb->save.rflags & X86_EFLAGS_IF) && + (svm->vcpu.arch.hflags & HF_GIF_MASK)); if (svm->vcpu.arch.interrupt_window_open && svm->vcpu.arch.irq_summary) /* diff --git a/include/asm-x86/kvm_host.h 
b/include/asm-x86/kvm_host.h index 982b6b2..3e25004 100644 --- a/include/asm-x86/kvm_host.h +++ b/include/asm-x86/kvm_host.h @@ -245,6 +245,7 @@ struct kvm_vcpu_arch { unsigned long cr3; unsigned long cr4; unsigned long cr8; + u32 hflags; u64 pdptrs[4]; /* pae */ u64 shadow_efer; u64 apic_base; @@ -734,6 +735,8 @@ enum { TASK_SWITCH_GATE = 3, }; +#define HF_GIF_MASK (1 << 0) + /* * Hardware virtualization extension instructions may fault if a * reboot turns off virtualization while processes are running. -- 1.5.6 ^ permalink raw reply related [flat|nested] 30+ messages in thread
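For context, stgi and clgi are what the level-1 hypervisor itself executes around its world switch; a simplified sketch of the guest-side wrappers the intercepts above emulate (assuming an assembler that knows the SVM mnemonics; older binutils need the raw byte encodings):

static inline void clgi(void)
{
	asm volatile("clgi");	/* GIF <- 0: interrupts and NMIs are held pending */
}

static inline void stgi(void)
{
	asm volatile("stgi");	/* GIF <- 1: pending interrupts can be taken again */
}

Both instructions require CPL 0 and EFER.SVME, which is exactly what nested_svm_check_permissions() verifies before the intercepts above are allowed to proceed.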
* [PATCH 5/9] Implement hsave v3 2008-09-17 13:41 ` [PATCH 4/9] Implement GIF, clgi and stgi v3 Alexander Graf @ 2008-09-17 13:41 ` Alexander Graf 2008-09-17 13:41 ` [PATCH 6/9] Add VMLOAD and VMSAVE handlers v3 Alexander Graf 2008-09-25 18:47 ` [PATCH 4/9] Implement GIF, clgi and stgi v3 Joerg Roedel 1 sibling, 1 reply; 30+ messages in thread From: Alexander Graf @ 2008-09-17 13:41 UTC (permalink / raw) To: kvm; +Cc: joro, anthony, avi, Alexander Graf Implement the hsave MSR, that gives the VCPU a GPA to save the old guest state in. v2 allows userspace to save/restore hsave Signed-off-by: Alexander Graf <agraf@suse.de> --- arch/x86/kvm/kvm_svm.h | 2 ++ arch/x86/kvm/svm.c | 7 +++++++ 2 files changed, 9 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h index 65ef0fc..76ad107 100644 --- a/arch/x86/kvm/kvm_svm.h +++ b/arch/x86/kvm/kvm_svm.h @@ -41,6 +41,8 @@ struct vcpu_svm { unsigned long host_dr7; u32 *msrpm; + + u64 nested_hsave; }; #endif diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 469ecc5..987d06c 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -613,6 +613,7 @@ static void init_vmcb(struct vcpu_svm *svm) } force_new_asid(&svm->vcpu); + svm->nested_hsave = 0; svm->vcpu.arch.hflags = HF_GIF_MASK; } @@ -1363,6 +1364,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data) case MSR_IA32_LASTINTTOIP: *data = svm->vmcb->save.last_excp_to; break; + case MSR_VM_HSAVE_PA: + *data = svm->nested_hsave; + break; default: return kvm_get_msr_common(vcpu, ecx, data); } @@ -1457,6 +1461,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data) pr_unimpl(vcpu, "unimplemented perfctr wrmsr: 0x%x data 0x%llx\n", ecx, data); break; + case MSR_VM_HSAVE_PA: + svm->nested_hsave = data; + break; default: return kvm_set_msr_common(vcpu, ecx, data); } -- 1.5.6 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 6/9] Add VMLOAD and VMSAVE handlers v3 2008-09-17 13:41 ` [PATCH 5/9] Implement hsave v3 Alexander Graf @ 2008-09-17 13:41 ` Alexander Graf 2008-09-17 13:41 ` [PATCH 7/9] Add VMRUN handler v3 Alexander Graf 0 siblings, 1 reply; 30+ messages in thread From: Alexander Graf @ 2008-09-17 13:41 UTC (permalink / raw) To: kvm; +Cc: joro, anthony, avi, Alexander Graf This implements the VMLOAD and VMSAVE instructions, that usually surround the VMRUN instructions. Both instructions load / restore the same elements, so we only need to implement them once. v2 fixes CPL checking and replaces memcpy by assignments v3 makes use of the new permission checking Signed-off-by: Alexander Graf <agraf@suse.de> --- arch/x86/kvm/svm.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 58 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 987d06c..0aa22e5 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1230,6 +1230,62 @@ static int nested_svm_do(struct vcpu_svm *svm, return retval; } +static int nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb) +{ + to_vmcb->save.fs = from_vmcb->save.fs; + to_vmcb->save.gs = from_vmcb->save.gs; + to_vmcb->save.tr = from_vmcb->save.tr; + to_vmcb->save.ldtr = from_vmcb->save.ldtr; + to_vmcb->save.kernel_gs_base = from_vmcb->save.kernel_gs_base; + to_vmcb->save.star = from_vmcb->save.star; + to_vmcb->save.lstar = from_vmcb->save.lstar; + to_vmcb->save.cstar = from_vmcb->save.cstar; + to_vmcb->save.sfmask = from_vmcb->save.sfmask; + to_vmcb->save.sysenter_cs = from_vmcb->save.sysenter_cs; + to_vmcb->save.sysenter_esp = from_vmcb->save.sysenter_esp; + to_vmcb->save.sysenter_eip = from_vmcb->save.sysenter_eip; + + return 1; +} + +static int nested_svm_vmload(struct vcpu_svm *svm, void *nested_vmcb, + void *arg2, void *opaque) +{ + return nested_svm_vmloadsave((struct vmcb *)nested_vmcb, svm->vmcb); +} + +static int nested_svm_vmsave(struct vcpu_svm *svm, void *nested_vmcb, + void *arg2, void *opaque) +{ + return nested_svm_vmloadsave(svm->vmcb, (struct vmcb *)nested_vmcb); +} + +static int vmload_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) +{ + if (nested_svm_check_permissions(svm)) + return 1; + + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; + skip_emulated_instruction(&svm->vcpu); + + nested_svm_do(svm, svm->vmcb->save.rax, 0, NULL, nested_svm_vmload); + + return 1; +} + +static int vmsave_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) +{ + if (nested_svm_check_permissions(svm)) + return 1; + + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; + skip_emulated_instruction(&svm->vcpu); + + nested_svm_do(svm, svm->vmcb->save.rax, 0, NULL, nested_svm_vmsave); + + return 1; +} + static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) { if (nested_svm_check_permissions(svm)) @@ -1558,8 +1614,8 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm, [SVM_EXIT_SHUTDOWN] = shutdown_interception, [SVM_EXIT_VMRUN] = invalid_op_interception, [SVM_EXIT_VMMCALL] = vmmcall_interception, - [SVM_EXIT_VMLOAD] = invalid_op_interception, - [SVM_EXIT_VMSAVE] = invalid_op_interception, + [SVM_EXIT_VMLOAD] = vmload_interception, + [SVM_EXIT_VMSAVE] = vmsave_interception, [SVM_EXIT_STGI] = stgi_interception, [SVM_EXIT_CLGI] = clgi_interception, [SVM_EXIT_SKINIT] = invalid_op_interception, -- 1.5.6 ^ permalink raw reply related [flat|nested] 30+ messages in thread
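For reference, a simplified sketch of the guest-side sequence these handlers have to emulate (rax holds the guest-physical address of the VMCB; in a real hypervisor the block is bracketed by clgi/stgi and a full GPR save/restore, omitted here):

/* simplified sketch, not the patch's code */
asm volatile(
	"vmload\n\t"	/* load FS/GS/TR/LDTR, KernelGsBase, STAR/LSTAR/CSTAR,
			 * SFMASK and the SYSENTER MSRs from the VMCB */
	"vmrun\n\t"	/* enter the level-2 guest; returns here on #VMEXIT */
	"vmsave\n\t"	/* write the same hidden state back to the VMCB */
	: : "a"(vmcb_pa) : "memory", "cc");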
* [PATCH 7/9] Add VMRUN handler v3 2008-09-17 13:41 ` [PATCH 6/9] Add VMLOAD and VMSAVE handlers v3 Alexander Graf @ 2008-09-17 13:41 ` Alexander Graf 2008-09-17 13:41 ` [PATCH 8/9] Add VMEXIT handler and intercepts v3 Alexander Graf 2008-09-19 15:59 ` [PATCH 7/9] Add VMRUN handler v3 Joerg Roedel 0 siblings, 2 replies; 30+ messages in thread From: Alexander Graf @ 2008-09-17 13:41 UTC (permalink / raw) To: kvm; +Cc: joro, anthony, avi, Alexander Graf This patch implements VMRUN. VMRUN enters a virtual CPU and runs that in the same context as the normal guest CPU would run. So basically it is implemented the same way, a normal CPU would do it. We also prepare all intercepts that get OR'ed with the original intercepts, as we do not allow a level 2 guest to be intercepted less than the first level guest. v2 implements the following improvements: - fixes the CPL check - does not allocate iopm when not used - remembers the host's IF in the HIF bit in the hflags v3: - make use of the new permission checking - add support for V_INTR_MASKING_MASK Signed-off-by: Alexander Graf <agraf@suse.de> --- arch/x86/kvm/kvm_svm.h | 9 ++ arch/x86/kvm/svm.c | 198 +++++++++++++++++++++++++++++++++++++++++++- include/asm-x86/kvm_host.h | 2 + 3 files changed, 207 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h index 76ad107..2afe0ce 100644 --- a/arch/x86/kvm/kvm_svm.h +++ b/arch/x86/kvm/kvm_svm.h @@ -43,6 +43,15 @@ struct vcpu_svm { u32 *msrpm; u64 nested_hsave; + u64 nested_vmcb; + + /* These are the merged vectors */ + u32 *nested_msrpm; + u32 *nested_iopm; + + /* gpa pointers to the real vectors */ + u64 nested_vmcb_msrpm; + u64 nested_vmcb_iopm; }; #endif diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 0aa22e5..3601e75 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -51,6 +51,9 @@ MODULE_LICENSE("GPL"); /* Turn on to get debugging output*/ /* #define NESTED_DEBUG */ +/* Not needed until device passthrough */ +/* #define NESTED_KVM_MERGE_IOPM */ + #ifdef NESTED_DEBUG #define nsvm_printk(fmt, args...) 
printk(KERN_INFO fmt, ## args) #else @@ -76,6 +79,11 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu) return container_of(vcpu, struct vcpu_svm, vcpu); } +static inline bool is_nested(struct vcpu_svm *svm) +{ + return svm->nested_vmcb; +} + static unsigned long iopm_base; struct kvm_ldttss_desc { @@ -614,6 +622,7 @@ static void init_vmcb(struct vcpu_svm *svm) force_new_asid(&svm->vcpu); svm->nested_hsave = 0; + svm->nested_vmcb = 0; svm->vcpu.arch.hflags = HF_GIF_MASK; } @@ -639,6 +648,10 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id) struct vcpu_svm *svm; struct page *page; struct page *msrpm_pages; + struct page *nested_msrpm_pages; +#ifdef NESTED_KVM_MERGE_IOPM + struct page *nested_iopm_pages; +#endif int err; svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL); @@ -661,9 +674,25 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id) msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER); if (!msrpm_pages) goto uninit; + + nested_msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER); + if (!nested_msrpm_pages) + goto uninit; + +#ifdef NESTED_KVM_MERGE_IOPM + nested_iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER); + if (!nested_iopm_pages) + goto uninit; +#endif + svm->msrpm = page_address(msrpm_pages); svm_vcpu_init_msrpm(svm->msrpm); + svm->nested_msrpm = page_address(nested_msrpm_pages); +#ifdef NESTED_KVM_MERGE_IOPM + svm->nested_iopm = page_address(nested_iopm_pages); +#endif + svm->vmcb = page_address(page); clear_page(svm->vmcb); svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT; @@ -693,6 +722,10 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu) __free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT)); __free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER); + __free_pages(virt_to_page(svm->nested_msrpm), MSRPM_ALLOC_ORDER); +#ifdef NESTED_KVM_MERGE_IOPM + __free_pages(virt_to_page(svm->nested_iopm), IOPM_ALLOC_ORDER); +#endif kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, svm); } @@ -1230,6 +1263,138 @@ static int nested_svm_do(struct vcpu_svm *svm, return retval; } + +static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1, + void *arg2, void *opaque) +{ + int i; + u32 *nested_msrpm = (u32*)arg1; + for (i=0; i< PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER) / 4; i++) + svm->nested_msrpm[i] = svm->msrpm[i] | nested_msrpm[i]; + svm->vmcb->control.msrpm_base_pa = __pa(svm->nested_msrpm); + + return 0; +} + +#ifdef NESTED_KVM_MERGE_IOPM +static int nested_svm_vmrun_iopm(struct vcpu_svm *svm, void *arg1, + void *arg2, void *opaque) +{ + int i; + u32 *nested_iopm = (u32*)arg1; + u32 *iopm = (u32*)__va(iopm_base); + for (i=0; i< PAGE_SIZE * (1 << IOPM_ALLOC_ORDER) / 4; i++) + svm->nested_iopm[i] = iopm[i] | nested_iopm[i]; + svm->vmcb->control.iopm_base_pa = __pa(svm->nested_iopm); + + return 0; +} +#endif + +static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1, + void *arg2, void *opaque) +{ + struct vmcb *nested_vmcb = (struct vmcb *)arg1; + struct vmcb *hsave = (struct vmcb *)arg2; + + /* nested_vmcb is our indicator if nested SVM is activated */ + svm->nested_vmcb = svm->vmcb->save.rax; + + /* Clear internal status */ + svm->vcpu.arch.exception.pending = false; + + /* Save the old vmcb, so we don't need to pick what we save, but + can restore everything when a VMEXIT occurs */ + memcpy(hsave, svm->vmcb, sizeof(struct vmcb)); + /* We need to remember the original CR3 in the SPT case */ + if (!npt_enabled) + hsave->save.cr3 = svm->vcpu.arch.cr3; + hsave->save.rip = svm->next_rip; + + if 
(svm->vmcb->save.rflags & X86_EFLAGS_IF) + svm->vcpu.arch.hflags |= HF_HIF_MASK; + else + svm->vcpu.arch.hflags &= ~HF_HIF_MASK; + + /* Load the nested guest state */ + svm->vmcb->save.es = nested_vmcb->save.es; + svm->vmcb->save.cs = nested_vmcb->save.cs; + svm->vmcb->save.ss = nested_vmcb->save.ss; + svm->vmcb->save.ds = nested_vmcb->save.ds; + svm->vmcb->save.gdtr = nested_vmcb->save.gdtr; + svm->vmcb->save.idtr = nested_vmcb->save.idtr; + svm->vmcb->save.rflags = nested_vmcb->save.rflags; + svm_set_efer(&svm->vcpu, nested_vmcb->save.efer); + svm_set_cr0(&svm->vcpu, nested_vmcb->save.cr0); + svm_set_cr4(&svm->vcpu, nested_vmcb->save.cr4); + if (npt_enabled) { + svm->vmcb->save.cr3 = nested_vmcb->save.cr3; + svm->vcpu.arch.cr3 = nested_vmcb->save.cr3; + } else { + kvm_set_cr3(&svm->vcpu, nested_vmcb->save.cr3); + kvm_mmu_reset_context(&svm->vcpu); + } + svm->vmcb->save.cr2 = nested_vmcb->save.cr2; + kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, nested_vmcb->save.rax); + kvm_register_write(&svm->vcpu, VCPU_REGS_RSP, nested_vmcb->save.rsp); + kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, nested_vmcb->save.rip); + /* In case we don't even reach vcpu_run, the fields are not updated */ + svm->vmcb->save.rax = nested_vmcb->save.rax; + svm->vmcb->save.rsp = nested_vmcb->save.rsp; + svm->vmcb->save.rip = nested_vmcb->save.rip; + svm->vmcb->save.dr7 = nested_vmcb->save.dr7; + svm->vmcb->save.dr6 = nested_vmcb->save.dr6; + svm->vmcb->save.cpl = nested_vmcb->save.cpl; + + /* We don't want a nested guest to be more powerful than the guest, + so all intercepts are ORed */ + svm->vmcb->control.intercept_cr_read |= + nested_vmcb->control.intercept_cr_read; + svm->vmcb->control.intercept_cr_write |= + nested_vmcb->control.intercept_cr_write; + svm->vmcb->control.intercept_dr_read |= + nested_vmcb->control.intercept_dr_read; + svm->vmcb->control.intercept_dr_write |= + nested_vmcb->control.intercept_dr_write; + svm->vmcb->control.intercept_exceptions |= + nested_vmcb->control.intercept_exceptions; + + svm->vmcb->control.intercept |= nested_vmcb->control.intercept; + + svm->nested_vmcb_msrpm = nested_vmcb->control.msrpm_base_pa; + svm->nested_vmcb_iopm = nested_vmcb->control.iopm_base_pa; + + force_new_asid(&svm->vcpu); + svm->vmcb->control.exit_int_info = nested_vmcb->control.exit_int_info; + svm->vmcb->control.exit_int_info_err = nested_vmcb->control.exit_int_info_err; + svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK; + if (nested_vmcb->control.int_ctl & V_IRQ_MASK) { + nsvm_printk("nSVM Injecting Interrupt: 0x%x\n", + nested_vmcb->control.int_ctl); + } + if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK) + svm->vcpu.arch.hflags |= HF_VINTR_MASK; + else + svm->vcpu.arch.hflags &= ~HF_VINTR_MASK; + + nsvm_printk("nSVM exit_int_info: 0x%x | int_state: 0x%x\n", + nested_vmcb->control.exit_int_info, + nested_vmcb->control.int_state); + + svm->vmcb->control.int_vector = nested_vmcb->control.int_vector; + svm->vmcb->control.int_state = nested_vmcb->control.int_state; + svm->vmcb->control.tsc_offset += nested_vmcb->control.tsc_offset; + if (nested_vmcb->control.event_inj & SVM_EVTINJ_VALID) + nsvm_printk("Injecting Event: 0x%x\n", + nested_vmcb->control.event_inj); + svm->vmcb->control.event_inj = nested_vmcb->control.event_inj; + svm->vmcb->control.event_inj_err = nested_vmcb->control.event_inj_err; + + svm->vcpu.arch.hflags |= HF_GIF_MASK; + + return 1; +} + static int nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb) { to_vmcb->save.fs = 
from_vmcb->save.fs; @@ -1286,6 +1451,34 @@ static int vmsave_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) return 1; } +static int vmrun_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) +{ + nsvm_printk("VMrun\n"); + if (nested_svm_check_permissions(svm)) + return 1; + + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; + skip_emulated_instruction(&svm->vcpu); + + if (nested_svm_do(svm, svm->vmcb->save.rax, svm->nested_hsave, + NULL, nested_svm_vmrun)) + return 1; + + /* Right now SVM always intecepts all IOs except for the DEBUG port. + This is fine for the nested VM too IMHO. */ +#ifdef NESTED_KVM_MERGE_IOPM + if (nested_svm_do(svm, svm->vmcb->control.iopm_base_pa, 0, + NULL, nested_svm_vmrun_iopm)) + return 1; +#endif + + if (nested_svm_do(svm, svm->vmcb->control.msrpm_base_pa, 0, + NULL, nested_svm_vmrun_msrpm)) + return 1; + + return 1; +} + static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) { if (nested_svm_check_permissions(svm)) @@ -1612,7 +1805,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm, [SVM_EXIT_MSR] = msr_interception, [SVM_EXIT_TASK_SWITCH] = task_switch_interception, [SVM_EXIT_SHUTDOWN] = shutdown_interception, - [SVM_EXIT_VMRUN] = invalid_op_interception, + [SVM_EXIT_VMRUN] = vmrun_interception, [SVM_EXIT_VMMCALL] = vmmcall_interception, [SVM_EXIT_VMLOAD] = vmload_interception, [SVM_EXIT_VMSAVE] = vmsave_interception, @@ -1918,7 +2111,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) svm->host_cr2 = kvm_read_cr2(); svm->host_dr6 = read_dr6(); svm->host_dr7 = read_dr7(); - svm->vmcb->save.cr2 = vcpu->arch.cr2; + if (!is_nested(svm)) + svm->vmcb->save.cr2 = vcpu->arch.cr2; /* required for live migration with NPT */ if (npt_enabled) svm->vmcb->save.cr3 = vcpu->arch.cr3; diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h index 3e25004..5b4042b 100644 --- a/include/asm-x86/kvm_host.h +++ b/include/asm-x86/kvm_host.h @@ -736,6 +736,8 @@ enum { }; #define HF_GIF_MASK (1 << 0) +#define HF_HIF_MASK (1 << 1) +#define HF_VINTR_MASK (1 << 2) /* * Hardware virtualization extension instructions may fault if a -- 1.5.6 ^ permalink raw reply related [flat|nested] 30+ messages in thread
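To make the intercept merge concrete, a small illustration (the specific intercept picked here is made up, not taken from the patch):

/*
 * host-built mask for the level-1 guest:   ... | (1ULL << INTERCEPT_CPUID)
 * mask in the guest-supplied nested vmcb:  INTERCEPT_CPUID not set
 * merged mask after the |= above:          CPUID still intercepted
 *
 * A CPUID executed by the level-2 guest therefore still exits to KVM, but
 * because the nested vmcb never asked for that intercept, the exit is not
 * reflected into the level-1 guest; KVM emulates CPUID itself and resumes
 * the level-2 guest. Only exits the nested vmcb requested are forwarded,
 * which the #VMEXIT handler in the next patch decides.
 */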
* [PATCH 8/9] Add VMEXIT handler and intercepts v3 2008-09-17 13:41 ` [PATCH 7/9] Add VMRUN handler v3 Alexander Graf @ 2008-09-17 13:41 ` Alexander Graf 2008-09-17 13:41 ` [PATCH 9/9] Allow setting the SVME bit v3 Alexander Graf 2008-09-19 15:59 ` [PATCH 7/9] Add VMRUN handler v3 Joerg Roedel 1 sibling, 1 reply; 30+ messages in thread From: Alexander Graf @ 2008-09-17 13:41 UTC (permalink / raw) To: kvm; +Cc: joro, anthony, avi, Alexander Graf This adds the #VMEXIT intercept, so we return to the level 1 guest when something happens in the level 2 guest that should return to the level 1 guest. v2 implements HIF handling and cleans up exception interception v3 adds support for V_INTR_MASKING_MASK Signed-off-by: Alexander Graf <agraf@suse.de> --- arch/x86/kvm/svm.c | 326 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 326 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 3601e75..dc272c7 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -74,6 +74,13 @@ module_param(npt, int, S_IRUGO); static void kvm_reput_irq(struct vcpu_svm *svm); static void svm_flush_tlb(struct kvm_vcpu *vcpu); +static int nested_svm_exit_handled(struct vcpu_svm *svm, bool kvm_override); +static int nested_svm_vmexit(struct vcpu_svm *svm); +static int nested_svm_vmsave(struct vcpu_svm *svm, void *nested_vmcb, + void *arg2, void *opaque); +static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, + bool has_error_code, u32 error_code); + static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu) { return container_of(vcpu, struct vcpu_svm, vcpu); @@ -223,6 +230,11 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr, { struct vcpu_svm *svm = to_svm(vcpu); + /* If we are within a nested VM we'd better #VMEXIT and let the + guest handle the exception */ + if (nested_svm_check_exception(svm, nr, has_error_code, error_code)) + return; + svm->vmcb->control.event_inj = nr | SVM_EVTINJ_VALID | (has_error_code ? 
SVM_EVTINJ_VALID_ERR : 0) @@ -1203,6 +1215,46 @@ static int nested_svm_check_permissions(struct vcpu_svm *svm) return 0; } +static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, + bool has_error_code, u32 error_code) +{ + if (is_nested(svm)) { + svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr; + svm->vmcb->control.exit_code_hi = 0; + svm->vmcb->control.exit_info_1 = error_code; + svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2; + if (nested_svm_exit_handled(svm, false)) { + nsvm_printk("VMexit -> EXCP 0x%x\n", nr); + + nested_svm_vmexit(svm); + return 1; + } + } + + return 0; +} + +static inline int nested_svm_intr(struct vcpu_svm *svm) +{ + if (is_nested(svm)) { + if (!(svm->vcpu.arch.hflags & HF_VINTR_MASK)) + return 0; + + if (!(svm->vcpu.arch.hflags & HF_HIF_MASK)) + return 0; + + svm->vmcb->control.exit_code = SVM_EXIT_INTR; + + if (nested_svm_exit_handled(svm, false)) { + nsvm_printk("VMexit -> INTR\n"); + nested_svm_vmexit(svm); + return 1; + } + } + + return 0; +} + static struct page *nested_svm_get_page(struct vcpu_svm *svm, u64 gpa) { struct page *page; @@ -1263,6 +1315,261 @@ static int nested_svm_do(struct vcpu_svm *svm, return retval; } +static int nested_svm_exit_handled_real(struct vcpu_svm *svm, + void *arg1, + void *arg2, + void *opaque) +{ + struct vmcb *nested_vmcb = (struct vmcb *)arg1; + bool kvm_overrides = *(bool *)opaque; + u32 exit_code = svm->vmcb->control.exit_code; + + if (kvm_overrides) { + switch (exit_code) { + case SVM_EXIT_INTR: + case SVM_EXIT_NMI: + return 0; + /* For now we are always handling NPFs when using them */ + case SVM_EXIT_NPF: + if (npt_enabled) + return 0; + break; + /* When we're shadowing, trap PFs */ + case SVM_EXIT_EXCP_BASE + PF_VECTOR: + if (!npt_enabled) + return 0; + break; + default: + break; + } + } + + switch (exit_code) { + case SVM_EXIT_READ_CR0 ... SVM_EXIT_READ_CR8: { + u32 cr_bits = 1 << (exit_code - SVM_EXIT_READ_CR0); + if (nested_vmcb->control.intercept_cr_read & cr_bits) + return 1; + break; + } + case SVM_EXIT_WRITE_CR0 ... SVM_EXIT_WRITE_CR8: { + u32 cr_bits = 1 << (exit_code - SVM_EXIT_WRITE_CR0); + if (nested_vmcb->control.intercept_cr_write & cr_bits) + return 1; + break; + } + case SVM_EXIT_READ_DR0 ... SVM_EXIT_READ_DR7: { + u32 dr_bits = 1 << (exit_code - SVM_EXIT_READ_DR0); + if (nested_vmcb->control.intercept_dr_read & dr_bits) + return 1; + break; + } + case SVM_EXIT_WRITE_DR0 ... SVM_EXIT_WRITE_DR7: { + u32 dr_bits = 1 << (exit_code - SVM_EXIT_WRITE_DR0); + if (nested_vmcb->control.intercept_dr_write & dr_bits) + return 1; + break; + } + case SVM_EXIT_EXCP_BASE ... 
SVM_EXIT_EXCP_BASE + 0x1f: { + u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE); + if (nested_vmcb->control.intercept_exceptions & excp_bits) + return 1; + break; + } + default: { + u64 exit_bits = 1ULL << (exit_code - SVM_EXIT_INTR); + nsvm_printk("exit code: 0x%x\n", exit_code); + if (nested_vmcb->control.intercept & exit_bits) + return 1; + } + } + + return 0; +} + +#ifdef NESTED_KVM_MERGE_IOPM +static int nested_svm_exit_handled_io(struct vcpu_svm *svm, + void *arg1, void *arg2, + void *opaque) +{ + struct vmcb *nested_vmcb = (struct vmcb *)arg1; + u16 param = (u16)(svm->vmcb->control.exit_info_1); + u16 port = (u16)(svm->vmcb->control.exit_info_1 >> 16); + u16 mask = (1 << ((param >> 4) & 7)) - 1; + u8 *iopm = (u8 *)arg2 + (port / 8); + u16 iopmw = iopm[0] | (iopm[1] << 8); + + if (!(nested_vmcb->control.intercept & (1ULL << INTERCEPT_IOIO_PROT))) + return 0; + if (iopmw & (mask << (port & 7))) + return 1; + + nsvm_printk("nKVM: No IO-Intercept on param=0x%hx port=0x%hx " + "mask=0x%hx iopm=0x%hx\n", param, port, mask, iopmw); + + return 0; +} +#endif + +static int nested_svm_exit_handled_msr(struct vcpu_svm *svm, + void *arg1, void *arg2, + void *opaque) +{ + struct vmcb *nested_vmcb = (struct vmcb *)arg1; + u8 *msrpm = (u8 *)arg2; + u32 t0, t1; + u32 msr = svm->vcpu.arch.regs[VCPU_REGS_RCX]; + u32 param = svm->vmcb->control.exit_info_1 & 1; + + if (!(nested_vmcb->control.intercept & (1ULL << INTERCEPT_MSR_PROT))) + return 0; + + switch(msr) { + case 0 ... 0x1fff: + t0 = (msr * 2) % 8; + t1 = msr / 8; + break; + case 0xc0000000 ... 0xc0001fff: + t0 = (8192 + msr - 0xc0000000) * 2; + t1 = (t0 / 8); + t0 %= 8; + break; + case 0xc0010000 ... 0xc0011fff: + t0 = (16384 + msr - 0xc0010000) * 2; + t1 = (t0 / 8); + t0 %= 8; + break; + default: + return 1; + break; + } + if (msrpm[t1] & ((1 << param) << t0)) + return 1; + + return 0; +} + +static int nested_svm_exit_handled(struct vcpu_svm *svm, bool kvm_override) +{ + bool k = kvm_override; + + switch (svm->vmcb->control.exit_code) { +#ifdef NESTED_KVM_MERGE_IOPM + case SVM_EXIT_IOIO: + return nested_svm_do(svm, svm->nested_vmcb, + svm->nested_vmcb_iopm, NULL, + nested_svm_exit_handled_io); + break; +#endif + case SVM_EXIT_MSR: + return nested_svm_do(svm, svm->nested_vmcb, + svm->nested_vmcb_msrpm, NULL, + nested_svm_exit_handled_msr); + default: break; + } + + return nested_svm_do(svm, svm->nested_vmcb, 0, &k, + nested_svm_exit_handled_real); +} + +static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1, + void *arg2, void *opaque) +{ + struct vmcb *nested_vmcb = (struct vmcb *)arg1; + struct vmcb *hsave = (struct vmcb *)arg2; + u64 nested_save[] = { nested_vmcb->save.cr0, + nested_vmcb->save.cr3, + nested_vmcb->save.cr4, + nested_vmcb->save.efer, + nested_vmcb->control.intercept_cr_read, + nested_vmcb->control.intercept_cr_write, + nested_vmcb->control.intercept_dr_read, + nested_vmcb->control.intercept_dr_write, + nested_vmcb->control.intercept_exceptions, + nested_vmcb->control.intercept, + nested_vmcb->control.msrpm_base_pa, + nested_vmcb->control.iopm_base_pa, + nested_vmcb->control.tsc_offset }; + + /* Give the current vmcb to the guest */ + memcpy(nested_vmcb, svm->vmcb, sizeof(struct vmcb)); + nested_vmcb->save.cr0 = nested_save[0]; + if (!npt_enabled) + nested_vmcb->save.cr3 = nested_save[1]; + nested_vmcb->save.cr4 = nested_save[2]; + nested_vmcb->save.efer = nested_save[3]; + nested_vmcb->control.intercept_cr_read = nested_save[4]; + nested_vmcb->control.intercept_cr_write = nested_save[5]; + 
nested_vmcb->control.intercept_dr_read = nested_save[6]; + nested_vmcb->control.intercept_dr_write = nested_save[7]; + nested_vmcb->control.intercept_exceptions = nested_save[8]; + nested_vmcb->control.intercept = nested_save[9]; + nested_vmcb->control.msrpm_base_pa = nested_save[10]; + nested_vmcb->control.iopm_base_pa = nested_save[11]; + nested_vmcb->control.tsc_offset = nested_save[12]; + + /* We always set V_INTR_MASKING and remember the old value in hflags */ + if (!(svm->vcpu.arch.hflags & HF_VINTR_MASK)) + nested_vmcb->control.int_ctl &= ~V_INTR_MASKING_MASK; + + if ((nested_vmcb->control.int_ctl & V_IRQ_MASK) && + (nested_vmcb->control.int_vector)) { + nsvm_printk("WARNING: IRQ 0x%x still enabled on #VMEXIT\n", + nested_vmcb->control.int_vector); + } + + /* Restore the original control entries */ + svm->vmcb->control = hsave->control; + + /* Flush the virtual TLB */ + force_new_asid(&svm->vcpu); + + /* Kill any pending exceptions */ + if (svm->vcpu.arch.exception.pending == true) + nsvm_printk("WARNING: Pending Exception\n"); + svm->vcpu.arch.exception.pending = false; + + /* Restore selected save entries */ + svm->vmcb->save.es = hsave->save.es; + svm->vmcb->save.cs = hsave->save.cs; + svm->vmcb->save.ss = hsave->save.ss; + svm->vmcb->save.ds = hsave->save.ds; + svm->vmcb->save.gdtr = hsave->save.gdtr; + svm->vmcb->save.idtr = hsave->save.idtr; + svm->vmcb->save.rflags = hsave->save.rflags; + svm_set_efer(&svm->vcpu, hsave->save.efer); + svm_set_cr0(&svm->vcpu, hsave->save.cr0 | X86_CR0_PE); + svm_set_cr4(&svm->vcpu, hsave->save.cr4); + if (npt_enabled) { + svm->vmcb->save.cr3 = hsave->save.cr3; + svm->vcpu.arch.cr3 = hsave->save.cr3; + } else { + kvm_set_cr3(&svm->vcpu, hsave->save.cr3); + } + kvm_mmu_reset_context(&svm->vcpu); + kvm_mmu_load(&svm->vcpu); + kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, hsave->save.rax); + kvm_register_write(&svm->vcpu, VCPU_REGS_RSP, hsave->save.rsp); + kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, hsave->save.rip); + svm->vmcb->save.dr7 = 0; + svm->vmcb->save.cpl = 0; + svm->vmcb->control.exit_int_info = 0; + + svm->vcpu.arch.hflags &= ~HF_GIF_MASK; + /* Exit nested SVM mode */ + svm->nested_vmcb = 0; + + return 0; +} + +static int nested_svm_vmexit(struct vcpu_svm *svm) +{ + nsvm_printk("VMexit\n"); + if (nested_svm_do(svm, svm->nested_vmcb, svm->nested_hsave, + NULL, nested_svm_vmexit_real)) + return 1; + + return 0; +} static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1, void *arg2, void *opaque) @@ -1826,6 +2133,17 @@ static int handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) KVMTRACE_3D(VMEXIT, vcpu, exit_code, (u32)svm->vmcb->save.rip, (u32)((u64)svm->vmcb->save.rip >> 32), entryexit); + if (is_nested(svm)) { + nsvm_printk("nested handle_exit: 0x%x | 0x%lx | 0x%lx | 0x%lx\n", + exit_code, svm->vmcb->control.exit_info_1, + svm->vmcb->control.exit_info_2, svm->vmcb->save.rip); + if (nested_svm_exit_handled(svm, true)) { + nested_svm_vmexit(svm); + nsvm_printk("-> #VMEXIT\n"); + return 1; + } + } + if (npt_enabled) { int mmu_reload = 0; if ((vcpu->arch.cr0 ^ svm->vmcb->save.cr0) & X86_CR0_PG) { @@ -1912,6 +2230,8 @@ static void svm_set_irq(struct kvm_vcpu *vcpu, int irq) { struct vcpu_svm *svm = to_svm(vcpu); + nested_svm_intr(svm); + svm_inject_irq(svm, irq); } @@ -1957,6 +2277,9 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu) if (!kvm_cpu_has_interrupt(vcpu)) goto out; + if (nested_svm_intr(svm)) + goto out; + if (!(svm->vcpu.arch.hflags & HF_GIF_MASK)) goto out; @@ -2009,6 +2332,9 @@ static void 
do_interrupt_requests(struct kvm_vcpu *vcpu, struct vcpu_svm *svm = to_svm(vcpu); struct vmcb_control_area *control = &svm->vmcb->control; + if (nested_svm_intr(svm)) + return; + svm->vcpu.arch.interrupt_window_open = (!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) && (svm->vmcb->save.rflags & X86_EFLAGS_IF) && -- 1.5.6 ^ permalink raw reply related [flat|nested] 30+ messages in thread
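As a worked example of the msrpm arithmetic in nested_svm_exit_handled_msr() (EFER is just an illustrative pick):

/*
 * MSRs 0xC0000000-0xC0001FFF live in the second 2KB of the 8KB permission
 * map, two bits per MSR (bit 0 = RDMSR, bit 1 = WRMSR). For EFER,
 * MSR 0xC0000080:
 *
 *   bit offset = (8192 + 0x80) * 2 = 16640
 *   t1 (byte)  = 16640 / 8        = 2080
 *   t0 (bit)   = 16640 % 8        = 0
 *
 * So a nested guest that wants WRMSR to EFER reflected to it sets bit 1 of
 * byte 2080 in its msrpm, and the check above returns 1 for that exit.
 */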
* [PATCH 9/9] Allow setting the SVME bit v3 2008-09-17 13:41 ` [PATCH 8/9] Add VMEXIT handler and intercepts v3 Alexander Graf @ 2008-09-17 13:41 ` Alexander Graf 0 siblings, 0 replies; 30+ messages in thread From: Alexander Graf @ 2008-09-17 13:41 UTC (permalink / raw) To: kvm; +Cc: joro, anthony, avi, Alexander Graf Normally setting the SVME bit in EFER is not allowed, as we did not support SVM. Now since we do, we should also allow enabling SVM mode. v2 comes as last patch, so we don't enable half-ready code Signed-off-by: Alexander Graf <agraf@suse.de> --- arch/x86/kvm/svm.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index dc272c7..ac108b6 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -460,6 +460,8 @@ static __init int svm_hardware_setup(void) if (boot_cpu_has(X86_FEATURE_NX)) kvm_enable_efer_bits(EFER_NX); + kvm_enable_efer_bits(MSR_EFER_SVME_MASK); + for_each_online_cpu(cpu) { r = svm_cpu_init(cpu); if (r) -- 1.5.6 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] Add VMRUN handler v3 2008-09-17 13:41 ` [PATCH 7/9] Add VMRUN handler v3 Alexander Graf 2008-09-17 13:41 ` [PATCH 8/9] Add VMEXIT handler and intercepts v3 Alexander Graf @ 2008-09-19 15:59 ` Joerg Roedel 2008-09-25 17:32 ` Alexander Graf 1 sibling, 1 reply; 30+ messages in thread From: Joerg Roedel @ 2008-09-19 15:59 UTC (permalink / raw) To: Alexander Graf; +Cc: kvm, joro, anthony, avi On Wed, Sep 17, 2008 at 03:41:24PM +0200, Alexander Graf wrote: > This patch implements VMRUN. VMRUN enters a virtual CPU and runs that > in the same context as the normal guest CPU would run. > So basically it is implemented the same way, a normal CPU would do it. > > We also prepare all intercepts that get OR'ed with the original > intercepts, as we do not allow a level 2 guest to be intercepted less > than the first level guest. > > v2 implements the following improvements: > > - fixes the CPL check > - does not allocate iopm when not used > - remembers the host's IF in the HIF bit in the hflags > > v3: > > - make use of the new permission checking > - add support for V_INTR_MASKING_MASK > > Signed-off-by: Alexander Graf <agraf@suse.de> > --- > arch/x86/kvm/kvm_svm.h | 9 ++ > arch/x86/kvm/svm.c | 198 +++++++++++++++++++++++++++++++++++++++++++- > include/asm-x86/kvm_host.h | 2 + > 3 files changed, 207 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h > index 76ad107..2afe0ce 100644 > --- a/arch/x86/kvm/kvm_svm.h > +++ b/arch/x86/kvm/kvm_svm.h > @@ -43,6 +43,15 @@ struct vcpu_svm { > u32 *msrpm; > > u64 nested_hsave; > + u64 nested_vmcb; > + > + /* These are the merged vectors */ > + u32 *nested_msrpm; > + u32 *nested_iopm; > + > + /* gpa pointers to the real vectors */ > + u64 nested_vmcb_msrpm; > + u64 nested_vmcb_iopm; > }; > > #endif > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index 0aa22e5..3601e75 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -51,6 +51,9 @@ MODULE_LICENSE("GPL"); > /* Turn on to get debugging output*/ > /* #define NESTED_DEBUG */ > > +/* Not needed until device passthrough */ > +/* #define NESTED_KVM_MERGE_IOPM */ > + > #ifdef NESTED_DEBUG > #define nsvm_printk(fmt, args...) 
printk(KERN_INFO fmt, ## args) > #else > @@ -76,6 +79,11 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu) > return container_of(vcpu, struct vcpu_svm, vcpu); > } > > +static inline bool is_nested(struct vcpu_svm *svm) > +{ > + return svm->nested_vmcb; > +} > + > static unsigned long iopm_base; > > struct kvm_ldttss_desc { > @@ -614,6 +622,7 @@ static void init_vmcb(struct vcpu_svm *svm) > force_new_asid(&svm->vcpu); > > svm->nested_hsave = 0; > + svm->nested_vmcb = 0; > svm->vcpu.arch.hflags = HF_GIF_MASK; > } > > @@ -639,6 +648,10 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id) > struct vcpu_svm *svm; > struct page *page; > struct page *msrpm_pages; > + struct page *nested_msrpm_pages; > +#ifdef NESTED_KVM_MERGE_IOPM > + struct page *nested_iopm_pages; > +#endif > int err; > > svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL); > @@ -661,9 +674,25 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id) > msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER); > if (!msrpm_pages) > goto uninit; > + > + nested_msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER); > + if (!nested_msrpm_pages) > + goto uninit; > + > +#ifdef NESTED_KVM_MERGE_IOPM > + nested_iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER); > + if (!nested_iopm_pages) > + goto uninit; > +#endif > + > svm->msrpm = page_address(msrpm_pages); > svm_vcpu_init_msrpm(svm->msrpm); > > + svm->nested_msrpm = page_address(nested_msrpm_pages); > +#ifdef NESTED_KVM_MERGE_IOPM > + svm->nested_iopm = page_address(nested_iopm_pages); > +#endif > + > svm->vmcb = page_address(page); > clear_page(svm->vmcb); > svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT; > @@ -693,6 +722,10 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu) > > __free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT)); > __free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER); > + __free_pages(virt_to_page(svm->nested_msrpm), MSRPM_ALLOC_ORDER); > +#ifdef NESTED_KVM_MERGE_IOPM > + __free_pages(virt_to_page(svm->nested_iopm), IOPM_ALLOC_ORDER); > +#endif > kvm_vcpu_uninit(vcpu); > kmem_cache_free(kvm_vcpu_cache, svm); > } > @@ -1230,6 +1263,138 @@ static int nested_svm_do(struct vcpu_svm *svm, > return retval; > } > > + > +static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1, > + void *arg2, void *opaque) > +{ > + int i; > + u32 *nested_msrpm = (u32*)arg1; > + for (i=0; i< PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER) / 4; i++) > + svm->nested_msrpm[i] = svm->msrpm[i] | nested_msrpm[i]; > + svm->vmcb->control.msrpm_base_pa = __pa(svm->nested_msrpm); > + > + return 0; > +} > + > +#ifdef NESTED_KVM_MERGE_IOPM > +static int nested_svm_vmrun_iopm(struct vcpu_svm *svm, void *arg1, > + void *arg2, void *opaque) > +{ > + int i; > + u32 *nested_iopm = (u32*)arg1; > + u32 *iopm = (u32*)__va(iopm_base); > + for (i=0; i< PAGE_SIZE * (1 << IOPM_ALLOC_ORDER) / 4; i++) > + svm->nested_iopm[i] = iopm[i] | nested_iopm[i]; > + svm->vmcb->control.iopm_base_pa = __pa(svm->nested_iopm); > + > + return 0; > +} > +#endif > + > +static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1, > + void *arg2, void *opaque) > +{ > + struct vmcb *nested_vmcb = (struct vmcb *)arg1; > + struct vmcb *hsave = (struct vmcb *)arg2; > + > + /* nested_vmcb is our indicator if nested SVM is activated */ > + svm->nested_vmcb = svm->vmcb->save.rax; > + > + /* Clear internal status */ > + svm->vcpu.arch.exception.pending = false; > + > + /* Save the old vmcb, so we don't need to pick what we save, but > + can restore everything when a 
VMEXIT occurs */ > + memcpy(hsave, svm->vmcb, sizeof(struct vmcb)); This is a big security hole. With this we give the guest access to its own VMCB. The guest can take over or crash the whole host machine by rewriting its VMCB. We should be more selective what we save in the hsave area. Joerg -- | AMD Saxony Limited Liability Company & Co. KG Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany System | Register Court Dresden: HRA 4896 Research | General Partner authorized to represent: Center | AMD Saxony LLC (Wilmington, Delaware, US) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] Add VMRUN handler v3 2008-09-19 15:59 ` [PATCH 7/9] Add VMRUN handler v3 Joerg Roedel @ 2008-09-25 17:32 ` Alexander Graf 2008-09-25 17:37 ` Joerg Roedel 0 siblings, 1 reply; 30+ messages in thread From: Alexander Graf @ 2008-09-25 17:32 UTC (permalink / raw) To: Joerg Roedel Cc: kvm@vger.kernel.org, joro@8bytes.org, anthony@codemonkey.ws, avi@qumranet.com Am 19.09.2008 um 17:59 schrieb Joerg Roedel <joerg.roedel@amd.com>: > On Wed, Sep 17, 2008 at 03:41:24PM +0200, Alexander Graf wrote: >> This patch implements VMRUN. VMRUN enters a virtual CPU and runs that >> in the same context as the normal guest CPU would run. >> So basically it is implemented the same way, a normal CPU would do >> it. >> >> We also prepare all intercepts that get OR'ed with the original >> intercepts, as we do not allow a level 2 guest to be intercepted less >> than the first level guest. >> >> v2 implements the following improvements: >> >> - fixes the CPL check >> - does not allocate iopm when not used >> - remembers the host's IF in the HIF bit in the hflags >> >> v3: >> >> - make use of the new permission checking >> - add support for V_INTR_MASKING_MASK >> >> Signed-off-by: Alexander Graf <agraf@suse.de> >> --- >> arch/x86/kvm/kvm_svm.h | 9 ++ >> arch/x86/kvm/svm.c | 198 ++++++++++++++++++++++++++++++++++ >> +++++++++- >> include/asm-x86/kvm_host.h | 2 + >> 3 files changed, 207 insertions(+), 2 deletions(-) >> >> diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h >> index 76ad107..2afe0ce 100644 >> --- a/arch/x86/kvm/kvm_svm.h >> +++ b/arch/x86/kvm/kvm_svm.h >> @@ -43,6 +43,15 @@ struct vcpu_svm { >> u32 *msrpm; >> >> u64 nested_hsave; >> + u64 nested_vmcb; >> + >> + /* These are the merged vectors */ >> + u32 *nested_msrpm; >> + u32 *nested_iopm; >> + >> + /* gpa pointers to the real vectors */ >> + u64 nested_vmcb_msrpm; >> + u64 nested_vmcb_iopm; >> }; >> >> #endif >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c >> index 0aa22e5..3601e75 100644 >> --- a/arch/x86/kvm/svm.c >> +++ b/arch/x86/kvm/svm.c >> @@ -51,6 +51,9 @@ MODULE_LICENSE("GPL"); >> /* Turn on to get debugging output*/ >> /* #define NESTED_DEBUG */ >> >> +/* Not needed until device passthrough */ >> +/* #define NESTED_KVM_MERGE_IOPM */ >> + >> #ifdef NESTED_DEBUG >> #define nsvm_printk(fmt, args...) 
printk(KERN_INFO fmt, ## args) >> #else >> @@ -76,6 +79,11 @@ static inline struct vcpu_svm *to_svm(struct >> kvm_vcpu *vcpu) >> return container_of(vcpu, struct vcpu_svm, vcpu); >> } >> >> +static inline bool is_nested(struct vcpu_svm *svm) >> +{ >> + return svm->nested_vmcb; >> +} >> + >> static unsigned long iopm_base; >> >> struct kvm_ldttss_desc { >> @@ -614,6 +622,7 @@ static void init_vmcb(struct vcpu_svm *svm) >> force_new_asid(&svm->vcpu); >> >> svm->nested_hsave = 0; >> + svm->nested_vmcb = 0; >> svm->vcpu.arch.hflags = HF_GIF_MASK; >> } >> >> @@ -639,6 +648,10 @@ static struct kvm_vcpu *svm_create_vcpu(struct >> kvm *kvm, unsigned int id) >> struct vcpu_svm *svm; >> struct page *page; >> struct page *msrpm_pages; >> + struct page *nested_msrpm_pages; >> +#ifdef NESTED_KVM_MERGE_IOPM >> + struct page *nested_iopm_pages; >> +#endif >> int err; >> >> svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL); >> @@ -661,9 +674,25 @@ static struct kvm_vcpu *svm_create_vcpu(struct >> kvm *kvm, unsigned int id) >> msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER); >> if (!msrpm_pages) >> goto uninit; >> + >> + nested_msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER); >> + if (!nested_msrpm_pages) >> + goto uninit; >> + >> +#ifdef NESTED_KVM_MERGE_IOPM >> + nested_iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER); >> + if (!nested_iopm_pages) >> + goto uninit; >> +#endif >> + >> svm->msrpm = page_address(msrpm_pages); >> svm_vcpu_init_msrpm(svm->msrpm); >> >> + svm->nested_msrpm = page_address(nested_msrpm_pages); >> +#ifdef NESTED_KVM_MERGE_IOPM >> + svm->nested_iopm = page_address(nested_iopm_pages); >> +#endif >> + >> svm->vmcb = page_address(page); >> clear_page(svm->vmcb); >> svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT; >> @@ -693,6 +722,10 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu) >> >> __free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT)); >> __free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER); >> + __free_pages(virt_to_page(svm->nested_msrpm), >> MSRPM_ALLOC_ORDER); >> +#ifdef NESTED_KVM_MERGE_IOPM >> + __free_pages(virt_to_page(svm->nested_iopm), IOPM_ALLOC_ORDER); >> +#endif >> kvm_vcpu_uninit(vcpu); >> kmem_cache_free(kvm_vcpu_cache, svm); >> } >> @@ -1230,6 +1263,138 @@ static int nested_svm_do(struct vcpu_svm >> *svm, >> return retval; >> } >> >> + >> +static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1, >> + void *arg2, void *opaque) >> +{ >> + int i; >> + u32 *nested_msrpm = (u32*)arg1; >> + for (i=0; i< PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER) / 4; i++) >> + svm->nested_msrpm[i] = svm->msrpm[i] | nested_msrpm[i]; >> + svm->vmcb->control.msrpm_base_pa = __pa(svm->nested_msrpm); >> + >> + return 0; >> +} >> + >> +#ifdef NESTED_KVM_MERGE_IOPM >> +static int nested_svm_vmrun_iopm(struct vcpu_svm *svm, void *arg1, >> + void *arg2, void *opaque) >> +{ >> + int i; >> + u32 *nested_iopm = (u32*)arg1; >> + u32 *iopm = (u32*)__va(iopm_base); >> + for (i=0; i< PAGE_SIZE * (1 << IOPM_ALLOC_ORDER) / 4; i++) >> + svm->nested_iopm[i] = iopm[i] | nested_iopm[i]; >> + svm->vmcb->control.iopm_base_pa = __pa(svm->nested_iopm); >> + >> + return 0; >> +} >> +#endif >> + >> +static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1, >> + void *arg2, void *opaque) >> +{ >> + struct vmcb *nested_vmcb = (struct vmcb *)arg1; >> + struct vmcb *hsave = (struct vmcb *)arg2; >> + >> + /* nested_vmcb is our indicator if nested SVM is activated */ >> + svm->nested_vmcb = svm->vmcb->save.rax; >> + >> + /* Clear internal status */ >> + 
svm->vcpu.arch.exception.pending = false; >> + >> + /* Save the old vmcb, so we don't need to pick what we save, but >> + can restore everything when a VMEXIT occurs */ >> + memcpy(hsave, svm->vmcb, sizeof(struct vmcb)); > > This is a big security hole. With this we give the guest access to its > own VMCB. The guest can take over or crash the whole host machine by > rewriting its VMCB. We should be more selective what we save in the > hsave area. Oh, right. I didn't even think of a case where the nested guest would have access to the hsave of itself. Since the hsave can never be used twice on one vcpu, we could just allocate our own memory for the hsave in the vcpu context and leave the nested hsave empty. Alex > > > Joerg > > -- > | AMD Saxony Limited Liability Company & Co. KG > Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany > System | Register Court Dresden: HRA 4896 > Research | General Partner authorized to represent: > Center | AMD Saxony LLC (Wilmington, Delaware, US) > | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, > Thomas McCoy > ^ permalink raw reply [flat|nested] 30+ messages in thread
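A minimal sketch of that idea (all names below are placeholders, not from the patchset): allocate one host-private page per VCPU at creation time and use it as the hsave area, so the address the guest writes into the HSAVE MSR is only echoed back on reads and never used as a target for host state.

/* sketch only */
struct vmcb *hsave;			/* host-private, never guest-visible */

/* svm_create_vcpu(): */
hsave_page = alloc_page(GFP_KERNEL);
if (!hsave_page)
	goto uninit;
svm->hsave = page_address(hsave_page);
clear_page(svm->hsave);

/* svm_free_vcpu(): */
__free_page(virt_to_page(svm->hsave));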
* Re: [PATCH 7/9] Add VMRUN handler v3 2008-09-25 17:32 ` Alexander Graf @ 2008-09-25 17:37 ` Joerg Roedel 2008-09-25 20:00 ` Alexander Graf 0 siblings, 1 reply; 30+ messages in thread From: Joerg Roedel @ 2008-09-25 17:37 UTC (permalink / raw) To: Alexander Graf Cc: kvm@vger.kernel.org, joro@8bytes.org, anthony@codemonkey.ws, avi@qumranet.com On Thu, Sep 25, 2008 at 07:32:55PM +0200, Alexander Graf wrote: > >This is a big security hole. With this we give the guest access to its > >own VMCB. The guest can take over or crash the whole host machine by > >rewriting its VMCB. We should be more selective what we save in the > >hsave area. > > Oh, right. I didn't even think of a case where the nested guest would > have acvess to the hsave of itself. Since the hsave can never be used > twice on one vcpu, we could just allocate our own memory for the hsave > in the vcpu context and leave the nested hsave empty. I think we could also gain performance by only saving the important parts of the VMCB and not the whole page. Joerg -- | AMD Saxony Limited Liability Company & Co. KG Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany System | Register Court Dresden: HRA 4896 Research | General Partner authorized to represent: Center | AMD Saxony LLC (Wilmington, Delaware, US) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy ^ permalink raw reply [flat|nested] 30+ messages in thread
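One possible reading of the selective-save idea, sketched here rather than quoted from
any posted patch: copy only the save-area fields the host needs restored after #VMEXIT
instead of the full 4K page. The exact field list below is an illustrative assumption and
almost certainly incomplete.

/* Sketch: save only selected host state instead of the whole VMCB page. */
static void copy_vmcb_host_state(struct vmcb *dst, struct vmcb *src)
{
	dst->save.es     = src->save.es;
	dst->save.cs     = src->save.cs;
	dst->save.ss     = src->save.ss;
	dst->save.ds     = src->save.ds;
	dst->save.gdtr   = src->save.gdtr;
	dst->save.idtr   = src->save.idtr;
	dst->save.efer   = src->save.efer;
	dst->save.cr0    = src->save.cr0;
	dst->save.cr3    = src->save.cr3;
	dst->save.cr4    = src->save.cr4;
	dst->save.rflags = src->save.rflags;
	dst->save.rip    = src->save.rip;
	dst->save.rsp    = src->save.rsp;
	dst->save.rax    = src->save.rax;
}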
* Re: [PATCH 7/9] Add VMRUN handler v3 2008-09-25 17:37 ` Joerg Roedel @ 2008-09-25 20:00 ` Alexander Graf 2008-09-25 21:22 ` joro 2008-09-27 12:58 ` Avi Kivity 0 siblings, 2 replies; 30+ messages in thread From: Alexander Graf @ 2008-09-25 20:00 UTC (permalink / raw) To: Joerg Roedel Cc: kvm@vger.kernel.org, joro@8bytes.org, anthony@codemonkey.ws, avi@qumranet.com On 25.09.2008, at 19:37, Joerg Roedel wrote: > On Thu, Sep 25, 2008 at 07:32:55PM +0200, Alexander Graf wrote: >>> This is a big security hole. With this we give the guest access to >>> its >>> own VMCB. The guest can take over or crash the whole host machine by >>> rewriting its VMCB. We should be more selective what we save in the >>> hsave area. >> >> Oh, right. I didn't even think of a case where the nested guest would >> have acvess to the hsave of itself. Since the hsave can never be used >> twice on one vcpu, we could just allocate our own memory for the >> hsave >> in the vcpu context and leave the nested hsave empty. > > I think we could also gain performance by only saving the important > parts of the VMCB and not the whole page. Is copying one page really that expensive? Is there any accelerated function available for that that copies it with SSE or so? :-) Alex > > > Joerg > > -- > | AMD Saxony Limited Liability Company & Co. KG > Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany > System | Register Court Dresden: HRA 4896 > Research | General Partner authorized to represent: > Center | AMD Saxony LLC (Wilmington, Delaware, US) > | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, > Thomas McCoy > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] Add VMRUN handler v3 2008-09-25 20:00 ` Alexander Graf @ 2008-09-25 21:22 ` joro 2008-09-27 12:59 ` Avi Kivity 2008-09-27 12:58 ` Avi Kivity 1 sibling, 1 reply; 30+ messages in thread From: joro @ 2008-09-25 21:22 UTC (permalink / raw) To: Alexander Graf Cc: Joerg Roedel, kvm@vger.kernel.org, anthony@codemonkey.ws, avi@qumranet.com On Thu, Sep 25, 2008 at 10:00:17PM +0200, Alexander Graf wrote: > > On 25.09.2008, at 19:37, Joerg Roedel wrote: > > >On Thu, Sep 25, 2008 at 07:32:55PM +0200, Alexander Graf wrote: > >>>This is a big security hole. With this we give the guest access to > >>>its > >>>own VMCB. The guest can take over or crash the whole host machine by > >>>rewriting its VMCB. We should be more selective what we save in the > >>>hsave area. > >> > >>Oh, right. I didn't even think of a case where the nested guest would > >>have acvess to the hsave of itself. Since the hsave can never be used > >>twice on one vcpu, we could just allocate our own memory for the > >>hsave > >>in the vcpu context and leave the nested hsave empty. > > > >I think we could also gain performance by only saving the important > >parts of the VMCB and not the whole page. > > Is copying one page really that expensive? Is there any accelerated > function available for that that copies it with SSE or so? :-) Copying data in memory is always expensive because the accesses may miss in the caches and data must be fetched from memory. As far as I know this can be around 150 cycles per cache line. Joerg > >-- > > | AMD Saxony Limited Liability Company & Co. KG > >Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany > >System | Register Court Dresden: HRA 4896 > >Research | General Partner authorized to represent: > >Center | AMD Saxony LLC (Wilmington, Delaware, US) > > | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, > >Thomas McCoy > > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] Add VMRUN handler v3 2008-09-25 21:22 ` joro @ 2008-09-27 12:59 ` Avi Kivity 0 siblings, 0 replies; 30+ messages in thread From: Avi Kivity @ 2008-09-27 12:59 UTC (permalink / raw) To: joro@8bytes.org Cc: Alexander Graf, Joerg Roedel, kvm@vger.kernel.org, anthony@codemonkey.ws joro@8bytes.org wrote: > Copying data in memory is always expensive because the accesses may miss > in the caches and data must be fetched from memory. As far as I know > this can be around 150 cycles per cache line. > When the copy is sequential, the processor will prefetch the data ahead of time, so you don't get the full hit. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] Add VMRUN handler v3 2008-09-25 20:00 ` Alexander Graf 2008-09-25 21:22 ` joro @ 2008-09-27 12:58 ` Avi Kivity 1 sibling, 0 replies; 30+ messages in thread From: Avi Kivity @ 2008-09-27 12:58 UTC (permalink / raw) To: Alexander Graf Cc: Joerg Roedel, kvm@vger.kernel.org, joro@8bytes.org, anthony@codemonkey.ws Alexander Graf wrote: > > Is copying one page really that expensive? Is there any accelerated > function available for that that copies it with SSE or so? :-) > 'rep movs' is supposed to be accelerated, doing cacheline-by-cacheline copies (at least on Intel). In any case the kernel memcpy() should pick the fastest method. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 30+ messages in thread
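To put rough numbers on the copy-cost question above: a 4096-byte VMCB spans 64 cache
lines, so at the ~150 cycles per line quoted by Joerg a completely cold copy would be on
the order of 10,000 cycles, while a warm, prefetched 'rep movs' copy is far cheaper. The
userspace snippet below is a crude, illustrative way to see this on real hardware; it is
not part of the patchset, RDTSC is not serializing, and the printed numbers depend
heavily on cache state and the machine.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	/* page-sized, page-aligned buffers standing in for a VMCB and hsave area */
	static unsigned char src[4096] __attribute__((aligned(4096)));
	static unsigned char dst[4096] __attribute__((aligned(4096)));
	uint64_t start, end;
	int i;

	memset(src, 0xaa, sizeof(src));	/* warm the source */

	for (i = 0; i < 5; i++) {
		start = rdtsc();
		memcpy(dst, src, sizeof(src));
		end = rdtsc();
		printf("copy %d: %llu cycles\n", i,
		       (unsigned long long)(end - start));
	}
	return 0;
}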
* Re: [PATCH 4/9] Implement GIF, clgi and stgi v3 2008-09-17 13:41 ` [PATCH 4/9] Implement GIF, clgi and stgi v3 Alexander Graf 2008-09-17 13:41 ` [PATCH 5/9] Implement hsave v3 Alexander Graf @ 2008-09-25 18:47 ` Joerg Roedel 2008-09-25 19:55 ` Alexander Graf 2008-09-27 12:52 ` Avi Kivity 1 sibling, 2 replies; 30+ messages in thread From: Joerg Roedel @ 2008-09-25 18:47 UTC (permalink / raw) To: Alexander Graf; +Cc: kvm, anthony, avi I had another possible idea for performance improvement here. Since we only inject normal interrupts and exceptions (and not NMI and such) we can patch clgi to cli and stgi to sti to save these two intercepts in the guests vmrun path. Any objections/problems with this? On Wed, Sep 17, 2008 at 03:41:21PM +0200, Alexander Graf wrote: > This patch implements the GIF flag and the clgi and stgi instructions that > set this flag. Only if the flag is set (default), interrupts can be received by > the CPU. > > To keep the information about that somewhere, this patch adds a new hidden > flags vector. that is used to store information that does not go into the > vmcb, but is SVM specific. > > v2 moves the hflags to x86 generic code > v3 makes use of the new permission helper > > Signed-off-by: Alexander Graf <agraf@suse.de> > --- > arch/x86/kvm/svm.c | 42 +++++++++++++++++++++++++++++++++++++++--- > include/asm-x86/kvm_host.h | 3 +++ > 2 files changed, 42 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index c72e728..469ecc5 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -612,6 +612,8 @@ static void init_vmcb(struct vcpu_svm *svm) > save->cr4 = 0; > } > force_new_asid(&svm->vcpu); > + > + svm->vcpu.arch.hflags = HF_GIF_MASK; > } > > static int svm_vcpu_reset(struct kvm_vcpu *vcpu) > @@ -1227,6 +1229,36 @@ static int nested_svm_do(struct vcpu_svm *svm, > return retval; > } > > +static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) > +{ > + if (nested_svm_check_permissions(svm)) > + return 1; > + > + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; > + skip_emulated_instruction(&svm->vcpu); > + > + svm->vcpu.arch.hflags |= HF_GIF_MASK; > + > + return 1; > +} > + > +static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) > +{ > + if (nested_svm_check_permissions(svm)) > + return 1; > + > + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; > + skip_emulated_instruction(&svm->vcpu); > + > + svm->vcpu.arch.hflags &= ~HF_GIF_MASK; > + > + /* After a CLGI no interrupts should come */ > + svm_clear_vintr(svm); > + svm->vmcb->control.int_ctl &= ~V_IRQ_MASK; > + > + return 1; > +} > + > static int invalid_op_interception(struct vcpu_svm *svm, > struct kvm_run *kvm_run) > { > @@ -1521,8 +1553,8 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm, > [SVM_EXIT_VMMCALL] = vmmcall_interception, > [SVM_EXIT_VMLOAD] = invalid_op_interception, > [SVM_EXIT_VMSAVE] = invalid_op_interception, > - [SVM_EXIT_STGI] = invalid_op_interception, > - [SVM_EXIT_CLGI] = invalid_op_interception, > + [SVM_EXIT_STGI] = stgi_interception, > + [SVM_EXIT_CLGI] = clgi_interception, > [SVM_EXIT_SKINIT] = invalid_op_interception, > [SVM_EXIT_WBINVD] = emulate_on_interception, > [SVM_EXIT_MONITOR] = invalid_op_interception, > @@ -1669,6 +1701,9 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu) > if (!kvm_cpu_has_interrupt(vcpu)) > goto out; > > + if (!(svm->vcpu.arch.hflags & HF_GIF_MASK)) > + goto out; > + > if (!(vmcb->save.rflags & X86_EFLAGS_IF) || > (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) 
|| > (vmcb->control.event_inj & SVM_EVTINJ_VALID)) { > @@ -1720,7 +1755,8 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu, > > svm->vcpu.arch.interrupt_window_open = > (!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) && > - (svm->vmcb->save.rflags & X86_EFLAGS_IF)); > + (svm->vmcb->save.rflags & X86_EFLAGS_IF) && > + (svm->vcpu.arch.hflags & HF_GIF_MASK)); > > if (svm->vcpu.arch.interrupt_window_open && svm->vcpu.arch.irq_summary) > /* > diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h > index 982b6b2..3e25004 100644 > --- a/include/asm-x86/kvm_host.h > +++ b/include/asm-x86/kvm_host.h > @@ -245,6 +245,7 @@ struct kvm_vcpu_arch { > unsigned long cr3; > unsigned long cr4; > unsigned long cr8; > + u32 hflags; > u64 pdptrs[4]; /* pae */ > u64 shadow_efer; > u64 apic_base; > @@ -734,6 +735,8 @@ enum { > TASK_SWITCH_GATE = 3, > }; > > +#define HF_GIF_MASK (1 << 0) > + > /* > * Hardware virtualization extension instructions may fault if a > * reboot turns off virtualization while processes are running. > -- > 1.5.6 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/9] Implement GIF, clgi and stgi v3 2008-09-25 18:47 ` [PATCH 4/9] Implement GIF, clgi and stgi v3 Joerg Roedel @ 2008-09-25 19:55 ` Alexander Graf 2008-09-25 21:27 ` Joerg Roedel 2008-09-27 12:52 ` Avi Kivity 1 sibling, 1 reply; 30+ messages in thread From: Alexander Graf @ 2008-09-25 19:55 UTC (permalink / raw) To: Joerg Roedel; +Cc: kvm, anthony, avi On 25.09.2008, at 20:47, Joerg Roedel wrote: > I had another possible idea for performance improvement here. Since we > only inject normal interrupts and exceptions (and not NMI and such) we > can patch clgi to cli and stgi to sti to save these two intercepts in > the guests vmrun path. > Any objections/problems with this? How do we know if we're allowed to inject interrupts with V_INTR set? Usually IF is on and GIF is off when entering the VM in KVM, so we allow interrupts to arrive even when IF is clear in the guest... > > > On Wed, Sep 17, 2008 at 03:41:21PM +0200, Alexander Graf wrote: >> This patch implements the GIF flag and the clgi and stgi >> instructions that >> set this flag. Only if the flag is set (default), interrupts can be >> received by >> the CPU. >> >> To keep the information about that somewhere, this patch adds a new >> hidden >> flags vector. that is used to store information that does not go >> into the >> vmcb, but is SVM specific. >> >> v2 moves the hflags to x86 generic code >> v3 makes use of the new permission helper >> >> Signed-off-by: Alexander Graf <agraf@suse.de> >> --- >> arch/x86/kvm/svm.c | 42 ++++++++++++++++++++++++++++++++++ >> +++++--- >> include/asm-x86/kvm_host.h | 3 +++ >> 2 files changed, 42 insertions(+), 3 deletions(-) >> >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c >> index c72e728..469ecc5 100644 >> --- a/arch/x86/kvm/svm.c >> +++ b/arch/x86/kvm/svm.c >> @@ -612,6 +612,8 @@ static void init_vmcb(struct vcpu_svm *svm) >> save->cr4 = 0; >> } >> force_new_asid(&svm->vcpu); >> + >> + svm->vcpu.arch.hflags = HF_GIF_MASK; >> } >> >> static int svm_vcpu_reset(struct kvm_vcpu *vcpu) >> @@ -1227,6 +1229,36 @@ static int nested_svm_do(struct vcpu_svm *svm, >> return retval; >> } >> >> +static int stgi_interception(struct vcpu_svm *svm, struct kvm_run >> *kvm_run) >> +{ >> + if (nested_svm_check_permissions(svm)) >> + return 1; >> + >> + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; >> + skip_emulated_instruction(&svm->vcpu); >> + >> + svm->vcpu.arch.hflags |= HF_GIF_MASK; >> + >> + return 1; >> +} >> + >> +static int clgi_interception(struct vcpu_svm *svm, struct kvm_run >> *kvm_run) >> +{ >> + if (nested_svm_check_permissions(svm)) >> + return 1; >> + >> + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; >> + skip_emulated_instruction(&svm->vcpu); >> + >> + svm->vcpu.arch.hflags &= ~HF_GIF_MASK; >> + >> + /* After a CLGI no interrupts should come */ >> + svm_clear_vintr(svm); >> + svm->vmcb->control.int_ctl &= ~V_IRQ_MASK; >> + >> + return 1; >> +} >> + >> static int invalid_op_interception(struct vcpu_svm *svm, >> struct kvm_run *kvm_run) >> { >> @@ -1521,8 +1553,8 @@ static int (*svm_exit_handlers[])(struct >> vcpu_svm *svm, >> [SVM_EXIT_VMMCALL] = vmmcall_interception, >> [SVM_EXIT_VMLOAD] = invalid_op_interception, >> [SVM_EXIT_VMSAVE] = invalid_op_interception, >> - [SVM_EXIT_STGI] = invalid_op_interception, >> - [SVM_EXIT_CLGI] = invalid_op_interception, >> + [SVM_EXIT_STGI] = stgi_interception, >> + [SVM_EXIT_CLGI] = clgi_interception, >> [SVM_EXIT_SKINIT] = invalid_op_interception, >> [SVM_EXIT_WBINVD] = emulate_on_interception, >> [SVM_EXIT_MONITOR] = 
invalid_op_interception, >> @@ -1669,6 +1701,9 @@ static void svm_intr_assist(struct kvm_vcpu >> *vcpu) >> if (!kvm_cpu_has_interrupt(vcpu)) >> goto out; >> >> + if (!(svm->vcpu.arch.hflags & HF_GIF_MASK)) >> + goto out; >> + >> if (!(vmcb->save.rflags & X86_EFLAGS_IF) || >> (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) || >> (vmcb->control.event_inj & SVM_EVTINJ_VALID)) { >> @@ -1720,7 +1755,8 @@ static void do_interrupt_requests(struct >> kvm_vcpu *vcpu, >> >> svm->vcpu.arch.interrupt_window_open = >> (!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) && >> - (svm->vmcb->save.rflags & X86_EFLAGS_IF)); >> + (svm->vmcb->save.rflags & X86_EFLAGS_IF) && >> + (svm->vcpu.arch.hflags & HF_GIF_MASK)); >> >> if (svm->vcpu.arch.interrupt_window_open && svm- >> >vcpu.arch.irq_summary) >> /* >> diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h >> index 982b6b2..3e25004 100644 >> --- a/include/asm-x86/kvm_host.h >> +++ b/include/asm-x86/kvm_host.h >> @@ -245,6 +245,7 @@ struct kvm_vcpu_arch { >> unsigned long cr3; >> unsigned long cr4; >> unsigned long cr8; >> + u32 hflags; >> u64 pdptrs[4]; /* pae */ >> u64 shadow_efer; >> u64 apic_base; >> @@ -734,6 +735,8 @@ enum { >> TASK_SWITCH_GATE = 3, >> }; >> >> +#define HF_GIF_MASK (1 << 0) >> + >> /* >> * Hardware virtualization extension instructions may fault if a >> * reboot turns off virtualization while processes are running. >> -- >> 1.5.6 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/9] Implement GIF, clgi and stgi v3 2008-09-25 19:55 ` Alexander Graf @ 2008-09-25 21:27 ` Joerg Roedel 2008-09-26 9:01 ` Alexander Graf 0 siblings, 1 reply; 30+ messages in thread From: Joerg Roedel @ 2008-09-25 21:27 UTC (permalink / raw) To: Alexander Graf; +Cc: kvm, anthony, avi On Thu, Sep 25, 2008 at 09:55:27PM +0200, Alexander Graf wrote: > > On 25.09.2008, at 20:47, Joerg Roedel wrote: > > >I had another possible idea for performance improvement here. Since we > >only inject normal interrupts and exceptions (and not NMI and such) we > >can patch clgi to cli and stgi to sti to save these two intercepts in > >the guests vmrun path. > >Any objections/problems with this? > > How do we know if we're allowed to inject interrupts with V_INTR set? > Usually IF is on and GIF is off when entering the VM in KVM, so we > allow interrupts to arrive even when IF is clear in the guest... Hmm yes, this is a problem. So this optimization will not work. We need other ways to optimize :) Joerg ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/9] Implement GIF, clgi and stgi v3 2008-09-25 21:27 ` Joerg Roedel @ 2008-09-26 9:01 ` Alexander Graf 2008-09-27 12:55 ` Avi Kivity 0 siblings, 1 reply; 30+ messages in thread From: Alexander Graf @ 2008-09-26 9:01 UTC (permalink / raw) To: Joerg Roedel; +Cc: kvm, anthony, avi On 25.09.2008, at 23:27, Joerg Roedel wrote: > On Thu, Sep 25, 2008 at 09:55:27PM +0200, Alexander Graf wrote: >> >> On 25.09.2008, at 20:47, Joerg Roedel wrote: >> >>> I had another possible idea for performance improvement here. >>> Since we >>> only inject normal interrupts and exceptions (and not NMI and >>> such) we >>> can patch clgi to cli and stgi to sti to save these two intercepts >>> in >>> the guests vmrun path. >>> Any objections/problems with this? >> >> How do we know if we're allowed to inject interrupts with V_INTR set? >> Usually IF is on and GIF is off when entering the VM in KVM, so we >> allow interrupts to arrive even when IF is clear in the guest... > > Hmm yes, this is a problem. So this optimization will not work. We > need > other ways to optimize :) Well it would work for the KVM-in-KVM case, where we know that VMRUN is always triggered with IF=1 and V_INTR=1. The only case that hack fails is when we have IF=0 and V_INTR=1. Everything else should work just fine. And in this case we would simply issue some VMEXITs 0x60, so no big deal IMHO. It should be worth the tradeoff of making most VMMs a lot faster. There should be a compile-option to enable the "correct" behavior though. If we join that with the VMLOAD and VMSAVE hack there would be only the VMRUN and DR exits left. That sounds like a really good improvement where I wouldn't mind to break some specs :-). Alex ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/9] Implement GIF, clgi and stgi v3 2008-09-26 9:01 ` Alexander Graf @ 2008-09-27 12:55 ` Avi Kivity 0 siblings, 0 replies; 30+ messages in thread From: Avi Kivity @ 2008-09-27 12:55 UTC (permalink / raw) To: Alexander Graf; +Cc: Joerg Roedel, kvm, anthony Alexander Graf wrote: >> >> Hmm yes, this is a problem. So this optimization will not work. We need >> other ways to optimize :) > > Well it would work for the KVM-in-KVM case, where we know that VMRUN > is always triggered with IF=1 and V_INTR=1. The only case that hack > fails is when we have IF=0 and V_INTR=1. Everything else should work > just fine. And in this case we would simply issue some VMEXITs 0x60, > so no big deal IMHO. It should be worth the tradeoff of making most > VMMs a lot faster. > > There should be a compile-option to enable the "correct" behavior > though. If we join that with the VMLOAD and VMSAVE hack there would be > only the VMRUN and DR exits left. That sounds like a really good > improvement where I wouldn't mind to break some specs :-). Maybe a hypercall, so it can be enabled on a guest-by-guest basis. I must say that if we do guest-specific hacking this way, a paravirt approach doesn't look so bad. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/9] Implement GIF, clgi and stgi v3 2008-09-25 18:47 ` [PATCH 4/9] Implement GIF, clgi and stgi v3 Joerg Roedel 2008-09-25 19:55 ` Alexander Graf @ 2008-09-27 12:52 ` Avi Kivity 1 sibling, 0 replies; 30+ messages in thread From: Avi Kivity @ 2008-09-27 12:52 UTC (permalink / raw) To: Joerg Roedel; +Cc: Alexander Graf, kvm, anthony Joerg Roedel wrote: > I had another possible idea for performance improvement here. Since we > only inject normal interrupts and exceptions (and not NMI and such) we > can patch clgi to cli and stgi to sti to save these two intercepts in > the guests vmrun path. > Any objections/problems with this? > > The sequence 'clgi; sti' will break. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 30+ messages in thread
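For context on why 'clgi; sti' breaks: an SVM world switch typically has roughly the
shape sketched below (condensed and illustrative, not the literal svm_vcpu_run() code).
GIF is cleared before IF is re-enabled, so a physical interrupt arriving during VMRUN
forces a #VMEXIT but stays pending until STGI; if the guest's clgi were patched into cli,
the later sti would open a real interrupt window inside that critical section.

/* Condensed, illustrative shape of an SVM world switch. */
static void world_switch_shape(void)
{
	clgi();			/* GIF = 0 */
	local_irq_enable();	/* sti: IF = 1, interrupts held off by GIF */

	/* ... VMRUN, guest runs, #VMEXIT ... */

	local_irq_disable();	/* cli */
	stgi();			/* GIF = 1, pending interrupts are delivered now */
}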
* Re: [PATCH 0/9] Add support for nested SVM (kernel) v3 2008-09-17 13:41 [PATCH 0/9] Add support for nested SVM (kernel) v3 Alexander Graf 2008-09-17 13:41 ` [PATCH 1/9] Add CPUID feature flag for SVM v3 Alexander Graf @ 2008-09-19 14:36 ` Joerg Roedel 2008-09-19 14:39 ` Joerg Roedel 2008-09-19 15:56 ` Joerg Roedel 2008-09-19 21:48 ` First performance numbers Joerg Roedel 2 siblings, 2 replies; 30+ messages in thread From: Joerg Roedel @ 2008-09-19 14:36 UTC (permalink / raw) To: Alexander Graf; +Cc: kvm, joro, anthony, avi On Wed, Sep 17, 2008 at 03:41:17PM +0200, Alexander Graf wrote: > To be usable, this patchset requires the two simple changes in the userspace > part, that I sent to the list with the first version. > > Thanks for reviewing! Ok, with the patch attached applied on-top of your patches I got a recent KVM running inside KVM. And it doesn't feel very slow :-) I will do some benchmarks in the next days to get real numbers. The patches look good so far. But I think for now we should disable the feature by default and allow enabling it from userspace until we are sure we don't introduce any security hole and don't destroy migration with it. We can add a -nested-virt parameter to qemu to enable it for the guest then. Another thing missing is the SVM feature CPUID function. It is used to find out the number of ASIDs available. But this is a minor issue as long as we only run KVM inside KVM. Joerg -- | AMD Saxony Limited Liability Company & Co. KG Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany System | Register Court Dresden: HRA 4896 Research | General Partner authorized to represent: Center | AMD Saxony LLC (Wilmington, Delaware, US) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy ^ permalink raw reply [flat|nested] 30+ messages in thread
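On the missing SVM feature leaf Joerg mentions: CPUID function 0x8000000A reports the SVM
revision in EAX, the number of available ASIDs in EBX, and SVM feature flags in EDX. A
sketch of how such a leaf could be filled in for the guest is below; the helper name, the
ASID count, and where it would hook into KVM's CPUID emulation are assumptions for
illustration, not part of the posted patches.

/* Illustrative only: report a minimal SVM feature leaf (CPUID 0x8000000A). */
static void svm_cpuid_8000000a(u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
{
	*eax = 1;	/* SVM revision */
	*ebx = 8;	/* nr of ASIDs the nested hypervisor may use (made-up value) */
	*ecx = 0;	/* reserved */
	*edx = 0;	/* no optional SVM features advertised (no NPT, LBR virt, ...) */
}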
* Re: [PATCH 0/9] Add support for nested SVM (kernel) v3
  2008-09-19 14:36 ` [PATCH 0/9] Add support for nested SVM (kernel) v3 Joerg Roedel
@ 2008-09-19 14:39 ` Joerg Roedel
  2008-09-19 15:56 ` Joerg Roedel
  1 sibling, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2008-09-19 14:39 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, joro, anthony, avi

On Fri, Sep 19, 2008 at 04:36:00PM +0200, Joerg Roedel wrote:
> On Wed, Sep 17, 2008 at 03:41:17PM +0200, Alexander Graf wrote:
> > To be usable, this patchset requires the two simple changes in the userspace
> > part, that I sent to the list with the first version.
> >
> > Thanks for reviewing!
>
> Ok, with the patch attached applied on-top of your patches I got a
> recent KVM running inside KVM. And it doesn't feel very slow :-)
> I will do some benchmarks in the next days to get real numbers. The
> patches look good so far.
> But I think for now we should disable the feature by default and allow
> enabling it from userspace until we are sure we don't introduce any
> security hole and don't destroy migration with it. We can add a
> -nested-virt parameter to qemu to enable it for the guest then.
> Another thing missing is the SVM feature CPUID function. It is used to
> find out the number of ASIDs available. But this is a minor issue as
> long as we only run KVM inside KVM.

Oh, forgot the patch. Here it is:

>From 15c4e38288cdaa6d142e94e77025dfd097d63a17 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <jroedel@brandon.amd.com>
Date: Sat, 20 Sep 2008 00:30:25 +0200
Subject: [PATCH] KVM: nested-svm-fix: allow read access to MSR_VM_CR

KVM tries to read the VM_CR MSR to find out if SVM was disabled by
the BIOS. So implement read support for this MSR to make nested
SVM run.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/svm.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 062ded6..7b91c74 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1929,6 +1929,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
 	case MSR_VM_HSAVE_PA:
 		*data = svm->nested_hsave;
 		break;
+	case MSR_VM_CR:
+		*data = 0;
+		break;
 	default:
 		return kvm_get_msr_common(vcpu, ecx, data);
 	}
--
1.5.5.1

--
           | AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    | Register Court Dresden: HRA 4896
 Research  | General Partner authorized to represent:
 Center    | AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe,
             Thomas McCoy

^ permalink raw reply related	[flat|nested] 30+ messages in thread
* Re: [PATCH 0/9] Add support for nested SVM (kernel) v3 2008-09-19 14:36 ` [PATCH 0/9] Add support for nested SVM (kernel) v3 Joerg Roedel 2008-09-19 14:39 ` Joerg Roedel @ 2008-09-19 15:56 ` Joerg Roedel 2008-10-15 17:07 ` Alexander Graf 1 sibling, 1 reply; 30+ messages in thread From: Joerg Roedel @ 2008-09-19 15:56 UTC (permalink / raw) To: Alexander Graf; +Cc: kvm, joro, anthony, avi On Fri, Sep 19, 2008 at 04:36:00PM +0200, Joerg Roedel wrote: > On Wed, Sep 17, 2008 at 03:41:17PM +0200, Alexander Graf wrote: > > To be usable, this patchset requires the two simple changes in the userspace > > part, that I sent to the list with the first version. > > > > Thanks for reviewing! > > Ok, with the patch attached applied on-top of your patches I got a > recent KVM running inside KVM. And it doesn't feel very slow :-) > I will do some benchmarks in the next days to get real numbers. The > patches look good so far. > But I think for now we should disable the feature by default and allow > enabling it from userspace until we are sure we don't introduce any > security hole and don't destroy migration with it. We can add a > -nested-virt parameter to qemu to enable it for the guest then. > Another thing missing is the SVM feature CPUID function. It is used to > find out the number of ASIDs available. But this is a minor issue as > long as we only run KVM inside KVM. Ok, further testing showed two issues so far: - guest timing seems not to work, the printk timing information in ubuntu stay at 0.00000 all the time - smp for the second level guest does not work, it crashes also the first level guest Joerg -- | AMD Saxony Limited Liability Company & Co. KG Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany System | Register Court Dresden: HRA 4896 Research | General Partner authorized to represent: Center | AMD Saxony LLC (Wilmington, Delaware, US) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/9] Add support for nested SVM (kernel) v3 2008-09-19 15:56 ` Joerg Roedel @ 2008-10-15 17:07 ` Alexander Graf 0 siblings, 0 replies; 30+ messages in thread From: Alexander Graf @ 2008-10-15 17:07 UTC (permalink / raw) To: Joerg Roedel; +Cc: kvm, joro, anthony, avi On 19.09.2008, at 17:56, Joerg Roedel wrote: > On Fri, Sep 19, 2008 at 04:36:00PM +0200, Joerg Roedel wrote: >> On Wed, Sep 17, 2008 at 03:41:17PM +0200, Alexander Graf wrote: >>> To be usable, this patchset requires the two simple changes in the >>> userspace >>> part, that I sent to the list with the first version. >>> >>> Thanks for reviewing! >> >> Ok, with the patch attached applied on-top of your patches I got a >> recent KVM running inside KVM. And it doesn't feel very slow :-) >> I will do some benchmarks in the next days to get real numbers. The >> patches look good so far. >> But I think for now we should disable the feature by default and >> allow >> enabling it from userspace until we are sure we don't introduce any >> security hole and don't destroy migration with it. We can add a >> -nested-virt parameter to qemu to enable it for the guest then. >> Another thing missing is the SVM feature CPUID function. It is used >> to >> find out the number of ASIDs available. But this is a minor issue as >> long as we only run KVM inside KVM. > > Ok, further testing showed two issues so far: > > - guest timing seems not to work, the printk timing information in > ubuntu > stay at 0.00000 all the time Hum, I just tried again and it seems to work - I see the numbers increasing when booting from a 8.10 Ubuntu CD. > - smp for the second level guest does not work, it crashes also the > first level guest This works when using -no-kvm-irqchip for the l2 guest. Any ideas? Alex ^ permalink raw reply [flat|nested] 30+ messages in thread
* First performance numbers
  2008-09-17 13:41 [PATCH 0/9] Add support for nested SVM (kernel) v3 Alexander Graf
  2008-09-17 13:41 ` [PATCH 1/9] Add CPUID feature flag for SVM v3 Alexander Graf
  2008-09-19 14:36 ` [PATCH 0/9] Add support for nested SVM (kernel) v3 Joerg Roedel
@ 2008-09-19 21:48 ` Joerg Roedel
  2008-09-20  1:30 ` Avi Kivity
  2 siblings, 1 reply; 30+ messages in thread
From: Joerg Roedel @ 2008-09-19 21:48 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, joro, anthony, avi

Ok, here are some performance numbers for nested svm. I ran kernbench -M
on a virtual machine with 4G RAM and 1 VCPU (since nesting SMP guests
do currently not work). I measured simple virtualization with a shadow
paging guest on bare metal and within a nested guest (same guest image)
on a nested paging enabled first level guest.

                 | Shadow Guest (100%) | Nested Guest (X)  | X
-----------------+---------------------+-------------------+--------
Elapsed Time     | 553.244 (1.21208)   | 1185.95 (20.0365) | 214.363%
User Time        | 407.728 (0.987279)  | 520.434 (8.55643) | 127.642%
System Time      | 144.828 (0.480645)  | 664.528 (11.6648) | 458.839%
Percent CPU      | 99 (0)              | 99 (0)            | 100.000%
Context Switches | 98265.2 (183.001)   | 220015 (3302.74)  | 223.899%
Sleeps           | 49397.8 (31.0274)   | 49460.2 (364.84)  | 100.126%

So we have an overall slowdown in the first nesting level of more than
50%. Mostly because we spend so much time in the system level. Seems
there is some work to do for performance improvements :-)

	Joerg

--
           | AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    | Register Court Dresden: HRA 4896
 Research  | General Partner authorized to represent:
 Center    | AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe,
             Thomas McCoy

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: First performance numbers 2008-09-19 21:48 ` First performance numbers Joerg Roedel @ 2008-09-20 1:30 ` Avi Kivity 2008-09-20 6:55 ` Joerg Roedel 0 siblings, 1 reply; 30+ messages in thread From: Avi Kivity @ 2008-09-20 1:30 UTC (permalink / raw) To: Joerg Roedel; +Cc: Alexander Graf, kvm, joro, anthony, avi Joerg Roedel wrote: > Ok, here are some performance numbers for nested svm. I ran kernbench -M > on a virtual machine with 4G RAM and 1 VCPU (since nesting SMP guests > do currently not work). I measured simple virtualization with a shadow > paging guest on bare metal and within a nested guest (same guest image) > on a nested paging enabled first level guest. > > | Shadow Guest (100%) | Nested Guest (X) | X > -----------------+---------------------+-------------------+-------- > Elapsed Time | 553.244 (1.21208) | 1185.95 (20.0365) | 214.363% > User Time | 407.728 (0.987279) | 520.434 (8.55643) | 127.642% > System Time | 144.828 (0.480645) | 664.528 (11.6648) | 458.839% > Percent CPU | 99 (0) | 99 (0) | 100.000% > Context Switches | 98265.2 (183.001) | 220015 (3302.74) | 223.899% > Sleeps | 49397.8 (31.0274) | 49460.2 (364.84) | 100.126% > > So we have an overall slowdown in the first nesting level of more than > 50%. Mostly because we spend so much time in the system level. Seems > there is some work to do for performance improvements :-) > > Do you have kvm_stat output for the two cases? Also interesting to run kvm_stat on both guest and host. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: First performance numbers 2008-09-20 1:30 ` Avi Kivity @ 2008-09-20 6:55 ` Joerg Roedel 0 siblings, 0 replies; 30+ messages in thread From: Joerg Roedel @ 2008-09-20 6:55 UTC (permalink / raw) To: Avi Kivity; +Cc: Joerg Roedel, Alexander Graf, kvm, anthony, avi On Fri, Sep 19, 2008 at 06:30:50PM -0700, Avi Kivity wrote: > Joerg Roedel wrote: > >Ok, here are some performance numbers for nested svm. I ran kernbench -M > >on a virtual machine with 4G RAM and 1 VCPU (since nesting SMP guests > >do currently not work). I measured simple virtualization with a shadow > >paging guest on bare metal and within a nested guest (same guest image) > >on a nested paging enabled first level guest. > > > > | Shadow Guest (100%) | Nested Guest (X) | X > >-----------------+---------------------+-------------------+-------- > >Elapsed Time | 553.244 (1.21208) | 1185.95 (20.0365) | 214.363% > >User Time | 407.728 (0.987279) | 520.434 (8.55643) | 127.642% > >System Time | 144.828 (0.480645) | 664.528 (11.6648) | 458.839% > >Percent CPU | 99 (0) | 99 (0) | 100.000% > >Context Switches | 98265.2 (183.001) | 220015 (3302.74) | 223.899% > >Sleeps | 49397.8 (31.0274) | 49460.2 (364.84) | 100.126% > > > >So we have an overall slowdown in the first nesting level of more than > >50%. Mostly because we spend so much time in the system level. Seems > >there is some work to do for performance improvements :-) > > > > > > Do you have kvm_stat output for the two cases? Also interesting to run > kvm_stat on both guest and host. Sorry, no. But I can repeat the measurements and gather these numbers. Joerg ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2008-10-15 17:08 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-17 13:41 [PATCH 0/9] Add support for nested SVM (kernel) v3 Alexander Graf
2008-09-17 13:41 ` [PATCH 1/9] Add CPUID feature flag for SVM v3 Alexander Graf
2008-09-17 13:41 ` [PATCH 2/9] Clean up VINTR setting v3 Alexander Graf
2008-09-17 13:41 ` [PATCH 3/9] Add helper functions for nested SVM v3 Alexander Graf
2008-09-17 13:41 ` [PATCH 4/9] Implement GIF, clgi and stgi v3 Alexander Graf
2008-09-17 13:41 ` [PATCH 5/9] Implement hsave v3 Alexander Graf
2008-09-17 13:41 ` [PATCH 6/9] Add VMLOAD and VMSAVE handlers v3 Alexander Graf
2008-09-17 13:41 ` [PATCH 7/9] Add VMRUN handler v3 Alexander Graf
2008-09-17 13:41 ` [PATCH 8/9] Add VMEXIT handler and intercepts v3 Alexander Graf
2008-09-17 13:41 ` [PATCH 9/9] Allow setting the SVME bit v3 Alexander Graf
2008-09-19 15:59 ` [PATCH 7/9] Add VMRUN handler v3 Joerg Roedel
2008-09-25 17:32 ` Alexander Graf
2008-09-25 17:37 ` Joerg Roedel
2008-09-25 20:00 ` Alexander Graf
2008-09-25 21:22 ` joro
2008-09-27 12:59 ` Avi Kivity
2008-09-27 12:58 ` Avi Kivity
2008-09-25 18:47 ` [PATCH 4/9] Implement GIF, clgi and stgi v3 Joerg Roedel
2008-09-25 19:55 ` Alexander Graf
2008-09-25 21:27 ` Joerg Roedel
2008-09-26  9:01 ` Alexander Graf
2008-09-27 12:55 ` Avi Kivity
2008-09-27 12:52 ` Avi Kivity
2008-09-19 14:36 ` [PATCH 0/9] Add support for nested SVM (kernel) v3 Joerg Roedel
2008-09-19 14:39 ` Joerg Roedel
2008-09-19 15:56 ` Joerg Roedel
2008-10-15 17:07 ` Alexander Graf
2008-09-19 21:48 ` First performance numbers Joerg Roedel
2008-09-20  1:30 ` Avi Kivity
2008-09-20  6:55 ` Joerg Roedel