* [PATCH 0/9] Add support for nested SVM (kernel) v2
From: Alexander Graf @ 2008-09-05  7:51 UTC
To: kvm; +Cc: joro, anthony, avi, Alexander Graf

The current generation of virtualization extensions only supports one VM
layer. While we can't change that, it is pretty easy to emulate the CPU's
behavior and implement the virtualization opcodes ourselves.

This patchset does exactly that for SVM. Using it, KVM can run within a VM.
Since we're emulating the real CPU's behavior, this should also enable other
VMMs to run within KVM. So far I have only tested running KVM inside the VM,
though.

This was created with help from Joerg Roedel. Since he's not fond of the GIF
handling as it is now, there is no Signed-off-by from him for now.

As always, comments and suggestions are highly welcome.

This is v2, taking most comments from Avi into account. Two things are still
missing:

  - save/restore of hflags
  - #VMEXIT on save/restore

I'd rather submit those as later patches though, so we don't clutter this
already big patchset (I'm slowly going crazy using git to manage it).

Thanks for reviewing!

Alex

Alexander Graf (9):
  Add CPUID feature flag for SVM v2
  Clean up VINTR setting v2
  Implement GIF, clgi and stgi v2
  Add helper functions for nested SVM v2
  Implement hsave v2
  Add VMLOAD and VMSAVE handlers v2
  Add VMRUN handler v2
  Add VMEXIT handler and intercepts v2
  Allow setting the SVME bit v2

 arch/x86/kvm/kvm_svm.h       |  11 +
 arch/x86/kvm/svm.c           | 741 +++++++++++++++++++++++++++++++++++++++++-
 include/asm-x86/cpufeature.h |   1 +
 include/asm-x86/kvm_host.h   |   4 +
 4 files changed, 745 insertions(+), 12 deletions(-)
* [PATCH 1/9] Add CPUID feature flag for SVM v2
From: Alexander Graf @ 2008-09-05  7:51 UTC
To: kvm; +Cc: joro, anthony, avi, Alexander Graf

This patch adds the CPUID feature flag for SVM to the x86 Linux headers.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 include/asm-x86/cpufeature.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/cpufeature.h b/include/asm-x86/cpufeature.h
index 762f6a6..bfb49fc 100644
--- a/include/asm-x86/cpufeature.h
+++ b/include/asm-x86/cpufeature.h
@@ -108,6 +108,7 @@
 /* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */
 #define X86_FEATURE_LAHF_LM	(6*32+ 0) /* LAHF/SAHF in long mode */
 #define X86_FEATURE_CMP_LEGACY	(6*32+ 1) /* If yes HyperThreading not valid */
+#define X86_FEATURE_SVM		(6*32+ 2) /* Secure Virtual Machine */
 #define X86_FEATURE_IBS		(6*32+10) /* Instruction Based Sampling */

 /*
--
1.5.6
* [PATCH 2/9] Clean up VINTR setting v2
From: Alexander Graf @ 2008-09-05  7:51 UTC
To: kvm; +Cc: joro, anthony, avi, Alexander Graf

The current VINTR intercept setters don't look clean to me. To make the code
easier to read and to enable the possibility to trap on a VINTR set, this
uses a helper function to set the VINTR intercept.

v2 uses two distinct functions for setting and clearing the bit.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |   20 +++++++++++++++-----
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 961dc3b..d9b18cc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -729,6 +729,16 @@ static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 	to_svm(vcpu)->vmcb->save.rflags = rflags;
 }

+static void svm_set_vintr(struct vcpu_svm *svm)
+{
+	svm->vmcb->control.intercept |= 1ULL << INTERCEPT_VINTR;
+}
+
+static void svm_clear_vintr(struct vcpu_svm *svm)
+{
+	svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_VINTR);
+}
+
 static struct vmcb_seg *svm_seg(struct kvm_vcpu *vcpu, int seg)
 {
 	struct vmcb_save_area *save = &to_svm(vcpu)->vmcb->save;
@@ -1363,7 +1373,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm,
 {
 	KVMTRACE_0D(PEND_INTR, &svm->vcpu, handler);

-	svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_VINTR);
+	svm_clear_vintr(svm);
 	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
 	/*
 	 * If the user space waits to inject interrupts, exit as soon as
@@ -1575,7 +1585,7 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
 	    (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
 	    (vmcb->control.event_inj & SVM_EVTINJ_VALID)) {
 		/* unable to deliver irq, set pending irq */
-		vmcb->control.intercept |= (1ULL << INTERCEPT_VINTR);
+		svm_set_vintr(svm);
 		svm_inject_irq(svm, 0x0);
 		goto out;
 	}
@@ -1635,9 +1645,9 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 	 */
 	if (!svm->vcpu.arch.interrupt_window_open &&
 	    (svm->vcpu.arch.irq_summary || kvm_run->request_interrupt_window))
-		control->intercept |= 1ULL << INTERCEPT_VINTR;
-	else
-		control->intercept &= ~(1ULL << INTERCEPT_VINTR);
+		svm_set_vintr(svm);
+	else
+		svm_clear_vintr(svm);
 }

 static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
--
1.5.6
* [PATCH 3/9] Implement GIF, clgi and stgi v2
From: Alexander Graf @ 2008-09-05  7:51 UTC
To: kvm; +Cc: joro, anthony, avi, Alexander Graf

This patch implements the GIF flag and the stgi and clgi instructions that
set and clear this flag. Interrupts can only be received by the CPU while
the flag is set (the default).

To store this state, the patch adds a new hidden-flags vector that is used
to keep information that does not go into the VMCB but is SVM specific.

v2 moves the hflags to x86 generic code.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c         |   50 +++++++++++++++++++++++++++++++++++++++++--
 include/asm-x86/kvm_host.h |    3 ++
 2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d9b18cc..a6d9a74 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -602,6 +602,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 		save->cr4 = 0;
 	}
 	force_new_asid(&svm->vcpu);
+
+	svm->vcpu.arch.hflags = HF_GIF_MASK;
 }

 static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
@@ -1139,6 +1141,44 @@ static int vmmcall_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }

+static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	if (svm->vmcb->save.cpl) {
+		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
+		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return 1;
+	}
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	svm->vcpu.arch.hflags |= HF_GIF_MASK;
+
+	return 1;
+}
+
+static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	if (svm->vmcb->save.cpl) {
+		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
+		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return 1;
+	}
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	svm->vcpu.arch.hflags &= ~HF_GIF_MASK;
+
+	/* After a CLGI no interrupts should come */
+	svm_clear_vintr(svm);
+	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+
+	return 1;
+}
+
 static int invalid_op_interception(struct vcpu_svm *svm,
 				   struct kvm_run *kvm_run)
 {
@@ -1433,8 +1473,8 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_VMMCALL]			= vmmcall_interception,
 	[SVM_EXIT_VMLOAD]			= invalid_op_interception,
 	[SVM_EXIT_VMSAVE]			= invalid_op_interception,
-	[SVM_EXIT_STGI]				= invalid_op_interception,
-	[SVM_EXIT_CLGI]				= invalid_op_interception,
+	[SVM_EXIT_STGI]				= stgi_interception,
+	[SVM_EXIT_CLGI]				= clgi_interception,
 	[SVM_EXIT_SKINIT]			= invalid_op_interception,
 	[SVM_EXIT_WBINVD]			= emulate_on_interception,
 	[SVM_EXIT_MONITOR]			= invalid_op_interception,
@@ -1581,6 +1621,9 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
 	if (!kvm_cpu_has_interrupt(vcpu))
 		goto out;

+	if (!(svm->vcpu.arch.hflags & HF_GIF_MASK))
+		goto out;
+
 	if (!(vmcb->save.rflags & X86_EFLAGS_IF) ||
 	    (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
 	    (vmcb->control.event_inj & SVM_EVTINJ_VALID)) {
@@ -1632,7 +1675,8 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 	svm->vcpu.arch.interrupt_window_open =
 		(!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) &&
-		 (svm->vmcb->save.rflags & X86_EFLAGS_IF));
+		 (svm->vmcb->save.rflags & X86_EFLAGS_IF) &&
+		 (svm->vcpu.arch.hflags & HF_GIF_MASK));

 	if (svm->vcpu.arch.interrupt_window_open && svm->vcpu.arch.irq_summary)
 		/*
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 982b6b2..3e25004 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -245,6 +245,7 @@ struct kvm_vcpu_arch {
 	unsigned long cr3;
 	unsigned long cr4;
 	unsigned long cr8;
+	u32 hflags;
 	u64 pdptrs[4]; /* pae */
 	u64 shadow_efer;
 	u64 apic_base;
@@ -734,6 +735,8 @@ enum {
 	TASK_SWITCH_GATE = 3,
 };

+#define HF_GIF_MASK		(1 << 0)
+
 /*
  * Hardware virtualization extension instructions may fault if a
  * reboot turns off virtualization while processes are running.
--
1.5.6
* [PATCH 4/9] Add helper functions for nested SVM v2
From: Alexander Graf @ 2008-09-05  7:51 UTC
To: kvm; +Cc: joro, anthony, avi, Alexander Graf

These are helpers for the nested SVM implementation:

- nsvm_printk implements a debug printk variant
- nested_svm_do calls a handler that can access GPA-based memory

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |   76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 76 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a6d9a74..fe42916 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -48,6 +48,16 @@ MODULE_LICENSE("GPL");

 #define DEBUGCTL_RESERVED_BITS (~(0x3fULL))

+/* Turn on to get debugging output */
+/* #define NESTED_DEBUG */
+
+#ifdef NESTED_DEBUG
+#define nsvm_printk(fmt, args...) printk(KERN_INFO fmt, ## args)
+#else
+static inline void nsvm_printk(char *fmt, ...)
+{
+}
+#endif
+
 /* enable NPT for AMD64 and X86 with PAE */
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 static bool npt_enabled = true;
@@ -1141,6 +1151,72 @@ static int vmmcall_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }

+static struct page *nested_svm_get_page(struct vcpu_svm *svm, u64 gpa)
+{
+	struct page *page;
+
+	down_read(&current->mm->mmap_sem);
+	page = gfn_to_page(svm->vcpu.kvm, gpa >> PAGE_SHIFT);
+	up_read(&current->mm->mmap_sem);
+
+	if (is_error_page(page)) {
+		printk(KERN_ERR "%s: could not find page at 0x%llx\n",
+		       __func__, gpa);
+		kvm_release_page_clean(page);
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return NULL;
+	}
+	return page;
+}
+
+static int nested_svm_do(struct vcpu_svm *svm,
+			 u64 arg1_gpa, u64 arg2_gpa, void *opaque,
+			 int (*handler)(struct vcpu_svm *svm,
+					void *arg1,
+					void *arg2,
+					void *opaque))
+{
+	struct page *arg1_page;
+	struct page *arg2_page = NULL;
+	void *arg1;
+	void *arg2 = NULL;
+	int retval;
+
+	if (!(svm->vcpu.arch.shadow_efer & MSR_EFER_SVME_MASK)
+	    || !is_paging(&svm->vcpu)) {
+		kvm_queue_exception(&svm->vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	arg1_page = nested_svm_get_page(svm, arg1_gpa);
+	if (arg1_page == NULL)
+		return 1;
+
+	if (arg2_gpa) {
+		arg2_page = nested_svm_get_page(svm, arg2_gpa);
+		if (arg2_page == NULL) {
+			kvm_release_page_clean(arg1_page);
+			return 1;
+		}
+	}
+
+	arg1 = kmap_atomic(arg1_page, KM_USER0);
+	if (arg2_gpa)
+		arg2 = kmap_atomic(arg2_page, KM_USER1);
+
+	retval = handler(svm, arg1, arg2, opaque);
+
+	kunmap_atomic(arg1, KM_USER0);
+	if (arg2_gpa)
+		kunmap_atomic(arg2, KM_USER1);
+
+	kvm_release_page_dirty(arg1_page);
+	if (arg2_gpa)
+		kvm_release_page_dirty(arg2_page);
+
+	return retval;
+}
+
 static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 {
 	if (svm->vmcb->save.cpl) {
--
1.5.6
* [PATCH 5/9] Implement hsave v2
From: Alexander Graf @ 2008-09-05  7:51 UTC
To: kvm; +Cc: joro, anthony, avi, Alexander Graf

Implement the hsave MSR, which gives the VCPU a GPA at which to save the old
guest state.

v2 allows userspace to save/restore hsave.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/kvm_svm.h |    2 ++
 arch/x86/kvm/svm.c     |    7 +++++++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h
index 65ef0fc..76ad107 100644
--- a/arch/x86/kvm/kvm_svm.h
+++ b/arch/x86/kvm/kvm_svm.h
@@ -41,6 +41,8 @@ struct vcpu_svm {
 	unsigned long host_dr7;

 	u32 *msrpm;
+
+	u64 nested_hsave;
 };

 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index fe42916..48f6d88 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -613,6 +613,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	}
 	force_new_asid(&svm->vcpu);

+	svm->nested_hsave = 0;
 	svm->vcpu.arch.hflags = HF_GIF_MASK;
 }
@@ -1359,6 +1360,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
 	case MSR_IA32_LASTINTTOIP:
 		*data = svm->vmcb->save.last_excp_to;
 		break;
+	case MSR_VM_HSAVE_PA:
+		*data = svm->nested_hsave;
+		break;
 	default:
 		return kvm_get_msr_common(vcpu, ecx, data);
 	}
@@ -1453,6 +1457,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
 		pr_unimpl(vcpu, "unimplemented perfctr wrmsr: 0x%x data 0x%llx\n",
 			  ecx, data);
 		break;
+	case MSR_VM_HSAVE_PA:
+		svm->nested_hsave = data;
+		break;
 	default:
 		return kvm_set_msr_common(vcpu, ecx, data);
 	}
--
1.5.6
* [PATCH 6/9] Add VMLOAD and VMSAVE handlers v2
From: Alexander Graf @ 2008-09-05  7:51 UTC
To: kvm; +Cc: joro, anthony, avi, Alexander Graf

This implements the VMLOAD and VMSAVE instructions, which usually surround
the VMRUN instruction. Both instructions save/load the same elements, so we
only need to implement the copy once.

v2 fixes CPL checking and replaces memcpy by assignments.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 48f6d88..43ca2d8 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1218,6 +1218,71 @@ static int nested_svm_do(struct vcpu_svm *svm,
 	return retval;
 }

+static int nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
+{
+	to_vmcb->save.fs = from_vmcb->save.fs;
+	to_vmcb->save.gs = from_vmcb->save.gs;
+	to_vmcb->save.tr = from_vmcb->save.tr;
+	to_vmcb->save.ldtr = from_vmcb->save.ldtr;
+	to_vmcb->save.kernel_gs_base = from_vmcb->save.kernel_gs_base;
+	to_vmcb->save.star = from_vmcb->save.star;
+	to_vmcb->save.lstar = from_vmcb->save.lstar;
+	to_vmcb->save.cstar = from_vmcb->save.cstar;
+	to_vmcb->save.sfmask = from_vmcb->save.sfmask;
+	to_vmcb->save.sysenter_cs = from_vmcb->save.sysenter_cs;
+	to_vmcb->save.sysenter_esp = from_vmcb->save.sysenter_esp;
+	to_vmcb->save.sysenter_eip = from_vmcb->save.sysenter_eip;
+
+	return 1;
+}
+
+static int nested_svm_vmload(struct vcpu_svm *svm, void *nested_vmcb,
+			     void *arg2, void *opaque)
+{
+	return nested_svm_vmloadsave((struct vmcb *)nested_vmcb, svm->vmcb);
+}
+
+static int nested_svm_vmsave(struct vcpu_svm *svm, void *nested_vmcb,
+			     void *arg2, void *opaque)
+{
+	return nested_svm_vmloadsave(svm->vmcb, (struct vmcb *)nested_vmcb);
+}
+
+static int vmload_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	if (svm->vmcb->save.cpl) {
+		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
+		       __func__, svm->vmcb->save.cpl,
+		       kvm_rip_read(&svm->vcpu));
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return 1;
+	}
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	nested_svm_do(svm, svm->vmcb->save.rax, 0, NULL, nested_svm_vmload);
+
+	return 1;
+}
+
+static int vmsave_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	if (svm->vmcb->save.cpl) {
+		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
+		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return 1;
+	}
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	nested_svm_do(svm, svm->vmcb->save.rax, 0, NULL, nested_svm_vmsave);
+
+	return 1;
+}
+
 static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 {
 	if (svm->vmcb->save.cpl) {
@@ -1554,8 +1619,8 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_SHUTDOWN]			= shutdown_interception,
 	[SVM_EXIT_VMRUN]			= invalid_op_interception,
 	[SVM_EXIT_VMMCALL]			= vmmcall_interception,
-	[SVM_EXIT_VMLOAD]			= invalid_op_interception,
-	[SVM_EXIT_VMSAVE]			= invalid_op_interception,
+	[SVM_EXIT_VMLOAD]			= vmload_interception,
+	[SVM_EXIT_VMSAVE]			= vmsave_interception,
 	[SVM_EXIT_STGI]				= stgi_interception,
 	[SVM_EXIT_CLGI]				= clgi_interception,
 	[SVM_EXIT_SKINIT]			= invalid_op_interception,
--
1.5.6
* [PATCH 7/9] Add VMRUN handler v2
From: Alexander Graf @ 2008-09-05  7:51 UTC
To: kvm; +Cc: joro, anthony, avi, Alexander Graf

This patch implements VMRUN. VMRUN enters a virtual CPU and runs it in the
same context the normal guest CPU would run in. So basically it is
implemented the same way a real CPU would do it.

We also prepare all intercepts that get OR'ed with the original intercepts,
as we do not allow a level-2 guest to be intercepted less than the
first-level guest.

v2 implements the following improvements:

- fixes the CPL check
- does not allocate iopm when not used
- remembers the host's IF in the HIF bit in the hflags

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/kvm_svm.h     |    9 ++
 arch/x86/kvm/svm.c         |  198 +++++++++++++++++++++++++++++++++++++++++-
 include/asm-x86/kvm_host.h |    1 +
 3 files changed, 206 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h
index 76ad107..2afe0ce 100644
--- a/arch/x86/kvm/kvm_svm.h
+++ b/arch/x86/kvm/kvm_svm.h
@@ -43,6 +43,15 @@ struct vcpu_svm {
 	u32 *msrpm;

 	u64 nested_hsave;
+	u64 nested_vmcb;
+
+	/* These are the merged vectors */
+	u32 *nested_msrpm;
+	u32 *nested_iopm;
+
+	/* gpa pointers to the real vectors */
+	u64 nested_vmcb_msrpm;
+	u64 nested_vmcb_iopm;
 };

 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 43ca2d8..c47f039 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -51,6 +51,9 @@ MODULE_LICENSE("GPL");
 /* Turn on to get debugging output */
 /* #define NESTED_DEBUG */

+/* Not needed until device passthrough */
+/* #define NESTED_KVM_MERGE_IOPM */
+
 #ifdef NESTED_DEBUG
 #define nsvm_printk(fmt, args...) printk(KERN_INFO fmt, ## args)
 #else
@@ -76,6 +79,11 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 	return container_of(vcpu, struct vcpu_svm, vcpu);
 }

+static inline bool is_nested(struct vcpu_svm *svm)
+{
+	return svm->nested_vmcb;
+}
+
 static unsigned long iopm_base;

 struct kvm_ldttss_desc {
@@ -614,6 +622,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	force_new_asid(&svm->vcpu);

 	svm->nested_hsave = 0;
+	svm->nested_vmcb = 0;
 	svm->vcpu.arch.hflags = HF_GIF_MASK;
 }
@@ -639,6 +648,10 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	struct vcpu_svm *svm;
 	struct page *page;
 	struct page *msrpm_pages;
+	struct page *nested_msrpm_pages;
+#ifdef NESTED_KVM_MERGE_IOPM
+	struct page *nested_iopm_pages;
+#endif
 	int err;
@@ -661,9 +674,25 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
 	if (!msrpm_pages)
 		goto uninit;
+
+	nested_msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
+	if (!nested_msrpm_pages)
+		goto uninit;
+
+#ifdef NESTED_KVM_MERGE_IOPM
+	nested_iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER);
+	if (!nested_iopm_pages)
+		goto uninit;
+#endif
+
 	svm->msrpm = page_address(msrpm_pages);
 	svm_vcpu_init_msrpm(svm->msrpm);

+	svm->nested_msrpm = page_address(nested_msrpm_pages);
+#ifdef NESTED_KVM_MERGE_IOPM
+	svm->nested_iopm = page_address(nested_iopm_pages);
+#endif
+
 	svm->vmcb = page_address(page);
 	clear_page(svm->vmcb);
 	svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
@@ -693,6 +722,10 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)

 	__free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT));
 	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
+	__free_pages(virt_to_page(svm->nested_msrpm), MSRPM_ALLOC_ORDER);
+#ifdef NESTED_KVM_MERGE_IOPM
+	__free_pages(virt_to_page(svm->nested_iopm), IOPM_ALLOC_ORDER);
+#endif
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
 }
@@ -1218,6 +1251,133 @@ static int nested_svm_do(struct vcpu_svm *svm,
 	return retval;
 }

+static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1,
+				  void *arg2, void *opaque)
+{
+	int i;
+	u32 *nested_msrpm = (u32 *)arg1;
+	for (i = 0; i < PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER) / 4; i++)
+		svm->nested_msrpm[i] = svm->msrpm[i] | nested_msrpm[i];
+	svm->vmcb->control.msrpm_base_pa = __pa(svm->nested_msrpm);
+
+	return 0;
+}
+
+#ifdef NESTED_KVM_MERGE_IOPM
+static int nested_svm_vmrun_iopm(struct vcpu_svm *svm, void *arg1,
+				 void *arg2, void *opaque)
+{
+	int i;
+	u32 *nested_iopm = (u32 *)arg1;
+	u32 *iopm = (u32 *)__va(iopm_base);
+	for (i = 0; i < PAGE_SIZE * (1 << IOPM_ALLOC_ORDER) / 4; i++)
+		svm->nested_iopm[i] = iopm[i] | nested_iopm[i];
+	svm->vmcb->control.iopm_base_pa = __pa(svm->nested_iopm);
+
+	return 0;
+}
+#endif
+
+static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1,
+			    void *arg2, void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	struct vmcb *hsave = (struct vmcb *)arg2;
+
+	/* nested_vmcb is our indicator if nested SVM is activated */
+	svm->nested_vmcb = svm->vmcb->save.rax;
+
+	/* Clear internal status */
+	svm->vcpu.arch.exception.pending = false;
+
+	/* Save the old vmcb, so we don't need to pick what we save, but
+	   can restore everything when a VMEXIT occurs */
+	memcpy(hsave, svm->vmcb, sizeof(struct vmcb));
+	/* We need to remember the original CR3 in the SPT case */
+	if (!npt_enabled)
+		hsave->save.cr3 = svm->vcpu.arch.cr3;
+	hsave->save.rip = svm->next_rip;
+
+	if (svm->vmcb->save.rflags & X86_EFLAGS_IF)
+		svm->vcpu.arch.hflags |= HF_HIF_MASK;
+	else
+		svm->vcpu.arch.hflags &= ~HF_HIF_MASK;
+
+	/* Load the nested guest state */
+	svm->vmcb->save.es = nested_vmcb->save.es;
+	svm->vmcb->save.cs = nested_vmcb->save.cs;
+	svm->vmcb->save.ss = nested_vmcb->save.ss;
+	svm->vmcb->save.ds = nested_vmcb->save.ds;
+	svm->vmcb->save.gdtr = nested_vmcb->save.gdtr;
+	svm->vmcb->save.idtr = nested_vmcb->save.idtr;
+	svm->vmcb->save.rflags = nested_vmcb->save.rflags;
+	svm_set_efer(&svm->vcpu, nested_vmcb->save.efer);
+	svm_set_cr0(&svm->vcpu, nested_vmcb->save.cr0);
+	svm_set_cr4(&svm->vcpu, nested_vmcb->save.cr4);
+	if (npt_enabled) {
+		svm->vmcb->save.cr3 = nested_vmcb->save.cr3;
+		svm->vcpu.arch.cr3 = nested_vmcb->save.cr3;
+	} else {
+		kvm_set_cr3(&svm->vcpu, nested_vmcb->save.cr3);
+		kvm_mmu_reset_context(&svm->vcpu);
+	}
+	svm->vmcb->save.cr2 = nested_vmcb->save.cr2;
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, nested_vmcb->save.rax);
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RSP, nested_vmcb->save.rsp);
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, nested_vmcb->save.rip);
+	/* In case we don't even reach vcpu_run, the fields are not updated */
+	svm->vmcb->save.rax = nested_vmcb->save.rax;
+	svm->vmcb->save.rsp = nested_vmcb->save.rsp;
+	svm->vmcb->save.rip = nested_vmcb->save.rip;
+	svm->vmcb->save.dr7 = nested_vmcb->save.dr7;
+	svm->vmcb->save.dr6 = nested_vmcb->save.dr6;
+	svm->vmcb->save.cpl = nested_vmcb->save.cpl;
+
+	/* We don't want a nested guest to be more powerful than the guest,
+	   so all intercepts are ORed */
+	svm->vmcb->control.intercept_cr_read |=
+		nested_vmcb->control.intercept_cr_read;
+	svm->vmcb->control.intercept_cr_write |=
+		nested_vmcb->control.intercept_cr_write;
+	svm->vmcb->control.intercept_dr_read |=
+		nested_vmcb->control.intercept_dr_read;
+	svm->vmcb->control.intercept_dr_write |=
+		nested_vmcb->control.intercept_dr_write;
+	svm->vmcb->control.intercept_exceptions |=
+		nested_vmcb->control.intercept_exceptions;
+
+	svm->vmcb->control.intercept |= nested_vmcb->control.intercept;
+
+	svm->nested_vmcb_msrpm = nested_vmcb->control.msrpm_base_pa;
+	svm->nested_vmcb_iopm = nested_vmcb->control.iopm_base_pa;
+
+	force_new_asid(&svm->vcpu);
+	svm->vmcb->control.exit_int_info = nested_vmcb->control.exit_int_info;
+	svm->vmcb->control.exit_int_info_err = nested_vmcb->control.exit_int_info_err;
+	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
+	if (nested_vmcb->control.int_ctl & V_IRQ_MASK) {
+		nsvm_printk("nSVM Injecting Interrupt: 0x%x\n",
+			    nested_vmcb->control.int_ctl);
+	}
+	nsvm_printk("nSVM exit_int_info: 0x%x | int_state: 0x%x\n",
+		    nested_vmcb->control.exit_int_info,
+		    nested_vmcb->control.int_state);
+
+	svm->vmcb->control.int_vector = nested_vmcb->control.int_vector;
+	svm->vmcb->control.int_state = nested_vmcb->control.int_state;
+	svm->vmcb->control.tsc_offset += nested_vmcb->control.tsc_offset;
+	if (nested_vmcb->control.event_inj & SVM_EVTINJ_VALID)
+		nsvm_printk("Injecting Event: 0x%x\n",
+			    nested_vmcb->control.event_inj);
+	svm->vmcb->control.event_inj = nested_vmcb->control.event_inj;
+	svm->vmcb->control.event_inj_err = nested_vmcb->control.event_inj_err;
+
+	svm->vcpu.arch.hflags |= HF_GIF_MASK;
+
+	return 1;
+}
+
 static int nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
 {
 	to_vmcb->save.fs = from_vmcb->save.fs;
@@ -1283,6 +1443,39 @@ static int vmsave_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }

+static int vmrun_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	nsvm_printk("VMrun\n");
+
+	if (svm->vmcb->save.cpl) {
+		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
+		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return 1;
+	}
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	if (nested_svm_do(svm, svm->vmcb->save.rax, svm->nested_hsave,
+			  NULL, nested_svm_vmrun))
+		return 1;
+
+	/* Right now SVM always intercepts all IOs except for the DEBUG port.
+	   This is fine for the nested VM too IMHO. */
+#ifdef NESTED_KVM_MERGE_IOPM
+	if (nested_svm_do(svm, svm->vmcb->control.iopm_base_pa, 0,
+			  NULL, nested_svm_vmrun_iopm))
+		return 1;
+#endif
+
+	if (nested_svm_do(svm, svm->vmcb->control.msrpm_base_pa, 0,
+			  NULL, nested_svm_vmrun_msrpm))
+		return 1;
+
+	return 1;
+}
+
 static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 {
 	if (svm->vmcb->save.cpl) {
@@ -1617,7 +1810,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_MSR]				= msr_interception,
 	[SVM_EXIT_TASK_SWITCH]			= task_switch_interception,
 	[SVM_EXIT_SHUTDOWN]			= shutdown_interception,
-	[SVM_EXIT_VMRUN]			= invalid_op_interception,
+	[SVM_EXIT_VMRUN]			= vmrun_interception,
 	[SVM_EXIT_VMMCALL]			= vmmcall_interception,
 	[SVM_EXIT_VMLOAD]			= vmload_interception,
 	[SVM_EXIT_VMSAVE]			= vmsave_interception,
@@ -1923,7 +2116,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	svm->host_cr2 = kvm_read_cr2();
 	svm->host_dr6 = read_dr6();
 	svm->host_dr7 = read_dr7();
-	svm->vmcb->save.cr2 = vcpu->arch.cr2;
+	if (!is_nested(svm))
+		svm->vmcb->save.cr2 = vcpu->arch.cr2;
 	/* required for live migration with NPT */
 	if (npt_enabled)
 		svm->vmcb->save.cr3 = vcpu->arch.cr3;
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 3e25004..ad2ad32 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -736,6 +736,7 @@ enum {
 };

 #define HF_GIF_MASK		(1 << 0)
+#define HF_HIF_MASK		(1 << 1)

 /*
  * Hardware virtualization extension instructions may fault if a
--
1.5.6
* [PATCH 8/9] Add VMEXIT handler and intercepts v2
From: Alexander Graf @ 2008-09-05  7:51 UTC
To: kvm; +Cc: joro, anthony, avi, Alexander Graf

This adds the #VMEXIT intercept, so that we return to the level-1 guest
whenever something happens in the level-2 guest that the level-1 guest asked
to intercept.

v2 implements HIF handling and cleans up exception interception.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |  319 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 319 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c47f039..8318a63 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -74,6 +74,13 @@ module_param(npt, int, S_IRUGO);
 static void kvm_reput_irq(struct vcpu_svm *svm);
 static void svm_flush_tlb(struct kvm_vcpu *vcpu);

+static int nested_svm_exit_handled(struct vcpu_svm *svm, bool kvm_override);
+static int nested_svm_vmexit(struct vcpu_svm *svm);
+static int nested_svm_vmsave(struct vcpu_svm *svm, void *nested_vmcb,
+			     void *arg2, void *opaque);
+static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
+				      bool has_error_code, u32 error_code);
+
 static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 {
 	return container_of(vcpu, struct vcpu_svm, vcpu);
@@ -223,6 +230,11 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 {
 	struct vcpu_svm *svm = to_svm(vcpu);

+	/* If we are within a nested VM we'd better #VMEXIT and let the
+	   guest handle the exception */
+	if (nested_svm_check_exception(svm, nr, has_error_code, error_code))
+		return;
+
 	svm->vmcb->control.event_inj = nr
 		| SVM_EVTINJ_VALID
 		| (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
@@ -1185,6 +1197,43 @@ static int vmmcall_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }

+static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
+				      bool has_error_code, u32 error_code)
+{
+	if (is_nested(svm)) {
+		svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
+		svm->vmcb->control.exit_code_hi = 0;
+		svm->vmcb->control.exit_info_1 = error_code;
+		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
+		if (nested_svm_exit_handled(svm, false)) {
+			nsvm_printk("VMexit -> EXCP 0x%x\n", nr);
+
+			nested_svm_vmexit(svm);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
+static inline int nested_svm_intr(struct vcpu_svm *svm)
+{
+	if (is_nested(svm)) {
+		if (!(svm->vcpu.arch.hflags & HF_HIF_MASK))
+			return 0;
+
+		svm->vmcb->control.exit_code = SVM_EXIT_INTR;
+
+		if (nested_svm_exit_handled(svm, false)) {
+			nsvm_printk("VMexit -> INTR\n");
+			nested_svm_vmexit(svm);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
 static struct page *nested_svm_get_page(struct vcpu_svm *svm, u64 gpa)
 {
 	struct page *page;
@@ -1251,6 +1300,257 @@ static int nested_svm_do(struct vcpu_svm *svm,
 	return retval;
 }

+static int nested_svm_exit_handled_real(struct vcpu_svm *svm,
+					void *arg1,
+					void *arg2,
+					void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	bool kvm_overrides = *(bool *)opaque;
+	u32 exit_code = svm->vmcb->control.exit_code;
+
+	if (kvm_overrides) {
+		switch (exit_code) {
+		case SVM_EXIT_INTR:
+		case SVM_EXIT_NMI:
+			return 0;
+		/* For now we are always handling NPFs when using them */
+		case SVM_EXIT_NPF:
+			if (npt_enabled)
+				return 0;
+			break;
+		/* When we're shadowing, trap PFs */
+		case SVM_EXIT_EXCP_BASE + PF_VECTOR:
+			if (!npt_enabled)
+				return 0;
+			break;
+		default:
+			break;
+		}
+	}
+
+	switch (exit_code) {
+	case SVM_EXIT_READ_CR0 ... SVM_EXIT_READ_CR8: {
+		u32 cr_bits = 1 << (exit_code - SVM_EXIT_READ_CR0);
+		if (nested_vmcb->control.intercept_cr_read & cr_bits)
+			return 1;
+		break;
+	}
+	case SVM_EXIT_WRITE_CR0 ... SVM_EXIT_WRITE_CR8: {
+		u32 cr_bits = 1 << (exit_code - SVM_EXIT_WRITE_CR0);
+		if (nested_vmcb->control.intercept_cr_write & cr_bits)
+			return 1;
+		break;
+	}
+	case SVM_EXIT_READ_DR0 ... SVM_EXIT_READ_DR7: {
+		u32 dr_bits = 1 << (exit_code - SVM_EXIT_READ_DR0);
+		if (nested_vmcb->control.intercept_dr_read & dr_bits)
+			return 1;
+		break;
+	}
+	case SVM_EXIT_WRITE_DR0 ... SVM_EXIT_WRITE_DR7: {
+		u32 dr_bits = 1 << (exit_code - SVM_EXIT_WRITE_DR0);
+		if (nested_vmcb->control.intercept_dr_write & dr_bits)
+			return 1;
+		break;
+	}
+	case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
+		u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
+		if (nested_vmcb->control.intercept_exceptions & excp_bits)
+			return 1;
+		break;
+	}
+	default: {
+		u64 exit_bits = 1ULL << (exit_code - SVM_EXIT_INTR);
+		nsvm_printk("exit code: 0x%x\n", exit_code);
+		if (nested_vmcb->control.intercept & exit_bits)
+			return 1;
+	}
+	}
+
+	return 0;
+}
+
+#ifdef NESTED_KVM_MERGE_IOPM
+static int nested_svm_exit_handled_io(struct vcpu_svm *svm,
+				      void *arg1, void *arg2,
+				      void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	u16 param = (u16)(svm->vmcb->control.exit_info_1);
+	u16 port = (u16)(svm->vmcb->control.exit_info_1 >> 16);
+	u16 mask = (1 << ((param >> 4) & 7)) - 1;
+	u8 *iopm = (u8 *)arg2 + (port / 8);
+	u16 iopmw = iopm[0] | (iopm[1] << 8);
+
+	if (!(nested_vmcb->control.intercept & (1ULL << INTERCEPT_IOIO_PROT)))
+		return 0;
+
+	if (iopmw & (mask << (port & 7)))
+		return 1;
+
+	nsvm_printk("nKVM: No IO-Intercept on param=0x%hx port=0x%hx "
+		    "mask=0x%hx iopm=0x%hx\n", param, port, mask, iopmw);
+
+	return 0;
+}
+#endif
+
+static int nested_svm_exit_handled_msr(struct vcpu_svm *svm,
+				       void *arg1, void *arg2,
+				       void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	u8 *msrpm =
(u8 *)arg2; + u32 t0, t1; + u32 msr = svm->vcpu.arch.regs[VCPU_REGS_RCX]; + u32 param = svm->vmcb->control.exit_info_1 & 1; + + if (!(nested_vmcb->control.intercept & (1ULL << INTERCEPT_MSR_PROT))) + return 0; + + switch(msr) { + case 0 ... 0x1fff: + t0 = (msr * 2) % 8; + t1 = msr / 8; + break; + case 0xc0000000 ... 0xc0001fff: + t0 = (8192 + msr - 0xc0000000) * 2; + t1 = (t0 / 8); + t0 %= 8; + break; + case 0xc0010000 ... 0xc0011fff: + t0 = (16384 + msr - 0xc0010000) * 2; + t1 = (t0 / 8); + t0 %= 8; + break; + default: + return 1; + break; + } + if (msrpm[t1] & ((1 << param) << t0)) + return 1; + + return 0; +} + +static int nested_svm_exit_handled(struct vcpu_svm *svm, bool kvm_override) +{ + bool k = kvm_override; + + switch (svm->vmcb->control.exit_code) { +#ifdef NESTED_KVM_MERGE_IOPM + case SVM_EXIT_IOIO: + return nested_svm_do(svm, svm->nested_vmcb, + svm->nested_vmcb_iopm, NULL, + nested_svm_exit_handled_io); + break; +#endif + case SVM_EXIT_MSR: + return nested_svm_do(svm, svm->nested_vmcb, + svm->nested_vmcb_msrpm, NULL, + nested_svm_exit_handled_msr); + default: break; + } + + return nested_svm_do(svm, svm->nested_vmcb, 0, &k, + nested_svm_exit_handled_real); +} + +static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1, + void *arg2, void *opaque) +{ + struct vmcb *nested_vmcb = (struct vmcb *)arg1; + struct vmcb *hsave = (struct vmcb *)arg2; + u64 nested_save[] = { nested_vmcb->save.cr0, + nested_vmcb->save.cr3, + nested_vmcb->save.cr4, + nested_vmcb->save.efer, + nested_vmcb->control.intercept_cr_read, + nested_vmcb->control.intercept_cr_write, + nested_vmcb->control.intercept_dr_read, + nested_vmcb->control.intercept_dr_write, + nested_vmcb->control.intercept_exceptions, + nested_vmcb->control.intercept, + nested_vmcb->control.msrpm_base_pa, + nested_vmcb->control.iopm_base_pa, + nested_vmcb->control.tsc_offset }; + + /* Give the current vmcb to the guest */ + memcpy(nested_vmcb, svm->vmcb, sizeof(struct vmcb)); + nested_vmcb->save.cr0 = 
nested_save[0]; + if (!npt_enabled) + nested_vmcb->save.cr3 = nested_save[1]; + nested_vmcb->save.cr4 = nested_save[2]; + nested_vmcb->save.efer = nested_save[3]; + nested_vmcb->control.intercept_cr_read = nested_save[4]; + nested_vmcb->control.intercept_cr_write = nested_save[5]; + nested_vmcb->control.intercept_dr_read = nested_save[6]; + nested_vmcb->control.intercept_dr_write = nested_save[7]; + nested_vmcb->control.intercept_exceptions = nested_save[8]; + nested_vmcb->control.intercept = nested_save[9]; + nested_vmcb->control.msrpm_base_pa = nested_save[10]; + nested_vmcb->control.iopm_base_pa = nested_save[11]; + nested_vmcb->control.tsc_offset = nested_save[12]; + + if ((nested_vmcb->control.int_ctl & V_IRQ_MASK) && + (nested_vmcb->control.int_vector)) { + nsvm_printk("WARNING: IRQ 0x%x still enabled on #VMEXIT\n", + nested_vmcb->control.int_vector); + } + + /* Restore the original control entries */ + svm->vmcb->control = hsave->control; + + /* Flush the virtual TLB */ + force_new_asid(&svm->vcpu); + + /* Kill any pending exceptions */ + if (svm->vcpu.arch.exception.pending == true) + nsvm_printk("WARNING: Pending Exception\n"); + svm->vcpu.arch.exception.pending = false; + + /* Restore selected save entries */ + svm->vmcb->save.es = hsave->save.es; + svm->vmcb->save.cs = hsave->save.cs; + svm->vmcb->save.ss = hsave->save.ss; + svm->vmcb->save.ds = hsave->save.ds; + svm->vmcb->save.gdtr = hsave->save.gdtr; + svm->vmcb->save.idtr = hsave->save.idtr; + svm->vmcb->save.rflags = hsave->save.rflags; + svm_set_efer(&svm->vcpu, hsave->save.efer); + svm_set_cr0(&svm->vcpu, hsave->save.cr0 | X86_CR0_PE); + svm_set_cr4(&svm->vcpu, hsave->save.cr4); + if (npt_enabled) { + svm->vmcb->save.cr3 = hsave->save.cr3; + svm->vcpu.arch.cr3 = hsave->save.cr3; + } else { + kvm_set_cr3(&svm->vcpu, hsave->save.cr3); + } + kvm_mmu_reset_context(&svm->vcpu); + kvm_mmu_load(&svm->vcpu); + kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, hsave->save.rax); + kvm_register_write(&svm->vcpu, 
VCPU_REGS_RSP, hsave->save.rsp); + kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, hsave->save.rip); + svm->vmcb->save.dr7 = 0; + svm->vmcb->save.cpl = 0; + svm->vmcb->control.exit_int_info = 0; + + svm->vcpu.arch.hflags &= ~HF_GIF_MASK; + /* Exit nested SVM mode */ + svm->nested_vmcb = 0; + + return 0; +} + +static int nested_svm_vmexit(struct vcpu_svm *svm) +{ + nsvm_printk("VMexit\n"); + if (nested_svm_do(svm, svm->nested_vmcb, svm->nested_hsave, + NULL, nested_svm_vmexit_real)) + return 1; + + return 0; +} static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1, void *arg2, void *opaque) @@ -1831,6 +2131,17 @@ static int handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) KVMTRACE_3D(VMEXIT, vcpu, exit_code, (u32)svm->vmcb->save.rip, (u32)((u64)svm->vmcb->save.rip >> 32), entryexit); + if (is_nested(svm)) { + nsvm_printk("nested handle_exit: 0x%x | 0x%lx | 0x%lx | 0x%lx\n", + exit_code, svm->vmcb->control.exit_info_1, + svm->vmcb->control.exit_info_2, svm->vmcb->save.rip); + if (nested_svm_exit_handled(svm, true)) { + nested_svm_vmexit(svm); + nsvm_printk("-> #VMEXIT\n"); + return 1; + } + } + if (npt_enabled) { int mmu_reload = 0; if ((vcpu->arch.cr0 ^ svm->vmcb->save.cr0) & X86_CR0_PG) { @@ -1917,6 +2228,8 @@ static void svm_set_irq(struct kvm_vcpu *vcpu, int irq) { struct vcpu_svm *svm = to_svm(vcpu); + nested_svm_intr(svm); + svm_inject_irq(svm, irq); } @@ -1962,6 +2275,9 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu) if (!kvm_cpu_has_interrupt(vcpu)) goto out; + if (nested_svm_intr(svm)) + goto out; + if (!(svm->vcpu.arch.hflags & HF_GIF_MASK)) goto out; @@ -2014,6 +2330,9 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu, struct vcpu_svm *svm = to_svm(vcpu); struct vmcb_control_area *control = &svm->vmcb->control; + if (nested_svm_intr(svm)) + return; + svm->vcpu.arch.interrupt_window_open = (!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) && (svm->vmcb->save.rflags & X86_EFLAGS_IF) && -- 1.5.6 ^ permalink raw reply 
related [flat|nested] 13+ messages in thread
* [PATCH 9/9] Allow setting the SVME bit v2 2008-09-05 7:51 ` [PATCH 8/9] Add VMEXIT handler and intercepts v2 Alexander Graf @ 2008-09-05 7:51 ` Alexander Graf 2008-09-10 19:12 ` [PATCH 8/9] Add VMEXIT handler and intercepts v2 Joerg Roedel 1 sibling, 0 replies; 13+ messages in thread From: Alexander Graf @ 2008-09-05 7:51 UTC (permalink / raw) To: kvm; +Cc: joro, anthony, avi, Alexander Graf Normally setting the SVME bit in EFER is not allowed, as we did not support SVM. Now that we do, we should also allow enabling SVM mode. v2 comes as last patch, so we don't enable half-ready code Signed-off-by: Alexander Graf <agraf@suse.de> --- arch/x86/kvm/svm.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 8318a63..d37329e 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -460,6 +460,8 @@ static __init int svm_hardware_setup(void) if (boot_cpu_has(X86_FEATURE_NX)) kvm_enable_efer_bits(EFER_NX); + kvm_enable_efer_bits(MSR_EFER_SVME_MASK); + for_each_online_cpu(cpu) { r = svm_cpu_init(cpu); if (r) -- 1.5.6 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 8/9] Add VMEXIT handler and intercepts v2 2008-09-05 7:51 ` [PATCH 8/9] Add VMEXIT handler and intercepts v2 Alexander Graf 2008-09-05 7:51 ` [PATCH 9/9] Allow setting the SVME bit v2 Alexander Graf @ 2008-09-10 19:12 ` Joerg Roedel 2008-09-15 16:11 ` Alexander Graf 1 sibling, 1 reply; 13+ messages in thread From: Joerg Roedel @ 2008-09-10 19:12 UTC (permalink / raw) To: Alexander Graf; +Cc: kvm, anthony, avi On Fri, Sep 05, 2008 at 09:51:23AM +0200, Alexander Graf wrote: > This adds the #VMEXIT intercept, so we return to the level 1 guest > when something happens in the level 2 guest that should return to > the level 1 guest. > > v2 implements HIF handling and cleans up exception interception > > Signed-off-by: Alexander Graf <agraf@suse.de> > --- > arch/x86/kvm/svm.c | 319 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 319 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index c47f039..8318a63 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -74,6 +74,13 @@ module_param(npt, int, S_IRUGO); > static void kvm_reput_irq(struct vcpu_svm *svm); > static void svm_flush_tlb(struct kvm_vcpu *vcpu); > > +static int nested_svm_exit_handled(struct vcpu_svm *svm, bool kvm_override); > +static int nested_svm_vmexit(struct vcpu_svm *svm); > +static int nested_svm_vmsave(struct vcpu_svm *svm, void *nested_vmcb, > + void *arg2, void *opaque); > +static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, > + bool has_error_code, u32 error_code); > + > static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu) > { > return container_of(vcpu, struct vcpu_svm, vcpu); > @@ -223,6 +230,11 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr, > { > struct vcpu_svm *svm = to_svm(vcpu); > > + /* If we are within a nested VM we'd better #VMEXIT and let the > + guest handle the exception */ > + if (nested_svm_check_exception(svm, nr, has_error_code, error_code)) > 
+ return; > + > svm->vmcb->control.event_inj = nr > | SVM_EVTINJ_VALID > | (has_error_code ? SVM_EVTINJ_VALID_ERR : 0) > @@ -1185,6 +1197,43 @@ static int vmmcall_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) > return 1; > } > > +static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, > + bool has_error_code, u32 error_code) > +{ > + if (is_nested(svm)) { > + svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr; > + svm->vmcb->control.exit_code_hi = 0; > + svm->vmcb->control.exit_info_1 = error_code; > + svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2; > + if (nested_svm_exit_handled(svm, false)) { > + nsvm_printk("VMexit -> EXCP 0x%x\n", nr); > + > + nested_svm_vmexit(svm); > + return 1; > + } > + } > + > + return 0; > +} > + > +static inline int nested_svm_intr(struct vcpu_svm *svm) > +{ > + if (is_nested(svm)) { > + if (!(svm->vcpu.arch.hflags & HF_HIF_MASK)) HF_HIF_MASK? Do you really need to check GIF if the vcpu is in guest mode? The guest itself can't influence the GIF so it should always be set. > + return 0; > + > + svm->vmcb->control.exit_code = SVM_EXIT_INTR; > + > + if (nested_svm_exit_handled(svm, false)) { > + nsvm_printk("VMexit -> INTR\n"); > + nested_svm_vmexit(svm); > + return 1; > + } The VMEXIT is only required if the guest vmcb has set V_INTR_MASKING. Otherwise we can inject the interrupt directly into the l2 guest. This is no problem for running KVM-in-KVM because KVM always sets V_INTR_MASKING. But to keep the implementation close to real hardware behavior this bit should be checked. 
> + } > + > + return 0; > +} > + > static struct page *nested_svm_get_page(struct vcpu_svm *svm, u64 gpa) > { > struct page *page; > @@ -1251,6 +1300,257 @@ static int nested_svm_do(struct vcpu_svm *svm, > return retval; > } > > +static int nested_svm_exit_handled_real(struct vcpu_svm *svm, > + void *arg1, > + void *arg2, > + void *opaque) > +{ > + struct vmcb *nested_vmcb = (struct vmcb *)arg1; > + bool kvm_overrides = *(bool *)opaque; > + u32 exit_code = svm->vmcb->control.exit_code; > + > + if (kvm_overrides) { > + switch (exit_code) { > + case SVM_EXIT_INTR: > + case SVM_EXIT_NMI: > + return 0; > + /* For now we are always handling NPFs when using them */ > + case SVM_EXIT_NPF: > + if (npt_enabled) > + return 0; > + break; > + /* When we're shadowing, trap PFs */ > + case SVM_EXIT_EXCP_BASE + PF_VECTOR: > + if (!npt_enabled) > + return 0; > + break; > + default: > + break; > + } > + } > + > + switch (exit_code) { > + case SVM_EXIT_READ_CR0 ... SVM_EXIT_READ_CR8: { > + u32 cr_bits = 1 << (exit_code - SVM_EXIT_READ_CR0); > + if (nested_vmcb->control.intercept_cr_read & cr_bits) > + return 1; > + break; > + } > + case SVM_EXIT_WRITE_CR0 ... SVM_EXIT_WRITE_CR8: { > + u32 cr_bits = 1 << (exit_code - SVM_EXIT_WRITE_CR0); > + if (nested_vmcb->control.intercept_cr_write & cr_bits) > + return 1; > + break; > + } > + case SVM_EXIT_READ_DR0 ... SVM_EXIT_READ_DR7: { > + u32 dr_bits = 1 << (exit_code - SVM_EXIT_READ_DR0); > + if (nested_vmcb->control.intercept_dr_read & dr_bits) > + return 1; > + break; > + } > + case SVM_EXIT_WRITE_DR0 ... SVM_EXIT_WRITE_DR7: { > + u32 dr_bits = 1 << (exit_code - SVM_EXIT_WRITE_DR0); > + if (nested_vmcb->control.intercept_dr_write & dr_bits) > + return 1; > + break; > + } > + case SVM_EXIT_EXCP_BASE ... 
SVM_EXIT_EXCP_BASE + 0x1f: { > + u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE); > + if (nested_vmcb->control.intercept_exceptions & excp_bits) > + return 1; > + break; > + } > + default: { > + u64 exit_bits = 1ULL << (exit_code - SVM_EXIT_INTR); > + nsvm_printk("exit code: 0x%x\n", exit_code); > + if (nested_vmcb->control.intercept & exit_bits) > + return 1; > + } > + } > + > + return 0; > +} > + > +#ifdef NESTED_KVM_MERGE_IOPM > +static int nested_svm_exit_handled_io(struct vcpu_svm *svm, > + void *arg1, void *arg2, > + void *opaque) > +{ > + struct vmcb *nested_vmcb = (struct vmcb *)arg1; > + u16 param = (u16)(svm->vmcb->control.exit_info_1); > + u16 port = (u16)(svm->vmcb->control.exit_info_1 >> 16); > + u16 mask = (1 << ((param >> 4) & 7)) - 1; > + u8 *iopm = (u8 *)arg2 + (port / 8); > + u16 iopmw = iopm[0] | (iopm[1] << 8); > + > + if (!(nested_vmcb->control.intercept & (1ULL << INTERCEPT_IOIO_PROT))) > + return 0; > + if (iopmw & (mask << (port & 7))) > + return 1; > + > + nsvm_printk("nKVM: No IO-Intercept on param=0x%hx port=0x%hx " > + "mask=0x%hx iopm=0x%hx\n", param, port, mask, iopmw); > + > + return 0; > +} > +#endif > + > +static int nested_svm_exit_handled_msr(struct vcpu_svm *svm, > + void *arg1, void *arg2, > + void *opaque) > +{ > + struct vmcb *nested_vmcb = (struct vmcb *)arg1; > + u8 *msrpm = (u8 *)arg2; > + u32 t0, t1; > + u32 msr = svm->vcpu.arch.regs[VCPU_REGS_RCX]; > + u32 param = svm->vmcb->control.exit_info_1 & 1; > + > + if (!(nested_vmcb->control.intercept & (1ULL << INTERCEPT_MSR_PROT))) > + return 0; > + > + switch(msr) { > + case 0 ... 0x1fff: > + t0 = (msr * 2) % 8; > + t1 = msr / 8; > + break; > + case 0xc0000000 ... 0xc0001fff: > + t0 = (8192 + msr - 0xc0000000) * 2; > + t1 = (t0 / 8); > + t0 %= 8; > + break; > + case 0xc0010000 ... 
0xc0011fff: > + t0 = (16384 + msr - 0xc0010000) * 2; > + t1 = (t0 / 8); > + t0 %= 8; > + break; > + default: > + return 1; > + break; > + } > + if (msrpm[t1] & ((1 << param) << t0)) > + return 1; > + > + return 0; > +} > + > +static int nested_svm_exit_handled(struct vcpu_svm *svm, bool kvm_override) > +{ > + bool k = kvm_override; > + > + switch (svm->vmcb->control.exit_code) { > +#ifdef NESTED_KVM_MERGE_IOPM > + case SVM_EXIT_IOIO: > + return nested_svm_do(svm, svm->nested_vmcb, > + svm->nested_vmcb_iopm, NULL, > + nested_svm_exit_handled_io); > + break; > +#endif > + case SVM_EXIT_MSR: > + return nested_svm_do(svm, svm->nested_vmcb, > + svm->nested_vmcb_msrpm, NULL, > + nested_svm_exit_handled_msr); > + default: break; > + } > + > + return nested_svm_do(svm, svm->nested_vmcb, 0, &k, > + nested_svm_exit_handled_real); > +} > + > +static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1, > + void *arg2, void *opaque) > +{ > + struct vmcb *nested_vmcb = (struct vmcb *)arg1; > + struct vmcb *hsave = (struct vmcb *)arg2; > + u64 nested_save[] = { nested_vmcb->save.cr0, > + nested_vmcb->save.cr3, > + nested_vmcb->save.cr4, > + nested_vmcb->save.efer, > + nested_vmcb->control.intercept_cr_read, > + nested_vmcb->control.intercept_cr_write, > + nested_vmcb->control.intercept_dr_read, > + nested_vmcb->control.intercept_dr_write, > + nested_vmcb->control.intercept_exceptions, > + nested_vmcb->control.intercept, > + nested_vmcb->control.msrpm_base_pa, > + nested_vmcb->control.iopm_base_pa, > + nested_vmcb->control.tsc_offset }; > + > + /* Give the current vmcb to the guest */ > + memcpy(nested_vmcb, svm->vmcb, sizeof(struct vmcb)); > + nested_vmcb->save.cr0 = nested_save[0]; > + if (!npt_enabled) > + nested_vmcb->save.cr3 = nested_save[1]; > + nested_vmcb->save.cr4 = nested_save[2]; > + nested_vmcb->save.efer = nested_save[3]; > + nested_vmcb->control.intercept_cr_read = nested_save[4]; > + nested_vmcb->control.intercept_cr_write = nested_save[5]; > + 
nested_vmcb->control.intercept_dr_read = nested_save[6]; > + nested_vmcb->control.intercept_dr_write = nested_save[7]; > + nested_vmcb->control.intercept_exceptions = nested_save[8]; > + nested_vmcb->control.intercept = nested_save[9]; > + nested_vmcb->control.msrpm_base_pa = nested_save[10]; > + nested_vmcb->control.iopm_base_pa = nested_save[11]; > + nested_vmcb->control.tsc_offset = nested_save[12]; > + > + if ((nested_vmcb->control.int_ctl & V_IRQ_MASK) && > + (nested_vmcb->control.int_vector)) { > + nsvm_printk("WARNING: IRQ 0x%x still enabled on #VMEXIT\n", > + nested_vmcb->control.int_vector); > + } > + > + /* Restore the original control entries */ > + svm->vmcb->control = hsave->control; > + > + /* Flush the virtual TLB */ > + force_new_asid(&svm->vcpu); > + > + /* Kill any pending exceptions */ > + if (svm->vcpu.arch.exception.pending == true) > + nsvm_printk("WARNING: Pending Exception\n"); > + svm->vcpu.arch.exception.pending = false; > + > + /* Restore selected save entries */ > + svm->vmcb->save.es = hsave->save.es; > + svm->vmcb->save.cs = hsave->save.cs; > + svm->vmcb->save.ss = hsave->save.ss; > + svm->vmcb->save.ds = hsave->save.ds; > + svm->vmcb->save.gdtr = hsave->save.gdtr; > + svm->vmcb->save.idtr = hsave->save.idtr; > + svm->vmcb->save.rflags = hsave->save.rflags; > + svm_set_efer(&svm->vcpu, hsave->save.efer); > + svm_set_cr0(&svm->vcpu, hsave->save.cr0 | X86_CR0_PE); > + svm_set_cr4(&svm->vcpu, hsave->save.cr4); > + if (npt_enabled) { > + svm->vmcb->save.cr3 = hsave->save.cr3; > + svm->vcpu.arch.cr3 = hsave->save.cr3; > + } else { > + kvm_set_cr3(&svm->vcpu, hsave->save.cr3); > + } > + kvm_mmu_reset_context(&svm->vcpu); > + kvm_mmu_load(&svm->vcpu); > + kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, hsave->save.rax); > + kvm_register_write(&svm->vcpu, VCPU_REGS_RSP, hsave->save.rsp); > + kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, hsave->save.rip); > + svm->vmcb->save.dr7 = 0; > + svm->vmcb->save.cpl = 0; > + 
svm->vmcb->control.exit_int_info = 0; > + > + svm->vcpu.arch.hflags &= ~HF_GIF_MASK; > + /* Exit nested SVM mode */ > + svm->nested_vmcb = 0; > + > + return 0; > +} > + > +static int nested_svm_vmexit(struct vcpu_svm *svm) > +{ > + nsvm_printk("VMexit\n"); > + if (nested_svm_do(svm, svm->nested_vmcb, svm->nested_hsave, > + NULL, nested_svm_vmexit_real)) > + return 1; > + > + return 0; > +} > > static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1, > void *arg2, void *opaque) > @@ -1831,6 +2131,17 @@ static int handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) > KVMTRACE_3D(VMEXIT, vcpu, exit_code, (u32)svm->vmcb->save.rip, > (u32)((u64)svm->vmcb->save.rip >> 32), entryexit); > > + if (is_nested(svm)) { > + nsvm_printk("nested handle_exit: 0x%x | 0x%lx | 0x%lx | 0x%lx\n", > + exit_code, svm->vmcb->control.exit_info_1, > + svm->vmcb->control.exit_info_2, svm->vmcb->save.rip); > + if (nested_svm_exit_handled(svm, true)) { > + nested_svm_vmexit(svm); > + nsvm_printk("-> #VMEXIT\n"); > + return 1; > + } > + } > + > if (npt_enabled) { > int mmu_reload = 0; > if ((vcpu->arch.cr0 ^ svm->vmcb->save.cr0) & X86_CR0_PG) { > @@ -1917,6 +2228,8 @@ static void svm_set_irq(struct kvm_vcpu *vcpu, int irq) > { > struct vcpu_svm *svm = to_svm(vcpu); > > + nested_svm_intr(svm); > + > svm_inject_irq(svm, irq); > } > > @@ -1962,6 +2275,9 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu) > if (!kvm_cpu_has_interrupt(vcpu)) > goto out; > > + if (nested_svm_intr(svm)) > + goto out; > + > if (!(svm->vcpu.arch.hflags & HF_GIF_MASK)) > goto out; > > @@ -2014,6 +2330,9 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu, > struct vcpu_svm *svm = to_svm(vcpu); > struct vmcb_control_area *control = &svm->vmcb->control; > > + if (nested_svm_intr(svm)) > + return; > + > svm->vcpu.arch.interrupt_window_open = > (!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) && > (svm->vmcb->save.rflags & X86_EFLAGS_IF) && > -- > 1.5.6 ^ permalink raw reply [flat|nested] 13+ 
messages in thread
* Re: [PATCH 8/9] Add VMEXIT handler and intercepts v2 2008-09-10 19:12 ` [PATCH 8/9] Add VMEXIT handler and intercepts v2 Joerg Roedel @ 2008-09-15 16:11 ` Alexander Graf 0 siblings, 0 replies; 13+ messages in thread From: Alexander Graf @ 2008-09-15 16:11 UTC (permalink / raw) To: Joerg Roedel; +Cc: kvm, anthony, avi On Sep 10, 2008, at 9:12 PM, Joerg Roedel wrote: > On Fri, Sep 05, 2008 at 09:51:23AM +0200, Alexander Graf wrote: >> This adds the #VMEXIT intercept, so we return to the level 1 guest >> when something happens in the level 2 guest that should return to >> the level 1 guest. >> >> v2 implements HIF handling and cleans up exception interception >> >> Signed-off-by: Alexander Graf <agraf@suse.de> >> --- >> arch/x86/kvm/svm.c | 319 ++++++++++++++++++++++++++++++++++++++++++ >> ++++++++++ >> 1 files changed, 319 insertions(+), 0 deletions(-) >> >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c >> index c47f039..8318a63 100644 >> --- a/arch/x86/kvm/svm.c >> +++ b/arch/x86/kvm/svm.c >> @@ -74,6 +74,13 @@ module_param(npt, int, S_IRUGO); >> static void kvm_reput_irq(struct vcpu_svm *svm); >> static void svm_flush_tlb(struct kvm_vcpu *vcpu); >> >> +static int nested_svm_exit_handled(struct vcpu_svm *svm, bool >> kvm_override); >> +static int nested_svm_vmexit(struct vcpu_svm *svm); >> +static int nested_svm_vmsave(struct vcpu_svm *svm, void >> *nested_vmcb, >> + void *arg2, void *opaque); >> +static int nested_svm_check_exception(struct vcpu_svm *svm, >> unsigned nr, >> + bool has_error_code, u32 error_code); >> + >> static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu) >> { >> return container_of(vcpu, struct vcpu_svm, vcpu); >> @@ -223,6 +230,11 @@ static void svm_queue_exception(struct >> kvm_vcpu *vcpu, unsigned nr, >> { >> struct vcpu_svm *svm = to_svm(vcpu); >> >> + /* If we are within a nested VM we'd better #VMEXIT and let the >> + guest handle the exception */ >> + if (nested_svm_check_exception(svm, nr, has_error_code, >> error_code)) 
>> + return; >> + >> svm->vmcb->control.event_inj = nr >> | SVM_EVTINJ_VALID >> | (has_error_code ? SVM_EVTINJ_VALID_ERR : 0) >> @@ -1185,6 +1197,43 @@ static int vmmcall_interception(struct >> vcpu_svm *svm, struct kvm_run *kvm_run) >> return 1; >> } >> >> +static int nested_svm_check_exception(struct vcpu_svm *svm, >> unsigned nr, >> + bool has_error_code, u32 error_code) >> +{ >> + if (is_nested(svm)) { >> + svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr; >> + svm->vmcb->control.exit_code_hi = 0; >> + svm->vmcb->control.exit_info_1 = error_code; >> + svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2; >> + if (nested_svm_exit_handled(svm, false)) { >> + nsvm_printk("VMexit -> EXCP 0x%x\n", nr); >> + >> + nested_svm_vmexit(svm); >> + return 1; >> + } >> + } >> + >> + return 0; >> +} >> + >> +static inline int nested_svm_intr(struct vcpu_svm *svm) >> +{ >> + if (is_nested(svm)) { >> + if (!(svm->vcpu.arch.hflags & HF_HIF_MASK)) > > HF_HIF_MASK? > Do you really need to check GIF if the vcpu is in guest mode? The > guest > itself can't influence the GIF so it should always be set. HIF is not GIF ;-). The HIF is the hidden interrupt flag, aka the IF flag the host had when it started VMRUN. > > >> + return 0; >> + >> + svm->vmcb->control.exit_code = SVM_EXIT_INTR; >> + >> + if (nested_svm_exit_handled(svm, false)) { >> + nsvm_printk("VMexit -> INTR\n"); >> + nested_svm_vmexit(svm); >> + return 1; >> + } > > The VMEXIT is only required if the guest vmcb has set V_INTR_MASKING. > Otherwise we can inject the interrupt directly into the l2 guest. This > is no problem for running KVM-in-KVM because KVM always sets > V_INTR_MASKING. But to keep the implementation close to real hardware > behavior this bit should be checked. Right now I always force V_INTR_MASK to on in VMRUN, so we don't have to look at that case. It might be a good idea to simply implement the behavior though ... [snip] Alex ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/9] Implement GIF, clgi and stgi v2 2008-09-05 7:51 ` [PATCH 3/9] Implement GIF, clgi and stgi v2 Alexander Graf 2008-09-05 7:51 ` [PATCH 4/9] Add helper functions for nested SVM v2 Alexander Graf @ 2008-09-10 19:06 ` Joerg Roedel 1 sibling, 0 replies; 13+ messages in thread From: Joerg Roedel @ 2008-09-10 19:06 UTC (permalink / raw) To: Alexander Graf; +Cc: kvm, anthony, avi On Fri, Sep 05, 2008 at 09:51:18AM +0200, Alexander Graf wrote: > This patch implements the GIF flag and the clgi and stgi instructions that > set this flag. Only if the flag is set (default), interrupts can be received by > the CPU. > > To keep the information about that somewhere, this patch adds a new hidden > flags vector. that is used to store information that does not go into the > vmcb, but is SVM specific. > > v2 moves the hflags to x86 generic code > > Signed-off-by: Alexander Graf <agraf@suse.de> > --- > arch/x86/kvm/svm.c | 50 +++++++++++++++++++++++++++++++++++++++++-- > include/asm-x86/kvm_host.h | 3 ++ > 2 files changed, 50 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index d9b18cc..a6d9a74 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -602,6 +602,8 @@ static void init_vmcb(struct vcpu_svm *svm) > save->cr4 = 0; > } > force_new_asid(&svm->vcpu); > + > + svm->vcpu.arch.hflags = HF_GIF_MASK; > } > > static int svm_vcpu_reset(struct kvm_vcpu *vcpu) > @@ -1139,6 +1141,44 @@ static int vmmcall_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) > return 1; > } > > +static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) > +{ > + if (svm->vmcb->save.cpl) { > + printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n", > + __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu)); > + kvm_queue_exception(&svm->vcpu, GP_VECTOR); > + return 1; > + } You need to check EFER.SVME here. 
> + > + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; > + skip_emulated_instruction(&svm->vcpu); > + > + svm->vcpu.arch.hflags |= HF_GIF_MASK; > + > + return 1; > +} > + > +static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) > +{ > + if (svm->vmcb->save.cpl) { > + printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n", > + __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu)); > + kvm_queue_exception(&svm->vcpu, GP_VECTOR); > + return 1; > + } And here too. I suggest implementing a function which checks EFER.SVME and CPL and call it in every emulated SVM instruction. > + > + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; > + skip_emulated_instruction(&svm->vcpu); > + > + svm->vcpu.arch.hflags &= ~HF_GIF_MASK; > + > + /* After a CLGI no interrupts should come */ > + svm_clear_vintr(svm); > + svm->vmcb->control.int_ctl &= ~V_IRQ_MASK; > + > + return 1; > +} > + > static int invalid_op_interception(struct vcpu_svm *svm, > struct kvm_run *kvm_run) > { > @@ -1433,8 +1473,8 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm, > [SVM_EXIT_VMMCALL] = vmmcall_interception, > [SVM_EXIT_VMLOAD] = invalid_op_interception, > [SVM_EXIT_VMSAVE] = invalid_op_interception, > - [SVM_EXIT_STGI] = invalid_op_interception, > - [SVM_EXIT_CLGI] = invalid_op_interception, > + [SVM_EXIT_STGI] = stgi_interception, > + [SVM_EXIT_CLGI] = clgi_interception, > [SVM_EXIT_SKINIT] = invalid_op_interception, > [SVM_EXIT_WBINVD] = emulate_on_interception, > [SVM_EXIT_MONITOR] = invalid_op_interception, > @@ -1581,6 +1621,9 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu) > if (!kvm_cpu_has_interrupt(vcpu)) > goto out; > > + if (!(svm->vcpu.arch.hflags & HF_GIF_MASK)) > + goto out; > + > if (!(vmcb->save.rflags & X86_EFLAGS_IF) || > (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) || > (vmcb->control.event_inj & SVM_EVTINJ_VALID)) { > @@ -1632,7 +1675,8 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu, > > svm->vcpu.arch.interrupt_window_open = > 
(!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) && > - (svm->vmcb->save.rflags & X86_EFLAGS_IF)); > + (svm->vmcb->save.rflags & X86_EFLAGS_IF) && > + (svm->vcpu.arch.hflags & HF_GIF_MASK)); > > if (svm->vcpu.arch.interrupt_window_open && svm->vcpu.arch.irq_summary) > /* > diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h > index 982b6b2..3e25004 100644 > --- a/include/asm-x86/kvm_host.h > +++ b/include/asm-x86/kvm_host.h > @@ -245,6 +245,7 @@ struct kvm_vcpu_arch { > unsigned long cr3; > unsigned long cr4; > unsigned long cr8; > + u32 hflags; > u64 pdptrs[4]; /* pae */ > u64 shadow_efer; > u64 apic_base; > @@ -734,6 +735,8 @@ enum { > TASK_SWITCH_GATE = 3, > }; > > +#define HF_GIF_MASK (1 << 0) > + > /* > * Hardware virtualization extension instructions may fault if a > * reboot turns off virtualization while processes are running. > -- > 1.5.6 ^ permalink raw reply [flat|nested] 13+ messages in thread