[PATCH 0/9] [RFC] Add support for nested SVM (kernel)

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/9] [RFC] Add support for nested SVM (kernel)
@ 2008-09-01 11:57 Alexander Graf
  2008-09-01 11:57 ` [PATCH 1/9] Add CPUID feature flag for SVM Alexander Graf
                   ` (3 more replies)
  0 siblings, 4 replies; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 11:57 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, Alexander Graf

The current generation of virtualization extensions only supports one VM layer.
While we can't change that, it is pretty easy to emulate the CPU's behavior
and implement the virtualization opcodes ourselves.

This patchset does exactly this for SVM. Using this, a KVM can run within a VM.
Since we're emulating the real CPU's behavior, this should also enable other
VMMs to run within KVM.
So far I've only tested to run KVM inside the VM though.

This was created with help from Joerg Roedel. I don't really know how put this
information in the patchset though. Maybe set him on signed-off-by?

As always, comments and suggestions are highly welcome.

Alexander Graf (9):
  Add CPUID feature flag for SVM
  Clean up VINTR setting
  Implement GIF, clgi and stgi
  Add helper functions for nested SVM
  Allow setting the SVME bit
  Implement hsave
  Add VMLOAD and VMSAVE handlers
  Add VMRUN handler
  Add VMEXIT handler and intercepts

 arch/x86/kvm/kvm_svm.h       |   13 +
 arch/x86/kvm/svm.c           |  716 +++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm.h           |    2 +
 include/asm-x86/cpufeature.h |    1 +
 4 files changed, 720 insertions(+), 12 deletions(-)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/9] Add CPUID feature flag for SVM
  2008-09-01 11:57 [PATCH 0/9] [RFC] Add support for nested SVM (kernel) Alexander Graf
@ 2008-09-01 11:57 ` Alexander Graf
  2008-09-01 11:57   ` [PATCH 2/9] Clean up VINTR setting Alexander Graf
  2008-09-01 12:09 ` [PATCH 0/9] [RFC] Add support for nested SVM (kernel) Avi Kivity
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 11:57 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, Alexander Graf

This patch adds the CPUID feature flag for SVM in the x86 Linux headers.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 include/asm-x86/cpufeature.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/cpufeature.h b/include/asm-x86/cpufeature.h
index 762f6a6..bfb49fc 100644
--- a/include/asm-x86/cpufeature.h
+++ b/include/asm-x86/cpufeature.h
@@ -108,6 +108,7 @@
 /* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */
 #define X86_FEATURE_LAHF_LM	(6*32+ 0) /* LAHF/SAHF in long mode */
 #define X86_FEATURE_CMP_LEGACY	(6*32+ 1) /* If yes HyperThreading not valid */
+#define X86_FEATURE_SVM		(6*32+ 2) /* Secure Virtual Machine */
 #define X86_FEATURE_IBS		(6*32+ 10) /* Instruction Based Sampling */
 
 /*
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/9] Clean up VINTR setting
  2008-09-01 11:57 ` [PATCH 1/9] Add CPUID feature flag for SVM Alexander Graf
@ 2008-09-01 11:57   ` Alexander Graf
  2008-09-01 11:57     ` [PATCH 3/9] Implement GIF, clgi and stgi Alexander Graf
  2008-09-01 13:13     ` [PATCH 2/9] Clean up VINTR setting Avi Kivity
  0 siblings, 2 replies; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 11:57 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, Alexander Graf

The current VINTR intercept setters don't look clean to me. To make
the code easier to read and enable the possibilty to trap on a VINTR
set, this uses a helper function to set the VINTR intercept.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |   19 ++++++++++++++-----
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 961dc3b..0f423ce 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -729,6 +729,15 @@ static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 	to_svm(vcpu)->vmcb->save.rflags = rflags;
 }
 
+static void svm_set_vintr(struct vcpu_svm *svm, bool on)
+{
+	if (on) {
+		svm->vmcb->control.intercept |= 1ULL << INTERCEPT_VINTR;
+	} else {
+		svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_VINTR);
+	}
+}
+
 static struct vmcb_seg *svm_seg(struct kvm_vcpu *vcpu, int seg)
 {
 	struct vmcb_save_area *save = &to_svm(vcpu)->vmcb->save;
@@ -1363,7 +1372,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm,
 {
 	KVMTRACE_0D(PEND_INTR, &svm->vcpu, handler);
 
-	svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_VINTR);
+	svm_set_vintr(svm, false);
 	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
 	/*
 	 * If the user space waits to inject interrupts, exit as soon as
@@ -1575,7 +1584,7 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
 	    (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
 	    (vmcb->control.event_inj & SVM_EVTINJ_VALID)) {
 		/* unable to deliver irq, set pending irq */
-		vmcb->control.intercept |= (1ULL << INTERCEPT_VINTR);
+		svm_set_vintr(svm, true);
 		svm_inject_irq(svm, 0x0);
 		goto out;
 	}
@@ -1635,9 +1644,9 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 	 */
 	if (!svm->vcpu.arch.interrupt_window_open &&
 	    (svm->vcpu.arch.irq_summary || kvm_run->request_interrupt_window))
-		control->intercept |= 1ULL << INTERCEPT_VINTR;
-	 else
-		control->intercept &= ~(1ULL << INTERCEPT_VINTR);
+		svm_set_vintr(svm, true);
+	else
+		svm_set_vintr(svm, false);
 }
 
 static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 3/9] Implement GIF, clgi and stgi
  2008-09-01 11:57   ` [PATCH 2/9] Clean up VINTR setting Alexander Graf
@ 2008-09-01 11:57     ` Alexander Graf
  2008-09-01 11:57       ` [PATCH 4/9] Add helper functions for nested SVM Alexander Graf
  2008-09-01 13:11       ` [PATCH 3/9] Implement GIF, clgi and stgi Avi Kivity
  2008-09-01 13:13     ` [PATCH 2/9] Clean up VINTR setting Avi Kivity
  1 sibling, 2 replies; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 11:57 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, Alexander Graf

This patch implements the GIF flag and the clgi and stgi instructions that
set this flag. Only if the flag is set (default), interrupts can be received by
the CPU.

To keep the information about that somewhere, this patch adds a new hidden
flags vector. that is used to store information that does not go into the
vmcb, but is SVM specific.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/kvm_svm.h |    2 +
 arch/x86/kvm/svm.c     |   50 +++++++++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/svm.h     |    2 +
 3 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h
index 65ef0fc..a3105a6 100644
--- a/arch/x86/kvm/kvm_svm.h
+++ b/arch/x86/kvm/kvm_svm.h
@@ -41,6 +41,8 @@ struct vcpu_svm {
 	unsigned long host_dr7;
 
 	u32 *msrpm;
+
+	u32 hflags;
 };
 
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0f423ce..7c75484 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -602,6 +602,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 		save->cr4 = 0;
 	}
 	force_new_asid(&svm->vcpu);
+
+	svm->hflags = HF_GIF_MASK;
 }
 
 static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
@@ -1138,6 +1140,44 @@ static int vmmcall_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }
 
+static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	if (svm->vmcb->save.cpl) {
+		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
+		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return 1;
+	}
+
+	svm->hflags |= HF_GIF_MASK;
+
+	return 1;
+}
+
+static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	if (svm->vmcb->save.cpl) {
+		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
+		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return 1;
+	}
+
+	svm->hflags &= ~HF_GIF_MASK;
+
+	/* After a CLGI no interrupts should come */
+	svm_set_vintr(svm, false);
+	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+
+	return 1;
+}
+
 static int invalid_op_interception(struct vcpu_svm *svm,
 				   struct kvm_run *kvm_run)
 {
@@ -1432,8 +1472,8 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_VMMCALL]			= vmmcall_interception,
 	[SVM_EXIT_VMLOAD]			= invalid_op_interception,
 	[SVM_EXIT_VMSAVE]			= invalid_op_interception,
-	[SVM_EXIT_STGI]				= invalid_op_interception,
-	[SVM_EXIT_CLGI]				= invalid_op_interception,
+	[SVM_EXIT_STGI]				= stgi_interception,
+	[SVM_EXIT_CLGI]				= clgi_interception,
 	[SVM_EXIT_SKINIT]			= invalid_op_interception,
 	[SVM_EXIT_WBINVD]                       = emulate_on_interception,
 	[SVM_EXIT_MONITOR]			= invalid_op_interception,
@@ -1580,6 +1620,9 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
 	if (!kvm_cpu_has_interrupt(vcpu))
 		goto out;
 
+	if (!(svm->hflags & HF_GIF_MASK))
+		goto out;
+
 	if (!(vmcb->save.rflags & X86_EFLAGS_IF) ||
 	    (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
 	    (vmcb->control.event_inj & SVM_EVTINJ_VALID)) {
@@ -1631,7 +1674,8 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 
 	svm->vcpu.arch.interrupt_window_open =
 		(!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) &&
-		 (svm->vmcb->save.rflags & X86_EFLAGS_IF));
+		 (svm->vmcb->save.rflags & X86_EFLAGS_IF) &&
+		 (svm->hflags & HF_GIF_MASK));
 
 	if (svm->vcpu.arch.interrupt_window_open && svm->vcpu.arch.irq_summary)
 		/*
diff --git a/arch/x86/kvm/svm.h b/arch/x86/kvm/svm.h
index 1b8afa7..61c0904 100644
--- a/arch/x86/kvm/svm.h
+++ b/arch/x86/kvm/svm.h
@@ -324,5 +324,7 @@ struct __attribute__ ((__packed__)) vmcb {
 #define SVM_STGI   ".byte 0x0f, 0x01, 0xdc"
 #define SVM_INVLPGA ".byte 0x0f, 0x01, 0xdf"
 
+#define HF_GIF_MASK		(1 << 0)
+
 #endif
 
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 4/9] Add helper functions for nested SVM
  2008-09-01 11:57     ` [PATCH 3/9] Implement GIF, clgi and stgi Alexander Graf
@ 2008-09-01 11:57       ` Alexander Graf
  2008-09-01 11:57         ` [PATCH 5/9] Allow setting the SVME bit Alexander Graf
  2008-09-01 13:11       ` [PATCH 3/9] Implement GIF, clgi and stgi Avi Kivity
  1 sibling, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 11:57 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, Alexander Graf

These are helpers for the nested SVM implementation.

- nsvm_printk implements a debug printk variant
- nested_svm_do calls a handler that can accesses gpa-based memory

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |   76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 76 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7c75484..35c7fe7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -48,6 +48,16 @@ MODULE_LICENSE("GPL");
 
 #define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
 
+/* Turn on to get debugging output*/
+/* #define NESTED_DEBUG */
+
+#ifdef NESTED_DEBUG
+#define nsvm_printk(fmt, args...) printk(KERN_INFO fmt, ## args)
+#else
+static inline void nsvm_printk(char *fmt, ...) {
+}
+#endif
+
 /* enable NPT for AMD64 and X86 with PAE */
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 static bool npt_enabled = true;
@@ -1140,6 +1150,72 @@ static int vmmcall_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }
 
+static struct page *nested_svm_get_page(struct vcpu_svm *svm, u64 gpa)
+{
+	struct page *page;
+
+	down_read(&current->mm->mmap_sem);
+	page = gfn_to_page(svm->vcpu.kvm, gpa >> PAGE_SHIFT);
+	up_read(&current->mm->mmap_sem);
+
+	if (is_error_page(page)) {
+		printk(KERN_ERR "%s: could not find page at 0x%llx\n",
+		       __func__, gpa);
+		kvm_release_page_clean(page);
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return NULL;
+	}
+	return page;
+}
+
+static int nested_svm_do(struct vcpu_svm *svm,
+			 u64 arg1_gpa, u64 arg2_gpa, void *opaque,
+			 int (*handler)(struct vcpu_svm *svm,
+					void *arg1,
+					void *arg2,
+					void *opaque))
+{
+	struct page *arg1_page;
+	struct page *arg2_page = NULL;
+	void *arg1;
+	void *arg2 = NULL;
+	int retval;
+
+	if (!(svm->vcpu.arch.shadow_efer & MSR_EFER_SVME_MASK)
+	    || !is_paging(&svm->vcpu)) {
+		kvm_queue_exception(&svm->vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	arg1_page = nested_svm_get_page(svm, arg1_gpa);
+	if(arg1_page == NULL)
+		return 1;
+
+	if (arg2_gpa) {
+		arg2_page = nested_svm_get_page(svm, arg2_gpa);
+		if(arg2_page == NULL) {
+			kvm_release_page_clean(arg1_page);
+			return 1;
+		}
+	}
+
+	arg1 = kmap_atomic(arg1_page, KM_USER0);
+	if (arg2_gpa)
+		arg2 = kmap_atomic(arg2_page, KM_USER1);
+
+	retval = handler(svm, arg1, arg2, opaque);
+
+	kunmap_atomic(arg1, KM_USER0);
+	if (arg2_gpa)
+		kunmap_atomic(arg2, KM_USER1);
+
+	kvm_release_page_dirty(arg1_page);
+	if (arg2_gpa)
+		kvm_release_page_dirty(arg2_page);
+
+	return retval;
+}
+
 static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 {
 	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 5/9] Allow setting the SVME bit
  2008-09-01 11:57       ` [PATCH 4/9] Add helper functions for nested SVM Alexander Graf
@ 2008-09-01 11:57         ` Alexander Graf
  2008-09-01 11:57           ` [PATCH 6/9] Implement hsave Alexander Graf
  2008-09-01 13:14           ` [PATCH 5/9] Allow setting the SVME bit Avi Kivity
  0 siblings, 2 replies; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 11:57 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, Alexander Graf

Normally setting the SVME bit in EFER is not allowed, as we did
not support SVM. Not since we do, we should also allow enabling
SVM mode.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 35c7fe7..3c9774e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -440,6 +440,8 @@ static __init int svm_hardware_setup(void)
 	if (boot_cpu_has(X86_FEATURE_NX))
 		kvm_enable_efer_bits(EFER_NX);
 
+	kvm_enable_efer_bits(MSR_EFER_SVME_MASK);
+
 	for_each_online_cpu(cpu) {
 		r = svm_cpu_init(cpu);
 		if (r)
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 6/9] Implement hsave
  2008-09-01 11:57         ` [PATCH 5/9] Allow setting the SVME bit Alexander Graf
@ 2008-09-01 11:57           ` Alexander Graf
  2008-09-01 11:57             ` [PATCH 7/9] Add VMLOAD and VMSAVE handlers Alexander Graf
                               ` (2 more replies)
  2008-09-01 13:14           ` [PATCH 5/9] Allow setting the SVME bit Avi Kivity
  1 sibling, 3 replies; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 11:57 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, Alexander Graf

Implement the hsave MSR, that gives the VCPU a GPA to save the
old guest state in.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/kvm_svm.h |    1 +
 arch/x86/kvm/svm.c     |    7 +++++++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h
index a3105a6..76e1f53 100644
--- a/arch/x86/kvm/kvm_svm.h
+++ b/arch/x86/kvm/kvm_svm.h
@@ -42,6 +42,7 @@ struct vcpu_svm {
 
 	u32 *msrpm;
 
+	u64 nested_hsave;
 	u32 hflags;
 };
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3c9774e..b440731 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -615,6 +615,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	}
 	force_new_asid(&svm->vcpu);
 
+	svm->nested_hsave = 0;
 	svm->hflags = HF_GIF_MASK;
 }
 
@@ -1360,6 +1361,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
 	case MSR_IA32_LASTINTTOIP:
 		*data = svm->vmcb->save.last_excp_to;
 		break;
+	case MSR_VM_HSAVE_PA:
+		*data = svm->nested_hsave;
+		break;
 	default:
 		return kvm_get_msr_common(vcpu, ecx, data);
 	}
@@ -1454,6 +1458,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
 		pr_unimpl(vcpu, "unimplemented perfctr wrmsr: 0x%x data 0x%llx\n", ecx, data);
 
 		break;
+	case MSR_VM_HSAVE_PA:
+		svm->nested_hsave = data;
+		break;
 	default:
 		return kvm_set_msr_common(vcpu, ecx, data);
 	}
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 7/9] Add VMLOAD and VMSAVE handlers
  2008-09-01 11:57           ` [PATCH 6/9] Implement hsave Alexander Graf
@ 2008-09-01 11:57             ` Alexander Graf
  2008-09-01 11:58               ` [PATCH 8/9] Add VMRUN handler Alexander Graf
  2008-09-01 13:27               ` [PATCH 7/9] Add VMLOAD and VMSAVE handlers Avi Kivity
  2008-09-01 13:15             ` [PATCH 6/9] Implement hsave Avi Kivity
  2008-09-01 13:21             ` Avi Kivity
  2 siblings, 2 replies; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 11:57 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, Alexander Graf

This implements the VMLOAD and VMSAVE instructions, that usually surround
the VMRUN instructions. Both instructions load / restore the same elements,
so we only need to implement them once.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b440731..f857642 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1219,6 +1219,71 @@ static int nested_svm_do(struct vcpu_svm *svm,
 	return retval;
 }
 
+static int nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
+{
+	memcpy(&to_vmcb->save.fs, &from_vmcb->save.fs, sizeof(struct vmcb_seg));
+	memcpy(&to_vmcb->save.gs, &from_vmcb->save.gs, sizeof(struct vmcb_seg));
+	memcpy(&to_vmcb->save.tr, &from_vmcb->save.tr, sizeof(struct vmcb_seg));
+	memcpy(&to_vmcb->save.ldtr, &from_vmcb->save.ldtr,
+	       sizeof(struct vmcb_seg));
+	to_vmcb->save.kernel_gs_base = from_vmcb->save.kernel_gs_base;
+	to_vmcb->save.star = from_vmcb->save.star;
+	to_vmcb->save.lstar = from_vmcb->save.lstar;
+	to_vmcb->save.cstar = from_vmcb->save.cstar;
+	to_vmcb->save.sfmask = from_vmcb->save.sfmask;
+	to_vmcb->save.sysenter_cs = from_vmcb->save.sysenter_cs;
+	to_vmcb->save.sysenter_esp = from_vmcb->save.sysenter_esp;
+	to_vmcb->save.sysenter_eip = from_vmcb->save.sysenter_eip;
+
+	return 1;
+}
+
+static int nested_svm_vmload(struct vcpu_svm *svm, void *nested_vmcb,
+			     void *arg2, void *opaque)
+{
+	return nested_svm_vmloadsave((struct vmcb *)nested_vmcb, svm->vmcb);
+}
+
+static int nested_svm_vmsave(struct vcpu_svm *svm, void *nested_vmcb,
+			     void *arg2, void *opaque)
+{
+	return nested_svm_vmloadsave(svm->vmcb, (struct vmcb *)nested_vmcb);
+}
+
+static int vmload_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	if (svm->vmcb->save.cpl) {
+		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
+		       __func__, svm->vmcb->save.cpl,
+		       kvm_rip_read(&svm->vcpu));
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return 1;
+	}
+
+	nested_svm_do(svm, svm->vmcb->save.rax, 0, NULL, nested_svm_vmload);
+
+	return 1;
+}
+
+static int vmsave_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	if (svm->vmcb->save.cpl) {
+		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
+		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return 1;
+	}
+
+	nested_svm_do(svm, svm->vmcb->save.rax, 0, NULL, nested_svm_vmsave);
+
+	return 1;
+}
 static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 {
 	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
@@ -1555,8 +1620,8 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_SHUTDOWN]			= shutdown_interception,
 	[SVM_EXIT_VMRUN]			= invalid_op_interception,
 	[SVM_EXIT_VMMCALL]			= vmmcall_interception,
-	[SVM_EXIT_VMLOAD]			= invalid_op_interception,
-	[SVM_EXIT_VMSAVE]			= invalid_op_interception,
+	[SVM_EXIT_VMLOAD]			= vmload_interception,
+	[SVM_EXIT_VMSAVE]			= vmsave_interception,
 	[SVM_EXIT_STGI]				= stgi_interception,
 	[SVM_EXIT_CLGI]				= clgi_interception,
 	[SVM_EXIT_SKINIT]			= invalid_op_interception,
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 8/9] Add VMRUN handler
  2008-09-01 11:57             ` [PATCH 7/9] Add VMLOAD and VMSAVE handlers Alexander Graf
@ 2008-09-01 11:58               ` Alexander Graf
  2008-09-01 11:58                 ` [PATCH 9/9] Add VMEXIT handler and intercepts Alexander Graf
  2008-09-01 13:41                 ` [PATCH 8/9] Add VMRUN handler Avi Kivity
  2008-09-01 13:27               ` [PATCH 7/9] Add VMLOAD and VMSAVE handlers Avi Kivity
  1 sibling, 2 replies; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 11:58 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, Alexander Graf

This patch implements VMRUN. VMRUN enters a virtual CPU and runs that
in the same context as the normal guest CPU would run.
So basically it is implemented the same way, a normal CPU would do it.

We also prepare all intercepts that get OR'ed with the original
intercepts, as we do not allow a level 2 guest to be intercepted less
than the first level guest.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/kvm_svm.h |   10 +++
 arch/x86/kvm/svm.c     |  188 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 196 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h
index 76e1f53..7a70dc8 100644
--- a/arch/x86/kvm/kvm_svm.h
+++ b/arch/x86/kvm/kvm_svm.h
@@ -43,6 +43,16 @@ struct vcpu_svm {
 	u32 *msrpm;
 
 	u64 nested_hsave;
+	u64 nested_vmcb;
+
+	/* These are the merged vectors */
+	u32 *nested_msrpm;
+	u32 *nested_iopm;
+
+	/* gpa pointers to the real vectors */
+	u64 nested_vmcb_msrpm;
+	u64 nested_vmcb_iopm;
+
 	u32 hflags;
 };
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f857642..5c1c87e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -51,6 +51,9 @@ MODULE_LICENSE("GPL");
 /* Turn on to get debugging output*/
 /* #define NESTED_DEBUG */
 
+/* Not needed until device passthrough */
+/* #define NESTED_KVM_MERGE_IOPM */
+
 #ifdef NESTED_DEBUG
 #define nsvm_printk(fmt, args...) printk(KERN_INFO fmt, ## args)
 #else
@@ -76,6 +79,11 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 	return container_of(vcpu, struct vcpu_svm, vcpu);
 }
 
+static inline bool is_nested(struct vcpu_svm *svm)
+{
+	return svm->nested_vmcb;
+}
+
 static unsigned long iopm_base;
 
 struct kvm_ldttss_desc {
@@ -616,6 +624,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	force_new_asid(&svm->vcpu);
 
 	svm->nested_hsave = 0;
+	svm->nested_vmcb = 0;
 	svm->hflags = HF_GIF_MASK;
 }
 
@@ -641,6 +650,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	struct vcpu_svm *svm;
 	struct page *page;
 	struct page *msrpm_pages;
+	struct page *nested_msrpm_pages;
+	struct page *nested_iopm_pages;
 	int err;
 
 	svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
@@ -663,9 +674,21 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
 	if (!msrpm_pages)
 		goto uninit;
+
+	nested_msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
+	if (!nested_msrpm_pages)
+		goto uninit;
+
+	nested_iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER);
+	if (!nested_iopm_pages)
+		goto uninit;
+
 	svm->msrpm = page_address(msrpm_pages);
 	svm_vcpu_init_msrpm(svm->msrpm);
 
+	svm->nested_msrpm = page_address(nested_msrpm_pages);
+	svm->nested_iopm = page_address(nested_iopm_pages);
+
 	svm->vmcb = page_address(page);
 	clear_page(svm->vmcb);
 	svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
@@ -695,6 +718,8 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 
 	__free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT));
 	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
+	__free_pages(virt_to_page(svm->nested_msrpm), MSRPM_ALLOC_ORDER);
+	__free_pages(virt_to_page(svm->nested_iopm), IOPM_ALLOC_ORDER);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
 }
@@ -1219,6 +1244,130 @@ static int nested_svm_do(struct vcpu_svm *svm,
 	return retval;
 }
 
+
+static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1,
+				  void *arg2, void *opaque)
+{
+	int i;
+	u32 *nested_msrpm = (u32*)arg1;
+	for (i=0; i< PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER) / 4; i++)
+		svm->nested_msrpm[i] = svm->msrpm[i] | nested_msrpm[i];
+	svm->vmcb->control.msrpm_base_pa = __pa(svm->nested_msrpm);
+
+	return 0;
+}
+
+#ifdef NESTED_KVM_MERGE_IOPM
+static int nested_svm_vmrun_iopm(struct vcpu_svm *svm, void *arg1,
+				 void *arg2, void *opaque)
+{
+	int i;
+	u32 *nested_iopm = (u32*)arg1;
+	u32 *iopm = (u32*)__va(iopm_base);
+	for (i=0; i< PAGE_SIZE * (1 << IOPM_ALLOC_ORDER) / 4; i++)
+		svm->nested_iopm[i] = iopm[i] | nested_iopm[i];
+	svm->vmcb->control.iopm_base_pa = __pa(svm->nested_iopm);
+
+	return 0;
+}
+#endif
+
+static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1,
+			    void *arg2, void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	struct vmcb *hsave = (struct vmcb *)arg2;
+
+	/* nested_vmcb is our indicator if nested SVM is activated */
+	svm->nested_vmcb = svm->vmcb->save.rax;
+
+	/* Clear internal status */
+	svm->vcpu.arch.exception.pending = false;
+
+	/* Save the old vmcb, so we don't need to pick what we save, but
+	   can restore everything when a VMEXIT occurs */
+	memcpy(hsave, svm->vmcb, sizeof(struct vmcb));
+	/* We need to remember the original CR3 in the SPT case */
+	if (!npt_enabled)
+		hsave->save.cr3 = svm->vcpu.arch.cr3;
+	hsave->save.rip = svm->next_rip;
+
+	/* Load the nested guest state */
+	memcpy(&svm->vmcb->save.es, &nested_vmcb->save.es, sizeof(struct vmcb_seg));
+	memcpy(&svm->vmcb->save.cs, &nested_vmcb->save.cs, sizeof(struct vmcb_seg));
+	memcpy(&svm->vmcb->save.ss, &nested_vmcb->save.ss, sizeof(struct vmcb_seg));
+	memcpy(&svm->vmcb->save.ds, &nested_vmcb->save.ds, sizeof(struct vmcb_seg));
+	memcpy(&svm->vmcb->save.gdtr, &nested_vmcb->save.gdtr,
+	       sizeof(struct vmcb_seg));
+	memcpy(&svm->vmcb->save.idtr, &nested_vmcb->save.idtr,
+	       sizeof(struct vmcb_seg));
+	svm->vmcb->save.rflags = nested_vmcb->save.rflags;
+	svm_set_efer(&svm->vcpu, nested_vmcb->save.efer);
+	svm_set_cr0(&svm->vcpu, nested_vmcb->save.cr0);
+	svm_set_cr4(&svm->vcpu, nested_vmcb->save.cr4);
+	if (npt_enabled) {
+		svm->vmcb->save.cr3 = nested_vmcb->save.cr3;
+		svm->vcpu.arch.cr3 = nested_vmcb->save.cr3;
+	} else {
+		kvm_set_cr3(&svm->vcpu, nested_vmcb->save.cr3);
+		kvm_mmu_reset_context(&svm->vcpu);
+	}
+	svm->vmcb->save.cr2 = nested_vmcb->save.cr2;
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, nested_vmcb->save.rax);
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RSP, nested_vmcb->save.rsp);
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, nested_vmcb->save.rip);
+	/* In case we don't even reach vcpu_run, the fields are not updated */
+	svm->vmcb->save.rax = nested_vmcb->save.rax;
+	svm->vmcb->save.rsp = nested_vmcb->save.rsp;
+	svm->vmcb->save.rip = nested_vmcb->save.rip;
+	svm->vmcb->save.dr7 = nested_vmcb->save.dr7;
+	svm->vmcb->save.dr6 = nested_vmcb->save.dr6;
+	svm->vmcb->save.cpl = nested_vmcb->save.cpl;
+
+	/* We don't want a nested guest to be more powerful than the guest,
+	   so all intercepts are ORed */
+	svm->vmcb->control.intercept_cr_read |=
+		nested_vmcb->control.intercept_cr_read;
+	svm->vmcb->control.intercept_cr_write |=
+		nested_vmcb->control.intercept_cr_write;
+	svm->vmcb->control.intercept_dr_read |=
+		nested_vmcb->control.intercept_dr_read;
+	svm->vmcb->control.intercept_dr_write |=
+		nested_vmcb->control.intercept_dr_write;
+	svm->vmcb->control.intercept_exceptions |=
+		nested_vmcb->control.intercept_exceptions;
+
+	svm->vmcb->control.intercept |= nested_vmcb->control.intercept;
+
+	svm->nested_vmcb_msrpm = nested_vmcb->control.msrpm_base_pa;
+	svm->nested_vmcb_iopm = nested_vmcb->control.iopm_base_pa;
+
+	force_new_asid(&svm->vcpu);
+	svm->vmcb->control.exit_int_info = nested_vmcb->control.exit_int_info;
+	svm->vmcb->control.exit_int_info_err = nested_vmcb->control.exit_int_info_err;
+	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl;
+	if (nested_vmcb->control.int_ctl & V_IRQ_MASK) {
+		nsvm_printk("nSVM Injecting Interrupt: 0x%x\n",
+				nested_vmcb->control.int_ctl);
+	}
+	nsvm_printk("nSVM exit_int_info: 0x%x | int_state: 0x%x\n",
+			nested_vmcb->control.exit_int_info,
+			nested_vmcb->control.int_state);
+
+	svm->vmcb->control.int_vector = nested_vmcb->control.int_vector;
+	svm->vmcb->control.int_state = nested_vmcb->control.int_state;
+	svm->vmcb->control.tsc_offset += nested_vmcb->control.tsc_offset;
+	if (nested_vmcb->control.event_inj & SVM_EVTINJ_VALID)
+		nsvm_printk("Injecting Event: 0x%x\n",
+				nested_vmcb->control.event_inj);
+	svm->vmcb->control.event_inj = nested_vmcb->control.event_inj;
+	svm->vmcb->control.event_inj_err = nested_vmcb->control.event_inj_err;
+
+	svm->hflags |= HF_GIF_MASK;
+
+	return 1;
+}
+
 static int nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
 {
 	memcpy(&to_vmcb->save.fs, &from_vmcb->save.fs, sizeof(struct vmcb_seg));
@@ -1284,6 +1433,40 @@ static int vmsave_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 
 	return 1;
 }
+
+static int vmrun_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	nsvm_printk("VMrun\n");
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	if (svm->vmcb->save.cpl) {
+		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
+		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return 1;
+	}
+
+	if (nested_svm_do(svm, svm->vmcb->save.rax, svm->nested_hsave,
+			  NULL, nested_svm_vmrun))
+		return 1;
+
+	/* Right now SVM always intecepts all IOs except for the DEBUG port.
+	   This is fine for the nested VM too IMHO. */
+#ifdef NESTED_KVM_MERGE_IOPM
+	if (nested_svm_do(svm, svm->vmcb->control.iopm_base_pa, 0,
+		      NULL, nested_svm_vmrun_iopm))
+		return 1;
+#endif
+
+	if (nested_svm_do(svm, svm->vmcb->control.msrpm_base_pa, 0,
+		      NULL, nested_svm_vmrun_msrpm))
+		return 1;
+
+	return 1;
+}
+
 static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 {
 	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
@@ -1618,7 +1801,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_MSR]				= msr_interception,
 	[SVM_EXIT_TASK_SWITCH]			= task_switch_interception,
 	[SVM_EXIT_SHUTDOWN]			= shutdown_interception,
-	[SVM_EXIT_VMRUN]			= invalid_op_interception,
+	[SVM_EXIT_VMRUN]			= vmrun_interception,
 	[SVM_EXIT_VMMCALL]			= vmmcall_interception,
 	[SVM_EXIT_VMLOAD]			= vmload_interception,
 	[SVM_EXIT_VMSAVE]			= vmsave_interception,
@@ -1924,7 +2107,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	svm->host_cr2 = kvm_read_cr2();
 	svm->host_dr6 = read_dr6();
 	svm->host_dr7 = read_dr7();
-	svm->vmcb->save.cr2 = vcpu->arch.cr2;
+	if (!is_nested(svm))
+		svm->vmcb->save.cr2 = vcpu->arch.cr2;
 	/* required for live migration with NPT */
 	if (npt_enabled)
 		svm->vmcb->save.cr3 = vcpu->arch.cr3;
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 9/9] Add VMEXIT handler and intercepts
  2008-09-01 11:58               ` [PATCH 8/9] Add VMRUN handler Alexander Graf
@ 2008-09-01 11:58                 ` Alexander Graf
  2008-09-01 13:58                   ` Avi Kivity
  2008-09-01 13:41                 ` [PATCH 8/9] Add VMRUN handler Avi Kivity
  1 sibling, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 11:58 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, Alexander Graf

This adds the #VMEXIT intercept, so we return to the level 1 guest
when something happens in the level 2 guest that should return to
the level 1 guest.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |  305 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 305 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5c1c87e..da1fa1d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -74,6 +74,11 @@ module_param(npt, int, S_IRUGO);
 static void kvm_reput_irq(struct vcpu_svm *svm);
 static void svm_flush_tlb(struct kvm_vcpu *vcpu);
 
+static int nested_svm_exit_handled(struct vcpu_svm *svm, bool kvm_override);
+static int nested_svm_vmexit(struct vcpu_svm *svm);
+static int nested_svm_vmsave(struct vcpu_svm *svm, void *nested_vmcb,
+			     void *arg2, void *opaque);
+
 static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 {
 	return container_of(vcpu, struct vcpu_svm, vcpu);
@@ -223,6 +228,21 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	/* If we are within a nested VM we'd better #VMEXIT and let the
+	   guest handle the exception */
+	if (is_nested(svm)) {
+		svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
+		svm->vmcb->control.exit_code_hi = 0;
+		svm->vmcb->control.exit_info_1 = error_code;
+		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
+		if (nested_svm_exit_handled(svm, false)) {
+			nsvm_printk("VMexit -> EXCP 0x%x\n", nr);
+
+			nested_svm_vmexit(svm);
+			return;
+		}
+	}
+
 	svm->vmcb->control.event_inj = nr
 		| SVM_EVTINJ_VALID
 		| (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
@@ -1178,6 +1198,21 @@ static int vmmcall_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }
 
+static inline int nested_svm_intr(struct vcpu_svm *svm)
+{
+	if (is_nested(svm)) {
+		svm->vmcb->control.exit_code = SVM_EXIT_INTR;
+
+		if (nested_svm_exit_handled(svm, false)) {
+			nsvm_printk("VMexit -> INTR\n");
+			nested_svm_vmexit(svm);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
 static struct page *nested_svm_get_page(struct vcpu_svm *svm, u64 gpa)
 {
 	struct page *page;
@@ -1244,6 +1279,257 @@ static int nested_svm_do(struct vcpu_svm *svm,
 	return retval;
 }
 
+static int nested_svm_exit_handled_real(struct vcpu_svm *svm,
+					void *arg1,
+					void *arg2,
+					void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	bool kvm_overrides = *(bool *)opaque;
+	u32 exit_code = svm->vmcb->control.exit_code;
+
+	if (kvm_overrides) {
+		switch (exit_code) {
+		case SVM_EXIT_INTR:
+		case SVM_EXIT_NMI:
+			return 0;
+		/* For now we are always handling NPFs when using them */
+		case SVM_EXIT_NPF:
+			if (npt_enabled)
+				return 0;
+			break;
+		/* When we're shadowing, trap PFs */
+		case SVM_EXIT_EXCP_BASE + PF_VECTOR:
+			if (!npt_enabled)
+				return 0;
+			break;
+		default:
+			break;
+		}
+	}
+
+	switch (exit_code) {
+	case SVM_EXIT_READ_CR0 ... SVM_EXIT_READ_CR8: {
+		u32 cr_bits = 1 << (exit_code - SVM_EXIT_READ_CR0);
+		if (nested_vmcb->control.intercept_cr_read & cr_bits)
+			return 1;
+		break;
+	}
+	case SVM_EXIT_WRITE_CR0 ... SVM_EXIT_WRITE_CR8: {
+		u32 cr_bits = 1 << (exit_code - SVM_EXIT_WRITE_CR0);
+		if (nested_vmcb->control.intercept_cr_write & cr_bits)
+			return 1;
+		break;
+	}
+	case SVM_EXIT_READ_DR0 ... SVM_EXIT_READ_DR7: {
+		u32 dr_bits = 1 << (exit_code - SVM_EXIT_READ_DR0);
+		if (nested_vmcb->control.intercept_dr_read & dr_bits)
+			return 1;
+		break;
+	}
+	case SVM_EXIT_WRITE_DR0 ... SVM_EXIT_WRITE_DR7: {
+		u32 dr_bits = 1 << (exit_code - SVM_EXIT_WRITE_DR0);
+		if (nested_vmcb->control.intercept_dr_write & dr_bits)
+			return 1;
+		break;
+	}
+	case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
+		u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
+		if (nested_vmcb->control.intercept_exceptions & excp_bits)
+			return 1;
+		break;
+	}
+	default: {
+		u64 exit_bits = 1ULL << (exit_code - SVM_EXIT_INTR);
+		nsvm_printk("exit code: 0x%x\n", exit_code);
+		if (nested_vmcb->control.intercept & exit_bits)
+			return 1;
+	}
+	}
+
+	return 0;
+}
+
+#ifdef NESTED_KVM_MERGE_IOPM
+static int nested_svm_exit_handled_io(struct vcpu_svm *svm,
+				      void *arg1, void *arg2,
+				      void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	u16 param = (u16)(svm->vmcb->control.exit_info_1);
+	u16 port = (u16)(svm->vmcb->control.exit_info_1 >> 16);
+	u16 mask = (1 << ((param >> 4) & 7)) - 1;
+	u8 *iopm = (u8 *)arg2 + (port / 8);
+	u16 iopmw = iopm[0] | (iopm[1] << 8);
+
+	if (!(nested_vmcb->control.intercept & (1ULL << INTERCEPT_IOIO_PROT)))
+		return 0;
+	if (iopmw & (mask << (port & 7)))
+		return 1;
+
+	nsvm_printk("nKVM: No IO-Intercept on param=0x%hx port=0x%hx "
+		    "mask=0x%hx iopm=0x%hx\n", param, port, mask, iopmw);
+
+	return 0;
+}
+#endif
+
+static int nested_svm_exit_handled_msr(struct vcpu_svm *svm,
+				       void *arg1, void *arg2,
+				       void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	u8 *msrpm = (u8 *)arg2;
+        u32 t0, t1;
+	u32 msr = svm->vcpu.arch.regs[VCPU_REGS_RCX];
+	u32 param = svm->vmcb->control.exit_info_1 & 1;
+
+	if (!(nested_vmcb->control.intercept & (1ULL << INTERCEPT_MSR_PROT)))
+		return 0;
+
+	switch(msr) {
+	case 0 ... 0x1fff:
+		t0 = (msr * 2) % 8;
+		t1 = msr / 8;
+		break;
+	case 0xc0000000 ... 0xc0001fff:
+		t0 = (8192 + msr - 0xc0000000) * 2;
+		t1 = (t0 / 8);
+		t0 %= 8;
+		break;
+	case 0xc0010000 ... 0xc0011fff:
+		t0 = (16384 + msr - 0xc0010000) * 2;
+		t1 = (t0 / 8);
+		t0 %= 8;
+		break;
+	default:
+		return 1;
+		break;
+	}
+	if (msrpm[t1] & ((1 << param) << t0))
+		return 1;
+
+	return 0;
+}
+
+static int nested_svm_exit_handled(struct vcpu_svm *svm, bool kvm_override)
+{
+	bool k = kvm_override;
+
+	switch (svm->vmcb->control.exit_code) {
+#ifdef NESTED_KVM_MERGE_IOPM
+	case SVM_EXIT_IOIO:
+		return nested_svm_do(svm, svm->nested_vmcb,
+				     svm->nested_vmcb_iopm, NULL,
+				     nested_svm_exit_handled_io);
+		break;
+#endif
+	case SVM_EXIT_MSR:
+		return nested_svm_do(svm, svm->nested_vmcb,
+				     svm->nested_vmcb_msrpm, NULL,
+				     nested_svm_exit_handled_msr);
+	default: break;
+	}
+
+	return nested_svm_do(svm, svm->nested_vmcb, 0, &k,
+			     nested_svm_exit_handled_real);
+}
+
+static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1,
+				  void *arg2, void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	struct vmcb *hsave = (struct vmcb *)arg2;
+	u64 nested_save[] = { nested_vmcb->save.cr0,
+			      nested_vmcb->save.cr3,
+			      nested_vmcb->save.cr4,
+			      nested_vmcb->save.efer,
+			      nested_vmcb->control.intercept_cr_read,
+			      nested_vmcb->control.intercept_cr_write,
+			      nested_vmcb->control.intercept_dr_read,
+			      nested_vmcb->control.intercept_dr_write,
+			      nested_vmcb->control.intercept_exceptions,
+			      nested_vmcb->control.intercept,
+			      nested_vmcb->control.msrpm_base_pa,
+			      nested_vmcb->control.iopm_base_pa,
+			      nested_vmcb->control.tsc_offset };
+
+	/* Give the current vmcb to the guest */
+	memcpy(nested_vmcb, svm->vmcb, sizeof(struct vmcb));
+	nested_vmcb->save.cr0 = nested_save[0];
+	if (!npt_enabled)
+		nested_vmcb->save.cr3 = nested_save[1];
+	nested_vmcb->save.cr4 = nested_save[2];
+	nested_vmcb->save.efer = nested_save[3];
+	nested_vmcb->control.intercept_cr_read = nested_save[4];
+	nested_vmcb->control.intercept_cr_write = nested_save[5];
+	nested_vmcb->control.intercept_dr_read = nested_save[6];
+	nested_vmcb->control.intercept_dr_write = nested_save[7];
+	nested_vmcb->control.intercept_exceptions = nested_save[8];
+	nested_vmcb->control.intercept = nested_save[9];
+	nested_vmcb->control.msrpm_base_pa = nested_save[10];
+	nested_vmcb->control.iopm_base_pa = nested_save[11];
+	nested_vmcb->control.tsc_offset = nested_save[12];
+
+	if ((nested_vmcb->control.int_ctl & V_IRQ_MASK) &&
+	    (nested_vmcb->control.int_vector)) {
+		nsvm_printk("WARNING: IRQ 0x%x still enabled on #VMEXIT\n",
+				nested_vmcb->control.int_vector);
+	}
+
+	/* Restore the original control entries */
+	memcpy(&svm->vmcb->control, &hsave->control,
+	       sizeof(struct vmcb_control_area));
+	force_new_asid(&svm->vcpu);
+	/* Kill any pending exceptions */
+	if (svm->vcpu.arch.exception.pending == true)
+		nsvm_printk("WARNING: Pending Exception\n");
+	svm->vcpu.arch.exception.pending = false;
+
+	/* Restore selected save entries */
+	memcpy(&svm->vmcb->save.es, &hsave->save.es, sizeof(struct vmcb_seg));
+	memcpy(&svm->vmcb->save.cs, &hsave->save.cs, sizeof(struct vmcb_seg));
+	memcpy(&svm->vmcb->save.ss, &hsave->save.ss, sizeof(struct vmcb_seg));
+	memcpy(&svm->vmcb->save.ds, &hsave->save.ds, sizeof(struct vmcb_seg));
+	memcpy(&svm->vmcb->save.gdtr, &hsave->save.gdtr,
+	       sizeof(struct vmcb_seg));
+	memcpy(&svm->vmcb->save.idtr, &hsave->save.idtr,
+	       sizeof(struct vmcb_seg));
+	svm->vmcb->save.rflags = hsave->save.rflags;
+	svm_set_efer(&svm->vcpu, hsave->save.efer);
+	svm_set_cr0(&svm->vcpu, hsave->save.cr0 | X86_CR0_PE);
+	svm_set_cr4(&svm->vcpu, hsave->save.cr4);
+	if (npt_enabled) {
+		svm->vmcb->save.cr3 = hsave->save.cr3;
+		svm->vcpu.arch.cr3 = hsave->save.cr3;
+	} else {
+		kvm_set_cr3(&svm->vcpu, hsave->save.cr3);
+	}
+	kvm_mmu_reset_context(&svm->vcpu);
+	kvm_mmu_load(&svm->vcpu);
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, hsave->save.rax);
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RSP, hsave->save.rsp);
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, hsave->save.rip);
+	svm->vmcb->save.dr7 = 0;
+	svm->vmcb->save.cpl = 0;
+	svm->vmcb->control.exit_int_info = 0;
+
+	svm->hflags &= ~HF_GIF_MASK;
+	/* Exit nested SVM mode */
+	svm->nested_vmcb = 0;
+
+	return 0;
+}
+
+static int nested_svm_vmexit(struct vcpu_svm *svm)
+{
+	nsvm_printk("VMexit\n");
+	if (nested_svm_do(svm, svm->nested_vmcb, svm->nested_hsave,
+			  NULL, nested_svm_vmexit_real))
+		return 1;
+
+	return 0;
+}
 
 static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1,
 				  void *arg2, void *opaque)
@@ -1822,6 +2108,17 @@ static int handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	KVMTRACE_3D(VMEXIT, vcpu, exit_code, (u32)svm->vmcb->save.rip,
 		    (u32)((u64)svm->vmcb->save.rip >> 32), entryexit);
 
+	if (is_nested(svm)) {
+		nsvm_printk("nested handle_exit: 0x%x | 0x%lx | 0x%lx | 0x%lx\n",
+			    exit_code, svm->vmcb->control.exit_info_1,
+			    svm->vmcb->control.exit_info_2, svm->vmcb->save.rip);
+		if (nested_svm_exit_handled(svm, true)) {
+			nested_svm_vmexit(svm);
+			nsvm_printk("-> #VMEXIT\n");
+			return 1;
+		}
+	}
+
 	if (npt_enabled) {
 		int mmu_reload = 0;
 		if ((vcpu->arch.cr0 ^ svm->vmcb->save.cr0) & X86_CR0_PG) {
@@ -1908,6 +2205,8 @@ static void svm_set_irq(struct kvm_vcpu *vcpu, int irq)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	nested_svm_intr(svm);
+
 	svm_inject_irq(svm, irq);
 }
 
@@ -1953,6 +2252,9 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
 	if (!kvm_cpu_has_interrupt(vcpu))
 		goto out;
 
+	if (nested_svm_intr(svm))
+		goto out;
+
 	if (!(svm->hflags & HF_GIF_MASK))
 		goto out;
 
@@ -2005,6 +2307,9 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb_control_area *control = &svm->vmcb->control;
 
+	if (nested_svm_intr(svm))
+		return;
+
 	svm->vcpu.arch.interrupt_window_open =
 		(!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) &&
 		 (svm->vmcb->save.rflags & X86_EFLAGS_IF) &&
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 9/9] Add VMEXIT handler and intercepts
  2008-09-01 11:58                 ` [PATCH 9/9] Add VMEXIT handler and intercepts Alexander Graf
@ 2008-09-01 13:58                   ` Avi Kivity
  2008-09-02 16:15                     ` Alexander Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 13:58 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
> This adds the #VMEXIT intercept, so we return to the level 1 guest
> when something happens in the level 2 guest that should return to
> the level 1 guest.
>
> @@ -223,6 +228,21 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> +	/* If we are within a nested VM we'd better #VMEXIT and let the
> +	   guest handle the exception */
> +	if (is_nested(svm)) {
> +		svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
> +		svm->vmcb->control.exit_code_hi = 0;
> +		svm->vmcb->control.exit_info_1 = error_code;
> +		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
>   

#vmexit isn't supposed to modify cr2, but we've corrupted it here.

> +		if (nested_svm_exit_handled(svm, false)) {
> +			nsvm_printk("VMexit -> EXCP 0x%x\n", nr);
> +
> +			nested_svm_vmexit(svm);
> +			return;
> +		}
> +	}
> +
>   

Please move the entire block into a separate function.


> +static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1,
> +				  void *arg2, void *opaque)
> +{
> +	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
> +	struct vmcb *hsave = (struct vmcb *)arg2;
> +	u64 nested_save[] = { nested_vmcb->save.cr0,
> +			      nested_vmcb->save.cr3,
> +			      nested_vmcb->save.cr4,
> +			      nested_vmcb->save.efer,
> +			      nested_vmcb->control.intercept_cr_read,
> +			      nested_vmcb->control.intercept_cr_write,
> +			      nested_vmcb->control.intercept_dr_read,
> +			      nested_vmcb->control.intercept_dr_write,
> +			      nested_vmcb->control.intercept_exceptions,
> +			      nested_vmcb->control.intercept,
> +			      nested_vmcb->control.msrpm_base_pa,
> +			      nested_vmcb->control.iopm_base_pa,
> +			      nested_vmcb->control.tsc_offset };
> +
> +	/* Give the current vmcb to the guest */
> +	memcpy(nested_vmcb, svm->vmcb, sizeof(struct vmcb));
> +	nested_vmcb->save.cr0 = nested_save[0];
> +	if (!npt_enabled)
> +		nested_vmcb->save.cr3 = nested_save[1];
> +	nested_vmcb->save.cr4 = nested_save[2];
> +	nested_vmcb->save.efer = nested_save[3];
> +	nested_vmcb->control.intercept_cr_read = nested_save[4];
> +	nested_vmcb->control.intercept_cr_write = nested_save[5];
> +	nested_vmcb->control.intercept_dr_read = nested_save[6];
> +	nested_vmcb->control.intercept_dr_write = nested_save[7];
> +	nested_vmcb->control.intercept_exceptions = nested_save[8];
> +	nested_vmcb->control.intercept = nested_save[9];
> +	nested_vmcb->control.msrpm_base_pa = nested_save[10];
> +	nested_vmcb->control.iopm_base_pa = nested_save[11];
> +	nested_vmcb->control.tsc_offset = nested_save[12];
> +
> +	if ((nested_vmcb->control.int_ctl & V_IRQ_MASK) &&
> +	    (nested_vmcb->control.int_vector)) {
> +		nsvm_printk("WARNING: IRQ 0x%x still enabled on #VMEXIT\n",
> +				nested_vmcb->control.int_vector);
> +	}
> +
> +	/* Restore the original control entries */
> +	memcpy(&svm->vmcb->control, &hsave->control,
> +	       sizeof(struct vmcb_control_area));
> +	force_new_asid(&svm->vcpu);
> +	/* Kill any pending exceptions */
> +	if (svm->vcpu.arch.exception.pending == true)
> +		nsvm_printk("WARNING: Pending Exception\n");
>   

This should set control.exit_int_info.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 9/9] Add VMEXIT handler and intercepts
  2008-09-01 13:58                   ` Avi Kivity
@ 2008-09-02 16:15                     ` Alexander Graf
  2008-09-03  9:23                       ` Avi Kivity
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-02 16:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony


On Sep 1, 2008, at 3:58 PM, Avi Kivity wrote:

> Alexander Graf wrote:
>> This adds the #VMEXIT intercept, so we return to the level 1 guest
>> when something happens in the level 2 guest that should return to
>> the level 1 guest.
>>
>> @@ -223,6 +228,21 @@ static void svm_queue_exception(struct  
>> kvm_vcpu *vcpu, unsigned nr,
>> {
>> 	struct vcpu_svm *svm = to_svm(vcpu);
>> +	/* If we are within a nested VM we'd better #VMEXIT and let the
>> +	   guest handle the exception */
>> +	if (is_nested(svm)) {
>> +		svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
>> +		svm->vmcb->control.exit_code_hi = 0;
>> +		svm->vmcb->control.exit_info_1 = error_code;
>> +		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
>>
>
> #vmexit isn't supposed to modify cr2, but we've corrupted it here.

Well, yes and no. We modified the arch.cr2 but later on in vcpu_run we  
don't set the vmcb cr2 field based on that when we're running inside a  
VM, so cr2 stays the same as before.

>> +		if (nested_svm_exit_handled(svm, false)) {
>> +			nsvm_printk("VMexit -> EXCP 0x%x\n", nr);
>> +
>> +			nested_svm_vmexit(svm);
>> +			return;
>> +		}
>> +	}
>> +
>>
>
> Please move the entire block into a separate function.
>
>
>> +static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1,
>> +				  void *arg2, void *opaque)
>> +{
>> +	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
>> +	struct vmcb *hsave = (struct vmcb *)arg2;
>> +	u64 nested_save[] = { nested_vmcb->save.cr0,
>> +			      nested_vmcb->save.cr3,
>> +			      nested_vmcb->save.cr4,
>> +			      nested_vmcb->save.efer,
>> +			      nested_vmcb->control.intercept_cr_read,
>> +			      nested_vmcb->control.intercept_cr_write,
>> +			      nested_vmcb->control.intercept_dr_read,
>> +			      nested_vmcb->control.intercept_dr_write,
>> +			      nested_vmcb->control.intercept_exceptions,
>> +			      nested_vmcb->control.intercept,
>> +			      nested_vmcb->control.msrpm_base_pa,
>> +			      nested_vmcb->control.iopm_base_pa,
>> +			      nested_vmcb->control.tsc_offset };
>> +
>> +	/* Give the current vmcb to the guest */
>> +	memcpy(nested_vmcb, svm->vmcb, sizeof(struct vmcb));
>> +	nested_vmcb->save.cr0 = nested_save[0];
>> +	if (!npt_enabled)
>> +		nested_vmcb->save.cr3 = nested_save[1];
>> +	nested_vmcb->save.cr4 = nested_save[2];
>> +	nested_vmcb->save.efer = nested_save[3];
>> +	nested_vmcb->control.intercept_cr_read = nested_save[4];
>> +	nested_vmcb->control.intercept_cr_write = nested_save[5];
>> +	nested_vmcb->control.intercept_dr_read = nested_save[6];
>> +	nested_vmcb->control.intercept_dr_write = nested_save[7];
>> +	nested_vmcb->control.intercept_exceptions = nested_save[8];
>> +	nested_vmcb->control.intercept = nested_save[9];
>> +	nested_vmcb->control.msrpm_base_pa = nested_save[10];
>> +	nested_vmcb->control.iopm_base_pa = nested_save[11];
>> +	nested_vmcb->control.tsc_offset = nested_save[12];
>> +
>> +	if ((nested_vmcb->control.int_ctl & V_IRQ_MASK) &&
>> +	    (nested_vmcb->control.int_vector)) {
>> +		nsvm_printk("WARNING: IRQ 0x%x still enabled on #VMEXIT\n",
>> +				nested_vmcb->control.int_vector);
>> +	}
>> +
>> +	/* Restore the original control entries */
>> +	memcpy(&svm->vmcb->control, &hsave->control,
>> +	       sizeof(struct vmcb_control_area));
>> +	force_new_asid(&svm->vcpu);
>> +	/* Kill any pending exceptions */
>> +	if (svm->vcpu.arch.exception.pending == true)
>> +		nsvm_printk("WARNING: Pending Exception\n");
>>
>
> This should set control.exit_int_info.

This is more of a fallback. No exceptions should be in "injecting"  
state on vmexit. That would mean that after an exit that was not  
handled in the nested VMM we need to inject some exception, which  
should in almost all cases already raise a #VMEXIT itself. So this  
should never hit.

Alex

>
>
> -- 
> error compiling committee.c: too many arguments to function
>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 9/9] Add VMEXIT handler and intercepts
  2008-09-02 16:15                     ` Alexander Graf
@ 2008-09-03  9:23                       ` Avi Kivity
  2008-09-03  9:33                         ` Alexander Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2008-09-03  9:23 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
>>> +        svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
>>>
>>
>> #vmexit isn't supposed to modify cr2, but we've corrupted it here.
>
> Well, yes and no. We modified the arch.cr2 but later on in vcpu_run we 
> don't set the vmcb cr2 field based on that when we're running inside a 
> VM, so cr2 stays the same as before.

What about later, when there is a virtual #VMEXIT?  Won't that cr2 leak in?

>>> +    /* Kill any pending exceptions */
>>> +    if (svm->vcpu.arch.exception.pending == true)
>>> +        nsvm_printk("WARNING: Pending Exception\n");
>>>
>>
>> This should set control.exit_int_info.
>
> This is more of a fallback. No exceptions should be in "injecting" 
> state on vmexit. That would mean that after an exit that was not 
> handled in the nested VMM we need to inject some exception, which 
> should in almost all cases already raise a #VMEXIT itself. So this 
> should never hit. 

What about, say, #PF on the IDT when attempting to inject an exception?  
We should tell the guest about that so it can reinject the exception 
into its own guest.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 9/9] Add VMEXIT handler and intercepts
  2008-09-03  9:23                       ` Avi Kivity
@ 2008-09-03  9:33                         ` Alexander Graf
  2008-09-03  9:47                           ` Avi Kivity
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-03  9:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony


On Sep 3, 2008, at 11:23 AM, Avi Kivity wrote:

> Alexander Graf wrote:
>>>> +        svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
>>>>
>>>
>>> #vmexit isn't supposed to modify cr2, but we've corrupted it here.
>>
>> Well, yes and no. We modified the arch.cr2 but later on in vcpu_run  
>> we don't set the vmcb cr2 field based on that when we're running  
>> inside a VM, so cr2 stays the same as before.
>
> What about later, when there is a virtual #VMEXIT?  Won't that cr2  
> leak in?

The cr2 value would leak into the level1 guest that can't rely on CR2  
to be correct anyways.

>
>
>>>> +    /* Kill any pending exceptions */
>>>> +    if (svm->vcpu.arch.exception.pending == true)
>>>> +        nsvm_printk("WARNING: Pending Exception\n");
>>>>
>>>
>>> This should set control.exit_int_info.
>>
>> This is more of a fallback. No exceptions should be in "injecting"  
>> state on vmexit. That would mean that after an exit that was not  
>> handled in the nested VMM we need to inject some exception, which  
>> should in almost all cases already raise a #VMEXIT itself. So this  
>> should never hit.
>
> What about, say, #PF on the IDT when attempting to inject an  
> exception?  We should tell the guest about that so it can reinject  
> the exception into its own guest.

Doesn't the CPU already does this for us? When the guest VMM wants to  
inject an interrupt/exception, that would result in a real injection.  
So when we get a #PF on the IDT the real CPU already filled  
exit_int_info for us and we can just pass it on to the guest VMM...

I can't think of a situation where we inject a #PF and did not get a  
#VMEXIT from a #PF before that.

Alex

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 9/9] Add VMEXIT handler and intercepts
  2008-09-03  9:33                         ` Alexander Graf
@ 2008-09-03  9:47                           ` Avi Kivity
  2008-09-03 11:55                             ` Alexander Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2008-09-03  9:47 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
>
> On Sep 3, 2008, at 11:23 AM, Avi Kivity wrote:
>
>> Alexander Graf wrote:
>>>>> +        svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
>>>>>
>>>>
>>>> #vmexit isn't supposed to modify cr2, but we've corrupted it here.
>>>
>>> Well, yes and no. We modified the arch.cr2 but later on in vcpu_run 
>>> we don't set the vmcb cr2 field based on that when we're running 
>>> inside a VM, so cr2 stays the same as before.
>>
>> What about later, when there is a virtual #VMEXIT?  Won't that cr2 
>> leak in?
>
> The cr2 value would leak into the level1 guest that can't rely on CR2 
> to be correct anyways.

That's not very nice.

>
>>
>>
>>>>> +    /* Kill any pending exceptions */
>>>>> +    if (svm->vcpu.arch.exception.pending == true)
>>>>> +        nsvm_printk("WARNING: Pending Exception\n");
>>>>>
>>>>
>>>> This should set control.exit_int_info.
>>>
>>> This is more of a fallback. No exceptions should be in "injecting" 
>>> state on vmexit. That would mean that after an exit that was not 
>>> handled in the nested VMM we need to inject some exception, which 
>>> should in almost all cases already raise a #VMEXIT itself. So this 
>>> should never hit.
>>
>> What about, say, #PF on the IDT when attempting to inject an 
>> exception?  We should tell the guest about that so it can reinject 
>> the exception into its own guest.
>
> Doesn't the CPU already does this for us? When the guest VMM wants to 
> inject an interrupt/exception, that would result in a real injection. 

It could be an exception initiated by the CPU (say, divide overflow) or 
an exception initiated by the guest VMM.

> So when we get a #PF on the IDT the real CPU already filled 
> exit_int_info for us and we can just pass it on to the guest VMM...
>

We might be emulating a nested guest instruction, and injecting some 
expection.

> I can't think of a situation where we inject a #PF and did not get a 
> #VMEXIT from a #PF before that.

I'm confused...

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 9/9] Add VMEXIT handler and intercepts
  2008-09-03  9:47                           ` Avi Kivity
@ 2008-09-03 11:55                             ` Alexander Graf
  0 siblings, 0 replies; 44+ messages in thread
From: Alexander Graf @ 2008-09-03 11:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony


On Sep 3, 2008, at 11:47 AM, Avi Kivity wrote:

> Alexander Graf wrote:
>>
>> On Sep 3, 2008, at 11:23 AM, Avi Kivity wrote:
>>
>>> Alexander Graf wrote:
>>>>>> +        svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
>>>>>>
>>>>>
>>>>> #vmexit isn't supposed to modify cr2, but we've corrupted it here.
>>>>
>>>> Well, yes and no. We modified the arch.cr2 but later on in  
>>>> vcpu_run we don't set the vmcb cr2 field based on that when we're  
>>>> running inside a VM, so cr2 stays the same as before.
>>>
>>> What about later, when there is a virtual #VMEXIT?  Won't that cr2  
>>> leak in?
>>
>> The cr2 value would leak into the level1 guest that can't rely on  
>> CR2 to be correct anyways.
>
> That's not very nice.

Is it a problem? We could set arch.cr2 to nested_vmcb->save.cr2 on  
#VMEXIT, so the emulated #VMEXIT behaves the same way as real hardware  
(just let the CR2 value from the guest slip through).

>
>
>>
>>>
>>>
>>>>>> +    /* Kill any pending exceptions */
>>>>>> +    if (svm->vcpu.arch.exception.pending == true)
>>>>>> +        nsvm_printk("WARNING: Pending Exception\n");
>>>>>>
>>>>>
>>>>> This should set control.exit_int_info.
>>>>
>>>> This is more of a fallback. No exceptions should be in  
>>>> "injecting" state on vmexit. That would mean that after an exit  
>>>> that was not handled in the nested VMM we need to inject some  
>>>> exception, which should in almost all cases already raise a  
>>>> #VMEXIT itself. So this should never hit.
>>>
>>> What about, say, #PF on the IDT when attempting to inject an  
>>> exception?  We should tell the guest about that so it can reinject  
>>> the exception into its own guest.
>>
>> Doesn't the CPU already does this for us? When the guest VMM wants  
>> to inject an interrupt/exception, that would result in a real  
>> injection.
>
> It could be an exception initiated by the CPU (say, divide overflow)  
> or an exception initiated by the guest VMM.
>
>> So when we get a #PF on the IDT the real CPU already filled  
>> exit_int_info for us and we can just pass it on to the guest VMM...
>>
>
> We might be emulating a nested guest instruction, and injecting some  
> expection.
>
>> I can't think of a situation where we inject a #PF and did not get  
>> a #VMEXIT from a #PF before that.
>
> I'm confused...

The only cases we override a #VMEXIT to be taken by the host KVM  
instead of the nested VMM are:

- INTR
- NMI
- NPF
- PF

Intr and NMI are nops.
NPF handles only the nested page faults and should not inject anything  
or emulate any instructions. It's basically invisible from within the  
VM.

PF comes into the game when handling double-shadow-pages. Here we  
might need to emulate an instruction or inject a #PF into the guest.
Since we already left the VM due to a #PF, exit_int_info is already  
set correctly and will just be copied to the nested vmcb, so we don't  
need to handle that case, no?

All other exits result in a direct #VMEXIT that keep all control  
information for the guest VMM.

Alex


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/9] Add VMRUN handler
  2008-09-01 11:58               ` [PATCH 8/9] Add VMRUN handler Alexander Graf
  2008-09-01 11:58                 ` [PATCH 9/9] Add VMEXIT handler and intercepts Alexander Graf
@ 2008-09-01 13:41                 ` Avi Kivity
  2008-09-02 15:38                   ` Alexander Graf
  1 sibling, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 13:41 UTC (permalink / raw)
  To: Alexander Graf; +Cc: KVM list, Joerg Roedel, Anthony Liguori

Alexander Graf wrote:
> This patch implements VMRUN. VMRUN enters a virtual CPU and runs that
> in the same context as the normal guest CPU would run.
> So basically it is implemented the same way, a normal CPU would do it.
>
> We also prepare all intercepts that get OR'ed with the original
> intercepts, as we do not allow a level 2 guest to be intercepted less
> than the first level guest.
>
>  
>  
> +/* Not needed until device passthrough */
> +/* #define NESTED_KVM_MERGE_IOPM */
> +
>   

I'd like to drop port 80 passthrough anyway.  Device assignment is 
unlikely to make heavy use of ioports.

> @@ -663,9 +674,21 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
>  	msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
>  	if (!msrpm_pages)
>  		goto uninit;
> +
> +	nested_msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
> +	if (!nested_msrpm_pages)
> +		goto uninit;
> +
> +	nested_iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER);
> +	if (!nested_iopm_pages)
> +		goto uninit;
> +
>   

Maybe we should do that on the first time the guest enters nested svm, 
to save a bit of memory.

We can do that in a later patch, though.

>  
> +
> +static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1,
> +				  void *arg2, void *opaque)
> +{
> +	int i;
> +	u32 *nested_msrpm = (u32*)arg1;
> +	for (i=0; i< PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER) / 4; i++)
> +		svm->nested_msrpm[i] = svm->msrpm[i] | nested_msrpm[i];
> +	svm->vmcb->control.msrpm_base_pa = __pa(svm->nested_msrpm);
> +
> +	return 0;
> +}
>   

Hm.  Have you verified that kvm actually has msr emulation for all the 
msrs it allows through msrpm?

I guess it has to, since the msrs can be set through save/restore.


(vmrun emulation)
> +
> +	force_new_asid(&svm->vcpu);
>   

I would be nice not to do this (can be left for later of course; it 
could be quite complex).

> +
> +static int vmrun_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
> +{
> +	nsvm_printk("VMrun\n");
> +
> +	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
> +	skip_emulated_instruction(&svm->vcpu);
> +
> +	if (svm->vmcb->save.cpl) {
> +		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
> +		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
> +		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
> +		return 1;
> +	}
>   

Skip after check.

I think you also need special treatment for the guest's eflags.if.  If 
interrupts are enabled for the guest when vmrun is executed, and kvm 
tries to inject a virtual interrupt, then it should result in a virtual 
#VMEXIT.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/9] Add VMRUN handler
  2008-09-01 13:41                 ` [PATCH 8/9] Add VMRUN handler Avi Kivity
@ 2008-09-02 15:38                   ` Alexander Graf
  0 siblings, 0 replies; 44+ messages in thread
From: Alexander Graf @ 2008-09-02 15:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: KVM list, Joerg Roedel, Anthony Liguori


On Sep 1, 2008, at 3:41 PM, Avi Kivity wrote:

> Alexander Graf wrote:
>> This patch implements VMRUN. VMRUN enters a virtual CPU and runs that
>> in the same context as the normal guest CPU would run.
>> So basically it is implemented the same way, a normal CPU would do  
>> it.
>>
>> We also prepare all intercepts that get OR'ed with the original
>> intercepts, as we do not allow a level 2 guest to be intercepted less
>> than the first level guest.
>>
>>  +/* Not needed until device passthrough */
>> +/* #define NESTED_KVM_MERGE_IOPM */
>> +
>>
>
> I'd like to drop port 80 passthrough anyway.  Device assignment is  
> unlikely to make heavy use of ioports.
>
>> @@ -663,9 +674,21 @@ static struct kvm_vcpu *svm_create_vcpu(struct  
>> kvm *kvm, unsigned int id)
>> 	msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
>> 	if (!msrpm_pages)
>> 		goto uninit;
>> +
>> +	nested_msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
>> +	if (!nested_msrpm_pages)
>> +		goto uninit;
>> +
>> +	nested_iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER);
>> +	if (!nested_iopm_pages)
>> +		goto uninit;
>> +
>>
>
> Maybe we should do that on the first time the guest enters nested  
> svm, to save a bit of memory.
>
> We can do that in a later patch, though.
>
>> +
>> +static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1,
>> +				  void *arg2, void *opaque)
>> +{
>> +	int i;
>> +	u32 *nested_msrpm = (u32*)arg1;
>> +	for (i=0; i< PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER) / 4; i++)
>> +		svm->nested_msrpm[i] = svm->msrpm[i] | nested_msrpm[i];
>> +	svm->vmcb->control.msrpm_base_pa = __pa(svm->nested_msrpm);
>> +
>> +	return 0;
>> +}
>>
>
> Hm.  Have you verified that kvm actually has msr emulation for all  
> the msrs it allows through msrpm?
>
> I guess it has to, since the msrs can be set through save/restore.
>
>
> (vmrun emulation)
>> +
>> +	force_new_asid(&svm->vcpu);
>>
>
> I would be nice not to do this (can be left for later of course; it  
> could be quite complex).
>
>> +
>> +static int vmrun_interception(struct vcpu_svm *svm, struct kvm_run  
>> *kvm_run)
>> +{
>> +	nsvm_printk("VMrun\n");
>> +
>> +	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
>> +	skip_emulated_instruction(&svm->vcpu);
>> +
>> +	if (svm->vmcb->save.cpl) {
>> +		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
>> +		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
>> +		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
>> +		return 1;
>> +	}
>>
>
> Skip after check.
>
> I think you also need special treatment for the guest's eflags.if.   
> If interrupts are enabled for the guest when vmrun is executed, and  
> kvm tries to inject a virtual interrupt, then it should result in a  
> virtual #VMEXIT.

For now I just always assume that's the case. It might be a good idea  
to store the real eflags.if somewhere in the hflags though.

Btw: Thanks a bunch for reviewing all this!

>
>
> -- 
> error compiling committee.c: too many arguments to function
>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/9] Add VMLOAD and VMSAVE handlers
  2008-09-01 11:57             ` [PATCH 7/9] Add VMLOAD and VMSAVE handlers Alexander Graf
  2008-09-01 11:58               ` [PATCH 8/9] Add VMRUN handler Alexander Graf
@ 2008-09-01 13:27               ` Avi Kivity
  2008-09-01 14:14                 ` Alexander Graf
  1 sibling, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 13:27 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
> This implements the VMLOAD and VMSAVE instructions, that usually surround
> the VMRUN instructions. Both instructions load / restore the same elements,
> so we only need to implement them once.
>
>  
> +static int nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
> +{
> +	memcpy(&to_vmcb->save.fs, &from_vmcb->save.fs, sizeof(struct vmcb_seg));
> +	memcpy(&to_vmcb->save.gs, &from_vmcb->save.gs, sizeof(struct vmcb_seg));
> +	memcpy(&to_vmcb->save.tr, &from_vmcb->save.tr, sizeof(struct vmcb_seg));
> +	memcpy(&to_vmcb->save.ldtr, &from_vmcb->save.ldtr,
> +	       sizeof(struct vmcb_seg));
>   

You can use simple assignment here.

> +static int vmload_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
> +{
> +	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
> +	skip_emulated_instruction(&svm->vcpu);
> +
> +	if (svm->vmcb->save.cpl) {
> +		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
> +		       __func__, svm->vmcb->save.cpl,
> +		       kvm_rip_read(&svm->vcpu));
> +		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
> +		return 1;
> +	}
>   

Check before skip.

Don't we need to check that svm is enabled in the guest as well (and 
inject #UD if not)?

> +
> +static int vmsave_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
> +{
> +	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
> +	skip_emulated_instruction(&svm->vcpu);
> +
> +	if (svm->vmcb->save.cpl) {
> +		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
> +		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
> +		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
> +		return 1;
> +	}
> +
> +	nested_svm_do(svm, svm->vmcb->save.rax, 0, NULL, nested_svm_vmsave);
> +
>   

Ditto.

> +	return 1;
> +}
>   

Blank line.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/9] Add VMLOAD and VMSAVE handlers
  2008-09-01 13:27               ` [PATCH 7/9] Add VMLOAD and VMSAVE handlers Avi Kivity
@ 2008-09-01 14:14                 ` Alexander Graf
  2008-09-01 14:27                   ` Avi Kivity
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 14:14 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony


On Sep 1, 2008, at 3:27 PM, Avi Kivity wrote:

> Alexander Graf wrote:
>> This implements the VMLOAD and VMSAVE instructions, that usually  
>> surround
>> the VMRUN instructions. Both instructions load / restore the same  
>> elements,
>> so we only need to implement them once.
>>
>> +static int nested_svm_vmloadsave(struct vmcb *from_vmcb, struct  
>> vmcb *to_vmcb)
>> +{
>> +	memcpy(&to_vmcb->save.fs, &from_vmcb->save.fs, sizeof(struct  
>> vmcb_seg));
>> +	memcpy(&to_vmcb->save.gs, &from_vmcb->save.gs, sizeof(struct  
>> vmcb_seg));
>> +	memcpy(&to_vmcb->save.tr, &from_vmcb->save.tr, sizeof(struct  
>> vmcb_seg));
>> +	memcpy(&to_vmcb->save.ldtr, &from_vmcb->save.ldtr,
>> +	       sizeof(struct vmcb_seg));
>>
>
> You can use simple assignment here.

Uh ... like to_vmcb->save.fs = from_vmcb->save.fs; ? That works?

>
>
>> +static int vmload_interception(struct vcpu_svm *svm, struct  
>> kvm_run *kvm_run)
>> +{
>> +	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
>> +	skip_emulated_instruction(&svm->vcpu);
>> +
>> +	if (svm->vmcb->save.cpl) {
>> +		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
>> +		       __func__, svm->vmcb->save.cpl,
>> +		       kvm_rip_read(&svm->vcpu));
>> +		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
>> +		return 1;
>> +	}
>>
>
> Check before skip.
>
> Don't we need to check that svm is enabled in the guest as well (and  
> inject #UD if not)?

The EFER_SVME check happens in nested_svm_do. This way all SVM  
operations get the check without code duplication.

>> +
>> +static int vmsave_interception(struct vcpu_svm *svm, struct  
>> kvm_run *kvm_run)
>> +{
>> +	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
>> +	skip_emulated_instruction(&svm->vcpu);
>> +
>> +	if (svm->vmcb->save.cpl) {
>> +		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
>> +		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
>> +		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
>> +		return 1;
>> +	}
>> +
>> +	nested_svm_do(svm, svm->vmcb->save.rax, 0, NULL,  
>> nested_svm_vmsave);
>> +
>>
>
> Ditto.
>
>> +	return 1;
>> +}
>>
>
> Blank line.
>
> -- 
> error compiling committee.c: too many arguments to function
>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/9] Add VMLOAD and VMSAVE handlers
  2008-09-01 14:14                 ` Alexander Graf
@ 2008-09-01 14:27                   ` Avi Kivity
  2008-09-01 14:49                     ` Alexander Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 14:27 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
>
> On Sep 1, 2008, at 3:27 PM, Avi Kivity wrote:
>
>> Alexander Graf wrote:
>>> This implements the VMLOAD and VMSAVE instructions, that usually 
>>> surround
>>> the VMRUN instructions. Both instructions load / restore the same 
>>> elements,
>>> so we only need to implement them once.
>>>
>>> +static int nested_svm_vmloadsave(struct vmcb *from_vmcb, struct 
>>> vmcb *to_vmcb)
>>> +{
>>> +    memcpy(&to_vmcb->save.fs, &from_vmcb->save.fs, sizeof(struct 
>>> vmcb_seg));
>>> +    memcpy(&to_vmcb->save.gs, &from_vmcb->save.gs, sizeof(struct 
>>> vmcb_seg));
>>> +    memcpy(&to_vmcb->save.tr, &from_vmcb->save.tr, sizeof(struct 
>>> vmcb_seg));
>>> +    memcpy(&to_vmcb->save.ldtr, &from_vmcb->save.ldtr,
>>> +           sizeof(struct vmcb_seg));
>>>
>>
>> You can use simple assignment here.
>
> Uh ... like to_vmcb->save.fs = from_vmcb->save.fs; ? That works?

Welcome to 1983 (or whenever this was introduced) :)

>>
>> Don't we need to check that svm is enabled in the guest as well (and 
>> inject #UD if not)?
>
> The EFER_SVME check happens in nested_svm_do. This way all SVM 
> operations get the check without code duplication.

But the cpl check is not done?  Sad.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/9] Add VMLOAD and VMSAVE handlers
  2008-09-01 14:27                   ` Avi Kivity
@ 2008-09-01 14:49                     ` Alexander Graf
  0 siblings, 0 replies; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 14:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony


On Sep 1, 2008, at 4:27 PM, Avi Kivity wrote:

> Alexander Graf wrote:
>>
>> On Sep 1, 2008, at 3:27 PM, Avi Kivity wrote:
>>
>>> Alexander Graf wrote:
>>>> This implements the VMLOAD and VMSAVE instructions, that usually  
>>>> surround
>>>> the VMRUN instructions. Both instructions load / restore the same  
>>>> elements,
>>>> so we only need to implement them once.
>>>>
>>>> +static int nested_svm_vmloadsave(struct vmcb *from_vmcb, struct  
>>>> vmcb *to_vmcb)
>>>> +{
>>>> +    memcpy(&to_vmcb->save.fs, &from_vmcb->save.fs, sizeof(struct  
>>>> vmcb_seg));
>>>> +    memcpy(&to_vmcb->save.gs, &from_vmcb->save.gs, sizeof(struct  
>>>> vmcb_seg));
>>>> +    memcpy(&to_vmcb->save.tr, &from_vmcb->save.tr, sizeof(struct  
>>>> vmcb_seg));
>>>> +    memcpy(&to_vmcb->save.ldtr, &from_vmcb->save.ldtr,
>>>> +           sizeof(struct vmcb_seg));
>>>>
>>>
>>> You can use simple assignment here.
>>
>> Uh ... like to_vmcb->save.fs = from_vmcb->save.fs; ? That works?
>
> Welcome to 1983 (or whenever this was introduced) :)

Wow, I didn't know that actually works - cool!

>
>
>>>
>>> Don't we need to check that svm is enabled in the guest as well  
>>> (and inject #UD if not)?
>>
>> The EFER_SVME check happens in nested_svm_do. This way all SVM  
>> operations get the check without code duplication.
>
> But the cpl check is not done?  Sad.

Yeah, I actually had it in there for quite a while until I realized  
that #VMEXIT never worked as soon as I came into userspace (cpl3) in  
the VM ...

Alex


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 6/9] Implement hsave
  2008-09-01 11:57           ` [PATCH 6/9] Implement hsave Alexander Graf
  2008-09-01 11:57             ` [PATCH 7/9] Add VMLOAD and VMSAVE handlers Alexander Graf
@ 2008-09-01 13:15             ` Avi Kivity
  2008-09-01 14:11               ` Alexander Graf
  2008-09-01 13:21             ` Avi Kivity
  2 siblings, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 13:15 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
> Implement the hsave MSR, that gives the VCPU a GPA to save the
> old guest state in.
>   

There's a list of msrs exported to userspace somewhere; adding this will 
automatically add save/restore/migrate support for hsave.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 6/9] Implement hsave
  2008-09-01 13:15             ` [PATCH 6/9] Implement hsave Avi Kivity
@ 2008-09-01 14:11               ` Alexander Graf
  2008-09-01 14:26                 ` Avi Kivity
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 14:11 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony


On Sep 1, 2008, at 3:15 PM, Avi Kivity wrote:

> Alexander Graf wrote:
>> Implement the hsave MSR, that gives the VCPU a GPA to save the
>> old guest state in.
>>
>
> There's a list of msrs exported to userspace somewhere; adding this  
> will automatically add save/restore/migrate support for hsave.

Sounds great, but how do I access that value from kernel space then?


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 6/9] Implement hsave
  2008-09-01 14:11               ` Alexander Graf
@ 2008-09-01 14:26                 ` Avi Kivity
  0 siblings, 0 replies; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 14:26 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
>
> On Sep 1, 2008, at 3:15 PM, Avi Kivity wrote:
>
>> Alexander Graf wrote:
>>> Implement the hsave MSR, that gives the VCPU a GPA to save the
>>> old guest state in.
>>>
>>
>> There's a list of msrs exported to userspace somewhere; adding this 
>> will automatically add save/restore/migrate support for hsave.
>
> Sounds great, but how do I access that value from kernel space then?
>

Same as before.  It's just a list of msr numbers which userspace knows 
it can load/save.

See msrs_to_save in x86.c.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 6/9] Implement hsave
  2008-09-01 11:57           ` [PATCH 6/9] Implement hsave Alexander Graf
  2008-09-01 11:57             ` [PATCH 7/9] Add VMLOAD and VMSAVE handlers Alexander Graf
  2008-09-01 13:15             ` [PATCH 6/9] Implement hsave Avi Kivity
@ 2008-09-01 13:21             ` Avi Kivity
  2 siblings, 0 replies; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 13:21 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
> @@ -1360,6 +1361,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
>  	case MSR_IA32_LASTINTTOIP:
>  		*data = svm->vmcb->save.last_excp_to;
>  		break;
> +	case MSR_VM_HSAVE_PA:
> +		*data = svm->nested_hsave;
> +		break;
>   

Also, need to check the low 12 bits here for zero.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/9] Allow setting the SVME bit
  2008-09-01 11:57         ` [PATCH 5/9] Allow setting the SVME bit Alexander Graf
  2008-09-01 11:57           ` [PATCH 6/9] Implement hsave Alexander Graf
@ 2008-09-01 13:14           ` Avi Kivity
  1 sibling, 0 replies; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 13:14 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
> Normally setting the SVME bit in EFER is not allowed, as we did
> not support SVM. Not since we do, we should also allow enabling
> SVM mode.
>
> Signed-off-by: Alexander Graf <agraf@suse.de>
> ---
>  arch/x86/kvm/svm.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 35c7fe7..3c9774e 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -440,6 +440,8 @@ static __init int svm_hardware_setup(void)
>  	if (boot_cpu_has(X86_FEATURE_NX))
>  		kvm_enable_efer_bits(EFER_NX);
>  
> +	kvm_enable_efer_bits(MSR_EFER_SVME_MASK);
> +
>   

Should probably the last patch, to avoid a partial implementation when 
bisecting.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/9] Implement GIF, clgi and stgi
  2008-09-01 11:57     ` [PATCH 3/9] Implement GIF, clgi and stgi Alexander Graf
  2008-09-01 11:57       ` [PATCH 4/9] Add helper functions for nested SVM Alexander Graf
@ 2008-09-01 13:11       ` Avi Kivity
  2008-09-01 14:02         ` Alexander Graf
  1 sibling, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 13:11 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
> This patch implements the GIF flag and the clgi and stgi instructions that
> set this flag. Only if the flag is set (default), interrupts can be received by
> the CPU.
>
> To keep the information about that somewhere, this patch adds a new hidden
> flags vector. that is used to store information that does not go into the
> vmcb, but is SVM specific.
>
> +
> +	u32 hflags;
>  };
>   

bool gif : 1;

(or even

bool gif;)?

>  
>  
> +static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
> +{
> +	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
> +	skip_emulated_instruction(&svm->vcpu);
> +
> +	if (svm->vmcb->save.cpl) {
> +		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
> +		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
> +		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
> +		return 1;
> +	}
>   

Check before adjusting rip.

> +static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
> +{
> +	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
> +	skip_emulated_instruction(&svm->vcpu);
> +
> +	if (svm->vmcb->save.cpl) {
> +		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
> +		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
> +		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
> +		return 1;
> +	}
>   

Ditto.

Need save/restore support as well.  Can be in a different patch, though.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/9] Implement GIF, clgi and stgi
  2008-09-01 13:11       ` [PATCH 3/9] Implement GIF, clgi and stgi Avi Kivity
@ 2008-09-01 14:02         ` Alexander Graf
  2008-09-01 14:25           ` Avi Kivity
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 14:02 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony


On Sep 1, 2008, at 3:11 PM, Avi Kivity wrote:

> Alexander Graf wrote:
>> This patch implements the GIF flag and the clgi and stgi  
>> instructions that
>> set this flag. Only if the flag is set (default), interrupts can be  
>> received by
>> the CPU.
>>
>> To keep the information about that somewhere, this patch adds a new  
>> hidden
>> flags vector. that is used to store information that does not go  
>> into the
>> vmcb, but is SVM specific.
>>
>> +
>> +	u32 hflags;
>> };
>>
>
> bool gif : 1;
>
> (or even
>
> bool gif;)?

It's not visible in the patches I sent, but I did use the hflags for a  
VMLOAD+VMRUN+VMSAVE aggregation hack. It's still somewhat flacky wrt  
save/restore though, so I didn't send it. Hflags might be useful  
nevertheless though.

>
>
>>  +static int stgi_interception(struct vcpu_svm *svm, struct kvm_run  
>> *kvm_run)
>> +{
>> +	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
>> +	skip_emulated_instruction(&svm->vcpu);
>> +
>> +	if (svm->vmcb->save.cpl) {
>> +		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
>> +		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
>> +		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
>> +		return 1;
>> +	}
>>
>
> Check before adjusting rip.
>
>> +static int clgi_interception(struct vcpu_svm *svm, struct kvm_run  
>> *kvm_run)
>> +{
>> +	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
>> +	skip_emulated_instruction(&svm->vcpu);
>> +
>> +	if (svm->vmcb->save.cpl) {
>> +		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
>> +		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
>> +		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
>> +		return 1;
>> +	}
>>
>
> Ditto.
>
> Need save/restore support as well.  Can be in a different patch,  
> though.

It might be a good idea to share hflags with qemu. I implemented the  
GIF there already.

Alex

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/9] Implement GIF, clgi and stgi
  2008-09-01 14:02         ` Alexander Graf
@ 2008-09-01 14:25           ` Avi Kivity
  2008-09-01 15:37             ` Alexander Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 14:25 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
>>> +
>>> +    u32 hflags;
>>> };
>>>
>>
>> bool gif : 1;
>>
>> (or even
>>
>> bool gif;)?
>
> It's not visible in the patches I sent, but I did use the hflags for a 
> VMLOAD+VMRUN+VMSAVE aggregation hack. It's still somewhat flacky wrt 
> save/restore though, so I didn't send it. Hflags might be useful 
> nevertheless though.

You mean code patching?

>>
>> Need save/restore support as well.  Can be in a different patch, though.
>
> It might be a good idea to share hflags with qemu. I implemented the 
> GIF there already.

Nothing prevents a similar format, though of course qemu is free to 
evolve, and anything in the kvm userspace interface is cast in stone.

For the userspace interface, a __u32 is better than a bool and certainly 
better than bitfields.  I was only suggesting bool for the internal 
representation.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/9] Implement GIF, clgi and stgi
  2008-09-01 14:25           ` Avi Kivity
@ 2008-09-01 15:37             ` Alexander Graf
  2008-09-01 16:05               ` Avi Kivity
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 15:37 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony


On Sep 1, 2008, at 4:25 PM, Avi Kivity wrote:

> Alexander Graf wrote:
>>>> +
>>>> +    u32 hflags;
>>>> };
>>>>
>>>
>>> bool gif : 1;
>>>
>>> (or even
>>>
>>> bool gif;)?
>>
>> It's not visible in the patches I sent, but I did use the hflags  
>> for a VMLOAD+VMRUN+VMSAVE aggregation hack. It's still somewhat  
>> flacky wrt save/restore though, so I didn't send it. Hflags might  
>> be useful nevertheless though.
>
> You mean code patching?

Yes.

>
>
>>>
>>> Need save/restore support as well.  Can be in a different patch,  
>>> though.
>>
>> It might be a good idea to share hflags with qemu. I implemented  
>> the GIF there already.
>
> Nothing prevents a similar format, though of course qemu is free to  
> evolve, and anything in the kvm userspace interface is cast in stone.
>
> For the userspace interface, a __u32 is better than a bool and  
> certainly better than bitfields.  I was only suggesting bool for the  
> internal representation.

How exactly would you like to see the userspace bridge implemented? I  
thought it might be good to put hflags in the x86 arch struct and just  
have it transferred in get_sregs and set_sregs. I don't know if I  
break any old userspace doing that though...

Alex

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/9] Implement GIF, clgi and stgi
  2008-09-01 15:37             ` Alexander Graf
@ 2008-09-01 16:05               ` Avi Kivity
  2008-09-01 16:13                 ` Alexander Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 16:05 UTC (permalink / raw)
  To: Alexander Graf; +Cc: KVM list, Joerg Roedel, Anthony Liguori

Alexander Graf wrote:
>>> It's not visible in the patches I sent, but I did use the hflags for 
>>> a VMLOAD+VMRUN+VMSAVE aggregation hack. It's still somewhat flacky 
>>> wrt save/restore though, so I didn't send it. Hflags might be useful 
>>> nevertheless though.
>>
>> You mean code patching?
>
> Yes.

That's always fun.

>
> How exactly would you like to see the userspace bridge implemented? I 
> thought it might be good to put hflags in the x86 arch struct and just 
> have it transferred in get_sregs and set_sregs. I don't know if I 
> break any old userspace doing that though...

We can't change struct kvm_sregs, since the structure size is encoded in 
the ioctl number.  We need a new struct and new ioctls.  The same was 
done for mpstate, for example.

There's more info needed, btw.  Whether the guest is in guest mode or 
host mode; what the current vmcs is; the value of eflags.if at the time 
vmrun was executed; and probably more.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/9] Implement GIF, clgi and stgi
  2008-09-01 16:05               ` Avi Kivity
@ 2008-09-01 16:13                 ` Alexander Graf
  2008-09-01 16:17                   ` Avi Kivity
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 16:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: KVM list, Joerg Roedel, Anthony Liguori


On Sep 1, 2008, at 6:05 PM, Avi Kivity wrote:

> Alexander Graf wrote:
>>>> It's not visible in the patches I sent, but I did use the hflags  
>>>> for a VMLOAD+VMRUN+VMSAVE aggregation hack. It's still somewhat  
>>>> flacky wrt save/restore though, so I didn't send it. Hflags might  
>>>> be useful nevertheless though.
>>>
>>> You mean code patching?
>>
>> Yes.
>
> That's always fun.
>
>>
>> How exactly would you like to see the userspace bridge implemented?  
>> I thought it might be good to put hflags in the x86 arch struct and  
>> just have it transferred in get_sregs and set_sregs. I don't know  
>> if I break any old userspace doing that though...
>
> We can't change struct kvm_sregs, since the structure size is  
> encoded in the ioctl number.  We need a new struct and new ioctls.   
> The same was done for mpstate, for example.

Urks - doesn't sounds fast to me.

>
>
> There's more info needed, btw.  Whether the guest is in guest mode  
> or host mode; what the current vmcs is; the value of eflags.if at  
> the time vmrun was executed; and probably more.

Do we really want to support storing the state when within the Level2  
VM? Can't we just #VMEXIT as soon as we switch to userspace or so?  
That would probably simplify things a lot.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/9] Implement GIF, clgi and stgi
  2008-09-01 16:13                 ` Alexander Graf
@ 2008-09-01 16:17                   ` Avi Kivity
  2008-09-01 16:40                     ` Alexander Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 16:17 UTC (permalink / raw)
  To: Alexander Graf; +Cc: KVM list, Joerg Roedel, Anthony Liguori

Alexander Graf wrote:
>>
>> We can't change struct kvm_sregs, since the structure size is encoded 
>> in the ioctl number.  We need a new struct and new ioctls.  The same 
>> was done for mpstate, for example.
>
> Urks - doesn't sounds fast to me.
>

It's only for save/restore, not in any fast path.

>>
>>
>> There's more info needed, btw.  Whether the guest is in guest mode or 
>> host mode; what the current vmcs is; the value of eflags.if at the 
>> time vmrun was executed; and probably more.
>
> Do we really want to support storing the state when within the Level2 
> VM? Can't we just #VMEXIT as soon as we switch to userspace or so? 
> That would probably simplify things a lot.
>

Great idea.  What do we give as the exit reason?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/9] Implement GIF, clgi and stgi
  2008-09-01 16:17                   ` Avi Kivity
@ 2008-09-01 16:40                     ` Alexander Graf
  2008-09-02  9:15                       ` Avi Kivity
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 16:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: KVM list, Joerg Roedel, Anthony Liguori


Am 01.09.2008 um 18:17 schrieb Avi Kivity <avi@qumranet.com>:

> Alexander Graf wrote:
>>>
>>> We can't change struct kvm_sregs, since the structure size is  
>>> encoded in the ioctl number.  We need a new struct and new  
>>> ioctls.  The same was done for mpstate, for example.
>>
>> Urks - doesn't sounds fast to me.
>>
>
> It's only for save/restore, not in any fast path.

Ah, so we don't sync hflags with the userspace on every transition but  
only request it on save/restore?
Sounds good.

>
>
>>>
>>>
>>> There's more info needed, btw.  Whether the guest is in guest mode  
>>> or host mode; what the current vmcs is; the value of eflags.if at  
>>> the time vmrun was executed; and probably more.
>>
>> Do we really want to support storing the state when within the  
>> Level2 VM? Can't we just #VMEXIT as soon as we switch to userspace  
>> or so? That would probably simplify things a lot.
>>
>
> Great idea.  What do we give as the exit reason?

I thought of INTR here. That should be a nop in every VMM. Even if  
not, worst case it would end in an hlt that still fires on the next  
timer.

Alex

>
>
> -- 
> error compiling committee.c: too many arguments to function
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/9] Implement GIF, clgi and stgi
  2008-09-01 16:40                     ` Alexander Graf
@ 2008-09-02  9:15                       ` Avi Kivity
  0 siblings, 0 replies; 44+ messages in thread
From: Avi Kivity @ 2008-09-02  9:15 UTC (permalink / raw)
  To: Alexander Graf; +Cc: KVM list, Joerg Roedel, Anthony Liguori

Alexander Graf wrote:
>
> Ah, so we don't sync hflags with the userspace on every transition but 
> only request it on save/restore?

Yes.  That's how all of the other state is synced.

>>>
>>> Do we really want to support storing the state when within the 
>>> Level2 VM? Can't we just #VMEXIT as soon as we switch to userspace 
>>> or so? That would probably simplify things a lot.
>>>
>>
>> Great idea.  What do we give as the exit reason?
>
> I thought of INTR here. That should be a nop in every VMM. Even if 
> not, worst case it would end in an hlt that still fires on the next 
> timer. 

Yes.  Luckily svm doesn't support capturing the vector number like vmx does.

There's only one case where this doesn't work, AFAICT: when the INTR 
intercept isn't enabled.  Once can think of a microkernel hypervisor 
that forwards all interrupts to a privileged guest (and when that guest 
is running, it can disable INTR intercepts).  I don't know of any 
hypervisor which does this (Xen doesn't and can't), so I don't think 
this is an objection to using INTR.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/9] Clean up VINTR setting
  2008-09-01 11:57   ` [PATCH 2/9] Clean up VINTR setting Alexander Graf
  2008-09-01 11:57     ` [PATCH 3/9] Implement GIF, clgi and stgi Alexander Graf
@ 2008-09-01 13:13     ` Avi Kivity
  1 sibling, 0 replies; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 13:13 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
> The current VINTR intercept setters don't look clean to me. To make
> the code easier to read and enable the possibilty to trap on a VINTR
> set, this uses a helper function to set the VINTR intercept.
>
> +static void svm_set_vintr(struct vcpu_svm *svm, bool on)
> +{
> +	if (on) {
> +		svm->vmcb->control.intercept |= 1ULL << INTERCEPT_VINTR;
> +	} else {
> +		svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_VINTR);
> +	}
> +}
> +
>   


svm_set_vintr() + svm_clear_vintr() would be even clearer IMO.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/9] [RFC] Add support for nested SVM (kernel)
  2008-09-01 11:57 [PATCH 0/9] [RFC] Add support for nested SVM (kernel) Alexander Graf
  2008-09-01 11:57 ` [PATCH 1/9] Add CPUID feature flag for SVM Alexander Graf
@ 2008-09-01 12:09 ` Avi Kivity
  2008-09-01 12:21 ` Joerg Roedel
  2008-09-01 13:41 ` Daniel P. Berrange
  3 siblings, 0 replies; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 12:09 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
> The current generation of virtualization extensions only supports one VM layer.
> While we can't change that, it is pretty easy to emulate the CPU's behavior
> and implement the virtualization opcodes ourselves.
>
> This patchset does exactly this for SVM. Using this, a KVM can run within a VM.
> Since we're emulating the real CPU's behavior, this should also enable other
> VMMs to run within KVM.
> So far I've only tested to run KVM inside the VM though.
>
> This was created with help from Joerg Roedel. I don't really know how put this
> information in the patchset though. Maybe set him on signed-off-by?
>
> As always, comments and suggestions are highly welcome.
>
>   

Wow.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/9] [RFC] Add support for nested SVM (kernel)
  2008-09-01 11:57 [PATCH 0/9] [RFC] Add support for nested SVM (kernel) Alexander Graf
  2008-09-01 11:57 ` [PATCH 1/9] Add CPUID feature flag for SVM Alexander Graf
  2008-09-01 12:09 ` [PATCH 0/9] [RFC] Add support for nested SVM (kernel) Avi Kivity
@ 2008-09-01 12:21 ` Joerg Roedel
  2008-09-01 13:41 ` Daniel P. Berrange
  3 siblings, 0 replies; 44+ messages in thread
From: Joerg Roedel @ 2008-09-01 12:21 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, anthony

On Mon, Sep 01, 2008 at 01:57:52PM +0200, Alexander Graf wrote:
> The current generation of virtualization extensions only supports one VM layer.
> While we can't change that, it is pretty easy to emulate the CPU's behavior
> and implement the virtualization opcodes ourselves.
> 
> This patchset does exactly this for SVM. Using this, a KVM can run within a VM.
> Since we're emulating the real CPU's behavior, this should also enable other
> VMMs to run within KVM.
> So far I've only tested to run KVM inside the VM though.
> 
> This was created with help from Joerg Roedel. I don't really know how put this
> information in the patchset though. Maybe set him on signed-off-by?
> 
> As always, comments and suggestions are highly welcome.

Cool stuff now that its working :-)
But I will have another look at the interupt handling code before I add
my Signed-off-by/Acked-By/Whatever :)

Congrats,

Joerg

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/9] [RFC] Add support for nested SVM (kernel)
  2008-09-01 11:57 [PATCH 0/9] [RFC] Add support for nested SVM (kernel) Alexander Graf
                   ` (2 preceding siblings ...)
  2008-09-01 12:21 ` Joerg Roedel
@ 2008-09-01 13:41 ` Daniel P. Berrange
  2008-09-01 14:17   ` Alexander Graf
  3 siblings, 1 reply; 44+ messages in thread
From: Daniel P. Berrange @ 2008-09-01 13:41 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

On Mon, Sep 01, 2008 at 01:57:52PM +0200, Alexander Graf wrote:
> The current generation of virtualization extensions only supports one VM layer.
> While we can't change that, it is pretty easy to emulate the CPU's behavior
> and implement the virtualization opcodes ourselves.
> 
> This patchset does exactly this for SVM. Using this, a KVM can run within a VM.
> Since we're emulating the real CPU's behavior, this should also enable other
> VMMs to run within KVM.
> So far I've only tested to run KVM inside the VM though.
> 
> This was created with help from Joerg Roedel. I don't really know how put this
> information in the patchset though. Maybe set him on signed-off-by?
> 
> As always, comments and suggestions are highly welcome.

Very useful work - I need exactly this capability for building a libvirt
integration test harness. Running xen within kvm was not letting me test
libvirt's Xen HVM handling, and kvm/xen within qemu was too slow to be
viable. Hopefully xen/kvm within kvm will suit my test suite perfectly.

I'm guessing this requires that you have AMD cpus for the host ? Would it
be much more work to support VMX for those with Intel cpus too...

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/9] [RFC] Add support for nested SVM (kernel)
  2008-09-01 13:41 ` Daniel P. Berrange
@ 2008-09-01 14:17   ` Alexander Graf
  2008-09-01 14:22     ` Avi Kivity
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 14:17 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: kvm, joro, anthony


On Sep 1, 2008, at 3:41 PM, Daniel P. Berrange wrote:

> On Mon, Sep 01, 2008 at 01:57:52PM +0200, Alexander Graf wrote:
>> The current generation of virtualization extensions only supports  
>> one VM layer.
>> While we can't change that, it is pretty easy to emulate the CPU's  
>> behavior
>> and implement the virtualization opcodes ourselves.
>>
>> This patchset does exactly this for SVM. Using this, a KVM can run  
>> within a VM.
>> Since we're emulating the real CPU's behavior, this should also  
>> enable other
>> VMMs to run within KVM.
>> So far I've only tested to run KVM inside the VM though.
>>
>> This was created with help from Joerg Roedel. I don't really know  
>> how put this
>> information in the patchset though. Maybe set him on signed-off-by?
>>
>> As always, comments and suggestions are highly welcome.
>
> Very useful work - I need exactly this capability for building a  
> libvirt

Thanks ;-). The amazingly simple design of SVM really makes this easy.

> integration test harness. Running xen within kvm was not letting me  
> test
> libvirt's Xen HVM handling, and kvm/xen within qemu was too slow to be
> viable. Hopefully xen/kvm within kvm will suit my test suite  
> perfectly.
>
> I'm guessing this requires that you have AMD cpus for the host ?  
> Would it
> be much more work to support VMX for those with Intel cpus too...

This is completely implementation specific. So for now this is 100%  
AMD only code.
Also I don't really like Intel's design for VMX and while making  
nested VMX work might not be _that_ hard, getting it fast would  
probably require a lot of work and hacks.

Alex

>
>
> Daniel
> -- 
> |: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ 
>  :|
> |: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org 
>  :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ 
>  :|
> |: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742  
> 7D3B 9505 :|


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/9] [RFC] Add support for nested SVM (kernel)
  2008-09-01 14:17   ` Alexander Graf
@ 2008-09-01 14:22     ` Avi Kivity
  2008-09-01 14:47       ` Alexander Graf
  0 siblings, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 14:22 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Daniel P. Berrange, kvm, joro, anthony

Alexander Graf wrote:
>
> This is completely implementation specific. So for now this is 100% 
> AMD only code.
> Also I don't really like Intel's design for VMX and while making 
> nested VMX work might not be _that_ hard, getting it fast would 
> probably require a lot of work and hacks.

I've thought of introducing a third, hypercall based, virtualization 
extension and exposing that.  We could then optimize it to the hilts.

The advantages of this are that we can live migrate across vendors, and 
of course we have great freedom in the implementation.  The disadvantage 
is that a lot of thought needs to go into the design in order to 
accommodate future changes in both the host and guest.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/9] [RFC] Add support for nested SVM (kernel)
  2008-09-01 14:22     ` Avi Kivity
@ 2008-09-01 14:47       ` Alexander Graf
  2008-09-01 14:57         ` Avi Kivity
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2008-09-01 14:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Daniel P. Berrange, kvm, joro, anthony


On Sep 1, 2008, at 4:22 PM, Avi Kivity wrote:

> Alexander Graf wrote:
>>
>> This is completely implementation specific. So for now this is 100%  
>> AMD only code.
>> Also I don't really like Intel's design for VMX and while making  
>> nested VMX work might not be _that_ hard, getting it fast would  
>> probably require a lot of work and hacks.
>
> I've thought of introducing a third, hypercall based, virtualization  
> extension and exposing that.  We could then optimize it to the hilts.
>
> The advantages of this are that we can live migrate across vendors,  
> and of course we have great freedom in the implementation.  The  
> disadvantage is that a lot of thought needs to go into the design in  
> order to accommodate future changes in both the host and guest.

Not only that, but we'd also only enable running KVM inside the VM for  
now. With real vendor specific extensions passed through you could run  
a 3rd party hypervisor, which is what I think is the really  
interesting use case.

Alex

>
>
> -- 
> error compiling committee.c: too many arguments to function
>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/9] [RFC] Add support for nested SVM (kernel)
  2008-09-01 14:47       ` Alexander Graf
@ 2008-09-01 14:57         ` Avi Kivity
  0 siblings, 0 replies; 44+ messages in thread
From: Avi Kivity @ 2008-09-01 14:57 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Daniel P. Berrange, kvm, joro, anthony

Alexander Graf wrote:
>
> On Sep 1, 2008, at 4:22 PM, Avi Kivity wrote:
>
>> Alexander Graf wrote:
>>>
>>> This is completely implementation specific. So for now this is 100% 
>>> AMD only code.
>>> Also I don't really like Intel's design for VMX and while making 
>>> nested VMX work might not be _that_ hard, getting it fast would 
>>> probably require a lot of work and hacks.
>>
>> I've thought of introducing a third, hypercall based, virtualization 
>> extension and exposing that.  We could then optimize it to the hilts.
>>
>> The advantages of this are that we can live migrate across vendors, 
>> and of course we have great freedom in the implementation.  The 
>> disadvantage is that a lot of thought needs to go into the design in 
>> order to accommodate future changes in both the host and guest.
>
> Not only that, but we'd also only enable running KVM inside the VM for 
> now. 

Yes, of course.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2008-09-03 11:55 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-01 11:57 [PATCH 0/9] [RFC] Add support for nested SVM (kernel) Alexander Graf
2008-09-01 11:57 ` [PATCH 1/9] Add CPUID feature flag for SVM Alexander Graf
2008-09-01 11:57   ` [PATCH 2/9] Clean up VINTR setting Alexander Graf
2008-09-01 11:57     ` [PATCH 3/9] Implement GIF, clgi and stgi Alexander Graf
2008-09-01 11:57       ` [PATCH 4/9] Add helper functions for nested SVM Alexander Graf
2008-09-01 11:57         ` [PATCH 5/9] Allow setting the SVME bit Alexander Graf
2008-09-01 11:57           ` [PATCH 6/9] Implement hsave Alexander Graf
2008-09-01 11:57             ` [PATCH 7/9] Add VMLOAD and VMSAVE handlers Alexander Graf
2008-09-01 11:58               ` [PATCH 8/9] Add VMRUN handler Alexander Graf
2008-09-01 11:58                 ` [PATCH 9/9] Add VMEXIT handler and intercepts Alexander Graf
2008-09-01 13:58                   ` Avi Kivity
2008-09-02 16:15                     ` Alexander Graf
2008-09-03  9:23                       ` Avi Kivity
2008-09-03  9:33                         ` Alexander Graf
2008-09-03  9:47                           ` Avi Kivity
2008-09-03 11:55                             ` Alexander Graf
2008-09-01 13:41                 ` [PATCH 8/9] Add VMRUN handler Avi Kivity
2008-09-02 15:38                   ` Alexander Graf
2008-09-01 13:27               ` [PATCH 7/9] Add VMLOAD and VMSAVE handlers Avi Kivity
2008-09-01 14:14                 ` Alexander Graf
2008-09-01 14:27                   ` Avi Kivity
2008-09-01 14:49                     ` Alexander Graf
2008-09-01 13:15             ` [PATCH 6/9] Implement hsave Avi Kivity
2008-09-01 14:11               ` Alexander Graf
2008-09-01 14:26                 ` Avi Kivity
2008-09-01 13:21             ` Avi Kivity
2008-09-01 13:14           ` [PATCH 5/9] Allow setting the SVME bit Avi Kivity
2008-09-01 13:11       ` [PATCH 3/9] Implement GIF, clgi and stgi Avi Kivity
2008-09-01 14:02         ` Alexander Graf
2008-09-01 14:25           ` Avi Kivity
2008-09-01 15:37             ` Alexander Graf
2008-09-01 16:05               ` Avi Kivity
2008-09-01 16:13                 ` Alexander Graf
2008-09-01 16:17                   ` Avi Kivity
2008-09-01 16:40                     ` Alexander Graf
2008-09-02  9:15                       ` Avi Kivity
2008-09-01 13:13     ` [PATCH 2/9] Clean up VINTR setting Avi Kivity
2008-09-01 12:09 ` [PATCH 0/9] [RFC] Add support for nested SVM (kernel) Avi Kivity
2008-09-01 12:21 ` Joerg Roedel
2008-09-01 13:41 ` Daniel P. Berrange
2008-09-01 14:17   ` Alexander Graf
2008-09-01 14:22     ` Avi Kivity
2008-09-01 14:47       ` Alexander Graf
2008-09-01 14:57         ` Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox