public inbox for kvm@vger.kernel.org
* [PATCH 00/10] Add support for nested SVM (kernel) v4
@ 2008-10-15 19:31 Alexander Graf
  2008-10-15 19:31 ` [PATCH 01/10] Add CPUID feature flag for SVM v4 Alexander Graf
  2008-10-19  9:56 ` [PATCH 00/10] Add support for nested SVM (kernel) v4 Avi Kivity
  0 siblings, 2 replies; 24+ messages in thread
From: Alexander Graf @ 2008-10-15 19:31 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, avi, Alexander Graf

The current generation of virtualization extensions only supports one VM layer.
While we can't change that, it is pretty easy to emulate the CPU's behavior
and implement the virtualization opcodes ourselves.

This patchset does exactly that for SVM. Using it, KVM can run within a VM.
Since we emulate the real CPU's behavior, this should also enable other
VMMs to run within KVM.
So far I have only tested running KVM inside the VM, though.

As always, comments and suggestions are highly welcome.

v2 takes most comments from Avi into account.

v3 addresses Joerg's comments, including

- V_INTR_MASKING support
- a generic permission checking helper

v4 addresses even more comments from Joerg, including

- don't store the guest's vmcb in the guest's hsave area
- add nested=<int> flag for kvm-amd.ko, defaults to 0 (off)
- include Joerg's VM_CR MSR patch
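With the new module parameter, nested SVM stays disabled unless explicitly
requested. A usage sketch (the parameter name is taken from the changelog
above; the exact type and semantics depend on the patch itself):

```shell
# Load kvm-amd with nested SVM enabled (defaults to 0 = off):
modprobe kvm-amd nested=1
```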

To be usable, this patchset requires the two simple changes to the userspace
part that I sent to the list with the first version.

Known issues:

- TODO: #VMEXIT on save/restore
- SMP L2 guests break with the in-kernel APIC

Thanks for reviewing!

Alex


Alexander Graf (9):
  Add CPUID feature flag for SVM v4
  Clean up VINTR setting v4
  Add helper functions for nested SVM v4
  Implement GIF, clgi and stgi v4
  Implement hsave v4
  Add VMLOAD and VMSAVE handlers v4
  Add VMRUN handler v4
  Add VMEXIT handler and intercepts v4
  Allow setting the SVME bit v4

Joerg Roedel (1):
  allow read access to MSR_VM_CR

 arch/x86/kvm/kvm_svm.h       |   11 +
 arch/x86/kvm/svm.c           |  755 +++++++++++++++++++++++++++++++++++++++++-
 include/asm-x86/cpufeature.h |    1 +
 include/asm-x86/kvm_host.h   |    5 +
 4 files changed, 760 insertions(+), 12 deletions(-)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 01/10] Add CPUID feature flag for SVM v4
  2008-10-15 19:31 [PATCH 00/10] Add support for nested SVM (kernel) v4 Alexander Graf
@ 2008-10-15 19:31 ` Alexander Graf
  2008-10-15 19:31   ` [PATCH 02/10] Clean up VINTR setting v4 Alexander Graf
  2008-10-16 11:25   ` [PATCH 01/10] Add CPUID feature flag for SVM v4 Alexander Graf
  2008-10-19  9:56 ` [PATCH 00/10] Add support for nested SVM (kernel) v4 Avi Kivity
  1 sibling, 2 replies; 24+ messages in thread
From: Alexander Graf @ 2008-10-15 19:31 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, avi, Alexander Graf

This patch adds the CPUID feature flag for SVM in the x86 Linux headers.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 include/asm-x86/cpufeature.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/cpufeature.h b/include/asm-x86/cpufeature.h
index cfcfb0a..28217de 100644
--- a/include/asm-x86/cpufeature.h
+++ b/include/asm-x86/cpufeature.h
@@ -110,6 +110,7 @@
 /* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */
 #define X86_FEATURE_LAHF_LM	(6*32+ 0) /* LAHF/SAHF in long mode */
 #define X86_FEATURE_CMP_LEGACY	(6*32+ 1) /* If yes HyperThreading not valid */
+#define X86_FEATURE_SVM		(6*32+ 2) /* Secure Virtual Machine */
 #define X86_FEATURE_IBS		(6*32+ 10) /* Instruction Based Sampling */
 
 /*
-- 
1.5.6



* [PATCH 02/10] Clean up VINTR setting v4
  2008-10-15 19:31 ` [PATCH 01/10] Add CPUID feature flag for SVM v4 Alexander Graf
@ 2008-10-15 19:31   ` Alexander Graf
  2008-10-15 19:31     ` [PATCH 03/10] Add helper functions for nested SVM v4 Alexander Graf
  2008-10-16 11:25   ` [PATCH 01/10] Add CPUID feature flag for SVM v4 Alexander Graf
  1 sibling, 1 reply; 24+ messages in thread
From: Alexander Graf @ 2008-10-15 19:31 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, avi, Alexander Graf

The current VINTR intercept setters don't look clean to me. To make
the code easier to read and to make it possible to trap on a VINTR
set, this uses helper functions to set and clear the VINTR intercept.

v2 uses two distinct functions for setting and clearing the bit

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |   20 +++++++++++++++-----
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 05efc4e..4ee5376 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -731,6 +731,16 @@ static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 	to_svm(vcpu)->vmcb->save.rflags = rflags;
 }
 
+static void svm_set_vintr(struct vcpu_svm *svm)
+{
+	svm->vmcb->control.intercept |= 1ULL << INTERCEPT_VINTR;
+}
+
+static void svm_clear_vintr(struct vcpu_svm *svm)
+{
+	svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_VINTR);
+}
+
 static struct vmcb_seg *svm_seg(struct kvm_vcpu *vcpu, int seg)
 {
 	struct vmcb_save_area *save = &to_svm(vcpu)->vmcb->save;
@@ -1376,7 +1386,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm,
 {
 	KVMTRACE_0D(PEND_INTR, &svm->vcpu, handler);
 
-	svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_VINTR);
+	svm_clear_vintr(svm);
 	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
 	/*
 	 * If the user space waits to inject interrupts, exit as soon as
@@ -1589,7 +1599,7 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
 	    (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
 	    (vmcb->control.event_inj & SVM_EVTINJ_VALID)) {
 		/* unable to deliver irq, set pending irq */
-		vmcb->control.intercept |= (1ULL << INTERCEPT_VINTR);
+		svm_set_vintr(svm);
 		svm_inject_irq(svm, 0x0);
 		goto out;
 	}
@@ -1649,9 +1659,9 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 	 */
 	if (!svm->vcpu.arch.interrupt_window_open &&
 	    (svm->vcpu.arch.irq_summary || kvm_run->request_interrupt_window))
-		control->intercept |= 1ULL << INTERCEPT_VINTR;
-	 else
-		control->intercept &= ~(1ULL << INTERCEPT_VINTR);
+		svm_set_vintr(svm);
+	else
+		svm_clear_vintr(svm);
 }
 
 static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
-- 
1.5.6



* [PATCH 03/10] Add helper functions for nested SVM v4
  2008-10-15 19:31   ` [PATCH 02/10] Clean up VINTR setting v4 Alexander Graf
@ 2008-10-15 19:31     ` Alexander Graf
  2008-10-15 19:31       ` [PATCH 04/10] Implement GIF, clgi and stgi v4 Alexander Graf
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Graf @ 2008-10-15 19:31 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, avi, Alexander Graf

These are helpers for the nested SVM implementation.

- nsvm_printk implements a debug printk variant
- nested_svm_do calls a handler that can access gpa-based memory

v3 makes use of the new permission checker

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |   88 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 4ee5376..a00421b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -48,6 +48,16 @@ MODULE_LICENSE("GPL");
 
 #define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
 
+/* Turn on to get debugging output*/
+/* #define NESTED_DEBUG */
+
+#ifdef NESTED_DEBUG
+#define nsvm_printk(fmt, args...) printk(KERN_INFO fmt, ## args)
+#else
+static inline void nsvm_printk(char *fmt, ...) {
+}
+#endif
+
 /* enable NPT for AMD64 and X86 with PAE */
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 static bool npt_enabled = true;
@@ -1145,6 +1155,84 @@ static int vmmcall_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }
 
+static int nested_svm_check_permissions(struct vcpu_svm *svm)
+{
+	if (svm->vmcb->save.cpl) {
+		printk(KERN_ERR "%s: invalid cpl 0x%x at ip 0x%lx\n",
+		       __func__, svm->vmcb->save.cpl, kvm_rip_read(&svm->vcpu));
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return 1;
+	}
+
+       if (!(svm->vcpu.arch.shadow_efer & MSR_EFER_SVME_MASK)
+           || !is_paging(&svm->vcpu)) {
+               kvm_queue_exception(&svm->vcpu, UD_VECTOR);
+               return 1;
+       }
+
+       return 0;
+}
+
+static struct page *nested_svm_get_page(struct vcpu_svm *svm, u64 gpa)
+{
+	struct page *page;
+
+	down_read(&current->mm->mmap_sem);
+	page = gfn_to_page(svm->vcpu.kvm, gpa >> PAGE_SHIFT);
+	up_read(&current->mm->mmap_sem);
+
+	if (is_error_page(page)) {
+		printk(KERN_ERR "%s: could not find page at 0x%llx\n",
+		       __func__, gpa);
+		kvm_release_page_clean(page);
+		kvm_queue_exception(&svm->vcpu, GP_VECTOR);
+		return NULL;
+	}
+	return page;
+}
+
+static int nested_svm_do(struct vcpu_svm *svm,
+			 u64 arg1_gpa, u64 arg2_gpa, void *opaque,
+			 int (*handler)(struct vcpu_svm *svm,
+					void *arg1,
+					void *arg2,
+					void *opaque))
+{
+	struct page *arg1_page;
+	struct page *arg2_page = NULL;
+	void *arg1;
+	void *arg2 = NULL;
+	int retval;
+
+	arg1_page = nested_svm_get_page(svm, arg1_gpa);
+	if(arg1_page == NULL)
+		return 1;
+
+	if (arg2_gpa) {
+		arg2_page = nested_svm_get_page(svm, arg2_gpa);
+		if(arg2_page == NULL) {
+			kvm_release_page_clean(arg1_page);
+			return 1;
+		}
+	}
+
+	arg1 = kmap_atomic(arg1_page, KM_USER0);
+	if (arg2_gpa)
+		arg2 = kmap_atomic(arg2_page, KM_USER1);
+
+	retval = handler(svm, arg1, arg2, opaque);
+
+	kunmap_atomic(arg1, KM_USER0);
+	if (arg2_gpa)
+		kunmap_atomic(arg2, KM_USER1);
+
+	kvm_release_page_dirty(arg1_page);
+	if (arg2_gpa)
+		kvm_release_page_dirty(arg2_page);
+
+	return retval;
+}
+
 static int invalid_op_interception(struct vcpu_svm *svm,
 				   struct kvm_run *kvm_run)
 {
-- 
1.5.6



* [PATCH 04/10] Implement GIF, clgi and stgi v4
  2008-10-15 19:31     ` [PATCH 03/10] Add helper functions for nested SVM v4 Alexander Graf
@ 2008-10-15 19:31       ` Alexander Graf
  2008-10-15 19:31         ` [PATCH 05/10] Implement hsave v4 Alexander Graf
  2008-10-19  9:46         ` [PATCH 04/10] Implement GIF, clgi and stgi v4 Avi Kivity
  0 siblings, 2 replies; 24+ messages in thread
From: Alexander Graf @ 2008-10-15 19:31 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, avi, Alexander Graf

This patch implements the GIF flag and the clgi and stgi instructions
that clear and set this flag. Interrupts can be received by the CPU
only while the flag is set (the default).

To keep that information somewhere, this patch adds a new hidden flags
vector that is used to store SVM-specific information which does not
go into the vmcb.

v2 moves the hflags to x86 generic code
v3 makes use of the new permission helper

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c         |   42 +++++++++++++++++++++++++++++++++++++++---
 include/asm-x86/kvm_host.h |    3 +++
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a00421b..62bfa2b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -614,6 +614,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 		save->cr4 = 0;
 	}
 	force_new_asid(&svm->vcpu);
+
+	svm->vcpu.arch.hflags = HF_GIF_MASK;
 }
 
 static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
@@ -1233,6 +1235,36 @@ static int nested_svm_do(struct vcpu_svm *svm,
 	return retval;
 }
 
+static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	if (nested_svm_check_permissions(svm))
+		return 1;
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	svm->vcpu.arch.hflags |= HF_GIF_MASK;
+
+	return 1;
+}
+
+static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	if (nested_svm_check_permissions(svm))
+		return 1;
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	svm->vcpu.arch.hflags &= ~HF_GIF_MASK;
+
+	/* After a CLGI no interrupts should come */
+	svm_clear_vintr(svm);
+	svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
+
+	return 1;
+}
+
 static int invalid_op_interception(struct vcpu_svm *svm,
 				   struct kvm_run *kvm_run)
 {
@@ -1534,8 +1566,8 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_VMMCALL]			= vmmcall_interception,
 	[SVM_EXIT_VMLOAD]			= invalid_op_interception,
 	[SVM_EXIT_VMSAVE]			= invalid_op_interception,
-	[SVM_EXIT_STGI]				= invalid_op_interception,
-	[SVM_EXIT_CLGI]				= invalid_op_interception,
+	[SVM_EXIT_STGI]				= stgi_interception,
+	[SVM_EXIT_CLGI]				= clgi_interception,
 	[SVM_EXIT_SKINIT]			= invalid_op_interception,
 	[SVM_EXIT_WBINVD]                       = emulate_on_interception,
 	[SVM_EXIT_MONITOR]			= invalid_op_interception,
@@ -1683,6 +1715,9 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
 	if (!kvm_cpu_has_interrupt(vcpu))
 		goto out;
 
+	if (!(svm->vcpu.arch.hflags & HF_GIF_MASK))
+		goto out;
+
 	if (!(vmcb->save.rflags & X86_EFLAGS_IF) ||
 	    (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
 	    (vmcb->control.event_inj & SVM_EVTINJ_VALID)) {
@@ -1734,7 +1769,8 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 
 	svm->vcpu.arch.interrupt_window_open =
 		(!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) &&
-		 (svm->vmcb->save.rflags & X86_EFLAGS_IF));
+		 (svm->vmcb->save.rflags & X86_EFLAGS_IF) &&
+		 (svm->vcpu.arch.hflags & HF_GIF_MASK));
 
 	if (svm->vcpu.arch.interrupt_window_open && svm->vcpu.arch.irq_summary)
 		/*
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 4b06ca8..dc30b7b 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -253,6 +253,7 @@ struct kvm_vcpu_arch {
 	unsigned long cr3;
 	unsigned long cr4;
 	unsigned long cr8;
+	u32 hflags;
 	u64 pdptrs[4]; /* pae */
 	u64 shadow_efer;
 	u64 apic_base;
@@ -734,6 +735,8 @@ enum {
 	TASK_SWITCH_GATE = 3,
 };
 
+#define HF_GIF_MASK		(1 << 0)
+
 /*
  * Hardware virtualization extension instructions may fault if a
  * reboot turns off virtualization while processes are running.
-- 
1.5.6



* [PATCH 05/10] Implement hsave v4
  2008-10-15 19:31       ` [PATCH 04/10] Implement GIF, clgi and stgi v4 Alexander Graf
@ 2008-10-15 19:31         ` Alexander Graf
  2008-10-15 19:31           ` [PATCH 06/10] Add VMLOAD and VMSAVE handlers v4 Alexander Graf
  2008-10-19  9:46         ` [PATCH 04/10] Implement GIF, clgi and stgi v4 Avi Kivity
  1 sibling, 1 reply; 24+ messages in thread
From: Alexander Graf @ 2008-10-15 19:31 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, avi, Alexander Graf

Implement the hsave MSR, which gives the VCPU a GPA to save the
old guest state in.

v2 allows userspace to save/restore hsave
v4 dummies out the hsave MSR, so we use a host page instead

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/kvm_svm.h |    1 +
 arch/x86/kvm/svm.c     |   12 ++++++++++++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h
index 65ef0fc..40cb128 100644
--- a/arch/x86/kvm/kvm_svm.h
+++ b/arch/x86/kvm/kvm_svm.h
@@ -41,6 +41,7 @@ struct vcpu_svm {
 	unsigned long host_dr7;
 
 	u32 *msrpm;
+	struct vmcb *hsave;
 };
 
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 62bfa2b..eb301fe 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -640,6 +640,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	struct vcpu_svm *svm;
 	struct page *page;
 	struct page *msrpm_pages;
+	struct page *hsave_page;
 	int err;
 
 	svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
@@ -665,6 +666,11 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	svm->msrpm = page_address(msrpm_pages);
 	svm_vcpu_init_msrpm(svm->msrpm);
 
+	hsave_page = alloc_page(GFP_KERNEL);
+	if (!hsave_page)
+		goto uninit;
+	svm->hsave = page_address(hsave_page);
+
 	svm->vmcb = page_address(page);
 	clear_page(svm->vmcb);
 	svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
@@ -694,6 +700,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 
 	__free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT));
 	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
+	__free_page(virt_to_page(svm->hsave));
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
 }
@@ -1376,6 +1383,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
 	case MSR_IA32_LASTINTTOIP:
 		*data = svm->vmcb->save.last_excp_to;
 		break;
+	case MSR_VM_HSAVE_PA:
+		*data = 0;
+		break;
 	default:
 		return kvm_get_msr_common(vcpu, ecx, data);
 	}
@@ -1470,6 +1480,8 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
 		pr_unimpl(vcpu, "unimplemented perfctr wrmsr: 0x%x data 0x%llx\n", ecx, data);
 
 		break;
+	case MSR_VM_HSAVE_PA:
+		break;
 	default:
 		return kvm_set_msr_common(vcpu, ecx, data);
 	}
-- 
1.5.6



* [PATCH 06/10] Add VMLOAD and VMSAVE handlers v4
  2008-10-15 19:31         ` [PATCH 05/10] Implement hsave v4 Alexander Graf
@ 2008-10-15 19:31           ` Alexander Graf
  2008-10-15 19:31             ` [PATCH 07/10] Add VMRUN handler v4 Alexander Graf
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Graf @ 2008-10-15 19:31 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, avi, Alexander Graf

This implements the VMLOAD and VMSAVE instructions, which usually surround
the VMRUN instruction. Both instructions load / restore the same elements,
so we only need to implement the copy once.

v2 fixes CPL checking and replaces memcpy by assignments
v3 makes use of the new permission checking

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |   60 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index eb301fe..10ad02b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1242,6 +1242,62 @@ static int nested_svm_do(struct vcpu_svm *svm,
 	return retval;
 }
 
+static int nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
+{
+	to_vmcb->save.fs = from_vmcb->save.fs;
+	to_vmcb->save.gs = from_vmcb->save.gs;
+	to_vmcb->save.tr = from_vmcb->save.tr;
+	to_vmcb->save.ldtr = from_vmcb->save.ldtr;
+	to_vmcb->save.kernel_gs_base = from_vmcb->save.kernel_gs_base;
+	to_vmcb->save.star = from_vmcb->save.star;
+	to_vmcb->save.lstar = from_vmcb->save.lstar;
+	to_vmcb->save.cstar = from_vmcb->save.cstar;
+	to_vmcb->save.sfmask = from_vmcb->save.sfmask;
+	to_vmcb->save.sysenter_cs = from_vmcb->save.sysenter_cs;
+	to_vmcb->save.sysenter_esp = from_vmcb->save.sysenter_esp;
+	to_vmcb->save.sysenter_eip = from_vmcb->save.sysenter_eip;
+
+	return 1;
+}
+
+static int nested_svm_vmload(struct vcpu_svm *svm, void *nested_vmcb,
+			     void *arg2, void *opaque)
+{
+	return nested_svm_vmloadsave((struct vmcb *)nested_vmcb, svm->vmcb);
+}
+
+static int nested_svm_vmsave(struct vcpu_svm *svm, void *nested_vmcb,
+			     void *arg2, void *opaque)
+{
+	return nested_svm_vmloadsave(svm->vmcb, (struct vmcb *)nested_vmcb);
+}
+
+static int vmload_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	if (nested_svm_check_permissions(svm))
+		return 1;
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	nested_svm_do(svm, svm->vmcb->save.rax, 0, NULL, nested_svm_vmload);
+
+	return 1;
+}
+
+static int vmsave_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	if (nested_svm_check_permissions(svm))
+		return 1;
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	nested_svm_do(svm, svm->vmcb->save.rax, 0, NULL, nested_svm_vmsave);
+
+	return 1;
+}
+
 static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 {
 	if (nested_svm_check_permissions(svm))
@@ -1576,8 +1632,8 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_SHUTDOWN]			= shutdown_interception,
 	[SVM_EXIT_VMRUN]			= invalid_op_interception,
 	[SVM_EXIT_VMMCALL]			= vmmcall_interception,
-	[SVM_EXIT_VMLOAD]			= invalid_op_interception,
-	[SVM_EXIT_VMSAVE]			= invalid_op_interception,
+	[SVM_EXIT_VMLOAD]			= vmload_interception,
+	[SVM_EXIT_VMSAVE]			= vmsave_interception,
 	[SVM_EXIT_STGI]				= stgi_interception,
 	[SVM_EXIT_CLGI]				= clgi_interception,
 	[SVM_EXIT_SKINIT]			= invalid_op_interception,
-- 
1.5.6



* [PATCH 07/10] Add VMRUN handler v4
  2008-10-15 19:31           ` [PATCH 06/10] Add VMLOAD and VMSAVE handlers v4 Alexander Graf
@ 2008-10-15 19:31             ` Alexander Graf
  2008-10-15 19:31               ` [PATCH 08/10] Add VMEXIT handler and intercepts v4 Alexander Graf
  2008-10-19  9:51               ` [PATCH 07/10] Add VMRUN handler v4 Avi Kivity
  0 siblings, 2 replies; 24+ messages in thread
From: Alexander Graf @ 2008-10-15 19:31 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, avi, Alexander Graf

This patch implements VMRUN. VMRUN enters a virtual CPU and runs it
in the same context as the normal guest CPU would run in.
So basically it is implemented the same way a real CPU would do it.

We also prepare all intercepts that get OR'ed with the original
intercepts, as we do not allow a level 2 guest to be intercepted less
than the first-level guest.

v2 implements the following improvements:

- fixes the CPL check
- does not allocate iopm when not used
- remembers the host's IF in the HIF bit in the hflags

v3:

- make use of the new permission checking
- add support for V_INTR_MASKING_MASK

v4:

- use host page backed hsave

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/kvm_svm.h     |   10 ++
 arch/x86/kvm/svm.c         |  198 +++++++++++++++++++++++++++++++++++++++++++-
 include/asm-x86/kvm_host.h |    2 +
 3 files changed, 208 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h
index 40cb128..a647602 100644
--- a/arch/x86/kvm/kvm_svm.h
+++ b/arch/x86/kvm/kvm_svm.h
@@ -42,6 +42,16 @@ struct vcpu_svm {
 
 	u32 *msrpm;
 	struct vmcb *hsave;
+
+	u64 nested_vmcb;
+
+	/* These are the merged vectors */
+	u32 *nested_msrpm;
+	u32 *nested_iopm;
+
+	/* gpa pointers to the real vectors */
+	u64 nested_vmcb_msrpm;
+	u64 nested_vmcb_iopm;
 };
 
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 10ad02b..6520bff 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -51,6 +51,9 @@ MODULE_LICENSE("GPL");
 /* Turn on to get debugging output*/
 /* #define NESTED_DEBUG */
 
+/* Not needed until device passthrough */
+/* #define NESTED_KVM_MERGE_IOPM */
+
 #ifdef NESTED_DEBUG
 #define nsvm_printk(fmt, args...) printk(KERN_INFO fmt, ## args)
 #else
@@ -76,6 +79,11 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 	return container_of(vcpu, struct vcpu_svm, vcpu);
 }
 
+static inline bool is_nested(struct vcpu_svm *svm)
+{
+	return svm->nested_vmcb;
+}
+
 static unsigned long iopm_base;
 
 struct kvm_ldttss_desc {
@@ -615,6 +623,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	}
 	force_new_asid(&svm->vcpu);
 
+	svm->nested_vmcb = 0;
 	svm->vcpu.arch.hflags = HF_GIF_MASK;
 }
 
@@ -641,6 +650,10 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	struct page *page;
 	struct page *msrpm_pages;
 	struct page *hsave_page;
+	struct page *nested_msrpm_pages;
+#ifdef NESTED_KVM_MERGE_IOPM
+	struct page *nested_iopm_pages;
+#endif
 	int err;
 
 	svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
@@ -663,6 +676,17 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
 	if (!msrpm_pages)
 		goto uninit;
+
+	nested_msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
+	if (!nested_msrpm_pages)
+		goto uninit;
+
+#ifdef NESTED_KVM_MERGE_IOPM
+	nested_iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER);
+	if (!nested_iopm_pages)
+		goto uninit;
+#endif
+
 	svm->msrpm = page_address(msrpm_pages);
 	svm_vcpu_init_msrpm(svm->msrpm);
 
@@ -671,6 +695,11 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 		goto uninit;
 	svm->hsave = page_address(hsave_page);
 
+	svm->nested_msrpm = page_address(nested_msrpm_pages);
+#ifdef NESTED_KVM_MERGE_IOPM
+	svm->nested_iopm = page_address(nested_iopm_pages);
+#endif
+
 	svm->vmcb = page_address(page);
 	clear_page(svm->vmcb);
 	svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
@@ -701,6 +730,10 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT));
 	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
 	__free_page(virt_to_page(svm->hsave));
+	__free_pages(virt_to_page(svm->nested_msrpm), MSRPM_ALLOC_ORDER);
+#ifdef NESTED_KVM_MERGE_IOPM
+	__free_pages(virt_to_page(svm->nested_iopm), IOPM_ALLOC_ORDER);
+#endif
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
 }
@@ -1242,6 +1275,138 @@ static int nested_svm_do(struct vcpu_svm *svm,
 	return retval;
 }
 
+
+static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1,
+				  void *arg2, void *opaque)
+{
+	int i;
+	u32 *nested_msrpm = (u32*)arg1;
+	for (i=0; i< PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER) / 4; i++)
+		svm->nested_msrpm[i] = svm->msrpm[i] | nested_msrpm[i];
+	svm->vmcb->control.msrpm_base_pa = __pa(svm->nested_msrpm);
+
+	return 0;
+}
+
+#ifdef NESTED_KVM_MERGE_IOPM
+static int nested_svm_vmrun_iopm(struct vcpu_svm *svm, void *arg1,
+				 void *arg2, void *opaque)
+{
+	int i;
+	u32 *nested_iopm = (u32*)arg1;
+	u32 *iopm = (u32*)__va(iopm_base);
+	for (i=0; i< PAGE_SIZE * (1 << IOPM_ALLOC_ORDER) / 4; i++)
+		svm->nested_iopm[i] = iopm[i] | nested_iopm[i];
+	svm->vmcb->control.iopm_base_pa = __pa(svm->nested_iopm);
+
+	return 0;
+}
+#endif
+
+static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1,
+			    void *arg2, void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	struct vmcb *hsave = svm->hsave;
+
+	/* nested_vmcb is our indicator if nested SVM is activated */
+	svm->nested_vmcb = svm->vmcb->save.rax;
+
+	/* Clear internal status */
+	svm->vcpu.arch.exception.pending = false;
+
+	/* Save the old vmcb, so we don't need to pick what we save, but
+	   can restore everything when a VMEXIT occurs */
+	memcpy(hsave, svm->vmcb, sizeof(struct vmcb));
+	/* We need to remember the original CR3 in the SPT case */
+	if (!npt_enabled)
+		hsave->save.cr3 = svm->vcpu.arch.cr3;
+	hsave->save.rip = svm->next_rip;
+
+	if (svm->vmcb->save.rflags & X86_EFLAGS_IF)
+		svm->vcpu.arch.hflags |= HF_HIF_MASK;
+	else
+		svm->vcpu.arch.hflags &= ~HF_HIF_MASK;
+
+	/* Load the nested guest state */
+	svm->vmcb->save.es = nested_vmcb->save.es;
+	svm->vmcb->save.cs = nested_vmcb->save.cs;
+	svm->vmcb->save.ss = nested_vmcb->save.ss;
+	svm->vmcb->save.ds = nested_vmcb->save.ds;
+	svm->vmcb->save.gdtr = nested_vmcb->save.gdtr;
+	svm->vmcb->save.idtr = nested_vmcb->save.idtr;
+	svm->vmcb->save.rflags = nested_vmcb->save.rflags;
+	svm_set_efer(&svm->vcpu, nested_vmcb->save.efer);
+	svm_set_cr0(&svm->vcpu, nested_vmcb->save.cr0);
+	svm_set_cr4(&svm->vcpu, nested_vmcb->save.cr4);
+	if (npt_enabled) {
+		svm->vmcb->save.cr3 = nested_vmcb->save.cr3;
+		svm->vcpu.arch.cr3 = nested_vmcb->save.cr3;
+	} else {
+		kvm_set_cr3(&svm->vcpu, nested_vmcb->save.cr3);
+		kvm_mmu_reset_context(&svm->vcpu);
+	}
+	svm->vmcb->save.cr2 = nested_vmcb->save.cr2;
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, nested_vmcb->save.rax);
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RSP, nested_vmcb->save.rsp);
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, nested_vmcb->save.rip);
+	/* In case we don't even reach vcpu_run, the fields are not updated */
+	svm->vmcb->save.rax = nested_vmcb->save.rax;
+	svm->vmcb->save.rsp = nested_vmcb->save.rsp;
+	svm->vmcb->save.rip = nested_vmcb->save.rip;
+	svm->vmcb->save.dr7 = nested_vmcb->save.dr7;
+	svm->vmcb->save.dr6 = nested_vmcb->save.dr6;
+	svm->vmcb->save.cpl = nested_vmcb->save.cpl;
+
+	/* We don't want a nested guest to be more powerful than the guest,
+	   so all intercepts are ORed */
+	svm->vmcb->control.intercept_cr_read |=
+		nested_vmcb->control.intercept_cr_read;
+	svm->vmcb->control.intercept_cr_write |=
+		nested_vmcb->control.intercept_cr_write;
+	svm->vmcb->control.intercept_dr_read |=
+		nested_vmcb->control.intercept_dr_read;
+	svm->vmcb->control.intercept_dr_write |=
+		nested_vmcb->control.intercept_dr_write;
+	svm->vmcb->control.intercept_exceptions |=
+		nested_vmcb->control.intercept_exceptions;
+
+	svm->vmcb->control.intercept |= nested_vmcb->control.intercept;
+
+	svm->nested_vmcb_msrpm = nested_vmcb->control.msrpm_base_pa;
+	svm->nested_vmcb_iopm = nested_vmcb->control.iopm_base_pa;
+
+	force_new_asid(&svm->vcpu);
+	svm->vmcb->control.exit_int_info = nested_vmcb->control.exit_int_info;
+	svm->vmcb->control.exit_int_info_err = nested_vmcb->control.exit_int_info_err;
+	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
+	if (nested_vmcb->control.int_ctl & V_IRQ_MASK) {
+		nsvm_printk("nSVM Injecting Interrupt: 0x%x\n",
+				nested_vmcb->control.int_ctl);
+	}
+	if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
+		svm->vcpu.arch.hflags |= HF_VINTR_MASK;
+	else
+		svm->vcpu.arch.hflags &= ~HF_VINTR_MASK;
+
+	nsvm_printk("nSVM exit_int_info: 0x%x | int_state: 0x%x\n",
+			nested_vmcb->control.exit_int_info,
+			nested_vmcb->control.int_state);
+
+	svm->vmcb->control.int_vector = nested_vmcb->control.int_vector;
+	svm->vmcb->control.int_state = nested_vmcb->control.int_state;
+	svm->vmcb->control.tsc_offset += nested_vmcb->control.tsc_offset;
+	if (nested_vmcb->control.event_inj & SVM_EVTINJ_VALID)
+		nsvm_printk("Injecting Event: 0x%x\n",
+				nested_vmcb->control.event_inj);
+	svm->vmcb->control.event_inj = nested_vmcb->control.event_inj;
+	svm->vmcb->control.event_inj_err = nested_vmcb->control.event_inj_err;
+
+	svm->vcpu.arch.hflags |= HF_GIF_MASK;
+
+	return 1;
+}
+
 static int nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
 {
 	to_vmcb->save.fs = from_vmcb->save.fs;
@@ -1298,6 +1463,34 @@ static int vmsave_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }
 
+static int vmrun_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	nsvm_printk("VMrun\n");
+	if (nested_svm_check_permissions(svm))
+		return 1;
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	if (nested_svm_do(svm, svm->vmcb->save.rax, 0,
+			  NULL, nested_svm_vmrun))
+		return 1;
+
+	/* Right now SVM always intercepts all IOs except for the DEBUG port.
+	   This is fine for the nested VM too IMHO. */
+#ifdef NESTED_KVM_MERGE_IOPM
+	if (nested_svm_do(svm, svm->vmcb->control.iopm_base_pa, 0,
+		      NULL, nested_svm_vmrun_iopm))
+		return 1;
+#endif
+
+	if (nested_svm_do(svm, svm->vmcb->control.msrpm_base_pa, 0,
+		      NULL, nested_svm_vmrun_msrpm))
+		return 1;
+
+	return 1;
+}
+
 static int stgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 {
 	if (nested_svm_check_permissions(svm))
@@ -1630,7 +1823,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_MSR]				= msr_interception,
 	[SVM_EXIT_TASK_SWITCH]			= task_switch_interception,
 	[SVM_EXIT_SHUTDOWN]			= shutdown_interception,
-	[SVM_EXIT_VMRUN]			= invalid_op_interception,
+	[SVM_EXIT_VMRUN]			= vmrun_interception,
 	[SVM_EXIT_VMMCALL]			= vmmcall_interception,
 	[SVM_EXIT_VMLOAD]			= vmload_interception,
 	[SVM_EXIT_VMSAVE]			= vmsave_interception,
@@ -1937,7 +2130,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	svm->host_cr2 = kvm_read_cr2();
 	svm->host_dr6 = read_dr6();
 	svm->host_dr7 = read_dr7();
-	svm->vmcb->save.cr2 = vcpu->arch.cr2;
+	if (!is_nested(svm))
+		svm->vmcb->save.cr2 = vcpu->arch.cr2;
 	/* required for live migration with NPT */
 	if (npt_enabled)
 		svm->vmcb->save.cr3 = vcpu->arch.cr3;
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index dc30b7b..7469d96 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -736,6 +736,8 @@ enum {
 };
 
 #define HF_GIF_MASK		(1 << 0)
+#define HF_HIF_MASK		(1 << 1)
+#define HF_VINTR_MASK		(1 << 2)
 
 /*
  * Hardware virtualization extension instructions may fault if a
-- 
1.5.6


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 08/10] Add VMEXIT handler and intercepts v4
  2008-10-15 19:31             ` [PATCH 07/10] Add VMRUN handler v4 Alexander Graf
@ 2008-10-15 19:31               ` Alexander Graf
  2008-10-15 19:31                 ` [PATCH 09/10] allow read access to MSR_VM_VR Alexander Graf
  2008-10-19  9:51               ` [PATCH 07/10] Add VMRUN handler v4 Avi Kivity
  1 sibling, 1 reply; 24+ messages in thread
From: Alexander Graf @ 2008-10-15 19:31 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, avi, Alexander Graf

This adds the #VMEXIT intercept, so we return to the level 1 guest
when something happens in the level 2 guest that the level 1 guest
wants to handle.

v2 implements HIF handling and cleans up exception interception
v3 adds support for V_INTR_MASKING_MASK
v4 uses the host page hsave

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |  326 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 326 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6520bff..f5c1ffd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -74,6 +74,13 @@ module_param(npt, int, S_IRUGO);
 static void kvm_reput_irq(struct vcpu_svm *svm);
 static void svm_flush_tlb(struct kvm_vcpu *vcpu);
 
+static int nested_svm_exit_handled(struct vcpu_svm *svm, bool kvm_override);
+static int nested_svm_vmexit(struct vcpu_svm *svm);
+static int nested_svm_vmsave(struct vcpu_svm *svm, void *nested_vmcb,
+			     void *arg2, void *opaque);
+static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
+				      bool has_error_code, u32 error_code);
+
 static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 {
 	return container_of(vcpu, struct vcpu_svm, vcpu);
@@ -223,6 +230,11 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	/* If we are within a nested VM we'd better #VMEXIT and let the
+	   guest handle the exception */
+	if (nested_svm_check_exception(svm, nr, has_error_code, error_code))
+		return;
+
 	svm->vmcb->control.event_inj = nr
 		| SVM_EVTINJ_VALID
 		| (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
@@ -1215,6 +1227,46 @@ static int nested_svm_check_permissions(struct vcpu_svm *svm)
        return 0;
 }
 
+static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
+				      bool has_error_code, u32 error_code)
+{
+	if (is_nested(svm)) {
+		svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
+		svm->vmcb->control.exit_code_hi = 0;
+		svm->vmcb->control.exit_info_1 = error_code;
+		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
+		if (nested_svm_exit_handled(svm, false)) {
+			nsvm_printk("VMexit -> EXCP 0x%x\n", nr);
+
+			nested_svm_vmexit(svm);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
+static inline int nested_svm_intr(struct vcpu_svm *svm)
+{
+	if (is_nested(svm)) {
+		if (!(svm->vcpu.arch.hflags & HF_VINTR_MASK))
+			return 0;
+
+		if (!(svm->vcpu.arch.hflags & HF_HIF_MASK))
+			return 0;
+
+		svm->vmcb->control.exit_code = SVM_EXIT_INTR;
+
+		if (nested_svm_exit_handled(svm, false)) {
+			nsvm_printk("VMexit -> INTR\n");
+			nested_svm_vmexit(svm);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
 static struct page *nested_svm_get_page(struct vcpu_svm *svm, u64 gpa)
 {
 	struct page *page;
@@ -1275,6 +1327,261 @@ static int nested_svm_do(struct vcpu_svm *svm,
 	return retval;
 }
 
+static int nested_svm_exit_handled_real(struct vcpu_svm *svm,
+					void *arg1,
+					void *arg2,
+					void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	bool kvm_overrides = *(bool *)opaque;
+	u32 exit_code = svm->vmcb->control.exit_code;
+
+	if (kvm_overrides) {
+		switch (exit_code) {
+		case SVM_EXIT_INTR:
+		case SVM_EXIT_NMI:
+			return 0;
+		/* For now we are always handling NPFs when using them */
+		case SVM_EXIT_NPF:
+			if (npt_enabled)
+				return 0;
+			break;
+		/* When we're shadowing, trap PFs */
+		case SVM_EXIT_EXCP_BASE + PF_VECTOR:
+			if (!npt_enabled)
+				return 0;
+			break;
+		default:
+			break;
+		}
+	}
+
+	switch (exit_code) {
+	case SVM_EXIT_READ_CR0 ... SVM_EXIT_READ_CR8: {
+		u32 cr_bits = 1 << (exit_code - SVM_EXIT_READ_CR0);
+		if (nested_vmcb->control.intercept_cr_read & cr_bits)
+			return 1;
+		break;
+	}
+	case SVM_EXIT_WRITE_CR0 ... SVM_EXIT_WRITE_CR8: {
+		u32 cr_bits = 1 << (exit_code - SVM_EXIT_WRITE_CR0);
+		if (nested_vmcb->control.intercept_cr_write & cr_bits)
+			return 1;
+		break;
+	}
+	case SVM_EXIT_READ_DR0 ... SVM_EXIT_READ_DR7: {
+		u32 dr_bits = 1 << (exit_code - SVM_EXIT_READ_DR0);
+		if (nested_vmcb->control.intercept_dr_read & dr_bits)
+			return 1;
+		break;
+	}
+	case SVM_EXIT_WRITE_DR0 ... SVM_EXIT_WRITE_DR7: {
+		u32 dr_bits = 1 << (exit_code - SVM_EXIT_WRITE_DR0);
+		if (nested_vmcb->control.intercept_dr_write & dr_bits)
+			return 1;
+		break;
+	}
+	case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
+		u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
+		if (nested_vmcb->control.intercept_exceptions & excp_bits)
+			return 1;
+		break;
+	}
+	default: {
+		u64 exit_bits = 1ULL << (exit_code - SVM_EXIT_INTR);
+		nsvm_printk("exit code: 0x%x\n", exit_code);
+		if (nested_vmcb->control.intercept & exit_bits)
+			return 1;
+	}
+	}
+
+	return 0;
+}
+
+#ifdef NESTED_KVM_MERGE_IOPM
+static int nested_svm_exit_handled_io(struct vcpu_svm *svm,
+				      void *arg1, void *arg2,
+				      void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	u16 param = (u16)(svm->vmcb->control.exit_info_1);
+	u16 port = (u16)(svm->vmcb->control.exit_info_1 >> 16);
+	u16 mask = (1 << ((param >> 4) & 7)) - 1;
+	u8 *iopm = (u8 *)arg2 + (port / 8);
+	u16 iopmw = iopm[0] | (iopm[1] << 8);
+
+	if (!(nested_vmcb->control.intercept & (1ULL << INTERCEPT_IOIO_PROT)))
+		return 0;
+	if (iopmw & (mask << (port & 7)))
+		return 1;
+
+	nsvm_printk("nKVM: No IO-Intercept on param=0x%hx port=0x%hx "
+		    "mask=0x%hx iopm=0x%hx\n", param, port, mask, iopmw);
+
+	return 0;
+}
+#endif
+
+static int nested_svm_exit_handled_msr(struct vcpu_svm *svm,
+				       void *arg1, void *arg2,
+				       void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	u8 *msrpm = (u8 *)arg2;
+	u32 t0, t1;
+	u32 msr = svm->vcpu.arch.regs[VCPU_REGS_RCX];
+	u32 param = svm->vmcb->control.exit_info_1 & 1;
+
+	if (!(nested_vmcb->control.intercept & (1ULL << INTERCEPT_MSR_PROT)))
+		return 0;
+
+	switch(msr) {
+	case 0 ... 0x1fff:
+		t0 = (msr * 2) % 8;
+		t1 = msr * 2 / 8;
+		break;
+	case 0xc0000000 ... 0xc0001fff:
+		t0 = (8192 + msr - 0xc0000000) * 2;
+		t1 = (t0 / 8);
+		t0 %= 8;
+		break;
+	case 0xc0010000 ... 0xc0011fff:
+		t0 = (16384 + msr - 0xc0010000) * 2;
+		t1 = (t0 / 8);
+		t0 %= 8;
+		break;
+	default:
+		return 1;
+		break;
+	}
+	if (msrpm[t1] & ((1 << param) << t0))
+		return 1;
+
+	return 0;
+}
+
+static int nested_svm_exit_handled(struct vcpu_svm *svm, bool kvm_override)
+{
+	bool k = kvm_override;
+
+	switch (svm->vmcb->control.exit_code) {
+#ifdef NESTED_KVM_MERGE_IOPM
+	case SVM_EXIT_IOIO:
+		return nested_svm_do(svm, svm->nested_vmcb,
+				     svm->nested_vmcb_iopm, NULL,
+				     nested_svm_exit_handled_io);
+		break;
+#endif
+	case SVM_EXIT_MSR:
+		return nested_svm_do(svm, svm->nested_vmcb,
+				     svm->nested_vmcb_msrpm, NULL,
+				     nested_svm_exit_handled_msr);
+	default: break;
+	}
+
+	return nested_svm_do(svm, svm->nested_vmcb, 0, &k,
+			     nested_svm_exit_handled_real);
+}
+
+static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1,
+				  void *arg2, void *opaque)
+{
+	struct vmcb *nested_vmcb = (struct vmcb *)arg1;
+	struct vmcb *hsave = svm->hsave;
+	u64 nested_save[] = { nested_vmcb->save.cr0,
+			      nested_vmcb->save.cr3,
+			      nested_vmcb->save.cr4,
+			      nested_vmcb->save.efer,
+			      nested_vmcb->control.intercept_cr_read,
+			      nested_vmcb->control.intercept_cr_write,
+			      nested_vmcb->control.intercept_dr_read,
+			      nested_vmcb->control.intercept_dr_write,
+			      nested_vmcb->control.intercept_exceptions,
+			      nested_vmcb->control.intercept,
+			      nested_vmcb->control.msrpm_base_pa,
+			      nested_vmcb->control.iopm_base_pa,
+			      nested_vmcb->control.tsc_offset };
+
+	/* Give the current vmcb to the guest */
+	memcpy(nested_vmcb, svm->vmcb, sizeof(struct vmcb));
+	nested_vmcb->save.cr0 = nested_save[0];
+	if (!npt_enabled)
+		nested_vmcb->save.cr3 = nested_save[1];
+	nested_vmcb->save.cr4 = nested_save[2];
+	nested_vmcb->save.efer = nested_save[3];
+	nested_vmcb->control.intercept_cr_read = nested_save[4];
+	nested_vmcb->control.intercept_cr_write = nested_save[5];
+	nested_vmcb->control.intercept_dr_read = nested_save[6];
+	nested_vmcb->control.intercept_dr_write = nested_save[7];
+	nested_vmcb->control.intercept_exceptions = nested_save[8];
+	nested_vmcb->control.intercept = nested_save[9];
+	nested_vmcb->control.msrpm_base_pa = nested_save[10];
+	nested_vmcb->control.iopm_base_pa = nested_save[11];
+	nested_vmcb->control.tsc_offset = nested_save[12];
+
+	/* We always set V_INTR_MASKING and remember the old value in hflags */
+	if (!(svm->vcpu.arch.hflags & HF_VINTR_MASK))
+		nested_vmcb->control.int_ctl &= ~V_INTR_MASKING_MASK;
+
+	if ((nested_vmcb->control.int_ctl & V_IRQ_MASK) &&
+	    (nested_vmcb->control.int_vector)) {
+		nsvm_printk("WARNING: IRQ 0x%x still enabled on #VMEXIT\n",
+				nested_vmcb->control.int_vector);
+	}
+
+	/* Restore the original control entries */
+	svm->vmcb->control = hsave->control;
+
+	/* Flush the virtual TLB */
+	force_new_asid(&svm->vcpu);
+
+	/* Kill any pending exceptions */
+	if (svm->vcpu.arch.exception.pending)
+		nsvm_printk("WARNING: Pending Exception\n");
+	svm->vcpu.arch.exception.pending = false;
+
+	/* Restore selected save entries */
+	svm->vmcb->save.es = hsave->save.es;
+	svm->vmcb->save.cs = hsave->save.cs;
+	svm->vmcb->save.ss = hsave->save.ss;
+	svm->vmcb->save.ds = hsave->save.ds;
+	svm->vmcb->save.gdtr = hsave->save.gdtr;
+	svm->vmcb->save.idtr = hsave->save.idtr;
+	svm->vmcb->save.rflags = hsave->save.rflags;
+	svm_set_efer(&svm->vcpu, hsave->save.efer);
+	svm_set_cr0(&svm->vcpu, hsave->save.cr0 | X86_CR0_PE);
+	svm_set_cr4(&svm->vcpu, hsave->save.cr4);
+	if (npt_enabled) {
+		svm->vmcb->save.cr3 = hsave->save.cr3;
+		svm->vcpu.arch.cr3 = hsave->save.cr3;
+	} else {
+		kvm_set_cr3(&svm->vcpu, hsave->save.cr3);
+	}
+	kvm_mmu_reset_context(&svm->vcpu);
+	kvm_mmu_load(&svm->vcpu);
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, hsave->save.rax);
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RSP, hsave->save.rsp);
+	kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, hsave->save.rip);
+	svm->vmcb->save.dr7 = 0;
+	svm->vmcb->save.cpl = 0;
+	svm->vmcb->control.exit_int_info = 0;
+
+	svm->vcpu.arch.hflags &= ~HF_GIF_MASK;
+	/* Exit nested SVM mode */
+	svm->nested_vmcb = 0;
+
+	return 0;
+}
+
+static int nested_svm_vmexit(struct vcpu_svm *svm)
+{
+	nsvm_printk("VMexit\n");
+	if (nested_svm_do(svm, svm->nested_vmcb, NULL,
+			  NULL, nested_svm_vmexit_real))
+		return 1;
+
+	return 0;
+}
 
 static int nested_svm_vmrun_msrpm(struct vcpu_svm *svm, void *arg1,
 				  void *arg2, void *opaque)
@@ -1844,6 +2151,17 @@ static int handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	KVMTRACE_3D(VMEXIT, vcpu, exit_code, (u32)svm->vmcb->save.rip,
 		    (u32)((u64)svm->vmcb->save.rip >> 32), entryexit);
 
+	if (is_nested(svm)) {
+		nsvm_printk("nested handle_exit: 0x%x | 0x%lx | 0x%lx | 0x%lx\n",
+			    exit_code, svm->vmcb->control.exit_info_1,
+			    svm->vmcb->control.exit_info_2, svm->vmcb->save.rip);
+		if (nested_svm_exit_handled(svm, true)) {
+			nested_svm_vmexit(svm);
+			nsvm_printk("-> #VMEXIT\n");
+			return 1;
+		}
+	}
+
 	if (npt_enabled) {
 		int mmu_reload = 0;
 		if ((vcpu->arch.cr0 ^ svm->vmcb->save.cr0) & X86_CR0_PG) {
@@ -1931,6 +2249,8 @@ static void svm_set_irq(struct kvm_vcpu *vcpu, int irq)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	nested_svm_intr(svm);
+
 	svm_inject_irq(svm, irq);
 }
 
@@ -1976,6 +2296,9 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
 	if (!kvm_cpu_has_interrupt(vcpu))
 		goto out;
 
+	if (nested_svm_intr(svm))
+		goto out;
+
 	if (!(svm->vcpu.arch.hflags & HF_GIF_MASK))
 		goto out;
 
@@ -2028,6 +2351,9 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb_control_area *control = &svm->vmcb->control;
 
+	if (nested_svm_intr(svm))
+		return;
+
 	svm->vcpu.arch.interrupt_window_open =
 		(!(control->int_state & SVM_INTERRUPT_SHADOW_MASK) &&
 		 (svm->vmcb->save.rflags & X86_EFLAGS_IF) &&
-- 
1.5.6
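[Editor's note] The MSR permission map lookup in nested_svm_exit_handled_msr() above packs two intercept bits per MSR (read, then write) into three regions of the bitmap. A minimal standalone sketch of that index calculation (function and parameter names are illustrative, not KVM's):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the SVM MSR-permission-map index calculation: each MSR owns
 * two consecutive bits (read, then write) in one of three regions.
 * Returns 0 and fills the byte/bit offsets, or -1 for MSRs outside the
 * mapped ranges, which the handler above treats as "always intercept".
 */
static int msrpm_offset(uint32_t msr, uint32_t *byte, uint32_t *bit)
{
	uint32_t bitnr;

	if (msr <= 0x1fff)			/* 0x00000000..0x00001fff */
		bitnr = msr * 2;
	else if (msr - 0xc0000000u <= 0x1fff)	/* 0xc0000000..0xc0001fff */
		bitnr = (8192 + msr - 0xc0000000u) * 2;
	else if (msr - 0xc0010000u <= 0x1fff)	/* 0xc0010000..0xc0011fff */
		bitnr = (16384 + msr - 0xc0010000u) * 2;
	else
		return -1;

	*byte = bitnr / 8;
	*bit  = bitnr % 8;
	return 0;
}
```

The read-intercept bit is then `msrpm[byte] & (1 << bit)` and the write bit sits one position higher, matching the `(1 << param) << t0` test in the patch.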



* [PATCH 09/10] allow read access to MSR_VM_VR
  2008-10-15 19:31               ` [PATCH 08/10] Add VMEXIT handler and intercepts v4 Alexander Graf
@ 2008-10-15 19:31                 ` Alexander Graf
  2008-10-15 19:31                   ` [PATCH 10/10] Allow setting the SVME bit v4 Alexander Graf
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Graf @ 2008-10-15 19:31 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, avi, Joerg Roedel, Alexander Graf

From: Joerg Roedel <joerg.roedel@amd.com>

KVM tries to read the VM_CR MSR to find out if SVM was disabled by
the BIOS. So implement read support for this MSR to make nested
SVM work.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f5c1ffd..d2fe440 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1942,6 +1942,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
 	case MSR_VM_HSAVE_PA:
 		*data = 0;
 		break;
+	case MSR_VM_CR:
+		*data = 0;
+		break;
 	default:
 		return kvm_get_msr_common(vcpu, ecx, data);
 	}
-- 
1.5.6
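[Editor's note] For context: what the guest is interested in is the SVMDIS bit of VM_CR, which firmware can set (and lock) to forbid SVM. Returning 0 therefore reports "SVM not disabled by the BIOS". A hedged sketch of the check a nested KVM effectively performs, with bit positions as given in the AMD manual (verify before relying on them):

```c
#include <assert.h>
#include <stdint.h>

/* VM_CR bit positions (per the AMD programmer's manual) */
#define VM_CR_LOCK	(1ULL << 3)	/* SVMDIS locked against changes */
#define VM_CR_SVMDIS	(1ULL << 4)	/* SVM disabled by the BIOS */

/* What a guest KVM effectively checks after reading VM_CR */
static int svm_disabled_by_bios(uint64_t vm_cr)
{
	return (vm_cr & VM_CR_SVMDIS) != 0;
}
```

With the patch above, the guest always reads VM_CR as 0, so this check always reports SVM as available.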



* [PATCH 10/10] Allow setting the SVME bit v4
  2008-10-15 19:31                 ` [PATCH 09/10] allow read access to MSR_VM_VR Alexander Graf
@ 2008-10-15 19:31                   ` Alexander Graf
  0 siblings, 0 replies; 24+ messages in thread
From: Alexander Graf @ 2008-10-15 19:31 UTC (permalink / raw)
  To: kvm; +Cc: joro, anthony, avi, Alexander Graf

Normally setting the SVME bit in EFER is not allowed, as we did
not support SVM. Now that we do, we should also allow enabling
SVM mode.

v2 comes as last patch, so we don't enable half-ready code
v4 introduces a module option to enable SVM

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d2fe440..aac8149 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -71,6 +71,9 @@ static int npt = 1;
 
 module_param(npt, int, S_IRUGO);
 
+static int nested = 0;
+module_param(nested, int, S_IRUGO);
+
 static void kvm_reput_irq(struct vcpu_svm *svm);
 static void svm_flush_tlb(struct kvm_vcpu *vcpu);
 
@@ -460,6 +463,9 @@ static __init int svm_hardware_setup(void)
 	if (boot_cpu_has(X86_FEATURE_NX))
 		kvm_enable_efer_bits(EFER_NX);
 
+	if (nested)
+		kvm_enable_efer_bits(MSR_EFER_SVME_MASK);
+
 	for_each_online_cpu(cpu) {
 		r = svm_cpu_init(cpu);
 		if (r)
-- 
1.5.6
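[Editor's note] A minimal sketch of the opt-in gate this patch adds, stubbing KVM's reserved-EFER-bit bookkeeping (the real helper lives in x86.c): with nested=0 the SVME bit stays reserved, so guest writes that set it are rejected.

```c
#include <assert.h>
#include <stdint.h>

#define EFER_SVME (1ULL << 12)		/* EFER.SVME enables SVM */

/* Toy model: an EFER write is rejected if it sets a reserved bit.
 * Only SVME is tracked here for illustration. */
static uint64_t efer_reserved_bits = EFER_SVME;

static void kvm_enable_efer_bits(uint64_t mask)
{
	efer_reserved_bits &= ~mask;
}

static int efer_write_valid(uint64_t efer)
{
	return (efer & efer_reserved_bits) == 0;
}

static void svm_hardware_setup(int nested)
{
	if (nested)			/* as in the patch above */
		kvm_enable_efer_bits(EFER_SVME);
}
```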



* Re: [PATCH 01/10] Add CPUID feature flag for SVM v4
  2008-10-15 19:31 ` [PATCH 01/10] Add CPUID feature flag for SVM v4 Alexander Graf
  2008-10-15 19:31   ` [PATCH 02/10] Clean up VINTR setting v4 Alexander Graf
@ 2008-10-16 11:25   ` Alexander Graf
  1 sibling, 0 replies; 24+ messages in thread
From: Alexander Graf @ 2008-10-16 11:25 UTC (permalink / raw)
  To: Alexander Graf
  Cc: KVM list, Joerg Roedel, Anthony Liguori <anthony@codemonkey.ws>, Avi Kivity


On 15.10.2008, at 21:31, Alexander Graf wrote:

> This patch adds the CPUID feature flag for SVM in the x86 Linux  
> headers.

Looks like the full leaf 0x80000001 found its way into mainline, so  
this patch is not necessary anymore.

Alex

>
>
> Signed-off-by: Alexander Graf <agraf@suse.de>
> ---
> include/asm-x86/cpufeature.h |    1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/include/asm-x86/cpufeature.h b/include/asm-x86/ 
> cpufeature.h
> index cfcfb0a..28217de 100644
> --- a/include/asm-x86/cpufeature.h
> +++ b/include/asm-x86/cpufeature.h
> @@ -110,6 +110,7 @@
> /* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */
> #define X86_FEATURE_LAHF_LM	(6*32+ 0) /* LAHF/SAHF in long mode */
> #define X86_FEATURE_CMP_LEGACY	(6*32+ 1) /* If yes HyperThreading  
> not valid */
> +#define X86_FEATURE_SVM		(6*32+ 2) /* Secure Virtual Machine */
> #define X86_FEATURE_IBS		(6*32+ 10) /* Instruction Based Sampling */
>
> /*
> -- 
> 1.5.6
>



* Re: [PATCH 04/10] Implement GIF, clgi and stgi v4
  2008-10-15 19:31       ` [PATCH 04/10] Implement GIF, clgi and stgi v4 Alexander Graf
  2008-10-15 19:31         ` [PATCH 05/10] Implement hsave v4 Alexander Graf
@ 2008-10-19  9:46         ` Avi Kivity
  2008-10-19  9:57           ` Alexander Graf
  1 sibling, 1 reply; 24+ messages in thread
From: Avi Kivity @ 2008-10-19  9:46 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony, avi

Alexander Graf wrote:
> This patch implements the GIF flag and the clgi and stgi instructions that
> set this flag. Only if the flag is set (default), interrupts can be received by
> the CPU.
>
> To keep the information about that somewhere, this patch adds a new hidden
> flags vector. that is used to store information that does not go into the
> vmcb, but is SVM specific.
>
> diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
> index 4b06ca8..dc30b7b 100644
> --- a/include/asm-x86/kvm_host.h
> +++ b/include/asm-x86/kvm_host.h
> @@ -253,6 +253,7 @@ struct kvm_vcpu_arch {
>  	unsigned long cr3;
>  	unsigned long cr4;
>  	unsigned long cr8;
> +	u32 hflags;
>  	u64 pdptrs[4]; /* pae */
>  	u64 shadow_efer;
>  	u64 apic_base;
> @@ -734,6 +735,8 @@ enum {
>  	TASK_SWITCH_GATE = 3,
>  };
>  
> +#define HF_GIF_MASK		(1 << 0)
> +

Why not use bool for gif?  It will improve code readability.


-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 07/10] Add VMRUN handler v4
  2008-10-15 19:31             ` [PATCH 07/10] Add VMRUN handler v4 Alexander Graf
  2008-10-15 19:31               ` [PATCH 08/10] Add VMEXIT handler and intercepts v4 Alexander Graf
@ 2008-10-19  9:51               ` Avi Kivity
  2008-10-19  9:58                 ` Alexander Graf
  1 sibling, 1 reply; 24+ messages in thread
From: Avi Kivity @ 2008-10-19  9:51 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony, avi

Alexander Graf wrote:
> This patch implements VMRUN. VMRUN enters a virtual CPU and runs that
> in the same context as the normal guest CPU would run.
> So basically it is implemented the same way, a normal CPU would do it.
>
> We also prepare all intercepts that get OR'ed with the original
> intercepts, as we do not allow a level 2 guest to be intercepted less
> than the first level guest.
> index 10ad02b..6520bff 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -51,6 +51,9 @@ MODULE_LICENSE("GPL");
>  /* Turn on to get debugging output*/
>  /* #define NESTED_DEBUG */
>  
> +/* Not needed until device passthrough */
> +/* #define NESTED_KVM_MERGE_IOPM */
> +
>   

Let's assume device assignment is coming and drop the ifdefs.


-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 00/10] Add support for nested SVM (kernel) v4
  2008-10-15 19:31 [PATCH 00/10] Add support for nested SVM (kernel) v4 Alexander Graf
  2008-10-15 19:31 ` [PATCH 01/10] Add CPUID feature flag for SVM v4 Alexander Graf
@ 2008-10-19  9:56 ` Avi Kivity
  2008-10-19 10:01   ` Alexander Graf
  1 sibling, 1 reply; 24+ messages in thread
From: Avi Kivity @ 2008-10-19  9:56 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony, avi

Alexander Graf wrote:
> The current generation of virtualization extensions only supports one VM layer.
> While we can't change that, it is pretty easy to emulate the CPU's behavior
> and implement the virtualization opcodes ourselves.
>
> This patchset does exactly this for SVM. Using it, KVM can run within a VM.
> Since we're emulating the real CPU's behavior, this should also enable other
> VMMs to run within KVM.
> So far I've only tested to run KVM inside the VM though.
>
> As always, comments and suggestions are highly welcome.
>
> v2 takes most comments from Avi into account.
>
> v3 addresses Joergs comments, including
>
> - V_INTR_MASKING support
> - a generic permission checking helper
>
> v4 addresses even more comments from Joerg, including
>
> - don't use the guest's hsave to store the guest's vmcb in
> - add nested=<int> flag for kvm-amd.ko, defaults to 0 (off)
> - include Joerg's VM_CR MSR patch
>
> To be usable, this patchset requires the two simple changes in the userspace
> part, that I sent to the list with the first version.
>
> Known issues:
>
> - TODO: #VMEXIT on save/restore
> - SMP l2 guests break with in-kernel-apic
>   

Looks ready to apply, though it would be good to get smp working.  
Defaulting to off relaxes some of the worries.

I assume the move to host-backed hsave fixes the security hole Joerg 
spotted?

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 04/10] Implement GIF, clgi and stgi v4
  2008-10-19  9:46         ` [PATCH 04/10] Implement GIF, clgi and stgi v4 Avi Kivity
@ 2008-10-19  9:57           ` Alexander Graf
  2008-10-19 10:04             ` Avi Kivity
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Graf @ 2008-10-19  9:57 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony, avi


On 19.10.2008, at 11:46, Avi Kivity wrote:

> Alexander Graf wrote:
>> This patch implements the GIF flag and the clgi and stgi  
>> instructions that
>> set this flag. Only if the flag is set (default), interrupts can be  
>> received by
>> the CPU.
>>
>> To keep the information about that somewhere, this patch adds a new  
>> hidden
>> flags vector. that is used to store information that does not go  
>> into the
>> vmcb, but is SVM specific.
>>
>> diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
>> index 4b06ca8..dc30b7b 100644
>> --- a/include/asm-x86/kvm_host.h
>> +++ b/include/asm-x86/kvm_host.h
>> @@ -253,6 +253,7 @@ struct kvm_vcpu_arch {
>> 	unsigned long cr3;
>> 	unsigned long cr4;
>> 	unsigned long cr8;
>> +	u32 hflags;
>> 	u64 pdptrs[4]; /* pae */
>> 	u64 shadow_efer;
>> 	u64 apic_base;
>> @@ -734,6 +735,8 @@ enum {
>> 	TASK_SWITCH_GATE = 3,
>> };
>> +#define HF_GIF_MASK		(1 << 0)
>> +
>
> Why not use bool for gif?  It will improve code readability.

This way the code is closer to what qemu cpu emulation is doing  
(probably no good reason). Also it allows for more flags to be added  
later on. I originally had a VMLOAD/VMRUN/VMSAVE merging hack that  
would set the hflag - until I realized that it's not SMP safe.
 
Nevertheless, I thought it might be better to have something that's  
easily extensible. If you think otherwise, I'd change it, but I like  
the hflags idea in general.

Alex
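[Editor's note] To illustrate the trade-off discussed here: the packed hflags word keeps all SVM-specific hidden state in one u32, at the cost of explicit mask constants. A sketch using the three masks the patches define (the helper functions are illustrative; KVM open-codes these bit operations):

```c
#include <assert.h>
#include <stdint.h>

/* Hidden-flags masks as defined in the patches */
#define HF_GIF_MASK	(1 << 0)	/* global interrupt flag */
#define HF_HIF_MASK	(1 << 1)	/* host IF state saved at VMRUN */
#define HF_VINTR_MASK	(1 << 2)	/* nested guest uses V_INTR_MASKING */

static void hf_set(uint32_t *hflags, uint32_t mask)   { *hflags |= mask; }
static void hf_clear(uint32_t *hflags, uint32_t mask) { *hflags &= ~mask; }
static int  hf_test(uint32_t hflags, uint32_t mask)   { return (hflags & mask) != 0; }
```

Adding a new flag costs one `#define`; the bool alternative Avi suggests would add one struct member per flag instead.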



* Re: [PATCH 07/10] Add VMRUN handler v4
  2008-10-19  9:51               ` [PATCH 07/10] Add VMRUN handler v4 Avi Kivity
@ 2008-10-19  9:58                 ` Alexander Graf
  2008-10-19 10:06                   ` Avi Kivity
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Graf @ 2008-10-19  9:58 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony, avi


On 19.10.2008, at 11:51, Avi Kivity wrote:

> Alexander Graf wrote:
>> This patch implements VMRUN. VMRUN enters a virtual CPU and runs that
>> in the same context as the normal guest CPU would run.
>> So basically it is implemented the same way, a normal CPU would do  
>> it.
>>
>> We also prepare all intercepts that get OR'ed with the original
>> intercepts, as we do not allow a level 2 guest to be intercepted less
>> than the first level guest.
>> index 10ad02b..6520bff 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -51,6 +51,9 @@ MODULE_LICENSE("GPL");
>> /* Turn on to get debugging output*/
>> /* #define NESTED_DEBUG */
>> +/* Not needed until device passthrough */
>> +/* #define NESTED_KVM_MERGE_IOPM */
>> +
>>
>
> Let's assume device assignment is coming and drop the ifdefs.

Will device assignment work with in-guest IO Passthrough or will it  
rather emulate the IO accesses in userspace by issuing them again?



* Re: [PATCH 00/10] Add support for nested SVM (kernel) v4
  2008-10-19  9:56 ` [PATCH 00/10] Add support for nested SVM (kernel) v4 Avi Kivity
@ 2008-10-19 10:01   ` Alexander Graf
  0 siblings, 0 replies; 24+ messages in thread
From: Alexander Graf @ 2008-10-19 10:01 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony, avi


On 19.10.2008, at 11:56, Avi Kivity wrote:

> Alexander Graf wrote:
>> The current generation of virtualization extensions only supports  
>> one VM layer.
>> While we can't change that, it is pretty easy to emulate the CPU's  
>> behavior
>> and implement the virtualization opcodes ourselves.
>>
>> This patchset does exactly this for SVM. Using it, KVM can run  
>> within a VM.
>> Since we're emulating the real CPU's behavior, this should also  
>> enable other
>> VMMs to run within KVM.
>> So far I've only tested to run KVM inside the VM though.
>>
>> As always, comments and suggestions are highly welcome.
>>
>> v2 takes most comments from Avi into account.
>>
>> v3 addresses Joergs comments, including
>>
>> - V_INTR_MASKING support
>> - a generic permission checking helper
>>
>> v4 addresses even more comments from Joerg, including
>>
>> - don't use the guest's hsave to store the guest's vmcb in
>> - add nested=<int> flag for kvm-amd.ko, defaults to 0 (off)
>> - include Joerg's VM_CR MSR patch
>>
>> To be usable, this patchset requires the two simple changes in the  
>> userspace
>> part, that I sent to the list with the first version.
>>
>> Known issues:
>>
>> - TODO: #VMEXIT on save/restore
>> - SMP l2 guests break with in-kernel-apic
>>
>
> Looks ready to apply, though it would be good to get smp working.   
> Defaulting to off relaxes some of the worries.

Oh, cool. I think as long as everything's guarded by EFER.SVME, and  
nested=0 by default, it really shouldn't bother anyone. Nothing except  
for svm.c should be affected.

> I assume the move to host-backed hsave fixes the security hole Joerg  
> spotted?

Yes. The problem was that potentially an l2 guest could have access to  
the hsave page when the l1 guest maps it into the allowed virtual  
memory of the l2 guest. If so, the l2 guest could modify the hsave,  
which is assumed to be not tampered with by the rest of the code.  
So by not exposing the actual hsave to the l1 guest, no l2 guest will  
ever be able to tamper with it.



* Re: [PATCH 04/10] Implement GIF, clgi and stgi v4
  2008-10-19  9:57           ` Alexander Graf
@ 2008-10-19 10:04             ` Avi Kivity
  0 siblings, 0 replies; 24+ messages in thread
From: Avi Kivity @ 2008-10-19 10:04 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
>>> };
>>> +#define HF_GIF_MASK        (1 << 0)
>>> +
>>
>> Why not use bool for gif?  It will improve code readability.
>
> This way the code is closer to what qemu cpu emulation is doing 
> (probably no good reason). Also it allows for more flags to be added 
> later on. I originally had a VMLOAD/VMRUN/VMSAVE merging hack, that 
> would set the hflag - until I realized that it's not SMP safe.
>

Well, bool is also extensible (add more bools...).

> Nevertheless, I thought it might be better to have something that's 
> easily extensible. If you think otherwise, I'd change it, but I like 
> the hflags idea in general.

It's a matter of personal preference; I guess it would be silly for me 
to insist on such a small point given the massive amount of work that 
has gone into this.  So I'll leave it up to you.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 07/10] Add VMRUN handler v4
  2008-10-19  9:58                 ` Alexander Graf
@ 2008-10-19 10:06                   ` Avi Kivity
  2008-10-19 10:13                     ` Alexander Graf
  0 siblings, 1 reply; 24+ messages in thread
From: Avi Kivity @ 2008-10-19 10:06 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony, avi

Alexander Graf wrote:
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -51,6 +51,9 @@ MODULE_LICENSE("GPL");
>>> /* Turn on to get debugging output*/
>>> /* #define NESTED_DEBUG */
>>> +/* Not needed until device passthrough */
>>> +/* #define NESTED_KVM_MERGE_IOPM */
>>> +
>>>
>>
>> Let's assume device assignment is coming and drop the ifdefs.
>
> Will device assignment work with in-guest IO Passthrough or will it 
> rather emulate the IO accesses in userspace by issuing them again?
>

So far we're talking about reissuing pio.  The only case where the guest 
and host pio are likely to have the same addresses is the vga ports, and 
these aren't heavily used.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 07/10] Add VMRUN handler v4
  2008-10-19 10:06                   ` Avi Kivity
@ 2008-10-19 10:13                     ` Alexander Graf
  2008-10-19 10:27                       ` Avi Kivity
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Graf @ 2008-10-19 10:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony, avi


On 19.10.2008, at 12:06, Avi Kivity wrote:

> Alexander Graf wrote:
>>>> --- a/arch/x86/kvm/svm.c
>>>> +++ b/arch/x86/kvm/svm.c
>>>> @@ -51,6 +51,9 @@ MODULE_LICENSE("GPL");
>>>> /* Turn on to get debugging output*/
>>>> /* #define NESTED_DEBUG */
>>>> +/* Not needed until device passthrough */
>>>> +/* #define NESTED_KVM_MERGE_IOPM */
>>>> +
>>>>
>>>
>>> Let's assume device assignment is coming and drop the ifdefs.
>>
>> Will device assignment work with in-guest IO Passthrough or will it  
>> rather emulate the IO accesses in userspace by issuing them again?
>>
>
> So far we're talking about reissuing pio.  The only case where the  
> guest and host pio are likely to have the same addresses is the vga  
> ports, and these aren't heavily used.

So it's better to leave it disabled, I guess. The IOPM is only
relevant when passing through IO ports 1:1, and parsing the IOPM takes
a lot of effort (read: time). It's also only relevant if the same
device the guest is exposed to is passed through to the l2 guest.
IMHO speed is more important atm.
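For reference, the merge that is ifdef'ed out here (NESTED_KVM_MERGE_IOPM) amounts to combining two I/O permission bitmaps. This is a minimal standalone sketch of the idea, not the actual patch code: the SVM IOPM has one intercept bit per port for ports 0..65535 (plus padding, 3 pages total), and a set bit in either the host's or the l1 guest's bitmap must force an intercept, so the merge is a byte-wise OR over the full 12 KiB on every nested VMRUN, which is the cost being discussed.

```c
#include <stdint.h>
#include <stddef.h>

/* 3 pages: 64K ports, one bit each, plus spill-over padding */
#define IOPM_SIZE (12 * 1024)

/* Hypothetical helper, for illustration only: a set bit means
 * "intercept this port", so OR-ing keeps the stricter of the two
 * policies. Doing this full 12 KiB pass on every nested VMRUN is
 * the performance cost that motivated leaving it disabled. */
static void nested_svm_merge_iopm(uint8_t *merged,
                                  const uint8_t *host_iopm,
                                  const uint8_t *guest_iopm)
{
	size_t i;

	for (i = 0; i < IOPM_SIZE; i++)
		merged[i] = host_iopm[i] | guest_iopm[i];
}
```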



* Re: [PATCH 07/10] Add VMRUN handler v4
  2008-10-19 10:13                     ` Alexander Graf
@ 2008-10-19 10:27                       ` Avi Kivity
  2008-10-19 10:33                         ` Alexander Graf
  0 siblings, 1 reply; 24+ messages in thread
From: Avi Kivity @ 2008-10-19 10:27 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony, avi

Alexander Graf wrote:
>>
>> So far we're talking about reissuing pio.  The only case where the 
>> guest and host pio are likely to have the same addresses is the vga 
>> ports, and these aren't heavily used.
>
> So it's better to leave it disabled, I guess. The IOPM is only 
> relevant when passing through IO ports 1:1. And parsing the IOPM takes 
> a lot of effort (read: time). It's also only relevant if the same 
> device, the guest is exposed to, is passed through to the l2 guest. 
> IMHO speed is more important atm.
>

Okay.  Please drop those bits then.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 07/10] Add VMRUN handler v4
  2008-10-19 10:27                       ` Avi Kivity
@ 2008-10-19 10:33                         ` Alexander Graf
  2008-10-19 10:54                           ` Avi Kivity
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Graf @ 2008-10-19 10:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joro, anthony, avi


On 19.10.2008, at 12:27, Avi Kivity wrote:

> Alexander Graf wrote:
>>>
>>> So far we're talking about reissuing pio.  The only case where the  
>>> guest and host pio are likely to have the same addresses is the  
>>> vga ports, and these aren't heavily used.
>>
>> So it's better to leave it disabled, I guess. The IOPM is only  
>> relevant when passing through IO ports 1:1. And parsing the IOPM  
>> takes a lot of effort (read: time). It's also only relevant if the  
>> same device, the guest is exposed to, is passed through to the l2  
>> guest. IMHO speed is more important atm.
>>
>
> Okay.  Please drop those bits then.

Can do. I only implemented them for functional completeness in the
first place and didn't want them to affect performance - that's why I
ifdef'ed them out.

Any more requests for v5?



* Re: [PATCH 07/10] Add VMRUN handler v4
  2008-10-19 10:33                         ` Alexander Graf
@ 2008-10-19 10:54                           ` Avi Kivity
  0 siblings, 0 replies; 24+ messages in thread
From: Avi Kivity @ 2008-10-19 10:54 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joro, anthony

Alexander Graf wrote:
>
> Any more requests for v5?
>

Review and ack by Joerg.

-- 
error compiling committee.c: too many arguments to function



end of thread, other threads:[~2008-10-19 10:54 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-10-15 19:31 [PATCH 00/10] Add support for nested SVM (kernel) v4 Alexander Graf
2008-10-15 19:31 ` [PATCH 01/10] Add CPUID feature flag for SVM v4 Alexander Graf
2008-10-15 19:31   ` [PATCH 02/10] Clean up VINTR setting v4 Alexander Graf
2008-10-15 19:31     ` [PATCH 03/10] Add helper functions for nested SVM v4 Alexander Graf
2008-10-15 19:31       ` [PATCH 04/10] Implement GIF, clgi and stgi v4 Alexander Graf
2008-10-15 19:31         ` [PATCH 05/10] Implement hsave v4 Alexander Graf
2008-10-15 19:31           ` [PATCH 06/10] Add VMLOAD and VMSAVE handlers v4 Alexander Graf
2008-10-15 19:31             ` [PATCH 07/10] Add VMRUN handler v4 Alexander Graf
2008-10-15 19:31               ` [PATCH 08/10] Add VMEXIT handler and intercepts v4 Alexander Graf
2008-10-15 19:31                 ` [PATCH 09/10] allow read access to MSR_VM_VR Alexander Graf
2008-10-15 19:31                   ` [PATCH 10/10] Allow setting the SVME bit v4 Alexander Graf
2008-10-19  9:51               ` [PATCH 07/10] Add VMRUN handler v4 Avi Kivity
2008-10-19  9:58                 ` Alexander Graf
2008-10-19 10:06                   ` Avi Kivity
2008-10-19 10:13                     ` Alexander Graf
2008-10-19 10:27                       ` Avi Kivity
2008-10-19 10:33                         ` Alexander Graf
2008-10-19 10:54                           ` Avi Kivity
2008-10-19  9:46         ` [PATCH 04/10] Implement GIF, clgi and stgi v4 Avi Kivity
2008-10-19  9:57           ` Alexander Graf
2008-10-19 10:04             ` Avi Kivity
2008-10-16 11:25   ` [PATCH 01/10] Add CPUID feature flag for SVM v4 Alexander Graf
2008-10-19  9:56 ` [PATCH 00/10] Add support for nested SVM (kernel) v4 Avi Kivity
2008-10-19 10:01   ` Alexander Graf
