[PATCH 0/5] KVM paravirt_ops implementation

virtualization.lists.linux-foundation.org archive mirror

* [PATCH 0/5] KVM paravirt_ops implementation
@ 2007-06-18  2:56 Anthony Liguori
       [not found] ` <4675F462.1010708-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18  2:56 UTC (permalink / raw)
  To: kvm-devel, virtualization, Avi Kivity, Rusty Russell, Ingo Molnar

Hi,

This patch series is an update of my previous paravirt_ops patches.  
They are loosely based on Ingo's original paravirt_ops implementation 
for KVM.  Some of the changes since the last series include:

1) Switch to using CPUID 0x40000000 instead of using MSR writes to 
discover shared memory area
2) Attempt to deal with SMP guests
3) Support for generic CR read caching
4) Support for batching of MMU operations

Some known issues:

1) Not really sure what is needed for CONFIG_PREEMPT support.  I'm not 
sure which paravirt_ops calls are actually re-entrant.
2) The paravirt_ops implementation is registered with core_initcall().
However, the paravirt_ops banner is also printed with core_initcall(), so
the fact that this works now is just the luck of build order.  Need a
better way to initialize the KVM paravirt_ops backend.
3) These patches probably break guest save/restore.  Need a generic way 
to get additional save/restore callbacks in the kernel (perhaps the 
in-kernel APIC series addresses this already?).

The latest versions of these patches are available at:

http://hg.codemonkey.ws/kvm-paravirt

Regards,

Anthony Liguori
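
Editorial sketch (not part of the series): the CPUID-based discovery from
item 1 of the change list can be illustrated in plain C.  The register
values used below are the ones the host side of patch 1/5 returns for leaf
0x40000000.  The helper names put_le32() and is_kvm_signature() are made up
for this example, and real guest code would obtain the registers by
executing CPUID rather than taking them as arguments.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit register value as four bytes, least significant first,
 * which is how x86 lays the CPUID signature out in memory. */
static void put_le32(char *dst, uint32_t v)
{
	int i;

	for (i = 0; i < 4; i++)
		dst[i] = (char)((v >> (8 * i)) & 0xff);
}

/* Return 1 if EBX:ECX:EDX spell out the 12-character KVM signature. */
static int is_kvm_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char sig[13];

	put_le32(sig, ebx);
	put_le32(sig + 4, ecx);
	put_le32(sig + 8, edx);
	sig[12] = '\0';

	return strcmp(sig, "KVMKVMKVMKVM") == 0;
}
```

Passing EBX=0x4b4d564b, ECX=0x564b4d56, EDX=0x4d564b4d yields a match,
since those constants read "KVMK", "VMKV", "MKVM" in little-endian byte
order.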

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4675F462.1010708-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* [PATCH 1/5] KVM paravirt_ops core infrastructure
       [not found] ` <4675F462.1010708-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18  2:58   ` Anthony Liguori
       [not found]     ` <4675F4C3.6050700-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18  2:58   ` [PATCH 2/5] KVM: Implement CR read caching for KVM paravirt_ops Anthony Liguori
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18  2:58 UTC (permalink / raw)
  To: kvm-devel; +Cc: virtualization

[-- Attachment #1: Type: text/plain, Size: 26 bytes --]

Regards,

Anthony Liguori

[-- Attachment #2: kvm-paravirt-core.diff --]
[-- Type: text/x-patch, Size: 14174 bytes --]

Subject: [PATCH] KVM paravirt_ops core infrastructure
Author: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

This patch implements paravirt_ops support for KVM and updates the current
paravirtualization support in KVM to match.  Some changes to the previous
paravirtualization support in KVM:

  1) Theoretical support for SMP guests
  2) Use CPUID to discover paravirtualization
  3) Use feature bitmap instead of versioning

Signed-off-by: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index 8770a5d..97ad1e1 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -231,6 +231,13 @@ config VMI
 	  at the moment), by linking the kernel to a GPL-ed ROM module
 	  provided by the hypervisor.
 
+config KVM_GUEST
+	bool "KVM paravirt-ops support"
+	depends on PARAVIRT
+	help
+	  This option enables various optimizations for running under the KVM
+	  hypervisor.
+
 config ACPI_SRAT
 	bool
 	default y
diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 06da59f..12a4201 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -42,6 +42,7 @@ obj-$(CONFIG_HPET_TIMER) 	+= hpet.o
 obj-$(CONFIG_K8_NB)		+= k8.o
 
 obj-$(CONFIG_VMI)		+= vmi.o vmiclock.o
+obj-$(CONFIG_KVM_GUEST)		+= kvm.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt.o
 obj-y				+= pcspeaker.o
 
diff --git a/arch/i386/kernel/kvm.c b/arch/i386/kernel/kvm.c
new file mode 100644
index 0000000..22ea647
--- /dev/null
+++ b/arch/i386/kernel/kvm.c
@@ -0,0 +1,219 @@
+/*
+ * KVM paravirt_ops implementation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) 2007, Red Hat, Inc., Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
+ * Copyright IBM Corporation, 2007
+ *   Authors: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/kvm_para.h>
+#include <linux/cpu.h>
+#include <linux/mm.h>
+
+struct kvm_paravirt_state
+{
+	struct kvm_vmca *vmca;
+	struct kvm_hypercall_entry *queue;
+	void (*hypercall)(void);
+
+	u64 vmca_gpa;
+};
+
+static DEFINE_PER_CPU(struct kvm_paravirt_state *, paravirt_state);
+
+static int do_nop_io_delay;
+static u64 msr_set_vmca;
+
+static long kvm_hypercall(unsigned int nr, unsigned long p1,
+			  unsigned long p2, unsigned long p3,
+			  unsigned long p4)
+{
+	struct kvm_paravirt_state *state
+		= per_cpu(paravirt_state, smp_processor_id());
+	long ret;
+
+	asm volatile("call *(%6) \n\t"
+		     : "=a"(ret)
+		     : "a" (nr),
+		     "b" (p1),
+		     "c" (p2),
+		     "d" (p3),
+		     "S" (p4),
+		     "r" (&state->hypercall)
+		     : "memory", "cc"
+		     );
+
+	return ret;
+}
+
+/*
+ * No need for any "IO delay" on KVM
+ */
+static void kvm_io_delay(void)
+{
+}
+
+static void paravirt_ops_setup(void)
+{
+	paravirt_ops.name = "KVM";
+
+	if (do_nop_io_delay)
+		paravirt_ops.io_delay = kvm_io_delay;
+
+	paravirt_ops.paravirt_enabled = 1;
+
+	apply_paravirt(__parainstructions, __parainstructions_end);
+}
+
+static void paravirt_activate(void *unused)
+{
+	struct kvm_paravirt_state *state
+		= per_cpu(paravirt_state, raw_smp_processor_id());
+	wrmsrl(msr_set_vmca, state->vmca_gpa);
+}
+
+static int paravirt_initialize(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+	char signature[13];
+
+	/* verify that we're running on KVM */
+	cpuid(CPUID_HYPE_IDENT, &eax, &ebx, &ecx, &edx);
+	memcpy(signature, &ebx, 4);
+	memcpy(signature + 4, &ecx, 4);
+	memcpy(signature + 8, &edx, 4);
+	signature[12] = 0;
+
+	if (strcmp(signature, "KVMKVMKVMKVM"))
+		return -EINVAL;
+
+	/* check what features are supported */
+	cpuid(CPUID_HYPE_KVM_FEATURES, &eax, &ebx, &ecx, &edx);
+	msr_set_vmca = eax;
+
+	/* no paravirtualization is supported */
+	if (!(edx & KVM_FEATURE_VMCA))
+		return -ENOSYS;
+
+	if ((edx & KVM_FEATURE_NOP_IO_DELAY))
+		do_nop_io_delay = 1;
+
+	on_each_cpu(paravirt_activate, NULL, 0, 1);
+
+	return 0;
+}
+
+static __init void paravirt_free_state(struct kvm_paravirt_state *state)
+{
+	if (!state)
+		return;
+
+	if (state->hypercall)
+		__free_page(pfn_to_page(__pa(state->hypercall) >> PAGE_SHIFT));
+
+	if (state->vmca)
+		__free_page(pfn_to_page(__pa(state->vmca) >> PAGE_SHIFT));
+
+	__free_page(pfn_to_page(__pa(state) >> PAGE_SHIFT));
+}
+
+static __init struct kvm_paravirt_state *paravirt_alloc_state(void)
+{
+	struct kvm_paravirt_state *state;
+
+	state = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!state)
+		goto err;
+
+	state->vmca = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!state->vmca)
+		goto err;
+
+	/* FIXME: what do I need for this to be executable on 64 bit? */
+	state->hypercall = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!state->hypercall)
+		goto err;
+
+	state->vmca_gpa = __pa(state->vmca);
+	state->vmca->hypercall_gpa = __pa(state->hypercall);
+
+	return state;
+
+ err:
+	paravirt_free_state(state);
+	return NULL;
+}
+
+/* FIXME: hotplug hooks whenever KVM supports CPU hotplug */
+
+static __init void paravirt_free_area(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		struct kvm_paravirt_state *state;
+		state = per_cpu(paravirt_state, cpu);
+		paravirt_free_state(state);
+	}
+}
+
+static __init int paravirt_alloc_area(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		struct kvm_paravirt_state *state;
+
+		state = paravirt_alloc_state();
+		if (!state)
+			goto err;
+
+		per_cpu(paravirt_state, cpu) = state;
+	}
+
+	return 0;
+
+ err:
+	paravirt_free_area();
+	return -ENOMEM;
+}
+
+static int __init kvm_guest_init(void)
+{
+	int rc;
+
+	rc = paravirt_alloc_area();
+	if (rc)
+		return rc;
+
+	rc = paravirt_initialize();
+	if (rc)
+		goto err;
+
+	paravirt_ops_setup();
+
+	return rc;
+
+ err:
+	paravirt_free_area();
+	return rc;
+}
+
+/* FIXME: need a better solution! */
+core_initcall(kvm_guest_init);
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 633c2ed..f7a0e6e 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -43,6 +43,7 @@
 #include <linux/sched.h>
 #include <linux/cpumask.h>
 #include <linux/smp.h>
+#include <linux/kvm_para.h>
 
 #include "x86_emulate.h"
 #include "segment_descriptor.h"
@@ -91,6 +92,11 @@ struct vfsmount *kvmfs_mnt;
 #define CR8_RESEVED_BITS (~0x0fULL)
 #define EFER_RESERVED_BITS 0xfffffffffffff2fe
 
+#define KVM_PARAVIRT_FEATURES \
+	(KVM_FEATURE_VMCA | KVM_FEATURE_NOP_IO_DELAY)
+
+#define KVM_MSR_SET_VMCA	0x87655678
+
 #ifdef CONFIG_X86_64
 // LDT or TSS descriptor in the GDT. 16 bytes.
 struct segment_descriptor_64 {
@@ -1340,12 +1346,19 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_halt);
 
+static int dispatch_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
+			      unsigned long p1, unsigned long p2,
+			      unsigned long p3, unsigned long p4)
+{
+	return -ENOSYS;
+}
+
 int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	unsigned long nr, a0, a1, a2, a3, a4, a5, ret;
 
 	kvm_arch_ops->cache_regs(vcpu);
-	ret = -KVM_EINVAL;
+	ret = -EINVAL;
 #ifdef CONFIG_X86_64
 	if (is_long_mode(vcpu)) {
 		nr = vcpu->regs[VCPU_REGS_RAX];
@@ -1358,16 +1371,17 @@ int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	} else
 #endif
 	{
-		nr = vcpu->regs[VCPU_REGS_RBX] & -1u;
-		a0 = vcpu->regs[VCPU_REGS_RAX] & -1u;
+		nr = vcpu->regs[VCPU_REGS_RAX] & -1u;
+		a0 = vcpu->regs[VCPU_REGS_RBX] & -1u;
 		a1 = vcpu->regs[VCPU_REGS_RCX] & -1u;
 		a2 = vcpu->regs[VCPU_REGS_RDX] & -1u;
 		a3 = vcpu->regs[VCPU_REGS_RSI] & -1u;
 		a4 = vcpu->regs[VCPU_REGS_RDI] & -1u;
 		a5 = vcpu->regs[VCPU_REGS_RBP] & -1u;
 	}
-	switch (nr) {
-	default:
+
+	ret = dispatch_hypercall(vcpu, nr, a0, a1, a2, a3);
+	if (ret == -ENOSYS) {
 		run->hypercall.args[0] = a0;
 		run->hypercall.args[1] = a1;
 		run->hypercall.args[2] = a2;
@@ -1456,7 +1470,7 @@ void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val,
  */
 static int vcpu_register_para(struct kvm_vcpu *vcpu, gpa_t para_state_gpa)
 {
-	struct kvm_vcpu_para_state *para_state;
+	struct kvm_vmca *para_state;
 	hpa_t para_state_hpa, hypercall_hpa;
 	struct page *para_state_page;
 	unsigned char *hypercall;
@@ -1476,30 +1490,14 @@ static int vcpu_register_para(struct kvm_vcpu *vcpu, gpa_t para_state_gpa)
 	if (is_error_hpa(para_state_hpa))
 		goto err_gp;
 
-	mark_page_dirty(vcpu->kvm, para_state_gpa >> PAGE_SHIFT);
 	para_state_page = pfn_to_page(para_state_hpa >> PAGE_SHIFT);
 	para_state = kmap_atomic(para_state_page, KM_USER0);
 
-	printk(KERN_DEBUG "....  guest version: %d\n", para_state->guest_version);
-	printk(KERN_DEBUG "....           size: %d\n", para_state->size);
-
-	para_state->host_version = KVM_PARA_API_VERSION;
-	/*
-	 * We cannot support guests that try to register themselves
-	 * with a newer API version than the host supports:
-	 */
-	if (para_state->guest_version > KVM_PARA_API_VERSION) {
-		para_state->ret = -KVM_EINVAL;
-		goto err_kunmap_skip;
-	}
-
 	hypercall_gpa = para_state->hypercall_gpa;
 	hypercall_hpa = gpa_to_hpa(vcpu, hypercall_gpa);
 	printk(KERN_DEBUG ".... hypercall_hpa: %08Lx\n", hypercall_hpa);
-	if (is_error_hpa(hypercall_hpa)) {
-		para_state->ret = -KVM_EINVAL;
+	if (is_error_hpa(hypercall_hpa))
 		goto err_kunmap_skip;
-	}
 
 	printk(KERN_DEBUG "kvm: para guest successfully registered.\n");
 	vcpu->para_state_page = para_state_page;
@@ -1512,7 +1510,6 @@ static int vcpu_register_para(struct kvm_vcpu *vcpu, gpa_t para_state_gpa)
 	kvm_arch_ops->patch_hypercall(vcpu, hypercall);
 	kunmap_atomic(hypercall, KM_USER1);
 
-	para_state->ret = 0;
 err_kunmap_skip:
 	kunmap_atomic(para_state, KM_USER0);
 	return 0;
@@ -1633,12 +1630,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 	case MSR_IA32_MISC_ENABLE:
 		vcpu->ia32_misc_enable_msr = data;
 		break;
-	/*
-	 * This is the 'probe whether the host is KVM' logic:
-	 */
-	case MSR_KVM_API_MAGIC:
-		return vcpu_register_para(vcpu, data);
-
+	case KVM_MSR_SET_VMCA:
+		vcpu_register_para(vcpu, data);
+		break;
 	default:
 		printk(KERN_ERR "kvm: unhandled wrmsr: 0x%x\n", msr);
 		return 1;
@@ -1693,6 +1687,20 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 
 	kvm_arch_ops->cache_regs(vcpu);
 	function = vcpu->regs[VCPU_REGS_RAX];
+
+	if (function == CPUID_HYPE_IDENT) {
+		vcpu->regs[VCPU_REGS_RAX] = 0;
+		/* KVMKVMKVMKVM */
+		vcpu->regs[VCPU_REGS_RBX] = 0x4b4d564b;
+		vcpu->regs[VCPU_REGS_RCX] = 0x564b4d56;
+		vcpu->regs[VCPU_REGS_RDX] = 0x4d564b4d;
+		goto out;
+	} else if (function == CPUID_HYPE_KVM_FEATURES) {
+		vcpu->regs[VCPU_REGS_RAX] = KVM_MSR_SET_VMCA;
+		vcpu->regs[VCPU_REGS_RDX] = KVM_PARAVIRT_FEATURES;
+		goto out;
+	}
+
 	vcpu->regs[VCPU_REGS_RAX] = 0;
 	vcpu->regs[VCPU_REGS_RBX] = 0;
 	vcpu->regs[VCPU_REGS_RCX] = 0;
@@ -1717,6 +1725,7 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 		vcpu->regs[VCPU_REGS_RCX] = best->ecx;
 		vcpu->regs[VCPU_REGS_RDX] = best->edx;
 	}
+ out:
 	kvm_arch_ops->decache_regs(vcpu);
 	kvm_arch_ops->skip_emulated_instruction(vcpu);
 }
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 3b29256..cf51d4a 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -8,66 +8,26 @@
  *       as we make progress.
  */
 
-/*
- * Per-VCPU descriptor area shared between guest and host. Writable to
- * both guest and host. Registered with the host by the guest when
- * a guest acknowledges paravirtual mode.
- *
- * NOTE: all addresses are guest-physical addresses (gpa), to make it
- * easier for the hypervisor to map between the various addresses.
- */
-struct kvm_vcpu_para_state {
-	/*
-	 * API version information for compatibility. If there's any support
-	 * mismatch (too old host trying to execute too new guest) then
-	 * the host will deny entry into paravirtual mode. Any other
-	 * combination (new host + old guest and new host + new guest)
-	 * is supposed to work - new host versions will support all old
-	 * guest API versions.
-	 */
-	u32 guest_version;
-	u32 host_version;
-	u32 size;
-	u32 ret;
-
-	/*
-	 * The address of the vm exit instruction (VMCALL or VMMCALL),
-	 * which the host will patch according to the CPU model the
-	 * VM runs on:
-	 */
-	u64 hypercall_gpa;
-
-} __attribute__ ((aligned(PAGE_SIZE)));
+#define CPUID_HYPE_IDENT		0x40000000
+#define CPUID_HYPE_KVM_FEATURES		0x40000001
 
-#define KVM_PARA_API_VERSION 1
+#define KVM_FEATURE_VMCA		(1UL << 0)
+#define KVM_FEATURE_NOP_IO_DELAY	(1UL << 1)
 
-/*
- * This is used for an RDMSR's ECX parameter to probe for a KVM host.
- * Hopefully no CPU vendor will use up this number. This is placed well
- * out of way of the typical space occupied by CPU vendors' MSR indices,
- * and we think (or at least hope) it wont be occupied in the future
- * either.
- */
-#define MSR_KVM_API_MAGIC 0x87655678
-
-#define KVM_EINVAL 1
+struct kvm_vmca
+{
+	u64 hypercall_gpa;
+};
 
 /*
  * Hypercall calling convention:
  *
- * Each hypercall may have 0-6 parameters.
- *
- * 64-bit hypercall index is in RAX, goes from 0 to __NR_hypercalls-1
- *
- * 64-bit parameters 1-6 are in the standard gcc x86_64 calling convention
- * order: RDI, RSI, RDX, RCX, R8, R9.
+ * Each hypercall may have 0-4 parameters.
  *
- * 32-bit index is EBX, parameters are: EAX, ECX, EDX, ESI, EDI, EBP.
- * (the first 3 are according to the gcc regparm calling convention)
+ * 32-bit index is EAX, parameters are: EBX, ECX, EDX, ESI.
  *
  * No registers are clobbered by the hypercall, except that the
  * return value is in RAX.
  */
-#define __NR_hypercalls			0
 
 #endif

[-- Attachment #4: Type: text/plain, Size: 186 bytes --]

_______________________________________________
kvm-devel mailing list
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/kvm-devel

^ permalink raw reply related	[flat|nested] 85+ messages in thread

[parent not found: <4675F4C3.6050700-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 1/5] KVM paravirt_ops core infrastructure
       [not found]     ` <4675F4C3.6050700-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18  8:03       ` Avi Kivity
       [not found]         ` <46763C6B.9050004-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-06-26  8:04       ` Dor Laor
  1 sibling, 1 reply; 85+ messages in thread
From: Avi Kivity @ 2007-06-18  8:03 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:


> -		nr = vcpu->regs[VCPU_REGS_RBX] & -1u;
> -		a0 = vcpu->regs[VCPU_REGS_RAX] & -1u;
> +		nr = vcpu->regs[VCPU_REGS_RAX] & -1u;
> +		a0 = vcpu->regs[VCPU_REGS_RBX] & -1u;


> - * Each hypercall may have 0-6 parameters.
> - *
> - * 64-bit hypercall index is in RAX, goes from 0 to __NR_hypercalls-1
> - *
> - * 64-bit parameters 1-6 are in the standard gcc x86_64 calling convention
> - * order: RDI, RSI, RDX, RCX, R8, R9.
> + * Each hypercall may have 0-4 parameters.
>   *
> - * 32-bit index is EBX, parameters are: EAX, ECX, EDX, ESI, EDI, EBP.
> - * (the first 3 are according to the gcc regparm calling convention)
> + * 32-bit index is EAX, parameters are: EBX, ECX, EDX, ESI.


What's the motivation for these changes?



-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46763C6B.9050004-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: [PATCH 1/5] KVM paravirt_ops core infrastructure
       [not found]         ` <46763C6B.9050004-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-06-18 12:25           ` Anthony Liguori
       [not found]             ` <467679C5.6030201-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 12:25 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

Avi Kivity wrote:
> Anthony Liguori wrote:
>
>
>> -        nr = vcpu->regs[VCPU_REGS_RBX] & -1u;
>> -        a0 = vcpu->regs[VCPU_REGS_RAX] & -1u;
>> +        nr = vcpu->regs[VCPU_REGS_RAX] & -1u;
>> +        a0 = vcpu->regs[VCPU_REGS_RBX] & -1u;
>
>
>> - * Each hypercall may have 0-6 parameters.
>> - *
>> - * 64-bit hypercall index is in RAX, goes from 0 to __NR_hypercalls-1
>> - *
>> - * 64-bit parameters 1-6 are in the standard gcc x86_64 calling 
>> convention
>> - * order: RDI, RSI, RDX, RCX, R8, R9.
>> + * Each hypercall may have 0-4 parameters.
>>   *
>> - * 32-bit index is EBX, parameters are: EAX, ECX, EDX, ESI, EDI, EBP.
>> - * (the first 3 are according to the gcc regparm calling convention)
>> + * 32-bit index is EAX, parameters are: EBX, ECX, EDX, ESI.
>
>
> What's the motivation for these changes?

If we're queuing hypercalls, then having 4 arguments versus 6 means that
we can queue 50% more hypercalls in a single page.  Using all six
arguments clobbers all the GP registers in 32-bit mode too.

Regards,

Anthony Liguori
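
Editorial sketch (not part of the thread): the queue-density argument above
can be checked with a little arithmetic.  The layout of struct
kvm_hypercall_entry is not shown in this patch, so both entry layouts below
are assumptions: one stores only the arguments, the other also stores the
hypercall number.  PAGE_BYTES and entries_per_page() are names invented for
this example.

```c
#include <stddef.h>

#define PAGE_BYTES 4096u	/* assumed 4 KiB page */

/*
 * Number of queue entries that fit in one page for a hypothetical entry
 * made of `nargs` arguments of `word` bytes each, optionally preceded by
 * a hypercall number of the same width.
 */
static size_t entries_per_page(size_t nargs, size_t word, int with_nr)
{
	size_t entry_bytes = (nargs + (with_nr ? 1 : 0)) * word;

	return PAGE_BYTES / entry_bytes;
}
```

With 4-byte words and arguments only, a page holds 170 six-argument entries
versus 256 four-argument ones, which is roughly the 50% gain quoted above.
If the hypercall number is stored with each entry, the figures become 146
versus 204, closer to 40%.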

>
>


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <467679C5.6030201-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 1/5] KVM paravirt_ops core infrastructure
       [not found]             ` <467679C5.6030201-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18 12:28               ` Avi Kivity
  0 siblings, 0 replies; 85+ messages in thread
From: Avi Kivity @ 2007-06-18 12:28 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
>>
>> What's the motivation for these changes?
>
> If we're queuing hypercalls, then having 4 arguments versus 6 means
> that we can queue 50% more hypercalls in a single page.  Using all six
> arguments clobbers all the GP registers in 32-bit mode too.

Makes sense.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 1/5] KVM paravirt_ops core infrastructure
       [not found]     ` <4675F4C3.6050700-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18  8:03       ` Avi Kivity
@ 2007-06-26  8:04       ` Dor Laor
       [not found]         ` <64F9B87B6B770947A9F8391472E032160C73025E-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Dor Laor @ 2007-06-26  8:04 UTC (permalink / raw)
  To: Anthony Liguori, kvm-devel; +Cc: virtualization

...
+static __init struct kvm_paravirt_state *paravirt_alloc_state(void)
+{
+	struct kvm_paravirt_state *state;
+
+	state = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!state)
+		goto err;
+
+	state->vmca = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!state->vmca)
+		goto err;
+
+	/* FIXME: what do I need for this to be executable on 64 bit? */
+	state->hypercall = (void *)get_zeroed_page(GFP_KERNEL);

Why do you alloc a page for the hypercall instead of using Ingo's code
below? This way it can work for 64 bit too.

Ingo's code:
/*
 * This is the vm-syscall address - to be patched by the host to
 * VMCALL (Intel) or VMMCALL (AMD), depending on the CPU model:
 */
asm (
        "       .globl hypercall_addr                   \n"
        "       .align 4                                \n"
        "       hypercall_addr:                         \n"
	"               movl $-38, %eax                 \n"
        "               ret                             \n"
);
extern unsigned char hypercall_addr[6];


And use it this way (I used vmalloc_to_page since it compiles as a
module):
hypercall_addr_page = vmalloc_to_page(hypercall_addr);
para_state->hypercall_gpa = page_to_pfn(hypercall_addr_page) <<
PAGE_SHIFT | 
				    offset_in_page(hypercall_addr);

Regards,
	Dor.
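
Editorial sketch (not part of the thread): the address computation in the
snippet above combines a page frame number with the offset of the stub
inside its page.  The same arithmetic in standalone C, assuming 4 KiB
pages; EX_PAGE_SHIFT, EX_PAGE_SIZE and make_gpa() are hypothetical names
standing in for the kernel's PAGE_SHIFT, PAGE_SIZE, page_to_pfn() and
offset_in_page().

```c
#include <stdint.h>

#define EX_PAGE_SHIFT 12			/* assumed 4 KiB pages */
#define EX_PAGE_SIZE  (1ull << EX_PAGE_SHIFT)

/*
 * Combine a page frame number with the in-page offset of a virtual
 * address to form a guest-physical address.
 */
static uint64_t make_gpa(uint64_t pfn, uint64_t vaddr)
{
	uint64_t offset = vaddr & (EX_PAGE_SIZE - 1);

	return (pfn << EX_PAGE_SHIFT) | offset;
}
```

For example, a stub at virtual address 0xc0105abc that lives in page frame
0x1234 ends up at guest-physical address 0x1234abc.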



^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <64F9B87B6B770947A9F8391472E032160C73025E-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>]

* Re: [PATCH 1/5] KVM paravirt_ops core infrastructure
       [not found]         ` <64F9B87B6B770947A9F8391472E032160C73025E-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
@ 2007-06-26  8:45           ` Jun Koi
       [not found]             ` <fdaac4d50706260145x1ebceadt432edd5b6a6ac1f2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2007-06-26 11:56           ` Anthony Liguori
  1 sibling, 1 reply; 85+ messages in thread
From: Jun Koi @ 2007-06-26  8:45 UTC (permalink / raw)
  To: Dor Laor; +Cc: kvm-devel, virtualization

On 6/26/07, Dor Laor <dor.laor-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
> ...
> +static __init struct kvm_paravirt_state *paravirt_alloc_state(void)
> +{
> +       struct kvm_paravirt_state *state;
> +
> +       state = (void *)get_zeroed_page(GFP_KERNEL);
> +       if (!state)
> +               goto err;
> +
> +       state->vmca = (void *)get_zeroed_page(GFP_KERNEL);
> +       if (!state->vmca)
> +               goto err;
> +
> +       /* FIXME: what do I need for this to be executable on 64 bit? */
> +       state->hypercall = (void *)get_zeroed_page(GFP_KERNEL);
>
> Why do you alloc a page for the hypercall instead of using Ingo's code
> below? This way it can work for 64 bit too.
>
> Ingo's code:
> /*
>  * This is the vm-syscall address - to be patched by the host to
>  * VMCALL (Intel) or VMMCALL (AMD), depending on the CPU model:
>  */
> asm (
>         "       .globl hypercall_addr                   \n"
>         "       .align 4                                \n"
>         "       hypercall_addr:                         \n"
>         "               movl $-38, %eax                 \n"
>         "               ret                             \n"
> );

The assembly code "movl $-38, %eax; ret" is only a placeholder, which is
later overwritten by the hypercall address from the host, isn't it?

If so, why don't we simply put 4 NOPs there?

Thanks,
Jun

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <fdaac4d50706260145x1ebceadt432edd5b6a6ac1f2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: [PATCH 1/5] KVM paravirt_ops core infrastructure
       [not found]             ` <fdaac4d50706260145x1ebceadt432edd5b6a6ac1f2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2007-06-26 11:57               ` Anthony Liguori
  0 siblings, 0 replies; 85+ messages in thread
From: Anthony Liguori @ 2007-06-26 11:57 UTC (permalink / raw)
  To: Jun Koi; +Cc: kvm-devel, virtualization

Jun Koi wrote:
> On 6/26/07, Dor Laor <dor.laor-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>> ...
>> +static __init struct kvm_paravirt_state *paravirt_alloc_state(void)
>> +{
>> +       struct kvm_paravirt_state *state;
>> +
>> +       state = (void *)get_zeroed_page(GFP_KERNEL);
>> +       if (!state)
>> +               goto err;
>> +
>> +       state->vmca = (void *)get_zeroed_page(GFP_KERNEL);
>> +       if (!state->vmca)
>> +               goto err;
>> +
>> +       /* FIXME: what do I need for this to be executable on 64 bit? */
>> +       state->hypercall = (void *)get_zeroed_page(GFP_KERNEL);
>>
>> Why do you alloc a page for the hypercall instead of using Ingo's code
>> below? This way it can work for 64 bit too.
>>
>> Ingo's code:
>> /*
>>  * This is the vm-syscall address - to be patched by the host to
>>  * VMCALL (Intel) or VMMCALL (AMD), depending on the CPU model:
>>  */
>> asm (
>>         "       .globl hypercall_addr                   \n"
>>         "       .align 4                                \n"
>>         "       hypercall_addr:                         \n"
>>         "               movl $-38, %eax                 \n"
>>         "               ret                             \n"
>> );
>
> The assembly code "movl $-38, %eax; \nret" is only a "reserved place",
> which is later overwritten by hypercall address from the host, isnt
> it?
>
> If so, why dont we simply put 4 NOPs there?

So if the hypervisor fails to patch it, we get a proper errno instead of 
running off into random code.

Regards,

Anthony Liguori
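
Editorial sketch (not part of the thread): the constant -38 in the stub is
-ENOSYS on Linux, which is why an unpatched stub reports "function not
implemented" instead of falling through into whatever follows.  The two
functions below are hypothetical stand-ins written for this example, and
the value 38 assumes Linux errno numbering.

```c
#include <errno.h>

/* Stand-in for the unpatched stub "movl $-38, %eax; ret". */
static long unpatched_hypercall_stub(void)
{
	return -38;
}

/* Hypothetical caller-side check: was the hypercall page left unpatched? */
static int hypercall_unavailable(long ret)
{
	return ret == -ENOSYS;
}
```

A run of NOPs would return nothing meaningful in EAX and would not end in
a ret, so the caller could not detect the failure this way.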

> Thanks,
> Jun
>


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 1/5] KVM paravirt_ops core infrastructure
       [not found]         ` <64F9B87B6B770947A9F8391472E032160C73025E-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
  2007-06-26  8:45           ` Jun Koi
@ 2007-06-26 11:56           ` Anthony Liguori
  1 sibling, 0 replies; 85+ messages in thread
From: Anthony Liguori @ 2007-06-26 11:56 UTC (permalink / raw)
  To: Dor Laor; +Cc: kvm-devel, virtualization

Dor Laor wrote:
> ...
> +static __init struct kvm_paravirt_state *paravirt_alloc_state(void)
> +{
> +	struct kvm_paravirt_state *state;
> +
> +	state = (void *)get_zeroed_page(GFP_KERNEL);
> +	if (!state)
> +		goto err;
> +
> +	state->vmca = (void *)get_zeroed_page(GFP_KERNEL);
> +	if (!state->vmca)
> +		goto err;
> +
> +	/* FIXME: what do I need for this to be executable on 64 bit? */
> +	state->hypercall = (void *)get_zeroed_page(GFP_KERNEL);
>
> Why do you alloc a page for the hypercall instead of using Ingo's code
> below? This way it can work for 64 bit too.
>   

The current patch queue uses data in the text segment but it makes sure 
that it has a proper page.

Regards,

Anthony Liguori

> Ingo's code:
> /*
>  * This is the vm-syscall address - to be patched by the host to
>  * VMCALL (Intel) or VMMCALL (AMD), depending on the CPU model:
>  */
> asm (
>         "       .globl hypercall_addr                   \n"
>         "       .align 4                                \n"
>         "       hypercall_addr:                         \n"
> 	"               movl $-38, %eax                 \n"
>         "               ret                             \n"
> );
> extern unsigned char hypercall_addr[6];
>
>
> And use it this way: (I used vmalloc_to_page since its compiles as a
> module)
> hypercall_addr_page = vmalloc_to_page(hypercall_addr);
> para_state->hypercall_gpa = page_to_pfn(hypercall_addr_page) <<
> PAGE_SHIFT | 
> 				    offset_in_page(hypercall_addr);
>
> Regards,
> 	Dor.
>
>
>
>   


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 2/5] KVM: Implement CR read caching for KVM paravirt_ops
       [not found] ` <4675F462.1010708-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18  2:58   ` [PATCH 1/5] KVM paravirt_ops core infrastructure Anthony Liguori
@ 2007-06-18  2:58   ` Anthony Liguori
       [not found]     ` <4675F4F1.5090207-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18  3:00   ` [PATCH 3/5] KVM: Add paravirt MMU write support Anthony Liguori
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18  2:58 UTC (permalink / raw)
  To: kvm-devel; +Cc: virtualization

[-- Attachment #1: Type: text/plain, Size: 26 bytes --]

Regards,

Anthony Liguori

[-- Attachment #2: kvm-cr-caching.diff --]
[-- Type: text/x-patch, Size: 4449 bytes --]

Subject: [PATCH] KVM: Implement CR read caching for KVM paravirt_ops
Author: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

With hardware virtualization, CR reads often times require a VMEXIT which is
rather expensive.  Instead of reading CR and taking the VMEXIT, maintain a
copy of each CR and return that on CR reads.

Signed-off-by: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

diff --git a/arch/i386/kernel/kvm.c b/arch/i386/kernel/kvm.c
index 22ea647..89e83a4 100644
--- a/arch/i386/kernel/kvm.c
+++ b/arch/i386/kernel/kvm.c
@@ -26,8 +26,13 @@
 #include <linux/cpu.h>
 #include <linux/mm.h>
 
+#define CR0_TS_MASK (1ULL << 3)
+
 struct kvm_paravirt_state
 {
+	unsigned long cached_cr[5];
+	int cr_valid[5];
+
 	struct kvm_vmca *vmca;
 	struct kvm_hypercall_entry *queue;
 	void (*hypercall)(void);
@@ -37,6 +42,7 @@ struct kvm_paravirt_state
 
 static DEFINE_PER_CPU(struct kvm_paravirt_state *, paravirt_state);
 
+static int do_cr_read_caching;
 static int do_nop_io_delay;
 static u64 msr_set_vmca;
 
@@ -69,6 +75,85 @@ static void kvm_io_delay(void)
 {
 }
 
+/*
+ * Control register reads can be trapped.  Since trapping is relatively
+ * expensive, we can avoid paying the cost by caching logically.
+ */
+static unsigned long kvm_read_cr(int reg)
+{
+	struct kvm_paravirt_state *state
+		= per_cpu(paravirt_state, smp_processor_id());
+
+	if (unlikely(!state->cr_valid[reg])) {
+		if (reg == 0)
+			state->cached_cr[reg] = native_read_cr0();
+		else if (reg == 3)
+			state->cached_cr[reg] = native_read_cr3();
+		else if (reg == 4)
+			state->cached_cr[reg] = native_read_cr4();
+		else
+			BUG();
+		state->cr_valid[reg] = 1;
+	}
+	return state->cached_cr[reg];
+}
+
+static void kvm_write_cr(int reg, unsigned long value)
+{
+	struct kvm_paravirt_state *state
+		= per_cpu(paravirt_state, smp_processor_id());
+
+	state->cr_valid[reg] = 1;
+	state->cached_cr[reg] = value;
+
+	if (reg == 0)
+		native_write_cr0(value);
+	else if (reg == 3)
+		native_write_cr3(value);
+	else if (reg == 4)
+		native_write_cr4(value);
+	else
+		BUG();
+}
+
+static unsigned long kvm_read_cr0(void)
+{
+	return kvm_read_cr(0);
+}
+
+static void kvm_write_cr0(unsigned long value)
+{
+	kvm_write_cr(0, value);
+}
+
+/*
+ * We trap clts to ensure that our cached cr0 remains consistent.
+ */
+static void kvm_clts(void)
+{
+	write_cr0(read_cr0() & ~CR0_TS_MASK);
+}
+
+static unsigned long kvm_read_cr3(void)
+{
+	return kvm_read_cr(3);
+}
+
+static void kvm_write_cr3(unsigned long value)
+{
+	kvm_write_cr(3, value);
+}
+
+static unsigned long kvm_read_cr4(void)
+{
+	return kvm_read_cr(4);
+}
+
+static void kvm_write_cr4(unsigned long value)
+{
+	kvm_write_cr(4, value);
+}
+
 static void paravirt_ops_setup(void)
 {
 	paravirt_ops.name = "KVM";
@@ -76,6 +161,19 @@ static void paravirt_ops_setup(void)
 	if (do_nop_io_delay)
 		paravirt_ops.io_delay = kvm_io_delay;
 
+	if (do_cr_read_caching) {
+		paravirt_ops.clts = kvm_clts;
+		paravirt_ops.read_cr0 = kvm_read_cr0;
+		paravirt_ops.write_cr0 = kvm_write_cr0;
+		paravirt_ops.read_cr3 = kvm_read_cr3;
+		paravirt_ops.write_cr3 = kvm_write_cr3;
+		paravirt_ops.read_cr4 = kvm_read_cr4;
+		paravirt_ops.write_cr4 = kvm_write_cr4;
+
+		/* CR4 always exists in a KVM guest */
+		paravirt_ops.read_cr4_safe = kvm_read_cr4;
+	}
+
 	paravirt_ops.paravirt_enabled = 1;
 
 	apply_paravirt(__parainstructions, __parainstructions_end);
@@ -114,6 +212,9 @@ static int paravirt_initialize(void)
 	if ((edx & KVM_FEATURE_NOP_IO_DELAY))
 		do_nop_io_delay = 1;
 
+	if ((edx & KVM_FEATURE_CR_READ_CACHE))
+		do_cr_read_caching = 1;
+
 	on_each_cpu(paravirt_activate, NULL, 0, 1);
 
 	return 0;
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index f7a0e6e..7b57431 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -93,7 +93,8 @@ struct vfsmount *kvmfs_mnt;
 #define EFER_RESERVED_BITS 0xfffffffffffff2fe
 
 #define KVM_PARAVIRT_FEATURES \
-	(KVM_FEATURE_VMCA | KVM_FEATURE_NOP_IO_DELAY)
+	(KVM_FEATURE_VMCA | KVM_FEATURE_NOP_IO_DELAY | \
+	 KVM_FEATURE_CR_READ_CACHE)
 
 #define KVM_MSR_SET_VMCA	0x87655678
 
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index cf51d4a..121a09c 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -13,6 +13,7 @@
 
 #define KVM_FEATURE_VMCA		(1UL << 0)
 #define KVM_FEATURE_NOP_IO_DELAY	(1UL << 1)
+#define KVM_FEATURE_CR_READ_CACHE	(1UL << 2)
 
 struct kvm_vmca
 {

^ permalink raw reply related	[flat|nested] 85+ messages in thread

[parent not found: <4675F4F1.5090207-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 2/5] KVM: Implement CR read caching for KVM paravirt_ops
       [not found]     ` <4675F4F1.5090207-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18  8:05       ` Avi Kivity
       [not found]         ` <46763CD3.3060704-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-06-18  8:11       ` Avi Kivity
  1 sibling, 1 reply; 85+ messages in thread
From: Avi Kivity @ 2007-06-18  8:05 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> With hardware virtualization, CR reads often times require a VMEXIT which is
> rather expensive.  Instead of reading CR and taking the VMEXIT, maintain a
> copy of each CR and return that on CR reads.
>   

Looks good.  Any measurable performance impact?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46763CD3.3060704-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: [PATCH 2/5] KVM: Implement CR read caching for KVM paravirt_ops
       [not found]         ` <46763CD3.3060704-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-06-18 12:26           ` Anthony Liguori
  0 siblings, 0 replies; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 12:26 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

Avi Kivity wrote:
> Anthony Liguori wrote:
>> With hardware virtualization, CR reads often times require a VMEXIT 
>> which is
>> rather expensive.  Instead of reading CR and taking the VMEXIT, 
>> maintain a
>> copy of each CR and return that on CR reads.
>>   
>
> Looks good.  Any measurable performance impact?

I'm having trouble with virtbench on the latest versions of KVM so I've 
been running kernbench.  I don't have specific numbers for this one but 
I'll get some today.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 2/5] KVM: Implement CR read caching for KVM paravirt_ops
       [not found]     ` <4675F4F1.5090207-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18  8:05       ` Avi Kivity
@ 2007-06-18  8:11       ` Avi Kivity
       [not found]         ` <46763E35.8020108-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Avi Kivity @ 2007-06-18  8:11 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> +/*
> + * Control register reads can be trapped.  Since trapping is relatively
> + * expensive, we can avoid paying the cost by caching logically.
> + */
> +static unsigned long kvm_read_cr(int reg)
> +{
> +	struct kvm_paravirt_state *state
> +		= per_cpu(paravirt_state, smp_processor_id());
> +
> +	if (unlikely(!state->cr_valid[reg])) {
> +		if (reg == 0)
> +			state->cached_cr[reg] = native_read_cr0();
> +		else if (reg == 3)
> +			state->cached_cr[reg] = native_read_cr3();
> +		else if (reg == 4)
> +			state->cached_cr[reg] = native_read_cr4();
> +		else
> +			BUG();
> +		state->cr_valid[reg] = 1;
> +	}
> +	return state->cached_cr[reg];
> +}
> +

It would be good to declare this (and kvm_write_cr) always_inline.  
These functions are never called with a non-constant reg parameter, and 
the unsightly if tree (more readable as a switch, IMO) will fold nicely 
when inlined.
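
For illustration only, the switch-based, always-inlined shape suggested here might look roughly like the userspace sketch below.  It is not taken from the patch: the fake_read_cr*() helpers, trap_count, and the single global cache are hypothetical stand-ins for the native accessors and the per-CPU state, and the always_inline attribute is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical stand-ins for native_read_cr0/3/4(): each call models
 * one expensive trapping read and bumps a counter so it can be observed.
 */
static int trap_count;

static unsigned long fake_read_cr0(void) { trap_count++; return 0x8005003bUL; }
static unsigned long fake_read_cr3(void) { trap_count++; return 0x00345000UL; }
static unsigned long fake_read_cr4(void) { trap_count++; return 0x000006d0UL; }

/* Same two arrays as in the patch, but global instead of per-CPU. */
struct cr_cache {
	unsigned long cached_cr[5];
	int cr_valid[5];
};

static struct cr_cache cache;

/*
 * Callers always pass a constant reg, so once this is inlined the
 * compiler can discard every switch arm except the one in use.
 */
static inline __attribute__((always_inline))
unsigned long cached_read_cr(int reg)
{
	assert(reg >= 0 && reg < 5);

	if (!cache.cr_valid[reg]) {
		switch (reg) {
		case 0:
			cache.cached_cr[reg] = fake_read_cr0();
			break;
		case 3:
			cache.cached_cr[reg] = fake_read_cr3();
			break;
		case 4:
			cache.cached_cr[reg] = fake_read_cr4();
			break;
		default:
			abort();	/* stands in for BUG() */
		}
		cache.cr_valid[reg] = 1;
	}
	return cache.cached_cr[reg];
}

static unsigned long cached_read_cr0(void) { return cached_read_cr(0); }
static unsigned long cached_read_cr3(void) { return cached_read_cr(3); }
static unsigned long cached_read_cr4(void) { return cached_read_cr(4); }
```

Only the first read of each register reaches the fake accessor; later reads are served from the cache.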


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46763E35.8020108-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: [PATCH 2/5] KVM: Implement CR read caching for KVM paravirt_ops
       [not found]         ` <46763E35.8020108-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-06-18 12:27           ` Anthony Liguori
  0 siblings, 0 replies; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 12:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

Avi Kivity wrote:
> Anthony Liguori wrote:
>> +/*
>> + * Control register reads can be trapped.  Since trapping is relatively
>> + * expensive, we can avoid paying the cost by caching logically.
>> + */
>> +static unsigned long kvm_read_cr(int reg)
>> +{
>> +    struct kvm_paravirt_state *state
>> +        = per_cpu(paravirt_state, smp_processor_id());
>> +
>> +    if (unlikely(!state->cr_valid[reg])) {
>> +        if (reg == 0)
>> +            state->cached_cr[reg] = native_read_cr0();
>> +        else if (reg == 3)
>> +            state->cached_cr[reg] = native_read_cr3();
>> +        else if (reg == 4)
>> +            state->cached_cr[reg] = native_read_cr4();
>> +        else
>> +            BUG();
>> +        state->cr_valid[reg] = 1;
>> +    }
>> +    return state->cached_cr[reg];
>> +}
>> +
>
> It would be good to declare this (and kvm_write_cr) always_inline.  
> These functions are never called with a non-constant reg parameters, 
> and the unsightly if tree (more readable as a switch, IMO) will fold 
> nicely when inlined.

Ok.

Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 3/5] KVM: Add paravirt MMU write support
       [not found] ` <4675F462.1010708-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18  2:58   ` [PATCH 1/5] KVM paravirt_ops core infrastructure Anthony Liguori
  2007-06-18  2:58   ` [PATCH 2/5] KVM: Implement CR read caching for KVM paravirt_ops Anthony Liguori
@ 2007-06-18  3:00   ` Anthony Liguori
       [not found]     ` <4675F533.40809-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18  3:00   ` [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation Anthony Liguori
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18  3:00 UTC (permalink / raw)
  To: kvm-devel; +Cc: virtualization

[-- Attachment #1: Type: text/plain, Size: 26 bytes --]

Regards,

Anthony Liguori

[-- Attachment #2: kvm-mmu-write.diff --]
[-- Type: text/x-patch, Size: 5209 bytes --]

Subject: [PATCH] KVM: Add paravirt MMU write support
Author: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

On at least AMD hardware, hypercall-based manipulation of page table memory
is significantly faster than taking a page fault.  Additionally, using
hypercalls to manipulate page table memory provides the infrastructure needed
to do lazy MMU updates.

Signed-off-by: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

diff --git a/arch/i386/kernel/kvm.c b/arch/i386/kernel/kvm.c
index 89e83a4..07ce38e 100644
--- a/arch/i386/kernel/kvm.c
+++ b/arch/i386/kernel/kvm.c
@@ -42,6 +42,7 @@ struct kvm_paravirt_state
 
 static DEFINE_PER_CPU(struct kvm_paravirt_state *, paravirt_state);
 
+static int do_mmu_write;
 static int do_cr_read_caching;
 static int do_nop_io_delay;
 static u64 msr_set_vmca;
@@ -154,6 +155,69 @@ static void kvm_write_cr4(unsigned long value)
 	kvm_write_cr(4, value);
 }
 
+static void kvm_mmu_write(void *dest, const void *src, size_t size)
+{
+	const uint8_t *p = src;
+	u32 a1 = 0;
+
+	size >>= 2;
+	if (size == 2)
+		a1 = *(u32 *)&p[4];
+
+	kvm_hypercall(KVM_HYPERCALL_MMU_WRITE, (u32)dest, size, *(u32 *)p, a1);
+}
+
+/*
+ * We only need to hook operations that are MMU writes.  We hook these so that
+ * we can use lazy MMU mode to batch these operations.  We could probably
+ * improve the performance of the host code if we used some of the information
+ * here to simplify processing of batched writes.
+ */
+static void kvm_set_pte(pte_t *ptep, pte_t pte)
+{
+	kvm_mmu_write(ptep, &pte, sizeof(pte));
+}
+
+static void kvm_set_pte_at(struct mm_struct *mm, unsigned long addr,
+			   pte_t *ptep, pte_t pte)
+{
+	kvm_mmu_write(ptep, &pte, sizeof(pte));
+}
+
+static void kvm_set_pte_atomic(pte_t *ptep, pte_t pte)
+{
+	kvm_mmu_write(ptep, &pte, sizeof(pte));
+}
+
+static void kvm_set_pte_present(struct mm_struct *mm, unsigned long addr,
+				pte_t *ptep, pte_t pte)
+{
+	kvm_mmu_write(ptep, &pte, sizeof(pte));
+}
+
+static void kvm_pte_clear(struct mm_struct *mm,
+			  unsigned long addr, pte_t *ptep)
+{
+	pte_t pte = {0};
+	kvm_mmu_write(ptep, &pte, sizeof(pte));
+}
+
+static void kvm_set_pmd(pmd_t *pmdp, pmd_t pmd)
+{
+	kvm_mmu_write(pmdp, &pmd, sizeof(pmd));
+}
+
+static void kvm_set_pud(pud_t *pudp, pud_t pud)
+{
+	kvm_mmu_write(pudp, &pud, sizeof(pud));
+}
+
+static void kvm_pmd_clear(pmd_t *pmdp)
+{
+	pmd_t pmd = {0};
+	kvm_mmu_write(pmdp, &pmd, sizeof(pmd));
+}
+
 static void paravirt_ops_setup(void)
 {
 	paravirt_ops.name = "KVM";
@@ -174,6 +238,17 @@ static void paravirt_ops_setup(void)
 		paravirt_ops.read_cr4_safe = kvm_read_cr4;
 	}
 
+	if (do_mmu_write) {
+		paravirt_ops.set_pte = kvm_set_pte;
+		paravirt_ops.set_pte_at = kvm_set_pte_at;
+		paravirt_ops.set_pte_atomic = kvm_set_pte_atomic;
+		paravirt_ops.set_pte_present = kvm_set_pte_present;
+		paravirt_ops.pte_clear = kvm_pte_clear;
+		paravirt_ops.set_pmd = kvm_set_pmd;
+		paravirt_ops.pmd_clear = kvm_pmd_clear;
+		paravirt_ops.set_pud = kvm_set_pud;
+	}
+
 	paravirt_ops.paravirt_enabled = 1;
 
 	apply_paravirt(__parainstructions, __parainstructions_end);
@@ -215,6 +290,9 @@ static int paravirt_initialize(void)
 	if ((edx & KVM_FEATURE_CR_READ_CACHE))
 		do_cr_read_caching = 1;
 
+	if ((edx & KVM_FEATURE_MMU_WRITE))
+		do_mmu_write = 1;
+
 	on_each_cpu(paravirt_activate, NULL, 0, 1);
 
 	return 0;
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 7b57431..4f65729 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -94,7 +94,7 @@ struct vfsmount *kvmfs_mnt;
 
 #define KVM_PARAVIRT_FEATURES \
 	(KVM_FEATURE_VMCA | KVM_FEATURE_NOP_IO_DELAY | \
-	 KVM_FEATURE_CR_READ_CACHE)
+	 KVM_FEATURE_CR_READ_CACHE | KVM_FEATURE_MMU_WRITE)
 
 #define KVM_MSR_SET_VMCA	0x87655678
 
@@ -1347,10 +1347,36 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_halt);
 
+static int kvm_hypercall_mmu_write(struct kvm_vcpu *vcpu, gva_t addr,
+				   unsigned long size, unsigned long a0,
+				   unsigned long a1)
+{
+	gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, addr);
+	u64 value;
+
+	if (gpa == UNMAPPED_GVA)
+		return -EFAULT;
+	if (size == 1) {
+		if (!emulator_write_phys(vcpu, gpa, &a0, sizeof(a0)))
+			return -EFAULT;
+	} else if (size == 2) {
+		value = (u64)a1 << 32 | a0;
+		if (!emulator_write_phys(vcpu, gpa, &value, sizeof(value)))
+			return -EFAULT;
+	} else
+		return -E2BIG;
+
+	return 0;
+}
+
 static int dispatch_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
 			      unsigned long p1, unsigned long p2,
 			      unsigned long p3, unsigned long p4)
 {
+	switch (nr) {
+	case KVM_HYPERCALL_MMU_WRITE:
+		return kvm_hypercall_mmu_write(vcpu, p1, p2, p3, p4);
+	}
 	return -ENOSYS;
 }
 
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 121a09c..e8ff676 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -14,6 +14,7 @@
 #define KVM_FEATURE_VMCA		(1UL << 0)
 #define KVM_FEATURE_NOP_IO_DELAY	(1UL << 1)
 #define KVM_FEATURE_CR_READ_CACHE	(1UL << 2)
+#define KVM_FEATURE_MMU_WRITE		(1UL << 3)
 
 struct kvm_vmca
 {
@@ -31,4 +32,6 @@ struct kvm_vmca
  * return value is in RAX.
  */
 
+#define KVM_HYPERCALL_MMU_WRITE		0
+
 #endif

^ permalink raw reply related	[flat|nested] 85+ messages in thread

[parent not found: <4675F533.40809-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 3/5] KVM: Add paravirt MMU write support
       [not found]     ` <4675F533.40809-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18  8:20       ` Avi Kivity
       [not found]         ` <46764061.9080705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Avi Kivity @ 2007-06-18  8:20 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> +static int kvm_hypercall_mmu_write(struct kvm_vcpu *vcpu, gva_t addr,
> +				   unsigned long size, unsigned long a0,
> +				   unsigned long a1)
> +{
> +	gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, addr);
> +	u64 value;
> +
> +	if (gpa == UNMAPPED_GVA)
> +		return -EFAULT;
> +	if (size == 1) {
> +		if (!emulator_write_phys(vcpu, gpa, &a0, sizeof(a0)))
> +			return -EFAULT;
> +	} else if (size == 2) {
> +		value = (u64)a1 << 32 | a0;
> +		if (!emulator_write_phys(vcpu, gpa, &value, sizeof(value)))
> +			return -EFAULT;
> +	} else
> +		return -E2BIG;
> +
> +	return 0;
> +}

Hypercalls should return kvm-specific error codes (defined in 
include/linux/kvm_para.h), not Linux error codes, as they could be used 
in operating systems which have different values for E2BIG and friends.
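
As a rough sketch of what that could mean, the userspace mock-up below returns KVM-private codes instead of errno values.  The KVM_EFAULT and KVM_E2BIG names and their values are hypothetical (nothing like them exists in kvm_para.h at this point), and guest_mem is a toy buffer standing in for guest physical memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical KVM-private error numbers; names and values are illustrative. */
#define KVM_EFAULT	1
#define KVM_E2BIG	2

#define GUEST_MEM_SIZE	64
static uint8_t guest_mem[GUEST_MEM_SIZE];	/* toy guest physical memory */

/* size counts 32-bit words, as in the patch: either 1 or 2. */
static int mock_mmu_write(uint64_t gpa, unsigned long size,
			  uint32_t a0, uint32_t a1)
{
	uint64_t value = (uint64_t)a1 << 32 | a0;
	size_t bytes;

	if (size != 1 && size != 2)
		return -KVM_E2BIG;

	bytes = size * sizeof(uint32_t);
	if (gpa > GUEST_MEM_SIZE - bytes)
		return -KVM_EFAULT;

	if (size == 1)
		memcpy(&guest_mem[gpa], &a0, bytes);
	else
		memcpy(&guest_mem[gpa], &value, bytes);
	return 0;
}

/* Helper for checking what a two-word write stored. */
static uint64_t mock_read_u64(uint64_t gpa)
{
	uint64_t value;

	assert(gpa <= GUEST_MEM_SIZE - sizeof(value));
	memcpy(&value, &guest_mem[gpa], sizeof(value));
	return value;
}
```

A guest on any operating system would then only need the KVM header to interpret the return value, whatever its own errno numbering is.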

> +static void kvm_pte_clear(struct mm_struct *mm,
> +			  unsigned long addr, pte_t *ptep)
> +{
> +	pte_t pte = {0};
>   

Surely there's a nice macro for creating a pte from an int?

Any performance measurement?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46764061.9080705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: [PATCH 3/5] KVM: Add paravirt MMU write support
       [not found]         ` <46764061.9080705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-06-18 12:33           ` Anthony Liguori
       [not found]             ` <46767B8C.9050001-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-19 21:57           ` Anthony Liguori
  1 sibling, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 12:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

Avi Kivity wrote:
> Anthony Liguori wrote:
>> +static int kvm_hypercall_mmu_write(struct kvm_vcpu *vcpu, gva_t addr,
>> +                   unsigned long size, unsigned long a0,
>> +                   unsigned long a1)
>> +{
>> +    gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, addr);
>> +    u64 value;
>> +
>> +    if (gpa == UNMAPPED_GVA)
>> +        return -EFAULT;
>> +    if (size == 1) {
>> +        if (!emulator_write_phys(vcpu, gpa, &a0, sizeof(a0)))
>> +            return -EFAULT;
>> +    } else if (size == 2) {
>> +        value = (u64)a1 << 32 | a0;
>> +        if (!emulator_write_phys(vcpu, gpa, &value, sizeof(value)))
>> +            return -EFAULT;
>> +    } else
>> +        return -E2BIG;
>> +
>> +    return 0;
>> +}
>
> Hypercalls should return kvm-specific error codes (defined in 
> include/linux/kvm_para.h), not Linux error codes, as they could be 
> used in operating systems which have different values for E2BIG and 
> friends.

If Linux's errnos are stable, we could just use them and let a non-Linux 
guest define a set of KVM_E2BIG, etc.?  It just seemed pretty ugly to 
add a bunch of these.

>> +static void kvm_pte_clear(struct mm_struct *mm,
>> +              unsigned long addr, pte_t *ptep)
>> +{
>> +    pte_t pte = {0};
>>   
>
> Surely there's a nice macro for creating a pte from an int?

Probably :-)

> Any performance measurement?

Yes, surprisingly enough.  COW faults in virtbench drop by a significant 
amount.  I'll repost each patch with virtbench results.  I suspect that 
the vmmcall path is much faster than a page fault.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46767B8C.9050001-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 3/5] KVM: Add paravirt MMU write support
       [not found]             ` <46767B8C.9050001-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18 12:38               ` Avi Kivity
       [not found]                 ` <46767CD1.7030208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Avi Kivity @ 2007-06-18 12:38 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
>>
>> Hypercalls should return kvm-specific error codes (defined in 
>> include/linux/kvm_para.h), not Linux error codes, as they could be 
>> used in operating systems which have different values for E2BIG and 
>> friends.
>
> If Linux's errnos are stable, we could just use them and let a 
> non-Linux guest define a set of KVM_E2BIG, etc.?  It just seemed 
> pretty ugly to add a bunch of these.

The ugliness will serve to remind us that this is a potentially 
non-Linux path.

In this particular case, the names are not present in guest-visible 
headers, but any name that is made visible must be 
non-Linux-dependent, so I don't see why error numbers should be 
treated specially.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46767CD1.7030208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: [PATCH 3/5] KVM: Add paravirt MMU write support
       [not found]                 ` <46767CD1.7030208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-06-18 12:48                   ` Anthony Liguori
  0 siblings, 0 replies; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 12:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

Avi Kivity wrote:
> Anthony Liguori wrote:
>>>
>>> Hypercalls should return kvm-specific error codes (defined in 
>>> include/linux/kvm_para.h), not Linux error codes, as they could be 
>>> used in operating systems which have different values for E2BIG and 
>>> friends.
>>
>> If Linux's errnos are stable, we could just use them and let a 
>> non-Linux guest define a set of KVM_E2BIG, etc.?  It just seemed 
>> pretty ugly to add a bunch of these.
>
> The ugliness will serve to remind us that this is a potentially 
> non-Linux path.
>
> In this particular case, the names are not present in guest-visible 
> headers, but any other that is made visible must have a 
> non-Linux-dependent name, so I don't see why error numbers should be 
> treated specially.

Ok.

Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 3/5] KVM: Add paravirt MMU write support
       [not found]         ` <46764061.9080705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-06-18 12:33           ` Anthony Liguori
@ 2007-06-19 21:57           ` Anthony Liguori
       [not found]             ` <46785132.3070505-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-19 21:57 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

Avi Kivity wrote:
>> +static void kvm_pte_clear(struct mm_struct *mm,
>> +              unsigned long addr, pte_t *ptep)
>> +{
>> +    pte_t pte = {0};
>>   
>
> Surely there's a nice macro for creating a pte from an int?

Perhaps my grep'ing skills are weak, but I don't seem to see any.  Were 
you thinking of something in particular?

Regards,

Anthony Liguori

> Any performance measurement?
>


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46785132.3070505-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 3/5] KVM: Add paravirt MMU write support
       [not found]             ` <46785132.3070505-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-19 22:19               ` Jeremy Fitzhardinge
       [not found]                 ` <4678567C.6040400-TSDbQ3PG+2Y@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-19 22:19 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> Perhaps my grep'ing skills are weak, but I don't seem to see any. 
> Were you thinking of something in particular? 

__pte(), of course.  Sheesh.   ;)
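
A userspace mock-up of the idea, for anyone following along.  The mock_* names below are local stand-ins so this builds outside the kernel; they mirror the shape of the non-PAE i386 pte_t, __pte() and pte_val(), and are an assumption rather than a copy of the headers.

```c
#include <assert.h>

/* Local stand-ins shaped like the non-PAE i386 pte_t, __pte() and pte_val(). */
typedef struct { unsigned long pte_low; } mock_pte_t;
#define mock_pte(x)	((mock_pte_t) { (x) })
#define mock_pte_val(p)	((p).pte_low)

/* A single page table slot to operate on in this demo. */
static mock_pte_t demo_slot;

static void mock_set_pte(mock_pte_t *ptep, mock_pte_t pte)
{
	*ptep = pte;
}

/* Build the empty entry with the constructor macro, not a bare {0}. */
static void mock_pte_clear(mock_pte_t *ptep)
{
	mock_set_pte(ptep, mock_pte(0));
}
```

In kvm_pte_clear() that would presumably amount to replacing "pte_t pte = {0};" with "pte_t pte = __pte(0);".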

    J

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4678567C.6040400-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 3/5] KVM: Add paravirt MMU write support
       [not found]                 ` <4678567C.6040400-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-19 22:28                   ` Anthony Liguori
  0 siblings, 0 replies; 85+ messages in thread
From: Anthony Liguori @ 2007-06-19 22:28 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>   
>> Perhaps my grep'ing skills are weak, but I don't seem to see any. 
>> Were you thinking of something in particular? 
>>     
>
> __pte(), of course.  Sheesh.   ;)
>   

How could I have missed something that is so clearly named!  :-)

Regards,

Anthony Liguori

>     J
>
>   


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found] ` <4675F462.1010708-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
                     ` (2 preceding siblings ...)
  2007-06-18  3:00   ` [PATCH 3/5] KVM: Add paravirt MMU write support Anthony Liguori
@ 2007-06-18  3:00   ` Anthony Liguori
       [not found]     ` <4675F568.90608-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18  3:03   ` [PATCH 5/5] KVM: paravirt time source Anthony Liguori
  2007-06-18  3:19   ` [PATCH 0/5] KVM paravirt_ops implementation Jeremy Fitzhardinge
  5 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18  3:00 UTC (permalink / raw)
  To: kvm-devel; +Cc: virtualization

[-- Attachment #1: Type: text/plain, Size: 26 bytes --]

Regards,

Anthony Liguori

[-- Attachment #2: kvm-hypercall-queue.diff --]
[-- Type: text/x-patch, Size: 9728 bytes --]

Subject: [PATCH] KVM: Add hypercall queue for paravirt_ops implementation
Author: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Implemented a hypercall queue that can be used when paravirt_ops lazy mode
is enabled.  This patch enables queueing of MMU write operations and CR
updates.  This results in about a 50% bump in kernbench performance.

Signed-off-by: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

diff --git a/arch/i386/kernel/kvm.c b/arch/i386/kernel/kvm.c
index 07ce38e..4b323f1 100644
--- a/arch/i386/kernel/kvm.c
+++ b/arch/i386/kernel/kvm.c
@@ -33,8 +33,10 @@ struct kvm_paravirt_state
 	unsigned long cached_cr[5];
 	int cr_valid[5];
 
-	struct kvm_vmca *vmca;
+	enum paravirt_lazy_mode mode;
 	struct kvm_hypercall_entry *queue;
+
+	struct kvm_vmca *vmca;
 	void (*hypercall)(void);
 
 	u64 vmca_gpa;
@@ -42,17 +44,17 @@ struct kvm_paravirt_state
 
 static DEFINE_PER_CPU(struct kvm_paravirt_state *, paravirt_state);
 
+static int do_hypercall_batching;
 static int do_mmu_write;
 static int do_cr_read_caching;
 static int do_nop_io_delay;
 static u64 msr_set_vmca;
 
-static long kvm_hypercall(unsigned int nr, unsigned long p1,
-			  unsigned long p2, unsigned long p3,
-			  unsigned long p4)
+static long _kvm_hypercall(struct kvm_paravirt_state *state,
+			   unsigned int nr, unsigned long p1,
+			   unsigned long p2, unsigned long p3,
+			   unsigned long p4)
 {
-	struct kvm_paravirt_state *state
-		= per_cpu(paravirt_state, smp_processor_id());
 	long ret;
 
 	asm volatile("call *(%6) \n\t"
@@ -69,6 +71,55 @@ static long kvm_hypercall(unsigned int nr, unsigned long p1,
 	return ret;
 }
 
+static int can_defer_hypercall(struct kvm_paravirt_state *state,
+			       unsigned int nr)
+{
+	if (state->mode == PARAVIRT_LAZY_MMU) {
+		if (nr == KVM_HYPERCALL_MMU_WRITE)
+			return 1;
+	} else if (state->mode == PARAVIRT_LAZY_CPU) {
+		if (nr == KVM_HYPERCALL_SET_CR)
+			return 1;
+	}
+
+	return 0;
+}
+
+static void _kvm_hypercall_defer(struct kvm_paravirt_state *state,
+				 unsigned int nr,
+				 unsigned long p1, unsigned long p2,
+				 unsigned long p3, unsigned long p4)
+{
+	struct kvm_hypercall_entry *entry;
+
+	if (state->vmca->queue_index == state->vmca->max_queue_index)
+		_kvm_hypercall(state, KVM_HYPERCALL_FLUSH, 0, 0, 0, 0);
+
+	/* FIXME: are we preempt safe here? */
+	entry = &state->queue[state->vmca->queue_index++];
+	entry->nr = nr;
+	entry->p1 = p1;
+	entry->p2 = p2;
+	entry->p3 = p3;
+	entry->p4 = p4;
+}
+
+static long kvm_hypercall(unsigned int nr, unsigned long p1,
+			  unsigned long p2, unsigned long p3,
+			  unsigned long p4)
+{
+	struct kvm_paravirt_state *state
+		= per_cpu(paravirt_state, smp_processor_id());
+	long ret = 0;
+
+	if (can_defer_hypercall(state, nr))
+		_kvm_hypercall_defer(state, nr, p1, p2, p3, p4);
+	else
+		ret = _kvm_hypercall(state, nr, p1, p2, p3, p4);
+
+	return ret;
+}
+
 /*
  * No need for any "IO delay" on KVM
  */
@@ -107,7 +158,9 @@ static void kvm_write_cr(int reg, unsigned long value)
 	state->cr_valid[reg] = 1;
 	state->cached_cr[reg] = value;
 
-	if (reg == 0)
+	if (state->mode == PARAVIRT_LAZY_CPU)
+		kvm_hypercall(KVM_HYPERCALL_SET_CR, reg, value, 0, 0);
+	else if (reg == 0)
 		native_write_cr0(value);
 	else if (reg == 3)
 		native_write_cr3(value);
@@ -218,6 +271,18 @@ static void kvm_pmd_clear(pmd_t *pmdp)
 	kvm_mmu_write(pmdp, &pmd, sizeof(pmd));
 }
 
+static void kvm_set_lazy_mode(enum paravirt_lazy_mode mode)
+{
+	struct kvm_paravirt_state *state
+		= per_cpu(paravirt_state, smp_processor_id());
+
+	if (mode == PARAVIRT_LAZY_FLUSH || mode == PARAVIRT_LAZY_NONE) {
+		if (state->vmca->queue_index)
+			_kvm_hypercall(state, KVM_HYPERCALL_FLUSH, 0, 0, 0, 0);
+	}
+	state->mode = mode;
+}
+
 static void paravirt_ops_setup(void)
 {
 	paravirt_ops.name = "KVM";
@@ -249,6 +314,9 @@ static void paravirt_ops_setup(void)
 		paravirt_ops.set_pud = kvm_set_pud;
 	}
 
+	if (do_hypercall_batching)
+		paravirt_ops.set_lazy_mode = kvm_set_lazy_mode;
+
 	paravirt_ops.paravirt_enabled = 1;
 
 	apply_paravirt(__parainstructions, __parainstructions_end);
@@ -293,6 +361,9 @@ static int paravirt_initialize(void)
 	if ((edx & KVM_FEATURE_MMU_WRITE))
 		do_mmu_write = 1;
 
+	if ((edx & KVM_FEATURE_HYPERCALL_BATCHING))
+		do_hypercall_batching = 1;
+
 	on_each_cpu(paravirt_activate, NULL, 0, 1);
 
 	return 0;
@@ -303,6 +374,9 @@ static __init void paravirt_free_state(struct kvm_paravirt_state *state)
 	if (!state)
 		return;
 
+	if (state->queue)
+		__free_page(pfn_to_page(__pa(state->queue) >> PAGE_SHIFT));
+
 	if (state->hypercall)
 		__free_page(pfn_to_page(__pa(state->hypercall) >> PAGE_SHIFT));
 
@@ -329,8 +403,15 @@ static __init struct kvm_paravirt_state *paravirt_alloc_state(void)
 	if (!state->hypercall)
 		goto err;
 
+	state->queue = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!state->queue)
+		goto err;
+
 	state->vmca_gpa = __pa(state->vmca);
 	state->vmca->hypercall_gpa = __pa(state->hypercall);
+	state->vmca->queue_gpa = __pa(state->queue);
+	state->vmca->max_queue_index
+		= (PAGE_SIZE / sizeof(struct kvm_hypercall_entry));
 
 	return state;
 
diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index b08272b..d531899 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -291,6 +291,7 @@ struct kvm_vcpu {
 	gpa_t para_state_gpa;
 	struct page *para_state_page;
 	gpa_t hypercall_gpa;
+	struct page *queue_page;
 	unsigned long cr4;
 	unsigned long cr8;
 	u64 pdptrs[4]; /* pae */
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 4f65729..79a2a64 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -94,7 +94,8 @@ struct vfsmount *kvmfs_mnt;
 
 #define KVM_PARAVIRT_FEATURES \
 	(KVM_FEATURE_VMCA | KVM_FEATURE_NOP_IO_DELAY | \
-	 KVM_FEATURE_CR_READ_CACHE | KVM_FEATURE_MMU_WRITE)
+	 KVM_FEATURE_CR_READ_CACHE | KVM_FEATURE_MMU_WRITE | \
+	 KVM_FEATURE_HYPERCALL_BATCHING)
 
 #define KVM_MSR_SET_VMCA	0x87655678
 
@@ -1369,6 +1370,24 @@ static int kvm_hypercall_mmu_write(struct kvm_vcpu *vcpu, gva_t addr,
 	return 0;
 }
 
+static int kvm_hypercall_set_cr(struct kvm_vcpu *vcpu,
+				u32 reg, unsigned long value)
+{
+	switch (reg) {
+	case 0:
+		set_cr0(vcpu, value);
+		break;
+	case 3:
+		set_cr3(vcpu, value);
+		break;
+	case 4:
+		set_cr4(vcpu, value);
+		break;
+	}
+
+	return 0;
+}
+
 static int dispatch_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
 			      unsigned long p1, unsigned long p2,
 			      unsigned long p3, unsigned long p4)
@@ -1376,10 +1395,36 @@ static int dispatch_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
 	switch (nr) {
 	case KVM_HYPERCALL_MMU_WRITE:
 		return kvm_hypercall_mmu_write(vcpu, p1, p2, p3, p4);
+	case KVM_HYPERCALL_SET_CR:
+		return kvm_hypercall_set_cr(vcpu, p1, p2);
 	}
 	return -ENOSYS;
 }
 
+static int kvm_hypercall_flush(struct kvm_vcpu *vcpu)
+{
+	struct kvm_hypercall_entry *queue;
+	struct kvm_vmca *vmca;
+	int ret = 0;
+	int i;
+
+	queue = kmap(vcpu->queue_page);
+	vmca = kmap(vcpu->para_state_page);
+
+	for (i = 0; i < vmca->queue_index; i++)
+		ret |= dispatch_hypercall(vcpu, queue[i].nr, queue[i].p1,
+					  queue[i].p2, queue[i].p3,
+					  queue[i].p4);
+
+	vmca->queue_index = 0;
+	mark_page_dirty(vcpu->kvm, vcpu->para_state_gpa >> PAGE_SHIFT);
+
+	kunmap(vcpu->para_state_page);
+	kunmap(vcpu->queue_page);
+
+	return ret;
+}
+
 int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	unsigned long nr, a0, a1, a2, a3, a4, a5, ret;
@@ -1407,7 +1452,11 @@ int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		a5 = vcpu->regs[VCPU_REGS_RBP] & -1u;
 	}
 
-	ret = dispatch_hypercall(vcpu, nr, a0, a1, a2, a3);
+	if (nr == KVM_HYPERCALL_FLUSH)
+		ret = kvm_hypercall_flush(vcpu);
+	else
+		ret = dispatch_hypercall(vcpu, nr, a0, a1, a2, a3);
+
 	if (ret == -ENOSYS) {
 		run->hypercall.args[0] = a0;
 		run->hypercall.args[1] = a1;
@@ -1498,8 +1547,8 @@ void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val,
 static int vcpu_register_para(struct kvm_vcpu *vcpu, gpa_t para_state_gpa)
 {
 	struct kvm_vmca *para_state;
-	hpa_t para_state_hpa, hypercall_hpa;
-	struct page *para_state_page;
+	hpa_t para_state_hpa, hypercall_hpa, queue_hpa;
+	struct page *para_state_page, *queue_page;
 	unsigned char *hypercall;
 	gpa_t hypercall_gpa;
 
@@ -1526,10 +1575,16 @@ static int vcpu_register_para(struct kvm_vcpu *vcpu, gpa_t para_state_gpa)
 	if (is_error_hpa(hypercall_hpa))
 		goto err_kunmap_skip;
 
+	queue_hpa = gpa_to_hpa(vcpu, para_state->queue_gpa);
+	if (is_error_hpa(queue_hpa))
+		goto err_kunmap_skip;
+	queue_page = pfn_to_page(queue_hpa >> PAGE_SHIFT);
+
 	printk(KERN_DEBUG "kvm: para guest successfully registered.\n");
 	vcpu->para_state_page = para_state_page;
 	vcpu->para_state_gpa = para_state_gpa;
 	vcpu->hypercall_gpa = hypercall_gpa;
+	vcpu->queue_page = queue_page;
 
 	mark_page_dirty(vcpu->kvm, hypercall_gpa >> PAGE_SHIFT);
 	hypercall = kmap_atomic(pfn_to_page(hypercall_hpa >> PAGE_SHIFT),
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index e8ff676..7dd0cef 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -15,10 +15,23 @@
 #define KVM_FEATURE_NOP_IO_DELAY	(1UL << 1)
 #define KVM_FEATURE_CR_READ_CACHE	(1UL << 2)
 #define KVM_FEATURE_MMU_WRITE		(1UL << 3)
+#define KVM_FEATURE_HYPERCALL_BATCHING	(1UL << 4)
 
 struct kvm_vmca
 {
 	u64 hypercall_gpa;
+	u64 queue_gpa;
+	u32 queue_index;
+	u32 max_queue_index;
+};
+
+struct kvm_hypercall_entry
+{
+	unsigned long nr;
+	unsigned long p1;
+	unsigned long p2;
+	unsigned long p3;
+	unsigned long p4;
 };
 
 /*
@@ -33,5 +46,7 @@ struct kvm_vmca
  */
 
 #define KVM_HYPERCALL_MMU_WRITE		0
+#define KVM_HYPERCALL_SET_CR		1
+#define KVM_HYPERCALL_FLUSH		2
 
 #endif

[-- Attachment #4: Type: text/plain, Size: 186 bytes --]

_______________________________________________
kvm-devel mailing list
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/kvm-devel

^ permalink raw reply related	[flat|nested] 85+ messages in thread

[parent not found: <4675F568.90608-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]     ` <4675F568.90608-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18  4:00       ` Jeremy Fitzhardinge
       [not found]         ` <46760343.5070401-TSDbQ3PG+2Y@public.gmane.org>
  2007-06-18  9:07       ` Avi Kivity
  1 sibling, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-18  4:00 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> Regards,
>
> Anthony Liguori
> ------------------------------------------------------------------------
>
> Subject: [PATCH] KVM: Add hypercall queue for paravirt_ops implementation
> Author: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>
> Implemented a hypercall queue that can be used when paravirt_ops lazy mode
> is enabled.  This patch enables queueing of MMU write operations and CR
> updates.  This results in about a 50% bump in kernbench performance.
>
> Signed-off-by: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>
> diff --git a/arch/i386/kernel/kvm.c b/arch/i386/kernel/kvm.c
> index 07ce38e..4b323f1 100644
> --- a/arch/i386/kernel/kvm.c
> +++ b/arch/i386/kernel/kvm.c
> @@ -33,8 +33,10 @@ struct kvm_paravirt_state
>  	unsigned long cached_cr[5];
>  	int cr_valid[5];
>  
> -	struct kvm_vmca *vmca;
> +	enum paravirt_lazy_mode mode;
>  	struct kvm_hypercall_entry *queue;
> +
> +	struct kvm_vmca *vmca;
>  	void (*hypercall)(void);
>  
>  	u64 vmca_gpa;
> @@ -42,17 +44,17 @@ struct kvm_paravirt_state
>  
>  static DEFINE_PER_CPU(struct kvm_paravirt_state *, paravirt_state);
>  
> +static int do_hypercall_batching;
>  static int do_mmu_write;
>  static int do_cr_read_caching;
>  static int do_nop_io_delay;
>  static u64 msr_set_vmca;
>  
> -static long kvm_hypercall(unsigned int nr, unsigned long p1,
> -			  unsigned long p2, unsigned long p3,
> -			  unsigned long p4)
> +static long _kvm_hypercall(struct kvm_paravirt_state *state,
> +			   unsigned int nr, unsigned long p1,
> +			   unsigned long p2, unsigned long p3,
> +			   unsigned long p4)
>  {
> -	struct kvm_paravirt_state *state
> -		= per_cpu(paravirt_state, smp_processor_id());
>  	long ret;
>  
>  	asm volatile("call *(%6) \n\t"
> @@ -69,6 +71,55 @@ static long kvm_hypercall(unsigned int nr, unsigned long p1,
>  	return ret;
>  }
>  
> +static int can_defer_hypercall(struct kvm_paravirt_state *state,
> +			       unsigned int nr)
> +{
> +	if (state->mode == PARAVIRT_LAZY_MMU) {
> +		if (nr == KVM_HYPERCALL_MMU_WRITE)
> +			return 1;
> +	} else if (state->mode == PARAVIRT_LAZY_CPU) {
> +		if (nr == KVM_HYPERCALL_SET_CR)
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +static void _kvm_hypercall_defer(struct kvm_paravirt_state *state,
> +				 unsigned int nr,
> +				 unsigned long p1, unsigned long p2,
> +				 unsigned long p3, unsigned long p4)
> +{
> +	struct kvm_hypercall_entry *entry;
> +
> +	if (state->vmca->queue_index == state->vmca->max_queue_index)
> +		_kvm_hypercall(state, KVM_HYPERCALL_FLUSH, 0, 0, 0, 0);
> +
> +	/* FIXME: are we preempt safe here? */
>   

BUG_ON(preemptible()) would be a reasonable thing to put here to be sure.
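
As a rough illustration of the deferral path with that check in place, here is
a minimal userspace sketch.  It is not the kernel code: toy_preempt_count,
toy_preemptible(), toy_flush() and the 4-entry capacity are assumptions made
for the example, and assert() stands in for BUG_ON().

```c
#include <assert.h>

/* Toy stand-ins (assumptions for this sketch, not kernel APIs). */
#define TOY_QUEUE_CAP 4            /* the patch uses PAGE_SIZE / sizeof(entry) */

struct toy_entry {
	unsigned long nr, p1, p2, p3, p4;
};

static struct toy_entry toy_queue[TOY_QUEUE_CAP];
static unsigned int toy_queue_index;   /* models vmca->queue_index */
static unsigned int toy_flush_count;   /* number of flushes issued so far */
static int toy_preempt_count = 1;      /* > 0 means preemption is disabled */

static int toy_preemptible(void)
{
	return toy_preempt_count == 0;
}

/* Models KVM_HYPERCALL_FLUSH: the host drains the queue and resets the index. */
static void toy_flush(void)
{
	toy_flush_count++;
	toy_queue_index = 0;
}

/* Models _kvm_hypercall_defer(): flush when full, then append one entry. */
static void toy_defer(unsigned long nr, unsigned long p1, unsigned long p2,
		      unsigned long p3, unsigned long p4)
{
	struct toy_entry *entry;

	/* Stand-in for BUG_ON(preemptible()): the queue is per-CPU state. */
	assert(!toy_preemptible());

	if (toy_queue_index == TOY_QUEUE_CAP)
		toy_flush();

	entry = &toy_queue[toy_queue_index++];
	entry->nr = nr;
	entry->p1 = p1;
	entry->p2 = p2;
	entry->p3 = p3;
	entry->p4 = p4;
}
```

Deferring a fifth call on a full 4-entry queue triggers one flush and leaves
the new entry in slot 0.  If the caller could be preempted and migrated between
the fullness check and the store, the entry could land in another CPU's queue,
which is the case the check is meant to catch.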

> +	entry = &state->queue[state->vmca->queue_index++];
> +	entry->nr = nr;
> +	entry->p1 = p1;
> +	entry->p2 = p2;
> +	entry->p3 = p3;
> +	entry->p4 = p4;
> +}
> +
> +static long kvm_hypercall(unsigned int nr, unsigned long p1,
> +			  unsigned long p2, unsigned long p3,
> +			  unsigned long p4)
> +{
> +	struct kvm_paravirt_state *state
> +		= per_cpu(paravirt_state, smp_processor_id());
>   

Rather than using this here and passing state around, you could use
either x86_read/write_percpu, or get/put_cpu_var (or __get_cpu_var if
you don't need the preempt-disable).

> +	long ret = 0;
> +
> +	if (can_defer_hypercall(state, nr))
> +		_kvm_hypercall_defer(state, nr, p1, p2, p3, p4);
> +	else
> +		ret = _kvm_hypercall(state, nr, p1, p2, p3, p4);
> +
> +	return ret;
> +}
> +
>  /*
>   * No need for any "IO delay" on KVM
>   */
> @@ -107,7 +158,9 @@ static void kvm_write_cr(int reg, unsigned long value)
>  	state->cr_valid[reg] = 1;
>  	state->cached_cr[reg] = value;
>  
> -	if (reg == 0)
> +	if (state->mode == PARAVIRT_LAZY_CPU)
> +		kvm_hypercall(KVM_HYPERCALL_SET_CR, reg, value, 0, 0);
> +	else if (reg == 0)
>  		native_write_cr0(value);
>  	else if (reg == 3)
>  		native_write_cr3(value);
> @@ -218,6 +271,18 @@ static void kvm_pmd_clear(pmd_t *pmdp)
>  	kvm_mmu_write(pmdp, &pmd, sizeof(pmd));
>  }
>  
> +static void kvm_set_lazy_mode(enum paravirt_lazy_mode mode)
> +{
> +	struct kvm_paravirt_state *state
> +		= per_cpu(paravirt_state, smp_processor_id());
> +
> +	if (mode == PARAVIRT_LAZY_FLUSH || mode == PARAVIRT_LAZY_NONE) {
> +		if (state->vmca->queue_index)
> +			_kvm_hypercall(state, KVM_HYPERCALL_FLUSH, 0, 0, 0, 0);
> +	}
> +	state->mode = mode;
>   

No, you don't want to set state->mode to LAZY_FLUSH (it's not a mode,
just an action which overloads the interface).
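
One way to read that suggestion, sketched as a userspace model rather than
kernel code.  The toy_* names and the local enum are assumptions for the
example; they only mirror the shape of the paravirt lazy-mode interface.

```c
#include <assert.h>

/* Local stand-ins for the paravirt lazy modes (names are assumptions). */
enum toy_lazy_mode {
	TOY_LAZY_NONE,
	TOY_LAZY_MMU,
	TOY_LAZY_CPU,
	TOY_LAZY_FLUSH,   /* an action, not a state */
};

static enum toy_lazy_mode toy_mode = TOY_LAZY_NONE;
static unsigned int toy_pending;      /* models vmca->queue_index */
static unsigned int toy_flushes;      /* number of flush hypercalls issued */

static void toy_set_lazy_mode(enum toy_lazy_mode mode)
{
	assert(mode <= TOY_LAZY_FLUSH);

	/* Drain the queue on an explicit flush or when leaving lazy mode. */
	if (mode == TOY_LAZY_FLUSH || mode == TOY_LAZY_NONE) {
		if (toy_pending) {
			toy_flushes++;
			toy_pending = 0;
		}
	}

	/* Record only real modes; a flush request leaves the mode unchanged. */
	if (mode != TOY_LAZY_FLUSH)
		toy_mode = mode;
}
```

With this shape, a flush issued while in MMU lazy mode drains the queue but the
recorded mode stays MMU, so later MMU writes can still be deferred.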

> +}
> +
>  static void paravirt_ops_setup(void)
>  {
>  	paravirt_ops.name = "KVM";
> @@ -249,6 +314,9 @@ static void paravirt_ops_setup(void)
>  		paravirt_ops.set_pud = kvm_set_pud;
>  	}
>  
> +	if (do_hypercall_batching)
> +		paravirt_ops.set_lazy_mode = kvm_set_lazy_mode;
> +
>  	paravirt_ops.paravirt_enabled = 1;
>  
>  	apply_paravirt(__parainstructions, __parainstructions_end);
> @@ -293,6 +361,9 @@ static int paravirt_initialize(void)
>  	if ((edx & KVM_FEATURE_MMU_WRITE))
>  		do_mmu_write = 1;
>  
> +	if ((edx & KVM_FEATURE_HYPERCALL_BATCHING))
> +		do_hypercall_batching = 1;
> +
>  	on_each_cpu(paravirt_activate, NULL, 0, 1);
>  
>  	return 0;
> @@ -303,6 +374,9 @@ static __init void paravirt_free_state(struct kvm_paravirt_state *state)
>  	if (!state)
>  		return;
>  
> +	if (state->queue)
> +		__free_page(pfn_to_page(__pa(state->queue) >> PAGE_SHIFT));
> +
>  	if (state->hypercall)
>  		__free_page(pfn_to_page(__pa(state->hypercall) >> PAGE_SHIFT));
>  
> @@ -329,8 +403,15 @@ static __init struct kvm_paravirt_state *paravirt_alloc_state(void)
>  	if (!state->hypercall)
>  		goto err;
>  
> +	state->queue = (void *)get_zeroed_page(GFP_KERNEL);
> +	if (!state->queue)
> +		goto err;
> +
>  	state->vmca_gpa = __pa(state->vmca);
>  	state->vmca->hypercall_gpa = __pa(state->hypercall);
> +	state->vmca->queue_gpa = __pa(state->queue);
> +	state->vmca->max_queue_index
> +		= (PAGE_SIZE / sizeof(struct kvm_hypercall_entry));
>  
>  	return state;
>  
> diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
> index b08272b..d531899 100644
> --- a/drivers/kvm/kvm.h
> +++ b/drivers/kvm/kvm.h
> @@ -291,6 +291,7 @@ struct kvm_vcpu {
>  	gpa_t para_state_gpa;
>  	struct page *para_state_page;
>  	gpa_t hypercall_gpa;
> +	struct page *queue_page;
>  	unsigned long cr4;
>  	unsigned long cr8;
>  	u64 pdptrs[4]; /* pae */
> diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
> index 4f65729..79a2a64 100644
> --- a/drivers/kvm/kvm_main.c
> +++ b/drivers/kvm/kvm_main.c
> @@ -94,7 +94,8 @@ struct vfsmount *kvmfs_mnt;
>  
>  #define KVM_PARAVIRT_FEATURES \
>  	(KVM_FEATURE_VMCA | KVM_FEATURE_NOP_IO_DELAY | \
> -	 KVM_FEATURE_CR_READ_CACHE | KVM_FEATURE_MMU_WRITE)
> +	 KVM_FEATURE_CR_READ_CACHE | KVM_FEATURE_MMU_WRITE | \
> +	 KVM_FEATURE_HYPERCALL_BATCHING)
>  
>  #define KVM_MSR_SET_VMCA	0x87655678
>  
> @@ -1369,6 +1370,24 @@ static int kvm_hypercall_mmu_write(struct kvm_vcpu *vcpu, gva_t addr,
>  	return 0;
>  }
>  
> +static int kvm_hypercall_set_cr(struct kvm_vcpu *vcpu,
> +				u32 reg, unsigned long value)
> +{
> +	switch (reg) {
> +	case 0:
> +		set_cr0(vcpu, value);
> +		break;
> +	case 3:
> +		set_cr3(vcpu, value);
> +		break;
> +	case 4:
> +		set_cr4(vcpu, value);
> +		break;
> +	}
> +
> +	return 0;
> +}
> +
>  static int dispatch_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>  			      unsigned long p1, unsigned long p2,
>  			      unsigned long p3, unsigned long p4)
> @@ -1376,10 +1395,36 @@ static int dispatch_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>  	switch (nr) {
>  	case KVM_HYPERCALL_MMU_WRITE:
>  		return kvm_hypercall_mmu_write(vcpu, p1, p2, p3, p4);
> +	case KVM_HYPERCALL_SET_CR:
> +		return kvm_hypercall_set_cr(vcpu, p1, p2);
>  	}
>  	return -ENOSYS;
>  }
>  
> +static int kvm_hypercall_flush(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_hypercall_entry *queue;
> +	struct kvm_vmca *vmca;
> +	int ret = 0;
> +	int i;
> +
> +	queue = kmap(vcpu->queue_page);
> +	vmca = kmap(vcpu->para_state_page);
>   

kmap_atomic?  Or why not keep them mapped all the time?

    J

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46760343.5070401-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]         ` <46760343.5070401-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-18  4:09           ` Jeremy Fitzhardinge
  2007-06-18 12:22           ` Anthony Liguori
  1 sibling, 0 replies; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-18  4:09 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
>> +static int kvm_hypercall_flush(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_hypercall_entry *queue;
>> +	struct kvm_vmca *vmca;
>> +	int ret = 0;
>> +	int i;
>> +
>> +	queue = kmap(vcpu->queue_page);
>> +	vmca = kmap(vcpu->para_state_page);
>>   
>>     
>
> kmap_atomic?  Or why not keep them mapped all the time?
>   

Oh, right, this is on the kvm side.  Still, kmap_atomic?

    J

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]         ` <46760343.5070401-TSDbQ3PG+2Y@public.gmane.org>
  2007-06-18  4:09           ` Jeremy Fitzhardinge
@ 2007-06-18 12:22           ` Anthony Liguori
  1 sibling, 0 replies; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 12:22 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>   
>> Regards,
>>
>> Anthony Liguori
>> ------------------------------------------------------------------------
>>
>> Subject: [PATCH] KVM: Add hypercall queue for paravirt_ops implementation
>> Author: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>>
>> Implemented a hypercall queue that can be used when paravirt_ops lazy mode
>> is enabled.  This patch enables queueing of MMU write operations and CR
>> updates.  This results in about a 50% bump in kernbench performance.
>>
>> Signed-off-by: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>>
>> diff --git a/arch/i386/kernel/kvm.c b/arch/i386/kernel/kvm.c
>> index 07ce38e..4b323f1 100644
>> --- a/arch/i386/kernel/kvm.c
>> +++ b/arch/i386/kernel/kvm.c
>> @@ -33,8 +33,10 @@ struct kvm_paravirt_state
>>  	unsigned long cached_cr[5];
>>  	int cr_valid[5];
>>  
>> -	struct kvm_vmca *vmca;
>> +	enum paravirt_lazy_mode mode;
>>  	struct kvm_hypercall_entry *queue;
>> +
>> +	struct kvm_vmca *vmca;
>>  	void (*hypercall)(void);
>>  
>>  	u64 vmca_gpa;
>> @@ -42,17 +44,17 @@ struct kvm_paravirt_state
>>  
>>  static DEFINE_PER_CPU(struct kvm_paravirt_state *, paravirt_state);
>>  
>> +static int do_hypercall_batching;
>>  static int do_mmu_write;
>>  static int do_cr_read_caching;
>>  static int do_nop_io_delay;
>>  static u64 msr_set_vmca;
>>  
>> -static long kvm_hypercall(unsigned int nr, unsigned long p1,
>> -			  unsigned long p2, unsigned long p3,
>> -			  unsigned long p4)
>> +static long _kvm_hypercall(struct kvm_paravirt_state *state,
>> +			   unsigned int nr, unsigned long p1,
>> +			   unsigned long p2, unsigned long p3,
>> +			   unsigned long p4)
>>  {
>> -	struct kvm_paravirt_state *state
>> -		= per_cpu(paravirt_state, smp_processor_id());
>>  	long ret;
>>  
>>  	asm volatile("call *(%6) \n\t"
>> @@ -69,6 +71,55 @@ static long kvm_hypercall(unsigned int nr, unsigned long p1,
>>  	return ret;
>>  }
>>  
>> +static int can_defer_hypercall(struct kvm_paravirt_state *state,
>> +			       unsigned int nr)
>> +{
>> +	if (state->mode == PARAVIRT_LAZY_MMU) {
>> +		if (nr == KVM_HYPERCALL_MMU_WRITE)
>> +			return 1;
>> +	} else if (state->mode == PARAVIRT_LAZY_CPU) {
>> +		if (nr == KVM_HYPERCALL_SET_CR)
>> +			return 1;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void _kvm_hypercall_defer(struct kvm_paravirt_state *state,
>> +				 unsigned int nr,
>> +				 unsigned long p1, unsigned long p2,
>> +				 unsigned long p3, unsigned long p4)
>> +{
>> +	struct kvm_hypercall_entry *entry;
>> +
>> +	if (state->vmca->queue_index == state->vmca->max_queue_index)
>> +		_kvm_hypercall(state, KVM_HYPERCALL_FLUSH, 0, 0, 0, 0);
>> +
>> +	/* FIXME: are we preempt safe here? */
>>   
>>     
>
> BUG_ON(preemptible()) would be a reasonable thing to put here to be sure.
>   

Ok.

>> +	entry = &state->queue[state->vmca->queue_index++];
>> +	entry->nr = nr;
>> +	entry->p1 = p1;
>> +	entry->p2 = p2;
>> +	entry->p3 = p3;
>> +	entry->p4 = p4;
>> +}
>> +
>> +static long kvm_hypercall(unsigned int nr, unsigned long p1,
>> +			  unsigned long p2, unsigned long p3,
>> +			  unsigned long p4)
>> +{
>> +	struct kvm_paravirt_state *state
>> +		= per_cpu(paravirt_state, smp_processor_id());
>>   
>>     
>
> Rather than using this here and passing state around, you could use
> either x86_read/write_percpu, or get/put_cpu_var (or __get_cpu_var if
> you don't need the preempt-disable).
>   

Ok.

>> +	long ret = 0;
>> +
>> +	if (can_defer_hypercall(state, nr))
>> +		_kvm_hypercall_defer(state, nr, p1, p2, p3, p4);
>> +	else
>> +		ret = _kvm_hypercall(state, nr, p1, p2, p3, p4);
>> +
>> +	return ret;
>> +}
>> +
>>  /*
>>   * No need for any "IO delay" on KVM
>>   */
>> @@ -107,7 +158,9 @@ static void kvm_write_cr(int reg, unsigned long value)
>>  	state->cr_valid[reg] = 1;
>>  	state->cached_cr[reg] = value;
>>  
>> -	if (reg == 0)
>> +	if (state->mode == PARAVIRT_LAZY_CPU)
>> +		kvm_hypercall(KVM_HYPERCALL_SET_CR, reg, value, 0, 0);
>> +	else if (reg == 0)
>>  		native_write_cr0(value);
>>  	else if (reg == 3)
>>  		native_write_cr3(value);
>> @@ -218,6 +271,18 @@ static void kvm_pmd_clear(pmd_t *pmdp)
>>  	kvm_mmu_write(pmdp, &pmd, sizeof(pmd));
>>  }
>>  
>> +static void kvm_set_lazy_mode(enum paravirt_lazy_mode mode)
>> +{
>> +	struct kvm_paravirt_state *state
>> +		= per_cpu(paravirt_state, smp_processor_id());
>> +
>> +	if (mode == PARAVIRT_LAZY_FLUSH || mode == PARAVIRT_LAZY_NONE) {
>> +		if (state->vmca->queue_index)
>> +			_kvm_hypercall(state, KVM_HYPERCALL_FLUSH, 0, 0, 0, 0);
>> +	}
>> +	state->mode = mode;
>>   
>>     
>
> No, you don't want to set state->mode to LAZY_FLUSH (it's not a mode,
> just an action which overloads the interface).
>   

Thanks, I wasn't aware of that.

>> +}
>> +
>>  static void paravirt_ops_setup(void)
>>  {
>>  	paravirt_ops.name = "KVM";
>> @@ -249,6 +314,9 @@ static void paravirt_ops_setup(void)
>>  		paravirt_ops.set_pud = kvm_set_pud;
>>  	}
>>  
>> +	if (do_hypercall_batching)
>> +		paravirt_ops.set_lazy_mode = kvm_set_lazy_mode;
>> +
>>  	paravirt_ops.paravirt_enabled = 1;
>>  
>>  	apply_paravirt(__parainstructions, __parainstructions_end);
>> @@ -293,6 +361,9 @@ static int paravirt_initialize(void)
>>  	if ((edx & KVM_FEATURE_MMU_WRITE))
>>  		do_mmu_write = 1;
>>  
>> +	if ((edx & KVM_FEATURE_HYPERCALL_BATCHING))
>> +		do_hypercall_batching = 1;
>> +
>>  	on_each_cpu(paravirt_activate, NULL, 0, 1);
>>  
>>  	return 0;
>> @@ -303,6 +374,9 @@ static __init void paravirt_free_state(struct kvm_paravirt_state *state)
>>  	if (!state)
>>  		return;
>>  
>> +	if (state->queue)
>> +		__free_page(pfn_to_page(__pa(state->queue) >> PAGE_SHIFT));
>> +
>>  	if (state->hypercall)
>>  		__free_page(pfn_to_page(__pa(state->hypercall) >> PAGE_SHIFT));
>>  
>> @@ -329,8 +403,15 @@ static __init struct kvm_paravirt_state *paravirt_alloc_state(void)
>>  	if (!state->hypercall)
>>  		goto err;
>>  
>> +	state->queue = (void *)get_zeroed_page(GFP_KERNEL);
>> +	if (!state->queue)
>> +		goto err;
>> +
>>  	state->vmca_gpa = __pa(state->vmca);
>>  	state->vmca->hypercall_gpa = __pa(state->hypercall);
>> +	state->vmca->queue_gpa = __pa(state->queue);
>> +	state->vmca->max_queue_index
>> +		= (PAGE_SIZE / sizeof(struct kvm_hypercall_entry));
>>  
>>  	return state;
>>  
>> diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
>> index b08272b..d531899 100644
>> --- a/drivers/kvm/kvm.h
>> +++ b/drivers/kvm/kvm.h
>> @@ -291,6 +291,7 @@ struct kvm_vcpu {
>>  	gpa_t para_state_gpa;
>>  	struct page *para_state_page;
>>  	gpa_t hypercall_gpa;
>> +	struct page *queue_page;
>>  	unsigned long cr4;
>>  	unsigned long cr8;
>>  	u64 pdptrs[4]; /* pae */
>> diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
>> index 4f65729..79a2a64 100644
>> --- a/drivers/kvm/kvm_main.c
>> +++ b/drivers/kvm/kvm_main.c
>> @@ -94,7 +94,8 @@ struct vfsmount *kvmfs_mnt;
>>  
>>  #define KVM_PARAVIRT_FEATURES \
>>  	(KVM_FEATURE_VMCA | KVM_FEATURE_NOP_IO_DELAY | \
>> -	 KVM_FEATURE_CR_READ_CACHE | KVM_FEATURE_MMU_WRITE)
>> +	 KVM_FEATURE_CR_READ_CACHE | KVM_FEATURE_MMU_WRITE | \
>> +	 KVM_FEATURE_HYPERCALL_BATCHING)
>>  
>>  #define KVM_MSR_SET_VMCA	0x87655678
>>  
>> @@ -1369,6 +1370,24 @@ static int kvm_hypercall_mmu_write(struct kvm_vcpu *vcpu, gva_t addr,
>>  	return 0;
>>  }
>>  
>> +static int kvm_hypercall_set_cr(struct kvm_vcpu *vcpu,
>> +				u32 reg, unsigned long value)
>> +{
>> +	switch (reg) {
>> +	case 0:
>> +		set_cr0(vcpu, value);
>> +		break;
>> +	case 3:
>> +		set_cr3(vcpu, value);
>> +		break;
>> +	case 4:
>> +		set_cr4(vcpu, value);
>> +		break;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  static int dispatch_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>>  			      unsigned long p1, unsigned long p2,
>>  			      unsigned long p3, unsigned long p4)
>> @@ -1376,10 +1395,36 @@ static int dispatch_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>>  	switch (nr) {
>>  	case KVM_HYPERCALL_MMU_WRITE:
>>  		return kvm_hypercall_mmu_write(vcpu, p1, p2, p3, p4);
>> +	case KVM_HYPERCALL_SET_CR:
>> +		return kvm_hypercall_set_cr(vcpu, p1, p2);
>>  	}
>>  	return -ENOSYS;
>>  }
>>  
>> +static int kvm_hypercall_flush(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_hypercall_entry *queue;
>> +	struct kvm_vmca *vmca;
>> +	int ret = 0;
>> +	int i;
>> +
>> +	queue = kmap(vcpu->queue_page);
>> +	vmca = kmap(vcpu->para_state_page);
>>   
>>     
>
> kmap_atomic?  Or why not keep them mapped all the time?
>   

On the kvm side, this ends up calling emulator_write_phys() which IIRC 
can potentially sleep.

Regards,

Anthony Liguori

>     J
>
>   


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]     ` <4675F568.90608-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18  4:00       ` Jeremy Fitzhardinge
@ 2007-06-18  9:07       ` Avi Kivity
       [not found]         ` <46764B47.5060403-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Avi Kivity @ 2007-06-18  9:07 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:

> Implemented a hypercall queue that can be used when paravirt_ops lazy mode
> is enabled.  This patch enables queueing of MMU write operations and CR
> updates.  This results in about a 50% bump in kernbench performance.
>   

Nice!  But 50%?  A kernel build is at native-25%, so we're now 25% faster 
than native?

> +	state->vmca->queue_gpa = __pa(state->queue);
> +	state->vmca->max_queue_index
> +		= (PAGE_SIZE / sizeof(struct kvm_hypercall_entry));
>  

Why not pass the queue address as an argument to KVM_HYPERCALL_FLUSH?  
That reduces the amount of setup, and allows more flexibility (e.g. 
multiple queues).
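
A minimal userspace sketch of that alternative, under stated assumptions: the
flush call receives the queue directly instead of finding it through state
registered at setup time.  Passing an entry count alongside the address, and
all toy_* names including the TOY_CALL_ADD call number, are assumptions of the
example and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

struct toy_entry {
	unsigned long nr, p1, p2, p3, p4;
};

#define TOY_CALL_ADD 0   /* hypothetical call number for the sketch */

static unsigned long toy_total;   /* visible side effect of dispatched calls */

static int toy_dispatch(const struct toy_entry *e)
{
	switch (e->nr) {
	case TOY_CALL_ADD:
		toy_total += e->p1;
		return 0;
	}
	return -1;   /* unknown call, analogous to -ENOSYS */
}

/* The queue and its length arrive as arguments, so any queue can be flushed. */
static int toy_flush(const struct toy_entry *queue, size_t count)
{
	int ret = 0;
	size_t i;

	assert(queue != NULL || count == 0);

	for (i = 0; i < count; i++)
		ret |= toy_dispatch(&queue[i]);

	return ret;
}
```

Because nothing about the queue is registered ahead of time, a caller can keep
several independent queues and flush each one on its own.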

I'm not thrilled with having queues of hypercalls; instead  I'd prefer 
queues of mmu operations, but I guess it won't do any good to go against 
prevailing custom here.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46764B47.5060403-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]         ` <46764B47.5060403-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-06-18 12:40           ` Anthony Liguori
       [not found]             ` <46767D47.1010104-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 12:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

Avi Kivity wrote:
> Anthony Liguori wrote:
>
>> Implemented a hypercall queue that can be used when paravirt_ops lazy 
>> mode
>> is enabled.  This patch enables queueing of MMU write operations and CR
>> updates.  This results in about a 50% bump in kernbench performance.
>>   
>
> Nice!  But 50%? a kernel build is at native-25%, so we're now 25% 
> faster than native?

Well, I haven't measured KVM to be 25% of native with kernbench :-)  On 
my LS21 (AMD), I get:

KVM
Elapsed Time 1054.39 (25.8237)
User Time 371.844 (8.57204)
System Time 682.61 (17.7778)
Percent CPU 99.8 (0.447214)  
Sleeps 50115 (475.693)

KVM PV
Elapsed Time 595.85 (13.7058)
User Time 360.99 (9.56093)
System Time 234.704 (4.21283)
Percent CPU 99 (0)
Context Switches 46989.8 (328.277)
Sleeps 47882.8 (242.583)

NATIVE
Elapsed Time 328.602 (0.212415)
User Time 304.364 (0.353171)
System Time 23.99 (0.325192)
Percent CPU 99 (0)
Context Switches 39785.2 (159.796)
Sleeps 46398.6 (311.466)

With Intel, we're still only about 60% of native to start out with.  The 
PV patches take us to about 72%.
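
To make the AMD numbers above easier to compare, here is a small sketch of the
arithmetic on the elapsed times.  The helper names are made up for the example;
only the three elapsed-time values come from the kernbench output.

```c
#include <assert.h>

/* Elapsed times in seconds, copied from the kernbench output above. */
static const double elapsed_kvm    = 1054.39;
static const double elapsed_kvm_pv = 595.85;
static const double elapsed_native = 328.602;

/* Fraction of native speed: native elapsed time over guest elapsed time. */
static double fraction_of_native(double guest_elapsed)
{
	assert(guest_elapsed > 0.0);
	return elapsed_native / guest_elapsed;
}

/* Relative reduction in elapsed time when going from before to after. */
static double elapsed_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before;
}
```

By this measure plain KVM runs at roughly 31% of native speed on this machine
and KVM PV at roughly 55%, with elapsed time dropping by about 43% between the
two.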

>> +    state->vmca->queue_gpa = __pa(state->queue);
>> +    state->vmca->max_queue_index
>> +        = (PAGE_SIZE / sizeof(struct kvm_hypercall_entry));
>>  
>
> Why not pass the queue address as an argument to KVM_HYPERCALL_FLUSH?  
> That reduces the amount of setup, and allows more flexibility (e.g. 
> multiple queues).

I agree.  I had that at first and then changed it to not take the queue 
address.  I'll change it for the next rev.

> I'm not thrilled with having queues of hypercalls; instead  I'd prefer 
> queues of mmu operations, but I guess it won't do any good to go 
> against prevailing custom here.

lguest uses a hypercall queue and I figured that puppies were never a 
bad thing :-)

Having multiple queues would get pretty ugly.  We're still pretty slow 
on context switch, so I'm hoping that we can be more aggressive queuing-wise.
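
For readers following along, the queueing idea can be modelled in user
space.  This is only a sketch under assumptions: the entry layout, the
names and the counters are hypothetical and not taken from the patch;
the queue holds one page worth of entries, and the flush routine takes
the queue as an argument, as discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical entry layout: a hypercall number plus four arguments. */
struct sketch_hypercall_entry {
    uint32_t nr;
    uint32_t p1, p2, p3, p4;
};

/* One page worth of entries, mirroring PAGE_SIZE / sizeof(entry). */
#define SKETCH_MAX_QUEUE_INDEX \
    (SKETCH_PAGE_SIZE / sizeof(struct sketch_hypercall_entry))

struct sketch_queue {
    struct sketch_hypercall_entry entries[SKETCH_MAX_QUEUE_INDEX];
    size_t index;      /* number of entries currently queued */
    size_t flushes;    /* how many times the "hypervisor" was entered */
    size_t processed;  /* total entries handled across all flushes */
};

/* Stand-in for the flush hypercall: the queue is passed as an argument. */
static void sketch_flush(struct sketch_queue *q)
{
    if (q->index == 0)
        return;
    q->processed += q->index;
    q->flushes++;
    q->index = 0;
}

/* Queue one operation; flush first if the page-sized queue is full. */
static void sketch_enqueue(struct sketch_queue *q, uint32_t nr,
                           uint32_t p1, uint32_t p2,
                           uint32_t p3, uint32_t p4)
{
    if (q->index == SKETCH_MAX_QUEUE_INDEX)
        sketch_flush(q);
    assert(q->index < SKETCH_MAX_QUEUE_INDEX);
    q->entries[q->index++] =
        (struct sketch_hypercall_entry){ nr, p1, p2, p3, p4 };
}
```

With this 20-byte entry a page holds 204 entries, so 500 queued
operations cost three flushes rather than 500 separate exits.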

Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46767D47.1010104-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]             ` <46767D47.1010104-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18 12:50               ` Avi Kivity
       [not found]                 ` <46767F98.70109-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Avi Kivity @ 2007-06-18 12:50 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>
>>> Implemented a hypercall queue that can be used when paravirt_ops 
>>> lazy mode
>>> is enabled.  This patch enables queueing of MMU write operations and CR
>>> updates.  This results in about a 50% bump in kernbench performance.
>>>   
>>
>> Nice!  But 50%? a kernel build is at native-25%, so we're now 25% 
>> faster than native?
>
> Well, I haven't measured KVM to be 25% of native with kernbench :-)  
> On my LS21 (AMD), I get:

I did, but using kbuild (a simple 'make' with defconfig), not 
kernbench.  I get (elapsed time) 308 sec for kvm and 243 sec for native.

Intel however is much faster than AMD due to the recent optimizations, 
and I guess we get some pagetable thrashing with kernbench vs. kbuild.

>
> KVM
> Elapsed Time 1054.39 (25.8237)
> User Time 371.844 (8.57204)
> System Time 682.61 (17.7778)
> Percent CPU 99.8 (0.447214)
> Sleeps 50115 (475.693)
>
> KVM PV
> Elapsed Time 595.85 (13.7058)
> User Time 360.99 (9.56093)
> System Time 234.704 (4.21283)
> Percent CPU 99 (0)
> Context Switches 46989.8 (328.277)
> Sleep 47882.8 (242.583)
>
> NATIVE
> Elapsed Time 328.602 (0.212415)
> User Time 304.364 (0.353171)
> System Time 23.99 (0.325192)
> Percent CPU 99 (0)
> Context Switches 39785.2 (159.796)
> Sleeps 46398.6 (311.466)
>
> With Intel, we're still only about 60% of native to start out with.  
> The PV patches take us to about 72%.
>

These numbers are pretty bad.  I'd like to improve them, even without PV.


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46767F98.70109-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]                 ` <46767F98.70109-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-06-18 13:03                   ` Gregory Haskins
       [not found]                     ` <1182171781.4593.38.camel-5CR4LY5GPkvLDviKLk5550HKjMygAv58XqFh9Ls21Oc@public.gmane.org>
  2007-06-18 13:22                   ` Anthony Liguori
  1 sibling, 1 reply; 85+ messages in thread
From: Gregory Haskins @ 2007-06-18 13:03 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

On Mon, 2007-06-18 at 15:50 +0300, Avi Kivity wrote:

> 
> These numbers are pretty bad.  I'd like to improve them, even without PV.
> 

There's a 20% speedup just waiting to be checked in in the lapic
branch ;)

(This gain was observed using "make mrproper; make defconfig; time make"
on Intel with a non-PV 2.6.21 guest when the lapic code was enabled).

> 



^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <1182171781.4593.38.camel-5CR4LY5GPkvLDviKLk5550HKjMygAv58XqFh9Ls21Oc@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]                     ` <1182171781.4593.38.camel-5CR4LY5GPkvLDviKLk5550HKjMygAv58XqFh9Ls21Oc@public.gmane.org>
@ 2007-06-18 13:19                       ` Anthony Liguori
       [not found]                         ` <4676867E.1090208-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 13:19 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm-devel, virtualization

Gregory Haskins wrote:
> On Mon, 2007-06-18 at 15:50 +0300, Avi Kivity wrote:
>
>   
>> These numbers are pretty bad.  I'd like to improve them, even without PV.
>>
>>     
>
> There's a 20% speedup just waiting to be checked in in the lapic
> branch ;)
>
> (This gain was observed using "make mrproper; make defconfig; time make"
> on Intel with a non-PV 2.6.21 guest when the lapic code was enabled).
>   

So what was the final number you were seeing?  Were you seeing only 25% 
below native to start with?

Regards,

Anthony Liguori

>
>
>   



^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4676867E.1090208-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]                         ` <4676867E.1090208-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18 13:25                           ` Gregory Haskins
  0 siblings, 0 replies; 85+ messages in thread
From: Gregory Haskins @ 2007-06-18 13:25 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

On Mon, 2007-06-18 at 08:19 -0500, Anthony Liguori wrote:
> Gregory Haskins wrote:
> > On Mon, 2007-06-18 at 15:50 +0300, Avi Kivity wrote:
> >
> >   
> >> These numbers are pretty bad.  I'd like to improve them, even without PV.
> >>
> >>     
> >
> > There's a 20% speedup just waiting to be checked in in the lapic
> > branch ;)
> >
> > (This gain was observed using "make mrproper; make defconfig; time make"
> > on Intel with a non-PV 2.6.21 guest when the lapic code was enabled).
> >   
> 
> So what was the final number you were seeing?  Were you seeing only 25% 
> below native to start with?

If memory serves me correctly, I was getting 9m30s pre-patch, and 7m55s
post-patch.  This translates (I believe) to approximately 19.5%
improvement.  I didn't measure against native.  
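
The percentage depends on which way the ratio is taken.  A small sketch
of both readings, using only the two wall-clock times quoted above; the
function names are made up for illustration:

```c
#include <assert.h>

/* Convert a minutes/seconds wall-clock reading to seconds. */
static int to_seconds(int minutes, int seconds)
{
    assert(minutes >= 0 && seconds >= 0 && seconds < 60);
    return minutes * 60 + seconds;
}

/* Speedup in percent: extra work done per unit time after the change. */
static double speedup_percent(int before_secs, int after_secs)
{
    assert(after_secs > 0);
    return 100.0 * (before_secs - after_secs) / after_secs;
}

/* Elapsed-time reduction in percent, relative to the original time. */
static double time_reduction_percent(int before_secs, int after_secs)
{
    assert(before_secs > 0);
    return 100.0 * (before_secs - after_secs) / before_secs;
}
```

9m30s is 570 seconds and 7m55s is 475 seconds, so the speedup reading
gives 20% and the time-reduction reading gives about 16.7%.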


-Greg



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]                 ` <46767F98.70109-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-06-18 13:03                   ` Gregory Haskins
@ 2007-06-18 13:22                   ` Anthony Liguori
       [not found]                     ` <46768724.3000509-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 13:22 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

Avi Kivity wrote:
> Anthony Liguori wrote:
>> Avi Kivity wrote:
>>> Anthony Liguori wrote:
>>>
>>>> Implemented a hypercall queue that can be used when paravirt_ops 
>>>> lazy mode
>>>> is enabled.  This patch enables queueing of MMU write operations 
>>>> and CR
>>>> updates.  This results in about a 50% bump in kernbench performance.
>>>>   
>>>
>>> Nice!  But 50%? a kernel build is at native-25%, so we're now 25% 
>>> faster than native?
>>
>> Well, I haven't measured KVM to be 25% of native with kernbench :-)  
>> On my LS21 (AMD), I get:
>
> I did, but using kbuild (a simple 'make' with defconfig), not 
> kernbench.  I get (elapsed time) 308 sec for kvm and 243 sec for native.

kernbench is a little different.  It does a find over the kernel source 
tree to attempt to get as much of the kernel in the page cache as 
possible.  It also uses -j4 by default.

> Intel however is much faster than AMD due to the recent optimizations, 
> and I guess we get some pagetable thrashing with kernbench vs. kbuild.
>
>>
>> KVM
>> Elapsed Time 1054.39 (25.8237)
>> User Time 371.844 (8.57204)
>> System Time 682.61 (17.7778)
>> Percent CPU 99.8 (0.447214)
>> Sleeps 50115 (475.693)
>>
>> KVM PV
>> Elapsed Time 595.85 (13.7058)
>> User Time 360.99 (9.56093)
>> System Time 234.704 (4.21283)
>> Percent CPU 99 (0)
>> Context Switches 46989.8 (328.277)
>> Sleep 47882.8 (242.583)
>>
>> NATIVE
>> Elapsed Time 328.602 (0.212415)
>> User Time 304.364 (0.353171)
>> System Time 23.99 (0.325192)
>> Percent CPU 99 (0)
>> Context Switches 39785.2 (159.796)
>> Sleeps 46398.6 (311.466)
>>
>> With Intel, we're still only about 60% of native to start out with.  
>> The PV patches take us to about 72%.
>>
>
> These numbers are pretty bad.  I'd like to improve them, even without PV.

I agree.  Do you know what's missing at this point?  There isn't a whole 
lot of state saving going on for the light weight exit paths for SVM.

Regards,

Anthony Liguori




^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46768724.3000509-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]                     ` <46768724.3000509-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18 13:35                       ` Avi Kivity
       [not found]                         ` <46768A3F.2010202-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Avi Kivity @ 2007-06-18 13:35 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
>>
>> I did, but using kbuild (a simple 'make' with defconfig), not 
>> kernbench.  I get (elapsed time) 308 sec for kvm and 243 sec for native.
>
> kernbench is a little different.  It does a find over the kernel 
> source tree to attempt to get as much of the kernel in the page cache 
> as possible.  It also uses -j4 by default.
>
>>
>> These numbers are pretty bad.  I'd like to improve them, even without 
>> PV.
>
> I agree.  Do you know what's missing at this point?  There isn't a 
> whole lot of state saving going on for the light weight exit paths for 
> SVM.

The SVM code doesn't even have a lightweight vmexit path.  For every 
vmexit, it does the entire thing, including vmload/vmsave, fpu switch 
(if needed), segment reloading, and msr reloading.  It could use a lot 
of work.

For kbuild vs. kernbench, I suspect that -j4 causes the shadow page 
table cache to thrash.  1024 pages may be enough for a single instance 
but not -j4.  Hopefully replacing the eviction algorithm (currently 
FIFO) will help.  Otherwise we'll need to resize the cache again.
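
Why the eviction policy matters can be shown with a toy model.  This is
not the KVM shadow cache: it is a 4-slot cache with made-up names, and
LRU is used here only as one possible replacement for FIFO.  The
workload is one hot page interleaved with a stream of pages that are
never reused:

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 4

enum policy { POLICY_FIFO, POLICY_LRU };

struct sketch_cache {
    long page[CACHE_SLOTS];            /* cached page number, -1 if empty */
    unsigned long stamp[CACHE_SLOTS];  /* insert time (FIFO) or last use (LRU) */
    unsigned long clock;
    unsigned long hits, misses;
    enum policy policy;
};

static void cache_init(struct sketch_cache *c, enum policy policy)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        c->page[i] = -1;
        c->stamp[i] = 0;               /* empty slots are evicted first */
    }
    c->clock = 0;
    c->hits = c->misses = 0;
    c->policy = policy;
}

/* Touch one page; on a miss, evict the slot with the oldest stamp. */
static void cache_access(struct sketch_cache *c, long page)
{
    size_t victim = 0;

    assert(page >= 0);
    c->clock++;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (c->page[i] == page) {
            c->hits++;
            if (c->policy == POLICY_LRU)
                c->stamp[i] = c->clock;   /* LRU refreshes on every use */
            return;
        }
    }
    c->misses++;
    for (size_t i = 1; i < CACHE_SLOTS; i++)
        if (c->stamp[i] < c->stamp[victim])
            victim = i;
    c->page[victim] = page;
    c->stamp[victim] = c->clock;
}

/* One hot page interleaved with single-use pages; returns the hit count. */
static unsigned long run_hot_plus_stream(enum policy policy, int rounds)
{
    struct sketch_cache c;

    cache_init(&c, policy);
    for (int i = 0; i < rounds; i++) {
        cache_access(&c, 0);          /* hot page */
        cache_access(&c, 100 + i);    /* cold page, never touched again */
    }
    return c.hits;
}
```

In this toy workload LRU keeps the hot page resident after its first
miss, while FIFO evicts it every time it becomes the oldest insertion,
so FIFO scores fewer hits for the same cache size.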

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46768A3F.2010202-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]                         ` <46768A3F.2010202-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-06-18 14:02                           ` Anthony Liguori
       [not found]                             ` <4676905B.6000805-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 14:02 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

Avi Kivity wrote:
>>> These numbers are pretty bad.  I'd like to improve them, even 
>>> without PV.
>>
>> I agree.  Do you know what's missing at this point?  There isn't a 
>> whole lot of state saving going on for the light weight exit paths 
>> for SVM.
>
> The SVM code doesn't even have a lightweight vmexit path.  

Sure it does.  Quite a lot is deferred to vcpu_{load,put}.

> For every vmexit, it does the entire thing, including vmload/vmsave

I haven't had a lot of luck eliminating vmload/vmsave.

> , fpu switch (if needed)

The FPU switch can really be avoided?  Is it safe to assume that the KVM 
code isn't going to use any FPU operations?

> , segment reloading, and msr reloading.

Yeah, the VMX path is doing some clever things there.  A fair bit more 
could be deferred on SVM.

>   It could use a lot of work.
>
> For kbuild vs. kernbench, I suspect that -j4 causes the shadow page 
> table cache to thrash.  1024 pages may be enough for a single instance 
> but not -j4.  Hopefully replacing the eviction algorithm (currently 
> FIFO) will help.  Otherwise we'll need to resize the cache again.

I naively tried to bump it to 2048 and hit a kmalloc limitation.

Regards,

Anthony Liguori




^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4676905B.6000805-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]                             ` <4676905B.6000805-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18 15:08                               ` Avi Kivity
       [not found]                                 ` <46769FFE.6040502-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Avi Kivity @ 2007-06-18 15:08 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> Avi Kivity wrote:
>>>> These numbers are pretty bad.  I'd like to improve them, even 
>>>> without PV.
>>>
>>> I agree.  Do you know what's missing at this point?  There isn't a 
>>> whole lot of state saving going on for the light weight exit paths 
>>> for SVM.
>>
>> The SVM code doesn't even have a lightweight vmexit path.  
>
> Sure it does.  Quite a lot is deferred to vcpu_{load,put}.

Ah, I forgot.  Yes, the syscall msrs are deferred.

>
>> For every vmexit, it does the entire thing, including vmload/vmsave
>
> I haven't had a lot of luck eliminating vmload/vmsave.
>

For x86_64, the only issue I see is with TR.  Unfortunately, I don't see 
a way around it.


>> , fpu switch (if needed)
>
> The FPU switch can really be avoided?  Is it safe to assume that the 
> KVM code isn't going to use any FPU operations?

Generally, kernel code does not use the fpu (when it does, it calls 
kernel_fpu_begin() and kernel_fpu_end()).  The vmx code avoids the switch.

Of course, if the guest doesn't use the fpu, the switch is avoided anyway.

>>
>> For kbuild vs. kernbench, I suspect that -j4 causes the shadow page 
>> table cache to thrash.  1024 pages may be enough for a single 
>> instance but not -j4.  Hopefully replacing the eviction algorithm 
>> (currently FIFO) will help.  Otherwise we'll need to resize the cache 
>> again.
>
> I naively tried to bump it to 2048 and hit a kmalloc limitation.
>

struct kvm is 22K on x86_64.  Adding 1024 pointers makes it 30K.  What 
error did you get?

We should probably make the hashtable a pointer, and allocate vcpus 
separately as well.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46769FFE.6040502-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]                                 ` <46769FFE.6040502-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-06-18 15:20                                   ` Anthony Liguori
       [not found]                                     ` <4676A2D4.2040704-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18 16:00                                   ` Avi Kivity
  1 sibling, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 15:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

Avi Kivity wrote:
> Anthony Liguori wrote:
>> Avi Kivity wrote:
>>>>> These numbers are pretty bad.  I'd like to improve them, even 
>>>>> without PV.
>>>>
>>>> I agree.  Do you know what's missing at this point?  There isn't a 
>>>> whole lot of state saving going on for the light weight exit paths 
>>>> for SVM.
>>>
>>> The SVM code doesn't even have a lightweight vmexit path.  
>>
>> Sure it does.  Quite a lot is deferred to vcpu_{load,put}.
>
> Ah, I forgot.  Yes, the syscall msrs are deferred.
>
>>
>>> For every vmexit, it does the entire thing, including vmload/vmsave
>>
>> I haven't had a lot of luck eliminating vmload/vmsave.
>>
>
> For x86_64, the only issue I see is with TR.  Unfortunately, I don't 
> see a way around it.
>
>
>>> , fpu switch (if needed)
>>
>> The FPU switch can really be avoided?  Is it safe to assume that the 
>> KVM code isn't going to use any FPU operations?
>
> Generally, kernel code does not use the fpu (when it does, it calls 
> kernel_fpu_begin() and kernel_fpu_end()).  The vmx code avoids the 
> switch.
>
> Of course, if the guest doesn't use the fpu, the switch is avoided 
> anyway.
>
>>>
>>> For kbuild vs. kernbench, I suspect that -j4 causes the shadow page 
>>> table cache to thrash.  1024 pages may be enough for a single 
>>> instance but not -j4.  Hopefully replacing the eviction algorithm 
>>> (currently FIFO) will help.  Otherwise we'll need to resize the 
>>> cache again.
>>
>> I naively tried to bump it to 2048 and hit a kmalloc limitation.
>>
>
> struct kvm is 22K on x86_64.  Adding 1024 pointers makes it 30K.  What 
> error did you get?

With an older kvm, on a different system, I was getting:

WARNING: "__you_cannot_kzalloc_that_much"

On the latest git though, I don't seem to get that warning on my 
development system even if I bump all the way up to 8192.  I'll see what 
bumping to 2048 does to kernbench.  4MB is actually small compared to 
other hypervisors for a shadow page table cache (Xen defaults to 8MB) so 
we may see good results.
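
The sizes in this exchange follow from simple arithmetic.  A sketch,
assuming each cached shadow page is one 4K page and pointers are 8
bytes on x86_64; the macro and function names are hypothetical:

```c
#include <assert.h>

#define SKETCH_PAGE_BYTES 4096UL
#define SKETCH_PTR_BYTES  8UL   /* pointer size on x86_64 */

/* Memory backing a shadow page table cache of `pages` 4K pages, in MiB. */
static unsigned long cache_mib(unsigned long pages)
{
    return pages * SKETCH_PAGE_BYTES / (1024UL * 1024UL);
}

/* Growth of a structure, in KiB, when `count` more pointers are embedded. */
static unsigned long extra_pointer_kib(unsigned long count)
{
    assert(count % 128 == 0);   /* keep the result a whole number of KiB */
    return count * SKETCH_PTR_BYTES / 1024UL;
}
```

cache_mib(1024) is 4 and cache_mib(2048) is 8, and
extra_pointer_kib(1024) is 8, which matches the 22K to 30K growth of
struct kvm quoted above.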

Regards,

Anthony Liguori

> We should probably make the hashtable a pointer, and allocate vcpus 
> separately as well.
>



^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4676A2D4.2040704-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]                                     ` <4676A2D4.2040704-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18 16:01                                       ` Avi Kivity
  0 siblings, 0 replies; 85+ messages in thread
From: Avi Kivity @ 2007-06-18 16:01 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
>>
>> struct kvm is 22K on x86_64.  Adding 1024 pointers makes it 30K.  
>> What error did you get?
>
> With an older kvm, on a different system, I was getting:

Older kvms had the entire shadow cache in struct kvm (e.g. with 256 
entries, struct kvm was 1MB+).


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]                                 ` <46769FFE.6040502-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-06-18 15:20                                   ` Anthony Liguori
@ 2007-06-18 16:00                                   ` Avi Kivity
       [not found]                                     ` <4676AC10.3090007-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Avi Kivity @ 2007-06-18 16:00 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Avi Kivity wrote:
>
>>
>>> For every vmexit, it does the entire thing, including vmload/vmsave
>>
>> I haven't had a lot of luck eliminating vmload/vmsave.
>>
>
> For x86_64, the only issue I see is with TR.  Unfortunately, I don't 
> see a way around it.
>

I think we can avoid vmload (but not vmsave):

1. Allocate a host gdt entry for kvm's exclusive use.

2. The first entry into the guest needs vmload as usual. The second 
entry reuses already-loaded registers, except tr, gs.base, and kernelgsbase.

3. To load tr, copy the descriptor into our gdt entry, and execute ltr.

4. To load gs.base, load the saved value into MSR_KERNELGSBASE, and 
execute swapgs

5. To load kernelgsbase, use wrmsr()

However, I'm not at all sure it's worth it.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4676AC10.3090007-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation
       [not found]                                     ` <4676AC10.3090007-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-06-18 17:47                                       ` Anthony Liguori
  0 siblings, 0 replies; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 17:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

Avi Kivity wrote:
> Avi Kivity wrote:
>>
>>>
>>>> For every vmexit, it does the entire thing, including vmload/vmsave
>>>
>>> I haven't had a lot of luck eliminating vmload/vmsave.
>>>
>>
>> For x86_64, the only issue I see is with TR.  Unfortunately, I don't 
>> see a way around it.
>>
>
> I think we can avoid vmload (but not vmsave):
>
> 1. Allocate a host gdt entry for kvm's exclusive use.
>
> 2. The first entry into the guest needs vmload as usual. The second 
> entry reuses already-loaded registers, except tr, gs.base, and 
> kernelgsbase.
>
> 3. To load tr, copy the descriptor into our gdt entry, and execute ltr.
>
> 4. To load gs.base, load the saved value into MSR_KERNELGSBASE, and 
> execute swapgs
>
> 5. To load kernelgsbase, use wrmsr()
>
> However, I'm not at all sure it's worth it.

Yeah, that's where I left it too.

Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 5/5] KVM: paravirt time source
       [not found] ` <4675F462.1010708-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
                     ` (3 preceding siblings ...)
  2007-06-18  3:00   ` [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation Anthony Liguori
@ 2007-06-18  3:03   ` Anthony Liguori
       [not found]     ` <4675F601.3090706-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18  3:19   ` [PATCH 0/5] KVM paravirt_ops implementation Jeremy Fitzhardinge
  5 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18  3:03 UTC (permalink / raw)
  To: kvm-devel; +Cc: virtualization

[-- Attachment #1: Type: text/plain, Size: 26 bytes --]

Regards,

Anthony Liguori

[-- Attachment #2: kvm-paravirt-time.diff --]
[-- Type: text/x-patch, Size: 4522 bytes --]

Subject: [PATCH] KVM: paravirt time source
Author: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

This is a paravirt time source for KVM based on Ingo Molnar's similar patch.
A very different patch will probably be needed that takes advantage of hrtimers
but I have to learn a bit more about that first.

Signed-off-by: Anthony Liguori <aliguori-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

diff --git a/arch/i386/kernel/kvm.c b/arch/i386/kernel/kvm.c
index 77c36f4..25fb2c1 100644
--- a/arch/i386/kernel/kvm.c
+++ b/arch/i386/kernel/kvm.c
@@ -26,6 +26,16 @@
 #include <linux/cpu.h>
 #include <linux/mm.h>
 
+#include <linux/clocksource.h>
+#include <linux/workqueue.h>
+#include <linux/cpufreq.h>
+#include <linux/jiffies.h>
+#include <linux/init.h>
+#include <linux/dmi.h>
+#include <linux/acpi_pmtmr.h>
+
+#include "mach_timer.h"
+
 #define CR0_TS_MASK (1ULL << 3)
 
 struct kvm_paravirt_state
@@ -48,6 +58,7 @@ static int do_hypercall_batching;
 static int do_mmu_write;
 static int do_cr_read_caching;
 static int do_nop_io_delay;
+static int do_paravirt_clock;
 static u64 msr_set_vmca;
 
 static long _kvm_hypercall(struct kvm_paravirt_state *state,
@@ -120,6 +131,27 @@ static long kvm_hypercall(unsigned int nr, unsigned long p1,
 	return ret;
 }
 
+static cycle_t read_hyper(void)
+{
+	struct timespec now;
+	int ret;
+
+	ret = kvm_hypercall(KVM_HYPERCALL_GET_KTIME, (u32)&now, 0, 0, 0);
+	WARN_ON(ret);
+
+	return now.tv_nsec + now.tv_sec * (cycles_t)1e9;
+}
+
+static struct clocksource clocksource_hyper = {
+	.name			= "hyper",
+	.rating			= 200,
+	.read			= read_hyper,
+	.mask			= CLOCKSOURCE_MASK(64),
+	.mult			= 1,
+	.shift			= 0,
+	.flags			= CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
 /*
  * No need for any "IO delay" on KVM
  */
@@ -317,6 +349,14 @@ static void kvm_paravirt_ops_setup(void)
 	if (do_hypercall_batching)
 		paravirt_ops.set_lazy_mode = kvm_set_lazy_mode;
 
+	if (do_paravirt_clock) {
+		int err;
+
+		err = clocksource_register(&clocksource_hyper);
+		WARN_ON(err);
+		printk(KERN_INFO "KVM: using paravirt clock source\n");
+	}
+
 	paravirt_ops.paravirt_enabled = 1;
 
 	apply_paravirt(__parainstructions, __parainstructions_end);
@@ -364,6 +404,9 @@ static int kvm_paravirt_initialize(void)
 	if ((edx & KVM_FEATURE_HYPERCALL_BATCHING))
 		do_hypercall_batching = 1;
 
+	if ((edx & KVM_FEATURE_PARAVIRT_CLOCK))
+		do_paravirt_clock = 1;
+
 	on_each_cpu(kvm_paravirt_activate, NULL, 0, 1);
 
 	return 0;
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 55711a0..b8a0811 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -44,6 +44,7 @@
 #include <linux/cpumask.h>
 #include <linux/smp.h>
 #include <linux/kvm_para.h>
+#include <linux/delay.h>
 
 #include "x86_emulate.h"
 #include "segment_descriptor.h"
@@ -95,7 +96,7 @@ struct vfsmount *kvmfs_mnt;
 #define KVM_PARAVIRT_FEATURES \
 	(KVM_FEATURE_VMCA | KVM_FEATURE_NOP_IO_DELAY | \
 	 KVM_FEATURE_CR_READ_CACHE | KVM_FEATURE_MMU_WRITE | \
-	 KVM_FEATURE_HYPERCALL_BATCHING)
+	 KVM_FEATURE_HYPERCALL_BATCHING | KVM_FEATURE_PARAVIRT_CLOCK)
 
 #define KVM_MSR_SET_VMCA	0x87655678
 
@@ -1388,6 +1389,20 @@ static int kvm_hypercall_set_cr(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+static int kvm_hypercall_get_ktime(struct kvm_vcpu *vcpu, gva_t va)
+{
+	struct timespec now;
+	int ret;
+
+	ktime_get_ts(&now);
+
+	ret = kvm_write_guest(vcpu, va, sizeof(now), &now);
+	if (unlikely(ret))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int dispatch_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
 			      unsigned long p1, unsigned long p2,
 			      unsigned long p3, unsigned long p4)
@@ -1397,6 +1412,8 @@ static int dispatch_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
 		return kvm_hypercall_mmu_write(vcpu, p1, p2, p3, p4);
 	case KVM_HYPERCALL_SET_CR:
 		return kvm_hypercall_set_cr(vcpu, p1, p2);
+	case KVM_HYPERCALL_GET_KTIME:
+		return kvm_hypercall_get_ktime(vcpu, p1);
 	}
 	return -ENOSYS;
 }
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 7dd0cef..9aed003 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -16,6 +16,7 @@
 #define KVM_FEATURE_CR_READ_CACHE	(1UL << 2)
 #define KVM_FEATURE_MMU_WRITE		(1UL << 3)
 #define KVM_FEATURE_HYPERCALL_BATCHING	(1UL << 4)
+#define KVM_FEATURE_PARAVIRT_CLOCK	(1UL << 5)
 
 struct kvm_vmca
 {
@@ -48,5 +49,6 @@ struct kvm_hypercall_entry
 #define KVM_HYPERCALL_MMU_WRITE		0
 #define KVM_HYPERCALL_SET_CR		1
 #define KVM_HYPERCALL_FLUSH		2
+#define KVM_HYPERCALL_GET_KTIME		3
 
 #endif

[-- Attachment #4: Type: text/plain, Size: 186 bytes --]

_______________________________________________
kvm-devel mailing list
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/kvm-devel

^ permalink raw reply related	[flat|nested] 85+ messages in thread

[parent not found: <4675F601.3090706-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]     ` <4675F601.3090706-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18  9:24       ` Avi Kivity
  2007-06-18 19:11       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 85+ messages in thread
From: Avi Kivity @ 2007-06-18  9:24 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> +static int kvm_hypercall_get_ktime(struct kvm_vcpu *vcpu, gva_t va)
> +{
> +	struct timespec now;
> +	int ret;
> +
> +	ktime_get_ts(&now);
> +
> +	ret = kvm_write_guest(vcpu, va, sizeof(now), &now);
> +	if (unlikely(ret))
> +		return -EFAULT;
> +
> +	return 0;
> +}

Please use physical addresses (much faster, less possibility of 
confusion).  struct kvm_timespec instead of timespec. KVM_EFAULT instead 
of EFAULT.



-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]     ` <4675F601.3090706-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18  9:24       ` Avi Kivity
@ 2007-06-18 19:11       ` Jeremy Fitzhardinge
       [not found]         ` <4676D8E4.3020806-TSDbQ3PG+2Y@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-18 19:11 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> +static cycle_t read_hyper(void)
> +{
> +	struct timespec now;
> +	int ret;
> +
> +	ret = kvm_hypercall(KVM_HYPERCALL_GET_KTIME, (u32)&now, 0, 0, 0);
> +	WARN_ON(ret);
> +
> +	return now.tv_nsec + now.tv_sec * (cycles_t)1e9;
>   

Hm, use of FP looks pretty odd.  I guess it's OK to assume the compiler
will completely remove all the FP stuff at compile time.  Or you could
use NSEC_PER_SEC.

> +}
> +
> +static struct clocksource clocksource_hyper = {
> +	.name			= "hyper",
> +	.rating			= 200,
>   

We should probably standardize on this.  I guess that if you're in a
paravirt environment, and there's a paravirt clocksource, that would
always be the best clocksource to use.

> +	.read			= read_hyper,
> +	.mask			= CLOCKSOURCE_MASK(64),
> +	.mult			= 1,
> +	.shift			= 0,
>   

It would be better to use a scale and shift here, so that adjtime has
something to work with when warping time.

    J
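
To illustrate the NSEC_PER_SEC suggestion above: a minimal user-space
sketch, assuming a hypothetical fixed-width struct (ts_sketch) in place
of struct timespec and a local constant that mirrors the kernel's
NSEC_PER_SEC.  It is not the patch's code; it only shows the conversion
done in integer arithmetic, with no floating-point literal involved.

```c
#include <assert.h>
#include <stdint.h>

/* Integer constant mirroring the kernel's NSEC_PER_SEC (assumed name). */
#define NSEC_PER_SEC_SKETCH 1000000000ULL

/* Hypothetical fixed-width stand-in for struct timespec. */
struct ts_sketch {
	uint64_t tv_sec;
	uint32_t tv_nsec;
};

/* Flatten seconds + nanoseconds into one count, integers only. */
static uint64_t ts_to_ns(const struct ts_sketch *ts)
{
	/* A normalized timespec keeps tv_nsec below one second. */
	assert(ts->tv_nsec < NSEC_PER_SEC_SKETCH);
	return ts->tv_sec * NSEC_PER_SEC_SKETCH + ts->tv_nsec;
}
```

For example, a value of 2 seconds and 500 nanoseconds converts to
2000000500 ns.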

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4676D8E4.3020806-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]         ` <4676D8E4.3020806-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-18 21:52           ` Anthony Liguori
       [not found]             ` <4676FEB9.6060308-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-19 20:38           ` Anthony Liguori
  1 sibling, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 21:52 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>   
>> +static cycle_t read_hyper(void)
>> +{
>> +	struct timespec now;
>> +	int ret;
>> +
>> +	ret = kvm_hypercall(KVM_HYPERCALL_GET_KTIME, (u32)&now, 0, 0, 0);
>> +	WARN_ON(ret);
>> +
>> +	return now.tv_nsec + now.tv_sec * (cycles_t)1e9;
>>   
>>     
>
> Hm, use of FP looks pretty odd.  I guess it's OK to assume the compiler
> will completely remove all the FP stuff at compile time.  Or you could
> use NSEC_PER_SEC.
>   

Agreed.

>> +}
>> +
>> +static struct clocksource clocksource_hyper = {
>> +	.name			= "hyper",
>> +	.rating			= 200,
>>   
>>     
>
> We should probably standardize on this.  I guess that if you're in a
> paravirt environment, and there's a paravirt clocksource, that would
> always be the best clocksource to use.
>
>   
>> +	.read			= read_hyper,
>> +	.mask			= CLOCKSOURCE_MASK(64),
>> +	.mult			= 1,
>> +	.shift			= 0,
>>   
>>     
>
> It would be better to use a scale and shift here, so that adjtime has
> something to work with when warping time.
>   

Okay.  I may remove this patch from the patch series and attempt to sit 
down next week and work out something more complete that also implements 
stolen time accounting.

Regards,

Anthony Liguori

>     J
>
>   


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4676FEB9.6060308-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]             ` <4676FEB9.6060308-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18 22:04               ` Jeremy Fitzhardinge
       [not found]                 ` <46770162.6030101-TSDbQ3PG+2Y@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-18 22:04 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> Okay.  I may remove this patch from the patch series and attempt to
> sit down next week and work out something more complete that also
> implements stolen time accounting.

Well, that's a separate problem.  clocksource.read should always return
real time passed, so stolen time doesn't come into it. 
paravirt_ops.sched_clock should take stolen time into account, but
that's almost completely orthogonal.

How are you doing clockevents?

    J

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46770162.6030101-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]                 ` <46770162.6030101-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-18 23:33                   ` Anthony Liguori
       [not found]                     ` <4677163F.2030308-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 23:33 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>   
>> Okay.  I may remove this patch from the patch series and attempt to
>> sit down next week and work out something more complete that also
>> implements stolen time accounting.
>>     
>
> Well, that's a separate problem.  clocksource.read should always return
> real time passed, so stolen time doesn't come into it.

Right.

>  
> paravirt_ops.sched_clock should take stolen time into account, but
> that's almost completely orthogonal.
>   

Except that I wanted to change the hypercall to allow querying of real 
time or "available" time as VMI puts it.

> How are you doing clockevents?
>   

Right now, I'm relying on the PIT but it would be nice to eliminate 
that.  I'd like to move to something PV so that I can make use of 
tickless guest kernels.  I'm very open to suggestion and even more open 
to reusing other people's code :-)

Regards,

Anthony Liguori

>     J
>
>   


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4677163F.2030308-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]                     ` <4677163F.2030308-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18 23:56                       ` Jeremy Fitzhardinge
       [not found]                         ` <46771BA0.2000308-TSDbQ3PG+2Y@public.gmane.org>
  2007-06-19  7:44                       ` Avi Kivity
  1 sibling, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-18 23:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> Jeremy Fitzhardinge wrote:
>> Anthony Liguori wrote:
>>  
>>> Okay.  I may remove this patch from the patch series and attempt to
>>> sit down next week and work out something more complete that also
>>> implements stolen time accounting.
>>>     
>>
>> Well, that's a separate problem.  clocksource.read should always return
>> real time passed, so stolen time doesn't come into it.
>
> Right.
>
>>  
>> paravirt_ops.sched_clock should take stolen time into account, but
>> that's almost completely orthogonal.
>>   
>
> Except that I wanted to change the hypercall to allow querying of real
> time or "available" time as VMI puts it.

I see.  You could have an interface like Xen's runstate interface, which
gives you a breakdown of how long each vcpu spends in each state
(running, runnable, paused or blocked).  The sum of all of them gives
you real time, or you can use a subset to work out stolen time
(runnable+paused), busyness (blocked / (running+runnable+blocked)), etc.
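
As a rough sketch of the runstate accounting described above, under
the assumption of a hypothetical per-vcpu record (runstate_sketch) whose
field names follow the four states named in this mail rather than any
real Xen or KVM ABI:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vcpu accounting record; all times in nanoseconds. */
struct runstate_sketch {
	uint64_t running_ns;	/* vcpu was executing */
	uint64_t runnable_ns;	/* wanted to run but was not scheduled */
	uint64_t paused_ns;	/* paused by the host */
	uint64_t blocked_ns;	/* idle, waiting for an event */
};

/* Real time is the sum over all four states. */
static uint64_t runstate_real_ns(const struct runstate_sketch *rs)
{
	assert(rs);
	return rs->running_ns + rs->runnable_ns +
	       rs->paused_ns + rs->blocked_ns;
}

/* Stolen time: the vcpu could have run, but the host did not run it. */
static uint64_t runstate_stolen_ns(const struct runstate_sketch *rs)
{
	assert(rs);
	return rs->runnable_ns + rs->paused_ns;
}
```

A clocksource would report the first value; something like sched_clock
would subtract the second.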

> Right now, I'm relying on the PIT but it would be nice to eliminate
> that.  I'd like to move to something PV so that I can make use of
> tickless guest kernels.  I'm very open to suggestion and even more
> open to reusing other people's code :-)

A simple hypercall interface which says "raise irq X after N ns" is
probably the easiest way to go.  Try to avoid the pitfalls of VMI's
interface in this area, which tries to recycle existing architectural
interrupt sources for hypervisor timers, and gets into a bit of a mess. 
Xen is relatively clean, but I'm not sure how it would fit into the
mostly-emulated kvm model.  If you keep things simple, most of the Xen
clockevent code could be reused with little change.

    J
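
A "raise irq X after N ns" interface could be modelled roughly as
follows.  This is only a user-space sketch: the struct and both function
names are hypothetical, no such hypercall exists in the posted patches,
and the actual irq injection is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-shot timer state kept by the host for one vcpu. */
struct oneshot_sketch {
	uint64_t deadline_ns;	/* absolute time at which to fire */
	unsigned int irq;	/* irq number the guest asked for */
	int armed;		/* nonzero while a request is pending */
};

/* Guest request: "raise irq X after N ns", relative to now_ns. */
static void oneshot_arm(struct oneshot_sketch *t, uint64_t now_ns,
			uint64_t delta_ns, unsigned int irq)
{
	/* The absolute deadline must not wrap around. */
	assert(delta_ns <= UINT64_MAX - now_ns);
	t->deadline_ns = now_ns + delta_ns;
	t->irq = irq;
	t->armed = 1;
}

/*
 * Host side: returns the irq to inject once the deadline has passed,
 * or -1 if nothing is due.  Firing disarms the timer (one-shot).
 */
static int oneshot_poll(struct oneshot_sketch *t, uint64_t now_ns)
{
	if (!t->armed || now_ns < t->deadline_ns)
		return -1;
	t->armed = 0;
	return (int)t->irq;
}
```

Because the timer is one-shot and programmed in nanoseconds, a tickless
guest would simply re-arm it for its next event instead of taking a
periodic tick.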

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46771BA0.2000308-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]                         ` <46771BA0.2000308-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-19  0:53                           ` Anthony Liguori
  0 siblings, 0 replies; 85+ messages in thread
From: Anthony Liguori @ 2007-06-19  0:53 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>   
>> Except that I wanted to change the hypercall to allow querying of real
>> time or "available" time as VMI puts it.
>>     
>
> I see.  You could have an interface like Xen's runstate interface, which
> gives you a breakdown of how long each vcpu spends in each state
> (running, runnable, paused or blocked).  The sum of all of them gives
> you real time, or you can use a subset to work out stolen time
> (runnable+paused), busyness (blocked / (running+runnable+blocked)), etc.
>   

I'll look at that.  Since the vmca is per-vcpu, that may work well.

>> Right now, I'm relying on the PIT but it would be nice to eliminate
>> that.  I'd like to move to something PV so that I can make use of
>> tickless guest kernels.  I'm very open to suggestion and even more
>> open to reusing other people's code :-)
>>     
>
> A simple hypercall interface which says "raise irq X after N ns" is
> probably the easiest way to go.

That's what I was thinking.  The current bit that makes that ugly is 
that you can't really raise an interrupt in the host kernel yet 
(although it will be possible when the in-kernel apic code is merged).  
I could, in theory, implement that interface in userspace but that may 
be ugly.

> Try to avoid the pitfalls of VMI's
> interface in this area, which tries to recycle existing architectural
> interrupt sources for hypervisor timers, and gets into a bit of a mess. 
> Xen is relatively clean, but I'm not sure how it would fit into the
> mostly-emulated kvm model.  If you keep things simple, most of the Xen
> clockevent code could be reused with little change.
>   

I think it should work out well.  Thanks for the feedback!

Regards,

Anthony Liguori

>     J
>
>   


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]                     ` <4677163F.2030308-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18 23:56                       ` Jeremy Fitzhardinge
@ 2007-06-19  7:44                       ` Avi Kivity
       [not found]                         ` <4677894D.3050500-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Avi Kivity @ 2007-06-19  7:44 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization, Jeremy Fitzhardinge

Anthony Liguori wrote:
>
>> How are you doing clockevents?
>>   
>>     
>
> Right now, I'm relying on the PIT but it would be nice to eliminate 
> that.  I'd like to move to something PV so that I can make use of 
> tickless guest kernels.  I'm very open to suggestion and even more open 
> to reusing other people's code :-)
>   

You'll need an irqchip to inject irqs.  I think Ingo's patchset had an 
irqchip implementation.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4677894D.3050500-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]                         ` <4677894D.3050500-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-06-19  8:04                           ` Rusty Russell
  0 siblings, 0 replies; 85+ messages in thread
From: Rusty Russell @ 2007-06-19  8:04 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, virtualization

On Tue, 2007-06-19 at 10:44 +0300, Avi Kivity wrote:
> Anthony Liguori wrote:
> >
> >> How are you doing clockevents?
> >>   
> >>     
> >
> > Right now, I'm relying on the PIT but it would be nice to eliminate 
> > that.  I'd like to move to something PV so that I can make use of 
> > tickless guest kernels.  I'm very open to suggestion and even more open 
> > to reusing other people's code :-)
> >   
> 
> You'll need an irqchip to inject irqs.  I think Ingo's patchset had an 
> irqchip implementation.

Lguest has one in -rc4-mm2, but I'd look at Ingo's too.  Lguest's is
very much "mY fIrsT IrqchIP"...

We do tickless and it seems to work.

Cheers,
Rusty.


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]         ` <4676D8E4.3020806-TSDbQ3PG+2Y@public.gmane.org>
  2007-06-18 21:52           ` Anthony Liguori
@ 2007-06-19 20:38           ` Anthony Liguori
       [not found]             ` <46783EDB.5010808-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-19 20:38 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
>> +	.read			= read_hyper,
>> +	.mask			= CLOCKSOURCE_MASK(64),
>> +	.mult			= 1,
>> +	.shift			= 0,
>>   
>>     
>
> It would be better to use a scale and shift here, so that adjtime has
> something to work with when warping time.
>   

I've updated this patch and switched to using a scale/shift like Xen is 
doing, but I must admit, I don't understand how it helps adjtime.  I 
poked around a bit and it wasn't obvious.

Why is having {mult=1<<22, shift=22} better for adjtime than {mult=1, 
shift=0}?

Regards,

Anthony Liguori

>     J
>
>   


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46783EDB.5010808-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]             ` <46783EDB.5010808-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-19 21:26               ` Jeremy Fitzhardinge
       [not found]                 ` <46784A0E.3080502-TSDbQ3PG+2Y@public.gmane.org>
  2007-06-21  7:04               ` Dong, Eddie
  1 sibling, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-19 21:26 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> I've updated this patch and switched to using a scale/shift like Xen
> is doing, but I must admit, I don't understand how it helps adjtime. 
> I poked around a bit and it wasn't obvious.
>
> Why is having {mult=1<<22, shift=22} better for adjtime than {mult=1,
> shift=0}?

I don't fully understand it myself, but I think it's because adjtime
plays with the mult factor to scale the rate at which time passes.  If
the scale is 1, then it can only scale time by integer amounts.  With
it set to 2^22, it can adjust time down to 1 part in 4 million.

    J
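
The granularity argument can be checked with a small user-space model of
the conversion ns = (cycles * mult) >> shift.  The function name below
is made up for illustration; only the formula follows the mult/shift
fields discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the clocksource conversion: ns = (cycles * mult) >> shift. */
static uint64_t cyc2ns_sketch(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	assert(shift < 64);
	return (cycles * mult) >> shift;
}
```

With {mult=1<<22, shift=22} the conversion is 1:1, and bumping mult by
one speeds the clock up by 1/2^22, about 0.24 parts per million.  With
{mult=1, shift=0} the smallest possible increase is to mult=2, which
doubles the rate.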

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46784A0E.3080502-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]                 ` <46784A0E.3080502-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-19 21:38                   ` Anthony Liguori
  0 siblings, 0 replies; 85+ messages in thread
From: Anthony Liguori @ 2007-06-19 21:38 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>   
>> I've updated this patch and switched to using a scale/shift like Xen
>> is doing, but I must admit, I don't understand how it helps adjtime. 
>> I poked around a bit and it wasn't obvious.
>>
>> Why is having {mult=1<<22, shift=22} better for adjtime than {mult=1,
>> shift=0}?
>>     
>
> I don't fully understand it myself, but I think it's because adjtime
> plays with the mult factor to scale the rate at which time passes.  If
> the scale is 1, then it can only scale time by integer amounts. By
> setting it to 2^22, then it can adjust time down to 1 part in 4 million.
>   

Ah, thanks.

Regards,

Anthony Liguori

>     J
>
>   


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 5/5] KVM: paravirt time source
       [not found]             ` <46783EDB.5010808-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-19 21:26               ` Jeremy Fitzhardinge
@ 2007-06-21  7:04               ` Dong, Eddie
  1 sibling, 0 replies; 85+ messages in thread
From: Dong, Eddie @ 2007-06-21  7:04 UTC (permalink / raw)
  To: Anthony Liguori, Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

>   
>
>I've updated this patch and switched to using a scale/shift 
>like Xen is 
>doing, but I must admit, I don't understand how it helps adjtime.  I 
>poked around a bit and it wasn't obvious.

I think the reason is that Xen can't use FP, so as to avoid FP save/restore 
at VM exit time, while a PV guest can use FP if it wants.

Whether it is scale/shift or FP, the hypervisor needs to guarantee that the
time stays accurate enough for years, with no overflow.  A server may 
run for a couple of years :-)


>
>Why is having {mult=1<<22, shift=22} better for adjtime than {mult=1, 
>shift=0}?
>

To avoid overflow after a long run (at least, that is its purpose on
Xen/IA64).

thx,eddie

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found] ` <4675F462.1010708-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
                     ` (4 preceding siblings ...)
  2007-06-18  3:03   ` [PATCH 5/5] KVM: paravirt time source Anthony Liguori
@ 2007-06-18  3:19   ` Jeremy Fitzhardinge
       [not found]     ` <4675F9DE.6080806-TSDbQ3PG+2Y@public.gmane.org>
  5 siblings, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-18  3:19 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, virtualization

Anthony Liguori wrote:
> 1) Not really sure what is needed for CONFIG_PREEMPT support.  I'm not
> sure which paravirt_ops calls are actually re-entrant.

I'm not sure that has specifically come up.  The main issue is whether a
particular call can be preempted and whether that matters.  I guess the
calls which affect a particular CPU's state will generally be called in
a non-preemptable context, but I guess we can't assume that; the best
approach is to assume that each call must be atomic with respect to preemption.

Things like batching must be completed with preemption disabled over the
whole batch.  I check that with BUG_ON in the Xen code.

> 2) The paravirt_ops implementation is registered with
> core_initcall().  However, the paravirt_ops banner is also printed
> with core_initcall() so the fact that this works now is just the luck
> of build order.  Need a better way to initialize the KVM paravirt_ops
> backend.

Hm.  We could print the banner later; obviously it's purely
cosmetic.  I put it as a core_initcall on the assumption that pv-ops
would be set up very early as it is with Xen and lguest, and so the
banner should print relatively early.  But if your model is that you
boot fully virtualized for a while, and then become paravirtualized
later, then it would make sense to defer banner printing until then.

    J
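
The rule that a batch must be built and flushed with preemption disabled
can be modelled in user space as below.  Everything here is a stand-in:
the preempt counter, the batch struct and its size are hypothetical, and
the assert plays the role of the BUG_ON check mentioned above.

```c
#include <assert.h>

#define BATCH_MAX_SKETCH 4

/* Stand-in for the kernel's preempt count; > 0 means "not preemptible". */
static int preempt_count_sketch;

static void preempt_disable_sketch(void)
{
	preempt_count_sketch++;
}

static void preempt_enable_sketch(void)
{
	assert(preempt_count_sketch > 0);
	preempt_count_sketch--;
}

/* Hypothetical per-cpu batch of queued operations. */
struct batch_sketch {
	unsigned long entries[BATCH_MAX_SKETCH];
	unsigned int count;
	unsigned int flushes;	/* how many times the batch was submitted */
};

/* Submit everything queued so far (here: just count the flush). */
static void batch_flush(struct batch_sketch *b)
{
	if (b->count) {
		b->flushes++;
		b->count = 0;
	}
}

/* Queue one operation; only legal while preemption is disabled. */
static void batch_queue(struct batch_sketch *b, unsigned long op)
{
	assert(preempt_count_sketch > 0);
	if (b->count == BATCH_MAX_SKETCH)
		batch_flush(b);
	b->entries[b->count++] = op;
}
```

The batch is per-cpu state, so if the task could migrate between
batch_queue() and batch_flush(), the flush would act on a different
cpu's queue; holding preemption off across the whole sequence rules
that out.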

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4675F9DE.6080806-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]     ` <4675F9DE.6080806-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-18  3:36       ` Anthony Liguori
       [not found]         ` <4675FDCA.4040006-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18  3:36 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Zachary Amsden, kvm-devel, virtualization

Hi Jeremy,

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>   
>> 1) Not really sure what is needed for CONFIG_PREEMPT support.  I'm not
>> sure which paravirt_ops calls are actually re-entrant.
>>     
>
> I'm not sure that has specifically come up.  The main issue is whether a
> particular call can be preempted and whether that matters.  I guess the
> calls which affect a particular CPU's state will generally be called in
> a non-preemptable context, but I guess we can't assume that; the best
> approach is to assume that each call must be atomic with respect to preemption.
>   

So each call would need to disable preemption?  I'm not sure that makes 
a whole lot of sense for something like CR reads/writes.  In fact, 
without passing in a cpu parameter, I'm pretty sure that those 
operations *have* to require preemption to be disabled.

For something like MMU operations, preemption really doesn't have to be 
disabled.

> Things like batching must be completed with preemption disabled over the
> whole batch.  I check that with BUG_ON in the Xen code.
>   

Right now the KVM batching requires preemption to be disabled for 
batching.  I don't think that's a hard requirement though since we could 
pass the current batch PA as part of the flush hypercalls.

Things are a lot easier though if we can just assume preemption is 
disabled :-)

Are you aware of any paravirt_ops calls that are probably being called 
in the kernel with preemption enabled?

>> 2) The paravirt_ops implementation is registered with
>> core_initcall().  However, the paravirt_ops banner is also printed
>> with core_initcall() so the fact that this works now is just the luck
>> of build order.  Need a better way to initialize the KVM paravirt_ops
>> backend.
>>     
>
> Hm.  We could print the banner later; obviously it's purely
> cosmetic.  I put it as a core_initcall on the assumption that pv-ops
> would be set up very early as it is with Xen and lguest, and so the
> banner should print relatively early.  But if your model is that you
> boot fully virtualized for a while, and then become paravirtualized
> later, then it would make sense to defer banner printing until then.
>   

I don't see a compelling reason to paravirtualize earlier, although I 
also don't see a compelling reason not to.  I noticed that VMI hooks 
setup.c.  It wasn't immediately obvious why it was hooking there, but 
perhaps it is worthwhile to have a common hook?  I suspect VMI and KVM will 
have a similar model for startup.

Thanks for the feedback!

Regards,

Anthony Liguori

>     J
>
>   


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4675FDCA.4040006-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]         ` <4675FDCA.4040006-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18  4:21           ` Jeremy Fitzhardinge
       [not found]             ` <4676084F.3090901-TSDbQ3PG+2Y@public.gmane.org>
  2007-06-19 23:49           ` Zachary Amsden
  1 sibling, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-18  4:21 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Zachary Amsden, kvm-devel, virtualization

Anthony Liguori wrote:
> Hi Jeremy,
>
> Jeremy Fitzhardinge wrote:
>> Anthony Liguori wrote:
>>  
>>> 1) Not really sure what is needed for CONFIG_PREEMPT support.  I'm not
>>> sure which paravirt_ops calls are actually re-entrant.
>>>     
>>
>> I'm not sure that has specifically come up.  The main issue is whether a
>> particular call can be preempted and whether that matters.  I guess the
>> calls which affect a particular CPU's state will generally be called in
>> a non-preemptable context, but I guess we can't assume that; the best
>> approach is to assume that each call be atomic with respect to
>> preemption.
>>   
>
> So each call would need to disable preemption?  I'm not sure that
> makes a whole lot of sense for something like CR reads/writes.  In
> fact, without passing in a cpu parameter, I'm pretty sure that those
> operations *have* to require preemption to be disabled.

Yeah, it's a little unclear to me.  If you're poking at a control
register, then one presumes you've got a specific CPU's CRx in mind. 
But in the Xen code I don't care about the preemption state for control
register updates - except for write_cr3, which never makes any sense
with preemption enabled.

> For something like MMU operations, preemption really doesn't have to
> be disabled.

Unless you're batching, since the lazy_mode is inherently per-cpu state.

>> Things like batching must be completed with preemption disabled over the
>> whole batch.  I check that with BUG_ON in the Xen code.
>>   
>
> Right now the KVM batching requires preemption to be disabled for
> batching.

I think that's probably overkill.  I had to put a few explicit
preempt_disable/enables in the Xen code, but mostly the preempt state is
reasonable for a given operation (ie, disabled for per-cpu state
updates, enabled for memory/global state updates).

>   I don't think that's a hard requirement though since we could pass
> the current batch PA as part of the flush hypercalls.

PA?

> Things are a lot easier though if we can just assume preemption is
> disabled :-)
>
> Are you aware of any paravirt_ops calls that are probably being called
> in the kernel with preemption enabled?

Erm, I haven't made a breakdown, but many are.  The descriptor updates
generally are, for example.  Pagetable updates could be, but are
generally done under a pagetable lock, and so are not preemptible anyway.

> I don't see a compelling reason to paravirtualize earlier, although I
> also don't see a compelling reason not to.  I noticed that VMI hooks
> setup.c.  It wasn't immediately obvious why it was hooking there, but
> perhaps it is worthwhile to have a common hook?  I suspect VMI and KVM
> will have a similar model for startup.

Well, I was suggesting we could print the banner later rather than
forcing an earlier init.

The important part is that you set your pv_ops before patching occurs,
since that will bake the function calls into the rest of the kernel, and
it will ignore any further changes to the paravirt_ops structure.

I think Zach was originally thinking of initializing VMI much later
(even as a module load), but the subtleties of inveigling its way into
the kernel at that late stage got too complex.

    J

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4676084F.3090901-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]             ` <4676084F.3090901-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-18 12:46               ` Anthony Liguori
       [not found]                 ` <46767EB2.4090707-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-20  0:21               ` Zachary Amsden
  1 sibling, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 12:46 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Zachary Amsden, kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>   
>> Hi Jeremy,
>>
>> Jeremy Fitzhardinge wrote:
>>     
>>> Anthony Liguori wrote:
>>>  
>>>       
>>>> 1) Not really sure what is needed for CONFIG_PREEMPT support.  I'm not
>>>> sure which paravirt_ops calls are actually re-entrant.
>>>>     
>>>>         
>>> I'm not sure that has specifically come up.  The main issue is whether a
>>> particular call can be preempted and whether that matters.  I guess the
>>> calls which affect a particular CPU's state will generally be called in
>>> a non-preemptable context, but I guess we can't assume that; the best
>>> approach is to assume that each call be atomic with respect to
>>> preemption.
>>>   
>>>       
>> So each call would need to disable preemption?  I'm not sure that
>> makes a whole lot of sense for something like CR reads/writes.  In
>> fact, without passing in a cpu parameter, I'm pretty sure that those
>> operations *have* to require preemption to be disabled.
>>     
>
> Yeah, it's a little unclear to me.  If you're poking at a control
> register, then one presumes you've got a specific CPU's CRx in mind. 
> But in the Xen code I don't care about the preemption state for control
> register updates - except for write_cr3, which never makes any sense
> with preemption enabled.
>
>   
>> For something like MMU operations, preemption really doesn't have to
>> be disabled.
>>     
>
> Unless you're batching, since the lazy_mode is inherently per-cpu state.
>
>   
>>> Things like batching must be completed with preemption disabled over the
>>> whole batch.  I check that with BUG_ON in the Xen code.
>>>   
>>>       
>> Right now the KVM batching requires preemption to be disabled for
>> batching.
>>     
>
> I think that's probably overkill.  I had to put a few explicit
> preempt_disable/enables in the Xen code, but mostly the preempt state is
> reasonable for a given operation (ie, disabled for per-cpu state
> updates, enabled for memory/global state updates).
>
>   
>>   I don't think that's a hard requirement though since we could pass
>> the current batch PA as part of the flush hypercalls.
>>     
>
> PA?
>
>   
>> Things are a lot easier though if we can just assume preemption is
>> disabled :-)
>>
>> Are you aware of any paravirt_ops calls that are probably being called
>> in the kernel with preemption enabled?
>>     
>
> Erm, I haven't made a breakdown, but many are.  The descriptor updates
> generally are, for example.  Pagetable updates could be, but are
> generally done under a pagetable lock, and so are not preemptible anyway.
>
>   
>> I don't see a compelling reason to paravirtualize earlier although I
>> also don't see a compelling reason not to.  I noticed that VMI hooks
>> setup.c.  It wasn't immediately obvious why it was hooking there but
>> perhaps it is worthwhile to have a common hook?  I suspect VMI and KVM
>> will have a similar model for startup.
>>     
>
> Well, I was suggesting we could print the banner later rather than
> forcing an earlier init.
>   

Perhaps we can just print the banner before batching occurs?  Then it's 
being printed at the last possible moment.

Regards,

Anthony Liguori

> The important part is that you set your pv_ops before patching occurs,
> since that will bake the function calls into the rest of the kernel, and
> it will ignore any further changes to the paravirt_ops structure.
>
> I think Zach was originally thinking of initializing VMI much later
> (even as a module load), but the subtleties of inveigling its way into
> the kernel at that late stage got too complex.
>
>     J
>
>   


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46767EB2.4090707-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                 ` <46767EB2.4090707-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-18 14:46                   ` Jeremy Fitzhardinge
       [not found]                     ` <46769AD2.9080105-TSDbQ3PG+2Y@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-18 14:46 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Zachary Amsden, kvm-devel, virtualization

Anthony Liguori wrote:
> Perhaps we can just print the banner before batching occurs?  Then
> it's being printed at the last possible moment.

s/batching/patching/?  Yes, that would work.

    J

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46769AD2.9080105-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                     ` <46769AD2.9080105-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-18 15:07                       ` Anthony Liguori
  0 siblings, 0 replies; 85+ messages in thread
From: Anthony Liguori @ 2007-06-18 15:07 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Zachary Amsden, kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>   
>> Perhaps we can just print the banner before batching occurs?  Then
>> it's being printed at the last possible moment.
>>     
>
> s/batching/patching/?  Yes, that would work.
>   

Yeah, sorry :-)  I'll add another patch to the series.

Regards,

Anthony Liguori

>     J
>
>   


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]             ` <4676084F.3090901-TSDbQ3PG+2Y@public.gmane.org>
  2007-06-18 12:46               ` Anthony Liguori
@ 2007-06-20  0:21               ` Zachary Amsden
       [not found]                 ` <46787325.30804-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Zachary Amsden @ 2007-06-20  0:21 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Well, I was suggesting we could print the banner later rather than
> forcing an earlier init.
>
> The important part is that you set your pv_ops before patching occurs,
> since that will bake the function calls into the rest of the kernel, and
> it will ignore any further changes to the paravirt_ops structure.
>
> I think Zach was originally thinking of initializing VMI much later
> (even as a module load), but the subtleties of inveigling its way into
> the kernel at that late stage got too complex.
>   

Definition: software-reliant paravirtualization is a guest-involved 
virtualization technique in which non-virtualizable operations are 
substituted in software with virtualized operations, thus making 
redirection of instruction flow necessary for correct operation.

For software-reliant paravirtualization, it is difficult to atomically 
switch from natural instructions to simulated para-instructions on the 
fly; you would need stop_machine_run that also holds off NMIs (so as to 
keep IF flag state intact across a window where non-virtualizable IRET 
instruction is not yet patched), and you would need to re-patch the 
kernel and modules dynamically.  Another problem is unloading the 
module, which requires restoring the smashed native paravirt-ops - some 
of which may have been patched, some not.  It is possible to do this 
from a module, just obtuse, and for 32-bit, not really worth the effort 
IMHO.

Definition: software-advisory paravirtualization is a guest-involved 
virtualization technique in which only advisory state is communicated to 
the hypervisor, thus making redirection of instruction flow at any 
particular point optional for more efficient virtualization (and 
non-virtualizability is eliminated by some other mechanism).

For software-advisory paravirtualization, it is totally possible to just 
switch over to new pv-ops at any time, and there need be no atomicity.  
This would make a paravirt-ops module rather easy to write; it simply 
needs to run some init code on each CPU and then patch paravirt-ops at 
leisure.

Now it is quite likely at least one developer is going to be assuming 
hardware virtualization capabilities for 64-bit paravirt, thus making an 
advisory method with module loading (and unloading) a more practical 
option than dissecting the 64-bit startup sequence.  In that case, the 
most logical approach would be a paravirt_register function which checks 
that no conflicting paravirt-ops have already been installed and prints 
the banner on success.  The 
paravirt_unregister function can then simply restore the native 
paravirt-ops.
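
To make the proposal concrete, here is a minimal user-space sketch of such a
registration interface.  None of these functions exist in the kernel; the
names, the return codes (-EBUSY, -EINVAL), and the banner text are
assumptions chosen only for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down ops table: a name plus one operation. */
struct pv_ops_sketch {
	const char *name;
	void (*io_delay)(void);
};

static void native_io_delay_sketch(void) { }
static void hv_io_delay_sketch(void) { }

static const struct pv_ops_sketch native_ops = { "native", native_io_delay_sketch };
const struct pv_ops_sketch kvm_ops_sketch = { "kvm", hv_io_delay_sketch };
const struct pv_ops_sketch vmi_ops_sketch = { "vmi", hv_io_delay_sketch };

static struct pv_ops_sketch active_ops = { "native", native_io_delay_sketch };
static const struct pv_ops_sketch *registered; /* NULL means native */

/* Install ops unless another backend got there first. */
int paravirt_register_sketch(const struct pv_ops_sketch *ops)
{
	assert(ops != NULL);
	if (registered != NULL)
		return -EBUSY;
	registered = ops;
	active_ops = *ops;
	printf("paravirt: registered %s\n", active_ops.name);
	return 0;
}

/* Restore the native ops; only the current owner may unregister. */
int paravirt_unregister_sketch(const struct pv_ops_sketch *ops)
{
	if (registered != ops)
		return -EINVAL;
	active_ops = native_ops;
	registered = NULL;
	return 0;
}

/* Query for dependent drivers: is this backend the active one? */
int paravirt_query_sketch(const struct pv_ops_sketch *ops)
{
	return registered == ops;
}
```

A dependent driver would call the query function instead of testing a
per-hypervisor global.  The sketch ignores locking and the re-patching
problem entirely.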

More importantly, now device drivers for virtual devices would have a 
way to inquire into which set of paravirt-ops was loaded by having an 
official registered interface rather than an ad-hoc (if xxx_running == 
1) mess, and now the paravirt driver modules are nicely decoupled from 
the boot-strap code and can be loaded dynamically.

Zach

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46787325.30804-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                 ` <46787325.30804-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
@ 2007-06-20 14:22                   ` Anthony Liguori
       [not found]                     ` <4679381E.9090404-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-20 14:22 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Jeremy Fitzhardinge, kvm-devel, virtualization

Zachary Amsden wrote:
> For software-reliant paravirtualization, it is difficult to atomically 
> switch from natural instructions to simulated para-instructions on the 
> fly; you would need stop_machine_run that also holds off NMIs (so as 
> to keep IF flag state intact across a window where non-virtualizable 
> IRET instruction is not yet patched), and you would need to re-patch 
> the kernel and modules dynamically.  Another problem is unloading the 
> module, which requires restoring the smashed native paravirt-ops - 
> some of which may have been patched, some not.  It is possible to do 
> this from a module, just obtuse, and for 32-bit, not really worth the 
> effort IMHO.
>
> Definition: software-advisory paravirtualization is a guest-involved 
> virtualization technique in which only advisory state is communicated 
> to the hypervisor, thus making redirection of instruction flow at any 
> particular point optional for more efficient virtualization (and 
> non-virtualizability is eliminated by some other mechanism).
>
> For software-advisory paravirtualization, it is totally possible to 
> just switch over to new pv-ops at any time, and there need be no 
> atomicity.  This would make a paravirt-ops module rather easy to 
> write; it simply needs to run some init code on each CPU and the patch 
> paravirt-ops at leisure.
>
> Now it is quite likely at least one developer is going to be assuming 
> hardware virtualization capabilities for 64-bit paravirt, thus making 
> an advisory method with module loading (and unloading) a more 
> practical option than dissecting the 64-bit startup sequence.

I don't agree that having paravirt_ops within a normal module is all 
that useful.  By the time modules can be loaded, the kernel has 
completely booted.  There should only be a handful of paravirt_ops 
implementations and they aren't large so I don't think there's a big 
size savings either.

>   In that case, perhaps having a paravirt_register function which 
> would check to make sure no conflicting paravirt-ops have already been 
> installed, printing the banner on success would be the most logical.  
> The paravirt_unregister function can then simply restore the native 
> paravirt-ops.
>
> More importantly, now device drivers for virtual devices would have a 
> way to inquire into which set of paravirt-ops was loaded by having an 
> official registered interface rather than an ad-hoc (if xxx_running == 
> 1) mess, and now the paravirt driver modules are nicely decoupled from 
> the boot-strap code and can be loaded dynamically.

I'm not familiar with the particular problem here, but I don't think 
that driver modules should be checking to see what paravirt_ops is 
active.  Each VMM has its own discovery mechanism (KVM and Xen are both 
based on CPUID) so that seems like a much better method to use.
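
For what it's worth, a CPUID-based check boils down to comparing a 12-byte
signature.  A minimal sketch, assuming the signature comes back in EBX, ECX,
and EDX of leaf 0x40000000 and assuming the strings "KVMKVMKVM" and
"XenVMMXenVMM"; the helper names are invented here, and the register values
are passed in as arguments so the code does not depend on executing CPUID:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unpack the three registers into a NUL-terminated 12-byte string,
 * least significant byte first, independent of host endianness. */
void hv_signature_sketch(uint32_t ebx, uint32_t ecx, uint32_t edx,
			 char out[13])
{
	const uint32_t regs[3] = { ebx, ecx, edx };
	int r, b;

	for (r = 0; r < 3; r++)
		for (b = 0; b < 4; b++)
			out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
	out[12] = '\0';
}

/* Non-zero if the registers spell out the wanted signature. */
int hv_signature_is_sketch(uint32_t ebx, uint32_t ecx, uint32_t edx,
			   const char *want)
{
	char sig[13];

	assert(want != NULL);
	hv_signature_sketch(ebx, ecx, edx, sig);
	return strncmp(sig, want, 12) == 0;
}
```

Note that this only shows the hypervisor is present, not that any
paravirt_ops backend has been activated.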

Regards,

Anthony Liguori

> Zach
>


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4679381E.9090404-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                     ` <4679381E.9090404-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-20 15:32                       ` Jeremy Fitzhardinge
       [not found]                         ` <46794899.6070708-TSDbQ3PG+2Y@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-20 15:32 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Zachary Amsden, kvm-devel, virtualization

Anthony Liguori wrote:
> I don't agree that having paravirt_ops within a normal module is all
> that useful.  By the time modules can be loaded, the kernel has
> completely booted.  There should only be a handful of paravirt_ops
> implementations and they aren't large so I don't think there's a big
> size savings either.

It doesn't seem terribly valuable to me either.  But Zach is talking
about something very similar to the kvm case, where you have a fully
virtualized environment (with hardware support), but then you load a
module containing paravirtualized helpers at some late stage which makes
things more efficient but isn't required for functional correctness.

>> More importantly, now device drivers for virtual devices would have a
>> way to inquire into which set of paravirt-ops was loaded by having an
>> official registered interface rather than an ad-hoc (if xxx_running
>> == 1) mess, and now the paravirt driver modules are nicely decoupled
>> from the boot-strap code and can be loaded dynamically.
>
> I'm not familiar with the particular problem here, but I don't think
> that driver modules should be checking to see what paravirt_ops is
> active.  Each VMM has its own discovery mechanism (KVM and Xen are
> both based on CPUID) so that seems like a much better method to use.

I think he's referring to the xen/kvm/vmi paravirt implementation as a
"driver" here.  I think.

I don't know of any "if (xxx_running)" tests at present.

    J

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46794899.6070708-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                         ` <46794899.6070708-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-20 19:35                           ` Zachary Amsden
       [not found]                             ` <46798174.2060304-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Zachary Amsden @ 2007-06-20 19:35 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>   
>> I don't agree that having paravirt_ops within a normal module is all
>> that useful.  By the time modules can be loaded, the kernel has
>> completely booted.  There should only be a handful of paravirt_ops
>> implementations and they aren't large so I don't think there's a big
>> size savings either.
>>     
>
> It doesn't seem terribly valuable to me either.  But Zach is talking
> about something very similar to the kvm case, where you have a fully
> virtualized environment (with hardware support), but then you load a
> module containing paravirtualized helpers at some late stage which makes
> things more efficient but isn't required for functional correctness.
>   

Yes, the value isn't the space savings - it's the ability to include 
paravirtualized driver support for Xen, KVM, VMI, lhype - which need not 
be compiled in, but can now be modules in a ramdisk.  The goal is 
minimal effort for a single bootable image which works across native and 
all virtualization environments.

>>> More importantly, now device drivers for virtual devices would have a
>>> way to inquire into which set of paravirt-ops was loaded by having an
>>> official registered interface rather than an ad-hoc (if xxx_running
>>> == 1) mess, and now the paravirt driver modules are nicely decoupled
>>> from the boot-strap code and can be loaded dynamically.
>>>       
>> I'm not familiar with the particular problem here, but I don't think
>> that driver modules should be checking to see what paravirt_ops is
>> active.  Each VMM has its own discovery mechanism (KVM and Xen are
>> both based on CPUID) so that seems like a much better method to use.
>>     
>
> I think he's referring to the xen/kvm/vmi paravirt implementation as a
> "driver" here.  I think.
>
> I don't know of any "if (xxx_running)" tests at present.
>   

For a VMM which supports both full emulation and para-virtualization, 
testing CPUID leaves is not sufficient to determine applicability of a 
paravirt device driver.  This only indicates the presence of the 
functionality, not the fact that the functionality has been activated.  
For 32-bit Xen, this might be an already assumed fact - but for VMI, 
KVM, and HV assisted Xen, which do support guests running without 
paravirt, you need a way to test whether the particular family of 
paravirt has been activated - for device drivers which assume paravirt 
semantics might well require this activation to work, or need to behave 
differently in an unactivated environment (emulate hypercalls with port 
I/O, for example).

Thus, all the paravirt drivers as modules would need to test if 
(xen_running) or (vmi_enabled) or (kvm_active), and then all these 
symbols need to be exported, and now you have an ad-hoc activation 
detection system for each brand of paravirt.

Better to have a standard interface, IMHO, where a paravirt-ops "parent" 
module gets registered and activated, and then well defined symbols to 
query that activation.  You also have module dependencies between the 
parent and child which are nicely modeled with the module system (xenbus 
depends on xen, vmitimer depends on vmi, etc.).

Zach

^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46798174.2060304-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                             ` <46798174.2060304-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
@ 2007-06-20 19:47                               ` Jeremy Fitzhardinge
  2007-06-20 19:52                               ` Anthony Liguori
  1 sibling, 0 replies; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-20 19:47 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm-devel, virtualization

Zachary Amsden wrote:
> For a VMM which supports both full emulation and para-virtualization, 
> testing CPUID leaves is not sufficient to determine applicability of a 
> paravirt device driver.  This only indicates the presence of the 
> functionality, not the fact that the functionality has been 
> activated.  For 32-bit Xen, this might be an already assumed fact - 
> but for VMI, KVM, and HV assisted Xen, which do support guests running 
> without paravirt, you need a way to test whether the particular family 
> of paravirt has been activated - for device drivers which assume 
> paravirt semantics might well require this activation to work, or need 
> to behave differently in an unactivated environment (emulate 
> hypercalls with port I/O, for example).

paravirt_ops-style paravirtualization and paravirt device drivers are 
more or less completely orthogonal.  An unmodified OS running under hvm 
Xen can still have paravirt drivers which can detect the presence of 
Xenbus and do all the appropriate things.  It would presumably be a 
matter of loading xenbus.ko, which would probe for the presence of the 
Xen device infrastructure, and that in turn would start pulling in the 
appropriate paravirt drivers.  The state (or existence) of struct 
paravirt_ops is immaterial.

> Thus, all the paravirt drivers as modules would need to test if 
> (xen_running) or (vmi_enabled) or (kvm_active), and then all these 
> symbols need to be exported, and now you have an ad-hoc activation 
> detection system for each brand of paravirt.

No, I think not.  Normal bus/device probing should be able to deal with it.

> Better to have a standard interface, IMHO, where a paravirt-ops 
> "parent" module gets registered and activated, and then well defined 
> symbols to query that activation.  You also have module dependencies 
> between the parent and child which are nicely modeled with the module 
> system (xenbus depends on xen, vmitimer depends on vmi, etc.).

To a large extent that already exists in the device model.

    J

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                             ` <46798174.2060304-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  2007-06-20 19:47                               ` Jeremy Fitzhardinge
@ 2007-06-20 19:52                               ` Anthony Liguori
       [not found]                                 ` <46798597.4020903-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-20 19:52 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Jeremy Fitzhardinge, kvm-devel, virtualization

Zachary Amsden wrote:
> Jeremy Fitzhardinge wrote:
>> Anthony Liguori wrote:
>>  
>>> I don't agree that having paravirt_ops within a normal module is all
>>> that useful.  By the time modules can be loaded, the kernel has
>>> completely booted.  There should only be a handful of paravirt_ops
>>> implementations and they aren't large so I don't think there's a big
>>> size savings either.
>>>     
>>
>> It doesn't seem terribly valuable to me either.  But Zach is talking
>> about something very similar to the kvm case, where you have a fully
>> virtualized environment (with hardware support), but then you load a
>> module containing paravirtualized helpers at some late stage which makes
>> things more efficient but isn't required for functional correctness.
>>   
>
> Yes, the value isn't the space savings - it's the ability to include 
> paravirtualized driver support for Xen, KVM, VMI, lhype - which need 
> not be compiled in, but can now be modules in a ramdisk.  The goal is 
> minimal effort for a single bootable image which works across native 
> and all virtualization environments.

But what's the value in having it not in the kernel?  Let's take Xen and 
lhype out of the picture because it clearly has to be there for them.  
You have a little less in the kernel now but then your kernel boots more 
slowly.  There's already a noticeable difference in boot-time with the 
KVM paravirt_ops implementation.  I imagine there is for VMI too.


>>>> More importantly, now device drivers for virtual devices would have a
>>>> way to inquire into which set of paravirt-ops was loaded by having an
>>>> official registered interface rather than an ad-hoc (if xxx_running
>>>> == 1) mess, and now the paravirt driver modules are nicely decoupled
>>>> from the boot-strap code and can be loaded dynamically.
>>>>       
>>> I'm not familiar with the particular problem here, but I don't think
>>> that driver modules should be checking to see what paravirt_ops is
>>> active.  Each VMM has its own discovery mechanism (KVM and Xen are
>>> both based on CPUID) so that seems like a much better method to use.
>>>     
>>
>> I think he's referring to the xen/kvm/vmi paravirt implementation as a
>> "driver" here.  I think.
>>
>> I don't know of any "if (xxx_running)" tests at present.
>>   
>
> For a VMM which supports both full emulation and para-virtualization, 
> testing CPUID leaves is not sufficient to determine applicability of a 
> paravirt device driver.  This only indicates the presence of the 
> functionality, not the fact that the functionality has been 
> activated.  For 32-bit Xen, this might be an already assumed fact - 
> but for VMI, KVM, and HV assisted Xen, which do support guests running 
> without paravirt, you need a way to test whether the particular family 
> of paravirt has been activated - for device drivers which assume 
> paravirt semantics might well require this activation to work, or need 
> to behave differently in an unactivated environment (emulate 
> hypercalls with port I/O, for example).

Presumably, this is the job of PV device discovery, not of 
paravirt_ops.  For instance, with Xen, this is the existence of XenBus.  
For KVM, it will probably be a pseudo-device on the PCI bus.

> Thus, all the paravirt drivers as modules would need to test if 
> (xen_running) or (vmi_enabled) or (kvm_active), and then all these 
> symbols need to be exported, and now you have an ad-hoc activation 
> detection system for each brand of paravirt.

In the case of KVM, the paravirt_ops implementation is orthogonal to 
paravirt device drivers.  A PV device driver can happily exist even if 
the paravirt_ops backend isn't activated.  This is assuming that 
hypercalls aren't used btw.  If hypercalls are desirable to use, then 
the paravirt_ops backend would have to EXPORT_GPL the hypercall 
interface.  I imagine returning a specific errno would suffice.
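
Roughly what I have in mind, as a user-space sketch.  All of the names are
hypothetical, -ENOSYS is just one possible choice of errno, and the "real"
hypercall is replaced by a stub that adds its arguments:

```c
#include <assert.h>
#include <errno.h>

static int pv_backend_active; /* set once the backend has initialized */

/* Stub standing in for the actual trap into the hypervisor. */
static long do_hypercall_sketch(unsigned int nr, unsigned long arg)
{
	assert(pv_backend_active);
	return (long)(nr + arg);
}

void pv_backend_set_active_sketch(int on)
{
	pv_backend_active = on;
}

/* The exported entry point: fails cleanly when no backend is active. */
long pv_hypercall_sketch(unsigned int nr, unsigned long arg)
{
	if (!pv_backend_active)
		return -ENOSYS;
	return do_hypercall_sketch(nr, arg);
}

enum io_path_sketch { IO_PATH_HYPERCALL, IO_PATH_EMULATED };

/* A PV driver probes once and falls back to emulated I/O on -ENOSYS. */
enum io_path_sketch driver_pick_path_sketch(void)
{
	if (pv_hypercall_sketch(0, 0) == -ENOSYS)
		return IO_PATH_EMULATED;
	return IO_PATH_HYPERCALL;
}
```

The driver never looks at paravirt_ops itself; it only sees whether the
hypercall entry point works.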

Regards,

Anthony Liguori

> Better to have a standard interface, IMHO, where a paravirt-ops 
> "parent" module gets registered and activated, and then well defined 
> symbols to query that activation.  You also have module dependencies 
> between the parent and child which are nicely modeled with the module 
> system (xenbus depends on xen, vmitimer depends on vmi, etc.).
>
> Zach
>


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46798597.4020903-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                 ` <46798597.4020903-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-20 20:03                                   ` Zachary Amsden
       [not found]                                     ` <4679882B.7070605-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Zachary Amsden @ 2007-06-20 20:03 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Jeremy Fitzhardinge, kvm-devel, virtualization

Anthony Liguori wrote:
> Zachary Amsden wrote:
>> Jeremy Fitzhardinge wrote:
>>> Anthony Liguori wrote:
>>>  
>>>> I don't agree that having paravirt_ops within a normal module is all
>>>> that useful.  By the time modules can be loaded, the kernel has
>>>> completely booted.  There should only be a handful of paravirt_ops
>>>> implementations and they aren't large so I don't think there's a big
>>>> size savings either.
>>>>     
>>>
>>> It doesn't seem terribly valuable to me either.  But Zach is talking
>>> about something very similar to the kvm case, where you have a fully
>>> virtualized environment (with hardware support), but then you load a
>>> module containing paravirtualized helpers at some late stage which 
>>> makes
>>> things more efficient but isn't required for functional correctness.
>>>   
>>
>> Yes, the value isn't the space savings - it's the ability to include 
>> paravirtualized driver support for Xen, KVM, VMI, lhype - which need 
>> not be compiled in, but can now be modules in a ramdisk.  The goal is 
>> minimal effort for a single bootable image which works across native 
>> and all virtualization environments.
>
> But what's the value in having it not in the kernel?  Let's take Xen 
> and lhype out of the picture because it clearly has to be there for 
> them.  You have a little less in the kernel now but then your kernel 
> boots more slowly.  There's already a noticeable difference in 
> boot-time with the KVM paravirt_ops implementation.  I imagine there 
> is for VMI too.

If it isn't compiled in the core kernel, then a distro need not do 
anything special to distribute VMI or KVM support - other than compile 
support for paravirt-ops.  Then the paravirt-ops module can be installed 
along with the guest tools and drivers, but need not be on install media.

Basically, it just makes it easier on distributors and allows any old 
kernel with paravirt-ops module support to run on any modern, new 
hypervisor - that might not have even existed at the time the distro was 
created.

>
> In the case of KVM, the paravirt_ops implementation is orthogonal to 
> paravirt device drivers.  A PV device driver can happily exist even if 
> the paravirt_ops backend isn't activated.  This is assuming that 
> hypercalls aren't used btw.  If hypercalls are desirable to use, then 
> the paravirt_ops backend would have to EXPORT_GPL the hypercall 
> interface.  I imagine returning a specific errno would suffice.

I'm mostly in agreement on that - although making dual hypercall / I/O 
emulated drivers is a bit more work.

Zach


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4679882B.7070605-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                     ` <4679882B.7070605-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
@ 2007-06-20 20:16                                       ` Jeremy Fitzhardinge
       [not found]                                         ` <46798B16.5090407-TSDbQ3PG+2Y@public.gmane.org>
  2007-06-20 20:22                                       ` Anthony Liguori
  1 sibling, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-20 20:16 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm-devel, virtualization

Zachary Amsden wrote:
> Basically, it just makes it easier on distributors and allows any old 
> kernel with paravirt-ops module support to run on any modern, new 
> hypervisor - that might not have even existed at the time the distro 
> was created.

Hey, isn't that what VMI's for? ;)

I'd been thinking about the possibility of allowing the domain builder 
to provide a new paravirt_ops implementation to the booting kernel.  It 
would be akin to a kernel module, in that it's built for a specific 
kernel, but obviously run a lot earlier.  But at this point I think the 
idea is too crack-ridden to be taken seriously.

>> In the case of KVM, the paravirt_ops implementation is orthogonal to 
>> paravirt device drivers.  A PV device driver can happily exist even 
>> if the paravirt_ops backend isn't activated.  This is assuming that 
>> hypercalls aren't used btw.  If hypercalls are desirable to use, then 
>> the paravirt_ops backend would have to EXPORT_GPL the hypercall 
>> interface.  I imagine returning a specific errno would suffice.
>
> I'm mostly in agreement on that - although making dual hypercall / I/O 
> emulated drivers is a bit more work. 

Semi-paravirtualized real-hardware drivers seem like a difficult 
mishmash.  I would hope we could deal with it with a virtio-like thing.

    J


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46798B16.5090407-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                         ` <46798B16.5090407-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-20 20:27                                           ` Anthony Liguori
       [not found]                                             ` <46798D9B.5000400-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-20 20:27 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Zachary Amsden, kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> Basically, it just makes it easier on distributors and allows any old 
>> kernel with paravirt-ops module support to run on any modern, new 
>> hypervisor - that might not have even existed at the time the distro 
>> was created.
>
> Hey, isn't that what VMI's for? ;)
>
> I'd been thinking about the possibility of allowing the domain builder 
> to provide a new paravirt_ops implementation to the booting kernel.  
> It would be akin to a kernel module, in that it's built for a specific 
> kernel, but obviously run a lot earlier.  But at this point I think 
> the idea is too crack-ridden to be taken seriously.

I've been thinking about this wrt the hypercall page in KVM.  The 
problem is that in a model like KVM (or presumably VMI), migration gets 
really difficult if you have anything but a trivial hypercall page since 
the hypercall page will change after migration.

If you cannot guarantee the guest isn't executing code within the 
hypercall page (or in your case, doing something with paravirt_ops), 
then you cannot safely migrate.

Regards,

Anthony Liguori

>>> In the case of KVM, the paravirt_ops implementation is orthogonal to 
>>> paravirt device drivers.  A PV device driver can happily exist even 
>>> if the paravirt_ops backend isn't activated.  This is assuming that 
>>> hypercalls aren't used btw.  If hypercalls are desirable to use, 
>>> then the paravirt_ops backend would have to EXPORT_GPL the hypercall 
>>> interface.  I imagine returning a specific errno would suffice.
>>
>> I'm mostly in agreement on that - although making dual hypercall / 
>> I/O emulated drivers is a bit more work. 
>
> Semi-paravirtualized real-hardware drivers seem like a difficult 
> mishmash.  I would hope we could deal with it with a virtio-like thing.
>
>    J
>



^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46798D9B.5000400-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                             ` <46798D9B.5000400-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-20 20:33                                               ` Jeremy Fitzhardinge
       [not found]                                                 ` <46798F1A.4090901-TSDbQ3PG+2Y@public.gmane.org>
  2007-06-20 20:43                                               ` Zachary Amsden
  1 sibling, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-20 20:33 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Zachary Amsden, kvm-devel, virtualization

Anthony Liguori wrote:
> I've been thinking about this wrt the hypercall page in KVM.  The 
> problem is that in a model like KVM (or presumably VMI), migration 
> gets really difficult if you have anything but a trivial hypercall 
> page since the hypercall page will change after migration.
>
> If you cannot guarantee the guest isn't executing code within the 
> hypercall page (or in your case, doing something with paravirt_ops), 
> then you cannot safely migrate. 

Hm, you need to quiesce the kernel in some way when you do a migrate, so 
making sure it isn't in a hypercall would be just part of that.  In 
general you'd make sure all but one CPU is parked somewhere, and the 
remaining CPU is doing the suspend, right?

The tricky part for Xen in all this is how to make sure all mfn 
references are visible to the hypervisor/toolstack so they can be 
remapped; hypercall page contents are not a concern by comparison.

    J


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46798F1A.4090901-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                                 ` <46798F1A.4090901-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-20 20:46                                                   ` Zachary Amsden
       [not found]                                                     ` <4679921A.8090607-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  2007-06-20 22:08                                                   ` Anthony Liguori
  1 sibling, 1 reply; 85+ messages in thread
From: Zachary Amsden @ 2007-06-20 20:46 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
>
> Hm, you need to quiesce the kernel in some way when you do a migrate, 
> so making sure it isn't in a hypercall would be just part of that.  In 
> general you'd make sure all but one CPU is parked somewhere, and the 
> remaining CPU is doing the suspend, right?
>
> The tricky part for Xen in all this is how to make sure all mfn 
> references are visible to the hypervisor/toolstack so they can be 
> remapped; hypercall page contents are not a concern by comparison.

You only need to quiesce if you have guest-visible data-structures that 
have details about the underlying hardware.  So Xen needs to quiesce, 
but I don't know of any other VMM that would.

VMI, KVM and lhype should be capable of transparent migration without 
guest involvement.

Zach


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4679921A.8090607-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                                     ` <4679921A.8090607-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
@ 2007-06-20 20:55                                                       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-20 20:55 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm-devel, virtualization

Zachary Amsden wrote:
> You only need to quiesce if you have guest-visible data-structures 
> that have details about the underlying hardware.  So Xen needs to 
> quiesce, but I don't know of any other VMM that would.
>
> VMI, KVM and lhype should be capable of transparent migration without 
> guest involvement. 

Sure; Xen makes the explicit design decision that suspend/resume/migrate 
are things that the guest is likely to want to have some involvement in 
if it's already doing all the paravirt games.  A 
semi-kinda-paravirtualized hvm Xen guest doesn't have to worry too much 
about those kinds of things.

    J


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                                 ` <46798F1A.4090901-TSDbQ3PG+2Y@public.gmane.org>
  2007-06-20 20:46                                                   ` Zachary Amsden
@ 2007-06-20 22:08                                                   ` Anthony Liguori
       [not found]                                                     ` <4679A54F.5020908-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-20 22:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Zachary Amsden, kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>> I've been thinking about this wrt the hypercall page in KVM.  The 
>> problem is that in a model like KVM (or presumably VMI), migration 
>> gets really difficult if you have anything but a trivial hypercall 
>> page since the hypercall page will change after migration.
>>
>> If you cannot guarantee the guest isn't executing code within the 
>> hypercall page (or in your case, doing something with paravirt_ops), 
>> then you cannot safely migrate. 
>
> Hm, you need to quiesce the kernel in some way when you do a migrate, 
> so making sure it isn't in a hypercall would be just part of that.  In 
> general you'd make sure all but one CPU is parked somewhere, and the 
> remaining CPU is doing the suspend, right?

The real trick is doing it without the guest being involved at all.  
Right now, it won't be a problem in KVM since the hypercall page only 
differs by a single instruction across platforms.  In the future, we'll 
have to be smarter and wait for all VCPUs to leave the hypercall page.

> The tricky part for Xen in all this is how to make sure all mfn 
> references are visible to the hypervisor/toolstack so they can be 
> remapped; hypercall page contents are not a concern by comparison.

I don't know HVM save/resume all that well but I think it's a similar 
model where the guest isn't involved.  They may have a similar issue 
when using PV drivers.

Regards,

Anthony Liguori

>    J
>



^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <4679A54F.5020908-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                                     ` <4679A54F.5020908-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-20 22:33                                                       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-20 22:33 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Zachary Amsden, kvm-devel, virtualization

Anthony Liguori wrote:
> The real trick is doing it without the guest being involved at all.  
> Right now, it won't be a problem in KVM since the hypercall page only 
> differs by a single instruction across platforms.  In the future, 
> we'll have to be smarter and wait for all VCPUs to leave the hypercall 
> page.

Well, you could just fake the whole acpi suspend/resume ;)

>> The tricky part for Xen in all this is how to make sure all mfn 
>> references are visible to the hypervisor/toolstack so they can be 
>> remapped; hypercall page contents are not a concern by comparison.
>
> I don't know HVM save/resume all that well but I think it's a similar 
> model where the guest isn't involved.

Yes, the guest has little to nothing to do.

> They may have a similar issue when using PV drivers.

Hm, not sure.

    J


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                             ` <46798D9B.5000400-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-20 20:33                                               ` Jeremy Fitzhardinge
@ 2007-06-20 20:43                                               ` Zachary Amsden
       [not found]                                                 ` <46799157.3070805-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Zachary Amsden @ 2007-06-20 20:43 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Jeremy Fitzhardinge, kvm-devel, virtualization

Anthony Liguori wrote:
> I've been thinking about this wrt the hypercall page in KVM.  The 
> problem is that in a model like KVM (or presumably VMI), migration 
> gets really difficult if you have anything but a trivial hypercall 
> page since the hypercall page will change after migration.
>
> If you cannot guarantee the guest isn't executing code within the 
> hypercall page (or in your case, doing something with paravirt_ops), 
> then you cannot safely migrate.

Unless you also migrate the hypercall page itself and impose migration 
restrictions on compatible hypercall pages.

Although I favor the guarantee that execution within the hypercall page 
is finished - it is important for protecting against non-reentrancy as 
well.  Think about interrupts during batching / queueing operations.

Zach


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46799157.3070805-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                                 ` <46799157.3070805-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
@ 2007-06-20 20:53                                                   ` Jeremy Fitzhardinge
       [not found]                                                     ` <467993B1.6000307-TSDbQ3PG+2Y@public.gmane.org>
  2007-06-20 22:39                                                   ` Anthony Liguori
  1 sibling, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-20 20:53 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm-devel, virtualization

Zachary Amsden wrote:
> Unless you also migrate the hypercall page itself and impose migration 
> restrictions on compatible hypercall pages.

Seems unreasonable, especially if you support migration between VT and 
SVM machines.  The whole point of a hypercall page is to give you a 
point of indirection in order to hide these kinds of hardware 
differences; migrating it would defeat the purpose.

> Although I favor the guarantee that execution within the hypercall 
> page is finished - it is important for protecting against 
> non-reentrancy as well.  Think about interrupts during batching / 
> queueing operations.

Not quite sure that's specifically relevant to migration, but yes, it's 
important to disable interrupts while doing the setup for a batch of 
stuff unless you want to see some surprises in your queue (and not "Oh, 
yay, a puppy!" kind of surprises).

    J


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <467993B1.6000307-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                                     ` <467993B1.6000307-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-20 21:08                                                       ` Zachary Amsden
       [not found]                                                         ` <46799755.8060405-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Zachary Amsden @ 2007-06-20 21:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: kvm-devel, virtualization

Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> Unless you also migrate the hypercall page itself and impose 
>> migration restrictions on compatible hypercall pages.
>
> Seems unreasonable, especially if you support migration between VT and 
> SVM machines.  The whole point of a hypercall page is to give you a 
> point of indirection in order to hide these kinds of hardware 
> differences; migrating it would defeat the purpose.

Migrating across Intel<->AMD is likely to be problematic for many other 
reasons, and in general, migrating between such different hardware (yes, 
different instruction sets even) will probably not be possible in the 
majority of cases.

If I had a gentoo install, I would probably go so far as to want to 
recompile everything after migration across CPU vendors; things like 
NMIs, MSRs, thermal controls and sleep states are also vendor dependent 
and either need to be emulated both ways, re-invented in a new way 
entirely, or just dropped.

I don't think cross-CPU vendor hot migration is particularly compelling.  
Although it certainly is possible, the payoff doesn't seem worth the 
implementation cost and you will find a maze of brambly thorns blocking 
your path.

>> Although I favor the guarantee that execution within the hypercall 
>> page is finished - it is important for protecting against 
>> non-reentrancy as well.  Think about interrupts during batching / 
>> queueing operations.
>
> Not quite sure that's specifically relevant to migration, but yes, it's 
> important to disable interrupts while doing the setup for a batch of 
> stuff unless you want to see some surprises in your queue (and not 
> "Oh, yay, a puppy!" kind of surprises).

I would argue making the hypercall page atomic is a better solution to 
reentrancy than disable / enable, but perhaps just because it happened 
to work out very nicely for our implementation.  The point is, that also 
solves part of the migration problem.

Zach


^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46799755.8060405-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                                         ` <46799755.8060405-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
@ 2007-06-20 21:48                                                           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-20 21:48 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm-devel, virtualization

Zachary Amsden wrote:
> If I had a gentoo install,

Yes, but then you'd be a gentoo user. ;)

> I would probably go so far as to want to recompile everything after 
> migration across CPU vendors; things like NMIs, MSRs, thermal controls 
> and sleep states are also vendor dependent and either need to be 
> emulated both ways, re-invented in a new way entirely, or just dropped.

Many of those things are meaningless for a guest to see or control.

> I don't think cross-CPU vendor hot migration is particularly 
> compelling.  Although it certainly is possible, the payoff doesn't seem 
> worth the implementation cost and you will find a maze of brambly 
> thorns blocking your path.

We see people trying to do it, and for good reasons.  The selling point 
of VMs is that they can install their thingy once, and it becomes 
independent from its hardware environment, to the extent that they can 
update their hardware without having to worry about its effects on their 
guests.  With a stable OS and live migration, it's reasonable to consider 
a guest VM undergoing multiple hardware upgrade transitions with no 
downtime.

I think the general approach is to have a compatible<->performance 
slider which disables/enables non-portable features.  Migrating between 
Intel/AMD is just a slightly more extreme point on the continuum of 
allowing people to migrate between different models within one 
manufacturer's line.

    J


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                                 ` <46799157.3070805-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  2007-06-20 20:53                                                   ` Jeremy Fitzhardinge
@ 2007-06-20 22:39                                                   ` Anthony Liguori
  1 sibling, 0 replies; 85+ messages in thread
From: Anthony Liguori @ 2007-06-20 22:39 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Jeremy Fitzhardinge, kvm-devel, virtualization

Zachary Amsden wrote:
> Anthony Liguori wrote:
>> I've been thinking about this wrt the hypercall page in KVM.  The 
>> problem is that in a model like KVM (or presumably VMI), migration 
>> gets really difficult if you have anything but a trivial hypercall 
>> page since the hypercall page will change after migration.
>>
>> If you cannot guarantee the guest isn't executing code within the 
>> hypercall page (or in your case, doing something with paravirt_ops), 
>> then you cannot safely migrate.
>
> Unless you also migrate the hypercall page itself and impose migration 
> restrictions on compatible hypercall pages.

Compatible hypercall pages == identical hypercall pages.  This also 
comes into play with save/restore.  If you save a guest, then upgrade 
the hypervisor, if the new hypervisor uses a different hypercall page 
(perhaps b/c an internal interface has changed), you run into the same 
problem.

> Although I favor the guarantee that execution within the hypercall 
> page is finished - it is important for protecting against 
> non-reentrancy as well.  Think about interrupts during batching / 
> queueing operations.

I think this is pretty straightforward to do without guest 
cooperation.  You merely have to let the guest run a little longer until 
vcpu->eip is not in the hypercall page.  This just becomes part of the 
criteria for convergence in migration.  The guest could get nasty and 
loop hard on calling into the hypercall page or, worse yet, rewrite 
portions of the hypercall page so that execution stayed there 
indefinitely.  That could be mitigated with heuristics but that's probably not 
all that important.

Of course, you also have to guarantee that the hypercall page doesn't 
maintain any state (at least, within the page itself).  That's 
definitely a limiting factor.
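
As a rough illustration of the convergence check described above, the following user-space sketch tests whether any VCPU's instruction pointer still lies inside the hypercall page. The struct, the function names and the 4096-byte page size are assumptions made for the example; none of this is KVM's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HC_PAGE_SIZE 4096u   /* assumed size of the hypercall page */

/* Hypothetical per-VCPU snapshot; only the instruction pointer matters here. */
struct vcpu_state {
    uint64_t ip;
};

/* True if ip falls inside [page_base, page_base + HC_PAGE_SIZE). */
bool ip_in_hypercall_page(uint64_t ip, uint64_t page_base)
{
    return ip >= page_base && ip - page_base < HC_PAGE_SIZE;
}

/*
 * Extra convergence criterion for migration: returns true only when no
 * VCPU is executing inside the hypercall page.  The caller would let the
 * guest run a little longer and re-check whenever this returns false.
 */
bool vcpus_clear_of_hypercall_page(const struct vcpu_state *vcpus, size_t n,
                                   uint64_t page_base)
{
    assert(vcpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ip_in_hypercall_page(vcpus[i].ip, page_base))
            return false;
    }
    return true;
}
```

Note that this only covers the instruction pointer.  It says nothing about state kept inside the page, which is why the page would have to be stateless for the check to be sufficient.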

Regards,

Anthony Liguori

> Zach
>


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                     ` <4679882B.7070605-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  2007-06-20 20:16                                       ` Jeremy Fitzhardinge
@ 2007-06-20 20:22                                       ` Anthony Liguori
       [not found]                                         ` <46798C99.8010303-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  1 sibling, 1 reply; 85+ messages in thread
From: Anthony Liguori @ 2007-06-20 20:22 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Jeremy Fitzhardinge, kvm-devel, virtualization

Zachary Amsden wrote:
> Anthony Liguori wrote:
>> But what's the value in having it not in the kernel?  Let's take Xen 
>> and lhype out of the picture because it clearly has to be there for 
>> them.  You have a little less in the kernel now but then your kernel 
>> boots more slowly.  There's already a noticeable difference in 
>> boot-time with the KVM paravirt_ops implementation.  I imagine there 
>> is for VMI too.
>
> If it isn't compiled in the core kernel, then a distro need not do 
> anything special to distribute VMI or KVM support - other than compile 
> support for paravirt-ops.  Then the paravirt-ops module can be 
> installed along with the guest tools and drivers, but need not be on 
> install media.

Typically, distros do not support third-party modules so that's not a 
very useful property.  Further, that just encourages out-of-kernel 
modules and, worse yet, binary modules.

In fact, the whole "install guest tools" model is fundamentally broken in this 
respect.  Guest tools always end up installing closed source drivers.  
Plus, these things aren't available during distro installation typically 
so you end up with a sucky user experience.

> Basically, it just makes it easier on distributors and allows any old 
> kernel with paravirt-ops module support to run on any modern, new 
> hypervisor - that might not have even existed at the time the distro 
> was created.

Yeah, I'm not buying it.  Is it really that much easier to backport a 
module than it is to just roll out a new kernel for an older distro?

BTW, isn't this the whole point of the VMI ROM? :-)

Regards,

Anthony Liguori

>>
>> In the case of KVM, the paravirt_ops implementation is orthogonal to 
>> paravirt device drivers.  A PV device driver can happily exist even 
>> if the paravirt_ops backend isn't activated.  This is assuming that 
>> hypercalls aren't used btw.  If hypercalls are desirable to use, then 
>> the paravirt_ops backend would have to EXPORT_GPL the hypercall 
>> interface.  I imagine returning a specific errno would suffice.
>
> I'm mostly in agreement on that - although making dual hypercall / I/O 
> emulated drivers is a bit more work.
>
> Zach
>
>



^ permalink raw reply	[flat|nested] 85+ messages in thread

[parent not found: <46798C99.8010303-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                         ` <46798C99.8010303-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-06-20 20:37                                           ` Zachary Amsden
       [not found]                                             ` <46799001.5020807-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Zachary Amsden @ 2007-06-20 20:37 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Jeremy Fitzhardinge, kvm-devel, virtualization

Anthony Liguori wrote:
> Zachary Amsden wrote:
>> Anthony Liguori wrote:
>>> But what's the value in having it not in the kernel?  Let's take Xen 
>>> and lhype out of the picture because it clearly has to be there for 
>>> them.  You have a little less in the kernel now but then your kernel 
>>> boots more slowly.  There's already a noticeable difference in 
>>> boot-time with the KVM paravirt_ops implementation.  I imagine there 
>>> is for VMI too.
>>
>> If it isn't compiled in the core kernel, then a distro need not do 
>> anything special to distribute VMI or KVM support - other than 
>> compile support for paravirt-ops.  Then the paravirt-ops module can 
>> be installed along with the guest tools and drivers, but need not be 
>> on install media.
>
> Typically, distros do not support third-party modules so that's not a 
> very useful property.  Further, that just encourages out-of-kernel 
> modules and, worse yet, binary modules.
>
> In fact, the whole "install guest tools" model is fundamentally broken in 
> this respect.  Guest tools always end up installing closed source 
> drivers.  Plus, these things aren't available during distro 
> installation typically so you end up with a sucky user experience.

Agree.

>
>> Basically, it just makes it easier on distributors and allows any old 
>> kernel with paravirt-ops module support to run on any modern, new 
>> hypervisor - that might not have even existed at the time the distro 
>> was created.
>
> Yeah, I'm not buying it.  Is it really that much easier to backport a 
> module than it is to just roll out a new kernel for an older distro?
>
> BTW, isn't this the whole point of the VMI ROM? :-)

Yes, but if we want to stay with that forward compatibility story, we 
need a way to allow paravirt device probing to be completely orthogonal 
to paravirt-ops probing.  Either the VMware hypervisor needs to NOT 
implement a CPUID leaf, keeping the same ROM based detection, or other 
VMI client drivers (say, as a wild example, a KVM driver running on a 
VMI to KVM paravirt-ops backend) must not check the CPUID leaf as a 
condition of execution.

We at least would like to use a CPUID leaf for the core paravirt-ops on 
64-bit and get rid of the need for ROM probing in that case, which would 
mean we either need a CPUID sub-leaf for the device model, a completely 
identical device model, or completely orthogonal device probing.  Since 
there hasn't been a formal specification for how the device probing 
should work, or, at least, I don't know all the details of how device 
probing works for all the various hypervisors, I worry that weird ad-hoc 
tests could trample the compatibility effort.

The completely identical device model is of course ideal, but the 
implementation and consolidation of that is a long term prospect to move 
towards, not something that will happen immediately.  We at least 
emulate physical hardware devices already, and will continue to need 
drivers compatible with those models for some time.

Zach


[parent not found: <46799001.5020807-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                             ` <46799001.5020807-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
@ 2007-06-20 21:07                                               ` Jeremy Fitzhardinge
       [not found]                                                 ` <46799716.9040402-TSDbQ3PG+2Y@public.gmane.org>
  0 siblings, 1 reply; 85+ messages in thread
From: Jeremy Fitzhardinge @ 2007-06-20 21:07 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm-devel, James Bottomley, virtualization, H. Peter Anvin

Zachary Amsden wrote:
> Yes, but if we want to stay with that forward compatibility story, we 
> need a way to allow paravirt device probing to be completely 
> orthogonal to paravirt-ops probing.  Either the VMware hypervisor 
> needs to NOT implement a CPUID leaf, keeping the same ROM based 
> detection, or other VMI client drivers (say, as a wild example, a KVM 
> driver running on a VMI to KVM paravirt-ops backend) must not check 
> the CPUID leaf as a condition of execution.

Yes, this is something that keeps coming up.  hpa originally floated the 
idea of reserving some PCI bus namespace as a gateway for probing for 
virtual/paravirtual devices, and Jun Nakajima proposed it again in the 
context of smart hardware which is virtualization friendly (ie, how to 
represent PCI-IOV to guests).

I'm not wildly happy about the idea of using PCI for probing for 
otherwise completely non-PCI devices, but some kind of probing mechanism 
might be nice in the general case.  Xen deals with it with Xenbus, but I 
figure I'm unlikely to convince everyone to adopt that.

> We at least would like to use a CPUID leaf for the core paravirt-ops 
> on 64-bit and get rid of the need for ROM probing in that case, which 
> would mean we either need a CPUID sub-leaf for the device model, a 
> completely identical device model, or completely orthogonal device 
> probing.

Well, cpuid leaf 0x40000000 seems to be gaining currency as a 
(semi-?)formal way for hypervisors to advertise themselves, so that 
seems completely doable.
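
[Editor's sketch, not part of the original message.] To make the leaf
0x40000000 idea concrete, here is a minimal user-space illustration in C.
It assumes the convention that hypervisors place a 12-byte signature in
EBX, ECX and EDX of that leaf, and that CPUID.1:ECX bit 31 signals that a
hypervisor is present; both reflect later common practice rather than
anything specified in this thread. The names hv_signature and hv_detect
are hypothetical, and <cpuid.h> is the GCC/Clang header.

```c
#include <stdint.h>
#include <string.h>

#define HV_CPUID_BASE 0x40000000u   /* leaf discussed in the thread */

/* Unpack EBX, ECX, EDX (in that order) into a NUL-terminated 12-byte
 * signature.  Bytes are taken least-significant first, matching how
 * x86 lays the registers out in memory. */
void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };

    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
    out[12] = '\0';
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Returns 1 and fills sig if a hypervisor advertises itself, else 0.
 * Assumption: the "hypervisor present" bit is honoured by the host. */
int hv_detect(char sig[13])
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 31)))
        return 0;

    __cpuid(HV_CPUID_BASE, eax, ebx, ecx, edx);
    hv_signature(ebx, ecx, edx, sig);
    return 1;
}
#endif
```

A guest would call hv_detect() once at start-up and compare the signature
against the backends it knows about; anything unrecognised would fall
back to native operation.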

>   Since there hasn't been a formal specification for how the device 
> probing should work, or, at least, I don't know all the details of how 
> device probing works for all the various hypervisors, I worry that 
> weird ad-hoc tests could trample the compatibility effort.

Yes.  That's the thinking behind using PCI as a somewhat common 
mechanism for device discovery.  s390 folks hate it, of course.

> The completely identical device model is of course ideal, but the 
> implementation and consolidation of that is a long term prospect to 
> move towards, not something that will happen immediately.  We at least 
> emulate physical hardware devices already, and will continue to need 
> drivers compatible with those models for some time.

Well, physical devices and completely emulated physical devices are 
fairly straightforward - do it like real hardware.  It's the semi-virtual 
devices which pose problems.  Either device emulations with a bit of 
performance paravirtualization sprinkled over them, or virtualization 
friendly devices which allow safe direct guest access, but need some 
paravirtual management interfaces as well.

    J


[parent not found: <46799716.9040402-TSDbQ3PG+2Y@public.gmane.org>]

* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                                 ` <46799716.9040402-TSDbQ3PG+2Y@public.gmane.org>
@ 2007-06-20 21:27                                                   ` ron minnich
  2007-06-20 21:39                                                   ` H. Peter Anvin
  1 sibling, 0 replies; 85+ messages in thread
From: ron minnich @ 2007-06-20 21:27 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Zachary Amsden, kvm-devel, H. Peter Anvin, James Bottomley,
	virtualization

On 6/20/07, Jeremy Fitzhardinge <jeremy-TSDbQ3PG+2Y@public.gmane.org> wrote:


> I'm not wildly happy about the idea of using PCI for probing for
> otherwise completely non-PCI devices,

Good :-)

> Xen deals with it with Xenbus, but I
> figure I'm unlikely to convince everyone to adopt that.

Especially those of us who have used Xenbus :-)

> Yes.  That's the thinking behind using PCI as a somewhat common
> mechanism for device discovery.  s390 folks hate it, of course.

My fear with using PCI is that it is a short-term remedy that we might
find ourselves locked into once we start, and there are lots of things
that will not fit into the PCI model. So we would have picked a
short-term, expedient approach that works for simple cases but breaks
interesting uses.

It's happened in other projects ...

thanks

ron


* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]                                                 ` <46799716.9040402-TSDbQ3PG+2Y@public.gmane.org>
  2007-06-20 21:27                                                   ` ron minnich
@ 2007-06-20 21:39                                                   ` H. Peter Anvin
  1 sibling, 0 replies; 85+ messages in thread
From: H. Peter Anvin @ 2007-06-20 21:39 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Zachary Amsden, kvm-devel, James Bottomley, virtualization

Jeremy Fitzhardinge wrote:
> 
>> The completely identical device model is of course ideal, but the
>> implementation and consolidation of that is a long term prospect to
>> move towards, not something that will happen immediately.  We at least
>> emulate physical hardware devices already, and will continue to need
>> drivers compatible with those models for some time.
> 
> Well, physical devices and completely emulated physical devices are
> fairly straightforward - do it like real hardware.  It's the semi-virtual
> devices which pose problems.  Either device emulations with a bit of
> performance paravirtualization sprinkled over them, or virtualization
> friendly devices which allow safe direct guest access, but need some
> paravirtual management interfaces as well.
> 

Those can still be detected by appearing in the PCI configuration space,
though.  It doesn't mean they actually have to emulate a PCI device.

One of the "nice" things (from a virtualization perspective) is that
there isn't a pan-architectural way to get to PCI config space, so on
platforms where PCI is irrelevant, it can be implemented as a
virtualization call.
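
[Editor's sketch, not part of the original message.] One way to read the
point above: discovery code only needs some function that reads PCI
configuration space, and on platforms without real PCI that function
could be backed by a hypervisor call. The C sketch below assumes the
standard layout where config offset 0x00 holds the vendor ID in the low
16 bits and the device ID in the high 16 bits, with vendor 0xffff meaning
"no device". The vendor ID 0x1234, the device ID 0xabcd, and all function
names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

#define PCI_SLOTS_PER_BUS 32
#define PCI_NO_DEVICE     0xffffu
#define PV_VENDOR_ID      0x1234u   /* hypothetical paravirt vendor ID */

/* Abstract config-space accessor: port I/O, MMIO, or a hypercall. */
typedef uint32_t (*cfg_read32_fn)(uint8_t bus, uint8_t slot,
                                  uint8_t func, uint8_t offset);

/* Scan function 0 of every slot on one bus and collect the device IDs
 * of entries whose vendor ID matches.  Returns the number found. */
int pv_probe_bus(cfg_read32_fn read32, uint8_t bus, uint16_t vendor,
                 uint16_t *ids, int max_ids)
{
    int found = 0;

    for (uint8_t slot = 0; slot < PCI_SLOTS_PER_BUS && found < max_ids; slot++) {
        uint32_t id = read32(bus, slot, 0, 0x00);
        uint16_t ven = (uint16_t)(id & 0xffff);

        if (ven == PCI_NO_DEVICE || ven != vendor)
            continue;
        ids[found++] = (uint16_t)(id >> 16);
    }
    return found;
}

/* Stand-in for a "virtualization call" backend: exposes one made-up
 * device at bus 0, slot 3, and nothing anywhere else. */
uint32_t fake_hv_cfg_read32(uint8_t bus, uint8_t slot,
                            uint8_t func, uint8_t offset)
{
    if (bus == 0 && slot == 3 && func == 0 && offset == 0x00)
        return ((uint32_t)0xabcdu << 16) | PV_VENDOR_ID;
    return 0xffffffffu;
}
```

The probing loop never learns how the read is implemented, which is the
property that would let the same discovery code run on architectures
where PCI itself is irrelevant.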

Multistandard devices obviously need to model real hardware more
accurately, since that's the common denominator.

	-hpa


* Re: [PATCH 0/5] KVM paravirt_ops implementation
       [not found]         ` <4675FDCA.4040006-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-06-18  4:21           ` Jeremy Fitzhardinge
@ 2007-06-19 23:49           ` Zachary Amsden
  1 sibling, 0 replies; 85+ messages in thread
From: Zachary Amsden @ 2007-06-19 23:49 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Jeremy Fitzhardinge, kvm-devel, virtualization

Anthony Liguori wrote:
>
> I don't see a compelling reason to paravirtualize earlier although I 
> also don't see a compelling reason not to.  I noticed that VMI hooks 
> setup.c.  It wasn't immediately obvious why it was hooking there but 
> perhaps it is worthwhile to have a common hook?  I suspect VMI and KVM 
> will have a similar model for startup.

VMI would like to engage after kernel early facilities are available, so 
command line arguments have been processed.  It is desirable to activate 
before allocating kernel pagetables in pagetable_init, so the kernel 
page-tables do not need to be discovered, and after all the early boot 
stuff is out of the way so everything can be neatly done in C code.

Zach



end of thread, other threads:[~2007-06-26 11:57 UTC | newest]

Thread overview: 85+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-18  2:56 [PATCH 0/5] KVM paravirt_ops implementation Anthony Liguori
     [not found] ` <4675F462.1010708-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18  2:58   ` [PATCH 1/5] KVM paravirt_ops core infrastructure Anthony Liguori
     [not found]     ` <4675F4C3.6050700-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18  8:03       ` Avi Kivity
     [not found]         ` <46763C6B.9050004-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-06-18 12:25           ` Anthony Liguori
     [not found]             ` <467679C5.6030201-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18 12:28               ` Avi Kivity
2007-06-26  8:04       ` Dor Laor
     [not found]         ` <64F9B87B6B770947A9F8391472E032160C73025E-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
2007-06-26  8:45           ` Jun Koi
     [not found]             ` <fdaac4d50706260145x1ebceadt432edd5b6a6ac1f2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2007-06-26 11:57               ` Anthony Liguori
2007-06-26 11:56           ` Anthony Liguori
2007-06-18  2:58   ` [PATCH 2/5] KVM: Implement CR read caching for KVM paravirt_ops Anthony Liguori
     [not found]     ` <4675F4F1.5090207-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18  8:05       ` Avi Kivity
     [not found]         ` <46763CD3.3060704-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-06-18 12:26           ` Anthony Liguori
2007-06-18  8:11       ` Avi Kivity
     [not found]         ` <46763E35.8020108-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-06-18 12:27           ` Anthony Liguori
2007-06-18  3:00   ` [PATCH 3/5] KVM: Add paravirt MMU write support Anthony Liguori
     [not found]     ` <4675F533.40809-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18  8:20       ` Avi Kivity
     [not found]         ` <46764061.9080705-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-06-18 12:33           ` Anthony Liguori
     [not found]             ` <46767B8C.9050001-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18 12:38               ` Avi Kivity
     [not found]                 ` <46767CD1.7030208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-06-18 12:48                   ` Anthony Liguori
2007-06-19 21:57           ` Anthony Liguori
     [not found]             ` <46785132.3070505-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-19 22:19               ` Jeremy Fitzhardinge
     [not found]                 ` <4678567C.6040400-TSDbQ3PG+2Y@public.gmane.org>
2007-06-19 22:28                   ` Anthony Liguori
2007-06-18  3:00   ` [PATCH 4/5] KVM: Add hypercall queue for paravirt_ops implementation Anthony Liguori
     [not found]     ` <4675F568.90608-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18  4:00       ` Jeremy Fitzhardinge
     [not found]         ` <46760343.5070401-TSDbQ3PG+2Y@public.gmane.org>
2007-06-18  4:09           ` Jeremy Fitzhardinge
2007-06-18 12:22           ` Anthony Liguori
2007-06-18  9:07       ` Avi Kivity
     [not found]         ` <46764B47.5060403-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-06-18 12:40           ` Anthony Liguori
     [not found]             ` <46767D47.1010104-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18 12:50               ` Avi Kivity
     [not found]                 ` <46767F98.70109-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-06-18 13:03                   ` Gregory Haskins
     [not found]                     ` <1182171781.4593.38.camel-5CR4LY5GPkvLDviKLk5550HKjMygAv58XqFh9Ls21Oc@public.gmane.org>
2007-06-18 13:19                       ` Anthony Liguori
     [not found]                         ` <4676867E.1090208-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18 13:25                           ` Gregory Haskins
2007-06-18 13:22                   ` Anthony Liguori
     [not found]                     ` <46768724.3000509-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18 13:35                       ` Avi Kivity
     [not found]                         ` <46768A3F.2010202-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-06-18 14:02                           ` Anthony Liguori
     [not found]                             ` <4676905B.6000805-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18 15:08                               ` Avi Kivity
     [not found]                                 ` <46769FFE.6040502-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-06-18 15:20                                   ` Anthony Liguori
     [not found]                                     ` <4676A2D4.2040704-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18 16:01                                       ` Avi Kivity
2007-06-18 16:00                                   ` Avi Kivity
     [not found]                                     ` <4676AC10.3090007-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-06-18 17:47                                       ` Anthony Liguori
2007-06-18  3:03   ` [PATCH 5/5] KVM: paravirt time source Anthony Liguori
     [not found]     ` <4675F601.3090706-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18  9:24       ` Avi Kivity
2007-06-18 19:11       ` Jeremy Fitzhardinge
     [not found]         ` <4676D8E4.3020806-TSDbQ3PG+2Y@public.gmane.org>
2007-06-18 21:52           ` Anthony Liguori
     [not found]             ` <4676FEB9.6060308-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18 22:04               ` Jeremy Fitzhardinge
     [not found]                 ` <46770162.6030101-TSDbQ3PG+2Y@public.gmane.org>
2007-06-18 23:33                   ` Anthony Liguori
     [not found]                     ` <4677163F.2030308-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18 23:56                       ` Jeremy Fitzhardinge
     [not found]                         ` <46771BA0.2000308-TSDbQ3PG+2Y@public.gmane.org>
2007-06-19  0:53                           ` Anthony Liguori
2007-06-19  7:44                       ` Avi Kivity
     [not found]                         ` <4677894D.3050500-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-06-19  8:04                           ` Rusty Russell
2007-06-19 20:38           ` Anthony Liguori
     [not found]             ` <46783EDB.5010808-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-19 21:26               ` Jeremy Fitzhardinge
     [not found]                 ` <46784A0E.3080502-TSDbQ3PG+2Y@public.gmane.org>
2007-06-19 21:38                   ` Anthony Liguori
2007-06-21  7:04               ` Dong, Eddie
2007-06-18  3:19   ` [PATCH 0/5] KVM paravirt_ops implementation Jeremy Fitzhardinge
     [not found]     ` <4675F9DE.6080806-TSDbQ3PG+2Y@public.gmane.org>
2007-06-18  3:36       ` Anthony Liguori
     [not found]         ` <4675FDCA.4040006-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18  4:21           ` Jeremy Fitzhardinge
     [not found]             ` <4676084F.3090901-TSDbQ3PG+2Y@public.gmane.org>
2007-06-18 12:46               ` Anthony Liguori
     [not found]                 ` <46767EB2.4090707-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-18 14:46                   ` Jeremy Fitzhardinge
     [not found]                     ` <46769AD2.9080105-TSDbQ3PG+2Y@public.gmane.org>
2007-06-18 15:07                       ` Anthony Liguori
2007-06-20  0:21               ` Zachary Amsden
     [not found]                 ` <46787325.30804-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
2007-06-20 14:22                   ` Anthony Liguori
     [not found]                     ` <4679381E.9090404-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-20 15:32                       ` Jeremy Fitzhardinge
     [not found]                         ` <46794899.6070708-TSDbQ3PG+2Y@public.gmane.org>
2007-06-20 19:35                           ` Zachary Amsden
     [not found]                             ` <46798174.2060304-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
2007-06-20 19:47                               ` Jeremy Fitzhardinge
2007-06-20 19:52                               ` Anthony Liguori
     [not found]                                 ` <46798597.4020903-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-20 20:03                                   ` Zachary Amsden
     [not found]                                     ` <4679882B.7070605-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
2007-06-20 20:16                                       ` Jeremy Fitzhardinge
     [not found]                                         ` <46798B16.5090407-TSDbQ3PG+2Y@public.gmane.org>
2007-06-20 20:27                                           ` Anthony Liguori
     [not found]                                             ` <46798D9B.5000400-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-20 20:33                                               ` Jeremy Fitzhardinge
     [not found]                                                 ` <46798F1A.4090901-TSDbQ3PG+2Y@public.gmane.org>
2007-06-20 20:46                                                   ` Zachary Amsden
     [not found]                                                     ` <4679921A.8090607-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
2007-06-20 20:55                                                       ` Jeremy Fitzhardinge
2007-06-20 22:08                                                   ` Anthony Liguori
     [not found]                                                     ` <4679A54F.5020908-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-20 22:33                                                       ` Jeremy Fitzhardinge
2007-06-20 20:43                                               ` Zachary Amsden
     [not found]                                                 ` <46799157.3070805-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
2007-06-20 20:53                                                   ` Jeremy Fitzhardinge
     [not found]                                                     ` <467993B1.6000307-TSDbQ3PG+2Y@public.gmane.org>
2007-06-20 21:08                                                       ` Zachary Amsden
     [not found]                                                         ` <46799755.8060405-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
2007-06-20 21:48                                                           ` Jeremy Fitzhardinge
2007-06-20 22:39                                                   ` Anthony Liguori
2007-06-20 20:22                                       ` Anthony Liguori
     [not found]                                         ` <46798C99.8010303-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-06-20 20:37                                           ` Zachary Amsden
     [not found]                                             ` <46799001.5020807-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
2007-06-20 21:07                                               ` Jeremy Fitzhardinge
     [not found]                                                 ` <46799716.9040402-TSDbQ3PG+2Y@public.gmane.org>
2007-06-20 21:27                                                   ` ron minnich
2007-06-20 21:39                                                   ` H. Peter Anvin
2007-06-19 23:49           ` Zachary Amsden
