public inbox for kvm@vger.kernel.org
* Re: [PATCH 00/14] PV ticket locks without expanding spinlock
       [not found] <cover.1289940821.git.jeremy.fitzhardinge@citrix.com>
@ 2011-01-19 16:44 ` Srivatsa Vaddagiri
  2011-01-19 17:07   ` [PATCH 1/3] debugfs: Add support to print u32 array Srivatsa Vaddagiri
                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Srivatsa Vaddagiri @ 2011-01-19 16:44 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Nick Piggin,
	Mathieu Desnoyers, Américo Wang, Eric Dumazet, Jan Beulich,
	Avi Kivity, Xen-devel, H. Peter Anvin, Linux Virtualization,
	Jeremy Fitzhardinge, kvm, suzuki

On Tue, Nov 16, 2010 at 01:08:31PM -0800, Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> Hi all,
> 
> This is a revised version of the pvticket lock series.

The three-patch series following this email extends the KVM hypervisor and Linux
guests running on it to support pv-ticket spinlocks.

Two hypercalls are introduced in the KVM hypervisor: one that allows a vcpu
(spinning on a lock) to block, and another that allows a vcpu to kick a blocked
vcpu back into a runnable state.

The patches are against the 2.6.37 mainline kernel. I don't yet have numbers
showing the benefit of pv-ticketlocks, but I would expect the benefit to be
similar across hypervisors (Xen and KVM in this case).

- vatsa

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/3] debugfs: Add support to print u32 array
  2011-01-19 16:44 ` [PATCH 00/14] PV ticket locks without expanding spinlock Srivatsa Vaddagiri
@ 2011-01-19 17:07   ` Srivatsa Vaddagiri
  2011-01-19 17:12   ` [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock Srivatsa Vaddagiri
  2011-01-19 17:17   ` [PATCH 3/3] kvm guest : Add support for pv-ticketlocks Srivatsa Vaddagiri
  2 siblings, 0 replies; 20+ messages in thread
From: Srivatsa Vaddagiri @ 2011-01-19 17:07 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Xen-devel, Mathieu Desnoyers, Nick Piggin, kvm, Peter Zijlstra,
	Linux Kernel Mailing List, Jan Beulich, Eric Dumazet,
	Jeremy Fitzhardinge, suzuki, Avi Kivity, H. Peter Anvin,
	Américo Wang, Linux Virtualization

Add debugfs support to print u32 arrays.

Most of this code comes from the Xen hypervisor sources; it has been refactored
into fs/debugfs so that other users can share it as well.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>

---
 arch/x86/xen/debugfs.c    |  104 --------------------------------------------
 arch/x86/xen/debugfs.h    |    4 -
 arch/x86/xen/mmu.c        |    2 
 arch/x86/xen/multicalls.c |    6 +-
 arch/x86/xen/spinlock.c   |    2 
 fs/debugfs/file.c         |  108 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/debugfs.h   |   11 ++++
 7 files changed, 124 insertions(+), 113 deletions(-)

Index: linux-2.6.37/arch/x86/xen/debugfs.c
===================================================================
--- linux-2.6.37.orig/arch/x86/xen/debugfs.c
+++ linux-2.6.37/arch/x86/xen/debugfs.c
@@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(
 	return d_xen_debug;
 }
 
-struct array_data
-{
-	void *array;
-	unsigned elements;
-};
-
-static int u32_array_open(struct inode *inode, struct file *file)
-{
-	file->private_data = NULL;
-	return nonseekable_open(inode, file);
-}
-
-static size_t format_array(char *buf, size_t bufsize, const char *fmt,
-			   u32 *array, unsigned array_size)
-{
-	size_t ret = 0;
-	unsigned i;
-
-	for(i = 0; i < array_size; i++) {
-		size_t len;
-
-		len = snprintf(buf, bufsize, fmt, array[i]);
-		len++;	/* ' ' or '\n' */
-		ret += len;
-
-		if (buf) {
-			buf += len;
-			bufsize -= len;
-			buf[-1] = (i == array_size-1) ? '\n' : ' ';
-		}
-	}
-
-	ret++;		/* \0 */
-	if (buf)
-		*buf = '\0';
-
-	return ret;
-}
-
-static char *format_array_alloc(const char *fmt, u32 *array, unsigned array_size)
-{
-	size_t len = format_array(NULL, 0, fmt, array, array_size);
-	char *ret;
-
-	ret = kmalloc(len, GFP_KERNEL);
-	if (ret == NULL)
-		return NULL;
-
-	format_array(ret, len, fmt, array, array_size);
-	return ret;
-}
-
-static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
-			      loff_t *ppos)
-{
-	struct inode *inode = file->f_path.dentry->d_inode;
-	struct array_data *data = inode->i_private;
-	size_t size;
-
-	if (*ppos == 0) {
-		if (file->private_data) {
-			kfree(file->private_data);
-			file->private_data = NULL;
-		}
-
-		file->private_data = format_array_alloc("%u", data->array, data->elements);
-	}
-
-	size = 0;
-	if (file->private_data)
-		size = strlen(file->private_data);
-
-	return simple_read_from_buffer(buf, len, ppos, file->private_data, size);
-}
-
-static int xen_array_release(struct inode *inode, struct file *file)
-{
-	kfree(file->private_data);
-
-	return 0;
-}
-
-static const struct file_operations u32_array_fops = {
-	.owner	= THIS_MODULE,
-	.open	= u32_array_open,
-	.release= xen_array_release,
-	.read	= u32_array_read,
-	.llseek = no_llseek,
-};
-
-struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
-					    struct dentry *parent,
-					    u32 *array, unsigned elements)
-{
-	struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
-
-	if (data == NULL)
-		return NULL;
-
-	data->array = array;
-	data->elements = elements;
-
-	return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
-}
Index: linux-2.6.37/arch/x86/xen/debugfs.h
===================================================================
--- linux-2.6.37.orig/arch/x86/xen/debugfs.h
+++ linux-2.6.37/arch/x86/xen/debugfs.h
@@ -3,8 +3,4 @@
 
 struct dentry * __init xen_init_debugfs(void);
 
-struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
-					    struct dentry *parent,
-					    u32 *array, unsigned elements);
-
 #endif /* _XEN_DEBUGFS_H */
Index: linux-2.6.37/arch/x86/xen/mmu.c
===================================================================
--- linux-2.6.37.orig/arch/x86/xen/mmu.c
+++ linux-2.6.37/arch/x86/xen/mmu.c
@@ -2757,7 +2757,7 @@ static int __init xen_mmu_debugfs(void)
 	debugfs_create_u32("mmu_update", 0444, d_mmu_debug, &mmu_stats.mmu_update);
 	debugfs_create_u32("mmu_update_extended", 0444, d_mmu_debug,
 			   &mmu_stats.mmu_update_extended);
-	xen_debugfs_create_u32_array("mmu_update_histo", 0444, d_mmu_debug,
+	debugfs_create_u32_array("mmu_update_histo", 0444, d_mmu_debug,
 				     mmu_stats.mmu_update_histo, 20);
 
 	debugfs_create_u32("set_pte_at", 0444, d_mmu_debug, &mmu_stats.set_pte_at);
Index: linux-2.6.37/arch/x86/xen/multicalls.c
===================================================================
--- linux-2.6.37.orig/arch/x86/xen/multicalls.c
+++ linux-2.6.37/arch/x86/xen/multicalls.c
@@ -269,11 +269,11 @@ static int __init xen_mc_debugfs(void)
 	debugfs_create_u32("hypercalls", 0444, d_mc_debug, &mc_stats.hypercalls);
 	debugfs_create_u32("arg_total", 0444, d_mc_debug, &mc_stats.arg_total);
 
-	xen_debugfs_create_u32_array("batch_histo", 0444, d_mc_debug,
+	debugfs_create_u32_array("batch_histo", 0444, d_mc_debug,
 				     mc_stats.histo, MC_BATCH);
-	xen_debugfs_create_u32_array("hypercall_histo", 0444, d_mc_debug,
+	debugfs_create_u32_array("hypercall_histo", 0444, d_mc_debug,
 				     mc_stats.histo_hypercalls, NHYPERCALLS);
-	xen_debugfs_create_u32_array("flush_reasons", 0444, d_mc_debug,
+	debugfs_create_u32_array("flush_reasons", 0444, d_mc_debug,
 				     mc_stats.flush, FL_N_REASONS);
 
 	return 0;
Index: linux-2.6.37/arch/x86/xen/spinlock.c
===================================================================
--- linux-2.6.37.orig/arch/x86/xen/spinlock.c
+++ linux-2.6.37/arch/x86/xen/spinlock.c
@@ -228,7 +228,7 @@ static int __init xen_spinlock_debugfs(v
 	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
 			   &spinlock_stats.time_blocked);
 
-	xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
 				     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
 
 	return 0;
Index: linux-2.6.37/fs/debugfs/file.c
===================================================================
--- linux-2.6.37.orig/fs/debugfs/file.c
+++ linux-2.6.37/fs/debugfs/file.c
@@ -18,6 +18,7 @@
 #include <linux/pagemap.h>
 #include <linux/namei.h>
 #include <linux/debugfs.h>
+#include <linux/slab.h>
 
 static ssize_t default_read_file(struct file *file, char __user *buf,
 				 size_t count, loff_t *ppos)
@@ -534,3 +535,110 @@ struct dentry *debugfs_create_blob(const
 	return debugfs_create_file(name, mode, parent, blob, &fops_blob);
 }
 EXPORT_SYMBOL_GPL(debugfs_create_blob);
+
+struct array_data {
+	void *array;
+	unsigned elements;
+};
+
+static int u32_array_open(struct inode *inode, struct file *file)
+{
+	file->private_data = NULL;
+	return nonseekable_open(inode, file);
+}
+
+static size_t format_array(char *buf, size_t bufsize, const char *fmt,
+			   u32 *array, unsigned array_size)
+{
+	size_t ret = 0;
+	unsigned i;
+
+	for (i = 0; i < array_size; i++) {
+		size_t len;
+
+		len = snprintf(buf, bufsize, fmt, array[i]);
+		len++;	/* ' ' or '\n' */
+		ret += len;
+
+		if (buf) {
+			buf += len;
+			bufsize -= len;
+			buf[-1] = (i == array_size-1) ? '\n' : ' ';
+		}
+	}
+
+	ret++;		/* \0 */
+	if (buf)
+		*buf = '\0';
+
+	return ret;
+}
+
+static char *
+format_array_alloc(const char *fmt, u32 *array, unsigned array_size)
+{
+	size_t len = format_array(NULL, 0, fmt, array, array_size);
+	char *ret;
+
+	ret = kmalloc(len, GFP_KERNEL);
+	if (ret == NULL)
+		return NULL;
+
+	format_array(ret, len, fmt, array, array_size);
+	return ret;
+}
+
+static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
+			      loff_t *ppos)
+{
+	struct inode *inode = file->f_path.dentry->d_inode;
+	struct array_data *data = inode->i_private;
+	size_t size;
+
+	if (*ppos == 0) {
+		if (file->private_data) {
+			kfree(file->private_data);
+			file->private_data = NULL;
+		}
+
+		file->private_data =
+			 format_array_alloc("%u", data->array, data->elements);
+	}
+
+	size = 0;
+	if (file->private_data)
+		size = strlen(file->private_data);
+
+	return simple_read_from_buffer(buf, len, ppos,
+					 file->private_data, size);
+}
+
+static int u32_array_release(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+
+	return 0;
+}
+
+static const struct file_operations u32_array_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = u32_array_open,
+	.release = u32_array_release,
+	.read	 = u32_array_read,
+	.llseek  = no_llseek,
+};
+
+struct dentry *debugfs_create_u32_array(const char *name, mode_t mode,
+					    struct dentry *parent,
+					    u32 *array, unsigned elements)
+{
+	struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
+
+	if (data == NULL)
+		return NULL;
+
+	data->array = array;
+	data->elements = elements;
+
+	return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
+}
Index: linux-2.6.37/include/linux/debugfs.h
===================================================================
--- linux-2.6.37.orig/include/linux/debugfs.h
+++ linux-2.6.37/include/linux/debugfs.h
@@ -74,6 +74,10 @@ struct dentry *debugfs_create_blob(const
 				  struct dentry *parent,
 				  struct debugfs_blob_wrapper *blob);
 
+struct dentry *debugfs_create_u32_array(const char *name, mode_t mode,
+					    struct dentry *parent,
+					    u32 *array, unsigned elements);
+
 bool debugfs_initialized(void);
 
 #else
@@ -187,6 +191,13 @@ static inline struct dentry *debugfs_cre
 {
 	return ERR_PTR(-ENODEV);
 }
+
+static inline struct dentry *debugfs_create_u32_array(const char *name,
+					    mode_t mode, struct dentry *parent,
+					    u32 *array, unsigned elements)
+{
+	return ERR_PTR(-ENODEV);
+}
 
 static inline bool debugfs_initialized(void)
 {


* [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-19 16:44 ` [PATCH 00/14] PV ticket locks without expanding spinlock Srivatsa Vaddagiri
  2011-01-19 17:07   ` [PATCH 1/3] debugfs: Add support to print u32 array Srivatsa Vaddagiri
@ 2011-01-19 17:12   ` Srivatsa Vaddagiri
  2011-01-19 17:21     ` Peter Zijlstra
  2011-01-19 17:23     ` Srivatsa Vaddagiri
  2011-01-19 17:17   ` [PATCH 3/3] kvm guest : Add support for pv-ticketlocks Srivatsa Vaddagiri
  2 siblings, 2 replies; 20+ messages in thread
From: Srivatsa Vaddagiri @ 2011-01-19 17:12 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Nick Piggin,
	Mathieu Desnoyers, Américo Wang, Eric Dumazet, Jan Beulich,
	Avi Kivity, Xen-devel, H. Peter Anvin, Linux Virtualization,
	Jeremy Fitzhardinge, kvm, suzuki

Add two hypercalls to the KVM hypervisor to support pv-ticketlocks.

KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it
is woken up by an event such as an interrupt.

KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu.

The presence of these hypercalls is indicated to the guest via
KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK. Qemu needs a corresponding
patch to expose this feature to the guest via cpuid; that patch will be sent
separately.


Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>

---
 arch/x86/include/asm/kvm_para.h |    2 +
 arch/x86/kvm/x86.c              |   69 +++++++++++++++++++++++++++++++++++++++-
 include/linux/kvm.h             |    1 
 include/linux/kvm_host.h        |    1 
 include/linux/kvm_para.h        |    2 +
 virt/kvm/kvm_main.c             |    1 
 6 files changed, 75 insertions(+), 1 deletion(-)

Index: linux-2.6.37/arch/x86/include/asm/kvm_para.h
===================================================================
--- linux-2.6.37.orig/arch/x86/include/asm/kvm_para.h
+++ linux-2.6.37/arch/x86/include/asm/kvm_para.h
@@ -16,6 +16,8 @@
 #define KVM_FEATURE_CLOCKSOURCE		0
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
+#define KVM_FEATURE_WAIT_FOR_KICK	4
+
 /* This indicates that the new set of kvmclock msrs
  * are available. The use of 0x11 and 0x12 is deprecated
  */
Index: linux-2.6.37/arch/x86/kvm/x86.c
===================================================================
--- linux-2.6.37.orig/arch/x86/kvm/x86.c
+++ linux-2.6.37/arch/x86/kvm/x86.c
@@ -1922,6 +1922,7 @@ int kvm_dev_ioctl_check_extension(long e
 	case KVM_CAP_DEBUGREGS:
 	case KVM_CAP_X86_ROBUST_SINGLESTEP:
 	case KVM_CAP_XSAVE:
+	case KVM_CAP_WAIT_FOR_KICK:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -2340,7 +2341,8 @@ static void do_cpuid_ent(struct kvm_cpui
 		entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+			     (1 << KVM_FEATURE_WAIT_FOR_KICK);
 		entry->ebx = 0;
 		entry->ecx = 0;
 		entry->edx = 0;
@@ -4766,6 +4768,63 @@ int kvm_hv_hypercall(struct kvm_vcpu *vc
 	return 1;
 }
 
+/*
+ * kvm_pv_wait_for_kick_op : Block until kicked by either a KVM_HC_KICK_CPU
+ * hypercall or an event such as an interrupt.
+ *
+ * @vcpu : vcpu which is blocking.
+ */
+static void kvm_pv_wait_for_kick_op(struct kvm_vcpu *vcpu)
+{
+	DEFINE_WAIT(wait);
+
+	/*
+	 * Blocking on vcpu->wq allows us to wake up sooner if required to
+	 * service pending events (like interrupts).
+	 *
+	 * Also set state to TASK_INTERRUPTIBLE before checking vcpu->kicked to
+	 * avoid racing with kvm_pv_kick_cpu_op().
+	 */
+	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+
+	do {
+		/*
+		 * Somebody has already tried kicking us. Acknowledge that
+		 * and terminate the wait.
+		 */
+		if (vcpu->kicked) {
+			vcpu->kicked = 0;
+			break;
+		}
+
+	/* Let's wait for either KVM_HC_KICK_CPU or some other event
+		 * to wake us up.
+		 */
+
+		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
+		schedule();
+		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+	} while (0);
+
+	finish_wait(&vcpu->wq, &wait);
+}
+
+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @cpu - vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
+{
+	struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);
+
+	if (vcpu) {
+		vcpu->kicked = 1;
+		wake_up_interruptible(&vcpu->wq);
+	}
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -4802,6 +4861,14 @@ int kvm_emulate_hypercall(struct kvm_vcp
 	case KVM_HC_MMU_OP:
 		r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
 		break;
+	case KVM_HC_WAIT_FOR_KICK:
+		kvm_pv_wait_for_kick_op(vcpu);
+		ret = 0;
+		break;
+	case KVM_HC_KICK_CPU:
+		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
+		ret = 0;
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
Index: linux-2.6.37/include/linux/kvm.h
===================================================================
--- linux-2.6.37.orig/include/linux/kvm.h
+++ linux-2.6.37/include/linux/kvm.h
@@ -540,6 +540,7 @@ struct kvm_ppc_pvinfo {
 #endif
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
+#define KVM_CAP_WAIT_FOR_KICK 59
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
Index: linux-2.6.37/include/linux/kvm_host.h
===================================================================
--- linux-2.6.37.orig/include/linux/kvm_host.h
+++ linux-2.6.37/include/linux/kvm_host.h
@@ -105,6 +105,7 @@ struct kvm_vcpu {
 #endif
 
 	struct kvm_vcpu_arch arch;
+	int kicked;
 };
 
 /*
Index: linux-2.6.37/include/linux/kvm_para.h
===================================================================
--- linux-2.6.37.orig/include/linux/kvm_para.h
+++ linux-2.6.37/include/linux/kvm_para.h
@@ -19,6 +19,8 @@
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
+#define KVM_HC_WAIT_FOR_KICK		5
+#define KVM_HC_KICK_CPU			6
 
 /*
  * hypercalls use architecture specific
Index: linux-2.6.37/virt/kvm/kvm_main.c
===================================================================
--- linux-2.6.37.orig/virt/kvm/kvm_main.c
+++ linux-2.6.37/virt/kvm/kvm_main.c
@@ -185,6 +185,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu,
 	vcpu->cpu = -1;
 	vcpu->kvm = kvm;
 	vcpu->vcpu_id = id;
+	vcpu->kicked = 0;
 	init_waitqueue_head(&vcpu->wq);
 
 	page = alloc_page(GFP_KERNEL | __GFP_ZERO);

 


* [PATCH 3/3] kvm guest : Add support for pv-ticketlocks
  2011-01-19 16:44 ` [PATCH 00/14] PV ticket locks without expanding spinlock Srivatsa Vaddagiri
  2011-01-19 17:07   ` [PATCH 1/3] debugfs: Add support to print u32 array Srivatsa Vaddagiri
  2011-01-19 17:12   ` [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock Srivatsa Vaddagiri
@ 2011-01-19 17:17   ` Srivatsa Vaddagiri
  2 siblings, 0 replies; 20+ messages in thread
From: Srivatsa Vaddagiri @ 2011-01-19 17:17 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Nick Piggin,
	Mathieu Desnoyers, Américo Wang, Eric Dumazet, Jan Beulich,
	Avi Kivity, Xen-devel, H. Peter Anvin, Linux Virtualization,
	Jeremy Fitzhardinge, kvm, suzuki

This patch extends Linux guests running on the KVM hypervisor to support
pv-ticketlocks. Very early during bootup, the paravirtualized KVM guest detects
whether the hypervisor has the required feature (KVM_FEATURE_WAIT_FOR_KICK) to
support pv-ticketlocks. If so, support for pv-ticketlocks is registered via
pv_lock_ops.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>

---
 arch/x86/Kconfig                |    9 +
 arch/x86/include/asm/kvm_para.h |    8 +
 arch/x86/kernel/head64.c        |    3 
 arch/x86/kernel/kvm.c           |  208 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 228 insertions(+)

Index: linux-2.6.37/arch/x86/Kconfig
===================================================================
--- linux-2.6.37.orig/arch/x86/Kconfig
+++ linux-2.6.37/arch/x86/Kconfig
@@ -508,6 +508,15 @@ config KVM_GUEST
 	  This option enables various optimizations for running under the KVM
 	  hypervisor.
 
+config KVM_DEBUG_FS
+	bool "Enable debug information for KVM Guests in debugfs"
+	depends on KVM_GUEST
+	default n
+	---help---
	  This option enables collection of various statistics for KVM guests.
	  Statistics are displayed in the debugfs filesystem. Enabling this
	  option may incur significant overhead.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT
Index: linux-2.6.37/arch/x86/include/asm/kvm_para.h
===================================================================
--- linux-2.6.37.orig/arch/x86/include/asm/kvm_para.h
+++ linux-2.6.37/arch/x86/include/asm/kvm_para.h
@@ -162,8 +162,16 @@ static inline unsigned int kvm_arch_para
 
 #ifdef CONFIG_KVM_GUEST
 void __init kvm_guest_init(void);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_guest_early_init(void);
+#else
+#define kvm_guest_early_init() do { } while (0)
+#endif
+
 #else
 #define kvm_guest_init() do { } while (0)
+#define kvm_guest_early_init() do { } while (0)
 #endif
 
 #endif /* __KERNEL__ */
Index: linux-2.6.37/arch/x86/kernel/head64.c
===================================================================
--- linux-2.6.37.orig/arch/x86/kernel/head64.c
+++ linux-2.6.37/arch/x86/kernel/head64.c
@@ -13,6 +13,7 @@
 #include <linux/start_kernel.h>
 #include <linux/io.h>
 #include <linux/memblock.h>
+#include <linux/kvm_para.h>
 
 #include <asm/processor.h>
 #include <asm/proto.h>
@@ -118,6 +119,8 @@ void __init x86_64_start_reservations(ch
 
 	reserve_ebda_region();
 
+	kvm_guest_early_init();
+
 	/*
 	 * At this point everything still needed from the boot loader
 	 * or BIOS or kernel text should be early reserved or marked not
Index: linux-2.6.37/arch/x86/kernel/kvm.c
===================================================================
--- linux-2.6.37.orig/arch/x86/kernel/kvm.c
+++ linux-2.6.37/arch/x86/kernel/kvm.c
@@ -238,3 +238,211 @@ void __init kvm_guest_init(void)
 
 	paravirt_ops_setup();
 }
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+#ifdef CONFIG_KVM_DEBUG_FS
+
+#include <linux/debugfs.h>
+#include <linux/sched.h>
+
+static struct kvm_spinlock_stats
+{
+	u32 taken_slow;
+	u32 taken_slow_pickup;
+
+	u32 released_slow;
+	u32 released_slow_kicked;
+
+#define HISTO_BUCKETS	30
+	u32 histo_spin_blocked[HISTO_BUCKETS+1];
+
+	u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+	if (unlikely(zero_stats)) {
+		memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+		zero_stats = 0;
+	}
+}
+
+#define ADD_STATS(elem, val)			\
+	do { check_zero(); spinlock_stats.elem += (val); } while (0)
+
+static inline u64 spin_time_start(void)
+{
+	return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+	unsigned index = ilog2(delta);
+
+	check_zero();
+
+	if (index < HISTO_BUCKETS)
+		array[index]++;
+	else
+		array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+	u32 delta = sched_clock() - start;
+
+	__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+	spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+	d_kvm_debug = debugfs_create_dir("kvm", NULL);
+	if (!d_kvm_debug)
+		printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
+
+	return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+	struct dentry *d_kvm = kvm_init_debugfs();
+
+	if (d_kvm == NULL)
+		return -ENOMEM;
+
+	d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
+
+	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+			   &spinlock_stats.taken_slow);
+	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
+			   &spinlock_stats.taken_slow_pickup);
+
+	debugfs_create_u32("released_slow", 0444, d_spin_debug,
+			   &spinlock_stats.released_slow);
+	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
+			   &spinlock_stats.released_slow_kicked);
+
+	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
+			   &spinlock_stats.time_blocked);
+
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
+
+	return 0;
+}
+fs_initcall(kvm_spinlock_debugfs);
+#else  /* !CONFIG_KVM_DEBUG_FS */
+#define TIMEOUT			(1 << 10)
+#define ADD_STATS(elem, val)	do { (void)(val); } while (0)
+
+static inline u64 spin_time_start(void)
+{
+	return 0;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+}
+#endif  /* CONFIG_KVM_DEBUG_FS */
+
+struct kvm_lock_waiting {
+	struct arch_spinlock *lock;
+	__ticket_t want;
+};
+
+/* cpus 'waiting' on a spinlock to become available */
+static cpumask_t waiting_cpus;
+
+/* Track spinlock on which a cpu is waiting */
+static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
+
+static inline void kvm_wait_for_kick(void)
+{
+	kvm_hypercall0(KVM_HC_WAIT_FOR_KICK);
+}
+
+static void kvm_lock_spinning(struct arch_spinlock *lock, unsigned want)
+{
+	struct kvm_lock_waiting *w = &__get_cpu_var(lock_waiting);
+	int cpu = smp_processor_id();
+	u64 start;
+
+	start = spin_time_start();
+
+	w->want = want;
+	w->lock = lock;
+
+	ADD_STATS(taken_slow, 1);
+
+	cpumask_set_cpu(cpu, &waiting_cpus);
+
+	/* Mark entry to slowpath before doing the pickup test to make
+	   sure we don't deadlock with an unlocker. */
+	__ticket_enter_slowpath(lock);
+
+	/* check again make sure it didn't become free while
+	   we weren't looking  */
+	if (ACCESS_ONCE(lock->tickets.head) == want) {
+		ADD_STATS(taken_slow_pickup, 1);
+		goto out;
+	}
+
+	kvm_wait_for_kick();
+
+out:
+	cpumask_clear_cpu(cpu, &waiting_cpus);
+	w->lock = NULL;
+	spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick a cpu */
+static inline void kvm_kick_cpu(int cpu)
+{
+	kvm_hypercall1(KVM_HC_KICK_CPU, cpu);
+}
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, unsigned ticket)
+{
+	int cpu;
+
+	ADD_STATS(released_slow, 1);
+
+	for_each_cpu(cpu, &waiting_cpus) {
+		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+		if (w->lock == lock && w->want == ticket) {
+			ADD_STATS(released_slow_kicked, 1);
+			kvm_kick_cpu(cpu);
+			break;
+		}
+	}
+}
+
+/*
+ * Setup pv_lock_ops to exploit KVM_FEATURE_WAIT_FOR_KICK if present.
+ * This needs to be setup really early in boot, before the first call to
+ * spinlock is issued!
+ */
+void __init kvm_guest_early_init(void)
+{
+	if (!kvm_para_available())
+		return;
+
+	/* Does host kernel support KVM_FEATURE_WAIT_FOR_KICK? */
+	if (!kvm_para_has_feature(KVM_FEATURE_WAIT_FOR_KICK))
+		return;
+
+	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
+	pv_lock_ops.unlock_kick = kvm_unlock_kick;
+}
+#endif	/* CONFIG_PARAVIRT_SPINLOCKS */


* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-19 17:12   ` [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock Srivatsa Vaddagiri
@ 2011-01-19 17:21     ` Peter Zijlstra
  2011-01-19 18:29       ` Srivatsa Vaddagiri
  2011-01-19 18:53       ` Jeremy Fitzhardinge
  2011-01-19 17:23     ` Srivatsa Vaddagiri
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2011-01-19 17:21 UTC (permalink / raw)
  To: vatsa
  Cc: Xen-devel, Mathieu Desnoyers, Nick Piggin, Jeremy Fitzhardinge,
	Jan Beulich, Linux Kernel Mailing List, Jeremy Fitzhardinge, suzuki,
	Avi Kivity, kvm, H. Peter Anvin, Américo Wang, Eric Dumazet,
	Linux Virtualization

On Wed, 2011-01-19 at 22:42 +0530, Srivatsa Vaddagiri wrote:
> Add two hypercalls to the KVM hypervisor to support pv-ticketlocks.
> 
> KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it
> is woken up by an event such as an interrupt.
> 
> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu.
> 
> The presence of these hypercalls is indicated to the guest via
> KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK. Qemu needs a corresponding
> patch to expose this feature to the guest via cpuid; that patch will be sent
> separately.


I didn't really read the patch, and I totally forgot everything from
when I looked at the Xen series, but does the Xen/KVM hypercall
interface for this include the vcpu to await the kick from?

My guess is not, since the ticket locks used don't know who the owner
is, which is of course, sad. There are FIFO spinlock implementations
that can do this though.. although I think they all have a bigger memory
footprint.

The reason for wanting this should be clear I guess, it allows PI.


* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-19 17:12   ` [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock Srivatsa Vaddagiri
  2011-01-19 17:21     ` Peter Zijlstra
@ 2011-01-19 17:23     ` Srivatsa Vaddagiri
  2011-01-19 17:50       ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Srivatsa Vaddagiri @ 2011-01-19 17:23 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Nick Piggin,
	Mathieu Desnoyers, Américo Wang, Eric Dumazet, Jan Beulich,
	Avi Kivity, Xen-devel, H. Peter Anvin, Linux Virtualization,
	Jeremy Fitzhardinge, kvm, suzuki

On Wed, Jan 19, 2011 at 10:42:39PM +0530, Srivatsa Vaddagiri wrote:
> Add two hypercalls to the KVM hypervisor to support pv-ticketlocks.
> 
> KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it
> is woken up by an event such as an interrupt.

One possibility is to extend this hypercall to do a directed yield as well,
which needs some more thought. Another issue that needs to be resolved with
pv-ticketlocks is the impact on inter-VM fairness: a guest experiencing heavy
contention can keep yielding its cpu, allowing other VMs to get more time than
they deserve.

- vatsa


* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-19 17:23     ` Srivatsa Vaddagiri
@ 2011-01-19 17:50       ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2011-01-19 17:50 UTC (permalink / raw)
  To: vatsa
  Cc: Xen-devel, Mathieu Desnoyers, Nick Piggin, Jeremy Fitzhardinge,
	Jan Beulich, Linux Kernel Mailing List, Jeremy Fitzhardinge, suzuki,
	Avi Kivity, kvm, H. Peter Anvin, Américo Wang, Eric Dumazet,
	Linux Virtualization

On Wed, 2011-01-19 at 22:53 +0530, Srivatsa Vaddagiri wrote:
> On Wed, Jan 19, 2011 at 10:42:39PM +0530, Srivatsa Vaddagiri wrote:
> > Add two hypercalls to the KVM hypervisor to support pv-ticketlocks.
> > 
> > KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it
> > is woken up by an event such as an interrupt.
> 
> One possibility is to extend this hypercall to do a directed yield as well, 
> which needs some more thought. Another issue that needs to be resolved with
> pv-ticketlocks is the impact on intra-VM fairness. A guest experiencing heavy
> contention can keep yielding cpu, allowing other VMs to get more time than they
> deserve.

No, yield sucks, if you know the owner you can do actual PI.


* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-19 17:21     ` Peter Zijlstra
@ 2011-01-19 18:29       ` Srivatsa Vaddagiri
  2011-01-19 18:53       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 20+ messages in thread
From: Srivatsa Vaddagiri @ 2011-01-19 18:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jeremy Fitzhardinge, Xen-devel, Mathieu Desnoyers, Nick Piggin,
	kvm, Linux Kernel Mailing List, Jan Beulich, Eric Dumazet,
	Jeremy Fitzhardinge, suzuki, Avi Kivity, H. Peter Anvin,
	Américo Wang, Linux Virtualization

On Wed, Jan 19, 2011 at 06:21:12PM +0100, Peter Zijlstra wrote:
> I didn't really read the patch, and I totally forgot everything from
> when I looked at the Xen series, but does the Xen/KVM hypercall
> interface for this include the vcpu to await the kick from?

No not yet, for reasons you mention below.

> My guess is not, since the ticket locks used don't know who the owner
> is, which is of course, sad. There are FIFO spinlock implementations
> that can do this though.. although I think they all have a bigger memory
> footprint.
> 
> The reason for wanting this should be clear I guess, it allows PI.

Yeah - that would be good to experiment with.

- vatsa

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-19 17:21     ` Peter Zijlstra
  2011-01-19 18:29       ` Srivatsa Vaddagiri
@ 2011-01-19 18:53       ` Jeremy Fitzhardinge
  2011-01-20 11:42         ` Srivatsa Vaddagiri
  2011-01-20 11:59         ` Srivatsa Vaddagiri
  1 sibling, 2 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-19 18:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: vatsa, Linux Kernel Mailing List, Nick Piggin, Mathieu Desnoyers,
	Américo Wang, Eric Dumazet, Jan Beulich, Avi Kivity,
	Xen-devel, H. Peter Anvin, Linux Virtualization,
	Jeremy Fitzhardinge, kvm, suzuki

On 01/19/2011 09:21 AM, Peter Zijlstra wrote:
> On Wed, 2011-01-19 at 22:42 +0530, Srivatsa Vaddagiri wrote:
>> Add two hypercalls to KVM hypervisor to support pv-ticketlocks.
>>
>> KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it
>> is woken up because of an event like interrupt.
>>
>> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu.
>>
>> The presence of these hypercalls is indicated to guest via
>> KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK. Qemu needs a corresponding
>> patch to pass up the presence of this feature to guest via cpuid. Patch to qemu
>> will be sent separately.
>
> I didn't really read the patch, and I totally forgot everything from
> when I looked at the Xen series, but does the Xen/KVM hypercall
> interface for this include the vcpu to await the kick from?
>
> My guess is not, since the ticket locks used don't know who the owner
> is, which is of course, sad. There are FIFO spinlock implementations
> that can do this though.. although I think they all have a bigger memory
> footprint.

At least in the Xen code, a current owner isn't very useful, because we
need the current owner to kick the *next* owner to life at release time,
which we can't do without some structure recording which ticket belongs
to which cpu.

(A reminder: the big problem with ticket locks is not with the current
owner getting preempted, but making sure the next VCPU gets scheduled
quickly when the current one releases; without that, all the waiting
VCPUs burn their timeslices until the VCPU scheduler gets around to
scheduling the actual next in line.)

At present, the code needs to scan an array of percpu "I am waiting on
lock X with ticket Y" structures to work out who's next.  The search is
somewhat optimised by keeping a cpuset of which CPUs are actually
blocked on spinlocks, but it's still going to scale badly with lots of CPUs.
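The scan described above might be sketched like this (an illustrative userspace model, not the Xen code; NR_CPUS, the struct layout, and a plain bitmap standing in for a real cpumask are all assumptions):

```c
#include <stdint.h>

#define NR_CPUS 8

/* Each cpu publishes "I am waiting on lock X with ticket Y". */
struct spinning {
    void    *lock;      /* lock this cpu is blocked on, NULL if none */
    uint16_t ticket;    /* ticket it is waiting for */
};

static struct spinning waiting[NR_CPUS];
static unsigned long blocked_cpus;      /* bitmap of cpus blocked on locks */

static void announce_wait(int cpu, void *lock, uint16_t ticket)
{
    waiting[cpu] = (struct spinning){ lock, ticket };
    blocked_cpus |= 1UL << cpu;
}

/* Find which cpu holds ticket `next` on `lock`; -1 if nobody is blocked
 * on it (the next owner may still be spinning in the fast path). */
static int find_next_owner(void *lock, uint16_t next)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!(blocked_cpus & (1UL << cpu)))
            continue;               /* skip cpus not blocked at all */
        if (waiting[cpu].lock == lock && waiting[cpu].ticket == next)
            return cpu;
    }
    return -1;
}
```

With this layout the release path costs O(NR_CPUS) per contended unlock, which is the scaling concern noted above.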

I haven't thought of a good way to improve on this; an obvious approach
is to just add a pointer to the spinlock and hang an explicit linked
list off it, but that's incompatible with wanting to avoid expanding the
lock.

You could have a table of auxiliary per-lock data hashed on the lock
address, but it's not clear to me that it's an improvement on the array
approach, especially given the synchronization issues of keeping that
structure up to date (do we have a generic lockless hashtable
implementation?).  But perhaps it's one of those things that makes sense
at larger scales.
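A minimal sketch of what such a hashed auxiliary table could look like (illustrative only: open addressing, the table size, and the owner_cpu field are assumptions, and the synchronization issues mentioned above are ignored):

```c
#include <stdint.h>
#include <stddef.h>

#define AUX_TABLE_SIZE 64   /* power of two, illustrative */

/* Per-lock auxiliary data lives here, so the lock itself need not grow. */
struct lock_aux {
    void *lock;             /* NULL means the slot is free */
    int   owner_cpu;        /* example auxiliary field */
};

static struct lock_aux aux_table[AUX_TABLE_SIZE];

static size_t aux_hash(void *lock)
{
    /* Shift off low bits that are zero for aligned locks, then mask. */
    return ((uintptr_t)lock >> 4) & (AUX_TABLE_SIZE - 1);
}

/* Linear-probe lookup keyed on the lock's address; optionally claim a
 * free slot for a newly contended lock. */
static struct lock_aux *aux_lookup(void *lock, int create)
{
    size_t h = aux_hash(lock);

    for (size_t i = 0; i < AUX_TABLE_SIZE; i++) {
        struct lock_aux *slot = &aux_table[(h + i) & (AUX_TABLE_SIZE - 1)];
        if (slot->lock == lock)
            return slot;
        if (slot->lock == NULL)
            return create ? (slot->lock = lock, slot) : NULL;
    }
    return NULL;            /* table full */
}
```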

> The reason for wanting this should be clear I guess, it allows PI.

Well, if we can expand the spinlock to include an owner, then all this
becomes moot...

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-19 18:53       ` Jeremy Fitzhardinge
@ 2011-01-20 11:42         ` Srivatsa Vaddagiri
  2011-01-20 17:49           ` Jeremy Fitzhardinge
  2011-01-20 11:59         ` Srivatsa Vaddagiri
  1 sibling, 1 reply; 20+ messages in thread
From: Srivatsa Vaddagiri @ 2011-01-20 11:42 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Nick Piggin,
	Mathieu Desnoyers, Américo Wang, Eric Dumazet, Jan Beulich,
	Avi Kivity, Xen-devel, H. Peter Anvin, Linux Virtualization,
	Jeremy Fitzhardinge, kvm, suzuki

On Wed, Jan 19, 2011 at 10:53:52AM -0800, Jeremy Fitzhardinge wrote:
> > The reason for wanting this should be clear I guess, it allows PI.
> 
> Well, if we can expand the spinlock to include an owner, then all this
> becomes moot...

How so? Having an owner will not eliminate the need for pv-ticketlocks
afaict. We still need a mechanism to reduce latency in scheduling the next
thread-in-waiting, which is achieved by your patches. 

- vatsa

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-19 18:53       ` Jeremy Fitzhardinge
  2011-01-20 11:42         ` Srivatsa Vaddagiri
@ 2011-01-20 11:59         ` Srivatsa Vaddagiri
  2011-01-20 13:41           ` Peter Zijlstra
  2011-01-20 17:56           ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 20+ messages in thread
From: Srivatsa Vaddagiri @ 2011-01-20 11:59 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Nick Piggin,
	Mathieu Desnoyers, Américo Wang, Eric Dumazet, Jan Beulich,
	Avi Kivity, Xen-devel, H. Peter Anvin, Linux Virtualization,
	Jeremy Fitzhardinge, kvm, suzuki

On Wed, Jan 19, 2011 at 10:53:52AM -0800, Jeremy Fitzhardinge wrote:
> > I didn't really read the patch, and I totally forgot everything from
> > when I looked at the Xen series, but does the Xen/KVM hypercall
> > interface for this include the vcpu to await the kick from?
> >
> > My guess is not, since the ticket locks used don't know who the owner
> > is, which is of course, sad. There are FIFO spinlock implementations
> > that can do this though.. although I think they all have a bigger memory
> > footprint.
> 
> At least in the Xen code, a current owner isn't very useful, because we
> need the current owner to kick the *next* owner to life at release time,
> which we can't do without some structure recording which ticket belongs
> to which cpu.

If we had a yield-to [1] sort of interface _and_ information on which vcpu
owns a lock, then lock-spinners can yield-to the owning vcpu, while the
unlocking vcpu can yield-to the next-vcpu-in-waiting. The key here is not to
sleep when waiting for locks (as implemented by current patch-series, which can 
put other VMs at an advantage by giving them more time than they are entitled 
to) and also to ensure that lock-owner as well as the next-in-line lock-owner 
are not unduly made to wait for cpu. 

Is there a way we can dynamically expand the size of the lock only upon contention
to include additional information like the owning vcpu? Have the lock point to a
per-cpu area upon contention where additional details can be stored, perhaps?

1. https://lkml.org/lkml/2011/1/14/44

- vatsa

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-20 11:59         ` Srivatsa Vaddagiri
@ 2011-01-20 13:41           ` Peter Zijlstra
  2011-01-20 14:34             ` Srivatsa Vaddagiri
  2011-01-20 17:56           ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2011-01-20 13:41 UTC (permalink / raw)
  To: vatsa
  Cc: Xen-devel, Mathieu Desnoyers, Nick Piggin, Jeremy Fitzhardinge, Jan Beulich,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, suzuki, Avi Kivity, kvm,
	H. Peter Anvin, Américo Wang, Eric Dumazet, Linux Virtualization

On Thu, 2011-01-20 at 17:29 +0530, Srivatsa Vaddagiri wrote:
> 
> If we had a yield-to [1] sort of interface _and_ information on which vcpu
> owns a lock, then lock-spinners can yield-to the owning vcpu, 

and then I'd nak it for being stupid ;-)

really, yield*() is retarded, never even consider using it. If you've
got the actual owner you can do full blown PI, which is tons better than
a 'do-something-random' call.

The only reason the whole non-virt pause loop filtering muck uses it is
because it really doesn't know anything, and do-something is pretty much
all it can do. Its a broken interface.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-20 13:41           ` Peter Zijlstra
@ 2011-01-20 14:34             ` Srivatsa Vaddagiri
  0 siblings, 0 replies; 20+ messages in thread
From: Srivatsa Vaddagiri @ 2011-01-20 14:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jeremy Fitzhardinge, Linux Kernel Mailing List, Nick Piggin,
	Mathieu Desnoyers, Américo Wang, Eric Dumazet, Jan Beulich,
	Avi Kivity, Xen-devel, H. Peter Anvin, Linux Virtualization,
	Jeremy Fitzhardinge, kvm, suzuki

On Thu, Jan 20, 2011 at 02:41:46PM +0100, Peter Zijlstra wrote:
> On Thu, 2011-01-20 at 17:29 +0530, Srivatsa Vaddagiri wrote:
> > 
> > If we had a yield-to [1] sort of interface _and_ information on which vcpu
> > owns a lock, then lock-spinners can yield-to the owning vcpu, 
> 
> and then I'd nak it for being stupid ;-)
> 
> really, yield*() is retarded, never even consider using it. If you've
> got the actual owner you can do full blown PI, which is tons better than
> a 'do-something-random' call.

Yes definitely that would be much better than yield-to.

> The only reason the whole non-virt pause loop filtering muck uses it is
> because it really doesn't know anything, and do-something is pretty much
> all it can do. Its a broken interface.

- vatsa

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-20 11:42         ` Srivatsa Vaddagiri
@ 2011-01-20 17:49           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-20 17:49 UTC (permalink / raw)
  To: vatsa
  Cc: Xen-devel, Mathieu Desnoyers, Nick Piggin, kvm, Peter Zijlstra,
	Linux Kernel Mailing List, Jan Beulich, Eric Dumazet,
	Jeremy Fitzhardinge, suzuki, Avi Kivity, H. Peter Anvin,
	Américo Wang, Linux Virtualization

On 01/20/2011 03:42 AM, Srivatsa Vaddagiri wrote:
> On Wed, Jan 19, 2011 at 10:53:52AM -0800, Jeremy Fitzhardinge wrote:
>>> The reason for wanting this should be clear I guess, it allows PI.
>> Well, if we can expand the spinlock to include an owner, then all this
>> becomes moot...
> How so? Having an owner will not eliminate the need for pv-ticketlocks
> afaict. We still need a mechanism to reduce latency in scheduling the next
> thread-in-waiting, which is achieved by your patches. 

No, sorry, I should have been clearer.  I meant that the effort of not
increasing the lock size to record "in slowpath" state becomes moot.

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-20 11:59         ` Srivatsa Vaddagiri
  2011-01-20 13:41           ` Peter Zijlstra
@ 2011-01-20 17:56           ` Jeremy Fitzhardinge
  2011-01-21 14:02             ` Srivatsa Vaddagiri
  1 sibling, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-20 17:56 UTC (permalink / raw)
  To: vatsa
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Nick Piggin,
	Mathieu Desnoyers, Américo Wang, Eric Dumazet, Jan Beulich,
	Avi Kivity, Xen-devel, H. Peter Anvin, Linux Virtualization,
	Jeremy Fitzhardinge, kvm, suzuki

On 01/20/2011 03:59 AM, Srivatsa Vaddagiri wrote:
>> At least in the Xen code, a current owner isn't very useful, because we
>> need the current owner to kick the *next* owner to life at release time,
>> which we can't do without some structure recording which ticket belongs
>> to which cpu.
> If we had a yield-to [1] sort of interface _and_ information on which vcpu
> owns a lock, then lock-spinners can yield-to the owning vcpu, while the
> unlocking vcpu can yield-to the next-vcpu-in-waiting.

Perhaps, but the core problem is how to find "next-vcpu-in-waiting"
efficiently.  Once you have that info, there's a number of things you
can usefully do with it.

>  The key here is not to
> sleep when waiting for locks (as implemented by current patch-series, which can 
> put other VMs at an advantage by giving them more time than they are entitled 
> to)

Why?  If a VCPU can't make progress because it's waiting for some
resource, then why not schedule something else instead?  Presumably when
the VCPU does become runnable, the scheduler will credit its previous
blocked state and let it run in preference to something else.

>  and also to ensure that lock-owner as well as the next-in-line lock-owner 
> are not unduly made to wait for cpu. 
>
> Is there a way we can dynamically expand the size of the lock only upon contention
> to include additional information like the owning vcpu? Have the lock point to a
> per-cpu area upon contention where additional details can be stored, perhaps?

As soon as you add a pointer to the lock, you're increasing its size. 
If we had a pointer in there already, then all of this would be moot.

If auxiliary per-lock data is uncommon, then using a hash keyed on the lock
address would be one way to do it.

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-20 17:56           ` Jeremy Fitzhardinge
@ 2011-01-21 14:02             ` Srivatsa Vaddagiri
  2011-01-21 14:48               ` Rik van Riel
  0 siblings, 1 reply; 20+ messages in thread
From: Srivatsa Vaddagiri @ 2011-01-21 14:02 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Nick Piggin,
	Mathieu Desnoyers, Américo Wang, Eric Dumazet, Jan Beulich,
	Avi Kivity, Xen-devel, H. Peter Anvin, Linux Virtualization,
	Jeremy Fitzhardinge, kvm, suzuki

On Thu, Jan 20, 2011 at 09:56:27AM -0800, Jeremy Fitzhardinge wrote:
> >  The key here is not to
> > sleep when waiting for locks (as implemented by current patch-series, which can 
> > put other VMs at an advantage by giving them more time than they are entitled 
> > to)
> 
> Why?  If a VCPU can't make progress because it's waiting for some
> resource, then why not schedule something else instead?

In the process, "something else" can get more share of cpu resource than it's
entitled to, and that's where I was a bit concerned. I guess one could
employ hard-limits to cap "something else's" bandwidth where it is of real 
concern (like clouds).

> Presumably when
> the VCPU does become runnable, the scheduler will credit its previous
> blocked state and let it run in preference to something else.

which may not be sufficient for it to gain back bandwidth lost while blocked
(speaking of the mainline scheduler, at least).

> > Is there a way we can dynamically expand the size of the lock only upon contention
> > to include additional information like the owning vcpu? Have the lock point to a
> > per-cpu area upon contention where additional details can be stored, perhaps?
> 
> As soon as you add a pointer to the lock, you're increasing its size. 

I didn't really mean to expand the size statically. Rather, have some bits of the
lock word store a pointer to a per-cpu area when there is contention (somewhat
similar to how bits of rt_mutex.owner are used). I haven't thought through this in
detail to see if that is possible though.
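The rt_mutex.owner trick alluded to here relies on word alignment leaving the low bits of a stored pointer free to carry flags; a minimal model (the flag name and helpers are illustrative, not kernel code):

```c
#include <stdint.h>

#define LOCK_HAS_WAITERS 1UL    /* low bit of an aligned pointer is free */

/* Pack an owner pointer and a contention flag into one lock word. */
static uintptr_t lock_word_encode(void *owner, int has_waiters)
{
    return (uintptr_t)owner | (has_waiters ? LOCK_HAS_WAITERS : 0);
}

/* Recover the owner by masking the flag bits back off. */
static void *lock_word_owner(uintptr_t word)
{
    return (void *)(word & ~LOCK_HAS_WAITERS);
}

static int lock_word_contended(uintptr_t word)
{
    return word & LOCK_HAS_WAITERS;
}
```

The open question in the thread is whether a ticket-lock word, which today holds only head/tail counters, can spare enough bits for something like this without growing.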

- vatsa

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-21 14:02             ` Srivatsa Vaddagiri
@ 2011-01-21 14:48               ` Rik van Riel
  2011-01-22  6:14                 ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 20+ messages in thread
From: Rik van Riel @ 2011-01-21 14:48 UTC (permalink / raw)
  To: vatsa
  Cc: Jeremy Fitzhardinge, Xen-devel, Nick Piggin, kvm, Peter Zijlstra,
	H. Peter Anvin, Linux Kernel Mailing List, Jan Beulich,
	Eric Dumazet, Jeremy Fitzhardinge, suzuki, Avi Kivity, Américo Wang,
	Mathieu Desnoyers, Linux Virtualization

On 01/21/2011 09:02 AM, Srivatsa Vaddagiri wrote:
> On Thu, Jan 20, 2011 at 09:56:27AM -0800, Jeremy Fitzhardinge wrote:
>>>   The key here is not to
>>> sleep when waiting for locks (as implemented by current patch-series, which can
>>> put other VMs at an advantage by giving them more time than they are entitled
>>> to)
>>
>> Why?  If a VCPU can't make progress because it's waiting for some
>> resource, then why not schedule something else instead?
>
> In the process, "something else" can get more share of cpu resource than it's
> entitled to, and that's where I was a bit concerned. I guess one could
> employ hard-limits to cap "something else's" bandwidth where it is of real
> concern (like clouds).

I'd like to think I fixed those things in my yield_task_fair +
yield_to + kvm_vcpu_on_spin patch series from yesterday.

https://lkml.org/lkml/2011/1/20/403
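For readers unfamiliar with that series, the vcpu selection it relies on can be modelled roughly as follows (a simplified userspace sketch, not the actual kernel code; the fields and the rotation policy are only an approximation of what kvm_vcpu_on_spin + yield_to do):

```c
#define NR_VCPUS 4

struct vcpu {
    int runnable;   /* has work to do */
    int running;    /* currently on a physical cpu */
};

static struct vcpu vcpus[NR_VCPUS];
static int last_boosted;    /* rotate the starting point for fairness */

/* When a vcpu is caught pause-looping, pick a sibling vcpu that is
 * runnable but preempted and donate the spinner's timeslice to it.
 * Returns the chosen vcpu index, or -1 if there is no candidate. */
static int pick_yield_target(int spinner)
{
    for (int i = 1; i <= NR_VCPUS; i++) {
        int v = (last_boosted + i) % NR_VCPUS;

        if (v == spinner)
            continue;
        if (vcpus[v].runnable && !vcpus[v].running) {
            last_boosted = v;
            return v;       /* yield_to() would boost this vcpu */
        }
    }
    return -1;              /* nobody to help; keep spinning */
}
```

Because the spinner has no idea who holds the lock, the pick is a guess; that is exactly the gap the pv-ticketlock approach (which knows the next ticket holder) aims to close.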

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-21 14:48               ` Rik van Riel
@ 2011-01-22  6:14                 ` Srivatsa Vaddagiri
  2011-01-22 14:53                   ` Rik van Riel
  0 siblings, 1 reply; 20+ messages in thread
From: Srivatsa Vaddagiri @ 2011-01-22  6:14 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Jeremy Fitzhardinge, Xen-devel, Mathieu Desnoyers, Nick Piggin,
	kvm, Peter Zijlstra, Linux Kernel Mailing List, Jan Beulich,
	Eric Dumazet, Jeremy Fitzhardinge, suzuki, Avi Kivity,
	H. Peter Anvin, Américo Wang, Linux Virtualization

On Fri, Jan 21, 2011 at 09:48:29AM -0500, Rik van Riel wrote:
> >>Why?  If a VCPU can't make progress because it's waiting for some
> >>resource, then why not schedule something else instead?
> >
> >In the process, "something else" can get more share of cpu resource than it's
> >entitled to, and that's where I was a bit concerned. I guess one could
> >employ hard-limits to cap "something else's" bandwidth where it is of real
> >concern (like clouds).
> 
> I'd like to think I fixed those things in my yield_task_fair +
> yield_to + kvm_vcpu_on_spin patch series from yesterday.

Speaking of the spinlock-in-virtualized-environment problem as a whole, IMHO
I don't think that kvm_vcpu_on_spin + yield changes will provide the best
results, especially where ticketlocks are involved and they are paravirtualized 
in the manner being discussed in this thread. An important focus of pv-ticketlocks
is to reduce the lock _acquisition_ time by ensuring that the next-in-line 
vcpu gets to run asap when a ticket lock is released. With the way 
kvm_vcpu_on_spin+yield_to is implemented, I don't see how we can provide the 
best lock acquisition times for threads. It would be nice though to compare 
the two approaches (kvm_vcpu_on_spin optimization and the pv-ticketlock scheme) 
to get some real-world numbers. Unfortunately I don't have access to PLE-capable
hardware, which is required to test your kvm_vcpu_on_spin changes.

Also it may be possible for the pv-ticketlocks to track the owning vcpu and make use
of a yield-to interface as a further optimization to avoid the
"others-get-more-time" problem, but Peterz rightly pointed out that PI would be a
better solution there than yield-to. So overall IMO kvm_vcpu_on_spin+yield_to
could be the best solution for unmodified guests, while paravirtualized
ticketlocks + some sort of PI would be a better solution where we have the
luxury of modifying guest sources!

- vatsa

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-22  6:14                 ` Srivatsa Vaddagiri
@ 2011-01-22 14:53                   ` Rik van Riel
  2011-01-24 17:49                     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 20+ messages in thread
From: Rik van Riel @ 2011-01-22 14:53 UTC (permalink / raw)
  To: vatsa
  Cc: Jeremy Fitzhardinge, Peter Zijlstra, Linux Kernel Mailing List,
	Nick Piggin, Mathieu Desnoyers, Américo Wang, Eric Dumazet,
	Jan Beulich, Avi Kivity, Xen-devel, H. Peter Anvin,
	Linux Virtualization, Jeremy Fitzhardinge, kvm, suzuki

On 01/22/2011 01:14 AM, Srivatsa Vaddagiri wrote:

> Also it may be possible for the pv-ticketlocks to track the owning vcpu and make use
> of a yield-to interface as a further optimization to avoid the
> "others-get-more-time" problem, but Peterz rightly pointed out that PI would be a
> better solution there than yield-to. So overall IMO kvm_vcpu_on_spin+yield_to
> could be the best solution for unmodified guests, while paravirtualized
> ticketlocks + some sort of PI would be a better solution where we have the
> luxury of modifying guest sources!

Agreed, for unmodified guests (which is what people will mostly be
running for the next couple of years), we have little choice but
to use PLE + kvm_vcpu_on_spin + yield_to.

The main question that remains is whether the PV ticketlocks are
a large enough improvement to also merge those.  I expect they
will be, and we'll see so in the benchmark numbers.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock
  2011-01-22 14:53                   ` Rik van Riel
@ 2011-01-24 17:49                     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 17:49 UTC (permalink / raw)
  To: Rik van Riel
  Cc: vatsa, Peter Zijlstra, Linux Kernel Mailing List, Nick Piggin,
	Mathieu Desnoyers, Américo Wang, Eric Dumazet, Jan Beulich,
	Avi Kivity, Xen-devel, H. Peter Anvin, Linux Virtualization,
	Jeremy Fitzhardinge, kvm, suzuki

On 01/22/2011 06:53 AM, Rik van Riel wrote:
> The main question that remains is whether the PV ticketlocks are
> a large enough improvement to also merge those.  I expect they
> will be, and we'll see so in the benchmark numbers.

The pathological worst-case of ticket locks in a virtual environment
isn't very hard to hit, so I expect they'll make a huge difference
there.   On lightly loaded systems (little or no CPU overcommit) then
they should be close to a no-op.

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2011-01-24 17:50 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <cover.1289940821.git.jeremy.fitzhardinge@citrix.com>
2011-01-19 16:44 ` [PATCH 00/14] PV ticket locks without expanding spinlock Srivatsa Vaddagiri
2011-01-19 17:07   ` [PATCH 1/3] debugfs: Add support to print u32 array Srivatsa Vaddagiri
2011-01-19 17:12   ` [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock Srivatsa Vaddagiri
2011-01-19 17:21     ` Peter Zijlstra
2011-01-19 18:29       ` Srivatsa Vaddagiri
2011-01-19 18:53       ` Jeremy Fitzhardinge
2011-01-20 11:42         ` Srivatsa Vaddagiri
2011-01-20 17:49           ` Jeremy Fitzhardinge
2011-01-20 11:59         ` Srivatsa Vaddagiri
2011-01-20 13:41           ` Peter Zijlstra
2011-01-20 14:34             ` Srivatsa Vaddagiri
2011-01-20 17:56           ` Jeremy Fitzhardinge
2011-01-21 14:02             ` Srivatsa Vaddagiri
2011-01-21 14:48               ` Rik van Riel
2011-01-22  6:14                 ` Srivatsa Vaddagiri
2011-01-22 14:53                   ` Rik van Riel
2011-01-24 17:49                     ` Jeremy Fitzhardinge
2011-01-19 17:23     ` Srivatsa Vaddagiri
2011-01-19 17:50       ` Peter Zijlstra
2011-01-19 17:17   ` [PATCH 3/3] kvm guest : Add support for pv-ticketlocks Srivatsa Vaddagiri

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox