Linux virtualization list
* [PATCH RFC V5 3/6] kvm : Add unhalt msr to aid (live) migration
From: Raghavendra K T @ 2012-03-23  8:07 UTC
  To: Ingo Molnar, H. Peter Anvin
  Cc: Jeremy Fitzhardinge, X86, KVM, Konrad Rzeszutek Wilk, LKML,
	Greg Kroah-Hartman, linux-doc, Xen, Avi Kivity,
	Srivatsa Vaddagiri, Virtualization, Stefano Stabellini,
	Sasha Levin
In-Reply-To: <20120323080503.14568.43092.sendpatchset@codeblue>

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

Currently the guest does not need to know the pv_unhalt state; the MSR is
intended to be used via the GET/SET_MSR ioctls during migration.

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9234f13..46f9751 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -40,6 +40,7 @@
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
+#define MSR_KVM_PV_UNHALT   0x4b564d04
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bd5ef91..38e6c47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -784,12 +784,13 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
  * kvm-specific. Those are put in the beginning of the list.
  */
 
-#define KVM_SAVE_MSRS_BEGIN	9
+#define KVM_SAVE_MSRS_BEGIN	10
 static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
 	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+	MSR_KVM_PV_UNHALT,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 	MSR_STAR,
 #ifdef CONFIG_X86_64
@@ -1606,7 +1607,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
 
 		break;
-
+	case MSR_KVM_PV_UNHALT:
+		vcpu->pv_unhalted = (u32) data;
+		break;
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
@@ -1917,6 +1920,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_STEAL_TIME:
 		data = vcpu->arch.st.msr_val;
 		break;
+	case MSR_KVM_PV_UNHALT:
+		data = (u64)vcpu->pv_unhalted;
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
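
The guest never reads MSR_KVM_PV_UNHALT itself, so its only consumer is the VMM. During migration, userspace would read the MSR on the source vcpu with KVM_GET_MSRS and write it on the destination vcpu with KVM_SET_MSRS. Adding the index to msrs_to_save is what makes it appear in KVM_GET_MSR_INDEX_LIST. The sketch below is a plain-C model of that round trip under stated assumptions, not code from this patch. `struct vcpu_model`, `model_get_msr()`, `model_set_msr()` and `migrate_pv_unhalt()` are hypothetical names, and the ioctl plumbing is left out. The model mirrors the `(u32)` truncation on write and the `(u64)` widening on read from the x86.c hunks above.

```c
/* Hypothetical userspace model of MSR_KVM_PV_UNHALT save/restore; not KVM code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR index from the kvm_para.h hunk above. */
#define MSR_KVM_PV_UNHALT 0x4b564d04u

/* Hypothetical stand-in for the per-vcpu field the patch saves/restores. */
struct vcpu_model {
	uint32_t pv_unhalted;
};

/* Mirrors the kvm_set_msr_common() case: only the low 32 bits are kept. */
static bool model_set_msr(struct vcpu_model *v, uint32_t msr, uint64_t data)
{
	switch (msr) {
	case MSR_KVM_PV_UNHALT:
		v->pv_unhalted = (uint32_t)data;
		return true;
	default:
		return false;	/* any other MSR is outside this model */
	}
}

/* Mirrors the kvm_get_msr_common() case: the flag is widened to 64 bits. */
static bool model_get_msr(const struct vcpu_model *v, uint32_t msr,
			  uint64_t *pdata)
{
	switch (msr) {
	case MSR_KVM_PV_UNHALT:
		*pdata = (uint64_t)v->pv_unhalted;
		return true;
	default:
		return false;
	}
}

/* The migration step: read on the source vcpu, write on the destination. */
static bool migrate_pv_unhalt(const struct vcpu_model *src,
			      struct vcpu_model *dst)
{
	uint64_t val;

	assert(src != dst);
	if (!model_get_msr(src, MSR_KVM_PV_UNHALT, &val))
		return false;
	return model_set_msr(dst, MSR_KVM_PV_UNHALT, val);
}
```

For example, a source vcpu with `pv_unhalted == 1` yields a destination vcpu with the same flag. A 64-bit write of `0x100000001` reads back as `1`, because only the low 32 bits are stored.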


* [PATCH RFC V5 4/6] kvm guest : Added configuration support to enable debug information for KVM Guests
From: Raghavendra K T @ 2012-03-23  8:08 UTC
  To: Ingo Molnar, H. Peter Anvin
  Cc: Jeremy Fitzhardinge, X86, KVM, Konrad Rzeszutek Wilk, LKML,
	Greg Kroah-Hartman, linux-doc, Xen, Avi Kivity,
	Srivatsa Vaddagiri, Virtualization, Stefano Stabellini,
	Sasha Levin
In-Reply-To: <20120323080503.14568.43092.sendpatchset@codeblue>

From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 10c28ec..a4530bd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -600,6 +600,15 @@ config KVM_GUEST
 	  This option enables various optimizations for running under the KVM
 	  hypervisor.
 
+config KVM_DEBUG_FS
+	bool "Enable debug information for KVM Guests in debugfs"
+	depends on KVM_GUEST && DEBUG_FS
+	default n
+	---help---
+	  This option enables collection of various statistics for KVM guests.
+	  Statistics are displayed in the debugfs filesystem. Enabling this option
+	  may incur significant overhead.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT


* [PATCH RFC V5 5/6] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
From: Raghavendra K T @ 2012-03-23  8:08 UTC
  To: Ingo Molnar, H. Peter Anvin
  Cc: Jeremy Fitzhardinge, X86, KVM, Konrad Rzeszutek Wilk, LKML,
	Greg Kroah-Hartman, linux-doc, Xen, Avi Kivity,
	Srivatsa Vaddagiri, Virtualization, Stefano Stabellini,
	Sasha Levin
In-Reply-To: <20120323080503.14568.43092.sendpatchset@codeblue>

From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

During smp_boot_cpus, a paravirtualized KVM guest detects whether the hypervisor
has the required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If
so, support for pv-ticketlocks is registered via pv_lock_ops.

The KVM_HC_KICK_CPU hypercall is used to wake up a waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 46f9751..2888c45 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -197,10 +197,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
 	return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index f0c6fd6..c535422 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -33,6 +33,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -364,6 +365,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 #endif
 	kvm_guest_cpu_init();
 	native_smp_prepare_boot_cpu();
+	kvm_spinlock_init();
 }
 
 static void __cpuinit kvm_guest_cpu_online(void *dummy)
@@ -446,3 +448,255 @@ static __init int activate_jump_labels(void)
 	return 0;
 }
 arch_initcall(activate_jump_labels);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+	TAKEN_SLOW,
+	TAKEN_SLOW_PICKUP,
+	RELEASED_SLOW,
+	RELEASED_SLOW_KICKED,
+	NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS	30
+
+static struct kvm_spinlock_stats
+{
+	u32 contention_stats[NR_CONTENTION_STATS];
+	u32 histo_spin_blocked[HISTO_BUCKETS+1];
+	u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+	u8 ret;
+	u8 old;
+
+	old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+	}
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+	check_zero();
+	spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+	return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+	unsigned index;
+
+	index = ilog2(delta);
+	check_zero();
+
+	if (index < HISTO_BUCKETS)
+		array[index]++;
+	else
+		array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+	u64 delta;
+
+	delta = sched_clock() - start;
+	__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+	spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+	d_kvm_debug = debugfs_create_dir("kvm", NULL);
+	if (!d_kvm_debug)
+		printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
+
+	return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+	struct dentry *d_kvm;
+
+	d_kvm = kvm_init_debugfs();
+	if (d_kvm == NULL)
+		return -ENOMEM;
+
+	d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
+
+	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW]);
+	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
+
+	debugfs_create_u32("released_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW]);
+	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
+
+	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
+			   &spinlock_stats.time_blocked);
+
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
+
+	return 0;
+}
+fs_initcall(kvm_spinlock_debugfs);
+#else  /* !CONFIG_KVM_DEBUG_FS */
+#define TIMEOUT			(1 << 10)
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+}
+
+static inline u64 spin_time_start(void)
+{
+	return 0;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+}
+#endif  /* CONFIG_KVM_DEBUG_FS */
+
+struct kvm_lock_waiting {
+	struct arch_spinlock *lock;
+	__ticket_t want;
+};
+
+/* cpus 'waiting' on a spinlock to become available */
+static cpumask_t waiting_cpus;
+
+/* Track spinlock on which a cpu is waiting */
+static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
+
+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
+{
+	struct kvm_lock_waiting *w;
+	int cpu;
+	u64 start;
+	unsigned long flags;
+
+	w = &__get_cpu_var(lock_waiting);
+	cpu = smp_processor_id();
+	start = spin_time_start();
+
+	/*
+	 * Make sure an interrupt handler can't upset things in a
+	 * partially setup state.
+	 */
+	local_irq_save(flags);
+
+	/*
+	 * The ordering protocol on this is that the "lock" pointer
+	 * may only be set non-NULL if the "want" ticket is correct.
+	 * If we're updating "want", we must first clear "lock".
+	 */
+	w->lock = NULL;
+	smp_wmb();
+	w->want = want;
+	smp_wmb();
+	w->lock = lock;
+
+	add_stats(TAKEN_SLOW, 1);
+
+	/*
+	 * This uses set_bit, which is atomic, but we should not rely on its
+	 * reordering guarantees, so a barrier is needed after this call.
+	 */
+	cpumask_set_cpu(cpu, &waiting_cpus);
+
+	barrier();
+
+	/*
+	 * Mark entry to slowpath before doing the pickup test to make
+	 * sure we don't deadlock with an unlocker.
+	 */
+	__ticket_enter_slowpath(lock);
+
+	/*
+	 * Check again to make sure it didn't become free while
+	 * we weren't looking.
+	 */
+	if (ACCESS_ONCE(lock->tickets.head) == want) {
+		add_stats(TAKEN_SLOW_PICKUP, 1);
+		goto out;
+	}
+
+	/* Allow interrupts while blocked */
+	local_irq_restore(flags);
+
+	/* halt until it's our turn and kicked. */
+	halt();
+
+	local_irq_save(flags);
+out:
+	cpumask_clear_cpu(cpu, &waiting_cpus);
+	w->lock = NULL;
+	local_irq_restore(flags);
+	spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick a cpu by its apicid */
+static inline void kvm_kick_cpu(int cpu)
+{
+	int apicid;
+
+	apicid = per_cpu(x86_cpu_to_apicid, cpu);
+	kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
+}
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+	int cpu;
+
+	add_stats(RELEASED_SLOW, 1);
+	for_each_cpu(cpu, &waiting_cpus) {
+		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+		if (ACCESS_ONCE(w->lock) == lock &&
+		    ACCESS_ONCE(w->want) == ticket) {
+			add_stats(RELEASED_SLOW_KICKED, 1);
+			kvm_kick_cpu(cpu);
+			break;
+		}
+	}
+}
+
+/*
+ * Set up pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
+ */
+void __init kvm_spinlock_init(void)
+{
+	if (!kvm_para_available())
+		return;
+	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
+	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
+		return;
+
+	jump_label_inc(&paravirt_ticketlocks_enabled);
+
+	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
+	pv_lock_ops.unlock_kick = kvm_unlock_kick;
+}
+#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
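
The debugfs statistics above rely on two small techniques. The first is a log2 histogram of blocked time (`__spin_time_accum()`). The second is a reset request that only one updater honours (`check_zero()`, using cmpxchg). The following userspace sketch reproduces both with C11 atomics. `ilog2_u64()`, `check_zero_model()` and `accum_blocked()` are hypothetical names. Putting a zero delay in bucket 0 is a choice of this sketch, not something the patch specifies. As in the patch, the counters themselves are plain (non-atomic) updates, so they are best-effort under concurrency.

```c
/* Hypothetical userspace model of the spinlock debugfs stats; not KVM code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define HISTO_BUCKETS 30	/* same bucket count as the patch */

struct spin_stats_model {
	uint32_t histo_blocked[HISTO_BUCKETS + 1];	/* last bucket collects overflow */
	uint64_t time_blocked;				/* total blocked time, ns */
};

static struct spin_stats_model stats;
static atomic_uchar zero_stats;	/* set non-zero to request a reset */

/* Portable floor(log2(v)) for v > 0; stands in for the kernel's ilog2(). */
static unsigned ilog2_u64(uint64_t v)
{
	unsigned r = 0;

	assert(v != 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Like check_zero(): only the caller whose compare-exchange wins clears the stats. */
static void check_zero_model(void)
{
	unsigned char old = atomic_load(&zero_stats);

	if (old && atomic_compare_exchange_strong(&zero_stats, &old, 0))
		memset(&stats, 0, sizeof(stats));
}

/* Like spin_time_accum_blocked(): bucket the delay by its power of two. */
static void accum_blocked(uint64_t delta_ns)
{
	unsigned index = delta_ns ? ilog2_u64(delta_ns) : 0;

	check_zero_model();
	if (index < HISTO_BUCKETS)
		stats.histo_blocked[index]++;
	else
		stats.histo_blocked[HISTO_BUCKETS]++;
	stats.time_blocked += delta_ns;
}
```

A delay of 1000 ns lands in bucket 9 (512 <= 1000 < 1024). Anything at or above 2^30 ns (about one second) lands in the overflow bucket.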

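kvm_lock_spinning() follows an ordering rule: a non-NULL `lock` must always come with the matching `want`. kvm_unlock_kick() then scans the waiting CPUs for a matching entry. Both can be modelled in userspace with C11 atomics. In this sketch, release stores stand in for the two smp_wmb() calls and a bitmask stands in for `waiting_cpus`. `find_cpu_to_kick()` returns the CPU index instead of issuing KVM_HC_KICK_CPU. Interrupt masking, `__ticket_enter_slowpath()` and `halt()` are left out. `announce_wait()`, `finish_wait()`, `find_cpu_to_kick()` and the `_model` names are hypothetical.

```c
/* Hypothetical userspace model of the waiter/kicker handshake; not KVM code. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 8		/* hypothetical CPU count for the model */

typedef uint16_t ticket_t;

struct lock_waiting_model {
	_Atomic(void *) lock;	/* lock this CPU is blocked on, NULL if none */
	_Atomic ticket_t want;	/* ticket it is waiting for */
};

static struct lock_waiting_model waiting[NR_CPUS_MODEL];
static atomic_ulong waiting_cpus;	/* bit n set: CPU n is in the slow path */

/* Publish in the patch's order: clear lock, set want, then set lock. */
static void announce_wait(int cpu, void *lock, ticket_t want)
{
	struct lock_waiting_model *w = &waiting[cpu];

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	atomic_store_explicit(&w->lock, NULL, memory_order_relaxed);
	atomic_store_explicit(&w->want, want, memory_order_release);
	atomic_store_explicit(&w->lock, lock, memory_order_release);
	atomic_fetch_or(&waiting_cpus, 1UL << cpu);
}

/* Leave the slow path, as at the "out:" label in kvm_lock_spinning(). */
static void finish_wait(int cpu)
{
	atomic_fetch_and(&waiting_cpus, ~(1UL << cpu));
	atomic_store_explicit(&waiting[cpu].lock, NULL, memory_order_release);
}

/* Like kvm_unlock_kick(): first CPU waiting for @ticket on @lock, or -1. */
static int find_cpu_to_kick(void *lock, ticket_t ticket)
{
	unsigned long mask = atomic_load(&waiting_cpus);

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (!(mask & (1UL << cpu)))
			continue;
		if (atomic_load_explicit(&waiting[cpu].lock, memory_order_acquire) == lock &&
		    atomic_load_explicit(&waiting[cpu].want, memory_order_relaxed) == ticket)
			return cpu;	/* the real code would kick this CPU and stop */
	}
	return -1;
}
```

Because the lookup matches on both the lock address and the ticket, two CPUs waiting on different locks with the same ticket value are never confused.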

* [PATCH RFC V5 6/6] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
From: Raghavendra K T @ 2012-03-23  8:08 UTC
  To: Ingo Molnar, H. Peter Anvin
  Cc: Jeremy Fitzhardinge, X86, KVM, Konrad Rzeszutek Wilk, LKML,
	Greg Kroah-Hartman, linux-doc, Xen, Avi Kivity,
	Srivatsa Vaddagiri, Virtualization, Stefano Stabellini,
	Sasha Levin
In-Reply-To: <20120323080503.14568.43092.sendpatchset@codeblue>

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> 

The KVM_HC_KICK_CPU hypercall is added to wake up a halted vcpu in a
paravirtual-spinlock-enabled guest.

KVM_FEATURE_PV_UNHALT enables the guest to check whether pv spinlocks can be
enabled. Support in the host is queried via
ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PV_UNHALT).

Thanks to Alex for the KVM_HC_FEATURES input and to Vatsa for rewriting KVM_HC_KICK_CPU.

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
---
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e1d94bf..cf8bf3b 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1109,6 +1109,13 @@ support.  Instead it is reported via
 if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
 feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
 
+Paravirtualized ticket spinlocks can be enabled in the guest after checking
+whether the host supports them via
+
+  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PV_UNHALT)
+
+If this call returns true, the guest can use the feature.
+
 4.47 KVM_PPC_GET_PVINFO
 
 Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 8820685..062dff9 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -39,6 +39,10 @@ KVM_FEATURE_CLOCKSOURCE2           ||     3 || kvmclock available at msrs
 KVM_FEATURE_ASYNC_PF               ||     4 || async pf can be enabled by
                                    ||       || writing to msr 0x4b564d02
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_UNHALT              ||     6 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || spinlock support.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
new file mode 100644
index 0000000..c9e8b9c
--- /dev/null
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -0,0 +1,59 @@
+KVM Hypercalls Documentation
+============================
+Template for documentation:
+The documentation for each hypercall should include
+1. Hypercall name, value.
+2. Architecture(s)
+3. Status (deprecated, obsolete, active)
+4. Purpose
+
+
+1. KVM_HC_VAPIC_POLL_IRQ
+------------------------
+value: 1
+Architecture: x86
+Purpose:
+
+2. KVM_HC_MMU_OP
+------------------------
+value: 2
+Architecture: x86
+status: deprecated.
+Purpose: Support MMU operations such as writing to PTEs,
+flushing the TLB, and releasing page tables.
+
+3. KVM_HC_FEATURES
+------------------------
+value: 3
+Architecture: PPC
+Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid
+is used to enumerate which hypercalls are available. On PPC, either a device-tree
+based lookup (which is also what ePAPR dictates) or a KVM-specific enumeration
+mechanism (this hypercall) can be used.
+
+4. KVM_HC_PPC_MAP_MAGIC_PAGE
+------------------------
+value: 4
+Architecture: PPC
+Purpose: To enable communication between the hypervisor and guest, there is a
+shared page that contains parts of the supervisor-visible register state.
+The guest can map this shared page to access its supervisor registers through
+memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+------------------------
+value: 5
+Architecture: x86
+Purpose: Hypercall used to wake up a vcpu from HLT state.
+
+Usage example: A vcpu of a paravirtualized guest that is busy-waiting in guest
+kernel mode for an event to occur (e.g. a spinlock to become available) can
+execute the HLT instruction once it has busy-waited for more than a threshold
+time interval. Executing HLT causes the hypervisor to put the vcpu to sleep
+until an appropriate event occurs. Another vcpu of the same guest can wake up
+the sleeping vcpu by issuing the KVM_HC_KICK_CPU hypercall, specifying the
+APIC ID of the vcpu to be woken up.
+
+TODO:
+1. more information on input and output needed?
+2. Add more detail to purpose of hypercalls.
diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index 5031780..a7662d3 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -219,3 +219,12 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
 		steal: the amount of time in which this vCPU did not run, in
 		nanoseconds. Time during which the vcpu is idle, will not be
 		reported as steal time.
+
+MSR_KVM_PV_UNHALT: 0x4b564d04
+	data: 32-bit flag indicating the paravirtual unhalt state of the VCPU.
+	This data is not expected to reside in guest memory. The unhalt state
+	indicates that the corresponding VCPU (halted for some reason) is ready
+	for an unhalt operation.
+
+	This data is expected to be filled only via ioctl. It is needed for
+	live migration of the virtual machine.

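The spin-then-halt protocol described in hypercalls.txt can be modelled in userspace with a mutex and a condition variable. Blocking on the condition variable plays the role of HLT. The `unhalted` flag plays a role similar to the pv_unhalted state described in msr.txt: it records a kick, so a kick that arrives before the waiter halts is not lost. This is a hedged sketch, not the KVM implementation. `SPIN_THRESHOLD_MODEL`, `struct vcpu_sleep_model`, `wait_for_event()` and `kick()` are hypothetical names, and the threshold is counted in loop iterations rather than time.

```c
/* Hypothetical userspace model of spin-then-HLT plus KVM_HC_KICK_CPU; not KVM code. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_THRESHOLD_MODEL 1024	/* hypothetical busy-wait budget (iterations) */

struct vcpu_sleep_model {
	atomic_bool event;	/* condition being waited for, e.g. lock released */
	bool unhalted;		/* set by a kick; protected by mtx */
	pthread_mutex_t mtx;
	pthread_cond_t cv;	/* waiting on cv plays the role of HLT */
};

#define VCPU_SLEEP_MODEL_INIT \
	{ false, false, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* Busy-wait up to the threshold, then "halt" until kicked.
 * Returns true only if the caller actually had to block. */
static bool wait_for_event(struct vcpu_sleep_model *v)
{
	bool halted = false;

	for (int i = 0; i < SPIN_THRESHOLD_MODEL; i++)
		if (atomic_load(&v->event))
			return false;	/* condition met while spinning */

	pthread_mutex_lock(&v->mtx);
	/* A kick that arrived before we got here leaves unhalted set. */
	while (!v->unhalted) {
		halted = true;
		pthread_cond_wait(&v->cv, &v->mtx);
	}
	v->unhalted = false;	/* consume the kick */
	pthread_mutex_unlock(&v->mtx);
	return halted;
}

/* Another vcpu publishes the event and kicks the sleeper. */
static void kick(struct vcpu_sleep_model *v)
{
	atomic_store(&v->event, true);
	pthread_mutex_lock(&v->mtx);
	v->unhalted = true;
	pthread_cond_signal(&v->cv);
	pthread_mutex_unlock(&v->mtx);
}
```

With two threads, one calling wait_for_event() and the other calling kick(), the waiter returns regardless of which one runs first. For brevity, the spin fast path does not clear a stale `unhalted` flag.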

* [PULL net] vhost-net/virtio: fixes for 3.4
From: Michael S. Tsirkin @ 2012-03-23 15:28 UTC
  To: David Miller
  Cc: kvm, virtualization, netdev, linux-kernel, levinsasha928, mst,
	nyh, nyh

Hi David,

The following changes since commit 5ffca28a4ac7abb8a254fafe6bd03b2f83667df7:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs (2012-02-27 07:59:33 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net

for you to fetch changes up to ea5d404655ba3b356d0c06d6a3c4f24112124522:

  vhost: fix release path lockdep checks (2012-02-28 09:13:22 +0200)

----------------------------------------------------------------
vhost/virtio: fixes for 3.4

This includes a couple of vhost-net bugfixes,
and fixes tools/virtio making it useful again.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----------------------------------------------------------------
Michael S. Tsirkin (4):
      tools/virtio: add linux/module.h stub
      tools/virtio: add linux/hrtimer.h stub
      tools/virtio: stub out strong barriers
      vhost: fix release path lockdep checks

Nadav Har'El (1):
      vhost: don't forget to schedule()

 drivers/vhost/net.c         |    2 +-
 drivers/vhost/vhost.c       |   11 +++++++----
 drivers/vhost/vhost.h       |    2 +-
 tools/virtio/linux/virtio.h |    3 +++
 4 files changed, 12 insertions(+), 6 deletions(-)
 create mode 100644 tools/virtio/linux/hrtimer.h
 create mode 100644 tools/virtio/linux/module.h


* Re: [PULL net] vhost-net/virtio: fixes for 3.4
From: David Miller @ 2012-03-23 18:47 UTC
  To: mst; +Cc: kvm, virtualization, netdev, linux-kernel, levinsasha928, nyh,
	nyh
In-Reply-To: <20120323152807.GA6037@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 23 Mar 2012 17:28:07 +0200

> The following changes since commit 5ffca28a4ac7abb8a254fafe6bd03b2f83667df7:
> 
>   Merge git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs (2012-02-27 07:59:33 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net

Pulled, thanks.


* CFP: Special Issue on Cloud Computing in Science & Engineering, in the IEEE Computing in Science & Engineering (CiSE)
From: Ioan Raicu @ 2012-03-23 23:59 UTC
  To: virtualization


[-- Attachment #1.1: Type: text/plain, Size: 4079 bytes --]

*Call for Papers*

*IEEE Computing in Science & Engineering*

*Special Issue on Cloud Computing in Science & Engineering*

http://www.computer.org/portal/web/computingnow/cise

*Submissions due: November 04, 2012*

*Estimated Publication date: July/August, 2013*

Cloud computing has emerged as a dominant paradigm that has been widely 
adopted by enterprises. Clouds provide on-demand access to computing 
utilities, an abstraction of unlimited computing resources, and support 
for on-demand scale up, scale down and scale out. Clouds are also 
rapidly joining more traditional computing platforms as viable platforms 
for scientific exploration and discovery, and education. As a result, it 
is critical to understand which application formulations and usage modes 
are meaningful in such a hybrid infrastructure, what the fundamental 
conceptual and technological challenges are, and how applications can 
effectively use it.

The goal of this special issue of CiSE is to explore how Cloud 
platforms and abstractions, either by themselves or in combination with 
other platforms, can be effectively used to support real-world science 
and engineering applications. Topics of interest include (but are not 
limited to) algorithmic and application formulations, programming models 
and systems, runtime systems and middleware, end-to-end application 
workflows and experiences with real applications.

Published by the IEEE Computer Society, Computing in Science & 
Engineering magazine features the latest computational science and 
engineering research in an accessible format, along with departments 
covering news and analysis, CSE in education, and emerging technologies.

We strongly encourage submissions that include multimedia, data, and 
community content, which will be featured on the IEEE Computer Society 
website along with the accepted papers.

For more information please see 
http://www.computer.org/portal/web/computingnow/cscfp4

*Questions?*

Contact guest editors *Manish Parashar*, Rutgers University (parashar at 
rutgers.edu), or *George K. Thiruvathukal*, Loyola University 
Chicago (gkt at cs.luc.edu).

*Submission Guidelines*

Authors are asked to submit high-quality original work that has neither 
appeared in nor is under consideration by other journals.  All 
submissions will be peer-reviewed following standard journal practices. 
Manuscripts based on previously published conference papers must be 
extended substantially to include at least 30 percent new material. 
Manuscripts should be written in the active voice, should be no longer 
than 7,200 words (counting each standard figure and table as 250 words), 
and should follow the style and presentation guidelines of CiSE (see 
*www.computer.org/cise/author* for details).

Please submit your article using the online manuscript submission 
service at *https://mc.manuscriptcentral.com/cs-ieee*. When uploading 
your article, select the appropriate special-issue title under the 
category "Manuscript Type." Also include complete contact information 
for all authors. If you have any questions about submitting your 
article, contact the peer review coordinator at *cise@computer.org*.

-- 
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel:    1-847-722-0876
Office: 1-312-567-5704
Email:  iraicu@cs.iit.edu
Web:    http://www.cs.iit.edu/~iraicu/
Web:    http://datasys.cs.iit.edu/
=================================================================



[-- Attachment #1.2: Type: text/html, Size: 34122 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* Re: vhost question
From: Stefan Hajnoczi @ 2012-03-25 16:18 UTC
  To: Steve Glass
In-Reply-To: <CADC0ay0ZRL79uMq72kAewOBTMCgMe1_1BLotdH9QnOwosJkTYQ@mail.gmail.com>

On Sun, Mar 25, 2012 at 1:29 PM, Steve Glass <stevie.glass@gmail.com> wrote:
> On 22 March 2012 19:52, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>
>>
>> Can you queue a tx->rx kick on the vhost work queue with
>> vhost_work_queue()?
>>
>> Stefan
>
>
> I've been looking at this and trying to work out how to achieve that. I can
> see how I can enqueue some work on the vhost queue but the rx_handler needs
> to run in the process context of the receiving VM (or else it can't access
> the ring). I don't think that's true of the work queue (doesn't it run with
> the qemu process context?)

The vhost worker thread has the mm of the process, see use_mm(dev->mm)
in drivers/vhost/vhost.c:vhost_worker().  It sounds like you need
vhost worker A (tx) to schedule work for vhost worker B (rx).

Stefan


* Re: [RFC V2 PATCH] virtio-spec: ack the announce notification through ctrl_vq
From: Rusty Russell @ 2012-03-26  0:42 UTC
  To: Jason Wang, virtualization, linux-kernel, mst
In-Reply-To: <20120323030501.6912.96218.stgit@jason-ThinkPad-T400>

On Fri, 23 Mar 2012 11:05:01 +0800, Jason Wang <jasowang@redhat.com> wrote:
> During link announcement, driver needs a method to notify device that it has
> received the notification and let it clear the VIRTIO_NET_S_ANNOUNCE bit in the
> status field. Doing this through a dedicated command looks suitable for all
> platforms (especially for the ones who don't trap the status read or write) with
> a ctrl vq and can solve the race between host and guest.
> 
> So this patch makes VIRTIO_NET_F_ANNOUNCE depend on VIRTIO_NET_F_CTRL_VQ and
> introduces a dedicated command VIRTIO_NET_CTRL_ANNOUNCE_ACK to let device clear
> the VIRTIO_NET_S_ANNOUNCE bit in the status field.

Applied, thanks!

Cheers,
Rusty.
-- 
  How could I marry someone with more hair than me?  http://baldalex.org


* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Avi Kivity @ 2012-03-26 14:25 UTC
  To: Raghavendra K T
  Cc: KVM, Konrad Rzeszutek Wilk, Peter Zijlstra, Stefano Stabellini,
	the arch/x86 maintainers, LKML, Virtualization, Andi Kleen,
	Srivatsa Vaddagiri, Jeremy Fitzhardinge, H. Peter Anvin,
	Attilio Rao, Ingo Molnar, Linus Torvalds, Xen Devel,
	Stephan Diestelhorst
In-Reply-To: <20120321102041.473.61069.sendpatchset@codeblue.in.ibm.com>

On 03/21/2012 12:20 PM, Raghavendra K T wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
> Changes since last posting: (Raghavendra K T)
> [
>  - Rebased to linux-3.3-rc6.
>  - used function+enum in place of macro (better type checking) 
>  - use cmpxchg while resetting zero status for possible race
> 	[suggested by Dave Hansen for KVM patches ]
> ]
>
> This series replaces the existing paravirtualized spinlock mechanism
> with a paravirtualized ticketlock mechanism.
>
> Ticket locks have an inherent problem in a virtualized case, because
> the vCPUs are scheduled rather than running concurrently (ignoring
> gang scheduled vCPUs).  This can result in catastrophic performance
> collapses when the vCPU scheduler doesn't schedule the correct "next"
> vCPU, and ends up scheduling a vCPU which burns its entire timeslice
> spinning.  (Note that this is not the same problem as lock-holder
> preemption, which this series also addresses; that's also a problem,
> but not catastrophic).
>
> (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
> http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
>
> Currently we deal with this by having PV spinlocks, which adds a layer
> of indirection in front of all the spinlock functions, and defining a
> completely new implementation for Xen (and for other pvops users, but
> there are none at present).
>
> PV ticketlocks keeps the existing ticketlock implementation
> (fastpath) as-is, but adds a couple of pvops for the slow paths:
>
> - If a CPU has been waiting for a spinlock for SPIN_THRESHOLD
>   iterations, then call out to the __ticket_lock_spinning() pvop,
>   which allows a backend to block the vCPU rather than spinning.  This
>   pvop can set the lock into "slowpath state".
>
> - When releasing a lock, if it is in "slowpath state", then call
>   __ticket_unlock_kick() to kick the next vCPU in line awake.  If the
>   lock is no longer in contention, it also clears the slowpath flag.
>
> The "slowpath state" is stored in the LSB of the lock tail
> ticket.  This has the effect of reducing the max number of CPUs by
> half (so, a "small ticket" can deal with 128 CPUs, and "large ticket"
> 32768).
>
> This series provides a Xen implementation, but it should be
> straightforward to add a KVM implementation as well.
>

Looks like a good baseline on which to build the KVM implementation.  We
might need some handshake to prevent interference on the host side with
the PLE code.

-- 
error compiling committee.c: too many arguments to function
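
The last two points of the cover letter are arithmetic on ticket values: tickets advance by 2, and the LSB of the tail holds the slowpath flag. Under that description, a single-threaded C model looks like the sketch below. `struct ticketlock_model`, `SLOWPATH_FLAG`, `LOCK_INC` and the helper functions are hypothetical names chosen for illustration. The real lock updates the tail with atomic operations rather than plain stores.

```c
/* Hypothetical single-threaded model of the ticket/flag layout; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t ticket_t;	/* "small ticket": 8-bit head and tail */

#define SLOWPATH_FLAG ((ticket_t)1)	/* LSB of tail marks "slowpath state" */
#define LOCK_INC      ((ticket_t)2)	/* tickets step by 2, keeping the LSB free */

struct ticketlock_model {
	ticket_t head;	/* ticket currently being served */
	ticket_t tail;	/* next ticket to hand out, plus the flag bit */
};

/* Take a ticket; adding 2 never disturbs the flag bit. */
static ticket_t take_ticket(struct ticketlock_model *l)
{
	ticket_t t = l->tail & (ticket_t)~SLOWPATH_FLAG;

	l->tail = (ticket_t)(l->tail + LOCK_INC);
	return t;
}

static bool is_my_turn(const struct ticketlock_model *l, ticket_t t)
{
	return l->head == t;
}

/* A waiter that decides to block marks the lock (cf. __ticket_enter_slowpath). */
static void enter_slowpath(struct ticketlock_model *l)
{
	l->tail |= SLOWPATH_FLAG;
}

/* Unlock: advance head, and report whether a blocked waiter may need a kick. */
static bool unlock_needs_kick(struct ticketlock_model *l)
{
	l->head = (ticket_t)(l->head + LOCK_INC);
	return (l->tail & SLOWPATH_FLAG) != 0;
}

/* Distinct tickets: 2^bits values, halved because the LSB is reserved. */
static unsigned long max_cpus(unsigned ticket_bits)
{
	return (1UL << ticket_bits) / LOCK_INC;
}
```

take_ticket() masks the flag off, so a waiter that arrives after the lock has entered the slow path still receives an even ticket. max_cpus(8) and max_cpus(16) reproduce the 128 and 32768 figures quoted above.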


* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Raghavendra K T @ 2012-03-27  7:37 UTC
  To: Avi Kivity, H. Peter Anvin, Ingo Molnar
  Cc: KVM, Konrad Rzeszutek Wilk, Peter Zijlstra, Stefano Stabellini,
	the arch/x86 maintainers, LKML, Virtualization, Andi Kleen,
	Jeremy Fitzhardinge, Srivatsa Vaddagiri, Attilio Rao,
	Linus Torvalds, Xen Devel, Stephan Diestelhorst
In-Reply-To: <4F707C5F.1000905@redhat.com>

On 03/26/2012 07:55 PM, Avi Kivity wrote:
> On 03/21/2012 12:20 PM, Raghavendra K T wrote:
>> From: Jeremy Fitzhardinge<jeremy.fitzhardinge@citrix.com>
[...]
>>
>> This series provides a Xen implementation, but it should be
>> straightforward to add a KVM implementation as well.
>>
>
> Looks like a good baseline on which to build the KVM implementation.  We
> might need some handshake to prevent interference on the host side with
> the PLE code.
>

Avi, thanks for reviewing. True, it is sort of equivalent to PLE on non-PLE
machines.

Ingo, Peter,
Can you please let us know if this series can be considered for the next
merge window, or whether you still have some concerns that need addressing?

I shall rebase the patches to 3.3 and resend. (The main differences would be
UNINLINE_SPIN_UNLOCK and jump label changes to use
static_key_true/false() instead of static_branch.)

Thanks,
Raghu


* [PATCH net-next] virtio_net: do not rate limit counter increments
From: Rick Jones @ 2012-03-27 17:28 UTC
  To: netdev; +Cc: virtualization, mst

From: Rick Jones <rick.jones2@hp.com>

While it is desirable to rate limit certain messages, it is not
desirable to rate limit the incrementing of counters associated
with those messages.

Signed-off-by: Rick Jones <rick.jones2@hp.com>

---

Compiled, and run briefly in a 1 vCPU guest under a netperf workload.


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 019da01..4de2760 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -625,12 +625,13 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	/* This can happen with OOM and indirect buffers. */
 	if (unlikely(capacity < 0)) {
-		if (net_ratelimit()) {
-			if (likely(capacity == -ENOMEM)) {
+		if (likely(capacity == -ENOMEM)) {
+			if (net_ratelimit()) {
 				dev_warn(&dev->dev,
 					 "TX queue failure: out of memory\n");
 			} else {
-				dev->stats.tx_fifo_errors++;
+			dev->stats.tx_fifo_errors++;
+			if (net_ratelimit())
 				dev_warn(&dev->dev,
 					 "Unexpected TX queue failure: %d\n",
 					 capacity);
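The following is a minimal userspace sketch of the behaviour the changelog describes. Every unexpected TX failure increments the error counter, and only the warning is rate limited. `ratelimit_ok()` is a hypothetical counter-based stand-in for the kernel's time-based net_ratelimit(). `struct tx_stats` and `handle_tx_failure()` are illustrative names and are not virtio_net code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for net_ratelimit(): allow at most `burst`
 * messages, then suppress the rest. The real kernel helper is
 * time-based; this counter-based version only illustrates the idea. */
struct ratelimit {
	int burst;    /* messages allowed before suppression */
	int printed;  /* messages emitted so far */
};

static bool ratelimit_ok(struct ratelimit *rl)
{
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

struct tx_stats {
	unsigned long tx_fifo_errors;
};

/* Handle a negative capacity from the TX path. Unexpected failures are
 * always counted; only the log message is rate limited.
 * Returns true if a warning was printed. */
static bool handle_tx_failure(int capacity, struct tx_stats *stats,
			      struct ratelimit *rl)
{
	assert(capacity < 0);
	if (capacity == -ENOMEM) {
		if (ratelimit_ok(rl)) {
			fprintf(stderr, "TX queue failure: out of memory\n");
			return true;
		}
		return false;
	}

	stats->tx_fifo_errors++;        /* counted on every failure */
	if (ratelimit_ok(rl)) {         /* only the message is limited */
		fprintf(stderr, "Unexpected TX queue failure: %d\n", capacity);
		return true;
	}
	return false;
}
```

Suppose handle_tx_failure(-EIO, ...) is called five times with a burst of 2. Afterwards tx_fifo_errors should be 5, while only two warnings were printed.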

^ permalink raw reply related

* Re: [V5 PATCH] virtio-net: send gratuitous packets when needed
From: David Miller @ 2012-03-28  2:31 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, mst, linux-kernel, virtualization
In-Reply-To: <20120316090100.5223.50783.stgit@amd-6168-8-1.englab.nay.redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Fri, 16 Mar 2012 17:01:01 +0800

> As hypervior does not have the knowledge of guest network configuration, it's
> better to ask guest to send gratuitous packets when needed.
> 
> Guest tests VIRTIO_NET_S_ANNOUNCE bit during config change interrupt and when it
> is set, a workqueue is scheduled to send gratuitous packet through
> NETDEV_NOTIFY_PEERS. This feature is negotiated through bit
> VIRTIO_NET_F_GUEST_ANNOUNCE.
> 
> Changes from v4:
> - typos
> - handle workqueue unconditionally
> - move VIRTIO_NET_S_ANNOUNCE to bit 8 to separate rw bits from ro bits
> 
> Changes from v3:
> - cancel the workqueue during freeze
> 
> Changes from v2:
> - fix the race between unregister_dev() and workqueue
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

What's happening with this patch?

^ permalink raw reply

* Re: [V5 PATCH] virtio-net: send gratuitous packets when needed
From: Jason Wang @ 2012-03-28  2:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, mst, linux-kernel, virtualization
In-Reply-To: <20120327.223112.2032989102915830685.davem@davemloft.net>

On 03/28/2012 10:31 AM, David Miller wrote:
> From: Jason Wang<jasowang@redhat.com>
> Date: Fri, 16 Mar 2012 17:01:01 +0800
>
>> >  As hypervior does not have the knowledge of guest network configuration, it's
>> >  better to ask guest to send gratuitous packets when needed.
>> >  
>> >  Guest tests VIRTIO_NET_S_ANNOUNCE bit during config change interrupt and when it
>> >  is set, a workqueue is scheduled to send gratuitous packet through
>> >  NETDEV_NOTIFY_PEERS. This feature is negotiated through bit
>> >  VIRTIO_NET_F_GUEST_ANNOUNCE.
>> >  
>> >  Changes from v4:
>> >  - typos
>> >  - handle workqueue unconditionally
>> >  - move VIRTIO_NET_S_ANNOUNCE to bit 8 to separate rw bits from ro bits
>> >  
>> >  Changes from v3:
>> >  - cancel the workqueue during freeze
>> >  
>> >  Changes from v2:
>> >  - fix the race between unregister_dev() and workqueue
>> >  
>> >  Signed-off-by: Jason Wang<jasowang@redhat.com>
> What's happening with this patch?
Hi David:

I'm working on a new version of this patch because there are some
changes in the virtio spec; I will post it soon.

Thanks

^ permalink raw reply

* [V6 PATCH] virtio-net: send gratuitous packets when needed
From: Jason Wang @ 2012-03-28  5:44 UTC (permalink / raw)
  To: netdev, rusty, mst, linux-kernel, virtualization; +Cc: davem, qemu-devel

As the hypervisor does not know the guest's network configuration, it is
better to ask the guest to send gratuitous packets when needed.

The guest tests the VIRTIO_NET_S_ANNOUNCE bit during the config change
interrupt; when it is set, a work item is scheduled to send gratuitous
packets through NETDEV_NOTIFY_PEERS. This feature is negotiated through
the VIRTIO_NET_F_GUEST_ANNOUNCE feature bit.
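The following is a minimal sketch of the status handling described above. It is modelled on the virtnet_update_status() hunk of the patch below. The bit values are taken from this patch's include/linux/virtio_net.h hunk. `struct net_state` is hypothetical, and so is its `announce_pending` flag, which stands in for schedule_work().

```c
#include <assert.h>
#include <stdbool.h>

/* Values from this patch's include/linux/virtio_net.h hunk. */
#define VIRTIO_NET_S_LINK_UP  1
#define VIRTIO_NET_S_ANNOUNCE 2

/* Hypothetical driver state; announce_pending models a queued
 * announce work item (schedule_work() in the real driver). */
struct net_state {
	unsigned int status;
	bool announce_pending;
};

/* Called on a config-change interrupt with the device's status field. */
static void update_status(struct net_state *s, unsigned int v)
{
	/* Ignore unknown (future) status bits. */
	v &= VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_ANNOUNCE;

	if (s->status == v)
		return;

	if (v & VIRTIO_NET_S_ANNOUNCE) {
		/* One-shot request: never stored as part of link state. */
		v &= ~(unsigned int)VIRTIO_NET_S_ANNOUNCE;
		if (v & VIRTIO_NET_S_LINK_UP)
			s->announce_pending = true;
	}
	s->status = v;
	assert(!(s->status & VIRTIO_NET_S_ANNOUNCE));
}
```

ANNOUNCE is stripped before the value is stored, so it acts as a request rather than as link state. An announcement is only queued while the link is up.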

Changes from v5:
- notify the chain before acking the link announcement
- ack the link announcement notification through control vq

Changes from v4:
- typos
- handle workqueue unconditionally
- move VIRTIO_NET_S_ANNOUNCE to bit 8 to separate rw bits from ro bits

Changes from v3:
- cancel the workqueue during freeze

Changes from v2:
- fix the race between unregister_dev() and workqueue

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c   |   32 +++++++++++++++++++++++++++++++-
 include/linux/virtio_net.h |   13 +++++++++++++
 2 files changed, 44 insertions(+), 1 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4880aa8..0f60da7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,9 @@ struct virtnet_info {
 	/* Work struct for refilling if we run low on memory. */
 	struct delayed_work refill;
 
+	/* Work struct for sending gratuitous packets. */
+	struct work_struct announce;
+
 	/* Chain pages by the private ptr. */
 	struct page *pages;
 
@@ -781,12 +784,30 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	return status == VIRTIO_NET_OK;
 }
 
+static void virtnet_ack_link_announce(struct virtnet_info *vi)
+{
+	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_ANNOUNCE,
+				  VIRTIO_NET_CTRL_ANNOUNCE_ACK, NULL,
+				  0, 0)) {
+		dev_warn(&vi->dev->dev, "Failed to ack link announce.\n");
+	}
+}
+
+static void announce_work(struct work_struct *work)
+{
+	struct virtnet_info *vi = container_of(work, struct virtnet_info,
+					       announce);
+	netif_notify_peers(vi->dev);
+	virtnet_ack_link_announce(vi);
+}
+
 static int virtnet_close(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 
 	/* Make sure refill_work doesn't re-enable napi! */
 	cancel_delayed_work_sync(&vi->refill);
+	cancel_work_sync(&vi->announce);
 	napi_disable(&vi->napi);
 
 	return 0;
@@ -962,11 +983,17 @@ static void virtnet_update_status(struct virtnet_info *vi)
 		return;
 
 	/* Ignore unknown (future) status bits */
-	v &= VIRTIO_NET_S_LINK_UP;
+	v &= VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_ANNOUNCE;
 
 	if (vi->status == v)
 		return;
 
+	if (v & VIRTIO_NET_S_ANNOUNCE) {
+		v &= ~VIRTIO_NET_S_ANNOUNCE;
+		if (v & VIRTIO_NET_S_LINK_UP)
+			schedule_work(&vi->announce);
+	}
+
 	vi->status = v;
 
 	if (vi->status & VIRTIO_NET_S_LINK_UP) {
@@ -1076,6 +1103,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 		goto free;
 
 	INIT_DELAYED_WORK(&vi->refill, refill_work);
+	INIT_WORK(&vi->announce, announce_work);
 	sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
 	sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
 
@@ -1187,6 +1215,7 @@ static int virtnet_freeze(struct virtio_device *vdev)
 	virtqueue_disable_cb(vi->svq);
 	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ))
 		virtqueue_disable_cb(vi->cvq);
+	cancel_work_sync(&vi->announce);
 
 	netif_device_detach(vi->dev);
 	cancel_delayed_work_sync(&vi->refill);
@@ -1233,6 +1262,7 @@ static unsigned int features[] = {
 	VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
 	VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
 	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
+	VIRTIO_NET_F_GUEST_ANNOUNCE,
 };
 
 static struct virtio_driver virtio_net_driver = {
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 970d5a2..383e8a0 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -49,8 +49,10 @@
 #define VIRTIO_NET_F_CTRL_RX	18	/* Control channel RX mode support */
 #define VIRTIO_NET_F_CTRL_VLAN	19	/* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20	/* Extra RX mode control support */
+#define VIRTIO_NET_F_GUEST_ANNOUNCE 21  /* Guest can send gratuitous packets */
 
 #define VIRTIO_NET_S_LINK_UP	1	/* Link is up */
+#define VIRTIO_NET_S_ANNOUNCE   2	/* Announcement is needed */
 
 struct virtio_net_config {
 	/* The config defining mac address (if VIRTIO_NET_F_MAC) */
@@ -152,4 +154,15 @@ struct virtio_net_ctrl_mac {
  #define VIRTIO_NET_CTRL_VLAN_ADD             0
  #define VIRTIO_NET_CTRL_VLAN_DEL             1
 
+/*
+ * Control link announce acknowledgement
+ *
+ * The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
+ * the driver has received the notification; the device clears the
+ * VIRTIO_NET_S_ANNOUNCE bit in the status field after it receives
+ * this command.
+ */
+#define VIRTIO_NET_CTRL_ANNOUNCE       3
+ #define VIRTIO_NET_CTRL_ANNOUNCE_ACK         0
+
 #endif /* _LINUX_VIRTIO_NET_H */
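The acknowledgement protocol in the header comment can be modelled end to end in a few lines. This is a hedged sketch:

- `struct fake_device`, `device_ctrl()` and `driver_announce()` are hypothetical.
- The control-queue transport is reduced to a direct function call.
- Only the command constants come from this patch. VIRTIO_NET_OK/VIRTIO_NET_ERR are assumed to match the existing virtio_net.h ack values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_S_LINK_UP          1
#define VIRTIO_NET_S_ANNOUNCE         2
#define VIRTIO_NET_CTRL_ANNOUNCE      3
#define VIRTIO_NET_CTRL_ANNOUNCE_ACK  0
#define VIRTIO_NET_OK                 0
#define VIRTIO_NET_ERR                1

/* Hypothetical device model: only what the handshake needs. */
struct fake_device {
	unsigned int status;
	int peer_notifications;  /* gratuitous announcements sent by the guest */
};

/* Device side: handle a control-queue command. The device clears
 * VIRTIO_NET_S_ANNOUNCE once it receives the ACK. */
static uint8_t device_ctrl(struct fake_device *d, uint8_t class, uint8_t cmd)
{
	if (class == VIRTIO_NET_CTRL_ANNOUNCE &&
	    cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
		d->status &= ~(unsigned int)VIRTIO_NET_S_ANNOUNCE;
		return VIRTIO_NET_OK;
	}
	return VIRTIO_NET_ERR;
}

/* Driver side, mirroring announce_work(): notify peers, then ack. */
static bool driver_announce(struct fake_device *d)
{
	d->peer_notifications++;  /* stands in for netif_notify_peers() */
	return device_ctrl(d, VIRTIO_NET_CTRL_ANNOUNCE,
			   VIRTIO_NET_CTRL_ANNOUNCE_ACK) == VIRTIO_NET_OK;
}
```

The ordering matters. Peers are notified before the ack, so the device only stops requesting an announcement after the guest has acted on it.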

^ permalink raw reply related

* Re: [PATCH net-next] virtio_net: do not rate limit counter increments
From: Rusty Russell @ 2012-03-28  6:03 UTC (permalink / raw)
  To: Rick Jones, netdev; +Cc: virtualization, mst
In-Reply-To: <20120327172809.C52572900384@tardy>

On Tue, 27 Mar 2012 10:28:09 -0700 (PDT), raj@tardy.cup.hp.com (Rick Jones) wrote:
> From: Rick Jones <rick.jones2@hp.com>
> 
> While it is desirable to rate limit certain messages, it is not
> desirable to rate limit the incrementing of counters associated
> with those messages.
> 
> Signed-off-by: Rick Jones <rick.jones2@hp.com>

Acked-by: Rusty Russell <rusty@rustcorp.com.au>

Thanks!
Rusty.

> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 019da01..4de2760 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -625,12 +625,13 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  	/* This can happen with OOM and indirect buffers. */
>  	if (unlikely(capacity < 0)) {
> -		if (net_ratelimit()) {
> -			if (likely(capacity == -ENOMEM)) {
> +		if (likely(capacity == -ENOMEM)) {
> +			if (net_ratelimit()) {
>  				dev_warn(&dev->dev,
>  					 "TX queue failure: out of memory\n");
>  			} else {
> -				dev->stats.tx_fifo_errors++;
> +			dev->stats.tx_fifo_errors++;
> +			if (net_ratelimit())
>  				dev_warn(&dev->dev,
>  					 "Unexpected TX queue failure: %d\n",
>  					 capacity);
> 

-- 
  How could I marry someone with more hair than me?  http://baldalex.org

^ permalink raw reply

* Re: [PATCH net-next] virtio_net: do not rate limit counter increments
From: Michael S. Tsirkin @ 2012-03-28  8:30 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, virtualization
In-Reply-To: <20120327172809.C52572900384@tardy>

On Tue, Mar 27, 2012 at 10:28:09AM -0700, Rick Jones wrote:
> From: Rick Jones <rick.jones2@hp.com>
> 
> While it is desirable to rate limit certain messages, it is not
> desirable to rate limit the incrementing of counters associated
> with those messages.
> 
> Signed-off-by: Rick Jones <rick.jones2@hp.com>


Acked-by: Michael S. Tsirkin <mst@redhat.com>

Dave, can you apply pls? Thanks!

> ---
> 
> Compiled, and run briefly in a 1 vCPU guest under a netperf workload.
> 
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 019da01..4de2760 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -625,12 +625,13 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  	/* This can happen with OOM and indirect buffers. */
>  	if (unlikely(capacity < 0)) {
> -		if (net_ratelimit()) {
> -			if (likely(capacity == -ENOMEM)) {
> +		if (likely(capacity == -ENOMEM)) {
> +			if (net_ratelimit()) {
>  				dev_warn(&dev->dev,
>  					 "TX queue failure: out of memory\n");
>  			} else {
> -				dev->stats.tx_fifo_errors++;
> +			dev->stats.tx_fifo_errors++;
> +			if (net_ratelimit())
>  				dev_warn(&dev->dev,
>  					 "Unexpected TX queue failure: %d\n",
>  					 capacity);

^ permalink raw reply

* Re: [PATCH net-next] virtio_net: do not rate limit counter increments
From: David Miller @ 2012-03-28  8:41 UTC (permalink / raw)
  To: mst; +Cc: netdev, rick.jones2, virtualization
In-Reply-To: <20120328083046.GA5873@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Wed, 28 Mar 2012 10:30:46 +0200

> On Tue, Mar 27, 2012 at 10:28:09AM -0700, Rick Jones wrote:
>> From: Rick Jones <rick.jones2@hp.com>
>> 
>> While it is desirable to rate limit certain messages, it is not
>> desirable to rate limit the incrementing of counters associated
>> with those messages.
>> 
>> Signed-off-by: Rick Jones <rick.jones2@hp.com>
> 
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Dave, can you apply pls? Thanks!

Done.

^ permalink raw reply

* Re: [PATCH] virtio_blk: add helper function to support mass of disks naming
From: Michael S. Tsirkin @ 2012-03-28 10:58 UTC (permalink / raw)
  To: Ren Mingxin; +Cc: linux-scsi, kvm, LKML, virtualization, jens.axboe, Tejun Heo
In-Reply-To: <4F72B5B3.6020602@cn.fujitsu.com>

On Wed, Mar 28, 2012 at 02:54:43PM +0800, Ren Mingxin wrote:
>  Hi,
> 
> The current virtblk's naming algorithm just supports 26^3
> disks. If there are mass of virtblks(exceeding 26^3), there
> will be disks with the same name.
> 
> According to "sd_format_disk_name()", I add function
> "virtblk_name_format()" for virtblk to support mass of
> disks.
> 
> Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com>

Nod. This is basically what 3e1a7ff8a0a7b948f2684930166954f9e8e776fe
did. Except, maybe we should move this function into block core
instead of duplicating it? Where would be a good place to put it?
Jens, care to comment?

> ---
>  virtio_blk.c |   37 +++++++++++++++++++++++++------------
>  1 file changed, 25 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index c4a60ba..07b8bf9 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -374,6 +374,30 @@ static int init_vq(struct virtio_blk *vblk)
>         return err;
>  }
> 
> +static int virtblk_name_format(char *prefix, int index, char *buf, int buflen)
> +{
> +       const int base = 'z' - 'a' + 1;
> +       char *begin = buf + strlen(prefix);
> +       char *end = buf + buflen;
> +       char *p;
> +       int unit;
> +
> +       p = end - 1;
> +       *p = '\0';
> +       unit = base;
> +       do {
> +               if (p == begin)
> +                       return -EINVAL;
> +               *--p = 'a' + (index % unit);
> +               index = (index / unit) - 1;
> +       } while (index >= 0);
> +
> +       memmove(begin, p, end - p);
> +       memcpy(buf, prefix, strlen(prefix));
> +
> +       return 0;
> +}
> +
>  static int __devinit virtblk_probe(struct virtio_device *vdev)
>  {
>         struct virtio_blk *vblk;
> @@ -442,18 +466,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
> 
>         q->queuedata = vblk;
> 
> -       if (index < 26) {
> -               sprintf(vblk->disk->disk_name, "vd%c", 'a' + index % 26);
> -       } else if (index < (26 + 1) * 26) {
> -               sprintf(vblk->disk->disk_name, "vd%c%c",
> -                       'a' + index / 26 - 1, 'a' + index % 26);
> -       } else {
> -               const unsigned int m1 = (index / 26 - 1) / 26 - 1;
> -               const unsigned int m2 = (index / 26 - 1) % 26;
> -               const unsigned int m3 =  index % 26;
> -               sprintf(vblk->disk->disk_name, "vd%c%c%c",
> -                       'a' + m1, 'a' + m2, 'a' + m3);
> -       }
> +       virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
> 
>         vblk->disk->major = major;
>         vblk->disk->first_minor = index_to_minor(index);
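To make the naming scheme concrete, here is a standalone userspace version of the helper's algorithm. The letters after the prefix are a bijective base-26 number, following the sd-style naming the patch is modelled on. This version also checks explicitly that the buffer can hold the prefix plus at least one letter. `name_format()` is a hypothetical name.

With the old open-coded scheme, index 18277 ("vdzzz") is the last valid name, because 26 + 26^2 + 26^3 = 18278.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Bijective base-26 disk naming: 0 -> "a", 25 -> "z", 26 -> "aa",
 * 701 -> "zz", 702 -> "aaa". Writes prefix + letters into buf.
 * Returns 0 on success or -EINVAL if buf is too small. */
static int name_format(const char *prefix, int index, char *buf, int buflen)
{
	const int base = 'z' - 'a' + 1;
	size_t plen = strlen(prefix);
	char *begin, *end, *p;

	/* Need room for the prefix, at least one letter and the NUL. */
	if (buflen < 0 || (size_t)buflen <= plen + 1)
		return -EINVAL;
	assert(index >= 0);

	begin = buf + plen;
	end = buf + buflen;
	p = end - 1;
	*p = '\0';
	do {
		if (p == begin)
			return -EINVAL;  /* name does not fit */
		*--p = 'a' + (index % base);
		index = (index / base) - 1;
	} while (index >= 0);

	/* Move the letters (and NUL) down to follow the prefix. */
	memmove(begin, p, end - p);
	memcpy(buf, prefix, plen);
	return 0;
}
```

For example, index 18278 yields "vdaaaa". The old three-letter code could not produce that name.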

^ permalink raw reply

* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Alan Meadows @ 2012-03-28 16:09 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: KVM, Konrad Rzeszutek Wilk, Peter Zijlstra, Stefano Stabellini,
	the arch/x86 maintainers, LKML, Virtualization, Andi Kleen,
	Srivatsa Vaddagiri, Avi Kivity, Jeremy Fitzhardinge,
	H. Peter Anvin, Attilio Rao, Ingo Molnar, Linus Torvalds,
	Xen Devel, Stephan Diestelhorst
In-Reply-To: <4F716E31.3000803@linux.vnet.ibm.com>


I am happy to see this issue receiving some attention, and I second the
wish to see these patches considered for further review and inclusion in
an upcoming release.

Overcommit is not as common in enterprise and single-tenant virtualized
environments as it is in multi-tenant environments, and frankly we have
been suffering.

We have been running an early copy of these patches in our lab and on a
small sample of production nodes, on both 3.2.0-rc4 and 3.3.0-rc6, for
over two weeks now with great success. With the heavy level of vCPU:pCPU
overcommit required for our situation, the patches are increasing
performance by an _order of magnitude_ on our E5645 and E5620 systems.

Alan Meadows

On Tue, Mar 27, 2012 at 12:37 AM, Raghavendra K T <
raghavendra.kt@linux.vnet.ibm.com> wrote:

> On 03/26/2012 07:55 PM, Avi Kivity wrote:
>
>> On 03/21/2012 12:20 PM, Raghavendra K T wrote:
>>
>>> From: Jeremy Fitzhardinge<jeremy.fitzhardinge@citrix.com>
>>>
>> [...]
>
>
>>> This series provides a Xen implementation, but it should be
>>> straightforward to add a KVM implementation as well.
>>>
>>>
>> Looks like a good baseline on which to build the KVM implementation.  We
>> might need some handshake to prevent interference on the host side with
>> the PLE code.
>>
>>
> Avi, Thanks for reviewing. True, it is sort of equivalent to PLE on non
> PLE machine.
>
> Ingo, Peter,
> Can you please let us know if this series can be considered for next merge
> window?
> OR do you still have some concerns that needs addressing.
>
> I shall rebase patches to 3.3 and resend. (main difference would be
> UNINLINE_SPIN_UNLOCK and jump label changes to use static_key_true/false()
> usage instead of static_branch.)
>
> Thanks,
> Raghu
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Raghavendra K T @ 2012-03-28 18:21 UTC (permalink / raw)
  To: Alan Meadows, Avi Kivity
  Cc: KVM, Konrad Rzeszutek Wilk, Peter Zijlstra, Stefano Stabellini,
	the arch/x86 maintainers, LKML, Virtualization, Andi Kleen,
	Srivatsa Vaddagiri, Jeremy Fitzhardinge, H. Peter Anvin,
	Attilio Rao, Ingo Molnar, Linus Torvalds, Xen Devel,
	Stephan Diestelhorst
In-Reply-To: <CAMy5W3foop40+R1RLv_JPhnO5ZmV90uMmNERYY-e3QCeaJfqLw@mail.gmail.com>

On 03/28/2012 09:39 PM, Alan Meadows wrote:
> I am happy to see this issue receiving some attention and second the
> wish to see these patches be considered for further review and inclusion
> in an upcoming release.
>
> Overcommit is not as common in enterprise and single-tenant virtualized
> environments as it is in multi-tenant environments, and frankly we have
> been suffering.
>
> We have been running an early copy of these patches in our lab and in a
> small production node sample set both on 3.2.0-rc4 and 3.3.0-rc6 for over
> two weeks now with great success. With the heavy level of vCPU:pCPU
> overcommit required for our situation, the patches are increasing
> performance by an _order of magnitude_ on our E5645 and E5620 systems.
>

Thanks, Alan, for the support. I feel the timing of this patch was a
little unfortunate, though (merge window).

>
>
>         Looks like a good baseline on which to build the KVM
>         implementation.  We might need some handshake to prevent
>         interference on the host side with the PLE code.
>

I think I may still have missed part of Avi's point. I agree that PLE
may interfere with the above patches (reducing their performance
advantage), but we have not seen performance degradation with the
patches in earlier benchmarks. [Theoretically, the patch has a slight
advantage over PLE in that it at least knows who should run next.]

So my TODO list for this is:
1. More performance analysis on a PLE machine.
2. Work out how to implement the handshake to improve performance (if
the PLE + patch combination turns out to have a slight negative effect).

Sorry that I could not do more analysis on PLE (as promised last time)
because of machine availability.

I'll do some work on this and come back. In the meantime, though, I do
not see it as blocking for the next merge window.
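The claim that a PV ticketlock "knows who should run next" follows from the ticket-lock structure. On release the lock advances `head`, and by construction the holder of ticket `head` is the next owner. PLE, by contrast, only observes that some vCPU is spinning.

Below is a single-threaded model of that property, built under several assumptions:

- It uses no atomics.
- The `waiter[]` table mapping tickets to vCPU ids is hypothetical.
- It does not reflect the kernel's arch_spinlock_t layout.

```c
#include <assert.h>

#define MAX_TICKETS 16  /* illustrative bound on outstanding tickets */

/* Simplified ticket lock: `head` is the ticket being served, `tail`
 * the next ticket to hand out. `waiter` records which vCPU holds each
 * outstanding ticket, so the unlocker can name the next owner. */
struct ticketlock {
	unsigned int head;
	unsigned int tail;
	int waiter[MAX_TICKETS];
};

/* Take a ticket on behalf of `vcpu`; returns the ticket number. */
static unsigned int ticket_take(struct ticketlock *l, int vcpu)
{
	unsigned int t = l->tail++;

	assert(l->tail - l->head <= MAX_TICKETS);
	l->waiter[t % MAX_TICKETS] = vcpu;
	return t;
}

/* Release: advance head and return the vCPU that owns the lock next
 * (the one a PV unlock would kick), or -1 if nobody is queued. */
static int ticket_release(struct ticketlock *l)
{
	l->head++;
	if (l->head == l->tail)
		return -1;
	return l->waiter[l->head % MAX_TICKETS];
}
```

In this model, suppose vCPUs 3, 7 and 1 queue in that order. The first release then names vCPU 7 as the only vCPU worth waking.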

>
>     Avi, Thanks for reviewing. True, it is sort of equivalent to PLE on
>     non PLE machine.
>
>     Ingo, Peter,
>     Can you please let us know if this series can be considered for next
>     merge window?
>     OR do you still have some concerns that needs addressing.
>
>     I shall rebase patches to 3.3 and resend. (main difference would be
>     UNINLINE_SPIN_UNLOCK and jump label changes to use
>     static_key_true/false() usage instead of static_branch.)

^ permalink raw reply

* Re: [PATCH RFC V5 0/6] kvm : Paravirt-spinlock support for KVM guests
From: Raghavendra K T @ 2012-03-28 18:32 UTC (permalink / raw)
  To: Ingo Molnar, Marcelo Tosatti, Jeremy Fitzhardinge, Alexander Graf,
	Gleb Natapov
  Cc: Raghavendra K T, KVM, Konrad Rzeszutek Wilk, Greg Kroah-Hartman,
	linux-doc, X86, LKML, Xen, Srivatsa Vaddagiri, Avi Kivity,
	H. Peter Anvin, Virtualization, Stefano Stabellini, Sasha Levin
In-Reply-To: <20120323080503.14568.43092.sendpatchset@codeblue>

On 03/23/2012 01:35 PM, Raghavendra K T wrote:
> The 6-patch series to follow this email extends KVM-hypervisor and Linux guest
> running on KVM-hypervisor to support pv-ticket spinlocks, based on Xen's
> implementation.
>
> One hypercall is introduced in KVM hypervisor,that allows a vcpu to kick
> another vcpu out of halt state.
> The blocking of vcpu is done using halt() in (lock_spinning) slowpath.
> one MSR is added to aid live migration.
>
> Changes in V5:
> - rebased to 3.3-rc6
> - added PV_UNHALT_MSR that would help in live migration (Avi)
> - removed PV_LOCK_KICK vcpu request and pv_unhalt flag (re)added.

Sorry for pinging. I know it is a busy time, but I hope to get a
response on these patches when you have some free time, so that I can
target the next merge window (whether the series has reached a good
state or is heading in the wrong direction!). It would really boost my
morale. I am especially interested in feedback on the MSR changes and
on dropping the vcpu request bit for PV unhalt.

- Raghu
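The quoted cover letter describes a kick/halt interplay. The rough model below shows why a per-vCPU "unhalted" flag is useful. A kick can arrive before the waiter has actually halted, and latching it prevents a lost wakeup. One plausible reading of the PV_UNHALT_MSR change is that such latched state must survive live migration. Everything here (`struct vcpu`, `kick()`, `pv_halt()`) is a hypothetical single-threaded simplification, not KVM's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vCPU model. `pv_unhalted` latches a kick that
 * arrives while the vCPU is not yet halted. */
struct vcpu {
	bool halted;
	bool pv_unhalted;
};

/* Host side of the kick hypercall. */
static void kick(struct vcpu *v)
{
	if (v->halted)
		v->halted = false;      /* wake a sleeping vCPU directly */
	else
		v->pv_unhalted = true;  /* not asleep yet: remember the kick */
}

/* Guest halt in the lock_spinning slowpath: a pending kick is
 * consumed instead of blocking. */
static void pv_halt(struct vcpu *v)
{
	if (v->pv_unhalted) {
		v->pv_unhalted = false;
		return;
	}
	v->halted = true;
}
```

Suppose the latched flag were dropped while the vCPU sat between "decided to halt" and "halted". The early kick would then be lost, and the waiter could sleep although the lock had already been handed to it.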

^ permalink raw reply

* Re: [PATCH] virtio_blk: add helper function to support mass of disks naming
From: Rusty Russell @ 2012-03-29  4:40 UTC (permalink / raw)
  To: Michael S. Tsirkin, Ren Mingxin
  Cc: kvm, linux-scsi, LKML, virtualization, jens.axboe, Tejun Heo
In-Reply-To: <20120328105819.GD6194@redhat.com>

On Wed, 28 Mar 2012 12:58:21 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Mar 28, 2012 at 02:54:43PM +0800, Ren Mingxin wrote:
> >  Hi,
> > 
> > The current virtblk's naming algorithm just supports 26^3
> > disks. If there are mass of virtblks(exceeding 26^3), there
> > will be disks with the same name.
> > 
> > According to "sd_format_disk_name()", I add function
> > "virtblk_name_format()" for virtblk to support mass of
> > disks.
> > 
> > Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com>
> 
> Nod. This is basically what 3e1a7ff8a0a7b948f2684930166954f9e8e776fe
> did. Except, maybe we should move this function into block core
> instead of duplicating it? Where would be a good place to put it?
> Jens, care to comment?

Indeed.

Cheers,
Rusty.
-- 
  How could I marry someone with more hair than me?  http://baldalex.org

^ permalink raw reply

* Re: [PATCH] virtio_blk: add helper function to support mass of disks naming
From: Ren Mingxin @ 2012-03-29  5:36 UTC (permalink / raw)
  To: Michael S. Tsirkin, jens.axboe
  Cc: kvm, linux-scsi, LKML, virtualization, Tejun Heo
In-Reply-To: <20120328105819.GD6194@redhat.com>

  On 03/28/2012 06:58 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 28, 2012 at 02:54:43PM +0800, Ren Mingxin wrote:
>>   Hi,
>>
>> The current virtblk's naming algorithm just supports 26^3
>> disks. If there are mass of virtblks(exceeding 26^3), there
>> will be disks with the same name.
>>
>> According to "sd_format_disk_name()", I add function
>> "virtblk_name_format()" for virtblk to support mass of
>> disks.
>>
>> Signed-off-by: Ren Mingxin<renmx@cn.fujitsu.com>
> Nod. This is basically what 3e1a7ff8a0a7b948f2684930166954f9e8e776fe
> did. Except, maybe we should move this function into block core
> instead of duplicating it? Where would be a good place to put it?

Yes, that is what I was thinking too.

How about moving "sd_format_disk_name()" into "block/genhd.c" as
"disk_name_format()" (declared in "include/linux/genhd.h")?

Thanks.
Ren

^ permalink raw reply

* [PATCH 0/5] virtio: S3 support, use PM API macro for init
From: Amit Shah @ 2012-03-29  8:28 UTC (permalink / raw)
  To: Virtualization List; +Cc: Linus Torvalds, Michael S. Tsirkin

Hello,

It turns out S3 is no different from S4 for virtio devices: the device
is assumed to be reset, so host and guest state must be assumed to be
out of sync upon resume.  We already handle exactly this scenario for
S4, so just point the suspend/resume routines at the freeze/restore
ones.

Once that is done, we also use the PM API's macro to initialise the
sleep functions.
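The kernel's SET_SYSTEM_SLEEP_PM_OPS() macro (include/linux/pm.h) fills all six system-sleep callbacks of struct dev_pm_ops from two functions: one suspend-type and one resume-type. That is one way to point suspend/resume at the freeze/restore routines. The cover letter does not name the macro the series uses, so the mock below is only a hedged userspace illustration. `struct mock_pm_ops`, `MOCK_SLEEP_PM_OPS()` and the mock callbacks are hypothetical.

```c
#include <assert.h>

/* Hypothetical mock of the system-sleep part of struct dev_pm_ops. */
struct mock_pm_ops {
	int (*suspend)(void *dev);
	int (*resume)(void *dev);
	int (*freeze)(void *dev);
	int (*thaw)(void *dev);
	int (*poweroff)(void *dev);
	int (*restore)(void *dev);
};

/* Same shape as SET_SYSTEM_SLEEP_PM_OPS(): one routine for
 * suspend/freeze/poweroff and one for resume/thaw/restore. */
#define MOCK_SLEEP_PM_OPS(suspend_fn, resume_fn)	\
	.suspend = suspend_fn, .resume = resume_fn,	\
	.freeze = suspend_fn, .thaw = resume_fn,	\
	.poweroff = suspend_fn, .restore = resume_fn,

static int mock_freeze(void *dev) { (void)dev; return 0; }
static int mock_restore(void *dev) { (void)dev; return 0; }

/* S3 (suspend/resume) and S4 (freeze/restore) share the same paths. */
static const struct mock_pm_ops mock_virtio_pm_ops = {
	MOCK_SLEEP_PM_OPS(mock_freeze, mock_restore)
};
```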

A couple of cleanups are included: there's no need for special thaw
processing in the balloon driver, so that's addressed in patches 1 and
2.

Testing: both S3 and S4 support have been tested with these patches,
using a method similar to the one used earlier during S4 patch
development: a guest is started with virtio-blk as the only disk, a
virtio network card, a virtio-serial port and a virtio balloon device.
Ping from guest to host, dd from /dev/zero to a file on the disk, and IO
from the host on the virtio-serial port were all run at once while
exercising S4 and S3 (separately).  They all continue to work fine after
resume.  Virtio balloon values were also tested by inflating and
deflating the balloon.

Please review and apply,

Thanks,

Amit Shah (5):
  virtio: balloon: Allow stats update after restore from S4
  virtio: drop thaw PM operation
  virtio-pci: drop restore_common()
  virtio-pci: S3 support
  virtio-pci: switch to PM ops macro to initialise PM functions

 drivers/virtio/virtio_balloon.c |   14 -------
 drivers/virtio/virtio_pci.c     |   74 ++++----------------------------------
 include/linux/virtio.h          |    1 -
 3 files changed, 8 insertions(+), 81 deletions(-)

-- 
1.7.7.6

^ permalink raw reply

