[patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2)

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2)
@ 2014-11-25 17:21 Marcelo Tosatti
  2014-11-25 17:21 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti
  2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  0 siblings, 2 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 17:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm

The problem:

On -RT, an emulated LAPIC timer instances has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.

cyclictest command line:
# cyclictest -m -n -q -p99 -l 1000000 -h60  -D 1m

This patch reduces the average latency in my tests from 14us to 11us.

v2: improve changelog (Rik van Riel)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq
  2014-11-25 17:21 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2) Marcelo Tosatti
@ 2014-11-25 17:21 ` Marcelo Tosatti
  2014-11-25 18:57   ` Rik van Riel
  2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  1 sibling, 1 reply; 14+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 17:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm, Marcelo Tosatti

[-- Attachment #1: kvm-wq-use-simple-waitqueue --]
[-- Type: text/plain, Size: 14747 bytes --]

The problem:

On -RT, an emulated LAPIC timer instances has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the 
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT 
are sleepable locks. Therefore, waking up a waitqueue 
waiter involves locking a sleeping lock, which 
is not allowed from hard interrupt context.

cyclictest command line:
# cyclictest -m -n -q -p99 -l 1000000 -h60  -D 1m

This patch reduces the average latency in my tests from 14us to 11us.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/arm/kvm/arm.c                  |    4 ++--
 arch/arm/kvm/psci.c                 |    4 ++--
 arch/mips/kvm/kvm_mips.c            |    8 ++++----
 arch/powerpc/include/asm/kvm_host.h |    4 ++--
 arch/powerpc/kvm/book3s_hv.c        |   20 ++++++++++----------
 arch/s390/include/asm/kvm_host.h    |    2 +-
 arch/s390/kvm/interrupt.c           |   22 ++++++++++------------
 arch/s390/kvm/sigp.c                |   16 ++++++++--------
 arch/x86/kvm/lapic.c                |    6 +++---
 include/linux/kvm_host.h            |    4 ++--
 virt/kvm/async_pf.c                 |    4 ++--
 virt/kvm/kvm_main.c                 |   16 ++++++++--------
 12 files changed, 54 insertions(+), 56 deletions(-)

Index: linux-stable-rt/arch/arm/kvm/arm.c
===================================================================
--- linux-stable-rt.orig/arch/arm/kvm/arm.c	2014-11-25 14:13:39.188899952 -0200
+++ linux-stable-rt/arch/arm/kvm/arm.c	2014-11-25 14:14:38.620810092 -0200
@@ -495,9 +495,9 @@
 
 static void vcpu_pause(struct kvm_vcpu *vcpu)
 {
-	wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
+	struct swait_head *wq = kvm_arch_vcpu_wq(vcpu);
 
-	wait_event_interruptible(*wq, !vcpu->arch.pause);
+	swait_event_interruptible(*wq, !vcpu->arch.pause);
 }
 
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
Index: linux-stable-rt/arch/arm/kvm/psci.c
===================================================================
--- linux-stable-rt.orig/arch/arm/kvm/psci.c	2014-11-25 14:13:39.189899951 -0200
+++ linux-stable-rt/arch/arm/kvm/psci.c	2014-11-25 14:14:38.620810092 -0200
@@ -36,7 +36,7 @@
 {
 	struct kvm *kvm = source_vcpu->kvm;
 	struct kvm_vcpu *vcpu = NULL, *tmp;
-	wait_queue_head_t *wq;
+	struct swait_head *wq;
 	unsigned long cpu_id;
 	unsigned long mpidr;
 	phys_addr_t target_pc;
@@ -80,7 +80,7 @@
 	smp_mb();		/* Make sure the above is visible */
 
 	wq = kvm_arch_vcpu_wq(vcpu);
-	wake_up_interruptible(wq);
+	swait_wake_interruptible(wq);
 
 	return KVM_PSCI_RET_SUCCESS;
 }
Index: linux-stable-rt/arch/mips/kvm/kvm_mips.c
===================================================================
--- linux-stable-rt.orig/arch/mips/kvm/kvm_mips.c	2014-11-25 14:13:39.191899948 -0200
+++ linux-stable-rt/arch/mips/kvm/kvm_mips.c	2014-11-25 14:14:38.621810091 -0200
@@ -464,8 +464,8 @@
 
 	dvcpu->arch.wait = 0;
 
-	if (waitqueue_active(&dvcpu->wq)) {
-		wake_up_interruptible(&dvcpu->wq);
+	if (swaitqueue_active(&dvcpu->wq)) {
+		swait_wake_interruptible(&dvcpu->wq);
 	}
 
 	return 0;
@@ -971,8 +971,8 @@
 	kvm_mips_callbacks->queue_timer_int(vcpu);
 
 	vcpu->arch.wait = 0;
-	if (waitqueue_active(&vcpu->wq)) {
-		wake_up_interruptible(&vcpu->wq);
+	if (swaitqueue_active(&vcpu->wq)) {
+		swait_wake_nterruptible(&vcpu->wq);
 	}
 }
 
Index: linux-stable-rt/arch/powerpc/include/asm/kvm_host.h
===================================================================
--- linux-stable-rt.orig/arch/powerpc/include/asm/kvm_host.h	2014-11-25 14:13:39.193899944 -0200
+++ linux-stable-rt/arch/powerpc/include/asm/kvm_host.h	2014-11-25 14:14:38.621810091 -0200
@@ -295,7 +295,7 @@
 	u8 in_guest;
 	struct list_head runnable_threads;
 	spinlock_t lock;
-	wait_queue_head_t wq;
+	struct swait_head wq;
 	u64 stolen_tb;
 	u64 preempt_tb;
 	struct kvm_vcpu *runner;
@@ -612,7 +612,7 @@
 	u8 prodded;
 	u32 last_inst;
 
-	wait_queue_head_t *wqp;
+	struct swait_head *wqp;
 	struct kvmppc_vcore *vcore;
 	int ret;
 	int trap;
Index: linux-stable-rt/arch/powerpc/kvm/book3s_hv.c
===================================================================
--- linux-stable-rt.orig/arch/powerpc/kvm/book3s_hv.c	2014-11-25 14:13:39.195899942 -0200
+++ linux-stable-rt/arch/powerpc/kvm/book3s_hv.c	2014-11-25 14:14:38.625810085 -0200
@@ -74,11 +74,11 @@
 {
 	int me;
 	int cpu = vcpu->cpu;
-	wait_queue_head_t *wqp;
+	struct swait_head *wqp;
 
 	wqp = kvm_arch_vcpu_wq(vcpu);
-	if (waitqueue_active(wqp)) {
-		wake_up_interruptible(wqp);
+	if (swaitqueue_active(wqp)) {
+		swait_wake_interruptible(wqp);
 		++vcpu->stat.halt_wakeup;
 	}
 
@@ -583,8 +583,8 @@
 		tvcpu->arch.prodded = 1;
 		smp_mb();
 		if (vcpu->arch.ceded) {
-			if (waitqueue_active(&vcpu->wq)) {
-				wake_up_interruptible(&vcpu->wq);
+			if (swaitqueue_active(&vcpu->wq)) {
+				swait_wake_interruptible(&vcpu->wq);
 				vcpu->stat.halt_wakeup++;
 			}
 		}
@@ -1199,7 +1199,7 @@
 		if (vcore) {
 			INIT_LIST_HEAD(&vcore->runnable_threads);
 			spin_lock_init(&vcore->lock);
-			init_waitqueue_head(&vcore->wq);
+			init_swait_head(&vcore->wq);
 			vcore->preempt_tb = TB_NIL;
 			vcore->lpcr = kvm->arch.lpcr;
 			vcore->first_vcpuid = core * threads_per_core;
@@ -1566,13 +1566,13 @@
  */
 static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 {
-	DEFINE_WAIT(wait);
+	DEFINE_SWAITER(wait);
 
-	prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
+	swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE);
 	vc->vcore_state = VCORE_SLEEPING;
 	spin_unlock(&vc->lock);
 	schedule();
-	finish_wait(&vc->wq, &wait);
+	swait_finish(&vc->wq, &wait);
 	spin_lock(&vc->lock);
 	vc->vcore_state = VCORE_INACTIVE;
 }
@@ -1613,7 +1613,7 @@
 			kvmppc_create_dtl_entry(vcpu, vc);
 			kvmppc_start_thread(vcpu);
 		} else if (vc->vcore_state == VCORE_SLEEPING) {
-			wake_up(&vc->wq);
+			swait_wake(&vc->wq);
 		}
 
 	}
Index: linux-stable-rt/arch/s390/include/asm/kvm_host.h
===================================================================
--- linux-stable-rt.orig/arch/s390/include/asm/kvm_host.h	2014-11-25 14:13:39.196899940 -0200
+++ linux-stable-rt/arch/s390/include/asm/kvm_host.h	2014-11-25 14:14:38.627810082 -0200
@@ -233,7 +233,7 @@
 	atomic_t active;
 	struct kvm_s390_float_interrupt *float_int;
 	int timer_due; /* event indicator for waitqueue below */
-	wait_queue_head_t *wq;
+	struct swait_head *wq;
 	atomic_t *cpuflags;
 	unsigned int action_bits;
 };
Index: linux-stable-rt/arch/s390/kvm/interrupt.c
===================================================================
--- linux-stable-rt.orig/arch/s390/kvm/interrupt.c	2014-11-25 14:13:39.197899939 -0200
+++ linux-stable-rt/arch/s390/kvm/interrupt.c	2014-11-25 14:14:38.630810077 -0200
@@ -403,7 +403,7 @@
 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu)
 {
 	u64 now, sltime;
-	DECLARE_WAITQUEUE(wait, current);
+	DECLARE_SWAITER(wait);
 
 	vcpu->stat.exit_wait_state++;
 	if (kvm_cpu_has_interrupt(vcpu))
@@ -440,12 +440,11 @@
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 	spin_lock(&vcpu->arch.local_int.float_int->lock);
 	spin_lock_bh(&vcpu->arch.local_int.lock);
-	add_wait_queue(&vcpu->wq, &wait);
+	swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 	while (list_empty(&vcpu->arch.local_int.list) &&
 		list_empty(&vcpu->arch.local_int.float_int->list) &&
 		(!vcpu->arch.local_int.timer_due) &&
 		!signal_pending(current)) {
-		set_current_state(TASK_INTERRUPTIBLE);
 		spin_unlock_bh(&vcpu->arch.local_int.lock);
 		spin_unlock(&vcpu->arch.local_int.float_int->lock);
 		schedule();
@@ -453,8 +452,7 @@
 		spin_lock_bh(&vcpu->arch.local_int.lock);
 	}
 	__unset_cpu_idle(vcpu);
-	__set_current_state(TASK_RUNNING);
-	remove_wait_queue(&vcpu->wq, &wait);
+	swait_finish(&vcpu->wq, &wait);
 	spin_unlock_bh(&vcpu->arch.local_int.lock);
 	spin_unlock(&vcpu->arch.local_int.float_int->lock);
 	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
@@ -469,8 +467,8 @@
 
 	spin_lock(&vcpu->arch.local_int.lock);
 	vcpu->arch.local_int.timer_due = 1;
-	if (waitqueue_active(&vcpu->wq))
-		wake_up_interruptible(&vcpu->wq);
+	if (swaitqueue_active(&vcpu->wq))
+		swait_wake_interruptible(&vcpu->wq);
 	spin_unlock(&vcpu->arch.local_int.lock);
 }
 
@@ -617,7 +615,7 @@
 	spin_lock_bh(&li->lock);
 	list_add(&inti->list, &li->list);
 	atomic_set(&li->active, 1);
-	BUG_ON(waitqueue_active(li->wq));
+	BUG_ON(swaitqueue_active(li->wq));
 	spin_unlock_bh(&li->lock);
 	return 0;
 }
@@ -750,8 +748,8 @@
 	li = fi->local_int[sigcpu];
 	spin_lock_bh(&li->lock);
 	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
-	if (waitqueue_active(li->wq))
-		wake_up_interruptible(li->wq);
+	if (swaitqueue_active(li->wq))
+		swait_wake_interruptible(li->wq);
 	spin_unlock_bh(&li->lock);
 	spin_unlock(&fi->lock);
 	mutex_unlock(&kvm->lock);
@@ -836,8 +834,8 @@
 	if (inti->type == KVM_S390_SIGP_STOP)
 		li->action_bits |= ACTION_STOP_ON_STOP;
 	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
-	if (waitqueue_active(&vcpu->wq))
-		wake_up_interruptible(&vcpu->wq);
+	if (swaitqueue_active(&vcpu->wq))
+		swait_wake_interruptible(&vcpu->wq);
 	spin_unlock_bh(&li->lock);
 	mutex_unlock(&vcpu->kvm->lock);
 	return 0;
Index: linux-stable-rt/arch/s390/kvm/sigp.c
===================================================================
--- linux-stable-rt.orig/arch/s390/kvm/sigp.c	2014-11-25 14:13:39.198899937 -0200
+++ linux-stable-rt/arch/s390/kvm/sigp.c	2014-11-25 14:14:38.633810073 -0200
@@ -79,8 +79,8 @@
 	list_add_tail(&inti->list, &li->list);
 	atomic_set(&li->active, 1);
 	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
-	if (waitqueue_active(li->wq))
-		wake_up_interruptible(li->wq);
+	if (swaitqueue_active(li->wq))
+		swait_wake_interruptible(li->wq);
 	spin_unlock_bh(&li->lock);
 	rc = SIGP_CC_ORDER_CODE_ACCEPTED;
 	VCPU_EVENT(vcpu, 4, "sent sigp emerg to cpu %x", cpu_addr);
@@ -148,8 +148,8 @@
 	list_add_tail(&inti->list, &li->list);
 	atomic_set(&li->active, 1);
 	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
-	if (waitqueue_active(li->wq))
-		wake_up_interruptible(li->wq);
+	if (swaitqueue_active(li->wq))
+		swait_wake_interruptible(li->wq);
 	spin_unlock_bh(&li->lock);
 	rc = SIGP_CC_ORDER_CODE_ACCEPTED;
 	VCPU_EVENT(vcpu, 4, "sent sigp ext call to cpu %x", cpu_addr);
@@ -179,8 +179,8 @@
 	atomic_set(&li->active, 1);
 	atomic_set_mask(CPUSTAT_STOP_INT, li->cpuflags);
 	li->action_bits |= action;
-	if (waitqueue_active(li->wq))
-		wake_up_interruptible(li->wq);
+	if (swaitqueue_active(li->wq))
+		swait_wake_interruptible(li->wq);
 out:
 	spin_unlock_bh(&li->lock);
 
@@ -288,8 +288,8 @@
 
 	list_add_tail(&inti->list, &li->list);
 	atomic_set(&li->active, 1);
-	if (waitqueue_active(li->wq))
-		wake_up_interruptible(li->wq);
+	if (swaitqueue_active(li->wq))
+		swait_wake_interruptible(li->wq);
 	rc = SIGP_CC_ORDER_CODE_ACCEPTED;
 
 	VCPU_EVENT(vcpu, 4, "set prefix of cpu %02x to %x", cpu_addr, address);
Index: linux-stable-rt/arch/x86/kvm/lapic.c
===================================================================
--- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:13:39.199899935 -0200
+++ linux-stable-rt/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
@@ -1533,7 +1533,7 @@
 	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
 	struct kvm_lapic *apic = container_of(ktimer, struct kvm_lapic, lapic_timer);
 	struct kvm_vcpu *vcpu = apic->vcpu;
-	wait_queue_head_t *q = &vcpu->wq;
+	struct swait_head *q = &vcpu->wq;
 
 	/*
 	 * There is a race window between reading and incrementing, but we do
@@ -1547,8 +1547,8 @@
 		kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
 	}
 
-	if (waitqueue_active(q))
-		wake_up_interruptible(q);
+	if (swaitqueue_active(q))
+		swait_wake_interruptible(q);
 
 	if (lapic_is_periodic(apic)) {
 		hrtimer_add_expires_ns(&ktimer->timer, ktimer->period);
Index: linux-stable-rt/include/linux/kvm_host.h
===================================================================
--- linux-stable-rt.orig/include/linux/kvm_host.h	2014-11-25 14:13:39.201899932 -0200
+++ linux-stable-rt/include/linux/kvm_host.h	2014-11-25 14:14:38.638810065 -0200
@@ -231,7 +231,7 @@
 
 	int fpu_active;
 	int guest_fpu_loaded, guest_xcr0_loaded;
-	wait_queue_head_t wq;
+	struct swait_head wq;
 	struct pid *pid;
 	int sigset_active;
 	sigset_t sigset;
@@ -677,7 +677,7 @@
 }
 #endif
 
-static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
+static inline struct swait_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
 {
 #ifdef __KVM_HAVE_ARCH_WQP
 	return vcpu->arch.wqp;
Index: linux-stable-rt/virt/kvm/async_pf.c
===================================================================
--- linux-stable-rt.orig/virt/kvm/async_pf.c	2014-11-25 14:13:39.202899931 -0200
+++ linux-stable-rt/virt/kvm/async_pf.c	2014-11-25 14:14:38.639810063 -0200
@@ -82,8 +82,8 @@
 
 	trace_kvm_async_pf_completed(addr, gva);
 
-	if (waitqueue_active(&vcpu->wq))
-		wake_up_interruptible(&vcpu->wq);
+	if (swaitqueue_active(&vcpu->wq))
+		swait_wake_interruptible(&vcpu->wq);
 
 	mmput(mm);
 	kvm_put_kvm(vcpu->kvm);
Index: linux-stable-rt/virt/kvm/kvm_main.c
===================================================================
--- linux-stable-rt.orig/virt/kvm/kvm_main.c	2014-11-25 14:13:39.204899928 -0200
+++ linux-stable-rt/virt/kvm/kvm_main.c	2014-11-25 14:14:38.641810060 -0200
@@ -219,7 +219,7 @@
 	vcpu->kvm = kvm;
 	vcpu->vcpu_id = id;
 	vcpu->pid = NULL;
-	init_waitqueue_head(&vcpu->wq);
+	init_swait_head(&vcpu->wq);
 	kvm_async_pf_vcpu_init(vcpu);
 
 	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
@@ -1675,10 +1675,10 @@
  */
 void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 {
-	DEFINE_WAIT(wait);
+	DEFINE_SWAITER(wait);
 
 	for (;;) {
-		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+		swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 
 		if (kvm_arch_vcpu_runnable(vcpu)) {
 			kvm_make_request(KVM_REQ_UNHALT, vcpu);
@@ -1692,7 +1692,7 @@
 		schedule();
 	}
 
-	finish_wait(&vcpu->wq, &wait);
+	swait_finish(&vcpu->wq, &wait);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_block);
 
@@ -1704,11 +1704,11 @@
 {
 	int me;
 	int cpu = vcpu->cpu;
-	wait_queue_head_t *wqp;
+	struct swait_head *wqp;
 
 	wqp = kvm_arch_vcpu_wq(vcpu);
-	if (waitqueue_active(wqp)) {
-		wake_up_interruptible(wqp);
+	if (swaitqueue_active(wqp)) {
+		swait_wake_interruptible(wqp);
 		++vcpu->stat.halt_wakeup;
 	}
 
@@ -1814,7 +1814,7 @@
 				continue;
 			if (vcpu == me)
 				continue;
-			if (waitqueue_active(&vcpu->wq))
+			if (swaitqueue_active(&vcpu->wq))
 				continue;
 			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 				continue;



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:21 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2) Marcelo Tosatti
  2014-11-25 17:21 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti
@ 2014-11-25 17:21 ` Marcelo Tosatti
  2014-11-25 17:38   ` Paolo Bonzini
  2014-11-25 19:11   ` Rik van Riel
  1 sibling, 2 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 17:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm, Marcelo Tosatti

[-- Attachment #1: kvm-lapic-irqsafe --]
[-- Type: text/plain, Size: 2681 bytes --]

Since lapic timer handler only wakes up a simple waitqueue,
it can be executed from hardirq context.

Reduces average cyclictest latency by 3us.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

Index: linux-stable-rt/arch/x86/kvm/lapic.c
===================================================================
--- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
+++ linux-stable-rt/arch/x86/kvm/lapic.c	2014-11-25 14:17:28.850567059 -0200
@@ -1031,8 +1031,38 @@
 				   apic->divide_count);
 }
 
+
+static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
+
+static void apic_timer_expired(struct hrtimer *data)
+{
+	int ret, i = 0;
+	enum hrtimer_restart r;
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+
+	r = apic_timer_fn(data);
+
+	if (r == HRTIMER_RESTART) {
+		do {
+			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
+			if (ret == -ETIME)
+				hrtimer_add_expires_ns(&ktimer->timer,
+							ktimer->period);
+			i++;
+		} while (ret == -ETIME && i < 10);
+
+		if (ret == -ETIME) {
+			printk(KERN_ERR "%s: failed to reprogram timer\n",
+			       __func__);
+			WARN_ON(1);
+		}
+	}
+}
+
+
 static void start_apic_timer(struct kvm_lapic *apic)
 {
+	int ret;
 	ktime_t now;
 	atomic_set(&apic->lapic_timer.pending, 0);
 
@@ -1062,9 +1092,11 @@
 			}
 		}
 
-		hrtimer_start(&apic->lapic_timer.timer,
+		ret = hrtimer_start(&apic->lapic_timer.timer,
 			      ktime_add_ns(now, apic->lapic_timer.period),
 			      HRTIMER_MODE_ABS);
+		if (ret == -ETIME)
+			apic_timer_expired(&apic->lapic_timer.timer);
 
 		apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
 			   PRIx64 ", "
@@ -1094,8 +1126,10 @@
 			ns = (tscdeadline - guest_tsc) * 1000000ULL;
 			do_div(ns, this_tsc_khz);
 		}
-		hrtimer_start(&apic->lapic_timer.timer,
+		ret = hrtimer_start(&apic->lapic_timer.timer,
 			ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+		if (ret == -ETIME)
+			apic_timer_expired(&apic->lapic_timer.timer);
 
 		local_irq_restore(flags);
 	}
@@ -1581,6 +1615,7 @@
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
+	apic->lapic_timer.timer.irqsafe = 1;
 
 	/*
 	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
@@ -1699,7 +1734,8 @@
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
+			apic_timer_expired(timer);
 }
 
 /*



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
@ 2014-11-25 17:38   ` Paolo Bonzini
  2014-11-25 19:01     ` Jan Kiszka
  2014-12-04 13:43     ` Marcelo Tosatti
  2014-11-25 19:11   ` Rik van Riel
  1 sibling, 2 replies; 14+ messages in thread
From: Paolo Bonzini @ 2014-11-25 17:38 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel
  Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm



On 25/11/2014 18:21, Marcelo Tosatti wrote:
> +
> +	if (r == HRTIMER_RESTART) {
> +		do {
> +			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
> +			if (ret == -ETIME)
> +				hrtimer_add_expires_ns(&ktimer->timer,
> +							ktimer->period);

Is it possible to just compute the time where the next interrupt
happens?  I suspect the printk and WARN_ON below can be easily triggered
by a guest.

Paolo

> +			i++;
> +		} while (ret == -ETIME && i < 10);
> +
> +		if (ret == -ETIME) {
> +			printk(KERN_ERR "%s: failed to reprogram timer\n",
> +			       __func__);
> +			WARN_ON(1);
> +		}
> +	}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq
  2014-11-25 17:21 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti
@ 2014-11-25 18:57   ` Rik van Riel
  2014-11-25 19:08     ` Rik van Riel
                       ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Rik van Riel @ 2014-11-25 18:57 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel
  Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm

On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> The problem:
> 
> On -RT, an emulated LAPIC timer instances has the following path:
> 
> 1) hard interrupt
> 2) ksoftirqd is scheduled
> 3) ksoftirqd wakes up vcpu thread
> 4) vcpu thread is scheduled
> 
> This extra context switch introduces unnecessary latency in the 
> LAPIC path for a KVM guest.
> 
> The solution:
> 
> Allow waking up vcpu thread from hardirq context,
> thus avoiding the need for ksoftirqd to be scheduled.
> 
> Normal waitqueues make use of spinlocks, which on -RT 
> are sleepable locks. Therefore, waking up a waitqueue 
> waiter involves locking a sleeping lock, which 
> is not allowed from hard interrupt context.
> 

What are the consequences for the realtime kernel of
making waitqueue traversal non-preemptible?

Is waitqueue traversal something that ever takes so
long we need to care about it being non-preemptible?



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:38   ` Paolo Bonzini
@ 2014-11-25 19:01     ` Jan Kiszka
  2014-12-04 13:43     ` Marcelo Tosatti
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Kiszka @ 2014-11-25 19:01 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel; +Cc: kvm

On 2014-11-25 18:38, Paolo Bonzini wrote:
> 
> 
> On 25/11/2014 18:21, Marcelo Tosatti wrote:
>> +
>> +	if (r == HRTIMER_RESTART) {
>> +		do {
>> +			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
>> +			if (ret == -ETIME)
>> +				hrtimer_add_expires_ns(&ktimer->timer,
>> +							ktimer->period);
> 
> Is it possible to just compute the time where the next interrupt
> happens?  I suspect the printk and WARN_ON below can be easily triggered
> by a guest.

We have a lower bound for the period that a guest can program. Unless
that value is set too low, this should practically not happen if we
avoid disturbances while handling the event and reprogramming the next
one (irqs off?).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq
  2014-11-25 18:57   ` Rik van Riel
@ 2014-11-25 19:08     ` Rik van Riel
  2014-11-25 19:30     ` Marcelo Tosatti
  2014-11-25 20:23     ` Thomas Gleixner
  2 siblings, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2014-11-25 19:08 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel
  Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm

On 11/25/2014 01:57 PM, Rik van Riel wrote:
> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
>> The problem:
>>
>> On -RT, an emulated LAPIC timer instances has the following path:
>>
>> 1) hard interrupt
>> 2) ksoftirqd is scheduled
>> 3) ksoftirqd wakes up vcpu thread
>> 4) vcpu thread is scheduled
>>
>> This extra context switch introduces unnecessary latency in the 
>> LAPIC path for a KVM guest.
>>
>> The solution:
>>
>> Allow waking up vcpu thread from hardirq context,
>> thus avoiding the need for ksoftirqd to be scheduled.
>>
>> Normal waitqueues make use of spinlocks, which on -RT 
>> are sleepable locks. Therefore, waking up a waitqueue 
>> waiter involves locking a sleeping lock, which 
>> is not allowed from hard interrupt context.
>>
> 
> What are the consequences for the realtime kernel of
> making waitqueue traversal non-preemptible?
> 
> Is waitqueue traversal something that ever takes so
> long we need to care about it being non-preemptible?

I answered my own question.

This patch only changes the kvm vcpu waitqueue,
which should only ever have the vcpu thread itself
waiting on it. In other words, it is a wait "queue"
of just one entry long, and the latency of traversing
it will be absolutely minimal.

Unless someone can think of a better way to implement
this patch, it gets my:

Acked-by: Rik van Riel <riel@redhat.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  2014-11-25 17:38   ` Paolo Bonzini
@ 2014-11-25 19:11   ` Rik van Riel
  2014-11-25 20:24     ` Thomas Gleixner
  1 sibling, 1 reply; 14+ messages in thread
From: Rik van Riel @ 2014-11-25 19:11 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel
  Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm

On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> Since lapic timer handler only wakes up a simple waitqueue,
> it can be executed from hardirq context.
> 
> Reduces average cyclictest latency by 3us.

Can this patch be merged in the KVM tree, and go
upstream via Paolo?

> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq
  2014-11-25 18:57   ` Rik van Riel
  2014-11-25 19:08     ` Rik van Riel
@ 2014-11-25 19:30     ` Marcelo Tosatti
  2014-11-25 20:23     ` Thomas Gleixner
  2 siblings, 0 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 19:30 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, Luiz Capitulino, Steven Rostedt, Thomas Gleixner,
	kvm

On Tue, Nov 25, 2014 at 01:57:37PM -0500, Rik van Riel wrote:
> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> > The problem:
> > 
> > On -RT, an emulated LAPIC timer instances has the following path:
> > 
> > 1) hard interrupt
> > 2) ksoftirqd is scheduled
> > 3) ksoftirqd wakes up vcpu thread
> > 4) vcpu thread is scheduled
> > 
> > This extra context switch introduces unnecessary latency in the 
> > LAPIC path for a KVM guest.
> > 
> > The solution:
> > 
> > Allow waking up vcpu thread from hardirq context,
> > thus avoiding the need for ksoftirqd to be scheduled.
> > 
> > Normal waitqueues make use of spinlocks, which on -RT 
> > are sleepable locks. Therefore, waking up a waitqueue 
> > waiter involves locking a sleeping lock, which 
> > is not allowed from hard interrupt context.
> > 
> 
> What are the consequences for the realtime kernel of
> making waitqueue traversal non-preemptible?
> 
> Is waitqueue traversal something that ever takes so
> long we need to care about it being non-preemptible?

commit d872cfa018625071a3a6b1ea8a62a2790d2c157a
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Dec 12 12:29:04 2011 +0100

    wait-simple: Simple waitqueue implementation

    wait_queue is a swiss army knife and in most of the cases the
    complexity is not needed. For RT waitqueues are a constant source of
    trouble as we can't convert the head lock to a raw spinlock due to
    fancy and long lasting callbacks.

Unsure about the details...


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq
  2014-11-25 18:57   ` Rik van Riel
  2014-11-25 19:08     ` Rik van Riel
  2014-11-25 19:30     ` Marcelo Tosatti
@ 2014-11-25 20:23     ` Thomas Gleixner
  2 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2014-11-25 20:23 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Marcelo Tosatti, linux-kernel, Luiz Capitulino, Steven Rostedt,
	kvm

On Tue, 25 Nov 2014, Rik van Riel wrote:
> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> > The problem:
> > 
> > On -RT, an emulated LAPIC timer instances has the following path:
> > 
> > 1) hard interrupt
> > 2) ksoftirqd is scheduled
> > 3) ksoftirqd wakes up vcpu thread
> > 4) vcpu thread is scheduled
> > 
> > This extra context switch introduces unnecessary latency in the 
> > LAPIC path for a KVM guest.
> > 
> > The solution:
> > 
> > Allow waking up vcpu thread from hardirq context,
> > thus avoiding the need for ksoftirqd to be scheduled.
> > 
> > Normal waitqueues make use of spinlocks, which on -RT 
> > are sleepable locks. Therefore, waking up a waitqueue 
> > waiter involves locking a sleeping lock, which 
> > is not allowed from hard interrupt context.
> > 
> 
> What are the consequences for the realtime kernel of
> making waitqueue traversal non-preemptible?
> 
> Is waitqueue traversal something that ever takes so
> long we need to care about it being non-preemptible?

__wake_up
	lock_irq_save()
	__wake_up_common()
		for_each_entry(curr)
			curr->func()
	unlock_irq_save()

There are two issues here:

1) Long waiter chains. We probably could work around that in some
   creative way

2) The non default callbacks. default is default_wake_function.

   The non default callbacks, which are used by nfsd and other stuff
   are calling the world and some more and take preemptible locks and
   do other fancy stuff which falls flat on its nose when called under
   a raw lock on rt

So I created the swait replacement which has a smaller memory
footprint, less featuritis and a raw lock because its only wakes up,
no custom callbacks.

The still suffer #1, but most use cases on RT are single waiter
scenarios anyway.

Hope that helps.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 19:11   ` Rik van Riel
@ 2014-11-25 20:24     ` Thomas Gleixner
  2014-12-04 13:53       ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2014-11-25 20:24 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Marcelo Tosatti, linux-kernel, Luiz Capitulino, Steven Rostedt,
	kvm

On Tue, 25 Nov 2014, Rik van Riel wrote:
> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> > Since lapic timer handler only wakes up a simple waitqueue,
> > it can be executed from hardirq context.
> > 
> > Reduces average cyclictest latency by 3us.
> 
> Can this patch be merged in the KVM tree, and go
> upstream via Paolo?

Not really as it has RT dependencies ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 17:38   ` Paolo Bonzini
  2014-11-25 19:01     ` Jan Kiszka
@ 2014-12-04 13:43     ` Marcelo Tosatti
  1 sibling, 0 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2014-12-04 13:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Luiz Capitulino, Rik van Riel, Steven Rostedt,
	Thomas Gleixner, kvm

On Tue, Nov 25, 2014 at 06:38:04PM +0100, Paolo Bonzini wrote:
> 
> 
> On 25/11/2014 18:21, Marcelo Tosatti wrote:
> > +
> > +	if (r == HRTIMER_RESTART) {
> > +		do {
> > +			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
> > +			if (ret == -ETIME)
> > +				hrtimer_add_expires_ns(&ktimer->timer,
> > +							ktimer->period);
> 
> Is it possible to just compute the time where the next interrupt
> happens?  

Yes but there is no guarantee hrtimer_start_expires will not fail.

> I suspect the printk and WARN_ON below can be easily triggered
> by a guest.

I'll remove those, thanks.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 20:24     ` Thomas Gleixner
@ 2014-12-04 13:53       ` Paolo Bonzini
  2014-12-05 16:17         ` Marcelo Tosatti
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2014-12-04 13:53 UTC (permalink / raw)
  To: Thomas Gleixner, Rik van Riel
  Cc: Marcelo Tosatti, linux-kernel, Luiz Capitulino, Steven Rostedt,
	kvm



On 25/11/2014 21:24, Thomas Gleixner wrote:
> On Tue, 25 Nov 2014, Rik van Riel wrote:
>> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
>>> Since lapic timer handler only wakes up a simple waitqueue,
>>> it can be executed from hardirq context.
>>>
>>> Reduces average cyclictest latency by 3us.
>>
>> Can this patch be merged in the KVM tree, and go
>> upstream via Paolo?
> 
> Not really as it has RT dependencies ....

Can hrtimer_start_expires even return ETIME on a non-RT kernel?  If yes,
I can take patch 2.  If not, it's better to keep both patches in the RT
tree.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-12-04 13:53       ` Paolo Bonzini
@ 2014-12-05 16:17         ` Marcelo Tosatti
  0 siblings, 0 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2014-12-05 16:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Thomas Gleixner, Rik van Riel, linux-kernel, Luiz Capitulino,
	Steven Rostedt, kvm

On Thu, Dec 04, 2014 at 02:53:26PM +0100, Paolo Bonzini wrote:
> 
> 
> On 25/11/2014 21:24, Thomas Gleixner wrote:
> > On Tue, 25 Nov 2014, Rik van Riel wrote:
> >> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> >>> Since lapic timer handler only wakes up a simple waitqueue,
> >>> it can be executed from hardirq context.
> >>>
> >>> Reduces average cyclictest latency by 3us.
> >>
> >> Can this patch be merged in the KVM tree, and go
> >> upstream via Paolo?
> > 
> > Not really as it has RT dependencies ....
> 
> Can hrtimer_start_expires even return ETIME on a non-RT kernel?  If yes,
> I can take patch 2.  If not, it's better to keep both patches in the RT
> tree.
> 
> Paolo

It can't. We're still evaluating the necessity of this patch, will
resubmit accordingly.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-12-05 16:17 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-25 17:21 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2) Marcelo Tosatti
2014-11-25 17:21 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti
2014-11-25 18:57   ` Rik van Riel
2014-11-25 19:08     ` Rik van Riel
2014-11-25 19:30     ` Marcelo Tosatti
2014-11-25 20:23     ` Thomas Gleixner
2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
2014-11-25 17:38   ` Paolo Bonzini
2014-11-25 19:01     ` Jan Kiszka
2014-12-04 13:43     ` Marcelo Tosatti
2014-11-25 19:11   ` Rik van Riel
2014-11-25 20:24     ` Thomas Gleixner
2014-12-04 13:53       ` Paolo Bonzini
2014-12-05 16:17         ` Marcelo Tosatti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).