* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2) @ 2014-11-25 17:21 Marcelo Tosatti
  2014-11-25 17:21 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti
  2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  0 siblings, 2 replies; 19+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 17:21 UTC (permalink / raw)
To: linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm

The problem:

On -RT, an emulated LAPIC timer instance has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.

cyclictest command line:
# cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m

This patch reduces the average latency in my tests from 14us to 11us.

v2: improve changelog (Rik van Riel)

^ permalink raw reply	[flat|nested] 19+ messages in thread
* [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq
  2014-11-25 17:21 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2) Marcelo Tosatti
@ 2014-11-25 17:21 ` Marcelo Tosatti
  2014-11-25 18:57   ` Rik van Riel
  2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  1 sibling, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 17:21 UTC (permalink / raw)
To: linux-kernel
Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Marcelo Tosatti

[-- Attachment #1: kvm-wq-use-simple-waitqueue --]
[-- Type: text/plain, Size: 14747 bytes --]

The problem:

On -RT, an emulated LAPIC timer instance has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.

cyclictest command line:
# cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m

This patch reduces the average latency in my tests from 14us to 11us.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> --- arch/arm/kvm/arm.c | 4 ++-- arch/arm/kvm/psci.c | 4 ++-- arch/mips/kvm/kvm_mips.c | 8 ++++---- arch/powerpc/include/asm/kvm_host.h | 4 ++-- arch/powerpc/kvm/book3s_hv.c | 20 ++++++++++---------- arch/s390/include/asm/kvm_host.h | 2 +- arch/s390/kvm/interrupt.c | 22 ++++++++++------------ arch/s390/kvm/sigp.c | 16 ++++++++-------- arch/x86/kvm/lapic.c | 6 +++--- include/linux/kvm_host.h | 4 ++-- virt/kvm/async_pf.c | 4 ++-- virt/kvm/kvm_main.c | 16 ++++++++-------- 12 files changed, 54 insertions(+), 56 deletions(-) Index: linux-stable-rt/arch/arm/kvm/arm.c =================================================================== --- linux-stable-rt.orig/arch/arm/kvm/arm.c 2014-11-25 14:13:39.188899952 -0200 +++ linux-stable-rt/arch/arm/kvm/arm.c 2014-11-25 14:14:38.620810092 -0200 @@ -495,9 +495,9 @@ static void vcpu_pause(struct kvm_vcpu *vcpu) { - wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu); + struct swait_head *wq = kvm_arch_vcpu_wq(vcpu); - wait_event_interruptible(*wq, !vcpu->arch.pause); + swait_event_interruptible(*wq, !vcpu->arch.pause); } static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu) Index: linux-stable-rt/arch/arm/kvm/psci.c =================================================================== --- linux-stable-rt.orig/arch/arm/kvm/psci.c 2014-11-25 14:13:39.189899951 -0200 +++ linux-stable-rt/arch/arm/kvm/psci.c 2014-11-25 14:14:38.620810092 -0200 @@ -36,7 +36,7 @@ { struct kvm *kvm = source_vcpu->kvm; struct kvm_vcpu *vcpu = NULL, *tmp; - wait_queue_head_t *wq; + struct swait_head *wq; unsigned long cpu_id; unsigned long mpidr; phys_addr_t target_pc; @@ -80,7 +80,7 @@ smp_mb(); /* Make sure the above is visible */ wq = kvm_arch_vcpu_wq(vcpu); - wake_up_interruptible(wq); + swait_wake_interruptible(wq); return KVM_PSCI_RET_SUCCESS; } Index: linux-stable-rt/arch/mips/kvm/kvm_mips.c =================================================================== --- 
linux-stable-rt.orig/arch/mips/kvm/kvm_mips.c 2014-11-25 14:13:39.191899948 -0200 +++ linux-stable-rt/arch/mips/kvm/kvm_mips.c 2014-11-25 14:14:38.621810091 -0200 @@ -464,8 +464,8 @@ dvcpu->arch.wait = 0; - if (waitqueue_active(&dvcpu->wq)) { - wake_up_interruptible(&dvcpu->wq); + if (swaitqueue_active(&dvcpu->wq)) { + swait_wake_interruptible(&dvcpu->wq); } return 0; @@ -971,8 +971,8 @@ kvm_mips_callbacks->queue_timer_int(vcpu); vcpu->arch.wait = 0; - if (waitqueue_active(&vcpu->wq)) { - wake_up_interruptible(&vcpu->wq); + if (swaitqueue_active(&vcpu->wq)) { + swait_wake_interruptible(&vcpu->wq); } } Index: linux-stable-rt/arch/powerpc/include/asm/kvm_host.h =================================================================== --- linux-stable-rt.orig/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:13:39.193899944 -0200 +++ linux-stable-rt/arch/powerpc/include/asm/kvm_host.h 2014-11-25 14:14:38.621810091 -0200 @@ -295,7 +295,7 @@ u8 in_guest; struct list_head runnable_threads; spinlock_t lock; - wait_queue_head_t wq; + struct swait_head wq; u64 stolen_tb; u64 preempt_tb; struct kvm_vcpu *runner; @@ -612,7 +612,7 @@ u8 prodded; u32 last_inst; - wait_queue_head_t *wqp; + struct swait_head *wqp; struct kvmppc_vcore *vcore; int ret; int trap; Index: linux-stable-rt/arch/powerpc/kvm/book3s_hv.c =================================================================== --- linux-stable-rt.orig/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:13:39.195899942 -0200 +++ linux-stable-rt/arch/powerpc/kvm/book3s_hv.c 2014-11-25 14:14:38.625810085 -0200 @@ -74,11 +74,11 @@ { int me; int cpu = vcpu->cpu; - wait_queue_head_t *wqp; + struct swait_head *wqp; wqp = kvm_arch_vcpu_wq(vcpu); - if (waitqueue_active(wqp)) { - wake_up_interruptible(wqp); + if (swaitqueue_active(wqp)) { + swait_wake_interruptible(wqp); ++vcpu->stat.halt_wakeup; } @@ -583,8 +583,8 @@ tvcpu->arch.prodded = 1; smp_mb(); if (vcpu->arch.ceded) { - if (waitqueue_active(&vcpu->wq)) { - wake_up_interruptible(&vcpu->wq); + if 
(swaitqueue_active(&vcpu->wq)) { + swait_wake_interruptible(&vcpu->wq); vcpu->stat.halt_wakeup++; } } @@ -1199,7 +1199,7 @@ if (vcore) { INIT_LIST_HEAD(&vcore->runnable_threads); spin_lock_init(&vcore->lock); - init_waitqueue_head(&vcore->wq); + init_swait_head(&vcore->wq); vcore->preempt_tb = TB_NIL; vcore->lpcr = kvm->arch.lpcr; vcore->first_vcpuid = core * threads_per_core; @@ -1566,13 +1566,13 @@ */ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) { - DEFINE_WAIT(wait); + DEFINE_SWAITER(wait); - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); + swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE); vc->vcore_state = VCORE_SLEEPING; spin_unlock(&vc->lock); schedule(); - finish_wait(&vc->wq, &wait); + swait_finish(&vc->wq, &wait); spin_lock(&vc->lock); vc->vcore_state = VCORE_INACTIVE; } @@ -1613,7 +1613,7 @@ kvmppc_create_dtl_entry(vcpu, vc); kvmppc_start_thread(vcpu); } else if (vc->vcore_state == VCORE_SLEEPING) { - wake_up(&vc->wq); + swait_wake(&vc->wq); } } Index: linux-stable-rt/arch/s390/include/asm/kvm_host.h =================================================================== --- linux-stable-rt.orig/arch/s390/include/asm/kvm_host.h 2014-11-25 14:13:39.196899940 -0200 +++ linux-stable-rt/arch/s390/include/asm/kvm_host.h 2014-11-25 14:14:38.627810082 -0200 @@ -233,7 +233,7 @@ atomic_t active; struct kvm_s390_float_interrupt *float_int; int timer_due; /* event indicator for waitqueue below */ - wait_queue_head_t *wq; + struct swait_head *wq; atomic_t *cpuflags; unsigned int action_bits; }; Index: linux-stable-rt/arch/s390/kvm/interrupt.c =================================================================== --- linux-stable-rt.orig/arch/s390/kvm/interrupt.c 2014-11-25 14:13:39.197899939 -0200 +++ linux-stable-rt/arch/s390/kvm/interrupt.c 2014-11-25 14:14:38.630810077 -0200 @@ -403,7 +403,7 @@ int kvm_s390_handle_wait(struct kvm_vcpu *vcpu) { u64 now, sltime; - DECLARE_WAITQUEUE(wait, current); + DECLARE_SWAITER(wait); vcpu->stat.exit_wait_state++; 
if (kvm_cpu_has_interrupt(vcpu)) @@ -440,12 +440,11 @@ srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); spin_lock(&vcpu->arch.local_int.float_int->lock); spin_lock_bh(&vcpu->arch.local_int.lock); - add_wait_queue(&vcpu->wq, &wait); + swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); while (list_empty(&vcpu->arch.local_int.list) && list_empty(&vcpu->arch.local_int.float_int->list) && (!vcpu->arch.local_int.timer_due) && !signal_pending(current)) { - set_current_state(TASK_INTERRUPTIBLE); spin_unlock_bh(&vcpu->arch.local_int.lock); spin_unlock(&vcpu->arch.local_int.float_int->lock); schedule(); @@ -453,8 +452,7 @@ spin_lock_bh(&vcpu->arch.local_int.lock); } __unset_cpu_idle(vcpu); - __set_current_state(TASK_RUNNING); - remove_wait_queue(&vcpu->wq, &wait); + swait_finish(&vcpu->wq, &wait); spin_unlock_bh(&vcpu->arch.local_int.lock); spin_unlock(&vcpu->arch.local_int.float_int->lock); vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); @@ -469,8 +467,8 @@ spin_lock(&vcpu->arch.local_int.lock); vcpu->arch.local_int.timer_due = 1; - if (waitqueue_active(&vcpu->wq)) - wake_up_interruptible(&vcpu->wq); + if (swaitqueue_active(&vcpu->wq)) + swait_wake_interruptible(&vcpu->wq); spin_unlock(&vcpu->arch.local_int.lock); } @@ -617,7 +615,7 @@ spin_lock_bh(&li->lock); list_add(&inti->list, &li->list); atomic_set(&li->active, 1); - BUG_ON(waitqueue_active(li->wq)); + BUG_ON(swaitqueue_active(li->wq)); spin_unlock_bh(&li->lock); return 0; } @@ -750,8 +748,8 @@ li = fi->local_int[sigcpu]; spin_lock_bh(&li->lock); atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); spin_unlock_bh(&li->lock); spin_unlock(&fi->lock); mutex_unlock(&kvm->lock); @@ -836,8 +834,8 @@ if (inti->type == KVM_S390_SIGP_STOP) li->action_bits |= ACTION_STOP_ON_STOP; atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); - if (waitqueue_active(&vcpu->wq)) - 
wake_up_interruptible(&vcpu->wq); + if (swaitqueue_active(&vcpu->wq)) + swait_wake_interruptible(&vcpu->wq); spin_unlock_bh(&li->lock); mutex_unlock(&vcpu->kvm->lock); return 0; Index: linux-stable-rt/arch/s390/kvm/sigp.c =================================================================== --- linux-stable-rt.orig/arch/s390/kvm/sigp.c 2014-11-25 14:13:39.198899937 -0200 +++ linux-stable-rt/arch/s390/kvm/sigp.c 2014-11-25 14:14:38.633810073 -0200 @@ -79,8 +79,8 @@ list_add_tail(&inti->list, &li->list); atomic_set(&li->active, 1); atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); spin_unlock_bh(&li->lock); rc = SIGP_CC_ORDER_CODE_ACCEPTED; VCPU_EVENT(vcpu, 4, "sent sigp emerg to cpu %x", cpu_addr); @@ -148,8 +148,8 @@ list_add_tail(&inti->list, &li->list); atomic_set(&li->active, 1); atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags); - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); spin_unlock_bh(&li->lock); rc = SIGP_CC_ORDER_CODE_ACCEPTED; VCPU_EVENT(vcpu, 4, "sent sigp ext call to cpu %x", cpu_addr); @@ -179,8 +179,8 @@ atomic_set(&li->active, 1); atomic_set_mask(CPUSTAT_STOP_INT, li->cpuflags); li->action_bits |= action; - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); out: spin_unlock_bh(&li->lock); @@ -288,8 +288,8 @@ list_add_tail(&inti->list, &li->list); atomic_set(&li->active, 1); - if (waitqueue_active(li->wq)) - wake_up_interruptible(li->wq); + if (swaitqueue_active(li->wq)) + swait_wake_interruptible(li->wq); rc = SIGP_CC_ORDER_CODE_ACCEPTED; VCPU_EVENT(vcpu, 4, "set prefix of cpu %02x to %x", cpu_addr, address); Index: linux-stable-rt/arch/x86/kvm/lapic.c =================================================================== --- 
linux-stable-rt.orig/arch/x86/kvm/lapic.c 2014-11-25 14:13:39.199899935 -0200 +++ linux-stable-rt/arch/x86/kvm/lapic.c 2014-11-25 14:14:38.636810068 -0200 @@ -1533,7 +1533,7 @@ struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); struct kvm_lapic *apic = container_of(ktimer, struct kvm_lapic, lapic_timer); struct kvm_vcpu *vcpu = apic->vcpu; - wait_queue_head_t *q = &vcpu->wq; + struct swait_head *q = &vcpu->wq; /* * There is a race window between reading and incrementing, but we do @@ -1547,8 +1547,8 @@ kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); } - if (waitqueue_active(q)) - wake_up_interruptible(q); + if (swaitqueue_active(q)) + swait_wake_interruptible(q); if (lapic_is_periodic(apic)) { hrtimer_add_expires_ns(&ktimer->timer, ktimer->period); Index: linux-stable-rt/include/linux/kvm_host.h =================================================================== --- linux-stable-rt.orig/include/linux/kvm_host.h 2014-11-25 14:13:39.201899932 -0200 +++ linux-stable-rt/include/linux/kvm_host.h 2014-11-25 14:14:38.638810065 -0200 @@ -231,7 +231,7 @@ int fpu_active; int guest_fpu_loaded, guest_xcr0_loaded; - wait_queue_head_t wq; + struct swait_head wq; struct pid *pid; int sigset_active; sigset_t sigset; @@ -677,7 +677,7 @@ } #endif -static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) +static inline struct swait_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) { #ifdef __KVM_HAVE_ARCH_WQP return vcpu->arch.wqp; Index: linux-stable-rt/virt/kvm/async_pf.c =================================================================== --- linux-stable-rt.orig/virt/kvm/async_pf.c 2014-11-25 14:13:39.202899931 -0200 +++ linux-stable-rt/virt/kvm/async_pf.c 2014-11-25 14:14:38.639810063 -0200 @@ -82,8 +82,8 @@ trace_kvm_async_pf_completed(addr, gva); - if (waitqueue_active(&vcpu->wq)) - wake_up_interruptible(&vcpu->wq); + if (swaitqueue_active(&vcpu->wq)) + swait_wake_interruptible(&vcpu->wq); mmput(mm); kvm_put_kvm(vcpu->kvm); Index: 
linux-stable-rt/virt/kvm/kvm_main.c =================================================================== --- linux-stable-rt.orig/virt/kvm/kvm_main.c 2014-11-25 14:13:39.204899928 -0200 +++ linux-stable-rt/virt/kvm/kvm_main.c 2014-11-25 14:14:38.641810060 -0200 @@ -219,7 +219,7 @@ vcpu->kvm = kvm; vcpu->vcpu_id = id; vcpu->pid = NULL; - init_waitqueue_head(&vcpu->wq); + init_swait_head(&vcpu->wq); kvm_async_pf_vcpu_init(vcpu); page = alloc_page(GFP_KERNEL | __GFP_ZERO); @@ -1675,10 +1675,10 @@ */ void kvm_vcpu_block(struct kvm_vcpu *vcpu) { - DEFINE_WAIT(wait); + DEFINE_SWAITER(wait); for (;;) { - prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); + swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); if (kvm_arch_vcpu_runnable(vcpu)) { kvm_make_request(KVM_REQ_UNHALT, vcpu); @@ -1692,7 +1692,7 @@ schedule(); } - finish_wait(&vcpu->wq, &wait); + swait_finish(&vcpu->wq, &wait); } EXPORT_SYMBOL_GPL(kvm_vcpu_block); @@ -1704,11 +1704,11 @@ { int me; int cpu = vcpu->cpu; - wait_queue_head_t *wqp; + struct swait_head *wqp; wqp = kvm_arch_vcpu_wq(vcpu); - if (waitqueue_active(wqp)) { - wake_up_interruptible(wqp); + if (swaitqueue_active(wqp)) { + swait_wake_interruptible(wqp); ++vcpu->stat.halt_wakeup; } @@ -1814,7 +1814,7 @@ continue; if (vcpu == me) continue; - if (waitqueue_active(&vcpu->wq)) + if (swaitqueue_active(&vcpu->wq)) continue; if (!kvm_vcpu_eligible_for_directed_yield(vcpu)) continue; ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2014-11-25 17:21 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti @ 2014-11-25 18:57 ` Rik van Riel 2014-11-25 19:08 ` Rik van Riel ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Rik van Riel @ 2014-11-25 18:57 UTC (permalink / raw) To: Marcelo Tosatti, linux-kernel Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm On 11/25/2014 12:21 PM, Marcelo Tosatti wrote: > The problem: > > On -RT, an emulated LAPIC timer instances has the following path: > > 1) hard interrupt > 2) ksoftirqd is scheduled > 3) ksoftirqd wakes up vcpu thread > 4) vcpu thread is scheduled > > This extra context switch introduces unnecessary latency in the > LAPIC path for a KVM guest. > > The solution: > > Allow waking up vcpu thread from hardirq context, > thus avoiding the need for ksoftirqd to be scheduled. > > Normal waitqueues make use of spinlocks, which on -RT > are sleepable locks. Therefore, waking up a waitqueue > waiter involves locking a sleeping lock, which > is not allowed from hard interrupt context. > What are the consequences for the realtime kernel of making waitqueue traversal non-preemptible? Is waitqueue traversal something that ever takes so long we need to care about it being non-preemptible? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2014-11-25 18:57 ` Rik van Riel @ 2014-11-25 19:08 ` Rik van Riel 2014-11-25 19:30 ` Marcelo Tosatti 2014-11-25 20:23 ` Thomas Gleixner 2 siblings, 0 replies; 19+ messages in thread From: Rik van Riel @ 2014-11-25 19:08 UTC (permalink / raw) To: Marcelo Tosatti, linux-kernel Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm On 11/25/2014 01:57 PM, Rik van Riel wrote: > On 11/25/2014 12:21 PM, Marcelo Tosatti wrote: >> The problem: >> >> On -RT, an emulated LAPIC timer instances has the following path: >> >> 1) hard interrupt >> 2) ksoftirqd is scheduled >> 3) ksoftirqd wakes up vcpu thread >> 4) vcpu thread is scheduled >> >> This extra context switch introduces unnecessary latency in the >> LAPIC path for a KVM guest. >> >> The solution: >> >> Allow waking up vcpu thread from hardirq context, >> thus avoiding the need for ksoftirqd to be scheduled. >> >> Normal waitqueues make use of spinlocks, which on -RT >> are sleepable locks. Therefore, waking up a waitqueue >> waiter involves locking a sleeping lock, which >> is not allowed from hard interrupt context. >> > > What are the consequences for the realtime kernel of > making waitqueue traversal non-preemptible? > > Is waitqueue traversal something that ever takes so > long we need to care about it being non-preemptible? I answered my own question. This patch only changes the kvm vcpu waitqueue, which should only ever have the vcpu thread itself waiting on it. In other words, it is a wait "queue" of just one entry long, and the latency of traversing it will be absolutely minimal. Unless someone can think of a better way to implement this patch, it gets my: Acked-by: Rik van Riel <riel@redhat.com> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq 2014-11-25 18:57 ` Rik van Riel 2014-11-25 19:08 ` Rik van Riel @ 2014-11-25 19:30 ` Marcelo Tosatti 2014-11-25 20:23 ` Thomas Gleixner 2 siblings, 0 replies; 19+ messages in thread From: Marcelo Tosatti @ 2014-11-25 19:30 UTC (permalink / raw) To: Rik van Riel Cc: linux-kernel, Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm On Tue, Nov 25, 2014 at 01:57:37PM -0500, Rik van Riel wrote: > On 11/25/2014 12:21 PM, Marcelo Tosatti wrote: > > The problem: > > > > On -RT, an emulated LAPIC timer instances has the following path: > > > > 1) hard interrupt > > 2) ksoftirqd is scheduled > > 3) ksoftirqd wakes up vcpu thread > > 4) vcpu thread is scheduled > > > > This extra context switch introduces unnecessary latency in the > > LAPIC path for a KVM guest. > > > > The solution: > > > > Allow waking up vcpu thread from hardirq context, > > thus avoiding the need for ksoftirqd to be scheduled. > > > > Normal waitqueues make use of spinlocks, which on -RT > > are sleepable locks. Therefore, waking up a waitqueue > > waiter involves locking a sleeping lock, which > > is not allowed from hard interrupt context. > > > > What are the consequences for the realtime kernel of > making waitqueue traversal non-preemptible? > > Is waitqueue traversal something that ever takes so > long we need to care about it being non-preemptible? commit d872cfa018625071a3a6b1ea8a62a2790d2c157a Author: Thomas Gleixner <tglx@linutronix.de> Date: Mon Dec 12 12:29:04 2011 +0100 wait-simple: Simple waitqueue implementation wait_queue is a swiss army knife and in most of the cases the complexity is not needed. For RT waitqueues are a constant source of trouble as we can't convert the head lock to a raw spinlock due to fancy and long lasting callbacks. Unsure about the details... ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq
  2014-11-25 18:57 ` Rik van Riel
  2014-11-25 19:08   ` Rik van Riel
  2014-11-25 19:30   ` Marcelo Tosatti
@ 2014-11-25 20:23   ` Thomas Gleixner
  2 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2014-11-25 20:23 UTC (permalink / raw)
To: Rik van Riel
Cc: Marcelo Tosatti, linux-kernel, Luiz Capitulino, Steven Rostedt, kvm

On Tue, 25 Nov 2014, Rik van Riel wrote:
> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> > The problem:
> >
> > On -RT, an emulated LAPIC timer instances has the following path:
> >
> > 1) hard interrupt
> > 2) ksoftirqd is scheduled
> > 3) ksoftirqd wakes up vcpu thread
> > 4) vcpu thread is scheduled
> >
> > This extra context switch introduces unnecessary latency in the
> > LAPIC path for a KVM guest.
> >
> > The solution:
> >
> > Allow waking up vcpu thread from hardirq context,
> > thus avoiding the need for ksoftirqd to be scheduled.
> >
> > Normal waitqueues make use of spinlocks, which on -RT
> > are sleepable locks. Therefore, waking up a waitqueue
> > waiter involves locking a sleeping lock, which
> > is not allowed from hard interrupt context.
>
> What are the consequences for the realtime kernel of
> making waitqueue traversal non-preemptible?
>
> Is waitqueue traversal something that ever takes so
> long we need to care about it being non-preemptible?

__wake_up()
    lock_irq_save()
    __wake_up_common()
        for_each_entry(curr)
            curr->func()
    unlock_irq_save()

There are two issues here:

1) Long waiter chains. We probably could work around that in some
   creative way.

2) The non default callbacks. The default is default_wake_function.
   The non default callbacks, which are used by nfsd and other stuff,
   are calling the world and some more, and take preemptible locks
   and do other fancy stuff which falls flat on its nose when called
   under a raw lock on RT.

So I created the swait replacement, which has a smaller memory
footprint, less featuritis and a raw lock, because it only wakes up;
there are no custom callbacks. They still suffer from #1, but most use
cases on RT are single waiter scenarios anyway.

Hope that helps.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2014-11-25 17:21 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2) Marcelo Tosatti 2014-11-25 17:21 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti @ 2014-11-25 17:21 ` Marcelo Tosatti 2014-11-25 17:38 ` Paolo Bonzini 2014-11-25 19:11 ` Rik van Riel 1 sibling, 2 replies; 19+ messages in thread From: Marcelo Tosatti @ 2014-11-25 17:21 UTC (permalink / raw) To: linux-kernel Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm, Marcelo Tosatti [-- Attachment #1: kvm-lapic-irqsafe --] [-- Type: text/plain, Size: 2681 bytes --] Since lapic timer handler only wakes up a simple waitqueue, it can be executed from hardirq context. Reduces average cyclictest latency by 3us. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> --- arch/x86/kvm/lapic.c | 42 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 39 insertions(+), 3 deletions(-) Index: linux-stable-rt/arch/x86/kvm/lapic.c =================================================================== --- linux-stable-rt.orig/arch/x86/kvm/lapic.c 2014-11-25 14:14:38.636810068 -0200 +++ linux-stable-rt/arch/x86/kvm/lapic.c 2014-11-25 14:17:28.850567059 -0200 @@ -1031,8 +1031,38 @@ apic->divide_count); } + +static enum hrtimer_restart apic_timer_fn(struct hrtimer *data); + +static void apic_timer_expired(struct hrtimer *data) +{ + int ret, i = 0; + enum hrtimer_restart r; + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + + r = apic_timer_fn(data); + + if (r == HRTIMER_RESTART) { + do { + ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS); + if (ret == -ETIME) + hrtimer_add_expires_ns(&ktimer->timer, + ktimer->period); + i++; + } while (ret == -ETIME && i < 10); + + if (ret == -ETIME) { + printk(KERN_ERR "%s: failed to reprogram timer\n", + __func__); + WARN_ON(1); + } + } +} + + static void start_apic_timer(struct kvm_lapic *apic) { + int ret; ktime_t now; 
atomic_set(&apic->lapic_timer.pending, 0); @@ -1062,9 +1092,11 @@ } } - hrtimer_start(&apic->lapic_timer.timer, + ret = hrtimer_start(&apic->lapic_timer.timer, ktime_add_ns(now, apic->lapic_timer.period), HRTIMER_MODE_ABS); + if (ret == -ETIME) + apic_timer_expired(&apic->lapic_timer.timer); apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016" PRIx64 ", " @@ -1094,8 +1126,10 @@ ns = (tscdeadline - guest_tsc) * 1000000ULL; do_div(ns, this_tsc_khz); } - hrtimer_start(&apic->lapic_timer.timer, + ret = hrtimer_start(&apic->lapic_timer.timer, ktime_add_ns(now, ns), HRTIMER_MODE_ABS); + if (ret == -ETIME) + apic_timer_expired(&apic->lapic_timer.timer); local_irq_restore(flags); } @@ -1581,6 +1615,7 @@ hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); apic->lapic_timer.timer.function = apic_timer_fn; + apic->lapic_timer.timer.irqsafe = 1; /* * APIC is created enabled. This will prevent kvm_lapic_set_base from @@ -1699,7 +1734,8 @@ timer = &vcpu->arch.apic->lapic_timer.timer; if (hrtimer_cancel(timer)) - hrtimer_start_expires(timer, HRTIMER_MODE_ABS); + if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME) + apic_timer_expired(timer); } /* ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti @ 2014-11-25 17:38 ` Paolo Bonzini 2014-11-25 19:01 ` Jan Kiszka 2014-12-04 13:43 ` Marcelo Tosatti 2014-11-25 19:11 ` Rik van Riel 1 sibling, 2 replies; 19+ messages in thread From: Paolo Bonzini @ 2014-11-25 17:38 UTC (permalink / raw) To: Marcelo Tosatti, linux-kernel Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm On 25/11/2014 18:21, Marcelo Tosatti wrote: > + > + if (r == HRTIMER_RESTART) { > + do { > + ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS); > + if (ret == -ETIME) > + hrtimer_add_expires_ns(&ktimer->timer, > + ktimer->period); Is it possible to just compute the time where the next interrupt happens? I suspect the printk and WARN_ON below can be easily triggered by a guest. Paolo > + i++; > + } while (ret == -ETIME && i < 10); > + > + if (ret == -ETIME) { > + printk(KERN_ERR "%s: failed to reprogram timer\n", > + __func__); > + WARN_ON(1); > + } > + } ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2014-11-25 17:38 ` Paolo Bonzini @ 2014-11-25 19:01 ` Jan Kiszka 2014-12-04 13:43 ` Marcelo Tosatti 1 sibling, 0 replies; 19+ messages in thread From: Jan Kiszka @ 2014-11-25 19:01 UTC (permalink / raw) To: Paolo Bonzini, linux-kernel; +Cc: kvm On 2014-11-25 18:38, Paolo Bonzini wrote: > > > On 25/11/2014 18:21, Marcelo Tosatti wrote: >> + >> + if (r == HRTIMER_RESTART) { >> + do { >> + ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS); >> + if (ret == -ETIME) >> + hrtimer_add_expires_ns(&ktimer->timer, >> + ktimer->period); > > Is it possible to just compute the time where the next interrupt > happens? I suspect the printk and WARN_ON below can be easily triggered > by a guest. We have a lower bound for the period that a guest can program. Unless that value is set too low, this should practically not happen if we avoid disturbances while handling the event and reprogramming the next one (irqs off?). Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2014-11-25 17:38 ` Paolo Bonzini 2014-11-25 19:01 ` Jan Kiszka @ 2014-12-04 13:43 ` Marcelo Tosatti 1 sibling, 0 replies; 19+ messages in thread From: Marcelo Tosatti @ 2014-12-04 13:43 UTC (permalink / raw) To: Paolo Bonzini Cc: linux-kernel, Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner, kvm On Tue, Nov 25, 2014 at 06:38:04PM +0100, Paolo Bonzini wrote: > > > On 25/11/2014 18:21, Marcelo Tosatti wrote: > > + > > + if (r == HRTIMER_RESTART) { > > + do { > > + ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS); > > + if (ret == -ETIME) > > + hrtimer_add_expires_ns(&ktimer->timer, > > + ktimer->period); > > Is it possible to just compute the time where the next interrupt > happens? Yes but there is no guarantee hrtimer_start_expires will not fail. > I suspect the printk and WARN_ON below can be easily triggered > by a guest. I'll remove those, thanks. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti 2014-11-25 17:38 ` Paolo Bonzini @ 2014-11-25 19:11 ` Rik van Riel 2014-11-25 20:24 ` Thomas Gleixner 1 sibling, 1 reply; 19+ messages in thread From: Rik van Riel @ 2014-11-25 19:11 UTC (permalink / raw) To: Marcelo Tosatti, linux-kernel Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm On 11/25/2014 12:21 PM, Marcelo Tosatti wrote: > Since lapic timer handler only wakes up a simple waitqueue, > it can be executed from hardirq context. > > Reduces average cyclictest latency by 3us. Can this patch be merged in the KVM tree, and go upstream via Paolo? > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2014-11-25 19:11 ` Rik van Riel @ 2014-11-25 20:24 ` Thomas Gleixner 2014-12-04 13:53 ` Paolo Bonzini 0 siblings, 1 reply; 19+ messages in thread From: Thomas Gleixner @ 2014-11-25 20:24 UTC (permalink / raw) To: Rik van Riel Cc: Marcelo Tosatti, linux-kernel, Luiz Capitulino, Steven Rostedt, kvm On Tue, 25 Nov 2014, Rik van Riel wrote: > On 11/25/2014 12:21 PM, Marcelo Tosatti wrote: > > Since lapic timer handler only wakes up a simple waitqueue, > > it can be executed from hardirq context. > > > > Reduces average cyclictest latency by 3us. > > Can this patch be merged in the KVM tree, and go > upstream via Paolo? Not really as it has RT dependencies .... Thanks, tglx ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe 2014-11-25 20:24 ` Thomas Gleixner @ 2014-12-04 13:53 ` Paolo Bonzini 2014-12-05 16:17 ` Marcelo Tosatti 0 siblings, 1 reply; 19+ messages in thread From: Paolo Bonzini @ 2014-12-04 13:53 UTC (permalink / raw) To: Thomas Gleixner, Rik van Riel Cc: Marcelo Tosatti, linux-kernel, Luiz Capitulino, Steven Rostedt, kvm On 25/11/2014 21:24, Thomas Gleixner wrote: > On Tue, 25 Nov 2014, Rik van Riel wrote: >> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote: >>> Since lapic timer handler only wakes up a simple waitqueue, >>> it can be executed from hardirq context. >>> >>> Reduces average cyclictest latency by 3us. >> >> Can this patch be merged in the KVM tree, and go >> upstream via Paolo? > > Not really as it has RT dependencies .... Can hrtimer_start_expires even return ETIME on a non-RT kernel? If yes, I can take patch 2. If not, it's better to keep both patches in the RT tree. Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-12-04 13:53 ` Paolo Bonzini
@ 2014-12-05 16:17   ` Marcelo Tosatti
  0 siblings, 0 replies; 19+ messages in thread
From: Marcelo Tosatti @ 2014-12-05 16:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Thomas Gleixner, Rik van Riel, linux-kernel, Luiz Capitulino,
	Steven Rostedt, kvm

On Thu, Dec 04, 2014 at 02:53:26PM +0100, Paolo Bonzini wrote:
> On 25/11/2014 21:24, Thomas Gleixner wrote:
> > On Tue, 25 Nov 2014, Rik van Riel wrote:
> >> On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
> >>> Since lapic timer handler only wakes up a simple waitqueue,
> >>> it can be executed from hardirq context.
> >>>
> >>> Reduces average cyclictest latency by 3us.
> >>
> >> Can this patch be merged in the KVM tree, and go
> >> upstream via Paolo?
> >
> > Not really as it has RT dependencies ....
>
> Can hrtimer_start_expires even return ETIME on a non-RT kernel?  If yes,
> I can take patch 2.  If not, it's better to keep both patches in the RT
> tree.
>
> Paolo

It can't. We're still evaluating the necessity of this patch, will
resubmit accordingly.
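[Editor's note: the exchange above turns on the fact that hrtimer_start()/hrtimer_start_expires() can fail with -ETIME only on -RT, where timer programming may be deferred and an expiry can already lie in the past by the time it is armed. The fallback the patches use for that case (run the expiry handler synchronously when arming fails) can be modelled in plain userspace C; mock_timer_start, start_timer_or_fire and timer_handler below are invented names for illustration only, not kernel APIs.]

```c
#include <assert.h>

/* Toy model of the -ETIME fallback pattern, under the assumption
 * (from the thread) that arming an already-expired timer fails. */
#define ETIME 62

static int handler_runs;

/* Stands in for apic_timer_fn(): inject the timer interrupt now. */
static void timer_handler(void)
{
	handler_runs++;
}

/* Stands in for hrtimer_start() on -RT: refuse to arm in the past. */
static int mock_timer_start(long long expiry, long long now)
{
	return (expiry > now) ? 0 : -ETIME;
}

/* The pattern from the patch: if arming fails with -ETIME,
 * run the expiry handler synchronously instead of dropping it. */
static void start_timer_or_fire(long long expiry, long long now)
{
	if (mock_timer_start(expiry, now) == -ETIME)
		timer_handler();
}
```

On a non-RT kernel mock_timer_start() would simply never return -ETIME, which is why Paolo concludes the patch belongs in the RT tree.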
* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v4)
@ 2015-01-21 20:36 Marcelo Tosatti
  2015-01-21 20:36 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2015-01-21 20:36 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm, Paolo Bonzini

Against v3.14-rt branch of
git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

The problem:

On -RT, an emulated LAPIC timer instance has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context, thus avoiding
the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.

cyclictest command line:
# cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m

This patch reduces the average latency in my tests from 14us to 11us.

v2: improve changelog (Rik van Riel)
v3: limit (once) guest triggered printk and WARN_ON (Paolo Bonzini)
v4: fix typo (Steven Rostedt)
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2015-01-21 20:36 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v4) Marcelo Tosatti
@ 2015-01-21 20:36 ` Marcelo Tosatti
  0 siblings, 0 replies; 19+ messages in thread
From: Marcelo Tosatti @ 2015-01-21 20:36 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm, Paolo Bonzini, Marcelo Tosatti

[-- Attachment #1: kvm-lapic-irqsafe --]
[-- Type: text/plain, Size: 2815 bytes --]

Since lapic timer handler only wakes up a simple waitqueue,
it can be executed from hardirq context.

Also handle the case where hrtimer_start_expires fails due to
-ETIME, by injecting the interrupt to the guest immediately.

Reduces average cyclictest latency by 3us.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

Index: linux-stable-rt/arch/x86/kvm/lapic.c
===================================================================
--- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
+++ linux-stable-rt/arch/x86/kvm/lapic.c	2015-01-14 14:59:17.840251874 -0200
@@ -1031,8 +1031,38 @@
 		     apic->divide_count);
 }
 
+
+static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
+
+static void apic_timer_expired(struct hrtimer *data)
+{
+	int ret, i = 0;
+	enum hrtimer_restart r;
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+
+	r = apic_timer_fn(data);
+
+	if (r == HRTIMER_RESTART) {
+		do {
+			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
+			if (ret == -ETIME)
+				hrtimer_add_expires_ns(&ktimer->timer,
+						       ktimer->period);
+			i++;
+		} while (ret == -ETIME && i < 10);
+
+		if (ret == -ETIME) {
+			printk_once(KERN_ERR "%s: failed to reprogram timer\n",
+				    __func__);
+			WARN_ON_ONCE(1);
+		}
+	}
+}
+
+
 static void start_apic_timer(struct kvm_lapic *apic)
 {
+	int ret;
 	ktime_t now;
 	atomic_set(&apic->lapic_timer.pending, 0);
 
@@ -1062,9 +1092,11 @@
 		}
 	}
 
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		      ktime_add_ns(now, apic->lapic_timer.period),
 		      HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
 		   PRIx64 ", "
@@ -1094,8 +1126,10 @@
 		ns = (tscdeadline - guest_tsc) * 1000000ULL;
 		do_div(ns, this_tsc_khz);
 	}
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	local_irq_restore(flags);
 }
@@ -1581,6 +1615,7 @@
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
+	apic->lapic_timer.timer.irqsafe = 1;
 
 	/*
 	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
@@ -1699,7 +1734,8 @@
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
+			apic_timer_expired(timer);
 }
 
 /*
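[Editor's note: the bounded retry loop in apic_timer_expired() above can be exercised outside the kernel with a toy model: arming fails with -ETIME while the expiry is not in the future, the expiry is pushed forward by one period per failure, and the loop gives up after 10 attempts. mock_now, mock_start_expires and reprogram_periodic are invented stand-ins for illustration, not kernel interfaces.]

```c
#include <assert.h>

#define ETIME 62

/* Invented mock clock: arming fails unless the expiry is strictly
 * in the future, like a deferred hrtimer on -RT. */
static long long mock_now = 1000;

static int mock_start_expires(long long expiry)
{
	return (expiry > mock_now) ? 0 : -ETIME;
}

/* Mirrors the loop in apic_timer_expired(): advance the expiry by
 * one period per -ETIME failure, give up after 10 attempts.
 * Returns 0 on success, -ETIME if reprogramming kept failing. */
static int reprogram_periodic(long long *expiry, long long period)
{
	int ret, i = 0;

	do {
		ret = mock_start_expires(*expiry);
		if (ret == -ETIME)
			*expiry += period;	/* hrtimer_add_expires_ns() */
		i++;
	} while (ret == -ETIME && i < 10);

	return ret;
}
```

The cap of 10 iterations is what bounds a guest-triggerable failure, which is also why v3 switched the diagnostic to printk_once()/WARN_ON_ONCE().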
* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3)
@ 2015-01-14 17:12 Marcelo Tosatti
  2015-01-14 17:12 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2015-01-14 17:12 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm, Paolo Bonzini

Against v3.14-rt branch of
git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

The problem:

On -RT, an emulated LAPIC timer instance has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context, thus avoiding
the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.

cyclictest command line:
# cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m

This patch reduces the average latency in my tests from 14us to 11us.

v2: improve changelog (Rik van Riel)
v3: limit (once) guest triggered printk and WARN_ON (Paolo Bonzini)
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2015-01-14 17:12 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) Marcelo Tosatti
@ 2015-01-14 17:12 ` Marcelo Tosatti
  2015-01-14 18:23   ` Rik van Riel
  0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2015-01-14 17:12 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	kvm, Paolo Bonzini, Marcelo Tosatti

[-- Attachment #1: kvm-lapic-irqsafe --]
[-- Type: text/plain, Size: 2691 bytes --]

Since lapic timer handler only wakes up a simple waitqueue,
it can be executed from hardirq context.

Reduces average cyclictest latency by 3us.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

Index: linux-stable-rt/arch/x86/kvm/lapic.c
===================================================================
--- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
+++ linux-stable-rt/arch/x86/kvm/lapic.c	2015-01-14 14:59:17.840251874 -0200
@@ -1031,8 +1031,38 @@
 		     apic->divide_count);
 }
 
+
+static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
+
+static void apic_timer_expired(struct hrtimer *data)
+{
+	int ret, i = 0;
+	enum hrtimer_restart r;
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+
+	r = apic_timer_fn(data);
+
+	if (r == HRTIMER_RESTART) {
+		do {
+			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
+			if (ret == -ETIME)
+				hrtimer_add_expires_ns(&ktimer->timer,
+						       ktimer->period);
+			i++;
+		} while (ret == -ETIME && i < 10);
+
+		if (ret == -ETIME) {
+			printk_once(KERN_ERR "%s: failed to reprogram timer\n",
+				    __func__);
+			WARN_ON_ONCE(1);
+		}
+	}
+}
+
+
 static void start_apic_timer(struct kvm_lapic *apic)
 {
+	int ret;
 	ktime_t now;
 	atomic_set(&apic->lapic_timer.pending, 0);
 
@@ -1062,9 +1092,11 @@
 		}
 	}
 
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		      ktime_add_ns(now, apic->lapic_timer.period),
 		      HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
 		   PRIx64 ", "
@@ -1094,8 +1126,10 @@
 		ns = (tscdeadline - guest_tsc) * 1000000ULL;
 		do_div(ns, this_tsc_khz);
 	}
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	local_irq_restore(flags);
 }
@@ -1581,6 +1615,7 @@
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
+	apic->lapic_timer.timer.irqsafe = 1;
 
 	/*
 	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
@@ -1699,7 +1734,8 @@
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
+			apic_timer_expired(timer);
 }
 
 /*
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2015-01-14 17:12 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
@ 2015-01-14 18:23   ` Rik van Riel
  0 siblings, 0 replies; 19+ messages in thread
From: Rik van Riel @ 2015-01-14 18:23 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel, linux-rt-users
  Cc: Luiz Capitulino, Steven Rostedt, Thomas Gleixner, kvm, Paolo Bonzini

On 01/14/2015 12:12 PM, Marcelo Tosatti wrote:
> Since lapic timer handler only wakes up a simple waitqueue,
> it can be executed from hardirq context.
>
> Reduces average cyclictest latency by 3us.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>
* [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue
@ 2014-11-25 16:45 Marcelo Tosatti
  2014-11-25 16:45 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
  0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 16:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner

Which allows waking up vcpu from hardirq context.
* [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 16:45 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue Marcelo Tosatti
@ 2014-11-25 16:45 ` Marcelo Tosatti
  2014-11-25 17:10   ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2014-11-25 16:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner,
	Marcelo Tosatti

[-- Attachment #1: kvm-lapic-irqsafe --]
[-- Type: text/plain, Size: 2681 bytes --]

Since lapic timer handler only wakes up a simple waitqueue,
it can be executed from hardirq context.

Reduces average cyclictest latency by 3us.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

Index: linux-stable-rt/arch/x86/kvm/lapic.c
===================================================================
--- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
+++ linux-stable-rt/arch/x86/kvm/lapic.c	2014-11-25 14:17:28.850567059 -0200
@@ -1031,8 +1031,38 @@
 		     apic->divide_count);
 }
 
+
+static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
+
+static void apic_timer_expired(struct hrtimer *data)
+{
+	int ret, i = 0;
+	enum hrtimer_restart r;
+	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+
+	r = apic_timer_fn(data);
+
+	if (r == HRTIMER_RESTART) {
+		do {
+			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
+			if (ret == -ETIME)
+				hrtimer_add_expires_ns(&ktimer->timer,
+						       ktimer->period);
+			i++;
+		} while (ret == -ETIME && i < 10);
+
+		if (ret == -ETIME) {
+			printk(KERN_ERR "%s: failed to reprogram timer\n",
+			       __func__);
+			WARN_ON(1);
+		}
+	}
+}
+
+
 static void start_apic_timer(struct kvm_lapic *apic)
 {
+	int ret;
 	ktime_t now;
 	atomic_set(&apic->lapic_timer.pending, 0);
 
@@ -1062,9 +1092,11 @@
 		}
 	}
 
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		      ktime_add_ns(now, apic->lapic_timer.period),
 		      HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
 		   PRIx64 ", "
@@ -1094,8 +1126,10 @@
 		ns = (tscdeadline - guest_tsc) * 1000000ULL;
 		do_div(ns, this_tsc_khz);
 	}
-	hrtimer_start(&apic->lapic_timer.timer,
+	ret = hrtimer_start(&apic->lapic_timer.timer,
 		ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+	if (ret == -ETIME)
+		apic_timer_expired(&apic->lapic_timer.timer);
 
 	local_irq_restore(flags);
 }
@@ -1581,6 +1615,7 @@
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
+	apic->lapic_timer.timer.irqsafe = 1;
 
 	/*
	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
@@ -1699,7 +1734,8 @@
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
+			apic_timer_expired(timer);
 }
 
 /*
* Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe
  2014-11-25 16:45 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
@ 2014-11-25 17:10   ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2014-11-25 17:10 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel
  Cc: Luiz Capitulino, Rik van Riel, Steven Rostedt, Thomas Gleixner

On 25/11/2014 17:45, Marcelo Tosatti wrote:
> Since lapic timer handler only wakes up a simple waitqueue,
> it can be executed from hardirq context.
>
> Reduces average cyclictest latency by 3us.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Please include kvm@vger.kernel.org too.

Paolo

> ---
>  arch/x86/kvm/lapic.c |   42 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 39 insertions(+), 3 deletions(-)
>
> Index: linux-stable-rt/arch/x86/kvm/lapic.c
> ===================================================================
> --- linux-stable-rt.orig/arch/x86/kvm/lapic.c	2014-11-25 14:14:38.636810068 -0200
> +++ linux-stable-rt/arch/x86/kvm/lapic.c	2014-11-25 14:17:28.850567059 -0200
> @@ -1031,8 +1031,38 @@
>  		     apic->divide_count);
>  }
>  
> +
> +static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
> +
> +static void apic_timer_expired(struct hrtimer *data)
> +{
> +	int ret, i = 0;
> +	enum hrtimer_restart r;
> +	struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
> +
> +	r = apic_timer_fn(data);
> +
> +	if (r == HRTIMER_RESTART) {
> +		do {
> +			ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
> +			if (ret == -ETIME)
> +				hrtimer_add_expires_ns(&ktimer->timer,
> +						       ktimer->period);
> +			i++;
> +		} while (ret == -ETIME && i < 10);
> +
> +		if (ret == -ETIME) {
> +			printk(KERN_ERR "%s: failed to reprogram timer\n",
> +			       __func__);
> +			WARN_ON(1);
> +		}
> +	}
> +}
> +
> +
>  static void start_apic_timer(struct kvm_lapic *apic)
>  {
> +	int ret;
>  	ktime_t now;
>  	atomic_set(&apic->lapic_timer.pending, 0);
>  
> @@ -1062,9 +1092,11 @@
>  		}
>  	}
>  
> -	hrtimer_start(&apic->lapic_timer.timer,
> +	ret = hrtimer_start(&apic->lapic_timer.timer,
>  		      ktime_add_ns(now, apic->lapic_timer.period),
>  		      HRTIMER_MODE_ABS);
> +	if (ret == -ETIME)
> +		apic_timer_expired(&apic->lapic_timer.timer);
>  
>  	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
>  		   PRIx64 ", "
> @@ -1094,8 +1126,10 @@
>  		ns = (tscdeadline - guest_tsc) * 1000000ULL;
>  		do_div(ns, this_tsc_khz);
>  	}
> -	hrtimer_start(&apic->lapic_timer.timer,
> +	ret = hrtimer_start(&apic->lapic_timer.timer,
>  		ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
> +	if (ret == -ETIME)
> +		apic_timer_expired(&apic->lapic_timer.timer);
>  
>  	local_irq_restore(flags);
>  }
> @@ -1581,6 +1615,7 @@
>  	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
>  		     HRTIMER_MODE_ABS);
>  	apic->lapic_timer.timer.function = apic_timer_fn;
> +	apic->lapic_timer.timer.irqsafe = 1;
>  
>  	/*
>  	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
> @@ -1699,7 +1734,8 @@
>  
>  	timer = &vcpu->arch.apic->lapic_timer.timer;
>  	if (hrtimer_cancel(timer))
> -		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
> +		if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
> +			apic_timer_expired(timer);
>  }
>  
>  /*
end of thread, other threads:[~2015-01-21 20:39 UTC | newest]

Thread overview: 19+ messages
-- links below jump to the message on this page --
2014-11-25 17:21 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v2) Marcelo Tosatti
2014-11-25 17:21 ` [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq Marcelo Tosatti
2014-11-25 18:57   ` Rik van Riel
2014-11-25 19:08     ` Rik van Riel
2014-11-25 19:30     ` Marcelo Tosatti
2014-11-25 20:23       ` Thomas Gleixner
2014-11-25 17:21 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
2014-11-25 17:38   ` Paolo Bonzini
2014-11-25 19:01     ` Jan Kiszka
2014-12-04 13:43       ` Marcelo Tosatti
2014-11-25 19:11   ` Rik van Riel
2014-11-25 20:24     ` Thomas Gleixner
2014-12-04 13:53       ` Paolo Bonzini
2014-12-05 16:17         ` Marcelo Tosatti
  -- strict thread matches above, loose matches on Subject: below --
2015-01-21 20:36 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v4) Marcelo Tosatti
2015-01-21 20:36 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
2015-01-14 17:12 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue (v3) Marcelo Tosatti
2015-01-14 17:12 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
2015-01-14 18:23   ` Rik van Riel
2014-11-25 16:45 [patch -rt 0/2] use simple waitqueue for kvm vcpu waitqueue Marcelo Tosatti
2014-11-25 16:45 ` [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe Marcelo Tosatti
2014-11-25 17:10   ` Paolo Bonzini