From mboxrd@z Thu Jan 1 00:00:00 1970 From: Radim =?utf-8?B?S3LEjW3DocWZ?= Subject: Re: [PATCH 5/5] KVM: x86: Using TSC deadline may cause multiple interrupts by user writes Date: Fri, 10 Oct 2014 14:50:38 +0200 Message-ID: <20141010125034.GA17347@potion.brq.redhat.com> References: <1412287806-16016-1-git-send-email-namit@cs.technion.ac.il> <1412287806-16016-6-git-send-email-namit@cs.technion.ac.il> <20141006205737.GC2722@potion.brq.redhat.com> <4E3FA8A7-6CEF-4077-AD91-9AAE1AF86FEF@gmail.com> <20141008100619.GA20422@potion.brq.redhat.com> <99CA15AA-DDC1-4E95-A168-7AA640C5A3B9@gmail.com> <5437AAB0.4060908@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Nadav Amit , Nadav Amit , joro@8bytes.org, kvm@vger.kernel.org To: Paolo Bonzini Return-path: Received: from mx1.redhat.com ([209.132.183.28]:6024 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753400AbaJJMuw (ORCPT ); Fri, 10 Oct 2014 08:50:52 -0400 Content-Disposition: inline In-Reply-To: <5437AAB0.4060908@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: 2014-10-10 11:45+0200, Paolo Bonzini: > Il 10/10/2014 03:55, Nadav Amit ha scritto: > >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > >> index b8345dd..51428dd 100644 > >> --- a/arch/x86/kvm/lapic.c > >> +++ b/arch/x86/kvm/lapic.c > >> @@ -1096,9 +1096,12 @@ static void start_apic_timer(struct kvm_lap= ic *apic) > >> if (likely(tscdeadline > guest_tsc)) { > >> ns =3D (tscdeadline - guest_tsc) * 1000000ULL; > >> do_div(ns, this_tsc_khz); > >> + hrtimer_start(&apic->lapic_timer.timer, > >> + ktime_add_ns(now, ns), HRTIMER_MODE_ABS); > >> + } else { > >> + atomic_inc(&ktimer->pending); > >> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); > >> } > >> - hrtimer_start(&apic->lapic_timer.timer, > >> - ktime_add_ns(now, ns), HRTIMER_MODE_ABS); > >> > >> local_irq_restore(flags); > >> } > >> @@ -1355,9 +1358,6 @@ void kvm_set_lapic_tscdeadline_msr(struct kv= m_vcpu *vcpu, u64 data) > >> return; > >> > >> hrtimer_cancel(&apic->lapic_timer.timer); > >> - /* Inject here so clearing tscdeadline won't override new value = */ > >> - if (apic_has_pending_timer(vcpu)) > >> - kvm_inject_apic_timer_irqs(vcpu); > >> apic->lapic_timer.tscdeadline =3D data; > >> start_apic_timer(apic); > >> } > >=20 > > Perhaps I am missing something, but I don=E2=80=99t see how it solv= es the problem I encountered. > > Recall the scenario: > > 1. A TSC deadline timer interrupt is pending. > > 2. TSC deadline was still not cleared (which happens during vcpu_ru= n). > > 3. Userspace uses KVM_GET_MSRS/KVM_SET_MSRS to load the same deadli= ne msr. > >=20 > > It appears that in such scenario the guest would still get spurious > > interrupt for no reason, as ktimer->pending may already be increase= d in > > apic_timer_fn. >=20 > If ktimer->pending =3D=3D 2 you still get only one interrupt, not two= =2E Am I > misunderstanding your objection? (I don't see it either, the patch removed kvm_inject_apic_timer_irqs(), so we inject as if the MSR write didn't happen.) > > Second, I think that the solution I proposed would perform better. > > Currently, there are many unnecessary cancellations and setups of t= he > > timer. This solution does not resolve this problem. >=20 > I think it does. You do not get an hrtimer_start if tscdeadline <=3D > guest_tsc. To avoid useless cancels, either check hrtimer_is_enqueue= d > before calling hrtimer_cancel, or go straight to the source and avoid > taking the lock in the easy cases: I think the useless cancels were when the timer is set to the same valu= e in the future. Which makes sense to optimize, but I wasn't sure how it would affect th= e guest if the TSC offset was be changing or TSC itself wasn't stable ... so I chose not to do it. > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c > index 1c2fe7de2842..6ce725007424 100644 > --- a/kernel/time/hrtimer.c > +++ b/kernel/time/hrtimer.c > @@ -1043,10 +1043,17 @@ int hrtimer_try_to_cancel(struct hrtimer *tim= er) > { > struct hrtimer_clock_base *base; > unsigned long flags; > - int ret =3D -1; > + unsigned long state =3D timer->state; > + int ret; > + > + if (state & HRTIMER_STATE_ENQUEUED) > + return 0; It doesn't try very hard to cancel it ;) > + if (state & HRTIMER_STATE_CALLBACK) > + return -1; >=20 > base =3D lock_hrtimer_base(timer, &flags); >=20 > + ret =3D -1; > if (!hrtimer_callback_running(timer)) > ret =3D remove_hrtimer(timer, base); >=20 > > Last, I think that having less interrupts on deadline changes is no= t > > completely according to the SDM which says: "If software disarms th= e > > timer or postpones the deadline, race conditions may result in the > > delivery of a spurious timer interrupt.=E2=80=9D It never says inte= rrupts may > > be lost if you reprogram the deadline before you check it expired. Yeah, too bad it wasn't written in a formally defined language ... They could be just reserving design space for concurrent implementation= , not making race conditions mandatory. Our race windows are much bigger -- a disarm that is hundreds of cycles in the future can easily be hit after a msr write vm-exit, but it would be very unlikely to get an interrupt on bare metal. And thinking about the meaning of postpone or disarm -- software doesn'= t want the old interrupt anymore, so it would just add useless work. (And I liked the code better without it, but it'd be ok if we fixed all the cases.) > But the case when you rewrite the same value to the MSR is neither > disarming nor postponing. You would be getting two interrupts for th= e > same event. That is why I agree with Radim that checking host_initia= ted > is wrong. Current implementation also injects two interrupts when the timer is se= t to a lower nonzero value, which should give us just one under both interpretations.