From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexander Graf Subject: Re: [PATCH] Fix almost infinite loop in APIC Date: Fri, 09 Jan 2009 13:57:30 +0100 Message-ID: <496749BA.3080007@suse.de> References: <1231432566-9864-1-git-send-email-agraf@suse.de> <200901091434.39200.sheng@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Cc: kvm@vger.kernel.org, avi@redhat.com, Kevin Wolf , Marcelo Tosatti To: Sheng Yang Return-path: Received: from cantor2.suse.de ([195.135.220.15]:48168 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751674AbZAIM5d (ORCPT ); Fri, 9 Jan 2009 07:57:33 -0500 In-Reply-To: <200901091434.39200.sheng@linux.intel.com> Sender: kvm-owner@vger.kernel.org List-ID: Sheng Yang wrote: > On Friday 09 January 2009 00:36:06 Alexander Graf wrote: > >> While booting Linux in VMware ESX, I encountered a strange effect >> in the in-kernel lapic implementation: time went backwards! >> >> While this should never occur, because of that the while loop that >> is done after the invalid calculations caused my host system to hang. >> >> In order to make debugging easier, let's replace this as suggested >> with a modulo function and not run into the danger of looping forever. >> >> To replace the nice hint this bug gave me that the values are broken, >> I added a printk message so people encountering this can at least >> see that something is fishy. >> >> Of course, the real issue needs to be fixed as well! I'm open to ideas >> why now < last_update! >> >> (Thanks to Kevin for his help in debugging this) >> >> Signed-off-by: Alexander Graf >> Signed-off-by: Kevin Wolf >> --- >> > > Hi Alexander > > I'm a little suspect here: > > >> if (unlikely(ktime_to_ns(now) <= >> ktime_to_ns(apic->timer.last_update))) { >> /* Wrap around */ >> passed = ktime_add(( { >> (ktime_t) { >> .tv64 = KTIME_MAX - >> (apic->timer.last_update).tv64}; } >> ), now); >> apic_debug("time elapsed\n"); >> } else >> passed = ktime_sub(now, apic->timer.last_update); >> > > And now apic timer base is hr_timer with CLOCK_MONOTONIC, and get_time() is > really ktime_get() which is almost impossible to wrap around. If it's > overflow, at least we need a warning. I think this piece of code due to clock > source change. > > So I doubt: due to some reason, now <= apic->timer.last_update, which cause a > big wrap around operation. > > And the most suspect: > > >> void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec) >> { >> struct kvm_lapic *apic = vcpu->arch.apic; >> >> if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec) >> apic->timer.last_update = ktime_add_ns( >> apic->timer.last_update, >> apic->timer.period); >> } >> > > Not sure what's happening, have you tried this? (In fact, I am little willing > to replace all apic->timer.dev.base->get_time() with more explicit ktime_get() > as in pit.) > Yes, this code is the culprit. Using that patch of yours now is always > last_update. I'd still like to see that while loop go away ;-). Alex > -- > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index afac68c..414e7e0 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -1115,9 +1115,7 @@ void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int > vec) > struct kvm_lapic *apic = vcpu->arch.apic; > > if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec) > - apic->timer.last_update = ktime_add_ns( > - apic->timer.last_update, > - apic->timer.period); > + apic->timer.last_update = apic->timer.dev.base->get_time(); > } > > int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu) > > >