From mboxrd@z Thu Jan 1 00:00:00 1970 From: Sheng Yang Subject: Re: [PATCH] Fix almost infinite loop in APIC Date: Tue, 20 Jan 2009 21:43:15 +0800 Message-ID: <200901202143.16125.sheng@linux.intel.com> References: <1231432566-9864-1-git-send-email-agraf@suse.de> <20090116050143.GA13032@amt.cnet> <4975AA51.2060705@suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: kvm@vger.kernel.org, Alexander Graf , avi@redhat.com, Kevin Wolf To: Marcelo Tosatti Return-path: Received: from mga09.intel.com ([134.134.136.24]:56020 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756029AbZATNn3 convert rfc822-to-8bit (ORCPT ); Tue, 20 Jan 2009 08:43:29 -0500 In-Reply-To: <4975AA51.2060705@suse.de> Content-Disposition: inline Sender: kvm-owner@vger.kernel.org List-ID: On Tuesday 20 January 2009 18:41:21 Alexander Graf wrote: > Marcelo Tosatti wrote: > > On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote: > >>> + * Since reinjection is not rate-limited, use the delay > >>> + * to inject the last interrupt as an estimate. > >>> + */ > >>> + if (unlikely(atomic_read(&apic->timer.pending) > 0)) { > >>> + remaining =3D apic->timer.injection_delay; > >>> + if (ktime_to_ns(remaining) > apic->timer.period) > >>> + remaining =3D ns_to_ktime(apic->timer.period); > >>> + } else > >>> + remaining =3D hrtimer_expires_remaining(&apic->timer.dev); > >> > >> A little doubt... > >> > >> A: time_fire > >> B: intr_post > >> C: read TMCCT > >> > >> The sequence can be ABC or ACB. > >> > >> injection_delay =3D time(B) - time(A) > >> > >> So it didn't count time from read TMCCT... And guest get interrupt= at > >> time(B), not quite understand why time(B) - time(A) matters here..= =2E > > > > Its an estimate of the delay it takes to inject an interrupt to the > > guest once fired. Its only used if there have been accumulated ones= , so > > ACB sequence with pending=3D0 will use hrtimer_expires_remaining(). > > > >> I think the reasonable here means, this interval is usable later a= fter > >> the accumulated interrupts are injected. From this point of view, = I > >> think current solution is reasonable. It just assume the delayed > >> interrupts have been injected. > >> > >> However, seriously, any value here is wrong, no elegant one. > > > > I agree. > > > >> But I still prefer to the current solution... > > > > Why? The proposed version is smaller and simpler than the current > > one IMO, and also not vulnerable to whatever bug is causing now < > > last_update. > > > > And more precise, since the current scheme uses interrupt injection= time > > as if it was "count-down restart" time, which is not true. > > > >> And here is not the really problem for now I think. The current > >> mechanism is mostly OK, but where is the bug... We have either hav= e a > >> simple fix(e.g. if now < last_update, then return 0) or dig into i= t. Did > >> it worth a try? Anyway, it would return a buggy result if we have > >> pending interrupts... > > > > The overflow calculation is not necessary as discussed. Alexander, = can > > you please try the following? Sheng, do you have any other suggesti= on to > > test? > > > > /proc/timer_list output of the host when ESX is running too. > > Sorry for the late reply. Here's the dmesg output and /proc/timer_lis= t > after the issue occured. I left the warning message in that I put the= re > in case the value is too high: > > counter_passed =3D div64_u64(ktime_to_ns(passed), > (APIC_BUS_CYCLE_NS * > apic->timer.divide_count)); > > + if (counter_passed > 0x7f00000000000000) { > + /* If we're in here we probably encountered a bug! */ > + printk(KERN_INFO "LAPIC: Too high counter_passed valu= e: > 0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed, > ktime_to_ns(passed), passed, now.tv64, apic > ->timer.last_update.tv64); > + } > + > if (counter_passed > tmcct) { > if (unlikely(!apic_lvtt_period(apic))) { > /* one-shot timers stick at 0 until reset */ > > > > start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer > initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264. > start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer > initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f. > last_update =3D 1014860818546063 now =3D 1014860818426085 > rec[47] when=3D1014860811948512 last_update=3D1014860812078063 pend=3D= 0 > rec[46] when=3D1014860818417773 last_update=3D1014860818546063 pend=3D= 0 > rec[45] when=3D1014860818285454 last_update=3D1014860818414063 pend=3D= 0 > rec[44] when=3D1014860818153119 last_update=3D1014860818282063 pend=3D= 0 > rec[43] when=3D1014860818021746 last_update=3D1014860818150063 pend=3D= 0 > rec[42] when=3D1014860817889326 last_update=3D1014860818018063 pend=3D= 0 > rec[41] when=3D1014860817757297 last_update=3D1014860817886063 pend=3D= 0 > rec[40] when=3D1014860817625682 last_update=3D1014860817754063 pend=3D= 0 > rec[39] when=3D1014860817492484 last_update=3D1014860817622063 pend=3D= 0 > rec[38] when=3D1014860817360531 last_update=3D1014860817490063 pend=3D= 0 > rec[37] when=3D1014860817228489 last_update=3D1014860817358063 pend=3D= 0 > rec[36] when=3D1014860817101250 last_update=3D1014860817226063 pend=3D= 0 > rec[35] when=3D1014860816964378 last_update=3D1014860817094063 pend=3D= 0 > rec[34] when=3D1014860816832459 last_update=3D1014860816962063 pend=3D= 0 > rec[33] when=3D1014860816700514 last_update=3D1014860816830063 pend=3D= 0 > rec[32] when=3D1014860816568558 last_update=3D1014860816698063 pend=3D= 0 > rec[31] when=3D1014860816436510 last_update=3D1014860816566063 pend=3D= 0 > rec[30] when=3D1014860816305079 last_update=3D1014860816434063 pend=3D= 0 > rec[29] when=3D1014860816172560 last_update=3D1014860816302063 pend=3D= 0 > rec[28] when=3D1014860816040449 last_update=3D1014860816170063 pend=3D= 0 > rec[27] when=3D1014860815908500 last_update=3D1014860816038063 pend=3D= 0 > rec[26] when=3D1014860815776451 last_update=3D1014860815906063 pend=3D= 0 > rec[25] when=3D1014860815644499 last_update=3D1014860815774063 pend=3D= 0 > rec[24] when=3D1014860815513100 last_update=3D1014860815642063 pend=3D= 0 > rec[23] when=3D1014860815380476 last_update=3D1014860815510063 pend=3D= 0 > rec[22] when=3D1014860815248609 last_update=3D1014860815378063 pend=3D= 0 > rec[21] when=3D1014860815116494 last_update=3D1014860815246063 pend=3D= 0 > rec[20] when=3D1014860814984441 last_update=3D1014860815114063 pend=3D= 0 > rec[19] when=3D1014860814852498 last_update=3D1014860814982063 pend=3D= 0 > rec[18] when=3D1014860814721082 last_update=3D1014860814850063 pend=3D= 0 > rec[17] when=3D1014860814588467 last_update=3D1014860814718063 pend=3D= 0 > rec[16] when=3D1014860814456530 last_update=3D1014860814586063 pend=3D= 0 > rec[15] when=3D1014860814324452 last_update=3D1014860814454063 pend=3D= 0 > rec[14] when=3D1014860814192601 last_update=3D1014860814322063 pend=3D= 0 > rec[13] when=3D1014860814060489 last_update=3D1014860814190063 pend=3D= 0 > rec[12] when=3D1014860813928546 last_update=3D1014860814058063 pend=3D= 0 > rec[11] when=3D1014860813797571 last_update=3D1014860813926063 pend=3D= 0 > rec[10] when=3D1014860813664475 last_update=3D1014860813794063 pend=3D= 0 > rec[9] when=3D1014860813532520 last_update=3D1014860813662063 pend=3D= 0 > rec[8] when=3D1014860813400463 last_update=3D1014860813530063 pend=3D= 0 > rec[7] when=3D1014860813268519 last_update=3D1014860813398063 pend=3D= 0 > rec[6] when=3D1014860813136452 last_update=3D1014860813266063 pend=3D= 0 > rec[5] when=3D1014860813007312 last_update=3D1014860813134063 pend=3D= 0 > rec[4] when=3D1014860812872314 last_update=3D1014860813002063 pend=3D= 0 > rec[3] when=3D1014860812740791 last_update=3D1014860812870063 pend=3D= 0 > rec[2] when=3D1014860812652998 last_update=3D1014860812738063 pend=3D= 0 > rec[1] when=3D1014860812476526 last_update=3D1014860812606063 pend=3D= 0 > rec[0] when=3D1014860812344635 last_update=3D1014860812474063 pend=3D= 0 > rec[49] when=3D1014860812213098 last_update=3D1014860812342063 pend=3D= 0 > rec[48] when=3D1014860812080466 last_update=3D1014860812210063 pend=3D= 0 > LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 | > 0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5 (0x39b02b29= 5fd8f) > start_apic_timer: bus cycle is 1ns, now 0x00039b02b2998ada, timer > initial count 0x8400, period 33792ns, expire @ 0x00039b02b29a0eda. > Marcelo, I realize some interesting things. In fact, on my machine, when I measured the delta of now() and last_upd= ate at=20 kvm_apic_timer_intr_post(), the delta was bigger and bigger...(and now(= ) is=20 always bigger for me, so still no clue for why data can be like above,=20 last_update always ahead of when for about one period...). Then I found something not good in original design - it ignored the int= erval=20 between time fire and injection, so we got: last_update =3D now() + n * period; And the time we update last_update: time =3D now() + n * period + n * interval. So last_update time is more and more inaccurate... Though it was revise= d by=20 tmcct function, it's still not a good way to go. Then I understand your purpose more. +=A0=A0=A0=A0=A0=A0=A0if (unlikely(atomic_read(&apic->timer.pending) > = 0)) { +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0remaining =3D apic->timer= =2Einjection_delay; +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0if (ktime_to_ns(remaining= ) > apic->timer.period) +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0r= emaining =3D ns_to_ktime(apic->timer.period); + =A0 =A0 =A0 =A0} else +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0remaining =3D hrtimer_exp= ires_remaining(&apic->timer.dev); And about your patch, how about take interval between intr_post() and r= ead=20 tmcct in to account as well? That can keep the consistent with=20 hrtimer_get_remaining() in the read tmcct. And I think if remaining > p= eriod,=20 remaining =3D remain % period maybe more reasonable here. How do you think? --=20 regards Yang, Sheng