From: Alexander Graf <agraf@suse.de>
To: Sheng Yang <sheng@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
kvm@vger.kernel.org, avi@redhat.com, Kevin Wolf <kwolf@suse.de>
Subject: Re: [PATCH] Fix almost infinite loop in APIC
Date: Tue, 20 Jan 2009 13:09:23 +0100 [thread overview]
Message-ID: <4975BEF3.8040509@suse.de> (raw)
In-Reply-To: <200901201920.42519.sheng@linux.intel.com>
Sheng Yang wrote:
> On Tuesday 20 January 2009 18:41:21 Alexander Graf wrote:
>
>> Marcelo Tosatti wrote:
>>
>>> On Thu, Jan 15, 2009 at 03:20:06PM +0800, Sheng Yang wrote:
>>>
>>>>> + * Since reinjection is not rate-limited, use the delay
>>>>> + * to inject the last interrupt as an estimate.
>>>>> + */
>>>>> + if (unlikely(atomic_read(&apic->timer.pending) > 0)) {
>>>>> + remaining = apic->timer.injection_delay;
>>>>> + if (ktime_to_ns(remaining) > apic->timer.period)
>>>>> + remaining = ns_to_ktime(apic->timer.period);
>>>>> + } else
>>>>> + remaining = hrtimer_expires_remaining(&apic->timer.dev);
>>>>>
>>>> A little doubt...
>>>>
>>>> A: time_fire
>>>> B: intr_post
>>>> C: read TMCCT
>>>>
>>>> The sequence can be ABC or ACB.
>>>>
>>>> injection_delay = time(B) - time(A)
>>>>
>>>> So it didn't count time from read TMCCT... And guest get interrupt at
>>>> time(B), not quite understand why time(B) - time(A) matters here...
>>>>
>>> Its an estimate of the delay it takes to inject an interrupt to the
>>> guest once fired. Its only used if there have been accumulated ones, so
>>> ACB sequence with pending=0 will use hrtimer_expires_remaining().
>>>
>>>
>>>> I think the reasonable here means, this interval is usable later after
>>>> the accumulated interrupts are injected. From this point of view, I
>>>> think current solution is reasonable. It just assume the delayed
>>>> interrupts have been injected.
>>>>
>>>> However, seriously, any value here is wrong, no elegant one.
>>>>
>>> I agree.
>>>
>>>
>>>> But I still prefer to the current solution...
>>>>
>>> Why? The proposed version is smaller and simpler than the current
>>> one IMO, and also not vulnerable to whatever bug is causing now <
>>> last_update.
>>>
>>> And more precise, since the current scheme uses interrupt injection time
>>> as if it was "count-down restart" time, which is not true.
>>>
>>>
>>>> And here is not the really problem for now I think. The current
>>>> mechanism is mostly OK, but where is the bug... We have either have a
>>>> simple fix(e.g. if now < last_update, then return 0) or dig into it. Did
>>>> it worth a try? Anyway, it would return a buggy result if we have
>>>> pending interrupts...
>>>>
>>> The overflow calculation is not necessary as discussed. Alexander, can
>>> you please try the following? Sheng, do you have any other suggestion to
>>> test?
>>>
>>> /proc/timer_list output of the host when ESX is running too.
>>>
>> Sorry for the late reply. Here's the dmesg output and /proc/timer_list
>> after the issue occured. I left the warning message in that I put there
>> in case the value is too high:
>>
>> counter_passed = div64_u64(ktime_to_ns(passed),
>> (APIC_BUS_CYCLE_NS *
>> apic->timer.divide_count));
>>
>> + if (counter_passed > 0x7f00000000000000) {
>> + /* If we're in here we probably encountered a bug! */
>> + printk(KERN_INFO "LAPIC: Too high counter_passed value:
>> 0x%lx | 0x%lx (0x%lx) | 0x%lx (0x%lx)\n", counter_passed,
>> ktime_to_ns(passed), passed, now.tv64, apic
>> ->timer.last_update.tv64);
>> + }
>> +
>> if (counter_passed > tmcct) {
>> if (unlikely(!apic_lvtt_period(apic))) {
>> /* one-shot timers stick at 0 until reset */
>>
>>
>>
>> start_apic_timer: bus cycle is 1ns, now 0x00039a6f377cfbc4, timer
>> initial count 0x186a0, period 100000ns, expire @ 0x00039a6f377e8264.
>> start_apic_timer: bus cycle is 1ns, now 0x00039a6f3a7b5aef, timer
>> initial count 0x203a0, period 132000ns, expire @ 0x00039a6f3a7d5e8f.
>> last_update = 1014860818546063 now = 1014860818426085
>> rec[47] when=1014860811948512 last_update=1014860812078063 pend=0
>> rec[46] when=1014860818417773 last_update=1014860818546063 pend=0
>> rec[45] when=1014860818285454 last_update=1014860818414063 pend=0
>> rec[44] when=1014860818153119 last_update=1014860818282063 pend=0
>> rec[43] when=1014860818021746 last_update=1014860818150063 pend=0
>> rec[42] when=1014860817889326 last_update=1014860818018063 pend=0
>> rec[41] when=1014860817757297 last_update=1014860817886063 pend=0
>> rec[40] when=1014860817625682 last_update=1014860817754063 pend=0
>> rec[39] when=1014860817492484 last_update=1014860817622063 pend=0
>> rec[38] when=1014860817360531 last_update=1014860817490063 pend=0
>> rec[37] when=1014860817228489 last_update=1014860817358063 pend=0
>> rec[36] when=1014860817101250 last_update=1014860817226063 pend=0
>> rec[35] when=1014860816964378 last_update=1014860817094063 pend=0
>> rec[34] when=1014860816832459 last_update=1014860816962063 pend=0
>> rec[33] when=1014860816700514 last_update=1014860816830063 pend=0
>> rec[32] when=1014860816568558 last_update=1014860816698063 pend=0
>> rec[31] when=1014860816436510 last_update=1014860816566063 pend=0
>> rec[30] when=1014860816305079 last_update=1014860816434063 pend=0
>> rec[29] when=1014860816172560 last_update=1014860816302063 pend=0
>> rec[28] when=1014860816040449 last_update=1014860816170063 pend=0
>> rec[27] when=1014860815908500 last_update=1014860816038063 pend=0
>> rec[26] when=1014860815776451 last_update=1014860815906063 pend=0
>> rec[25] when=1014860815644499 last_update=1014860815774063 pend=0
>> rec[24] when=1014860815513100 last_update=1014860815642063 pend=0
>> rec[23] when=1014860815380476 last_update=1014860815510063 pend=0
>> rec[22] when=1014860815248609 last_update=1014860815378063 pend=0
>> rec[21] when=1014860815116494 last_update=1014860815246063 pend=0
>> rec[20] when=1014860814984441 last_update=1014860815114063 pend=0
>> rec[19] when=1014860814852498 last_update=1014860814982063 pend=0
>> rec[18] when=1014860814721082 last_update=1014860814850063 pend=0
>> rec[17] when=1014860814588467 last_update=1014860814718063 pend=0
>> rec[16] when=1014860814456530 last_update=1014860814586063 pend=0
>> rec[15] when=1014860814324452 last_update=1014860814454063 pend=0
>> rec[14] when=1014860814192601 last_update=1014860814322063 pend=0
>> rec[13] when=1014860814060489 last_update=1014860814190063 pend=0
>> rec[12] when=1014860813928546 last_update=1014860814058063 pend=0
>> rec[11] when=1014860813797571 last_update=1014860813926063 pend=0
>> rec[10] when=1014860813664475 last_update=1014860813794063 pend=0
>> rec[9] when=1014860813532520 last_update=1014860813662063 pend=0
>> rec[8] when=1014860813400463 last_update=1014860813530063 pend=0
>> rec[7] when=1014860813268519 last_update=1014860813398063 pend=0
>> rec[6] when=1014860813136452 last_update=1014860813266063 pend=0
>> rec[5] when=1014860813007312 last_update=1014860813134063 pend=0
>> rec[4] when=1014860812872314 last_update=1014860813002063 pend=0
>> rec[3] when=1014860812740791 last_update=1014860812870063 pend=0
>> rec[2] when=1014860812652998 last_update=1014860812738063 pend=0
>> rec[1] when=1014860812476526 last_update=1014860812606063 pend=0
>> rec[0] when=1014860812344635 last_update=1014860812474063 pend=0
>> rec[49] when=1014860812213098 last_update=1014860812342063 pend=0
>> rec[48] when=1014860812080466 last_update=1014860812210063 pend=0
>> LAPIC: Too high counter_passed value: 0x7ffffffffffe2b55 |
>> 0x7ffffffffffe2b55 (0x7ffffffffffe2b55) | 0x39b02b29428e5 (0x39b02b295fd8f)
>> start_apic_timer: bus cycle is 1ns, now 0x00039b02b2998ada, timer
>> initial count 0x8400, period 33792ns, expire @ 0x00039b02b29a0eda.
>>
>>
>
> Thanks, Alex!
>
> When update time of last_update is 0x39B02B29428E5, and the start_apic_timer
> at 0x00039b02b2998ada, the delta is 0x561F5 = 352,757, very close. Looks like
> a window here?:
>
>
>> static void start_apic_timer(struct kvm_lapic *apic)
>> {
>> ktime_t now = apic->timer.dev.base->get_time();
>>
>> apic->timer.last_update = now;
>>
>
> [window?]
>
>> apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
>> APIC_BUS_CYCLE_NS * apic->timer.divide_count;
>> atomic_set(&apic->timer.pending, 0);
>>
>> if (!apic->timer.period)
>> return;
>>
>> hrtimer_start(&apic->timer.dev,
>> ktime_add_ns(now, apic->timer.period),
>> HRTIMER_MODE_ABS);
>>
>> apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
>> PRIx64 ", "
>> "timer initial count 0x%x, period %lldns, "
>> "expire @ 0x%016" PRIx64 ".\n", __func__,
>> APIC_BUS_CYCLE_NS, ktime_to_ns(now),
>> apic_get_reg(apic, APIC_TMICT),
>> apic->timer.period,
>> ktime_to_ns(ktime_add_ns(now,
>> apic->timer.period)));
>> }
>>
>
> Looks like due to the period is reduced, but the last_update use old value and
> got just updated... But guest vcpu B program vcpu A's lapic? Or intr_post of
> vcpu B happened at vcpu A? Both seems unreasonable...
>
> Alex, can you help to add another line above to confirm the problem? e.g.
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index afac68c..15f8ed5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -653,6 +653,8 @@ static void start_apic_timer(struct kvm_lapic *apic)
> {
> ktime_t now = apic->timer.dev.base->get_time();
>
> + printk("enter start_apic_timer! vcpu %d\n", apic->vcpu->vcpu_id);
> +
> apic->timer.last_update = now;
>
> apic->timer.period = apic_get_reg(apic, APIC_TMICT) *
>
> Also add a similar one at the beginning of print_last_update_record().
>
> Thanks! (And thanks for Marcelo's excellent debug patch! :) )
>
>
The VMware ESX VM is UP, so I don't think we'll get anything from the
vcpu_id apart from 0 :-). Nevertheless here's the dmesg output:
enter start_apic_timer! vcpu 0
enter start_apic_timer! vcpu 0
start_apic_timer: bus cycle is 1ns, now 0x00039fb339714386, timer
initial count 0x186a0, period 100000ns, expire @ 0x00039fb33972ca26.
enter start_apic_timer! vcpu 0
start_apic_timer: bus cycle is 1ns, now 0x00039fb33c58ebc2, timer
initial count 0x203a0, period 132000ns, expire @ 0x00039fb33c5aef62.
last_update = 1020361050214818 now = 1020361050178604
enter print_last_update_record! vcpu 0
rec[5] when=1020361043617361 last_update=1020361043746818 pend=0
rec[4] when=1020361050088268 last_update=1020361050214818 pend=0
rec[3] when=1020361049970143 last_update=1020361050082818 pend=0
rec[2] when=1020361049822220 last_update=1020361049950818 pend=0
rec[1] when=1020361049690023 last_update=1020361049818818 pend=0
rec[0] when=1020361049559209 last_update=1020361049686818 pend=0
rec[49] when=1020361049427469 last_update=1020361049554818 pend=0
rec[48] when=1020361049294815 last_update=1020361049422818 pend=0
rec[47] when=1020361049162401 last_update=1020361049290818 pend=0
rec[46] when=1020361049030431 last_update=1020361049158818 pend=0
rec[45] when=1020361048898352 last_update=1020361049026818 pend=0
rec[44] when=1020361048765413 last_update=1020361048894818 pend=0
rec[43] when=1020361048634310 last_update=1020361048762818 pend=0
rec[42] when=1020361048501343 last_update=1020361048630818 pend=0
rec[41] when=1020361048369388 last_update=1020361048498818 pend=0
rec[40] when=1020361048241616 last_update=1020361048366818 pend=0
rec[39] when=1020361048108017 last_update=1020361048234818 pend=0
rec[38] when=1020361047973826 last_update=1020361048102818 pend=0
rec[37] when=1020361047841371 last_update=1020361047970818 pend=0
rec[36] when=1020361047709322 last_update=1020361047838818 pend=0
rec[35] when=1020361047577366 last_update=1020361047706818 pend=0
rec[34] when=1020361047445316 last_update=1020361047574818 pend=0
rec[33] when=1020361047313423 last_update=1020361047442818 pend=0
rec[32] when=1020361047181310 last_update=1020361047310818 pend=0
rec[31] when=1020361047049990 last_update=1020361047178818 pend=0
rec[30] when=1020361046917375 last_update=1020361047046818 pend=0
rec[29] when=1020361046785332 last_update=1020361046914818 pend=0
rec[28] when=1020361046653374 last_update=1020361046782818 pend=0
rec[27] when=1020361046521325 last_update=1020361046650818 pend=0
rec[26] when=1020361046389374 last_update=1020361046518818 pend=0
rec[25] when=1020361046257989 last_update=1020361046386818 pend=0
rec[24] when=1020361046125335 last_update=1020361046254818 pend=0
rec[23] when=1020361045993393 last_update=1020361046122818 pend=0
rec[22] when=1020361045861330 last_update=1020361045990818 pend=0
rec[21] when=1020361045729389 last_update=1020361045858818 pend=0
rec[20] when=1020361045602604 last_update=1020361045726818 pend=0
rec[19] when=1020361045471032 last_update=1020361045594818 pend=0
rec[18] when=1020361045333385 last_update=1020361045462818 pend=0
rec[17] when=1020361045201397 last_update=1020361045330818 pend=0
rec[16] when=1020361045074634 last_update=1020361045198818 pend=0
rec[15] when=1020361044941707 last_update=1020361045066818 pend=0
rec[14] when=1020361044805373 last_update=1020361044934818 pend=0
rec[13] when=1020361044673417 last_update=1020361044802818 pend=0
rec[12] when=1020361044547848 last_update=1020361044670818 pend=0
rec[11] when=1020361044409439 last_update=1020361044538818 pend=0
rec[10] when=1020361044277316 last_update=1020361044406818 pend=0
rec[9] when=1020361044150456 last_update=1020361044274818 pend=0
rec[8] when=1020361044013204 last_update=1020361044142818 pend=0
rec[7] when=1020361043881203 last_update=1020361044010818 pend=0
rec[6] when=1020361043749913 last_update=1020361043878818 pend=0
LAPIC: Too high counter_passed value: 0x7fffffffffff7289 |
0x7fffffffffff7289 (0x7fffffffffff7289) | 0x3a00351f0442c (0x3a00351f0d1a2)
enter start_apic_timer! vcpu 0
start_apic_timer: bus cycle is 1ns, now 0x0003a00351f37054, timer
initial count 0x8400, period 33792ns, expire @ 0x0003a00351f3f454.
next prev parent reply other threads:[~2009-01-20 12:09 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-08 16:36 [PATCH] Fix almost infinite loop in APIC Alexander Graf
2009-01-09 6:34 ` Sheng Yang
2009-01-09 10:49 ` Alexander Graf
2009-01-09 12:57 ` Alexander Graf
2009-01-10 11:21 ` Sheng Yang
2009-01-11 4:55 ` Marcelo Tosatti
2009-01-13 7:47 ` Sheng Yang
2009-01-13 22:01 ` Marcelo Tosatti
2009-01-14 9:17 ` Sheng Yang
2009-01-14 17:03 ` Marcelo Tosatti
2009-01-15 7:20 ` Sheng Yang
2009-01-16 5:01 ` Marcelo Tosatti
2009-01-20 10:41 ` Alexander Graf
2009-01-20 11:20 ` Sheng Yang
2009-01-20 12:09 ` Alexander Graf [this message]
2009-01-20 12:30 ` Sheng Yang
2009-01-20 13:43 ` Sheng Yang
2009-01-20 18:51 ` Marcelo Tosatti
2009-01-21 2:40 ` Sheng Yang
2009-01-21 4:23 ` Marcelo Tosatti
2009-01-21 5:11 ` Sheng Yang
2009-01-21 15:07 ` Marcelo Tosatti
2009-01-21 16:01 ` Alexander Graf
2009-01-21 16:03 ` Alexander Graf
2009-01-21 16:18 ` Alexander Graf
2009-01-21 16:55 ` Marcelo Tosatti
2009-01-22 13:08 ` Avi Kivity
2009-01-23 17:58 ` Alex Williamson
2009-01-10 11:25 ` Sheng Yang
2009-01-10 11:28 ` Sheng Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4975BEF3.8040509@suse.de \
--to=agraf@suse.de \
--cc=avi@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=kwolf@suse.de \
--cc=mtosatti@redhat.com \
--cc=sheng@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.