public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sheng Yang <sheng@linux.intel.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Alexander Graf <agraf@suse.de>,
	kvm@vger.kernel.org, avi@redhat.com, Kevin Wolf <kwolf@suse.de>
Subject: Re: [PATCH] Fix almost infinite loop in APIC
Date: Wed, 14 Jan 2009 17:17:22 +0800	[thread overview]
Message-ID: <200901141717.23587.sheng@linux.intel.com> (raw)
In-Reply-To: <20090113220117.GB8945@amt.cnet>

On Wednesday 14 January 2009 06:01:17 Marcelo Tosatti wrote:
> On Tue, Jan 13, 2009 at 03:47:49PM +0800, Sheng Yang wrote:
> > On Sunday 11 January 2009 12:55:58 Marcelo Tosatti wrote:
> >
> > The original apic_timer_intr_post() got:
> > >         if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> > >                 apic->timer.last_update = ktime_add_ns(
> > >                                 apic->timer.last_update,
> > >                                 apic->timer.period);
> >
> > I think the purpose is let guest see a reasonable TMCCT. Which means:
> >
> > 1. Timer start to fire at start_apic_timer(). last_update=now().
> >
> > 2. Every one LAPIC timer interrupt was injected into the guest, so the
> > time at least elapsed timer.period(the first alarm set at "period"
> > later), then last_update is increased by period. Notice last_update's
> > base is set before timer fire, so it's not indicated the next one, but
> > *the time this interrupt should be injected*(time can be delayed)... So
> > last_update = n*period + base.
>
> Right.
>
> > 3. If there is pending interrupt, last_update won't be updated, so we
> > have
> >
> > this:
> > >                         while (counter_passed > tmcct)
> > >                                 counter_passed -= tmcct;
> > >                         tmcct -= counter_passed;
> >
> > (notice tmcct is TMICT here.)
> >
> > And the returned tmcct value indicated (counter_passed % tmict), a offset
> > regardless of pending interrupt numbers.
>
> Right. The problem is that for accumulated interrupts, the guest will
> receive the interrupt as fast as the qemu process can be scheduled (as
> long as its not masked, of course). There could be higher priority
> vectors in the mix, but thats not the common case.
>
> So calculating the offset using last interrupt injection is not very
> reasonable in this case.

Um... I think it's not easy to find a complete reasonable result here, I 
haven't look into how OS us TMCCT, maybe it can help us to find a more 
reasonable result here.

> > I think now the overflow seems OK, but I am not sure why last_update can
> > be bigger than ktime_get()? Maybe due to vcpu migration? Suspect some
> > racy or boundary condition existed...
>
> Yep, it seems to be some small inaccuracy in the accounting that
> causes it.
>
> > And base on this, I don't think my quick fix is correct...
> >
> > > 3) And then there's interrupt reinjection. Once interrupts accumulate,
> > > and we reinject, current count reads become completly bogus. This is
> > > the reason for time drift on RHEL4-style kernels with "clock=tsc". The
> > > algorithm calculates the interrupt handling delay by using the PIT
> > > count (equivalent to APIC_TMCCT). But PIT count emulation uses the
> > > hrtimer expiration time, which has nothing to do with the accumulated
> > > interrupts.
> > >
> > > So what I propose is to first switch lapic current count emulation to a
> > > straightforward scaled hrtimer_get_remaining equivalent.
> > >
> > > For the reinjection case, maintain an average of the delay between host
> > > interrupt and interrupt injection (this can be generic, so both PIT
> > > and LAPIC timer can use it). Return that scaled average on APIC_TMCCT
> > > emulation whenever pending > 2.
> > >
> > > What you think?
> >
> > 1. If we simply use ktime_get() to update last_update, the interval can't
> > be same between different last_update. I think we may got some problem
> > here, but not for sure. For example, three delayed interrupt was injected
> > one after one, last_update would show very little interval, and
> > APIC_TMCCT may got trouble.
>
> I mean get rid of last_update and use hrtimer_get_remaining on
> apic_get_tmcct for no pending interrupts. And estimated delay between
> timer fire and injection if there are pending interrupts.
>
> > Maintain an average of the delay seems a little tricky here, and I am not
> > sure if it would help the problem...  Average is average at most, not the
> > real affection at the time...
>
> Yes. Probably using the last delay between timer fire and injection is
> more accurate than an average.
>
> > 2. For the current mechanism, the interval is the same, so last_update
> > always equal to n*period + base. If there are more than one pending
> > interrupts, TMCCT should also can return the relative correct value.
> >
> > So I am leaning toward to fix current problem, though I haven't find the
> > root cause yet...
>
> I think it can be simpler, without the need to deal with overflow at
> all.
>
> Understanding the present root cause is important though.

Yes, but still don't have many clues now... I would look through the code and 
purpose some patches for Alex to test...

-- 
regards
Yang, Sheng

  reply	other threads:[~2009-01-14  9:17 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-08 16:36 [PATCH] Fix almost infinite loop in APIC Alexander Graf
2009-01-09  6:34 ` Sheng Yang
2009-01-09 10:49   ` Alexander Graf
2009-01-09 12:57   ` Alexander Graf
2009-01-10 11:21     ` Sheng Yang
2009-01-11  4:55       ` Marcelo Tosatti
2009-01-13  7:47         ` Sheng Yang
2009-01-13 22:01           ` Marcelo Tosatti
2009-01-14  9:17             ` Sheng Yang [this message]
2009-01-14 17:03               ` Marcelo Tosatti
2009-01-15  7:20                 ` Sheng Yang
2009-01-16  5:01                   ` Marcelo Tosatti
2009-01-20 10:41                     ` Alexander Graf
2009-01-20 11:20                       ` Sheng Yang
2009-01-20 12:09                         ` Alexander Graf
2009-01-20 12:30                           ` Sheng Yang
2009-01-20 13:43                       ` Sheng Yang
2009-01-20 18:51                         ` Marcelo Tosatti
2009-01-21  2:40                           ` Sheng Yang
2009-01-21  4:23                             ` Marcelo Tosatti
2009-01-21  5:11                               ` Sheng Yang
2009-01-21 15:07                                 ` Marcelo Tosatti
2009-01-21 16:01                                   ` Alexander Graf
2009-01-21 16:03                                     ` Alexander Graf
2009-01-21 16:18                                   ` Alexander Graf
2009-01-21 16:55                                     ` Marcelo Tosatti
2009-01-22 13:08                                   ` Avi Kivity
2009-01-23 17:58                                     ` Alex Williamson
2009-01-10 11:25     ` Sheng Yang
2009-01-10 11:28     ` Sheng Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200901141717.23587.sheng@linux.intel.com \
    --to=sheng@linux.intel.com \
    --cc=agraf@suse.de \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwolf@suse.de \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox