From: Sheng Yang <sheng@linux.intel.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Alexander Graf <agraf@suse.de>,
kvm@vger.kernel.org, avi@redhat.com, Kevin Wolf <kwolf@suse.de>
Subject: Re: [PATCH] Fix almost infinite loop in APIC
Date: Wed, 14 Jan 2009 17:17:22 +0800 [thread overview]
Message-ID: <200901141717.23587.sheng@linux.intel.com> (raw)
In-Reply-To: <20090113220117.GB8945@amt.cnet>
On Wednesday 14 January 2009 06:01:17 Marcelo Tosatti wrote:
> On Tue, Jan 13, 2009 at 03:47:49PM +0800, Sheng Yang wrote:
> > On Sunday 11 January 2009 12:55:58 Marcelo Tosatti wrote:
> >
> > The original apic_timer_intr_post() got:
> > > if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec)
> > > apic->timer.last_update = ktime_add_ns(
> > > apic->timer.last_update,
> > > apic->timer.period);
> >
> > I think the purpose is let guest see a reasonable TMCCT. Which means:
> >
> > 1. Timer start to fire at start_apic_timer(). last_update=now().
> >
> > 2. Every one LAPIC timer interrupt was injected into the guest, so the
> > time at least elapsed timer.period(the first alarm set at "period"
> > later), then last_update is increased by period. Notice last_update's
> > base is set before timer fire, so it's not indicated the next one, but
> > *the time this interrupt should be injected*(time can be delayed)... So
> > last_update = n*period + base.
>
> Right.
>
> > 3. If there is pending interrupt, last_update won't be updated, so we
> > have
> >
> > this:
> > > while (counter_passed > tmcct)
> > > counter_passed -= tmcct;
> > > tmcct -= counter_passed;
> >
> > (notice tmcct is TMICT here.)
> >
> > And the returned tmcct value indicated (counter_passed % tmict), a offset
> > regardless of pending interrupt numbers.
>
> Right. The problem is that for accumulated interrupts, the guest will
> receive the interrupt as fast as the qemu process can be scheduled (as
> long as its not masked, of course). There could be higher priority
> vectors in the mix, but thats not the common case.
>
> So calculating the offset using last interrupt injection is not very
> reasonable in this case.
Um... I think it's not easy to find a complete reasonable result here, I
haven't look into how OS us TMCCT, maybe it can help us to find a more
reasonable result here.
> > I think now the overflow seems OK, but I am not sure why last_update can
> > be bigger than ktime_get()? Maybe due to vcpu migration? Suspect some
> > racy or boundary condition existed...
>
> Yep, it seems to be some small inaccuracy in the accounting that
> causes it.
>
> > And base on this, I don't think my quick fix is correct...
> >
> > > 3) And then there's interrupt reinjection. Once interrupts accumulate,
> > > and we reinject, current count reads become completly bogus. This is
> > > the reason for time drift on RHEL4-style kernels with "clock=tsc". The
> > > algorithm calculates the interrupt handling delay by using the PIT
> > > count (equivalent to APIC_TMCCT). But PIT count emulation uses the
> > > hrtimer expiration time, which has nothing to do with the accumulated
> > > interrupts.
> > >
> > > So what I propose is to first switch lapic current count emulation to a
> > > straightforward scaled hrtimer_get_remaining equivalent.
> > >
> > > For the reinjection case, maintain an average of the delay between host
> > > interrupt and interrupt injection (this can be generic, so both PIT
> > > and LAPIC timer can use it). Return that scaled average on APIC_TMCCT
> > > emulation whenever pending > 2.
> > >
> > > What you think?
> >
> > 1. If we simply use ktime_get() to update last_update, the interval can't
> > be same between different last_update. I think we may got some problem
> > here, but not for sure. For example, three delayed interrupt was injected
> > one after one, last_update would show very little interval, and
> > APIC_TMCCT may got trouble.
>
> I mean get rid of last_update and use hrtimer_get_remaining on
> apic_get_tmcct for no pending interrupts. And estimated delay between
> timer fire and injection if there are pending interrupts.
>
> > Maintain an average of the delay seems a little tricky here, and I am not
> > sure if it would help the problem... Average is average at most, not the
> > real affection at the time...
>
> Yes. Probably using the last delay between timer fire and injection is
> more accurate than an average.
>
> > 2. For the current mechanism, the interval is the same, so last_update
> > always equal to n*period + base. If there are more than one pending
> > interrupts, TMCCT should also can return the relative correct value.
> >
> > So I am leaning toward to fix current problem, though I haven't find the
> > root cause yet...
>
> I think it can be simpler, without the need to deal with overflow at
> all.
>
> Understanding the present root cause is important though.
Yes, but still don't have many clues now... I would look through the code and
purpose some patches for Alex to test...
--
regards
Yang, Sheng
next prev parent reply other threads:[~2009-01-14 9:17 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-08 16:36 [PATCH] Fix almost infinite loop in APIC Alexander Graf
2009-01-09 6:34 ` Sheng Yang
2009-01-09 10:49 ` Alexander Graf
2009-01-09 12:57 ` Alexander Graf
2009-01-10 11:21 ` Sheng Yang
2009-01-11 4:55 ` Marcelo Tosatti
2009-01-13 7:47 ` Sheng Yang
2009-01-13 22:01 ` Marcelo Tosatti
2009-01-14 9:17 ` Sheng Yang [this message]
2009-01-14 17:03 ` Marcelo Tosatti
2009-01-15 7:20 ` Sheng Yang
2009-01-16 5:01 ` Marcelo Tosatti
2009-01-20 10:41 ` Alexander Graf
2009-01-20 11:20 ` Sheng Yang
2009-01-20 12:09 ` Alexander Graf
2009-01-20 12:30 ` Sheng Yang
2009-01-20 13:43 ` Sheng Yang
2009-01-20 18:51 ` Marcelo Tosatti
2009-01-21 2:40 ` Sheng Yang
2009-01-21 4:23 ` Marcelo Tosatti
2009-01-21 5:11 ` Sheng Yang
2009-01-21 15:07 ` Marcelo Tosatti
2009-01-21 16:01 ` Alexander Graf
2009-01-21 16:03 ` Alexander Graf
2009-01-21 16:18 ` Alexander Graf
2009-01-21 16:55 ` Marcelo Tosatti
2009-01-22 13:08 ` Avi Kivity
2009-01-23 17:58 ` Alex Williamson
2009-01-10 11:25 ` Sheng Yang
2009-01-10 11:28 ` Sheng Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200901141717.23587.sheng@linux.intel.com \
--to=sheng@linux.intel.com \
--cc=agraf@suse.de \
--cc=avi@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=kwolf@suse.de \
--cc=mtosatti@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.