From: Marcelo Tosatti <marcelo@kvack.org>
To: Avi Kivity <avi@qumranet.com>
Cc: kvm-devel@lists.sourceforge.net
Subject: Re: The SMP RHEL 5.1 PAE guest can't boot up issue
Date: Fri, 22 Feb 2008 14:17:57 -0300
Message-ID: <20080222171756.GA10840@dmt>
In-Reply-To: <47BEF550.8040803@qumranet.com>
On Fri, Feb 22, 2008 at 06:16:16PM +0200, Avi Kivity wrote:
> > 2. The critical one. Under normal conditions, VCPU0 migrates much more
> > frequently than the other VCPUs, and the patch adds another "delta"
> > (always negative if the host TSC is stable) to TSC_OFFSET on each
> > migration. So some time after boot, VCPU0 becomes much slower than the
> > others (in my test, VCPU0 migrated about twice as often as the others,
> > and easily fell more than 100k cycles behind). In the guest kernel, the
> > TSC clocksource is a global variable, so "cycle_last" may be set from
> > VCPU1's TSC value and then be used on VCPU0. Because VCPU0's TSC_OFFSET
> > is smaller than VCPU1's, "cycle_last" (from VCPU1) can be bigger than the
> > current TSC value (from VCPU0) at the next tick. Then "u64 offset =
> > clocksource_read() - cycle_last" wraps around to a huge unsigned value
> > and causes the "infinite" loop. This also explains why Marcelo's patch
> > doesn't work: it only reduces the rate at which the gap grows.
Another source of problems in this area is that the TSC_OFFSET is
initialized to represent zero at different times for VCPU0 (at boot) and
the remaining ones (at APIC_DM_INIT).
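
To make the wraparound in the quoted "u64 offset = clocksource_read() -
cycle_last" concrete, here is a minimal sketch. It is not KVM or guest
kernel code: struct vcpu_model, guest_tsc(), migrate() and
elapsed_cycles() are hypothetical names, and the model simply assumes
guest TSC = host TSC + per-VCPU offset, with a negative delta added on
every migration.

```c
#include <stdint.h>

/* Hypothetical model of one VCPU: only its TSC offset matters here. */
struct vcpu_model {
    int64_t tsc_offset;   /* stands in for the per-VCPU TSC_OFFSET */
};

/* Guest-visible TSC, assuming guest = host + offset (modulo 2^64). */
static uint64_t guest_tsc(const struct vcpu_model *v, uint64_t host_tsc)
{
    return host_tsc + (uint64_t)v->tsc_offset;
}

/* Assumed behaviour under discussion: each migration adds a delta
 * (negative when the host TSC is stable) to the offset. */
static void migrate(struct vcpu_model *v, int64_t delta)
{
    v->tsc_offset += delta;
}

/* Same unsigned arithmetic as the quoted guest-side computation. */
static uint64_t elapsed_cycles(uint64_t now, uint64_t cycle_last)
{
    return now - cycle_last;
}
```

If VCPU0 has migrated twice as often as VCPU1, a cycle_last taken on
VCPU1 can exceed a later reading on VCPU0, and elapsed_cycles() then
returns a value close to 2^64 instead of a small positive number.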
> > The freeze didn't happen with the userspace IOAPIC only because the qemu
> > APIC doesn't implement real LOWPRI (or round robin) to choose the CPU for
> > delivery. It chooses VCPU0 every time if possible, so CPU1 in the guest
> > never updates cycle_last. :(
> >
> > This freeze only occurred on RHEL5/5.1 PAE (kernel 2.6.18) because they
> > set IO-APIC IRQ0's dest_mask to 0x3 (with 2 vcpus) and dest_mode to
> > LOWEST_PRIORITY, so the other vcpus had a chance to modify "cycle_last".
> > In contrast, RHEL5/5.1 32e sets IRQ0's dest_mode to FIXED, targeting
> > CPU0, and so doesn't have this problem. Neither does RHEL4 (kernel 2.6.9).
> >
> > I don't know whether the patch is still needed, since it was posted long
> > ago (I don't know which issue it solved). I'd like to post a revert patch
> > if necessary.
> >
>
> I believe the patch is still necessary, since we still need to guarantee
> that a vcpu's tsc is monotonic. I think there are three issues to be
> addressed:
>
> 1. The majority of Intel machines don't need the offset adjustment since
> they already have a constant-rate tsc that is synchronized on all cpus.
> I think this is indicated by X86_FEATURE_CONSTANT_TSC (though I'm not
> 100% certain whether it means that the rate is the same for all cpus;
> Thomas, can you clarify?)
The TSC might be marked unstable for other reasons (C3 state, large
machines with clustered APIC, cpufreq).
> This will improve tsc quality for those machines, but we can't depend on
> it, since some machines don't have constant tsc. Further, I don't think
> really large machines can have constant tsc since clock distribution
> becomes difficult or impossible.
As discussed earlier, if the host kernel does not consider the TSC
stable, it needs to enforce a state in which the guest OS will not trust
the TSC. The easiest way to do that is to fake a C3 state. However, QEMU
does not emulate I/O-port-based wait. This appears to be the reason for
the high CPU usage on idle with Windows guests, which was fixed by
disabling C3 reporting in rombios (commit
cb98751267c2d79f5674301ccac6c6b5c2e0c6b5 of kvm-userspace).
>
> 2. We should implement round robin and lowest priority like qemu does.
> Xen does the same thing:
>
> > /* HACK: Route IRQ0 only to VCPU0 to prevent time jumps. */
> > #define IRQ0_SPECIAL_ROUTING 1
> in arch/x86/hvm/vioapic.c, at least for irq 0.
>
> 3. The extra migrations on vcpu 0 are likely due to its role servicing
> I/O on behalf of the entire virtual machine. We should move this extra
> work to an independent thread. I have done some work in this area. It
> is becoming more important as kvm becomes more scalable.
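
As an illustration of point 2, the sketch below shows one way a
round-robin stand-in for lowest-priority delivery could pick a target
VCPU, with an optional IRQ0 pin in the spirit of the quoted Xen hack.
This is only a sketch under assumptions: struct rr_state, pick_vcpu()
and the pin_irq0 flag are hypothetical names, not taken from KVM, qemu
or Xen.

```c
#include <stdint.h>

/* Hypothetical per-IOAPIC round-robin cursor. */
struct rr_state {
    unsigned next;   /* VCPU index to try first on the next delivery */
};

/*
 * Pick the VCPU that should receive `irq`.
 * dest_mask: bit i set means VCPU i is an allowed destination.
 * pin_irq0:  if nonzero, IRQ0 always goes to VCPU0 when it is allowed.
 * Returns the VCPU index, or -1 if no allowed destination exists.
 */
static int pick_vcpu(struct rr_state *rr, uint32_t dest_mask,
                     unsigned nr_vcpus, unsigned irq, int pin_irq0)
{
    if (nr_vcpus == 0 || nr_vcpus > 32)
        return -1;

    /* Special routing: keep the timer interrupt on VCPU0. */
    if (pin_irq0 && irq == 0 && (dest_mask & 1u))
        return 0;

    /* Round robin over the allowed destinations. */
    for (unsigned i = 0; i < nr_vcpus; i++) {
        unsigned cand = (rr->next + i) % nr_vcpus;
        if (dest_mask & (1u << cand)) {
            rr->next = (cand + 1) % nr_vcpus;
            return (int)cand;
        }
    }
    return -1;
}
```

With dest_mask 0x3 and two vcpus, unpinned IRQ0 alternates between
VCPU0 and VCPU1, which is the case where cycle_last can be updated from
either TSC. With pin_irq0 set, IRQ0 stays on VCPU0.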