From: "Yang, Sheng" <sheng.yang@intel.com>
To: kvm-devel@lists.sourceforge.net
Subject: The SMP RHEL 5.1 PAE guest can't boot up issue
Date: Fri, 22 Feb 2008 16:57:34 +0800
Message-ID: <200802221657.34243.sheng.yang@intel.com>
I believe I have found the root cause of the issue where the SMP RHEL 5.1 PAE
guest can't boot. The problem was caused by
kvm:6685637b211ad67bdce21bfd9f91bc888b3acb4f
"KVM: VMX: Ensure vcpu time stamp counter is monotonous" (It didn't take me
much time to find the solution, but it took a lot of time to find the proper
explanation... :( )
As we guessed, the problem is related to the monotonicity of the TSC. I traced
into the 2.6.18 PAE guest kernel and finally found that it is caused by an
overflow in the loop of the function update_wall_time() (kernel/timer.c) when
the TSC is used as the clocksource, which is the default.
The reason is that the patch "KVM: VMX: Ensure vcpu time stamp counter is
monotonous" introduces a big gap between different VCPUs (a discrepancy
between their TSC_OFFSETs). Though I have proved that the patch does ensure
monotonicity on each VCPU (which disproved my first guess...), the patch
has 2 problems:
1. It accumulates error. Each VCPU's TSC is monotonic, but it gets slower and
slower compared to the host. That's because the TSC is very precise and the
interval between TSC reads is large. But this is not very critical.
2. The critical one. Under normal conditions, VCPU0 is migrated much more
frequently than the other VCPUs, and the patch adds another "delta" (always
negative if the host TSC is stable) to TSC_OFFSET each time a VCPU is
migrated. So after the guest has been booting for a while, VCPU0 becomes much
slower than the others (in my test, VCPU0 was migrated about twice as often as
the others, and easily ended up more than 100k cycles behind). In the guest
kernel, the TSC clocksource is a global variable, so its "cycle_last" field
may be set from VCPU1's TSC value and then be used on VCPU0. Because VCPU0's
TSC_OFFSET is smaller than VCPU1's, "cycle_last" (from VCPU1) can be bigger
than the current TSC value (from VCPU0) on the next tick. Then "u64 offset =
clocksource_read() - cycle_last" wraps around (overflows) to a huge value and
causes the "infinite" loop.
This also explains why Marcelo's patch doesn't work: it only reduces the rate
at which the gap grows.
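To make the wraparound concrete, here is a minimal C sketch of the scenario
described above. It is a toy model under stated assumptions, not the 2.6.18
source: struct toy_vcpu, the toy_* helpers, and all numbers are hypothetical,
and the guest TSC is modeled simply as the host TSC plus a per-VCPU offset.

```c
#include <stdint.h>

/* Toy model only: guest TSC = host TSC + per-VCPU TSC_OFFSET. */
struct toy_vcpu {
    int64_t tsc_offset; /* hypothetical stand-in for the TSC_OFFSET value */
};

/* What the guest would read as its TSC on this VCPU (modular arithmetic). */
uint64_t toy_read_tsc(const struct toy_vcpu *v, uint64_t host_tsc)
{
    return host_tsc + (uint64_t)v->tsc_offset;
}

/*
 * The unsigned subtraction quoted in the mail. If cycle_last was taken on a
 * VCPU that runs ahead of the one doing the read, "now" is smaller than
 * cycle_last and the result wraps to a value close to 2^64.
 */
uint64_t toy_offset(uint64_t now, uint64_t cycle_last)
{
    return now - cycle_last;
}

/*
 * Number of passes a loop of the assumed shape
 *     while (offset >= cycle_interval) offset -= cycle_interval;
 * would need before it terminates.
 */
uint64_t toy_loop_passes(uint64_t offset, uint64_t cycle_interval)
{
    return offset / cycle_interval;
}
```

For example, with made-up offsets of -200000 for VCPU1 and -300000 for VCPU0,
a cycle_last taken on VCPU1 at host TSC 1000000000 is 999800000. A read on
VCPU0 50000 host cycles later returns 999750000, so toy_offset() yields
2^64 - 50000 instead of a small positive number, and a loop that subtracts a
2000000-cycle interval per pass would need more than 10^12 passes, which looks
like a hang.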
The freeze didn't happen with the userspace IOAPIC only because the qemu APIC
doesn't implement real LOWPRI (or round-robin) selection of the CPU for
delivery. It chooses VCPU0 every time if possible, so CPU1 in the guest never
updates cycle_last. :(
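The difference between the two delivery behaviours can be sketched in C as
follows. This is only an illustration, not the qemu or in-kernel APIC code:
toy_pick_first() and toy_pick_round_robin() are hypothetical names, and round
robin is used as a simple stand-in for real lowest-priority arbitration.

```c
#include <stdint.h>

/*
 * Hypothetical picker modeling the behaviour described for the userspace
 * APIC: always deliver to the lowest-numbered CPU in the destination mask.
 * Returns the CPU index, or -1 if the mask is empty.
 */
int toy_pick_first(uint32_t dest_mask)
{
    for (int cpu = 0; cpu < 32; cpu++) {
        if (dest_mask & (UINT32_C(1) << cpu))
            return cpu;
    }
    return -1;
}

/*
 * Hypothetical round-robin picker: rotate through the CPUs in the mask,
 * starting after the previously chosen one. *last holds the last chosen
 * CPU index and must be -1 before the first call.
 */
int toy_pick_round_robin(uint32_t dest_mask, int *last)
{
    for (int i = 1; i <= 32; i++) {
        int cpu = (*last + i) % 32;
        if (dest_mask & (UINT32_C(1) << cpu)) {
            *last = cpu;
            return cpu;
        }
    }
    return -1;
}
```

With a destination mask of 0x3, toy_pick_first() returns CPU 0 on every call,
so only one VCPU ever handles the timer tick. toy_pick_round_robin()
alternates between CPU 0 and CPU 1, which is the situation where a value
written on one VCPU gets compared against a TSC read on the other.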
This freeze only occurs on RHEL5/5.1 PAE (kernel 2.6.18), because they set
IO-APIC IRQ0's dest_mask to 0x3 (with 2 VCPUs) and dest_mode to
LOWEST_PRIORITY, so the other VCPUs have a chance to modify "cycle_last". In
contrast, RHEL5/5.1 32e sets IRQ0's dest_mode to FIXED, targeting CPU0, and so
doesn't have this problem. The same goes for RHEL4 (kernel 2.6.9).
I don't know whether the patch is still needed, since it was posted long ago
(I don't know which issue it solved). I'd like to post a revert patch if
necessary.
--
Thanks
Yang, Sheng