From: Jan Kiszka <jan.kiszka@web.de>
To: Zachary Amsden <zamsden@redhat.com>
Cc: Avi Kivity <avi@redhat.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
Marcelo Tosatti <mtosatti@redhat.com>,
Glauber Costa <glommer@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
John Stultz <johnstul@us.ibm.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [KVM timekeeping 10/35] Fix deep C-state TSC desynchronization
Date: Wed, 15 Sep 2010 07:34:52 +0200
Message-ID: <4C905AFC.5020706@web.de>
In-Reply-To: <4C9007F2.9020205@redhat.com>
Am 15.09.2010 01:40, Zachary Amsden wrote:
> On 09/14/2010 12:26 PM, Jan Kiszka wrote:
>> Am 14.09.2010 21:32, Zachary Amsden wrote:
>>
>>> On 09/14/2010 12:40 AM, Jan Kiszka wrote:
>>>
>>>> Am 14.09.2010 11:27, Avi Kivity wrote:
>>>>
>>>>
>>>>> On 09/14/2010 11:10 AM, Jan Kiszka wrote:
>>>>>
>>>>>
>>>>>> Am 20.08.2010 10:07, Zachary Amsden wrote:
>>>>>>
>>>>>>
>>>>>>> When CPUs with unstable TSCs enter deep C-state, TSC may stop
>>>>>>> running. This causes us to require resynchronization. Since
>>>>>>> we can't tell when this may potentially happen, we assume the
>>>>>>> worst by forcing re-compensation for it at every point the VCPU
>>>>>>> task is descheduled.
>>>>>>>
>>>>>>> Signed-off-by: Zachary Amsden<zamsden@redhat.com>
>>>>>>> ---
>>>>>>> arch/x86/kvm/x86.c | 2 +-
>>>>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>>> index 7fc4a55..52b6c21 100644
>>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>>> @@ -1866,7 +1866,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>>>>> }
>>>>>>>
>>>>>>> kvm_x86_ops->vcpu_load(vcpu, cpu);
>>>>>>> - if (unlikely(vcpu->cpu != cpu)) {
>>>>>>> + if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>>>>>>> /* Make sure TSC doesn't go backwards */
>>>>>>> s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
>>>>>>> native_read_tsc() - vcpu->arch.last_host_tsc;
>>>>>>>
>>>>>>>
>>>>>> For a yet unknown reason, this commit breaks Linux guests here if they are
>>>>>> started with only a single VCPU. They hang during boot, obviously no
>>>>>> longer receiving interrupts.
>>>>>>
>>>>>> I'm using kvm-kmod against a 2.6.34 host kernel, so this may be a side
>>>>>> effect of the wrapping, though I cannot imagine how.
>>>>>>
>>>>>> Anyone any ideas?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> Most likely, time went backwards, and some 'future - past' calculation
>>>>> resulted in a negative sleep value which was then interpreted as
>>>>> unsigned and resulted in a 2342525634 year sleep.
>>>>>
>>>>>
>>>> Looks like that's the case on first glance at the apic state.
>>>>
>>>>
>>> This compensation effectively nulls the delta between current and
>>> last TSC:
>>>
>>>     if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>>>             /* Make sure TSC doesn't go backwards */
>>>             s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
>>>                             native_read_tsc() - vcpu->arch.last_host_tsc;
>>>             if (tsc_delta < 0)
>>>                     mark_tsc_unstable("KVM discovered backwards TSC");
>>>             if (check_tsc_unstable())
>>>                     kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>>>             kvm_migrate_timers(vcpu);
>>>             vcpu->cpu = cpu;
>>>
>>> If TSC has advanced quite a bit due to a TSC jump during sleep(*), it
>>> will adjust the offset backwards to compensate; similarly, if it has
>>> gone backwards, it will advance the offset.
>>>
>>> In neither case should the visible TSC go backwards, assuming
>>> last_host_tsc is recorded properly, and so kvmclock should be similarly
>>> unaffected.
>>>
>>> Perhaps the guest is more intelligent than we hope, and is comparing two
>>> different clocks: kvmclock or TSC with the rate of PIT interrupts. This
>>> could result in negative arithmetic being interpreted as unsigned. Are
>>> you using PIT interrupt reinjection on this guest or passing
>>> -no-kvm-pit-reinjection?
>>>
>>>
>>>>
>>>>
>>>>> Does your guest use kvmclock, tsc, or some other time source?
>>>>>
>>>>>
>>>> A kernel that has kvmclock support even hangs in SMP mode. The others
>>>> pick hpet or acpi_pm. TSC is considered unstable.
>>>>
>>>>
>>> SMP mode here has always and will always be unreliable. Are you running
>>> on an Intel or AMD CPU? The origin of this code comes from a workaround
>>> for (*) in vendor-specific code, and perhaps it is inappropriate for
>>> both.
>>>
>> I'm on a fairly new Intel i7 (M 620). And I accidentally rebooted my box
>> a few hours ago. Well, the issue is gone now...
>>
>> So I looked into the system logs and found this:
>>
>> [18446744053.434939] PM: resume of devices complete after 4379.595 msecs
>> [18446744053.457133] PM: Finishing wakeup.
>> [18446744053.457135] Restarting tasks ...
>> [ 0.000999] Marking TSC unstable due to KVM discovered backwards TSC
>> [270103.974668] done.
>>
>> From that point on the box was on hpet, including the time I did the
>> failing tests this morning. The kvm-kmod version loaded at this point
>> was based on kvm.git df549cfc.
>>
>> But my /proc/cpuinfo claims "constant_tsc", and Linux is generally happy
>> with using it as clock source. Does this tell you anything?
>>
>
> Yes, quite a bit.
>
> It's possible that marking the TSC unstable with an actively running VM
> causes a boundary condition that I had not accounted for. It's also
> possible that the clocksource switch triggered some bad behavior.
Suspend/resume (to RAM) is indeed what makes KVM trigger the tsc switch
here. That is the first issue, as the kernel itself has no problems
recovering from suspend/resume wrt. the tsc.
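As a rough illustration of why a resume can trip this path, the following
userspace sketch models the check quoted above. It assumes the host TSC
restarts from a small value after resume; that is an assumption, not
something the log proves. The names model_vcpu, model_vcpu_load and
model_guest_tsc are hypothetical stand-ins, not KVM's own structures, and
the sketch also assumes that adjust_tsc_offset() simply adds its argument
to the offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU state used by the check. */
struct model_vcpu {
	uint64_t last_host_tsc;	/* host TSC saved when the vCPU was descheduled */
	int64_t tsc_offset;	/* guest TSC = host TSC + tsc_offset */
};

/* Stand-in for the global flag behind check_tsc_unstable(). */
static bool model_tsc_unstable;

/* Guest-visible TSC for a given host TSC reading. */
static uint64_t model_guest_tsc(const struct model_vcpu *v, uint64_t host_tsc)
{
	return host_tsc + (uint64_t)v->tsc_offset;
}

/*
 * Mirrors the quoted logic: compute the delta since the last save, flag a
 * backwards TSC, and (once the TSC is considered unstable) subtract the
 * delta from the offset. Returns the delta for inspection.
 */
static int64_t model_vcpu_load(struct model_vcpu *v, uint64_t host_tsc_now)
{
	int64_t delta;

	assert(v != NULL);
	delta = !v->last_host_tsc ? 0 :
		(int64_t)(host_tsc_now - v->last_host_tsc);
	if (delta < 0)
		model_tsc_unstable = true;	/* "KVM discovered backwards TSC" */
	if (model_tsc_unstable)
		v->tsc_offset -= delta;		/* adjust_tsc_offset(vcpu, -delta) */
	return delta;
}
```

With last_host_tsc = 5000000000 and a post-resume reading of 1000, the delta
is negative, the flag gets set, and the offset grows by the same amount, so
the guest-visible TSC in this model stays where it was before the suspend.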
The next one is what happened to the guest running at that point. It was
a SUSE 11.3 32-bit image, using kvm-clock. After resume and host-side
clock switch it lost its timer ticks, likely due to some breakage of
kvm-clock.
And finally, I'm now in the original failure state again, in which every
newly started Linux guest with kvm-clock support also suffers from stuck
timers. Linux kernels that lack kvm-clock run fine, e.g. on the hpet
clocksource. Maybe this is just another symptom of what also causes the
second problem.
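The stuck timers would fit the earlier suggestion in this thread that a
negative interval ends up being treated as unsigned. A minimal sketch of
that failure mode follows; the helper names are hypothetical, and there is
no claim that the guest or kvm-clock code actually looks like this.

```c
#include <stdint.h>

/*
 * Hypothetical helper: nanoseconds until a timer deadline, computed naively.
 * If the clock jumped so that the deadline is already in the past, the
 * unsigned subtraction wraps around to an enormous value.
 */
static uint64_t ns_until_naive(uint64_t now_ns, uint64_t deadline_ns)
{
	return deadline_ns - now_ns;
}

/*
 * Same calculation, but evaluated as a signed difference and clamped, so
 * a deadline in the past means "fire immediately" rather than "sleep for
 * centuries".
 */
static uint64_t ns_until_clamped(uint64_t now_ns, uint64_t deadline_ns)
{
	int64_t delta = (int64_t)(deadline_ns - now_ns);

	return delta < 0 ? 0 : (uint64_t)delta;
}
```

For a deadline 1000 ns in the past, the naive version returns 2^64 - 1000,
which a timer would treat as a wait of several hundred years, while the
clamped version returns 0.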
>
> This suggests two debugging techniques: I can manually switch the
> clocksource, and I can also load a module which does nothing other than
> mark the TSC unstable. Failing that, we can investigate PM suspend /
> resume for possible issues.
>
> I'll try this on my Intel boxes to see what happens.
Do you think kvm-kmod could contribute to this? As I said, I'm on a 2.6.34
kernel, namely SUSE's 2.6.34.4-0.1-desktop. Is any feature missing in that
kernel that the latest KVM depends on for proper tsc/kvm-clock handling? If
you have any concerns, I could try to run kvm.git natively later on.
Jan