From: Philipp Hahn <hahn@univention.de>
To: Zachary Amsden <zamsden@redhat.com>
Cc: kvm@vger.kernel.org, Avi Kivity <avi@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Glauber Costa <glommer@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
John Stultz <johnstul@us.ibm.com>
Subject: Re: [KVM timekeeping 16/35] Fix a possible backwards warp of kvmclock
Date: Fri, 2 Sep 2011 20:34:11 +0200
Message-ID: <201109022034.16517.hahn@univention.de>
In-Reply-To: <1282291669-25709-17-git-send-email-zamsden@redhat.com>
Hello,
there have been several reports on kvm-devel about problems with kvm-clock. I
can reproduce this 100% of the time.
1. It doesn't depend on the guest kernel being 2.6.32 or 3.0.1.
2. qemu-0.14.1 is broken, qemu-0.15.0 works.
3. host-kernel 2.6.32.40 is broken, 3.0.0 works.
So far I have found two solutions: upgrade either qemu-kvm or the
host kernel, neither of which is an option for me.
I tracked it down to this patch: if I revert it, the VM starts fine.
My current procedure to reproduce the problem is:
1. boot into guest kernel
2. reboot from within guest
3. on the second boot, the VM crawls. The kernel timestamps printed are
roughly the same numbers as on the first boot, but they don't match the
wall clock: the kernel claims an uptime of 10 s, but real time is
more like 42 s. If the boot process makes it as far as a command line,
running "sleep 1" takes much longer than one second.
After adding a printk(..., max_kernel_ns, kernel_ns), I observed that during
the first boot I get 10k calls to that function per second; on the second boot
it's down to 10-20 per second.
Also max_kernel_ns is way larger than kernel_ns:
18:27:17 [23755.005941] 6148360456312778392 23755005912468
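One way to read these two numbers, offered as an unconfirmed hypothesis rather than a diagnosis: the patch computes last_guest_tsc - hv_clock.tsc_timestamp as an unsigned 64-bit value, so if last_guest_tsc is ever smaller than the stored tsc_timestamp, the difference wraps to nearly 2^64. The user-space sketch below mimics the clamp from the quoted patch. scale_delta() is a simplified stand-in for pvclock_scale_delta(), struct clock_state is a hypothetical subset of the vcpu state, and the multiplier and shift (0xAAAAAAAA and -1, roughly a 3 GHz TSC) are illustrative assumptions, not values read from the affected host.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified stand-in for pvclock_scale_delta(): shift the cycle delta,
 * then multiply by a 32-bit fixed-point fraction and drop the low 32 bits.
 */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    assert(shift > -64 && shift < 64);
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* (delta * mul_frac) >> 32 without needing a 128-bit type. */
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    return hi * mul_frac + ((lo * mul_frac) >> 32);
}

/* Hypothetical subset of the per-vCPU state touched by the patch. */
struct clock_state {
    uint64_t tsc_timestamp;   /* hv_clock.tsc_timestamp */
    uint64_t last_guest_tsc;
    int64_t  last_kernel_ns;
    uint32_t mul;             /* hv_clock.tsc_to_system_mul */
    int      shift;           /* hv_clock.tsc_shift */
};

/* Mirrors the clamp in the patch: returns the kernel_ns that would be used. */
static int64_t clamp_kernel_ns(const struct clock_state *s, int64_t kernel_ns)
{
    int64_t max_kernel_ns = 0;

    if (s->tsc_timestamp && s->last_guest_tsc) {
        /* Unsigned subtraction: wraps if last_guest_tsc < tsc_timestamp. */
        uint64_t delta = s->last_guest_tsc - s->tsc_timestamp;

        max_kernel_ns = (int64_t)scale_delta(delta, s->mul, s->shift);
        max_kernel_ns += s->last_kernel_ns;
    }
    return max_kernel_ns > kernel_ns ? max_kernel_ns : kernel_ns;
}
```

With the assumed inputs tsc_timestamp = 5000 and last_guest_tsc = 1000, the scaled delta comes out on the order of 2^64 / 3, about 6.1e18, which is the same order of magnitude as the max_kernel_ns logged above, and the clamp then replaces kernel_ns with that value.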
This patch was back-ported to 2.6.32.40, but it looks like some other
infrastructure might have changed, so it doesn't work as expected.
Any idea on how to proceed?
On Friday 20 August 2010 10:07:30 Zachary Amsden wrote:
> Kernel time, which advances in discrete steps may progress much slower
> than TSC. As a result, when kvmclock is adjusted to a new base, the
> apparent time to the guest, which runs at a much higher, nsec scaled
> rate based on the current TSC, may have already been observed to have
> a larger value (kernel_ns + scaled tsc) than the value to which we are
> setting it (kernel_ns + 0).
>
> We must instead compute the clock as potentially observed by the guest
> for kernel_ns to make sure it does not go backwards.
>
> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/x86.c | 43 +++++++++++++++++++++++++++++++++++++-
> 2 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 324e892..871800d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -339,6 +339,8 @@ struct kvm_vcpu_arch {
>  	unsigned int time_offset;
>  	struct page *time_page;
>  	u64 last_host_tsc;
> +	u64 last_guest_tsc;
> +	u64 last_kernel_ns;
>
>  	bool nmi_pending;
>  	bool nmi_injected;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1948c36..fe74b42 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -976,14 +976,15 @@ static int kvm_write_guest_time(struct kvm_vcpu *v)
>  	struct kvm_vcpu_arch *vcpu = &v->arch;
>  	void *shared_kaddr;
>  	unsigned long this_tsc_khz;
> -	s64 kernel_ns;
> +	s64 kernel_ns, max_kernel_ns;
> +	u64 tsc_timestamp;
>
>  	if ((!vcpu->time_page))
>  		return 0;
>
>  	/* Keep irq disabled to prevent changes to the clock */
>  	local_irq_save(flags);
> -	kvm_get_msr(v, MSR_IA32_TSC, &vcpu->hv_clock.tsc_timestamp);
> +	kvm_get_msr(v, MSR_IA32_TSC, &tsc_timestamp);
>  	kernel_ns = get_kernel_ns();
>  	this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
>  	local_irq_restore(flags);
> @@ -993,13 +994,49 @@ static int kvm_write_guest_time(struct kvm_vcpu *v)
>  		return 1;
>  	}
>
> +	/*
> +	 * Time as measured by the TSC may go backwards when resetting the base
> +	 * tsc_timestamp. The reason for this is that the TSC resolution is
> +	 * higher than the resolution of the other clock scales. Thus, many
> +	 * possible measurments of the TSC correspond to one measurement of any
> +	 * other clock, and so a spread of values is possible. This is not a
> +	 * problem for the computation of the nanosecond clock; with TSC rates
> +	 * around 1GHZ, there can only be a few cycles which correspond to one
> +	 * nanosecond value, and any path through this code will inevitably
> +	 * take longer than that. However, with the kernel_ns value itself,
> +	 * the precision may be much lower, down to HZ granularity. If the
> +	 * first sampling of TSC against kernel_ns ends in the low part of the
> +	 * range, and the second in the high end of the range, we can get:
> +	 *
> +	 * (TSC - offset_low) * S + kns_old > (TSC - offset_high) * S + kns_new
> +	 *
> +	 * As the sampling errors potentially range in the thousands of cycles,
> +	 * it is possible such a time value has already been observed by the
> +	 * guest. To protect against this, we must compute the system time as
> +	 * observed by the guest and ensure the new system time is greater.
> +	 */
> +	max_kernel_ns = 0;
> +	if (vcpu->hv_clock.tsc_timestamp && vcpu->last_guest_tsc) {
> +		max_kernel_ns = vcpu->last_guest_tsc -
> +				vcpu->hv_clock.tsc_timestamp;
> +		max_kernel_ns = pvclock_scale_delta(max_kernel_ns,
> +				vcpu->hv_clock.tsc_to_system_mul,
> +				vcpu->hv_clock.tsc_shift);
> +		max_kernel_ns += vcpu->last_kernel_ns;
> +	}
> +
>  	if (unlikely(vcpu->hw_tsc_khz != this_tsc_khz)) {
>  		kvm_set_time_scale(this_tsc_khz, &vcpu->hv_clock);
>  		vcpu->hw_tsc_khz = this_tsc_khz;
>  	}
>
> +	if (max_kernel_ns > kernel_ns)
> +		kernel_ns = max_kernel_ns;
> +
>  	/* With all the info we got, fill in the values */
> +	vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
>  	vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
> +	vcpu->last_kernel_ns = kernel_ns;
>  	vcpu->hv_clock.flags = 0;
>
>  	/*
> @@ -4931,6 +4968,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	if (hw_breakpoint_active())
>  		hw_breakpoint_restore();
>
> +	kvm_get_msr(vcpu, MSR_IA32_TSC, &vcpu->arch.last_guest_tsc);
> +
>  	atomic_set(&vcpu->guest_mode, 0);
>  	smp_wmb();
>  	local_irq_enable();
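To make the inequality in the patch comment concrete, here is a small user-space sketch with made-up numbers: a 1 GHz TSC, so that one cycle is one nanosecond, and a kernel clock that has not yet ticked forward between two samples. Both are assumptions for illustration. struct pv_time, guest_time() and rebase() are hypothetical names that model what the guest computes from the published system_time and tsc_timestamp; they are not kernel functions, and the patch's extra check for non-zero timestamps is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Published kvmclock parameters, reduced to the two fields that matter here. */
struct pv_time {
    uint64_t tsc_timestamp;  /* TSC value when the base was sampled */
    int64_t  system_time;    /* kernel_ns at that moment */
};

/* Hypothetical model of a guest read at a 1 GHz TSC: 1 cycle == 1 ns. */
static int64_t guest_time(const struct pv_time *pv, uint64_t tsc_now)
{
    assert(tsc_now >= pv->tsc_timestamp); /* model assumes a monotonic TSC */
    return pv->system_time + (int64_t)(tsc_now - pv->tsc_timestamp);
}

/* Re-base the clock, optionally applying the patch's clamp. */
static void rebase(struct pv_time *pv, uint64_t tsc_now, int64_t kernel_ns,
                   uint64_t last_guest_tsc, int use_clamp)
{
    if (use_clamp) {
        /* Largest time the guest may already have observed. */
        int64_t max_kernel_ns = guest_time(pv, last_guest_tsc);

        if (max_kernel_ns > kernel_ns)
            kernel_ns = max_kernel_ns;
    }
    pv->tsc_timestamp = tsc_now;
    pv->system_time = kernel_ns;
}
```

With a base of tsc_timestamp = 1000000 and system_time = 8000000, a guest read at TSC 4900000 returns 11900000. Re-basing at TSC 4950000 while kernel_ns still reads 8000000 makes the next guest read return 8000000 without the clamp, and 11900000 with it.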
Sincerely
Philipp
--
Philipp Hahn Open Source Software Engineer hahn@univention.de
Univention GmbH Linux for Your Business fon: +49 421 22 232- 0
Mary-Somerville-Str.1 D-28359 Bremen fax: +49 421 22 232-99
http://www.univention.de/
----------------------------------------------------------------------------
Meet Univention at IT&Business, 20 to 22 September 2011,
at the joint booth of the Open Source Business Alliance in Stuttgart,
Hall 3, Booth 3D27-7.