From: Paolo Bonzini <pbonzini@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm-devel <kvm@vger.kernel.org>
Subject: Re: KVM: x86: fix tsc catchup issue with tsc scaling
Date: Wed, 15 Jan 2014 12:43:02 +0100 [thread overview]
Message-ID: <52D67446.3020403@redhat.com> (raw)
In-Reply-To: <20140106141859.GA27260@amt.cnet>
Il 06/01/2014 15:18, Marcelo Tosatti ha scritto:
>
> To fix a problem related to different resolution of TSC and system clock,
> the offset in TSC units is approximated by
>
> delta = vcpu->hv_clock.tsc_timestamp - vcpu->last_guest_tsc
>
> (Guest TSC value at (Guest TSC value at last VM-exit)
> the last kvm_guest_time_update
> call)
>
> Delta is then later scaled using mult,shift pair found in hv_clock
> structure (which is correct against tsc_timestamp in that
> structure).
>
> However, if a frequency change is performed between these two points,
> this delta is measured using different TSC frequencies, but scaled using
> mult,shift pair for one frequency only.
>
> The end result is an incorrect delta.
>
> The bug which this code works around is not the only cause for
> clock backwards events. The global accumulator is still
> necessary, so remove the max_kernel_ns fix and rely on the
> global accumulator for no clock backwards events.
This is basically reverting commit 1d5f066 (KVM: x86: Fix a possible
backwards warp of kvmclock, 2010-08-19).
Your commit message basically says that the guest-side 489fb49 (x86,
paravirt: Add a global synchronization point for pvclock, 2010-05-11) is
the real solution to the problem that the host-side commit 1d5f066 was
trying to fix. Right?
This patch makes vcpu->hv_clock.tsc_timestamp write only. Please
provide a follow up that drops the field entirely, then I'll apply both.
In the meanwhile:
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Thanks,
Paolo
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Index: linux-2.6.git/arch/x86/kvm/x86.c
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/kvm/x86.c
> +++ linux-2.6.git/arch/x86/kvm/x86.c
> @@ -1484,7 +1484,7 @@ static int kvm_guest_time_update(struct
> unsigned long flags, this_tsc_khz;
> struct kvm_vcpu_arch *vcpu = &v->arch;
> struct kvm_arch *ka = &v->kvm->arch;
> - s64 kernel_ns, max_kernel_ns;
> + s64 kernel_ns;
> u64 tsc_timestamp, host_tsc;
> struct pvclock_vcpu_time_info guest_hv_clock;
> u8 pvclock_flags;
> @@ -1543,37 +1543,6 @@ static int kvm_guest_time_update(struct
> if (!vcpu->pv_time_enabled)
> return 0;
>
> - /*
> - * Time as measured by the TSC may go backwards when resetting the base
> - * tsc_timestamp. The reason for this is that the TSC resolution is
> - * higher than the resolution of the other clock scales. Thus, many
> - * possible measurments of the TSC correspond to one measurement of any
> - * other clock, and so a spread of values is possible. This is not a
> - * problem for the computation of the nanosecond clock; with TSC rates
> - * around 1GHZ, there can only be a few cycles which correspond to one
> - * nanosecond value, and any path through this code will inevitably
> - * take longer than that. However, with the kernel_ns value itself,
> - * the precision may be much lower, down to HZ granularity. If the
> - * first sampling of TSC against kernel_ns ends in the low part of the
> - * range, and the second in the high end of the range, we can get:
> - *
> - * (TSC - offset_low) * S + kns_old > (TSC - offset_high) * S + kns_new
> - *
> - * As the sampling errors potentially range in the thousands of cycles,
> - * it is possible such a time value has already been observed by the
> - * guest. To protect against this, we must compute the system time as
> - * observed by the guest and ensure the new system time is greater.
> - */
> - max_kernel_ns = 0;
> - if (vcpu->hv_clock.tsc_timestamp) {
> - max_kernel_ns = vcpu->last_guest_tsc -
> - vcpu->hv_clock.tsc_timestamp;
> - max_kernel_ns = pvclock_scale_delta(max_kernel_ns,
> - vcpu->hv_clock.tsc_to_system_mul,
> - vcpu->hv_clock.tsc_shift);
> - max_kernel_ns += vcpu->last_kernel_ns;
> - }
> -
> if (unlikely(vcpu->hw_tsc_khz != this_tsc_khz)) {
> kvm_get_time_scale(NSEC_PER_SEC / 1000, this_tsc_khz,
> &vcpu->hv_clock.tsc_shift,
> @@ -1581,14 +1550,6 @@ static int kvm_guest_time_update(struct
> vcpu->hw_tsc_khz = this_tsc_khz;
> }
>
> - /* with a master <monotonic time, tsc value> tuple,
> - * pvclock clock reads always increase at the (scaled) rate
> - * of guest TSC - no need to deal with sampling errors.
> - */
> - if (!use_master_clock) {
> - if (max_kernel_ns > kernel_ns)
> - kernel_ns = max_kernel_ns;
> - }
> /* With all the info we got, fill in the values */
> vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
> vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
>
next prev parent reply other threads:[~2014-01-15 11:43 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-06 14:18 KVM: x86: fix tsc catchup issue with tsc scaling Marcelo Tosatti
2014-01-15 11:43 ` Paolo Bonzini [this message]
2014-01-15 12:25 ` Marcelo Tosatti
2014-01-15 12:34 ` Marcelo Tosatti
2014-01-15 12:55 ` Paolo Bonzini
2014-01-15 16:37 ` Marcelo Tosatti
2014-01-15 16:53 ` Paolo Bonzini
2014-01-15 16:55 ` Marcelo Tosatti
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52D67446.3020403@redhat.com \
--to=pbonzini@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox