All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	 Jonathan Corbet <corbet@lwn.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,  Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  "H. Peter Anvin" <hpa@zytor.com>,
	Paul Durrant <paul@xen.org>,
	Peter Zijlstra <peterz@infradead.org>,
	 Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	 Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	 Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	 Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Shuah Khan <shuah@kernel.org>,
	 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	 jalliste@amazon.co.uk, sveith@amazon.de, zide.chen@intel.com,
	 Dongli Zhang <dongli.zhang@oracle.com>,
	Chenyi Qiang <chenyi.qiang@intel.com>
Subject: Re: [RFC PATCH v3 09/21] KVM: x86: Fix KVM clock precision in __get_kvmclock()
Date: Tue, 13 Aug 2024 19:58:16 -0700	[thread overview]
Message-ID: <ZrwdSLvlhde6uaAB@google.com> (raw)
In-Reply-To: <20240522001817.619072-10-dwmw2@infradead.org>

On Wed, May 22, 2024, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> When in 'master clock mode' (i.e. when host and guest TSCs are behaving
> sanely and in sync), the KVM clock is defined in terms of the guest TSC.
> 
> When TSC scaling is used, calculating the KVM clock directly from *host*
> TSC cycles leads to a systemic drift from the values calculated by the
> guest from its TSC.
> 
> Commit 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
> had a simple workaround for the specific case of Xen timers, as it had an
> actual vCPU to hand and could use its scaling information. That commit
> noted that it was broken for the general case of get_kvmclock_ns(), and
> said "I'll come back to that".
> 
> Since __get_kvmclock() is invoked without a specific CPU, it needs to
> be able to find or generate the scaling values required to perform the
> correct calculation.
> 
> Thankfully, TSC scaling can only happen with X86_FEATURE_CONSTANT_TSC,
> so it isn't as complex as it might have been.
> 
> In __kvm_synchronize_tsc(), note the current vCPU's scaling ratio in
> kvm->arch.last_tsc_scaling_ratio. That is only protected by the
> tsc_write_lock, so in pvclock_update_vm_gtod_copy(), copy it into a
> separate kvm->arch.master_tsc_scaling_ratio so that it can be accessed
> using the kvm->arch.pvclock_sc seqcount lock. Also generate the mul and
> shift factors to convert to nanoseconds for the corresponding KVM clock,
> just as kvm_guest_time_update() would.
> 
> In __get_kvmclock(), which runs within a seqcount retry loop, use those
> values to convert host to guest TSC and then to nanoseconds. Only fall
> back to using get_kvmclock_base_ns() when not in master clock mode.
> 
> There was previously a code path in __get_kvmclock() which looked like
> it could set KVM_CLOCK_TSC_STABLE without KVM_CLOCK_REALTIME, perhaps
> even on 32-bit hosts. In practice that could never happen as the
> ka->use_master_clock flag couldn't be set on 32-bit, and even on 64-bit
> hosts it would never be set when the system clock isn't TSC-based. So
> that code path is now removed.

This should be a separate patch.  Actually, patches, plural.  More below

> The kvm_get_wall_clock_epoch() function had the same problem; make it
> just call get_kvmclock() and subtract kvmclock from wallclock, with
> the same fallback as before.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---

...

> @@ -3100,36 +3131,49 @@ static unsigned long get_cpu_tsc_khz(void)
>  static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
>  {
>  	struct kvm_arch *ka = &kvm->arch;
> -	struct pvclock_vcpu_time_info hv_clock;
> +
> +#ifdef CONFIG_X86_64
> +	uint64_t cur_tsc_khz = 0;
> +	struct timespec64 ts;
>  
>  	/* both __this_cpu_read() and rdtsc() should be on the same cpu */
>  	get_cpu();
>  
> -	data->flags = 0;
>  	if (ka->use_master_clock &&
> -	    (static_cpu_has(X86_FEATURE_CONSTANT_TSC) || __this_cpu_read(cpu_tsc_khz))) {
> -#ifdef CONFIG_X86_64
> -		struct timespec64 ts;
> +	    (cur_tsc_khz = get_cpu_tsc_khz()) &&

That is mean.  And if you push it inside the if-statement, the {get,put}_cpu()
can be avoided when the master clock isn't being used, e.g.

	if (ka->use_master_clock) {
		/*
		 * The RDTSC needs to happen on the same CPU whose frequency is
		 * used to compute kvmclock's time.
		 */
		get_cpu();
    
    		cur_tsc_khz = get_cpu_tsc_khz();
		if (cur_tsc_khz &&
	    	    !kvm_get_walltime_and_clockread(&ts, &data->host_tsc))
			cur_tsc_khz = 0;

		put_cpu();
	}

However, the changelog essentially claims kvm_get_walltime_and_clockread() should
never fail when use_master_clock is enabled, which suggests a WARN is warranted.

    There was previously a code path in __get_kvmclock() which looked like
    it could set KVM_CLOCK_TSC_STABLE without KVM_CLOCK_REALTIME, perhaps
    even on 32-bit hosts. In practice that could never happen as the
    ka->use_master_clock flag couldn't be set on 32-bit, and even on 64-bit
    hosts it would never be set when the system clock isn't TSC-based. So
    that code path is now removed.

But, I think kvm_get_walltime_and_clockread() can fail when use_master_clock is
true, i.e. I don't think a WARN is viable as it could get false positives.

Ah, this is protected by pvclock_sc, so a stale use_master_clock should result
in a retry.  What if we WARN on that?

Hrm, that requires plumbing in the original sequence count.  Ah, but looking at
the patch as a whole, if we keep kvm_get_wall_clock_epoch()'s style, then it's
much easier.  And FWIW, I like the existing kvm_get_wall_clock_epoch() style a
lot more than the get_kvmclock() => __get_kvmclock() approach.

So, can we do this as prep patch #1?

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9c14d0f5a684..98806a59e110 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3360,9 +3360,16 @@ uint64_t kvm_get_wall_clock_epoch(struct kvm *kvm)
 
                local_tsc_khz = get_cpu_tsc_khz();
 
+               /*
+                * The master clock depends on the pvclock being based on TSC,
+                * so the only way kvm_get_walltime_and_clockread() can fail is
+                * if the clocksource changed and use_master_clock is stale, in
+                * which case a seqcount retry should be pending.
+                */
                if (local_tsc_khz &&
-                   !kvm_get_walltime_and_clockread(&ts, &host_tsc))
-                       local_tsc_khz = 0; /* Fall back to old method */
+                   !kvm_get_walltime_and_clockread(&ts, &host_tsc) &&
+                   WARN_ON_ONCE(!read_seqcount_retry(&ka->pvclock_sc, seq)))
+                           local_tsc_khz = 0; /* Fall back to old method */
 
                put_cpu();
 

And then as patch(es) 2..7 (give or take)

  (2) fold __get_kvmclock() into get_kvmclock()
  (3) and the same WARN on the seqcount in get_kvmclock() (but skimp on the comments)
  (4) use get_kvmclock_base_ns() as the fallback in get_kvmclock(), i.e. delete
      the raw rdtsc() and setting of KVM_CLOCK_TSC_STABLE w/o KVM_CLOCK_REALTIME
  (5) use get_cpu_tsc_khz() instead of open coding something similar
  (6) scale TSC when computing kvmclock (the core of this patch)
  (7) use get_kvmclock() in kvm_get_wall_clock_epoch() as the will be 100%
      equivalent at this point.

> +	    !kvm_get_walltime_and_clockread(&ts, &data->host_tsc))
> +		cur_tsc_khz = 0;
>  
> -		if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
> -			data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
> -			data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC;
> -		} else
> -#endif
> -		data->host_tsc = rdtsc();
> -
> -		data->flags |= KVM_CLOCK_TSC_STABLE;
> -		hv_clock.tsc_timestamp = ka->master_cycle_now;
> -		hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
> -		kvm_get_time_scale(NSEC_PER_SEC, get_cpu_tsc_khz() * 1000LL,
> -				   &hv_clock.tsc_shift,
> -				   &hv_clock.tsc_to_system_mul);
> -		data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
> -	} else {
> -		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
> +	put_cpu();
> +
> +	if (cur_tsc_khz) {
> +		uint64_t tsc_cycles;
> +		uint32_t mul;
> +		int8_t shift;
> +
> +		tsc_cycles = data->host_tsc - ka->master_cycle_now;
> +
> +		if (kvm_caps.has_tsc_control)
> +			tsc_cycles = kvm_scale_tsc(tsc_cycles,
> +						   ka->master_tsc_scaling_ratio);
> +
> +		if (static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
> +			mul = ka->master_tsc_mul;
> +			shift = ka->master_tsc_shift;
> +		} else {
> +			kvm_get_time_scale(NSEC_PER_SEC, cur_tsc_khz * 1000LL,
> +					   &shift, &mul);
> +		}
> +		data->clock = ka->master_kernel_ns + ka->kvmclock_offset +
> +			pvclock_scale_delta(tsc_cycles, mul, shift);
> +		data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
> +		data->flags = KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC | KVM_CLOCK_TSC_STABLE;
> +		return;
>  	}
> +#endif
>  
> -	put_cpu();
> +	data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
> +	data->flags = 0;
>  }


  parent reply	other threads:[~2024-08-14  2:58 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-22  0:16 [RFC PATCH v3 00/21] Cleaning up the KVM clock mess David Woodhouse
2024-05-22  0:16 ` [RFC PATCH v3 01/21] KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init() David Woodhouse
2024-05-22  0:16 ` [RFC PATCH v3 02/21] KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force David Woodhouse
2024-08-13 17:50   ` Sean Christopherson
2024-05-22  0:16 ` [RFC PATCH v3 03/21] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration David Woodhouse
2024-05-22  0:16 ` [RFC PATCH v3 04/21] UAPI: x86: Move pvclock-abi to UAPI for x86 platforms David Woodhouse
2024-05-24 13:14   ` Paul Durrant
2024-08-13 18:07   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 05/21] KVM: selftests: Add KVM/PV clock selftest to prove timer correction David Woodhouse
2024-08-13 18:55   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 06/21] KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC David Woodhouse
2024-05-22  0:17 ` [RFC PATCH v3 07/21] KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration David Woodhouse
2024-08-14  1:52   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 08/21] KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host David Woodhouse
2024-08-14  1:57   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 09/21] KVM: x86: Fix KVM clock precision in __get_kvmclock() David Woodhouse
2024-05-24 13:20   ` Paul Durrant
2024-08-14  2:58   ` Sean Christopherson [this message]
2024-05-22  0:17 ` [RFC PATCH v3 10/21] KVM: x86: Fix software TSC upscaling in kvm_update_guest_time() David Woodhouse
2024-05-24 13:26   ` Paul Durrant
2024-08-14  4:57   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 11/21] KVM: x86: Simplify and comment kvm_get_time_scale() David Woodhouse
2024-05-24 13:53   ` Paul Durrant
2024-08-15 15:46   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 12/21] KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset() David Woodhouse
2024-05-24 13:56   ` Paul Durrant
2024-08-15 15:52   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 13/21] KVM: x86: Improve synchronization in kvm_synchronize_tsc() David Woodhouse
2024-05-24 14:03   ` Paul Durrant
2024-08-15 16:00   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 14/21] KVM: x86: Kill cur_tsc_{nsec,offset,write} fields David Woodhouse
2024-05-24 14:05   ` Paul Durrant
2024-08-15 16:30   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 15/21] KVM: x86: Allow KVM master clock mode when TSCs are offset from each other David Woodhouse
2024-05-24 14:10   ` Paul Durrant
2024-08-16  2:38   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 16/21] KVM: x86: Factor out kvm_use_master_clock() David Woodhouse
2024-05-24 14:13   ` Paul Durrant
2024-08-15 17:12   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 17/21] KVM: x86: Avoid global clock update on setting KVM clock MSR David Woodhouse
2024-05-24 14:14   ` Paul Durrant
2024-08-16  4:28   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 18/21] KVM: x86: Avoid gratuitous global clock reload in kvm_arch_vcpu_load() David Woodhouse
2024-05-24 14:16   ` Paul Durrant
2024-08-15 17:31   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 19/21] KVM: x86: Avoid periodic KVM clock updates in master clock mode David Woodhouse
2024-05-24 14:18   ` Paul Durrant
2024-08-16  4:33   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 20/21] KVM: x86/xen: Prevent runstate times from becoming negative David Woodhouse
2024-05-24 14:21   ` Paul Durrant
2024-08-16  4:39   ` Sean Christopherson
2024-08-20 10:22     ` David Woodhouse
2024-08-20 15:08       ` Steven Rostedt
2024-08-20 15:42         ` David Woodhouse
2024-05-22  0:17 ` [RFC PATCH v3 21/21] sched/cputime: Cope with steal time going backwards or negative David Woodhouse
2024-05-24 14:25   ` Paul Durrant
2024-07-02 14:09   ` Shrikanth Hegde
2024-08-16  4:35   ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZrwdSLvlhde6uaAB@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=chenyi.qiang@intel.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=dongli.zhang@oracle.com \
    --cc=dwmw2@infradead.org \
    --cc=hpa@zytor.com \
    --cc=jalliste@amazon.co.uk \
    --cc=juri.lelli@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=paul@xen.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=shuah@kernel.org \
    --cc=sveith@amazon.de \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=x86@kernel.org \
    --cc=zide.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.