kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	 Jonathan Corbet <corbet@lwn.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,  Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  "H. Peter Anvin" <hpa@zytor.com>,
	Paul Durrant <paul@xen.org>,
	Peter Zijlstra <peterz@infradead.org>,
	 Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	 Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	 Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	 Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Shuah Khan <shuah@kernel.org>,
	 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	 jalliste@amazon.co.uk, sveith@amazon.de, zide.chen@intel.com,
	 Dongli Zhang <dongli.zhang@oracle.com>,
	Chenyi Qiang <chenyi.qiang@intel.com>
Subject: Re: [RFC PATCH v3 15/21] KVM: x86: Allow KVM master clock mode when TSCs are offset from each other
Date: Thu, 15 Aug 2024 19:38:36 -0700	[thread overview]
Message-ID: <Zr67rPxxSDYTYPYV@google.com> (raw)
In-Reply-To: <20240522001817.619072-16-dwmw2@infradead.org>

On Wed, May 22, 2024, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> There is no reason why the KVM clock cannot be in masterclock mode when
> the TSCs are not in sync, as long as they are at the same *frequency*.

Yes there is.  KVM sets PVCLOCK_TSC_STABLE_BIT when the master clock is enabled:

	if (use_master_clock)
		pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;

which tells the guest its safe to skip the cross-(v)CPU forced monotonicty goo:

	if ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&
		(flags & PVCLOCK_TSC_STABLE_BIT))
		return ret;

	/*
	 * Assumption here is that last_value, a global accumulator, always goes
	 * forward. If we are less than that, we should not be much smaller.
	 * We assume there is an error margin we're inside, and then the correction
	 * does not sacrifice accuracy.
	 *
	 * For reads: global may have changed between test and return,
	 * but this means someone else updated poked the clock at a later time.
	 * We just need to make sure we are not seeing a backwards event.
	 *
	 * For updates: last_value = ret is not enough, since two vcpus could be
	 * updating at the same time, and one of them could be slightly behind,
	 * making the assumption that last_value always go forward fail to hold.
	 */
	last = raw_atomic64_read(&last_value);
	do {
		if (ret <= last)
			return last;
	} while (!raw_atomic64_try_cmpxchg(&last_value, &last, ret));

	return ret;

As called out by commit b48aa97e3820 ("KVM: x86: require matched TSC offsets for
master clock"), that was the motivation for requiring matched offsets.

As an aside, Marcelo's assertion that "its obvious" was anything but to me.  Took
me forever to piece together the PVCLOCK_TSC_STABLE_BIT angle.

    KVM: x86: require matched TSC offsets for master clock
    
    With master clock, a pvclock clock read calculates:
    
    ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ]
    
    Where 'rdtsc' is the host TSC.
    
    system_timestamp and tsc_timestamp are unique, one tuple
    per VM: the "master clock".
    
    Given a host with synchronized TSCs, its obvious that
    guest TSC must be matched for the above to guarantee monotonicity.
    
    Allow master clock usage only if guest TSCs are synchronized.
    
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

> Running at a different frequency would lead to a systemic skew between
> the clock(s) as observed by different vCPUs due to arithmetic precision
> in the scaling. So that should indeed force the clock to be based on the
> host's CLOCK_MONOTONIC_RAW instead of being in masterclock mode where it
> is defined by the (or 'a') guest TSC.
> 
> But when the vCPUs merely have a different TSC *offset*, that's not a
> problem. The offset is applied to that vCPU's kvmclock->tsc_timestamp
> field, and it all comes out in the wash.
> 
> So, remove ka->nr_vcpus_matched_tsc and replace it with a new field
> ka->all_vcpus_matched_tsc which is not only changed to a boolean, but
> also now tracks that the *frequency* matches, not the precise offset.
> 
> Using a *count* was always racy because a new vCPU could be being
> created *while* kvm_track_tsc_matching() was running and comparing with
> kvm->online_vcpus. That variable is only atomic with respect to itself.
> In particular, kvm_arch_vcpu_create() runs before kvm->online_vcpus is
> incremented for the new vCPU, and kvm_arch_vcpu_postcreate() runs later.
> 
> Repurpose kvm_track_tsc_matching() to be called from kvm_set_tsc_khz(),
> and kill the cur_tsc_generation/last_tsc_generation fields which tracked
> the precise TSC matching.

If this goes anywhere, it needs to be at least three patches:

  1. Bulk code movement
  2. Chaning from a count to a bool/flag
  3. Eliminating the offset matching

  parent reply	other threads:[~2024-08-16  2:38 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-22  0:16 [RFC PATCH v3 00/21] Cleaning up the KVM clock mess David Woodhouse
2024-05-22  0:16 ` [RFC PATCH v3 01/21] KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init() David Woodhouse
2024-05-22  0:16 ` [RFC PATCH v3 02/21] KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force David Woodhouse
2024-08-13 17:50   ` Sean Christopherson
2024-05-22  0:16 ` [RFC PATCH v3 03/21] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration David Woodhouse
2024-05-22  0:16 ` [RFC PATCH v3 04/21] UAPI: x86: Move pvclock-abi to UAPI for x86 platforms David Woodhouse
2024-05-24 13:14   ` Paul Durrant
2024-08-13 18:07   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 05/21] KVM: selftests: Add KVM/PV clock selftest to prove timer correction David Woodhouse
2024-08-13 18:55   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 06/21] KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC David Woodhouse
2024-05-22  0:17 ` [RFC PATCH v3 07/21] KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration David Woodhouse
2024-08-14  1:52   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 08/21] KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host David Woodhouse
2024-08-14  1:57   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 09/21] KVM: x86: Fix KVM clock precision in __get_kvmclock() David Woodhouse
2024-05-24 13:20   ` Paul Durrant
2024-08-14  2:58   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 10/21] KVM: x86: Fix software TSC upscaling in kvm_update_guest_time() David Woodhouse
2024-05-24 13:26   ` Paul Durrant
2024-08-14  4:57   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 11/21] KVM: x86: Simplify and comment kvm_get_time_scale() David Woodhouse
2024-05-24 13:53   ` Paul Durrant
2024-08-15 15:46   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 12/21] KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset() David Woodhouse
2024-05-24 13:56   ` Paul Durrant
2024-08-15 15:52   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 13/21] KVM: x86: Improve synchronization in kvm_synchronize_tsc() David Woodhouse
2024-05-24 14:03   ` Paul Durrant
2024-08-15 16:00   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 14/21] KVM: x86: Kill cur_tsc_{nsec,offset,write} fields David Woodhouse
2024-05-24 14:05   ` Paul Durrant
2024-08-15 16:30   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 15/21] KVM: x86: Allow KVM master clock mode when TSCs are offset from each other David Woodhouse
2024-05-24 14:10   ` Paul Durrant
2024-08-16  2:38   ` Sean Christopherson [this message]
2024-05-22  0:17 ` [RFC PATCH v3 16/21] KVM: x86: Factor out kvm_use_master_clock() David Woodhouse
2024-05-24 14:13   ` Paul Durrant
2024-08-15 17:12   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 17/21] KVM: x86: Avoid global clock update on setting KVM clock MSR David Woodhouse
2024-05-24 14:14   ` Paul Durrant
2024-08-16  4:28   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 18/21] KVM: x86: Avoid gratuitous global clock reload in kvm_arch_vcpu_load() David Woodhouse
2024-05-24 14:16   ` Paul Durrant
2024-08-15 17:31   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 19/21] KVM: x86: Avoid periodic KVM clock updates in master clock mode David Woodhouse
2024-05-24 14:18   ` Paul Durrant
2024-08-16  4:33   ` Sean Christopherson
2024-05-22  0:17 ` [RFC PATCH v3 20/21] KVM: x86/xen: Prevent runstate times from becoming negative David Woodhouse
2024-05-24 14:21   ` Paul Durrant
2024-08-16  4:39   ` Sean Christopherson
2024-08-20 10:22     ` David Woodhouse
2024-08-20 15:08       ` Steven Rostedt
2024-08-20 15:42         ` David Woodhouse
2024-05-22  0:17 ` [RFC PATCH v3 21/21] sched/cputime: Cope with steal time going backwards or negative David Woodhouse
2024-05-24 14:25   ` Paul Durrant
2024-07-02 14:09   ` Shrikanth Hegde
2024-08-16  4:35   ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zr67rPxxSDYTYPYV@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=chenyi.qiang@intel.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=dongli.zhang@oracle.com \
    --cc=dwmw2@infradead.org \
    --cc=hpa@zytor.com \
    --cc=jalliste@amazon.co.uk \
    --cc=juri.lelli@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=paul@xen.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=shuah@kernel.org \
    --cc=sveith@amazon.de \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=x86@kernel.org \
    --cc=zide.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).