All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Paul Durrant <paul@xen.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  "H. Peter Anvin" <hpa@zytor.com>,
	David Woodhouse <dwmw2@infradead.org>,
	kvm@vger.kernel.org,  linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT
Date: Tue, 31 Oct 2023 15:39:57 -0700	[thread overview]
Message-ID: <ZUGCPQegUeTutsrb@google.com> (raw)
In-Reply-To: <20231031115748.622578-1-paul@xen.org>

On Tue, Oct 31, 2023, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> Unless explicitly told to do so (by passing 'clocksource=tsc' and
> 'tsc=stable:socket', and then jumping through some hoops concerning
> potential CPU hotplug) Xen will never use TSC as its clocksource.
> Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set
> in either the primary or secondary pvclock memory areas. This has
> led to bugs in some guest kernels which only become evident if
> PVCLOCK_TSC_STABLE_BIT *is* set in the pvclocks. Hence, to support
> such guests, give the VMM a new Xen HVM config flag to tell KVM to
> forcibly clear the bit in the Xen pvclocks.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
>  Documentation/virt/kvm/api.rst |  6 ++++++
>  arch/x86/kvm/x86.c             | 28 +++++++++++++++++++++++-----
>  arch/x86/kvm/xen.c             |  3 ++-
>  include/uapi/linux/kvm.h       |  1 +
>  4 files changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 21a7578142a1..9752a01270df 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8252,6 +8252,7 @@ PVHVM guests. Valid flags are::
>    #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL		(1 << 4)
>    #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND		(1 << 5)
>    #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG	(1 << 6)
> +  #define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE	(1 << 7)
>  
>  The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG
>  ioctl is available, for the guest to set its hypercall page.
> @@ -8295,6 +8296,11 @@ behave more correctly, not using the XEN_RUNSTATE_UPDATE flag until/unless
>  specifically enabled (by the guest making the hypercall, causing the VMM
>  to enable the KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG attribute).
>  
> +The KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE flag indicates that KVM supports
> +clearing the PVCLOCK_TSC_STABLE_BIT flag in Xen pvclock sources. This will be
> +done when the KVM_CAP_XEN_HVM ioctl sets the
> +KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE flag.
> +
>  8.31 KVM_CAP_PPC_MULTITCE
>  -------------------------
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 41cce5031126..6abad6dacf07 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3096,7 +3096,8 @@ u64 get_kvmclock_ns(struct kvm *kvm)
>  
>  static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
>  				    struct gfn_to_pfn_cache *gpc,
> -				    unsigned int offset)
> +				    unsigned int offset,
> +				    bool force_tsc_unstable)
>  {
>  	struct kvm_vcpu_arch *vcpu = &v->arch;
>  	struct pvclock_vcpu_time_info *guest_hv_clock;
> @@ -3122,6 +3123,10 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
>  	 */
>  
>  	guest_hv_clock->version = vcpu->hv_clock.version = (guest_hv_clock->version + 1) | 1;
> +
> +	if (force_tsc_unstable)
> +		guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;

I don't see how this works.  This clears the bit in the guest copy, then clobbers
all of guest_hv_clock with a memcpy().

	if (force_tsc_unstable)
		guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;

	smp_wmb();

	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);

	if (vcpu->pvclock_set_guest_stopped_request) {
		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
		vcpu->pvclock_set_guest_stopped_request = false;
	}

	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock)); <= sets PVCLOCK_TSC_STABLE_BIT again, no?
	smp_wmb();

Any reason not to make this a generic "capability" instead of a Xen specific flag?
E.g. I assume these problematic guests would mishandle PVCLOCK_TSC_STABLE_BIT if
it showed up in kvmclock, but they don't use kvmclock so it's not a problem in
practice.

I doubt there's a real need or use case, but it'd require less churn and IMO is
simpler overall, e.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e3eb608b6692..731b201bfd5a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3225,7 +3225,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 
        /* If the host uses TSC clocksource, then it is stable */
        pvclock_flags = 0;
-       if (use_master_clock)
+       if (use_master_clock && !vcpu->kvm.force_tsc_unstable)
                pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
 
        vcpu->hv_clock.flags = pvclock_flags;

I also assume this is a "set and forget" thing?  I.e. KVM can require the flag
to be set before any vCPUs are created.

  reply	other threads:[~2023-10-31 22:40 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-31 11:57 [PATCH v2] KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT Paul Durrant
2023-10-31 22:39 ` Sean Christopherson [this message]
2023-10-31 22:48   ` David Woodhouse
2023-10-31 22:58     ` Sean Christopherson
2023-10-31 23:06       ` David Woodhouse
2023-11-01 11:06         ` Paul Durrant
2023-11-01 11:02     ` Paul Durrant

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZUGCPQegUeTutsrb@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=hpa@zytor.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paul@xen.org \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.