Linux-HyperV List
 help / color / mirror / Atom feed
* Re: [PATCH v3 14/41] x86/kvmclock: Move sched_clock save/restore helpers up in kvmclock.c
From: David Woodhouse @ 2026-05-20 21:59 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-15-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 345 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Move kvmclock's sched_clock save/restore helper "up" so that they can
> (eventually) be referenced by kvm_sched_clock_init().
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 15/41] x86/xen/time: Nullify x86_platform's sched_clock save/restore hooks
From: David Woodhouse @ 2026-05-20 22:11 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-16-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 1283 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Nullify the x86_platform sched_clock save/restore hooks when setting up
> Xen's PV clock to make it somewhat obvious the hooks aren't used when
> running as a Xen guest (Xen uses a paravirtualized suspend/resume flow).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Hm. Xen HVM guests *do* use a standard ACPI S4 hibernation flow.

Does that need testing?

> ---
>  arch/x86/xen/time.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index 3d3165eef821..21d366d01985 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -568,6 +568,12 @@ static void __init xen_init_time_common(void)
>  	xen_sched_clock_offset = xen_clocksource_read();
>  	static_call_update(pv_steal_clock, xen_steal_clock);
>  	paravirt_set_sched_clock(xen_sched_clock);
> +	/*
> +	 * Xen has paravirtualized suspend/resume and so doesn't use the common
> +	 * x86 sched_clock save/restore hooks.
> +	 */
> +	x86_platform.save_sched_clock_state = NULL;
> +	x86_platform.restore_sched_clock_state = NULL;
>  
>  	tsc_register_calibration_routines(xen_tsc_khz, NULL);
>  	x86_platform.get_wallclock = xen_get_wallclock;


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 16/41] x86/vmware: Nullify save/restore hooks when using VMware's sched_clock
From: David Woodhouse @ 2026-05-20 22:15 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-17-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 629 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Nullify the sched_clock save/restore hooks when using VMware's version of
> sched_clock.  This will allow extending paravirt_set_sched_clock() to set
> the save/restore hooks, without having to simultaneously change the
> behavior of VMware guests.
> 
> Note, it's not at all obvious that it's safe/correct for VMware guests to
> do nothing on suspend/resume, but that's a pre-existing problem.  Leave it
> for a VMware expert to sort out.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 17/41] x86/tsc: WARN if TSC sched_clock save/restore used with PV sched_clock
From: David Woodhouse @ 2026-05-20 22:27 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-18-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 396 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Now that all PV clocksources override the sched_clock save/restore hooks
> when overriding sched_clock, WARN if the "default" TSC hooks are invoked
> when using a PV sched_clock, e.g. to guard against regressions.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 18/41] x86/paravirt: Pass sched_clock save/restore helpers during registration
From: Woodhouse, David @ 2026-05-20 22:35 UTC (permalink / raw)
  To: tglx@kernel.org, longli@microsoft.com, luto@kernel.org,
	alexey.makhalov@broadcom.com, jstultz@google.com,
	dave.hansen@linux.intel.com, ajay.kaher@broadcom.com,
	jan.kiszka@siemens.com, haiyangz@microsoft.com, kas@kernel.org,
	seanjc@google.com, pbonzini@redhat.com, kys@microsoft.com,
	decui@microsoft.com, daniel.lezcano@kernel.org,
	wei.liu@kernel.org, peterz@infradead.org, jgross@suse.com
  Cc: boris.ostrovsky@oracle.com, linux-coco@lists.linux.dev,
	kvm@vger.kernel.org, mhklinux@outlook.com,
	thomas.lendacky@amd.com, linux-kernel@vger.kernel.org,
	bcm-kernel-feedback-list@broadcom.com, tglx@linutronix.de,
	nikunj@amd.com, xen-devel@lists.xenproject.org,
	linux-hyperv@vger.kernel.org, vkuznets@redhat.com,
	rick.p.edgecombe@intel.com, virtualization@lists.linux.dev,
	sboyd@kernel.org, x86@kernel.org
In-Reply-To: <20260515191942.1892718-19-seanjc@google.com>


[-- Attachment #1.1: Type: text/plain, Size: 1541 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Pass in a PV clock's save/restore helpers when configuring sched_clock
> instead of relying on each PV clock to manually set the save/restore hooks.
> In addition to bringing sanity to the code, this will allow gracefully
> "rejecting" a PV sched_clock, e.g. when running as a CoCo guest that has
> access to a "secure" TSC.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Nice!

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -567,13 +567,12 @@ static void __init xen_init_time_common(void)
>  {
>  	xen_sched_clock_offset = xen_clocksource_read();
>  	static_call_update(pv_steal_clock, xen_steal_clock);
> -	paravirt_set_sched_clock(xen_sched_clock);
> +
>  	/*
>  	 * Xen has paravirtualized suspend/resume and so doesn't use the common
>  	 * x86 sched_clock save/restore hooks.
>  	 */
> -	x86_platform.save_sched_clock_state = NULL;
> -	x86_platform.restore_sched_clock_state = NULL;
> +	paravirt_set_sched_clock(xen_sched_clock, NULL, NULL);
>  
>  	tsc_register_calibration_routines(xen_tsc_khz, NULL);
>  	x86_platform.get_wallclock = xen_get_wallclock;


Same deal here as with kvmclock, FWIW. If the TSC is viable, then it's
the better choice. 

Especially now there's a significant chance that a "Xen guest" is
actually running under KVM. Xen never screwed its pvclock up quite as
much as KVM did :)

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5964 bytes --]

[-- Attachment #2.1: Type: text/plain, Size: 215 bytes --]




Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.



[-- Attachment #2.2: Type: text/html, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH v3 19/41] x86/kvmclock: Move kvm_sched_clock_init() down in kvmclock.c
From: David Woodhouse @ 2026-05-20 22:39 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-20-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 763 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Move kvm_sched_clock_init() "down" so that it can reference the global
> kvm_clock structure without needing a forward declaration.
> 
> Opportunistically mark the helper as "__init" instead of "inline" to make
> its usage more obvious; modern compilers don't need a hint to inline a
> single-use function, and an extra CALL+RET pair during boot is a complete
> non-issue.  And, if the compiler ignores the hint and does NOT inline the
> function, the resulting code may not get discarded after boot due lack of
> an __init annotation.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 20/41] x86/xen/time: Mark xen_setup_vsyscall_time_info() as __init
From: David Woodhouse @ 2026-05-20 22:40 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-21-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 344 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Annotate xen_setup_vsyscall_time_info() as being used only during kernel
> initialization; it's called only by xen_time_init(), which is already
> tagged __init.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 21/41] x86/pvclock: Mark setup helpers and related various as __init/__ro_after_init
From: Woodhouse, David @ 2026-05-20 22:43 UTC (permalink / raw)
  To: tglx@kernel.org, longli@microsoft.com, luto@kernel.org,
	alexey.makhalov@broadcom.com, jstultz@google.com,
	dave.hansen@linux.intel.com, ajay.kaher@broadcom.com,
	jan.kiszka@siemens.com, haiyangz@microsoft.com, kas@kernel.org,
	seanjc@google.com, pbonzini@redhat.com, kys@microsoft.com,
	decui@microsoft.com, daniel.lezcano@kernel.org,
	wei.liu@kernel.org, peterz@infradead.org, jgross@suse.com
  Cc: boris.ostrovsky@oracle.com, linux-coco@lists.linux.dev,
	kvm@vger.kernel.org, mhklinux@outlook.com,
	thomas.lendacky@amd.com, linux-kernel@vger.kernel.org,
	bcm-kernel-feedback-list@broadcom.com, tglx@linutronix.de,
	nikunj@amd.com, xen-devel@lists.xenproject.org,
	linux-hyperv@vger.kernel.org, vkuznets@redhat.com,
	rick.p.edgecombe@intel.com, virtualization@lists.linux.dev,
	sboyd@kernel.org, x86@kernel.org
In-Reply-To: <20260515191942.1892718-22-seanjc@google.com>


[-- Attachment #1.1: Type: text/plain, Size: 340 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Now that Xen PV clock and kvmclock explicitly do setup only during init,
> tag the common PV clock flags/vsyscall variables and their mutators with
> __init.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5964 bytes --]

[-- Attachment #2.1: Type: text/plain, Size: 215 bytes --]




Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.



[-- Attachment #2.2: Type: text/html, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH v3 22/41] x86/pvclock: WARN if pvclock's valid_flags are overwritten
From: David Woodhouse @ 2026-05-20 22:44 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-23-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 374 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> WARN if the common PV clock valid_flags are overwritten; all PV clocks
> expect that they are the one and only PV clock, i.e. don't guard against
> another PV clock having modified the flags.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 23/41] x86/kvmclock: Refactor handling of PVCLOCK_TSC_STABLE_BIT during kvmclock_init()
From: David Woodhouse @ 2026-05-20 22:46 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-24-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 660 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Clean up the setting of PVCLOCK_TSC_STABLE_BIT during kvmclock init to
> make it somewhat obvious that pvclock_read_flags() must be called *after*
> pvclock_set_flags().
> 
> Note, in theory, a different PV clock could have set PVCLOCK_TSC_STABLE_BIT
> in the supported flags, i.e. reading flags only if
> KVM_FEATURE_CLOCKSOURCE_STABLE_BIT is set could very, very theoretically
> result in a change in behavior.  In practice, the kernel only supports a
> single PV clock.

> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 24/41] timekeeping: Resume clocksources before reading persistent clock
From: David Woodhouse @ 2026-05-20 22:52 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-25-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> When resuming timekeeping after suspend, restore clocksources prior to
> reading the persistent clock.  Paravirt clocks, e.g. kvmclock, tie the
> validity of a PV persistent clock to a clocksource, i.e. reading the PV
> persistent clock will return garbage if the underlying PV clocksource
> hasn't been enabled.  The flaw has gone unnoticed because kvmclock is a
> mess and uses its own suspend/resume hooks instead of the clocksource
> suspend/resume hooks, which happens to work by sheer dumb luck (the
> kvmclock resume hook runs before timekeeping_resume()).
> 
> Note, there is no evidence that any clocksource supported by the kernel
> depends on a persistent clock.
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 15/41] x86/xen/time: Nullify x86_platform's sched_clock save/restore hooks
From: David Woodhouse @ 2026-05-20 22:54 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <93f8fad19e2c617ca17f544d11483c517024bfc4.camel@infradead.org>

[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]

On Wed, 2026-05-20 at 23:11 +0100, David Woodhouse wrote:
> On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> > Nullify the x86_platform sched_clock save/restore hooks when setting up
> > Xen's PV clock to make it somewhat obvious the hooks aren't used when
> > running as a Xen guest (Xen uses a paravirtualized suspend/resume flow).
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> Hm. Xen HVM guests *do* use a standard ACPI S4 hibernation flow.
> 
> Does that need testing?

Ah, this looks as sane as we can expect.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

> 
> > ---
> >  arch/x86/xen/time.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> > index 3d3165eef821..21d366d01985 100644
> > --- a/arch/x86/xen/time.c
> > +++ b/arch/x86/xen/time.c
> > @@ -568,6 +568,12 @@ static void __init xen_init_time_common(void)
> >  	xen_sched_clock_offset = xen_clocksource_read();
> >  	static_call_update(pv_steal_clock, xen_steal_clock);
> >  	paravirt_set_sched_clock(xen_sched_clock);
> > +	/*
> > +	 * Xen has paravirtualized suspend/resume and so doesn't use the common
> > +	 * x86 sched_clock save/restore hooks.
> > +	 */
> > +	x86_platform.save_sched_clock_state = NULL;
> > +	x86_platform.restore_sched_clock_state = NULL;
> >  
> >  	tsc_register_calibration_routines(xen_tsc_khz, NULL);
> >  	x86_platform.get_wallclock = xen_get_wallclock;
> 


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 25/41] x86/kvmclock: Hook clocksource.suspend/resume when kvmclock isn't sched_clock
From: David Woodhouse @ 2026-05-20 23:01 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-26-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 409 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Save/restore kvmclock across suspend/resume via clocksource hooks when
> kvmclock isn't being used for sched_clock.  This will allow using kvmclock
> as a clocksource (or for wallclock!) without also using it for sched_clock.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 25/41] x86/kvmclock: Hook clocksource.suspend/resume when kvmclock isn't sched_clock
From: David Woodhouse @ 2026-05-20 23:06 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-26-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 409 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Save/restore kvmclock across suspend/resume via clocksource hooks when
> kvmclock isn't being used for sched_clock.  This will allow using kvmclock
> as a clocksource (or for wallclock!) without also using it for sched_clock.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 26/41] x86/kvmclock: WARN if wall clock is read while kvmclock is suspended
From: David Woodhouse @ 2026-05-20 23:19 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-27-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 766 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> WARN if kvmclock is still suspended when its wallclock is read, i.e. when
> the kernel reads its persistent clock.  The wallclock subtly depends on
> the BSP's kvmclock being enabled, and returns garbage if kvmclock is
> disabled.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>


Although I still hate the whole KVM wallclock thing, as the kvmclock
itself is monotonic_raw, so adding that to the wallclock epoch is kind
of wrong.

Maybe the host should updated the wallclock occasionally to keep it up
to date...


Or maybe the guest should prefer the KVM_HC_CLOCK_PAIRING hypercall if
it exists, over kvm-wallclock.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 27/41] x86/kvmclock: Enable kvmclock on APs during onlining if kvmclock isn't sched_clock
From: David Woodhouse @ 2026-05-20 23:27 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-28-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 1455 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> In anticipation of making x86_cpuinit.early_percpu_clock_init(), i.e.
> kvm_setup_secondary_clock(), a dedicated sched_clock hook that will be
> invoked if and only if kvmclock is set as sched_clock, ensure APs enable
> their kvmclock during CPU online.  While a redundant write to the MSR is
> technically ok, skip the registration when kvmclock is sched_clock so that
> it's somewhat obvious that kvmclock *needs* to be enabled during early
> bringup when it's being used as sched_clock.
> 
> Plumb in the BSP's resume path purely for documentation purposes.  Both
> KVM (as-a-guest) and timekeeping/clocksource hook syscore_ops, and it's
> not super obvious that using KVM's hooks would be flawed.  E.g. it would
> work today, because KVM's hooks happen to run after/before timekeeping's
> hooks during suspend/resume, but that's sheer dumb luck as the order in
> which syscore_ops are invoked depends entirely on when a subsystem is
> initialized and thus registers its hooks.
> 
> Opportunsitically make the registration messages more precise to help
> debug issues where kvmclock is enabled too late.

That's a hard word to type, isn't it?

> Opportunstically WARN in kvmclock_{suspend,resume}() to harden against
> future bugs.

So is that :)

> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 28/41] x86/paravirt: Mark __paravirt_set_sched_clock() as __init
From: David Woodhouse @ 2026-05-20 23:42 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-29-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 485 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Annotate __paravirt_set_sched_clock() as __init, and make its wrapper
> __always_inline to ensure sanitizers don't result in a non-inline version
> hanging around.  All callers run during __init, and changing sched_clock
> after boot would be all kinds of crazy.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 29/41] x86/paravirt: Plumb a return code into __paravirt_set_sched_clock()
From: David Woodhouse @ 2026-05-20 23:44 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-30-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 733 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Add a return code to __paravirt_set_sched_clock() so that the kernel can
> reject attempts to use a PV sched_clock without breaking the caller.  E.g.
> when running as a CoCo VM with a secure TSC, using a PV clock is generally
> undesirable.
> 
> Note, kvmclock is the only PV clock that does anything "extra" beyond
> simply registering itself as sched_clock, i.e. is the only caller that
> needs to check the new return value.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Oooh... can we use this to reject the kvmclock when we have a stable
and reliable TSC even for non-CoCo guests?

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 30/41] x86/paravirt: Don't use a PV sched_clock in CoCo guests with trusted TSC
From: David Woodhouse @ 2026-05-20 23:45 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-31-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 1554 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Silently ignore attempts to switch to a paravirt sched_clock when running
> as a CoCo guest with trusted TSC.  In hand-wavy theory, a misbehaving
> hypervisor could attack the guest by manipulating the PV clock to affect
> guest scheduling in some weird and/or predictable way.  More importantly,
> reading TSC on such platforms is faster than any PV clock, and sched_clock
> is all about speed.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

And kvmclock. And Xen.

Are there *any* reasons we'd use a PV sched_clock when the TSC is
usable?

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

> ---
>  arch/x86/kernel/tsc.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 3c15fc10e501..ac4abfec1f05 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -283,6 +283,15 @@ bool using_native_sched_clock(void)
>  int __init __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
>  				      void (*save)(void), void (*restore)(void))
>  {
> +	/*
> +	 * Don't replace TSC with a PV clock when running as a CoCo guest and
> +	 * the TSC is secure/trusted; PV clocks are emulated by the hypervisor,
> +	 * which isn't in the guest's TCB.
> +	 */
> +	if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC) ||
> +	    boot_cpu_has(X86_FEATURE_TDX_GUEST))
> +		return -EPERM;
> +
>  	if (!stable)
>  		clear_sched_clock_stable();
>  


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 31/41] x86/tsc: Pass KNOWN_FREQ and RELIABLE as params to registration
From: Woodhouse, David @ 2026-05-20 23:49 UTC (permalink / raw)
  To: tglx@kernel.org, longli@microsoft.com, luto@kernel.org,
	alexey.makhalov@broadcom.com, jstultz@google.com,
	dave.hansen@linux.intel.com, ajay.kaher@broadcom.com,
	jan.kiszka@siemens.com, haiyangz@microsoft.com, kas@kernel.org,
	seanjc@google.com, pbonzini@redhat.com, kys@microsoft.com,
	decui@microsoft.com, daniel.lezcano@kernel.org,
	wei.liu@kernel.org, peterz@infradead.org, jgross@suse.com
  Cc: boris.ostrovsky@oracle.com, linux-coco@lists.linux.dev,
	kvm@vger.kernel.org, mhklinux@outlook.com,
	thomas.lendacky@amd.com, linux-kernel@vger.kernel.org,
	bcm-kernel-feedback-list@broadcom.com, tglx@linutronix.de,
	nikunj@amd.com, xen-devel@lists.xenproject.org,
	linux-hyperv@vger.kernel.org, vkuznets@redhat.com,
	rick.p.edgecombe@intel.com, virtualization@lists.linux.dev,
	sboyd@kernel.org, x86@kernel.org
In-Reply-To: <20260515191942.1892718-32-seanjc@google.com>


[-- Attachment #1.1: Type: text/plain, Size: 1363 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Add a "tsc_properties" set of flags and use it to annotate whether the
> TSC operates at a known and/or reliable frequency when registering a
> paravirtual TSC calibration routine.  Currently, each PV flow manually
> sets the associated feature flags, but often in haphazard fashion that
> makes it difficult for unfamiliar readers to see the properties of the
> TSC when running under a particular hypervisor.
> 
> The other, bigger issue with manually setting the feature flags is that
> it decouples the flags from the calibration routine.  E.g. in theory, PV
> code could mark the TSC as having a known frequency, but then have its
> PV calibration discarded in favor of a method that doesn't use that known
> frequency.  Passing the TSC properties along with the calibration routine
> will allow adding sanity checks to guard against replacing a "better"
> calibration routine with a "worse" routine.
> 
> As a bonus, the flags also give developers working on new PV code a heads
> up that they should at least mark the TSC as having a known frequency.
> 
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5964 bytes --]

[-- Attachment #2.1: Type: text/plain, Size: 215 bytes --]




Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.



[-- Attachment #2.2: Type: text/html, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH v3 32/41] x86/tsc: Rejects attempts to override TSC calibration with lesser routine
From: David Woodhouse @ 2026-05-20 23:50 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-33-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 918 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> When registering a TSC frequency calibration routine, sanity check that
> the incoming routine is as robust as the outgoing routine, and reject the
> incoming routine if the sanity check fails.
> 
> Because native calibration routines only mark the TSC frequency as known
> and reliable when they actually run, the effective progression of
> capabilities is: None (native) => Known and maybe Reliable (PV) =>
> Known and Reliable (CoCo).  Violating that progression for a PV override
> is relatively benign, but messing up the progression when CoCo is
> involved is more problematic, as it likely means a trusted source of
> information (hardware/firmware) is being discarded in favor of a less
> trusted source (hypervisor).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 33/41] x86/kvmclock: Mark TSC as reliable when it's constant and nonstop
From: David Woodhouse @ 2026-05-20 23:51 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-34-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 1064 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Mark the TSC as reliable if the hypervisor (KVM) has enumerated the TSC
> as constant and nonstop, and the admin hasn't explicitly marked the TSC
> as unstable.  Like most (all?) virtualization setups, any secondary
> clocksource that's used as a watchdog is guaranteed to be less reliable
> than a constant, nonstop TSC, as all clocksources the kernel uses as a
> watchdog are all but guaranteed to be emulated when running as a KVM
> guest.  I.e. any observed discrepancies between the TSC and watchdog will
> be due to jitter in the watchdog.
> 
> This is especially true for KVM, as the watchdog clocksource is usually
> emulated in host userspace, i.e. reading the clock incurs a roundtrip
> cost of thousands of cycles.
> 
> Marking the TSC reliable addresses a flaw where the TSC will occasionally
> be marked unstable if the host is under moderate/heavy load.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 36/41] x86/kvmclock: Get local APIC bus frequency from PV CPUID Timing Info
From: Woodhouse, David @ 2026-05-20 23:55 UTC (permalink / raw)
  To: tglx@kernel.org, longli@microsoft.com, luto@kernel.org,
	alexey.makhalov@broadcom.com, jstultz@google.com,
	dave.hansen@linux.intel.com, ajay.kaher@broadcom.com,
	jan.kiszka@siemens.com, haiyangz@microsoft.com, kas@kernel.org,
	seanjc@google.com, pbonzini@redhat.com, kys@microsoft.com,
	decui@microsoft.com, daniel.lezcano@kernel.org,
	wei.liu@kernel.org, peterz@infradead.org, jgross@suse.com
  Cc: boris.ostrovsky@oracle.com, linux-coco@lists.linux.dev,
	kvm@vger.kernel.org, mhklinux@outlook.com,
	thomas.lendacky@amd.com, linux-kernel@vger.kernel.org,
	bcm-kernel-feedback-list@broadcom.com, tglx@linutronix.de,
	nikunj@amd.com, xen-devel@lists.xenproject.org,
	linux-hyperv@vger.kernel.org, vkuznets@redhat.com,
	rick.p.edgecombe@intel.com, virtualization@lists.linux.dev,
	sboyd@kernel.org, x86@kernel.org
In-Reply-To: <20260515191942.1892718-37-seanjc@google.com>


[-- Attachment #1.1: Type: text/plain, Size: 647 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> When running as a KVM guest with kvmclock support enabled, stuff the APIC
> timer period/frequency with the local APIC bus frequency reported in
> CPUID.0x40000010.EBX instead of trying to calibrate/guess the frequency.
> 
> See Documentation/virt/kvm/x86/cpuid.rst for details.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

I still don't much like the way this is done inside kvm_get_tsc_khz().

We also probably ought to be looking for the timing leaf on other
hypervisors including VMware and probably Bhyve too. Should it be done
somewhere else?



[-- Attachment #1.2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5964 bytes --]

[-- Attachment #2.1: Type: text/plain, Size: 215 bytes --]




Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.



[-- Attachment #2.2: Type: text/html, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH v3 37/41] x86/kvmclock: Use TSC for sched_clock if it's constant and non-stop
From: David Woodhouse @ 2026-05-20 23:56 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-38-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 1706 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Prefer the TSC over kvmclock for sched_clock if the TSC is constant,
> nonstop, and not marked unstable via command line.  I.e. use the same
> criteria as tweaking the clocksource rating so that TSC is preferred over
> kvmclock.  Per the below comment from native_sched_clock(), sched_clock
> is more tolerant of slop than clocksource; using TSC for clocksource but
> not sched_clock makes little to no sense, especially now that KVM CoCo
> guests with a trusted TSC use TSC, not kvmclock.
> 
>         /*
>          * Fall back to jiffies if there's no TSC available:
>          * ( But note that we still use it if the TSC is marked
>          *   unstable. We do this because unlike Time Of Day,
>          *   the scheduler clock tolerates small errors and it's
>          *   very important for it to be as fast as the platform
>          *   can achieve it. )
>          */
> 
> The only advantage of using kvmclock is that doing so allows for early
> and common detection of PVCLOCK_GUEST_STOPPED, but that code has been
> broken for over two years with nary a complaint, i.e. it can't be
> _that_ valuable.  And as above, certain types of KVM guests are losing
> the functionality regardless, i.e. acknowledging PVCLOCK_GUEST_STOPPED
> needs to be decoupled from sched_clock() no matter what.
> 
> Link: https://lore.kernel.org/all/Z4hDK27OV7wK572A@google.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Yay! (Albeit only for sched_clock, and we should do Xen too)

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v3 38/41] x86/paravirt: kvmclock: Setup kvmclock early iff it's sched_clock
From: David Woodhouse @ 2026-05-20 23:59 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-39-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 699 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Rework the seemingly generic x86_cpuinit_ops.early_percpu_clock_init hook
> into a dedicated PV sched_clock hook, as the only reason the hook exists
> is to allow kvmclock to enable its PV clock on secondary CPUs before the
> kernel tries to reference sched_clock, e.g. when grabbing a timestamp for
> printk.
> 
> Rearranging the hook doesn't exactly reduce complexity; arguably it does
> the opposite.  But as-is, it's practically impossible to understand *why*
> kvmclock needs to do early configuration.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox