* Re: [PATCH v3 26/41] x86/kvmclock: WARN if wall clock is read while kvmclock is suspended
From: David Woodhouse @ 2026-05-20 23:19 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-27-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 766 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> WARN if kvmclock is still suspended when its wallclock is read, i.e. when
> the kernel reads its persistent clock. The wallclock subtly depends on
> the BSP's kvmclock being enabled, and returns garbage if kvmclock is
> disabled.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Although I still hate the whole KVM wallclock thing, as the kvmclock
itself is monotonic_raw, so adding that to the wallclock epoch is kind
of wrong.
Maybe the host should updated the wallclock occasionally to keep it up
to date...
Or maybe the guest should prefer the KVM_HC_CLOCK_PAIRING hypercall if
it exists, over kvm-wallclock.
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 25/41] x86/kvmclock: Hook clocksource.suspend/resume when kvmclock isn't sched_clock
From: David Woodhouse @ 2026-05-20 23:06 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-26-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 409 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Save/restore kvmclock across suspend/resume via clocksource hooks when
> kvmclock isn't being used for sched_clock. This will allow using kvmclock
> as a clocksource (or for wallclock!) without also using it for sched_clock.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 25/41] x86/kvmclock: Hook clocksource.suspend/resume when kvmclock isn't sched_clock
From: David Woodhouse @ 2026-05-20 23:01 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-26-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 409 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Save/restore kvmclock across suspend/resume via clocksource hooks when
> kvmclock isn't being used for sched_clock. This will allow using kvmclock
> as a clocksource (or for wallclock!) without also using it for sched_clock.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 15/41] x86/xen/time: Nullify x86_platform's sched_clock save/restore hooks
From: David Woodhouse @ 2026-05-20 22:54 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <93f8fad19e2c617ca17f544d11483c517024bfc4.camel@infradead.org>
[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]
On Wed, 2026-05-20 at 23:11 +0100, David Woodhouse wrote:
> On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> > Nullify the x86_platform sched_clock save/restore hooks when setting up
> > Xen's PV clock to make it somewhat obvious the hooks aren't used when
> > running as a Xen guest (Xen uses a paravirtualized suspend/resume flow).
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
>
> Hm. Xen HVM guests *do* use a standard ACPI S4 hibernation flow.
>
> Does that need testing?
Ah, this looks as sane as we can expect.
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
>
> > ---
> > arch/x86/xen/time.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> > index 3d3165eef821..21d366d01985 100644
> > --- a/arch/x86/xen/time.c
> > +++ b/arch/x86/xen/time.c
> > @@ -568,6 +568,12 @@ static void __init xen_init_time_common(void)
> > xen_sched_clock_offset = xen_clocksource_read();
> > static_call_update(pv_steal_clock, xen_steal_clock);
> > paravirt_set_sched_clock(xen_sched_clock);
> > + /*
> > + * Xen has paravirtualized suspend/resume and so doesn't use the common
> > + * x86 sched_clock save/restore hooks.
> > + */
> > + x86_platform.save_sched_clock_state = NULL;
> > + x86_platform.restore_sched_clock_state = NULL;
> >
> > tsc_register_calibration_routines(xen_tsc_khz, NULL);
> > x86_platform.get_wallclock = xen_get_wallclock;
>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 24/41] timekeeping: Resume clocksources before reading persistent clock
From: David Woodhouse @ 2026-05-20 22:52 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-25-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 919 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> When resuming timekeeping after suspend, restore clocksources prior to
> reading the persistent clock. Paravirt clocks, e.g. kvmclock, tie the
> validity of a PV persistent clock to a clocksource, i.e. reading the PV
> persistent clock will return garbage if the underlying PV clocksource
> hasn't been enabled. The flaw has gone unnoticed because kvmclock is a
> mess and uses its own suspend/resume hooks instead of the clocksource
> suspend/resume hooks, which happens to work by sheer dumb luck (the
> kvmclock resume hook runs before timekeeping_resume()).
>
> Note, there is no evidence that any clocksource supported by the kernel
> depends on a persistent clock.
>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 23/41] x86/kvmclock: Refactor handling of PVCLOCK_TSC_STABLE_BIT during kvmclock_init()
From: David Woodhouse @ 2026-05-20 22:46 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-24-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 660 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Clean up the setting of PVCLOCK_TSC_STABLE_BIT during kvmclock init to
> make it somewhat obvious that pvclock_read_flags() must be called *after*
> pvclock_set_flags().
>
> Note, in theory, a different PV clock could have set PVCLOCK_TSC_STABLE_BIT
> in the supported flags, i.e. reading flags only if
> KVM_FEATURE_CLOCKSOURCE_STABLE_BIT is set could very, very theoretically
> result in a change in behavior. In practice, the kernel only supports a
> single PV clock.
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 22/41] x86/pvclock: WARN if pvclock's valid_flags are overwritten
From: David Woodhouse @ 2026-05-20 22:44 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-23-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 374 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> WARN if the common PV clock valid_flags are overwritten; all PV clocks
> expect that they are the one and only PV clock, i.e. don't guard against
> another PV clock having modified the flags.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 21/41] x86/pvclock: Mark setup helpers and related various as __init/__ro_after_init
From: Woodhouse, David @ 2026-05-20 22:43 UTC (permalink / raw)
To: tglx@kernel.org, longli@microsoft.com, luto@kernel.org,
alexey.makhalov@broadcom.com, jstultz@google.com,
dave.hansen@linux.intel.com, ajay.kaher@broadcom.com,
jan.kiszka@siemens.com, haiyangz@microsoft.com, kas@kernel.org,
seanjc@google.com, pbonzini@redhat.com, kys@microsoft.com,
decui@microsoft.com, daniel.lezcano@kernel.org,
wei.liu@kernel.org, peterz@infradead.org, jgross@suse.com
Cc: boris.ostrovsky@oracle.com, linux-coco@lists.linux.dev,
kvm@vger.kernel.org, mhklinux@outlook.com,
thomas.lendacky@amd.com, linux-kernel@vger.kernel.org,
bcm-kernel-feedback-list@broadcom.com, tglx@linutronix.de,
nikunj@amd.com, xen-devel@lists.xenproject.org,
linux-hyperv@vger.kernel.org, vkuznets@redhat.com,
rick.p.edgecombe@intel.com, virtualization@lists.linux.dev,
sboyd@kernel.org, x86@kernel.org
In-Reply-To: <20260515191942.1892718-22-seanjc@google.com>
[-- Attachment #1.1: Type: text/plain, Size: 340 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Now that Xen PV clock and kvmclock explicitly do setup only during init,
> tag the common PV clock flags/vsyscall variables and their mutators with
> __init.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #1.2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5964 bytes --]
[-- Attachment #2.1: Type: text/plain, Size: 215 bytes --]
Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.
[-- Attachment #2.2: Type: text/html, Size: 228 bytes --]
^ permalink raw reply
* Re: [PATCH v3 20/41] x86/xen/time: Mark xen_setup_vsyscall_time_info() as __init
From: David Woodhouse @ 2026-05-20 22:40 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-21-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 344 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Annotate xen_setup_vsyscall_time_info() as being used only during kernel
> initialization; it's called only by xen_time_init(), which is already
> tagged __init.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 19/41] x86/kvmclock: Move kvm_sched_clock_init() down in kvmclock.c
From: David Woodhouse @ 2026-05-20 22:39 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-20-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 763 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Move kvm_sched_clock_init() "down" so that it can reference the global
> kvm_clock structure without needing a forward declaration.
>
> Opportunistically mark the helper as "__init" instead of "inline" to make
> its usage more obvious; modern compilers don't need a hint to inline a
> single-use function, and an extra CALL+RET pair during boot is a complete
> non-issue. And, if the compiler ignores the hint and does NOT inline the
> function, the resulting code may not get discarded after boot due lack of
> an __init annotation.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 18/41] x86/paravirt: Pass sched_clock save/restore helpers during registration
From: Woodhouse, David @ 2026-05-20 22:35 UTC (permalink / raw)
To: tglx@kernel.org, longli@microsoft.com, luto@kernel.org,
alexey.makhalov@broadcom.com, jstultz@google.com,
dave.hansen@linux.intel.com, ajay.kaher@broadcom.com,
jan.kiszka@siemens.com, haiyangz@microsoft.com, kas@kernel.org,
seanjc@google.com, pbonzini@redhat.com, kys@microsoft.com,
decui@microsoft.com, daniel.lezcano@kernel.org,
wei.liu@kernel.org, peterz@infradead.org, jgross@suse.com
Cc: boris.ostrovsky@oracle.com, linux-coco@lists.linux.dev,
kvm@vger.kernel.org, mhklinux@outlook.com,
thomas.lendacky@amd.com, linux-kernel@vger.kernel.org,
bcm-kernel-feedback-list@broadcom.com, tglx@linutronix.de,
nikunj@amd.com, xen-devel@lists.xenproject.org,
linux-hyperv@vger.kernel.org, vkuznets@redhat.com,
rick.p.edgecombe@intel.com, virtualization@lists.linux.dev,
sboyd@kernel.org, x86@kernel.org
In-Reply-To: <20260515191942.1892718-19-seanjc@google.com>
[-- Attachment #1.1: Type: text/plain, Size: 1541 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Pass in a PV clock's save/restore helpers when configuring sched_clock
> instead of relying on each PV clock to manually set the save/restore hooks.
> In addition to bringing sanity to the code, this will allow gracefully
> "rejecting" a PV sched_clock, e.g. when running as a CoCo guest that has
> access to a "secure" TSC.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Nice!
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -567,13 +567,12 @@ static void __init xen_init_time_common(void)
> {
> xen_sched_clock_offset = xen_clocksource_read();
> static_call_update(pv_steal_clock, xen_steal_clock);
> - paravirt_set_sched_clock(xen_sched_clock);
> +
> /*
> * Xen has paravirtualized suspend/resume and so doesn't use the common
> * x86 sched_clock save/restore hooks.
> */
> - x86_platform.save_sched_clock_state = NULL;
> - x86_platform.restore_sched_clock_state = NULL;
> + paravirt_set_sched_clock(xen_sched_clock, NULL, NULL);
>
> tsc_register_calibration_routines(xen_tsc_khz, NULL);
> x86_platform.get_wallclock = xen_get_wallclock;
Same deal here as with kvmclock, FWIW. If the TSC is viable, then it's
the better choice.
Especially now there's a significant chance that a "Xen guest" is
actually running under KVM. Xen never screwed its pvclock up quite as
much as KVM did :)
[-- Attachment #1.2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5964 bytes --]
[-- Attachment #2.1: Type: text/plain, Size: 215 bytes --]
Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.
[-- Attachment #2.2: Type: text/html, Size: 228 bytes --]
^ permalink raw reply
* Re: [PATCH v3 17/41] x86/tsc: WARN if TSC sched_clock save/restore used with PV sched_clock
From: David Woodhouse @ 2026-05-20 22:27 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-18-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 396 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Now that all PV clocksources override the sched_clock save/restore hooks
> when overriding sched_clock, WARN if the "default" TSC hooks are invoked
> when using a PV sched_clock, e.g. to guard against regressions.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 16/41] x86/vmware: Nullify save/restore hooks when using VMware's sched_clock
From: David Woodhouse @ 2026-05-20 22:15 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-17-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 629 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Nullify the sched_clock save/restore hooks when using VMware's version of
> sched_clock. This will allow extending paravirt_set_sched_clock() to set
> the save/restore hooks, without having to simultaneously change the
> behavior of VMware guests.
>
> Note, it's not at all obvious that it's safe/correct for VMware guests to
> do nothing on suspend/resume, but that's a pre-existing problem. Leave it
> for a VMware expert to sort out.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 15/41] x86/xen/time: Nullify x86_platform's sched_clock save/restore hooks
From: David Woodhouse @ 2026-05-20 22:11 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-16-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 1283 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Nullify the x86_platform sched_clock save/restore hooks when setting up
> Xen's PV clock to make it somewhat obvious the hooks aren't used when
> running as a Xen guest (Xen uses a paravirtualized suspend/resume flow).
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Hm. Xen HVM guests *do* use a standard ACPI S4 hibernation flow.
Does that need testing?
> ---
> arch/x86/xen/time.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index 3d3165eef821..21d366d01985 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -568,6 +568,12 @@ static void __init xen_init_time_common(void)
> xen_sched_clock_offset = xen_clocksource_read();
> static_call_update(pv_steal_clock, xen_steal_clock);
> paravirt_set_sched_clock(xen_sched_clock);
> + /*
> + * Xen has paravirtualized suspend/resume and so doesn't use the common
> + * x86 sched_clock save/restore hooks.
> + */
> + x86_platform.save_sched_clock_state = NULL;
> + x86_platform.restore_sched_clock_state = NULL;
>
> tsc_register_calibration_routines(xen_tsc_khz, NULL);
> x86_platform.get_wallclock = xen_get_wallclock;
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 14/41] x86/kvmclock: Move sched_clock save/restore helpers up in kvmclock.c
From: David Woodhouse @ 2026-05-20 21:59 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-15-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 345 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Move kvmclock's sched_clock save/restore helper "up" so that they can
> (eventually) be referenced by kvm_sched_clock_init().
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 13/41] x86/paravirt: Move handling of unstable PV clocks into paravirt_set_sched_clock()
From: David Woodhouse @ 2026-05-20 21:57 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-14-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 853 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Move the handling of unstable PV clocks, of which kvmclock is the only
> example, into paravirt_set_sched_clock(). This will allow modifying
> paravirt_set_sched_clock() to keep using the TSC for sched_clock in
> certain scenarios without unintentionally marking the TSC-based clock as
> unstable.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
> -void paravirt_set_sched_clock(u64 (*func)(void));
> +void __paravirt_set_sched_clock(u64 (*func)(void), bool stable);
> +
> +static inline void paravirt_set_sched_clock(u64 (*func)(void))
> +{
> + __paravirt_set_sched_clock(func, true);
> +}
One of the few things I actually like about C++ is default arguments...
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 12/41] x86/paravirt: Remove unnecessary PARAVIRT=n stub for paravirt_set_sched_clock()
From: David Woodhouse @ 2026-05-20 21:53 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-13-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 526 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Remove the unnecessary paravirt_set_sched_clock() stub for PARAVIRT=n, as
> all callers are gated by PARAVIRT=y. Eliminating the stub will avoid a
> pile of pointless churn as the "real" implementation evolves.
>
> No functional change intended.
>
> Fixes: 39965afb1151 ("x86/paravirt: Move paravirt_sched_clock() related code into tsc.c")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 11/41] x86/kvm: Don't disable kvmclock on BSP in syscore_suspend()
From: David Woodhouse @ 2026-05-20 21:51 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-12-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 1214 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Don't disable kvmclock on the BSP during syscore_suspend(), as the BSP's
> clock is NOT restored during syscore_resume(), but is instead restored
> earlier via the sched_clock restore callback. If suspend is aborted, e.g.
> due to a late wakeup, the BSP will run without its clock enabled, which
> "works" only because KVM-the-hypervisor is kind enough to not clobber the
> shared memory when the clock is disabled. But over time, the BSP's view
> of time will drift from APs.
The work that Dongli and I have been doing elsewhere makes me want to
phrase that last sentence far more judgmentally, as the kvmclock
absolutely should *not* be randomly drifting over time without good
reason, and what was written to it earlier really *ought* to still be
valid.
But OK :)
> Plumb in an "action" to KVM-as-a-guest and kvmclock code in preparation
> for additional cleanups to kvmclock's suspend/resume logic.
>
> Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 10/41] x86/kvmclock: Setup kvmclock for secondary CPUs iff CONFIG_SMP=y
From: David Woodhouse @ 2026-05-20 21:45 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-11-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 514 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Gate kvmclock's secondary CPU code on CONFIG_SMP, not CONFIG_X86_LOCAL_APIC.
> Originally, kvmclock piggybacked PV APIC ops to setup secondary CPUs.
> When that wart was fixed by commit df156f90a0f9 ("x86: Introduce
> x86_cpuinit.early_percpu_clock_init hook"), the dependency on a local APIC
> got carried forward unnecessarily.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 02/41] x86/tsc: Add helper to register CPU and TSC freq calibration routines
From: David Woodhouse @ 2026-05-20 21:44 UTC (permalink / raw)
To: Sean Christopherson
Cc: Kiryl Shutsemau, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
Jan Kiszka, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
Juergen Gross, Daniel Lezcano, Thomas Gleixner, John Stultz,
Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <ag4or9-9c6VZxqya@google.com>
[-- Attachment #1: Type: text/plain, Size: 3077 bytes --]
On Wed, 2026-05-20 at 14:33 -0700, Sean Christopherson wrote:
> On Wed, May 20, 2026, David Woodhouse wrote:
> > On Wed, 2026-05-20 at 13:44 -0700, Sean Christopherson wrote:
> > >
> > > + /*
> > > + * If the TSC counts at a constant frequency across P/T states, counts
> > > + * in deep C-states, and the TSC hasn't been marked unstable, treat the
> > > + * TSC reliable, as guaranteed by KVM. Note, the TSC unstable check
> > > + * exists purely to honor the TSC being marked unstable via command
> > > + * line, any runtime detection of an unstable will happen after this.
> > > + */
> > > + if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > > + boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> > > + !check_tsc_unstable())
> > {
> > > + tsc_properties = TSC_FREQ_KNOWN_AND_RELIABLE;
> >
> > kvmclock = 0; /* Why use it if the TSC works? The kvmclock exists
> > *purely* to work around a TSC which *doesn't*
> > have those properties checked above. */
>
> kvmclock still provides SYSTEM_TIME and WALL_CLOCK :-/
Um, SYSTEM_TIME *is* the kvmclock of which we speak, isn't it?
WALL_CLOCK is something else, and I guess we should still consume that,
yes.
>
> > > +
> > > + kvm_tsc_khz_cpuid = kvm_para_tsc_khz();
> > > +
> > > + /*
> > > + * If provided, use the TSC (and APIC bus) frequency provided in KVM's
> > > + * PV CPUID leaf even if kvmclock itself is disabled via command line.
> > > + * The PV CPUID information isn't dependent on kvmclock in any way, and
> > > + * in fact using the precise information is *more* important when the
> > > + * user has explicitly disabled kvmclock to force the kernel to use the
> > > + * TSC as its clocksource.
> > > + */
> > > + if (!kvmclock) {
> > > + if (kvm_tsc_khz_cpuid)
> > > + tsc_register_calibration_routines(kvm_get_tsc_khz,
> > > + kvm_get_cpu_khz,
> > > + tsc_properties);
> > > + return;
> > > + }
> > > +
> >
> >
> > Regardless of the above, why not just register these here
> > unconditionally, and remove the later call that does the same?
>
> Because if kvmclock=n, it's only safe to call kvm_get_tsc_khz() if kvm_tsc_khz_cpuid
> is non-zero, other wise the "else" path will hit a NULL pointer deref when trying
> to get the frequency from the PV clock struct:
>
> return kvm_tsc_khz_cpuid ? : pvclock_tsc_khz(this_cpu_pvti());
Ah, right. Thanks. Ick though :)
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 02/41] x86/tsc: Add helper to register CPU and TSC freq calibration routines
From: Sean Christopherson @ 2026-05-20 21:33 UTC (permalink / raw)
To: David Woodhouse
Cc: Kiryl Shutsemau, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
Jan Kiszka, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
Juergen Gross, Daniel Lezcano, Thomas Gleixner, John Stultz,
Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <621e10bdc9e297c6c600b561d8fa25c3b62968bc.camel@infradead.org>
On Wed, May 20, 2026, David Woodhouse wrote:
> On Wed, 2026-05-20 at 13:44 -0700, Sean Christopherson wrote:
> >
> > + /*
> > + * If the TSC counts at a constant frequency across P/T states, counts
> > + * in deep C-states, and the TSC hasn't been marked unstable, treat the
> > + * TSC reliable, as guaranteed by KVM. Note, the TSC unstable check
> > + * exists purely to honor the TSC being marked unstable via command
> > + * line, any runtime detection of an unstable will happen after this.
> > + */
> > + if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > + boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> > + !check_tsc_unstable())
> {
> > + tsc_properties = TSC_FREQ_KNOWN_AND_RELIABLE;
>
> kvmclock = 0; /* Why use it if the TSC works? The kvmclock exists
> *purely* to work around a TSC which *doesn't*
> have those properties checked above. */
kvmclock still provides SYSTEM_TIME and WALL_CLOCK :-/
> }
>
> I was going to say PVCLOCK_TSC_STABLE_BIT, and maybe we should check
> that *too* for paranoia?
No? PVCLOCK_TSC_STABLE is property of kvmclock more than it's a property of the
TSC itself. And for the no-kvmclock case, we most definitely don't want to setup
kvmclock just to query that flag.
> But hopefully the checks you have above are equivalent?
They aren't as paranoid, but if the host enumerates CONSTANT+NONSTOP TSC despite
KVM-the-host not being able to advertise PVCLOCK_TSC_STABLE_BIT, then the VMM done
messed up.
> > +
> > + kvm_tsc_khz_cpuid = kvm_para_tsc_khz();
> > +
> > + /*
> > + * If provided, use the TSC (and APIC bus) frequency provided in KVM's
> > + * PV CPUID leaf even if kvmclock itself is disabled via command line.
> > + * The PV CPUID information isn't dependent on kvmclock in any way, and
> > + * in fact using the precise information is *more* important when the
> > + * user has explicitly disabled kvmclock to force the kernel to use the
> > + * TSC as its clocksource.
> > + */
> > + if (!kvmclock) {
> > + if (kvm_tsc_khz_cpuid)
> > + tsc_register_calibration_routines(kvm_get_tsc_khz,
> > + kvm_get_cpu_khz,
> > + tsc_properties);
> > + return;
> > + }
> > +
>
>
> Regardless of the above, why not just register these here
> unconditionally, and remove the later call that does the same?
Because if kvmclock=n, it's only safe to call kvm_get_tsc_khz() if kvm_tsc_khz_cpuid
is non-zero, other wise the "else" path will hit a NULL pointer deref when trying
to get the frequency from the PV clock struct:
return kvm_tsc_khz_cpuid ? : pvclock_tsc_khz(this_cpu_pvti());
^ permalink raw reply
* Re: [PATCH v3 09/41] clocksource: hyper-v: Don't save/restore TSC offset when using HV sched_clock
From: David Woodhouse @ 2026-05-20 21:30 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-10-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 957 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Now that Hyper-V overrides the sched_clock save/restore hooks if and only
> sched_clock itself is set to the Hyper-V reference counter, drop the
> invocation of the "old" save/restore callbacks. When the registration of
> the PV sched_clock was done separately from overriding the save/restore
> hooks, it was possible for Hyper-V to clobber the TSC save/restore
> callbacks without actually switching to the Hyper-V refcounter.
>
> Enabling a PV sched_clock is a one-way street, i.e. the kernel will never
> revert to using TSC for sched_clock, and so there is no need to invoke the
> TSC save/restore hooks (and if there was, it belongs in common PV code).
>
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 08/41] clocksource: hyper-v: Drop wrappers to sched_clock save/restore helpers
From: David Woodhouse @ 2026-05-20 21:29 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-9-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 477 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Now that all of the Hyper-V reference counter sched_clock code is located
> in a single file, drop the superfluous wrappers for the save/restore flows.
>
> No functional change intended.
>
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 07/41] clocksource: hyper-v: Register sched_clock save/restore iff it's necessary
From: David Woodhouse @ 2026-05-20 21:27 UTC (permalink / raw)
To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
Thomas Gleixner, John Stultz
Cc: Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-8-seanjc@google.com>
[-- Attachment #1: Type: text/plain, Size: 1346 bytes --]
On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Register the Hyper-V reference counter (refcounter) callbacks for saving
> and restoring its PV sched_clock, if and only if the refcounter is
> actually being used for sched_clock. Currently, Hyper-V overrides the
> save/restore hooks if the reference TSC available, whereas the Hyper-V
> refcounter code only overrides sched_clock if the reference TSC is
> available *and* it's not invariant. The flaw is effectively papered over
> by invoking the "old" save/restore callbacks as part of save/restore, but
> that's unnecessary and fragile.
>
> To avoid introducing more complexity, and to allow for additional cleanups
> of the PV sched_clock code, move the save/restore hooks and logic into
> hyperv_timer.c and simply wire up the hooks when overriding sched_clock
> itself.
>
> Note, while the Hyper-V refcounter code is intended to be architecture
> neutral, CONFIG_PARAVIRT is firmly x86-only, i.e. adding a small amount of
> x86 specific code (which will be reduced in future cleanups) doesn't
> meaningfully pollute generic code.
>
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v3 02/41] x86/tsc: Add helper to register CPU and TSC freq calibration routines
From: David Woodhouse @ 2026-05-20 21:19 UTC (permalink / raw)
To: Sean Christopherson
Cc: Kiryl Shutsemau, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
Jan Kiszka, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
Juergen Gross, Daniel Lezcano, Thomas Gleixner, John Stultz,
Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <ag4dMc2B3JQi4vxU@google.com>
[-- Attachment #1: Type: text/plain, Size: 2384 bytes --]
On Wed, 2026-05-20 at 13:44 -0700, Sean Christopherson wrote:
>
> + /*
> + * If the TSC counts at a constant frequency across P/T states, counts
> + * in deep C-states, and the TSC hasn't been marked unstable, treat the
> + * TSC reliable, as guaranteed by KVM. Note, the TSC unstable check
> + * exists purely to honor the TSC being marked unstable via command
> + * line, any runtime detection of an unstable will happen after this.
> + */
> + if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> + boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> + !check_tsc_unstable())
{
> + tsc_properties = TSC_FREQ_KNOWN_AND_RELIABLE;
kvmclock = 0; /* Why use it if the TSC works? The kvmclock exists
*purely* to work around a TSC which *doesn't*
have those properties checked above. */
}
I was going to say PVCLOCK_TSC_STABLE_BIT, and maybe we should check
that *too* for paranoia? But hopefully the checks you have above are
equivalent?
> +
> + kvm_tsc_khz_cpuid = kvm_para_tsc_khz();
> +
> + /*
> + * If provided, use the TSC (and APIC bus) frequency provided in KVM's
> + * PV CPUID leaf even if kvmclock itself is disabled via command line.
> + * The PV CPUID information isn't dependent on kvmclock in any way, and
> + * in fact using the precise information is *more* important when the
> + * user has explicitly disabled kvmclock to force the kernel to use the
> + * TSC as its clocksource.
> + */
> + if (!kvmclock) {
> + if (kvm_tsc_khz_cpuid)
> + tsc_register_calibration_routines(kvm_get_tsc_khz,
> + kvm_get_cpu_khz,
> + tsc_properties);
> + return;
> + }
> +
Regardless of the above, why not just register these here
unconditionally, and remove the later call that does the same?
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox