Linux-HyperV List
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	 Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  Kiryl Shutsemau <kas@kernel.org>,
	Sean Christopherson <seanjc@google.com>,
	 "K. Y. Srinivasan" <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Wei Liu <wei.liu@kernel.org>,  Dexuan Cui <decui@microsoft.com>,
	Long Li <longli@microsoft.com>,
	 Ajay Kaher <ajay.kaher@broadcom.com>,
	Alexey Makhalov <alexey.makhalov@broadcom.com>,
	 Jan Kiszka <jan.kiszka@siemens.com>,
	Andy Lutomirski <luto@kernel.org>,
	 Peter Zijlstra <peterz@infradead.org>,
	Juergen Gross <jgross@suse.com>,
	 Daniel Lezcano <daniel.lezcano@kernel.org>,
	John Stultz <jstultz@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	 Vitaly Kuznetsov <vkuznets@redhat.com>,
	 Broadcom internal kernel review list
	<bcm-kernel-feedback-list@broadcom.com>,
	 Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Stephen Boyd <sboyd@kernel.org>,
	kvm@vger.kernel.org,  linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev,  linux-hyperv@vger.kernel.org,
	virtualization@lists.linux.dev,  xen-devel@lists.xenproject.org,
	David Woodhouse <dwmw@amazon.co.uk>,
	 Tom Lendacky <thomas.lendacky@amd.com>,
	Nikunj A Dadhania <nikunj@amd.com>,
	 David Woodhouse <dwmw2@infradead.org>,
	Michael Kelley <mhklinux@outlook.com>,
	 Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH v4 00/47] x86: Try to wrangle PV clocks vs. TSC
Date: Fri, 29 May 2026 07:43:47 -0700
Message-ID: <20260529144435.704127-1-seanjc@google.com>

Well, the number of patches in the series is going in the wrong direction,
but I'm much happier with this version, which eschews the x86_platform
overrides entirely in favor of a fixed sequence for selecting the TSC/CPU
frequency "routine".
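
To illustrate what a fixed selection sequence can look like, here is a minimal
userspace sketch, not the code from the series.  Every name and value in it
(struct freq_source, select_tsc_khz(), the stub probes) is hypothetical, and
the ordering (trusted firmware, then hypervisor, then native calibration) is
an assumption based on the goals stated below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frequency source: reports the TSC frequency in kHz, 0 if unavailable. */
struct freq_source {
	const char *name;
	uint32_t (*get_khz)(void);
};

/* Stand-ins for real probes; the values are made up for illustration. */
static uint32_t trusted_firmware_khz(void) { return 0; }       /* not a CoCo guest */
static uint32_t hypervisor_khz(void)       { return 2400000; } /* PV enumeration */
static uint32_t native_calibrate_khz(void) { return 2399987; } /* measured fallback */

/* The order of this table is the whole policy: earlier entries win. */
static const struct freq_source example_sources[] = {
	{ "trusted-firmware", trusted_firmware_khz },
	{ "hypervisor",       hypervisor_khz },
	{ "native",           native_calibrate_khz },
};

/* Walk the sources in order and take the first one that reports a frequency. */
static uint32_t select_tsc_khz(const struct freq_source *sources, size_t nr,
			       const char **chosen)
{
	assert(sources != NULL && chosen != NULL);

	*chosen = NULL;
	for (size_t i = 0; i < nr; i++) {
		uint32_t khz = sources[i].get_khz();

		if (khz) {
			*chosen = sources[i].name;
			return khz;
		}
	}
	return 0;
}
```

With the stub values above, select_tsc_khz(example_sources, 3, &chosen) picks
the "hypervisor" entry, because the trusted firmware stub reports 0.  The point
of the shape is that no source can override another by patching a function
pointer at runtime; the priority is visible in one place.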

Given that previous versions had fatal NULL pointer deref bugs that affected
VMware and Xen, this series needs testing and acks from those maintainers.

The primary goal of this series is to fix flaws with SNP and TDX guests where a
PV clock provided by the untrusted hypervisor is used instead of the secure
TSC that is controlled by trusted firmware.

The secondary goal is to modernize running under KVM.  Currently, KVM guests will
use TSC for clocksource, but not sched_clock.  And Linux-as-a-KVM-guest doesn't
support paravirt enumeration of the TSC/APIC frequencies, even though QEMU
provides that information by default.
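
As a rough sketch of what consuming paravirt timing enumeration involves,
assuming the VMware-style convention where the timing leaf reports the TSC
frequency in EAX and the bus (local APIC timer) frequency in EBX, both in kHz.
The register layout is an assumption here, and decode_pv_timing(),
lapic_period_ticks() and DEMO_HZ are hypothetical names, not identifiers from
the series.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HZ 250u	/* hypothetical scheduler tick rate, in ticks per second */

struct pv_timing_info {
	uint32_t tsc_khz;	/* 0 means "not enumerated" */
	uint32_t bus_khz;	/* 0 means "not enumerated" */
};

/* Decode raw register values; the caller is assumed to have executed CPUID. */
static struct pv_timing_info decode_pv_timing(uint32_t eax, uint32_t ebx)
{
	struct pv_timing_info info = { .tsc_khz = eax, .bus_khz = ebx };

	return info;
}

/* Convert a bus frequency in kHz into APIC timer ticks per scheduler tick. */
static uint64_t lapic_period_ticks(uint32_t bus_khz, uint32_t hz)
{
	assert(hz > 0);
	return (uint64_t)bus_khz * 1000 / hz;
}
```

For example, a 1 GHz bus (bus_khz = 1000000) with DEMO_HZ = 250 works out to
1000000 * 1000 / 250 = 4000000 APIC timer ticks per scheduler tick.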

The tertiary goal is to clean up the PV clock code to deduplicate logic across
hypervisors, and to hopefully make it all easier to maintain going forward.

v4 also adds a quaternary goal of cleaning up the TSC calibration code, which
was made stupidly hard to follow by hypervisor code mixing in with the native
calibration routines, instead of being implemented as a pure alternative.

Lots more background on the SNP/TDX motivation:
https://lore.kernel.org/all/20250106124633.1418972-13-nikunj@amd.com

As before, I deliberately omitted jailhouse-dev@googlegroups.com from the To/Cc,
as those emails bounced on v1, and AFAICT nothing has changed.

Note, I deliberately didn't collect a few reviews as the patches changed quite
a bit from what was reviewed in v3.

v4:
 - Use x86_init_noop() to skip save/restore on VMware and Xen instead of
   nullifying x86_platform.{save,restore}_sched_clock_state. [Sashiko]
 - Use '0' to indicate "failure" when getting the CPU frequency from CPUID, to
   avoid using an out-param and thus make it all but impossible to
   unintentionally clobber the global cpu_khz (which v3 did). [Sashiko]
 - Rename cpuid_get_cpu_freq() => __cpu_khz_from_cpuid() to capture its
   relationship with cpu_khz_from_cpuid().
 - Compute lapic_timer_period in units of ticks, not kHz. [Sashiko]
 - Kill off x86_platform_ops.calibrate_{cpu,tsc}(), and instead use dedicated
   hooks for hypervisor code, and direct calls for TDX and SNP. [David, loosely]
 - Drop SNP's secure TSC override of _CPU_ calibration, as there's zero
   evidence it's justified or a net positive.
 - Collect reviews/acks. [David, Wei]
 - Decouple getting TSC/APIC frequencies from KVM PV CPUID from kvmclock. [David]
 - Fix an amusing number of Opportunistically misspellings. [David]
 - Set kvm_sched_clock_offset _before_ registering kvmclock as sched_clock,
   and add a comment to guard against future goofs. [Sashiko]
 - Keep "setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE)" in Hyper-V's handling
   of HV_ACCESS_TSC_INVARIANT, as it's technically possible to have a VM
   with HV_ACCESS_TSC_INVARIANT but not HV_ACCESS_FREQUENCY_MSRS.  Though as
   a _very_ nice side effect of using dedicated sequencing for selecting the
   TSC frequency source, this would have naturally happened anyways. [Sashiko]
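
To illustrate the "return 0 on failure instead of using an out-param" item
above, a small userspace sketch.  It assumes the CPU base frequency comes from
CPUID.0x16, where EAX[15:0] holds the base frequency in MHz; the helper names
and the demo_cpu_khz variable are hypothetical stand-ins, not the functions
from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a global that must not be clobbered by a failed lookup. */
static uint32_t demo_cpu_khz = 2100000;

/*
 * Takes the raw EAX value of CPUID.0x16 and returns the base frequency in
 * kHz, or 0 if the leaf does not enumerate one.  There is no out-param, so
 * the helper itself cannot overwrite anything on failure.
 */
static uint32_t cpu_khz_from_leaf16(uint32_t eax)
{
	uint32_t base_mhz = eax & 0xffff;

	return base_mhz * 1000;
}

/* The caller decides whether to commit the value, and only on success. */
static void update_cpu_khz(uint32_t eax)
{
	uint32_t khz = cpu_khz_from_leaf16(eax);

	if (khz)
		demo_cpu_khz = khz;
}

/* Usage: a failed lookup leaves the global alone, a successful one updates it. */
static void demo(void)
{
	update_cpu_khz(0);
	assert(demo_cpu_khz == 2100000);

	update_cpu_khz(2400);
	assert(demo_cpu_khz == 2400000);
}
```

The out-param variant makes it easy to pass &global and have it zeroed on a
failed lookup; returning the value forces the "did it work?" check to sit next
to the assignment.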

v3:
 - https://lore.kernel.org/all/20260515191942.1892718-1-seanjc@google.com
 - Collect reviews. [Michael, Thomas]
 - Use Hyper-V reference counter / refcounter instead of Hyper-V timer. [Michael]
 - Use the paravirt CPUID interface first proposed by VMware for KVM's
   "official" mechanism for communicating frequency to KVM-aware guests,
   instead of abusing Intel's CPUID leafs. [David]
 - Deal with paravirt code being moved into asm/timers.h and
   arch/x86/kernel/tsc.c.

v2:
 - https://lore.kernel.org/all/Z8YWttWDtvkyCtdJ@google.com
 - Add struct to hold the TSC CPUID output. [Boris]
 - Don't pointlessly inline the TSC CPUID helpers. [Boris]
 - Fix a variable goof in a helper, hopefully for real this time. [Dan]
 - Collect reviews. [Nikunj]
 - Override the sched_clock save/restore hooks if and only if a PV clock
   is successfully registered.
 - During resume, restore clocksources before reading persistent time.
 - Clean up more warts created by kvmclock.
 - Fix more bugs in kvmclock's suspend/resume handling.
 - Try to harden kvmclock against future bugs.
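
The "override the save/restore hooks if and only if a PV clock is successfully
registered" item can be sketched roughly as follows.  This is a userspace
illustration under assumed names (register_pv_sched_clock(), cur_clock,
cur_save, cur_restore and the demo_* stubs are all hypothetical), not the
paravirt API from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*clock_fn)(void);
typedef void (*hook_fn)(void);

/* Stubs standing in for the native and PV implementations. */
static uint64_t demo_native_clock(void) { return 1; }
static void demo_native_save(void) { }
static void demo_native_restore(void) { }
static uint64_t demo_pv_clock(void) { return 42; }
static void demo_pv_save(void) { }
static void demo_pv_restore(void) { }

/* Currently installed clock and its matching save/restore hooks. */
static clock_fn cur_clock = demo_native_clock;
static hook_fn cur_save = demo_native_save;
static hook_fn cur_restore = demo_native_restore;

/*
 * Install the clock and its hooks together, or not at all.  Returns 0 on
 * success and -1 if the clock is rejected, in which case nothing changes.
 */
static int register_pv_sched_clock(clock_fn clk, bool stable,
				   hook_fn save, hook_fn restore)
{
	if (!clk || !stable)
		return -1;

	cur_clock = clk;
	cur_save = save;
	cur_restore = restore;
	return 0;
}

/* Usage: a rejected clock leaves the native hooks in place. */
static void demo(void)
{
	assert(register_pv_sched_clock(demo_pv_clock, false,
				       demo_pv_save, demo_pv_restore) == -1);
	assert(cur_save == demo_native_save);

	assert(register_pv_sched_clock(demo_pv_clock, true,
				       demo_pv_save, demo_pv_restore) == 0);
	assert(cur_clock == demo_pv_clock && cur_restore == demo_pv_restore);
}
```

Passing the hooks at registration time means the clock and its save/restore
helpers cannot get out of sync, which is the failure mode a separate "set the
hooks" step invites.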

v1: https://lore.kernel.org/all/20250201021718.699411-1-seanjc@google.com

David Woodhouse (3):
  KVM: x86: Officially define CPUID 0x40000010 as PV Timing Info (TSC
    and Bus)
  x86/kvm: Obtain TSC frequency from PV CPUID if present
  x86/xen: Obtain TSC frequency from CPUID if present

Sean Christopherson (44):
  x86/tsc: Never re-calibrate TSC frequency if its exact timing is known
  x86/tsc: Add a standalone helpers for getting TSC info from CPUID.0x15
  x86/sev: Mark TSC as reliable when configuring Secure TSC
  x86/sev: Don't override CPU frequency calibration for SNP's Secure TSC
  x86/sev: Move check for SNP Secure TSC support to tsc_early_init()
  x86/sev: Shove SNP's secure/trusted TSC frequency directly into
    "calibration"
  x86/tdx: Force TSC frequency with CPUID-based info provided by the
    TDX-Module
  x86/tsc: Add dedicated hypervisor hooks for getting known TSC/CPU
    frequencies
  x86/acrn: Mark TSC frequency as known when using ACRN for calibration
  x86/tsc: Consolidate forcing of X86_FEATURE_TSC_KNOWN_FREQ for PV code
  x86/tsc: Kill off x86_platform_ops.calibrate_{cpu,tsc}() hooks
  x86/tsc: Rename pit_hpet_ptimer_calibrate_cpu() =>
    native_calibrate_cpu_late()
  x86/tsc: Fold native_calibrate_cpu() into recalibrate_cpu_khz()
  x86/kvmclock: Rename kvm_get_tsc_khz() to kvmclock_get_tsc_khz()
  x86/kvm: Mark TSC as reliable when it's constant and nonstop
  x86/kvm: Get local APIC bus frequency from PV CPUID Timing Info
  x86/tsc: Add standalone helper for getting CPU frequency from CPUID
  x86/kvm: Get CPU base frequency from CPUID when it's available
  clocksource: hyper-v: Register sched_clock save/restore iff it's
    necessary
  clocksource: hyper-v: Drop wrappers to sched_clock save/restore
    helpers
  clocksource: hyper-v: Don't save/restore TSC offset when using HV
    sched_clock
  x86/kvmclock: Setup kvmclock for secondary CPUs iff CONFIG_SMP=y
  x86/kvm: Don't disable kvmclock on BSP in syscore_suspend()
  x86/paravirt: Remove unnecessary PARAVIRT=n stub for
    paravirt_set_sched_clock()
  x86/paravirt: Move handling of unstable PV clocks into
    paravirt_set_sched_clock()
  x86/kvmclock: Move sched_clock save/restore helpers up in kvmclock.c
  x86/xen/time: NOP-ify x86_platform's sched_clock save/restore hooks
  x86/vmware: NOP-ify save/restore hooks when using VMware's sched_clock
  x86/tsc: WARN if TSC sched_clock save/restore used with PV sched_clock
  x86/paravirt: Pass sched_clock save/restore helpers during
    registration
  x86/kvmclock: Move kvm_sched_clock_init() down in kvmclock.c
  x86/xen/time: Mark xen_setup_vsyscall_time_info() as __init
  x86/pvclock: Mark setup helpers and related various as
    __init/__ro_after_init
  x86/pvclock: WARN if pvclock's valid_flags are overwritten
  x86/kvmclock: Refactor handling of PVCLOCK_TSC_STABLE_BIT during
    kvmclock_init()
  timekeeping: Resume clocksources before reading persistent clock
  x86/kvmclock: Hook clocksource.suspend/resume when kvmclock isn't
    sched_clock
  x86/kvmclock: WARN if wall clock is read while kvmclock is suspended
  x86/paravirt: Mark __paravirt_set_sched_clock() as __init
  x86/paravirt: Plumb a return code into __paravirt_set_sched_clock()
  x86/paravirt: Don't use a PV sched_clock in CoCo guests with trusted
    TSC
  x86/kvmclock: Use TSC for sched_clock if it's constant and non-stop
  x86/kvmclock: Plumb in AP-online and BSP-resume to kvmlock, for
    documentation
  x86/paravirt: Move using_native_sched_clock() stub into timer.h

 Documentation/virt/kvm/x86/cpuid.rst |  12 ++
 arch/x86/coco/sev/core.c             |  21 +--
 arch/x86/coco/tdx/tdx.c              |  19 ++-
 arch/x86/include/asm/acrn.h          |   5 -
 arch/x86/include/asm/kvm_para.h      |  12 +-
 arch/x86/include/asm/sev.h           |   4 +-
 arch/x86/include/asm/tdx.h           |   2 +
 arch/x86/include/asm/timer.h         |  15 +-
 arch/x86/include/asm/tsc.h           |  11 +-
 arch/x86/include/asm/x86_init.h      |   8 +-
 arch/x86/include/uapi/asm/kvm_para.h |  11 ++
 arch/x86/kernel/cpu/acrn.c           |  10 +-
 arch/x86/kernel/cpu/mshyperv.c       |  65 +-------
 arch/x86/kernel/cpu/vmware.c         |  13 +-
 arch/x86/kernel/jailhouse.c          |   7 +-
 arch/x86/kernel/kvm.c                | 108 +++++++++++--
 arch/x86/kernel/kvmclock.c           | 208 ++++++++++++++++---------
 arch/x86/kernel/pvclock.c            |   9 +-
 arch/x86/kernel/tsc.c                | 218 +++++++++++++++++----------
 arch/x86/kernel/x86_init.c           |   2 -
 arch/x86/mm/mem_encrypt_amd.c        |   3 -
 arch/x86/xen/time.c                  |  25 ++-
 drivers/clocksource/hyperv_timer.c   |  38 +++--
 include/clocksource/hyperv_timer.h   |   2 -
 kernel/time/timekeeping.c            |   9 +-
 25 files changed, 533 insertions(+), 304 deletions(-)


base-commit: 4678d11f294de0fd295a265e02955b5d1a4a2684
-- 
2.54.0.823.g6e5bcc1fc9-goog


