From: Sean Christopherson <seanjc@google.com>
To: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Juergen Gross <jgross@suse.com>,
"K. Y. Srinivasan" <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
Ajay Kaher <ajay.kaher@broadcom.com>,
Alexey Makhalov <alexey.amakhalov@broadcom.com>,
Jan Kiszka <jan.kiszka@siemens.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
virtualization@lists.linux.dev, linux-hyperv@vger.kernel.org,
jailhouse-dev@googlegroups.com, kvm@vger.kernel.org,
xen-devel@lists.xenproject.org,
Sean Christopherson <seanjc@google.com>,
Nikunj A Dadhania <nikunj@amd.com>,
Tom Lendacky <thomas.lendacky@amd.com>
Subject: [PATCH 16/16] x86/kvmclock: Use TSC for sched_clock if it's constant and non-stop
Date: Fri, 31 Jan 2025 18:17:18 -0800
Message-ID: <20250201021718.699411-17-seanjc@google.com>
In-Reply-To: <20250201021718.699411-1-seanjc@google.com>

Prefer the TSC over kvmclock for sched_clock if the TSC is constant,
nonstop, and not marked unstable via command line. I.e. use the same
criteria as tweaking the clocksource rating so that TSC is preferred over
kvmclock. Per the below comment from native_sched_clock(), sched_clock
is more tolerant of slop than clocksource; using TSC for clocksource but
not sched_clock makes little to no sense, especially now that KVM CoCo
guests with a trusted TSC use TSC, not kvmclock.

  /*
   * Fall back to jiffies if there's no TSC available:
   * ( But note that we still use it if the TSC is marked
   *   unstable. We do this because unlike Time Of Day,
   *   the scheduler clock tolerates small errors and it's
   *   very important for it to be as fast as the platform
   *   can achieve it. )
   */

The only advantage of using kvmclock is that doing so allows for early
and common detection of PVCLOCK_GUEST_STOPPED, but that code has been
broken for nearly two years with nary a complaint, i.e. it can't be
_that_ valuable. And as above, certain types of KVM guests are losing
the functionality regardless, i.e. acknowledging PVCLOCK_GUEST_STOPPED
needs to be decoupled from sched_clock() no matter what.

Link: https://lore.kernel.org/all/Z4hDK27OV7wK572A@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
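Illustration only, not part of the patch: a minimal userspace sketch of
the selection logic described above. All type and function names here
are hypothetical stand-ins, not kernel APIs; the three booleans model
X86_FEATURE_CONSTANT_TSC, X86_FEATURE_NONSTOP_TSC and the result of
check_tsc_unstable().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
enum sched_clock_source { SCHED_CLOCK_TSC, SCHED_CLOCK_KVMCLOCK };

struct tsc_info {
	bool constant;        /* models X86_FEATURE_CONSTANT_TSC */
	bool nonstop;         /* models X86_FEATURE_NONSTOP_TSC */
	bool marked_unstable; /* models check_tsc_unstable() */
};

struct clock_choice {
	enum sched_clock_source sched_clock;
	bool demote_kvmclock; /* true when kvm_clock.rating is dropped to 299 */
};

/*
 * One predicate drives both decisions: if the TSC qualifies, kvmclock's
 * clocksource rating is lowered and kvmclock is not installed as
 * sched_clock, so the default TSC-based sched_clock stays in place.
 */
static struct clock_choice choose_clocks(struct tsc_info tsc)
{
	struct clock_choice choice;
	bool prefer_tsc = tsc.constant && tsc.nonstop && !tsc.marked_unstable;

	choice.sched_clock = prefer_tsc ? SCHED_CLOCK_TSC : SCHED_CLOCK_KVMCLOCK;
	choice.demote_kvmclock = prefer_tsc;
	return choice;
}
```

With this model, a guest whose TSC is constant and nonstop but was marked
unstable on the command line still gets kvmclock for sched_clock, which
matches the else branch in the diff below.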
arch/x86/kernel/kvmclock.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 9d05d070fe25..fb8cd8313d18 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -344,23 +344,23 @@ void __init kvmclock_init(void)
 		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);

 	/*
-	 * X86_FEATURE_NONSTOP_TSC is TSC runs at constant rate
-	 * with P/T states and does not stop in deep C-states.
-	 *
-	 * Invariant TSC exposed by host means kvmclock is not necessary:
-	 * can use TSC as clocksource.
-	 *
+	 * If the TSC counts at a constant frequency across P/T states, counts
+	 * in deep C-states, and the TSC hasn't been marked unstable, prefer
+	 * the TSC over kvmclock for sched_clock and drop kvmclock's rating so
+	 * that TSC is chosen as the clocksource. Note, the TSC unstable check
+	 * exists purely to honor the TSC being marked unstable via command
+	 * line, any runtime detection of an unstable will happen after this.
 	 */
 	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
 	    !check_tsc_unstable()) {
 		kvm_clock.rating = 299;
 		tsc_properties = TSC_FREQ_KNOWN_AND_RELIABLE;
+	} else {
+		flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
+		kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
 	}

-	flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
-	kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
-
 	tsc_register_calibration_routines(kvm_get_tsc_khz, kvm_get_cpu_khz,
 					  tsc_properties);

@@ -369,6 +369,11 @@ void __init kvmclock_init(void)
 #ifdef CONFIG_X86_LOCAL_APIC
 	x86_cpuinit.early_percpu_clock_init = kvm_setup_secondary_clock;
 #endif
+	/*
+	 * Save/restore "sched" clock state even if kvmclock isn't being used
+	 * for sched_clock, as kvmclock is still used for wallclock and relies
+	 * on these hooks to re-enable kvmclock after suspend+resume.
+	 */
 	x86_platform.save_sched_clock_state = kvm_save_sched_clock_state;
 	x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state;
 	kvm_get_preset_lpj();
--
2.48.1.362.g079036d154-goog