From: Sean Christopherson <seanjc@google.com>
To: "Nikunj A. Dadhania" <nikunj@amd.com>
Cc: linux-kernel@vger.kernel.org, thomas.lendacky@amd.com,
bp@alien8.de, x86@kernel.org, kvm@vger.kernel.org,
mingo@redhat.com, tglx@linutronix.de,
dave.hansen@linux.intel.com, pgonda@google.com,
pbonzini@redhat.com, peterz@infradead.org,
gautham.shenoy@amd.com
Subject: Re: [PATCH v11 19/20] x86/kvmclock: Skip kvmclock when Secure TSC is available
Date: Wed, 25 Sep 2024 05:55:38 -0700 [thread overview]
Message-ID: <ZvQHpbNauYTBgU6M@google.com> (raw)
In-Reply-To: <ef194c25-22d8-204e-ffb6-8f9f0a0621fb@amd.com>
On Wed, Sep 25, 2024, Nikunj A. Dadhania wrote:
> >>>>> Are you suggesting that whenever the guest is either SNP or TDX, kvmclock
> >>>>> should be disabled assuming that timesource is stable and always running?
> >>>>
> >>>> No, I'm saying that the guest should prefer the raw TSC over kvmclock if the TSC
> >>>> is stable, irrespective of SNP or TDX. This is effectively already done for the
> >>>> timekeeping base (see commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if
> >>>> invariant TSC is exposed")), but the scheduler still uses kvmclock thanks to the
> >>>> kvm_sched_clock_init() code.
> >>>
> >>> The kvm-clock and tsc-early both are having the rating of 299. As they are of
> >>> same rating, kvm-clock is being picked up first.
> >>>
> >>> Is it fine to drop the clock rating of kvmclock to 298 ? With this tsc-early will
> >>> be picked up instead.
> >>
> >> IMO, it's ugly, but that's a problem with the rating system inasmuch as anything.
> >>
> >> But the kernel will still be using kvmclock for the scheduler clock, which is
> >> undesirable.
> >
> > Agree, kvm_sched_clock_init() is still being called. The above hunk was to use
> > tsc-early/tsc as the clocksource and not kvm-clock.
>
> How about the below patch:
>
> From: Nikunj A Dadhania <nikunj@amd.com>
> Date: Tue, 28 Nov 2023 18:29:56 +0530
> Subject: [RFC PATCH] x86/kvmclock: Prefer invariant TSC as the clocksource and
> scheduler clock
>
> For platforms that support stable and always running TSC, although the
> kvm-clock rating is dropped to 299 to prefer TSC, the guest scheduler clock
> still keeps on using the kvm-clock which is undesirable. Moreover, as the
> kvm-clock and early-tsc clocksource are both registered with 299 rating,
> kvm-clock is being picked up momentarily instead of selecting more stable
> tsc-early clocksource.
>
> kvm-clock: Using msrs 4b564d01 and 4b564d00
> kvm-clock: using sched offset of 1799357702246960 cycles
> clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
> tsc: Detected 1996.249 MHz processor
> clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x398cadd9d93, max_idle_ns: 881590552906 ns
> clocksource: Switched to clocksource kvm-clock
> clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x398cadd9d93, max_idle_ns: 881590552906 ns
> clocksource: Switched to clocksource tsc
>
> Drop the kvm-clock rating to 298, so that tsc-early is picked up before
> kvm-clock and use TSC for scheduler clock as well when the TSC is invariant
> and stable.
>
> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
>
> ---
>
> The issue we see here is that on bare-metal if the TSC is marked unstable,
> then the sched-clock will fall back to jiffies. In the virtualization case,
> do we want to fall back to kvm-clock when TSC is marked unstable?
In the general case, yes. Though that might be a WARN-able offense if the TSC
is allegedly constant+nonstop. And for SNP and TDX, it might be a "panic and do
not boot" offense, since using kvmclock undermines the security of the guest.
> ---
> arch/x86/kernel/kvmclock.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 5b2c15214a6b..c997b2628c4b 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -317,9 +317,6 @@ void __init kvmclock_init(void)
> if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
> pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
>
> - flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
> - kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
> -
> x86_platform.calibrate_tsc = kvm_get_tsc_khz;
> x86_platform.calibrate_cpu = kvm_get_tsc_khz;
> x86_platform.get_wallclock = kvm_get_wallclock;
> @@ -341,8 +338,12 @@ void __init kvmclock_init(void)
> */
> if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> - !check_tsc_unstable())
> - kvm_clock.rating = 299;
> + !check_tsc_unstable()) {
> + kvm_clock.rating = 298;
> + } else {
> + flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
> + kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
> + }
I would really, really like to fix this in a centralized location, not by having
each PV clocksource muck with their clock's rating. I'm not even sure the existing
code is entirely correct, as kvmclock_init() runs _before_ tsc_early_init(). Which
is desirable in the legacy case, as it allows calibrating the TSC using kvmclock,
x86_platform.calibrate_tsc = kvm_get_tsc_khz;
but on modern setups that's definitely undesirable, as it means the kernel won't
use CPUID.0x15, which every explicitly tells software the frequency of the TSC.
And I don't think we want to simply point at native_calibrate_tsc(), because that
thing is not at all correct for a VM, where checking x86_vendor and x86_vfm is at
best sketchy. E.g. I would think it's in AMD's interest for Secure TSC to define
the TSC frequency using CPUID.0x15, even if AMD CPUs don't (yet) natively support
CPUID.0x15.
In other words, I think we need to overhaul the PV clock vs. TSC logic so that it
makes sense for modern CPUs+VMs, not just keep hacking away at kvmclock. I don't
expect the code would be all that complex in the end, the hardest part is likely
just figuring out (and agreeing on) what exactly the kernel should be doing.
> clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
> pv_info.name = "KVM";
> --
> 2.34.1
>
next prev parent reply other threads:[~2024-09-25 12:55 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-31 15:07 [PATCH v11 00/20] Add Secure TSC support for SNP guests Nikunj A Dadhania
2024-07-31 15:07 ` [PATCH v11 01/20] virt: sev-guest: Replace dev_dbg with pr_debug Nikunj A Dadhania
2024-08-27 8:48 ` [tip: x86/sev] virt: sev-guest: Replace dev_dbg() with pr_debug() tip-bot2 for Nikunj A Dadhania
2024-07-31 15:07 ` [PATCH v11 02/20] virt: sev-guest: Rename local guest message variables Nikunj A Dadhania
2024-08-27 8:48 ` [tip: x86/sev] " tip-bot2 for Nikunj A Dadhania
2024-09-13 17:22 ` [PATCH v11 02/20] " Tom Lendacky
2024-07-31 15:07 ` [PATCH v11 03/20] virt: sev-guest: Fix user-visible strings Nikunj A Dadhania
2024-08-27 8:48 ` [tip: x86/sev] " tip-bot2 for Nikunj A Dadhania
2024-09-13 17:26 ` [PATCH v11 03/20] " Tom Lendacky
2024-07-31 15:07 ` [PATCH v11 04/20] virt: sev-guest: Ensure the SNP guest messages do not exceed a page Nikunj A Dadhania
2024-08-27 8:48 ` [tip: x86/sev] " tip-bot2 for Nikunj A Dadhania
2024-07-31 15:07 ` [PATCH v11 05/20] virt: sev-guest: Use AES GCM crypto library Nikunj A Dadhania
2024-07-31 15:07 ` [PATCH v11 06/20] x86/sev: Handle failures from snp_init() Nikunj A Dadhania
2024-08-27 11:32 ` Borislav Petkov
2024-08-28 4:47 ` Nikunj A. Dadhania
2024-08-28 9:49 ` Borislav Petkov
2024-08-28 10:16 ` Nikunj A. Dadhania
2024-08-28 10:23 ` Borislav Petkov
2024-07-31 15:07 ` [PATCH v11 07/20] x86/sev: Cache the secrets page address Nikunj A Dadhania
2024-07-31 15:07 ` [PATCH v11 08/20] virt: sev-guest: Consolidate SNP guest messaging parameters to a struct Nikunj A Dadhania
2024-09-04 14:31 ` Borislav Petkov
2024-09-05 4:35 ` Nikunj A. Dadhania
2024-07-31 15:08 ` [PATCH v11 09/20] virt: sev-guest: Reduce the scope of SNP command mutex Nikunj A Dadhania
2024-09-12 21:54 ` Tom Lendacky
2024-09-13 4:26 ` Nikunj A. Dadhania
2024-09-13 14:06 ` Tom Lendacky
2024-07-31 15:08 ` [PATCH v11 10/20] virt: sev-guest: Carve out SNP message context structure Nikunj A Dadhania
2024-09-13 15:52 ` Tom Lendacky
2024-07-31 15:08 ` [PATCH v11 11/20] x86/sev: Carve out and export SNP guest messaging init routines Nikunj A Dadhania
2024-09-13 15:53 ` Tom Lendacky
2024-07-31 15:08 ` [PATCH v11 12/20] x86/sev: Relocate SNP guest messaging routines to common code Nikunj A Dadhania
2024-09-13 16:27 ` Tom Lendacky
2024-09-16 4:42 ` Nikunj A. Dadhania
2024-07-31 15:08 ` [PATCH v11 13/20] x86/cc: Add CC_ATTR_GUEST_SECURE_TSC Nikunj A Dadhania
2024-09-13 15:21 ` Tom Lendacky
2024-09-16 4:53 ` Nikunj A. Dadhania
2024-07-31 15:08 ` [PATCH v11 14/20] x86/sev: Add Secure TSC support for SNP guests Nikunj A Dadhania
2024-09-13 16:29 ` Tom Lendacky
2024-07-31 15:08 ` [PATCH v11 15/20] x86/sev: Change TSC MSR behavior for Secure TSC enabled guests Nikunj A Dadhania
2024-07-31 15:08 ` [PATCH v11 16/20] x86/sev: Prevent RDTSC/RDTSCP interception " Nikunj A Dadhania
2024-09-13 16:49 ` Tom Lendacky
2024-07-31 15:08 ` [PATCH v11 17/20] x86/sev: Allow Secure TSC feature for SNP guests Nikunj A Dadhania
2024-09-13 16:53 ` Tom Lendacky
2024-09-16 6:23 ` Nikunj A. Dadhania
2024-07-31 15:08 ` [PATCH v11 18/20] x86/sev: Mark Secure TSC as reliable clocksource Nikunj A Dadhania
2024-09-13 16:59 ` Tom Lendacky
2024-07-31 15:08 ` [PATCH v11 19/20] x86/kvmclock: Skip kvmclock when Secure TSC is available Nikunj A Dadhania
2024-09-13 17:19 ` Tom Lendacky
2024-09-13 17:30 ` Sean Christopherson
2024-09-16 15:20 ` Nikunj A. Dadhania
2024-09-18 12:07 ` Sean Christopherson
2024-09-20 5:15 ` Nikunj A. Dadhania
2024-09-20 7:21 ` Sean Christopherson
2024-09-20 8:54 ` Nikunj A. Dadhania
2024-09-25 8:53 ` Nikunj A. Dadhania
2024-09-25 12:55 ` Sean Christopherson [this message]
2024-09-30 6:27 ` Nikunj A. Dadhania
2024-09-30 21:20 ` Thomas Gleixner
2024-10-01 4:26 ` Nikunj A. Dadhania
2024-10-01 14:36 ` Nikunj A. Dadhania
2024-07-31 15:08 ` [PATCH v11 20/20] x86/cpu/amd: Do not print FW_BUG for Secure TSC Nikunj A Dadhania
2024-09-13 17:21 ` Tom Lendacky
2024-09-13 17:42 ` Jim Mattson
2024-09-16 11:40 ` Nikunj A. Dadhania
2024-09-16 20:21 ` Jim Mattson
2024-08-14 4:14 ` [PATCH v11 00/20] Add Secure TSC support for SNP guests Nikunj A. Dadhania
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZvQHpbNauYTBgU6M@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=gautham.shenoy@amd.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=nikunj@amd.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=pgonda@google.com \
--cc=tglx@linutronix.de \
--cc=thomas.lendacky@amd.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.