public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: "Nikunj A. Dadhania" <nikunj@amd.com>
Cc: linux-kernel@vger.kernel.org, thomas.lendacky@amd.com,
	bp@alien8.de,  x86@kernel.org, kvm@vger.kernel.org,
	mingo@redhat.com, tglx@linutronix.de,
	 dave.hansen@linux.intel.com, pgonda@google.com,
	pbonzini@redhat.com,  peterz@infradead.org,
	gautham.shenoy@amd.com
Subject: Re: [PATCH v11 19/20] x86/kvmclock: Skip kvmclock when Secure TSC is available
Date: Wed, 25 Sep 2024 05:55:38 -0700	[thread overview]
Message-ID: <ZvQHpbNauYTBgU6M@google.com> (raw)
In-Reply-To: <ef194c25-22d8-204e-ffb6-8f9f0a0621fb@amd.com>

On Wed, Sep 25, 2024, Nikunj A. Dadhania wrote:
> >>>>> Are you suggesting that whenever the guest is either SNP or TDX, kvmclock
> >>>>> should be disabled assuming that timesource is stable and always running?
> >>>>
> >>>> No, I'm saying that the guest should prefer the raw TSC over kvmclock if the TSC
> >>>> is stable, irrespective of SNP or TDX.  This is effectively already done for the
> >>>> timekeeping base (see commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if
> >>>> invariant TSC is exposed")), but the scheduler still uses kvmclock thanks to the
> >>>> kvm_sched_clock_init() code.
> >>>
> >>> The kvm-clock and tsc-early both are having the rating of 299. As they are of
> >>> same rating, kvm-clock is being picked up first.
> >>>
> >>> Is it fine to drop the clock rating of kvmclock to 298 ? With this tsc-early will
> >>> be picked up instead.
> >>
> >> IMO, it's ugly, but that's a problem with the rating system inasmuch as anything.
> >>
> >> But the kernel will still be using kvmclock for the scheduler clock, which is
> >> undesirable.
> > 
> > Agree, kvm_sched_clock_init() is still being called. The above hunk was to use
> > tsc-early/tsc as the clocksource and not kvm-clock.
> 
> How about the below patch:
> 
> From: Nikunj A Dadhania <nikunj@amd.com>
> Date: Tue, 28 Nov 2023 18:29:56 +0530
> Subject: [RFC PATCH] x86/kvmclock: Prefer invariant TSC as the clocksource and
>  scheduler clock
> 
> For platforms that support stable and always running TSC, although the
> kvm-clock rating is dropped to 299 to prefer TSC, the guest scheduler clock
> still keeps on using the kvm-clock which is undesirable. Moreover, as the
> kvm-clock and early-tsc clocksource are both registered with 299 rating,
> kvm-clock is being picked up momentarily instead of selecting more stable
> tsc-early clocksource.
> 
>   kvm-clock: Using msrs 4b564d01 and 4b564d00
>   kvm-clock: using sched offset of 1799357702246960 cycles
>   clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
>   tsc: Detected 1996.249 MHz processor
>   clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x398cadd9d93, max_idle_ns: 881590552906 ns
>   clocksource: Switched to clocksource kvm-clock
>   clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x398cadd9d93, max_idle_ns: 881590552906 ns
>   clocksource: Switched to clocksource tsc
> 
> Drop the kvm-clock rating to 298, so that tsc-early is picked up before
> kvm-clock and use TSC for scheduler clock as well when the TSC is invariant
> and stable.
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
> 
> ---
> 
> The issue we see here is that on bare-metal if the TSC is marked unstable,
> then the sched-clock will fall back to jiffies. In the virtualization case,
> do we want to fall back to kvm-clock when TSC is marked unstable?

In the general case, yes.  Though that might be a WARN-able offense if the TSC
is allegedly constant+nonstop.  And for SNP and TDX, it might be a "panic and do
not boot" offense, since using kvmclock undermines the security of the guest.

> ---
>  arch/x86/kernel/kvmclock.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 5b2c15214a6b..c997b2628c4b 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -317,9 +317,6 @@ void __init kvmclock_init(void)
>  	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
>  		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
>  
> -	flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
> -	kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
> -
>  	x86_platform.calibrate_tsc = kvm_get_tsc_khz;
>  	x86_platform.calibrate_cpu = kvm_get_tsc_khz;
>  	x86_platform.get_wallclock = kvm_get_wallclock;
> @@ -341,8 +338,12 @@ void __init kvmclock_init(void)
>  	 */
>  	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>  	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> -	    !check_tsc_unstable())
> -		kvm_clock.rating = 299;
> +	    !check_tsc_unstable()) {
> +		kvm_clock.rating = 298;
> +	} else {
> +		flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
> +		kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
> +	}

I would really, really like to fix this in a centralized location, not by having
each PV clocksource muck with their clock's rating.  I'm not even sure the existing
code is entirely correct, as kvmclock_init() runs _before_ tsc_early_init().  Which
is desirable in the legacy case, as it allows calibrating the TSC using kvmclock,

  	x86_platform.calibrate_tsc = kvm_get_tsc_khz;

but on modern setups that's definitely undesirable, as it means the kernel won't
use CPUID.0x15, which every explicitly tells software the frequency of the TSC.

And I don't think we want to simply point at native_calibrate_tsc(), because that
thing is not at all correct for a VM, where checking x86_vendor and x86_vfm is at
best sketchy.  E.g. I would think it's in AMD's interest for Secure TSC to define
the TSC frequency using CPUID.0x15, even if AMD CPUs don't (yet) natively support
CPUID.0x15.

In other words, I think we need to overhaul the PV clock vs. TSC logic so that it
makes sense for modern CPUs+VMs, not just keep hacking away at kvmclock.  I don't
expect the code would be all that complex in the end, the hardest part is likely
just figuring out (and agreeing on) what exactly the kernel should be doing.

>  	clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
>  	pv_info.name = "KVM";
> -- 
> 2.34.1
> 

  reply	other threads:[~2024-09-25 12:55 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-31 15:07 [PATCH v11 00/20] Add Secure TSC support for SNP guests Nikunj A Dadhania
2024-07-31 15:07 ` [PATCH v11 01/20] virt: sev-guest: Replace dev_dbg with pr_debug Nikunj A Dadhania
2024-08-27  8:48   ` [tip: x86/sev] virt: sev-guest: Replace dev_dbg() with pr_debug() tip-bot2 for Nikunj A Dadhania
2024-07-31 15:07 ` [PATCH v11 02/20] virt: sev-guest: Rename local guest message variables Nikunj A Dadhania
2024-08-27  8:48   ` [tip: x86/sev] " tip-bot2 for Nikunj A Dadhania
2024-09-13 17:22   ` [PATCH v11 02/20] " Tom Lendacky
2024-07-31 15:07 ` [PATCH v11 03/20] virt: sev-guest: Fix user-visible strings Nikunj A Dadhania
2024-08-27  8:48   ` [tip: x86/sev] " tip-bot2 for Nikunj A Dadhania
2024-09-13 17:26   ` [PATCH v11 03/20] " Tom Lendacky
2024-07-31 15:07 ` [PATCH v11 04/20] virt: sev-guest: Ensure the SNP guest messages do not exceed a page Nikunj A Dadhania
2024-08-27  8:48   ` [tip: x86/sev] " tip-bot2 for Nikunj A Dadhania
2024-07-31 15:07 ` [PATCH v11 05/20] virt: sev-guest: Use AES GCM crypto library Nikunj A Dadhania
2024-07-31 15:07 ` [PATCH v11 06/20] x86/sev: Handle failures from snp_init() Nikunj A Dadhania
2024-08-27 11:32   ` Borislav Petkov
2024-08-28  4:47     ` Nikunj A. Dadhania
2024-08-28  9:49       ` Borislav Petkov
2024-08-28 10:16         ` Nikunj A. Dadhania
2024-08-28 10:23           ` Borislav Petkov
2024-07-31 15:07 ` [PATCH v11 07/20] x86/sev: Cache the secrets page address Nikunj A Dadhania
2024-07-31 15:07 ` [PATCH v11 08/20] virt: sev-guest: Consolidate SNP guest messaging parameters to a struct Nikunj A Dadhania
2024-09-04 14:31   ` Borislav Petkov
2024-09-05  4:35     ` Nikunj A. Dadhania
2024-07-31 15:08 ` [PATCH v11 09/20] virt: sev-guest: Reduce the scope of SNP command mutex Nikunj A Dadhania
2024-09-12 21:54   ` Tom Lendacky
2024-09-13  4:26     ` Nikunj A. Dadhania
2024-09-13 14:06       ` Tom Lendacky
2024-07-31 15:08 ` [PATCH v11 10/20] virt: sev-guest: Carve out SNP message context structure Nikunj A Dadhania
2024-09-13 15:52   ` Tom Lendacky
2024-07-31 15:08 ` [PATCH v11 11/20] x86/sev: Carve out and export SNP guest messaging init routines Nikunj A Dadhania
2024-09-13 15:53   ` Tom Lendacky
2024-07-31 15:08 ` [PATCH v11 12/20] x86/sev: Relocate SNP guest messaging routines to common code Nikunj A Dadhania
2024-09-13 16:27   ` Tom Lendacky
2024-09-16  4:42     ` Nikunj A. Dadhania
2024-07-31 15:08 ` [PATCH v11 13/20] x86/cc: Add CC_ATTR_GUEST_SECURE_TSC Nikunj A Dadhania
2024-09-13 15:21   ` Tom Lendacky
2024-09-16  4:53     ` Nikunj A. Dadhania
2024-07-31 15:08 ` [PATCH v11 14/20] x86/sev: Add Secure TSC support for SNP guests Nikunj A Dadhania
2024-09-13 16:29   ` Tom Lendacky
2024-07-31 15:08 ` [PATCH v11 15/20] x86/sev: Change TSC MSR behavior for Secure TSC enabled guests Nikunj A Dadhania
2024-07-31 15:08 ` [PATCH v11 16/20] x86/sev: Prevent RDTSC/RDTSCP interception " Nikunj A Dadhania
2024-09-13 16:49   ` Tom Lendacky
2024-07-31 15:08 ` [PATCH v11 17/20] x86/sev: Allow Secure TSC feature for SNP guests Nikunj A Dadhania
2024-09-13 16:53   ` Tom Lendacky
2024-09-16  6:23     ` Nikunj A. Dadhania
2024-07-31 15:08 ` [PATCH v11 18/20] x86/sev: Mark Secure TSC as reliable clocksource Nikunj A Dadhania
2024-09-13 16:59   ` Tom Lendacky
2024-07-31 15:08 ` [PATCH v11 19/20] x86/kvmclock: Skip kvmclock when Secure TSC is available Nikunj A Dadhania
2024-09-13 17:19   ` Tom Lendacky
2024-09-13 17:30   ` Sean Christopherson
2024-09-16 15:20     ` Nikunj A. Dadhania
2024-09-18 12:07       ` Sean Christopherson
2024-09-20  5:15         ` Nikunj A. Dadhania
2024-09-20  7:21           ` Sean Christopherson
2024-09-20  8:54             ` Nikunj A. Dadhania
2024-09-25  8:53               ` Nikunj A. Dadhania
2024-09-25 12:55                 ` Sean Christopherson [this message]
2024-09-30  6:27                   ` Nikunj A. Dadhania
2024-09-30 21:20                     ` Thomas Gleixner
2024-10-01  4:26                       ` Nikunj A. Dadhania
2024-10-01 14:36                         ` Nikunj A. Dadhania
2024-07-31 15:08 ` [PATCH v11 20/20] x86/cpu/amd: Do not print FW_BUG for Secure TSC Nikunj A Dadhania
2024-09-13 17:21   ` Tom Lendacky
2024-09-13 17:42   ` Jim Mattson
2024-09-16 11:40     ` Nikunj A. Dadhania
2024-09-16 20:21       ` Jim Mattson
2024-08-14  4:14 ` [PATCH v11 00/20] Add Secure TSC support for SNP guests Nikunj A. Dadhania

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZvQHpbNauYTBgU6M@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=gautham.shenoy@amd.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nikunj@amd.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pgonda@google.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox