Linux-HyperV List

* Re: [PATCH v5 05/51] x86/sev: Mark TSC as reliable when configuring Secure TSC
From: Sean Christopherson @ 2026-07-01 20:07 UTC
  To: sashiko-reviews; +Cc: linux-hyperv, kvm
In-Reply-To: <20260701195652.5DEF11F000E9@smtp.kernel.org>

On Wed, Jul 01, 2026, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> commit cdbd80de837f6d512a759452a18164ddd83b9b7b
> Author: Sean Christopherson <seanjc@google.com>
> 
> x86/sev: Mark TSC as reliable when configuring Secure TSC
> 
> This commit moves the setting of the X86_FEATURE_TSC_RELIABLE flag from
> sme_early_init() to snp_secure_tsc_init(). It consolidates the handling of
> known-frequency and reliable TSC flags when overriding the TSC calibration
> routine.
> 
> [Severity: High]
> This isn't a bug introduced by this patch, but does the early return in
> arch/x86/kernel/tsc.c:tsc_early_init() allow an untrusted hypervisor to
> bypass Secure TSC initialization?
> 
> The tsc_early_init() function checks is_early_uv_system() before initializing
> Secure TSC. The is_early_uv_system() check relies on the presence of
> UV_SYSTEM_TABLE_GUID in the EFI configuration tables.
> 
> Because the hypervisor can provide tables to the guest firmware which may
> expose them as EFI configuration tables, an untrusted hypervisor could inject
> this specific GUID. Would this trick the kernel into returning early,
> bypassing both Secure TSC and TDX TSC initialization, and causing the guest
> to fall back to untrusted legacy timers?

I suppose so?  But that's a problem best handled in a dedicated, separate patch.

> 
> [Severity: High]
> This is also a pre-existing issue, but should the newly updated
> arch/x86/coco/sev/core.c:snp_secure_tsc_init() validate that the calculated
> frequency is non-zero before returning?
> 
> If the calculated frequency evaluates to 0, which could happen if the
> hypervisor causes MSR_AMD64_GUEST_TSC_FREQ to read as 0 or if the secrets
> factor is invalid, the function simply returns 0 instead of terminating
> the VM.

The hypervisor can't do that?  If it can, that too should be addressed separately.

> The caller tsc_early_init() treats a return value of 0 as a failure to find
> a trusted frequency and gracefully falls back to x86_init.hyper.get_tsc_khz()
> or legacy timers. In a confidential computing threat model, should this
> security-critical initialization fail closed and terminate the VM, rather
> than falling back to the hypervisor-controlled time sources that Secure TSC
> is designed to protect against?
> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260701193212.749551-1-seanjc@google.com?part=5

* Re: [PATCH v5 01/51] x86/apic: Provide helpers to set local APIC timer period in hz and khz
From: Sean Christopherson @ 2026-07-01 20:05 UTC
  To: sashiko-reviews; +Cc: kvm, linux-hyperv
In-Reply-To: <20260701194621.4BD691F000E9@smtp.kernel.org>

On Wed, Jul 01, 2026, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index aa1e19979aa8e..8d3d930576fd7 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> 
> [ ... ]
> 
> > @@ -796,6 +796,16 @@ bool __init apic_needs_pit(void)
> >  	return lapic_timer_period == 0;
> >  }
> >  
> > +void apic_set_timer_period_khz(u64 period_khz, const char *source)
> > +{
> > +	lapic_timer_period = mul_u64_u32_div(period_khz, 1000, HZ);
> > +}
> > +
> > +void apic_set_timer_period_hz(u64 period_hz, const char *source)
> > +{
> > +	lapic_timer_period = div_u64(period_hz, HZ);
> > +}
> 
> [Severity: Medium]
> It looks like the source parameter is accepted by apic_set_timer_period_khz()
> and apic_set_timer_period_hz() in arch/x86/kernel/apic/apic.c but is never
> used. Was it intended to log the frequency using this source string?

/facepalm, yes.
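
For illustration only, here is a minimal userspace sketch of what consuming the source string could look like. It is not the fix that was (or will be) posted: MODEL_HZ, the model_* names, and the message text are all assumptions, printf() stands in for pr_info(), and plain 64-bit multiplication replaces mul_u64_u32_div(), so very large inputs would overflow here where the kernel helper would not.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: stand-in for the kernel's CONFIG_HZ; 250 is a common value. */
#define MODEL_HZ 250u

/* Userspace model of the file-local lapic_timer_period variable. */
static uint64_t model_lapic_timer_period;

/* Hypothetical helper: record the period and report where it came from. */
static void model_set_timer_period(uint64_t period, const char *source)
{
	assert(source != NULL);
	model_lapic_timer_period = period;
	/* The kernel would use pr_info(); printf() stands in for it here. */
	printf("APIC: timer period %" PRIu64 " set from %s\n", period, source);
}

/* Model of apic_set_timer_period_khz(): kHz -> Hz, then ticks per jiffy. */
uint64_t model_apic_set_timer_period_khz(uint64_t period_khz, const char *source)
{
	model_set_timer_period(period_khz * 1000u / MODEL_HZ, source);
	return model_lapic_timer_period;
}

/* Model of apic_set_timer_period_hz(): Hz -> ticks per jiffy. */
uint64_t model_apic_set_timer_period_hz(uint64_t period_hz, const char *source)
{
	model_set_timer_period(period_hz / MODEL_HZ, source);
	return model_lapic_timer_period;
}
```

With MODEL_HZ at 250, a 1 GHz bus frequency passed as either 1000000 kHz or 1000000000 Hz yields the same period of 4000000, and both paths name their source in the log line.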

* Re: [PATCH v5 22/51] x86/kvm: Mark TSC as reliable when it's constant and nonstop
From: sashiko-bot @ 2026-07-01 20:03 UTC
  To: Sean Christopherson; +Cc: linux-hyperv, kvm
In-Reply-To: <20260701193212.749551-23-seanjc@google.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Unconditionally forcing X86_FEATURE_TSC_RELIABLE bypasses SMP cross-CPU TSC synchronization checks, which will cause time to go backwards for guests running on hosts with unsynchronized TSCs.
--

commit fb8fd2f8d18aabaf24f424391748d58e9427bdf7
Author: Sean Christopherson <seanjc@google.com>

x86/kvm: Mark TSC as reliable when it's constant and nonstop

This commit marks the TSC as reliable if KVM enumerates it as constant
and nonstop. This aims to prevent the TSC from occasionally being marked
unstable when the host is under moderate or heavy load.

> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 909d3e5e5bcd5..1cef54e1e7d9e 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
[ ... ]
> @@ -1040,7 +1041,16 @@ static void __init kvm_init_platform(void)
>  		}
>  	}
>  
> -	kvmclock_init();
> +        /*
> +         * If the TSC counts at a constant frequency across P/T states and in
> +         * deep C-states, treat the TSC reliable, as guaranteed by KVM.
> +         */
> +	tsc_is_reliable = boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> +			  boot_cpu_has(X86_FEATURE_NONSTOP_TSC);
> +	if (tsc_is_reliable)
> +		setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);

[Severity: High]
Does setting X86_FEATURE_TSC_RELIABLE here bypass SMP cross-CPU TSC
synchronization checks?

Forcing X86_FEATURE_TSC_RELIABLE based only on CONSTANT_TSC and NONSTOP_TSC,
without verifying if the hypervisor actually guarantees cross-CPU
synchronization (e.g., via KVM_FEATURE_CLOCKSOURCE_STABLE_BIT), sets
tsc_clocksource_reliable = 1.

This bypasses the synchronization check in check_tsc_sync_source():

arch/x86/kernel/tsc_sync.c:check_tsc_sync_source() {
    ...
    if (tsc_store_and_check_tsc_adjust(false) || tsc_clocksource_reliable)
        return;
    ...
}

Could this cause time to go backwards for guests running on hosts with
unsynchronized TSCs when threads migrate between vCPUs?

> +
> +	kvmclock_init(tsc_is_reliable);
>  	x86_platform.apic_post_init = kvm_apic_init;
>  
>  	/*

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701193212.749551-1-seanjc@google.com?part=22

* Re: [PATCH v5 29/51] x86/kvm: Don't disable kvmclock on BSP in syscore_suspend()
From: sashiko-bot @ 2026-07-01 20:03 UTC
  To: Sean Christopherson; +Cc: kvm, linux-hyperv
In-Reply-To: <20260701193212.749551-30-seanjc@google.com>

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] The patch inadvertently leaves the BSP's kvmclock enabled when powering off the machine for ACPI S4 hibernation, leading to potential memory corruption when the VM awakens.

Pre-existing issues:
- [Critical] Stack-allocated waitqueue head use-after-free in async page fault handling.
--

Note: The standard inline diff format has been altered to a free-form
summary due to patch recitation restrictions.

commit 03858970f2d7a88b91bdd5b16c1573166126d3c4
Author: Sean Christopherson <seanjc@google.com>
Subject: x86/kvm: Don't disable kvmclock on BSP in syscore_suspend()

This patch modifies the guest suspend logic so that kvmclock is not disabled
on the BSP during syscore_suspend(), relying instead on the sched_clock
restore callback.

[Severity: High]
Does this change inadvertently leave the BSP's kvmclock enabled when
powering off the machine for ACPI S4 hibernation?

Looking at the final S4 power-off path in hibernation_platform_enter() in
kernel/power/hibernate.c, it calls syscore_suspend() but does not call
save_processor_state(). Because this patch removes the unconditional BSP
kvmclock_disable() from syscore_suspend() and relies on
kvm_save_sched_clock_state() (which only fires during image creation),
kvmclock appears to remain fully enabled during the S4 power-off.

Could this lead to memory corruption when the VM awakens, since the
hypervisor might continue writing to the old kvmclock address?

[Severity: Critical]
This isn't a bug introduced by this patch, but is there a use-after-free
risk with the stack-allocated waitqueue head in the async page fault
handling in arch/x86/kernel/kvm.c?

When a guest task waits for an async page fault in
kvm_async_pf_task_wait_schedule(), it waits on a stack-allocated struct
kvm_task_sleep_node. Concurrently, when a waker CPU executes
apf_task_wake_one(), it unhashes the node via hlist_del_init().

The waiter checks hlist_unhashed() locklessly. If it observes the unhashed
state before the waker is finished, the waiter can break its loop, return,
and destroy the stack frame. The waker might then dereference the
deallocated waitqueue head to call swake_up_one(), potentially corrupting
stack memory.

Can this sequence occur during concurrent page fault completions?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701193212.749551-1-seanjc@google.com?part=29

* Re: [PATCH v5 05/51] x86/sev: Mark TSC as reliable when configuring Secure TSC
From: sashiko-bot @ 2026-07-01 19:56 UTC
  To: Sean Christopherson; +Cc: linux-hyperv, kvm
In-Reply-To: <20260701193212.749551-6-seanjc@google.com>

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] This problem wasn't introduced by this patch, but a malicious hypervisor can bypass Secure TSC calibration by injecting a fake UV system table, forcing the guest to fall back to untrusted legacy timers.
- [High] This problem wasn't introduced by this patch, but if the calculated Secure TSC frequency is 0, the kernel silently falls back to untrusted timers instead of terminating the VM.
--

Note: Due to system recitation restrictions, this report is provided as a free-form summary without direct diff quoting.

commit cdbd80de837f6d512a759452a18164ddd83b9b7b
Author: Sean Christopherson <seanjc@google.com>

x86/sev: Mark TSC as reliable when configuring Secure TSC

This commit moves the setting of the X86_FEATURE_TSC_RELIABLE flag from
sme_early_init() to snp_secure_tsc_init(). It consolidates the handling of
known-frequency and reliable TSC flags when overriding the TSC calibration
routine.

[Severity: High]
This isn't a bug introduced by this patch, but does the early return in
arch/x86/kernel/tsc.c:tsc_early_init() allow an untrusted hypervisor to
bypass Secure TSC initialization?

The tsc_early_init() function checks is_early_uv_system() before initializing
Secure TSC. The is_early_uv_system() check relies on the presence of
UV_SYSTEM_TABLE_GUID in the EFI configuration tables.

Because the hypervisor can provide tables to the guest firmware which may
expose them as EFI configuration tables, an untrusted hypervisor could inject
this specific GUID. Would this trick the kernel into returning early,
bypassing both Secure TSC and TDX TSC initialization, and causing the guest
to fall back to untrusted legacy timers?

[Severity: High]
This is also a pre-existing issue, but should the newly updated
arch/x86/coco/sev/core.c:snp_secure_tsc_init() validate that the calculated
frequency is non-zero before returning?

If the calculated frequency evaluates to 0, which could happen if the
hypervisor causes MSR_AMD64_GUEST_TSC_FREQ to read as 0 or if the secrets
factor is invalid, the function simply returns 0 instead of terminating
the VM.

The caller tsc_early_init() treats a return value of 0 as a failure to find
a trusted frequency and gracefully falls back to x86_init.hyper.get_tsc_khz()
or legacy timers. In a confidential computing threat model, should this
security-critical initialization fail closed and terminate the VM, rather
than falling back to the hypervisor-controlled time sources that Secure TSC
is designed to protect against?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701193212.749551-1-seanjc@google.com?part=5

* Re: [PATCH v5 24/51] x86/kvm: Get CPU base frequency from CPUID when it's available
From: sashiko-bot @ 2026-07-01 19:54 UTC
  To: Sean Christopherson; +Cc: kvm, linux-hyperv
In-Reply-To: <20260701193212.749551-25-seanjc@google.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The base CPU frequency obtained from CPUID.0x16 is unconditionally clamped back to the TSC frequency if it differs by >10%, defeating the purpose of the patch.
--

commit 1331572140e49219bdcc6ceabc0370555476adde
Author: Sean Christopherson <seanjc@google.com>

x86/kvm: Get CPU base frequency from CPUID when it's available

This commit reads CPUID.0x16 to fetch the CPU base frequency if valid.
It then sets x86_init.hyper.get_cpu_khz to kvm_get_cpu_khz to avoid
assuming the CPU base frequency matches the TSC frequency.

> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 1cef54e1e7d9e..6c7011ff7bd13 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c

[ ... ]

> @@ -1041,6 +1047,14 @@ static void __init kvm_init_platform(void)
>  		}
>  	}
>  
> +	/*
> +	 * Prefer CPUID.0x16 over KVM's PV CPUID when possible, as the base CPU
> +	 * frequency isn't necessarily the same as the TSC frequency.
> +	 */
> +	kvm_cpu_khz_cpuid = __cpu_khz_from_cpuid();
> +	if (kvm_cpu_khz_cpuid)
> +		x86_init.hyper.get_cpu_khz = kvm_get_cpu_khz;

[Severity: Medium]
Will this new CPU frequency actually be used if it diverges significantly
from the TSC frequency? 

Looking at determine_cpu_tsc_frequencies() in arch/x86/kernel/tsc.c, there
is existing logic that unconditionally clamps the CPU frequency back to the
TSC frequency if they differ by more than 10%:

	else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
		cpu_khz = tsc_khz;

Does this clamping defeat the purpose of this patch for guests running on
hosts where the base CPU frequency and TSC frequency differ by >10%?
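
As a worked example of the quoted check, the userspace sketch below models only the 10% comparison, not the rest of determine_cpu_tsc_frequencies(). The function name and the sample frequencies are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of the quoted sanity check: returns the cpu_khz value
 * that survives it. The kernel's surrounding if/else chain is not modeled.
 */
uint32_t model_clamp_cpu_khz(uint32_t cpu_khz, uint32_t tsc_khz)
{
	assert(tsc_khz != 0);

	/* Absolute difference, widened so the multiply by 10 cannot overflow. */
	uint64_t delta = cpu_khz > tsc_khz ? cpu_khz - tsc_khz
					   : tsc_khz - cpu_khz;

	/* More than 10% away from the TSC frequency: cpu_khz is overwritten. */
	if (delta * 10 > tsc_khz)
		return tsc_khz;
	return cpu_khz;
}
```

For a 2.0 GHz base frequency against a 2.5 GHz TSC, the difference is 500000 kHz, and 500000 * 10 = 5000000 exceeds 2500000, so the CPUID-derived value is replaced by tsc_khz. A 2.4 GHz base frequency (difference 100000 kHz) passes through unchanged.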

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701193212.749551-1-seanjc@google.com?part=24

* Re: [PATCH v5 06/51] x86/sev: Don't override CPU frequency calibration for SNP's Secure TSC
From: sashiko-bot @ 2026-07-01 19:53 UTC
  To: Sean Christopherson; +Cc: kvm, linux-hyperv
In-Reply-To: <20260701193212.749551-7-seanjc@google.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The patch removes the `calibrate_cpu` override to allow the CPU base frequency to differ from the TSC frequency, but fails to account for `determine_cpu_tsc_frequencies()` which will forcefully clobber `cpu_khz` if it diverges by more than 10%.
--

commit 6d3b9e81661ead5f25434666266aec6514dadaa8
Author: Sean Christopherson <seanjc@google.com>

x86/sev: Don't override CPU frequency calibration for SNP's Secure TSC

This commit removes the override of the kernel's CPU frequency calibration
routine when registering SNP's Secure TSC. It aims to decouple the CPU base
frequency from the TSC frequency, as exposing APERF/MPERF requires an
accurate CPU frequency.

> diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
> --- a/arch/x86/coco/sev/core.c
> +++ b/arch/x86/coco/sev/core.c
> @@ -2046,7 +2046,6 @@ void __init snp_secure_tsc_init(void)
>  
>  	snp_tsc_freq_khz = SNP_SCALE_TSC_FREQ(tsc_freq_mhz * 1000, secrets->tsc_factor);
>  
> -	x86_platform.calibrate_cpu = securetsc_get_tsc_khz;

[Severity: Medium]
Will this removal be sufficient to decouple cpu_khz from tsc_khz if they
diverge significantly?

Looking at determine_cpu_tsc_frequencies() in arch/x86/kernel/tsc.c, there
is a generic sanity check that enforces a 10% limit on the difference
between the two frequencies:

arch/x86/kernel/tsc.c:determine_cpu_tsc_frequencies() {
    ...
    else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
        cpu_khz = tsc_khz;
    ...
}

If the true host CPU frequency diverges from the guest TSC frequency by more
than 10%, won't this logic forcefully overwrite the correct cpu_khz back to
tsc_khz, unintentionally undoing the fix for the APERF/MPERF calculation?

>  	x86_platform.calibrate_tsc = securetsc_get_tsc_khz;
>  
>  	early_memunmap(mem, PAGE_SIZE);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701193212.749551-1-seanjc@google.com?part=6

* Re: [PATCH v5 12/51] x86/acrn: Register TSC/CPU frequency callbacks iff frequency is actually in CPUID
From: sashiko-bot @ 2026-07-01 19:52 UTC
  To: Sean Christopherson; +Cc: linux-hyperv, kvm
In-Reply-To: <20260701193212.749551-13-seanjc@google.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] ACRN hypervisor integration statically hardcodes its CPUID leaf to 0x40000010 without applying the dynamic base offset, breaking nested setups and potentially causing severe timekeeping corruption.
--

commit 8beac699ed1da5364ebdf7f1cc4475bed5521d60
Author: Sean Christopherson <seanjc@google.com>

x86/acrn: Register TSC/CPU frequency callbacks iff frequency is actually in CPUID

This commit modifies ACRN initialization to only register the TSC and CPU
frequency override callbacks if the hypervisor actually provides a non-zero
frequency in its CPUID leaf. It achieves this by reading the frequency
during platform initialization and caching the result.

> diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
> index ad8f2da8003b3..dc71a6fdd4613 100644
> --- a/arch/x86/kernel/cpu/acrn.c
> +++ b/arch/x86/kernel/cpu/acrn.c
[ ... ]
> @@ -26,13 +28,19 @@ static u32 __init acrn_detect(void)
>  
>  static unsigned int __init acrn_get_tsc_khz(void)
>  {
> -	return cpuid_eax(ACRN_CPUID_TIMING_INFO);
> +	return acrn_tsc_khz_cpuid;
>  }
>  
>  static void __init acrn_init_platform(void)
>  {
>  	/* Install system interrupt handler for ACRN hypervisor callback */
>  	sysvec_install(HYPERVISOR_CALLBACK_VECTOR, sysvec_acrn_hv_callback);
> +
> +	acrn_tsc_khz_cpuid = cpuid_eax(ACRN_CPUID_TIMING_INFO);

[Severity: High]
This is a pre-existing issue, but does hardcoding ACRN_CPUID_TIMING_INFO to
0x40000010 without dynamically applying the base offset break nested setups?

If Linux is booted as an L2 guest with an L0 hypervisor like KVM occupying
the base 0x40000000 leaf, and L1 ACRN resides at 0x40000100, acrn_detect()
correctly finds ACRN using cpuid_base_hypervisor() at the 0x40000100 base.

However, acrn_init_platform() hardcodes the fetch:

arch/x86/include/asm/acrn.h:
#define ACRN_CPUID_TIMING_INFO        0x40000010

Will this bypass the dynamic base detected earlier and read the timing info
from the wrong hypervisor's CPUID space (e.g., KVM's L0 frequency)?

If nested under a hypervisor that returns garbage for 0x40000010, could
acrn_tsc_khz_cpuid be populated with a non-zero bogus value and corrupt
guest timekeeping?
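
The leaf arithmetic behind this question can be sketched in userspace as follows. This only illustrates the offset calculation; the helper name is hypothetical, and whether ACRN should apply the detected base this way is exactly what the review is asking.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the timing-info leaf within a hypervisor's CPUID range. */
#define TIMING_INFO_OFFSET 0x10u

/* Hypothetical helper: derive the timing leaf from the detected base. */
uint32_t model_timing_info_leaf(uint32_t hypervisor_base)
{
	/* Hypervisor CPUID bases sit at 0x100-aligned leaves from 0x40000000. */
	assert(hypervisor_base >= 0x40000000u && (hypervisor_base & 0xffu) == 0);
	return hypervisor_base + TIMING_INFO_OFFSET;
}
```

With the base at 0x40000000, the result matches the hardcoded 0x40000010. With ACRN detected at 0x40000100, as in the nested scenario described above, the result is 0x40000110, which is not the leaf the hardcoded constant reads.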

> +	if (acrn_tsc_khz_cpuid) {
> +		x86_init.hyper.get_tsc_khz = acrn_get_tsc_khz;
> +		x86_init.hyper.get_cpu_khz = acrn_get_tsc_khz;
> +	}
>  }
>  
>  static bool acrn_x2apic_available(void)

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701193212.749551-1-seanjc@google.com?part=12

* Re: [PATCH v5 17/51] x86/tsc: Fold native_calibrate_cpu() into recalibrate_cpu_khz()
From: sashiko-bot @ 2026-07-01 19:50 UTC
  To: Sean Christopherson; +Cc: linux-hyperv, kvm
In-Reply-To: <20260701193212.749551-18-seanjc@google.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The refactoring in `recalibrate_cpu_khz()` introduces a race condition where the global variable `cpu_khz` is temporarily set to 0 for up to 150ms during runtime CPU frequency recalibration. This exposes the system to invalid state reads.
--

commit 34cf1b98d726e6c329bdefbd5c9712901d8b4bc8
Author: Sean Christopherson <seanjc@google.com>

x86/tsc: Fold native_calibrate_cpu() into recalibrate_cpu_khz()

This commit folds the implementation of native_calibrate_cpu() directly into
recalibrate_cpu_khz() to remove an extra CONFIG_SMP=n ifdef block. This change
aims to clarify the intentional use of early versus late calibration routines.

> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 9764ac758081f..6ed6f8f012ebf 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
[ ... ]
> @@ -944,7 +929,9 @@ void recalibrate_cpu_khz(void)
>  	if (WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_TSC_KNOWN_FREQ)))
>  		return;
>  
> -	cpu_khz = native_calibrate_cpu();
> +	cpu_khz = native_calibrate_cpu_early();
> +	if (!cpu_khz)
> +		cpu_khz = native_calibrate_cpu_late();

[Severity: Medium]
Does this expose a race condition by temporarily assigning 0 to the global
cpu_khz variable?

On a UP kernel during cpufreq transitions, if early calibration fails, the
global cpu_khz is directly assigned 0. It remains 0 while
native_calibrate_cpu_late() executes, which can take ~150ms and enables
interrupts.

Could this lead to sysfs readers via arch_freq_get_on_cpu() observing a 0
frequency, or the NMI watchdog being misconfigured with a 0 sample period if
toggled during this window? 

The previous implementation used a local variable inside native_calibrate_cpu()
to hold the intermediate state, preventing concurrent readers from seeing an
invalid 0 value before the slow fallback calibration completes.
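
The local-variable pattern described above can be sketched in userspace like this. Everything here is a model: the model_* names, the stub calibrators, and the starting frequency are assumptions, and the stubs exist only so the sketch can show what a reader would observe while the slow path runs.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int (*calibrate_fn)(void);

/* Userspace stand-in for the global that concurrent readers may sample. */
unsigned int model_cpu_khz = 3000000;

/* Records what a reader would have seen while the slow path was running. */
unsigned int model_seen_during_late;

/* Hypothetical stub: early calibration fails and reports 0. */
unsigned int model_early_fails(void)
{
	return 0;
}

/* Hypothetical stub: slow calibration samples the global, then succeeds. */
unsigned int model_late_ok(void)
{
	model_seen_during_late = model_cpu_khz;
	return 2400000;
}

/* Keep the intermediate result local; publish it with a single store. */
void model_recalibrate(calibrate_fn early, calibrate_fn late)
{
	assert(early != NULL && late != NULL);

	unsigned int khz = early();

	if (!khz)
		khz = late();
	model_cpu_khz = khz;
}
```

Calling model_recalibrate(model_early_fails, model_late_ok) leaves model_seen_during_late at the old value of 3000000 rather than 0, and model_cpu_khz ends at 2400000.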

>  	tsc_khz = native_calibrate_tsc();
>  	if (tsc_khz == 0)
>  		tsc_khz = cpu_khz;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701193212.749551-1-seanjc@google.com?part=17

* Re: [PATCH v5 02/51] x86/apic: Add CONFIG_X86_LOCAL_APIC=n stubs for apic_set_timer_period_{,k}hz()
From: sashiko-bot @ 2026-07-01 19:48 UTC
  To: Sean Christopherson; +Cc: linux-hyperv, kvm
In-Reply-To: <20260701193212.749551-3-seanjc@google.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The `source` argument in `apic_set_timer_period_hz()` and `apic_set_timer_period_khz()` is completely unused, leading to dead code and lost debug logging.
--

commit a7b2ed4060ee2d44136d35fad056804351b581ca
Author: Sean Christopherson <seanjc@google.com>

x86/apic: Add CONFIG_X86_LOCAL_APIC=n stubs for apic_set_timer_period_{,k}hz()

This commit adds stubs for apic_set_timer_period_hz() and
apic_set_timer_period_khz() when the kernel is built without local APIC
support. It also removes the corresponding #ifdef guards from the calling
code in various x86 subsystems.

> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index cd84a94688a2f..035998555e992 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -189,6 +189,8 @@ static inline void disable_local_APIC(void) { }
>  # define setup_boot_APIC_clock x86_init_noop
>  # define setup_secondary_APIC_clock x86_init_noop
>  static inline void lapic_update_tsc_freq(void) { }
> +static inline void apic_set_timer_period_hz(u64 period_hz, const char *source) { }
> +static inline void apic_set_timer_period_khz(u64 period_khz, const char *source) { }

[Severity: Medium]
Is it intentional that the source argument is completely unused here in the
stubs?

Looking at the main implementation in arch/x86/kernel/apic/apic.c:

void apic_set_timer_period_khz(u64 period_khz, const char *source)
{
        lapic_timer_period = mul_u64_u32_div(period_khz, 1000, HZ);
}

The source parameter is discarded without being used for debug logging. Since
any code calling apic_set_timer_period_hz() or apic_set_timer_period_khz()
passes a string literal describing the source (such as "VMware hypervisor"
or "CPUID 0x15/0x16"), this results in dropped debug information.

Could the source parameter be removed from the API entirely to clean up the
dead code, or should the boot-time logging be restored?

>  static inline void init_bsp_APIC(void) { }
>  static inline void apic_intr_mode_select(void) { }
>  static inline void apic_intr_mode_init(void) { }

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701193212.749551-1-seanjc@google.com?part=2

* Re: [PATCH v5 01/51] x86/apic: Provide helpers to set local APIC timer period in hz and khz
From: sashiko-bot @ 2026-07-01 19:46 UTC
  To: Sean Christopherson; +Cc: kvm, linux-hyperv
In-Reply-To: <20260701193212.749551-2-seanjc@google.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The `apic_set_timer_period_khz()` and `apic_set_timer_period_hz()` helper functions accept a `const char *source` argument which is never used, resulting in dead code and a silent loss of boot diagnostic logging on platforms where logging was removed.
--

commit 4fb576b0d909663c9b3e6628dbed45d329c08a5d
Author: Sean Christopherson <seanjc@google.com>

x86/apic: Provide helpers to set local APIC timer period in hz and khz

This commit adds APIs to set the local APIC timer period, abstracting the
HZ math previously open-coded by external callers. It also changes the
lapic_timer_period variable to be local to apic.c.

> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index aa1e19979aa8e..8d3d930576fd7 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c

[ ... ]

> @@ -796,6 +796,16 @@ bool __init apic_needs_pit(void)
>  	return lapic_timer_period == 0;
>  }
>  
> +void apic_set_timer_period_khz(u64 period_khz, const char *source)
> +{
> +	lapic_timer_period = mul_u64_u32_div(period_khz, 1000, HZ);
> +}
> +
> +void apic_set_timer_period_hz(u64 period_hz, const char *source)
> +{
> +	lapic_timer_period = div_u64(period_hz, HZ);
> +}

[Severity: Medium]
It looks like the source parameter is accepted by apic_set_timer_period_khz()
and apic_set_timer_period_hz() in arch/x86/kernel/apic/apic.c but is never
used. Was it intended to log the frequency using this source string?

> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 185d4f677ec0f..87beecec76f04 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -646,10 +646,7 @@ static void __init ms_hyperv_init_platform(void)
>  		u64	hv_lapic_frequency;
>  
>  		rdmsrq(HV_X64_MSR_APIC_FREQUENCY, hv_lapic_frequency);
> -		hv_lapic_frequency = div_u64(hv_lapic_frequency, HZ);
> -		lapic_timer_period = hv_lapic_frequency;
> -		pr_info("Hyper-V: LAPIC Timer Frequency: %#x\n",
> -			lapic_timer_period);
> +		apic_set_timer_period_hz(hv_lapic_frequency, "Hyper-V hypervisor");
>  	}
>  

[ ... ]

> diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
> index 34b73573b108d..36f779dd311dc 100644
> --- a/arch/x86/kernel/cpu/vmware.c
> +++ b/arch/x86/kernel/cpu/vmware.c
> @@ -424,9 +424,7 @@ static void __init vmware_platform_setup(void)
>  
>  #ifdef CONFIG_X86_LOCAL_APIC
>  		/* Skip lapic calibration since we know the bus frequency. */
> -		lapic_timer_period = ecx / HZ;
> -		pr_info("Host bus clock speed read from hypervisor : %u Hz\n",
> -			ecx);
> +		apic_set_timer_period_hz(ecx, "VMware hypervisor");
>  #endif

[Severity: Medium]
These changes remove the pr_info() logging of the APIC frequency during early
boot APIC initialization in ms_hyperv_init_platform()
(arch/x86/kernel/cpu/mshyperv.c) and vmware_platform_setup()
(arch/x86/kernel/cpu/vmware.c).

Since the new apic_set_timer_period_hz() helper does not log this information,
does this result in an unintended loss of boot diagnostic logging on these
platforms?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701193212.749551-1-seanjc@google.com?part=1

* [PATCH v5 51/51] x86/kvm: Get local APIC bus frequency from PV CPUID Timing Info
From: Sean Christopherson @ 2026-07-01 19:32 UTC
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

When running as a KVM guest with PV timing info provided by the host,
stuff the APIC timer period/frequency with the local APIC bus frequency
reported in CPUID.0x40000010.EBX instead of trying to calibrate/guess the
frequency.

See Documentation/virt/kvm/x86/cpuid.rst for details.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/kvm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index f9a6346077b0..beea0b6aa78e 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -990,7 +990,7 @@ static void __init kvm_init_platform(void)
 		.mask_lo = (u32)(~(SZ_4G - tolud - 1)) | MTRR_PHYSMASK_V,
 		.mask_hi = (BIT_ULL(boot_cpu_data.x86_phys_bits) - 1) >> 32,
 	};
-	u32 timing_info_leaf;
+	u32 timing_info_leaf, apic_khz;
 	bool tsc_is_reliable;
 
 	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
@@ -1052,6 +1052,11 @@ static void __init kvm_init_platform(void)
 			x86_init.hyper.get_tsc_khz = kvm_get_tsc_khz;
 			x86_init.hyper.get_cpu_khz = kvm_get_tsc_khz;
 		}
+
+		/* The leaf also includes the local APIC bus/timer frequency. */
+		apic_khz = cpuid_ebx(timing_info_leaf);
+		if (apic_khz)
+			apic_set_timer_period_khz(apic_khz, "KVM hypervisor");
 	}
 
 	/*
-- 
2.55.0.rc0.799.gd6f94ed593-goog


* [PATCH v5 50/51] x86/paravirt: Move using_native_sched_clock() stub into timer.h
From: Sean Christopherson @ 2026-07-01 19:32 UTC
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

Now that timer.h ended up with CONFIG_PARAVIRT #ifdeffery anyways, move the
PARAVIRT=n using_native_sched_clock() stub into timer.h as a "free"
optimization.

No functional change intended.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/timer.h | 6 ++++--
 arch/x86/kernel/tsc.c        | 2 --
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
index ca5c95d48c03..a52388af6055 100644
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -11,9 +11,9 @@ extern void recalibrate_cpu_khz(void);
 
 extern int no_timer_check;
 
-extern bool using_native_sched_clock(void);
-
 #ifdef CONFIG_PARAVIRT
+extern bool using_native_sched_clock(void);
+
 int __init __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
 				      void (*save)(void), void (*restore)(void));
 
@@ -23,6 +23,8 @@ static __always_inline void paravirt_set_sched_clock(u64 (*func)(void),
 {
 	(void)__paravirt_set_sched_clock(func, true, save, restore);
 }
+#else
+static inline bool using_native_sched_clock(void) { return true; }
 #endif
 
 /*
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index a146fc7b5e74..564be4faa5a0 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -302,8 +302,6 @@ int __init __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
 }
 #else
 u64 sched_clock_noinstr(void) __attribute__((alias("native_sched_clock")));
-
-bool using_native_sched_clock(void) { return true; }
 #endif
 
 notrace u64 sched_clock(void)
-- 
2.55.0.rc0.799.gd6f94ed593-goog


^ permalink raw reply related

* [PATCH v5 49/51] x86/kvmclock: Plumb in AP-online and BSP-resume to kvmclock, for documentation
From: Sean Christopherson @ 2026-07-01 19:32 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

Invoke kvmclock_cpu_action() with AP_ONLINE and BSP_RESUME, even though
kvmclock doesn't need to do anything in either case, so that the asymmetry
of kvmclock is a detail buried in kvmclock, and to explicitly document
that doing nothing during those phases is intentional and correct.

For all intents and purposes, no functional change intended.
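
As a rough illustration of the dispatch described above, here is a
userspace sketch (not kernel code).  The enum mirrors the one in the
patch; model_kvmclock_cpu_action() and model_disable_calls are
hypothetical stand-ins for kvmclock_cpu_action() and kvmclock_disable(),
and the bool return exists only so the "unknown action" path is
observable without WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the enum from the patch; the values are only used symbolically. */
enum kvm_guest_cpu_action {
	KVM_GUEST_BSP_SUSPEND,
	KVM_GUEST_BSP_RESUME,
	KVM_GUEST_AP_ONLINE,
	KVM_GUEST_AP_OFFLINE,
	KVM_GUEST_SHUTDOWN,
};

/* Hypothetical stand-in for kvmclock_disable(): just counts invocations. */
int model_disable_calls;

/* Returns true for a known action, false where the patch would WARN. */
bool model_kvmclock_cpu_action(enum kvm_guest_cpu_action action)
{
	switch (action) {
	case KVM_GUEST_BSP_SUSPEND:
	case KVM_GUEST_BSP_RESUME:
		/* BSP is handled elsewhere (suspend/resume hooks). */
		return true;
	case KVM_GUEST_AP_ONLINE:
		/* APs enable kvmclock via an earlier, dedicated hook. */
		return true;
	case KVM_GUEST_AP_OFFLINE:
	case KVM_GUEST_SHUTDOWN:
		model_disable_calls++;
		return true;
	default:
		return false;
	}
}
```

The point of the sketch is that the three "do nothing" actions are
explicit cases rather than a single negated comparison, so adding a new
action without deciding how kvmclock should react lands in the default
path.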

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c           | 22 +++++++++++++-------
 arch/x86/kernel/kvmclock.c      | 37 ++++++++++++++++++++++++++-------
 3 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 08686ff19caa..763ed017738a 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -120,6 +120,8 @@ static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
 #ifdef CONFIG_KVM_GUEST
 enum kvm_guest_cpu_action {
 	KVM_GUEST_BSP_SUSPEND,
+	KVM_GUEST_BSP_RESUME,
+	KVM_GUEST_AP_ONLINE,
 	KVM_GUEST_AP_OFFLINE,
 	KVM_GUEST_SHUTDOWN,
 };
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 604b52f233aa..f9a6346077b0 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -474,18 +474,24 @@ static void kvm_guest_cpu_offline(enum kvm_guest_cpu_action action)
 	kvmclock_cpu_action(action);
 }
 
+static void __kvm_cpu_online(unsigned int cpu, enum kvm_guest_cpu_action action)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	kvmclock_cpu_action(action);
+	kvm_guest_cpu_init();
+	local_irq_restore(flags);
+}
+
+#ifdef CONFIG_SMP
+
 static int kvm_cpu_online(unsigned int cpu)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
-	kvm_guest_cpu_init();
-	local_irq_restore(flags);
+	__kvm_cpu_online(cpu, KVM_GUEST_AP_ONLINE);
 	return 0;
 }
 
-#ifdef CONFIG_SMP
-
 static DEFINE_PER_CPU(cpumask_var_t, __pv_cpu_mask);
 
 static bool pv_tlb_flush_supported(void)
@@ -750,7 +756,7 @@ static int kvm_suspend(void *data)
 
 static void kvm_resume(void *data)
 {
-	kvm_cpu_online(raw_smp_processor_id());
+	__kvm_cpu_online(raw_smp_processor_id(), KVM_GUEST_BSP_RESUME);
 
 #ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
 	if (kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL) && has_guest_poll)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index bc98ebb8587d..842f38c5f6ca 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -150,7 +150,7 @@ static void kvm_save_sched_clock_state(void)
 #ifdef CONFIG_SMP
 static void kvm_setup_secondary_clock(void)
 {
-	kvm_register_clock("secondary cpu clock");
+	kvm_register_clock("secondary cpu, startup");
 }
 #endif
 
@@ -174,13 +174,34 @@ static void kvmclock_resume(struct clocksource *cs)
 
 void kvmclock_cpu_action(enum kvm_guest_cpu_action action)
 {
-	/*
-	 * Don't disable kvmclock on the BSP during suspend.  If kvmclock is
-	 * being used for sched_clock, then it needs to be kept alive until the
-	 * last minute, and restored as quickly as possible after resume.
-	 */
-	if (action != KVM_GUEST_BSP_SUSPEND)
+	switch (action) {
+		/*
+		 * The BSP's clock is managed via clocksource suspend/resume,
+		 * to ensure it's enabled/disabled when timekeeping needs it
+		 * to be, e.g. before reading wallclock (which uses kvmclock).
+		 */
+	case KVM_GUEST_BSP_SUSPEND:
+	case KVM_GUEST_BSP_RESUME:
+		break;
+	case KVM_GUEST_AP_ONLINE:
+		/*
+		 * Secondary CPUs use a dedicated hook to enable kvmclock early
+		 * during bringup, there's nothing to be done during CPU online
+		 * (which runs at CPUHP_AP_ONLINE_DYN).  When kvmclock is being
+		 * used as sched_clock, kvmclock must be enabled *very* early,
+		 * and even when kvmclock is "only" being used for the main
+		 * clocksource, it still needs to be enabled long before the
+		 * dynamic CPUHP calls are made.
+		 */
+		break;
+	case KVM_GUEST_AP_OFFLINE:
+	case KVM_GUEST_SHUTDOWN:
 		kvmclock_disable();
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
 }
 
 /*
@@ -382,7 +403,7 @@ void __init kvmclock_init(bool prefer_tsc)
 		msr_kvm_system_time, msr_kvm_wall_clock);
 
 	this_cpu_write(hv_clock_per_cpu, &hv_clock_boot[0]);
-	kvm_register_clock("primary cpu clock");
+	kvm_register_clock("primary cpu, online");
 	pvclock_set_pvti_cpu0_va(hv_clock_boot);
 
 	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT)) {
-- 
2.55.0.rc0.799.gd6f94ed593-goog


^ permalink raw reply related

* [PATCH v5 48/51] x86/kvmclock: Use TSC for sched_clock if it's constant and non-stop
From: Sean Christopherson @ 2026-07-01 19:32 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

Prefer the TSC over kvmclock for sched_clock if the TSC is constant and
nonstop.  I.e. use the same criteria as tweaking the clocksource rating so
that TSC is preferred over kvmclock.  Per the below comment from
native_sched_clock(), sched_clock is more tolerant of slop than
clocksource; using TSC for clocksource but not sched_clock makes little to
no sense, especially now that KVM CoCo guests with a trusted TSC use TSC,
not kvmclock.

        /*
         * Fall back to jiffies if there's no TSC available:
         * ( But note that we still use it if the TSC is marked
         *   unstable. We do this because unlike Time Of Day,
         *   the scheduler clock tolerates small errors and it's
         *   very important for it to be as fast as the platform
         *   can achieve it. )
         */

The only advantage of using kvmclock is that doing so allows for early
and common detection of PVCLOCK_GUEST_STOPPED, but that code has been
broken for over two years with nary a complaint, i.e. it can't be
_that_ valuable.  And as above, certain types of KVM guests are losing
the functionality regardless, i.e. acknowledging PVCLOCK_GUEST_STOPPED
needs to be decoupled from sched_clock() no matter what.
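
A userspace sketch of the resulting selection logic, for illustration
only.  The struct, enum, and function names are hypothetical, the
default rating of 400 is an assumption about kvm_clock's initial value,
and the sketch ignores the case where the sched_clock switch is rejected
for other reasons.

```c
#include <assert.h>
#include <stdbool.h>

enum model_sched_clock { MODEL_SCHED_TSC, MODEL_SCHED_KVMCLOCK };

struct model_choice {
	int kvmclock_rating;
	enum model_sched_clock sched_clock;
};

/*
 * prefer_tsc models "TSC is constant and nonstop".  With this patch the
 * same condition drives both the clocksource rating and sched_clock.
 */
struct model_choice model_kvmclock_init(bool prefer_tsc)
{
	struct model_choice c = {
		.kvmclock_rating = 400,		/* assumed default rating */
		.sched_clock = MODEL_SCHED_TSC,	/* native sched_clock baseline */
	};

	if (prefer_tsc)
		c.kvmclock_rating = 299;	/* rank kvmclock below TSC */
	else
		c.sched_clock = MODEL_SCHED_KVMCLOCK;

	return c;
}
```

In this model there is no combination where TSC wins as clocksource
while kvmclock is still used for sched_clock, which is the inconsistency
the patch removes.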

Link: https://lore.kernel.org/all/Z4hDK27OV7wK572A@google.com
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/kvmclock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 22e8855fcd4d..bc98ebb8587d 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -396,7 +396,6 @@ void __init kvmclock_init(bool prefer_tsc)
 			 PVCLOCK_TSC_STABLE_BIT;
 	}
 
-	kvm_sched_clock_init(stable);
 
 	if (!x86_init.hyper.get_tsc_khz)
 		x86_init.hyper.get_tsc_khz = kvmclock_get_tsc_khz;
@@ -416,6 +415,8 @@ void __init kvmclock_init(bool prefer_tsc)
 	 */
 	if (prefer_tsc)
 		kvm_clock.rating = 299;
+	else
+		kvm_sched_clock_init(stable);
 
 	clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
 	pv_info.name = "KVM";
-- 
2.55.0.rc0.799.gd6f94ed593-goog


^ permalink raw reply related

* [PATCH v5 47/51] x86/paravirt: Don't use a PV sched_clock in CoCo guests with trusted TSC
From: Sean Christopherson @ 2026-07-01 19:32 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

Silently ignore attempts to switch to a paravirt sched_clock when running
as a CoCo guest with trusted TSC.  In hand-wavy theory, a misbehaving
hypervisor could attack the guest by manipulating the PV clock to affect
guest scheduling in some weird and/or predictable way.  More importantly,
reading TSC on such platforms is faster than any PV clock, and sched_clock
is all about speed.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/tsc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 012321fed5e5..a146fc7b5e74 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -283,6 +283,15 @@ bool using_native_sched_clock(void)
 int __init __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
 				      void (*save)(void), void (*restore)(void))
 {
+	/*
+	 * Don't replace TSC with a PV clock when running as a CoCo guest and
+	 * the TSC is secure/trusted; PV clocks are emulated by the hypervisor,
+	 * which isn't in the guest's TCB.
+	 */
+	if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC) ||
+	    boot_cpu_has(X86_FEATURE_TDX_GUEST))
+		return -EPERM;
+
 	if (!stable)
 		clear_sched_clock_stable();
 
-- 
2.55.0.rc0.799.gd6f94ed593-goog


^ permalink raw reply related

* [PATCH v5 46/51] x86/paravirt: Plumb a return code into __paravirt_set_sched_clock()
From: Sean Christopherson @ 2026-07-01 19:32 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

Add a return code to __paravirt_set_sched_clock() so that the kernel can
reject attempts to use a PV sched_clock without breaking the caller.  E.g.
when running as a CoCo VM with a secure TSC, using a PV clock is generally
undesirable.

Note, kvmclock is the only PV clock that does anything "extra" beyond
simply registering itself as sched_clock, i.e. is the only caller that
needs to check the new return value.
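
A minimal userspace sketch of the caller pattern this enables.  All
names are hypothetical stand-ins: model_set_sched_clock() plays the role
of __paravirt_set_sched_clock(), model_trusted_tsc is an invented
rejection condition, and -EPERM is just one plausible error code for a
rejected switch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef unsigned long long (*model_clock_fn)(void);

unsigned long long model_native_clock(void) { return 1; }
unsigned long long model_pv_clock(void) { return 2; }

model_clock_fn model_sched_clock = model_native_clock;
bool model_trusted_tsc;		/* invented rejection condition */
bool model_pv_extra_setup_done;	/* the "extra" work a PV clock may do */

/* Returns 0 on success, a negative errno if the switch is rejected. */
int model_set_sched_clock(model_clock_fn func)
{
	if (model_trusted_tsc)
		return -EPERM;

	model_sched_clock = func;
	return 0;
}

/* A caller with extra setup must skip that setup on rejection. */
void model_pv_sched_clock_init(void)
{
	if (model_set_sched_clock(model_pv_clock))
		return;

	model_pv_extra_setup_done = true;
}
```

Callers that only register a clock and do nothing afterwards can ignore
the return value, which is why the inline wrapper casts it to void.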

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/timer.h | 6 +++---
 arch/x86/kernel/kvmclock.c   | 9 ++++++---
 arch/x86/kernel/tsc.c        | 5 +++--
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
index 96ae7feac47c..ca5c95d48c03 100644
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -14,14 +14,14 @@ extern int no_timer_check;
 extern bool using_native_sched_clock(void);
 
 #ifdef CONFIG_PARAVIRT
-void __init __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
-				       void (*save)(void), void (*restore)(void));
+int __init __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
+				      void (*save)(void), void (*restore)(void));
 
 static __always_inline void paravirt_set_sched_clock(u64 (*func)(void),
 						     void (*save)(void),
 						     void (*restore)(void))
 {
-	__paravirt_set_sched_clock(func, true, save, restore);
+	(void)__paravirt_set_sched_clock(func, true, save, restore);
 }
 #endif
 
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 2cc3dd2ba355..22e8855fcd4d 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -332,10 +332,13 @@ static int kvmclock_setup_percpu(unsigned int cpu)
 
 static __init void kvm_sched_clock_init(bool stable)
 {
+	/* Ensure the offset is configured before making kvmclock visible! */
 	kvm_sched_clock_offset = kvm_clock_read();
-	__paravirt_set_sched_clock(kvm_sched_clock_read, stable,
-				   kvm_save_sched_clock_state,
-				   kvm_restore_sched_clock_state);
+
+	if (__paravirt_set_sched_clock(kvm_sched_clock_read, stable,
+				       kvm_save_sched_clock_state,
+				       kvm_restore_sched_clock_state))
+		return;
 
 	/*
 	 * The BSP's clock is managed via dedicated sched_clock save/restore
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 0f92b29adecc..012321fed5e5 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -280,8 +280,8 @@ bool using_native_sched_clock(void)
 	return static_call_query(pv_sched_clock) == native_sched_clock;
 }
 
-void __init __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
-				       void (*save)(void), void (*restore)(void))
+int __init __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
+				      void (*save)(void), void (*restore)(void))
 {
 	if (!stable)
 		clear_sched_clock_stable();
@@ -289,6 +289,7 @@ void __init __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
 	static_call_update(pv_sched_clock, func);
 	x86_platform.save_sched_clock_state = save;
 	x86_platform.restore_sched_clock_state = restore;
+	return 0;
 }
 #else
 u64 sched_clock_noinstr(void) __attribute__((alias("native_sched_clock")));
-- 
2.55.0.rc0.799.gd6f94ed593-goog


^ permalink raw reply related

* [PATCH v5 45/51] x86/paravirt: Mark __paravirt_set_sched_clock() as __init
From: Sean Christopherson @ 2026-07-01 19:32 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

Annotate __paravirt_set_sched_clock() as __init, and make its wrapper
__always_inline to ensure sanitizers don't result in a non-inline version
hanging around.  All callers run during __init, and changing sched_clock
after boot would be all kinds of crazy.

No functional change intended.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/timer.h | 10 +++++-----
 arch/x86/kernel/tsc.c        |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
index e97cd1ae03d1..96ae7feac47c 100644
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -14,12 +14,12 @@ extern int no_timer_check;
 extern bool using_native_sched_clock(void);
 
 #ifdef CONFIG_PARAVIRT
-void __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
-				void (*save)(void), void (*restore)(void));
+void __init __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
+				       void (*save)(void), void (*restore)(void));
 
-static inline void paravirt_set_sched_clock(u64 (*func)(void),
-					    void (*save)(void),
-					    void (*restore)(void))
+static __always_inline void paravirt_set_sched_clock(u64 (*func)(void),
+						     void (*save)(void),
+						     void (*restore)(void))
 {
 	__paravirt_set_sched_clock(func, true, save, restore);
 }
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 83353d643150..0f92b29adecc 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -280,8 +280,8 @@ bool using_native_sched_clock(void)
 	return static_call_query(pv_sched_clock) == native_sched_clock;
 }
 
-void __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
-				void (*save)(void), void (*restore)(void))
+void __init __paravirt_set_sched_clock(u64 (*func)(void), bool stable,
+				       void (*save)(void), void (*restore)(void))
 {
 	if (!stable)
 		clear_sched_clock_stable();
-- 
2.55.0.rc0.799.gd6f94ed593-goog


^ permalink raw reply related

* [PATCH v5 44/51] x86/kvmclock: WARN if wall clock is read while kvmclock is suspended
From: Sean Christopherson @ 2026-07-01 19:32 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

WARN if kvmclock is still suspended when its wallclock is read, i.e. when
the kernel reads its persistent clock.  The wallclock subtly depends on
the BSP's kvmclock being enabled, and returns garbage if kvmclock is
disabled.
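
The bookkeeping can be sketched in userspace as below.  The names are
hypothetical, and unlike WARN_ON_ONCE() the model counts every
offending read and reports the result as invalid instead of carrying on
with the read.

```c
#include <assert.h>
#include <stdbool.h>

bool model_suspended;	/* models kvmclock_suspended */
int model_warnings;	/* stands in for WARN_ON_ONCE() firing */

/* Both suspend paths (sched_clock save, clocksource suspend) set the flag. */
void model_suspend(void) { model_suspended = true; }

/* Both resume paths clear it again. */
void model_resume(void) { model_suspended = false; }

/* Returns true if the wallclock was read while the clock was live. */
bool model_get_wallclock(void)
{
	if (model_suspended) {
		model_warnings++;
		return false;
	}
	return true;
}
```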

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/kvmclock.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 41aff709b90a..2cc3dd2ba355 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -53,6 +53,8 @@ static struct pvclock_vsyscall_time_info *hvclock_mem;
 DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
 EXPORT_PER_CPU_SYMBOL_GPL(hv_clock_per_cpu);
 
+static bool kvmclock_suspended;
+
 /*
  * The wallclock is the time of day when we booted. Since then, some time may
  * have elapsed since the hypervisor wrote the data. So we try to account for
@@ -60,6 +62,7 @@ EXPORT_PER_CPU_SYMBOL_GPL(hv_clock_per_cpu);
  */
 static void kvm_get_wallclock(struct timespec64 *now)
 {
+	WARN_ON_ONCE(kvmclock_suspended);
 	wrmsrq(msr_kvm_wall_clock, slow_virt_to_phys(&wall_clock));
 	preempt_disable();
 	pvclock_read_wallclock(&wall_clock, this_cpu_pvti(), now);
@@ -140,6 +143,7 @@ static void kvm_save_sched_clock_state(void)
 	 * to the old address prior to reconfiguring kvmclock would clobber
 	 * random memory.
 	 */
+	kvmclock_suspended = true;
 	kvmclock_disable();
 }
 
@@ -152,16 +156,19 @@ static void kvm_setup_secondary_clock(void)
 
 static void kvm_restore_sched_clock_state(void)
 {
+	kvmclock_suspended = false;
 	kvm_register_clock("primary cpu, sched_clock resume");
 }
 
 static void kvmclock_suspend(struct clocksource *cs)
 {
+	kvmclock_suspended = true;
 	kvmclock_disable();
 }
 
 static void kvmclock_resume(struct clocksource *cs)
 {
+	kvmclock_suspended = false;
 	kvm_register_clock("primary cpu, clocksource resume");
 }
 
-- 
2.55.0.rc0.799.gd6f94ed593-goog


^ permalink raw reply related

* [PATCH v5 43/51] x86/kvmclock: Hook clocksource.suspend/resume when kvmclock isn't sched_clock
From: Sean Christopherson @ 2026-07-01 19:32 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

Save/restore kvmclock across suspend/resume via clocksource hooks when
kvmclock isn't being used for sched_clock.  This will allow using kvmclock
as a clocksource (or for wallclock!) without also using it for sched_clock.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/kvmclock.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 61d4d943fe74..41aff709b90a 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -152,7 +152,17 @@ static void kvm_setup_secondary_clock(void)
 
 static void kvm_restore_sched_clock_state(void)
 {
-	kvm_register_clock("primary cpu clock, resume");
+	kvm_register_clock("primary cpu, sched_clock resume");
+}
+
+static void kvmclock_suspend(struct clocksource *cs)
+{
+	kvmclock_disable();
+}
+
+static void kvmclock_resume(struct clocksource *cs)
+{
+	kvm_register_clock("primary cpu, clocksource resume");
 }
 
 void kvmclock_cpu_action(enum kvm_guest_cpu_action action)
@@ -223,6 +233,8 @@ static struct clocksource kvm_clock = {
 	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
 	.id		= CSID_X86_KVM_CLK,
 	.enable		= kvm_cs_enable,
+	.suspend	= kvmclock_suspend,
+	.resume		= kvmclock_resume,
 };
 
 static void __init kvmclock_init_mem(void)
@@ -318,6 +330,15 @@ static __init void kvm_sched_clock_init(bool stable)
 				   kvm_save_sched_clock_state,
 				   kvm_restore_sched_clock_state);
 
+	/*
+	 * The BSP's clock is managed via dedicated sched_clock save/restore
+	 * hooks when kvmclock is used as sched_clock, as sched_clock needs to
+	 * be kept alive until the very end of suspend entry, and restored as
+	 * quickly as possible after resume.
+	 */
+	kvm_clock.suspend = NULL;
+	kvm_clock.resume = NULL;
+
 	pr_info("kvm-clock: using sched offset of %llu cycles",
 		kvm_sched_clock_offset);
 
-- 
2.55.0.rc0.799.gd6f94ed593-goog


^ permalink raw reply related

* [PATCH v5 42/51] timekeeping: Resume clocksources before reading persistent clock
From: Sean Christopherson @ 2026-07-01 19:32 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

When resuming timekeeping after suspend, restore clocksources prior to
reading the persistent clock.  Paravirt clocks, e.g. kvmclock, tie the
validity of a PV persistent clock to a clocksource, i.e. reading the PV
persistent clock will return garbage if the underlying PV clocksource
hasn't been enabled.  The flaw has gone unnoticed because kvmclock is a
mess and uses its own suspend/resume hooks instead of the clocksource
suspend/resume hooks, which happens to work by sheer dumb luck (the
kvmclock resume hook runs before timekeeping_resume()).

Note, there is no evidence that any clocksource supported by the kernel
depends on a persistent clock.
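
To make the ordering problem concrete, here is a small userspace model.
Every name is hypothetical; MODEL_GARBAGE stands in for whatever bogus
value a PV persistent clock returns while its clocksource is disabled,
and 1000 is an arbitrary "correct" wallclock value.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_GARBAGE (-1LL)

bool model_cs_enabled;
long long model_host_wallclock = 1000;

void model_clocksource_suspend(void) { model_cs_enabled = false; }
void model_clocksource_resume(void) { model_cs_enabled = true; }

/* The PV persistent clock is only valid while the clocksource is live. */
long long model_read_persistent_clock(void)
{
	return model_cs_enabled ? model_host_wallclock : MODEL_GARBAGE;
}

/* Old order: read the persistent clock, then resume clocksources. */
long long model_resume_old_order(void)
{
	long long ts = model_read_persistent_clock();

	model_clocksource_resume();
	return ts;
}

/* New order: resume clocksources, then read the persistent clock. */
long long model_resume_new_order(void)
{
	model_clocksource_resume();
	return model_read_persistent_clock();
}
```

In the model, the old order only yields a sane value if something else
re-enabled the clocksource beforehand, which is the "sheer dumb luck"
case described above.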

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 kernel/time/timekeeping.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index b1b5ec43c0f2..5bc77d36c7a3 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2180,11 +2180,16 @@ void timekeeping_resume(void)
 	u64 cycle_now, nsec;
 	unsigned long flags;
 
-	read_persistent_clock64(&ts_new);
-
 	clockevents_resume();
 	clocksource_resume();
 
+	/*
+	 * Read persistent time after clocksources have been resumed.  Paravirt
+	 * clocks have a nasty habit of piggybacking a persistent clock on a
+	 * system clock, and may return garbage if the system clock is suspended.
+	 */
+	read_persistent_clock64(&ts_new);
+
 	raw_spin_lock_irqsave(&tk_core.lock, flags);
 
 	/*
-- 
2.55.0.rc0.799.gd6f94ed593-goog


^ permalink raw reply related

* [PATCH v5 41/51] x86/kvmclock: Refactor handling of PVCLOCK_TSC_STABLE_BIT during kvmclock_init()
From: Sean Christopherson @ 2026-07-01 19:32 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

Clean up the setting of PVCLOCK_TSC_STABLE_BIT during kvmclock init to
make it somewhat obvious that pvclock_read_flags() must be called *after*
pvclock_set_flags().

Note, in theory, a different PV clock could have set PVCLOCK_TSC_STABLE_BIT
in the supported flags, i.e. reading flags only if
KVM_FEATURE_CLOCKSOURCE_STABLE_BIT is set could very, very theoretically
result in a change in behavior.  In practice, the kernel only supports a
single PV clock.
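
The ordering dependency can be illustrated with a userspace sketch.  It
assumes that flag reads are masked by the previously configured valid
flags, which is what makes the set-then-read order matter; the function
names and MODEL_TSC_STABLE_BIT are hypothetical stand-ins for the
pvclock helpers and PVCLOCK_TSC_STABLE_BIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TSC_STABLE_BIT (1u << 0)

uint8_t model_valid_flags;

void model_set_flags(uint8_t flags) { model_valid_flags = flags; }

/* Only flags that were previously marked valid are reported. */
uint8_t model_read_flags(uint8_t raw_flags)
{
	return raw_flags & model_valid_flags;
}

/* Mirrors the refactored flow: set valid flags first, then read. */
bool model_is_stable(bool feature_stable_bit, uint8_t raw_flags)
{
	bool stable = false;

	if (feature_stable_bit) {
		model_set_flags(MODEL_TSC_STABLE_BIT);
		stable = model_read_flags(raw_flags) & MODEL_TSC_STABLE_BIT;
	}

	return stable;
}
```

Reading before setting would always report "not stable" in this model,
regardless of what the hypervisor advertised.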

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/kvmclock.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 5220d205abc7..61d4d943fe74 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -327,7 +327,7 @@ static __init void kvm_sched_clock_init(bool stable)
 
 void __init kvmclock_init(bool prefer_tsc)
 {
-	u8 flags;
+	bool stable = false;
 
 	if (!kvm_para_available() || !kvmclock)
 		return;
@@ -354,11 +354,18 @@ void __init kvmclock_init(bool prefer_tsc)
 	kvm_register_clock("primary cpu clock");
 	pvclock_set_pvti_cpu0_va(hv_clock_boot);
 
-	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
+	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT)) {
 		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
 
-	flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
-	kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
+		/*
+		 * Check if the clock is stable *after* marking TSC_STABLE as a
+		 * valid flag.
+		 */
+		stable = pvclock_read_flags(&hv_clock_boot[0].pvti) &
+			 PVCLOCK_TSC_STABLE_BIT;
+	}
+
+	kvm_sched_clock_init(stable);
 
 	if (!x86_init.hyper.get_tsc_khz)
 		x86_init.hyper.get_tsc_khz = kvmclock_get_tsc_khz;
-- 
2.55.0.rc0.799.gd6f94ed593-goog


^ permalink raw reply related

* [PATCH v5 40/51] x86/pvclock: WARN if pvclock's valid_flags are overwritten
From: Sean Christopherson @ 2026-07-01 19:32 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

WARN if the common PV clock valid_flags are overwritten; all PV clocks
expect that they are the one and only PV clock, i.e. don't guard against
another PV clock having modified the flags.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/pvclock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index a51adce67f92..8d098841a225 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -21,6 +21,7 @@ static struct pvclock_vsyscall_time_info *pvti_cpu0_va __ro_after_init;
 
 void __init pvclock_set_flags(u8 flags)
 {
+	WARN_ON(valid_flags);
 	valid_flags = flags;
 }
 
-- 
2.55.0.rc0.799.gd6f94ed593-goog


^ permalink raw reply related

* [PATCH v5 39/51] x86/pvclock: Mark setup helpers and related variables as __init/__ro_after_init
From: Sean Christopherson @ 2026-07-01 19:32 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

Now that Xen PV clock and kvmclock explicitly do setup only during init,
tag the common PV clock flags/vsyscall variables as __ro_after_init and
their mutators as __init.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/pvclock.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index b3f81379c2fc..a51adce67f92 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -16,10 +16,10 @@
 #include <asm/pvclock.h>
 #include <asm/vgtod.h>
 
-static u8 valid_flags __read_mostly = 0;
-static struct pvclock_vsyscall_time_info *pvti_cpu0_va __read_mostly;
+static u8 valid_flags __ro_after_init = 0;
+static struct pvclock_vsyscall_time_info *pvti_cpu0_va __ro_after_init;
 
-void pvclock_set_flags(u8 flags)
+void __init pvclock_set_flags(u8 flags)
 {
 	valid_flags = flags;
 }
@@ -153,7 +153,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
 	set_normalized_timespec64(ts, now.tv_sec, now.tv_nsec);
 }
 
-void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti)
+void __init pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti)
 {
 	WARN_ON(vclock_was_used(VDSO_CLOCKMODE_PVCLOCK));
 	pvti_cpu0_va = pvti;
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* [PATCH v5 38/51] x86/xen/time: Mark xen_setup_vsyscall_time_info() as __init
From: Sean Christopherson @ 2026-07-01 19:31 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau,
	Rick Edgecombe, Sean Christopherson, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, John Stultz
  Cc: Shuah Khan, H. Peter Anvin, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, linux-doc, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, David Woodhouse,
	Michael Kelley, Thomas Gleixner
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>

Annotate xen_setup_vsyscall_time_info() as being used only during kernel
initialization; it's called only by xen_time_init(), which is already
tagged __init.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/xen/time.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 8cd8bfaf1320..bc26f00fc53e 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -443,7 +443,7 @@ void xen_restore_time_memory_area(void)
 	xen_sched_clock_offset = xen_clocksource_read() - xen_clock_value_saved;
 }
 
-static void xen_setup_vsyscall_time_info(void)
+static void __init xen_setup_vsyscall_time_info(void)
 {
 	struct vcpu_register_time_memory_area t;
 	struct pvclock_vsyscall_time_info *ti;
-- 
2.55.0.rc0.799.gd6f94ed593-goog


