* [PATCH v4 12/47] x86/tsc: Rename pit_hpet_ptimer_calibrate_cpu() => native_calibrate_cpu_late()
From: Sean Christopherson @ 2026-05-29 14:43 UTC
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260529144435.704127-1-seanjc@google.com>
Rename the late CPU calibration routine so that its relationship to the
early routine is more obvious and intuitive.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kernel/tsc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 5b4b6e43c94c..534462c81c78 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -779,7 +779,7 @@ static unsigned long cpu_khz_from_cpuid(void)
* calibrate cpu using pit, hpet, and ptimer methods. They are available
* later in boot after acpi is initialized.
*/
-static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
+static unsigned long native_calibrate_cpu_late(void)
{
u64 tsc1, tsc2, delta, ref1, ref2;
unsigned long tsc_pit_min = ULONG_MAX, tsc_ref_min = ULONG_MAX;
@@ -954,7 +954,7 @@ static unsigned long native_calibrate_cpu(void)
unsigned long tsc_freq = native_calibrate_cpu_early();
if (!tsc_freq)
- tsc_freq = pit_hpet_ptimer_calibrate_cpu();
+ tsc_freq = native_calibrate_cpu_late();
return tsc_freq;
}
@@ -1497,7 +1497,7 @@ static bool __init determine_cpu_tsc_frequencies(bool early,
else
tsc_khz = native_calibrate_tsc();
} else {
- cpu_khz = pit_hpet_ptimer_calibrate_cpu();
+ cpu_khz = native_calibrate_cpu_late();
}
/*
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v4 11/47] x86/tsc: Kill off x86_platform_ops.calibrate_{cpu,tsc}() hooks
From: Sean Christopherson @ 2026-05-29 14:43 UTC
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260529144435.704127-1-seanjc@google.com>
Now that getting the CPU and/or TSC frequencies from the hypervisor uses
dedicated hooks, drop x86_platform_ops.calibrate_{cpu,tsc}() and instead
directly invoke the correct helper at each phase of (re)calibration. In
addition to eliminating unnecessary code, this makes it a bit more obvious
when the "late" path invokes pit_hpet_ptimer_calibrate_cpu() instead of
x86_platform_ops.calibrate_cpu().
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/tsc.h | 2 --
arch/x86/include/asm/x86_init.h | 4 ----
arch/x86/kernel/tsc.c | 28 ++++++++++++----------------
arch/x86/kernel/x86_init.c | 2 --
4 files changed, 12 insertions(+), 24 deletions(-)
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 6cf26e62e9a6..4a224f99c3b9 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -97,8 +97,6 @@ extern void mark_tsc_unstable(char *reason);
extern int unsynchronized_tsc(void);
extern int check_tsc_unstable(void);
extern void mark_tsc_async_resets(char *reason);
-extern unsigned long native_calibrate_cpu_early(void);
-extern unsigned long native_calibrate_tsc(void);
extern unsigned long long native_sched_clock_from_tsc(u64 tsc);
extern int tsc_clocksource_reliable;
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index a4f8a4aa601d..ada17827ea51 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -292,8 +292,6 @@ struct x86_hyper_runtime {
/**
* struct x86_platform_ops - platform specific runtime functions
- * @calibrate_cpu: calibrate CPU
- * @calibrate_tsc: calibrate TSC, if different from CPU
* @get_wallclock: get time from HW clock like RTC etc.
* @set_wallclock: set time back to HW clock
* @iommu_shutdown: set by an IOMMU driver for shutdown if necessary
@@ -317,8 +315,6 @@ struct x86_hyper_runtime {
* @guest: guest incarnations callbacks
*/
struct x86_platform_ops {
- unsigned long (*calibrate_cpu)(void);
- unsigned long (*calibrate_tsc)(void);
void (*get_wallclock)(struct timespec64 *ts);
int (*set_wallclock)(const struct timespec64 *ts);
void (*iommu_shutdown)(void);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 8cef918486db..5b4b6e43c94c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -696,7 +696,7 @@ int __init cpuid_get_tsc_freq(struct cpuid_tsc_info *info)
* native_calibrate_tsc - determine TSC frequency
* Determine TSC frequency via CPUID, else return 0.
*/
-unsigned long native_calibrate_tsc(void)
+static unsigned long native_calibrate_tsc(void)
{
struct cpuid_tsc_info info;
@@ -931,7 +931,7 @@ static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
/**
* native_calibrate_cpu_early - can calibrate the cpu early in boot
*/
-unsigned long native_calibrate_cpu_early(void)
+static unsigned long native_calibrate_cpu_early(void)
{
unsigned long flags, fast_calibrate = cpu_khz_from_cpuid();
@@ -945,7 +945,7 @@ unsigned long native_calibrate_cpu_early(void)
return fast_calibrate;
}
-
+#ifndef CONFIG_SMP
/**
* native_calibrate_cpu - calibrate the cpu
*/
@@ -958,6 +958,7 @@ static unsigned long native_calibrate_cpu(void)
return tsc_freq;
}
+#endif
void recalibrate_cpu_khz(void)
{
@@ -967,9 +968,9 @@ void recalibrate_cpu_khz(void)
if (!boot_cpu_has(X86_FEATURE_TSC))
return;
- cpu_khz = x86_platform.calibrate_cpu();
+ cpu_khz = native_calibrate_cpu();
if (!boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ))
- tsc_khz = x86_platform.calibrate_tsc();
+ tsc_khz = native_calibrate_tsc();
if (tsc_khz == 0)
tsc_khz = cpu_khz;
else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
@@ -1483,17 +1484,19 @@ static bool __init determine_cpu_tsc_frequencies(bool early,
WARN_ON(cpu_khz || tsc_khz);
if (early) {
+ /*
+ * Early CPU calibration can only use methods that are available
+ * early in boot (obviously).
+ */
if (known_cpu_khz)
cpu_khz = known_cpu_khz;
else
- cpu_khz = x86_platform.calibrate_cpu();
+ cpu_khz = native_calibrate_cpu_early();
if (known_tsc_khz)
tsc_khz = known_tsc_khz;
else
- tsc_khz = x86_platform.calibrate_tsc();
+ tsc_khz = native_calibrate_tsc();
} else {
- /* We should not be here with non-native cpu calibration */
- WARN_ON(x86_platform.calibrate_cpu != native_calibrate_cpu);
cpu_khz = pit_hpet_ptimer_calibrate_cpu();
}
@@ -1590,13 +1593,6 @@ void __init tsc_init(void)
return;
}
- /*
- * native_calibrate_cpu_early can only calibrate using methods that are
- * available early in boot.
- */
- if (x86_platform.calibrate_cpu == native_calibrate_cpu_early)
- x86_platform.calibrate_cpu = native_calibrate_cpu;
-
if (!tsc_khz) {
/* We failed to determine frequencies earlier, try again */
if (!determine_cpu_tsc_frequencies(false, 0, 0)) {
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index ebefb77c37bb..c674cbbd466d 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -144,8 +144,6 @@ static void enc_kexec_finish_noop(void) {}
static bool is_private_mmio_noop(u64 addr) {return false; }
struct x86_platform_ops x86_platform __ro_after_init = {
- .calibrate_cpu = native_calibrate_cpu_early,
- .calibrate_tsc = native_calibrate_tsc,
.get_wallclock = mach_get_cmos_time,
.set_wallclock = mach_set_cmos_time,
.iommu_shutdown = iommu_shutdown_noop,
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v4 10/47] x86/tsc: Consolidate forcing of X86_FEATURE_TSC_KNOWN_FREQ for PV code
From: Sean Christopherson @ 2026-05-29 14:43 UTC
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260529144435.704127-1-seanjc@google.com>
Now that all paravirt code that explicitly specifies the TSC frequency
also sets X86_FEATURE_TSC_KNOWN_FREQ, replace all of the one-off code
and simply set X86_FEATURE_TSC_KNOWN_FREQ if the TSC frequency is known.
Do NOT force set TSC_KNOWN_FREQ if the "known" TSC frequency was provided
by the user. Per commit bd35c77e32e4 ("x86/tsc: Add tsc_early_khz command
line parameter"), one of the goals of the param is to allow the refined
calibration work "to do meaningful error checking".
Note, preferring the user-provided TSC frequency over the frequency from
the hypervisor or trusted firmware, while simultaneously not treating the
user-provided frequency as gospel, is obviously incongruous. Sweep the
problem under the rug for now to avoid opening a big can of worms that
likely doesn't have a great answer.
No functional change intended.
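
The rule described above can be modeled in isolation. The following is a
minimal user-space sketch, not kernel code: should_force_tsc_known_freq()
is a hypothetical helper invented for illustration, and its two parameters
stand in for the known_tsc_khz local and the tsc_early_khz command line
value used in tsc_early_init().

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel symbol): decide whether the TSC
 * frequency should be marked as known.
 *
 * known_tsc_khz: frequency chosen during early init, 0 if none was found.
 * tsc_early_khz: user-provided frequency, 0 if the parameter was not given.
 *
 * A user-provided value is treated as a starting point only, so it never
 * results in the frequency being marked as known.
 */
static bool should_force_tsc_known_freq(unsigned int known_tsc_khz,
					unsigned int tsc_early_khz)
{
	return known_tsc_khz != 0 && tsc_early_khz == 0;
}
```

With a frequency from the hypervisor or trusted firmware and no user
override, the helper returns true. With a user override, or with no known
frequency at all, it returns false.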
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/coco/sev/core.c | 1 -
arch/x86/coco/tdx/tdx.c | 1 -
arch/x86/kernel/cpu/acrn.c | 2 --
arch/x86/kernel/cpu/mshyperv.c | 1 -
arch/x86/kernel/cpu/vmware.c | 2 --
arch/x86/kernel/jailhouse.c | 1 -
arch/x86/kernel/kvmclock.c | 1 -
arch/x86/kernel/tsc.c | 9 +++++++++
arch/x86/xen/time.c | 1 -
9 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index bc5ae9ef74da..72313b36b6f5 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -2027,7 +2027,6 @@ unsigned int __init snp_secure_tsc_init(void)
secrets = (__force struct snp_secrets_page *)mem;
- setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
rdmsrq(MSR_AMD64_GUEST_TSC_FREQ, tsc_freq_mhz);
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 5d7976359220..ab463c2b2dab 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -1205,7 +1205,6 @@ unsigned int __init tdx_tsc_init(void)
/* TSC is the only reliable clock in TDX guest */
setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
- setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
return info.tsc_khz;
}
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index 0303fe6a2efa..ad8f2da8003b 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -33,8 +33,6 @@ static void __init acrn_init_platform(void)
{
/* Install system interrupt handler for ACRN hypervisor callback */
sysvec_install(HYPERVISOR_CALLBACK_VECTOR, sysvec_acrn_hv_callback);
-
- setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
}
static bool acrn_x2apic_available(void)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 733e12d5a7dd..f8653fc05a40 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -575,7 +575,6 @@ static void __init ms_hyperv_init_platform(void)
ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) {
x86_init.hyper.get_tsc_khz = hv_get_tsc_khz;
x86_init.hyper.get_cpu_khz = hv_get_tsc_khz;
- setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
}
if (ms_hyperv.priv_high & HV_ISOLATION) {
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 7c8cf4885e82..2d0624c66799 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -390,8 +390,6 @@ static void __init vmware_set_capabilities(void)
{
setup_force_cpu_cap(X86_FEATURE_CONSTANT_TSC);
setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
- if (vmware_tsc_khz)
- setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
if (vmware_hypercall_mode == CPUID_VMWARE_FEATURES_ECX_VMCALL)
setup_force_cpu_cap(X86_FEATURE_VMCALL);
else if (vmware_hypercall_mode == CPUID_VMWARE_FEATURES_ECX_VMMCALL)
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index 4034e08c5f11..e4d7d9e2cd69 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -255,7 +255,6 @@ static void __init jailhouse_init_platform(void)
pr_debug("Jailhouse: PM-Timer IO Port: %#x\n", pmtmr_ioport);
precalibrated_tsc_khz = setup_data.v1.tsc_khz;
- setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
pci_probe = 0;
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ec888eef74aa..69752b170e0a 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -117,7 +117,6 @@ static inline void kvm_sched_clock_init(bool stable)
*/
static unsigned int __init kvm_get_tsc_khz(void)
{
- setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
return pvclock_tsc_khz(this_cpu_pvti());
}
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 362596612442..8cef918486db 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1569,6 +1569,15 @@ void __init tsc_early_init(void)
if (!known_tsc_khz && x86_init.hyper.get_tsc_khz)
known_tsc_khz = x86_init.hyper.get_tsc_khz();
+ /*
+ * Mark the TSC frequency as known if it was obtained from a hypervisor
+ * or trusted firmware. Don't mark the frequency as known if the user
+ * specified the frequency, as the user-provided frequency is intended
+ * as a "starting point", not a known, guaranteed frequency.
+ */
+ if (known_tsc_khz && !tsc_early_khz)
+ setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
if (!determine_cpu_tsc_frequencies(true, known_cpu_khz, known_tsc_khz))
return;
tsc_enable_sched_clock();
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 1adb44fdddb2..487ad838c441 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -43,7 +43,6 @@ static unsigned int __init xen_tsc_khz(void)
struct pvclock_vcpu_time_info *info =
&HYPERVISOR_shared_info->vcpu_info[0].time;
- setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
return pvclock_tsc_khz(info);
}
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v4 09/47] x86/acrn: Mark TSC frequency as known when using ACRN for calibration
From: Sean Christopherson @ 2026-05-29 14:43 UTC
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260529144435.704127-1-seanjc@google.com>
Mark the TSC frequency as known when using ACRN's PV CPUID information.
Per commit 81a71f51b89e ("x86/acrn: Set up timekeeping") and common sense,
the TSC freq is explicitly provided by the hypervisor.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kernel/cpu/acrn.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index ad8f2da8003b..0303fe6a2efa 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -33,6 +33,8 @@ static void __init acrn_init_platform(void)
{
/* Install system interrupt handler for ACRN hypervisor callback */
sysvec_install(HYPERVISOR_CALLBACK_VECTOR, sysvec_acrn_hv_callback);
+
+ setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
}
static bool acrn_x2apic_available(void)
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v4 08/47] x86/tsc: Add dedicated hypervisor hooks for getting known TSC/CPU frequencies
From: Sean Christopherson @ 2026-05-29 14:43 UTC
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260529144435.704127-1-seanjc@google.com>
Add dedicated hypervisor hooks for getting known TSC/CPU frequencies
instead of overriding seemingly generic platform hooks, and explicitly
prioritize hypervisor-provided frequencies over native methods, but do NOT
clobber the frequency obtained from trusted firmware. While shuffling the
hooks around is arguably "six of one, half dozen of the other", scoping
them to x86_hyper_init makes their purpose more obvious, and allows for
explicitly defining the priority of sources (as is done here).
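
As a rough illustration of the resulting priority for the TSC frequency
(user-provided value first, then trusted firmware, then the hypervisor),
here is a minimal user-space sketch. It is not kernel code: struct
tsc_sources and pick_known_tsc_khz() are hypothetical names, and each
field models one source by the value it would report, with 0 meaning
"unknown or not applicable".

```c
/* Hypothetical model of the candidate sources; not a kernel structure. */
struct tsc_sources {
	unsigned int user_khz;		/* tsc_early_khz, 0 if not given */
	unsigned int firmware_khz;	/* SNP or TDX value, 0 if not a CoCo guest */
	unsigned int hypervisor_khz;	/* PV hook result, 0 if no hook or unknown */
};

/*
 * Return the first known frequency in priority order, or 0 if no source
 * knows the frequency, in which case the caller would fall back to
 * native calibration.
 */
static unsigned int pick_known_tsc_khz(const struct tsc_sources *s)
{
	if (s->user_khz)
		return s->user_khz;
	if (s->firmware_khz)
		return s->firmware_khz;
	return s->hypervisor_khz;
}
```

The hypervisor value is only consulted when nothing with higher priority
produced a frequency, so an untrusted hypervisor cannot clobber the value
obtained from trusted firmware.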
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/acrn.h | 5 -----
arch/x86/include/asm/x86_init.h | 4 ++++
arch/x86/kernel/cpu/acrn.c | 10 +++++++---
arch/x86/kernel/cpu/mshyperv.c | 6 +++---
arch/x86/kernel/cpu/vmware.c | 8 ++++----
arch/x86/kernel/jailhouse.c | 6 +++---
arch/x86/kernel/kvmclock.c | 6 +++---
arch/x86/kernel/tsc.c | 23 +++++++++++++++++++----
arch/x86/xen/time.c | 4 ++--
9 files changed, 45 insertions(+), 27 deletions(-)
diff --git a/arch/x86/include/asm/acrn.h b/arch/x86/include/asm/acrn.h
index db42b477c41d..a892179c61c6 100644
--- a/arch/x86/include/asm/acrn.h
+++ b/arch/x86/include/asm/acrn.h
@@ -32,11 +32,6 @@ static inline u32 acrn_cpuid_base(void)
return 0;
}
-static inline unsigned long acrn_get_tsc_khz(void)
-{
- return cpuid_eax(ACRN_CPUID_TIMING_INFO);
-}
-
/*
* Hypercalls for ACRN
*
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 6c8a6ead84f6..a4f8a4aa601d 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -120,6 +120,8 @@ struct x86_init_pci {
* @msi_ext_dest_id: MSI supports 15-bit APIC IDs
* @init_mem_mapping: setup early mappings during init_mem_mapping()
* @init_after_bootmem: guest init after boot allocator is finished
+ * @get_tsc_khz: get the TSC frequency (returns 0 if frequency is unknown)
+ * @get_cpu_khz: get the CPU frequency (returns 0 if frequency is unknown)
*/
struct x86_hyper_init {
void (*init_platform)(void);
@@ -128,6 +130,8 @@ struct x86_hyper_init {
bool (*msi_ext_dest_id)(void);
void (*init_mem_mapping)(void);
void (*init_after_bootmem)(void);
+ unsigned int (*get_tsc_khz)(void);
+ unsigned int (*get_cpu_khz)(void);
};
/**
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index dc119af83524..ad8f2da8003b 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -24,13 +24,15 @@ static u32 __init acrn_detect(void)
return acrn_cpuid_base();
}
+static unsigned int __init acrn_get_tsc_khz(void)
+{
+ return cpuid_eax(ACRN_CPUID_TIMING_INFO);
+}
+
static void __init acrn_init_platform(void)
{
/* Install system interrupt handler for ACRN hypervisor callback */
sysvec_install(HYPERVISOR_CALLBACK_VECTOR, sysvec_acrn_hv_callback);
-
- x86_platform.calibrate_tsc = acrn_get_tsc_khz;
- x86_platform.calibrate_cpu = acrn_get_tsc_khz;
}
static bool acrn_x2apic_available(void)
@@ -78,4 +80,6 @@ const __initconst struct hypervisor_x86 x86_hyper_acrn = {
.type = X86_HYPER_ACRN,
.init.init_platform = acrn_init_platform,
.init.x2apic_available = acrn_x2apic_available,
+ .init.get_tsc_khz = acrn_get_tsc_khz,
+ .init.get_cpu_khz = acrn_get_tsc_khz,
};
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 185d4f677ec0..733e12d5a7dd 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -395,7 +395,7 @@ static int hv_nmi_unknown(unsigned int val, struct pt_regs *regs)
}
#endif
-static unsigned long hv_get_tsc_khz(void)
+static unsigned int __init hv_get_tsc_khz(void)
{
unsigned long freq;
@@ -573,8 +573,8 @@ static void __init ms_hyperv_init_platform(void)
if (ms_hyperv.features & HV_ACCESS_FREQUENCY_MSRS &&
ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) {
- x86_platform.calibrate_tsc = hv_get_tsc_khz;
- x86_platform.calibrate_cpu = hv_get_tsc_khz;
+ x86_init.hyper.get_tsc_khz = hv_get_tsc_khz;
+ x86_init.hyper.get_cpu_khz = hv_get_tsc_khz;
setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
}
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 34b73573b108..7c8cf4885e82 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -64,7 +64,7 @@ struct vmware_steal_time {
u64 reserved[7];
};
-static unsigned long vmware_tsc_khz __ro_after_init;
+static unsigned long vmware_tsc_khz __initdata;
static u8 vmware_hypercall_mode __ro_after_init;
unsigned long vmware_hypercall_slow(unsigned long cmd,
@@ -137,7 +137,7 @@ static inline int __vmware_platform(void)
return eax != UINT_MAX && ebx == VMWARE_HYPERVISOR_MAGIC;
}
-static unsigned long vmware_get_tsc_khz(void)
+static unsigned int __init vmware_get_tsc_khz(void)
{
return vmware_tsc_khz;
}
@@ -419,8 +419,8 @@ static void __init vmware_platform_setup(void)
}
vmware_tsc_khz = tsc_khz;
- x86_platform.calibrate_tsc = vmware_get_tsc_khz;
- x86_platform.calibrate_cpu = vmware_get_tsc_khz;
+ x86_init.hyper.get_tsc_khz = vmware_get_tsc_khz;
+ x86_init.hyper.get_cpu_khz = vmware_get_tsc_khz;
#ifdef CONFIG_X86_LOCAL_APIC
/* Skip lapic calibration since we know the bus frequency. */
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index f58ce9220e0f..4034e08c5f11 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -68,7 +68,7 @@ static void __init jailhouse_timer_init(void)
lapic_timer_period = setup_data.v1.apic_khz * (1000 / HZ);
}
-static unsigned long jailhouse_get_tsc(void)
+static unsigned int __init jailhouse_get_tsc(void)
{
return precalibrated_tsc_khz;
}
@@ -210,8 +210,6 @@ static void __init jailhouse_init_platform(void)
x86_init.mpparse.parse_smp_cfg = jailhouse_parse_smp_config;
x86_init.pci.arch_init = jailhouse_pci_arch_init;
- x86_platform.calibrate_cpu = jailhouse_get_tsc;
- x86_platform.calibrate_tsc = jailhouse_get_tsc;
x86_platform.get_wallclock = jailhouse_get_wallclock;
x86_platform.legacy.rtc = 0;
x86_platform.legacy.warm_reset = 0;
@@ -293,5 +291,7 @@ const struct hypervisor_x86 x86_hyper_jailhouse __refconst = {
.detect = jailhouse_detect,
.init.init_platform = jailhouse_init_platform,
.init.x2apic_available = jailhouse_x2apic_available,
+ .init.get_tsc_khz = jailhouse_get_tsc,
+ .init.get_cpu_khz = jailhouse_get_tsc,
.ignore_nopv = true,
};
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index b5991d53fc0e..ec888eef74aa 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -115,7 +115,7 @@ static inline void kvm_sched_clock_init(bool stable)
* poll of guests can be running and trouble each other. So we preset
* lpj here
*/
-static unsigned long kvm_get_tsc_khz(void)
+static unsigned int __init kvm_get_tsc_khz(void)
{
setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
return pvclock_tsc_khz(this_cpu_pvti());
@@ -321,8 +321,8 @@ void __init kvmclock_init(void)
flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
- x86_platform.calibrate_tsc = kvm_get_tsc_khz;
- x86_platform.calibrate_cpu = kvm_get_tsc_khz;
+ x86_init.hyper.get_tsc_khz = kvm_get_tsc_khz;
+ x86_init.hyper.get_cpu_khz = kvm_get_tsc_khz;
x86_platform.get_wallclock = kvm_get_wallclock;
x86_platform.set_wallclock = kvm_set_wallclock;
#ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2603f136e29b..362596612442 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1476,13 +1476,17 @@ static int __init init_tsc_clocksource(void)
device_initcall(init_tsc_clocksource);
static bool __init determine_cpu_tsc_frequencies(bool early,
+ unsigned int known_cpu_khz,
unsigned int known_tsc_khz)
{
/* Make sure that cpu and tsc are not already calibrated */
WARN_ON(cpu_khz || tsc_khz);
if (early) {
- cpu_khz = x86_platform.calibrate_cpu();
+ if (known_cpu_khz)
+ cpu_khz = known_cpu_khz;
+ else
+ cpu_khz = x86_platform.calibrate_cpu();
if (known_tsc_khz)
tsc_khz = known_tsc_khz;
else
@@ -1539,7 +1543,7 @@ static void __init tsc_enable_sched_clock(void)
void __init tsc_early_init(void)
{
- unsigned int known_tsc_khz = 0;
+ unsigned int known_cpu_khz = 0, known_tsc_khz = 0;
if (!boot_cpu_has(X86_FEATURE_TSC))
return;
@@ -1547,6 +1551,9 @@ void __init tsc_early_init(void)
if (is_early_uv_system())
return;
+ if (x86_init.hyper.get_cpu_khz)
+ known_cpu_khz = x86_init.hyper.get_cpu_khz();
+
if (tsc_early_khz)
known_tsc_khz = tsc_early_khz;
else if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC))
@@ -1554,7 +1561,15 @@ void __init tsc_early_init(void)
else if (boot_cpu_has(X86_FEATURE_TDX_GUEST))
known_tsc_khz = tdx_tsc_init();
- if (!determine_cpu_tsc_frequencies(true, known_tsc_khz))
+ /*
+ * If the TSC frequency is still unknown, i.e. not provided by the user
+ * or by trusted firmware, try to get it from the hypervisor (which is
+ * untrusted when running as a CoCo guest).
+ */
+ if (!known_tsc_khz && x86_init.hyper.get_tsc_khz)
+ known_tsc_khz = x86_init.hyper.get_tsc_khz();
+
+ if (!determine_cpu_tsc_frequencies(true, known_cpu_khz, known_tsc_khz))
return;
tsc_enable_sched_clock();
}
@@ -1575,7 +1590,7 @@ void __init tsc_init(void)
if (!tsc_khz) {
/* We failed to determine frequencies earlier, try again */
- if (!determine_cpu_tsc_frequencies(false, 0)) {
+ if (!determine_cpu_tsc_frequencies(false, 0, 0)) {
mark_tsc_unstable("could not calculate TSC khz");
setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
return;
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index d62c14334b35..1adb44fdddb2 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -38,7 +38,7 @@
static u64 xen_sched_clock_offset __read_mostly;
/* Get the TSC speed from Xen */
-static unsigned long xen_tsc_khz(void)
+static unsigned int __init xen_tsc_khz(void)
{
struct pvclock_vcpu_time_info *info =
&HYPERVISOR_shared_info->vcpu_info[0].time;
@@ -569,7 +569,7 @@ static void __init xen_init_time_common(void)
static_call_update(pv_steal_clock, xen_steal_clock);
paravirt_set_sched_clock(xen_sched_clock);
- x86_platform.calibrate_tsc = xen_tsc_khz;
+ x86_init.hyper.get_tsc_khz = xen_tsc_khz;
x86_platform.get_wallclock = xen_get_wallclock;
}
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v4 07/47] x86/tdx: Force TSC frequency with CPUID-based info provided by the TDX-Module
From: Sean Christopherson @ 2026-05-29 14:43 UTC
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260529144435.704127-1-seanjc@google.com>
When running as a TDX guest, explicitly set the TSC frequency to a known
value, using CPUID-based information, instead of potentially relying on a
hypervisor-controlled PV routine. For TDX guests, CPUID.0x15 is always
emulated by the TDX-Module, i.e. the information from CPUID is more
trustworthy than the information provided by the hypervisor.
To maintain backwards compatibility with TDX guest kernels that use native
calibration, and because it's the least awful option, retain
native_calibrate_tsc()'s stuffing of the local APIC bus period using the
core crystal frequency. While it's entirely possible for the hypervisor
to emulate the APIC timer at a different frequency than the core crystal
frequency, the commonly accepted interpretation of Intel's SDM is that the
APIC timer runs at the core crystal frequency when the latter is enumerated
via CPUID:
The APIC timer frequency will be the processor’s bus clock or core
crystal clock frequency (when TSC/core crystal clock ratio is enumerated
in CPUID leaf 0x15).
If the hypervisor is malicious and deliberately runs the APIC timer at the
wrong frequency, nothing would stop the hypervisor from modifying the
frequency at any time, i.e. attempting to manually calibrate the frequency
out of paranoia would be futile.
Deliberately leave CPU frequency calibration as is, since the TDX-Module
doesn't provide any guarantees with respect to CPUID.0x16.
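
For reference, the local APIC period computation retained by this patch
(crystal_khz * 1000 / HZ) can be worked through in isolation. The sketch
below is user-space C, not kernel code: lapic_period_from_crystal() is a
hypothetical helper, HZ is passed as a parameter instead of being a build
time constant, and the 24 MHz and 25 MHz crystal values are illustrative
assumptions, not values taken from this series.

```c
/*
 * Hypothetical helper (not a kernel symbol): number of APIC timer ticks
 * per scheduler tick, assuming the APIC timer runs at the core crystal
 * frequency.
 *
 * crystal_khz * 1000 is the crystal frequency in Hz; dividing by the
 * tick rate gives crystal cycles per tick.
 */
static unsigned int lapic_period_from_crystal(unsigned int crystal_khz,
					      unsigned int hz)
{
	return crystal_khz * 1000 / hz;
}
```

For example, an assumed 24 MHz crystal (crystal_khz = 24000) with HZ=250
gives 24000 * 1000 / 250 = 96000 cycles per tick, and the same crystal
with HZ=1000 gives 24000.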
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/coco/tdx/tdx.c | 20 +++++++++++++++++---
arch/x86/include/asm/tdx.h | 2 ++
arch/x86/kernel/tsc.c | 3 +++
3 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 29b6f1ed59ec..5d7976359220 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -8,6 +8,7 @@
#include <linux/export.h>
#include <linux/io.h>
#include <linux/kexec.h>
+#include <asm/apic.h>
#include <asm/coco.h>
#include <asm/tdx.h>
#include <asm/vmx.h>
@@ -1123,9 +1124,6 @@ void __init tdx_early_init(void)
setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
- /* TSC is the only reliable clock in TDX guest */
- setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
-
cc_vendor = CC_VENDOR_INTEL;
/* Configure the TD */
@@ -1195,3 +1193,19 @@ void __init tdx_early_init(void)
tdx_announce();
}
+
+unsigned int __init tdx_tsc_init(void)
+{
+ struct cpuid_tsc_info info;
+
+ if (WARN_ON_ONCE(cpuid_get_tsc_freq(&info)))
+ return 0;
+
+ lapic_timer_period = info.crystal_khz * 1000 / HZ;
+
+ /* TSC is the only reliable clock in TDX guest */
+ setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
+ setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
+ return info.tsc_khz;
+}
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index e5a9cf656c07..1d841d464aa4 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -67,6 +67,7 @@ struct ve_info {
#ifdef CONFIG_INTEL_TDX_GUEST
void __init tdx_early_init(void);
+unsigned int __init tdx_tsc_init(void);
void tdx_get_ve_info(struct ve_info *ve);
@@ -88,6 +89,7 @@ void __init tdx_dump_td_ctls(u64 td_ctls);
#else
static inline void tdx_early_init(void) { };
+static inline unsigned int tdx_tsc_init(void) { return 0; }
static inline void tdx_halt(void) { };
static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return false; }
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2b8f94c3fcc7..2603f136e29b 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -34,6 +34,7 @@
#include <asm/topology.h>
#include <asm/uv/uv.h>
#include <asm/sev.h>
+#include <asm/tdx.h>
unsigned int __read_mostly cpu_khz; /* TSC clocks / usec, not used here */
EXPORT_SYMBOL(cpu_khz);
@@ -1550,6 +1551,8 @@ void __init tsc_early_init(void)
known_tsc_khz = tsc_early_khz;
else if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC))
known_tsc_khz = snp_secure_tsc_init();
+ else if (boot_cpu_has(X86_FEATURE_TDX_GUEST))
+ known_tsc_khz = tdx_tsc_init();
if (!determine_cpu_tsc_frequencies(true, known_tsc_khz))
return;
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v4 06/47] x86/sev: Shove SNP's secure/trusted TSC frequency directly into "calibration"
From: Sean Christopherson @ 2026-05-29 14:43 UTC (permalink / raw)
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260529144435.704127-1-seanjc@google.com>
As a first step towards dropping .calibrate_{cpu,tsc}() and explicitly
defining precedence/priority for "calibration" routines, pass the secure
TSC frequency obtained from SNP firmware directly to
determine_cpu_tsc_frequencies() instead of overriding the .calibrate_tsc()
hook.
Unlike the native calibration routines, all of the paravirtual overrides,
including SNP and TDX, are constant in the sense that the frequency
provided by the hypervisor or trusted firmware is fixed, known, and always
available during early boot. More importantly, for CoCo (SNP and TDX) VMs,
it's imperative that the kernel uses the frequency provided by the trusted
firmware, not by the untrusted hypervisor. Enforcing the priority between
sources by carefully ordering seemingly unrelated init calls, so that the
trusted override "wins", is brittle and all but impossible to follow.
While it's rather weird, deliberately prioritize tsc_early_khz over all
else to maintain existing behavior.
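
As a rough illustration of the ordering described above, here is a minimal
userspace sketch. It is not the kernel code: struct tsc_sources and both
pick_*() helpers are hypothetical names invented for this example, and the
fields merely stand in for tsc_early_khz, the SNP Secure TSC check, and the
result of the fallback calibration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is kernel API. */
struct tsc_sources {
	unsigned int tsc_early_khz;	/* user-provided override, 0 if unset */
	bool snp_secure_tsc;		/* guest runs with SNP Secure TSC */
	unsigned int snp_khz;		/* frequency from trusted firmware */
	unsigned int calibrated_khz;	/* result of the fallback calibration */
};

/* Explicit precedence: the override first, then the trusted source. */
static unsigned int pick_known_tsc_khz(const struct tsc_sources *s)
{
	if (s->tsc_early_khz)
		return s->tsc_early_khz;
	if (s->snp_secure_tsc)
		return s->snp_khz;
	return 0;	/* nothing known, caller must calibrate */
}

/* A known frequency always wins over calibration. */
static unsigned int pick_tsc_khz(const struct tsc_sources *s)
{
	unsigned int known = pick_known_tsc_khz(s);

	return known ? known : s->calibrated_khz;
}
```

The point of the sketch is that the priority between sources is readable in
one place, rather than being an emergent property of init call ordering.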
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/coco/sev/core.c | 14 ++++----------
arch/x86/include/asm/sev.h | 4 ++--
arch/x86/kernel/tsc.c | 19 ++++++++++++-------
3 files changed, 18 insertions(+), 19 deletions(-)
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 403dcea86452..bc5ae9ef74da 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -99,7 +99,6 @@ static const char * const sev_status_feat_names[] = {
*/
static u64 snp_tsc_scale __ro_after_init;
static u64 snp_tsc_offset __ro_after_init;
-static unsigned long snp_tsc_freq_khz __ro_after_init;
DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
@@ -2014,15 +2013,10 @@ void __init snp_secure_tsc_prepare(void)
pr_debug("SecureTSC enabled");
}
-static unsigned long securetsc_get_tsc_khz(void)
-{
- return snp_tsc_freq_khz;
-}
-
-void __init snp_secure_tsc_init(void)
+unsigned int __init snp_secure_tsc_init(void)
{
+ unsigned long snp_tsc_freq_khz, tsc_freq_mhz;
struct snp_secrets_page *secrets;
- unsigned long tsc_freq_mhz;
void *mem;
mem = early_memremap_encrypted(sev_secrets_pa, PAGE_SIZE);
@@ -2043,7 +2037,7 @@ void __init snp_secure_tsc_init(void)
snp_tsc_freq_khz = SNP_SCALE_TSC_FREQ(tsc_freq_mhz * 1000, secrets->tsc_factor);
- x86_platform.calibrate_tsc = securetsc_get_tsc_khz;
-
early_memunmap(mem, PAGE_SIZE);
+
+ return snp_tsc_freq_khz;
}
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 594cfa19cbd4..05ebf0b73ef4 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -530,7 +530,7 @@ int snp_send_guest_request(struct snp_msg_desc *mdesc, struct snp_guest_req *req
int snp_svsm_vtpm_send_command(u8 *buffer);
void __init snp_secure_tsc_prepare(void);
-void __init snp_secure_tsc_init(void);
+unsigned int snp_secure_tsc_init(void);
enum es_result savic_register_gpa(u64 gpa);
enum es_result savic_unregister_gpa(u64 *gpa);
u64 savic_ghcb_msr_read(u32 reg);
@@ -637,7 +637,7 @@ static inline int snp_send_guest_request(struct snp_msg_desc *mdesc,
struct snp_guest_req *req) { return -ENODEV; }
static inline int snp_svsm_vtpm_send_command(u8 *buffer) { return -ENODEV; }
static inline void __init snp_secure_tsc_prepare(void) { }
-static inline void __init snp_secure_tsc_init(void) { }
+static inline unsigned int __init snp_secure_tsc_init(void) { return 0; }
static inline void sev_evict_cache(void *va, int npages) {}
static inline enum es_result savic_register_gpa(u64 gpa) { return ES_UNSUPPORTED; }
static inline enum es_result savic_unregister_gpa(u64 *gpa) { return ES_UNSUPPORTED; }
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 833eed5c048a..2b8f94c3fcc7 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1474,15 +1474,16 @@ static int __init init_tsc_clocksource(void)
*/
device_initcall(init_tsc_clocksource);
-static bool __init determine_cpu_tsc_frequencies(bool early)
+static bool __init determine_cpu_tsc_frequencies(bool early,
+ unsigned int known_tsc_khz)
{
/* Make sure that cpu and tsc are not already calibrated */
WARN_ON(cpu_khz || tsc_khz);
if (early) {
cpu_khz = x86_platform.calibrate_cpu();
- if (tsc_early_khz)
- tsc_khz = tsc_early_khz;
+ if (known_tsc_khz)
+ tsc_khz = known_tsc_khz;
else
tsc_khz = x86_platform.calibrate_tsc();
} else {
@@ -1537,16 +1538,20 @@ static void __init tsc_enable_sched_clock(void)
void __init tsc_early_init(void)
{
+ unsigned int known_tsc_khz = 0;
+
if (!boot_cpu_has(X86_FEATURE_TSC))
return;
/* Don't change UV TSC multi-chassis synchronization */
if (is_early_uv_system())
return;
- if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC))
- snp_secure_tsc_init();
+ if (tsc_early_khz)
+ known_tsc_khz = tsc_early_khz;
+ else if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC))
+ known_tsc_khz = snp_secure_tsc_init();
- if (!determine_cpu_tsc_frequencies(true))
+ if (!determine_cpu_tsc_frequencies(true, known_tsc_khz))
return;
tsc_enable_sched_clock();
}
@@ -1567,7 +1572,7 @@ void __init tsc_init(void)
if (!tsc_khz) {
/* We failed to determine frequencies earlier, try again */
- if (!determine_cpu_tsc_frequencies(false)) {
+ if (!determine_cpu_tsc_frequencies(false, 0)) {
mark_tsc_unstable("could not calculate TSC khz");
setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
return;
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v4 05/47] x86/sev: Move check for SNP Secure TSC support to tsc_early_init()
From: Sean Christopherson @ 2026-05-29 14:43 UTC (permalink / raw)
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260529144435.704127-1-seanjc@google.com>
Move the check on having a Secure TSC to the common tsc_early_init() so
that it's obvious that having a Secure TSC is conditional, and to prepare
for adding TDX to the mix (blindly initializing *both* SNP and TDX TSC
logic looks especially weird).
No functional change intended.
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/coco/sev/core.c | 3 ---
arch/x86/kernel/tsc.c | 3 ++-
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 665de1aea0ee..403dcea86452 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -2025,9 +2025,6 @@ void __init snp_secure_tsc_init(void)
unsigned long tsc_freq_mhz;
void *mem;
- if (!cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC))
- return;
-
mem = early_memremap_encrypted(sev_secrets_pa, PAGE_SIZE);
if (!mem) {
pr_err("Unable to get TSC_FACTOR: failed to map the SNP secrets page.\n");
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f7f561722efa..833eed5c048a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1543,7 +1543,8 @@ void __init tsc_early_init(void)
if (is_early_uv_system())
return;
- snp_secure_tsc_init();
+ if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC))
+ snp_secure_tsc_init();
if (!determine_cpu_tsc_frequencies(true))
return;
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v4 04/47] x86/sev: Don't override CPU frequency calibration for SNP's Secure TSC
From: Sean Christopherson @ 2026-05-29 14:43 UTC (permalink / raw)
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260529144435.704127-1-seanjc@google.com>
Don't override the kernel's CPU frequency calibration routine when
registering SNP's Secure TSC calibration routine. SNP (the architecture)
provides zero guarantees that the CPU runs at the same frequency as the
TSC. The justification for clobbering the CPU routine was:
Since the difference between CPU base and TSC frequency does not apply
in this case, the same callback is being used.
but that's simply not true. E.g. if APERF/MPERF is exposed to the VM, then
the CPU frequency absolutely does matter.
While relying on heuristics and/or the untrusted hypervisor to provide the
CPU frequency isn't ideal, it's at least not outright wrong.
Fixes: 73bbf3b0fbba ("x86/tsc: Init the TSC for Secure TSC guests")
Cc: Nikunj A Dadhania <nikunj@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/coco/sev/core.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index ed0ac52a765e..665de1aea0ee 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -2046,7 +2046,6 @@ void __init snp_secure_tsc_init(void)
snp_tsc_freq_khz = SNP_SCALE_TSC_FREQ(tsc_freq_mhz * 1000, secrets->tsc_factor);
- x86_platform.calibrate_cpu = securetsc_get_tsc_khz;
x86_platform.calibrate_tsc = securetsc_get_tsc_khz;
early_memunmap(mem, PAGE_SIZE);
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v4 03/47] x86/sev: Mark TSC as reliable when configuring Secure TSC
From: Sean Christopherson @ 2026-05-29 14:43 UTC (permalink / raw)
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260529144435.704127-1-seanjc@google.com>
Move the code to mark the TSC as reliable from sme_early_init() to
snp_secure_tsc_init(). The only reader of TSC_RELIABLE is the aptly
named check_system_tsc_reliable(), which runs in tsc_init(), i.e.
after snp_secure_tsc_init().
This will allow consolidating the handling of TSC_KNOWN_FREQ and
TSC_RELIABLE when overriding the TSC calibration routine.
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/coco/sev/core.c | 2 ++
arch/x86/mm/mem_encrypt_amd.c | 3 ---
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index ecd77d3217f3..ed0ac52a765e 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -2037,6 +2037,8 @@ void __init snp_secure_tsc_init(void)
secrets = (__force struct snp_secrets_page *)mem;
setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+ setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
+
rdmsrq(MSR_AMD64_GUEST_TSC_FREQ, tsc_freq_mhz);
/* Extract the GUEST TSC MHZ from BIT[17:0], rest is reserved space */
diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index 2f8c32173972..6c3af974c7c2 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -535,9 +535,6 @@ void __init sme_early_init(void)
*/
x86_init.resources.dmi_setup = snp_dmi_setup;
}
-
- if (sev_status & MSR_AMD64_SNP_SECURE_TSC)
- setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
}
void __init mem_encrypt_free_decrypted_mem(void)
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v4 02/47] x86/tsc: Add a standalone helpers for getting TSC info from CPUID.0x15
From: Sean Christopherson @ 2026-05-29 14:43 UTC (permalink / raw)
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260529144435.704127-1-seanjc@google.com>
Extract retrieval of TSC frequency information from CPUID into standalone
helpers so that TDX guest support can reuse the logic. Provide a version
that includes the multiplier math as TDX does NOT want to use
native_calibrate_tsc()'s fallback logic that derives the TSC frequency
based on CPUID.0x16, when the core crystal frequency isn't known.
Opportunistically drop native_calibrate_tsc()'s "== 0" and "!= 0" checks
in favor of the kernel's preferred style.
No functional change intended.
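
For readers who want to check the multiplier math by hand, a minimal
standalone sketch follows. It assumes the same formula the helper uses
(crystal kHz times numerator, divided by denominator); the function name
tsc_khz_from_ratio() is made up for this example, and returning 0 for
"unknown" is a simplification of the helper's error return.

```c
/*
 * Hypothetical userspace helper, not kernel code.  Derive the TSC
 * frequency in kHz from the core crystal frequency (kHz) and the
 * TSC/crystal ratio.  Returns 0 if any input is missing.
 */
static unsigned int tsc_khz_from_ratio(unsigned int crystal_khz,
				       unsigned int numerator,
				       unsigned int denominator)
{
	if (!crystal_khz || !numerator || !denominator)
		return 0;

	/* Multiply before dividing to avoid losing precision. */
	return crystal_khz * numerator / denominator;
}
```

E.g. a 24 MHz crystal (24000 kHz) with a ratio of 176/2 yields
24000 * 176 / 2 = 2112000 kHz, i.e. a 2.112 GHz TSC. With no crystal
frequency the result is 0, which is the case where TDX must not fall back
to the CPUID.0x16 derivation.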
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/tsc.h | 8 +++++
arch/x86/kernel/tsc.c | 67 +++++++++++++++++++++++++-------------
2 files changed, 52 insertions(+), 23 deletions(-)
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 4f7f09f50552..6cf26e62e9a6 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -83,6 +83,14 @@ static inline cycles_t get_cycles(void)
}
#define get_cycles get_cycles
+struct cpuid_tsc_info {
+ unsigned int denominator;
+ unsigned int numerator;
+ unsigned int crystal_khz;
+ unsigned int tsc_khz;
+};
+extern int __init cpuid_get_tsc_freq(struct cpuid_tsc_info *info);
+
extern void tsc_early_init(void);
extern void tsc_init(void);
extern void mark_tsc_unstable(char *reason);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 08cf6625d484..f7f561722efa 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -658,46 +658,67 @@ static unsigned long quick_pit_calibrate(void)
return delta;
}
+static int cpuid_get_tsc_info(struct cpuid_tsc_info *info)
+{
+ unsigned int ecx_hz, edx;
+
+ memset(info, 0, sizeof(*info));
+
+ if (boot_cpu_data.cpuid_level < CPUID_LEAF_TSC)
+ return -ENOENT;
+
+ /* CPUID 15H TSC/Crystal ratio, plus optionally Crystal Hz */
+ cpuid(CPUID_LEAF_TSC, &info->denominator, &info->numerator, &ecx_hz, &edx);
+
+ if (!info->denominator || !info->numerator)
+ return -ENOENT;
+
+ /*
+ * Note, some CPUs provide the multiplier information, but not the core
+ * crystal frequency. The multiplier information is still useful for
+ * such CPUs, as the crystal frequency can be gleaned from CPUID.0x16.
+ */
+ info->crystal_khz = ecx_hz / 1000;
+ return 0;
+}
+
+int __init cpuid_get_tsc_freq(struct cpuid_tsc_info *info)
+{
+ if (cpuid_get_tsc_info(info) || !info->crystal_khz)
+ return -ENOENT;
+
+ info->tsc_khz = info->crystal_khz * info->numerator / info->denominator;
+ return 0;
+}
+
/**
* native_calibrate_tsc - determine TSC frequency
* Determine TSC frequency via CPUID, else return 0.
*/
unsigned long native_calibrate_tsc(void)
{
- unsigned int eax_denominator, ebx_numerator, ecx_hz, edx;
- unsigned int crystal_khz;
+ struct cpuid_tsc_info info;
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return 0;
- if (boot_cpu_data.cpuid_level < CPUID_LEAF_TSC)
+ if (cpuid_get_tsc_info(&info))
return 0;
- eax_denominator = ebx_numerator = ecx_hz = edx = 0;
-
- /* CPUID 15H TSC/Crystal ratio, plus optionally Crystal Hz */
- cpuid(CPUID_LEAF_TSC, &eax_denominator, &ebx_numerator, &ecx_hz, &edx);
-
- if (ebx_numerator == 0 || eax_denominator == 0)
- return 0;
-
- crystal_khz = ecx_hz / 1000;
-
/*
* Denverton SoCs don't report crystal clock, and also don't support
* CPUID_LEAF_FREQ for the calculation below, so hardcode the 25MHz
* crystal clock.
*/
- if (crystal_khz == 0 &&
- boot_cpu_data.x86_vfm == INTEL_ATOM_GOLDMONT_D)
- crystal_khz = 25000;
+ if (!info.crystal_khz && boot_cpu_data.x86_vfm == INTEL_ATOM_GOLDMONT_D)
+ info.crystal_khz = 25000;
/*
* TSC frequency reported directly by CPUID is a "hardware reported"
* frequency and is the most accurate one so far we have. This
* is considered a known frequency.
*/
- if (crystal_khz != 0)
+ if (info.crystal_khz)
setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
/*
@@ -705,15 +726,15 @@ unsigned long native_calibrate_tsc(void)
* clock, but we can easily calculate it to a high degree of accuracy
* by considering the crystal ratio and the CPU speed.
*/
- if (crystal_khz == 0 && boot_cpu_data.cpuid_level >= CPUID_LEAF_FREQ) {
+ if (!info.crystal_khz && boot_cpu_data.cpuid_level >= CPUID_LEAF_FREQ) {
unsigned int eax_base_mhz, ebx, ecx, edx;
cpuid(CPUID_LEAF_FREQ, &eax_base_mhz, &ebx, &ecx, &edx);
- crystal_khz = eax_base_mhz * 1000 *
- eax_denominator / ebx_numerator;
+ info.crystal_khz = eax_base_mhz * 1000 *
+ info.denominator / info.numerator;
}
- if (crystal_khz == 0)
+ if (!info.crystal_khz)
return 0;
/*
@@ -730,10 +751,10 @@ unsigned long native_calibrate_tsc(void)
* lapic_timer_period here to avoid having to calibrate the APIC
* timer later.
*/
- lapic_timer_period = crystal_khz * 1000 / HZ;
+ lapic_timer_period = info.crystal_khz * 1000 / HZ;
#endif
- return crystal_khz * ebx_numerator / eax_denominator;
+ return info.crystal_khz * info.numerator / info.denominator;
}
static unsigned long cpu_khz_from_cpuid(void)
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v4 01/47] x86/tsc: Never re-calibrate TSC frequency if its exact timing is known
From: Sean Christopherson @ 2026-05-29 14:43 UTC (permalink / raw)
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
In-Reply-To: <20260529144435.704127-1-seanjc@google.com>
Don't re-calibrate the TSC frequency if the TSC is known to run at a fixed
frequency. In practice, this is likely one big nop, as re-calibration is
used only for SMP=n kernels, and only for hardware that is 20+ years old,
i.e. is extremely unlikely to collide with TSC_KNOWN_FREQ.
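
A simplified model of the intended behavior, as a userspace sketch. struct
freqs and recalibrate_sketch() are hypothetical names for this example, the
"new" values stand in for the results of the calibration routines, and the
cpu_khz vs. tsc_khz sanity check that follows in the real function is
deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical container for the two global frequencies. */
struct freqs {
	unsigned long cpu_khz;
	unsigned long tsc_khz;
};

static void recalibrate_sketch(struct freqs *f, bool tsc_known_freq,
			       unsigned long new_cpu_khz,
			       unsigned long new_tsc_khz)
{
	/* The CPU frequency is always re-calibrated. */
	f->cpu_khz = new_cpu_khz;

	/* A TSC with a known frequency is left alone. */
	if (!tsc_known_freq)
		f->tsc_khz = new_tsc_khz;

	/* Without a TSC frequency, fall back to the CPU frequency. */
	if (f->tsc_khz == 0)
		f->tsc_khz = f->cpu_khz;
}
```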
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kernel/tsc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index c5110eb554bc..08cf6625d484 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -946,7 +946,8 @@ void recalibrate_cpu_khz(void)
return;
cpu_khz = x86_platform.calibrate_cpu();
- tsc_khz = x86_platform.calibrate_tsc();
+ if (!boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ))
+ tsc_khz = x86_platform.calibrate_tsc();
if (tsc_khz == 0)
tsc_khz = cpu_khz;
else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v4 00/47] x86: Try to wrangle PV clocks vs. TSC
From: Sean Christopherson @ 2026-05-29 14:43 UTC (permalink / raw)
To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
Broadcom internal kernel review list, Boris Ostrovsky,
Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
virtualization, xen-devel, David Woodhouse, Tom Lendacky,
Nikunj A Dadhania, David Woodhouse, Michael Kelley,
Thomas Gleixner
Well, the number of patches in the series is going in the wrong direction,
but I'm much happier with this version, which eschews the x86_platform
overrides entirely in favor of a fixed sequence for selecting the TSC/CPU
frequency "routine".
Given that previous versions had fatal NULL pointer deref bugs that affected
VMware and Xen, this series needs testing and acks from those maintainers.
The primary goal of this series is to fix flaws with SNP and TDX guests where a
PV clock provided by the untrusted hypervisor is used instead of the secure
TSC that is controlled by trusted firmware.
The secondary goal is to modernize running under KVM. Currently, KVM guests will
use TSC for clocksource, but not sched_clock. And Linux-as-a-KVM-guest doesn't
support paravirt enumeration of the TSC/APIC frequencies, even though QEMU
provides that information by default.
The tertiary goal is to clean up the PV clock code to deduplicate logic across
hypervisors, and to hopefully make it all easier to maintain going forward.
v4 also adds a quaternary goal of cleaning up the TSC calibration code, which
was made stupidly hard to follow by hypervisor code mixing in with the native
calibration routines, instead of being implemented as a pure alternative.
Lots more background on the SNP/TDX motivation:
https://lore.kernel.org/all/20250106124633.1418972-13-nikunj@amd.com
As before, I deliberately omitted jailhouse-dev@googlegroups.com from the To/Cc,
as those emails bounced on v1 and, AFAICT, nothing has changed.
Note, I deliberately didn't collect a few reviews as the patches changed quite
a bit from what was reviewed in v3.
v4:
- Use x86_init_noop() to skip save/restore on VMware and Xen instead of
nullifying x86_platform.{save,restore}_sched_clock_state. [Sashiko]
- Use '0' to indicate "failure" when getting the CPU frequency from CPUID, to
avoid using an out-param and thus make it all but impossible to
unintentionally clobber the global cpu_khz (which v3 did). [Sashiko]
- Rename cpuid_get_cpu_freq() => __cpu_khz_from_cpuid() to capture its
relationship with cpu_khz_from_cpuid().
- Compute lapic_timer_period in units of ticks, not kHz. [Sashiko]
- Kill off x86_platform_ops.calibrate_{cpu,tsc}(), and instead use dedicated
hooks for hypervisor code, and direct calls for TDX and SNP. [David, loosely]
- Drop SNP's secure TSC override of _CPU_ calibration, as there's zero
evidence it's justified or a net positive.
- Collect reviews/acks. [David, Wei]
- Decouple getting TSC/APIC frequencies from KVM PV CPUID from kvmclock. [David]
- Fix an amusing number of Opportunistically misspellings. [David]
- Set kvm_sched_clock_offset _before_ registering kvmclock as sched_clock,
and add a comment to guard against future goofs. [Sashiko]
- Keep "setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE)" in Hyper-V's handling
of HV_ACCESS_TSC_INVARIANT, as it's technically possible to have a VM
with HV_ACCESS_TSC_INVARIANT but not HV_ACCESS_FREQUENCY_MSRS. Though as
a _very_ nice side effect of using dedicated sequencing for selecting the
TSC frequency source, this would have naturally happened anyways. [Sashiko]
v3:
- https://lore.kernel.org/all/20260515191942.1892718-1-seanjc@google.com
- Collect reviews. [Michael, Thomas]
- Use Hyper-V reference counter / refcounter instead of Hyper-V timer. [Michael]
- Use the paravirt CPUID interface first proposed by VMware for KVM's
"official" mechanism for communicating frequency to KVM-aware guests,
instead of abusing Intel's CPUID leafs. [David]
- Deal with paravirt code being moved into asm/timers.h and
arch/x86/kernel/tsc.c.
v2:
- https://lore.kernel.org/all/Z8YWttWDtvkyCtdJ@google.com
- Add struct to hold the TSC CPUID output. [Boris]
- Don't pointlessly inline the TSC CPUID helpers. [Boris]
- Fix a variable goof in a helper, hopefully for real this time. [Dan]
- Collect reviews. [Nikunj]
- Override the sched_clock save/restore hooks if and only if a PV clock
is successfully registered.
- During resume, restore clocksources before reading persistent time.
- Clean up more warts created by kvmclock.
- Fix more bugs in kvmclock's suspend/resume handling.
- Try to harden kvmclock against future bugs.
v1: https://lore.kernel.org/all/20250201021718.699411-1-seanjc@google.com
David Woodhouse (3):
KVM: x86: Officially define CPUID 0x40000010 as PV Timing Info (TSC
and Bus)
x86/kvm: Obtain TSC frequency from PV CPUID if present
x86/xen: Obtain TSC frequency from CPUID if present
Sean Christopherson (44):
x86/tsc: Never re-calibrate TSC frequency if its exact timing is known
x86/tsc: Add a standalone helpers for getting TSC info from CPUID.0x15
x86/sev: Mark TSC as reliable when configuring Secure TSC
x86/sev: Don't override CPU frequency calibration for SNP's Secure TSC
x86/sev: Move check for SNP Secure TSC support to tsc_early_init()
x86/sev: Shove SNP's secure/trusted TSC frequency directly into
"calibration"
x86/tdx: Force TSC frequency with CPUID-based info provided by the
TDX-Module
x86/tsc: Add dedicated hypervisor hooks for getting known TSC/CPU
frequencies
x86/acrn: Mark TSC frequency as known when using ACRN for calibration
x86/tsc: Consolidate forcing of X86_FEATURE_TSC_KNOWN_FREQ for PV code
x86/tsc: Kill off x86_platform_ops.calibrate_{cpu,tsc}() hooks
x86/tsc: Rename pit_hpet_ptimer_calibrate_cpu() =>
native_calibrate_cpu_late()
x86/tsc: Fold native_calibrate_cpu() into recalibrate_cpu_khz()
x86/kvmclock: Rename kvm_get_tsc_khz() to kvmclock_get_tsc_khz()
x86/kvm: Mark TSC as reliable when it's constant and nonstop
x86/kvm: Get local APIC bus frequency from PV CPUID Timing Info
x86/tsc: Add standalone helper for getting CPU frequency from CPUID
x86/kvm: Get CPU base frequency from CPUID when it's available
clocksource: hyper-v: Register sched_clock save/restore iff it's
necessary
clocksource: hyper-v: Drop wrappers to sched_clock save/restore
helpers
clocksource: hyper-v: Don't save/restore TSC offset when using HV
sched_clock
x86/kvmclock: Setup kvmclock for secondary CPUs iff CONFIG_SMP=y
x86/kvm: Don't disable kvmclock on BSP in syscore_suspend()
x86/paravirt: Remove unnecessary PARAVIRT=n stub for
paravirt_set_sched_clock()
x86/paravirt: Move handling of unstable PV clocks into
paravirt_set_sched_clock()
x86/kvmclock: Move sched_clock save/restore helpers up in kvmclock.c
x86/xen/time: NOP-ify x86_platform's sched_clock save/restore hooks
x86/vmware: NOP-ify save/restore hooks when using VMware's sched_clock
x86/tsc: WARN if TSC sched_clock save/restore used with PV sched_clock
x86/paravirt: Pass sched_clock save/restore helpers during
registration
x86/kvmclock: Move kvm_sched_clock_init() down in kvmclock.c
x86/xen/time: Mark xen_setup_vsyscall_time_info() as __init
x86/pvclock: Mark setup helpers and related various as
__init/__ro_after_init
x86/pvclock: WARN if pvclock's valid_flags are overwritten
x86/kvmclock: Refactor handling of PVCLOCK_TSC_STABLE_BIT during
kvmclock_init()
timekeeping: Resume clocksources before reading persistent clock
x86/kvmclock: Hook clocksource.suspend/resume when kvmclock isn't
sched_clock
x86/kvmclock: WARN if wall clock is read while kvmclock is suspended
x86/paravirt: Mark __paravirt_set_sched_clock() as __init
x86/paravirt: Plumb a return code into __paravirt_set_sched_clock()
x86/paravirt: Don't use a PV sched_clock in CoCo guests with trusted
TSC
x86/kvmclock: Use TSC for sched_clock if it's constant and non-stop
x86/kvmclock: Plumb in AP-online and BSP-resume to kvmlock, for
documentation
x86/paravirt: Move using_native_sched_clock() stub into timer.h
Documentation/virt/kvm/x86/cpuid.rst | 12 ++
arch/x86/coco/sev/core.c | 21 +--
arch/x86/coco/tdx/tdx.c | 19 ++-
arch/x86/include/asm/acrn.h | 5 -
arch/x86/include/asm/kvm_para.h | 12 +-
arch/x86/include/asm/sev.h | 4 +-
arch/x86/include/asm/tdx.h | 2 +
arch/x86/include/asm/timer.h | 15 +-
arch/x86/include/asm/tsc.h | 11 +-
arch/x86/include/asm/x86_init.h | 8 +-
arch/x86/include/uapi/asm/kvm_para.h | 11 ++
arch/x86/kernel/cpu/acrn.c | 10 +-
arch/x86/kernel/cpu/mshyperv.c | 65 +-------
arch/x86/kernel/cpu/vmware.c | 13 +-
arch/x86/kernel/jailhouse.c | 7 +-
arch/x86/kernel/kvm.c | 108 +++++++++++--
arch/x86/kernel/kvmclock.c | 208 ++++++++++++++++---------
arch/x86/kernel/pvclock.c | 9 +-
arch/x86/kernel/tsc.c | 218 +++++++++++++++++----------
arch/x86/kernel/x86_init.c | 2 -
arch/x86/mm/mem_encrypt_amd.c | 3 -
arch/x86/xen/time.c | 25 ++-
drivers/clocksource/hyperv_timer.c | 38 +++--
include/clocksource/hyperv_timer.h | 2 -
kernel/time/timekeeping.c | 9 +-
25 files changed, 533 insertions(+), 304 deletions(-)
base-commit: 4678d11f294de0fd295a265e02955b5d1a4a2684
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 1/1] drm/hyperv: Use "hv_drm_" as symbol name prefix
From: Hamza Mahfooz @ 2026-05-29 11:40 UTC (permalink / raw)
To: mhklinux
Cc: maarten.lankhorst, mripard, tzimmermann, airlied, simona, decui,
longli, ssengar, dri-devel, linux-kernel, linux-hyperv
In-Reply-To: <20260529014826.41256-1-mhklkml@zohomail.com>
On Thu, May 28, 2026 at 06:48:26PM -0700, Michael Kelley wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
> Function and structure names in the Hyper-V DRM driver currently
> use "hyperv_" as the prefix. This conflicts with usage in core Hyper-V
> and VMBus code, and incorrectly implies that functions and structures
> in this driver apply generically to Hyper-V. A specific conflict arises
> for "hyperv_init", which is an initcall for generic Hyper-V
> initialization on arm64. The conflict prevents the use of
> initcall_blacklist on the kernel boot line to skip loading this driver.
>
> Fix this by substituting "hv_drm_" as the prefix for all functions and
> structures in this driver. In most places, this is replacing "hyperv_"
> with "hv_drm_". In a few places, the substitution results in
> "hv_drm_drm_", which has been collapsed to just "hv_drm_". In two
> cases, the existing prefix is a bare "hv" (including in the to_hv()
> macro), which has been replaced with "hv_drm" for consistency.
>
> The changes are all mechanical text substitution in symbol names.
> There are no other code or functional changes.
>
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> Reviewed-by: Dexuan Cui <decui@microsoft.com>
Applied, thanks!
* Re: [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag
From: Dipayaan Roy @ 2026-05-29 2:55 UTC (permalink / raw)
To: Jakub Kicinski
Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
pabeni, leon, longli, kotaranov, horms, shradhagupta, ssengar,
ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov
In-Reply-To: <aguFpq8+LV+I9oH0@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
On Mon, May 18, 2026 at 02:33:26PM -0700, Dipayaan Roy wrote:
> On Mon, May 11, 2026 at 07:21:33PM -0700, Jakub Kicinski wrote:
> > This is an AI-generated review of your patch. The human sending this
> > email has considered the AI review valid, or at least plausible.
> > ---
> > net: mana: force full-page RX buffers via ethtool private flag
> >
> > This patch adds an ethtool private flag "full-page-rx" that forces one
> > RX buffer per page, bypassing the page_pool fragment path, to address
> > a 15-20% throughput regression seen on some 4K PAGE_SIZE ARM64
> > platforms under high connection counts. Toggling the flag triggers
> > mana_detach() + mana_attach() to apply the new RX buffer layout.
> >
> > > diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> > > index a28ca461c135..dfd222de11d0 100644
> > > --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> > > +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> >
> > [ ... ]
> >
> > > @@ -590,6 +609,88 @@ static int mana_get_link_ksettings(struct net_device *ndev,
> > > return 0;
> > > }
> > >
> >
> > [ ... ]
> >
> > > +static int mana_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> > > +{
> > > + struct mana_port_context *apc = netdev_priv(ndev);
> > > + u32 changed = apc->priv_flags ^ priv_flags;
> > > + u32 old_priv_flags = apc->priv_flags;
> > > + bool schedule_port_reset = false;
> > > + int err = 0;
> > > +
> > > + if (!changed)
> > > + return 0;
> > > +
> > > + /* Reject unknown bits */
> > > + if (priv_flags & ~GENMASK(MANA_PRIV_FLAG_MAX - 1, 0))
> > > + return -EINVAL;
> > > +
> > > + if (changed & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF)) {
> > > + apc->priv_flags = priv_flags;
> > > +
> > > + if (!apc->port_is_up) {
> > > + /* Port is down, flag updated to apply on next up
> > > + * so just return.
> > > + */
> > > + return 0;
> > > + }
> > > +
> > > + /* Pre-allocate buffers to prevent failure in mana_attach
> > > + * later
> > > + */
> > > + err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
> > > + if (err) {
> > > + netdev_err(ndev,
> > > + "Insufficient memory for new allocations\n");
> > > + apc->priv_flags = old_priv_flags;
> > > + return err;
> > > + }
> > > +
> > > + err = mana_detach(ndev, false);
> > > + if (err) {
> > > + netdev_err(ndev, "mana_detach failed: %d\n", err);
> > > + apc->priv_flags = old_priv_flags;
> > > +
> > > + /* Port is in an inconsistent state. Restore
> > > + * 'port_is_up' so that queue reset work handler
> > > + * can properly detach and re-attach.
> > > + */
> > > + apc->port_is_up = true;
> > > + schedule_port_reset = true;
> > > + goto out;
> > > + }
> > > +
> > > + err = mana_attach(ndev);
> > > + if (err) {
> > > + netdev_err(ndev, "mana_attach failed: %d\n", err);
> > > + apc->priv_flags = old_priv_flags;
> > > +
> > > + /* Restore 'port_is_up' so the reset work handler
> > > + * can properly detach/attach. Without this,
> > > + * the handler sees port_is_up=false and skips
> > > + * queue allocation, leaving the port dead.
> > > + */
> > > + apc->port_is_up = true;
> > > + schedule_port_reset = true;
> > > + }
> > > + }
> > > +
> > > +out:
> > > + mana_pre_dealloc_rxbufs(apc);
> > > +
> > > + if (schedule_port_reset)
> > > + queue_work(apc->ac->per_port_queue_reset_wq,
> > > + &apc->queue_reset_work);
> > > +
> > > + return err;
> > > +}
> >
> > [High]
> > Can this failure path oops inside the reset work handler?
> >
> > When mana_attach() fails after mana_detach() has already torn the vport
> > down, apc->tx_qp has been set to NULL by mana_destroy_txq() and apc->rxqs
> > has been cleared by mana_cleanup_port_context(). This code then forces
> > apc->port_is_up back to true and queues apc->queue_reset_work:
> >
> > apc->port_is_up = true;
> > schedule_port_reset = true;
> >
> > When mana_per_port_queue_reset_work_handler runs it calls
> > mana_detach(ndev, false), which latches port_st_save = port_is_up = true
> > and, because port_st_save is true, invokes mana_dealloc_queues(). The TX
> > drain loop there does:
> >
> > txq = &apc->tx_qp[i].txq;
> > atomic_read(&txq->pending_sends);
> >
> > with no NULL check on apc->tx_qp, and mana_destroy_vport further down
> > iterates apc->rxqs[rxq_idx] while rxqs is NULL. Would that produce a
> > NULL pointer dereference in the workqueue worker whenever mana_attach()
> > fails here (for example on an allocation failure in
> > mana_init_port_context or mana_create_txq, or a transient HW command
> > failure in mana_query_vport_cfg / mana_cfg_vport)?
> >
> > The mana_detach() failure path above (goto out) reaches
> > mana_dealloc_queues() through the same chain and looks to have the same
> > exposure.
> Hi Jakub,
>
> Thanks for your comments. I have sent a separate fixes series to the
> net tree that fixes the possible NULL pointer dereference issue and also
> makes the reset handler safe for the case where it runs after a failed
> attach:
> https://lore.kernel.org/all/20260518194654.735580-1-dipayanroy@linux.microsoft.com/
>
>
> >
> > For comparison, mana_change_mtu() handles a mana_attach() failure by
> > returning the error without scheduling a reset. Would a similar
> > treatment here avoid the asynchronous oops, or is there a reason the
> > reset must be scheduled in this specific failure case?
> > --
> > pw-bot: cr
>
> The full-page-rx private flag is intended to be driven by a udev rule
> that fires automatically during VM provisioning on affected platforms.
> If there is a transient failure, the VM fails to provision, requiring manual
> intervention. The reset handler retries the attach, giving the port a
> chance to recover to the default config autonomously.
>
> Regards
> Dipayaan Roy
Hi Jakub,
The prerequisite fix patches have now been accepted:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=17bfe0a8c014
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=5b05aa36ee24
Can this series be merged now? Let me know if it needs a rebase or
anything else.
Regards
Dipayaan Roy
* Re: [PATCH v3 1/1] drm/hyperv: Use "hv_drm_" as symbol name prefix
From: sashiko-bot @ 2026-05-29 1:59 UTC (permalink / raw)
To: Michael Kelley; +Cc: linux-hyperv
In-Reply-To: <20260529014826.41256-1-mhklkml@zohomail.com>
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [High] Missing cleanup of PCI driver on VMBus registration failure in module init.
- [High] Flawed synchronous VMBus message handling: missing reinit_completion() and ignored send errors.
--
commit a50bc353055fe2753b3f284aeec394019513c888
Author: Michael Kelley <mhklinux@outlook.com>
Subject: drm/hyperv: Use "hv_drm_" as symbol name prefix
This commit mechanically substitutes the "hyperv_" prefix with "hv_drm_"
for all functions and structures in the Hyper-V DRM driver. This avoids
namespace conflicts with generic Hyper-V initialization code.
> diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
> index b6bf6412ae34a..20f35c48c0b82 100644
> --- a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
> +++ b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
[ ... ]
> -static int __init hyperv_init(void)
> +static int __init hv_drm_init(void)
> {
> int ret;
>
> if (drm_firmware_drivers_only())
> return -ENODEV;
>
> - ret = pci_register_driver(&hyperv_pci_driver);
> + ret = pci_register_driver(&hv_drm_pci_driver);
> if (ret != 0)
> return ret;
>
> - return vmbus_driver_register(&hyperv_hv_driver);
> + return vmbus_driver_register(&hv_drm_hv_driver);
> }
[Severity: High]
This is a pre-existing issue, but if vmbus_driver_register() fails, does this
leave the PCI driver registered while the module initialization fails?
If the module initialization returns an error and the kernel frees the
module's memory, the PCI subsystem might retain a dangling pointer to the
freed hv_drm_pci_driver structure. Could this lead to a kernel panic or
use-after-free if the PCI driver list is traversed later?
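The unwind being asked about can be sketched in plain userspace C. This is
only an illustration of the pattern, not the driver's code: the fake_*()
helpers and sketch_init() are hypothetical stand-ins for
pci_register_driver(), pci_unregister_driver(), vmbus_driver_register() and
the module init function.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel registration calls. */
static bool pci_registered;
static int vmbus_result;	/* value the fake VMBus registration returns */

static int fake_pci_register(void)    { pci_registered = true; return 0; }
static void fake_pci_unregister(void) { pci_registered = false; }
static int fake_vmbus_register(void)  { return vmbus_result; }

/* Init routine that undoes the first registration if the second fails. */
static int sketch_init(void)
{
	int ret;

	ret = fake_pci_register();
	if (ret)
		return ret;

	ret = fake_vmbus_register();
	if (ret)
		fake_pci_unregister();	/* unwind step 1 before reporting failure */

	return ret;
}
```

With this shape, a failing second registration leaves nothing registered
when the error is returned to the caller.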
> diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> index 6e09b0218df4e..f0ef627b4898a 100644
> --- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> +++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
[ ... ]
> -static int hyperv_negotiate_version(struct hv_device *hdev, u32 ver)
> +static int hv_drm_negotiate_version(struct hv_device *hdev, u32 ver)
> {
> - struct hyperv_drm_device *hv = hv_get_drvdata(hdev);
> + struct hv_drm_device *hv = hv_get_drvdata(hdev);
> struct synthvid_msg *msg = (struct synthvid_msg *)hv->init_buf;
> struct drm_device *dev = &hv->dev;
> unsigned long t;
> @@ -223,7 +223,7 @@ static int hyperv_negotiate_version(struct hv_device *hdev, u32 ver)
> msg->vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
> sizeof(struct synthvid_version_req);
> msg->ver_req.version = ver;
> - hyperv_sendpacket(hdev, msg);
> + hv_drm_sendpacket(hdev, msg);
>
> t = wait_for_completion_timeout(&hv->wait, VMBUS_VSP_TIMEOUT);
[Severity: High]
This isn't a bug introduced by this patch, but since the driver only
initializes the completion once during probe, is it possible for it to become
stale here?
If a previous request timed out and the host responds later, the completion
could remain signaled. This would cause a subsequent request to immediately
return from wait_for_completion_timeout() and process incorrect or stale data.
Should there be a call to reinit_completion() before sending the packet?
Additionally, since hv_drm_sendpacket() errors are ignored, does the driver
unnecessarily hang in wait_for_completion_timeout() for the full timeout
duration if the packet completely failed to send?
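A minimal single-threaded model of both questions, assuming a completion can
be approximated by a counter. None of these names are kernel APIs:
fake_completion, fake_complete(), fake_reinit(), fake_try_wait(), fake_send()
and sketch_request() are hypothetical, and -110 merely models -ETIMEDOUT.

```c
#include <stdbool.h>

/* Hypothetical, single-threaded model of a completion: just a counter. */
struct fake_completion {
	unsigned int done;
};

static void fake_complete(struct fake_completion *c) { c->done++; }
static void fake_reinit(struct fake_completion *c)   { c->done = 0; }

/* Returns true if a signal was pending and consumes it; never blocks. */
static bool fake_try_wait(struct fake_completion *c)
{
	if (!c->done)
		return false;
	c->done--;
	return true;
}

/* Hypothetical send: returns 0 on success or a negative error code. */
static int fake_send(int simulated_result) { return simulated_result; }

/*
 * Request helper: clear any stale signal first, then send, and only
 * wait for a reply if the send actually succeeded.
 */
static int sketch_request(struct fake_completion *c, int send_result)
{
	int ret;

	fake_reinit(c);
	ret = fake_send(send_result);
	if (ret)
		return ret;

	return fake_try_wait(c) ? 0 : -110;	/* -110 models -ETIMEDOUT */
}
```

In this model, a late reply to an earlier timed-out request no longer
satisfies the next wait, and a failed send returns immediately instead of
waiting out the timeout.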
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260529014826.41256-1-mhklkml@zohomail.com?part=1
* [PATCH v3 1/1] drm/hyperv: Use "hv_drm_" as symbol name prefix
From: Michael Kelley @ 2026-05-29 1:48 UTC (permalink / raw)
To: maarten.lankhorst, mripard, tzimmermann, airlied, simona, decui,
longli, ssengar
Cc: dri-devel, linux-kernel, linux-hyperv
From: Michael Kelley <mhklinux@outlook.com>
Function and structure names in the Hyper-V DRM driver currently
use "hyperv_" as the prefix. This conflicts with usage in core Hyper-V
and VMBus code, and incorrectly implies that functions and structures
in this driver apply generically to Hyper-V. A specific conflict arises
for "hyperv_init", which is an initcall for generic Hyper-V
initialization on arm64. The conflict prevents the use of
initcall_blacklist on the kernel boot line to skip loading this driver.
Fix this by substituting "hv_drm_" as the prefix for all functions and
structures in this driver. In most places, this is replacing "hyperv_"
with "hv_drm_". In a few places, the substitution results in
"hv_drm_drm_", which has been collapsed to just "hv_drm_". In two
cases, the existing prefix is a bare "hv" (including in the to_hv()
macro), which has been replaced with "hv_drm" for consistency.
The changes are all mechanical text substitution in symbol names.
There are no other code or functional changes.
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
---
This patch is built against linux-next20260526.
Changes in v3:
* Also s/to_hv/to_hv_drm/ since the to_hv() macro is essentially
referencing the prefix. [Dexuan Cui]
Changes in v2:
* Use "hv_drm_" as the new prefix instead of "hvdrm_". [Hamza Mahfooz]
* After the new prefix is applied, collapse occurrences of "hv_drm_drm_"
to just "hv_drm_", such as with hv_drm_device. [Hamza Mahfooz]
* Don't change comments referring to source code filenames. [Dexuan Cui]
* Change hv_fops to hv_drm_fops for consistency.
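
For illustration only, the two main substitution rules described in the
commit message can be sketched as a small userspace C helper. The function
name sketch_rename() is hypothetical and was not used to produce this patch.
It does not cover the two bare "hv" cases (to_hv() and hv_fops).

```c
#include <stdio.h>
#include <string.h>

/*
 * Illustrative only: rewrite one symbol name using the two rules from the
 * commit message. "out" must be large enough to hold the result.
 */
static void sketch_rename(const char *in, char *out, size_t outsz)
{
	const char *old_prefix = "hyperv_";
	const char *new_prefix = "hv_drm_";
	const char *doubled = "hv_drm_drm_";

	/* Rule 1: a leading "hyperv_" becomes "hv_drm_". */
	if (strncmp(in, old_prefix, strlen(old_prefix)) == 0)
		snprintf(out, outsz, "%s%s", new_prefix,
			 in + strlen(old_prefix));
	else
		snprintf(out, outsz, "%s", in);

	/* Rule 2: collapse a resulting "hv_drm_drm_" to "hv_drm_". */
	if (strncmp(out, doubled, strlen(doubled)) == 0)
		memmove(out + strlen(new_prefix), out + strlen(doubled),
			strlen(out + strlen(doubled)) + 1);
}
```

Under these rules, "hyperv_init" maps to "hv_drm_init" and
"hyperv_drm_device" maps to "hv_drm_device", matching the renames in the
diff below.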
drivers/gpu/drm/hyperv/hyperv_drm.h | 16 +--
drivers/gpu/drm/hyperv/hyperv_drm_drv.c | 92 ++++++++--------
drivers/gpu/drm/hyperv/hyperv_drm_modeset.c | 110 ++++++++++----------
drivers/gpu/drm/hyperv/hyperv_drm_proto.c | 70 ++++++-------
4 files changed, 144 insertions(+), 144 deletions(-)
diff --git a/drivers/gpu/drm/hyperv/hyperv_drm.h b/drivers/gpu/drm/hyperv/hyperv_drm.h
index 9e776112c03e..78136ec2c2f4 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm.h
+++ b/drivers/gpu/drm/hyperv/hyperv_drm.h
@@ -8,7 +8,7 @@
#define VMBUS_MAX_PACKET_SIZE 0x4000
-struct hyperv_drm_device {
+struct hv_drm_device {
/* drm */
struct drm_device dev;
struct drm_plane plane;
@@ -39,17 +39,17 @@ struct hyperv_drm_device {
struct hv_device *hdev;
};
-#define to_hv(_dev) container_of(_dev, struct hyperv_drm_device, dev)
+#define to_hv_drm(_dev) container_of(_dev, struct hv_drm_device, dev)
/* hyperv_drm_modeset */
-int hyperv_mode_config_init(struct hyperv_drm_device *hv);
+int hv_drm_mode_config_init(struct hv_drm_device *hv);
/* hyperv_drm_proto */
-int hyperv_update_vram_location(struct hv_device *hdev, phys_addr_t vram_pp);
-int hyperv_update_situation(struct hv_device *hdev, u8 active, u32 bpp,
+int hv_drm_update_vram_location(struct hv_device *hdev, phys_addr_t vram_pp);
+int hv_drm_update_situation(struct hv_device *hdev, u8 active, u32 bpp,
u32 w, u32 h, u32 pitch);
-int hyperv_hide_hw_ptr(struct hv_device *hdev);
-int hyperv_update_dirt(struct hv_device *hdev, struct drm_rect *rect);
-int hyperv_connect_vsp(struct hv_device *hdev);
+int hv_drm_hide_hw_ptr(struct hv_device *hdev);
+int hv_drm_update_dirt(struct hv_device *hdev, struct drm_rect *rect);
+int hv_drm_connect_vsp(struct hv_device *hdev);
#endif
diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
index b6bf6412ae34..20f35c48c0b8 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
@@ -24,9 +24,9 @@
#define DRIVER_MAJOR 1
#define DRIVER_MINOR 0
-DEFINE_DRM_GEM_FOPS(hv_fops);
+DEFINE_DRM_GEM_FOPS(hv_drm_fops);
-static struct drm_driver hyperv_driver = {
+static struct drm_driver hv_drm_driver = {
.driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC,
.name = DRIVER_NAME,
@@ -34,22 +34,22 @@ static struct drm_driver hyperv_driver = {
.major = DRIVER_MAJOR,
.minor = DRIVER_MINOR,
- .fops = &hv_fops,
+ .fops = &hv_drm_fops,
DRM_GEM_SHMEM_DRIVER_OPS,
DRM_FBDEV_SHMEM_DRIVER_OPS,
};
-static int hyperv_pci_probe(struct pci_dev *pdev,
+static int hv_drm_pci_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
return 0;
}
-static void hyperv_pci_remove(struct pci_dev *pdev)
+static void hv_drm_pci_remove(struct pci_dev *pdev)
{
}
-static const struct pci_device_id hyperv_pci_tbl[] = {
+static const struct pci_device_id hv_drm_pci_tbl[] = {
{
.vendor = PCI_VENDOR_ID_MICROSOFT,
.device = PCI_DEVICE_ID_HYPERV_VIDEO,
@@ -60,14 +60,14 @@ static const struct pci_device_id hyperv_pci_tbl[] = {
/*
* PCI stub to support gen1 VM.
*/
-static struct pci_driver hyperv_pci_driver = {
+static struct pci_driver hv_drm_pci_driver = {
.name = KBUILD_MODNAME,
- .id_table = hyperv_pci_tbl,
- .probe = hyperv_pci_probe,
- .remove = hyperv_pci_remove,
+ .id_table = hv_drm_pci_tbl,
+ .probe = hv_drm_pci_probe,
+ .remove = hv_drm_pci_remove,
};
-static int hyperv_setup_vram(struct hyperv_drm_device *hv,
+static int hv_drm_setup_vram(struct hv_drm_device *hv,
struct hv_device *hdev)
{
struct drm_device *dev = &hv->dev;
@@ -102,15 +102,15 @@ static int hyperv_setup_vram(struct hyperv_drm_device *hv,
return ret;
}
-static int hyperv_vmbus_probe(struct hv_device *hdev,
+static int hv_drm_vmbus_probe(struct hv_device *hdev,
const struct hv_vmbus_device_id *dev_id)
{
- struct hyperv_drm_device *hv;
+ struct hv_drm_device *hv;
struct drm_device *dev;
int ret;
- hv = devm_drm_dev_alloc(&hdev->device, &hyperv_driver,
- struct hyperv_drm_device, dev);
+ hv = devm_drm_dev_alloc(&hdev->device, &hv_drm_driver,
+ struct hv_drm_device, dev);
if (IS_ERR(hv))
return PTR_ERR(hv);
@@ -119,15 +119,15 @@ static int hyperv_vmbus_probe(struct hv_device *hdev,
hv_set_drvdata(hdev, hv);
hv->hdev = hdev;
- ret = hyperv_connect_vsp(hdev);
+ ret = hv_drm_connect_vsp(hdev);
if (ret) {
drm_err(dev, "Failed to connect to vmbus.\n");
goto err_hv_set_drv_data;
}
- aperture_remove_all_conflicting_devices(hyperv_driver.name);
+ aperture_remove_all_conflicting_devices(hv_drm_driver.name);
- ret = hyperv_setup_vram(hv, hdev);
+ ret = hv_drm_setup_vram(hv, hdev);
if (ret)
goto err_vmbus_close;
@@ -136,11 +136,11 @@ static int hyperv_vmbus_probe(struct hv_device *hdev,
* vram location is not fatal. Device will update dirty area till
* preferred resolution only.
*/
- ret = hyperv_update_vram_location(hdev, hv->fb_base);
+ ret = hv_drm_update_vram_location(hdev, hv->fb_base);
if (ret)
drm_warn(dev, "Failed to update vram location.\n");
- ret = hyperv_mode_config_init(hv);
+ ret = hv_drm_mode_config_init(hv);
if (ret)
goto err_free_mmio;
@@ -168,10 +168,10 @@ static int hyperv_vmbus_probe(struct hv_device *hdev,
return ret;
}
-static void hyperv_vmbus_remove(struct hv_device *hdev)
+static void hv_drm_vmbus_remove(struct hv_device *hdev)
{
struct drm_device *dev = hv_get_drvdata(hdev);
- struct hyperv_drm_device *hv = to_hv(dev);
+ struct hv_drm_device *hv = to_hv_drm(dev);
vmbus_set_skip_unload(false);
drm_dev_unplug(dev);
@@ -183,12 +183,12 @@ static void hyperv_vmbus_remove(struct hv_device *hdev)
vmbus_free_mmio(hv->mem->start, hv->fb_size);
}
-static void hyperv_vmbus_shutdown(struct hv_device *hdev)
+static void hv_drm_vmbus_shutdown(struct hv_device *hdev)
{
drm_atomic_helper_shutdown(hv_get_drvdata(hdev));
}
-static int hyperv_vmbus_suspend(struct hv_device *hdev)
+static int hv_drm_vmbus_suspend(struct hv_device *hdev)
{
struct drm_device *dev = hv_get_drvdata(hdev);
int ret;
@@ -202,67 +202,67 @@ static int hyperv_vmbus_suspend(struct hv_device *hdev)
return 0;
}
-static int hyperv_vmbus_resume(struct hv_device *hdev)
+static int hv_drm_vmbus_resume(struct hv_device *hdev)
{
struct drm_device *dev = hv_get_drvdata(hdev);
- struct hyperv_drm_device *hv = to_hv(dev);
+ struct hv_drm_device *hv = to_hv_drm(dev);
int ret;
- ret = hyperv_connect_vsp(hdev);
+ ret = hv_drm_connect_vsp(hdev);
if (ret)
return ret;
- ret = hyperv_update_vram_location(hdev, hv->fb_base);
+ ret = hv_drm_update_vram_location(hdev, hv->fb_base);
if (ret)
return ret;
return drm_mode_config_helper_resume(dev);
}
-static const struct hv_vmbus_device_id hyperv_vmbus_tbl[] = {
+static const struct hv_vmbus_device_id hv_drm_vmbus_tbl[] = {
/* Synthetic Video Device GUID */
{HV_SYNTHVID_GUID},
{}
};
-static struct hv_driver hyperv_hv_driver = {
+static struct hv_driver hv_drm_hv_driver = {
.name = KBUILD_MODNAME,
- .id_table = hyperv_vmbus_tbl,
- .probe = hyperv_vmbus_probe,
- .remove = hyperv_vmbus_remove,
- .shutdown = hyperv_vmbus_shutdown,
- .suspend = hyperv_vmbus_suspend,
- .resume = hyperv_vmbus_resume,
+ .id_table = hv_drm_vmbus_tbl,
+ .probe = hv_drm_vmbus_probe,
+ .remove = hv_drm_vmbus_remove,
+ .shutdown = hv_drm_vmbus_shutdown,
+ .suspend = hv_drm_vmbus_suspend,
+ .resume = hv_drm_vmbus_resume,
.driver = {
.probe_type = PROBE_PREFER_ASYNCHRONOUS,
},
};
-static int __init hyperv_init(void)
+static int __init hv_drm_init(void)
{
int ret;
if (drm_firmware_drivers_only())
return -ENODEV;
- ret = pci_register_driver(&hyperv_pci_driver);
+ ret = pci_register_driver(&hv_drm_pci_driver);
if (ret != 0)
return ret;
- return vmbus_driver_register(&hyperv_hv_driver);
+ return vmbus_driver_register(&hv_drm_hv_driver);
}
-static void __exit hyperv_exit(void)
+static void __exit hv_drm_exit(void)
{
- vmbus_driver_unregister(&hyperv_hv_driver);
- pci_unregister_driver(&hyperv_pci_driver);
+ vmbus_driver_unregister(&hv_drm_hv_driver);
+ pci_unregister_driver(&hv_drm_pci_driver);
}
-module_init(hyperv_init);
-module_exit(hyperv_exit);
+module_init(hv_drm_init);
+module_exit(hv_drm_exit);
-MODULE_DEVICE_TABLE(pci, hyperv_pci_tbl);
-MODULE_DEVICE_TABLE(vmbus, hyperv_vmbus_tbl);
+MODULE_DEVICE_TABLE(pci, hv_drm_pci_tbl);
+MODULE_DEVICE_TABLE(vmbus, hv_drm_vmbus_tbl);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Deepak Rawat <drawat.floss@gmail.com>");
MODULE_DESCRIPTION("DRM driver for Hyper-V synthetic video device");
diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c b/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
index 793dbbf61893..1855749c1e41 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
@@ -25,11 +25,11 @@
#include "hyperv_drm.h"
-static int hyperv_blit_to_vram_rect(struct drm_framebuffer *fb,
+static int hv_drm_blit_to_vram_rect(struct drm_framebuffer *fb,
const struct iosys_map *vmap,
struct drm_rect *rect)
{
- struct hyperv_drm_device *hv = to_hv(fb->dev);
+ struct hv_drm_device *hv = to_hv_drm(fb->dev);
struct iosys_map dst = IOSYS_MAP_INIT_VADDR_IOMEM(hv->vram);
int idx;
@@ -44,9 +44,9 @@ static int hyperv_blit_to_vram_rect(struct drm_framebuffer *fb,
return 0;
}
-static int hyperv_connector_get_modes(struct drm_connector *connector)
+static int hv_drm_connector_get_modes(struct drm_connector *connector)
{
- struct hyperv_drm_device *hv = to_hv(connector->dev);
+ struct hv_drm_device *hv = to_hv_drm(connector->dev);
int count;
count = drm_add_modes_noedid(connector,
@@ -58,11 +58,11 @@ static int hyperv_connector_get_modes(struct drm_connector *connector)
return count;
}
-static const struct drm_connector_helper_funcs hyperv_connector_helper_funcs = {
- .get_modes = hyperv_connector_get_modes,
+static const struct drm_connector_helper_funcs hv_drm_connector_helper_funcs = {
+ .get_modes = hv_drm_connector_get_modes,
};
-static const struct drm_connector_funcs hyperv_connector_funcs = {
+static const struct drm_connector_funcs hv_drm_connector_funcs = {
.fill_modes = drm_helper_probe_single_connector_modes,
.destroy = drm_connector_cleanup,
.reset = drm_atomic_helper_connector_reset,
@@ -70,15 +70,15 @@ static const struct drm_connector_funcs hyperv_connector_funcs = {
.atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
};
-static inline int hyperv_conn_init(struct hyperv_drm_device *hv)
+static inline int hv_drm_conn_init(struct hv_drm_device *hv)
{
- drm_connector_helper_add(&hv->connector, &hyperv_connector_helper_funcs);
+ drm_connector_helper_add(&hv->connector, &hv_drm_connector_helper_funcs);
return drm_connector_init(&hv->dev, &hv->connector,
- &hyperv_connector_funcs,
+ &hv_drm_connector_funcs,
DRM_MODE_CONNECTOR_VIRTUAL);
}
-static int hyperv_check_size(struct hyperv_drm_device *hv, int w, int h,
+static int hv_drm_check_size(struct hv_drm_device *hv, int w, int h,
struct drm_framebuffer *fb)
{
u32 pitch = w * (hv->screen_depth / 8);
@@ -92,25 +92,25 @@ static int hyperv_check_size(struct hyperv_drm_device *hv, int w, int h,
return 0;
}
-static const uint32_t hyperv_formats[] = {
+static const uint32_t hv_drm_formats[] = {
DRM_FORMAT_XRGB8888,
};
-static const uint64_t hyperv_modifiers[] = {
+static const uint64_t hv_drm_modifiers[] = {
DRM_FORMAT_MOD_LINEAR,
DRM_FORMAT_MOD_INVALID
};
-static void hyperv_crtc_helper_atomic_enable(struct drm_crtc *crtc,
+static void hv_drm_crtc_helper_atomic_enable(struct drm_crtc *crtc,
struct drm_atomic_commit *state)
{
- struct hyperv_drm_device *hv = to_hv(crtc->dev);
+ struct hv_drm_device *hv = to_hv_drm(crtc->dev);
struct drm_plane *plane = &hv->plane;
struct drm_plane_state *plane_state = plane->state;
struct drm_crtc_state *crtc_state = crtc->state;
- hyperv_hide_hw_ptr(hv->hdev);
- hyperv_update_situation(hv->hdev, 1, hv->screen_depth,
+ hv_drm_hide_hw_ptr(hv->hdev);
+ hv_drm_update_situation(hv->hdev, 1, hv->screen_depth,
crtc_state->mode.hdisplay,
crtc_state->mode.vdisplay,
plane_state->fb->pitches[0]);
@@ -118,14 +118,14 @@ static void hyperv_crtc_helper_atomic_enable(struct drm_crtc *crtc,
drm_crtc_vblank_on(crtc);
}
-static const struct drm_crtc_helper_funcs hyperv_crtc_helper_funcs = {
+static const struct drm_crtc_helper_funcs hv_drm_crtc_helper_funcs = {
.atomic_check = drm_crtc_helper_atomic_check,
.atomic_flush = drm_crtc_vblank_atomic_flush,
- .atomic_enable = hyperv_crtc_helper_atomic_enable,
+ .atomic_enable = hv_drm_crtc_helper_atomic_enable,
.atomic_disable = drm_crtc_vblank_atomic_disable,
};
-static const struct drm_crtc_funcs hyperv_crtc_funcs = {
+static const struct drm_crtc_funcs hv_drm_crtc_funcs = {
.reset = drm_atomic_helper_crtc_reset,
.destroy = drm_crtc_cleanup,
.set_config = drm_atomic_helper_set_config,
@@ -135,11 +135,11 @@ static const struct drm_crtc_funcs hyperv_crtc_funcs = {
DRM_CRTC_VBLANK_TIMER_FUNCS,
};
-static int hyperv_plane_atomic_check(struct drm_plane *plane,
+static int hv_drm_plane_atomic_check(struct drm_plane *plane,
struct drm_atomic_commit *state)
{
struct drm_plane_state *plane_state = drm_atomic_get_new_plane_state(state, plane);
- struct hyperv_drm_device *hv = to_hv(plane->dev);
+ struct hv_drm_device *hv = to_hv_drm(plane->dev);
struct drm_framebuffer *fb = plane_state->fb;
struct drm_crtc *crtc = plane_state->crtc;
struct drm_crtc_state *crtc_state = NULL;
@@ -167,10 +167,10 @@ static int hyperv_plane_atomic_check(struct drm_plane *plane,
return 0;
}
-static void hyperv_plane_atomic_update(struct drm_plane *plane,
+static void hv_drm_plane_atomic_update(struct drm_plane *plane,
struct drm_atomic_commit *state)
{
- struct hyperv_drm_device *hv = to_hv(plane->dev);
+ struct hv_drm_device *hv = to_hv_drm(plane->dev);
struct drm_plane_state *old_state = drm_atomic_get_old_plane_state(state, plane);
struct drm_plane_state *new_state = drm_atomic_get_new_plane_state(state, plane);
struct drm_shadow_plane_state *shadow_plane_state = to_drm_shadow_plane_state(new_state);
@@ -185,15 +185,15 @@ static void hyperv_plane_atomic_update(struct drm_plane *plane,
if (!drm_rect_intersect(&dst_clip, &damage))
continue;
- hyperv_blit_to_vram_rect(new_state->fb, &shadow_plane_state->data[0], &damage);
- hyperv_update_dirt(hv->hdev, &damage);
+ hv_drm_blit_to_vram_rect(new_state->fb, &shadow_plane_state->data[0], &damage);
+ hv_drm_update_dirt(hv->hdev, &damage);
}
}
-static int hyperv_plane_get_scanout_buffer(struct drm_plane *plane,
+static int hv_drm_plane_get_scanout_buffer(struct drm_plane *plane,
struct drm_scanout_buffer *sb)
{
- struct hyperv_drm_device *hv = to_hv(plane->dev);
+ struct hv_drm_device *hv = to_hv_drm(plane->dev);
struct iosys_map map = IOSYS_MAP_INIT_VADDR_IOMEM(hv->vram);
if (plane->state && plane->state->fb) {
@@ -207,9 +207,9 @@ static int hyperv_plane_get_scanout_buffer(struct drm_plane *plane,
return -ENODEV;
}
-static void hyperv_plane_panic_flush(struct drm_plane *plane)
+static void hv_drm_plane_panic_flush(struct drm_plane *plane)
{
- struct hyperv_drm_device *hv = to_hv(plane->dev);
+ struct hv_drm_device *hv = to_hv_drm(plane->dev);
struct drm_rect rect;
if (plane->state && plane->state->fb) {
@@ -218,32 +218,32 @@ static void hyperv_plane_panic_flush(struct drm_plane *plane)
rect.x2 = plane->state->fb->width;
rect.y2 = plane->state->fb->height;
- hyperv_update_dirt(hv->hdev, &rect);
+ hv_drm_update_dirt(hv->hdev, &rect);
}
vmbus_initiate_unload(true);
}
-static const struct drm_plane_helper_funcs hyperv_plane_helper_funcs = {
+static const struct drm_plane_helper_funcs hv_drm_plane_helper_funcs = {
DRM_GEM_SHADOW_PLANE_HELPER_FUNCS,
- .atomic_check = hyperv_plane_atomic_check,
- .atomic_update = hyperv_plane_atomic_update,
- .get_scanout_buffer = hyperv_plane_get_scanout_buffer,
- .panic_flush = hyperv_plane_panic_flush,
+ .atomic_check = hv_drm_plane_atomic_check,
+ .atomic_update = hv_drm_plane_atomic_update,
+ .get_scanout_buffer = hv_drm_plane_get_scanout_buffer,
+ .panic_flush = hv_drm_plane_panic_flush,
};
-static const struct drm_plane_funcs hyperv_plane_funcs = {
+static const struct drm_plane_funcs hv_drm_plane_funcs = {
.update_plane = drm_atomic_helper_update_plane,
.disable_plane = drm_atomic_helper_disable_plane,
.destroy = drm_plane_cleanup,
DRM_GEM_SHADOW_PLANE_FUNCS,
};
-static const struct drm_encoder_funcs hyperv_drm_simple_encoder_funcs_cleanup = {
+static const struct drm_encoder_funcs hv_drm_simple_encoder_funcs_cleanup = {
.destroy = drm_encoder_cleanup,
};
-static inline int hyperv_pipe_init(struct hyperv_drm_device *hv)
+static inline int hv_drm_pipe_init(struct hv_drm_device *hv)
{
struct drm_device *dev = &hv->dev;
struct drm_encoder *encoder = &hv->encoder;
@@ -253,29 +253,29 @@ static inline int hyperv_pipe_init(struct hyperv_drm_device *hv)
int ret;
ret = drm_universal_plane_init(dev, plane, 0,
- &hyperv_plane_funcs,
- hyperv_formats, ARRAY_SIZE(hyperv_formats),
- hyperv_modifiers,
+ &hv_drm_plane_funcs,
+ hv_drm_formats, ARRAY_SIZE(hv_drm_formats),
+ hv_drm_modifiers,
DRM_PLANE_TYPE_PRIMARY, NULL);
if (ret)
return ret;
- drm_plane_helper_add(plane, &hyperv_plane_helper_funcs);
+ drm_plane_helper_add(plane, &hv_drm_plane_helper_funcs);
drm_plane_enable_fb_damage_clips(plane);
ret = drm_crtc_init_with_planes(dev, crtc, plane, NULL,
- &hyperv_crtc_funcs, NULL);
+ &hv_drm_crtc_funcs, NULL);
if (ret)
return ret;
- drm_crtc_helper_add(crtc, &hyperv_crtc_helper_funcs);
+ drm_crtc_helper_add(crtc, &hv_drm_crtc_helper_funcs);
encoder->possible_crtcs = drm_crtc_mask(crtc);
ret = drm_encoder_init(dev, encoder,
- &hyperv_drm_simple_encoder_funcs_cleanup,
+ &hv_drm_simple_encoder_funcs_cleanup,
DRM_MODE_ENCODER_NONE, NULL);
if (ret)
return ret;
- ret = hyperv_conn_init(hv);
+ ret = hv_drm_conn_init(hv);
if (ret) {
drm_err(dev, "Failed to initialized connector.\n");
return ret;
@@ -285,25 +285,25 @@ static inline int hyperv_pipe_init(struct hyperv_drm_device *hv)
}
static enum drm_mode_status
-hyperv_mode_valid(struct drm_device *dev,
+hv_drm_mode_valid(struct drm_device *dev,
const struct drm_display_mode *mode)
{
- struct hyperv_drm_device *hv = to_hv(dev);
+ struct hv_drm_device *hv = to_hv_drm(dev);
- if (hyperv_check_size(hv, mode->hdisplay, mode->vdisplay, NULL))
+ if (hv_drm_check_size(hv, mode->hdisplay, mode->vdisplay, NULL))
return MODE_BAD;
return MODE_OK;
}
-static const struct drm_mode_config_funcs hyperv_mode_config_funcs = {
+static const struct drm_mode_config_funcs hv_drm_mode_config_funcs = {
.fb_create = drm_gem_fb_create_with_dirty,
- .mode_valid = hyperv_mode_valid,
+ .mode_valid = hv_drm_mode_valid,
.atomic_check = drm_atomic_helper_check,
.atomic_commit = drm_atomic_helper_commit,
};
-int hyperv_mode_config_init(struct hyperv_drm_device *hv)
+int hv_drm_mode_config_init(struct hv_drm_device *hv)
{
struct drm_device *dev = &hv->dev;
int ret;
@@ -322,9 +322,9 @@ int hyperv_mode_config_init(struct hyperv_drm_device *hv)
dev->mode_config.preferred_depth = hv->screen_depth;
dev->mode_config.prefer_shadow = 0;
- dev->mode_config.funcs = &hyperv_mode_config_funcs;
+ dev->mode_config.funcs = &hv_drm_mode_config_funcs;
- ret = hyperv_pipe_init(hv);
+ ret = hv_drm_pipe_init(hv);
if (ret) {
drm_err(dev, "Failed to initialized pipe.\n");
return ret;
diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
index 6e09b0218df4..f0ef627b4898 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
@@ -181,7 +181,7 @@ struct synthvid_msg {
};
} __packed;
-static inline bool hyperv_version_ge(u32 ver1, u32 ver2)
+static inline bool hv_drm_version_ge(u32 ver1, u32 ver2)
{
if (SYNTHVID_VER_GET_MAJOR(ver1) > SYNTHVID_VER_GET_MAJOR(ver2) ||
(SYNTHVID_VER_GET_MAJOR(ver1) == SYNTHVID_VER_GET_MAJOR(ver2) &&
@@ -191,10 +191,10 @@ static inline bool hyperv_version_ge(u32 ver1, u32 ver2)
return false;
}
-static inline int hyperv_sendpacket(struct hv_device *hdev, struct synthvid_msg *msg)
+static inline int hv_drm_sendpacket(struct hv_device *hdev, struct synthvid_msg *msg)
{
static atomic64_t request_id = ATOMIC64_INIT(0);
- struct hyperv_drm_device *hv = hv_get_drvdata(hdev);
+ struct hv_drm_device *hv = hv_get_drvdata(hdev);
int ret;
msg->pipe_hdr.type = PIPE_MSG_DATA;
@@ -211,9 +211,9 @@ static inline int hyperv_sendpacket(struct hv_device *hdev, struct synthvid_msg
return ret;
}
-static int hyperv_negotiate_version(struct hv_device *hdev, u32 ver)
+static int hv_drm_negotiate_version(struct hv_device *hdev, u32 ver)
{
- struct hyperv_drm_device *hv = hv_get_drvdata(hdev);
+ struct hv_drm_device *hv = hv_get_drvdata(hdev);
struct synthvid_msg *msg = (struct synthvid_msg *)hv->init_buf;
struct drm_device *dev = &hv->dev;
unsigned long t;
@@ -223,7 +223,7 @@ static int hyperv_negotiate_version(struct hv_device *hdev, u32 ver)
msg->vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
sizeof(struct synthvid_version_req);
msg->ver_req.version = ver;
- hyperv_sendpacket(hdev, msg);
+ hv_drm_sendpacket(hdev, msg);
t = wait_for_completion_timeout(&hv->wait, VMBUS_VSP_TIMEOUT);
if (!t) {
@@ -243,9 +243,9 @@ static int hyperv_negotiate_version(struct hv_device *hdev, u32 ver)
return 0;
}
-int hyperv_update_vram_location(struct hv_device *hdev, phys_addr_t vram_pp)
+int hv_drm_update_vram_location(struct hv_device *hdev, phys_addr_t vram_pp)
{
- struct hyperv_drm_device *hv = hv_get_drvdata(hdev);
+ struct hv_drm_device *hv = hv_get_drvdata(hdev);
struct synthvid_msg *msg = (struct synthvid_msg *)hv->init_buf;
struct drm_device *dev = &hv->dev;
unsigned long t;
@@ -257,7 +257,7 @@ int hyperv_update_vram_location(struct hv_device *hdev, phys_addr_t vram_pp)
msg->vram.user_ctx = vram_pp;
msg->vram.vram_gpa = vram_pp;
msg->vram.is_vram_gpa_specified = 1;
- hyperv_sendpacket(hdev, msg);
+ hv_drm_sendpacket(hdev, msg);
t = wait_for_completion_timeout(&hv->wait, VMBUS_VSP_TIMEOUT);
if (!t) {
@@ -272,7 +272,7 @@ int hyperv_update_vram_location(struct hv_device *hdev, phys_addr_t vram_pp)
return 0;
}
-int hyperv_update_situation(struct hv_device *hdev, u8 active, u32 bpp,
+int hv_drm_update_situation(struct hv_device *hdev, u8 active, u32 bpp,
u32 w, u32 h, u32 pitch)
{
struct synthvid_msg msg;
@@ -292,7 +292,7 @@ int hyperv_update_situation(struct hv_device *hdev, u8 active, u32 bpp,
msg.situ.video_output[0].height_pixels = h;
msg.situ.video_output[0].pitch_bytes = pitch;
- hyperv_sendpacket(hdev, &msg);
+ hv_drm_sendpacket(hdev, &msg);
return 0;
}
@@ -306,11 +306,11 @@ int hyperv_update_situation(struct hv_device *hdev, u8 active, u32 bpp,
* the msg.ptr_shape.data. Note: setting msg.ptr_pos.is_visible to 0 doesn't
* work in tests.
*
- * The hyperv_hide_hw_ptr() is also called in the handler of the
+ * The hv_drm_hide_hw_ptr() is also called in the handler of the
* SYNTHVID_FEATURE_CHANGE event, otherwise the host still draws an extra
* unwanted mouse pointer after the VM Connection window is closed and reopened.
*/
-int hyperv_hide_hw_ptr(struct hv_device *hdev)
+int hv_drm_hide_hw_ptr(struct hv_device *hdev)
{
struct synthvid_msg msg;
@@ -322,7 +322,7 @@ int hyperv_hide_hw_ptr(struct hv_device *hdev)
msg.ptr_pos.video_output = 0;
msg.ptr_pos.image_x = 0;
msg.ptr_pos.image_y = 0;
- hyperv_sendpacket(hdev, &msg);
+ hv_drm_sendpacket(hdev, &msg);
memset(&msg, 0, sizeof(struct synthvid_msg));
msg.vid_hdr.type = SYNTHVID_POINTER_SHAPE;
@@ -338,14 +338,14 @@ int hyperv_hide_hw_ptr(struct hv_device *hdev)
msg.ptr_shape.data[1] = 1;
msg.ptr_shape.data[2] = 1;
msg.ptr_shape.data[3] = 1;
- hyperv_sendpacket(hdev, &msg);
+ hv_drm_sendpacket(hdev, &msg);
return 0;
}
-int hyperv_update_dirt(struct hv_device *hdev, struct drm_rect *rect)
+int hv_drm_update_dirt(struct hv_device *hdev, struct drm_rect *rect)
{
- struct hyperv_drm_device *hv = hv_get_drvdata(hdev);
+ struct hv_drm_device *hv = hv_get_drvdata(hdev);
struct synthvid_msg msg;
if (!hv->dirt_needed)
@@ -363,14 +363,14 @@ int hyperv_update_dirt(struct hv_device *hdev, struct drm_rect *rect)
msg.dirt.rect[0].x2 = rect->x2;
msg.dirt.rect[0].y2 = rect->y2;
- hyperv_sendpacket(hdev, &msg);
+ hv_drm_sendpacket(hdev, &msg);
return 0;
}
-static int hyperv_get_supported_resolution(struct hv_device *hdev)
+static int hv_drm_get_supported_resolution(struct hv_device *hdev)
{
- struct hyperv_drm_device *hv = hv_get_drvdata(hdev);
+ struct hv_drm_device *hv = hv_get_drvdata(hdev);
struct synthvid_msg *msg = (struct synthvid_msg *)hv->init_buf;
struct drm_device *dev = &hv->dev;
unsigned long t;
@@ -383,7 +383,7 @@ static int hyperv_get_supported_resolution(struct hv_device *hdev)
sizeof(struct synthvid_supported_resolution_req);
msg->resolution_req.maximum_resolution_count =
SYNTHVID_MAX_RESOLUTION_COUNT;
- hyperv_sendpacket(hdev, msg);
+ hv_drm_sendpacket(hdev, msg);
t = wait_for_completion_timeout(&hv->wait, VMBUS_VSP_TIMEOUT);
if (!t) {
@@ -420,9 +420,9 @@ static int hyperv_get_supported_resolution(struct hv_device *hdev)
return 0;
}
-static void hyperv_receive_sub(struct hv_device *hdev, u32 bytes_recvd)
+static void hv_drm_receive_sub(struct hv_device *hdev, u32 bytes_recvd)
{
- struct hyperv_drm_device *hv = hv_get_drvdata(hdev);
+ struct hv_drm_device *hv = hv_get_drvdata(hdev);
struct synthvid_msg *msg;
size_t hdr_size;
size_t need;
@@ -486,7 +486,7 @@ static void hyperv_receive_sub(struct hv_device *hdev, u32 bytes_recvd)
}
hv->dirt_needed = msg->feature_chg.is_dirt_needed;
if (hv->dirt_needed)
- hyperv_hide_hw_ptr(hv->hdev);
+ hv_drm_hide_hw_ptr(hv->hdev);
return;
default:
return;
@@ -508,10 +508,10 @@ static void hyperv_receive_sub(struct hv_device *hdev, u32 bytes_recvd)
complete(&hv->wait);
}
-static void hyperv_receive(void *ctx)
+static void hv_drm_receive(void *ctx)
{
struct hv_device *hdev = ctx;
- struct hyperv_drm_device *hv = hv_get_drvdata(hdev);
+ struct hv_drm_device *hv = hv_get_drvdata(hdev);
struct synthvid_msg *recv_buf;
u32 bytes_recvd;
u64 req_id;
@@ -539,19 +539,19 @@ static void hyperv_receive(void *ctx)
ret, bytes_recvd);
} else if (bytes_recvd > 0 &&
recv_buf->pipe_hdr.type == PIPE_MSG_DATA) {
- hyperv_receive_sub(hdev, bytes_recvd);
+ hv_drm_receive_sub(hdev, bytes_recvd);
}
} while (bytes_recvd > 0 && ret == 0);
}
-int hyperv_connect_vsp(struct hv_device *hdev)
+int hv_drm_connect_vsp(struct hv_device *hdev)
{
- struct hyperv_drm_device *hv = hv_get_drvdata(hdev);
+ struct hv_drm_device *hv = hv_get_drvdata(hdev);
struct drm_device *dev = &hv->dev;
int ret;
ret = vmbus_open(hdev->channel, VMBUS_RING_BUFSIZE, VMBUS_RING_BUFSIZE,
- NULL, 0, hyperv_receive, hdev);
+ NULL, 0, hv_drm_receive, hdev);
if (ret) {
drm_err(dev, "Unable to open vmbus channel\n");
return ret;
@@ -561,16 +561,16 @@ int hyperv_connect_vsp(struct hv_device *hdev)
switch (vmbus_proto_version) {
case VERSION_WIN10:
case VERSION_WIN10_V5:
- ret = hyperv_negotiate_version(hdev, SYNTHVID_VERSION_WIN10);
+ ret = hv_drm_negotiate_version(hdev, SYNTHVID_VERSION_WIN10);
if (!ret)
break;
fallthrough;
case VERSION_WIN8:
case VERSION_WIN8_1:
- ret = hyperv_negotiate_version(hdev, SYNTHVID_VERSION_WIN8);
+ ret = hv_drm_negotiate_version(hdev, SYNTHVID_VERSION_WIN8);
break;
default:
- ret = hyperv_negotiate_version(hdev, SYNTHVID_VERSION_WIN10);
+ ret = hv_drm_negotiate_version(hdev, SYNTHVID_VERSION_WIN10);
break;
}
@@ -581,8 +581,8 @@ int hyperv_connect_vsp(struct hv_device *hdev)
hv->screen_depth = SYNTHVID_DEPTH_WIN8;
- if (hyperv_version_ge(hv->synthvid_version, SYNTHVID_VERSION_WIN10)) {
- ret = hyperv_get_supported_resolution(hdev);
+ if (hv_drm_version_ge(hv->synthvid_version, SYNTHVID_VERSION_WIN10)) {
+ ret = hv_drm_get_supported_resolution(hdev);
if (ret)
drm_err(dev, "Failed to get supported resolution from host, use default\n");
}
--
2.25.1
^ permalink raw reply related
* RE: [EXTERNAL] [PATCH v2 1/1] drm/hyperv: Replace "hyperv_" with "hv_drm_" as symbol name prefix
From: Michael Kelley @ 2026-05-29 1:44 UTC (permalink / raw)
To: Dexuan Cui, Michael Kelley, maarten.lankhorst@linux.intel.com,
mripard@kernel.org, tzimmermann@suse.de, airlied@gmail.com,
simona@ffwll.ch, Long Li, ssengar@linux.microsoft.com
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
linux-hyperv@vger.kernel.org
In-Reply-To: <SA1PR21MB6921E2BE7D0F3B804877427CBF092@SA1PR21MB6921.namprd21.prod.outlook.com>
From: Dexuan Cui <DECUI@microsoft.com> Sent: Thursday, May 28, 2026 9:53 AM
>
> > From: Michael Kelley <mhklkml@zohomail.com>
> > Sent: Thursday, May 28, 2026 6:51 AM
> > ...
> > -#define to_hv(_dev) container_of(_dev, struct hyperv_drm_device, dev)
> > +#define to_hv(_dev) container_of(_dev, struct hv_drm_device, dev)
>
> A minor nit: change "to_hv" to "to_hv_drm"? Otherwise, LGTM.
Yes, that makes sense. It's not a symbol that would appear in, and cause
confusion in, a global symbol list. But the "hv" in "to_hv" is effectively
the prefix, so for completeness change it as well.
I'll send a v3 shortly.
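For readers unfamiliar with the macro under discussion, here is a minimal userspace sketch of how a container_of()-based accessor such as to_hv_drm() recovers the driver structure from its embedded drm_device. The struct layouts and the depth_from_drm() helper are hypothetical, trimmed-down stand-ins, not the driver's real definitions, and the container_of() shown is a simplified version without the kernel's type checking.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down stand-ins for the real structures. */
struct drm_device {
	int placeholder;
};

struct hv_drm_device {
	unsigned int screen_depth;
	struct drm_device dev;	/* embedded member, not a pointer */
};

/* The accessor with the "hv_drm" prefix suggested in the review. */
#define to_hv_drm(_dev) container_of(_dev, struct hv_drm_device, dev)

/* Hypothetical helper: recover the outer structure from the embedded one. */
static unsigned int depth_from_drm(struct drm_device *dev)
{
	struct hv_drm_device *hv = to_hv_drm(dev);

	return hv->screen_depth;
}
```

Because the macro is local to the driver, renaming it only affects readability, which matches the point made above that it would not show up in a global symbol list.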
>
> Reviewed-by: Dexuan Cui <decui@microsoft.com>
Thanks for reviewing.
Michael
^ permalink raw reply
* Re: [PATCH net v3 0/2] net: mana: Fix NULL dereferences during teardown after attach failure
From: patchwork-bot+netdevbpf @ 2026-05-28 23:40 UTC (permalink / raw)
To: Dipayaan Roy
Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov,
pavan.chebbi
In-Reply-To: <20260525081129.1230035-1-dipayanroy@linux.microsoft.com>
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 25 May 2026 01:08:23 -0700 you wrote:
> When mana_attach() fails (e.g. during queue allocation), the error
> cleanup frees apc->tx_qp and apc->rxqs and sets them to NULL. Multiple
> subsequent teardown paths can then dereference these NULL pointers,
> causing kernel panics.
>
> Patch 1 adds NULL guards in the low-level teardown functions
> (mana_fence_rqs, mana_destroy_vport, mana_dealloc_queues) so they are
> safe to call regardless of queue initialization state. This covers all
> callers: mana_remove(), mana_change_mtu() recovery, and internal error
> paths in mana_alloc_queues().
>
> [...]
Here is the summary with links:
- [net,v3,1/2] net: mana: Add NULL guards in teardown path to prevent panic on attach failure
https://git.kernel.org/netdev/net/c/17bfe0a8c014
- [net,v3,2/2] net: mana: Skip redundant detach on already-detached port
https://git.kernel.org/netdev/net/c/5b05aa36ee24
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* RE: [EXTERNAL] Re: [PATCH net-next v11 0/6] net: mana: Per-vPort EQ and MSI-X management
From: Long Li @ 2026-05-28 22:05 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Konstantin Taranov, David S . Miller, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Jason Gunthorpe, Leon Romanovsky, Haiyang Zhang,
KY Srinivasan, Wei Liu, Dexuan Cui,
shradhagupta@linux.microsoft.com, Simon Horman,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260527192735.34a794cf@kernel.org>
> On Fri, 22 May 2026 19:02:50 -0700 Long Li wrote:
> > The following changes since commit 95fab46aea57d6d7b76b319341acbefe8a9293c8:
> >
> > Merge branch 'net-convert-atm-xdp-af_iucv-l2tp_ppp-rxrpc-tipc-to-getsockopt_iter' (2026-05-22 11:11:12 -0700)
> >
> > are available in the Git repository at:
> >
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Flonglimsft%2Flinux.git&data=05%7C02%7Clongli%40microsoft.com%7C36237239bb6949843c7508debc60af6c%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C639155320616840917%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=43aUwSeHYaOhd%2Bmd1lwfmCqmrAObgMWJoDRpKDhmCt8%3D&reserved=0 tags/mana-eq-msi-v11
> >
> > for you to fetch changes up to a26d11135abba51e81ae8b9689e288718af95088:
> >
> > RDMA/mana_ib: Allocate interrupt contexts on EQs (2026-05-22 20:35:43 +0000)
>
> The branch is no good, it needs to be your patches applied on top of a commit
> already in Linus's tree. The current branch is on top of net-next, RDMA would
> have to pull in 100s of networking commits together with your changes.
Hi Jakub,
Thanks for looking into this. Since the RDMA patch (patch 6) depends on the networking changes in patches 1-5, could this series go through net-next? I've verified that the tag pulls cleanly into the latest net-next.
Leon, Jason - could you provide an Acked-by for patch 6 ("RDMA/mana_ib: Allocate interrupt contexts on EQs") so it can be taken through the networking tree?
Thanks,
Long
^ permalink raw reply
* Re: [PATCH net v3 2/2] net: mana: Skip redundant detach on already-detached port
From: Dipayaan Roy @ 2026-05-28 21:16 UTC (permalink / raw)
To: Paolo Abeni
Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
kuba, leon, longli, kotaranov, horms, shradhagupta, ssengar,
ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov,
pavan.chebbi
In-Reply-To: <3665f7c1-9c97-44ac-8b6a-e6c31ad96730@redhat.com>
On Thu, May 28, 2026 at 11:30:39AM +0200, Paolo Abeni wrote:
> On 5/25/26 10:08 AM, Dipayaan Roy wrote:
> > When mana_per_port_queue_reset_work_handler() runs after a previous
> > detach succeeded but attach failed, the port is left in a detached
> > state with apc->tx_qp and apc->rxqs already freed. Calling
> > mana_detach() again unconditionally leads to NULL pointer dereferences
> > during queue teardown.
> >
> > Add an early exit in mana_detach() when the port is already in
> > detached state (!netif_device_present) for non-close callers, making
> > it safe to call idempotently. This allows the queue reset handler and
> > other recovery paths to simply retry mana_attach() without redundant
> > teardown.
> >
> > Fixes: 3b194343c250 ("net: mana: Implement ndo_tx_timeout and serialize queue resets per port.")
> > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> > Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
> > ---
> > drivers/net/ethernet/microsoft/mana/mana_en.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index 0582803907a8..1e1ad2795c3c 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -3350,6 +3350,12 @@ int mana_detach(struct net_device *ndev, bool from_close)
> >
> > ASSERT_RTNL();
> >
> > + /* If already detached (indicates detach succeeded but attach failed
> > + * previously). Now skip mana detach and just retry mana_attach.
> > + */
> > + if (!from_close && !netif_device_present(ndev))
> > + return 0;
> > +
> > apc->port_st_save = apc->port_is_up;
> > apc->port_is_up = false;
>
> sashiko(gemini) notes the above can lead to a different race:
>
> ---
> Can this early return cause state machine corruption by bypassing the
> updates to apc->port_st_save?
> Consider this sequence:
> 1. queue_reset_work runs, mana_detach() succeeds (apc->port_st_save = true,
> apc->port_is_up = false), but mana_attach() fails.
> 2. The admin brings the interface down (ip link set dev eth0 down), skipping
> mana_close() since apc->port_is_up is false.
> 3. The admin changes the MTU, triggering mana_change_mtu() which calls
> mana_detach() followed by mana_attach().
> 4. mana_detach() hits this new early return, preserving
> apc->port_st_save == true.
> When mana_attach() runs, it sees apc->port_st_save == true and allocates
> queues, setting apc->vport_use_count = 1 and apc->port_is_up = true, even
> though the interface is administratively down.
> If the admin then brings the interface up, mana_open() will unconditionally
> call mana_alloc_queues(). That function calls mana_cfg_vport(), which will
> return -EBUSY because apc->vport_use_count is already 1.
> This leaves mana_open() failing and the interface down. Since the interface
> is already down, trying to bring it down again is a no-op, meaning
> mana_close() is never called to clean up the orphaned queues.
> Does this sequence permanently brick the port until the driver is reloaded?
> ---
>
> I think you need to be more restrictive in the early return check.
>
> /P
>
Hi Paolo,
Thank you for the comments,
I think the scenario pointed out by sashiko does not seem valid,
as steps 2 and 3 have the admin changing the MTU after bringing the
interface down. This is because netif_set_mtu_ext() in dev.c checks
netif_device_present and returns -ENODEV before calling
ndo_change_mtu. So mana_change_mtu() is never reachable when the
device is in the !present state.
https://elixir.bootlin.com/linux/v7.0/source/net/core/dev.c#L9906
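To make the argument concrete, here is a small userspace model of the check described above. It is only a sketch: fake_netdev, fake_set_mtu() and fake_ndo_change_mtu() are hypothetical stand-ins for the net_device, the core MTU path and the driver hook, and the real core code does considerably more than this.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct net_device; only what the sketch needs. */
struct fake_netdev {
	bool present;		/* models netif_device_present() */
	int mtu;
	int change_mtu_calls;	/* counts how often the driver hook ran */
};

/* Models the driver's ndo_change_mtu hook (mana_change_mtu in this thread). */
static int fake_ndo_change_mtu(struct fake_netdev *dev, int new_mtu)
{
	dev->change_mtu_calls++;
	dev->mtu = new_mtu;
	return 0;
}

/* Models the core check: bail out before the driver hook is reached. */
static int fake_set_mtu(struct fake_netdev *dev, int new_mtu)
{
	if (!dev->present)
		return -ENODEV;
	return fake_ndo_change_mtu(dev, new_mtu);
}
```

In this model a device left in the !present state by a failed attach never reaches the driver hook, so the detach/attach pair inside it cannot run.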
Please let me know if the above check is good enough.
Regards
Dipayaan Roy
^ permalink raw reply
* [PATCH net-next] net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries
From: Erni Sri Satya Vennela @ 2026-05-28 18:07 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, kotaranov, horms, ernis, dipayanroy, kees,
linux-hyperv, netdev, linux-kernel, linux-rdma
mana_query_link_cfg() sends an HWC command to firmware on every call,
but the link speed and QoS values it returns only change when the
driver explicitly calls mana_set_bw_clamp(). This function is called
not only by userspace via ethtool get_link_ksettings, but also
periodically by hv_netvsc through netvsc_get_link_ksettings and by
the sysfs speed_show attribute via dev_attr_show, resulting in
unnecessary HWC traffic every few minutes.
Add a link_cfg_error field to mana_port_context to cache the query
result. The field uses three states: 1 (not yet queried, initial
value set during mana_probe_port), 0 (success, speed/max_speed are
valid), or a negative errno for permanent errors like -EOPNOTSUPP
when the hardware does not support the command. Transient errors and
qos_unconfigured responses are not cached so that subsequent calls
will retry.
To prevent a concurrent mana_set_bw_clamp() from racing with an
in-flight query and publishing stale pre-clamp speed/max_speed,
serialize the firmware transaction and the cache update under a new
per-port mutex (link_cfg_mutex). The mutex covers both the HWC
request and the subsequent stores in mana_query_link_cfg(), and the
HWC request and invalidation in mana_set_bw_clamp(). With this lock
held, two queries can no longer interleave their speed/max_speed
stores, and an invalidation can no longer slip in between a query's
response and its publish.
Invalidate the cache inside mana_set_bw_clamp() on success, so all
current and future callers that change the link configuration
automatically trigger a fresh query on the next mana_query_link_cfg()
call. Also reset link_cfg_error during resume in mana_probe() under
link_cfg_mutex, so that any slow-path query already in flight cannot
later store 0 and silently overwrite the post-resume invalidation.
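The three-state cache described above can be modeled in userspace roughly as follows. This is a sketch under stated assumptions, not the driver code: struct link_cache, fake_fw_query(), link_cache_query() and link_cache_invalidate() are hypothetical names, a pthread mutex stands in for the kernel mutex, and the firmware call is faked with a counter so that cache hits are observable.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical model of the per-port cache state. */
struct link_cache {
	pthread_mutex_t lock;
	int state;		/* 1 = not queried, 0 = cached, <0 = permanent error */
	unsigned int speed;	/* valid only when state == 0 */
	int fw_calls;		/* how many times the fake firmware was asked */
	int fw_result;		/* error the fake firmware returns, 0 for success */
	unsigned int fw_speed;	/* speed the fake firmware reports */
};

/* Stand-in for the HWC round trip. */
static int fake_fw_query(struct link_cache *c, unsigned int *speed)
{
	c->fw_calls++;
	if (c->fw_result)
		return c->fw_result;
	*speed = c->fw_speed;
	return 0;
}

/* Query and publish under one lock, so a stale result cannot land late. */
static int link_cache_query(struct link_cache *c)
{
	unsigned int speed;
	int err;

	pthread_mutex_lock(&c->lock);
	err = c->state;
	if (err <= 0)
		goto out;	/* cached success or permanent error */

	err = fake_fw_query(c, &speed);
	if (err == -EOPNOTSUPP) {
		c->state = err;	/* permanent: remember it */
		goto out;
	}
	if (err)
		goto out;	/* transient: state stays 1, next call retries */

	c->speed = speed;
	c->state = 0;
out:
	pthread_mutex_unlock(&c->lock);
	return err;
}

/* Models only the invalidation step of a successful clamp change. */
static void link_cache_invalidate(struct link_cache *c)
{
	pthread_mutex_lock(&c->lock);
	c->state = 1;
	pthread_mutex_unlock(&c->lock);
}
```

Using a positive sentinel for "not queried" lets a single `err <= 0` test cover both the cached-success and permanent-error cases, which is the same shortcut the patch takes.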
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 41 +++++++++++++++----
include/net/mana/mana.h | 4 ++
2 files changed, 36 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 82f1461a48e9..43018bc13dc1 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1456,6 +1456,12 @@ int mana_query_link_cfg(struct mana_port_context *apc)
struct mana_query_link_config_req req = {};
int err;
+ mutex_lock(&apc->link_cfg_mutex);
+
+ err = apc->link_cfg_error;
+ if (err <= 0)
+ goto out;
+
mana_gd_init_req_hdr(&req.hdr, MANA_QUERY_LINK_CONFIG,
sizeof(req), sizeof(resp));
@@ -1468,10 +1474,11 @@ int mana_query_link_cfg(struct mana_port_context *apc)
if (err) {
if (err == -EOPNOTSUPP) {
netdev_info_once(ndev, "MANA_QUERY_LINK_CONFIG not supported\n");
- return err;
+ apc->link_cfg_error = err;
+ goto out;
}
netdev_err(ndev, "Failed to query link config: %d\n", err);
- return err;
+ goto out;
}
err = mana_verify_resp_hdr(&resp.hdr, MANA_QUERY_LINK_CONFIG,
@@ -1482,16 +1489,20 @@ int mana_query_link_cfg(struct mana_port_context *apc)
resp.hdr.status);
if (!err)
err = -EOPNOTSUPP;
- return err;
+ goto out;
}
if (resp.qos_unconfigured) {
err = -EINVAL;
- return err;
+ goto out;
}
apc->speed = resp.link_speed_mbps;
apc->max_speed = resp.qos_speed_mbps;
- return 0;
+ apc->link_cfg_error = 0;
+ err = 0;
+out:
+ mutex_unlock(&apc->link_cfg_mutex);
+ return err;
}
int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
@@ -1508,17 +1519,19 @@ int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
req.link_speed_mbps = speed;
req.enable_clamping = enable_clamping;
+ mutex_lock(&apc->link_cfg_mutex);
+
err = mana_send_request(apc->ac, &req, sizeof(req), &resp,
sizeof(resp));
if (err) {
if (err == -EOPNOTSUPP) {
netdev_info_once(ndev, "MANA_SET_BW_CLAMP not supported\n");
- return err;
+ goto out;
}
netdev_err(ndev, "Failed to set bandwidth clamp for speed %u, err = %d",
speed, err);
- return err;
+ goto out;
}
err = mana_verify_resp_hdr(&resp.hdr, MANA_SET_BW_CLAMP,
@@ -1529,13 +1542,18 @@ int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
resp.hdr.status);
if (!err)
err = -EOPNOTSUPP;
- return err;
+ goto out;
}
if (resp.qos_unconfigured)
netdev_info(ndev, "QoS is unconfigured\n");
- return 0;
+ /* Invalidate the cache; next query will re-fetch from firmware. */
+ apc->link_cfg_error = 1;
+ err = 0;
+out:
+ mutex_unlock(&apc->link_cfg_mutex);
+ return err;
}
int mana_create_wq_obj(struct mana_port_context *apc,
@@ -3430,6 +3448,8 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
apc->port_handle = INVALID_MANA_HANDLE;
apc->pf_filter_handle = INVALID_MANA_HANDLE;
apc->port_idx = port_idx;
+ apc->link_cfg_error = 1;
+ mutex_init(&apc->link_cfg_mutex);
apc->cqe_coalescing_enable = 0;
mutex_init(&apc->vport_mutex);
@@ -3750,6 +3770,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
rtnl_lock();
apc = netdev_priv(ac->ports[i]);
enable_work(&apc->queue_reset_work);
+ mutex_lock(&apc->link_cfg_mutex);
+ apc->link_cfg_error = 1;
+ mutex_unlock(&apc->link_cfg_mutex);
err = mana_attach(ac->ports[i]);
rtnl_unlock();
/* Log the port for which the attach failed, stop
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index d9c27310fd04..af772b7297ec 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -555,6 +555,10 @@ struct mana_port_context {
u32 speed;
/* Maximum speed supported by the SKU (mbps) */
u32 max_speed;
+ /* 1 = not queried, 0 = cached success, negative = permanent error */
+ int link_cfg_error;
+ /* Serializes mana_query_link_cfg() and mana_set_bw_clamp(). */
+ struct mutex link_cfg_mutex;
bool port_is_up;
bool port_st_save; /* Saved port state */
--
2.34.1
^ permalink raw reply related
* RE: [EXTERNAL] [PATCH v2 1/1] drm/hyperv: Replace "hyperv_" with "hv_drm_" as symbol name prefix
From: Dexuan Cui @ 2026-05-28 16:52 UTC (permalink / raw)
To: mhklinux@outlook.com, maarten.lankhorst@linux.intel.com,
mripard@kernel.org, tzimmermann@suse.de, airlied@gmail.com,
simona@ffwll.ch, Long Li, ssengar@linux.microsoft.com
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
linux-hyperv@vger.kernel.org
In-Reply-To: <20260528135108.1787-1-mhklkml@zohomail.com>
> From: Michael Kelley <mhklkml@zohomail.com>
> Sent: Thursday, May 28, 2026 6:51 AM
> ...
> -#define to_hv(_dev) container_of(_dev, struct hyperv_drm_device, dev)
> +#define to_hv(_dev) container_of(_dev, struct hv_drm_device, dev)
A minor nit: change "to_hv" to "to_hv_drm"? Otherwise, LGTM.
Reviewed-by: Dexuan Cui <decui@microsoft.com>
^ permalink raw reply
* RE: [RFC PATCH 2/8] clocksource/hyperv: Implement read_raw() for TSC page clocksource
From: Michael Kelley @ 2026-05-28 16:47 UTC (permalink / raw)
To: David Woodhouse, Sean Christopherson, Paolo Bonzini,
Thomas Gleixner, John Stultz, Michael Kelley
Cc: Vitaly Kuznetsov, Marcelo Tosatti, Christopher S . Hall,
Stephen Boyd, Miroslav Lichvar, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, K . Y . Srinivasan, Haiyang Zhang,
Wei Liu, Dexuan Cui, Daniel Lezcano, kvm@vger.kernel.org,
linux-hyperv@vger.kernel.org, x86@kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <20260526230635.136914-2-dwmw2@infradead.org>
From: David Woodhouse <dwmw2@infradead.org> Sent: Tuesday, May 26, 2026 4:06 PM
>
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Implement the read_raw() callback for the Hyper-V TSC page
> clocksource. This returns the derived 10MHz reference time (for
> timekeeping) while also providing the raw TSC value that was used
> to compute it.
>
> When the TSC page is valid, hv_read_tsc_page_tsc() atomically
> captures both values from a single RDTSC inside the sequence-counter
> protected read. When the TSC page is invalid (sequence == 0), raw is
> set to zero indicating no value is available.
>
> This enables ktime_get_snapshot_id() to provide the raw TSC to
> consumers like KVM's master clock when running nested on Hyper-V.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Assisted-by: Kiro:claude-opus-4.6-1m
Looking narrowly at just the Hyper-V clocksource code in this patch:
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
> ---
> drivers/clocksource/hyperv_timer.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> index e9f5034a1bc8..c5ae01fdbd8e 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -444,6 +444,18 @@ static u64 notrace read_hv_clock_tsc_cs(struct clocksource *arg)
> return read_hv_clock_tsc();
> }
>
> +static u64 notrace read_hv_clock_tsc_cs_raw(struct clocksource *arg, u64 *raw)
> +{
> + u64 time;
> +
> + if (!hv_read_tsc_page_tsc(tsc_page, raw, &time)) {
> + time = read_hv_clock_msr();
> + *raw = 0;
> + }
> +
> + return time;
> +}
> +
> static u64 noinstr read_hv_sched_clock_tsc(void)
> {
> return (read_hv_clock_tsc() - hv_sched_clock_offset) *
> @@ -495,6 +507,8 @@ static struct clocksource hyperv_cs_tsc = {
> .name = "hyperv_clocksource_tsc_page",
> .rating = 500,
> .read = read_hv_clock_tsc_cs,
> + .read_raw = read_hv_clock_tsc_cs_raw,
> + .raw_csid = CSID_X86_TSC,
> .mask = CLOCKSOURCE_MASK(64),
> .flags = CLOCK_SOURCE_IS_CONTINUOUS,
> .suspend= suspend_hv_clock_tsc,
> --
> 2.54.0
>
^ permalink raw reply
* RE: [PATCH 1/1] drm/hyperv: Replace "hyperv_" with "hvdrm_" as symbol name prefix
From: Michael Kelley @ 2026-05-28 13:54 UTC (permalink / raw)
To: Dexuan Cui, Michael Kelley, Hamza Mahfooz
Cc: maarten.lankhorst@linux.intel.com, mripard@kernel.org,
tzimmermann@suse.de, airlied@gmail.com, simona@ffwll.ch, Long Li,
ssengar@linux.microsoft.com, dri-devel@lists.freedesktop.org,
linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org
In-Reply-To: <SA1PR21MB6921A2C91F67C10F272E9430BF082@SA1PR21MB6921.namprd21.prod.outlook.com>
From: Dexuan Cui <DECUI@microsoft.com> Sent: Wednesday, May 27, 2026 12:55 PM
>
> > From: Michael Kelley <mhklinux@outlook.com>
> > Sent: Wednesday, May 27, 2026 8:05 AM
> > > >
> > > > Function and structure names in the Hyper-V DRM driver currently
> > > > use "hyperv_" as the prefix. This conflicts with usage in core Hyper-V
> > > > and VMBus code, and incorrectly implies that functions and structures
> > > > in this driver apply generically to Hyper-V. A specific conflict arises
> > > > for "hyperv_init", which is an initcall for generic Hyper-V
> > > > initialization on arm64. The conflict prevents the use of
> > > > initcall_blacklist on the kernel boot line to skip loading this driver.
>
> I also hit the issue. Thanks for the fix!
>
> > > > Fix this by substituting "hvdrm_" as the prefix for all functions and
> > >
> > > I would personally prefer "hv_drm_", since it seems clearer.
> >
> > My choice of "hvdrm" mimics the old Hyper-V FBdev driver, which
> > uses "hvfb" as the prefix. However, looking through everything that
> > starts with "hv" in /proc/kallsyms, I also see prefixes with the additional
> > underscore. "hv_kbd_" in the Hyper-V keyboard driver is an example.
> > The Hyper-V utils drivers have both forms -- I see "hv_vss_", "hv_ptp_",
> > and "hv_kvp_", but also "hvt" (for Hyper-V Transport). So the historical
> > practice is inconsistent.
> >
> > I'm OK going either way. Does anyone else want to express a
> > preference?
>
> I also prefer "hv_drm_".
>
> > > > -struct hyperv_drm_device {
> > > > +struct hvdrm_drm_device {
> > >
> > > "hvdrm_drm_device" looks kinda redundant, perhaps
> > > s/hyperv_drm_device/hv_drm_device would be more sensible.
>
> s/hyperv_drm_device/hv_drm_dev/ seems better to me.
>
>
> > Yes, I'll make this change. And in looking through kallsyms, I
> > see that the Hyper-V DRM driver has "hv_fops", which did not
> > get changed in the mechanical substitution because it doesn't
> > start with "hyperv_". I'll change it to hv_drm_fops.
> >
> > Michael
>
> Some comments need to be updated accordingly, e.g.
> /* hvdrm_drm_modeset */
> /* hvdrm_drm_proto */
>
> This needs to be updated as well:
> +static const struct drm_encoder_funcs hvdrm_drm_simple_encoder_funcs_cleanup
>
Dexuan and Hamza -- thanks for your feedback! I have incorporated
all of it into the "v2" that I just posted.
Michael
^ permalink raw reply