From mboxrd@z Thu Jan 1 00:00:00 1970
Reply-To: Sean Christopherson
Date: Wed, 1 Jul 2026 12:31:43 -0700
In-Reply-To: <20260701193212.749551-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-hyperv@vger.kernel.org
Mime-Version: 1.0
References: <20260701193212.749551-1-seanjc@google.com>
X-Mailer: git-send-email 2.55.0.rc0.799.gd6f94ed593-goog
Message-ID: <20260701193212.749551-23-seanjc@google.com>
Subject: [PATCH v5 22/51] x86/kvm: Mark TSC as reliable when it's constant and nonstop
From: Sean Christopherson
To: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau,
 Rick Edgecombe, Sean Christopherson, "K. Y. Srinivasan", Haiyang Zhang,
 Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov, Jan Kiszka,
 Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
 John Stultz
Cc: Shuah Khan, "H. Peter Anvin", Vitaly Kuznetsov,
 Broadcom internal kernel review list, Boris Ostrovsky, Stephen Boyd,
 linux-doc@vger.kernel.org, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
 linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev,
 xen-devel@lists.xenproject.org, Tom Lendacky, Nikunj A Dadhania,
 David Woodhouse, David Woodhouse, Michael Kelley, Thomas Gleixner
Content-Type: text/plain; charset="UTF-8"

Mark the TSC as reliable if the hypervisor (KVM) has enumerated the TSC as
constant and nonstop. As in most (all?) virtualization setups, any
secondary clocksource that's used as a watchdog is guaranteed to be less
reliable than a constant, nonstop TSC, as all clocksources the kernel uses
as a watchdog are all but guaranteed to be emulated when running as a KVM
guest. I.e. any observed discrepancies between the TSC and watchdog will
be due to jitter in the watchdog.

This is especially true for KVM, as the watchdog clocksource is usually
emulated in host userspace, i.e. reading the clock incurs a roundtrip cost
of thousands of cycles.

Marking the TSC reliable addresses a flaw where the TSC will occasionally
be marked unstable if the host is under moderate/heavy load.

Reviewed-by: David Woodhouse
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_para.h |  2 +-
 arch/x86/kernel/kvm.c           | 12 +++++++++++-
 arch/x86/kernel/kvmclock.c      | 14 +++++---------
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 4a47c16e2df8..4a49fc286b4c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -118,7 +118,7 @@ static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
 }
 
 #ifdef CONFIG_KVM_GUEST
-void kvmclock_init(void);
+void kvmclock_init(bool prefer_tsc);
 void kvmclock_disable(void);
 bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 909d3e5e5bcd..1cef54e1e7d9 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -978,6 +978,7 @@ static void __init kvm_init_platform(void)
 		.mask_hi = (BIT_ULL(boot_cpu_data.x86_phys_bits) - 1) >> 32,
 	};
 	u32 timing_info_leaf;
+	bool tsc_is_reliable;
 
 	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
 	    kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL)) {
@@ -1040,7 +1041,16 @@ static void __init kvm_init_platform(void)
 		}
 	}
 
-	kvmclock_init();
+	/*
+	 * If the TSC counts at a constant frequency across P/T states and in
+	 * deep C-states, treat the TSC as reliable, as guaranteed by KVM.
+	 */
+	tsc_is_reliable = boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+			  boot_cpu_has(X86_FEATURE_NONSTOP_TSC);
+	if (tsc_is_reliable)
+		setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
+
+	kvmclock_init(tsc_is_reliable);
 	x86_platform.apic_post_init = kvm_apic_init;
 
 	/*
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index f55d0305d1f3..2e7ab54cb9dc 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -307,7 +307,7 @@ static int kvmclock_setup_percpu(unsigned int cpu)
 	return p ? 0 : -ENOMEM;
 }
 
-void __init kvmclock_init(void)
+void __init kvmclock_init(bool prefer_tsc)
 {
 	u8 flags;
 
@@ -356,15 +356,11 @@ void __init kvmclock_init(void)
 	kvm_get_preset_lpj();
 
 	/*
-	 * X86_FEATURE_NONSTOP_TSC is TSC runs at constant rate
-	 * with P/T states and does not stop in deep C-states.
-	 *
-	 * Invariant TSC exposed by host means kvmclock is not necessary:
-	 * can use TSC as clocksource.
-	 *
+	 * If TSC is preferred over kvmclock, drop kvmclock's rating so that TSC
+	 * is chosen as the clocksource (but still register kvmclock in case
+	 * the kernel doesn't want to use TSC for whatever reason).
 	 */
-	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
-	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
+	if (prefer_tsc)
 		kvm_clock.rating = 299;
 
 	clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
-- 
2.55.0.rc0.799.gd6f94ed593-goog
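
For anyone following the rating change, below is a rough userspace model of the decision the patch makes. It is a sketch under stated assumptions, not kernel code. The value 299 comes from the patch; the ratings of 300 for the TSC and 400 for kvmclock's default are my recollection of the current tree and are not stated in the patch. The helper names and the "highest rating wins" selection are hypothetical simplifications; the real clocksource core also considers things like an unstable TSC or a command-line override.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed ratings; only KVMCLOCK_RATING_DEMOTED appears in the patch. */
#define TSC_RATING               300 /* assumption: rating of the TSC clocksource */
#define KVMCLOCK_RATING_DEFAULT  400 /* assumption: kvmclock's default rating */
#define KVMCLOCK_RATING_DEMOTED  299 /* from the patch: set when prefer_tsc */

/* Hypothetical, stripped-down stand-in for struct clocksource. */
struct clocksource_model {
	const char *name;
	int rating;
};

/* Mirrors the condition the patch computes in kvm_init_platform(). */
bool tsc_is_reliable(bool constant_tsc, bool nonstop_tsc)
{
	return constant_tsc && nonstop_tsc;
}

/* Mirrors the rating adjustment in kvmclock_init(bool prefer_tsc). */
int kvmclock_rating(bool prefer_tsc)
{
	return prefer_tsc ? KVMCLOCK_RATING_DEMOTED : KVMCLOCK_RATING_DEFAULT;
}

/* Simplified selection: the registered clocksource with the highest rating. */
const char *pick_clocksource(bool constant_tsc, bool nonstop_tsc)
{
	bool prefer_tsc = tsc_is_reliable(constant_tsc, nonstop_tsc);
	struct clocksource_model cs[] = {
		{ "tsc", TSC_RATING },
		{ "kvm-clock", kvmclock_rating(prefer_tsc) },
	};
	size_t n = sizeof(cs) / sizeof(cs[0]);
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++) {
		if (cs[i].rating > cs[best].rating)
			best = i;
	}
	/* The names are string literals, so returning one is safe. */
	return cs[best].name;
}
```

In this model, kvmclock only drops below the TSC when both CONSTANT_TSC and NONSTOP_TSC are present. It stays registered either way, which matches the comment in the kvmclock.c hunk about keeping kvmclock available as a fallback.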