From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 259E23F7869 for ; Fri, 29 May 2026 14:45:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780065915; cv=none; b=tZIh8nMdCGrTKfXFIlA4vO9Ov6NWqiCCHhAaaMroRgxfD7MsvYcXJlpv2n/WZpWGxncutZ7hjtP9xNGOq5OGsgNSLzeikbU2vz7csRfP+m/PoB2oR7YfKyn+hr2HtYux6GwhEWx+zvHk9IO68kEK+Ct4+MiPcJ3GCOYuYoG+7dE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780065915; c=relaxed/simple; bh=aXPPNHBYalh8o5ZMSQ8rkjd5K0rcgTOVXnv3ZNuU6kE=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=iq2g7c8oN5KKlQTrSoHN0dnWRVHpGjbvs2b3xnJBfkUZ6sqW3w/fHIT9B813QxjgklRNNYjolw2FjtO23k+WIOEwWyWg/iH0toASXatsrcnkSmdJTmrtr5pl+2KwKRjbsRJv3RUFyLywp1SD06f08xS07op4pUvdfXA1Cq6IAQk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Urk/4CTx; arc=none smtp.client-ip=209.85.214.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Urk/4CTx" Received: by mail-pl1-f201.google.com with SMTP id d9443c01a7336-2bf08c2a24bso22524315ad.2 for ; Fri, 29 May 2026 07:45:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1780065912; x=1780670712; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=3J4tSt2+/CTMv4tcKt5oCpoZW7w+XDXf08twJvjnqkg=; b=Urk/4CTxpWk5+L+6SSQSihk5iu/EwH8Oo4fvMcriy7m1kM0YLpPs4HUpGSGdaaqHVj ar63yB4Ad3OqQTDf+jRs55Vw2UE7bIYhd/CODTBEg5ZHi8EVlSZpdanYZZUas0AbhvXZ /HWOQqELq5FsoIURyQXwihbPDVtRvqKzXt9lOyivzCFwOCJ8C/hhCH/am9bzppsGQfzx LJzYGLNekS31VRuU04Ub2i094A6gR7q/i/QImwA0AG/qVl+8CrDMLweX+MhoLX8l8u09 YWA1LYyPSGW8IdhG+rRgaNJsl4CbIuR1X2gwTh/vBpgPY60sCZEG4lQOPjOP3M6oIUeU tYGg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1780065912; x=1780670712; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=3J4tSt2+/CTMv4tcKt5oCpoZW7w+XDXf08twJvjnqkg=; b=Uxd/UxzpKzKrz6ptfIpOckAF/rQ7VVMNlqOiiyxLAjC49K0c2CygdTaNsjF5nBHqAH UE2TFu2GsR7MATeDT/WYQ2NEdNIJFxzzuhbVTdP7TdHwBa8QA95JADKlze2xcakI3Uo2 3jF33zakFWwUGGz2DrwHkuBQAM3ZrYCJIrfZp9PsjKcxRGd/kcvXuOnhpTTVCvQO2FTy g9TXbs9RTnLNfyGx0eul2qITKAs2RKhcZUpce6r+HvBYnVZ+QquBDyAncPtvlwXPcUqS X5t2JPz+/iJ2J4wpggbilJfMqDxpxQW4kBD0C3yfURc2IyybxDadXkN9eNGE/dTqo6hx ZTXw== X-Forwarded-Encrypted: i=1; AFNElJ+wkISSTl0enPVpal5s0PGsGGE74q5dQuT/RQ00CQcGm8udz9H69Oh4C8Wy2zJy/c7GRcrDRQ4EcA6RQoQ=@vger.kernel.org X-Gm-Message-State: AOJu0YxAVQyK9RtV15fYHT4A6LfypHC1HLrgoB5bub+xq6hwOYYYbIP5 qZ8puLfuM66G7+1Qy00KnhYu645dLIqgpYAYsnyLqyJ9jK6sYuwUfEDYmAdi93uIWf+S8soh4Ih C+4CQ6A== X-Received: from plbkk16.prod.google.com ([2002:a17:903:710:b0:2bf:222e:c947]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:903:15cf:b0:2b0:ac1e:9737 with SMTP id d9443c01a7336-2bf367c2eb0mr1549335ad.12.1780065912148; Fri, 29 May 2026 07:45:12 -0700 (PDT) Reply-To: Sean Christopherson Date: Fri, 29 May 2026 07:44:04 -0700 In-Reply-To: <20260529144435.704127-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-hyperv@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260529144435.704127-1-seanjc@google.com> X-Mailer: git-send-email 2.54.0.823.g6e5bcc1fc9-goog Message-ID: <20260529144435.704127-18-seanjc@google.com> Subject: [PATCH v4 17/47] x86/kvm: Mark TSC as reliable when it's constant and nonstop From: Sean Christopherson To: Paolo Bonzini , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, Kiryl Shutsemau , Sean Christopherson , "K. Y. Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Long Li , Ajay Kaher , Alexey Makhalov , Jan Kiszka , Andy Lutomirski , Peter Zijlstra , Juergen Gross , Daniel Lezcano , John Stultz Cc: "H. Peter Anvin" , Rick Edgecombe , Vitaly Kuznetsov , Broadcom internal kernel review list , Boris Ostrovsky , Stephen Boyd , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev, xen-devel@lists.xenproject.org, David Woodhouse , Tom Lendacky , Nikunj A Dadhania , David Woodhouse , Michael Kelley , Thomas Gleixner Content-Type: text/plain; charset="UTF-8" Mark the TSC as reliable if the hypervisor (KVM) has enumerated the TSC as constant and nonstop, and the admin hasn't explicitly marked the TSC as unstable. Like most (all?) virtualization setups, any secondary clocksource that's used as a watchdog is guaranteed to be less reliable than a constant, nonstop TSC, as all clocksources the kernel uses as a watchdog are all but guaranteed to be emulated when running as a KVM guest. I.e. any observed discrepancies between the TSC and watchdog will be due to jitter in the watchdog. This is especially true for KVM, as the watchdog clocksource is usually emulated in host userspace, i.e. reading the clock incurs a roundtrip cost of thousands of cycles. Marking the TSC reliable addresses a flaw where the TSC will occasionally be marked unstable if the host is under moderate/heavy load. Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_para.h | 2 +- arch/x86/kernel/kvm.c | 16 +++++++++++++++- arch/x86/kernel/kvmclock.c | 15 +++++---------- 3 files changed, 21 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 4a47c16e2df8..4a49fc286b4c 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -118,7 +118,7 @@ static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1, } #ifdef CONFIG_KVM_GUEST -void kvmclock_init(void); +void kvmclock_init(bool prefer_tsc); void kvmclock_disable(void); bool kvm_para_available(void); unsigned int kvm_arch_para_features(void); diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 909d3e5e5bcd..4fe9c69bf40b 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -978,6 +978,7 @@ static void __init kvm_init_platform(void) .mask_hi = (BIT_ULL(boot_cpu_data.x86_phys_bits) - 1) >> 32, }; u32 timing_info_leaf; + bool tsc_is_reliable; if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) && kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL)) { @@ -1040,7 +1041,20 @@ static void __init kvm_init_platform(void) } } - kvmclock_init(); + /* + * If the TSC counts at a constant frequency across P/T states, counts + * in deep C-states, and the TSC hasn't been marked unstable, treat the + * TSC reliable, as guaranteed by KVM. Note, the TSC unstable check + * exists purely to honor the TSC being marked unstable via command + * line, any runtime detection of an unstable will happen after this. + */ + tsc_is_reliable = boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && + boot_cpu_has(X86_FEATURE_NONSTOP_TSC) && + !check_tsc_unstable(); + if (tsc_is_reliable) + setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); + + kvmclock_init(tsc_is_reliable); x86_platform.apic_post_init = kvm_apic_init; /* diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 404f60741aa8..69a15fbfb779 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -285,7 +285,7 @@ static int kvmclock_setup_percpu(unsigned int cpu) return p ? 0 : -ENOMEM; } -void __init kvmclock_init(void) +void __init kvmclock_init(bool prefer_tsc) { u8 flags; @@ -334,16 +334,11 @@ void __init kvmclock_init(void) kvm_get_preset_lpj(); /* - * X86_FEATURE_NONSTOP_TSC is TSC runs at constant rate - * with P/T states and does not stop in deep C-states. - * - * Invariant TSC exposed by host means kvmclock is not necessary: - * can use TSC as clocksource. - * + * If TSC is preferred over kvmlock, drop kvmclock's rating so that TSC + * is chosen as the clocksource (but still register kvmclock in case + * the kernel doesn't want to use TSC for whatever reason). */ - if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && - boot_cpu_has(X86_FEATURE_NONSTOP_TSC) && - !check_tsc_unstable()) + if (prefer_tsc) kvm_clock.rating = 299; clocksource_register_hz(&kvm_clock, NSEC_PER_SEC); -- 2.54.0.823.g6e5bcc1fc9-goog